Ben Morris' notebook

Tuesday, May 17, 2011

How do you make a programming language?

Without going into the philosophical details of "creating" a language (is a language "invented", or is it an abstract concept that always exists and is "discovered?"), I'm going to give a brief, high-level review of how a programming language interpreter is created. It's actually pretty simple to create a simple interpreter. There are two main steps: parsing and evaluating.

1. Parsing is the process of reading the text in a code file and breaking it down into expressions for your interpreter to evaluate. For example, a Scotch code file is read and deciphered into Haskell data structures: 1 + 2 + 3 becomes Add (Add (NumInt 1) (NumInt 2)) (NumInt 3), where "Add (expression) (expression)" symbolizes addition and "NumInt (number)" represents an integer.

Parsing is rather complicated, but it's made much easier if you take advantage of some of the existing libraries - I use Parsec, one of the most popular parser combinator libraries. Parsing can also be very slow. To solve this problem, when I parse a code file, I save a binary representation of the result in a new file; I only re-parse if there have been changes to the file.

2. Evaluating means breaking an expression tree down until you're left with a meaningful result. The expression above can be represented as a tree like this:

+
+ 3
1 2

The most top level expression is addition of an expression and 3. Before I can evaluate this, I need to evaluate the second addition expression, 1 + 2, which can be evaluated to 3. My tree is now simpler:

+
3 3

This can be fully evaluated to 6 and the result can be returned.

Finally, there's a process known as "bootstrapping" in which a programming language becomes "self-hosting." Scotch is not self-hosting, because it's implemented in another language, Haskell. Many popular languages (Python, Ruby) are implemented on top of another language, which is often C. Haskell is an example of a self-hosting language, because it's actually implemented in Haskell. There can be performance benefits to self-hosting languages, but it seems at first to present a kind of chicken-and-egg problem. So, how does a language become self-hosting?

1. First, an interpreter is created, using some existing language.
2. A compiler for the new language is written in the new language itself.
3. The interpreter is used to run the compiler on itself, effectively compiling itself.

You now have a compiled program that can compile other programs, free of any intermediate language.

If you want to explore the technical details of programming language implementation, feel free to browse the Scotch source code, which is available under the GNU General Public License.

Sunday, May 8, 2011

Science Denial

Vaccination, global warming...why do we have such a pervasive culture of distrusting scientists?

That's not to say that you should blindly accept anything. But when smart people who study something for a living all agree about something (say, that there's no link between autism and vaccination) and you can analyze the data yourself and it clearly points to the same conclusions (like a study of 537,000 Danish children that showed no difference in autism rates between vaccinated and unvaccinated children...) I just don't understand the arrogance it must take for someone to ignore that and think they somehow know better, based on the word of some politician or celebrity. We have outbreaks in Utah of both pertussis and measles because people think they know better than doctors and scientists.

I guess Galileo would argue that this isn't a new problem, though.

Thursday, March 17, 2011

The baby

Ruben James Morris, born 3/16, weighed 7 lbs, 13 oz. Happy birthday, Ruben!

Tuesday, February 15, 2011

Utah legislature wants to eliminate tenure

Sometimes it seems like Utah state legislators are in a competition to see who can pass the stupidest bill. Chris Herrod, R-Provo, recently threw his hat into the ring with a call to eliminate tenure for Utah professors.

There's a reason professors get tenure. Academics need to be free to pursue their research without fear of being fired for political reasons. In a state whose legislature frequently passes bills to make political statements on scientific issues, such as formally questioning global warming, this is absolutely essential for unbiased research to be carried out.

The optimist in me thinks this bill will find little support. Should it pass, it is going to be a huge blow to Utah higher education. Current tenured and tenure-track professors would not be affected, but Utah's two research institutions, University of Utah and Utah State University, would have a tough time recruiting new talent. Why would a professor choose a non-tenure track position in Utah when there are 49 states that are willing to offer tenure?

The bill is Utah House Bill 485, and its status can be tracked here.

2/24/11 update: the bill officially died in committee, though it was not without support; the House Education committee vote was 9-3 against, with 3 abstaining.

Herrod: "I don't understand the controversy."

Thursday, February 10, 2011

Infinite Functions

A cool recent development in Scotch: functions that never end, which can be used in combinination with take to get a specific number of results and stop evaluating.

There are a few functions now that use "filter" on an infinite list, i.e.

evens = filter(even, [1..])

which returns all even numbers, which there are infinitely many of. If you try to evaluate "evens" the interpreter will hang, trying to call the "even" function on an infinite list.

Unfortunately, it's impossible for Scotch to figure out that this is an infinite list; that would be equivalent to solving the halting problem, an undecidable problem in computer science. It's impossible to even tell that the evaluation of [1..] does not terminate. Why? Well, for starters, how can we tell that [1..] is not just [1..1000]? We can only tell when the evaluation of [1..] reaches 1001. But then we don't know that it's not just [1..1002]. Basically, you can't predict future behavior of the list; you can only know that it has terminated. If it never terminates, you have no way of knowing it never will. So, Scotch will happily try to evaluate this infinite function.

In practice, infinite functions can be used like this:

take 100 from evens

or

take 1000 from primes

to get a set number of results from a function that never terminates.

I also tweaked things so that using take with the sum of a list and something else will evaluate to just take from the list if it's long enough. So infinite functions that involve addition, like this,

infinite_range(n) = [n] + infinite_range(n + 1)

can be used in combination with take to get a set number of results:

>> take 10 from infinite_range(1)
[1,2,3,4,5,6,7,8,9,10]

This type of expression required a different approach to be evaluable. The function, when called, would evaluate like this:

[1] + infinite_range(2)
[1] + ([2] + infinite_range(3))
[1] + ([2] + ([3] + infinite_range(4)))
...

The problem here is that the never terminating (and therefore never fully evaluable) function infinite_range is always trapped inside parentheses with the elements that need to be added to the list, so this function could never be evaluated.

The solution was to automatically rewrite addition expressions associatively: a + (b + c) should always be rewritten as (a + b) + c. This allows infinite sums like infinite_range to evaluate to [n, n+1, n+2 ... n+m] + infinite_range(n+m+1) which can be used in combination with "take m" to get the list of m elements.

Thursday, February 3, 2011

Cross-platform Package Manager

One of the reasons I love Ubuntu is apt-get and the huge software repositories that are set up by default. As a developer, I install a lot of software, and being able to install a library or application and all its dependencies with one terminal command is a huge time saver.

So my question - why is there no native package repository on Windows/Mac? And really, why isn't there a cross-platform packaging system yet? (I've seen some attempts to create an "apt-get for Windows" but they all seem to have died out.) Developers could release a single package that would work for any supported OS and be done. The system itself could deal with platform-specific issues so the developer didn't have to. Just a thought.

Friday, January 28, 2011

Scotch 0.3.0 is here

I've been working hard lately to hit all of my release goals so I could get a new, improved version of Scotch out the door.

You can read about the changes (or the basics) in the new documentation. Aside from a lot of new functionality, one improvement that I'm really happy about is just-in-time compilation. This results in a dramatic speedup by greatly reducing the number of times each module is parsed. Previously, parsing accounted for all of the major cost centers affecting running time. A big culprit was std.lib, which is loaded at startup and then reloaded into every other imported module as it was interpreted. JIT compilation has resulted in about a 10x speedup for the cases I tested, making Scotch once again faster (again, for specific benchmarks being tested) than object oriented languages in its class such as Python and Ruby.

A Scotch website is in the works, but for now you can head to http://scotchlang.org which is a redirect to the front Wiki page of the Github project. Downloads are available here. And, check out the build instructions here.

Pages