Ben Morris' notebook: How do you make a programming language?

Without going into the philosophical details of "creating" a language (is a language "invented", or is it an abstract concept that always exists and is "discovered?"), I'm going to give a brief, high-level review of how a programming language interpreter is created. It's actually pretty simple to create a simple interpreter. There are two main steps: parsing and evaluating.

1. Parsing is the process of reading the text in a code file and breaking it down into expressions for your interpreter to evaluate. For example, a Scotch code file is read and deciphered into Haskell data structures: 1 + 2 + 3 becomes Add (Add (NumInt 1) (NumInt 2)) (NumInt 3), where "Add (expression) (expression)" symbolizes addition and "NumInt (number)" represents an integer.

Parsing is rather complicated, but it's made much easier if you take advantage of some of the existing libraries - I use Parsec, one of the most popular parser combinator libraries. Parsing can also be very slow. To solve this problem, when I parse a code file, I save a binary representation of the result in a new file; I only re-parse if there have been changes to the file.

2. Evaluating means breaking an expression tree down until you're left with a meaningful result. The expression above can be represented as a tree like this:

+
+ 3
1 2

The most top level expression is addition of an expression and 3. Before I can evaluate this, I need to evaluate the second addition expression, 1 + 2, which can be evaluated to 3. My tree is now simpler:

+
3 3

This can be fully evaluated to 6 and the result can be returned.

Finally, there's a process known as "bootstrapping" in which a programming language becomes "self-hosting." Scotch is not self-hosting, because it's implemented in another language, Haskell. Many popular languages (Python, Ruby) are implemented on top of another language, which is often C. Haskell is an example of a self-hosting language, because it's actually implemented in Haskell. There can be performance benefits to self-hosting languages, but it seems at first to present a kind of chicken-and-egg problem. So, how does a language become self-hosting?

1. First, an interpreter is created, using some existing language.
2. A compiler for the new language is written in the new language itself.
3. The interpreter is used to run the compiler on itself, effectively compiling itself.

You now have a compiled program that can compile other programs, free of any intermediate language.

If you want to explore the technical details of programming language implementation, feel free to browse the Scotch source code, which is available under the GNU General Public License.

Ben Morris' notebook

Pages

Tuesday, May 17, 2011

How do you make a programming language?

No comments:

Post a Comment