Sunday, December 19, 2010

Intro to Scotch

Over this Christmas break, I've been developing a new programming language, Scotch (available at Google Code GitHub). It started out as just an exercise to understand language implementations better but I got a little carried away. The goal is to create a functional programming language that's easy to use to quickly throw something together with, with highly readable syntax a la Python. Here are a few of the language's cool features, especially for those who are unfamiliar with functional programming:

First, assignment can be done lazily or eagerly. Imperative languages (most popular languages nowadays) assign values eagerly, which means that when you say a = b you're really defining a as the current value of b. If b changes, a will remain the same. With lazy evaluation, a is simply defined as the expression b, which will only be evaluated when a is needed. You can do things like this that would raise exceptions in imperative languages:

>> a = b
>> b = 6
>> a
6

(Note: >> designates a line entered in the interpreter. Anything beneath is the value returned.)

So, variables can be defined in an order-agnostic way. I deviate from absolute pure functional languages like Haskell in that variables and functions are allowed to be redefined; in practical applications, some state is unavoidable, so Scotch allows state without having to pull teeth.

Pattern matching is also a really cool way to represent functions. Say, for example, we're constructing a factorial function (not especially useful, but I like to use this as an easy example.) Based on the definition of factorial, we can define it like this:

fact(n) = n * fact(n-1)

The result is a recursive function that calls itself with smaller and smaller values of n. Unfortunately, this function will continue forever, calling itself with values down to negative infinity. How do we tell it to end? In an imperative language, you could use if, like so:

fact(n) = if n == 0 then 1 else n * fact(n-1)

But, with pattern matching, we can define the function differently when the value of n is 0, like this:

fact(n) = n * fact(n-1)
fact(0) = 1

When the function is called, it checks the most recent definition to see if it's a match: if n is 0, it will match and return 1. If n is not 0, it will match to the next definition, where n can be anything. Not only is this version shorter, but to me, the meaning is much clearer here - and as the complexity of the function increases beyond that of a simple factorial function, the benefits to readability become more obvious.

Just about the entire Scotch standard library is written in this fashion, using (tail) recursive functions. Any function, from getting the length of a string to getting the rightmost n characters, can usually be represented as a general definition plus a base case that tells the recursion when to stop.

Patterns can also be used to split lists and strings. For example,

head(h:t) = h
tail(h:t) = t

In this example, the pattern is two variables, separated by a colon. When a list or string is passed as an argument to these functions, the first element/character becomes h, and the rest becomes t. This means you can do what you want with h, then call the same function on the rest of the list/string, to manipulate a string or list item by item.

Finally, in Scotch, functions are values, and so are partially applied functions. Let's look at a few examples of passing functions as arguments to other functions:

>> apply(f, x) = f(x)
>> g(n) = n
>> apply(g, 10)
10

The function g is passed as an argument to the function apply, and then called with the argument 10, returning g(10) which equals 10.

Partially applied functions are function calls to which we will add more arguments later. For example:

>> apply(f, x) = f(x)
>> add(x, y) = x + y
>> apply(add(10), 20)
30

Normally, add(10) wouldn't be a valid function call, because there's no definition of the add function that only takes one argument. But in this case, add(10), a partially applied function, is called with another argument, 20, resulting in add(10, 20), or 30.

For developers coming from imperative languages, imperative-style procedures ("procs") can be defined like so:

print_number(n) = do  a = "The number is " + n;
                      print a;

This capability relieves a bit of a pain point with pure functional languages such as Haskell, the use of monads to handle IO. Here, IO can be interspersed within regular code, which, among other things, makes quick debugging much easier.

Notice, also, that Scotch is weakly typed, meaning a number "n" can be added to a string without having to explicitly convert it to a string.

Anyway, that's Scotch so far. There are still some things that need to be implemented - notably concurrency, systems programming, and non-file I/O.

Monday, December 13, 2010

Introducing: the Scotch programming language

I worked through the initial chapters of Programming Languages: Application and Interpretation by Dr. Shriram Krishnamurthi of Brown recently. It was meant to be just an exercise, but I got really into it. From the beginning I did things completely differently from the way they're done in the book - I did everything in Haskell instead of Scheme, which involved writing a parser (definitely more involved than I realized), and my functions and variables were lazily evaluated from the start and utilized a much different form of lexical scope. I ended up coding up a working language interpreter and thus my new programming language, which I'm calling Scotch, was born. ("Scotch" is a veiled reference to the Glasgow Haskell Compiler, or whiskey, or scotch tape, or whatever else you want it to be.)

I'm convinced that functional programming is the way of the future, but functional languages sometimes seem pretty scary to newcomers (Lisp's parentheses, Haskell's monads, etc.) Functional programming needs a language that can do what Ruby and Python did for object oriented programming, a quick "prototyping" language with lots of power and few rules. The idea behind Scotch is to create a language with the readability and syntax of Python, but replacing support for object oriented programming (completely, to discourage its use) with functional tools such as lazy evaluation and recursion.

I'm trying also to deal with some common complaints about functional languages. For example, there is a simple print statement, and the syntax looks very similar to Python, although there are differences behind the scenes. Also, functions and variables can be defined more than once, so variables are capable of incrementing, although the default assignment is lazy so too much reliance on state is discouraged. And duck typing, one of the features which makes scripting languages so convenient, is also available.

The interpreter works, although you couldn't do much with it just yet. You can check it out on the Scotch page at Google Code.

Hopefully I'll be able to implement some more features to make it usable before I get too busy with next semester's classes.

Tuesday, December 7, 2010

Science's Worst Nightmare

Science's worst nightmare: being regulated by Joe Citizen.

Watch this video by Republican House whip Eric Cantor, and tell me if you're as terrified for the future of science in America as I am.

Cantor is on a personal mission to eliminate "wasteful spending" in government. To that end, he's encouraging "citizen review" of government agencies - starting with the NSF. Specifically, funded research projects that Cantor doesn't agree are useful. Cantor names the "hard sciences" as a worthy recipient of funding - according to him, this means physics, chemistry, and medicine - only fields in which a Nobel prize could be earned. I guess everything from ecology to psychology, sociology, linguistics... are just wasteful spending. He also encourages vigilant citizens to turn in individual award numbers of awards he feels are wasteful, as if he's building a case against the NSF.

The reason this worries me is, frankly, that the American public shows itself time and time again to be, on average, quite stupid and ignorant. There's a reason that NSF panels of actual scientists decide on the merit of awards, not Joe Citizen. When you consider the fact that 50% of Americans don't believe in evolution or the big bang, 20% (one in five) don't believe that the earth orbits the sun, and 10% don't believe in continental drift (source), this is probably not the place for the unschooled masses to dictate policy. Your average citizen is not in a position to judge science based on its merits.

Can you just see it? "This research is about some kind of fish or something! What do I care about fish? This is a waste of my tax money!" Glenn Beck then picks up the fish research story - what an absurd example of government waste! Downvoted.

I think Biology, because of the political nature it has somehow acquired, is in serious jeopardy because of proposals like this. I also wonder where Cantor thinks Computer Science falls on the usefulness scale. In the video, he specifically derides a project involving the development of a computer program, and the way he says those words doesn't give me a lot of hope on that front either.

Coincidentally, I made a comment about this a month ago in regard to a vote taken about whether or not the public wanted a county library system in Cache County, Utah -

In a republic, the idea stands on its own merits, not based on what people think of it, and I'm very glad for that - I shudder to think what would happen if the general public voted every publicly funded project up or down. If the political situation in Utah is any indicator of the rest of the country, I'm sure research in "liberal" fields such as ecology would be one of the first things to go.
 Looks like I spoke too soon. It might be time to start looking at grad programs at Cambridge...