Ben Morris' notebook: 2010

Sunday, December 19, 2010

Intro to Scotch

Over this Christmas break, I've been developing a new programming language, Scotch (available at ~~Google Code~~ GitHub). It started out as just an exercise to understand language implementations better but I got a little carried away. The goal is to create a functional programming language that's easy to use to quickly throw something together with, with highly readable syntax a la Python. Here are a few of the language's cool features, especially for those who are unfamiliar with functional programming:

First, assignment can be done lazily or eagerly. Imperative languages (most popular languages nowadays) assign values eagerly, which means that when you say a = b you're really defining a as the current value of b. If b changes, a will remain the same. With lazy evaluation, a is simply defined as the expression b, which will only be evaluated when a is needed. You can do things like this that would raise exceptions in imperative languages:

>> a = b

>> b = 6

>> a

6

(Note: >> designates a line entered in the interpreter. Anything beneath is the value returned.)

So, variables can be defined in an order-agnostic way. I deviate from absolute pure functional languages like Haskell in that variables and functions are allowed to be redefined; in practical applications, some state is unavoidable, so Scotch allows state without having to pull teeth.

Pattern matching is also a really cool way to represent functions. Say, for example, we're constructing a factorial function (not especially useful, but I like to use this as an easy example.) Based on the definition of factorial, we can define it like this:

fact(n) = n * fact(n-1)

The result is a recursive function that calls itself with smaller and smaller values of n. Unfortunately, this function will continue forever, calling itself with values down to negative infinity. How do we tell it to end? In an imperative language, you could use if, like so:

fact(n) = if n == 0 then 1 else n * fact(n-1)

But, with pattern matching, we can define the function differently when the value of n is 0, like this:

fact(n) = n * fact(n-1)

fact(0) = 1

When the function is called, it checks the most recent definition to see if it's a match: if n is 0, it will match and return 1. If n is not 0, it will match to the next definition, where n can be anything. Not only is this version shorter, but to me, the meaning is much clearer here - and as the complexity of the function increases beyond that of a simple factorial function, the benefits to readability become more obvious.

Just about the entire Scotch standard library is written in this fashion, using (tail) recursive functions. Any function, from getting the length of a string to getting the rightmost n characters, can usually be represented as a general definition plus a base case that tells the recursion when to stop.

Patterns can also be used to split lists and strings. For example,

head(h:t) = h

tail(h:t) = t

In this example, the pattern is two variables, separated by a colon. When a list or string is passed as an argument to these functions, the first element/character becomes h, and the rest becomes t. This means you can do what you want with h, then call the same function on the rest of the list/string, to manipulate a string or list item by item.

Finally, in Scotch, functions are values, and so are partially applied functions. Let's look at a few examples of passing functions as arguments to other functions:

>> apply(f, x) = f(x)

>> g(n) = n

>> apply(g, 10)

10

The function g is passed as an argument to the function apply, and then called with the argument 10, returning g(10) which equals 10.

Partially applied functions are function calls to which we will add more arguments later. For example:

>> apply(f, x) = f(x)

>> add(x, y) = x + y

>> apply(add(10), 20)

30

Normally, add(10) wouldn't be a valid function call, because there's no definition of the add function that only takes one argument. But in this case, add(10), a partially applied function, is called with another argument, 20, resulting in add(10, 20), or 30.

For developers coming from imperative languages, imperative-style procedures ("procs") can be defined like so:

print_number(n) = do  a = "The number is " + n;

                      print a;

This capability relieves a bit of a pain point with pure functional languages such as Haskell, the use of monads to handle IO. Here, IO can be interspersed within regular code, which, among other things, makes quick debugging much easier.

Notice, also, that Scotch is weakly typed, meaning a number "n" can be added to a string without having to explicitly convert it to a string.

Anyway, that's Scotch so far. There are still some things that need to be implemented - notably concurrency, systems programming, and non-file I/O.

Monday, December 13, 2010

Introducing: the Scotch programming language

I worked through the initial chapters of Programming Languages: Application and Interpretation by Dr. Shriram Krishnamurthi of Brown recently. It was meant to be just an exercise, but I got really into it. From the beginning I did things completely differently from the way they're done in the book - I did everything in Haskell instead of Scheme, which involved writing a parser (definitely more involved than I realized), and my functions and variables were lazily evaluated from the start and utilized a much different form of lexical scope. I ended up coding up a working language interpreter and thus my new programming language, which I'm calling Scotch, was born. ("Scotch" is a veiled reference to the Glasgow Haskell Compiler, or whiskey, or scotch tape, or whatever else you want it to be.)

I'm convinced that functional programming is the way of the future, but functional languages sometimes seem pretty scary to newcomers (Lisp's parentheses, Haskell's monads, etc.) Functional programming needs a language that can do what Ruby and Python did for object oriented programming, a quick "prototyping" language with lots of power and few rules. The idea behind Scotch is to create a language with the readability and syntax of Python, but replacing support for object oriented programming (completely, to discourage its use) with functional tools such as lazy evaluation and recursion.

I'm trying also to deal with some common complaints about functional languages. For example, there is a simple print statement, and the syntax looks very similar to Python, although there are differences behind the scenes. Also, functions and variables can be defined more than once, so variables are capable of incrementing, although the default assignment is lazy so too much reliance on state is discouraged. And duck typing, one of the features which makes scripting languages so convenient, is also available.

The interpreter works, although you couldn't do much with it just yet. You can check it out on the Scotch page at Google Code.

Hopefully I'll be able to implement some more features to make it usable before I get too busy with next semester's classes.

Tuesday, December 7, 2010

Science's Worst Nightmare

Science's worst nightmare: being regulated by Joe Citizen.

Watch this video by Republican House whip Eric Cantor, and tell me if you're as terrified for the future of science in America as I am.

Cantor is on a personal mission to eliminate "wasteful spending" in government. To that end, he's encouraging "citizen review" of government agencies - starting with the NSF. Specifically, funded research projects that Cantor doesn't agree are useful. Cantor names the "hard sciences" as a worthy recipient of funding - according to him, this means physics, chemistry, and medicine - only fields in which a Nobel prize could be earned. I guess everything from ecology to psychology, sociology, linguistics... are just wasteful spending. He also encourages vigilant citizens to turn in individual award numbers of awards he feels are wasteful, as if he's building a case against the NSF.

The reason this worries me is, frankly, that the American public shows itself time and time again to be, on average, quite stupid and ignorant. There's a reason that NSF panels of actual scientists decide on the merit of awards, not Joe Citizen. When you consider the fact that 50% of Americans don't believe in evolution or the big bang, 20% (one in five) don't believe that the earth orbits the sun, and 10% don't believe in continental drift (source), this is probably not the place for the unschooled masses to dictate policy. Your average citizen is not in a position to judge science based on its merits.

Can you just see it? "This research is about some kind of fish or something! What do I care about fish? This is a waste of my tax money!" Glenn Beck then picks up the fish research story - what an absurd example of government waste! Downvoted.

I think Biology, because of the political nature it has somehow acquired, is in serious jeopardy because of proposals like this. I also wonder where Cantor thinks Computer Science falls on the usefulness scale. In the video, he specifically derides a project involving the development of a computer program, and the way he says those words doesn't give me a lot of hope on that front either.

Coincidentally, I made a comment about this a month ago in regard to a vote taken about whether or not the public wanted a county library system in Cache County, Utah -

In a republic, the idea stands on its own merits, not based on what people think of it, and I'm very glad for that - I shudder to think what would happen if the general public voted every publicly funded project up or down. If the political situation in Utah is any indicator of the rest of the country, I'm sure research in "liberal" fields such as ecology would be one of the first things to go.

Looks like I spoke too soon. It might be time to start looking at grad programs at Cambridge...

Sunday, November 28, 2010

Rethinking Math Education

I just watched Conrad Wolfram's (WolframAlpha) TED talk called "Teaching kids real math with computers" and thought he had some great ideas. Wolfram advocates teaching kids to solve math problems like they will in the real world, with computers, instead of stressing learning how to calculate by hand which in today's world is more or less useless.

The topic of math education reminds me of my own experience with public school math. In 3rd through 6th grade I was in a "gifted" program where, among other things, we were allowed to go at our own pace in math. The minimum amount of assignments that needed to be turned in a week was set, but you could complete as many as you wanted beyond that. As a result, I was working on the 6th grade math curriculum in 4th grade, and by 5th grade had finished it and worked through books on algebra, geometry, and trigonometry. In 6th grade, my teacher, having run out of books, assigned me and another student to build a scale model of the school - which was a little too much freedom for a couple of 6th graders, because we never actually got anything done that year.

After 6th grade, the system started to fail me. In 7th grade I easily tested into an "honors" math class, which consisted solely of arithmetic and some "pre-algebra." For the next several years I took pre-algebra (again - after moving and switching schools, this was once again the most advanced option), geometry, algebra, and pre-calculus, covering very little material beyond what I had taught myself in elementary school. In 12th grade I took AP Calculus, dropped the class halfway due to boredom and still managed to pass both parts of the AP test - because I understood the fundamentals, and didn't need to spend half a year drilling them.

The lack of advanced options for smart students, in my experience, turns them into slackers. There is absolutely no merit in doing three years' worth (or even a single year) of drills in "pre-algebra" once you grasp the concepts you're working with. It becomes grunt work, and this is why, until college, I viewed math as something that I was naturally good at but found dreadfully boring.

Anyway, I think that Wolfram's suggestions have a lot of potential for keeping advanced students more involved in learning about math. I liked the idea of being given free reign to solve a real problem - for example, "what type of car insurance should I buy?" This is a real world application of math that will stick with a student much longer than quadratic equation drills. Give the students some parameters (what type of car, mileage) and let them gather data and solve the problem on their own. I think this type of approach to education is needed to generate students that are computationally literate - which is more important in today's world than being able to crunch numbers by hand.

Saturday, November 27, 2010

Eliminate the Computer Science major

I believe we should do away with Computer Science as a field of undergraduate study - at least, the way it's implemented right now.

It's a given that, generally, a Geology major is studying to be a geologist. Likewise, a Biology major is training to become a biologist, and an Art major will probably end up an artist of some kind.

Computer Science students, however, don't go to college to become computer scientists. They go to learn "programming."

Theoretical computer science involves complex math and theory that is useful and interesting and takes a sharp mind to fully comprehend; most undergrads don't care and increasingly are not being exposed to it. Instead, popular languages are being taught in undergraduate courses, with the intention of preparing students for future work (see Joel Spolsky's The Perils of JavaSchools for another take on this phenomenon.)

The sad thing is, having worked one-on-one with several recent Computer Science graduates, I don't believe that they're learning much about programming, either. Students with a passion for programming study it in their free time, work on their own projects, and do most of their learning outside of the classroom. Fresh graduates with a BS in Computer Science and no real experience programming just don't have the skills they need to do anything but grunt work. As an undergraduate Biology student, my first CS course was on bioinformatics algorithms and had several Masters students, and when I was able to implement the algorithms quicker and more efficiently than CS students, I saw how lacking my school's curriculum was.

Solutions thus far have been to dumb down the CS degree by using higher level ("easier") programming languages and teaching less theory and advanced math. I think that aiming to dumb down the curriculum in order to prepare students for a corporate programming career is killing off the pool of intelligent academic computer scientists. The solution? Accept that there's a difference and offer two distinct majors: Computer Science and Programming.

The immediate result would probably be a huge shift of students from CS to Programming. Maybe some CS departments would cease to exist due to lack of interest. Programming students would learn the basics they need to succeed in the corporate world (similar to some "Information Systems Management" programs, but with a more straightforward focus on the technical aspects of development.) But smart CS students wouldn't be hampered by the dumbing down of their programs of study. They'd be exposed to more current topics in research and have the opportunity to take more advanced courses. They'd be allowed to choose from emphasis areas more in line with the current directions of academia. The Computer Science department would be a fraction of its current size but the per capita ability and motivation would go up, and these students would go on to be the brilliant researchers of the future. Their peers in the Programming department could get the high-paying job at the corporation of their choice, or start their own companies.

Let's accept that there's a difference between the two types of student, and start treating them differently.

Edit: Based on comments from several readers, I think "Software Engineering" is a much better name for my proposed vocational split-off than "Programming." Also, this pattern has already been implemented at some Universities, including Purdue (see CS vs. Computer and Information Technology) although as of yet it's hardly pervasive.

Pages