Archives for the month of: April, 2008

Continuing on my functional musings, I want to illustrate how introducing assignment in an otherwise purely functional language basically fucks up everything (excuse my French). The start with, let’s say that we have the embryo of a programming language with only a simple set of primitives and most importantly lambdas. Our goal is to build higher level data structures on top of that.

Once again (please bear with me for a moment even if this sounds like my last post), a good start would be a purely functional definition of pairs and the corresponding extraction functions that get the head and the tail elements. The technique we’re going to use is even more functional than last time, only relying on proper function definitions. So assuming that the base language is a bare-bone implementation of Ruby, Javascript or Arc, we could add the following definitions:

def pair(x,y); lambda { |m| m[x,y] }; end
def head(p); p[lambda { |x,y| x }]; end
def tail(p); p[lambda { |x,y| y }]; end

var pair = function(x,y) { return function(m) { return m(x, y); }; };
var head = function(p) { return p( function(x,y) { return x; } ) };
var tail = function(p) { return p( function(x,y) { return y; } ) };

(def make_pair (a b) (fn (m) (m a b)))
(def head (p) (p (fn (a b) a)))
(def tail (p) (p (fn (a b) b)))

It could seem a bit cryptic at first so let me show you how it gets reduced by using an example based on the Arc code which I tend to find most readable (I hate Ruby’s [] syntax for lambdas invocation, tell me what’s the difference between invoking a named or an anonymous function?):

(head (make_pair (23, 7) )
≡ ( (fn (p) (p (fn (a b) a))) (fn (23 7) (fn (m) (m a b)) ) )
≡ ( (fn (p) (p (fn (a b) a)))  (fn (m) (m 23 7)) )
≡ ( (fn (m) (m 23 7)) (fn (a b) a) )
≡ ( (fn (a b) a) 23 7) )
≡ 23

Okay, maybe it’s not that readable. What I’m doing here is replacing the parameters by what they stand for at each line, reducing the expression little by little. As an exercise you can try it with Ruby or Javascript, you should get the same result (unless I have an error in my code).

By the way I didn’t make that up myself, this purely functional representation has been found by Alonzo Church, the father of lambda calculus, and is called the Church encoding of pairs (or Church pairs). Most of what’s really interesting nowadays in computer language comes from Church (Alan Turing was one of his students).

So now we have our very small, almost axiomatic language for which we introduced purely functional data structures. It’s all nice and rosy because there are many assumptions we can take from this point. We can always replace a function by its expression like I did just above. everything is purely deterministic, no uncertainty on results. A given function, provided the same input parameters, will always produce the same result. It’s really convenient.

But it seems that we’re missing something. Wouldn’t it be nice to be able to change the value of things? Even if recursion can go a long way, it’s sometimes convenient to have incremented values or global variables that can change over time. So let’s introduce assignment in our small language and see where it takes us. To be clear, by introducing assignment, I don’t mean giving a value to something. That we could already do (the head or tail functions were given a value in the form of the code they represent). What I’m going to add here is the ability to change that value. So now we can do:

a = 2
a += 5

Sweet. And really this really can’t harm anybody. Or can it? Well, let’s see what are the implications for the rest of our language, does it influence other constructions? The suspense won’t be long: it does. I can change the above purely functional definition of a pair to a mutable one just with a little modification. Let me show you.

def pair(x,y); lambda { |m| m[x,y, lambda { |n| x = n}, lambda { |n| y = n}] }; end
def head(p); p[lambda { |x,y| x }]; end
def tail(p); p[lambda { |x,y| y }]; end
def set-head(p, v); p[lambda { |x,y,mx,my| mx(v) }]; end
def set-tail(p, v); p[lambda { |x,y,mx,my| my(v) }]; end

var pair = function(x,y) { return function(m) { return m(x, y, function(n) { x = n}, function(n) { y = n}); }; };
var head = function(p) { return p( function(x,y, mx, my) { return x; } ) };
var tail = function(p) { return p( function(x,y, mx, my) { return y; } ) };
var setHead = function(p, v) { p( function(x,y, mx, my) { mx(v); } ) };
var setTail = function(p, v) { p( function(x,y, mx, my) { my(v); } ) };

(def make_pair (a b) (fn (m) (m a b (fn (n) (= a n) (fn (n) (= b n) )) )
(def head (p) (p (fn (a b ma mb) a)))
(def tail (p) (p (fn (a b ma mb) b)))
(def set-head (p v) (p (fn (a b ma mb) (ma v) )) )
(def set-tail (p v) (p (fn (a b ma mb) (mb v) )) )

Let’s see how that works with SpiderMonkey:

s> var mypair = pair(31, 9);
js> tail(mypair);
9
js> setTail(mypair, 11);
js> tail(mypair);
11

What we’ve done here is introducing mutator functions in the definition of the pair. Said differently, a pair knows how to modify itself. The set-head and set-tail functions just use what’s in pair.

By simply adding assignment in my little language, I’m now seeing it bubbling up to my pair definition. Once I provide assignment as part of my language, anybody can redefine my nice purely functional pair in something slightly different. Something that introduces side effects. And any code using this mutable pair definition will also be exposed to side effects. And so on and so forth all the way up the software stack. Assignment is viral.

Now, think of it, how many of the non-trivial bugs you’ve been confronted with in your programs can be tracked back to assignment? You know, when you modify something and it happens to leak to something else. When variables don’t really hold the value you expected them to have. Or when some part of a program produces an unintended state change by calling another. If you really think of it, you’ll realize it’s a very significant portion of your debugging life. A whole class of unintended consequences start to appear with the introduction of assignable variables.

So is it worth it? Shouldn’t we just all forget about assignment altogether? I tend to think the answer is yes, even though I have my doubts (I know, it’s ambiguous). We, human beings, are a messy bunch. Assignment reflects that. Besides, you can’t do much with no form of variability at all. Without I/Os programming gets hard and users are an important kind of I/O.

So what’s my point? If we really can’t do without assignment and if assignment is that dangerous, how come almost all programming languages handle it in such a flimsy way? Imperative languages tend to make it as easy as addition without even warning you or reflecting its inherent virality. Functional languages are not that much better. The only one I know of that seems to handle the problem in a reasonable way is Haskell with the monad theory (and monads bubble up). But given the sheer number of articles and blog posts trying to explain what monads are and how they work, I’m guessing it’s far from being the final answer.

There’s something lacking here and while I can’t really think of a proper way to handle the problem for now,  most languages have failed in helping us dealing with it. So the next time you press the = key, think of the bugs you’re probably introducing.

Pictures by DerrickT and latenightowl.

Ruby is hot indeed. This Ruby VM written in Javascript runs bytecode compiled by YARV (Ruby 1.9 sort of) in your browser. Yes, you read that right. Right now the Ruby code, when found by the browser, is sent to the server, compiled using YARV, brought back to the browser as bytecode and finally executed by HotRuby. A bit unwieldy but I’m sure it will quickly improve. And the first micro-benchmarks look promising. Interesting.

Follow

Get every new post delivered to your Inbox.