This is the combination of a little-documented feature of V8 mixed with an undocumented feature of the node.js debugger client. Say you have an already running node process that you want to debug. The most common scenario is a process that’s running wild on one of your testing environments and you want to see what’s wrong with it. So a backtrace would be helpful, and full debugging even more so. The following should help:


# start the remote debugger in the already running node process
kill -s USR1 pid
# attach to the debugger using the node.js client
node debug host:5858
# see where you are
debug> pause
debug> bt

From there you can poke around. You can also continue and pause again to see if you seem to consistently end up in the same code area. Or write a script that regularly connects, dumps the backtrace and disconnects.
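If you go the script route, here’s a rough sketch of what such a poller could look like, talking to the V8 debug protocol directly over TCP. The framing and command names below are from my memory of the protocol, so treat it as a starting point rather than gospel:

[js light="true"]
// quick and dirty poller: connect to the debug port, pause the process, dump a backtrace, resume
// (hypothetical sketch -- framing per the V8 debug protocol, double-check against the docs)
var net = require('net');

var seq = 0;
function send(socket, command, args) {
  var body = JSON.stringify({ seq: ++seq, type: 'request', command: command, arguments: args || {} });
  socket.write('Content-Length: ' + Buffer.byteLength(body) + '\r\n\r\n' + body);
}

var socket = net.connect(5858, 'localhost', function() {
  send(socket, 'suspend');                         // same effect as "pause" in the client
  send(socket, 'backtrace', { inlineRefs: true }); // a real script would wait for the suspend response first
});

socket.setEncoding('utf8');
socket.on('data', function(chunk) {
  console.log(chunk);                              // lazy: print headers and JSON bodies as they come
});

setTimeout(function() {
  send(socket, 'continue');                        // let the process run again before hanging up
  socket.end();
}, 2000);
[/js]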

Happy debugging.

I started working on MommaZoo with Catheryne in June and I’ve been meaning to describe the technology stack that powers it for some time. Before I do, knowing about MommaZoo may be helpful. And I can’t really put it better than our website (given that Catheryne wrote it):

MommaZoo helps moms and dads with the craziness around parenting school-aged kids. We remind you when your kid’s homework is due. We tell you the morning you need to bring a car seat for a field trip. We help you instantly contact other parents who can cover 15 minutes if you are running late. We connect you with other parents who live close to you, or whose kids are in similar activities, to share a ride or a walk-to-school.

We have a messaging part, a simple task management thingy (optimized for volunteering because schools do a lot of that) and class rosters. We’re also working on using location and activity matching to help parents who live close to each other find one another. We want to encourage carpooling, walk-to-school and “let’s all help each other” connections. We’re at the pilot phase at the moment, testing and tuning with one school. We’re still pretty small, so don’t expect hardcore traffic numbers here. But I’ll give an update when we reach a million users (*cough*).

One major challenge was that we wanted to optimize for mobile delivery but also wanted a website. We’re bootstrapping (yes, we like challenges) and wanted to be out quickly (for the start of the school year, so only 2.5 months). So I decided to try a single website that would work for all devices (the “HTML5” thing). That has worked much better than I initially expected. Of course there’s been some pain but, all things considered, it’s been rather minimal given the time saved.

So without further ado, going from bottom to top, here is our stack:

  • Amazon EC2. Right now, a micro instance. Yup, micro. We don’t have that many concurrent users yet and we’re pretty lean and mean. And I love EC2.
  • Ubuntu server. I’m just very familiar with Ubuntu.
  • MongoDB. As already mentioned, development time is rather important to us. Developing on top of MongoDB is really pleasant. It’s not really the exotic NoSQL storage it’s often made out to be, it’s much more conventional than that, but it’s also pretty frictionless. So I like it. We’ll go fancier when we have much more data.
  • Amazon SES. Takes a POST, sends an email. Simple as in the name.
  • CoffeeScript. That’s the programming language. And we use it everywhere, top to bottom, left to right, client to server. Okay, I haven’t tried to get the Coffee compiler into the MongoDB shell yet but someday… Anyway, if you haven’t checked CoffeeScript out yet, you definitely should; you’re doing yourself a disservice not to. And even more so if you’re using node.js…
  • Node.js. You can bitch all you want about callback-oriented programming but with a few good habits (and CoffeeScript), it’s just fine. I thank Ryan Dahl every single time I don’t need to use a thread pool, locks or synchronization. And Javascript is just Scheme after a long acid trip, so it can’t really be bad.
  • Mongoose. To interface with MongoDB. Honestly, I’m not totally seduced. I don’t necessarily like all the choices, there’s a little bit too much magic and it has some kinks. I’d rather have something thinner. I mean to check out Poutine when I have some time.
  • Underscore.js. Simple library, with a few good utilities. Used both in the browser and on the server.
  • Express.js. It’s thin, doesn’t get too much in the way, handles your resource routing and integrates connect.js (for cookies, request logging, etc.). Overall I’m not fully convinced but it does the job well. So I’m keeping it for now.
  • Amazon ELB. Just for failover and SSL termination at the moment.
  • Skeleton. We’ve now crossed the network and are in the browser. Skeleton has been working really well in handling all the layout differences between devices and screen sizes. It’s basically a CSS with a lot of media queries. Very nice.
  • jQuery. Just because. With a few plugins.
  • jQTouch. Originally intended for its good iOS support but it’s probably the piece I’m regretting the most right now. It seems a few best practices and tricks with a CSS animation library would serve us better. It’s not very well maintained and the code isn’t that clean.
  • Backbone.js. The third piece of software by Jeremy Ashkenas we use. We definitely owe him his weight in beer. The good part of Backbone is that it’s simple and lightweight (notice a trend?). The less good part is that it tends to orient you toward fat views. Of course YMMV.

Total bill? $31 a month right now. And we have 2 copies of everything, one for a test environment and one for production. Overall I’m very happy with our node.js+ec2 stack. It’s been stable, very tractable and will support our growth in the coming years.

So if what we’re doing sounds exciting, if you’re interested as a parent or as a technologist, drop me an email (matthieu at mommazoo dot com). We’re not hiring yet but I’m always up for some interesting email exchanges. Or a beer if you live in San Francisco or are passing by.

My next field of study is neural networks; I’ve been reading about them for some time. Interestingly, it’s not easy to find resources that aren’t full of not-so-trivial math, so I had to get a calculus refresher as well. By the way, if you’re interested in learning algorithms strictly from a programming standpoint, I can’t recommend Programming Collective Intelligence enough, a very good practical book.

Before we get into the details: neural networks are used in supervised machine learning to solve classification problems and, more generally, to build non-linear models out of raw data. One of their main strengths is that they’re good at generalizing (responding sensibly to data that hasn’t been seen before). They were very popular in the 70s, when many thought they were the early stage of a new kind of intelligence. That didn’t really happen, so they fell from grace, only to make a comeback in the mid-80s when back-propagation was invented. Then newer machine learning techniques came about, like Support Vector Machines, showing somewhat better results (with somewhat complex math techniques). But neural nets came back with a vengeance! With techniques still evolving, deep belief nets or LSTMs are now showing really good results on some categories of problems (on that last subject, a couple of Google tech talks by Hinton are worth the time).

So the simplest form of neural network is Rosenblatt’s Perceptron, which really isn’t much of a network given that it’s usually represented as a single neuron. Behold the Wikipedia illustration ripoff:

Basically, you have a series of input signals that stimulate the neuron into producing an output signal. Each input has a weight and the output is the sum of all inputs times their respective weights, passed through some normalization function. The role of that function is simply to normalize the sum.

The f function can be as simple as a step function (i.e. 0 for x < 0 and 1 for x > 0) although most problems are better handled with more smoothness (like a hyperbolic tangent). At this point you may start to wonder: but where is the learning? It’s all in the weights, brother. For example, say you want to predict the chances of someone having cancer using a set of basic physiological measurements, before recommending a more thorough and expensive examination. Each of these measurements (blood pressure, white cell count, …) is going to be one of the X inputs in the above picture, and the expected output would ideally be a number between 0 and 1 telling you how sure the network is. To get there you need to adjust the weight of each connection by training the network. Enter supervised learning.
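Before we get to the learning, here’s what that forward pass looks like in code, as a minimal Javascript sketch (the actual implementation I talk about further down is in Scala), using a hyperbolic tangent as the normalization function:

[js light="true"]
// a single neuron: weighted sum of the inputs, passed through a normalization function
// (minimal sketch -- the real implementation discussed below is in Scala)
function tanh(x) { return (Math.exp(2 * x) - 1) / (Math.exp(2 * x) + 1); }

function neuronOutput(inputs, weights) {
  var sum = 0;
  for (var i = 0; i < inputs.length; i++) sum += inputs[i] * weights[i];
  return tanh(sum);   // the smooth alternative to a plain step function
}
[/js]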

In supervised learning, the network is fed a data set for which the answers are already known. The difference between the expected output and the one produced by the network is the error and can be used directly to adjust the weights:

In plain English, the new weight for a given input is the old weight plus the product of a learning factor (usually in the 0.1 to 0.01 ballpark, depending on how fast or how precisely you want the learning to happen), the error and the input. Pretty easy, no?
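Continuing the little sketch above, the training loop is barely longer. Here it is applied to a made-up toy data set, the OR function (which, unlike XOR below, is linearly separable, so a single neuron can learn it):

[js light="true"]
// delta rule: weight[i] = weight[i] + learningRate * error * input[i]
function train(examples, weights, learningRate, epochs) {
  for (var e = 0; e < epochs; e++) {
    examples.forEach(function(ex) {
      var error = ex.expected - neuronOutput(ex.inputs, weights);
      for (var i = 0; i < weights.length; i++) {
        weights[i] += learningRate * error * ex.inputs[i];
      }
    });
  }
  return weights;
}

// toy data set: the OR function
var examples = [
  { inputs: [0, 0], expected: 0 },
  { inputs: [0, 1], expected: 1 },
  { inputs: [1, 0], expected: 1 },
  { inputs: [1, 1], expected: 1 }
];
var weights = train(examples, [Math.random(), Math.random()], 0.1, 1000);
[/js]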

But there’s a catch: with Rosenblatt’s Perceptron you can only model linearly separable problems (ones where the solutions can neatly be separated by a straight line). Sometimes that’s enough but often, as in my cancer detector example, it’s not. So what can we do about it? Just stack the neurons on top of each other.

Here we have 3 layers; neurons on each layer are connected to all the neurons of the neighboring layers (the illustration above is slightly wrong, missing connections between the hidden and the output layers). Middle layers (there’s only one in the illustration) are called hidden layers, as they’re invisible to the outside world. The network can be as large as needed, with many hidden layers composed of many neurons. The thing is, depending on the problem, a network that’s too large can actually perform poorly (over-fitting) and will take long to train. That’s where perceptrons can be annoying: in theory they can model any function, in practice several educated guesses need to be made about the size and structure of the network to get good results. That being said, people have been using them quite successfully for some time.

Multi-layer perceptrons were first described in 1969 but it took a while to figure out how to train them. 17 years, actually. The trick is to propagate the error calculated at the output (just like for Rosenblatt’s perceptron) back through the hidden layers, all the way to the input layer. The algorithm is fairly well explained on Wikipedia so I won’t go into the details. Plus you’re going to get some code anyway.

Multi-layer perceptrons are still fairly basic as far as neural networks go nowadays but it’s necessary to understand them well to understand other networks. They also perform decently well and are easy to implement, so why not do just that, huh? I’ve implemented one in Scala, which gave me a good occasion to try the language a little more. At the beginning I wanted the implementation to be purely functional but somewhere along the way it felt more natural to start mutating the objects. I’m not sure whether that’s because of Scala, because of object-orientation or because of me. My Scala is probably still a little rough, so let me know if there are more idiomatic ways.

To make it easier to try, instead of using a data set that you’d have to download, I’m training the network on the XOR function. It’s non-linear, so it’s a good illustration of the capabilities of the network, even though the result doesn’t have as much wow factor.
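The Scala version is the real implementation, but to show how little code back-propagation actually needs, here’s a stripped-down Javascript sketch of a 2-2-1 network learning XOR (hard-coded sizes, sigmoid activation; a toy, not the real thing):

[js light="true"]
// a 2-2-1 multi-layer perceptron learning XOR with back-propagation (hard-coded toy sketch)
function sigmoid(z) { return 1 / (1 + Math.exp(-z)); }
function rand() { return Math.random() * 2 - 1; }

var wh = [[rand(), rand()], [rand(), rand()]];   // hidden layer: 2 neurons, 2 inputs each
var bh = [rand(), rand()];                       // hidden biases
var wo = [rand(), rand()];                       // output neuron: one weight per hidden neuron
var bo = rand();                                 // output bias

function forward(x) {
  var h = [];
  for (var j = 0; j < 2; j++) h[j] = sigmoid(bh[j] + wh[j][0] * x[0] + wh[j][1] * x[1]);
  return { h: h, y: sigmoid(bo + wo[0] * h[0] + wo[1] * h[1]) };
}

var data = [[[0, 0], 0], [[0, 1], 1], [[1, 0], 1], [[1, 1], 0]];
var rate = 0.5;

for (var epoch = 0; epoch < 20000; epoch++) {
  data.forEach(function(sample) {
    var x = sample[0], target = sample[1];
    var out = forward(x);
    // error at the output, scaled by the sigmoid derivative y * (1 - y)
    var deltaOut = (target - out.y) * out.y * (1 - out.y);
    for (var j = 0; j < 2; j++) {
      // propagate the output error back through the hidden layer
      var deltaHidden = out.h[j] * (1 - out.h[j]) * deltaOut * wo[j];
      wo[j] += rate * deltaOut * out.h[j];
      wh[j][0] += rate * deltaHidden * x[0];
      wh[j][1] += rate * deltaHidden * x[1];
      bh[j] += rate * deltaHidden;
    }
    bo += rate * deltaOut;
  });
}

// outputs should end up close to 0, 1, 1, 0 (if they all hover around 0.5,
// the random init fell into a local minimum -- just rerun)
data.forEach(function(sample) {
  console.log(sample[0] + ' -> ' + forward(sample[0]).y.toFixed(3));
});
[/js]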

This is the second (and last) installment of a small personal study of distributed hash tables. The first one was focused on consistent hashing and the Chord DHT, check it out for more context.

Kademlia

Kademlia scales very well mostly because its design taps into the lessons learned by existing peer to peer networks. Which also makes it more complex and a little less elegant than Chord, feeling more like a set of heuristics rather than a single nice algorithm. But that’s often the price to pay to behave well under real-world circumstances.

Probably the most important observation Kademlia relies on is that a node that has been in the network for a while is very likely to stay a while longer. Conversely, a node that appeared recently is likely to leave soon. Kademlia’s routing table practically never removes an “old” node.

Getting a bit more into the details, like most DHTs Kademlia relies on consistent hashing to locate nodes and key / value pairs. But it shows some originality by using a slightly different notion of distance. Where Chord simply uses the traditional definition of how far two points are from each other on the circle (d(a,b) = abs(b-a)), Kademlia relies on an XOR metric (d(a,b)=a xor b). It’s a valid metric as it follows the necessary rules:

  • d(x,x) = 0
  • d(x,y) > 0 if x != y
  • d(x,y) = d(y,x)
  • d(x,z) <= d(x,y) + d(y,z) (the triangle inequality, which XOR satisfies too)
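In code, the metric is about as cheap as it gets, which matters when you’re comparing distances on every lookup. A toy sketch on plain integers (real ids are 160-bit hashes, so an actual implementation works on byte arrays):

[js light="true"]
// XOR distance between two ids
function distance(a, b) { return a ^ b; }

distance(12, 12);   // 0 -> d(x,x) = 0
distance(12, 10);   // 6, and distance(10, 12) is 6 too -> the metric is symmetric
[/js]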

So why use this crazy XOR metric instead of the classic distance, you ask? Because of the structure of the routing table Kademlia puts in place. Once again a picture is worth a thousand words so I’ll give you one (coming from the paper) plus a few hundred words to boot.

Kademlia routing table

Ah, I hear you think, a cool binary tree. Yes it is. The binary tree represents all the numbers in the hash space, so if your node’s hash, expressed in binary, is 1100 (12 in decimal), it will be placed under the node found by going twice through the left branch and then twice through the right branch. Simple, isn’t it? Except that the rules to build that tree aren’t quite that simple, but I’ll try to explain anyway.

Each leaf in the tree contains what’s called a k-bucket, basically a list of k elements (in practice, k=20 seems to work well). When a new node enters the Kademlia network it doesn’t have much of a tree, just a single empty k-bucket. As it starts learning about the existence of other nodes, it inserts them into that bucket until it reaches its maximum length of k. When the (k+1)th node needs to be inserted, the bucket gets split. All the nodes whose hash’s binary representation starts with 1 get assigned to the new k-bucket on the left of the root node, the ones starting with 0 go to the one on the right. Here is our one-level tree. Then our node learns about more nodes and, say, the right k-bucket (node hashes starting with 0) gets full again. The bucket is split: all nodes starting with 00 end up in the right-right position (from the top of the tree) and all the ones starting with 01 get re-attributed to the right-left position.
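If that was a little dense, here’s a rough sketch of the insertion logic, simplified to the extreme: toy integer ids, no liveness checks, and every full bucket gets split (the real rule restricting splits comes in the next paragraph):

[js light="true"]
// toy k-bucket insertion: integer ids in a BITS-bit space, no liveness checks,
// and every full bucket gets split (the real restriction comes below)
var K = 20, BITS = 8;

function newBucket() { return { contacts: [], children: null }; }
var root = newBucket();

// most significant bit first: 1 goes to the left subtree, 0 to the right
function bit(id, depth) { return (id >> (BITS - 1 - depth)) & 1; }

function insert(node, id, depth) {
  if (node.children) return insert(node.children[bit(id, depth)], id, depth + 1);
  if (node.contacts.indexOf(id) >= 0) return;
  if (node.contacts.length < K) { node.contacts.push(id); return; }
  // the bucket is full: split it and redistribute its contacts one level down
  node.children = [newBucket(), newBucket()];   // index 1 = left (ids whose next bit is 1), 0 = right
  var contacts = node.contacts.concat(id);
  node.contacts = [];
  contacts.forEach(function(c) { insert(node, c, depth); });
}
[/js]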

Now if we were to continue this process on and on, we could end up with a routing table as wide as the network: n entries. But we already saw that too large a routing table is expensive to maintain, which is why Kademlia’s routing table only keeps O(log n) elements. How? Simply by not storing too many nodes that are “far away” under the XOR metric: it never splits k-buckets that aren’t on the same side of the tree as the current node’s hash.

In the above picture, the current node’s hash starts with 0011. Therefore it will only have one k-bucket on the left side of the tree for all nodes starting with 1 (represented by the big gray circle). That bucket will never be split. It will also have one single bucket under the right-left path (represented by the medium gray circle), which similarly will never get split. Neat, isn’t it? This guarantees that our routing table only maintains O(log n) contacts and also that a node knows its surroundings well without knowing too much about what’s far away. Heard that already? Yep, Chord does the same.

But Kademlia’s scheme uses k-buckets, with k nodes each. And that’s where its main advantage lies: in a full k-bucket that can’t be split, it can still reorder nodes. Each bucket being a list, it orders nodes by age, and the oldest ones are never dropped unless they stop responding. As older nodes are also the most persistent ones, the network organizes itself around its most reliable points.

I won’t detail the lookup algorithm too much; it comes almost naturally from the routing table’s structure. In a few words, you ask the closest nodes you know about in your routing table which nodes they know about that would be closer. And you keep doing that until you stop learning about new, closer nodes. Another very important property of Kademlia lies right here: the lookup always considers k nodes but only queries a subset of them. This freedom in selecting the subset (usually 3 nodes) out of k leaves a lot of room for latency optimization.
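Boiled down to a sketch, and ignoring the parallelism, the timeouts and the subset selection, the loop looks like this (findNodeRpc is a hypothetical stand-in for the real “give me the k closest nodes to this target” network call, and closestContacts an assumed helper on the routing table):

[js light="true"]
// iterative lookup, boiled down (synchronous sketch; findNodeRpc and closestContacts
// are hypothetical stand-ins for the real network call and routing table helper)
function lookup(target, routingTable) {
  var K = 20;
  var closest = routingTable.closestContacts(target, K);
  var queried = {};
  var learnedSomething = true;

  while (learnedSomething) {
    learnedSomething = false;
    closest.forEach(function(contact) {
      if (queried[contact.id]) return;
      queried[contact.id] = true;
      findNodeRpc(contact, target).forEach(function(c) {   // "which nodes close to target do you know?"
        if (!closest.some(function(known) { return known.id === c.id; })) {
          closest.push(c);
          learnedSomething = true;
        }
      });
    });
    // keep only the k closest by XOR distance and go again until nothing new shows up
    closest.sort(function(a, b) { return (a.id ^ target) - (b.id ^ target); });
    closest = closest.slice(0, K);
  }
  return closest;
}
[/js]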

That’s about it! There are a few additional refinements, especially around maintaining the network by having nodes refresh each other’s information, but the most important part is there. And that’s the most deployed DHT algorithm on the Internet.

Improvements and Shortcomings

I’ve mentioned that having a choice of which nodes to contact is a big advantage of Kademlia. Why? Because if you can find your way around in more than one way, you can start optimizing for the “best” way. On a network that means the fastest, which is also often the cheapest. That’s important for you, because you don’t like to wait, and that’s important for ISPs, because they have strong incentives to optimize routing. An interesting direction for those optimizations is network coordinates, like Vivaldi or Pharos, which map network distance to actual coordinates. Using such a system, a given node can easily analyze the surrounding topology.

It’s also been observed in the wild, in Kademlia implementations, that strangely the distribution of nodes over the hash space isn’t even. When nodes pick an id in the hash space, you would normally expect them to be more or less evenly distributed, especially if the id attribution is random. However that’s not what has been observed. How come? Because it’s a relatively easy way to game the system. Suppose there’s a given key / value stored on the network that you would like to censor or swap for another. You could just create a lot of “fake” nodes advertising ids very close to the chosen key / value. Anybody on the network would then need to ask one of your nodes, and you can just return bogus values.

The problem is actually not as bad as it sounds. While it’s relatively easy to “hide” a specific key / value pair, given the number of nodes and values present in the network, large scale censorship is almost impossible to achieve. And it’s also not trivial to come up with a scheme that would protect the network against that sort of attack. The naive way would be to have the neighbors of a node assign it an id based on its IP address. Unfortunately many nodes can share the same public IP, and there are privacy issues associated with using your IP directly.

Conclusion

I hope you enjoyed this journey in DHT land, it was certainly enjoyable for me. Looking at the bigger picture, it seems that peer-to-peer networks are still fairly underused; I’m sure there are new and unexpected applications out there. The web is actually a very centralized model and that has worked really, really well. But it might not always be this way and there may be a day when another model emerges. I’m not saying P2P will be it, but knowing the alternative network layouts, how they work, their advantages and shortcomings will surely be useful.

Image by jef_safi.

I’ve had this long unfinished rant about Javascript in my drafts for almost a year now. But somehow it felt unfinished, like a useless rant. And it dawned on me recently: “if you’re unhappy about the state of Javascript, just fix it, dumb sucker”. That little inner voice had a point. So this is a lightweight rant, with patches.

See, I actually like Javascript. It’s minimal, functional, flexible and extensible without magic, and it now has good implementations (getting better and better). Of course it’s not as full-featured as languages with very large standard libraries, but it’s not meant to be one anyway and it’s a pretty damn good language to have in your toolbox. And I’m talking about Javascript the language, not the browser DOM, which is a library (and which, non-arguably, sucks).

The problem though, is that, mostly for historical reasons, it has some very annoying quirks, many more than most other mainstream languages (one huge quirk being its name). And for some reason, nobody involved in the specs has ever bothered fixing those, even when they’re pretty obviously broken “features”. I guess the main reason is the Holy Backward Compatibility, but there are ways to address that.

For most pain points below I have patches for Rhino, the Javascript implementation I know best. They’re all published on GitHub, applied against the current Rhino trunk, so fork away!

Function is 8 characters, that’s looooong

As I already mentioned, Javascript is functional and that’s one of the sweetest things about it. It allows you to do things like:

[js light="true"]
var total = [0, 1, 2, 3].reduce(function(a, b) { return a + b; });
var flattened = [[0,1],[2,3],[4,5]].reduce(function(a,b) { return a.concat(b); },[]);
[/js]

Sweet no? Well, kind of. There’s actually quite a bit of visual pollution in here, making code longer to write and harder to read, even for very simple examples like these. More complex programs lose elegance.

Javascript 1.8 improved things a bit with “expression closures” (a rather bad name as they’re not necessarily closures), which are a shorter lambda notation. The first anonymous function above could be rewritten as function(a,b) a+b. It’s better, but the function keyword alone still takes more visual space than the actual implementation.

So I’ve introduced two new symbols. The first one is -> and is strictly identical to the function keyword, it can be used anywhere function can be used. So you get:

[js light="true"]
var flattened = [[0,1],[2,3],[4,5]].reduce(->(a,b) a.concat(b), []);
[/js]

Sweet no? And then there’s one more candy for you: unary lambdas. The ~> symbol is used for a function accepting a single parameter (or even 0 because in Javascript if you can do one you can do zero) that will be bound to the implicit underscore variable. Functions with a single argument are usually pretty common, so I thought it was worth sugaring them a bit. With ~> you can do that:

[js light="true"]
var bigger = [0, 1, 2, 3].map(~> _ + 5);
[/js]

Which will produce a new array with 5 added to each element.

The patch adding -> is here (and very simple), the one adding ~> here. Feedback welcome.

Zero and the empty string are falsy

Truthiness is a pretty useful thing: it makes most conditional tests and assertions shorter to write and more compact. Just in case you don’t know, truthiness is the fact that some values can be coerced (or converted) to booleans automatically, even if they aren’t booleans. Like this:

[js light="true"]
var m = "a string";
if (m) print("it's true!");
[/js]

The problem is that Javascript makes undefined, null, false (of course), 0 and the empty string falsy, which is sort of reminiscent of much older languages. Now I can’t recall any program where I had to test values and considered all of null, false, 0 and the empty string as failure cases. Null and false, yes (mostly null actually). Null and the empty string, sometimes, but I usually have a utility function to test that as it doesn’t occur that often. Null, false, 0 and the empty string, never. And when an array gets passed to one of your functions, it’s not like you can easily know in advance whether there are going to be zeros or empty strings in it.

It’s pretty annoying because it makes it easy to introduce bugs like this:

[js light="true"]
var a = [-1,0,1,2,"","a"];
for (var i = 0, item; item = a[i]; i++) { print(item); }
[/js]

This is a fairly useful Javascript idiom but it won’t print the whole array: the loop will stop at the 0 element. And it’s not like I’m the only one complaining. One could argue that functions like map now shield you against this, but you still have the occasional if where you just forgot, and for loops still have their uses.

So I’ve changed that, making 0 and the empty string truthy and only null and false falsy. The patch is here, feel free to review and tweak.

Undefined considered harmful

In theory, you might think having undefined is a good idea: you get a difference between uninitialized variables (whose value is undefined) and null values (which were explicitly initialized to null). In practice though, it’s more of a headache because you now have to worry about two different kinds of things not-really-having-a-real-value and test for both in your programs. Additionally:

[js light="true"]
js> foo != undefined
js: "<stdin>", line 11: uncaught JavaScript runtime exception: ReferenceError: "foo" is not defined.
at <stdin>:11
js> typeof foo == "undefined"
true
js> null == undefined
true
js> null === undefined
false
js> typeof null
object
js> typeof undefined
undefined
[/js]

Kind of messed up. Practically speaking it’s a lot of unnecessary accidental complexity. So my next patch will be to remove undefined. I don’t have that completely ready yet but stay tuned, I’ll update this post when it’s fully cooked; in the meantime you can watch the GitHub repository.

Objects, Types and Equality

There’s a duality between primitive types and object types in Javascript that is sort of reminiscent of Java’s brokenness in that area (which has been somewhat alleviated by boxing/unboxing). The thing is, Javascript is dynamically typed, so it bites even more: when you implement a function receiving a string, are you going to get an "abc" or a new String("abc")? Here is a Javascript session summing up the problem:

[js light="true"]
js> new String("abc") == new String("abc")
false
js> new String("abc") == "abc"
true
js> typeof "abc"
string
js> typeof new String("abc")
object
js> [] instanceof Array
true
js> new Array() instanceof Array
true
js> var str = "abc"
js> str.foo = true
true
js> str.foo
js> str = new String("abc")
abc
js> str.foo = true
true
js> str.foo
true
[/js]

There are ways to work around the discrepancy in behavior between object-wrapped types and primitive types but again, that’s a lot of accidental complexity. I haven’t completely thought out how to fix that yet, it’s a more ambitious project than the others. So keep watching and help is definitely welcome.

Conclusion

My aim by publishing these patches is not to break compatibility or even fork the language or implementations. It’s more to foster discussion and experimentation. Again, I like Javascript but some of its quirks need fixing. So which one would you fix first?

Recently I’ve been unlucky enough to contract wrist tendonitis in both arms. I guess it was just a matter of time, always working on laptops in awkward positions or on the couch. I’m getting much better now, but the point is that I’ve had to avoid keyboards as much as possible outside work for a while. So I started to read a few papers instead and somehow ran into distributed hash tables (usually abbreviated DHTs). Recovery was slow so I ended up reading quite a few papers, including algorithms, reports, studies, simulations and related optimizations. Got used to reading PDFs somehow. The Wikipedia page is kind of light, staying a bit too abstract, so I thought I’d try to write a more concrete summary. It will help me remember some details too.

So DHTs basically provide the same interface as a normal hash table (put, get) but over a distributed network comprising a potentially very large number of nodes (think millions) and a very, very large number of key / value pairs. Distribution provides reliability (through redundancy) and the ability to store a quantity of data far above anything else. The most common use is peer-to-peer, with implementations like Kad (in eDonkey) or Azureus, but there are a few others like distributed file systems, content distribution or overlays for other applications (like chat). DHTs are very interesting in the sense that they mix distribution, resilience (to churn, mostly) and reliability. A nice mix.

Most of the “classic” DHT algorithms I will talk about here are also optimized for high levels of churn (more on that later). There’s also another class of DHTs, which aren’t strictly speaking DHTs, that are optimized for lookups. Dynamo, the Amazon storage, is one of them. There, node keys aren’t distributed: a node knows about all the others, only values are distributed. It’s a different tradeoff: on the plus side you reduce the number of hops since all nodes are known, on the minus side you can’t handle as many nodes, or you’ll need much more network traffic. That last sentence might not be completely clear yet but hopefully it will make more sense later. Just keep in mind that I’m only going to consider “traditional” DHTs here.

Consistent Hashing

The common ground between all DHTs is their reliance on consistent hashing. Let’s say I come up with a naive DHT implementation based on modulo. We have 8 nodes and 100 keys; to assign a key to a node I just take the key number modulo the node count. So key 18 ends up on node 2 and key 63 on node 7. Simple and effective so far. Until a node leaves or a new node arrives. By modifying the node count, the assignment of almost all the keys changes. All the stored key / value pairs have to be moved around, leading to a fine and inefficient mess. Ugly pipe waste. Consistent hashing solves that problem.
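Just to drive the point home, a five-line experiment shows how bad it gets when going from 8 to 9 nodes:

[js light="true"]
// how many of 100 keys change node when going from 8 to 9 nodes with modulo assignment?
var moved = 0;
for (var key = 0; key < 100; key++) {
  if (key % 8 !== key % 9) moved++;
}
console.log(moved + ' keys out of 100 have to move');   // prints 84
[/js]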

The only drawback of consistent hashing is that you really need a drawing to explain it and I suck at drawing, even on a computer. So I’ve borrowed the following image from this Japanese blog post, which actually looks very interesting.

Consistent Hashing

At the start there’s a hash space, which usually corresponds to the whole space of SHA-1 keys (all the way to 2^160). Keys are placed in this space by calculating their SHA-1 hash (see the cute little HASH! explosion above?). And the space is usually represented as a circle that you follow clockwise, so when you get past the maximum, you wrap back to zero. The secret of consistent hashing is to also give nodes a hash in that same space and to say that key / value pairs get attributed to the node closest to the key, going clockwise again. We’ll see that the notion of “closeness” can change but this one is the easiest to understand.

Say we have two nodes, one at position 50 and another at position 200. If we want to store a key / value pair in the DHT and the key hash is 100, we’ll give it to node 200. Another key hash of 30 would go to the node 50. Makes sense so far? Good. Otherwise there’s a slightly more extensive explanation here.
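Here’s what that attribution rule looks like in code, as a minimal sketch where nodes are just a sorted array of positions on the ring:

[js light="true"]
// minimal consistent hashing lookup: a key goes to the first node at or after its own
// hash on the ring, wrapping around past the maximum
function nodeForKey(keyHash, nodePositions) {   // nodePositions must be sorted
  for (var i = 0; i < nodePositions.length; i++) {
    if (nodePositions[i] >= keyHash) return nodePositions[i];
  }
  return nodePositions[0];                      // wrapped around the circle
}

var nodes = [50, 200];
nodeForKey(100, nodes);   // 200
nodeForKey(30, nodes);    // 50
nodeForKey(220, nodes);   // 50, thanks to the wrap-around
[/js]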

It’s easy to see that consistent hashing is actually much better than my moronic modulo-based attribution. If a node leaves the network, only its own key / value pairs need to be re-attributed. If a node joins the network, only a fraction of its successor’s key / value pairs need to be moved to it. The rest of the hash space stays untouched.

Chord

With that out of the way, let’s start actually talking about DHTs. The canonical modern DHT algorithm is Chord (introduced in 2001 along with three others: CAN, Pastry and Tapestry) and it definitely contributed to popularizing consistent hashing. It’s the one you’ll see mentioned most often, probably because it’s the simplest, in an elegant way. Which makes it interesting to study.

The basic idea is that once you have your nodes on the hash ring, the only thing a node strictly needs is to know its successor. In the picture above, a key lookup starting at node (a), assuming it doesn’t have the key, would just forward the lookup to (b). If (b) doesn’t know about the key either, it forwards to (c) and so on. Sooner or later, you’ll hit a node that knows about your key. The good thing about this method is that it only requires a node to know about a single other node. The bad thing is that, in practice, you usually have enough memory to store more lookup information and enough network bandwidth to maintain more contacts, and that the lookup time with this method is linear in the number n of nodes (so O(n)). It’s also not very resilient to node disappearance.

To improve the lookup, Chord extends that idea by introducing a routing table containing O(log n) entries (at most 160, and that’s with a shitload of nodes). The first node in this table is the closest successor of our node’s position (say p) plus 2. The second element is the closest successor of key (p + 4). The third, of key (p + 8). I think you get my drift. Generally speaking, the element at index i in this table is successor(p + 2^i), starting at i = 1. Obviously some nodes will be duplicated in this table (it’s fairly unlikely you’ll have a node between p + 2 and p + 4) so the routing information to maintain stays decently small. That’s important because maintaining routing information requires constant traffic to make sure known nodes are still alive (otherwise a new node needs to be looked up to replace the missing one). With this scheme, the area “close” to a node on the ring is very well known by that node; the further you go, the sparser its knowledge gets.
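Sticking with that indexing, here’s a sketch of how such a table gets built; successorOf(x) is a hypothetical helper that returns the first node at or after position x on the ring, just like the key attribution earlier:

[js light="true"]
// building a Chord-style finger table (toy integers; successorOf(x) is a hypothetical
// helper returning the first node at or after position x on the ring)
function fingerTable(p, bits) {
  var size = Math.pow(2, bits);
  var fingers = [];
  for (var i = 1; i <= bits; i++) {
    fingers.push(successorOf((p + Math.pow(2, i)) % size));
  }
  return fingers;   // lots of duplicates close to p, sparser and sparser further away
}
// a lookup then forwards to the finger closest to (but not past) the key, roughly halving
// the remaining distance at each hop -- hence the O(log n) hops
[/js]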

So with that whole scheme, it’s proven that a lookup will take, with high probability O(log n) hops. Which is actually pretty good in classic P2P: in a network of one million nodes, you shouldn’t get much more than 20 hops to find a value.

Additionally, to improve reliability and allow replication, Chord also introduces a successor list of size O(log n). It simply contains the successors of a given node. Chord is fairly sensitive to the disappearance of its immediate successors (as the routing table jumps further and further away fairly quickly) so this fixes it: when a close node disappears from the routing table, there are enough backups.

Finally, I should mention that the authors of Chord have created another DHT named Koorde. Koorde is interesting because it allows lookups with only O(log n / log log n) hops, which has been shown to be optimal for a DHT of n nodes with an O(log n) sized routing table. However Koorde doesn’t resist churn as well and, as we’ll see in the next section, that’s an important drawback (and the reason why I won’t detail Koorde).

Churn and Latency

At this point it should be reasonably clear that a major challenge for a DHT implementation in the real world is to withstand high levels of churn. In file sharing applications, for example, it’s pretty common to have nodes pop up in the network, do their thing (like download a couple of songs) and quit after 20 minutes or even less. Under those circumstances, the network never reaches an equilibrium where all routing tables are accurate and all nodes are reachable. It’s in constant evolution, with large numbers of nodes joining and leaving at any given moment. The job of the DHT then becomes being as resilient as possible, avoiding pathological cases (like parts of the network getting totally isolated from the others, forming islands) and trying its best to maintain good connectivity.

There have been several studies comparing DHT implementations and their respective resistance to diverse levels of churn. Chord performs decently well but another DHT consistently comes ahead of the pack: Kademlia.

Another interesting property is having an algorithm that’s conducive to some level of affinity between nodes based on network proximity. For example, during a node lookup, if the routing table tells us that two nodes A and B are both close enough to the key / value pair we target but that contacting B is much quicker, we could prefer contacting B. And there are algorithms, like Vivaldi, that calculate the coordinates of a node in a network based on its distance to other nodes. Unfortunately Chord’s routing table doesn’t allow that type of smart routing, but other DHTs do, like Kademlia again.

For these two reasons, in the second part of this post I’ll detail Kademlia a bit more. It’s by far the most deployed DHT implementation and it definitely deserves some attention.

As a pet project, I’ve implemented an interpreter in Haskell for a small language that I’ve also designed. Before you definitely categorize me as crazy: I’ve done it mostly to learn. So I have no pretension of coming up with the Next Big Language. And I’ve learned quite a bit about Haskell, about programming language design and some more about interpreters, compilers and VMs. It’s actually amazing how much you learn by going a bit closer to the metal.

I’ve dubbed the language Witty. It’s strongly and dynamically typed, functional in style (although neither pure nor lazy), not particularly object-oriented (though something like CLOS could be supported) and somewhat homoiconic, permitting good macro support (Lisp-style and pattern-based). These are the main orientations, particularly the lambdas and macros, but I plan to add quite a few other features along the way. The main influences are, in that order: Scheme, Haskell and Javascript.

Witty is by no means finished; actually it’s closer to barely started. My biggest gripe right now is that it’s not functional enough. But I can already execute a few interesting things like the Fibonacci classic, and enough works to have a very simple Spec-style test framework. There are modules, exceptions, the classic data structures and of course macros and lambdas. It’s all a bit bare but it’s a good start I think. Here are a few code examples (the ^ symbol is equivalent to opening a parenthesis and closing it at the end of the line, a bit like $ in Haskell):

> fact = lambda n ^ if (n == 1) 1 (n * fact (n - 1))
> fact 5
120

> macrox (unless `cond `body `contra?)  ^ if ((!) $cond) $body $contra
"unless"
> unless false (print "awesome!")
awesome!

And here is a small test example:

describe "Map empty" (
it "should return true for an empty hash" (empty? {} == true)
it "should return false for a non-empty hash" (empty? {foo:1, bar:4} == false)
)

An interesting thing is that even == or ^ are implemented as macros. As you can see above, unless is also a macro.

I’ve decided to design my own pet language instead of implementing an existing one both because it was more fun (heh!) and because I had some ideas. There aren’t many dynamically-typed functional languages that support Lisp-style macros. Dylan might be close but I don’t like it for other reasons. My biggest surprise has been how hard the syntax was to get right, even though it’s relatively minimal. I expected it to be hard, but not that hard. I rewrote the grammar something like 6 or 8 times. Comparatively, the interpreter is a piece of cake.

By implementing a non-pure programming language on top of Haskell, I was afraid I would end up banging my head against the wall. And I did, a few times. But once you get past the steepest parts of the learning curve, it’s infinitely rewarding. And for quite a few things it made the implementation much quicker, because Haskell is so feature-rich. For example I added continuations to Witty in just a few evenings (3 I think) by snarfing Haskell’s. I know, I know, continuation support doesn’t often come up as a priority feature, but I wanted to see if it would help implementing exceptions. And in the future, should I want to add software transactional memory or concurrency, it would be similarly easy. But before that, I should probably worry about file access…

I’ll probably write more posts about Witty but for a first one I think that’s enough. All the source is on GitHub for your enjoyment. It’s definitely not ready for anything other than playing around, but I’d be happy if others can learn something from it. You’ll need GHC should you want to build it, and right now the tests are the best documentation.

I’m going to try writing a balanced post about a programming language. Meaning that the words suck, shit, moron and brain dead are prohibited. Enterprisey too. And scale.

I’ve been mucking around with Haskell for some time now, enough to know my monad transformers, functors and function composition. It’s all serious shit. Ooops, sorry. So I won’t lie to you: the learning curve is steeper than for most other languages I’ve learned before, and I already have some decent functional programming skills. The purity of Haskell brings lots of advantages but it definitely requires more brain power than Java (random example). You also write at least half as much code as in Java (another random example); in terms of terseness it’s close to Ruby and can arguably be more expressive.

Now don’t let the apparent complexity put you off, I’m just warning you a bit: you’ll have to stay focused to learn Haskell, but it’s still very approachable and most importantly (spoiler alert) it’s worth it.

The Community

Haskell enjoys a really nice, helpful community. That might explain their unofficial slogan (“avoid success at all costs”). The landscape would probably change a bit if the language became mainstream. The #haskell channel on freenode IRC is pretty amazing for a relatively small community: it’s actually one of the largest around (they’ve beaten a record of some sort recently). You can pretty much always find a nice person to answer your dumb questions, whatever your timezone.

Then there’s the Haskell Cafe mailing-list, with similar qualities. The traffic is moderate, trolling is quasi-inexistent and interesting subjects get discussed daily.

My only reservation in the community section (hey, I have to say at least one negative thing, right?) is that there’s still a very present academic stance. You’ll see a few conversations turn to category theory geekness. A lot of the library documentation actually consists of published papers (PDF warnings). I don’t really find it annoying (except for the PDFs), it’s a nice change from the enterprisey (did it again) world. But YMMV.

Finally there’s a truly awesome book, Real World Haskell, to get started with the language. It may very well become for Haskell what the Pickaxe is for Ruby.

The Implementation(s)

Again, Haskell shines here. The compilers are really top-notch, nothing less. The reference is GHC but there are a few others, and initiatives trying to out-GHC GHC. You can either compile code to an executable or have it interpreted, which makes experimentation much easier. The point is, there’s a strong ecosystem and most of it is production quality. No noodling here, just good software. Really.

And one of the sweetest things I’ve found about Haskell is that the produced executables are pretty damn fast, especially for a high level language. To the point where you have to wonder why you would use anything else when optimizing for performance. I can tell you, the Haskell experience beats the C++ or Java experience hands down.

The Language

Finally, I can fully indulge in my programming language geekery. If you followed the first link I used for Haskell, you know that it’s a pure functional language. That’s not its only important characteristic, far from it (personally I like partial application a lot), but it’s probably the one that had the biggest influence on the rest of the language. One of the consequences is that Haskell relies heavily on monads to model state (and not only that). Explaining monads is beyond the scope of this post; if you ask me, they shouldn’t really be explained anyway, just experimented with. But the consequence is that Haskell is almost two languages in one. You have the “pure way” of programming and the “monadic way” of programming (and please, if this gives you a knee-jerk reaction, read on).

The “pure way” is like a honeymoon: it’s all beautiful, really. You have partial application, function composition, pattern matching, higher order functions, type inference and it’s all boundless. Once you get used to it, it’s like brain candy.

The monadic way is more like 5 years after the honeymoon. Don’t get me wrong, I’ve been happily married for longer than that. It’s just more an age of pragmatism. You have to build on top of all past decisions, work, lifestyle, … And you can’t live in a honeymoon forever, there is other important stuff out there. Like hard disks, networks and users (that’s where the metaphor breaks down). So the “monadic style”, even though it works well and is still terse and pretty sweet, just isn’t as beautiful.

Now you have to realize that my last two paragraphs were purposefully Manichean, I just wanted to give you a feel for the difference. In practice, it’s far from being so black and white. The “monadic style” is more like an embedded DSL (it’s actually just a lot of syntactic sugar); you can still do all the nice stuff like currying or composing monadic functions. Most importantly, it all still feels like Haskell.

Writing a single section about the language is actually slightly painful, I feel it deserves much more. But that will be yours to discover. Believe me, you’ll enjoy the expressive power.

Conclusion

Haskell is now one of my favorite programming languages, sharing the spot with Ruby and Scheme (Javascript is close by). I’m not saying it will fit every purpose better than any other language, all programming languages have their respective sweet spot. But overall, I’m really damn happy to have Haskell with me in my toolbox.

Update: in the community section, I should probably have added that Haskell has a very large ecosystem of libraries called Hackage. Hackage comes with a package manager (because you can never get enough of those) that makes installation rather straightforward. Now you know how to get a Haskell Tetris!

Image creds: martinlabar and arielarielariel.

I’m preparing to fly to New Orleans tomorrow morning (Monday) for ApacheCon US 2008. Lots of fun and schmoozing ahead. If you’re in the area and feel like having a little hack session or a beer, drop me an e-mail.

An interesting paper by Alessandro Warth and Alan Kay, introducing what they call worlds, has been mentioned here and there. It reinforces a recent theory of mine that programming languages should give more visibility into their running environment, something that has been dearly lacking so far.

Side effects are getting back onto the radar of language designers and implementers. They were already kind of a pain in the neck, but we had learned to deal with them by applying some good engineering practices. Namely, lots of test cases. And for some, the benefits of introducing assignment clearly outweighed the pain, which is why until recently functional languages weren’t so hot. But Moore’s law has failed us and here comes the era of gazillion coreness. Suddenly, side effects become a Major Pain In Teh Ass.

Why, you may ask? Well, ever written a significantly multi-threaded application that required some state sharing? You get locks. Lots of locks. And those are at the language level, on every single data mutation you have. Your multicore processor starts waiting, its super effective pipelining rendered ineffective.

One answer to this is purity. Nothing can be changed, it is or it is not. It’s convenient because when there are no side effects, the same piece of code called many times with the same inputs will always produce the same result. As a consequence, where and when it’s called matters less. And you can start parallelizing fairly easily. Just a data point: if you check the programming language shootout (micro-benchmark warning), Haskell is 12th and 21st on single processor architectures (Intel and AMD). On a quad-core it’s in 4th position.

The problem with purity is that side effects are just unavoidable. So you have to get by and find a way to model them. In Haskell you get monads which are slightly unwieldy. In Erlang, there’s message passing and some functions are allowed to mutate state (mostly for I/O).

So far there hasn’t been much effort in the opposite direction: putting a leash on those damn side effects in imperative programming languages. Worlds are an interesting step in that direction. Warth and Kay propose a language level construct to control side effects by introducing isolated but related environments.

Worlds can be “sprouted” from the root environment or from another world. Changes applied in a world are contained. And then a world can be merged back into its parent. So in a sense it’s similar to software transactional memory as already implemented in Haskell (again!) or Clojure (and coming to Scala from what I’ve heard). An example from the paper might help make it clearer:

A = thisWorld;
r = new Rectangle(4, 6);
B = A.sprout();
in B { r.h = 3; }
C = A.sprout();
in C { r.h = 7; }
C.commit();

At the end of this execution, our rectangle will have a height of 7. Without the commit, the rectangle in the main environment would still have a height of 6.

Interesting, isn’t it? Although I’m not quite sure how usable this would be in practice. It seems to me that if this facility were available in a given language (they’ve implemented it in Javascript), most programmers would ignore it entirely when developing libraries.

So worlds are probably not the last word (I know, lame) but I think they’re a step in the right direction. I’m looking forward to more cross-paradigm pollination.
