Tuesday, May 3, 2011

COMET as a justification for using Node.js?

A lot of excitement is circulating around Node.js. To an extent this strikes me as symptomatic of a pattern in internet software development: a flashy new idea comes along, a bunch of leading-edge thinkers get excited and start using and promoting it, and a wave of excitement follows, along with a call to rewrite everything under the sun using the flashy new technology. But is the result worth the investment of rewriting everything? That is, will Node.js be the death of LAMP? Who knows whether the excitement around Node.js will lead to that result, but in researching and evaluating Node.js I've come across a specific use case that directly calls for a server architecture like the one Node.js provides. There's an exciting class of web applications which use asynchronous browser updates to increase interactivity and which, as we'll see in a minute, require an event-driven server architecture that seems well suited to Node.js. Leaving aside the question of a general-purpose Node.js-based server stack, let's look at COMET-style applications.


To start at the beginning we turn to a blog post by Alex Russell titled "Comet: Low Latency Data for the Browser" (it appears you'll have to yahoogle the phrase and then look at the google cache copy of the page). He was co-creator of the Dojo toolkit, co-creator of cometD, and co-author of the Bayeux Protocol, all of which circle around the COMET technique. The post focuses on "low latency data transfer to the browser" where a COMET application "can deliver data to the client at any time, not only in response to user input" where the data is "delivered over a single, previously-opened connection". The model is radically different from the traditional model of a browser opening an HTTP connection, doing a GET or POST, receiving data from the server, then closing the connection.

Because a "previously-opened connection" is maintained between server and browser, servers will see large numbers of open connections. Apache, and other thread-per-connection (or process-per-connection) server architectures, do not scale well with large numbers of connections.

Alex's blog post suggests that a "long lived page" will go "stale", and that if the page maintained a connection to the server, the server could update the page as new content arrives. From his discussion I am imagining scenarios with reader-generated comments or discussion. For example, on twitter.com the Twitter servers notify you as new tweets arrive. Facebook.com will nowadays notify you of comments while you're doing other things. You can also imagine a blog commenting system which dynamically updates the comment thread on the page as readers add comments. The Disqus commenting system almost implements that idea.

Alex suggested "New server software is often required to make applications built using Comet scale" because of the open connection the server must maintain for each browser (client). He went on to suggest "event-driven IO on the server side" as the solution, and named some possible options. Apache was supposed to "provide a Comet-ready worker module in the upcoming 2.2 release" and he named "tools like Twisted, POE, Nevow, mod_pubsub, and other higher-level event-driven IO abstractions". Modern operating systems all now "support some sort of kernel-level event-driven IO system", and in Java the NIO layer is a good basis for event-driven IO, which has in turn led to implementations in Java appservers like Tomcat.

Another 2006 blog post, this time by Andi Egloff and titled "Comet Basics", discusses Alex's earlier blog post. He described AJAX as "the ability to do 'invisible' HTTP requests behind the scenes via javascript" and Comet as "primarily an extension or variation on how the communication over HTTP is done".

He asks us to imagine an AJAX request that starts with an HTTP GET. The data might not be available yet, and rather than just closing the connection the server leaves it open and sends back the HTTP response when the data is available. It then keeps the HTTP connection open, and as further data arrives the server sends that data as well.

Essentially it allows to "push" or "stream" data to the web client via standard HTTP GETs instead of for example polling for updates at regular intervals.

Responding to the obvious question of whether all these open connections scale, he writes:

"Done right and used right this can be a very efficient way to send events to web clients; in fact, it can save a lot of unnecessary “polling” requests. Not only can it be more efficient, updates will get to the clients quicker (lower latency) than when polling."

Maintaining an open connection means keeping a socket open and keeping some data in the server. Threads or processes per connection are not a requirement, but rather an unfortunate architecture choice of some servers. In his words, what "you do *not* want to happen is for the server to block for example a thread per request when it waits for data to arrive – that would be Comet done in a fashion that will not scale well."

Node.js as THE answer?

The way I read the above background is that they were explicitly calling for a server architecture which, fortuitously, Node.js implements. I don't know if Ryan Dahl had COMET in mind, but his creation fits the bill to a T.

The Node.js architecture is an asynchronous, event-driven programming model where callback functions are dispatched as events propagate through the system. The events can be I/O such as network traffic, or can come from other sources, because any EventEmitter object can emit events.

Node.js is implemented on top of a fast JavaScript execution engine (V8), and the JavaScript language is well suited to writing event-driven, callback-oriented applications. In Node.js, callbacks are typically closures that you register as listeners on EventEmitter objects.
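To make the listener idea concrete, here is a minimal sketch using the EventEmitter class from Node's `events` module. The `thread` emitter, the `'comment'` event name, the reader names, and the `makeListener` helper are all made up for illustration; they simply stand in for something like a comment thread notifying its readers.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical event source standing in for a comment thread.
var thread = new EventEmitter();

// Collected here so the effect of each callback is easy to inspect.
var received = [];

// Returns a closure: each listener keeps its own private `count`.
function makeListener(reader) {
  var count = 0;
  return function (text) {
    count += 1;
    received.push(reader + ' saw comment ' + count + ': ' + text);
  };
}

// Register two independent listeners for the same event.
thread.on('comment', makeListener('alice'));
thread.on('comment', makeListener('bob'));

// Each emit dispatches to every registered listener, in registration order.
thread.emit('comment', 'first!');
thread.emit('comment', 'nice post');

console.log(received.join('\n'));
```

Each listener remembers its own count between events without any shared global state, which is the property that makes closures convenient for callback-heavy code.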

The result is a platform that makes it really easy to implement event driven server applications (even clients).
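As a rough illustration of how the COMET pattern described earlier might map onto Node.js, here is a small sketch built on the standard `http` module. The `/events` path and the `park`, `broadcast`, and `createCometServer` names are assumptions invented for this example, not part of any COMET library. Each browser connection is just a response object held in an array, with no thread or process parked behind it.

```javascript
var http = require('http');

// Hypothetical in-memory list of responses whose connections are held open.
var waiting = [];

// Send the headers now, keep the connection open, and remember the response.
function park(res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  waiting.push(res);
  // Forget the response once the client goes away.
  res.on('close', function () {
    var i = waiting.indexOf(res);
    if (i !== -1) waiting.splice(i, 1);
  });
}

// Push one chunk to every open connection without closing any of them.
// Returns the number of connections written to.
function broadcast(message) {
  waiting.forEach(function (res) {
    res.write(message + '\n');
  });
  return waiting.length;
}

// GET /events parks the connection; anything else gets a 404.
function createCometServer() {
  return http.createServer(function (req, res) {
    if (req.method === 'GET' && req.url === '/events') {
      park(res);
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}
```

To try it, one could call `createCometServer().listen(8080)` (the port is arbitrary), call `broadcast('tick')` from a `setInterval` timer, and point a few `curl -N` sessions at `/events`. A real application would still need reconnection handling, per-client filtering, and some message format such as the Bayeux protocol mentioned above.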

A slide show

The following is a two-year-old slide deck going over this territory.
