Friday, May 27, 2011

What might the excitement about Node.js be about? JavaScript on the server? Events? Or, what?

There's a lot of excitement brewing around Node.js, with a rapidly growing community of people and projects surrounding the platform, and at least five books either published or on their way to being published. I myself am working in a team, at a company I cannot yet name, on a large Node.js project that could be very significant. The team is excited about the platform but is also keeping a close eye on the question of whether it makes sense for the company. I thought it would be interesting to run through some attributes of Node.js and ponder what might or might not contribute to the excitement about Node.js and its eventual success. I'm also interested in what others think, so consider leaving your thoughts below.

JavaScript as an excellent language: I'm pretty new to JavaScript and still haven't fully grok'd it as an "excellent language". It certainly has some advanced features. However I spent over 10 years writing Java and think it's an excellent language as well. They're both excellent in different ways.

JavaScript is certainly light years better than PHP and Perl.

However, one thing that worries me about Node.js is that it's a single-language execution environment: JavaScript only. Other languages have their value, but Node.js doesn't allow one to write anything but JavaScript code. The Java ecosystem allows multiple languages on top of the JVM, so long as someone can write a compiler from the language to Java bytecode. If Node.js had been implemented on top of Java (don't laugh, Java has event driven I/O and could be the basis for the Node.js model) then it would have been able to use multiple languages for free. PHP and Java are particularly important server-side languages, and because Node.js is JavaScript only it carries with it an unspoken question of whether one should rewrite every application in sight in Node.js. If it's so much better, then why should "legacy" applications remain written in Perl or PHP or Python or whatever? Well, what about the existing investment in writing and testing those applications?

JavaScript on the server (freed from the browser): Hurrah, JavaScript has been freed from the browser and can now be applied to other tasks, one of which is server-side programming with Node.js. This is a great thing. But ...

Same language on browser and server: This is one thing we gain from freeing JavaScript from being chained to web browsers. An application can have browser and server components in the same language, raising the potential of dynamically shifting code from browser to server and back depending on architectural decisions, and the front-end and back-end teams speak the same language. Lots of goodness can come from this. But my management isn't so thrilled by this attribute and doesn't see it as a big selling point.

Event driven I/O: As I wrote earlier (see: COMET as a justification for using Node.js?), Node.js seems perfectly architected to solve a problem we expect to become more prevalent: long-running connections between the web server and every client browser that has touched the server recently. Thread-per-connection systems don't scale well for this usage scenario; non-threaded event driven systems do.

This architectural decision contributes to the high performance numbers claimed for it.

It's also a simpler programming model than highly threaded thread-per-connection systems.

Closures: This is part of the "JavaScript is an excellent language" argument. It is a specific feature the Java community has been clamoring for over the years, with no success in getting it into the Java language. If you want closures on the Java platform, however, Groovy is an excellent language. JavaScript comes with closures as lightweight anonymous functions. It's really nice to write a callback function and not worry about remembering the correct class template to implement as a wrapper around a non-anonymous method, as you have to do in Java. The equivalent to a closure in Java requires a lot more typing as well as implementation overhead. Way cool.
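
To make the closure point concrete, here is a minimal sketch. The names makeCounter and onRequest are made up for illustration and don't come from any library. The inner function keeps access to count and label after makeCounter has returned, and no wrapper class is needed.

```javascript
// Hypothetical example: makeCounter and onRequest are illustrative names.
function makeCounter(label) {
  let count = 0; // captured by the closure below

  // The returned anonymous function "closes over" label and count.
  return function () {
    count += 1;
    return label + ' #' + count;
  };
}

const onRequest = makeCounter('request');

// Pass a small anonymous callback, much as you would to an event API.
const results = [1, 2, 3].map(() => onRequest());
console.log(results);
```

Each call to makeCounter gets its own private count, which is roughly what an anonymous inner class with a field would give you in Java.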

Loosely defined objects: This I think is a mixed blessing. Yes, it's way convenient to be real loosey-goosey with your objects. For example, just check if an object has a function named suchAndSo to give you a clue whether you can use the object for something or other. But I think this is a bit of a false panacea and can lead you to hairball systems as complexity grows.
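
The "check if it has a function named suchAndSo" idea looks something like the sketch below. The helper name canSuchAndSo and the sample objects are hypothetical, made up only to illustrate the check.

```javascript
// Hypothetical helper: true when obj has a callable member named suchAndSo.
function canSuchAndSo(obj) {
  return obj != null && typeof obj.suchAndSo === 'function';
}

const duck = { suchAndSo() { return 'quack'; } };
const impostor = { suchAndSo: 'not a function' };

console.log(canSuchAndSo(duck));     // has the function
console.log(canSuchAndSo(impostor)); // has the name, but it is not callable
console.log(canSuchAndSo(null));     // nothing to check
```

Note that the check only tells you a function with that name exists, not what it does or what arguments it expects, which is part of why this style can get hairy as a system grows.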

There's been an argument circulating around Java for years along these lines: that because Java is so rigid it is inconvenient to program in, and that the rigidity hurts programmer productivity. I'm not convinced, especially when you consider all the time saved because the Java compiler knows a heck of a lot about each object and can directly tell you about misuse of methods and improper type combinations, and because IDEs know exactly what you're doing and can provide a drop down list of all possible method completions as you're typing code.

On the other hand there are plenty of times you need to write a lightweight class whose scope is no further than the few lines of code surrounding it. In JavaScript objects can be quickly defined in passing while you're writing other code. Further, you have anonymous objects that are just fields and functions, with a nice initialization syntax.
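
As a small illustration of defining an object in passing, here is a sketch using JavaScript's object literal syntax. The object and its field names are hypothetical.

```javascript
// Hypothetical throwaway object: fields and a function, defined inline
// without declaring a class first.
const point = {
  x: 3,
  y: 4,
  distanceFromOrigin() {
    return Math.sqrt(this.x * this.x + this.y * this.y);
  },
};

console.log(point.distanceFromOrigin()); // 5

// Nothing stops later code from bolting on more fields.
point.label = 'corner';
console.log(Object.keys(point));
```

The last two lines show the flip side: there is no declaration anywhere that says what a "point" is supposed to contain.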

I don't think there's a clear winner between rigid type systems like Java's and loosey-goosey ones like JavaScript's. They each have their place. I worry about the complexity as systems grow and whether loosey-goosey works in huge complex systems.

These are a few thoughts. I'm interested in hearing what others think.

Tuesday, May 3, 2011

COMET as a justification for using Node.js?

A lot of excitement is circulating around Node.js. To an extent this strikes me as symptomatic of a pattern in internet software development: a flashy new idea comes along, a bunch of leading edge thinkers get excited and start using and promoting it, there's a wave of excitement, and then a call to rewrite everything under the sun using this new bit of flashy technology. But is the investment of rewriting everything using that new piece of technology worth the result? That is, will Node.js be the death of LAMP? Who knows whether the excitement around Node.js will lead to that result, but in researching and evaluating Node.js I've come across a specific use case that directly calls for a server architecture identical to that of Node.js. There's an exciting class of web applications which use asynchronous browser updates to increase interactivity and which, as we'll see in a minute, require an event driven server architecture that seems well suited to Node.js. Leaving aside the question of a general purpose Node.js based server stack, let's look at COMET style applications.

Background

To start at the beginning we turn to a blog post by Alex Russell titled "Comet: Low Latency Data for the Browser" (it appears you'll have to yahoogle the phrase and then look at the google cache copy of the page). He was co-creator of the Dojo toolkit, co-creator of cometD, and co-author of the Bayeux Protocol, all of which circle around the COMET protocol. The post focuses on "low latency data transfer to the browser" where a COMET application "can deliver data to the client at any time, not only in response to user input" and where the data is "delivered over a single, previously-opened connection". The model is radically different from the traditional model of a browser opening an HTTP connection, doing a GET or POST, receiving data from the server, then closing the connection.

Because a "previously-opened connection" is maintained between server and browser it means servers will see large numbers of open connections. Apache, and other thread-per-connection server architectures, do not scale well with large numbers of connections.

Alex's blog post suggests that a "long lived page" will go "stale" and that if the page were to maintain a connection to the server, the server could update the page as new content arrives. From his discussion I am imagining scenarios with reader generated comments or discussion. For example on twitter.com, as new tweets arrive the Twitter servers notify you of them. Likewise, facebook.com will nowadays notify you of comments while you're doing other things. Or you can imagine a blog commenting system which dynamically updates the comment thread on the page as readers add comments. The disqus commenting system almost implements that idea.

Alex suggested "New server software is often required to make applications built using Comet scale" because of the maintained open connection to the server for each browser (client). He went on to suggest "event-driven IO on the server side" as the solution, and named off some possible solutions. Apache was supposed to "provide a Comet-ready worker module in the upcoming 2.2 release" and he named "tools like Twisted, POE, Nevow, mod_pubsub, and other higher-level event-driven IO abstractions". Modern OS's all now "support some sort of kernel-level event-driven IO system" and in Java the NIO layer is a good basis for event driven IO, which has in turn led to implementations in Java appservers like Tomcat.

Another 2006 blog post, this time by Andi Egloff titled Comet Basics, discusses Alex's earlier blog post. He described AJAX as "the ability to do 'invisible HTTP requests behind the scenes via javascript" and Comet as "primarily an extension or variation on how the communication over HTTP is done".

He asks us to imagine an AJAX request that starts with an HTTP GET. The data might not be available, and rather than just close the connection the server leaves it open and then sends back the HTTP response when the data is available. Then, it leaves the HTTP connection open and as further data arrives the server sends that data as well.

Essentially it allows to "push" or "stream" data to the web client via standard HTTP GETs instead of for example polling for updates at regular intervals.

Responding to the obvious question of whether all these open connections scale, he writes

Done right and used right this can be a very efficient way to send events to web clients; in fact, it can save a lot of unnecessary “polling” requests. Not only can it be more efficient, updates will get to the clients quicker (lower latency) than when polling.

Maintaining an open connection means keeping a socket open, and keeping some data in the server. Threads or processes per connection are not a requirement, but more an unfortunate architecture choice of some servers. As he puts it, what "you do *not* want to happen is for the server to block for example a thread per request when it waits for data to arrive – that would be Comet done in a fashion that will not scale well."
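
Here is a rough sketch of what "keeping a socket open and some data in the server" might look like in Node.js, using only the built-in http module. This is my own illustration, not code from either blog post. The names park, publish and createCometServer, and the /poll and /publish paths, are all made up for the example.

```javascript
const http = require('http');

// Responses parked here until data arrives. No thread is blocked for any
// of them; each costs one open socket and one array entry.
const waiting = [];

// Remember a client's response object instead of answering right away.
function park(res) {
  waiting.push(res);
}

// Send one message to every parked client, completing their requests.
function publish(message) {
  const delivered = waiting.length;
  while (waiting.length > 0) {
    const res = waiting.shift();
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(message);
  }
  return delivered;
}

// Build (but do not start) a server wired to the two functions above.
// The /poll and /publish paths are hypothetical.
function createCometServer() {
  return http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/poll') {
      park(res);
      // Forget the client if it disconnects before any data shows up.
      res.on('close', () => {
        const i = waiting.indexOf(res);
        if (i !== -1) waiting.splice(i, 1);
      });
    } else if (req.method === 'POST' && req.url === '/publish') {
      let body = '';
      req.on('data', (chunk) => { body += chunk; });
      req.on('end', () => {
        const n = publish(body);
        res.end('delivered to ' + n + ' client(s)\n');
      });
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}
```

To try it, call createCometServer().listen(8080), using whatever port you like. This version answers each waiting request once and lets the browser poll again, which is the long-polling variant; a streaming variant would call res.write and keep the response open instead of calling res.end.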

Node.js as THE answer?

The way I read the above background is that they were explicitly calling for a server architecture which, fortuitously, Node.js implements. I don't know if Ryan Dahl had COMET in mind, but his creation fits the bill to a T.

The Node.js architecture is an asynchronous event driven programming model where callback functions are dispatched as events propagate through the system. The events can be I/O such as network traffic, or could be driven from other sources, because events are created by any EventEmitter object.
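
A minimal sketch of that model, using the EventEmitter class from Node's built-in events module. The event name 'comment' and the variable names are hypothetical, chosen to echo the blog-comment scenario above.

```javascript
const EventEmitter = require('events');

// Any EventEmitter can be a source of events, not only sockets and files.
const thread = new EventEmitter();
const seen = [];

// Register a callback (a closure over `seen`) as a listener.
thread.on('comment', (author, text) => {
  seen.push(author + ': ' + text);
});

// Each emit dispatches to the registered listeners.
thread.emit('comment', 'alice', 'first!');
thread.emit('comment', 'bob', 'nice post');

console.log(seen);
```

In a real server the emit calls would be triggered by network traffic rather than written out by hand, but the listener side looks the same.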

Node.js is implemented on top of a fast JavaScript execution engine (V8) and the JavaScript language is especially geared to writing event driven callback oriented applications. In Node.js callback functions are implemented with closure functions you provide as a listener to EventEmitter objects.

The result is a platform that makes it really easy to implement event driven server applications (even clients).

A slide show

The following is a two-year-old slide deck going over this territory.

Monday, May 2, 2011

Node.js: JavaScript on the Server - Ryan Dahl's original presentation at Google

The following is the original presentation by Ryan Dahl showing the ideas behind Node.js and some of the performance results which have wow'd people.

Introduction to node.js and JavaScript Services on webOS

The following is a presentation about the use of Node.js on webOS devices (originally Palm, now HP). Node is meant for server side JavaScript, but webOS is for client devices.

This session covers basics of JavaScript services, including service interfaces, service lifecycle and a basic service example. Advanced topics include debugging, application packaging, and more node.js topics such as web services and file I/O. Learn how and when services should be used with their application, how services are packaged and distributed, and how node.js runs on webOS.

Drupal + node.js module demo

Here's a little video demo'ing the Node.js integration module for Drupal. The module is for Drupal 7 only, and "It provides an API that other modules can use to add realtime capabilities to Drupal."

The demo shows triggers and actions which can distribute messages through a Node.js based service which pops up on all web browsers connected to a Drupal site. They indicate a future direction of implementing a chatroom on top of this.

Fargo: a Scheme for Node.js? Node.js supports only one language!

The guys who developed Node.js implemented it on top of a virtual machine which supports only one programming language: JavaScript. If they'd wanted us to use multiple programming languages they could have implemented Node.js on top of the Java/Hotspot VM and its rich and mature support for multiple languages. But they didn't. There are a couple of examples of "other" languages being used to write Node.js programs, such as CoffeeScript. A new example of this is Fargo. The developer has this to say about it: "Fargo is a programming language that runs on Node.js. It's designed to ease asynchronous functional programming by providing features missing in JavaScript, namely tail recursion and some form of continuations. It is still an experiment and a toy."

The idea is to run code like the following on top of Node.js, using a layer which makes Node.js behave like a not-quite-Scheme system:

(define square (lambda (x) (* x x)))
(puts (map square '(1 2 3 4)))
 
(puts (let ((x 1)
            (y 2)
            (z 3)
            (h 7))
        (+ (+ x y) z)))

I don't quite get the purpose of having all those parentheses, and for that matter I was never able to figure out LISP or Scheme.
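
For comparison, here is my rough JavaScript rendering of the Scheme sample above. It is a reading of what the sample computes, not anything produced by Fargo, and the variable names squares and sum are mine.

```javascript
// (define square (lambda (x) (* x x)))
const square = (x) => x * x;

// (puts (map square '(1 2 3 4)))
const squares = [1, 2, 3, 4].map(square);
console.log(squares); // [ 1, 4, 9, 16 ]

// (puts (let ((x 1) (y 2) (z 3) (h 7)) (+ (+ x y) z)))
const sum = (() => {
  const x = 1, y = 2, z = 3, h = 7; // h is unused, as in the original
  return (x + y) + z;
})();
console.log(sum); // 6
```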

The developer says: "The main reason for Fargo's existence at present is to add fibers to the Node environment to make async programming easier. Fibers are a lightweight form of continuations that allow blocks of code to be suspended and resumed by the user. Many Ruby programmers are using fibers to let them write non-blocking code with blocking-style syntax." And, I can say after having written a few Node.js programs that the asynchronous programming model is a hurdle. It's desirable to have something to lower the hurdle a bit.

But, all these parentheses?

The idea of "When a fiber is running, you can use the yield function which suspends the fiber and returns the yielded value as the result of the fiber's invokation" is attractive, but does it have to be implemented by a non-JavaScript language?
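
For what it's worth, JavaScript itself later gained something in this spirit: generator functions, standardized in ECMAScript 2015, well after this post was written. They are not the same thing as Fargo's fibers, but the sketch below shows the suspend-and-resume idea in plain JavaScript. The function and variable names are hypothetical.

```javascript
// Hypothetical example: a generator pauses at each yield and hands the
// yielded value back to whoever called next().
function* addTwoInputs() {
  const a = yield 'waiting for a';
  const b = yield 'waiting for b';
  return a + b;
}

const fiber = addTwoInputs();
const first = fiber.next();   // runs to the first yield
const second = fiber.next(1); // resumes with a = 1, pauses again
const last = fiber.next(2);   // resumes with b = 2, runs to the return

console.log(first.value, second.value, last.value, last.done);
```

Unlike a true fiber, a generator can only pause from inside its own body, not from a function it calls, so this is a partial answer at best.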

Just some thoughts...