Saturday, April 6, 2013

Findings while looking into implementing simultaneous multi-user editing on top of Node.js

I'm pondering developing something to support multiple people editing the same document at the same time.  I thought I'd seen a demo using something like backbone.js, socket.io, or websockets.  But that was months ago and I can't find it now.  After some searching I've come up with some interesting pointers, but nothing that provides a solid starting point.  Instead I found warnings that this is HARD stuff - as in CompSciHard - including one toolkit written by an ex-Google-Wave engineer who said it took them 2 years to write Wave, and that it would take almost as long to reimplement because it's such a hard thing to do.

Having used Google Docs or Wave and done a bit of multi-user editing of documents, the task seems so simple from the user's side.  However, after reading up on some of the available libraries, I begin to see why it's hard.  The server has to maintain an object for each document, and between clients and server there must be a protocol for communicating changes to the document.  Because there are multiple clients, each could be trying to change the document at the same time, so the object model has to reconcile where each edit occurred and decide which edits win when changes overlap.  Additionally there is the task of notifying all clients of all edits, as they happen, in a way that prevents collisions and confusion.
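To make that concrete, here is a minimal sketch of the server-side bookkeeping just described, in plain Node.js.  The names SharedDoc, submit and baseVersion are hypothetical and are not taken from any of the libraries discussed in this post.  It uses the crudest possible conflict policy, rejecting any edit that was made against a stale version, which is exactly the limitation the real libraries work hard to avoid.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical server-side object for one shared document.
function SharedDoc(text) {
    this.text = text;
    this.version = 0;                  // bumped on every accepted edit
    this.events = new EventEmitter();  // clients subscribe to 'change'
}

// An edit is { baseVersion, pos, text }: insert `text` at `pos`,
// made by a client that last saw the document at `baseVersion`.
SharedDoc.prototype.submit = function (edit) {
    if (edit.baseVersion !== this.version) {
        return false;  // stale edit: the client must refetch and retry
    }
    this.text = this.text.slice(0, edit.pos) + edit.text + this.text.slice(edit.pos);
    this.version += 1;
    // Notify every subscribed client of the accepted change.
    this.events.emit('change', { version: this.version, pos: edit.pos, text: edit.text });
    return true;
};

var doc = new SharedDoc('abc');
var notified = [];
doc.events.on('change', function (change) { notified.push(change.version); });

var first = doc.submit({ baseVersion: 0, pos: 3, text: 'd' });   // accepted
var second = doc.submit({ baseVersion: 0, pos: 0, text: 'X' });  // rejected, stale
```

The second client loses its edit simply because it was a moment late, which is why "just use version numbers" is not good enough for live editing.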

The rest of this is probably TL;DR .. so the short story is ...
  • There are HARD problems here (as in CompSciHard)
  • There isn't a simple library to just pick up and bolt into an application
  • The leading library (DerbyJS) is entering a phase of massive rewrite
  • The other leading library (Meteor) is widely regarded as interesting, but incompatible with the Node.js ecosystem, so I'm ignoring that project

Let's get on with the things I found:-

Getting started with Meteor and Derby on your own server:

Did I say that the server side would be written with Node.js?  I must have missed that, however the title of this blog should be a giveaway.  The two principal toolkits I found were Meteor and DerbyJS.  That blog post goes over setting up and kicking the tires of both.

I was able to quickly dismiss Meteor from consideration despite it having a fairly active community behind it.  Why?  While it runs on Node.js, they implement the thing with Fibers and are eschewing the asynchronous aspect of Node.js.  The Meteor site even has a long discussion saying that Meteor is better because synchronous, in-line code is much easier to read than asynchronous code with a zillion callbacks.

Specifically:
Meteor gathers all your JavaScript files, excluding anything under the client and public subdirectories, and loads them into a Node.js server instance inside a fiber.
And:
In Meteor, your server code runs in a single thread per request, not in the asynchronous callback style typical of Node. We find the linear execution model a better fit for the typical server code in a Meteor application.

Granted there is a point to that line of reasoning, but if that's how they feel about things then why are they implementing on Node.js?  Sorry .. but this platform is about asynchronous code.  Further one of the things NOT implemented for Node.js 0.10.x was anything in the vicinity of Isolates or Fibers.  While fiber may be an important part of a healthy diet, the Node.js community seems to be shunning fibers as a programming model.

http://derbyjs.com/ is immediately more compatible with the Node.js environment because it can be hosted on top of Express.  Cool.

The Derby project has this to say about itself:
"Derby eliminates the tedium of wiring together a server, server templating engine, CSS compiler, script packager, minifier, client MVC framework, client JavaScript library, client templating and/or bindings engine, client history library, realtime transport, ORM, and database. It eliminates the complexity of keeping state synchronized among models and views, clients and servers, multiple windows, multiple users, and models and databases.
At the same time, it plays well with others. Derby is built on top of popular libraries, including Node.js, Express, Socket.IO, Browserify, Stylus, LESS, UglifyJS, MongoDB, and soon other popular databases and datastores. These libraries can also be used directly. The data synchronization layer, Racer, can be used separately. Other client libraries, such as jQuery, and other Node.js modules from npm work just as well along with Derby."
The blog post linked above does a walk-through of running an example DerbyJS application.  However the overall state of DerbyJS examples is really poor.  There is a github repository of them at https://github.com/codeparty/derby-examples but the examples weren't terribly useful. 

I did find a very useful blog post:-  Derby.js – Working with Views, Models, and Bindings  This gave enough insights into how Derby models and Views worked to get started writing some code.

The library lets you write a simple model description, write a simple view, and so long as you follow certain conventions it wires everything up for you without requiring additional coding. 

Then, browsing through the DerbyJS Google Group I found this question:- Can you make Google Docs like functionality with derby?  Essentially what I wanted to develop was an extremely simplified "Google Docs functionality."  The answer?  The DerbyJS team replied saying
Not at the moment, but we're reworking the core so it can in the future.
And
For now, you should check out ShareJS if you want to implement collaborative text editing: http://sharejs.org/
But before I get into that, I do want to mention this thing that I found:  https://github.com/addyosmani/todomvc  It's a group project to implement the same example application in multiple frameworks - specifically, a simple TODO application.  They've developed examples for 2-3 dozen frameworks and it's a quite useful starting point for understanding.  The DerbyJS example is not in the main set of examples so go hunting for it.

https://github.com/josephg/ShareJS bills itself as supplying collaborative editing in any application.  It is directly the sort of thing required for this project, and the API looks clean and simple to use.  The example server is a little convoluted to follow but I believe that's just a matter of a few hours of playing with the code to see how it ticks.

The ShareJS website starts with this question:
You’re writing a web app. Your app contains data that users edit. Your users should be able to user your app from multiple computers if they need to. Sometimes you want multiple users to view & edit the same data.  How do you make that work, without the data going out of sync and without losing anything?
He claims the answer is Operational Transformation - describing that as
OT is a class of algorithms that do multi-site realtime concurrency. OT is like realtime git. It works with any amount of lag (from zero to an extended holiday). It lets users make live, concurrent edits with low bandwidth. OT gives you eventual consistency between multiple users without retries, without errors and without any data being overwritten.
And:
Unfortunately, implementing OT sucks. There's a million algorithms with different tradeoffs, mostly trapped in academic papers.
And:
I am an ex Google Wave engineer. Wave took 2 years to write and if we rewrote it today, it would take almost as long to write a second time. (What??)
At this point I was thinking - okay, just how big of a task have I bitten off for myself?  He does describe ShareJS as a small/simple server written in 4k lines of CoffeeScript.  And the demo code looks okay, though as I said it'll take some time to really grok.
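To get a feel for what Operational Transformation means, here is a toy sketch I put together that handles only concurrent text inserts.  It is not ShareJS's algorithm or API; the names applyInsert, transformInsert and the site field are hypothetical.  The idea is that each site applies its own edit immediately, then transforms the other site's edit (shifting its position) before applying it, so both sites end up with the same text.

```javascript
// An insert operation is { pos, text, site }, where `site` is a
// hypothetical numeric client id used only to break ties.

function applyInsert(doc, op) {
    return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrite `op` so it can be applied AFTER `other` has already been applied.
function transformInsert(op, other) {
    // If the other insert landed before ours (or at the same position and
    // its site id is lower), our position shifts right by its length.
    var shift = other.pos < op.pos ||
        (other.pos === op.pos && other.site < op.site);
    return {
        pos: shift ? op.pos + other.text.length : op.pos,
        text: op.text,
        site: op.site
    };
}

var doc = 'hello world';
var a = { pos: 0, text: 'A', site: 1 };  // client 1 types at the start
var b = { pos: 5, text: 'B', site: 2 };  // client 2 types after "hello"

// Client 1 applies its own edit first, then the transformed remote edit.
var viaA = applyInsert(applyInsert(doc, a), transformInsert(b, a));
// Client 2 does the same in the opposite order.
var viaB = applyInsert(applyInsert(doc, b), transformInsert(a, b));
```

Both orders give the same document.  Real OT also has to handle deletes, overlapping ranges, and more than two concurrent edits, which is where the years of work go.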

But - I noticed a blog, and started reading the author's blog, finding this one:-  It's time to rewrite ShareJS!  To cut to the chase - the guy was hired by Lever, the team behind DerbyJS, and they want to rewrite DerbyJS and Racer to base it on some stuff in ShareJS.  Along the way they want to do a massive rewrite of ShareJS.

Soooo... while ShareJS looks like a cool system directly useful to this project, I can't afford to build the application on a toolkit that is about to change incompatibly underneath it.

FWIW the Derby team wrote about hiring Joseph Gentle here:- Getting Derby ready for prime time.  They have a lot of really nice things to say about this and they portray an ambition for the combination of DerbyJS, Racer and ShareJS to be a powerful stack for developing real time web applications.  However, it's also clear that they'll be in a period of massive change to the three libraries.

Etherpad (http://etherpad.org/ and https://github.com/ether/etherpad-lite) is already an implementation of this idea.  It's even written in Node.js.  But it's a little difficult for me to get my head around it in the timeframe I have to study it.  And it doesn't seem to be written so I could extract some pieces/parts to build a much simpler application.

Next-generation JavaScript frameworks (https://gist.github.com/clarle/3396225) is the starting point for an overview of several frameworks ...

Rant: Backbone, Angular, Meteor, Derby (https://gist.github.com/lefnire/4454814) is another comparison piece ... makes some of the points I made above about Meteor, says great things about Derby and Backbone


Monday, March 25, 2013

Managing a Node.js server process with forever and an LSB-style init script for Debian

Want to run a Node.js process in the background with a good assurance it'll stay running?  Forever (https://github.com/nodejitsu/forever) provides some level of management and monitoring that a process you put in the background will keep running, and be restarted if it crashes.  Additionally you're likely to want to have the process (forever plus your server) started automagically when the system boots up.

This shell script is a not-quite-LSB-compliant init script that works on Debian (if you ignore a warning that's printed when you install it) and should demonstrate a little about using Forever.

#! /bin/sh
# Init script that uses forever to manage a Node.js server process
set -e
PATH=/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin
DAEMON=/usr/local/notes/app.js
case "$1" in
  start) forever start "$DAEMON" ;;
  stop)  forever stop  "$DAEMON" ;;
  force-reload|restart)
     forever restart "$DAEMON" ;;
  *) echo "Usage: /etc/init.d/node {start|stop|restart|force-reload}"
     exit 1
     ;;
esac
exit 0

A brief look at Sequelize, an ORM for Node.js with MySQL, PostgreSQL or SQLite3

Want to do some database code but not think too much about it?  Such as, avoid SQL?  You can have your SQL and a simplified model of your database thanks to a module I just found for Node.js called Sequelize.  It adds an ORM-like layer on top of MySQL, PostgreSQL or SQLite3, allowing you to do database interactions using JavaScript code rather than SQL.  It's fairly nice and easy to use, however I think it's likely there are some limits to the complexity of what you can do with Sequelize.

Installation:  npm install sequelize

Basic usage:

var Sequelize = require("sequelize");
var sequelize = new Sequelize('databaseName', 'username', 'password', {
    host: "my.server.tld",
    dialect: 'mysql'
});

This creates a connection to a MySQL database on the named host.  The parameters object is used to tailor what kind of database to connect with, and accepts a long list of other options.  Sequelize was originally written for MySQL; the PostgreSQL and SQLite3 support is more recent.

Next, you create a table definition this way:-

var User = sequelize.define('User', {
    id: Sequelize.INTEGER,
    username: Sequelize.STRING,
    password: Sequelize.STRING,
    email: Sequelize.STRING
});
// sync() creates the table if it does not already exist
User.sync().success(function() {
    // ... success code
}).error(function(err) {
    // ... error code
});

Under the covers this causes SQL to be generated and executed, and with the right options it gets printed on the console.  The User.sync() call is what forces this SQL to execute.

CREATE TABLE IF NOT EXISTS `Users` (
    `id` INTEGER NOT NULL auto_increment ,
    `username` VARCHAR(255),
    `password` VARCHAR(255),
    `email` VARCHAR(255),
    `createdAt` DATETIME NOT NULL,
    `updatedAt` DATETIME NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB;

The createdAt and updatedAt columns are generated by Sequelize for bookkeeping.

Writing a record to the database is trivial and straight-forward:-

User.create({
    id: id,
    username: username,
    password: password,
    email: email
}).success(function(user) {
    callback();
}).error(function(err) {
    callback(err);
});

Likewise, to find a record:-

User.find({ where: { id: id } }).success(function(user) {
    if (!user) {
        callback('User ' + id + ' does not exist');
    } else {
        callback(null, {
            id: user.id, username: user.username, password: user.password, email: user.email
        });
    }
}).error(function(err) {
    // Report database errors too, so the caller is never left waiting
    callback(err);
});

The update and delete functions are likewise simple and straightforward.

The table definitions can optionally attach any of a long list of validation parameters to each column definition.  This includes the obvious validations like "is it a number" but includes things like "IP Address" or "Credit Card Number."  The validations will help to ensure the data is consistent.

It does support something like joins to connect entries in one table to entries in another table.  These are called Associations and Sequelize supports One-to-One, One-to-Many and Many-to-Many associations.  Under the covers it adds columns to each table as appropriate, and methods in the classes.

var User = sequelize.define('User', {/* ... */})
var Project = sequelize.define('Project', {/* ... */})
 
// One-way associations
Project.hasOne(User)

This module can do a lot more - see http://www.sequelizejs.com/#home for more info.

There are, of course, other database choices, including many with ORM features, listed at https://github.com/joyent/node/wiki/modules#wiki-database

Monday, March 11, 2013

Node.js applications can now be hosted on Amazon's Elastic Beanstalk cloud platform

The scene for cloud hosting Node.js applications just got more interesting with the announcement that Amazon's Elastic Beanstalk now supports Node.js applications.  
Elastic Beanstalk automatically handles the deployment details of capacity provisioning, load balancing, auto-scaling, and application health monitoring. At the same time, with Elastic Beanstalk, you retain full control over the AWS resources powering your application and can access the underlying resources at any time. Elastic Beanstalk leverages AWS services such as Amazon Elastic Cloud Compute (Amazon EC2), Amazon Simple Storage Service (Amazon S3), Amazon Simple Notification Service (Amazon SNS), Elastic Load Balancing, and Auto Scaling to deliver the same highly reliable, scalable, and cost-effective infrastructure that hundreds of thousands of businesses depend on today. AWS Elastic Beanstalk is easy to begin and impossible to outgrow.
The Elastic Beanstalk service already supports a number of languages like PHP, Java, and Ruby.  With today's announcement it now supports Node.js, and from the writeup on their blog post it looks like it'll take any ordinary Node application.

The writeup says you can configure the application to be proxied behind either Nginx or Apache, or to run without a proxy.  Load balancing can be either HTTP or TCP, and if you're using WebSockets it says TCP load balancing is more appropriate. 

Node.js 0.10.0 coming out soon - time to check for API changes and migrate code

Node.js v0.10.0 is about to be released, meaning that it's time to check the API changes and take a look at changes you need to make in your code.  For me this means verifying the code I'm writing for the updated version of Node Web Development (see links in sidebar) will not be broken.  API changes can of course break your application.

See ChangeLog for more details: https://github.com/joyent/node/wiki/Api-changes-between-v0.8-and-v0.10

Other than the Streams2 interface introduction I don't see much in the Change Log to be worried over.  Most of the API changes are in features that (I think) won't see wide usage.  But here are a couple of high points:

In some cases where you use process.nextTick to dispatch work to the future, you should use setImmediate instead (it's a global function, not a method on process).  Do this when your nextTick calls are recursive.
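Here's a small sketch of the kind of recursive dispatch that should move to setImmediate, which is available as a global function from Node.js 0.10 onward.  The processQueue helper is a made-up name for illustration.  Each step schedules the next one with setImmediate, so pending I/O events get a chance to run between items instead of being starved by an endless chain of nextTick callbacks.

```javascript
// Hypothetical helper: handle queue items one at a time, yielding to
// the event loop between items.
function processQueue(queue, handle, done) {
    function step() {
        if (queue.length === 0) {
            return done();
        }
        handle(queue.shift());
        setImmediate(step);  // recursive dispatch: not process.nextTick
    }
    step();
}

var doubled = [];
processQueue([1, 2, 3], function (n) {
    doubled.push(n * 2);
}, function () {
    // By the time done() runs, doubled holds 2, 4 and 6.
    console.log(doubled);
});
```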

Parsed URL objects will now have all fields, even for fields that are empty.  Those fields will be set to null.
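A quick sketch of what this means in practice, using url.parse on a made-up example URL.  Code that checked for a missing field by testing for undefined, or with the in operator, is the kind of thing to review.

```javascript
var url = require('url');

// A URL with no query string and no fragment.
var parsed = url.parse('http://example.com/some/path');

// The empty fields are present on the object and set to null.
var hasQuery = parsed.query !== null;
var hasHash = parsed.hash !== null;

// Comparing with == null matches both null and undefined, so this
// check does not care which behavior the running Node version has.
var queryText = parsed.query == null ? '(none)' : parsed.query;
```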

Subclassing EventEmitter must now be done the correct way, by using util.inherits.
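For reference, this is the usual util.inherits pattern.  Ticker is a made-up class name for illustration.

```javascript
var EventEmitter = require('events').EventEmitter;
var util = require('util');

// Hypothetical subclass of EventEmitter.
function Ticker() {
    EventEmitter.call(this);  // run the parent constructor on this instance
}
util.inherits(Ticker, EventEmitter);  // set up the prototype chain

Ticker.prototype.tick = function (n) {
    this.emit('tick', n);
};

var ticker = new Ticker();
var seen = [];
ticker.on('tick', function (n) { seen.push(n); });
ticker.tick(1);
ticker.tick(2);
```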

And of course there is the aforementioned Streams2 interface.

Tuesday, February 12, 2013

Uploading/mirroring files to remote server in Node.js without using rsync

How do you upload files to a server to deploy application or website code?  FTP?  rsync?  While it's easy enough to call a command line tool like rsync from a Node.js script, what if you're using a Windows computer that doesn't have those command line tools?  When I use Windows it's like stepping back into the dark ages where directory listings looked like we were making fire by rubbing sticks together.  Okay, there does appear to be an rsync for Windows but I had no confidence in it.  Also, I did not want to have a dependency on something like Cygwin.

Anyway the primary user of AkashaCMS (my girlfriend) likes her Windows machine very much.  After watching her struggle mightily with understanding how to use an FTP program to upload her website, I developed a Node.js script for uploading files that does not rely on rsync.

The script works fairly well but the approach was tricky enough it'd be nice to have some other eyes take a look at it.  The github repository is at https://github.com/robogeek/node-ssh2sync

The approach was to directly use the SSH2 protocol to use SFTP to copy files and to remove excess files on the remote server.  This is done using the SSH2 module for Node.

Before developing this tool I had coded this up, using rsync:

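var spawn = require('child_process').spawn;

var user = config.deploy_rsync.user;
var host = config.deploy_rsync.host;
var dir  = config.deploy_rsync.dir;
// Mirror the local output directory to the remote directory;
// --delete removes remote files that no longer exist locally
var rsync = spawn('rsync',
        [ '--verbose', '--archive', '--delete', config.root_out+'/', user+'@'+host+':'+dir+'/' ],
        {env: process.env, stdio: 'inherit'});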

The purpose of this command was to make the remote directory a duplicate of the local directory.

That's the full extent of the functionality in ssh2sync: upload files, attempt to set their file times to match the local file times, and remove any excess files on the remote server.
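The decision logic behind that kind of mirroring can be sketched in a few lines.  To be clear, this is not the code in ssh2sync; planSync and the shape of the file listings are hypothetical, and skipping files whose size and time already match is my assumption here (the description above only says files are uploaded and excess files removed).

```javascript
// Hypothetical planner.  Each listing maps a relative path to
// { size, mtime }, e.g. gathered from fs.stat locally and from an
// SFTP directory listing remotely.
function planSync(localFiles, remoteFiles) {
    var uploads = [];
    var deletions = [];

    Object.keys(localFiles).forEach(function (path) {
        var local = localFiles[path];
        var remote = remoteFiles[path];
        // Upload when the file is missing remotely or looks out of date.
        if (!remote || remote.size !== local.size || remote.mtime < local.mtime) {
            uploads.push(path);
        }
    });

    Object.keys(remoteFiles).forEach(function (path) {
        // Anything on the server with no local counterpart is excess.
        if (!Object.prototype.hasOwnProperty.call(localFiles, path)) {
            deletions.push(path);
        }
    });

    return { uploads: uploads.sort(), deletions: deletions.sort() };
}

var plan = planSync(
    { 'index.html': { size: 10, mtime: 200 }, 'about.html': { size: 5, mtime: 100 } },
    { 'index.html': { size: 10, mtime: 100 }, 'about.html': { size: 5, mtime: 100 },
      'old.html': { size: 1, mtime: 50 } }
);
```

In this example index.html is newer locally so it gets uploaded, about.html is left alone, and old.html is marked for deletion.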

I've tested it and it appears to work between my laptop (Mac OS X) and server (Debian Linux).  That's not a very exhaustive set of tests.  In the workspace is a script, try.js, that I'm using to test the tool.  The options object is the same as is documented for the underlying library.  https://github.com/mscdex/ssh2

It doesn't appear to be setting the file times on the remote server.  But it does appear to be doing everything else I want it to do.

It is, however, rather limited in that it does only the one thing.  The rsync command has a long list of other options, and in theory it appears possible to implement everything rsync does using SFTP.  That means ssh2sync could theoretically be a more comprehensive tool.  But if that is to be, then someone else will have to do it.  At the moment it does everything I require, other than possibly some bugfixing after testing it on my girlfriend's laptop. 

Usage in AkashaCMS is fairly simple.  Add configuration in config.js, then run the command
akashacms deploy
It'll be very simple.  Honest.  Much simpler than a GUI FTP program.

Monday, January 14, 2013

Does Node.js need to be governed by an independent foundation? Or are we safe with Joyent's overlordship?

The Vert.x project, a Java-based event-oriented system inspired by Node.js, has run into a spot of legal trouble that should serve as a reminder to open source developers who work for companies.  Tim Fox, the developer of Vert.x, worked for VMware until December 2012 and then joined Red Hat.  He had assumed that he'd be able to continue working on Vert.x after transitioning from one company to another, but instead was hand-delivered a letter from VMware's lawyers demanding that he hand over the keys to the Vert.x project.

With every job I've had working for a company, I signed a legal something insisting the company had ownership over any code I wrote, whether or not it was on the job.

I'm simply reading from a piece on The Register penned by Matt Asay, a rather outspoken open source thought leader, discussing the troubles going on.  I don't know various details and the actual situation may be different from what Asay suggests.   Reading his piece I am under the impression that Tim Fox is a complete victim in this, that perhaps he developed Vert.x on his own time and that VMware had no corporate role in Vert.x's development, and that therefore it's an "of course" that VMware shouldn't be asserting ownership over the project.

However, Asay's piece does have this admission buried way down in the article, and it took me three readings to find it:
To be clear, VMware funded the creation and development of Vert.x. As such, it's reasonable that it assumes a measure of involvement and even control over Vert.x. But not like this. ... The project started after he began at VMware, and VMware funded his development of Vert.x.
I'm sorry Matt Asay, but when a company funds something they own it.  They can choose to treat the project in one way or another, but they do own it.

Fox sent a letter to the Vert.x community that said, in part, "In the spirit of open source and as a commitment to the Vert.x community I had expected (perhaps naively) that VMware would continue to let me continue to administer the Vert.x project after I had left their employment."  That phrasing is that of a person who clearly knew he was administering the project with the permission of his management, rather than having started the project in his own spare time and having a reasonable expectation of control over the project.

Asay goes on to describe what "should" happen in such a situation, essentially saying that the company (VMware in this case) "should" give up control over the project to the lead developer.  He points to several such cases, primarily the Netty project that had been started by Red Hat employees and is now its own self-standing project.  Should an employer always do that, however?  Really?

For example, if Mark Reinhold were to leave Sun to join IBM would he be able to bring ownership over the OpenJDK project with him?  No.  Let me explain that I was involved with the launching of the OpenJDK project and know quite a bit about it.  That project was built out of Sun's Java implementation and was the result of a zillion contributions by Sun employees and employees of other corporations.  Sun, and now Oracle, fund the OpenJDK project, host the project repositories, pay the salary of the dozens of Oracle employees working on it, own the trademarks and domain names, etc.  Mark Reinhold is the project leader and did a heck of a lot to push Sun's management into starting the project and while his fingerprints are all over the project it was a team effort by a large staff of people, including myself.

Another example from that era is the Hudson project launched by another Sun employee.  This is a widely used build tool.  He wrote it while a Sun employee in the GlassFish team, and wrote it specifically to solve problems he and his team faced in doing their jobs.  That is, it supports continuous integration practices, automated test management, etc.  When he left Sun he wanted to take control of the Hudson project along with him, but Sun (or was it Oracle by that time?) didn't want to give up ownership.  That meant he had to rename the project to Jenkins because Sun/Oracle owned the domains and trademarks etc.

The Hudson/Jenkins project is a closer example to Vert.x than the OpenJDK project, which I threw in as an absurd example.  Kohsuke did the lion's share of the work on Hudson and was widely identified as the main person behind the project.  However, I know for certain that Sun put a fair amount of resources into supporting his work on the project, and for example paid his way to Belgium to attend the FOSDEM conference to present Hudson to European open source developers.  It was at that FOSDEM where I met Kohsuke, FWIW, when management paid for my trip to help present the OpenJDK project to European open source developers.

What is the correct response by a corporation when one of the employees leaves, and wants to keep ownership over a project the company had funded?

The actual response will vary all over the map because each corporation runs under its own set of ideas and principles.  Some corporations obviously want to assert lots of control such as in the Sun/Oracle/Hudson/Jenkins example I gave.  Others are very permissive.

The correct response must vary based on the depth of support the company gave.  For example, a project that was the sole work of a specific person, where the project played no role in the corporate strategy, but the company did fund the development e.g. by allowing 10% of employee time to work on the project ... that sounds like an ideal time for the corporation to allow the former employee to take the project with them.  But if the project was critical to that corporation's plans, the employee had worked full time on the project, and it was the work of several employees, that's an ideal time for the corporation to insist on ownership.

As I said earlier, I don't know where the Vert.x project fits within that continuum.

An additional thought to ponder because this blog is about Node.js is - what is the long-term fate of Node.js?  Its development is heavily funded by Joyent.  What if Joyent were to have a change of heart about Node.js and decide to pull their support of the project?  Will the community continue to have control over the project?

I rather doubt that would ever happen because a) Joyent's management knows open source community practices very well, and b) Joyent uses Node.js as a core part of their technology stack.

But we in the Node.js community would be in a clearer position if there were an independent foundation having ownership over Node.js.  However going by the outline I gave above, Node.js development was and continues to be funded by Joyent, Joyent is clearly in ownership of everything, etc.  Meaning that we're unlikely to ever see Node.js being handed to an independent foundation.