Wednesday, November 30, 2011

Node 0.6.x and the code in Node Web Development

Node 0.6.x came out a month or so ago, and befitting a software platform with a 0.x version number it came along with a slew of API changes.  The API changes are positive, and I'm sure there are a few people on Windows who will appreciate that Node now runs natively on that operating system.  On a more direct level, though, my worry was whether the API changes would break the code in my book, Node Web Development (see link in the sidebar).

I'm happy to report that all the examples in the book still work with 0.6.2 on my laptop (Mac OS X).  Whew!

That is, except for the examples which use the LearnBoost cluster package to manage multiple processes. 

Let's first back up a sec and go over the problem "cluster" is trying to solve.  In the book I had this fun-to-write section in one chapter where I spun a picture of your manager getting upset because the new Node.js app you'd written only uses one core of the shiny new 32 core server you bought to run it on.  The "cluster" package gave you a solution: a simple API for managing multiple child processes.

This issue stems from the Node design principles: a single execution thread, and no threading, period.  That means your Node process will run on one core.  It puts scaling front and center as an issue you have to solve right away in your application: how do you scale to use multiple cores and/or multiple servers?

In any case, coming along with Node 0.6.x is a new core module named "cluster".  You access it with require('cluster'), which means the existing cluster package is now squeezed out by the core module of the same name.  Not only that, but the 0.6.x cluster module has a different API from the LearnBoost cluster package.  I see on the github page for the LearnBoost cluster package that they claim support for 0.2.x and 0.4.x but not 0.6.x.  Hmmm..
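For a taste of the new API, here's a minimal sketch of a multi-core HTTP server using the core cluster module, as I understand the 0.6 API (the 'death' event and worker.pid are what the 0.6 documentation describes; treat the details as approximate):

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    // The master process forks one worker per CPU core
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    // Log a notice when a worker dies
    cluster.on('death', function(worker) {
        console.log('worker ' + worker.pid + ' died');
    });
} else {
    // Each worker runs its own HTTP server; the workers share port 8000,
    // with the listening socket shared between the processes under the hood
    http.createServer(function(req, res) {
        res.writeHead(200);
        res.end('hello from worker ' + process.pid + '\n');
    }).listen(8000);
}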

Fortunately for my book the use of the LearnBoost cluster package was limited to one example at the beginning, and you can skip right over it if you like.

I think, though, that I'll explore using the new cluster package and post something about it soon.

Monday, November 14, 2011

The lobrow method of using Node.js modules on the client side (in browsers)

Node.js is a "server side javascript" platform for doing javascript software development outside the browser.  It's cool to bring this language to new places so it can be applied to problems other than web pages in browsers.  However as I noted a few days ago, there's a long-standing dream of using the same programming language for both client and server software development.  (see http://nodejs.davidherron.com/2011/11/yahoo-reveals-their-nodejs-mojito.html and http://nodejs.davidherron.com/2011/11/do-front-end-engineers-using-nodejs-win.html)

Over the weekend a blog post popped up discussing a prototype, named lobrow, that implements a method for using Node.js modules in a browser.  While Node.js modules are written in JavaScript, their API and structure are somewhat different from what gets used in browsers.

The basic usage is:

<script src="lobrow.js"></script>
<script>
    lobrow.onload(["./mylib"],
        function (mylib) {
            ...
        });
</script>

One thing leaps right out: in browsers, loading JavaScript (the require function) has to be asynchronous, whereas in Node.js the require function is synchronous.  That is, when Node executes a require statement the program can immediately use that module.  That's not how things work in a browser, where the module source has to be fetched over the network.  It means running Node.js code in a browser has to somehow accommodate asynchronous require statements.
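To make the contrast concrete, here's a small sketch of the two styles (reusing lobrow.onload from the snippet above; "./mylib" stands in for any module):

// Node.js: require is synchronous, the module is usable on the next line
var fs = require('fs');
console.log(fs.readFileSync('/etc/hosts', 'utf8'));

// Browser: the module source has to come over the network, so a loader
// such as lobrow hands you the module in a callback instead
lobrow.onload(["./mylib"], function (mylib) {
    // mylib only becomes usable inside this callback
});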

Another issue, not touched on by the 2ality.com post, is the other API differences.  There's a slew of core modules and add-on modules for Node that don't exist in a browser environment.  Directly using "any" Node.js module in the browser means implementing some kind of compatibility layer over the core Node.js modules.  Otherwise, using Node.js modules in the browser means limiting those modules to an API set which can be supported in the browser.

Related work:

http://www.2ality.com/2011/11/lobrow.html

https://github.com/rauschma/lobrow

Friday, November 4, 2011

The potential for performance wins by baking modules into memory in Node.js

In a presentation at Yahoo's front-end engineers' conference earlier this year, Dav Glass demonstrated a performance gain by building a custom Node binary that bakes his own modules into memory in the same way Node bakes in the core modules.  I already discussed one aspect of his presentation, whether it's valuable for front-end engineers using Node to have access to the toolkits familiar to them from their client side work.  But there was a little segue in his presentation where he showed a neat trick that gave a significant performance improvement.

To start with, we must recognize that Node's core modules are baked into the binary.  The require function has a mode where it can resolve a requested module from one that's already baked into memory.  During the Node build process the javascript files for the core modules (e.g. "http") are converted into C source, which is then compiled into the binary.  This simplifies deploying Node onto servers because you have fewer files to install, but it also turns out to be a performance enhancement.

What Dav showed was a performance gain from baking his own module sources into a custom Node binary.  However I'm not sure how appropriate his optimization is for general use on Node.

The test he showed repeatedly loaded YUI instances and used Y.use to load the needed YUI modules.  Using the normal Node binary this was pretty fast, but he constructed a custom Node binary that had some (?all?) of the modules already baked into memory, and the performance jumped dramatically.

Where I'm not sure about this is that Node's require already caches module source in memory.  If you require('xyzzy') more than once, the second time around the module will already be in memory and Node won't fetch it from disk again.  Dav Glass claimed in his talk that the performance improvement came from not having to grope around the file system for module files; with the module source baked into memory there's no need to search the file system, doing readdir's and stat's along the way.

Because Node's require function already caches the module in memory, Node only has to search the file system once to resolve a given module request.  It's clear from the presentation that Dav Glass was talking about modules loaded using the Y.use method rather than through Node's require function.
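A quick sketch of that caching behavior ("./mylib" is a stand-in for any local module):

// The first require resolves and loads the module from disk
var a = require('./mylib');
// The second require is served from the in-memory module cache and
// returns the very same object; no file system search, no reload
var b = require('./mylib');
console.log(a === b);                    // true
// The cache itself is visible as require.cache, keyed by resolved filename
console.log(Object.keys(require.cache));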

http://yuilibrary.com/theater/davglass/f2esummit2011-glass/

Do front-end engineers using Node.js win if they can use familiar frameworks even on the server?

Node (a.k.a. Node.js), because it's an excellent server side javascript platform, makes it possible for "front end engineers", who normally write javascript for web pages, to now do stuff on the server side.  Dav Glass, a YUI engineer, gave a presentation at Yahoo's F2E (Front End Engineers) conference about using YUI3 on Node to make Node a convenient, familiar environment for front end engineers.

The video is available through the YUI Library website (see link below).  At the time I viewed the page it refused to let me play it in my browser, but I could download the .mp4 file (over a gigabyte!) and view it using QuickTime.  YMMV

The argument, that front-end engineers will find YUI3+Node comfortable, fits neatly with the observation I made yesterday (http://nodejs.davidherron.com/2011/11/yahoo-reveals-their-nodejs-mojito.html) about the long-standing dream of the same code running on server and client.  Dav's argument is that someone familiar with YUI3 in a browser will be instantly comfortable with YUI3 on Node.  And he proceeded to go through several code samples to demonstrate this argument.

Unfortunately the camera angle was not such that we could read the code samples.  Instead you'll have to just get the gist of it from how he presented the ideas, plus you can download his code from http://github.com/davglass.

A detail which sticks out right away is that YUI has its own module system, its own mechanism for resolving dependencies and loading the correct modules.  Node has its baked-in module system based on the require function, and YUI has its own.  The YUI coder writes Y.use(), at which point YUI resolves (and loads) the requested YUI modules through its own mechanisms.  This may not be a problem, but it's an instance of duplication.
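To illustrate the duplication, here's a sketch of the two module systems next to each other.  The YUI module names are just examples, and the require('yui3') line is my assumption about how Dav's npm package exposes the YUI factory; the Y.use() pattern itself is standard YUI3.

// Node's module system: synchronous require()
var http = require('http');

// YUI's module system: asynchronous Y.use(), resolved through YUI's own loader
var YUI = require('yui3').YUI;   // assumption: Dav Glass's yui3 npm package
YUI().use('io-base', 'json-parse', function (Y) {
    // Y.io and Y.JSON become available inside this callback
});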

What struck me is the potential value in developer knowledge reuse.  As it stands with Node, if you hand Node to a front-end engineer their only familiarity will be the language.  They'll be learning a whole new framework, from the Node module architecture to the Node APIs to the npm package manager, etc.  It's not that this is tough, and there are books available, such as Node Web Development, to help them get up to speed.  But maybe Dav is right that there's enough value in the familiarity of a known toolkit to refactor it for use on Node.

http://yuilibrary.com/theater/davglass/f2esummit2011-glass/

Thursday, November 3, 2011

Getting image metadata using the Node.js imagemagick module

Continuing my quest for Node.js scripts to manipulate images (see my earlier JPG to PNG conversion script), today I'm looking at how to access image metadata.  The goal is to store a caption, keywords, and other information in the image itself, and to display the caption and keywords on web pages.  I found it pretty easy to add a caption, keywords, and other metadata using Picasa.

The first step is to inspect the metadata - and with imagemagick's command line tools you do this:

$ identify -verbose ../Pictures/2011-10-20/DSCN8246.JPG

It dumps out a lot of data, and there are two sections which I've found to be interesting.  One is the EXIF section, which appears to be attributes from the camera about the exposure.  It contains things like GPS coordinates (even if your camera doesn't set GPS coordinates, you can add them after the fact using Picasa) and exposure parameters.  The other section is the IPTC profile, and the only bits in it I found important were the Keywords and Caption, both of which can be set in Picasa.

    exif:DateTime: 0000:00:00 00:00:00
    exif:DateTimeDigitized: 0000:00:00 00:00:00
    exif:DateTimeOriginal: 0000:00:00 00:00:00
    exif:ExifImageLength: 3648
    exif:ExifImageWidth: 2736
    exif:GPSInfo: 4732
    exif:GPSLatitude: 37/1, 24/1, 49005/2048
    exif:GPSLatitudeRef: N
    exif:GPSLongitude: 122/1, 5/1, 82875/2048
    exif:GPSLongitudeRef: W
    Profile-iptc: 95 bytes
       unknown[1,0]:
       City[1,90]: 0x00000000: 254700                                        -%
       unknown[2,0]:
       Keyword[2,25]: Junction Box
       Keyword[2,25]: Karmann Ghia
       Caption[2,120]: This is a caption
       Keyword[2,25]: Goober-Beans

I tried several other tools to get this data and had problems with each; only ImageMagick's identify command would give me the whole enchilada.

The imagemagick module can be used to bring all this data into Node as a JSON object, but it takes a bit of work.  First, the module contains a function, readMetadata, whose purpose is to read metadata and give you a JSON object.  But the function only reads the EXIF data, not the IPTC data.  There's another module, identify.js, which also runs the identify command and returns a JSON object, but it doesn't return the entire IPTC profile.
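For completeness, here's roughly what using readMetadata looks like; a sketch, and it only gets you the EXIF side of things:

var im   = require('imagemagick');
var util = require('util');

im.readMetadata('../Pictures/2011-10-20/DSCN8246.JPG', function (err, metadata) {
    if (err) throw err;
    // metadata holds the EXIF attributes, but not the IPTC profile
    util.log(util.inspect(metadata));
});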

Those considerations led me to a not-quite-optimal solution that happens to work: run the identify command once for each metadata value, collecting the results into a JSON object.

The core of this is as so:

    im.identify(['-format', '%[EXIF:DateTimeOriginal]' ,'../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'dateOriginal': metadata.trimRight() });
        });

This uses the imagemagick module to run the identify command.  You specify a -format option for the metadata item you're interested in, and the data arrives in the callback function.  It arrives with a newline character on the end, which the trimRight function lops off.

It is unfortunate that the identify command appears to only allow you to query one metadata item at a time.  Hence you have to run it multiple times, like so:

var im   = require('imagemagick');
var util = require('util');
var async = require('async');

async.series([
    function(callback) {
        im.identify(['-format', '%[EXIF:DateTime]' ,'../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'date': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:DateTimeDigitized]' ,'../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'dateDigitized': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:DateTimeOriginal]' ,'../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'dateOriginal': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:ExifImageLength]' ,'../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'length': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:ExifImageWidth]' ,'../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'width': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:GPSLatitude]','../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'GPSLatitude': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:GPSLatitudeRef]','../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'GPSLatitudeRef': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:GPSLongitude]','../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'GPSLongitude': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:GPSLongitudeRef]','../Pictures/2011-10-20/DSCN8246.JPG' ], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'GPSLongitudeRef': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[IPTC:2:25]','../Pictures/2011-10-20/DSCN8246.JPG' ], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'Keywords': metadata.trimRight().split(';') });
        });
    },
    function(callback) {
        im.identify(['-format', '%[IPTC:2:55]','../Pictures/2011-10-20/DSCN8246.JPG' ], function(err, metadata){
          if (err) return callback(err);
          callback(null, { '2-55': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[IPTC:2:60]','../Pictures/2011-10-20/DSCN8246.JPG' ], function(err, metadata){
          if (err) return callback(err);
          callback(null, { '2-60': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[IPTC:2:120]','../Pictures/2011-10-20/DSCN8246.JPG' ], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'Caption': metadata.trimRight() });
        });
    }
],
function(err, results){
    if (err) return util.log('error: ' + util.inspect(err));
    util.log(util.inspect(results));
});

The async module is used here to ensure the identify command invocations run one at a time.  In my experience, running multiple simultaneous identify commands tends to lock up my laptop.
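If you haven't used async before, the relevant distinction is async.series versus async.parallel.  A minimal sketch, with setTimeout standing in for the identify invocations and arbitrary timings:

var async = require('async');

// async.series runs the task functions one at a time, in order;
// async.parallel would start them all at once, which is what locks up my laptop
async.series([
    function (callback) { setTimeout(function () { callback(null, 'first');  }, 100); },
    function (callback) { setTimeout(function () { callback(null, 'second'); },  10); }
], function (err, results) {
    console.log(results);   // [ 'first', 'second' ]
});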

When the full script is run, it gives output like so:

$ node ei.js
3 Nov 20:20:47 - [ { date: '0000:00:00 00:00:00' },
  { dateDigitized: '0000:00:00 00:00:00' },
  { dateOriginal: '0000:00:00 00:00:00' },
  { length: '3648' },
  { width: '2736' },
  { GPSLatitude: '37/1, 24/1, 49005/2048' },
  { GPSLatitudeRef: 'N' },
  { GPSLongitude: '122/1, 5/1, 82875/2048' },
  { GPSLongitudeRef: 'W' },
  { Keywords: [ 'Junction Box', 'Karmann Ghia', 'Goober-Beans' ] },
  { '2-55': '' },
  { '2-60': '' },
  { Caption: 'This is a caption' } ]

At the end of this I'm pondering - how could this be done better?

I would imagine there's already a C/C++ library for image manipulation.  It's rather clunky to be interfacing with imagemagick through a command line interface like this: it presents synchronization challenges and overhead, and suffers from the peculiarities of the command line options.  I think there must be an image manipulation LIBRARY that could be directly coupled to Node as a module with many native functions.  But I've looked at all the modules listed in npm's registry and none were suitable.

The image/png/jpeg modules were mentioned in a comment left on my earlier post.  Those modules let you create an image, layer images together to make composite images, and that's kind of it.  They don't let you get the metadata, for example, and the image manipulation functions supported by those modules are limited.  The attraction of the imagemagick tool is the extensive manipulations it can do.

Anyway.. in terms of actual running code - the script above does what I want even if it's not optimal.

 

Wednesday, November 2, 2011

Yahoo reveals their Node.js Mojito Manhattan cocktail mix - now I can talk about it?

Today Yahoo announced a portion of a new web application platform, Mojito and Manhattan.  It's a project I worked on while at Yahoo, and today's announcement somewhat frees me up to talk about this exciting platform in public.  Coincidentally, the yearly YUI Conference begins tomorrow and there are a couple of sessions which will talk about Mojito and Node.js.  However, it doesn't appear the platform, tools, etc. will actually be available for a while yet.
The naming theme in this project is "cocktails" which they arrived at in a backwards way.  They mashed together the words "module" and "widget" to make a new word, mojit, which then led to Mojito and the other cocktail names.  It's really too cute for words, especially with the other (as yet unannounced) cocktail names associated with other (as yet unannounced) projects in the Cocktails family.
Mojito - It's not just for the drinks section anymore

The two parts are:
  • Mojito: A module/widget/mojit system meant to build application UI's that live in HTML5 (or lesser capability) web browsers.  It also applies to mobile device applications because the mobile programming platforms let you incorporate browser components in the app UI.
  • Manhattan:  A cloud oriented hosting infrastructure for Node.js.  It's not simply Node, but Node plus a whole bunch of layers and geared in part to be a hosting platform for Mojits to live in.
Perhaps it should have been obvious that Yahoo was seriously interested in Node.  There's been a series of YUI Theater video presentations (linked below), starting with one a year ago where Dav Glass gushed over and over about the game-changing power of being able to run YUI code in a server environment using Node.  What that launched was a significantly large project to build a complete hosting platform, one Yahoo could use to develop new web properties and could open to the public so non-Yahoos could host applications on Yahoo's infrastructure.
An article on Wired.com today describes the hosting platform as being akin to Google's App Engine, and yeah, that's the intention.  It's not a cloud infrastructure like Amazon's EC2, where you rent virtual machines you configure from the metal on up.  Instead it's a platform into which you deploy code conforming to a language (JavaScript) and various APIs (Mojito, YUI, Node, etc).  In today's announcement they talk quite a bit about running code on either client or server, unchanged.  But the line of reasoning you'll see below suggests that this code, in order to execute in either environment, cannot take advantage of all the Node APIs or all the browser APIs.
The idea is to assemble a Mojit using a hybrid Node/YUI package structure, then deploy the tarball into Manhattan.  Unfortunately I feel constrained in what I can say because Yahoo hasn't properly unveiled the product details yet, and at the time I left the team in June we were still in early stages of putting the platform together.  Yahoo says that Manhattan is a cloud hosting platform, but in June … uh … Anyway, what I'll do is go over their blog posts today and see what I can expand upon.
Cocktails: Cocktails makes it simple to build, personalize and modify content for all consumer platforms, and to connect audiences with premium content.
Yahoo has a long history with an earlier platform for mobile applications, Blueprint, which detects device characteristics and customizes the actual HTML or WAP code sent to the device based on the device grade.  That knowledge is being applied to Mojito: not that Blueprint itself is being used in Mojito, but they're re-applying the knowledge gained from it.
In the videos in their blog entry, Ren and the others talk about the freedom of having your code able to run on either server or client, because it's JavaScript at both ends.  In my book, Node Web Development, I talked about how this dream goes back to the earliest days of the Web: the original hype around Java was this same dream, that you'd have Applets doing dynamic things in the browser, and Servlets doing dynamic things in the server, and we'd sing praises to high heaven in a glory of nirvana.  That dream fell down somewhere, however, and running Java in the browser is basically dead.  The Mojito team wants to enable that dream with JavaScript, by allowing Mojito code to execute on either server or client depending on various characteristics of the browser, server, connection, etc.
Part of this is enabling the YUI library to run on either server or client.  Manhattan is a hosting platform for Node, and while Node modules are excellent, they're not quite compatible with being executed in a browser (client).  They've published a few videos (linked below) about this idea of using YUI code in a Node module.  Clearly, for code to execute on either server or client it needs to use an API set that runs in both places.  Because Node's API isn't compatible with running in the browser, this is where YUI comes into play.
Something which puzzles me, and which their blog post doesn't address, is the lack of a DOM in Node, while most of the YUI modules operate on the DOM.  The DOM is the byproduct of a web browser loading a page from a server and parsing the page content into an object model, which it uses to render the page.  Node isn't a web browser and doesn't have a DOM.  YUI's original reason for existence was to abstract away JavaScript inconsistencies between browsers and to add niceness to DOM manipulation, a DOM which doesn't exist in Node.  I don't know the answer to this, and it will be interesting to see what they cook up.  FWIW there are a couple of DOM implementations that run on Node, but what I recall is server side Mojits using templates instead.
Yahoo! Manhattan extends Node.JS to provide the necessary fault-isolation and fault-tolerance, scalability, availability, security and performance you’d otherwise expect from one of the largest web companies in the world.
That's a very information dense sentence, because each word refers to a whole area of expertise which they poured into developing the hosting platform.  This is the team I actually worked in, and I feel constrained from sharing too much.  Basically, what this means is that a Mojito developer will not have full access to the entire Node API, for all these reasons.  Additionally, the vision is that the Manhattan infrastructure spins up servers in a cloud-oriented fashion to handle the traffic to each Mojit.  Yahoo already has a world class web infrastructure which provides some of the qualities listed in that sentence.
We will also make Mojito open source through YDN in the first quarter of 2012.
Later in 2012 we will be opening Yahoo! Manhattan for publishers to be able to run Mojito-based applications on Yahoo!’s Cloud.
Be patient ...
I wanted to close by embedding their videos in this blog post and talking about them.  Unfortunately they're not allowing the videos to be embedded - so click on the "Shaken, Not Stirred" link below.
The first video shows Matt Taylor using the "mojito command line tool" to develop an application, then the "ghh" tool to deploy it to a Manhattan server.  Note the video doesn't show coding.  The video demonstrates automatically degrading an application from a rich user interface down to buttons on an HTML page depending on the device characteristics.  As mentioned earlier, they intend to detect device characteristics and tune the user experience based on them.  This is something Yahoo has years of experience with from their mobile applications.
The video also shows the "ghh" tool, a command line tool for deploying applications to Manhattan.  I'll be really surprised if ghh is delivered for public use, because it didn't seem in June that ghh would ever be anything but a Yahoo-internal tool.  But plans do have a way of changing as a project matures.  I wonder if this is a video originally meant for internal training that they trimmed down for this public showing?
The second video talks about solving the fragmentation problem Yahoo saw.  Yahoo has a long history of preferring actual Web Standards rather than going for proprietary stuff.  You can see this as a result of Yahoo not owning the devices or hardware its customers use, and therefore Yahoo's success depends on stressing open Web Standards.
A puzzle in the second video is what they mean by an "execution environment" in the browser for running Mojits.  Node modules aren't directly applicable to a web browser, because Node offers a lot of modules and APIs that don't exist in browsers.  But Node modules are CommonJS modules (more or less), which can execute in a browser, so I suppose that as long as your Mojit code doesn't use any Node-specific APIs it could execute on either server or client.
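As a tiny illustration, a module like this sketch (hypothetical file greeting.js) uses nothing Node-specific, so it could in principle be loaded by Node's require() or by a browser-side CommonJS loader:

// greeting.js: no Node core modules, no DOM, just plain JavaScript and exports
exports.greet = function (name) {
    return 'Hello, ' + name;
};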

Using Node.js and YUI 3 (Dav Glass in September 2010 gushing about YUI running in the server on Node)
Node.js Roadmap (Ryan Dahl presenting the Node roadmap at YUIConf 2010)
Node.js + YUI 3 (Dav Glass at YUIConf 2010)
YUI 3 & Node.js for JavaScript View Rendering on Client or Server (Matt Taylor) - The promise of Node.js and YUI 3 running server-side is that a new era of frameworks is possible in which view rendering on the client and server is implemented with the same JavaScript-based code.
YUI 3 and Node.js - Not Just For Web Pages (Dav Glass in May 2011)
Yahoo’s ‘Manhattan’ To Rescue Web From the iPad
Yahoo! Announces Cocktails – Shaken, Not Stirred
YUI for Cocktails