Thursday, November 3, 2011

Getting image metadata using the Node.js imagemagick module

Continuing on in my quest for Node.js scripts to manipulate images (see my earlier JPG to PNG conversion script), today I'm looking at how to access image metadata.  My goal is to store a caption, keywords and other information in the image itself, and to display the caption and keywords on web pages.  I found it pretty easy to add a caption, keywords and other metadata using Picasa.

The first step is to inspect the metadata - and with imagemagick's command line tools you do this:

$ identify -verbose ../Pictures/2011-10-20/DSCN8246.JPG

It dumps out a lot of data, and there are two sections which I've found to be interesting.  One is the EXIF section, which appears to hold attributes recorded by the camera.  It contains things like GPS coordinates (even if your camera doesn't set GPS coordinates, you can add them after the fact using Picasa) and exposure parameters.  The other section is the IPTC profile, and the only bits in it that I found important were the Keywords and Caption, both of which can be set in Picasa.

    exif:DateTime: 0000:00:00 00:00:00
    exif:DateTimeDigitized: 0000:00:00 00:00:00
    exif:DateTimeOriginal: 0000:00:00 00:00:00
    exif:ExifImageLength: 3648
    exif:ExifImageWidth: 2736
    exif:GPSInfo: 4732
    exif:GPSLatitude: 37/1, 24/1, 49005/2048
    exif:GPSLatitudeRef: N
    exif:GPSLongitude: 122/1, 5/1, 82875/2048
    exif:GPSLongitudeRef: W
    Profile-iptc: 95 bytes
       unknown[1,0]:  
       City[1,90]: 0x00000000: 254700                                        -% 
       unknown[2,0]: 
       Keyword[2,25]: Junction Box
       Keyword[2,25]: Karmann Ghia
       Caption[2,120]: This is a caption
       Keyword[2,25]: Goober-Beans

I tried several other tools to get this data and had problems with each; only ImageMagick's identify command would give me the whole enchilada.
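For illustration, the lines I care about in that dump are regular enough to pick out with a few regular expressions.  This is only a sketch: parseIdentifyVerbose is a name I made up, and it assumes the verbose output keeps the "exif:Name: value", "Keyword[2,25]: value" and "Caption[2,120]: value" shapes shown above, which may differ between ImageMagick versions.

```javascript
// Hypothetical helper (not part of any module): pull the EXIF attributes and
// the IPTC keywords/caption out of the text printed by `identify -verbose`.
function parseIdentifyVerbose(text) {
    var result = { exif: {}, keywords: [], caption: null };
    text.split('\n').forEach(function (line) {
        var m;
        if ((m = /^\s*exif:([^:]+):\s*(.*?)\s*$/.exec(line))) {
            // e.g. "    exif:GPSLatitudeRef: N"
            result.exif[m[1]] = m[2];
        } else if ((m = /^\s*Keyword\[2,25\]:\s*(.*?)\s*$/.exec(line))) {
            // e.g. "       Keyword[2,25]: Junction Box"
            result.keywords.push(m[1]);
        } else if ((m = /^\s*Caption\[2,120\]:\s*(.*?)\s*$/.exec(line))) {
            // e.g. "       Caption[2,120]: This is a caption"
            result.caption = m[1];
        }
    });
    return result;
}

// A few lines copied from the dump above, used as sample input.
var sample = [
    '    exif:ExifImageLength: 3648',
    '    exif:GPSLatitude: 37/1, 24/1, 49005/2048',
    '    exif:GPSLatitudeRef: N ',
    '    Profile-iptc: 95 bytes',
    '       Keyword[2,25]: Junction Box',
    '       Caption[2,120]: This is a caption',
    '       Keyword[2,25]: Goober-Beans'
].join('\n');

console.log(parseIdentifyVerbose(sample));
```

Scraping human-oriented output like this is fragile, so treat it as a way of exploring the dump rather than something to build on.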

The imagemagick module can be used to bring all this data into Node as a JSON object, but it takes a bit of work.  First, the module contains a function readMetadata whose purpose is to read metadata and give you a JSON object.  BUT the function only reads the EXIF data, not the IPTC data.  There's another module, identify.js, which also runs the identify command and returns a JSON object, but it doesn't return the entire IPTC object.

Those considerations led me to a not-quite-optimal solution that happens to work.  Namely, run the identify command multiple times, once for each metadata value, and collect the results together into a JSON object.

The core of it looks like this:

    im.identify(['-format', '%[EXIF:DateTimeOriginal]', '../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          // return here so the callback is not invoked a second time on error
          if (err) return callback(err);
          callback(null, { 'dateOriginal': metadata.trimRight() });
        });

This uses the imagemagick module to run the identify command.  You specify a -format option for the metadata item you're interested in, and the data arrives in the function.  It arrives with a newline character on the end which the trimRight function lops off.

It is unfortunate that the identify command appears to allow querying only one metadata item at a time.  Hence you have to run it multiple times, like so:

var im   = require('imagemagick');
var util = require('util');
var async = require('async');

async.series([
    function(callback) {
        im.identify(['-format', '%[EXIF:DateTime]' ,'../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'date': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:DateTimeDigitized]' ,'../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'dateDigitized': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:DateTimeOriginal]' ,'../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'dateOriginal': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:ExifImageLength]' ,'../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'length': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:ExifImageWidth]' ,'../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'width': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:GPSLatitude]','../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'GPSLatitude': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:GPSLatitudeRef]','../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'GPSLatitudeRef': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:GPSLongitude]','../Pictures/2011-10-20/DSCN8246.JPG'], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'GPSLongitude': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[EXIF:GPSLongitudeRef]','../Pictures/2011-10-20/DSCN8246.JPG' ], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'GPSLongitudeRef': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[IPTC:2:25]','../Pictures/2011-10-20/DSCN8246.JPG' ], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'Keywords': metadata.trimRight().split(';') });
        });
    },
    function(callback) {
        im.identify(['-format', '%[IPTC:2:55]','../Pictures/2011-10-20/DSCN8246.JPG' ], function(err, metadata){
          if (err) return callback(err);
          callback(null, { '2-55': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[IPTC:2:60]','../Pictures/2011-10-20/DSCN8246.JPG' ], function(err, metadata){
          if (err) return callback(err);
          callback(null, { '2-60': metadata.trimRight() });
        });
    },
    function(callback) {
        im.identify(['-format', '%[IPTC:2:120]','../Pictures/2011-10-20/DSCN8246.JPG' ], function(err, metadata){
          if (err) return callback(err);
          callback(null, { 'Caption': metadata.trimRight() });
        });
    }
],
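function(err, results){
    // report a failed identify run instead of silently printing undefined
    if (err) return util.log('identify failed: ' + err);
    util.log(util.inspect(results));
});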

The async module is used here to ensure the identify command invocations are run one at a time.  In my experience, running multiple simultaneous identify commands tends to lock up my laptop.
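The thirteen near-identical functions could probably be collapsed into a table of fields plus one small loop that still runs the queries one at a time.  The following is a sketch, not a tested replacement: collectMetadata, FIELDS and fakeIdentify are names I made up, and the identify argument is assumed to be any function with the same (args, callback) shape as im.identify.  It also merges everything into a single object rather than an array of one-key objects.

```javascript
// Result key -> identify format escape, taken from the script above.
var FIELDS = {
    date:            '%[EXIF:DateTime]',
    dateOriginal:    '%[EXIF:DateTimeOriginal]',
    GPSLatitude:     '%[EXIF:GPSLatitude]',
    GPSLatitudeRef:  '%[EXIF:GPSLatitudeRef]',
    GPSLongitude:    '%[EXIF:GPSLongitude]',
    GPSLongitudeRef: '%[EXIF:GPSLongitudeRef]',
    Keywords:        '%[IPTC:2:25]',
    Caption:         '%[IPTC:2:120]'
};

// Hypothetical helper, not part of the imagemagick module.
// Runs one identify query per field, strictly one after another.
function collectMetadata(identify, file, fields, done) {
    var keys = Object.keys(fields);
    var result = {};

    function next(i) {
        if (i === keys.length) return done(null, result);
        var key = keys[i];
        identify(['-format', fields[key], file], function (err, output) {
            if (err) return done(err);           // stop at the first failure
            var value = String(output).trimRight();
            result[key] = (key === 'Keywords') ? value.split(';') : value;
            next(i + 1);                         // only now start the next query
        });
    }
    next(0);
}

// Stand-in for im.identify so the sketch runs without ImageMagick installed.
function fakeIdentify(args, callback) {
    var canned = {
        '%[IPTC:2:25]':  'Junction Box;Karmann Ghia\n',
        '%[IPTC:2:120]': 'This is a caption\n'
    };
    callback(null, canned[args[1]] || '\n');
}

collectMetadata(fakeIdentify, 'DSCN8246.JPG', FIELDS, function (err, metadata) {
    if (err) return console.error(err);
    console.log(metadata);
});
```

To try it against real images, I would pass a thin wrapper such as function(args, cb) { im.identify(args, cb); } in place of fakeIdentify.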

When run, it gives output like so:

$ node ei.js
3 Nov 20:20:47 - [ { date: '0000:00:00 00:00:00' },
  { dateDigitized: '0000:00:00 00:00:00' },
  { dateOriginal: '0000:00:00 00:00:00' },
  { length: '3648' },
  { width: '2736' },
  { GPSLatitude: '37/1, 24/1, 49005/2048' },
  { GPSLatitudeRef: 'N' },
  { GPSLongitude: '122/1, 5/1, 82875/2048' },
  { GPSLongitudeRef: 'W' },
  { Keywords: [ 'Junction Box', 'Karmann Ghia', 'Goober-Beans' ] },
  { '2-55': '' },
  { '2-60': '' },
  { Caption: 'This is a caption' } ]
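The GPS values come back as degrees, minutes and seconds, each written as a fraction.  To show the location on a web page they would most likely need converting to decimal degrees, along these lines.  The gpsToDecimal helper is my own invention for illustration, and it assumes the three-fraction format shown in the output above.

```javascript
// Hypothetical helper: turn "37/1, 24/1, 49005/2048" plus a hemisphere
// reference ("N", "S", "E" or "W") into signed decimal degrees.
// Returns null if the string is not three well-formed fractions.
function gpsToDecimal(rationals, ref) {
    var parts = rationals.split(',').map(function (r) {
        var nd = r.trim().split('/');
        return Number(nd[0]) / Number(nd[1]);
    });
    var bad = parts.some(function (p) { return !isFinite(p); });
    if (parts.length !== 3 || bad) return null;

    // degrees + minutes/60 + seconds/3600
    var degrees = parts[0] + parts[1] / 60 + parts[2] / 3600;

    // South and West are negative by convention.
    return (ref === 'S' || ref === 'W') ? -degrees : degrees;
}

console.log(gpsToDecimal('37/1, 24/1, 49005/2048', 'N'));
console.log(gpsToDecimal('122/1, 5/1, 82875/2048', 'W'));
```

For the sample image this works out to roughly 37.4066 and -122.0946.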

At the end of this I'm pondering - how would this be better?

I would imagine there's already a C/C++ library for image manipulation.  It's rather clunky to be interfacing with imagemagick through a command line interface like this.  It presents synchronization challenges, adds overhead, and suffers from the peculiarities of its command line options.  I think there must be an image manipulation LIBRARY that could be directly coupled to Node as a module with many native functions.  But I've looked at all the modules listed in NPM's registry and none were suitable.

The image/png/jpeg modules were mentioned in a comment left on my earlier post.  Those modules let you create an image, layer images together to make composite images, and that's kind of it.  They don't let you get the metadata, for example, and the image manipulation functions supported by those modules are limited.  The attraction of the imagemagick tool is the extensive manipulations it can do.

Anyway, in terms of actual running code, the script above does what I want even if it's not optimal.

 

1 comment:

  1. exiftool has a json output flag, that might be easier to use
