Friday, July 8, 2011

Java, Twitter, and asynchronous event-driven architecture

Twitter famously launched on the then-popular Ruby on Rails web framework. It then ran into scalability problems, which the company made light of with the Fail Whale. Word has been that they started using Scala a while back, and it turns out they've been intensively studying ways to scale their service to handle the traffic volume they face. A recent article on InfoQ went over some of the things they did, and, perhaps surprisingly, none of it involves Node.js.

Well, that choice may look surprising today, when Node.js is getting so much excited attention, but recall that they began making these decisions before Node.js was available, and even today Node still has a pre-1.0 version number. In any case, let's ponder what the InfoQ article says.

They changed the search engine storage from MySQL to Lucene, and replaced a Ruby on Rails search UI "with a Java server they called Blender." Blender is "a Thrift and HTTP service built on Netty, a highly-scalable New I/O (NIO) client server library written in Java that enables the development of a variety of protocol servers".

They wrote Gizzard, an open source framework "for creating distributed datastores", which "is used to partition MySQL". They're also using "HDFS in Hadoop extensively for off-line computation", and so on.

The languages used at Twitter are JavaScript, Ruby, Scala and Java, where "developers coming from a Ruby background tend to prefer working in Scala, whilst developers coming from a C or C++ background choose Java."

They developed Finagle as "a library for building asynchronous RPC servers and clients in Java, Scala, or any JVM language. It is written in Scala, but also supports a highly Java-idiomatic API."
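To give a feel for what "asynchronous RPC" means in practice on the JVM, here is a minimal sketch. It does not use Finagle or its API; it assumes a newer JDK that ships java.util.concurrent.CompletableFuture, and the class AsyncRpcSketch and the methods fetchUser, fetchTweetCount and describe are hypothetical stand-ins for remote calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncRpcSketch {
    // Hypothetical "remote" call: returns a future instead of blocking the caller.
    static CompletableFuture<String> fetchUser(int id, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> "user-" + id, pool);
    }

    // Hypothetical second "remote" call that depends on the first result.
    // The "count" is faked from the name length just to have something to return.
    static CompletableFuture<Integer> fetchTweetCount(String user, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> user.length(), pool);
    }

    // Chain the two calls. No thread waits here; the second call is
    // scheduled only once the first one has completed.
    static CompletableFuture<String> describe(int id, ExecutorService pool) {
        return fetchUser(id, pool)
            .thenCompose(user -> fetchTweetCount(user, pool)
                .thenApply(count -> user + " has " + count + " tweets"));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // join() blocks only here, at the very edge of the program.
            System.out.println(describe(42, pool).join());
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the style is that callers describe what should happen when a result arrives rather than parking a thread to wait for it, which is the same idea Node.js builds on with callbacks.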

The back-end code is being moved to run on the JVM, which supports multiple languages (Java, Scala, Python, Ruby and even JavaScript), while JavaScript is letting them build heavier and more powerful browser-based client code. They see the Ruby runtime as "slow in comparison with the JVM". Because the JVM can run JRuby, and Rails runs on JRuby, they can move their Ruby code to the JVM without rewriting it. One consideration, though, is the various "clients" they use; they cite the CRuby memcache implementation, which is way faster than the JRuby one.

They're happy with their system performance. They are "one of the largest websites in the world, but run on a very small hardware footprint compared to other big dynamic sites" and "Keeping the hardware footprint small has advantages in terms of cost, but also avoids some of the secondary scaleability concerns, such as the performance of the TCP stack, that can impact sites with larger hardware demands." So performance wasn't their prime motivating factor. Something else was:

"The primary driver is honestly encapsulation, so we can iterate faster as a company. Having a single, monolithic application codebase is not amenable to quick movement on a per-team basis. So when we decide to encapsulate something, then because of our performance concerns, its better to rewrite it in the JVM for most systems, than to write a new Ruby system."

Okay, that was a lot of cool information about their decisions. But let's ponder it versus Node.js as a potential tool for the issues they describe.

Languages: They're clearly a multi-language shop, and Node.js is a single language solution.

Performance: Performance is more complex than just asynchronous coding.

Asynchronous: The issue of event driven asynchronous architecture is only a small part of the overall system.

In other words, there are a lot more issues at play than the asynchronous architecture that was the focus of Node.js's design.

Node.js is an exciting system. But is it the be-all and end-all of web application development?
