Friday, April 04, 2014

Node.js Introduction from a Java Developer's Perspective

Over the past several years the Web 2.0 world has grown increasingly interesting and exciting. Client-side JavaScript has grown up, and it's now possible to develop full-featured browser applications using just JavaScript/HTML5.

And thanks to Node.js we can now develop entire applications in JavaScript on the server side as well.

I've spent a lot of time in the Java universe over the past fifteen years, and my involvement in the Web 2.0 universe has been fairly minimal. Over the past few days I've had some extra time on my hands, so I've taken the opportunity to dive into Node.js. And since writing things down helps me learn and organize my thoughts, here's a brain dump of my brief (so far) exploration into Node.js.

Describing and Comparing Node.js to Java/JEE

First, the tl;dr: Node.js is a platform (or container) for running network-based server applications written in JavaScript. It is hosted on the Google V8 engine (with a few other C++ libraries to provide I/O bindings etc.). It provides JavaScript APIs for building server applications.

The sweet spot for Node.js is dynamic, responsive, real-time data-oriented applications. Think social apps, browser-based games, etc. built for both desktop and mobile browsers. Pretty much, modern web applications that aren't compute-intensive.

Coming from the Java perspective I found it useful to compare Java/JEE and Node.js fundamentals:

Comparator          Java/JEE                    Node.js
Language            Java                        JavaScript
Runtime             JVM                         Google V8
API                 Servlet API                 Node.js API
Container           Tomcat/Jetty/...            Node.js
DI/Lifecycle        Spring                      Architect
Package Management  Ivy/Maven                   NPM
Build Tools         Ant/Maven/Gradle            Grunt/Jake/Mojito
IDE                 Eclipse/Netbeans/JetBrains  Cloud9/Visual Studio/JetBrains/Eclipse

The Java Servlet Specification is 200+ pages of text that describes a standard which any conforming implementation must adhere to. There is no official Node.js standard, and "compatible" and "conforming" aren't things in the Node.js world. Node.js is an open source project, and is a collaborative effort supported and backed by multiple individuals, projects, and organizations.

Java and JEE are mature, stable technologies with a slow rate of change. Node.js is young and rapidly evolving.

The Java ecosystem (libraries, tools, containers, and applications for building, testing, deploying, and monitoring applications) is rich and variegated. The Node.js ecosystem is full-featured but not as rich or mature -- although this is rapidly changing. Node.js is being successfully used in a variety of production systems.

There are several Java IDEs with great support for Java/JEE. There are also several IDEs with support for Node.js/JavaScript.

Hello World 1.0

Let's dive into Node.js and write a Hello World application. I'm running Windows and Cygwin; I've kept commands platform-neutral but specific details may vary based on your environment.

Download and Install

First, you need to get Node.js.
  1. Go to http://nodejs.org/download/ and download the appropriate installer/binaries.
  2. Run the installer or unpack the binary. Node won't care where it's installed.
  3. Add the Node.js root directory to your PATH.
Yeah, it's that easy.

Your First Hello World

Make a directory, let's call it hello-node. In that directory create hello-server.js with the following code:

var http = require('http'); // use built-in http module

// declare a function to handle a request
// this function is called by the engine when a request arrives
function fn(req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello Node!');
}

// create a server and tell it to start listening
var server = http.createServer(fn);
server.listen(3000, '127.0.0.1');

To start the Node.js server, run node hello-server from the command line. Yep, it's that easy.

Now go to http://127.0.0.1:3000 in your browser, and then sell your app to Google for $6 billion.
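If you'd rather check the server from Node itself instead of a browser, here's a minimal sketch. The helper name fetchHello is my own (it is not part of Node.js), and it listens on port 0 so the operating system picks a free port rather than 3000.

```javascript
var http = require('http');

// same handler as in hello-server.js
function fn(req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello Node!');
}

// fetchHello is a hypothetical helper: start the server on a free port,
// request it once, pass the result to done(), then shut the server down
function fetchHello(done) {
    var server = http.createServer(fn);
    server.listen(0, '127.0.0.1', function() {
        var options = {
            host: '127.0.0.1',
            port: server.address().port,
            path: '/',
            agent: false // one-off connection, so the server can close cleanly
        };
        http.get(options, function(res) {
            var body = '';
            res.setEncoding('utf8');
            res.on('data', function(chunk) { body += chunk; });
            res.on('end', function() {
                server.close();
                done(null, res.statusCode, body);
            });
        }).on('error', function(err) {
            server.close();
            done(err);
        });
    });
}

fetchHello(function(err, status, body) {
    if (err) { throw err; }
    console.log(status, body); // prints: 200 Hello Node!
});
```

Because the server closes itself after one request, the script exits on its own instead of listening forever.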

Single-Threaded Event Loop

The function called by the engine when a request arrives is a callback. You'll be using a lot of them in Node.js, so let's take a few minutes to understand the threading model Node.js uses.

In Node.js the main event loop is single-threaded. This means the application code you write all runs in the same thread. This is in complete contrast to the Java servlet model, where every request comes in on a separate thread. To get a feel for this single-threaded model, let's rewrite our (already very profitable) Hello World to introduce a long-running computation. Edit hello-server.js and update the fn callback function:

function fn(req, res) {
    // simulate a long-running computation by busy-waiting for 10 seconds
    var end = Date.now() + 10000;
    while (Date.now() < end) {
        // spin; nothing else can run on this thread
    }
    console.log("Waking up");
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello Node!');
}

(This keeps the event loop thread busy for 10 seconds and then sends the response to the client. Note that a plain setTimeout() would not have this effect: it only schedules a callback and returns immediately, so other requests would still be served while the timer runs.) Restart the server. Now open two tabs in your browser. Go to http://127.0.0.1:3000 in both tabs, loading them at the same time. Watch the console output and your browser tabs; you'll see that even though you started loading both tabs at the same time, the second tab doesn't finish loading until roughly ten seconds after the first tab finishes (possibly longer, since the browser may also request /favicon.ico, which runs the same handler). This is a really critical point to understand: each client request is handled by the same thread, one at a time. The obvious implication is that you want the event thread to do as little work as possible, otherwise the response time for clients will quickly become unacceptable.
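To see the single thread at work without a browser, here's a small sketch. The helper measureTimerDelay and the 10 ms / 200 ms figures are my own choices for illustration. A timer is scheduled, then the thread is kept busy with synchronous work; the timer callback can only run once that work is done.

```javascript
// measureTimerDelay is a hypothetical helper: schedule a timer for timerMs,
// then busy-wait for busyMs on the same thread and report when the timer fired
function measureTimerDelay(timerMs, busyMs, done) {
    var start = Date.now();

    setTimeout(function() {
        done(Date.now() - start);
    }, timerMs);

    // synchronous "work": no callback can run until this loop exits
    var end = start + busyMs;
    while (Date.now() < end) {
        // spin
    }
}

measureTimerDelay(10, 200, function(elapsed) {
    console.log('timer asked for 10 ms, fired after ' + elapsed + ' ms');
});
```

The reported delay should come out at about 200 ms or more rather than 10 ms, because the callback had to wait for the busy loop to release the thread.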

(As with all server applications, if you do care about scalability it's important to do some benchmarking with your favorite load testing tool on a regular basis during development.)

Callbacks, Callbacks Everywhere

We just learned the main event loop -- which executes the application code -- is single-threaded, which means we want to handle requests quickly. So how does this scale? And how do we keep from tying up the main event loop when we're reading a file or making queries to a database?

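In Node.js, I/O is event-based and non-blocking: it proceeds in the background (network I/O through the operating system's asynchronous facilities, file I/O on an internal thread pool) while the main event loop keeps running. The basic programming model in Node.js uses asynchronous event handlers. We give Node.js a callback function, which Node.js calls when the I/O operation completes.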
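Here's a minimal sketch of that callback model using the built-in fs module. The helper readBack and the temp file name are my own inventions for the example; the point is only that the function returns before the I/O has finished, and the result arrives later through the callback.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

var counter = 0; // keeps temp file names unique within this process

// readBack is a hypothetical helper: write a small temp file,
// then read it back, all with asynchronous calls
function readBack(text, done) {
    var name = 'hello-node-' + process.pid + '-' + (counter++) + '.txt';
    var file = path.join(os.tmpdir(), name);

    fs.writeFile(file, text, function(err) {
        if (err) { return done(err); }
        fs.readFile(file, 'utf8', function(err, contents) {
            // best-effort cleanup before reporting the result
            fs.unlink(file, function() {
                done(err, contents);
            });
        });
    });
}

var events = [];
readBack('Hello Node!', function(err, contents) {
    if (err) { throw err; }
    events.push('callback: ' + contents);
    console.log(events.join(' | '));
    // prints: readBack() returned | callback: Hello Node!
});
events.push('readBack() returned');
```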

Let's revisit the first Hello World. The call to server.listen() returns immediately; the server starts accepting connections asynchronously. (Prove this by adding a call to console.log() at the end of Hello World.) Internally, Node.js asks the operating system to notify it when a request arrives on this server's port, and each notification is placed on the event queue. The event loop thread pulls these events from the queue one at a time and executes the code in the callback we provided, the function fn.
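The console.log() experiment can also be written as a small self-checking sketch. The helper listenOrder is hypothetical, and it uses port 0 (any free port) and closes the server right away so the script exits on its own.

```javascript
var http = require('http');

// listenOrder is a hypothetical helper that records the order in which things happen
function listenOrder(done) {
    var order = [];
    var server = http.createServer(function(req, res) {
        res.end();
    });

    // listen() returns immediately; its callback runs later, from the event loop
    server.listen(0, '127.0.0.1', function() {
        order.push('listening callback');
        server.close();
        done(order);
    });

    order.push('after listen()');
}

listenOrder(function(order) {
    console.log(order.join(' -> ')); // prints: after listen() -> listening callback
});
```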

Hello World 2.0

Let's rewrite Hello World. Along the way we'll introduce the Express framework and the Node.js Package Manager (npm), and see how to debug Node.js applications.

Describing Dependencies with NPM

If you're familiar with any kind of package management system (Ivy, Maven, RPM, yum, etc.) you know what NPM is. You can learn more, and browse the npm database, at https://www.npmjs.org/.

We'll use npm to download our dependencies. To start, create hello-node/package.json with the following contents:

{
  "name": "hello-world",
  "description": "hello world test app",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "express": "3.5.1"
  }
}

Here, we're telling npm that our application depends on Express version 3.5.1. To download and install Express (and its dependencies), make sure your working directory is hello-node and run npm install from your shell. (Afterwards, you can run npm ls to see all of the packages it downloaded.)

Now let's create a new Hello World application, more powerful than the last. Create a new file, server.js, with the following contents:

// declare our requirements
var express = require('express');
var http = require('http');

// create a new Express app
var app = express();

// handle GET requests to /hello.txt with a callback function
app.get('/hello.txt', function(req, res) {
  res.send('Hello World 2!');
});

// use a static mapping to convert requests with /static into /public
app.use('/static', express.static(__dirname + '/public'));

// intercept and handle request param :id
// put the value of the param into a variable in the request
// then call the next handler for the request
app.param('id', function(req, res, next, id) {
  req.username = 'user ' + id;
  next();
});

// intercept and handle :postid param
app.param('postid', function(req, res, next, id2) { 
  req.postid = id2;
  next();
});

// handle a GET for e.g. /user/2
// use the request variable added by the :id handler
app.get('/user/:id', function(req, res) {
  res.send('Hello ' + req.username);
});

// handle a GET for e.g. /user/3/4
// uses two parameters
app.get('/user/:id/:postid', function(req, res) {
  res.send('Hello user ' + req.username + ', post ' + req.postid);
});

function fn() {
  var addr = server.address();
  console.log('Listening on %s:%d', addr.address, addr.port);
}

var server = app.listen(3000, fn);
console.log('Server created...');

Also, make a directory called public/ and create a public/foo.txt with anything you like in it.

Run node server and fire up your browser. Some URLs to try:
  • http://localhost:3000/hello.txt
  • http://localhost:3000/static/foo.txt
  • http://localhost:3000/user/2
  • http://localhost:3000/user/3/4
For the most part the source code is self-explanatory. We define some handlers for different routes (paths) and parameters and provide a callback for each handler. A few notes about Express:
  • The general syntax for defining a request handler is app.VERB(path, handler1 [, handler2...]). VERB is a (lowercase) HTTP verb e.g. get, post, etc. The path may be a simple string as above, a string with named parameters such as /user/:id, or a regular expression. Each handler is a callback function. Multiple handlers may be provided, either as separate parameters or as an array.
  • Use app.use([path,] handler) to define a handler that will be called for every request whose path starts with path, regardless of HTTP verb; if path is omitted, the handler is called for all requests. If multiple app.use() handlers are defined they are called sequentially in the order they were defined, each passing control on by calling next().
  • Order is important. If two handlers are defined and match the same URI, only the first will be called. (Subsequent handlers may be called by explicitly invoking next().)
Express is a fairly small web framework for Node.js. There are several others, along with full-stack frameworks, and of course many other libraries.
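To make the "order matters" and next() notes above concrete, here's a toy dispatcher in plain JavaScript. To be clear, runHandlers is a hypothetical stand-in I wrote for illustration; it is not how Express is implemented, and it ignores paths and verbs entirely. It only shows the chaining idea: handlers run in definition order, and the chain continues only when a handler calls next().

```javascript
// runHandlers is a hypothetical stand-in for a framework's dispatcher:
// it calls handlers in definition order and moves on only when next() is invoked
function runHandlers(handlers, req, res) {
    var index = 0;
    function next() {
        var handler = handlers[index++];
        if (handler) {
            handler(req, res, next);
        }
    }
    next();
}

var handlers = [
    // like the app.param() callbacks: decorate the request, then pass control on
    function(req, res, next) {
        req.username = 'user ' + req.id;
        next();
    },
    // responds and does not call next(), so the chain stops here
    function(req, res, next) {
        res.body = 'Hello ' + req.username;
    },
    // never reached for this request
    function(req, res, next) {
        res.body = 'unreachable';
    }
];

var req = {id: '2'};
var res = {};
runHandlers(handlers, req, res);
console.log(res.body); // prints: Hello user 2
```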

Debugging with Node.js

For any "serious" development and debugging, you'll want to use an IDE. However, basic debugging for Node.js is easy to set up:

  1. Run npm install -g node-inspector to install the debugger package (the -g puts it into your global Node.js installation instead of the current directory).
  2. When starting your application, use either node --debug myapp or node --debug-brk myapp if you want to pause on startup.
  3. In a separate shell, run node-inspector (I had to use node node_modules/node-inspector/bin/inspector.js from the Node.js installation directory, probably because Windows or Cygwin).
  4. In Chrome, go to http://127.0.0.1:8080/debug?port=5858. (Make sure that port is not blocked.)

Other Thoughts and Opinions

Developing code in a strongly, statically typed language feels very different from coding in a dynamically typed language. In Java, you have no choice: you conform to the API presented, and there's no ambiguity or question about the types and methods. In JavaScript, there is no static type information available, and method resolution at runtime is pretty simple: if the object has a method with the name being asked for, it's called -- regardless of whether the right number of arguments is passed. And there's no "compile time" checking at all. So for a Java developer it can be uncomfortable.
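A few lines show what that looks like in practice. The add function and the user object are made up for the example; the behavior is standard JavaScript.

```javascript
function add(a, b) {
    return a + b;
}

console.log(add(1, 2));    // prints 3
console.log(add(1, 2, 3)); // prints 3: the extra argument is silently ignored
console.log(add(1));       // prints NaN: b is undefined, and 1 + undefined is NaN
console.log(add('1', 2));  // prints 12: the string '12', since + concatenates strings

var user = {name: 'Ann'};
// no such method exists, and nothing complains until the code actually runs
console.log(typeof user.getName); // prints undefined
```

None of these calls would compile in Java; here they all run, and only one of them does what the author of add probably intended.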

Node.js is a lightweight platform, and JavaScript is ill-suited for large applications. But where there is a need, a solution will be created. There are projects, languages, tools, and libraries to make large-scale development feasible.


A summary of the main points:
  • Node.js is a platform for developing server applications, consisting of JavaScript running in the Google V8 engine (with some C++ libraries).
  • The main event loop is single-threaded, but I/O operations are asynchronous and there are techniques to offload CPU- or time-intensive tasks to worker threads. Or you can use something like WebWorker to offload tasks to the client.
  • Callbacks are commonly used to handle the results of I/O operations.
  • Use NPM to manage dependencies. There are a lot of packages available...choose wisely.
  • Node.js and its ecosystem are rapidly evolving. Keep on top of the changes.
  • Since JavaScript is a dynamic language, unit testing is important. Refactoring will be painful without good unit tests, although some tooling support exists.
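As one illustration of offloading CPU-heavy work, here's a sketch that uses a child process from the built-in child_process module. Note that this swaps in a separate process rather than the worker threads mentioned above, and that the helper sumInChild and the summing loop are hypothetical stand-ins for real work.

```javascript
var childProcess = require('child_process');

// sumInChild is a hypothetical helper: run a CPU-heavy loop in a separate
// node process and deliver the result through a callback
function sumInChild(n, done) {
    var script =
        'var total = 0;' +
        'for (var i = 1; i <= ' + Number(n) + '; i++) { total += i; }' +
        'console.log(total);';

    // process.execPath is the node binary that is running this script
    childProcess.execFile(process.execPath, ['-e', script], function(err, stdout) {
        if (err) { return done(err); }
        done(null, Number(stdout));
    });
}

sumInChild(1000000, function(err, total) {
    if (err) { throw err; }
    console.log('sum computed in child process: ' + total);
});
console.log('main event loop is free while the child works');
```

Starting a process has a cost of its own, so this only pays off for work that is heavy enough to hurt response times.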

My initial impression of Node.js is a qualified thumbs-up. Things I like:
  • Low barrier to entry and supports super-quick iteration and development.
  • There are some really nice benefits that come from using the same language for both the server and the client.
  • There's a lot of excitement and activity in the Node.js universe; with the low barrier and rapid development this translates to a lot of new packages/tools being developed and rapid evolution of features in existing ones.
  • Node.js plays 
  • Perhaps most important, Node.js solves some real problems well, and has been battle tested in some fairly serious environments.

In the "minuses" column:
  • It's JavaScript, with all of its quirks and idiosyncrasies (not that it's unique in that regard). Plus, since it's a dynamically typed, prototype-based language with little built-in support for structuring large codebases, developing a largish application requires serious developer discipline to keep the codebase from becoming too complex to work on.
  • Everything is a callback, and your code will be littered with them, and if you're not careful you'll end up with pyramids of doom all over. (And a few things aren't callbacks and will kill scalability if you don't profile for them.)
  • Not really a minus but something to be aware of: because the Node.js ecosystem is churning so rapidly it's harder to find the "best" practices/patterns/idioms and picking the "right" framework/package can be a bit tricky. Do your homework and choose wisely.
  • Package support for third-party things is still coming along. 
  • IDE and tool support is still maturing. 
Frankly though, I think a lot of the complaints and issues are already addressable (for example using TypeScript instead of JavaScript) or are being addressed in the near term. And on a larger scale, the universe of JavaScript -- languages (ClojureScript, CoffeeScript, TypeScript, etc.), interpreters/VMs (Google V8, Mozilla SpiderMonkey, Oracle's Nashorn, etc.), and browser frameworks/libraries (Angular, Backbone, Ember, etc.) -- has a huge amount of momentum and synergy and life in it.
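On the callback pyramid complaint in the minuses above: here's a sketch of the problem and one common way to tame it with nothing but named functions. The three steps (loadUser, loadPosts, render) are hypothetical and fake their I/O with setImmediate; both versions behave the same.

```javascript
// three hypothetical asynchronous steps, each finishing on a later event loop turn
function loadUser(id, cb) {
    setImmediate(function() { cb(null, {id: id, name: 'user ' + id}); });
}
function loadPosts(user, cb) {
    setImmediate(function() { cb(null, ['post 4']); });
}
function render(user, posts, cb) {
    setImmediate(function() { cb(null, user.name + ': ' + posts.join(', ')); });
}

// nested style: every step adds a level of indentation
function pagePyramid(id, done) {
    loadUser(id, function(err, user) {
        if (err) { return done(err); }
        loadPosts(user, function(err, posts) {
            if (err) { return done(err); }
            render(user, posts, function(err, html) {
                if (err) { return done(err); }
                done(null, html);
            });
        });
    });
}

// flattened style: same behavior, named functions instead of nesting
function pageFlat(id, done) {
    var user;
    loadUser(id, onUser);

    function onUser(err, u) {
        if (err) { return done(err); }
        user = u;
        loadPosts(user, onPosts);
    }
    function onPosts(err, posts) {
        if (err) { return done(err); }
        render(user, posts, done);
    }
}

pagePyramid(3, function(err, html) { console.log(html); }); // prints: user 3: post 4
pageFlat(3, function(err, html) { console.log(html); });    // prints: user 3: post 4
```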

Summary and Further Reading

I've provided a brief introduction to Node.js -- what it is, what it's good for, and how to build, run and debug a Hello World application (and sell it to Google). Since I come from the Java world, I also compared it to the Java/JEE world.

Last, here are a bunch of articles and resources to help you journey further into the world of Node.js. Essentially this is a snapshot of the more interesting and useful things I've found over the past few days of exploration into Node.js. Your mileage may vary, caveat emptor, and all that.

Introductory and Informational

http://www.toptal.com/nodejs/why-the-hell-would-i-use-node-js - Introduction to Node.js

http://www.slideshare.net/chris.e.richardson/nodejs-the-good-parts-a-skeptics-view-jax-jax2013 - Presentation introducing Node.js (also from someone coming from a Java perspective)

http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/ - Describes the Node.js event loop

http://programmers.stackexchange.com/questions/221615/why-do-dynamic-languages-make-it-more-difficult-to-maintain-large-codebases - A discussion around the difficulty of maintaining large codebases in a dynamic language like JavaScript

Case Studies and Success Stories

http://www.nearform.com/nodecrunch/how-node-js-has-revolutionized-the-mailonline - Using Node.js and Clojure on the Daily Mail Online website.

http://strongloop.com/strongblog/mobile-app-development-with-full-stack-javascript-part-1-of-4-loopback/ - Developing a mobile app using Node.js (part 1 of 4)

Plugins, Libraries and Tools

https://www.npmjs.org/ - the Node Package Manager library, get your plugins here

http://nodeframework.com/ - A list of 30+ handpicked Node.js frameworks

http://gruntjs.com/ - Automated build system (aka Maven) for Node.js

http://expressjs.com/ - Web application framework

https://www.npmjs.org/package/webworker-threads - WebWorker threads, for moving long-running tasks off the main event loop


How-To and Tutorial

http://www.bearfruit.org/2013/06/21/start-a-new-node-js-express-app-the-right-way/ - One approach to getting started with Node.js applications

http://greenido.wordpress.com/2013/08/27/debug-nodejs-like-a-pro/ - Setup for debugging with Node.js

http://pettergraff.blogspot.com/2013/01/java-vs-node.html - A Java programmer's experience with programming for Node.js

http://www.codeproject.com/Articles/523451/Node-Js-And-Stuff - Longish walkthrough for developing a Node.js application