Continuous availability deployment of node.js apps

I’ve been thinking about the problem of highly available node.js applications and how to keep things running smoothly when deploying new code. Ideally, a deploy shouldn’t drop existing connections, and users should never see HTTP 502, 503, or 504 proxy errors. Although it’s possible to perform hot-code replacement in node.js, I don’t think that approach is mature enough yet to handle all the little corner cases that can come up.

A more general way to tackle this is by taking advantage of load balancing. Let’s take the simple scenario of one reverse proxy (nginx) sitting in front of one node.js instance. You have nginx configured to talk to the live production instance (version 1) or to the pre-live instance (version 2). When you want to deploy version 2, the code is uploaded to the server and the v2 node instance started. At this point, traffic is still going to the v1 instance. We must tell nginx to route all future requests to v2 while letting in-flight requests continue to talk to v1. This keeps things like WebSocket connections and long-poll sessions humming along until you retire v1.

So now we have two versions in production, but eventually v1 can be shut down organically (when all of its requests finish) or forced to shut down after some specified amount of time. In the latter case, the web app (or whatever is talking to your node application) should be designed to reestablish connections transparently. That way you can deploy during peak traffic without user complaints.
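To make that concrete, here’s a minimal sketch of the retiring instance’s side of this, assuming it’s told to shut down via SIGTERM and picking an arbitrary 30-second grace period:

var http = require('http');

var server = http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('hello from v1\n');
});

server.listen(8000); // more on what to listen on below

process.on('SIGTERM', function () {
  // Organic shutdown: stop accepting new connections, then exit
  // once every in-flight request has finished.
  server.close(function () {
    process.exit(0);
  });

  // Forced shutdown: if connections are still hanging around after
  // the grace period, exit anyway.
  setTimeout(function () {
    process.exit(0);
  }, 30000);
});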

In the nginx configuration snippet below, two upstreams are defined with one server each. You can add more servers per upstream if you like, since node.js doesn’t currently support worker processes out of the box and you may want several node instances behind each version. I’m using UNIX domain sockets because they’re represented as files, which means we can use mv to “atomically” switch versions on the fly.

upstream myapp_new {
  server unix:/tmp/myapp-new.sock;
}

upstream myapp_old {
  server unix:/tmp/myapp-old.sock;
}

server {
  location / {
    # If the new instance can't be reached or returns a server
    # error, replay the request against the old instance.
    error_page 502 503 504 = @failover;
    proxy_next_upstream error timeout http_500 http_502 http_503 http_504 http_404;
    proxy_intercept_errors on;
    proxy_pass http://myapp_new;
  }

  location @failover {
    proxy_next_upstream error timeout http_500 http_502 http_503 http_504 http_404;
    proxy_intercept_errors on;
    proxy_pass http://myapp_old;
  }
}

Initially we have v1 deployed and listening on /tmp/myapp-new.sock, and myapp-old.sock doesn’t exist yet. Now we bring up v2 and have it listen on some other unique socket, /tmp/myapp-v2.sock. First, we need to free up myapp-new.sock, so we move it to myapp-old.sock. To the outside world nothing has changed, but nginx is now reaching v1 through the failover path, since myapp-new.sock is vacant. Finally, we move myapp-v2.sock into place as myapp-new.sock, and all new requests start hitting v2.

mv /tmp/myapp-new.sock /tmp/myapp-old.sock  # still only v1
mv /tmp/myapp-v2.sock /tmp/myapp-new.sock   # v1 + v2 hybrid
rm /tmp/myapp-old.sock                      # v2 remains in prod
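At this point v1 is no longer reachable through nginx, but the process itself is still running; to retire it, send it a signal (e.g. kill -TERM, assuming the SIGTERM handler sketched earlier) and let it drain or time out.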

Done.

And the best part? The only change to your app is the line that calls server.listen(): instead of specifying a port number, use an absolute path to a UNIX socket.
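A minimal sketch, with the socket path matching the nginx config above; the stale-socket cleanup and the chmod are my own additions, so restarts don’t hit EADDRINUSE and the nginx worker can actually connect:

var fs = require('fs');
var http = require('http');

var sock = '/tmp/myapp-v2.sock';

var server = http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('hello from v2\n');
});

// Remove a stale socket file from a previous crashed run, or
// listen() will fail with EADDRINUSE.
if (fs.existsSync(sock)) fs.unlinkSync(sock);

// The only change: listen on an absolute socket path, not a port.
server.listen(sock, function () {
  // Let the nginx worker user connect to the socket.
  fs.chmodSync(sock, '777');
});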