Hardening Node.js for Production Part 3: Zero Downtime Deployments with nginx

Below I’ll talk about deploying new node.js code for an HTTP server without ever suffering downtime. This is part of our series on hardening node.js for production use in the Silly Face Society – see part 1 and part 2.

Suffering downtime to perform an upgrade always feels a bit backwards. We do it anyway because avoiding it is too technically complicated relative to the expense of an outage. In the Silly Face Society, I’m willing to suffer brief outages for PostgreSQL and Redis upgrades, but not for code that we control: our day-to-day bug fixes. Bug fixes aside, the frequency of node.js module updates alone would require an outage multiple times a week just to stay fresh.

To solve this, the basic idea is to use nginx’s built-in failover mechanism for upstream servers / processes to shift traffic away from processes that are restarting. Each upstream server allows existing connections to finish before shutting down and restarting with the new code. We perform a zero downtime upgrade of all processes on the machine by iterating over each process, shutting it down and bringing it back up. For the rest of the tutorial, I’ll assume you have nginx set up to proxy node requests à la part 2 and are using express.js to handle HTTP requests.

All of this sounds much more complicated than it is, so let’s dive into code:

Graceful Shutdowns in Express.js

We will first implement graceful shutdowns of the process. When our process receives a kill signal we want it to refuse new connections and finish existing ones. I’ll assume you have an express.js server set up with something like:

app = express.createServer()
...
app.listen(31337)

We can modify this slightly to perform graceful shutdowns on SIGTERM. Additionally, we’ll create a timeout that forcefully exits the process if connections are taking an unreasonable amount of time to close:

httpServer = app.listen(31337)
process.on 'SIGTERM', ->
  console.log "Received kill signal (SIGTERM), shutting down gracefully."
  httpServer.close ->
    console.log "Closed out remaining connections."
    process.exit()

  setTimeout ->
    console.error "Could not close connections in time, forcefully shutting down"
    process.exit(1)
  , 30*1000

In the above code, we extract the underlying http server object from express.js (the result of the app.listen call). Whenever we receive SIGTERM (the default signal from kill), we attempt a graceful shutdown by calling httpServer.close. This puts the server in a mode that refuses new connections but keeps existing ones open. If a connection hog doesn’t quit in time, the setTimeout forces an immediate exit after 30 seconds. Modify this timeout as appropriate. Note: I don’t use web sockets, but they would be considered connection hogs by the above logic. To achieve zero impactful downtime, you would have to close out these connections manually and have some nifty retry logic on the client.

There is one issue with the code: HTTP 1.1 keep-alive connections would also be considered “connection hogs” and continue to accept new requests on the same connection. Since I use keep-alive connections in nginx, this is a big problem. Ideally we would force node.js into a mode that closes all existing idle connections. Unfortunately, I can’t find any way of doing this with existing APIs (see this newsgroup discussion). Fortunately, we can add middleware that automatically sends 502 errors to new HTTP requests on the server. Nginx will handle the rest (see below). Here’s the modification:

app = express.createServer()
...
gracefullyClosing = false
app.use (req, res, next) ->
  return next() unless gracefullyClosing
  res.setHeader "Connection", "close"
  res.send 502, "Server is in the process of restarting"
...
httpServer = app.listen(31337)
process.on 'SIGTERM', ->
   gracefullyClosing = true
   ...

This should be mostly self-explanatory: we flip a switch that makes every new request stop with a 502 error. We also send a Connection: close header to hint that this socket should be terminated. As usual, this minimal example is available as a gist.

Ignoring Restarting Servers in Nginx

We will assume you have an nginx server with more than one upstream server in a section like:

upstream silly_face_society_upstream {
    server 127.0.0.1:61337;
    server 127.0.0.1:61338;
    keepalive 64;
}

By default, if nginx detects an error (e.g., connection refused) or a timeout on one upstream server, it will fail over to the next upstream server. The full process is explained in the proxy_next_upstream section of the HttpProxy module documentation. The default is essentially the behaviour we want, modulo fail-overs on keep-alive connections. As mentioned above, we send a 502 to indicate a graceful shutdown in progress. Insert a proxy_next_upstream directive like:

...
location @nodejs {
    ...
    proxy_next_upstream error timeout http_502;
    ...
    proxy_pass http://silly_face_society_upstream;
}
...

With the above addition, nginx will fail over to the next upstream server whenever it gets an error, a timeout or a 502 from the current one.

Performing zero downtime deployments

Believe it or not, everything is in place to do zero downtime deployments. Whenever new code is pushed, we bounce each process individually. To gracefully restart a process with the new code:

  1. Issue a SIGTERM signal (kill <pid> will do that)
  2. Wait for termination. As a simplification, wait the kill timeout and a bit of a buffer.
  3. Start the process up again.
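The three steps above can be scripted. Here is a hypothetical sketch in plain JavaScript; the pid list, the injected `signal` / `restart` callbacks and the 35-second wait (the 30s kill timeout plus a buffer) are illustrative assumptions, not part of the actual child_monitor tooling:

```javascript
// Hypothetical rolling-restart helper for the three steps above.
// `signal` sends SIGTERM and `restart` respawns the worker; both are
// injected so the loop itself stays trivial to test.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function rollingRestart(pids, { signal, restart, waitMs = 35000 }) {
  const restarted = [];
  for (const pid of pids) {
    signal(pid, 'SIGTERM'); // 1. ask the worker to shut down gracefully
    await sleep(waitMs);    // 2. wait out the kill timeout plus a buffer
    await restart(pid);     // 3. bring the worker back up on the new code
    restarted.push(pid);    // nginx routes around the worker while it is down
  }
  return restarted;
}
```

In real use, `signal` would be `(pid, sig) => process.kill(pid, sig)` and `restart` would respawn the worker via `child_process`.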

That’s it: nginx will handle the hard work of shifting traffic onto the healthy processes! If you are running in a managed environment, you can even automate the restarts. I’ve put a new version of my child_monitor.coffee script from part 1 on github as a gist to show how you can go about it. The master process listens for a SIGHUP (indicating a code push). Whenever it receives the signal, it kills and restarts each monitored child, with a short waiting period between each kill.

Bingo bango. Ah, I forgot a shameless plug for our upcoming iPhone game: if you like silly faces, visit www.sillyfacesociety.com and get the app! It’s silltastic!

Better Effects: 3 Simple iOS 5 Particle Systems by Example

The Silly Face Society is a casual iPhone game that uses a surprisingly large number of visual effects. Below, I’ll demo a few of the particle systems (using iOS 5’s built-in libraries) and provide some sample code to implement them.

Most particle system tutorials focus on effects such as fire and smoke that are appropriate for sprite-based games. Although there are a number of great tutorials for UIKit particle systems, they are a bit top-heavy. I’d like to spend a bit of time doing a “case study” of the less aggressive particle systems we implemented for our casual game. Here is a visual overview of the effects that I will be showing off:

Anatomy of the System

In UIKit, particle systems consist of two parts:

  • One or more CAEmitterCells. The emitter cells can be thought of as prototypes for individual particles (e.g., a single puff in a cloud of smoke). When emitting a particle, UIKit looks at the emitter cell and creates a randomized particle based on the definition. The prototype includes properties that control the image, colour, direction, movement, scale and lifetime of particles.
  • One or more CAEmitterLayers (usually just one). The emitter layer mostly controls the shape (e.g., a point, a rectangle or a circle) and the position of the emission (e.g., inside the rectangle, or on the edge). The layer also has global multipliers that are applied to the CAEmitterCells within the system. These give you an easy way to blanket changes over all particles – a contrived example would be changing the x velocity of rain to simulate wind.

The basics are simple but the parameters can be quite subtle. CAEmitterCell alone has over 30 different parameters to customize the behaviour of particles. Below, I’ll spell out some of the particular issues that caused me grief.

Randomness

What makes the particle system into a system is randomness. CAEmitterCell’s properties generally have two parameters – a mean and a “cone”: e.g., velocity and velocityRange. By default, the “cone” is 0, meaning that all particles will have identical velocity. By changing the cone, each emitted particle will randomly be perturbed to have a velocity value falling within the cone. This is subtly mentioned in the Apple CAEmitterLayer documentation:

Each layer has its own random number generator state. Emitter cell properties that are defined as a mean and a range, such as a cell’s speed, the value of the properties are uniformly distributed in the interval [M - R/2, M + R/2].

Even colour has a cone: I’ll use this below to make psychedelic smoke. Groovy.
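Concretely, a property with mean M and range R is sampled uniformly from [M - R/2, M + R/2]. A throwaway sketch of that arithmetic (in JavaScript rather than Objective-C, purely to show the maths):

```javascript
// Mean-and-range sampling as the UIKit docs describe it: values fall
// uniformly in [mean - range/2, mean + range/2].
function sampleProperty(mean, range, rand = Math.random) {
  return mean + (rand() - 0.5) * range;
}
```

So with velocity = 50 and velocityRange = 20, every emitted particle gets a velocity somewhere between 40 and 60.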

Direction of Emission

CAEmitterCells have a velocity property that gives the speed in the direction of emission. The actual direction of emission is specified using the emissionLongitude property. Apple describes this as:

The emission longitude is the orientation of the emission angle in the xy-plane. It is also often referred to as the azimuth.

Every time I see this it makes me want to scream. Here is an example I made to clear things up:

Super quick code for this is available in this gist.

Code Setup / Structure

Setting up your code base is explained in much more detail here. Here is the short version:

  • Add the QuartzCore framework to your project
  • Create a new view, but set the root layer class to CAEmitterLayer
  • In the init, set up the CAEmitterLayer and one or more CAEmitterCells
  • Add the cell to the layer by setting emitterLayer.emitterCells = [NSArray arrayWithObject:cell]

A rough outline would be

@implementation SFSChickenScreen {
    __weak CAEmitterLayer *_emitterLayer;
}

- (id) initWithFrame:(CGRect)frame chickenPosition:(CGPoint)position {
    if((self = [super initWithFrame:frame])) {
        self.userInteractionEnabled = NO;
        self.backgroundColor = [UIColor clearColor];
        _emitterLayer = (CAEmitterLayer *)self.layer;

        // ...
        // set up layer shape / position
        // ...

        CAEmitterCell *emitterCell = [CAEmitterCell emitterCell];
        emitterCell.contents = (__bridge id)[[UIImage imageNamed:@"SomeParticle.png"] CGImage];
        // ...
        // set up emitter cell
        // ...
        _emitterLayer.emitterCells = [NSArray arrayWithObject:emitterCell];
    }

    return self;
}

- (void)stopEmitting {
    _emitterLayer.birthRate = 0.0;
}

+ (Class) layerClass {
    return [CAEmitterLayer class];
}

The advantage of this structure is that it is reasonably modular and can be inserted above / below any particular view. It can also act as a particle system liaison: for example, the stopEmitting method above halts the emission of new particles immediately. Generally I add this view, wait a few seconds to stop it and then wait a few more seconds to remove the view.

Particles

Here is the scoop on the effects in the video:

Coloured Smoke

Some coloured smoke. You’ll need a bitchin’ guitar riff to go with this.

We use coloured smoke as a transition between different text options within the same screen. The effect is made by layering a large number of randomly coloured smoke particles within a rectangular box. We configure them to turn white over time, creating a nice white-out effect.

First, we create the rectangular frame for particle emission:

       CGRect bounds = [[UIScreen mainScreen] bounds];
       smokeEmitter.emitterPosition = CGPointMake(bounds.size.width / 2, bounds.size.height / 2); //center of rectangle
       smokeEmitter.emitterSize = bounds.size;
       smokeEmitter.emitterShape = kCAEmitterLayerRectangle;

This makes a “screen” of smoke. From here, we go on to create an emitter cell with appropriate lifetime properties.

       CAEmitterCell *smokeCell = [CAEmitterCell emitterCell];
       smokeCell.contents = (__bridge id)[[UIImage imageNamed:@"SmokeParticle.png"] CGImage];
       [smokeCell setName:@"smokeCell"];
       smokeCell.birthRate = 150;
       smokeCell.lifetime = 1.0;
       smokeCell.lifetimeRange = 0.5;

We account for the short lifetime by having a relatively large (for the particle size) birthRate. The lifetimeRange makes our smoke particles disappear non-uniformly, adding a bit of variety. In terms of speed, we wanted our smoke particles to shoot up and then “sink” over the course of their lives. This is accomplished by balancing an upwards initial velocity with a downwards yAcceleration:

       smokeCell.velocity = 50;
       smokeCell.velocityRange = 20;
       smokeCell.yAcceleration = 100;
       smokeCell.emissionLongitude = -M_PI / 2; // up
       smokeCell.emissionRange = M_PI / 4; // 90 degree cone for variety

Before we get to colour, we should note that smoke tends to “poof” and grow. We can simulate this by increasing the scale of the particle over its lifespan:

       smokeCell.scale = 1.0;
       smokeCell.scaleSpeed = 1.0;
       smokeCell.scaleRange = 1.0;

The secret sauce: colour. This is mostly a fun trial-and-error game. I set a slightly white-biased base colour, and configure random variations of red / blue / green around it. All colours have a positive speed, so they will drift towards white over time. Finally, I set a negative alphaSpeed so they become more transparent over their lives. Here we go:

       smokeCell.color = [[UIColor colorWithRed:0.6 green:0.6 blue:0.6 alpha:1.0] CGColor];
       smokeCell.redRange = 1.0;
       smokeCell.redSpeed = 0.5;
       smokeCell.blueRange = 1.0;
       smokeCell.blueSpeed = 0.5;
       smokeCell.greenRange = 1.0;
       smokeCell.greenSpeed = 0.5;
       smokeCell.alphaSpeed = -0.2;

Psychedelic. You can see the complete code here in the gist. For the particle image, see SmokeParticle.png. Feel free to use it in your own app.

Confetti

With confetti, it is party time all the time.

Confetti is used in our reveal animation – it is a visual indicator that you guessed people’s silly faces correctly. The emitter itself is a line of particles that “drop” into the screen under gravity. As such, the shape is a bit different from the smoke:

      _confettiEmitter.emitterPosition = CGPointMake(self.bounds.size.width /2, 0);
      _confettiEmitter.emitterSize = self.bounds.size;
      _confettiEmitter.emitterShape = kCAEmitterLayerLine;

As noted in the Apple documentation, the y-coordinate of the position is ignored for line emitters, and the x-coordinate is the center of the line. For the confetti colours, we set up a wide range with a slight bias towards white:

        CAEmitterCell *confetti = [CAEmitterCell emitterCell];
        confetti.contents = (__bridge id)[[UIImage imageNamed:@"Confetti.png"] CGImage];
        confetti.name = @"confetti";
        confetti.birthRate = 150;
        confetti.lifetime = 5.0;
        confetti.color = [[UIColor colorWithRed:0.6 green:0.6 blue:0.6 alpha:1.0] CGColor];
        confetti.redRange = 0.8;
        confetti.blueRange = 0.8;
        confetti.greenRange = 0.8;

The birth rate was determined experimentally – it felt about right. Acute observers will notice that the above code gives a range of [0.6 - 0.8/2, 0.6 + 0.8/2] = [0.2, 1.0] for each colour component of the confetti. Naturally, we will emit the confetti straight down, adding yAcceleration so that it looks like gravity is having an effect. The exact numbers were all trial and error:

        confetti.velocity = 250;
        confetti.velocityRange = 50;
        confetti.emissionRange = (CGFloat) M_PI_2;
        confetti.emissionLongitude = (CGFloat) M_PI;
        confetti.yAcceleration = 150;

Another trick with confetti is to introduce a random spin so that the particle rotates as if there is some air resistance. We also modify scale / scaleRange to add some variety.

        confetti.spinRange = 10.0;
        confetti.scale = 1.0;
        confetti.scaleRange = 0.2;

One final issue: a “wall” of confetti looks a bit strange. The birth rate of the confetti should slow down over time, as if there is a finite amount of confetti dropping. I hacked together a couple of quick functions that we use for linear decay over an interval:

static NSTimeInterval const kDecayStepInterval = 0.1;
- (void) decayStep {
    _confettiEmitter.birthRate -= _decayAmount;
    if (_confettiEmitter.birthRate < 0) {
        _confettiEmitter.birthRate = 0;
    } else {
        [self performSelector:@selector(decayStep) withObject:nil afterDelay:kDecayStepInterval ];
    }
}

- (void) decayOverTime:(NSTimeInterval)interval {
    _decayAmount = (CGFloat) (_confettiEmitter.birthRate /  (interval / kDecayStepInterval));
    [self decayStep];
}

Done! To actually use this code, you’ll probably want the gist. For the particle image, see Confetti.png. Feel free to use it in your own app.

Chicken

Chickens. For those die-hard Arg! fans.

Okay. The Chicken is an easter egg and an homage to Arg! The Pirates Strike Back. I’m not sure we really need to give a detailed description of how to create it. I’ll give a quick description and then let the gist handle the rest.

The Chicken is a point particle that gets emitted at a funny angle and spins while it falls. The only thing that is slightly unusual about it is that we fade it in from nothing. The simplest way of doing this is to set the initial alpha to 0 and use a large alphaSpeed (10 gives a 0.1s fade-in):

        CAEmitterCell *chicken = [CAEmitterCell emitterCell];
        chicken.contents = (__bridge id)[[UIImage imageNamed:@"ArgChicken.png"] CGImage];
        chicken.name = @"chicken";
        chicken.birthRate = 5;
        chicken.lifetime = 5.0;
        chicken.color = [[UIColor colorWithRed:1.0 green:1.0 blue:1.0 alpha:0.0] CGColor];
        chicken.alphaSpeed = 10.0;

I’ll spare you the rest: see the gist and ArgChicken.png.

Happy particling! If I ever have time, I’d love to put together a little Cocoa app for fiddling with the parameters of particle systems. For now, I’m occupied with the Silly Face Society. Sign up and learn when we launch.

Continuous Integration Using Bill Cosby

Cosbybot in the Silly Face Society lobby
The Silly Face Society lobby featuring a very special guest. It isn’t Chris Young.

The Silly Face Society is our soon-to-be-released social iOS game where you advance in stature by sending silly faces and having friends guess the type of face you made. In this post, I’ll cover how we use Bill Cosby to continuously test our server code.

Testing and laziness

The Arg! team is made up of two part-timers, and it is apparent that neither of us knows anything about project management. We can’t scope, our estimates are off by an order of magnitude, and we refuse to cut features in the interest of launching. As a result, Chris is often practicing “scope by art” while I practice “design by code”. It is a stark contrast to my old enterprise day job, one that makes me grind my teeth at the thought of writing tests. So, I decided early in this project not to write any tests, server or client. And our server uses a notoriously soupy dynamic language. Don’t do as I did.

Besides the guilt, one thought still kept me up at night: what if I unwittingly make a small change that completely breaks our iOS experience?

Enter Cosbybot

When I was younger, I would spend a lot of time on IRC interacting with Eggdrop trivia bots. Recently, it occurred to me: I can fool myself into writing tests by calling them bots instead. My “test suite” would be a bot that could be challenged by players using our iOS client. And who better to make silly faces than Bill Cosby?

As I wrote Cosbybot, it occurred to me that bots come with some other benefits:

  • Whenever I need to demo to someone, I always have someone to play with that responds quickly.
  • Bots can be operated outside the server: if the server is acting oddly towards our users then the bot will catch it.
  • Writing a “clean room” client outside of my iOS app ensures I didn’t make bad assumptions.

Mechanics of Cosbybot

The following is a brief visual summary of “testing” the Silly Face Society by playing against Cosbybot:

1. A member of the Silly Face Society starts a game with Cosbybot
2. The member sends Cosbybot a silly face
3. Cosbybot guesses randomly at the silly face and taunts the submitter.
4. Cosbybot sends his own picture to be guessed

Cosbybot is a real account (with a login) that uses a programmatic client. I implemented a simple state machine that advances through WAITING SUBMIT -> GUESSING -> SUBMITTING -> WAITING GUESS. Every thirty seconds, Cosbybot scans for all open rounds and tries to advance each by a step in the state machine, submitting a photo of Bill Cosby when required. If an error occurs (bad response code, bad login, bad serialization, unexpected format, server down, etc.), it is logged at “fatal”, which fires me an e-mail via node’s winston module.
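The state machine itself is tiny. A sketch in JavaScript, with the state names from above (the function name and error handling are my own invention):

```javascript
// Cosbybot's round states, in the order they advance. The cycle is circular,
// which is what lets two bots keep each other's rounds moving forever.
const STATES = ['WAITING_SUBMIT', 'GUESSING', 'SUBMITTING', 'WAITING_GUESS'];

function nextState(state) {
  const i = STATES.indexOf(state);
  if (i === -1) throw new Error(`unknown state: ${state}`);
  return STATES[(i + 1) % STATES.length]; // wraps back to WAITING_SUBMIT
}
```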

Cosbybot acts without internal knowledge and I keep it in an external script that interacts with the Silly Face Society server over HTTP (via node’s request module). The source for the bot is a bit too long for this post, but is available in this gist. Here’s the run loop:

fe = forwardError = (cb, fn) -> (err) ->
  return cb(err) if err?
  fn Array::slice.call(arguments, 1)...

execute = ->
  login fe reportError, ->
    refresh fe reportError, (responseJson) ->
      {rankName, id, monocles} = responseJson.currentPlayer
      winston.info "#{botName} (#{id}) rank: '#{rankName}' - #{monocles} monocles"
      async.forEachSeries responseJson.rounds,
        (round, callback) ->
          switch actionState(id, round)
            when ACTION_STATES.YOU_SUBMIT then submitPhoto round, callback
            when ACTION_STATES.YOU_GUESS
              guessUntilClosed round, fe callback, ->
                taunt round, callback
            else callback()
        fe reportError, -> reexecute()

To understand this properly, see the context in the gist.

Continuity in Continuous Integration

Urkelbot and Cosbybot together at long last.

So far, all of our “tests” require a human to send a submission or guess a photo. Fortunately, there is a quick way of removing the human element: introduce one more bot.

I created another Silly Face Society account named “Urkelbot” and manually challenged Cosbybot to a round. I then booted copies of Cosbybot and Urkelbot on our production and dev servers. In each scan, Urkelbot’s state machine is advanced by Cosbybot (and vice-versa). Since the states are circular, we get continuous external testing of each server build. If I introduce a regression that prevents the game from being played, one of the bots will send me an e-mail alerting me of the failure.

And there you have it: Bill Cosby tests our code all day. If you like Bill Cosby and silly faces then be the first to try The Silly Face Society on the iPhone.

P.S. how was this show possibly ever on television?

Node.js Postgres Pooling Revisited (with transactions!)

This is a follow up to the last connection pooling blog entry. If it seems like I am spending an undue amount of time on this topic then you feel the same way as I do.

I’ve grown wary of node-postgres‘ built-in pooling and have decided to implement my own connection pooling on top of it. In production, I was seeing that connections would “leak” from the pool, causing the pool to fill and time out all further requests. The only fix I had was to have a periodic health check that rebooted the process: obviously this wouldn’t work for launch. The issue is documented here: https://github.com/brianc/node-postgres/issues/137 but most of the discussed solutions won’t work with my pooled decorator pattern.

Luckily, new versions of node-pool have the pattern built in, and baking your own pool takes only a couple of lines. Let’s create a pool:

pg = require('pg').native # Or non-native, if you prefer
poolModule = require 'generic-pool'

connectionString = "tcp://user:pass@localhost/your_database"
pgPool = poolModule.Pool
    name: 'postgres'
    create: (cb) ->
        client = new pg.Client connectionString
        client.connect (err) ->
          return cb(err) if err?
          client.on "error", (err) ->
            console.log "Error in postgres client, removing from pool"
            pgPool.destroy(client)
          client.pauseDrain() #Make sure internal pooling is disabled
          cb null, client

    destroy: (client) -> client.end()
    max: 10
    idleTimeoutMillis : 30 * 1000
    log: true #remove me if you aren't debugging

pgPool is a pool of up to ten postgres clients. Calling pgPool.acquire will give you a native postgres client to work with, while calling pgPool.release will return it back to the pool. Lastly, calling pgPool.pooled and passing in a function will “decorate” your function so that it auto-acquires on call, and auto-releases on callback. See the pooled function decoration documentation.

Note that in create we call the pg.Client constructor directly instead of using pg.connect because pg.connect returns a proxy object that does internal connection pooling.
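The pooled decoration is worth understanding, since it is exactly what stops clients from leaking when a code path forgets to release. Here is a minimal plain-JavaScript sketch of the pattern (mine, not generic-pool’s actual implementation):

```javascript
// makePooled(acquire, release) builds a decorator: it turns a function
// fn(client, ...args, done) into one that auto-acquires a client before the
// call and auto-releases it the moment `done` fires.
function makePooled(acquire, release) {
  return (fn) => (...args) => {
    const cb = args.pop(); // node-style: the caller's callback comes last
    acquire((err, client) => {
      if (err) return cb(err);
      fn(client, ...args, (...result) => {
        release(client);   // released exactly once, before surfacing the result
        cb(...result);
      });
    });
  };
}
```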

Okay! That was pretty basic. To spice this post up a bit, here’s a little class that helps out with transactions:

class Transaction
    #Unmanaged client (not auto-released)
    @startInClient: (pgClient, releaseOnCompletion, callback) ->
      [callback, releaseOnCompletion] = [releaseOnCompletion, false] if typeof releaseOnCompletion == 'function'
      (new Transaction(pgClient, releaseOnCompletion)).startTransaction callback

    #Managed client (auto-released on commit / rollback)
    @start: (callback) ->
      pgPool.acquire (err, pgClient) ->
        return callback(err) if err?
        Transaction.startInClient pgClient, true, callback

    constructor: (@pgClient, @releaseOnComplete) ->

    startTransaction: (cb) ->
      @pgClient.query "BEGIN", (err) => cb(err, @)

    rollback: (cb) ->
      @pgClient.query "ROLLBACK", (err) =>
        pgPool.release @pgClient if @releaseOnComplete
        cb(err, @) if cb?

    commit: (cb) ->
      @pgClient.query "COMMIT", (err) =>
        pgPool.release @pgClient if @releaseOnComplete
        cb(err, @) if cb?

    wrapCallback: (cb) -> (err) =>
      callerArguments = arguments
      if err?
        @rollback()
        return cb callerArguments...
      else
        @commit (commitErr) ->
          return cb(commitErr) if commitErr?
          cb callerArguments...

Transaction.start acquires a new connection from the pool, begins a transaction and returns an object that can be used to manage it. You can either rollback or commit the transaction. The more sophisticated piece is wrapCallback, which is best understood with an example:

# Start a transaction and do something with it
Transaction.start (err, t) ->
      return console.log "Couldn't start transaction" if err?
      callback = t.wrapCallback(callback)
      pgClient = t.pgClient
      pgClient.query "INSERT INTO fun(id) VALUES(1)", (err) ->
        return callback(err) if err?
        pgClient.query "INSERT INTO people(id,fun_id) VALUES (1,1)", callback

The above code transactionally inserts an item into the fun table and an item into the people table. Beyond Transaction.start, it never mentions the transaction again. The magic here is the t.wrapCallback function: it produces a wrapped callback that rolls back when called with an error (i.e., the first parameter isn’t undefined/null) and commits otherwise.
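Stripped of the class machinery, the decision wrapCallback makes is small. A plain-JavaScript sketch, assuming commit and rollback take node-style callbacks:

```javascript
// On error: roll back (best-effort) and pass the original error through.
// On success: commit first, and only then surface the result; a failed
// commit replaces the result with the commit error.
function wrapCallback(commit, rollback, cb) {
  return (err, ...rest) => {
    if (err != null) {
      rollback(() => {}); // the original error is what the caller should see
      return cb(err, ...rest);
    }
    commit((commitErr) => {
      if (commitErr != null) return cb(commitErr);
      cb(err, ...rest);
    });
  };
}
```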

In the languages I know, the “gold standard” for transactions is to make them as transparent as possible. Sometimes (e.g., Spring with JDBC in Java), this means resorting to thread-local contexts. I think the above transaction class meets this standard, even if wrapCallback is a little tricky to understand.

And with that, we’re back to coding. If you like postgres and silly faces, try out the Silly Face Society on the iPhone.