Category: SillyFaceSociety

The Faces of Silly Faces: Dinosaur Soup

People have submitted almost 40,000 silly faces to The Silly Face Society. Today, we are delighted to showcase one of our favourite players: Dinosaur Soup.

DinoCollage

Dinosaur doing the “choking hot sauce eye”.

Dinosaur joined the esteemed society in April of 2013 and has risen to the rank of Toucan Sam with 1185 monocles earned to date. She is a welcome face to new players, and has even alerted us to a few application bugs.

International “Mustached Man” of Mystery

Hailing from the great state of New Jersey, Dinosaur Soup is a stay-at-home mom of an awesome two year old girl. Naturally, she uses this an excuse make silly faces to her phone in public. Her friends don’t even give her a second glance when she’s making the Barf or the Flamboyant Metalhead.

She also enjoys more traditional games like Skyrim and branches out into RPGs and survival-horror. But with the Silly Face Society, it was love at first app-open. Her advice to new players? “Don’t be afraid to be the silliest you can be!”.

Dinosaur has challenged members with over 500 faces! We’ve prepared a short video celebrating her rise to silly stardom:

Can you out-silly Dinosaur Soup? Challenge her and others with the Silly Face Society on your iPhone, iPad or iPod!

Hardening node.js for production part 3: zero downtime deployments with nginx

Below I’ll talk about deploying new node.js code for a HTTP server without ever suffering downtime. This is part of our series on hardening node.js for production use in the Silly Face Society – see part 1 and part 2.

Suffering downtime to perform an upgrade always feels a bit backwards. We do it because it is too technically complicated relative to the expense of an outage. In the Silly Face Society, I’m willing to suffer brief outages for PostgreSQL and Redis upgrades but not for code that we control: our day-to-day bug fixes. Besides bug fixes, the frequency of node.js module updates would require an outage multiple times a week just to stay fresh.

To solve this, the basic idea using nginx’s built-in failover mechanism for upstream servers / processes to shift traffic away from processes that are restarting. Each upstream server will allow existing connections to finish before shutting down and restarting with the new code. We perform a zero downtime upgrade of all processes on the machine by by iterating over each process, shutting it down and bringing it back up. For the rest of the tutorial, I’ll assume you have nginx set up to proxy node requests ala part 2 and are using express.js to handle HTTP requests.

All of this sounds much more complicated than it is, so let’s dive into code:

Graceful Shutdowns in Express.js

We will first implement graceful shutdowns of the process. When our process receives a kill signal we want it to refuse new connections and finish existing ones. I’ll assume you have an express.js server set up with something like:

app = express.createServer()
...
app.listen(31337)

We can modify this slightly to perform graceful shutdowns on SIGTERM. Additionally, we’ll create a timeout that forcefully exits the process if connections are taking an unreasonable amount of time to close:

httpServer = app.listen(31337)
process.on 'SIGTERM', ->
  console.log "Received kill signal (SIGTERM), shutting down gracefully."
  httpServer.close ->
    console.log "Closed out remaining connections."
    process.exit()

  setTimeout ->
    console.error "Could not close connections in time, forcefully shutting down"
    process.exit(1)
  , 30*1000

In the above code, we extract the underlying http server object from express.js (the result of the app.listen call). Whenever we receive SIGTERM (the default signal from kill), we attempt a graceful shutdown by calling httpServer.close. This puts the server in a mode that refuses new connections but keeps existing ones open. If there is a connection hog that doesn’t quit within that time period, we perform an immediate exit (setTimeout does this after 30 seconds). Modify this timeout as appropriate. Note: I don’t use web sockets, but they would be considered connection hogs by the above logic. To achieve zero impactful downtime, you would have to close out these connections manually and have some nifty retry logic on the client.

There is one issue with the code: HTTP 1.1 keep-alive connections would also be considered “connection hogs” and continue to accept new requests on the same connection. Since I use keep-alive connections in nginx, this is a big problem. Ideally we would force node.js into a mode that closes all existing idle connections. Unfortunately, I can’t find any way of doing this with existing APIs (see this newsgroup discussion). Fortunately, we can add middleware that automatically sends 502 errors to new HTTP requests on the server. Nginx will handle the rest (see below). Here’s the modification:

app = express.createServer()
...
gracefullyClosing = false
app.use (req, res, next) ->
  return next() unless gracefullyClosing
  res.setHeader "Connection", "close"
  res.send 502, "Server is in the process of restarting"
...
httpServer = app.listen(31337)
process.on 'SIGTERM', ->
   gracefullyClosing = true
   ...

This should be mostly self-explanatory: we flip a switch that makes every new request stop with a 502 error. We also send a Connection: close header to hint that this socket should be terminated. As usual, this minimal example is available as a gist.

Ignoring Restarting Servers in Nginx

We will assume you have an nginx server with more than one upstream server in a section like:

upstream silly_face_society_upstream {
server 127.0.0.1:61337;
server 127.0.0.1:61338;
keepalive 64;
}

By default, if nginx detects an error (i.e. connection refused) or a timeout on one upstream server, it will fail over to the next upstream server. The full process is explained within the proxy_next_upstream section of the HttpProxy module documentation. The default is essentially the behaviour we want, modulo fail-overs on keep-alive connections. As mentioned above, we throw a 502 to indicate a graceful shutdown in progress. Insert a proxy_next_upstream directive like:

...
location @nodejs {
...
proxy_next_upstream error timeout http_502;
...
proxy_pass http://silly_face_society_upstream;
}
...

With the above addition nginx will failover to the next upstream whenever it gets an error, timeout or 502 from the current one.

Performing zero downtime deployments

Believe it or not, everything is in place to do zero downtime deployments. Whenever new code is pushed we have to bounce each process individually. To gracefully start the server with new code:

  1. Issue a SIGTERM signal (kill <pid> will do that)
  2. Wait for termination. As a simplification, wait the kill timeout and a bit of a buffer.
  3. Start the process up again.

That’s it: nginx will handle the hard work of putting traffic on the healthy processes! If you are running in a managed environment, you can even automate the restarts. I’ve put a new version of my child_monitor.coffee script from part 1 on github as a gist to show how you can go about it. The master process listens for a SIGHUP (indicating a code push). Whenever it receives the signal, it kills+restarts each monitored child with a short waiting period in between each kill.

Bingo bango. Ah, I forgot a shameless plug for our upcoming iPhone game: if you like silly faces, visit www.sillyfacesociety.com and get the app! It’s silltastic!

Continuous Integration Using Bill Cosby

Cosbybot in the Silly Face Society lobby
The Silly Face Society lobby featuring a very special guest. It isn’t Chris Young.

The Silly Face Society is our soon-to-be-released social iOS game where you advance in stature by sending silly faces and having friends guess the type of face you made. In this post, I’ll cover how we use Bill Cosby to continuously test our server code.

Testing and lazyness

Arg! Team made up of two part-timers and it is apparent that neither of us know anything about project management. We can’t scope, our estimates are off by an order of magnitude, and we refuse to cut features in the interests of launching. As a result Chris is often practicing “scope by art” while I practice “design by code”. A stark contrast to my old enterprise day job that makes me grind my teeth when it comes to writing tests. So, I decided early in this project not to write any tests, server or client. And our server uses a notoriously soupy dynamic language. Don’t do as I did.

Besides the guilt, one thought still kept me up at night: what if I unwittingly make a small change that completely breaks our iOS experience?

Enter Cosbybot

When I was younger, I would spend a lot of time on IRC interacting with Eggdrop trivia bots. Recently, it occurred to me: I can I fool myself into writing tests by calling them bots instead. My “test suite” would be a bot that could be challenged by players using our iOS client. And who better to make silly faces than Bill Cosby?

As I wrote Cosbybot, it occurred to me that bots come with some other benefits:

  • Whenever I need to demo to someone, I always have someone to play with that responds quickly.
  • Bots can be operated outside the server: if the server is acting oddly towards our users then the bot will catch it.
  • Writing a “clean room” client outside of my iOS app ensures I didn’t make bad assumptions.

Mechanics of Cosbybot

The following is a brief visual summary of “testing” the Silly Face Society by playing against Cosbybot:

1. A member of the Silly Face Society starts a game with Cosbybot
2. The member sends Cosbybot a silly face
3. Cosbybot guesses randomly at the silly face and taunts the submitter.
4. Cosbybot sends his own picture to be guessed

Cosbybot is a real account (with a login) that uses programmatic client. I implemented a simple state machine that advances along WAITING SUBMIT -> GUESSING -> SUBMITTING -> WAITING GUESS. Every thirty seconds, Cosbybot scans for all open rounds and tries to advance them by a step in the state machine, submitting a photo of Bill Cosby when required. If an error occurs (bad response code, bad login, bad serialization, unexpected format, server down, etc.) then an error is logged at “fatal” which fires me an e-mail using node’s winston module.

Cosbybot acts without internal knowledge and I keep it in an external script that interacts with the Silly Face Society server over HTTP (via node’s request module). The source for the bot is a bit too long for this post, but is available in this gist. Here’s the run loop:

fe = forwardError = (cb, fn) -> (err) ->
  return cb(err) if err?
  fn Array::slice.call(arguments, 1)...

execute = ->
  login fe reportError, ->
    refresh fe reportError, (responseJson) ->
      {rankName, id, monocles} = responseJson.currentPlayer
      winston.info "#{botName} (#{id}) rank: '#{rankName}' - #{monocles} monocles"
      async.forEachSeries responseJson.rounds,
        (round, callback) ->
          switch actionState(id, round)
            when ACTION_STATES.YOU_SUBMIT then submitPhoto round, callback
            when ACTION_STATES.YOU_GUESS
              guessUntilClosed round, fe callback, ->
                taunt round, callback
            else callback()
        fe reportError, -> reexecute()

To understand this properly, see the context in the gist.

Continuity in Continuous Integration

Urkelbot and Cosbybot together at long last.

So far, all of our “tests” require a human to send a submission or guess a photo. Fortunately, there is a quick way of removing the human element: introduce one more bot.

I created another Silly Face Society account named “Urkelbot” and manually challenged Cosbybot to a round. I then booted copies of Cosbybot and Urkelbot on our production and dev servers. In each scan, Urkelbot’s state machine is advanced by Cosbybot (and vice-versa). Since the states are circular, we get continuous external testing of each server build. If I introduce a regression that prevents the game from being played, one of the bots will send me an e-mail alerting me of the failure.

And there you have it: Bill Cosby tests our code all day. If you like Bill Cosby and silly faces then be the first to try The Silly Face Society on the iPhone.

P.S. how was this show possibly ever on television?