Hardening node.js for production part 2: using nginx to avoid node.js load

This is part 2 of a quasi-series on hardening node.js for production systems (e.g. the Silly Face Society). The previous article covered a process supervisor that creates multiple node.js processes, listening on different ports for load balancing. This article will focus on HTTP: how to lighten the incoming load on node.js processes. Update: I’ve also posted a part 3 on zero downtime deployments in this setup.

Our stack consists of nginx serving external traffic by proxying to upstream node.js processes running express.js. As I’ll explain, nginx is used for almost everything: gzip encoding, static file serving, HTTP caching, SSL handling, load balancing and spoon-feeding clients. The idea is to use nginx to prevent unnecessary traffic from hitting our node.js processes. Furthermore, we remove as much overhead as possible for traffic that has to hit node.js.

Too much talk. Here is our nginx config:

 http {
    proxy_cache_path  /var/cache/nginx levels=1:2 keys_zone=one:8m max_size=3000m inactive=600m;
    proxy_temp_path /var/tmp;
    include       mime.types;
    default_type  application/octet-stream;
    sendfile        on;
    keepalive_timeout  65;

    gzip on;
    gzip_comp_level 6;
    gzip_vary on;
    gzip_min_length  1000;
    gzip_proxied any;
    gzip_types text/plain text/html text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
    gzip_buffers 16 8k;
 
    upstream silly_face_society_upstream {
      server 127.0.0.1:61337;
      server 127.0.0.1:61338;
      keepalive 64;
    }

    server {
        listen 80;
        listen 443 ssl;

        ssl_certificate /some/location/sillyfacesociety.com.bundle.crt;
        ssl_certificate_key /some/location/sillyfacesociety.com.key;
        ssl_protocols        SSLv3 TLSv1;
        ssl_ciphers HIGH:!aNULL:!MD5;

        server_name sillyfacesociety.com www.sillyfacesociety.com;

        if ($host = 'sillyfacesociety.com' ) {
                rewrite  ^/(.*)$  http://www.sillyfacesociety.com/$1  permanent;
        }

        error_page 502  /errors/502.html;

        location ~ ^/(images/|img/|javascript/|js/|css/|stylesheets/|flash/|media/|static/|robots.txt|humans.txt|favicon.ico) {
          root /usr/local/silly_face_society/node/public;
          access_log off;
          expires max;
        }

        location /errors {
          internal;
          alias /usr/local/silly_face_society/node/public/errors;
        }

        location / {
          proxy_redirect off;
          proxy_set_header   X-Real-IP            $remote_addr;
          proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
          proxy_set_header   X-Forwarded-Proto $scheme;
          proxy_set_header   Host                   $http_host;
          proxy_set_header   X-NginX-Proxy    true;
          proxy_set_header   Connection "";
          proxy_http_version 1.1;
          proxy_cache one;
          proxy_cache_key sfs$request_uri$scheme;
          proxy_pass         http://silly_face_society_upstream;
        }
    }
}

Also available as a gist.

Perhaps this code dump isn’t particularly enlightening: I’ll try to step through the config and give pointers on how this balances the express.js code.

The nginx <-> node.js link
First things first: how can we get nginx to proxy / load balance traffic to our node.js instances? We’ll assume that we are running two instances of express.js on ports 61337 and 61338. Take a look at the upstream section:

http {
    ...
    upstream silly_face_society_upstream {
      server 127.0.0.1:61337;
      server 127.0.0.1:61338;
      keepalive 64;
    }
    ...
}

The upstream directive specifies that these two instances work in tandem as an upstream server for nginx. keepalive 64; tells each nginx worker to keep up to 64 idle HTTP/1.1 connections to the upstream open for reuse. This is a cache, not a cap: if there is more traffic, nginx will open additional connections to the upstream and simply close idle ones once they exceed the limit.

upstream alone is not sufficient – nginx needs to know how and when to route traffic to node. The magic happens within our server section. Scrolling to the bottom, we have a location / section like:

http {
    ...
    server {
        ...
        location / {
          proxy_redirect off;
          proxy_set_header   X-Real-IP            $remote_addr;
          proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
          proxy_set_header   Host                   $http_host;
          proxy_set_header   X-NginX-Proxy    true;
          ...
          proxy_set_header   Connection "";
          proxy_http_version 1.1;
          proxy_pass         http://silly_face_society_upstream;
        }
        ...
    }
}

This section is a fall-through for traffic that hasn’t matched any other rules: node.js handles the request and nginx proxies the response. The most important part of the section is proxy_pass – this tells nginx to use the upstream server that we defined higher up in the config. Next in line is proxy_http_version, which tells nginx to use HTTP/1.1 for connections to the upstream; together with proxy_set_header Connection ""; (which clears the Connection header so keepalive connections actually stay open), this spares the overhead of establishing a new connection between nginx and node.js for every proxied request and has a significant impact on response latency. Finally, we have a couple of proxy_set_header directives to tell our express.js processes that this is a proxied request and not a direct one. Full explanations can be found in the HttpProxyModule docs.
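To make those forwarded headers useful on the node side, express.js has to read them back out. As a hedged sketch (the `clientInfo` helper and its plain-object interface are my own illustration, not from the post), the unpacking might look like this:

```javascript
// Hedged sketch: recover the client details that nginx forwards via
// X-Real-IP, X-Forwarded-For and X-Forwarded-Proto. clientInfo is a
// hypothetical helper, not part of the Silly Face Society codebase.
function clientInfo(headers) {
  const forwardedFor = headers["x-forwarded-for"] || "";
  return {
    // X-Real-IP is set directly by nginx; otherwise take the left-most
    // (original client) entry of X-Forwarded-For.
    ip: headers["x-real-ip"] || forwardedFor.split(",")[0].trim(),
    // X-Forwarded-Proto tells the app whether nginx terminated SSL.
    secure: headers["x-forwarded-proto"] === "https"
  };
}

console.log(clientInfo({
  "x-real-ip": "203.0.113.7",
  "x-forwarded-proto": "https"
}));
```

In a real express.js app you would do this in middleware so downstream routes see the true client IP; newer versions of express also expose a built-in "trust proxy" setting for the same purpose.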

This part of the config is the minimum amount needed to get nginx serving port 80 and proxying our node.js processes underneath. The rest of this article will cover how to use nginx features to lighten the traffic load on node.js.

Static file intercept
Although express.js has built-in static file handling through some connect middleware, you should never use it. Nginx does a much better job of handling static files and prevents requests for non-dynamic content from clogging our node processes. The location directive in question is:

http {
    ...
    server {
        ...
        location ~ ^/(images/|img/|javascript/|js/|css/|stylesheets/|flash/|media/|static/|robots.txt|humans.txt|favicon.ico) {
          root /usr/local/silly_face_society/node/public;
          access_log off;
          expires max;
        }
        ...
    }
}
 

Any request with a URI starting with images/, img/, css/, js/, ... will be matched by this location. In my express.js directory structure, the public/ directory stores all my static assets – things like CSS, javascript and the like. Using root, I instruct nginx to serve these files without ever talking to the underlying servers. The expires max; directive is a caching hint that these assets are immutable. For other sites, it may be more appropriate to use a quicker cache expiry through something like expires 1h;. Full information can be found in nginx’s HttpHeadersModule.

Caching
In my opinion, any caching is better than no caching. Sites with extremely heavy traffic will use all kinds of caching solutions including varnish for HTTP acceleration and memcached for fragment caching and query caching. Our site isn’t so high-traffic but caching is still going to save us a fortune in server costs. For simplicity of configuration I decided to use nginx’s built-in caching.

Nginx’s built in caching is crude: when an upstream server provides HTTP header hints like Cache-Control, it enables caching with an expiry time matching the header hint. Within the expiry time, the next request will pull a cached file from disk instead of hitting the underlying node.js process. To set up caching, I have set two directives in the http section of the nginx config:

http {
    ...
    proxy_cache_path  /var/cache/nginx levels=1:2 keys_zone=one:8m max_size=3000m inactive=600m;
    proxy_temp_path /var/tmp;
    ...
}

These two lines instruct nginx that we are going to use it in caching mode. proxy_cache_path specifies the root directory for our cache, the directory depth (levels), the max_size of the cache and the inactive expiry time. More importantly, it specifies the size of the in-memory zone for cache keys through keys_zone. When nginx receives a request, it computes an MD5 hash of the cache key and uses this zone to find the corresponding file on disk. If no cached copy is available, the request will hit our underlying node.js processes. Finally, to make our proxied requests use this cache, we have to change the location / section to include some caching information:

http {
  server {
     ...
     location / {
          ...
          proxy_cache one;
          proxy_cache_key sfs$request_uri$scheme;
          ...
     }
     ...
  }
}

This instructs nginx to use our one keys_zone to cache incoming requests, with MD5 hashes computed from the proxy_cache_key pattern – here, the request URI plus the scheme.

We have one piece missing: express.js will not serve the HTTP cache hint headers that nginx looks for. I wrote a quick piece of middleware to provide this functionality.

cacheMiddleware = (seconds) -> (req, res, next) ->
    res.setHeader "Cache-Control", "public, max-age=#{seconds}"
    next()

It is not appropriate to apply this middleware globally – certain requests (e.g. post requests that affect server state) should never be cached. As such, I use it on a per-route basis in my express.js app:

...
app.get "/recent", cacheMiddleware(5 * 60), (req, res, next) ->
  #When someone hits /recent, nginx will cache it for 5 minutes!
...

GZIP
GZIP is a no-brainer for HTTP. By compressing responses, clients spend less time tying up your server’s connections and everyone saves money on bandwidth. You could use some express.js middleware to handle gzipping of outgoing responses, but nginx will do a better job and leave express.js with more resources. To enable GZIP in nginx, add the following lines to the http section:

http {
    ...
    gzip on;
    gzip_comp_level 6;
    gzip_vary on;
    gzip_min_length  1000;
    gzip_proxied any;
    gzip_types text/plain text/html text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
    gzip_buffers 16 8k;
    ...
}

I won’t go into details on what these directives do. Like caching, any gzip is better than no gzip. For more control, there are a thousand micro-optimizations that you can perform that are all documented well under nginx’s HttpGzipModule.

SSL
Continuing our theme of leaving node.js to handle only basic HTTP, we arrive at SSL. As long as upstream servers are within a trusted network, it doesn’t make sense to encrypt traffic further than nginx – node.js can serve HTTP traffic for nginx to encrypt. This setup is easy to configure; in the server directive you can tell nginx how to configure SSL:

http {
   ...
   server {
        ...
        listen 443 ssl;
        ssl_certificate /some/location/sillyfacesociety.com.bundle.crt;
        ssl_certificate_key /some/location/sillyfacesociety.com.key;
        ssl_protocols        SSLv3 TLSv1;
        ssl_ciphers HIGH:!aNULL:!MD5;
        ...
   }
}

listen 443 ssl; tells nginx to accept SSL traffic on port 443. ssl_certificate and ssl_certificate_key tell nginx where to find the certificate and private key for your server. The ssl_protocols and ssl_ciphers lines instruct nginx on how to negotiate connections. Full configuration options are available in the nginx HttpSslModule documentation.

We are almost there. The above configuration will get nginx decrypting traffic and proxying unencrypted requests to our upstream server. However, the upstream server may need to know whether the request arrived in a secure context. This can be used to serve SSL-enabled assets from CDNs like Cloudfront, or to reject requests that arrive unencrypted. Add the following line to the location / section:

http {
   ...
   server {
       ...
       location / {
          ...
          proxy_set_header   X-Forwarded-Proto $scheme;
          ...
        }
      }
   }
}

This will send an HTTP header hint down to your node.js processes. Again, I whipped up a bit of middleware that makes SSL detection a little bit easier:

app.use (req, res, next) ->
  req.forwardedSecure = (req.headers["x-forwarded-proto"] == "https")
  next()

Within your routes, req.forwardedSecure will be true iff nginx is handling HTTPS traffic. For reference, the silly face society uses SSL for Facebook authentication token exchange when a user logs in using SSO (single sign on) on their phone. As an extension of implementing this, I also threw up a secure version of the site here.

Wrapping up
Phew. We covered how to (1) set up node.js as an upstream server for nginx and (2) how to lighten the load on the upstream server by letting nginx handle load balancing, static files, SSL, GZIP and caching. Caveat emptor: the silly face society hasn’t launched yet. The above configuration is based on personal testing and research: we haven’t reached production-level traffic yet. If anyone is still reading, I welcome suggestions for improvements in the comments.

Incidentally, our use of express.js for HTTP traffic is an accident. We started using node.js to provide a socket-server for the real time “party mode” of the silly face society. As we got close to launching, we decided to add a Draw Something-esque passive mode so that the Silly Face Society could have the launch inertia to be successful. Instead of rewriting our technology stack, we reused as much of the old code as possible and exposed an HTTP interface. If I had to do it all from scratch, I would re-evaluate our choices: node.js is a hard beast to tame for CRUD-apps.

Does your interest in nginx align with your interest in silly faces? Try out the Silly Face Society on the iPhone.

This article has also been translated to the Serbo-Croatian language by Anja Skrba from Webhostinggeeks.com.

  • ejeklint

    Excellent! Thank you for sharing, this is very nice stuff.

  • Mickey

    best article I have read yet describing how to front node.js with nginx. thank you. only thing I disagree with is writing your node code in coffeescript. but that’s a different discussion. ;-)

    • http://twitter.com/cosbynator Thomas Dimson

      I happen to have a fondness for obscure punctuation :)

      • Jonjon Taka

        Great reference but I agree coffeescript should never be the default when writing nodejs code for an article, though it could be provided as an additional alternative and not the other way around. The code police will soon be distributing fines to developers who snub pure JS notation on public websites :)

    • http://twitter.com/utuxia Utuxia Consulting

      yes, jade and coffeescript are painful :)

  • James

    I’m also building a Node app, but it is using web sockets for real time stuff, I’ve heard that Nginx cannot support web sockets, is there a way of using your setup described but it somehow supporting web sockets too? Then it would be the ultimate solution.

    • http://twitter.com/cosbynator Thomas Dimson

      Not out of the box, but from what I’ve read you can throw Varnish or HAProxy in front of Nginx and route the websocket connections separately. Nginx 1.3 is also around the corner which is supposed to support them (and SPDY!).

      In the Silly Face Society, we have some (true, not web) socket servers that are hosted by node.js. For reasons that I’ll probably explain in an upcoming architecture article, I run them on each server independently without any proxy or load balancer. If it makes sense for your use case, you could do something similar and have the websockets route directly to machines.

  • ralphtheninja

    Thanks for an awesome article. Answered many questions that I had cached and also questions I didn’t know I had!

  • ItsLeeOwen

    why use nginx at all? i thought nodejs is intended to serve http requests?

    • tdimson

      You don’t /need/ nginx but you’ll lighten the load on your node processes by leaving some of the heavier lifting (SSL, caching, gzip) to nginx. I have a lot more confidence in the nginx modules than I do in most node.js libraries.

      • ItsLeeOwen

        gotcha, do you think your confidence in nginx over node modules comes from lifespan of the tools, size of contributing community, and/or would you say there are other factors as well?

        • tdimson

          That is a bit tough to answer. In this particular case, nginx has much wider adoption than node.js and is engineered (partially) as a reverse proxy. The modules I use are compiled in by default and are used all over the web. With node you would usually use different connect middleware packages to get this functionality – this is far removed from the core HTTP functionality.

          Without knowing the particulars of each module, I am hesitant to call them all out as “immature”. That said, I’ve had widely varying experiences with different node modules: not all of them are ready for production (see my experiences with connection pooling or S3 uploads). I think it is just the nature of the community right now: lots of people pulling in different directions and lots of abandoned projects on github.

  • James

    Good article. node.js is way far away to be a main http server in a production environment and it’s ridiculous to serve static contents through node.js. Your solution is by far the best combination to utilize node.js in a production environment. I am also wandering why not only route trafic to node.js only if the request matches certain rules and leave default requests to nginx(assume all you node.js codes are under /njs):
    location ~ ^/(njs/) {

    proxy_pass;
    }

    • http://twitter.com/cosbynator Thomas Dimson

      That would work too! For better or for worse, I’m using node to handle my primary HTTP traffic so it makes sense for it to be the default route (i.e. not namespaced by /njs). If you are using node.js for smaller parts of a large app (say file uploads), then a namespaced route to it makes a lot of sense.

      At this point I think it makes more sense to only use it for small parts of a bigger app (like file uploads, or certain real-time aspects), but that’s only in hindsight.

  • http://twitter.com/utuxia Utuxia Consulting

    Nice article…i was wondering this. Can explain how a CDN will fit into this mix?

    • http://twitter.com/cosbynator Thomas Dimson

      CDN? There isn’t much to do with a CDN, you would just upload your resources (or have a build script that does) and then have node link to the CDN servers instead of a relative path. They should just be a more available / faster host for static content.

  • aqarynet

    great article……great way to express the apps really impreeesed.

  • Nathan

    Thank. You.

  • http://www.facebook.com/danielhough Daniel Hough

    Guys – awesome guide to getting a good, safe front for node.js using nginx. Saved me hours and hours and hours of messing around with haproxy like I was before. This is a much, much better solution.

  • http://twitter.com/tehsuck Chiefus Baltar

    This is a very helpful write-up for me. I have a web service written w/ Express.js and I want to offload some of the heavier crap somewhere else. Thanks for the article!

  • http://twitter.com/ngrilly Nicolas Grilly

    You wrote that node.js is not a good fit for CRUD apps. I’m curious to know why? Anyways, thanks for sharing this excellent nginx config!

    • http://twitter.com/cosbynator Thomas Dimson

      Well, I could launch into a long rant about tradeoffs. In a nutshell:

      Some benefits I see are:
      - A reasonable, way of expressing parallelism (async.js is pretty great – and it is nice to make external service calls without worrying about whether I am blocking requests)
      - Excellent programming model for realtime-apps (we used it for our “party mode” and still have no complaints)
      - Same code on server and client? Although, for us the client is objective C so that argument kind of goes away.

      Some issues I see are:
      - Really, really immature libraries for a lot of things (the main SQL ORM still doesn’t have transaction support. I had to submit a patch upstream to get painless connection pooling. I had to write my own S3 put that didn’t load things into memory. I had to write my own process monitor and bouncing). Of course, this is somewhat subjective but I got burned a few times during development.
      - Dealing with all the callbacks is a bit awkward – I think this is fine when you are dealing with a small component of an app, but Silly Face Society quickly became callback soup as we grew. If you are pragmatic, people argue that it doesn’t happen. Maybe that’s true.

      There are similar issues with Rails or Django, but in those cases you have over 5 years of people making CRUD apps on the platform – it is explored territory.

  • Anton Stoychev

    Great article. Enough detail on explanations and still in a good pace.

  • Tom Dworzanski

    This is a great explanation on how to build put an Nginx reverse proxy in front of Node.js. The only issue is dealing with Web Sockets which are referenced in Nginx’s docs on the topic: 

    http://nginx.org/en/docs/http/websocket.html

    That being said, I don’t think there is any point to using an Nginx reverse proxy if you can just as easily use a pure JavaScript/Node reverse proxy while getting similar performance. One of the major reasons so many people love Node is because it allows them to use JavaScript everywhere. Extending the use of JavaScript to the reverse proxy just seems natural. Node is very well tested in large production environments and should be able to serve static data just as well as Nginx with the benefit of having a pure JavaScript stack. Until someone can show me some serious reason (benchmarks, proven security issues, etc), I’m of the opinion that it’s best to stick with a Node-based reverse proxy.

    • http://www.linkedin.com/in/matthewtagg Matt Tagg

      Have you been convinced otherwise Tom? Otherwise I’m somewhat inclined to agree.

      • http://dworzanski.com/ Tom Dworzanski

        Hi Matt. Sorry for the late reply. I tend to re-research this topic every now-and-then to see if there are any new thoughts.

        Personally, I continue to believe that Nginx is unnecessary. I appreciate the link and the reasons presented, but they are not technically compelling to me.

  • http://arnodo.net/ Alessandro Arnodo

    which is the benefit of running two instance of node on the same machine instead of only one instance?

    • pedros

      i think its for matter of “multi-processing” as he pointed out in the article. Not sure i fully get this one but …

    • tdimson

      One is to utilize multiple processors, but you could also use node’s ‘cluster’ module for that. In part 3, I show how to use two instances to cut downtime during rollouts – you bounce one, keeping the other available while it reboots. In general, node processes are pretty fragile and it is good to have redundancy :)

  • Jim

    Why is “cacheMiddleware = cacheMiddleware”?

  • Robin

    Hi, this is good. I have a trouble though, my HTTP trailers are not sent when my app is behind nginx. I use the addTrailers method from nodejs, works great in dev, but doesn’t work in production (behind nginx). HTTP trailers are not sent ! Any idea where it comes from ?

  • absuk

    Hi, loved the post, thank you very much for sharing. just wanted to see if you have an idea on this. i keep on getting a problem when i place

    location ~ ^/(images/|img/|javascript/|js/|css/|stylesheets/|flash/|media/|static/|robots.txt|humans.txt|favicon.ico) {
    root /usr/local/silly_face_society/node/public;
    access_log off;
    expires max;
    }

    in my nginx config, in the logs i get an error permission denied. and when i look at the application, the links are all broken. All i can think, is that Express.js is not configured correctly, i mean the path needs to be set for the static files.. please let me know if this is the case.

    • http://oskarhane.com/ Oskar Hane

      It sounds like nginx doesn’t have read permission to that folder.

  • andrewvijay

    wow !! best read in a grueling 4 hrs. Its 3 am and feeling proud.!! yayy..!!!!

  • jim

    So do you have any data to prove that going via Nginx is more performant than just node.js for ssl, gzip and caching (although not the noddy case with static resources that should be on CDN)?

  • Avi Kohn

    This is one of the only article series on using Node in production, I found it extremely useful in my own application.

    The application was hosting multiple sites on one server, so I made a few changes to the nginx configuration.

    Here’s what I did:

    root /var/www/sites/{{site}}/static;

    location ~ ^/(images/|img/|fonts/|javascript/|js/|css/|stylesheets/|flash/|media/|static/|robots.txt|humans.txt|favicon.ico) {
    access_log off;
    expires max;

    try_files $uri $uri/ @node;
    }

    location / {
    try_files $uri @node;
    }

    include /var/www/nginx/conf/proxy.conf;

    Where proxy.conf is:

    location @node {
    proxy_redirect off;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header Host $host;
    proxy_set_header X-NginX-Proxy true;
proxy_set_header Connection "";
    proxy_http_version 1.1;
    proxy_pass http://node_upstream;

    expires off;
    }

    This lets nginx serve static files/directories when possible and only send requests down to node when absolutely necessary. It also logs all requests NOT to the “approved” static resources so caching can be adjusted later.