As part of a follow-up to last month's column about PHP vs Node.js, I hit some problems with Node.js under load.
As with all technologies, Node.js does have some limitations that may or may not be a problem for your specific use case.
If the last column comparing PHP and Node.js had a deeper message, it would be that if you want to scale, you have to know your stack. To be completely clear, when I say stack I mean the layers of technology used to serve HTTP requests.
One of the most common stacks out there is simply called LAMP: (L)inux, (A)pache2, (M)ySQL, (P)HP (or Perl). You now see a lot of references to LNMP, where Apache2 is replaced with Nginx. When building Node.js applications, things can vary a lot since Node.js comes with its own HTTP server. In my previous text, I used Node.js together with MySQL on a Linux box, so I guess we can dub that the LNM stack if we absolutely need to have a name for it.
And when I say know your stack, I mean that if you want to produce better than average performance numbers, you have to be better than average at understanding how the different parts of your stack work together.
There are hundreds of little things that most of us never knew mattered that suddenly become important when things come under load. As it happens, watching your application work under load is a great way to force yourself to know your stack a little better.
Background
When testing Apache/PHP against Node.js, I found that the raw performance of Node.js, as well as its ability to handle many concurrent clients, was excellent: faster and more scalable than Apache2/PHP. One reader pointed out that the test wasn't very realistic, since there was just one single resource being queried and no static content involved. Apache2/PHP could very well do relatively better if some of the content was static. So I set up a test to check this, and while running it, Node.js crashed. As in stopped working. As in would not serve any more HTTP requests without manual intervention. So to keep it short, Apache2/PHP won that round. But in the spirit of 'know your stack', we need to understand why Node.js crashed. The error message I got was this:
Unhandled 'error' event "events.js:71"
First of all, it took a fair amount of googling to figure out what the error message was really about. Or, rather, the error message was saying that something happened and there's no error handler for it. So good luck.
Fixing it.
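Before getting to the fix, it may help to see why that message appears at all. In Node.js, servers, sockets and file streams are event emitters, and emitting an 'error' event with no listener attached makes Node.js throw. The sketch below is only an illustration of that mechanism: the function names are hypothetical, the error text is a stand-in rather than my actual log, and it is written for a current Node.js release rather than 0.8.

```javascript
const { EventEmitter } = require('events');

// Emitting 'error' with no listener attached makes Node.js throw the error.
// In a long-running server nothing catches it, so the process exits.
function emitWithoutHandler() {
  const emitter = new EventEmitter();
  try {
    emitter.emit('error', new Error('too many open files'));
    return 'not thrown';
  } catch (err) {
    return 'thrown: ' + err.message;
  }
}

// With a listener attached, the same event is delivered to the handler
// and the process keeps running.
function emitWithHandler() {
  const emitter = new EventEmitter();
  let handled = null;
  emitter.on('error', (err) => {
    handled = err.message;
  });
  emitter.emit('error', new Error('too many open files'));
  return handled;
}

console.log(emitWithoutHandler());
console.log(emitWithHandler());
```

In a real server the equivalent would be attaching an 'error' listener to the server and stream objects involved, so that a failure is logged instead of taking the whole process down.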
The first indication I got via Google and Stack Overflow was that this might be an issue with Node.js versions before 0.8.22 and, sure enough, I was running 0.8.19. So the first thing I did was upgrade to version 0.8.22. But that did not fix the problem at all (though running a later and greater version is of course a nice side effect). With almost all other software involved being up to date, this actually required some structured problem solving.
Back to the drawing board
I eventually managed to trace the error message down to a 'too many open files' problem, which is interesting as it answers the crucial question: What went wrong? This happened at roughly 250 concurrent users with a test that was accessing 6 different static files. This is what it looks like in LoadImpact:
So, depending a little on timing and exactly when each request comes in, this would roughly indicate that some 1500 files (6 files times 250 users) can be open at the same time. Give or take. Most Linux systems are, by default, configured to allow a relatively small number of open files, e.g. 1024. The Linux command to check this is ulimit:
$ ulimit -n
1024
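Putting the rough estimate from the load test next to this limit, as a back-of-the-envelope sketch: the helper name is hypothetical, and the calculation ignores sockets and any other descriptors the process holds, which also count toward the same limit.

```javascript
// Hypothetical helper: worst-case number of files open at the same time,
// assuming every user has all of its files open at once.
function estimateOpenFiles(concurrentUsers, filesPerUser) {
  return concurrentUsers * filesPerUser;
}

const openFileLimit = 1024; // value reported by `ulimit -n` on the test machine
const estimate = estimateOpenFiles(250, 6); // 250 users, 6 static files each

console.log('estimated open files:', estimate);
console.log('over the limit:', estimate > openFileLimit);
```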
1024 is the default on a lot of distros, including Ubuntu 12.10 that I was running the tests on. So my machine had 1024 as the limit but it appears that I had 1500 files open at the same time. Does this make any sense? Well, sort of, there are at least 3 factors involved here that would affect the results:
So in theory, the limit for concurrent simulated browser users should be 256 or less. But in reality, I saw the number of concurrent users go all the way up to 270 before the Node.js process died on me. The explanation for that is most likely just timing. Not all VUs (virtual users) will hit the server at exactly the same time. In the end, hitting problems at about 250 concurrent users fits well with the open files limit being the problem. Luckily, the limit on the number of open files per process is easy to change:
$ ulimit -n 2048
The next test shows real progress. Here's the graph:
Problem solved (at least within the limits of this test).
Summary
Understanding what you build upon is important. If you choose to rely on Node.js, you probably want to be aware of how that increases your dependency on various per-process limitations in the operating system in general, and the maximum number of open files in particular.
You are more affected by these limitations since everything you do takes place inside a single process.
And yes. I know. There are numerous more or less fantastic ways to work around this particular limitation. Just as there are plenty of ways to work around limitations in any other web development stack.
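As one example of such a workaround, here is a minimal sketch that caps how many files the process has open at once by queueing reads. It is not what I ran in the test. The names createLimiter and readFileLimited are hypothetical, and the cap of 100 is an arbitrary assumption that would need tuning against the actual ulimit.

```javascript
const fs = require('fs');

// Hypothetical helper: runs at most `maxConcurrent` tasks at a time.
// Each task receives a `done` callback it must call when it has finished.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  function next() {
    while (active < maxConcurrent && waiting.length > 0) {
      const task = waiting.shift();
      active += 1;
      let finished = false;
      task(function done() {
        if (finished) return; // ignore accidental double calls
        finished = true;
        active -= 1;
        next(); // a slot is free, start the next queued task
      });
    }
  }

  return {
    run(task) {
      waiting.push(task);
      next();
    },
    stats() {
      return { active: active, queued: waiting.length };
    },
  };
}

// Assumed cap, chosen to stay well below a 1024 open files limit.
const limiter = createLimiter(100);

// Same shape as fs.readFile(path, callback), but queued behind the limiter.
function readFileLimited(path, callback) {
  limiter.run((done) => {
    fs.readFile(path, (err, data) => {
      done();
      callback(err, data);
    });
  });
}
```

Requests beyond the cap wait in memory instead of failing, which trades a crash for added latency under load.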
The key thing to remember is that when you select your stack, framework, language or server, you also select all the limitations that come with it. There's (still) no silver bullet, even if some bullets are better out of the box than others.
Having spent countless hours with other web development languages, I think I'm in a good position to compare and yes indeed! Node.js delivers some amazing performance. But at present, it comes with a bigger responsibility to 'know your stack' than a lot of the others.