Product · 03 October 2017

Comparing load testing tools

Ragnar Lönn

Are you on the lookout for a load testing tool, but fear the time investment you’re going to have to make to install, configure and then evaluate a bunch of different tools?

We have done much of the hard work for you! As part of our own open source load testing tool review article series (article 1: the review, article 2: the benchmark and article 3: benchmarks v2.0) we created containerized installations for a dozen different load testing tools, plus a utility shell script for benchmarking the tools and comparing their relative performance.

The only thing you need to install in order to test all these different load testing tools is Docker (Docker installation instructions).

Once you have Docker installed, there are basically two things you can do:

  • Use the public Docker images to run each tool separately, on the command line
  • Use the runtests.sh shell script to run a set of benchmarks for some, or all, of the tools

We will go through both options below. First, how to use the public Docker images to try out all the tools, one by one:

Using the public Docker images

To run a simple, 5-second load test using Apachebench, you can simply run:

docker pull loadimpact/loadgentest-apachebench
docker run loadimpact/loadgentest-apachebench -t 5 http://test.loadimpact.com/

Try it! As long as you have Docker installed, it should work right out of the box and execute a 5-second load test hitting the index page on test.loadimpact.com:

[Screenshot: output of the 5-second Apachebench test]

(You don't actually need to issue the docker pull command; it happens automatically the first time you run the docker run command.)

But wait, there is more!

Here are some (working!) command lines you can use right away to run the other 11 tools:

Wrk

docker run loadimpact/loadgentest-wrk http://test.loadimpact.com/

k6 (needs a script to execute, so we fetch a sample script)

curl -o k6script.js http://bit.ly/2yFjQun
docker run -i loadimpact/loadgentest-k6 run - <k6script.js
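
(If you also want to control the load level and length of the k6 test, you can pass flags before the "-" that tells k6 to read the script from standard input; the values below are just examples:)

docker run -i loadimpact/loadgentest-k6 run --vus 10 --duration 30s - <k6script.js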

Vegeta (outputs binary data by default, needs to be invoked twice to parse output)

echo "GET http://test.loadimpact.com/" |docker run -i loadimpact/loadgentest-vegeta attack | docker run -i loadimpact/loadgentest-vegeta report

Locust (needs a script to execute, so we fetch a sample script)

curl -o locustfile.py http://bit.ly/2g1oirK
docker run -v $(pwd):/tmp2 loadimpact/loadgentest-locust --no-web --clients=5 --hatch-rate=5 --num-request=1000 --host=http://test.loadimpact.com --locustfile=/tmp2/locustfile.py

Artillery

docker run loadimpact/loadgentest-artillery quick --count 10 -n 20 http://test.loadimpact.com/

Gatling (needs a simulation to execute, so we fetch a sample simulation)

curl -o MySimulation.scala http://bit.ly/2wP8EXz
docker run -v $(pwd):/tmp2 loadimpact/loadgentest-gatling -sf /tmp2 -s MySimulation

JMeter (needs a test plan to execute, so we fetch a sample test plan)

curl -o jmeterplan.xml http://bit.ly/2xx8g41
docker run -v $(pwd):/tmp2 loadimpact/loadgentest-jmeter jmeter -n -t /tmp2/jmeterplan.xml

Hey

docker run loadimpact/loadgentest-hey -c 10 http://test.loadimpact.com/

Tsung (needs a configuration file to execute, so we fetch a sample configuration file)

curl -o tsung.xml http://bit.ly/2wNszpF
docker run -v $(pwd):/tmp2 loadimpact/loadgentest-tsung -f /tmp2/tsung.xml start

Siege

docker run loadimpact/loadgentest-siege -t 5S http://test.loadimpact.com/

Grinder (needs a properties/config file AND a script to execute, so we fetch two sample files)

curl -o grinder.properties http://bit.ly/2wOh9lG
curl -o grinder.py http://bit.ly/2hBdQYv
docker run -v $(pwd):/tmp2 loadimpact/loadgentest-grinder /tmp2/grinder.properties

The Docker images are based on Alpine Linux and are as "bare bones" as possible, containing only what is needed to run the tool in question.
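
If you want to pre-fetch all twelve images in one go (handy on a slow connection, but optional, since docker run pulls any missing image automatically), a small shell loop like the one below does the job. The image names are the ones used throughout this post:

for tool in apachebench wrk k6 vegeta locust artillery gatling jmeter hey tsung siege grinder; do
  docker pull "loadimpact/loadgentest-$tool"
done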

Running some benchmarks

You can also use the shell script runtests.sh to more easily run a set of benchmarks for some, or all, of the tools, and get nicely tabulated comparison charts.

Just do:

git clone https://github.com/loadimpact/loadgentest
cd loadgentest
./runtests.sh

You will be shown a menu that allows you to specify a target URL, the concurrency level (usually the number of VUs), the test duration (seconds) and the (max) number of requests.

[Screenshot: the runtests.sh menu]

  • The only thing you have to do before starting a load test is to choose option 1 and set the target URL. This is the only URL that will be hit during the load test(s).
  • Then you can press a-i or A-D to start a load test against the URL, using a specific tool.
  • ...or you can choose 5-7 and run a whole suite of tests in sequence. This will give you a table with results from all the tools that were run, for easy comparison.
  • (For advanced users: the script also checks for the environment variables TARGETURL, CONCURRENT, REQUESTS and DURATION, and uses their values as defaults; see the example right after this list)
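
As a sketch of that advanced-user option: setting those variables before launching the script should pre-populate the corresponding menu defaults. The values below are just examples:

TARGETURL=http://test.loadimpact.com/ CONCURRENT=20 DURATION=10 REQUESTS=1000 ./runtests.sh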

The "R" option (Add network delay) can only be used if you're on Linux and have netem on your system.

Here is a screenshot of the output after I chose option "6" (Run all static-URL tests):

[Screenshot: results table from running all static-URL tests]

So what does it do?

The script - runtests.sh - runs the various load testing tools, trying to use parameters that are as similar as possible when executing each tool, so that the results are comparable. It is not always possible to use comparable configurations, because the tools work differently and offer different configuration options, but we try. There may also be things I have misunderstood about some of the tools, which would mean that runtests.sh fails to configure them correctly or fairly. Any bug reports (i.e. GitHub issues) or patches are most welcome!

Apart from the target URL, we try to set these parameters when executing a tool:

  • Concurrency (how many HTTP/1.x requests the tool can issue in parallel)
  • Duration (how many seconds the tool should be running for)
  • Number of requests (how many HTTP requests to make, in total)

Note that for some tools, Duration is the only thing limiting the length of the test (they ignore the Number of requests setting), while for others it is the opposite. This is annoying; it would be nice if it were possible to define e.g. both a min/max duration and a min/max number of requests for each test. You often want enough samples to get statistical significance, which may mean that the faster tools can run for shorter periods of time, while the slower ones have to run longer.

Concurrency affects different tools somewhat differently too. Wrk, for instance, allows you to set both the number of concurrent TCP connections Wrk may use, and the number of OS threads Wrk should spawn. Because we configure Wrk to use Concurrency number of threads AND Concurrency number of TCP connections, it could be that for large Concurrency values, Wrk will spawn a lot of threads that might at some point start to hamper performance. Vegeta has no fixed concurrency setting, which actually makes the benchmark numbers for Vegeta somewhat meaningless, I suppose (I haven't included Vegeta in my benchmarking articles either, for that reason). Other tools are often not entirely clear on exactly how their concurrency setting works - it may be just the number of concurrent TCP connections used by a single executing thread (whether OS thread or something else), or a combination of multiple threads and connections.
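
To make the Wrk case concrete: assuming the Docker image passes its arguments straight to wrk (as in the one-liner earlier in this post), a Concurrency setting of 20 would correspond to something like the command below, where -t is the number of OS threads and -c the number of TCP connections:

docker run loadimpact/loadgentest-wrk -t 20 -c 20 -d 10s http://test.loadimpact.com/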

It is also worth noting that running any of these tools inside a Docker container tends to reduce performance by a certain amount (I have seen ~40% reduction in RPS rates), so the results you get will not be directly comparable to results from running the same tool natively on the host machine.
