This document explains how to launch a large-scale k6 test on a single machine without the need for distributed execution.
A common misconception among load testers is that distributed execution (the ability to launch a load test on multiple machines) is required to generate large load. This is not the case with k6.
k6 is different from many other load testing tools in the way it handles hardware resources. A single k6 process will efficiently use all CPU cores on a load generator machine. A single instance of k6 is often enough to generate a load of 30,000-40,000 simultaneous users (VUs). This number of VUs can generate upwards of 300,000 requests per second (RPS).
Unless you need more than 100,000-300,000 requests per second (6-18M requests per minute), a single instance of k6 will likely be sufficient for your needs.
Below we will explore what hardware and considerations are needed for generating different levels of load.
The following OS changes allow k6 to use the full network capacity of the machine for maximum performance.
These commands enable reusing network connections, increase the limit of network connections, and range of local ports.
To apply these changes, you can either paste these commands as a root user before running a k6 test or change the configuration files in your operating system.
For detailed information about these settings, the macOS instructions, and how to make them permanent, check out our "Fine-tuning OS" article.
The network throughput of the machine is an important consideration when running large tests. Many AWS EC2 machines come with a 1Gbit/s connection, which may limit the amount of load k6 can generate.
When running the test, you can use iftop in the terminal to view in real-time the amount of network traffic generated. If the traffic is constant at 1Gbit/s, your test is probably limited by the network card. Consider upgrading to a different EC2 instance.
Unlike many other load testing tools, k6 is heavily multi-threaded. It will effectively use all available CPU cores.
The amount of CPU you need depends on your test files (sometimes called test scripts). Regardless of the test file, you can assume that large tests require a significant amount of CPU power. We recommend that you size the machine to have at least 20% idle cycles (up to 80% used by k6, 20% idle). If k6 uses 100% of the CPU to generate load, it won't have enough CPU left to measure the responses correctly. This may cause the reported response times to be much higher than they are in reality.
k6 likes memory, but it isn't as greedy as other load testing tools. Memory consumption heavily depends on your test scenarios. To estimate the memory requirement of your test, run the test on your development machine with 100VUs and multiply the consumed memory by the target number of VUs.
Simple tests will use ~1-5MB per VU (1000 VUs = 1-5GB). Tests that use file uploads can consume tens of megabytes per VU.
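The estimation rule above is a linear extrapolation, which can be sketched as a small helper. The function name and the sample figures (300MB measured at 100 VUs) are hypothetical and only illustrate the arithmetic; real memory use may not scale perfectly linearly.

```javascript
// Hypothetical helper: extrapolate memory use from a small trial run.
// trialMemoryMB is the memory consumed by k6 during the trial run.
function estimateMemoryGB(trialMemoryMB, trialVUs, targetVUs) {
  const perVuMB = trialMemoryMB / trialVUs;
  // Decimal gigabytes, matching the "1000 VUs = 1-5GB" rule of thumb.
  return (perVuMB * targetVUs) / 1000;
}

// Assumed example: a 100-VU trial that used 300MB, scaled to 20,000 VUs.
console.log(estimateMemoryGB(300, 100, 20000)); // 60
```

Leave some headroom on top of the estimate, since the trial run may not exercise every code path of the full test.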
When running large stress tests, your script can't assume anything about the HTTP response. Often performance tests are written with a "happy path" in mind. For example, a "happy path" check like the one below is something that we see in k6 often.
Code like this runs fine when the system under test (SUT) is not overloaded and returns proper responses. When the system starts to fail, the above check won't work as expected.
The issue here is that the check assumes that there's always a body in the response. The r.body may not exist if the server is failing. In such a case, the check itself won't work as expected, and an error similar to the one below will be returned:
To fix this issue your checks must be resilient to any response type. This change will fix the above problem.
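A resilient check can be sketched as a predicate that guards every field before reading it. The helper name and the sample response objects below are hypothetical; they stand in for the response object that k6 passes to a check.

```javascript
// Hypothetical predicate for use inside a k6 check().
// It verifies the status and the body type before touching the body,
// so a failed response yields false instead of throwing a TypeError.
function homepageLooksOk(r, expectedLength) {
  return (
    Boolean(r) &&
    r.status === 200 &&
    typeof r.body === 'string' &&
    r.body.length === expectedLength
  );
}

// Stand-ins for a failed and a successful response.
const failedResponse = { status: 0, body: null };
const goodResponse = { status: 200, body: 'hello' };

console.log(homepageLooksOk(failedResponse, 5)); // false
console.log(homepageLooksOk(goodResponse, 5)); // true
```

In a k6 script, the predicate would be wrapped in a check, for example `check(res, { 'homepage ok': (r) => homepageLooksOk(r, expectedLength) })`, where `expectedLength` is whatever body size you expect from the SUT.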
If you are running a test for the first time, it's a good idea to keep an eye on the available resources while the test is running. The easiest way to do so is to SSH to the server with 3 sessions:
- To run k6
- To monitor CPU and memory
- To monitor the network
The k6 settings listed below will unlock additional performance benefits when running large tests.
If you are pushing the limits of the hardware, --compatibility-mode=base is the most impactful k6 setting you can enable.
k6 at its core executes ECMAScript 5.1 code. Most k6 script examples and documentation are written in ECMAScript 6+. By default, k6 transpiles ES6+ code to ES5.1 using babel and loads corejs to enable commonly used APIs. This works very well for 99% of use cases, but it adds significant overhead with large tests.
To get the best performance out of k6, it's best to transpile the scripts outside of k6 using webpack.
In k6-hardware-benchmark repository, we have prepared an efficient transpilation scheme that produces performant ES5.1 code for k6.
Use it like this:
Once your code is transpiled, run it like this:
k6 will use about 50-85% of the memory it needs when running the original script. Transpiling ahead of time will also reduce the CPU load and significantly decrease startup time.
You can tell k6 to not process the body of the response by setting discardResponseBodies in the options object like this:
k6 by default loads the response body of the request into memory. This causes much higher memory consumption and often is completely unnecessary. If you need response body for some requests you can set Params.responseType.
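A minimal sketch of the two settings working together follows. It assumes a script that targets test.k6.io and only needs the body of one request; treat the URLs and the request layout as placeholders for your own test.

```javascript
import http from 'k6/http';

export const options = {
  // Drop response bodies by default to save memory.
  discardResponseBodies: true,
};

export default function () {
  // The body of this response is discarded (res.body is null).
  http.get('https://test.k6.io/');

  // Opt back in for a single request whose body is needed.
  const res = http.get('https://test.k6.io/', { responseType: 'text' });
  if (typeof res.body === 'string') {
    // Inspect res.body here.
  }
}
```

This script is meant to be run with k6 itself, since it imports the k6/http module.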
If you are running a local test and streaming results to the cloud (k6 run -o cloud), you may want to disable the terminal summary (--no-summary) and local threshold calculation (--no-thresholds) because thresholds and the summary will be displayed in the cloud. This will save you some memory and CPU cycles.
Here are all the mentioned flags, all in one:
If everything else has failed and you are trying to squeeze more performance out of the hardware, you can consider optimizing the code of the load test itself.
Checks and groups
k6 records the result of every individual check and group separately. If you are using many checks and groups, you may consider removing them to boost performance.
Similar to checks, values for custom metrics (Trend, Counter, Gauge and Rate) are recorded separately. Consider minimizing the usage of custom metrics.
Thresholds with abortOnFail
If you have configured abortOnFail thresholds, k6 needs to evaluate the result constantly to verify that the threshold wasn't crossed. Consider removing this setting.
Special considerations must be taken when testing file uploads.
The network throughput of the load generator machine, as well as the SUT will likely be the bottleneck.
k6 needs a significant amount of memory when uploading files, as every VU is independent and has its own memory.
k6 can upload a large amount of data in a very short period of time. Make sure you understand the data transfer costs before commencing a large scale test.
Outbound data transfer is expensive in AWS EC2. The price ranges from $0.08 to $0.20 per GB depending on the region. In the cheapest region, uploading 1TB therefore costs about $80. A long-running test can cost several hundred dollars in data transfer alone.
The AWS EC2 instances are relatively cheap. Even the largest instance we have used in this benchmark (m5.24xlarge) costs only $4.6 per hour.
Make sure to turn off the load generator servers once you are done with your testing. A forgotten m5.24xlarge server will cost about $3,312 per month ($4.6 × 24 hours × 30 days).
Tip: it's often possible to launch "spot instances" of the same hardware for 10-20% of the cost.
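The cost figures above are simple multiplications, sketched below so you can plug in your own numbers. The constants are the prices quoted in this article and are assumptions for illustration; check current AWS pricing before relying on them.

```javascript
// Prices quoted in this article; AWS prices change, so treat them as illustrative.
const TRANSFER_USD_PER_GB = 0.08; // cheapest region
const INSTANCE_USD_PER_HOUR = 4.6; // m5.24xlarge

// Cost of outbound data transfer for a given volume in GB.
function transferCostUSD(gigabytes) {
  return gigabytes * TRANSFER_USD_PER_GB;
}

// Cost of keeping one instance running for a given number of hours.
function instanceCostUSD(hours) {
  return hours * INSTANCE_USD_PER_HOUR;
}

console.log(transferCostUSD(1000).toFixed(2)); // 1TB of uploads: "80.00"
console.log(instanceCostUSD(24 * 30).toFixed(2)); // 30 days of uptime: "3312.00"
```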
If you run into errors during the execution, it's good to understand if they were caused by the load generator or by the failing SUT.
An error similar to this one is caused by the target system resetting the TCP connection. This happens when the load balancer or the server itself isn't able to handle the traffic.
An error like this happens when k6 was able to send a request, but the target system didn't respond in time. The default timeout in k6 is 60 seconds. If your system doesn't produce the response in this time frame, this error will appear.
This is a similar error to the one above, but in this case, k6 wasn't even able to make a request. The target system isn't able to establish a connection.
This error means that the load-generator machine isn't able to open TCP sockets because it reached the limit of open file descriptors. Make sure that your limit is set sufficiently high; ulimit -n 250000 should be enough for most tests.
Note: you should decide what level of errors is acceptable. At large scale, some errors are always present. If you make 50M requests with 100 failures, this is generally a good result (0.0002% errors).
We have executed a few large tests on different EC2 machines to see how much load k6 can generate. Our general observation is that k6 scales proportionally to the hardware: a machine that is twice as large can generate twice as much traffic. The limit to this scalability is the number of open connections. A single Linux machine can open up to 65,535 sockets per IP. This means that a maximum of about 65k requests can be in flight simultaneously on a single machine. The RPS limit then depends on the response time of the SUT. If responses are delivered in 100ms, each socket can complete 10 requests per second, so the RPS limit is about 650,000.
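The socket-bound limit can be written as a one-line calculation: the number of requests that can be in flight at once, divided by the time each one takes. The helper below is a hypothetical sketch of that arithmetic, not a k6 API.

```javascript
// Upper bound on throughput when every in-flight request occupies one socket.
// responseTimeMs is the average response time of the SUT in milliseconds.
function maxRps(maxConcurrentRequests, responseTimeMs) {
  return (maxConcurrentRequests * 1000) / responseTimeMs;
}

console.log(maxRps(65535, 100)); // 655350, roughly the 650,000 quoted above
console.log(maxRps(65535, 500)); // 131070, a slower SUT lowers the ceiling
```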
Testing the theoretical limits is fun, but that's not the point of this benchmark. The point of this benchmark is to give users an indication of how much traffic k6 can generate when executing complicated, real-life tests. For this purpose, we have written a rather heavy real-life website test that uses almost all k6 features.
- All tests were executed on AWS EC2 instances
- The "discardResponseBodies" recommendation was NOT used. (results would be better with this setting).
- Scripts used for testing are available in the /scripts directory. The results are reproducible.
- k6 v0.26.2 was used
- Note: the target system (test.k6.io) was running on a large cluster to boost performance.
- Note: the target system (test.k6.io) is a slow-ish PHP website, not optimized for performance - a static website would be much quicker.
The "website.js" test file uses a wide range of k6 features to make the test emulate a real usage of k6. This is not a test rigged for performance - quite the opposite. This test uses plenty of custom metrics, checks, parametrization, batches, thresholds and groups. It's a heavy test that should represent well the "real life" use case.
> AWS m5.large EC2 server
The m5.large instance has 8GB of RAM and 2 CPU cores.
The following command was used to execute the test
- Maximum VUS reached: 6000
- Memory used: 6.09 GB (out of 8.0)
- CPU load (avg): 1.49 (out of 2.0).
- Peak RPS: ~6000 (note, this test was not optimized for RPS).
- 2x sleep(5) in each iteration.
> AWS m5.4xlarge
The m5.4xlarge instance has 64GB of RAM and 16 CPU cores.
- Maximum VUS reached: 20,000
- Memory used: 20.1 GB (out of 61.4)
- CPU load (avg): 8.5 (out of 16.0).
- Peak RPS: ~20,000 (note, this test was not optimized for RPS).
- 2x sleep(5) in each iteration.
> AWS m5.24xlarge
The m5.24xlarge has 384GB of RAM and 96 CPU cores. NOTE: sleep has been reduced to 1s instead of 5s to produce more requests.
- Maximum VUS reached: 30,000
- Memory used: ~120 GB (out of 370 available)
- CPU load (avg): ~45 (out of 96.0).
- Peak RPS: ~61,500.
- sleep(1) in each iteration.
As stated at the beginning, k6 can produce a lot of requests very quickly, especially if the target system responds quickly. To test the RPS limit of our app, we have written an RPS-optimized test. Unfortunately, our test.k6.io target system is a rather slow PHP app. Nevertheless, using 30k VUs we reached 188,000 RPS. Much higher numbers are possible for faster systems.
> AWS m5.24xlarge
- Maximum VUS reached: 30,000
- Memory used: 24 GB (out of 370 available)
- CPU load (avg): 80 (out of 96.0).
- Peak RPS: ~188,500.
k6 can utilize the available network bandwidth when uploading files, but it needs plenty of memory to do so.
Please read the warning about the cost of data transfer in AWS before commencing a large scale test.
> AWS m5.24xlarge
To test the network throughput we have written a file uploading script. We have executed this test for only 1 minute to minimize the data transfer costs. In 1 minute, k6 managed to transfer 36 GB of data with 1000 VUs.
- Maximum VUS reached: 1,000
- Memory used: 81 GB (out of 370 available)
- CPU load (avg): 9 (out of 96.0).
- Network throughput reached 4.7Gbit/s
- Data transferred: 36GB.
Note: each VU in k6 is completely independent, and therefore it doesn't share any memory with other VUs. 1000 VUs uploading a 26MB file needed as much as 81GB of RAM, since each VU holds its own copy of the file in memory.
In load testing, distributed execution refers to running a load test distributed across multiple machines.
Users often look for the distributed execution mode to run large-scale tests. Although we have shown that a single k6 instance can generate enormous load, distributed execution is necessary to:
- Simulate load from multiple locations simultaneously.
- Scale the load of your test beyond what a single machine can handle.
In k6, you can split the load of a test across multiple k6 instances using the execution-segment option. For example:
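As an illustration, the segment values for an even split can be generated with a small helper. The function below is a hypothetical sketch, not part of k6; it only builds the strings that would be passed to the --execution-segment and --execution-segment-sequence flags.

```javascript
// Hypothetical helper: builds the --execution-segment value for each of
// n instances, plus the --execution-segment-sequence value they all share.
function executionSegments(n) {
  // Boundary i of n, written as "0", "i/n", or "1".
  const point = (i) => (i === 0 ? '0' : i === n ? '1' : `${i}/${n}`);
  const segments = [];
  for (let i = 0; i < n; i++) {
    segments.push(`${point(i)}:${point(i + 1)}`);
  }
  const sequence = Array.from({ length: n + 1 }, (_, i) => point(i)).join(',');
  return { segments, sequence };
}

const split = executionSegments(4);
console.log(split.segments.join(' ')); // 0:1/4 1/4:2/4 2/4:3/4 3/4:1
console.log(split.sequence); // 0,1/4,2/4,3/4,1
```

Each instance would then be started with its own segment and the common sequence, for example `k6 run --execution-segment "0:1/4" --execution-segment-sequence "0,1/4,2/4,3/4,1" script.js` on the first machine, where `script.js` stands for your test file.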
However - at this moment - the distributed execution mode of k6 is not entirely functional. The current limitations are:
- k6 does not provide a test coordinator or master instance to coordinate the distributed execution of the test. Alternatively, you can use the k6 REST API and --paused to synchronize the multiple k6 instances' execution.
- Each k6 instance evaluates Thresholds independently - excluding the results of the other k6 instances. If you want to disable the threshold execution, use --no-thresholds.
- k6 reports the metrics individually for each instance. Depending on how you store the load test results, you'll have to aggregate some metrics to calculate them correctly.
The goal of k6 is to support a native open-source solution for distributed execution. If you want to follow the progress, subscribe to the distributed execution issue on GitHub.
If you aren't sure which solution, OSS or Cloud, is a better fit for your project, we recommend reading this white paper to learn more about the risks and features to consider when building a scalable solution.