Automation, a hot topic in the broader testing community and something of a holy grail, is the end goal for many organizations when it comes to understanding performance over time. However, where to start, which tool to choose, and how to get there is not always straightforward, especially if you don't already have a lot of experience in performance engineering.
This guide aims to lay down the steps and best practices for achieving your goal of performance testing automation.
Why automate performance tests?
Let’s start by examining why you would consider automating your performance tests. To do that we need to revisit why we run performance tests in the first place:
- Avoid launch failures leading to a missed opportunity window and wasted investments, e.g. your app or site crashing during a high-profile product launch event.
- Avoid a bad user experience that leads visitors and customers to go with the competition, ultimately losing you revenue, e.g. churning hard-won customers due to a non-responsive app or website.
- Avoid performance regressions as code changes get deployed to your production system and put in front of end users. This is what this guide is primarily aimed at.
From here, the decision to go for automated testing is hopefully clear:
- Shifting performance testing left, making sure it happens as close to development as possible, giving developers an early feedback loop for performance matters.
- Adding performance regression checks to your Continuous Integration and Delivery (CI/CD) pipelines.
Of course, not all types of performance tests are suitable for automation. A/B performance tests are one example where automation probably doesn't make much sense, unless you're aiming to compare the performance of A and B over time.
Know your goals
Besides creating the test cases themselves, the most important step to ensure a successful performance testing automation project is to document your goals: which metrics and values (in absolute terms: "response times should be less than 500ms", not "response times should not be slow") are important to you, your team, and the business.
If you have established Service Level Agreements (SLAs) in place with your customers, or just Service Level Objectives (SLOs) and Service Level Indicators (SLIs) defined internally, then that’s a great starting point. If not, then it’s a great opportunity to bring stakeholders together and discuss what goals you should have defined to drive a performance culture.
Starting with the results of a baseline test is another good way to find a starting point for your goals. A baseline test is a test run with a single VU, or very few VUs, that you know your system can handle without breaking a sweat. The idea is that you'll get some real numbers on where your system is at in terms of latency and response time. It's important to make sure that your baseline test doesn't result in any unwanted errors and is functionally accurate.
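As an illustration, a baseline test could be as small as the following sketch (the URL, duration, and check are placeholders to adapt to your own system):

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

// Baseline: a single VU for a few minutes, just to capture "normal"
// latency figures and verify the test itself is functionally correct.
export const options = {
  vus: 1,
  duration: '5m',
};

export default function () {
  const res = http.get('https://staging.example.com/'); // placeholder URL
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```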
From the perspective of human perceptive abilities, the following guidance from the Nielsen Norman Group might help when deciding what latency and response time to aim for:
- 0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.
- 1.0 second is about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.
- 10 seconds is about the limit for keeping the user's attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.
Once your goals are clear, you need to codify them as thresholds, the mechanism by which you specify pass/fail criteria in k6. More on that below.
How to automate performance testing
Once your goals are clear, you can start introducing load tests into your automation pipelines. Running load tests from a continuous integration (CI) system is easy with k6. The setup can easily be generalized across the various CI tools into the following sequence of steps:
1. Installation of k6
2. Create a test
3. Pass/fail criteria
4. Local vs Cloud execution
5. Test frequency
6. Notifications
We'll have a closer look at each of these steps below.
1. Installation of k6
The first step to integrating load testing in your CI is to find and install a performance testing tool.
We built k6 for automation. It's a CLI tool that can integrate easily into your tech stack.
Installing k6 can be done in three different ways:
- Using one of the OS specific package managers
- Pulling the Docker image
- Downloading the binary for your OS
See the full installation instructions for more information.
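As a quick illustration, a couple of common options look like this (exact commands depend on your platform and k6 version; the Docker image has historically been published as loadimpact/k6 and as grafana/k6 in newer releases):

```bash
# macOS, using Homebrew
brew install k6

# Docker
docker pull grafana/k6

# Or download a pre-built binary for your OS from the k6 GitHub releases page
```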
We also have guides available for installing k6 in specific CI tools.
2. Create a test
If you haven’t already created test cases for your system, then we suggest having a read through one of our guides for creating tests for websites and APIs/microservices:
In general, when creating test cases the golden rule is to start small and then expand from that initial starting point. Identify the most business-critical transactions in your system and write test cases covering that part of the system.
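For instance, if searching your product catalog is one such business-critical transaction, a first test case could look like the following sketch (the URL, load profile, and checks are hypothetical):

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 10,
  duration: '2m',
};

export default function () {
  // Hypothetical business-critical transaction: product search
  const res = http.get('https://staging.example.com/api/products?query=shoes');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'got results': (r) => r.body && r.body.length > 0,
  });
  sleep(1);
}
```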
Version control test files
One of the best practices that we advise users and customers to adopt is version controlling load tests, preferably alongside application code like other types of tests. In our experience this leads to a higher likelihood of tests being maintained as the application evolves. It also affords you all the usual benefits of version controlling your code.
Modules
Once you've gained enough experience with test creation, we strongly advise you to modularize your tests so that common logic/patterns can be reused across members of your team and across different performance tests.
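As a sketch of what that could look like (file names and the login endpoint are hypothetical), shared logic goes into a module that individual tests import:

```javascript
// common.js - shared helpers reused across tests
import http from 'k6/http';
import { check } from 'k6';

export function login(baseUrl, username, password) {
  const res = http.post(`${baseUrl}/api/login`, JSON.stringify({ username, password }), {
    headers: { 'Content-Type': 'application/json' },
  });
  check(res, { 'logged in': (r) => r.status === 200 });
  return res.json('token');
}
```

```javascript
// checkout-test.js - one of several tests importing the shared helper
import { login } from './common.js';

export default function () {
  const token = login('https://staging.example.com', 'testuser', 'secret');
  // ...use the token for the checkout requests under test...
}
```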
3. Pass/fail criteria
Every step in an automation pipeline either passes or fails. As mentioned in Know your goals, the mechanism by which k6 decides whether a test has passed or failed is called thresholds. Without your goals codified as thresholds there's no way for k6 to actually know if your test should be considered a success or failure.
A basic threshold on the 95th percentile of the response time metric (k6's built-in http_req_duration) could look like this:
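```javascript
export const options = {
  thresholds: {
    // 95th percentile of response time must stay below 500ms (illustrative value)
    http_req_duration: ['p(95)<500'],
  },
};
```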
You can set up thresholds on any metric in k6, and you can have multiple thresholds per metric. You can also optionally specify that a threshold should immediately abort the test as soon as it is crossed. Adding that to the example above could look like this:
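```javascript
export const options = {
  thresholds: {
    http_req_duration: [
      // abort the test as soon as the threshold is crossed
      { threshold: 'p(95)<500', abortOnFail: true },
    ],
  },
};
```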
If the test ends with one or more failed thresholds, k6 will exit with a non-zero exit code, signalling to the CI tool that the load test step failed. That halts the build/pipeline from progressing further and, hopefully, notifies you so you can take corrective action, but more on notifications further down.
4. Local vs Cloud execution
k6 supports both local (k6 run ...) and cloud execution (k6 cloud ...) modes. In local execution mode k6 will generate all the traffic from the machine where it's being run. In CI this would be the build servers. When executing a test locally, you can optionally stream the results to k6 Cloud for storage and visualization (k6 run -o cloud ...). In cloud execution mode k6 will instead bundle up and send the main JS test file, and all dependent files, to k6 Cloud as an archive for execution on cloud infrastructure managed by our k6 Cloud service. The different modes of execution are appropriate for different use cases. Some general guidance follows:
| Use case | Execution mode |
|---|---|
| Load test with <1000 VUs on a machine with consistent dedicated resources | Local execution |
| The target system is behind a firewall and not accessible from the public Internet | Local execution |
| Can't ensure consistent dedicated resources locally for the load test, e.g. your CI tool runs jobs on machines/containers with varying amounts of resources | Cloud execution |
| Need to run the test from multiple geographic locations | Cloud execution |
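In command form (with script.js standing in for your test file), the modes discussed above look like this:

```bash
# Local execution: traffic is generated from the machine running k6 (e.g. your CI build server)
k6 run script.js

# Local execution with results streamed to k6 Cloud for storage and visualization
k6 run -o cloud script.js

# Cloud execution: the test is bundled up and run on k6 Cloud infrastructure
k6 cloud script.js
```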
Authenticating with k6 Cloud
If you decide to use the k6 Cloud service, either to stream results with local execution (k6 run -o cloud ...) or through cloud execution, you'll need to authenticate with k6 Cloud. The recommended way to do this is by setting the K6_CLOUD_TOKEN environment variable in your CI tool.
Alternatively, you can pass your k6 Cloud token to k6 login cloud like so:
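```bash
# Replace the placeholder with the token from your account settings page
k6 login cloud --token <YOUR_K6_CLOUD_TOKEN>
```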
Get your k6 Cloud token from the account settings page.
5. Test frequency
The boring, but true, answer to the question of how often you should run load tests is "it depends". It depends on a number of parameters: How often is your application changing? Do you need many or few VUs to generate the necessary load? How long is one full cycle of a VU iteration? Whether you're testing a full user journey or just a single API endpoint or website page also affects the answer.
Consider these three factors when picking the best solution for you:
- VU iteration duration
- Your branching strategy
- Pre-production test environment
VU iteration duration
A rule of thumb is that the shorter the "VU iteration duration", the more frequently you can run your tests without introducing long delays in the development feedback loop, or blocking your teammates' deployments from access to shared pre-production environments.
A quick re-cap of the test life cycle article: a k6 script roughly follows the structure sketched below, where one complete execution of the default function is one VU iteration (the URL in the sketch is a placeholder).
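```javascript
// 1. init code: runs once per VU when the script is loaded
import http from 'k6/http';

export function setup() {
  // 2. setup code: runs once, before the test starts
}

export default function (data) {
  // 3. VU code: one complete run of this function is one VU iteration,
  // and its total duration is the "VU iteration duration"
  http.get('https://staging.example.com/'); // placeholder URL
}

export function teardown(data) {
  // 4. teardown code: runs once, after the test finishes
}
```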
You can find the VU iteration duration in the k6 terminal output, reported as the iteration_duration metric in the end-of-test summary.
Branching strategy
Another important aspect to consider is your team's version control branching policy. If you are strict about keeping feature branches separate from your main team-wide development branch, and you have per-feature test environments, then you can generally run load tests at a higher frequency than if your builds are competing for exclusive access to a shared pre-production environment.
It's also important to consider that load tests might not need to run at the same frequency as unit or functional tests. Load tests could just as well run as part of Pull/Merge Request creation or when a feature branch gets merged into the main team-wide development branch. Both of these setup variants can be achieved with most CI tools.
Pre-production test environment
As touched upon in the previous section on branching strategy, the test environment is the third point to consider. We point out "pre-production" specifically as that is the best practice when it comes to load testing. If your team is at the maturity level of running chaos experiments in production, then by all means run load tests in production as well. If not, stick to pre-production.
Things to consider with your test environment:
- Make sure there's no other traffic hitting the same test environment as your test
- Make sure the system has an adequate amount of data in any database that is being accessed as a result of the test traffic
- Make sure databases only contain test data, and not real production data
Guidance
Generalized, our recommendation is as follows, broken down by VU iteration duration:
| VU iteration duration | Test frequency | Trigger mechanism |
|---|---|---|
| <10s | On every commit | Commit push |
| <60s | Daily/Nightly | Scheduling, Pull/Merge Requests or when merging into main development branch |
| >60s | Couple of times a week | Scheduling, Pull/Merge Requests or when merging into main development branch |
| >5min | Once a week | Scheduling, Pull/Merge Requests or when merging into main development branch |
Scheduling
Besides triggering tests based on commit events, we also often see users and customers use cron or CI tool equivalent mechanisms for running tests on off-hours or at a particular cadence.
If you're using k6 Cloud, you can use the built-in scheduling feature to trigger tests at a frequency of your choosing.
Load test suite
Do you need to run your full load test suite on every commit? Probably not, and in any case that shouldn't be the initial ambition. Repeating the advice from earlier in the guide: start small and expand from there. Begin with the most business-critical transactions/journeys in your system and automate those first.
6. Notifications
Once you have your load tests integrated into a CI pipeline, you should make sure you're also getting notified whenever a build/pipeline fails. You might already get notifications via email, Slack, Microsoft Teams, or webhook through your CI tool, but if not, you can set them up as follows.
For k6 OSS
There's no built-in notification mechanism in k6 OSS, so you'd have to send a notification from the test script. One way to achieve this is by sending a notification request to Slack or Microsoft Teams.
For Slack, you first need to set up an incoming webhook. Once that's done, you get a webhook URL that you specify as the target of the POST request in the teardown function, as in the sketch below (the webhook URL is a placeholder):
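```javascript
import http from 'k6/http';

// Placeholder: replace with the webhook URL generated by your Slack incoming webhook
const slackWebhookURL = 'https://hooks.slack.com/services/...';

export default function () {
  // ...your actual test logic...
}

export function teardown(data) {
  // Post a simple notification message once the test run has finished
  http.post(slackWebhookURL, JSON.stringify({ text: 'k6 load test run finished' }), {
    headers: { 'Content-Type': 'application/json' },
  });
}
```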
The setup for Microsoft Teams is analogous: first set up an incoming webhook connector, then use the resulting webhook URL as the target of the POST request in the teardown function (again, the URL is a placeholder):
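```javascript
import http from 'k6/http';

// Placeholder: replace with the webhook URL from your Microsoft Teams incoming webhook connector
const teamsWebhookURL = 'https://example.webhook.office.com/webhookb2/...';

export default function () {
  // ...your actual test logic...
}

export function teardown(data) {
  // Same pattern as the Slack example: a simple text payload posted on completion
  http.post(teamsWebhookURL, JSON.stringify({ text: 'k6 load test run finished' }), {
    headers: { 'Content-Type': 'application/json' },
  });
}
```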
For k6 cloud
If you are running cloud tests, you can also configure custom email and webhook notifications in the k6.io cloud GUI. It includes several pre-made templates, such as for Slack and Microsoft Teams.
Read more
We have written CI tool specific guides following the steps mentioned above: