📖 What you will learn
- Why you should add performance tests to your CI/CD pipelines
- How to design a load testing script for pre-production environments
- How to integrate k6 with Jenkins, a popular CI tool
In this article, we’ll showcase how to use k6 for performance testing within CI/CD pipelines.
We’ll use a playground we set up for this purpose and walk you through the necessary steps so you get a feel for how it works. Hopefully, by the end, you’ll start thinking about how to implement this in your own projects.
Playground
So that you can reproduce or extend what we showcase in this article, we’ve built a Docker image that bundles everything we’re going to need.
The idea is that you can follow along with the article and keep the playground afterwards as a place to test proofs of concept or new ideas you come up with.
The environment consists of a Jenkins instance that can both deploy the sample web app locally and run k6 performance tests.
Repository
The test script, Dockerfile, test app, and pipeline as code can all be found in this repository.
Performance in CI/CD
CI is all about being confident that the changes you are trying to integrate into your application won’t introduce new bugs or at least won’t break the things that worked before.
The way to achieve this is by running automated tests that check the application meets your acceptance criteria. However, one thing that is often left out of those checks is performance. Some people may argue that’s because performance is a non-functional requirement, but if your application is under enough load and does not scale well, it’s going to become non-functional soon!
That’s why many companies already include performance metrics in their acceptance criteria, Definition of Done, or non-functional requirements. If you run an e-commerce site, you don’t want it to crumble under load during peak sales seasons like Black Friday and Christmas. If you manage a school’s enrollment system, you’ll want to make sure students can register for their courses at the start of the semester, no matter how many of them are trying at once. The list goes on. Performance matters.
So, where does performance testing fit into CI/CD pipelines?
When we talk about CD, we can mean Continuous Delivery or Continuous Deployment, but in this post, we’ll focus on the former.
The idea is that we should test performance once we’ve deployed to an intermediate environment, such as QA or staging, which should be as similar to production as possible. A common approach is to make those environments a scaled-down copy of the production environment.
Just as you don’t run your full automated regression suite on every pipeline run because of time constraints (a pipeline run shouldn’t take more than 15 minutes; the idea is to give developers a fast, reliable feedback loop), you’re not going to test the performance of every one of your application’s features.
A good approach is to have two or three scenarios that you test on each pipeline run, plus an extended suite you can run overnight.
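For illustration, a couple of such scenarios can live in a single k6 script using the `scenarios` option. This is a minimal sketch; the scenario names, endpoints, and load levels are hypothetical and not taken from the playground:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    // Hypothetical scenario names; pick the two or three flows that matter most.
    browse_catalog: {
      executor: 'constant-vus',
      exec: 'browseCatalog',
      vus: 50,
      duration: '2m',
    },
    checkout: {
      executor: 'constant-vus',
      exec: 'checkout',
      vus: 20,
      duration: '2m',
    },
  },
};

export function browseCatalog() {
  http.get(`${__ENV.BASE_URL}/products`);
  sleep(1);
}

export function checkout() {
  http.post(`${__ENV.BASE_URL}/checkout`, JSON.stringify({ cart: 'demo' }), {
    headers: { 'Content-Type': 'application/json' },
  });
  sleep(1);
}
```

A trimmed-down script like this is what you would run on every pipeline build, keeping the larger scenario suite for the overnight run.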
So, what do you do once you’ve identified those scenarios and have your environment set up? You calibrate your test.
Calibrating your tests is about finding a tipping point. You start with a handful of virtual users, run your test, and check the throughput in RPS (requests per second). Then you keep increasing the number of virtual users until the RPS stops growing.
For example: you run your test with 100 virtual users and get 100 RPS. With 200 virtual users you get 120 RPS, with 250 you still get 120 RPS, and with 300 it drops to 100 RPS. You’ve found your tipping point at around 250 virtual users; any more than that and your application’s performance degrades.
Then you’d run the test in your pipeline with around 250 virtual users and allow, say, a 10% margin for performance variation. That means running the same scenario with 250 virtual users and asserting that the RPS stays above 120*0.9 (108) and that the response time stays below what you measured at 250 virtual users, plus a 10% margin.
Any performance degradation in any of the components of your application that participate in that use case will be instantly noticeable and can be caught before being deployed to production.
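For a rough idea of what such a calibration run could look like in k6, here is a minimal sketch (the `BASE_URL` variable and the endpoint are placeholders, not the playground’s actual script). You run the same script several times with increasing `--vus` values and compare the `http_reqs` rate reported in the end-of-test summary:

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

// Reuse the same script for every calibration run, e.g.:
//   k6 run --vus 100 --duration 2m calibrate.js
//   k6 run --vus 200 --duration 2m calibrate.js
// and compare the http_reqs rate (RPS) each run reports.
export default function () {
  // BASE_URL points at your QA/staging environment.
  const res = http.get(`${__ENV.BASE_URL}/`);
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```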
Script calibration
As we previously mentioned, it’s very important to calibrate your script properly in order to get the most out of running it in a CI/CD pipeline.
For this test, we used the Docker container we provided, limited to 1 CPU and 1 GB of RAM. Keep in mind that this container runs Jenkins, the web app, and the performance tests.
Disclaimer: this is neither the best nor the standard way to do this. The servers you use as load generators should have nothing else running on them, and the same goes for your web servers and Jenkins. However, for ease of reproduction, we chose to put everything in a single container. This means none of the components will behave as well as they would with dedicated resources, since they all share the same ones (for example, the web app’s high response times can partly be attributed to this).
To set it up, first build a Docker image from the Dockerfile in the repository, then start it with `docker run -dp 8080:8080 -m 1024m --cpus="1" <imageName>`.
First, we looked for the point of diminishing returns when adding more virtual users, and these were the results:
| VUs  | RPS   |
|------|-------|
| 25   | 23    |
| 50   | 43.3  |
| 100  | 76.5  |
| 200  | 138.4 |
| 400  | 171.2 |
| 800  | 196.2 |
| 1200 | 210.1 |
| 1600 | 191.5 |
Based on these results, we decided to run the simulation with 1200 virtual users. The RPS we got at that level was 210.1, and the 95th percentile response time was 5.25s.
Normally, we would allow only a 10% margin on RPS and on the 95th percentile response time in order to check reliably for performance degradation. However, due to the unorthodox setup we have here (not recommended for real-world use), we’ll increase that margin to 25%.
So, we should check that the simulation has a p95 response time of less than 5.25s*1.25 (6.56s) and that the RPS metric stays above 210.1*0.75 (157.6).
This wiggle room should ensure that normal variance doesn’t cause a false positive in our pipeline execution.
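These two checks can be encoded directly in the k6 script as thresholds, so the test fails whenever either limit is breached. Here is a minimal sketch; the `BASE_URL` variable, the single GET, and the duration are illustrative assumptions rather than the repository’s actual script:

```javascript
import http from 'k6/http';

export const options = {
  vus: 1200,        // the load level we calibrated for
  duration: '5m',   // illustrative; use the duration you calibrated with
  thresholds: {
    // p95 response time must stay below 5.25s * 1.25 ≈ 6.56s (6560 ms)
    http_req_duration: ['p(95)<6560'],
    // throughput must stay above 210.1 * 0.75 ≈ 157.6 requests per second
    http_reqs: ['rate>157.6'],
  },
};

export default function () {
  // The real script exercises the calibrated scenario; a single GET stands in here.
  http.get(`${__ENV.BASE_URL}/`);
}
```

When a threshold fails, k6 exits with a non-zero exit code, which is what makes the Jenkins stage running it (and with it the whole pipeline run) fail.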
We calibrated the script and deployed the application using manually configured freestyle Jenkins jobs, but it’s a good idea to have as much configuration as possible defined as code. So we’re going to write a declarative pipeline that checks out the code from GitHub, deploys the application, runs our performance tests against it, and then stops the application.
In order to run it, just create a new Pipeline in Jenkins and paste the configuration from the repository into the pipeline definition textbox.
Conclusion
As we showcased, getting started with performance testing in your pipelines can seem like a daunting task, but there’s no reason to be intimidated.
By taking a methodical approach, starting small, and applying known principles, you can get started with performance testing using k6 too!
About the Author
Juan Pablo Sobral works as a developer at the software testing company Abstracta, where he previously spent two years on the performance engineering team. He also has a background in DevOps and is currently studying engineering.