At Load Impact, we build a tool that helps you understand and continuously keep track of your application’s performance at varying levels of traffic.
Our software does this by simulating virtual users interacting with your application. (It's pretty cool, if you ask us and thousands of our users.)
Simply put: Load Impact is a performance testing service.
I have worked for Load Impact since its founding, and I’m going to share how we use the tool ourselves.
Our dogfooding history
I am ashamed to say this section will be pretty brief. It was the first paragraph I wrote, and I needed to unload it. The thing is, we haven’t prioritized performance testing much internally.
Yes, ouch. It hurts to admit that, but it's the truth.
We've always done load testing before major releases of Load Impact, but we have not done it on a recurring basis as part of our regular development cycle.
Now, I would like to believe that’s because we've always had a good understanding of our applications’ performance limits, but that...would be a lie. We have been proven wrong a few times when we have experienced unforeseen traffic levels.
I want to change this, and I will be describing how we have gotten started with our dogfooding in this blog post.
For correctness' sake, I should say that we do use the tool a lot in professional services engagements, so we have an internal feedback loop. It’s just those of us on the engineering team who haven’t done much load testing. Which is nuts, considering that we say “Performance testing platform for developers” on the front page!
Targets: Web site, Web app and API
We have three public applications that we want to test: our marketing site, loadimpact.com; our web app at app.loadimpact.com; and the API that our web app and other clients talk to at api.loadimpact.com.
In this article I will focus on how I got started with the testing of our marketing site, loadimpact.com.
Dogfooding: Let’s go!
We always tell people they need to plan their performance testing efforts and set goals for what they want to achieve. This time was no different, but I like the testing part more, so I aimed for minimum planning, maximum testing.
Yes, it sounds like something an overly excited infomercial presenter would proclaim about a new revolutionary cleaning detergent, but it was actually a good way to get started.
Minimum planning: Understand your users
Minimum planning it is. I needed to do two things: identify the key user flows/transactions on the marketing site, and then set some goals. First, let’s look at the key user flows/transactions. If you do not have this information for your own application, find someone in your organization who does. Usually these are the most travelled paths on your site/app, which you can find in Google Analytics, for example. These are the 5 key flows/transactions for our marketing site:
- Front page → Enter URL to run free test → XHR calls to our API to create test → Transfer to SPA app to watch test results stream in → Register account
- Front page → Pricing page
- Front page → Features page
- API load testing page → Pricing page OR Features page
- Load Script API documentation page (traffic comes from app.loadimpact.com)
The second thing I needed was one or more goal(s). What are we trying to achieve? In our case, with the marketing site I set our initial goals to:
- Handle traffic equivalent to 500 users accessing the site concurrently
  - If you want help determining this number from your Google Analytics data, see “Determining Concurrent Users in Your Load Tests”
- Have all pages load in less than 2 seconds
  - Note: Not taking into account client side JS execution and page rendering
Five hundred users is roughly 10x the peak number of concurrent users on our marketing site, calculated according to the formula from the linked article above, based on our hourly visits and the average time on page.
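As a rough sketch of that calculation, assuming the formula is hourly visits multiplied by the average time on the site in seconds, divided by 3600. The function name and the traffic figures below are made up for illustration and are not our real analytics numbers.

```python
def concurrent_users(hourly_visits: float, avg_seconds_on_site: float) -> float:
    """Estimate concurrent users from hourly visits and average visit duration.

    Assumed formula: hourly visits * average duration (seconds) / 3600.
    """
    return hourly_visits * avg_seconds_on_site / 3600


# Hypothetical figures for illustration only.
peak_hourly_visits = 1500
avg_seconds_on_site = 120

peak = concurrent_users(peak_hourly_visits, avg_seconds_on_site)
target = round(peak * 10)  # test at roughly 10x the observed peak

print(f"peak concurrent users: {peak:.0f}, test target: {target}")
```

With these made-up inputs the estimated peak is 50 concurrent users, which gives a test target of 500.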
I then split those 500 concurrent virtual users according to the distribution of traffic across our key flows. I also threw in a small percentage of long-tail traffic that randomly visits one of many other pages we have on our marketing site.
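The split itself is simple proportional arithmetic. Here is a minimal sketch, assuming a hypothetical traffic distribution (the scenario names and percentages are invented for the example, not taken from our Google Analytics data). It rounds down first and then hands any leftover users to the scenarios with the largest remainders, so the total always adds up.

```python
def split_virtual_users(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Split `total` virtual users across scenarios in proportion to `shares`."""
    share_sum = sum(shares.values())
    exact = {name: total * share / share_sum for name, share in shares.items()}
    # Start from the rounded-down allocation...
    allocation = {name: int(value) for name, value in exact.items()}
    # ...then give the leftover users to the largest fractional remainders,
    # so the allocation always sums to `total`.
    leftover = total - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - allocation[n], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical traffic shares in percent, one entry per user scenario.
traffic_shares = {
    "free_test_flow": 40,
    "front_to_pricing": 20,
    "front_to_features": 15,
    "api_load_testing_page": 10,
    "load_script_api_docs": 10,
    "long_tail": 5,
}

print(split_virtual_users(500, traffic_shares))
```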
Hitting the long-tail pages in a server-side rendered web site is good practice to catch potential issues that could arise as a result of the long-tail pages being stale in cache, requiring access to the database layer.
Having identified the key flows and set our initial goals, I created the necessary user scenarios and test configuration. I created 6 user scenarios, one for each key flow and one for the long-tail pages. I could just as well have created 1 user scenario and made sure pages were hit with the correct probability according to the distribution of traffic seen in Google Analytics.
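The single-scenario alternative boils down to a weighted random choice on each iteration. The sketch below shows the idea in Python rather than in our user scenario scripting language. The paths and weights are hypothetical placeholders, not our real URLs or traffic distribution.

```python
import random
from collections import Counter

# Hypothetical entry points and traffic weights (in percent).
PAGE_WEIGHTS = {
    "/": 40,
    "/pricing": 20,
    "/features": 15,
    "/api-load-testing": 10,
    "/load-script-api": 10,
    "/long-tail": 5,
}


def pick_page(rng: random.Random) -> str:
    """Pick one page, with probability proportional to its weight."""
    pages = list(PAGE_WEIGHTS)
    weights = list(PAGE_WEIGHTS.values())
    return rng.choices(pages, weights=weights, k=1)[0]


# Simulate what a large number of virtual user iterations would request.
rng = random.Random(42)  # seeded only to make the example repeatable
hits = Counter(pick_page(rng) for _ in range(10_000))
for page, count in hits.most_common():
    print(f"{page}: {count}")
```

Separate scenarios are easier to read in the test results, while a single weighted scenario is less to maintain. Both should produce the same traffic mix.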
I used our Chrome recorder to record the flows, which saved me the time of typing the user scenarios by hand.
Here’s what the first user scenario looks like.
This user scenario is interesting because it interacts with all three of our target applications: the marketing site, the API and the app.
To put it all together, the test configuration looks as follows.
Maximum testing: Build up an application performance understanding
Here’s the fun part. The tests were run against our staging environment, which is a replica of our production environment. That way we do not have to care about any background production traffic getting interrupted or skewing our results while testing.
The first batch of tests I ran was with a target of 500 concurrent virtual users. This turned out to be the only testing needed to learn a bunch of things about our product and the marketing site’s performance.
Look at the “VU load time” metric in the screenshot above. It shows a more or less flat trend, which is what we are looking for from this “early-warning” metric. This, in combination with a look at the throughput metrics (below), told me we passed the test without trouble.
Breaking down the “VU load time” and looking at the individual load times of the pages making up our key flows, I expected to see the same flat trend in the load times, and I did. All good.
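If you would rather check for a flat trend with numbers than by eye, one simple approach is to compare load times early in the ramp-up with load times at full load. This is a minimal sketch of that idea, not how the product computes anything. The sample values and the 20% tolerance are assumptions made for the example.

```python
from statistics import median


def is_flat(load_times_ms: list[float], tolerance: float = 0.2) -> bool:
    """Return True if late-test load times stay within `tolerance` of early ones.

    Compares the median of the first third of the samples (low load)
    with the median of the last third (full load).
    """
    third = max(1, len(load_times_ms) // 3)
    early = median(load_times_ms[:third])
    late = median(load_times_ms[-third:])
    return late <= early * (1 + tolerance)


# Hypothetical samples, ordered by time as the load ramps up.
healthy = [410, 395, 420, 405, 415, 400, 412, 408, 418]
degrading = [400, 420, 410, 600, 800, 1000, 1500, 1900, 2400]

print(is_flat(healthy))
print(is_flat(degrading))
```

The first series stays flat as load increases. The second keeps climbing, which is the kind of early warning the metric is meant to give.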
I wanted something more interesting to surface. It’s a little bit boring that the system could cope without issue. Usually it’s an iterative process. It takes a few cycles of test → analyze → change stuff to understand the target application’s performance. Some code, or more often infrastructure or configuration, needs tweaking along the way.
To me this says our marketing site is overprovisioned, an effect of us not having set up any auto-scaling mechanism. We picked EC2 instance sizes with too much headroom. Something we will look into improving to save money on our AWS bill!
Product issues encountered while dogfooding
Just this first round of dogfooding has been a good learning experience for me. It becomes painfully obvious when you use your own product what issues you have to work on, and the things you need to make effective use of the product.
Here are a few things I have found during my dogfooding so far:
- User Scenarios
  - I would like to be able to turn HTTP response compression on/off globally rather than having to specify it for each individual request
- Test Results
  - I want to know which URLs are part of a grouping (aka “Page”)
  - I want to see aggregate data, load times, for a specific grouping (aka “Page”) no matter what user scenario or load zone it originates from
  - I would like to see how the URL load times are distributed, a histogram
  - I would like to have percentiles for load time metrics, not just averages, which hide information. Gil Tene did a great presentation on “How NOT to measure latency”
  - I would like to be able to sort the URLs by load time and number of failures
  - The UX when adding graphs from the “Pages” and “URLs” tabs is not good. It is not obvious that a small chart is added to the “Metrics” tab
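To illustrate why averages hide information, here is a small sketch using Python's standard `statistics` module. The load times are invented for the example: most requests are fast and one in ten is very slow.

```python
from statistics import mean, quantiles

# Hypothetical load times in milliseconds: 90 fast requests, 10 slow ones.
load_times_ms = [200] * 90 + [3000] * 10

avg = mean(load_times_ms)

# quantiles(..., n=100) returns the 99 cut points between percentiles,
# so index 49 is the 50th percentile, index 94 the 95th, and so on.
cuts = quantiles(load_times_ms, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

print(f"mean: {avg:.0f} ms")
print(f"p50:  {p50:.0f} ms")
print(f"p95:  {p95:.0f} ms")
print(f"p99:  {p99:.0f} ms")
```

With these made-up numbers the mean is 480 ms, which looks comfortably inside a 2 second goal, while the 95th percentile is 3000 ms. One in ten requests misses the goal and the average never shows it.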
I feel bad that it has taken us this long to get started with dogfooding. I also feel bad knowing I have asked people if they load test, seeing the slight guilt in their eyes as they answer “No, but we plan to after we finish X, Y and Z” when I am in the same boat!
Do I believe our current product will dramatically change this dilemma of load testing being seen as a good thing but too complex and/or time consuming to do? No, but I do believe we know how to make the necessary changes to get there! That’s why I am still working on this product. We are not done yet, by far.
However, I have got to say that it feels good to have started load testing regularly and dogfooding our product.
But wait, there’s more: In part 2 of this series, I’ll dive into the next step of our dogfooding process and turn the findings from our initial testing into goals (thresholds) for automated testing.