10 December 2014

Performance Testing -vs- Performance Monitoring

Peter Cannel

In recent years, IT teams have been deluged with an ever-increasing number of tools to monitor and performance test web and mobile applications. In fact, IT teams have been faced with tool sprawl across the entire organization, which can be completely overwhelming.

(Note: This blog post has graphics from our older interface, but the content is so darn good that we've decided to keep it untouched)

Vendor marketing literature and positioning can make it difficult to determine what tools and services are actually useful and necessary. Even the most agile organizations end up with redundant tools and capabilities which consume more of the IT budget than is necessary.

One of those areas we hope to clarify in this post is the difference between load testing and monitoring as it relates to web and mobile applications.

Load testing is the simplest form of performance testing. It is a methodology to determine how an application performs under certain levels of utilization from users potentially located around the globe - using different devices, browsers and networks.

The goal is to see how a system will perform when subjected to both an expected and stressful amount of load - the latter of which is called stress testing.

Generally, you want to find out either how many concurrent users your website can handle or you want to look at the response times for a given amount of concurrent users.

Think of it as success simulation: For example, what happens if I have thousands of customers in my web shop at the same time? Will it break for everyone or will I actually sell more?
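To make the idea concrete, here is a toy sketch of a load test - not Load Impact's implementation, just the core mechanic: a handful of concurrent "virtual users" hit a URL while response times are recorded. The throwaway local server, user counts and statistics are all illustrative assumptions; in practice you would point this at a staging copy of your site.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# A throwaway local server so the sketch is self-contained; replace
# BASE_URL with a staging URL to try it against a real site.
server = ThreadingHTTPServer(("127.0.0.1", 0), SimpleHTTPRequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
BASE_URL = f"http://127.0.0.1:{server.server_address[1]}/"

def timed_request(url):
    """Fetch the URL and return the response time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return time.perf_counter() - start

def run_load_test(url, virtual_users=5, requests_per_user=5):
    """Run virtual_users concurrent sessions and collect response times."""
    timings = []  # list.append is thread-safe in CPython

    def user_session():
        for _ in range(requests_per_user):
            timings.append(timed_request(url))

    with ThreadPoolExecutor(max_workers=virtual_users) as pool:
        futures = [pool.submit(user_session) for _ in range(virtual_users)]
        for f in futures:
            f.result()  # surface any request errors

    timings.sort()
    return {"requests": len(timings),
            "median_s": timings[len(timings) // 2],
            "worst_s": timings[-1]}

stats = run_load_test(BASE_URL)
server.shutdown()
print(stats)
```

Real tools add geographic distribution, browser and network emulation, and ramping schedules on top of this basic loop.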

Once you know a bit about how your site or app reacts under load, you may want to dig deeper and examine why it reacts the way it does. When performance testing, you want to keep track of various indicators on the site/app itself while it receives a lot of traffic.

How much memory is consumed? How much time is spent waiting for disk reads and writes? What's the database response time? And so on.
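Sampling that kind of server-side indicator can be sketched with the standard library alone. Real deployments use a dedicated agent (such as Load Impact's Server Metrics Agent, described later); the table name and metrics below are illustrative, and the `resource` module is Unix-only.

```python
import resource   # Unix-only; Windows would need a different approach
import sqlite3
import time

def sample_indicators(db):
    """Snapshot a couple of server-side health indicators."""
    # Peak memory of this process (ru_maxrss is kilobytes on Linux, bytes on macOS).
    peak_mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Response time of a representative database query.
    start = time.perf_counter()
    db.execute("SELECT COUNT(*) FROM orders").fetchone()
    db_query_ms = (time.perf_counter() - start) * 1000
    return {"peak_mem": peak_mem, "db_query_ms": db_query_ms}

# Illustrative stand-in for a real application database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
db.executemany("INSERT INTO orders (total) VALUES (?)",
               [(float(i),) for i in range(1000)])

snapshot = sample_indicators(db)
print(snapshot)
```

The value comes from sampling these repeatedly during a test run so they can be correlated against the load level.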

Both performance testing and stress testing can play important roles in determining exactly how well a given piece of frontend software, such as a website, or a backend system, such as the Apache server hosting that site, can deal with the actual loads they're likely to encounter through regular use.

This type of testing will give application teams hard data regarding when additional computing, network or third party (CDN, metrics, ad networks) resources are necessary.

It's worth repeating that users are extremely sensitive to site performance, and even minor delays can result in lost revenue, damaged brands, lost customers - even lost jobs.

What is performance testing all about?

Determining how an application performs and scales under various load scenarios is exactly what Load Impact focuses on.

The first step here is deciding how many Virtual Users (VUs) you want to test with, and what parts of the world they should come from. Virtual Users should be as realistic as possible to simulate how real-world user traffic will affect the target application.

Performance testing should also be agile as you will probably want to run several test iterations.

Test setup and tweaking should be intuitive, not requiring onerous training.

Emulating a wide variety of desktop and mobile browser types should be quick and easy:

Test setup is easy and multiple different client types can be specified, mobile and/or desktop

Network emulation

Network emulation can be set for each client type as well, providing for much more realistic mobile simulations

Where the real power comes in is being able to script or record different user scenarios and run those tests simultaneously. Want to have mobile users doing checkout from Ashburn while desktop users in São Paulo browse? No problem. Want to simulate a surge in traffic at a certain time from Android tablets? Easy - you can schedule it:

Scheduling performance testing is typically a good idea

In addition to quickly creating realistic simulations, being able to correlate server resources to the test data is very helpful. By watching how your webserver (or servers) consume resources, you gradually build better and better understanding about how your web application can be improved to handle more load, or to improve response times under load.

The Server Metrics Agent can collect and correlate a number of key data points during your test

Finally, you may want to start using performance testing tools like Load Impact as an automated development/DevOps tool.

As you make changes to code or infrastructure that you believe will impact the performance characteristics of your application, you take measurements with every commit to see how performance is trending as a result (i.e. is it getting better or worse?).

As you understand more and more about the potential performance problems, you iterate between testing and fixing/tuning to improve the trend towards better performance.

The process is called Continuous Delivery and is done using Continuous Integration servers such as Jenkins, CircleCI or TeamCity. In this context, performance testing with tools like Load Impact will be done after deployment to staging, but before user acceptance testing.
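A hedged sketch of what such a CI gate might look like: after the load test runs against staging, compare a response-time percentile against a stored baseline and fail the build on regression. The threshold, percentile and numbers below are assumptions for illustration, not part of any particular CI server.

```python
def p95(samples):
    """95th-percentile response time (nearest-rank; good enough for a gate)."""
    ordered = sorted(samples)
    return ordered[max(int(len(ordered) * 0.95) - 1, 0)]

def check_regression(samples_s, baseline_p95_s, tolerance=0.10):
    """Pass if the new p95 is within tolerance (here 10%) of the baseline."""
    measured = p95(samples_s)
    return measured <= baseline_p95_s * (1 + tolerance), measured

# Illustrative numbers: baseline from the last known-good build, samples
# from the load test that just ran against staging.
baseline_p95_s = 0.80
new_samples_s = [0.61, 0.64, 0.70, 0.72, 0.75, 0.77, 0.79, 0.81, 0.84, 0.90]

ok, measured = check_regression(new_samples_s, baseline_p95_s)
print(f"p95={measured:.2f}s baseline={baseline_p95_s:.2f}s ->",
      "pass" if ok else "fail")
# In a Jenkins/CircleCI/TeamCity job you would exit non-zero on failure
# (e.g. sys.exit(1)) so the build itself fails.
```

With every commit measured the same way, the trend over time tells you whether each change helped or hurt.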

Continuous Delivery performance testing methodology

Website Monitoring is a different animal

Website monitoring tools such as Pingdom, Uptime Robot or Jetpack (a WordPress plugin) are just a few of the many out there that I have personally used. And these definitely do not fall into the "performance testing" category.

The majority of these tools have free and paid versions with more advanced features. The basic idea is that a regular connection is made to the site being monitored, and the response time is measured and recorded. If the site fails to respond within a pre-determined response window, alerts can be raised to notify the application owner of downtime.

The more advanced tools can check site availability frequently and from multiple geographic locations. Alerting options are what you would expect: SMS, email, apps that provide push notifications, and so on.
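Stripped to its essentials, that monitoring loop is easy to sketch. The interval and "alert" action below are placeholders (real services send SMS, email or push notifications), and the example URL is hypothetical.

```python
import time
import urllib.error
import urllib.request

def check_site(url, timeout_s=5.0):
    """One monitoring probe: return (is_up, response_time_s)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
            return resp.status == 200, time.perf_counter() - start
    except (urllib.error.URLError, OSError):
        return False, time.perf_counter() - start

def monitor(url, interval_s=60, checks=5):
    """Probe the site on an interval and 'alert' (print) on failure."""
    history = []
    for _ in range(checks):
        up, rt = check_site(url)
        history.append((up, rt))
        if not up:
            print(f"ALERT: {url} appears to be down")  # real tools: SMS/email/push
        time.sleep(interval_s)
    return history

# Usage (only against a site you own): monitor("https://example.com/", interval_s=60)
```

Note that every probe is a single request from a single client - a point we will come back to below.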

I personally use Uptime Robot on my own site, which I self-host (and which is easy to screw up), and just this morning I received an email alert that my site was down. Here is a perfect example of how useful these free monitoring tools are for troubleshooting:

I received this email alert while minding my own business, having a coffee. I had not changed a thing, honest.

This usually doesn't happen, so the first thing I do is grab my iPhone, open the app that controls vCenter, and reboot the WordPress servers and the nginx proxy for good measure. A quick check from both the internal network and externally confirms the site is not available.

The next step was to hit the WordPress server's IP address directly and, lo and behold, it responded. Huh. This quickly led me to the following alerts from my DNS provider:

Apparently the site-down alert was not caused by my infrastructure at all, but by a third party (DNS provider)

There are many reasons why a third party can take down your site, and DNS should be checked early in troubleshooting.
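That troubleshooting step can be automated in a few lines: resolve the hostname and, separately, try to reach the origin server by IP. The hostname and IP in the usage comment are placeholders (203.0.113.0/24 is a documentation range), not real infrastructure.

```python
import socket

def diagnose(hostname, origin_ip, port=80, timeout_s=3.0):
    """Separate 'DNS is broken' from 'my server is broken'."""
    try:
        socket.gethostbyname(hostname)
        dns_ok = True
    except socket.gaierror:
        dns_ok = False
    try:
        with socket.create_connection((origin_ip, port), timeout=timeout_s):
            origin_ok = True
    except OSError:
        origin_ok = False
    if not dns_ok and origin_ok:
        return "DNS problem: origin is up but the name does not resolve"
    if not dns_ok:
        return "DNS and origin both failing"
    return "DNS resolves; investigate the server or network path"

# Usage: diagnose("www.example.com", "203.0.113.10")
```

In the incident above, the origin answered by IP while the name did not resolve, which pointed straight at the DNS provider.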

After about 45 minutes, things got back to normal and I received the alert that the site was back online.

While the above example used a simple free monitoring tool, even the more complex monitoring tools do not provide load testing capability or results. I can see how some people might get confused about this, though. For example, even the free tool below provides a graph of response time.

Historical response times for this site monitor - but this is for a single user!

This is fine and can be useful, but it's important to understand that this response time is for a single user. It is very different from the response time data generated by Load Impact:

Performance testing shows how your application will react under the stress of several concurrent users

The yellow, red and green lines show response times from Sydney, São Paulo and Ashburn as the load test progresses. Note that response time in Sydney is three times longer than in Ashburn at peak load.

So which do most application teams need most, performance testing or monitoring? In nearly all cases, both are necessary and appropriate. They simply solve different problems and shouldn't be viewed as competing for the same IT budget.

Knowing that your servers are operational and the site is available is a basic function of IT operations, and most enterprises will have a variety of tools to monitor these services. IT operations has not, historically, been as focused on application delivery as it relates to load testing. This is especially true for cloud-based tools that measure last-mile, geographically distributed response times - which is where Load Impact is focused.

So which one do you need?

Regardless of where a performance bottleneck lies, half of the work in fixing it (or working around it) is usually spent identifying where it’s located - using performance monitoring and performance testing together will help you do that.

When it comes to performance testing, it's usually a matter of experimenting until you find the point at which things either start to fall apart (often indicated by transaction times suddenly increasing rapidly) or simply stop working.
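One simple (and admittedly naive) way to spot that point programmatically is to look for the load level where response-time growth suddenly outpaces its initial rate. The data points and threshold below are made up for illustration:

```python
def find_knee(results, slope_factor=3.0):
    """results: list of (concurrent_users, response_time_s), sorted by users.
    Return the user count where response-time growth per added user first
    exceeds slope_factor times the initial growth rate, or None."""
    base_slope = (results[1][1] - results[0][1]) / (results[1][0] - results[0][0])
    for (u0, t0), (u1, t1) in zip(results, results[1:]):
        slope = (t1 - t0) / (u1 - u0)
        if slope > slope_factor * max(base_slope, 1e-9):
            return u1
    return None

# Hypothetical load test results: (concurrent users, response time in seconds).
data = [(10, 0.20), (20, 0.22), (30, 0.24), (40, 0.40), (50, 1.10)]
knee = find_knee(data)  # response times take off past this user count
print(knee)
```

Eyeballing the response-time graph usually works just as well; the point is that the "knee" in the curve is what you are looking for.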

When you run a test and reach the point at which the system is clearly under stress, you can then start looking for the bottleneck(s). In many cases, the mere fact that the system is under stress can make it a lot easier to find the bottlenecks.

If you know or suspect your major bottlenecks to be in your own codebase, you can use performance monitoring tools to find out exactly where the code latency is happening.

By combining these two types of tools — performance testing and performance monitoring — you will be able to optimize the right parts of the code and improve actual scalability.

Let’s say you have a website that is accessed by users using regular web browsers. The site infrastructure consists of a database (SQL) server and a web server. When a user accesses your site, the web server fetches data from the database server, then it performs some fairly demanding calculations on the data before sending information back to the user’s browser.

Now, let’s say you’ve forgotten to set up an important database table index – a pretty common performance problem experienced with SQL databases. In this case, if you only monitor your application components – the physical servers, the SQL server and the web server – while a single user is accessing your site, you might see that the database takes 50 ms to fetch the data and the calculations performed on the web server take 100 ms. This may lead you to start optimizing your web server code because it looks as if that is the major performance bottleneck.

However, if you submit the system to a performance test which simulates a large number of concurrent users with, let’s say, ten of those users loading your web site at exactly the same time, you might see that the database server now takes 500 ms to respond, while the calculations on the web server take 250 ms.

The problem in this example is that your database server has to perform a lot of disk operations because of the missing table index, and those scale linearly (at best) with increased usage because the system has only one disk.
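The missing-index effect is easy to reproduce with an in-memory SQLite database: time the same lookup before and after creating the index. The schema and row counts here are illustrative, and the absolute timings will vary by machine; the gap only widens with table size and concurrent load.

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY,"
           " customer_id INTEGER, total REAL)")
db.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
               [(i % 5000, float(i)) for i in range(200_000)])

def time_lookup():
    """Time one representative per-customer query."""
    start = time.perf_counter()
    db.execute("SELECT SUM(total) FROM orders WHERE customer_id = ?",
               (1234,)).fetchone()
    return time.perf_counter() - start

before = time_lookup()  # full table scan: every row is examined
db.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = time_lookup()   # indexed lookup: only the matching rows are read
print(f"scan: {before * 1000:.2f} ms, indexed: {after * 1000:.2f} ms")
```

Under a single user the scan may still look acceptable; it is only under concurrent load, when the disk becomes the shared bottleneck, that it falls apart.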

The calculations, on the other hand, are each run on a single CPU core, which means a single user will always experience a calculation time of X (as fast as a single core can perform the calculation), but multiple concurrent users will be able to use separate CPU cores (often 4 or 8 on a standard server) and experience the same calculation time, X.

Another potential scalability factor is caching: if calculation results are cached, the calculations scale better, and average transaction times for them can actually decrease as the number of users increases.

The point of this example is that, until you submit a system to real, heavy traffic, you really have no idea how it will perform when lots of people are using it.

Put bluntly, optimizing the parts of the code you identified as performance bottlenecks when being monitored may end up being a total waste of time. It’s a combination of monitoring and testing that will deliver the information you need to properly scale.
