What methodology would you use to measure the load capacity of a software server application? - performance

I have a high-performance software server application that is expected to get increased traffic in the next few months.
I was wondering what approach or methodology is good to use in order to gauge if the server still has the capacity to handle this increased load?

I think you're looking for Stress Testing and the scenario would be something like:
Create a load test simulating current real application usage
Start with the current number of users and gradually increase the load until one of the following happens, whichever comes first:
you reach the "increased traffic" amount,
errors start occurring, or
you start observing performance degradation.
Depending on the outcome, you can either state that your server handles the increased load without any issues, or you will have found the saturation point and the first bottleneck.
You might also want to execute a Soak Test: leave the system under a high, prolonged load for several hours or days. This way you can detect memory leaks or other capacity problems.
More information: Why ‘Normal’ Load Testing Isn’t Enough
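The ramp described above can be sketched in a few lines of Python. This is only an illustration, not a replacement for a real load tool: `run_step` is a hypothetical hook that would drive one load step (for example by launching a JMeter run) and return the error rate and 90th percentile response time for that step, and the thresholds are made-up defaults.

```python
from typing import Callable, Tuple


def ramp_load(
    run_step: Callable[[int], Tuple[float, float]],
    start_users: int,
    target_users: int,
    step: int,
    max_error_rate: float = 0.01,   # assumed threshold, tune for your system
    max_p90_ms: float = 500.0,      # assumed threshold, tune for your system
) -> Tuple[int, str]:
    """Raise the load step by step and report where and why the ramp stopped.

    run_step(users) is a hypothetical hook that applies the given number of
    users and returns (error_rate, p90_response_ms) for that step.
    """
    users = start_users
    while True:
        error_rate, p90_ms = run_step(users)
        if error_rate > max_error_rate:
            return users, "errors"
        if p90_ms > max_p90_ms:
            return users, "degradation"
        if users >= target_users:
            return users, "target reached"
        # Never overshoot the target load.
        users = min(users + step, target_users)
```

If the result is "target reached", the server coped with the expected traffic; otherwise the returned user count is a first estimate of the saturation point.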

Test the product with one-tenth the data and traffic. Be sure the activity is 'realistic'.
Then consider what will happen as traffic grows -- will the RAM, disk, CPU, network, etc. grow linearly or not?
While you are doing that, look for "hot spots". Optimize them.
Will you be using web pages? Databases? Etc. Each of these things scales differently. (In other words, you have not provided enough details in your question.)
Most canned benchmarks focus on one small aspect of computing; applying the results to a specific application is iffy.

I would start by collecting baseline data on critical resources - typically CPU, memory usage, disk usage, network usage - and track them over time. If any of those resources show regular spikes where they remain at 100% capacity for more than a fraction of a second, under current usage, you have a bottleneck somewhere. In this case, you cannot accept additional load without likely outages.
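As a rough sketch of that check, assuming you already have utilization samples (in percent) taken at a fixed interval by whatever monitoring tool you use, the following Python function finds the stretches where a resource stayed pinned. The function name and the one-second default are my own choices for illustration.

```python
from typing import List, Sequence, Tuple


def saturated_runs(
    samples: Sequence[float],
    interval_s: float,
    threshold: float = 100.0,
    min_duration_s: float = 1.0,  # assumed cut-off, not a standard value
) -> List[Tuple[int, float]]:
    """Return (start_index, duration_s) for each run of consecutive samples
    at or above threshold that lasts at least min_duration_s."""
    runs: List[Tuple[int, float]] = []
    start = None
    for i, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = i
        elif start is not None:
            duration = (i - start) * interval_s
            if duration >= min_duration_s:
                runs.append((start, duration))
            start = None
    # A run may still be open when the samples end.
    if start is not None:
        duration = (len(samples) - start) * interval_s
        if duration >= min_duration_s:
            runs.append((start, duration))
    return runs
```

Regular entries in the result under current traffic are the warning sign described above.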
Next, I'd start figuring out what the bottleneck resource for your application is - it varies between applications, but in most cases it's the bottleneck resource that stops you from scaling further. Your CPU might be almost idle, but you're thrashing the disk I/O, for instance. That's a tricky process - load and stress testing are the way to go.
If you can resolve the bottleneck by buying better hardware, do so - it's much cheaper than rewriting the software. If you can't buy better hardware, look at load balancing. If you can't load balance, you've got to look at application architecture and implementation and see if there are ways to move the bottleneck.
It's quite typical for the bottleneck to move from one resource to the next - you've got CPU to behave, but now when you increase traffic, you're spiking disk I/O; once you resolve that, you may get another CPU challenge.

Related

Performing load/performance testing in VDI: does it provide proper results?

Till now I was doing load/performance testing (LoadRunner & JMeter) on my local instance (connected to the LAN, not over Wi-Fi) and I was sure about the results. But now I have to do it on a virtual desktop infrastructure (VDI). Does it provide the same results as the local instance? Is it good practice to perform the tests over the VDI?
LoadRunner or JMeter don't care about underlying hardware as you have at least 2 abstraction layers: operating system and C and/or Java runtime.
So given your VDI has the same hardware specifications - you should be getting the same results for the same test in terms of delivered load. I would however recommend monitoring your VDI main health metrics, like CPU load, RAM and Pagefile usage, Network and Disk IO, etc.
In the majority of cases VDIs don't have fully dedicated resources. For example, if you see 64GB of RAM, it is not guaranteed that you can allocate all of it, as the RAM may be shared with other VDIs at the hypervisor level.
So monitor your load generator(s) system resources usage and make sure you have enough headroom for running your load tests. See How to Monitor Your Server Health & Performance During a JMeter Load Test guide for more details.
Use a physical load generator as a control element. Run single virtual users of each type on the physical box. If you see that your control group begins to take on different performance characteristics (slower, higher degrees of variance as measured by standard deviation, higher averages, 90th percentiles and maximums) then you have a case for maintaining some physical infrastructure for testing.
The biggest issue directly attacking timing record integrity inside virtualized load generators is clock jump. The system clock in the virtualized host drifts slower with respect to the physical clock on the hardware. Occasionally it needs to be re-synced, causing time to jump. Inevitably this happens while a timing record is open and produces what appears to be a long timing record. Some people suggest that this doesn't happen until you start to see backups in the CPU queue length, which is somewhere in the 75-80% CPU range. I have seen it at as low as 10% CPU, because under those light loads the hypervisor can decide to parcel resources out to higher-need virtualized instances, and then when you get the CPU token back it is time to jump the clock.
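One way to look for such jumps from inside the load generator, sketched here as an assumption rather than something LoadRunner or JMeter do for you, is to record the wall clock and a monotonic clock side by side and flag intervals where the two disagree. Whether the monotonic clock is itself unaffected by the hypervisor depends on the platform, so treat the result as a hint.

```python
from typing import List, Sequence, Tuple


def find_clock_jumps(
    readings: Sequence[Tuple[float, float]],
    tolerance_s: float = 0.05,  # assumed tolerance
) -> List[Tuple[int, float]]:
    """readings holds (wall_clock_s, monotonic_s) pairs taken together.

    Returns (index, skew_s) for each reading where the wall clock advanced
    by a noticeably different amount than the monotonic clock since the
    previous reading.
    """
    jumps: List[Tuple[int, float]] = []
    for i in range(1, len(readings)):
        wall_delta = readings[i][0] - readings[i - 1][0]
        mono_delta = readings[i][1] - readings[i - 1][1]
        skew = wall_delta - mono_delta
        if abs(skew) > tolerance_s:
            jumps.append((i, skew))
    return jumps
```

The readings could be collected with `(time.time(), time.monotonic())` once per second during the test; timing records that overlap a flagged interval are candidates for exclusion.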
The control load generator provides a check against this behavior. If necessary you can even use the control load generator in a statistical sampling model along the lines of manufacturing quality control. You can also show objectively to anyone demanding you move to a virtualized model what the impact of this change will be on the integrity of the response time samples collected.
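A minimal sketch of that comparison, assuming you have exported the response times of the same transaction from the physical control generator and from a virtualized one: the 10% tolerance and the nearest-rank percentile are my own choices for illustration.

```python
import statistics
from typing import Dict, Sequence


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Mean, standard deviation, 90th percentile (nearest rank) and maximum."""
    ordered = sorted(samples)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) in integer arithmetic
    return {
        "mean": statistics.mean(ordered),
        "stdev": statistics.stdev(ordered),  # needs at least two samples
        "p90": ordered[rank - 1],
        "max": ordered[-1],
    }


def drift(
    control: Sequence[float],
    candidate: Sequence[float],
    tolerance: float = 0.10,  # assumed: flag anything more than 10% worse
) -> Dict[str, float]:
    """Return the relative increase of each metric that is worse on the
    candidate generator than on the control by more than tolerance."""
    base = summarize(control)
    other = summarize(candidate)
    flagged: Dict[str, float] = {}
    for name, value in base.items():
        if value > 0:
            increase = (other[name] - value) / value
            if increase > tolerance:
                flagged[name] = increase
    return flagged
```

An empty result means the virtualized generator tracks the control; a non-empty one gives you the objective numbers to show whoever is asking for the move.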
It depends on the VDI. Is it on the same network as your localhost?
If it is, the results will be almost the same (this also depends on the configuration of the VDI, but the overhead is usually small enough to go unnoticed).
If it is not, the results will depend on how good that network is.

Performance of CPU

While going through Computer organisation by Patterson, I encountered a question on which I am completely stuck. The question is:
Suppose we know that an application that uses both a desktop client and a remote server is limited by network performance. For the following changes state whether only the throughput improves, both response time and throughput improve, or neither improves.
And the changes made are:
More memory is added to the computer
If we add more memory, shouldn't the throughput and execution time improve?
To be clear, the book defines throughput and response time as follows:
Throughput: The amount of work done in a given time.
Response Time: The time required to complete a task, including I/O device activities, operating system overhead, disk access and memory access.
Assume the desktop client is your internet browser. And the server is the internet, for example the stackoverflow website. If you're having network performance problems, adding more RAM to your computer won't make browsing the internet faster.
More memory helps only when the application needs more memory. For any other limitation, the additional memory will simply remain unused.
You have to think like a text book here. If your only given constraint is network performance, then you have to assume that there are no other constraints.
Therefore, the question boils down to: how does increasing the memory affect network performance?
If you throw in other constraints such as the system is low on memory and actively paging, then maybe response time improves with more memory and less paging. But the only constraint given is network performance.
It won't make a difference, as you are already bound by network performance. Imagine you have a large tank of water with a tiny pipe coming out of it. Suppose you want to get more water out within a given amount of time (throughput). Does it make sense to add more water to the tank to achieve that? It does not, as we are bound by the width of the pipe. Either you add more pipes or you widen the pipe you have.
Going back to your question: if the whole system is bound by network performance, you need to add more bandwidth to see any improvement. Doing anything else is pointless.
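The pipe analogy can be put in numbers. In this toy model (all figures are invented for illustration) each part of the system has a capacity in requests per second, and the system as a whole can go no faster than its slowest part.

```python
from typing import Dict, Tuple


def bottleneck(capacities: Dict[str, float]) -> Tuple[str, float]:
    """Return the limiting component and the resulting system throughput."""
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


# Hypothetical capacities in requests per second.
system = {"client_memory": 500.0, "network": 50.0, "server_cpu": 300.0}

before = bottleneck(system)
more_memory = bottleneck({**system, "client_memory": 1000.0})
more_bandwidth = bottleneck({**system, "network": 100.0})
```

Here `before` and `more_memory` are both `("network", 50.0)`: doubling the memory changes nothing. Only `more_bandwidth` moves the limit, to `("network", 100.0)`.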

Report analysis

Please can you provide a detailed analysis of the attached graph? Just a general review. This is a composite graph of transactions, response times and active threads.
I am not able to explain this graph to my client.
Configuration-
Thread-500
Ramp up-100
loop count-1.
Is it good to go or not? And what about the spikes and errors that need to be handled? What is the reason for these errors, and what can we do to fix them? Please let me know.
http://i.stack.imgur.com/IKfsi.png
At peak transactions (red line) we would have expected peak errors (pink line) and peak response time (blue line). Oddly, however, the peak errors occurred on the down ramp of transactions, implying a lag in processing, while response times abnormally continued to increase. In my experience this implies strained system resources; try increasing the server's system resources and retest for comparison.
Two screenshots are not enough to perform an analysis; however, looking at them I can make out the following:
On a well-behaved system, throughput should increase as the load increases. On this one it doesn't.
Response time continues to grow even after the load has ended. This can indicate a memory leak on the application under test side. Try Soak Testing: put the application under prolonged load and see if it fails with an out-of-memory error.
I would also recommend monitoring system resources on the application under test side, as the application can misbehave if it lacks, e.g., RAM, or because of a slow disk. It might be cheaper to add RAM or replace the disk than to tune the application to fit into the existing hardware.

Is there a technique to predict performance impact of application

A customer is running a clustered web application server under considerable load. He wants to know if the upcoming application, which is not implemented yet, will still be manageable by his current setup.
Is there an established method to predict the performance impact of an application in the concept stage, based on an existing requirement specification (or maybe a functional design specification)?
First priority would be to predict the impact on CPU resource.
Is it possible to get fairly exact results at all?
I'd say the canonical answer is no. You always have to benchmark the actual application being deployed on its target architecture.
Why? Software and software development are not predictable. And systems are even more unpredictable.
Even if you know the requirements now and have done deep analysis what happens if:
The program has a performance bug (or two...) - which might even be a bug in a third-party library
New requirements are added or requirements change
The analysis and design don't spot all the hidden inter-relationships between components
There are non-linear effects of adding load and the new load might take the hardware over a critical threshold (a threshold that is not obvious now).
These concerns are not theoretical. If they were, SW development would be trivial and projects would always be delivered on time and to budget.
However, there are some heuristics I have personally used that you can apply. First you need a really good understanding of the current system:
Break the existing system's functions down into small, medium and large and benchmark those on your hardware
Perform a load test of these individual functions and capture throughput in transactions/sec, CPU cost, network traffic and disk I/O figures for as many of these transactions as possible, making sure you have representation of small, medium and large. This load test should take the system up to the point where additional load will decrease transactions/sec
Get the figures for the max transactions/sec of the current system
Understand the rate of growth of this application and plan accordingly
Perform the analysis to get an 'average' small, medium and large 'cost' in terms of CPU, RAM, disk and network. This would be of the form:
Small transaction
CPU cost: 10ms
RAM overhead: 5MB (cache)
RAM working: 100kb (eg 10 concurrent threads = 1MB, 100 threads = 10MB)
Disk I/O: 5kb (database)
Network app<->DB: 10kb
Network app<->browser: 40kb
From this analysis you should understand how much headroom you have - CPU certainly, but check that there is sufficient RAM, network and disk capacity. Eg, the CPU required for small transactions is number of small transactions per second multiplied by the CPU cost of a small transaction. Add in the CPU cost of medium transactions and large ones, and you have your CPU budget.
Make sure the DBAs are involved. They need to do the same on the DB.
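The CPU budget arithmetic described above is simple enough to sketch. Only the 10ms small-transaction cost comes from the example figures; the medium and large costs, the transaction rates and the core count below are invented for illustration, and the model assumes CPU cost adds up linearly, which real systems only approximate.

```python
from typing import Dict


def cpu_utilization(
    rates_per_s: Dict[str, float],
    cpu_ms: Dict[str, float],
    cores: int,
) -> float:
    """Fraction of total CPU capacity consumed by the given transaction mix.

    Each core supplies 1000 ms of CPU time per second of wall time.
    """
    busy_ms_per_s = sum(rates_per_s[size] * cpu_ms[size] for size in rates_per_s)
    return busy_ms_per_s / (cores * 1000.0)


# Hypothetical figures: cost per transaction in ms, peak rates per second.
cost = {"small": 10.0, "medium": 50.0, "large": 200.0}
current = {"small": 200.0, "medium": 40.0, "large": 5.0}
# The same mix plus the estimated load of the upcoming application.
planned = {"small": 250.0, "medium": 40.0, "large": 5.0}

current_share = cpu_utilization(current, cost, cores=8)
planned_share = cpu_utilization(planned, cost, cores=8)
```

With these numbers the current mix uses 62.5% of the CPU and the planned mix 68.75%; the same calculation would be repeated for RAM, disk and network.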
Now you need to analyse your upcoming application:
Assign each feature to the same small, medium and large buckets, ensuring a like-for-like matching as far as possible
Ask deep, probing questions about how many transactions/sec each feature will experience at peak
Talk about the expected rate of growth of the application
Don't forget that the system may slow as the size of the database increases
On a personal note, you are being asked to predict the unpredictable - putting your name and reputation on the line. If you say it can fit, you are owning the risk for a large software development project. If you are being pressured to say yes, you need to ensure that there are many other people's names involved along with yours - and those names should all be visible on the go/no-go decision. Not only is this more likely to ensure that all factors are considered, and that the analysis is sound, but it will also ensure that the project has many involved individuals personally aligned to its success.

Optimal CPU utilization thresholds

I have built software that I deploy on Windows 2003 server. The software runs as a service continuously and it's the only application on the Windows box of importance to me. Part of the time, it's retrieving data from the Internet, and part of the time it's doing some computations on that data. It's multi-threaded -- I use thread pools of roughly 4-20 threads.
I won't bore you with all those details, but suffice it to say that as I enable more threads in the pool, more concurrent work occurs, and CPU use rises. (as does demand for other resources, like bandwidth, although that's of no concern to me -- I have plenty)
My question is this: should I simply try to max out the CPU to get the best bang for my buck? Intuitively, I don't think it makes sense to run at 100% CPU; even 95% CPU seems high, almost like I'm not giving the OS much space to do what it needs to do. I don't know the right way to identify the best balance. I'm guessing I could measure and measure and probably find that the best throughput is achieved at an average CPU utilization of 90% or 91%, etc., but...
I'm just wondering if there's a good rule of thumb about this. I don't want to assume that my testing will take into account all kinds of variations of workloads. I'd rather play it a bit safe, but not too safe (or else I'm underusing my hardware).
What do you recommend? What is a smart, performance minded rule of utilization for a multi-threaded, mixed load (some I/O, some CPU) application on Windows?
Yep, I'd suggest 100% is thrashing so wouldn't want to see processes running like that all the time. I've always aimed for 80% to get a balance between utilization and room for spikes / ad-hoc processes.
An approach I've used in the past is to crank up the pool size slowly and measure the impact (both on CPU and on other constraints such as I/O). You never know; you might find that I/O suddenly becomes the bottleneck.
CPU utilization shouldn't matter in this I/O-intensive workload; you care about throughput. Try a hill-climbing approach: programmatically inject or remove worker threads and track completion progress...
If you add a thread and it helps, add another one. If you try a thread and it hurts remove it.
Eventually this will stabilize.
If this is a .NET based app, hill climbing was added to the .NET 4 threadpool.
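For illustration only, here is a bare-bones version of that loop in Python. It is not how the .NET thread pool implements it; `measure` is a hypothetical hook that would run the pool with the given number of workers for a while and return the observed throughput, and the bounds are arbitrary.

```python
from typing import Callable


def hill_climb_threads(
    measure: Callable[[int], float],
    start: int = 4,
    min_threads: int = 1,
    max_threads: int = 64,
) -> int:
    """Add or remove one worker at a time while throughput keeps improving.

    measure(n) is a hypothetical hook returning the throughput observed
    with n worker threads.
    """
    threads = start
    best = measure(threads)
    # Try adding threads first; only if that never helps, try removing them.
    for direction in (1, -1):
        moved = False
        while min_threads <= threads + direction <= max_threads:
            candidate = measure(threads + direction)
            if candidate <= best:
                break  # the last change hurt (or did nothing): undo it
            threads += direction
            best = candidate
            moved = True
        if moved:
            break
    return threads
```

Real measurements are noisy, so in practice each call to `measure` should average over a long enough window, and the search should be repeated when the workload changes.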
UPDATE:
Hill climbing is a control-theory-based approach to maximizing throughput. You can call it trial and error if you want, but it is a sound approach. In general, there isn't a good 'rule of thumb' to follow here: the overheads and latencies vary so much that it's not really possible to generalize. The focus should be on throughput and task/thread completion, not CPU utilization. For example, it's pretty easy to peg the cores with coarse- or fine-grained synchronization without actually making a difference in throughput.
Also regarding .NET 4, if you can reframe your problem as a Parallel.For or Parallel.ForEach then the threadpool will adjust number of threads to maximize throughput so you don't have to worry about this.
-Rick
Assuming nothing else of importance but the OS runs on the machine:
If your load is constant, you should aim at 100% CPU utilization; anything less is a waste of CPU. Remember that the OS schedules the threads, so it is always able to run; it's hard to starve the OS with a well-behaved program.
But if your load is variable and you expect peaks, you should take them into consideration. I'd say 80% CPU is a good threshold to use, unless you know exactly how that load will vary and how much CPU it will demand, in which case you can aim for the exact number.
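If you do know how far your peaks rise above the average, the "exact number" is simple arithmetic. The sketch below assumes CPU demand scales in proportion to load, which is a simplification; the function name is mine.

```python
def target_average_utilization(peak_to_average: float, ceiling: float = 1.0) -> float:
    """Average utilization that keeps peaks at or below the ceiling.

    peak_to_average is the ratio of peak load to average load (>= 1).
    Assumes CPU demand is proportional to load.
    """
    if peak_to_average < 1.0:
        raise ValueError("peak_to_average must be at least 1")
    return ceiling / peak_to_average
```

For example, peaks 25% above average (`peak_to_average=1.25`) give a target of 80%, which matches the rule of thumb; peaks at twice the average would call for 50%.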
If you simply give your threads a low priority, the OS will do the rest, and take cycles as it needs to do work. Server 2003 (and most Server OSes) are very good at this, no need to try and manage it yourself.
I have also used 80% as a general rule-of-thumb for target CPU utilization. As some others have mentioned, this leaves some headroom for sporadic spikes in activity and will help avoid thrashing on the CPU.
Here is a little (older but still relevant) advice from the WebLogic crew on this issue: http://docs.oracle.com/cd/E13222_01/wls/docs92/perform/basics.html#wp1132942
If you feel your load is very even and predictable you could push that target a little higher, but unless your user base is exceptionally tolerant of periodic slow responses and your project budget is incredibly tight, I'd recommend adding more resources to your system (adding a CPU, using a CPU with more cores, etc.) over making a risky move to try to squeeze out another 10% CPU utilization out of your existing platform.

Resources