What is the use of Synchronizing Timer?
What is the purpose of "Std deviation" in summary report?
What is the difference between running the jmeter script in GUI and Command Prompt?
Synchronizing Timer:
Consider that you are load-testing.
Start 25 threads (with synchronizing timer disabled).
You will note that the start time of first thread will have a difference of about 800ms to 1000ms when compared to the last thread.
This ideally is not a good testing condition for loads.
Now consider the same scenario with synchronizing timer enabled. You will notice that the start time of all the threads is absolutely the same. Ideal scenario for load testing.
Std Deviation:
Standard deviation quantifies or indicates how much the response time varies around its mean or average. I would suggest not to judge the system performance based on Standard Deviation. In reality this just indicates how much the system is fluctuating. Nevertheless, Deviations should be minimum i.e. less than 5%.
GUI and CMD:
Lets just say that on one hand, the GUI makes the program more intuitive; on the other hand, it consumes more resources. JMeter GUI should only be used for test development or debugging. Personally, I do not advise using JMeter in GUI mode if you are initiating an actual load test.
JMeter official documentation defines Synchronizing Timer very well.
The purpose of the SyncTimer is to block threads until X number of threads have been blocked, and then they are all released at once. A SyncTimer can thus create large instant loads at various points of the test plan.
So, we can use Synchronizing Timer in order to create required loads. For example if we use 3000 value in Synchronizing Timer then all the requests will keep accumulating for 3 seconds and will be instantly released after 3 seconds, thus creating greater load.
Standard Deviation gives you an idea that how much variation is in results from the average. In general we can say that, a lower Std deviation value means good performance and higher std deviation value points to issues.
JMeter GUI mode is suitable only for creating scripts or debugging them. While performing actual load tests, JMeter should be run from CMD as it is more efficient and consumes less memory as compared to GUI mode. Check this JMeter blog on how to run JMeter from CMD.
Related
I have 5 Scenarios in total, and 70 Users segregated to different Scenarios which runs for around 15 Minutes only with 1 Loop configuration.
Is it ideal test duration to evaluate the realistic Performance results?
Or do I need to adjust with the test duration?
Any suggestion on this is highly appreciated.
Thanks
It depends on what you're trying to achieve. 70 concurrent users doesn't look like a real "load" to me, moreover given you have only one loop you may run into the situation when some users have already finished executing their scenarios and were shut down and some are still running or even have not yet been started. So I would recommend monitoring the real concurrency using i.e. Active Threads Over Time listener to see how many users were online at the given stage of the test.
Normally the following testing types are conducted:
Load testing - putting the system under anticipated load and ensuring that main metrics (i.e. response time and throughput) are matching NFRs or SLAs
Soak testing - basically the same as load testing, but it assumes prolonged duration (several hours, overnight or over the weekend). This testing type allows to discover obvious and non-obvious memory leaks
Stress testing - starting with anticipated number of users and gradually increasing the load until response time starts exceeding acceptable threshold or errors start occurring (whatever comes the first). If will shed some light on the slowest or most fragile component, to wit the first performance bottleneck
Check out Why ‘Normal’ Load Testing Isn’t Enough article for more information on the aforementioned performance testing types.
No matter which test you're conducting consider increasing (and decreasing) the load gradually, i.e. come up with proper ramp-up (and ramp-down) strategies, this way you will be able to correlate increasing load with i.e. increasing response time
Performance Tests in java is a bit tricky, it can vary wildly depending on what other programs are running on the system and what its load is.
In an ideal world, you need to use a dedicated system, if you can't make sure to quit all programs you're running (including the IDE), The Java HotSpot compiler kicks in when it sees a ‘hot spot’ in your code. It is therefore quite common that your code will run faster over time! So, you should adapt and repeat your testing methods, investigate memory and CPU usage.
or even better you can use a profiler. There are plenty around, both free profilers and demos / time-locked trials of commercials strength ones.
My scenario,
Step1: I have set my thread group for 1000:threads & 500:seconds
Step2:Configure heep space : HEAP=-Xms1024m -Xmx1024m
Step3:Now, running jmeter for non gui mode.
In this scenario,"Uncaught Exception java.lang.OutOfMemoryError: unable to create new native thread" error occuring in my system.
My system configuration
Processor:Intel® Pentium(R) CPU G2010 # 2.80GHz × 2
OS Type:32 bit
Disc:252.6GB
Memory:3.4 GiB
kindly give me a solution for this scenario.
Thanks,
Vairamuthu.
You don't have enough memory in your machine to consume 1000 threads. It is clearly visible from the error that your machine can not create 1000 threads. You should tweak your machine to resolve this situation.
You have to consider these points:
JMeter is a Java tool it runs with JVM. To obtain maximum capability, we need to provide maximum resources to JMeter during execution.First, we need to increase heap size (Inside JMeter bin directory, we get jmeter.bat/sh)
HEAP=-Xms512m –Xmx512m
It means default allocated heap size is minimum 512MB, maximum 512MB. Configure it as per you own PC configuration. Keep in mind, OS also need some amount of memory, so don't allocate all of you physical RAM.
Then, add memory allocation rate
NEW=-XX:NewSize=128m -XX:MaxNewSize=512m
This means memory will be increased at this rate. You should be careful, because, if your load generation is very high at the beginning, this might need to increase. Keep in mind, it will fragment your heap space inside JVM if the range too broad. If so Garbage Collector needs to work harder to clean up.
JMeter is Java GUI application. It also has the non-GUI edition which is very resource intensive(CPU/RAM). If we run Jmeter in non-GUI mode , it will consume less resource and we can run more thread.
Disable all the Listeners during the Test Run. They are only for debugging and use them to design your desired script.
Listeners should be disabled during load tests. Enabling them causes additional overheads, which consume valuable resources (more memory) that are needed by more important elements of your test.
Always try to use the Up-to-date software. Keep your Java and JMeter updated.
Don’t forget that when it comes to storing requests and response headers, assertion results and response data can consume a lot of memory! So try not to store these values on JMeter unless it’s absolutely necessary.
Also, you need to monitor whether your machine's Memory consumption, CPU usages are running below 80 % or not. If these usages exceed 80 % consider those tests as unreliable as report.
After all of these, if you can't generate 1000 threads from your machine, then you must try with the Distributed Load Testing.
Here is a document for JMeter Distributed Testing Step-by-step.
For better and more elaborated understanding these two blogs How many users JMeter can support? and 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure must help.
I have also found this article very helpful to understand and how to handle them.
The error is due to lack of free RAM.
Looking into your hardware, it doesn't seem you will be able to produce the load of 1k users so I would recommend reconsidering your approach.
For example, you anticipate 1000 simultaneous users working with your application. However it doesn't necessarily mean 100 concurrent users as:
real users don't hammer application non-stop, they need some time to "think" between operations, this "think time" differs depending on application nature, but you should keep it as close to reality as possible
application response time should be added to think time
So given you have 1000 users, each of them "thinks" 10 seconds between operations and application response time is 2 seconds, each user will be able to send 5 requests per minute (60 / (10 + 2)).
Assuming above scenario 1000 users will send 5000 requests per minute which gives us ~83 requests per second and it seems to be achievable with your current hardware.
So if you are not in position to get more powerful hardware or more similar machines to use JMeter in distributed more, the options are in:
Add "think times" between operations using i.e. Constant Timer or Uniform Random Timer
Change your test scenario logic to simulate "requests per second" rather than "concurrent users". You can do it using Constant Throughput Timer or Throughput Shaping Timer.
Your issue is due to using a 32 bit OS, in this mode you are limited both in what you can allocate as Heap (depending on OS you will not be able to exceed 1.6 to 2.1 g) and native threads creation.
I'd suggest switching to 64 Bits OS + 64 bits Jdk.
But if you don't have any other option try setting in jmeter.sh in JVM_ARGS:
-Xss128k
Or if too low:
-Xss256k
When using dotTrace, I have to pick a profiling mode and a time measurement method. Profiling modes are:
Tracing
Line-by-line
Sampling
And time measurement methods are:
Wall time (performance counter)
Thread time
Wall time (CPU instruction)
Tracing and line-by-line can't use thread time measurement. But that still leaves me with seven different combinations to try. I've now read the dotTrace help pages on these well over a dozen times, and I remain no more knowledgeable than I started out about which one to pick.
I'm working on a WPF app that reads Word docs, extracts all the paragraphs and styles, and then loops through that extracted content to pick out document sections. I'm trying to optimize this process. (Currently it takes well over an hour to complete, so I'm trying to profile it for a given length of time rather than until it finishes.)
Which profiling and time measurement types would give me the best results? Or if the answer is "It depends", then what does it depend on? What are the pros and cons of a given profiling mode or time measurement method?
Profiling types:
Sampling: fastest but least accurate profiling-type, minimum profiler overhead. Essentially equivalent to pausing the program many times a second and viewing the stacktrace; thus the number of calls per method is approximate. Still useful for identifying performance bottlenecks at the method-level.
Snapshots captured in sampling mode occupy a lot less space on disk (I'd say 5-6 less space.)
Use for initial assessment or when profiling a long-running application (which sounds like your case.)
Tracing: Records the duration taken for each method. App under profiling runs slower but in return, dotTrace shows exact number of calls of each function, and function timing info is more accurate. This is good for diving into details of a problem at the method-level.
Line-by-line: Profiles the program on a per-line basis. Largest resource hog but most fine-grained profiling results. Slows the program way down. The preferred tactic here is to initially profile using another type, and then hand-pick functions for line-by-line profiling.
As for meter kinds, I think they are described quite well in Getting started with dotTrace Performance by the great Hadi Hariri.
Wall time (CPU Instruction): This is the simplest and fastest way to measure wall time (that is, the
time we observe on a wall clock). However, on some older multi-core processors this may produce
incorrect results due to the cores timers being desynchronized. If this is the case, it is recommended
to use Performance Counter.
Wall time (Performance Counter): Performance counters is part of the Windows API and it allows
taking time samples in a hardware-independent way. However, being an API call, every measure takes
substantial time and therefore has an impact on the profiled application.
Thread time: In a multi-threaded application concurrent threads contribute to each other's wall time.
To avoid such interference we can use thread time meter which makes system API calls to get the
amount of time given by the OS scheduler to the thread. The downsides are that taking thread time
samples is much slower than using CPU counter and the precision is also limited by the size of
quantum used by thread scheduler (normally 10ms). This mode is only supported when the Profiling
Type is set to Sampling
However they don't differ too much.
I'm not a wizard in profiling myself but in your case I'd start with sampling to get a list of functions that take ridiculously long to execute, and then I'd mark them for line-by-line profiling.
Recently I was doing some deep timing checks on a DirectShow application I have in Delphi 6, using the DSPACK components. As part of my diagnostics, I created a Critical Section class that adds a time-out feature to the usual Critical Section object found in most Windows programming languages. If the time duration between the first Acquire() and the last matching Release() is more than X milliseconds, an Exception is thrown.
Initially I set the time-out at 10 milliseconds. The code I have wrapped in Critical Sections is pretty fast using mostly memory moves and fills for most of the operations contained in the protected areas. Much to my surprise I got fairly frequent time-outs in seemingly random parts of the code. Sometimes it happened in a code block that iterates a buffer list and does certain quick operations in sequence, other times in tiny sections of protected code that only did a clearing of a flag between the Acquire() and Release() calls. The only pattern I noticed is that the durations found when the time-out occurred were centered on a median value of about 16 milliseconds. Obviously that's a huge amount of time for a flag to be set in the latter example of an occurrence I mentioned above.
So my questions are:
1) Is it possible for Windows thread management code to, on a fairly frequent basis (about once every few seconds), to switch out an unblocked thread and not return to it for 16 milliseconds or longer?
2) If that is a reasonable scenario, what steps can I take to lessen that occurrence and should I consider elevating my thread priorities?
3) If it is not a reasonable scenario, what else should I look at or try as an analysis technique to diagnose the real problem?
Note: I am running on Windows XP on an Intel i5 Quad Core with 3 GB of memory. Also, the reason why I need to be fast in this code is due to the size of the buffer in milliseconds I have chosen in my DirectShow filter graphs. To keep latency at a minimum audio buffers in my graph are delivered every 50 milliseconds. Therefore, any operation that takes a significant percentage of that time duration is troubling.
Thread priorities determine when ready threads are run. There's, however, a starvation prevention mechanism. There's a so-called Balance Set Manager that wakes up every second and looks for ready threads that haven't been run for about 3 or 4 seconds, and if there's one, it'll boost its priority to 15 and give it a double the normal quantum. It does this for not more than 10 threads at a time (per second) and scans not more than 16 threads at each priority level at a time. At the end of the quantum, the boosted priority drops to its base value. You can find out more in the Windows Internals book(s).
So, it's a pretty normal behavior what you observe, threads may be not run for seconds.
You may need to elevate priorities or otherwise consider other threads that are competing for the CPU time.
sounds like normal windows behaviour with respect to timer resolution unless you explicitly go for some of the high precision timers. Some details in this msdn link
First of all, I am not sure if Delphi's Now is a good choice for millisecond precision measurements. GetTickCount and QueryPerformanceCoutner API would be a better choice.
When there is no collision in critical section locking, everything runs pretty fast, however if you are trying to enter critical section which is currently locked on another thread, eventually you hit a wait operation on an internal kernel object (mutex or event), which involves yielding control on the thread and waiting for scheduler to give control back later.
The "later" above would depend on a few things, including priorities mentioned above, and there is one important things you omitted in your test - what is the overall CPU load at the time of your testing. The more is the load, the less chances to get the thread continue execution soon. 16 ms time looks perhaps a bit still within reasonable tolerance, and all in all it might depends on your actual implementation.
We sometimes have to run some CPU-bound tests where we want to measure runtime. The tests last in the order of a minute. The problem is that from run to run the runtime varies by quite a lot (+/- 5%). We suspect that the variation is caused by activity from other applications/services on the system, eg:
Applications doing housekeeping in their idle time (e.g. Visual Studio updating IntelliSense)
Filesystem indexers
etc..
What tips are there to make our benchmark timings more stable?
Currently we minimize all other applications, run the tests at "Above Normal" priority, and not touch the machine while it runs the test.
The usual approach is to perform lots of repetitions and then discard outliers. So, if the distractions such as the disk indexer only crops up once every hour or so, and you do 5 minutes runs repeated for 24 hours, you'll have plenty of results where nothing got in the way. It is a good idea to plot the probability density function to make sure you are understand what is going on. Also, if you are not interested in startup effects such as getting everything into the processor caches then make sure the experiment runs long enough to make them insignificant.
First of all, if it's just about benchmarking the application itself, you should use CPU time, not wallclock time as a measure. That's then (almost) free from influences of what the other processes or the system do. Secondly, as Dickon Reed pointed out, more repetitions increase confidence.
Quote from VC++ team blog, how they do performance tests:
To reduce noise on the benchmarking machines, we take several steps:
Stop as many services and processes as possible.
Disable network driver: this will turn off the interrupts from NIC caused by >broadcast packets.
Set the test’s processor affinity to run on one processor/core only.
Set the run to high priority which will decrease the number of context switches.
Run the test for several iterations.
I do the following:
Call the method x times and measure the time
Do this n times and calculate the mean and standard deviation of those measurements
Try to get the x to a point where you're at a >1 second per measurement. This will reduce the noise a bit.
The mean will tell you the average performance of your test and the standard deviation the stability of your test/measurements.
I also set my application at a very high priority, and when I test a single-thread algorithm I associate it with one cpu core to make sure there is not scheduling overhead.
This code demonstrates how to do this in .NET:
Thread.CurrentThread.Priority = ThreadPriority.Highest;
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;
if (Environment.ProcessorCount > 1)
{
Process.GetCurrentProcess().ProcessorAffinity =
new IntPtr(1 << (Environment.ProcessorCount - 1));
}