How do online judge sites isolate program performance? - performance

There are many online judge sites which can verify your program by comparing its output to the correct answers. What's more, they also check the running time to make sure that your program running time doesn't exceed the maximum limit.
So here is my question, since some online judge sites run several test programs at the same time, how do they achieve performance isolation, i.e., how can they make sure that a user program running in a heavy-loaded environment will finish within the same time, as when it is running in an idle environment?

Operating systems keep track of CPU time separately from real-world "wall clock" time. It's very common when benchmarking to only look at one or the other kind of time. CPU or file I/O intensive tasks can be measured with just CPU time. Tasks that require external resources, like querying a remote database, are best measured in wall clock time because you don't have access to the CPU time on the remote resource.
If a judging site is just comparing CPU times of different tests, the site can run many tests simultaneously. On the other hand, if wall clock times matter, then the site must either use independent hardware or a job queue that ensures one test finishes before the next starts.

As The Computer Language Benchmarks Game measures both CPU time and Elapsed time those measurements are made sequentially in an idle environment.

Related

Ideal test duration for Performance Test

I have 5 Scenarios in total, and 70 Users segregated to different Scenarios which runs for around 15 Minutes only with 1 Loop configuration.
Is it ideal test duration to evaluate the realistic Performance results?
Or do I need to adjust with the test duration?
Any suggestion on this is highly appreciated.
Thanks
It depends on what you're trying to achieve. 70 concurrent users doesn't look like a real "load" to me, moreover given you have only one loop you may run into the situation when some users have already finished executing their scenarios and were shut down and some are still running or even have not yet been started. So I would recommend monitoring the real concurrency using i.e. Active Threads Over Time listener to see how many users were online at the given stage of the test.
Normally the following testing types are conducted:
Load testing - putting the system under anticipated load and ensuring that main metrics (i.e. response time and throughput) are matching NFRs or SLAs
Soak testing - basically the same as load testing, but it assumes prolonged duration (several hours, overnight or over the weekend). This testing type allows to discover obvious and non-obvious memory leaks
Stress testing - starting with anticipated number of users and gradually increasing the load until response time starts exceeding acceptable threshold or errors start occurring (whatever comes the first). If will shed some light on the slowest or most fragile component, to wit the first performance bottleneck
Check out Why ‘Normal’ Load Testing Isn’t Enough article for more information on the aforementioned performance testing types.
No matter which test you're conducting consider increasing (and decreasing) the load gradually, i.e. come up with proper ramp-up (and ramp-down) strategies, this way you will be able to correlate increasing load with i.e. increasing response time
Performance Tests in java is a bit tricky, it can vary wildly depending on what other programs are running on the system and what its load is.
In an ideal world, you need to use a dedicated system, if you can't make sure to quit all programs you're running (including the IDE), The Java HotSpot compiler kicks in when it sees a ‘hot spot’ in your code. It is therefore quite common that your code will run faster over time! So, you should adapt and repeat your testing methods, investigate memory and CPU usage.
or even better you can use a profiler. There are plenty around, both free profilers and demos / time-locked trials of commercials strength ones.

Set CPU affinity for profiling

I am working on a calculation intensive C# project that implements several algorithms. The problem is that when I want to profile my application, the time it takes for a particular algorithm varies. For example sometimes running the algorithm 100 times takes about 1100 ms and another time running 100 times takes much more time like 2000 or even 3000 ms. It may vary even in the same run. So it is impossible to measure improvement when I optimize a piece of code. It's just unreliable.
Here is another run:
So basically I want to make sure one CPU is dedicated to my app. The PC has an old dual core Intel E5300 CPU running on Windows 7 32 bit. So I can't just set process affinity and forget about one core forever. It would make the computer very slow for daily tasks. I need other apps to use a specific core when I desire and the when I'm done profiling, the CPU affinities come back to normal. Having a bat file to do the task would be a fantastic solution.
My question is: Is it possible to have a bat file to set process affinity for every process on windows 7?
PS: The algorithm is correct and every time runs the same code path. I created some object pool so after first run, zero memory is allocated. I also profiled memory allocation with dottrace and it showed no allocation after first run. So I don't believe GC is triggered when the algorithm is working. Physical memory is available and system is not running low on RAM.
Result: The answer by Chris Becke does the job and sets process affinities exactly as intended. It resulted in more uniform results specially when background apps like visual studio and dottrace are running. Further investigation into the divergent execution time revealed that the root for the unpredictability is CPU overheat. The CPU overheat alarm was off while the temperature was over 100C! So after fixing the malfunctioning fan, the results became completely uniform.
You mean SetProcessAffinityMask?
I see this question, while tagged windows, is c#, so... I see the System.Diagnostics.Process object has a ThreadAffinity member that should perform the same function.
I am just not sure that this will stabilize the CPU times quite in the way you expect. A single busy task that is not doing IO should remain scheduled on the same core anyway unless another thread interrupts it, so I think your variable times are more due to other threads / processes interrupting your algorithm than the OS randomly shunting your thread to a different core - so unless you set the affinity for all other threads in the system to exclude your preferred core I can't see this helping.

How to measure interference between processes

In parallel systems every process has an impact onto other processes, because they all compete for several scarce resources like cpu-caches, memory, disk I/O, network, etc.
What method is best suited for measuring interference between processes? Such as Process A & B each access the disk heavily. So running them parallel will probably slower then running sequential (individual runtime). Because the bottleneck is the hard drive.
If I don't know exactly the behaviour of a process (disk-, memory- or cpu- intensive), what method would be best to analyse that?
Measure individual runtime and compare the relative share of each parallel process?
Like process A runs on average 30s alone, when 100% parallel with B 45s, when 20% parallel 35s.. etc ??
Would it be better to compare several indicators like L1 & LLC cache misses, page faults, etc.??
What you need to do is first determine what the limiting factors are on each of the individual programs. If you want to run CPU-bound and IO-bound at the same time it'll have very little impact. If you want to run two IO-bound processes and the same time there'll be a lot of contention.
I wrote a rather detailed answer about how to interpret the output of "time [command]" results to see what's the limiting factor. It's here: What caused my elapsed time much longer than user time?
Once you have the ouput from "time"ing your programs you can determine which are likely to step on one another and which are not.

How do I get repeatable CPU-bound benchmark runtimes on Windows?

We sometimes have to run some CPU-bound tests where we want to measure runtime. The tests last in the order of a minute. The problem is that from run to run the runtime varies by quite a lot (+/- 5%). We suspect that the variation is caused by activity from other applications/services on the system, eg:
Applications doing housekeeping in their idle time (e.g. Visual Studio updating IntelliSense)
Filesystem indexers
etc..
What tips are there to make our benchmark timings more stable?
Currently we minimize all other applications, run the tests at "Above Normal" priority, and not touch the machine while it runs the test.
The usual approach is to perform lots of repetitions and then discard outliers. So, if the distractions such as the disk indexer only crops up once every hour or so, and you do 5 minutes runs repeated for 24 hours, you'll have plenty of results where nothing got in the way. It is a good idea to plot the probability density function to make sure you are understand what is going on. Also, if you are not interested in startup effects such as getting everything into the processor caches then make sure the experiment runs long enough to make them insignificant.
First of all, if it's just about benchmarking the application itself, you should use CPU time, not wallclock time as a measure. That's then (almost) free from influences of what the other processes or the system do. Secondly, as Dickon Reed pointed out, more repetitions increase confidence.
Quote from VC++ team blog, how they do performance tests:
To reduce noise on the benchmarking machines, we take several steps:
Stop as many services and processes as possible.
Disable network driver: this will turn off the interrupts from NIC caused by >broadcast packets.
Set the test’s processor affinity to run on one processor/core only.
Set the run to high priority which will decrease the number of context switches.
Run the test for several iterations.
I do the following:
Call the method x times and measure the time
Do this n times and calculate the mean and standard deviation of those measurements
Try to get the x to a point where you're at a >1 second per measurement. This will reduce the noise a bit.
The mean will tell you the average performance of your test and the standard deviation the stability of your test/measurements.
I also set my application at a very high priority, and when I test a single-thread algorithm I associate it with one cpu core to make sure there is not scheduling overhead.
This code demonstrates how to do this in .NET:
Thread.CurrentThread.Priority = ThreadPriority.Highest;
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;
if (Environment.ProcessorCount > 1)
{
Process.GetCurrentProcess().ProcessorAffinity =
new IntPtr(1 << (Environment.ProcessorCount - 1));
}

What is the best way to measure "spare" CPU time on a Linux based system

For some of the customers that we develop software for, we are required to "guarantee" a certain amount of spare resources (memory, disk space, CPU). Memory and disk space are simple, but CPU is a bit more difficult.
One technique that we have used is to create a process that consumes a guaranteed amount of CPU time (say 2.5 seconds every 5 seconds). We run this process at highest priority in order to guarantee that it runs and consumes all of its required CPU cycles.
If our normal applications are able to run at an acceptable level of performance and can pass all of their functionality tests while the spare time process is running as well, then we "assume" that we have met our commitment for spare CPU time.
I'm sure that there are other techniques for doing the same thing, and would like to learn about them.
So this may not be exactly the answer you're looking for, but if all you want to do is make sure your application doesn't exceed certain limits on resource consumption and you're running on linux you can customize /etc/security/limits.con (may be different file on your distro of choice) to force the limits on a particular user and only run the process under that user. This is of course assuming that you have that level of control on your client's production environment.
If I understand correctly, your concern is wether the application also runs while a given percentage of the processing power is not available.
The most incontrovertible approach is to use underpowered hardware for your testing. If the processor in your setup allows you to, you can downclock it online. The Linux kernel gives you an easy interface for doing this, see /sys/devices/system/cpu/cpu0/cpufreq/. There is also a bunch of GUI applications for this available.
If your processor isn't capable of changing clock speed online, you can do it the hard way and select a smaller multiplier in your BIOS.
I think you get the idea. If it runs on 1600 Mhz instead of 2400 Mhz, you can guarantee 33% of spare CPU time.
SAR is a standard *nix process that collects information about the operational use of system resources. It also has a command line tool that allows you to create various reports, and it's common for the data to be persisted in a database.
With a multi-core/processor system you could use Affinity to your advantage.

Resources