Elimination of run time variation over repeated executions of the same program - performance

I am trying to design an Online Programming Contest Judge, and one of the things that I need to ensure is that when the same code is compiled (assuming the requirement),
given the same input, it should take exactly the same amount of time for the program to execute, each time this is done.
Currently, I am using a simple python script that
has 2 threads, one of which invokes a blocking system call that starts the execution of the test code, and the other keeps track of time and sends a kill signal to the
child process after the time limit expires. Incidentally, I am doing this inside a virtual machine for reason of security, and convenience (setting up a proper chroot is
way too complicated, and more risky).
However, given identical conditions (ie, when I restore a snapshot), I still get a variation in the time taken for execution in range of approximately 50ms on either side. As this prevents setting strict time limits, is there anyway to eliminate this variation?

I'm not an expert in that field, but I don't think you can do it. Even if you restore the snapshot inside the VM, the state of the "Outside" Machine is going to be pretty different. You have two OSs running, each one which multiple process which are probably going to compete for the resources at some point. If it's a website or a PC with an internet connection, you can get hit by different amounts of connections (or request), and that will make process start running and consume requests etc... If some application tries to access the hard disk, the initial position of the physical disk matters a lot for seek time, etc...
If you want a "deterministic" limit, you might wanna check if you can count how many instructions were executed by a certain process, or something like that.
Anyways, I've participated in several programming contents, and as far as I know, they don't care about the 50 ms differences... If you do a proper algorithm, you can get inside the time with a really big margin. So I'd advise you to live with it, and just include that in the rules.

Related

Can we time commands deterministically?

We know that in bash, time foo will tell us how long a command foo takes to execute. But there is so much variability, depending on unrelated factors including what else is running on the machine at the time. It seems like there should be some deterministic way of measuring how long a program takes to run. Number of processor cycles, perhaps? Number of pipeline stages?
Is there a way to do this, or if not, to at least get a more meaningful time measurement?
You've stumbled into a problem that's (much) harder than it appears. The performance of a program is absolutely connected to the current state of the machine in which it is running. This includes, but is not limited to:
The contents of all CPU caches.
The current contents of system memory, including any disk caching.
Any other processes running on the machine and the resources they're currently using.
The scheduling decisions the OS makes about where and when to run your program.
...the list goes on and on.
If you want a truly repeatable benchmark, you'll have to take explicit steps to control for all of the above. This means flushing caches, removing interference from other programs, and controlling how your job gets run. This isn't an easy task, by any means.
The good news is that, depending on what you're looking for, you might be able to get away with something less rigorous. If you run the job on your regular workload and it produces results in a good amount of time, then that might be all that you need.

Would threading be beneficial for this situation?

I have a CSV file with over 1 million rows. I also have a database that contains such data in a formatted way.
I want to check and verify the data in the CSV file and the data in the database.
Is it beneficial/reduces time to thread reading from the CSV file and use a connection pool to the database?
How well does Ruby handle threading?
I am using MongoDB, also.
It's hard to say without knowing some more details about the specifics of what you want the app to feel like when someone initiates this comparison. So, to answer, some general advice that should apply fairly well regardless of the problem you might want to thread.
Threading does NOT make something computationally less costly
Threading doesn't make things less costly in terms of computation time. It just lets two things happen in parallel. So, beware that you're not falling into the common misconception that, "Threading makes my app faster because the user doesn't wait for things." - this isn't true, and threading actually adds quite a bit of complexity.
So, if you kick off this DB vs. CSV comparison task, threading isn't going to make that comparison take any less time. What it might do is allow you to tell the user, "Ok, I'm going to check that for you," right away, while doing the comparison in a separate thread of execution. You still have to figure out how to get back to the user when the comparison is done.
Think about WHY you want to thread, rather than simply approaching it as whether threading is a good solution for long tasks
Like I said above, threading doesn't make things faster. At best, it uses computing resources in a way that is either more efficient, or gives a better user experience, or both.
If the user of the app (maybe it's just you) doesn't mind waiting for the comparison to run, then don't add threading because you're just going to add complexity and it won't be any faster. If this comparison takes a long time and you'd rather "do it in the background" then threading might be an answer for you. Just be aware that if you do this you're then adding another concern, which is, how do you update the user when the background job is done?
Threading involves extra overhead and app complexity, which you will then have to manage within your app - tread lightly
There are other concerns as well, such as, how do I schedule that worker thread to make sure it doesn't hog the computing resources? Are the setting of thread priorities an option in my environment, and if so, how will adjusting them affect the use of computing resources?
Threading and the extra overhead involved will almost definitely make your comparison take LONGER (in terms of absolute time it takes to do the comparison). The real advantage is if you don't care about completion time (the time between when the comparison starts and when it is done) but instead the responsiveness of the app to the user, and/or the total throughput that can be achieved (e.g. the number of simultaneous comparisons you can be running, and as a result the total number of comparisons you can complete within a given time span).
Threading doesn't guarantee that your available CPU cores are used efficiently
See Green Threads vs. native threads - some languages (depending on their threading implementation) can schedule threads across CPUs.
Threading doesn't necessarily mean your threads wind up getting run in multiple physical CPU cores - in fact in many cases they definitely won't. If all your app's threads run on the same physical core, then they aren't truly running in parallel - they are just splitting CPU time in a way that may make them look like they are running in parallel.
For these reasons, depending on the structure of your app, it's often less complicated to send background tasks to a separate worker process (process, not thread), which can easily be scheduled onto available CPU cores at the OS level. Separate processes (as opposed to separate threads) also remove a lot of the scheduling concerns within your app, because you essentially offload the decision about how to schedule things onto the OS itself.
This last point is pretty important. OS schedulers are extremely likely to be smarter and more efficiently designed than whatever algorithm you might come up with in your app.

Why do you use the keyword delete?

I understand that delete returns memory to the heap that was allocated of the heap, but what is the point? Computers have plenty of memory don't they? And all of the memory is returned as soon as you "X" out of the program.
Example:
Consider a server that allocates an object Packet for each packet it receives (this is bad design for the sake of the example).
A server, by nature, is intended to never shut down. If you never delete the thousands of Packet your server handles per second, your system is going to swamp and crash in a few minutes.
Another example:
Consider a video game that allocates particles for the special effect, everytime a new explosion is created (and never deletes them). In a game like Starcraft (or other recent ones), after a few minutes of hilarity and destruction (and hundres of thousands of particles), lag will be so huge that your game will turn into a PowerPoint slideshow, effectively making your player unhappy.
Not all programs exit quickly.
Some applications may run for hours, days or longer. Daemons may be designed to run without cease. Programs can easily consume more memory over their lifetime than available on the machine.
In addition, not all programs run in isolation. Most need to share resources with other applications.
There are a lot of reasons why you should manage your memory usage, as well as any other computer resources you use:
What might start off as a lightweight program could soon become more complex, depending on your design areas of memory consumption may grow exponentially.
Remember you are sharing memory resources with other programs. Being a good neighbour allows other processes to use the memory you free up, and helps to keep the entire system stable.
You don't know how long your program might run for. Some people hibernate their session (or never shut their computer down) and might keep your program running for years.
There are many other reasons, I suggest researching on memory allocation for more details on the do's and don'ts.
I see your point, what computers have lots of memory but you are wrong. As an engineer you have to create programs, what uses computer resources properly.
Imagine, you made program which runs all the time then computer is on. It sometimes creates some objects/variables with "new". After some time you don't need them anymore and you don't delete them. Such a situation occurs time to time and you just make some RAM out of stock. After a while user have to terminate your program and launch it again. It is not so bad but it not so comfortable, what is more, your program may be loading for a while. Because of these user feels bad of your silly decision.
Another thing. Then you use "new" to create object you call constructor and "delete" calls destructor. Lets say you need to open so file and destructor closes it and makes it accessible for other processes in this case you would steel not only memory but also files from other processes.
If you don't want to use "delete" you can use shared pointers (it has garbage collector).
It can be found in STL, std::shared_ptr, it has one disatvantage, WIN XP SP 2 and older do not support this. So if you want to create something for public you should use boost it also has boost::shared_ptr. To use boost you need to download it from here and configure your development environment to use it.

How to force workflow runtime to use more CPU power?

Hello
I've quite unordinary problem because I think that in my case workflow runtime doesn't use enough CPU power. Scenario is as follow:
I send a lot of messages to queues. I use EnqueueItem method from WorkflowRuntime class.
I create new instance of workflow with CreateWorkflow method of WorkflowRuntime class.
I wait until new workflow will be moved to the first state. Under normal conditions it takes dozens of second (the workflow is complicated). When at the same time messages are being sent to queues (as described in the point 1) it takes 1 minute or more.
I observe low CPU (8 cores) utilization, no more than 15%. I can add that I have separate process that is responsible for workflow logic and I communicate with it with WCF.
You've got logging, which you think is not a problem, but you don't know. There are many database operations. Those need to block for I/O. Having more cores will only help if different threads can run unimpeded.
I hate to sound like a stuck record, always trotting out the same answer, but you are guessing at what the problem is, and you're asking other people to guess too. People are very willing to guess, but guesses don't work. You need to find out what's happening.
To find out what's happening, the method I use is, get it running under a debugger. (Simplify the problem by going down to one core.) Then pause the whole thing, look at each active thread, and find out what it's waiting for. If it's waiting for some CPU-bound function to complete for some reason, fine - make a note of it. If it's waiting for some logging to complete, make a note. If it's waiting for a DB query to complete, note it. If it's waiting at a mutex for some other thread, note it.
Do this for each thread, and do it several times. Then, you can really say you know what it's doing. When you know what it's waiting for and why, you'll have a pretty good idea how to improve it. That's a variation on this technique.
What are you doing in the work item?
If you have any sort of cross thread synchronisation (Critical sections etc) then this could cause you to spend time stalling the threads waiting for resources to become free.
For example, If you are doing any sort of file access then you are going to spend considerable time blocked waiting for the loads to complete and this will leave your threads idle a lot of the time. You could throw more threads at the problem but then you'd end up generating more disk requests and the resource contention would become even more of a problem.
Thats a couple of potential ideas but I'd really need to know what you are doing before I can be more useful ...
Edit: in answer to your comments...
1) OK
2) You'd perform terribly with 2000 threads working flat out due to switching overhead. In fact running 20-25 threads on an 8 core machine may be a bad plan too because if you get them running at high speed then they will spend time stealing each other's runtime and regular context switches (software thread switches) are very expensive. They may not be as expensive as the waits your code is suffering.
3) Logging? Do you just submit them to an asynchronous queue that spits them out to disk when it has the opportunity or are they sychronous file writes? If they are aysnchronous can you guarantee that there isn't a maximum number of request that can be queued before you DO have to wait? And if you have to wait how many threads end up iin contention for the space that just opened up? There are a lot of ifs there alone.
4) Database operation even on the best database are likely to block if 2 threads make similar calls into the database simultaneously. A good database is designed to limit this but its quite likely that, at least some, clashing will happen.
Suffice to say you will want to get a good thread profiler to see where time is REALLY being lost. Failing that you will just have to live with the performance or attack the problem in a different way ...
WF3 performance is a little on the slow side. If you are using .NET 4 you will get a better performance moving to WF4. Mind you is means a rewrite as WF4 is a completely different product.
As to WF3. There is white paper here that should give you plenty of information to improve things from the standard settings. Look for things like increasing the number of threads used by the DefaultWorkflowSchedulerService or switching to the ManualWorkflowSchedulerService and disabling performance counters which are enabled by default.

Preventing a heavy process from sinking in the swap file

Our service tends to fall asleep during the nights on our client's server, and then have a hard time waking up. What seems to happen is that the process heap, which is sometimes several hundreds of MB, is moved to the swap file. This happens at night, when our service is not used, and others are scheduled to run (DB backups, AV scans etc). When this happens, after a few hours of inactivity the first call to the service takes up to a few minutes (consequent calls take seconds).
I'm quite certain it's an issue of virtual memory management, and I really hate the idea of forcing the OS to keep our service in the physical memory. I know doing that will hurt other processes on the server, and decrease the overall server throughput. Having that said, our clients just want our app to be responsive. They don't care if nightly jobs take longer.
I vaguely remember there's a way to force Windows to keep pages on the physical memory, but I really hate that idea. I'm leaning more towards some internal or external watchdog that will initiate higher-level functionalities (there is already some internal scheduler that does very little, and makes no difference). If there were a 3rd party tool that provided that kind of service is would have been just as good.
I'd love to hear any comments, recommendations and common solutions to this kind of problem. The service is written in VC2005 and runs on Windows servers.
As you mentioned, forcing the app to stay in memory isn't the best way to share resources on the machine. A quick solution that you might find that works well is to simply schedule an event that wakes your service up at a specific time each morning before your clients start to use it. You can just schedule it in the windows task scheduler with a simple script or EXE call.
I'm not saying you want to do this, or that it is best practice, but you may find it works well enough for you. It seems to match what you've asked for.
Summary: Touch every page in the process, on page at a time, on a regular basis.
What about a thread that runs in the background and wakes up once every N seconds. Each time the page wakes up, it attempts to read from address X. The attempt is protected with an exception handler in case you read a bad address. Then increment X by the size of a page.
There are 65536 pages in 4GB, 49152 pages in 3GB, 32768 pages in 2GB. Divide your idle time (overnight dead time) by how often you want (attempt) to hit each page.
BYTE *ptr;
ptr = NULL;
while(TRUE)
{
__try
{
BYTE b;
b = *ptr;
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
// ignore, some pages won't be accessible
}
ptr += sizeofVMPage;
Sleep(N * 1000);
}
You can get the sizeOfVMPage value from the dwPageSize value in the returned result from GetSystemInfo().
Don't try to avoid the exception handler by using if (!IsBadReadPtr(ptr)) because other threads in the app may be modifying memory protections at the same time. If you get unstuck because of this it will almost impossible to identify why (it will most likely be a non-repeatable race condition), so don't waste time with it.
Of course, you'd want to turn this thread off during the day and only run it during your dead-time.
A third approach could be to have your service run a thread that does something trivial like incrementing a counter and then sleeps for a fairly long period, say 10 seconds. Thios should have minimal effect on other applications but keep at least some of your pages available.
The other thing to ensure is that your data is localized.
In other words: do you really need all 300 MiB of the memory before you can do anything? Can the data structures you use be rearranged so that any particular request could be satisfied with only a few megabytes?
For example
if your 300 MiB of heap memory contains facial recognition data. Can the data internally be arranged so that male and female face data are stored together? Or big-noes are separate from small-noses?
if it has some sort of logical structure to it can be it sorted? so that a binary search can be used to skip over a lot of pages?
if it's a propritary, in-memory, database engine, can the data be better indexed/clustered to not require so many memory page hits?
if they're image textures, can commonly used textures be located near each other?
Do you really need all 300 MiB of the memory before you can do anything? You cannot service request without all that data back in memory?
Otherwise: scheduled task at 6 ᴀᴍ to wake it up.
In terms of cost, the cheapest and easiest solution is probably just to buy more RAM for that server, and then you can disable the page file entirely. If you're running 32-bit Windows, just buy 4GB of RAM. Then the entire address space will be backed with physical memory, and the page file won't be doing anything anyway.

Resources