Can we time commands deterministically? - bash

We know that in bash, time foo will tell us how long a command foo takes to execute. But there is so much variability, depending on unrelated factors including what else is running on the machine at the time. It seems like there should be some deterministic way of measuring how long a program takes to run. Number of processor cycles, perhaps? Number of pipeline stages?
Is there a way to do this, or if not, to at least get a more meaningful time measurement?

You've stumbled into a problem that's (much) harder than it appears. The performance of a program is absolutely connected to the current state of the machine in which it is running. This includes, but is not limited to:
The contents of all CPU caches.
The current contents of system memory, including any disk caching.
Any other processes running on the machine and the resources they're currently using.
The scheduling decisions the OS makes about where and when to run your program.
...the list goes on and on.
If you want a truly repeatable benchmark, you'll have to take explicit steps to control for all of the above. This means flushing caches, removing interference from other programs, and controlling how your job gets run. This isn't an easy task, by any means.
The good news is that, depending on what you're looking for, you might be able to get away with something less rigorous. If you run the job under your regular workload and it produces results in an acceptable amount of time, that might be all you need.
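If you do want something more repeatable than a single time run, without going to the full trouble above, one practical middle ground is to measure the command's CPU time (which is far less sensitive to other load than wall-clock time) and to repeat the run several times, reporting the minimum or median. On Linux, hardware counters (e.g. as reported by perf stat) get you even closer to the cycle and instruction counts you mention. Below is a minimal Python sketch of the repeat-and-measure idea; it is Unix-only, and the default command is just a placeholder.
#!/usr/bin/env python3
# Minimal sketch: time a command by its child CPU time, repeated several times.
# Unix-only; the command to benchmark is passed on the command line.
import resource
import statistics
import subprocess
import sys
import time

def time_once(cmd):
    # Return (wall_seconds, child_cpu_seconds) for one run of cmd.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu

if __name__ == "__main__":
    cmd = sys.argv[1:] or ["sleep", "0.1"]   # placeholder command
    samples = [time_once(cmd) for _ in range(10)]
    walls = [w for w, _ in samples]
    cpus = [c for _, c in samples]
    print("wall: min=%.4fs median=%.4fs" % (min(walls), statistics.median(walls)))
    print("cpu:  min=%.4fs median=%.4fs" % (min(cpus), statistics.median(cpus)))
The CPU-time numbers will still wobble a little (caches, frequency scaling), but far less than wall-clock time does on a busy machine.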

Related

OCaml execution of program not producing new output after some time

I have 3 OCaml modules; the last of the three does the actual computation and uses functions defined in the other 2.
The basic idea of the program is to have a port as the initial state, which contains boats, and to move those boats until we reach either a winning situation or no more moves are possible.
The actual code is not really the problem here. It is probably not very efficient, but it finds a solution... except when it does not.
While the program takes a few seconds for complex situations, in some cases, when I add a bit more complexity, it takes forever to find a solution.
When I execute the program like this:
$ ocamlc -o test.exe port.ml moves.ml solver.ml
$ ./test.exe > file
The resulting file is really huge, but its size stops increasing after some time.
It seems to me that after a while the program stops running without terminating; no stack overflow or out-of-memory error is thrown. The program simply does not continue the execution. In other words, the command
$ ./test.exe > file
is still executing, but no more new lines are added to the file. If I log to the shell itself and not to a file, I get the same result: no new lines appear after some time.
What could this be?
The main function (the one responsible for finding a solution) uses a depth-first search algorithm and contains a lot of List operations, such as List.fold, List.map, List.iter, List.partition, List.filter. I was thinking that maybe these functions have problems dealing with huge lists of complex types at some point, but again, no error is thrown; the execution just stops.
I explained this very vaguely, but I really don't understand the problem here. I don't know whether the problem has to do with my shell (Ubuntu subsystem on Windows) running out of memory, or with OCaml List functions being limited at some point... If you have any suggestions, feel free to comment.
To debug such cases you should use diagnostic utilities provided by your operating system and the OCaml infrastructure itself.
First of all, you should look at the state of your process. You can use the top or htop utilities if you're running on a Unix machine. Otherwise, you can use the task manager.
If the process is running out of physical memory, it could be swapped out by the operating system. In that case, all memory operations turn into hard-drive reads and writes, so garbage collecting a heap that lives on the hard drive will take a very long time. If this is the case, you can use a memory profiler to identify the crux of the problem.
If the process is constantly running without a change in its memory footprint, then it looks like you have either hit a bug in your code, i.e., an infinite loop, or some of your algorithms have exponential complexity, as Konstantin mentioned in the comment. Use debug output or tracing to identify the location where the program stalls.
Finally, if your program is in the sleeping state, then it could be a deadlock. For example, if you're reading from and writing to the same file, this can end up in a race condition. In general, if your program is multithreaded or uses multiple processes, there are lots of opportunities to induce a race condition.
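As a rough illustration of the first two checks (process state and memory footprint), here is a hypothetical watcher in Python using the third-party psutil package (pip install psutil); you'd pass it the PID of the running solver (e.g. of ./test.exe):
#!/usr/bin/env python3
# Hypothetical watcher: poll a process's state, memory, and CPU time every 5s.
# Requires the third-party psutil package; pass the PID on the command line.
import sys
import time
import psutil

proc = psutil.Process(int(sys.argv[1]))
while True:
    try:
        status = proc.status()                     # running, sleeping, zombie, ...
        rss_mb = proc.memory_info().rss / (1024 * 1024)
        cpu = proc.cpu_times()
    except psutil.NoSuchProcess:
        break                                      # the process has exited
    print("status=%-10s rss=%8.1f MiB user=%8.1fs sys=%6.1fs"
          % (status, rss_mb, cpu.user, cpu.system))
    time.sleep(5)
A steadily growing memory footprint points toward the swapping/GC scenario; growing CPU time with a flat footprint points toward an infinite loop or an exponential algorithm; a process stuck in the sleeping state with neither growing points toward being blocked.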

Execution time of a Java program

First of all, this is just something I'm curious about.
I've made a little program which fills some templates with values, and I noticed that every time I run it the execution time changes a little bit; it ranges from 0.550s to 0.600s. My CPU runs at 2.9 GHz, if that's useful.
The instructions are always the same, so is this something to do with the physical hardware, or something more software-oriented?
It has to do with Java running on a virtual machine: class loading, JIT compilation, and garbage collection all add run-to-run variation. Even a C program will run slightly longer or shorter from one execution to the next, because the operating system decides when a program gets resources (CPU time, memory, ...) to execute.

If a CPU is always executing instructions how do we measure its work?

Let us say we have a fictitious single-core CPU with a program counter, a basic instruction set such as Load, Store, Compare, Branch, Add, Mul, and some ROM and RAM. Upon switching on, it executes a program from ROM.
Would it be fair to say the work the CPU does is based on the type of instruction it's executing? For example, a Mul operation would likely involve more transistors switching than, say, a Branch.
However from an outside perspective if the clock speed remains constant then surely the CPU could be said to be running at 100% constantly.
How exactly do we establish a paradigm for measuring the work of the CPU? Is there some kind of standard metric, perhaps based on the type of instructions executing, the power consumption of the CPU, the number of clock cycles to complete, or even whether it's accessing RAM or ROM?
A related second question is what it means for a program to "stop". Does it usually just branch in an infinite loop, or does the PC halt while the CPU waits for an interrupt?
First of all, that a CPU is always executing some code is just an approximation these days. Computer systems have so-called sleep states which allow for energy saving when there is not too much work to do. Modern CPUs can also throttle their speed in order to improve battery life.
Apart from that, there is a difference between the CPU doing "some work" and doing "useful work". The CPU by itself can't tell the difference, but the operating system usually can. Except for some embedded software, a CPU will never be running a single job, but rather an operating system with different processes within it. If there is no useful process to run, the operating system will schedule the "idle task", which mostly means putting the CPU to sleep for some time (see above) or just burning CPU cycles in a loop which does nothing useful. Calculating the ratio of time spent in the idle task to time spent in regular tasks gives a measure of how busy the CPU is.
In the old days of DOS, when the computer was running (almost) only a single task, it was true that the CPU was always doing something. Many applications used so-called busy-waiting if they just had to delay their execution for some time, doing nothing useful. But today there will almost always be a smart OS in place which can run the idle process, put the CPU to sleep, throttle down its speed, and so on.
Oh boy, this is a toughie. It’s a very practical question as it is a measure of performance and efficiency, and also a very subjective question as it judges what instructions are more or less “useful” toward accomplishing the purpose of an application. The purpose of an application could be just about anything, such as finding the solution to a complex matrix equation or rendering an image on a display.
In addition, modern processors do things like clock gating in idle power states: the oscillator is still producing cycles, but no instructions execute because the clock signal is gated off from parts of the circuitry. Those cycles are not doing anything useful and need to be ignored.
Similarly, modern processors can execute multiple instructions simultaneously, execute them out of order, and predict which instructions will be executed next and run them before your program (i.e. the IP, or Instruction Pointer) actually reaches them. You don't want to include instructions whose execution never actually completes, for example because the processor guessed wrong and had to flush them after a branch mispredict. So a better metric is counting the instructions that actually complete. Instructions that complete are termed "retired".
So we should only count instructions that complete (i.e. retire), and cycles that are actually used to execute instructions (i.e. unhalted).
Perhaps the most practical general metric for "work" is CPI, or cycles per instruction: CPI = CPU_CLK_UNHALTED.CORE / INST_RETIRED.ANY. CPU_CLK_UNHALTED.CORE counts cycles used to execute actual instructions (vs. those "wasted" in an idle state). INST_RETIRED.ANY counts instructions that complete (vs. those that don't, due to something like a branch mispredict).
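As a purely illustrative example (the counter values below are made up, not measured), computing CPI from two such counter readings looks like this:
# Illustrative only: hypothetical counter readings from a profiler (VTune, perf, ...)
unhalted_core_cycles = 3_600_000_000   # CPU_CLK_UNHALTED.CORE
instructions_retired = 2_400_000_000   # INST_RETIRED.ANY

cpi = unhalted_core_cycles / instructions_retired   # cycles per retired instruction
ipc = 1.0 / cpi                                     # same information, inverted
print("CPI = %.2f (IPC = %.2f)" % (cpi, ipc))       # CPI = 1.50 (IPC = 0.67)
# Lower CPI (higher IPC) means more completed work per unhalted cycle;
# memory stalls, branch mispredicts, etc. push CPI up.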
Trying to get a more specific metric, such as counting only the instructions that contribute to the solution of a matrix multiply while excluding instructions that don't directly contribute to the result, such as control instructions, is very subjective and difficult to gather statistics on. (There are some you can gather, such as VECTOR_INTENSITY = VPU_ELEMENTS_ACTIVE / VPU_INSTRUCTIONS_EXECUTED, the average number of vector elements active per SIMD instruction (e.g. SSE or AVX) executed. These instructions are more likely to contribute directly to the solution of a mathematical problem, as that is their primary purpose.)
Now that I've talked your ear off, check out some of the optimization resources at your friendly local Intel developer resource, software.intel.com. In particular, check out how to use VTune effectively. I'm not suggesting you need to get VTune, though you can get a free or heavily discounted student license (I think). But the material will tell you a lot about increasing your program's performance (i.e. optimizing), which is, if you think about it, increasing the useful work your program accomplishes.
Expanding on Michał's answer a bit:
Programs written for modern multitasking OSes are more like collections of event handlers: they effectively set up listeners for I/O and then yield control back to the OS. The OS wakes them up each time there is something to process (e.g. a user action, data from a device), and they "go to sleep" by calling into the OS once they've finished processing. Most OSes will also preempt a process that hogs the CPU for too long and starves the others.
The OS can then keep tabs on how long each process actually runs (by remembering the start and end time of each run) and generate statistics like CPU time and load (ready-process queue length).
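A tiny, hypothetical Python illustration of that bookkeeping: the OS accounts CPU time separately from wall-clock time, so a process that spends most of its life waiting shows almost no CPU time.
import os
import time

start_wall = time.monotonic()
# Imitate an event-handler style program: mostly waiting, short bursts of work.
for _ in range(5):
    time.sleep(1.0)                        # asleep: not scheduled, no CPU used
    sum(i * i for i in range(200_000))     # a short burst of actual work
wall = time.monotonic() - start_wall

t = os.times()                             # CPU accounting kept by the OS
print("wall-clock: %.2fs" % wall)
print("CPU user: %.2fs, CPU system: %.2fs" % (t.user, t.system))
# The CPU totals are far smaller than the wall-clock time because the process
# was not scheduled while it slept.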
And to answer your second question:
To stop mostly means a process is no longer scheduled and all associated resources (scheduling data structures, file handles, memory space, ...) are destroyed. This usually requires the process to make a special OS call (syscall/interrupt) so the OS can release the resources gracefully.
If, however, a process runs into an infinite loop and stops responding to OS events, then it can only be stopped forcibly (by simply never running it again).

Would threading be beneficial for this situation?

I have a CSV file with over 1 million rows. I also have a database that contains such data in a formatted way.
I want to check and verify the data in the CSV file and the data in the database.
Would it be beneficial (i.e., would it reduce time) to use threads to read from the CSV file and a connection pool for the database?
How well does Ruby handle threading?
I am using MongoDB, also.
It's hard to say without knowing more details about what you want the app to feel like when someone initiates this comparison. So, to answer, here is some general advice that should apply fairly well regardless of the problem you might want to thread.
Threading does NOT make something computationally less costly
Threading doesn't make things less costly in terms of computation time. It just lets two things happen in parallel. So, beware that you're not falling into the common misconception that, "Threading makes my app faster because the user doesn't wait for things." - this isn't true, and threading actually adds quite a bit of complexity.
So, if you kick off this DB vs. CSV comparison task, threading isn't going to make that comparison take any less time. What it might do is allow you to tell the user, "Ok, I'm going to check that for you," right away, while doing the comparison in a separate thread of execution. You still have to figure out how to get back to the user when the comparison is done.
Think about WHY you want to thread, rather than simply approaching it as whether threading is a good solution for long tasks
Like I said above, threading doesn't make things faster. At best, it uses computing resources in a way that is either more efficient, or gives a better user experience, or both.
If the user of the app (maybe it's just you) doesn't mind waiting for the comparison to run, then don't add threading because you're just going to add complexity and it won't be any faster. If this comparison takes a long time and you'd rather "do it in the background" then threading might be an answer for you. Just be aware that if you do this you're then adding another concern, which is, how do you update the user when the background job is done?
Threading involves extra overhead and app complexity, which you will then have to manage within your app - tread lightly
There are other concerns as well, such as: how do I schedule that worker thread to make sure it doesn't hog the computing resources? Is setting thread priorities an option in my environment, and if so, how will adjusting them affect the use of computing resources?
Threading and the extra overhead involved will almost definitely make your comparison take LONGER (in terms of absolute time it takes to do the comparison). The real advantage is if you don't care about completion time (the time between when the comparison starts and when it is done) but instead the responsiveness of the app to the user, and/or the total throughput that can be achieved (e.g. the number of simultaneous comparisons you can be running, and as a result the total number of comparisons you can complete within a given time span).
Threading doesn't guarantee that your available CPU cores are used efficiently
See Green Threads vs. native threads - some languages (depending on their threading implementation) can schedule threads across CPUs.
Threading doesn't necessarily mean your threads wind up getting run in multiple physical CPU cores - in fact in many cases they definitely won't. If all your app's threads run on the same physical core, then they aren't truly running in parallel - they are just splitting CPU time in a way that may make them look like they are running in parallel.
For these reasons, depending on the structure of your app, it's often less complicated to send background tasks to a separate worker process (process, not thread), which can easily be scheduled onto available CPU cores at the OS level. Separate processes (as opposed to separate threads) also remove a lot of the scheduling concerns within your app, because you essentially offload the decision about how to schedule things onto the OS itself.
This last point is pretty important. OS schedulers are extremely likely to be smarter and more efficiently designed than whatever algorithm you might come up with in your app.
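To make the thread-vs-process trade-off concrete, here is a rough sketch in Python (the question is about Ruby, but MRI's global VM lock and CPython's GIL create the same situation: CPU-bound threads do not run in parallel, whereas separate processes can); the row data and the comparison function are placeholders, not the actual CSV/MongoDB check.
# Rough sketch of the thread-vs-process trade-off for a CPU-bound comparison.
import time
import zlib
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def compare_chunk(rows):
    # Stand-in for "verify these CSV rows against the database" (CPU-bound here).
    return sum(zlib.crc32(r.encode()) % 7 == 0 for r in rows)

def run(executor_cls, chunks):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as pool:
        results = list(pool.map(compare_chunk, chunks))
    return time.perf_counter() - start, sum(results)

if __name__ == "__main__":
    chunks = [["row-%d-%d" % (i, j) for j in range(200_000)] for i in range(8)]
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        elapsed, matched = run(cls, chunks)
        print("%-22s %6.2fs matched=%d" % (cls.__name__, elapsed, matched))
On CPython the process pool typically finishes this CPU-bound version noticeably faster; if the real bottleneck is I/O (database round trips), threads can do fine.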

Elimination of run time variation over repeated executions of the same program

I am trying to design an online programming contest judge, and one of the things I need to ensure is that when the same code is compiled (assuming compilation is required) and given the same input, it takes exactly the same amount of time to execute, each time this is done.
Currently, I am using a simple Python script that has 2 threads, one of which invokes a blocking system call that starts the execution of the test code, while the other keeps track of time and sends a kill signal to the child process after the time limit expires. Incidentally, I am doing this inside a virtual machine for reasons of security and convenience (setting up a proper chroot is way too complicated, and more risky).
However, given identical conditions (i.e., when I restore a snapshot), I still get a variation in execution time of approximately 50 ms on either side. As this prevents setting strict time limits, is there any way to eliminate this variation?
I'm not an expert in this field, but I don't think you can do it. Even if you restore the snapshot inside the VM, the state of the "outside" machine is going to be pretty different. You have two OSs running, each with multiple processes that are probably going to compete for resources at some point. If it's a website or a PC with an internet connection, you can get hit by different numbers of connections (or requests), and that will make processes start running and consuming resources, etc. If some application tries to access the hard disk, the initial position of the physical disk head matters a lot for seek time, etc.
If you want a "deterministic" limit, you might want to check whether you can count how many instructions a given process executes, or something like that.
Anyway, I've participated in several programming contests, and as far as I know, they don't care about 50 ms differences... If you use a proper algorithm, you can come in under the time limit with a really big margin. So I'd advise you to live with it and just account for it in the rules.
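If you do want the more deterministic route, one common trick is to enforce a CPU-time limit on the child instead of (or in addition to) a wall-clock limit, since CPU seconds are far less affected by whatever else the judge machine is doing. A minimal, Unix-only Python sketch follows; ./submission, input.txt, and the limit values are placeholders.
#!/usr/bin/env python3
# Minimal sketch (Unix-only): enforce a CPU-time limit instead of a wall-clock
# limit. The kernel sends SIGXCPU/SIGKILL once the child exceeds its budget.
import resource
import subprocess

CPU_LIMIT_SECONDS = 2          # hard CPU-second budget enforced by the kernel
WALL_TIMEOUT_SECONDS = 10      # safety net for programs that sleep or block

def set_limits():
    # Runs in the child process just before exec().
    resource.setrlimit(resource.RLIMIT_CPU,
                       (CPU_LIMIT_SECONDS, CPU_LIMIT_SECONDS + 1))
    resource.setrlimit(resource.RLIMIT_AS,          # cap memory as well
                       (256 * 1024 * 1024, 256 * 1024 * 1024))

with open("input.txt") as stdin, open("output.txt", "w") as stdout:
    try:
        result = subprocess.run(["./submission"], stdin=stdin, stdout=stdout,
                                preexec_fn=set_limits,
                                timeout=WALL_TIMEOUT_SECONDS)
        verdict = "OK" if result.returncode == 0 else "RUNTIME OR TIME LIMIT"
    except subprocess.TimeoutExpired:
        verdict = "WALL-CLOCK TIMEOUT"

usage = resource.getrusage(resource.RUSAGE_CHILDREN)
print("verdict=%s cpu=%.3fs" % (verdict, usage.ru_utime + usage.ru_stime))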
