I have a program in which there are about 10 tasks running at the same time. They are all calling the same function with different values, and I want to know how long it takes each of my tasks to execute this function. However, it seems to me that I can't use something like this:
A:=Clock;
MyFunction(...);
B:=Clock;
Time:= B-A;
Indeed, I think this would not return the "real" CPU time, but rather the wall-clock time elapsed between the beginning and the end of the function, which is not correct, because my tasks could be switched out while they're executing the function. So, I'm wondering if there is a way to know the "real" CPU time of each one of my tasks, i.e. the time they really spent using the CPU?
I think you need the facilities of Annex D.14:
"This subclause describes a language-defined package to measure execution time."
"The execution time or CPU time of a given task is defined as the time spent by the system executing that task, including the time spent executing run-time or system services on its behalf."
This is specified in Ada 2005 and Ada 2012. Whether it's available in your compiler on your operating system is another issue! For example, "Execution_Time is not supported in this configuration" on Mac OS X (GNAT GPL 2013, GCC 4.8).
Burns & Wellings, "Concurrent and Real-Time Programming in Ada", describes this as an execution-time clock (ch. 15.5 if you can get hold of a copy); it was added in Ada 2005.
You would be looking for a package in your installation called "Ada.Execution_Time". Hopefully its package spec can tell you all you need to know.
There would probably also be a child package "Ada.Execution_Time.Timers" which would let you set up events to occur, e.g. when a CPU time budget is exceeded.
On my system, with GNAT installed in /usr/local/bin, the relevant file is:
/usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/adainclude/a-exetim.ads
and the function you need is Ada.Execution_Time.Clock.
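For illustration, a minimal sketch of using it (the loop is just a stand-in for your MyFunction; the same pattern works inside each task body, since Clock reports the CPU time consumed by the calling task):

with Ada.Execution_Time; use Ada.Execution_Time;
with Ada.Real_Time;      use Ada.Real_Time;
with Ada.Text_IO;        use Ada.Text_IO;

procedure Measure_CPU is
   Start, Stop : CPU_Time;
   Sum         : Long_Float := 0.0;
begin
   Start := Ada.Execution_Time.Clock;   -- CPU time consumed by this task so far
   for I in 1 .. 10_000_000 loop        -- stand-in for MyFunction (...)
      Sum := Sum + Long_Float (I);
   end loop;
   Stop := Ada.Execution_Time.Clock;
   Put_Line ("Sum =" & Long_Float'Image (Sum));
   Put_Line ("CPU time used:" & Duration'Image (To_Duration (Stop - Start)) & " s");
end Measure_CPU;

Stop - Start is a Time_Span (from Ada.Real_Time), so To_Duration converts it to a printable Duration.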
I would like to create a minimal Windows executable that does nothing - and is minimal in size.
All I care about is keeping a process entry in the task manager.
On Linux, this is very easy (it only takes 2 assembly instructions to use the pause
syscall). How can I achieve similar results on Windows?
I'm trying to keep the executable size to a minimum; I don't want a 10 kB executable that literally does nothing.
Is there a way to achieve this in assembly? As I mentioned, I'd rather not include huge libraries just to make the process "hang".
As Hans suggests in the comments, Sleep(INFINITE) is probably the simplest non-busy wait. It does however mean you have to kill the process with Task Manager to stop it.
Calling MessageBox followed by ExitProcess is probably less annoying if you need to start/stop this process multiple times.
You can probably get it down to 1 KiB with Visual C++ if you don't use the CRT (provide your own WinMainCRTStartup entry point, compile with /Zl, and use a smaller section alignment).
You can get it slightly smaller with assembly but it is probably not worth it.
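For example, a rough CRT-free sketch in C (the build flags are the usual small-binary incantation and may need adjusting for your toolchain; I haven't verified the exact resulting size):

/* Build without the CRT, e.g.:
     cl /O1 /GS- /Zl tiny.c /link /subsystem:windows /nodefaultlib kernel32.lib
   With /subsystem:windows the default entry point is WinMainCRTStartup,
   which we supply ourselves instead of the C runtime. */
#include <windows.h>

void WinMainCRTStartup(void)
{
    Sleep(INFINITE);   /* non-busy wait forever; kill the process from Task Manager */
    ExitProcess(0);    /* never reached */
}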
I met a weird problem, but I wonder if I'm asking the correct question:
result = parLapply(cl, 1:4,
  function(j, rho_list_needed, delta0_needed,
           V_iter_s, Sigma_list_needed) {
    rhoj = rho_list_needed[[j]]
    delta0_in_cpp = delta0_needed
    v = as.vector(V_iter_s[, , , j])
    sigmaj = Sigma_list_needed[[j]]
    sourceCpp('sample_Z.cpp')  # first compile is slow, then cached
    return(Sample_Z(rhoj, delta0_in_cpp, v, sigmaj, A, Cmatrix))
  }, rho_list_needed, delta0_needed,
  V_iter[[s]], Sigma_list_needed)
When I was testing my sample_Z.cpp in parallel through parLapply, a single calculation takes around 1 sec. In parallel, my 4 iterations take around 1.2 secs, which is a big improvement over the unparallelized version, which takes 8 sec.
There was no problem at all when I ran my program yesterday. Just now I noticed a bug and revised my program. To give my PC a fresh environment, I restarted my computer. When I started to run my program, I only opened the .R file and ran it. But that parallel part took 9 sec, where it used to take 1.2 sec. The 9 sec was measured after warming up my cores, i.e., the cpp had already been sourced before I timed it.
I just don't know where the bug is. Then I tried to source the cpp file directly in my global environment, and I found out that there was no caching at all: the second time took just as long as the first.
But then I accidentally opened sample_Z.cpp in RStudio, explicitly in the editor, and after that everything works correctly.
I don't know what keywords to search Google with for this kind of problem, and I don't know whether opening the cpp file is a must; I had never needed to before.
Can anyone tell me what the real issue is? Thanks!
After restarting your PC, you probably had extra processes still running which would have competed for CPU cores and slowed down your algorithm. The fact that you're rebooting suggests to me you're not using Linux... but if you are, watch with top while starting your code, or use the equivalent for your platform.
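Separately from CPU contention, one way to take the compilation cost out of the parallel call altogether is to compile the C++ file once on each worker before calling parLapply. A rough sketch, assuming your existing cluster object cl and the file name from your question:

library(parallel)

cl <- makeCluster(4)

# Compile sample_Z.cpp once per worker, up front. The compiled functions
# stay loaded in each worker process for the lifetime of the cluster, so
# the function passed to parLapply no longer needs to call sourceCpp().
invisible(clusterEvalQ(cl, {
  library(Rcpp)
  sourceCpp("sample_Z.cpp")
}))

After that, the sourceCpp('sample_Z.cpp') line inside the parLapply function can be dropped, leaving only the Sample_Z() call per iteration.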
I am writing a Hadoop scheduler. My scheduling requires finding the CPU time taken by each Map/Reduce task.
I know that:
The TaskInProgress class maintains the execStartTime and execFinishTime values which are wall-clock times when the process started and finished, but they do not accurately indicate the CPU time consumed by the task.
Each task is executed in a new JVM, and I could use the OperatingSystemMXBean.getProcessCpuTime() method, but again the description of the method tells me: "Returns the CPU time used by the process on which the Java virtual machine is running in nanoseconds". I am not entirely clear whether this is what I want.
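For reference, this is roughly how I would read that counter from inside a task's JVM (a sketch; the busy loop is just a stand-in for the real task work, and the cast to the non-standard com.sun.management interface is what exposes getProcessCpuTime on HotSpot):

import java.lang.management.ManagementFactory;

public class CpuTimeProbe {
    public static void main(String[] args) {
        // getProcessCpuTime() returns the CPU time consumed by this whole
        // JVM process, in nanoseconds (or -1 if unsupported).
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();

        long before = os.getProcessCpuTime();

        double sum = 0;                       // stand-in for the task's work
        for (int i = 0; i < 50000000; i++) {
            sum += Math.sqrt(i);
        }

        long after = os.getProcessCpuTime();
        System.out.println("sum = " + sum);
        System.out.println("CPU time: " + (after - before) / 1000000L + " ms");
    }
}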
I am using a library that records resource metrics like CPU usage/idle time, swap usage and memory usage.
http://code.google.com/p/hadoop-toolkit/
You have to extract a patch and apply it to a 20.2 tag version.
Regarding "I am not entirely clear if this is what I want": I am pretty sure that this method returns the wall clock time as well.
Just for posterity, I solved this problem by making a change in src/mapred/org/apache/hadoop/mapred/TaskLog.java (Hadoop 0.20.203) on line 572
mergedCmd.append("exec setsid 'time' "); // add 'time'
The CPU time will be written to: logs/userlogs/JOBID/TASKID/stderr. I also wrote a script to reap the cumulative CPU time: https://gist.github.com/1984365
Before running the job, you need to make sure you do:
rm -rf logs/userlogs/*
so that the script works.
I have a program that creates a file of about 50MB size. During the process the program frequently rewrites sections of the file and forces the changes to disk (in the order of 100 times). It uses a FileChannel and direct ByteBuffers via fc.read(...), fc.write(...) and fc.force(...).
New text:
I have a better view on the problem now.
The problem appears to be that I use three different JVMs to modify a file (one creates it, two others (launched from the first) write to it). Every JVM closes the file properly before the next JVM is started.
The problem is that the cost of fc.write() to that file occasionally goes through the roof for the third JVM (on the order of 100 times the normal cost). That is, all write operations are equally slow; it is not just a single write that hangs for a very long time.
Interestingly, one way to help with this is to insert delays (2 seconds) between the launching of the JVMs. Without a delay, writing is always slow; with a delay, the writing is slow about every second time or so.
I also found this Stack Overflow question: How to unmap a file from memory mapped using FileChannel in java?, which describes a problem for memory-mapped files, which I'm not using.
What I suspect might be going on:
Java does not completely release the file handle when I call close(). When the next JVM is started, Java (or Windows) recognizes the concurrent access to that file and installs some expensive concurrency handler for that file, which makes writing expensive.
Would that make sense?
The problem occurs on Windows 7 (Java 6 and 7, tested on two machines), but not under Linux (SuSE 11.3 64).
Old text:
The problem:
Starting the program as a JUnit test harness from Eclipse or from the console works fine; it takes around 3 seconds.
Starting the program through an Ant task (or through JUnit by kicking off a separate JVM using a ProcessBuilder) slows the program down to 70-80 seconds for the same task (a factor of 20-30).
Using -Xprof reveals that the usage of 'force0' and 'pwrite' goes through the roof from 34.1% (76+20 tics) to 97.3% (3587+2913+751 tics):
Fast run:
27.0% 0 + 76 sun.nio.ch.FileChannelImpl.force0
7.1% 0 + 20 sun.nio.ch.FileDispatcher.pwrite0
[..]
Slow run:
Interpreted + native Method
48.1% 0 + 3587 sun.nio.ch.FileDispatcher.pwrite0
39.1% 0 + 2913 sun.nio.ch.FileChannelImpl.force0
[..]
Stub + native Method
10.1% 0 + 751 sun.nio.ch.FileDispatcher.pwrite0
[..]
GC and compilation are negligible.
More facts:
No other methods show a significant change in the -Xprof output.
It's either fast or very slow, never something in-between.
Memory is not a problem; all test machines have at least 8 GB, and the process uses <200 MB
Rebooting the machine does not help
Switching off virus scanners and similar stuff has no effect
When the process is slow, there is virtually no CPU usage
It is never slow when running it from a normal JVM
It is pretty consistently slow when running it in a JVM that was started from the first JVM (via ProcessBuilder or as ant-task)
All JVMs are exactly the same. I output System.getProperty("java.home") and the JVM options via RuntimeMXBean runtimeMxBean = ManagementFactory.getRuntimeMXBean(); List<String> arguments = runtimeMxBean.getInputArguments();
I tested it on two machines with Windows 7 64-bit, Java 7u2, Java 6u26 and JRockit; the hardware of the machines differs, but the results are very similar.
I tested it also from outside Eclipse (command-line ant) but no difference there.
The whole program is written by myself; all it does is read from and write to this file, and no other libraries are used, especially no native libraries.
And some scary facts that I just can't believe make any sense:
Removing all class files and rebuilding the project sometimes (rarely) helps. The program (nested version) runs fast one or two times before becoming extremely slow again.
Installing a new JVM always helps (every single time!), such that the (nested) program runs fast at least once! Installing a JDK counts as two because both the JDK's JRE and the standalone JRE work fine at least once. Installing a JVM over an existing one does not help. Neither does rebooting. I haven't tried deleting/rebooting/reinstalling yet ...
These are the only two ways I ever managed to get fast program runtimes for the nested program.
Questions:
What may cause this performance drop for nested JVMs?
What exactly do these methods do (pwrite0/force0)?
Are you using local disks for all testing (as opposed to any network share)?
Can you set up Windows with a RAM drive to store the data? When a JVM terminates, by default its file handles will have been closed, but what you might be seeing is the flushing of the data to disk. When you overwrite lots of data, the previous version of the data is discarded and may not cause disk IO. The act of closing the file might make the Windows kernel implicitly flush data to disk. Using a RAM drive would let you confirm this, since disk IO time would then be removed from your stats.
Find a tool for Windows that allows you to force the kernel to flush all buffers to disk, use it in between JVM runs, and see how long that takes.
But I would guess you are hitting some interaction between the demands of the process and the demands of the kernel in attempting to manage the disk block buffer cache. On Linux there is a tool like "/sbin/blockdev --flushbufs" that can do this.
FWIW
"pwrite" is a Linux/Unix API for allowing concurrent writing to a file descriptor (which would be the best kernel syscall API to use for the JVM, I think Win32 API already has provision for the same kinds of usage to share a file handle between threads in a process, but since Sun have Unix heritige things get named after the Unix way). Google "pwrite(2)" for more info on this API.
"force" I would guess that is a file system sync, meaning the process is requesting the kernel to flush unwritten data (that is currently in disk block buffer cache) into the file on the disk (such as would be needed before you turned your computer off). This action will happen automatically over time, but transactional systems require to know when the data previously written (with pwrite) has actually hit the physical disk and is stored. Because some other disk IO is dependant on knowing that, such as with transactional checkpointing.
One thing that could help is making sure you explicitly set the FileChannel reference to null. Then call System.runFinalization() and maybe System.gc() at the end of the program. You may need more than one call.
System.runFinalizersOnExit(true) may also help, but it's deprecated so you will have to deal with the compiler warnings.
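For illustration only (hypothetical file name, not your actual program), the shape of that suggestion would be roughly:

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class WriteAndRelease {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile("data.bin", "rw");
        FileChannel fc = raf.getChannel();
        try {
            ByteBuffer buf = ByteBuffer.allocateDirect(4096);
            buf.put("example".getBytes());
            buf.flip();
            fc.write(buf, 0);
            fc.force(true);            // flush data and metadata to disk
        } finally {
            fc.close();                // release the OS file handle explicitly
            raf.close();
            fc = null;                 // drop the reference, as suggested above
        }
        System.runFinalization();      // encourage cleanup of lingering NIO objects
        System.gc();
    }
}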
I want to run a command and, when it's complete, have a record of the maximum memory use of the resulting process. For instance, I want something analogous to the 'time' command on Linux, where 'time foo' will run 'foo' and, when 'foo' exits, will print out the amount of CPU time that 'foo' took.
For my present application I need this to run on Windows, but if you know of a Linux-only program let me know too. (At the very least it'd be interesting, but it may also give me a lead to find a Windows equivalent.)
You can, if you have Vista (maybe 7 too, not sure). Go to Start -> Control Panel -> System and Maintenance -> Administrative Tools -> Reliability and Performance Monitor -> Performance Monitor -> Create new watch (green + symbol) -> Process -> Working Set -> [select a process below] and press OK. You can log this, etc.
Screenshot: http://www.freeimagehosting.net/image.php?912df44d75.jpg
I don't know of a program that does this, but there are APIs.
If you're using .NET, use the Process.TotalProcessorTime property.
If you're using native code, use the GetProcessTimes() function.
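If you go the native route, a rough sketch that combines GetProcessTimes (CPU time) with GetProcessMemoryInfo (peak working set) might look like this; child.exe is a placeholder for the command to measure, error handling is omitted, and you need to link with psapi.lib:

#include <windows.h>
#include <psapi.h>
#include <stdio.h>

int main(void)
{
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    FILETIME creation, finished, kernel, user;
    PROCESS_MEMORY_COUNTERS pmc = { sizeof(pmc) };
    ULONGLONG user100ns;
    char cmd[] = "child.exe";   /* placeholder: the command to measure */

    if (!CreateProcessA(NULL, cmd, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
        return 1;
    WaitForSingleObject(pi.hProcess, INFINITE);

    GetProcessTimes(pi.hProcess, &creation, &finished, &kernel, &user);
    GetProcessMemoryInfo(pi.hProcess, &pmc, sizeof(pmc));

    /* FILETIME counts 100-nanosecond intervals. */
    user100ns = ((ULONGLONG)user.dwHighDateTime << 32) | user.dwLowDateTime;
    printf("user CPU: %.3f s, peak working set: %lu KB\n",
           user100ns / 1e7,
           (unsigned long)(pmc.PeakWorkingSetSize / 1024));

    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return 0;
}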
I've created a command-line exe for analysing the memory usage of a long-running program.
See here: MemoryUsageMonitor
I have created a simple Windows program called timemem.exe that behaves similarly to /usr/bin/time on Linux/Mac OS X, and will show similar statistics, such as elapsed time, user and kernel CPU time, and maximum working set size in memory used by another Win32 process. See:
http://homepage.mac.com/jafingerhut/files/code/code.html