How to write benchmark statistics: how do I put the output from 'time' into a file?

I am having trouble putting the output of 'time' into a file.
When I run time program > file on the Linux console, the file gets the program's output but no information from time.
I need the three-line time output, as it looks on the console.

"time -o outputfile.txt command" gives you a full output:
%Uuser %Ssystem %Eelapsed %PCPU (%Xtext+%Ddata %Mmax)k
%Iinputs+%Ooutputs (%Fmajor+%Rminor)pagefaults %Wswaps
If you add "-p" you get the three-line output like in the interactive console:
real %e
user %U
sys %S
The format specifiers are:
Time
%E Elapsed real time (in [hours:]minutes:seconds).
%e (Not in tcsh.) Elapsed real time (in seconds).
%S Total number of CPU-seconds that the process spent in kernel mode.
%U Total number of CPU-seconds that the process spent in user mode.
%P Percentage of the CPU that this job got, computed as (%U + %S) / %E.
Memory
%M Maximum resident set size of the process during its lifetime, in Kbytes.
%t (Not in tcsh.) Average resident set size of the process, in Kbytes.
%K Average total (data+stack+text) memory use of the process, in Kbytes.
%D Average size of the process's unshared data area, in Kbytes.
%p (Not in tcsh.) Average size of the process's unshared stack space, in Kbytes.
%X Average size of the process's shared text space, in Kbytes.
%Z (Not in tcsh.) System's page size, in bytes. This is a per-system constant, but varies between systems.
%F Number of major page faults that occurred while the process was running. These are faults where the page has to be read in from disk.
%R Number of minor, or recoverable, page faults. These are faults for pages that are not valid but which have not yet been claimed by other virtual pages. Thus the data in the page is still valid but the system tables must be updated.
%W Number of times the process was swapped out of main memory.
%c Number of times the process was context-switched involuntarily (because the time slice expired).
%w Number of waits: times that the program was context-switched voluntarily, for instance while waiting for an I/O operation to complete.
I/O
%I Number of file system inputs by the process.
%O Number of file system outputs by the process.
%r Number of socket messages received by the process.
%s Number of socket messages sent by the process.
%k Number of signals delivered to the process.
%C (Not in tcsh.) Name and command line arguments of the command being timed.
%x (Not in tcsh.) Exit status of the command.
source: man time :)
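For example (timings.txt and ./my_program are placeholder names), a couple of hedged invocations using the GNU time flags -p, -o, -a and -f:

# Three-line real/user/sys output into a file (-o overwrites; add -a to append):
/usr/bin/time -p -o timings.txt ./my_program
# Custom format via -f, using the specifiers listed above:
/usr/bin/time -f "%e real %U user %S sys %M maxRSS(kB)" -o timings.txt ./my_program

Using the full path /usr/bin/time (or \time) is what makes these flags available; the shell builtin time only accepts -p.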

Related

Performance Counters and IMC Counter Not Matching

I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. In a relatively idle situation, I ran the following perf command and its output is shown below. The counters are offcore_response.all_data_rd.l3_miss.any_response and mem_load_uops_retired.l3_miss:
sudo perf stat -a -e offcore_response.all_data_rd.l3_miss.any_response,mem_load_uops_retired.l3_miss sleep 10
Performance counter stats for 'system wide':
3,713,037 offcore_response.all_data_rd.l3_miss.any_response
2,909,573 mem_load_uops_retired.l3_miss
10.016644133 seconds time elapsed
These two values seem consistent, as the latter excludes prefetch requests and those not targeted at DRAM. But they do not match the read counter in the IMC. This counter is called UNC_IMC_DRAM_DATA_READS and documented here. I read the counter and reread it 1 second later. The difference was around 30,000,000 (EDITED). Multiplied by 10 (to estimate for 10 seconds), the resulting value is around 300 million (EDITED), which is 100 times the value of the above-mentioned performance counters (EDITED). It is nowhere near 3 million! What am I missing?
P.S.: The difference is much smaller (but still large), when the system has more load.
The question is also asked here:
https://community.intel.com/t5/Software-Tuning-Performance/Performance-Counters-and-IMC-Counter-Not-Matching/m-p/1288832
UPDATE:
Please note that the PCM output matches my IMC counter reads.
This is the relevant PCM output:
[PCM screenshot not reproduced]
The values for the READ, WRITE and IO columns are calculated from UNC_IMC_DRAM_DATA_READS, UNC_IMC_DRAM_DATA_WRITES and UNC_IMC_DRAM_IO_REQUESTS, respectively. It seems that requests classified as IO will be either READ or WRITE. In other words, during the depicted one-second interval, almost all of the 2.42GB of READ and WRITE traffic (2.01GB, allowing for the inaccuracy reported in the above-mentioned doc) belongs to IO. Based on this explanation, the three columns seem consistent with each other.
The problem is that there still exists a LARGE gap between the IMC and PMC values!
The situation is the same when I boot into runlevel 1. The only processes on the scheduler are swapper, kworker and migration. Disk IO is almost 85KB/s. I'm wondering what leads to such a (relatively) huge amount of IO. Is it possible to detect its source (e.g., using a counter or a tool)?
UPDATE 2:
I think that there is something wrong with the IO column. It is always something in the range [1.99,2.01], regardless of the amount of load in the system!
UPDATE 3:
In runlevel 1, the average number of occurrences of the uops_retired.all event in a 1-second interval is 15,000,000. During the same period, the number of read requests recorded by the associated IMC counter is around 30,000,000. In other words, if all memory accesses were directly caused by CPU instructions, each retired micro-operation would account for two memory accesses. That seems impossible, especially given that there are multiple levels of caches in between. Therefore, in the idle scenario, the read accesses are perhaps caused by IO.
Actually, it was mostly caused by the GPU device, which is why the traffic does not show up in the core performance counters. Here is the relevant output for a sample execution of PCM on a relatively idle system at resolution 3840x2160 and refresh rate 60 (set using xrandr):
[PCM screenshot not reproduced]
And this is for resolution 800x600 at the same refresh rate (i.e., 60):
[PCM screenshot not reproduced]
As can be seen, changing screen resolution reduced read and IO traffic considerably (more than 100x!).
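For reference, the same IMC traffic can usually be watched directly with perf on client parts like this Haswell, assuming the kernel exposes the client IMC as the uncore_imc PMU (a sketch, not verified on this exact machine):

sudo perf stat -a -e uncore_imc/data_reads/,uncore_imc/data_writes/ sleep 10

perf scales these events to MiB of 64-byte cache-line transfers, so multiplying the MiB figure by 16384 gives line counts comparable to the UNC_IMC_DRAM_DATA_READS deltas above; the numbers should drop visibly after the xrandr resolution change.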

WinDbg Process Uptime is not equal to Kernel Time + User Time

I was analyzing a mini-dump of one of my processes using WinDbg. I used the .time command to see the process times and got the result below. I was expecting Process Uptime = Kernel Time + User Time, which was not the case. Does anybody know why, or is my interpretation wrong?
0:035> .time
Debug session time: Tue May 5 14:30:24.000 2020 (UTC - 7:00)
System Uptime: not available
Process Uptime: 3 days 5:29:22.000
Kernel time: 0 days 9:06:26.000
User time: 11 days 18:50:47.000
The kernel & user times match the CPU / Kernel & User Times displayed in Process Explorer under the Performance tab, and are likely related to the times returned by GetProcessTimes. They add up to the Total Time displayed in Process Explorer, or the CPU Time displayed in Task Manager for the same process.
This "CPU time" is the total time across all CPUs, and does not include time the process spent sleeping, waiting, or otherwise sitting idle. Because of that it can be either (a) smaller than the process "uptime" which is simply the time difference between the start and end times, in the case of mostly idle processes, or (b) larger than the process uptime in the case of heavy usage across multiple CPUs.

Need bash script that constantly uses high memory but low cpu?

I am running a few experiments to see changes in system behavior under different memory and CPU loads. I was wondering: is there a bash script that constantly uses high memory but low CPU?
For the purpose of simulating CPU/memory/IO load, most *NIX systems (Linux included) provide a handy tool called stress.
The tool varies from OS to OS. On Linux, to take up 512MB of RAM with low CPU load:
stress --vm 1 --vm-bytes 512M --vm-hang 100
(The invocation means: start one memory worker (--vm 1), have it allocate 512MB (--vm-bytes 512M), and sleep for 100 seconds before freeing the memory (--vm-hang 100).)
This is silly, and can't reasonably be expected to provide data that will be useful in any real-world scenario. However, to generate at least the amount of memory consumption associated with a given power-of-two number of bytes:
build_string() {
    local pow=$1     # final string length will be 2**pow bytes
    local dest=$2    # name of the variable to store the result in
    local s=' ' i
    for (( i = 0; i < pow; i++ )); do
        s+="$s"      # double the string on each pass
    done
    printf -v "$dest" %s "$s"
}

build_string 10 kilobyte   # build a string of length 1024
echo "Kilobyte string consumes ${#kilobyte} bytes"

build_string 20 megabyte   # build a string of length 1048576
echo "Megabyte string consumes ${#megabyte} bytes"
Note that transiently, during construction, at least 2x the requested space will be required (for the local s); a version that avoided this would either use namevars (requiring bash 4.3) or eval (requiring the author's willingness to do evil).
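For reference, a sketch of the namevar variant just mentioned (bash 4.3+ only; the nameref writes directly into the caller's variable, so no separate local copy of the string is held at the end):

build_string_ref() {
    local -n _dest=$2    # nameref: assignments go straight to the caller's variable
    local pow=$1 i
    _dest=' '
    for (( i = 0; i < pow; i++ )); do
        _dest+="$_dest"  # double the string each pass
    done
}

build_string_ref 20 megabyte
echo "Megabyte string consumes ${#megabyte} bytes"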

Why does Unix block size increase with bigger memory size?

I am profiling binary data which shows an increasing Unix block size (the one got from stat > Blocks) as the number of events increases, as in the following figure, while the byte distance between events stays constant.
[figure not reproduced]
I have noticed some changes in other fields of the file which may explain the increasing Unix block size.
The Unix block size seems to be a dynamic measure. I am interested in why it increases with bigger memory units on some systems; I had thought it should be constant.
I used different environments to provide the stat output:
Debian Linux 8.1 with its default stat
OSX 10.8.5 with Xcode 6 and its default stat
Greybeard's comment may have the answer to the blocks behaviour:
The stat(1) command used to be a thin CLI to the stat(2) system call, which used to transfer relevant parts of a file's inode. Pretty early on, the meaning of the st_blksize member of the C struct returned by stat(2) was changed to "preferred" blocksize for efficient file system I/O, which carries well to file systems with mixed block sizes or non-block oriented allocation.
How can you measure the block size in case (1) and (2) separately?
Why can the Unix block size increase with bigger memory size?
"Stat blocks" is not a block size. It is number of blocks the file consists of. It is obvious that number of blocks is proportional to size. Size of block is constant for most file systems (if not all).

Getting cpu usage and calculating % used

I need to calculate the CPU usage and aggregate it from the proc file in Linux.
/proc/stat gives me data, but how would I come to know the % of CPU used at a point in time, since stat gives me the count of processes running on the cores at any time, which does not give me any idea of % CPU use?
And I am coding this in Golang and have to do it without scripts.
Thanks in advance!!
/proc/stat does not only give you the count of processes on each core. man proc will tell you the exact format of that file. Copied from it, here is the part you should be interested in:
/proc/stat
cpu 3357 0 4313 1362393
The amount of time, measured in units of USER_HZ (1/100ths of a second on most architectures, use sysconf(_SC_CLK_TCK) to obtain the right value), that the system spent in user mode, user mode with low priority (nice), system mode, and the idle task, respectively. The last value should be USER_HZ times the second entry in the uptime pseudo-file.
It is then easy to take the subtraction of the idle field between two measures, which gives you the time this CPU spent doing nothing. The other value you can extract is the time spent doing something, which is the difference between two measures of:
time in user mode + time spent in user mode with low priority + time spent in system mode
You will then have two values: one, A, expressing the time spent doing nothing, and the other, B, the time actually doing something. B / (A + B) gives you the percentage of time the CPU was busy.
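As a minimal sketch of that subtraction (shown in shell for brevity; the question asks for Go, but the two-sample arithmetic ports directly by reading and splitting the first line of /proc/stat). It uses only the four fields from the excerpt above; modern kernels append iowait, irq, softirq, etc., which a more careful version would fold in:

# Sample the aggregate "cpu" line (first line of /proc/stat) twice, one second apart.
read -r _ user1 nice1 sys1 idle1 _ < /proc/stat
sleep 1
read -r _ user2 nice2 sys2 idle2 _ < /proc/stat
busy=$(( (user2 + nice2 + sys2) - (user1 + nice1 + sys1) ))  # B: time doing something
idle=$(( idle2 - idle1 ))                                    # A: time doing nothing
echo "CPU busy: $(( 100 * busy / (busy + idle) ))%"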
