jupyter notebook %%time doesn't measure cpu time of %%sh commands? - shell

When I run python code in a jupyter-lab (v3.4.3) ipython notebook (v8.4.0) and use the %%time cell magic, both cpu time and wall time are reported.
%%time
for i in range(10000000):
    a = i*i
CPU times: user 758 ms, sys: 121 µs, total: 758 ms
Wall time: 757 ms
But when the same computation is performed using the %%sh magic to run a shell script, the cpu time results are nonsense.
%%time
%%sh
python -c "for i in range(10000000): a = i*i"
CPU times: user 6.14 ms, sys: 12.5 ms, total: 18.6 ms
Wall time: 920 ms
The docs for %time do say "Time execution of a Python statement or expression.", but this still surprised me because I had assumed that the shell script will run in a python subprocess and thus can also be measured. So, what's going on here? Is this a bug, or just a known caveat of using %%sh?
I know I can use the shell builtin time or /usr/bin/time to get similar output, but this is a bit cumbersome for multiple lines of shell. Is there a better workaround?
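A likely explanation, offered as a sketch rather than a confirmed diagnosis: %%time reports CPU time consumed by the kernel process itself, while the shell script does its work in a child process, so only the wall time reflects it. On Unix, Python's os.times() reports the CPU time of waited-for children separately, so one possible workaround is to time the subprocess directly. The helper name time_child is hypothetical, and the children_* fields are always zero on Windows.

```python
import os
import subprocess
import sys


def time_child(cmd):
    """Run cmd and return CPU and wall time spent in the child process.

    Hypothetical helper: relies on os.times(), whose children_user and
    children_system fields only count children that have been waited for
    (subprocess.run waits before it returns). Unix only.
    """
    before = os.times()
    subprocess.run(cmd, check=True)
    after = os.times()
    return {
        "user": after.children_user - before.children_user,
        "sys": after.children_system - before.children_system,
        "wall": after.elapsed - before.elapsed,
    }


if __name__ == "__main__":
    # Same computation as in the question, run in a child interpreter.
    loop = "for i in range(10000000): a = i*i"
    t = time_child([sys.executable, "-c", loop])
    print("CPU times: user {user:.3f} s, sys {sys:.3f} s".format(**t))
    print("Wall time: {wall:.3f} s".format(**t))
```

The same helper accepts any command list, for example ["sh", "-c", script_text] for a multi-line shell script, which avoids wrapping each line in time.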

Related

Is MEX of MATLAB known to be slow on macOS?

Question: Is MEX of MATLAB known to be slow on macOS?
Here, "slow" refers to the speed of setting MEX up, of compiling MEX code, and of running the MEX function.
I have done some timing using GitHub Actions. The hardware specification, copied from the official GitHub Actions documentation, is as follows. Detailed information about the CPU (frequency, etc.) and memory is not available.
GNU/Linux (ubuntu 20.04) and Windows (Windows Server 2019):
2-core CPU
7 GB of RAM memory
14 GB of SSD disk space
macOS (macOS Big Sur 11):
3-core CPU
14 GB of RAM memory
14 GB of SSD disk space
Here is the data obtained.
System: GNU/Linux | Language: C | MATLAB: 2021b | Time: 2022.02.26 05:46:36
MEX configured to use 'gcc' for C language compilation.
- Time for setting MEX up: 0.477950 seconds
- Time for mexifying timestwo: 4.500026 seconds
- Time for 100 runs of timestwo: 0.003845 seconds
System: Windows | Language: C | MATLAB: 2021b | Time: 2022.02.26 05:47:56
MEX configured to use 'Microsoft Visual C++ 2019 (C)' for C language compilation.
- Time for setting MEX up: 2.518557 seconds
- Time for mexifying timestwo: 4.416958 seconds
- Time for 100 runs of timestwo: 0.004215 seconds
System: macOS | Language: C | MATLAB: 2021b | Time: 2022.02.26 05:49:01
MEX configured to use 'Xcode with Clang' for C language compilation.
- Time for setting MEX up: 17.602277 seconds
- Time for mexifying timestwo: 5.979585 seconds
- Time for 100 runs of timestwo: 0.130843 seconds
System: GNU/Linux | Language: Fortran | MATLAB: 2021b | Time: 2022.02.26 05:46:20
MEX configured to use 'gfortran' for FORTRAN language compilation.
- Time for setting MEX up: 0.835881 seconds
- Time for mexifying timestwo: 2.768746 seconds
- Time for 100 runs of timestwo: 0.003279 seconds
System: Windows | Language: Fortran | MATLAB: 2021b | Time: 2022.02.26 05:51:04
MEX configured to use 'Intel oneAPI 2021 for Fortran with Microsoft Visual Studio 2019' for FORTRAN language compilation.
- Time for setting MEX up: 1.660305 seconds
- Time for mexifying timestwo: 3.495534 seconds
- Time for 100 runs of timestwo: 0.003299 seconds
System: macOS | Language: Fortran | MATLAB: 2021b | Time: 2022.02.26 05:49:47
MEX configured to use 'Intel Fortran Composer XE' for FORTRAN language compilation.
- Time for setting MEX up: 248.263933 seconds
- Time for mexifying timestwo: 87.093711 seconds
- Time for 100 runs of timestwo: 0.078741 seconds
It turns out that MEX is much slower on macOS than on GNU/Linux or Windows: slow to set up, slow to mexify, and the MEX function is slow to run. In particular, it is about 300 times slower to set up MEX for Fortran on macOS than on Linux. However, note that the significant difference probably comes from the virtual environment of GitHub Actions. On local machines, the difference may not be that dramatic.
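As a quick check of the "about 300 times" figure, the macOS-to-Linux slowdown can be computed from the timings listed above. This is only arithmetic on the posted numbers; the dictionary layout and the function name slowdown_vs_linux are illustrative choices, not part of the original benchmark.

```python
# Timings in seconds, copied from the measurements above.
timings = {
    ("C", "setup"): {"GNU/Linux": 0.477950, "macOS": 17.602277},
    ("C", "mexify"): {"GNU/Linux": 4.500026, "macOS": 5.979585},
    ("C", "100 runs"): {"GNU/Linux": 0.003845, "macOS": 0.130843},
    ("Fortran", "setup"): {"GNU/Linux": 0.835881, "macOS": 248.263933},
    ("Fortran", "mexify"): {"GNU/Linux": 2.768746, "macOS": 87.093711},
    ("Fortran", "100 runs"): {"GNU/Linux": 0.003279, "macOS": 0.078741},
}


def slowdown_vs_linux(table):
    """Return macOS time divided by GNU/Linux time for every measurement."""
    return {key: row["macOS"] / row["GNU/Linux"] for key, row in table.items()}


ratios = slowdown_vs_linux(timings)
for (language, step), ratio in ratios.items():
    print(f"{language:8s} {step:9s} macOS is {ratio:6.1f}x slower than GNU/Linux")
```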
If you are interested in the timing, below is the code I used for it. It is also available on GitHub.
function mex_time(language)
% MEX_TIME measures the running time of MATLAB concerning MEX.
orig_warning_state = warning;
warning('off','all');
if nargin == 1 && (isa(language, 'char') || isa(language, 'string')) && strcmpi(language, 'Fortran')
    language = 'Fortran';
    timestwo_src = 'timestwo.F';
else
    language = 'C';
    timestwo_src = 'timestwo.c';
end
if ismac
    sys = 'macOS';
elseif isunix
    sys = 'GNU/Linux';
elseif ispc
    sys = 'Windows';
else
    error('Platform not supported.')
end
matlab_version = version('-release');
date_time = datestr(now,'yyyy.mm.dd HH:MM:SS');
fprintf('\nSystem: %s | Language: %s | MATLAB: %s | Time: %s\n\n', sys, language, matlab_version, date_time);
tic;
mex('-setup', language);
fprintf('\n- Time for setting MEX up: %f seconds\n\n', toc);
clear('timestwo');
tic;
mex(fullfile(matlabroot, 'extern', 'examples', 'refbook', timestwo_src));
fprintf('\n- Time for mexifying timestwo: %f seconds\n', toc);
tic;
for i = 1 : 100
    timestwo(i);
end
fprintf('\n- Time for 100 runs of timestwo: %f seconds\n\n', toc);
delete('timestwo.*');
warning(orig_warning_state);
You may try
mex_time('C');
mex_time('Fortran');
on your computer and post the results here. Thank you very much.
(See the discussions on MATLAB Answers and GitHub)

bash script with parallel execution

I am trying to use GNU Parallel in a bash script to check whether several S3 paths exist by counting the objects under each path. If the object count is zero, the script should continue to the next date in the for loop, but with parallel it does not work as expected.
For the date range given in the for loop, those folders do not actually exist in the S3 bucket. When a path does not exist, the checkS3Path function is supposed to create a 0 KB file, but I don't see those 0 KB files after the script has run. In the script's output I see "S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-03" instead of "S3 Path Doesnt Exists folder1:+2019-10-03". Please see the output below.
Please let me know what might be the issue.
Here is the sample code.
#!/bin/bash
#set -x
s3Bucket=testbucket
version=v20
Array=(folder1 folder2 folder3)
checkS3Path() {
    fldName=$1
    date=$2
    objectNum=$(aws s3 ls s3://${s3Bucket}/${version}/${fldName}/date=${date}/ | wc -l)
    echo $objectNum
    if [ "$objectNum" -eq 0 ]
    then
        echo "S3 Path Doesnt Exists ${fldName}:${date}" >> /app/${fldName}.log
        touch /home/ubuntu/${fldName}_${date}.txt
        continue
    else
        echo "S3 Path Consists csv Files, Proceeding to next step ${fldName}:${date}"
    fi
}
final() {
    fldName=$1
    date=$2
    checkS3Path $fldName $date
    function2 $fldName $date
    function3 $fldName $date
}
export -f final checkS3Path
for date in 2019-10-{01..03}
do
    # finalstep folder1 $date
    parallel --jobs 4 --eta finalstep ::: "${Array[@]}" ::: +"$date"
done
Here is the output I am seeing.
$ ./test.sh
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence this citation notice: run 'parallel --citation'.
Computers / CPU cores / Max jobs to run
1:local / 4 / 4
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 14 AVG: 0.00s local:4/0/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-01
ETA: 0s Left: 13 AVG: 0.00s local:4/1/100%/2.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder2:+2019-10-01
ETA: 0s Left: 12 AVG: 0.00s local:4/2/100%/1.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder3:+2019-10-01
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence this citation notice: run 'parallel --citation'.
Computers / CPU cores / Max jobs to run
1:local / 4 / 4
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 14 AVG: 0.00s local:4/0/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-02
ETA: 0s Left: 13 AVG: 0.00s local:4/1/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder2:+2019-10-02
ETA: 6s Left: 12 AVG: 0.50s local:4/2/100%/0.5s 202
S3 Path Consists CSV Files, Proceeding to next step folder3:+2019-10-02
ETA: 3s Left: 11 AVG: 0.33s local:4/3/100%/0.3s 202
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence this citation notice: run 'parallel --citation'.
Computers / CPU cores / Max jobs to run
1:local / 4 / 4
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 14 AVG: 0.00s local:4/0/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-03
ETA: 0s Left: 13 AVG: 0.00s local:4/1/100%/1.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder2:+2019-10-03
ETA: 0s Left: 12 AVG: 0.00s local:4/2/100%/0.5s 202
S3 Path Consists CSV Files, Proceeding to next step folder3:+2019-10-03
ETA: 0s Left: 11 AVG: 0.00s local:4/3/100%/0.3s 202
$
Thanks
If checkS3Path works when run by hand, then you probably just need to:
export s3Bucket=testbucket
export version=v20
Each GNU Parallel job runs in its own shell (started from Perl) which is the reason you need to export variables, if you want them to be visible to the job.
Also look at env_parallel to do this automatically.
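The effect of exporting can be illustrated outside of bash. The following is a minimal sketch in Python, used here only as an analogy: a child process sees a variable only if it is part of the environment handed to it, just as a GNU Parallel job only sees exported shell variables. The helper name child_sees is hypothetical, and the variable name s3Bucket is taken from the script above.

```python
import os
import subprocess
import sys

# Code run by the child: report the value of s3Bucket, or a marker if absent.
CHILD_CODE = "import os; print(os.environ.get('s3Bucket', '<unset>'))"


def child_sees(env):
    """Start a fresh interpreter with the given environment and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# An ordinary variable in the parent, comparable to an unexported shell variable.
s3Bucket = "testbucket"

# Environment without the variable (the "not exported" case).
not_exported = {k: v for k, v in os.environ.items() if k != "s3Bucket"}
# Environment with the variable added (the "export s3Bucket=..." case).
exported = dict(not_exported, s3Bucket=s3Bucket)

print(child_sees(not_exported))  # the child does not see the parent's variable
print(child_sees(exported))      # the child sees the value
```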

High RSS and OOM kill despite low value in runtime.MemStats.Sys

I have a process which slowly consumes more RAM until it eventually hits its cgroup limit and is OOM killed, and I'm trying to figure out why.
Oddly, Go's runtime seems to think not much RAM is used, whereas the OS seems to think a lot is used.
Specifically, looking at runtime.MemStats (via the expvar package) I see:
"Alloc":51491072,
"TotalAlloc":143474637424,
"Sys":438053112,
"Lookups":0,
"Mallocs":10230571,
"Frees":10195515,
"HeapAlloc":51491072,
"HeapSys":388464640,
"HeapIdle":333824000,
"HeapInuse":54640640,
"HeapReleased":0,
"HeapObjects":35056,
"StackInuse":14188544,
"StackSys":14188544,
"MSpanInuse":223056,
"MSpanSys":376832,
"MCacheInuse":166656,
"MCacheSys":180224,
"BuckHashSys":2111104,
"GCSys":13234176,
"OtherSys":19497592
But from the OS perspective:
$ ps auxwf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 178 0.0 0.0 3996 3372 pts/0 Ss 17:33 0:00 bash
root 246 0.0 0.0 7636 2828 pts/0 R+ 17:59 0:00 \_ ps auxwf
root 1 166 2.8 11636248 5509288 ? Ssl 17:24 57:15 app server -api-public
So, the OS reports an RSS of 5380 MiB, but the Sys field in MemStats shows only 417 MiB. My understanding is these fields should be approximately the same.
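For reference, the unit conversion behind those two figures can be sketched as follows. The values are copied from the MemStats dump and the ps output above; ps reports RSS in KiB, while MemStats reports bytes. The variable names are illustrative only.

```python
MIB = 1024 * 1024

# Selected fields from the runtime.MemStats dump above, in bytes.
memstats = {
    "Sys": 438053112,
    "HeapSys": 388464640,
    "HeapIdle": 333824000,
    "HeapInuse": 54640640,
    "HeapReleased": 0,
}

# RSS column of the "app server" process in the ps output above, in KiB.
rss_kib = 5509288

sys_mib = memstats["Sys"] / MIB
rss_mib = rss_kib / 1024
# Resident memory that the Sys figure does not account for.
unaccounted_mib = rss_mib - sys_mib

print(f"MemStats.Sys: {sys_mib:8.1f} MiB")
print(f"RSS:          {rss_mib:8.1f} MiB")
print(f"Difference:   {unaccounted_mib:8.1f} MiB")
```

The difference, roughly 4960 MiB, is the part of the resident set that the question is trying to explain.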
GC is running, as confirmed by setting GODEBUG=gctrace=1,madvdontneed=1. For example, I see output like:
gc 6882 @2271.137s 0%: 0.037+2.2+0.087 ms clock, 3.5+0.78/37/26+8.4 ms cpu, 71->72->63 MB, 78 MB goal, 96 P
The numbers vary a bit depending on the process, but they are all <100 MB, whereas the OS is reporting >1GB (and growing, until eventual OOM).
madvdontneed=1 was a shot in the dark but seems to make no difference. I wouldn't think the madvise parameters would be relevant, since it doesn't seem there's any need to return memory to the kernel, as the Go runtime doesn't think it's using much memory anyway.
What could explain this discrepancy? Am I not correctly understanding the semantics of these fields? Are there mechanisms that would result in the growth of RSS (and an eventual OOM kill) but not increase MemStats.Sys?

Query a `perf.data` file for the total raw execution time of a symbol

I used perf to generate a perf file with perf record ./application. perf report shows me various things about it. How can I show the total time it took to run the application, and the total time to run a specific "symbol"/function? perf seems to often show percentages, but I want raw time, and I want "inclusive" time, i.e. including children.
perf v4.15.18 on Ubuntu Linux 18.04
perf is a statistical (sampling) profiler in its default perf record mode, which means it has no exact timestamps for function entry and exit (tracing is required for exact data). perf asks the OS kernel to generate interrupts thousands of times per second (4 kHz for the hardware PMU if -e cycles is supported, less for the software event -e cpu-clock). Each interrupt of program execution is recorded as a sample containing the EIP (current instruction pointer), the pid (process/thread id), and a timestamp. When a program runs for several seconds there will be thousands of samples, and perf report can build histograms from them showing which parts of the program (which functions) were executing more often than others. You get a general overview, for example that some functions took around 30% of the program's execution time while others took 5%.
perf report does not compute the total program execution time (it could estimate it by comparing the timestamps of the first and last samples, but that is not exact if there were off-CPU periods). It does, however, estimate the total event count (printed on the first line of the interactive TUI and listed in the text output):
$ perf report |grep approx
# Samples: 1K of event 'cycles'
# Event count (approx.): 844373507
The perf report -n option adds a "number of samples" column next to the percentage column.
Samples: 1K of event 'cycles', Event count (approx.): 861416907
Overhead Samples Command Shared Object Symbol
42.36% 576 bc bc [.] _bc_rec_mul
37.49% 510 bc bc [.] _bc_shift_addsub.isra.3
14.90% 202 bc bc [.] _bc_do_sub
0.89% 12 bc bc [.] bc_free_num
However, samples are not taken at equal intervals, and sample counts are less exact than the computed overhead (each sample may have a different weight). I recommend running perf stat ./application to get the real total running time and total hardware counts for your application. This works best when your application has a stable running time (use perf stat -r 5 ./application to have the tool estimate the variation, shown as "+- 0.28%" in the last column).
To include child functions, stack traces must be sampled at every interrupt. They are not sampled in the default perf record mode. This sampling is turned on with the -g or --call-graph dwarf options: perf record -g ./application or perf record --call-graph dwarf ./application. It is not simple to use correctly for preinstalled libraries or applications on Linux (most distributions strip debug information from packages), but it can be used for your own applications compiled with debug information. The default -g, which is the same as --call-graph fp, requires all code to be compiled with the -fno-omit-frame-pointer gcc option; the non-default --call-graph dwarf is more reliable. With a correctly prepared program and libraries, a single-threaded application, and large enough stack dumps per sample (8 KB is the default; change it with --call-graph dwarf,65536), perf report should show around 99% for the _start and main functions (including children).
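perf report does not print raw per-symbol time, but a rough estimate can be formed by multiplying a symbol's overhead percentage by the total running time measured with perf stat. The sketch below does only that arithmetic. The percentages are taken from the perf report -n output above; total_seconds is a placeholder value (an assumption, not a measurement) to be replaced by the elapsed time that perf stat prints, and estimate_symbol_seconds is a hypothetical helper name. Because the overhead is a share of sampled cycles events, the result approximates on-CPU time only.

```python
def estimate_symbol_seconds(overhead_percent, total_seconds):
    """Scale the total running time by a symbol's share of the samples."""
    return overhead_percent / 100.0 * total_seconds


# "Overhead" column from the perf report -n output above.
overheads = {
    "_bc_rec_mul": 42.36,
    "_bc_shift_addsub.isra.3": 37.49,
    "_bc_do_sub": 14.90,
    "bc_free_num": 0.89,
}

# Placeholder: substitute the elapsed time reported by perf stat.
total_seconds = 1.0

for symbol, percent in overheads.items():
    seconds = estimate_symbol_seconds(percent, total_seconds)
    print(f"{symbol:28s} ~{seconds:.4f} s")
```

For inclusive time, the same calculation applies to the "Children" column of a profile recorded with call graphs instead of the "Overhead" (self) column.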
bc calculator compiled with -fno-omit-frame-pointer:
bc-no-omit-frame$ echo '3^123456%3' | perf record -g bc/bc
bc-no-omit-frame$ perf report
Samples: 1K of event 'cycles:uppp', Event count (approx.): 811063902
Children Self Command Shared Object Symbol
+ 98.33% 0.00% bc [unknown] [.] 0x771e258d4c544155
+ 98.33% 0.00% bc libc-2.27.so [.] __libc_start_main
+ 98.33% 0.00% bc bc [.] main
bc calculator with dwarf call graph:
$ echo '3^123456%3' | perf record --call-graph dwarf bc/bc
$ perf report
Samples: 1K of event 'cycles:uppp', Event count (approx.): 898828479
Children Self Command Shared Object Symbol
+ 98.42% 0.00% bc bc [.] _start
+ 98.42% 0.00% bc libc-2.27.so [.] __libc_start_main
+ 98.42% 0.00% bc bc [.] main
bc without debug info: perf handles the call graph incorrectly in -g (fp) mode (no 99% for main):
$ cp bc/bc bc.strip
$ strip -d bc.strip
$ echo '3^123456%3' | perf record --call-graph fp ./bc.strip
Samples: 1K of event 'cycles:uppp', Event count (approx.): 841993392
Children Self Command Shared Object Symbol
+ 43.94% 43.94% bc.strip bc.strip [.] _bc_rec_mul
+ 39.73% 39.73% bc.strip bc.strip [.] _bc_shift_addsub.isra.3
+ 11.27% 11.27% bc.strip bc.strip [.] _bc_do_sub
+ 0.92% 0.92% bc.strip libc-2.27.so [.] malloc
Sometimes perf report --no-children can be useful to disable sorting on self+children overhead (it sorts by "self" overhead instead), for example when the call graph was not fully captured.

Using the top program in bash to extract cpu time into a variable

I have an assignment in bash scripting in which I am trying to measure the CPU time used by a process whose name is passed to the script. I can find the process id and pass it to top. However, I haven't figured out how to extract the CPU time from top's output. For example, top prints:
top - 00:57:07 up 6:06, 2 users, load average: 0.46, 0.31, 0.55
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.7 us, 0.8 sy, 0.0 ni, 94.6 id, 0.9 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 1928720 total, 1738072 used, 190648 free, 57184 buffers
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3337 amarkovi 20 0 372m 31m 10m R 0.7 1.7 13:28.74 chromium-browse
All I want from this is the TIME+ field assigned to a variable, so I can add up the times and print the total by itself.
I am a noob to bash scripting so please be patient.
thanks,
Do you have to use top? It should be much simpler (once you work out the right options) to use ps to give you just the fields you want, then use grep to select just the processes you want.
Since it's an assignment I don't want to spoil all the fun :D. I'll just point you to some commands that can help you in your endeavour: sed, awk and cut. With these three you can solve it in many ways, enjoy!
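Whichever tool extracts the field, the TIME+ value still has to be converted before times can be added up. The following is a minimal sketch of that conversion, written in Python rather than bash so it does not give away the assignment. It assumes the minutes:seconds.hundredths layout shown in the output above and that TIME+ is the 11th whitespace-separated column of a process line; both function names are hypothetical.

```python
def parse_top_time(field):
    """Convert a top TIME+ value such as '13:28.74' to seconds.

    Assumes the minutes:seconds.hundredths layout shown above.
    """
    minutes, seconds = field.split(":")
    return int(minutes) * 60 + float(seconds)


def total_cpu_seconds(process_lines):
    """Sum the TIME+ column (index 10) over top's process lines."""
    return sum(parse_top_time(line.split()[10]) for line in process_lines)


# Process line copied from the top output above.
lines = [
    "3337 amarkovi 20 0 372m 31m 10m R 0.7 1.7 13:28.74 chromium-browse",
]

print(f"{total_cpu_seconds(lines):.2f} seconds of CPU time")
```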
