Sum time output from processes (bash)

I've made a script which measures the time of some processes. This is the file that I get:
real 0m6.768s
real 0m5.719s
real 0m5.173s
real 0m4.245s
real 0m5.257s
real 0m5.479s
real 0m6.446s
real 0m5.418s
real 0m5.654s
The command I use to get the time is this one:
{ time my-command; } |& grep real >> times.txt
What I need is to sum all these times and report the total in (hours, if applicable) minutes and seconds, using a bash script.

From man bash (if your PAGER is less, search with /time):
If the time reserved word precedes a pipeline, the elapsed as well as user and system time consumed by its execution are reported when the pipeline terminates. The -p option changes the output format to that specified by POSIX. The TIMEFORMAT variable may be set to a format string that specifies how the timing information should be displayed; see the description of TIMEFORMAT under Shell Variables below.
Then search with /TIMEFORMAT:
The optional l specifies a longer format, including minutes, of the form MMmSS.FFs. The value of p determines whether or not the fraction is included.
If this variable is not set, bash acts as if it had the value $'\nreal\t%3lR\nuser\t%3lU\nsys\t%3lS'. If the value is null, no timing information is displayed. A trailing newline is added when the format string is displayed.
If the format can be changed to something like
TIMEFORMAT=$'\nreal\t%3R'
(without the l), the times are plain seconds and easier to sum.
Note also that the format may depend on the locale (LANG). Compare
(LANG=fr_FR.UTF-8; time sleep 1)
and
(LANG=C; time sleep 1)
With plain seconds, the sum can be done with an external tool like awk
awk '/^real/ {sum+=$2} END{print sum} ' times.txt
or perl
perl -aln -e '$sum+=$F[1] if /^real/; END{print $sum}' times.txt

Pipe the output to this command:
grep real | awk '{ gsub("m","*60+",$2); gsub("s","+",$2); printf("%s",$2); } END { printf("0\n"); }' | bc
This should work if the output was generated by the built-in time command. The result is in seconds.
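Since the question asks for hours, minutes and seconds rather than plain seconds, here is a minimal sketch of one way to do the whole job in awk. The function name sum_times is made up for illustration, and the sketch assumes the default bash format (real, then MMmSS.FFFs with a "." as decimal separator).

```bash
# Hypothetical helper: sum the "real XmY.YYYs" lines of a file (or stdin)
# and print the total as hours, minutes and seconds.
sum_times() {
    awk '
        $1 == "real" {
            split($2, t, "m")      # t[1] = minutes, t[2] = e.g. "6.768s"
            sub(/s$/, "", t[2])    # drop the trailing "s"
            total += t[1] * 60 + t[2]
        }
        END {
            h = int(total / 3600)
            m = int((total - h * 3600) / 60)
            s = total - h * 3600 - m * 60
            printf "%dh %dm %.3fs\n", h, m, s
        }
    ' "$@"
}
```

Called as sum_times times.txt on the nine sample lines from the question, this should print 0h 0m 50.159s.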

Related

Split Laravel Log Files by Date

I've inherited a Laravel system with a single large log file, currently around 17GB in size. I'm now rotating future log files monthly, but I need to split the existing log by month.
The date is formatted as yyyy-mm-dd hh:mm:ss ("[2018-06-28 13:32:05]"). Does anybody know how I could perform the split using only bash scripting (e.g. with awk, sed, etc.)?
The input file name is laravel.log. I'd like output files to have format such as laravel-2018-06.log.
Help much appreciated.
Since the information you provide is a bit sparse, I will go with the following assumptions:
each log entry is a single line;
each line contains at least one string of the form [yyyy-mm-dd hh:mm:ss]; if there is more than one, we take the first;
your log file is sorted by time.
The regex which matches your date is (written as an awk string, hence the doubled backslashes)
\\[[0-9]{4}(-[0-9]{2}){2} ([0-9]{2}:){2}[0-9]{2}\\]
or, a bit less strict,
\\[[-:0-9 ]{19}\\]
So we can use this in combination with match(s,ere) to get the desired string:
awk 'BEGIN{ere="\\[[0-9]{4}(-[0-9]{2}){2} ([0-9]{2}:){2}[0-9]{2}\\]"}
{ match($0,ere); fname="laravel-"substr($0,RSTART+1,7)".log" }
(fname != oname) { close(oname); oname=fname }
{ print > oname }' laravel.log
As you say that your file is on the large side, you might want to test this first on a subset which covers a couple of months.
$ head -10000 laravel.log > laravel.head.log
$ awk '{...}' laravel.head.log
$ md5sum laravel.head.log
$ cat laravel-*-*.log | md5sum
If the two checksums do not match, you might have a problem.
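If log entries can span several lines (stack traces, for example), the one-line-per-entry assumption breaks: a line without a timestamp would produce a wrong file name. A possible variant, sketched under the extra assumption that every entry starts with the timestamp at the very beginning of the line, keeps such continuation lines with the preceding entry. The function name split_log_by_month is made up for illustration.

```bash
# Hypothetical helper: split a log by month, keeping continuation lines
# (lines that do not start with a timestamp) with the preceding entry.
# Assumes each entry starts with "[yyyy-mm-dd ..." at the beginning of a line.
split_log_by_month() {
    awk '
        # A line starting with "[yyyy-mm-" begins a new entry.
        /^\[[0-9][0-9][0-9][0-9]-[0-9][0-9]-/ {
            fname = "laravel-" substr($0, 2, 7) ".log"
            if (fname != oname) {
                if (oname != "") close(oname)
                oname = fname
            }
        }
        # Skip anything that appears before the first dated line.
        oname != "" { print >> oname }
    ' "$@"
}
```

Run it as split_log_by_month laravel.log in an empty output directory: it appends (>>), so re-running it on top of earlier output would duplicate lines.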

Unique Linux filename, sortable by time

Previously I was using uuidgen to create unique filenames that I then need to iterate over by date/time via a bash script. I've since found that simply looping over said files via 'ls -l' will not suffice, because evidently I can only trust the OS to keep timestamp resolution in seconds (the nanoseconds are all zero when viewing files via stat on this particular filesystem and kernel).
So I then thought maybe I could just use something like date +%s%N for my filename. This will print the seconds since 1970 followed by the current nanoseconds.
I'm possibly over-engineering this at this point, but these are files generated on high-usage enterprise systems, so I don't really want to simply trust the nanosecond timestamp on the (admittedly very small) chance that two files are generated in the same nanosecond and we get a collision.
I believe uuidgen has logic baked in to handle this occurrence, so it's still guaranteed to be unique in that case (correct me if I'm wrong there... I read that someplace, I think, but the googles are failing me right now).
So... I'm considering something like
FILENAME=`date +%s`-`uuidgen -t`
echo $FILENAME
to ensure I create a unique filename that can then be iterated over with a simple 'ls' and whose name can be trusted to be both unique and sequential by time.
Any better ideas or flaws with this direction?
If you order your date format by year, month (zero padded), day (zero padded), hour (zero padded), minute (zero padded), then you can sort by time easily:
FILENAME=`date '+%Y-%m-%d-%H-%M'`-`uuidgen -t`
echo $FILENAME
or
FILENAME=`date '+%Y-%m-%d-%H-%M'`-`uuidgen -t | head -c 5`
echo $FILENAME
Which would give you:
2015-02-23-08-37-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
or
2015-02-23-08-37-xxxxx
# the same as above, but with a shorter (and therefore less collision-resistant) unique string
You can choose other delimiters for the date/time besides - as you wish, as long as they are valid characters for a Linux file name.
You will need %N for precision (nanoseconds):
filename=$(date +%s.%N)_$(uuidgen -t); echo $filename
1424699882.086602550_fb575f02-bb63-11e4-ac75-8ca982a9f0aa
BTW if you use %N and you're not using multiple threads, it should be unique enough.
You could take what TIAGO said about %N precision and combine it with taskset.
You can find some info here: http://manpages.ubuntu.com/manpages/hardy/man1/taskset.1.html
Then run your script with
taskset --cpu-list 1 my_script
I have never tested this, but it should run your script only on CPU 1 (taskset numbers CPUs from 0, so use 0 if you want the first core). I'm thinking that if your script is pinned to a single core, combined with date %N (nanoseconds) + uuidgen, you should not get duplicate filenames.
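Another option, sketched here under the assumption of GNU date (for %N) and a mktemp that accepts a template with a path, is to let mktemp create the file: it creates the file atomically and picks a different random suffix if the name is already taken. The timestamp prefix keeps names sortable. The function name new_sortable_file is made up for illustration.

```bash
# Hypothetical helper: atomically create an empty file whose name starts
# with a sortable UTC timestamp and ends with a random suffix.
new_sortable_file() {
    local dir=${1:-.}
    # %N (nanoseconds) is a GNU date extension; -u avoids DST jumps.
    mktemp "$dir/$(date -u +%Y%m%dT%H%M%S.%N)-XXXXXXXX"
}
```

The function prints the path of the file it created, so f=$(new_sortable_file /some/dir) gives you a name to write to, and a plain ls lists the files in creation order as long as the system clock does not step backwards.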

Parsing the output of Bash's time builtin

I'm running a C program from a Bash script, and running it through a command called time, which outputs some time statistics for the running of the algorithm.
If I were to perform the command
time $ALGORITHM $VALUE $FILENAME
It produces the output:
real 0m0.435s
user 0m0.430s
sys 0m0.003s
The values depend on the run of the algorithm.
However, what I would like to be able to do is to take the 0.435 and assign it to a variable.
I've read into awk a bit, enough to know that if I pipe the above command into awk, I should be able to grab the 0.435 and place it in a variable. But how do I do that?
Many thanks
You must be careful: there's the Bash builtin time and there's the external command time, usually located in /usr/bin/time (run type -a time to list all the time commands available on your system).
If your shell is Bash, when you issue
time stuff
you're calling the builtin time. You can't directly catch the output of time without some minor trickery. This is because time doesn't want to interfere with possible redirections or pipes you'll perform, and that's a good thing.
To get time output on standard out, you need:
{ time stuff; } 2>&1
(grouping and redirection).
Now, about parsing the output: parsing the output of a command is usually a bad idea, especially when it's possible to do without. Fortunately, Bash's time command accepts a format string. From the manual:
TIMEFORMAT
The value of this parameter is used as a format string specifying how the timing information for pipelines prefixed with the time reserved word should be displayed. The % character introduces an escape sequence that is expanded to a time value or other information. The escape sequences and their meanings are as follows; the braces denote optional portions.
%%
A literal `%`.
%[p][l]R
The elapsed time in seconds.
%[p][l]U
The number of CPU seconds spent in user mode.
%[p][l]S
The number of CPU seconds spent in system mode.
%P
The CPU percentage, computed as (%U + %S) / %R.
The optional p is a digit specifying the precision, the number of fractional digits after a decimal point. A value of 0 causes no decimal point or fraction to be output. At most three places after the decimal point may be specified; values of p greater than 3 are changed to 3. If p is not specified, the value 3 is used.
The optional l specifies a longer format, including minutes, of the form MMmSS.FFs. The value of p determines whether or not the fraction is included.
If this variable is not set, Bash acts as if it had the value
$'\nreal\t%3lR\nuser\t%3lU\nsys\t%3lS'
If the value is null, no timing information is displayed. A trailing newline is added when the format string is displayed.
So, to fully achieve what you want:
var=$(TIMEFORMAT='%R'; { time $ALGORITHM $VALUE $FILENAME; } 2>&1)
As @glennjackman points out, if your command sends any messages to standard output and standard error, you must take care of that too. For that, some extra plumbing is necessary:
exec 3>&1 4>&2
var=$(TIMEFORMAT='%R'; { time $ALGORITHM $VALUE $FILENAME 1>&3 2>&4; } 2>&1)
exec 3>&- 4>&-
Source: BashFAQ032 on the wonderful Greg's wiki.
You could try the awk command below, which uses the split function to split the second field on a digit followed by m, or on the trailing s.
$ foo=$(awk '/^real/{split($2,a,"[0-9]m|s$"); print a[2]}' file)
$ echo "$foo"
0.435
You can use this awk:
var=$(awk '$1=="real"{gsub(/^[0-9]+[hms]|[hms]$/, "", $2); print $2}' file)
echo "$var"
0.435
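Note that both awk one-liners above drop the minutes field, so a run of one minute or more would be reported too short. A minimal sketch that folds the minutes in, assuming the default MMmSS.FFFs format with "." as decimal separator (the function name to_seconds is made up for illustration):

```bash
# Hypothetical helper: convert bash's "MMmSS.FFFs" to plain seconds.
to_seconds() {
    local t=$1
    local min=${t%%m*}     # text before the first "m", e.g. "0"
    local sec=${t#*m}      # text after it, e.g. "0.435s"
    sec=${sec%s}           # drop the trailing "s"
    # bash has no floating-point arithmetic, so let awk do the sum.
    awk -v m="$min" -v s="$sec" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}
```

For example, to_seconds 0m0.435s prints 0.435 and to_seconds 2m5.250s prints 125.250.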

Readable output for tracking runtime

I want to have a proper output style using /usr/bin/time and when I try something like
/usr/bin/time -f'time=%E' ls > /dev/null
the output is
time=0:00.05
where the 5 says 5 centiseconds.
If my command/script runs a longer time, the output will be e.g.:
time=1:30:05
where the 5 says 5 seconds.
I want the output in the format described in man time:
The format string
The format is interpreted in the usual printf-like way. Ordinary characters are directly copied, tab, newline and backslash are escaped using \t, \n and \\, a percent sign is represented by %%, and otherwise % indicates a conversion. The program time will always add a trailing newline itself. The conversions follow. All of those used by tcsh(1) are supported.
Time
%E Elapsed real time (in [hours:]minutes:seconds).
So I don't want to have those confusing centiseconds. The format should be logical and easily readable without using additional scripts like sed. When I have a log for several commands, the output should be something like:
time=0:00:01
time=3:30:12
time=0:10:01
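As far as I know, %E cannot be told to always print h:mm:ss, so some post-processing seems hard to avoid. One possible sketch: GNU time's %e conversion prints the elapsed real time in seconds, and a small helper (to_hms is a made-up name) turns that number into a fixed h:mm:ss form, dropping the fraction.

```bash
# Hypothetical helper: format a number of seconds (e.g. from GNU time's %e)
# as h:mm:ss, discarding any fractional part.
to_hms() {
    awk -v t="$1" 'BEGIN {
        s = int(t)                                   # drop the fraction
        printf "%d:%02d:%02d\n", s / 3600, (s % 3600) / 60, s % 60
    }'
}
```

For example, to_hms 12612 prints 3:30:12 and to_hms 601.4 prints 0:10:01, so something like echo "time=$(to_hms "$elapsed")" would give the log lines shown above, where elapsed holds the %e value.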

Script for finding average runtime of a program

I found partial solutions on several sites, so I pulled several parts together, but I still couldn't figure it out.
Here is what I am doing:
I am running a simple java program from Terminal, and need to find the average runtime for the program.
What I am doing is running the command several times, finding the total time, and then dividing that total time by the number of times I ran the program.
I would also like to acquire the output of the program rather than displaying it on standard output.
Here is my current code and the output.
Shell Script:
startTime=$(date +%s%N)
for ((i = 0; i < $runTimes; i++))
do
java Program test.txt > /dev/null
done
endTime=$(date +%s%N)
timeDiff=$(( $endTime - $startTime ))
timeAvg=$(( $timeDiff / $numTimes ))
echo "Avg Time Taken: "
echo $timeAvg
Output:
./run: line 12: 1305249784N: value too great for base (error token is "1305249784N")
The line number 12 is off because this code is part of a larger file.
The line number 12 is the line with timeDiff being evaluated.
I appreciate any help, and apologize if this question is redundant or off-topic.
The error message shows a literal N at the end of the number, which suggests that date on your system does not expand %N at all. On my machine, I don't see what the %N format for date is getting you either, as the value seems to end in 7 zeros, BUT it does make a much bigger number to evaluate in the math, i.e. 1305250833570000000. Do you really need nanosecond precision? I'll bet if you go with just %s it will be fine.
Otherwise you look to be on the right track.
P.S.
Oh yeah, minor point,
echo "Avg Time Taken: $timeAvg"
is a simpler way to achieve your required output ;-)
Option 2. You could take out the date calculations altogether and turn your loop into a small script. Then you can use a built-in feature of the shell:
time myJavaTest.sh
which will give you details like
real 0m0.049s
user 0m0.016s
sys 0m0.015s
I hope this helps.
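To get the average directly, the two ideas can be combined: time the whole loop with the bash builtin and divide by the number of runs. This is only a sketch, assuming bash and a locale that prints "." as the decimal separator; the function name avg_runtime is made up for illustration.

```bash
# Hypothetical helper: average wall-clock seconds of a command over N runs.
# Usage: avg_runtime N command [args...]
avg_runtime() {
    local runs=$1; shift
    local total i
    # TIMEFORMAT=%R makes the bash builtin `time` print only elapsed seconds.
    # The command's own output is discarded so only the timing is captured.
    total=$(
        TIMEFORMAT=%R
        { time for ((i = 0; i < runs; i++)); do "$@" > /dev/null 2>&1; done; } 2>&1
    )
    # bash arithmetic is integer-only, so do the division in awk.
    awk -v t="$total" -v n="$runs" 'BEGIN { printf "%.3f\n", t / n }'
}
```

For the program in the question this would be called as avg_runtime 10 java Program test.txt, which prints the average in seconds with three decimals.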
