Extract Bash output to a CSV File for Plotting

I am trying to measure the time it takes for a Kubernetes object to be deployed in a Kubernetes cluster by using the time utility. I am doing that several times, with a sleep between runs, to get values for multiple simulated deployments.
This is the script:
#!/bin/bash
function time_check {
    i=$1
    time kubectl apply -f deploy.yml --dry-run=client
}

for i in {1..3}
do
    time_check $i &
    sleep 2
done
This is the output:
deployment.apps/nginx-raw created (dry run)
real 0m0.421s
user 0m0.359s
sys 0m0.138s
deployment.apps/nginx-raw created (dry run)
real 0m0.359s
user 0m0.443s
sys 0m0.158s
deployment.apps/nginx-raw created (dry run)
real 0m0.138s
user 0m0.412s
sys 0m0.122s
deployment.apps/nginx-raw created (dry run)
real 1.483s
user 0m0.412s
sys 0m0.122s
deployment.apps/nginx-raw created (dry run)
real 1.456s
user 0m0.234s
sys 0m0.567s
deployment.apps/nginx-raw created (dry run)
real 2.345
user 0m0.435s
sys 0m0.123s
Goal
I want to pipe the output and take the first row (the real row) of each iteration's timing, e.g. real 0m0.421s, then take the number part 0m0.421s and strip the 0m if the value is in seconds, or leave it as-is if it is in minutes, like 1.483. Also strip the s at the end.
The final results should be written to a CSV file to be plotted. The expected output in CSV:
real
0.421
0.359
0.138
1.483
1.456
2.345
Add-on
I will do this for another deployment and plot the two sets of times in a line graph to compare how long each deployment takes.

You are using the shell builtin command time. If you switch to Linux's time command you can control the output and get just the data you want.
$ /usr/bin/time -f '%e' sleep 1.5
1.50
See man time for more details.
You can take the output and pipe it into grep -v deployment | tr '\n' ','; that will strip the dry-run lines and convert the remaining newlines into commas:
$ printf "1\njunk\n2\njunk\n3\n"
1
junk
2
junk
3
$ printf "1\njunk\n2\njunk\n3\n" | grep -v junk | tr '\n' ','
1,2,3, $
This is a quick and dirty way to slice the data. I'm sure there are other solutions as well.
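Putting both ideas together, here is a minimal sketch of how the question's loop could write the CSV directly (assuming GNU time is installed at /usr/bin/time and deploy.yml is the manifest from the question):
#!/bin/bash
echo "real" > times.csv
for i in {1..3}
do
    # GNU time's -f '%e' prints only the elapsed wall-clock seconds, on stderr.
    # kubectl's "created (dry run)" message goes to stdout and is discarded;
    # note that kubectl errors would also land in the CSV.
    /usr/bin/time -f '%e' kubectl apply -f deploy.yml --dry-run=client \
        >/dev/null 2>> times.csv
    sleep 2
done
Each value is already in seconds, so there is no 0m prefix or trailing s to strip.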

I just used a randomized sub-second sleep to get the output stream, but the principle should work.
$: for x in 0 1 2 3 4 5 6 7 8 9; do time sleep 0.$RANDOM; done 2>&1 |
> awk 'BEGIN{print "real"}/^real/{ print gensub(/^.*m([0-9.]+)s/,"\\1",1)}'
real
0.266
0.716
0.847
0.251
0.358
0.236
0.669
0.266
0.308
0.856
Explained a bit with inline comments -
$: for x in 0 1 2 3 4 5 6 7 8 9; do    # this is just a dummy loop
>     time sleep 0.$RANDOM              # create some output
> done 2>&1 |                           # dup time output stderr -> stdout
> awk >my.csv 'BEGIN{print "real"}      # the header
>     /^real/{                          # match the lines we want (ignore the rest)
>         print gensub(/^.*m([0-9.]+)s/,"\\1",1)   # just print the matched part
>     }'
Separate issue
I want to pipe the output and take the first row (the real row) of each iteration's timing, e.g. real 0m0.421s, then take the number part 0m0.421s and strip the 0m if the value is in seconds, or leave it as-is if it is in minutes, like 1.483. Also strip the s at the end.
So what you are saying is that if it takes 1.5 seconds (0m1.500s) you want it to output 1.500, but if it takes 90.0 seconds (1m30.000s, i.e. 1.5 minutes) you also want it to output 1.500.
How will you tell which it is without either specifying units or standardizing units?
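If you standardize on seconds, a small sketch (not part of this answer; it assumes the same loop from the question and GNU awk, whose match() accepts a capture array) could convert the minutes field instead of discarding it:
# Normalize every "real XmY.YYYs" line to total seconds, so
# 0m1.500s becomes 1.500 and 1m30.000s becomes 90.000.
for i in {1..3}; do time kubectl apply -f deploy.yml --dry-run=client; done 2>&1 |
awk 'BEGIN{print "real"}
     /^real/{ if (match($2, /^([0-9]+)m([0-9.]+)s$/, t))
                  printf "%.3f\n", t[1] * 60 + t[2] }' > times.csv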

Related

How to quickly concatenate large numbers of strings

Background: I am running unit tests and one requires calling a PSQL function with a high number of URLs (i.e. 2000+), and this is extremely slow, as shown in this Minimal Working Example (MWE).
MWE:
#!/bin/bash
# Generate random 4096 character alphanumeric
# string (upper and lowercase)
URL="http://www.$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w $((4096-15)) | head -n 1).com"

# Create a comma separated list of 2000 URLs
for i in $(seq 2000)
do
    URLS="$URLS,$URL"
done
We call it and measure the run time like so
$ time ./generate_urls.sh
real 1m30.681s
user 1m14.648s
sys 0m16.000s
Question: Is there a faster, more efficient way to achieve this same result?
Instead of concatenating over and over, just print them all and store the result.
URLS=$(
    for i in $(seq 2000) ; do
        printf %s, "$URL"
    done
)
echo "${URLS%,}"  # Remove the final comma.
Takes less than 2 secs on my machine. Even when I move the URL generation inside the loop, it takes just about 8 secs.
If you always want 2000 URLs then this code is much faster than the code in the question:
# Create a comma separated list of 2000 (5*5*5*4*4) URLs
urls=$url,$url,$url,$url,$url # x 5
urls=$urls,$urls,$urls,$urls,$urls # x 5
urls=$urls,$urls,$urls,$urls,$urls # x 5
urls=$urls,$urls,$urls,$urls # x 4
urls=$urls,$urls,$urls,$urls # x 4
See Correct Bash and shell script variable capitalization for an explanation of why I changed the variable names to lowercase.
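As a quick sanity check (a sketch, not from the answer, using a placeholder url=x instead of the real URL), you can count the separators to confirm the doubling construction yields exactly 2000 entries:
url=x
urls=$url,$url,$url,$url,$url          # x 5 ->    5 entries
urls=$urls,$urls,$urls,$urls,$urls     # x 5 ->   25 entries
urls=$urls,$urls,$urls,$urls,$urls     # x 5 ->  125 entries
urls=$urls,$urls,$urls,$urls           # x 4 ->  500 entries
urls=$urls,$urls,$urls,$urls           # x 4 -> 2000 entries
echo "$urls" | tr -cd , | wc -c        # prints 1999 commas, i.e. 2000 entries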

How to run a short program repeatedly and then time it in command line [duplicate]

How do I run 100 iterations using a bash shell script? I want to know how long it will take to execute one command (start and end time). I want to keep track of which iteration is currently running. I want to log each iteration. I have one automated script that I need to run and log.
for i in 1 2 3
do
    command1
done
But I want to know how long it takes to complete one iteration - and write the information to a log file too!
You may use seq in iteration as well:
for i in `seq 1 100`; do ... done
for ((i = 1; i <= 100; i++)); do
    echo "--- Iteration #$i: $(date) ---"
    time command1
done 2>&1 | tee timing.log
There's a lot going on here. What's happening?
The for loop iterates from 1 to 100 using C-style syntax.
The $i in the echo printout prints the current iteration number.
$(date) inserts a timestamp into each printout.
The time command runs a command and prints how long it took to execute.
The output from everything inside of the loop is piped to tee, which saves a copy to timing.log.
The 2>&1 redirects stderr to stdout so that the log file will contain both regular output and error messages.
The following script shows one way to do it.
#!/usr/bin/bash
for i in {1..100} ; do
    echo =============================
    echo "Number $i: $(date +%Y-%m-%d-%H:%M:%S)"
    ( time ( echo $i ; sleep 1 ) ) 2>&1 | sed 's/^/ /'
done | tee timing.log
It uses the bash range feature to run 100 iterations of the loop, outputting the loop counter and date.
It then times your command (echo $i ; sleep 1 in this case) and combines standard output and error before nicely formatting it, and sending it to both the terminal and a log file for later analysis.
A sample run with five iterations:
pax> testprog.sh
=============================
Number 1: 2010-09-16-13:44:19
1
real 0m1.063s
user 0m0.077s
sys 0m0.015s
=============================
Number 2: 2010-09-16-13:44:20
2
real 0m1.056s
user 0m0.030s
sys 0m0.046s
=============================
Number 3: 2010-09-16-13:44:21
3
real 0m1.057s
user 0m0.046s
sys 0m0.030s
=============================
Number 4: 2010-09-16-13:44:22
4
real 0m1.057s
user 0m0.061s
sys 0m0.031s
=============================
Number 5: 2010-09-16-13:44:23
5
real 0m1.057s
user 0m0.046s
sys 0m0.015s
You can try with:
for i in {1..100}; do time some_script.sh; done 2>&1 | grep ^real | sed -e s/.*m// | awk '{sum += $1} END {print sum / NR}'
Explanation
The for loop runs some_script.sh 100 times, measuring its execution time with time
The stderr of the for loop is redirected to stdout; this is to capture the output of time so we can grep it
grep ^real is to get only the lines starting with "real" in the output of time
sed is to delete the beginning of the line up to the minutes part (in the output of time)
For each line, awk adds to the sum, so that in the end it can output the average, which is the total sum, divided by the number of input records (= NR)
Limitations
The snippet assumes that the running time of some_script.sh is less than 1 minute; otherwise it won't work at all. Depending on your system, the time builtin might work differently. Another alternative is to use the time command /usr/bin/time instead of the bash builtin, as sketched below.
Note: This script was extracted from here.
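A sketch of that alternative (assuming GNU time at /usr/bin/time, whose %e format prints elapsed wall-clock seconds, so runs longer than a minute are handled too):
# Average elapsed seconds over 100 runs. Anything some_script.sh itself
# prints to stderr would also reach awk, so silence it there if needed.
for i in {1..100}; do
    /usr/bin/time -f '%e' some_script.sh 2>&1 >/dev/null
done | awk '{sum += $1} END {print sum / NR}'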
This script is based on the answer from @marian0, but there's no limitation on the running time. Name it timeit.sh and then do ./timeit.sh 10 ./script to run ./script 10 times and print the average time. Additional arguments can be added as desired.
#!/bin/bash
for i in `seq 1 $1`; do
    time "${@:2}"
done 2>&1 |\
    grep ^real |\
    sed -r -e "s/.*real\t([[:digit:]]+)m([[:digit:]]+\.[[:digit:]]+)s/\1 \2/" |\
    awk '{sum += $1 * 60 + $2} END {print sum / NR}'

I want to make a simple bash script that will run one of my executables over 100 times

I have a simple C program that measures the time it takes to analyze input files. It prints out the time taken in a sentence to stdout. Is there a way to have a bash script run this program with a particular input many times, pull the run time from each iteration's stdout, and average that time over all iterations?
So I would run the original C program like so:
./test file1 file2 out.out
And after running, it would print
"Elapsed time is xx.xx seconds" to stdout.
In particular, how would I write a shell script that would run test 100 times, on the same input files, and just average out the elapsed time for all runs?
Thank you, sorry for not clarifying
Like this maybe:
#!/bin/bash
start=$SECONDS
for i in {0..99}; do
    ./program file1 file2
done
end=$SECONDS
echo "scale=2;($end-$start)/100" | bc
I am just using $SECONDS (which bash counts up for you anyway) to find the total elapsed time without the need to run awk. Then at the end, I let bc calculate the floating point average time in seconds to 2 decimal places (scale=2).
Try
for i in {0..99}; do
    /usr/bin/time ./test file1 file2 out.out
done 2>&1 >/dev/null | awk '
    {split($3, ct, "[:a-zA-Z]"); summ += ct[1]; sums += ct[2]}
    END {print summ/100.0, sums/100.0}
'
The order of the redirection is important; you want stderr (the output of the time command) to be piped to awk when stdout goes to null. If you don't have any stdout, you can delete the >/dev/null. You can't use the bash time built-in, since it will time the entire pipe.
You can process the minute and second sums as you want, either to get a total elapsed time, an average time, etc.
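For example, a variant of the same pipeline (a sketch, assuming GNU time's default output, where field 3 looks like 0:01.23elapsed) that reports a single average elapsed time in seconds:
for i in {0..99}; do
    /usr/bin/time ./test file1 file2 out.out
done 2>&1 >/dev/null | awk '
    # fold minutes and seconds into one total, then average over the 100 runs
    /elapsed/ {split($3, ct, "[:a-zA-Z]"); total += ct[1] * 60 + ct[2]}
    END {printf "average: %.2f seconds\n", total / 100.0}
'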

Parse time cmdline output using awk cmdline

I have a program that I am timing, let's say some_bin. I run the time command and it produces output like so:
time -p some_bin --some-args=args
real 1.09
user 1.08
sys 0.00
I want to get just the 1.09 real time the program was in use. I'm trying to use awk to pattern-match the first line on real and then extract the time.
Everything I've tried thus far, however, has failed to work. Any ideas on how I can accomplish this?
Both of these commands will provide the real time on its own:
/usr/bin/time -f "%e" MYCOMMAND
TIMEFORMAT='%2R'; time MYCOMMAND
TIMEFORMAT also allows you to change the precision of the time output.
Example output:
/usr/bin/time -f "%e" sleep 2
2.00

TIMEFORMAT='%2R'; time sleep 2
2.00
The time command's output comes after your command completes and is written to stderr. You can use it like this:
( time -p some_bin --some-args=args ) |& awk '/real/{print $2}'
You can also use:
( time -p some_bin --some-args=args ) 2>&1 | awk '/real/{print $2}'
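If you want each run's real time appended to a CSV instead (a sketch, reusing the some_bin placeholder from the question together with the TIMEFORMAT approach above):
TIMEFORMAT='%2R'
# The time keyword reports on the shell's stderr, so redirecting the group's
# stderr appends just the elapsed seconds to the file. The program's own
# stderr is silenced here so it does not end up in the CSV.
{ time some_bin --some-args=args 2>/dev/null ; } 2>> times.csv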

