Bash shell script for 100 iterations and log time taken - bash

How do I run 100 iterations using a bash shell script? I want to know how long it takes to execute one command (start and end time). I want to keep track of which iteration is currently running, and I want to log each iteration. I have one automated script I need to run and log.
for i in 1 2 3
do
command1
done
But I want to know how long it takes to complete one iteration - and write the information to a log file too!

You may use seq in iteration as well:
for i in `seq 1 100`; do ... done

for ((i = 1; i <= 100; i++)); do
echo "--- Iteration #$i: $(date) ---"
time command1
done 2>&1 | tee timing.log
There's a lot going on here. What's happening?
The for loop iterates from 1 to 100 using C-style syntax.
The $i in the echo printout prints the current iteration number.
$(date) inserts a timestamp into each printout.
The time command runs a command and prints how long it took to execute.
The output from everything inside of the loop is piped to tee, which saves a copy to timing.log.
The 2>&1 redirects stderr to stdout so that the log file will contain both regular output and error messages.
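If you want an explicit start and end timestamp for every iteration (as the question asks) rather than just time's summary, a minimal variation of the same loop could look like this (command1 again stands in for your own script):
for ((i = 1; i <= 100; i++)); do
    start=$(date '+%F %T')            # start time of this iteration
    command1
    end=$(date '+%F %T')              # end time of this iteration
    echo "Iteration $i: start=$start end=$end"
done 2>&1 | tee timing.log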

The following script shows one way to do it.
#!/usr/bin/bash
for i in {1..100} ; do
echo =============================
echo "Number $i: $(date +%Y-%m-%d-%H:%M:%S)"
( time ( echo $i ; sleep 1 ) ) 2>&1 | sed 's/^/ /'
done | tee timing.log
It uses the bash range feature to run 100 iterations of the loop, outputting the loop counter and date.
It then times your command (echo $i ; sleep 1 in this case) and combines standard output and error before nicely formatting it, and sending it to both the terminal and a log file for later analysis.
A sample run with five iterations:
pax> testprog.sh
=============================
Number 1: 2010-09-16-13:44:19
1
real 0m1.063s
user 0m0.077s
sys 0m0.015s
=============================
Number 2: 2010-09-16-13:44:20
2
real 0m1.056s
user 0m0.030s
sys 0m0.046s
=============================
Number 3: 2010-09-16-13:44:21
3
real 0m1.057s
user 0m0.046s
sys 0m0.030s
=============================
Number 4: 2010-09-16-13:44:22
4
real 0m1.057s
user 0m0.061s
sys 0m0.031s
=============================
Number 5: 2010-09-16-13:44:23
5
real 0m1.057s
user 0m0.046s
sys 0m0.015s

You can try with:
for i in {1..100}; do time some_script.sh; done 2>&1 | grep ^real | sed -e s/.*m// | awk '{sum += $1} END {print sum / NR}'
Explanation
The for loop runs some_script.sh 100 times, measuring its execution time with time
The stderr of the for loop is redirected to stdout; this is to capture the output of time so we can grep it
grep ^real is to get only the lines starting with "real" in the output of time
sed is to delete the beginning of the line up to the minutes part (in the output of time)
For each line, awk adds to the sum, so that in the end it can output the average, which is the total sum, divided by the number of input records (= NR)
Limitations
The snippet assumes that the running time of some_script.sh is less than one minute; otherwise the minutes part is stripped and the average is wrong. Depending on your system, the time builtin might format its output differently. Another alternative is to use the external time command, /usr/bin/time, instead of the bash builtin.
Note: This script was extracted from here.
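If the one-minute limitation matters, here is a rough sketch of the /usr/bin/time alternative mentioned above. GNU time's -f '%e' prints elapsed wall-clock seconds directly, so there is nothing to strip; the sketch assumes some_script.sh itself stays quiet on stderr, and -f is not available in BSD time:
for i in {1..100}; do
    /usr/bin/time -f '%e' some_script.sh > /dev/null    # the timing goes to stderr
done 2>&1 | awk '{sum += $1} END {printf "average: %.3f s\n", sum / NR}'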

This script is based on the answer from @marian0, but without the limitation on running time. Name it timeit.sh and then do ./timeit.sh 10 ./script to run ./script 10 times and print the average time. Additional arguments can be added as desired.
#!/bin/bash
for i in `seq 1 $1`; do
time "${#:2}"
done 2>&1 |\
grep ^real |\
sed -r -e "s/.*real\t([[:digit:]]+)m([[:digit:]]+\.[[:digit:]]+)s/\1 \2/" |\
awk '{sum += $1 * 60 + $2} END {print sum / NR}'

Related

Extract Bash output to a CSV File for Plotting

I am trying to measure the time it takes for a Kubernetes object to be deployed in a Kubernetes cluster by using the time utility. I am trying to do that several times, with a sleep between runs, to get values for multiple simulated deployments.
This is the script.
#!/bin/bash
function time_check {
i=$1
time kubectl apply -f deploy.yml --dry-run=client
}
for i in {1..3}
do
time_check $i &
sleep 2
done
This is the Output
deployment.apps/nginx-raw created (dry run)
real 0m0.421s
user 0m0.359s
sys 0m0.138s
deployment.apps/nginx-raw created (dry run)
real 0m0.359s
user 0m0.443s
sys 0m0.158s
deployment.apps/nginx-raw created (dry run)
real 0m0.138s
user 0m0.412s
sys 0m0.122s
deployment.apps/nginx-raw created (dry run)
real 1.483s
user 0m0.412s
sys 0m0.122s
deployment.apps/nginx-raw created (dry run)
real 1.456s
user 0m0.234s
sys 0m0.567s
deployment.apps/nginx-raw created (dry run)
real 2.345
user 0m0.435s
sys 0m0.123s
Goal
I want to pipe the output and take the first row of each iteration (real 0m0.421s), then take the number part (0m0.421s) and strip the 0m if the value is in seconds, or leave it as is if it is in minutes, like 1.483. Also strip the s at the end.
The final results should be output in a CSV file to be plotted. The expected output in CSV
real
0.421
0.359
0.138
1.483
1.456
2.345
Add-on
I will do this for another deployment and plot the two times data in a line graph to see the time it takes for each deployment
You are using the shell builtin command time. If you switch to Linux's time command you can control the output and get just the data you want.
$ /usr/bin/time -f '%e' sleep 1.5
1.50
see man time for more details
You can take the output and pipe it into grep -v deployment | tr '\n' ',', which will strip the dry-run lines and convert the remaining newlines into commas:
$ printf "1\njunk\n2\njunk\n3\n"
1
junk
2
junk
3
$ printf "1\njunk\n2\njunk\n3\n" | grep -v junk | tr '\n' ','
1,2,3, $
This is a quick and dirty way to slice the data. I'm sure there are other solutions as well.
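Putting those pieces together, a rough sketch of the question's loop writing straight to a one-column CSV (deploy.yml and the kubectl flags are taken from the question; it assumes GNU time and that kubectl prints nothing on stderr):
#!/bin/bash
echo "real" > real.csv                                   # CSV header
for i in {1..3}; do
    # GNU time prints elapsed seconds on stderr; kubectl's own output is discarded
    /usr/bin/time -f '%e' kubectl apply -f deploy.yml --dry-run=client > /dev/null 2>> real.csv
    sleep 2
done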
I just used a randomized sub-second sleep to get the output stream, but the principle should work.
$: for x in 0 1 2 3 4 5 6 7 8 9; do time sleep 0.$RANDOM; done 2>&1 |
> awk 'BEGIN{print "real"}/^real/{ print gensub(/^.*m([0-9.]+)s/,"\\1",1)}'
real
0.266
0.716
0.847
0.251
0.358
0.236
0.669
0.266
0.308
0.856
Explained a bit with inline comments -
$: for x in 0 1 2 3 4 5 6 7 8 9; do # this is just a dummy loop
time sleep 0.$RANDOM # create some output
done 2>&1 | # dup time output stderr -> stdout
> awk >my.csv 'BEGIN{print "real"} # the header
/^real/{ # match for the lines we want (ignore the rest)
print gensub(/^.*m([0-9.]+)s/,"\\1",1) # just print matched part
}'
separate issue
I want to pipe the output and take the first row of each iteration (real 0m0.421s), then take the number part (0m0.421s) and strip the 0m if the value is in seconds, or leave it as is if it is in minutes, like 1.483. Also strip the s at the end.
So what you are saying is that if it takes 1.5 seconds you want it to output 1.500, but if it takes 90.0 seconds (1.5 minutes) you also want it to output 1.500.
How will you tell which it is without either specifying units or standardizing units?
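One way to sidestep that ambiguity is to convert time's XmY.YYYs format into total seconds instead of dropping the minutes; here is a sketch using the same GNU awk as above (match() with a capture array is a gawk extension, like gensub):
for x in {1..10}; do time sleep 0.$RANDOM; done 2>&1 |
  awk 'BEGIN { print "real" }
       /^real/ { if (match($2, /^([0-9]+)m([0-9.]+)s$/, m))
                     printf "%.3f\n", m[1] * 60 + m[2] }' > my.csv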

tracking status/progress in gnu parallel

I've implemented parallel in one of our major scripts to perform data migrations between servers. Presently, the output is presented all at once (-u) in pretty colors, with periodic echoes of status from the function being executed depending on which sequence is being run (e.g. 5/20: $username: rsyncing homedir or 5/20: $username: restoring account). These are all echoed directly to the terminal running the script, and accumulate there. Depending on the length of time a command is running, however, output can end up well out of order, and long-running rsync commands can be lost in the shuffle. But I don't want to wait for long-running processes to finish in order to get the output of following processes.
In short, my issue is keeping track of which arguments are being processed and are still running.
What I would like to do is send parallel into the background with (parallel args command {#} {} ::: $userlist) & and then track progress of each of the running functions. My initial thought was to use ps and grep liberally along with tput to rewrite the screen every few seconds. I usually run three jobs in parallel, so I want to have a screen that shows, for instance:
1/20: user1: syncing homedir
current file: /home/user1/www/cache/file12589015.php
12/20: user12: syncing homedir
current file: /home/user12/mail/joe/mailfile
5/20: user5: collecting information
current file:
I can certainly get the above status output together no problem, but my current hangup is separating the output from the individual parallel processes into three different... pipes? variables? files? so that it can be parsed into the above information.
Not sure if this is much better:
echo hello im starting now
sleep 1
# start parallel and send the job to the background
temp=$(mktemp -d)
parallel --rpl '{log} $_="Working on @arg"' -j3 background {} {#} ">$temp/{1log} 2>&1;rm $temp/{1log}" ::: foo bar baz foo bar baz one two three one two three :::+ 5 6 5 3 4 6 7 2 5 4 6 2 &
while kill -0 $! 2>/dev/null ; do
cd "$temp"
clear
tail -vn1 *
sleep 1
done
rm -rf "$temp"
It makes a logfile for each job, tails all logfiles every second, and removes a job's logfile when the job is done.
The logfiles are named 'Working on ...'.
I believe that this is close to what I need, though it isn't very tidy and probably isn't optimal:
#!/bin/bash
background() { #dummy load. $1 is text, $2 is number, $3 is position
echo $3: starting sleep...
sleep $2
echo $3: $1 slept for $2
}
progress() {
echo starting progress loop for pid $1...
while [ -d /proc/$1 ]; do
clear
tput cup 0 0
runningprocs=`ps faux | grep background | egrep -v '(parallel|grep)'`
numprocs=`echo "$runningprocs" | wc -l`
for each in `seq 1 ${numprocs}`; do
line=`echo "$runningprocs" | head -n${each} | tail -n1`
seq=`echo $line | rev | awk '{print $3}' | rev`
# print select elements from the ps output
echo working on `echo $line | rev | awk '{print $3, $4, $5}' | rev`
# print the last line of the log for that sequence number
cat logfile.log | grep ^$seq\: | tail -n1
echo
done
sleep 1
done
}
echo hello im starting now
sleep 1
export -f background
# start parallel and send the job to the background
parallel -u -j3 background {} {#} '>>' logfile.log ::: foo bar baz foo bar baz one two three one two three :::+ 5 6 5 3 4 6 7 2 5 4 6 2 &
pid=$!
progress $pid
echo finished!
I'd rather not depend on scraping all the information from ps and would prefer to get the actual line output of each parallel process, but a guy's gotta do what a guy's gotta do. Regular output is sent to a logfile for parsing later on.

How to run a short program repeatedly and then time it in command line [duplicate]

This question is an exact duplicate of the question at the top of this page; see the answers there.

Fastest way to print a single line in a file

I have to fetch one specific line out of a big file (1500000 lines), multiple times in a loop over multiple files, I was asking my self what would be the best option (in terms of performance).
There are many ways to do this; I mainly use these two:
cat ${file} | head -1
or
cat ${file} | sed -n '1p'
I could not find an answer to this: do they both fetch only the first line, or does one of the two (or both) first open the whole file and then fetch row 1?
Drop the useless use of cat and do:
$ sed -n '1{p;q}' file
This will quit the sed script after the line has been printed.
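The same pattern generalizes to any single line; for example, to print only line 5 and stop reading there:
sed -n '5{p;q}' file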
Benchmarking script:
#!/bin/bash
TIMEFORMAT='%3R'
n=25
heading=('head -1 file' 'sed -n 1p file' "sed -n '1{p;q}' file" 'read line < file && echo $line')
# files up to a hundred million lines (if you're on a slow machine, decrease!!)
for (( j=1; j<=100000000; j=j*10 ))
do
echo "Lines in file: $j"
# create file containing j lines
seq 1 $j > file
# initial read of file
cat file > /dev/null
for comm in {0..3}
do
avg=0
echo
echo ${heading[$comm]}
for (( i=1; i<=$n; i++ ))
do
case $comm in
0)
t=$( { time head -1 file > /dev/null; } 2>&1);;
1)
t=$( { time sed -n 1p file > /dev/null; } 2>&1);;
2)
t=$( { time sed -n '1{p;q}' file > /dev/null; } 2>&1);;
3)
t=$( { time read line < file && echo $line > /dev/null; } 2>&1);;
esac
avg=$avg+$t
done
echo "scale=3;($avg)/$n" | bc
done
done
Just save as benchmark.sh and run bash benchmark.sh.
Results:
head -1 file
.001
sed -n 1p file
.048
sed -n '1{p;q}' file
.002
read line < file && echo $line
0
Results from a file with 1,000,000 lines.
So the times for sed -n 1p will grow linearly with the length of the file but the timing for the other variations will be constant (and negligible) as they all quit after reading the first line:
Note: timings are different from original post due to being on a faster Linux box.
If you are really just getting the very first line and reading hundreds of files, then consider shell builtins instead of external commands; use read, which is a builtin in bash and ksh. This eliminates the overhead of process creation with awk, sed, head, etc.
The other issue is doing timed performance analysis on I/O. The first time you open and then read a file, the file data is probably not cached in memory. However, if you try a second command on the same file, the data as well as the inode have been cached, so the timed results may be faster, pretty much regardless of the command you use. Plus, inodes can stay cached practically forever (they do on Solaris, for example, or at least for several days).
For example, linux caches everything and the kitchen sink, which is a good performance attribute. But it makes benchmarking problematic if you are not aware of the issue.
All of this caching effect "interference" is both OS and hardware dependent.
So: pick one file and read it with a command. Now it is cached. Run the same test command several dozen times; this samples the effect of the command and child-process creation, not your I/O hardware.
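In script form, that warm-then-sample approach might look roughly like this (bigfile stands in for your own test file):
cat bigfile > /dev/null               # first read: warms the page cache
time for i in {1..100}; do            # now sample the command itself, not the disk
    sed -n '1{p;q}' bigfile > /dev/null
done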
This is sed vs. read for 10 iterations of getting the first line of the same file, after reading the file once:
sed: sed '1{p;q}' uopgenl20121216.lis
real 0m0.917s
user 0m0.258s
sys 0m0.492s
read: read foo < uopgenl20121216.lis ; export foo; echo "$foo"
real 0m0.017s
user 0m0.000s
sys 0m0.015s
This is clearly contrived, but does show the difference between builtin performance vs using a command.
If you want to print only 1 line (say the 20th one) from a large file you could also do:
head -20 filename | tail -1
I did a "basic" test with bash and it seems to perform better than the sed -n '1{p;q} solution above.
The test takes a large file and prints a line from somewhere in the middle (line 10000000), repeated 100 times, each time selecting the next line. So it selects lines 10000000, 10000001, 10000002, ... and so on up to 10000099.
$wc -l english
36374448 english
$time for i in {0..99}; do j=$((i+10000000)); sed -n $j'{p;q}' english >/dev/null; done;
real 1m27.207s
user 1m20.712s
sys 0m6.284s
vs.
$time for i in {0..99}; do j=$((i+10000000)); head -$j english | tail -1 >/dev/null; done;
real 1m3.796s
user 0m59.356s
sys 0m32.376s
For printing a line out of multiple files
$wc -l english*
36374448 english
17797377 english.1024MB
3461885 english.200MB
57633710 total
$time for i in english*; do sed -n '10000000{p;q}' $i >/dev/null; done;
real 0m2.059s
user 0m1.904s
sys 0m0.144s
$time for i in english*; do head -10000000 $i | tail -1 >/dev/null; done;
real 0m1.535s
user 0m1.420s
sys 0m0.788s
How about avoiding pipes?
Both sed and head accept the filename as an argument, so you avoid going through cat. I didn't measure it, but head should be faster on larger files since it stops the computation after N lines (whereas sed goes through all of them, even if it doesn't print them - unless you specify the quit option as suggested above).
Examples:
sed -n '1{p;q}' /path/to/file
head -n 1 /path/to/file
Again, I didn't test the efficiency.
I have done extensive testing, and found that, if you want every line of a file:
while IFS=$'\n' read LINE; do
echo "$LINE"
done < your_input.txt
Is much, much faster than any other (Bash-based) method out there. All other methods (like sed) read the file each time, at least up to the matching line. If the file is 4 lines long, you will get: 1 -> 1,2 -> 1,2,3 -> 1,2,3,4 = 10 reads, whereas the while loop just maintains a position cursor (based on IFS) and would only do 4 reads in total.
On a file with ~15k lines, the difference is phenomenal: ~25-28 seconds (sed-based, extracting a specific line each time) versus ~0-1 seconds (while...read-based, reading through the file once).
The above example also shows how to set IFS in a better way to newline (with thanks to Peter from the comments below), and this will hopefully fix some of the other issues sometimes seen when using while ... read ... in Bash.
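Since the question is about fetching one specific line, the same while/read idea can stop as soon as that line is reached; here is a sketch (line number in $1, file name in $2):
#!/bin/bash
# print only line $1 of file $2 with the read builtin, stopping at that line
n=0
while IFS=$'\n' read -r LINE; do
    n=$((n + 1))
    if [ "$n" -eq "$1" ]; then
        printf '%s\n' "$LINE"
        break
    fi
done < "$2"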
For the sake of completeness you can also use the basic linux command cut:
cut -d $'\n' -f <linenumber> <filename>

How can I run a shell script every 5 seconds for 2 minutes?

I have seen many solutions to half my problem - running the script every 5 seconds.
In addition to this, I also want it to run for only 2 minutes.
The point of the script is to sample the RSSI at a certain position for a period of time:
#!/bin/bash
RSSI_CSV=$1
DISTANCE=$2
RSSI=$(iwconfig wlan0 | awk -F'[ =]+' '/Signal level/ {print $7}')
printf "$DISTANCE,$RSSI\n" >> $RSSI_CSV
At the command line it is called with:
sh rssi_script.sh output.csv position
What would be the most robust solution to solve my problem?
Possibilities I have considered:
repeat the script 40 times within itself (measure the RSSI 40 times and output it to the CSV 40 times; the position will be the same throughout the experiment). This would also solve the problem of limiting the run to 2 minutes, but I might add some new command-line arguments, which would be difficult to keep track of if I have to change 40 copies every time
use watch to sample every 5 seconds and cron to limit it to 2 minutes (not 100% sure cron can do this)
A while loop for 2 minutes (not sure how to write a while loop like this?) with a 5-second sleep at the end of each loop
use cron to start the shell script as 40 processes of the script, delaying each iteration by 5 seconds. I am not sure how command-line arguments would be passed across this, and as I said above I might add some new command-line arguments, so that could cause issues (extra work)
Let me know if there is any info I have missed or would help and thanks for any help anyone can give.
#!/bin/bash
[ "$3" = "0" ] && { exit; }
RSSI_CSV=$1
DISTANCE=$2
RSSI=$(iwconfig wlan0 | awk -F'[ =]+' '/Signal level/ {print $7}')
printf "$DISTANCE,$RSSI\n" >> $RSSI_CSV
sleep 5
N=$3
$0 $1 $2 $((N-1))
Run like this:
sh rssi_script.sh output.csv position 24
Inspired by PeterMmm's input I managed to get it to work, and in fact it works better for me to run N samples with a hard-coded rest between them:
#!/bin/bash
RSSI_CSV=$1
DISTANCE=$2
N=$3
while [ $N -ne 0 ]
do
RSSI=$(iwconfig wlan0 | awk -F'[ =]+' '/Signal level/ {print $7}')
printf "$DISTANCE,$RSSI\n" >> $RSSI_CSV
sleep 2
((N--))
echo $N
done
exit
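For completeness, the time-bounded variant described in the question (sample every 5 seconds for 2 minutes) could be sketched with bash's SECONDS counter; run it with bash rather than sh so SECONDS is available:
#!/bin/bash
RSSI_CSV=$1
DISTANCE=$2
end=$((SECONDS + 120))                 # stop after 2 minutes
while [ "$SECONDS" -lt "$end" ]; do
    RSSI=$(iwconfig wlan0 | awk -F'[ =]+' '/Signal level/ {print $7}')
    printf '%s,%s\n' "$DISTANCE" "$RSSI" >> "$RSSI_CSV"
    sleep 5                            # sample every 5 seconds
done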
