How to time a bunch of things in a bash script - bash

I want to write a script that runs a script/service (curls from a service) with different parameters. Now, I want to time each query and store the result in a file. How can I do this?
#!/bin/bash
input="/home/ubuntu/flowers"
while IFS= read -r line
do
    time myservice "get?flower=$line"
done < "$input"
I also tried:
cat flowers | xargs -I {} time myservice "get?flower={}" | jq -c '.[] | {flower}'
My output looks something like
/usr/local/lib/python2.7/dist-packages/gevent/builtins.py:96: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
result = _import(*args, **kwargs)
{"flower":"daffodil"}
{"flower":"daffodil"}
{"flower":"daffodil"}
{"flower":"daffodil"}
0.47user 0.07system 0:10.49elapsed 5%CPU (0avgtext+0avgdata 65432maxresident)k
or contains stuff like
Myservice 10.#.#.#:7092 returned bad json for get?flower=lilly
Myservice 10.#.#.#:7092 returned bad json for get?flower=lilly
Myservice 10.#.#.#:7092 returned bad json for get?flower=lilly
Failed to connect (or too slow) on 10.#.#.#2:7092 timed out
Timeout Error: ('10.#.#.#', 7092)
Failed to connect (or too slow) on 10.#.#.#:7092
Timeout Error: ('10.#.#.#', 7092)
Failed to connect (or too slow) on 10.#.#.#:7092
Timeout Error: ('10.#.#.#', 7092)
which I would like to skip.
I know I can do a clean up later if there isn't a simple way to do this.
I want a file that is something like
lilly 0.91
hibiscus 0.93
Where the number is the time on the userside.

If all you are looking for is the amount of time each query takes, and you don't care about the output from myservice, then you can redirect it to /dev/null and ignore it.
Measuring the time is a little trickier. You cannot redirect the output of the time command separately from the output of the command it runs, so it is better to use another approach. Bash has an internal variable, SECONDS, that can be used to measure elapsed time, but you probably want more granularity than whole seconds, so use the date command instead.
You will also need to use bc (or similar) to do floating point arithmetic.
Also, if the myservice command handles failures correctly (i.e. returns a non-zero value upon failure), then you can also handle failures cleanly.
#!/bin/bash
input_file="/home/ubuntu/flowers"
while IFS= read -r line; do
    start_time=$(date +%s.%N)
    myservice "get?flower=${line}" > /dev/null 2>&1
    return_value=$?
    end_time=$(date +%s.%N)
    # divide by 1 so scale=3 takes effect (scale only applies to division in bc)
    elapsed_time=$(echo "scale=3; (${end_time} - ${start_time}) / 1" | bc)
    if [ ${return_value} -eq 0 ]; then
        echo "${line}: ${elapsed_time}"
    else
        echo "${line}: Failed"
    fi
done < "${input_file}"
The %s.%N format string in the date commands means:
%s Seconds since 1 Jan 1970
. The '.' character
%N Nanoseconds
The scale=3 setting is meant to limit bc to 3 decimal places; note that in bc, scale only applies to division, so a plain subtraction keeps the full nanosecond precision unless you divide the result (e.g. by 1).
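To produce the two-column file the question asks for (flower name and seconds), the loop's output can be redirected once at the `done`. A minimal runnable sketch: `sleep` is a stand-in for the real `myservice` call, and the file paths are arbitrary placeholders.

```shell
#!/bin/bash
# Sketch: produce a "name seconds" file.
# `sleep` stands in for the real `myservice` call.
input_file=/tmp/flowers.txt
printf 'lilly\nhibiscus\n' > "${input_file}"

while IFS= read -r line; do
    start_time=$(date +%s.%N)
    sleep 0.1   # placeholder for: myservice "get?flower=${line}" > /dev/null 2>&1
    end_time=$(date +%s.%N)
    # dividing by 1 makes bc honour scale=3 (scale only affects division)
    echo "${line} $(echo "scale=3; (${end_time} - ${start_time}) / 1" | bc)"
done < "${input_file}" > /tmp/timings.txt

cat /tmp/timings.txt   # two lines like "lilly .103"
```

Redirecting at the `done` keeps the file open once for the whole loop instead of reopening it per iteration.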

Related

Use curl and a loop in a shell script to go through different jq values

I am using a curl command to get json data from an application called "Jira".
Stupidly (in my view), you cannot use the api to return more than 50 values at a time.
The only choice is to do it in multiple commands and they call this "pagination". It is not possible to get more than 50 results, no matter the command.
This is the command here:
curl -i -X GET 'https://account_name.atlassian.net/rest/api/3/project/search?jql=ORDER%20BY%20Created&maxResults=50&startAt=100' --user 'scouse_bob#mycompany.com:<sec_token_deets>'
This is the key piece of what I am trying to work into a loop to avoid having to do this manually each time:
startAt=100
My goal is to "somehow" have this loop in blocks of fifty, so, startAt=50 then startAt=100, startAt=150 etc and append the entire output to a file until the figure 650 is reached and / or there is no further output available.
I have played around with a command like this:
#!/bin/ksh
i=1
while [[ $i -lt 1000 ]] ; do
curl -i -X GET 'https://account_name.atlassian.net/rest/api/3/project/search?jql=ORDER%20BY%20Created&maxResults=50&startAt=100' --user 'scouse_bob#mycompany.com:<sec_token_deets>'
echo "$i"
(( i += 1 ))
done
Which does not really get me far as although it will loop, I am uncertain as to how to apply the variable.
Help appreciated.
My goal is to "somehow" have this loop in blocks of fifty, so, startAt=50 then startAt=100, startAt=150 etc and append the entire output to a file until the figure 650 is reached and / or there is no further output available.
The former is easy:
i=0
while [[ $i -lt 650 ]]; do
# if you meant until 650 inclusive, change to -le 650 or -lt 700
curl "https://host/path?blah&startAt=$i"
# pipe to/through some processing if desired
# note URL is in " so $i is expanded but
# other special chars like & don't screw up parsing
# also -X GET is the default (without -d or similar) and can be omitted
(( i+=50 ))
done
The latter depends on just what 'no further output available' looks like. You probably don't get an HTTP error, but either a content type indicating an error, or JSON containing an end/error indication or a no-data indication. How to recognize this depends on what you get, and I don't know this API. I'd guess you want something more or less like:
curl ... >tmpfile
if jq -e '.eof==true' tmpfile; then break; else cat/whatever tmpfile; fi
# or
if jq -e '.data|length==0' tmpfile; then break; else cat/whatever tmpfile; fi
where tmpfile is some suitable filename that won't conflict with your other files; the most general way is to use $(mktemp) (saved in a variable). Or instead of a file put the data in a variable var=$(curl ...) and then use <<<$var as input to anything that reads stdin.
EDIT: I meant to make this CW to make it easier for anyone to add/fix the API specifics, but forgot; instead I encourage anyone who knows to edit.
You may want to stop when you get partial output i.e. if you ask for 50 and get 37, it may mean there is no more after those 37 and you don't need to try the next batch. Again this depends on the API which I don't know.
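Putting the counter and the stop condition together, a sketch of the full loop: the endpoint, the `.data` field, and the jq stop test are assumptions (adjust them to the real Jira response shape), and a fake `fetch` function stands in for curl so the loop logic can be seen end to end.

```shell
#!/bin/bash
# Sketch: paginate in blocks of 50 until 650 or an empty page.
fetch() {  # stand-in for: curl -s "https://host/path?blah&startAt=$1"
    if [ "$1" -lt 150 ]; then echo '{"data":[1,2,3]}'; else echo '{"data":[]}'; fi
}

out=/tmp/pages.json
: > "$out"
i=0
while [ "$i" -lt 650 ]; do
    page=$(fetch "$i")
    # stop when the page is empty (adjust the jq test to the real API)
    if jq -e '.data | length == 0' <<<"$page" > /dev/null; then
        break
    fi
    printf '%s\n' "$page" >> "$out"
    i=$((i + 50))
done
wc -l < "$out"   # pages collected before the first empty one
```

With the fake `fetch` above, pages at startAt 0, 50 and 100 are kept and the empty page at 150 ends the loop.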

Bash: getting keyboard input with timeout

I have a script that aims to find out which key is pressed. The problem is that it doesn't react quickly, and sometimes not at all, because each pass has to wait on the 0.5-second timeout before the input gets processed.
#!/bin/bash
sudo echo Start
while true
do
file_content=$(sudo timeout 0.5s cat /dev/input/event12 | hexdump)
content_split=$(echo $file_content | tr " " "\n")
word_counter=0
for option in $content_split
do
word_counter=$((word_counter+1))
if [ $word_counter -eq 25 ]
then
case $option in
"0039")echo "<space>";;
"001c")echo "<return>";;
"001e")echo "a";;
"0030")echo "b";;
"002e")echo "c";;
"0020")echo "d";;
"0012")echo "e";;
"0021")echo "f";;
"0022")echo "g";;
"0023")echo "h";;
"0017")echo "i";;
"0024")echo "j";;
"0025")echo "k";;
"0026")echo "l";;
"0032")echo "m";;
"0031")echo "n";;
"0018")echo "o";;
"0019")echo "p";;
"0010")echo "q";;
"0013")echo "r";;
"001f")echo "s";;
"0014")echo "t";;
"0016")echo "u";;
"002f")echo "v";;
"0011")echo "w";;
"002d")echo "x";;
"002c")echo "y";;
"0015")echo "z";;
esac
fi
done
done
Do not run cat under a timeout in a loop - that is simply the wrong way to approach the problem. No matter how "fast" your program runs, it will always miss some events that way. The polling approach as a whole is invalid here.
Parsing and the Linux philosophy are built around streams that transfer bytes. The always-available streams are stdin, stdout and stderr; they let you pass data from one context to another. The shell's most common operator, |, binds the output of one program to the input of another - and that is the way™ you should work in shell. The shell's primary use is to "connect" programs together.
So you could do:
# read from the input
sudo cat /dev/input/mouse1 |
# transform the input to hex data one byte at a time so
# we can parse it in **shell**
xxd -c1 -p |
# read one byte and parse it in **shell**
while IFS= read -r line; do
    : parse line
done
but shell is very slow, and while read is very slow. If you want speed (and input events come fast), do not use shell - use a proper programming language: python, perl, ruby, or at least awk; these are common scripting languages. The case $option in construct looks like a mapping from hex values to output strings. I could see:
# save one `cat` process by calling xxd directly
sudo xxd -c1 -p /dev/input/mouse1 |
awk 'BEGIN {
    map["1c"] = "<return>"
    map["1e"] = "a"
    # etc. add all other cases to the mapping
}
$0 in map { print map[$0] }
'
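Following the advice to move to a real language, the per-event parsing can be sketched in Python by unpacking the kernel's input_event records directly. The struct layout below assumes 64-bit Linux, the key codes match the shell `case` above (0x1e = "a", etc.), and a synthetic buffer stands in for the device so the decoding logic can be exercised without root.

```python
import struct

# struct input_event on 64-bit Linux: two longs for the timestamp,
# then unsigned short type, unsigned short code, unsigned int value
EVENT_FORMAT = "llHHI"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

# key-code -> name mapping, same idea as the shell `case`; extend as needed
KEY_MAP = {28: "<return>", 57: "<space>", 30: "a", 48: "b", 46: "c"}

def decode_events(buf):
    """Yield names for key-press events (type 1 = EV_KEY, value 1 = press)."""
    for off in range(0, len(buf) - EVENT_SIZE + 1, EVENT_SIZE):
        _sec, _usec, etype, code, value = struct.unpack_from(EVENT_FORMAT, buf, off)
        if etype == 1 and value == 1 and code in KEY_MAP:
            yield KEY_MAP[code]

if __name__ == "__main__":
    # Real use (needs root) would stream from the device instead:
    #   with open("/dev/input/event12", "rb") as dev:
    #       while True: print(*decode_events(dev.read(EVENT_SIZE)))
    sample = struct.pack(EVENT_FORMAT, 0, 0, 1, 30, 1)  # synthetic "a" press
    print(list(decode_events(sample)))
```

Unlike the polling loop, a blocking read on the device never drops events; the kernel queues them until they are consumed.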

Parse SLURM job wall time to bash variables

With SLURM, I run the command
squeue -u enter_username
and I get an table output with the following heading
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
I'm trying to capture the duration of time the job has been running for. I couldn't find an environmental variable provided by SLURM to capture this time, so I think I'm left parsing the output of squeue. This is not as easy as I thought it would be, because the wall clock does not have a fixed format. In other words, it doesn't always show dd-HH:MM:SS. If there are no days, then the output is just HH:MM:SS, and if there are no hours the output is MM:SS, etc.
I'm doing this with bash and I need to capture the day (dd) and hour (HH) and assign each of them to a variable. I'm having a hard time doing this when the format is dynamic.
To capture the time entry, I'm simply doing the following (within a SLURM bash script)
time_str=$(squeue -u enter_username | grep "$SLURM_JOB_ID" | cut -d "R" -f2- | gawk '{print $1}')
As I said before, time_str does not have a fixed format. Hoping someone with experience can help.
From reading the man page of the squeue command, it seems that you can simplify the problem by having squeue only output the information you need:
squeue -h -j ${jobid} -O timeused
Then your task is simply to parse that output, which can be done as follows:
#!/bin/bash
line="-$(squeue -h -j ${jobid} -O timeused)" # Leading '-' aids parsing.
parts=( 0 0 0 0 )
index=3
while [ ${#line} -gt 0 ]; do
    parts[${index}]=${line##*[-:]}
    line=${line%[-:]*}
    ((index--))
done
Now the array ${parts[*]} contains exactly 4 elements, 0 to 3, representing days, hours, minutes and seconds respectively.
The solution presented by @Andrew Vickers above works as expected. However, I took this one step further to enforce a fixed two-digit format:
index=0
while [ ${index} -lt 4 ]; do
    if [ ${#parts[${index}]} -lt 2 ]; then
        parts[${index}]="0${parts[${index}]}"
    fi
    ((index++))
done
The conditional check could be incorporated into his answer, but his loop would need to be adjusted to format all variables.
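If the goal is a single duration number rather than four fields, the parts array can be collapsed into total seconds. A sketch using a hard-coded example value (1 day, 02:03:04) in place of real squeue output; note the 10# base prefix, which keeps zero-padded values like 08 from being read as octal.

```shell
#!/bin/bash
# parts as produced by the parsing loop above: days hours minutes seconds
parts=( 1 02 03 04 )   # example stand-in for real squeue output
total=$(( 10#${parts[0]} * 86400 + 10#${parts[1]} * 3600 \
        + 10#${parts[2]} * 60 + 10#${parts[3]} ))
echo "${total}"   # 93784
```

Without the 10# prefix, bash would reject values such as 08 and 09 ("value too great for base").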

Performance profiling tools for shell scripts

I'm attempting to speed up a collection of scripts that invoke subshells and do all sorts of things. I was wondering if there are any tools available to time the execution of a shell script and its nested shells, and to report on which parts of the script are the most expensive.
For example, if I had a script like the following.
#!/bin/bash
echo "hello"
echo $(date)
echo "goodbye"
I would like to know how long each of the three lines took. time only gives me the total time for the script. bash -x is interesting, but does not include timestamps or other timing information.
You can set PS4 to show the time and line number. Doing this doesn't require installing any utilities and works without redirecting stderr to stdout.
For this script:
#!/bin/bash -x
# Note the -x flag above, it is required for this to work
PS4='+ $(date "+%s.%N ($LINENO) ")'
for i in {0..2}
do
echo $i
done
sleep 1
echo done
The output looks like:
+ PS4='+ $(date "+%s.%N ($LINENO) ")'
+ 1291311776.108610290 (3) for i in '{0..2}'
+ 1291311776.120680354 (5) echo 0
0
+ 1291311776.133917546 (3) for i in '{0..2}'
+ 1291311776.146386339 (5) echo 1
1
+ 1291311776.158646585 (3) for i in '{0..2}'
+ 1291311776.171003138 (5) echo 2
2
+ 1291311776.183450114 (7) sleep 1
+ 1291311777.203053652 (8) echo done
done
This assumes GNU date, but you can change the output specification to anything you like or whatever matches the version of date that you use.
Note: If you have an existing script that you want to do this with without modifying it, you can do this:
PS4='+ $(date "+%s.%N ($LINENO) ")' bash -x scriptname
In Bash 5, you can avoid forking date (but you get microseconds instead of nanoseconds):
PS4='+ $EPOCHREALTIME ($LINENO) '
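To turn such a trace into per-line costs, the timestamps of consecutive trace lines can be subtracted. A sketch that post-processes a saved trace; a canned trace file stands in here for real `bash -x` stderr output, and the path is arbitrary.

```shell
#!/bin/bash
# Post-process a PS4-timestamped trace into per-line durations.
# Assumes PS4='+ $(date "+%s.%N ($LINENO) ")' so field 2 of each
# trace line is a fractional epoch timestamp.
cat > /tmp/ps4_trace_demo.log <<'EOF'
+ 1291311776.108610290 (3) for i in '{0..2}'
+ 1291311776.120680354 (5) echo 0
+ 1291311776.183450114 (7) sleep 1
+ 1291311777.203053652 (8) echo done
EOF

# print the seconds each traced line took (delta to the next trace line)
awk '$1 == "+" {
    if (prev != "") printf "%.6f  %s\n", $2 - prev, line
    prev = $2; line = $0
}' /tmp/ps4_trace_demo.log
```

In this sample the `sleep 1` line stands out with a delta just over a second, which is exactly the kind of hotspot the question is after.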
You could pipe the output of running under -x through to something that timestamps each line when it is received. For example, tai64n from djb's daemontools.
As a basic example,
sh -x slow.sh 2>&1 | tai64n | tai64nlocal
This conflates stdout and stderr but it does give everything a timestamp.
You'd have to then analyze the output to find expensive lines and correlate that back to your source.
You might also conceivably find using strace helpful. For example,
strace -f -ttt -T -o /tmp/analysis.txt slow.sh
This will produce a very detailed report, with lots of timing information in /tmp/analysis.txt, but at a per-system call level, which might be too detailed.
Sounds like you want to time each echo. If echo is all you're doing, this is easy:
alias echo='time echo'
If you're running other commands, this obviously won't be sufficient.
Updated
See enable_profiler/disable_profiler in
https://github.com/vlovich/bashrc-wrangler/blob/master/bash.d/000-setup
which is what I use now. I haven't tested it on all versions of BASH, but if you have the ts utility installed it works very well with low overhead.
Old
My preferred approach is below. The reason is that it supports OSX as well (which doesn't have a high-precision date) and runs even if you don't have bc installed.
#!/bin/bash

_profiler_check_precision() {
    if [ -z "$PROFILE_HIGH_PRECISION" ]; then
        #debug "Precision of timer is unknown"
        if which bc > /dev/null 2>&1 && date '+%s.%N' | grep -vq '\.N$'; then
            PROFILE_HIGH_PRECISION=y
        else
            PROFILE_HIGH_PRECISION=n
        fi
    fi
}

_profiler_ts() {
    _profiler_check_precision
    if [ "y" = "$PROFILE_HIGH_PRECISION" ]; then
        date '+%s.%N'
    else
        date '+%s'
    fi
}

profile_mark() {
    _PROF_START="$(_profiler_ts)"
}

profile_elapsed() {
    _profiler_check_precision
    local NOW="$(_profiler_ts)"
    local ELAPSED=
    if [ "y" = "$PROFILE_HIGH_PRECISION" ]; then
        ELAPSED="$(echo "scale=10; $NOW - $_PROF_START" | bc | sed 's/\(\.[0-9]\{0,3\}\)[0-9]*$/\1/')"
    else
        ELAPSED=$((NOW - _PROF_START))
    fi
    echo "$ELAPSED"
}

do_something() {
    local _PROF_START
    profile_mark
    sleep 10
    echo "Took $(profile_elapsed) seconds"
}
Here's a simple method that works on almost every Unix and needs no special software:
enable shell tracing, e.g. with set -x
pipe the output of the script through logger:
sh -x ./slow_script 2>&1 | logger
This writes the output to syslog, which automatically adds a time stamp to every message. If you use Linux with journald, you can get high-precision time stamps using
journalctl -o short-monotonic _COMM=logger
Many traditional syslog daemons also offer high precision time stamps (milliseconds should be sufficient for shell scripts).
Here's an example from a script that I was just profiling in this manner:
[1940949.100362] bremer root[16404]: + zcat /boot/symvers-5.3.18-57-default.gz
[1940949.111138] bremer root[16404]: + '[' -e /var/tmp/weak-modules2.OmYvUn/symvers-5.3.18-57-default ']'
[1940949.111315] bremer root[16404]: + args=(-E $tmpdir/symvers-$krel)
[1940949.111484] bremer root[16404]: ++ /usr/sbin/depmod -b / -ae -E /var/tmp/weak-modules2.OmYvUn/symvers-5.3.18-57-default 5.3.18-57>
[1940952.455272] bremer root[16404]: + output=
[1940952.455738] bremer root[16404]: + status=0
where you can see that the "depmod" command is taking a lot of time.
Since I've ended up here at least twice now, I implemented a solution:
https://github.com/walles/shellprof
It runs your script, transparently clocks all lines printed, and at the end prints a top 10 list of the lines that were on screen the longest:
~/s/shellprof (master|✔) $ ./shellprof ./testcase.sh
quick
slow
quick
Timings for printed lines:
1.01s: slow
0.00s: <<<PROGRAM START>>>
0.00s: quick
0.00s: quick
~/s/shellprof (master|✔) $
I'm not aware of any shell profiling tools.
Historically one just rewrites too-slow shell scripts in Perl, Python, Ruby, or even C.
A less drastic idea would be to use a faster shell than bash. Dash and ash are available for all Unix-style systems and are typically quite a bit smaller and faster.

Custom format for time command

I'd like to use the time command in a bash script to calculate the elapsed time of the script and write that to a log file. I only need the real time, not user and sys. I also need it in a decent format, e.g. 00:00:00:00 (not like the standard output). I appreciate any advice.
The expected format is 00:00:00.0000 (milliseconds), i.e. [hours]:[minutes]:[seconds].[milliseconds]
I already have 3 scripts. I saw an example like this:
{ time {
    # section code goes here
    true   # placeholder command; a group may not be empty
} ; } 2> timing.log
But I only need the real time, not the user and sys. Also need it in a decent format. e.g 00:00:00:00 (not like the standard output).
In other words, I'd like to know how to turn the time output into something easier to process.
You could use the date command to get the current time before and after performing the work to be timed and calculate the difference like this:
#!/bin/bash
# Get time as a UNIX timestamp (seconds elapsed since Jan 1, 1970 0:00 UTC)
T="$(date +%s)"
# Do some work here
sleep 2
T="$(($(date +%s)-T))"
echo "Time in seconds: ${T}"
printf "Pretty format: %02d:%02d:%02d:%02d\n" "$((T/86400))" "$((T/3600%24))" "$((T/60%60))" "$((T%60))"
Notes:
$((...)) can be used for basic arithmetic in bash – caution: do not put spaces before a minus - as this might be interpreted as a command-line option.
See also: http://tldp.org/LDP/abs/html/arithexp.html
EDIT:
Additionally, you may want to take a look at sed to search and extract substrings from the output generated by time.
EDIT:
Example for timing with milliseconds (actually nanoseconds but truncated to milliseconds here). Your version of date has to support the %N format and bash should support large numbers.
# UNIX timestamp concatenated with nanoseconds
T="$(date +%s%N)"
# Do some work here
sleep 2
# Time interval in nanoseconds
T="$(($(date +%s%N)-T))"
# Seconds
S="$((T/1000000000))"
# Milliseconds
M="$((T/1000000))"
echo "Time in nanoseconds: ${T}"
printf "Pretty format: %02d:%02d:%02d:%02d.%03d\n" "$((S/86400))" "$((S/3600%24))" "$((S/60%60))" "$((S%60))" "${M}"
DISCLAIMER:
My original version said
M="$((T%1000000000/1000000))"
but this was edited out because it apparently did not work for some people whereas the new version reportedly did. I did not approve of this because I think that you have to use the remainder only but was outvoted.
Choose whatever fits you.
To use the Bash builtin time rather than /bin/time you can set this variable:
TIMEFORMAT='%3R'
which will output the real time that looks like this:
5.009
or
65.233
The number specifies the precision and can range from 0 to 3 (the default).
You can use:
TIMEFORMAT='%3lR'
to get output that looks like:
3m10.022s
The l (ell) gives a long format.
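Since the original goal is writing the timing to a log file: the time keyword reports on stderr, so redirecting a command group captures just the timing line while the command's own output stays where it was. A minimal sketch; the log path and the sleep are placeholders.

```shell
#!/bin/bash
# Capture only the `time` keyword's report in a log file.
TIMEFORMAT='%3R'
{ time sleep 1 ; } 2> /tmp/timing_demo.log
cat /tmp/timing_demo.log   # a single line like 1.003
```

The braces matter: `time sleep 1 2> file` would redirect sleep's stderr, not the timing report.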
From the man page for time:
There may be a shell built-in called time, avoid this by specifying /usr/bin/time
You can provide a format string and one of the format options is elapsed time - e.g. %E
/usr/bin/time -f'%E' $CMD
Example:
$ /usr/bin/time -f'%E' ls /tmp/mako/
res.py res.pyc
0:00.01
Use the bash built-in variable SECONDS. Each time you reference the variable it will return the elapsed time since the script invocation.
Example:
echo "Start $SECONDS"
sleep 10
echo "Middle $SECONDS"
sleep 10
echo "End $SECONDS"
Output:
Start 0
Middle 10
End 20
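A sketch of turning that counter into the HH:MM:SS shape the question asks for; the sleep stands in for the script's real work.

```shell
#!/bin/bash
# Render the builtin SECONDS counter as HH:MM:SS.
SECONDS=0
sleep 2   # stand-in for the real work
elapsed=$SECONDS
printf '%02d:%02d:%02d\n' \
    $((elapsed / 3600)) $((elapsed % 3600 / 60)) $((elapsed % 60))
```

Note that SECONDS only has whole-second resolution, so this suits the hours/minutes/seconds part of the requested format but not the milliseconds.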
Not quite sure what you are asking; have you tried:
{ time yourscript ; } 2>&1 | tail -n1 > log
Edit: ok, so you know how to get the times out and you just want to change the format. It would help if you described what format you want, but here are some things to try:
time -p script
This changes the output to one time per line, in seconds with decimals. Note that the time keyword writes its report to stderr, so to pipe it you must wrap it in a group and redirect. To get only the real time:
{ time -p script ; } 2>&1 | tail -n 3 | head -n 1
The accepted answer gives me this output
# bash date.sh
Time in seconds: 51
date.sh: line 12: unexpected EOF while looking for matching `"'
date.sh: line 21: syntax error: unexpected end of file
This is how I solved the issue
#!/bin/bash
date1=$(date --date 'now' +%s) #date since epoch in seconds at the start of script
somecommand
date2=$(date --date 'now' +%s) #date since epoch in seconds at the end of script
difference=$((date2 - date1)) # difference between the two timestamps
date3=$(echo "scale=2 ; $difference/3600" | bc) # difference/3600 = seconds in hours
echo SCRIPT TOOK $date3 HRS TO COMPLETE # 3rd variable for a pretty output.
