How to use GNU time with a pipeline - bash

I want to measure the running time of an SQL query in PostgreSQL. Using Bash's built-in time, I can do the following:
$ time (echo "SELECT * FROM sometable" | psql)
I like GNU time, which provides more output formats. However, I don't know how to use it with a pipeline. For simplicity, I use ls | wc in the examples below:
$ /usr/bin/time -f "%es" (ls | wc)
-bash: syntax error near unexpected token `('
$ /usr/bin/time -f "%es" "ls | wc"
/usr/bin/time: cannot run ls | wc: No such file or directory
If I do not group the pipeline in any way, it does not complain:
$ /usr/bin/time -f "%es" ls | wc
0.00s
But apparently this only measures the first part of the pipeline, as the next example shows:
$ /usr/bin/time -f "%es" ls | sleep 20
0.00s
So the question is: what is the correct syntax for GNU time with a pipeline?

Call the shell from time:
/usr/bin/time -f "%es" bash -c "ls | wc"
Of course, this will include the shell start-up time as well; it shouldn't be too much, but if you're on a system that has a lightweight shell like dash (and it's sufficient to do what you need), then you could use that to minimize the start-up time overhead:
/usr/bin/time -f "%es" dash -c "ls | wc"
Another option would be to just time the command you are actually interested in, which is the psql command. time will pass its standard input to the program being executed, so you can run it on just one component of the pipeline:
echo "SELECT * FROM sometable" | /usr/bin/time -f "%es" psql

Create a script that calls your pipeline. Then
/usr/bin/time -f '%es' script.sh
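For example, a minimal sketch of what script.sh could contain, reusing the psql pipeline from the question:
#!/bin/sh
# script.sh - run the whole pipeline so GNU time measures it end to end
echo "SELECT * FROM sometable" | psql
Make it executable with chmod +x script.sh, then time it with /usr/bin/time -f '%es' ./script.sh.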

Related

How to find the number of instances of the current script running in bash?

I have the code below to find the number of instances of the current script that are running with the same arg1. But it looks like the script creates a subshell to execute this command, and that subshell also shows up in the output. What would be a better approach to count the running instances of the script?
$cat test.sh
#!/bin/bash
num_inst=`ps -ef | grep $0 | grep $1 | wc -l`
echo $num_inst
$ps aux | grep test.sh | grep arg1 | grep -v grep | wc -l
0
$./test.sh arg1 arg2
3
$
I am looking for a solution that matches all running instances of ./test.sh arg1 arg2, but not ones like ./test.sh arg10 arg20.
The reason this creates a subshell is that there's a pipeline inside the command substitution. If you run ps -ef alone in a command substitution, and then separately process the output from that, you can avoid this problem:
#!/bin/bash
all_processes=$(ps -ef)
num_inst=$(echo "$all_processes" | grep "$0" | grep -c "$1")
echo "$num_inst"
I also did a bit of cleanup on the script: double-quoted all variable references to avoid parsing surprises, used $() instead of backticks, and replaced grep ... | wc -l with grep -c.
You might also replace the echo "$all_processes" | ... with ... <<<"$all_processes" and maybe the two greps with a single grep -c "$0 $1":
...
num_inst=$(grep -c "$0 $1" <<<"$all_processes")
...
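Putting those pieces together, a sketch of the full script using the single-grep variant:
#!/bin/bash
# Capture the process list once, then count matching lines without
# spawning extra pipeline processes inside the command substitution.
all_processes=$(ps -ef)
num_inst=$(grep -c "$0 $1" <<<"$all_processes")
echo "$num_inst"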
Modify your script like this:
#!/bin/bash
ps -ef | grep $0 | wc -l
There is no need to store the value in a variable; the result is printed to standard output anyway.
Now why do you get 3?
When you run a command within backticks (by the way, you should use the syntax num_inst=$( COMMAND ) rather than backticks), a new subshell is created to run COMMAND, and its stdout text is then assigned to the variable. So if you remove the use of $(), you will get your expected value of 2.
To convince yourself of that, remove the | wc -l; you will see that num_inst contains 3 processes, not 2. The third one exists only for the duration of COMMAND.
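A hypothetical debug version of test.sh, just for inspection, that prints the matching lines instead of counting them:
#!/bin/bash
# Print the matching ps lines instead of counting them; the extra
# entry comes from the subshell created for the command substitution.
matches=$(ps -ef | grep "$0" | grep "$1")
echo "$matches"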

Watch with Process Substitution

I often run the command
squeue -u $USER | tee >(wc -l)
where squeue is a Slurm command that shows the jobs you are running. This gives me both the output from squeue and automatically tells me how many lines are in it.
How can I watch this command?
watch -n.1 "squeue -u $USER | tee >(wc -l)" results in
Every 0.1s: squeue -u randoms | tee >(wc -l) Wed May 9 14:46:36 2018
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `squeue -u randoms | tee >(wc -l)'
From the watch man page:
Note that command is given to "sh -c" which means that you may need to use extra quoting to get the desired effect.
sh -c also does not support process substitution, the >(...) syntax you're using here.
Fortunately, that syntax isn't actually needed for what you're doing:
watch -n.1 'out=$(squeue -u "$USER"); echo "$out"; { echo "$out" | wc -l; }'
...or, if you really want to use your original code even at a heavy performance penalty (starting not just one but two new shells every tenth of a second -- first sh, and then bash):
bash_cmd() { squeue -u "$USER" | tee >(wc -l); } # create a function
export -f bash_cmd # export function to the environment
watch -n.1 'bash -c bash_cmd' # call function from bash started from sh started by watch
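Another option in the same spirit as the "put the pipeline in a script" answer above: keep the one-liner in a small helper script (the file name here is made up), so the sh that watch spawns only has to start that script:
#!/bin/bash
# watch-squeue.sh - hypothetical wrapper so the process substitution
# runs inside bash rather than inside the sh spawned by watch
squeue -u "$USER" | tee >(wc -l)
Then, after chmod +x watch-squeue.sh:
watch -n.1 ./watch-squeue.sh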

xargs output buffering -P parallel

I have a bash function that I call in parallel using xargs -P, like so:
echo ${list} | xargs -n 1 -P 24 -I# bash -l -c 'myAwesomeShellFunction #'
Everything works fine, but the output is messed up for obvious reasons (no buffering).
I'm trying to figure out a way to buffer the output effectively. I was thinking I could use awk, but I'm not good enough to write such a script and I can't find anything worthwhile on Google. Can someone help me write this "output buffer" in sed or awk? Nothing fancy, just accumulate the output and spit it out after the process terminates. I don't care about the order in which the shell functions execute; I just need their output buffered... Something like:
echo ${list} | xargs -n 1 -P 24 -I# bash -l -c 'myAwesomeShellFunction # | sed -u ""'
P.S. I tried to use stdbuf as per
https://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe but it did not work; I specified buffering for stdout (-o) and stderr (-e), but the output was still unbuffered:
echo ${list} | xargs -n 1 -P 24 -I# stdbuf -i0 -oL -eL bash -l -c 'myAwesomeShellFunction #'
Here's my first attempt; it only captures the first line of output:
$ bash -c "echo stuff;sleep 3; echo more stuff" | awk '{while (( getline line) > 0 )print "got ",$line;}'
$ got stuff
This isn't quite atomic if your output is longer than a page (4kb typically), but for most cases it'll do:
xargs -P 24 bash -c 'for arg; do printf "%s\n" "$(myAwesomeShellFunction "$arg")"; done' _
The magic here is the command substitution: $(...) creates a subshell (a fork()ed-off copy of your shell), runs the code ... in it, and then reads that in to be substituted into the relevant position in the outer script.
Note that we don't need -n 1, since we iterate over however many arguments each of your 24 parallel bash instances is passed (though if you're dealing with a small number of arguments, -n 1 may improve parallelization).
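For reference, a sketch of how the original invocation might look with this approach; the export -f line is an assumption here (the question used bash -l to make the function available instead):
# make the function visible to the child bash processes (assumption;
# the original relied on bash -l loading it from the login environment)
export -f myAwesomeShellFunction
echo ${list} | xargs -P 24 bash -c '
  for arg; do
    printf "%s\n" "$(myAwesomeShellFunction "$arg")"
  done
' _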
If you want to make it truly atomic, you can do that with a lockfile:
# generate a lockfile, arrange for it to be deleted when this shell exits
lockfile=$(mktemp -t lock.XXXXXX); export lockfile
trap 'rm -f "$lockfile"' 0
xargs -P 24 bash -c '
  for arg; do
    {
      output=$(myAwesomeShellFunction "$arg")
      flock -x 99
      printf "%s\n" "$output"
    } 99>"$lockfile"
  done
' _

Catch output of several piped commands

Until today I have always been able to find answers to all my bash questions, but now I'm stuck. I am testing a 20 TB RAID6 configuration running on an LSI 9265.
I wrote a script that creates files from /dev/urandom, and I am writing a second one that calculates the md5 of all the files, with two add-ons:
One is to use the time command to measure the md5sum execution time.
The second is to use the pv command to show the progress of each md5sum command.
My command looks like this:
filename="2017-03-13_12-38-08"
/usr/bin/time -f "real read %E" pv $filename | md5sum | sed "s/-/$filename /"
This is an example terminal printout:
/usr/bin/time -f "real read %E" pv $i | md5sum | sed "s/-/$i/"
1GiB 0:00:01 [ 551MiB/s] [==================================================================================================>] 100%
real read 0:01.85
f561af8cc0927967c440fe2b39db894a 2017-03-13_12-38-08
And I want to log it to a file. All my attempts using 2>&1, tee, and brackets have failed. I know pv writes to stderr, but that hasn't helped me find a solution. I can only catch "f561af8cc0927967c440fe2b39db894a 2017-03-13_12-38-08_done",
which is not enough.
This is the solution:
(time pv -f $filename | md5sum | sed "s/-/$filename/") 2>&1 | tee output.log
or the equivalent without printing to the terminal, writing only to output.log:
(time pv -f $filename | md5sum | sed "s/-/$filename/") > output.log 2>&1
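If you'd rather keep the GNU time format string from your original command, the same grouping trick works there too; a sketch (note that /usr/bin/time here only times pv, just as in your original command):
{ /usr/bin/time -f "real read %E" pv -f "$filename" | md5sum | sed "s/-/$filename/"; } 2>&1 | tee output.log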

Writing a script that runs a command and times it

I'm trying to write a script that will run and time a given command and output the result to a file in .csv format.
So far, from looking at previous SO posts, I've found that sh -c "$index_of_command_arg" can be used to invoke that command.
I'm also familiar with time, and I know that people use /usr/bin/time for formatting, but I need the time given in total seconds (for example, 1.34516), and the only option I have found for formatting the real time is %E, which returns [hours:]minutes:seconds. Is there any way to format it the way I need?
The general idea of my script is:
# ----
# some input validation
# ----
rule=$1
command=$2
execution_time=/usr/bin/time -f "%total_seconds" sh -c "$command" #is this line possible?
echo "$rule,$execution_time" > output_file.csv
Can this be formatted the way I want? Also, regarding the line with the comment after it: will it even work the way I wrote it? Is the syntax correct?
Say I use the normal time and I get the real 0m2.003 ... output; how can I extract the 2.003 from it?
The normal time you are mentioning is the bash built-in time. From the Bash Reference Manual:
The TIMEFORMAT variable may be set to a format string
that specifies how the timing information should be displayed.
…
%[p][l]R
The elapsed time in seconds.
So, you could use
execution_time=`TIMEFORMAT=%R bash -c "time $command" 2>&1 >/dev/tty`
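Dropped into your script, that might look like the sketch below; the command's own output is discarded here rather than sent to /dev/tty (an assumption about what you want), so only the timing lands in the variable:
#!/bin/bash
rule=$1
command=$2
# TIMEFORMAT=%R makes the bash builtin `time` print only the elapsed
# seconds; its report goes to stderr, which the outer 2>&1 captures.
execution_time=$(TIMEFORMAT=%R bash -c "time $command >/dev/null 2>&1" 2>&1)
echo "$rule,$execution_time" >> output_file.csv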
You replace:
execution_time=/usr/bin/time -f "%total_seconds" sh -c "$command"
with:
execution_time=`(time sh -c "$command > /dev/null 2>&1") 2>&1 | grep real |sed "s/.*m//;s/s.*//;"`
or with:
execution_time=`(time sh -c "$command > /dev/null 2>&1") 2>&1 | grep real | cut -c 8-12`
Why > /dev/null 2>&1 inside the quotes: you don't want to see the output of $command (its stderr is redirected to stdout, and both are discarded).
Why the outer 2>&1 for the time command: time writes its report to stderr, so it has to be redirected to stdout to reach the pipe.
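As an aside, if GNU time is installed (it's the /usr/bin/time used at the top of this page), its %e specifier already prints elapsed real time in seconds, so no parsing is needed; a sketch:
rule=$1
command=$2
# GNU time's %e is elapsed wall-clock seconds; its report goes to stderr,
# and the command's own output is thrown away inside sh -c.
execution_time=$(/usr/bin/time -f "%e" sh -c "$command > /dev/null 2>&1" 2>&1)
echo "$rule,$execution_time" >> output_file.csv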
