I've implemented parallel in one of our major scripts to perform data migrations between servers. Presently, the output is presented all at once (-u) in pretty colors, with periodic echoes of status from the function being executed, depending on which sequence is being run (e.g. 5/20: $username: rsyncing homedir or 5/20: $username: restoring account). These are all echoed directly to the terminal running the script, and accumulate there. Depending on how long a command runs, however, output can end up well out of order, and long-running rsync commands can get lost in the shuffle. But I don't want to wait for long-running processes to finish in order to get the output of the following processes.
In short, my issue is keeping track of which arguments are being processed and are still running.
What I would like to do is send parallel into the background with (parallel args command {#} {} ::: $userlist) & and then track progress of each of the running functions. My initial thought was to use ps and grep liberally along with tput to rewrite the screen every few seconds. I usually run three jobs in parallel, so I want to have a screen that shows, for instance:
1/20: user1: syncing homedir
current file: /home/user1/www/cache/file12589015.php
12/20: user12: syncing homedir
current file: /home/user12/mail/joe/mailfile
5/20: user5: collecting information
current file:
I can certainly get the above status output together no problem, but my current hangup is separating the output from the individual parallel processes into three different... pipes? variables? files? so that it can be parsed into the above information.
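For the "pipes? variables? files?" part, one possibility worth noting (a rough sketch only, not the script from the answers below; migrate_account and the log directory are made-up placeholders) is GNU parallel's --results option, which stores each job's stdout and stderr in its own file that a status loop can then read or tail:
export -f migrate_account     # hypothetical function doing the per-user work
parallel -j3 --results /tmp/migration-logs migrate_account {#} {} ::: $userlist &
# Each job gets its own stdout/stderr file somewhere under /tmp/migration-logs/
# (the exact subdirectory layout depends on the parallel version; see --results
# in man parallel), so an individual running job can be followed with tail -f.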
Not sure if this is much better:
echo hello im starting now
sleep 1
# start parallel and send the job to the background
temp=$(mktemp -d)
parallel --rpl '{log} $_="Working on @arg"' -j3 background {} {#} ">$temp/{log} 2>&1;rm $temp/{log}" ::: foo bar baz foo bar baz one two three one two three :::+ 5 6 5 3 4 6 7 2 5 4 6 2 &
while kill -0 $! 2>/dev/null ; do
    cd "$temp"
    clear
    tail -vn1 *
    sleep 1
done
rm -rf "$temp"
It makes a logfile for each job, tails all the logfiles every second, and removes a job's logfile when that job is done.
The logfiles are named 'Working on ...'. (This assumes the background function from the script below has been defined and exported with export -f background so that parallel can run it.)
I believe that this is close to what I need, though it isn't very tidy and probably isn't optimal:
#!/bin/bash
background() { # dummy load. $1 is text, $2 is number, $3 is position
    echo $3: starting sleep...
    sleep $2
    echo $3: $1 slept for $2
}

progress() {
    echo starting progress loop for pid $1...
    while [ -d /proc/$1 ]; do
        clear
        tput cup 0 0
        runningprocs=`ps faux | grep background | egrep -v '(parallel|grep)'`
        numprocs=`echo "$runningprocs" | wc -l`
        for each in `seq 1 ${numprocs}`; do
            line=`echo "$runningprocs" | head -n${each} | tail -n1`
            seq=`echo $line | rev | awk '{print $3}' | rev`
            # print select elements from the ps output
            echo working on `echo $line | rev | awk '{print $3, $4, $5}' | rev`
            # print the last line of the log for that sequence number
            cat logfile.log | grep ^$seq\: | tail -n1
            echo
        done
        sleep 1
    done
}
echo hello im starting now
sleep 1
export -f background
# start parallel and send the job to the background
parallel -u -j3 background {} {#} '>>' logfile.log ::: foo bar baz foo bar baz one two three one two three :::+ 5 6 5 3 4 6 7 2 5 4 6 2 &
pid=$!
progress $pid
echo finished!
I'd rather not depend on scraping all the information from ps and would prefer to get the actual line output of each parallel process, but a guy's gotta do what a guy's gotta do. Regular output is sent to a logfile for parsing later on.
When I use a single grep command, it processes and outputs the data live as it comes.
Here is my simple test file test.sh:
echo a
sleep 1
echo b
sleep 1
echo ab
sleep 1
echo ba
sleep 1
echo baba
I do the following:
sh test.sh | grep a
a
ab
ba
baba
All good so far: 'a' appears immediately, then 'ab', etc.
But when I pipe through multiple grep commands like this
sh ./test.sh | grep a | grep b
ab
ba
baba
I only get the output at the end, not as it comes!
The terminal stays empty until the entire file is processed, then everything is output in one go.
Why is that?
How can I chain/cascade multiple greps without losing that 'process and output as it comes' property?
This is for grepping and processing live, huge logs with a lot of data, where I only have a chance to save the filtered version to disk, not the huge raw output, which would fill up the disk quite quickly.
grep has an option called --line-buffered. By default, when grep's output goes to a pipe rather than a terminal, it is block-buffered, so nothing reaches the next command until the buffer fills or grep exits. From the man page:
Other Options
--line-buffered
Use line buffering on output. This can cause a performance penalty.
So:
sh ./test.sh | grep --line-buffered a | grep b
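The same idea extends to longer pipelines: every grep whose output feeds another pipe needs --line-buffered, while the final grep writes to the terminal, which is line-buffered by default. Where a filter has no such option, stdbuf from GNU coreutils can often force line buffering instead. A sketch, using the same test.sh as above:
sh ./test.sh | grep --line-buffered a | grep --line-buffered b | grep a
sh ./test.sh | stdbuf -oL grep a | grep b   # stdbuf -oL forces line-buffered stdout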
I am looking for a bash snippet for limiting the amount of console output from a shell command that could potentially become too verbose.
The purpose of this is for use in build/CI environments, where you do want to limit the amount of console output in order to prevent overloading the CI server (or even a client tailing the output).
Full requirements:
display only up to 100 lines from the top (head) of the command output
display only up to 100 lines from the bottom (tail) of the command output
archive both stdout and stderr in full into a command.log.gz file
console output must be displayed relatively in realtime; a solution that outputs the result only at the end is not acceptable, as we need to be able to see execution progress
Current findings
unbuffer could be used to force the stdout/stderr to be unbuffered
|& tee can be used to send output to both archiver and tail/head
|& gzip --stdout >command.log.gz could archive the console output
head -n100 and tail -n100 can be used to limit the console output, but they introduce at least some problems, like undesired results if the number of output lines is under 200.
From what I understand, you need to limit output online (while it's being generated).
Here is a function that I can think of that would be useful for you.
limit_output() {
    FullLogFile="./output.log"   # log file that keeps a full copy of the input
    typeset -i MAX=15            # number of lines from head, and from tail
    typeset -i LINES=0           # number of lines displayed so far
    # tee saves a copy of the input into the log file
    tee "$FullLogFile" | {
        # The pipe causes this part to be executed in a subshell;
        # grouping it here keeps LINES from losing its value before the final if
        while read -r Line; do
            if [[ $LINES -lt $MAX ]]; then
                LINES=LINES+1        # arithmetic assignment, thanks to typeset -i
                echo "$Line"         # display the first few lines on screen
            elif [[ $LINES -lt $(($MAX*2)) ]]; then
                LINES=LINES+1        # count the lines for a little longer
                echo -n "."          # reduce line output to a single dot
            else
                echo -n "."          # reduce line output to a single dot
            fi
        done
        echo ""   # finish the row of dots
        # Tail the last few lines: those not shown in the head, and not more than MAX
        if [[ $LINES -gt $MAX ]]; then
            tail -n $(($LINES-$MAX)) "$FullLogFile"
        fi
    }
}
Use it in a script, load it into the current shell, or put it in .bash_profile to be loaded in every user session.
Usage examples: cat /var/log/messages | limit_output or ./configure | limit_output
The function will read the standard input, save it to a log file, display the first MAX lines, then reduce each further line to a single dot (.) on screen, then finally display the last MAX lines (or fewer if the output was shorter than MAX*2).
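A quick usage demo (assuming the function above has been sourced; as written it hard-codes MAX=15 and logs to ./output.log in the current directory):
seq 1 40 | limit_output
# prints lines 1-15, then a single row of 25 dots (one per remaining input line),
# then lines 26-40 taken back from the saved ./output.log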
Here is my current, incomplete solution, which for convenience demonstrates processing a 10-line output and which will (hopefully) limit the output to the first 2 lines and the last 2 lines.
#!/bin/bash
seq 10 | tee >(gzip --stdout >output.log.gz) | tail -n2
One way I use to achieve this is:
./configure | tee output.log | head -n 5; tail -n 2 output.log
What this does is:
Write the complete output to a file called output.log using tee
Only print the first 5 lines using head -n 5
In the end, print the last two lines from the written output.log using tail -n 2
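Combining the pieces above (tee plus gzip for the archive, and a head/tail-style limit on the console), here is a rough, untested sketch of one way to cover all four requirements; your_command is a placeholder for the real build step, and an awk ring buffer stands in for head/tail so that short outputs are not shown twice and tee is never killed early by SIGPIPE:
#!/bin/bash
your_command 2>&1 \
  | tee >(gzip --stdout > command.log.gz) \
  | awk -v max=100 '
      NR <= max { print; fflush(); next }   # head: pass lines through immediately
      { buf[NR % max] = $0 }                # tail: remember only the most recent lines
      END {
        if (NR > 2 * max) print "[... " NR - 2 * max " lines omitted ...]"
        start = (NR - max + 1 > max + 1) ? NR - max + 1 : max + 1
        for (i = start; i <= NR; i++) print buf[i % max]
      }'
# Note: bash does not necessarily wait for the gzip process substitution to
# finish before the script exits, so command.log.gz may lag behind briefly.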
How do I run 100 iterations using a bash shell script? I want to know how long it will take to execute one command (start and end time). I want to keep track of which iteration is currently running. I want to log each iteration. I have one automated script I need to run and log.
for i in 1 2 3
do
command1
done
But I want to know how long it takes to complete one iteration - and write the information to a log file too!
You may use seq in iteration as well:
for i in `seq 1 100`; do ... done
for ((i = 1; i <= 100; i++)); do
echo "--- Iteration #$i: $(date) ---"
time command1
done 2>&1 | tee timing.log
There's a lot going on here. What's happening?
The for loop iterates from 1 to 100 using C-style syntax.
The $i in the echo printout prints the current iteration number.
$(date) inserts a timestamp into each printout.
The time command runs a command and prints how long it took to execute.
The output from everything inside of the loop is piped to tee, which saves a copy to timing.log.
The 2>&1 redirects stderr to stdout so that the log file will contain both regular output and error messages.
The following script shows one way to do it.
#!/usr/bin/bash
for i in {1..100} ; do
    echo =============================
    echo "Number $i: $(date +%Y-%m-%d-%H:%M:%S)"
    ( time ( echo $i ; sleep 1 ) ) 2>&1 | sed 's/^/ /'
done | tee timing.log
It uses the bash range feature to run 100 iterations of the loop, outputting the loop counter and date.
It then times your command (echo $i ; sleep 1 in this case) and combines standard output and error before nicely formatting it, and sending it to both the terminal and a log file for later analysis.
A sample run with five iterations:
pax> testprog.sh
=============================
Number 1: 2010-09-16-13:44:19
1
real 0m1.063s
user 0m0.077s
sys 0m0.015s
=============================
Number 2: 2010-09-16-13:44:20
2
real 0m1.056s
user 0m0.030s
sys 0m0.046s
=============================
Number 3: 2010-09-16-13:44:21
3
real 0m1.057s
user 0m0.046s
sys 0m0.030s
=============================
Number 4: 2010-09-16-13:44:22
4
real 0m1.057s
user 0m0.061s
sys 0m0.031s
=============================
Number 5: 2010-09-16-13:44:23
5
real 0m1.057s
user 0m0.046s
sys 0m0.015s
You can try with:
for i in {1..100}; do time some_script.sh; done 2>&1 | grep ^real | sed -e s/.*m// | awk '{sum += $1} END {print sum / NR}'
Explanation
The for loop runs some_script.sh 100 times, measuring its execution time with time
The stderr of the for loop is redirected to stdout; this is to capture the output of time so we can grep it
grep ^real is to get only the lines starting with "real" in the output of time
sed is to delete the beginning of the line up to minutes part (in the output of time)
For each line, awk adds to the sum, so that in the end it can output the average, which is the total sum, divided by the number of input records (= NR)
Limitations
The snippet assumes that the running time of some_script.sh is less than 1 minute; otherwise it won't work at all. Depending on your system, the time builtin might work differently. Another alternative is to use the time command /usr/bin/time instead of the bash builtin.
Note: This script was extracted from here.
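For what it's worth, here is a sketch of the /usr/bin/time alternative mentioned in the limitations above. GNU time's -f '%e' format prints elapsed wall-clock seconds directly, so no minute handling (and no 1-minute limit) is needed; this assumes GNU time is installed and that some_script.sh writes nothing to stderr that would confuse awk:
for i in {1..100}; do /usr/bin/time -f '%e' some_script.sh >/dev/null; done 2>&1 \
  | awk '{sum += $1} END {print sum / NR}'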
This script is based on the answer from @marian0, but there's no limitation to the running time. Name it timeit.sh and then do ./timeit.sh 10 ./script to run ./script 10 times and print the average time. Additional arguments can be added as desired.
#!/bin/bash
for i in `seq 1 $1`; do
    time "${@:2}"
done 2>&1 |\
    grep ^real |\
    sed -r -e "s/.*real\t([[:digit:]]+)m([[:digit:]]+\.[[:digit:]]+)s/\1 \2/" |\
    awk '{sum += $1 * 60 + $2} END {print sum / NR}'
I have seen many solutions to half my problem - running the script every 5 seconds.
In addition to this, I also want it to run for only 2 minutes.
The point of the script is to sample the RSSI at a certain position for a period of time:
#!/bin/bash
RSSI_CSV=$1
DISTANCE=$2
RSSI=$(iwconfig wlan0 | awk -F'[ =]+' '/Signal level/ {print $7}')
printf "$DISTANCE,$RSSI\n" >> $RSSI_CSV
At the command line it is called with:
sh rssi_script.sh output.csv position
What would be the most robust solution to solve my problem?
Possibilities I have considered:
repeat the script 40 times within itself (measure the RSSI 40 times and output it to the CSV 40 times; the position will be the same throughout the experiment). This would also solve the problem of limiting the run to 2 minutes, but I might add some new command line arguments, which could be difficult to keep track of if I have to change 40 variants every time
use watch to sample every 5 seconds and cron to limit it to 2 minutes (not 100% sure cron can do this)
a while loop for 2 minutes (not sure how to do a while loop like this? see the sketch just after this list) with a 5-second sleep at the end of each loop
use cron to start the shell script as 40 processes of the script and delay each iteration by 5 seconds; I am not sure how command line arguments would be passed across this, and as I said above I might add some new command line arguments, so that could cause issues (extra work)
Let me know if there is any info I have missed or that would help, and thanks for any help anyone can give.
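A minimal sketch of the "while loop for 2 minutes" option mentioned in the list above (it takes the same arguments as rssi_script.sh and relies on bash's built-in SECONDS counter, so it should be run with bash rather than sh):
#!/bin/bash
RSSI_CSV=$1
DISTANCE=$2
SECONDS=0                        # bash counts seconds up from here
while (( SECONDS < 120 )); do    # stop after roughly 2 minutes
    RSSI=$(iwconfig wlan0 | awk -F'[ =]+' '/Signal level/ {print $7}')
    printf "$DISTANCE,$RSSI\n" >> "$RSSI_CSV"
    sleep 5
done
The same command line arguments as the original script apply.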
#!/bin/bash
[ "$3" = "0" ] && { exit; }   # stop once the iteration counter reaches 0
RSSI_CSV=$1
DISTANCE=$2
RSSI=$(iwconfig wlan0 | awk -F'[ =]+' '/Signal level/ {print $7}')
printf "$DISTANCE,$RSSI\n" >> $RSSI_CSV
sleep 5
N=$3
$0 $1 $2 $((N-1))             # call this script again with the counter decremented
Run like this (24 iterations at roughly 5 seconds each covers the 2 minutes):
sh rssi_script.sh output.csv position 24
Inspired by PeterMmm's input, I managed to get it to work, and in fact it works better for me to run N samples with a hard-coded rest between them:
#!/bin/bash
RSSI_CSV=$1
DISTANCE=$2
N=$3
while [ $N -ne 0 ]
do
    RSSI=$(iwconfig wlan0 | awk -F'[ =]+' '/Signal level/ {print $7}')
    printf "$DISTANCE,$RSSI\n" >> $RSSI_CSV
    sleep 2
    ((N--))
    echo $N
done
exit