shell script stops proceeding inside the loop - bash

Using bash version 5, I have a loop that iterates over a list, uses each item in a find command inside the loop, and then appends the find result to an initially empty array.
However, this code saves the first result in pid successfully and then the script exits without finishing the loop. I have tried trap and set -x with no progress.
Does anyone understand what's happening here?
#!/bin/bash
set -e
set -o pipefail
set -vx
## run cleanup on signal 1, 2, 3, 6
trap cleanup 1 2 3 6
cleanup()
{
echo "Caught Signal $? ... cleaning up."
}
r=$(cat /proc/net/tcp |grep " 01 " |awk '{print $10}')
inodes=()
inodes+=($r)
pattern="^.*(proc)\/\K([a-zA-Z0-9_-]*)(?=\/[a-zA-Z0-9]*)?"
con_pids=()
for inode in ${inodes[*]}
do
echo inode number is: $inode
pid=$(find /proc/* -type l -ls 2>/dev/null |grep "socket:\[$inode\]" | grep -m1 -oP $pattern)
echo "code didn't stop"
# echo $?
con_pids+=($pid)
#con_pids=(${con_pids[*]} "$pid")
done
echo ${con_pids[*]}
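One likely culprit, given the flags above (this is an assumption, not a confirmed diagnosis): with set -e and set -o pipefail, the pid=$(find ... | grep -m1 ...) assignment returns nonzero not only when grep finds no match, but also when grep -m1 exits early and the upstream command is killed by SIGPIPE, and set -e then terminates the script right after the first successful capture. A minimal sketch of that interaction:

```shell
#!/usr/bin/env bash
# Reproduce the suspected set -e + pipefail + grep -m1 interaction in a
# child shell, so the parent survives to report what happened.
repro='
set -e
set -o pipefail
pid=$(yes | grep -m1 y)    # capture succeeds, but `yes` dies of SIGPIPE (141)
echo "never reached: $pid"
'
status=0
out=$(bash -c "$repro") || status=$?
echo "child exited with status $status"
```

If this is the cause, guarding the capture, e.g. `pid=$(find ... | grep -m1 -oP "$pattern" || true)`, lets the loop continue past unmatched or early-exit iterations.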

Related

wait doesn't wait for the processes in the while loop to finish

Here is my code:
count=0
head -n 10 urls.txt | while read LINE; do
curl -o /dev/null -s "$LINE" -w "%{time_total}\n" &
count=$((count+1))
[ 0 -eq $((count % 3)) ] && wait && echo "process wait" # wait for 3 urls
done
echo "before wait"
wait
echo "after wait"
I expect the last curl to finish before the final echo prints, but that is not what happens:
0.595499
0.602349
0.618237
process wait
0.084970
0.084243
0.099969
process wait
0.067999
0.068253
0.081602
process wait
before wait
after wait
➜ Downloads 0.088755 # already exited the script
Does anyone know why it's happening? And how to fix this?
As described in BashFAQ #24, this is caused by your pipeline causing the while loop to be performed in a different shell from the rest of your script.
Consequently, your curls are subprocesses of that subshell, not the outer interpreter; so the outer interpreter cannot wait for them.
This can be resolved by not piping to while read, but instead redirecting its input in a way that doesn't shuffle it into a pipeline element -- as with <(...), a process substitution:
#!/usr/bin/env bash
# ^^^^ - NOT /bin/sh; also, must not start with "sh scriptname"
count=0
while IFS= read -r line; do
curl -o /dev/null -s "$line" -w "%{time_total}\n" &
count=$((count+1))
(( count % 3 == 0 )) && { wait; echo "process wait"; } # wait for 3 urls
done < <(head -n 10 urls.txt)
echo "before wait"
wait
echo "after wait"
why it's happening?
Because you run the processes in the subshell, the parent process can't wait for them.
$ echo | { echo subshell; sleep 100 & }
$ wait # exits immediately
$
Call wait from the same process in which the background processes were spawned:
someotherthing | {
while someotherthing; do
something &
done
wait # will wait for something
}
And how to fix this?
I recommend a different approach rather than a crude while read loop: use GNU xargs with the -P option to run 3 processes concurrently:
head -n 10 urls.txt | xargs -P3 -n1 -d '\n' curl -o /dev/null -w "%{time_total}\n" -s
Alternatively, you could just move the wait into the subshell as shown above, or arrange for the while loop to execute in the parent shell.

Bash Script - Will not completely execute

I am writing a script that will take in 3 arguments and then search all files within a predefined path. However, my grep command seems to be breaking the script with error code 123. I have been staring at it for a while and cannot really see the error, so I was hoping someone could point it out. Here is the code:
#! /bin/bash -e
#Check if path exists
if [ -z $ARCHIVE ]; then
echo "ARCHIVE NOT SET, PLEASE SET TO PROCEED."
echo "EXITING...."
exit 1
elif [ $# -ne 3 ]; then
echo "Illegal number of arguments"
echo "Please enter the date in yyyy mm dd"
echo "EXITING..."
exit 1
fi
filename=output.txt
#Simple signal handler
signal_handler()
{
echo ""
echo "Process killed or interrupted"
echo "Cleaning up files..."
rm -f out
echo "Finished"
exit 1
}
trap 'signal_handler' KILL
trap 'signal_handler' TERM
trap 'signal_handler' INT
echo "line 32"
echo $1 $2 $3
#Search for the TimeStamp field and replace the / and : characters
find $ARCHIVE | xargs grep -l "TimeStamp: $2/$3/$1"
echo "line 35"
fileSize=`wc -c out.txt | cut -f 1 -d ' '`
echo $fileSize
if [ $fileSize -ge 1 ]; then
echo "no"
xargs -n1 basename < $filename
else
echo "NO FILES EXIST"
fi
I added the echo's to know where it was breaking. My program prints out line 32 and the args but never line 35. When I check the exit code I get 123.
Thanks!
Notes:
ARCHIVE is set to a test directory, i.e. /home/'uname'/testDir
$1 $2 $3 == yyyy mm dd (ie a date)
In testDir there are N number of directories. Inside these directories there are data files that contain data as well as a time tag. The time tag is of the following format: TimeStamp: 02/02/2004 at 20:38:01
The scripts goal is to find all files that have the date tag you are searching for.
Here's a simpler test case that demonstrates your problem:
#!/bin/bash -e
echo "This prints"
true | xargs false
echo "This does not"
The snippet exits with code 123.
The problem is that xargs exits with code 123 if any command fails. When xargs exits with non-zero status, -e causes the script to exit.
The quickest fix is to use || true to effectively ignore xargs' status:
#!/bin/bash -e
echo "This prints"
true | xargs false || true
echo "This now prints too"
The better fix is to not rely on -e, since this option is misleading and unpredictable.
xargs returns exit code 123 when grep returns a nonzero code even just once. Since you're using -e (#!/bin/bash -e), bash exits the script when one of its commands returns a nonzero exit code. Dropping -e would allow your code to continue; disabling it around just that part is an option too:
set +e ## Disable
find "$ARCHIVE" | xargs grep -l "TimeStamp: $2/$1/$3" ## If one of the files doesn't match the pattern, `grep` would return a nonzero code.
set -e ## Enable again.
Consider placing quotes around your variables to prevent word splitting as well, e.g. "$ARCHIVE".
-d '\n' may also be required if any of your filenames contain spaces.
find "$ARCHIVE" | xargs -d '\n' grep -l "TimeStamp: $2/$1/$3"
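Putting those pieces together, a self-contained sketch (the directory layout and stamp text are illustrative stand-ins, and GNU xargs is assumed for -d):

```shell
#!/usr/bin/env bash
set -e
# Demo tree; $ARCHIVE and the stamp text are stand-ins for the real data.
ARCHIVE=$(mktemp -d)
echo "TimeStamp: 02/02/2004 at 20:38:01" > "$ARCHIVE/a.dat"
echo "no stamp here" > "$ARCHIVE/b.dat"

# `|| true` keeps a no-match grep (and xargs's resulting exit code 123)
# from tripping set -e; quoting plus -d '\n' survives spaces in names.
matches=$(find "$ARCHIVE" -type f | xargs -d '\n' grep -l "TimeStamp: 02/02/2004" || true)
if [ -n "$matches" ]; then
    echo "found: $(basename "$matches")"
else
    echo "NO FILES EXIST"
fi
rm -rf "$ARCHIVE"
```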

Timeout for grep in bash script

I need to grep for a specific string in a specific time on an input:
trap "kill 0" EXIT SIGINT SIGTERM
RESULT=$(adb logcat MyTag:V *:S | grep -m 1 "Hello World") &
sleep 10
if [ "$RESULT" = "" ]; then
echo "Timeout!"
else
echo "found"
fi
With the trap, the subshell gets killed correctly, but I see that the grep no longer works. adb logcat is the only process running in the subshell when the script executes.
You could open a file descriptor as input with process substitution. After n seconds you could read the result.
{
sleep 10
IFS= read -rd '' -u 4 RESULT
if [ "$RESULT" = "" ]; then
echo "Timeout!"
else
echo "found"
fi
} 4< <(adb logcat MyTag:V *:S | grep -m 1 "Hello World")
You could also use exec to keep the file descriptor available beyond the {} block.
Although I still wonder why you place the command in the background and use sleep to wait for it. The command substitution runs in a subshell, so the value saved in RESULT by the background job is always lost:
RESULT=$(adb logcat MyTag:V *:S | grep -m 1 "Hello World") &
sleep 10
Running it in the foreground would keep RESULT, but then there is no timeout:
RESULT=$(adb logcat MyTag:V *:S | grep -m 1 "Hello World")
Can you kill the subshell using $! (the PID of the last process invoked)?
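As another option, assuming coreutils timeout(1) is available, you can bound the whole pipeline directly instead of backgrounding it and sleeping; the inner command below is a stand-in for `adb logcat MyTag:V *:S`:

```shell
#!/usr/bin/env bash
# Bound the whole pipeline with a timeout instead of backgrounding it.
# The sh -c command is a stand-in for the real `adb logcat` stream.
RESULT=$(timeout 10 sh -c '(sleep 1; echo "Hello World") | grep -m 1 "Hello World"')
if [ -z "$RESULT" ]; then
    echo "Timeout!"
else
    echo "found"
fi
```

On a timeout, timeout kills the pipeline and the command substitution yields an empty RESULT, so the existing emptiness check still works.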

Determining if process is running using pgrep

I have a script that I only want to be running one time. If the script gets called a second time I'm having it check to see if a lockfile exists. If the lockfile exists then I want to see if the process is actually running.
I've been messing around with pgrep but am not getting the expected results:
#!/bin/bash
COUNT=$(pgrep $(basename $0) | wc -l)
PSTREE=$(pgrep $(basename $0) ; pstree -p $$)
echo "###"
echo $COUNT
echo $PSTREE
echo "###"
echo "$(basename $0) :" `pgrep -d, $(basename $0)`
echo sleeping.....
sleep 10
The results I'm getting are:
$ ./test.sh
###
2
2581 2587 test.sh(2581)---test.sh(2587)---pstree(2591)
###
test.sh : 2581
sleeping.....
I don't understand why I'm getting a "2" when only one process is actually running.
Any ideas? I'm sure it's the way I'm calling it. I've tried a number of different combinations and can't quite seem to figure it out.
SOLUTION:
What I ended up doing was this (a portion of my script):
function check_lockfile {
# Check for previous lockfiles
if [ -e $LOCKFILE ]
then
echo "Lockfile $LOCKFILE already exists. Checking to see if process is actually running...." >> $LOGFILE 2>&1
# is it running?
if [ $(ps -elf | grep $(cat $LOCKFILE) | grep $(basename $0) | wc -l) -gt 0 ]
then
abort "ERROR! - Process is already running at PID: $(cat $LOCKFILE). Exiting..."
else
echo "Process is not running. Removing $LOCKFILE" >> $LOGFILE 2>&1
rm -f $LOCKFILE
fi
else
echo "Lockfile $LOCKFILE does not exist." >> $LOGFILE 2>&1
fi
}
function create_lockfile {
# Check for previous lockfile
check_lockfile
#Create lockfile with the contents of the PID
echo "Creating lockfile with PID:" $$ >> $LOGFILE 2>&1
echo -n $$ > $LOCKFILE
echo "" >> $LOGFILE 2>&1
}
# Acquire lock file
create_lockfile >> $LOGFILE 2>&1 \
|| echo "ERROR! - Failed to acquire lock!"
The argument for pgrep is an extended regular expression pattern.
In your case the command pgrep $(basename $0) evaluates to pgrep test.sh, which will match any process name containing test followed by any character followed by sh. So it will match btest8sh, atest_shell, etc.
You should create a lock file. If the lock file exists, the program should exit.
lock=$(basename $0).lock
if [ -e $lock ]
then
echo Process is already running with PID=`cat $lock`
exit
else
echo $$ > $lock
fi
You are already opening a lock file. Use it to make your life easier.
Write the process id to the lock file. When you see the lock file exists, read it to see what process id it is supposedly locking, and check to see if that process is still running.
Then in version 2, you can also write program name, program arguments, program start time, etc. to guard against the case where a new process starts with the same process id.
Put this near the top of your script...
pid=$$
script=$(basename $0)
guard="/tmp/$script-$(id -nu).pid"
if test -f $guard ; then
echo >&2 "ERROR: Script already runs... own PID=$pid"
ps auxw | grep $script | grep -v grep >&2
exit 1
fi
trap "rm -f $guard" EXIT
echo $pid >$guard
And yes, there IS a small window for a race condition between the test and echo commands, which can be fixed by appending to the guard file, and then checking that the first line is indeed our own PID. Also, the diagnostic output in the if can be commented out in a production version.
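For completeness, a race-free sketch using util-linux flock(1), assuming it is available; taking the lock is atomic, which closes the test-then-create window mentioned above:

```shell
#!/usr/bin/env bash
# Take the lock atomically; the kernel releases it when fd 9 closes at exit,
# so there is no stale lock file to clean up and no test/create race.
lockfile="/tmp/$(basename "$0").lock"    # path is illustrative
exec 9>"$lockfile"
if ! flock -n 9; then
    echo "Another instance is already running." >&2
    exit 1
fi
echo "Lock acquired by PID $$"
# ... rest of the script runs while the lock is held ...
```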

Run bash commands in parallel, track results and count

I was wondering how, if possible, I can create a simple job management in BASH to process several commands in parallel. That is, I have a big list of commands to run, and I'd like to have two of them running at any given time.
I know quite a bit about bash, so here are the requirements that make it tricky:
The commands have variable running time so I can't just spawn 2, wait, and then continue with the next two. As soon as one command is done a next command must be run.
The controlling process needs to know the exit code of each command so that it can keep a total of how many failed
I'm thinking somehow I can use trap but I don't see an easy way to get the exit value of a child inside the handler.
So, any ideas on how this can be done?
Well, here is some proof-of-concept code that should probably work, but it breaks bash: invalid command lines get generated, it hangs, and sometimes it dumps core.
# need monitor mode for trap CHLD to work
set -m
# store the PIDs of the children being watched
declare -a child_pids
function child_done
{
echo "Child $1 result = $2"
}
function check_pid
{
# check if running
kill -s 0 $1
if [ $? == 0 ]; then
child_pids=("${child_pids[@]}" "$1")
else
wait $1
ret=$?
child_done $1 $ret
fi
}
# check by copying pids, clearing list and then checking each, check_pid
# will add back to the list if it is still running
function check_done
{
to_check=("${child_pids[@]}")
child_pids=()
for ((i=0; i<${#to_check[@]}; i++)); do
check_pid ${to_check[$i]}
done
}
function run_command
{
"$@" &
pid=$!
# check this pid now (this will add to the child_pids list if still running)
check_pid $pid
}
# run check on all pids anytime some child exits
trap 'check_done' CHLD
# test
for ((tl=0;tl<10;tl++)); do
run_command bash -c "echo FAIL; sleep 1; exit 1;"
run_command bash -c "echo OKAY;"
done
# wait for all children to be done
wait
Note that this isn't what I ultimately want, but would be groundwork to getting what I want.
Followup: I've implemented a system to do this in Python. So anybody using Python for scripting can have the above functionality. Refer to shelljob
GNU Parallel is awesomesauce:
$ parallel -j2 < commands.txt
$ echo $?
It will set the exit status to the number of commands that failed. If you have more than 253 commands, check out --joblog. If you don't know all the commands up front, check out --bg.
Can I persuade you to use make? This has the advantage that you can tell it how many commands to run in parallel (modify the -j number)
echo -e ".PHONY: c1 c2 c3 c4\nall: c1 c2 c3 c4\nc1:\n\tsleep 2; echo c1\nc2:\n\tsleep 2; echo c2\nc3:\n\tsleep 2; echo c3\nc4:\n\tsleep 2; echo c4" | make -f - -j2
Stick it in a Makefile and it will be much more readable
.PHONY: c1 c2 c3 c4
all: c1 c2 c3 c4
c1:
sleep 2; echo c1
c2:
sleep 2; echo c2
c3:
sleep 2; echo c3
c4:
sleep 2; echo c4
Beware, those are not spaces at the beginning of the lines, they're a TAB, so a cut and paste won't work here.
Put an "@" in front of each command if you don't want the command echoed, e.g.:
@sleep 2; echo c1
This would stop on the first command that failed. If you need a count of the failures you'd need to engineer that in the makefile somehow. Perhaps something like
command || echo F >> failed
Then check the length of failed.
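That counting idea can be sketched in shell; the (exit N) subshells below are stand-ins for the real commands:

```shell
#!/usr/bin/env bash
# Concrete version of the `command || echo F >> failed` counting idea.
failed_log=$(mktemp)
( exit 1 ) || echo F >> "$failed_log"   # stand-ins for the real commands
( exit 0 ) || echo F >> "$failed_log"
fail_count=$(wc -l < "$failed_log")
echo "$fail_count command(s) failed"
rm -f "$failed_log"
```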
The problem you have is that you cannot wait for one of multiple background processes to complete. If you observe job status (using jobs) then finished background jobs are removed from the job list. You need another mechanism to determine whether a background job has finished.
The following example starts two background processes (sleeps). It then loops, using ps to see if they are still running; if not, it uses wait to gather the exit code and starts a new background process.
#!/bin/bash
sleep 3 &
pid1=$!
sleep 6 &
pid2=$!
while true
do
running1=`ps -p $pid1 --no-headers | wc -l`
if [ $running1 == 0 ]
then
wait $pid1
echo process 1 finished with exit code $?
sleep 3 &
pid1=$!
else
echo process 1 running
fi
running2=`ps -p $pid2 --no-headers | wc -l`
if [ $running2 == 0 ]
then
wait $pid2
echo process 2 finished with exit code $?
sleep 6 &
pid2=$!
else
echo process 2 running
fi
sleep 1
done
Edit: Using SIGCHLD (without polling):
#!/bin/bash
set -bm
trap 'ChildFinished' SIGCHLD
function ChildFinished() {
running1=`ps -p $pid1 --no-headers | wc -l`
if [ $running1 == 0 ]
then
wait $pid1
echo process 1 finished with exit code $?
sleep 3 &
pid1=$!
else
echo process 1 running
fi
running2=`ps -p $pid2 --no-headers | wc -l`
if [ $running2 == 0 ]
then
wait $pid2
echo process 2 finished with exit code $?
sleep 6 &
pid2=$!
else
echo process 2 running
fi
sleep 1
}
sleep 3 &
pid1=$!
sleep 6 &
pid2=$!
sleep 1000d
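If bash 4.3 or newer is available, wait -n avoids both the polling and the SIGCHLD trap: it blocks until any one background job exits and returns that job's exit status. A small sketch with toy jobs (the odd-numbered ones fail):

```shell
#!/usr/bin/env bash
# Requires bash >= 4.3: `wait -n` blocks until *any* background job exits
# and returns that job's exit status, so no polling or SIGCHLD trap needed.
fail=0
for i in 1 2 3 4 5 6; do
    ( sleep "0.$i"; exit $(( i % 2 )) ) &   # toy jobs: odd-numbered ones fail
done
for i in 1 2 3 4 5 6; do
    wait -n || fail=$((fail + 1))           # one wait per job started
done
echo "$fail jobs failed"
```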
I think the following example answers some of your questions; I am looking into the rest of the question.
(cat list1 list2 list3 | sort | uniq > list123) &
(cat list4 list5 list6 | sort | uniq > list456) &
from:
Running parallel processes in subshells
There is another package for debian systems named xjobs.
You might want to check it out:
http://packages.debian.org/wheezy/xjobs
If you cannot install parallel for some reason this will work in plain shell or bash
# String to detect failure in subprocess
FAIL_STR=failed_cmd
result=$(
(false || echo ${FAIL_STR}1) &
(true || echo ${FAIL_STR}2) &
(false || echo ${FAIL_STR}3)
)
wait
if [[ ${result} == *"$FAIL_STR"* ]]; then
failure=`echo ${result} | grep -E -o "$FAIL_STR[^[:space:]]+"`
echo The following commands failed:
echo "${failure}"
echo See above output of these commands for details.
exit 1
fi
Where true & false are placeholders for your commands. You can also echo $? along with the FAIL_STR to get the command status.
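For example, a minimal sketch of attaching $? to the marker (true and false remain placeholders for your commands):

```shell
#!/usr/bin/env bash
# Attach $? to the marker so the exit status travels with it; `false` and
# `true` are placeholders for the real commands, as above.
FAIL_STR=failed_cmd
result=$(
    ( false || echo "${FAIL_STR}1:$?" ) &
    ( true  || echo "${FAIL_STR}2:$?" ) &
    wait
)
echo "$result"
```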
Yet another bash-only example, for your interest. Of course, prefer GNU parallel, which offers many more features out of the box.
This solution involves creating temporary output files to collect job status.
We use /tmp/${$}_ as the temporary-file prefix; $$ is the parent process ID and stays the same for the whole script execution.
First, the loop that starts the parallel jobs in batches. The batch size is set with max_parrallel_connection. try_connect_DB() is a slow bash function defined in the same file. Here we collect stdout + stderr (2>&1) for failure diagnostics.
nb_project=$(echo "$projects" | wc -w)
i=0
parrallel_connection=0
max_parrallel_connection=10
for p in $projects
do
i=$((i+1))
parrallel_connection=$((parrallel_connection+1))
try_connect_DB $p "$USERNAME" "$pass" > /tmp/${$}_${p}.out 2>&1 &
if [[ $parrallel_connection -ge $max_parrallel_connection ]]
then
echo -n " ... ($i/$nb_project)"
wait
parrallel_connection=0
fi
done
if [[ $nb_project -gt $max_parrallel_connection ]]
then
# final new line
echo
fi
# wait for all remaining jobs
wait
After all jobs have finished, review the results:
SQL_connection_failed is our error-marker convention, output by try_connect_DB(); you may filter for job success or failure in whatever way suits your needs.
Here we chose to output only the failed results, to reduce the amount of output for large job sets, especially if most or all of them pass.
# displaying result that failed
file_with_failure=$(grep -l SQL_connection_failed /tmp/${$}_*.out)
if [[ -n $file_with_failure ]]
then
nb_failed=$(wc -l <<< "$file_with_failure")
# we will collect DB name from our output file naming convention, for post treatment
db_names=""
echo "=========== failed connections : $nb_failed/$nb_project"
for failure in $file_with_failure
do
echo "============ $failure"
cat $failure
db_names+=" $(basename $failure | sed -e 's/^[0-9]\+_\([^.]\+\)\.out/\1/')"
done
echo "$db_names"
ret=1
else
echo "all tests passed"
ret=0
fi
# temporary file cleanup; could be kept in case of error, adapt to suit your needs.
rm /tmp/${$}_*.out
exit $ret
