Bash while loop stops after first iteration when subshell is called [duplicate] - bash

This question already has answers here:
While loop stops reading after the first line in Bash
(5 answers)
Closed 6 years ago.
This contrived bash script demonstrates the issue.
#!/bin/bash
while read -r node ; do
echo checking $node for Agent;
PID=$(ssh $node ""ps -edf | grep [j]ava | awk '{print $2}'"")
echo $PID got to here.
done < ~/agents_master.list
agents_master.list contains 1 server per line:
server1
server2
server3
Which only outputs the following:
checking server1 for Agent
Authorized use only
25176 got to here
Servers 2 and 3 aren't even echoed to the screen by the line echo checking $node...
If I comment out the line PID=$(.... then the while completes the whole agents_master.list file correctly...
checking server1 for Agent
got to here
checking server2 for Agent
got to here
checking server3 for Agent
got to here
From the googling I've done, it sounds like this is related to the subshell that $(...) creates, but I don't understand why it is causing the loop to stop at the first server, server1.
Yes, this code could be re-written but I'm keen to understand this behaviour of bash and why this is happening for future.

The problem -- one of the problems -- is that ssh is forwarding stdin to the remote server. As it happens, the command you are running on the remote server (ps -edf, see below) doesn't use its standard input, but ssh will still forward what it reads, just in case. As a consequence, nothing is left for read to read, so the loop ends.
To avoid that, use ssh -n (or redirect input to /dev/null yourself, which is what the -n option does).
There are a couple of other issues which are not actually interfering with your script's execution.
First, I have no idea why you use "" in
ssh $node ""ps -edf | grep [j]ava | awk '{print $2}'""
The "" "expands" to an empty string, so the above is effectively identical to
ssh $node ps -edf | grep [j]ava | awk '{print $2}'
That means the grep and awk commands are being run on the local host; the output from the ps command is forwarded back to the local host by ssh. That doesn't change anything, although it does make the brackets in [j]ava redundant, since the grep won't show up in the process list, as it is not running on the host where the ps is executed. In fact, it's a good thing that the brackets are redundant, since they might not be present in the command if there happens to be a file named java in your current working directory. You really should quote that argument.
I presume that what you intended was to run the entire pipeline on the remote machine, in which case you might have tried:
ssh $node "ps -edf | grep [j]ava | awk '{print $2}'"
and found that it didn't work. It wouldn't have worked, because the $2 in the awk command will be expanded to whatever $2 is in your current shell; the $2 is not protected by the interior single quotes. As far as bash is concerned, $2 is just part of a double-quoted string. (It would also shift the issue of the unquoted grep argument to the remote host, so you'd have problems if there happened to be a file named java in the home directory on the remote host.)
So what you actually want is
ssh -n $node 'ps -edf | grep "[j]ava" | awk "{print \$2}"'
Finally, don't use PID as the name of a shell variable. Variable names in all upper case are generally reserved, and it is perilously close to BASHPID and PPID, which are specific bash variables. Your own shell variables should have lower-case names, as in any other programming language.
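Putting those pieces together, a corrected version of the contrived script might look like this (a sketch; it assumes the same agents_master.list layout as in the question):
#!/bin/bash
# -r: don't mangle backslashes; -n on ssh: don't let ssh drain the loop's stdin
while read -r node ; do
    echo "checking $node for Agent"
    # Single quotes send the whole pipeline to the remote host;
    # \$2 survives the remote double quotes so awk sees {print $2}
    pid=$(ssh -n "$node" 'ps -edf | grep "[j]ava" | awk "{print \$2}"')
    echo "$pid got to here."
done < ~/agents_master.list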

Related

Why does ssh exit the loop when remotely executing a script in a Linux shell? [duplicate]

This question already has answers here:
ssh breaks out of while-loop in bash [duplicate]
(2 answers)
While loop stops reading after the first line in Bash
(5 answers)
ssh in bash script exits loop [duplicate]
(1 answer)
Closed 2 years ago.
In my project, I need to find the user processes on each node.
I have a file, jobIdUser. The content of this file has two columns, like:
395163 chem-yupy
395164 chem-yupy
395165 phy-xiel
395710 mae-chent
Now I have a script, appRecord.sh, with a while loop in it.
The while loop code is:
cat $workDir/jobIdUser | while read LINE
do
jobUser=`echo $LINE | awk '{print $2}'`
jobId=`echo $LINE | awk '{print $1}'`
jobOnNodes=`/usr/bin/jobToNode $jobId | xargs`
echo $timeStr" "$jobId" "$jobUser" "$jobOnNodes >> $workDir/tmpRecFile
#20200702, it is needed to find out user process on nodes at this time here
designatedNode=`echo $jobOnNodes | awk '{print $NF}'`
echo $jobOnNodes
echo $designatedNode
ssh $designatedNode sh $workDir/nodeProInfo.sh ##Here code will exit while loop
echo $timeStr" "$jobId" "$jobUser" "$jobOnNodes >> $workDir/$recordFile
done
The code of nodeProInfo.sh is like:
#!/bin/bash
source /etc/profile
workDir=/work/ccse-xiezy/appRecord
hostName=`hostname`
jobInfo=`ps axo user:15,comm | grep -Ev "libstor|UID|ganglia|root|gdm|postfix|USER|rpc|polkitd|dbus|chrony|libstoragemgmt|para-test|ssh|ccse-x|lsf|lsbatch" | tail -n 1`
echo $hostName" "$jobInfo >> $workDir/proRes
When I run the script with sh appRecord.sh, it goes wrong: it exits during the first iteration of the while loop.
[cc@login04 appRecord]$ sh appRecord.sh
r03n56 r04n09 r04n15
r04n15
[cc@login04 appRecord]$
I don't know why it exits at the remote ssh command. Who can help me?
UPDATE:
I have another script which runs OK. Its jobIdUser file has content like:
r01n23 xxx-ser
r92n12 yyn-ser
and the while loop is:
cat $workDir/jobIdUser | while read LINE
do
.............
ssh $NODE pkill -u -9 $USER
.............
done
cat $workDir/jobIdUser | while read LINE
do
...
ssh $designatedNode sh $workDir/nodeProInfo.sh ##Here code will exit while loop
...
done
ssh (and every other command running inside the loop) inherits its standard input from the standard input of the while loop. By default, ssh reads its standard input and passes the data to the standard input of the remote process. This means that ssh is consuming the data being piped into the loop, preventing the while statement from reading it.
You need to prevent ssh from reading its standard input. You can do this one of two ways. First of all, the -n flag prevents ssh from reading its standard input:
ssh -n $designatedNode sh $workDir/nodeProInfo.sh
Or you can redirect ssh's standard input from another source:
ssh $designatedNode sh $workDir/nodeProInfo.sh < /dev/null
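In the loop from the question, only the ssh line needs to change (a sketch; the rest of the loop stays as it is):
cat $workDir/jobIdUser | while read LINE
do
    ...
    ssh -n $designatedNode sh $workDir/nodeProInfo.sh   # -n keeps ssh from swallowing the rest of jobIdUser
    ...
done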

Bash script stops without error, but works if copied into the terminal

I am trying to write a script to slice a 13 GB file into smaller parts to launch a split computation on a cluster. What I wrote so far works if I copy and paste it into a terminal, but stops at the first cycle of the for loop when run as a script.
set -ueo pipefail
NODES=8
READS=0days_rep2.fasta
Ntot=$(cat $READS | grep 'read' | wc -l)
Ndiv=$(($Ntot/$NODES))
for i in $(seq 0 $NODES)
do
echo $i
start_read=$(cat $READS | grep 'read' | head -n $(($Ndiv*${i}+1)) | tail -n 1)
echo ${start_read}
end_read=$(cat $READS | grep 'read' | head -n $(($Ndiv*${i}+$Ndiv)) | tail -n 1)
echo ${end_read}
done
If I run the script:
(base) [andrea@andrea-xps data]$ bash cluster.sh
0
>baa12ba1-4dc2-4fae-a989-c5817d5e487a runid=314af0bb142c280148f1ff034cc5b458c7575ff1 sampleid=0days_rep2 read=280855 ch=289 start_time=2019-10-26T02:42:02Z
(base) [andrea@andrea-xps data]$
it seems to stop abruptly after the command "echo ${start_read}" without raising any sort of error. If I copy and paste the script in terminal it runs without problems.
I am using Manjaro linux.
Andrea
The problem:
The problem here (as @Jens suggested in a comment) has to do with the use of the -e and pipefail options; -e makes the shell exit immediately if any simple command gets an error, and pipefail makes a pipeline fail if any command in it fails.
But what's failing? Take a look at the command here:
start_read=$(cat $READS | grep 'read' | head -n $(($Ndiv*${i}+1)) | tail -n 1)
Which, clearly, runs the cat, grep, head, and tail commands in a pipeline (which runs in a subshell so the output can be captured and put in the start_read variable). So cat starts up, and starts reading from the file and shoving it down the pipe to grep. grep reads that, picks out the lines containing 'read', and feeds them on toward head. head reads the first line of that (note that on the first pass, i is 0, so $(($Ndiv*${i}+1)) is 1 and it's running head -n 1) from its input, feeds that on toward the tail command, and then exits. tail passes on the one line it got, then exits as well.
The problem is that when head exited, it hadn't read everything grep had to give it; that left grep trying to shove data into a pipe with nothing on the other end, so the system sent it a SIGPIPE signal to tell it that wasn't going to work, and that caused grep to exit with an error status. And then, since grep exited, cat was similarly writing into an orphaned pipe, so it got a SIGPIPE as well and also exited with an error status.
Since both cat and grep exited with errors, and pipefail is set, that subshell will also exit with an error status, which means the parent shell considers the whole assignment command to have failed, and aborts the script on the spot.
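If you want to see this effect in isolation, here is a minimal reproduction (my own sketch, not from the original post):
#!/bin/bash
set -eo pipefail
# head exits after one line; yes then gets SIGPIPE and exits with status 141,
# pipefail turns that into a failed pipeline, and -e aborts the script here.
yes | head -n 1
echo "this line is never reached"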
Solutions:
So, one possible solution is to remove the -e option from the set command. -e is kind of janky in what it considers an exit-worthy error and what it doesn't, so I don't generally like it anyway (see BashFAQ #105 for details).
Another problem with -e is that (as we've seen here) it doesn't give much of any indication of what went wrong, or even that something went wrong! Error checking is important, but so's error reporting.
(Note: the danger in removing -e is that your script might get a serious error partway through... and then blindly keep running, in a situation that doesn't make sense, possibly damaging things in the process. So you should think about what might go wrong as the script runs, and add manual error checking as needed. I'll add some examples to my script suggestion below.)
Anyway, removing -e just papers over the fact that this isn't a really good approach to the problem. You're reading (or trying to read) over the entire file multiple times, and processing it through multiple commands each time. You really should only be reading through the thing twice: once to figure out how many reads there are, and once to break it into chunks. You might be able to write a program to do the splitting in awk, but most unix-like systems already have a program specifically for this task: split. There's also no need for cat everywhere, since the other commands are perfectly capable of reading directly from files (again, @Jens pointed this out in a comment).
So I think something like this would work:
#!/bin/bash
set -uo pipefail # I removed the -e 'cause I don't trust it
nodes=8 # Note: lower- or mixed-case variables are safer to avoid conflicts
reads=0days_rep2.fasta
splitprefix=0days_split_
Ntot=$(grep -c 'read' "$reads") || { # grep can both read & count in a single step
# The || means this'll run if there was an error in that command.
# A normal thing to do is print an error message to stderr
# (with >&2), then exit the script with a nonzero (error) status
echo "$0: Error counting reads in $reads" >&2
exit 1
}
Ndiv=$((($Ntot+$nodes-1)/$nodes)) # Force it to round *up*, not down
grep 'read' "$reads" | split -l $Ndiv -a1 - "$splitprefix" || {
echo "$0: Error splitting fasta file" >&2
exit 1
}
This'll create files named "0days_split_a" through "0days_split_h". If you have the GNU version of split, you could add its -d option (use numeric suffixes instead of letters) and/or --additional-suffix=.fasta (to add the .fasta extension to the split files).
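With GNU split, for instance, the last pipeline of the script above could become (a sketch; both extra options are GNU extensions):
grep 'read' "$reads" | split -l $Ndiv -a1 -d --additional-suffix=.fasta - "$splitprefix"
which would produce up to eight files named 0days_split_0.fasta through 0days_split_7.fasta.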
Another note: if only a little bit of that big file is read lines, it might be faster to run grep 'read' "$reads" >sometempfile first, and then run the rest of the script on the temp file, so you don't have to read & thin it twice. But if most of the file is read lines, this won't help much.
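That two-pass variant might look roughly like this (a sketch; it reuses the variables from the script above and skips the error checks for brevity):
tmp=$(mktemp) || exit 1
grep 'read' "$reads" > "$tmp"        # read and thin the big file only once
Ntot=$(wc -l < "$tmp")               # counting the small file is cheap
Ndiv=$((($Ntot+$nodes-1)/$nodes))
split -l $Ndiv -a1 "$tmp" "$splitprefix"
rm -f "$tmp"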
Alright, we have found the troublemaker: set -e in combination with set -o pipefail.
Gordon Davisson's answer provides all the details. I provide this answer for the sole purpose of reaping an upvote for my debugging efforts in the comments to your answer :-)

How do I pass a variable in a loop to a remotely executed command?

I am trying to write a shell script to remotely power on and off VMs on an esxi server. Since the vmid is not static I have to grep out the hostname and then use awk to get just the first column. The following command works as expected:
ssh 192.168.0.10 'VMID="$(vim-cmd vmsvc/getallvms | grep HPDesktop2 | awk "{print $1}" )"; vim-cmd vmsvc/power.off $VMID'
What I would like to do is have a list of hostnames in a separate file, loop through those hostnames, and run the command on each hostname. I tried the following:
while read ID; do
ssh 192.168.0.10 'VMID="$(vim-cmd vmsvc/getallvms | grep $ID | awk "{print $1}" )"; vim-cmd vmsvc/power.off $VMID'
done <Hostnames
It seems not to be passing the variable, because I get the usage instructions for grep and the following error:
Usage: power.off vmid
Power off the specified virtual machine.
Insufficient arguments.
Example of the problem you're having:
ubuntu@ubuntu:~$ hello="hi"
ubuntu@ubuntu:~$ echo '$hello'
$hello
ubuntu@ubuntu:~$
The command you're passing as an argument to ssh is in single quotes, so it won't get evaluated on your local machine. That's a problem, since the ID variable only exists in the context of your script running on the local machine. Maybe use double quotes and use \" to escape the quotes inside the quoted string?
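For example, the loop might become something like this (a sketch; it assumes vim-cmd behaves as in your one-off command). The outer quotes are now double quotes, so $ID is expanded locally, while the escaped \$( ... ), \$1 and \$VMID reach the remote shell intact and are expanded there. I also added -n so ssh does not swallow the rest of Hostnames, as discussed in the answers above:
while read ID; do
    ssh -n 192.168.0.10 "VMID=\$(vim-cmd vmsvc/getallvms | grep '$ID' | awk '{print \$1}'); vim-cmd vmsvc/power.off \$VMID"
done <Hostnames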

echo $(command) gets a different result with the output of the command

The Bash command I used:
$ ssh user@myserver.com ps -aux|grep -v \"grep\"|grep "/srv/adih/server/app.js"|awk '{print $2}'
6373
$ ssh user@myserver.com echo $(ps -aux|grep -v \"grep\"|grep "/srv/adih/server/app.js"|awk '{print $2}')
8630
The first result is the correct one, and the second one changes each time I execute it. But I don't know why they are different.
What I am doing:
My workstation has very limited resources, so I use a remote machine to run my Node.js application. I run it using ssh user@remotebox.com "cd /application && grunt serve" in debug mode. When I press Ctrl + C, the grunt task is stopped, but the application is still running in debug mode. I just want to kill it, and I need to get the PID first.
The command substitution is executed by your local shell before ssh runs.
If your local system's name is here and the remote is there,
ssh there uname -n
will print there whereas
ssh there echo $(uname -n) # should have proper quoting, too
will run uname -n locally and then send the expanded command line echo here to there to be executed.
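To make the substitution happen on the remote side instead, quote it so the local shell leaves it alone, for example:
ssh there 'echo "$(uname -n)"'   # prints "there": the $( ) is now expanded remotely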
As an additional aside, echo $(command) is a useless use of echo unless you specifically require the shell to perform wildcard expansion and whitespace tokenization on the output of command before printing it.
Also, grep x | awk { y } is a useless use of grep; it can and probably should be refactored to awk '/x/ { y }' -- but of course, here you are reinventing pidof so better just use that.
ssh user@myserver.com pidof /srv/adih/server/app.js
If you want to capture the printed PID locally, the syntax for that is
pid=$(ssh user@myserver.com pidof /srv/adih/server/app.js)
Of course, if you only need to kill it, that's
ssh user@myserver.com pkill /srv/adih/server/app.js
Short answer: the $(ps ... ) command substitution is being run on the local computer, and then its output is sent (along with the echo command) to the remote computer. Essentially, it's running ssh user@myserver.com echo 8630.
Your first command is also probably not doing what you expect; the pipes are interpreted on the local computer, so it's running ssh user#myserver.com ps -aux, piping the output to grep on the local computer, piping that to another grep on the local computer, etc. I'm guessing that you wanted that whole thing to run on the remote computer so that the result could be used on the remote computer to kill a process.
Long answer: the order things are parsed and executed in shell is a bit confusing; with an ssh command in the mix, things get even more complicated. Basically, what happens is that the local shell parses the command line, including splitting it into separate commands (separated by pipes, ;, etc), and expanding $(command) and $variable substitutions (unless they're in single-quotes). It then removes the quotes and escapes (they've done their jobs) and passes the results as arguments to the various commands (such as ssh). ssh takes its arguments, sticks all the ones that look like parts of the remote command together with spaces between them, and sends them to a shell on the remote computer which does this process over again.
This means that quoting and/or escaping things like $ and | is necessary if you want them to be parsed/acted on by the remote shell rather than the local shell. And quotes don't nest, so putting quotes around the whole thing may not work the way you expect (e.g. if you're not careful, the $2 in that awk command might get expanded on the local computer, even though it looks like it's in single-quotes).
When things get messy like this, the easiest way is sometimes to pass the remote command as a here-document rather than as arguments to the ssh command. But you want quotes around the here-document delimiter to keep the various $ expansions from being done by the local shell. Something like this:
ssh user@myserver.com <<'EOF'
echo $(ps -aux|grep -v "grep"|grep "/srv/adih/server/app.js"|awk '{print $2}')
EOF
Note: be careful with indenting the remote command, since the text will be sent literally to the remote computer. If you indent it with tab characters, you can use <<- as the here-document delimiter (e.g. <<-'EOF') and it'll remove the leading tabs.
EDIT: As @tripleee pointed out, there's no need for the multiple greps, since awk can do the whole job itself. It's also unnecessary to exclude the search command from the results (grep -v grep), because the "/" characters in the pattern need to be escaped, meaning that it won't match itself. So you can simplify the pipeline to:
ps -aux | awk '/\/srv\/adih\/server\/app.js/ {print $2}'
Now, I've been assuming that the actual goal is to kill the relevant pid, and echo is just there for testing. If that's the case, the actual command winds up being:
ssh user@myserver.com <<'EOF'
kill $(ps -aux | awk '/\/srv\/adih\/server\/app.js/ {print $2}')
EOF
If that's not right, then the whole echo $( ) thing is best skipped entirely. There's no reason to capture the pipeline's output and then echo it, just run it and let it output directly.
And if pkill (or something similar) is available, it's much simpler to use that instead.
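For instance (a sketch; the -f flag makes pkill match against the full command line, since the process name itself is probably just node):
ssh user@myserver.com pkill -f '/srv/adih/server/app.js'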

Passing multiple arguments to a UNIX shell script

I have the following (bash) shell script, that I would ideally use to kill multiple processes by name.
#!/bin/bash
kill `ps -A | grep $* | awk '{ print $1 }'`
However, while this script works if one argument is passed:
end chrome
(the name of the script is end)
it does not work if more than one argument is passed:
$ end chrome firefox
grep: firefox: No such file or directory
What is going on here?
I thought the $* passes multiple arguments to the shell script in sequence. I'm not mistyping anything in my input - and the programs I want to kill (chrome and firefox) are open.
Any help is appreciated.
Remember what grep does with multiple arguments - the first is the word to search for, and the remainder are the files to scan.
Also remember that $*, "$*", and $@ all lose track of white space in arguments, whereas the magical "$@" notation does not.
So, to deal with your case, you're going to need to modify the way you invoke grep. You either need to use grep -F (aka fgrep) with options for each argument, or you need to use grep -E (aka egrep) with alternation. In part, it depends on whether you might have to deal with arguments that themselves contain pipe symbols.
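For illustration, the alternation idea might look like this (a sketch; it assumes the process names contain no spaces or regex metacharacters):
# Join the arguments with "|" to build a pattern like chrome|firefox
pattern=$(IFS='|'; printf '%s' "$*")
kill $(ps -A | grep -E "$pattern" | awk '{print $1}')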
It is surprisingly tricky to do this reliably with a single invocation of grep; you might well be best off tolerating the overhead of running the pipeline multiple times:
for process in "$#"
do
kill $(ps -A | grep -w "$process" | awk '{print $1}')
done
If the overhead of running ps multiple times like that is too painful (it hurts me to write it - but I've not measured the cost), then you probably do something like:
case $# in
(0) echo "Usage: $(basename $0 .sh) procname [...]" >&2; exit 1;;
(1) kill $(ps -A | grep -w "$1" | awk '{print $1}');;
(*) tmp=${TMPDIR:-/tmp}/end.$$
trap "rm -f $tmp.?; exit 1" 0 1 2 3 13 15
ps -A > $tmp.1
for process in "$#"
do
grep "$process" $tmp.1
done |
awk '{print $1}' |
sort -u |
xargs kill
rm -f $tmp.1
trap 0
;;
esac
The use of plain xargs is OK because it is dealing with a list of process IDs, and process IDs do not contain spaces or newlines. This keeps the simple code for the simple case; the complex case uses a temporary file to hold the output of ps and then scans it once per process name in the command line. The sort -u ensures that if some process happens to match all your keywords (for example, grep -E '(firefox|chrome)' would match both), only one signal is sent.
The trap lines etc ensure that the temporary file is cleaned up unless someone is excessively brutal to the command (the signals caught are HUP, INT, QUIT, PIPE and TERM, aka 1, 2, 3, 13 and 15; the zero catches the shell exiting for any reason). Any time a script creates a temporary file, you should have similar trapping around the use of the file so that it will be cleaned up if the process is terminated.
If you're feeling cautious and you have GNU Grep, you might add the -w option so that the names provided on the command line only match whole words.
All the above will work with almost any shell in the Bourne/Korn/POSIX/Bash family (you'd need to use backticks with strict Bourne shell in place of $(...), and the leading parenthesis on the conditions in the case are also not allowed with Bourne shell). However, you can use an array to get things handled right.
n=0
unset args # Force args to be an empty array (it could be an env var on entry)
for i in "$#"
do
args[$((n++))]="-e"
args[$((n++))]="$i"
done
kill $(ps -A | fgrep "${args[@]}" | awk '{print $1}')
This carefully preserves spacing in the arguments and uses exact matches for the process names. It avoids temporary files. The code shown doesn't validate for zero arguments; that would have to be done beforehand. Or you could add a line args[0]='/collywobbles/' or something similar to provide a default - non-existent - command to search for.
To answer your question, what's going on is that $* expands to a parameter list, and so the second and later words look like files to grep(1).
To process them in sequence, you have to do something like:
for i in $*; do
echo $i
done
Usually, "$#" (with the quotes) is used in place of $* in cases like this.
See man sh, and check out killall(1), pkill(1), and pgrep(1) as well.
Look into pkill(1) instead, or killall(1) as @khachik comments.
$* should be rarely used. I would generally recommend "$#". Shell argument parsing is relatively complex and easy to get wrong. Usually the way you get it wrong is to end up having things evaluated that shouldn't be.
For example, if you typed this:
end '`rm foo`'
you would discover that if you had a file named 'foo' you don't anymore.
Here is a script that will do what you are asking to have done. It fails if any of the arguments contain '\n' or '\0' characters:
#!/bin/sh
kill $(ps -A | fgrep -e "$(for arg in "$@"; do echo "$arg"; done)" | awk '{ print $1; }')
I vastly prefer $(...) syntax for doing what backtick does. It's much clearer, and it's also less ambiguous when you nest things.
