How to interrupt bash pipeline on error?

In the following example, the echo statement gets executed regardless of the exit code of the previous command in the pipeline:
asemenov@cpp-01-ubuntu:~$
asemenov@cpp-01-ubuntu:~$ false|echo 123
123
asemenov@cpp-01-ubuntu:~$ true|echo 123
123
asemenov@cpp-01-ubuntu:~$
I want the echo command to execute only when the previous command exits with a zero exit code, that is, I want to achieve this behavior:
asemenov@cpp-01-ubuntu:~$ false|echo 123
asemenov@cpp-01-ubuntu:~$
Is it possible in bash?
Here is a more practical example:
asemenov@cpp-01-ubuntu:~$ find SomeNotExistingDir|xargs ls -1
find: `SomeNotExistingDir': No such file or directory
..
..
files list from my current directory
..
..
asemenov@cpp-01-ubuntu:~$
There is no reason to execute xargs ls -1 if find failed.

The components of a pipeline are always run unconditionally and logically in parallel; you cannot make the second (or later) processes in the pipeline run only if the first (or earlier) process completes successfully.
In the specific case you show with find, you have at least two options:
find SomeNotExistingDir ... -exec ls -1 {} +
Or you can use a very useful feature of GNU xargs (not present in POSIX):
find SomeNotExistingDir ... | xargs -r ls -1
The -r option is equivalent to the --no-run-if-empty option, which explains fairly precisely what it does. If you're using GNU find and GNU xargs, you should use the extensions -print0 and -0:
find SomeNotExistingDir ... -print0 | xargs -r -0 ls -1
This handles every character that can appear in a file name correctly.
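As an aside, not part of the original answer (a hedged sketch): although you cannot stop the later stages of a pipeline from running, bash does record the exit status of every stage in the PIPESTATUS array, so you can at least detect that the first stage failed and react afterwards:
find SomeNotExistingDir | xargs -r ls -1
if [ "${PIPESTATUS[0]}" -ne 0 ]; then
    echo "find failed" >&2   # react to the failure of the first stage here
fi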

In terms of command flow, the easiest way to do what you want would be to use the logical OR operator, like this:
[pierrep@DEVELOPMENT8 ~]: false || echo 123
123
[pierrep@DEVELOPMENT8 ~]: true || echo 123
[pierrep@DEVELOPMENT8 ~]:
This works because the || operator is evaluated lazily: the right-hand command is only run when the left-hand command fails, that is, exits with a non-zero status.
Note: commands return exit status 0 when they succeed and something other than 0 when they fail. In your example with find:
[pierrep@DEVELOPMENT8 ~]: find somedir || echo 123
find: `somedir': No such file or directory
123
[pierrep@DEVELOPMENT8 ~]: find .profile || echo 123
.profile
Using || won't redirect any output from the command on the left of the ||.
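(A hedged addition, not part of the original answer: the complementary && operator runs the right-hand command only when the left-hand one exits with status 0, which is exactly the behavior the question asks for.)
false && echo 123   # prints nothing, because false exits non-zero
true && echo 123    # prints 123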
If you want to run some command only when another succeeds, do a basic exit code check and temporarily store the output of the first command in a variable so it can be fed to the next command, like so:
result=( $(find SomeNotExistingDir) )
exit_code=$?
if [ $exit_code -eq 0 ]; then
    for path in "${result[@]}"; do
        # do some stuff with the find results here...
        echo "$path"
    done
fi
What this does: when find is run, it puts its results into the $result array. $? holds the exit code of the last run command, so here that is the find command. If find found SomeNotExistingDir, then we loop through its results (since it might have found multiple instances of it) and do stuff with those paths; otherwise we do nothing. The "otherwise" case is reached when an error occurred in the execution of the find command or when the file/dir could not be found.

You can't do that with pipes, because pipe creation does not wait for completion; otherwise how could cat | tr 'a-z' 'A-Z' work? Simulating the pipeline with && checks and temp files:
file1=$(mktemp)
file2=$(mktemp)
false > "$file1" && (echo 123 > "$file2") < "$file1" && (prog3 > "$file1") < "$file2"   # ...and so on
rm "$file1" "$file2"

The point is that when the first command fails there is no output for the second command and no reason to execute it; the result of the default behavior is therefore unexpected.
If there is NO output on stdout when the exit code is non-zero, then that fact itself can be used for piping the data; there is no need to check the exit code (except for the optimization part, of course).
For example, if you ignore the optimization part and consider only correctness,
find SomeNotExistingDir|xargs ls -1
Can be changed to
find SomeNotExistingDir| while read x; do ls -1 "$x"; done
Apart from the while loop itself, the commands inside it will not be executed when find produces no output. The drawback of this approach is that some information (like line numbers) will be lost if commands like awk/sed/head are used in place of ls. Also, ls will be executed N times instead of once, as it would be with the xargs approach.

For the particular example you give, it's sufficient to simply check whether there is any data on the pipe. The problem you experience is that xargs gets no input, so it invokes ls with no arguments, and ls by default prints the contents of the current directory. anishsane's solution is almost sufficient, but it is not quite the same since it will invoke ls for each line of output, which is not at all what xargs does. However, you can do:
find /bad/path | xargs sh -c 'test $# = 0 || exec ls -1 "$@"' sh
Now, this pipeline will always succeed, and perhaps that is not desirable (this is the same behavior you get with just find /bad/path | xargs ls -1, though). To ensure that the pipeline fails, you can do:
find /bad/path | xargs sh -c 'test $# = 0 && exit 1; exec ls -1 "$@"' sh
There are some concerns, however. xargs will quite happily invoke its command with many arguments (that is the point of it!), but some shells will handle a much smaller number of arguments than xargs, so it is quite possible that the shell will truncate the arguments. However, that is possibly an academic concern.
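If that concern ever matters in practice, one hedged workaround (my sketch, not part of the original answer) is to cap how many filenames xargs hands to each shell invocation with -n:
find /bad/path | xargs -n 100 sh -c 'test $# = 0 && exit 1; exec ls -1 "$@"' sh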

Related

Bash conditional on command exit code

In bash, I want to say "if a file doesn't contain XYZ, then" do a bunch of things. The most natural way to transpose this into code is something like:
if [ ! grep --quiet XYZ "$MyFile" ] ; then
... do things ...
fi
But of course, that's not valid Bash syntax. I could use backticks, but then I'll be testing the output of the file. The two alternatives I can think of are:
grep --quiet XYZ "$MyFile"
if [ $? -ne 0 ]; then
... do things ...
fi
And
grep --quiet XYZ "$MyFile" ||
( ... do things ...
)
I kind of prefer the second one; it's more Lispy, and the || for control flow isn't that uncommon in scripting languages. I can see arguments for the first one too, although when the person reads the first line, they don't know why you're executing grep; it looks like you're executing it for its main effect, rather than just to control a branch in the script.
Is there a third, more direct way which uses an if statement and has the grep in the condition?
Yes there is:
if grep --quiet .....
then
    # If grep finds something
fi
or if the grep fails
if ! grep --quiet .....
then
    # If grep doesn't find something
fi
You don't need the [ ] (test) to check the return value of a command. Just try:
if ! grep --quiet XYZ "$MyFile" ; then
This is a matter of taste since there obviously are multiple working solutions. When I deal with a problem like this, I usually apply wc -l after grep in order to count the lines that match. Then you have a single integer number that you can evaluate within a test condition. If the question is only whether there is a match at all (the number of matching lines does not matter), then applying wc is probably over the top, and evaluating grep's return code seems to be the best solution:
Normally, the exit status is 0 if selected lines are found and 1 otherwise. But the exit status is 2 if an error occurred, unless the -q or --quiet or --silent option is used and a selected line is found. Note, however, that POSIX only mandates, for programs such as grep, cmp, and diff, that the exit status in case of error be greater than 1; it is therefore advisable, for the sake of portability, to use logic that tests for this general condition instead of strict equality with 2.
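For reference, here is a minimal sketch of the wc -l counting approach described above (my illustration, reusing the question's XYZ and $MyFile):
# Count the matching lines, then test the resulting integer.
match_count=$(grep XYZ "$MyFile" | wc -l)
if [ "$match_count" -gt 0 ]; then
    echo "found $match_count matching line(s)"
fi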

Can Unix shell be used to report completion status in some manner?

I have seen some ideas for progress bars around SO and externally for specific commands (such as cat). However, my question seems to deviate slightly from the standard...
Currently, I am using the capability of the find command in shell, such as the follow example:
find . -name file -exec cmd "{}" \;
Where "cmd" is generally a zipping capability or removal tool to free up disk space.
When "." is very large, this can take minutes, and I would like some ability to report "status".
Is there some way to have some type of progress bar, percentage completion, or even print periods (i.e., Working....) until completed? If at all possible, I would like to avoid increasing the duration of this execution by adding another find. Is it possible?
Thanks in advance.
Clearly, you can only have a progress meter or percent completion if you know how long the command will take to run, or if it can tell you that it has finished x tasks out of y.
Here's a simple way to show an indicator while something is working:
#!/bin/sh
echo "launching: $#"
spinner() {
    while true; do
        for char in \| / - \\; do
            printf "\r%s" "$char"
            sleep 1
        done
    done
}
# start the spinner
spinner &
spinner_pid=$!
# launch the command
"$#"
# shut off the spinner
kill $spinner_pid
echo ""
So, you'd do (assuming the script is named "progress_indicator")
find . -name file -exec progress_indicator cmd "{}" \;
The trick with find is that you add two -print clauses, one at the start, and
one at the end. You then use awk (or perl) to update and print a line counter for each
unique line. In this example I tell awk to print to stderr.
Any duplicate lines must be the result of the conditions we specified, so we treat them specially.
In this example, we just print that line:
find . -print -name aa\* -print |
    awk '$0 == last {
             print "" > "/dev/fd/2"
             print
             next
         }
         {
             printf "\r%d", n++ > "/dev/fd/2"
             last = $0
         }'
It's best to leave find to just report pathnames, and do further processing from awk,
or just add another pipeline. (Because the counters are printed to stderr, those will not
interfere.)
If you have the dialog utility installed, you can easily make a nice rolling display:
find . -type f -name glob -exec echo {} \; -exec cmd {} \; |
dialog --progressbox "Files being processed..." 12 $((COLUMNS*3/2))
The arguments to --progressbox are the box's title (optional, can't look like a number); the height in text rows and the width in text columns. dialog has a bunch of options to customize the presentation; the above is just to get you started.
dialog also has a progress bar, otherwise known as a "gauge", but as @glennjackman points out in his answer, you need to know how much work there is to do in order to show progress. One way to do this would be to collect the entire output of the find command, count the number of files in it, and then run the desired task from the accumulated output. However, that means waiting until the find command finishes in order to start doing the work, which might not be desirable.
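For what it's worth, a hedged sketch of that simpler collect-first approach (my illustration, not from the original answer; cmd and the find tests are placeholders, and mapfile -d requires bash 4.4+):
# Collect the file list first so the total is known, then feed
# percentages to dialog's gauge while processing.
mapfile -t -d '' files < <(find . -type f -name glob -print0)
total=${#files[@]}
i=0
for f in "${files[@]}"; do
    cmd "$f" > /dev/null            # the real per-file work; keep its output off the gauge's stdin
    ((++i))
    echo $(( i * 100 / total ))
done | dialog --gauge "Processing $total files..." 7 70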
Just because it was an interesting challenge, I came up with the following solution, which is possibly over-engineered because it tries to work around all the shell gotchas I could think of (and even so, it probably misses some). It consists of two shell files:
# File: run.sh
#!/bin/bash
# Usage: run.sh root-directory find-tests
#
# Fix the following path as required
PROCESS="$HOME/bin/process.sh"
TD=$(mktemp --tmpdir -d gauge.XXXXXXXX)
find "$#" -print0 |
tee >(awk -vRS='\0' 'END{print NR > "'"$TD/_total"'"}';
ln -s "$TD/_total" "$TD/total") |
{ xargs -0 -n50 "$PROCESS" "$TD"; printf "XXX\n100\nDone\nXXX\n"; } |
dialog --gauge "Starting..." 7 70
rm -fR "$TD"
# File: process.sh
#!/bin/bash
TD="$1"; shift
TOTAL=
if [[ -f $TD/count ]]; then COUNT=$(cat "$TD/count"); else COUNT=0; fi
for file in "$#"; do
if [[ -z $TOTAL && -f $TD/total ]]; then TOTAL=$(cat "$TD/total"); fi
printf "XXX\n%d\nProcessing file\n%q\nXXX\n" \
$((COUNT*100/${TOTAL:-100})) "$file"
#
# do whatever you want to do with $file
#
((++COUNT))
done
echo $COUNT > "$TD/count"
Some notes:
There are a lot of GNU extensions scattered in the above. I haven't made a complete list, but it certainly includes the %q printf format (which could just be %s), the flags used to NUL-terminate the filename list, and the --tmpdir flag to mktemp.
run.sh uses tee to simultaneously count the number of files found (with awk) and to start processing the files.
The -n50 argument to xargs causes it to wait only for the first 50 files to avoid delaying startup if find spends a lot of time not finding the first files; it might not be necessary.
The -vRS='\0' argument to awk causes it to use a NUL as a line delimiter, to match the -print0 action to find (and the -0 option to xargs); all this is only necessary if filepaths could contain a new-line.
awk writes the count to _total and then we symlink _total to total to avoid a really unlikely race condition where total is read before it is completely written. symlinking is atomic, so doing it this way guarantees that total either doesn't exist or is completely written.
It might have been better to have counted the total size of the files rather than just counting them, particularly if the processing work is related to file size (compression, for example). That would be a reasonably simple modification. Also, it would be tempting to use xargs's parallel execution feature, but that would require a bit more work coordinating the count of processed files between the parallel processes.
If you're using a managed environment which doesn't have dialog, the simplest solution is to just run the above script using ssh from an environment which does have dialog. Remove | dialog --gauge "Starting..." 7 70 from run.sh, and put it in your ssh invocation instead: ssh user@host /path/to/run.sh root-dir find-tests | dialog --gauge "Starting..." 7 70

BASH Script - Safe limits for string from command output

Good day,
I am writing a relatively simple BASH script that performs an SVN UP command, captures the console output, then does some post processing on the text.
For example:
#!/bin/bash
# A script to alter SVN logs a bit
# Update and get output
echo "Waiting for update command to complete..."
TEST_TEXT=$(svn up --set-depth infinity)
echo "Done"
# Count number of lines in output and report it
NUM_LINES=$(echo $TEST_TEXT | grep -c '.*')
echo "Number of lines in output log: $NUM_LINES"
# Print out only lines containing Makefile
echo $TEST_TEXT | grep Makefile
This works as expected (i.e., as commented in the code above), but I am concerned about what would happen if I ran this on a very large repository. Is there a limit on the maximum buffer size BASH can use to hold the output of a console command?
I have looked for similar questions, but found nothing quite like what I'm searching for. I've read up on how certain scripts need to use xargs in cases of large intermediate buffers, and I'm wondering if something similar applies here with respect to capturing console output.
eg:
# Might fail if we have a LOT of results
find -iname *.cpp | rm
# Shouldn't fail, regardless of number of results
find -iname *.cpp | xargs rm
Thank you.
Using
var=$(hexdump /dev/urandom | tee out)
bash didn't complain; I killed it at a bit over 1G and 23.5M lines. You don't need to worry as long as your output fits in your system's memory.
I see no reason not to use a temporary file here.
tmp_file=$(mktemp XXXXX)
svn up --set-depth=infinity > "$tmp_file"
echo "Done"
# Count number of lines in output and report it
NUM_LINES=$(wc -l < "$tmp_file")
echo "Number of lines in output log: $NUM_LINES"
# Print out only lines containing Makefile
grep Makefile "$tmp_file"
rm "$tmp_file"
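If you also want the temporary file removed when the script exits early, a trap is one option (a small added sketch, not part of the original answer):
tmp_file=$(mktemp XXXXX) || exit 1
trap 'rm -f "$tmp_file"' EXIT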

Shell script: Get name of last file in a folder by alphabetical order

I have a folder with backups from a MySQL database that are created automatically. Their names consist of the date the backup was made, like so:
2010-06-12_19-45-05.mysql.gz
2010-06-14_19-45-05.mysql.gz
2010-06-18_19-45-05.mysql.gz
2010-07-01_19-45-05.mysql.gz
What is a way to get the filename of the last file in the list, i.e. of the one which in alphabetical order comes last?
In a shell script, I would like to do something like
LAST_BACKUP_FILE= ???
gunzip $LAST_BACKUP_FILE;
ls -1 | tail -n 1
If you want to assign this to a variable, use $(...) or backticks.
FILE=`ls -1 | tail -n 1`
FILE=$(ls -1 | tail -n 1)
@Sjoerd's answer is correct; I'll just pick a few nits from it:
you don't need the -1 option to enforce one path per line if you pipe the output somewhere:
ls | tail -n 1
you can use -r to get the listing in reverse order, and take the first one:
ls -r | head -n 1
gunzip some.log.gz will write uncompressed data into some.log and remove some.log.gz, which may or may not be what you want (it probably isn't). If you want to keep the compressed source, pipe it into gunzip:
gunzip < some.file.gz
you might want to protect the script against the situation where the dir contains no files, since
gunzip $empty_variable
expands to just
gunzip
and such invocation will wait indefinitely for data on standard input:
latest="$(ls -r /some/where/*.gz | head -1)"
if test -z "$latest"; then
# there's no logs yet, bail out
exit
fi
gunzip < $latest
ls can yield unexpected results when parsed by other commands if the filenames have unusual characters. The following always works:
for LAST_BACKUP_FILE in *; do : ; done
for LAST_BACKUP_FILE in * loops through every filename (and folder name, if there are any) in order in the current directory, storing each in $LAST_BACKUP_FILE
do : does nothing
done finishes after the last file
Now, the last file is stored in $LAST_BACKUP_FILE.
If you happen to want the first file, use this:
for FIRST_BACKUP_FILE in *; do break; done
The break statement jumps out of the loop after the first file is stored in $FIRST_BACKUP_FILE.
(from comment below) If you want hidden files included in the search, then use the command shopt -s dotglob before running the loops.
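Tying this back to the original gunzip task, a hedged sketch (my addition; the backup directory and output file name are just examples). nullglob guards against an empty directory, where an unmatched pattern would otherwise be left as the literal string *.gz:
shopt -s nullglob
LAST_BACKUP_FILE=
for f in /path/to/backups/*.mysql.gz; do LAST_BACKUP_FILE=$f; done
if [ -n "$LAST_BACKUP_FILE" ]; then
    gunzip < "$LAST_BACKUP_FILE" > latest.mysql   # keep the compressed original
fi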
The shell is more powerful than many think. Just let it work for you. Assuming filenames without spaces,
set -- $(ls -r *.gz)
LAST_BACKUP_FILE=$1
does the trick with a single fork, no pipes, and you can even avoid the fork if your shell supports arithmetic expansion as in
set -- *.gz
shift $(($# - 1))
LAST_BACKUP_FILE=$1

Handle special characters in bash for...in loop

Suppose I've got a list of files
file1
"file 1"
file2
a for...in loop breaks it up between whitespace, not newlines:
for x in $( ls ); do
echo $x
done
results:
file
1
file1
file2
I want to execute a command on each file. "file" and "1" above are not actual files. How can I do that if the filenames contain things like spaces or commas?
It's a little trickier than I think find -print0 | xargs -0 could handle, because I actually want the command to be something like "convert input/file1.jpg .... output/file1.jpg" so I need to permutate the filename in the process.
Actually, Mark's suggestion works fine without even doing anything to the internal field separator. The problem is that running ls in a subshell, whether via backticks or $( ), causes the for loop to be unable to distinguish spaces within a name from the whitespace between names. Simply using
for f in *
instead of the ls solves the problem.
#!/bin/bash
for f in *
do
    echo "$f"
done
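Applied to the questioner's convert example, a hedged sketch (input/, output/, and the -resize option are placeholders for the real command); the quoting is what keeps spaces and other odd characters intact:
for f in input/*.jpg; do
    convert "$f" -resize 50% "output/$(basename "$f")"
done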
UPDATE BY OP: this answer sucks and shouldn't be on top ... @Jordan's post below should be the accepted answer.
One possible way:
ls -1 | while IFS= read -r x; do
    echo "$x"
done
I know this one is LONG past "answered", and with all due respect to eduffy, I came up with a better way and I thought I'd share it.
What's "wrong" with eduffy's answer isn't that it's wrong, but that it imposes what for me is a painful limitation: there's an implied creation of a subshell when the output of the ls is piped and this means that variables set inside the loop are lost after the loop exits. Thus, if you want to write some more sophisticated code, you have a pain in the buttocks to deal with.
My solution was to take the "readline" function and write a program out of it in which you can specify any specific line number that you may want that results from any given function call. ... As a simple example, starting with eduffy's:
ls_output=$(ls -1)
# The cut at the end of the following line removes any trailing newline character
declare -i line_count=$(echo "$ls_output" | wc -l | cut -d ' ' -f 1)
declare -i cur_line=1
while [ $cur_line -le $line_count ]
do
    # NONE of the values in the variables inside this do loop are trapped here.
    filename=$(echo "$ls_output" | readline -n $cur_line)
    # Now filename contains a filename from the preceding ls command
    cur_line=cur_line+1
done
Now you have wrapped up all the subshell activity into neat little contained packages and can go about your shell coding without having to worry about the scope of your variable values getting trapped in subshells.
I wrote my version of readline in GNU C; if anyone wants a copy, it's a little big to post here, but maybe we can find a way...
Hope this helps,
RT
