Do test operators -a and -o short circuit?
I tried if [ 0 -eq 1 -a "" -eq 0 ]; then ..., which complained about the syntax of the second conditional. But I can't tell whether that's because
-a does not short circuit
or test wants everything properly formatted before it begins and it still short circuits.
The result is leading me to create a nested if when really what I wanted was a situation where the first conditional would guard against executing the second if a particular var had not yet been set...
edit: As for why I am using obsolescent operators: the code has to work everywhere in my environment, and I just found a machine where
while [ -L "$file" ] && [ "$n" -lt 10 ] && [ "$m" -eq 0 ]; do
is an infinite loop and changing to the obsolete -a yields good behavior:
while [ -L "$file" -a "$n" -lt 10 -a "$m" -eq 0 ]; do
What should I do? The first expression works on many machines but not this machine which appears to require the second expression instead...
Per the POSIX specification for test:
>4 arguments:
The results are unspecified.
Thus, barring XSI extensions, POSIX says nothing about how this behaves.
Moreover, even on a system with XSI extensions:
expression1 -a expression2: True if both expression1 and expression2 are true; otherwise, false. The -a binary primary is left associative. It has a higher precedence than -o.
expression1 -o expression2: True if either expression1 or expression2 is true; otherwise, false. The -o binary primary is left associative.
There's no specification with respect to short-circuiting.
If you want short-circuiting behavior -- or POSIX-defined behavior at all -- use && and || to connect multiple, separate test invocations.
Quoting again, from later in the document:
APPLICATION USAGE
The XSI extensions specifying the -a and -o binary primaries and the '(' and ')' operators have been marked obsolescent. (Many expressions using them are ambiguously defined by the grammar depending on the specific expressions being evaluated.) Scripts using these expressions should be converted to the forms given below. Even though many implementations will continue to support these obsolescent forms, scripts should be extremely careful when dealing with user-supplied input that could be confused with these and other primaries and operators. Unless the application developer knows all the cases that produce input to the script, invocations like:
test "$1" -a "$2"
should be written as:
test "$1" && test "$2"
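For the guard described in the question -- skip the numeric comparison while a variable may still be unset or empty -- two separate invocations joined with && give exactly that behavior (a minimal sketch; var stands in for whatever variable the script guards):

```shell
var=""   # not yet set to a number

# && short-circuits: the numeric test never runs when the guard fails,
# so the empty string is never handed to -eq.
if [ -n "$var" ] && [ "$var" -eq 0 ]; then
    echo "var is zero"
else
    echo "var is unset, empty, or nonzero"
fi
```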
Well, you already know the behaviour, so this question is really about how to interpret those results. But TBH, there aren't many real-world scenarios where you'll observe different behaviour.
I created a small test case to check what's going on (at least, on my system, since the other answer suggests it's not standardized):
strace bash -c 'if [ 0 -eq 1 -a -e /nosuchfile ]; then echo X; fi'
If you check the output you'll see that bash looks for the file, so the answer is:
The operators don't short circuit.
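You can also see the difference without strace. In a single test invocation, the shell performs all expansions before test runs at all, whereas with && the second command -- and its expansions -- are skipped entirely when the first fails. A small sketch:

```shell
tmp=$(mktemp)

# One invocation: $(...) is expanded by the shell before test runs,
# so the side effect happens even though 0 -eq 1 is false.
[ 0 -eq 1 -a "$(echo one >> "$tmp"; echo 0)" -eq 0 ]

# Two invocations: the first fails, so the second (and its
# command substitution) never runs.
[ 0 -eq 1 ] && [ "$(echo two >> "$tmp"; echo 0)" -eq 0 ]

cat "$tmp"    # prints only: one
rm -f "$tmp"
```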
Related
I am currently working on a bioinformatics project, and have been assigned the role of editing some genetic sequence files (fasta/.fa) to be viable for the next stage of processing. I am doing this on the Linux command line with bash.
With how the files have been obtained, each read within the file has been assigned an arbitrary name following the format V1_x, for x in 1-1587663.
For the next step, I need to reformat these names within the file following a specific naming pattern, in which every number is zero-padded to seven digits. For example, V1_1 must be reformatted to V1_0000001, V1_15 to V1_0000015, and V1_1050 to V1_0001050, eventually ending with V1_1587663.
I will give an example of how one file is laid out:
V1_1 flag=1 multi=9.0000 len=342
AAGGAGTGATGGCATGGCGTGGGACTTCTCCACCGACCCCGAGTTCCAGGAGAAGCTCGACTGGGTCGAGCGGTTCTGCCAGGAAAGGGTCGAGCCGCTCGACTATGTGTTTCCCCACGCGGTGCGCTGGCCAGACCCGGTGGTAAAGGCGTACGTCCGCGAACTCCAGCAGGAGGTCAAGGACCAGGGCCTGTGGGCGATCTTCCTCGACCGGGAACTAGGTGGCCCGGGCTTCGGACAGCTCAGGCTGGCTCTGCTCAACGAGGTGATCGGCCGCTATCCCGGCGCGCCCGCGATGTTCGGTGCCGCGGCGCCCGATACCGGGAA
V1_2 flag=1 multi=9.0000 len=330
ATCTTCACCCAGCTCGGCAGCATGTTTCCCGTGGCGATGGAGTGCAGCATCGAGCCCAGGCAGATCACCAGCCCGGCGTCTTTCAACTGCGCGGCGTAGGCGTCCTGCGCCGCGTTCATATCGGTAATCGTATCGGGCAGCGGGCCGTCGTCGCGCAGGCTGCCCGCCAGCACGAACGGAATCCCAGAGCGCACGCATTCGTACAGGATGCCTTCCCGCAGGCATCCGCCCTCCACGGCCTGCCGGACGCTCCCGGCGCGATAGATCGCATTGATGGCGCGCATGTGATTGCGGTGCCCGTGCTCTTCCTGCCTCCCGTCGCTCAGCCGC
I am currently trying to write a loop which would do this all in one go, as it is a lot of reads and I have multiple of these genetic sequence fasta files.
I don't want to ruin my file, so I have created a copy containing the first 5000 reads to test my code on.
The code I have been trying to make work is as follows
for i in {1..5000}
do
if [ "$i" -le "9"]; then
sed -i 's/V1_i/V1_000000i/' testfile.fa
elif [["$i" -gt "9"] && ["i" -le "99"]]; then
sed -i s/V1_i/V1_00000i/' testfile.fa
elif [["i" -gt "99"] && ["i" -le "999"]]; then
sed -i s/V1_i/V1_0000i/' testfile.fa
elif [["i" -gt "999"] && ["i" -le "9999"]]; then
sed -i s/V1_i/V1_000i/' testfile.fa
fi
done
I will rewrite the code below to explain what I think each line should be doing
for i in {1..5000} - **Denoting that it should be ran with i standing as 1-5000**
do
if [ "$i" -le "9"]; then **If 'i' is less than or equal to 9 then do...**
sed -i 's/V1_i/V1_000000i/' testfile.fa **replace V1_i with V1_000000i within testfile.fa**
elif [["$i" -gt "9"] && ["i" -le "99"]]; then **else if 'i' is more than 9 but equal to or less than 99 then do....**
sed -i s/V1_i/V1_00000i/' testfile.fa **replace V1_i with V1_00000i within testfile.fa**
elif [["i" -gt "99"] && ["i" -le "999"]]; then
sed -i s/V1_i/V1_0000i/' testfile.fa
elif [["i" -gt "999"] && ["i" -le "9999"]]; then
sed -i s/V1_i/V1_000i/' testfile.fa
fi
done
The result I get every time is 4 lots of 'command not found', as pasted below, per number in the range.
[1: command not found
[[1: command not found
[[1: command not found
[[1: command not found
[2: command not found
[[2: command not found
[[2: command not found
[[2: command not found
etc until 5000
I assume I must have something wrong with how I've written the code, but as someone who is new to this, I can't see what is wrong.
Thank you for reading, if you can help that is very much appreciated. If you need anymore details, I will gladly try and help to the best of my ability. Unfortunately, I can't share the exact files I'm working on (I know this isn't helpful sorry) as I do not have permission.
Shell syntax
The result I get every time is 4 lots of 'command not found', as pasted below, per number in the range.
[1: command not found
[[1: command not found
[[1: command not found
[[1: command not found
[2: command not found
[[2: command not found
[[2: command not found
[[2: command not found
etc until 5000
The [ character is not special to the shell. [ and [[ are not operators, but rather an ordinary command and a reserved word, respectively. They have no involvement in splitting command lines into words. Similar applies to ] and ]] -- the shell does not automatically break words on either side of them.
The " character is special to the shell, but it does not create a word boundary. The shell has quoting, but it does not have quote-delimited strings as a syntactic unit in the sense that some other languages do.
With that in mind, consider this code fragment:
elif [["$i" -gt "9"] && ["i" -le "99"]]; then
Because neither [[ nor " produce a word break, [["$i" expands to a single word, for example [[1, which, given its position, is interpreted as the name of a command to execute. There being no built-in command by that name and no program by that name in the path, executing that command fails with "command not found".
You need to insert whitespace to make separate words separate (but see also below):
elif [[ "$i" -gt "9" ] && [ "i" -le "99" ]]; then
Moreover, again, [ is a command and [[ is a reserved word naming a built-in command. ] is an argument with special meaning to the [ command, and ]] is an argument with special significance to the [[ built-in. Although they (intentionally) have a similar appearance, these are not analogous to parentheses. You don't need to impose grouping here anyway. The && operator already separates commands, and the overall pipeline does not need to be explicitly demarcated as a group. This would be correct and more natural:
elif [[ "$i" -gt "9" ]] && [[ "$i" -le "99" ]]; then
Furthermore, although it is not wrong, it is unnecessary and a bit weird to quote your numbers. The case is more nuanced for the expansions of $i, since its values are fully under your control, but "always quote your variable expansions" is a pretty good rule until your shell scripting is strong enough for you to decide for yourself when you can do otherwise. So, this is where we arrive:
elif [[ "$i" -gt 9 ]] && [[ "$i" -le 99 ]]; then
You will want to do likewise throughout your script.
But wait, there's more!
I think the changes described above would make your script work, but it would be extremely slow, because it will make 5000 passes through the whole file. And on the whole 1.5M entry file, you would need a version that made 1.5M passes through the whole half-gigabyte-ish of data. It would take years to complete.
That approach is not viable, not really even for the 5000 lines. You need something that will make only a single pass through the file, or at worst a small, fixed number of passes. I think a one-pass approach would be possible with sed, but it would take a complex and very cryptic sed expression. I'm a sed fan, but I would recommend awk for this, or even shell without any external tool.
A pure-shell version could be built with the read and printf built-in commands combined with some of the shell's other features. An awk version could be expressed as a not-overly-complex one-liner. Details of either of these options depends on the file syntax, however, which, as I commented on the question, I think you have misrepresented.
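As one illustration of the single-pass idea, here is a sketch of such an awk one-liner. It assumes the headers really do begin with V1_<number> exactly as shown in the sample (standard FASTA headers start with >, which would need a small adjustment to the pattern), and the file names are placeholders; the printf line just creates a tiny stand-in for the real file:

```shell
# Tiny stand-in for the real testfile.fa
printf 'V1_1 flag=1 multi=9.0000 len=342\nAAGG\nV1_1050 flag=1\n' > testfile.fa

# One pass over the file: zero-pad the number of every V1_<n> header
# to seven digits; sequence lines don't match and pass through as-is.
awk '{
    if ($1 ~ /^V1_[0-9]+$/) {
        n = substr($1, 4)            # the digits after "V1_"
        $1 = sprintf("V1_%07d", n)
    }
    print
}' testfile.fa > renamed.fa

cat renamed.fa
```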
What is the most idiomatic way in Bash to test if no positional parameters are given? There are so many ways to check this, I wonder if there is one preferred way.
Some ways are:
((! $# )) # check if $# is 'not true'
(($# == 0)) # $# is 0
[[ ! $# ]] # $# is unset or null
For me, the classical way is:
[[ $# -eq 0 ]]
If you want it to be an error to have no positional parameters:
: ${1?no positional parameters}
will print "no positional parameters" to standard error (and exit a non-interactive shell) if $1 is unset.
Otherwise, I'm not aware of any better options than one of the various methods of checking if $# is 0.
Use Sensible Semantics
The key is readability and the intent of your code. Unless you have a good reason to do otherwise, you probably just want to determine the length of the parameter list.
# Is length of parameter list equal to zero?
[ $# -eq 0 ]
However, you can certainly use any parameter expansion or comparison operator that expresses the intent of your code. There's no single right way to do it, but you may wish to consider whether the semantics of your test are portable.
Food for Thought
It isn't the conditional expression that's intrinsically important. What's important is why you want to know. For example:
set -e
foo="$1"
shift
# $2 is now $1 (but only if the line is reached)
echo "$1"
In this case, the length of the parameter list is never checked directly. The first parameter is simply assigned (even though it may be unset), and then the shell throws an error and exits when you try to shift the parameter list. This code says "I just expect the parameters to be there; I shouldn't have to check for them explicitly."
The point here is that you need to determine what your code is trying to express, and match the semantics of your tests and conditionals to express that as clearly as you can. There really is no orthogonal answer.
Here's a most logical way:
[ ! $@ ]
It is based on a single rule:
[ ] # this returns 1
Well then,
[ ! ] # this returns 0
The rest is obvious:
$@ is the special parameter that expands to all the positional parameters.
Test:
(It will work even if you throw a couple of empty strings ("" "" "") at it.)
if [ ! $@ ]; then
    printf 'Note: No arguments are given.'
fi
I prefer using the fact that if there are no positional parameters, there is also no first parameter:
[[ -z $1 ]]
test -z "$1"
[ -z "$1" ]
It's just a tiny bit lighter on the reader. Of course it only works when the assumption that the first parameter can't be an empty string is true.
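The difference between the two checks shows up exactly when a first parameter exists but is empty; a quick sketch:

```shell
set -- ""            # one positional parameter, but it is empty

[ $# -eq 0 ] && echo "no parameters"          # not printed: $# is 1
[ -z "$1" ] && echo "first parameter empty"   # printed
```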
When test -e file is not flexible enough, I tend to use the following Bash idiom to check the existence of a file:
if [ -n "$(find ${FIND_ARGS} -print -quit)" ] ; then
echo "pass"
else
echo "fail"
fi
But since I am only interested in a boolean value, are there any ${FIND_ARGS} that will let me do instead:
if find ${FIND_ARGS} ; ...
I'd say no. man find...
find exits with status 0 if all files are processed successfully, greater than 0 if errors occur. This is deliberately a very broad description, but if the return value is non-zero, you should not rely on the correctness of the results of find.
Testing for output is probably fine for find. That isn't a "Bash idiom". If that's not good enough and you have Bash available then you can use extglobs and possibly globstar for file matching tests with [[. Find should only be used for complex recursive file matching, or actual searching for files, and other things that can't easily be done with Bash features.
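For instance, a hypothetical globstar-based existence check for .txt files anywhere under the current directory might look like this (bash 4+ only; the pattern and the pass/fail strings are placeholders mirroring the question's example):

```shell
#!/bin/bash
# Collect matches with recursive globbing; nullglob makes the array
# empty (rather than containing the literal pattern) when nothing matches.
shopt -s globstar nullglob
matches=( **/*.txt )
if (( ${#matches[@]} > 0 )); then
    echo "pass"
else
    echo "fail"
fi
```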
I know of at least of 4 ways to test conditions in shell scripts.
[ <cond> ];
[[ <cond> ]];
(( <cond> ));
test <cond>;
I would like to have a comprehensive overview of what the differences between these methods are, and also when to use which of the methods.
I've tried searching the web for a summary but didn't find anything decent. It'd be great to have a decent list up somewhere (Stack Overflow to the rescue!).
Let's describe them here.
First of all, there are basically 3 different test methods
[ EXPRESSION ], which is exactly the same as test EXPRESSION
[[ EXPRESSION ]]
(( EXPRESSION )), which is exactly the same as let "EXPRESSION"
Let's go into the details:
test
This is the grandfather of test commands. Even if your shell does not support it, there's still a /usr/bin/test command on virtually every unix system. So calling test will either run the built-in or the binary as a fallback. Enter $ type test to see which version is used. Likewise for [.
In most basic cases, this should be sufficient to do your testing.
if [ "$a" = test -o "$a" = Test ];
if test "$a" = test -o "$a" = Test;
If you need more power, then there's...
[[ ]]
This is a bash special. Not every shell needs to support this, and there's no binary fallback. It provides a more powerful comparison engine, notably pattern matching and regular expression matching.
if [[ "$a" == [Tt]es? ]]; # pattern
if [[ "$a" =~ ^[Tt]es.$ ]]; # RE
(())
This is a bash special used for arithmetic expressions, and is true if the result of the calculation is non-zero. Not every shell needs to support this, and there's no binary fallback.
if (( x * (1 + x++) ));
if let "x * (1 + x++)";
Note that you can omit the $ sign when referencing variables within (( ... )) or let.
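A trivial sketch of that:

```shell
x=4
if (( x % 2 == 0 )); then    # no $ needed inside (( ))
    echo "even"
fi
```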
On the site linked here, if you scroll down to the [ special character, you will see a separate entry for [[, with a link to the discussion of the differences between them. There is also an entry for (( below those. Hope that helps!
I have written a small bash script called "isinFile.sh" for checking if the first term given to the script can be found in the file "file.txt":
#!/bin/bash
FILE="file.txt"
if [ `grep -w "$1" $FILE` ]; then
echo "true"
else
echo "false"
fi
However, running the script like
> ./isinFile.sh -x
breaks the script, since -x is interpreted by grep as an option.
So I improved my script
#!/bin/bash
FILE="file.txt"
if [ `grep -w -- "$1" $FILE` ]; then
echo "true"
else
echo "false"
fi
using -- as an argument to grep. Now running
> ./isinFile.sh -x
false
works. But is using -- the correct and only way to prevent code/option injection in bash scripts? I have not seen it in the wild, only found it mentioned in ABASH: Finding Bugs in Bash Scripts.
grep -w -- ...
prevents that interpretation in what follows --
EDIT
(Sorry, I did not read the last part at first.) Using -- is the standard way, though not strictly the only one: the other way is to keep the search term from sitting first in the pattern, where it could be taken as an option; e.g. ".\{0\}-x" (an empty interval, which must be escaped in a BRE) works too, but it is odd. So e.g.
grep -w ".\{0\}$1" ...
should work too.
There's actually another code injection (or whatever you want to call it) bug in this script: it simply hands the output of grep to the [ (aka test) command, and assumes that'll return true if it's not empty. But if the output is more than one "word" long, [ will treat it as an expression and try to evaluate it. For example, suppose the file contains the line 0 -eq 2 and you search for "0" -- [ will decide that 0 is not equal to 2, and the script will print false despite the fact that it found a match.
The best way to fix this is to use Ignacio Vazquez-Abrams' suggestion (as clarified by Dennis Williamson) -- this completely avoids the parsing problem, and is also faster (since -q makes grep stop searching at the first match). If that option weren't available, another method would be to protect the output with double-quotes: if [ "$(grep -w -- "$1" "$FILE")" ]; then (note that I also used $() instead of backquotes 'cause I find them much easier to read, and quotes around $FILE just in case it contains anything funny, like whitespace).
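Putting the grep -q suggestion together with --, a corrected version of the script might look like this (a sketch; file.txt as in the question):

```shell
#!/bin/bash
# grep -q prints nothing and exits 0 on the first match, so its exit
# status can drive `if` directly; -- ends option parsing so a search
# term such as -x is not taken as an option.
file="file.txt"
if grep -qw -- "$1" "$file"; then
    echo "true"
else
    echo "false"
fi
```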
Though not applicable in this particular case, another technique can be used to prevent filenames that start with hyphens from being interpreted as options:
rm ./-x
or
rm /path/to/-x