What is the meaning of the number of + signs in stderr in Bash when "set -x" - bash

In Bash, you can see
set --help
-x Print commands and their arguments as they are executed.
Here's test code:
# make script
echo '
#!/bin/bash
set -x
n=$(echo "a" | wc -c)
for i in $(seq $n)
do
file=test_$i.txt
eval "ls -l | head -$i"> $file
rm $file
done
' > test.sh
# execute
chmod +x test.sh
./test.sh 2> stderr
# check
cat stderr
Output
+++ echo a
+++ wc -c
++ n=2
+++ seq 2
++ for i in $(seq $n)
++ file=test_1.txt
++ eval 'ls -l | head -1'
+++ ls -l
+++ head -1
++ rm test_1.txt
++ for i in $(seq $n)
++ file=test_2.txt
++ eval 'ls -l | head -2'
+++ ls -l
+++ head -2
++ rm test_2.txt
What is the meaning of the number of + signs at the beginning of each row in the file? It's kind of obvious, but I want to avoid misinterpreting.
In addition, can a single + sign appear there? If so, what is the meaning of it?

The number of + represents subshell nesting depth.
Note that the entire test.sh script is being run in a subshell because it doesn't begin with #!/bin/bash. This has to be on the first line of the script, but it's on the second line because you have a newline at the beginning of the echo argument that contains the script.
When a script is run this way, it's executed by the original shell in a subshell, approximately like
( source test.sh )
Change that to
echo '#!/bin/bash
set -x
n=$(echo "a" | wc -c)
for i in $(seq $n)
do
file=test_$i.txt
eval "ls -l | head -$i"> $file
rm $file
done
' > test.sh
and the top-level commands being run in the script will have a single +.
So for example the command
n=$(echo "a" | wc -c)
produces the output
++ echo a
++ wc -c
+ n=' 2'
echo a and wc -c are executed in the subshell created for the command substitution, so they get two +, while n=<result> is executed in the original shell with a single +.

From man bash:
-x
After expanding each simple command, for command, case command, select command, or arithmetic for command, display the expanded value of PS4, followed by the command and its expanded arguments or associated word list.
So what's PS4 here?
PS4
The value of this parameter is expanded as with PS1 and the value is printed before each command bash displays during an execution trace. The first character of the expanded value of PS4 is replicated multiple times, as necessary, to indicate multiple levels of indirection. The default is + .
The meaning of "indirection" is not further explained, as far as I can find...

Related

Different output of command substitution

Why does adding | wc -l alters the result as in the following?
tst:
#!/bin/bash
pgrep tst | wc -l
echo $(pgrep tst | wc -l)
echo $(pgrep tst) | wc -l
$ ./tst
1
2
1
and even
$ bash -x tst
+ wc -l
+ pgrep tst
0
++ pgrep tst
++ wc -l
+ echo 0
0
++ pgrep tst
+ echo
pgrep and subshells can have weird interactions, but in this case that's just a red herring; the actual cause is missing double-quotes around the command substitution:
$ cat tst2
#!/bin/bash
pgrep tst | wc -l
echo "$(pgrep tst | wc -l)"
echo "$(pgrep tst)" | wc -l
$ ./tst2
1
2
2
What's going on in the original script is that in the command
echo $(pgrep tst) | wc -l
pgrep prints two process IDs (the main shell running the script, and a subshell created to handle the echo part of the pipeline). It prints each one as a separate line, something like:
11730
11736
The command substitution captures that, but since it's not in double-quotes the newline between them gets converted to an argument break, so the whole thing becomes equivalent to:
echo 11730 11736 | wc -l
As a result, echo prints both IDs as a single line, and wc -l correctly reports that.
The command substitution induces an additional process that has tst in its name, which is included in the input to wc -l.

Set a command to a variable in bash script problem

Trying to run a command as a variable but I am getting strange results
Expected result "1" :
grep -i nosuid /etc/fstab | grep -iq nfs
echo $?
1
Unexpected result as a variable command:
cmd="grep -i nosuid /etc/fstab | grep -iq nfs"
$cmd
echo $?
0
It seems it returns 0 as the command was correct not actual outcome. How to do this better ?
You can only execute exactly one command stored in a variable. The pipe is passed as an argument to the first grep.
Example
$ printArgs() { printf %s\\n "$#"; }
# Two commands. The 1st command has parameters "a" and "b".
# The 2nd command prints stdin from the first command.
$ printArgs a b | cat
a
b
$ cmd='printArgs a b | cat'
# Only one command with parameters "a", "b", "|", and "cat".
$ $cmd
a
b
|
cat
How to do this better?
Don't execute the command using variables.
Use a function.
$ cmd() { grep -i nosuid /etc/fstab | grep -iq nfs; }
$ cmd
$ echo $?
1
Solution to the actual problem
I see three options to your actual problem:
Use a DEBUG trap and the BASH_COMMAND variable inside the trap.
Enable bash's history feature for your script and use the hist command.
Use a function which takes a command string and executes it using eval.
Regarding your comment on the last approach: You only need one function. Something like
execAndLog() {
description="$1"
shift
if eval "$*"; then
info="PASSED: $description: $*"
passed+=("${FUNCNAME[1]}")
else
info="FAILED: $description: $*"
failed+=("${FUNCNAME[1]}")
done
}
You can use this function as follows
execAndLog 'Scanned system' 'grep -i nfs /etc/fstab | grep -iq noexec'
The first argument is the description for the log, the remaining arguments are the command to be executed.
using bash -x or set -x will allow you to see what bash executes:
> cmd="grep -i nosuid /etc/fstab | grep -iq nfs"
> set -x
> $cmd
+ grep -i nosuid /etc/fstab '|' grep -iq nfs
as you can see your pipe | is passed as an argument to the first grep command.

snakemake rule calls a shell script but exits after first command

I have a shell script that works well if I just run it from command line. When I call it from a rule within snakemake it fails.
The script runs a for loop over a file of identifiers and uses those to grep the sequences from a fastq file followed by multiple sequence alignment and makes a consensus.
Here is the script. I placed some echo statements in there and for some reason it doesn't call the commands. It stops at the grep statement.
I have tried adding set +o pipefail; in the rule but that doesn't work either.
#!/bin/bash
function Usage(){
echo -e "\
Usage: $(basename $0) -r|--read2 -l|--umi-list -f|--outfile \n\
where: ... \n\
" >&2
exit 1
}
# Check argument count
[[ "$#" -lt 2 ]] && Usage
# parse arguments
while [[ "$#" -gt 1 ]];do
case "$1" in
-r|--read2)
READ2="$2"
shift
;;
-l|--umi-list)
UMI="$2"
shift
;;
-f|--outfile)
OUTFILE="$2"
shift
;;
*)
Usage
;;
esac
shift
done
# Set defaults
# Check arguments
[[ -f "${READ2}" ]] || (echo "Cannot find input file ${READ2}, exiting..." >&2; exit 1)
[[ -f "${UMI}" ]] || (echo "Cannot find input file ${UMI}, exiting..." >&2; exit 1)
#Create output directory
OUTDIR=$(dirname "${OUTFILE}")
[[ -d "${OUTDIR}" ]] || (set -x; mkdir -p "${OUTDIR}")
# Make temporary directories
TEMP_DIR="${OUTDIR}/temp"
[[ -d "${TEMP_DIR}" ]] || (set -x; mkdir -p "${TEMP_DIR}")
#RUN consensus script
for f in $( more "${UMI}" | cut -f1);do
NAME=$(echo $f)
grep "${NAME}" "${READ2}" | cut -f1 -d ' ' | sed 's/#M/M/' > "${TEMP_DIR}/${NAME}.name"
echo subsetting reads
seqtk subseq "${READ2}" "${TEMP_DIR}/${NAME}.name" | seqtk seq -A > "${TEMP_DIR}/${NAME}.fasta"
~/software/muscle3.8.31_i86linux64 -msf -in "${TEMP_DIR}/${NAME}.fasta" -out "${TEMP_DIR}/${NAME}.muscle.fasta"
echo make consensus
~/software/EMBOSS-6.6.0/emboss/cons -sequence "${TEMP_DIR}/${NAME}.muscle.fasta" -outseq "${TEMP_DIR}/${NAME}.cons.fasta"
sed -i 's/n//g' "${TEMP_DIR}/${NAME}.cons.fasta"
sed -i "s/EMBOSS_001/${NAME}.cons/" "${TEMP_DIR}/${NAME}.cons.fasta"
done
cat "${TEMP_DIR}/*.cons.fasta" > "${OUTFILE}"
Snakemake rule:
rule make_consensus:
input:
r2=get_extracted,
lst="{prefix}/{sample}/reads/cell_barcode_umi.count"
output:
fasta="{prefix}/{sample}/reads/fasta/{sample}.R2.consensus.fa"
shell:
"sh ./scripts/make_consensus.sh -r {input.r2} -l {input.lst} -f {output.fasta}"
Edit Snakemake error messages I changed some of the paths to a neutral filepath
RuleException:
CalledProcessError in line 29 of ~/user/scripts/consensus.smk:
Command ' set -euo pipefail; sh ./scripts/make_consensus.sh -r ~/user/file.extracted.fastq -l ~/user/cell_barcode_umi
.count -f ~/user/file.consensus.fa ' returned non-zero exit status 1.
File "~/user/scripts/consensus.smk", line 29, in __rule
_make_consensus
File "~/user/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
If there are better ways to do this than using a shell for loop please let me know!
thanks!
Edit
Script ran as standalone: first grep
grep AGGCCGTTCT_TGTGGATG R_extracted/wgs_5_OL_debug.R2.extracted.fastq | cut -f1 -d ' ' | sed 's/#M/M/' > ./fasta/temp/AGGCCGTTCT_TGTGGATG.name
Script ran through snakemake: first 2 grep statements
grep :::::::::::::: R_extracted/wgs_5_OL_debug.R2.extracted.fastq | cut -f1 -d ' ' | sed 's/#M/M/' > ./fasta/temp/::::::::::::::.name
I'm now trying to figure out where those :::: in snakemake are coming from. All ideas welcome
It stops at the grep statement
My guess is that the grep command in make_consensus.sh doesn't capture anything. grep returns exit code 1 in such cases and the non-zero exit status propagates to snakemake. (see also Handling SIGPIPE error in snakemake)
Loosely related... There is an inconsistency between the shebang of make_consensus.sh that says the script should be executed with bash (#!/bin/bash) and the actual execution using sh (sh ./scripts/make_consensus.sh). (In practice it shouldn't make any difference since sh is probably redirected to bash anyway)

sed command find and replace in file and overwrite file , how to initialize file has current file/script

I wanted to increment the current decimal variable,
so I made the following code
#! /bin/bash
k=1.3
file=/home/script.sh
next_k=$(echo "$k + 0.1" | bc -l)
sed -i "s/$k/$next_k/g" "$file"
echo $k
As you can see here I have to specify the file in line 3 , is there a workaround to just tell it to edit and replace in the current file. Instead of me pointing it to the file. Thank you.
I think you're asking how to reference the own script name, which $0 holds, e.g.
#! /bin/bash
k=1.3
next_k=$(echo "$k + 0.1" | bc -l)
sed -i "s/$k/$next_k/g" "$0"
echo $k
You can read more on Positional Parameters here, specifically this bit:
($0) Expands to the name of the shell or shell script. This is set at shell initialization. If Bash is invoked with a file of commands (see Shell Scripts), $0 is set to the name of that file. If Bash is started with the -c option (see Invoking Bash), then $0 is set to the first argument after the string to be executed, if one is present. Otherwise, it is set to the filename used to invoke Bash, as given by argument zero.
e.g.
$ cat test.sh
#! /bin/bash
k=1.3
next_k=$(echo "$k + 0.1" | bc -l)
sed -i "s/$k/$next_k/g" $0
echo $k
$ ./test.sh; ./test.sh ; ./test.sh
1.3
1.4
1.5
$ cat test.sh
#! /bin/bash
k=1.6
next_k=$(echo "$k + 0.1" | bc -l)
sed -i "s/$k/$next_k/g" $0
echo $k

Different pipeline behavior between sh and ksh

I have isolated the problem to the below code snippet:
Notice below that null string gets assigned to LATEST_FILE_NAME='' when the script is run using ksh; but the script assigns the value to variable $LATEST_FILE_NAME correctly when run using sh. This in turn affects the value of $FILE_LIST_COUNT.
But as the script is in KornShell (ksh), I am not sure what might be causing the issue.
When I comment out the tee command in the below line, the ksh script works fine and correctly assigns the value to variable $LATEST_FILE_NAME.
(cd $SOURCE_FILE_PATH; ls *.txt 2>/dev/null) | sort -r > ${SOURCE_FILE_PATH}/${FILE_LIST} | tee -a $LOG_FILE_PATH
Kindly consider:
1. Source Code: script.sh
#!/usr/bin/ksh
set -vx # Enable debugging
SCRIPTLOGSDIR=/some/path/Scripts/TEST/shell_issue
SOURCE_FILE_PATH=/some/path/Scripts/TEST/shell_issue
# Log file
Timestamp=`date +%Y%m%d%H%M`
LOG_FILENAME="TEST_LOGS_${Timestamp}.log"
LOG_FILE_PATH="${SCRIPTLOGSDIR}/${LOG_FILENAME}"
## Temporary files
FILE_LIST=FILE_LIST.temp #Will store all extract filenames
FILE_LIST_COUNT=0 # Stores total number of files
getFileListDetails(){
rm -f $SOURCE_FILE_PATH/$FILE_LIST 2>&1 | tee -a $LOG_FILE_PATH
# Get list of all files, Sort in reverse order, and store names of the files line-wise. If no files are found, error is muted.
(cd $SOURCE_FILE_PATH; ls *.txt 2>/dev/null) | sort -r > ${SOURCE_FILE_PATH}/${FILE_LIST} | tee -a $LOG_FILE_PATH
if [[ ! -f $SOURCE_FILE_PATH/$FILE_LIST ]]; then
echo "FATAL ERROR - Could not create a temp file for file list.";exit 1;
fi
LATEST_FILE_NAME="$(cd $SOURCE_FILE_PATH; head -1 $FILE_LIST)";
FILE_LIST_COUNT="$(cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l)";
}
getFileListDetails;
exit 0;
2. Output when using shell sh script.sh:
+ getFileListDetails
+ rm -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300506.log
+ cd /some/path/Scripts/TEST/shell_issue
+ sort -r
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300506.log
+ ls 1.txt 2.txt 3.txt
+ [[ ! -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp ]]
cd $SOURCE_FILE_PATH; head -1 $FILE_LIST
++ cd /some/path/Scripts/TEST/shell_issue
++ head -1 FILE_LIST.temp
+ LATEST_FILE_NAME=3.txt
cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l
++ cat /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
++ wc -l
+ FILE_LIST_COUNT=3
exit 0;
+ exit 0
3. Output when using ksh ksh script.sh:
+ getFileListDetails
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300507.log
+ rm -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ 2>& 1
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300507.log
+ sort -r
+ 1> /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ cd /some/path/Scripts/TEST/shell_issue
+ ls 1.txt 2.txt 3.txt
+ 2> /dev/null
+ [[ ! -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp ]]
+ cd /some/path/Scripts/TEST/shell_issue
+ head -1 FILE_LIST.temp
+ LATEST_FILE_NAME=''
+ wc -l
+ cat /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ FILE_LIST_COUNT=0
exit 0;+ exit 0
OK, here goes...this is a tricky and subtle one. The answer lies in how pipelines are implemented. POSIX states that
If the pipeline is not in the background (see Asynchronous Lists), the shell shall wait for the last command specified in the pipeline to complete, and may also wait for all commands to complete.)
Notice the keyword may. Many shells implement this in a way that all commands need to complete, e.g. see the bash manpage:
The shell waits for all commands in the pipeline to terminate before returning a value.
Notice the wording in the ksh manpage:
Each command, except possibly the last, is run as a separate process; the shell waits for the last command to terminate.
In your example, the last command is the tee command. Since there is no input to tee because you redirect stdout to ${SOURCE_FILE_PATH}/${FILE_LIST} in the command before, it immediately exits. Oversimplified speaking, the tee is faster than the earlier redirection, which means that your file is probably not finished writing to by the time you are reading from it. You can test this (this is not a fix!) by adding a sleep at the end of the whole command:
$ ksh -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; echo "[$(head -n 1 /tmp/foo.txt)]"'
[]
$ ksh -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; sleep 0.1; echo "[$(head -n 1 /tmp/foo.txt)]"'
[/tmp/sess_vo93c7h7jp2a49tvmo7lbn6r63]
$ bash -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; echo "[$(head -n 1 /tmp/foo.txt)]"'
[/tmp/sess_vo93c7h7jp2a49tvmo7lbn6r63]
That being said, here are a few other things to consider:
Always quote your variables, especially when dealing with files, to avoid problems with globbing, word splitting (if your path contains spaces) etc.:
do_something "${this_is_my_file}"
head -1 is deprecated, use head -n 1
If you only have one command on a line, the ending semicolon ; is superfluous...just skip it
LATEST_FILE_NAME="$(cd $SOURCE_FILE_PATH; head -1 $FILE_LIST)"
No need to cd into the directory first, just specify the whole path as argument to head:
LATEST_FILE_NAME="$(head -n 1 "${SOURCE_FILE_PATH}/${FILE_LIST}")"
FILE_LIST_COUNT="$(cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l)"
This is called Useless Use Of Cat because the cat is not needed - wc can deal with files. You probably used it because the output of wc -l myfile includes the filename, but you can use e.g. FILE_LIST_COUNT="$(wc -l < "${SOURCE_FILE_PATH}/${FILE_LIST}")" instead.
Furthermore, you will want to read Why you shouldn't parse the output of ls(1) and How can I get the newest (or oldest) file from a directory?.

Resources