In snakemake, what is the recommended way to use the shell() function to execute multiple commands?
You can call shell() multiple times within the run block of a rule (rules can specify run: rather than shell:):
rule processing_step:
    input:
        # [...]
    output:
        # [...]
    run:
        shell("somecommand {input} > tempfile")
        shell("othercommand tempfile {output}")
Otherwise, since the run block accepts Python code, you could build a list of commands as strings and iterate over them:
rule processing_step:
    input:
        # [...]
    output:
        # [...]
    run:
        commands = [
            "somecommand {input} > tempfile",
            "othercommand tempfile {output}"
        ]
        for c in commands:
            shell(c)
If you don't need Python code during the execution of the rule, you can use triple-quoted strings within a shell block, and write the commands as you would within a shell script. This is arguably the most readable for pure-shell rules:
rule processing_step:
    input:
        # [...]
    output:
        # [...]
    shell:
        """
        somecommand {input} > tempfile
        othercommand tempfile {output}
        """
If the shell commands depend on the success/failure of the preceding command, they can be joined with the usual shell script operators like || and &&:
rule processing_step:
    input:
        # [...]
    output:
        # [...]
    shell:
        "command_one && echo 'command_one worked' || echo 'command_one failed'"
Thought I would throw in this example. It may not be a direct answer to the question, but I came across it while searching for something similar: how to run multiple shell commands and run some of them in a particular directory (for various reasons).
To keep things clean you could use a shell script.
Say I have a shell script scripts/run_somecommand.sh that does the following:
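#!/usr/bin/env sh
# Resolve paths to absolute ones first, because the cd below changes what relative paths mean
input=$(realpath "$1")
output=$(realpath "$2")
log=$(realpath "$3")
sample="$4"
mkdir -p "data/analysis/${sample}"
cd "data/analysis/${sample}"
# Quoting keeps paths containing spaces intact
somecommand --input "${input}" --output "${output}" 2> "${log}"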
Then in your Snakemake rule you can do this
rule somerule:
    input:
        "data/{sample}.fastq"
    output:
        "data/analysis/{sample}/{sample}_somecommand.json"
    log:
        "logs/somecommand_{sample}.log"
    shell:
        "scripts/run_somecommand.sh {input} {output} {log} {sample}"
Note: If you are working on a Mac and don't have realpath, you can install it with brew install coreutils. It's a super handy command.
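If a separate script feels like overkill, a subshell is another way to run a few commands in a particular directory. This is a minimal sketch, assuming a POSIX shell; the temporary directory and the where.txt file name are placeholders invented for illustration and are not part of the rule above. The cd only applies inside the parentheses, so the calling shell stays where it was:

```shell
# Hypothetical working directory, standing in for data/analysis/{sample}
workdir=$(mktemp -d)
start_dir=$(pwd)

(
    # Everything inside the parentheses runs in a child shell
    cd "$workdir" || exit 1
    pwd > where.txt
)

# Back in the parent shell: the working directory is unchanged
end_dir=$(pwd)
echo "started in: $start_dir"
echo "ended in:   $end_dir"
```

Paths such as {input} and {output} are relative to the workflow's working directory, which is why the script above resolves them with realpath before changing directory. The same care applies when using a subshell.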
Related
Suppose I write:
SHELL:=/usr/bin/env bash
commands:
	source sourceme.sh && alias
But I wanted to push that source sourceme.sh back into the SHELL declaration:
SHELL:=/usr/bin/env bash -c "source sourceme.sh" # or something along these lines
Can this be done, and if so, how?
No, you can't do that. Make takes each recipe line and sends it to, effectively, $(SHELL) -c <recipeline>. The shell, unfortunately, doesn't accept multiple -c options, so to do what you want, make would need a way to insert that string at the beginning of every recipe line, and there's no way to do that.
You can do it yourself, from outside your makefile, by writing your own wrapper:
$ cat wrapper.sh
#!/usr/bin/env bash
shift
source sourceme.sh
eval "$@"
$ cat Makefile
SHELL := wrapper.sh
commands:
	alias
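To see why the wrapper starts with shift: make calls it as wrapper.sh -c '<recipe line>', so the first argument is the -c flag and the rest is the recipe text. The sketch below imitates that call with a shell function, assuming make's default shell flags (just -c). The greeting variable is a made-up stand-in for whatever sourceme.sh would define:

```shell
# Imitates wrapper.sh; make would invoke it as: wrapper.sh -c '<recipe line>'
wrapper() {
    shift               # drop the leading -c that make passes
    greeting="hello"    # hypothetical stand-in for: source sourceme.sh
    eval "$@"           # run the recipe line in this same environment
}

out=$(wrapper -c 'echo "$greeting from the recipe"')
echo "$out"
```

Because the recipe text is evaluated in the shell that already ran the setup step, anything that step defined is visible to the recipe.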
I need to run below code as a single line in docker run -it image_name -c \bin\bash --script with --script below
(dir and dockerImageName being parameters)
'''cd ''' + dir + ''' \
&& if make image ''' + dockerImageName + ''' 2>&1 | grep -m 1 "No rule to make target"; then
    exit 1
fi'''
How can this be run as a single line?
You can abstract all of this logic into your higher-level application. If you can't do this, write a standard shell script and COPY it into your image.
The triple quotes look like Python syntax. You can break this up into three parts:
The cd $dir part specifies the working directory for the subprocess;
make ... is an actual command to run;
You're inspecting its output for some condition.
In Python, you can call subprocess.run() with an array of arguments and specify these various things at the application level. The array of arguments isn't reinterpreted by a shell and so protects you from this particular security issue. You might run:
import subprocess

completed = subprocess.run(['make', 'image', dockerImageName],
                           cwd=dir,
                           stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT,
                           text=True)  # decode output to str so the substring test below works
if 'No rule to make target' in completed.stdout:
    ...
If you need to do this as a shell script, writing it as a proper shell script and making sure to quote your arguments protects you in the same way.
#!/bin/sh
set -e
cd "$1"
if make image "$2" 2>&1 | grep -m 1 "No rule to make target"; then
    exit 1
fi
You should never construct a command line by combining strings in the way you've shown. This makes you vulnerable to a shell injection attack. Especially if an attacker knows that the user has permissions to run docker commands, they can set
dir = '.; docker run --rm -v /:/host busybox cat /host/etc/shadow'
and get a file of encrypted passwords they can crack at their leisure. Pretty much anything else is possible once the attacker uses this technique to get unlimited root-level read/write access to the host filesystem.
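The difference between pasting a value into a command string and passing it as a quoted argument can be seen with a harmless payload. In this sketch, echo INJECTED stands in for the attacker's command; the payload and the variable names are illustrative only:

```shell
# Attacker-controlled value; "echo INJECTED" stands in for something harmful
dir='.; echo INJECTED'

# Unsafe: the value is spliced into the command text and parsed by the shell
unsafe=$(sh -c "printf '%s\n' $dir")

# Safe: the value travels as a separate argument and is only ever data
safe=$(sh -c 'printf "%s\n" "$1"' sh "$dir")

echo "unsafe output: $unsafe"
echo "safe output:   $safe"
```

In the unsafe case the semicolon ends the printf command and the injected command runs. In the safe case the whole string, semicolon included, is printed as a single argument.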
In shell scripts, our corporate coding standard requires using...
set -x
command
set +x
...for logging, rather than...
echo "doing command"
command
However, when a wildcard is part of the command, this can produce very verbose output.
For example...
for i in {1..10}; do touch $i.foo; done; # create 10 foo files
set -x # log command execution (stdout to be redirected to log file)
rm *.foo # delete foo files
set +x # end logging
...produces the output...
rm 10.foo 1.foo 2.foo 3.foo 4.foo 5.foo 6.foo 7.foo 8.foo 9.foo
Okay for 10 files, but not so great for 10,000.
The desired output is...
rm *.foo
My first thought was to put *.foo in quotes...
rm "*.foo"
However, that gives the error...
rm: cannot remove ‘*.foo’: No such file or directory
Is there a way, using set -x, to echo the command without expanding the wildcard?
For many cases, where simple '-x' or '-v' do not work (as per comments above), and staying within your coding standard (no separate echo), consider:
VAR=/tmp/123
$SHELL -cv "ls $VAR/*"
which executes the command and logs it WITH variable and command substitution applied, but WITHOUT wildcard expansion.
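A quick way to check this behaviour is sketched below. It assumes bash is installed, and calls it by name rather than through $SHELL so the result does not depend on the login shell. The temporary directory and file names are made up for the demonstration:

```shell
# Scratch directory with two files to delete
tmp=$(mktemp -d)
touch "$tmp/1.foo" "$tmp/2.foo"

# -v makes the child shell print the command text as it reads it (on stderr),
# before any wildcard expansion; 2>&1 captures that text for inspection
logged=$(cd "$tmp" && bash -cv 'rm *.foo' 2>&1)

echo "logged: $logged"
remaining=$(find "$tmp" -name '*.foo' | wc -l | tr -d ' ')
echo "files left: $remaining"
```

Single quotes are used here so the calling shell does not expand the wildcard itself. With double quotes, as in the answer, variables are substituted by the calling shell before the child shell sees the text.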
I compile my project with:
debug=yes make -j4 or debug=no make -j4
The debug variable changes some compiler flags in the Makefile
Instead of typing this repeatedly in the shell, I wrote this script (let's call it daemon):
#!/bin/bash
inotifywait -q -m -e close_write `ls *.c* *.h*` |
while read; do
    make -j4
done
so I just do ./daemon which automatically builds whenever a file is written to.
However, I would like to be able to pass the debug=no make -j4 to the ./daemon script like this:
./daemon debug=no make -j4
So I modified the script:
#!/bin/bash
if [ $# -lt 1 ]; then
    echo "Usage `basename $0` [COMMAND]"
    exit 1;
fi
inotifywait -q -m -e close_write `ls *.c* *.h*` |
while read; do
    "$@"
done
This works with ./daemon make -j4, but when I run ./daemon debug=no make -j4 I get the following error:
./daemon: line 9: debug=no: command not found
How can I make it so debug=no is recognized as a variable and not a command in the daemon script?
Thanks
The expansion of "$@" is parsed after any pre-command assignments are recognized. All you need to do is ensure that debug=... is in the environment of the command that runs make, which is your daemon script:
debug=no ./daemon make -j4
Variable expansions will only ever become arguments (including the zeroth argument: the command name).
They will never become:
Redirections, so you can't var='> file'; cmd $var
Shell keywords or operators, so you can't var='&'; mydaemon $var
Assignments, including prefix assignments, so you can't var='debug=yes'; $var make as you discovered
Command expansions, loops, process substitutions, &&/||, escape sequences, or anything else.
If you want to do this though, you're in luck: there's a standard POSIX tool that will turn leading key=value pairs into environment variables and run the program you want.
It's called env. Here's an example:
run() {
    env "$@"
}
run debug=yes make -j 4
Though TBH I'd use chepner's solution
You always need to put the (env) variable settings at the beginning of the command, i.e. before "daemon".
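The difference between running "$@" directly and going through env can be checked in isolation. This sketch uses two hypothetical helper functions, run_plain and run_env, with true and a small sh -c one-liner standing in for make:

```shell
run_plain() { "$@"; }        # first word is always treated as a command name
run_env()   { env "$@"; }    # env turns leading key=value words into variables

# Direct expansion: the shell looks for a command literally named "debug=no"
plain_status=0
run_plain debug=no true 2>/dev/null || plain_status=$?

# Through env: debug=no lands in the environment of the command that follows
env_out=$(run_env debug=no sh -c 'echo "$debug"')

echo "plain exit status: $plain_status"
echo "value seen via env: $env_out"
```

The non-zero status in the first case is the same "command not found" failure reported in the question.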
I have configured a Jenkins job to source a bash script. That script sources a second bash script, which adds an alias to the .bashrc of its user and sources the .bashrc itself; the original script then tries to use that alias. However, it cannot seem to find the alias it has just created. I am not using any scripting plugins aside from a "Send files or execute commands over SSH" build step to source the script.
The job does this:
source ./test_script.sh
test_script.sh looks like this:
echo "In test_script.sh"
echo $USER
echo $HOME
source ./setup_env.sh
echo "\nBack in test_script.sh"
alias foo
foo
And finally, setup_env.sh looks like this:
echo "\nIn setup_env.sh"
echo "alias foo=\"echo foobar\"" >> $HOME/.bashrc
source $HOME/.bashrc 2>/dev/null
cat $HOME/.bashrc
The output I receive from the Jenkins job looks like this:
In test_script.sh
my_user
/home/my_user
\nIn setup_env.sh
...all of my bashrc...
alias foo="echo foo"
\nBack in test_script.sh
alias foo='echo foo'
./test_script.sh: line 7: foo: command not found
I don't understand why this is happening, when I can happily run it myself on the command-line and watch it succeed. Why can't Jenkins use the new alias, when it can obviously find it (as demonstrated by the output of the alias foo command)?
For anyone else who's having this problem, you may need to set the expand_aliases shell option, which seems to be off by default with Jenkins:
shopt expand_aliases # check if it's on
shopt -s expand_aliases # set expand_aliases option to true
shopt expand_aliases # it should be on now
# test with a simple alias, should print 1
alias x="python -c 'print(1)'"
x
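The effect of the option can be reproduced outside Jenkins, since a non-interactive bash has it off by default. This sketch assumes bash is on the PATH; the alias mirrors the one from the question:

```shell
# Without the option: a non-interactive bash defines the alias but never expands it
without_status=0
bash -c 'alias foo="echo foobar"
foo' >/dev/null 2>&1 || without_status=$?

# With the option set on an earlier line, the alias is expanded
with_out=$(bash -c 'shopt -s expand_aliases
alias foo="echo foobar"
foo')

echo "status without expand_aliases: $without_status"
echo "output with expand_aliases:    $with_out"
```

The alias definition and its use sit on separate lines on purpose: bash expands aliases when it reads a line, so an alias defined and used on the same line is not picked up.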
The \n showing in the output of your echo commands suggests this is not running under bash, as you may be expecting. Please check the settings of your jenkins user (or whichever user you run Jenkins as), especially the default shell.
To test this, add one of these lines at the beginning of your script:
env
or
env | grep -i shell
You should also make sure your scripts run under the correct shell by adding the "shebang" line as the first line of each script.
In the case of bash, for example, add the following first line:
#!/bin/bash
(it is not a comment, despite what the auto-syntax-highlighter may think...)
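Separately from which shell is in use, echo's handling of backslash escapes such as \n varies between shells and shell options, so printf is the more predictable way to print a leading newline. Here is a sketch of the two messages from the scripts above rewritten that way:

```shell
# printf interprets \n in its format string in every POSIX shell
printf '\nIn setup_env.sh\n'
printf '\nBack in test_script.sh\n'

# Count the lines produced by one message: a blank line plus the text
line_count=$(printf '\nBack in test_script.sh\n' | wc -l | tr -d ' ')
echo "lines printed: $line_count"
```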