Snakemake with Singularity - cluster-computing

I'm trying to use Singularity within one of my Snakemake rules. This works as expected when running my Snakemake pipeline locally. However, when I try to submit using sbatch onto my computing cluster, I run into errors. I'm wondering if you have any suggestions about how to translate the local pipeline to one that can work on the cluster. Thank you in advance!
The rule which causes errors uses Singularity to call variants with DeepVariant:
# Call variants with DeepVariant.
rule deepvariant_call:
    input:
        ref_path='/labs/jandr/walter/varcal/data/refs/{ref}.fa',
        bam='results/{samp}/bams/{samp}_{mapper}_{ref}.rmdup.bam'
    params:
        nshards='1',
        version='0.7.0'
    threads: 8
    output:
        vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_deep.g.vcf.gz'
    shell:
        'singularity exec --bind /srv/gsfs0 --bind /labs/jandr/walter/ /home/kwalter/.singularity/shub/deepvariant-docker-deepvariant:0.7.0.simg \
        /labs/jandr/walter/tb/test/scripts/call_deepvariant.sh {input.ref_path} {input.bam} {params.nshards} {params.version} {output.vcf} '
#
# Error in rule deepvariant_call:
#     jobid: 17
#     output: results/T1-XX-2017-1068_S51/vars/T1-XX-2017-1068_S51_bowtie2_H37Rv_deep.g.vcf.gz
#     shell:
#         singularity exec --bind /srv/gsfs0 --bind /labs/jandr/walter/ /home/kwalter/.singularity/shub/deepvariant-docker-deepvariant:0.7.0.simg; /labs/jandr/walter/tb/test/scripts/call_deepvariant.sh /labs/jandr/walter/varcal/data/refs/H37Rv.fa results/T1-XX-2017-1068_S51/bams/T1-XX-2017-1068_S51_bowtie2_H37Rv.rmdup.bam 1 0.7.0 results/T1-XX-2017-1068_S51/vars/T1-XX-2017-1068_S51_bowtie2_H37Rv_deep.g.vcf.gz
#         (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
I submit jobs to the cluster with the following:
snakemake -j 128 --cluster-config cluster.json --cluster "sbatch -A {cluster.account} --mem={cluster.mem} -t {cluster.time} -c {threads}"

As can be seen in the resolved command in the error message, a semicolon (rather than whitespace) separates the two lines of the shell: directive, so the error comes from how the string in shell: is formatted.
You could use triple-quoted format:
shell:
    '''
    singularity exec --bind /srv/gsfs0 --bind /labs/jandr/walter/ /home/kwalter/.singularity/shub/deepvariant-docker-deepvariant:0.7.0.simg \
    /labs/jandr/walter/tb/test/scripts/call_deepvariant.sh {input.ref_path} {input.bam} {params.nshards} {params.version} {output.vcf}
    '''
Or, put each line in its own pair of single quotes; Snakemake concatenates adjacent strings, so the trailing backslash is no longer needed (note the space at the end of the first string):
shell:
    'singularity exec --bind /srv/gsfs0 --bind /labs/jandr/walter/ /home/kwalter/.singularity/shub/deepvariant-docker-deepvariant:0.7.0.simg '
    '/labs/jandr/walter/tb/test/scripts/call_deepvariant.sh {input.ref_path} {input.bam} {params.nshards} {params.version} {output.vcf}'
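Either way, before resubmitting with sbatch you can confirm that the command now resolves to a single line; a dry run prints the shell command without executing anything (target path taken from the error message above):
snakemake -np results/T1-XX-2017-1068_S51/vars/T1-XX-2017-1068_S51_bowtie2_H37Rv_deep.g.vcf.gz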

Related

Why are shell builtins not found when using Kubectl exec

I am making a bash script to copy files from a Kubernetes pod running Debian. When I include the following line:
kubectl --namespace "$namesp" exec "$pod" -c "$container" -- cd /var
it errors out:
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "cd": executable file not found in $PATH: unknown
command terminated with exit code 126
I also tried
kubectl --namespace "$namesp" exec "$pod" -c "$container" -- builtin
kubectl --namespace "$namesp" exec "$pod" -c "$container" -it -- cd /var
which gave the same result.
I was able to resolve the issue by changing the command to:
kubectl --namespace "$namesp" exec "$pod" -c "$container" -- /bin/bash -c "builtin"
Would love to understand why the first command(s) don't work and the latter one does. I would have thought that builtin commands are the one group of commands that would always be found, in contrast to commands that rely on the PATH environment variable.
kubectl exec is used to execute an executable in a running container. The command has to be built into the container.
Neither builtin nor cd are valid executables in your container. Only /bin/bash is.
To execute a builtin shell command, you have to execute the shell and call it as the command argument like in your third example.
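For the original goal of copying files out of the pod, a minimal sketch (assuming bash and tar are available in the container; the paths and local target directory are illustrative) is to run the whole pipeline inside one shell and stream the archive out; kubectl cp is another option for simple copies:
# cd and tar run inside the container's shell; the local tar unpacks the stream
kubectl --namespace "$namesp" exec "$pod" -c "$container" -- \
  /bin/bash -c "cd /var && tar cf - log" | tar xf - -C ./local-copy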

Passing Variables in Makefile

I'm using a Makefile to run various docker-compose commands and I'm trying to capture the output of a script run on my local machine and pass that value to a Docker image.
start-service:
	VERSION=$(shell aws s3 ls s3://redact/downloads/1.2.3/) && \
	docker-compose -f ./compose/docker-compose.yml run \
	-e VERSION=$$(VERSION) \
	connect make run-service
When I run this I can see the variable being assigned but it still errors. Why is the value not getting passed into the -e argument:
VERSION=1.2.3-build342 && \
docker-compose -f ./compose/docker-compose.yml run --rm \
-e VERSION?=$(VERSION) \
connect make run-connect
/bin/sh: VERSION: command not found
You're mixing several different Bourne shell and Make syntaxes here. The Make $$(VERSION) translates to shell $(VERSION), which is command-substitution syntax; GNU Make $(shell ...) generally expands at the wrong time and isn't what you want here.
If you were writing this as an ordinary shell command it would look like
# Set VERSION using $(...) substitution syntax
# Refer to just plain $VERSION
VERSION=$(aws s3 ls s3://redact/downloads/1.2.3/) && ... \
  -e VERSION=$VERSION ... \
So when you use this in a Make context, since none of these are Make variables (they get set and used in the same command), just double the $ to $$ to escape them:
start-service:
	VERSION=$$(aws s3 ls s3://redact/downloads/1.2.3/) && \
	docker-compose -f ./compose/docker-compose.yml run \
	  -e VERSION=$$VERSION \
	  connect make run-service
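To check what the shell will actually receive, you can ask Make to print the recipe without running it (output is illustrative):
$ make -n start-service
VERSION=$(aws s3 ls s3://redact/downloads/1.2.3/) && \
docker-compose -f ./compose/docker-compose.yml run \
  -e VERSION=$VERSION \
  connect make run-service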

How to call multiple multiline commands in a yml script?

I'm not quite sure how to add a multiline script with multiple commands to the yml file of my CI, which in my case is a .gitlab-ci.yml:
production:
  stage: deploy
  image: ${DOCKER_IMAGE}
  script:
    - while IFS='-' read -r dom app; do
        docker stop "$dom-$app" || true &&
        docker rm "$dom-$app" || true
        docker run
        --name "$dom-$app"
        --detach
        --restart=always
        -e VIRTUAL_HOST=$dom
        "$dom-$app":latest
      done < $FILE
So what I'm doing here is to read a file with a list of apps. For each line I have to stop the existing docker image, remove it and run the new one with some parameters.
How do I have to connect the docker commands (stop, rm and run)? Maybe a &&?
Do I have to use " for $dom-$app?
There are several ways to create multiline strings in YAML.
The way you are using the multiline plain string, all lines will be folded together with spaces.
Also your last line of the string isn't indented enough.
Longer strings like that should be quoted, because chances are high that there is a : or # inside the string which is special in YAML.
I suggest using literal block style, because that means the text will be interpreted exactly as you see it:
script:
- |
  while IFS='-' read -r dom app; do
    docker stop "$dom-$app" || true
    docker rm "$dom-$app" || true
    docker run \
      --name "$dom-$app" \
      --detach \
      --restart=always \
      -e VIRTUAL_HOST=$dom \
      "$dom-$app":latest
  done < $FILE
(Note that sequences (items starting with -) don't have to be indented, that's why the - is directly below script.)
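For comparison, the plain (unquoted) multiline style from the question is folded into a single line, so the runner would effectively be handed something like this (illustrative):
while IFS='-' read -r dom app; do docker stop "$dom-$app" || true && docker rm "$dom-$app" || true docker run --name "$dom-$app" --detach --restart=always -e VIRTUAL_HOST=$dom "$dom-$app":latest done < $FILE
Here docker run and its options end up as arguments to the preceding command, and done never appears at the start of a command, so the loop is never closed and the shell reports a syntax error.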
You can find more information about YAML quoting styles on my website:
https://www.yaml.info/learn/quote.html

Not getting a build failure status even when the build does not run successfully (cloud-build remote builder)

Cloud Build is not showing a build failure status.
I created my own remote-builder which scps all files from /workspace to my instance and runs the build using gcloud compute ssh -- COMMAND
remote-builder
#!/bin/bash
USERNAME=${USERNAME:-admin}
REMOTE_WORKSPACE=${REMOTE_WORKSPACE:-/home/${USERNAME}/workspace/}
GCLOUD=${GCLOUD:-gcloud}
KEYNAME=builder-key
ssh-keygen -t rsa -N "" -f ${KEYNAME} -C ${USERNAME} || true
chmod 400 ${KEYNAME}*
cat > ssh-keys <<EOF
${USERNAME}:$(cat ${KEYNAME}.pub)
EOF
${GCLOUD} compute scp --compress --recurse \
  $(pwd)/ ${USERNAME}@${INSTANCE_NAME}:${REMOTE_WORKSPACE} \
  --ssh-key-file=${KEYNAME}
${GCLOUD} compute ssh --ssh-key-file=${KEYNAME} \
  ${USERNAME}@${INSTANCE_NAME} -- ${COMMAND}
Below is an example of the code used to run the build (cloudbuild.yaml):
steps:
- name: gcr.io/$PROJECT_ID/remote-builder
  env:
    - COMMAND="docker build -t [image_name]:[tagname] -f Dockerfile ."
During the docker build inside the Dockerfile the build fails and shows errors in the log, but the status still shows SUCCESS.
Can anyone help me resolve this?
Thanks in advance.
Try adding
|| exit 1
at the end of your docker command. Alternatively, you might just need to change the entrypoint to 'bash' and run the script manually.
To confirm -- the first part was the run-on.sh script, and the second part was your cloudbuild.yaml right? I assume you trigger the build manually via UI and/or REST API?
I wrote all the docker commands in a bash script and added the error-handling code below to it.
handle_error() {
  echo "FAILED: line $1, exit code $2"
  exit 1
}
trap 'handle_error $LINENO $?' ERR
It works!
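This works because the ERR trap makes the script exit non-zero, gcloud compute ssh passes that exit status back, and Cloud Build marks the step as failed when the builder exits non-zero. A quick way to check the propagation (the script name is illustrative, reusing the variables from remote-builder above):
${GCLOUD} compute ssh --ssh-key-file=${KEYNAME} ${USERNAME}@${INSTANCE_NAME} -- "bash workspace/build.sh"
echo "remote exit status: $?"   # non-zero here is what turns the build status into FAILURE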

Run inline command with pipe in docker container [duplicate]

I'm trying to run MULTIPLE commands like this.
docker run image cd /path/to/somewhere && python a.py
But this gives me "No such file or directory" error because it is interpreted as...
"docker run image cd /path/to/somewhere" && "python a.py"
It seems that some ESCAPE characters like "" or () are needed.
So I also tried
docker run image "cd /path/to/somewhere && python a.py"
docker run image (cd /path/to/somewhere && python a.py)
but these didn't work.
I have searched the Docker Run Reference but have not found any hints about ESCAPE characters.
To run multiple commands in docker, use /bin/bash -c and a semicolon ;
docker run image_name /bin/bash -c "cd /path/to/somewhere; python a.py"
If you need command2 (python) to be executed if and only if command1 (cd) returned a zero (no error) exit status, use && instead of ;
docker run image_name /bin/bash -c "cd /path/to/somewhere && python a.py"
You can do this a couple of ways:
Use the -w option to change the working directory:
-w, --workdir="" Working directory inside the container
https://docs.docker.com/engine/reference/commandline/run/#set-working-directory--w
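For example, with the paths from the question:
# -w sets the working directory inside the container, so no explicit cd is needed
docker run -w /path/to/somewhere image python a.py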
Pass the entire argument to /bin/bash:
docker run image /bin/bash -c "cd /path/to/somewhere; python a.py"
You can also pipe commands inside the Docker container with bash -c "<command1> | <command2>", for example:
docker run img /bin/bash -c "ls -1 | wc -l"
But without invoking the shell in the container, the pipe is handled by your local shell and the output is redirected to your local terminal.
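For example, without the quotes the pipe is interpreted by your local shell instead:
docker run img ls -1 | wc -l    # ls runs in the container, wc -l runs on your local machine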
bash -c works well if the commands you are running are relatively simple. However, if you're trying to run a long series of commands full of control characters, it can get complex.
I successfully got around this by piping my commands into the process from the outside, i.e.
cat script.sh | docker run -i <image> /bin/bash
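The same idea works with a here-document, so no separate script file is needed (a sketch using the directory and script from the question):
docker run -i <image> /bin/bash <<'EOF'
cd /path/to/somewhere
python a.py
EOF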
Just to make a proper answer from @Eddy Hernandez's comment, which is very correct since Alpine comes with ash, not bash.
The question now refers to Starting a shell in the Docker Alpine container, which implies using sh or ash or /bin/sh or /bin/ash.
Based on the OP's question:
docker run image sh -c "cd /path/to/somewhere && python a.py"
If you want to store the result in a file outside the container, on your local machine, you can do something like this:
RES_FILE=$(readlink -f /tmp/result.txt)
docker run --rm -v ${RES_FILE}:/result.txt img bash -c "grep root /etc/passwd > /result.txt"
The result of your commands will be available in /tmp/result.txt on your local machine.
For anyone else who came here looking to do the same with docker-compose, you just need to prepend bash -c and enclose the multiple commands in quotes, joined together with &&.
So in the OP's example: docker-compose run image bash -c "cd /path/to/somewhere && python a.py"
If you don't mind the commands running in a subshell, put a set of parentheses around the multiple commands and pass the whole thing to a shell:
docker run image sh -c "(cd /path/to/somewhere && python a.py)"
TL;DR;
$ docker run --entrypoint /bin/sh image_name -c "command1 && command2 && command3"
A concern regarding the accepted answer is below.
Nobody has mentioned that docker run image_name /bin/bash -c just appends a command to the entrypoint. Some popular images are smart enough to process this correctly, but some are not.
Imagine the following Dockerfile:
FROM alpine
ENTRYPOINT ["echo"]
If you build it as an image named echo and run:
$ docker run echo /bin/sh -c date
You will get your command appended to the entrypoint, so that result would be echo "/bin/sh -c date".
Instead, you need to override the entrypoint:
$ docker run --entrypoint /bin/sh echo -c date
Docker run reference
In case it's not obvious, if a.py always needs to run in a particular directory, create a simple wrapper script which does the cd and then runs the script.
In your Dockerfile, replace
CMD [ "python", "a.py" ]
or whatever with
CMD [ "/wrapper" ]
and create a script wrapper in your root directory (or wherever it's convenient for you) with contents like
#!/bin/sh
set -e
cd /path/to/somewhere
python a.py
In many situations, perhaps also consider rewriting a.py so that it doesn't need a wrapper. Either make it os.chdir() where it needs to be, or have it look for its data files in a directory you configure in its environment or similar.
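For the environment-variable variant, the run command could look like this (DATA_DIR is a hypothetical name that a.py would read):
docker run -e DATA_DIR=/path/to/somewhere image python a.py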
