I have a problem which I think should be very easy to fix, but for some reason I can't solve it.
I am using GNU parallel inside of Snakemake to perform some variant calling.
My input file (contigs.txt) looks something like this:
GL000207.1
GL000226.1
GL000229.1
GL000231.1
GL000210.1
GL000239.1
GL000235.1
The command that ultimately gets executed looks something like this:
"name=$(bash get_name.sh normal_recal.bam) \n"
"cat {input.contigs} | "
"env_parallel --env name --jobs {threads} "
"'({input.GATK} Mutect2 -I {input.tumor} -I {input.normal} "
"-R {input.reference} -normal $name "
"--native-pair-hmm-threads 4 "
"-L {{}} --germline-resource {input.gnomAD} "
"-O {input.path}/{wildcards.sample_id}/{{}}.somatic.vcf "
"--f1r2-tar-gz {input.path}/{wildcards.sample_id}/{{}}.f1r2.tar.gz) &> {log.err}.{{}}.err' \n"
If I execute this script in my interactive terminal, everything works as expected. However, when I try to execute it inside Snakemake, the program just ends without an error. I think I have tracked down the problem with this simple example:
cat contigs | env_parallel --env name --jobs 4 'echo $name'
This prints something like this:
sample_tumor GL000207.1
sample_tumor GL000226.1
sample_tumor GL000229.1
sample_tumor GL000231.1
I think the problem is that the input line also gets passed into the $name variable, which causes the program to crash.
How can I make sure that only the actual $name (here, sample_tumor) gets passed, so that the output of the above example would be:
sample_tumor
sample_tumor
sample_tumor
sample_tumor
Cheers!
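The extra field in that output is consistent with how GNU parallel builds commands: when the command contains no replacement string such as {}, the input line is appended as the last argument, so `echo $name` receives one extra argument. The sketch below reproduces the effect with xargs, which is used here only as a stand-in for GNU parallel (an illustrative substitution; it does not call parallel itself).

```shell
#!/bin/sh
# Stand-in demo: xargs, like GNU parallel, appends each input line to the
# command when the command has no placeholder for it.
name=sample_tumor

# Input appended as an extra argument: prints "sample_tumor GL000207.1", ...
printf '%s\n' GL000207.1 GL000226.1 | xargs -n 1 echo "$name"

# With -I {}, xargs only substitutes the line where {} appears; this echo
# has no {}, so each input line prints just "sample_tumor".
printf '%s\n' GL000207.1 GL000226.1 | xargs -I {} echo "$name"
```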
It turns out my guess about the cause of the error was wrong. I think I misused env_parallel.
This code solved my problem.
"name=$(bash get_name.sh normal_recal.bam) \n"
"export name \n"
"cat {input.contigs} | "
"parallel --jobs {threads} "
"'({input.GATK} Mutect2 -I {input.tumor} -I {input.normal} "
"-R {input.reference} -normal $name "
"--native-pair-hmm-threads 4 "
"-L {{}} --germline-resource {input.gnomAD} "
"-O {input.path}/{wildcards.sample_id}/{{}}.somatic.vcf "
"--f1r2-tar-gz {input.path}/{wildcards.sample_id}/{{}}.f1r2.tar.gz) &> {log.err}.{{}}.err'"
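The `export name` line is what makes this version work. GNU parallel runs each job through a shell it starts itself, and a child process only sees the variables its parent has exported. The sketch below demonstrates just that inheritance rule, with `sh -c` standing in for a parallel job (an illustrative substitution). The value sample_tumor is the example value from the question.

```shell
#!/bin/sh
# A child shell only sees variables that the parent shell has exported.
name=sample_tumor
sh -c 'echo "unexported: [$name]"'   # prints "unexported: []"

export name
sh -c 'echo "exported: [$name]"'     # prints "exported: [sample_tumor]"
```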
Related
I need to extract some variables and functions from a zsh script into a bash script. Is there any way to do this? What I've tried (some are embarrassingly wrong, but covering everything):
. /script/path.zsh (zsh-isms exist, so it fails)
exec zsh
. /script/path.zsh
exec bash
zsh << 'EOF'
. /script/path.zsh
EOF
chsh -s zsh
. /script/path.zsh
chsh -s bash
This thread is the closest I've found. Unfortunately, I have too many items to import for that to be feasible, and neither script is anywhere near a polyglot. However, the functions and variables that I need to import are polyglots.
You can "scrape" the zsh source file for what you need, then execute the code in bash using eval. Here's an example for doing this for a few functions:
File script.zsh:
test1() {
    echo "Hello from test1"
}
test2() {
    echo $((1 + $1))
}
File script.sh (bash):
# Specify source script and functions
source_filename="script.zsh"
source_functions=" \
test1 \
test2 \
"
# Perform "sourcing"
function_definitions="$(python -B -c "
import re
with open('${source_filename}', mode='r') as file:
    content = file.read()
for func in '${source_functions}'.split():
    print(re.search(func + r'\(\).*?\n}', content, flags=re.DOTALL).group())
" )"
eval "${function_definitions}"
# Try out test functions
test1 # Hello from test1
n=5
echo "$n + 1 is $(test2 $n)" # 5 + 1 is 6
Run the bash script and it will make use of the functions test1 and test2 defined in the zsh script:
bash script.sh
The above makes use of Python, specifically its re module. It simply looks for character sequences of the form funcname() and assumes that the function ends at the first } that starts a line. So it's not very general, but it works if you write your functions in this style.
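If Python is not available, the same scraping idea can be sketched with sed alone. This is a swapped-in technique, not the answer's own code, and it relies on the same assumption: each function starts with `name() {` at the beginning of a line and ends with a `}` at the beginning of a line. The file name script.zsh and the functions test1 and test2 come from the example above. The temporary directory is created only so the sketch is self-contained.

```shell
#!/bin/bash
# Self-contained setup: recreate the example zsh file in a temp directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/script.zsh" <<'EOF'
test1() {
    echo "Hello from test1"
}
test2() {
    echo $((1 + $1))
}
EOF

# Print every line from "func() {" up to the next line starting with "}".
extract_function() {
  sed -n "/^$1() {/,/^}/p" "$2"
}

# Define each requested function in the current bash shell.
for func in test1 test2; do
  eval "$(extract_function "$func" "$tmpdir/script.zsh")"
done

test1      # Hello from test1
test2 5    # 6
rm -rf "$tmpdir"
```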
I am trying to use a tool, and I need to use a wildcard that is present in the input.
This is an example:
aDict = {"120":"121" } #tumor : normal
rule all:
    input: expand("{case}.mutect2.vcf",case=aDict.keys())
def get_files_somatic(wildcards):
    case = wildcards.case
    control = aDict[wildcards.case]
    return [case + ".sorted.bam", control + ".sorted.bam"]
rule gatk_Mutect2:
    input:
        get_files_somatic,
    output:
        "{case}.mutect2.vcf"
    params:
        genome="ref/hg19.fa",
        target= "chr12",
        name_tumor='{case}'
    log:
        "logs/{case}.mutect2.log"
    threads: 8
    shell:
        " gatk-launch Mutect2 -R {params.genome} -I {input[0]} -tumor {params.name_tumor} -I {input[1]} -normal {wildcards.control}"
        " -L {params.target} -O {output}"
I get this error:
'Wildcards' object has no attribute 'control'
So my function knows both case and control, but I'm not able to access control in the rule.
The wildcards are derived from the output file/pattern. That is why you only have the wildcard called case. You have to derive the control from that. Try replacing your shell statement with this:
    run:
        control = aDict[wildcards.case]
        shell(
            "gatk-launch Mutect2 -R {params.genome} -I {input[0]} "
            "-tumor {params.name_tumor} -I {input[1]} -normal {control} "
            "-L {input.target2} -O {output}"
        )
You could define control in params instead. Also, {input.target2} in the shell command would result in an error; maybe it's supposed to be params.target?
rule gatk_Mutect2:
    input:
        get_files_somatic,
    output:
        "{case}.mutect2.vcf"
    params:
        genome="ref/hg19.fa",
        target= "chr12",
        name_tumor='{case}',
        control = lambda wildcards: aDict[wildcards.case]
    shell:
        """
        gatk-launch Mutect2 -R {params.genome} -I {input[0]} -tumor {params.name_tumor} \\
        -I {input[1]} -normal {params.control} -L {params.target} -O {output}
        """
Script:
#!/bin/sh -x
ARGS=""
CMD="./run_this_prog"
. . .
ARGS="-first_args '-A select[val]' "
. . .
$CMD $ARGS
I want the command line to be expanded like this when I run this shell script:
./run_this_prog -first_args '-A select[val]'
Instead, this is what the shell does (note the added '\' before each single quote):
+ ARGS=
+ CMD='./run_this_prog'
+ ARGS='-first_args '\''-A select[val]'\'' '
and this is what it ran on the command line (every special character escaped, which is not what I want):
./run_this_prog -first_args \'\-A select\[val\]\'
I tried escaping the single quotes like this:
ARGS="-first_args \'-A select[val]\' "
But that resulted in this (note the extra '\' after each backslash):
+ ARGS=
+ CMD='./run_this_prog'
+ ARGS='-first_args \'\''-A select[val]\'\'' '
I did my googling but couldn't find anything relevant. What am I missing here?
I am using sh-3.2 on CentOS release 6.
Once a quote is inside a string, it will not work the way you want: inside a string, quotes are not syntactic elements; they are just literal characters. This is one reason why bash offers arrays.
Replace:
#!/bin/sh -x
...
ARGS="-first_args '-A select[val]' "
$CMD $ARGS
With:
#!/bin/bash -x
...
ARGS=(-first_args '-A select[val]')
"$CMD" "${ARGS[@]}"
For a much more detailed discussion of this issue, see: "I'm trying to put a command in a variable, but the complex cases always fail!"
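One way to see the difference is to pass each form to a small function that prints every argument it receives on its own line. The helper show_args is hypothetical and exists only for this illustration. Because the question uses /bin/sh, the sketch also shows `set --`, which stores the arguments as positional parameters. That is one POSIX way to keep arguments separate when arrays are not available.

```shell
#!/bin/bash
# Hypothetical helper: print each argument it receives, one per line, in <>.
show_args() {
  printf '<%s>\n' "$@"
}

# String: the embedded single quotes are literal characters, and word
# splitting breaks on every space -> 3 arguments: -first_args, '-A, select[val]'
ARGS="-first_args '-A select[val]'"
show_args $ARGS

# Array: the shell parses the quotes once, at assignment -> 2 arguments.
ARGS_ARRAY=(-first_args '-A select[val]')
show_args "${ARGS_ARRAY[@]}"

# POSIX sh alternative: keep the arguments in the positional parameters.
set -- -first_args '-A select[val]'
show_args "$@"
```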
Sorry, I'm from Brazil and my English is not fluent.
I want to concatenate 20 files in a shell script using the cat command. However, when I run it from a script file, the entire content of the files is shown on the screen.
When I run it directly from the terminal, it works perfectly.
This is my code:
#!/usr/bin/ksh
set -x -a
. /PROD/INCLUDE/include.prod
DATE=`date +'%Y%m%d%H%M%S'`
FINAL_NAME=$1
# check if all parameters are passed
if [ -z $FINAL_NAME ]; then
    echo "Please pass the final name as parameter"
    exit 1
fi
# concatenate files
cat $DIRFILE/AI6LM760_AI6_CF2_SLOTP01* $DIRFILE/AI6LM761_AI6_CF2_SLOTP02* $DIRFILE/AI6LM763_AI6_CF2_SLOTP04* \
$DIRFILE/AI6LM764_AI6_CF2_SLOTP05* $DIRFILE/AI6LM765_AI6_CF2_SLOTP06* $DIRFILE/AI6LM766_AI6_CF2_SLOTP07* \
$DIRFILE/AI6LM767_AI6_CF2_SLOTP08* $DIRFILE/AI6LM768_AI6_CF2_SLOTP09* $DIRFILE/AI6LM769_AI6_CF2_SLOTP10* \
$DIRFILE/AI6LM770_AI6_CF2_SLOTP11* $DIRFILE/AI6LM771_AI6_CF2_SLOTP12* $DIRFILE/AI6LM772_AI6_CF2_SLOTP13* \
$DIRFILE/AI6LM773_AI6_CF2_SLOTP14* $DIRFILE/AI6LM774_AI6_CF2_SLOTP15* $DIRFILE/AI6LM775_AI6_CF2_SLOTP16* \
$DIRFILE/AI6LM776_AI6_CF2_SLOTP17* $DIRFILE/AI6LM777_AI6_CF2_SLOTP18* $DIRFILE/AI6LM778_AI6_CF2_SLOTP19* \
$DIRFILE/AI6LM779_AI6_CF2_SLOTP20* > CF2_FINAL_TEMP
mv $DIRFILE/CF2_FINAL_TEMP $DIRFILE/$FINAL_NAME
I solved the problem by putting the cat block inside a function and redirecting its stdout to the final file.
Ex:
concatenate()
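The example above is cut off after its first line, so what follows is only one possible shape of the approach described: a function that wraps cat, with its stdout redirected to the final file. It is not the author's original code. The temporary DIRFILE, the two input files, and their SLOTP01/SLOTP02 names are hypothetical stand-ins for the 20 file patterns in the question. There is also a separate issue in the question's script. It writes CF2_FINAL_TEMP to the current directory but moves it from $DIRFILE, so unless the script runs from $DIRFILE, mv will not find the file. The sketch keeps both steps in $DIRFILE.

```shell
#!/bin/sh
# Hypothetical setup: a temporary DIRFILE with two small input files.
DIRFILE=$(mktemp -d)
printf 'part1\n' > "$DIRFILE/SLOTP01_a"
printf 'part2\n' > "$DIRFILE/SLOTP02_a"
FINAL_NAME=final.txt

# Wrap the cat call in a function; its stdout goes wherever the caller
# redirects it, so nothing is written to the screen.
concatenate() {
  cat "$DIRFILE"/SLOTP01* "$DIRFILE"/SLOTP02*
}

# Write to a temporary name inside DIRFILE, then rename it on success.
concatenate > "$DIRFILE/CF2_FINAL_TEMP" &&
  mv "$DIRFILE/CF2_FINAL_TEMP" "$DIRFILE/$FINAL_NAME"
```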
I am trying to create a string containing a query that will be saved or sent to another location; this string contains several variables.
The issue I am having is that when I echo the variables, the output comes out jumbled, with parts overwritten and out of order.
See code below:
tokenID=$(docker exec -ti $dockerContainerID /bin/sh -c "cat /tempdir/tokenfile.txt")
serverName="asdasd"
attQuery="$tokenID $serverName"
agentRegQuery="$./opt/mule/bin/amc_setup -H $attQuery"
echo TOKEN ID $tokenID
echo SERVER NAME $serverName
echo $attQuery
echo $agentRegQuery
Below is the output I am receiving:
TOKEN ID 29a6966f-fa0e-4f08-87eb-418722872d80---46407
SERVER NAME asdasd
asdasdf-fa0e-4f08-87eb-418722872d80---46407
asdasdmule/bin/amc_setup -H 29a6966f-fa0e-4f08-87eb-418722872d80---46407
There's a carriage return character at the end of the tokenID variable, probably because /tempdir/tokenfile.txt is in DOS/Windows format (lines end with carriage return+linefeed), not Unix format (lines end with just linefeed). When you print tokenID by itself, it looks OK. But if you print something else after it on the same line, the carriage return moves the cursor back to the start of the line, and the new text overwrites the beginning. So when you print $attQuery, it prints this:
29a6966f-fa0e-4f08-87eb-418722872d80---46407[carriage return]
asdasd
...but with the second line printed on top of the first, so it comes out as:
asdasdf-fa0e-4f08-87eb-418722872d80---46407
The solution is to either convert the file to unix format (dos2unix will do this if you have it), or remove the carriage return in your script. You can do it like this:
tokenID=$(docker exec -ti $dockerContainerID /bin/sh -c "cat /tempdir/tokenfile.txt" | tr -d '\r')
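You can reproduce the effect without Docker. The sketch below fakes a token that ends in a carriage return (the token value is the one from the question), makes the hidden character visible with od -c, and then strips it with the same `tr -d '\r'` fix. There is also another possible source of the stray character. `docker exec -t` allocates a pseudo-terminal, and a terminal normally translates each linefeed in the output into carriage return plus linefeed, so running docker exec without -t might avoid the problem as well.

```shell
#!/bin/sh
# Simulate a captured value ending in CR; $(...) strips the LF but not the CR.
tokenID=$(printf '29a6966f-fa0e-4f08-87eb-418722872d80---46407\r\n')
serverName="asdasd"

# od -c shows the hidden \r between the two values.
printf '%s %s' "$tokenID" "$serverName" | od -c

# Strip every carriage return, as in the fix above.
tokenID=$(printf '%s' "$tokenID" | tr -d '\r')
attQuery="$tokenID $serverName"
echo "$attQuery"   # 29a6966f-fa0e-4f08-87eb-418722872d80---46407 asdasd
```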
I think everything works as it should
echo TOKEN ID $tokenID -> TOKEN ID 29a6966f-fa0e-4f08-87eb-418722872d80---46407
echo SERVER NAME $serverName -> SERVER NAME asdasd
echo $attQuery -> asdasdf-fa0e-4f08-87eb-418722872d80---46407
echo $agentRegQuery -> asdasdmule/bin/amc_setup -H 29a6966f-fa0e-4f08-87eb-418722872d80---46407
Why do you think something is wrong here?
Best regards, Georg