Snakemake deletes the temp files prematurely, while running with --immediate-submit option - job-scheduling

When I submit jobs with --immediate-submit and --dependency=afterok:{dependencies}, the temp files are deleted even before the rules that depend on them have started. It works perfectly when run in the normal way. Has anyone else come across this issue?
Snakemake script
rule all:
    input: 'file2', 'file3'

rule one:
    input: 'file1'
    output: temp('file2')
    shell: "touch {output}"

rule two:
    input: 'file2'
    output: 'file3'
    shell: "touch {output}"
Submission command
snakemake -s sample.snake --cluster '../mybatch.py {dependencies}' -j 4 --immediate-submit
Snakemake output message
Provided cluster nodes: 4
Job counts:
count jobs
1 all
1 one
1 two
3
rule one:
input: file1
output: file2
jobid: 1
rule two:
input: file2
output: file3
jobid: 2
localrule all:
input: file2, file3
jobid: 0
Removing temporary output file file2.
Finished job 0.
1 of 3 steps (33%) done
Removing temporary output file file2.
Finished job 1.
2 of 3 steps (67%) done
localrule all:
input: file2, file3
jobid: 0
Removing temporary output file file2.
Finished job 0.
3 of 3 steps (100%) done
Error in job two while creating output file file3.
ClusterJobException in line 9 of /faststorage/home/veera/pipelines/ipsych-GATK/test/sample.snake:
Error executing rule two on cluster (jobid: 2, jobscript: /faststorage/home/veera/pipelines/ipsych-GATK/test/.snakemake/tmp.cmvmr3lz/snakejob.two.2.sh). For
detailed error see the cluster log.
Will exit after finishing currently running jobs.
Error message
Missing files after 5 seconds:
file2
/faststorage/home/veera/pipelines/ipsych-GATK/test/.snakemake/tmp.jqh2qz0n
touch: cannot touch `/faststorage/home/veera/pipelines/ipsych-GATK/test/.snakemake/tmp.jqh2qz0n/1.jobfailed': No such file or directory
Missing files after 5 seconds:
file2
The jobs are submitted with the proper dependencies. ../mybatch.py is a custom wrapper script for sbatch. Is this a bug, or an error in my code? Thanks in advance for the help.
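One reading of the log above is that with --immediate-submit, Snakemake considers a job done as soon as the submission command returns, so cleanup of temp() outputs can fire before the dependent cluster jobs have actually run. A hedged workaround, at the cost of having to delete file2 yourself afterwards, is to disable temp handling for such runs with the --notemp flag:

```shell
# Workaround sketch: --notemp makes Snakemake ignore temp() declarations,
# so file2 survives until the dependent cluster jobs have finished.
snakemake -s sample.snake --cluster '../mybatch.py {dependencies}' \
    -j 4 --immediate-submit --notemp
```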

Related

'xaa' would overwrite input: error in using split command in bash

I am trying to split a 13Gb file into equal chunks using the Linux Bash shell on Windows 10 with:
split -n l/13 myfile.csv
and I am getting the following error:
split: 'xaa' would overwrite input; aborting
the xaa which is created is empty.
I have also tried using:
split -l 9000000 myfile.csv
which yields the same result.
I have used the split command before with similar arguments with no problem.
Any ideas what I am missing?
Thanks in advance
EDIT: even if I provide my own prefix, I still get the same error:
split -n l/13 myfile.csv completenewprefix
split: 'completenewprefixaa' would overwrite input; aborting
EDIT2:
ls -di completenewprefixaa myfile.csv
1 completenewprefixaa 1 myfile.csv
findmnt -T .
TARGET SOURCE FSTYPE OPTIONS
/mnt/u U: drvfs rw,relatime,case=off
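The ls -di output above looks like the telling clue: on the drvfs mount (note case=off) both completenewprefixaa and myfile.csv report inode 1, so split's same-file check, which compares device and inode numbers, concludes every output file "would overwrite" the input. A hedged workaround is to write the chunks onto a native Linux filesystem, e.g. under /tmp (a tiny stand-in file is used here so the sketch is self-contained):

```shell
# Sketch: write the chunks to a directory on a native Linux filesystem,
# so split's device/inode comparison no longer collides with the input.
rm -rf /tmp/chunks && mkdir -p /tmp/chunks
printf '%s\n' a b c d > /tmp/myfile.csv        # tiny stand-in for the 13Gb file
split -l 2 /tmp/myfile.csv /tmp/chunks/part_   # two lines per chunk
ls /tmp/chunks                                 # part_aa  part_ab
```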

Save output file each run in a loop

I just want to change one number inside a file, then run it and save the output file. This needs to be done iteratively.
cat file1
decay=( 1 5 2 6 )
for i in $decay
do
run file2.i
done
cat file2.i
cell 215 cell 115 cell ${i}
So now, for each decay value i, I need file2.i to use that value. Also, the output file always has the same name, file2.i_outp. How do I save the output file from each run instead of overwriting it for each i?
Copy the template file2.i to a temporary file, replacing the literal ${i} placeholder with the current array element. Note that "${decay[@]}" is needed to iterate over all elements; $decay alone expands to only the first one, and the ${i} in the sed pattern must be escaped so the shell doesn't expand it.
decay=(1 5 2 6)
for i in "${decay[@]}"
do
    sed "s/\${i}/$i/" file2.i > /tmp/file2.i.$$
    run /tmp/file2.i.$$
    rm /tmp/file2.i.$$
done > file2.i_outp
This one worked fine (a plain word list; note that {1 5 2 6} is not brace expansion and would iterate over the literal words {1, 5, 2 and 6}):
for i in 1 5 2 6
do
    sed "s/\${i}/$i/" file2.i > /tmp/file2.i.$$
    run /tmp/file2.i.$$
    rm /tmp/file2.i.$$
done > file2.i_outp
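If the aim is one output file per decay value rather than a single combined file2.i_outp, a variant like the following keeps each run's output separate. Since the real run command is not shown, it is stood in for here by a tiny shell function (an assumption), so the sketch is self-contained:

```shell
# Per-iteration output files: file2.1_outp, file2.5_outp, ...
run() { cat "$1"; }                            # hypothetical stand-in for the real `run`
printf 'cell 215 cell 115 cell ${i}\n' > /tmp/file2.i   # the template
decay=(1 5 2 6)
for i in "${decay[@]}"; do                     # iterate over every array element
    sed 's/\${i}/'"$i"'/' /tmp/file2.i > /tmp/file2.tmp  # fill in the placeholder
    run /tmp/file2.tmp > "/tmp/file2.${i}_outp"          # one output file per value
done
rm /tmp/file2.tmp
```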

find pattern getting from file in directory with files

I have a first file with 3 lines:
test1
test2
test3
I use the grep command to search for each of those lines in a directory with 10 files:
grep -Ril "test2"
The result is:
/usr/src/files/rog.txt
I need grep to delete 5 lines from the file it finds: the test2 line plus the 2 lines before and the 2 lines after it.
Please, can you help me use grep properly?
One way is to use the -A and -B options of grep, but it takes two steps: first you select all matches together with the two preceding and two following lines, then you use that list to filter the file. This has some side effects (which in your application are probably acceptable).
To do this, you issue the following commands:
grep -A 2 -B 2 "test2" file1.txt > negative.txt
grep -v -f negative.txt file1.txt
The first line outputs all findings of test2 in file1.txt accompanied by the 2 preceding and 2 succeeding lines of each line found. If I got your question right, this is the "negative" of the lines you want. The second line now lists all lines from file1.txt which do not correspond to a "negative line". This should be close to what you need.
There is only one side effect which you should know. If file1.txt contains duplicate lines like this:
test1
test2
test3
test4
...
test11
test12
test3
test4
The code above would also filter out the last two lines, even though there is no "test2" line nearby, because they are duplicates of lines 3 and 4, which were written to negative.txt on account of line 2. But if you are processing file lists, duplicates are probably not an issue.
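An alternative that avoids the duplicate-line side effect entirely is to delete by position rather than by content: find the line number of the match with grep -n, then remove the 5-line window with sed (sample data is assumed here so the sketch is runnable):

```shell
# Delete the matching line plus 2 lines of context on each side, by line
# number, so identical lines elsewhere in the file are left alone.
printf 'test1\ntest2\ntest3\ntest4\ntest5\ntest6\n' > /tmp/rog.txt
n=$(grep -n 'test2' /tmp/rog.txt | head -n 1 | cut -d: -f1)  # first match's line number
start=$(( n - 2 ))
if [ "$start" -lt 1 ]; then start=1; fi                      # clamp window at top of file
sed "${start},$(( n + 2 ))d" /tmp/rog.txt > /tmp/rog.filtered
cat /tmp/rog.filtered                                        # test5 and test6 remain
```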

Compare a string from a csv file against a lookup file in unix shell scripting

I am new to unix scripting. I need to check each string from a csv file against a set of strings in another csv file. My file1 will have values like the below:
file1
task desc
1 network error in there in line one.
2 data error is there in line three.
3 connection is missing from device.
4 new error is there
And file2 will have the standard lookup data
file2
Network error
data error
connection is
The file2 is static, and my file1 changes daily. My requirement is to check each line in file1 against the strings in file2. If there is any new error entry in file1, that entire row needs to be written to a new file called file3. This is a kind of daily standard-error check on a log file, reporting the new errors. For example, from the above, the output file3 needs to look like this:
file3
4 new error is there
tail + grep solution:
tail -n +2 file1 | grep -ivf file2 > file3
> cat file3
4 new error is there
tail -n +2 file1 - output the last part of file1 starting from the 2nd line
grep options:
-f - obtain patterns from file
-i - case-insensitive matching
-v - invert the sense of matching, to select non-matching lines
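The pipeline can be checked end to end with the sample data from the question:

```shell
# Reproduce the tail + grep solution with the sample file1 and file2.
cat > /tmp/file1 <<'EOF'
task desc
1 network error in there in line one.
2 data error is there in line three.
3 connection is missing from device.
4 new error is there
EOF
cat > /tmp/file2 <<'EOF'
Network error
data error
connection is
EOF
tail -n +2 /tmp/file1 | grep -ivf /tmp/file2 > /tmp/file3
cat /tmp/file3    # 4 new error is there
```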

Redirection operator in UNIX

Suppose I have three files, file1, file2 and file3, with some content.
Now when I do this at the shell prompt: cat file1 > file2 > file3
the content of file1 is copied to file3, and file2 becomes empty.
Similarly, when I do cat > file1 > file2 > file3
it asks for input, this input is stored in file3, and both file1 and file2 are empty.
And for cat > file1 > file2 < file3, the contents of file3 are copied to file2, and file1 is empty.
Can someone please explain to me what is happening? I am new to UNIX. Also, is there any website where I can learn more about these redirection operators?
Thanks
Consider how the shell processes each part of the command as it parses it:
cat file1 > file2 >file3
cat file1: prepare a new process with the cat program image with argument file1. (Given 1 or more arguments, cat will read each argument as a file and write it to its output file descriptor.)
> file2: change the new process' output file descriptor to write to file2 instead of the current output sink (initially the console for an interactive shell) - create file2 if necessary.
> file3: change the new process' output file descriptor to write to file3 instead of the current output sink (was file2) - create file3 if necessary.
End of command: Spawn the new process
So in the end, file2 is created, but unused. file3 gets the data.
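This is easy to verify in a bash shell: the redirections are processed left to right, so only the last one is still attached when cat runs:

```shell
# With two output redirections, the last one wins; the earlier
# file is opened and truncated, but nothing is written to it.
printf 'hello\n' > /tmp/file1
cat /tmp/file1 > /tmp/file2 > /tmp/file3
wc -c < /tmp/file2    # 0 bytes: created but empty
cat /tmp/file3        # hello
```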
cat > file1 > file2 > file3
cat: prepare a new process with the cat program/image with no arguments. (given no arguments, cat will read from its input file descriptor and write to its output file descriptor)
> file1: change the new process' output file descriptor to write to file1 instead of the current output sink (initially the console for an interactive shell) - create file1 if necessary.
> file2: change the new process' output file descriptor to write to file2 instead of the current output sink (was file1) - create file2 if necessary.
> file3: change the new process' output file descriptor to write to file3 instead of the current output sink (was file2) - create file3 if necessary.
End of command: Spawn the new process
So in the end, file1 and file2 are created, but unused. file3 gets the data. cat waits for input on its input device (the console device as default for an interactive shell). Any input that cat receives will go to its output device (which ends up being file3 by the time the shell finished processing the command and invoked cat).
cat > file1 > file2 < file3
cat: prepare a new process with the cat program/image with no arguments. (given no arguments, cat will read from its input file descriptor and write to its output file descriptor)
> file1: change the new process' output file descriptor to write to file1 instead of the current output sink (initially the console for an interactive shell) - create file1 if necessary.
> file2: change the new process' output file descriptor to write to file2 instead of the current output sink (was file1) - create file2 if necessary.
< file3: change the new process' input file descriptor to read from file3 instead of the current input source (initially the console for an interactive shell)
End of command: Spawn the new process
So in the end, file1 is created, but unused. file2 gets the data. cat waits for input on its input device (which was set to file3 by the time the shell finished processing the command and invoked cat). Any input that cat receives will go to its output device (which ends up being file2 by the time the shell finished processing the command and invoked cat).
--
Note that in the first example, cat is the one who processes/opens file1. The shell simply passed the word file1 to the program as an argument. However, the shell opened/created file2 and file3. cat knew nothing about file3 and has no idea where the stuff it was writing to its standard output was going.
In the other 2 examples, the shell opened all the files. cat knew nothing about any files. cat had no idea where its standard input was coming from and where its standard output was going to.
Per @Sorpigal's comment - the BASH manual has some good descriptions of what the different redirection operators do. Much of it is the same across different Unix shells to varying degrees, but consult your specific shell's manual/manpage to confirm. Thanks @Sorpigal.
http://gnu.org/software/bash/manual/html_node/Redirections.html
You can redirect standard input (<), standard output (1> or just >), error output (2>), or both outputs (&>), but each redirection is 1:1 - you can't redirect one output into two different files.
What you are looking for is the tee utility.
If you don't want to lose the original content, you should use the redirect-and-append operator >> instead. You can read more here.
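To illustrate the tee suggestion: tee duplicates its standard input into every file named on its command line (and to standard output), which is exactly what a chain of > redirections cannot do:

```shell
# Duplicate one stream into two files at once with tee.
printf 'hello\n' > /tmp/src
cat /tmp/src | tee /tmp/copy1 /tmp/copy2 > /dev/null
cat /tmp/copy1    # hello
cat /tmp/copy2    # hello
```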