I am currently aligning paired-end reads to a reference genome (the genome index has already been created), and the goal is to end up with a single BAM file. This is the code I am using; everything works until the last line, where I get an error saying that the file 'SRR5882797_10M.bam' doesn't exist. Of course it doesn't exist yet: it is the output file that this command is supposed to create. I am not sure how to fix this, since it seems to be asking for the file to already be in the folder. Thanks :)
bwa mem -t 2 Refs/Athaliana/Arabidopsis_thaliana_TAIR10 02_trimmedData/fastq/SRR5882797_10M_1.fastq.gz 02_trimmedData/fastq/SRR5882797_10M_2.fastq.gz |
samtools view -bhS -F4 - > 03_alignedData/bam/SRR5882797_10M.bam
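One possible cause, offered only as an assumption since the post does not show the full error message: the shell opens the redirect target before samtools runs, and `>` can create the file but not a missing parent directory. A minimal sketch that creates the output directory first (the `out` variable is introduced here for illustration):

```bash
#!/usr/bin/env bash
# Assumption: the error comes from a missing output directory, not from samtools.
out=03_alignedData/bam/SRR5882797_10M.bam

# Create 03_alignedData/bam (and any missing parents); harmless if it already exists.
mkdir -p "$(dirname "$out")"

# The original pipeline can then write into it unchanged:
# bwa mem -t 2 Refs/Athaliana/Arabidopsis_thaliana_TAIR10 \
#     02_trimmedData/fastq/SRR5882797_10M_1.fastq.gz \
#     02_trimmedData/fastq/SRR5882797_10M_2.fastq.gz |
#     samtools view -bhS -F4 - > "$out"
```

If the directory already exists, the cause lies elsewhere and the exact error text would be needed to narrow it down.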
I'm trying to create a temporary file in my pipeline, then use that file in another rule.
For example, I have two rules in a .smk file:
#Unzip adapter trimmed fastq file
rule unzip_fastq:
    input:
        '{sample}.adapterTrim.round2.fastq.gz',
    output:
        temp('{sample}.adapterTrim.round2.fastq')
    conda:
        '../envs/rep_element.yaml'
    shell:
        'gunzip -c {input[0]} > {output[0]}'

#Run bowtie2 to align to rep elements and parse output
rule parse_bowtie2_output_realtime:
    input:
        '{sample}.adapterTrim.round2.fastq'
    output:
        'rep_element_pipeline/{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam'
    params:
        bt2=config["ref"]["bt2_index_path"],
        eid=config["ref"]["enst2id"]
    conda:
        '../envs/rep_element.yaml'
    shell:
        'perl ../scripts/parse_bowtie2_output_realtime_includemultifamily.pl '
        '{input[0]} {params.bt2} {output[0]} {params.eid}'
{sample}.adapterTrim.round2.fastq is used once and should ultimately be deleted upon completion. However, I'm finding that this file is uploaded to Amazon S3, even with the addition of temp(). I'm also finding that this file is removed locally, but still persists on S3.
Am I doing this correctly? '{sample}.adapterTrim.round2.fastq' is not currently written in the rule-all of the Snakefile.
We ultimately need to prevent this file from being uploaded to S3, so if there is a way to specify not to upload this file in the rule, that would be useful.
The snippet in the question does not seem consistent with actual use, since for S3 files one would need to wrap the file names in remote().
However, as a general solution, the documentation contains the following:
"The remote() wrapper is mutually-exclusive with the temp() and protected() wrappers."
Hence, if you intend to use a temp file, make sure it is not wrapped in remote(), or explicitly wrap the file in local().
I would like to write a script that restores a file while preserving any changes made after the backup file was created.
In more detail: at some point I create a backup of a file (file_orig), and then make some changes to the original file (file_my_changes). After that, the original file can be changed again (file_additional_changes). After the restore, I want to end up with the backup plus the additional changes (file_orig + file_additional_changes). In short, I want to back out only my own changes.
I am talking about the grub.cfg file, so the expected changes are adding or removing parts of a line.
Can this be done with a bash script?
I have 2 ideas:
Add comments above the lines I am going to change. Then, before the restore, if a line differs from the one in the backup file, read the comment, which tells me exactly what to remove from the line;
If there is a way to display only the part of a line that differs between file_orig and file_additional_changes, replace the line with the line from file_orig plus the part that differs. But I am not sure whether this is possible at all.
Example:
line1: This is line1
line2: This is another line1
Is it possible to display only "another"?
Of course any other ideas are welcome!
Thank you!
It's unclear what you need, but if you're using a bash script you could run diff on the two edited files and the latest one, and save that output somewhere you want to keep it. That would give you a copy of the changes.
Or just use git like everybody else.
One possibility would be to use the POSIX commands patch and diff.
Create the backup:
cp operational-file operational-file.001
Edit the operational file.
Create a patch from the differences:
diff -u operational-file.001 operational-file > operational-file.patch001
Copy the operational file again.
cp operational-file operational-file.002
Edit the operational file again.
Create a new patch:
diff -u operational-file.002 operational-file > operational-file.patch002
If you need to recover but skip the changes from operational-file.patch001, then:
cp operational-file.001 operational-file
patch -i operational-file.patch002 operational-file
This applies just the second set of changes to the original file, as long as the two sets of changes do not overlap.
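The steps above can be tried end to end in a scratch directory. This is only a sketch: the file contents ("setting 1" to "setting 12") and the two edits are made up for illustration, and the edits are placed far enough apart that the patch contexts do not overlap.

```bash
#!/usr/bin/env bash
# Demo in a scratch directory; the file contents are invented for illustration.
cd "$(mktemp -d)" || exit 1

printf 'setting %s\n' {1..12} > operational-file   # stand-in for the real file
cp operational-file operational-file.001           # backup before my edit

# My change (line 2), then the patch that records it.
sed 's/^setting 2$/setting 2 mine/' operational-file > tmp && mv tmp operational-file
# diff exits with status 1 when the files differ, so do not treat that as a failure.
diff -u operational-file.001 operational-file > operational-file.patch001 || true

cp operational-file operational-file.002           # snapshot before the later edit

# A later, unrelated change (line 11), then its patch.
sed 's/^setting 11$/setting 11 extra/' operational-file > tmp && mv tmp operational-file
diff -u operational-file.002 operational-file > operational-file.patch002 || true

# Restore the backup, then replay only the later change.
cp operational-file.001 operational-file
patch -i operational-file.patch002 operational-file
```

Afterwards line 2 is back to "setting 2" while line 11 still reads "setting 11 extra". If the two edits touched the same or neighbouring lines, patch could reject the hunk, which is the overlap caveat mentioned above.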
Consider using a version control system to keep records of the file changes. Consider using date/time stamps instead of version numbers on the file names.
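On the question's second idea (showing only the part of a line that differs), one rough way is to split both lines into words and diff the word lists. This sketch uses the two example lines from the question; the variable names are illustrative, and it only reports whole words that were added.

```bash
#!/usr/bin/env bash
line1='This is line1'
line2='This is another line1'

# Put each word on its own line, then let diff report the words found only in line2.
# diff prefixes lines that exist only in the second input with "> ".
added=$(diff <(tr ' ' '\n' <<< "$line1") <(tr ' ' '\n' <<< "$line2") | sed -n 's/^> //p')

echo "$added"   # prints: another
```

Words that were removed show up with a "< " prefix in the diff output instead, so the same approach can be turned around by changing the sed pattern.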
I want to validate my XML files for well-formedness, but some of my files do not have a single root element (which is fine per our business requirements; e.g. <ri>...</ri><ri>..</ri> is valid XML in my context). xmlwf can check well-formedness, but it flags a file that does not have a single root. So I want to build a custom script that uses xmlwf internally and does the following:
iterate through the list of files passed as input (e.g. sample.xml or s*.xml or *.xml)
for each file, prepare a temporary file as <A> + contents of file + </A>
call xmlwf on that temp file
Can someone help with this?
You could add text to the beginning and end of the file using cat and bash, so that your file has a root added to it for validation purposes.
cat <(echo '<root>') sample.xml <(echo '</root>') | xmlwf
This way you don't need to write temporary files out.
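To cover the "iterate through a list of files" part of the question, the one-liner can be put in a loop. This is a sketch, not a tested tool: the function names are made up, and it relies on xmlwf printing nothing for well-formed input and a one-line message otherwise, so it checks the output instead of the exit status.

```bash
#!/usr/bin/env bash
# Print the file's contents wrapped in a dummy <root> element.
wrap_with_root() {
    printf '<root>\n'
    cat -- "$1"
    printf '\n</root>\n'
}

# Run xmlwf on the wrapped contents of every file given as an argument.
# Returns 1 if xmlwf reported a problem for at least one file.
check_files() {
    local f result status=0
    for f in "$@"; do
        result=$(wrap_with_root "$f" | xmlwf)
        if [ -n "$result" ]; then
            printf '%s: %s\n' "$f" "$result"
            status=1
        fi
    done
    return "$status"
}

# Usage (the shell expands the globs before the function sees them):
#   check_files sample.xml s*.xml
```

One caveat: a file that starts with an XML declaration (`<?xml ...?>`) will be reported as not well-formed once wrapped, because the declaration is only allowed at the very start of a document.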
Can you give me an example of how to filter /var/log/messages for a certain keyword, for example "error", and send an email in real time whenever that word appears?
I would just like to watch for the keyword "error" in /var/log/messages and have the matching lines sent to my email address.
Simply grep it.
tail -f log.log | grep error
This lists every line containing "error" as it is written; you can then mail those lines.
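To go from listing to mailing, the matching lines can be handed to a command one at a time. A minimal sketch, assuming a `mail` command that accepts `-s` for the subject is installed; `watch_errors` is a made-up helper name and the address is a placeholder.

```bash
#!/usr/bin/env bash
# Read log lines on stdin; for each line containing "error", run the command
# given as arguments with that line on its standard input.
watch_errors() {
    local line
    while IFS= read -r line; do
        case $line in
            *error*) "$@" <<< "$line" ;;
        esac
    done
}

# Usage (one mail per matching line; runs until interrupted):
#   tail -f /var/log/messages | watch_errors mail -s 'error in messages' you@example.com
```

Sending one mail per line can get noisy on a busy log, so batching the lines before mailing may be preferable in practice.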
What you can do is this:
On a regular basis (which you decide), you:
copy the main file to another file
you run diff between that copy and the reference from the previous check, keeping only the newly added part (if the file is written sequentially, this will be a nice, clean block of lines at the end of the file)
you copy the main file to the other file, again (this sets the new reference for the next check)
then you GREP on whatever you want, in the block of lines you've found 2 steps back
you report the found lines, using the wanted method (mail,..)
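The periodic approach above could look roughly like this. It is a sketch under assumptions: `report_new_errors` and the reference-file argument are invented names, the keyword is hard-coded to "error", and the reporting step is left as plain output so it can be piped to mail or anything else.

```bash
#!/usr/bin/env bash
# Print the lines containing "error" that were added to the log since the last call.
#   $1: the log file    $2: reference copy kept between calls
report_new_errors() {
    local log=$1 ref=$2 snap new
    [ -f "$ref" ] || : > "$ref"        # first run: start from an empty reference

    snap=$(mktemp)
    cp -- "$log" "$snap"               # snapshot, so the log can keep growing meanwhile

    # diff marks lines that exist only in the newer file with "> ".
    new=$(diff "$ref" "$snap" | sed -n 's/^> //p')

    mv -- "$snap" "$ref"               # the snapshot becomes the next reference
    printf '%s\n' "$new" | grep error
}

# Usage, e.g. from cron:
#   report_new_errors /var/log/messages /var/tmp/messages.ref
```

The function returns a non-zero status when no new matching lines were found, which makes it easy to skip the mail step in that case.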
My website's file structure has gotten very messy over the years from uploading random files to test different things out. I have a list of all my files, such as this:
file1.html
another.html
otherstuff.php
cool.jpg
whatsthisdo.js
hmmmm.js
Is there any way I can input my list of files via command line and search the contents of all the other files on my website and output a list of the files that aren't mentioned anywhere on my other files?
For example, if cool.jpg and hmmmm.js weren't mentioned in any of my other files then it could output them in a list like this:
cool.jpg
hmmmm.js
And then any of those other files mentioned above aren't listed because they are mentioned somewhere in another file. Note: I don't want it to just automatically delete the unused files, I'll do that manually.
Also, of course I have multiple folders so it will need to search recursively from my current location and output all the unused (unreferenced) files.
I'm thinking command line would be the fastest/easiest way, unless someone knows of another. Thanks in advance for any help that you guys can be!
Yep! This is pretty easy to do with grep. In this case, you would run a command like:
$ while IFS= read -r orphan; do
    echo "Checking for presence of ${orphan} in present directory..."
    grep -rlF -- "$orphan" .
  done < orphans.txt
And orphans.txt would look like your list of files above, one file per line. The -F option makes grep treat each name as a literal string, so the dot in a file name is not read as a wildcard. You can add -i to the grep above if you want to grep case-insensitively. You would want to run that command in /var/www or wherever your distribution keeps its webroots, and keep orphans.txt outside that directory, or its own entries will show up as matches. If a "Checking for..." line is followed by no matches, no file under that directory mentions that name.
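To get just the list of unreferenced names, as the question asks, the loop can print a name only when grep finds nothing. A sketch with a made-up function name; it assumes a grep that supports `-r` and `--exclude` (GNU and BSD grep do), and a file that mentions its own name counts as referenced.

```bash
#!/usr/bin/env bash
# Print every name from the list file that no file under the root directory mentions.
#   $1: list of file names, one per line    $2: directory to search (default: .)
find_unreferenced() {
    local list=$1 root=${2:-.} name
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        # -r recurse, -q quiet, -F literal match; skip the list file itself.
        if ! grep -rqF --exclude="$(basename "$list")" -- "$name" "$root"; then
            printf '%s\n' "$name"
        fi
    done < "$list"
}

# Usage, from the webroot:
#   find_unreferenced orphans.txt .
```

Nothing is deleted; the output is only a candidate list to review by hand.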