Run rule in Snakemake only if another rule fails, for the specific samples that it failed for? - bioinformatics

I'm running a metagenomics pipeline in Snakemake, using MetaSPAdes for my assemblies, but it's not uncommon for MetaSPAdes to fail for particular samples. If MetaSPAdes fails, I want to run MEGAHIT only on the samples it failed for. Is there any way to create this sort of rule dependency in Snakemake?
For example:
1. generate a particular file if a rule fails (in this case, assembly with MetaSPAdes). I suppose this would mean that the output of the MetaSPAdes rule needs to be either the contigs or a "this failed" output file. This would help Snakemake recognize not to re-run this rule;
2. create a list of samples that the rule failed for; and
3. run a different rule only on this list of samples with failed MetaSPAdes assemblies (in this case, run MEGAHIT on those samples instead).
Has anyone figured out an elegant way to do something like this?
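Step 2 of this plan amounts to sorting samples by which marker file the MetaSPAdes step left behind. A minimal plain-Python sketch, assuming a hypothetical layout in which that step writes either <sample>.contigs.fasta on success or <sample>.failed on failure into one directory (neither file name comes from the question or from either tool):

```python
from pathlib import Path


def split_by_status(samples, assembly_dir):
    """Sort samples into (succeeded, failed, pending) by marker files.

    Hypothetical layout: the MetaSPAdes step writes either
    <assembly_dir>/<sample>.contigs.fasta on success or
    <assembly_dir>/<sample>.failed when the assembly fails.
    """
    root = Path(assembly_dir)
    succeeded, failed, pending = [], [], []
    for sample in samples:
        if (root / f"{sample}.contigs.fasta").exists():
            succeeded.append(sample)
        elif (root / f"{sample}.failed").exists():
            failed.append(sample)
        else:
            pending.append(sample)  # MetaSPAdes has not run for this sample yet
    return succeeded, failed, pending


def megahit_targets(samples, assembly_dir):
    """Hypothetical MEGAHIT output paths, one per failed sample."""
    _, failed, _ = split_by_status(samples, assembly_dir)
    return [f"megahit/{sample}.contigs.fa" for sample in failed]
```

In a Snakefile, a function like this can only give the right answer after the MetaSPAdes jobs have finished, which is the situation Snakemake checkpoints are meant for; the paths above are illustrative only.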

I'm not familiar with the programs you mention, but I don't think you need separate rules for this. You can write a single rule that tries MetaSPAdes first and, if it fails, runs MEGAHIT instead. For example:
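rule assembly:
    input:
        '{sample}.in',
    output:
        '{sample}.out',
    run:
        import subprocess
        # Plain strings are not formatted inside run:, so build the
        # command from input[0] and output[0] explicitly.
        cmd = "MetaSPAdes {} > {}".format(input[0], output[0])
        p = subprocess.Popen(cmd, shell=True, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
        stdout, stderr = p.communicate()
        if p.returncode != 0:
            # MetaSPAdes failed for this sample: fall back to MEGAHIT.
            # shell() does substitute {input} and {output} itself.
            shell("megahit {input} > {output}")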
stdout, stderr = p.communicate() waits for the process to finish and captures its stdout and stderr; afterwards p.returncode holds its exit status. You can analyse stderr and/or the return code to decide what to do next. You probably need something more than the above, but hopefully the idea is about right.
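The try-then-fall-back logic can also be written and tested outside Snakemake with subprocess.run. A minimal sketch; the command strings passed in are placeholders, not real MetaSPAdes or MEGAHIT invocations:

```python
import subprocess
import sys


def run_with_fallback(primary, fallback):
    """Run `primary` in a shell; if it exits non-zero, run `fallback` instead.

    Returns "primary" or "fallback" depending on which command succeeded.
    Raises subprocess.CalledProcessError if the fallback fails as well.
    """
    p = subprocess.run(primary, shell=True, capture_output=True, text=True)
    if p.returncode == 0:
        return "primary"
    # Keep the primary tool's error output visible for debugging.
    sys.stderr.write(f"primary command failed ({p.returncode}): {p.stderr}\n")
    subprocess.run(fallback, shell=True, check=True)
    return "fallback"
```

Inside a run: block it would be called with the two already-formatted command strings. Letting the fallback raise on failure means Snakemake still marks the job as failed when both assemblers fail.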

Related

Where should I start to debug when Make throws a particular error

My knowledge of Make is small. I have been told that everything you put after make (that does not contain "-") is a target.
Well, a build process I have is failing.
First there is a line
make path/to/configuration_file
configuration_file is not a target. It is an autogenerated configuration file buried inside the directory structure ("path/to") that is of the form
#
# Boot Configuration
#
#
# DRAM Component
#
CONFIG_DRAM_TYPE_LPDDR4=y
# CONFIG_DRAM_TYPE_DDR4 is not set
CONFIG_DDR_SIZE=0x80000000
#
# Boot Device
#
# CONFIG_ENABLE_EMMC_BOOT is not set
# CONFIG_ENABLE_NAND_BOOT is not set
CONFIG_ENABLE_SPINAND_BOOT=y
# CONFIG_ENABLE_SPINOR_BOOT is not set
CONFIG_EMMC_ACCESS_8BIT=y
# CONFIG_EMMC_ACCESS_4BIT is not set
# CONFIG_EMMC_ACCESS_1BIT is not set
so I cannot understand how this is a target. For reference, when I run make there is a Makefile but this Makefile does not reference this file.
Still, this step works fine.
The step where it fails runs
make diags
and I have verified there is no "diags" target.
Here is the error output, which may give more information about what is happening:
GEN cortex_a/output/Makefile
Init diag test "orc_scheduler" ...
remoteconfig: Failed to generate configure in cortex_a/soc/visio/tests/orc_scheduler!
Makefile:11 recipe for target 'orc_scheduler-init' failed
make[10]: *** [orc_scheduler-init] Error 25
At least what I would like to know is how to interpret this error message. I don't know what the "11" or the "10" or the "25" refers to.
make is fundamentally a tool for automatically running commands in the right order so you don't have to type them in yourself. So all the commands make runs are commands that you could just type into your shell prompt. And all the errors that those commands generate are the same ones that you would see if you typed the command yourself. So, looking at make to try to understand those errors is looking in the wrong place: you have to look at the documentation for whatever command was invoked.
A "target" is just a file that make knows how to build. The fact that typing make <somefile> didn't give you an error saying it doesn't know how to build <somefile> means that <somefile> is a target as far as your makefiles are concerned.
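The error message Makefile:11: simply refers to the file named Makefile, line 11, which is where the failing command that make ran can be found. Similarly, the 10 in make[10] is make's recursion level (a sub-make nested ten levels below the top-level make), and Error 25 is the exit status that the failed command returned. None of this is likely to help you solve the problem of why the command failed, though (unless the problem is that you invoked it with the wrong arguments and you need to adjust the makefile to specify different arguments).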
The command that failed generated the message:
remoteconfig: Failed to generate configure in cortex_a/soc/visio/tests/orc_scheduler!
I don't know what that means, but it's not related to make. You'll need to find out what this remoteconfig command is, what it does, and why it failed. It's unfortunate that it doesn't show any better error message as to why it failed to "generate configure", but again there's nothing make can do about that.
If you want to learn more about make you can look at the GNU make manual (note, GNU make is only one implementation of make; there are others and they are fundamentally the same but different in details).
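The numbers in the two make lines from the question can be pulled apart mechanically. A small Python sketch; the regular expressions are written for exactly the message shapes quoted in the question and are an assumption, not a full grammar of make's output:

```python
import re

# "Makefile:11 recipe for target 'orc_scheduler-init' failed"
# (the colon after the line number is optional to match both spellings)
RECIPE_RE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):? recipe for target '(?P<target>[^']+)' failed"
)
# "make[10]: *** [orc_scheduler-init] Error 25"
ERROR_RE = re.compile(
    r"^make\[(?P<level>\d+)\]: \*\*\* \[(?P<target>[^\]]+)\] Error (?P<status>\d+)"
)


def decode(line):
    """Return what each number in a make error line means, or None."""
    m = RECIPE_RE.match(line)
    if m:
        # file and line number where the failing recipe is written
        return {"makefile": m["file"], "line": int(m["line"]), "target": m["target"]}
    m = ERROR_RE.match(line)
    if m:
        # recursion level of this make, and the failed command's exit status
        return {"make_level": int(m["level"]), "target": m["target"],
                "exit_status": int(m["status"])}
    return None
```

Only the exit status comes from the failing command itself; to find out why it is 25, you still have to consult that command's documentation.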

Print MSTest summary after command-line execution

When running a large set of tests using MsTest from the command line, I can see each test executing and its outcome logged in the window like so:
Passed Some.NameSpace.Test1
Passed Some.NameSpace.Test2
And so on for thousands of tests. Once completed, MsTest will spit out a summary like this
Summary
---------
Test run failed
Passed 2000
Failed 1
------------
Total 2001
At this point I either have to start scrolling backwards in the window trying to find the needle in a haystack that represents my single failing test, or I can open the huge xml file that represents the result, and text-search for some keyword indicating a failed test.
Isn't there an easier way? Can I have MsTest report progress without dumping Passed test names to the console (still logging failed ones), or can I have a summary of just Failed tests at the end?
I think it's obvious what any command-line user wants to do: follow progress AND know the outcome at the end, without having to read XML or browse the cmd window history.
Answering my own question: A simple wrapper/parser script that calls MsTest.exe and parses/summarizes the output, either the stdout or the trx, is the only solution it seems.
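Such a wrapper can be quite small. A sketch of the parsing half in Python, assuming the per-test lines look like "<Outcome> <FullTestName>" as shown in the question and that the totals follow a line reading "Summary"; it has not been checked against every MSTest version's output format:

```python
def summarize(lines):
    """Return (passed_count, failed_names) from MSTest-style console lines.

    Stops at the "Summary" line so that totals such as "Passed 2000"
    are not mistaken for test names.
    """
    passed, failed = 0, []
    for line in lines:
        line = line.strip()
        if line == "Summary":
            break
        parts = line.split(None, 1)  # tolerate runs of spaces between columns
        if len(parts) != 2:
            continue
        outcome, name = parts
        if outcome == "Passed":
            passed += 1
        elif outcome == "Failed":
            failed.append(name)
    return passed, failed
```

Called on sys.stdin from a small script (hypothetically named summarize_mstest.py), it could sit at the end of a pipe such as mstest.exe <your usual arguments> | python summarize_mstest.py and print just the failed names at the end.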
You could use the TestContext.CurrentTestOutcome at the end of each test to determine if the test was failed and then log all failed tests to a different file.
public TestContext TestContext { get; set; }

[TestCleanup]
public void CleanUp()
{
    if (TestContext.CurrentTestOutcome == UnitTestOutcome.Failed)
    {
        TestContext.WriteLine("{0}.{1} ==> {2}", TestContext.FullyQualifiedTestClassName,
            TestContext.TestName, TestContext.CurrentTestOutcome);
        // Log the result to a separate file here, e.g. with System.IO.File.AppendAllText.
    }
}
I don't know if this could help you.

Rake synthesized tasks and file date checking

I have a Rakefile that I use to generate HTML from Markdown (and to do some other stuff that's irrelevant to the question).
I'm generating files from my source, .feature files (in the FileList DOCUMENTS), into my output directory OUTPUT as HTML. I have an htmlfile method to assemble and write my HTML file.
I'm trying two alternative options here:
File tasks:
DOCUMENTS.each do |doc|
  file doc.pathmap("#{OUTPUT}/%X.html") => doc do |t|
    htmlfile t.name, RDiscount.new(F.read doc).to_html, t.name.pathmap('%n')
  end
end
Synthesized file tasks with a rule:
rule '.html' => proc {|html| html.pathmap("%{#{OUTPUT}/,}X.feature")} do |t|
  htmlfile t.name, RDiscount.new(F.read t.source).to_html, t.name.pathmap('%n')
end
My understanding was that the latter option would synthesize file tasks, and have the same net effect. However I find that if I choose it, it does not cope with incremental building, whereas the first option does.
If I build, then modify one file, then run rake --trace I get the following:
With synthesized tasks:
** Invoke output/Module/Feature.html (first_time, not_needed)
** Invoke output/Module (not_needed)
And with the explicit file tasks:
** Invoke output/Module/Feature.html (first_time)
** Invoke output/Module (not_needed)
** Invoke Module/Feature.feature (first_time, not_needed)
** Execute output/Module/Feature.html
This option is clearly checking the source file. I thought linking output and source was exactly what rule
(I believe it's most helpful to put the answer as an actual answer, rather than a comment. See https://meta.stackexchange.com/questions/68507/what-to-do-if-you-find-the-answer-to-your-own-question)
It turns out that if you have file outdoc => something elsewhere in your Rakefile, it will mess with synthesized tasks. Whereas if you have file tasks for those output documents it adds to the pre-requisites and works fine. This sort of makes sense; synthesized tasks don't really exist.
I also found out that rules only work to one level of inference ( http://onestepback.org/articles/buildingwithrake/rulelimitations.html) though that didn't turn out to be the answer.
Fix: rearrange pre-requisites of tasks, or use the explicit file tasks.
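The difference between the two traces comes down to which prerequisites the output file task ends up with. The usual make-style staleness check, sketched in Python as an illustration of the logic (not Rake's own code), shows why: if the .feature source is missing from the prerequisite list, editing it can never make the HTML out of date.

```python
import os


def file_task_needed(target, prerequisites):
    """Make-style check: rebuild if the target is missing or older than
    any prerequisite that exists on disk."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.exists(p) and os.path.getmtime(p) > target_mtime
               for p in prerequisites)
```

With the explicit file tasks the source is always in the list; with the rule, an unrelated file outdoc => something declaration can leave the task without it, which matches the "not_needed" trace.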

How to explicitly fail a task in ruby rake?

Let's say I have a rakefile like this:
file 'file1' => some_dependencies do
  sh 'external tool I do not have control over, which sometimes fails to create the file'
  ???
end

task :default => 'file1' do
  puts "everything's OK"
end
Now if I put nothing in place of ???, I get the OK message even if the external tool fails to generate the file. What is the proper way to inform rake that the 'file1' task has failed and that it should abort (ideally with a meaningful message, such as which task failed)? The only thing I can think of now is raising an exception there, but that just doesn't seem right.
P.S. The tool always returns 0 as its exit code.
Use the raise or fail method as you would for any other Ruby script (fail is an alias for raise). This method takes a string or exception as an argument which is used as the error message displayed at termination of the script. This will also cause the script to return the value 1 to the calling shell. It is documented here and other places.
You can use abort("message") to fail a rake task gracefully.
It prints the message to stderr and exits with code 1.
Exit code 1 indicates failure on Unix-like systems.
See Kernel#abort for details.
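Because the tool always exits with 0, the only reliable success check is whether the output file exists afterwards. The same idea sketched in Python: raise with a message naming the task, and exit with that message at the top level, which (like abort) prints it to stderr and exits with status 1. The "file1" and "external-tool" names in main are placeholders:

```python
import os
import subprocess
import sys


def build_file(target, command):
    """Run a tool whose exit code cannot be trusted, then verify its output."""
    subprocess.run(command, shell=True, check=True)
    if not os.path.exists(target):
        raise RuntimeError(f"task '{target}' failed: {command!r} did not create it")


def main():
    try:
        build_file("file1", "external-tool")  # placeholders for the real target/tool
    except (RuntimeError, subprocess.CalledProcessError) as err:
        sys.exit(str(err))  # message to stderr, exit status 1
```

In the Rakefile the equivalent is a single line in place of ???, such as fail with a message unless File.exist?('file1').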

Redirect Output of Capistrano

I have a Capistrano deploy file (Capfile) that is rather large, contains a few namespaces and generally has a lot of information already in it. My ultimate goal is to use the Tinder gem to paste the output of the entire deployment into Campfire. I already have Tinder set up properly.
I looked into using the Capistrano capture method, but that only works for the first host. Additionally that would be a lot of work to go through and add something like:
output << capture 'foocommand'
Specifically, I am looking to capture the output of any deployment from that file into a variable (in addition to putting it to STDOUT so I can see it), then pass that output in the variable into a function called notify_campfire. Since the notify_campfire function is getting called at the end of a task (every task regardless of the namespace), it should have the task name available to it and the output (which is stored in that output variable). Any thoughts on how to accomplish this would be greatly appreciated.
I recommend not messing with the Capistrano logger. Instead, use what Unix gives you and use pipes:
cap deploy | my_logger.rb
Here my_logger.rb reads STDIN, records what it reads, and passes it back out to STDOUT so you still see it. (If the output you care about is written to stderr, redirect it into the pipe with 2>&1.)
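The logger at the end of the pipe is essentially tee plus a notification at the end. A minimal sketch in Python; notify_campfire is only a stub standing in for the question's Tinder-based function, and the script name is hypothetical:

```python
import sys


def tee(in_stream, out_stream):
    """Copy in_stream to out_stream line by line and return everything seen."""
    captured = []
    for line in in_stream:
        out_stream.write(line)
        out_stream.flush()  # keep the live view on the terminal responsive
        captured.append(line)
    return "".join(captured)


def notify_campfire(text):
    # Hypothetical stub: the real version would post `text` via Tinder.
    pass


def main():
    output = tee(sys.stdin, sys.stdout)
    notify_campfire(output)
```

With main() as the script's entry point, usage would be along the lines of cap deploy 2>&1 | python tee_logger.py. Echoing line by line, rather than after the deploy finishes, keeps the progress visible while it runs.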
As an alternative, the Engine Yard cap recipes include a logger; this might be a useful reference if you do need to edit the code, but I recommend against it.
It's sort of a hackish means of solving your problem, but you could try running the deploy task in a Rake task and capturing the output using %x.
# ...in your Rakefile...
task :deploy_and_notify do
  output = %x[ cap deploy ] # Run your deploy task here.
  notify_campfire(output)
  puts output # Echo the output.
end
