I want to include RSeQC results in a MultiQC report within a Snakemake workflow.
The issue is that one of the RSeQC tools only reports a .r and a .pdf file, while MultiQC seems to require a .txt input to create a plot.
Does anyone have working Snakemake code that recovers information from RSeQC for a MultiQC rule?
As this is a combination of three tools, it is difficult to get support.
Here is my code; of the RSeQC outputs, only geneBodyCoverage.txt is used (not the two .r outputs, and especially not junctionSaturation_plot.r, for which there is nothing other than the .r file and the .png picture):
rule multiqc_global:
    """
    Aggregate all MultiQC reports
    """
    input:
        expand("intermediate/{smp}_fastqc.zip", smp=SAMPLES),
        expand("intermediate/merged_{smp}_fastqc.zip", smp=SAMPLES),
        expand("logs/star/{smp}_Log.final.out", smp=SAMPLES),
        expand("intermediate/{smp}.geneBodyCoverage.txt", smp=SAMPLES),
        expand("intermediate/{smp}.geneBodyCoverage.r", smp=SAMPLES),
        expand("intermediate/{smp}.junctionSaturation_plot.r", smp=SAMPLES),
    output:
        html = "results/global_multiqc.html",
        stats = "intermediate/global_multiqc_general_stats.txt"
    log:
        "logs/multiqc/global_multiqc.log"
    version: "1.0"
    shadow: "minimal"
    shell:
        """
        # Run MultiQC and keep the HTML report
        multiqc -n multiqc.html {input} 2> {log}
        mv multiqc.html {output.html}
        mv multiqc_data/multiqc_general_stats.txt {output.stats}
        """
This is sort of anecdotal since, as @JeeYem pointed out in a comment, it could depend on which analysis you're running with RSeQC. Here's how I use the read_distribution.py analysis in Snakemake; it generates a compatible file that MultiQC recognizes.
rule read_distribution:
    input:
        bam = "data/bam/{srr}.bam",
        bed = config["gencodeBED"]
    output:
        "qc/aligned/{srr}.read_distribution.txt"
    shell:
        """
        read_distribution.py -i {input.bam} -r {input.bed} &> {output}
        """
Basically, just redirect the stdout and stderr streams to a file. Hopefully it's a similar story for the other RSeQC scripts.
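For instance, junction_annotation.py writes its summary statistics to stderr, so a comparable rule might look like the sketch below (untested; the paths mirror the rule above, and the stderr behavior is my understanding of that particular script):
rule junction_annotation:
    input:
        bam = "data/bam/{srr}.bam",
        bed = config["gencodeBED"]
    output:
        "qc/aligned/{srr}.junction_annotation.txt"
    shell:
        """
        # junction_annotation.py prints its summary to stderr;
        # capture it so MultiQC has a text file to parse
        junction_annotation.py -i {input.bam} -r {input.bed} \
            -o qc/aligned/{wildcards.srr} 2> {output}
        """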
I am trying to run a script that modifies the documentation built by Sphinx and hosted on Read the Docs (because some links are not handled properly). The script works when I build locally, but it either fails in the Read the Docs build or its changes do not get propagated to the website.
The script I'm trying to run is super simple: it replaces some HTML links that are not properly converted by sphinx-markdown-tables:
#!/bin/bash
# fix_table_links.sh
FILE="_build/html/api_reference.html"
if [[ "$1" != "" ]]; then
    FILE="$1"
fi
sed -E -i 's/a href="(.*)\.md"/a href="\1.html"/g' "${FILE}"
My readthedocs.yml looks like this:
# Required
version: 2

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py

# Optionally build your docs in additional formats such as PDF and ePub
formats: all

# Optionally set the version of Python and requirements required to build your docs
python:
  install:
    - requirements: docs/requirements.readthedocs.txt

build:
  os: ubuntu-20.04
  tools:
    python: "3.8"
  jobs:
    post_build:
      - echo "Running post-build commands."
      - bash docs/fix_table_links.sh _readthedocs/html/api_reference.html
There are two cases:
Case 1) Using the readthedocs.yml as above, the build fails because _readthedocs/html/api_reference.html does not exist, despite this directory being the place that, according to the documentation here, gets uploaded. An example failure of this run is here.
Case 2) If I change the final line of readthedocs.yml to bash docs/fix_table_links.sh docs/_build/html/api_reference.html, then the build passes (example here), but the links are not updated on the Read the Docs site: they still point to Markdown pages rather than their corresponding HTML pages, so that must not be the version that gets uploaded to the Read the Docs website.
Wading through the documentation, I can't figure out how to do this. Has anybody done this before, or does anyone have a better grasp of how Read the Docs builds work? Thanks!
If you're willing to rewrite the script as a Python function, then you can do this super easily by connecting it as an event handler for the build-finished event.
I've done something similar in one of my own repos, except I post-process a .rst file. It's not actually used on RTD, but I can see it works in the build logs. So it should work to post-process your HTML files as well, since the build-finished event would occur after they've been generated.
First, define the script as a function in your conf.py. It needs to have app and exception as parameters.
def replace_html_links(app, exception):
    with open(FILE, 'r') as f:
        html = f.read()
    # stuff to edit and save the html
Then either define or add to your setup function:
def setup(app):
    app.connect('build-finished', replace_html_links)
And that's it!
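For completeness, here is a fuller sketch of what that function could look like, applying the same .md-to-.html rewrite as the sed script; the file name comes from the question, but the use of app.outdir to locate the built page is my assumption:
import os
import re

def replace_html_links(app, exception):
    # Only touch successful HTML builds
    if exception is not None or app.builder.name != 'html':
        return
    path = os.path.join(app.outdir, 'api_reference.html')  # assumed location
    with open(path, 'r', encoding='utf-8') as f:
        html = f.read()
    # Non-greedy, so multiple links on one line are each handled
    html = re.sub(r'a href="(.*?)\.md"', r'a href="\1.html"', html)
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)

def setup(app):
    app.connect('build-finished', replace_html_links)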
I've been working with Terraform v0.15.4 for a few weeks now and have gotten to grips with most of the lingo. I'm currently trying to dynamically create a cluster of RHEL 7 instances on GCP, and have, for the most part, got it to run okay.
I'm at the point of deploying an instance with certain metadata passed along to it, for use by scripts built into the machine image for configuration thereafter. This metadata is typically just passed via an echo into a text file, which the scripts then pick up as required.
It's... very simple. echo "STUFF" > file... Alas, I am hitting the same issue OVER AND OVER and it's driving me INSANE. I've Googled around for ages, but all I can find are examples of the exact thing that I'm doing; the only difference is that theirs works and mine doesn't... So hopefully I can get some help here.
My 'makes it half-way' code is as follows:
resource "google_compute_instance" "GP_Master_Node" {
...
metadata_startup_script = <<-EOF
echo "hello
you" > /test.txt
echo "help
me" > /test2.txt
EOF
Now, the instance does create successfully with this, although when I look on the instance I get one file called '/test.txt?' (or, if I 'ls' the file, it shows as '/test.txt^M') and no second file. I can run any command instead of echo, and while the first finishes, the second and beyond do not. Why?? What on earth is causing that??
I also found the following code, but it doesn't work for me at all, failing with the error 'Blocks of type "metadata" are not expected here.'
resource "google_compute_instance" "GP_Master_Node" {
...
metadata {
startup-script = "echo test > /test.txt"
}
Okaaaaay! Simple answer to a, in hindsight, silly question (sort of). The file was somehow formatted in DOS, meaning the script required a line continuation character to run correctly (specifically \ at the end of each individual command). Code as follows:
resource "google_compute_instance" "GP_Master_Node" {
...
metadata_startup_script = <<-EOF
echo "hello
you" > /test.txt \
echo "help
me" > /test2.txt \
echo "example1" > /test3.txt \
echo "and so on..." > /final.txt
EOF
However, what also fixed my issue was simply 'refreshing' the file (there is probably a word for this; effectively, converting its DOS CRLF line endings to Unix LF). I created a brand-new file using touch, 'more'd the original file contents to screen, and then copy-pasted them into the new one. On save it is no longer DOS-formatted, as expected, and when I then run Terraform the code runs as expected without requiring the line continuation characters at the end of commands.
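For anyone hitting the same thing, here is one way to confirm and strip DOS line endings from the command line (a sketch using standard Unix tools; dos2unix does the same job if it is installed):
# "with CRLF line terminators" in the output confirms DOS endings
file main.tf
# Strip the trailing carriage returns in place (GNU sed)
sed -i 's/\r$//' main.tf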
Thank you to the commenters for the help :)
I'm looking for a way to record violation findings from ShellCheck in my Jenkins pipeline script. I was not able to find anything so far. For other tools (Java, Python) I'm using Warnings Next Generation, but it does not seem to support ShellCheck yet. I'd like to have the violations visualized within my Jenkins job dashboard. Does anyone have experience with that, or perhaps a ready-to-use custom tool for Warnings NG?
I found a feasible solution myself. As suggested in the comments, shellcheck offers a checkstyle output format, which can be parsed and visualized with Warnings NG. The following pipeline stage definition works fine.
stage('Analyze') {
    steps {
        catchError(buildResult: 'SUCCESS') {
            sh """#!/bin/bash
                # The null command `:` always exits 0, so the step
                # succeeds even when shellcheck reports violations.
                shellcheck -f checkstyle *.sh > shellcheck.xml || :
            """
            recordIssues(tools: [checkStyle(pattern: 'shellcheck.xml')])
        }
    }
}
Running the build generates a nice trend diagram in the job overview.
Running shellcheck on all files and merging the output into a single XML file didn't play well with recordIssues in my case.
I had to create an individual report for each source file to make it work.
stage('Shellcheck') {
    steps {
        catchError(
            buildResult: hudson.model.Result.SUCCESS.toString(),
            stageResult: hudson.model.Result.UNSTABLE.toString(),
            message: "shellcheck error detected, but allowing job to continue!") {
            sh '''
                # shellcheck with all files in a single xml doesn't play well with the jenkins report
                ret=0
                for file in $(grep -rl '^#!/.*/bash' src); do
                    echo shellcheck "${file}"
                    mkdir -p ".checkstyle/${file}/"
                    shellcheck -f checkstyle "${file}" > ".checkstyle/${file}/shellcheck.xml" || (( ret+=$? ))
                done
                exit ${ret}
            '''
        } //catchError
    } //steps
    post {
        always {
            recordIssues(tools: [checkStyle(pattern: '.checkstyle/**/shellcheck.xml')])
        }
    } //post
} //stage
We have a Rails app, and we use webpack, which takes multiple JavaScript files and outputs one single JavaScript file. It takes a long time to run, and I'd like to create a Rake task for this, but being new to Rake I need some help.
I'd like to use Rake's build system so that I get automatic timestamp checking between the input and output .js files: if any of the input files is newer than the output file, the task executes webpack; if none of the input files is newer than the output file, the task does nothing.
In MSBuild this is a cakewalk, and lightning fast. But in Ruby I'm kind of lost.
I'm guessing it might consist of writing file tasks and looping through them, making the one output file depend on the inputs. Or should I use a rule, like this?
outputfile = "~/foo.js"
inputfiles = Dir["~/**/*.js"]

rule outputfile => inputfiles do
  sh "bin/webpack bla bla bla"
end
You can use Rake::FileList to achieve this. Something like this:
file "foo.js" => Rake::FileList["**/*.js"] do
...
end
Also, I'm not sure whether Rake allows ~ in paths; I believe a full path is required. Or just use "#{Dir.home}/foo.js" for the target.
Then call it using:
rake ~/foo.js
And when you have multiple outputs:
task :build => Rake::FileList["config1.xml", "config2.xml"] do
  # all this stuff runs only when the FileList above has changed
  touch 'foo1.js'
  touch 'foo2.js'
  sh "compile foo3.js"
  sh "do-anything-else foo4.js"
end
Run it using:
rake build
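Putting the pieces together for the webpack case from the question, a sketch (the bundle path and the webpack invocation are placeholders, not taken from your app):
# Rakefile
OUTPUT  = "public/bundle.js"                       # hypothetical bundle path
SOURCES = Rake::FileList["app/javascript/**/*.js"] # hypothetical source glob

# Rake rebuilds OUTPUT only when a prerequisite is newer than it
file OUTPUT => SOURCES do
  sh "bin/webpack"                                 # replace with the real command
end

task build: OUTPUT
With that in place, rake build recompiles only when a source file has changed since the last build.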
I am trying to test whether certain files, named in a set of text files, exist in a certain directory. Every once in a while (and I am quite certain I use the same statements every time) I get an error complaining that the echo command cannot be found.
The text files in my directory /audio/playlists/ are named according to the date on which they are supposed to be used, for example 20130715.txt for today:
me@computer:/some/dir# ls /audio/playlists/
20130715.txt 20130802.txt 20130820.txt 20130907.txt 20130925.txt
20130716.txt 20130803.txt 20130821.txt 20130908.txt 20130926.txt
(...)
me@computer:/some/dir# cat /audio/playlists/20130715.txt
#A Comment line goes here
00:00:00 141-751.mp3
00:03:35 141-704.mp3
00:06:42 140-417.mp3
00:10:46 139-808.mp3
00:15:13 136-126.mp3
00:20:26 071-007.mp3
(...)
23:42:22 136-088.mp3
23:46:15 128-466.mp3
23:50:15 129-592.mp3
23:54:29 129-397.mp3
So much for the facts. The following statement, which tests whether every file called upon in all of the text files in that directory actually exists in /audio/mp3s/, produces an error:
me@computer:/some/dir# for i in $(cat /audio/playlists/*.txt|cut -c 10-16|sort|uniq); do [ -f "/audio/mp3s/$i.mp3" ] || echo $i; done
echo: command not found
me@computer:/some/dir#
I would guess bash wants to complain about the "A Comment" line (actually " line ") not being a file, but why would that cause echo not to be found? Again, this mostly works, but every so often I get the error. Any help is greatly appreciated.
That space before echo isn't U+0020, it's U+00A0, a non-breaking space. And indeed, the command " echo" (with a leading non-breaking space) doesn't exist.
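If you ever need to hunt one of these down: U+00A0 is the UTF-8 byte pair c2 a0, which a hex dump makes visible (a sketch, assuming a UTF-8 locale and GNU sed):
# Reveal non-breaking spaces hiding in a script
hexdump -C myscript.sh | grep 'c2 a0'
# Replace them with ordinary spaces, in place
sed -i 's/\xc2\xa0/ /g' myscript.sh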