Python convert sphinx RST to HTML - python-sphinx

I have tried this code:
from docutils.core import publish_string
text = publish_string(open(file_path, 'r').read(), writer_name='html')
But it says:
<p>Unknown directive type "toctree".</p>
So it won't work with some specific sphinx directives.
What is the easiest way to do same stuff for sphinx RST files?
upd.
Seems like it must be:
sphinx-build -b singlehtml -D extensions='sphinx.ext.autodoc' -D master_doc='index' -C /mypath/docs .
How can I call that from Python code instead of console?

Here is what i wanted to do:
import sphinx
args = ". -b singlehtml -D extensions=sphinx.ext.autodoc -D master_doc=index -C /tmp/doc /tmp/out"
sphinx.main(args.split())
result = open('/tmp/out/index.html', 'r')

here is another example:
# sphinx version = 2.3.1
# Python version = 3.7.3
from sphinx.cmd.build import main as sphinx_main
from pathlib import Path
from os import startfile
import sys
master_doc = 'index'
source_suffix = '.rst'
output_file = 'tmp/dist'
html_theme = 'nature'
build_format = 'html' # singlehtml, ...
args = f". -b {build_format} -D extensions=sphinx.ext.autodoc " \
f"-D master_doc={master_doc} " \
f"-D source_suffix={source_suffix} " \
f"-D html_theme={html_theme} " \
f"-C {output_file} "
sys.path.append(str(Path('.').absolute()))
sphinx_main(args.split())
startfile(Path(output_file).joinpath(master_doc+'.html'))
more parameters >
https://www.sphinx-doc.org/en/master/man/sphinx-build.html
https://www.sphinx-doc.org/en/master/usage/configuration.html

Related

Using bash functions in snakemake

I am trying to download some files with snakemake. The files (http://snpeff.sourceforge.net/SnpSift.html#dbNSFP) I would like to download are on a google site/drive and my usual wget approach does not work. I found a bash function that does the job (https://www.zachpfeffer.com/single-post/wget-a-Google-Drive-file):
function gdrive_download () { CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p') wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O $2 rm -rf /tmp/cookies.txt }
gdrive_download 120aPYqveqPx6jtssMEnLoqY0kCgVdR2fgMpb8FhFNHo test.txt
I have tested this function with my ids in a plain bash script and was able to download all the files. To add a bit to the complexity, I must use a workplace template, and incorporate the function into it.
rule dl:
params:
url = 'ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_{genome}/{afile}'
output:
'data/{genome}/{afile}'
params:
id1 = '0B7Ms5xMSFMYlOTV5RllpRjNHU2s',
f1 = 'dbNSFP.txt.gz'
shell:
"""CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id={{params.id1}}" -O- | sed -rn "s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p") && wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id={{params.id1}}" -O {{params.f1}} && rm -rf /tmp/cookies.txt"""
#'wget -c {params.url} -O {output}'
rule checksum:
input:
i = 'data/{genome}/{afile}'
output:
o = temp('tmp/{genome}/{afile}.md5')
shell:
'md5sum {input} > {output}'
rule file_size:
input:
i = 'data/{genome}/{afile}'
output:
o = temp('tmp/{genome}/{afile}.size')
shell:
'du -csh --apparent-size {input} > {output}'
rule file_info:
"""md5 checksum and file size"""
input:
md5 = 'tmp/{genome}/{afile}.md5',
s = 'tmp/{genome}/{afile}.size'
output:
o = temp('tmp/{genome}/info/{afile}.csv')
run:
with open(input.md5) as f:
md5, fp = f.readline().strip().split()
with open(input.s) as f:
size = f.readline().split()[0]
with open(output.o, 'w') as fout:
print('filepath,size,md5', file=fout)
print(f"{fp},{size},{md5}", file=fout)
rule manifest:
input:
expand('tmp/{genome}/info/{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
#expand('tmp/{genome}/info/SnpSift{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
output:
o = 'MANIFEST.csv'
run:
pd.concat([pd.read_csv(afile) for afile in input]).to_csv(output.o, index=False)
There are four downloadable files for which I have ids (I only show one in params), however I don't know how to call the bash functions as written by ZPfeffer for all the ids I have with snakemake. Additionally, when I run this script, there are several errors, the most pressing being
sed: -e expression #1, char 31: unterminated `s' command
I am far from a snakemake expert, any assistance on how to modify my script to a) call the functions with 4 different ids, b) remove the sed error, and c) verify whether this is the correct url format (currently url = 'https://docs.google.com/uc?export/{afile}) will be greatly appreciated.
You would want to use raw string literal so that snakemake doesn't escape special characters, such as backslash in sed command. For example (notice r in front of shell command):
rule foo:
shell:
r"sed d\s\"
You could use --printshellcmds or -p to see how exactly shell: commands get resolved by snakemake.
Here is how I "solved" it:
import pandas as pd
rule dl:
output:
'data/{genome}/{afile}'
shell:
"sh download_snpsift.sh"
rule checksum:
input:
i = 'data/{genome}/{afile}'
output:
o = temp('tmp/{genome}/{afile}.md5')
shell:
'md5sum {input} > {output}'
rule file_size:
input:
i = 'data/{genome}/{afile}'
output:
o = temp('tmp/{genome}/{afile}.size')
shell:
'du -csh --apparent-size {input} > {output}'
rule file_info:
"""md5 checksum and file size"""
input:
md5 = 'tmp/{genome}/{afile}.md5',
s = 'tmp/{genome}/{afile}.size'
output:
o = temp('tmp/{genome}/info/{afile}.csv')
run:
with open(input.md5) as f:
md5, fp = f.readline().strip().split()
with open(input.s) as f:
size = f.readline().split()[0]
with open(output.o, 'w') as fout:
print('filepath,size,md5', file=fout)
print(f"{fp},{size},{md5}", file=fout)
rule manifest:
input:
expand('tmp/{genome}/info/{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
output:
o = 'MANIFEST.csv'
run:
pd.concat([pd.read_csv(afile) for afile in input]).to_csv(output.o, index=False)
And here is the bash script.
function gdrive_download () {
CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O $2
rm -rf /tmp/cookies.txt
}
gdrive_download 0B7Ms5xMSFMYlSTY5dDJjcHVRZ3M data/GRCh37/dbNSFP.txt.gz
gdrive_download 0B7Ms5xMSFMYlOTV5RllpRjNHU2s data/GRCh37/dbNSFP.txt.gz.tbi
gdrive_download 0B7Ms5xMSFMYlbTZodjlGUDZnTGc data/GRCh38/dbNSFP.txt.gz
gdrive_download 0B7Ms5xMSFMYlNVBJdFA5cFZRYkE data/GRCh38/dbNSFP.txt.gz.tbi

Multiple Processes - Python

I am looking to run multiple instances of a command line script at the same time. I am new to this concept of "multi-threading" so am at bit of a loss as to why I am seeing the things that I am seeing.
I have tried to execute the sub-processing in two different ways:
1 - Using multiple calls of Popen without a communicate until the end:
command = 'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum1 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum1{}'.format(i), str(i))
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
command = 'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum2 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum2{}'.format(i), str(i))
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
command = 'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum3 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum3{}'.format(i), str(i))
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
(stdoutdata, stderrdata) = process.communicate()
this starts up each of the command line item but only completes the last entry leaving the other 2 hanging.
2 - Attempting to implement an example from Python threading multiple bash subprocesses? but nothing happens except for a printout of the commands (program hangs with no command line arguments running as observed in windows task manager:
import threading
import Queue
import commands
import time
workspace = r'F:\Processing\SM'
image = 't08r_e'
image_name = (image.split('.'))[0]
i = 0
process_image_tif = workspace + '\\{}{}.tif'.format((image.split('r'))[0], str(i))
# thread class to run a command
class ExampleThread(threading.Thread):
def __init__(self, cmd, queue):
threading.Thread.__init__(self)
self.cmd = cmd
self.queue = queue
def run(self):
# execute the command, queue the result
(status, output) = commands.getstatusoutput(self.cmd)
self.queue.put((self.cmd, output, status))
# queue where results are placed
result_queue = Queue.Queue()
# define the commands to be run in parallel, run them
cmds = ['raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum1 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum1{}'.format(i), str(i)),
'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum2 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum2{}'.format(i), str(i)),
'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum3 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum3{}'.format(i), str(i)),
]
for cmd in cmds:
thread = ExampleThread(cmd, result_queue)
thread.start()
# print results as we get them
while threading.active_count() > 1 or not result_queue.empty():
while not result_queue.empty():
(cmd, output, status) = result_queue.get()
print(cmd)
print(output)
How can I run all of these commands at the same time achieving a result at the end? I am running in windows, pyhton 2.7.
My first try didn't work because of the repeated definitions of stdout and sterror. Removing these definitions causes expected behavior.

Use wildcard on params

I try to use one tool and I need to use a wildcard present on input.
This is an example:
aDict = {"120":"121" } #tumor : normal
rule all:
input: expand("{case}.mutect2.vcf",case=aDict.keys())
def get_files_somatic(wildcards):
case = wildcards.case
control = aDict[wildcards.case]
return [case + ".sorted.bam", control + ".sorted.bam"]
rule gatk_Mutect2:
input:
get_files_somatic,
output:
"{case}.mutect2.vcf"
params:
genome="ref/hg19.fa",
target= "chr12",
name_tumor='{case}'
log:
"logs/{case}.mutect2.log"
threads: 8
shell:
" gatk-launch Mutect2 -R {params.genome} -I {input[0]} -tumor {params.name_tumor} -I {input[1]} -normal {wildcards.control}"
" -L {params.target} -O {output}"
I Have this error:
'Wildcards' object has no attribute 'control'
So I have a function with case and control. I'm not able to extract code.
The wildcards are derived from the output file/pattern. That is why you only have the wildcard called case. You have to derive the control from that. Try replacing your shell statement with this:
run:
control = aDict[wildcards.case]
shell(
"gatk-launch Mutect2 -R {params.genome} -I {input[0]} "
"-tumor {params.name_tumor} -I {input[1]} -normal {control} "
"-L {input.target2} -O {output}"
)
You could define control in params. Also {input.target2} in shell command would result in error. May be it's supposed to be params.target?
rule gatk_Mutect2:
input:
get_files_somatic,
output:
"{case}.mutect2.vcf"
params:
genome="ref/hg19.fa",
target= "chr12",
name_tumor='{case}',
control = lambda wildcards: aDict[wildcards.case]
shell:
"""
gatk-launch Mutect2 -R {params.genome} -I {input[0]} -tumor {params.name_tumor} \\
-I {input[1]} -normal {params.control} -L {params.target} -O {output}
"""

Passing go code directly into go run without a file

Is it possible to pass a string of go code into go run instead of go run /some/path/script.go? I tried:
echo "some awesome go code here" | go run
But does not work. Thanks.
I don't think that there is such an option. At least not with the standard *g compilers or
go run.
You can try using gccgo as GCC supports reading from stdin.
Since I thought that this would be a useful thing to have, I wrote a relatively small Python script that achieves what I think you want. I called it go-script, and here are some usage examples:
# Assuming that test.go is a valid go file including package and imports
$ go-script --no-package < test.go
# Runs code from stdin, importing 'fmt' and wrapping it in a func main(){}
$ echo 'fmt.Println("test")' | go-script --import fmt --main
$ echo 'fmt.Println("test")' | go-script -ifmt -m
Help:
Usage: go-script [options]
Options:
-h, --help show this help message and exit
-i PACKAGE, --import=PACKAGE
Import package of given name
-p, --no-package Don't specify 'package main' (enabled by default)
-m, --main Wrap input in a func main() {} block
-d, --debug Print the generated Go code instead of running it.
The source (also available as a gist):
#!/usr/bin/env python
from __future__ import print_function
from optparse import OptionParser
import os
import sys
parser = OptionParser()
parser.add_option("-i", "--import", dest="imports", action="append", default=[],
help="Import package of given name", metavar="PACKAGE")
parser.add_option("-p", "--no-package", dest="package", action="store_false", default=True,
help="Don't specify 'package main' (enabled by default)")
parser.add_option("-m", "--main", dest="main", action="store_true", default=False,
help="Wrap input in a func main() {} block")
parser.add_option("-d", "--debug", dest="debug", action="store_true", default=False,
help="Print the generated Go code instead of running it.")
(options, args) = parser.parse_args()
stdin = ""
for line in sys.stdin.readlines():
stdin += "%s\n" % line
out = ""
if options.package:
out += "package main\n\n"
for package in options.imports:
out += "import \"%s\"\n" % package
out += "\n"
if options.main:
out += "func main() {\n%s\n}\n" % stdin
else:
out += stdin
if options.debug:
print(out)
else:
tmpfile = "%s%s" % (os.environ["TMPDIR"], "script.go")
f = open(tmpfile, 'w')
print(out, file=f)
f.close()
os.execlp("go", "", "run", tmpfile)
This works
cat <<EOF | tee /tmp/blah.go | go run /tmp/blah.go
package main
import "fmt"
func main() {
fmt.Println("Hello, World!")
}
EOF
If you want to not have to open a file and edit it first. Although I wouldn't find this super practical for every day use.

Paste to codepad.org using bash or curl

How can I paste to codepad.org from the commandline using curl?
here's a Python script
import urllib
import urllib2
url = 'http://codepad.org'
content=open("myscript.py").read()
values = {'lang' : 'Python',
'code' : content,
'submit':'Submit'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
for href in the_page.split("</a>"):
if "Link:" in href:
ind=href.index('Link:')
found = href[ind+5:]
for i in found.split('">'):
if '<a href=' in i:
print "The link: " ,i.replace('<a href="',"").strip()
output
$ python python.py
The link: http://codepad.org/W0G8HIzM
Yes, you can do it with curl. Assuming your code is Python and in myfile.python, you can do it like this:
$ curl -d "lang=Python&submit=Submit" --data-urlencode code#myfile.py codepad.org
(Edited to make it work.)
You can also use reval:
reval test.py
reval -l ruby -e 'p 2+2'
reval prog.hs -p # make private

Resources