I am looking to run multiple instances of a command line script at the same time. I am new to this concept of "multi-threading" so am at bit of a loss as to why I am seeing the things that I am seeing.
I have tried to execute the sub-processing in two different ways:
1 - Using multiple calls of Popen without a communicate until the end:
command = 'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum1 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum1{}'.format(i), str(i))
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
command = 'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum2 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum2{}'.format(i), str(i))
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
command = 'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum3 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum3{}'.format(i), str(i))
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
(stdoutdata, stderrdata) = process.communicate()
this starts up each of the command line item but only completes the last entry leaving the other 2 hanging.
2 - Attempting to implement an example from Python threading multiple bash subprocesses? but nothing happens except for a printout of the commands (program hangs with no command line arguments running as observed in windows task manager:
import threading
import Queue
import commands
import time
workspace = r'F:\Processing\SM'
image = 't08r_e'
image_name = (image.split('.'))[0]
i = 0
process_image_tif = workspace + '\\{}{}.tif'.format((image.split('r'))[0], str(i))
# thread class to run a command
class ExampleThread(threading.Thread):
def __init__(self, cmd, queue):
threading.Thread.__init__(self)
self.cmd = cmd
self.queue = queue
def run(self):
# execute the command, queue the result
(status, output) = commands.getstatusoutput(self.cmd)
self.queue.put((self.cmd, output, status))
# queue where results are placed
result_queue = Queue.Queue()
# define the commands to be run in parallel, run them
cmds = ['raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum1 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum1{}'.format(i), str(i)),
'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum2 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum2{}'.format(i), str(i)),
'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum3 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum3{}'.format(i), str(i)),
]
for cmd in cmds:
thread = ExampleThread(cmd, result_queue)
thread.start()
# print results as we get them
while threading.active_count() > 1 or not result_queue.empty():
while not result_queue.empty():
(cmd, output, status) = result_queue.get()
print(cmd)
print(output)
How can I run all of these commands at the same time achieving a result at the end? I am running in windows, pyhton 2.7.
My first try didn't work because of the repeated definitions of stdout and sterror. Removing these definitions causes expected behavior.
Related
I would like to run a program that does not properly support my desired resolution+DPI settings.
Also I want to change my default GTK theme to a lighter one.
What I currently have:
#!/bin/bash
xfconf-query -c xsettings -p /Xft/DPI -s 0
GTK_THEME=/usr/share/themes/Adwaita/gtk-2.0/gtkrc /home/unknown/scripts/ch_resolution.py --output DP-0 --resolution 2560x1440 beersmith3
This sets my DPI settings to 0, changes the gtk-theme, runs a python script that changes my resolution and runs the program, and on program exit changes it back. This is working properly.
Now I want to change back my DPI settings to 136 on program exit
xfconf-query -c xsettings -p /Xft/DPI -s 136
My guess is I need to use a while loop but have no idea how to do it.
ch_resolution.py
#!/usr/bin/env python3
import argparse
import re
import subprocess
import sys
parser = argparse.ArgumentParser()
parser.add_argument('--output', required=True)
parser.add_argument('--resolution', required=True)
parser.add_argument('APP')
args = parser.parse_args()
device_context = '' # track what device's modes we are looking at
modes = [] # keep track of all the devices and modes discovered
current_modes = [] # remember the user's current settings
# Run xrandr and ask it what devices and modes are supported
xrandrinfo = subprocess.Popen('xrandr -q', shell=True, stdout=subprocess.PIPE)
output = xrandrinfo.communicate()[0].decode().split('\n')
for line in output:
# luckily the various data from xrandr are separated by whitespace...
foo = line.split()
# Check to see if the second word in the line indicates a new context
# -- if so, keep track of the context of the device we're seeing
if len(foo) >= 2: # throw out any weirdly formatted lines
if foo[1] == 'disconnected':
# we have a new context, but it should be ignored
device_context = ''
if foo[1] == 'connected':
# we have a new context that we want to test
device_context = foo[0]
elif device_context != '': # we've previously seen a 'connected' dev
# mode names seem to always be of the format [horiz]x[vert]
# (there can be non-mode information inside of a device context!)
if foo[0].find('x') != -1:
modes.append((device_context, foo[0]))
# we also want to remember what the current mode is, which xrandr
# marks with a '*' character, so we can set things back the way
# we found them at the end:
if line.find('*') != -1:
current_modes.append((device_context, foo[0]))
for mode in modes:
if args.output == mode[0] and args.resolution == mode[1]:
cmd = 'xrandr --output ' + mode[0] + ' --mode ' + mode[1]
subprocess.call(cmd, shell=True)
break
else:
print('Unable to set mode ' + args.resolution + ' for output ' + args.output)
sys.exit(1)
subprocess.call(args.APP, shell=True)
# Put things back the way we found them
for mode in current_modes:
cmd = 'xrandr --output ' + mode[0] + ' --mode ' + mode[1]
subprocess.call(cmd, shell=True)
edit:
Thanks #AndreLDM for pointing out that I do not need a separate python script to change the resolution, I don't know why I didn't think of that.
I changed it so I don't need the python script and it is working properly now. If I can improve this script please tell me!
#!/bin/bash
xrandr --output DP-0 --mode 2560x1440
xfconf-query -c xsettings -p /Xft/DPI -s 0
GTK_THEME=/usr/share/themes/Adwaita/gtk-2.0/gtkrc beersmith3
if [ $? == 0 ]
then
xrandr --output DP-0 --mode 3840x2160
xfconf-query -c xsettings -p /Xft/DPI -s 136
exit 0
else
xrandr --output DP-0 --mode 3840x2160
xfconf-query -c xsettings -p /Xft/DPI -s 136
exit 1
fi
I am trying to download some files with snakemake. The files (http://snpeff.sourceforge.net/SnpSift.html#dbNSFP) I would like to download are on a google site/drive and my usual wget approach does not work. I found a bash function that does the job (https://www.zachpfeffer.com/single-post/wget-a-Google-Drive-file):
function gdrive_download () { CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p') wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O $2 rm -rf /tmp/cookies.txt }
gdrive_download 120aPYqveqPx6jtssMEnLoqY0kCgVdR2fgMpb8FhFNHo test.txt
I have tested this function with my ids in a plain bash script and was able to download all the files. To add a bit to the complexity, I must use a workplace template, and incorporate the function into it.
rule dl:
params:
url = 'ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_{genome}/{afile}'
output:
'data/{genome}/{afile}'
params:
id1 = '0B7Ms5xMSFMYlOTV5RllpRjNHU2s',
f1 = 'dbNSFP.txt.gz'
shell:
"""CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id={{params.id1}}" -O- | sed -rn "s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p") && wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id={{params.id1}}" -O {{params.f1}} && rm -rf /tmp/cookies.txt"""
#'wget -c {params.url} -O {output}'
rule checksum:
input:
i = 'data/{genome}/{afile}'
output:
o = temp('tmp/{genome}/{afile}.md5')
shell:
'md5sum {input} > {output}'
rule file_size:
input:
i = 'data/{genome}/{afile}'
output:
o = temp('tmp/{genome}/{afile}.size')
shell:
'du -csh --apparent-size {input} > {output}'
rule file_info:
"""md5 checksum and file size"""
input:
md5 = 'tmp/{genome}/{afile}.md5',
s = 'tmp/{genome}/{afile}.size'
output:
o = temp('tmp/{genome}/info/{afile}.csv')
run:
with open(input.md5) as f:
md5, fp = f.readline().strip().split()
with open(input.s) as f:
size = f.readline().split()[0]
with open(output.o, 'w') as fout:
print('filepath,size,md5', file=fout)
print(f"{fp},{size},{md5}", file=fout)
rule manifest:
input:
expand('tmp/{genome}/info/{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
#expand('tmp/{genome}/info/SnpSift{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
output:
o = 'MANIFEST.csv'
run:
pd.concat([pd.read_csv(afile) for afile in input]).to_csv(output.o, index=False)
There are four downloadable files for which I have ids (I only show one in params), however I don't know how to call the bash functions as written by ZPfeffer for all the ids I have with snakemake. Additionally, when I run this script, there are several errors, the most pressing being
sed: -e expression #1, char 31: unterminated `s' command
I am far from a snakemake expert, any assistance on how to modify my script to a) call the functions with 4 different ids, b) remove the sed error, and c) verify whether this is the correct url format (currently url = 'https://docs.google.com/uc?export/{afile}) will be greatly appreciated.
You would want to use raw string literal so that snakemake doesn't escape special characters, such as backslash in sed command. For example (notice r in front of shell command):
rule foo:
shell:
r"sed d\s\"
You could use --printshellcmds or -p to see how exactly shell: commands get resolved by snakemake.
Here is how I "solved" it:
import pandas as pd
rule dl:
output:
'data/{genome}/{afile}'
shell:
"sh download_snpsift.sh"
rule checksum:
input:
i = 'data/{genome}/{afile}'
output:
o = temp('tmp/{genome}/{afile}.md5')
shell:
'md5sum {input} > {output}'
rule file_size:
input:
i = 'data/{genome}/{afile}'
output:
o = temp('tmp/{genome}/{afile}.size')
shell:
'du -csh --apparent-size {input} > {output}'
rule file_info:
"""md5 checksum and file size"""
input:
md5 = 'tmp/{genome}/{afile}.md5',
s = 'tmp/{genome}/{afile}.size'
output:
o = temp('tmp/{genome}/info/{afile}.csv')
run:
with open(input.md5) as f:
md5, fp = f.readline().strip().split()
with open(input.s) as f:
size = f.readline().split()[0]
with open(output.o, 'w') as fout:
print('filepath,size,md5', file=fout)
print(f"{fp},{size},{md5}", file=fout)
rule manifest:
input:
expand('tmp/{genome}/info/{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
output:
o = 'MANIFEST.csv'
run:
pd.concat([pd.read_csv(afile) for afile in input]).to_csv(output.o, index=False)
And here is the bash script.
function gdrive_download () {
CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O $2
rm -rf /tmp/cookies.txt
}
gdrive_download 0B7Ms5xMSFMYlSTY5dDJjcHVRZ3M data/GRCh37/dbNSFP.txt.gz
gdrive_download 0B7Ms5xMSFMYlOTV5RllpRjNHU2s data/GRCh37/dbNSFP.txt.gz.tbi
gdrive_download 0B7Ms5xMSFMYlbTZodjlGUDZnTGc data/GRCh38/dbNSFP.txt.gz
gdrive_download 0B7Ms5xMSFMYlNVBJdFA5cFZRYkE data/GRCh38/dbNSFP.txt.gz.tbi
I want to export one certain page from a pdf document to an image and automatically fill the page number in the file name. When I run the following code:
gs \
-sDEVICE=jpeg \
-o outfile-%03.jpeg \
-dFirstPage=12 \
-dLastPage=12 \
wo.pdf
I get: outfile-001.jpeg instead of outfile-012.jpeg.
I've wrote a bash script for the job:
function extract_nth_page(){
printf -v j "outfile-%05g.png" $1
echo $j
gs -q -dNOPAUSE -sDEVICE=png16m -r600 -dFirstPage=$1 -dLastPage=$1 -sOutputFile=$j $2 -c quit
return 0
}
# Extracts page number 42 from myFile.pdf to outfile-00042.png
extract_nth_page 42 myFile.pdf
I try to use one tool and I need to use a wildcard present on input.
This is an example:
aDict = {"120":"121" } #tumor : normal
rule all:
input: expand("{case}.mutect2.vcf",case=aDict.keys())
def get_files_somatic(wildcards):
case = wildcards.case
control = aDict[wildcards.case]
return [case + ".sorted.bam", control + ".sorted.bam"]
rule gatk_Mutect2:
input:
get_files_somatic,
output:
"{case}.mutect2.vcf"
params:
genome="ref/hg19.fa",
target= "chr12",
name_tumor='{case}'
log:
"logs/{case}.mutect2.log"
threads: 8
shell:
" gatk-launch Mutect2 -R {params.genome} -I {input[0]} -tumor {params.name_tumor} -I {input[1]} -normal {wildcards.control}"
" -L {params.target} -O {output}"
I Have this error:
'Wildcards' object has no attribute 'control'
So I have a function with case and control. I'm not able to extract code.
The wildcards are derived from the output file/pattern. That is why you only have the wildcard called case. You have to derive the control from that. Try replacing your shell statement with this:
run:
control = aDict[wildcards.case]
shell(
"gatk-launch Mutect2 -R {params.genome} -I {input[0]} "
"-tumor {params.name_tumor} -I {input[1]} -normal {control} "
"-L {input.target2} -O {output}"
)
You could define control in params. Also {input.target2} in shell command would result in error. May be it's supposed to be params.target?
rule gatk_Mutect2:
input:
get_files_somatic,
output:
"{case}.mutect2.vcf"
params:
genome="ref/hg19.fa",
target= "chr12",
name_tumor='{case}',
control = lambda wildcards: aDict[wildcards.case]
shell:
"""
gatk-launch Mutect2 -R {params.genome} -I {input[0]} -tumor {params.name_tumor} \\
-I {input[1]} -normal {params.control} -L {params.target} -O {output}
"""
I need to see my program's printf output in sync with the dtrace output.
I like to build my own version of dtrace command that produce the equivalent output of the "sudo dtruss -t write_nocancel ls" command.
This is the "correct dtruss command/output":
sudo dtruss -t write_nocancel ls
Chap1 Chap10 Chap11 Chap12 Chap2 Chap3 Chap4 Chap5 Chap6 Chap7 Chap8 Chap9 README
SYSCALL(args) = return
write_nocancel(0x1, "Chap1\tChap10\tChap11\tChap12\tChap2\tChap3\tChap4\tChap5\tChap6\tChap7\tChap8\tChap9\tREADME\n\0", 0x52) = 82 0
Base on looking at the dtruss script source code, I tried this dtrace command, but it failed.
sudo dtrace -q \
-n '*:*:write_nocancel:entry {self->arg0=arg0; self->arg1 =arg1; \
self->arg2 =arg2; self->code=0; } ' \
-n '*:*:write_nocancel:return { \
printf("return %s(0x%X, \"%S\", 0x%X) = %d %d", \
probefunc,self->arg0, arg0 == -1 ? "" : stringof(copyin(self->arg1,arg0)),self->arg2,(int)arg0, \
(int)errno); }' \
-c ls 2>&1
Chap1
Chap10
Chap11
Chap12
Chap2
Chap3
Chap4
Chap5
Chap6
Chap7
Chap8
Chap9
README
dtrace: error on enabled probe ID 3 (ID 209288: fbt:mach_kernel:write_nocancel:return): invalid address (0xffffff80218dfc40) in action #3 at DIF offset 92
dtrace: error on enabled probe ID 4 (ID 958: syscall::write_nocancel:return): invalid address (0xffffff80218dfc40) in action #3 at DIF offset 92
dtrace: error on enabled probe ID 3 (ID 209288: fbt:mach_kernel:write_nocancel:return): invalid address (0xffffff801a7c0010) in action #3 at DIF offset 92
Any dtrace experts out there might have a clue on how to fixe this?
Find the answer: (The issue of two -n options).
sudo dtrace -q -n \
'syscall::write_nocancel:entry{self->start = 1; \
self->vstart = 1; self->arg0 = arg0; \
self->arg1 = arg1; self->arg2 = arg2;} \
*:*:write_nocancel:return /self->start/ \
{ printf("return %s(0x%X, \"%S\", 0x%X) = %d %d" \
,probefunc,self->arg0, \
arg0 == -1 ? "" : stringof(copyin(self->arg1,arg0)),\
self->arg2,(int)arg0, (int)errno); }' \
-c ls 2>&1