Multiple Processes - Python - windows

I want to run multiple instances of a command-line script at the same time. I am new to the concept of "multi-threading", so I am a bit at a loss as to why I am seeing what I am seeing.
I have tried to execute the sub-processing in two different ways:
1 - Using multiple calls of Popen without a communicate until the end:
command = 'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum1 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum1{}'.format(i), str(i))
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
command = 'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum2 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum2{}'.format(i), str(i))
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
command = 'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum3 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum3{}'.format(i), str(i))
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
(stdoutdata, stderrdata) = process.communicate()
This starts each of the command-line items, but only the last one completes; the other two are left hanging.
2 - Attempting to implement an example from "Python threading multiple bash subprocesses?", but nothing happens except for a printout of the commands (the program hangs, and none of the commands show up as running in Windows Task Manager):
import threading
import Queue
import commands
import time
workspace = r'F:\Processing\SM'
image = 't08r_e'
image_name = (image.split('.'))[0]
i = 0
process_image_tif = workspace + '\\{}{}.tif'.format((image.split('r'))[0], str(i))
# thread class to run a command
class ExampleThread(threading.Thread):
    def __init__(self, cmd, queue):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.queue = queue
    def run(self):
        # execute the command, queue the result
        (status, output) = commands.getstatusoutput(self.cmd)
        self.queue.put((self.cmd, output, status))
# queue where results are placed
result_queue = Queue.Queue()
# define the commands to be run in parallel, run them
cmds = ['raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum1 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum1{}'.format(i), str(i)),
'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum2 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum2{}'.format(i), str(i)),
'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum3 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum3{}'.format(i), str(i)),
]
for cmd in cmds:
    thread = ExampleThread(cmd, result_queue)
    thread.start()
# print results as we get them
while threading.active_count() > 1 or not result_queue.empty():
    while not result_queue.empty():
        (cmd, output, status) = result_queue.get()
        print(cmd)
        print(output)
How can I run all of these commands at the same time and collect the results at the end? I am running on Windows with Python 2.7.

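My first attempt didn't work because every Popen call redirected stdout and stderr to subprocess.PIPE, yet only the last process was ever read with communicate(); presumably the first two blocked once their unread pipes filled up. Removing the stdout and stderr arguments gives the expected behavior.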
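A minimal sketch of that fix, assuming the goal is to launch every command first and only then wait for all of them. `run_all` is a hypothetical helper name, and the placeholder commands (`sys.executable -c ...`) stand in for the `raster2pgsql ... | psql ...` pipelines. Output is not redirected to a pipe, so no child can block on a pipe that nobody reads:

```python
import subprocess
import sys


def run_all(commands, shell=False):
    """Start every command, then wait for all of them; return the exit codes."""
    # Launch everything first so the commands run concurrently.
    # Each Popen object is kept in the list instead of overwriting one variable.
    processes = [subprocess.Popen(cmd, shell=shell) for cmd in commands]
    # wait() is safe here because stdout/stderr are not redirected to PIPE.
    return [p.wait() for p in processes]


if __name__ == "__main__":
    # Placeholder jobs; the real command strings would be passed with shell=True.
    demo = [[sys.executable, "-c", "print('job %d done')" % n] for n in range(3)]
    print(run_all(demo))
```

If the output of each command is needed, one option is to call communicate() on every process in the list rather than only on the last one.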

Related

Change back DPI settings in a bash script

I would like to run a program that does not properly support my desired resolution+DPI settings.
Also I want to change my default GTK theme to a lighter one.
What I currently have:
#!/bin/bash
xfconf-query -c xsettings -p /Xft/DPI -s 0
GTK_THEME=/usr/share/themes/Adwaita/gtk-2.0/gtkrc /home/unknown/scripts/ch_resolution.py --output DP-0 --resolution 2560x1440 beersmith3
This sets my DPI settings to 0, changes the gtk-theme, runs a python script that changes my resolution and runs the program, and on program exit changes it back. This is working properly.
Now I want to change my DPI setting back to 136 when the program exits:
xfconf-query -c xsettings -p /Xft/DPI -s 136
My guess is I need to use a while loop but have no idea how to do it.
ch_resolution.py
#!/usr/bin/env python3
import argparse
import re
import subprocess
import sys
parser = argparse.ArgumentParser()
parser.add_argument('--output', required=True)
parser.add_argument('--resolution', required=True)
parser.add_argument('APP')
args = parser.parse_args()
device_context = '' # track what device's modes we are looking at
modes = [] # keep track of all the devices and modes discovered
current_modes = [] # remember the user's current settings
# Run xrandr and ask it what devices and modes are supported
xrandrinfo = subprocess.Popen('xrandr -q', shell=True, stdout=subprocess.PIPE)
output = xrandrinfo.communicate()[0].decode().split('\n')
for line in output:
    # luckily the various data from xrandr are separated by whitespace...
    foo = line.split()
    # Check to see if the second word in the line indicates a new context
    # -- if so, keep track of the context of the device we're seeing
    if len(foo) >= 2: # throw out any weirdly formatted lines
        if foo[1] == 'disconnected':
            # we have a new context, but it should be ignored
            device_context = ''
        if foo[1] == 'connected':
            # we have a new context that we want to test
            device_context = foo[0]
        elif device_context != '': # we've previously seen a 'connected' dev
            # mode names seem to always be of the format [horiz]x[vert]
            # (there can be non-mode information inside of a device context!)
            if foo[0].find('x') != -1:
                modes.append((device_context, foo[0]))
            # we also want to remember what the current mode is, which xrandr
            # marks with a '*' character, so we can set things back the way
            # we found them at the end:
            if line.find('*') != -1:
                current_modes.append((device_context, foo[0]))
for mode in modes:
    if args.output == mode[0] and args.resolution == mode[1]:
        cmd = 'xrandr --output ' + mode[0] + ' --mode ' + mode[1]
        subprocess.call(cmd, shell=True)
        break
else:
    print('Unable to set mode ' + args.resolution + ' for output ' + args.output)
    sys.exit(1)
subprocess.call(args.APP, shell=True)
# Put things back the way we found them
for mode in current_modes:
    cmd = 'xrandr --output ' + mode[0] + ' --mode ' + mode[1]
    subprocess.call(cmd, shell=True)
edit:
Thanks @AndreLDM for pointing out that I do not need a separate Python script to change the resolution; I don't know why I didn't think of that.
I changed it so that it no longer needs the Python script, and it is working properly now. If this script can be improved, please tell me!
#!/bin/bash
xrandr --output DP-0 --mode 2560x1440
xfconf-query -c xsettings -p /Xft/DPI -s 0
GTK_THEME=/usr/share/themes/Adwaita/gtk-2.0/gtkrc beersmith3
if [ $? == 0 ]
then
    xrandr --output DP-0 --mode 3840x2160
    xfconf-query -c xsettings -p /Xft/DPI -s 136
    exit 0
else
    xrandr --output DP-0 --mode 3840x2160
    xfconf-query -c xsettings -p /Xft/DPI -s 136
    exit 1
fi
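
On the request for improvements: both branches of the if run the same two restore commands, so they could be run unconditionally once the program exits. Below is a sketch of that idea in Python, in the spirit of ch_resolution.py. `run_with_restore`, its `runner` parameter and the `SETUP`/`APP`/`RESTORE` constants are hypothetical names used only for illustration. The commands are the ones from the script above, and passing GTK_THEME through the `env` utility is an assumption about how the variable would be set:

```python
import subprocess


def run_with_restore(setup, app, restore, runner=subprocess.call):
    """Run the setup commands, then the app, and always run the restore commands."""
    for cmd in setup:
        runner(cmd)
    try:
        # The app's exit status is handed back to the caller.
        return runner(app)
    finally:
        # Runs whether the app exits cleanly, fails, or is interrupted.
        for cmd in restore:
            runner(cmd)


# Commands taken from the bash script; nothing is executed at import time.
SETUP = [
    ["xrandr", "--output", "DP-0", "--mode", "2560x1440"],
    ["xfconf-query", "-c", "xsettings", "-p", "/Xft/DPI", "-s", "0"],
]
APP = ["env", "GTK_THEME=/usr/share/themes/Adwaita/gtk-2.0/gtkrc", "beersmith3"]
RESTORE = [
    ["xrandr", "--output", "DP-0", "--mode", "3840x2160"],
    ["xfconf-query", "-c", "xsettings", "-p", "/Xft/DPI", "-s", "136"],
]
```

Calling `sys.exit(run_with_restore(SETUP, APP, RESTORE))` would then preserve the program's exit status while restoring the resolution and DPI in every case.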

Using bash functions in snakemake

I am trying to download some files with snakemake. The files (http://snpeff.sourceforge.net/SnpSift.html#dbNSFP) I would like to download are on a google site/drive and my usual wget approach does not work. I found a bash function that does the job (https://www.zachpfeffer.com/single-post/wget-a-Google-Drive-file):
function gdrive_download () {
    CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')
    wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O $2
    rm -rf /tmp/cookies.txt
}
gdrive_download 120aPYqveqPx6jtssMEnLoqY0kCgVdR2fgMpb8FhFNHo test.txt
I have tested this function with my ids in a plain bash script and was able to download all the files. To add a bit to the complexity, I must use a workplace template, and incorporate the function into it.
rule dl:
    params:
        url = 'ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_{genome}/{afile}'
    output:
        'data/{genome}/{afile}'
    params:
        id1 = '0B7Ms5xMSFMYlOTV5RllpRjNHU2s',
        f1 = 'dbNSFP.txt.gz'
    shell:
        """CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id={{params.id1}}" -O- | sed -rn "s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p") && wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id={{params.id1}}" -O {{params.f1}} && rm -rf /tmp/cookies.txt"""
        #'wget -c {params.url} -O {output}'
rule checksum:
    input:
        i = 'data/{genome}/{afile}'
    output:
        o = temp('tmp/{genome}/{afile}.md5')
    shell:
        'md5sum {input} > {output}'
rule file_size:
    input:
        i = 'data/{genome}/{afile}'
    output:
        o = temp('tmp/{genome}/{afile}.size')
    shell:
        'du -csh --apparent-size {input} > {output}'
rule file_info:
    """md5 checksum and file size"""
    input:
        md5 = 'tmp/{genome}/{afile}.md5',
        s = 'tmp/{genome}/{afile}.size'
    output:
        o = temp('tmp/{genome}/info/{afile}.csv')
    run:
        with open(input.md5) as f:
            md5, fp = f.readline().strip().split()
        with open(input.s) as f:
            size = f.readline().split()[0]
        with open(output.o, 'w') as fout:
            print('filepath,size,md5', file=fout)
            print(f"{fp},{size},{md5}", file=fout)
rule manifest:
    input:
        expand('tmp/{genome}/info/{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
        #expand('tmp/{genome}/info/SnpSift{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
    output:
        o = 'MANIFEST.csv'
    run:
        pd.concat([pd.read_csv(afile) for afile in input]).to_csv(output.o, index=False)
There are four downloadable files for which I have ids (I only show one in params), however I don't know how to call the bash functions as written by ZPfeffer for all the ids I have with snakemake. Additionally, when I run this script, there are several errors, the most pressing being
sed: -e expression #1, char 31: unterminated `s' command
I am far from a Snakemake expert. Any assistance on how to modify my script to a) call the function with the 4 different ids, b) remove the sed error, and c) verify whether this is the correct url format (currently url = 'https://docs.google.com/uc?export/{afile}') would be greatly appreciated.
You would want to use a raw string literal so that Python does not interpret special characters, such as the backslashes in the sed command, before Snakemake passes the command to the shell. For example (notice the r in front of the shell command):
rule foo:
    shell:
        r"sed 's/\s//g'"
You could use --printshellcmds or -p to see how exactly shell: commands get resolved by snakemake.
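To see why the raw string matters: in a regular Python string literal, `\1` is an octal escape and `\n` is a newline, so sed receives a control character followed by a line break in the middle of its `s` command, which would explain the "unterminated `s' command" error. A small sketch comparing the two forms of the sed expression from the question (the variable names are illustrative):

```python
# The sed expression from the question, once as a regular and once as a raw string.
regular = "s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p"
raw = r"s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p"

# In the regular literal, "\1" has become the single character chr(1) and
# "\n" a real newline, which cuts the sed command in two.
print(repr(regular))
# In the raw literal, both backslash sequences reach sed unchanged.
print(repr(raw))
print("\n" in regular, "\n" in raw)
```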
Here is how I "solved" it:
import pandas as pd
rule dl:
    output:
        'data/{genome}/{afile}'
    shell:
        "sh download_snpsift.sh"
rule checksum:
    input:
        i = 'data/{genome}/{afile}'
    output:
        o = temp('tmp/{genome}/{afile}.md5')
    shell:
        'md5sum {input} > {output}'
rule file_size:
    input:
        i = 'data/{genome}/{afile}'
    output:
        o = temp('tmp/{genome}/{afile}.size')
    shell:
        'du -csh --apparent-size {input} > {output}'
rule file_info:
    """md5 checksum and file size"""
    input:
        md5 = 'tmp/{genome}/{afile}.md5',
        s = 'tmp/{genome}/{afile}.size'
    output:
        o = temp('tmp/{genome}/info/{afile}.csv')
    run:
        with open(input.md5) as f:
            md5, fp = f.readline().strip().split()
        with open(input.s) as f:
            size = f.readline().split()[0]
        with open(output.o, 'w') as fout:
            print('filepath,size,md5', file=fout)
            print(f"{fp},{size},{md5}", file=fout)
rule manifest:
    input:
        expand('tmp/{genome}/info/{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
    output:
        o = 'MANIFEST.csv'
    run:
        pd.concat([pd.read_csv(afile) for afile in input]).to_csv(output.o, index=False)
And here is the bash script.
function gdrive_download () {
    CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')
    wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O $2
    rm -rf /tmp/cookies.txt
}
gdrive_download 0B7Ms5xMSFMYlSTY5dDJjcHVRZ3M data/GRCh37/dbNSFP.txt.gz
gdrive_download 0B7Ms5xMSFMYlOTV5RllpRjNHU2s data/GRCh37/dbNSFP.txt.gz.tbi
gdrive_download 0B7Ms5xMSFMYlbTZodjlGUDZnTGc data/GRCh38/dbNSFP.txt.gz
gdrive_download 0B7Ms5xMSFMYlNVBJdFA5cFZRYkE data/GRCh38/dbNSFP.txt.gz.tbi
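For part (a) of the question, one possible alternative to a separate shell script is a lookup table from output file to Google Drive id, which a rule could consult through a params function (for example `params: id=lambda wildcards: drive_id(wildcards.genome, wildcards.afile)`). This is only a sketch: `DRIVE_IDS`, `drive_id` and `download_url` are hypothetical names, the ids and paths are the ones from the script above, and the URL pattern is the one used by `gdrive_download`:

```python
# (genome, file name) -> Google Drive id, copied from the gdrive_download calls.
DRIVE_IDS = {
    ("GRCh37", "dbNSFP.txt.gz"): "0B7Ms5xMSFMYlSTY5dDJjcHVRZ3M",
    ("GRCh37", "dbNSFP.txt.gz.tbi"): "0B7Ms5xMSFMYlOTV5RllpRjNHU2s",
    ("GRCh38", "dbNSFP.txt.gz"): "0B7Ms5xMSFMYlbTZodjlGUDZnTGc",
    ("GRCh38", "dbNSFP.txt.gz.tbi"): "0B7Ms5xMSFMYlNVBJdFA5cFZRYkE",
}


def drive_id(genome, afile):
    """Return the Drive id for one output file; raises KeyError if unknown."""
    return DRIVE_IDS[(genome, afile)]


def download_url(genome, afile):
    """Build the first-request URL used by gdrive_download for this file."""
    return "https://docs.google.com/uc?export=download&id=" + drive_id(genome, afile)


print(download_url("GRCh38", "dbNSFP.txt.gz"))
```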

GhostScript auto pagenumbering

I want to export one certain page from a pdf document to an image and automatically fill the page number in the file name. When I run the following code:
gs \
    -sDEVICE=jpeg \
    -o outfile-%03d.jpeg \
    -dFirstPage=12 \
    -dLastPage=12 \
    wo.pdf
I get: outfile-001.jpeg instead of outfile-012.jpeg.
I wrote a bash script for the job:
function extract_nth_page(){
    printf -v j "outfile-%05g.png" $1
    echo $j
    gs -q -dNOPAUSE -sDEVICE=png16m -r600 -dFirstPage=$1 -dLastPage=$1 -sOutputFile=$j $2 -c quit
    return 0
}
# Extracts page number 42 from myFile.pdf to outfile-00042.png
extract_nth_page 42 myFile.pdf
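As far as I can tell, the %d in Ghostscript's output file name counts the pages written, starting at 1, rather than the page number in the source PDF, which would explain outfile-001.jpeg. The script above sidesteps this by formatting the name itself. The same idea as a Python sketch, where `gs_extract_command` and its `pattern` parameter are hypothetical names and the gs flags are the ones from the bash function:

```python
def gs_extract_command(page, pdf, pattern="outfile-%05d.png"):
    """Build the gs argument list that renders one page to a numbered PNG."""
    # Format the file name ourselves instead of relying on gs's own counter.
    outfile = pattern % page
    return [
        "gs", "-q", "-dNOPAUSE", "-sDEVICE=png16m", "-r600",
        "-dFirstPage=%d" % page, "-dLastPage=%d" % page,
        "-sOutputFile=" + outfile, pdf, "-c", "quit",
    ]


print(gs_extract_command(42, "myFile.pdf"))
```

The resulting list could be passed to `subprocess.call` to run Ghostscript.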

Use wildcard on params

I am trying to run a tool and need to use a sample name that only appears in the input files.
This is an example:
aDict = {"120":"121" } #tumor : normal
rule all:
    input: expand("{case}.mutect2.vcf",case=aDict.keys())
def get_files_somatic(wildcards):
    case = wildcards.case
    control = aDict[wildcards.case]
    return [case + ".sorted.bam", control + ".sorted.bam"]
rule gatk_Mutect2:
    input:
        get_files_somatic,
    output:
        "{case}.mutect2.vcf"
    params:
        genome="ref/hg19.fa",
        target= "chr12",
        name_tumor='{case}'
    log:
        "logs/{case}.mutect2.log"
    threads: 8
    shell:
        " gatk-launch Mutect2 -R {params.genome} -I {input[0]} -tumor {params.name_tumor} -I {input[1]} -normal {wildcards.control}"
        " -L {params.target} -O {output}"
I get this error:
'Wildcards' object has no attribute 'control'
So I have a function that returns both the case and the control files, but I am not able to get the control name into the shell command.
The wildcards are derived from the output file/pattern. That is why you only have the wildcard called case. You have to derive the control from that. Try replacing your shell statement with this:
    run:
        control = aDict[wildcards.case]
        shell(
            "gatk-launch Mutect2 -R {params.genome} -I {input[0]} "
            "-tumor {params.name_tumor} -I {input[1]} -normal {control} "
            "-L {input.target2} -O {output}"
        )
You could define control in params. Also, {input.target2} in the shell command would result in an error. Maybe it's supposed to be params.target?
rule gatk_Mutect2:
    input:
        get_files_somatic,
    output:
        "{case}.mutect2.vcf"
    params:
        genome="ref/hg19.fa",
        target= "chr12",
        name_tumor='{case}',
        control = lambda wildcards: aDict[wildcards.case]
    shell:
        """
        gatk-launch Mutect2 -R {params.genome} -I {input[0]} -tumor {params.name_tumor} \\
        -I {input[1]} -normal {params.control} -L {params.target} -O {output}
        """

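The lookup used in both answers can be tried outside Snakemake. In this sketch a `SimpleNamespace` stands in for the wildcards object Snakemake passes to input and params functions; that stand-in is an assumption for illustration, while `aDict`, `get_files_somatic` and the lambda are taken from the rule above:

```python
from types import SimpleNamespace

aDict = {"120": "121"}  # tumor : normal


def get_files_somatic(wildcards):
    case = wildcards.case
    control = aDict[wildcards.case]
    return [case + ".sorted.bam", control + ".sorted.bam"]


# Same callable as in the params section of the rule.
control = lambda wildcards: aDict[wildcards.case]

# Only `case` exists, because only {case} appears in the output pattern.
wildcards = SimpleNamespace(case="120")

print(get_files_somatic(wildcards))
print(control(wildcards))
print(hasattr(wildcards, "control"))
```
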
OSX: dtrace printf(), write_nocancel() output base on dtruss script

I need to see my program's printf output in sync with the dtrace output.
I would like to build my own dtrace command that produces output equivalent to the "sudo dtruss -t write_nocancel ls" command.
This is the "correct dtruss command/output":
sudo dtruss -t write_nocancel ls
Chap1 Chap10 Chap11 Chap12 Chap2 Chap3 Chap4 Chap5 Chap6 Chap7 Chap8 Chap9 README
SYSCALL(args) = return
write_nocancel(0x1, "Chap1\tChap10\tChap11\tChap12\tChap2\tChap3\tChap4\tChap5\tChap6\tChap7\tChap8\tChap9\tREADME\n\0", 0x52) = 82 0
Based on the dtruss script source code, I tried this dtrace command, but it failed.
sudo dtrace -q \
-n '*:*:write_nocancel:entry {self->arg0=arg0; self->arg1 =arg1; \
self->arg2 =arg2; self->code=0; } ' \
-n '*:*:write_nocancel:return { \
printf("return %s(0x%X, \"%S\", 0x%X) = %d %d", \
probefunc,self->arg0, arg0 == -1 ? "" : stringof(copyin(self->arg1,arg0)),self->arg2,(int)arg0, \
(int)errno); }' \
-c ls 2>&1
Chap1
Chap10
Chap11
Chap12
Chap2
Chap3
Chap4
Chap5
Chap6
Chap7
Chap8
Chap9
README
dtrace: error on enabled probe ID 3 (ID 209288: fbt:mach_kernel:write_nocancel:return): invalid address (0xffffff80218dfc40) in action #3 at DIF offset 92
dtrace: error on enabled probe ID 4 (ID 958: syscall::write_nocancel:return): invalid address (0xffffff80218dfc40) in action #3 at DIF offset 92
dtrace: error on enabled probe ID 3 (ID 209288: fbt:mach_kernel:write_nocancel:return): invalid address (0xffffff801a7c0010) in action #3 at DIF offset 92
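Do any dtrace experts out there have a clue on how to fix this?
Found the answer (the issue was the two -n options). In the version below, both clauses are in a single -n program, the entry probe is limited to the syscall provider, and the return clause is guarded by the /self->start/ predicate: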
sudo dtrace -q -n \
'syscall::write_nocancel:entry{self->start = 1; \
self->vstart = 1; self->arg0 = arg0; \
self->arg1 = arg1; self->arg2 = arg2;} \
*:*:write_nocancel:return /self->start/ \
{ printf("return %s(0x%X, \"%S\", 0x%X) = %d %d" \
,probefunc,self->arg0, \
arg0 == -1 ? "" : stringof(copyin(self->arg1,arg0)),\
self->arg2,(int)arg0, (int)errno); }' \
-c ls 2>&1

Resources