OSX: dtrace printf(), write_nocancel() output based on dtruss script

I need to see my program's printf output in sync with the dtrace output.
I'd like to build my own version of the dtrace command that produces output equivalent to the "sudo dtruss -t write_nocancel ls" command.
This is the correct dtruss command and its output:
sudo dtruss -t write_nocancel ls
Chap1 Chap10 Chap11 Chap12 Chap2 Chap3 Chap4 Chap5 Chap6 Chap7 Chap8 Chap9 README
SYSCALL(args) = return
write_nocancel(0x1, "Chap1\tChap10\tChap11\tChap12\tChap2\tChap3\tChap4\tChap5\tChap6\tChap7\tChap8\tChap9\tREADME\n\0", 0x52) = 82 0
Based on the dtruss script source code, I tried this dtrace command, but it failed:
sudo dtrace -q \
-n '*:*:write_nocancel:entry {self->arg0=arg0; self->arg1 =arg1; \
self->arg2 =arg2; self->code=0; } ' \
-n '*:*:write_nocancel:return { \
printf("return %s(0x%X, \"%S\", 0x%X) = %d %d", \
probefunc,self->arg0, arg0 == -1 ? "" : stringof(copyin(self->arg1,arg0)),self->arg2,(int)arg0, \
(int)errno); }' \
-c ls 2>&1
Chap1
Chap10
Chap11
Chap12
Chap2
Chap3
Chap4
Chap5
Chap6
Chap7
Chap8
Chap9
README
dtrace: error on enabled probe ID 3 (ID 209288: fbt:mach_kernel:write_nocancel:return): invalid address (0xffffff80218dfc40) in action #3 at DIF offset 92
dtrace: error on enabled probe ID 4 (ID 958: syscall::write_nocancel:return): invalid address (0xffffff80218dfc40) in action #3 at DIF offset 92
dtrace: error on enabled probe ID 3 (ID 209288: fbt:mach_kernel:write_nocancel:return): invalid address (0xffffff801a7c0010) in action #3 at DIF offset 92
Do any dtrace experts out there have a clue on how to fix this?

Found the answer (the issue was the two separate -n options):
sudo dtrace -q -n \
'syscall::write_nocancel:entry{self->start = 1; \
self->vstart = 1; self->arg0 = arg0; \
self->arg1 = arg1; self->arg2 = arg2;} \
*:*:write_nocancel:return /self->start/ \
{ printf("return %s(0x%X, \"%S\", 0x%X) = %d %d" \
,probefunc,self->arg0, \
arg0 == -1 ? "" : stringof(copyin(self->arg1,arg0)),\
self->arg2,(int)arg0, (int)errno); }' \
-c ls 2>&1

Related

dtrace errors out with incorrect type for thread_t on macOS 12.2.1

The following dtrace invocation used to work on macOS 11.6.2, but after upgrading to macOS 12.2.1 it stopped working:
sudo --non-interactive dtrace -l -b 16m -x mangled -x disallow_dsym -x strip -x evaltime=preinit -x preallocate=256m -n 'oneshot$target:a.out::entry { hdr_vm_addr = ((thread_t)curthread)->task->mach_header_vm_address; }' -w -p 4557
dtrace: invalid probe specifier oneshot$target:a.out::entry { hdr_vm_addr = ((thread_t)curthread)->task->mach_header_vm_address; }: in action list: task is not a member of struct thread
How do I get task from thread_t now?
It looks like it may be t_task. Here's how you can view the type of a built-in variable such as curthread:
$ sudo dtrace -n 'BEGIN { print(*curthread); exit(0); }'
dtrace: description 'BEGIN ' matched 1 probe
CPU ID FUNCTION:NAME
4 1 :BEGIN struct thread {
    union {
        queue_chain_t runq_links = {
            struct queue_entry *next = 0
            struct queue_entry *prev = 0
        }
    ...
    struct task *t_task = 0xffffff8bd07859f0
    ...
}
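If t_task is indeed the renamed member, the original invocation should only need that one substitution. An untested sketch, with everything else copied verbatim from the question:
# untested: same command as above, with ->task-> replaced by ->t_task->
sudo --non-interactive dtrace -l -b 16m -x mangled -x disallow_dsym -x strip -x evaltime=preinit -x preallocate=256m -n 'oneshot$target:a.out::entry { hdr_vm_addr = ((thread_t)curthread)->t_task->mach_header_vm_address; }' -w -p 4557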

Using bash functions in snakemake

I am trying to download some files with snakemake. The files (http://snpeff.sourceforge.net/SnpSift.html#dbNSFP) I would like to download are on a Google site/drive, and my usual wget approach does not work. I found a bash function that does the job (https://www.zachpfeffer.com/single-post/wget-a-Google-Drive-file):
function gdrive_download () {
  CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')
  wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O $2
  rm -rf /tmp/cookies.txt
}
gdrive_download 120aPYqveqPx6jtssMEnLoqY0kCgVdR2fgMpb8FhFNHo test.txt
I have tested this function with my ids in a plain bash script and was able to download all the files. To add a bit to the complexity, I must use a workplace template, and incorporate the function into it.
rule dl:
    params:
        url = 'ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_{genome}/{afile}'
    output:
        'data/{genome}/{afile}'
    params:
        id1 = '0B7Ms5xMSFMYlOTV5RllpRjNHU2s',
        f1 = 'dbNSFP.txt.gz'
    shell:
        """CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id={{params.id1}}" -O- | sed -rn "s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p") && wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id={{params.id1}}" -O {{params.f1}} && rm -rf /tmp/cookies.txt"""
        #'wget -c {params.url} -O {output}'

rule checksum:
    input:
        i = 'data/{genome}/{afile}'
    output:
        o = temp('tmp/{genome}/{afile}.md5')
    shell:
        'md5sum {input} > {output}'

rule file_size:
    input:
        i = 'data/{genome}/{afile}'
    output:
        o = temp('tmp/{genome}/{afile}.size')
    shell:
        'du -csh --apparent-size {input} > {output}'

rule file_info:
    """md5 checksum and file size"""
    input:
        md5 = 'tmp/{genome}/{afile}.md5',
        s = 'tmp/{genome}/{afile}.size'
    output:
        o = temp('tmp/{genome}/info/{afile}.csv')
    run:
        with open(input.md5) as f:
            md5, fp = f.readline().strip().split()
        with open(input.s) as f:
            size = f.readline().split()[0]
        with open(output.o, 'w') as fout:
            print('filepath,size,md5', file=fout)
            print(f"{fp},{size},{md5}", file=fout)

rule manifest:
    input:
        expand('tmp/{genome}/info/{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
        #expand('tmp/{genome}/info/SnpSift{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
    output:
        o = 'MANIFEST.csv'
    run:
        pd.concat([pd.read_csv(afile) for afile in input]).to_csv(output.o, index=False)
There are four downloadable files for which I have ids (I only show one in params); however, I don't know how to call the bash function as written by ZPfeffer for all of the ids I have with snakemake. Additionally, when I run this script there are several errors, the most pressing being
sed: -e expression #1, char 31: unterminated `s' command
I am far from a snakemake expert; any assistance on how to modify my script to a) call the function with 4 different ids, b) remove the sed error, and c) verify whether this is the correct url format (currently url = 'https://docs.google.com/uc?export/{afile}') will be greatly appreciated.
You would want to use a raw string literal so that Snakemake doesn't escape special characters, such as the backslash in the sed command. For example (notice the r in front of the shell command):
rule foo:
    shell:
        r"sed d\s\"
You could use --printshellcmds or -p to see how exactly shell: commands get resolved by snakemake.
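Building on that, a rough, untested sketch of how one rule could cover all four files: gdrive_ids below is a hypothetical dict mapping (genome, filename) to the Google Drive id, and the shell command uses single braces around {params.id} and {output} so Snakemake substitutes them (the double braces in the original rule are escaped to literal braces, so wget would receive a literal {params.id1} instead of the id):
# hypothetical lookup table: (genome, filename) -> Google Drive file id
gdrive_ids = {
    ('GRCh37', 'dbNSFP.txt.gz'): '<drive-id>',
    # ... one entry per downloadable file (four in total)
}

rule dl:
    output:
        'data/{genome}/{afile}'
    params:
        id = lambda wc: gdrive_ids[(wc.genome, wc.afile)]
    shell:
        r"""
        CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id={params.id}" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')
        wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id={params.id}" -O {output}
        rm -f /tmp/cookies.txt
        """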
Here is how I "solved" it:
import pandas as pd

rule dl:
    output:
        'data/{genome}/{afile}'
    shell:
        "sh download_snpsift.sh"

rule checksum:
    input:
        i = 'data/{genome}/{afile}'
    output:
        o = temp('tmp/{genome}/{afile}.md5')
    shell:
        'md5sum {input} > {output}'

rule file_size:
    input:
        i = 'data/{genome}/{afile}'
    output:
        o = temp('tmp/{genome}/{afile}.size')
    shell:
        'du -csh --apparent-size {input} > {output}'

rule file_info:
    """md5 checksum and file size"""
    input:
        md5 = 'tmp/{genome}/{afile}.md5',
        s = 'tmp/{genome}/{afile}.size'
    output:
        o = temp('tmp/{genome}/info/{afile}.csv')
    run:
        with open(input.md5) as f:
            md5, fp = f.readline().strip().split()
        with open(input.s) as f:
            size = f.readline().split()[0]
        with open(output.o, 'w') as fout:
            print('filepath,size,md5', file=fout)
            print(f"{fp},{size},{md5}", file=fout)

rule manifest:
    input:
        expand('tmp/{genome}/info/{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
    output:
        o = 'MANIFEST.csv'
    run:
        pd.concat([pd.read_csv(afile) for afile in input]).to_csv(output.o, index=False)
And here is the bash script.
function gdrive_download () {
CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O $2
rm -rf /tmp/cookies.txt
}
gdrive_download 0B7Ms5xMSFMYlSTY5dDJjcHVRZ3M data/GRCh37/dbNSFP.txt.gz
gdrive_download 0B7Ms5xMSFMYlOTV5RllpRjNHU2s data/GRCh37/dbNSFP.txt.gz.tbi
gdrive_download 0B7Ms5xMSFMYlbTZodjlGUDZnTGc data/GRCh38/dbNSFP.txt.gz
gdrive_download 0B7Ms5xMSFMYlNVBJdFA5cFZRYkE data/GRCh38/dbNSFP.txt.gz.tbi

GhostScript auto pagenumbering

I want to export one certain page from a pdf document to an image and automatically fill the page number in the file name. When I run the following code:
gs \
-sDEVICE=jpeg \
-o outfile-%03d.jpeg \
-dFirstPage=12 \
-dLastPage=12 \
wo.pdf
I get: outfile-001.jpeg instead of outfile-012.jpeg.
I've written a bash script for the job:
function extract_nth_page(){
    printf -v j "outfile-%05g.png" $1
    echo $j
    gs -q -dNOPAUSE -sDEVICE=png16m -r600 -dFirstPage=$1 -dLastPage=$1 -sOutputFile=$j $2 -c quit
    return 0
}
# Extracts page number 42 from myFile.pdf to outfile-00042.png
extract_nth_page 42 myFile.pdf

How to run a script with correct permissions from custom udev rule?

I set up a custom udev rule triggered when my bluetooth mouse connects:
ACTION=="add" \
, ATTRS{idProduct}=="XXX" \
, ATTRS{idVendor}=="XXX" \
, ENV{DISPLAY}=":0" \
, ENV{XAUTHORITY}="/run/user/1000/gdm/Xauthority" \
, RUN+="/home/XXX/.scripts/mouse_connected.sh"
udevadm test succeeds, everything seems fine.
Then I created the script that should be triggered:
#!/bin/bash
if (xinput list | grep Maus | grep -o id=[0-9]* | grep -o [0-9]*$);
then xinput set-button-map "$(xinput list | grep Maus | grep -o id=[0-9]* | grep -o [0-9]*$)" 1 1 3 && logger "success";
else logger "fail" && sleep 3;
fi
When I manually run the script in a terminal using "bash '/home/XXX/.scripts/mouse_connected.sh'", it works as expected. But when it gets successfully triggered via my custom udev rule, it fails. Why?

Multiple Processes - Python

I am looking to run multiple instances of a command line script at the same time. I am new to this concept of "multi-threading", so I am a bit at a loss as to why I am seeing what I am seeing.
I have tried to execute the sub-processing in two different ways:
1 - Using multiple calls of Popen without a communicate until the end:
command = 'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum1 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum1{}'.format(i), str(i))
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
command = 'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum2 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum2{}'.format(i), str(i))
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
command = 'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum3 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum3{}'.format(i), str(i))
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
(stdoutdata, stderrdata) = process.communicate()
This starts each of the command-line items, but only the last entry completes, leaving the other two hanging.
2 - Attempting to implement an example from "Python threading multiple bash subprocesses?", but nothing happens except for a printout of the commands (the program hangs with no command-line processes running, as observed in Windows Task Manager):
import threading
import Queue
import commands
import time

workspace = r'F:\Processing\SM'
image = 't08r_e'
image_name = (image.split('.'))[0]
i = 0
process_image_tif = workspace + '\\{}{}.tif'.format((image.split('r'))[0], str(i))

# thread class to run a command
class ExampleThread(threading.Thread):
    def __init__(self, cmd, queue):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.queue = queue

    def run(self):
        # execute the command, queue the result
        (status, output) = commands.getstatusoutput(self.cmd)
        self.queue.put((self.cmd, output, status))

# queue where results are placed
result_queue = Queue.Queue()

# define the commands to be run in parallel, run them
cmds = ['raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum1 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum1{}'.format(i), str(i)),
        'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum2 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum2{}'.format(i), str(i)),
        'raster2pgsql -I -C -e -s 26911 %s -t 100x100 -F p839.%s_image_sum_sum3 | psql -U david -d projects -h pg3' % (workspace + '\\r_sumsum3{}'.format(i), str(i)),
        ]

for cmd in cmds:
    thread = ExampleThread(cmd, result_queue)
    thread.start()

# print results as we get them
while threading.active_count() > 1 or not result_queue.empty():
    while not result_queue.empty():
        (cmd, output, status) = result_queue.get()
        print(cmd)
        print(output)
How can I run all of these commands at the same time and get a result at the end? I am running on Windows, Python 2.7.
My first try didn't work because of the repeated definitions of stdout and stderr. Removing these definitions gives the expected behavior.
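For reference, a minimal, untested sketch of the first approach that keeps the pipes but stores one handle per process, then calls communicate() on each, so every child is waited on and its output is drained:
import subprocess

# cmds is the same list of 'raster2pgsql ... | psql ...' pipelines built above
processes = [subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
             for cmd in cmds]

# communicate() waits for each child and reads its stdout/stderr, so no
# process is left blocked on a full pipe buffer
for cmd, proc in zip(cmds, processes):
    stdoutdata, stderrdata = proc.communicate()
    print(cmd)
    print(stdoutdata)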
