How do I find a line of code with pyelftools/libdwarf - debugging

I have a function name and an offset from the top of that function. I know I can find the line of code from looking at the assembly listing file and compute the offset for the line of code and get the line number that way.
What I'm trying to do is use the .o file to get that same information. I can see the DWARF information for the ELF file and can find the DIE for the function in the DWARF data, but how do I actually see the info for the instructions of that function and map that to a line of code? I've been using pyelftools, so I would prefer to keep using it, but I am open to other options if pyelftools can't do this.

There's a sample in pyelftools that does that: https://github.com/eliben/pyelftools/blob/master/examples/dwarf_decode_address.py
Specifically, finding the line for the address goes like this:
def decode_file_line(dwarfinfo, address):
    # Go over all the line programs in the DWARF information, looking for
    # one that describes the given address.
    for CU in dwarfinfo.iter_CUs():
        # First, look at line programs to find the file/line for the address
        lineprog = dwarfinfo.line_program_for_CU(CU)
        prevstate = None
        for entry in lineprog.get_entries():
            # We're interested in those entries where a new state is assigned
            if entry.state is None:
                continue
            # Looking for a range of addresses in two consecutive states that
            # contain the required address. This check comes before the
            # end_sequence handling so the last row of a sequence can match.
            if prevstate and prevstate.address <= address < entry.state.address:
                # File indices are 1-based before DWARF 5.
                filename = lineprog['file_entry'][prevstate.file - 1].name
                line = prevstate.line
                return filename, line
            if entry.state.end_sequence:
                # if the line number sequence ends, clear prevstate.
                prevstate = None
            else:
                prevstate = entry.state
    return None, None
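To make the range search concrete, here is a minimal sketch of the same lookup on a made-up line table, without pyelftools. The `Row` tuple, the addresses, and the file name are all hypothetical stand-ins for the line-program states. The target address is built the way the question describes, as the function's start address (what `DW_AT_low_pc` on the function DIE would give) plus the offset.

```python
from collections import namedtuple

# Hypothetical stand-in for a line-program state; only the fields the
# lookup needs are modelled.
Row = namedtuple("Row", "address file line end_sequence")


def find_line(rows, address):
    """Return (file, line) for the row whose address range holds `address`."""
    prev = None
    for row in rows:
        # A row covers addresses from its own address up to the next row's.
        if prev is not None and prev.address <= address < row.address:
            return prev.file, prev.line
        # An end_sequence row only marks where the sequence stops.
        prev = None if row.end_sequence else row
    return None, None


# Made-up line table for one function starting at 0x1000.
rows = [
    Row(0x1000, "demo.c", 10, False),
    Row(0x1008, "demo.c", 11, False),
    Row(0x1014, "demo.c", 13, False),
    Row(0x1020, "demo.c", 13, True),
]

func_low_pc = 0x1000  # assumed start address of the function
offset = 0x0A         # offset from the top of the function
print(find_line(rows, func_low_pc + offset))
```

With real DWARF data the rows would come from `lineprog.get_entries()` instead of a hand-written list.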

Related

Unexpected index error using choice module

I've been trying to use *args in a function so that it can accept an arbitrary number of parameters, but during testing I keep hitting this error, which seems to be raised by the random module's choice function itself rather than by the file I'm writing.
File "C:\ProgramFiles\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1520.0_x64__qbz5n2kfra8p0\lib\random.py", line 378, in choice
return seq[self._randbelow(len(seq))]
IndexError: list index out of range
Here is the code I'm trying to run:
def key_word_choice(key, *otherkeys):
    key_choice = []
    for items in sauces:
        if key and otherkeys in items:
            key_choice.append(items[0])
    print("You should choose:", choice(key_choice))

key_word_choice('hot', 'garlic', 'buffalo')
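A likely explanation, sketched under assumptions: `key and otherkeys in items` parses as `key and (otherkeys in items)`, and `otherkeys` is a tuple, so the test asks whether the whole tuple is an element of `items`. If `items` holds plain strings, that is never true, `key_choice` stays empty, and `choice` raises IndexError on the empty list. The `sauces` data below is hypothetical, and matching on any one keyword (rather than all of them) is a guess at the intent.

```python
from random import choice

# Hypothetical data: each entry is [name, keyword, keyword, ...].
sauces = [
    ["Sriracha", "hot", "garlic"],
    ["Buffalo Classic", "hot", "buffalo"],
    ["Honey Mustard", "sweet", "mild"],
]


def key_word_choice(key, *otherkeys):
    # Test each keyword individually instead of testing the tuple itself.
    wanted = (key, *otherkeys)
    matches = [item[0] for item in sauces if any(k in item[1:] for k in wanted)]
    # choice() raises IndexError on an empty sequence, so guard that case.
    return choice(matches) if matches else None


print("You should choose:", key_word_choice("hot", "garlic", "buffalo"))
```

Swapping `any` for `all` would require every keyword to be present on a sauce.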

Snakemake - parameter file treated as a wildcard

I have written a pipeline in Snakemake. It's an ATAC-seq pipeline (a bioinformatics pipeline to analyze genomics data from a specific experiment). Basically, up to the alignment-merging step I use the {sample_id} wildcard, and afterwards I switch to the {sample} wildcard (merging two or more sample_ids into one sample).
Working DAG here (for simplicity only one sample is shown; the orange and blue {sample_id}s are merged into one green {sample}).
The all rule looks as follows:
configfile: "config.yaml"
SAMPLES_DICT = dict()
with open(config['SAMPLE_SHEET'], "r+") as fil:
    next(fil)
    for lin in fil.readlines():
        row = lin.strip("\n").split("\t")
        sample_id = row[0]
        sample_name = row[1]
        if sample_name in SAMPLES_DICT.keys():
            SAMPLES_DICT[sample_name].append(sample_id)
        else:
            SAMPLES_DICT[sample_name] = [sample_id]
SAMPLES = list(SAMPLES_DICT.keys())
SAMPLE_IDS = [sample_id for sample in SAMPLES_DICT.values() for sample_id in sample]
rule all:
    input:
        # FASTQC output for RAW reads
        expand(os.path.join(config['FASTQC'], '{sample_id}_R{read}_fastqc.zip'),
               sample_id = SAMPLE_IDS,
               read = ['1', '2']),
        # Trimming
        expand(os.path.join(config['TRIMMED'],
                            '{sample_id}_R{read}_val_{read}.fq.gz'),
               sample_id = SAMPLE_IDS,
               read = ['1', '2']),
        # Alignment
        expand(os.path.join(config['ALIGNMENT'], '{sample_id}_sorted.bam'),
               sample_id = SAMPLE_IDS),
        # Merging
        expand(os.path.join(config['ALIGNMENT'], '{sample}_sorted_merged.bam'),
               sample = SAMPLES),
        # Marking Duplicates
        expand(os.path.join(config['ALIGNMENT'], '{sample}_sorted_md.bam'),
               sample = SAMPLES),
        # Filtering
        expand(os.path.join(config['FILTERED'],
                            '{sample}.bam'),
               sample = SAMPLES),
        expand(os.path.join(config['FILTERED'],
                            '{sample}.bam.bai'),
               sample = SAMPLES),
        # multiqc report
        "multiqc_report.html"
    message:
        '\n#################### ATAC-seq pipeline #####################\n'
        'Running all necessary rules to produce complete output.\n'
        '############################################################'
I know it's too messy and I should leave only the necessary bits, but this is where my understanding of Snakemake fails, because I don't know what I have to keep and what I should delete.
This is working, to my knowledge exactly as I want.
However, I added a rule:
rule hmmratac:
    input:
        bam = os.path.join(config['FILTERED'], '{sample}.bam'),
        index = os.path.join(config['FILTERED'], '{sample}.bam.bai')
    output:
        model = os.path.join(config['HMMRATAC'], '{sample}.model'),
        gappedPeak = os.path.join(config['HMMRATAC'], '{sample}_peaks.gappedPeak'),
        summits = os.path.join(config['HMMRATAC'], '{sample}_summits.bed'),
        states = os.path.join(config['HMMRATAC'], '{sample}.bedgraph'),
        logs = os.path.join(config['HMMRATAC'], '{sample}.log'),
        sample_name = '{sample}'
    log:
        os.path.join(config['LOGS'], 'hmmratac', '{sample}.log')
    params:
        genomes = config['GENOMES'],
        blacklisted = config['BLACKLIST']
    resources:
        mem_mb = 32000
    message:
        '\n######################### Peak calling ########################\n'
        'Peak calling for {output.sample_name}\n.'
        '############################################################'
    shell:
        'HMMRATAC -Xms2g -Xmx{resources.mem_mb}m '
        '--bam {input.bam} --index {input.index} '
        '--genome {params.genome} --blacklist {params.blacklisted} '
        '--output {output.sample_name} --bedgraph true &> {log}'
And into the rule all, after filtering, before multiqc, I added:
        # Peak calling
        expand(os.path.join(config['HMMRATAC'], '{sample}.model'),
               sample = SAMPLES),
Relevant config.yaml fragments:
# Path to blacklisted regions
BLACKLIST: "/mnt/data/.../hg38.blacklist.bed"
# Path to chromosome sizes
GENOMES: "/mnt/data/.../hg38_sizes.genome"
# Path to filtered alignment
FILTERED: "alignment/filtered"
# Path to peaks
HMMRATAC: "peaks/hmmratac"
This is the error* I get (It goes on for every input and output of the rule). *Technically it's a warning but it halts execution of snakemake so I am calling it an error.
File path alignment/filtered//mnt/data/.../hg38.blacklist.bed.bam contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
WARNING:snakemake.logging:File path alignment/filtered//mnt/data/.../hg38.blacklist.bed.bam contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
The path does not literally contain "..."; I just didn't feel safe providing the real absolute path here.
For a couple of days, I have struggled with this error. I have looked through the documentation and listened to the introduction. I understand that the above description is far from perfect (it is huge because I don't even know how to cut it down to a minimal reproducible example), but I am desperate and hope you can be patient with me.
Any suggestions as to how to google it, where to look for an error would be much appreciated.
> Technically it's a warning but it halts execution of snakemake so I am calling it an error.
It would be useful to post the logs from snakemake to see if snakemake terminated with an error and if so what error.
However, in addition to Eric C.'s suggestion to use wildcards.sample instead of {sample} as file name, I think that this is quite suspicious:
alignment/filtered//mnt/data/.../hg38.blacklist.bed.bam
/mnt/ is usually at the root of the file system and you are prepending to it a relative path (alignment/filtered). Are you sure it is correct?
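As a rough illustration of how such a path can come about, the sketch below fills the rule's input pattern by hand. It assumes (this is not confirmed by the logs) that the {sample} wildcard ended up bound to the blacklist path. The "..." is the same elision used in the question, not a real directory.

```python
import posixpath

# Input pattern from the hmmratac rule, with FILTERED taken from config.yaml.
pattern = posixpath.join("alignment/filtered", "{sample}.bam")

# Assumption: the wildcard matched the blacklist path instead of a sample name.
bad_sample = "/mnt/data/.../hg38.blacklist.bed"
print(pattern.format(sample=bad_sample))

# A plain sample name (hypothetical) produces the expected relative path.
print(pattern.format(sample="sampleA"))
```

The first print reproduces the string from the warning, which suggests looking for a place where {sample} is allowed to match arbitrary text.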

Loading variable addresses into registers PowerPC inline Assembly

I am trying to put together an example of inline assembly code in a 'C' program. I have not had any success. I am using GCC 4.9.0. As near as I can tell my syntax is correct. However, the build blows up with the following errors:
/tmp/ccqC2wtq.s:48: Error: syntax error; found `(', expected `,'
Makefile:51: recipe for target 'all' failed
/tmp/ccqC2wtq.s:48: Error: junk at end of line: `(31)'
/tmp/ccqC2wtq.s:49: Error: syntax error; found `(', expected `,'
/tmp/ccqC2wtq.s:49: Error: junk at end of line: `(9)'
These are related to the input/output/clobbers lines in the code. Anyone have an idea where I went wrong?
asm volatile("li 7, %0\n"   // load destination address
             "li 8, %1\n"   // load source address
             "li 9, 100\n"  // load count
             // save source address for return
             "mr 3,7\n"
             // prepare for loop
             "subi 8,8,1\n"
             "subi 9,9,1\n"
             // perform copy
             "1:\n"
             "lbzu 10,2(8)\n"
             "stbu 10,2(7)\n"
             "subi 9,9,1\n" // Decrement the count
             "bne 1b\n"     // If zero, we've finished
             "blr\n"
             : // no outputs
             : "m" (array), "m" (stringArray)
             : "r7", "r8"
            );
It's not clear what you are trying to do with the initial instructions:
li 7, %0
li 8, %1
Do you want to load the address of the variables into those registers? In general, this is not possible because the address is not representable in an immediate operand. The easiest way out is to avoid using r7 and r8, and instead use %0 and %1 directly as the register names. It seems that you want to use these registers as base addresses, so you should use the b constraint:
: "b" (array), "b" (stringArray)
Then GCC will take care of the details of materializing the address in a suitable fashion.
You cannot return using blr from inline assembler because it's not possible to tear down the stack frame GCC created. You also need to double-check the clobbers and make sure that you list all the things you clobber (including condition codes, memory, and overwritten input operands).
I was able to get this working by declaring a pair of pointers and initializing them with the addresses of the arrays. I didn't realize that the addresses wouldn't be available directly. I've used inline assembly very sparingly in the past, usually just to raise or lower interrupt masks, not to do anything that references variables. I just had some folks who wanted an example. The "blr" was left over from when I copied a snippet of a pure assembly routine to use as a starting point. Thanks for the responses.
The final code piece looks like this:
int main()
{
    char * stringArrayPtr;
    unsigned char * myArrayPtr;
    unsigned char myArray[100];
    stringArrayPtr = (char *)&stringArray;
    myArrayPtr = myArray;
    asm volatile(
        "lwz 7,%[myArrayPtr]\n"       // load destination address
        "lwz 8, %[stringArrayPtr]\n"  // load source address
        "li 9, 100\n"                 // load count
        "mr 3,7\n"                    // save source address for return
        "subi 8,8,1\n"                // prepare for loop
        "subi 9,9,1\n"
        "1:\n"                        // perform copy
        "lbzu 10,1(8)\n"
        "stbu 10,1(7)\n"
        "subic. 9,9,1\n"              // Decrement the count
        "bne 1b\n"                    // If zero, we've finished
        // The outputs / inputs / clobbered list
        // No output variables are specified here
        // See GCC documentation on extended assembly for details.
        :
        : [stringArrayPtr] "m" (stringArrayPtr), [myArrayPtr]"m"(myArrayPtr)
        : "7","8","9","10"
    );
}

How to sequentially create multiple CSV files in Ruby?

Silly question, but I want to do some processing on a dataset and put them into different CSVs, like UDID1.csv, UDID2.csv, ..., UDID1000.csv. So this is my code:
for i in 1..1000
  logfile = File.new('C:\Users\hp1\Desktop\Datasets\New File\UDID#{i}\.csv',"a")
  #I'll do some processing here
end
But the program throws an error when running because of the UDID#{i} part. So, how to overcome this issue? Thanks.
Edit: This is the error:
in `initialize': No such file or directory # rb_sysopen - C:\Users\hp1\Desktop\Datasets\New File\udid#{1}\.csv (Errno::ENOENT)from C:/Ruby21/bin/hashedUDID.rb:38:in `new' from C:/Ruby21/bin/hashedUDID.rb:38:in '<main>'
The ' is one problem; another problem is the path.
As posted, New File must exist as a directory, and inside it there must be further directories such as UDID1, each of which would then get a file named .csv.
The correct version is (I don't use the un-Ruby-like for loop):
1.upto(1000) do |i|
  logfile = File.new("C:\\Users\\hp1\\Desktop\\Datasets\\UDID#{i}.csv", "a")
  #I'll do some processing here
  logfile.close #Don't forget to close the file
end
Inside " the backslash must be escaped (\\). Alternatively, you can use /:
logfile = File.new("C:/Users/hp1/Desktop/Datasets/New File/UDID#{i}.csv", "a")
Another possibility is to use %i to insert the number:
logfile = File.new("C:/Users/hp1/Desktop/Datasets/New File/UDID%02i.csv" % i, "a")
I prefer to use open, because the file is then closed at the end of the block:
File.open("C:/Users/hp1/Desktop/Datasets/New File/UDID%04i.csv" % i, "a") do |logfile|
  #I'll do some processing here
end #closes the file
Warning:
I'm not sure whether you really want to create 1000 log files (the file is opened inside the loop, so each iteration creates a file).
If so, the %04i version has the advantage that all file names get the same number of digits (starting with 0001 and ending with 1000).
(1..10).each { |i| logfile = File.new("/base/path/UDID#{i}.csv", "a") }
You must use double quotes (") when you need string interpolation.
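#{} can only be used in strings with double quotes ". Backslashes inside double quotes are treated as escape characters, so use forward slashes in the path as well. Change your code to:
for i in 1..1000
  logfile = File.new("C:/Users/hp1/Desktop/Datasets/New File/UDID#{i}.csv","a")
  # other stuff
end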

Python edit file with an insanely long line

I am trying to edit particular html files that I download in python. I am running into a problem where I run my code to edit the file and my python context locks up. I checked the file it's writing to and found that there are two files. The html file and a .bak file.
The html file starts out at 0kb and the .bak file constantly grows to a point, maybe 12 mb or so, then the .html file will grow to a larger size, then the .bak file will grow again. This seems to cycle endlessly. The html file I am editing is 22kb. I watched the output file grow to a gig once just to see if it would stop... It doesn't.
Here is the function I am using to edit the file:
def replace(self, search_str, replace_str):
    f = open(self.path,'r+')
    content = f.readlines()
    for i, line in enumerate(content):
        content[i] = line.replace(search_str, replace_str)
    f.writelines(content)
    f.close()
The issue, I imagine relates to the fact that the html file, as downloaded, is mostly in a single line with ~ 21,000 characters in it. Any ideas?
edit:
I have also tried another function, but get the same result:
def replace(self, search_str, replace_str):
    assert self.path != None, 'No file path provided.'
    fi = fileinput.FileInput(self.path,inplace=1)
    for line in fi:
        if search_str in line:
            line=line.replace(search_str,replace_str)
        print line
    fi.close()
Try using a generator. That's the way to go if you need to read a large file:
for line in open(self.path,'r+'):
    # do stuff with line
I re-wrote the function to write everything out to a new file and it works.
def replace(self, search_str, replace_str):
    f = open(self.path,'r+')
    new_path = self.path.split('.')[0]+'.TEMP'
    new_f = open(new_path,'w')
    new_lines = [x.replace(search_str, replace_str) for x in f]
    new_f.writelines(new_lines)
    f.close()
    new_f.close()
    os.remove(self.path)
    os.rename(new_path, self.path)
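A variation on the same write-to-a-new-file idea, as a sketch rather than a drop-in replacement: it lets `tempfile` pick the temporary name (so paths containing extra dots are not an issue) and uses `os.replace` to swap the files in one step. The function name `replace_in_file` and its standalone signature are assumptions made for illustration.

```python
import os
import tempfile


def replace_in_file(path, search_str, replace_str):
    """Rewrite `path` with every occurrence of search_str replaced."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final rename
    # stays on one filesystem.
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        for line in src:
            dst.write(line.replace(search_str, replace_str))
    # Both files are closed here; swap the new file into place.
    os.replace(dst.name, path)
```

Because the original is only replaced after the new file has been fully written, an interrupted run leaves the original file intact.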
