I've been trying to get gperftools CPU profiling working on my program.
I'm running into an issue where every function from my own program shows up as a raw pointer address in pprof's report. Annoyingly, most of the function names from the libraries I've linked against are readable, but none from my own source files are. Example below.
s979258ca% pprof --text ./hmiss hmiss.prof
Using local file ./hmiss.
Using local file hmiss.prof.
Total: 469 samples
152 32.4% 32.4% 152 32.4% 0x000000010ba6dd45
47 10.0% 42.4% 47 10.0% 0x000000010ba6d365
46 9.8% 52.2% 46 9.8% 0x000000010ba6d371
34 7.2% 59.5% 34 7.2% 0x000000010ba8a04a
32 6.8% 66.3% 32 6.8% 0x000000010ba6d35a
10 2.1% 68.4% 10 2.1% 0x000000010ba8873c
9 1.9% 70.4% 9 1.9% 0x00007fff63f409da
6 1.3% 71.6% 6 1.3% 0x000000010ba7feca
6 1.3% 72.9% 6 1.3% 0x00007fff63f40116
6 1.3% 74.2% 6 1.3% 0x00007fff63f409f2
5 1.1% 75.3% 5 1.1% 0x000000010ba6dd4c
...
What do I need to do to get my function names included in the pprof output?
Here's what the process to get to the above point looks like for me, if it helps.
I build my program with the following command:
clang++ -std=c++17 -g -O2 ...cpp files... -o ~/cpp/hmiss/hmiss /usr/local/Cellar/gperftools/2.7/lib/libprofiler.dylib
I enable CPU profiling with gperftools by running
DYLD_FALLBACK_LIBRARY_PATH=/opt/local/lib CPUPROFILE=hmiss.prof ./hmiss
I then run pprof --text ./hmiss hmiss.prof
From an answer to a similar question I thought including debugging symbols might get the names in there, but building my program with -g didn't seem to help, and removing the -O2 flag didn't help either.
Use Google's pprof instead of the Homebrew-packaged one: https://github.com/google/pprof
I had a similar issue where pprof was showing only the binaries, not the function names and line numbers. It was also taking ages to produce the traces.
I found you can't invoke it as go tool pprof; you must call ~/go/bin/pprof directly or put it on your PATH.
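If you go that route, here is a sketch of installing and invoking the standalone pprof (this assumes a working Go toolchain, and ~/go/bin is the default GOPATH install location on your machine):

```shell
# Install Google's standalone pprof (not the Homebrew-packaged one)
go install github.com/google/pprof@latest

# Call it directly rather than through "go tool pprof"
~/go/bin/pprof --text ./hmiss hmiss.prof
```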
I am using "samtools calmd" to add MD tags back to a BAM file. The original BAM is around 50 GB (whole-genome sequencing using PacBio HiFi reads). The issue I've run into is that "calmd" is incredibly slow: the job has already run for 12 hours and only 600 MB of MD-tagged BAM has been generated. At this rate, the 50 GB BAM will take 30 days to finish!
Here is the Snakemake rule I used to add the MD tag (nothing unusual):
rule addMDTag:
    input:
        rules.pbmm2_alignment.output
    output:
        strBAMDir + "/pbmm2/v37/{wcReadsType}/Tmp/rawReads{readsIndex}.MD.bam"
    params:
        ref = strRef
    threads:
        16
    log:
        strBAMDir + "/pbmm2/v37/{wcReadsType}/Log/rawReads{readsIndex}.MD.log"
    benchmark:
        strBAMDir + "/pbmm2/v37/{wcReadsType}/Benchmark/rawReads{readsIndex}.MD.benchmark.txt"
    shell:
        "samtools calmd -# {threads} {input} {params.ref} -bAr > {output}"
The version of samtools I used is v1.10.
BTW, I gave calmd 16 threads; however, it looks like samtools is still using only 1 core:
top - 11:44:53 up 47 days, 20:35, 1 user, load average: 2.00, 2.01, 2.00
Tasks: 1723 total, 3 running, 1720 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.8%us, 0.3%sy, 0.0%ni, 96.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 529329180k total, 232414724k used, 296914456k free, 84016k buffers
Swap: 12582908k total, 74884k used, 12508024k free, 227912476k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
93137 lix33 20 0 954m 151m 2180 R 100.2 0.0 659:04.13 samtools
How can I make calmd faster? Or is there another tool that can do the same job more efficiently?
Thanks so much
After working with the samtools maintainers, this issue has been solved.
calmd is extremely slow if the BAM is unsorted, so always make sure the BAM is position-sorted before running calmd.
See the details below:
Are your files name sorted, and does your reference have more than one entry?
If so, calmd will be switching between references all the time,
which means it may be doing a lot of reference loading and not much MD calculation.
You may find it goes a lot faster if you position-sort the input, and then run it through calmd.
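In practice that means coordinate-sorting first. A minimal sketch of the two-step pipeline (the file names and ref.fa are placeholders, not paths from the original workflow):

```shell
# Position-sort the BAM first; sort parallelizes well with -@
samtools sort -@ 16 -o rawReads.sorted.bam rawReads.bam

# Then run calmd on the sorted file so the reference is loaded sequentially
samtools calmd -bAr rawReads.sorted.bam ref.fa > rawReads.MD.bam
```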
I'm running a KornShell script which originally has 61 input arguments:
./runOS.ksh 2.8409 24 40 0.350 0.62917 8 1 2 1.00000 4.00000 0.50000 0.00 1 1 4900.00 1.500 -0.00800 1.500 -0.00800 1 100.00000 20.00000 4 1.0 0.0 0.0 0.0 1 90 2 0.10000 0.10000 0.10000 1.500 -0.008 3.00000 0.34744 1.500 -0.008 1.500 -0.008 0.15000 0.21715 1.500 -0.008 0.00000 1 1.334 0 0.243 0.073 0.642 0.0229 38.0 0.03071 2 0 15 -1 20 1
I only vary 6 of them. Would it make a difference in performance if I fixed the remaining 55 arguments inside the script and passed only the variable ones, say:
./runOS.ksh 2.8409 24 40 0.350 0.62917 8
If anyone has a quick/general answer to this, it will be highly appreciated, since it might take me a long time to fix the 55 extra arguments inside the script and I'm afraid it won't change anything.
There's no performance impact from the number of arguments, as far as your question goes, but I see other issues:
What is the command-line length limit on your system? You mention 61 input parameters, some of them 8 characters long. If the number of input parameters grows, you might run into the maximum command length.
Are you really launching the script 440 million times? That's far too many. You need to reconsider why you're doing this: you mention having to wait about 153 days for the runs to finish, which is far too long (and unpredictable).
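If you do fold the constants into the script, a sketch of the shape it could take (run_os, the value list, and the echo are illustrative, not the real runOS.ksh interface; the real script would invoke the program with both groups of values):

```shell
# Only the 6 varying values arrive as arguments; the other 55 live in FIXED.
# The list here is truncated for brevity.
FIXED="1 2 1.00000 4.00000 0.50000"     # ...55 values in the real script

run_os() {
    # The real invocation would be: ./runOS.ksh "$@" $FIXED
    nfixed=$(set -- $FIXED; echo $#)
    echo "called with $# varying and $nfixed fixed arguments"
}

run_os 2.8409 24 40 0.350 0.62917 8
```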
I've got my Go benchmark working with my API calls but I'm not exactly sure what it means below:
$ go test intapi -bench=. -benchmem -cover -v -cpuprofile=cpu.out
=== RUN TestAuthenticate
--- PASS: TestAuthenticate (0.00 seconds)
PASS
BenchmarkAuthenticate 20000 105010 ns/op 3199 B/op 49 allocs/op
coverage: 0.0% of statements
ok intapi 4.349s
How does it know how many calls to make? I do have a loop with b.N as the loop bound, but how does Go decide how many iterations to run?
Also, I now have a CPU profile file. How can I view it?
From TFM:
The benchmark function must run the target code b.N times. The benchmark package will vary b.N until the benchmark function lasts long enough to be timed reliably.
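In other words, the runner keeps growing b.N until one run lasts the target duration (1 second by default), which you can change with -benchtime. To view the profile, point pprof at the test binary that go test keeps when profiling flags are used (the intapi.test name is an assumption based on the package name):

```shell
# Make each benchmark run for ~5s, so b.N settles on a larger value
go test intapi -bench=. -benchtime=5s -cpuprofile=cpu.out

# Inspect the CPU profile against the compiled test binary
go tool pprof intapi.test cpu.out
```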
$ go tool pprof pgears.go profilefile.prof
addr2line: crackhdr: unknown header type
Welcome to pprof! For help, type 'help'.
(pprof) top
Total: 8 samples
5 62.5% 62.5% 5 62.5% 0000000000028a8b
1 12.5% 75.0% 1 12.5% 000000000002295c
1 12.5% 87.5% 1 12.5% 000000000009375a
1 12.5% 100.0% 1 12.5% 00000000000d278a
0 0.0% 100.0% 1 12.5% 000000000000252a
0 0.0% 100.0% 1 12.5% 000000000000259d
0 0.0% 100.0% 2 25.0% 0000000000017d9e
0 0.0% 100.0% 2 25.0% 000000000001a2bf
0 0.0% 100.0% 6 75.0% 000000000001b630
0 0.0% 100.0% 1 12.5% 0000000000045401
(pprof)
Why does go tool pprof show addresses instead of function names? According to http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html and http://blog.golang.org/profiling-go-programs, those entries should be function names. How can I turn the addresses into function names?
Operating system: Mac OS 10.9.2
Go version: go1.2 darwin/amd64
The first argument to pprof must be a binary, not a source file.
So you must compile the binary:
$ go build -o pgears
and use it as the input to pprof:
$ go tool pprof pgears
go build -o bin                      # build the binary file
go tool pprof bin profilefile.prof   # binary first, then the profile
While profiling some of our Ruby code perftools.rb shows the following output:
Total: 291 samples
110 37.8% 37.8% 112 38.5% #<Module:0x007ff364e2bfd0>#__temp__
19 6.5% 44.3% 19 6.5% BigDecimal#div
18 6.2% 50.5% 171 58.8% BinSearch::Methods#_bin_search
15 5.2% 55.7% 15 5.2% BigDecimal#add
So, most of the time is spent in a method designated as #<Module:0x007ff364e2bfd0>#__temp__. How do I get more information on where this is exactly?
If you're using Rails, that's probably where it's coming from: http://api.rubyonrails.org/classes/ActiveRecord/AttributeMethods/Read/ClassMethods.html#method-i-define_method_attribute
I'm still not sure why it shows up, though. It looks like that method should exist for only a very short time.