Module#__temp__ in perftools.rb output - ruby

While profiling some of our Ruby code, perftools.rb shows the following output:
Total: 291 samples
110 37.8% 37.8% 112 38.5% #<Module:0x007ff364e2bfd0>#__temp__
19 6.5% 44.3% 19 6.5% BigDecimal#div
18 6.2% 50.5% 171 58.8% BinSearch::Methods#_bin_search
15 5.2% 55.7% 15 5.2% BigDecimal#add
So most of the time is spent in a method labelled #<Module:0x007ff364e2bfd0>#__temp__. How do I find out exactly where this comes from?
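To read the table above: each row of the text report gives the flat samples (taken in the function itself), flat %, the running sum %, the cumulative samples (the function plus everything it calls), cum %, and the function name. A small sketch (the `parse_pprof_text` and `time_in_callees` names are hypothetical) that parses those rows and computes cum minus flat, the samples spent in callees:

```python
# Columns of the text report, left to right:
# flat samples, flat %, running sum %, cumulative samples, cumulative %, name
PROFILE = """\
Total: 291 samples
110 37.8% 37.8% 112 38.5% #<Module:0x007ff364e2bfd0>#__temp__
19 6.5% 44.3% 19 6.5% BigDecimal#div
18 6.2% 50.5% 171 58.8% BinSearch::Methods#_bin_search
15 5.2% 55.7% 15 5.2% BigDecimal#add
"""


def parse_pprof_text(text):
    """Return (name, flat, cum) tuples from the report's data rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 5)  # at most 6 fields; the name keeps any spaces
        if len(parts) != 6 or not parts[0].isdigit():
            continue  # skip headers such as "Total: 291 samples"
        flat, _, _, cum, _, name = parts
        rows.append((name, int(flat), int(cum)))
    return rows


def time_in_callees(rows):
    """cum - flat = samples spent in functions called from this one."""
    return {name: cum - flat for name, flat, cum in rows}


if __name__ == "__main__":
    ranked = sorted(time_in_callees(parse_pprof_text(PROFILE)).items(),
                    key=lambda kv: -kv[1])
    for name, samples in ranked:
        print(f"{samples:4d}  {name}")
```

For these rows, _bin_search has 171 cumulative but only 18 flat samples, so most of its time is in functions it calls, while __temp__ is almost entirely flat. The flat report alone cannot prove that _bin_search is the caller of __temp__; a call-graph view of the same profile would be needed to confirm that.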

If you're using Rails, that's probably where it's coming from: http://api.rubyonrails.org/classes/ActiveRecord/AttributeMethods/Read/ClassMethods.html#method-i-define_method_attribute
I'm still not sure why it shows up, though. It looks like that method should exist for only a very short time.
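One possible reason it keeps showing up: if, as the linked source suggests for some Rails versions, each attribute reader is first defined under the temporary name __temp__ on an anonymous module and then aliased to the attribute's real name, the method body would still carry the __temp__ name, so a sampling profiler could lump every attribute read under that one label. The sketch below is only a Python analogy of that define-then-rebind pattern, not Rails code; `Record` and `make_reader` are hypothetical. It shows that a function rebound under a new name keeps its original code name, which is the label Python's own profilers report.

```python
def make_reader(attr_name):
    """Define a reader under a temporary name, then hand it back for rebinding."""

    def __temp__(self):
        # Looks up one attribute, like a generated attribute reader would.
        return self._attributes[attr_name]

    return __temp__


class Record:
    def __init__(self, **attrs):
        self._attributes = attrs


# Bind the function under the "real" attribute name; the temporary name is gone
# from the namespace, but the code object still calls itself __temp__.
Record.price = make_reader("price")

if __name__ == "__main__":
    r = Record(price=42)
    print(r.price())
    print(Record.price.__code__.co_name)
```

Every reader created this way would share the same code name, which is consistent with one hot "__temp__" entry aggregating many different attribute reads.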

Related

YamlDotNet- packet to yaml

Hello, I want to write a program that converts packet data from a game into YAML. It should look like this:
monsters:
  - map_monster_id: 1678
    vnum: 333
    map_x: 30
    map_y: 165
  - map_monster_id: 1679
    vnum: 333
    map_x: 24
    map_y: 157
I have code that writes this data to a database, and I want to rework it so that it writes YAML instead. Can anyone tell me where to start? Thank you :)
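The post doesn't show the packet format or the database code, so one reasonable starting point is to split the job in two: parse each packet into a record, then serialize the list of records. A minimal Python sketch of the second half, hand-rolled so it has no dependencies; `MapMonster` and `to_yaml` are hypothetical names with fields copied from the example above. In C#, the equivalent step would be handing a list of such objects to YamlDotNet's serializer instead of writing rows to the database.

```python
from dataclasses import dataclass, fields


@dataclass
class MapMonster:
    # Hypothetical record type; field names taken from the desired YAML.
    map_monster_id: int
    vnum: int
    map_x: int
    map_y: int


def to_yaml(monsters):
    """Emit a 'monsters:' sequence of mappings (integer fields only, so no quoting)."""
    lines = ["monsters:"]
    for m in monsters:
        for i, f in enumerate(fields(m)):
            # The first key of each item starts the list entry with "- ".
            prefix = "  - " if i == 0 else "    "
            lines.append(f"{prefix}{f.name}: {getattr(m, f.name)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_yaml([MapMonster(1678, 333, 30, 165),
                   MapMonster(1679, 333, 24, 157)]), end="")
```

Keeping parsing and output separate means the database writer and the YAML writer can consume the same list of records.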

No function names when using gperftools/pprof

I've been trying to get gperftools CPU profiling working on my program.
I'm running into an issue where pprof reports every function in my program as a raw address. Annoyingly, most function names from the libraries I link against are readable, but none from my own source files are. Example below:
s979258ca% pprof --text ./hmiss hmiss.prof
Using local file ./hmiss.
Using local file hmiss.prof.
Total: 469 samples
152 32.4% 32.4% 152 32.4% 0x000000010ba6dd45
47 10.0% 42.4% 47 10.0% 0x000000010ba6d365
46 9.8% 52.2% 46 9.8% 0x000000010ba6d371
34 7.2% 59.5% 34 7.2% 0x000000010ba8a04a
32 6.8% 66.3% 32 6.8% 0x000000010ba6d35a
10 2.1% 68.4% 10 2.1% 0x000000010ba8873c
9 1.9% 70.4% 9 1.9% 0x00007fff63f409da
6 1.3% 71.6% 6 1.3% 0x000000010ba7feca
6 1.3% 72.9% 6 1.3% 0x00007fff63f40116
6 1.3% 74.2% 6 1.3% 0x00007fff63f409f2
5 1.1% 75.3% 5 1.1% 0x000000010ba6dd4c
...
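To quantify how much of a report like this is unresolved, one can sum the flat samples whose name is a bare hex address. A sketch assuming the six-column layout shown above (other pprof builds format their text output differently, so the parser may need adjusting); `unsymbolized_share` is a hypothetical helper, and it only counts the rows actually listed, not those elided by "...".

```python
import re

# A frame that pprof could not symbolize is printed as a bare hex address.
HEX_FRAME = re.compile(r"^0x[0-9a-fA-F]+$")


def unsymbolized_share(report):
    """Fraction of listed flat samples whose frame name is a raw address."""
    total = raw = 0
    for line in report.splitlines():
        parts = line.split(None, 5)
        if len(parts) != 6 or not parts[0].isdigit():
            continue  # skip "Using local file ...", "Total: ...", "..."
        flat, name = int(parts[0]), parts[5].strip()
        total += flat
        if HEX_FRAME.match(name):
            raw += flat
    return raw / total if total else 0.0


if __name__ == "__main__":
    import sys
    print(f"{unsymbolized_share(sys.stdin.read()):.1%} of listed samples unsymbolized")
```

Piping the report into this before and after a change gives a quick check of whether symbolization improved.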
What do I need to do to get my function names included in the pprof output?
In case it helps, here is how I got to this point.
I build my program with the following command:
clang++ -std=c++17 -g -O2 ...cpp files... -o ~/cpp/hmiss/hmiss /usr/local/Cellar/gperftools/2.7/lib/libprofiler.dylib
I enable CPU profiling with gperftools by running
DYLD_FALLBACK_LIBRARY_PATH=/opt/local/lib CPUPROFILE=hmiss.prof ./hmiss
I then run pprof --text ./hmiss hmiss.prof
Based on an answer to a similar question, I thought including debugging symbols might get the names in there, but building my program with -g didn't seem to help, and removing the -O2 flag didn't help either.
Use Google's pprof (https://github.com/google/pprof) instead of Homebrew's pprof.
I had a similar issue: pprof showed only the binaries, not the function names and line numbers, and it took ages to produce the traces.
I found that you can't invoke it as go tool pprof; instead, you must call ~/go/bin/pprof directly or have it on your PATH.

Can a large amount of arguments deteriorate performance of a ksh or bash script?

I'm running a KornShell script which originally has 61 input arguments:
./runOS.ksh 2.8409 24 40 0.350 0.62917 8 1 2 1.00000 4.00000 0.50000 0.00 1 1 4900.00 1.500 -0.00800 1.500 -0.00800 1 100.00000 20.00000 4 1.0 0.0 0.0 0.0 1 90 2 0.10000 0.10000 0.10000 1.500 -0.008 3.00000 0.34744 1.500 -0.008 1.500 -0.008 0.15000 0.21715 1.500 -0.008 0.00000 1 1.334 0 0.243 0.073 0.642 0.0229 38.0 0.03071 2 0 15 -1 20 1
I only vary 6 of them. Would performance improve if I fixed the remaining 55 arguments inside the script and passed only the varying ones, e.g.:
./runOS.ksh 2.8409 24 40 0.350 0.62917 8
If anyone has a quick or general answer to this, I'd really appreciate it, since fixing the 55 extra arguments inside the script might take me a long time and I'm afraid it won't change anything.
There's no performance impact of the kind you're asking about, but I see other issues:
What is the command-line length limit on your system? You mention 61 input parameters, some of them 8 characters long. If the number of input parameters grows, you might run into the maximum command-line length.
Are you running 440 million scripts? That's far too many. You need to reconsider why you're doing this: you mention having to wait ±153 days for all of them to finish, which is far too long (and unpredictable).
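Both of the answer's concerns can be put in numbers. The sketch below counts the bytes the 61 arguments from the question occupy when passed to exec (each argument plus its terminating NUL; the environment also counts toward the limit but is ignored here) and compares that with the system's ARG_MAX, then divides the quoted ±153 days by 440 million runs, assuming they run one after another. The helper names are hypothetical.

```python
import os

# The 61 arguments from the question, copied verbatim.
ARGS = (
    "2.8409 24 40 0.350 0.62917 8 1 2 1.00000 4.00000 0.50000 0.00 1 1 4900.00 "
    "1.500 -0.00800 1.500 -0.00800 1 100.00000 20.00000 4 1.0 0.0 0.0 0.0 1 90 2 "
    "0.10000 0.10000 0.10000 1.500 -0.008 3.00000 0.34744 1.500 -0.008 1.500 -0.008 "
    "0.15000 0.21715 1.500 -0.008 0.00000 1 1.334 0 0.243 0.073 0.642 0.0229 38.0 "
    "0.03071 2 0 15 -1 20 1"
).split()


def argv_bytes(argv):
    """Bytes the arguments take in exec's argument area (each NUL-terminated)."""
    return sum(len(a.encode()) + 1 for a in argv)


def seconds_per_run(days, runs):
    """Average wall time per run if `runs` sequential runs take `days` days."""
    return days * 86_400 / runs


def arg_max():
    """System limit on argv plus environment size, or None if unavailable."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (ValueError, OSError, AttributeError):
        return None


if __name__ == "__main__":
    argv = ["./runOS.ksh", *ARGS]
    print(f"{len(ARGS)} arguments, {argv_bytes(argv)} bytes; ARG_MAX = {arg_max()}")
    print(f"{seconds_per_run(153, 440e6) * 1000:.1f} ms available per run")
```

The argument list comes to a few hundred bytes, far below typical limits, while each run averages about 30 ms. At that scale, starting a new ksh process for every run is a more plausible cost to look at than the number of arguments it receives.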

strace'ing/profiling a bash script

I'm currently trying to benchmark 4 different versions of a bash script. Each one does a giant rsync job and usually takes a very long time to finish. The script has many steps that set up and tear down the environment to rsync to.
However, when I ran strace on the bash scripts, I got surprisingly short results. This leads me to believe that either strace is not actually tracing the time spent waiting for a command like rsync (which might be spawned in a subshell and is not recorded by strace at all), or the shell wakes up intermittently and then sleeps for periods that strace does not count. Here's a snippet:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.98 12.972555 120116 108 52 wait4
0.01 0.000751 13 56 clone
0.00 0.000380 1 553 rt_sigprocmask
0.00 0.000303 2 197 85 stat
0.00 0.000274 2 134 read
0.00 0.000223 19 12 open
0.00 0.000190 48 4 getdents
0.00 0.000110 1 82 8 close
0.00 0.000110 1 153 rt_sigaction
0.00 0.000084 1 61 getegid
0.00 0.000074 4 19 write
So what tools similar to strace could I use? Or am I missing some kind of recursive flag in strace that would correctly show where my bash script is waiting?
I would like something along the lines of:
% time command
------ --------
... rsync
... ls
Any suggestions would be appreciated. Thank you!
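The wait4 line in the summary above is itself a clue: almost all of the traced time is the shell blocked waiting for child processes (such as rsync) to exit. strace's -c mode summarizes time per system call rather than per command (its -f option makes it follow child processes, but the summary is still per syscall). One alternative, sketched here under the assumption that the script's steps can be listed as separate commands, is a small driver that times each step and prints the "% time / command" table asked for. The step list is a hypothetical placeholder for the real setup, rsync, and teardown commands.

```python
import subprocess
import sys
import time


def time_steps(steps):
    """Run each command in order; return [(percent_of_total, seconds, label)]."""
    timings = []
    for argv in steps:
        start = time.perf_counter()
        subprocess.run(argv, check=True)  # wall time includes any children it waits on
        timings.append((time.perf_counter() - start, " ".join(argv)))
    total = sum(t for t, _ in timings) or 1.0  # avoid dividing by zero
    return [(100 * t / total, t, label) for t, label in timings]


if __name__ == "__main__":
    # Hypothetical stand-ins for the script's steps.
    steps = [
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "import time; time.sleep(0.2)"],
    ]
    print("% time   seconds  command")
    for pct, secs, label in time_steps(steps):
        print(f"{pct:6.2f} {secs:9.3f}  {label}")
```

Because it measures wall-clock time around each whole command, time spent in subshells and child processes is attributed to the step that started them.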

orthAgogue incorrectly processing BLAST files

Need to recruit the help of any budding bioinformaticians that are lurking in the shadows here.
I am currently formatting some .fasta files for use in a set of grouping programs, but I cannot for the life of me get them to work. First things first: all the files must have a 3 or 4 character name, such as the following:
PP41.fasta
PP59.fasta
PPBD.fasta
...etc...
Each gene sequence in the files must have a header of the form >xxxx|yyyyyyyyyy, where xxxx is the same 3 or 4 letter 'taxon' identifier as in the file names above and yyyyyyyyyy is a numerical identifier for each protein within that taxon (the pipe symbol can also be replaced with an _, as below). I then cat all of these into one file, whose headers look correct, like so:
>PP49_00001
MIENFNENNDMSDMFWEVEKGTGEVINLVPNTSNTVQPVVLMRLGLFVPTLKSTKRGHQG
EMSSMDATAELRQLAIVKTEGYENIHITGARLDMDNDFKTWVGIIHSFAKHKVIGDAVTL
SFVDFIKLCGIPSSRSSKRLRERLGASLRRIATNTLSFSSQNKSYHTHLVQSAYYDMVKD
TVTIQADPKIFELYQFDRKVLLQLRAINELGRKESAQALYTYIESLPPSPAPISLARLRA
RLNLRSRVTTQNAIVRKAMEQLKGIGYLDYTEIKRGSSVYFIVHARRPKLKALKSSKSSF
KRKKETQEESILTELTREELELLEIIRAEKIIKVTRNHRRKKQTLLTFAEDESQ*
>PP49_00002
MQNDIILPINKLHGLKLLNSLELSDIELGELLSLEGDIKQVSTGNNGIVVHRIDMSEIGS
FLIIDSGESRFVIKAS*
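Before building the database, the naming rule described above can be checked mechanically. A sketch assuming the taxon code is 3 or 4 letters or digits (the examples such as PP41 include digits) followed by '|' or '_' and a numeric id; `check_headers` and `check_file` are hypothetical helpers, and this only tests the convention as stated in the post, not what orthAgogue itself accepts.

```python
import re
from pathlib import Path

# ">PP49_00001": 3-4 character taxon, '|' or '_', then a numeric protein id.
HEADER = re.compile(r"^>([A-Za-z0-9]{3,4})[|_](\d+)$")


def check_headers(fasta_text, expected_taxon=None):
    """Return (line_number, problem) pairs for headers that break the rule."""
    problems = []
    for lineno, line in enumerate(fasta_text.splitlines(), start=1):
        if not line.startswith(">"):
            continue  # sequence line
        m = HEADER.match(line.rstrip())
        if m is None:
            problems.append((lineno, f"malformed header: {line!r}"))
        elif expected_taxon and m.group(1) != expected_taxon:
            problems.append((lineno, f"taxon {m.group(1)} != file name {expected_taxon}"))
    return problems


def check_file(path):
    """Check one per-taxon file, using its stem (e.g. PP41) as the expected taxon."""
    p = Path(path)
    return check_headers(p.read_text(), expected_taxon=p.stem)
```

Running check_file over each per-taxon file, and check_headers without an expected taxon over the concatenated file, should return empty lists if the headers follow the rule.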
The next step is to construct a BLAST database, which I do as follows using NCBI BLAST's formatdb tool:
formatdb -i allproteins.fasta -p T -o T
This produces a set of database files. Next, I run an all-vs-all BLAST of the concatenated proteins against that database, as follows. It outputs a tabular file, which I suspect is where my issues begin:
blastall -p blastp -d allproteins.fasta -i allproteins.fasta -a 6 -F '0 S' -v 100000 -b 100000 -e 1e-5 -m 8 -o plasmid_allvall_blastout
The output has 12 columns and looks like the lines below. It looks correct to me, but my supervisor suspects the error is in the BLAST file; I don't know what I'm doing wrong, however.
PP49_00001 PP51_00025 100.00 354 0 0 1 354 1 354 0.0 552
PP49_00001 PP49_00001 100.00 354 0 0 1 354 1 354 0.0 552
PP49_00001 PPTI_00026 90.28 288 28 0 1 288 1 288 3e-172 476
PP49_00001 PPNP_00026 90.28 288 28 0 1 288 1 288 3e-172 476
PP49_00001 PPKC_00016 89.93 288 29 0 1 288 1 288 2e-170 472
PP49_00001 PPBD_00021 89.93 288 29 0 1 288 1 288 2e-170 472
PP49_00001 PPJN_00003 91.14 79 7 0 145 223 2 80 8e-47 147
PP49_00002 PPTI_00024 100.00 76 0 0 1 76 1 76 3e-50 146
PP49_00002 PPNP_00024 100.00 76 0 0 1 76 1 76 3e-50 146
PP49_00002 PPKC_00018 100.00 76 0 0 1 76 1 76 3e-50 146
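The 12 columns of BLAST's -m 8 tabular output are, in order: query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, and bit score. A sketch that checks every line has those fields, with numbers where numbers are expected, before suspecting orthAgogue; `Hit` and `parse_m8` are hypothetical names. One thing such a check would not catch: as written in the post, blastall writes plasmid_allvall_blastout while orthAgogue is given plasmid_allvsall_blastout, which may just be a typo in the question but is worth ruling out.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    # The 12 columns of BLAST tabular (-m 8) output, in order.
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


CONVERTERS = (float, int, int, int, int, int, int, int, float, float)


def parse_m8(text):
    """Parse -m 8 lines; raise ValueError naming the first malformed line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # blank or comment line
        cols = line.split()  # real output is tab-separated; split() also accepts spaces
        if len(cols) != 12:
            raise ValueError(f"line {lineno}: expected 12 columns, got {len(cols)}")
        query, subject, *numbers = cols
        hits.append(Hit(query, subject, *(conv(v) for conv, v in zip(CONVERTERS, numbers))))
    return hits
```

A malformed value (for example a non-numeric e-value) also raises ValueError from the numeric conversion.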
So, this is where the problems really begin. I now pass the above file to a program called orthAgogue, which analyses the paired sequences using the parameters laid out in the manual (I still have no idea whether I'm doing anything wrong). All I know is that the several output files it produces are all nonsense or empty.
The command looks like this:
orthAgogue -i plasmid_allvsall_blastout -t 0 -p 1 -e 5 -O .
Any and all ideas welcome! (Hope I've covered everything - sorry about the long post!)
EDIT: I never managed to find a solution to this and had to use an alternative piece of software. If admins wish to close this, please do, unless it is worth keeping open for someone else (though I suspect it's a pretty niche issue).
I only discovered this orthAgogue issue today. Although my reply comes late, I hope it helps future users.
The issue is due to a missing parameter: it seems you forgot to specify the separator with -s '_', i.e., the following set of command-line parameters should do the trick*:
orthAgogue -i plasmid_allvsall_blastout -t 0 -p 1 -e 5 -O . -s '_'
(* Assuming your input file is a tab-separated file of columns.)
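Why the separator matters: orthology is assessed between proteins from different taxa, so the program must know which part of each protein id names the taxon. The sketch below (hypothetical helper names) assumes the taxon is the text before the first separator; it does not claim to reproduce orthAgogue's own parsing or its default separator. It shows that splitting the ids from the post on '_' recovers the taxa, while splitting on '|' leaves every full id as its own "taxon".

```python
def taxon_of(protein_id, sep):
    """Taxon label = text before the first separator (whole id if absent)."""
    return protein_id.split(sep, 1)[0]


def taxa_seen(pairs, sep):
    """Distinct taxa among the query/subject ids of (query, subject) pairs."""
    return sorted({taxon_of(p, sep) for pair in pairs for p in pair})


# Query/subject pairs taken from the BLAST output in the question.
PAIRS = [("PP49_00001", "PP51_00025"), ("PP49_00001", "PPTI_00026"),
         ("PP49_00002", "PPKC_00018")]

if __name__ == "__main__":
    print(taxa_seen(PAIRS, "_"))  # ['PP49', 'PP51', 'PPKC', 'PPTI']
    print(taxa_seen(PAIRS, "|"))  # each full id counts as a separate taxon
```

With the wrong separator, no two proteins share a taxon, so any taxon-based grouping has nothing sensible to work with.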
A brief update after a comment by Joe:
In brief, the problem described in the initial error report (by Joe) is, in most cases, not a bug. Instead, it is one of the core properties of the Inparanoid algorithm that orthAgogue implements: if your ortholog result file is empty (though created), this in most cases implies that there is no reciprocal best match between any protein pair from two different taxa/species.
One of many possible explanations is that your blastp scores are too similar, in which case I would suggest a combined tree-based/homology clustering, as in TREEFAM.
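The developer's explanation can be made concrete with a simplified reciprocal-best-hit check: for each protein, keep its best-scoring hit in every other taxon, then keep only pairs that pick each other. This is a sketch of the idea only, not orthAgogue's or Inparanoid's actual procedure, which does considerably more; the function names are hypothetical, and treating exact score ties as "no single best match" is a simplification chosen to mirror the "scores too similar" case. If no pair survives, the ortholog output would be empty even though the input is well formed.

```python
def best_hits(hits, sep="_"):
    """For each query, its best-scoring subjects in each *other* taxon.

    hits: iterable of (query, subject, bitscore). Tied subjects are all kept.
    Returns {(query, subject_taxon): set_of_best_subjects}.
    """
    best = {}
    for q, s, score in hits:
        tq, ts = q.split(sep, 1)[0], s.split(sep, 1)[0]
        if tq == ts:
            continue  # same taxon (including self-hits): not an ortholog candidate
        key = (q, ts)
        current = best.get(key)
        if current is None or score > current[0]:
            best[key] = (score, {s})
        elif score == current[0]:
            current[1].add(s)
    return {key: subjects for key, (_, subjects) in best.items()}


def reciprocal_best_pairs(hits, sep="_"):
    """Pairs (a, b) where each is the other's unique best hit in the other's taxon."""
    best = best_hits(hits, sep)
    pairs = set()
    for (q, _), subjects in best.items():
        if len(subjects) != 1:
            continue  # tied scores: no single best match
        (s,) = subjects
        if best.get((s, q.split(sep, 1)[0])) == {q}:
            pairs.add(tuple(sorted((q, s))))
    return pairs
```

For example, if protein A_1 scores equally against B_1 and B_2, neither pairing is reciprocal in this sketch, and a dataset made entirely of such near-identical scores would yield no pairs at all.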
Therefore, when I receive your data, I'll send it to one of the biologists I'm working with, with the goal of identifying the proper tool for your data. Hope my last comment makes your day ;)
Ole Kristian Ekseth, developer of orthAgogue
