Why is Valgrind memcheck running numerous times on my Ruby program? - ruby

When I run my Ruby program with the following command:
valgrind --tool=memcheck ruby hello.rb
I get heap, leak, and error summaries exactly four times on every run.
All my Ruby program does is load a text file containing 10,000 characters. Memcheck worked just fine for my similar Python programs when I ran them with the same command. Why do I get four sets of output? Do any of them indicate the correct amount of memory used? I'm measuring memory consumption for my master's thesis, so I really need to find out what's causing this!
Thanks!

By default, Valgrind does not 'follow' child processes, i.e. it does not trace forked children that call an exec system call.
However, also by default, a forked child that does not exec will output its own results, which is why you see several sets of summaries.
Passing --child-silent-after-fork=yes suppresses the output of forked children that never exec.
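For example, with the same program as above (the flag is a standard Valgrind core option, so only the parent process's heap, leak, and error summaries should remain):

valgrind --tool=memcheck --child-silent-after-fork=yes ruby hello.rb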

Related

How to have multiple processes in zsh or bash appending to the same file concurrently?

I am running some tests concurrently from a shell (currently zsh, but it could be bash as well); each test outputs its results by appending to the same file (results.txt).
Each result will be a few lines and some hundred bytes long (not longer than 1000).
I want to be able to read the output of each test in whole in the results file, without any interleaving from any test that might have finished at the same time.
I see a couple of obvious options from a theoretical point of view:
Atomic write(append)
Use a mutex to acquire the results file
The problem is that I have no idea how to do either of these in a shell.
More specifically, I don't know whether appends are atomic by nature, and I don't know how to use a mutex in a shell context.
Any help appreciated.
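One sketch of the mutex option, using flock(1) from util-linux; the function name, the test command, and the temp-file naming are assumptions, not part of the question. Each test writes its output to a private temp file first, then appends the whole chunk to results.txt while holding an exclusive lock on it, so chunks from concurrent tests never interleave:

touch results.txt
run_one_test() {
    ./my_test "$1" > "out.$1.tmp"
    flock results.txt -c "cat out.$1.tmp >> results.txt && rm out.$1.tmp"
}
run_one_test A & run_one_test B & run_one_test C &
wait

(As for atomicity: >> opens the file with O_APPEND, so each individual write() lands at the end without clobbering anything, but a multi-line result usually spans several write() calls, which is why the lock is the safer of the two options.)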

How to debug potential CPU/RAM errors in Bash script on Linux

I have a relatively simple bash script that reads from a set of static input files, stores the input in bash variables and then does a bunch of processing over said input by calling out to external scripts (e.g. written in Python, Go, other bash scripts etc.) and using the intermediate results.
Lately I have been experiencing an intermittent problem where a single character seems to be getting altered somewhere during the processing which then causes subsequent errors. Specifically, a lot of the processing I'm doing involves slicing up a list of comma-separated records, and one of the values on each line is a unix timestamp, e.g. 1354245000.
What seems to be happening is that occasionally one of these values will get altered slightly, so I end up with a timestamp like 13542458=2 or 13542458>2 or 13542458;2 coming out of one of the intermediate scripts. This then subsequently gets fed into another script, which throws an exception when it tries to parse the value to an integer.
In the title of this question, I've suggested that this might be a potential CPU/RAM error. I know it is generally folly to blame low-level things like hardware or compilers, but the nature of this particular error makes me think it may be possible, for the following reasons:
The input files are the same on each invocation of the script, and the script only fails on some invocations.
I cannot think of any sources of randomness in the source code prior to where the script is breaking. It's basically just slicing and dicing csv input.
I cannot think of any sources of concurrency in the source code -- even the Go scripts aren't actually written to run anything concurrently.
This problem has only arisen in the last week or so. Prior to this time, this error would never occur.
While I haven't documented every erroneous character, they seem to often be quite close in the ASCII table to numeric values (=, >, ; etc.). That said, I suppose the Hamming distance between two characters far apart in the table can also be small if a high-order bit changes.
The script often breaks at a different stage on different runs. i.e. I have a number of separate Python scripts, and sometimes it'll make it past one script and then the error will be induced in another. Other times it'll be induced on an earlier script.
What I'd like to know is, is there any methodical way to either confirm or rule out a hardware error for this problem? Or if it is a hardware problem, is it possibly undetectable by the operating system?
A bit of further info on the machine:
Linux 64-bit, Ubuntu 12.04
Intel i7 processor
16GB DDR3 RAM
I'm hoping someone can either point me to a reliable way to verify whether the hardware is to blame or otherwise a sound reason as to what else might be the cause.
Try booting into Memtest to check your memory.
While it is highly unlikely that the cause is hardware, if you have exhausted the standard software debugging suggested by @OliCharlesworth, here is an outline of a hardware-error investigation:
(1) Check your logs for any MCE entries (machine check exceptions).
If you find any, either in your log area (syslog) or sometimes in
the present working directory, you have a hardware failure; see the example after this list.
(2) Check your log area for disk errors, e.g.:
smartd[3963]: Device: /dev/sda [SAT], 34 Currently unreadable (pending) sectors
(3) Check your drive integrity, e.g. (as root): smartctl -a /dev/sda. If there is any abnormality, run:
smartctl -t short /dev/sda (change the drive as required)
(4) Download, install, and boot into memtest86 (http://www.memtest86.com/download.htm)
(run the complete test)
If your CPU/motherboard has thrown no MCEs, you have no disk errors, your drive tests OK with smartctl, and memtest86 finds no memory errors, then recheck the software debugging. While additional hardware errors can still be present (bad capacitors, etc.), the likelihood at this point is software. Good luck.
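As a concrete starting point for step (1), a quick scan of the kernel logs might look like this (the exact log file names are an assumption; on Ubuntu 12.04, syslog and kern.log are the usual places):

grep -iE 'mce|machine check|hardware error' /var/log/syslog /var/log/kern.log

Any hits there, or in the output of dmesg, warrant a closer look before blaming the software.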

how to get output of rb_backtrace() in gdb for passenger process

I'm trying to dump a backtrace of a passenger process in gdb. I know I should just execute
attach <PID>
call rb_backtrace()
after starting gdb, but I can't figure out where the output is going. I've looked at the Rails production logs (set to info) and the nginx logs in /var/logs/nginx, but I can't find the output. Any ideas?
I don't know the answer on the Ruby end -- I'd guess the output is going to the Ruby process's stdout or stderr -- but gdb recently gained a new feature that is designed to help with this scenario.
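If that guess is right, one way to capture it is to redirect the process's stderr to a file from inside gdb before calling rb_backtrace(). A minimal sketch, where the file name is arbitrary, 1089 is O_WRONLY|O_CREAT|O_APPEND on Linux, and the descriptor number 3 is an assumption (substitute whatever fd the open() call actually returns):

attach <PID>
call (int) open("/tmp/ruby-backtrace.txt", 1089, 0644)
call (int) dup2(3, 2)
call rb_backtrace()
detach

After detaching, the backtrace should be in /tmp/ruby-backtrace.txt.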
As for the gdb feature: it is called "frame filters", and it lets you change how stack traces are presented by writing simple Python scripts that examine the state of the inferior process. For example, you could write such a script that understands the Ruby interpreter, and then have gdb's "bt" automatically interleave interpreted (Ruby) frames with C frames.
For more information, start here and read the next few nodes: http://sourceware.org/gdb/current/onlinedocs/gdb/Frame-Filter-API.html#Frame-Filter-API
I'd like to see this feature be adopted by the various interpreter projects. There's been pretty good adoption of pretty printing, and I think this is a logical next step.
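To give a flavor of the API, here is a minimal sketch of a frame filter; it is not a real Ruby-aware filter, it merely tags C frames whose function names start with rb_, and everything in it other than the documented gdb Python API is made up. Save it to a file and load it with gdb's source command:

import gdb
from gdb.FrameDecorator import FrameDecorator

class TagRubyFrames(FrameDecorator):
    # Prefix MRI C functions (rb_*) so they stand out in 'bt' output.
    def function(self):
        name = self.inferior_frame().name()
        if name and name.startswith("rb_"):
            return "[ruby] " + name
        return name

class RubyTagger(object):
    def __init__(self):
        self.name = "ruby-tagger"
        self.priority = 100
        self.enabled = True
        gdb.frame_filters[self.name] = self  # register globally

    def filter(self, frame_iter):
        # Wrap each frame decorator produced by gdb or by earlier filters.
        return (TagRubyFrames(f) for f in frame_iter)

RubyTagger()

A real interpreter-aware filter would walk the interpreter's own data structures and elide or synthesize frames instead of just renaming them.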

Do multiple runs make it parallel?

I have written a short Python script to process my big fastq files, which range in size from 5Gb to 35Gb. I am running the script on a Linux server that has many cores. The script is not written in parallel at all, and it takes about 10 minutes on average to finish for a single file.
If I run the same script on several files like
$ python my_script.py file1 &
$ python my_script.py file2 &
$ python my_script.py file3 &
using the & sign to push each process to the background,
do those scripts run in parallel, and will I save some time?
It seems not to me, since I am using the top command to check processor usage, and each one's usage drops as I add new runs; shouldn't each be somewhere close to 100%?
So if they are not running in parallel, is there a way to make the OS run them in parallel?
Thanks for answers
Commands executed this way do indeed run in parallel. The reason why they're not using up 100% of your CPU time might be that they're I/O bound, rather than CPU bound. The description of what the script does ("big fastq files in size from 5Gb to 35Gb") suggests that this might just be the case.
If you look at the process list given by ps, though, you should see three python processes there - unless one or more of them has terminated by the time you run ps.
Time spent waiting on I/O operations is accounted as a different kind of CPU usage, usually %wa. You are probably just looking at %us (user CPU time).
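A quick way to check both answers at once; the [m] in the grep pattern keeps grep itself out of the match, and iostat comes from the sysstat package, so its availability is an assumption:

ps aux | grep '[m]y_script.py'
iostat -x 2

Three python processes in the first listing, combined with high disk utilization and low user CPU in the second, would confirm that the jobs run in parallel but are I/O bound.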

Ruby process is at 100% after script ends, profiling, solution?

UPDATE: Problem located in my related question - Nokogiri performance problem
I am having a serious problem with my program. After program reaches it's last statement, Aptana studio shows the program is still running even after the last line was evaluated. Ruby process (after the last line of the script) is still running with 100% CPU usage, it ends after several seconds (15-30 maybe). I am trying to at atleast see where the problem is but after a long time I am still at the beginning. So the question is, what could cause this problem and how can I at least see where the problem is, what are my options? Some additional information:
Aptana debug mode: after the last line, this shows in the Debug window:
<terminated, exit value: 0>path/to/ruby
But Ruby process is still running and using 100% CPU
I was trying to use gdb to profile the Ruby process itself, but ended up with nothing using the method described here: Profiling using gdb. I am using Debian Squeeze 64-bit and I tried both versions of the script (8,12 > 16,24). When I tried to get some stack info I just got this:
Program received signal SIGSEGV, Segmentation fault.
0x00007f20539a80b8 in ?? () from /lib/libc.so.6
/home/giron/programovani/gdb_init.sh:1: Error in sourced command file:
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on".
Evaluation of the expression containing the function
(backtrace) will be abandoned.
When the function is done executing, GDB will silently stop.
After I quit gdb, the following output showed up in the Aptana console (though this may be useless; probably gdb did this, I don't know):
/home/giron/Aptana Studio 3 Workspace/RedisXmlConcept/bin/main.rb: [BUG] Segmentation fault
ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-linux]
-- control frame ----------
c:0001 p:0000 s:0002 b:0002 l:000f68 d:000f68 TOP
---------------------------
-- C level backtrace information -------------------------------------------
/home/giron/.rvm/rubies/ruby-1.9.2-p290/lib/libruby.so.1.9(rb_vm_bugreport+0x5f)[0x7f205488216f]
/home/giron/.rvm/rubies/ruby-1.9.2-p290/lib/libruby.so.1.9(+0x63274) [0x7f205476a274]
/home/giron/.rvm/rubies/ruby-1.9.2-p290/lib/libruby.so.1.9(rb_bug+0xb3) [0x7f205476a413]
/home/giron/.rvm/rubies/ruby-1.9.2-p290/lib/libruby.so.1.9(+0x10c215) [0x7f2054813215]
/lib/libpthread.so.0(+0xeff0) [0x7f20544f9ff0]
/lib/libc.so.6(+0xe40b8) [0x7f20539a80b8]
/lib/libgcc_s.so.1(_Unwind_Backtrace+0x49) [0x7f2050d5b599]
/lib/libc.so.6(backtrace+0x4e) [0x7f20539a81ae]
/home/giron/.rvm/rubies/ruby-1.9.2-p290/bin/ruby(_start+0) [0x400890]
[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html
Just to be sure that I have described the problem well, here is the last line of code (before this, the Nokogiri parsing and the work with the Redis database are done):
puts "End"
"End" is printed out, and after this the Ruby process consumes 100% CPU for several seconds.
This question is related to my previous one here: Nokogiri performance problem, where there are some more code snippets; but since I am focusing on a different approach here (profiling Ruby), I have created a new question.
Thank you in advance for any tips, I am pretty much clueless right now.
I was trying to use gdb to profile the Ruby process itself
Don't do that. Calling backtrace may not be safe in the context you are executing in, and it (apparently) causes your program to SIGSEGV.
Instead, just attach gdb to the Ruby process and execute the thread apply all where command. Update your question with the output, and you may get a better answer.
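A non-interactive way to do the same thing, assuming you know the process's PID (the batch flags just run the command and exit):

gdb -p <PID> -batch -ex 'thread apply all where'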
