Logging GC events in Ruby

I'm trying to debug my C extension (https://github.com/rgeo/rgeo), and am currently improving its handling of compaction (a nice video introducing that topic), and I'd like to have information about when the GC is running.
So far I found one solution to see how many times the GC has run:
GC::Profiler.enable
at_exit { GC::Profiler.report }
However I'd rather gather information when the GC runs directly, not by waiting for the end of my program.
I found that gc.c has a gc_report function that would do the trick, however it looks like I have to recompile Ruby to use it.
Isn't there a simple way to enable GC event reporting? Note that I'm using rbenv on macOS, if that helps.
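For what it's worth, one recompile-free approximation is to poll `GC.stat` from a watcher thread. A rough sketch (it reports *that* a GC ran, not exact event boundaries, and the 10 ms polling interval is an arbitrary choice):

```ruby
# Poll GC.stat(:count) from a background thread and report whenever the
# collection count increases. Approximate, but needs no recompilation.
prev = GC.stat(:count)

watcher = Thread.new do
  loop do
    current = GC.stat(:count)
    if current > prev
      warn "GC ran #{current - prev} time(s), total runs: #{current}"
      prev = current
    end
    sleep 0.01   # polling interval; tune to taste
  end
end

100_000.times { Object.new }  # churn objects to encourage a collection
GC.start                      # force one run so something is reported
sleep 0.05                    # give the watcher a chance to notice
watcher.kill
```

This won't tell you which phase is running, but it does tell you when collections happen while the program is still alive, instead of only at exit.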

Related

How do I instrument only the actual benchmark of SPEC CPU2006 with Intel's Pin?

I've been trying to instrument SPEC CPU2006 benchmarks using Intel's Pin on Ubuntu. I have a Pintool with a simple cache simulator that counts reads and writes. When running the Pintool on a 'runspec -nonreportable' command for a specific benchmark I get the data I want. However, the results of different benchmarks hardly differ at all. My Pintool doesn't seem to be the problem, as it appears to work correctly on other applications. I suspect the results are skewed because the Pintool is instrumenting everything, including the setup of the benchmark.
What I've previously done is just run the Pintool on the runspec command. I've also tried using '--action build' and '--action setup' prior to using runspec to reduce the overhead, but it seems like much of the same setup runs anyway. I know there are monitoring hooks in SPEC CPU2006 where I can run additional commands right before starting a benchmark, and I'm thinking there might be some way to use those, but I'm not sure how. Maybe the 'monitor_wrapper' hook is most appropriate? Maybe I can get hold of the pid somehow and attach my Pintool to the correct process just as the benchmark is starting? Super thankful for any help I can get!
You're probably just instrumenting runspec itself, which runs in a process that creates another process in which the benchmark is run. You have two options: either tell Pin to follow child processes (using the -follow_execv option) or directly inject Pin into the process of the benchmark when it gets created (by running the benchmark using specinvoke instead of runspec).

Are there still benefits to running JRuby vs. the latest MRI with Puma?

I'm considering updating our Ruby interpreter to JRuby. It's been quite a headache because we've had to remove any 2.x-specific syntax from our app and fall back to Ruby 1.9.3 compatibility, which isn't the end of the world.
When it came time to run the app, I found out that we cannot use Puma in clustered mode. The question is, given all the fixes and changes to MRI in the past few years, are the benefits of having "real threads" still valid?
update
To make this more objective, the question is, "Does the latest version of MRI negate the need to adopt JRuby to achieve the same benefits that native threads give you?"
Does the latest version of MRI negate the need to adopt JRuby to achieve the same benefits that native threads give you?
The answer is no. It does not negate the need, and it depends on your application as mentioned in other answers.
Also, JRuby does not allow you to run Puma in clustered mode, but that is not really a problem with regard to your question, because JRuby is multithreaded and parallel.
Simply run in Single mode with as many threads as you need. It should be perfectly fine, if not even more lightweight.
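For example, a single-mode Puma configuration just sets a thread range and no forked workers (the file name `config/puma.rb` and the specific numbers here are illustrative):

```ruby
# config/puma.rb -- single mode on JRuby: no forked workers,
# rely on real parallel threads within the one process.
workers 0       # 0 disables clustered (forked-worker) mode
threads 8, 32   # min and max threads for the single process
```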
Let me give you some references that give more insight and allow you to dig further.
This answer discusses experiments with MRI and JRuby testing concurrent requests using Puma (up to 40 threads). It is quite comprehensive.
The experiments are available on GitHub, MRI and JRuby.
The caveat is that it only tests concurrent requests and does not include a race condition in the controller. However, I think you could implement the test from the article Removing config.threadsafe! without too much effort.
The difference between JRuby and MRI is that JRuby can execute code in parallel. MRI is limited by the GIL and only one thread at a time can be executed. You can read more information about the GIL in this article Nobody understands the GIL.
The results are quite surprising. MRI is faster than JRuby. Feel free to improve and add race conditions.
Note that both are multi-threaded and not thread safe. The difference really is that MRI cannot execute code in parallel and JRuby can.
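Such a race condition can be provoked in a few lines of plain Ruby. `Thread.pass` between the read and the write invites a context switch even under the GIL, which serializes individual operations but not read-modify-write sequences (the counts here are arbitrary):

```ruby
# Unsynchronized read-modify-write: updates get lost even on MRI.
counter = 0
threads = 5.times.map do
  Thread.new do
    100.times do
      value = counter      # read
      Thread.pass          # invite a context switch mid-update
      counter = value + 1  # write back a possibly stale value
    end
  end
end
threads.each(&:join)
puts counter  # usually well below 500: lost updates

# The fix: make the read-modify-write atomic with a Mutex.
counter = 0
lock = Mutex.new
5.times.map do
  Thread.new { 100.times { lock.synchronize { counter += 1 } } }
end.each(&:join)
puts counter  # always 500
```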
You might be tempted to ask why I answer "No" when the experiment shows that MRI is faster.
I think we need more experiments, in particular with real-world applications.
If you believe that JRuby should be faster because it can execute code in parallel, the reasons could be:
The experiments should be executed in a highly parallel environment to be able to leverage the potential of JRuby.
It could be the web server itself. Maybe Puma does not leverage the full potential of JRuby. MRI has a GIL, so why is it faster than JRuby at handling requests?
Other, deeper factors that we have not yet discovered might be relevant.
It really depends on your web-server scenario (which you understand best) ... in case you feel your production app is serving just fine under MRI, then you probably do not have that much concurrency around. Puma's README pretty much explains what you get under MRI compared to Rubinius/JRuby:
On MRI, there is a Global Interpreter Lock (GIL) that ensures only one thread can be run at a time. But if you're doing a lot of blocking IO (such as HTTP calls to external APIs like Twitter), Puma still improves MRI's throughput by allowing blocking IO to be run concurrently (EventMachine-based servers such as Thin turn off this ability, requiring you to use special libraries). Your mileage may vary. In order to get the best throughput, it is highly recommended that you use a Ruby implementation with real threads like Rubinius or JRuby
... so in one sentence: **you can have multiple threads under MRI, but you have no parallelism**
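That one sentence is easy to check from plain Ruby; a rough benchmark sketch (timings vary by machine and the workload sizes are arbitrary). On MRI the CPU-bound threaded time stays close to the serial time; on JRuby it can approach serial time divided by core count. The IO case overlaps everywhere:

```ruby
require "benchmark"

# Pure CPU work: under MRI's GIL, threads take turns rather than run in parallel.
def cpu_work
  200_000.times { Math.sqrt(rand) }
end

serial   = Benchmark.realtime { 4.times { cpu_work } }
threaded = Benchmark.realtime { 4.times.map { Thread.new { cpu_work } }.each(&:join) }
puts format("CPU-bound  serial %.2fs  threaded %.2fs", serial, threaded)

# Blocking IO (simulated with sleep) releases the GIL, so the waits overlap
# and four 0.2s waits finish in roughly 0.2s total.
io = Benchmark.realtime { 4.times.map { Thread.new { sleep 0.2 } }.each(&:join) }
puts format("IO-bound   4 overlapped 0.2s waits took %.2fs", io)
```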
IMHO it depends on what your application does.
I've tested both MRI/YARV and JRuby on my Rails application.
Since most of what the app does is route HTTP requests, fetch from the DB, apply simple business logic and write to the DB, parallelism isn't much of an issue. Puma on MRI does handle multi-threading for blocking IO operations (DB, API). Tasks that fall outside this scope (image processing, crunching report data, calls to external APIs, etc.) should probably be handled by background jobs anyway (I recommend https://github.com/brandonhilkert/sucker_punch).
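The background-job idea can be sketched in a few lines of plain Ruby: a queue drained by a worker thread, which is roughly what gems like sucker_punch wrap with a real thread pool (`TinyJobQueue` is a made-up name for illustration, not the gem's API):

```ruby
# Minimal job runner: enqueue blocks, drain them on one worker thread.
class TinyJobQueue
  def initialize
    @queue = Queue.new
    @worker = Thread.new do
      while (job = @queue.pop)   # nil is the stop sentinel
        job.call
      end
    end
  end

  def perform_async(&block)
    @queue << block              # returns immediately; work runs later
  end

  def shutdown
    @queue << nil                # tell the worker loop to stop
    @worker.join                 # wait for queued jobs to finish
  end
end

results = Queue.new
jobs = TinyJobQueue.new
jobs.perform_async { results << "report crunched" }
jobs.perform_async { results << "image resized" }
jobs.shutdown
puts results.size  # both jobs ran before shutdown returned
```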
Depending on your deployment needs, memory consumption might be more of an issue, and JRuby is very hungry for memory: almost 2x the memory in my case.
If you're deploying your application on Heroku you might find that you get more bang for the buck by being able to run 2 instances concurrently on 1 dyno.

How to profile garbage collection in Ruby

I'm trying to profile GC in a non-Rails application, preferably using YARV Ruby.
perftools.rb is telling me that the majority of my CPU time is spent in garbage_collector (6061 (61.4%)).
I'm also able to get how many objects are created by which methods with perftools.rb. Some methods create more objects than others, but the distribution isn't extremely skewed.
Where do I go from here? Is it possible to get more detailed information on why it's spending so much time doing GC? Is it possible to see whether the time is spent getting rid of objects, or whether it is spent checking whether an object should be garbage collected or not?
I have access to OS X Lion, Windows 7 and Ubuntu 12.04.
On OS X you have DTrace. There are DTrace providers in YARV Ruby.
You have a couple of probes related to GC that you can use:
gc-begin
gc-end
gc-mark-begin
gc-mark-end
gc-sweep-begin
gc-sweep-end
I think they can help you find out what the GC in your program is doing. Have a look at this file to see how to use them: https://github.com/tenderlove/ruby/blob/probes/test/dtrace/test_gc.rb.
And this post for more explanations: http://tenderlovemaking.com/2011/06/29/i-want-dtrace-probes-in-ruby.html
There's a bug open in the Ruby tracker (http://bugs.ruby-lang.org/issues/2565) where you can find a patch to apply to Ruby to get those probes, or you can use https://github.com/tenderlove/ruby/tree/probes where the patch is already applied.
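If patching Ruby is not an option, `GC::Profiler` already exposes per-run data from plain Ruby. It won't split mark time from sweep time the way the DTrace probes do, but it shows when each run happened and how long it took. A small sketch (the allocation loop is just there to force some collections):

```ruby
GC::Profiler.enable

# Churn short-lived objects so a few GC runs have something to do.
50_000.times { "x" * 64 }
GC.start

# raw_data returns one hash per GC run, with invoke time, duration,
# and heap usage at that point.
GC::Profiler.raw_data.each_with_index do |run, i|
  puts format("GC #%d at %.4fs took %.6fs, heap in use: %d bytes",
              i + 1, run[:GC_INVOKE_TIME], run[:GC_TIME], run[:HEAP_USE_SIZE])
end
GC::Profiler.disable
```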
Hope this helps

Since adding observers to my Ruby module, my system locks up

It only happens on certain types of errors, for example when I call a method that doesn't exist on one of my objects. But it's hard to get any information on what is causing this, because my debugger locks up as well, so I can't step through it. When I look at top, I see something like 97% of CPU time being taken up by a Ruby process. I tried running Sample Process in Activity Monitor to see if it could show me where it is getting stuck, but nothing relevant seems to come up (just a lot of OS X classes).
This is a Padrino project; I am running Ruby 1.9.2 and using the Observable mixin, on OS X Lion. Any ideas or suggestions for troubleshooting? This is killing my productivity!!
Which version of Padrino do you have? The latest, 0.10.1, fixes this problem.

Diagnosing Deadlocks in Win32 Program

What are the steps and techniques to debug an apparent hang due to a deadlock in a Win32 production process? I heard that WinDbg can be used for this purpose, but could you please provide clear hints on how this can be accomplished?
This post should get you started on the various options. Check the posts tagged with Debugging.
Another useful article on debugging deadlocks.
Debugging a true deadlock is actually kind of easy if you have access to the source and a memory dump (or a live debugging session).
All you do is look at the threads and find the ones that are waiting on some kind of shared resource (for example, hung in WaitForSingleObject). Generally speaking, from there it is a matter of figuring out which two or more threads have locked each other up, and then which one broke the lock hierarchy.
If you can't easily figure out which threads are locked up, use the method shown in this post here to trace the lock chain for each thread. When you get into a loop, the threads in the loop are the ones that are deadlocked.
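The lock-ordering idea is language-independent; here is a small Ruby sketch of a deadlock caused by opposite lock orders, and the hierarchy fix (the `transfer` helper and the timings are illustrative only):

```ruby
require "timeout"

a = Mutex.new
b = Mutex.new

# Takes two locks in the order given; the sleep widens the race window
# so the bad interleaving happens reliably.
def transfer(first, second)
  first.synchronize do
    sleep 0.05
    second.synchronize { }
  end
end

# Broken: thread 1 takes a then b, thread 2 takes b then a.
# Each ends up waiting on the lock the other holds -- a cycle.
stuck = [Thread.new { transfer(a, b) }, Thread.new { transfer(b, a) }]
deadlocked =
  begin
    Timeout.timeout(1) { stuck.each(&:join) }
    false
  rescue Timeout::Error
    stuck.each(&:kill)
    true
  end
puts "deadlocked: #{deadlocked}"   # almost always true with this timing

# Fixed: impose a hierarchy -- every thread acquires c before d.
c = Mutex.new
d = Mutex.new
[Thread.new { transfer(c, d) }, Thread.new { transfer(c, d) }].each(&:join)
puts "ordered locking completed"
```

Tracing the wait chain in a dump is exactly the manual version of spotting that cycle.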
If you are very lazy, you can install Application Verifier, then add your module and select just "locks" from the basic tests.
Then you can run your application under any debugger.
If a critical-section deadlock happens, you will find the reason right away.
What language/IDE are you using?
In .NET you can view the threads of an application: Debug -> Windows -> Threads, or Ctrl+Alt+H.
Debugging deadlocks can be tricky. I usually do some kind of logging and see where the log stops. I either log to a file or to the debug console using OutputDebugString().
The best thing is to start by adding logging statements. Generally I would recommend adding them only around the shared resources that are deadlocking, but adding them more broadly might point to situations or areas of code you weren't expecting. The much-publicized stackoverflow.com database issue actually turned out to be log4net! The Stack Overflow team never suspected log4net, and only examining the logs (ironically) revealed it. I would initially forgo any complicated tools, e.g. WinDbg, since using them is not very intuitive IMHO.
