Forcing a synchronous garbage collection in Ruby - ruby

I am trying to use the GDAL bindings to create geographic datasets in a Ruby on Rails app. However, GDAL only flushes those datasets on disk when the corresponding Ruby objects are destroyed. This (unanswered) question provides a nice explanation of what I am facing.
I tried setting every variable to nil and manually running GC.start, but as I understand it the Ruby GC is somewhat asynchronous (tell me if I'm wrong there, as I have limited Ruby experience), so this doesn't work all the time.
Is there a way to force a synchronous garbage collection so that I can be absolutely certain that my objects are destroyed when it is done?
Note that I would vastly prefer using GDAL over other libraries, as I have a large existing Python codebase that also uses the GDAL bindings, and the Python to Ruby translation is (or should be) relatively painless.

Related

Ruby performance tuning

I've an application runs on k8s that uses ruby v2.7.4, I'm tryning to have a look on some environment variables that may enhance the performance of my application. Can you help me to understand the below parameters and how to calculate the right value ?
WEB_CONCURRENCY
RUBY_GC_MALLOC_LIMIT
RUBY_GC_MALLOC_LIMIT_MAX
RUBY_GC_OLDMALLOC_LIMIT
RUBY_GC_OLDMALLOC_LIMIT_MAX
Thanks
Most are Garbage Collection Settings
Just don't. Unless you have very specific problems around garbage collection or odd memory constraints because you're running an embedded system, you shouldn't have to worry about garbage collection at all, especially on newer Rubies. You can find most of the values you're looking for in GC#stat, but I have no idea where you're getting "WEB_CONCURRENCY" from. That one is likely tied to your web server rather than Ruby's GC module or any known Ruby environment variable, so you're going to have to figure that one out some other way.
If you're having trouble with memory usage in Ruby, the problem is most often tied to objects that never go out of scope and therefore never get garbage collected. There are many better ways to optimize most Ruby applications than messing around with GC settings, but if you do have a valid use case, the GC module is where you should start.

How can I pass Selenium WebDriver objects between seperate Ruby processes?

I want to pass an instance of an object between two Ruby processes. Specifically, I want to pass an instance of a Selenium WebDriver from one process to another process. The reason I want to do this is because it takes a lot of time for Ruby to create this object, but I want it to be used by the other process.
I've found some related questions here and here that seem to point towards using DRb, but I've been unable to find any useful examples or sample code.
Is there a tool other than DRb that I should be using? Does anyone have an example similar to this that I could copy from?
It looks like you're going to have to use DRb, although the documentation for it seems to be lacking. There is however an interesting article here. You might also want to consider purchasing The dRuby Book by Masatoshi Seki to get a better idea of how to do this effectively.
Another option to investigate if you are not looking at simultaneous access, but you just want to send the object from one process to another, is to serialize (that is, encode in a way that Ruby can read) the object with YAML (for a human readable file) or Marshall (for a binary encoded file) and send it using a pipe. This was mentioned in another answer that has since been deleted.
Note that either of these solutions require modifying the Selenium code heavily since the objects you want to manipulate neither support copying, nor simultaneous access natively.
TL;DR
Most queue or distributed processes are going to require some sort of serialization to work properly. If you want to pass objects rather than messages, then this will a limiting factor in how you approach the problem.
DRb
I don't know if you can marshal a WebDriver object. If you can't, then DRb may be a good choice for your distributed Ruby programs because it supports DRbObject references for things that can't be marshaled. There are some examples provided in the DRb documentation.
Selenium Wire Protocol
Depending on what you're really trying to do, it may be worth taking a closer look at using the remote bindings for the Remote WebDriver client/server, or Selenium's JSON Wire Protocol as an alternative to passing objects between processes.
Other Alternatives: Fixtures, Factories, Stubs, and Mocks
Whether or not these work in your specific case will depend a lot on why you want to pass objects instead of simply driving the remote server. If it's largely an issue of how long it takes to build your object, then the serialization/de-serialization cycle may not necessarily be faster in all cases.
You might want to revisit why your object is so slow to create. If gathering and processing the data for it is what's taking too long, you can use some sort of test fixture or factory to trim that time, either by using a smaller set of fixed data, or using a pre-serialized object that's optimized for speed.
You might also consider whether you actually need real data or objects for your test at all. In many cases, you can speed up your tests a lot by stubbing methods or creating mock objects that will return the values you need for your integration tests without needing to perform expensive calculations or long-running operations.
There are certainly cases where you need to drive the full stack and perform acceptance tests on real data. Even then, you may be able to devise a set of fixture data that will take less time or memory to process. It's certainly worth at least thinking about.

A scripting engine for Ruby?

I am creating a Ruby On Rails website, and for one part it needs to be dynamic so that (sorta) trusted users can make parts of the website work differently. For this, I need a scripting language. In a sort of similar project in ASP.Net, I wrote my own scripting language/DSL. I can not use that source code(written at work) though, and I don't want to make another scripting language if I don't have to.
So, what choices do I have? The scripting must be locked down and not be able to crash my server or anything. I'd really like if I could use Ruby as the scripting language, but it's not strictly necessary. Also, this scripting part will be called on almost every request for the website, sometimes more than once. So, speed is a factor.
I looked at the RubyLuaBridge but it is Alpha status and seems dead.
What choices for a scripting language do I have in a Ruby project?
Also, I will have full control over where this project is deployed(root access), so there are no real limits..
There's also Rufus-lua though it's at version 0.1.0...
What about JRuby? You can use java implementation of many scripting language, such as javascript, scheme etc
Well, since it hasn't been suggested yet, there's Locking Ruby In The Safe as described by the Pickaxe book. This allows you to use Ruby as the language without significant slowdown AFAIK.
This technique is intended to allow safe sandboxing of untrusted Ruby code and bug fixes and discussions are directed toward keeping it that way, but infinite loops and some other things still allow malicious users to peg the CPU. (e.g. this discussion maybe.)
What I don't know is how you return data that is inherently safe to use from outside the safe thread. A singleton object (for instance) can mimic whatever class and then do something dangerous when any method is called in the returning thread. I'm still googling around about it. (The Ruby Programming Language says that level 4 "Prevents metaprogramming methods" which would allow you to safely verify the class of a returned object, which I suppose would make results safe to use.)
Barring that, it might not be hard (*snrk*) to implement a Lisp-1 with dynamic scope since you already have a garbage collector.

How to deal with memory leaks in RMagick in Ruby?

Im developing web-application with Merb and im looking for some safe and stable image processing library. I used to work with Imagick in php, then moved to ruby and start using RMagick. But there is a problem. Long running scripts causing memory leaks. There are couple solution exists, but I don't know which one is the most stable. So, what do you think?
Right now, my app uses internal API that i wrote to process images, in PHP. Its running on separate server along with other applications, so its not a big problem. But i think its not a good architecture.
Anyway, i`ll consider any practical tips.
I too have encountered this issue - the solution is to force garbage collection.
When you have reassigned the image variable to a new image simply use GC.start to ensure the old reference is released from memory.
On later versions of RMagick, I also believe you can also call destroy! on the image when you have finished processing it.
A combination of the two would probably ensure you are covered, but im not sure of the real life impact on performance (I would assume it is negligible i most cases).
Alternatively, you could use mini-magick which is a wrapper for the ImageMagick commandline client.
When using RMagick it's important to remember to destroy the image once you are done, otherwise you will fill up the /tmp dir when working with large sets of images. For example you must call destroy!
require 'RMagick'
Dir.foreach('/home/tiffs/') do |file|
next if file == '.' or file == '..'
image = Magick::Image.read(file).first
image.format = "PNG"
image.write("/home/png/#{File.basename(file, '.*')}.png")
image.destroy!
end
Actually, it isn't really a Ruby specific problem, other Interpreters share that as well. The concrete problem is that the GC of Ruby only sees memory that was allocated by Ruby itself, and not by external libraries (with the notable exception of the library using Rubys memory management facilities). So, a ImageMagick-Object in Ruby memory space is really small, but the image in the space managed by ImageMagick is large. So, this is not a leak per se, but it behaves like one.
Rubys Garbage Collector never kicks in if your Process stays under a certain limit (8MB is standard). As ImageMagick never creates large objects in Ruby space, it probably never kicks in. So, either you use the proposed method of spawning a new process or using exec. Another rather nifty one is to have an image processing service in the backend that forks for every task. Another one would be to have some kind of monitoring in place that kickstarts the GC every once in a while.
There is another Library called MagickWand by Timothy Paul Hunter (the author of RMagick) that tries to address these issues and create a nicer API. It's in alpha and requires a rather new release of ImageMagick, though.
Now you can tell ImageMagick which memory space should be used.
I think RMAGICK_ENABLE_MANAGED_MEMORY = true and GC.start is what you need.
MANAGED_MEMORY
If true, RMagick is using Ruby managed memory for all allocations. If false,
RMagick allocates memory for objects directly from the operating system. You can
enable RMagick to use Ruby managed memory (when built with ImageMagick 6.4.0-11
and later) by setting
RMAGICK_ENABLE_MANAGED_MEMORY = true
before requiring RMagick.
https://rmagick.github.io/constants.html
However, image.destroy! itself is enough to stabilize the memory consumption.
This is not due to ImageMagick; it's due to Ruby itself, and it's a well known problem. My suggestion is to split your program into two parts: a long-running part that allocates little memory and just deals with the control of the system, and a separate program that actually does the processing work. The long-running control process should do just enough to find some work for a child process that it spawns, and the child should do all of the processing for that particular work item.
Another option would be to leave the two combined, but after a work unit is complete, use exec to replace your process with a freshly started version of the same program, which would search for another work item, process it, and exec itself again.
This is assuming that the work items are fairly large, which they almost certainly are if you're using ImageMagick. If they're not, you'll find that the overhead of spawning a new process and having the Ruby interpreter re-parse your entire program starts to get a little too large. You can deal with this by having your program do more work units (say, ten or a hundred) before re-executing itself.

Is there a way to dump the objects in memory from a running ruby process?

Killing the processs while obtaining this information would be fine.
A quick-and-dirty way would be ObjectSpace.each_object{|e| p e}. You could do some tests to determine what you wanted to keep, or Marshal the objects.
For 1.9.2/1.9.3 there's heap_dump gem, it can be injected into a running process using gdb (but more stable was is to include it in process itself, no performance overhead)
It dumps references to objects, not objects themselves, but this is usable if you're into fighting leaks
For the more hardcore there is also BleakHouse which gives you a special custom-compiled copy of ruby with better memory leak tracking powarz

Resources