How to deal with Ruby 2.1.2 memory leaks? - ruby

I have a worker process which spawns up to 50 threads and do some async operations (most of which are http calls). When I start up the process, it starts with some 35MB of used memory, and quickly grows to 250MB. From that point it grows further more and the problem is that the memory never stops growing (even though the growing phase decreases over time). After several days, process just outgrows the available memory and crashes.
I did a lot of analysis and profiling and can't seem to find what is wrong. Process memory is constantly growing, even though the heap size is pretty much constant. I've collected GC.stat output into spreadsheet that you can access here:
https://docs.google.com/spreadsheets/d/17TohDNXQ_MXM31CeAmR2ptHFYfvOeF3dB6WCBkBS_Bc/edit?usp=sharing
Even though it seems that the process memory has finally stabilized at 415MB, it will continue to grow over next couple of days until it reaches 512MB limit and crashes.
I've also tried tracking objects with objectspace, but the sum of memory of tracked objects never crosses 70-80MB which perfectly aligns with GC reports. Where are the remaining 300MB+ (and growing) spent.. i have no clue.
How to deal with these kinds of problems? Are there any tools that could give me clearer insight on how the memory is being consumed?
UPDATE: Gems and OS
I'm using following gems:
gem "require_all", "~> 1.3"
gem "thread", "~> 0.1"
gem "equalizer", "~> 0.0.9"
gem "digest-murmurhash", "~> 0.3", require: "digest/murmurhash"
gem "google-api-client", "~> 0.7", require: "google/api_client"
gem "aws-sdk", "~> 1.44"
The application is deployed on heroku, though memory leak is noticable when running it locally on Mac OS X 10.9.4.
UPDATE: Leaks
I've upgraded stringbuffer and analyzed everything like #mtm suggested and now there are no memory leaks identified by leak tool, no increases in ruby heap size over time, and yet, the process memory is still growing. Originally I thought that it stopped growing at some point, but several hours later it outgrew the limit and process crashed.

From your GC logs it appears the issue is not a ruby object reference leak as the heap_live_slot value is not increasing significantly. That would suggest the problem is one of:
Data being stored outside the heap (Strings, Arrays etc)
A leak in a gem that uses native code
A leak in the Ruby interpreter itself (least likely)
It's interesting to note that the problem exhibits on both OSX and Heroku (Ubuntu Linux).
Object data and the "heap"
Ruby 2.1 garbage collection uses the reported "heap" only for Objects that contain a tiny amount of data. When the data contained in an Object goes over a certain limit, the data is moved and allocated to an area outside of the heap. You can get the overall size of each data type with ObjectSpace:
require 'objspace'
ObjectSpace.count_objects_size({})
Collecting this along with your GC stats might indicate where memory is being allocated outside the heap. If you find a particular type, say :T_ARRAY increasing a lot more than the others you might need to look for an array you are forever appending to.
You can use pry-byebug to drop into a console to troll around specific objects, or even looking at all objects from the root:
ObjectSpace.memsize_of(some_object)
ObjectSpace.reachable_objects_from_root
There's a bit more detail on one of the ruby developers blog and also in this SO answer. I like their JRuby/VisualVM profiling idea.
Testing native gems
Use bundle to install your gems into a local path:
bundle install --path=.gems/
Then you can find those that include native code:
find .gems/ -name "*.c"
Which gives you: (in my order of suspiciousness)
digest-stringbuffer-0.0.2
digest-murmurhash-0.3.0
nokogiri-1.6.3.1
json-1.8.1
OSX has a useful dev tool called leaks that can tell you if it finds unreferenced memory in a running process. Not very useful for identifying where the memory comes from in Ruby but will help to identify when it is occurring.
First to be tested is digest-stringbuffer. Grab the example from the Readme and add in some GC logging with gc_tracer
require "digest/stringbuffer"
require "gc_tracer"
GC::Tracer.start_logging "gclog.txt"
module Digest
class Prime31 < StringBuffer
def initialize
#prime = 31
end
def finish
result = 0
buffer.unpack("C*").each do |c|
result += (c * #prime)
end
[result & 0xffffffff].pack("N")
end
end
end
And make it run lots
while true do
a=[]
500.times do |i|
a.push Digest::Prime31.hexdigest( "abc" * (1000 + i) )
end
sleep 1
end
Run the example:
bundle exec ruby ./stringbuffertest.rb &
pid=$!
Monitor the resident and virtual memory sizes of the ruby process, and the count of leaks identified:
while true; do
ps=$(ps -o rss,vsz -p $pid | tail +2)
leaks=$(leaks $pid | grep -c Leak)
echo "$(date) m[$ps] l[$leaks]"
sleep 15
done
And it looks like we've found something already:
Tue 26 Aug 2014 18:22:36 BST m[104776 2538288] l[8229]
Tue 26 Aug 2014 18:22:51 BST m[110524 2547504] l[13657]
Tue 26 Aug 2014 18:23:07 BST m[113716 2547504] l[19656]
Tue 26 Aug 2014 18:23:22 BST m[113924 2547504] l[25454]
Tue 26 Aug 2014 18:23:38 BST m[113988 2547504] l[30722]
Resident memory is increasing and the leaks tool is finding more and more unreferenced memory. Confirm the GC heap size, and object count looks stable still
tail -f gclog.txt | awk '{ print $1, $3, $4, $7, $13 }
1581853040832 468 183 39171 3247996
1581859846164 468 183 33190 3247996
1584677954974 469 183 39088 3254580
1584678531598 469 183 39088 3254580
1584687986226 469 183 33824 3254580
1587512759786 470 183 39643 3261058
1587513449256 470 183 39643 3261058
1587521726010 470 183 34470 3261058
Then report the issue.
It appears to my very untrained C eye that they allocate both a pointer and a buffer but only clean up the buffer.
Looking at digest-murmurhash, it seems to only provide functions that rely on StringBuffer so the leak might be fine once stringbuffer is fixed.
When they have patched it, test again and move onto the next gem. It's probably best to use snippets of code from your implementation for each gem test rather than a generic example.
Testing MRI
First step would be to prove the issue on multiple machines under the same MRI to rule out anything local, which you've already done.
Then try the same Ruby version on a different OS, which you've done too.
Try the code on JRuby or Rubinius if possible. Does the same issue occur?
Try the same code on 2.0 or 1.9 if possible, see if the same problem exists.
Try the head development version from github and see if that makes any difference.
If nothing becomes apparent, submit a bug to Ruby detailing the issue and all the things you have eliminated. Wait for a dev to help out and provide whatever they need. They will most likely want to reproduce the issue so if you can get the most concise/minimal example of the issue set up. Doing that will often help you identify what the issue is anyway.

I fix memory leak and release digest/stringbuffer v0.0.3.
https://rubygems.org/gems/digest-stringbuffer
You can try again by v0.0.3.

I'm author of digest/murmurhash and digest/stringbuffer gems.
Indeed, It seems like have leaks in digest/stringbuffer.
I'll fix it later.
Can you explain more code?
I recommend using singleton methods like this.
Digest::MurmurHash1.hexdigest(some_data)
Maybe, It's not leak since singleton methods are not using digest/stringbuffer.

Related

Memory not being released back to OS

I've created an image resizing server that creates a few different thumbnails of and image that you upload to it. I'm using the package https://github.com/h2non/bimg for resizing, which is using libvips with c-bindings.
Before going to production I've started to stress test my app with jmeter and upload 100 images to it concurrently for a few times after each other and noticed that the memory is not being released back to the OS.
To illustrate the problem I've written a few lines of code that reads 100 images and resize them (without saving them anywhere) and then waits for 10 minutes. It repeats like this for 5 times
My code and memory/CPU graph can be found here:
https://github.com/hamochi/bimg-memory-issue
It's clear that the memory is being reused for ever cycle, otherwise it should have doubled (I think). But it's never released back to the OS.
Is this a general behaviour for cgo? Or bimg that is doing something weird. Or is it just my code that is faulty?
Thank you very much for any help you can give!
There's a libvips thing to track and debug reference counts -- you could try enabling that and see if you have any leaks.
https://libvips.github.io/libvips/API/current/libvips-vips.html#vips-leak-set
Though from your comment above about bimg memory stats, it sounds like it's probably all OK.
It's easy to test libvips memory from Python. I made this small program:
#!/usr/bin/python3
import pyvips
import sys
# disable libvips operation caching ... without this, it'll cache all the
# thumbnail operations and we'll just be testing the jpg write
pyvips.cache_set_max(0)
for i in range(0, 10000):
print("loop {} ...".format(i))
for filename in sys.argv[1:]:
# thumbnail to fit 128x128 box
image = pyvips.Image.thumbnail(filename, 128)
thumb = image.write_to_buffer(".jpg")
ie. repeatedly thumbnail a set of source images. I ran it like this:
$ for i in {1..100}; do cp ~/pics/k2.jpg $i.jpg; done
$ ../fing.py *
And watched RES in top. I saw:
loop | RES (kb)
-- | --
100 | 39220
250 | 39324
300 | 39276
400 | 39316
500 | 39396
600 | 39464
700 | 39404
1000 | 39420
As long as you have no refcount leaks, I think what you are seeing is expected behaviour. Linux processes can only release pages at the end of the heap back to the OS (have a look at the brk and sbrk sys calls):
https://en.wikipedia.org/wiki/Sbrk
Now imagine if 1) libvips allocates 6GB, 2) the Go runtime allocates 100kb, 3) libvips releases 6GB. Your libc (the thing in your process that will call sbrk and brk on your behalf) can't hand the 6GB back to the OS because of the 100kb alloc at the end of the heap. Some malloc implementations have better memory fragmentation behaviour than others, but the default linux one is pretty good.
In practice, it doesn't matter. malloc will reuse holes in your memory space, and even if it doesn't, they will get paged out anyway under memory pressure and won't end up eating RAM. Try running your process for a few hours, and watch RES. You should see it creep up, but then stabilize.
(I'm not at all a kernel person, the above is just my understanding, corrections very welcome of course)
The problem is in the resize code:
_, err = bimg.NewImage(buffer).Resize(width, height)
The image is gobject and need unref explicitly to release the memory, try:
image, err = bimg.NewImage(buffer).Resize(width, height)
defer C.g_object_unref(C.gpointer(image))

Ruby Memory leak (MRI)

I must be missing something but every single application I write in Ruby seems like leaking some memory. I use Ruby MRI 2.3 but I see the same behaviour with other versions.
Whenever I write a test application that does something inside a loop it is slowly leaking memory.
while true
#do something
sleep 0.1
end
For instance, I can write to array and then clean it in a loop, or just send http post request.
Here is just one example, but I have many examples like this:
require 'net/http'
require 'json'
require 'openssl'
class Tester
def send_http some_json
begin
#uri = URI('SERVER_URL')
#http = Net::HTTP.new(#uri.host, #uri.port)
#http.use_ssl = true
#http.keep_alive_timeout = 10
#http.verify_mode = OpenSSL::SSL::VERIFY_NONE
#http.read_timeout = 30
#req = Net::HTTP::Post.new(#uri.path, 'Content-Type' => 'application/json')
#req.body = some_json.to_json
res = #http.request(#req)
rescue Exception => e
puts e.message
puts e.backtrace.inspect
end
end
def run
while true
some_json = {"name": "My name"}
send_http(some_json)
sleep 0.1
end
end
end
Tester.new.run
The leak I see is very small, it can be 0.5 mb every hour.
I ran the code with MemoryProfiler and with GC::Profiler.enable and it shows that I have no leaks. So it must be 2 options:
There is a memory leak in C code. This might be possible but I don't use any external gems so I find it hard to believe that Ruby is leaking.
There is no memory leak and this is some sort of Ruby memory management mechanism. The thing is that I can defiantly see the memory growing. Until when will it grow? How much do I need to wait to know if it is a leak or now?
The same code runs perfectly fine with JRuby without any leaks.
I was amazed reading a post:
stack overlflow
from Joe Edgar:
Ruby’s history is mostly as a command line tool for text processing
and therefore it values quick startup and a small memory footprint. It
was not designed for long-running daemon/server processes
If what is written there is true and Ruby doesn't release memory back to OS then... We will always have a leak, right?
For instance:
Ruby asks for memory from OS.
OS provides the memory to Ruby.
Ruby frees the memory but GC still didn't run.
Ruby asks for more memory from OS.
OS provide more memory to Ruby.
Ruby runs GC but it is too late as Ruby already asked twice.
And so on and on.
What am I missing here?
Look Into GC Compaction and (Un)frozen String Literals
"Identical" Strings Aren't Necessarily Identical
Prior to Ruby 2.7.0, mainline Ruby didn't have compacting garbage collection. While I don't fully understand all the internals, the gist is that certain objects couldn't be garbage collected. Since you're using Ruby 2.3, that's something to keep in mind as you work on your memory allocation issues. Other non-YARV VMs may handle their internals differently, which is why you might see variation when using alternative engines like JRuby.
Even with Ruby 3.0.0-preview2, String literals aren't frozen by default, so your current implementation is creating a new String object with a unique object ID every tenth of a second. Consider the following:
3.times.map { 'foo'.__id__ }
#=> [240, 260, 280]
Even though the String objects seem identical, Ruby is actually allocating each one as a unique object in memory. Because a loop iteration is not a scope gate, those String objects can't be collected or compacted by YARV.
Enable Frozen String Literals by Default
You may have other issues as well, but it seems likely that your largest issue is keeping all of those String literals in scope indefinitely within your endless while-loop. You may be able to resolve your garbage collection problem (it's not a memory leak) by using frozen String literals instead. Consider the following:
# run irb with universally-frozen string literals
RUBYOPT="--enable-frozen-string-literal" irb
3.times.map { 'foo'.__id__ }
#=> [240, 240, 240]
You can solve this within your code in other ways as well, but reducing the number of String literals that remain in scope seems like a very sensible place to start.

Ruby OOM in container

Recently we've encountered a problem with Ruby inside a Docker container. Despite quite low load, application tends to consume huge amounts of memory and after some time under mentioned load it OOMs.
After some investigation we narrowed down problem to the one-liner
docker run -ti -m 209715200 ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; end;'
On some machines it OOMed (was killed by oom-killer because of exceeding the limit) soon after the start, but on some it worked, though slowly, without OOMs. It seems like (only seems, maybe it's not the case) in some configurations ruby is able to deduce cgroup's limits and adjust it's GC.
Configurations tested:
CentOS 7, Docker 1.9 — OOM
CentOS 7, Docker 1.12 — OOM
Ubuntu 14.10, Docker 1.9 — OOM
Ubuntu 14.10, Docker 1.12 — OOM
MacOS X Docker 1.12 — No OOM
Fedora 23 Docker 1.12 — No OOM
If you look at the memory consumption of ruby process, in all cases it behaved similar to this picture, staying on the same level slightly below the limit, or crashing into the limit and being killed.
Memory consumption plot
We want to avoid OOMs at all cost, because it reduces resiliency and poses a risk of loosing data. Memory really needed for the application is way below the limit.
Do you have any suggestions as of what to do with ruby to avoid OOMing, possibly by loosing in performance?
We can't figure out what're the significant differences between tested installations.
Edit: Changing the code or increasing memory limit are not available. First one because we run fluentd with community plugins which we have no control of, second one because it won't guarantee that we won't face this issue again in the future.
You can try to tweak rubies garbage collection via environment variables (depending on your ruby version):
RUBY_GC_MALLOC_LIMIT=4000100
RUBY_GC_MALLOC_LIMIT_MAX=16000100
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1.1
Or call garbage collection manualy via GC.start
For your example, try
docker run -ti -m 209715200 ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; array = nil; end;'
to help the garbage collector.
Edit:
I don't have a comparable environment to yours. On my machine (14.04.5 LTS, docker 1.12.3, RAM 4GB, Intel(R) Core(TM) i5-3337U CPU # 1.80GHz) the following looks quite promising.
docker run -ti -m 500MB -e "RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1" \
-e "RUBY_GC_MALLOC_LIMIT=5242880" \
-e "RUBY_GC_MALLOC_LIMIT_MAX=16000100" \
-e "RUBY_GC_HEAP_INIT_SLOTS=500000" \
ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; puts `ps -o rss -p #{Process::pid}`.chomp.split("\n").last.strip.to_i / 1024.0 / 1024 ; puts GC.stat; end;'
But every ruby app needs a different setup for fine tuning and if you experience memory leaks, your lost.
I don't think this is a docker issue. You're overusing the resources of the container and Ruby tends to not behave well once you hit memory thresholds. It can GC, but if another process tries to take some memory or Ruby attempts to allocate again while you are maxed out then the kernel will (usually) kill the process with the most memory. If you're worried about memory usage on a server, add some threshold alerts at 80% RAM and allocate the appropriately sized resources for the job. When you start hitting thresholds, allocate more RAM or look at the particular job parameters/allocations to see if it needs to be redesigned to have a lower footprint.
Another potential option if you really want to have a nice fixed memory band to GC against is to use JRuby and set the JVM max memory to leave a little wiggle room on the container memory. The JVM will manage OOM within its own context better as it isn't sharing those resources with other processes nor letting the kernel think the server is dying.
I had a similar issue with a few java based Docker containers that were running on a single Docker host. The problem was each container saw the total available memory of the host machine and assumed it could use all of that memory for itself. It didn't run GC very often and I ended up getting out of memory exceptions. I ended up manually limiting the amount of memory each container could use and I no longer got OOMs. Within the contianer I also limited the memory of the JVM.
Not sure if this is the same issue you're seeing but it could be related.
https://docs.docker.com/engine/reference/run/#/runtime-constraints-on-resources

Ruby Memory Management

I have been using Ruby for a while now and I find, for bigger projects, it can take up a fair amount of memory. What are some best practices for reducing memory usage in Ruby?
Please, let each answer have one "best practice" and let the community vote it up.
When working with huge arrays of ActiveRecord objects be very careful... When processing those objects in a loop if on each iteration you are loading their related objects using ActiveRecord's has_many, belongs_to, etc. - the memory usage grows a lot because each object that belongs to an array grows...
The following technique helped us a lot (simplified example):
students.each do |student|
cloned_student = student.clone
...
cloned_student.books.detect {...}
ca_teachers = cloned_student.teachers.detect {|teacher| teacher.address.state == 'CA'}
ca_teachers.blah_blah
...
# Not sure if the following is necessary, but we have it just in case...
cloned_student = nil
end
In the code above "cloned_student" is the object that grows, but since it is "nullified" at the end of each iteration this is not a problem for huge array of students. If we didn't do "clone", the loop variable "student" would have grown, but since it belongs to an array - the memory used by it is never released as long as array object exists.
Different approach works too:
students.each do |student|
loop_student = Student.find(student.id) # just re-find the record into local variable.
...
loop_student.books.detect {...}
ca_teachers = loop_student.teachers.detect {|teacher| teacher.address.state == 'CA'}
ca_teachers.blah_blah
...
end
In our production environment we had a background process that failed to finish once because 8Gb of RAM wasn't enough for it. After this small change it uses less than 1Gb to process the same amount of data...
Don't abuse symbols.
Each time you create a symbol, ruby puts an entry in it's symbol table. The symbol table is a global hash which never gets emptied.
This is not technically a memory leak, but it behaves like one. Symbols don't take up much memory so you don't need to be too paranoid, but it pays to be aware of this.
A general guideline: If you've actually typed the symbol in code, it's fine (you only have a finite amount of code after all), but don't call to_sym on dynamically generated or user-input strings, as this opens the door to a potentially ever-increasing number
Don't do this:
def method(x)
x.split( doesn't matter what the args are )
end
or this:
def method(x)
x.gsub( doesn't matter what the args are )
end
Both will permanently leak memory in ruby 1.8.5 and 1.8.6. (not sure about 1.8.7 as I haven't tried it, but I really hope it's fixed.) The workaround is stupid and involves creating a local variable. You don't have to use the local, just create one...
Things like this are why I have lots of love for the ruby language, but no respect for MRI
Beware of C extensions which allocate large chunks of memory themselves.
As an example, when you load an image using RMagick, the entire bitmap gets loaded into memory inside the ruby process. This may be 30 meg or so depending on the size of the image.
However, most of this memory has been allocated by RMagick itself. All ruby knows about is a wrapper object, which is tiny(1).
Ruby only thinks it's holding onto a tiny amount of memory, so it won't bother running the GC. In actual fact it's holding onto 30 meg.
If you loop over a say 10 images, you can run yourself out of memory really fast.
The preferred solution is to manually tell the C library to clean up the memory itself - RMagick has a destroy! method which does this. If your library doesn't however, you may need to forcibly run the GC yourself, even though this is generally discouraged.
(1): Ruby C extensions have callbacks which will get run when the ruby runtime decides to free them, so the memory will eventually be successfully freed at some point, just perhaps not soon enough.
Measure and detect which parts of your code are creating objects that cause memory usage to go up. Improve and modify your code then measure again. Sometimes, you're using gems or libraries that use up a lot of memory and creating a lot of objects as well.
There are many tools out there such as busy-administrator that allow you to check the memory size of objects (including those inside hashes and arrays).
$ gem install busy-administrator
Example # 1: MemorySize.of
require 'busy-administrator'
data = BusyAdministrator::ExampleGenerator.generate_string_with_specified_memory_size(10.mebibytes)
puts BusyAdministrator::MemorySize.of(data)
# => 10 MiB
Example # 2: MemoryUtils.profile
Code
require 'busy-administrator'
results = BusyAdministrator::MemoryUtils.profile(gc_enabled: false) do |analyzer|
BusyAdministrator::ExampleGenerator.generate_string_with_specified_memory_size(10.mebibytes)
end
BusyAdministrator::Display.debug(results)
Output:
{
memory_usage:
{
before: 12 MiB
after: 22 MiB
diff: 10 MiB
}
total_time: 0.406452
gc:
{
count: 0
enabled: false
}
specific:
{
}
object_count: 151
general:
{
String: 10 MiB
Hash: 8 KiB
BusyAdministrator::MemorySize: 0 Bytes
Process::Status: 0 Bytes
IO: 432 Bytes
Array: 326 KiB
Proc: 72 Bytes
RubyVM::Env: 96 Bytes
Time: 176 Bytes
Enumerator: 80 Bytes
}
}
You can also try ruby-prof and memory_profiler. It is better if you test and experiment different versions of your code so you can measure the memory usage and performance of each version. This will allow you to check if your optimization really worked or not. You usually use these tools in development / testing mode and turn them off in production.

Why this XML parsing Ruby code runs slower with GC disabled?

I've got a piece of code parsing 500 MB XML file using libxml-ruby gem. What is surprising to me, this code runs slower with GC disabled, which seems counter-intuitive. What might be the reason? I've go plenty of memory available and the system is not swapping.
require 'xml'
#GC.disable
#reader = XML::Reader.file('books.xml', :options => XML::Parser::Options::NOBLANKS)
#reader.read
#reader.read
while #reader.name == 'book'
book_id = #reader.get_attribute('id')
#reader.read
until #reader.name == 'book' && #reader.node_type == XML::Reader::TYPE_END_ELEMENT
case #reader.name
when 'author'
author = #reader.read_string
when 'title'
title = #reader.read_string
when 'genre'
genre = #reader.read_string
when 'price'
price = #reader.read_string
when 'publish_date'
publish_date = #reader.read_string
when 'description'
description = #reader.read_string
end
#reader.next
end
#reader.read
end
#reader.close
Here are the results I got:
ruby gc on gc off
2.2.0 16.93s 18.81s
2.1.5 16.22s 18.58s
2.0.0 17.63s 17.99s
Why disable the garbage collector? I've read in Ruby Performance Optimization book that Ruby is slow mostly because programmers don't think about memory consumption, which makes garbage collector use a lot of execution time. Thus turning off the GC should instantly speed things up (at the cost of memory usage of course), as long as the system is not swapping.
I wanted to see if my XML parsing module can be improved, so I started experimenting with it by disabling the GC, which brought me to this problem. I expected a significant speed-up with GC disabled, but instead I got the opposite. I know that differences are not huge, but still this is strange to me.
libxml-ruby gem uses a native C LibXML implementation under the hood - can it be the reason?
The file that I used is manually multiplicated books.xml sample downloaded from some Microsoft documentation:
<catalog>
<book id="bk101">
<author>John Doe</author>
<title>XML for dummies</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>Some description</description>
</book>
....
</catalog>
My setup: OS X Yosemite, Intel Core i5 2.6 GHz, 16GB of RAM.
Thanks for any suggestions.
You are forgetting the operating system - you've disabled GC in your MRI process but you have no control over the linux/unix kernel and how it allocates memory to your MRI application.
In fact, I believe that by disabling GC you have significantly hamstrung the behaviour of your application, making it likely that your program will need to continuously request more RAM from the kernel. This will likely involve some form of overhead in the kernel as it allocates swap or memory to you.
Your source data is a 500 mb xml file that you are reading, node by node, into your MRI program's memory footprint. It's likely your MRI process consumes several GB of data by the time it is done processing; and none of the values in your main reading block are discarded after each iteration - they simply hang around in memory, and are only finally cleaned up when your application exits and the memory is handed back to the operating system.
GC is in place to manage this; it is intended to prevent your application from requesting additional memory from the kernel unless it absolutely needs it, and to allow your application to run "well enough" within the memory allocated to it within reason.
So I'm not honestly surprised you see a slowdown with GC disabled. What would be telling is the load average and swap usage of your box during your benchmarks.

Resources