Do Ruby objects have a size limit?

I'm building some large strings which have a short lifetime within the application. Will the String objects grow arbitrarily large up to the physical limits of the ruby instance?
What I'm wondering is whether, without any intervention to limit string size, my application would get hosed by running out of memory, or whether it would degrade gracefully.
Thanks for any input!

There is a limit. A String can be at most 2**31 - 1 in length (and accordingly 2**63 - 1 on 64-bit Ruby), because the length is stored in a C long. You can see the limit with:
>> s = String.new("1" * (2**32))
RangeError: bignum too big to convert into `long'
from (irb):3:in `*'
from (irb):3
>> s = String.new("1" * (2**31))
RangeError: bignum too big to convert into `long'
from (irb):4:in `*'
from (irb):4
Having said that, even though you can ask for a string that big, the allocation will usually fail first, at least on a 32-bit system: a process can typically allocate only about 2.5 to 3 GB in total, and a string of length 2**31 - 1 is nearly 2 GB by itself. As seen:
>> "1" * (2**30)
NoMemoryError: failed to allocate memory
from /usr/lib/ruby/1.8/irb.rb:310:in `inspect'
from /usr/lib/ruby/1.8/irb.rb:310:in `output_value'
from /usr/lib/ruby/1.8/irb.rb:159:in `eval_input'
from /usr/lib/ruby/1.8/irb.rb:271:in `signal_status'
from /usr/lib/ruby/1.8/irb.rb:155:in `eval_input'
from /usr/lib/ruby/1.8/irb/ruby-lex.rb:244:in `each_top_level_statement'
from /usr/lib/ruby/1.8/irb/ruby-lex.rb:230:in `loop'
from /usr/lib/ruby/1.8/irb/ruby-lex.rb:230:in `each_top_level_statement'
from /usr/lib/ruby/1.8/irb/ruby-lex.rb:229:in `catch'
from /usr/lib/ruby/1.8/irb/ruby-lex.rb:229:in `each_top_level_statement'
from /usr/lib/ruby/1.8/irb.rb:154:in `eval_input'
from /usr/lib/ruby/1.8/irb.rb:71:in `start'
from /usr/lib/ruby/1.8/irb.rb:70:in `catch'
from /usr/lib/ruby/1.8/irb.rb:70:in `start'
from /usr/bin/irb:13
(The trace points into irb's inspect: it may look like an irb bug, but irb is simply running out of memory while building a printable copy of the result.)
Note that NoMemoryError descends from Exception rather than StandardError, so a bare rescue will not catch it; you can rescue NoMemoryError explicitly, though by that point there is rarely enough headroom left to do anything useful.
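For completeness, a minimal sketch of rescuing it explicitly (whether the rescue block itself can still run depends on how much memory is left):
begin
  s = "1" * (2**30)
rescue NoMemoryError => e
  # NoMemoryError is not a StandardError, so it must be named explicitly here
  warn "allocation failed: #{e.message}"
  s = nil
end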

Related

What do the numbers mean in the backtrace?

Do the 1, 2, 3, 4, 5, 1, 2 indicate that these two errors (EOFError and NoMethodError) happened in different threads?
Traceback (most recent call last):
2: from /gem/lib/my/project/conn.rb:72:in `block in initialize'
1: from /usr/local/lib/ruby/2.7.0/openssl/buffering.rb:125:in `readpartial'
/usr/local/lib/ruby/2.7.0/openssl/buffering.rb:125:in `sysread': end of file reached (EOFError)
5: from /gem/lib/my/project/conn.rb:69:in `block in initialize'
4: from /gem/lib/my/project/conn.rb:80:in `rescue in block in initialize'
3: from /gem/lib/my/project/session.rb:60:in `disconnected'
2: from /gem/lib/my/project/session.rb:217:in `retransmit'
1: from /gem/lib/my/project/session.rb:117:in `transmit_results'
/gem/lib/rspec/buildkite/analytics/conn.rb:104:in `transmit': undefined method `write' for nil:NilClass (NoMethodError)
The numbers correspond to the position of the stack frame in the execution stack and thus in the quoted backtrace.
In your question you are quoting two different exceptions with their respective backtraces. By going up a backtrace (that is, from the bottom of the trace towards the top) you can follow the calls and their line numbers in their source files. Frames with larger numbers called into the frames with lower numbers.
Note that different Ruby versions use a different default order for those traces. Older versions put the inner calls (i.e. lower numbers) at the top of the trace. Since Ruby 2.5, the default order is reversed when printing to a terminal, showing the exception and the inner calls at the bottom. (Ruby 3.0 reverted to the old order again.)
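A tiny script makes the numbering visible (output shown as printed to a terminal by Ruby 2.5-2.7; the exact format varies between versions):
# trace_order.rb
def inner
  raise "boom"
end

def outer
  inner
end

outer

# Prints roughly:
#   Traceback (most recent call last):
#         2: from trace_order.rb:10:in `<main>'
#         1: from trace_order.rb:7:in `outer'
#   trace_order.rb:3:in `inner': boom (RuntimeError)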

Memory leak in ruby

I have such code in irb:
2.6.3 :001 > a = []; 100000000000.times do a.push([1]) end
^CTraceback (most recent call last):
3: from (irb):1
2: from (irb):1:in `times'
1: from (irb):1:in `block in irb_binding'
IRB::Abort (abort then interrupt!)
2.6.3 :002 > a.clear
=> []
2.6.3 :003 > GC.start
=> nil
2.6.3 :004 > a.size
=> 0
2.6.3 :005 > exit
My memory chart (image not reproduced here) shows that the memory is completely released only at exit.
How can the memory be freed completely before the app exits?
Operating as Designed
It's a leak if and only if the memory isn't returned to the system after Ruby exits. Since that's not the behavior you're describing, it's fair to say that your interpreter appears to be operating as designed.
See below for a little more about how Ruby's garbage collection works at a high level, and why your array-building is so memory intensive.
No Memory is Leaking
This is not a leak; this is how Ruby garbage collection works! It's basically a mark-and-sweep garbage collector, with some new support for compaction. At a high level, Ruby allocates memory for objects still in scope, and generally won't release the allocation until all references go out of scope.
Ruby's garbage collection isn't well-documented outside of source code, and the implementation is a bit more complex than what I described above. Furthermore, the garbage collection implementation can vary from release to release, and between different interpreters (e.g. JRuby and MRI) too! Still, it's sufficient to understand what you're seeing.
Basically, 100000000000.times do a.push([1]) end tries to push an element onto the array a 100 billion times (the session above was interrupted long before finishing). As long as a holds those elements, the memory won't be garbage collected. And even after you clear the array and manually start the garbage collector, Ruby may or may not return the freed memory to the operating system; it generally keeps the slots around for reuse unless the system is under memory pressure.
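You can watch this from inside the process (a rough sketch; the GC.stat key names vary between Ruby versions):
a = Array.new(1_000_000) { [1] }
GC.start
p GC.stat(:heap_live_slots) # high: everything in the array is still reachable

a.clear
GC.start
p GC.stat(:heap_live_slots) # drops: the slots are free for Ruby to reuse,
                            # but not necessarily handed back to the OS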
I wouldn't worry about this unless you have very long-lived processes that need to keep millions of records in active memory. If you do, a purpose-built cache or database (e.g. memcached, Redis) might be more efficient.

Ruby + Windows + Timeouts + SerialPorts won't work

I am developing a multiplatform Ruby program that is supposed to connect via USB to a serial device.
First I was using the serialport gem (1.0.4), but then I ran into some strange problems and had to drop it.
I then proceeded to communicate via Ruby's IO class, as follows:
@port = IO.new(IO.sysopen(path, mode), mode)
Communication via syswrite and sysread works perfectly on both Linux and Windows.
With the communication done, I tried setting up timeouts so the program won't hang if any desync occurs. All fine on the Linux side with timeout.rb, but on Windows the interpreter never gives back control after calling any IO read method (sysread, getc, gets, getbyte... I tried them all!).
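(For the record, the usual portable route would be IO.select with a read timeout, sketched below; it behaves on Linux, but Ruby's select on Windows only supports sockets, so it is no help with a serial handle either.)
if IO.select([@port], nil, nil, 0.5) # wait up to 0.5 s for data
  data = @port.sysread(64)
else
  data = nil # timed out
end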
I experimented with Terminator, but it wouldn't even run, throwing argument exceptions instead of timing out, even on Linux:
require 'terminator'
Terminator.terminate 2 do
  sleep 4
end
produces:
/var/lib/gems/1.9.1/gems/terminator-0.4.4/lib/terminator.rb:164: Use RbConfig instead of obsolete and deprecated Config.
ArgumentError: wrong number of arguments (1 for 0)
from /var/lib/gems/1.9.1/gems/terminator-0.4.4/lib/terminator.rb:127:in `block in terminate'
from (irb):12:in `call'
from (irb):12:in `sleep'
from (irb):12:in `block in irb_binding'
from /var/lib/gems/1.9.1/gems/terminator-0.4.4/lib/terminator.rb:134:in `call'
from /var/lib/gems/1.9.1/gems/terminator-0.4.4/lib/terminator.rb:134:in `terminate'
from (irb):11
from /usr/bin/irb:12:in `<main>'
As SystemTimer relies on UNIX signals, it doesn't work on Windows (there it simply falls back to timeout.rb), so I am stuck with a multiplatform program that hangs forever on Windows even though it runs fine on Linux.
Is there any way I could set timeouts on serial port reading on Windows? Perhaps a win32api call?
Thank you for your time. :)
Edit:
I believe I found a way around the problem, although it really stinks.
Instead of
Timeout::timeout 0.5 do
  gets
end
I tried
begin
  Timeout::timeout 0.5 do
    Thread.new do
      gets
    end.join
  end
rescue
  "Expired :)"
end
and it seems to be fine in IRB. Timeout apparently can interrupt the Thread#join in the main thread even though it cannot interrupt the blocking Windows read itself; the reader thread just stays blocked in the background.
I'm going to implement and test it and then I'll post the results here. :)
However, any prettier solution is more than welcome!
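For what it's worth, Thread#join also accepts a timeout directly, so the same trick works without Timeout at all (a sketch assuming the @port from above; the abandoned thread may stay blocked until it is killed or the process exits):
reader = Thread.new { @port.gets }
if reader.join(0.5) # returns the thread, or nil on timeout
  line = reader.value
else
  reader.kill # give up on the blocked read
  line = nil
end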

Ruby Malloc Limit: How to raise it higher

I am using ruby 1.9.3 with rails 3.1 and the memory usage for these things can get pretty large pretty fast. I have read around and it appears that the default ruby malloc limit is 8MB. This is pretty low and I have a lot of server to play around with. How can I raise the malloc limit to something like 1024 MB or so? I know the variable is RUBY_GC_MALLOC_LIMIT. I don't really want to have to custom compile the VM.
If you are still looking for this, you could try the malloc gem, which lets you allocate raw memory buffers of a chosen size yourself:
m = Malloc.new(1048576) #Number of bytes to allocate
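Separately, since the question already names RUBY_GC_MALLOC_LIMIT: MRI reads its GC tuning variables from the environment (stock 1.9.3 honors RUBY_GC_MALLOC_LIMIT), so the limit can be raised without recompiling, e.g. (app.rb is a placeholder):
# value is in bytes; here roughly 64 MB
RUBY_GC_MALLOC_LIMIT=67108864 ruby app.rb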

Ruby Memory Management

I have been using Ruby for a while now and I find, for bigger projects, it can take up a fair amount of memory. What are some best practices for reducing memory usage in Ruby?
Please, let each answer have one "best practice" and let the community vote it up.
When working with huge arrays of ActiveRecord objects, be very careful. When processing those objects in a loop, if on each iteration you load their related objects using ActiveRecord's has_many, belongs_to, etc., memory usage grows a lot, because each object that belongs to the array grows.
The following technique helped us a lot (simplified example):
students.each do |student|
  cloned_student = student.clone
  ...
  cloned_student.books.detect {...}
  ca_teachers = cloned_student.teachers.detect {|teacher| teacher.address.state == 'CA'}
  ca_teachers.blah_blah
  ...
  # Not sure if the following is necessary, but we have it just in case...
  cloned_student = nil
end
In the code above, cloned_student is the object that grows, but since it is set to nil at the end of each iteration, this is not a problem even for a huge array of students. If we didn't clone, the loop variable student would grow instead, and since it belongs to the array, the memory it uses would never be released as long as the array object exists.
A different approach works too:
students.each do |student|
  loop_student = Student.find(student.id) # just re-find the record into a local variable
  ...
  loop_student.books.detect {...}
  ca_teachers = loop_student.teachers.detect {|teacher| teacher.address.state == 'CA'}
  ca_teachers.blah_blah
  ...
end
In our production environment we had a background process that once failed to finish because 8 GB of RAM wasn't enough for it. After this small change it uses less than 1 GB to process the same amount of data.
Don't abuse symbols.
Each time you create a symbol, Ruby puts an entry in its symbol table. The symbol table is a global hash which never gets emptied (at least before Ruby 2.2, which made dynamically created symbols collectable).
This is not technically a memory leak, but it behaves like one. Symbols don't take up much memory, so you don't need to be too paranoid, but it pays to be aware of this.
A general guideline: if you've actually typed the symbol in code, it's fine (you only have a finite amount of code, after all), but don't call to_sym on dynamically generated or user-input strings, as this opens the door to a potentially ever-increasing number of symbols.
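You can watch the table grow (Symbol.all_symbols is standard; on a pre-2.2 Ruby these entries would stay around forever):
before = Symbol.all_symbols.size
1_000.times { |i| "user_input_#{i}".to_sym } # simulating to_sym on user input
puts Symbol.all_symbols.size - before # => roughly 1000 new table entries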
Don't do this:
def method(x)
  x.split(",") # doesn't matter what the args are
end
or this:
def method(x)
  x.gsub(/a/, "b") # doesn't matter what the args are
end
Both will permanently leak memory in Ruby 1.8.5 and 1.8.6. (I'm not sure about 1.8.7 as I haven't tried it, but I really hope it's fixed.) The workaround is stupid and involves creating a local variable. You don't have to use the local, just create one...
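As described, the workaround is just a sketch like this; the local is never used, but its mere presence avoids the leak on those versions:
def method(x)
  workaround = nil # unused local; its existence works around the 1.8.5/1.8.6 leak
  x.gsub(/a/, "b")
end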
Things like this are why I have lots of love for the Ruby language, but no respect for MRI.
Beware of C extensions which allocate large chunks of memory themselves.
As an example, when you load an image using RMagick, the entire bitmap gets loaded into memory inside the ruby process. This may be 30 meg or so depending on the size of the image.
However, most of this memory has been allocated by RMagick itself. All ruby knows about is a wrapper object, which is tiny(1).
Ruby only thinks it's holding onto a tiny amount of memory, so it won't bother running the GC. In actual fact it's holding onto 30 meg.
If you loop over, say, 10 images, you can run yourself out of memory really fast.
The preferred solution is to manually tell the C library to clean up the memory itself - RMagick has a destroy! method which does this. If your library doesn't however, you may need to forcibly run the GC yourself, even though this is generally discouraged.
(1): Ruby C extensions have callbacks which will get run when the ruby runtime decides to free them, so the memory will eventually be successfully freed at some point, just perhaps not soon enough.
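A minimal sketch of the RMagick case (photo.jpg is a placeholder; older RMagick releases use require 'RMagick' instead):
require 'rmagick'

img = Magick::Image.read('photo.jpg').first
# ... work with the image ...
img.destroy! # release the C-side bitmap now instead of waiting for the GC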
Measure and detect which parts of your code create the objects that cause memory usage to go up. Improve and modify your code, then measure again. Sometimes you're using gems or libraries that use up a lot of memory and create a lot of objects as well.
There are many tools out there, such as busy-administrator, that allow you to check the memory size of objects (including those inside hashes and arrays).
$ gem install busy-administrator
Example # 1: MemorySize.of
require 'busy-administrator'
data = BusyAdministrator::ExampleGenerator.generate_string_with_specified_memory_size(10.mebibytes)
puts BusyAdministrator::MemorySize.of(data)
# => 10 MiB
Example # 2: MemoryUtils.profile
Code
require 'busy-administrator'
results = BusyAdministrator::MemoryUtils.profile(gc_enabled: false) do |analyzer|
  BusyAdministrator::ExampleGenerator.generate_string_with_specified_memory_size(10.mebibytes)
end
BusyAdministrator::Display.debug(results)
Output:
{
  memory_usage:
    {
      before: 12 MiB
      after: 22 MiB
      diff: 10 MiB
    }
  total_time: 0.406452
  gc:
    {
      count: 0
      enabled: false
    }
  specific:
    {
    }
  object_count: 151
  general:
    {
      String: 10 MiB
      Hash: 8 KiB
      BusyAdministrator::MemorySize: 0 Bytes
      Process::Status: 0 Bytes
      IO: 432 Bytes
      Array: 326 KiB
      Proc: 72 Bytes
      RubyVM::Env: 96 Bytes
      Time: 176 Bytes
      Enumerator: 80 Bytes
    }
}
You can also try ruby-prof and memory_profiler. It is better if you test and experiment with different versions of your code so you can measure the memory usage and performance of each version. This will let you check whether your optimization really worked. You usually use these tools in development/testing mode and turn them off in production.
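For instance, a minimal memory_profiler session (gem install memory_profiler) looks like this:
require 'memory_profiler'

report = MemoryProfiler.report do
  100_000.times { 'some string' + 'another' }
end
report.pretty_print # allocation totals grouped by gem, file and class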
