Fixnum array memory consumption in Ruby 1.9.3 - are arrays getting compressed? - ruby

I created new arrays and tracked the amount of memory allocated using "Use Size" returned by GC::Profiler.report.
Fixnum (size: memory used in bytes):
100,000: 900
1,000,000: 1320
10,000,000: 17,552,240
Bignum (size: memory used in bytes):
100,000: 5,081,680
1,000,000: 39,999,520
The memory consumption for Bignum makes sense; however, the consumption for Fixnum is very strange. I would expect the array to take up at least 4 bytes per element (more likely 8 bytes, since I'm using a 64-bit machine). The only logical explanation I can come up with for the low memory consumption is that Fixnum arrays are getting compressed into a bitmap, but I couldn't find any documentation that would support that.
What's going on?
The code used is:
GC::Profiler.enable
GC.start
GC::Profiler.report
a = []
1_000_000.times do
  a << rand(2**62)
end
GC::Profiler.report
Results:
Index Invoke Time(sec) Use Size(byte) Total Size(byte) Total Object GC Time(ms)
1 13.541 15700560 19223000 480575 68.00500000000120337518
2 14.017 15701880 19239360 480984 48.00299999999957378805
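One way to cross-check these numbers independently of the profiler (a sketch, assuming the objspace extension that ships with 1.9.2+) is to ask for the memory size of the array object itself:
require 'objspace'

a = Array.new(1_000_000) { rand(2**62) }
# memsize_of reports the bytes held by the array's internal buffer
# (one 8-byte VALUE slot per element on a 64-bit build); Fixnums are
# immediate values, so they live directly in those slots rather than
# as separate heap objects.
puts ObjectSpace.memsize_of(a)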

Related

Why doesn't this Ruby program return off heap memory to the operating system?

I am trying to understand when memory allocated off the Ruby heap gets returned to the operating system. I understand that Ruby never returns memory allocated to its heap, but I am still not sure about the behaviour of off-heap memory, i.e. those objects that don't fit into a 40-byte RVALUE.
Consider the following program that allocates some large strings and then forces a major GC.
require 'objspace'
STRING_SIZE = 250
def print_stats(msg)
  puts '-------------------'
  puts msg
  puts '-------------------'
  puts "RSS: #{`ps -eo rss,pid | grep #{Process.pid} | grep -v grep | awk '{ print $1,"KB";}'`}"
  puts "HEAP SIZE: #{(GC.stat[:heap_sorted_length] * 408 * 40)/1024} KB"
  puts "SIZE OF ALL OBJECTS: #{ObjectSpace.memsize_of_all/1024} KB"
end
def run
  print_stats('START WORK')
  @data = []
  600_000.times do
    @data << " " * STRING_SIZE
  end
  print_stats('END WORK')
  @data = nil
end
run
GC.start
print_stats('AFTER FORCED MAJOR GC')
Running this program with MRI Ruby 2.2.3 produces the following output. After a forced major GC, the heap size is as expected, but RSS has not decreased significantly.
-------------------
START WORK
-------------------
RSS: 7036 KB
HEAP SIZE: 1195 KB
SIZE OF ALL OBJECTS: 3172 KB
-------------------
END WORK
-------------------
RSS: 205660 KB
HEAP SIZE: 35046 KB
SIZE OF ALL OBJECTS: 178423 KB
-------------------
AFTER FORCED MAJOR GC
-------------------
RSS: 164492 KB
HEAP SIZE: 35046 KB
SIZE OF ALL OBJECTS: 2484 KB
Compare these results to the following results when we allocate one large object instead of many smaller objects.
def run
  print_stats('START WORK')
  @data = " " * STRING_SIZE * 600_000
  print_stats('END WORK')
  @data = nil
end
-------------------
START WORK
-------------------
RSS: 7072 KB
HEAP SIZE: 1195 KB
SIZE OF ALL OBJECTS: 3170 KB
-------------------
END WORK
-------------------
RSS: 153584 KB
HEAP SIZE: 1195 KB
SIZE OF ALL OBJECTS: 149064 KB
-------------------
AFTER FORCED MAJOR GC
-------------------
RSS: 7096 KB
HEAP SIZE: 1195 KB
SIZE OF ALL OBJECTS: 2483 KB
Note the final RSS value. We seem to have freed all the memory we allocated for the big string.
I am not sure why the second example releases the memory but the first example doesn't, as both allocate memory off the Ruby heap. This is one reference that could provide an explanation, but I would be interested in explanations from others.
Releasing memory back to the kernel also has a cost. User space memory allocators may hold onto that memory (privately) in the hope it can be reused within the same process and not give it back to the kernel for use in other processes.
@joanbm has a very good point here. His referenced article explains this pretty well:
Ruby's GC releases memory gradually, so when you do a GC on one big chunk of memory pointed to by one reference it releases it all at once, but when there are a lot of references, the GC releases memory in smaller chunks.
Several calls to GC.start will release more and more memory in the first example.
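As a quick illustration (a sketch reusing the print_stats helper from the question, not code from the referenced article), forcing several major collections in a row shows RSS dropping a bit further each time in the many-small-strings case:
# Run after the first example has finished; each additional major GC
# typically hands a little more of the freed memory back to the OS.
5.times do |i|
  GC.start
  print_stats("AFTER FORCED MAJOR GC ##{i + 1}")
end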
Here are two other articles to dig deeper:
http://thorstenball.com/blog/2014/03/12/watching-understanding-ruby-2.1-garbage-collector/
https://samsaffron.com/archive/2013/11/22/demystifying-the-ruby-gc

SystemStackError when pushing more than 130798 objects into an array

I am trying to understand why pushing many objects (in my case 130798) into an array raises a SystemStackError.
big = Array.new(130797, 1)
[].push(*big) && false
=> false
bigger = Array.new(130798, 1)
[].push(*bigger) && false
=> SystemStackError: stack level too deep
from (irb):104
from /Users/julien/.rbenv/versions/2.2.0/bin/irb:11:in `<main>'
I was able to reproduce it on MRI 1.9.3 and 2.2.0 while no errors were raised on Rubinius (2.5.2).
I understand this is due to the way arrays are implemented in MRI, but I don't quite understand why a SystemStackError is raised.
Ruby's error message ("stack level too deep") isn't accurate here - what Ruby is really saying is "I ran out of stack memory", which is usually caused by infinite recursion, but in this case is caused by passing more arguments than Ruby has stack memory allocated to handle.
Ruby 2.0+ has a maximum stack size controlled by RUBY_THREAD_VM_STACK_SIZE (prior to 2.0 this was controlled by the C limits, set via ulimit). Each argument passed to a method gets pushed onto the thread's stack; if you push more arguments onto the stack than RUBY_THREAD_VM_STACK_SIZE has room to accommodate, you'll get a SystemStackError. You can see this limit from IRB:
RubyVM::DEFAULT_PARAMS[:thread_vm_stack_size]
=> 1048576
By default, each thread has 1MB of stack it can use. Ruby Fixnums are 8 bytes large, and on my system, I overflow at 130808 arguments, or 1046464 bytes allocated, leaving 2112 bytes allocated for the rest of the call stack. By using the splat operator (*) you are saying "take this list of 130798 Fixnums and expand it into 130798 arguments to be passed on the stack"; you simply don't have enough stack memory allocated to hold them all.
If you need to, you can increase RUBY_THREAD_VM_STACK_SIZE when you invoke Ruby:
$ RUBY_THREAD_VM_STACK_SIZE=2097152 irb
> [].push(*Array.new(150808, 1)); nil
=> nil
And this will increase the number of arguments you can pass. However, it also means that each thread will allocate twice as much stack, which is probably not desirable. You should also note that Fibers have a separate stack-size setting, which is typically substantially smaller, since Fibers are designed to be lightweight and disposable.
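For comparison, the Fiber defaults can be inspected the same way (a sketch; the figures in the comments are typical values, check your own build):
RubyVM::DEFAULT_PARAMS[:fiber_vm_stack_size]       # e.g. 131072 (128 KB)
RubyVM::DEFAULT_PARAMS[:fiber_machine_stack_size]  # e.g. 262144 (256 KB)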
You should very rarely need to pass that much data on the stack. Typically, if you need to pass a large amount of data to a method, you pass a single object as an argument (on the stack), such as a Hash or Array, whose storage is allocated on the heap; your stack usage is then measured in bytes even if your heap usage is measured in megabytes. That is, you pass your very large array to the method (it can hold gigabytes of data on the heap without issue) and then iterate over that array inside the method.
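As a concrete sketch of that approach (not taken from the question itself): pass the array as a single argument, for example via Array#concat, instead of splatting its elements onto the stack:
big = Array.new(1_000_000, 1)

# Splatting would place one argument per element on the VM stack and
# can raise SystemStackError for large arrays:
#   [].push(*big)

# Passing the array itself keeps stack usage constant; only one
# reference travels on the stack while the data stays on the heap.
result = [].concat(big)
result.size   # => 1000000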

Redis 10x more memory usage than data

I am trying to store a wordlist in redis. The performance is great.
My approach is to make a set called "words" and add each new word via 'sadd'.
When adding a file that's 15.9 MB and contains about a million words, the redis-server process consumes 160 MB of RAM. Why am I using 10x the memory, and is there a better way of approaching this problem?
Well, this is expected of any efficient data storage: the words have to be indexed in memory in a dynamic data structure of cells linked by pointers. The size of the structure metadata, the pointers, and the memory allocator's internal fragmentation are the reasons why the data takes much more memory than a corresponding flat file.
A Redis set is implemented as a hash table. This includes:
an array of pointers growing geometrically (powers of two)
a second array may be required when incremental rehashing is active
singly-linked list cells representing the entries in the hash table (3 pointers, 24 bytes per entry)
Redis object wrappers (one per value) (16 bytes per entry)
the actual data themselves (each item prefixed by 8 bytes for size and capacity)
All the above sizes are given for the 64-bit implementation. Accounting for the memory allocator overhead, this results in Redis taking at least 64 bytes per set item (on top of the data) for a recent version of Redis using the jemalloc allocator (>= 2.4).
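A back-of-the-envelope check (a sketch in Ruby, using the per-entry figures quoted above and a rough guess at the average word length) already lands in the right ballpark:
entries            = 1_000_000   # about a million words
avg_word_bytes     = 16          # 15.9 MB of raw words / ~1M words
overhead_per_entry = 64          # lower bound per set item quoted above

estimated = entries * (avg_word_bytes + overhead_per_entry)
puts estimated / (1024.0 * 1024)  # ~76 MB, before counting the hash
                                  # table's pointer arrays and allocator
                                  # fragmentation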
Redis provides memory optimizations for some data types, but they do not cover sets of strings. If you really need to optimize memory consumption of sets, there are tricks you can use though. I would not do this for just 160 MB of RAM, but should you have larger data, here is what you can do.
If you do not need the union, intersection, and difference capabilities of sets, then you may store your words in hash objects. The benefit is that hash objects can be optimized automatically by Redis using zipmap if they are small enough. The zipmap mechanism has been replaced by ziplist in Redis >= 2.6, but the idea is the same: use a serialized data structure that fits in the CPU caches to get both performance and a compact memory footprint.
To guarantee the hash objects are small enough, the data could be distributed according to some hashing mechanism. Assuming you need to store 1M items, adding a word could be implemented in the following way:
hash it modulo 10000 (done on client side)
HMSET words:[hashnum] [word] 1
Instead of storing:
words => set{ hi, hello, greetings, howdy, bonjour, salut, ... }
you can store:
words:H1 => map{ hi:1, greetings:1, bonjour:1, ... }
words:H2 => map{ hello:1, howdy:1, salut:1, ... }
...
To retrieve or check the existence of a word, it is the same (hash it and use HGET or HEXISTS).
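A minimal sketch of this scheme in Ruby (assuming the redis client gem; the bucket count of 10000 and the words: key prefix follow the example above, and CRC32 stands in for whatever client-side hash you prefer):
require 'redis'   # assumption: the redis gem is installed
require 'zlib'

REDIS   = Redis.new
BUCKETS = 10_000

# Map a word to one of the words:0 .. words:9999 hash objects.
def bucket_key(word)
  "words:#{Zlib.crc32(word) % BUCKETS}"
end

def add_word(word)
  REDIS.hset(bucket_key(word), word, 1)
end

def word?(word)
  REDIS.hexists(bucket_key(word), word)
end

add_word('bonjour')
word?('bonjour')   # => true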
With this strategy, significant memory savings can be achieved, provided the modulo of the hash is chosen according to the zipmap configuration (or ziplist for Redis >= 2.6):
# Hashes are encoded in a special way (much more memory efficient) when they
# have at max a given number of elements, and the biggest element does not
# exceed a given threshold. You can configure this limits with the following
# configuration directives.
hash-max-zipmap-entries 512
hash-max-zipmap-value 64
Beware: the names of these parameters have changed with Redis >= 2.6.
Here, modulo 10000 for 1M items means 100 items per hash object, which will guarantee that all of them are stored as zipmaps/ziplists.
In my experiments, it is better to store your data inside a hash table/dictionary. The best case I reached after a lot of benchmarking was to keep no more than 500 keys per hash table entry.
I tried the standard string set/get for 1 million keys/values, and the size was 79 MB. That becomes huge when you have big numbers, like 100 million keys, which would use around 8 GB.
I tried hashes to store the same data, and for the same million keys/values the size was a remarkably small 16 MB.
Give it a try; if anybody needs the benchmarking code, drop me a mail.
Did you try persisting the database (BGSAVE for example), shutting the server down and getting it back up? Due to fragmentation behavior, when it comes back up and populates its data from the saved RDB file, it might take less memory.
Also: what version of Redis do you work with? Have a look at this blog post - it says that fragmentation has been partially solved as of version 2.4.

managed heap fragmentation

I am trying to understand how heap fragmentation works. What does the following output tell me?
Is this heap overly fragmented?
I have 243010 "free objects" with a total of 53304764 bytes. Are those "free objects" spaces in the heap that once contained objects but have now been garbage collected?
How can I force a fragmented heap to clean up?
!dumpheap -type Free -stat
total 243233 objects
Statistics:
MT Count TotalSize Class Name
0017d8b0 243010 53304764 Free
It depends on how your heap is organized. You should have a look at how much memory in Gen 0, 1, and 2 is allocated and how much free memory you have there compared to the total used memory.
If you have 500 MB of managed heap in use and 50 MB of it is free, then you are doing pretty well. If you do memory-intensive operations like creating many WPF controls and releasing them, you need a lot more memory for a short time, but .NET does not give memory back to the OS once it has been allocated. The GC tries to recognize allocation patterns and tends to keep your memory footprint high, even though your current heap size is way too big, until your machine runs low on physical memory.
I found it much easier to use psscor2 for .NET 3.5, which has some cool commands like ListNearObj where you can find out which objects are around your memory holes (pinned objects?). With the commands from psscor2 you have a much better chance of finding out what is really going on in your heaps. Most commands are also available in SOS.dll in .NET 4.
To answer your original question: yes, free objects are gaps on the managed heap, which can simply be the free memory block after your last allocated object on a GC segment. Or, if you do !DumpHeap with the start address of a GC segment, you see the objects allocated in that managed heap segment along with your free objects, which are GC-collected objects.
These memory holes normally happen in Gen 2. The object addresses before and after the free object tell you which potentially pinned objects are around your hole. From this you should be able to determine your allocation history and optimize it if you need to.
You can find the addresses of the GC Heaps with
0:021> !EEHeap -gc
Number of GC Heaps: 1
generation 0 starts at 0x101da9cc
generation 1 starts at 0x10061000
generation 2 starts at 0x02aa1000
ephemeral segment allocation context: none
segment begin allocated size
02aa0000 02aa1000 03836a30 0xd95a30(14244400)
10060000 10061000 103b8ff4 0x357ff4(3506164)
Large object heap starts at 0x03aa1000
segment begin allocated size
03aa0000 03aa1000 03b096f8 0x686f8(427768)
Total Size: Size: 0x115611c (18178332) bytes.
------------------------------
GC Heap Size: Size: 0x115611c (18178332) bytes.
There you see that you have heaps at 02aa1000 and 10061000.
With !DumpHeap 02aa1000 03836a30 you can dump the GC Heap segment.
!DumpHeap 02aa1000 03836a30
Address MT Size
...
037b7b88 5b408350 56
037b7bc0 60876d60 32
037b7be0 5b40838c 20
037b7bf4 5b408350 56
037b7c2c 5b408728 20
037b7c40 5fe4506c 16
037b7c50 60876d60 32
037b7c70 5b408728 20
037b7c84 5fe4506c 16
037b7c94 00135de8 519112 Free
0383685c 5b408728 20
03836870 5fe4506c 16
03836880 608c55b4 96
....
There you find your free memory blocks, which were objects that have already been GCed. You can dump the surrounding objects (the output is sorted by address) to find out if they are pinned or have other unusual properties.
You have 50 MB of RAM as free space. This is not good.
Since .NET allocates blocks of 16 MB from the process, we do indeed have a fragmentation issue.
There are plenty of reasons for fragmentation to occur in .NET.
Have a look here and here.
In your case it is possibly pinning, as 53304764 / 243010 comes to 219.35 bytes per object - much smaller than LOH objects.

Ruby Garbage Collection Heap Slot size

So, the Ruby Enterprise Edition documentation states that all the values in the GC settings are defined in slots: http://www.rubyenterpriseedition.com/documentation.html#_garbage_collector_performance_tuning
(e.g. RUBY_HEAP_MIN_SLOTS)
We fine-tuned our app's min slot size and increment for the best performance by trial and error (we have enough machines to get a good idea how different values affect the number of malloc calls and Full GCs).
But something has been bugging me for a while: How big is 1 slot in bytes?
From Ruby source:
* sizeof(RVALUE) is
* 20 if 32-bit, double is 4-byte aligned
* 24 if 32-bit, double is 8-byte aligned
* 40 if 64-bit
$ rvm use ruby-1.9.2-p136
$ gdb ruby
(gdb) p sizeof(RVALUE)
$1 = 40
The default in 1.9 is 8K
http://svn.ruby-lang.org/repos/ruby/trunk/gc.c
(search for HEAP_SIZE)
Note well that whenever it runs out of space and needs to reallocate, in 1.9 it allocates exponentially more heaps.
In 1.8 it would allocate bigger and bigger heaps.
After diggin' through the code:
1 slot is the size of sizeof(struct RVALUE), which depends on the machine.
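Side note, as a sketch for anyone on a newer interpreter: MRI 2.1 and later expose the slot size directly from Ruby; this constant does not exist on the 1.9/REE builds discussed here.
# Works on MRI 2.1+ only.
p GC::INTERNAL_CONSTANTS[:RVALUE_SIZE]   # => 40 on a typical 64-bit build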

Resources