So, ruby enterprise documentation states that all the values in the GC settings are defined in slots: http://www.rubyenterpriseedition.com/documentation.html#_garbage_collector_performance_tuning
(e.g. RUBY_HEAP_MIN_SLOTS)
We fine-tuned our app's min slot size and increment for the best performance by trial and error (we have enough machines to get a good idea how different values affect the number of malloc calls and Full GCs).
But something has been bugging me for a while: How big is 1 slot in bytes?
From Ruby source:
* sizeof(RVALUE) is
* 20 if 32-bit, double is 4-byte aligned
* 24 if 32-bit, double is 8-byte aligned
* 40 if 64-bit
$ rvm use ruby-1.9.2-p136
$ gdb ruby
(gdb) p sizeof(RVALUE)
$1 = 40
The default in 1.9 is 8K
http://svn.ruby-lang.org/repos/ruby/trunk/gc.c
(search for HEAP_SIZE)
Note well that whenever it runs out of space and needs to reallocate, in 1.9 it allocates exponentially more heaps.
In 1.8 it would allocate bigger and bigger heaps.
After diggin' through the code:
1 slot is a size of sizeof(struct RVALUE), which depends on the machine.
Related
The main reason for external sort is that the data may be larger than the main memory we have.However,we are using virtual memory now, and the virtual memory will take care of swapping between main memory and disk.Why do we need to have external sort then?
An external sort algorithm makes sorting large amounts of data efficient (even when the data does not fit into physical RAM).
While using an in-memory sorting algorithm and virtual memory satisfies the functional requirements for an external sort (that is, it will sort the data), it fails to achieve the non-functional requirement of being efficient. A good external sort minimises the amount of data read and written to external storage (and historically also seek times), and a general-purpose virtual memory implementation on top of a sort algorithm not designed for this will not be competitive with an algorithm designed to minimise IO.
In addition to #Anonymous's answer that external sort is better optimized for less disk IO, sometimes using in-memory sort and using the virtual memory is infeasible, since the virtual memory space is smaller than the file's size.
For example, if you have a 32 bits system (there are still a lot of these), and you want to sort a 20 GB file, 32bits system allow you to have 2^32 ~= 4GB virtual addresses, but the file you are trying to sort cannot fit in.
This used to be a real issue when 64 bits systems were still not very common, and is still an issue today for old 32 bits systems and some embadded devices.
However, even for 64 bits system, as expained in previous answers, the external sort algorithm is more optimized for the nature of sorting, and will require significantly less disk IO than letting the OS "take care of things".
I'm using Windows, in common line shell, you could run "systeminfo", it gives me my laptop's memory usage information.
Total Physical Memory: 8,082 MB
Available Physical Memory: 2,536 MB
Virtual Memory: Max Size: 11,410 MB
Virtual Memory: Available: 2,686 MB
Virtual Memory: In Use: 8,724 MB
I just write a app to test max size of array I could initialize from my laptop.
public static void BurnMemory()
{
for(var i = 1; i <= 1024; i++)
{
long size = 1 << i;
long t = 4 * size / (1 << 30);
try
{
// 1 int32 takes 32 bit(4 byte) memmory,
var arr = new int[size];
Console.WriteLine("Test pass initialize a array with size = 2^" + i.ToString());
}
catch(OutOfMemoryException err)
{
Console.WriteLine("Reach memory limitation when initialize a array with size = 2^{0} int32 = 4 x {1}B= {2}TB",i, size, t );
break;
}
}
}
It seems it terminate when it is trying to initialize array with size of 2^29.
Reach memory limitation when initialize a array with size = 2^29 int32 = 4 x 536870912B= 2TB
What I get from the test:
It is not hard to reach the memory limitation.
We need to understand our server's capability, then decide whether use in-memory sort or external sort.
How much memory do i need to load 100 million records in to memory. Suppose each record needs 7 bytes. Here is my calculation
each record = <int> <short> <byte>
4 + 2 + 1 = 7 bytes
needed memory in GB = 7 * 100 * 1,000,000 / 1000,000,000 = 0.7 GB
Do you see any problem with this calculation?
With 100,000,000 records, you need to allow for overhead. Exactly what and how much overhead you'll have will depend on the language.
In C/C++, for example, fields in a structure or class are aligned onto specific boundaries. Details may vary depending on the compiler, but in general int's must begin at an address that is a multiple of 4, short's at a multiple of 2, char's can begin anywhere.
So assuming that your 4+2+1 means an int, a short, and a char, then if you arrange them in that order, the structure will take 7 bytes, but at the very minimum the next instance of the structure must begin at a 4-byte boundary, so you'll have 1 pad byte in the middle. I think, in fact, most C compilers require structs as a whole to begin at an 8-byte boundary, though in this case that doesn't matter.
Every time you allocate memory there's some overhead for allocation block. The compiler has to be able to keep track of how much memory was allocated and sometimes where the next block is. If you allocate 100,000,000 records as one big "new" or "malloc", then this overhead should be trivial. But if you allocate each one individually, then each record will have the overhead. Exactly how much that is depends on the compiler, but, let's see, one system I used I think it was 8 bytes per allocation. If that's the case, then here you'd need 16 bytes for each record: 8 bytes for block header, 7 for data, 1 for pad. So it could easily take double what you expect.
Other languages will have different overhead. The easiest thing to do is probably to find out empirically: Look up what the system call is to find out how much memory you're using, then check this value, allocate a million instances, check it again and see the difference.
If you really need just 7 bytes per structure, then you are almost right.
For memory measurements, we usually use the factor of 1024, so you would need
700 000 000 / 1024³ = 667,57 MiB = 0,652 GiB
I need to simulate a memory-hungry process. For example, On a machine with 4.0 GiB, I need a process that would eat 3.2 GiB (give or take few MiB).
I assumed it should be as easy as:
my $mbytes = 3276;
my $huge_string = 'X' x ($mbytes * 1024 * 1024);
But I end up with process eating twice as much memory as I need it to.
this is same on two Windows 7 amd64 machines: one with 64-bit, the other
with 32-bit build of Strawberry Perl
I'm using Sysinternals Process Explorer and watching "Private Bytes"
Of course, I could just $mbytes /= 2 (for now, I'll probably will do that), but:
Is there a better way?
Can anyone explain why the amount is twice as length of the string?
Code adapted from http://www.perlmonks.org/index.pl?node_id=948181, all credit goes to Perlmonk BrowserUk.
my $huge_string = 'X';
$huge_string x= $mbytes * 1024 * 1024;
why the amount is twice as length of the string?
Think about the order of evaluation. The right-hand expression allocates memory for your x expression, and again so does the assignment operation into your new scalar. As usual for Perl, even though the right-hand expression is not referenced anymore, the memory is not freed right away.
Operating on an existing scalar avoids the second allocation, as shown above.
I've been poring over this Erlang crash dump where the VM has run out of heap memory. The problem is that there is no obvious culprit allocating all that memory.
Using some serious black awk magic I've summed up the fields Stack+heap, OldHeap, Heap unused and OldHeap unused for each process and ranked them by memory usage. The problem is that this number doesn't come even close to the number that is representing the total memory for all the processes processes_used according to the Erlang crash dump guide.
I've already tried the Crashdump Viewer and either I'm missing something or there isn't much help there for my kind of problem.
The number I get is 525 MB whereas the processes_used value is at 1348 MB. Where can I find the rest of the memory?
Edit: The Heap unused and OldHeap unused shouldn't have been included since they are a sub-part of Stack+Heap and OldHeap, that plus the fact that the number displayed for Stack+Heap and OldHeap are listed as number of words, not bytes, was the problem.
There is an module called crashdump_viewer which is great for these kinds of analysis.
Another thing to keep in mind is that Heap+Stack is afaik in words, not bytes which would mean that you have to multiply Heap+Stack with 4 on 32 and 8 on 64 bit. Can't find a reference in the manual for this but Processes talks about it a bit.
It seems like people are compiling MRI Ruby (1.8.7) for 64-bit platforms. I've been searching and reading for a while now without really getting the answers I want. What I want to know is if any of you guys actually used more than 4GB of memory in Ruby? Is Ruby truly 64-bits if you compile it that way?
I've found comments in the source code indicating that it's not tested on 64-bits. For instance it says "BigDecimal has not yet been compiled and tested on 64 bit integer system." in the comments for BigDecimal.
It would also be interesting to know how the other implementations of Ruby do in 64-bits.
MRI (both the 1.8.x and 1.9.x line) can be compiled as 64 bits.
For example, Snow Leopard comes bundled with 1.8.7 compiled as 64 bits. This can be seen in the activity monitor, or from irb by asking, for example 42.size. You'll get 8 (bytes) if it is compiled in 64 bits, 4 (bytes) otherwise.
Ruby will be able to access over 4G of ram. For example:
$ irb
>> n = (1 << 29) + 8
=> 536870920
>> x = Array.new(n, 42); x.size
=> 536870921 # one greater because it holds elements from 0 to n inclusive
Getting the last line will take a while if you don't have more than 4 G or ram because the OS will swap a lot, but even on my machine with 4 GB it works. Virtual ram size for the process was 4.02 G.
I updated the comment in the bigdecimal html file which was outdated (from march 2003...)