It seems that people are compiling MRI Ruby (1.8.7) for 64-bit platforms. I've been searching and reading for a while now without really getting the answers I want. What I want to know is whether any of you have actually used more than 4 GB of memory in Ruby. Is Ruby truly 64-bit if you compile it that way?
I've found comments in the source code indicating that it hasn't been tested on 64-bit systems. For instance, the comments for BigDecimal say "BigDecimal has not yet been compiled and tested on 64 bit integer system."
It would also be interesting to know how the other implementations of Ruby fare on 64-bit platforms.
MRI (both the 1.8.x and 1.9.x lines) can be compiled as a 64-bit binary.
For example, Snow Leopard comes bundled with 1.8.7 compiled as 64-bit. You can see this in Activity Monitor, or from irb by asking, for example, 42.size. You'll get 8 (bytes) if it is compiled as 64-bit, and 4 (bytes) otherwise.
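A minimal irb session for that check (the output shown assumes a 64-bit build):
$ irb
>> 42.size
=> 8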
Ruby will be able to access more than 4 GB of RAM. For example:
$ irb
>> n = (1 << 29) + 8
=> 536870920
>> x = Array.new(n, 42); x.size
=> 536870920
Getting the last line will take a while if you don't have more than 4 GB of RAM, because the OS will swap a lot, but even on my machine with 4 GB it works. Virtual memory size for the process was 4.02 GB.
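For reference, the arithmetic behind that virtual size, as a quick Ruby sketch (each array slot holds one VALUE, which is 8 bytes on a 64-bit build):
slots = (1 << 29) + 8
bytes_per_slot = 8            # one VALUE per slot on a 64-bit build
puts slots * bytes_per_slot   # => 4294967360, i.e. 4 GB plus 64 bytes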
I updated the comment in the BigDecimal HTML file, which was outdated (from March 2003...).
I am reading the book Windows via C/C++. In Chapter 13, "Windows Memory Architecture", in the section "Getting a Larger User-Mode Partition in x86 Windows", I came across this:
In early versions of Windows, Microsoft didn't allow applications to access their address space above 2 GB. So some creative developers decided to leverage this and, in their code, they would use the high bit in a pointer as a flag that had meaning only to their applications. Then when the application accessed the memory address, code executed that cleared the high bit of the pointer before the memory address was used. Well, as you can imagine, when an application runs in a user-mode environment greater than 2 GB, the application fails in a blaze of fire.
I can't understand this. Can someone give an example to explain it to me? Thanks.
To address ~2 GB of memory, you only need 31 bits. However, on 32-bit systems addresses are 32 bits long, and hence pointers are 32 bits long.
As the book describes, in early versions of Windows developers could only use 2 GB of memory, so the most significant bit of every 32-bit pointer was ALWAYS zero and could be used for other purposes. However, before using the address, this extra bit had to be cleared again, because otherwise the program would try to access an address above 2 GB and crash.
The code probably looked something like this:
#include <stdint.h>

int val = 1;
int *p = &val;
// ...
// Use the top bit of p as a flag, for some application-specific purpose.
// Pointers don't support bitwise operators directly, so round-trip
// through an integer type wide enough to hold a pointer:
p = (int *)((uintptr_t)p | ((uintptr_t)1 << 31));
// ...
// Then before using the address in any way, the flag bit has to be cleared again:
p = (int *)((uintptr_t)p & ~((uintptr_t)1 << 31));
*p = 3;
Now, if you can be certain that your pointers will only ever point to an address where the most significant bit (MSB) is zero, i.e. in a ~2GB address space, this is fine. However, if the address space is increased, some pointers will have a 1 in their MSB and by clearing it, you set your pointer to an incorrect address in memory. If you then try to read from or write to that address, you will have undefined behavior and your program will most likely fail in a blaze of fire.
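The same arithmetic can be sketched in Ruby, treating addresses as plain integers (the address values are made up for illustration):
FLAG = 1 << 31                     # the high bit of a 32-bit pointer

low_addr  = 0x7FFF_0000            # below 2 GB: bit 31 is free to use as a flag
high_addr = 0x9000_0000            # above 2 GB: bit 31 is part of the address

# Tagging and untagging a low address round-trips safely:
((low_addr | FLAG) & ~FLAG) == low_addr   # => true

# "Untagging" a high address silently corrupts it:
(high_addr & ~FLAG) == high_addr          # => false; the result is 0x1000_0000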
I am trying to access more memory using code I found on Stack Overflow (Increasing (or decreasing) the memory available to R processes). However, I get the following error, which I haven't been able to resolve:
memory.limit(10000)
Error in memory.limit(10000) :
don't be silly!: your machine has a 4Gb address limit
R is telling me that I have a 4 GB address limit (despite the fact that I'm on a 64-bit OS with 16 GB of RAM). Does anyone know how to get around this?
Windows OS: Windows 7 Enterprise, Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Installed Memory (RAM): 16.0GB
System type: 64 bit OS
R Version: 3.0.0
RStudio Version: 0.97.551
I've never used R, but a quick search turned up the documentation for memory.limit() (here). Quoting:
memory.limit(size = NA)
size : numeric. If NA report the memory size, otherwise request a new limit, in Mb.
10,000 MB = 10 GB, hence the error.
As for the 4 GB limit despite a 64-bit OS, it most likely comes from R itself: a 32-bit build of R is capped at a 4 GB address space no matter how much RAM the machine has, so check whether you are actually running the 64-bit build.
I've been reading Windows via C/C++ by Jeffrey Richter and came across the following snippet in the chapter about Windows' memory architecture, related to porting 32-bit applications to a 64-bit environment.
If the system could somehow guarantee that no memory allocations would ever be made above 0x00000000'7FFFFFFF, the application would work fine. Truncating a 64-bit address to a 32-bit address when the high 33 bits are 0 causes no problem whatsoever.
I'm having some trouble understanding why the system needs to guarantee that no memory allocations are made above 0x00000000'7FFFFFFF rather than 0x00000000'FFFFFFFF. Shouldn't it be okay to truncate the address as long as the high 32 bits are 0? I'm probably missing something, and would really appreciate it if someone with more knowledge about Windows than me could explain why this is the case.
Not all 32-bit systems/languages treat memory addresses as unsigned values, so bit 31 (the 32nd bit) can carry a different meaning in some contexts, such as acting as a sign bit. By limiting the address space to 31 bits, you don't run into that problem. Also, Windows prevents a 32-bit app from accessing addresses above 2 GB unless it opts in via special extensions, so most apps would not need that bit anyway.
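To see why bit 31 specifically is the problem, here is a small Ruby sketch of what happens when a truncated 32-bit value is treated as signed and sign-extended back to 64 bits (the addresses are made up):
def truncate_to_32(addr)           # keep only the low 32 bits
  addr & 0xFFFF_FFFF
end

def sign_extend_to_64(val)         # reinterpret the 32-bit value as signed
  val >= (1 << 31) ? val - (1 << 32) : val
end

ok  = 0x0000_0000_7FFF_0000        # high 33 bits zero: safe to truncate
bad = 0x0000_0000_8000_0000        # bit 31 set: looks negative as a 32-bit int

sign_extend_to_64(truncate_to_32(ok))  == ok    # => true
sign_extend_to_64(truncate_to_32(bad)) == bad   # => false
# The second round-trip yields -2147483648, whose 64-bit pattern is
# 0xFFFFFFFF_80000000: nowhere near the original address.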
(This question attempts to find out why the same program can run at different speeds on different processors, so it is related to the performance aspect of programming.)
The following program takes 3.6 seconds to run on a MacBook that has a 2.2GHz Core 2 Duo, and 1.8 seconds on a MacBook Pro that has a 2.53GHz Core 2 Duo. Why is that?
That's a bit weird... why is the speed doubled when the CPU is only 15% faster in clock speed? I double-checked the CPU meter to make sure neither of the two cores was at 100% usage (to confirm the CPU wasn't busy running something else). Could it be because one is Mac OS X Leopard and the other is Mac OS X Snow Leopard (64-bit)? Both are running Ruby 1.9.2.
p RUBY_VERSION
p RUBY_DESCRIPTION if defined? RUBY_DESCRIPTION
n = 9_999_999
p n
t = 0; 1.upto(n) {|i| t += i if i%3==0 || i%5==0}; p t
The following is just the output of the program:
On the 2.2GHz Core 2 Duo: (Update: MacBook identifier MacBook3,1, so it is probably an Intel Core 2 Duo T7300/T7500)
$ time ruby 1.rb
"1.9.2"
"ruby 1.9.2p0 (2010-08-18 revision 29036) [i386-darwin9.8.0]"
9999999
23333331666668
real 0m3.784s
user 0m3.751s
sys 0m0.021s
On the 2.53GHz Intel Core 2 Duo: (Update: MacBook identifier MacBookPro5,4, so it is probably an Intel Core 2 Duo Penryn with 3 MB of on-chip L2 cache)
$ time ruby 1.rb
"1.9.2"
"ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]"
9999999
23333331666668
real 0m1.893s
user 0m1.809s
sys 0m0.012s
Test run on Windows 7:
time_start = Time.now
p RUBY_VERSION
p RUBY_DESCRIPTION if defined? RUBY_DESCRIPTION
n = 9_999_999
p n
t = 0; 1.upto(n) {|i| t += i if i%3==0 || i%5==0}; p t
print "Took #{Time.now - time_start} seconds to run\n"
Intel Q6600 Quad Core 2.4GHz running Windows 7, 64-bit:
C:\> ruby try.rb
"1.9.2"
"ruby 1.9.2p0 (2010-08-18) [i386-mingw32]"
9999999
23333331666668
Took 3.248186 seconds to run
Intel Core i7 920 2.67GHz running Windows 7, 64-bit:
C:\> ruby try.rb
"1.9.2"
"ruby 1.9.2p0 (2010-08-18) [i386-mingw32]"
9999999
23333331666668
Took 2.044117 seconds to run
It is also strange that an i7 at 2.67GHz is slower than a 2.53GHz Core 2 Duo.
I suspect that Ruby switches to its arbitrary-precision integer implementation (Bignum) at a higher threshold on the 64-bit OS.
Quoting the Fixnum Ruby doc:
A Fixnum holds Integer values that can be represented in a native machine word (minus 1 bit). If any operation on a Fixnum exceeds this range, the value is automatically converted to a Bignum.
Here, a native machine word is 64 bits on both machines, but the slower run uses an interpreter compiled for 32-bit processors ([i386-darwin9.8.0]), where Fixnums overflow much sooner. The running total in this benchmark exceeds 23 trillion, which needs about 45 bits: that still fits in a 64-bit Fixnum, but on the 32-bit build most of the additions are done in Bignum arithmetic.
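A quick sketch of where the Fixnum/Bignum switch happens (Ruby 1.8/1.9 semantics; modern Rubies unify both classes into Integer):
# On a 64-bit build, Fixnum covers 63-bit signed values:
(2**62 - 1).class          # => Fixnum
(2**62).class              # => Bignum
# On a 32-bit build the boundary is at 2**30 instead.

# The benchmark's final total needs about 45 bits:
t = 23_333_331_666_668
t < 2**62                  # => true  (still a Fixnum on 64-bit)
t < 2**30                  # => false (long since a Bignum on 32-bit)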
Why is the speed doubled when the CPU is only 15% faster in clock speed?
Quite simply because the performance of computers is determined not solely by CPU clock speed.
Other things to consider are:
CPU architectures, including e.g. the number of cores on a CPU, or the general ability to run multiple instructions in parallel
other clock speeds in the system (memory, FSB)
CPU cache sizes
installed memory chips (some are faster than others)
additionally installed hardware (which might slow down the system through hardware interrupts)
different operating systems
32-bit vs. 64-bit systems
I'm sure there are many more things to add to the above list. I won't elaborate further on the point, but if anyone feels like it, please feel free to add to it.
In our CI environment we have a lot of "pizza box" computers that are supposed to be identical. They have the same hardware, were installed at the same time, and should be generally identical. They're even placed in "thermally equivalent" locations. Yet they're not identical, and the variation can be quite stunning.
The only conclusion I have come up with is that different binnings of the same CPU have different thresholds for thermal stepping; some of the "best" chips hold up better. I also suspect other "minor" hardware faults/variations are playing a role here. Maybe the slow boxes have slightly different components that play less well together?
There are tools out there that will show you if your CPU is throttling for thermal reasons.
I don't know much Ruby, but your code doesn't look multithreaded; if that's the case, it isn't going to take advantage of multiple cores. There can also be large differences between two CPU models: smaller process sizes, larger caches, better SIMD instruction sets, faster memory access, and so on. Compiler and OS differences can cause large swings in performance between Windows and Linux, and the same goes for x86 vs. x64. Plus, Core i7s support Hyper-Threading, which in some cases makes a single-threaded app slower.
Just as an example, if that 2.2GHz CPU is an Intel Core 2 E4500, it has the following specs:
Clock: 2.2GHz
L2 Cache: 2MB
FSB: 800MT/sec
Process Size: 65nm
versus the T9400 that is likely in your MacBook Pro:
Clock: 2.53GHz
L2 Cache: 6MB
FSB: 1066MT/sec
Process Size: 45nm
Plus you're running it on an x64 build of Darwin. All of those things could definitely add up, making a trivial little script execute much faster.
I didn't read the code, but it is really hard to compare timings across two different computers. You need exactly the same OS, the same set of running processes, and the same amount of memory.
If you change the processor family (i7, Core 2 Duo, P4, P4-D), the clock frequency says nothing about one processor's abilities relative to another family's. You can only compare within the same family (a newer processor might spend cycles on core management rather than computation, for example).
So, the Ruby Enterprise Edition documentation states that all the values in the GC settings (e.g. RUBY_HEAP_MIN_SLOTS) are defined in slots: http://www.rubyenterpriseedition.com/documentation.html#_garbage_collector_performance_tuning
We fine-tuned our app's min slot size and increment for the best performance by trial and error (we have enough machines to get a good idea of how different values affect the number of malloc calls and full GCs).
But something has been bugging me for a while: how big is one slot, in bytes?
From the Ruby source:
* sizeof(RVALUE) is
* 20 if 32-bit, double is 4-byte aligned
* 24 if 32-bit, double is 8-byte aligned
* 40 if 64-bit
$ rvm use ruby-1.9.2-p136
$ gdb ruby
(gdb) p sizeof(RVALUE)
$1 = 40
The default heap chunk size (HEAP_SIZE) in 1.9 is 8K:
http://svn.ruby-lang.org/repos/ruby/trunk/gc.c
(search for HEAP_SIZE)
Note that whenever the interpreter runs out of space and needs more, 1.9 allocates exponentially more heaps (each of the same size), whereas 1.8 would allocate bigger and bigger heaps.
After digging through the code:
One slot is sizeof(struct RVALUE), which depends on the machine.
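So converting a slot count to bytes is straightforward; a back-of-the-envelope Ruby sketch (the RUBY_HEAP_MIN_SLOTS value below is a made-up example, and 40 bytes per slot is the 64-bit figure from the gdb session above):
SLOT_SIZE = 40                # sizeof(RVALUE) on a 64-bit build
min_slots = 600_000           # hypothetical RUBY_HEAP_MIN_SLOTS setting

bytes = min_slots * SLOT_SIZE
puts "initial heap: %.1f MB" % (bytes / (1024.0 * 1024))   # => 22.9 MB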