Go shift count too large - go

In Go's constant specification, it is mentioned that:
Numeric constants represent exact values of arbitrary precision and do not overflow.
So I tried
const VeryVeryBigNumber = 1 << 200
and it works. However, the biggest shift count I could try is 511 and using 512 will throw:
shift count too large: 512.
What does 512 represents? I have no intention to use it, I just want to know why is it limited to 511 in my machine (I'm using ubuntu 64 bit and go 1.9.2)?
Thanks

512 is kind of an arbitrary limit. The only thing the spec says is:
Implementation restriction: Although numeric constants have arbitrary
precision in the language, a compiler may implement them using an
internal representation with limited precision. That said, every
implementation must:
Represent integer constants with at least 256 bits.
Unfortunately, the comments around the limits don't give a reason.
At some point, a limit has to be used. I would recommend sticking to the required 256.

Related

Should I use double data structure to store very large Integer values?

int types have a very low range of number it supports as compared to double. For example I want to use a integer number with a high range. Should I use double for this purpose. Or is there an alternative for this.
Is arithmetic slow in doubles ?
Whether double arithmetic is slow as compared to integer arithmetic depends on the CPU and the bit size of the integer/double.
On modern hardware floating point arithmetic is generally not slow. Even though the general rule may be that integer arithmetic is typically a bit faster than floating point arithmetic, this is not always true. For instance multiplication & division can even be significantly faster for floating point than the integer counterpart (see this answer)
This may be different for embedded systems with no hardware support for floating point. Then double arithmetic will be extremely slow.
Regarding your original problem: You should note that a 64 bit long long int can store more integers exactly (2^63) while double can store integers only up to 2^53 exactly. It can store higher numbers though, but not all integers: they will get rounded.
The nice thing about floating point is that it is much more convenient to work with. You have special symbols for infinity (Inf) and a symbol for undefined (NaN). This makes division by zero for instance possible and not an exception. Also one can use NaN as a return value in case of error or abnormal conditions. With integers one often uses -1 or something to indicate an error. This can propagate in calculations undetected, while NaN will not be undetected as it propagates.
Practical example: The programming language MATLAB has double as the default data type. It is used always even for cases where integers are typically used, e.g. array indexing. Even though MATLAB is an intepreted language and not so fast as a compiled language such as C or C++ is is quite fast and a powerful tool.
Bottom line: Using double instead of integers will not be slow. Perhaps not most efficient, but performance hit is not severe (at least not on modern desktop computer hardware).

64 bit integer and 64 bit float homogeneous representation

Assume we have some sequence as input. For performance reasons we may want to convert it in homogeneous representation. And in order to transform it into homogeneous representation we are trying to convert it to same type. Here lets consider only 2 types in input - int64 and float64 (in my simple code I will use numpy and python; it is not the matter of this question - one may think only about 64-bit integer and 64-bit floats).
First we may try to cast everything to float64.
So we want something like so as input:
31 1.2 -1234
be converted to float64. If we would have all int64 we may left it unchanged ("already homogeneous"), or if something else was found we would return "not homogeneous". Pretty straightforward.
But here is the problem. Consider a bit modified input:
31000000 1.2 -1234
Idea is clear - we need to check that our "caster" is able to handle large by absolute value int64 properly:
format(np.float64(31000000), '.0f') # just convert to float64 and print
'31000000'
Seems like not a problem at all. So lets go to the deal right away:
im = np.iinfo(np.int64).max # maximum of int64 type
format(np.float64(im), '.0f')
format(np.float64(im-100), '.0f')
'9223372036854775808'
'9223372036854775808'
Now its really undesired - we lose some information which maybe needed. I.e. we want to preserve all the information provided in the input sequence.
So our im and im-100 values cast to the same float64 representation. The reason of this is clear - float64 has only 53 significand of total 64 bits. That is why its precision enough to represent log10(2^53) ~= 15.95 i.e. about all 16-length int64 without any information loss. But int64 type contains up to 19 digits.
So we end up with about [10^16; 10^19] (more precisely [10^log10(53); int64.max]) range in which each int64 may be represented with information loss.
Q: What decision in such situation should one made in order to represent int64 and float64 homogeneously.
I see several options for now:
Just convert all int64 range to float64 and "forget" about possible information loss.
Motivation here is "majority of input barely will be > 10^16 int64 values".
EDIT: This clause was misleading. In clear formulation we don't consider such solutions (but left it for completeness).
Do not make such automatic conversions at all. Only if explicitly specified.
I.e. we agree with performance drawbacks. For any int-float arrays. Even with ones as in simplest 1st case.
Calculate threshold for performing conversion to float64 without possible information loss. And use it while making casting decision. If int64 above this threshold found - do not convert (return "not homogeneous").
We've already calculate this threshold. It is log10(2^53) rounded.
Create new type "fint64". This is an exotic decision but I'm considering even this one for completeness.
Motivation here consists of 2 points. First one: it is frequent situation when user wants to store int and float types together. Second - is structure of float64 type. I'm not quite understand why one will need ~308 digits value range if significand consists only of ~16 of them and other ~292 is itself a noise. So we might use one of float64 exponent bits to indicate whether its float or int is stored here. But for int64 it would be definitely drawback to lose 1 bit. Cause would reduce our integer range twice. But we would gain possibility freely store ints along with floats without any additional overhead.
EDIT: While my initial thinking of this was as "exotic" decision in fact it is just a variant of another solution alternative - composite type for our representation (see 5 clause). But need to add here that my 1st composition has definite drawback - losing some range for float64 and for int64. What we rather do - is not to subtract 1 bit but add one bit which represents a flag for int or float type stored in following 64 bits.
As proposed #Brendan one may use composite type consists of "combination of 2 or more primitive types". So using additional primitives we may cover our "problem" range for int64 for example and get homogeneous representation in this "new" type.
EDITs:
Because here question arisen I need to try be very specific: Devised application in question do following thing - convert sequence of int64 or float64 to some homogeneous representation lossless if possible. The solutions are compared by performance (e.g. total excessive RAM needed for representation). That is all. No any other requirements is considered here (cause we should consider a problem in its minimal state - not writing whole application). Correspondingly algo that represents our data in homogeneous state lossless (we are sure we not lost any information) fits into our app.
I've decided to remove words "app" and "user" from question - it was also misleading.
When choosing a data type there are 3 requirements:
if values may have different signs
needed precision
needed range
Of course hardware doesn't provide a lot of types to choose from; so you'll need to select the next largest provided type. For example, if you want to store values ranging from 0 to 500 with 8 bits of precision; then hardware won't provide anything like that and you will need to use either 16-bit integer or 32-bit floating point.
When choosing a homogeneous representation there are 3 requirements:
if values may have different signs; determined from the requirements from all of the original types being represented
needed precision; determined from the requirements from all of the original types being represented
needed range; determined from the requirements from all of the original types being represented
For example, if you have integers from -10 to +10000000000 you need a 35 bit integer type that doesn't exist so you'll use a 64-bit integer, and if you need floating point values from -2 to +2 with 31 bits of precision then you'll need a 33 bit floating point type that doesn't exist so you'll use a 64-bit floating point type; and from the requirements of these two original types you'll know that a homogeneous representation will need a sign flag, a 33 bit significand (with an implied bit), and a 1-bit exponent; which doesn't exist so you'll use a 64-bit floating point type as the homogeneous representation.
However; if you don't know anything about the requirements of the original data types (and only know that whatever the requirements were they led to the selection of a 64-bit integer type and a 64-bit floating point type), then you'll have to assume "worst cases". This leads to needing a homogeneous representation that has a sign flag, 62 bits of precision (plus an implied 1 bit) and an 8 bit exponent. Of course this 71 bit floating point type doesn't exist, so you need to select the next largest type.
Also note that sometimes there is no "next largest type" that hardware supports. When this happens you need to resort to "composed types" - a combination of 2 or more primitive types. This can include anything up to and including "big rational numbers" (numbers represented by 3 big integers in "numerator / divisor * (1 << exponent)" form).
Of course if the original types (the 64-bit integer type and 64-bit floating point type) were primitive types and your homogeneous representation needs to use a "composed type"; then your "for performance reasons we may want to convert it in homogeneous representation" assumption is likely to be false (it's likely that, for performance reasons, you want to avoid using a homogeneous representation).
In other words:
If you don't know anything about the requirements of the original data types, it's likely that, for performance reasons, you want to avoid using a homogeneous representation.
Now...
Let's rephrase your question as "How to deal with design failures (choosing the wrong types which don't meet requirements)?". There is only one answer, and that is to avoid the design failure. Run-time checks (e.g. throwing an exception if the conversion to the homogeneous representation caused precision loss) serve no purpose other than to notify developers of design failures.
It is actually very basic: use 64 bits floating point. Floating point is an approximation, and you will loose precision for many ints. But there are no uncertainties other than "might this originally have been integral" and "does the original value deviates more than 1.0".
I know of one non-standard floating point representation that would be more powerfull (to be found in the net). That might (or might not) help cover the ints.
The only way to have an exact int mapping, would be to reduce the int range, and guarantee (say) 60 bits ints to be precise, and the remaining range approximated by floating point. Floating point would have to be reduced too, either exponential range as mentioned, or precision (the mantissa).

What's the largest value supported by the math/big package in golang?

I'm reading the documentation to the math/big package here:
https://golang.org/pkg/math/big/#pkg-constants
I am trying to understand how large a number is too big for math.big, and this looked like a constant I could interrogate.
I see on my machine:
fmt.Println(math.MaxUint32)
4294967295
How does this relate to the largest integer possible on my machine, for the purpose of calculation? What are the units of this number? Is this bytes, or decimal places or something other than the number itself?
bignum libraries usually store big numbers as a sequence of digits (e.g. in base 264). Their limitation is related to the memory available. So the largest number you could represent is tied to the limitation of your virtual address space. You can safely assume that a number even as large as 1010000 is representable in bignum. Of course, a googolplex is not representable as a bignum (because it has more bits than the number of particles in the universe).
Another limitation is the complexity of arithmetic operations. But there exist very efficient bignum algorithms.
FWIW, the GMPlib (a C library for bignums) can deal with numbers as long as there is memory for them. However, it is rumored than when malloc fails, GMPlib is aborting.
I don't know what happens inside Go bignums when a number is too big to be representable (and that limit varies from one machine to the next and could be different from one run to the next). For example, Go's Int.Mul gives a product whose size is the sum of the size of the arguments, and the "out of memory" error is undocumented (but obviously can happen).
When using bignums, prefer iterative algorithms to recursive ones. For example, a naive recursive factorial might overflow the call stack with large enough bignums, so you want to code it iteratively.

long double (GCC specific) and __float128

I'm looking for detailed information on long double and __float128 in GCC/x86 (more out of curiosity than because of an actual problem).
Few people will probably ever need these (I've just, for the first time ever, truly needed a double), but I guess it is still worthwile (and interesting) to know what you have in your toolbox and what it's about.
In that light, please excuse my somewhat open questions:
Could someone explain the implementation rationale and intended usage of these types, also in comparison of each other? For example, are they "embarrassment implementations" because the standard allows for the type, and someone might complain if they're only just the same precision as double, or are they intended as first-class types?
Alternatively, does someone have a good, usable web reference to share? A Google search on "long double" site:gcc.gnu.org/onlinedocs didn't give me much that's truly useful.
Assuming that the common mantra "if you believe that you need double, you probably don't understand floating point" does not apply, i.e. you really need more precision than just float, and one doesn't care whether 8 or 16 bytes of memory are burnt... is it reasonable to expect that one can as well just jump to long double or __float128 instead of double without a significant performance impact?
The "extended precision" feature of Intel CPUs has historically been source of nasty surprises when values were moved between memory and registers. If actually 96 bits are stored, the long double type should eliminate this issue. On the other hand, I understand that the long double type is mutually exclusive with -mfpmath=sse, as there is no such thing as "extended precision" in SSE. __float128, on the other hand, should work just perfectly fine with SSE math (though in absence of quad precision instructions certainly not on a 1:1 instruction base). Am I right in these assumptions?
(3. and 4. can probably be figured out with some work spent on profiling and disassembling, but maybe someone else had the same thought previously and has already done that work.)
Background (this is the TL;DR part):
I initially stumbled over long double because I was looking up DBL_MAX in <float.h>, and incidentially LDBL_MAX is on the next line. "Oh look, GCC actually has 128 bit doubles, not that I need them, but... cool" was my first thought. Surprise, surprise: sizeof(long double) returns 12... wait, you mean 16?
The C and C++ standards unsurprisingly do not give a very concrete definition of the type. C99 (6.2.5 10) says that the numbers of double are a subset of long double whereas C++03 states (3.9.1 8) that long double has at least as much precision as double (which is the same thing, only worded differently). Basically, the standards leave everything to the implementation, in the same manner as with long, int, and short.
Wikipedia says that GCC uses "80-bit extended precision on x86 processors regardless of the physical storage used".
The GCC documentation states, all on the same page, that the size of the type is 96 bits because of the i386 ABI, but no more than 80 bits of precision are enabled by any option (huh? what?), also Pentium and newer processors want them being aligned as 128 bit numbers. This is the default under 64 bits and can be manually enabled under 32 bits, resulting in 32 bits of zero padding.
Time to run a test:
#include <stdio.h>
#include <cfloat>
int main()
{
#ifdef USE_FLOAT128
typedef __float128 long_double_t;
#else
typedef long double long_double_t;
#endif
long_double_t ld;
int* i = (int*) &ld;
i[0] = i[1] = i[2] = i[3] = 0xdeadbeef;
for(ld = 0.0000000000000001; ld < LDBL_MAX; ld *= 1.0000001)
printf("%08x-%08x-%08x-%08x\r", i[0], i[1], i[2], i[3]);
return 0;
}
The output, when using long double, looks somewhat like this, with the marked digits being constant, and all others eventually changing as the numbers get bigger and bigger:
5636666b-c03ef3e0-00223fd8-deadbeef
^^ ^^^^^^^^
This suggests that it is not an 80 bit number. An 80-bit number has 18 hex digits. I see 22 hex digits changing, which looks much more like a 96 bits number (24 hex digits). It also isn't a 128 bit number since 0xdeadbeef isn't touched, which is consistent with sizeof returning 12.
The output for __int128 looks like it's really just a 128 bit number. All bits eventually flip.
Compiling with -m128bit-long-double does not align long double to 128 bits with a 32-bit zero padding, as indicated by the documentation. It doesn't use __int128 either, but indeed seems to align to 128 bits, padding with the value 0x7ffdd000(?!).
Further, LDBL_MAX, seems to work as +inf for both long double and __float128. Adding or subtracting a number like 1.0E100 or 1.0E2000 to/from LDBL_MAX results in the same bit pattern.
Up to now, it was my belief that the foo_MAX constants were to hold the largest representable number that is not +inf (apparently that isn't the case?). I'm also not quite sure how an 80-bit number could conceivably act as +inf for a 128 bit value... maybe I'm just too tired at the end of the day and have done something wrong.
Ad 1.
Those types are designed to work with numbers with huge dynamic range. The long double is implemented in a native way in the x87 FPU. The 128b double I suspect would be implemented in software mode on modern x86s, as there's no hardware to do the computations in hardware.
The funny thing is that it's quite common to do many floating point operations in a row and the intermediate results are not actually stored in declared variables but rather stored in FPU registers taking advantage of full precision. That's why comparison:
double x = sin(0); if (x == sin(0)) printf("Equal!");
Is not safe and cannot be guaranteed to work (without additional switches).
Ad. 3.
There's an impact on the speed depending what precision you use. You can change used the precision of the FPU by using:
void
set_fpu (unsigned int mode)
{
asm ("fldcw %0" : : "m" (*&mode));
}
It will be faster for shorter variables, slower for longer. 128bit doubles will be probably done in software so will be much slower.
It's not only about RAM memory wasted, it's about cache being wasted. Going to 80 bit double from 64b double will waste from 33% (32b) to almost 50% (64b) of the memory (including cache).
Ad 4.
On the other hand, I understand that the long double type is mutually
exclusive with -mfpmath=sse, as there is no such thing as "extended
precision" in SSE. __float128, on the other hand, should work just
perfectly fine with SSE math (though in absence of quad precision
instructions certainly not on a 1:1 instruction base). Am I right under
these assumptions?
The FPU and SSE units are totally separate. You can write code using FPU at the same time as SSE. The question is what will the compiler generate if you constrain it to use only SSE? Will it try to use FPU anyway? I've been doing some programming with SSE and GCC will generate only single SISD on its own. You have to help it to use SIMD versions. __float128 will probably work on every machine, even the 8-bit AVR uC. It's just fiddling with bits after all.
The 80 bit in hex representation is actually 20 hex digits. Maybe the bits which are not used are from some old operation? On my machine, I compiled your code and only 20 bits change in long
mode: 66b4e0d2-ec09c1d5-00007ffe-deadbeef
The 128-bit version has all the bits changing. Looking at the objdump it looks as if it was using software emulation, there are almost no FPU instructions.
Further, LDBL_MAX, seems to work as +inf for both long double and
__float128. Adding or subtracting a number like 1.0E100 or 1.0E2000 to/from LDBL_MAX results in the same bit pattern. Up to now, it was my
belief that the foo_MAX constants were to hold the largest
representable number that is not +inf (apparently that isn't the
case?).
This seems to be strange...
I'm also not quite sure how an 80-bit number could conceivably
act as +inf for a 128-bit value... maybe I'm just too tired at the end
of the day and have done something wrong.
It's probably being extended. The pattern which is recognized to be +inf in 80-bit is translated to +inf in 128-bit float too.
IEEE-754 defined 32 and 64 floating-point representations for the purpose of efficient data storage, and an 80-bit representation for the purpose of efficient computation. The intention was that given float f1,f2; double d1,d2; a statement like d1=f1+f2+d2; would be executed by converting the arguments to 80-bit floating-point values, adding them, and converting the result back to a 64-bit floating-point type. This would offer three advantages compared with performing operations on other floating-point types directly:
While separate code or circuitry would be required for conversions to/from 32-bit types and 64-bit types, it would only be necessary to have only one "add" implementation, one "multiply" implementation, one "square root" implementation, etc.
Although in rare cases using an 80-bit computational type could yield results that were very slightly less accurate than using other types directly (worst-case rounding error is 513/1024ulp in cases where computations on other types would yield an error of 511/1024ulp), chained computations using 80-bit types would frequently be more accurate--sometimes much more accurate--than computations using other types.
On a system without a FPU, separating a double into a separate exponent and mantissa before performing computations, normalizing a mantissa, and converting a separate mantissa and exponent into a double, are somewhat time consuming. If the result of one computation will be used as input to another and discarded, using an unpacked 80-bit type will allow these steps to be omitted.
In order for this approach to floating-point math to be useful, however, it is imperative that it be possible for code to store intermediate results with the same precision as would be used in computation, such that temp = d1+d2; d4=temp+d3; will yield the same result as d4=d1+d2+d3;. From what I can tell, the purpose of long double was to be that type. Unfortunately, even though K&R designed C so that all floating-point values would be passed to variadic methods the same way, ANSI C broke that. In C as originally designed, given the code float v1,v2; ... printf("%12.6f", v1+v2);, the printf method wouldn't have to worry about whether v1+v2 would yield a float or a double, since the result would get coerced to a known type regardless. Further, even if the type of v1 or v2 changed to double, the printf statement wouldn't have to change.
ANSI C, however, requires that code which calls printf must know which arguments are double and which are long double; a lot of code--if not a majority--of code which uses long double but was written on platforms where it's synonymous with double fails to use the correct format specifiers for long double values. Rather than having long double be an 80-bit type except when passed as a variadic method argument, in which case it would be coerced to 64 bits, many compilers decided to make long double be synonymous with double and not offer any means of storing the results of intermediate computations. Since using an extended precision type for computation is only good if that type is made available to the programmer, many people came to conclude regard extended precision as evil even though it was only ANSI C's failure to handle variadic arguments sensibly that made it problematic.
PS--The intended purpose of long double would have benefited if there had also been a long float which was defined as the type to which float arguments could be most efficiently promoted; on many machines without floating-point units that would probably be a 48-bit type, but the optimal size could range anywhere from 32 bits (on machines with an FPU that does 32-bit math directly) up to 80 (on machines which use the design envisioned by IEEE-754). Too late now, though.
It boils down to the difference between 4.9999999999999999999 and 5.0.
Although the range is the main difference, it is precision that is important.
These type of data will be needed in great circle calculations or coordinate mathematics that is likely to be used with GPS systems.
As the precision is much better than normal double, it means you can retain typically 18 significant digits without loosing accuracy in calculations.
Extended precision I believe uses 80 bits (used mostly in maths processors), so 128 bits will be much more accurate.
C99 and C++11 added types float_t and double_t which are aliases for built-in floating-point types. Roughly, float_t is the type of the result of doing arithmetic among values of type float, and double_t is the type of the result of doing arithmetic among values of type double.

Arbitrary precision arithmetic with Ruby

How the heck does Ruby do this? Does Jörg or anyone else know what's happening behind the scenes?
Unfortunately I don't know C very well so bignum.c is of little help to me. I was just kind of curious it someone could explain (in plain English) the theory behind whatever miracle algorithm its using.
irb(main):001:0> 999**999
368063488259223267894700840060521865838338232037353204655959621437025609300472231530103873614505175218691345257589896391130393189447969771645832382192366076536631132001776175977932178658703660778465765811830827876982014124022948671975678131724958064427949902810498973271030787716781467419524180040734398996952930832508934116945966120176735120823151959779536852290090377452502236990839453416790640456116471139751546750048602189291028640970574762600185950226138244530187489211615864021135312077912018844630780307462205252807737757672094320692373101032517459518497524015120165166724189816766397247824175394802028228160027100623998873667435799073054618906855460488351426611310634023489044291860510352301912426608488807462312126590206830413782664554260411266378866626653755763627796569082931785645600816236891168141774993267488171702172191072731069216881668294625679492696148976999868715671440874206427212056717373099639711168901197440416590226524192782842896415414611688187391232048327738965820265934093108172054875188246591760877131657895633586576611857277011782497943522945011248430439201297015119468730712364007639373910811953430309476832453230123996750235710787086641070310288725389595138936784715274150426495416196669832679980253436807864187160054589045664027158817958549374490512399055448819148487049363674611664609890030088549591992466360050042566270348330911795487647045949301286614658650071299695652245266080672989921799342509291635330827874264789587306974472327718704306352445925996155619153783913237212716010410294999877569745287353422903443387562746452522860420416689019732913798073773281533570910205207767157128174184873357050830752777900041943256738499067821488421053870869022738698816059810579221002560882999884763252161747566893835178558961142349304466506402373556318707175710866983035313122068321102457824112014969387225476259342872866363550383840720010832906695360553556647545295849966279980830561242960013654529514995113584909050813015198928283202189194615501403435553060147713139766323195743324848047347575473228198492343231496580885057330510949058490527738662697480293583612233134502078182014347192522391449087738579081585795613547198599661273567662441490401862839817822686573112998663038868314974259766039340894024308383451039874674061160538242392803580758232755749310843694194787991556647907091849600704712003371103926967137408125713631396699343733288014254084819379380555174777020843568689927348949484201042595271932630685747613835385434424807024615161848223715989797178155169951121052285149157137697718850449708843330475301440373094611119631361702936342263219382793996895988331701890693689862459020775599439506870005130750427949747071390095256759203426671803377068109744629909769176319526837824364926844730545524646494321826241925107158040561607706364484910978348669388142016838792902926158979355432483611517588605967745393958061959024834251565197963477521095821435651996730128376734574843289089682710350244222290017891280419782767803785277960834729869249991658417000499998999
Simple: it does it the same way you do, ever since first grade. Except it doesn't compute in base 10, it computes in base 4 billion (and change).
Think about it: with our number system, we can only represent numbers from 0 to 9. So, how can we compute 6+7 without overflowing? Easy: we do actually overflow! We cannot represent the result of 6+7 as a number between 0 and 9, but we can overflow to the next place and represent it as two numbers between 0 and 9: 3×100 + 1×101. If you want to add two numbers, you add them digit-wise from the right and overflow ("carry") to the left. If you want to multiply two numbers, you have to multiply every digit of one number individually with the other number, then add up the intermediate results.
BigNum arithmetic (this is what this kind of arithmetic where the numbers are bigger than the native machine numbers is usually called) works basically the same way. Except that the base is not 10, and its not 2, either – it's the size of a native machine integer. So, on a 32 bit machine, it would be base 232 or 4 294 967 296.
Specifically, in Ruby Integer is actually an abstract class that is never instianted. Instead, it has two subclasses, Fixnum and Bignum, and numbers automagically migrate between them, depending on their size. In MRI and YARV, Fixnum can hold a 31 or 63 bit signed integer (one bit is used for tagging) depending on the native word size of the machine. In JRuby, a Fixnum can hold a full 64 bit signed integer, even on an 32 bit machine.
The simplest operation is adding two numbers. And if you look at the implementation of + or rather bigadd_core in YARV's bignum.c, it's not too bad to follow. I can't read C either, but you can cleary see how it loops over the individual digits.
You could read the source for bignum.c...
At a very high level, without going into any implementation details, bignums are calculated "by hand" like you used to do in grade school. Now, there are certainly many optimizations that can be applied, but that's the gist of it.
I don't know of the implementation details so I'll cover how a basic Big Number implementation would work.
Basically instead of relying on CPU "integers" it will create it's own using multiple CPU integers. To store arbritrary precision, well lets say you have 2 bits. So the current integer is 11. You want to add one. In normal CPU integers, this would roll over to 00
But, for big number, instead of rolling over and keeping a "fixed" integer width, it would allocate another bit and simulate an addition so that the number becomes the correct 100.
Try looking up how binary math can be done on paper. It's very simple and is trivial to convert to an algorithm.
Beaconaut APICalc 2 just released on Jan.18, 2011, which is an arbitrary-precision integer calculator for bignum arithmetic, cryptography analysis and number theory research......
http://www.beaconaut.com/forums/default.aspx?g=posts&t=13
It uses the Bignum class
irb(main):001:0> (999**999).class
=> Bignum
Rdoc is available of course

Resources