bignum in emacs/elisp - elisp

Does emacs have support for big numbers that don't fit in integers? If it does, how do I use them?

Emacs Lispers frustrated by Emacs’s
lack of bignum handling: calc.el
provides very good bignum
capabilities.—EmacsWiki
calc.el is part of the GNU Emacs distribution. See its source code for available functions. You can immediately start playing with it by typing M-x quick-calc. You may also want to check bigint.el package, that is a non-standard, lightweight implementation for handling bignums.

Emacs 27.1 supports bignums natively (see the NEWS file of Emacs):
** Emacs Lisp integers can now be of arbitrary size.
Emacs uses the GNU Multiple Precision (GMP) library to support
integers whose size is too large to support natively. The integers
supported natively are known as "fixnums", while the larger ones are
"bignums". The new predicates 'bignump' and 'fixnump' can be used to
distinguish between these two types of integers.
All the arithmetic, comparison, and logical (a.k.a. "bitwise")
operations where bignums make sense now support both fixnums and
bignums. However, note that unlike fixnums, bignums will not compare
equal with 'eq', you must use 'eql' instead. (Numerical comparison
with '=' works on both, of course.)
Since large bignums consume a lot of memory, Emacs limits the size of
the largest bignum a Lisp program is allowed to create. The
nonnegative value of the new variable 'integer-width' specifies the
maximum number of bits allowed in a bignum. Emacs signals an integer
overflow error if this limit is exceeded.
Several primitive functions formerly returned floats or lists of
integers to represent integers that did not fit into fixnums. These
functions now simply return integers instead. Affected functions
include functions like 'encode-char' that compute code-points, functions
like 'file-attributes' that compute file sizes and other attributes,
functions like 'process-id' that compute process IDs, and functions like
'user-uid' and 'group-gid' that compute user and group IDs.
Bignums are automatically chosen when arithmetic calculations with fixnums overflow the fixnum-range. The expression (bignump most-positive-fixnum) returns nil while (bignump (+ most-positive-fixnum 1)) returns t.

Related

Should I use double data structure to store very large Integer values?

int types have a very low range of number it supports as compared to double. For example I want to use a integer number with a high range. Should I use double for this purpose. Or is there an alternative for this.
Is arithmetic slow in doubles ?
Whether double arithmetic is slow as compared to integer arithmetic depends on the CPU and the bit size of the integer/double.
On modern hardware floating point arithmetic is generally not slow. Even though the general rule may be that integer arithmetic is typically a bit faster than floating point arithmetic, this is not always true. For instance multiplication & division can even be significantly faster for floating point than the integer counterpart (see this answer)
This may be different for embedded systems with no hardware support for floating point. Then double arithmetic will be extremely slow.
Regarding your original problem: You should note that a 64 bit long long int can store more integers exactly (2^63) while double can store integers only up to 2^53 exactly. It can store higher numbers though, but not all integers: they will get rounded.
The nice thing about floating point is that it is much more convenient to work with. You have special symbols for infinity (Inf) and a symbol for undefined (NaN). This makes division by zero for instance possible and not an exception. Also one can use NaN as a return value in case of error or abnormal conditions. With integers one often uses -1 or something to indicate an error. This can propagate in calculations undetected, while NaN will not be undetected as it propagates.
Practical example: The programming language MATLAB has double as the default data type. It is used always even for cases where integers are typically used, e.g. array indexing. Even though MATLAB is an intepreted language and not so fast as a compiled language such as C or C++ is is quite fast and a powerful tool.
Bottom line: Using double instead of integers will not be slow. Perhaps not most efficient, but performance hit is not severe (at least not on modern desktop computer hardware).

What's the largest value supported by the math/big package in golang?

I'm reading the documentation to the math/big package here:
https://golang.org/pkg/math/big/#pkg-constants
I am trying to understand how large a number is too big for math.big, and this looked like a constant I could interrogate.
I see on my machine:
fmt.Println(math.MaxUint32)
4294967295
How does this relate to the largest integer possible on my machine, for the purpose of calculation? What are the units of this number? Is this bytes, or decimal places or something other than the number itself?
bignum libraries usually store big numbers as a sequence of digits (e.g. in base 264). Their limitation is related to the memory available. So the largest number you could represent is tied to the limitation of your virtual address space. You can safely assume that a number even as large as 1010000 is representable in bignum. Of course, a googolplex is not representable as a bignum (because it has more bits than the number of particles in the universe).
Another limitation is the complexity of arithmetic operations. But there exist very efficient bignum algorithms.
FWIW, the GMPlib (a C library for bignums) can deal with numbers as long as there is memory for them. However, it is rumored than when malloc fails, GMPlib is aborting.
I don't know what happens inside Go bignums when a number is too big to be representable (and that limit varies from one machine to the next and could be different from one run to the next). For example, Go's Int.Mul gives a product whose size is the sum of the size of the arguments, and the "out of memory" error is undocumented (but obviously can happen).
When using bignums, prefer iterative algorithms to recursive ones. For example, a naive recursive factorial might overflow the call stack with large enough bignums, so you want to code it iteratively.

Representation of negative integers

Does ISO-Prolog have any prescriptions / recommendations
regarding the representation of negative integers and operations on them? 2's complement, maybe?
Asking as a programmer/user: Are there any assumptions I can safely make when performing bit-level operations on negative integers?
ISO/IEC 13211-1 has several requirements for integers, but a concrete representation is not required. If the integer representation is bounded, one of the following conditions holds
7.1.2 Integer
...
minint = -(*minint)
minint = -(maxint+1)
Further, the evaluable functors listed in 9.4 Bitwise functors, that is (>>)/2, (<<)/2, (/\)/2, (\/)/2, (\)/1, and xor/2 are implementation defined for negative values. E.g.,
8.4.1 (>>)/2 – bitwise right shift
9.4.1.1 Description
...
The value shall be implementation defined depending onwhether the shift is logical (fill with zeros) or arithmetic(fill with a copy of the sign bit).The value shall be implementation defined if VS is negative,or VS is larger than the bit size of an integer.
Note that implementation defined means that a conforming processor has to document this in the accompanying documentation. So before using a conforming processor, you have to read the manual.
De facto, there is no current Prolog processor (I am aware of) that does not provide arithmetic right shift and does not use 2's complement.
Strictly speaking these are two different questions:
Actual physical representation: this isn't visible at the Prolog level, and therefore the standard quite rightly has nothing to say about it. Note that many Prolog systems have two or more internal representations (e.g. two's complement fixed size and sign+magnitude bignums) but present a single integer type to the programmer.
Results of bitwise operations: while the standard defines these operations, it leaves much of their behaviour implementation defined. This is a consequence of (a) not having a way to specify the width of a bit pattern, and (b) not committing to a specific mapping between negative numbers and bit patterns.
This not only means that all bitwise operations on negative numbers are officially not portable, but also has the curious effect that the result of bitwise negation is totally implementation-defined (even for positive arguments): Y is \1 could legally give -2, 268435454, 2147483646, 9223372036854775806, etc. All you know is that negating twice returns the original number.
In practice, fortunately, there seems to be a consensus towards "The bitwise arithmetic operations behave as if operating on an unlimited length two's complement representation".

Arbitrary base conversion algorithm for (textually represented) integers

I am looking a general algorithm that would convert from one (arbitrary) numerical base to another (also arbitrary) without storing the result in a large integer and performing arithmetic operations on it in between.
The algorithm I am looking for takes an array of numerical values in a given base (that would mostly be a string of characters) and returns the result alike.
Thank you for help.
I would say it is not possible. For certain bases it would be possible to convert from one string to another, by just streaming the chars through (e.g. if one base is a multiple of the other, like octal->hex), but for arbitrary bases it is not possible without arithmetic operations.
If you would do it with strings/chars in between it would be still big integer arithmetic, but your integers were just in a (unnecessary big) unusual format.
So you have just the choice between: Either reprogram arithmetic operations with char encoded numbers, or do the step and use a big integer library and walk the convert(char(base1->bigInt), convert(bigInt->base2) path.
It's computable, but it's not pretty.
Seriously, it'd probably be easier and faster to include one of the many bignum libraries or write your own.

Can long integer routines benefit from SSE?

I'm still working on routines for arbitrary long integers in C++. So far, I have implemented addition/subtraction and multiplication for 64-bit Intel CPUs.
Everything works fine, but I wondered if I can speed it a bit by using SSE. I browsed through the SSE docs and processor instruction lists, but I could not find anything I think I can use and here is why:
SSE has some integer instructions, but most instructions handle floating point. It doesn't look like it was designed for use with integers (e.g. is there an integer compare for less?)
The SSE idea is SIMD (same instruction, multiple data), so it provides instructions for 2 or 4 independent operations. I, on the other hand, would like to have something like a 128 bit integer add (128 bit input and output). This doesn't seem to exist. (Yet? In AVX2 maybe?)
The integer additions and subtractions handle neither input nor output carries. So it's very cumbersome (and thus, slow) to do it by hand.
My question is: is my assessment correct or is there anything I have overlooked? Can long integer routines benefit from SSE? In particular, can they help me to write a quicker add, sub or mul routine?
In the past, the answer to this question was a solid, "no". But as of 2017, the situation is changing.
But before I continue, time for some background terminology:
Full Word Arithmetic
Partial Word Arithmetic
Full-Word Arithmetic:
This is the standard representation where the number is stored in base 232 or 264 using an array of 32-bit or 64-bit integers.
Many bignum libraries and applications (including GMP) use this representation.
In full-word representation, every integer has a unique representation. Operations like comparisons are easy. But stuff like addition are more difficult because of the need for carry-propagation.
It is this carry-propagation that makes bignum arithmetic almost impossible to vectorize.
Partial-Word Arithmetic
This is a lesser-used representation where the number uses a base less than the hardware word-size. For example, putting only 60 bits in each 64-bit word. Or using base 1,000,000,000 with a 32-bit word-size for decimal arithmetic.
The authors of GMP call this, "nails" where the "nail" is the unused portion of the word.
In the past, use of partial-word arithmetic was mostly restricted to applications working in non-binary bases. But nowadays, it's becoming more important in that it allows carry-propagation to be delayed.
Problems with Full-Word Arithmetic:
Vectorizing full-word arithmetic has historically been a lost cause:
SSE/AVX2 has no support for carry-propagation.
SSE/AVX2 has no 128-bit add/sub.
SSE/AVX2 has no 64 x 64-bit integer multiply.*
*AVX512-DQ adds a lower-half 64x64-bit multiply. But there is still no upper-half instruction.
Furthermore, x86/x64 has plenty of specialized scalar instructions for bignums:
Add-with-Carry: adc, adcx, adox.
Double-word Multiply: Single-operand mul and mulx.
In light of this, both bignum-add and bignum-multiply are difficult for SIMD to beat scalar on x64. Definitely not with SSE or AVX.
With AVX2, SIMD is almost competitive with scalar bignum-multiply if you rearrange the data to enable "vertical vectorization" of 4 different (and independent) multiplies of the same lengths in each of the 4 SIMD lanes.
AVX512 will tip things more in favor of SIMD again assuming vertical vectorization.
But for the most part, "horizontal vectorization" of bignums is largely still a lost cause unless you have many of them (of the same size) and can afford the cost of transposing them to make them "vertical".
Vectorization of Partial-Word Arithmetic
With partial-word arithmetic, the extra "nail" bits enable you to delay carry-propagation.
So as long as you as you don't overflow the word, SIMD add/sub can be done directly. In many implementations, partial-word representation uses signed integers to allow words to go negative.
Because there is (usually) no need to perform carryout, SIMD add/sub on partial words can be done equally efficiently on both vertically and horizontally-vectorized bignums.
Carryout on horizontally-vectorized bignums is still cheap as you merely shift the nails over the next lane. A full carryout to completely clear the nail bits and get to a unique representation usually isn't necessary unless you need to do a comparison of two numbers that are almost the same.
Multiplication is more complicated with partial-word arithmetic since you need to deal with the nail bits. But as with add/sub, it is nevertheless possible to do it efficiently on horizontally-vectorized bignums.
AVX512-IFMA (coming with Cannonlake processors) will have instructions that give the full 104 bits of a 52 x 52-bit multiply (presumably using the FPU hardware). This will play very well with partial-word representations that use 52 bits per word.
Large Multiplication using FFTs
For really large bignums, multiplication is most efficiently done using Fast-Fourier Transforms (FFTs).
FFTs are completely vectorizable since they work on independent doubles. This is possible because fundamentally, the representation that FFTs use is
a partial word representation.
To summarize, vectorization of bignum arithmetic is possible. But sacrifices must be made.
If you expect SSE/AVX to be able to speed up some existing bignum code without fundamental changes to the representation and/or data layout, that's not likely to happen.
But nevertheless, bignum arithmetic is possible to vectorize.
Disclosure:
I'm the author of y-cruncher which does plenty of large number arithmetic.

Resources