Is there an IDIOMATIC way to get a random Fixnum in Ruby? - ruby

I'm playing with an algorithm which uses random numbers. It would be nice to be able to maximize the randomness I can get while keeping the number a nice reasonably-performant integer, so ideally they'd be in the range Fixnum::MIN .. Fixnum::MAX, but 0..Fixnum::MAX ought to be fine too.
OH WAIT. Those constants aren't actually things that exist. So when you read that Random.rand returns a float unless you pass it an integer argument the only obvious course of action is to resort to terrible hacks like these.
Is there any more-idiomatic way to get a random integer in Ruby, or does Yukihiro just expect me to make my code hideous and duplicate dubious integer-size exponentiation if I want this sort of capability?

Random Values from 0..FIXNUM_MAX
When Fixnum overflows, Ruby will just convert to Bignum. However, this related answer shows how to calculate the minimum and maximum values of Fixnum for your platform. Using that as a starting point, you can get a positive integer in the desired range with:
FIXNUM_MAX = (2**(0.size * 8 -2) -1)
Random.rand FIXNUM_MAX
Negative Integers
If you insist on having negative numbers too, then the following may be "close enough" for your purposes, even though FIXNUM_MIN.abs == FIXNUM_MAX may be false on your platform:
FIXNUM_MAX = (2**(0.size * 8 -2) -1)
random_num = Random.rand FIXNUM_MAX
random_num.even? ? random_num : random_num * -1
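The parity trick above maps even draws to positive values and odd draws to negative ones, so positive odd and negative even numbers never appear. A minimal sketch of a uniform signed draw, assuming the same two's-complement bounds as above (the random_fixnum helper name is made up for illustration):

```ruby
# Bounds derived from the native word size, as in the answer above.
FIXNUM_MAX = 2**(0.size * 8 - 2) - 1
FIXNUM_MIN = -(2**(0.size * 8 - 2))

# Hypothetical helper: draw uniformly from FIXNUM_MIN..FIXNUM_MAX by
# picking an offset in 0...span and shifting it down to the minimum.
def random_fixnum
  span = FIXNUM_MAX - FIXNUM_MIN + 1
  FIXNUM_MIN + Random.rand(span)
end

puts random_fixnum
```

Shifting a non-negative draw keeps every value in the range equally likely, which the sign-flipping approach does not.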
See Also
Kernel#rand
Random#rand
SecureRandom#random_number

This should get you a fairly large number of sample integers:
require "securerandom"
exponent = rand(1..15)
puts (SecureRandom.random_number * 10**exponent).to_i
A faster approach that produces the same or possibly better randomness:
r = Random.new
exponent = rand(1..15)
puts (r.rand * 10**exponent).to_i
Or, even simpler:
FIXNUM_MAX = (2**(0.size * 8 -2) -1)
FIXNUM_MIN = -(2**(0.size * 8 -2))
p rand(FIXNUM_MIN..FIXNUM_MAX)

Related

Cubic root of large number

I'm trying to identify the cubic root of a large number. I found a solution which works for smaller numbers, but not in this case:
require 'openssl'
q = OpenSSL::BN::generate_prime(2048)
ti = q.to_i #=> 3202718747...
ti3 = ti ** 3 #=> 328515909...
m = ti3 ** (1/3.0) #=> Infinity
I was hoping to see m = the original output of ti. Yes, this is a part of a Matasano challenge. I've put a lot of effort into not seeking help thus far, but I've reached a point where it's just a "how do I do something otherwise simple, in Ruby". Any assistance appreciated.
In Ruby, operations on integers automatically get promoted to bignums (arbitrary precision integers), so you never get an overflow.
The same is not true of floating point operations: you end up with infinity because raising to the power 1/3 is a floating point operation, and the first thing it does is try to convert your number to a float. The biggest number a float in Ruby can represent is about 10^308, whereas your number is probably around the 10^1800 mark, so it bails out and returns Infinity.
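The difference between the two kinds of arithmetic can be seen in a short sketch. The value 10**1800 is only a stand-in for the asker's number, chosen to match the rough magnitude mentioned above:

```ruby
huge = 10**1800                 # stand-in for the asker's very large integer

puts Float::MAX                 # largest finite Float, roughly 1.8e308

as_float = huge.to_f            # too large for a Float (Ruby may print a warning)
puts as_float                   # Infinity

root = huge ** (1 / 3.0)        # a Float exponent forces the Float conversion
puts root                       # Infinity

exact_cube = huge ** 3          # pure Integer arithmetic never overflows
puts exact_cube.to_s.length     # 5401 digits
```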
Ruby has a BigDecimal class for this. You might therefore be tempted to do
BigDecimal.new(ti3) ** (1/3.0)
This gives a wildly wrong answer for me, I suspect because (1/3.0) is a float and so only approximately 1/3. On the other hand,
BigDecimal.new(ti3) ** Rational(1,3)
produces the correct result for me (with negligible error). Rational is Ruby's class for representing fractions exactly. In Ruby 2.1 you can shorten this to
BigDecimal.new(ti3) ** (1r/3)
The docs do say that only integer exponents are supported, but this seems to be a hangover from the Ruby 1.8 days.
The following code was put forward based on the two pieces of advice given.
def nthroot(n, a, precision = 1e-1024)
x = a
begin
prev = x
x = ((n - 1) * prev + a / (prev ** (n - 1))) / n
end while (prev - x).abs > precision
x
end
It was based on an implementation of Newton's method which dealt with floats, but also just returned infinity. This version deals with integers only, but works for large integers.
Of course, nthroot may be called with n = 3.
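One caveat with integer-only Newton iteration: stopping only when two successive values are equal may never happen for some inputs, because the iterates can alternate between two neighbouring integers (the square root of 3 is one such case). A minimal sketch of a variant that avoids this, assuming a non-negative integer input and a floor result is what is wanted (integer_nth_root is an illustrative name, not part of the original answer):

```ruby
# Floor of the n-th root of a non-negative Integer, using only Integer math.
def integer_nth_root(a, n)
  raise ArgumentError, "a must be >= 0 and n >= 1" if a < 0 || n < 1
  return a if a < 2

  # Initial guess: a power of two that is guaranteed to be >= the true root.
  x = 1 << ((a.bit_length + n - 1) / n)
  loop do
    y = ((n - 1) * x + a / x**(n - 1)) / n   # one Newton step
    return x if y >= x                       # stopped decreasing: x is the floor root
    x = y
  end
end

cube = (10**600 + 7)**3
puts integer_nth_root(cube, 3) == 10**600 + 7   # true
```

Starting above the root and stopping as soon as the sequence stops decreasing sidesteps the oscillation, at the cost of returning the floor rather than a rounded value.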
I don't know what the Matasano challenge is, but what comes to mind is Newton's Method.
The Wikipedia page on Cube Roots also suggests using Newton's Method.

How do I square a number without using multiplication?

Was wondering if there is a way to write a method which squares a number (integer or decimal/float) without using the operational sign (*). For example: square of 2 will be 4, square of 2.5 will be 6.25, and 3.5's will be 12.25.
Here is my approach:
def square(num)
number = num
number2 = number
(1...(number2.floor)).each{ num += number }
num
end
puts square(2) #=> 4 [Correct]
puts square(16) #=> 256 [Correct]
puts square(2.5) #=> 5.0 [Wrong]
puts square(3.5) #=> 10.5 [Wrong]
The code works for integers, but not with floats/decimals. What am I doing wrong here? Also, if anybody has a fresh approach to this problem then please share. Algorithms are also welcome. Also, considering performance of the method will be a plus.
There are a few tricks you could use, arranged here in order of increasing trickery.
Logarithms
Observe that k * k = e^log(k*k) = e^(log(k) + log(k)), and use that rule:
Math.exp(Math.log(5.2) + Math.log(5.2))
# => 27.04
No multiplication here!
Division
As another commenter suggested, you could take the reciprocal operation, division: k/(1.0/k) == k^2. However, this introduces additional floating-point errors, since k / (1.0 / k) is two floating-point operations, whereas k * k is only one.
Exponentiation
Or, since this is Ruby, if you want exactly the same value as the floating-point operation and you don't want to use the multiplication operator, you can use the exponentiation operator: k**2 == k * k.
Call a web service
It's not multiplying if you don't do it yourself!
require 'wolfram' # https://github.com/cldwalker/wolfram
query = 'Square[5.2]'
result = Wolfram.fetch(query)
Blatant cheating
Finally, if you're feeling really cheap, you could avoid actually employing the literal "*" operation, and use something equivalent:
n = ...
require 'base64'
n.send (Base64.decode64 'Kg==').to_sym, n # => n * n
Didn't use any operation sign.
def square(num)
num.send 42.chr, num
end
Well, the inverse of multiplication is division, so you can get the same result* by dividing by its inverse. That is: square(n) = n / (1.0 / n). Just make sure you don't inadvertently do integer division.
*Technically dividing twice introduces a second opportunity for rounding error in floating-point arithmetic since it performs two operations. So, this will not produce exactly the same result as floating-point multiplication - but this was also not a requirement in the question.
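To compare the approaches suggested in these answers side by side, here is a small sketch. The method names are made up for illustration, the logarithm version assumes a positive input, and the division version assumes a non-zero one:

```ruby
# Square via logarithms: exp(log k + log k). Only valid for k > 0.
def square_by_logs(k)
  Math.exp(Math.log(k) + Math.log(k))
end

# Square via division by the reciprocal. Only valid for k != 0.
def square_by_division(k)
  k / (1.0 / k)
end

# Square via the exponentiation operator.
def square_by_power(k)
  k**2
end

[2, 2.5, 3.5].each do |k|
  puts "k=#{k} logs=#{square_by_logs(k)} division=#{square_by_division(k)} power=#{square_by_power(k)}"
end
```

The logarithm and division results may differ from the exact square in the last few digits, which is the rounding effect described above.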

BigDecimal loses precision after multiplication

I'm getting a strange behaviour with BigDecimal in ruby. Why does this print false?
require 'bigdecimal'
a = BigDecimal.new('100')
b = BigDecimal.new('5.1')
c = a / b
puts c * b == a #false
BigDecimal doesn't claim to have infinite precision, it just provides support for precisions outside the normal floating point ranges:
BigDecimal provides similar support for very large or very accurate floating point numbers.
But BigDecimal values still have a finite number of significant digits, hence the precs method:
precs
Returns an Array of two Integer values.
The first value is the current number of significant digits in the BigDecimal. The second value is the maximum number of significant digits for the BigDecimal.
You can see things starting to go awry if you look at your c:
>> c.to_s
=> "0.19607843137254901960784313725E2"
That's a nice clean rational number but BigDecimal doesn't know that, it is still stuck seeing c as a finite string of digits.
If you use Rational instead, you'll get the results you're expecting:
>> a = Rational(100)
>> b = Rational(51, 10)
>> c = a / b
>> c * b == a
=> true
Of course, this trickery only applies if you are working with Rational numbers so anything fancy (such as roots or trigonometry) is out of bounds.
This is normal behaviour, and not at all strange.
BigDecimal does not guarantee infinite accuracy, it allows you to specify arbitrary accuracy, which is not the same thing. The value 100/5.1 cannot be expressed with complete precision using floating point internal representation. Doesn't matter how many bits are used.
A "big rational" approach could achieve it - but would not give you access to some functions e.g. square roots.
See http://ruby-doc.org/core-1.9.3/Rational.html
# require 'rational' necessary only in Ruby 1.8
a = 100.to_r
b = '5.1'.to_r
c = a / b
c * b == a
# => true

How to implement square root and exponentiation on arbitrary length numbers?

I'm working on new data type for arbitrary length numbers (only non-negative integers) and I got stuck at implementing square root and exponentiation functions (only for natural exponents). Please help.
I store the arbitrary length number as a string, so all operations are made char by char.
Please don't include advice to use a different (existing) library or a way of storing the number other than a string. It's meant to be a programming exercise, not a real-world application, so optimization and performance are not that important.
If you include code in your answer, I would prefer it to be in either pseudo-code or in C++. The important thing is the algorithm, not the implementation itself.
Thanks for the help.
Square root: Babylonian method. I.e.
function sqrt(N):
oldguess = -1
guess = 1
while abs(guess-oldguess) > 1:
oldguess = guess
guess = (guess + N/guess) / 2
return guess
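With integer division, the loop above can stop one step early or late for some inputs (for N = 3 it returns 2). A sketch of the same Babylonian iteration that returns the exact floor, written in Ruby with built-in integers for brevity rather than the string-based digits the question asks about; integer_sqrt is an illustrative name:

```ruby
# Floor of the square root of a non-negative Integer (Babylonian method).
def integer_sqrt(n)
  raise ArgumentError, "n must be non-negative" if n < 0
  return n if n < 2

  # Start from a power of two that is at least sqrt(n).
  guess = 1 << ((n.bit_length + 1) / 2)
  loop do
    better = (guess + n / guess) / 2   # average of guess and n / guess
    return guess if better >= guess    # no longer decreasing: done
    guess = better
  end
end

puts integer_sqrt(3)        # 1
puts integer_sqrt(10**40)   # 100000000000000000000
```

The only operations needed are addition, division, halving and comparison, so the same loop carries over to a string-based number type once those are implemented.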
Exponentiation: by squaring.
function exp(base, pow):
result = 1
bits = toBinary(pow)
for bit in bits:
result = result * result
if (bit):
result = result * base
return result
where toBinary returns a list/array of 1s and 0s, MSB first, for instance as implemented by this Python function:
def toBinary(x):
return map(lambda b: 1 if b == '1' else 0, bin(x)[2:])
Note that if your implementation is done using binary numbers, this can be implemented using bitwise operations without needing any extra memory. If using decimal, then you will need the extra memory to store the binary encoding.
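The binary version can be sketched in Ruby as follows, assuming a non-negative integer exponent. Integer#to_s(2) plays the role of toBinary here, and power_by_squaring is an illustrative name:

```ruby
# Exponentiation by squaring, scanning the exponent's bits MSB first.
def power_by_squaring(base, exponent)
  result = 1
  exponent.to_s(2).each_char do |bit|
    result *= result                # square for every bit
    result *= base if bit == "1"    # multiply in the base for each 1 bit
  end
  result
end

puts power_by_squaring(3, 13)   # 1594323
```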
However, there is a decimal version of the algorithm, which looks something like this:
function exp(base, pow):
lookup = [1, base, base*base, base*base*base, ...] #...up to base^9
#The above line can be optimised using exp-by-squaring if desired
result = 1
digits = toDecimal(pow)
for digit in digits:
result = result^10 * lookup[digit] #result^10 can itself be computed with the binary version above
return result
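In the decimal variant, each new digit multiplies the exponent accumulated so far by ten, so the running result has to be raised to the tenth power before the lookup entry is multiplied in. A Ruby sketch under that reading, assuming a non-negative integer exponent (tenth_power and power_by_decimal_digits are illustrative names):

```ruby
# x**10 using four multiplications: x^2, x^4, x^8, then x^8 * x^2.
def tenth_power(x)
  x2 = x * x
  x4 = x2 * x2
  x8 = x4 * x4
  x8 * x2
end

# Exponentiation that scans the exponent's decimal digits, most significant first.
def power_by_decimal_digits(base, exponent)
  lookup = [1]
  9.times { lookup << lookup.last * base }   # base^0 .. base^9

  result = 1
  exponent.to_s.each_char do |digit|
    result = tenth_power(result) * lookup[digit.to_i]
  end
  result
end

puts power_by_decimal_digits(2, 10)   # 1024
```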
Exponentiation is trivially implemented with multiplication - the most basic implementation is just a loop,
result = 1;
for (int i = 0; i < power; ++i) result *= base;
You can (and should) implement a better version using squaring with divide & conquer - i.e. a^5 = a^4 * a = (a^2)^2 * a.
Square root can be found using Newton's method - you have to get an initial guess (a good one is to take a square root from the highest digit, and to multiply that by base of the digits raised to half of the original number's length), and then to refine it using division: if a is an approximation to sqrt(x), then a better approximation is (a + x / a) / 2. You should stop when the next approximation is equal to the previous one, or to x / a.

How can I randomly iterate through a large Range?

I would like to randomly iterate through a range. Each value will be visited only once and all values will eventually be visited. For example:
class Array
def shuffle
ret = dup
j = length
i = 0
while j > 1
r = i + rand(j)
ret[i], ret[r] = ret[r], ret[i]
i += 1
j -= 1
end
ret
end
end
(0..9).to_a.shuffle.each{|x| f(x)}
where f(x) is some function that operates on each value. A Fisher-Yates shuffle is used to efficiently provide random ordering.
My problem is that shuffle needs to operate on an array, which is not cool because I am working with astronomically large numbers. Ruby will quickly consume a large amount of RAM trying to create a monstrous array. Imagine replacing (0..9) with (0..99**99). This is also why the following code will not work:
tried = {} # store previous attempts
bigint = 99**99
bigint.times {
x = rand(bigint)
redo if tried[x]
tried[x] = true
f(x) # some function
}
This code is very naive and quickly runs out of memory as tried obtains more entries.
What sort of algorithm can accomplish what I am trying to do?
[Edit1]: Why do I want to do this? I'm trying to exhaust the search space of a hash algorithm for a N-length input string looking for partial collisions. Each number I generate is equivalent to a unique input string, entropy and all. Basically, I'm "counting" using a custom alphabet.
[Edit2]: This means that f(x) in the above examples is a method that generates a hash and compares it to a constant, target hash for partial collisions. I do not need to store the value of x after I call f(x) so memory should remain constant over time.
[Edit3/4/5/6]: Further clarification/fixes.
[Solution]: The following code is based on @bta's solution. For the sake of conciseness, next_prime is not shown. It produces acceptable randomness and only visits each number once. See the actual post for more details.
N = size_of_range
Q = ( 2 * N / (1 + Math.sqrt(5)) ).to_i.next_prime
START = rand(N)
x = START
nil until f( x = (x + Q) % N ) == START # assuming f(x) returns x
I just remembered a similar problem from a class I took years ago; that is, iterating (relatively) randomly through a set (completely exhausting it) given extremely tight memory constraints. If I'm remembering this correctly, our solution algorithm was something like this:
Define the range to be from 0 to some number N
Generate a random starting point x[0] inside N
Generate an iterator Q less than N
Generate successive points x[n] by adding Q to the previous point and wrapping around if needed. That is, x[n+1] = (x[n] + Q) % N
Repeat until you generate a new point equal to the starting point.
The trick is to find an iterator that will let you traverse the entire range without generating the same value twice. If I'm remembering correctly, any relatively prime N and Q will work (the closer the number to the bounds of the range the less 'random' the input). In that case, a prime number that is not a factor of N should work. You can also swap bytes/nibbles in the resulting number to change the pattern with which the generated points "jump around" in N.
This algorithm only requires the starting point (x[0]), the current point (x[n]), the iterator value (Q), and the range limit (N) to be stored.
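The steps above can be sketched in Ruby as follows. The method name each_full_cycle is made up for illustration, and the coprimality check uses Integer#gcd:

```ruby
# Visit every value in 0...n exactly once by repeatedly adding a step
# that is coprime to n. Only n, step and the current value are stored.
def each_full_cycle(n, step, start = rand(n))
  raise ArgumentError, "step must be coprime to n" unless n.gcd(step) == 1
  return enum_for(__method__, n, step, start) unless block_given?

  x = start % n
  n.times do
    yield x
    x = (x + step) % n
  end
end

p each_full_cycle(10, 7, 3).to_a   # [3, 0, 7, 4, 1, 8, 5, 2, 9, 6]
```

Because the step and n share no common factor, the sequence cannot return to its starting point before all n values have been produced.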
Perhaps someone else remembers this algorithm and can verify if I'm remembering it correctly?
As @Turtle answered, your problem doesn't have a solution. The solutions by @KandadaBoggu and @bta give you numbers in some ranges which may or may not be random. You get clusters of numbers.
But I don't know why you care about the same number occurring twice. If (0..99**99) is your range, and you could generate 10^10 random numbers per second (say you have a 3 GHz processor with about 4 cores, each generating one random number per CPU cycle, which is impossible, and Ruby will slow it down a lot more), then it would take about 10^180 years to exhaust all the numbers. You also have a probability of about 10^-180 that two identical numbers will be generated during a whole year. Our universe is probably about 10^10 years old, so if your computer had started calculating when time began, you would have a probability of about 10^-170 that two identical numbers were generated. In other words, it is practically impossible and you don't have to care about it.
Even if you used Jaguar (top 1 from www.top500.org supercomputers) with only this one task, you would still need 10^174 years to get all the numbers.
If you don't believe me, try
tried = {} # store previous attempts
bigint = 99**99
bigint.times {
x = rand(bigint)
puts "Oh, no!" if tried[x]
tried[x] = true
}
I'll buy you a beer if you will even once see "Oh, no!" on your screen during your life time :)
I could be wrong, but I don't think this is doable without storing some state. At the very least, you're going to need some state.
Even if you only use one bit per value (has this value been tried, yes or no), you will need X/8 bytes of memory to store the result (where X is the largest number). Assuming that you have 2GB of free memory, this would leave you with room for more than 16 billion numbers.
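The one-bit-per-value arithmetic can be checked with a few lines of Ruby, taking 2GB to mean 2 * 1024**3 bytes (an assumption about the unit):

```ruby
BITS_PER_BYTE = 8

free_bytes = 2 * 1024**3                 # 2GB of free memory
trackable  = free_bytes * BITS_PER_BYTE  # values that fit at one bit each
puts trackable                           # 17179869184

# Bytes needed to track every value in 0..99**99 at one bit each.
needed_bytes = (99**99) / BITS_PER_BYTE
puts needed_bytes.to_s.length            # number of decimal digits in that byte count
```

The second figure is a number with well over a hundred digits, which is why a per-value bitmap is not an option for the range in the question.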
Break the range in to manageable batches as shown below:
def range_walker range, batch_size = 100
size = (range.end - range.begin) + 1
n = size/batch_size
n.times do |i|
x = i * batch_size + range.begin
y = x + batch_size
(x...y).sort_by{rand}.each{|z| p z}
end
d = (range.end - size%batch_size + 1)
(d..range.end).sort_by{rand}.each{|z| p z }
end
You can further randomize solution by randomly choosing the batch for processing.
PS: This is a good problem for map-reduce. Each batch can be worked by independent nodes.
Reference:
Map-reduce in Ruby
you can randomly iterate an array with shuffle method
a = [1,2,3,4,5,6,7,8,9]
a.shuffle!
=> [5, 2, 8, 7, 3, 1, 6, 4, 9]
You want what's called a "full cycle iterator"...
Here is pseudocode for the simplest version, which is perfect for most uses...
function fullCycleStep(sample_size, last_value, random_seed = 31337, prime_number = 32452843) {
if last_value = null then last_value = random_seed % sample_size
return (last_value + prime_number) % sample_size
}
If you call this like so:
sample = 10
For i = 1 to sample
last_value = fullCycleStep(sample, last_value)
print last_value
next
It would generate random numbers, looping through all 10, never repeating. If you change random_seed, which can be anything, or prime_number, which must be greater than sample_size and not evenly divisible by it, you will get a new random order, but you will still never get a duplicate.
Database systems and other large-scale systems do this by writing the intermediate results of recursive sorts to a temp database file. That way, they can sort massive numbers of records while only keeping limited numbers of records in memory at any one time. This tends to be complicated in practice.
How "random" does your order have to be? If you don't need a specific input distribution, you could try a recursive scheme like this to minimize memory usage:
def gen_random_indices
# Assume your input range is (0...(10**3)), i.e. three decimal digits
(0..9).sort_by{rand}.each do |a|
(0..9).sort_by{rand}.each do |b|
(0..9).sort_by{rand}.each do |c|
yield "#{a}#{b}#{c}".to_i
end
end
end
end
gen_random_indices do |idx|
run_test_with_index(idx)
end
Essentially, you are constructing the index by randomly generating one digit at a time. In the worst-case scenario, this will require enough memory to store 10 * (number of digits). You will encounter every number in the range (0...(10**3)) exactly once, but the order is only pseudo-random. That is, if the first loop sets a=1, then you will encounter all three-digit numbers of the form 1xx before you see the hundreds digit change.
The other downside is the need to manually construct the function to a specified depth. In your (0..(99**99)) case, this would likely be a problem (although I suppose you could write a script to generate the code for you). I'm sure there's probably a way to re-write this in a state-ful, recursive manner, but I can't think of it off the top of my head (ideas, anyone?).
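One possible recursive rewrite, offered as a sketch rather than the answer's own code: the number of digits becomes a parameter and each level of recursion shuffles one digit position. The name each_random_index and the prefix parameter are made up for illustration:

```ruby
# Yield every integer with the given number of decimal digits (leading
# zeros allowed) exactly once, choosing each digit position in random order.
def each_random_index(digits, prefix = 0, &block)
  return enum_for(__method__, digits, prefix) unless block

  if digits.zero?
    block.call(prefix)
    return
  end

  (0..9).to_a.shuffle.each do |d|
    each_random_index(digits - 1, prefix * 10 + d, &block)
  end
end

each_random_index(2) { |idx| print idx, " " }
puts
```

Memory use stays at one shuffled array of ten digits per recursion level, matching the 10 * (number of digits) estimate above, and the ordering has the same clustering by leading digit.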
[Edit]: Taking into account @klew and @Turtle's answers, the best I can hope for is batches of random (or close to random) numbers.
This is a recursive implementation of something similar to KandadaBoggu's solution. Basically, the search space (as a range) is partitioned into an array containing N equal-sized ranges. Each range is fed back in a random order as a new search space. This continues until the size of the range hits a lower bound. At this point the range is small enough to be converted into an array, shuffled, and checked.
Even though it is recursive, I haven't blown the stack yet. Instead, it errors out when attempting to partition a search space larger than about 10^19 keys. It has to do with the numbers being too large to convert to a long. It can probably be fixed:
# partition a range into an array of N equal-sized ranges
def partition(range, n)
ranges = []
first = range.first
last = range.last
length = last - first + 1
step = length / n # integer division
((first + step - 1)..last).step(step) { |i|
ranges << (first..i)
first = i + 1
}
# append any extra onto the last element
ranges[-1] = (ranges[-1].first)..last if last > step * ranges.length
ranges
end
I hope the code comments help shed some light on my original question.
pastebin: full source
Note: PW_LEN under # options can be changed to a lower number in order to get quicker results.
For a prohibitively large space, like
space = -10..1000000000000000000000
You can add this method to Range.
class Range
M127 = 170_141_183_460_469_231_731_687_303_715_884_105_727
def each_random(seed = 0)
return to_enum(__method__) { size } unless block_given?
unless first.kind_of? Integer
raise TypeError, "can't randomly iterate from #{first.class}"
end
sample_size = self.end - first + 1
sample_size -= 1 if exclude_end?
j = coprime sample_size
v = seed % sample_size
each do
v = (v + j) % sample_size
yield first + v
end
end
protected
def gcd(a,b)
b == 0 ? a : gcd(b, a % b)
end
def coprime(a, z = M127)
gcd(a, z) == 1 ? z : coprime(a, z + 1)
end
end
You could then
space.each_random { |i| puts i }
729815750697818944176
459631501395637888351
189447252093456832526
919263002791275776712
649078753489094720887
378894504186913665062
108710254884732609237
838526005582551553423
568341756280370497598
298157506978189441773
27973257676008385948
757789008373827330134
487604759071646274309
217420509769465218484
947236260467284162670
677052011165103106845
406867761862922051020
136683512560740995195
866499263258559939381
596315013956378883556
326130764654197827731
55946515352016771906
785762266049835716092
515578016747654660267
...
With a good amount of randomness so long as your space is a few orders smaller than M127.
Credit to @nick-steele and @bta for the approach.
This isn't really a Ruby-specific answer but I hope it's permitted. Andrew Kensler gives a C++ "permute()" function that does exactly this in his "Correlated Multi-Jittered Sampling" report.
As I understand it, the exact function he provides really only works if your "array" is up to size 2^27, but the general idea could be used for arrays of any size.
I'll do my best to sort of explain it. The first part is you need a hash that is reversible "for any power-of-two sized domain". Consider x = i + 1. No matter what x is, even if your integer overflows, you can determine what i was. More specifically, you can always determine the bottom n-bits of i from the bottom n-bits of x. Addition is a reversible hash operation, as is multiplication by an odd number, as is doing a bitwise xor by a constant. If you know a specific power-of-two domain, you can scramble bits in that domain. E.g. x ^= (x & 0xFFFF) >> 5 is valid for the 16-bit domain. You can specify that domain with a mask, e.g. mask = 0xFFFF, and your hash function becomes x = hash(i, mask). Of course you can add a "seed" value into that hash function to get different randomizations. Kensler lays out more valid operations in the paper.
So you have a reversible function x = hash(i, mask, seed). The problem is that if you hash your index, you might end up with a value that is larger than your array size, i.e. your "domain". You can't just modulo this or you'll get collisions.
The reversible hash is the key to using a technique called "cycle walking", introduced in "Ciphers with Arbitrary Finite Domains". Because the hash is reversible (i.e. 1-to-1), you can just repeatedly apply the same hash until your hashed value is smaller than your array! Because you're applying the same hash, and the mapping is one-to-one, whatever value you end up on will map back to exactly one index, so you don't have collisions. So your function could look something like this for 16-bit integers (pseudocode):
fun permute(i, length, seed) {
i = hash(i, 0xFFFF, seed)
while(i >= length): i = hash(i, 0xFFFF, seed)
return i
}
It could take a lot of hashes to get to your domain, so Kensler does a simple trick: he keeps the hash within the domain of the next power of two, which makes it require very few iterations (~2 on average), by masking out the unnecessary bits. The final algorithm looks like this:
fun next_pow_2(length) {
# This implementation is for clarity.
# See Kensler's paper for one way to do it fast.
p = 1
while (p < length): p *= 2
return p
}
fun permute(i, length, seed) {
mask = next_pow_2(length)-1
i = hash(i, mask, seed) & mask
while(i >= length): i = hash(i, mask, seed) & mask
return i
}
And that's it! Obviously the important thing here is choosing a good hash function, which Kensler provides in the paper but I wanted to break down the explanation. If you want to have different random permutations each time, you can add a "seed" value to the permute function which then gets passed to the hash function.
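For readers who want something runnable, here is a Ruby sketch of the cycle-walking idea. The toy_hash function is a stand-in built from the reversible operations listed earlier (add a constant, multiply by an odd number, xor with a right shift); it is an assumption for illustration and not the hash from Kensler's report:

```ruby
# Smallest power of two that is >= length.
def next_pow_2(length)
  p = 1
  p *= 2 while p < length
  p
end

# Toy reversible hash on the domain 0..mask (NOT Kensler's hash).
# Each step is a bijection on that domain, so the whole function is too.
def toy_hash(x, mask, seed)
  x = (x + seed) & mask          # addition modulo a power of two
  x = (x * 0x9E3779B1) & mask    # multiplication by an odd constant
  x ^= x >> 3                    # xor with a right-shifted copy
  x & mask
end

# Map index i in 0...length to a unique value in 0...length.
def permute(i, length, seed)
  mask = next_pow_2(length) - 1
  i = toy_hash(i, mask, seed)
  i = toy_hash(i, mask, seed) while i >= length   # cycle walking
  i
end

p (0...10).map { |i| permute(i, 10, 42) }.sort   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The mixing quality of this stand-in is poor compared with a carefully designed hash, but it is enough to show that every index lands on a distinct value without storing any state.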
