I have the following program and, below it, an input data file that contains 10 lines of data. I want to read this data in random order rather than sequentially: for example, it might read line 3 and then line 5, instead of 1, 2, 3, 4 and so on. I then want to print these numbers in random order.
program rand
implicit none
integer::i, ok
real(kind=8) , allocatable , dimension(:):: s
integer, parameter:: nstep = 1, natom = 10
integer:: seed, rand
open(unit=2,file="fort.2",status="old",action="read")
allocate(s(natom),stat=ok)
if(ok/=0)then
print*,"problem allocating position array"
end if
do i=1,natom
read(2,*)s(i)
print*,i=(rand(seed))
end do
end program rand
Input file:
1.004624
1.008447
1.028897
1.001287
0.9994195
1.036111
0.9829285
1.029622
1.005867
0.9372157
As suggested by @IanBush in a comment, and also by @Sazzad in his answer, a reasonable approach is to read the whole file into an array, as your program is already doing. However, simply shuffling does not seem to me to produce random printing; it just produces a new order. That is why I am proposing this solution.
Random means that the same number can be printed many times while others are not printed at all, if the number of prints is limited. As far as I can see, your problem is how to select randomly. Since you have shown some effort, here is a modified version of your program:
program rand
implicit none
integer::i, ok, idx
real(kind=8) , allocatable , dimension(:):: s
integer, parameter:: nstep = 1, natom = 10
integer:: seed!, rand
real(kind = 8) :: randNum
!
!
open(unit=2,file="fort.2",status="old",action="read")
!
!
allocate(s(natom),stat=ok)
if(ok/=0)then
print*,"problem allocating position array"
end if
!
do i=1,natom
read(2,*)s(i)
!print*,i=(rand(seed))
end do
!
CALL random_seed() ! Initialize a pseudo-random number sequence
! to the default state. For serious program, do not use the default
! use for example the program on the website of gnu fortran
! https://gcc.gnu.org/onlinedocs/gfortran/RANDOM_005fSEED.html
!
do i=1,natom !you can and should change natom here to something else
CALL random_number(randNum)
idx = int(randNum*natom) + 1
print*,'element at ',idx,': ', s(idx)
end do
end program rand
The difference is that the printing is commented out in your original loop and there is a new loop that prints randomly. You will see that some numbers are printed more than once. To give each number a chance to be printed, you should use a large number of iterations in the printing loop.
In this answer I used the default seed for the random number generator, which is not a good idea. On the GNU Fortran website (the link in the code comment above) you can find a good approach to initializing the random seed. Doing so is a good programming habit if reproducibility is not a concern.
The general algorithm looks like this:
Read all (or N) lines from the file into lines[N]
Create an array index[N] = {1, 2, ... N}
Shuffle the index array with a simple shuffle algorithm
Traverse index[i] for each i up to N and output lines[index[i]]
You will have to translate it into your language yourself.
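The steps above can be sketched in Python as follows. This is only an illustration under assumptions: the function name `shuffled_order` and the sample values are made up for the example, and the standard library's `random.shuffle` stands in for the "simple shuffle algorithm".

```python
import random

def shuffled_order(lines, rng=None):
    """Return the items of `lines` in a random order, each exactly once."""
    if rng is None:
        rng = random.Random()
    index = list(range(len(lines)))   # index = [0, 1, ..., N-1]
    rng.shuffle(index)                # shuffle the index array in place
    return [lines[i] for i in index]  # output lines[index[i]], not lines[i]

# Hypothetical sample data, taken from the first lines of the input file.
values = [1.004624, 1.008447, 1.028897, 1.001287, 0.9994195]
for v in shuffled_order(values):
    print(v)
```

Shuffling the index array rather than the data keeps the original array intact, which helps if the original line numbers are still needed afterwards.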
My question is twofold:
1) As far as I understand, constructs like for loops introduce scope blocks; however, I'm having some trouble with a variable that is defined outside of said construct. The following code depicts an attempt to extract digits from a number and place them in an array.
n = 654068
l = length(n)
a = Int64[]
for i in 1:(l-1)
temp = n/10^(l-i)
if temp < 1 # ith digit is 0
a = push!(a,0)
else # ith digit is != 0
push!(a,floor(temp))
# update n
n = n - a[i]*10^(l-i)
end
end
# last digit
push!(a,n)
The code executes fine, but when I look at the a array I get this result
julia> a
0-element Array{Int64,1}
I thought that anything that goes on inside the for loop is invisible to the outside, unless I'm operating on variables defined outside the for loop. Moreover, I thought that by using the ! syntax I would operate directly on a, this does not seem to be the case. Would be grateful if anyone can explain to me how this works :)
2) The second question is about the syntax used when explaining functions. There is apparently a function called digits that extracts digits from a number and puts them in an array. Using the help function I get
julia> help(digits)
Base.digits(n[, base][, pad])
Returns an array of the digits of "n" in the given base,
optionally padded with zeros to a specified size. More significant
digits are at higher indexes, such that "n ==
sum([digits[k]*base^(k-1) for k=1:length(digits)])".
Can anyone explain to me how to interpret the information given about functions in Julia? How am I to interpret digits(n[, base][, pad])? How does one correctly call the digits function? It can't be like this: digits(40125[, 10])?
I'm unable to reproduce your result; running your code gives me
julia> a
1-element Array{Int64,1}:
654068
There are a few mistakes and inefficiencies in the code:
length(n) doesn't give the number of digits in n, but always returns 1 (currently, numbers are iterable and return a sequence that contains only one number: itself). So the for loop is never run.
/ between integers does floating point division. For extracting digits, you're better off with div(x,y), which does integer division.
There's no reason to write a = push!(a,x), since push! modifies a in place. So it will be equivalent to writing push!(a,x); a = a.
There's no reason to treat digits that are zero specially; they are handled just fine by the general case.
Your description of scoping in Julia seems to be correct, I think that it is the above which is giving you trouble.
You could use something like
n = 654068
a = Int64[]
while n != 0
push!(a, n % 10)
n = div(n, 10)
end
reverse!(a)
This loop extracts the digits in opposite order to avoid having to figure out the number of digits in advance, and uses the modulus operator % to extract the least significant digit. It then uses reverse! to get them in the order you wanted, which should be pretty efficient.
About the documentation for digits, [, base] just means that base is an optional parameter. The description should probably be digits(n[, base[, pad]]), since it's not possible to specify pad unless you specify base. Also note that digits will return the least significant digit first, what we get if we remove the reverse! from the code above.
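For readers who want to check the digit-extraction logic outside Julia, here is a minimal Python sketch of the same modulus-and-divide loop. The function name `digits_msd_first` is hypothetical, and unlike the Julia loop above it returns `[0]` for an input of 0; that special case is an assumption added for the example.

```python
def digits_msd_first(n):
    """Decimal digits of a non-negative integer, most significant first."""
    if n == 0:
        return [0]            # assumption: treat 0 as the single digit 0
    out = []
    while n != 0:
        n, d = divmod(n, 10)  # integer division and remainder in one step
        out.append(d)         # the least significant digit comes out first
    out.reverse()             # same role as reverse!(a) in the Julia version
    return out

print(digits_msd_first(654068))  # [6, 5, 4, 0, 6, 8]
```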
Is this cheating?:
n = 654068
nstr = string(n)
a = map((x) -> x |> string |> int , collect(nstr))
outputs:
6-element Array{Int64,1}:
6
5
4
0
6
8
To make sure that this is not a duplicate, I have already checked this and this out.
I want to generate random numbers in a specific range including step size (not continuous distribution).
For example, I want to generate random numbers between -2 and 3 in which the step between two consecutive numbers is 0.02 (e.g. [-2 -1.98 -1.96 ... 2.96 2.98 3], so a generated number should be 2.96, not 2.95).
I have tried this:
a=-2*100;
b=3*100;
r = (b-a).*rand(5,1) + a;
for i=1:length(r)
if r(i) >= 0
if mod(fix(r(i)),2)
r(i)=ceil(r(i))/100;
else
r(i)=floor(r(i))/100;
end
else
if mod(fix(r(i)),2)
r(i)=floor(r(i))/100;
else
r(i)=ceil(r(i))/100;
end
end
end
and it works.
There is an alternative way to do this in MATLAB, which is:
y = datasample(-2:0.02:3,5,'Replace',false)
I want to know:
How can I make my own implementation faster (improve the performance)?
If the second method is faster (it looks faster to me), how can I use a similar implementation in C++?
Those previous answers do cover your case if you read carefully. For example, this one produces random numbers between limits with a step size of one. But let's generalize this to an arbitrary step size in case you can't figure out how to get there. There are several different ways. Here's one using randi, where we use the default step size of one and the range from one to the number of possible values as indices:
lo = 2;
hi = 3;
step = 0.02;
v = lo:step:hi;
r = v(randi(length(v),[5 1]))
If you look inside datasample (type edit datasample in your command window to view the code) you'll see that it's doing something very similar to this answer. In the case of the 'Replace' option being true see around line 135 (in R2013a at least).
If the 'Replace' option is false, as in your use of datasample above, then randperm actually needs to be used instead (see around line 159):
lo = 2;
hi = 3;
step = 0.02;
v = lo:step:hi;
r = v(randperm(length(v),51))
Because there is no replacement in this case, 51 is the maximum number of values that can be requested in a call and all values of r will be unique.
In C++ you should not use rand() if you're doing scientific computing and generating large numbers of random variates. Instead you should use a large-period random number generator such as Mersenne Twister (the default in Matlab). C++11 includes a version of this generator as part of <random>. More here on rand(). If you want something fast, you should try the Double precision SIMD-oriented Fast Mersenne Twister. You'll have to ask another question if you want to implement your code in C++.
The distribution you want is a simple transform of integers, so how about:
step = 0.02
r = randi([-2 3] / step, [5, 1]) * step;
In C++, rand() generates integers too, so it should be pretty obvious how to take a similar approach there.
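The "transform of integers" idea can be sketched in Python as follows: draw a random integer grid position, then scale it back to the value. The function name `random_on_grid` and the rounding to 10 decimal places are assumptions made for the example; the rounding only hides floating-point noise.

```python
import random

def random_on_grid(lo, hi, step, count, rng=None):
    """Draw `count` values (with replacement) from lo, lo+step, ..., hi."""
    if rng is None:
        rng = random.Random()
    n_steps = round((hi - lo) / step)  # number of intervals on the grid
    # Pick an integer grid position in [0, n_steps], then map it to a value.
    return [round(lo + rng.randint(0, n_steps) * step, 10)
            for _ in range(count)]

print(random_on_grid(-2, 3, 0.02, 5))
```

Because the randomness lives entirely in the integer draw, every grid point (including both end points) is equally likely, which is not the case when a continuous value is rounded to the nearest step.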
I have data with approximately a million records; each record has 6 floating-point numbers. I want to find sets of records that share identical values for all six, and ideally I want to do it in Fortran since the rest of the processing is done in Fortran. What would be the recommended approach for this? In the end I want a mapping from the original index to a new index into a condensed version of the dataset without duplicates. Each record has other attributes, and I am interested in aggregating those for groups based on the six attributes.
I tried to find those sets by exporting the output as CSV and importing it into MS Access; a query that finds those sets then took 10 seconds or so to run. I wrote code that does this (http://rosettacode.org/wiki/Remove_duplicate_elements#Fortran, a "linear search"?), but with a million records it had not completed after 10 minutes or so, so I abandoned this approach.
The approach I am thinking of now is adapting a ranking/sorting routine from SLATEC or ORDERPACK, which I assume would do better than my crude code. But I am wondering whether such a thing has already been done and can be downloaded, or whether there is a better approach.
EDIT:
I said "finding duplicates", but I actually need a mapping from the original data records to this reduced set. I want a mapping array like imap(1:n), where imap(1), imap(4) and imap(5) have the same value if the 6 floating-point values in original records 1, 4 and 5 are the same. I hope this is not too much of a deviation from what I said originally...
This is what I ended up doing... I took code mrgrnk from ORDERPACK , and adapted for my purpose. The subroutine findmap below appears to be doing what I wanted it to do.
module fndmap
use m_mrgrnk, only:mrgrnk
implicit none
contains
subroutine findmap(stkprm, stkmap )
! given 2-d real array stkprm, find a mapping described below:
!
! (identical records are assigned with same index)
! stkmap(i) == stkmap(j) iff stkprm(:,i) == stkprm(:,j)
! (order conserved)
! if i < j and stkmap(i) /= stkmap(j), then stkmap(i) < stkmap(j)
! (new index are contiguous)
! set(stkmap) == {1,2,..,maxval(stkmap)}
!
real,dimension(:,:),intent(in) :: stkprm
integer,dimension(:), intent(out) :: stkmap
integer, dimension(size(stkprm,2)) :: irngt
integer, dimension(size(stkprm,2)) :: iwork
integer :: nrec, i, j
nrec = size(stkprm,2)
! find rank of each record, duplicate records kept
call ar_mrgrnk(stkprm, irngt)
! construct iwork array, which holds, for each record, the smallest
! index in the original array among the records identical to it
i = 1
do while(i<=nrec)
do j=i+1,nrec
if (any(stkprm(:,irngt(i))/=stkprm(:,irngt(j)))) exit
enddo
iwork(irngt(i:j-1)) = minval(irngt(i:j-1))
i = j
enddo
! now construct the map, where stkmap(i) shows index of new array
! with duplicated record eliminated, original order kept
j = 0
do i=1,nrec
if (i==iwork(i)) then
j = j+1
stkmap(i) = j
else
stkmap(i) = stkmap(iwork(i))
endif
enddo
end subroutine
recursive subroutine ar_mrgrnk(xdont, irngt)
! behaves like mrgrnk of ORDERPACK, except that array is 2-d
! each row are ranked by first field, then second and so on
real, dimension(:,:), intent(in) :: xdont
integer, dimension(:), intent(out), target :: irngt
integer, dimension(size(xdont,2)) :: iwork
integer :: nfld,nrec
integer :: i, j
integer, dimension(:), pointer :: ipt
nfld=size(xdont,1)
nrec=size(xdont,2)
! rank by the first field
call mrgrnk(xdont(1,:), irngt)
! if there's only one field, it's done
if (nfld==1) return
! examine the rank to see if multiple record has identical
! values for the first field
i = 1
do while(i<=nrec)
do j=i+1,nrec
if (xdont(1,irngt(i))/=xdont(1,irngt(j))) exit
enddo
! if one-to-one, do nothing
if (j-1>i) then
! if many-to-one,
! gather those many, and rank them
call ar_mrgrnk(xdont(2:,irngt(i:j-1)),iwork)
! rearrange my rank based on those fields to the right
ipt => irngt(i:j-1)
ipt = ipt(iwork(1:j-i))
endif
i = j
enddo
if(associated(ipt)) nullify(ipt)
end subroutine
end module
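To check the output of findmap on small inputs, the mapping it documents (identical records share an index, first-appearance order is kept, indices are contiguous) can be reproduced with a short Python sketch. Note that this swaps in a different technique: it uses a hash table keyed on the record instead of the sort-and-rank approach of the Fortran code, and the name `find_map` and the sample data are made up for the example.

```python
def find_map(records):
    """Map each record to a 1-based group index.

    Identical records share an index; indices are contiguous and are
    assigned in order of first appearance.
    """
    first_seen = {}       # record tuple -> group index
    mapping = []
    for rec in records:
        key = tuple(rec)  # exact comparison of all fields, as in the Fortran code
        if key not in first_seen:
            first_seen[key] = len(first_seen) + 1
        mapping.append(first_seen[key])
    return mapping

# Hypothetical two-field records; records 1 and 3 match, as do 2 and 5.
data = [(1.0, 2.0), (3.0, 4.0), (1.0, 2.0), (5.0, 6.0), (3.0, 4.0)]
print(find_map(data))  # [1, 2, 1, 3, 2]
```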
How does the value produced by the rand function depend on its seed value? When we do not define any seed, how do its values differ?
Below is some code that I found for generating numbers for an integer array. Can anyone please explain it?
#!/usr/bin/perl -w
# Linear search of an array
# Note that if you later on want to search for something from a
# list of values, you shouldn’t have used an array in the first
# place.
# Generating 10 integers
$NUM = 10;
$MAXINT = 100; # 1 + the maximum integer generated
srand(); # initialize the randomize seed
print "Numbers Generated:\n(";
for $i (1 .. $NUM) {
push @array, sprintf("%d", rand(1) * $MAXINT);
print $array[$i-1];
print ", " unless ($i == $NUM);
}
print ")\n\n";
You don't need to explicitly call srand; it will be implicitly done for you the first time you call rand if you haven't previously called srand.
srand with no parameters will try to initialize the random number generator to a, err, random state. It uses /dev/urandom or the like if available and otherwise falls back on a value calculated from the current time and pid.
rand() with no parameters returns a floating point value between 0 (inclusive) and 1 (exclusive). Multiplying that by some integer number gives a floating point value from >= 0 and < that integer. Using that in integer context (such as a '%d' format value) gives you an integer from 0 to one less than your multiplier. rand(x), for x other than 0, returns the same range of random numbers that x * rand() would have. So rand(1) is equivalent to just rand(), and rand(1) * $MAXINT could have just been rand($MAXINT).
As far as I know perl uses the pseudo-random number generation functions of the standard C library.
It may depend on the implementation but it usually is a Linear Congruential Generator. This kind of PRNG uses its previous value to generate the next, therefore it will need a start value aka the seed.
The value of initializing with a chosen seed is that you get the same pseudo-random numbers every time. That way you can keep random-based calculations repeatable, e.g. to compare how different algorithms perform on a fixed set.
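The repeatability that a fixed seed gives can be demonstrated with a small Python sketch. This is an illustration only: the function name `draw` is hypothetical, and Python's `random` module uses Mersenne Twister rather than a linear congruential generator, but the role of the seed is the same.

```python
import random

def draw(seed, count=5, maxint=100):
    """Return `count` integers in [0, maxint) from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator with an explicit seed
    # Same idea as the Perl code: scale a float in [0, 1) and truncate.
    return [int(rng.random() * maxint) for _ in range(count)]

print(draw(42) == draw(42))  # True: same seed, same sequence
```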
I would like to randomly iterate through a range. Each value will be visited only once and all values will eventually be visited. For example:
class Array
def shuffle
ret = dup
j = length
i = 0
while j > 1
r = i + rand(j)
ret[i], ret[r] = ret[r], ret[i]
i += 1
j -= 1
end
ret
end
end
(0..9).to_a.shuffle.each{|x| f(x)}
where f(x) is some function that operates on each value. A Fisher-Yates shuffle is used to efficiently provide random ordering.
My problem is that shuffle needs to operate on an array, which is not cool because I am working with astronomically large numbers. Ruby will quickly consume a large amount of RAM trying to create a monstrous array. Imagine replacing (0..9) with (0..99**99). This is also why the following code will not work:
tried = {} # store previous attempts
bigint = 99**99
bigint.times {
x = rand(bigint)
redo if tried[x]
tried[x] = true
f(x) # some function
}
This code is very naive and quickly runs out of memory as tried obtains more entries.
What sort of algorithm can accomplish what I am trying to do?
[Edit1]: Why do I want to do this? I'm trying to exhaust the search space of a hash algorithm for an N-length input string, looking for partial collisions. Each number I generate is equivalent to a unique input string, entropy and all. Basically, I'm "counting" using a custom alphabet.
[Edit2]: This means that f(x) in the above examples is a method that generates a hash and compares it to a constant, target hash for partial collisions. I do not need to store the value of x after I call f(x) so memory should remain constant over time.
[Edit3/4/5/6]: Further clarification/fixes.
[Solution]: The following code is based on @bta's solution. For the sake of conciseness, next_prime is not shown. It produces acceptable randomness and only visits each number once. See the actual post for more details.
N = size_of_range
Q = ( 2 * N / (1 + Math.sqrt(5)) ).to_i.next_prime
START = rand(N)
x = START
nil until f( x = (x + Q) % N ) == START # assuming f(x) returns x
I just remembered a similar problem from a class I took years ago; that is, iterating (relatively) randomly through a set (completely exhausting it) given extremely tight memory constraints. If I'm remembering this correctly, our solution algorithm was something like this:
Define the range to be from 0 to some number N
Generate a random starting point x[0] inside N
Generate an iterator Q less than N
Generate successive points x[n] by adding Q to the previous point and wrapping around if needed. That is, x[n+1] = (x[n] + Q) % N
Repeat until you generate a new point equal to the starting point.
The trick is to find an iterator that will let you traverse the entire range without generating the same value twice. If I'm remembering correctly, any relatively prime N and Q will work (the closer the number to the bounds of the range the less 'random' the input). In that case, a prime number that is not a factor of N should work. You can also swap bytes/nibbles in the resulting number to change the pattern with which the generated points "jump around" in N.
This algorithm only requires the starting point (x[0]), the current point (x[n]), the iterator value (Q), and the range limit (N) to be stored.
Perhaps someone else remembers this algorithm and can verify if I'm remembering it correctly?
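The stepping scheme described in this answer can be sketched in Python as follows. The function name `full_cycle` and the choice Q = 7 for N = 10 are assumptions for illustration; the sketch enforces the "relatively prime" condition explicitly with `math.gcd`.

```python
import math
import random

def full_cycle(n, q, start):
    """Yield every integer in [0, n) exactly once by repeatedly adding q mod n."""
    if math.gcd(n, q) != 1:
        raise ValueError("q must be relatively prime to n")
    x = start % n
    for _ in range(n):
        yield x
        x = (x + q) % n  # x[n+1] = (x[n] + Q) % N

n = 10
order = list(full_cycle(n, q=7, start=random.randrange(n)))
print(order)
```

Only `n`, `q` and the current `x` are held in memory, so the same loop works for ranges far too large to store as an array. The order is a fixed stride, though, so it is a full traversal rather than a statistically random permutation.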
As @Turtle answered, your problem doesn't have a solution. The solutions from @KandadaBoggu and @bta give you numbers in some ranges, which may or may not be random. You get clusters of numbers.
But I don't know why you care about double occurrence of the same number. If (0..99**99) is your range, then even if you could generate 10^10 random numbers per second (say a 3 GHz processor with about 4 cores, each generating one random number per CPU cycle, which is impossible, and Ruby would slow it down a lot), it would take about 10^180 years to exhaust all the numbers. You also have a probability of about 10^-180 that two identical numbers will be generated during a whole year. Our universe is probably about 10^10 years old, so if your computer had started calculating when time began, you would have a probability of about 10^-170 that two identical numbers had been generated. In other words, it is practically impossible and you don't have to care about it.
Even if you used Jaguar (number 1 on the www.top500.org supercomputer list) for only this one task, you would still need 10^174 years to get all the numbers.
If you don't believe me, try
tried = {} # store previous attempts
bigint = 99**99
bigint.times {
x = rand(bigint)
puts "Oh, no!" if tried[x]
tried[x] = true
}
I'll buy you a beer if you ever see "Oh, no!" on your screen during your lifetime :)
I could be wrong, but I don't think this is doable without storing some state. At the very least, you're going to need some state.
Even if you only use one bit per value (has this value been tried, yes or no), you will need X/8 bytes of memory to store the result (where X is the largest number). Assuming that you have 2GB of free memory, this would let you track only about 16 billion numbers.
Break the range into manageable batches as shown below:
def range_walker range, batch_size = 100
size = (range.end - range.begin) + 1
n = size/batch_size
n.times do |i|
x = i * batch_size + range.begin
y = x + batch_size
(x...y).sort_by{rand}.each{|z| p z}
end
d = (range.end - size%batch_size + 1)
(d..range.end).sort_by{rand}.each{|z| p z }
end
You can further randomize the solution by randomly choosing the batch for processing.
PS: This is a good problem for map-reduce. Each batch can be worked by independent nodes.
Reference:
Map-reduce in Ruby
You can randomly iterate over an array with the shuffle method:
a = [1,2,3,4,5,6,7,8,9]
a.shuffle!
=> [5, 2, 8, 7, 3, 1, 6, 4, 9]
You want what's called a "full cycle iterator"...
Here is pseudocode for the simplest version, which is perfect for most uses...
function fullCycleStep(sample_size, last_value, random_seed = 31337, prime_number = 32452843) {
if last_value = null then last_value = random_seed % sample_size
return (last_value + prime_number) % sample_size
}
If you call this like so:
sample = 10
For i = 1 to sample
last_value = fullCycleStep(sample, last_value)
print last_value
next
It would generate random numbers, looping through all 10 and never repeating. If you change random_seed, which can be anything, or prime_number, which must be a prime greater than sample_size (so that the two share no common factor), you will get a new random order, but you will still never get a duplicate.
Database systems and other large-scale systems do this by writing the intermediate results of recursive sorts to a temp database file. That way, they can sort massive numbers of records while only keeping limited numbers of records in memory at any one time. This tends to be complicated in practice.
How "random" does your order have to be? If you don't need a specific input distribution, you could try a recursive scheme like this to minimize memory usage:
def gen_random_indices
  # Assume your input range is (0...(10**3)), i.e. 0 to 999
  # Each loop picks one decimal digit, visiting 0..9 in random order
  (0..9).sort_by{rand}.each do |a|
    (0..9).sort_by{rand}.each do |b|
      (0..9).sort_by{rand}.each do |c|
        yield "#{a}#{b}#{c}".to_i
      end
    end
  end
end
gen_random_indices do |idx|
run_test_with_index(idx)
end
Essentially, you are constructing the index by randomly generating one digit at a time. In the worst-case scenario, this will require enough memory to store 10 * (number of digits). You will encounter every number from 0 to 999 exactly once, but the order is only pseudo-random. That is, if the first loop sets a=1, then you will encounter all three-digit numbers of the form 1xx before you see the hundreds digit change.
The other downside is the need to manually construct the function to a specified depth. In your (0..(99**99)) case, this would likely be a problem (although I suppose you could write a script to generate the code for you). I'm sure there's probably a way to re-write this in a state-ful, recursive manner, but I can't think of it off the top of my head (ideas, anyone?).
[Edit]: Taking into account @klew's and @Turtle's answers, the best I can hope for is batches of random (or close to random) numbers.
This is a recursive implementation of something similar to KandadaBoggu's solution. Basically, the search space (as a range) is partitioned into an array containing N equal-sized ranges. Each range is fed back in a random order as a new search space. This continues until the size of the range hits a lower bound. At this point the range is small enough to be converted into an array, shuffled, and checked.
Even though it is recursive, I haven't blown the stack yet. Instead, it errors out when attempting to partition a search space larger than about 10^19 keys. It has to do with the numbers being too large to convert to a long. It can probably be fixed:
# partition a range into an array of N equal-sized ranges
def partition(range, n)
ranges = []
first = range.first
last = range.last
length = last - first + 1
step = length / n # integer division
((first + step - 1)..last).step(step) { |i|
ranges << (first..i)
first = i + 1
}
# append any extra onto the last element
ranges[-1] = (ranges[-1].first)..last if last > step * ranges.length
ranges
end
I hope the code comments help shed some light on my original question.
pastebin: full source
Note: PW_LEN under # options can be changed to a lower number in order to get quicker results.
For a prohibitively large space, like
space = -10..1000000000000000000000
You can add this method to Range.
class Range
M127 = 170_141_183_460_469_231_731_687_303_715_884_105_727
def each_random(seed = 0)
return to_enum(__method__) { size } unless block_given?
unless first.kind_of? Integer
raise TypeError, "can't randomly iterate from #{first.class}"
end
sample_size = self.end - first + 1
sample_size -= 1 if exclude_end?
j = coprime sample_size
v = seed % sample_size
each do
v = (v + j) % sample_size
yield first + v
end
end
protected
def gcd(a,b)
b == 0 ? a : gcd(b, a % b)
end
def coprime(a, z = M127)
gcd(a, z) == 1 ? z : coprime(a, z + 1)
end
end
You could then
space.each_random { |i| puts i }
729815750697818944176
459631501395637888351
189447252093456832526
919263002791275776712
649078753489094720887
378894504186913665062
108710254884732609237
838526005582551553423
568341756280370497598
298157506978189441773
27973257676008385948
757789008373827330134
487604759071646274309
217420509769465218484
947236260467284162670
677052011165103106845
406867761862922051020
136683512560740995195
866499263258559939381
596315013956378883556
326130764654197827731
55946515352016771906
785762266049835716092
515578016747654660267
...
This gives a good amount of randomness so long as your space is a few orders of magnitude smaller than M127.
Credit to @nick-steele and @bta for the approach.
This isn't really a Ruby-specific answer but I hope it's permitted. Andrew Kensler gives a C++ "permute()" function that does exactly this in his "Correlated Multi-Jittered Sampling" report.
As I understand it, the exact function he provides really only works if your "array" is up to size 2^27, but the general idea could be used for arrays of any size.
I'll do my best to sort of explain it. The first part is you need a hash that is reversible "for any power-of-two sized domain". Consider x = i + 1. No matter what x is, even if your integer overflows, you can determine what i was. More specifically, you can always determine the bottom n bits of i from the bottom n bits of x. Addition is a reversible hash operation, as is multiplication by an odd number, as is doing a bitwise xor by a constant. If you know a specific power-of-two domain, you can scramble bits in that domain. E.g. x ^= (x & 0xFF) >> 5 is valid for the 16-bit domain. You can specify that domain with a mask, e.g. mask = 0xFF, and your hash function becomes x = hash(i, mask). Of course you can add a "seed" value into that hash function to get different randomizations. Kensler lays out more valid operations in the paper.
So you have a reversible function x = hash(i, mask, seed). The problem is that if you hash your index, you might end up with a value that is larger than your array size, i.e. your "domain". You can't just modulo this or you'll get collisions.
The reversible hash is the key to using a technique called "cycle walking", introduced in "Ciphers with Arbitrary Finite Domains". Because the hash is reversible (i.e. 1-to-1), you can just repeatedly apply the same hash until your hashed value is smaller than your array! Because you're applying the same hash, and the mapping is one-to-one, whatever value you end up on will map back to exactly one index, so you don't have collisions. So your function could look something like this for 32-bit integers (pseudocode):
fun permute(i, length, seed) {
i = hash(i, 0xFFFF, seed)
while(i >= length): i = hash(i, 0xFFFF, seed)
return i
}
It could take a lot of hashes to get to your domain, so Kensler does a simple trick: he keeps the hash within the domain of the next power of two, which makes it require very few iterations (~2 on average), by masking out the unnecessary bits. The final algorithm looks like this:
fun next_pow_2(length) {
# This implementation is for clarity.
# See Kensler's paper for one way to do it fast.
p = 1
while (p < length): p *= 2
return p
}
fun permute(i, length, seed) {
mask = next_pow_2(length)-1
i = hash(i, mask, seed) & mask
while(i >= length): i = hash(i, mask, seed) & mask
return i
}
And that's it! Obviously the important thing here is choosing a good hash function, which Kensler provides in the paper but I wanted to break down the explanation. If you want to have different random permutations each time, you can add a "seed" value to the permute function which then gets passed to the hash function.
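To make the cycle-walking idea concrete, here is a runnable Python sketch. The `toy_hash` below is an assumption made for illustration and is NOT Kensler's hash: it only chains the reversible operations named above (addition, multiplication by an odd constant, xor with a right shift) so that it is a one-to-one mapping on the masked power-of-two domain.

```python
def next_pow_2(length):
    """Smallest power of two that is >= length (written for clarity, not speed)."""
    p = 1
    while p < length:
        p *= 2
    return p

def toy_hash(i, mask, seed):
    """Illustrative bijection on [0, mask]; a stand-in, not Kensler's hash."""
    i = (i + seed) & mask        # addition is reversible
    i = (i * 0x9E3779B1) & mask  # multiplication by an odd constant is reversible
    i ^= i >> 3                  # xor with a right shift of itself is reversible
    return i & mask

def permute(i, length, seed):
    """Map index i in [0, length) to a unique index in [0, length)."""
    mask = next_pow_2(length) - 1
    i = toy_hash(i, mask, seed)
    while i >= length:           # cycle walking: re-hash until back in range
        i = toy_hash(i, mask, seed)
    return i

print([permute(i, 10, seed=12345) for i in range(10)])
```

Each index can be permuted independently, so nothing beyond `length` and `seed` needs to be stored. The quality of the resulting order depends entirely on the hash; this toy version is only meant to show that no value is repeated or skipped.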