Finding if a random number has occured before or not - algorithm

Let me be clear at start that this is a contrived example and not a real world problem.
If I have a problem of creating a random number between 0 to 10. I do this 11 times making sure that a previously occurred number is not drawn again, if I get a repeated number,
I create another random number again to make sure it has not be seen earlier. So essentially I get a a sequence of unique numbers from 0 - 10 in a random order
e.g. 3 1 2 0 5 9 4 8 10 6 7 and so on
Now to come up with logic to make sure that the random numbers are unique and not one which we have drawn before, we could use many approaches
Use C++ std::bitset and set the bit corresponding to the index equal to value of each random no. and check it next time when a new random number is drawn.
Or
Use a std::map<int,int> to count the number of times or even simple C array with some sentinel values stored in that array to indicate if that number has occurred or not.
If I have to avoid these methods above and use some mathematical/logical/bitwise operation to find whether a random number has been draw before or not, is there a way?

You don't want to do it the way you suggest. Consider what happens when you have already selected 10 of the 11 items; your random number generator will cycle until it finds the missing number, which might be never, depending on your random number generator.
A better solution is to create a list of numbers 0 to 10 in order, then shuffle the list into a random order. The normal algorithm for doing this is due to Knuth, Fisher and Yates: starting at the first element, swap each element with an element at a location greater than the current element in the array.
function shuffle(a, n)
for i from n-1 to 1 step -1
j = randint(i)
swap(a[i], a[j])
We assume an array with indices 0 to n-1, and a randint function that sets j to the range 0 <= j <= i.

Use an array and add all possible values to it. Then pick one out of the array and remove it. Next time, pick again until the array is empty.

Yes, there is a mathematical way to do it, but it is a bit expansive.
have an array: primes[] where primes[i] = the i'th prime number. So its beginning will be [2,3,5,7,11,...].
Also store a number mult Now, once you draw a number (let it be i) you check if mult % primes[i] == 0, if it is - the number was drawn before, if it wasn't - then the number was not. chose it and do mult = mult * primes[i].
However, it is expansive because it might require a lot of space for large ranges (the possible values of mult increases exponentially
(This is a nice mathematical approach, because we actually look at a set of primes p_i, the array of primes is only the implementation to the abstract set of primes).
A bit manipulation alternative for small values is using an int or long as a bitset.
With this approach, to check a candidate i is not in the set you only need to check:
if (pow(2,i) & set == 0) // not in the set
else //already in the set
To enter an element i to the set:
set = set | pow(2,i)
A better approach will be to populate a list with all the numbers, shuffle it with fisher-yates shuffle, and iterate it for generating new random numbers.

If I have to avoid these methods above and use some
mathematical/logical/bitwise operation to find whether a random number
has been draw before or not, is there a way?
Subject to your contrived constraints yes, you can imitate a small bitset using bitwise operations:
You can choose different integer types on the right according to what size you need.
bitset code bitwise code
std::bitset<32> x; unsigned long x = 0;
if (x[i]) { ... } if (x & (1UL << i)) { ... }
// assuming v is 0 or 1
x[i] = v; x = (x & ~(1UL << i)) | ((unsigned long)v << i);
x[i] = true; x |= (1UL << i);
x[i] = false; x &= ~(1UL << i);
For a larger set (beyond the size in bits of unsigned long long), you will need an array of your chosen integer type. Divide the index by the width of each value to know what index to look up in the array, and use the modulus for the bit shifts. This is basically what bitset does.
I'm assuming that the various answers that tell you how best to shuffle 10 numbers are missing the point entirely: that your contrived constraints are there because you do not in fact want or need to know how best to shuffle 10 numbers :-)

Keep a variable too map the drawn numbers. The i'th bit of that variable will be 1 if the number was drawn before:
int mapNumbers = 0;
int generateRand() {
if (mapNumbers & ((1 << 11) - 1) == ((1 << 11) - 1)) return; // return if all numbers have been generated
int x;
do {
x = newVal();
} while (!x & mapNumbers);
mapNumbers |= (1 << x);
return x;
}

Related

How to turn integers into Fibonacci coding efficiently?

Fibonacci sequence is obtained by starting with 0 and 1 and then adding the two last numbers to get the next one.
All positive integers can be represented as a sum of a set of Fibonacci numbers without repetition. For example: 13 can be the sum of the sets {13}, {5,8} or {2,3,8}. But, as we have seen, some numbers have more than one set whose sum is the number. If we add the constraint that the sets cannot have two consecutive Fibonacci numbers, than we have a unique representation for each number.
We will use a binary sequence (just zeros and ones) to do that. For example, 17 = 1 + 3 + 13. Then, 17 = 100101. See figure 2 for a detailed explanation.
I want to turn some integers into this representation, but the integers may be very big. How to I do this efficiently.
The problem itself is simple. You always pick the largest fibonacci number less than the remainder. You can ignore the the constraint with the consecutive numbers (since if you need both, the next one is the sum of both so you should have picked that one instead of the initial two).
So the problem remains how to quickly find the largest fibonacci number less than some number X.
There's a known trick that starting with the matrix (call it M)
1 1
1 0
You can compute fibbonacci number by matrix multiplications(the xth number is M^x). More details here: https://www.nayuki.io/page/fast-fibonacci-algorithms . The end result is that you can compute the number you're look in O(logN) matrix multiplications.
You'll need large number computations (multiplications and additions) if they don't fit into existing types.
Also store the matrices corresponding to powers of two you compute the first time, since you'll need them again for the results.
Overall this should be O((logN)^2 * large_number_multiplications/additions)).
First I want to tell you that I really liked this question, I didn't know that All positive integers can be represented as a sum of a set of Fibonacci numbers without repetition, I saw the prove by induction and it was awesome.
To respond to your question I think that we have to figure how the presentation is created. I think that the easy way to find this is that from the number we found the closest minor fibonacci item.
For example if we want to present 40:
We have Fib(9)=34 and Fib(10)=55 so the first element in the presentation is Fib(9)
since 40 - Fib(9) = 6 and (Fib(5) =5 and Fib(6) =8) the next element is Fib(5). So we have 40 = Fib(9) + Fib(5)+ Fib(2)
Allow me to write this in C#
class Program
{
static void Main(string[] args)
{
List<int> fibPresentation = new List<int>();
int numberToPresent = Convert.ToInt32(Console.ReadLine());
while (numberToPresent > 0)
{
int k =1;
while (CalculateFib(k) <= numberToPresent)
{
k++;
}
numberToPresent = numberToPresent - CalculateFib(k-1);
fibPresentation.Add(k-1);
}
}
static int CalculateFib(int n)
{
if (n == 1)
return 1;
int a = 0;
int b = 1;
// In N steps compute Fibonacci sequence iteratively.
for (int i = 0; i < n; i++)
{
int temp = a;
a = b;
b = temp + b;
}
return a;
}
}
Your result will be in fibPresentation
This encoding is more accurately called the "Zeckendorf representation": see https://en.wikipedia.org/wiki/Fibonacci_coding
A greedy approach works (see https://en.wikipedia.org/wiki/Zeckendorf%27s_theorem) and here's some Python code that converts a number to this representation. It uses the first 100 Fibonacci numbers and works correctly for all inputs up to 927372692193078999175 (and incorrectly for any larger inputs).
fibs = [0, 1]
for _ in xrange(100):
fibs.append(fibs[-2] + fibs[-1])
def zeck(n):
i = len(fibs) - 1
r = 0
while n:
if fibs[i] <= n:
r |= 1 << (i - 2)
n -= fibs[i]
i -= 1
return r
print bin(zeck(17))
The output is:
0b100101
As the greedy approach seems to work, it suffices to be able to invert the relation N=Fn.
By the Binet formula, Fn=[φ^n/√5], where the brackets denote the nearest integer. Then with n=floor(lnφ(√5N)) you are very close to the solution.
17 => n = floor(7.5599...) => F7 = 13
4 => n = floor(4.5531) => F4 = 3
1 => n = floor(1.6722) => F1 = 1
(I do not exclude that some n values can be off by one.)
I'm not sure if this is an efficient enough for you, but you could simply use Backtracking to find a(the) valid representation.
I would try to start the backtracking steps by taking the biggest possible fib number and only switch to smaller ones if the consecutive or the only once constraint is violated.

Given a file containing 4.30 billion 32-bit integers, how can we find a number, which has appeared at least twice?

I have come up with divide and conquer algorithm for this. Just wanted to know if this would work or not?
First mid is calculated from the integer range i.e. (0+(1<<32-1))>>1 and then this idea is applied: range of number from start to mid or from mid to end will always be less than the numbers we are going to consider as we are considering billion numbers and there will definitely some numbers which are going to be repeated as the range of 32bit integer is much smaller compare to billion numbers.
def get_duplicate(input, start, end):
while True:
mid = (start >> 1) + end - (end >> 1)
less_to_mid = 0
more_to_mid = 0
equal_to_mid = 0
for data in input:
data = int(data, 16)
if data < mid:
less_to_mid += 1
elif data == mid:
equal_to_mid += 1
else:
more_to_mid += 1
if equal_to_mid > 1:
return mid
elif mid-start < less_to_mid:
end = mid-1
elif end-mid < more_to_mid:
start = mid+1
with open("codes\output.txt", 'r+') as f:
content = f.read().split()
print(get_duplicate(content, 0, 1<<32-1))
I know we can use bit array but I just want to get your views on this solution and if implementation is buggy.
Your method is OK. But you will probably need to read the input many times to find the answer.
Here is a variant, which allows you to find a duplicate with few memory, but you only need to read the input twice.
Initialize an array A[65536] of integers to zero.
Read the numbers one by one. Every time a number x is read, add 1 to A[x mod 65536].
When the reading ends, there will be at least one i such that A[i] is strictly bigger than 65536. This is because 65536 * 63356 < 4.3 billion. Let us say A[i0] is bigger than 65536.
Clear the array A to zero.
Read the numbers again, but this time, only look at those numbers x such that x mod 65536 = i0. For every such x, add 1 to A[x / 65536].
When the reading ends, there will be at least one j such that A[j] is strictly bigger than 1. Then the number 65536 * j + i0 is the final answer.
2^32 bits memory is nothing special for the modern systems. So you have to use bitset, this data structure needs only a bit per number and all modern languages have an implementation. Here is the idea - you just remember if a number has been already seen:
void print_twice_seen (Iterator &it)//iterates through all numbers
{
std::bitset< (1L<<32) > seen;//bitset for 2^32 elements, assume 64bit system
while(it.hasNext()){
unsigned int val=it.next();//return current element and move the iterator
if(seen[val])
std::cout<<"Seen at least twice: "<<val<<std::endl;
else
seen.set(val, true);//remember as seen
}
}

Generating a non-repeating set from a random seed, and extract result by index

p.s. I have referred to this as Random, but this is a Seed Based Random Shuffle, where the Seed will be generated by a PRNG, but with the same Seed, the same "random" distribution will be observed.
I am currently trying to find a method to assist in doing 2 things:
1) Generate Non-Repeating Sequence
This will take 2 arguments: Seed; and N. It will generate a sequence, of size N, populated with numbers between 1 and N, with no repetitions.
I have found a few good methods to do this, but most of them get stumped by feasibility with the second thing.
2) Extract an entry from the Sequence
This will take 3 arguments: Seed; N; and I. This is for determining what value would appear at position I in a Sequence that would be generated with Seed and N. However, in order to work with what I have in mind, it absolutely cannot use a generated sequence, and pick out an element.
I initially worked with pre-calculating the sequence, then querying it, but this only really works in test cases, as the number of Seeds, and the value of N that will be used would create a database into the Petabytes.
From what I can tell, having a method that implements requirement 1 by using requirement 2 would be the most ideal method.
i.e. a sequence is generated by:
function Generate_Sequence(int S, int N) {
int[] sequence = new int[N];
for (int i = 0; i < N; i++) {
sequence[i] = Extract_From_Sequence(S, N, i);
}
return sequence;
}
For Example
GS = Generate Sequence
ES = Extract from Sequence
for:
S = 1
N = 5
I = 4
GS(S, N) = { 4, 2, 5, 1, 3 }
ES(S, N, I) = 1
let S = 2
GS(S, N) = { 3, 5, 2, 4, 1 }
ES(S, N, I) = 4
One way to do this is to make a permutation over the bit positions of the number. Assume that N is a power of two (I will discuss the general case later!).
Use the seed S to generate a permutation \sigma over the set of {1,2,...,log(n)}. Then permute the bits of I according to the \sigma to obtain I'. In other words, the bit of I' at the position \sigma(x) is obtained from the bit of I at the position x.
One problem with this method is its linearity (It is closed under the XOR operation). To overcome this, you can find a number p with gcd(p,N)=1 (this can be done easily even for very large Ns) and generate a random number (q < N) using the seed S. The output of the Extract_From_Sequence(S, N, I) would be (p*I'+q mod N).
Now the case where N is not a complete power of two. The problem arises when the I' falls outside the range of [1,N]. In that case, we return the most significant bits of I to their initial position until the resulting value falls into the desired range. This is done by changing the \sigma(log(n)) bit of I' with the log(n) bit, and so on ....

Number of distinct sequences of fixed length which can be generated using a given set of numbers

I am trying to find different sequences of fixed length which can be generated using the numbers from a given set (distinct elements) such that each element from set should appear in the sequence. Below is my logic:
eg. Let the set consists of S elements, and we have to generate sequences of length K (K >= S)
1) First we have to choose S places out of K and place each element from the set in random order. So, C(K,S)*S!
2) After that, remaining places can be filled from any values from the set. So, the factor
(K-S)^S should be multiplied.
So, overall result is
C(K,S)S!((K-S)^S)
But, I am getting wrong answer. Please help.
PS: C(K,S) : No. of ways selecting S elements out of K elements (K>=S) irrespective of order. Also, ^ : power symbol i.e 2^3 = 8.
Here is my code in python:
# m is the no. of element to select from a set of n elements
# fact is a list containing factorial values i.e. fact[0] = 1, fact[3] = 6& so on.
def ways(m,n):
res = fact[n]/fact[n-m+1]*((n-m)**m)
return res
What you are looking for is the number of surjective functions whose domain is a set of K elements (the K positions that we are filling out in the output sequence) and the image is a set of S elements (your input set). I think this should work:
static int Count(int K, int S)
{
int sum = 0;
for (int i = 1; i <= S; i++)
{
sum += Pow(-1, (S-i)) * Fact(S) / (Fact(i) * Fact(S - i)) * Pow(i, K);
}
return sum;
}
...where Pow and Fact are what you would expect.
Check out this this math.se question.
Here's why your approach won't work. I didn't check the code, just your explanation of the logic behind it, but I'm pretty sure I understand what you're trying to do. Let's take for example K = 4, S = {7,8,9}. Let's examine the sequence 7,8,9,7. It is a unique sequence, but you can get to it by:
Randomly choosing positions 1,2,3, filling them randomly with 7,8,9 (your step 1), then randomly choosing 7 for the remaining position 4 (your step 2).
Randomly choosing positions 2,3,4, filling them randomly with 8,9,7 (your step 1), then randomly choosing 7 for the remaining position 1 (your step 2).
By your logic, you will count it both ways, even though it should be counted only once as the end result is the same. And so on...

Population segmentation algorithm

I have a population of 50 ordered integers (1,2,3,..,50) and I look for a generic way to slice it "n" ways ("n" is the number of cutoff points ranging from 1 to 25) that maintains the order of the elements.
For example, for n=1 (one cutoff point) there are 49 possible grouping alternatives ([1,2-49], [1-2,3-50], [1-3,4-50],...). For n=2 (two cutoff points), the grouping alternatives are like: [1,2,3-50], [1,2-3,4-50],...
Could you recommend any general-purpose algorithm to complete this task in an efficient way?
Thanks,
Chris
Thanks everyone for your feedback. I reviewed all your comments and I am working on a generic solution that will return all combinations (e.g., [1,2,3-50], [1,2-3,4-50],...) for all numbers of cutoff points.
Thanks again,
Chris
Let sequence length be N, and number of slices n.
That problem becomes easier when you notice that, choosing a slicing to n slices is equivalent to choosing n - 1 from N - 1 possible split points (a split point is between every two numbers in the sequence). Hence there is (N - 1 choose n - 1) such slicings.
To generate all slicings (to n slices), you have to generate all n - 1 element subsets of numbers from 1 to N - 1.
The exact algorithm for this problem is placed here: How to iteratively generate k elements subsets from a set of size n in java?
Do you need the cutoffs, or are you just counting them. If you're just going to count them, then it's simple:
1 cutoff = (n-1) options
2 cutoffs = (n-1)*(n-2)/2 options
3 cutoffs = (n-1)(n-2)(n-3)/4 options
you can see the patterns here
If you actually need the cutoffs, then you have to actually do the loops, but since n is so small, Emilio is right, just brute force it.
1 cutoff
for(i=1,i<n;++i)
cout << i;
2 cutoffs
for(i=1;<i<n;++i)
for(j=i+1,j<n;++j)
cout << i << " " << j;
3 cutoffs
for(i=1;<i<n;++i)
for(j=i+1,j<n;++j)
for(k=j+1,k<n;++k)
cout << i << " " << j << " " << k;
again, you can see the pattern
So you want to select 25 split point from 49 choices in all possible ways. There are a lot of well known algorithms to do that.
I want to draw your attention to another side of this problem. There are 49!/(25!*(49-25)!) = 63 205 303 218 876 >= 2^45 ~= 10^13 different combinations. So if you want to store it, the required amount of memory is 32TB * sizeof(Combination). I guess that it will pass 1 PB mark.
Now lets assume that you want to process generated data on the fly. Lets make rather optimistic assumption that you can process 1 million combinations per second (here i assume that there is no parallelization). So this task will take 10^7 seconds = 2777 hours = 115 days.
This problem is more complicated than it seems at first glance. If you want to solve if at home in reasonable time, my suggestion is to change the strategy or wait for the advance of quantum computers.
This will generate an array of all the ranges, but I warn you, it'll take tons of memory, due to the large numbers of results (50 elements with 3 splits is 49*48*47=110544) I haven't even tried to compile it, so there's probably errors, but this is the general algorithm I'd use.
typedef std::vector<int>::iterator iterator_t;
typedef std::pair<iterator_t, iterator_t> range_t;
typedef std::vector<range_t> answer_t;
answer_t F(std::vector<int> integers, int slices) {
answer_t prev; //things to slice more
answer_t results; //thin
//initialize results for 0 slices
results.push_back(answer(range(integers.begin(), integers.end()), 1));
//while there's still more slicing to do
while(slices--) {
//move "results" to the "things to slice" pile
prev.clear();
prev.swap(results);
//for each thing to slice
for(int group=0; group<prev.size(); ++group) {
//for each range
for(int crange=0; crange<prev[group].size(); ++crange) {
//for each place in that range
for(int newsplit=0; newsplit<prev[group][crange].size(); ++newsplit) {
//copy the "result"
answer_t cur = prev[group];
//slice it
range_t L = range(cur[crange].first, cur[crange].first+newsplit);
range_t R = range(cur[crange].first+newsplit), cur[crange].second);
answer_t::iterator loc = cur.erase(cur.begin()+crange);
cur.insert(loc, R);
cur.insert(loc, L);
//add it to the results
results.push_back(cur);
}
}
}
}
return results;
}

Resources