In one of my projects I encountered the need to generate a set of numbers in a given range that will be:
Exhaustive, meaning that it will cover most of the given range without any repetition.
Deterministic (every time the sequence will be the same). This can probably be achieved with a fixed seed.
Random (I am not very well versed in random number theory, but I guess there is a set of rules that describes randomness; from that perspective, something like 0, 1, 2, ..., N is not random).
Ranges I am talking about can be ranges of integers, or of real numbers.
For example, if I use the standard C# random generator to generate 10 numbers in the range [0, 9], I get this:
0 0 1 2 0 1 5 6 2 6
As you can see, a big part of the given range remains 'unexplored', and there are many repetitions.
Of course, the input space can be very large, so remembering previously chosen values is not an option.
What would be the right way to tackle this problem?
Thanks.
After the comments:
OK, I agree that random is not the right word, but I hope you understood what I am trying to achieve. I want to explore a given range that can be big, so an in-memory list is not an option. If the range is (0, 10) and I want three numbers, I want to guarantee that those numbers will be different and that they will 'describe the range' (i.e., they won't all be in the lower half, etc.).
The determinism part means that I would like to use something like a standard RNG with a fixed seed, so I can fully control the sequence.
I hope I made things a bit clearer.
Thanks.
Here are three options with different tradeoffs:
Generate a list of numbers ahead of time and shuffle it using the Fisher-Yates shuffle. Select from the list as needed. O(n) total memory and O(1) time per element. Randomness is as good as the PRNG you used to do the shuffle. This is also the simplest of the three alternatives.
Use a linear feedback shift register (LFSR), which will generate every value in its sequence exactly once before repeating. O(log n) total memory and O(1) time per element. It is easy to determine future values from the present value, however, and LFSRs are most easily constructed for power-of-2 periods (but you can pick the next biggest power of 2 and skip any out-of-range values; see the sketch after this list).
Use a secure permutation based on a block cipher. Usable for any power-of-2 period and, with a little extra trickery, any arbitrary period. O(log n) total space and O(1) time per element; randomness is as good as the block cipher. The most complex of the three to implement.
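As an illustration of the LFSR option, here is a minimal Python sketch using the well-known 16-bit Galois LFSR with taps 0xB400 (period 2^16 - 1 = 65535); the function names and the default seed are my own choices:
def lfsr_sequence(seed=0xACE1):
    # 16-bit Galois LFSR (taps 0xB400); visits every value in
    # 1..0xFFFF exactly once before returning to the seed.
    state = seed
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400
        yield state
        if state == seed:
            return

def deterministic_range(n, seed=0xACE1):
    # Cover 0..n-1 (n <= 65535) exactly once: shift the LFSR output
    # down by one and skip anything out of range.
    for v in lfsr_sequence(seed):
        if v - 1 < n:
            yield v - 1

print(list(deterministic_range(10)))  # same ten values for a given seed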
If you just need something, what about something like this?
maxint = 16
step = 7
sequence = 7, 14, 5, 12, 3, 10, 1, 8, 15, 6, 13, 4, 11, 2, 9, 0
If you pick step so that it is coprime to maxint (here gcd(7, 16) = 1), the sequence will visit the entire interval before repeating. You can play around with different values of step to get something that "looks" good. The "seed" here is where you start in the sequence.
Is this random? Of course not. Will it look random according to a statistical test of randomness? It might depend on the step, but likely this will not look very statistically random at all. However, it certainly picks the numbers in the range, not in their original order, and without any memory of the numbers picked so far.
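A minimal Python sketch of the scheme described above (the function name is mine; determinism comes from the fixed step and start):
def coprime_step_sequence(maxint, step, start=0):
    # Visits every value in [0, maxint) exactly once, provided
    # gcd(step, maxint) == 1; `start` plays the role of the seed.
    value = start
    for _ in range(maxint):
        value = (value + step) % maxint
        yield value

print(list(coprime_step_sequence(16, 7)))
# [7, 14, 5, 12, 3, 10, 1, 8, 15, 6, 13, 4, 11, 2, 9, 0]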
In fact, you could make this look even better by making a list of factors - like [1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16] - and using shuffled versions of those to compute step * factor (mod maxint). Let's say we shuffled the example factor lists like [3, 2, 4, 5, 1], [6, 8, 9, 10, 7], [13, 16, 12, 11, 14, 15]. Then we'd get the sequence
5, 14, 12, 3, 7, 10, 8, 15, 6, 1, 11, 0, 4, 13, 2, 9
The size of the factor list is completely tunable, so you can use as much memory as you like. Bigger factor lists mean more randomness. There are no repeats regardless of factor list size. When you exhaust a factor list, generating a new one is as easy as counting and shuffling.
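A rough sketch of that variant (the function name, block size, and seed are my own choices; the fixed seed keeps the sequence reproducible):
import random

def shuffled_factor_sequence(maxint, step, block=5, seed=42):
    # Consume the multipliers 1..maxint in shuffled blocks; outputs are
    # still distinct because step is invertible mod maxint and each
    # multiplier is used exactly once.
    rng = random.Random(seed)
    factors = list(range(1, maxint + 1))
    for i in range(0, maxint, block):
        chunk = factors[i:i + block]
        rng.shuffle(chunk)
        for f in chunk:
            yield (step * f) % maxint

print(list(shuffled_factor_sequence(16, 7)))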
It is my impression that what you are looking for is a randomly ordered list of numbers, not a random list of numbers. You should be able to get this with the following code. Better mathematicians may be able to tell me if this is in fact not random:
import random

lst = list(range(1, 101))
for index in range(len(lst)):
    # pick a random position in the not-yet-placed tail of the list
    location = random.randrange(len(lst) - index)
    lst[index], lst[location + index] = lst[location + index], lst[index]
Basically, go through the list and pick a random item from the rest of the list to use in the position you are at. This randomly arranges the items in your list. If you need to reproduce the same order each time, save the resulting array, or seed the random source (e.g., random.seed or random.Random(seed)) so it returns the same numbers in the same order.
Generate an array that contains the range, in order. So the array contains [0, 1, 2, 3, 4, 5, ... N]. Then use a Fisher-Yates Shuffle to scramble the array. You can then iterate over the array to get your random numbers.
If you need repeatability, seed your random number generator with the same value at the start of the shuffle.
Do not use a random number generator to select numbers in a range. What will eventually happen is that you have one number left to fill, and your random number generator will cycle repeatedly until it selects that number. Depending on the random number generator, there is no guarantee that will ever happen.
What you should do is generate a list of numbers on the desired range, then use a random number generator to shuffle the list. The shuffle is known as the Fisher-Yates shuffle, sometimes called the Knuth shuffle. Here's pseudocode to shuffle an array x of n elements with indices from 0 to n-1:
for i from n-1 down to 1
    j = random integer such that 0 ≤ j ≤ i
    swap x[i] and x[j]
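In Python, with a fixed seed for repeatability, that pseudocode might look like this (function name mine):
import random

def seeded_shuffle(x, seed):
    rng = random.Random(seed)         # same seed, same shuffle, every run
    for i in range(len(x) - 1, 0, -1):
        j = rng.randint(0, i)         # random integer with 0 <= j <= i
        x[i], x[j] = x[j], x[i]
    return x

print(seeded_shuffle(list(range(10)), seed=1))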
Related
How can I form combinations of, say, 10 questions so that each student (total students = 10) gets a unique combination?
I don't want to use factorials.
You can use a circular queue data structure, holding the numbers 1 through 10 in order.
Now you can cut this at any point you like, and it will give you a unique sequence.
For example, if you cut it at the point between 2 and 3 and then iterate over your queue, you will get:
3, 4, 5, 6, 7, 8, 9, 10, 1, 2
So you need to implement a circular queue, then cut it at 10 different points (after 1, after 2, after 3, ...).
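A small Python sketch of this idea, using collections.deque as the circular queue:
from collections import deque

def rotations(items):
    # Each cut point of the circular queue yields one unique ordering.
    dq = deque(items)
    for _ in range(len(dq)):
        dq.rotate(-1)            # move the cut one position further along
        yield list(dq)

for combination in rotations(range(1, 11)):
    print(combination)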
There are 3,628,800 different permutations of 10 items taken 10 at a time.
If you only need 10 of them, you could start with an array that has the values 1-10 in it. Then shuffle the array; that becomes your first permutation. Shuffle the array again and check that you haven't already generated that permutation. Repeat that process (shuffle, check, save) until you have 10 unique permutations.
It's highly unlikely (although possible) that you'll generate a duplicate permutation in only 10 tries.
The likelihood that you generate a duplicate increases as you generate more permutations, increasing to 50% by the time you've generated about 2,000. But if you just want a few hundred or less, then this method will do it for you pretty quickly.
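A quick sketch of that shuffle-check-save loop (the seed parameter is only there to make the result reproducible):
import random

def unique_shuffles(items, count, seed=1):
    rng = random.Random(seed)
    pool, seen, result = list(items), set(), []
    while len(result) < count:
        rng.shuffle(pool)                 # shuffle
        key = tuple(pool)
        if key not in seen:               # check
            seen.add(key)
            result.append(list(pool))     # save
    return result

for p in unique_shuffles(range(1, 11), 10):
    print(p)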
The proposed circular queue technique works, too, and has the benefit of simplicity, but the resulting sequences are simply rotations of the original order, and it can't produce more than 10 without a shuffle. The technique I suggest will produce more "random" looking orderings.
We are writing a C# program that will help us remove some unnecessary data repeaters; we already found some repeaters to remove with the help of this question: Finding overlapping data in arrays. Now we are going to check whether we can cancel some repeaters by another criterion. The question is:
We have arrays of numbers
{1, 2, 3, 4, 5, 6, 7, ...}, {4, 5, 10, 100}, {100, 1, 20, 50}
Some numbers can be repeated in other arrays; some numbers can be unique and belong only to a specific array. We want to remove some arrays when we are ready to lose up to N numbers from them.
Explanation:
{1, 2}
{2, 3, 4, 5}
{2, 7}
We are ready to lose up to 3 numbers from these arrays. This means we can remove array 1, because we will lose only the number "1" (its only unique number). We can also remove arrays 1 and 3, because we will lose only the numbers "1" and "7", or just array 3, losing only the number "7"; both are within the 3-number budget.
In our output we want to give the maximum number of arrays that can be removed, with the condition that we lose at most N numbers, where N is the number of items we are ready to lose.
This problem is equivalent to the Set Cover problem (take N = 0, for example), and thus efficient, exact solutions that work in general are unlikely. However, in practice, heuristics and approximations are often good enough. Given the similarity of your problem to Set Cover, the greedy heuristic is a natural starting point: instead of stopping when you've covered all elements, stop when you've covered all but N elements.
You first need to compute, for each array, how many numbers are unique to that particular array.
An easy way to do this is O(n²), since for each element you need to check through all arrays to see whether it's unique.
You can do this much more efficiently by keeping the arrays sorted, sorting them first, or using a heap-like data structure.
After that, you only have to find a subset of arrays whose unique-number counts sum up to at most N. That's similar to the subset sum problem, but much less complex, because N > 0 and all your counts are ≥ 0.
So you simply sort these counts from smallest to greatest, then iterate over the sorted array, taking counts as long as the sum stays within N.
Finally, you can remove every array corresponding to a count that you were able to fit into N.
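A sketch of the whole procedure in Python (the function name is mine; note that, as described above, it prices each array by its unique numbers only):
from collections import Counter

def removable_arrays(arrays, n):
    # How many arrays does each number appear in?
    occurrences = Counter(x for arr in arrays for x in set(arr))
    # Cost of removing an array = count of numbers unique to it.
    costs = sorted(
        (sum(1 for x in set(arr) if occurrences[x] == 1), i)
        for i, arr in enumerate(arrays)
    )
    removed, lost = [], 0
    for cost, i in costs:                 # cheapest arrays first
        if lost + cost > n:
            break
        lost += cost
        removed.append(i)
    return removed

print(removable_arrays([[1, 2], [2, 3, 4, 5], [2, 7]], 3))  # [0, 2]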
I have a set of frequency values and I'd like to find the most likely subsets of values meeting the following conditions:
values in each subset should be harmonically related (approximately multiples of a given value)
the number of subsets should be as small as possible
every subset should have as few missing harmonics below its highest value as possible
E.g. [1,2,3,4,10,20,30] should return [1,2,3,4] and [10,20,30] (a set with all the values is not optimal because, even if they are harmonically related, there are many missing values)
The brute-force method would be to compute all possible subsets of the set and compute some cost value for each, but that would take far too long.
Is there any efficient algorithm to perform this task (or something similar)?
I would reduce the problem to minimum set cover, which, although NP-hard, is often efficiently solvable in practice via integer programming. I'm assuming that it would be reasonable to decompose [1, 2, 3, 4, 8, 12, 16] as [1, 2, 3, 4] and [4, 8, 12, 16], with 4 repeating.
To solve set cover (well, to use stock integer-program solvers, anyway), we need to enumerate all of the maximal allowed subsets. If the fundamental (i.e., the given value) must belong to the set, then, for each frequency, we can enumerate its multiples in order until too many in a row are missing. If not, we try all pairs of frequencies, assume that their fundamental is their approximate greatest common divisor, and extend the subset downward and upward until too many frequencies are missing.
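As a rough illustration, here is a sketch with a greedy approximation standing in for the integer program, and made-up tolerance and gap parameters; on the example input it recovers [1, 2, 3, 4] and [10, 20, 30]:
def harmonic_subsets(freqs, tol=0.05, max_gap=2):
    # For each value taken as a fundamental, collect its approximate
    # multiples until more than max_gap harmonics in a row are missing.
    values = sorted(freqs)
    subsets = []
    for f in values:
        subset, misses, k = [], 0, 1
        while misses <= max_gap and f * k <= values[-1] * (1 + tol):
            hits = [v for v in values if abs(v - f * k) <= tol * f * k]
            if hits:
                subset.extend(hits)
                misses = 0
            else:
                misses += 1
            k += 1
        if len(subset) > 1:
            subsets.append(frozenset(subset))
    return set(subsets)

def greedy_cover(universe, subsets):
    # Standard greedy set-cover approximation.
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(subsets, key=lambda s: len(s & uncovered))
        if not best & uncovered:
            break
        chosen.append(best)
        uncovered -= best
    return chosen

freqs = [1, 2, 3, 4, 10, 20, 30]
for s in greedy_cover(freqs, harmonic_subsets(freqs)):
    print(sorted(s))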
OK, so here is the problem.
Let's say:
1 means Bob
2 means Jerry
3 means Tom
4 means Henry
Any sum of a combination of two of the aforementioned numbers is a status/mood type, which is how the program will be encoded:
7 (4+3) means Angry
5 (3+2) means Sad
3 (2+1) means Mad
4 (3+1) means Happy
and so on...
How may I create a decode function that accepts one of the added (encoded) values, such as 7, 5, 3, or 4, figures out the combination, and returns the names of the people representing the two numbers that constitute it? Note that a number cannot be repeated to get a mood result, meaning 4 has to be 3+1 and may not be 2+2, so we can assume for this example that there is only one possible combination for each status/mood code. Now the problem is: how do you implement such code in Python 3? What would be the algorithm or logic for such a problem? How do you seek or check for a combination of two numbers? I'm thinking I should just run a loop that keeps adding two numbers at a time until the result matches the status/mood code. Will that work? But this method will quickly become impractical if the number of combinations is increased (as in adding 4 numbers together instead of 2); doing it this way will take a lot of time and will possibly be inefficient.
I apologize; I know this question is extremely confusing, but please bear with me.
Let's try and work something out.
Use Binary
If you want to have sums that are unique, then assign each possible "Person" a number that's a power of 2. The sum of any combination of these numbers will uniquely identify which numbers were used in the sum.
1, 2, 4, 8, 16, ...
Rather than offer a detailed proof of correctness, I offer an intuitive argument about this: any number can be represented in base 2, and it is always a sum of exactly one combination of powers of 2.
This solution may not be optimal. It has practical limitations (32 or 64 different "person" identifiers, unless you use some sort of BigInt), but depending on your needs, it might work. And since it produces the smallest possible values, binary is better than any other radix.
Example
Here's a quick snippet that demonstrates how you could decode the sum. The returned values are the exponents of the powers of 2. count_persons could be arbitrarily large, as could the range of n iterated over (this is just a quick example).
#!/usr/bin/python3
count_persons = 64
for n in range(20, 30):
    # Collect the positions of the set bits of n. Note that range starts
    # at 1, so bit 0 is skipped: this example numbers persons from 1.
    matches = list(filter(lambda i: (n >> i) & 0x1, range(1, count_persons)))
    print('{0}: {1}'.format(n, matches))
Output:
20: [2, 4]
21: [2, 4]
22: [1, 2, 4]
23: [1, 2, 4]
24: [3, 4]
25: [3, 4]
26: [1, 3, 4]
27: [1, 3, 4]
28: [2, 3, 4]
29: [2, 3, 4]
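To map the decoded bits back to the question's names, a minimal sketch (the power-of-2 assignment and the dict are mine; under it, Angry = Henry + Tom is encoded as 8 + 4 = 12):
PEOPLE = {1: 'Bob', 2: 'Jerry', 4: 'Tom', 8: 'Henry'}   # powers of 2

def decode(code):
    # Each set bit of the code identifies exactly one person.
    return [name for value, name in PEOPLE.items() if code & value]

print(decode(12))   # ['Tom', 'Henry']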
See a more appropriate answer here
In my opinion, the selected answer is so suboptimal that it can be considered plain wrong.
The table you are building can be indexed with N(N-1)/2 values, while the binary approach uses values up to 2^N.
With a 64-bit unsigned integer, you could encode about sqrt(2^65) values, that is, 6 billion names, compared with the 64 names the binary approach will allow.
Using a big-number library could push the limit somewhat, but the computations involved would be hugely more costly than the simple O(N) reverse-indexing algorithm needed by the alternative approach.
My conclusion is: the binary approach is grossly inefficient, unless you want to play with a handful of values, in which case hard-coding or precomputing the indexes would be just as good a solution.
Since the question is very unlikely to match a search on the subject, it is not that important anyway.
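For reference, a sketch of the pair-table indexing alluded to above (helper names are mine): each unordered pair (i, j) with i < j gets the triangular index j(j-1)/2 + i, and decoding walks back down:
def encode_pair(i, j):
    # Requires i < j; yields a unique index in 0 .. N(N-1)/2 - 1.
    return j * (j - 1) // 2 + i

def decode_pair(code):
    # Reverse indexing: find the largest j with j*(j-1)/2 <= code.
    j = 1
    while (j + 1) * j // 2 <= code:
        j += 1
    return code - j * (j - 1) // 2, j

print(encode_pair(2, 3))   # 5
print(decode_pair(5))      # (2, 3)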
I'm coding a question on an online judge for practice. The question is about optimizing Bogosort and involves not shuffling the entire number range every time: if, after the last shuffle, several of the first elements end up in the right places, we fix them and don't shuffle those elements any further. We do the same for the last elements if they are in the right places. For example, if the initial sequence is (3, 5, 1, 6, 4, 2) and after one shuffle Johnny gets (1, 2, 5, 4, 3, 6), he will fix 1, 2, and 6 and proceed to sort (5, 4, 3) using the same algorithm.
For each test case, output the expected number of shuffles needed for the improved algorithm to sort the sequence of the first n natural numbers, as an irreducible fraction.
A sample input/output says that for n=6, the answer is 1826/189.
I don't quite understand how the answer was arrived at.
This looks similar to 2011 Google Code Jam, Preliminary Round, Problem 4; however, there the answer is n, and I don't know how you get 1826/189.
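For what it's worth, the fraction can be reproduced with a small recurrence (my own sketch, not an official editorial). After a uniform shuffle of m elements, exactly (m - r + 1) * g(r) of the m! outcomes leave an unfixed middle of size r, where g(r) = r! - 2(r-1)! + (r-2)! counts the permutations whose first and last elements are both out of place. Solving the recurrence with exact fractions gives 1826/189 for n = 6:
from fractions import Fraction
from math import factorial

def g(r):
    # permutations of size r whose first AND last elements are wrong
    return factorial(r) - 2 * factorial(r - 1) + factorial(r - 2)

def expected_shuffles(n):
    E = {0: Fraction(0)}               # already sorted: no shuffles needed
    for m in range(2, n + 1):
        total = Fraction(1)            # the shuffle just performed
        for r in range(2, m):          # a middle of size 1 is impossible
            total += Fraction((m - r + 1) * g(r), factorial(m)) * E[r]
        # the r == m term contains E[m] itself; solve for it
        E[m] = total / (1 - Fraction(g(m), factorial(m)))
    return E[n]

print(expected_shuffles(6))            # 1826/189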