how to generate longer random number from a short random number? - random

I have a short random number input, let's say int 0-999.
I don't know the distribution of the input. Now I want to generate a random number in range 0-99999 based on the input without changing the distribution shape.
I know there is a way to make the input to [0,1] by dividing it by 999 and then multiple 99999 to get the result. However, this method doesn't cover all the possible values, like 99999 will never get hit.

Assuming your input is some kind of source of randomness...
You can take two consecutive inputs and combine them:
input() + 1000*(input()%100)
Be careful though. This relies on the source having plenty of entropy, so that a given input number isn't always followed by the same subsequent input number. If your source is a PRNG designed to cycle between the numbers 0–999 in some fashion, this technique won't work.
With most production entropy sources (e.g., /dev/urandom), this should work fine. OTOH, with a production entropy source, you could fetch a random number between 0–99999 fairly directly.

You can try something like the following:
(input * 100) + random
where random is a random number between 0 and 99.
The problem is that input only specifies which 100 range to use. For instance 50 just says you will have a number between 5000 and 5100 (to keep a similar shape distribution). Which number between 5000 and 5100 to pick is up to you.

Related

Generate random looking numbers deterministically from a random-access lookup key in O(1) space and time

I want to output random looking numbers based on an input. If the same input is put in, the same output is given.
I don't want to pregenerate and store a bunch of random data, and I don't want it to take an O(n) amount of time to recover the nth index.
It does not need to be secure, cryptographically or otherwise, just enough to look random.
If you want a deterministic random-access function from an (index,length) pair to a random looking string of bytes you could use SHA3-N(index)[:length] where N is the first convenient number greater than length.
This would not behave identically to an actual array as reading indexes 1 (with length 10) and 5 (with length 10) would not have any overlap (which you'd expect from an array).
This is going to be slow and very inconvenient for N>512, so if you need longer strings you'll want to do multiple rounds. Something like SHA3-512(SHA3-512(index)[0:256])++SHA3-512(SHA3-512(index)[256:512]) to get something 1024bytes long.
Armed with the multiple rounds part you could use any hash function (e.g. SHA256, MD5) which might be more convenient.
I should note that this is definitely not secure and the output could easily be predicted by an adversary.
Typically, a random number generator will generate the same sequence of pseudo-random numbers given the same seed. For example, such python code might be like so:
random.seed(1)
for i in range(1, 10):
print(random.randint(1,100)
Will print the same list no matter how many times you invoke that code. Similarly, so will this:
random.seed(42)
for i in range(1, 10):
print(random.randint(1,100)
If somehow you then describe the sections of your array as a seed (you could use a hash function to do this indeed) you can seed the generator with that value and reliably allow dynamic sizing of the list requested.

Random number that prefers a range

I'm trying to write a program that requires me to generator a random number.
I also need to make it so there's a variable chance to pick a set range.
In this case, I would be generating between 1-10 and the range with the percent chance is 7-10.
How would I do this? Could I be supplied with a formula or something like that?
So if I'm understanding your question, you want two number ranges, and a variable-defined probability that the one range will be selected. This can be described mathematically as a probability density function (PDF), which in this case would also take your "chance" variable as an argument. If, for example, your 7-10 range is more likely than the rest of the 1-10 range, your PDF might look something like:
One PDF such as a flat distribution can be transformed into another via a transformation function, which would allow you to generate a uniformly random number and transform it to your own density function. See here for the rigorous mathematics:
http://www.stat.cmu.edu/~shyun/probclass16/transformations.pdf
But since you're writing a program and not a mathematics thesis, I suggest that the easiest way is to just generate two random numbers. The first decides which range to use, and the second generates the number within your chosen range. Keep in mind that if your ranges overlap (1-10 and 7-10 obviously do) then the overlapping region will be even more likely, so you will probably want to make your ranges exclusive so you can more easily control the probabilities. You haven't said what language you're using, but here's a simple example in Python:
import random
range_chance = 30 #30 percent change of the 7-10 range
if random.uniform(0,100) < range_chance:
print(random.uniform(7,10))
else:
print(random.uniform(1,7)) #1-7 so that there is no overlapping region

Need an Algorithm to generate Serialnumber

I want to generate 16-digits hexadecimal serial-number like: F204-8BE2-17A2-CFF3.
(This pattern give me 16^16 distinct serial-number But I don't need all of them)
I need you all to suggest me an algorithm to generate these serial-numbers randomly with an special characteristic which is:
each two serial-numbers have (at-least) 6 different digits
(= It means if you are given two most similar serial-number, they should still have difference in 6 indexes)
I know that a good algorithm with this characteristic needs to remember previously generated serial-numbers and I don't want that much.
In fact, I need an algorithm which do this with least probability for a chosen pair to collide (less than 0.001 seems sufficient )
PS:
I've just tried to create 10K string randomly using MD5 hash and It gave similar string( similar=more than 3 same digits) with 0.00018 probability.
It is possible to construct a correct generator without having to remember all previously generated codes. You can generate serial numbers that are spaced 6 characters apart by using Hamming code. A hamming code can be designed to arbitrarily space out two distinct generated values. Obviously, the greater the distance, the higher redundancy you will have to use, resulting in more complex code and longer numbers.
First you design a hamming code to your liking, that encodes a number into a sequence of hexadecimal digits and then you can take any sequence of numbers and use it as a seed, such as prime numbers. You just always need to remember, what number was used last and use the next one.
That being said, if you don't need to properly ensure minimal distance of two serials, and would settle for a small error, I would suggest that any half decent hash function or cypher should produce decently spaced out outputs. Therefore the first thing I would try to do is to take MD5 or SHA hashes and test-drive them on numbers 1 - 1000. My hopes are, the results will be quite satisfactory.
I suggest you look into the ANSI X9.17 pseudorandom bit generator. An algorithmic sketch is given in these slides. ANSI X9.17 generates 64-bit pseudorandom strings which is what you want.
A revised and enhanced version of this generator was approved by NIST. Please have a look at this page.
Now whether you use ANSI X9.17 generator, another generator, or develop your own, it's a good idea to have the generator pass some statistical tests in order to ensure the quality of its pseudorandom bits.
Example tests include the ENT battery, the DIEHARD battery, and the NIST battery.

Make CURAND generate positive different random numbers less than a specific number

I am trying to use CURAND library to generate random numbers which are completely independent of each other. Hence I want to give different seeds to each thread.
So, Question 1: How do I give different seeds to each thread?(Is there some time function in CUDA which I can use?)
Now I also want to generate this random number between a range i.e 0 to 10000. How do I accomplish that to happen.
Currently I am using curand_normal (as I want to have numbers from normal distribution) but its giving me negative and same numbers which I do not want.
Setting different seeds is not a statistically sound way to get independent (non-correlated) random numbers (with any single random number generator). You would be better off selecting different sub-sequences of a single sequence, and most random number libraries will allow you to do that, including cuRAND.
Check out the examples in the CUDA SDK, for example the EstimatePiP or EstimatePiInlineP examples use cuRAND to generate pseudo-random numbers.
For the second part of your question, as mentioned in the cuRAND manual the curand_normal() routines return Normally distributed numbers with mean 0.0 and standard deviation 1.0 (i.e. Standard Normal Distribution). Clearly that means that you will have ~50% negative numbers.
It doesn't make sense to specify a fixed range along with the Normal distribution. You either want some other distribution (e.g. Uniform) with the fixed range or else you want the Normal distribution with a specific mean and standard distribution. To get from the Standard Normal to your target mean/std.dev. you simply multiply the random draw by the target standard deviation and add the target mean.

Find the count of a particular number in an infinite stream of numbers at a particular moment

I faced this problem in a recent interview:
You have a stream of incoming numbers in range 0 to 60000 and you have a function which will take a number from that range and return the count of occurrence of that number till that moment. Give a suitable Data structure/algorithm to implement this system.
My solution is:
Make an array of size 60001 pointing to bit-vectors. These bit vectors will contain the count of the incoming numbers and the incoming numbers will also be used to index into the array for the corresponding number. Bit-vectors will dynamically increase as the count gets too big to hold in them.
So, if the numbers are coming at rate 100numbers/sec then, in 1million years total numbers will be = (100*3600*24)*365*1000000 = 3.2*10^15. In the worst case where all numbers in the stream is same it will take ceil((log(3.2*10^15) / log 2) )= 52bits and if the numbers are uniformly distributed the we will have (3.2*10^15) / 60001 = 5.33*10^10 number of occurrences for each number which will require total of 36 bits for each numbers.
So, assuming 4byte pointers we need (60001 * 4)/1024 = 234 KB memory for the array and for the case with same numbers, we need bit vector size = 52/8 = 7.5 bytes which is still around 234KB. And for the other case we need (60001 * 36 / 8)/1024 = 263.7 KB for bit vector totaling about 500KB. So, it is very much feasible to do this with ordinary PC and memory.
But the interviewer said, as it is infinite stream it will eventually overflow and gave me hint like how can we do this if there were many PCs and we could pass messages between them or think about file system etc. But I kept thinking if this solution was not working then, others would too. Needless to say, I did not get the job.
How to do this problem with less memory? Can you think of an alternative approach (using network of PCs may be)?
A formal model for the problem could be the following.
We want to know if it exists a constant space bounded Turing machine such that, in any given time it recognizes the language L of all couples (number,number of occurrences so far). This means that all correct couples will be accepted and all incorrect couples will be rejected.
As a corollary of the Theorem 3.13 in Hopcroft-Ullman we know that every language recognized by a constant space bounded machine is regular.
It can be proven by using the pumping lemma for regular languages that the language described above is not a regular language. So you can't recognize it with a constant space bounded machine.
you can easily use index based search, by using an array like int arr[60000][1], whenever you get a number , say 5000, directly access the index( num-1) = (5000-1) as, arr[num-1][1], and increment the number, and now whenever u want to know how many times a particular num has ocurred you can just access it by arr[num-1][1] and you'll get the count for that number, Its simplest possible linear time implementation.
Isn't this External Sorting? Store the infinite stream in a file. Do a seek() (RandomAccessFile.seek() in Java) in the file and get to the appropriate timestamp. This is similar to Binary Search since the data is sorted by timestamps. Once you get to the appropriate timestamp, the problem turns into counting a particular number from an infinite set of numbers. Here, instead of doing a quick sort in memory, Counting sort can be done since the range of numbers is limited.

Resources