I have 4 integers which I want to convert to a seed in order to generate a random number. I understand this is arbitrary for the most part; I do however want to make sure what I am currently doing is not overkill (and does not generate too little spread in seed values).
I have roughly 1000 objects which I want to have random properties based on some of their variables.
Two variables are constant, in the 0 - 1000 range, and random for each object; duplicates can occur but are very unlikely (constant1 and constant2). The other two variables start at 0 and change in steps of 1 over long periods while the program runs; they can be anywhere within the signed int32 range but will tend to stay between -100 and 100 (variable1 and variable2).
How do you suitably generate a seed from these 4 values?
1. You should probably initialize the Random generator only once, when the class instance is initialized, so you should use only 2 of the properties (the other 2 are set to 0 by default, aren't they?) to get a seed.
2. Because of 1., and assuming that constant1 and constant2 are random by default within 0-1000, you can use constant1 * 1000 + constant2 to get a random number between 0 and 1000000 (use a multiplier of 1001 if 1000 itself is a possible value, so that distinct pairs never collide). I'm not sure about the randomness distribution, but it should be enough to get a seed.
Update
If you really need the seed to depend on the other two variables, you can follow the same pattern and do it as follows:
var seed = ((variable1 * 200 + variable2) * 1000 + constant1) * 1000 + constant2;
but because it can exceed the Int32 range, you have to do it in an unchecked context to prevent an OverflowException from being thrown.
And the last thing: I'm not 100% sure it will give you a uniform distribution of generated values.
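For comparison, here is a minimal Python sketch of a different way to fold the four values into a seed: a multiply-and-add hash combine rather than the positional formula above. The function name combine_seed and the constants 17 and 31 are my own illustrative choices, not something from the question, and this is only one reasonable option.

```python
import random

MASK32 = 0xFFFFFFFF  # keep every intermediate result within 32 bits


def combine_seed(constant1: int, constant2: int, variable1: int, variable2: int) -> int:
    """Fold four integers into one 32-bit seed (illustrative hash combine)."""
    seed = 17  # arbitrary non-zero start value (assumption)
    for value in (constant1, constant2, variable1, variable2):
        # Masking makes the arithmetic wrap around instead of growing,
        # which also maps negative inputs into the unsigned 32-bit range.
        seed = (seed * 31 + (value & MASK32)) & MASK32
    return seed


# The same four inputs always give the same seed, hence the same sequence.
rng = random.Random(combine_seed(5, 7, -3, 2))
sample = rng.random()
```

Because every input passes through the multiplier, swapping two values or changing one of them by 1 changes the seed, which is usually all a seed needs.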
Related
I'm looking for a pseudo random number generator which has the following properties:
Non-repeating: The returned numbers must be unique until all numbers from 0 to n have been returned once, only then it can repeat each number once more, etc.
Deterministic: If I used the same seed twice it needs to result in the same sequence.
Few allocations: It should not require to allocate a large memory area in order to then mix its data up like sequence permutations would.
My goal is that I could initialize the random number generator with some seed value and then continuously call its function to generate the next number in the sequence, possibly passing it the previous one.
One possible method is a block cipher. Encrypt the numbers 0, 1, 2, ... with a given key and the output is guaranteed unique, and will only repeat once the block size is passed. Each key will generate a different permutation. You just need to keep track of the key and the last number you encrypted.
DES uses a 64-bit block and AES uses a 128-bit block. If those sizes don't suit then you need to look at format-preserving encryption for an appropriately sized block.
One point to note, a non-repeating generator is not random. As more numbers are generated the pool of unused numbers shrinks, until the last number is fully determined. You need to consider if this is important in your application.
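To illustrate the idea on a small range, here is a rough Python sketch of a toy Feistel network with cycle-walking, which is one common way to get a keyed permutation of 0..n-1. It is not DES, AES, or a vetted format-preserving encryption scheme; the names permute and _round_function and the use of SHA-256 as the round function are my own assumptions, and it is not meant to be cryptographically secure.

```python
import hashlib


def _round_function(value: int, key: int, round_no: int, half_bits: int) -> int:
    """Keyed pseudo-random function for one Feistel round (toy construction)."""
    data = f"{key}:{round_no}:{value}".encode()
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") & ((1 << half_bits) - 1)


def permute(index: int, n: int, key: int, rounds: int = 4) -> int:
    """Map index in [0, n) to a unique value in [0, n); supports n up to 2**64."""
    if not 0 <= index < n:
        raise ValueError("index must be in [0, n)")
    # Split the smallest even-width bit field that covers n into two halves.
    half_bits = max(1, ((n - 1).bit_length() + 1) // 2)
    mask = (1 << half_bits) - 1
    value = index
    while True:
        left, right = value >> half_bits, value & mask
        for round_no in range(rounds):
            left, right = right, left ^ _round_function(right, key, round_no, half_bits)
        value = (left << half_bits) | right
        # Cycle-walking: re-encrypt until the result falls back inside [0, n).
        if value < n:
            return value


# Counting 0, 1, 2, ... and permuting each counter value yields every
# number in the range exactly once, in a key-dependent order.
sequence = [permute(i, 1000, key=42) for i in range(1000)]
```

The only state to keep is the key and the last counter value, as described above.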
I created a counter that goes up from 0 to 9999 and then resets. I use the output of this counter as a value to make unique entries. However, the application needs to find its last created number each time it is restarted. Therefore I am looking for a method which avoids any sort of object storage and relies solely on random number generation.
Something like:
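int randomTimeBasedGenerator() {
    // Requires java.util.Random; seeded from the clock at call time
    Random r = new Random(System.currentTimeMillis());
    // nextInt(10000) yields 0..9999; nextInt() % 9999 could be negative
    int num = r.nextInt(10000);
    return num;
}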
But what guarantee do I have that this method generates unique numbers? And, if not, how long would it remain unique? Are there any study papers I can look into for this sort of scenario?
Random number generation would be an elegant solution for my situation, if I can at least guarantee it won't repeat within a couple of weeks or months. But random number generation would be useless in my case if no such guarantee exists.
You have no guarantee that the return value of a random number generator remains unique. Random number generators generate unique sequences of numbers, not unique numbers. Random numbers will always repeat themselves, sooner or later.
As suggested by @Thilo, UUIDs are unique numbers. But an even better approach in your case might be to set up a lightweight database (sqlite will do) and add a record to a table with incremental IDs. It is not possible to keep track of a process without storing values somewhere.
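To put a rough number on "sooner or later": assuming each value is drawn uniformly and independently from the 10000 possible values, the chance of at least one repeat follows the birthday problem. The Python sketch below computes it; the function name collision_probability is just for illustration.

```python
def collision_probability(draws: int, pool_size: int = 10_000) -> float:
    """Chance that at least two of `draws` uniform picks from the pool coincide."""
    p_all_unique = 1.0
    for k in range(draws):
        # The (k+1)-th pick must avoid the k values already taken.
        p_all_unique *= (pool_size - k) / pool_size
    return 1.0 - p_all_unique


# With a pool of 10000 values the chance of a repeat is already
# around one half after roughly 120 draws.
p_after_120 = collision_program = collision_probability(120)
```

So with only 10000 possible values, a purely random scheme will very likely repeat long before a few weeks of regular use have passed.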
I have a short random number input, let's say int 0-999.
I don't know the distribution of the input. Now I want to generate a random number in range 0-99999 based on the input without changing the distribution shape.
I know I can scale the input to [0,1] by dividing it by 999 and then multiply by 99999 to get the result. However, this method doesn't cover all the possible values: only 1000 distinct outputs can occur, so a value like 99998 will never get hit.
Assuming your input is some kind of source of randomness...
You can take two consecutive inputs and combine them:
input() + 1000*(input()%100)
Be careful though. This relies on the source having plenty of entropy, so that a given input number isn't always followed by the same subsequent input number. If your source is a PRNG designed to cycle between the numbers 0–999 in some fashion, this technique won't work.
With most production entropy sources (e.g., /dev/urandom), this should work fine. OTOH, with a production entropy source, you could fetch a random number between 0–99999 fairly directly.
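A small Python sketch of the two-input combination, assuming the source can be read repeatedly. Here read_input is a hypothetical stand-in for the real 0-999 source and simply wraps a random.Random instance.

```python
import random


def read_input(rng: random.Random) -> int:
    """Hypothetical stand-in for the real source of 0-999 values."""
    return rng.randrange(1000)


def widen(rng: random.Random) -> int:
    """Combine two consecutive inputs into one value in 0..99999."""
    low = read_input(rng)          # contributes the last three digits, 0..999
    high = read_input(rng) % 100   # contributes the leading two digits, 0..99
    return low + 1000 * high


rng = random.Random(2024)
values = [widen(rng) for _ in range(10_000)]
```

Note that this only preserves the input's distribution shape in the low three digits; the leading digits come from a second, separate draw.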
You can try something like the following:
(input * 100) + random
where random is a random number between 0 and 99.
The problem is that input only specifies which block of 100 to use. For instance, 50 just says you will have a number between 5000 and 5099 (to keep a similarly shaped distribution). Which number between 5000 and 5099 to pick is up to you.
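As a sketch in Python, assuming a separate uniform source is available to pick the position inside each block (the function name spread is illustrative):

```python
import random


def spread(value: int, rng: random.Random) -> int:
    """Map an input in 0..999 to a value in 0..99999, keeping the coarse shape."""
    # The input selects the block of 100; the extra random draw picks
    # a position inside that block.
    return value * 100 + rng.randrange(100)


rng = random.Random(0)
example = spread(50, rng)
```

Every output from 0 to 99999 is reachable, and a histogram with bins of width 100 has the same shape as the input's histogram.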
I have a set of events that must occur randomly, but with a predefined frequency, i.e., over the course of (ideally) infinitely many events, event A should have occurred 10% of the time, event B should have occurred 3%, and so on. Of course, the percentages in the event list add up to 100.
I want to achieve this programmatically. How do I do this?
You haven't specified a language, so here comes some pseudo-code.
You basically want a function which will call other functions with various probabilities:
Function RandomEvent
    float roll = Random() -- Random number between 0 and 1
    if roll < 0.1 then
        EventA
    else if roll < 0.13 then
        EventB
    ....
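For what it's worth, the same idea as a runnable Python sketch. The event names, the catch-all "other" event, and the function name random_event are assumptions for illustration; the thresholds are the cumulative sums of the probabilities, exactly as in the if/else chain above.

```python
import random
from bisect import bisect_right
from itertools import accumulate

EVENTS = ["A", "B", "other"]       # "other" stands in for the remaining events
WEIGHTS = [0.10, 0.03, 0.87]       # must sum to 1
CUMULATIVE = list(accumulate(WEIGHTS))  # running totals: 0.10, 0.13, 1.00


def random_event(rng: random.Random) -> str:
    """Pick one event name with probability given by WEIGHTS."""
    roll = rng.random()  # uniform in [0, 1)
    # Find the first threshold strictly greater than the roll.
    index = bisect_right(CUMULATIVE, roll)
    # Guard against float rounding leaving the last threshold just below 1.
    return EVENTS[min(index, len(EVENTS) - 1)]


rng = random.Random(1)
draws = [random_event(rng) for _ in range(100_000)]
```

Over many draws the observed frequencies approach 10% and 3%, though any finite run will deviate slightly.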
Interesting description. Without specific details constraining the implementation, I can only offer an idea that you can modify to fit the choices you've already made. If you have a file in which every line contains a single event, then construct the file to have 10% A lines, 3% B lines, etc. Then, when choosing an event, generate a random integer to select a line number from the file.
You have to elaborate a little more on what you mean. If you just want the probabilities to be as you described, just pick a random number between 1-100 and map it to the events. That is, if the random number is 1-10, do Event A. If it's 11-13, do Event B, etc.
However, if you require things to come out exactly with those proportions at all times (not that this is really possible), you have to do it differently. Please confirm which meaning you are looking for and I'll edit if needed.
For each event, generate a random number between 0 and 99. If event A should occur 10% of the time, map values 0 - 9 to event A, and so on.
For instance, for 2 events:
n = 0 - 9 ==> Event A
n = 10 - 99 ==> Event B
If you do this, you can have your events occur at random times, and if the running time is long enough (and your RNG is good enough), event frequencies will add up to the desired percentage.
1. Generate a sequence of events in the exact proportions you want.
2. For each event, randomly generate a timestamp at which it should be delivered, within your time bounds.
3. Sort by that timestamp.
4. Run through the list, delivering each event at the appropriate time.
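The steps above can be sketched in Python roughly like this. The function name build_schedule, the event counts, and the one-hour window are assumptions for illustration; actual delivery (sleeping until each timestamp) is left out.

```python
import random


def build_schedule(counts: dict, start: float, end: float, rng: random.Random) -> list:
    """Return (timestamp, event_name) pairs sorted by time, with exact counts."""
    # Step 1: one entry per occurrence, in exactly the requested proportions.
    events = [name for name, count in counts.items() for _ in range(count)]
    # Step 2: give every occurrence its own random delivery time.
    stamped = [(rng.uniform(start, end), name) for name in events]
    # Step 3: order by timestamp so the list can be replayed front to back.
    stamped.sort()
    return stamped


rng = random.Random(7)
schedule = build_schedule({"A": 10, "B": 3, "other": 87}, 0.0, 3600.0, rng)
```

Unlike rolling a die per event, this guarantees the proportions exactly within each batch, at the cost of fixing the batch size up front.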
Choose a random number from 1 to 100 inclusive. Assign each event a unique set of integers whose size represents the frequency with which it should occur. If your randomly generated number falls within an event's range of numbers, fire the associated event.
In the example above, the event that should occur 10% of the time would be assigned a range 10 integers long (1-10, 12-21, etc.). How you store these integer ranges is up to you.
Like Michael said, since these are random numbers there is no way to guarantee said event fires exactly 10% of the time but over the long run it should...given an even distribution of random numbers.
I'm working on an application where I need to generate unique, non-sequential IDs. One of the constraints I have is that they must consist of 3 digits followed by 2 letters (only about 600k IDs). Given my relatively small pool of IDs I was considering simply generating all possible IDs, shuffling them and putting them into a database. Since, internally, I'll have a simple, sequential, ID to use, it'll be easy to pluck them out one at a time & be sure I don't have any repeats.
This doesn't feel like a very satisfying solution. Does anyone out there have a more interesting method of generating unique IDs from a limited pool than this 'lottery' method?
This can be done a lot of different ways, depending on what you are trying to optimize (speed, memory usage, etc.).
ID pattern = ddd c[1]c[0]
Option 1 (essentially like hashing, similar to Zak's):
1- Generate a random number between 0 and the number of possibilities (676k).
2- Convert number to combination
ddd = random / (26^2)
c[0] = random % (26)
c[1] = (random / 26) % 26
3- Query DB for existence of ID and increment until a free one is found.
Option 2 (Linear feedback shift register, see wikipedia):
1- Seed with a random number in range (0,676k). (See below why you can't seed with '0')
2- Generate subsequent random numbers by applying the following to the current ID number
num = (num >> 1) ^ (-(num & 1u) & 0x90000u);
3- Skip IDs larger than range (ie 0xA50A0+)
4- Convert number into ID format (as above)
*You will need to save the last number generated that was used for an ID, but you won't need to query the DB to see if it is used. This solution will enumerate all possible IDs except [000 AA] due to the way the LFSR works.
[edit] Since your range is actually larger than you need, you can get back [000 AA] by subtracting 1 before you convert to the ID and have your valid range be (0,0xA50A0]
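Here is a Python sketch of Option 2 including the subtract-1 adjustment from the edit. The tap mask 0x90000 and the decoding arithmetic come from the answer above; the function names lfsr_next, next_id_number and format_id are mine, and persisting the last state between runs is left out.

```python
ID_COUNT = 1000 * 26 * 26  # 676000 possible IDs, i.e. 0xA50A0


def lfsr_next(num: int) -> int:
    """One step of the 20-bit Galois LFSR with tap mask 0x90000."""
    # Same as (num >> 1) ^ (-(num & 1u) & 0x90000u) in C.
    return (num >> 1) ^ (0x90000 if num & 1 else 0)


def next_id_number(num: int) -> int:
    """Advance the LFSR until the state lands in the valid range 1..ID_COUNT."""
    num = lfsr_next(num)
    while num > ID_COUNT:
        num = lfsr_next(num)
    return num


def format_id(num: int) -> str:
    """Convert a state in 1..ID_COUNT into the ddd c[1]c[0] pattern."""
    value = num - 1  # shift down by one so that 000AA becomes reachable
    ddd = value // (26 * 26)
    c0 = value % 26
    c1 = (value // 26) % 26
    return f"{ddd:03d}{chr(65 + c1)}{chr(65 + c0)}"


# The seed must be non-zero: a state of 0 would stay 0 forever.
state = next_id_number(12345)
first_id = format_id(state)
```

Since the register cycles through every non-zero 20-bit state before repeating, calling next_id_number 676000 times yields each ID exactly once.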
Use a finite group. Basically, take a 32 or 64-bit integer, and find a large number that is coprime to the modulus of that integer type (2^32 or 2^64, so any odd number qualifies); call this number M. Then, for all integers n in that range, n * M (with wraparound) will result in a unique number that has lots of digits.
This has the advantage that you don't need to pre-fill the database, or run a separate select query -- you can do this all from within one insert statement, by having your n just be an auto-increment, and have a separate ID column that defaults to the n * M.
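A quick Python sketch of the multiplication trick for the 32-bit case. The multiplier 2654435761 is just an example odd constant I picked, and the names scramble and unscramble are illustrative; note this produces 32-bit numbers, not IDs in the 3-digits-plus-2-letters format.

```python
MODULUS = 1 << 32
MULTIPLIER = 2654435761  # example choice; any odd number is coprime to 2**32


def scramble(n: int) -> int:
    """Map a sequential counter to a scattered but unique 32-bit value."""
    return (n * MULTIPLIER) % MODULUS


def unscramble(value: int) -> int:
    """Recover the counter by multiplying with the modular inverse of M."""
    inverse = pow(MULTIPLIER, -1, MODULUS)  # needs Python 3.8 or newer
    return (value * inverse) % MODULUS


public_ids = [scramble(n) for n in range(1, 6)]
```

Uniqueness follows from M being invertible modulo 2^32, which is also why anyone who learns M can reverse the mapping.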
You could generate a random ID conforming to that standard, do a DB select to see if it exists already, then insert it into a DB to note it has been "used". For the first 25% of the life of that scheme (or about 150k entries), it should be relatively fast to generate new random IDs. After that though, it will take longer and longer, and you might as well pre-fill the table to look for free IDs.
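In Python the generate-and-check loop might look like the sketch below. A plain set stands in for the database table here, which is an assumption on my part; with a real database the membership test and the add would be a select and an insert (ideally backed by a unique constraint).

```python
import random

ID_COUNT = 1000 * 26 * 26  # 676000 possible IDs


def random_id(rng: random.Random) -> str:
    """Draw one candidate ID: three digits followed by two uppercase letters."""
    number = rng.randrange(ID_COUNT)
    digits, letters = divmod(number, 26 * 26)
    first, second = divmod(letters, 26)
    return f"{digits:03d}{chr(65 + first)}{chr(65 + second)}"


def allocate_id(used: set, rng: random.Random) -> str:
    """Retry until an unused ID is found, then mark it as used."""
    if len(used) >= ID_COUNT:
        raise RuntimeError("ID pool exhausted")
    while True:
        candidate = random_id(rng)
        if candidate not in used:  # stand-in for the DB existence check
            used.add(candidate)
            return candidate


rng = random.Random(99)
used_ids = set()
issued = [allocate_id(used_ids, rng) for _ in range(1000)]
```

The expected number of retries grows as the pool fills up, which is the slowdown described above.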
Depending on what you define as sequential, you could just pick a certain starting point on the letters, such as 'aa', and just loop through the three digits, so it would be:
001aa
002aa
003aa
Once you get to 999, increment the letter part.
You could use modular arithmetic to generate IDs. Pick a number that is coprime with 676,000 and use it as the seed. id is the standard incrementing ID of the table. Then the following pseudocode is what you need:
uidNo = (id * seed) % 676000
digits = uidNo / 676           (integer division, gives 0 - 999)
char1 = uidNo % 26
char2 = (uidNo / 26) % 26      (integer division)
uidCode = zeroPad(digits, 3) + chr(char1+65) + chr(char2+65)
If a user has more than one consecutively issued id, they could guess the algorithm and the seed and generate all the ids in order. This may mean the algorithm is not secure enough for your use case.
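A runnable Python version of the pseudocode, for reference. The seed value 387421 is only an example I chose (it is odd and not divisible by 5 or 13, so it is coprime with 676000 = 2^5 * 5^3 * 13^2), and the function name uid_code is illustrative.

```python
from math import gcd

ID_COUNT = 676_000
SEED = 387_421  # example value only; any number coprime with 676000 works

# Without this property two different ids could map to the same code.
assert gcd(SEED, ID_COUNT) == 1


def uid_code(row_id: int) -> str:
    """Turn a sequential table id into a 3-digit, 2-letter code."""
    uid_no = (row_id * SEED) % ID_COUNT
    digits = uid_no // 676          # 0..999
    char1 = uid_no % 26             # 0..25
    char2 = (uid_no // 26) % 26     # 0..25
    # Zero-pad so that small values still produce three digits.
    return f"{digits:03d}{chr(char1 + 65)}{chr(char2 + 65)}"


codes = [uid_code(i) for i in range(1, 4)]
```

Consecutive ids land far apart in the code space, but the mapping is a simple linear one, so the warning above about guessability still applies.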