Distribute a quantity randomly - algorithm

I'm starting a project where I'm simulating an explosion of an object. I want to randomly distribute the total mass of the object that explodes into the fragments. For example, if the object has a mass of 3 kg and breaks into 3 fragments their masses could be 1, 0.5, 1.5 respectively. I want to do the same thing with energy and other things. Also, I would like to have control over the random distribution used.
I think I could do this simply by generating a random number, somehow relating it to the quantity I want to distribute, and repeating that while subtracting it from the total pool. The problem with this approach is that at first sight it doesn't seem very efficient, and it may cause problems for a fixed number of fragments.
So the question is, is there an algorithm or an efficient way to do this?
An example would be greatly appreciated.

For this problem, the first thing I would try is this:
Generate N-1 random numbers between 0 and 1
Sort them
Raise them to the xth power
Take the N differences between consecutive values in the sequence 0, the sorted numbers, 1, and multiply each by the quantity you want to distribute. Of course all these differences add up to 1, so you'll end up distributing exactly the target quantity.
A nice advantage of this method is that you can adjust the parameter x to get an aesthetically pleasing distribution of chunks. Natural explosions won't produce a uniform distribution of chunk sizes, so you'll want to play with this.
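A minimal Python sketch of these steps (the function name, the `rng` parameter and the example values are illustrative). The exponent `x` must be positive so that raising the sorted cut points to it keeps them in order; x = 1 gives plain uniform spacings.

```python
import random
from typing import Optional


def split_sorted_cuts(total: float, n: int, x: float = 1.0,
                      rng: Optional[random.Random] = None) -> list[float]:
    """Split `total` into n non-negative parts that add up to `total`.

    n - 1 uniform cut points in [0, 1) are sorted, raised to the power x
    (x > 0 keeps them in order), and the gaps between 0, the cut points
    and 1 are scaled by `total`.
    """
    if n < 1:
        raise ValueError("need at least one fragment")
    rng = rng or random.Random()
    cuts = [c ** x for c in sorted(rng.random() for _ in range(n - 1))]
    points = [0.0, *cuts, 1.0]
    return [total * (b - a) for a, b in zip(points, points[1:])]


if __name__ == "__main__":
    rng = random.Random(7)
    print(split_sorted_cuts(3.0, 3, x=1.0, rng=rng))  # three masses summing to 3 kg
    print(split_sorted_cuts(3.0, 3, x=3.0, rng=rng))  # x != 1 tends to make sizes less even
```

Because the gaps telescope from 0 to 1, the parts always sum to `total` up to floating-point rounding.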

So here's a generic algorithm that might work for you:
Generate N random numbers using a distribution of your choosing
Find the sum of all the numbers
Divide each number by the sum
Multiply by the fixed total mass of your object
This will only take O(N) time, and will allow you to control the distribution and number of chunks.
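A sketch of this normalize-and-scale approach; `draw` is any zero-argument function returning non-negative numbers, which is where the choice of distribution plugs in. The usage example uses Gamma(alpha, 1) draws because normalizing them gives a Dirichlet(alpha, ..., alpha) split, a standard way to control how uneven the pieces are: small alpha favours a few large fragments, large alpha nearly equal ones.

```python
import random
from typing import Callable


def split_by_normalizing(total: float, n: int, draw: Callable[[], float]) -> list[float]:
    """Draw n non-negative numbers, divide each by their sum, then scale by `total`."""
    values = [draw() for _ in range(n)]
    s = sum(values)
    if s <= 0:
        raise ValueError("draw() must return non-negative values with a positive sum")
    return [total * v / s for v in values]


if __name__ == "__main__":
    rng = random.Random(42)
    # alpha = 1 is uniform over all possible splits of the mass.
    for alpha in (0.3, 1.0, 10.0):
        masses = split_by_normalizing(3.0, 5, lambda: rng.gammavariate(alpha, 1.0))
        print(alpha, [round(m, 3) for m in masses])
```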

Related

How to get a representative random number from a set of pseudo random numbers?

Let's say I got three pseudo random numbers from different pseudo random number generators.
Since the generators would reflect only a part of the real random number generating process, I believe that one way to get a number closer to real random might be to somehow get a "center" of the three pseudo random numbers.
An easy way to get that "center" would be to take average, median or mode (if any) of them.
I am wondering if there's a more sophisticated way, given that they are supposed to represent random numbers.
Well, there is an approach called an entropy extractor, which lets you get (good) random numbers from not-quite-random source(s).
If you have three independent but somewhat low-quality (biased) RNGs, you can combine them into a uniform source.
Suppose you have three generators, each giving you a single byte; then the uniform output would be
$t = X \cdot Y + Z$
where addition and multiplication are done over the finite field $\mathrm{GF}(2^8)$.
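Some code (Python):
from pyfinite import ffield

GF = ffield.FField(8)  # arithmetic in GF(2^8), created once

def RNG1():
    return ...  # placeholder: return a single random byte (0..255) from source 1

def RNG2():
    return ...  # placeholder: return a single random byte (0..255) from source 2

def RNG3():
    return ...  # placeholder: return a single random byte (0..255) from source 3

def muRNG():
    # Combine the three independent bytes as X*Y + Z over GF(2^8)
    X = RNG1()
    Y = RNG2()
    Z = RNG3()
    return GF.Add(GF.Multiply(X, Y), Z)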
Paper where this idea was stated
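If pyfinite is not available, the same X*Y + Z combination can be sketched in plain Python. This version reduces by the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), which may not be the polynomial pyfinite's FField(8) uses by default, so its outputs need not match muRNG byte for byte. The biased demo sources are hypothetical, and any quality guarantee comes from the paper's conditions on the sources (independence, enough entropy), not from this code.

```python
import random


def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2^8) is XOR
        b >>= 1
        a <<= 1
        if a & 0x100:  # degree reached 8: reduce modulo the field polynomial
            a ^= 0x11B
    return result


def extract(x: int, y: int, z: int) -> int:
    """t = X*Y + Z over GF(2^8) for three bytes from independent sources."""
    return gf256_mul(x, y) ^ z


def biased_byte(r: random.Random) -> int:
    """Hypothetical low-quality source: prefers small byte values."""
    return min(r.randrange(256), r.randrange(256))


if __name__ == "__main__":
    sources = [random.Random(seed) for seed in (1, 2, 3)]
    print([extract(*(biased_byte(r) for r in sources)) for _ in range(8)])
```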
Trying to use some form of "centering" turns out to be a bad idea if your goal is to have a better representation of the randomness.
First, a thought experiment. If you think three values give more randomness, wouldn't more be even better? It turns out that if you take either the average or the median of n Uniform(0,1) values, as n→∞ both converge to 0.5, a single point. It also happens that replacing distributions with a "representative" constant is generally a bad idea if you want to understand stochastic systems. As an extreme example, consider queues. As the arrival rate of customers/entities approaches the rate at which they can be served, stochastic queues get progressively longer on average. However, if arrival and service times are constant, queues remain at zero length until the arrival rate exceeds the service rate, at which point they grow without bound. When the rates are equal, the stochastic queue grows without bound, while the deterministic queue stays at its initial length (usually assumed to be zero). Infinity and zero are about as different as you can get, which illustrates that replacing the distributions in a queueing model with their means would give you no understanding of how queues actually work.
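The n→∞ claim is easy to check numerically. This sketch (function name, trial count and seed are arbitrary) estimates the spread of the average of n Uniform(0,1) draws; theory says it shrinks like 1/sqrt(12n), so the "center" collapses toward 0.5 instead of becoming more random. Only the average is shown; the median behaves similarly.

```python
import random
import statistics


def spread_of_average(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Standard deviation of the average of n Uniform(0,1) draws over `trials` repetitions."""
    rng = random.Random(seed)
    averages = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]
    return statistics.pstdev(averages)


if __name__ == "__main__":
    for n in (1, 3, 30, 300):
        # Theory: 1 / sqrt(12 n), about 0.289, 0.167, 0.053, 0.017
        print(n, round(spread_of_average(n), 3))
```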
Next, empirical evidence. Below are histograms of the medians and averages constructed from 10,000 samples of three uniforms. As you can see, they have different distribution shapes but are clearly no longer uniform. Values bunch in the middle and become progressively rarer towards the endpoints of the range (0,1).
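A text version of the same experiment (10,000 samples of three uniforms, ten bins, an arbitrary seed) can be sketched as follows; the single-draw row is included as a flat baseline for comparison.

```python
import random
import statistics


def histogram_of(stat, bins: int = 10, samples: int = 10_000, seed: int = 1) -> list[int]:
    """Counts per bin of stat(three Uniform(0,1) draws) over `samples` repetitions."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(samples):
        value = stat([rng.random() for _ in range(3)])
        counts[min(int(value * bins), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    for name, stat in [("single draw", lambda xs: xs[0]),
                       ("median", statistics.median),
                       ("average", statistics.fmean)]:
        print(name)
        for i, count in enumerate(histogram_of(stat)):
            print(f"  {i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * (count // 50)}")
```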
The uniform distribution has maximum entropy among continuous distributions on a closed interval, so both of these alternatives, being non-uniform, have lower entropy, i.e., they are more predictable.
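To put a number on "lower entropy": the median of three Uniform(0,1) draws has density 6m(1-m) on [0, 1] (a Beta(2,2) distribution), and its differential entropy works out negative, below the uniform distribution's value of 0. The symmetry step uses the fact that the ln m and ln(1-m) terms integrate to the same value.

```latex
\begin{aligned}
h_{\text{uniform}} &= -\int_0^1 1 \cdot \ln 1 \, dm = 0 \\
h_{\text{median}}  &= -\int_0^1 6m(1-m)\,\ln\bigl(6m(1-m)\bigr)\,dm \\
                   &= -\ln 6 - 2\int_0^1 6m(1-m)\ln m \, dm \\
                   &= -\ln 6 - 2\left(-\tfrac{5}{6}\right)
                    = \tfrac{5}{3} - \ln 6 \approx -0.125
\end{aligned}
```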
To get good random numbers, you need some bits of entropy. Depending on whether they are used for security purposes or not, you could simply take the time from the system clock as a seed for a random number generator, or use more sophisticated means. The open-source project PWGen (available on SourceForge) monitors Windows events as a source of entropy bits.
You can find more information on generating random numbers in C++ in this Stack Overflow question: Random number generation in C++11: how to generate, how does it work? It turns out C++'s random numbers aren't always all that random (see Everything You Never Wanted to Know about C++'s random_device), so it pays to look for a good way to seed; passing the time in ms to srand() and then calling rand() might be a quick and dirty way to go.
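The same two options in Python, as a rough analogue of the C++ advice (the function names are made up for illustration): a clock-seeded `random.Random` for quick, non-security use, and the standard `secrets` module when the numbers protect something. A millisecond clock seed is easy to guess, so the first option should stay out of security code.

```python
import random
import secrets
import time


def quick_and_dirty_rng() -> random.Random:
    """Non-security use: seed a PRNG with the current time in milliseconds,
    the Python counterpart of srand(time) followed by rand()."""
    return random.Random(time.time_ns() // 1_000_000)


def security_token(nbytes: int = 16) -> bytes:
    """Security use: take bytes from the OS entropy source via `secrets`
    instead of a clock-seeded PRNG."""
    return secrets.token_bytes(nbytes)


if __name__ == "__main__":
    print(quick_and_dirty_rng().random())
    print(security_token().hex())
```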

Threshold to stop generating random unique things

Given a population size P, I must generate P random, but unique objects. An object is an unordered list of X unique unordered pairs.
I am currently just using a while loop with T attempts at generating a random ordering before giving up. Currently T = some constant.
So my question is: at what point should I stop attempting to generate more unique objects, i.e., what is a reasonable value of T?
For example:
1) If I have 3 unique objects and I need just one more, I can attempt up to e.g. 4 times
2) But if I have 999 unique objects and I need just one more, I do not want to make e.g. 1000 attempts
The problem I'm dealing with doesn't absolutely require every unique ordering. Actually, the user specifies the number, so I want to determine at what point to say that it is no longer reasonable to generate any more.
I hope that makes sense
If not, a more general case:
When choosing from N possible numbers, at what value of T does it become very difficult to generate more unique random numbers?
I'm not sure if T would be the same in both cases but maybe this second case would be sufficient for my needs. I need a relatively large threshold for small values of N and a relatively small threshold for large values of N.
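A rough way to size T, assuming every attempt is an independent, uniformly random draw from N equally likely objects of which k are already taken (an idealisation; the question's pair-list objects may not be equally likely). The number of attempts until a new object appears is then geometric:

```latex
\begin{aligned}
p_k &= \frac{N-k}{N}, \qquad
\mathbb{E}[\text{attempts until a new object}] = \frac{1}{p_k} = \frac{N}{N-k} \\
\Pr[\text{all } T \text{ attempts fail}] &= \left(\frac{k}{N}\right)^{T} \le \varepsilon
\quad\Longleftrightarrow\quad
T \ge \frac{\ln \varepsilon}{\ln(k/N)}
\end{aligned}
```

For example, with a hypothetical N = 1000 and k = 999, the expected number of attempts is 1000, and pushing the failure chance below ε = 0.01 needs T ≈ 4603; with k = 3, a single attempt fails with probability only 0.003.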
Not that it matters, but this is for a basic genetic algorithm.
Are you asking for something like lottery ticket/ball selection? For that there is a well-known shuffle algorithm, the Fisher–Yates (Knuth) shuffle.
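A sketch of the idea, assuming the candidate objects can be listed (or indexed) up front, which may not hold for the question's lists of pairs: running only the first k steps of a Fisher–Yates shuffle yields k distinct items with no retries at all. Python's built-in `random.sample` does the same job; the explicit loop just shows the mechanics.

```python
import random
from typing import Optional, Sequence, TypeVar

Item = TypeVar("Item")


def sample_unique(population: Sequence[Item], k: int,
                  rng: Optional[random.Random] = None) -> list[Item]:
    """Return k distinct items using the first k steps of a Fisher-Yates shuffle."""
    pool = list(population)
    if not 0 <= k <= len(pool):
        raise ValueError("k must be between 0 and len(population)")
    rng = rng or random.Random()
    for i in range(k):
        j = rng.randrange(i, len(pool))  # pick from the not-yet-chosen tail
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]


if __name__ == "__main__":
    print(sample_unique(range(1, 50), 6, random.Random(2024)))  # six distinct "lottery balls"
```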

Distribute numbers to two "containers" and minimize their difference of sum

Suppose there are n numbers; let's say we have the following 4 numbers: 15, 20, 10, 25.
There are two containers, A and B, and my job is to distribute the numbers between them so that the sums of the numbers in the two containers differ as little as possible.
In the above example, A should have 15 + 20 and B should have 10 + 25, so the difference is 0.
I thought of a method. It seems to work, but I don't know why.
Sort the number list in descending order first. In each round, take the largest remaining number out and put it into the container with the smaller sum.
Btw, can it be solved by DP?
THX
In fact, your method doesn't always work. Consider 2, 4, 4, 5, 5. Your method gives (5,4,2)(5,4), while the best answer is (5,5)(4,4,2).
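The counterexample can be checked with a direct implementation of the questioner's greedy rule (ties go to A, which is an assumption; the question does not say):

```python
def greedy_partition(nums: list[int]) -> tuple[list[int], list[int]]:
    """Largest number first, always into the container with the smaller sum."""
    a, b = [], []
    sum_a = sum_b = 0
    for x in sorted(nums, reverse=True):
        if sum_a <= sum_b:  # tie goes to A
            a.append(x)
            sum_a += x
        else:
            b.append(x)
            sum_b += x
    return a, b


if __name__ == "__main__":
    a, b = greedy_partition([2, 4, 4, 5, 5])
    print(a, b, abs(sum(a) - sum(b)))  # [5, 4, 2] [5, 4] 2
```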
Yes, it can be solved by dynamic programming. Here are some useful links:
Tutorial and Code: http://www.cs.cornell.edu/~wdtseng/icpc/notes/dp3.pdf
A practice: http://people.csail.mit.edu/bdean/6.046/dp/ (then click Balanced Partition)
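For reference, a compact version of the DP, assuming non-negative integers. It tracks which subset sums are reachable (packed here into the bits of one Python integer rather than a matrix) and returns only the smallest achievable difference, not the split itself. Work grows with n times the total sum, which is the scaling concern for very large inputs.

```python
def min_partition_difference(nums: list[int]) -> int:
    """Smallest possible |sum(A) - sum(B)| for non-negative integers, via subset-sum DP.

    Bit s of `reachable` is set when some subset of the numbers seen so far sums to s.
    """
    if any(x < 0 for x in nums):
        raise ValueError("this sketch assumes non-negative integers")
    total = sum(nums)
    reachable = 1  # only the empty subset (sum 0) at the start
    for x in nums:
        reachable |= reachable << x  # each old sum s also allows s + x
    # The best sum for A is the largest reachable sum not exceeding total / 2.
    for s in range(total // 2, -1, -1):
        if (reachable >> s) & 1:
            return total - 2 * s
    return total  # not reached: sum 0 is always reachable


if __name__ == "__main__":
    print(min_partition_difference([2, 4, 4, 5, 5]))  # 0, e.g. (5,5)(4,4,2)
```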
What's more, note that if the problem is very large (say you have 5 million numbers), you won't want to use DP, which needs a table that is far too big. In that case, you can use a kind of Monte Carlo algorithm:
divide the n numbers into two groups randomly (or use your method for this step if you like);
choose one number from each group; if swapping these two numbers decreases the difference of the sums, swap them;
repeat step 2 until "no swap occurred for a long time".
Don't expect this method to always find the best answer, but it is the only way I know to solve this problem at very large scale within reasonable time and memory.
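A sketch of the three swap steps, assuming integer inputs. The stopping rule "no swap for a long time" is modelled as `max_idle` consecutive rejected swaps, an arbitrary choice. Because only swaps are used, the two groups keep the sizes they got in step 1, so balanced splits with different group sizes can be out of reach.

```python
import random
from typing import Optional


def swap_local_search(nums: list[int], max_idle: int = 10_000,
                      rng: Optional[random.Random] = None) -> tuple[list[int], list[int]]:
    """Random split, then random cross-group swaps that shrink |sum(A) - sum(B)|."""
    rng = rng or random.Random()
    items = list(nums)
    rng.shuffle(items)
    half = len(items) // 2
    a, b = items[:half], items[half:]
    diff = sum(a) - sum(b)
    idle = 0
    while a and b and diff != 0 and idle < max_idle:
        i, j = rng.randrange(len(a)), rng.randrange(len(b))
        # Swapping a[i] and b[j] changes sum(a) - sum(b) by 2 * (b[j] - a[i]).
        new_diff = diff + 2 * (b[j] - a[i])
        if abs(new_diff) < abs(diff):
            a[i], b[j] = b[j], a[i]
            diff = new_diff
            idle = 0
        else:
            idle += 1
    return a, b


if __name__ == "__main__":
    a, b = swap_local_search([15, 20, 10, 25], rng=random.Random(4))
    print(a, b, sum(a) - sum(b))
```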

Shuffling a huge range of numbers using minimal storage

I've got a very large range/set of numbers, (1..1236401668096), that I would basically like to 'shuffle', i.e., randomly traverse without revisiting the same number. I will be running a Web service, and each time a request comes in it will increment a counter and pull the next 'shuffled' number from the range. The algorithm will have to accommodate the server going offline by being able to restart traversal from the persisted value of the counter (something like how you can seed a pseudo-random number generator and get the same pseudo-random number given the seed and which iteration you are on).
I'm wondering if such an algorithm exists or is feasible. I've seen the Fisher-Yates shuffle, but its first step is to "write down the numbers from 1 to N", which would take terabytes of storage for my entire range. Generating a pseudo-random number for each request might work for a while, but as the database/tree fills up, collisions will become more common and could degrade performance (already a 0.08% chance of collision after 1 billion hits, according to my calculation). Is there a better solution for my scenario, or is this just a pipe dream?
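Reading the 0.08% figure as the chance that a single fresh draw collides once 1 billion numbers are taken (an interpretation), the arithmetic is simply used/space, and the expected number of draws until an unused number appears is space/(space - used):

```python
def next_draw_collision_probability(used: int, space: int) -> float:
    """Chance that one fresh uniform draw hits one of `used` numbers already handed out."""
    return used / space


def expected_draws_until_fresh(used: int, space: int) -> float:
    """Draws until an unused number appears are geometric with success p = (space - used) / space."""
    return space / (space - used)


if __name__ == "__main__":
    N = 1236401668096
    print(f"{next_draw_collision_probability(10**9, N):.4%}")  # 0.0809%
    print(expected_draws_until_fresh(10**9, N))
```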
The reason for the shuffling is that being able to correctly guess the next number in the sequence could lead to a minor DoS vulnerability in my app, and also that the presentation layer will look much nicer with a wider number distribution (I'd rather not go into details about exactly what the app does). At this point I'm considering just using a PRNG and dealing with collisions, or shuffling range slices (starting with (1..10000000).to_a.shuffle, then (10000001..20000000).to_a.shuffle, etc., as each range's numbers start to run out).
Any mathemagicians out there have any better ideas/suggestions?
Concatenate a PRNG or LFSR sequence with /dev/random bits
There are several algorithms that can generate pseudo-random numbers with arbitrarily large and known periods. The two obvious candidates are the LCPRNG (LCG) and the LFSR, but there are more algorithms such as the Mersenne Twister.
These generators can easily be constructed with a period that fits your requirements, and as long as you stay within one period you simply won't have collisions.
You could deal with the predictable behavior of PRNGs and LFSRs by appending 10, 20, or 30 bits of cryptographically hashed entropy from an interface like /dev/random. Because the deterministic part of your number is known to be unique, it makes no difference if the truly random part ever repeats.
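A sketch of the deterministic part only, using an LCG (one of the generators named above) whose modulus is exactly the size of the range, so one full period visits every number once. The parameters are chosen to satisfy the Hull–Dobell full-period conditions; the constant `k` and the starting point for `c` are arbitrary. Jumping straight to step `counter` in O(log counter) makes restarting from a persisted counter cheap. On its own an LCG is easy to predict, which is why the answer suggests appending random bits; that part is not shown.

```python
import math


def prime_factors(n: int) -> set[int]:
    """Distinct prime factors of n by trial division (quick for n around 10**12)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def full_period_params(m: int, k: int = 1_000_003) -> tuple[int, int]:
    """Choose (a, c) meeting the Hull-Dobell conditions, so x -> (a*x + c) % m
    visits every value in 0..m-1 exactly once per period."""
    step = math.prod(prime_factors(m))  # a - 1 must be divisible by every prime factor of m
    if m % 4 == 0:
        step = math.lcm(step, 4)        # ... and by 4 when 4 divides m
    a = (1 + step * k) % m
    c = m // 3 + 1
    while math.gcd(c, m) != 1:          # c must be coprime to m
        c += 1
    return a, c


def nth_state(n: int, m: int, a: int, c: int, x0: int = 0) -> int:
    """State after n LCG steps from x0, by repeated squaring of the affine map."""
    mul, add = 1, 0                     # accumulated map x -> mul*x + add (identity)
    step_mul, step_add = a, c           # the single-step map applied 2**bit times
    while n:
        if n & 1:
            mul, add = step_mul * mul % m, (step_mul * add + step_add) % m
        step_mul, step_add = step_mul * step_mul % m, (step_mul * step_add + step_add) % m
        n >>= 1
    return (mul * x0 + add) % m


def shuffled_value(counter: int, m: int, a: int, c: int, x0: int = 0) -> int:
    """Number handed out for request `counter` (0-based): a permutation of 1..m."""
    return nth_state(counter, m, a, c, x0) + 1


if __name__ == "__main__":
    M = 1236401668096
    a, c = full_period_params(M)
    # Restarting only needs the persisted counter: request i always maps to the same value.
    print([shuffled_value(i, M, a, c) for i in range(5)])
```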
Divide and conquer? Break the range down into manageable chunks and shuffle them. You could divide the number range, e.g., by value modulo n. Each group can be constructed on demand, and the larger n is, the smaller each group gets. Once a group is exhausted, you move on to the next one.
For example, if you choose n = 1000, you create 1000 different groups. Pick a random number between 0 and 999 (call it x) and shuffle the numbers whose value modulo 1000 equals x. Once you have exhausted that group, choose a new random number between 0 and 999 (excluding x, obviously) to get the next subset to shuffle. It shouldn't be hard to keep track of which of the 1000 residues have already been used, so you'd just need a repeatable shuffle algorithm for the numbers in each subset (e.g., Fisher-Yates on their "indices").
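A small-scale sketch of the modulo-group scheme; `seed`, the helper names and the per-group seeding rule are illustrative assumptions. Groups are visited in a seeded random order and each group is shuffled with a seeded Fisher–Yates (`random.shuffle`), so any request number can be recomputed after a restart. It rebuilds the current group's index list on every call; with the question's range and n = 1000 a group still holds about 1.24 billion numbers, so a real service would need a much larger n and would cache the current group's permutation.

```python
import random


def group_order(n: int, seed: int) -> list[int]:
    """Random but reproducible order in which the n residue classes are used."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def group_members(r: int, n: int, total: int) -> range:
    """Numbers in 1..total whose value modulo n equals r."""
    return range(r if r > 0 else n, total + 1, n)


def nth_shuffled(counter: int, total: int, n: int, seed: int) -> int:
    """Value handed out for request number `counter` (0-based)."""
    for r in group_order(n, seed):
        members = group_members(r, n, total)
        if counter < len(members):
            # Fisher-Yates on the group's indices, seeded per group so it is repeatable.
            idx = list(range(len(members)))
            random.Random(seed * n + r).shuffle(idx)
            return members[idx[counter]]
        counter -= len(members)
    raise IndexError("counter is past the end of the range")


if __name__ == "__main__":
    print([nth_shuffled(i, 100, 7, seed=42) for i in range(10)])
```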
I guess the best option is to use a GUID/UUID. They are made for this type of thing, and it shouldn't be hard to find an existing implementation to suit your needs.
While collisions are theoretically possible, they are extremely unlikely. To quote Wikipedia:
The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs
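In Python, for instance, the standard `uuid` module already provides random (version 4) UUIDs. Note that these are 128-bit identifiers rather than numbers from 1..1236401668096, so this route only fits if the app can use them directly.

```python
import uuid


def new_request_id() -> uuid.UUID:
    """A random version-4 UUID (122 random bits; the rest encode version and variant)."""
    return uuid.uuid4()


if __name__ == "__main__":
    u = new_request_id()
    print(u, u.int)  # string form and the same value as a 128-bit integer
```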

Genetic Algorithm Implementation for weight optimization

I am a data mining student and I have a problem that I was hoping that you guys could give me some advice on:
I need a genetic algorithm that optimizes the weights between three inputs. The weights need to be positive values AND they need to sum to 100%.
The difficulty is in creating an encoding that satisfies the sum to 100% requirement.
As a first pass, I thought I could simply create a chromosome with a series of numbers (e.g., 4, 7, 9). Each weight would simply be its number divided by the sum of all of the chromosome's numbers (e.g., 4/20 = 20%).
The problem with this encoding method is that any change to the chromosome will change the sum of all the chromosome's numbers resulting in a change to all of the chromosome's weights. This would seem to significantly limit the GA's ability to evolve a solution.
Could you give any advice on how to approach this problem?
I have read about real valued encoding and I do have an implementation of a GA but it will give me weights that may not necessarily add up to 100%.
It is mathematically impossible to change one value without changing at least one more if you need the sum to remain constant.
One way to make changes would be exactly what you suggest: weight = value/sum. In this case when you change one value, the difference to be made up is distributed across all the other values.
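The normalization encoding, sketched in Python (names and the Gaussian mutation rule are assumptions): decoding always yields weights summing to 100, but mutating a single gene changes the sum and therefore shifts every weight.

```python
import random


def decode_weights(genes: list[float]) -> list[float]:
    """weight = value / sum, expressed as percentages that add up to 100."""
    total = sum(genes)
    return [100.0 * g / total for g in genes]


def mutate_one_gene(genes: list[float], rng: random.Random, scale: float = 1.0) -> list[float]:
    """Add Gaussian noise to one randomly chosen gene, keeping it positive."""
    out = list(genes)
    i = rng.randrange(len(out))
    out[i] = max(1e-9, out[i] + rng.gauss(0.0, scale))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    genes = [4, 7, 9]
    print(decode_weights(genes))                        # [20.0, 35.0, 45.0]
    print(decode_weights(mutate_one_gene(genes, rng)))  # all three weights shift
```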
The other extreme is to only change pairs. Start with a set of values that add up to 100, and whenever one value changes, change another by the opposite amount to maintain the sum. The other value could be picked randomly or by a rule. I'd expect this to take longer to converge than the first method.
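A sketch of the pair-change mutation, working on the weights directly (the step size and the random choice of partner are assumptions): whatever one weight gains, the other loses, and the step is clipped so neither goes negative.

```python
import random


def pair_mutation(weights: list[float], rng: random.Random, max_step: float = 5.0) -> list[float]:
    """Move an amount between two randomly chosen weights so the total is unchanged."""
    w = list(weights)
    i, j = rng.sample(range(len(w)), 2)
    lo, hi = -min(max_step, w[i]), min(max_step, w[j])
    delta = min(hi, max(lo, rng.uniform(lo, hi)))  # clamp guards against float rounding
    w[i] += delta  # i gains delta ...
    w[j] -= delta  # ... and j gives up exactly the same amount
    return w


if __name__ == "__main__":
    rng = random.Random(1)
    w = [20.0, 35.0, 45.0]
    for _ in range(3):
        w = pair_mutation(w, rng)
        print([round(x, 2) for x in w], round(sum(w), 6))
```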
If your chromosome is only 3 values long, then mathematically, these are your only two options.
