simulating diffusion with cellular automata - algorithm

I am trying to implement an artificial-life simulation in which cells propagate molecule concentrations between them using a 2D cellular automaton.
The problem is that I cannot figure out an algorithm for these diffusion processes. I first thought, "OK, Minecraft did something like that with their redstone stuff, it can't be this hard"...
My implementation currently iterates over a 2D array of cells, where each cell holds a molecule count. Then I apply a kind of convolution with a 3x3 kernel to propagate values whenever the receiving cells are not yet saturated.
For example start values like this:
100, 0, 0
0, 0, 0
0, 0, 0
leads to this configuration:
96, 1, 1
1, 1, 0
0, 0, 0
How can the cells in the middle "know" that they must propagate further even though they are already saturated?
Or is simulating diffusion just not possible that way?
The simulation does not need to be physically correct. All that matters is that the molecules distribute more or less evenly among the cells and that the total number of molecules does not change in the process. Small rounding losses are acceptable.
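One mass-conserving scheme that avoids the saturation problem is to move whole molecules between adjacent cells in proportion to their difference, rather than capping cells at a saturation level. A minimal sketch (the `rate` divisor is an arbitrary illustration parameter, not part of the question):

```python
def diffuse(grid, rate=8):
    # For each horizontally/vertically adjacent pair of cells, move an integer
    # number of molecules proportional to their difference. Because whole
    # molecules move between exactly two cells, the total count never changes.
    rows, cols = len(grid), len(grid[0])
    g = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):          # right and down neighbours
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    diff = g[r][c] - g[nr][nc]
                    # Truncate toward zero so tiny differences stay put.
                    flux = diff // rate if diff >= 0 else -((-diff) // rate)
                    g[r][c] -= flux
                    g[nr][nc] += flux
    return g

grid = [[100, 0, 0], [0, 0, 0], [0, 0, 0]]
for _ in range(50):
    grid = diffuse(grid)
print(sum(map(sum, grid)))   # still 100: molecules are conserved
```

Because flow continues whenever a difference remains, inner cells keep passing molecules outward instead of stopping at a saturation cap; distribution only stops once all differences are below `rate`.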


Evenly-spaced samples from a stream of unknown length

I want to sample K items from a stream of N items that I see one at a time. I don't know how big N is until the last item turns up, and I want the space consumption to depend on K rather than N.
So far I've described a reservoir sampling problem. The major ask though is that I'd like the samples to be 'evenly spaced', or at least more evenly spaced than reservoir sampling manages. This is vague; one formalization would be that the sample indices are a low-discrepancy sequence, but I'm not particularly tied to that.
I'd also like the process to be random and every possible sample to have a non-zero probability of appearing, but I'm not particularly tied to this either.
My intuition is that this is a feasible problem, and the algorithm I imagine preferentially drops samples from the 'highest density' part of the reservoir in order to make space for samples from the incoming stream. It also seems like a common enough problem that someone should have written a paper on it, but Googling combinations of 'evenly spaced', 'reservoir', 'quasirandom', and 'sampling' hasn't gotten me anywhere.
edit #1: An example might help.
Example
Suppose K=3, and I get items 0, 1, 2, 3, 4, 5, ....
After 3 items, the sample would be [0, 1, 2], with spaces of {1}
After 6 items, I'd like to most frequently get [0, 2, 4] with its spaces of {2}, but commonly getting samples like [0, 3, 5] or [0, 2, 5] with spaces of {2, 3} would be good too.
After 9 items, I'd like to most frequently get [0, 4, 8] with its spaces of {4}, but commonly getting samples like [0, 4, 7] with spaces of {4, 3} would be good too.
edit #2: I've learnt a lesson here about providing lots of context when requesting answers. David and Matt's answers are promising, but in case anyone sees this and has a perfect solution, here's some more information:
Context
I have hundreds of low-res videos streaming through a GPU. Each stream is up to 10,000 frames long, and - depending on application - I want to sample 10 to 1000 frames from each. Once a stream is finished and I've got a sample, it's used to train a machine learning algorithm, then thrown away. Another stream is started in its place. The GPU's memory is 10 gigabytes, and a 'good' set of reservoirs occupies a few gigabytes in the current application and plausibly close to the entire memory in future applications.
If space isn't at a premium, I'd oversample using the uniform random reservoir algorithm by some constant factor (e.g., if you need k items, sample 10k) and remember the index that each sampled item appeared at. At the end, use dynamic programming to choose k indexes to maximize (e.g.) the sum of the logs of the gaps between consecutive chosen indexes.
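The dynamic program in the last step could be sketched like this (the function name `best_k_indexes` is mine, and the sketch assumes the sampled indexes are sorted and distinct):

```python
import math

def best_k_indexes(indexes, k):
    # Choose k of the (sorted, distinct) sampled indexes so that the sum of
    # the logs of the gaps between consecutive chosen indexes is maximized.
    # Assumes len(indexes) >= k.
    m = len(indexes)
    NEG = float('-inf')
    # best[j][c]: best score choosing c elements ending at position j.
    best = [[NEG] * (k + 1) for _ in range(m)]
    back = [[None] * (k + 1) for _ in range(m)]
    for j in range(m):
        best[j][1] = 0.0
    for c in range(2, k + 1):
        for j in range(m):
            for i in range(j):
                if best[i][c - 1] == NEG:
                    continue
                score = best[i][c - 1] + math.log(indexes[j] - indexes[i])
                if score > best[j][c]:
                    best[j][c] = score
                    back[j][c] = i
    # Walk the back-pointers from the best final position.
    j = max(range(m), key=lambda j: best[j][k])
    chosen, c = [], k
    while j is not None:
        chosen.append(indexes[j])
        j = back[j][c]
        c -= 1
    return chosen[::-1]

print(best_k_indexes([0, 1, 2, 5, 9], 3))   # [0, 5, 9]
```

Maximizing the sum of log-gaps is the same as maximizing the product of gaps, which pushes the chosen indexes apart; here [0, 5, 9] has gap product 5*4 = 20, the largest available.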
Here's an algorithm that doesn't require much extra memory. Hopefully it meets your quality requirements.
The high-level idea is to divide the input into k segments and choose one element uniformly at random from each segment. Given the memory constraint, we can't make the segments as even as we would like, but they'll be within a factor of two.
The simple version of this algorithm (which uses 2k reservoir slots and may return a sample of any size between k and 2k) starts by reading the first k elements, then proceeds in rounds. In round r (counting from zero), we read k·2^r elements, using the standard reservoir algorithm to choose one random sample from each segment of length 2^r. At the end of each round, we append these samples to the existing reservoir and do the following compression step: for each pair of consecutive elements, choose one uniformly at random to retain and discard the other.
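A sketch of this simple version (function name is mine; for brevity this sketch simply discards the samples of a partially completed final round, so it always returns exactly k elements):

```python
import random

def evenly_spaced_sample(stream, k):
    # Read the first k elements, then in round r take one uniform sample from
    # each of k segments of length 2**r. After each round, compress pairs of
    # consecutive reservoir entries by keeping one of each pair at random.
    it = iter(stream)
    reservoir = []
    seg_len = 1
    try:
        for _ in range(k):
            reservoir.append(next(it))
        while True:
            new = []
            for _ in range(k):
                pick = next(it)
                for j in range(1, seg_len):
                    # Standard size-1 reservoir step: replace with prob 1/(j+1).
                    x = next(it)
                    if random.randrange(j + 1) == 0:
                        pick = x
                new.append(pick)
            reservoir.extend(new)
            # Compression: adjacent segments cover consecutive index ranges,
            # so keeping one element per pair halves the reservoir.
            reservoir = [random.choice(pair)
                         for pair in zip(reservoir[0::2], reservoir[1::2])]
            seg_len *= 2
    except StopIteration:
        pass
    return reservoir
```

Each surviving element represents a contiguous block of indexes, and the blocks tile the prefix of the stream read so far, which is what keeps the sample roughly evenly spaced.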
The complicated version of this algorithm uses k slots and returns a sample of size k by interleaving the round sampling step with compression. Rather than write a formal description, I'll demonstrate it, since I think that will be easier to understand.
Let k = 8. We pick up after 32 elements have been read. I use the notation [a-b] to mean a random element whose index is between a and b inclusive. The reservoir looks like this:
[0-3] [4-7] [8-11] [12-15] [16-19] [20-23] [24-27] [28-31]
Before we process the next element (32), we have to make room. This means merging [0-3] and [4-7] into [0-7].
[0-7] [32] [8-11] [12-15] [16-19] [20-23] [24-27] [28-31]
We merge the next few elements into [32].
[0-7] [32-39] [8-11] [12-15] [16-19] [20-23] [24-27] [28-31]
Element 40 requires another merge, this time of [16-19] and [20-23]. In general, we do merges in a low-discrepancy order.
[0-7] [32-39] [8-11] [12-15] [16-23] [40] [24-27] [28-31]
Keep going.
[0-7] [32-39] [8-11] [12-15] [16-23] [40-47] [24-27] [28-31]
At the end of the round, the reservoir looks like this.
[0-7] [32-39] [8-15] [48-55] [16-23] [40-47] [24-31] [56-63]
We use standard techniques from FFT to undo the butterfly permutation of the new samples and move them to the end.
[0-7] [8-15] [16-23] [24-31] [32-39] [40-47] [48-55] [56-63]
Then we start the next round.
Perhaps the simplest way to do reservoir sampling is to associate a random score with each sample, and then use a heap to remember the k samples with the highest scores.
This corresponds to applying a threshold operation to white noise, where the threshold value is chosen to admit the correct number of samples. Every sample has the same chance of being included in the output set, exactly as if k samples were selected uniformly.
If you sample blue noise instead of white noise to produce your scores, however, then applying a threshold operation will produce a low-discrepancy sequence and the samples in your output set will be more evenly spaced. This effect occurs because, while white noise samples are all independent, blue noise samples are temporally anti-correlated.
This technique is used to create pleasing halftone patterns (google Blue Noise Mask).
Theoretically, it works for any final sampling ratio, but realistically it's limited by numeric precision. I think it has a good chance of working OK for your range of 1-100, but I'd be more comfortable with 1-20.
There are many ways to generate blue noise, but probably your best choices are to apply a high-pass filter to white noise or to construct an approximation directly from 1D Perlin noise.
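As a rough sketch of the score-threshold idea: here the "blue noise" is just the first difference of white noise, which is the simplest possible high-pass filter (don't take this as the best generator, only as an illustration of the heap-plus-scores mechanism):

```python
import heapq
import random

def blue_noise_sample(stream, k):
    # Score each item with high-pass filtered white noise: the first
    # difference of independent uniforms makes consecutive scores
    # anti-correlated, so neighbouring items rarely both score high.
    heap = []                       # (score, item); heap[0] has lowest score
    prev = random.random()
    for item in stream:
        cur = random.random()
        score = cur - prev          # crude blue-noise score
        prev = cur
        if len(heap) < k:
            heapq.heappush(heap, (score, item))
        elif score > heap[0][0]:
            # Keep only the k highest-scoring items seen so far.
            heapq.heapreplace(heap, (score, item))
    return sorted(item for _, item in heap)
```

Thresholding the scores at "whatever admits k items" is exactly what the min-heap does, so this is the streaming analogue of the halftone-mask thresholding described above.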

Determine Depth of Huffman Tree using Input Character Pattern (or Frequency)?

I'd like to ask a variation on this question regarding Huffman tree building. Is there any way to calculate the depth of a Huffman tree from the input (or the frequencies), without drawing the tree?
If there is no quick way, how was the answer to that question found? A specific example: for 10 input symbols with frequencies 1 to 10, the depth is 5.
If you are looking for an equation to take the frequencies and give you the depth, then no, no such equation exists. The proof is that there exist sets of frequencies on which you will have arbitrary choices to make in applying the Huffman algorithm that result in different depth trees! So there isn't even a unique answer to "What is the depth of the Huffman tree?" for some sets of frequencies.
A simple example is the set of frequencies 1, 1, 2, and 2, which can give a depth of 2 or 3 depending on which minimum frequencies are paired when applying the Huffman algorithm.
The only way to get the answer is to apply the Huffman algorithm. You can take some shortcuts to get just the depth, since you won't be using the tree at the end. But you will be effectively building the tree no matter what.
You might be able to approximate the depth, or at least put bounds on it, with an entropy equation. In some special cases the bounds may be restrictive enough to give you the exact depth. E.g. if all of the frequencies are equal, then you can calculate the depth to be the ceiling of the log base 2 of the number of symbols.
A cool example that shows that a simple entropy bound won't be strong enough to get the exact answer is when you use the Fibonacci sequence for the frequencies. This assures that the depth of the tree is the number of symbols minus one. So the frequencies 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, and 610 will result in a depth of 14 bits even though the entropy of the lowest frequency symbol is 10.64 bits.
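One such shortcut is to run the Huffman merge loop on (weight, height) pairs instead of tree nodes; this sketch uses one particular tie-breaking order, and as noted above, other tie-breaks can give different depths for some inputs:

```python
import heapq

def huffman_depth(freqs):
    # Repeatedly merge the two lowest-weight subtrees, tracking only each
    # subtree's weight and height instead of building the tree itself.
    heap = [(f, 0) for f in freqs]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, d1 = heapq.heappop(heap)
        w2, d2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, max(d1, d2) + 1))
    return heap[0][1]

print(huffman_depth(range(1, 11)))   # 5, matching the example above
```

Running it on the Fibonacci frequencies 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610 gives 14, matching the degenerate case described above.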

How do binary connections eliminate multiplication?

I was reading Neural Network with Few Multiplications and I'm having trouble understanding how Binary or Ternary Connect eliminate the need for multiplication.
They explain that by stochastically sampling the weights from [-1, 0, 1], we eliminate the need to multiply and Wx can be calculated using only sign changes. However, even with weights strictly -1, 0, and 1, how can I change the signs of x without multiplication?
E.g., W = [0, 1, -1] and x = [0.3, 0.2, 0.4]. Wouldn't I still need to multiply W and x to get [0, 0.2, -0.4]? Or is there some other way to change the sign more efficiently than multiplication?
Yes. All the general-purpose processors I know of since the "early days" (say, 1970) have a machine operation to take the magnitude of one number, the sign of another, and return the result. The data transfer happens in parallel: the arithmetic part of the operation is a single machine cycle.
Many high-level languages have this capability as a built-in function. It often comes under a name such as "copy_sign".
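For illustration, Python's math.copysign is one such built-in, and the ternary dot product below (a sketch, not from the paper) shows why weights restricted to {-1, 0, 1} need no multiplies at all:

```python
import math

# copysign transfers the sign of one number onto the magnitude of another.
print(math.copysign(0.4, -1.0))   # -0.4

def ternary_dot(w, x):
    # For weights in {-1, 0, 1}, W.x needs no multiplication: each weight
    # merely selects +x_i, -x_i, or nothing.
    total = 0.0
    for wi, xi in zip(w, x):
        if wi == 1:
            total += xi      # sign kept
        elif wi == -1:
            total -= xi      # sign flipped by negation, not multiplication
    return total

print(ternary_dot([0, 1, -1], [0.3, 0.2, 0.4]))   # -0.2
```

Negation only flips a sign bit, so the whole dot product reduces to additions and subtractions.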

Grouping or Clustering Algorithm

Similar questions in the database seem to be much more complicated than my example. I want to cluster roughly 100 points on a line. The number of groups is irrelevant; the closeness of points is what matters.
What is a term, method, or algorithm for this grouping problem? K-means, Hamming distance, hierarchical agglomeration, clique, or complete linkage?
I've reduced two examples to bare minimum for clarification:
Simple example:
Set A = {600, 610, 620, 630} and the set of differences between its elements is diff_A = {10, 20, 30, 10, 20, 10}. I can then group as follows: {10, 10, 10}, {20, 20}, and {30}. Done.
Problematic example:
Set B = {600, 609, 619, 630} and the set of differences is diff_B = {9, 10, 11, 19, 21, 30}. I try to group with a tolerance of 1, i.e. differences within 1 of each other are 'similar enough' to be grouped, but I get a paradox: {9, 10} AND/OR {10, 11}, {19}, {21}, and {30}.
Issue:
9 and 10 are close enough, 10 and 11 are close enough, but 9 and 11 are not, so how should I handle these overlapping groups? Perhaps this small example is unsolvable because it is symmetrical?
Why do you work on the pairwise differences? Consider the values 1, 2, 101, 102, 201, 202. The pairwise differences are 1, 100, 101, 200, 201, 99, 100, 199, 200, 1, 100, 101, 99, 100, 1.
The values of ~200 bear no information; there is a different "cluster" in between, so you shouldn't use them in your analysis.
Instead, grab a statistics textbook and look up Kernel Density Estimation. Don't bother looking for clustering methods - they are usually designed for the multivariate case. Your data is 1-dimensional: it can be sorted (it probably already is), and that can be exploited for better results.
There are well-established heuristics for density estimation on such data, and you can split your data at local minima of the density (or simply at a low density threshold). This is much simpler, yet robust and reliable. You don't need to set a parameter such as k in k-means. There are cases where k-means is a good choice - it has its origins in signal detection, where it was known that there were k=10 different signal frequencies. Today, though, it is mostly used on multidimensional data.
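A minimal sketch of that recipe (the `bandwidth` is a parameter you'd tune, playing a role analogous to the tolerance above; the function name is mine):

```python
import math

def cluster_1d(points, bandwidth):
    # Gaussian kernel density estimate evaluated on a fine grid; the sorted
    # data is then split wherever the density has a local minimum.
    pts = sorted(points)
    lo, hi = pts[0] - 3 * bandwidth, pts[-1] + 3 * bandwidth
    n = 512
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    dens = [sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in pts)
            for g in grid]
    # Cut points are the grid positions of local density minima.
    cuts = [grid[i] for i in range(1, n - 1)
            if dens[i] <= dens[i - 1] and dens[i] < dens[i + 1]]
    clusters, current = [], [pts[0]]
    for prev, p in zip(pts, pts[1:]):
        if any(prev < c < p for c in cuts):
            clusters.append(current)
            current = []
        current.append(p)
    clusters.append(current)
    return clusters

print(cluster_1d([600, 609, 619, 630, 700, 701, 702], bandwidth=3))
```

With a small bandwidth the 600s split apart and the tight 700-702 group stays together; a larger bandwidth merges the 600s into one cluster, so the bandwidth directly expresses "how close is close enough".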
See also:
Cluster one-dimensional data optimally?
1D Number Array Clustering
partitioning an float array into similar segments (clustering)
What clustering algorithm to use on 1-d data?

How to program a function to return values on some sort of probability?

This question arose to me while I was playing FIFA.
Presumably, they programmed a complex function that includes all the factors like shooting skill, distance, shot power, etc., to calculate the probability that the shot hits the target. How would they have programmed it so that a goal happens according to that probability?
In other words, suppose a function X() should return 1 with probability 89% and 0 with probability 11%. How would I program it so that it returns 1 (approximately) 89 times in 100 trials?
Generate a uniformly-distributed random number between 0 and 1, and return true if the number is less than the desired probability (0.89).
For example, in IPython:
In [13]: from random import random
In [14]: vals = [random() < 0.89 for i in range(10000)]
In [15]: sum(vals)
Out[15]: 8956
In this realisation, 8956 out of the 10000 boolean outcomes are true. If we repeat the experiment, the number will vary around 8900.
That is not how goals are determined in FIFA or other video games. They don't have a function that says, with some probability, the shot makes it or doesn't.
Rather, they simulate a ball actually being kicked into a goal.
The ball will have some speed (based on the "shot power") and some trajectory angle (based on where the player aimed, plus some variability based on the character's "shot skill"). Then they allow physics - and the AI of the goalie, if there is one - to take over, and count it as a point only when the ball physically enters the goal.
There is of course still randomness involved, but there is no single variable that decides whether or not a shot will make it.
I'm not 100% sure, but one way I would achieve it:
Generate a random number between 0 and 100. If the number is less than 89, return 1; otherwise return 0.
If you have a random number generator, then you would do something like:
bool return_true_89_out_of_100() {
    double random_n = rand() / (double)RAND_MAX; // uniform in [0, 1]
    return random_n < 0.89;
}
You can generate a crudely random number by, for example, sampling the lower bits of the CPU clock, or with some mathematical tricks.
You're tagged language agnostic, but the answer depends on what random number function(s) are available to you. Furthermore the accuracy may depend on how close to being truly random your generator is (generally they're not that close).
As to random number functions, there tend to be two kinds -- those which generate a number between 0 and 1, and those that generate a number between m and n. Each can be used to derive a percentage easily.
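To make the two kinds concrete (Python's random module happens to provide both; the function names are mine):

```python
import random

# Kind 1: a generator yielding a float in [0, 1).
def goal_from_unit_rng():
    return random.random() < 0.89          # true with probability 0.89

# Kind 2: a generator yielding an integer between m and n inclusive.
def goal_from_integer_rng():
    return random.randint(1, 100) <= 89    # 89 of the 100 equally likely outcomes
```

Either way, the comparison against the desired percentage is what converts a uniform draw into a biased yes/no answer.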
