Obtaining gain ratio with joint entropy - entropy

I want to normalize the information gain between two discrete random variables to obtain the gain ratio, as a unit-invariant coefficient. Is it acceptable to normalize the information gain by the joint entropy, as a shortcut to the intrinsic value, or do I have to do it as described in the Wikipedia article linked below?
http://en.wikipedia.org/wiki/Information_gain_ratio
Many thanks!
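For concreteness, here is a minimal sketch of the two normalizations being compared. It assumes xs holds the attribute values and ys the class labels as paired samples, and that the intrinsic value is the entropy of the attribute, H(X). The function names are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())

def gain_measures(xs, ys):
    """Return (information gain, gain ratio, gain / joint entropy)."""
    h_x = entropy(xs)                      # intrinsic value, assumed to be H(X)
    h_y = entropy(ys)
    h_xy = entropy(list(zip(xs, ys)))      # joint entropy H(X, Y)
    gain = h_x + h_y - h_xy                # I(X;Y) = H(X) + H(Y) - H(X,Y)
    ratio = gain / h_x if h_x > 0 else 0.0
    joint_norm = gain / h_xy if h_xy > 0 else 0.0
    return gain, ratio, joint_norm
```

With xs = a, a, b, b and ys = 0, 1, 2, 3 this gives a gain ratio of 1 but a gain over joint entropy of 0.5, so the two normalizations do not coincide in general.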

Related

Generating Gaussian Random Numbers without a Uniform Random Number Generator

I know of many uniform random number generators (RNGs) based on various algorithms, physical systems and so on. Eventually, all of these lead to uniformly distributed random numbers. It is interesting and important to know whether there are Gaussian RNGs, i.e. algorithms or other sources that create Gaussian random numbers directly. More precisely, I don't want to use transformations such as Box–Muller or the Marsaglia polar method to get Gaussian numbers from uniform RNGs. I am interested in any paper, algorithm or even idea for creating Gaussian random numbers without any use of uniform RNGs. In other words, let's pretend that we don't know uniform random number generators exist.
As already noted in the answers and comments, by virtue of the CLT a sum of iid random numbers (with finite variance) can be made into a reasonable-looking Gaussian. If the incoming stream is uniform, this is basically the Bates distribution. Ami Tavory's answer pretty much amounts to using Bates in disguise. You could look at the closely related Irwin-Hall distribution; at n=12 or higher it looks a lot like a Gaussian.
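As a rough illustration of the Irwin-Hall approach (a sketch only; it does rely on a uniform source, and the function name is made up), the sum of n uniform variates has mean n/2 and variance n/12, so it can be standardised like this:

```python
import random

def irwin_hall_gaussian(rng, n=12):
    """Approximate N(0, 1) draw: standardised sum of n U(0, 1) variates."""
    total = sum(rng.random() for _ in range(n))
    # The sum has mean n/2 and variance n/12; for n = 12 the divisor is 1.
    return (total - n / 2) / (n / 12) ** 0.5

rng = random.Random(0)          # seed chosen arbitrarily for repeatability
sample = [irwin_hall_gaussian(rng) for _ in range(5)]
```

Note that with n = 12 the output can never fall outside [-6, 6], so the tails are truncated compared with a true Gaussian.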
There is one method which is used in practice and does not rely on a transformation of U(0,1): the Wallace method (Wallace, C. S. 1996. "Fast Pseudorandom Generators for Normal and Exponential Variates." ACM Transactions on Mathematical Software.), also known as the Gaussian pool method. I would advise reading the description here to see if it fits your purpose.
As others have noted, it's a bit unclear what your motivation for this is, and therefore I'm not sure if the following answers your question.
Nevertheless, it is possible to generate (an approximation of) this without the specific formulas transforming uniform RNGs that you mention.
As with any RNG, we have to have some source of randomness (or pseudo-randomness). I'm assuming, therefore, that there is some limitless sequence of binary bits which are independently equally likely to be 0 or 1 (note that it's possible to counter that this is a uniform discrete binary RNG, so I'm unsure if this answers your question).
Choose some large fixed n. For each invocation of the RNG, generate n such bits, sum them as x, and return
(2 x - n) / √n
By the de Moivre–Laplace theorem, for large n this is approximately normal with mean 0 and variance 1 (x itself has mean n/2 and variance n/4).
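A minimal sketch of this idea, assuming Python's random.getrandbits as the stand-in bit source (any stream of fair bits would do). The count x of one-bits among n fair bits has mean n/2 and variance n/4, so standardising it gives (2x - n)/√n:

```python
import random

def gaussian_from_bits(rng, n=1024):
    """Approximate N(0, 1) draw from n fair bits (de Moivre-Laplace)."""
    x = bin(rng.getrandbits(n)).count("1")   # number of ones among the n bits
    # x ~ Binomial(n, 1/2): subtract the mean n/2, divide by the std sqrt(n)/2.
    return (2 * x - n) / n ** 0.5

rng = random.Random(0)          # seed chosen arbitrarily for repeatability
sample = [gaussian_from_bits(rng) for _ in range(5)]
```

The output is discrete, with a spacing of 2/√n between possible values, so n has to be large if a fine-grained result is needed.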

Why are JPEG quantization matrices asymmetric?

Why are the basis blocks corresponding to reflected waves given seemingly random priorities in the standard JPEG quantisation matrices? Also, why aren't the priorities monotonic with respect to frequency?
I haven't been able to find any explanation and all I can come up with is possible tiling patterns occurring with symmetric quantisation matrices or an adaptation to the arrangement of photoreceptors in the eye.
The quantization tables are a set of fudge factors that attempt to model human perception.
The specific quantization table values are more art than science, because human perception is quirky and complex, and ideal coefficients depend on specific viewing conditions that can only be roughly guessed in advance.
Tables are not always monotonic with respect to frequency, because blocks of certain frequencies form patterns that are more useful than others, e.g. for straight horizontal and vertical lines.

What's the randomness quality of the Perlin/Simplex Noise algorithms?

What's the randomness quality of the Perlin Noise algorithm and Simplex Noise algorithm?
Which algorithm of the two has better randomness?
Compared with standard pseudo-random generators, does it make sense to use Perlin/Simplex as random number generator?
Update:
I know what the Perlin/Simplex Noise is used for. I'm only curious of randomness properties.
Perlin noise and simplex noise are meant to generate useful noise, not to be completely random. These algorithms are generally used to create procedurally generated landscapes and the like. For example, it can generate terrain such as this (image from here):
In this image, the noise generates a 2D heightmap such as this (image from here):
Each pixel's color represents a height. After producing a heightmap, a renderer is used to create terrain matching the "heights" (colors) of the image.
Therefore, the results of the algorithm are not actually "random"; there are lots of easily discernible patterns, as you can see.
Simplex supposedly looks a bit "nicer", which would imply less randomness, but its main purpose is to produce similar noise while scaling better to higher dimensions. That is, when producing 3D, 4D or 5D noise, simplex noise would outperform Perlin noise and produce similar results.
If you want a general pseudo-random number generator, look at the Mersenne twister or other PRNGs. Be warned: with respect to cryptography, PRNGs can be full of caveats.
Update:
(response to the OP's updated question)
As for the random properties of these noise functions, I know Perlin noise uses a (very) poor man's PRNG as input, and does some smoothing/interpolation between neighboring "random" pixels. The input randomness is really just pseudorandom indexing into a precomputed random vector.
The index is computed using some simple integer operations, nothing too fancy. For example, the noise++ project uses precomputed "randomVectors" (see here) to obtain its source noise, and interpolates between different values from this vector. It generates a "random" index into this vector with some simple integer operations, adding a small amount of pseudorandomness. Here is a snippet:
int vIndex = (NOISE_X_FACTOR * ix + NOISE_Y_FACTOR * iy + NOISE_Z_FACTOR * iz + NOISE_SEED_FACTOR * seed) & 0xffffffff;
vIndex ^= (vIndex >> NOISE_SHIFT);
vIndex &= 0xff;
const Real xGradient = randomVectors3D[(vIndex<<2)];
...
The somewhat random noise is then smoothed over and in effect blended with neighboring pixels, producing the patterns.
After producing the initial noise, Perlin/simplex noise has the concept of octaves of noise; that is, reblending the noise into itself at different scales. This produces yet more patterns. So the initial quality of the noise is probably only as good as the precomputed random arrays, plus the effect of the pseudorandom indexing. But after all that the Perlin noise does to it, the apparent randomness decreases significantly (it actually spreads over a wider area, I think).
As stated in "The Statistics of Random Numbers", AI Game Wisdom 2, asking which produces 'better' randomness depends on what you're using it for. Generally, the quality of PRNGs is compared via test batteries. At the time of print, the author indicates that the best known and most widely used test batteries for testing the randomness of PRNGs are ENT and Diehard. Also, see the related questions on how to test random numbers and why statistical randomness tests seem ad hoc.
Beyond the standard issues of testing typical PRNGs, testing Perlin Noise or Simplex Noise as PRNGs is more complicated because:
1. Both internally require a PRNG, thus the randomness of their output is influenced by the underlying PRNG.
2. Most PRNGs lack tunable parameters. In contrast, Perlin noise is a summation of one or more coherent-noise functions (octaves) with ever-increasing frequencies and ever-decreasing amplitudes. Since the final image depends on the number and nature of the octaves used, the quality of the randomness will vary accordingly. (libnoise: Modifying the Parameters of the Noise Module)
3. An argument similar to #2 holds for varying the number of dimensions used in Simplex noise, as "a 3D section of 4D simplex noise is different from 3D simplex noise." (Stefan Gustavson, Simplex noise demystified)
i think you are confused.
perlin and simplex take random numbers from some other source and make them less random so that they look more like natural landscapes (random numbers alone do not look like natural landscapes).
so they are not a source of random numbers - they are a way of processing random numbers from somewhere else.
and even if they were a source, they would not be a good source (the numbers are strongly correlated).
do NOT use perlin or simplex for randomness. they aren't meant for that. they're an /application/ of randomness.
people choose these for their visual appeal, which hasn't been sufficiently discussed yet, so i'll focus on that.
perlin/simplex with smoothstep are perfectly smooth. no matter how far you zoom, they will always be a gradient, not a vertex or edge.
the output range is (+/- 1/2 x #dimensions), so you need to compensate for this to get it to the range 0 to 1 or -1 to 1 as needed. fixing this is standard. adding octaves will increase this range by the scaling factor of the octave (it's usually half the bigger octave of course).
perlin/simplex noise have the bizarre quality of being brown noise when zoomed in and blue noise when zoomed out. neither one nor a middle zoom is especially good for prng purposes, but they're great for faking natural occurrences (which aren't really random, and /are/ spatially biased).
both perlin and simplex noise tend to have some bias along the axes, with perlin having a few more problems in this area. edit: getting away from even more bias in three dimensions is very complicated. it's difficult (impossible?) to generate a large number of unbiased points upon a sphere.
perlin results tend to be circular with octagonal bias, while simplex tends to generate ovals with hexagonal bias.
a slice of higher dimensional simplex doesn't look like lower dimensional simplex. but a 2d slice of 3d perlin looks pretty much just like 2d perlin.
most people feel that simplex can't actually handle higher dimensions - it tends to "look worse and worse" for higher dimensions. perlin allegedly doesn't have this problem (it still has bias though).
i believe once "octaved" they both have similar triangular distribution of output when layered (similar to rolling 2 dice; i'd love if someone could double check this for me), and so both benefit from a smoothstep. this is standard. (it's possible to bias the results for equal output but it would still have dimensional biases that would fail prng quality tests due to high spatial correlation, which is /the/ feature, not a bug.)
please note that the octaves technique is not part of perlin or simplex definition. it is merely a trick frequently used in conjunction with them. perlin and simplex blend gradients at equally distributed points. octaves of this noise are combined to create larger and smaller structures. this is also frequently used in "value noise" which uses basically the white noise equivalent to this concept instead of the perlin noise. value noise with octaves will also exhibit /even worse/ octagonal bias. hence why perlin or simplex are preferred.
simplex is faster in all cases - /especially/ in higher dimensions.
so simplex fixes the problems of perlin in both performance and visuals, but introduces its own problems.

Determining if a dataset approximates a sine wave

Is there an algorithm that can be used to determine whether a sample of data taken at fixed time intervals approximates a sine wave?
Take the Fourier transform, which transforms the data into a frequency table (search for FFT, the fast Fourier transform, for an implementation; for example, FFTW). If it is a sine or cosine, the frequency table will contain one very high value corresponding to the frequency you're searching for and some noise at other frequencies.
Alternatively, generate sines at several frequencies and compare each with your signal, for example using the sum of squares of the differences between your signal and the sine you're trying to fit (or a cross-correlation). You would need to do this for sines at a range of frequencies, of course. And you would need to do this while translating the sine along the x-axis to find the phase.
You can calculate the fourier transform and look for a single spike. That would tell you that the dataset approximates a sine curve at that frequency.
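A minimal sketch of the "single spike" check, using a naive O(N²) DFT from the standard library rather than a real FFT library. The function names are made up, and the threshold for what counts as a sine is left to you. It reports the dominant frequency bin and the fraction of the signal's power concentrated in it.

```python
import cmath
import math

def dft_power(samples):
    """Power in DFT bins 1 .. below Nyquist (mean removed first)."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]
    power = []
    for k in range(1, (n + 1) // 2):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centred))
        power.append(abs(coeff) ** 2)
    return power

def sine_score(samples):
    """Return (dominant bin index, fraction of power in that bin)."""
    power = dft_power(samples)
    total = sum(power)
    if total == 0:
        return 0, 0.0            # constant signal: no oscillation at all
    peak = max(range(len(power)), key=power.__getitem__)
    return peak + 1, power[peak] / total

n = 128
signal = [math.sin(2 * math.pi * 5 * i / n + 0.3) for i in range(n)]
bin_index, fraction = sine_score(signal)   # dominant bin 5, fraction near 1
```

A fraction close to 1 suggests a single sine. If the sine's period does not divide the record length, its power leaks into neighboring bins, so in practice you may want to sum a few bins around the peak or apply a window first.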
Shot into the blue: You could take advantage of the fact that the integral of a*sin(t) is -a*cos(t) (plus a constant). Keeping track of min/max of your data should allow you to know a.
Check the least squares method.
@CookieOfFortune: I agree, but the Fourier series fit is optimal in the least squares sense (as is said in the Wikipedia article).
If you want to play around first with your own input data, check the Discrete Fourier Transform (DFT) on Wolfram Alpha. As noted before, if you want a fast implementation you should check out one of several FFT libraries.

How to 'smooth' data and calculate line gradient?

I'm reading data from a device which measures distance. My sample rate is high so that I can measure large changes in distance (i.e. velocity) but this means that, when the velocity is low, the device delivers a number of measurements which are identical (due to the granularity of the device). This results in a 'stepped' curve.
What I need to do is to smooth the curve in order to calculate the velocity. Following that I then need to calculate the acceleration.
How to best go about this?
(Sample rate up to 1000Hz, calculation rate of 10Hz would be ok. Using C# in VS2005)
The wikipedia entry from moogs is a good starting point for smoothing the data. But it does not help you in making a decision.
It all depends on your data, and the needed processing speed.
Moving Average
Will flatten the top values. If you are interested in the minimum and maximum value, don't use this. Also I think using the moving average will influence your measurement of the acceleration, since it will flatten your data (a bit), so the acceleration will appear to be smaller. It all comes down to the needed accuracy.
Savitzky–Golay
Fast algorithm. As fast as the moving average. That will preserve the heights of peaks. Somewhat harder to implement. And you need the correct coefficients. I would pick this one.
Kalman filters
If you know the distribution, this can give you good results (it is used in GPS navigation systems). Maybe somewhat harder to implement. I mention this because I have used them in the past. But they are probably not a good choice for a starter in this kind of stuff.
The above will reduce noise on your signal.
The next thing to do is detect the start and end point of the "acceleration". You could do this by taking the derivative of the original signal. The point(s) where the derivative crosses zero are probably the peaks in your signal, and might indicate the start and end of the acceleration.
You can then take the second derivative to get the minimum and maximum acceleration itself.
You need a smoothing filter, the simplest would be a "moving average": just calculate the average of the last n points.
The question here is, how to determine n, can you tell us more about your application?
(There are other, more complicated filters. They vary on how they preserve the input data. A good list is in Wikipedia)
Edit!: For 10Hz, average the last 100 values.
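A small sketch of that suggestion, written in Python rather than C# for brevity, with made-up function names. It averages non-overlapping blocks of 100 samples (one simple variant of a moving average) to go from 1000 Hz to 10 Hz, then differences the averages to get velocity.

```python
def block_average(samples, block=100):
    """Average consecutive non-overlapping blocks (e.g. 1000 Hz -> 10 Hz)."""
    return [sum(samples[i:i + block]) / block
            for i in range(0, len(samples) - block + 1, block)]

def velocity(distances, dt):
    """First difference of equally spaced distance values."""
    return [(b - a) / dt for a, b in zip(distances, distances[1:])]

# Stepped input: the reading only changes every 20 samples (device granularity).
raw = [i // 20 for i in range(1000)]       # 1 s of data at 1000 Hz, in ticks
smoothed = block_average(raw, block=100)   # 10 values, one per 0.1 s
speeds = velocity(smoothed, dt=0.1)        # ticks per second
```

In this idealised example every block spans a whole number of steps, so the stepping averages out and the speeds are constant. With real data the choice of block length is a trade-off between smoothness and response time.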
Moving averages are generally terrible, but work well for white noise. Both moving averages and Savitzky-Golay boil down to a correlation, and therefore are very fast and could be implemented in real time. If you need higher order information like first and second derivatives, SG is a good choice. The magic of SG lies in the constant correlation coefficients needed for the filter: once you have decided the length and degree of polynomial to fit locally, the coefficients need only be found once. You can compute them using R (sgolay) or Matlab.
You can also estimate a noisy signal's first derivative via the Savitzky-Golay best-fit polynomials - these are sometimes called Savitzky-Golay derivatives - and typically give a good estimate of the first derivative.
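As an illustration, here is a sketch with the standard 5-point, quadratic-fit Savitzky-Golay coefficients hard-coded (again in Python for brevity; the function names are made up). Longer windows or other polynomial degrees need different coefficients, and the two samples at each end are simply dropped here.

```python
SG_SMOOTH_5 = (-3, 12, 17, 12, -3)   # smoothing weights, divide by 35
SG_DERIV_5 = (-2, -1, 0, 1, 2)       # first-derivative weights, divide by 10*dt

def sg_smooth(y):
    """5-point quadratic Savitzky-Golay smoothing (interior points only)."""
    return [sum(c * y[i + j - 2] for j, c in enumerate(SG_SMOOTH_5)) / 35.0
            for i in range(2, len(y) - 2)]

def sg_derivative(y, dt):
    """5-point quadratic Savitzky-Golay first derivative (interior points)."""
    return [sum(c * y[i + j - 2] for j, c in enumerate(SG_DERIV_5)) / (10.0 * dt)
            for i in range(2, len(y) - 2)]

dt = 0.5
distance = [(i * dt) ** 2 for i in range(10)]   # d(t) = t^2, so v(t) = 2t
speed = sg_derivative(distance, dt)             # estimates v at t = 1.0 .. 3.5
```

Because the local fit is a quadratic, a quadratic input is reproduced exactly. Applying sg_derivative to the speed values again would give an acceleration estimate.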
Kalman filtering can be very effective, but it's heavier computationally - it's hard to beat a short convolution for speed!
Paul
CenterSpace Software
In addition to the above articles, have a look at Catmull-Rom Splines.
You could use a moving average to smooth out the data.
In addition to GvS's excellent answer above, you could also consider smoothing or reducing the stepping effect of your averaged results using some general curve fitting, such as cubic or quadratic splines.
