Bug in density calculation std::piecewise_constant_distribution? - c++11

It seems that std::piecewise_constant_distribution computes the densities wrongly, at least with GCC and its standard library.
According to http://www.cplusplus.com/reference/random/piecewise_constant_distribution/, the densities should be computed as:
Checking this manually reveals the bug!
This can be seen here: http://coliru.stacked-crooked.com/a/ca171bf600b5148f
The source code related to this is found in /usr/include/c++/4.8/bits/random.tcc (on Linux). The extract below of the initialization function _M_initialize, which is called by the constructor, shows what I believe is incorrect:
const double __sum = std::accumulate(_M_den.begin(),
                                     _M_den.end(), 0.0);
__detail::__normalize(_M_den.begin(), _M_den.end(), _M_den.begin(),
                      __sum);                        // <----- WRONG
// THIS is not the cumulative distribution (since the above
// normalization does not give the probability of the intervals!)
_M_cp.reserve(_M_den.size());
std::partial_sum(_M_den.begin(), _M_den.end(),
                 std::back_inserter(_M_cp));
// Make sure the last cumulative probability is one.
_M_cp[_M_cp.size() - 1] = 1.0;
// Dividing here by the interval length is WRONG!!!
for (size_t __k = 0; __k < _M_den.size(); ++__k)
    _M_den[__k] /= _M_int[__k + 1] - _M_int[__k];

Here's the applicable part of the specification, straight from N4296:

ρ_k = w_k / (S * (b_{k+1} - b_k)) for k = 0, ..., n-1, in which S = w_0 + w_1 + ... + w_{n-1}

As can be seen clearly, the summation S applies only to the weights.
It's easy to see that there's something wrong with your testing code. Reducing the number of intervals to two, the first of length 1 and the second of length 2:
std::array<PREC,3> intervals {0, 1, 3};
and giving each interval weight equal to its length:
std::array<PREC,2> weights {1, 2};
One would expect the density to be constant. But your code reports:
Probability : 0.200000000000000011102230246252
Probability : 0.400000000000000022204460492503

The wording on cplusplus.com is ambiguous. cppreference.com gives a clearer explanation, and it matches exactly what is written in the C++ standard: each density is the interval's weight divided by the total weight S and by the interval's length.
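One can check this against the implementation by querying the distribution's densities() member directly. A minimal check using the two-interval example above; with the standard formula, both densities come out as 1/3:

#include <array>
#include <iostream>
#include <random>

int main()
{
    // Two intervals of lengths 1 and 2, weights equal to the lengths.
    std::array<double, 3> intervals{0, 1, 3};
    std::array<double, 2> weights{1, 2};
    std::piecewise_constant_distribution<double> dist(
        intervals.begin(), intervals.end(), weights.begin());

    // Expected: w_k / (S * length_k) = (1/3)/1 and (2/3)/2, i.e. 1/3 twice.
    for (double d : dist.densities())
        std::cout << d << '\n';
}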


Probability normal distribution with an equation P(|x-3| > 5) for X~N(2,6)

I'm confused about how to go about solving this problem. I don't quite understand what |x-3| represents in this case, and how it impacts the outcome when the variable is normally distributed. What would be the steps required to solve this?
It is the absolute value, so P(|X-3| > 5) means that, out of the whole [-infinity ... +infinity] range, the subrange within distance 5 of the point x = 3 is excluded.
So you have X in the ranges [-infinity ... -2] and [8 ... +infinity].
Given the N(x; 2, 6) distribution, the probability is the sum of two integrals:
P(|X-3| > 5) = S[-infinity ... -2] N(x; 2, 6) dx + S[8 ... +infinity] N(x; 2, 6) dx
where S denotes integration, or, equivalently:
P(|X-3| > 5) = 1 - S[-2 ... 8] N(x; 2, 6) dx
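For a concrete number, here is a minimal C++ sketch that evaluates this via the standard normal CDF, expressed through std::erfc. It assumes N(2, 6) means mean 2 and standard deviation 6; if the 6 is a variance, substitute sigma = std::sqrt(6.0):

#include <cmath>
#include <cstdio>

// Standard normal CDF, written in terms of the complementary error function.
double phi(double z) { return 0.5 * std::erfc(-z / std::sqrt(2.0)); }

int main()
{
    const double mu = 2.0, sigma = 6.0;  // assumption: 6 is the std. deviation
    // P(|X - 3| > 5) = 1 - P(-2 <= X <= 8)
    double p = 1.0 - (phi((8.0 - mu) / sigma) - phi((-2.0 - mu) / sigma));
    std::printf("P(|X-3| > 5) = %.6f\n", p);  // roughly 0.41 under this assumption
}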

cudnnRNNForwardTraining seqLength / xDesc usage

Let's say I have N sequences x[i], each with length seqLength[i] for 0 <= i < N. As far as I understand from the cuDNN docs, they have to be ordered by sequence length, the longest first, so assume that seqLength[i] >= seqLength[i+1]. Let's say that they have the feature dimension D, so x[i] is a 2D tensor of shape (seqLength[i], D). As far as I understand, I should prepare a tensor x where all x[i] are contiguously behind each other, i.e. it would be of shape (sum(seqLength), D).
According to the cuDNN docs, the functions cudnnRNNForwardInference / cudnnRNNForwardTraining take the arguments int seqLength and cudnnTensorDescriptor_t* xDesc, where:
seqLength: Number of iterations to unroll over.
xDesc: Array of tensor descriptors. Each must have the same second dimension. The first dimension may decrease from element n to element n + 1 but may not increase.
I'm not exactly sure I understand this correctly.
Is seqLength my max(seqLength)?
And xDesc is an array. Of what length? max(seqLength)? If so, I assume that it describes one batch of features for each frame, but some of the later frames will have fewer sequences in them. It sounds like the number of sequences per frame is described by the first dimension.
So:
xDesc[t].shape[0] = len([i for i in range(N) if t < seqLength[i]])
for all 0 <= t < max(seqLength). I.e. 0 <= xDesc[t].shape[0] <= N.
How many dimensions does each xDesc[t] describe, i.e. what is len(xDesc[t].shape)? I would assume that it is 2 and that the second dimension is the feature dimension D, i.e.:
xDesc[t].shape = (len(...), D)
The strides would have to be set accordingly, although it's also not totally clear. If x is stored in row-major order, then
xDesc[0].strides[0] = D * xDesc[0].shape[0]
xDesc[0].strides[1] = 1
But how does cuDNN compute the offset for frame t? I guess it will keep track and thus calculate sum([xDesc[t2].strides[0] for t2 in range(t)]).
Most example code I have seen assumes that all sequences are of the same length. Also, they all describe 3 dimensions per xDesc[t], not 2. Why is that? The third dimension is always 1, as are the strides of the second and third dimensions, and the stride for the first dimension is D. So this assumes that the tensor x is row-major ordered and of shape (max(seqLength), N, D). The code is actually a bit strange. E.g. from TensorFlow:
int dims[] = {batch_size, data_size, 1};
int strides[] = {dims[1] * dims[2], dims[2], 1};
cudnnSetTensorNdDescriptor(
    ...,
    sizeof(dims) / sizeof(dims[0]) /*nbDims*/, dims /*dimA*/,
    strides /*strideA*/);
The code looks really similar in all examples I have found. Search for cudnnSetTensorNdDescriptor or cudnnRNNForwardTraining. E.g.:
TensorFlow (issue 6633)
Theano
mxnet
Torch
Baidu persistent-rnn
Caffe2
Chainer
I found one example which can handle sequences of different length. Again search for cudnnSetTensorNdDescriptor:
Microsoft CNTK
That claims that there must be 3 dimensions for every xDesc[t]. It has the comment:
these dimensions are what CUDNN expects: (the minibatch dimension, the data dimension, and the number 1 (because each descriptor describes one frame of data)
Edit: Support for this was added to PyTorch at the end of 2018, in this commit.
Am I missing something from the cuDNN documentation? I really have not found that information in it.
My question is basically: is my conclusion about how to set the arguments x, seqLength and xDesc for cudnnRNNForwardInference / cudnnRNNForwardTraining correct (together with my implicit assumptions)? If not, how would I use these functions, and what does the memory layout look like?
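For reference, here is a minimal sketch of what that conclusion would look like in code, following the CNTK-style convention quoted above (3D descriptors, dims = {active batch at step t, D, 1}, strides = {D, 1, 1}, sequences sorted by decreasing length). The helper name makeXDescs is made up, and error checking is omitted:

#include <cudnn.h>
#include <vector>

// Hypothetical helper: build one 3D tensor descriptor per time step.
std::vector<cudnnTensorDescriptor_t> makeXDescs(
    const std::vector<int>& seqLength,  // sorted: seqLength[i] >= seqLength[i+1]
    int D)                              // feature dimension
{
    const int maxT = seqLength.empty() ? 0 : seqLength[0];
    const int N = static_cast<int>(seqLength.size());
    std::vector<cudnnTensorDescriptor_t> xDesc(maxT);
    for (int t = 0; t < maxT; ++t) {
        int active = 0;                 // sequences still running at step t
        for (int i = 0; i < N; ++i)
            if (t < seqLength[i]) ++active;
        int dims[3]    = {active, D, 1};
        int strides[3] = {D, 1, 1};
        cudnnCreateTensorDescriptor(&xDesc[t]);
        cudnnSetTensorNdDescriptor(xDesc[t], CUDNN_DATA_FLOAT,
                                   3 /*nbDims*/, dims, strides);
    }
    // With a packed x of shape (sum(seqLength), D), the data for step t would
    // then start at element offset D * (sum of the active counts of steps 0..t-1).
    return xDesc;
}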

How can I acquire a certain "random range" in a higher frequency?

My question is basically, "how can I obtain certain random values within a specific range more than random values outside the range?"
Allow me to demonstrate what I mean:
If I were to, on a good amount of trials, start picking a variety of
random numbers from 1-10, I should be seeing more numbers in the 7-10
range than in the 1-6 range.
I tried a couple of ways, but I am not getting desirable results.
First Function:
function getAverage(i)
    math.randomseed(os.time())
    local sum = 0
    for j = 1, i do
        sum = sum + (1 - math.random()^3) * 10
    end
    print(sum / i)
end

getAverage(500)
I was constantly getting numbers only around 7.5, such as 7.48 and 7.52. Although this does indeed get me a number within my range, I don't want such strict consistency.
Second Function:
function getAverage(i)
    math.randomseed(os.time())
    local sum = 0
    for j = 1, i do
        sum = sum + (math.random() > .3 and math.random(7, 10) or math.random(1, 6))
    end
    print(sum / i)
end

getAverage(500)
This function didn't work as I wanted it to either. I was primarily getting numbers such as 6.8 and 7.2, but nothing even close to 8.
Third Function:
function getAverage(i)
    math.randomseed(os.time())
    local sum = 0
    for j = 1, i do
        sum = sum + (((math.random(10) * 2) / 1.2)^1.05) - math.random(1, 3)
    end
    print(sum / i)
end

getAverage(500)
This function was giving me slightly more favorable results, consistently returning 8, but that is the issue: consistency.
What type of paradigms or practical solutions can I use to generate more random numbers within a specific range over another range?
I have labeled this as Lua, but a solution in any language that is understandable is acceptable.
I don't want such strict consistency.
What does that mean?
If you average a very large number of values in a given range from any RNG, you should expect to get (nearly) the same number every time. That just reflects that each of the numbers in the range was equally likely to appear.
This function didn't work as I wanted it to either. I was primarily getting numbers such as 6.8 and 7.2, but nothing even close to 8.
You have to clarify what "didn't work" means. Why would you expect it to give you 8? You can see it won't just by looking at the formula you used.
For instance, if you'd used math.random(1,10), assuming all numbers in the range have an equal chance of appearing, you should expect the average to be 5.5, dead in the middle of 1 and 10 (because (1+2+3+4+5+6+7+8+9+10)/10 = 5.5).
You used math.random() > .3 and math.random(7,10) or math.random(1,6), which says: 70% of the time, give 7, 8, 9 or 10 (average = 8.5), and 30% of the time, give 1, 2, 3, 4, 5 or 6 (average = 3.5). That should give you an overall average of 7 (because 3.5 * .3 + 8.5 * .7 = 7). If you bump up your sample size, that's exactly what you'll see. You're seeing values on either side of 7 because your sample size is so small (try bumping it up to 100000).
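To see this, here is a quick simulation of that same 70/30 mixture (in C++ rather than Lua only for consistency with the other snippets on this page; the seed is arbitrary):

#include <cstdio>
#include <random>

int main()
{
    std::mt19937 rng{12345};
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::uniform_int_distribution<int> high(7, 10);  // mean 8.5
    std::uniform_int_distribution<int> low(1, 6);    // mean 3.5

    long long sum = 0;
    const int trials = 100000;
    for (int i = 0; i < trials; ++i)
        sum += (coin(rng) > 0.3) ? high(rng) : low(rng);

    // Expect roughly 0.7 * 8.5 + 0.3 * 3.5 = 7.0
    std::printf("average = %.3f\n", static_cast<double>(sum) / trials);
}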
I've made skewed random values before by simply generating two random numbers in the range, and then picking the largest (or smallest). This skews the probability towards the high (or low) endpoint.
Picking the smallest of two gives you a linear probability distribution.
Picking the smallest of three gives you a parabolic distribution (more selectivity, less probability at "the other end"). For my needs, a linear distribution was fine.
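A minimal sketch of that pick-the-largest idea, again in C++: draw from 1-10 twice and keep the larger value. The expected average works out to 7.15 instead of the 5.5 of a single draw:

#include <algorithm>
#include <cstdio>
#include <random>

int main()
{
    std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<int> uniform(1, 10);

    long long sum = 0;
    const int trials = 100000;
    for (int i = 0; i < trials; ++i)
        sum += std::max(uniform(rng), uniform(rng));  // skew towards 10

    std::printf("average = %.3f\n", static_cast<double>(sum) / trials);
}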
Not exactly what you wanted, but maybe it's good enough.
Have fun!

Standard deviation of one element

When I try to execute
StandardDeviation[{1}]
I get an error
StandardDeviation::shlen: "The argument {1} should have at least two elements"
But std of one element is 0, isn't it?
The standard deviation is commonly defined as the square root of the unbiased estimator of the variance:

s = Sqrt[ Sum[(x_i - mean)^2, {i, 1, N}] / (N - 1) ]
You can easily see that for a single sample, N=1 and you get 0/0, which is undefined. Hence your standard deviation is undefined for a single sample in Mathematica.
Now, depending on your conventions, you might want to define the standard deviation of a single sample anyway (returning 0, Null, or some other value). Here's an example that defines it to be 0 for a single sample:
std[x_List] := Which[(Length[x] == 1), 0, True, StandardDeviation[x]]
std[{1}]
Out[1]= 0
The standard deviation of a constant is zero.
The estimated standard deviation of one sample is undefined.
If you want some formality:
p[x_] := DiracDelta[x - mu];
expValue = Integrate[x p[x] , {x, -Infinity, Infinity}]
stdDev = Sqrt[Integrate[(x - expValue)^2 p[x] , {x, -Infinity, Infinity}]]
(*
-> ConditionalExpression[mu, mu \[Element] Reals]
-> ConditionalExpression[0, mu \[Element] Reals]
*)
Edit
Or better, using Mathematica's ProbabilityDistribution[]:
dist = ProbabilityDistribution[DiracDelta[x - mu], {x, -Infinity, Infinity}];
{Mean[dist], StandardDeviation[dist]}
(*
-> { mu, ConditionalExpression[0, mu \[Element] Reals]}
*)
If your population size is one element, then yes, the standard deviation of your population will be 0. However, standard deviations are typically computed for samples, not for an entire population, so instead of dividing by the number of elements in the sample, you divide by the number of elements minus one. This accounts for the error inherent in performing calculations on a sample rather than on the whole population.
Performing a calculation of the standard deviation over a population of size 1 makes absolutely no sense, which I think is where the confusion is coming from. If you know that your population contains only one element then finding out the standard deviation of that element is pointless, so generally you will see the standard deviation of a single element written as undefined.
Standard deviation, which is a measure of the deviation of the values from the average of a given set, doesn't make any sense for a list of one element (you can set it to 0 if you want).

Using basic arithmetics for calculating Pi with arbitary precision

I am looking for a formula/algorithm to calculate Pi ~ 3.14 to a given precision.
The formula/algorithm must use only very basic arithmetic operations:
+: addition
-: subtraction
*: multiplication
/: division
because I want to implement these operations myself in C++ and want to keep the implementation as simple as possible (no bignum library is allowed).
I have found that this formula for calculating Pi is pretty simple:
Pi/4 = 1 - 1/3 + 1/5 - 1/7 + ... = sum( (-1)^(k+1)/(2*k-1) , k=1..inf )
(note that (-1)^(k+1) can be implemented easily by above operators).
But the problem with this formula is that it gives no way to specify the number of digits to calculate; in other words, there is no direct way to determine when to stop the calculation.
Maybe a workaround is to calculate the difference between the (n-1)th and the nth partial sum and treat it as the current error.
Anyway, I am looking for a formula/algorithm that has these properties and also converges to Pi faster.
Codepad link:
#include <iostream>
#include <iomanip>   // needed for std::setprecision
#include <cmath>

int main()
{
    double p16 = 1, pi = 0, precision = 10;
    for (int k = 0; k <= precision; k++)
    {
        pi += 1.0/p16 * (4.0/(8*k + 1) - 2.0/(8*k + 4)
                         - 1.0/(8*k + 5) - 1.0/(8*k + 6));
        p16 *= 16;
    }
    std::cout << std::setprecision(80) << pi << '\n' << M_PI;
}
Output:
3.141592653589793115997963468544185161590576171875
3.141592653589793115997963468544185161590576171875
This is actually the Bailey-Borwein-Plouffe formula, also taken from the linked Wikipedia article.
In your original (slowly converging) example, the error term can be computed because this is an alternating series; see http://en.wikipedia.org/wiki/Alternating_series#Approximating_Sums
Essentially, the next uncomputed term is a bound on the error.
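Applied to the slowly converging Leibniz series from the question, that stopping rule looks like this (a minimal sketch; the tolerance value is arbitrary):

#include <cstdio>

int main()
{
    const double tol = 1e-6;     // target error bound for pi/4
    double sum = 0.0, sign = 1.0;
    for (long k = 1; ; ++k, sign = -sign) {
        double term = 1.0 / (2.0 * k - 1.0);
        if (term < tol)          // the next uncomputed term bounds the error
            break;
        sum += sign * term;
    }
    std::printf("pi ~ %.10f (error < %.1e)\n", 4.0 * sum, 4.0 * tol);
}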
You can just use the Taylor expansion of arctan(1), and then you get Pi/4 by summing all of its terms.
The Taylor series of arctan:
http://en.wikipedia.org/wiki/Taylor_series
You can also use the Euler formula with z = 1 and then multiply the result by 4:
http://upload.wikimedia.org/math/2/7/9/279bed5a2ea3b80a71f5b22078090168.png
