Finding parameters next to vectors to get a desired vector - algorithm

What is the simplest algorithm I can use to find values of m1, m2, m3, ..., mn such that the following equation is satisfied (of course to a certain accuracy threshold):
m1*v1 + m2*v2 + ... + mn*vn = vd
where v1, v2, ..., vn and vd are given vectors of 3-10 dimensions? The parameters m1, ..., mn should be positive real numbers.
I need an algorithm that's reliable and quick to code. Problem sizes will be small (no larger than n=100), so speed isn't a very important issue, especially since the accuracy requirements will be rather liberal.

What you are describing is a system of linear equations. You can write it as the following matrix equation:
A * x = b
Where, if k is the dimension of the vectors:
    / v1[1]  v2[1]  ...  vn[1] \
    | v1[2]  v2[2]  ...  vn[2] |
A = | ......................... |
    | ......................... |
    \ v1[k]  v2[k]  ...  vn[k] /

    / m1 \
    | m2 |
x = | .. |
    | .. |
    \ mn /

    / vd[1] \
    | vd[2] |
b = | ..... |
    | ..... |
    \ vd[k] /
There are several ways to solve this. If n is equal to k and the problem has a solution (which it may or may not), then you can solve it by inverting the coefficient matrix A and computing inverse(A) * b, by using Cramer's rule or, most commonly, with Gaussian elimination. If n is not equal to k, several things can happen; you can read up on overdetermined and underdetermined systems to learn about them.
By the way, you said that m1 ... mn must be positive numbers (non-zero?). In this case, you may want to approach your problem as a linear program, adding constraints like m1 > 0, m2 > 0, etc., and using the simplex algorithm to solve it.
Whatever you use, it is really not advisable to program the algorithm yourself. There are plenty of libraries for every language that deal with this kind of problem.
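For example, in Python one minimal sketch of the library route is SciPy's non-negative least squares, which enforces m >= 0 directly (the sizes and data below are made-up examples, not from the question):
import numpy as np
from scipy.optimize import nnls

k, n = 5, 8                      # vector dimension and number of vectors (example sizes)
rng = np.random.default_rng(0)
A = rng.random((k, n))           # columns are v1 ... vn
b = rng.random(k)                # the desired vector vd

m, residual = nnls(A, b)         # minimizes ||A*m - b|| subject to m >= 0
print(m)                         # the coefficients m1 ... mn (all >= 0)
print(residual)                  # remaining error norm; compare against your accuracy threshold
If the coefficients must be strictly positive rather than just non-negative, you would need an explicit lower bound (or the linear-programming formulation described above).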

Related

Algorithm for square root calculation

I have been implementing control software in C, and one of the control algorithms requires a square root calculation. I have been looking for a suitable square root algorithm that will have constant execution time irrespective of the radicand value. This requirement rules out the sqrt function from the standard library.
As for my platform, I have been working with a 32-bit floating point ARM Cortex-A9 based machine. As for the radicand range, in my application the calculations are done in physical units, so I expect the range <0, 400>. As for the required error, I think an error of about 1% should be sufficient. Can anybody recommend a square root calculation algorithm suitable for my purposes?
My initial approach would be to use the Taylor series for square root with precalculated coefficients at a number of fixed points. This reduces the calculation to a subtraction and a number of multiplications.
The look-up table would be a 2D array like:
point | C0 | C1 | C2 | C3 | C4 | ...
-----------------------------------------
0.5 | f00 | f01 | f02 | f03 | f04 |
-----------------------------------------
1.0 | f10 | f11 | f12 | f13 | f14 |
-----------------------------------------
1.5 | f20 | f21 | f22 | f23 | f24 |
-----------------------------------------
....
So when calculating sqrt(x) use the table row with the point closest to x.
Example:
sqrt(1.1) (i.e. use the point 1.0 coefficients)
f10 +
f11 * (1.1 - 1.0) +
f12 * (1.1 - 1.0) ^ 2 +
f13 * (1.1 - 1.0) ^ 3 +
f14 * (1.1 - 1.0) ^ 4
The table above suggests a fixed distance between the points at which you precalculate the coefficients (i.e. 0.5 between each point). However, due to the nature of the square root you may find that the distance between points should differ for different ranges of x. For instance x in [0, 1] -> distance 0.1, x in [1, 2] -> distance 0.25, x in [2, 10] -> distance 0.5, and so on.
Another thing is the number of terms needed to get the desired precision. Here you may also find that different ranges of x require different numbers of coefficients.
All this is easy to precalculate on a normal computer (e.g. using Excel).
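As a rough illustration (not an exact recipe), a Python sketch of that precalculation could use the Taylor coefficients of sqrt about each table point a, c_k = C(1/2, k) * a^(1/2 - k); the point spacing and number of terms below are placeholders that would have to be tuned against the required error:
def taylor_sqrt_coeffs(a, n_terms):
    # Coefficients c0 .. c_{n-1} of sqrt(x) expanded about x = a.
    coeffs = []
    binom = 1.0                            # generalized binomial coefficient C(1/2, k)
    for k in range(n_terms):
        coeffs.append(binom * a ** (0.5 - k))
        binom *= (0.5 - k) / (k + 1)
    return coeffs

points = [0.5 + 0.5 * i for i in range(20)]            # 0.5, 1.0, 1.5, ... (placeholder spacing)
table = {a: taylor_sqrt_coeffs(a, 5) for a in points}  # 5 terms per point (placeholder)

def sqrt_from_table(x):
    a = min(points, key=lambda p: abs(p - x))          # nearest table point
    dx, result, power = x - a, 0.0, 1.0
    for c in table[a]:
        result += c * power
        power *= dx
    return result

print(sqrt_from_table(1.1))    # compare against math.sqrt(1.1)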
Note: For values very close to zero this method isn't good. Maybe Newton's method would be a better choice there.
Taylor series: https://en.wikipedia.org/wiki/Taylor_series
Newton's method: https://en.wikipedia.org/wiki/Newton%27s_method
Also relevant: https://math.stackexchange.com/questions/291168/algorithms-for-approximating-sqrt2
The Arm v7 instruction set provides a fast instruction for reciprocal square root estimation: vrsqrte_f32 for two simultaneous approximations and vrsqrteq_f32 for four approximations. (The scalar variant vrsqrtes_f32 is only available on Arm64 v8.2.)
The result can then be calculated simply as x * vrsqrte_f32(x), which has better than 0.33% relative accuracy over the whole range of positive values x. See https://www.mdpi.com/2079-3197/9/2/21/pdf
The ARM NEON instruction FRSQRTE gives 8.25 correct bits of the result.
At x == 0, vrsqrtes_f32(x) == Inf, so x * vrsqrtes_f32(x) would be NaN.
If the value x == 0 is unavoidable, the optimal two-instruction sequence needs a bit more adjustment:
#include <arm_neon.h>

float sqrtest(float a) {
    // need to "transfer" or "convert" the scalar input to a vector of two
    // - optimally we would not need an instruction for that, but would
    //   just let the processor calculate the estimate for all the lanes
    //   in the register
    float32x2_t a2 = vdup_n_f32(a);
    // next we create a mask that is all ones for the legal
    // domain of 1/sqrt(x)
    uint32x2_t is_legal = vcgt_f32(a2, vdup_n_f32(0.0f));
    // calculate two reciprocal square root estimates in parallel
    float32x2_t a2est = vrsqrte_f32(a2);
    // we need to mask the result, so that effectively
    // all non-legal values of a2est are zeroed
    a2est = vreinterpret_f32_u32(vand_u32(is_legal, vreinterpret_u32_f32(a2est)));
    // x * 1/sqrt(x) == sqrt(x)
    a2 = vmul_f32(a2, a2est);
    // finally we take only the zero lane of the result,
    // discarding the other half
    return vget_lane_f32(a2, 0);
}
Surely this method will have almost twice the throughput when both lanes carry useful inputs:
void sqrtest2(float &a, float &b) {
    float32x2_t a2 = vset_lane_f32(b, vdup_n_f32(a), 1);
    uint32x2_t is_legal = vcgt_f32(a2, vdup_n_f32(0.0f));
    float32x2_t a2est = vrsqrte_f32(a2);
    a2est = vreinterpret_f32_u32(vand_u32(is_legal, vreinterpret_u32_f32(a2est)));
    a2 = vmul_f32(a2, a2est);
    a = vget_lane_f32(a2, 0);
    b = vget_lane_f32(a2, 1);
}
It is even better if you can work directly with float32x2_t or float32x4_t inputs and outputs:
float32x2_t sqrtest2(float32x2_t a2) {
    uint32x2_t is_legal = vcgt_f32(a2, vdup_n_f32(0.0f));
    float32x2_t a2est = vrsqrte_f32(a2);
    a2est = vreinterpret_f32_u32(vand_u32(is_legal, vreinterpret_u32_f32(a2est)));
    return vmul_f32(a2, a2est);
}
This implementation gives sqrtest2(1) == 0.998 and sqrtest2(400) == 19.97 (tested on a MacBook M1 with arm64). Being branchless and LUT-free, this likely has a constant execution time, assuming that all the instructions execute in a constant number of cycles.
I have decided to use the following approach. I have chosen the Newton method and then experimentally set a fixed number of iterations so that the error over the whole range of the radicand, i.e. <0, 400>, doesn't exceed the prescribed value. I ended up with six iterations. As for a radicand with value 0, I have decided to return 0 without any calculation.
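A minimal Python sketch of that approach (the original C implementation is not shown; the initial guess below is an assumption, and both it and the iteration count would have to be tuned experimentally as described):
def fixed_newton_sqrt(x, iterations=6):
    # Newton's method for y*y == x with a fixed iteration count.
    if x == 0.0:                  # radicand 0 is special-cased, as described above
        return 0.0
    y = 0.5 * (x + 1.0)           # hypothetical initial guess; tune for the target range
    for _ in range(iterations):
        y = 0.5 * (y + x / y)     # Newton step
    return y

print(fixed_newton_sqrt(400.0))   # compare against math.sqrt(400.0) == 20.0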

Understanding the time complexity of the Longest Common Subsequence Algorithm

I do not understand the O(2^n) complexity that the recursive function for the Longest Common Subsequence algorithm has.
Usually I can tie this notation to the number of basic operations (in this case comparisons) performed by the algorithm, but this time it doesn't make sense in my mind.
For example, take two strings, each of length 5. In the worst case the recursive function performs 251 comparisons, and 2^5 is not even close to that value.
Can anyone explain the algorithmic complexity of this function?
def lcs(xstr, ystr):
    global nComp
    if not xstr or not ystr:
        return ""
    x, xs, y, ys = xstr[0], xstr[1:], ystr[0], ystr[1:]
    nComp += 1
    #print("comparing",x,"with",y)
    if x == y:
        return x + lcs(xs, ys)
    else:
        return max(lcs(xstr, ys), lcs(xs, ystr), key=len)
To understand it properly look at the diagram carefully and follow the recursive top-to-down approach while reading the graph.
Here, xstr = "ABCD"
ystr = "BAEC"
lcs("ABCD", "BAEC") // Here x != y
/ \
lcs("BCD", "BAEC") <-- x==y --> lcs("ABCD", "AEC") x==y
| |
| |
lcs("CD", "AEC") <-- x!=y --> lcs("BCD", "EC")
/ \ / \
/ \ / \
/ \ / \
lcs("D","AEC") lcs("CD", "EC") lcs("BCD", "C")
/ \ / \ / \
lcs("", "AEC") lcs("D","EC") lcs("CD", "C") lcs("BCD","")
| \ / \ | / |
Return lcs("", "EC") lcs("D" ,"C") lcs("D", "") lcs("CD","") Return
/ \ / \ / \ / \
Return lcs("","C") lcs("D","") lcs("","") Return lcs("D","") Return
/ \ / \ / / \
Return lcs("","") Return lcs("", "") Return
| |
Return Return
NOTE: The proper way to represent recursive calls is usually a tree, but here I used a graph just to compress the tree, so the whole chain of recursive calls can be followed in one go (and, of course, it was easier for me to draw).
In the diagram above there are redundant pairs such as lcs("CD", "EC"), which arises both from deleting "A" from "AEC" in lcs("CD", "AEC") and from deleting "B" from "BCD" in lcs("BCD", "EC"). As a result, these pairs are evaluated more than once during execution, which increases the running time of the program.
As you can easily see, every pair generates two calls at the next level until it encounters an empty string or x == y. Therefore, if the lengths of the strings are n and m (the length of xstr is n and of ystr is m, and we consider the worst case), the number of calls at the bottom level is of the order of 2^(n+m). (How? Think about it.)
Since n + m is an integer, call it N. The time complexity of the algorithm is therefore O(2^N), which is not efficient for larger values of N.
That is why we prefer the dynamic-programming approach over the plain recursive approach. It reduces the time complexity to O(n*m), which is O(n^2) when n == m (see the sketch at the end of this answer).
Even now, if you are having a hard time understanding the logic, I would suggest drawing a tree-like representation (not the graph I have shown here) for xstr = "ABC" and ystr = "EF". I hope you will understand it then.
Any doubts or comments are most welcome.
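For reference, a minimal sketch of the dynamic-programming version in Python, where dp[i][j] holds the LCS length of xstr[i:] and ystr[j:]:
def lcs_dp(xstr, ystr):
    n, m = len(xstr), len(ystr)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if xstr[i] == ystr[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Reconstruct one LCS string by walking the table.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if xstr[i] == ystr[j]:
            out.append(xstr[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)

print(lcs_dp("ABCD", "BAEC"))   # -> "BC", one LCS of maximal length 2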
O(2^n) means the run time is proportional to (2^n) for large enough n. It doesn't mean the number is bad, high, low, or anything specific for a small n, and it doesn't give a way to calculate the absolute run-time.
To get a feel for the implication, you should consider the run-times for n = 1000, 2000, 3000, or even 1 million, 2 million, etc.
In your example, assuming that for n=5 the algorithm takes a maximum of 251 iterations, the O(2^n) prediction is that for n=50 it would take on the order of 2^50 / 2^5 * 251 = 2^45 * 251 = ~8.8E15 iterations.

I'm trying to find if a rectangle intersects a concave polygon. Does this algorithm accomplish that?

I'm trying to find if a rectangle intersects a concave polygon. I found this algorithm:
double determinant(Vector2D vec1, Vector2D vec2) {
    return vec1.x * vec2.y - vec1.y * vec2.x;
}

// one edge is a-b, the other is c-d
Vector2D edgeIntersection(Vector2D a, Vector2D b, Vector2D c, Vector2D d) {
    double det = determinant(b - a, c - d);
    double t   = determinant(c - a, c - d) / det;
    double u   = determinant(b - a, c - a) / det;
    if ((t < 0) || (u < 0) || (t > 1) || (u > 1)) return NO_INTERSECTION;
    return a * (1 - t) + t * b;
}
If I perform this 4 times (top to right, top to bottom left, top to bottom right, bottom to right) * (all edges of my polygon), would this effectively and accurately tell me whether the rectangle has part of or all of the concave polygon inside it? If not, what would be missing?
Thanks
The code attempts to find the intersection point of two segments - AB and CD.
There are many different ways to explain how it is doing it, depending on how you interpret these operations.
Let's say point A has coordinates (xa, ya), B - (xb, yb) and so on. Let's say
dxAB = xb - xa
dyAB = yb - ya
dxCD = xd - xc
dyCD = yd - yc
Then the following system of two linear equations
| dxAB  dxCD |   | t |   | xc - xa |
|            | * |   | = |         |
| dyAB  dyCD |   | u |   | yc - ya |
if solved for t and u, will give you the proportional position of the intersection point on line AB (value t) and on line CD (value u). These values will lie in the range [0, 1] if the point belongs to the corresponding segment, and outside of that range if the point lies outside the segment (on the line containing the segment).
In order to solve this system of linear equations we can use the well-known Cramer's rule. For that we will need the determinant of
| dxAB  dxCD |
|            |
| dyAB  dyCD |
which is exactly determinant(b - a, c - d) from your code. (Actually, what I have here is determinant(b - a, d - c), but it is not really important for the purposes of this explanation. The code you posted for some reason swaps C and D, see P.S. note below).
And we will also need determinant of
| xc-xa  dxCD |
|             |
| yc-ya  dyCD |
which is exactly determinant(c-a,c-d) from your code, and determinant of
| dxAB  xc-xa |
|             |
| dyAB  yc-ya |
which is exactly determinant(b-a,c-a).
Dividing these determinants in accordance with Cramer's rule gives the values of t and u, which is exactly what is done in the code you posted.
The code then proceeds to test the values of t and u to check whether the segments actually intersect, i.e. whether both t and u lie in the [0, 1] range. If they do, it calculates the actual intersection point by evaluating a*t+b*(1-t) (equivalently, it could evaluate c*u+d*(1-u)). (Again, see the P.S. note below.)
P.S. In the original code the points D and C are "swapped", in the sense that the code does c - d where I use d - c in my explanation. But this makes no difference to the general idea of the algorithm, as long as one is careful with the signs.
This swap of C and D is also the reason the expression a*(1-t)+t*b is used when evaluating the intersection point. Normally, as in my explanation, one would expect to see something like a*t+b*(1-t) there. (I have my doubts about this though: I would expect to see a*t+b*(1-t) there even in your version. It could be a bug.)
P.P.S. The author of the code forgot to check for det == 0 (or very close to 0), which will happen when the segments are parallel.
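For illustration, here is a sketch of the same computation in Python with that guard added (the tolerance EPS and the tuple representation of points are arbitrary choices, not part of the original code):
EPS = 1e-12   # illustrative tolerance for "parallel"

def det(v1, v2):
    return v1[0] * v2[1] - v1[1] * v2[0]

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def edge_intersection(a, b, c, d):
    # Intersection point of segments AB and CD, or None.
    denom = det(sub(b, a), sub(c, d))
    if abs(denom) < EPS:          # parallel or degenerate segments
        return None
    t = det(sub(c, a), sub(c, d)) / denom
    u = det(sub(b, a), sub(c, a)) / denom
    if t < 0 or t > 1 or u < 0 or u > 1:
        return None
    return (a[0] * (1 - t) + b[0] * t, a[1] * (1 - t) + b[1] * t)

print(edge_intersection((0, 0), (2, 2), (0, 2), (2, 0)))   # -> (1.0, 1.0)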
I think the following should work:
(1) for each e1 in rectangle_edges, e2 in polygon_edges
    (1.1) if edgeIntersection(e1, e2) != NO_INTERSECTION
        (1.1.1) return true
(2) if (max_polygon_x < max_rectangle_x) and (min_polygon_x > min_rectangle_x) and
       (max_polygon_y < max_rectangle_y) and (min_polygon_y > min_rectangle_y)
    (2.1) return true
(3) return false
Edit: Added check for whether the polygon is inside the rectangle.
As far as I can tell after a quick glance, it tries to determine whether two line segments intersect, and if they do, what the coordinates of the intersection point are.
No, it is not good enough on its own to determine whether your rectangle and your polygon intersect, because you would still miss the cases where either the polygon is completely inside the rectangle or the other way around.
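A sketch of the overall test the answers describe, in Python: check edge-edge crossings first, then the two containment cases. The names are illustrative, and the crossing test below ignores degenerate collinear touching:
def segments_intersect(a, b, c, d):
    # Proper crossing test via orientation signs (ignores collinear touching).
    def cross(o, p, q):
        return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])
    d1, d2 = cross(c, d, a), cross(c, d, b)
    d3, d4 = cross(a, b, c), cross(a, b, d)
    return ((d1 > 0) != (d2 > 0)) and ((d3 > 0) != (d4 > 0))

def point_in_polygon(pt, poly):
    # Standard ray-casting test.
    x, y, inside = pt[0], pt[1], False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def rect_intersects_polygon(rect, poly):
    xmin, ymin, xmax, ymax = rect
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    rect_edges = list(zip(corners, corners[1:] + corners[:1]))
    poly_edges = list(zip(poly, poly[1:] + poly[:1]))
    # 1. Some rectangle edge crosses some polygon edge.
    if any(segments_intersect(a, b, c, d)
           for a, b in rect_edges for c, d in poly_edges):
        return True
    # 2. Polygon completely inside the rectangle.
    if all(xmin <= px <= xmax and ymin <= py <= ymax for px, py in poly):
        return True
    # 3. Rectangle completely inside the polygon (no edges cross, so one
    #    corner inside implies they all are).
    return point_in_polygon(corners[0], poly)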

Sorted list difference

I have the following problem.
I have a set of elements that I can sort by a certain algorithm A. The sorting is good, but very expensive.
There is also an algorithm B that can approximate the result of A. It is much faster, but the ordering will not be exactly the same.
Taking the output of A as a 'gold standard', I need to get a meaningful estimate of the error resulting from the use of B on the same data.
Could anyone please suggest any resource I could look at to solve my problem?
Thanks in advance!
EDIT:
As requested, here is an example to illustrate the case:
if the data are the first 10 letters of the alphabet,
A outputs : a,b,c,d,e,f,g,h,i,j
B outputs : a,b,d,c,e,g,h,f,j,i
What are the possible measures of the resulting error, that would allow me to tune the internal parameters of algorithm B to get result closer to the output of A?
Spearman's rho
I think what you want is Spearman's rank correlation coefficient. Using the rank vectors for the two sortings (perfect A and approximate B), you calculate the rank correlation rho, which ranges from -1 (completely reversed) to 1 (exactly the same):
rho = 1 - (6 * sum(d(i)^2)) / (n * (n^2 - 1))
where d(i) is the difference in ranks of element i between A and B, and n is the number of elements. You can define your measure of error as a distance D := (1 - rho) / 2.
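For example, a minimal Python sketch of the calculation, assuming both outputs contain the same distinct items (as in the example above):
def spearman_rho(exact, approx):
    rank_in_approx = {item: i for i, item in enumerate(approx)}
    d_squared = sum((i - rank_in_approx[item]) ** 2
                    for i, item in enumerate(exact))
    n = len(exact)
    return 1 - 6 * d_squared / (n * (n * n - 1))

A = list("abcdefghij")
B = list("abdceghfji")
rho = spearman_rho(A, B)              # about 0.94 for this example
print(rho, (1 - rho) / 2)             # rho and the distance D = (1 - rho) / 2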
I would determine the largest correctly ordered subset.
+-------------> I
| +--------->
| |
A -> B -> D -----> E -> G -> H --|--> J
| ^ | | ^
| | | | |
+------> C ---+ +-----------> F ---+
In your example the largest correctly ordered subset contains 7 out of 10 elements, so the algorithm scores 0.7. The other such sets have length 6. Correct ordering scores 1.0, reverse ordering 1/n.
I assume that this is related to the number of inversions. Below, x + y indicates x <= y (correct order) and x - y indicates x > y (wrong order):
A + B + D - C + E + G + H - F + J - I
We obtain almost the same result: 6 of 9 pairs are correct, scoring 0.667. Again, correct ordering scores 1.0 and reverse ordering 0.0, and this might be much easier to calculate.
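A small Python sketch of that +/- counting, i.e. the fraction of adjacent pairs of B's output that are in the correct order according to A, which reproduces the 6-of-9 score for the example:
def adjacent_order_score(approx, exact):
    rank = {item: i for i, item in enumerate(exact)}
    r = [rank[item] for item in approx]
    pairs = list(zip(r, r[1:]))
    return sum(a <= b for a, b in pairs) / len(pairs)

print(adjacent_order_score(list("abdceghfji"), list("abcdefghij")))   # -> 0.666...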
Are you looking for an algorithm that calculates the difference given the array sorted with A and the array sorted with B as inputs? Or are you looking for a generic method of determining, on average, how far off an array will be when sorted with B?
If the first, then I suggest something as simple as the distance each item is from where it should be (an average would do better than a sum, to remove array length as an issue).
If the second, then I think I'd need to see more about these algorithms.
It's tough to give a good generic answer, because the right solution for you will depend on your application.
One of my favorite options is just the number of in-order element pairs, divided by the total number of pairs. This is a nice, simple, easy-to-compute metric that just tells you how many mistakes there are. But it doesn't make any attempt to quantify the magnitude of those mistakes.
double sortQuality = 1;
if (array.length > 1) {
    int inOrderPairCount = 0;
    for (int i = 1; i < array.length; i++) {
        if (array[i] >= array[i - 1]) ++inOrderPairCount;
    }
    sortQuality = (double) inOrderPairCount / (array.length - 1);
}
Calculating the RMS error may be one of many possible methods. Here is some small Python code:
def calc_error(out_A, out_B):
    # out_A <= output of algorithm A
    # out_B <= output of algorithm B
    rms_error = 0
    for i in range(len(out_A)):
        # Take square of differences and add
        rms_error += (out_A[i] - out_B[i]) ** 2
    return rms_error ** 0.5  # Take square root
>>> calc_error([1,2,3,4,5,6],[1,2,3,4,5,6])
0.0
>>> calc_error([1,2,3,4,5,6],[1,2,4,3,5,6]) # 4,3 swapped
1.414
>>> calc_error([1,2,3,4,5,6],[1,2,4,6,3,5]) # 3,4,5,6 randomized
3.162
NOTE:
Taking the square root is not necessary, but taking squares is, since plain differences may sum to zero. I think the calc_error function gives an approximate count of wrongly placed pairs, but I don't have any programming tools handy, so :(.
Take a look at this question.
You could try something involving Hamming distance.
If anyone is using the R language, I've implemented a function that computes the Spearman rank correlation coefficient, using the method described above by @bubake:
get_spearman_coef <- function(objectA, objectB) {
  # getting the spearman rho rank test
  spearman_data <- data.frame(listA = objectA, listB = objectB)
  spearman_data$rankA <- 1:nrow(spearman_data)

  rankB <- c()
  for (index_valueA in 1:nrow(spearman_data)) {
    for (index_valueB in 1:nrow(spearman_data)) {
      if (spearman_data$listA[index_valueA] == spearman_data$listB[index_valueB]) {
        rankB <- append(rankB, index_valueB)
      }
    }
  }
  spearman_data$rankB <- rankB

  spearman_data$distance <- (spearman_data$rankA - spearman_data$rankB)**2
  spearman <- 1 - ((6 * sum(spearman_data$distance)) / (nrow(spearman_data) * (nrow(spearman_data)**2 - 1)))

  print(paste("spearman's rank correlation coefficient"))
  return(spearman)
}
results :
get_spearman_coef(c("a","b","c","d","e"), c("a","b","c","d","e"))
spearman's rank correlation coefficient: 1
get_spearman_coef(c("a","b","c","d","e"), c("b","a","d","c","e"))
spearman's rank correlation coefficient: 0.8

Representing continuous probability distributions

I have a problem involving a collection of continuous probability distribution functions, most of which are determined empirically (e.g. departure times, transit times). What I need is some way of taking two of these PDFs and doing arithmetic on them. E.g. if I have two values x taken from PDF X, and y taken from PDF Y, I need to get the PDF for (x+y), or any other operation f(x,y).
An analytical solution is not possible, so what I'm looking for is some representation of PDFs that allows such things. An obvious (but computationally expensive) solution is Monte Carlo: generate lots of values of x and y, and then just measure f(x, y). But that takes too much CPU time.
I did think about representing the PDF as a list of ranges where each range has a roughly equal probability, effectively representing the PDF as the union of a list of uniform distributions. But I can't see how to combine them.
Does anyone have any good solutions to this problem?
Edit: The goal is to create a mini-language (aka Domain Specific Language) for manipulating PDFs. But first I need to sort out the underlying representation and algorithms.
Edit 2: dmckee suggests a histogram implementation. That is what I was getting at with my list of uniform distributions. But I don't see how to combine them to create new distributions. Ultimately I need to find things like P(x < y) in cases where this may be quite small.
Edit 3: I have a bunch of histograms. They are not evenly distributed because I'm generating them from occurrence data, so basically if I have 100 samples and I want ten points in the histogram then I allocate 10 samples to each bar, and make the bars variable width but constant area.
I've figured out that to add PDFs you convolve them, and I've boned up on the maths for that. When you convolve two uniform distributions you get a new distribution with three sections: the wider uniform distribution is still there, but with a triangle stuck on each side the width of the narrower one. So if I convolve each element of X and Y I'll get a bunch of these, all overlapping. Now I'm trying to figure out how to sum them all and then get a histogram that is the best approximation to it.
I'm beginning to wonder if Monte-Carlo wasn't such a bad idea after all.
Edit 4: This paper discusses convolutions of uniform distributions in some detail. In general you get a "trapezoid" distribution. Since each "column" in the histograms is a uniform distribution, I had hoped that the problem could be solved by convolving these columns and summing the results.
However the result is considerably more complex than the inputs, and also includes triangles. Edit 5: [Wrong stuff removed]. But if these trapezoids are approximated to rectangles with the same area then you get the Right Answer, and reducing the number of rectangles in the result looks pretty straightforward too. This might be the solution I've been trying to find.
Edit 6: Solved! Here is the final Haskell code for this problem:
import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

-- | Continuous distributions of scalars are represented as a
-- | histogram where each bar has approximately constant area but
-- | variable width and height. A histogram with N bars is stored as
-- | a list of N+1 values.
data Continuous = C {
    cN :: Int,
    -- ^ Number of bars in the histogram.
    cAreas :: [Double],
    -- ^ Areas of the bars.  #length cAreas == cN#
    cBars :: [Double]
    -- ^ Boundaries of the bars.  #length cBars == cN + 1#
  } deriving (Show, Read)
{- | Add distributions.  If two random variables #vX# and #vY# are
taken from distributions #x# and #y# respectively then the
distribution of #(vX + vY)# will be #(x .+. y)#.
This is implemented as the convolution of distributions x and y.
Each is a histogram, which is to say the sum of a collection of
uniform distributions (the "bars").  Therefore the convolution can be
computed as the sum of the convolutions of the cross product of the
components of x and y.
When you convolve two uniform distributions of unequal size you get a
trapezoidal distribution.  Let p = p2-p1, q = q2-q1.  Then we get:
> | |
> | ______ |
> | | | with | _____________
> | | | | | |
> +-----+----+------- +--+-----------+-
> p1 p2 q1 q2
>
> gives h|....... _______________
> | /: :\
> | / : : \ 1
> | / : : \ where h = -
> | / : : \ q
> | / : : \
> +--+-----+-------------+-----+-----
> p1+q1 p2+q1 p1+q2 p2+q2
However we cannot keep the trapezoid in the final result because our
representation is restricted to uniform distributions. So instead we
store a uniform approximation to the trapezoid with the same area:
> h|......___________________
> | | / \ |
> | |/ \|
> | | |
> | /| |\
> | / | | \
> +-----+-------------------+--------
> p1+q1+p/2 p2+q2-p/2
-}
(.+.) :: Continuous -> Continuous -> Continuous
c .+. d = C {cN     = length bars - 1,
             cBars  = map fst bars,
             cAreas = zipWith barArea bars (tail bars)}
   where
      -- The convolve function returns a list of two (x, deltaY) pairs.
      -- These can be sorted by x and then sequentially summed to get
      -- the new histogram.  The "b" parameter is the product of the
      -- height of the input bars, which was omitted from the diagrams
      -- above.
      convolve b c1 c2 d1 d2 =
         if (c2-c1) < (d2-d1) then convolve1 b c1 c2 d1 d2
                              else convolve1 b d1 d2 c1 c2
      convolve1 b p1 p2 q1 q2 =
         [(p1+q1+halfP, h), (p2+q2-halfP, (-h))]
         where
            halfP = (p2-p1)/2
            h = b / (q2-q1)
      outline = map sumGroup $ groupBy ((==) `on` fst) $ sortBy (comparing fst)
                $ concat
                   [convolve (areaC*areaD) c1 c2 d1 d2 |
                      (c1, c2, areaC) <- zip3 (cBars c) (tail $ cBars c) (cAreas c),
                      (d1, d2, areaD) <- zip3 (cBars d) (tail $ cBars d) (cAreas d)]
      sumGroup pairs = (fst $ head pairs, sum $ map snd pairs)
      bars = tail $ scanl (\(_,y) (x2,dy) -> (x2, y+dy)) (0, 0) outline
      barArea (x1, h) (x2, _) = (x2 - x1) * h
Other operators are left as an exercise for the reader.
No need for histograms or symbolic computation: everything can be done at the language level in closed form, if the right point of view is taken.
[I shall use the term "measure" and "distribution" interchangeably. Also, my Haskell is rusty and I ask you to forgive me for being imprecise in this area.]
Probability distributions are really codata.
Let mu be a probability measure. The only thing you can do with a measure is integrate it against a test function (this is one possible mathematical definition of "measure"). Note that this is what you will eventually do: for instance integrating against identity is taking the mean:
mean :: Measure -> Double
mean mu = mu id
another example:
variance :: Measure -> Double
variance mu = (mu $ \x -> x ^ 2) - (mean mu) ^ 2
another example, which computes P(mu < x):
cdf :: Measure -> Double -> Double
cdf mu x = mu $ \z -> if z < x then 1 else 0
This suggests an approach by duality.
The type Measure shall therefore denote the type (Double -> Double) -> Double. This allows you to model results of MC simulation, numerical/symbolic quadrature against a PDF, etc. For instance, the function
empirical :: [Double] -> Measure
empirical []    f = 0
empirical (h:t) f = f h + empirical t f
returns the integral of f against an empirical measure obtained by e.g. MC sampling (divide by the number of samples if you want it normalized as a probability measure). Also
from_pdf :: (Double -> Double) -> Measure
from_pdf rho f = my_favorite_quadrature_method rho f
constructs measures from (regular) densities.
Now, the good news. If mu and nu are two measures, the convolution mu ** nu is given by:
(mu ** nu) f = nu $ \y -> (mu $ \x -> f $ x + y)
So, given two measures, you can integrate against their convolution.
Also, given a random variable X of law mu, the law of a * X is given by:
rescale :: Double -> Measure -> Measure
rescale a mu f = mu $ \x -> f(a * x)
Also, the distribution of phi(X) is given by the image measure phi_* X, in our framework:
apply :: (Double -> Double) -> Measure -> Measure
apply phi mu f = mu $ f . phi
So now you can easily work out an embedded language for measures. There are many more things to do here, particularly with respect to sample spaces other than the real line, dependencies between random variables, conditioning, but I hope you get the point.
In particular, the pushforward is functorial:
newtype Measure a = Measure ((a -> Double) -> Double)
instance Functor Measure where
    fmap phi (Measure mu) = Measure $ \f -> mu (f . phi)
It is a monad too (exercise -- hint: this very much looks like the continuation monad. What is return ? What is the analog of call/cc ?).
Also, combined with a differential geometry framework, this can probably be turned into something which compute Bayesian posterior distributions automatically.
At the end of the day, you can write stuff like
m = mean $ apply cos ((from_pdf gauss) ** (empirical data))
to compute the mean of cos(X + Y) where X has pdf gauss and Y has been sampled by a MC method whose results are in data.
Probability distributions form a monad; see eg the work of Claire Jones and also the LICS 1989 paper, but the ideas go back to a 1982 paper by Giry (DOI 10.1007/BFb0092872) and to a 1962 note by Lawvere that I cannot track down (http://permalink.gmane.org/gmane.science.mathematics.categories/6541).
But I don't see the comonad: there's no way to get an "a" out of an "(a->Double)->Double". Perhaps if you make it polymorphic - (a->r)->r for all r? (That's the continuation monad.)
Is there anything that stops you from employing a mini-language for this?
By that I mean, define a language that lets you write f = x + y and evaluates f for you just as written. And similarly for g = x * z, h = y(x), etc., ad nauseam. (The semantics I'm suggesting call for the evaluator to select a random number from each innermost PDF appearing on the RHS at evaluation time, and not to try to understand the composed form of the resulting PDFs. This may not be fast enough...)
Assuming that you understand the precision limits you need, you can represent a PDF fairly simply with a histogram or spline (the former being a degenerate case of the later). If you need to mix analytically defined PDFs with experimentally determined ones, you'll have to add a type mechanism.
A histogram is just an array, the contents of which represent the incidence in a particular region of the input range. You haven't said if you have a language preference, so I'll assume something C-like. You need to know the bin structure (uniform sizes are easy, but not always best), including the high and low limits and possibly the normalization:
struct histogram_struct {
    int bins;               /* Assumed to be uniform */
    double low;
    double high;
    /* double normalization; */
    /* double *errors; */   /* if using, initialize with enough space,
                             * and store _squared_ errors
                             */
    double contents[];
};
This kind of thing is very common in scientific analysis software, and you might want to use an existing implementation.
I worked on similar problems for my dissertation.
One way to compute approximate convolutions is to take the Fourier transform of the density functions (histograms in this case), multiply them, then take the inverse Fourier transform to get the convolution.
Look at Appendix C of my dissertation for formulas for various special cases of operations on probability distributions. You can find the dissertation at: http://riso.sourceforge.net
I wrote Java code to carry out those operations. You can find the code at: https://sourceforge.net/projects/riso
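As a sketch of that idea (not the RISO code itself), this is how the FFT route could look in Python with NumPy, assuming both densities are sampled on uniform grids with the same bin width; variable-width histograms would have to be resampled first:
import numpy as np

def convolve_pdfs(p, q, dx):
    n = len(p) + len(q) - 1                  # length of the full convolution
    P = np.fft.rfft(p, n)
    Q = np.fft.rfft(q, n)
    r = np.fft.irfft(P * Q, n) * dx          # inverse transform, scaled by bin width
    return r / (r.sum() * dx)                # renormalize to unit area

dx = 0.1
x = np.arange(0, 5, dx)
p = np.where(x < 1, 1.0, 0.0)                # uniform density on [0, 1)
q = np.where((x >= 1) & (x < 3), 0.5, 0.0)   # uniform density on [1, 3)
r = convolve_pdfs(p, q, dx)                  # density of the sum: a trapezoid on [1, 4)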
Autonomous mobile robotics deals with a similar issue in localization and navigation, in particular Markov localization and the Kalman filter (sensor fusion). See "An experimental comparison of localization methods continued", for example.
Another approach you could borrow from mobile robots is path planning using potential fields.
A couple of responses:
1) If you have empirically determined PDFs, then either you have histograms or you have an approximation to a parametric PDF. A PDF is a continuous function and you don't have infinite data...
2) Let's assume that the variables are independent. Then, if you make the PDFs discrete, p(x,y) = p(x)*p(y), and the probability that f(x,y) meets your target is the sum of p(x)*p(y) over all combinations of x and y for which it does.
If you are going to fit the empirical PDFs to standard PDFs, e.g. the normal distribution, then you can use already-determined functions to figure out the sum, etc.
If the variables are not independent, then you have more trouble on your hands and I think you have to use copulas.
I think that defining your own mini-language, etc., is overkill. You can do this with arrays...
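A toy Python sketch of the discretized sum from point 2) above (the grids and example densities are made up):
import numpy as np

xs = np.linspace(0, 10, 101)
ys = np.linspace(0, 10, 101)
px = np.exp(-xs)
px /= px.sum()                       # discretized PDF of X (example)
py = np.exp(-(ys - 5) ** 2)
py /= py.sum()                       # discretized PDF of Y (example)

def prob_condition(f, condition):
    # P(condition(f(X, Y))) for independent X and Y on the grids above.
    total = 0.0
    for x, p_x in zip(xs, px):
        for y, p_y in zip(ys, py):
            if condition(f(x, y)):
                total += p_x * p_y
    return total

print(prob_condition(lambda x, y: x + y, lambda s: s < 6.0))   # P(X + Y < 6)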
Some initial thoughts:
First, Mathematica has a nice facility for doing this with exact distributions.
Second, representation as histograms (i.e., empirical PDFs) is problematic since you have to make choices about bin size. That can be avoided by storing a cumulative distribution instead, i.e., an empirical CDF. (In fact, you then retain the ability to recreate the full data set of samples that the empirical distribution is based on.)
Here's some ugly Mathematica code to take a list of samples and return an empirical CDF, namely a list of value-probability pairs. Run the output of this through ListPlot to see a plot of the empirical CDF.
empiricalCDF[t_] :=
  Flatten[{{#[[2,1]],#[[1,2]]},#[[2]]}&/@Partition[Prepend[Transpose[{#[[1]],
    Rest[FoldList[Plus,0,#[[2]]]]/Length[t]}&[Transpose[{First[#],Length[#]}&/@
    Split[Sort[t]]]]],{Null,0}],2,1],1]
Finally, here's some information on combining discrete probability distributions:
http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter7.pdf
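The same empirical-CDF idea as a plain Python sketch, in case the Mathematica is hard to follow: keep the sorted samples, and P(X <= x) becomes a binary-search lookup.
import bisect

def empirical_cdf(samples):
    # List of (value, cumulative probability) pairs.
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

def cdf_at(samples_sorted, x):
    # P(X <= x) for the empirical distribution of the (sorted) samples.
    return bisect.bisect_right(samples_sorted, x) / len(samples_sorted)

data = [3.2, 1.5, 2.7, 1.5, 4.1]
print(empirical_cdf(data))
print(cdf_at(sorted(data), 2.0))   # -> 0.4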
I think the histograms or the list of 1/N area regions is a good idea. For the sake of argument, I'll assume that you'll have a fixed N for all distributions.
Use the paper you linked edit 4 to generate the new distribution. Then, approximate it with a new N-element distribution.
If you don't want N to be fixed, it's even easier. Take each convex polygon (trapezoid or triangle) in the new generated distribution and approximate it with a uniform distribution.
Another suggestion is to use kernel densities. Especially if you use Gaussian kernels, then they can be relatively easy to work with... except that the distributions quickly explode in size without care. Depending on the application, there are additional approximation techniques like importance sampling that can be used.
If you want some fun, try representing them symbolically, like Maple or Mathematica would do. Maple uses directed acyclic graphs, while Mathematica uses a list/Lisp-like approach (I believe, but it's been a loooong time since I even thought about this).
Do all your manipulations symbolically, then at the end push through numerical values. (Or just find a way to launch off in a shell and do the computations).
Paul.
