Calculating large exponential shares / probabilities - algorithm

Let there be an event space ES.
Let there be some sets of objects OS[].
The probabilities of selecting any object are mutually disjoint.
Now, assume that the size of each set is based on a number X[i] assigned to it.
The size of each set rises exponentially with that number.
The base (B) used for exponentiation could be the Euler's number (e), due to its nice properties, but let's assume that, that might not be the case.
Now, we are after calculating the probability of selecting any member of a selected set, at random, while keeping in mind that the arity of each set might be very large.
After the sequence of probabilities is known it's used to compute P[i]*(C).
I wonder if this could be optimized/approximated for very large exponents i.e. computed with low memory consumption i.e. implemented.
Related question I found is here still they seem to tackle only opposite probabilities.
// Numerical example:
// A,C - constants, natural numbers
//exponents
X[1] = 3432342332;
X[2] = 55438849;
X[3] = 34533;
//probabilities
P1 = A^X[1]/(A^X[1]+A^X[2]+A^X[3]);
P2 = A^X[2]/(A^X[1]+A^X[2]+A^X[3]);
P3 = A^X[3]/(A^X[1]+A^X[2]+A^X[3]);
//Results
R1 = P1 *C;
R2 = P2 *C;
R3 = P3 *C;
Excel would fail when exponents are larger than few hundreds.

So you have a number a>1, an integer array B of n elements, and for each i, you are to calculate a^B[i] / (a^B[1] + a^B[2] + ... + a^B[n]) .
Let C[i] = B[i] - max(B[1], ..., B[n]). Then you calculate
a^C[i] / (a^C[1] + a^C[2] + ... + a^C[n]). Since all elements of C are now non-positive, you don't care about overflow.

Related

Conditional sampling of binary vectors (?)

I'm trying to find a name for my problem, so I don't have to re-invent wheel when coding an algorithm which solves it...
I have say 2,000 binary (row) vectors and I need to pick 500 from them. In the picked sample I do column sums and I want my sample to be as close as possible to a pre-defined distribution of the column sums. I'll be working with 20 to 60 columns.
A tiny example:
Out of the vectors:
110
010
011
110
100
I need to pick 2 to get column sums 2, 1, 0. The solution (exact in this case) would be
110
100
My ideas so far
one could maybe call this a binary multidimensional knapsack, but I did not find any algos for that
Linear Programming could help, but I'd need some step by step explanation as I got no experience with it
as exact solution is not always feasible, something like simulated annealing brute force could work well
a hacky way using constraint solvers comes to mind - first set the constraints tight and gradually loosen them until some solution is found - given that CSP should be much faster than ILP...?
My concrete, practical (if the approximation guarantee works out for you) suggestion would be to apply the maximum entropy method (in Chapter 7 of Boyd and Vandenberghe's book Convex Optimization; you can probably find several implementations with your favorite search engine) to find the maximum entropy probability distribution on row indexes such that (1) no row index is more likely than 1/500 (2) the expected value of the row vector chosen is 1/500th of the predefined distribution. Given this distribution, choose each row independently with probability 500 times its distribution likelihood, which will give you 500 rows on average. If you need exactly 500, repeat until you get exactly 500 (shouldn't take too many tries due to concentration bounds).
Firstly I will make some assumptions regarding this problem:
Regardless whether the column sum of the selected solution is over or under the target, it weighs the same.
The sum of the first, second, and third column are equally weighted in the solution (i.e. If there's a solution whereas the first column sum is off by 1, and another where the third column sum is off by 1, the solution are equally good).
The closest problem I can think of this problem is the Subset sum problem, which itself can be thought of a special case of Knapsack problem.
However both of these problem are NP-Complete. This means there are no polynomial time algorithm that can solve them, even though it is easy to verify the solution.
If I were you the two most arguably efficient solution of this problem are linear programming and machine learning.
Depending on how many columns you are optimising in this problem, with linear programming you can control how much finely tuned you want the solution, in exchange of time. You should read up on this, because this is fairly simple and efficient.
With Machine learning, you need a lot of data sets (the set of vectors and the set of solutions). You don't even need to specify what you want, a lot of machine learning algorithms can generally deduce what you want them to optimise based on your data set.
Both solution has pros and cons, you should decide which one to use yourself based on the circumstances and problem set.
This definitely can be modeled as (integer!) linear program (many problems can). Once you have it, you can use a program such as lpsolve to solve it.
We model vector i is selected as x_i which can be 0 or 1.
Then for each column c, we have a constraint:
sum of all (x_i * value of i in column c) = target for column c
Taking your example, in lp_solve this could look like:
min: ;
+x1 +x4 +x5 >= 2;
+x1 +x4 +x5 <= 2;
+x1 +x2 +x3 +x4 <= 1;
+x1 +x2 +x3 +x4 >= 1;
+x3 <= 0;
+x3 >= 0;
bin x1, x2, x3, x4, x5;
If you are fine with a heuristic based search approach, here is one.
Go over the list and find the minimum squared sum of the digit wise difference between each bit string and the goal. For example, if we are looking for 2, 1, 0, and we are scoring 0, 1, 0, we would do it in the following way:
Take the digit wise difference:
2, 0, 1
Square the digit wise difference:
4, 0, 1
Sum:
5
As a side note, squaring the difference when scoring is a common method when doing heuristic search. In your case, it makes sense because bit strings that have a 1 in as the first digit are a lot more interesting to us. In your case this simple algorithm would pick first 110, then 100, which would is the best solution.
In any case, there are some optimizations that could be made to this, I will post them here if this kind of approach is what you are looking for, but this is the core of the algorithm.
You have a given target binary vector. You want to select M vectors out of N that have the closest sum to the target. Let's say you use the eucilidean distance to measure if a selection is better than another.
If you want an exact sum, have a look at the k-sum problem which is a generalization of the 3SUM problem. The problem is harder than the subset sum problem, because you want an exact number of elements to add to a target value. There is a solution in O(N^(M/2)). lg N), but that means more than 2000^250 * 7.6 > 10^826 operations in your case (in the favorable case where vectors operations have a cost of 1).
First conclusion: do not try to get an exact result unless your vectors have some characteristics that may reduce the complexity.
Here's a hill climbing approach:
sort the vectors by number of 1's: 111... first, 000... last;
use the polynomial time approximate algorithm for the subset sum;
you have an approximate solution with K elements. Because of the order of elements (the big ones come first), K should be a little as possible:
if K >= M, you take the M first vectors of the solution and that's probably near the best you can do.
if K < M, you can remove the first vector and try to replace it with 2 or more vectors from the rest of the N vectors, using the same technique, until you have M vectors. To sumarize: split the big vectors into smaller ones until you reach the correct number of vectors.
Here's a proof of concept with numbers, in Python:
import random
def distance(x, y):
return abs(x-y)
def show(ls):
if len(ls) < 10:
return str(ls)
else:
return ", ".join(map(str, ls[:5]+("...",)+ls[-5:]))
def find(is_xs, target):
# see https://en.wikipedia.org/wiki/Subset_sum_problem#Pseudo-polynomial_time_dynamic_programming_solution
S = [(0, ())] # we store indices along with values to get the path
for i, x in is_xs:
T = [(x + t, js + (i,)) for t, js in S]
U = sorted(S + T)
y, ks = U[0]
S = [(y, ks)]
for z, ls in U:
if z == target: # use the euclidean distance here if you want an approximation
return ls
if z != y and z < target:
y, ks = z, ls
S.append((z, ls))
ls = S[-1][1] # take the closest element to target
return ls
N = 2000
M = 500
target = 1000
xs = [random.randint(0, 10) for _ in range(N)]
print ("Take {} numbers out of {} to make a sum of {}", M, xs, target)
xs = sorted(xs, reverse = True)
is_xs = list(enumerate(xs))
print ("Sorted numbers: {}".format(show(tuple(is_xs))))
ls = find(is_xs, target)
print("FIRST TRY: {} elements ({}) -> {}".format(len(ls), show(ls), sum(x for i, x in is_xs if i in ls)))
splits = 0
while len(ls) < M:
first_x = xs[ls[0]]
js_ys = [(i, x) for i, x in is_xs if i not in ls and x != first_x]
replace = find(js_ys, first_x)
splits += 1
if len(replace) < 2 or len(replace) + len(ls) - 1 > M or sum(xs[i] for i in replace) != first_x:
print("Give up: can't replace {}.\nAdd the lowest elements.")
ls += tuple([i for i, x in is_xs if i not in ls][len(ls)-M:])
break
print ("Replace {} (={}) by {} (={})".format(ls[:1], first_x, replace, sum(xs[i] for i in replace)))
ls = tuple(sorted(ls[1:] + replace)) # use a heap?
print("{} elements ({}) -> {}".format(len(ls), show(ls), sum(x for i, x in is_xs if i in ls)))
print("AFTER {} splits, {} -> {}".format(splits, ls, sum(x for i, x in is_xs if i in ls)))
The result is obviously not guaranteed to be optimal.
Remarks:
Complexity: find has a polynomial time complexity (see the Wikipedia page) and is called at most M^2 times, hence the complexity remains polynomial. In practice, the process is reasonably fast (split calls have a small target).
Vectors: to ensure that you reach the target with the minimum of elements, you can improve the order of element. Your target is (t_1, ..., t_c): if you sort the t_js from max to min, you get the more importants columns first. You can sort the vectors: by number of 1s and then by the presence of a 1 in the most important columns. E.g. target = 4 8 6 => 1 1 1 > 0 1 1 > 1 1 0 > 1 0 1 > 0 1 0 > 0 0 1 > 1 0 0 > 0 0 0.
find (Vectors) if the current sum exceed the target in all the columns, then you're not connecting to the target (any vector you add to the current sum will bring you farther from the target): don't add the sum to S (z >= target case for numbers).
I propose a simple ad hoc algorithm, which, broadly speaking, is a kind of gradient descent algorithm. It seems to work relatively well for input vectors which have a distribution of 1s “similar” to the target sum vector, and probably also for all “nice” input vectors, as defined in a comment of yours. The solution is not exact, but the approximation seems good.
The distance between the sum vector of the output vectors and the target vector is taken to be Euclidean. To minimize it means minimizing the sum of the square differences off sum vector and target vector (the square root is not needed because it is monotonic). The algorithm does not guarantee to yield the sample that minimizes the distance from the target, but anyway makes a serious attempt at doing so, by always moving in some locally optimal direction.
The algorithm can be split into 3 parts.
First of all the first M candidate output vectors out of the N input vectors (e.g., N=2000, M=500) are put in a list, and the remaining vectors are put in another.
Then "approximately optimal" swaps between vectors in the two lists are done, until either the distance would not decrease any more, or a predefined maximum number of iterations is reached. An approximately optimal swap is one where removing the first vector from the list of output vectors causes a maximal decrease or minimal increase of the distance, and then, after the removal of the first vector, adding the second vector to the same list causes a maximal decrease of the distance. The whole swap is avoided if the net result is not a decrease of the distance.
Then, as a last phase, "optimal" swaps are done, again stopping on no decrease in distance or maximum number of iterations reached. Optimal swaps cause a maximal decrease of the distance, without requiring the removal of the first vector to be optimal in itself. To find an optimal swap all vector pairs have to be checked. This phase is much more expensive, being O(M(N-M)), while the previous "approximate" phase is O(M+(N-M))=O(N). Luckily, when entering this phase, most of the work has already been done by the previous phase.
from typing import List, Tuple
def get_sample(vects: List[Tuple[int]], target: Tuple[int], n_out: int,
max_approx_swaps: int = None, max_optimal_swaps: int = None,
verbose: bool = False) -> List[Tuple[int]]:
"""
Get a sample of the input vectors having a sum close to the target vector.
Closeness is measured in Euclidean metrics. The output is not guaranteed to be
optimal (minimum square distance from target), but a serious attempt is made.
The max_* parameters can be used to avoid too long execution times,
tune them to your needs by setting verbose to True, or leave them None (∞).
:param vects: the list of vectors (tuples) with the same number of "columns"
:param target: the target vector, with the same number of "columns"
:param n_out: the requested sample size
:param max_approx_swaps: the max number of approximately optimal vector swaps,
None means unlimited (default: None)
:param max_optimal_swaps: the max number of optimal vector swaps,
None means unlimited (default: None)
:param verbose: print some info if True (default: False)
:return: the sample of n_out vectors having a sum close to the target vector
"""
def square_distance(v1, v2):
return sum((e1 - e2) ** 2 for e1, e2 in zip(v1, v2))
n_vec = len(vects)
assert n_vec > 0
assert n_out > 0
n_rem = n_vec - n_out
assert n_rem > 0
output = vects[:n_out]
remain = vects[n_out:]
n_col = len(vects[0])
assert n_col == len(target) > 0
sumvect = (0,) * n_col
for outvect in output:
sumvect = tuple(map(int.__add__, sumvect, outvect))
sqdist = square_distance(sumvect, target)
if verbose:
print(f"sqdist = {sqdist:4} after"
f" picking the first {n_out} vectors out of {n_vec}")
if max_approx_swaps is None:
max_approx_swaps = sqdist
n_approx_swaps = 0
while sqdist and n_approx_swaps < max_approx_swaps:
# find the best vect to subtract (the square distance MAY increase)
sqdist_0 = None
index_0 = None
sumvect_0 = None
for index in range(n_out):
tmp_sumvect = tuple(map(int.__sub__, sumvect, output[index]))
tmp_sqdist = square_distance(tmp_sumvect, target)
if sqdist_0 is None or sqdist_0 > tmp_sqdist:
sqdist_0 = tmp_sqdist
index_0 = index
sumvect_0 = tmp_sumvect
# find the best vect to add,
# but only if there is a net decrease of the square distance
sqdist_1 = sqdist
index_1 = None
sumvect_1 = None
for index in range(n_rem):
tmp_sumvect = tuple(map(int.__add__, sumvect_0, remain[index]))
tmp_sqdist = square_distance(tmp_sumvect, target)
if sqdist_1 > tmp_sqdist:
sqdist_1 = tmp_sqdist
index_1 = index
sumvect_1 = tmp_sumvect
if sumvect_1:
tmp = output[index_0]
output[index_0] = remain[index_1]
remain[index_1] = tmp
sqdist = sqdist_1
sumvect = sumvect_1
n_approx_swaps += 1
else:
break
if verbose:
print(f"sqdist = {sqdist:4} after {n_approx_swaps}"
f" approximately optimal swap{'s'[n_approx_swaps == 1:]}")
diffvect = tuple(map(int.__sub__, sumvect, target))
if max_optimal_swaps is None:
max_optimal_swaps = sqdist
n_optimal_swaps = 0
while sqdist and n_optimal_swaps < max_optimal_swaps:
# find the best pair to swap,
# but only if the square distance decreases
best_sqdist = sqdist
best_diffvect = diffvect
best_pair = None
for i0 in range(M):
tmp_diffvect = tuple(map(int.__sub__, diffvect, output[i0]))
for i1 in range(n_rem):
new_diffvect = tuple(map(int.__add__, tmp_diffvect, remain[i1]))
new_sqdist = sum(d * d for d in new_diffvect)
if best_sqdist > new_sqdist:
best_sqdist = new_sqdist
best_diffvect = new_diffvect
best_pair = (i0, i1)
if best_pair:
tmp = output[best_pair[0]]
output[best_pair[0]] = remain[best_pair[1]]
remain[best_pair[1]] = tmp
sqdist = best_sqdist
diffvect = best_diffvect
n_optimal_swaps += 1
else:
break
if verbose:
print(f"sqdist = {sqdist:4} after {n_optimal_swaps}"
f" optimal swap{'s'[n_optimal_swaps == 1:]}")
return output
from random import randrange
C = 30 # number of columns
N = 2000 # total number of vectors
M = 500 # number of output vectors
F = 0.9 # fill factor of the target sum vector
T = int(M * F) # maximum value + 1 that can be appear in the target sum vector
A = 10000 # maximum number of approximately optimal swaps, may be None (∞)
B = 10 # maximum number of optimal swaps, may be None (unlimited)
target = tuple(randrange(T) for _ in range(C))
vects = [tuple(int(randrange(M) < t) for t in target) for _ in range(N)]
sample = get_sample(vects, target, M, A, B, True)
Typical output:
sqdist = 2639 after picking the first 500 vectors out of 2000
sqdist = 9 after 27 approximately optimal swaps
sqdist = 1 after 4 optimal swaps
P.S.: As it stands, this algorithm is not limited to binary input vectors, integer vectors would work too. Intuitively I suspect that the quality of the optimization could suffer, though. I suspect that this algorithm is more appropriate for binary vectors.
P.P.S.: Execution times with your kind of data are probably acceptable with standard CPython, but get better (like a couple of seconds, almost a factor of 10) with PyPy. To handle bigger sets of data, the algorithm would have to be translated to C or some other language, which should not be difficult at all.

How can I optimize this indexing algorithm

My Questions
Is there anyway that I can speed up this calculation?
Is there a better algorithm or implementation that I can be use to calculate the same values?
Describing the algorithm
I have a complex indexing problem that I'm struggling to solve in an efficient way.
The goal is to calculate the matrix w_prime using values a combination of values from the equally sized matrices w, dY, and dX.
The value of w_prime(i,j) is calculated as mean( w( indY & indX ) ), where indY and indX are the indices of dY and dX that are equal to i and j respectively.
Here's a simple implementation in matlab of an algorithm to compute w_prime:
for i = 1:size(w_prime,1)
indY = dY == i;
for j = 1:size(w_prime,2)
indX = dX == j;
w_prime(ind) = mean( w( indY & indX ) );
end
end
Performance Problems
This implementation is sufficient in example case below; however, in my actual use case w, dY, dX are ~3000x3000 and w_prime is ~60X900. Meaning that each index calculation is happening on a ~9 million elements. Needless this implementation is too slow to be usable. Additionally I'll need to run this code a few dozen times.
Example Calculation
If I want to compute w(1,1)
Find the indices of dY that equal 1, save as indY
Find the indices of dX that equal 1, save as indX
Find intersection of indY and indX save as ind
Save the mean( w(ind) ) to w_prime(1,1)
General Problem Description
I have a set points defined by two vectors X, and T, both are 1XN where N is ~3000. Additionally the values of X and T are integers bound by the intervals (1 60) and (1 900) respectively.
The matrices dX and dT, are simply distance matrices, meaning that they contain the pairwise distances between the points. Ie dx(i,j) is equal abs( x(i) - x(j) ).
They are calculated using: dx = pdist(x);
The matrix w can be thought of as a weight matrix that describes how much influence one point has on another.
The purpose of calculating w_prime(a,b) is to determine the average weight between the sub-set of points that are separated by a in the X dimension and b in the T dimension.
This can be expressed as follows:
This is straightforward with ACCUMARRAY:
nx = max(dX(:));
ny = max(dY(:));
w_prime = accumarray([dX(:),dY(:)],w(:),[nx,ny],#mean,NaN)
The output will be a nx-by-ny sized array with NaNs wherever there was no corresponding pair of indices. If you're sure that there will be a full complement of indices all the time, you can simplify the above calculation to
w_prime = accumarray([dX(:),dY(:)],w(:),[],#mean)
So, what does accumarray do? It looks at the rows of [dX(:),dY(:)]. Each row gives the (i,j) coordinate pair in w_prime to which the row contributes. For all pairs (1,1), it applies the function (#mean) to the corresponding entries in w(:), and writes the output into w_prime(1,1).

How to calculate iteratively the running weighted average so that last values to weight most?

I want to implement an iterative algorithm, which calculates weighted average. The specific weight law does not matter, but it should be close to 1 for the newest values and close to 0 to the oldest.
The algorithm should be iterative. i.e. it should not remember all previous values. It should know only one newest value and any aggregative information about past, like previous values of the average, sums, counts etc.
Is it possible?
For example, the following algorithm can be:
void iterate(double value) {
sum *= 0.99;
sum += value;
count++;
avg = sum / count;
}
It will give exponential decreasing weight, which may be not good. Is it possible to have step decreasing weight or something?
EDIT 1
The the requirements for weighing law is follows:
1) The weight decreases into past
2) I has some mean or characteristic duration so that values older this duration matters much lesser than newer ones
3) I should be able to set this duration
EDIT 2
I need the following. Suppose v_i are values, where v_1 is the first. Also suppose w_i are weights. But w_0 is THE LAST.
So, after first value came I have first average
a_1 = v_1 * w_0
After the second value v_2 came, I should have average
a_2 = v_1 * w_1 + v_2 * w_0
With next value I should have
a_3 = v_1 * w_2 + v_2 * w_1 + v_3 * w_0
Note, that weight profile is moving with me, while I am moving along value sequence.
I.e. each value does not have it's own weight all the time. My goal is to have this weight lower while going to past.
First a bit of background. If we were keeping a normal average, it would go like this:
average(a) = 11
average(a,b) = (average(a)+b)/2
average(a,b,c) = (average(a,b)*2 + c)/3
average(a,b,c,d) = (average(a,b,c)*3 + d)/4
As you can see here, this is an "online" algorithm and we only need to keep track of pieces of data: 1) the total numbers in the average, and 2) the average itself. Then we can undivide the average by the total, add in the new number, and divide it by the new total.
Weighted averages are a bit different. It depends on what kind of weighted average. For example if you defined:
weightedAverage(a,wa, b,wb, c,wc, ..., z,wz) = a*wa + b*wb + c*wc + ... + w*wz
or
weightedAverage(elements, weights) = elements·weights
...then you don't need to do anything besides add the new element*weight! If however you defined the weighted average akin to an expected-value from probability:
weightedAverage(elements,weights) = elements·weights / sum(weights)
...then you'd need to keep track of the total weights. Instead of undividing by the total number of elements, you undivide by the total weight, add in the new element&ast;weight, then divide by the new total weight.
Alternatively you don't need to undivide, as demonstrated below: you can merely keep track of the temporary dot product and weight total in a closure or an object, and divide it as you yield (this can help a lot with avoiding numerical inaccuracy from compounded rounding errors).
In python this would be:
def makeAverager():
dotProduct = 0
totalWeight = 0
def averager(newValue, weight):
nonlocal dotProduct,totalWeight
dotProduct += newValue*weight
totalWeight += weight
return dotProduct/totalWeight
return averager
Demo:
>>> averager = makeAverager()
>>> [averager(value,w) for value,w in [(100,0.2), (50,0.5), (100,0.1)]]
[100.0, 64.28571428571429, 68.75]
>>> averager(10,1.1)
34.73684210526316
>>> averager(10,1.1)
25.666666666666668
>>> averager(30,2.0)
27.4
> But my task is to have average recalculated each time new value arrives having old values reweighted. –OP
Your task is almost always impossible, even with exceptionally simple weighting schemes.
You are asking to, with O(1) memory, yield averages with a changing weighting scheme. For example, {values·weights1, (values+[newValue2])·weights2, (values+[newValue2,newValue3])·weights3, ...} as new values are being passed in, for some nearly arbitrarily changing weights sequence. This is impossible due to injectivity. Once you merge the numbers in together, you lose a massive amount of information. For example, even if you had the weight vector, you could not recover the original value vector, or vice versa. There are only two cases I can think of where you could get away with this:
Constant weights such as [2,2,2,...2]: this is equivalent to an on-line averaging algorithm, which you don't want because the old values are not being "reweighted".
The relative weights of previous answers do not change. For example you could do weights of [8,4,2,1], and add in a new element with arbitrary weight like ...+[1], but you must increase all the previous by the same multiplicative factor, like [16,8,4,2]+[1]. Thus at each step, you are adding a new arbitrary weight, and a new arbitrary rescaling of the past, so you have 2 degrees of freedom (only 1 if you need to keep your dot-product normalized). The weight-vectors you'd get would look like:
[w0]
[w0*(s1), w1]
[w0*(s1*s2), w1*(s2), w2]
[w0*(s1*s2*s3), w1*(s2*s3), w2*(s3), w3]
...
Thus any weighting scheme you can make look like that will work (unless you need to keep the thing normalized by the sum of weights, in which case you must then divide the new average by the new sum, which you can calculate by keeping only O(1) memory). Merely multiply the previous average by the new s (which will implicitly distribute over the dot-product into the weights), and tack on the new +w*newValue.
I think you are looking for something like this:
void iterate(double value) {
count++;
weight = max(0, 1 - (count / 1000));
avg = ( avg * total_weight * (count - 1) + weight * value) / (total_weight * (count - 1) + weight)
total_weight += weight;
}
Here I'm assuming you want the weights to sum to 1. As long as you can generate a relative weight without it changing in the future, you can end up with a solution which mimics this behavior.
That is, suppose you defined your weights as a sequence {s_0, s_1, s_2, ..., s_n, ...} and defined the input as sequence {i_0, i_1, i_2, ..., i_n}.
Consider the form: sum(s_0*i_0 + s_1*i_1 + s_2*i_2 + ... + s_n*i_n) / sum(s_0 + s_1 + s_2 + ... + s_n). Note that it is trivially possible to compute this incrementally with a couple of aggregation counters:
int counter = 0;
double numerator = 0;
double denominator = 0;
void addValue(double val)
{
double weight = calculateWeightFromCounter(counter);
numerator += weight * val;
denominator += weight;
}
double getAverage()
{
if (denominator == 0.0) return 0.0;
return numerator / denominator;
}
Of course, calculateWeightFromCounter() in this case shouldn't generate weights that sum to one -- the trick here is that we average by dividing by the sum of the weights so that in the end, the weights virtually seem to sum to one.
The real trick is how you do calculateWeightFromCounter(). You could simply return the counter itself, for example, however note that the last weighted number would not be near the sum of the counters necessarily, so you may not end up with the exact properties you want. (It's hard to say since, as mentioned, you've left a fairly open problem.)
This is too long to post in a comment, but it may be useful to know.
Suppose you have:
w_0*v_n + ... w_n*v_0 (we'll call this w[0..n]*v[n..0] for short)
Then the next step is:
w_0*v_n1 + ... w_n1*v_0 (and this is w[0..n1]*v[n1..0] for short)
This means we need a way to calculate w[1..n1]*v[n..0] from w[0..n]*v[n..0].
It's certainly possible that v[n..0] is 0, ..., 0, z, 0, ..., 0 where z is at some location x.
If we don't have any 'extra' storage, then f(z*w(x))=z*w(x + 1) where w(x) is the weight for location x.
Rearranging the equation, w(x + 1) = f(z*w(x))/z. Well, w(x + 1) better be constant for a constant x, so f(z*w(x))/z better be constant. Hence, f must let z propagate -- that is, f(z*w(x)) = z*f(w(x)).
But here again we have an issue. Note that if z (which could be any number) can propagate through f, then w(x) certainly can. So f(z*w(x)) = w(x)*f(z). Thus f(w(x)) = w(x)/f(z).
But for a constant x, w(x) is constant, and thus f(w(x)) better be constant, too. w(x) is constant, so f(z) better be constant so that w(x)/f(z) is constant. Thus f(w(x)) = w(x)/c where c is a constant.
So, f(x)=c*x where c is a constant when x is a weight value.
So w(x+1) = c*w(x).
That is, each weight is a multiple of the previous. Thus, the weights take the form w(x)=m*b^x.
Note that this assumes the only information f has is the last aggregated value. Note that at some point you will be reduced to this case unless you're willing to store a non-constant amount of data representing your input. You cannot represent an infinite length vector of real numbers with a real number, but you can approximate them somehow in a constant, finite amount of storage. But this would merely be an approximation.
Although I haven't rigorously proven it, it is my conclusion that what you want is impossible to do with a high degree of precision, but you may be able to use log(n) space (which may as well be O(1) for many practical applications) to generate a quality approximation. You may be able to use even less.
I tried to practically code something (in Java). As has been said, your goal is not achievable. You can only count average from some number of last remembered values. If you don't need to be exact, you can approximate the older values. I tried to do it by remembering last 5 values exactly and older values only SUMmed by 5 values, remembering the last 5 SUMs. Then, the complexity is O(2n) for remembering last n+n*n values. This is a very rough approximation.
You can modify the "lastValues" and "lasAggregatedSums" array sizes as you want. See this ascii-art picture trying to display a graph of last values, showing that the first columns (older data) are remembered as aggregated value (not individually), and only the earliest 5 values are remembered individually.
values:
#####
##### ##### #
##### ##### ##### # #
##### ##### ##### ##### ## ##
##### ##### ##### ##### ##### #####
time: --->
Challenge 1: My example doesn't count weights, but I think it shouldn't be problem for you to add weights for the "lastAggregatedSums" appropriately - the only problem is, that if you want lower weights for older values, it would be harder, because the array is rotating, so it is not straightforward to know which weight for which array member. Maybe you can modify the algorithm to always "shift" values in the array instead of rotating? Then adding weights shouldn't be a problem.
Challenge 2: The arrays are initialized with 0 values, and those values are counting to the average from the beginning, even when we haven't receive enough values. If you are running the algorithm for long time, you probably don't bother that it is learning for some time at the beginning. If you do, you can post a modification ;-)
public class AverageCounter {
private float[] lastValues = new float[5];
private float[] lastAggregatedSums = new float[5];
private int valIdx = 0;
private int aggValIdx = 0;
private float avg;
public void add(float value) {
lastValues[valIdx++] = value;
if(valIdx == lastValues.length) {
// count average of last values and save into the aggregated array.
float sum = 0;
for(float v: lastValues) {sum += v;}
lastAggregatedSums[aggValIdx++] = sum;
if(aggValIdx >= lastAggregatedSums.length) {
// rotate aggregated values index
aggValIdx = 0;
}
valIdx = 0;
}
float sum = 0;
for(float v: lastValues) {sum += v;}
for(float v: lastAggregatedSums) {sum += v;}
avg = sum / (lastValues.length + lastAggregatedSums.length * lastValues.length);
}
public float getAvg() {
return avg;
}
}
you can combine (weighted sum) exponential means with different effective window sizes (N) in order to get the desired weights.
Use more exponential means to define your weight profile more detailed.
(more exponential means also means to store and calculate more values, so here is the trade off)
A memoryless solution is to calculate the new average from a weighted combination of the previous average and the new value:
average = (1 - P) * average + P * value
where P is an empirical constant, 0 <= P <= 1
expanding gives:
average = sum i (weight[i] * value[i])
where value[0] is the newest value, and
weight[i] = P * (1 - P) ^ i
When P is low, historical values are given higher weighting.
The closer P gets to 1, the more quickly it converges to newer values.
When P = 1, it's a regular assignment and ignores previous values.
If you want to maximise the contribution of value[N], maximize
weight[N] = P * (1 - P) ^ N
where 0 <= P <= 1
I discovered weight[N] is maximized when
P = 1 / (N + 1)

Combinatorial Optimization - Variation on Knapsack

Here is a real-world combinatorial optimization problem.
We are given a large set of value propositions for a certain product. The value propositions are of different types but each type is independent and adds equal benefit to the overall product. In building the product, we can include any non-negative integer number of "units" of each type. However, after adding the first unit of a certain type, the marginal benefit of additional units of that type continually decreases. In fact, the marginal benefit of a new unit is the inverse of the number of units of that type, after adding the new unit. Our product must have a least one unit of some type, and there is a small correction that we must make to the overall value because of this requirement.
Let T[] be an array representing the number of each type in a certain production run of the product. Then the overall value V is given by (pseudo code):
V = 1
For Each t in T
V = V * (t + 1)
Next t
V = V - 1 // correction
On cost side, units of the same type have the same cost. But units of different types each have unique, irrational costs. The number of types is large, but we are given an array of type costs C[] that is sorted from smallest to largest. Let's further assume that the type quantity array T[] is also sorted by cost from smallest to largest. Then the overall cost U is simply the sum of each unit cost:
U = 0
For i = 0, i < NumOfValueTypes
U = U + T[i] * C[i]
Next i
So far so good. So here is the problem: Given product P with value V and cost U, find the product Q with the cost U' and value V', having the minimal U' such that U' > U, V'/U' > V/U.
The problem you've described is nonlinear integer programming problem because it contains a product of integer variables t. Its feasibility set is not closed because of strict inequalities which can be worked around by using non-strict inequalities and adding a small positive number (epsilon) to the right hand sides. Then the problem can be formulated in AMPL as follows:
set Types;
param Costs{Types}; # C
param GivenProductValue; # V
param GivenProductCost; # U
param Epsilon;
var units{Types} integer >= 0; # T
var productCost = sum {t in Types} units[t] * Costs[t];
minimize cost: productCost;
s.t. greaterCost: productCost >= GivenProductCost + Epsilon;
s.t. greaterValuePerCost:
prod {t in Types} (units[t] + 1) - 1 >=
productCost * GivenProductValue / GivenProductCost + Epsilon;
This problem can be solved using a nonlinear integer programming solver such as Couenne.
Honestly I don't think there is an easy way to solve this. The best thing would be to write the system and solve it with a solver ( Excel solver will do the tricks, but you can use Ampl to solve this non lienar program.)
The Program:
Define: U;
V;
C=[c1,...cn];
Variables: T=[t1,t2,...tn];
Objective Function: SUM(ti.ci)
Constraints:
For all i: ti integer
SUM(ti.ci) > U
(PROD(ti+1)-1).U > V.SUM(ti.ci)
It works well with excel, (you just replace >U by >=U+d where d is the significative number of the costs- (i.e if C=[1.1, 1.8, 3.0, 9.3] d =0.1) since excel doesn't allow stric inequalities in the solver.)
I guess with a real solver like Ampl it will work perfectly.
Hope it helps,

"Approximate" greatest common divisor

Suppose you have a list of floating point numbers that are approximately multiples of a common quantity, for example
2.468, 3.700, 6.1699
which are approximately all multiples of 1.234. How would you characterize this "approximate gcd", and how would you proceed to compute or estimate it?
Strictly related to my answer to this question.
You can run Euclid's gcd algorithm with anything smaller then 0.01 (or a small number of your choice) being a pseudo 0. With your numbers:
3.700 = 1 * 2.468 + 1.232,
2.468 = 2 * 1.232 + 0.004.
So the pseudo gcd of the first two numbers is 1.232. Now you take the gcd of this with your last number:
6.1699 = 5 * 1.232 + 0.0099.
So 1.232 is the pseudo gcd, and the mutiples are 2,3,5. To improve this result, you may take the linear regression on the data points:
(2,2.468), (3,3.7), (5,6.1699).
The slope is the improved pseudo gcd.
Caveat: the first part of this is algorithm is numerically unstable - if you start with very dirty data, you are in trouble.
Express your measurements as multiples of the lowest one. Thus your list becomes 1.00000, 1.49919, 2.49996. The fractional parts of these values will be very close to 1/Nths, for some value of N dictated by how close your lowest value is to the fundamental frequency. I would suggest looping through increasing N until you find a sufficiently refined match. In this case, for N=1 (that is, assuming X=2.468 is your fundamental frequency) you would find a standard deviation of 0.3333 (two of the three values are .5 off of X * 1), which is unacceptably high. For N=2 (that is, assuming 2.468/2 is your fundamental frequency) you would find a standard deviation of virtually zero (all three values are within .001 of a multiple of X/2), thus 2.468/2 is your approximate GCD.
The major flaw in my plan is that it works best when the lowest measurement is the most accurate, which is likely not the case. This could be mitigated by performing the entire operation multiple times, discarding the lowest value on the list of measurements each time, then use the list of results of each pass to determine a more precise result. Another way to refine the results would be adjust the GCD to minimize the standard deviation between integer multiples of the GCD and the measured values.
This reminds me of the problem of finding good rational-number approximations of real numbers. The standard technique is a continued-fraction expansion:
def rationalizations(x):
assert 0 <= x
ix = int(x)
yield ix, 1
if x == ix: return
for numer, denom in rationalizations(1.0/(x-ix)):
yield denom + ix * numer, numer
We could apply this directly to Jonathan Leffler's and Sparr's approach:
>>> a, b, c = 2.468, 3.700, 6.1699
>>> b/a, c/a
(1.4991896272285252, 2.4999594813614263)
>>> list(itertools.islice(rationalizations(b/a), 3))
[(1, 1), (3, 2), (925, 617)]
>>> list(itertools.islice(rationalizations(c/a), 3))
[(2, 1), (5, 2), (30847, 12339)]
picking off the first good-enough approximation from each sequence. (3/2 and 5/2 here.) Or instead of directly comparing 3.0/2.0 to 1.499189..., you could notice than 925/617 uses much larger integers than 3/2, making 3/2 an excellent place to stop.
It shouldn't much matter which of the numbers you divide by. (Using a/b and c/b you get 2/3 and 5/3, for instance.) Once you have integer ratios, you could refine the implied estimate of the fundamental using shsmurfy's linear regression. Everybody wins!
I'm assuming all of your numbers are multiples of integer values. For the rest of my explanation, A will denote the "root" frequency you are trying to find and B will be an array of the numbers you have to start with.
What you are trying to do is superficially similar to linear regression. You are trying to find a linear model y=mx+b that minimizes the average distance between a linear model and a set of data. In your case, b=0, m is the root frequency, and y represents the given values. The biggest problem is that the independent variables X are not explicitly given. The only thing we know about X is that all of its members must be integers.
Your first task is trying to determine these independent variables. The best method I can think of at the moment assumes that the given frequencies have nearly consecutive indexes (x_1=x_0+n). So B_0/B_1=(x_0)/(x_0+n) given a (hopefully) small integer n. You can then take advantage of the fact that x_0 = n/(B_1-B_0), start with n=1, and keep ratcheting it up until k-rnd(k) is within a certain threshold. After you have x_0 (the initial index), you can approximate the root frequency (A = B_0/x_0). Then you can approximate the other indexes by finding x_n = rnd(B_n/A). This method is not very robust and will probably fail if the error in the data is large.
If you want a better approximation of the root frequency A, you can use linear regression to minimize the error of the linear model now that you have the corresponding dependent variables. The easiest method to do so uses least squares fitting. Wolfram's Mathworld has a in-depth mathematical treatment of the issue, but a fairly simple explanation can be found with some googling.
Interesting question...not easy.
I suppose I would look at the ratios of the sample values:
3.700 / 2.468 = 1.499...
6.1699 / 2.468 = 2.4999...
6.1699 / 3.700 = 1.6675...
And I'd then be looking for a simple ratio of integers in those results.
1.499 ~= 3/2
2.4999 ~= 5/2
1.6675 ~= 5/3
I haven't chased it through, but somewhere along the line, you decide that an error of 1:1000 or something is good enough, and you back-track to find the base approximate GCD.
The solution which I've seen and used myself is to choose some constant, say 1000, multiply all numbers by this constant, round them to integers, find the GCD of these integers using the standard algorithm and then divide the result by the said constant (1000). The larger the constant, the higher the precision.
This is a reformulaiton of shsmurfy's solution when you a priori choose 3 positive tolerances (e1,e2,e3)
The problem is then to search smallest positive integers (n1,n2,n3) and thus largest root frequency f such that:
f1 = n1*f +/- e1
f2 = n2*f +/- e2
f3 = n3*f +/- e3
We assume 0 <= f1 <= f2 <= f3
If we fix n1, then we get these relations:
f is in interval I1=[(f1-e1)/n1 , (f1+e1)/n1]
n2 is in interval I2=[n1*(f2-e2)/(f1+e1) , n1*(f2+e2)/(f1-e1)]
n3 is in interval I3=[n1*(f3-e3)/(f1+e1) , n1*(f3+e3)/(f1-e1)]
We start with n1 = 1, then increment n1 until the interval I2 and I3 contain an integer - that is floor(I2min) different from floor(I2max) same with I3
We then choose smallest integer n2 in interval I2, and smallest integer n3 in interval I3.
Assuming normal distribution of floating point errors, the most probable estimate of root frequency f is the one minimizing
J = (f1/n1 - f)^2 + (f2/n2 - f)^2 + (f3/n3 - f)^2
That is
f = (f1/n1 + f2/n2 + f3/n3)/3
If there are several integers n2,n3 in intervals I2,I3 we could also choose the pair that minimize the residue
min(J)*3/2=(f1/n1)^2+(f2/n2)^2+(f3/n3)^2-(f1/n1)*(f2/n2)-(f1/n1)*(f3/n3)-(f2/n2)*(f3/n3)
Another variant could be to continue iteration and try to minimize another criterium like min(J(n1))*n1, until f falls below a certain frequency (n1 reaches an upper limit)...
I found this question looking for answers for mine in MathStackExchange (here and here).
I've only managed (yet) to measure the appeal of a fundamental frequency given a list of harmonic frequencies (following the sound/music nomenclature), which can be useful if you have a reduced number of options and is feasible to compute the appeal of each one and then choose the best fit.
C&P from my question in MSE (there the formatting is prettier):
being v the list {v_1, v_2, ..., v_n}, ordered from lower to higher
mean_sin(v, x) = sum(sin(2*pi*v_i/x), for i in {1, ...,n})/n
mean_cos(v, x) = sum(cos(2*pi*v_i/x), for i in {1, ...,n})/n
gcd_appeal(v, x) = 1 - sqrt(mean_sin(v, x)^2 + (mean_cos(v, x) - 1)^2)/2, which yields a number in the interval [0,1].
The goal is to find the x that maximizes the appeal. Here is the (gcd_appeal) graph for your example [2.468, 3.700, 6.1699], where you find that the optimum GCD is at x = 1.2337899957639993
Edit:
You may find handy this JAVA code to calculate the (fuzzy) divisibility (aka gcd_appeal) of a divisor relative to a list of dividends; you can use it to test which of your candidates makes the best divisor. The code looks ugly because I tried to optimize it for performance.
//returns the mean divisibility of dividend/divisor as a value in the range [0 and 1]
// 0 means no divisibility at all
// 1 means full divisibility
public double divisibility(double divisor, double... dividends) {
double n = dividends.length;
double factor = 2.0 / divisor;
double sum_x = -n;
double sum_y = 0.0;
double[] coord = new double[2];
for (double v : dividends) {
coordinates(v * factor, coord);
sum_x += coord[0];
sum_y += coord[1];
}
double err = 1.0 - Math.sqrt(sum_x * sum_x + sum_y * sum_y) / (2.0 * n);
//Might happen due to approximation error
return err >= 0.0 ? err : 0.0;
}
private void coordinates(double x, double[] out) {
//Bhaskara performant approximation to
//out[0] = Math.cos(Math.PI*x);
//out[1] = Math.sin(Math.PI*x);
long cos_int_part = (long) (x + 0.5);
long sin_int_part = (long) x;
double rem = x - cos_int_part;
if (cos_int_part != sin_int_part) {
double common_s = 4.0 * rem;
double cos_rem_s = common_s * rem - 1.0;
double sin_rem_s = cos_rem_s + common_s + 1.0;
out[0] = (((cos_int_part & 1L) * 8L - 4L) * cos_rem_s) / (cos_rem_s + 5.0);
out[1] = (((sin_int_part & 1L) * 8L - 4L) * sin_rem_s) / (sin_rem_s + 5.0);
} else {
double common_s = 4.0 * rem - 4.0;
double sin_rem_s = common_s * rem;
double cos_rem_s = sin_rem_s + common_s + 3.0;
double common_2 = ((cos_int_part & 1L) * 8L - 4L);
out[0] = (common_2 * cos_rem_s) / (cos_rem_s + 5.0);
out[1] = (common_2 * sin_rem_s) / (sin_rem_s + 5.0);
}
}

Resources