How to iteratively calculate a running weighted average so that the most recent values weigh the most? - algorithm

I want to implement an iterative algorithm that calculates a weighted average. The specific weighting law does not matter, but it should be close to 1 for the newest values and close to 0 for the oldest.
The algorithm should be iterative, i.e. it should not remember all previous values. It should know only the newest value and some aggregate information about the past, like previous values of the average, sums, counts, etc.
Is it possible?
For example, the following algorithm would qualify:
void iterate(double value) {
    sum *= 0.99;
    sum += value;
    count++;
    avg = sum / count;
}
It gives an exponentially decreasing weight, which may not be what I want. Is it possible to have a step-decreasing weight or something else?
EDIT 1
The requirements for the weighting law are as follows:
1) The weight decreases into the past.
2) It has some mean or characteristic duration, so that values older than this duration matter much less than newer ones.
3) I should be able to set this duration.
EDIT 2
I need the following. Suppose v_i are the values, where v_1 is the first. Also suppose w_i are the weights, but w_0 is the weight of the LAST (most recent) value.
So, after the first value arrives I have the first average
a_1 = v_1 * w_0
After the second value v_2 arrives, I should have the average
a_2 = v_1 * w_1 + v_2 * w_0
With the next value I should have
a_3 = v_1 * w_2 + v_2 * w_1 + v_3 * w_0
Note that the weight profile moves with me as I move along the value sequence.
I.e. each value does not keep its own weight for all time. My goal is to have this weight get lower as the value recedes into the past.

First a bit of background. If we were keeping a normal average, it would go like this:
average(a) = a
average(a,b) = (average(a)+b)/2
average(a,b,c) = (average(a,b)*2 + c)/3
average(a,b,c,d) = (average(a,b,c)*3 + d)/4
As you can see here, this is an "online" algorithm and we only need to keep track of two pieces of data: 1) how many numbers are in the average, and 2) the average itself. Then we can "undivide" the average by that total, add in the new number, and divide by the new total.
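In Python, that bookkeeping might look like the tiny sketch below (my own names, added for illustration): "undivide" by the old count, add the new number, divide by the new count.
class RunningAverage:
    def __init__(self):
        self.count = 0
        self.average = 0.0

    def add(self, value):
        self.count += 1
        # multiply the average back up by the old count, add the new value,
        # then divide by the new count
        self.average = (self.average * (self.count - 1) + value) / self.count
        return self.average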
Weighted averages are a bit different. It depends on what kind of weighted average. For example if you defined:
weightedAverage(a,wa, b,wb, c,wc, ..., z,wz) = a*wa + b*wb + c*wc + ... + z*wz
or
weightedAverage(elements, weights) = elements·weights
...then you don't need to do anything besides add the new element*weight! If however you defined the weighted average akin to an expected-value from probability:
weightedAverage(elements,weights) = elements·weights / sum(weights)
...then you'd need to keep track of the total weights. Instead of undividing by the total number of elements, you undivide by the total weight, add in the new element*weight, then divide by the new total weight.
Alternatively you don't need to undivide, as demonstrated below: you can merely keep track of the temporary dot product and weight total in a closure or an object, and divide it as you yield (this can help a lot with avoiding numerical inaccuracy from compounded rounding errors).
In Python this would be:
def makeAverager():
    dotProduct = 0
    totalWeight = 0
    def averager(newValue, weight):
        nonlocal dotProduct, totalWeight
        dotProduct += newValue*weight
        totalWeight += weight
        return dotProduct/totalWeight
    return averager
Demo:
>>> averager = makeAverager()
>>> [averager(value,w) for value,w in [(100,0.2), (50,0.5), (100,0.1)]]
[100.0, 64.28571428571429, 68.75]
>>> averager(10,1.1)
34.73684210526316
>>> averager(10,1.1)
25.666666666666668
>>> averager(30,2.0)
27.4

> But my task is to have average recalculated each time new value arrives having old values reweighted. –OP
Your task is almost always impossible, even with exceptionally simple weighting schemes.
You are asking to, with O(1) memory, yield averages with a changing weighting scheme. For example, {values·weights1, (values+[newValue2])·weights2, (values+[newValue2,newValue3])·weights3, ...} as new values are being passed in, for a nearly arbitrarily changing weights sequence. This is impossible because the aggregation is not injective: once you merge the numbers together, you lose a massive amount of information. For example, even if you had the weight vector, you could not recover the original value vector, or vice versa. There are only two cases I can think of where you could get away with this:
Constant weights such as [2,2,2,...2]: this is equivalent to an on-line averaging algorithm, which you don't want because the old values are not being "reweighted".
The relative weights of the previous values do not change. For example you could use weights of [8,4,2,1], and add in a new element with an arbitrary weight like ...+[1], but you must increase all the previous weights by the same multiplicative factor, like [16,8,4,2]+[1]. Thus at each step, you are adding a new arbitrary weight, and a new arbitrary rescaling of the past, so you have 2 degrees of freedom (only 1 if you need to keep your dot-product normalized). The weight-vectors you'd get would look like:
[w0]
[w0*(s1), w1]
[w0*(s1*s2), w1*(s2), w2]
[w0*(s1*s2*s3), w1*(s2*s3), w2*(s3), w3]
...
Thus any weighting scheme you can make look like that will work (unless you need to keep the thing normalized by the sum of weights, in which case you must then divide the new average by the new sum, which you can calculate by keeping only O(1) memory). Merely multiply the previous average by the new s (which will implicitly distribute over the dot-product into the weights), and tack on the new +w*newValue.
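As an illustration, here is a small Python sketch of that scheme (my own wording, not the answer's): each step rescales the running dot product by s, adds w*newValue, and keeps the weight sum in parallel so the result can stay normalized.
class RescalingAverager:
    def __init__(self):
        self.dot_product = 0.0
        self.weight_sum = 0.0

    def add(self, new_value, w=1.0, s=0.9):
        # s rescales every past weight by the same factor; w is the new value's weight
        self.dot_product = self.dot_product * s + w * new_value
        self.weight_sum = self.weight_sum * s + w
        return self.dot_product / self.weight_sum   # normalized weighted average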

I think you are looking for something like this:
void iterate(double value) {
    count++;
    weight = max(0.0, 1.0 - (count / 1000.0));  // floating-point division, so the weight decays gradually
    // standard weighted-average update: avg * total_weight is the old weighted numerator
    avg = (avg * total_weight + weight * value) / (total_weight + weight);
    total_weight += weight;
}

Here I'm assuming you want the weights to sum to 1. As long as you can generate a relative weight without it changing in the future, you can end up with a solution which mimics this behavior.
That is, suppose you defined your weights as a sequence {s_0, s_1, s_2, ..., s_n, ...} and defined the input as sequence {i_0, i_1, i_2, ..., i_n}.
Consider the form: sum(s_0*i_0 + s_1*i_1 + s_2*i_2 + ... + s_n*i_n) / sum(s_0 + s_1 + s_2 + ... + s_n). Note that it is trivially possible to compute this incrementally with a couple of aggregation counters:
int counter = 0;
double numerator = 0;
double denominator = 0;

void addValue(double val)
{
    double weight = calculateWeightFromCounter(counter);  // user-supplied weight law
    counter++;
    numerator += weight * val;
    denominator += weight;
}

double getAverage()
{
    if (denominator == 0.0) return 0.0;
    return numerator / denominator;
}
Of course, calculateWeightFromCounter() in this case doesn't need to generate weights that sum to one -- the trick here is that we average by dividing by the sum of the weights, so in the end the weights effectively sum to one.
The real trick is how you do calculateWeightFromCounter(). You could simply return the counter itself, for example, however note that the last weighted number would not be near the sum of the counters necessarily, so you may not end up with the exact properties you want. (It's hard to say since, as mentioned, you've left a fairly open problem.)
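A Python version of the same scheme follows, with a made-up linearly growing weight law standing in for calculateWeightFromCounter() (the weight function is only an example; substitute your own).
def linear_weight(counter):
    return counter + 1          # the newest value gets the largest weight so far

class WeightedAverager:
    def __init__(self, weight_fn=linear_weight):
        self.counter = 0
        self.numerator = 0.0
        self.denominator = 0.0
        self.weight_fn = weight_fn

    def add_value(self, val):
        w = self.weight_fn(self.counter)
        self.counter += 1
        self.numerator += w * val
        self.denominator += w

    def get_average(self):
        return self.numerator / self.denominator if self.denominator else 0.0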

This is too long to post in a comment, but it may be useful to know.
Suppose you have:
w_0*v_n + ... + w_n*v_0 (we'll call this w[0..n]*v[n..0] for short)
Then the next step is:
w_0*v_(n+1) + ... + w_(n+1)*v_0 (and this is w[0..n+1]*v[n+1..0] for short)
This means we need a way to calculate w[1..n+1]*v[n..0] from w[0..n]*v[n..0].
It's certainly possible that v[n..0] is 0, ..., 0, z, 0, ..., 0 where z is at some location x.
If we don't have any 'extra' storage, then f(z*w(x))=z*w(x + 1) where w(x) is the weight for location x.
Rearranging the equation, w(x + 1) = f(z*w(x))/z. Well, w(x + 1) better be constant for a constant x, so f(z*w(x))/z better be constant. Hence, f must let z propagate -- that is, f(z*w(x)) = z*f(w(x)).
But here again we have an issue. Note that if z (which could be any number) can propagate through f, then w(x) certainly can. So f(z*w(x)) = w(x)*f(z); combining this with the previous equation gives z*f(w(x)) = w(x)*f(z), i.e. f(w(x)) = w(x)*f(z)/z.
But for a constant x, w(x) is constant, and thus f(w(x)) had better be constant, too. Since w(x) is constant, f(z)/z had better be constant so that w(x)*f(z)/z is constant. Call that constant c, so f(w(x)) = c*w(x).
So, f(x) = c*x, where c is a constant, whenever x is a weight value.
So w(x+1) = c*w(x).
That is, each weight is a multiple of the previous. Thus, the weights take the form w(x)=m*b^x.
Note that this assumes the only information f has is the last aggregated value. Note that at some point you will be reduced to this case unless you're willing to store a non-constant amount of data representing your input. You cannot represent an infinite length vector of real numbers with a real number, but you can approximate them somehow in a constant, finite amount of storage. But this would merely be an approximation.
Although I haven't rigorously proven it, it is my conclusion that what you want is impossible to do with a high degree of precision, but you may be able to use log(n) space (which may as well be O(1) for many practical applications) to generate a quality approximation. You may be able to use even less.

I tried to code something practical (in Java). As has been said, your exact goal is not achievable: you can only compute the average from some number of last remembered values. If you don't need to be exact, you can approximate the older values. I tried to do it by remembering the last 5 values exactly, and older values only summed in groups of 5, remembering the last 5 such sums. The memory is then O(2n) for approximately covering the last n + n*n values. This is a very rough approximation.
You can modify the "lastValues" and "lastAggregatedSums" array sizes as you want. See this ASCII-art picture trying to display a graph of the last values, showing that the first columns (older data) are remembered as an aggregated value (not individually), and only the latest 5 values are remembered individually.
values:
#####
##### ##### #
##### ##### ##### # #
##### ##### ##### ##### ## ##
##### ##### ##### ##### ##### #####
time: --->
Challenge 1: My example doesn't use weights, but I think it shouldn't be a problem for you to add weights for the "lastAggregatedSums" appropriately - the only problem is that if you want lower weights for older values, it is harder, because the array is rotating, so it is not straightforward to know which weight belongs to which array member. Maybe you can modify the algorithm to always "shift" values in the array instead of rotating? Then adding weights shouldn't be a problem.
Challenge 2: The arrays are initialized with 0 values, and those values count toward the average from the beginning, even before we have received enough values. If you are running the algorithm for a long time, you probably don't mind that it is warming up for some time at the beginning. If you do, you can post a modification ;-)
public class AverageCounter {

    private float[] lastValues = new float[5];
    private float[] lastAggregatedSums = new float[5];

    private int valIdx = 0;
    private int aggValIdx = 0;

    private float avg;

    public void add(float value) {
        lastValues[valIdx++] = value;
        if (valIdx == lastValues.length) {
            // count average of last values and save into the aggregated array.
            float sum = 0;
            for (float v : lastValues) { sum += v; }
            lastAggregatedSums[aggValIdx++] = sum;
            if (aggValIdx >= lastAggregatedSums.length) {
                // rotate aggregated values index
                aggValIdx = 0;
            }
            valIdx = 0;
        }
        float sum = 0;
        for (float v : lastValues) { sum += v; }
        for (float v : lastAggregatedSums) { sum += v; }
        avg = sum / (lastValues.length + lastAggregatedSums.length * lastValues.length);
    }

    public float getAvg() {
        return avg;
    }
}

You can combine (as a weighted sum) exponential means with different effective window sizes (N) in order to get the desired weights.
Use more exponential means to define your weight profile in more detail.
(More exponential means also means storing and calculating more values, so there is a trade-off.) A rough sketch follows.
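The Python sketch below is my own interpretation of this idea: it mixes two exponential means with different effective window sizes n1 and n2. The 2/(N+1) mapping from window size to smoothing factor and the mixing weight are assumptions of mine, not part of the answer.
class CombinedEMA:
    def __init__(self, n1=10, n2=100, mix=0.5):
        self.a1 = 2.0 / (n1 + 1)     # common "effective window size" convention
        self.a2 = 2.0 / (n2 + 1)
        self.mix = mix               # how much of the short-window mean to use
        self.ema1 = None
        self.ema2 = None

    def add(self, value):
        # update both exponential means, then return their weighted sum
        self.ema1 = value if self.ema1 is None else (1 - self.a1) * self.ema1 + self.a1 * value
        self.ema2 = value if self.ema2 is None else (1 - self.a2) * self.ema2 + self.a2 * value
        return self.mix * self.ema1 + (1 - self.mix) * self.ema2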

A memoryless solution is to calculate the new average from a weighted combination of the previous average and the new value:
average = (1 - P) * average + P * value
where P is an empirical constant, 0 <= P <= 1
expanding gives:
average = sum i (weight[i] * value[i])
where value[0] is the newest value, and
weight[i] = P * (1 - P) ^ i
When P is low, historical values are given higher weighting.
The closer P gets to 1, the more quickly it converges to newer values.
When P = 1, it's a regular assignment and ignores previous values.
If you want to maximise the contribution of value[N], maximize
weight[N] = P * (1 - P) ^ N
where 0 <= P <= 1
I discovered weight[N] is maximized when
P = 1 / (N + 1)
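A minimal Python sketch of this memoryless update, using the P = 1/(N + 1) rule above to set a characteristic duration N (seeding the average with the first value is my own choice):
class ExponentialAverage:
    def __init__(self, n):
        self.p = 1.0 / (n + 1)       # weight of the value n steps back is maximized
        self.average = None

    def add(self, value):
        if self.average is None:
            self.average = value     # seed with the first observation
        else:
            self.average = (1 - self.p) * self.average + self.p * value
        return self.average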

Related

Calculating large exponential shares / probabilities

Let there be an event space ES.
Let there be some sets of objects OS[].
The probabilities of selecting any object are mutually disjoint.
Now, assume that the size of each set is based on a number X[i] assigned to it.
The size of each set rises exponentially with that number.
The base (B) used for exponentiation could be Euler's number (e), due to its nice properties, but let's assume it might not be.
Now, we are after calculating the probability of selecting, at random, any member of a selected set, keeping in mind that the cardinality of each set might be very large.
After the sequence of probabilities is known, it is used to compute P[i]*C.
I wonder whether this could be optimized/approximated for very large exponents, i.e. computed with low memory consumption, i.e. actually implemented.
A related question I found is here, but it seems to tackle only the opposite probabilities.
// Numerical example:
// A,C - constants, natural numbers
//exponents
X[1] = 3432342332;
X[2] = 55438849;
X[3] = 34533;
//probabilities
P1 = A^X[1]/(A^X[1]+A^X[2]+A^X[3]);
P2 = A^X[2]/(A^X[1]+A^X[2]+A^X[3]);
P3 = A^X[3]/(A^X[1]+A^X[2]+A^X[3]);
//Results
R1 = P1 *C;
R2 = P2 *C;
R3 = P3 *C;
Excel would fail when the exponents are larger than a few hundred.
So you have a number a>1, an integer array B of n elements, and for each i, you are to calculate a^B[i] / (a^B[1] + a^B[2] + ... + a^B[n]) .
Let C[i] = B[i] - max(B[1], ..., B[n]). Then you calculate
a^C[i] / (a^C[1] + a^C[2] + ... + a^C[n]). Since all elements of C are now non-positive, you don't care about overflow.
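A short Python sketch of this shift-by-the-maximum trick (the function name is mine): subtracting max(B) before exponentiating keeps every exponent non-positive, so each term lies in (0, 1] and cannot overflow; terms that underflow just become 0.
def exponential_shares(a, B):
    m = max(B)
    C = [b - m for b in B]              # all non-positive
    terms = [a ** c for c in C]         # equivalently math.exp(c * math.log(a))
    total = sum(terms)                  # >= 1, since the largest term is a**0 == 1
    return [t / total for t in terms]

# Example with exponents like the question's:
# exponential_shares(1.0001, [3432342332, 55438849, 34533])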

Conditional sampling of binary vectors (?)

I'm trying to find a name for my problem, so I don't have to re-invent the wheel when coding an algorithm which solves it...
I have, say, 2,000 binary (row) vectors and I need to pick 500 of them. In the picked sample I take column sums, and I want my sample to be as close as possible to a pre-defined distribution of the column sums. I'll be working with 20 to 60 columns.
A tiny example:
Out of the vectors:
110
010
011
110
100
I need to pick 2 to get column sums 2, 1, 0. The solution (exact in this case) would be
110
100
My ideas so far
one could maybe call this a binary multidimensional knapsack, but I did not find any algorithms for that
Linear Programming could help, but I'd need some step-by-step explanation as I have no experience with it
as an exact solution is not always feasible, something like simulated annealing or brute-force search could work well
a hacky way using constraint solvers comes to mind - first set the constraints tight and gradually loosen them until some solution is found - given that a CSP solver should be much faster than ILP...?
My concrete, practical suggestion (if the approximation guarantee works out for you) would be to apply the maximum entropy method (Chapter 7 of Boyd and Vandenberghe's book Convex Optimization; you can probably find several implementations with your favorite search engine) to find the maximum entropy probability distribution on row indexes such that (1) no row index is more likely than 1/500, and (2) the expected value of the row vector chosen is 1/500th of the predefined distribution. Given this distribution, choose each row independently with probability 500 times its likelihood under the distribution, which will give you 500 rows on average. If you need exactly 500, repeat until you get exactly 500 (this shouldn't take too many tries, due to concentration bounds).
First I will make some assumptions regarding this problem:
Regardless of whether the column sum of the selected solution is over or under the target, it weighs the same.
The sums of the first, second, and third columns are equally weighted in the solution (i.e. if there is a solution where the first column sum is off by 1, and another where the third column sum is off by 1, the solutions are equally good).
The closest problem I can think of is the subset sum problem, which itself can be thought of as a special case of the knapsack problem.
However, both of these problems are NP-complete. This means there is no known polynomial-time algorithm that can solve them, even though it is easy to verify a solution.
If I were you, the two arguably most practical approaches to this problem would be linear programming and machine learning.
Depending on how many columns you are optimising over, with linear programming you can control how finely tuned you want the solution to be, in exchange for time. You should read up on this, because it is fairly simple and efficient.
With machine learning, you need a lot of data sets (the set of vectors and the set of solutions). You don't even need to specify what you want; a lot of machine learning algorithms can generally deduce what you want them to optimise based on your data set.
Both approaches have pros and cons; you should decide which one to use yourself based on the circumstances and the problem set.
This definitely can be modeled as an (integer!) linear program (many problems can). Once you have the model, you can use a program such as lpsolve to solve it.
We model "vector i is selected" as a binary variable x_i, which can be 0 or 1.
Then for each column c, we have a constraint:
sum of all (x_i * value of i in column c) = target for column c
Taking your example, in lp_solve this could look like:
min: ;
+x1 +x4 +x5 >= 2;
+x1 +x4 +x5 <= 2;
+x1 +x2 +x3 +x4 <= 1;
+x1 +x2 +x3 +x4 >= 1;
+x3 <= 0;
+x3 >= 0;
bin x1, x2, x3, x4, x5;
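If you would rather build the same kind of model from Python, here is a sketch using the PuLP modelling library (my choice of library, not the answer's). The data is the question's tiny example, and the "pick exactly 2" constraint is an addition of mine that mirrors the sample-size requirement; it is redundant here but matters in the general case.
import pulp

vectors = [(1, 1, 0), (0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 0, 0)]
target = (2, 1, 0)

prob = pulp.LpProblem("pick_vectors", pulp.LpMinimize)   # feasibility problem, no real objective
x = [pulp.LpVariable("x{}".format(i), cat="Binary") for i in range(len(vectors))]
for c in range(len(target)):
    # one equality constraint per column sum
    prob += pulp.lpSum(x[i] * vectors[i][c] for i in range(len(vectors))) == target[c]
prob += pulp.lpSum(x) == 2                               # pick exactly 2 vectors

prob.solve()
print([i for i in range(len(vectors)) if x[i].value() > 0.5])   # indices of selected vectors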
If you are fine with a heuristic based search approach, here is one.
Go over the list and find the minimum squared sum of the digit-wise difference between each bit string and the goal. For example, if we are looking for 2, 1, 0, and we are scoring the string 0, 1, 1, we would do it in the following way:
Take the digit wise difference:
2, 0, 1
Square the digit wise difference:
4, 0, 1
Sum:
5
As a side note, squaring the difference when scoring is a common method in heuristic search. In your case, it makes sense because bit strings that have a 1 as the first digit are a lot more interesting to us. In your case this simple algorithm would pick first 110, then 100, which is the best solution.
In any case, there are some optimizations that could be made to this; I will post them here if this kind of approach is what you are looking for, but this is the core of the algorithm.
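In Python, the scoring-and-picking loop could be sketched like this (subtracting each picked vector from the remaining goal before the next pick is my reading of the re-scoring step, not something the answer states explicitly):
def score(vector, goal):
    # squared digit-wise difference between a candidate vector and the remaining goal
    return sum((g - v) ** 2 for v, g in zip(vector, goal))

def greedy_pick(vectors, goal, m):
    goal = list(goal)
    pool = list(vectors)
    picked = []
    for _ in range(m):
        best = min(pool, key=lambda v: score(v, goal))
        pool.remove(best)
        picked.append(best)
        goal = [g - v for g, v in zip(goal, best)]   # update the remaining goal
    return picked

print(greedy_pick([(1, 1, 0), (0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 0, 0)], (2, 1, 0), 2))
# picks 110 first, then 100, matching the example above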
You have a given target binary vector. You want to select M vectors out of N whose sum is closest to the target. Let's say you use the Euclidean distance to measure whether one selection is better than another.
If you want an exact sum, have a look at the k-sum problem, which is a generalization of the 3SUM problem. The problem is harder than the subset sum problem, because you want an exact number of elements to add up to a target value. There is a solution in O(N^(M/2) * lg N), but that means more than 2000^250 * 7.6 > 10^826 operations in your case (in the favorable case where vector operations have a cost of 1).
First conclusion: do not try to get an exact result unless your vectors have some characteristics that may reduce the complexity.
Here's a hill climbing approach:
sort the vectors by number of 1's: 111... first, 000... last;
use the polynomial time approximate algorithm for the subset sum;
you have an approximate solution with K elements. Because of the order of elements (the big ones come first), K should be as small as possible:
if K >= M, you take the first M vectors of the solution and that's probably near the best you can do.
if K < M, you can remove the first vector and try to replace it with 2 or more vectors from the rest of the N vectors, using the same technique, until you have M vectors. To summarize: split the big vectors into smaller ones until you reach the correct number of vectors.
Here's a proof of concept with numbers, in Python:
import random
def distance(x, y):
return abs(x-y)
def show(ls):
if len(ls) < 10:
return str(ls)
else:
return ", ".join(map(str, ls[:5]+("...",)+ls[-5:]))
def find(is_xs, target):
# see https://en.wikipedia.org/wiki/Subset_sum_problem#Pseudo-polynomial_time_dynamic_programming_solution
S = [(0, ())] # we store indices along with values to get the path
for i, x in is_xs:
T = [(x + t, js + (i,)) for t, js in S]
U = sorted(S + T)
y, ks = U[0]
S = [(y, ks)]
for z, ls in U:
if z == target: # use the euclidean distance here if you want an approximation
return ls
if z != y and z < target:
y, ks = z, ls
S.append((z, ls))
ls = S[-1][1] # take the closest element to target
return ls
N = 2000
M = 500
target = 1000
xs = [random.randint(0, 10) for _ in range(N)]
print ("Take {} numbers out of {} to make a sum of {}", M, xs, target)
xs = sorted(xs, reverse = True)
is_xs = list(enumerate(xs))
print ("Sorted numbers: {}".format(show(tuple(is_xs))))
ls = find(is_xs, target)
print("FIRST TRY: {} elements ({}) -> {}".format(len(ls), show(ls), sum(x for i, x in is_xs if i in ls)))
splits = 0
while len(ls) < M:
first_x = xs[ls[0]]
js_ys = [(i, x) for i, x in is_xs if i not in ls and x != first_x]
replace = find(js_ys, first_x)
splits += 1
if len(replace) < 2 or len(replace) + len(ls) - 1 > M or sum(xs[i] for i in replace) != first_x:
print("Give up: can't replace {}.\nAdd the lowest elements.")
ls += tuple([i for i, x in is_xs if i not in ls][len(ls)-M:])
break
print ("Replace {} (={}) by {} (={})".format(ls[:1], first_x, replace, sum(xs[i] for i in replace)))
ls = tuple(sorted(ls[1:] + replace)) # use a heap?
print("{} elements ({}) -> {}".format(len(ls), show(ls), sum(x for i, x in is_xs if i in ls)))
print("AFTER {} splits, {} -> {}".format(splits, ls, sum(x for i, x in is_xs if i in ls)))
The result is obviously not guaranteed to be optimal.
Remarks:
Complexity: find has a pseudo-polynomial time complexity (see the Wikipedia page) and is called at most M^2 times, hence the overall complexity remains pseudo-polynomial. In practice, the process is reasonably fast (split calls have a small target).
Vectors: to ensure that you reach the target with the minimum number of elements, you can improve the ordering of the elements. Your target is (t_1, ..., t_c): if you sort the t_j from max to min, you get the most important columns first. You can then sort the vectors: by number of 1s, and then by the presence of a 1 in the most important columns. E.g. target = 4 8 6 => 1 1 1 > 0 1 1 > 1 1 0 > 1 0 1 > 0 1 0 > 0 0 1 > 1 0 0 > 0 0 0.
find (vectors): if the current sum exceeds the target in all the columns, then you cannot get closer to the target (any vector you add to the current sum will take you farther from it): don't add that sum to S (the z >= target case for numbers).
I propose a simple ad hoc algorithm which, broadly speaking, is a kind of gradient descent. It seems to work relatively well for input vectors which have a distribution of 1s "similar" to the target sum vector, and probably also for all "nice" input vectors, as defined in a comment of yours. The solution is not exact, but the approximation seems good.
The distance between the sum vector of the output vectors and the target vector is taken to be Euclidean. Minimizing it means minimizing the sum of the squared differences of the sum vector and the target vector (the square root is not needed because it is monotonic). The algorithm does not guarantee to yield the sample that minimizes the distance from the target, but it makes a serious attempt at doing so, by always moving in some locally optimal direction.
The algorithm can be split into 3 parts.
First of all the first M candidate output vectors out of the N input vectors (e.g., N=2000, M=500) are put in a list, and the remaining vectors are put in another.
Then "approximately optimal" swaps between vectors in the two lists are done, until either the distance would not decrease any more, or a predefined maximum number of iterations is reached. An approximately optimal swap is one where removing the first vector from the list of output vectors causes a maximal decrease or minimal increase of the distance, and then, after the removal of the first vector, adding the second vector to the same list causes a maximal decrease of the distance. The whole swap is avoided if the net result is not a decrease of the distance.
Then, as a last phase, "optimal" swaps are done, again stopping on no decrease in distance or maximum number of iterations reached. Optimal swaps cause a maximal decrease of the distance, without requiring the removal of the first vector to be optimal in itself. To find an optimal swap all vector pairs have to be checked. This phase is much more expensive, being O(M(N-M)), while the previous "approximate" phase is O(M+(N-M))=O(N). Luckily, when entering this phase, most of the work has already been done by the previous phase.
from typing import List, Tuple


def get_sample(vects: List[Tuple[int]], target: Tuple[int], n_out: int,
               max_approx_swaps: int = None, max_optimal_swaps: int = None,
               verbose: bool = False) -> List[Tuple[int]]:
    """
    Get a sample of the input vectors having a sum close to the target vector.
    Closeness is measured in Euclidean metrics. The output is not guaranteed to be
    optimal (minimum square distance from target), but a serious attempt is made.
    The max_* parameters can be used to avoid too long execution times,
    tune them to your needs by setting verbose to True, or leave them None (∞).
    :param vects: the list of vectors (tuples) with the same number of "columns"
    :param target: the target vector, with the same number of "columns"
    :param n_out: the requested sample size
    :param max_approx_swaps: the max number of approximately optimal vector swaps,
        None means unlimited (default: None)
    :param max_optimal_swaps: the max number of optimal vector swaps,
        None means unlimited (default: None)
    :param verbose: print some info if True (default: False)
    :return: the sample of n_out vectors having a sum close to the target vector
    """
    def square_distance(v1, v2):
        return sum((e1 - e2) ** 2 for e1, e2 in zip(v1, v2))

    n_vec = len(vects)
    assert n_vec > 0
    assert n_out > 0
    n_rem = n_vec - n_out
    assert n_rem > 0
    output = vects[:n_out]
    remain = vects[n_out:]
    n_col = len(vects[0])
    assert n_col == len(target) > 0
    sumvect = (0,) * n_col
    for outvect in output:
        sumvect = tuple(map(int.__add__, sumvect, outvect))
    sqdist = square_distance(sumvect, target)
    if verbose:
        print(f"sqdist = {sqdist:4} after"
              f" picking the first {n_out} vectors out of {n_vec}")
    if max_approx_swaps is None:
        max_approx_swaps = sqdist
    n_approx_swaps = 0
    while sqdist and n_approx_swaps < max_approx_swaps:
        # find the best vect to subtract (the square distance MAY increase)
        sqdist_0 = None
        index_0 = None
        sumvect_0 = None
        for index in range(n_out):
            tmp_sumvect = tuple(map(int.__sub__, sumvect, output[index]))
            tmp_sqdist = square_distance(tmp_sumvect, target)
            if sqdist_0 is None or sqdist_0 > tmp_sqdist:
                sqdist_0 = tmp_sqdist
                index_0 = index
                sumvect_0 = tmp_sumvect
        # find the best vect to add,
        # but only if there is a net decrease of the square distance
        sqdist_1 = sqdist
        index_1 = None
        sumvect_1 = None
        for index in range(n_rem):
            tmp_sumvect = tuple(map(int.__add__, sumvect_0, remain[index]))
            tmp_sqdist = square_distance(tmp_sumvect, target)
            if sqdist_1 > tmp_sqdist:
                sqdist_1 = tmp_sqdist
                index_1 = index
                sumvect_1 = tmp_sumvect
        if sumvect_1:
            tmp = output[index_0]
            output[index_0] = remain[index_1]
            remain[index_1] = tmp
            sqdist = sqdist_1
            sumvect = sumvect_1
            n_approx_swaps += 1
        else:
            break
    if verbose:
        print(f"sqdist = {sqdist:4} after {n_approx_swaps}"
              f" approximately optimal swap{'s'[n_approx_swaps == 1:]}")
    diffvect = tuple(map(int.__sub__, sumvect, target))
    if max_optimal_swaps is None:
        max_optimal_swaps = sqdist
    n_optimal_swaps = 0
    while sqdist and n_optimal_swaps < max_optimal_swaps:
        # find the best pair to swap,
        # but only if the square distance decreases
        best_sqdist = sqdist
        best_diffvect = diffvect
        best_pair = None
        for i0 in range(n_out):
            tmp_diffvect = tuple(map(int.__sub__, diffvect, output[i0]))
            for i1 in range(n_rem):
                new_diffvect = tuple(map(int.__add__, tmp_diffvect, remain[i1]))
                new_sqdist = sum(d * d for d in new_diffvect)
                if best_sqdist > new_sqdist:
                    best_sqdist = new_sqdist
                    best_diffvect = new_diffvect
                    best_pair = (i0, i1)
        if best_pair:
            tmp = output[best_pair[0]]
            output[best_pair[0]] = remain[best_pair[1]]
            remain[best_pair[1]] = tmp
            sqdist = best_sqdist
            diffvect = best_diffvect
            n_optimal_swaps += 1
        else:
            break
    if verbose:
        print(f"sqdist = {sqdist:4} after {n_optimal_swaps}"
              f" optimal swap{'s'[n_optimal_swaps == 1:]}")
    return output
from random import randrange
C = 30 # number of columns
N = 2000 # total number of vectors
M = 500 # number of output vectors
F = 0.9 # fill factor of the target sum vector
T = int(M * F) # maximum value + 1 that can appear in the target sum vector
A = 10000 # maximum number of approximately optimal swaps, may be None (∞)
B = 10 # maximum number of optimal swaps, may be None (unlimited)
target = tuple(randrange(T) for _ in range(C))
vects = [tuple(int(randrange(M) < t) for t in target) for _ in range(N)]
sample = get_sample(vects, target, M, A, B, True)
Typical output:
sqdist = 2639 after picking the first 500 vectors out of 2000
sqdist = 9 after 27 approximately optimal swaps
sqdist = 1 after 4 optimal swaps
P.S.: As it stands, this algorithm is not limited to binary input vectors; integer vectors would work too. Intuitively, though, I suspect that the quality of the optimization could suffer, and that this algorithm is more appropriate for binary vectors.
P.P.S.: Execution times with your kind of data are probably acceptable with standard CPython, and get better (down to a couple of seconds, almost a factor of 10) with PyPy. To handle bigger sets of data, the algorithm would have to be translated to C or some other language, which should not be difficult at all.

Generating Random Numbers for RPG games

I'm wondering if there is an algorithm to generate random numbers that will most likely be low in a range from min to max. For instance, if you generate a random number between 1 and 100 it should most of the time be below 30 if you call the function with f(min: 1, max: 100, avg: 30), but if you call it with f(min: 1, max: 200, avg: 10) the average should be 10. A lot of games do this, but I simply can't find a way to do it with a formula. Most of the examples I have seen use a "drop table" or something like that.
I have come up with a fairly simple way to weight the outcome of a roll, but it is not very efficient and you don't have a lot of control over it
var pseudoRand = function(min, max, n) {
    if (n > 0) {
        return pseudoRand(min, Math.random() * (max - min) + min, n - 1)
    }
    return max;
}

rands = []
for (var i = 0; i < 20000; i++) {
    rands.push(pseudoRand(0, 100, 1))
}
avg = rands.reduce(function(x, y) { return x + y } ) / rands.length
console.log(avg); // ~50
The function simply picks a random number between min and max N times, where for every iteration it updates max with the last roll. So if you call it with N = 2 and max = 100, then it must roll 100 two times in a row in order to return 100.
I have looked at some distributions on wikipedia, but I don't quite understand them enough to know how I can control the min and max outputs etc.
Any help is very much welcomed
A simple way to generate a random number with a given distribution is to pick a random number from a list where the numbers that should occur more often are repeated according with the desired distribution.
For example, if you create a list [1,1,1,2,2,2,3,3,3,4] and pick a random index from 0 to 9 to select an element from that list, you will get a number <4 with 90% probability.
Alternatively, using the distribution from the example above, generate an array [2,5,8,9] and pick a random integer from 0 to 9; if it's ≤2 (this will occur with 30% probability) then return 1, if it's >2 and ≤5 (this will also occur with 30% probability) return 2, etc.
Explained here: https://softwareengineering.stackexchange.com/a/150618
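A small Python sketch of the second (threshold-array) variant above; the bisect lookup and function names are my own choices.
import bisect
import random

def sample(thresholds, outcomes):
    # thresholds [2, 5, 8, 9] mean: values 0..2 -> outcomes[0], 3..5 -> outcomes[1],
    # 6..8 -> outcomes[2], 9 -> outcomes[3]
    r = random.randint(0, thresholds[-1])
    return outcomes[bisect.bisect_left(thresholds, r)]

print(sample([2, 5, 8, 9], [1, 2, 3, 4]))   # returns 1 with 30% probability, etc.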
A probability distribution function is just a function that, when you put in a value X, will return the probability of getting that value X. A cumulative distribution function is the probability of getting a number less than or equal to X. A CDF is the integral of a PDF. A CDF is almost always a one-to-one function, so it almost always has an inverse.
To generate a PDF, plot the value on the x-axis and the probability on the y-axis. The sum (discrete) or integral (continuous) of all the probabilities should add up to 1. Find some function that models that equation correctly. To do this, you may have to look up some PDFs.
Basic Algorithm
https://en.wikipedia.org/wiki/Inverse_transform_sampling
This algorithm is based off of Inverse Transform Sampling. The idea behind ITS is that you are randomly picking a value on the y-axis of the CDF and finding the x-value it corresponds to. This makes sense because the more likely a value is to be randomly selected, the more "space" it will take up on the y-axis of the CDF.
Come up with some probability distribution formula. For instance, if you want it so that as the numbers get higher the odds of them being chosen increases, you could use something like f(x)=x or f(x)=x^2. If you want something that bulges in the middle, you could use the Gaussian Distribution or 1/(1+x^2). If you want a bounded formula, you can use the Beta Distribution or the Kumaraswamy Distribution.
Integrate the PDF to get the Cumulative Distribution Function.
Find the inverse of the CDF.
Generate a random number and plug it into the inverse of the CDF.
Multiply that result by (max-min) and then add min
Round the result to the nearest integer.
Steps 1 to 3 are things you have to hard-code into the game. The only way around that, for any PDF, is to solve for the shape parameters that correspond to its mean while holding to the constraints on what you want the shape parameters to be. If you want to use the Kumaraswamy Distribution, you will set it so that the shape parameters a and b are always greater than one.
I would suggest using the Kumaraswamy Distribution because it is bounded and it has a very nice closed form and closed form inverse. It only has two parameters, a and b, and it is extremely flexible, as it can model many different scenarios, including polynomial behavior, bell curve behavior, and a basin-like behavior that has a peak at both edges. Also, modeling isn't too hard with this function. The higher the shape parameter b is, the more tilted it will be to the left, and the higher the shape parameter a is, the more tilted it will be to the right. If a and b are both less than one, the distribution will look like a trough or basin. If a or b is equal to one, the distribution will be a polynomial that does not change concavity from 0 to 1. If both a and b equal one, the distribution is a straight line. If a and b are greater than one, than the function will look like a bell curve. The best thing you can do to learn this is to actually graph these functions or just run the Inverse Transform Sampling algorithm.
https://en.wikipedia.org/wiki/Kumaraswamy_distribution
For instance, if I want to have a probability distribution shaped like this with a=2 and b=5 going from 0 to 100:
https://www.wolframalpha.com/input/?i=2*5*x%5E(2-1)*(1-x%5E2)%5E(5-1)+from+x%3D0+to+x%3D1
Its CDF would be:
CDF(x)=1-(1-x^2)^5
Its inverse would be:
CDF^-1(x)=(1-(1-x)^(1/5))^(1/2)
The General Inverse of the Kumaraswamy Distribution is:
CDF^-1(x)=(1-(1-x)^(1/b))^(1/a)
I would then generate a number from 0 to 1, put it into the CDF^-1(x), and multiply the result by 100.
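For instance, a minimal Python sketch of that recipe, using the general inverse CDF given above (the defaults a=2, b=5 are just the example's shape parameters):
import random

def kumaraswamy_rand(lo, hi, a=2.0, b=5.0):
    u = random.random()
    x = (1 - (1 - u) ** (1.0 / b)) ** (1.0 / a)   # inverse CDF, value in [0, 1]
    return round(lo + x * (hi - lo))              # scale to [lo, hi] and round

print(kumaraswamy_rand(0, 100))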
Pros
Very accurate
Continuous, not discrete
Uses one formula and very little space
Gives you a lot of control over exactly how the randomness is spread out
Many of these formulas have CDFs with inverses of some sort
There are ways to bound the functions on both ends. For instance, the Kumaraswamy Distribution is bounded from 0 to 1, so you just input a float between zero and one, then multiply the result by (max-min) and add min. The Beta Distribution is bounded differently based on what values you pass into it. For something like PDF(x)=x, the CDF(x)=(x^2)/2, so you can generate a random value from CDF(0) to CDF(max-min).
Cons
You need to come up with the exact distributions and their shapes you plan on using
Every single general formula you plan on using needs to be hard coded into the game. In other words, you can program the general Kumaraswamy Distribution into the game and have a function that generates random numbers based on the distribution and its parameters, a and b, but not a function that generates a distribution for you based on the average. If you wanted to use Distribution x, you would have to find out what values of a and b best fit the data you want to see and hard code those values into the game.
I would use a simple mathematical function for that. From what you describe, you need a progression like y = x^2. At the average input (the average of rand is at x = 0.5, since rand gives you a number from 0 to 1) you would get 0.25. If you want a lower average number, you can use a higher exponent like y = x^3, which would result in y = 0.125 at x = 0.5.
Example:
http://www.meta-calculator.com/online/?panel-102-graph&data-bounds-xMin=-2&data-bounds-xMax=2&data-bounds-yMin=-2&data-bounds-yMax=2&data-equations-0=%22y%3Dx%5E2%22&data-rand=undefined&data-hideGrid=false
PS: I adjusted the function to calculate the needed exponent to get the average result.
Code example:
function expRand (min, max, exponent) {
    return Math.round( Math.pow( Math.random(), exponent) * (max - min) + min);
}

function averageRand (min, max, average) {
    var exponent = Math.log(((average - min) / (max - min))) / Math.log(0.5);
    return expRand(min, max, exponent);
}

alert(averageRand(1, 100, 10));
alert(averageRand(1, 100, 10));
You may combine two random processes. For example:
first rand R1 = f(min: 1, max: 20, avg: 10);
second rand R2 = f(min:1, max : 10, avg : 1);
and then multiply R1*R2 to get a result in [1, 200] with an average around 10 (the average will be shifted a bit).
Another option is to find the inverse of the random function you want to use. This option has to be initialized when your program starts but doesn't need to be recomputed. The math used here can be found in a lot of Math libraries. I will explain point by point by taking the example of an unknown random function where only four points are known:
First, fit the four point curve with a polynomial function of order 3 or higher.
You should then have a parametrized function of the type a*x + b*x^2 + c*x^3 + d.
Find the indefinite integral of the function (the integral has the form (a/2)x^2 + (b/3)x^3 + (c/4)x^4 + d*x, which we will call quarticEq).
Compute the integral of the polynomial from your min to your max.
Take a uniform random number between 0 and 1, then multiply it by the value of the integral computed in the previous step (we name the result "R").
Now solve the equation R = quarticEq for x.
Hopefully the last part is well known, and you should be able to find a library that can do this computation (see the wiki). If the inverse of the integrated function does not have a closed-form solution (as for a general polynomial of degree five or higher), you can use a root-finding method such as Newton's method.
This kind of computation may be used to create any kind of random distribution.
Edit:
You may find the Inverse Transform Sampling described above on Wikipedia, and I found this implementation (I haven't tried it).
You can keep a running average of what the function has returned so far and, based on that, in a while loop draw the next random number until it keeps the running average on target; then adjust the running average and return the number.
Using a drop table permits a very fast roll, which matters in a real-time game. In fact it is only one random generation of a number from a range, then, according to a table of probabilities (e.g. a Gaussian distribution over that range), an if statement with multiple branches. Something like this:
import random

num = random.randint(1, 100)
if num < 10:
    # case 1
    ...
elif num < 20:
    # case 2
    ...
# ... and so on for the remaining ranges
It is not very clean, but when you have a finite number of choices it can be very fast.
There are lots of ways to do so, all of which basically boil down to generating from a right-skewed (a.k.a. positive-skewed) distribution. You didn't make it clear whether you want integer or floating point outcomes, but there are both discrete and continuous distributions that fit the bill.
One of the simplest choices would be a discrete or continuous right-triangular distribution, but while that will give you the tapering off you desire for larger values, it won't give you independent control of the mean.
Another choice would be a truncated exponential (for continuous) or geometric (for discrete) distribution. You'd need to truncate because the raw exponential or geometric distribution has a range from zero to infinity, so you'd have to lop off the upper tail. That would in turn require you to do some calculus to find a rate λ which yields the desired mean after truncation.
A third choice would be to use a mixture of distributions, for instance choose a number uniformly in a lower range with some probability p, and in an upper range with probability (1-p). The overall mean is then p times the mean of the lower range + (1-p) times the mean of the upper range, and you can dial in the desired overall mean by adjusting the ranges and the value of p. This approach will also work if you use non-uniform distribution choices for the sub-ranges. It all boils down to how much work you're willing to put into deriving the appropriate parameter choices.
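A sketch of that mixture approach in Python (the particular choice of p below is mine; it solves the mean equation for two uniform sub-ranges so the overall mean comes out at avg):
import random

def skewed_rand(lo, hi, avg):
    head, tail = avg - lo, hi - avg
    p = tail / (head + tail)          # makes p*(lo+avg)/2 + (1-p)*(avg+hi)/2 == avg
    if random.random() < p:
        return random.uniform(lo, avg)    # lower range
    return random.uniform(avg, hi)        # upper range

# Example: values in [1, 100] with mean ≈ 30
# sum(skewed_rand(1, 100, 30) for _ in range(100000)) / 100000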
One method would not be the most precise, but it could be considered "good enough" depending on your needs.
The algorithm is to pick a number between a min and a sliding max. There is a guaranteed max g_max and a potential max p_max. Your true max slides depending on the result of another random call. This gives you the skewed distribution you are looking for. Below is the solution in Python.
import random

def get_roll(min, g_max, p_max):
    max = g_max + (random.random() * (p_max - g_max))
    return random.randint(min, int(max))

get_roll(1, 10, 20)
Below is a histogram of the function run 100,000 times with (1, 10, 20).
private int roll(int minRoll, int avgRoll, int maxRoll) {
    // Generating random number #1
    int firstRoll = ThreadLocalRandom.current().nextInt(minRoll, maxRoll + 1);
    // Iterating 3 times will result in the roll being relatively close to
    // the average roll.
    if (firstRoll > avgRoll) {
        // If the first roll is higher than the (set) average roll:
        for (int i = 0; i < 3; i++) {
            int verificationRoll = ThreadLocalRandom.current().nextInt(minRoll, maxRoll + 1);
            if (firstRoll > verificationRoll && verificationRoll >= avgRoll) {
                // If the following condition is met:
                // The iteration-roll is closer to 30 than the first roll
                firstRoll = verificationRoll;
            }
        }
    } else if (firstRoll < avgRoll) {
        // If the first roll is lower than the (set) average roll:
        for (int i = 0; i < 3; i++) {
            int verificationRoll = ThreadLocalRandom.current().nextInt(minRoll, maxRoll + 1);
            if (firstRoll < verificationRoll && verificationRoll <= avgRoll) {
                // If the following condition is met:
                // The iteration-roll is closer to 30 than the first roll
                firstRoll = verificationRoll;
            }
        }
    }
    return firstRoll;
}
Explanation:
roll
check if the roll is above, below or exactly 30
if above, reroll 3 times & set the roll to the new roll if it is lower but >= 30
if below, reroll 3 times & set the roll to the new roll if it is higher but <= 30
if exactly 30, don't set the roll anew
return the roll
Pros:
simple
effective
performs well
Cons:
You'll naturally have more results in the range of 30-40 than in the range of 20-30, simply due to the 30-70 relation.
Testing:
You can test this by using the following method in conjunction with the roll() method. The data is saved in a HashMap (mapping each number to its number of occurrences).
public void rollTheD100() {
    int maxNr = 100;
    int minNr = 1;
    int avgNr = 30;
    Map<Integer, Integer> numberOccurenceMap = new HashMap<>();
    // "Initialization" of the map (please don't hit me for calling it initialization)
    for (int i = 1; i <= 100; i++) {
        numberOccurenceMap.put(i, 0);
    }
    // Rolling (100k times)
    for (int i = 0; i < 100000; i++) {
        int dummy = roll(minNr, avgNr, maxNr);
        numberOccurenceMap.put(dummy, numberOccurenceMap.get(dummy) + 1);
    }
    int numberPack = 0;
    for (int i = 1; i <= 100; i++) {
        numberPack = numberPack + numberOccurenceMap.get(i);
        if (i % 10 == 0) {
            System.out.println("<" + i + ": " + numberPack);
            numberPack = 0;
        }
    }
}
The results (100,000 rolls):
These were as expected. Note that you can always fine-tune the results simply by modifying the iteration count in the roll() method (the closer to 30 the average should be, the more iterations should be included; note that this could hurt performance to a certain degree). Also note that 30 was (as expected) the number with the highest number of occurrences, by far.
<10: 4994
<20: 9425
<30: 18184
<40: 29640
<50: 18283
<60: 10426
<70: 5396
<80: 2532
<90: 897
<100: 223
Try this: generate a random number for the range of numbers below the average, and generate a second random number for the range of numbers above the average.
Then randomly select one of those; each range will be selected 50% of the time.
var pseudoRand = function(min, max, avg) {
    var upperRand = Math.floor(Math.random() * (max - avg) + avg);
    var lowerRand = Math.floor(Math.random() * (avg - min) + min);
    if (Math.random() < 0.5)
        return lowerRand;
    else
        return upperRand;
}
Having seen many good explanations and some good ideas, I still think this could help you:
You can take any distribution function f around 0 and map your interval of interest onto your desired interval [1,100]: f -> f'.
Then feed the C++ discrete_distribution with the results of f'.
I've got an example with the normal distribution below, but I can't get my result into this function :-S
#include <iostream>
#include <random>
#include <chrono>
#include <cmath>
#include <string>
using namespace std;

double p1(double x, double mean, double sigma); // p(x|x_avg,sigma)
double p2(int x, int x_min, int x_max, double x_avg, double z_min, double z_max); // transform ("stretch") it to the interval
int plot_ps(int x_avg, int x_min, int x_max, double sigma);

int main()
{
    int x_min = 1;
    int x_max = 20;
    int x_avg = 6;
    double sigma = 5;
    /*
    int p[]={2,1,3,1,2,5,1,1,1,1};
    default_random_engine generator (chrono::system_clock::now().time_since_epoch().count());
    discrete_distribution<int> distribution {p*};
    for (int i=0; i< 10; i++)
        cout << i << "\t" << distribution(generator) << endl;
    */
    plot_ps(x_avg, x_min, x_max, sigma);
    return 0; //*/
}

// Normal distribution function
double p1(double x, double mean, double sigma)
{
    return 1/(sigma*sqrt(2*M_PI))
        * exp(-(x-mean)*(x-mean) / (2*sigma*sigma));
}

// Transforms intervals to your wishes ;)
// z_min and z_max are the desired values f'(x_min) and f'(x_max)
double p2(int x, int x_min, int x_max, double x_avg, double z_min, double z_max)
{
    double y;
    double sigma = 1.0;
    double y_min = -sigma*sqrt(-2*log(z_min));
    double y_max = sigma*sqrt(-2*log(z_max));
    if(x < x_avg)
        y = -(x-x_avg)/(x_avg-x_min)*y_min;
    else
        y = -(x-x_avg)/(x_avg-x_max)*y_max;
    return p1(y, 0.0, sigma);
}

// plots both distribution functions
int plot_ps(int x_avg, int x_min, int x_max, double sigma)
{
    double z = (1.0+x_max-x_min);

    // plot p1
    for (int i=1; i<=20; i++)
    {
        cout << i << "\t" <<
            string(int(p1(i, x_avg, sigma)*(sigma*sqrt(2*M_PI)*20.0)+0.5), '*')
            << endl;
    }
    cout << endl;

    // plot p2
    for (int i=1; i<=20; i++)
    {
        cout << i << "\t" <<
            string(int(p2(i, x_min, x_max, x_avg, 1.0/z, 1.0/z)*(20.0*sqrt(2*M_PI))+0.5), '*')
            << endl;
    }
    return 0;
}
With the following result if I let them plot:
1 ************
2 ***************
3 *****************
4 ******************
5 ********************
6 ********************
7 ********************
8 ******************
9 *****************
10 ***************
11 ************
12 **********
13 ********
14 ******
15 ****
16 ***
17 **
18 *
19 *
20
1 *
2 ***
3 *******
4 ************
5 ******************
6 ********************
7 ********************
8 *******************
9 *****************
10 ****************
11 **************
12 ************
13 *********
14 ********
15 ******
16 ****
17 ***
18 **
19 **
20 *
So - if you could feed this result into the discrete_distribution<int> distribution {}, you would have everything you want...
Well, from what I can see of your problem, I would want the solution to meet these criteria:
a) Belong to a single distribution: if we need to "roll" (call Math.random) more than once per function call and then aggregate or discard some results, it stops being truly distributed according to the given function.
b) Not be computationally intensive: some of the solutions use integrals (gamma distribution, Gaussian distribution), and those are computationally intensive. In your description, you mention that you want to be able to "calculate it with a formula", which fits this description (basically, you want an O(1) function).
c) Be relatively "well distributed", e.g. not have peaks and valleys, but instead have most results cluster around the mean, have nice predictable slopes downwards towards the ends, and yet keep the probability of the min and the max nonzero.
d) Not require storing a large array in memory, as with drop tables.
I think this function meets the requirements:
var pseudoRand = function(min, max, avg )
{
    var randomFraction = Math.random();
    var head = (avg - min);
    var tail = (max - avg);
    var skewdness = tail / (head + tail);
    if (randomFraction < skewdness)
        return min + (randomFraction / skewdness) * head;
    else
        return avg + (1 - randomFraction) / (1 - skewdness) * tail;
}
This will return floats, but you can easily turn them into ints by calling
Math.round(pseudoRand(...))
It returned the correct average in all of my tests, and it is also nicely distributed towards the ends. Hope this helps. Good luck.

Dynamic algorithm to find maximum sum of products of "accessible" numbers in an array

I have been asked to give a dynamic algorithm that would take a sequence of an even amount of numbers (both positive and negative) and do the following:
Each "turn" two numbers are chosen to be multiplied together. The algorithm can only access either end of the sequence. However, if the first number chosen is the leftmost number, the second number can be either the rightmost number, or the new leftmost number (since the old leftmost number has already been "removed/chosen") and vice-versa. The objective of the program is to find the maximum total sum of the products of the two numbers chosen each round.
Example:
Sequence: { 10, 4, 20, -5, 0, 7 }
Optimal result: 7*10 + 0*-5 + 4*20 = 150
My Progress:
I have been trying to find a dynamic approach without much luck. I've been able to deduce that the program is essentially only allowed to multiply the end numbers by the numbers "adjacent" to them each time, and that the objective is to multiply the smallest possible numbers together each time (resulting in either a double-negative multiplication - a positive number - or the least-small product attainable), and to keep applying this rule right to the finish. Conversely, this rule would also apply in the opposite direction - multiply the largest possible numbers together each time. Maybe the best way is to apply both methods at once? I'm not sure; as I mentioned, I haven't had much luck implementing an algorithm for this problem.
Let's look at both a recursive and a bottom-up tabulated approach. First the recursive:
{10, 4,20,-5, 0, 7}
First call:
f(0,5) = max(f(0,3)+0*7, f(2,5)+10*4, f(1,4)+10*7)
Let's follow one thread:
f(1,4) = max(f(1,2)+(-5)*0, f(3,4)+4*20, f(2,3)+4*0)
f(1,2), f(3,4), and f(2,3) are "base cases" and have a direct solution. The function can now save these in a table indexed by i,j, to be accessed later by other threads of the recursion. For example, f(2,5) = max(f(2,3)+0*7... also needs the value for f(2,3) and can avoid creating another function call if the value is already in the table. As the recursive function calls are returned, the function can save the next values in the table for f(1,4), f(2,5), and f(0,3). Since the array in this example is short, the reduction in function calls is not that significant, but for longer arrays, the number of overlapping function calls (to the same i,j) can be much larger, which is why memoization can prove more efficient.
The tabulated approach is what I tried to unfold in my other answer. Here, instead of a recursion, we rely on (in this case) a similar mathematical formulation to calculate the next values in the table, relying on other values in the table that have already been calculated. The stars under the array are meant to illustrate the order by which we calculate the values (using two nested for loops). You can see that the values needed to calculate the formula for each (i,j)-sized subset are either a base case or exist earlier in the loop order; these are: a subset extended two elements to the left, a subset extended two elements to the right, and a subset extended one element to each side.
You're probably looking for a dynamic programming algorithm. Let A be the array of numbers, the recurrence for this problem will be
f(start,stop) = max( // last two numbers multiplied + the rest of sequence,
// first two numbers multiplied + the rest of sequence,
// first number*last number + rest of sequence )
f(start,stop) is then the optimal result for the subsequence of the array between start and stop. You should compute f(start,stop) for all valid values using dynamic programming or memoization.
Hint: The first part // last two numbers multiplied + the rest of sequence looks like:
f(start,stop-2) + A[stop-1]*A[stop-2]
Let i and j represent the first and last indexes of the array, A, after the previous turn. Clearly, they must represent some even-sized contiguous subset of A. Then a general case for dp[i][j] ought to be max(left, right, both) where left = A[i-2]*A[i-1] + dp[i-2][j], right = A[j+1]*A[j+2] + dp[i][j+2], and both = A[i-1]*A[j+1] + dp[i-1][j+1]; and the solution is max(A[i]*A[i+1] + dp[i][i+1]) for all i except the last.
Fortunately, we can compute the dp in a decreasing order, such that the needed values, always representing larger surrounding subsets, are already calculated (stars represent the computed subset):
{10, 4,20,-5, 0, 7}
* * * * * *
* * * *
* *
* * * * (70)
* *
* * * *
* *
* * left = (80 + 70)
* *
Below is the code snippet of recursive approach.
public class TestClass {

    public static void main(String[] args) {
        int[] arr = {10, 4, 20, -5, 0, 7};
        System.out.println(findMaximumSum(arr, 0, arr.length - 1));
    }

    private static int findMaximumSum(int[] arr, int start, int end) {
        if (end - start == 1)
            return arr[start] * arr[end];
        return findMaximum(
            findMaximumSum(arr, start + 2, end) + (arr[start] * arr[start + 1]),
            findMaximumSum(arr, start + 1, end - 1) + (arr[start] * arr[end]),
            findMaximumSum(arr, start, end - 2) + (arr[end] * arr[end - 1])
        );
    }

    private static int findMaximum(int x, int y, int z) {
        return Math.max(Math.max(x, y), z);
    }
}
The result is 10*4 + 20*7 + -5*0 = 180
and similarly for the input {3,9,7,1,8,2} the answer is 3*2 + 9*8 + 7*1 = 85
Let's turn this into a sweet dynamic programming formula.
We define the subproblem as follows:
We would like to maximize the total sum, while picking either the first two, the first and last, or the last two values of the subarray i, j.
Then the recurrence equation looks like this:
OPT(i,j) =
    if j < i:
        0
    else:
        max (
            v[i] * v[i+1] + OPT(i + 2, j),
            v[i] * v[j]   + OPT(i + 1, j - 1),
            v[j] * v[j-1] + OPT(i, j - 2)
        )
The topological order: the subproblems get strictly smaller, since i increases and/or j decreases at every step.
The base case is reached when fewer than two elements remain (j < i), where 0 is returned; because the input has an even number of elements, i never equals j.
And to come back to the original problem, calling OPT(0, n-1) returns the maximum sum.
The time complexity is O(n^2): dynamic programming lets us cache every subproblem, there are O(n^2) pairs (i, j), and each one is computed in O(1) time from cached values.
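A compact memoized Python sketch of this recurrence (function names are mine; the base case j < i returns 0, as above):
from functools import lru_cache

def max_product_sum(v):
    @lru_cache(maxsize=None)
    def opt(i, j):
        if i > j:                      # fewer than two elements left
            return 0
        return max(v[i] * v[i + 1] + opt(i + 2, j),
                   v[i] * v[j]     + opt(i + 1, j - 1),
                   v[j] * v[j - 1] + opt(i, j - 2))
    return opt(0, len(v) - 1)

print(max_product_sum((10, 4, 20, -5, 0, 7)))   # 180
print(max_product_sum((3, 9, 7, 1, 8, 2)))      # 85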

"Approximate" greatest common divisor

Suppose you have a list of floating point numbers that are approximately multiples of a common quantity, for example
2.468, 3.700, 6.1699
which are approximately all multiples of 1.234. How would you characterize this "approximate gcd", and how would you proceed to compute or estimate it?
Strictly related to my answer to this question.
You can run Euclid's gcd algorithm with anything smaller than 0.01 (or a small number of your choice) being a pseudo zero. With your numbers:
3.700 = 1 * 2.468 + 1.232,
2.468 = 2 * 1.232 + 0.004.
So the pseudo gcd of the first two numbers is 1.232. Now you take the gcd of this with your last number:
6.1699 = 5 * 1.232 + 0.0099.
So 1.232 is the pseudo gcd, and the multiples are 2, 3, 5. To improve this result, you may take the linear regression on the data points:
(2,2.468), (3,3.7), (5,6.1699).
The slope is the improved pseudo gcd.
Caveat: the first part of this algorithm is numerically unstable - if you start with very dirty data, you are in trouble.
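A minimal Python sketch of that pseudo-zero Euclid idea (the tolerance value and the function names are my own, not part of the original answer):

from functools import reduce

def approx_gcd2(a, b, tol=0.01):
    # ordinary Euclid, but treat any remainder below the tolerance as zero
    while b > tol:
        a, b = b, a % b
    return a

def approx_gcd(values, tol=0.01):
    return reduce(lambda a, b: approx_gcd2(a, b, tol), values)

print(approx_gcd([2.468, 3.700, 6.1699]))  # ~1.232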
Express your measurements as multiples of the lowest one. Thus your list becomes 1.00000, 1.49919, 2.49996. The fractional parts of these values will be very close to 1/Nths, for some value of N dictated by how close your lowest value is to the fundamental frequency. I would suggest looping through increasing N until you find a sufficiently refined match. In this case, for N=1 (that is, assuming X=2.468 is your fundamental frequency) you would find an average deviation of 0.3333 (two of the three values are .5 off of X * 1), which is unacceptably high. For N=2 (that is, assuming 2.468/2 is your fundamental frequency) you would find a deviation of virtually zero (all three values are within .001 of a multiple of X/2), thus 2.468/2 is your approximate GCD.
The major flaw in my plan is that it works best when the lowest measurement is the most accurate, which is likely not the case. This could be mitigated by performing the entire operation multiple times, discarding the lowest value on the list of measurements each time, then using the list of results of each pass to determine a more precise result. Another way to refine the results would be to adjust the GCD to minimize the standard deviation between integer multiples of the GCD and the measured values.
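A rough Python sketch of that loop (the helper names, the tolerance, and the cap on N are my own assumptions):

def deviation(values, candidate):
    # largest distance of any value from an integer multiple of the candidate,
    # measured in units of the candidate
    return max(abs(v / candidate - round(v / candidate)) for v in values)

def approx_gcd_by_subdivision(values, max_n=10, tol=0.01):
    x = min(values)
    for n in range(1, max_n + 1):          # try X, X/2, X/3, ...
        candidate = x / n
        if deviation(values, candidate) < tol:
            return candidate
    return None

print(approx_gcd_by_subdivision([2.468, 3.700, 6.1699]))  # 1.234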
This reminds me of the problem of finding good rational-number approximations of real numbers. The standard technique is a continued-fraction expansion:
def rationalizations(x):
    assert 0 <= x
    ix = int(x)
    yield ix, 1
    if x == ix: return
    for numer, denom in rationalizations(1.0/(x-ix)):
        yield denom + ix * numer, numer
We could apply this directly to Jonathan Leffler's and Sparr's approach:
>>> import itertools
>>> a, b, c = 2.468, 3.700, 6.1699
>>> b/a, c/a
(1.4991896272285252, 2.4999594813614263)
>>> list(itertools.islice(rationalizations(b/a), 3))
[(1, 1), (3, 2), (925, 617)]
>>> list(itertools.islice(rationalizations(c/a), 3))
[(2, 1), (5, 2), (30847, 12339)]
picking off the first good-enough approximation from each sequence. (3/2 and 5/2 here.) Or instead of directly comparing 3.0/2.0 to 1.499189..., you could notice that 925/617 uses much larger integers than 3/2, making 3/2 an excellent place to stop.
It shouldn't much matter which of the numbers you divide by. (Using a/b and c/b you get 2/3 and 5/3, for instance.) Once you have integer ratios, you could refine the implied estimate of the fundamental using shsmurfy's linear regression. Everybody wins!
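For instance, once the integer ratios give you the multiples 2, 3, 5, a least-squares line through the origin (my own sketch of that refinement step) yields the refined estimate of the fundamental:

multiples = [2, 3, 5]
values = [2.468, 3.700, 6.1699]
# slope of the best-fit line y = m*x through the origin: m = sum(x*y) / sum(x*x)
m = sum(n * v for n, v in zip(multiples, values)) / sum(n * n for n in multiples)
print(m)  # ~1.2338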
I'm assuming all of your numbers are multiples of integer values. For the rest of my explanation, A will denote the "root" frequency you are trying to find and B will be an array of the numbers you have to start with.
What you are trying to do is superficially similar to linear regression. You are trying to find a linear model y=mx+b that minimizes the average distance between a linear model and a set of data. In your case, b=0, m is the root frequency, and y represents the given values. The biggest problem is that the independent variables X are not explicitly given. The only thing we know about X is that all of its members must be integers.
Your first task is trying to determine these independent variables. The best method I can think of at the moment assumes that the given frequencies have nearly consecutive indexes (x_1 = x_0 + n). So B_0/B_1 = x_0/(x_0 + n) for a (hopefully) small integer n. You can then take advantage of the fact that x_0 = n*B_0/(B_1 - B_0), start with n=1, and keep ratcheting it up until the computed x_0 is within a certain threshold of an integer. After you have x_0 (the initial index), you can approximate the root frequency (A = B_0/x_0). Then you can approximate the other indexes by finding x_n = rnd(B_n/A). This method is not very robust and will probably fail if the error in the data is large.
If you want a better approximation of the root frequency A, you can use linear regression to minimize the error of the linear model now that you have the corresponding independent variables. The easiest method to do so uses least squares fitting. Wolfram's MathWorld has an in-depth mathematical treatment of the issue, but a fairly simple explanation can be found with some googling.
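A rough sketch of that index-finding step (the tolerance and the cap on n are assumptions of mine); the refined A can then be obtained with a least-squares fit as in the earlier snippet:

def estimate_multiples(B, max_n=5, tol=0.05):
    B = sorted(B)
    for n in range(1, max_n + 1):                 # assume x_1 = x_0 + n
        x0 = n * B[0] / (B[1] - B[0])
        if abs(x0 - round(x0)) < tol:             # x_0 should be (nearly) an integer
            A = B[0] / round(x0)                  # first estimate of the root frequency
            return [round(b / A) for b in B]      # indexes for every measurement
    return None

print(estimate_multiples([2.468, 3.700, 6.1699]))  # [2, 3, 5]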
Interesting question...not easy.
I suppose I would look at the ratios of the sample values:
3.700 / 2.468 = 1.499...
6.1699 / 2.468 = 2.4999...
6.1699 / 3.700 = 1.6675...
And I'd then be looking for a simple ratio of integers in those results.
1.499 ~= 3/2
2.4999 ~= 5/2
1.6675 ~= 5/3
I haven't chased it through, but somewhere along the line, you decide that an error of 1:1000 or something is good enough, and you back-track to find the base approximate GCD.
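One possible way to hunt for those simple integer ratios (my own sketch, using Python's Fraction.limit_denominator to cap how complicated a ratio is accepted):

from fractions import Fraction
from itertools import combinations

values = [2.468, 3.700, 6.1699]
for a, b in combinations(values, 2):
    ratio = Fraction(b / a).limit_denominator(10)   # simplest ratio with denominator <= 10
    print(f"{b} / {a} ~= {ratio}")
# prints 3/2, 5/2 and 5/3 for the three pairs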
The solution which I've seen and used myself is to choose some constant, say 1000, multiply all numbers by this constant, round them to integers, find the GCD of these integers using the standard algorithm and then divide the result by the said constant (1000). The larger the constant, the higher the precision.
This is a reformulation of shsmurfy's solution when you a priori choose 3 positive tolerances (e1, e2, e3).
The problem is then to search smallest positive integers (n1,n2,n3) and thus largest root frequency f such that:
f1 = n1*f +/- e1
f2 = n2*f +/- e2
f3 = n3*f +/- e3
We assume 0 <= f1 <= f2 <= f3
If we fix n1, then we get these relations:
f is in interval I1=[(f1-e1)/n1 , (f1+e1)/n1]
n2 is in interval I2=[n1*(f2-e2)/(f1+e1) , n1*(f2+e2)/(f1-e1)]
n3 is in interval I3=[n1*(f3-e3)/(f1+e1) , n1*(f3+e3)/(f1-e1)]
We start with n1 = 1, then increment n1 until both intervals I2 and I3 contain an integer - that is, until floor(I2min) differs from floor(I2max), and likewise for I3.
We then choose the smallest integer n2 in interval I2 and the smallest integer n3 in interval I3.
Assuming normal distribution of floating point errors, the most probable estimate of root frequency f is the one minimizing
J = (f1/n1 - f)^2 + (f2/n2 - f)^2 + (f3/n3 - f)^2
That is
f = (f1/n1 + f2/n2 + f3/n3)/3
If there are several integers n2, n3 in intervals I2, I3, we could also choose the pair that minimizes the residue
min(J)*3/2=(f1/n1)^2+(f2/n2)^2+(f3/n3)^2-(f1/n1)*(f2/n2)-(f1/n1)*(f3/n3)-(f2/n2)*(f3/n3)
Another variant could be to continue the iteration and try to minimize another criterion like min(J(n1))*n1, until f falls below a certain frequency (n1 reaches an upper limit)...
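A compact sketch of that search (the tolerances and the cap on n1 in the example call are assumptions of mine):

import math

def approx_gcd_tolerances(f1, f2, f3, e1, e2, e3, max_n1=100):
    for n1 in range(1, max_n1 + 1):
        # intervals I2 for n2 and I3 for n3, derived from the interval I1 for f
        I2 = (n1 * (f2 - e2) / (f1 + e1), n1 * (f2 + e2) / (f1 - e1))
        I3 = (n1 * (f3 - e3) / (f1 + e1), n1 * (f3 + e3) / (f1 - e1))
        n2, n3 = math.ceil(I2[0]), math.ceil(I3[0])     # smallest candidate integers
        if n2 <= I2[1] and n3 <= I3[1]:                 # both intervals contain an integer
            return (f1 / n1 + f2 / n2 + f3 / n3) / 3    # the f minimizing J
    return None

print(approx_gcd_tolerances(2.468, 3.700, 6.1699, 0.01, 0.01, 0.01))  # ~1.2338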
I found this question while looking for answers to mine in MathStackExchange (here and here).
I've only managed (yet) to measure how appealing a candidate fundamental frequency is given a list of harmonic frequencies (following the sound/music nomenclature), which can be useful if you have a small number of candidates and it is feasible to compute the appeal of each one and then choose the best fit.
C&P from my question in MSE (there the formatting is prettier):
With v being the list {v_1, v_2, ..., v_n}, ordered from lowest to highest:
mean_sin(v, x) = sum(sin(2*pi*v_i/x), for i in {1, ...,n})/n
mean_cos(v, x) = sum(cos(2*pi*v_i/x), for i in {1, ...,n})/n
gcd_appeal(v, x) = 1 - sqrt(mean_sin(v, x)^2 + (mean_cos(v, x) - 1)^2)/2, which yields a number in the interval [0,1].
The goal is to find the x that maximizes the appeal. Plotting gcd_appeal for your example [2.468, 3.700, 6.1699], you find that the optimum GCD is at x = 1.2337899957639993.
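A direct Python transcription of those definitions, with a simple grid search over x (the search range and step are my own choices) to locate the maximum:

import math

def gcd_appeal(v, x):
    n = len(v)
    s = sum(math.sin(2 * math.pi * vi / x) for vi in v) / n   # mean_sin(v, x)
    c = sum(math.cos(2 * math.pi * vi / x) for vi in v) / n   # mean_cos(v, x)
    return 1 - math.sqrt(s * s + (c - 1) ** 2) / 2

v = [2.468, 3.700, 6.1699]
best = max((i / 10000 for i in range(10000, 30001)), key=lambda x: gcd_appeal(v, x))
print(best)  # ~1.2338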
Edit:
You may find this Java code handy for calculating the (fuzzy) divisibility (aka gcd_appeal) of a divisor relative to a list of dividends; you can use it to test which of your candidates makes the best divisor. The code looks ugly because I tried to optimize it for performance.
//returns the mean divisibility of dividend/divisor as a value in the range [0, 1]
// 0 means no divisibility at all
// 1 means full divisibility
public double divisibility(double divisor, double... dividends) {
    double n = dividends.length;
    double factor = 2.0 / divisor;
    double sum_x = -n;
    double sum_y = 0.0;
    double[] coord = new double[2];
    for (double v : dividends) {
        coordinates(v * factor, coord);
        sum_x += coord[0];
        sum_y += coord[1];
    }
    double err = 1.0 - Math.sqrt(sum_x * sum_x + sum_y * sum_y) / (2.0 * n);
    //Might happen due to approximation error
    return err >= 0.0 ? err : 0.0;
}

private void coordinates(double x, double[] out) {
    //Bhaskara performant approximation to
    //out[0] = Math.cos(Math.PI*x);
    //out[1] = Math.sin(Math.PI*x);
    long cos_int_part = (long) (x + 0.5);
    long sin_int_part = (long) x;
    double rem = x - cos_int_part;
    if (cos_int_part != sin_int_part) {
        double common_s = 4.0 * rem;
        double cos_rem_s = common_s * rem - 1.0;
        double sin_rem_s = cos_rem_s + common_s + 1.0;
        out[0] = (((cos_int_part & 1L) * 8L - 4L) * cos_rem_s) / (cos_rem_s + 5.0);
        out[1] = (((sin_int_part & 1L) * 8L - 4L) * sin_rem_s) / (sin_rem_s + 5.0);
    } else {
        double common_s = 4.0 * rem - 4.0;
        double sin_rem_s = common_s * rem;
        double cos_rem_s = sin_rem_s + common_s + 3.0;
        double common_2 = ((cos_int_part & 1L) * 8L - 4L);
        out[0] = (common_2 * cos_rem_s) / (cos_rem_s + 5.0);
        out[1] = (common_2 * sin_rem_s) / (sin_rem_s + 5.0);
    }
}

Resources