Assume I have these two arrays:
float arr[] = {40.4357,40.6135,40.2477,40.2864,39.3449,39.8901,40.103,39.9959,39.7863,39.9102,39.2652,39.2688,39.5147,38.2246,38.5376,38.4512,38.9951,39.0999,39.3057,38.53,38.2761,38.1722,37.8816,37.6521,37.8306,38.0853,37.9644,38.0626,38.0567,38.3518,38.4044,38.3553,38.4978,38.3768,38.2058,38.3175,38.3123,38.262,38.0093,38.3685,38.0111,38.4539,38.8122,39.1413,38.9409,39.2043,39.3538,39.4123,39.3628,39.2825,39.1898,39.0431,39.0634,38.5993,38.252,37.3793,36.6334,36.4009,35.2822,34.4262,34.2119,34.1552,34.3325,33.9626,33.2661,32.3819,35.1959,36.7602,37.9039,37.8103,37.5832,37.9718,38.3111,38.9323,38.6763,39.1163,38.8469,39.805,40.2627,40.3689,40.4064,40.0558,40.815,41.0234,41.0128,41.0296,41.0927,40.7046,40.6775,40.2711,40.1283,39.7518,40.0145,40.0394,39.8461,39.6317,39.5548,39.1996,38.9861,38.8507,38.8603,38.483,38.4711,38.4214,38.4286,38.5766,38.7532,38.7905,38.6029,38.4635,38.1403,36.6844,36.616,36.4053,34.7934,34.0226,33.0505,33.4978,34.6106,35.284,35.7535,35.3541,35.5481,35.4086,35.7096,36.0526,36.1222,35.9408,36.1007,36.7952,36.99,37.1024,37.0993,37.3144,36.6951,37.1213,38.0026,38.1266,39.2538,38.8963,39.0158,38.6235,38.7908,38.6041,38.4489,38.3207,37.7398,38.5304,38.925,38.7249,38.9221,39.1704,39.5113,40.0613,39.3602,39.8689,39.973,40.0524,40.0025,40.7584,40.9714,40.9106,40.9685,40.6554,39.7314,39.0044,38.7183,38.5163,38.6101,38.2004,38.7606,38.7532,37.8903,37.8403,38.5368,39.0462,38.8279,39.0748,39.2907,38.5447,38.423,38.5624,38.476,38.5784,39.0905,39.379,39.4739,39.5774,40.7036,40.3044,39.6162,39.9967,40.0562,39.3426,38.666,38.7561,39.2823,38.8548,37.6214,37.8188,38.1086,38.3619,38.5472,38.1357,38.1422,37.95,37.1837,37.4636,36.8852,37.1617,37.5051,37.7724,38.0879,37.7197,38.0422,37.8551,38.5688,38.8388};
float pattern[] = {38.6434,38.1409,37.3391,37.5457,37.7487,37.7499,37.6121,37.4789,37.5821,37.6541,38.0365,37.7907,37.9932,37.9945,37.7032,37.3556,37.6359,37.5412,37.5296,37.8829,38.3797,38.4452,39.0929,39.1233,39.3014,39.0317,38.903,38.8221,39.045,38.6944,39.0699,39.0978,38.9877,38.8123,38.7491,38.5888,38.7875,38.2086,37.7484,37.3961,36.8663,36.2607,35.8838,35.3297,35.5574,35.7239};
Ives uploaded this example graph:
As you can see in the graph pattern almost fits in the array at index 17
Whats the best and fastest way to find this index? And is there a way to have a confidence for there match fuse the values are not equal as you can see?
If the starting index is your only degree of freedom, you can just try each index and calculate the sum of squared errors for each of the data points. In Python this could look like this:
data = [40.4357,40.6135,40.2477,...]
pattern = [38.6434,38.1409,37.3391,37.5457,37.7487,...]
best_ind, best_err = 0, 1e9999
for i in range(len(data) - len(pattern)):
subdata = data[i : i + len(pattern)]
err = sum((d-p)**2 for (d, p) in zip(subdata, pattern))
if err < best_err:
best_ind, best_err = i, err
Result:
>>> print best_ind, best_err
17 21.27929269
The straightforward algorithm is to chose a measure of convergence ( how you describe similarity, this might be average of errors, or their squared values or any other function suitable for your purposes) and apply steps
Let i = 0 be an integer index, M be a container of size = length(data) - length(pattern) + 1 to store the measurings
If i < size then shift your pattern by i, otherwise go to step 5
Calculate the measure of similarity and store into M
i = i + 1, go to 2 and repeat
Chose the index of minimum value in M
It's a one-liner in Python, using the fact that tuples are sorted lexicographically:
In [1]:
import numpy as np
arr = np.array( [ 40.4357,40.6135,40.2477,40.2864,39.3449,39.8901,40.103,39.9959,39.7863,39.9102,39.2652,39.2688,39.5147,38.2246,38.5376,38.4512,38.9951,39.0999,39.3057,38.53,38.2761,38.1722,37.8816,37.6521,37.8306,38.0853,37.9644,38.0626,38.0567,38.3518,38.4044,38.3553,38.4978,38.3768,38.2058,38.3175,38.3123,38.262,38.0093,38.3685,38.0111,38.4539,38.8122,39.1413,38.9409,39.2043,39.3538,39.4123,39.3628,39.2825,39.1898,39.0431,39.0634,38.5993,38.252,37.3793,36.6334,36.4009,35.2822,34.4262,34.2119,34.1552,34.3325,33.9626,33.2661,32.3819,35.1959,36.7602,37.9039,37.8103,37.5832,37.9718,38.3111,38.9323,38.6763,39.1163,38.8469,39.805,40.2627,40.3689,40.4064,40.0558,40.815,41.0234,41.0128,41.0296,41.0927,40.7046,40.6775,40.2711,40.1283,39.7518,40.0145,40.0394,39.8461,39.6317,39.5548,39.1996,38.9861,38.8507,38.8603,38.483,38.4711,38.4214,38.4286,38.5766,38.7532,38.7905,38.6029,38.4635,38.1403,36.6844,36.616,36.4053,34.7934,34.0226,33.0505,33.4978,34.6106,35.284,35.7535,35.3541,35.5481,35.4086,35.7096,36.0526,36.1222,35.9408,36.1007,36.7952,36.99,37.1024,37.0993,37.3144,36.6951,37.1213,38.0026,38.1266,39.2538,38.8963,39.0158,38.6235,38.7908,38.6041,38.4489,38.3207,37.7398,38.5304,38.925,38.7249,38.9221,39.1704,39.5113,40.0613,39.3602,39.8689,39.973,40.0524,40.0025,40.7584,40.9714,40.9106,40.9685,40.6554,39.7314,39.0044,38.7183,38.5163,38.6101,38.2004,38.7606,38.7532,37.8903,37.8403,38.5368,39.0462,38.8279,39.0748,39.2907,38.5447,38.423,38.5624,38.476,38.5784,39.0905,39.379,39.4739,39.5774,40.7036,40.3044,39.6162,39.9967,40.0562,39.3426,38.666,38.7561,39.2823,38.8548,37.6214,37.8188,38.1086,38.3619,38.5472,38.1357,38.1422,37.95,37.1837,37.4636,36.8852,37.1617,37.5051,37.7724,38.0879,37.7197,38.0422,37.8551,38.5688,38.8388] )
pattern = np.array( [ 38.6434,38.1409,37.3391,37.5457,37.7487,37.7499,37.6121,37.4789,37.5821,37.6541,38.0365,37.7907,37.9932,37.9945,37.7032,37.3556,37.6359,37.5412,37.5296,37.8829,38.3797,38.4452,39.0929,39.1233,39.3014,39.0317,38.903,38.8221,39.045,38.6944,39.0699,39.0978,38.9877,38.8123,38.7491,38.5888,38.7875,38.2086,37.7484,37.3961,36.8663,36.2607,35.8838,35.3297,35.5574,35.7239 ] )
min( ( ( ( arr[i:i+len(pattern)] - pattern ) ** 2 ).mean(), i ) for i in xrange(len(arr)-len(pattern)) )
Out[5]:
(0.46259331934782588, 17)
where 0.46 is the minimal mean squared error, and 17 is the position of the minimum in arr.
I'm working on a project for fun and I need an algorithm to do as follows:
Generate a list of numbers of Length n which add up to x
I would settle for list of integers, but ideally, I would like to be left with a set of floating point numbers.
I would be very surprised if this problem wasn't heavily studied, but I'm not sure what to look for.
I've tackled similar problems in the past, but this one is decidedly different in nature. Before I've generated different combinations of a list of numbers that will add up to x. I'm sure that I could simply bruteforce this problem but that hardly seems like the ideal solution.
Anyone have any idea what this may be called, or how to approach it? Thanks all!
Edit: To clarify, I mean that the list should be length N while the numbers themselves can be of any size.
edit2: Sorry for my improper use of 'set', I was using it as a catch all term for a list or an array. I understand that it was causing confusion, my apologies.
This is how to do it in Python
import random
def random_values_with_prescribed_sum(n, total):
x = [random.random() for i in range(n)]
k = total / sum(x)
return [v * k for v in x]
Basically you pick n random numbers, compute their sum and compute a scale factor so that the sum will be what you want it to be.
Note that this approach will not produce "uniform" slices, i.e. the distribution you will get will tend to be more "egalitarian" than it should be if it was picked at random among all distribution with the given sum.
To see the reason you can just picture what the algorithm does in the case of two numbers with a prescribed sum (e.g. 1):
The point P is a generic point obtained by picking two random numbers and it will be uniform inside the square [0,1]x[0,1]. The point Q is the point obtained by scaling P so that the sum is required to be 1. As it's clear from the picture the points close to the center of the have an higher probability; for example the exact center of the squares will be found by projecting any point on the diagonal (0,0)-(1,1), while the point (0, 1) will be found projecting only points from (0,0)-(0,1)... the diagonal length is sqrt(2)=1.4142... while the square side is only 1.0.
Actually, you need to generate a partition of x into n parts. This is usually done the in following way: The partition of x into n non-negative parts can be represented in the following way: reserve n + x free places, put n borders to some arbitrary places, and stones to the rest. The stone groups add up to x, thus the number of possible partitions is the binomial coefficient (n + x \atop n).
So your algorithm could be as follows: choose an arbitrary n-subset of (n + x)-set, it determines uniquely a partition of x into n parts.
In Knuth's TAOCP the chapter 3.4.2 discusses random sampling. See Algortihm S there.
Algorithm S: (choose n arbitrary records from total of N)
t = 0, m = 0;
u = random, uniformly distributed on (0, 1)
if (N - t)*u >= n - m, skip t-th record and increase t by 1; otherwise include t-th record in the sample, increase m and t by 1
if M < n, return to 2, otherwise, algorithm finished
The solution for non-integers is algorithmically trivial: you just select arbitrary n numbers that don't sum up to 0, and norm them by their sum.
If you want to sample uniformly in the region of N-1-dimensional space defined by x1 + x2 + ... + xN = x, then you're looking at a special case of sampling from a Dirichlet distribution. The sampling procedure is a little more involved than generating uniform deviates for the xi. Here's one way to do it, in Python:
xs = [random.gammavariate(1,1) for a in range(N)]
xs = [x*v/sum(xs) for v in xs]
If you don't care too much about the sampling properties of your results, you can just generate uniform deviates and correct their sum afterwards.
Here is a version of the above algorithm in Javascript
function getRandomArbitrary(min, max) {
return Math.random() * (max - min) + min;
};
function getRandomArray(min, max, n) {
var arr = [];
for (var i = 0, l = n; i < l; i++) {
arr.push(getRandomArbitrary(min, max))
};
return arr;
};
function randomValuesPrescribedSum(min, max, n, total) {
var arr = getRandomArray(min, max, n);
var sum = arr.reduce(function(pv, cv) { return pv + cv; }, 0);
var k = total/sum;
var delays = arr.map(function(x) { return k*x; })
return delays;
};
You can call it with
var myarray = randomValuesPrescribedSum(0,1,3,3);
And then check it with
var sum = myarray.reduce(function(pv, cv) { return pv + cv;},0);
This code does a reasonable job. I think it produces a different distribution than 6502's answer, but I am not sure which is better or more natural. Certainly his code is clearer/nicer.
import random
def parts(total_sum, num_parts):
points = [random.random() for i in range(num_parts-1)]
points.append(0)
points.append(1)
points.sort()
ret = []
for i in range(1, len(points)):
ret.append((points[i] - points[i-1]) * total_sum)
return ret
def test(total_sum, num_parts):
ans = parts(total_sum, num_parts)
assert abs(sum(ans) - total_sum) < 1e-7
print ans
test(5.5, 3)
test(10, 1)
test(10, 5)
In python:
a: create a list of (random #'s 0 to 1) times total; append 0 and total to the list
b: sort the list, measure the distance between each element
c: round the list elements
import random
import time
TOTAL = 15
PARTS = 4
PLACES = 3
def random_sum_split(parts, total, places):
a = [0, total] + [random.random()*total for i in range(parts-1)]
a.sort()
b = [(a[i] - a[i-1]) for i in range(1, (parts+1))]
if places == None:
return b
else:
b.pop()
c = [round(x, places) for x in b]
c.append(round(total-sum(c), places))
return c
def tick():
if info.tick == 1:
start = time.time()
alpha = random_sum_split(PARTS, TOTAL, PLACES)
end = time.time()
log('alpha: %s' % alpha)
log('total: %.7f' % sum(alpha))
log('parts: %s' % PARTS)
log('places: %s' % PLACES)
log('elapsed: %.7f' % (end-start))
yields:
[2014-06-13 01:00:00] alpha: [0.154, 3.617, 6.075, 5.154]
[2014-06-13 01:00:00] total: 15.0000000
[2014-06-13 01:00:00] parts: 4
[2014-06-13 01:00:00] places: 3
[2014-06-13 01:00:00] elapsed: 0.0005839
to the best of my knowledge this distribution is uniform