I have a set of points like:
(x , y , z , t)
(1 , 3 , 6 , 0.5)
(1.5 , 4 , 6.5 , 1)
(3.5 , 7 , 8 , 1.5)
(4 , 7.25 , 9 , 2)
I am looking to find the best linear fit for these points, let's say a function like:
f(t) = a * x +b * y +c * z
This is a linear regression problem. The "best fit" depends on the metric you choose for what counts as better.
One simple example is the least-squares metric, which aims to minimize the sum of squared errors (f(x_i, y_i, z_i) - w_i)^2, where w_i is the measured value for sample i.
So, in least squares you are trying to minimize SUM{ (a*x_i + b*y_i + c*z_i - w_i)^2 | per each i }. This function has a single global minimum at:
(a,b,c) = (X^T * X)^-1 * X^T * w
Where:
X is an m x 3 matrix (m is the number of samples you have), whose i-th row is (x_i, y_i, z_i)
X^T - the transpose of this matrix
w - the vector of measured results: `(w_1,w_2,...,w_m)`
The * operator represents matrix multiplication
There are other, more complex methods that use different distance metrics; one example is the well-known SVR with a linear kernel.
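For illustration, here is a minimal NumPy sketch of the least-squares fit using the sample points from the question, assuming the measured value w_i is the t coordinate of each sample:

import numpy as np

# Sample points from the question: columns x, y, z and the measured value t.
X = np.array([[1.0, 3.0,  6.0],
              [1.5, 4.0,  6.5],
              [3.5, 7.0,  8.0],
              [4.0, 7.25, 9.0]])
t = np.array([0.5, 1.0, 1.5, 2.0])

# Closed-form normal equations: (a, b, c) = (X^T X)^-1 X^T t
a, b, c = np.linalg.solve(X.T @ X, X.T @ t)

# Equivalent, numerically more robust route:
coeffs, *_ = np.linalg.lstsq(X, t, rcond=None)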
It seems that you are looking for the major axis of a point cloud.
You can work this out by finding the eigenvector associated with the largest eigenvalue of the covariance matrix. This could be an opportunity to use the power method (starting the iterations with the point farthest from the centroid, for example).
It can also be addressed by singular value decomposition, preferably using methods that compute only the largest singular values.
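A short NumPy sketch of the power-method idea, using the sample points from the question (one point per row):

import numpy as np

points = np.array([[1.0, 3.0,  6.0],
                   [1.5, 4.0,  6.5],
                   [3.5, 7.0,  8.0],
                   [4.0, 7.25, 9.0]])
centered = points - points.mean(axis=0)
cov = centered.T @ centered / len(points)

# Power iteration, started from the point farthest from the centroid as suggested.
v = centered[np.argmax(np.linalg.norm(centered, axis=1))]
v = v / np.linalg.norm(v)
for _ in range(100):
    v = cov @ v
    v = v / np.linalg.norm(v)
# v now approximates the eigenvector of the largest eigenvalue, i.e. the major axis.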
If your data set contains outliers, then RANSAC could be a better choice: take two points at random and compute the sum of distances to the line they define. Repeat a number of times and keep the best fit.
Using the squared distances will answer your request for least-squares, but non-squared distances will be more robust.
You have a linear problem.
For example, my equation will be Y = a*x1 + b*x2 + c*x3.
In MATLAB:
B = [x1(:) x2(:) x3(:)] \ Y;
Y_fit = [x1(:) x2(:) x3(:)] * B;
In Python:
import numpy as np
A = np.column_stack([np.ravel(x1), np.ravel(x2), np.ravel(x3)])
B, _, _, _ = np.linalg.lstsq(A, Y, rcond=None)
Y_fit = A @ B
Related
I have a curve-fitting problem.
I have two functions
y = ax+b
y = ax^2+bx-2.3
I have one set of data each for the above functions
I need to find a and b using the least-squares method, combining both functions.
I was using fminsearch function to minimize the sum of squares of errors of these two functions.
I am unable to use this method in lsqcurvefit
Kindly help me
Regards
Ram
I think you'll need to worry less about which library routine to use and more about the math. Assuming you mean vertical offset least squares, then you'll want
D = sum_{i=1..m} (y_Li - a x_Li - b)^2 + sum_{j=1..n} (y_Pj - a x_Pj^2 - b x_Pj + 2.3)^2
where there are m points (x_Li, y_Li) on the line and n points (x_Pj, y_Pj) on the parabola. Now find partial derivatives of D with respect to a and b. Setting them to zero provides two linear equations in 2 unknowns, a and b. Solve this linear system.
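Equivalently, since both residuals are linear in a and b, you can stack the two data sets into one design matrix and hand it to a least-squares solver. A NumPy sketch, with made-up data arrays xL, yL (line) and xP, yP (parabola):

import numpy as np

# Hypothetical sample data for the two models (replace with your own).
xL = np.array([0.0, 1.0, 2.0, 3.0]); yL = np.array([1.1, 2.9, 5.2, 7.1])   # y = a*x + b
xP = np.array([0.5, 1.5, 2.5]);      yP = np.array([-1.6, 1.2, 6.0])       # y = a*x^2 + b*x - 2.3

# Line rows: a*xL + b*1 ~ yL ; parabola rows: a*xP^2 + b*xP ~ yP + 2.3
M = np.vstack([np.column_stack([xL, np.ones_like(xL)]),
               np.column_stack([xP**2, xP])])
rhs = np.concatenate([yL, yP + 2.3])

(a, b), *_ = np.linalg.lstsq(M, rhs, rcond=None)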
y = ax+b
y = ax^2+bx-2.3
In order not to confuse the y of the first equation with the y of the second equation, we use distinct notation:
u = ax+b
v = ax^2+bx+c
The method of linear regression combined for the two functions is shown on the attached page:
HINT: If you want to derive the matrix equation appearing above by yourself, follow Gene's answer.
I'm working on a genetic algorithm which uses blend (BLX-alpha) crossover.
I found 2 algorithms which seem quite different from each other.
https://yadi.sk/i/u5nq986GuDoNm - page 8
crossover is made as follows:
a. Select 2 parents: G1, G2
b. generate uniformly distributed random number gamma from [-alpha, 1 + alpha], where alpha = 0.5
c. generate an offspring as follows: G = gamma * G1 + (1 - gamma) * G2
http://www.tomaszgwiazda.com/blendX.htm
crossover is made as follows:
a. select two parents X(t) and Y(t) from a parent pool
b. create two offspring X(t+1) and Y(t+1) as follows:
c. for i = 1 to n do
d. di=|xi(t)-yi(t)|
e. choose a uniform random real number u from the interval [min(xi(t), yi(t)) - a*di, max(xi(t), yi(t)) + a*di]
f. xi(t+1)=u
g. choose a uniform random real number u from the interval [min(xi(t), yi(t)) - a*di, max(xi(t), yi(t)) + a*di]
h. yi(t+1)=u
i. end do
where:
a – positive real parameter
xi, yi - the i-th component of a parent
di - distance between parent components
Which of these 2 algorithms is correct? Or are they equivalent?
In my task I'm using the 2nd method, because the first one gives unsatisfactory results.
I'm asking because I'm working on a GA where the first algorithm is supposed to be used.
Any help would be appreciated!
You can search for the paper "Real-Coded Genetic Algorithms and Interval Schemata", in which the BLX-alpha crossover was first introduced.
In that paper, the first algorithm is introduced.
As for the second one, I think it is equivalent to the first in the way it produces offspring. Because the second algorithm produces two offspring at a time, it has more chances to yield a better individual, but it also needs more fitness evaluations (FEs).
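For comparison, here is a small Python sketch of the first variant as described in the question (a single gamma for the whole vector, producing one offspring); the crossover_blen function below it implements the second, two-offspring variant:

import random

def blx_alpha_single(g1, g2, alpha=0.5):
    # Variant 1: one offspring, blended with a single gamma drawn from [-alpha, 1 + alpha].
    gamma = random.uniform(-alpha, 1.0 + alpha)
    return [gamma * a + (1.0 - gamma) * b for a, b in zip(g1, g2)]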
from copy import deepcopy
import random

def crossover_blen(p1, p2, alpha):
    # BLX-alpha: each child gene is drawn uniformly from
    # [min - alpha*d, max + alpha*d], where d = |p1[i] - p2[i]|.
    c1, c2 = deepcopy(p1), deepcopy(p2)
    for i in range(len(p1)):
        distancia = abs(c2[i] - c1[i])
        l = min(c1[i], c2[i]) - alpha * distancia
        u = max(c1[i], c2[i]) + alpha * distancia
        c1[i] = l + random.random() * (u - l)
        c2[i] = l + random.random() * (u - l)
    return [c1, c2]
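A quick usage example for the function above, with made-up parent vectors:

random.seed(42)                      # reproducible draw
parent1 = [1.0, 2.0, 3.0]
parent2 = [2.5, 0.5, 4.0]
child1, child2 = crossover_blen(parent1, parent2, alpha=0.5)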
I have to do optimization in supervised learning to get my weights.
I have to learn the values (w1, w2, w3, w4) such that whenever the label of my vector A = [a1 a2 a3 a4] is 1 the sum w1*a1 + w2*a2 + w3*a3 + w4*a4 becomes greater than 0.5, and when the label is -1 it becomes less than 0.5.
Can somebody tell me how I can approach this problem in MATLAB? One way that I know is to use evolutionary algorithms: take a random value vector and then change it to pick the best n values.
Is there any other way that this can be approached ?
You can do it using linprog.
Let A be a matrix of size n by 4 consisting of all n training 4-vectors you have. You should also have a vector y with n elements (each either plus or minus 1), representing the label of each training 4-vector.
Using A and y we can write a linear program (look at the doc for the names of the parameters I'm using). Now, you do not have an objective function, so you can simply set f to be f = zeros(4,1);.
The only thing you have is an inequality constraint (< a_i , w > - .5) * y_i >= 0 (where <.,.> is a dot-product between 4-vector a_i and weight vector w).
If my calculations are correct, this constraint can be written as
cmat = bsxfun( @times, A, y );
Overall you get
w = linprog( zeros(4,1), -cmat, -.5*y );
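For what it's worth, a minimal equivalent sketch in Python with scipy.optimize.linprog (the A and y arrays here are made-up toy data):

import numpy as np
from scipy.optimize import linprog

# Hypothetical toy data: six training 4-vectors and their +/-1 labels.
A = np.array([[0.9, 0.1, 0.2, 0.3],
              [0.8, 0.7, 0.1, 0.0],
              [0.7, 0.6, 0.5, 0.4],
              [0.1, 0.2, 0.1, 0.0],
              [0.0, 0.1, 0.3, 0.2],
              [0.2, 0.0, 0.1, 0.1]])
y = np.array([1, 1, 1, -1, -1, -1])

# (a_i . w - 0.5) * y_i >= 0  rewritten as  -(y_i * a_i) . w <= -0.5 * y_i
cmat = A * y[:, None]
res = linprog(c=np.zeros(4),                 # no objective, pure feasibility
              A_ub=-cmat,
              b_ub=-0.5 * y,
              bounds=[(None, None)] * 4)     # weights may be negative
w = res.x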
Is there a general way to convert between a measure of similarity and a measure of distance?
Consider a similarity measure like the number of 2-grams that two strings have in common.
2-grams('beta', 'delta') = 1
2-grams('apple', 'dappled') = 4
What if I need to feed this to an optimization algorithm that expects a measure of difference, like Levenshtein distance?
This is just an example...I'm looking for a general solution, if one exists. Like how to go from Levenshtein distance to a measure of similarity?
I appreciate any guidance you may offer.
Let d denote distance and s denote similarity. To convert a distance measure to a similarity measure, first normalize d to [0, 1] using d_norm = d/max(d). Then the similarity measure is given by:
s = 1 - d_norm.
where s is in the range [0, 1], with 1 denoting the highest similarity (the items in comparison are identical) and 0 denoting the lowest similarity (largest distance).
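A tiny NumPy sketch of that conversion, with made-up distances:

import numpy as np

d = np.array([0.0, 1.5, 3.0, 6.0])    # hypothetical pairwise distances
d_norm = d / d.max()                   # normalized to [0, 1]
s = 1.0 - d_norm                       # 1 = identical, 0 = most distant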
If your similarity measure (s) is between 0 and 1, you can use one of these:
1-s
sqrt(1-s)
-log(s)
(1/s)-1
Doing 1/similarity will not preserve the properties of the distribution.
The best way is
distance (a->b) = highest similarity - similarity (a->b).
with "highest similarity" being the largest similarity value. You thereby flip your distribution:
the highest similarity becomes distance 0, etc.
Yes, there is a most general way to change between similarity and distance: a strictly monotone decreasing function f(x).
That is, with f(x) you can make similarity = f(distance) or distance = f(similarity). It works in both directions. Such function works, because the relation between similarity and distance is that one decreases when the other increases.
Examples:
These are some well-known strictly monotone decreasing candidates that work for non-negative similarities or distances:
f(x) = 1 / (a + x)
f(x) = exp(- x^a)
f(x) = arccot(ax)
You can choose parameter a>0 (e.g., a=1)
Edit 2021-08
A very practical approach is to use the function sim2diss, available in the statistical software R. This function provides up to 13 methods to compute dissimilarity from similarities. Sadly, the methods are not explained at all: you have to look into the code :-\
similarity = 1/difference
and watch out for difference = 0
According to scikit-learn:
Kernels are measures of similarity, i.e. s(a, b) > s(a, c) if objects a and b are considered “more similar” than objects a and c. A kernel must also be positive semi-definite.
There are a number of ways to convert between a distance metric and a similarity measure, such as a kernel. Let D be the distance, and S be the kernel:
S = np.exp(-D * gamma), where one heuristic for choosing gamma is 1 / num_features
S = 1. / (D / np.max(D))
In the case of Levenshtein distance, you could increase the sim score by 1 for every time the sequences match; that is, 1 for every time you didn't need a deletion, insertion or substitution. That way the metric would be a linear measure of how many characters the two strings have in common.
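A small sketch of one way to read that suggestion, taking the similarity to be max(len) minus the edit distance (a rough helper, not an established metric):

def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insertions, deletions, substitutions).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]

def levenshtein_similarity(a, b):
    # Roughly: how many characters survived without needing an edit.
    return max(len(a), len(b)) - levenshtein(a, b)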
In one of my projects (based on collaborative filtering) I had to convert between correlation (cosine between vectors), which ranges from -1 to 1 (closer to 1 means more similar, closer to -1 means more diverse), and a normalized distance (close to 0 means the distance is smaller, close to 1 means the distance is bigger).
In this case: distance ~ diversity
My formula was: dist = 1 - (cor + 1)/2
If you need to convert similarity to diversity and the domain is [0,1] in both cases, the simplest way is:
dist = 1 - sim
sim = 1 - dist
Cosine similarity is widely used for n-gram count or TFIDF vectors.
from math import pi, acos
def similarity(x, y):
    return (sum(x[k] * y[k] for k in x if k in y)
            / sum(v**2 for v in x.values())**.5
            / sum(v**2 for v in y.values())**.5)
Cosine similarity can be used to compute a formal distance metric according to wikipedia. It obeys all the properties of a distance that you would expect (symmetry, nonnegativity, etc):
def distance_metric(x, y):
    return 1 - 2 * acos(similarity(x, y)) / pi
Both of these metrics range between 0 and 1.
If you have a tokenizer that produces N-grams from a string you could use these metrics like this:
>>> from Tokenizer import Tokenizer
>>> tokenizer = Tokenizer(ngrams=2, lower=True, nonwords_set=set(['hello', 'and']))
>>> from collections import Counter
>>> list(tokenizer('Hello World again and again?'))
['world', 'again', 'again', 'world again', 'again again']
>>> Counter(tokenizer('Hello World again and again?'))
Counter({'again': 2, 'world': 1, 'again again': 1, 'world again': 1})
>>> x = _
>>> Counter(tokenizer('Hi world once again.'))
Counter({'again': 1, 'world once': 1, 'hi': 1, 'once again': 1, 'world': 1, 'hi world': 1, 'once': 1})
>>> y = _
>>> sum(x[k]*y[k] for k in x if k in y) / sum(v**2 for v in x.values())**.5 / sum(v**2 for v in y.values())**.5
0.42857142857142855
>>> distance_metric(x, y)
0.28196592805724774
I found the elegant inner product of Counter in this SO answer
Suppose you have a list of floating point numbers that are approximately multiples of a common quantity, for example
2.468, 3.700, 6.1699
which are approximately all multiples of 1.234. How would you characterize this "approximate gcd", and how would you proceed to compute or estimate it?
Strictly related to my answer to this question.
You can run Euclid's gcd algorithm with anything smaller than 0.01 (or a small number of your choice) being a pseudo 0. With your numbers:
3.700 = 1 * 2.468 + 1.232,
2.468 = 2 * 1.232 + 0.004.
So the pseudo gcd of the first two numbers is 1.232. Now you take the gcd of this with your last number:
6.1699 = 5 * 1.232 + 0.0099.
So 1.232 is the pseudo gcd, and the multiples are 2, 3, 5. To improve this result, you may take the linear regression on the data points:
(2,2.468), (3,3.7), (5,6.1699).
The slope is the improved pseudo gcd.
Caveat: the first part of this algorithm is numerically unstable - if you start with very dirty data, you are in trouble.
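A minimal Python sketch of both steps, using the numbers from the question and the 0.01 pseudo-zero threshold suggested above:

from functools import reduce
import numpy as np

def pseudo_gcd(a, b, eps=0.01):
    # Euclid's algorithm with remainders below eps treated as zero.
    while b > eps:
        a, b = b, a % b
    return a

values = [2.468, 3.700, 6.1699]
g = reduce(pseudo_gcd, values)               # rough pseudo gcd, ~1.232
multiples = [round(v / g) for v in values]   # [2, 3, 5]

# Refinement: the slope of the regression on (multiple, value) pairs.
slope, intercept = np.polyfit(multiples, values, 1)   # slope ~1.234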
Express your measurements as multiples of the lowest one. Thus your list becomes 1.00000, 1.49919, 2.49996. The fractional parts of these values will be very close to 1/Nths, for some value of N dictated by how close your lowest value is to the fundamental frequency. I would suggest looping through increasing N until you find a sufficiently refined match. In this case, for N=1 (that is, assuming X=2.468 is your fundamental frequency) you would find a standard deviation of 0.3333 (two of the three values are .5 off of X * 1), which is unacceptably high. For N=2 (that is, assuming 2.468/2 is your fundamental frequency) you would find a standard deviation of virtually zero (all three values are within .001 of a multiple of X/2), thus 2.468/2 is your approximate GCD.
The major flaw in my plan is that it works best when the lowest measurement is the most accurate, which is likely not the case. This could be mitigated by performing the entire operation multiple times, discarding the lowest value on the list of measurements each time, then use the list of results of each pass to determine a more precise result. Another way to refine the results would be adjust the GCD to minimize the standard deviation between integer multiples of the GCD and the measured values.
This reminds me of the problem of finding good rational-number approximations of real numbers. The standard technique is a continued-fraction expansion:
def rationalizations(x):
    assert 0 <= x
    ix = int(x)
    yield ix, 1
    if x == ix: return
    for numer, denom in rationalizations(1.0/(x-ix)):
        yield denom + ix * numer, numer
We could apply this directly to Jonathan Leffler's and Sparr's approach:
>>> a, b, c = 2.468, 3.700, 6.1699
>>> b/a, c/a
(1.4991896272285252, 2.4999594813614263)
>>> import itertools
>>> list(itertools.islice(rationalizations(b/a), 3))
[(1, 1), (3, 2), (925, 617)]
>>> list(itertools.islice(rationalizations(c/a), 3))
[(2, 1), (5, 2), (30847, 12339)]
picking off the first good-enough approximation from each sequence. (3/2 and 5/2 here.) Or instead of directly comparing 3.0/2.0 to 1.499189..., you could notice that 925/617 uses much larger integers than 3/2, making 3/2 an excellent place to stop.
It shouldn't much matter which of the numbers you divide by. (Using a/b and c/b you get 2/3 and 5/3, for instance.) Once you have integer ratios, you could refine the implied estimate of the fundamental using shsmurfy's linear regression. Everybody wins!
I'm assuming all of your numbers are integer multiples of a common value. For the rest of my explanation, A will denote the "root" frequency you are trying to find and B will be an array of the numbers you have to start with.
What you are trying to do is superficially similar to linear regression. You are trying to find a linear model y=mx+b that minimizes the average distance between a linear model and a set of data. In your case, b=0, m is the root frequency, and y represents the given values. The biggest problem is that the independent variables X are not explicitly given. The only thing we know about X is that all of its members must be integers.
Your first task is trying to determine these independent variables. The best method I can think of at the moment assumes that the given frequencies have nearly consecutive indexes (x_1 = x_0 + n). So B_0/B_1 = x_0/(x_0 + n) for some (hopefully) small integer n. You can then take advantage of the fact that x_0 = n*B_0/(B_1 - B_0), start with n = 1, and keep ratcheting it up until x_0 - rnd(x_0) is within a certain threshold. After you have x_0 (the initial index), you can approximate the root frequency (A = B_0/x_0). Then you can approximate the other indexes by finding x_n = rnd(B_n/A). This method is not very robust and will probably fail if the error in the data is large.
If you want a better approximation of the root frequency A, you can use linear regression to minimize the error of the linear model now that you have the corresponding dependent variables. The easiest method to do so uses least squares fitting. Wolfram's Mathworld has a in-depth mathematical treatment of the issue, but a fairly simple explanation can be found with some googling.
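As a tiny illustration of that refinement, the least-squares estimate of A with no intercept has a closed form; a sketch using the indexes recovered for the question's numbers:

import numpy as np

x = np.array([2.0, 3.0, 5.0])            # estimated integer indexes
B = np.array([2.468, 3.700, 6.1699])     # measured values
A = (x @ B) / (x @ x)                    # minimizes sum((B_i - A*x_i)^2), ~1.2338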
Interesting question...not easy.
I suppose I would look at the ratios of the sample values:
3.700 / 2.468 = 1.499...
6.1699 / 2.468 = 2.4999...
6.1699 / 3.700 = 1.6675...
And I'd then be looking for a simple ratio of integers in those results.
1.499 ~= 3/2
2.4999 ~= 5/2
1.6675 ~= 5/3
I haven't chased it through, but somewhere along the line, you decide that an error of 1:1000 or something is good enough, and you back-track to find the base approximate GCD.
The solution which I've seen and used myself is to choose some constant, say 1000, multiply all numbers by this constant, round them to integers, find the GCD of these integers using the standard algorithm and then divide the result by the said constant (1000). The larger the constant, the higher the precision.
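A short sketch of that recipe (it assumes the scaled values round to exact integer multiples of the scaled gcd):

from functools import reduce
from math import gcd

def scaled_gcd(values, scale=1000):
    # Multiply by the constant, round to integers, take the integer gcd, scale back.
    ints = [round(v * scale) for v in values]
    return reduce(gcd, ints) / scale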
This is a reformulation of shsmurfy's solution where you choose 3 positive tolerances (e1, e2, e3) a priori.
The problem is then to search for the smallest positive integers (n1, n2, n3), and thus the largest root frequency f, such that:
f1 = n1*f +/- e1
f2 = n2*f +/- e2
f3 = n3*f +/- e3
We assume 0 <= f1 <= f2 <= f3
If we fix n1, then we get these relations:
f is in interval I1=[(f1-e1)/n1 , (f1+e1)/n1]
n2 is in interval I2=[n1*(f2-e2)/(f1+e1) , n1*(f2+e2)/(f1-e1)]
n3 is in interval I3=[n1*(f3-e3)/(f1+e1) , n1*(f3+e3)/(f1-e1)]
We start with n1 = 1, then increment n1 until the intervals I2 and I3 each contain an integer - that is, until floor(I2min) differs from floor(I2max), and the same for I3.
We then choose smallest integer n2 in interval I2, and smallest integer n3 in interval I3.
Assuming normal distribution of floating point errors, the most probable estimate of root frequency f is the one minimizing
J = (f1/n1 - f)^2 + (f2/n2 - f)^2 + (f3/n3 - f)^2
That is
f = (f1/n1 + f2/n2 + f3/n3)/3
If there are several integers n2, n3 in the intervals I2, I3, we could also choose the pair that minimizes the residue
min(J)*3/2=(f1/n1)^2+(f2/n2)^2+(f3/n3)^2-(f1/n1)*(f2/n2)-(f1/n1)*(f3/n3)-(f2/n2)*(f3/n3)
Another variant could be to continue the iteration and try to minimize another criterion like min(J(n1))*n1, until f falls below a certain frequency (n1 reaches an upper limit)...
I found this question while looking for answers to mine on Math StackExchange (here and here).
So far I've only managed to measure the appeal of a fundamental frequency given a list of harmonic frequencies (following the sound/music nomenclature), which can be useful if you have a reduced number of candidates and it is feasible to compute the appeal of each one and then choose the best fit.
C&P from my question in MSE (there the formatting is prettier):
with v being the list {v_1, v_2, ..., v_n}, ordered from lowest to highest:
mean_sin(v, x) = sum(sin(2*pi*v_i/x), for i in {1, ...,n})/n
mean_cos(v, x) = sum(cos(2*pi*v_i/x), for i in {1, ...,n})/n
gcd_appeal(v, x) = 1 - sqrt(mean_sin(v, x)^2 + (mean_cos(v, x) - 1)^2)/2, which yields a number in the interval [0,1].
The goal is to find the x that maximizes the appeal. For your example [2.468, 3.700, 6.1699], the gcd_appeal graph shows that the optimum GCD is at x = 1.2337899957639993
Edit:
You may find this Java code handy for calculating the (fuzzy) divisibility (aka gcd_appeal) of a divisor relative to a list of dividends; you can use it to test which of your candidates makes the best divisor. The code looks ugly because I tried to optimize it for performance.
//returns the mean divisibility of dividend/divisor as a value in the range [0, 1]
// 0 means no divisibility at all
// 1 means full divisibility
public double divisibility(double divisor, double... dividends) {
    double n = dividends.length;
    double factor = 2.0 / divisor;
    double sum_x = -n;
    double sum_y = 0.0;
    double[] coord = new double[2];
    for (double v : dividends) {
        coordinates(v * factor, coord);
        sum_x += coord[0];
        sum_y += coord[1];
    }
    double err = 1.0 - Math.sqrt(sum_x * sum_x + sum_y * sum_y) / (2.0 * n);
    //Might happen due to approximation error
    return err >= 0.0 ? err : 0.0;
}

private void coordinates(double x, double[] out) {
    //Bhaskara performant approximation to
    //out[0] = Math.cos(Math.PI*x);
    //out[1] = Math.sin(Math.PI*x);
    long cos_int_part = (long) (x + 0.5);
    long sin_int_part = (long) x;
    double rem = x - cos_int_part;
    if (cos_int_part != sin_int_part) {
        double common_s = 4.0 * rem;
        double cos_rem_s = common_s * rem - 1.0;
        double sin_rem_s = cos_rem_s + common_s + 1.0;
        out[0] = (((cos_int_part & 1L) * 8L - 4L) * cos_rem_s) / (cos_rem_s + 5.0);
        out[1] = (((sin_int_part & 1L) * 8L - 4L) * sin_rem_s) / (sin_rem_s + 5.0);
    } else {
        double common_s = 4.0 * rem - 4.0;
        double sin_rem_s = common_s * rem;
        double cos_rem_s = sin_rem_s + common_s + 3.0;
        double common_2 = ((cos_int_part & 1L) * 8L - 4L);
        out[0] = (common_2 * cos_rem_s) / (cos_rem_s + 5.0);
        out[1] = (common_2 * sin_rem_s) / (sin_rem_s + 5.0);
    }
}