Why are B-splines defined only where the basis functions sum to 1? - curve-fitting

I'm trying to understand B-splines.
It's not clear to me why "the curve is defined only where order basis functions overlap", where order is degree+1 (for a cubic, the order is 4).
I also found that where a number of basis functions equal to the order overlap, their sum is 1, and maybe this is linked to the fact that the curve starts there.
The first phrase comes from: http://www-evasion.imag.fr/~Francois.Faure/doc/inventorMentor/sgi_html/ch08.html, in the "Knot Sequence" section. I pasted it for you:

The definition of the curve requires that the basis functions (the NURBS factors) sum up to 1. Outside that interval, their sum is lower than 1.
E.g. take two points p1 and p2 (and give them whatever coordinates you like). The combination q = 0.5*p1 + 0.5*p2 gives us the point q in the middle of p1 and p2 (as 0.5 + 0.5 = 1). But where does the point q' = 0.2*p1 + 0.2*p2 sit? Try it out...
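To make the "try it out" concrete, here is a tiny Python sketch (the coordinates are arbitrary). Weights summing to 1 give a point between p1 and p2; weights summing to less than 1 drag the result toward the origin, which is exactly why the curve is only defined where the basis functions sum to 1:

import numpy as np

p1 = np.array([2.0, 0.0])
p2 = np.array([0.0, 2.0])

q = 0.5 * p1 + 0.5 * p2        # weights sum to 1: the midpoint [1. 1.]
q_prime = 0.2 * p1 + 0.2 * p2  # weights sum to 0.4: pulled toward the origin, [0.4 0.4]

print(q, q_prime)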

Understanding Support Vector Regression (SVR)

I'm working with SVR, and using this resource. Everything is super clear, with the epsilon-insensitive loss function (from the figure). The prediction comes with a tube, to cover most training samples and to generalize the bounds, using support vectors.
Then we have this explanation: "This can be described by introducing (non-negative) slack variables ξ_i, ξ_i*, to measure the deviation of training samples outside the ε-insensitive zone." I understand this error outside the tube, but I don't know how we can use it in optimization. Could somebody explain this?
In my local source, I'm trying to achieve a very simple optimization solution, without libraries. This is what I have for the loss function:
import numpy as np

# Kernel function, linear by default
def hypothesis(x, weight, k=None):
    k = k if k else lambda z: z
    k_x = np.vectorize(k)(x)
    return np.dot(k_x, np.transpose(weight))
.......
import math

def boundary_loss(x, y, weight, epsilon):
    prediction = hypothesis(x, weight)
    scatter = np.absolute(np.transpose(y) - prediction)
    bound = lambda z: z if z >= epsilon else 0
    return np.sum(np.vectorize(bound)(scatter))
First, let's look at the objective function. The first term, 1/2 * |w|^2 (wish this site had LaTeX support, but this will suffice), correlates with the margin of the SVM. The article you linked doesn't, in my opinion, explain this very well; it calls this term a description of "the model's complexity", but perhaps this is not the best way of explaining it. Minimizing this term maximizes the margin (while still representing the data well), which is the predominant goal of using SVMs for regression.
Warning, Math Heavy Explanation: The reason this is the case is that when maximizing the margin, you want to find the "farthest" non-outlier point right on the margin and minimize its distance. Let this farthest point be x_n. We want to find its Euclidean distance d from the plane f(w, x) = 0, which I will rewrite as w^T * x + b = 0 (where w^T is just the transpose of the weights matrix so that we can multiply the two). To find the distance, let us first normalize the plane such that |w^T * x_n + b| = epsilon, which we can do WLOG as w is still able to form all possible planes of the form w^T * x + b = 0.

Then, let's note that w is perpendicular to the plane. This is obvious if you have dealt a lot with planes (particularly in vector calculus), but can be proven by choosing two points on the plane x_1 and x_2, then noticing that w^T * x_1 + b = 0 and w^T * x_2 + b = 0. Subtracting the two equations we get w^T * (x_1 - x_2) = 0. Since x_1 - x_2 is just any vector strictly on the plane, and its dot product with w is 0, we know that w is perpendicular to the plane.

Finally, to actually calculate the distance between x_n and the plane, we take the vector formed by x_n and some point on the plane x' (that vector is x_n - x') and project it onto w. Doing this, we get d = |w^T * (x_n - x')| / |w|, which we can rewrite as d = (1 / |w|) * |w^T * x_n - w^T * x'|, and then add and subtract b inside the absolute value to get d = (1 / |w|) * |w^T * x_n + b - w^T * x' - b|. Notice that w^T * x_n + b is epsilon (from our normalization above), and that w^T * x' + b is 0, as x' is just a point on our plane. Thus, d = epsilon / |w|.

Notice that maximizing this distance subject to the constraint of finding the x_n and having |w^T * x_n + b| = epsilon is a difficult optimization problem. What we can do is restructure this optimization problem as minimizing 1/2 * w^T * w subject to the first two constraints in the picture you attached, that is, |y_i - f(x_i, w)| <= epsilon. You may think that I have forgotten the slack variables, and this is true, but when focusing on just this term, we ignore the slack variables for now; I will bring them back later. The reason these two optimizations are equivalent is not obvious, but the underlying reason lies in discrimination boundaries, which you are free to read more about (it's a lot more math that frankly I don't think this answer needs). Then, note that minimizing 1/2 * w^T * w is the same as minimizing 1/2 * |w|^2, which is the desired result we were hoping for. End of the Heavy Math
Now, notice that we want to make the margin big, but not so big that it includes noisy outliers like the one in the picture you provided.
Thus, we introduce a second term. To keep the margin down to a reasonable size, the slack variables are introduced (I will call them p and p* because I don't want to type out "xi" every time). These slack variables will ignore everything in the margin, i.e. those are the points that do not harm the objective and the ones that are "correct" in terms of their regression status. However, the points outside the margin are outliers; they do not reflect well on the regression, so we penalize them simply for existing. The slack error function that is given there is relatively easy to understand: it just adds up the slack error of every point (p_i + p*_i) for i = 1,...,N, and then multiplies by a modulating constant C which determines the relative importance of the two terms. A low value of C means that we are okay with having outliers, so the margin will be thinned and more outliers will be produced. A high value of C indicates that we care a lot about not having slack, so the margin will be made bigger to accommodate these outliers at the expense of representing the overall data less well.
A few things to note about p and p*. First, note that they are both always >= 0. The constraint in your picture shows this, but it also intuitively makes sense as slack should always add to the error, so it is positive. Second, notice that if p > 0, then p* = 0 and vice versa as an outlier can only be on one side of the margin. Last, all points inside the margin will have p and p* be 0, since they are fine where they are and thus do not contribute to the loss.
Notice that with the introduction of the slack variables, if you have any outliers then you won't be able to keep the condition from the first term, that is, |w^T * x_n + b| = epsilon, as the x_n would be this outlier, and your whole model would be screwed up. What we allow for, then, is to change the constraint to |w^T * x_n + b| = epsilon + (p + p*). When translated to the new optimization's constraint, we get the full constraint from the picture you attached, that is, |y_i - f(x_i, w)| <= epsilon + p + p*. (I combined the two equations into one here, but you could rewrite them separately as in the picture and that would be the same thing.)
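Written out in one place, this is the standard epsilon-insensitive SVR primal; here it is in LaTeX (using this answer's p and p* for the slack variables, instead of the usual xi):

\min_{w,\,b,\,p,\,p^*} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \left( p_i + p_i^* \right)

\text{subject to} \quad y_i - f(x_i, w) \le \epsilon + p_i, \qquad f(x_i, w) - y_i \le \epsilon + p_i^*, \qquad p_i,\; p_i^* \ge 0.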
Hopefully, after covering all of this, the motivation for the objective function and the corresponding slack variables makes sense to you.
If I understand the question correctly, you also want code to calculate this objective/loss function, which I think isn't too bad. I have not tested this (yet), but I think this should be what you want.
import numpy as np

# Function for calculating the error/loss for a SVM. I assume that:
# - 'x' is a 2d array representing the vectors of the data points
# - 'y' is an array representing the values each vector actually gives
# - 'weights' is an array of weights that we tune for the regression
# - 'epsilon' is a scalar representing the breadth of our margin.
def optimization_objective(x, y, weights, epsilon):
    # Calculate the first term of the objective (note that norm^2 = dot product).
    margin_term = np.dot(weights, weights) / 2
    # Now calculate the second term of the objective. First get the sum of slacks.
    slack_sum = 0
    for i in range(len(x)):  # For each observation
        # First find the absolute distance between expected and observed.
        diff = abs(hypothesis(x[i], weights) - y[i])
        # Now subtract epsilon
        diff -= epsilon
        # If diff is still more than 0, then it is an 'outlier' and will have slack.
        slack = max(0, diff)
        # Add it to the slack sum
        slack_sum += slack
    # Now we have the slack_sum, so multiply by C (I picked 1 arbitrarily).
    C = 1
    slack_term = C * slack_sum
    # Now, simply return the sum of the two terms, and we are done.
    return margin_term + slack_term
I got this function working on my computer with small data, and you may have to change it a little to work with your data if, for example, the arrays are structured differently, but the idea is there. Also, I am not the most proficient with python, so this may not be the most efficient implementation, but my intent was to make it understandable.
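For instance, a toy run might look like this (the data values are arbitrary, and it assumes the hypothesis function from the question is in scope):

x = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]])
y = np.array([2.0, 3.0, 6.0])
weights = np.array([0.5, 0.5])

print(optimization_objective(x, y, weights, epsilon=0.5))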
Now, note that this just calculates the error/loss (whatever you want to call it). To actually minimize it requires going into Lagrangians and intense quadratic programming, which is a much more daunting task. There are libraries available for doing this, but if you want to stay library-free as you are doing here, I wish you good luck, because that is not a walk in the park.
Finally, I would like to note that most of this information comes from notes I took in the ML class I took last year, and the professor (Dr. Abu-Mostafa) was a great help in learning the material. The lectures for this class are online (by the same prof), and the pertinent ones for this topic are here and here (although in my very biased opinion you should watch all the lectures; they were a great help). Leave a comment/question if you need anything cleared up or if you think I made a mistake somewhere. If you still don't understand, I can try to edit my answer to make more sense. Hope this helps!

Geometry matrix collinearity proof

Let's say I have this matrix with n=4 and m=5
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
Let's say I have a diagonal from the (1,2) point to the (4,5) point. And I have a point P(3,4). How can I check in my algorithm that P is on the diagonal?
TL;DR
Instead of an n-by-m matrix, think about it like an x-y grid. You can get the equation of a line on that grid, and once you have that equation, you put the x coordinate of the point you are interested in checking into the equation. If the y value you calculate from the equation matches the y coordinate of the point you are checking, the point lies on the line.
But How Do I Maths?
First some quick terminology. We have 3 points of interest in this case - the two points that define the line (or "diagonal", as the OP calls it), and the one point that we want to check. I'm going to designate the coordinates of the "diagonal" points with the numbers 1 and 2, and the point we want to check with the letter i. Additionally, for the math we need to do later, I need to treat the horizontal and vertical coordinates of the points separately, and I'll use your n-by-m convention to do so. So when I write n1 in an equation below, that is the n coordinate of the first point used to define the diagonal (so the 1 part of the point (1,2) that you give in your example).
What we are looking for is the equation of a line on our grid. This equation will have the form n = (slope) * m + (intercept).
Okay, now that we have the definitions taken care of, we can write the equations. The first step to solving the problem is finding the slope of your line. This will be the change in the vertical coordinate divided by the change in the horizontal component between the two points that define the line (so (n2 - n1) / (m2 - m1)). Using the values from your example, this will be (4 - 1) / (5 - 2) = 3 / 3 = 1. Note that since you are doing a division here, it is possible that your answer will not be a whole number, so make sure you keep that in mind when declaring your variables in whatever programming language you end up using - unintentional rounding in this step can really mess things up later.
Once we have our slope, the next step is calculating our intercept. We can do this by plugging our slope and the m and n coordinates into the equation for the line we are trying to get. So we start with the equation n1 = (slope) * m1 + (intercept). We can rearrange this equation to (intercept) = n1 - (slope) * m1. Plugging in the values from our example, we get (intercept) = 1 - (1 * 2) = -1.
So now we have the general equation of our line, which for our example is n = (1) * m + (-1).
Now that we have the (slope) and (intercept), we can plug in the coordinates of any point we want to check and see if the numbers match up. Our example point has a m coordinate of 4, so we can plug that into our equation.
n = (1) * (4) + (-1) = 3
Since the n coordinate we calculated using our equation matches the n coordinate of our point in our example, we can say that the sample point DOES fall on the line.
Suppose we wanted to also check to see if the point (2,5) was also on the line. When we plug that point's m coordinate into our equation, we get...
n = (1) * (5) + (-1) = 4
Since the n coordinate we calculated with our equation (4) doesn't match the n coordinate of the point we were checking (2), we know this point DOES NOT fall on the line.
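Putting the whole recipe into code, here is a minimal Python sketch of the check described above (the function name and the floating-point tolerance are my own choices):

def is_on_line(n1, m1, n2, m2, ni, mi, tol=1e-9):
    # Check whether point (ni, mi) lies on the line through (n1, m1) and (n2, m2).
    if m2 == m1:
        # Vertical line: every point on it shares the same m coordinate.
        return mi == m1
    slope = (n2 - n1) / (m2 - m1)   # keep this as a float!
    intercept = n1 - slope * m1
    # Compare with a small tolerance to absorb floating-point rounding.
    return abs((slope * mi + intercept) - ni) < tol

print(is_on_line(1, 2, 4, 5, 3, 4))  # True: the worked example above
print(is_on_line(1, 2, 4, 5, 2, 5))  # False: the counter-example above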

Weighted random number (without predefined values!)

Currently I need a function which returns a weighted random number.
It should choose a random number between two doubles/integers (for example 4 and 8), while the value in the middle (6) should occur, on average, about twice as often as the limit values 4 and 8.
If this were only about integers, I could predefine the values with variables and custom probabilities, but I need the function to give a double with at least 2 decimal digits (meaning thousands of different numbers)!
The environment I use is "Game Maker", which provides all sorts of basic random generators, but no weighted ones.
Could anyone possibly lead me in the right direction on how to achieve this?
Thanks in advance!
The sum of two independent continuous uniform(0,1)'s, U1 and U2, has a continuous symmetrical triangle distribution between 0 and 2. The distribution has its peak at 1 and tapers to zero at either end. We can easily translate that to a range of (4,8) via scaling by 2 and adding 4, i.e., 4 + 2*(U1 + U2).
However, you don't want a height of zero at the endpoints, you want half the peak's height. In other words, you want a triangle sitting on a rectangular base (i.e., uniform), with height h at the endpoints and height 2h in the middle. That makes life easy, because the triangle must have a peak of height h above the rectangular base, and a triangle with height h has half the area of a rectangle with the same base and height h. It follows that 2/3 of your probability is in the base, 1/3 is in the triangle.
Combining the elements above leads to the following pseudocode algorithm. If rnd() is a function call that returns continuous uniform(0,1) random numbers:
define makeValue()
    if rnd() <= 2/3  # Caution, may want to use 2.0/3.0 for many languages
        return 4 + (4 * rnd())
    else
        return 4 + (2 * (rnd() + rnd()))
I cranked out a million values using that and plotted a histogram.
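If you want to try this outside Game Maker, here is a minimal Python version of the same pseudocode (the function name and default range are my own choices):

import random

def make_value(lo=4.0, hi=8.0):
    # 2/3 of the probability mass is the uniform "base"...
    if random.random() <= 2.0 / 3.0:
        return lo + (hi - lo) * random.random()
    # ...and 1/3 is the triangle, built as the sum of two uniforms.
    return lo + (hi - lo) / 2.0 * (random.random() + random.random())

samples = [make_value() for _ in range(1000000)]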
In case someone needs this in Game Maker (or a different language) as a universal function:
if random(1) <= argument0
    return argument1 + ((argument2 - argument1) * random(1))
else
    return argument1 + (((argument2 - argument1) / 2) * (random(1) + random(1)))
Called as follows (similar to the standard random_range function):
val = weight_random_range(FACTOR, FROM, TO)
"FACTOR" determines how much of the whole probability figure is the "base" for constant probability. E.g. 2/3 for the figure above.
0 will provide a perfect triangle and 1 a rectangle (no weightning).

How to find the closest rotation

Consider points Y given in increasing order from [0,T). We are to consider these points as lying on a circle of circumference T. Now consider points X also from [0,T) and also lying on a circle of circumference T.
We say the distance between X and Y is the sum of the absolute distances between each point in X and its closest point in Y, recalling that both are considered to be lying on a circle. Write this distance as Delta(X, Y).
I am trying to find a quick way of determining a rotation of X which makes this distance as small as possible.
My code for making some data to test with is
#!/usr/bin/python
import random
import numpy as np
from bisect import bisect_left

def simul(rate, T):
    time = np.random.exponential(rate)
    times = [0]
    newtime = times[-1] + time
    while newtime < T:
        times.append(newtime)
        newtime = newtime + np.random.exponential(rate)
    return times[1:]
For each point I use this function to find its closest neighbor.
def takeClosest(myList, myNumber, T):
    """
    Assumes myList is sorted. Returns the value in myList closest to myNumber
    on a circle of circumference T.
    If two numbers are equally close, return the smaller number.
    """
    pos = bisect_left(myList, myNumber)
    before = myList[pos - 1]           # index -1 wraps around to the last element
    after = myList[pos % len(myList)]  # wraps past the end to the first element
    # Compare the two distances modulo T so the wrap-around is handled correctly.
    if (after - myNumber) % T < (myNumber - before) % T:
        return after
    else:
        return before
So the distance between two circles is:
def circle_dist(timesY, timesX, T):
    dist = 0
    for t in timesX:
        closest_number = takeClosest(timesY, t, T)
        d = abs(closest_number - t) % T
        dist += min(d, T - d)  # absolute distance measured around the circle
    return dist
So to make some data we just do
#First make some data
T = 5000
timesX = simul(1, T)
timesY = simul(10, T)
Finally, to rotate the circle timesX by offset we can do:
timesX = [(t + offset)%T for t in timesX]
In practice my timesX and timesY will have about 20,000 points each.
Given timesX and timesY, how can I quickly find (approximately) which rotation of timesX gives
the smallest distance to timesY?
Distance along the circle between a single point and a set of points is a piecewise linear function of rotation. The critical points of this function are the points of the set itself (zero distance) and points midway between neighbouring points of the set (local maximums of distance). Linear coefficients of such function are ±1.
Sum of such functions is again piecewise linear, but now with a quadratic number of critical points. Actually all these functions are the same, except shifted along the argument axis. Linear coefficients of the sum are integers.
To find its minimum, one would have to calculate its value at all critical points.
I don't see a way to significantly reduce the amount of work needed, but 1,600,000,000 points is not such a big deal anyway, especially if you can spread the work between several processors.
To calculate sum of two such functions, represent the summands as sequences of critical points and associated coefficients to the left and to the right of each critical point. Then just merge the two point sequences while adding the coefficients.
You can solve your (original) problem with a sweep line algorithm. The trick is to use the right "discretization". Imagine cutting your circle up into two strips:
X: x....x....x..........x................x.........x...x
Y: .....x..........x.....x..x.x...........x.............
Now calculate the score = 5+0+1+1+5+9+6.
The key observation is that if we rotate X very slightly (right say), some of the points will improve and some will get worse. We can call this the "differential". In the above example the differential would be 1 - 1 - 1 + 1 + 1 - 1 + 1 because the first point is matched to something on its right, the second point is matched to something under it or to its left etc.
Of course, as we move X more, the differential will change. However only as many times as the matchings change, which is never more than |X||Y| but probably much less.
The proposed algorithm is thus to calculate the initial score and the time (X position) of the next change in differential. Go to that next position and calculate the score again. Continue until you reach your starting position.
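Before implementing the full sweep, a coarse brute-force scan over candidate offsets makes a useful baseline and sanity check. A Python sketch, reusing the circle_dist and takeClosest helpers from the question (with T passed explicitly); the grid resolution n_steps is an arbitrary choice:

def best_rotation_bruteforce(timesX, timesY, T, n_steps=1000):
    # Evaluate the distance for n_steps evenly spaced offsets and keep the best.
    best_offset, best_dist = 0.0, float('inf')
    for i in range(n_steps):
        offset = T * i / n_steps
        rotated = sorted((t + offset) % T for t in timesX)
        d = circle_dist(timesY, rotated, T)
        if d < best_dist:
            best_offset, best_dist = offset, d
    return best_offset, best_dist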
This is probably a good example for the iterative closest point (ICP) algorithm:
It repeatedly matches each point with its closest neighbor and moves all points such that the mean squared distance is minimized. (Note that this corresponds to minimizing the sum of squared distances.)
import pylab as pl

T = 10.0
X = pl.array([3, 5.5, 6])
Y = pl.array([1, 1.5, 2, 4])

pl.clf()
pl.subplot(1, 2, 1, polar=True)
pl.plot(X / T * 2 * pl.pi, pl.ones(X.shape), 'r.', ms=10, mew=3)
pl.plot(Y / T * 2 * pl.pi, pl.ones(Y.shape), 'b+', ms=10, mew=3)

# Signed distance along the circle, mapped to the range [-T/2, T/2).
circDist = lambda X, Y: (Y - X + T / 2) % T - T / 2

while True:
    # Pair every point in X with its closest point in Y.
    D = circDist(pl.reshape(X, (-1, 1)), pl.reshape(Y, (1, -1)))
    closestY = pl.argmin(D**2, axis=1)
    distance = circDist(X, Y[closestY])
    # Shift X by the mean signed distance; repeat until the shift is negligible.
    shift = pl.mean(distance)
    if pl.absolute(shift) < 1e-3:
        break
    X = (X + shift) % T

pl.subplot(1, 2, 2, polar=True)
pl.plot(X / T * 2 * pl.pi, pl.ones(X.shape), 'r.', ms=10, mew=3)
pl.plot(Y / T * 2 * pl.pi, pl.ones(Y.shape), 'b+', ms=10, mew=3)
Important properties of the proposed solution are:
The ICP is an iterative algorithm. Thus it depends on an initial approximate solution. Furthermore, it won't always converge to the global optimum. This mainly depends on your data and the initial solution. If in doubt, try evaluating the ICP with different starting configurations and choose the most frequent result.
The current implementation performs a directed match: It looks for the closest point in Y relative to each point in X. It might yield different matches when swapping X and Y.
Computing all pair-wise distances between points in X and points in Y might be intractable for large point clouds (like 20,000 points, as you indicated). Therefore, the line D = circDist(...) might get replaced by a more efficient approach, e.g. not evaluating all possible pairs.
All points contribute to the final rotation. If there are any outliers, they might distort the shift significantly. This can be overcome with a robust average like the median or simply by excluding points with large distance.

Quadratic Bezier Interpolation

I would like to get some code in AS2 to interpolate a quadratic Bezier curve. The nodes are meant to be at a constant distance away from each other. Basically, it is to animate a ball at constant speed along a non-hyperbolic quadratic Bezier curve defined by 3 points.
Thanks!
The Bezier curve math is really quite simple, so I'll help you out with that and you can translate it into ActionScript.
A 2D quadratic Bezier curve is defined by three (x,y) coordinates. I will refer to these as P0 = (x0,y0), P1 = (x1,y1) and P2 = (x2,y2). Additionally a parameter value t, which ranges from 0 to 1, is used to indicate any position along the curve. All x, y and t variables are real-valued (floating point).
The equation for a quadratic Bezier curve is:
P(t) = P0*(1-t)^2 + P1*2*(1-t)*t + P2*t^2
So, using pseudocode, we can smoothly trace out the Bezier curve like so:
for i = 0 to step_count
    t = i / step_count
    u = 1 - t
    P = P0*u*u + P1*2*u*t + P2*t*t
    draw_ball_at_position( P )
This assumes that you have already defined the points P0, P1 and P2 as above. If you space the control points evenly then you should get nice even steps along the curve. Just define step_count to be the number of steps along the curve that you would like to see.
Please note that the expression can be evaluated much more efficiently.
P(t) = P0*(1-t)^2 + P1*2*(1-t)*t + P2*t^2
and
P = P0*u*u + P1*2*u*t + P2*t*t
both contain repeated multiplications by t which can be factored out.
For example:
C = A*t + B*(1-t) = A*t + B - B*t = t*(A-B) + B. You saved one multiplication, roughly doubling performance.
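Applied to the quadratic Bezier, you can precompute the polynomial coefficients once and then evaluate in Horner form with two multiply-adds per component. A sketch (the expansion P(t) = (P0 - 2*P1 + P2)*t^2 + 2*(P1 - P0)*t + P0 is just the original equation multiplied out):

import numpy as np

def make_quadratic_bezier(p0, p1, p2):
    a = p0 - 2.0 * p1 + p2   # t^2 coefficient
    b = 2.0 * (p1 - p0)      # t coefficient
    def eval_curve(t):
        # Horner form: (a*t + b)*t + p0 == a*t^2 + b*t + p0
        return (a * t + b) * t + p0
    return eval_curve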
The solution proposed by Naaff, that is P(t) = P0*(1-t)^2 + P1*2*(1-t)*t + P2*t^2, will get you the correct "shape", but selecting evenly-spaced t in the [0,1] interval will not produce evenly-spaced P(t). In other words, the speed is not constant (you can differentiate the previous equation with respect to t to see it).
Usually, a common method to traverse a parametric curve at constant speed is to reparametrize by arc length. This means expressing P as P(s), where s is the length traversed along the curve. Obviously, s varies from zero to the total length of the curve. In the case of a quadratic Bezier curve, there's a closed-form solution for the arc length as a function of t, but it's a bit complicated. Computationally, it's often faster to just integrate numerically using your favorite method. Notice however that the idea is to compute the inverse relation, that is, t(s), so as to express P as P(t(s)). Then, choosing evenly-spaced s will produce evenly-spaced P.
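One practical way to get that inverse numerically is a lookup table: sample the curve densely, accumulate segment lengths to approximate s(t), then interpolate the inverse. A Python sketch of this idea (table size and sample counts are arbitrary choices, not part of the answer above):

import numpy as np

def bezier_point(p0, p1, p2, t):
    u = 1.0 - t
    return u * u * p0 + 2.0 * u * t * p1 + t * t * p2

def constant_speed_samples(p0, p1, p2, n_samples, table_size=1000):
    # Tabulate the cumulative arc length s at many values of t.
    ts = np.linspace(0.0, 1.0, table_size)
    pts = np.array([bezier_point(p0, p1, p2, t) for t in ts])
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    # Invert t(s): for evenly spaced arc lengths, interpolate the matching t.
    target_s = np.linspace(0.0, s[-1], n_samples)
    t_of_s = np.interp(target_s, s, ts)
    return np.array([bezier_point(p0, p1, p2, t) for t in t_of_s])

# Example: the returned points are (approximately) evenly spaced along the
# curve, rather than evenly spaced in t.
p0, p1, p2 = np.array([0.0, 0.0]), np.array([1.0, 2.0]), np.array([2.0, 0.0])
samples = constant_speed_samples(p0, p1, p2, 20)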
