Understanding Support Vector Regression (SVR) [closed]

I'm working with SVR, using this resource. Everything is very clear up to the epsilon-insensitive loss function (from the figure). The prediction comes with a tube that covers most of the training samples, and the bounds generalize through the support vectors.
Then we have this explanation: "This can be described by introducing (non-negative) slack variables ξ_i, ξ_i* to measure the deviation of training samples outside the ε-insensitive zone." I understand this error outside the tube, but I don't know how we can use it in the optimization. Could somebody explain this?
I'm trying to build a very simple optimization solution locally, without libraries. This is what I have for the loss function.
import numpy as np
# Kernel func, linear by default
def hypothesis(x, weight, k=None):
    k = k if k else lambda z : z
    k_x = np.vectorize(k)(x)
    return np.dot(k_x, np.transpose(weight))
.......
import math
def boundary_loss(x, y, weight, epsilon):
    prediction = hypothesis(x, weight)
    scatter = np.absolute(
        np.transpose(y) - prediction)
    bound = lambda z: z \
        if z >= epsilon else 0
    return np.sum(np.vectorize(bound)(scatter))

First, let's look at the objective function. The first term, 1/2 * |w|^2 (wish this site had LaTeX support but this will suffice), correlates with the margin of the SVM. The article you linked doesn't, in my opinion, explain this very well: it describes this term as "the model's complexity", which is perhaps not the best way of explaining it. Minimizing this term maximizes the margin (while still representing the data well), which is the predominant goal of using SVMs for regression.
Warning, Math Heavy Explanation: The reason this is the case is that when maximizing the margin, you want to find the "farthest" non-outlier points, the ones right on the margin, and maximize their distance from the plane. Let this farthest point be x_n. We want to find its Euclidean distance d from the plane f(w, x) = 0, which I will rewrite as w^T * x + b = 0 (where w^T is just the transpose of the weight vector so that we can multiply the two). To find the distance, let us first normalize the plane such that |w^T * x_n + b| = epsilon, which we can do WLOG because rescaling w and b still lets us form every possible plane of the form w^T * x + b = 0.
Then, let's note that w is perpendicular to the plane. This is obvious if you have dealt a lot with planes (particularly in vector calculus), but it can be proven by choosing two points x_1 and x_2 on the plane, so that w^T * x_1 + b = 0 and w^T * x_2 + b = 0. Subtracting the two equations we get w^T * (x_1 - x_2) = 0. Since x_1 - x_2 is an arbitrary vector lying in the plane, and its dot product with w is 0, we know that w is perpendicular to the plane.
Finally, to actually calculate the distance between x_n and the plane, we take the vector from some point x' on the plane to x_n (that is, x_n - x') and project it onto w. Doing this, we get d = |w^T * (x_n - x')| / |w|, which we can rewrite as d = (1 / |w|) * |w^T * x_n - w^T * x'|, and then add and subtract b inside to get d = (1 / |w|) * |w^T * x_n + b - w^T * x' - b|. Notice that |w^T * x_n + b| is epsilon (from our normalization above), and that w^T * x' + b is 0, as x' is just a point on our plane. Thus, d = epsilon / |w|. Notice that maximizing this distance subject to our constraint of finding the x_n and having |w^T * x_n + b| = epsilon is a difficult optimization problem.
What we can do is restructure this optimization problem as minimizing 1/2 * w^T * w subject to the first two constraints in the picture you attached, that is, |y_i - f(x_i, w)| <= epsilon. You may think that I have forgotten the slack variables, and that is true: while focusing on this first term we ignore the second term and the slack variables, and I will bring them back later. The reason these two optimizations are equivalent is not obvious, but the underlying reason lies in discrimination boundaries, which you are free to read more about (it's a lot more math that frankly I don't think this answer needs more of). Then, note that minimizing 1/2 * w^T * w is the same as minimizing 1/2 * |w|^2, which is the desired result we were hoping for. End of the Heavy Math
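As a quick numeric sanity check of d = epsilon / |w|, here is a minimal sketch. The values of w, b and epsilon are made up purely for illustration, and x_plane and x_n are hypothetical names for x' and x_n from the derivation.

```python
import numpy as np

# Hypothetical numbers chosen only for illustration.
w = np.array([3.0, 4.0])   # normal vector of the plane, |w| = 5
b = -2.0
epsilon = 0.5

# A point x' on the plane w^T x + b = 0 (found along the direction of w).
x_plane = -b * w / np.dot(w, w)

# A point x_n normalized so that w^T x_n + b = epsilon:
# start on the plane and step along w.
x_n = x_plane + epsilon * w / np.dot(w, w)

# Distance obtained by projecting (x_n - x') onto w, as in the derivation.
d_projection = abs(np.dot(w, x_n - x_plane)) / np.linalg.norm(w)

# Distance predicted by the closed form d = epsilon / |w|.
d_formula = epsilon / np.linalg.norm(w)
```

Both quantities agree, which is all the derivation claims: shrinking |w| is what widens the margin.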
Now, notice that we want to make the margin big, but not so big that it includes noisy outliers like the one in the picture you provided.
Thus, we introduce a second term. To bring the margin down to a reasonable size, the slack variables are introduced (I will call them p and p* because I don't want to type out "xi" every time). These slack variables ignore everything inside the margin, i.e. the points that do not harm the objective and that are "correct" in terms of their regression status. However, the points outside the margin are outliers that do not reflect well on the regression, so we penalize them by how far outside they fall. The slack error function given there is relatively easy to understand: it adds up the slack of every point (p_i + p*_i) for i = 1,...,N, and then multiplies the sum by a modulating constant C which determines the relative importance of the two terms. A low value of C means that we are okay with having outliers: slack is cheap, so the first term dominates, the fit stays flat, and more points are left outside the margin. A high value of C indicates that we care a lot about not having slack, so the fit will bend to pull these outliers back inside the margin, at the expense of representing the overall data less well.
A few things to note about p and p*. First, note that they are both always >= 0. The constraint in your picture shows this, but it also intuitively makes sense as slack should always add to the error, so it is positive. Second, notice that if p > 0, then p* = 0 and vice versa as an outlier can only be on one side of the margin. Last, all points inside the margin will have p and p* be 0, since they are fine where they are and thus do not contribute to the loss.
Notice that with the introduction of the slack variables, if you have any outliers then you won't want the condition from the first term, that is, |w^T * x_n + b| = epsilon as the x_n would be this outlier, and your whole model would be screwed up. What we allow for, then, is to change the constraint to be |w^T * x_n + b| = epsilon + (p + p*). When translated to the new optimization's constraint, we get the full constraint from the picture you attached, that is, |y_i - f(x_i, w)| <= epsilon + p + p*. (I combined the two equations into one here, but you could rewrite them as the picture is and that would be the same thing).
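To make the behaviour of p and p* concrete, here is a small sketch that computes them for a few made-up points. The helper name slack_variables and the sample numbers are assumptions for illustration; p is taken to be the slack above the tube and p* the slack below it.

```python
import numpy as np

def slack_variables(y, prediction, epsilon):
    """Return (p, p_star) for observed values y and predicted values."""
    residual = np.asarray(y, dtype=float) - np.asarray(prediction, dtype=float)
    p = np.maximum(0.0, residual - epsilon)        # point lies above the tube
    p_star = np.maximum(0.0, -residual - epsilon)  # point lies below the tube
    return p, p_star

# Three sample points: one inside the tube, one above it, one below it.
y = np.array([1.05, 1.30, 0.60])
prediction = np.array([1.00, 1.00, 1.00])
epsilon = 0.1
p, p_star = slack_variables(y, prediction, epsilon)
```

For every point at most one of p and p* is non-zero, and |y_i - f(x_i, w)| <= epsilon + p_i + p*_i holds by construction.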
Hopefully, after covering all this, the motivation for the objective function and the corresponding slack variables makes sense to you.
If I understand the question correctly, you also want code to calculate this objective/loss function, which I think isn't too bad. I have not tested this (yet), but I think this should be what you want.
# Function for calculating the error/loss for an SVR. I assume that:
# - 'x' is a 2d array representing the vectors of the data points
# - 'y' is an array representing the values each vector actually gives
# - 'weights' is an array of weights that we tune for the regression
# - 'epsilon' is a scalar representing the breadth of our margin.
# Relies on numpy (np) and hypothesis(x, weight) from the question.
def optimization_objective(x, y, weights, epsilon):
    # Calculates first term of objective (note that norm^2 = dot product)
    margin_term = np.dot(weights, weights) / 2
    # Now calculate second term of objective. First get the sum of slacks.
    slack_sum = 0
    for i in range(len(x)): # For each observation
        # First find the absolute distance between expected and observed.
        diff = abs(hypothesis(x[i], weights) - y[i])
        # Now subtract epsilon
        diff -= epsilon
        # If diff is still more than 0, then it is an 'outlier' and will have slack.
        slack = max(0, diff)
        # Add it to the slack sum
        slack_sum += slack
    # Now we have the slack_sum, so then multiply by C (I picked this as 1 arbitrarily)
    C = 1
    slack_term = C * slack_sum
    # Now, simply return the sum of the two terms, and we are done.
    return margin_term + slack_term
I got this function working on my computer with small data, and you may have to change it a little to work with your data if, for example, the arrays are structured differently, but the idea is there. Also, I am not the most proficient with Python, so this may not be the most efficient implementation, but my intent was to make it understandable.
Now, note that this just calculates the error/loss (whatever you want to call it). Actually minimizing it requires going into Lagrangians and intense quadratic programming, which is a much more daunting task. There are libraries available for doing this, but if you want to do it library-free, as you are doing here, I wish you good luck, because it is not a walk in the park.
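If an approximate answer is enough, one library-light alternative to the quadratic programming route is plain subgradient descent on the objective itself. The sketch below is an assumption-laden illustration and not the method from the lectures: it handles only the linear case, adds a bias term b, and the function names, step-size schedule and toy data are all made up for the example.

```python
import numpy as np

def svr_objective(X, y, w, b, epsilon, C):
    """1/2 * |w|^2 + C * (sum of slacks) for a linear model."""
    residual = X @ w + b - y
    slack = np.maximum(0.0, np.abs(residual) - epsilon)
    return 0.5 * np.dot(w, w) + C * slack.sum()

def fit_linear_svr(X, y, epsilon=0.1, C=10.0, lr0=1e-3, n_iters=2000):
    """Subgradient descent on the primal objective (a sketch, not a QP solver)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for t in range(n_iters):
        residual = X @ w + b - y
        # Only points outside the tube contribute; their subgradient is +1 or -1.
        direction = np.where(np.abs(residual) > epsilon, np.sign(residual), 0.0)
        grad_w = w + C * (X.T @ direction)
        grad_b = C * direction.sum()
        lr = lr0 / (1.0 + t / 100.0)   # decaying step size
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data on the line y = 2x + 1 (made up for the example).
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
w, b = fit_linear_svr(X, y)
```

Because the first term keeps pulling |w| down, the fitted slope ends up slightly flatter than the true slope; that is the trade-off controlled by C. The iterates oscillate around the optimum, so this is only suitable where a rough solution is acceptable.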
Finally, I would like to note that I got most of this information from notes I took in an ML class last year, and the professor (Dr. Abu-Mostafa) was a great help in learning the material. The lectures for this class are online (by the same prof), and the pertinent ones for this topic are here and here (although in my very biased opinion you should watch all the lectures, they were a great help). Leave a comment/question if you need anything cleared up or if you think I made a mistake somewhere. If you still don't understand, I can try to edit my answer to make more sense. Hope this helps!

Related

AI : find forces to make sum{torque}=0 (for drone maneuver)

There is a drone with a "hook" in the middle (black circle).
The image below shows a top view of the drone.
The hook prevents the drone from translating, but the drone can still rotate in
every direction.
I know the inertia of the drone about every axis.
Several forces are exerted on the drone: a b c d e
(all in the z direction; they can be negative).
I can control some of the forces: a b c d.
I can't control e, but I know its value.
Each force contributes to the total torque as torque = r x f;
r is known for every force.
My objective
Find an algorithm that calculates a b c d so that sum{torque} = 0 while minimizing the sum of squared forces (a^2+b^2+c^2+d^2).
An approximation is enough, i.e. the returned result doesn't have to be the true minimum.
In the real situation, there are more than 4 forces that I can control.
The algorithm will be used in an automatic drone-maneuver system in a game.
My attempt
I can split the equation into 3 axes:
torqueX = k1*a + k2*b + k3*c + k4*d + k5 *e = 0 ....(1)
torqueY = k6*a + k7*b + k8*c + k9*d + k10*e = 0 ....(2)
torqueZ = k11*a + k12*b + k13*c + k14*d + k15*e = 0 ....(3)
All k values are known.
I have another objective function.
minimize(a^2+b^2+c^2+d^2) ....(4)
After googling, I found that this is a linearly constrained least-squares problem, and I found an approach that computes a precise solution using matrix notation ([Ref1] to [Ref3]).
I am very new to this kind of problem, but after skimming a bit, I feel that the approach requires computing a matrix inverse.
I believe computing an inverse matrix should be avoided, because there is a risk when a divisor is near 0.
Thus, I prefer to adjust the forces a b c d gradually.
Question
Should this problem be solved by an "iteration" approach?
If I insist, how do I solve it using an "iteration" approach?
I found [Ref4], but it seems very hard, too hard for a simple (?) problem like this.
Reference
[Ref1] http://stanford.edu/class/ee103/lectures/constrained-least-squares/constrained-least-squares_slides.pdf (slide 3)
[Ref2] http://stanford.edu/class/ee103/lectures/least-squares/mols_slides.pdf (slide 5)
[Ref3] https://inst.eecs.berkeley.edu/~ee127a/book/login/l_ols_cls_def.html
[Ref4] http://www.sciencedirect.com/science/article/pii/002437959190009L
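For what it is worth, equations (1) to (4) can be written as K f = -k_e * e with f = (a, b, c, d), and the smallest f satisfying that is the minimum-norm solution. The sketch below shows two hedged ways to get it with NumPy: np.linalg.lstsq (SVD-based, so no explicit inverse is formed) and a simple gradient iteration started from zero, which adjusts the forces gradually. The function names and the four-rotor geometry are assumptions made up for illustration.

```python
import numpy as np

def min_norm_forces_direct(K, k_e, e):
    """Minimum-norm f with K @ f + k_e * e = 0, via SVD-based least squares."""
    rhs = -k_e * e
    forces, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return forces

def min_norm_forces_iterative(K, k_e, e, n_iters=200):
    """Gradient descent on |K f - rhs|^2 started from f = 0.

    Starting from zero keeps f in the row space of K, so the iteration
    heads toward the minimum-norm solution when the system is solvable.
    """
    rhs = -k_e * e
    forces = np.zeros(K.shape[1])
    step = 1.0 / np.linalg.norm(K, 2) ** 2   # safe step from the largest singular value
    for _ in range(n_iters):
        forces -= step * (K.T @ (K @ forces - rhs))
    return forces

# Hypothetical layout: forces a, b, c, d act at (1,0), (-1,0), (0,1), (0,-1).
# For a force f along z at position (x, y), r x f = (y*f, -x*f, 0).
K = np.array([[0.0, 0.0, 1.0, -1.0],    # torqueX coefficients of a, b, c, d
              [-1.0, 1.0, 0.0, 0.0],    # torqueY coefficients
              [0.0, 0.0, 0.0, 0.0]])    # torqueZ is zero for forces along z
k_e = np.array([0.5, -0.5, 0.0])        # e acts at (0.5, 0.5)
e = 2.0

direct = min_norm_forces_direct(K, k_e, e)
iterative = min_norm_forces_iterative(K, k_e, e)
```

In a game loop the iterative version can be run for only a few steps per frame, reusing the previous forces as the starting point, at the cost of no longer guaranteeing the exact minimum-norm answer.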

Intersection of axis-aligned rectangular cuboids (MBR) in one dimension

Currently I'm doing benchmarks on time series indexing algorithms. Since most of the time no reference implementations are available, I have to write my own implementations (all in Java). At the moment I am stuck a little at section 6.2 of a paper called Indexing multi-dimensional time-series with support for multiple distance measures available here in PDF : http://hadjieleftheriou.com/papers/vldbj04-2.pdf
An MBR (minimum bounding rectangle) is basically a rectangular cuboid with some coordinates and directions. As an example, P and Q are two MBRs with P.coord={0,0,0} and P.dir={1,1,3} and Q.coord={0.5,0.5,1} and Q.dir={1,1,1}, where the first entries represent the time dimension.
Now I would like to calculate the MINDIST(Q,P) between Q and P :
However I am not sure how to implement the "intersection of two MBRs in the time dimension" (Dim 1) since I am not sure what the intersection in the time dimension actually means. It is also not clear what h_Q, l_Q, l_P, h_P mean, since this notation is not explained (my guess is they mean something like highest or lowest value of a dimension in the intersection).
I would highly appreciate it, if someone could explain to me how to calculate the intersection of two MBRs in the first dimension and maybe enlighten me with an interpretation of the notation. Thanks!

Well, Figure 14 in your paper explains the time intersection. And the rectangles are axis-aligned, thus it makes sense to use high and low on each coordinate.
The multiplication sign you see is not a cross product, just a normal multiplication, because on both sides of it you have a scalar, and not vectors.
However, I must agree that the discussions on page 14 are rather fuzzy, but they seem to tell us that both types of intersections (complete and partial), when they have a t subscript, mean the norm of the intersection along the t coordinate.
Thus it seems you could factorize the time intersection to get a formula that would be :
It is worth noting that, maybe counter-intuitively, when your objects don't intersect on the time plane, their MINDIST is defined to be 0.
Hence the following pseudo-code:
mindist(P, Q)
{
    if( Q.coord[0] + Q.dir[0] < P.coord[0] ||
        Q.coord[0] > P.coord[0] + P.dir[0] )
        return 0;
    time = min(Q.coord[0] + Q.dir[0], P.coord[0] + P.dir[0]) - max(Q.coord[0], P.coord[0]);
    sum = 0;
    for(d=1; d<D; ++d)
    {
        if( Q.coord[d] + Q.dir[d] < P.coord[d] )
            x = Q.coord[d] + Q.dir[d] - P.coord[d];
        else if( P.coord[d] + P.dir[d] < Q.coord[d] )
            x = P.coord[d] + P.dir[d] - Q.coord[d];
        else
            x = 0;
        sum += x*x;
    }
    return sqrt(time * sum);
}
Note that the absolute values in the paper are unnecessary here: we have already checked which value is bigger, and each difference is squared before it is added to the sum.
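The pseudo-code above can be sketched in Python as follows. This follows the answer's reading of the formula rather than the paper itself; the function signature is an assumption, with each MBR passed as a (coord, dir) pair of sequences and dimension 0 taken as time.

```python
import math

def mindist(p_coord, p_dir, q_coord, q_dir):
    """MINDIST of two axis-aligned MBRs; dimension 0 is the time dimension."""
    # Low and high bounds of both boxes along the time axis.
    p_lo, p_hi = p_coord[0], p_coord[0] + p_dir[0]
    q_lo, q_hi = q_coord[0], q_coord[0] + q_dir[0]
    if q_hi < p_lo or q_lo > p_hi:
        return 0.0   # no overlap in time: defined to be 0
    time_overlap = min(q_hi, p_hi) - max(q_lo, p_lo)

    total = 0.0
    for d in range(1, len(p_coord)):
        p_lo, p_hi = p_coord[d], p_coord[d] + p_dir[d]
        q_lo, q_hi = q_coord[d], q_coord[d] + q_dir[d]
        # Gap between the two intervals; 0 when they overlap.
        gap = max(p_lo - q_hi, q_lo - p_hi, 0.0)
        total += gap * gap
    return math.sqrt(time_overlap * total)
```

With P and Q from the question, the boxes overlap in both spatial dimensions, so the result is 0 even though they also overlap in time.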

Multilateration implementation with inaccurate distance data

I am trying to create an Android smartphone application which uses Apple's iBeacon technology to determine its own current indoor location. I already managed to get all available beacons and calculate the distance to them via the RSSI signal.
Currently I face the problem that I am not able to find any library or implementation of an algorithm which calculates the estimated location in 2D from 3 (or more) distances to fixed points, given that these distances are not accurate (which means that the three "trilateration circles" do not intersect in one point).
I would be deeply grateful if anybody could post a link or an implementation of that in any common programming language (Java, C++, Python, PHP, JavaScript or whatever). I already read a lot on Stack Overflow about that topic, but could not find any answer I was able to convert into code (only some mathematical approaches with matrices and inverting them, calculating with vectors or stuff like that).
EDIT
I thought about an approach of my own, which works quite well for me but is not that efficient or scientific. I iterate over every meter (or, as in my example, every 0.1 meter) of the location grid and calculate the likelihood of that location being the actual position of the handset by comparing the distance from that location to each beacon with the distance I calculate from the received RSSI signal.
Code example:
public Location trilaterate(ArrayList<Beacon> beacons, double maxX, double maxY)
{
    for (double x = 0; x <= maxX; x += .1)
    {
        for (double y = 0; y <= maxY; y += .1)
        {
            double currentLocationProbability = 0;
            for (Beacon beacon : beacons)
            {
                // distance difference between calculated distance to beacon transmitter
                // (rssi-calculated distance) and current location:
                // |sqrt(dX^2 + dY^2) - distanceToTransmitter|
                double distanceDifference = Math
                    .abs(Math.sqrt(Math.pow(beacon.getLocation().x - x, 2)
                        + Math.pow(beacon.getLocation().y - y, 2))
                        - beacon.getCurrentDistanceToTransmitter());
                // weight the distance difference with the beacon calculated rssi-distance. The
                // smaller the calculated rssi-distance is, the more the distance difference
                // will be weighted (it is assumed, that nearer beacons measure the distance
                // more accurate)
                distanceDifference /= Math.pow(beacon.getCurrentDistanceToTransmitter(), 0.9);
                // sum up all weighted distance differences for every beacon in
                // "currentLocationProbability"
                currentLocationProbability += distanceDifference;
            }
            addToLocationMap(currentLocationProbability, x, y);
            // the previous line is my approach, I create a Set of Locations with the 5 most
            // probable locations in it to estimate the accuracy of the measurement afterwards.
            // If that is not necessary, a simple variable assignment for the most probable
            // location would do the job also
        }
    }
    Location bestLocation = getLocationSet().first().location;
    bestLocation.accuracy = calculateLocationAccuracy();
    Log.w("TRILATERATION", "Location " + bestLocation + " best with accuracy "
        + bestLocation.accuracy);
    return bestLocation;
}
Of course, the downside is that on a 300 m² floor I have 30,000 locations to iterate over, and for each one I measure the distance to every single beacon I got a signal from (if that is 5 beacons, I do 150,000 calculations just to determine a single location). That's a lot, so I will leave the question open and hope for some further solutions or a good improvement of this existing solution to make it more efficient.
Of course it does not have to be a trilateration approach, as the original title of this question suggested; an algorithm which uses more than three beacons for the location determination (multilateration) is just as good.

If the current approach is fine except for being too slow, then you could speed it up by recursively subdividing the plane. This works sort of like finding nearest neighbors in a kd-tree. Suppose that we are given an axis-aligned box and wish to find the approximate best solution in the box. If the box is small enough, then return the center.
Otherwise, divide the box in half, either by x or by y depending on which side is longer. For both halves, compute a bound on the solution quality as follows. Since the objective function is additive, sum lower bounds for each beacon. The lower bound for a beacon is the distance of the circle to the box, times the scaling factor. Recursively find the best solution in the child with the lower lower bound. Examine the other child only if the best solution in the first child is worse than the other child's lower bound.
Most of the implementation work here is the box-to-circle distance computation. Since the box is axis-aligned, we can use interval arithmetic to determine the precise range of distances from box points to the circle center.
P.S.: Math.hypot is a nice function for computing 2D Euclidean distances.
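A minimal Python sketch of the subdivision idea described above, assuming the score from the question (|distance to beacon - measured distance| divided by measured distance^0.9) and beacons given as (x, y, measured_distance) tuples. The function names, the tuple layout and the min_size stopping rule are assumptions made for this illustration.

```python
import math

def objective(x, y, beacons):
    """The question's score at one location: lower means a better fit."""
    return sum(abs(math.hypot(bx - x, by - y) - d) / d ** 0.9
               for bx, by, d in beacons)

def lower_bound(box, beacons):
    """Lower bound of the score over an axis-aligned box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    total = 0.0
    for bx, by, d in beacons:
        # Interval arithmetic: nearest and farthest box point from the beacon.
        near = math.hypot(max(x0 - bx, 0.0, bx - x1), max(y0 - by, 0.0, by - y1))
        far = math.hypot(max(abs(x0 - bx), abs(x1 - bx)),
                         max(abs(y0 - by), abs(y1 - by)))
        # Distance from the circle of radius d to the box (0 if it crosses the box).
        gap = max(near - d, d - far, 0.0)
        total += gap / d ** 0.9
    return total

def search(box, beacons, min_size=0.01):
    """Return (score, (x, y)) of the best box center found by subdivision."""
    x0, y0, x1, y1 = box
    if max(x1 - x0, y1 - y0) <= min_size:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        return objective(cx, cy, beacons), (cx, cy)
    if x1 - x0 >= y1 - y0:           # split the longer side
        mid = (x0 + x1) / 2.0
        children = [(x0, y0, mid, y1), (mid, y0, x1, y1)]
    else:
        mid = (y0 + y1) / 2.0
        children = [(x0, y0, x1, mid), (x0, mid, x1, y1)]
    children.sort(key=lambda child: lower_bound(child, beacons))
    best = search(children[0], beacons, min_size)
    # Visit the second child only if its bound could still beat the best score.
    if lower_bound(children[1], beacons) < best[0]:
        best = min(best, search(children[1], beacons, min_size))
    return best
```

Calling search((0.0, 0.0, max_x, max_y), beacons) replaces the double loop over the grid. How much work is pruned depends on how tight the bounds are for the given beacon layout.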

Instead of taking confidence levels of individual beacons into account, I would instead try to assign an overall confidence level for your result after you make the best guess you can with the available data. I don't think the only available metric (perceived power) is a good indication of accuracy. With poor geometry or a misbehaving beacon, you could be trusting poor data highly. It might make better sense to come up with an overall confidence level based on how well the perceived distance to the beacons line up with the calculated point assuming you trust all beacons equally.
I wrote some Python below that comes up with a best guess based on the provided data in the 3-beacon case by calculating the two points of intersection of circles for the first two beacons and then choosing the point that best matches the third. It's meant to get started on the problem and is not a final solution. If beacons don't intersect, it slightly increases the radius of each up until they do meet or a threshold is met. Likewise, it makes sure the third beacon agrees within a settable threshold. For n-beacons, I would pick 3 or 4 of the strongest signals and use those. There are tons of optimizations that could be done and I think this is a trial-by-fire problem due to the unwieldy nature of beaconing.
import math
beacons = [[0.0,0.0,7.0],[0.0,10.0,7.0],[10.0,5.0,16.0]] # x, y, radius
def point_dist(x1,y1,x2,y2):
    x = x2-x1
    y = y2-y1
    return math.sqrt((x*x)+(y*y))
# determines two points of intersection for two circles [x,y,radius]
# returns None if the circles do not intersect
def circle_intersection(beacon1,beacon2):
    r1 = beacon1[2]
    r2 = beacon2[2]
    dist = point_dist(beacon1[0],beacon1[1],beacon2[0],beacon2[1])
    heron_root = (dist+r1+r2)*(-dist+r1+r2)*(dist-r1+r2)*(dist+r1-r2)
    if ( heron_root > 0 ):
        heron = 0.25*math.sqrt(heron_root)
        xbase = (0.5)*(beacon1[0]+beacon2[0]) + (0.5)*(beacon2[0]-beacon1[0])*(r1*r1-r2*r2)/(dist*dist)
        xdiff = 2*(beacon2[1]-beacon1[1])*heron/(dist*dist)
        ybase = (0.5)*(beacon1[1]+beacon2[1]) + (0.5)*(beacon2[1]-beacon1[1])*(r1*r1-r2*r2)/(dist*dist)
        ydiff = 2*(beacon2[0]-beacon1[0])*heron/(dist*dist)
        return (xbase+xdiff,ybase-ydiff),(xbase-xdiff,ybase+ydiff)
    else:
        # no intersection, need to pseudo-increase beacon power and try again
        return None
# find the two points of intersection between beacon0 and beacon1
# will use beacon2 to determine the better of the two points
failing = True
power_increases = 0
while failing and power_increases < 10:
    res = circle_intersection(beacons[0],beacons[1])
    if ( res ):
        intersection = res
    else:
        beacons[0][2] *= 1.001
        beacons[1][2] *= 1.001
        power_increases += 1
        continue
    failing = False
# make sure the best fit is within x% (10% of the total distance from the 3rd beacon in this case)
# otherwise the results are too far off
THRESHOLD = 0.1
if failing:
    print('Bad Beacon Data (Beacon0 & Beacon1 don\'t intersect after many "power increases")')
else:
    # pick whichever intersection point agrees better with beacon2's measured distance
    dist1 = point_dist(beacons[2][0],beacons[2][1],intersection[0][0],intersection[0][1])
    dist2 = point_dist(beacons[2][0],beacons[2][1],intersection[1][0],intersection[1][1])
    if ( math.fabs(dist1-beacons[2][2]) < math.fabs(dist2-beacons[2][2]) ):
        best_point = intersection[0]
        best_dist = dist1
    else:
        best_point = intersection[1]
        best_dist = dist2
    best_dist_diff = math.fabs(best_dist-beacons[2][2])
    if best_dist_diff < THRESHOLD*best_dist:
        print(best_point)
    else:
        print('Bad Beacon Data (Beacon2 distance to best point not within threshold)')
If you want to trust closer beacons more, you may want to calculate the intersection points between the two closest beacons and then use the farther beacon to tie-break. Keep in mind that almost anything you do with "confidence levels" for the individual measurements will be a hack at best. Since you will always be working with very bad data, you will definitely need to loosen up the power_increases limit and threshold percentage.
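For more than three beacons, one way to get both a best guess and the "overall confidence" idea in a single pass is a linearized least-squares fit followed by an RMS of the distance residuals. This is a different technique from the circle-intersection code above, shown only as a sketch: it assumes NumPy is available, that beacons are rows of (x, y, measured_distance), and that at least three of them are not collinear. The function name multilaterate is made up.

```python
import numpy as np

def multilaterate(beacons):
    """Least-squares position from rows of (x, y, measured_distance).

    Subtracting the first circle equation from the others removes the
    quadratic terms and leaves a linear system in (x, y).
    Returns (position, rms_residual); a small residual means the
    measured distances agree well with the estimated point.
    """
    b = np.asarray(beacons, dtype=float)
    x0, y0, d0 = b[0]
    A = 2.0 * (b[1:, :2] - b[0, :2])
    rhs = (d0 ** 2 - b[1:, 2] ** 2) + np.sum(b[1:, :2] ** 2, axis=1) - (x0 ** 2 + y0 ** 2)
    position, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    # Treat all beacons equally and measure how well the distances line up.
    residuals = np.hypot(b[:, 0] - position[0], b[:, 1] - position[1]) - b[:, 2]
    rms = float(np.sqrt(np.mean(residuals ** 2)))
    return position, rms

# Example with exact distances to the point (3, 4), made up for illustration.
exact_beacons = [(0.0, 0.0, 5.0),
                 (10.0, 0.0, 65 ** 0.5),
                 (0.0, 10.0, 45 ** 0.5),
                 (10.0, 10.0, 85 ** 0.5)]
position, rms = multilaterate(exact_beacons)
```

With noisy RSSI distances the RMS residual grows, so it can serve as the single confidence figure for the result; what threshold counts as "too far off" has to be tuned on real data.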

You have 3 points: A(xA,yA,zA), B(xB,yB,zB) and C(xC,yC,zC), which are approximately at distances dA, dB and dC from your goal point G(xG,yG,zG).
Let's say cA, cB and cC are the confidence rates ( 0 < cX <= 1 ) of each point.
Basically, you might take something really close to 1, like {0.95,0.97,0.99}.
If you don't know them, try different coefficients depending on the average distance. If a distance is really big, you're likely to be not very confident about it.
Here is the way I would do it:
var sum = (cA*dA) + (cB*dB) + (cC*dC);
dA = cA*dA/sum;
dB = cB*dB/sum;
dC = cC*dC/sum;
xG = (xA*dA) + (xB*dB) + (xC*dC);
yG = (yA*dA) + (yB*dB) + (yC*dC);
zG = (zA*dA) + (zB*dB) + (zC*dC);
Basic, and not really smart, but it will do the job for some simple tasks.
EDIT
You can take any confidence coef you want in [0,inf[, but IMHO, restraining at [0,1] is a good idea to keep a realistic result.

Suggested algorithms/methods for laying out labels on an image

Given an image and a set of labels attached to particular points on the image, I'm looking for an algorithm to lay out the labels to the sides of the image with certain constraints (roughly same number of labels on each side, labels roughly equidistant, lines connecting the labels to their respective points with no lines crossing).
Now, an approximate solution can typically be found quite naively by ordering the labels by Y coordinate (of the point they refer to), as in this example (proof of concept only, please ignore accuracy or otherwise of actual data!).
Now to satisfy the condition of no crossings, some ideas that occurred to me:
use a genetic algorithm to find an ordering of labels with no crossovers;
use another method (e.g. dynamic programming algorithm) to search for such an ordering;
use one of the above algorithms, allowing for variations in the spacing as well as ordering, to find the solution that minimises number of crossings and variation from even spacing;
maybe there are criteria I can use to brute search through every possible ordering of the labels within certain criteria (do not re-order two labels if their distance is greater than X);
if all else fails, just try millions of random orderings/spacing offsets and take the one that gives the minimum crossings/spacing variation. (Advantage: straightforward to program and will probably find a good enough solution; slight disadvantage, though not a show-stopper: maybe can't then run it on the fly during the application to allow user to change layout/size of the image.)
Before I embark on one of these, I would just welcome some other people's input: has anybody else experience with a similar problem and have any information to report on the success/failure of any of the above methods, or if they have a better/simpler solution that isn't occurring to me? Thanks for your input!

Lucas Bradsheet's honours thesis Labelling Maps using Multi-Objective Evolutionary Algorithms has quite a good discussion of this.
First off, this paper defines usable metrics for a number of aspects of labelling quality.
For example, clarity (how obvious the mapping between sites and labels is): clarity(s)=rs+rs1/rt
(where rs is the distance between a site and its label and rt is the distance between a site and the closest other label).
It also has useful metrics for the conflicts between labels, sites and borders, as well as for measuring the density and symmetry of labels. Bradsheet then uses a multiple objective genetic algorithm to generate a "Pareto frontier" of feasible solutions. It also includes information about how he mutated the individuals, and some notes on improving the speed of the algorithm.
There's a lot of detail in it, and it should provide some good food for thought.

Let's forget about information design for a moment. This task brings back some memories of PCB routing algorithms. Actually, there are a lot of common requirements, including:
intersections optimization
size optimization
gaps optimization
So, it could be possible to turn the initial task into something similar to PCB routing.
There is a lot of information available, but I would suggest looking through Algorithmic studies on PCB routing by Tan Yan.
It provides a lot of details and dozens of hints.
Adaptation for the current task
The idea is to treat markers on the image and labels as two sets of pins and use escape routing to solve the task. Usually the PCB area is represented as an array of pins. Same can be done to the image with possible optimizations:
avoid low contrast areas
avoid text boxes if any
etc
So the task can be reduced to "routing in case of unused pins"
Final result can be really close to the requested style:
Algorithmic studies on PCB routing by Tan Yan is a good place to continue.
Additional notes
I can change the style of the drawing a little bit, in order to accentuate the similarity.
It should not be a big problem to do some reverse transformation, keeping the good look and readability.
Anyway, adepts of simplicity (like me, for example) can spend several minutes and invent something better (or something different):
As for me, curves do not look like a complete solution, at least on this stage. Anyway, I've just tried to show there is room for enhancements, so PCB routing approach can be considered as an option.

One option is to turn it into an integer programming problem.
Let's say you have n points and n corresponding labels distributed around the outside of the diagram.
The number of possible lines is n^2, so if we look at all possible pairs of lines there are fewer than n^4 intersections (if all possible lines were displayed).
In our integer programming problem we add the following constraints (to decide whether a line is switched on, i.e. displayed on the screen):
For each point on the diagram, exactly one of the n possible lines connecting to it is switched on.
For each label, exactly one of the n possible lines connecting to it is switched on.
For each pair of intersecting line segments line1 and line2, at most one of the two lines may be switched on.
Optionally, we can minimize the total distance of all the switched on lines. This enhances aesthetics.
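To see what these constraints mean on a tiny instance, here is a brute-force sketch that tries every point-to-label assignment, rejects those with crossing lines, and keeps the one with the smallest total squared length. It is only an illustration of the constraints, not the integer program itself; the function names are made up, and the crossing test (an orientation test via cross products) assumes no three points are collinear.

```python
from itertools import permutations

def ccw(a, b, c):
    """Twice the signed area of triangle a, b, c (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 properly crosses segment p3-p4."""
    return (ccw(p3, p4, p1) * ccw(p3, p4, p2) < 0 and
            ccw(p1, p2, p3) * ccw(p1, p2, p4) < 0)

def best_assignment(points, labels):
    """Return (cost, perm) where point i is connected to labels[perm[i]]."""
    best = None
    for perm in permutations(range(len(labels))):
        # One line per point and one per label: guaranteed by using a permutation.
        lines = [(points[i], labels[j]) for i, j in enumerate(perm)]
        # No two switched-on lines may intersect.
        if any(segments_cross(*lines[a], *lines[b])
               for a in range(len(lines)) for b in range(a)):
            continue
        cost = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for p, q in lines)
        if best is None or cost < best[0]:
            best = (cost, perm)
    return best
```

This enumerates n! assignments, so it is only usable for a handful of points; the integer program expresses the same constraints in a form a solver can handle for larger n.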
When all of these constraints hold at the same time, we have a solution:
The code below produced the above diagram for 24 random points.
Once you get beyond 15 or so points, the program starts to slow down noticeably.
I used the PuLP package with its default solver. I used PyGame for the display.
Here is the code:
__author__ = 'Robert'
import pygame
pygame.font.init()
import pulp
from random import randint

class Line():
    def __init__(self, p1, p2):
        self.p1 = p1
        self.p2 = p2
        self.length = (p1[0] - p2[0])**2 + (p1[1] - p2[1])**2
    def intersect(self, line2):
        #Copied some equations for wikipedia. Not sure if this is the best way to check intersection.
        x1, y1 = self.p1
        x2, y2 = self.p2
        x3, y3 = line2.p1
        x4, y4 = line2.p2
        xtop = (x1*y2-y1*x2)*(x3-x4)-(x1-x2)*(x3*y4-y3*x4)
        xbottom = (x1-x2)*(y3-y4) - (y1-y2)*(x3-x4)
        ytop = (x1*y2-y1*x2)*(y3-y4)-(y1-y2)*(x3*y4-y3*x4)
        ybottom = xbottom
        if xbottom == 0:
            #lines are parallel. Can only intersect if they are the same line. I'm not checking that however,
            #which means there could be a rare bug that occurs if more than 3 points line up.
            if self.p1 in (line2.p1, line2.p2) or self.p2 in (line2.p1, line2.p2):
                return True
            return False
        x = float(xtop) / xbottom
        y = float(ytop) / ybottom
        if min(x1, x2) <= x <= max(x1, x2) and min(x3, x4) <= x <= max(x3, x4):
            if min(y1, y2) <= y <= max(y1, y2) and min(y3, y4) <= y <= max(y3, y4):
                return True
        return False
def solver(lines):
    #returns best line matching
    lines = list(lines)
    prob = pulp.LpProblem("diagram labelling finder", pulp.LpMinimize)
    label_points = {} #a point at each label
    points = {} #points on the image
    line_variables = {}
    variable_to_line = {}
    for line in lines:
        point, label_point = line.p1, line.p2
        if label_point not in label_points:
            label_points[label_point] = []
        if point not in points:
            points[point] = []
        line_on = pulp.LpVariable("point{0}-point{1}".format(point, label_point),
            lowBound=0, upBound=1, cat=pulp.LpInteger) #variable controls if line used or not
        label_points[label_point].append(line_on)
        points[point].append(line_on)
        line_variables[line] = line_on
        variable_to_line[line_on] = line
    for lines_to_point in points.values():
        prob += sum(lines_to_point) == 1 #1 label to each point..
    for lines_to_label in label_points.values():
        prob += sum(lines_to_label) == 1 #1 point for each label.
    for i, line1 in enumerate(lines):
        for line2 in lines[:i]: #visit each unordered pair of lines once
            if line1.intersect(line2):
                line1_on = line_variables[line1]
                line2_on = line_variables[line2]
                prob += line1_on + line2_on <= 1 #only switch one on.
    #minimize length of switched on lines:
    prob += sum(i.length * line_variables[i] for i in lines)
    prob.solve()
    print(prob.solutionTime)
    print(pulp.LpStatus[prob.status]) #should say "Optimal"
    print(len(prob.variables()))
    for line_on, line in variable_to_line.items():
        if line_on.varValue > 0:
            yield line #yield the lines that are switched on
class Diagram():
    def __init__(self, num_points=20, width=700, height=800, offset=150):
        assert(num_points % 2 == 0) #if even then labels align nicer (-:
        self.background_colour = (255,255,255)
        self.width, self.height = width, height
        self.screen = pygame.display.set_mode((width, height))
        pygame.display.set_caption('Diagram Labeling')
        self.screen.fill(self.background_colour)
        self.offset = offset
        self.points = list(self.get_points(num_points))
        self.num_points = num_points
        self.font_size = min((self.height - 2 * self.offset)//num_points, self.offset//4)

    def get_points(self, n):
        for i in range(n):
            x = randint(self.offset, self.width - self.offset)
            y = randint(self.offset, self.height - self.offset)
            yield (x, y)

    def display_outline(self):
        w, h = self.width, self.height
        o = self.offset
        outline1 = [(o, o), (w - o, o), (w - o, h - o), (o, h - o)]
        pygame.draw.lines(self.screen, (0, 100, 100), True, outline1, 1)
        o = self.offset - self.offset//4
        outline2 = [(o, o), (w - o, o), (w - o, h - o), (o, h - o)]
        pygame.draw.lines(self.screen, (0, 200, 0), True, outline2, 1)

    def display_points(self, color=(100, 100, 0), radius=3):
        for point in self.points:
            pygame.draw.circle(self.screen, color, point, radius, 2)

    def get_label_heights(self):
        for i in range((self.num_points + 1)//2):
            yield self.offset + 2 * i * self.font_size

    def get_label_endpoints(self):
        for y in self.get_label_heights():
            yield (self.offset, y)
            yield (self.width - self.offset, y)

    def get_all_lines(self):
        for point in self.points:
            for end_point in self.get_label_endpoints():
                yield Line(point, end_point)

    def display_label_lines(self, lines):
        for line in lines:
            pygame.draw.line(self.screen, (255, 0, 0), line.p1, line.p2, 1)

    def display_labels(self):
        myfont = pygame.font.SysFont("Comic Sans MS", self.font_size)
        label = myfont.render("label", True, (155, 155, 155))
        for y in self.get_label_heights():
            self.screen.blit(label, (self.offset//4 - 10, y - self.font_size//2))
            pygame.draw.line(self.screen, (255, 0, 0), (self.offset - self.offset//4, y), (self.offset, y), 1)
        for y in self.get_label_heights():
            self.screen.blit(label, (self.width - 2*self.offset//3, y - self.font_size//2))
            pygame.draw.line(self.screen, (255, 0, 0), (self.width - self.offset + self.offset//4, y), (self.width - self.offset, y), 1)

    def display(self):
        self.display_points()
        self.display_outline()
        self.display_labels()
        #self.display_label_lines(self.get_all_lines())
        self.display_label_lines(solver(self.get_all_lines()))

diagram = Diagram()
diagram.display()
pygame.display.flip()
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
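
On the doubt raised in the intersect comment: a common alternative is to test segment intersection with orientation (cross product sign) checks, which avoids the division and also handles the collinear case. The following is only a sketch under the assumption that points are (x, y) tuples; the names orientation, on_segment and segments_intersect are illustrative and are not part of the code above.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def on_segment(p, q, r):
    """True if r, already known to be collinear with p and q, lies on segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0]) and
            min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1, p2, p3, p4):
    """True if segment p1-p2 and segment p3-p4 share at least one point."""
    o1 = orientation(p1, p2, p3)
    o2 = orientation(p1, p2, p4)
    o3 = orientation(p3, p4, p1)
    o4 = orientation(p3, p4, p2)
    # General case: each segment's endpoints lie on different sides of the other.
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other segment.
    if o1 == 0 and on_segment(p1, p2, p3):
        return True
    if o2 == 0 and on_segment(p1, p2, p4):
        return True
    if o3 == 0 and on_segment(p3, p4, p1):
        return True
    if o4 == 0 and on_segment(p3, p4, p2):
        return True
    return False
```

With integer pixel coordinates, as produced by randint, every comparison here is exact, so no floating point tolerance is needed. Like the original method, it reports segments that merely share an endpoint as intersecting.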

I think the actual solution to this problem lies on a slightly different layer. It doesn't seem like the right idea to start solving an algorithmic problem while totally ignoring information design. There is an interesting example found here
Let's identify some important questions:
How is the data best viewed?
Will it confuse people?
Is it readable?
Does it actually help to better understand the picture?
By the way, chaos is really confusing. We like order and predictability. There is no need to introduce additional informational noise to the initial image.
The readability of a graphical message is determined by the content and its presentation. Readability of a message involves the reader's ability to understand the style of text and pictures. You have that interesting algorithmic task because of the additional "noisy" approach. Remove the chaos and you will find a better solution :)
Please note, this is just a PoC. The idea is to use only horizontal lines with clear markers. Label placement is straightforward and deterministic. Several similar ideas can be proposed.
With such an approach you can easily balance left and right labels, avoid small vertical gaps between lines, provide optimal vertical density for labels, etc.
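
A minimal sketch of how deterministic horizontal placement could look, assuming points are (x, y) tuples, the image is `width` pixels wide, and labels sit in a side margin of `margin` pixels. The function name horizontal_leaders and its parameters are hypothetical and only illustrate the idea; this is not the PoC's actual implementation.

```python
def horizontal_leaders(points, width, margin):
    """Attach each point to the nearer side margin with a horizontal line.

    Returns (point, label_anchor) pairs ordered top to bottom. Horizontal
    leaders are parallel, so two of them can only overlap when they share
    the same y value, never cross.
    """
    leaders = []
    for x, y in sorted(points, key=lambda p: p[1]):
        # Points in the left half go to the left margin, the rest to the right.
        side_x = margin if x < width / 2 else width - margin
        leaders.append(((x, y), (side_x, y)))
    return leaders
```

Balancing the two sides or enforcing a minimum vertical gap between neighbouring labels would be further passes over the sorted list.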
EDIT
OK, let's see how the initial process may look.
User story: as a user I want important images to be annotated in order to simplify understanding and increase their explanatory value.
Important assumptions:
initial image is a primary object for the user
readability is a must
So the best possible solution is to have annotations and yet not have them. (I would really suggest spending some time reading about the theory of inventive problem solving.)
Basically, there should be no obstacles for the user to see the initial picture, but annotations should be right there when needed. It can be slightly confusing, sorry for that.
Do you think the intersection issue is the only one behind the following image?
Please note, the actual goal behind the developed approach is to provide two information flows (image and annotations) and help the user to understand everything as fast as possible. By the way, visual memory is also very important.
What is behind human vision:
Selective attention
Familiarity detection
Pattern detection
Do you want to break at least one of these mechanisms? I hope you don't. Because it will make the actual result not very user-friendly.
So what can distract me?
strange lines randomly distributed over the image (random geometric objects are very distracting)
non-uniform annotation placement and style
strange complex patterns resulting from the final merge of the image and the annotation layer
Why should my proposal be considered?
It has a simple pattern, so pattern detection will let the user stop noticing the annotations and see the picture instead
It has a uniform design, so familiarity detection will work too
It does not affect the initial image as much as other solutions because the lines have minimal width.
Lines are horizontal and anti-aliasing is not used, so it preserves more information and gives a clean result
Finally, it simplifies the routing algorithm a lot.
Some additional comments:
Do not use random points to test your algorithms; use simple but important cases. You'll see that automated solutions may sometimes fail dramatically.
I do not suggest using the approach I proposed as is. There are a lot of possible enhancements.
What I really suggest is to go one level up and do several iterations on the meta-level.
Grouping can be used to deal with the complex case mentioned by Robert King:
Or I can imagine for a second that some point is located slightly above its default location. But only for a second, because I do not want to break the main processing flow and affect other markers.
Thank you for reading.

You can find the center of your diagram, and then draw the lines from the points radially outward from the center. The only way you could have a crossing is if two of the points lie on the same ray, in which case you just shift one of the lines a bit one way, and shift the other a bit the other way, like so:
With only actual parts showing:
In case there are two or more points collinear with the center, you can shift the lines slightly to the side:
While this doesn't produce very good multisegment lines, it very clearly labels the diagram. Also, to make it more visually appealing, it may be better to pick a point for the center that is actually the center of your object, rather than just the center of the point set.
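
A small sketch of the radial idea, assuming points are (x, y) tuples and that every leader should end on a common circle around the center. The name radial_leaders, the `pad` parameter and the circle itself are assumptions made for illustration; the shifting of lines that share a ray is not implemented here.

```python
import math


def radial_leaders(points, pad=20.0, center=None):
    """Return (point, label_end) pairs with label_end on a circle around center.

    If no center is given, the centroid of the point set is used. Leaders on
    distinct rays from the same center cannot cross each other.
    """
    if center is None:
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
    else:
        cx, cy = center
    # The circle encloses every point, with `pad` of extra room for the labels.
    radius = max(math.hypot(x - cx, y - cy) for x, y in points) + pad
    leaders = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        if dist == 0:
            # A point exactly at the center has no direction; pick one arbitrarily.
            dx, dy, dist = 1.0, 0.0, 1.0
        end = (cx + radius * dx / dist, cy + radius * dy / dist)
        leaders.append(((x, y), end))
    return leaders
```

Passing `center` explicitly corresponds to the suggestion of using the center of the object rather than the center of the point set.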

I would add one more thing to your prototype; maybe it will be acceptable after this:
Iterate through every intersection and swap the labels of the two crossing lines; repeat until there are no intersections left.
This process is finite, because the number of states is finite and every swap reduces the sum of all line lengths, so no loop is possible.
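
The swapping procedure could be sketched as follows, assuming points and label positions are (x, y) tuples and that point i starts out attached to label i. The names uncross and properly_cross are illustrative. Only proper crossings (the segments cut through each other) trigger a swap, because that is the case in which the triangle inequality guarantees the total Euclidean length strictly decreases; collinear overlaps are an assumption left unhandled.

```python
def cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def properly_cross(p1, p2, p3, p4):
    """True if segments p1-p2 and p3-p4 cut through each other at an interior point."""
    d1 = cross(p1, p2, p3)
    d2 = cross(p1, p2, p4)
    d3 = cross(p3, p4, p1)
    d4 = cross(p3, p4, p2)
    return d1 * d2 < 0 and d3 * d4 < 0


def uncross(points, labels):
    """Return assign, where labels[assign[i]] is the label attached to points[i]."""
    assign = list(range(len(points)))
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(points)):
            for j in range(i + 1, len(points)):
                if properly_cross(points[i], labels[assign[i]],
                                  points[j], labels[assign[j]]):
                    # Swapping the two labels removes this crossing and
                    # shortens the total line length.
                    assign[i], assign[j] = assign[j], assign[i]
                    swapped = True
    return assign
```

A swap can create new crossings with other lines, which is why the outer loop repeats until a full pass finds nothing to swap.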

This problem can be cast as graph layout.
I recommend you look at e.g. the Graphviz library. I have not done any experiments, but believe that by expressing the points to be labeled and the labels themselves as nodes and the lead lines as edges, you would get good results.
You would have to express areas where labels should not go as "dummy" nodes not to be overlapped.
Graphviz has bindings for many languages.
Even if Graphviz does not have quite enough flexibility to do exactly what you need, the "Theory" section of that page has references for energy minimization and spring algorithms that can be applied to your problem. The literature on graph layout is enormous.

Position of connected points in space

A-B-C-D are 4 points. We define r = length(B-C), the angles ang1 = (A-B-C) and ang2 = (B-C-D), and the torsion angle tors1 = (A-B-C-D). What I really need to do is find the coordinates of C and D, given new values of r, ang1, ang2 and tors1.
The thing is that the points A and B are rigidly connected to each other, and points C and D are also connected to each other by a rigid connector, so to speak. That is, the distance C-D remains fixed and the distance A-B also remains fixed. There is no such rigid connection between the points B and C.
We have the old coordinates of the 4 points for some other set of (r,ang1,ang2,tors1) and we need to find the new coordinates when this defining set of variables changes to some arbitrary value.
I would be grateful for any helpful comments.
Thanks a lot.
I'm not allowed to post a picture because I'm a new user :(
Additional Info: An iterative solution is not going to be useful because I need to do this in a simulation "plenty of times O(10^6)".

I think the best way to approach this problem would be to think in terms of analytic geometry.
Each point A, B, C, D has some 3D coordinates (x, y, z) and you have some relationships between them. For example, the distance B-C being equal to r means that
r = sqrt[ (x_b - x_c)^2 + (y_b - y_c)^2 + (z_b - z_c)^2 ]
Once you define such relations it remains to solve the resulting system of equations for the unknown values of coordinates of the points you need to determine.
This is a general approach. If you describe the problem better (maybe with a picture?), it might be easy to find some efficient ways of solving such systems because of some special properties your problem has.

You haven't mentioned the coordinate system. Even if (r, a1, a2, t) don't change, the "coordinates" will change if the whole structure can be sent whirling off into space. So I'll make some assumptions:
Put B at the origin, C on the positive X axis and A in the XY plane with y > 0. If you don't know the distance AB, calculate it from the old coordinates. Likewise CD.
A: (-AB cos(a1), AB sin(a1), 0)
B: (0, 0, 0)
C: (r, 0, 0)
D: (r + CD cos(a2), CD sin(a2) cos(t), CD sin(a2) sin(t))
(Just watch out for sign conventions in the angles.)
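
To make the sign convention concrete, here is a sketch of the same placement in Python with a numerical check. It assumes ang1 and ang2 are the interior angles at B and C (the angle between BA and BC, and between CB and CD), which corresponds to the listing above with a1 and a2 read as the supplements of those angles. The function names place_points, angle and dihedral are illustrative, and the dihedral sign convention is the one implied by the coordinates chosen here.

```python
import math


def place_points(r, ang1, ang2, tors1, ab, cd):
    """Return A, B, C, D with B at the origin, C on +x and A in the XY plane (y > 0)."""
    a = (ab * math.cos(ang1), ab * math.sin(ang1), 0.0)
    b = (0.0, 0.0, 0.0)
    c = (r, 0.0, 0.0)
    d = (r - cd * math.cos(ang2),
         cd * math.sin(ang2) * math.cos(tors1),
         cd * math.sin(ang2) * math.sin(tors1))
    return a, b, c, d


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(u):
    return math.sqrt(dot(u, u))


def angle(p, q, s):
    """Interior angle at q between the directions q->p and q->s."""
    u, v = sub(p, q), sub(s, q)
    return math.acos(dot(u, v) / (norm(u) * norm(v)))


def dihedral(a, b, c, d):
    """Signed torsion angle A-B-C-D in (-pi, pi]."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2))
```

Because the result is closed form, each of the O(10^6) evaluations costs only a handful of trigonometric calls. Measuring the angles back with angle and dihedral is a quick way to confirm that the convention matches the one used in your simulation.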

You are describing a set of constraints.
What you need to do is check, for every constraint, whether it is still satisfied, and if not, calculate the most efficient way to make it correct again.
For instance, in the case of length b-c = r: if b-c is not r anymore, make it r again by moving both b and c towards or away from each other so that the constraint is met again.
Do this for every constraint, one by one.
Then repeat a few times until the system has stabilized again (i.e. all constraints are met).
That's it.
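
A rough sketch of this relaxation loop, restricted to distance constraints only (angle and torsion constraints would each need their own correction step, which is not shown). The names satisfy_distance and relax and the constraint format (i, j, target) are assumptions made for this example. Note that this is an iterative scheme, which the question says it would like to avoid.

```python
import math


def satisfy_distance(p, q, target):
    """Move p and q symmetrically along their connecting line so |p - q| == target."""
    delta = [qi - pi for pi, qi in zip(p, q)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist == 0:
        return p, q  # direction undefined; leave the pair untouched
    # Each endpoint takes half of the correction.
    corr = 0.5 * (dist - target) / dist
    new_p = [pi + corr * d for pi, d in zip(p, delta)]
    new_q = [qi - corr * d for qi, d in zip(q, delta)]
    return new_p, new_q


def relax(points, constraints, iterations=50):
    """Repeatedly enforce each (i, j, target) distance constraint in turn."""
    pts = [list(p) for p in points]
    for _ in range(iterations):
        for i, j, target in constraints:
            pts[i], pts[j] = satisfy_distance(pts[i], pts[j], target)
    return pts
```

Fixing one constraint usually disturbs its neighbours slightly, which is why several sweeps are needed before the system settles.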

You are asking for a solution to a nonlinear system of equations. For the mathematically inclined, I will write out the constraint equations:
Suppose you have positions of points A,B,C,D. We define vectors AB=A-B, etc., and furthermore, we use the notation nAB to denote the normalized vector AB/|AB|. With this notation, we have:
AB.AB = fixed
CD.CD = fixed
CB.CB = r*r
nAB.nCB = cos(ang1)
nDC.nBC = cos(ang2)
Let n = (nCB x nAB)/|nCB x nAB| and E = D - (DC.n) n // projection of D onto plane defined by ABC
nEC.nDC = cos(tors1)
nEC x nDC = sin(tors1) // not sure if your torsion angle is signed (if not, delete this)
where the dot (.) denotes dot product, and cross (x) denotes cross product.
Each point is defined by 3 coordinates, so there are 12 unknowns, and 6 constraint equations, leaving 6 degrees of freedom that are unconstrained. These are the 6 gauge DOFs from the translational and rotational invariance of the space.
Assuming you have old point positions A', B', C', and D', and you want to find a new solution which is "closest" (in the sense defined by the objective below) to those old positions, then you are solving an optimization problem:
minimize: AA'.AA' + BB'.BB' + CC'.CC' + DD'.DD'
subject to the 4-5 constraints above.
This optimization problem has no nice properties so you will want to use something like Conjugate Gradient descent to find a locally optimal solution with the starting guess being the old point positions. That is an iterative solution, which you said is unacceptable, but there is no direct solution unless you clarify your problem.
If this sounds good to you, I can elaborate on the nitty gritty of performing the numerical optimization.

This is a different solution than the one I gave already. Here I assume that the positions of A and B are not allowed to change (i.e. positions of A and B are constants), similar to Beta's solution. Note that there are still an infinite number of solutions, since we can rotate the structure around the axis defined by A-B and all your constraints are still satisfied.
Let the coordinates of A be A[0], A[1] and A[2], and similarly for B. You want explicit equations for C and D, as you mentioned in the response to Beta's solution, so here they are:
First find the position of C. As mentioned before, there are an infinite number of possibilities, so I will pick a good one for you.
Vector AB = A-B
Normalize(AB)
int best_i = 0;
for i = 1 to 2
    if |AB[i]| < |AB[best_i]|
        best_i = i
// best_i contains dimension in which AB is smallest in magnitude
Vector N = Cross(AB, unit_vec[best_i]) // A good normal vector to AB
Normalize(N)
Vector T = Cross(N, AB) // AB, N, and T form an orthonormal frame
Normalize(T) // redundant, but just in case
C = B + r*AB*cos(ang1) + r*N*sin(ang1)
// Assume s is the known, fixed distance between C and D
// Update the frame
Vector BC = B-C, Normalize(BC)
N = Cross(BC, T), Normalize(N)
// The component along BC fixes ang2; N and T carry the rotation by tors1 about BC
D = C + s*cos(ang2)*BC + s*sin(ang2)*(cos(tors1)*N +/- sin(tors1)*T)
That last plus or minus depends on how you define the orthonormal frame. Try one and see if it's what you want, otherwise it's the other sign. The notation above is pretty informal, but it gives a definite recipe for how to generate C and D from A, B, and your parameters. It also chooses a good C (which depends on a good, nondegenerate N). unit_vec[i] refers to the vector of all zeros, except for a 1 at index i. As usual, I have not tested the pseudocode above :)
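
As a cross-check of this recipe, here is a hedged Python sketch of the frame construction. It assumes ang1 and ang2 are interior angles at B and C, uses D = C + s*cos(ang2)*BC + s*sin(ang2)*(cos(tors1)*N + sin(tors1)*T) for the last step, and picks the plus sign so that the measured torsion matches the dihedral function below. All function names are illustrative, and the code is a sketch under these assumptions rather than a tested library routine.

```python
import math


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def add(u, v):
    return tuple(x + y for x, y in zip(u, v))


def scale(k, u):
    return tuple(k * x for x in u)


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u):
    n = math.sqrt(dot(u, u))
    return tuple(x / n for x in u)


def place_c_and_d(a, b, r, ang1, ang2, tors1, s):
    """Compute C and D from fixed A, B and the internal coordinates."""
    u = unit(sub(a, b))                              # unit vector from B to A
    i = min(range(3), key=lambda k: abs(u[k]))       # axis least aligned with u
    e = tuple(1.0 if k == i else 0.0 for k in range(3))
    n = unit(cross(u, e))                            # a normal to AB
    t = cross(n, u)                                  # completes the frame
    c = add(b, scale(r, add(scale(math.cos(ang1), u), scale(math.sin(ang1), n))))
    v = unit(sub(b, c))                              # unit vector from C to B
    n2 = unit(cross(v, t))                           # in plane ABC, normal to BC
    perp = add(scale(math.cos(tors1), n2), scale(math.sin(tors1), t))
    d = add(c, scale(s, add(scale(math.cos(ang2), v), scale(math.sin(ang2), perp))))
    return c, d


def angle(p, q, w):
    """Interior angle at q between the directions q->p and q->w."""
    x, y = sub(p, q), sub(w, q)
    return math.acos(dot(x, y) / math.sqrt(dot(x, x) * dot(y, y)))


def dihedral(a, b, c, d):
    """Signed torsion angle A-B-C-D in (-pi, pi]."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.atan2(math.sqrt(dot(b2, b2)) * dot(b1, n2), dot(n1, n2))
```

Measuring the result with angle and dihedral for a few known inputs is the easiest way to find out whether your torsion convention needs the other sign.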
