Related
I have a set of points W={(x1, y1), (x2, y2),..., (xn, yn)} on the 2D plane. Can you find an algorithm that takes these points as the input and returns a point (x, y) on the 2D plane which has the minimum sum of distances from the points in W? In other words, if
di = Euclidean_distance((x, y), (xi, yi))
I want to minimize:
d1 + d2 + ... + dn
The Problem
You're looking for the geometric median.
An Easy Solution
There is no closed-form solution to this problem, so iterative or probabilistic methods are used. The easiest way to find this is probably with Weiszfeld's algorithm:
We can implement this in Python as follows:
import numpy as np
from numpy.linalg import norm as npnorm
c_pt_old = np.random.rand(2)
c_pt_new = np.array([0,0])
while npnorm(c_pt_old-c_pt_new)>1e-6:
num = 0
denom = 0
for i in range(POINT_NUM):
dist = npnorm(c_pt_new-pts[i,:])
num += pts[i,:]/dist
denom += 1/dist
c_pt_old = c_pt_new
c_pt_new = num/denom
print(c_pt_new)
There's a chance that Weiszfeld's algorithm won't converge, so it might be best to run it several times from different starting points.
A General Solution
You can also find this using second-order cone programming (SOCP). In addition to solving your specific problem, this general formulation then allows you to easily add constraints and weightings, such as variable uncertainty in the location of each data point.
To do so, you create a number of indicator variables representing the distance between the proposed center point and the data points.
You then minimize the sum of the indicator variables. The result follows
import cvxpy as cp
import numpy as np
import matplotlib.pyplot as plt
#Generate random test data
POINT_NUM = 100
pts = np.random.rand(POINT_NUM,2)
c_pt = cp.Variable(2) #The center point we wish to locate
distances = cp.Variable(POINT_NUM) #Distance from the center point to each data point
#Generate constraints. These are used to hold distances.
constraints = []
for i in range(POINT_NUM):
constraints.append( cp.norm(c_pt-pts[i,:])<=distances[i] )
objective = cp.Minimize(cp.sum(distances))
problem = cp.Problem(objective,constraints)
optimal_value = problem.solve()
print("Optimal value = {0}".format(optimal_value))
print("Optimal location = {0}".format(c_pt.value))
plt.scatter(x=pts[:,0], y=pts[:,1], s=1)
plt.scatter(c_pt.value[0], c_pt.value[1], s=10)
plt.show()
SOCPs are available in a number of solvers including CPLEX, Elemental, ECOS, ECOS_BB, GUROBI, MOSEK, CVXOPT, and SCS.
I've tested and the two approaches give the same answers to within tolerance.
Weiszfeld, E. (1937). "Sur le point pour lequel la somme des distances de n points donnes est minimum". Tohoku Mathematical Journal. 43: 355–386.
If that point does not need to be from your sample, then the mean minimises the euclidean distance.
A third method would be to use a compact nonlinear programming formulation. An unconstrained NLP model would be:
min sum(i, ||x-p(i)|| )
This has just 2 variables (the coordinates of x).
There is a very good initial point available. Let p(i,c) be the coordinates of the data points. Then the mean is
m(c) = sum(i, p(i,c)) / n
where n is the number of data points. This point is often very close to the optimal value of x. So we can use m as an excellent initial point for x.
Some limited experiments indicate this approach is quite faster than a cone programming formulation for large n.
For details see Yet Another Math Programming Consultant - Finding the Central Point in a Point Cloud blog post.
My experimental data points looks like pieces of hyperbola. Below I provide a code (Matlab), which generates "dummy" data, which is very similar to original one:
function [x_out,y_out,alpha1,alpha2,ecK,offsetX,offsetY,branchDirection] = dummyGenerator(mu_alpha)
alpha_range=0.1;
numberPoint2Return=100; % number of points to return
ecK=10.^((rand(1)-0.5)*2*2); % eccentricity-related parameter
% slope of the first asimptote (radians)
alpha1 = ((rand(1)-0.5)*alpha_range+mu_alpha);
% slope of the first asimptote (radians)
alpha2 = -((rand(1)-0.5)*alpha_range+mu_alpha);
beta = pi-abs(alpha1-alpha2); % angle between asimptotes (radians)
branchDirection = datasample([0,1],1); % where branch directed
% up: branchDirection==0;
% down: branchDirection==1;
% generate branch
x = logspace(-3,2,numberPoint2Return*100)'; %over sampling
y = (tan(pi/2-beta)*x+ecK./x);
% rotate branch using branchDirection
theta = -(pi/2-alpha1)-pi*branchDirection;
% get rotation matrix
rotM = [ cos(theta), -sin(theta);
sin(theta), cos(theta) ];
% get rotated coordinates
XY1=[x,y]*rotM;
x1=XY1(:,1); y1=XY1(:,2);
% remove possible Inf
x1(~isfinite(y1))=[];
y1(~isfinite(y1))=[];
% add noise
y1=((rand(numel(y1),1)-0.5)+y1);
% downsampling
%x_out=linspace(min(x1),max(x1),numberPoint2Return)';
x_out=linspace(-10,10,numberPoint2Return)';
y_out=interp1(x1,y1,x_out,'nearest');
% randomize offset
offsetX=(rand(1)-0.5)*50;
offsetY=(rand(1)-0.5)*50;
x_out=x_out+offsetX;
y_out=y_out+offsetY;
end
Typical results are presented on figure:
The data has following important property: slopes of both asymptotes comes from the same distribution (just different signs), so for my fitting I have rather goo estimation for mu_alpha.
Now starts the problematic part. I try to fit these data points. The main idea of my approach is to find a rotation to obtain y=k*x+b/x shape and then just fit it.
I use the following code:
function Rsquare = fitFunction(x,y,alpha1,alpha2,ecK,offsetX,offsetY)
R=[];
for branchDirection=[0 1]
% translate back
xt=x-offsetX;
yt=y-offsetY;
% rotate back
theta = (pi/2-alpha1)+pi*branchDirection;
rotM = [ cos(theta), -sin(theta);
sin(theta), cos(theta) ];
XY1=[xt,yt]*rotM;
x1=XY1(:,1); y1=XY1(:,2);
% get fitted values
beta = pi-abs(alpha1-alpha2);
%xf = logspace(-3,2,10^3)';
y1=y1(x1>0);
x1=x1(x1>0);
%x1=x1-min(x1);
xf=sort(x1);
yf=(tan(pi/2-beta)*xf+ecK./xf);
R(end+1)=sum((xf-x1).^2+(yf-y1).^2);
end
Rsquare=min(R);
end
Unfortunately this code works not good, very often I have bad results, even when I use known(from simulation) initial parameters.
Could You help me to find a good solution for such fitting problem?
UPDATE:
I find a solution (see Answer), but
I still have a small problem - my estimation of aparameter is bad, sometimes I did no have good fits because of this reason.
Could You suggest some ideas how to estimate a from experimental point?
I found the main problem (it was my brain as usually)! I did not know about general equation of hyperbola. So equation for my hyperbolas are:
((x-x0)/a).^2-((y-y0)/b).^2=-1
So ,we may not take care about sign, then I may use the following code:
mu_alpha=pi/6;
[x,y,alpha1,alpha2,ecK,offsetX,offsetY,branchDirection] = dummyGenerator(mu_alpha);
% hyperb=#(alpha,a,x0,y0) tan(alpha)*a*sqrt(((x-x0)/a).^2+1)+y0;
hyperb=#(x,P) tan(P(1))*P(2)*sqrt(((x-P(3))./P(2)).^2+1)+P(4);
cost =#(P) fitFunction(x,y,P);
x0=mean(x);
y0=mean(y);
a=(max(x)-min(x))./20;
P0=[mu_alpha,a,x0,y0];
[P,fval] = fminsearch(cost,P0);
hold all
plot(x,y,'-o')
plot(x,hyperb(x,P))
function Rsquare = fitFunction(x,y,P)
%x=sort(x);
yf=tan(P(1))*P(2)*sqrt(((x-P(3))./P(2)).^2+1)+P(4);
Rsquare=sum((yf-y).^2);
end
P.S. LaTex tags did not work for me
I am doing a clustering task and I have a distance matrix. I wish to visualize this distance matrix as a 2D graph. Please let me know if there is any way to do it online or in programming languages like R or python.
My distance matrix is as follows,
I used the classical Multidimensional scaling functionality (in R) and obtained a 2D plot that looks like:
But What I am looking for is a graph with nodes and weighted edges running between them.
Possibility 1
I assume, that you want a 2dimensional graph, where distances between nodes positions are the same as provided by your table.
In python, you can use networkx for such applications. In general there are manymethods of doing so, remember, that all of them are just approximations (as in general it is not possible to create a 2 dimensional representataion of points given their pairwise distances) They are some kind of stress-minimizatin (or energy-minimization) approximations, trying to find the "reasonable" representation with similar distances as those provided.
As an example you can consider a four point example (with correct, discrete metric applied):
p1 p2 p3 p4
---------------
p1 0 1 1 1
p2 1 0 1 1
p3 1 1 0 1
p4 1 1 1 0
In general, drawing actual "graph" is redundant, as you have fully connected one (each pair of nodes is connected) so it should be sufficient to draw just points.
Python example
import networkx as nx
import numpy as np
import string
dt = [('len', float)]
A = np.array([(0, 0.3, 0.4, 0.7),
(0.3, 0, 0.9, 0.2),
(0.4, 0.9, 0, 0.1),
(0.7, 0.2, 0.1, 0)
])*10
A = A.view(dt)
G = nx.from_numpy_matrix(A)
G = nx.relabel_nodes(G, dict(zip(range(len(G.nodes())),string.ascii_uppercase)))
G = nx.to_agraph(G)
G.node_attr.update(color="red", style="filled")
G.edge_attr.update(color="blue", width="2.0")
G.draw('distances.png', format='png', prog='neato')
In R you can try multidimensional scaling
# Classical MDS
# N rows (objects) x p columns (variables)
# each row identified by a unique row name
d <- dist(mydata) # euclidean distances between the rows
fit <- cmdscale(d,eig=TRUE, k=2) # k is the number of dim
fit # view results
# plot solution
x <- fit$points[,1]
y <- fit$points[,2]
plot(x, y, xlab="Coordinate 1", ylab="Coordinate 2",
main="Metric MDS", type="n")
text(x, y, labels = row.names(mydata), cex=.7)
Possibility 2
You just want to draw a graph with labeled edges
Again, networkx can help:
import networkx as nx
# Create a graph
G = nx.Graph()
# distances
D = [ [0, 1], [1, 0] ]
labels = {}
for n in range(len(D)):
for m in range(len(D)-(n+1)):
G.add_edge(n,n+m+1)
labels[ (n,n+m+1) ] = str(D[n][n+m+1])
pos=nx.spring_layout(G)
nx.draw(G, pos)
nx.draw_networkx_edge_labels(G,pos,edge_labels=labels,font_size=30)
import pylab as plt
plt.show()
Multidimensional scaling (MDS) is exactly what you want. See here and here for more.
You did not mentioned if you want a 2 dimensional graph or not. I suppose that you want to build a graph on 2 dimensions due to the fact that you need that for visualization. Considering that you have to be aware that for the most of the graphs this is simply not possible.
What can be probably done is to approximate somehow the values from distance matrix, something like small values to have relative small edges and big values to have a relative big length.
With all previous considerations one option would be graphviz. See neato function.
In general what you are interested in is force-directed drawing. See wikipedia for further reference.
You can use d3js Force Directed Graph and configure distance between nodes. d3js force layout has some clustering capability to separate nodes with similar distances. Here's an example with values as distance between nodes:
http://vida.io/documents/SyT7DREdQmGSpsBkK
Another way to visualize is to use same distance between nodes but different line thickness. In that case, you'd want to calculate stroke-width based on values:
.style("stroke-width", function(d) { return Math.sqrt(d.value / 50); });
Given an image and a set of labels attached to particular points on the image, I'm looking for an algorithm to lay out the labels to the sides of the image with certain constraints (roughly same number of labels on each side, labels roughly equidistant, lines connecting the labels to their respective points with no lines crossing).
Now, an approximate solution can typically be found quite naively by ordering the labels by Y coordinate (of the point they refer to), as in this example (proof of concept only, please ignore accuracy or otherwise of actual data!).
Now to satisfy the condition of no crossings, some ideas that occurred to me:
use a genetic algorithm to find an ordering of labels with no crossovers;
use another method (e.g. dynamic programming algorithm) to search for such an ordering;
use one of the above algorithms, allowing for variations in the spacing as well as ordering, to find the solution that minimises number of crossings and variation from even spacing;
maybe there are criteria I can use to brute search through every possible ordering of the labels within certain criteria (do not re-order two labels if their distance is greater than X);
if all else fails, just try millions of random orderings/spacing offsets and take the one that gives the minimum crossings/spacing variation. (Advantage: straightforward to program and will probably find a good enough solution; slight disadvantage, though not a show-stopper: maybe can't then run it on the fly during the application to allow user to change layout/size of the image.)
Before I embark on one of these, I would just welcome some other people's input: has anybody else experience with a similar problem and have any information to report on the success/failure of any of the above methods, or if they have a better/simpler solution that isn't occurring to me? Thanks for your input!
Lucas Bradsheet's honours thesis Labelling Maps using Multi-Objective Evolutionary Algorithms has quite a good discussion of this.
First off, this paper creates usable metrics for a number of metrics of labelling quality.
For example, clarity (how obvious the mapping between sites and labels was): clarity(s)=rs+rs1/rt
where rs is the distance between a site and its label and rt is the distance between a site and there closest other label).
It also has useful metrics for the conflicts between labels, sites and borders, as well as for measuring the density and symmetry of labels. Bradsheet then uses a multiple objective genetic algorithm to generate a "Pareto frontier" of feasible solutions. It also includes information about how he mutated the individuals, and some notes on improving the speed of the algorithm.
There's a lot of detail in it, and it should provide some good food for thought.
Let's forget about information design for a moment. This tasks recalls some memories related to PCB routing algorithms. Actually there are a lot of common requirements, including:
intersections optimization
size optimization
gaps optimization
So, it could be possible to turn the initial task into something similar to PCB routing.
There are a lot of information available, but I would suggest to look through Algorithmic studies on PCB routing by Tan Yan.
It provides a lot of details and dozens of hints.
Adaptation for the current task
The idea is to treat markers on the image and labels as two sets of pins and use escape routing to solve the task. Usually the PCB area is represented as an array of pins. Same can be done to the image with possible optimizations:
avoid low contrast areas
avoid text boxes if any
etc
So the task can be reduced to "routing in case of unused pins"
Final result can be really close to the requested style:
Algorithmic studies on PCB routing by Tan Yan is a good place to continue.
Additional notes
I chn change the style of the drawing a little bit, in order to accentuate the similarity.
It should not be a big problem to do some reverse transformation, keeping the good look and readability.
Anyway, adepts of simplicity (like me, for example) can spend several minutes and invent something better (or something different):
As for me, curves do not look like a complete solution, at least on this stage. Anyway, I've just tried to show there is room for enhancements, so PCB routing approach can be considered as an option.
One option is to turn it into an integer programming problem.
Lets say you have n points and n corresponding labels distributed around the outside of the diagram.
The number of possible lines is n^2, if we look at all possible intersections, there are less than n^4 intersections (if all possible lines were displayed).
In our integer programming problem we add the following constraints:
(to decide if a line is switched on (i.e. displayed to the screen) )
For each point on the diagram, only one of the possible n lines
connecting to it is to be switched on.
For each label, only one of the possible n lines connecting to it is
to be switched on.
For each pair of intersecting line segments line1 and line2, only
zero or one of these lines may be switched on.
Optionally, we can minimize the total distance of all the switched on lines. This enhances aesthetics.
When all of these constraints hold at the same time, we have a solution:
The code below produced the above diagram for 24 random points.
Once You start to get more than 15 or so points, the run time of the program will start to slow.
I used the PULP package with its default solver. I used PyGame for the display.
Here is the code:
__author__ = 'Robert'
import pygame
pygame.font.init()
import pulp
from random import randint
class Line():
def __init__(self, p1, p2):
self.p1 = p1
self.p2 = p2
self.length = (p1[0] - p2[0])**2 + (p1[1] - p2[1])**2
def intersect(self, line2):
#Copied some equations for wikipedia. Not sure if this is the best way to check intersection.
x1, y1 = self.p1
x2, y2 = self.p2
x3, y3 = line2.p1
x4, y4 = line2.p2
xtop = (x1*y2-y1*x2)*(x3-x4)-(x1-x2)*(x3*y4-y3*x4)
xbottom = (x1-x2)*(y3-y4) - (y1-y2)*(x3-x4)
ytop = (x1*y2-y1*x2)*(y3-y4)-(y1-y2)*(x3*y4-y3*x4)
ybottom = xbottom
if xbottom == 0:
#lines are parallel. Can only intersect if they are the same line. I'm not checking that however,
#which means there could be a rare bug that occurs if more than 3 points line up.
if self.p1 in (line2.p1, line2.p2) or self.p2 in (line2.p1, line2.p2):
return True
return False
x = float(xtop) / xbottom
y = float(ytop) / ybottom
if min(x1, x2) <= x <= max(x1, x2) and min(x3, x4) <= x <= max(x3, x4):
if min(y1, y2) <= y <= max(y1, y2) and min(y3, y4) <= y <= max(y3, y4):
return True
return False
def solver(lines):
#returns best line matching
lines = list(lines)
prob = pulp.LpProblem("diagram labelling finder", pulp.LpMinimize)
label_points = {} #a point at each label
points = {} #points on the image
line_variables = {}
variable_to_line = {}
for line in lines:
point, label_point = line.p1, line.p2
if label_point not in label_points:
label_points[label_point] = []
if point not in points:
points[point] = []
line_on = pulp.LpVariable("point{0}-point{1}".format(point, label_point),
lowBound=0, upBound=1, cat=pulp.LpInteger) #variable controls if line used or not
label_points[label_point].append(line_on)
points[point].append(line_on)
line_variables[line] = line_on
variable_to_line[line_on] = line
for lines_to_point in points.itervalues():
prob += sum(lines_to_point) == 1 #1 label to each point..
for lines_to_label in label_points.itervalues():
prob += sum(lines_to_label) == 1 #1 point for each label.
for line1 in lines:
for line2 in lines:
if line1 > line2 and line1.intersect(line2):
line1_on = line_variables[line1]
line2_on = line_variables[line2]
prob += line1_on + line2_on <= 1 #only switch one on.
#minimize length of switched on lines:
prob += sum(i.length * line_variables[i] for i in lines)
prob.solve()
print prob.solutionTime
print pulp.LpStatus[prob.status] #should say "Optimal"
print len(prob.variables())
for line_on, line in variable_to_line.iteritems():
if line_on.varValue > 0:
yield line #yield the lines that are switched on
class Diagram():
def __init__(self, num_points=20, width=700, height=800, offset=150):
assert(num_points % 2 == 0) #if even then labels align nicer (-:
self.background_colour = (255,255,255)
self.width, self.height = width, height
self.screen = pygame.display.set_mode((width, height))
pygame.display.set_caption('Diagram Labeling')
self.screen.fill(self.background_colour)
self.offset = offset
self.points = list(self.get_points(num_points))
self.num_points = num_points
self.font_size = min((self.height - 2 * self.offset)//num_points, self.offset//4)
def get_points(self, n):
for i in range(n):
x = randint(self.offset, self.width - self.offset)
y = randint(self.offset, self.height - self.offset)
yield (x, y)
def display_outline(self):
w, h = self.width, self.height
o = self.offset
outline1 = [(o, o), (w - o, o), (w - o, h - o), (o, h - o)]
pygame.draw.lines(self.screen, (0, 100, 100), True, outline1, 1)
o = self.offset - self.offset//4
outline2 = [(o, o), (w - o, o), (w - o, h - o), (o, h - o)]
pygame.draw.lines(self.screen, (0, 200, 0), True, outline2, 1)
def display_points(self, color=(100, 100, 0), radius=3):
for point in self.points:
pygame.draw.circle(self.screen, color, point, radius, 2)
def get_label_heights(self):
for i in range((self.num_points + 1)//2):
yield self.offset + 2 * i * self.font_size
def get_label_endpoints(self):
for y in self.get_label_heights():
yield (self.offset, y)
yield (self.width - self.offset, y)
def get_all_lines(self):
for point in self.points:
for end_point in self.get_label_endpoints():
yield Line(point, end_point)
def display_label_lines(self, lines):
for line in lines:
pygame.draw.line(self.screen, (255, 0, 0), line.p1, line.p2, 1)
def display_labels(self):
myfont = pygame.font.SysFont("Comic Sans MS", self.font_size)
label = myfont.render("label", True, (155, 155, 155))
for y in self.get_label_heights():
self.screen.blit(label, (self.offset//4 - 10, y - self.font_size//2))
pygame.draw.line(self.screen, (255, 0, 0), (self.offset - self.offset//4, y), (self.offset, y), 1)
for y in self.get_label_heights():
self.screen.blit(label, (self.width - 2*self.offset//3, y - self.font_size//2))
pygame.draw.line(self.screen, (255, 0, 0), (self.width - self.offset + self.offset//4, y), (self.width - self.offset, y), 1)
def display(self):
self.display_points()
self.display_outline()
self.display_labels()
#self.display_label_lines(self.get_all_lines())
self.display_label_lines(solver(self.get_all_lines()))
diagram = Diagram()
diagram.display()
pygame.display.flip()
running = True
while running:
for event in pygame.event.get():
if event.type == pygame.QUIT:
running = False
I think an actual solution of this problem is on the slightly different layer. It doesn't seem to be right idea to start solving algorithmic problem totally ignoring Information design. There is an interesting example found here
Let's identify some important questions:
How is the data best viewed?
Will it confuse people?
Is it readable?
Does it actually help to better understand the picture?
By the way, chaos is really confusing. We like order and predictability. There is no need to introduce additional informational noise to the initial image.
The readability of a graphical message is determined by the content and its presentation. Readability of a message involves the reader’s ability to understand the style of text and pictures. You have that interesting algorithmic task because of the additional "noisy" approach. Remove the chaos -- find better solution :)
Please note, this is just a PoC. The idea is to use only horizontal lines with clear markers. Labels placement is straightforward and deterministic. Several similar ideas can be proposed.
With such approach you can easily balance left-right labels, avoid small vertical gaps between lines, provide optimal vertical density for labels, etc.
EDIT
Ok, let's see how initial process may look.
User story: as a user I want important images to be annotated in order to simplify understanding and increase it's explanatory value.
Important assumptions:
initial image is a primary object for the user
readability is a must
So, the best possible solution is to have annotations but do not have them. (I would really suggest to spend some time reading about the theory of inventive problem solving).
Basically, there should be no obstacles for the user to see the initial picture, but annotations should be right there when needed. It can be slightly confusing, sorry for that.
Do you think intersections issue is the only one behind the following image?
Please note, the actual goal behind the developed approach is to provide two information flows (image and annotations) and help the user to understand everything as fast as possible. By the way, vision memory is also very important.
What are behind human vision:
Selective attention
Familiarity detection
Pattern detection
Do you want to break at least one of these mechanisms? I hope you don't. Because it will make the actual result not very user-friendly.
So what can distract me?
strange lines randomly distributed over the image (random geometric objects are very distractive)
not uniform annotations placement and style
strange complex patterns as a result of final merge of the image and the annotation layer
Why my proposal should be considered?
It has simple pattern, so pattern detection will let the user stop noticing annotations, but see the picture instead
It has uniform design, so familiarity detection will work too
It does not affect initial image so much as other solutions because lines have minimal width.
Lines are horizontal, anti-aliasing is not used, so it saves more information and provides clean result
Finally, it does simplify routing algorithm a lot.
Some additional comments:
Do not use random points to test your algorithms, use simple but yet important cases. You'll see automated solutions sometimes may fail dramatically.
I do not suggest to use approach proposed by me as is. There are a lot of possible enhancements.
What I'm really suggest is to go one level up and do several iterations on the meta-level.
Grouping can be used to deal with the complex case, mentioned by Robert King:
Or I can imagine for a second some point is located slightly above it's default location. But only for a second, because I do not want to break the main processing flow and affect other markers.
Thank you for reading.
You can find the center of your diagram, and then draw the lines from the points radially outward from the center. The only way you could have a crossing is if two of the points lie on the same ray, in which case you just shift one of the lines a bit one way, and shift the other a bit the other way, like so:
With only actual parts showing:
In case there are two or more points colinear with the center, you can shift the lines slightly to the side:
While this doen't produce very good multisegment line things, it very clearly labels the diagram. Also, to make it more fisually appealing, it may be better to pick a point for the center that is actually the center of your object, rather than just the center of the point set.
I would add one more thing to your prototype - may be it will be acceptable after this:
Iterate through every intersection and swap labels, repeat until there are intersections.
This process is finite, because number of states is finite and every swap reduces sum of all line lengths - so no loop is possible.
This problem can be cast as graph layout.
I recommend you look at e.g. the Graphviz library. I have not done any experiments, but believe that by expressing the points to be labeled and the labels themselves as nodes and the lead lines as edges, you would get good results.
You would have to express areas where labels should not go as "dummy" nodes not to be overlapped.
Graphvis has bindings for many languages.
Even if Graphviz does not have quite enough flexibility to do exactly what you need, the "Theory" section of that page has references for energy minimization and spring algorithms that can be applied to your problem. The literature on graph layout is enormous.
Given a set of points, what's the fastest way to fit a parabola to them? Is it doing the least squares calculation or is there an iterative way?
Thanks
Edit:
I think gradient descent is the way to go. The least squares calculation would have been a little bit more taxing (having to do qr decomposition or something to keep things stable).
If the points have no error associated, you may interpolate by three points. Otherwise least squares or any equivalent formulation is the way to go.
I recently needed to find a parabola that passes through 3 points.
suppose you have (x1,y1), (x2,y2) and (x3,y3) and you want the parabola
y-y0 = a*(x-x0)^2
to pass through them: find y0, x0, and a.
You can do some algebra and get this solution (providing the points aren't all on a line) :
let c = (y1-y2) / (y2-y3)
x0 = ( -x1^2 + x2^2 + c*( x2^2 - x3^2 ) ) / (2.0*( -x1+x2 + c*x2 - c*x3 ))
a = (y1-y2) / ( (x1-x0)^2 - (x2-x0)^2 )
y0 = y1 - a*(x1-x0)^2
Note in the equation for c if y2==y3 then you've got a problem. So in my algorithm I check for this and swap say x1, y1 with x2, y2 and then proceed.
hope that helps!
Paul Probert
A calculated solution is almost always faster than an iterative solution. The "exception" would be for low iteration counts and complex calculations.
I would use the least squares method. I've only every coded it for linear regression fits but it can be used for parabolas (I had reason to look it up recently - sources included an old edition of "Numerical Recipes" Press et al; and "Engineering Mathematics" Kreyzig).
ALGORITHM FOR PARABOLA
Read no. of data points n and order of polynomial Mp .
Read data values .
If n< Mp
[ Regression is not possible ]
stop
else
continue ;
Set M=Mp + 1 ;
Compute co-efficient of C-matrix .
Compute co-efficient of B-matrix .
Solve for the co-efficients
a1,a2,. . . . . . . an .
Write the co-efficient .
Estimate the function value at the glren of independents variables .
Using the free arbitrary accuracy math program "PARI" (for Mac or PC):
Here is how I would fit a parabola to a set of 641 points,
and I also show how to find the minimum of that parabola:
Set a high number of digits of precision:
\p 300
Write the data points to a text file separated by one space
for each data point
(use ASCII characters in base ten, no space at file start or file end, and no returns, write extremely large or small floating points as for example
"9.0E-23" but not "9.0D-23" ).
make a string to point to that file:
fileone="./desktop/data.txt"
read that file into PARI using the following instructions:
fileopen(fileone,r)
readsplit(file) = my(cmd);cmd="perl -ne \"chomp; print '[' . join(',', split(/ +/)) . ']\n';\"";eval(externstr(Str(cmd," ",file)))
readsplit(fileone)
Label that data with a name:
in = %
V = in[1]
Define a least squares fit function:
lsf(X,Y,n) = my(M=matrix(#X,n+1,i,j,X[i]^(j-1)));fit=Polrev(matsolve(M~*M,M~*Y~))
Apply that lsf function to your 641 data points:
lsf([-320..320],V, 2)
Then if you want to show the minimum of that parabolic fit, enter:
xextreme = solve (x=-1000,1000,eval(deriv(fit)));print (xextreme*(124.5678-123.5678)/640+(124.5678+123.5678)/2);x=xextreme;print(eval(fit))
(I had to adjust for my particular x-axis scaling before the "print" statement in that command line above).
(Note: A sacrifice made to simplify this algorithm
causes it to work only
when the data set has equally spaced x-axis coordinates.)
I was worried that my last post
was too compact to follow and
too hard to convert to other environments.
I would like to show here how to solve the
generalized problem of parabolic data fitting explicitly
without specialized matrix math terminology;
and so that each multiplication, division,
subtraction and addition can be seen at once.
To save ink this fit reparameterizes the x-axis as evenly
spaced points centered on zero
so that odd powered sums all get eliminated
(saving a lot of space and time),
so the x-coordinates of the N data points
are effectively labeled by points
of this vector: X=[-(N-1)/2..(N-1)/2].
For example "xextreme" will be returned
versus those integer indices
and so (if desired) a simple (consumes very little CPU time)
linear transformation must be applied after the algorithm below
to get it versus your problem's particular x-axis labels.
This is written in the language of
the free program "PARI" but all the
commands are simple to translate to any language.
Step 1: assign a label to the y-axis data:
? V=[5,2,1,2,5]
"PARI" confirms that entry:
%280 = [5, 2, 1, 2, 5]
Then type in the following processing algorithm
which calculates a best fit parabola
through any y-axis data set with constant x-axis separation:
? g=#V;h=(g-1)*g*(g+1)/3;i=h*(3*g*g-7)/5;\
a=sum(i=1,g,V[i]);b=sum(i=1,g,(2*i-1-g)*V[i]);c=sum(i=1,g,(2*i-1-g)*(2*i-1-g)*V[i]);\
A=matdet([a,c;h,i])/matdet([g,h;h,i]);B=b/h*2;C=matdet([g,h;a,c])/matdet([g,h;h,i])*4;\
xextreme=-B/(2*C);yextreme=-B*B/(4*C)+A;fit=Polrev([A,B,C]);\
print("\n","y of extreme is ",yextreme,"\n","which occurs this many data points from center of data: ",xextreme)
(Note for non-PARI users:
the command "matdet([a,c;h,i])"
is just another way of entering "a*i-c*h")
Those commands then produce the following screen output:
y of extreme is 1
which occurs this many data points from center of data: 0
The algorithm stores the polynomial of the fit in the variable "fit":
? fit
%282 = x^2 + 1
?
(Note that to make that algorithm short
the x-axis labels are assigned as X=[-(N-1)/2..(N-1)/2],
thus they are X=[-2,-1,0,1,2]
To correct that
for the same polynomial as parameterized
by an x-axis coordinate data set of say X=[−1,0,1,2,3]:
just apply a simple linear transform, in this case:
"x^2 + 1" --> "(t - 1)^2 + 1".)