Dividing cubic spline computation - algorithm

I have n points (20k to 30k) in 3D that I want to interpolate using a cubic spline. The problem is that I will not have all of these points at the same time; they will be sent to me, so I don't want to wait until I have received all of them to start the interpolation.
As a result, I chose to split the points into subsets: interpolate the first subset, use it, and in the meantime start interpolating the next subset, and so on
(split the points into subsets of a few hundred points n1, n2, ... and find the splines for each subset so that the result is identical to the spline curve through all n points).
I thought that overlapping these subsets during the computation would be enough to do that, but it seems not. What do you suggest to solve this?

You can start the interpolation at any time, using the available points (presumably in the correct order!). Cubic spline interpolation is a very stable process, and when you add more points, most of the curve remains unchanged.
If your concern is to avoid redoing the whole computation several times, I guess it is enough to work on several sections with some overlap (say 20 points) and discard the results for the 10 outermost points at each end of every section.

Interpolation splines are calculated over the whole point sequence, so the results for two separate halves with overlap would be slightly different. Note, for example, that the boundary condition at the last point of the first spline might be a predefined bias (slope) or zero curvature, while the same part of the second spline is calculated to provide continuity.
You can try to calculate some kind of smooth transition for the overlapping region.
Edit
After the question update: I don't see the relation between parallel processing and your problem.
Instead, you can connect the subranges with C1 continuity:
Calculate the spline interpolation for the first set of points. Use the free-end condition (zero curvature). Remember the bias (linear coefficient, i.e. the slope) at the terminal point.
For every subsequent set, calculate the spline interpolation using a predefined starting bias, taken from the previous set, and the zero-curvature end condition again.
BTW, spline interpolation for thousands of points should be very fast (it is a linear-time algorithm). Is it really the bottleneck?
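The chaining described above can be sketched in plain Python for one coordinate as a function of a parameter (for 3D points one would presumably run it once per coordinate). The function names and the convention that consecutive chunks share their boundary knot are assumptions made for this illustration; the pieces join with C1 continuity only, so the result is close to, but not identical to, the spline through all points.

```python
def spline_second_derivs(x, y, start_slope=None):
    """Second derivatives M[i] of the interpolating cubic spline at the knots.

    start_slope=None gives zero curvature at the first knot (free end);
    otherwise the first derivative at x[0] is forced to start_slope.
    The last knot always gets zero curvature.
    """
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    d = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]
    # Tridiagonal system: sub*M[i-1] + diag*M[i] + sup*M[i+1] = rhs
    sub, diag, sup, rhs = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    if start_slope is not None:
        diag[0], sup[0] = 2.0 * h[0], h[0]
        rhs[0] = 6.0 * (d[0] - start_slope)
    for i in range(1, n - 1):
        sub[i], diag[i], sup[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * (d[i] - d[i - 1])
    # Thomas algorithm: forward elimination, then back substitution (linear time)
    for i in range(1, n):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * n
    m[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]
    return m


def slope_at_start(x, y, m):
    h = x[1] - x[0]
    return (y[1] - y[0]) / h - h * (2.0 * m[0] + m[1]) / 6.0


def slope_at_end(x, y, m):
    h = x[-1] - x[-2]
    return (y[-1] - y[-2]) / h + h * (m[-2] + 2.0 * m[-1]) / 6.0


def chained_splines(chunks):
    """chunks: list of (x, y) lists; each chunk starts at the previous chunk's last knot."""
    slope = None  # the first chunk uses the free-end condition on both sides
    pieces = []
    for x, y in chunks:
        m = spline_second_derivs(x, y, slope)
        pieces.append((x, y, m))
        slope = slope_at_end(x, y, m)  # handed to the next chunk as its starting slope
    return pieces
```

If SciPy is available, `scipy.interpolate.CubicSpline` with a `bc_type` such as `((1, slope), (2, 0.0))` should express the same boundary conditions without the hand-written solver.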

Related

Interpolating missing contour lines between existing contour lines

Contour lines (aka isolines) are curves that trace constant values across a 2D scalar field. For example, in a geographical map you might have contour lines to illustrate the elevation of the terrain by showing where the elevation is constant. In this case, let's store contour lines as lists of points on the map.
Suppose you have a map that has several contour lines at known elevations, and otherwise you know nothing about the elevations of the map. What algorithm would you use to fill in additional contour lines to approximate the unknown elevations of the map, assuming the landscape is continuous and doesn't do anything surprising?
It is easy to find advice about interpolating the elevation of an individual point using contour lines. There are also algorithms like Marching Squares for turning point elevations into contour lines, but none of these exactly captures this use case. We don't need the elevation of any particular point; we just want the contour lines. Certainly we could solve this problem by filling an array with estimated elevations and then using Marching Squares to estimate the contour lines based on the array, but the two steps of that process seem unnecessarily expensive and likely to introduce artifacts. Surely there is a better way.
IMO, just about all methods will amount to somehow reconstructing the 3D surface by interpolation, even if implicitly.
You may try flattening the curves (turning them into polylines) and triangulating the polygons that they define. (There will be a step of closing the curves that end on the border of the domain.)
By intersecting the triangles with a new level (using linear interpolation along the sides), you will obtain new polylines corresponding to new isocurves. Notice that the intersections with the old levels recreate the old polylines, which is sound.
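As a rough illustration of the intersection step, the sketch below cuts one flat triangle with a new level using linear interpolation along its sides. The function name and the vertex layout `(x, y, z)` are assumptions for this example, and levels that pass exactly through a vertex are deliberately not handled.

```python
def triangle_level_segment(tri, level):
    """tri: three (x, y, z) vertices. Returns the two (x, y) end points of the
    segment where the flat triangle crosses z == level, or None if it does not."""
    points = []
    for i in range(3):
        (x0, y0, z0), (x1, y1, z1) = tri[i], tri[(i + 1) % 3]
        if (z0 - level) * (z1 - level) < 0:  # end points strictly on opposite sides
            t = (level - z0) / (z1 - z0)     # linear interpolation along the side
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points if len(points) == 2 else None
```

Joining the segments of neighbouring triangles end to end then gives the polyline for the new isocurve.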
You may apply post-smoothing to the curves, but you will have no guarantee of retrieving the original curves and cannot prevent close curves from crossing each other.
Beware that increasing the density of points along the curves will give you a false feeling of accuracy, as the error due to the spacing of the isolines will remain (indeed the reconstructed surface will be cone-like, with one of the curvatures being zero; the surface inside the bottommost and topmost lines will be flat).
As an alternative to flat triangles, one may think of a scheme where you compute a gradient vector at every vertex (e.g. from a least-squares fit of a plane to the vertex and its neighbors), and use this information to generate a bivariate polynomial surface in the triangle. You must do this in such a way that the values along a side will coincide for the two triangles that share it. (Unfortunately, I have no formula to give you.)
The isolines are then obtained by a further subdivision of the triangle in smaller triangles, with a flat approximation.
Actually, this is not very different from getting sample points, (Delaunay) triangulating them and fitting piecewise continuous patches to the triangles.
Whatever method you use, be it 2D or 3D, it is useful to reason about what happens if you sweep the range of z values in a continuous way. This thought experiment does reconstruct a 3D surface, which will possess continuity and smoothness properties.
A possible improvement over the crude "flat triangulation" model could be to extend every triangle side between two iso-polylines with sides leading to the next iso-polylines. This way, higher order interpolation (cubic) can be achieved, giving a smoother reconstruction.
Anyway, you can be sure that this will introduce discontinuities or other types of artifacts.
A mixed method:
flatten the isolines to polylines;
triangulate the polygons formed by the polylines and the borders;
on every node, estimate the surface gradient (least-squares fit of a plane to the node and its neighbors);
in every triangle, consider the two sides along which you need to interpolate and compute the derivative at endpoints (from the known gradients and the side directions);
use Hermite interpolation along these sides and solve for the desired iso-levels;
join the points obtained on both sides.
This method should be a good tradeoff between complexity and smoothness. It does reconstruct a continuous surface (except maybe for the remark below).
Note that in some cases, you will obtain three solutions of the cubic. If there are three on each side, join them in order. Otherwise, make a decision on which to join and use the remaining two to close the curve.
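The Hermite step of the mixed method might look like the sketch below, which works on one triangle side parameterized by t in [0, 1]. All names are hypothetical, the end derivatives are taken as gradient dot side vector, and the roots are found by sampling plus bisection rather than a closed-form cubic solver, so tangential (double) roots can be missed.

```python
def hermite(z0, z1, d0, d1, t):
    """Cubic Hermite value at t in [0, 1]; d0 and d1 are derivatives with respect to t."""
    t2, t3 = t * t, t * t * t
    return ((2 * t3 - 3 * t2 + 1) * z0 + (t3 - 2 * t2 + t) * d0
            + (-2 * t3 + 3 * t2) * z1 + (t3 - t2) * d1)


def side_derivative(grad, p0, p1):
    """Derivative of z along the side p0 -> p1 per unit of t."""
    return grad[0] * (p1[0] - p0[0]) + grad[1] * (p1[1] - p0[1])


def level_crossings(z0, z1, d0, d1, level, samples=64):
    """Parameters t in [0, 1] where the Hermite cubic equals level (up to three)."""
    def f(t):
        return hermite(z0, z1, d0, d1, t) - level

    roots = []
    for k in range(samples):
        a, b = k / samples, (k + 1) / samples
        fa, fb = f(a), f(b)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            for _ in range(60):  # bisection on the bracketing interval
                mid = 0.5 * (a + b)
                if fa * f(mid) <= 0.0:
                    b = mid
                else:
                    a, fa = mid, f(mid)
            roots.append(0.5 * (a + b))
    if f(1.0) == 0.0:
        roots.append(1.0)
    return roots
```

Each root t maps back to the point p0 + t * (p1 - p0) on the side; with large end derivatives the cubic overshoots and the three-solution case mentioned above appears.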

Algorithm to detect curved lines from list of 2D points

I am trying to extract horizontal lines from a set of 2D points generated from the photo of the model of a human torso:
The points "mostly" form horizontal(ish) lines in a more or less regular way, but with possible gaps/missing-points:
There can be regions where the lines deform a bit:
And regions with background noise:
Of course I would need to tune things so I exclude those parts with defects. What I am looking for with this question is a suggested algorithm to find lines where they are well-behaved, filling possible gaps and avoiding possible noise, and also terminating the lines properly upon some discontinuity condition.
I believe there could be some optimizing or voting "flood fill" variant that would score line candidates and yield only well-formed lines, but I am not experienced with this and cannot figure anything out by myself.
This dataset is in a gist here, and it is important to note that X coordinates are integers, so points are aligned vertically. The Y coordinates though are decimal numbers.
I would start by finding the nearest neighbor of every dot, then the second nearest neighbor on the other side (I mean only considering the dots in the half plane opposite to the first neighbor).
If the distance to the second neighbor exceeds twice the distance to the first, ignore it.
Just doing that, I bet that you will reconstruct a great deal of the curves, with gaps left unfilled.
By estimating the local curvature along the curve (e.g. by computing the circumscribed circle of three dots, taking every other dot), you can discard noisy portions.
Then to fill the gaps, you can detect the curve endpoints and look for the nearest endpoint in an angle around the extrapolated direction.
First step in the processing:
These are integral curves to the vector field representing the direction pattern.
So maybe start by finding, for each point, the slope vector (the predominant direction) by taking points from its neighborhood and fitting a line with least squares or performing a PCA. Increasing the neighborhood radius should let you deal with irregularities in the data, picking up a larger-scale slope trend instead of local noise.
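A minimal sketch of the PCA variant, assuming the points are `(x, y)` tuples and a brute-force neighbourhood search is acceptable (the function name and the `radius` parameter are made up for this example). For a 2x2 covariance matrix the principal axis has a closed form, so no linear algebra library is needed.

```python
import math


def local_direction(points, center, radius):
    """Angle in radians, within (-pi/2, pi/2], of the principal axis of the
    points lying within radius of center; None if fewer than two are found."""
    cx, cy = center
    near = [(x, y) for x, y in points if math.hypot(x - cx, y - cy) <= radius]
    if len(near) < 2:
        return None
    mx = sum(x for x, _ in near) / len(near)
    my = sum(y for _, y in near) / len(near)
    sxx = sum((x - mx) ** 2 for x, _ in near)
    syy = sum((y - my) ** 2 for _, y in near)
    sxy = sum((x - mx) * (y - my) for x, y in near)
    # Orientation of the dominant eigenvector of the scatter matrix
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)
```

Evaluating this at every point with a growing `radius` gives the slope field at different scales.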
If you decide to do this, could you post the slope field you find here, so that we could see tangents instead of points?

Generating a minimal set of vertices from a spline/curve

In my project, I represent geometry using splines. For physics and rendering I preprocess the splines and convert them into lines, and later polygons, by sampling the splines at a regular interval. However, I want to reduce the number of vertices/lines by ignoring samples that are already well enough represented by a line.
Coming up short when searching, I was wondering if there are any traditional techniques to convert a curve to a set of vertices while reducing the resulting error.
EDIT: To clarify, the result I want to end up with is a number of vertices/line segments that best represent the spline with the fewest vertices/line segments. I'm not sure how to define what "best represent the spline" really means, but the goal is to make it as hard as possible to distinguish the difference between the spline and the approximation.
It can be done by recursively refining every part of the curve that is not close to the segment between its end points.
Say we have a curve (spline) C:[0,1]->R^n. Then the first approximation is the segment S between the curve end points [C(0), C(1)]. Take the point C(0.5) and check how far it is from segment S. If it is far, then we have to include it in the discretization; if not, then S is a good approximation. If C(0.5) is far, then the next approximation is the polyline [C(0), C(0.5), C(1)], and we repeat the same procedure on the parts [C(0), C(0.5)] and [C(0.5), C(1)].
If you are using a polynomial spline of degree >= 3 (e.g. a cubic spline), then it can have inflection point(s). In that case it is possible that the curve's midpoint 'falls' right on the segment while the curve around it is far from the segment. In that case it is good to check one more level of sub-parts.
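A small 2D sketch of this recursive refinement follows. The names, the `tol` and `max_depth` parameters, and the choice of testing the quarter points alongside the midpoint (as a cheap stand-in for "one more level of sub-parts") are all assumptions of this example, not part of the answer.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all 2D tuples)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def flatten(curve, tol, t0=0.0, t1=1.0, depth=0, max_depth=16):
    """Polyline for curve(t), t in [t0, t1], refined until the probes are within tol."""
    a, b = curve(t0), curve(t1)
    # Probing 1/4 and 3/4 as well guards against an inflection hiding at the midpoint
    far = any(
        point_segment_distance(curve(t0 + f * (t1 - t0)), a, b) > tol
        for f in (0.25, 0.5, 0.75)
    )
    if depth >= max_depth or not far:
        return [a, b]
    tm = 0.5 * (t0 + t1)
    left = flatten(curve, tol, t0, tm, depth + 1, max_depth)
    right = flatten(curve, tol, tm, t1, depth + 1, max_depth)
    return left[:-1] + right  # drop the duplicated midpoint
```

Straight stretches collapse to a single segment, while tightly curved stretches receive more vertices.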
This is entirely based on my own intuition, so I'm not sure if it coincides AT ALL with best practices. I do have a mathematics degree, so hopefully it's not too far off. I'll have you note that the computation involved may outstrip performance gains granted by not using as many vertices if the spline needs to be recalculated frequently.
Let's say the vertices are in an array like [v(0), v(1), v(2),..., v(n)] where each v(i) is something like (x, y). By iterating over the vertices starting at v(1) and ending at v(n-1), we can compare a point with its neighbors in order to tell whether or not to discard it. Note that we ignore v(0) and v(n) for two reasons: (I assume) we don't want to remove our endpoints, and also v(0) and v(n) are missing a neighbor that we would need in order to set up our calculation. I can think of a couple possibilities here that might warrant examination, but one in particular seems (in my head) to be the best answer...
Consider the case where we're deciding whether or not to remove v(i) from the vertex array. We could examine the Cartesian distance between v(i) and its neighbors, and remove the point if both are below some threshold value T. For example if v(i-1) = (x1, y1) and v(i) = (x2, y2) and v(i+1) = (x3, y3), then we evaluate sqrt((x2-x1)^2 + (y2-y1)^2) < T && sqrt((x3-x2)^2 + (y3-y2)^2) < T, removing v(i) if the evaluation returns true.
In 3+ dimensions, this would become more complicated - the calculation would be similar, but you would require a method of determining a point's neighbors since they might not lie directly next to the examined point in the vertex array.

extract points which satisfy certain conditions

I have an array of points in one plane. They form some shape. I need to extract points from this array which only form straight lines of this shape.
At this moment I have an algorithm, but it does not work very well. I take the first two points, make a straight line, and then check whether the following points lie on it within some tolerance. But there is a problem: the points which form a straight line are not really on the line but have some deviation. This deviation is quite large. If I make the tolerance in my algorithm large enough to get the points from the straight part, then other points which are on the slightly bent part but deviate less than the tolerance are also extracted.
I am looking for some idea on how to perform such task.
Here is the picture:
In circles are the parts which I want to extract. Red points are the parts which I could extract with my approach. If I increase the tolerance then I miss the straight pieces too.
First, suppose you already have a candidate subset of points and want to check whether they lie on a straight line. Use a form of linear regression to identify the best-fitting line, then check how well it fits, and accept or reject the hypothesis that this particular segment is linear based on that.
One of the most standard ways of doing that is the least-squares method.
Identifying the subset is a different problem, the best solution to which will depend strongly on the kind of data you have and the objective. I suggest that enumerating all the segments is a good starting point if the amount of data is not extremely large; that should be doable in no more than cubic time, I gather.
There are certainly some approximations one can apply, e.g. choosing a point in the sequence and building a subset by iteratively adding points on either side as long as the segment remains linear within the tolerance threshold, then accepting or rejecting it depending on whether the segment is long enough.
I assume here that the curve is parameterizable by one of the coordinates. If this is not the case, e.g. if the curve is closed, additional steps may be required to separate the curve into parameterizable segments.
EDIT: how to check a segment is straight
There's a number of options.
First, I would expect that for a straight line the average deviation would stay roughly the same as you add the new points, then you can simply find a reasonable threshold on that given the data.
Second option is to further split the subset into a fixed number of parts (e.g. 2), find the best fitting line for each one and then compare these. In case of a straight line, roughly the same line should be predicted, but for a curve it would be different.
Third option is to perform nonlinear curve fitting, e.g. fit a quadratic curve and check the coefficient for the quadratic term -- if the line is straight, it should be close to zero.
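The first two options can be combined in a short sketch: a least-squares line fit whose RMS residual is thresholded, plus a comparison of the slopes fitted to the two halves. The function names and both thresholds are assumptions for illustration, and the code relies on the earlier assumption that the points are parameterizable by x (no vertical runs).

```python
import math


def fit_line(points):
    """Least-squares fit y = a*x + b; returns (a, b, rms_residual)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx  # assumes the x values are not all equal
    b = my - a * mx
    rms = math.sqrt(sum((y - (a * x + b)) ** 2 for x, y in points) / n)
    return a, b, rms


def looks_straight(points, max_rms, max_slope_diff):
    """Option 1: overall deviation is small. Option 2: both halves agree on the slope."""
    _, _, rms = fit_line(points)
    half = len(points) // 2
    a_left, _, _ = fit_line(points[:half])
    a_right, _, _ = fit_line(points[half:])
    return rms <= max_rms and abs(a_left - a_right) <= max_slope_diff
```

On a gently bent run the RMS residual can stay small while the two half-slopes drift apart, which is the case the single-tolerance approach in the question misses.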
In each case, of course, there is a tradeoff between the segment size and the deviation of the points from that segment. In the extreme case, there would either be one huge linear segment with huge deviation or a whole bunch of 2-point segments with 0 deviation. The actual threshold on the deviation, the difference between the tangent curves, or the magnitude of the quadratic term (depending on the option you prefer) has to be selected for the given dataset to suit your needs. Looking at the plot, I would say that the threshold should be picked so as to allow for segments of length 10 or so.

Randomly and efficiently filling space with shapes

What is the most efficient way to randomly fill a space with as many non-overlapping shapes as possible? In my specific case, I'm filling a circle with circles. I'm randomly placing circles until either a certain percentage of the outer circle is filled OR a certain number of placements have failed (i.e. were placed in a position that overlapped an existing circle). This is pretty slow, and often leaves empty spaces unless I allow a huge number of failures.
So, is there some other type of filling algorithm I can use to quickly fill as much space as possible, but still look random?
Issue you are running into
You are running into the Coupon collector's problem because you are using a technique of Rejection sampling.
You are also making strong assumptions about what a "random filling" is. Your algorithm will leave large gaps between circles; is this what you mean by "random"? Nevertheless it is a perfectly valid definition, and I approve of it.
Solution
To adapt your current "random filling" to avoid the rejection sampling coupon-collector's issue, merely divide the space you are filling into a grid. For example if your circles are of radius 1, divide the larger circle into a grid of 1/sqrt(2)-width blocks. When it becomes "impossible" to fill a gridbox, ignore that gridbox when you pick new points. Problem solved!
Possible dangers
You have to be careful how you code this however! Possible dangers:
If you do something like if (random point in invalid grid){ generateAnotherPoint() } then you ignore the benefit / core idea of this optimization.
If you do something like pickARandomValidGridbox() then you will slightly reduce the probability of making circles near the edge of the larger circle (though this may be fine if you're doing this for a graphics art project and not for a scientific or mathematical project); however if you make the grid size 1/sqrt(2) times the radius of the circle, you will not run into this problem because it will be impossible to draw blocks at the edge of the large circle, and thus you can ignore all gridboxes at the edge.
Implementation
Thus the generalization of your method to avoid the coupon-collector's problem is as follows:
Inputs: large circle coordinates/radius(R), small circle radius(r)
Output: set of coordinates of all the small circles
Algorithm:
divide your LargeCircle into a grid of r/sqrt(2)
ValidBoxes = {set of all gridboxes that lie entirely within LargeCircle}
SmallCircles = {empty set}
until ValidBoxes is empty:
    pick a random gridbox Box from ValidBoxes
    pick a random point inside Box to be center of small circle C
    check neighboring gridboxes for other circles which may overlap*
    if there is no overlap:
        add C to SmallCircles
        remove the box from ValidBoxes # possible because grid is small
    else if there is an overlap:
        increase the Box.failcount
        if Box.failcount > MAX_PERGRIDBOX_FAIL_COUNT:
            remove the box from ValidBoxes
return SmallCircles
(*) This step is also an important optimization, which I can only assume you do not already have. Without it, your doesThisCircleOverlapAnother(...) function is incredibly inefficient at O(N) per query, which will make filling in circles nearly impossible for large ratios R>>r.
This is the exact generalization of your algorithm to avoid the slowness, while still retaining the elegant randomness of it.
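One way the pseudocode above could be turned into runnable Python is sketched below. The function name, the `max_fails` default, and the `rng` parameter are assumptions; the sketch also keeps only boxes whose corners lie within R - r of the centre, so that every placed circle stays inside the large one, which is a stricter reading of "entirely within LargeCircle".

```python
import math
import random


def fill_circle(R, r, max_fails=20, rng=None):
    """Centres of non-overlapping radius-r circles inside a radius-R circle at the origin."""
    rng = rng or random.Random()
    cell = r / math.sqrt(2.0)   # box diagonal equals r, so a box holds at most one centre
    n = int(math.ceil(R / cell))
    reach = 3                   # 2r / cell = 2*sqrt(2) < 3 boxes in each direction

    def usable(i, j):
        # The farthest corner of box (i, j) must be within R - r of the origin
        fx = max(abs(i * cell), abs((i + 1) * cell))
        fy = max(abs(j * cell), abs((j + 1) * cell))
        return math.hypot(fx, fy) <= R - r

    valid = [(i, j) for i in range(-n, n) for j in range(-n, n) if usable(i, j)]
    fails = dict.fromkeys(valid, 0)
    occupied = {}               # box -> centre of the circle placed in it

    def overlaps(i, j, cx, cy):
        # Only nearby boxes can hold a conflicting circle
        for di in range(-reach, reach + 1):
            for dj in range(-reach, reach + 1):
                other = occupied.get((i + di, j + dj))
                if other is not None and math.hypot(cx - other[0], cy - other[1]) < 2 * r:
                    return True
        return False

    while valid:
        k = rng.randrange(len(valid))
        i, j = valid[k]
        cx, cy = (i + rng.random()) * cell, (j + rng.random()) * cell
        if not overlaps(i, j, cx, cy):
            occupied[(i, j)] = (cx, cy)
        else:
            fails[(i, j)] += 1
            if fails[(i, j)] <= max_fails:
                continue        # keep the box for another attempt
        valid[k] = valid[-1]    # swap-remove the box: filled or given up on
        valid.pop()
    return list(occupied.values())
```

Keeping the valid boxes in a list with swap-remove makes both the random pick and the removal constant-time.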
Generalization to larger irregular features
edit: Since you've commented that this is for a game and you are interested in irregular shapes, you can generalize this as follows. For any small irregular shape, enclose it in a circle that represents how far you want it to be from things. Your grid can be the size of the smallest terrain feature. Larger features can encompass 1x2 or 2x2 or 3x2 or 3x3 etc. contiguous blocks. Note that many games with features that span large distances (mountains) and small distances (torches) often require grids which are recursively split (i.e. some blocks are split into further 2x2 or 2x2x2 subblocks), generating a tree structure. This structure with extensive bookkeeping will allow you to randomly place the contiguous blocks, however it requires a lot of coding. What you can do however is use the circle-grid algorithm to place the larger features first (when there's a lot of space to work with on the map and you can just check adjacent gridboxes for a collision without running into the coupon-collector's problem), then place the smaller features. If you can place your features in this order, this requires almost no extra coding besides checking neighboring gridboxes for collisions when you place a 1x2/3x3/etc. group.
One way to do this that produces interesting looking results is
create an empty NxM grid
create an empty has-open-neighbors set
for i = 1 to NumberOfRegions
    pick a random point in the grid
    assign that grid point a (terrain) type
    add the point to the has-open-neighbors set
while has-open-neighbors is not empty
    foreach point in has-open-neighbors
        get neighbor-points as the immediate neighbors of point
            that don't have an assigned terrain type in the grid
        if none
            remove point from has-open-neighbors
        else
            pick a random neighbor-point from neighbor-points
            assign its grid location the same (terrain) type as point
            add neighbor-point to the has-open-neighbors set
When done, has-open-neighbors will be empty and the grid will have been populated with at most NumberOfRegions regions (some regions with the same terrain type may be adjacent and so will combine to form a single region).
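A compact Python version of this region-growing idea is sketched below. The names and the 4-neighbour rule are assumptions, and instead of sweeping the whole has-open-neighbors set on each pass it picks one random frontier cell per step, which is a variation on the pseudocode rather than a literal transcription.

```python
import random


def grow_regions(width, height, num_regions, num_types, rng=None):
    """Fill a height x width grid with terrain types grown from random seed cells."""
    rng = rng or random.Random()
    grid = [[None] * width for _ in range(height)]
    frontier = []  # cells that may still have unassigned neighbours
    for _ in range(num_regions):
        x, y = rng.randrange(width), rng.randrange(height)
        grid[y][x] = rng.randrange(num_types)
        frontier.append((x, y))
    while frontier:
        k = rng.randrange(len(frontier))
        x, y = frontier[k]
        free = [
            (nx, ny)
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
            if 0 <= nx < width and 0 <= ny < height and grid[ny][nx] is None
        ]
        if not free:
            frontier[k] = frontier[-1]  # swap-remove a cell with no open neighbours
            frontier.pop()
        else:
            nx, ny = rng.choice(free)
            grid[ny][nx] = grid[y][x]   # the neighbour inherits the terrain type
            frontier.append((nx, ny))
    return grid
```

Because every assigned cell stays in the frontier until all of its neighbours are taken, the loop ends only when the whole grid is filled.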
Sample output using this algorithm with 30 points, 14 terrain types, and a 200x200 pixel world:
Edit: tried to clarify the algorithm.
How about using a 2-step process:
Choose a bunch of n points randomly -- these will become the centres of the circles.
Determine the radii of these circles so that they do not overlap.
For step 2, for each circle centre you need to know the distance to its nearest neighbour. (This can be computed for all points in O(n^2) time using brute force, although it may be that faster algorithms exist for points in the plane.) Then simply divide that distance by 2 to get a safe radius. (You can also shrink it further, either by a fixed amount or by an amount proportional to the radius, to ensure that no circles will be touching.)
To see that this works, consider any point p and its nearest neighbour q, which is some distance d from p. If p is also q's nearest neighbour, then both points will get circles with radius d/2, which will therefore be touching; OTOH, if q has a different nearest neighbour, it must be at distance d' < d, so the circle centred at q will be even smaller. So either way, the 2 circles will not overlap.
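Step 2 can be written as a brute-force O(n^2) pass, as in the sketch below; the function name and the optional `shrink` factor are assumptions, and at least two centres are required.

```python
import math


def safe_radii(centres, shrink=1.0):
    """Radius for each centre: half the distance to its nearest neighbour,
    optionally scaled by shrink < 1 so that circles do not touch."""
    radii = []
    for i, (x, y) in enumerate(centres):
        nearest = min(
            math.hypot(x - u, y - v)
            for j, (u, v) in enumerate(centres)
            if j != i
        )
        radii.append(0.5 * shrink * nearest)
    return radii
```

For the centres (0, 0), (2, 0) and (10, 0) this gives radii 1, 1 and 4: the first two circles touch, and the third stays clear of both.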
My idea would be to start out with a compact grid layout. Then take each circle and perturb it in some random direction. The distance in which you perturb it can also be chosen at random (just make sure that the distance doesn't make it overlap another circle).
This is just an idea and I'm sure there are a number of ways you could modify it and improve upon it.

Resources