How to find the order of discrete point-set efficiently? - performance

I have a series of discrete point on a plane, However, their order is scattered. Here is an instance:
To connect them with a smooth curve, I wrote a findSmoothBoundary() to achieve the smooth boundary.
Code
function findSmoothBoundary(boundaryPointSet)
%initialize the current point
currentP = boundaryPointSet(1,:);
%Create a space smoothPointsSet to store the point
smoothPointsSet = NaN*ones(length(boundaryPointSet),2);
%delete the current point from the boundaryPointSet
boundaryPointSet(1,:) = [];
ptsNum = 1; %record the number of smoothPointsSet
smoothPointsSet(ptsNum,:) = currentP;
while ~isempty(boundaryPointSet)
%ultilize the built-in knnsearch() to
%achieve the nearest point of current point
nearestPidx = knnsearch(boundaryPointSet,currentP);
currentP = boundaryPointSet(nearestPidx,:);
ptsNum = ptsNum + 1;
smoothPointsSet(ptsNum,:) = currentP;
%delete the nearest point from boundaryPointSet
boundaryPointSet(nearestPidx,:) = [];
end
%visualize the smooth boundary
plot(smoothPointsSet(:,1),smoothPointsSet(:,2))
axis equal
end
Although findSmoothBoundary() can find the smooth boundary rightly, but its efficiency is much lower ( About the data, please see here)
So I would like to know:
How to find the discrete point order effieciently?
Data
theta = linspace(0,2*pi,1000)';
boundaryPointSet= [2*sin(theta),cos(theta)];
tic;
findSmoothBoundary(boundaryPointSet)
toc;
%Elapsed time is 4.570719 seconds.

This answer is not perfect because I'll have to make a few hypothesis in order for it to work. However, for a vast majority of cases, it should works as intended. Moreover, from the link you gave in the comments, I think these hypothesis are at least weak, if not verified by definition :
1. The point form a single connected region
2. The center of mass of your points lies in the convex hull of those points
If these hypothesis are respected, you can do the following (Full code available at the end):
Step 1 : Calculate the center of mass of your points
Means=mean(boundaryPointSet);
Step 2 : Change variables to set the origin to the center of mass
boundaryPointSet(:,1)=boundaryPointSet(:,1)-Means(1);
boundaryPointSet(:,2)=boundaryPointSet(:,2)-Means(2);
Step3 : Convert coordinates to polar
[Angles,Radius]=cart2pol(boundaryPointSet(:,1),boundaryPointSet(:,2));
Step4 : Sort the Angle and use this sorting to sort the Radius
[newAngles,ids]=sort(Angles);
newRadius=Radius(ids);
Step5 : Go back to cartesian coordinates and re-add the coordinates of the center of mass:
[X,Y]=pol2cart(newAngles,newRadius);
X=X+Means(1);
Y=Y+means(2);
Full Code
%%% Find smooth boundary
fid=fopen('SmoothBoundary.txt');
A=textscan(fid,'%f %f','delimiter',',');
boundaryPointSet=cell2mat(A);
boundaryPointSet(any(isnan(boundaryPointSet),2),:)=[];
idx=randperm(size(boundaryPointSet,1));
boundaryPointSet=boundaryPointSet(idx,:);
tic
plot(boundaryPointSet(:,1),boundaryPointSet(:,2))
%% Find mean value of all parameters
Means=mean(boundaryPointSet);
%% Center values around Mean point
boundaryPointSet(:,1)=boundaryPointSet(:,1)-Means(1);
boundaryPointSet(:,2)=boundaryPointSet(:,2)-Means(2);
%% Get polar coordinates of your points
[Angles,Radius]=cart2pol(boundaryPointSet(:,1),boundaryPointSet(:,2));
[newAngles,ids]=sort(Angles);
newRadius=Radius(ids);
[X,Y]=pol2cart(newAngles,newRadius);
X=X+Means(1);
Y=Y+means(2);
toc
figure
plot(X,Y);
Note : As your values are already sorted in your input file, I had to mess it up a bit by permutating them
Outputs :
Boundary
Elapsed time is 0.131808 seconds.
Messed Input :
Output :

Related

How can you iterate linearly through a 3D grid?

Assume we have a 3D grid that spans some 3D space. This grid is made out of cubes, the cubes need not have integer length, they can have any possible floating point length.
Our goal is, given a point and a direction, to check linearly each cube in our path once and exactly once.
So if this was just a regular 3D array and the direction is say in the X direction, starting at position (1,2,0) the algorithm would be:
for(i in number of cubes)
{
grid[1+i][2][0]
}
But of course the origin and the direction are arbitrary and floating point numbers, so it's not as easy as iterating through only one dimension of a 3D array. And the fact the side lengths of the cubes are also arbitrary floats makes it slightly harder as well.
Assume that your cube side lengths are s = (sx, sy, sz), your ray direction is d = (dx, dy, dz), and your starting point is p = (px, py, pz). Then, the ray that you want to traverse is r(t) = p + t * d, where t is an arbitrary positive number.
Let's focus on a single dimension. If you are currently at the lower boundary of a cube, then the step length dt that you need to make on your ray in order to get to the upper boundary of the cube is: dt = s / d. And we can calculate this step length for each of the three dimensions, i.e. dt is also a 3D vector.
Now, the idea is as follows: Find the cell where the ray's starting point lies in and find the parameter values t where the first intersection with the grid occurs per dimension. Then, you can incrementally find the parameter values where you switch from one cube to the next for each dimension. Sort the changes by the respective t value and just iterate.
Some more details:
cell = floor(p - gridLowerBound) / s <-- the / is component-wise division
I will only cover the case where the direction is positive. There are some minor changes if you go in the negative direction but I am sure that you can do these.
Find the first intersections per dimension (nextIntersection is a 3D vector):
nextIntersection = ((cell + (1, 1, 1)) * s - p) / d
And calculate the step length:
dt = s / d
Now, just iterate:
if(nextIntersection.x < nextIntersection.y && nextIntersection.x < nextIntersection.z)
cell.x++
nextIntersection.x += dt.x
else if(nextIntersection.y < nextIntersection.z)
cell.y++
nextIntersection.y += dt.y
else
cell.z++
nextIntersection.z += dt.z
end if
if cell is outside of grid
terminate
I have omitted the case where two or three cells are changed at the same time. The above code will only change one at a time. If you need this, feel free to adapt the code accordingly.
Well if you are working with floats, you can make the equation for the line in direction specifiedd. Which is parameterized by t. Because in between any two floats there is a finite number of points, you can simply check each of these points which cube they are in easily cause you have point (x,y,z) whose components should be in, a respective interval defining a cube.
The issue gets a little bit harder if you consider intervals that are, dense.
The key here is even with floats this is a discrete problem of searching. The fact that the equation of a line between any two points is a discrete set of points means you merely need to check them all to the cube intervals. What's better is there is a symmetry (a line) allowing you to enumerate each point easily with arithmetic expression, one after another for checking.
Also perhaps consider integer case first as it is same but slightly simpler in determining the discrete points as it is a line in Z_2^8?

Best way to find all points of lattice in sphere

Given a bunch of arbitrary vectors (stored in a matrix A) and a radius r, I'd like to find all integer-valued linear combinations of those vectors which land inside a sphere of radius r. The necessary coordinates I would then store in a Matrix V. So, for instance, if the linear combination
K=[0; 1; 0]
lands inside my sphere, i.e. something like
if norm(A*K) <= r then
V(:,1)=K
end
etc.
The vectors in A are sure to be the simplest possible basis for the given lattice and the largest vector will have length 1. Not sure if that restricts the vectors in any useful way but I suspect it might. - They won't have as similar directions as a less ideal basis would have.
I tried a few approaches already but none of them seem particularly satisfying. I can't seem to find a nice pattern to traverse the lattice.
My current approach involves starting in the middle (i.e. with the linear combination of all 0s) and go through the necessary coordinates one by one. It involves storing a bunch of extra vectors to keep track of, so I can go through all the octants (in the 3D case) of the coordinates and find them one by one. This implementation seems awfully complex and not very flexible (in particular it doesn't seem to be easily generalizable to arbitrary numbers of dimension - although that isn't strictly necessary for the current purpose, it'd be a nice-to-have)
Is there a nice* way to find all the required points?
(*Ideally both efficient and elegant**. If REALLY necessary, it wouldn't matter THAT much to have a few extra points outside the sphere but preferably not that many more. I definitely do need all the vectors inside the sphere. - if it makes a large difference, I'm most interested in the 3D case.
**I'm pretty sure my current implementation is neither.)
Similar questions I found:
Find all points in sphere of radius r around arbitrary coordinate - this is actually a much more general case than what I'm looking for. I am only dealing with periodic lattices and my sphere is always centered at 0, coinciding with one point on the lattice.
But I don't have a list of points but rather a matrix of vectors with which I can generate all the points.
How to efficiently enumerate all points of sphere in n-dimensional grid - the case for a completely regular hypercubic lattice and the Manhattan-distance. I'm looking for completely arbitary lattices and euclidean distance (or, for efficiency purposes, obviously the square of that).
Offhand, without proving any assertions, I think that 1) if the set of vectors is not of maximal rank then the number of solutions is infinite; 2) if the set is of maximal rank, then the image of the linear transformation generated by the vectors is a subspace (e.g., plane) of the target space, which intersects the sphere in a lower-dimensional sphere; 3) it follows that you can reduce the problem to a 1-1 linear transformation (kxk matrix on a k-dimensional space); 4) since the matrix is invertible, you can "pull back" the sphere to an ellipsoid in the space containing the lattice points, and as a bonus you get a nice geometric description of the ellipsoid (principal axis theorem); 5) your problem now becomes exactly one of determining the lattice points inside the ellipsoid.
The latter problem is related to an old problem (counting the lattice points inside an ellipse) which was considered by Gauss, who derived a good approximation. Determining the lattice points inside an ellipse(oid) is probably not such a tidy problem, but it probably can be reduced one dimension at a time (the cross-section of an ellipsoid and a plane is another ellipsoid).
I found a method that makes me a lot happier for now. There may still be possible improvements, so if you have a better method, or find an error in this code, definitely please share. Though here is what I have for now: (all written in SciLab)
Step 1: Figure out the maximal ranges as defined by a bounding n-parallelotope aligned with the axes of the lattice vectors. Thanks for ElKamina's vague suggestion as well as this reply to another of my questions over on math.se by chappers: https://math.stackexchange.com/a/1230160/49989
function I=findMaxComponents(A,r) //given a matrix A of lattice basis vectors
//and a sphere radius r,
//find the corners of the bounding parallelotope
//built from the lattice, and store it in I.
[dims,vecs]=size(A); //figure out how many vectors there are in A (and, unnecessarily, how long they are)
U=eye(vecs,vecs); //builds matching unit matrix
iATA=pinv(A'*A); //finds the (pseudo-)inverse of A^T A
iAT=pinv(A'); //finds the (pseudo-)inverse of A^T
I=[]; //initializes I as an empty vector
for i=1:vecs //for each lattice vector,
t=r*(iATA*U(:,i))/norm(iAT*U(:,i)) //find the maximum component such that
//it fits in the bounding n-parallelotope
//of a (n-1)-sphere of radius r
I=[I,t(i)]; //and append it to I
end
I=[-I;I]; //also append the minima (by symmetry, the negative maxima)
endfunction
In my question I only asked for a general basis, i.e, for n dimensions, a set of n arbitrary but linearly independent vectors. The above code, by virtue of using the pseudo-inverse, works for matrices of arbitrary shapes and, similarly, Scilab's "A'" returns the conjugate transpose rather than just the transpose of A so it equally should work for complex matrices.
In the last step I put the corresponding minimal components.
For one such A as an example, this gives me the following in Scilab's console:
A =
0.9701425 - 0.2425356 0.
0.2425356 0.4850713 0.7276069
0.2425356 0.7276069 - 0.2425356
r=3;
I=findMaxComponents(A,r)
I =
- 2.9494438 - 3.4186986 - 4.0826424
2.9494438 3.4186986 4.0826424
I=int(I)
I =
- 2. - 3. - 4.
2. 3. 4.
The values found by findMaxComponents are the largest possible coefficients of each lattice vector such that a linear combination with that coefficient exists which still land on the sphere. Since I'm looking for the largest such combinations with integer coefficients, I can safely drop the part after the decimal point to get the maximal plausible integer ranges. So for the given matrix A, I'll have to go from -2 to 2 in the first component, from -3 to 3 in the second and from -4 to 4 in the third and I'm sure to land on all the points inside the sphere (plus superfluous extra points, but importantly definitely every valid point inside) Next up:
Step 2: using the above information, generate all the candidate combinations.
function K=findAllCombinations(I) //takes a matrix of the form produced by
//findMaxComponents() and returns a matrix
//which lists all the integer linear combinations
//in the respective ranges.
v=I(1,:); //starting from the minimal vector
K=[];
next=1; //keeps track of what component to advance next
changed=%F; //keeps track of whether to add the vector to the output
while or(v~=I(2,:)) //as long as not all components of v match all components of the maximum vector
if v <= I(2,:) then //if each current component is smaller than each largest possible component
if ~changed then
K=[K;v]; //store the vector and
end
v(next)=v(next)+1; //advance the component by 1
next=1; //also reset next to 1
changed=%F;
else
v(1:next)=I(1,1:next); //reset all components smaller than or equal to the current one and
next=next+1; //advance the next larger component next time
changed=%T;
end
end
K=[K;I(2,:)]'; //while loop ends a single iteration early so add the maximal vector too
//also transpose K to fit better with the other functions
endfunction
So now that I have that, all that remains is to check whether a given combination actually does lie inside or outside the sphere. All I gotta do for that is:
Step 3: Filter the combinations to find the actually valid lattice points
function points=generatePoints(A,K,r)
possiblePoints=A*K; //explicitly generates all the possible points
points=[];
for i=possiblePoints
if i'*i<=r*r then //filter those that are too far from the origin
points=[points i];
end
end
endfunction
And I get all the combinations that actually do fit inside the sphere of radius r.
For the above example, the output is rather long: Of originally 315 possible points for a sphere of radius 3 I get 163 remaining points.
The first 4 are: (each column is one)
- 0.2425356 0.2425356 1.2126781 - 0.9701425
- 2.4253563 - 2.6678919 - 2.4253563 - 2.4253563
1.6977494 0. 0.2425356 0.4850713
so the remainder of the work is optimization. Presumably some of those loops could be made faster and especially as the number of dimensions goes up, I have to generate an awful lot of points which I have to discard, so maybe there is a better way than taking the bounding n-parallelotope of the n-1-sphere as a starting point.
Let us just represent K as X.
The problem can be represented as:
(a11x1 + a12x2..)^2 + (a21x1 + a22x2..)^2 ... < r^2
(x1,x2,...) will not form a sphere.
This can be done with recursion on dimension--pick a lattice hyperplane direction and index all such hyperplanes that intersect the r-radius ball. The ball intersection of each such hyperplane itself is a ball, in one lower dimension. Repeat. Here's the calling function code in Octave:
function lat_points(lat_bas_mx,rr)
% **globals for hyperplane lattice point recursive function**
clear global; % this seems necessary/important between runs of this function
global MLB;
global NN_hat;
global NN_len;
global INP; % matrix of interior points, each point(vector) a column vector
global ctr; % integer counter, for keeping track of lattice point vectors added
% in the pre-allocated INP matrix; will finish iteration with actual # of points found
ctr = 0; % counts number of ball-interior lattice points found
MLB = lat_bas_mx;
ndim = size(MLB)(1);
% **create hyperplane normal vectors for recursion step**
% given full-rank lattice basis matrix MLB (each vector in lattice basis a column),
% form set of normal vectors between successive, nested lattice hyperplanes;
% store them as columnar unit normal vectors in NN_hat matrix and their lengths in NN_len vector
NN_hat = [];
for jj=1:ndim-1
tmp_mx = MLB(:,jj+1:ndim);
tmp_mx = [NN_hat(:,1:jj-1),tmp_mx];
NN_hat(:,jj) = null(tmp_mx'); % null space of transpose = orthogonal to columns
tmp_len = norm(NN_hat(:,jj));
NN_hat(:,jj) = NN_hat(:,jj)/tmp_len;
NN_len(jj) = dot(MLB(:,jj),NN_hat(:,jj));
if (NN_len(jj)<0) % NN_hat(:,jj) and MLB(:,jj) must have positive dot product
% for cutting hyperplane indexing to work correctly
NN_hat(:,jj) = -NN_hat(:,jj);
NN_len(jj) = -NN_len(jj);
endif
endfor
NN_len(ndim) = norm(MLB(:,ndim));
NN_hat(:,ndim) = MLB(:,ndim)/NN_len(ndim); % the lowest recursion level normal
% is just the last lattice basis vector
% **estimate number of interior lattice points, and pre-allocate memory for INP**
vol_ppl = prod(NN_len); % the volume of the ndim dimensional lattice paralellepiped
% is just the product of the NN_len's (they amount to the nested altitudes
% of hyperplane "paralellepipeds")
vol_bll = exp( (ndim/2)*log(pi) + ndim*log(rr) - gammaln(ndim/2+1) ); % volume of ndim ball, radius rr
est_num_pts = ceil(vol_bll/vol_ppl); % estimated number of lattice points in the ball
err_fac = 1.1; % error factor for memory pre-allocation--assume max of err_fac*est_num_pts columns required in INP
INP = zeros(ndim,ceil(err_fac*est_num_pts));
% **call the (recursive) function**
% for output, global variable INP (matrix of interior points)
% stores each valid lattice point (as a column vector)
clp = zeros(ndim,1); % confirmed lattice point (start at origin)
bpt = zeros(ndim,1); % point at center of ball (initially, at origin)
rd = 1; % initial recursion depth must always be 1
hyp_fun(clp,bpt,rr,ndim,rd);
printf("%i lattice points found\n",ctr);
INP = INP(:,1:ctr); % trim excess zeros from pre-allocation (if any)
endfunction
Regarding the NN_len(jj)*NN_hat(:,jj) vectors--they can be viewed as successive (nested) altitudes in the ndim-dimensional "parallelepiped" formed by the vectors in the lattice basis, MLB. The volume of the lattice basis parallelepiped is just prod(NN_len)--for a quick estimate of the number of interior lattice points, divide the volume of the ndim-ball of radius rr by prod(NN_len). Here's the recursive function code:
function hyp_fun(clp,bpt,rr,ndim,rd)
%{
clp = the lattice point we're entering this lattice hyperplane with
bpt = location of center of ball in this hyperplane
rr = radius of ball
rd = recrusion depth--from 1 to ndim
%}
global MLB;
global NN_hat;
global NN_len;
global INP;
global ctr;
% hyperplane intersection detection step
nml_hat = NN_hat(:,rd);
nh_comp = dot(clp-bpt,nml_hat);
ix_hi = floor((rr-nh_comp)/NN_len(rd));
ix_lo = ceil((-rr-nh_comp)/NN_len(rd));
if (ix_hi<ix_lo)
return % no hyperplane intersections detected w/ ball;
% get out of this recursion level
endif
hp_ix = [ix_lo:ix_hi]; % indices are created wrt the received reference point
hp_ln = length(hp_ix);
% loop through detected hyperplanes (updated)
if (rd<ndim)
bpt_new_mx = bpt*ones(1,hp_ln) + NN_len(rd)*nml_hat*hp_ix; % an ndim by length(hp_ix) matrix
clp_new_mx = clp*ones(1,hp_ln) + MLB(:,rd)*hp_ix; % an ndim by length(hp_ix) matrix
dd_vec = nh_comp + NN_len(rd)*hp_ix; % a length(hp_ix) row vector
rr_new_vec = sqrt(rr^2-dd_vec.^2);
for jj=1:hp_ln
hyp_fun(clp_new_mx(:,jj),bpt_new_mx(:,jj),rr_new_vec(jj),ndim,rd+1);
endfor
else % rd=ndim--so at deepest level of recursion; record the points on the given 1-dim
% "lattice line" that are inside the ball
INP(:,ctr+1:ctr+hp_ln) = clp + MLB(:,rd)*hp_ix;
ctr += hp_ln;
return
endif
endfunction
This has some Octave-y/Matlab-y things in it, but most should be easily understandable; M(:,jj) references column jj of matrix M; the tic ' means take transpose; [A B] concatenates matrices A and B; A=[] declares an empty matrix.
Updated / better optimized from original answer:
"vectorized" the code in the recursive function, to avoid most "for" loops (those slowed it down a factor of ~10; the code now is a bit more difficult to understand though)
pre-allocated memory for the INP matrix-of-interior points (this speeded it up by another order of magnitude; before that, Octave was having to resize the INP matrix for every call to the innermost recursion level--for large matrices/arrays that can really slow things down)
Because this routine was part of a project, I also coded it in Python. From informal testing, the Python version is another 2-3 times faster than this (Octave) version.
For reference, here is the old, much slower code in the original posting of this answer:
% (OLD slower code, using for loops, and constantly resizing
% the INP matrix) loop through detected hyperplanes
if (rd<ndim)
for jj=1:length(hp_ix)
bpt_new = bpt + hp_ix(jj)*NN_len(rd)*nml_hat;
clp_new = clp + hp_ix(jj)*MLB(:,rd);
dd = nh_comp + hp_ix(jj)*NN_len(rd);
rr_new = sqrt(rr^2-dd^2);
hyp_fun(clp_new,bpt_new,rr_new,ndim,rd+1);
endfor
else % rd=ndim--so at deepest level of recursion; record the points on the given 1-dim
% "lattice line" that are inside the ball
for jj=1:length(hp_ix)
clp_new = clp + hp_ix(jj)*MLB(:,rd);
INP = [INP clp_new];
endfor
return
endif

Determine whether the direction of a line segment is clockwise or anti clockwise

I have a list of 2D points (x1,y1),(x2,y2)......(Xn,Yn) representing a curved segment, is there any formula to determine whether the direction of drawing that segment is clockwise or anti clockwise ?
any help is appreciated
Alternately, you can use a bit of linear algebra. If you have three points a, b, and c, in that order, then do the following:
1) create the vectors u = (b-a) = (b.x-a.x,b.y-a.y) and v = (c-b) ...
2) calculate the cross product uxv = u.x*v.y-u.y*v.x
3) if uxv is -ve then a-b-c is curving in clockwise direction (and vice-versa).
by following a longer curve along in the same manner, you can even detect when as 's'-shaped curve changes from clockwise to anticlockwise, if that is useful.
One possible approach. It should work reasonably well if the sampling of the line represented by your list of points is uniform and smooth enough, and if the line is sufficiently simple.
Subtract the mean to "center" the line.
Convert to polar coordinates to get the angle.
Unwrap the angle, to make sure its increments are meaningful.
Check if total increment is possitive or negative.
I'm assuming you have the data in x and y vectors.
theta = cart2pol(x-mean(x), y-mean(y)); %// steps 1 and 2
theta = unwrap(theta); %// step 3
clockwise = theta(end)<theta(1); %// step 4. Gives 1 if CW, 0 if ACW
This only considers the integrated effect of all points. It doesn't tell you if there are "kinks" or sections with different directions of turn along the way.
A possible improvement would be to replace the average of x and y by some kind of integral. The reason is: if sampling is denser in a region the average will be biased towards that, whereas the integral wouldn't.
Now this is my approach, as mentioned in a comment to the question -
Another approach: draw a line from starting point to ending point. This line is indeed a vector. A CW curve has most of its part on RHS of this line. For CCW, left.
I wrote a sample code to elaborate this idea. Most of the explanation can be found in comments in the code.
clear;clc;close all
%% draw a spiral curve
N = 30;
theta = linspace(0,pi/2,N); % a CCW curve
rho = linspace(1,.5,N);
[x,y] = pol2cart(theta,rho);
clearvars theta rho N
plot(x,y);
hold on
%% find "the vector"
vec(:,:,1) = [x(1), y(1); x(end), y(end)]; % "the vector"
scatter(x(1),y(1), 200,'s','r','fill') % square is the starting point
scatter(x(end),y(end), 200,'^','r','fill') % triangle is the ending point
line(vec(:,1,1), vec(:,2,1), 'LineStyle', '-', 'Color', 'r')
%% find center of mass
com = [mean(x), mean(y)]; % center of mass
vec(:,:,2) = [x(1), y(1); com]; % secondary vector (start -> com)
scatter(com(1), com(2), 200,'d','k','fill') % diamond is the com
line(vec(:,1,2), vec(:,2,2), 'LineStyle', '-', 'Color', 'k')
%% find rotation angle
dif = diff(vec,1,1);
[ang, ~] = cart2pol(reshape(dif(1,1,:),1,[]), reshape(dif(1,2,:),1,[]));
clearvars dif
% now you can tell the answer by the rotation angle
if ( diff(ang)>0 )
disp('CW!')
else
disp('CCW!')
end
One can always tell on which side of the directed line (the vector) a point is, by comparing two vectors, namely, rotating vector [starting point -> center of mass] to the vector [starting point -> ending point], and then comparing the rotation angle to 0. A few seconds of mind-animating can help understand.

How to filter a set of 2D points moving in a certain way

I have a list of points moving in two dimensions (x- and y-axis) represented as rows in an array. I might have N points - i.e., N rows:
1 t1 x1 y1
2 t2 x2 y2
.
.
.
N tN xN yN
where ti, xi, and yi, is the time-index, x-coordinate, and the y-coordinate for point i. The time index-index ti is an integer from 1 to T. The number of points at each such possible time index can vary from 0 to N (still with only N points in total).
My goal is the filter out all the points that do not move in a certain way; or to keep only those that do. A point must move in a parabolic trajectory - with decreasing x- and y-coordinate (i.e., moving to the left and downwards only). Points with other dynamic behaviour must be removed.
Can I use a simple sorting mechanism on this array - and then analyse the order of the time-index? I have also considered the fact each point having the same time-index ti are physically distinct points, and so should be paired up with other points. The complexity of the problem grew - and now I turn to you.
NOTE: You can assume that the points are confined to a sub-region of the (x,y)-plane between two parabolic curves. These curves intersect only at only at one point: A point close to the origin of motion for any point.
More Information:
I have made some datafiles available:
MATLAB datafile (1.17 kB)
same data as CSV with semicolon as column separator (2.77 kB)
Necessary context:
The datafile hold one uint32 array with 176 rows and 5 columns. The columns are:
pixel x-coordinate in 175-by-175 lattice
pixel y-coordinate in 175-by-175 lattice
discrete theta angle-index
time index (from 1 to T = 10)
row index for this original sorting
The points "live" in a 175-by-175 pixel-lattice - and again inside the upper quadrant of a circle with radius 175. The points travel on the circle circumference in a counterclockwise rotation to a certain angle theta with horizontal, where they are thrown off into something close to a parabolic orbit. Column 3 holds a discrete index into a list with indices 1 to 45 from 0 to 90 degress (one index thus spans 2 degrees). The theta-angle was originally deduces solely from the points by setting up the trivial equations of motions and solving for the angle. This gives rise to a quasi-symmetric quartic which can be solved in close-form. The actual metric radius of the circle is 0.2 m and the pixel coordinate were converted from pixel-coordinate to metric using simple linear interpolation (but what we see here are the points in original pixel-space).
My problem is that some points are not behaving properly and since I need to statistics on the theta angle, I need to remove the points that certainly do NOT move in a parabolic trajoctory. These error are expected and fully natural, but still need to be filtered out.
MATLAB plot code:
% load data and setup variables:
load mat_points.mat;
num_r = 175;
num_T = 10;
num_gridN = 20;
% begin plotting:
figure(1000);
clf;
plot( ...
num_r * cos(0:0.1:pi/2), ...
num_r * sin(0:0.1:pi/2), ...
'Color', 'k', ...
'LineWidth', 2 ...
);
axis equal;
xlim([0 num_r]);
ylim([0 num_r]);
hold all;
% setup grid (yea... went crazy with one):
vec_tickValues = linspace(0, num_r, num_gridN);
cell_tickLabels = repmat({''}, size(vec_tickValues));
cell_tickLabels{1} = sprintf('%u', vec_tickValues(1));
cell_tickLabels{end} = sprintf('%u', vec_tickValues(end));
set(gca, 'XTick', vec_tickValues);
set(gca, 'XTickLabel', cell_tickLabels);
set(gca, 'YTick', vec_tickValues);
set(gca, 'YTickLabel', cell_tickLabels);
set(gca, 'GridLineStyle', '-');
grid on;
% plot points per timeindex (with increasing brightness):
vec_grayIndex = linspace(0,0.9,num_T);
for num_kt = 1:num_T
vec_xCoords = mat_points((mat_points(:,4) == num_kt), 1);
vec_yCoords = mat_points((mat_points(:,4) == num_kt), 2);
plot(vec_xCoords, vec_yCoords, 'o', ...
'MarkerEdgeColor', 'k', ...
'MarkerFaceColor', vec_grayIndex(num_kt) * ones(1,3) ...
);
end
Thanks :)
Why, it looks almost as if you're simulating a radar tracking debris from the collision of two missiles...
Anyway, let's coin a new term: object. Objects are moving along parabolae and at certain times they may emit flashes that appear as points. There are also other points which we are trying to filter out.
We will need some more information:
Can we assume that the objects obey the physics of things falling under gravity?
Must every object emit a point at every timestep during its lifetime?
Speaking of lifetime, do all objects begin at the same time? Can some expire before others?
How precise is the data? Is it exact? Is there a measure of error? To put it another way, do we understand how poorly the points from an object might fit a perfect parabola?
Sort the data with (index,time) as keys and for all locations of a point i see if they follow parabolic trajectory?
Which part are you facing problem? Sorting should be very easy. IMHO, it is the second part (testing if a set of points follow parabolic trajectory) that is difficult.

Finding the area of a 2-D data set

I have a .txt file with about 100,000 points in the 2-D plane. When I plot the points, there is a clearly defined 2-D region (think of a 2-D disc that has been morphed a bit).
What is the easiest way to compute the area of this region? Any way of doing easily in Matlab?
I made a polygonal approximation by finding a bunch (like 40) points on the boundary of the region and computing the area of the polygonal region in Matlab, but I was wondering if there is another, less tedious method than finding 40 points on the boundary.
Consider this example:
%# random points
x = randn(300,1);
y = randn(300,1);
%# convex hull
dt = DelaunayTri(x,y);
k = convexHull(dt);
%# area of convex hull
ar = polyarea(dt.X(k,1),dt.X(k,2))
%# plot
plot(dt.X(:,1), dt.X(:,2), '.'), hold on
fill(dt.X(k,1),dt.X(k,2), 'r', 'facealpha', 0.2);
hold off
title( sprintf('area = %g',ar) )
There is a short screencast By Doug Hull which solves this exact problem.
EDIT:
I am posting a second answer inspired by the solution proposed by #Jean-FrançoisCorbett.
First I create random data, and using the interactive brush tool, I remove some points to make it look like the desired "kidney" shape...
To have a baseline to compare against, we can manually trace the enclosing region using the IMFREEHAND function (I'm doing this using my laptop's touchpad, so not the most accurate drawing!). Then we find the area of this polygon using POLYAREA. Just like my previous answer, I compute the convex hull as well:
Now, and based on a previous SO question I had answered (2D histogram), the idea is to lay a grid over the data. The choice of the grid resolution is very important, mine was numBins = [20 30]; for the data used.
Next we count the number of squares containing enough points (I used at least 1 point as threshold, but you could try a higher value). Finally we multiply this count by the area of one grid square to obtain the approximated total area.
%### DATA ###
%# some random data
X = randn(100000,1)*1;
Y = randn(100000,1)*2;
%# HACK: remove some point to make data look like a kidney
idx = (X<-1 & -4<Y & Y<4 ); X(idx) = []; Y(idx) = [];
%# or use the brush tool
%#brush on
%### imfreehand ###
figure
line('XData',X, 'YData',Y, 'LineStyle','none', ...
'Color','b', 'Marker','.', 'MarkerSize',1);
daspect([1 1 1])
hROI = imfreehand('Closed',true);
pos = getPosition(hROI); %# pos = wait(hROI);
delete(hROI)
%# total area
ar1 = polyarea(pos(:,1), pos(:,2));
%# plot
hold on, plot(pos(:,1), pos(:,2), 'Color','m', 'LineWidth',2)
title('Freehand')
%### 2D histogram ###
%# center of bins
numBins = [20 30];
xbins = linspace(min(X), max(X), numBins(1));
ybins = linspace(min(Y), max(Y), numBins(2));
%# map X/Y values to bin-indices
Xi = round( interp1(xbins, 1:numBins(1), X, 'linear', 'extrap') );
Yi = round( interp1(ybins, 1:numBins(2), Y, 'linear', 'extrap') );
%# limit indices to the range [1,numBins]
Xi = max( min(Xi,numBins(1)), 1);
Yi = max( min(Yi,numBins(2)), 1);
%# count number of elements in each bin
H = accumarray([Yi(:), Xi(:)], 1, [numBins(2) numBins(1)]);
%# total area
THRESH = 0;
sqNum = sum(H(:)>THRESH);
sqArea = (xbins(2)-xbins(1)) * (ybins(2)-ybins(1));
ar2 = sqNum*sqArea;
%# plot 2D histogram/thresholded_histogram
figure, imagesc(xbins, ybins, H)
axis on, axis image, colormap hot; colorbar; %#caxis([0 500])
title( sprintf('2D Histogram, bins=[%d %d]',numBins) )
figure, imagesc(xbins, ybins, H>THRESH)
axis on, axis image, colormap gray
title( sprintf('H > %d',THRESH) )
%### convex hull ###
dt = DelaunayTri(X,Y);
k = convexHull(dt);
%# total area
ar3 = polyarea(dt.X(k,1), dt.X(k,2));
%# plot
figure, plot(X, Y, 'b.', 'MarkerSize',1), daspect([1 1 1])
hold on, fill(dt.X(k,1),dt.X(k,2), 'r', 'facealpha',0.2); hold off
title('Convex Hull')
%### plot ###
figure, hold on
%# plot histogram
imagesc(xbins, ybins, H>=1)
axis on, axis image, colormap gray
%# plot grid lines
xoff = diff(xbins(1:2))/2; yoff = diff(ybins(1:2))/2;
xv1 = repmat(xbins+xoff,[2 1]); xv1(end+1,:) = NaN;
yv1 = repmat([ybins(1)-yoff;ybins(end)+yoff;NaN],[1 size(xv1,2)]);
yv2 = repmat(ybins+yoff,[2 1]); yv2(end+1,:) = NaN;
xv2 = repmat([xbins(1)-xoff;xbins(end)+xoff;NaN],[1 size(yv2,2)]);
xgrid = [xv1(:);NaN;xv2(:)]; ygrid = [yv1(:);NaN;yv2(:)];
line(xgrid, ygrid, 'Color',[0.8 0.8 0.8], 'HandleVisibility','off')
%# plot points
h(1) = line('XData',X, 'YData',Y, 'LineStyle','none', ...
'Color','b', 'Marker','.', 'MarkerSize',1);
%# plot convex hull
h(2) = patch('XData',dt.X(k,1), 'YData',dt.X(k,2), ...
'LineWidth',2, 'LineStyle','-', ...
'EdgeColor','r', 'FaceColor','r', 'FaceAlpha',0.5);
%# plot freehand polygon
h(3) = plot(pos(:,1), pos(:,2), 'g-', 'LineWidth',2);
%# compare results
title(sprintf('area_{freehand} = %g, area_{grid} = %g, area_{convex} = %g', ...
ar1,ar2,ar3))
legend(h, {'Points' 'Convex Jull','FreeHand'})
hold off
Here is the final result of all three methods overlayed, with the area approximations displayed:
My answer is the simplest and perhaps the least elegant and precise. But first, a comment on previous answers:
Since your shape is usually kidney-shaped (not convex), calculating the area of its convex hull won't do, and an alternative is to determine its concave hull (see e.g. http://www.concavehull.com/home.php?main_menu=1) and calculate the area of that. But determining a concave hull is far more difficult than a convex hull. Plus, straggler points will cause trouble in both he convex and concave hull.
Delaunay triangulation followed by pruning, as suggested in #Ed Staub's answer, may a bit be more straightforward.
My own suggestion is this: How precise does your surface area calculation have to be? My guess is, not very. With either concave hull or pruned Delaunay triangulation, you'll have to make an arbitrary choice anyway as to where the "boundary" of your shape is (the edge isn't knife-sharp, and I see there are some straggler points sprinkled around it).
Therefore a simpler algorithm may be just as good for your application.
Divide your image in an orthogonal grid. Loop through all grid "pixels" or squares; if a given square contains at least one point (or perhaps two points?), mark the square as full, else empty. Finally, add the area of all full squares. Bingo.
The only parameter is the resolution length (size of the squares). Its value should be set to something similar to the pruning length in the case of Delaunay triangulation, i.e. "points within my shape are closer to each other than this length, and points further apart than this length should be ignored".
Perhaps an additional parameter is the number of points threshold for a square to be considered full. Maybe 2 would be good to ignore straggler points, but that may define the main shape a bit too tightly for your taste... Try both 1 and 2, and perhaps take an average of both. Or, use 1 and prune away the squares that have no neighbours (game-of-life-style). Simlarly, empty squares whose 8 neighbours are full should be considered full, to avoid holes in the middle of the shape.
There is no end to how much this algorithm can be refined, but due to the arbitrariness intrinsic to the problem definition in your particular application, any refinement is probably the algorithm equivalent of "polishing a turd".
I know next to nothing, so don't put much stock in this... consider doing a Delaunay triangulation. Then remove any hull (outer) edges longer than some maximum. Repeat until nothing to remove. Fill the remaining triangles.
This will orphan some outlier points.
I suggest using a space-filling-curve, for example a z-curve or better a moore curve. A sfc fills the full space and is good to index each points. For example for all f(x)=y you can sort the points of the curve in ascendending order and from that result you take as many points until you get a full roundtrip. These points you can then use to compute the area. Because you have many points maybe you want to use less points and use a cluster which make the result less accurate.
I think you can get the border points using convex hull algorithm with restriction to the edge length (you should sort points by vertical axis). Thus it will follow nonconvexity of your region. I propose length round 0.02. In any case you can experiment a bit with different lengths drawing the result and examining it visually.

Resources