Technique to reduce a dense point cloud in 3D

I have a point cloud consisting of more than 100,000 points, and I have to reduce this dense point cloud.
My point cloud is sorted with respect to the z axis.
I used simple mathematics: if the selected point has x = 3, y = 4, z = 5, I compare it against the remaining points with the criterion (x - x(i) == 0.0001f); if it matches, I try another one until the end of the point cloud and keep the most recently matched point. This is how I reduce the point cloud. It gives me results, but not up to my expectations.
So is there any technique to reduce a dense point cloud?

I should be writing this as a comment but don't have enough rep.
You can do a singular value decomposition. Take your big data matrix X (one point per row) and compute its SVD. Plot the singular values you obtain and see which of them carry a high weight; selecting those gives you the optimal rank r of the matrix. You then reconstruct your original matrix as X' = U Sig V, where each of these factors is truncated to rank r.
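A minimal NumPy sketch of the rank-r truncation described above, assuming the points are stacked as rows of X; the 95% energy threshold used to pick r is an arbitrary illustrative choice, not part of the answer.

import numpy as np

# X: one point per row (n x 3 for a 3D cloud); replace with your own data.
X = np.random.rand(100000, 3)
mean = X.mean(axis=0)

# SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)

# Keep the singular values that carry most of the weight.
# The 95% energy cutoff below is an assumption for the example.
r = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), 0.95)) + 1

# Rank-r reconstruction X' = U_r * Sig_r * V_r^T.
X_approx = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :] + mean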

Finding closest pair of points in the plane with non-distinct x-coordinates in O(n log n)

Most of the implementations of the algorithm to find the closest pair of points in the plane that I've seen online have one of two deficiencies: either they fail to meet an O(n log n) runtime, or they fail to accommodate the case where some points share an x-coordinate. Is a hash map (or equivalent) required to solve this problem optimally?
Roughly, the algorithm in question is (per CLRS Ch. 33.4):
For an array of points P, create additional arrays X and Y such that X contains all points in P, sorted by x-coordinate and Y contains all points in P, sorted by y-coordinate.
Divide the points in half - drop a vertical line so that you split X into two arrays, XL and XR, and divide Y similarly, so that YL contains all points left of the line and YR contains all points right of the line, both sorted by y-coordinate.
Make recursive calls for each half, passing XL and YL to one and XR and YR to the other, and finding the minimum distance, d in each of those halves.
Lastly, determine if there's a pair with one point on the left and one point on the right of the dividing line with distance smaller than d; through a geometric argument, we find that we can adopt the strategy of just searching through the next 7 points for every point within distance d of the dividing line, meaning the recombination of the divided subproblems is only an O(n) step (even if it looks O(n²) at first glance).
This has some tricky edge cases. One way people deal with this is sorting the strip of points of distance d from the dividing line at every recombination step (e.g. here), but this is known to result in an O(n log² n) solution.
Another way people deal with edge cases is by assuming each point has a distinct x-coordinate (e.g. here): note the snippet in closestUtil which adds to Pyl (or YL as we call it) if the x-coordinate of a point in Y is <= the line, or to Pyr (YR) otherwise. Note that if all points lie on the same vertical line, this would result in us writing past the end of the array in C++, as we write all n points to YL.
So the tricky bit when points can have the same x-coordinate is dividing the points in Y into YL and YR depending on whether a point p in Y is in XL or XR. The pseudocode in CLRS for this is (edited slightly for brevity):
for i = 1 to Y.length
    if Y[i] in X_L
        Y_L.length = Y_L.length + 1
        Y_L[Y_L.length] = Y[i]
    else
        Y_R.length = Y_R.length + 1
        Y_R[Y_R.length] = Y[i]
However, outside of pseudocode, if we're working with plain arrays, we don't have a magic function that can determine whether Y[i] is in X_L in O(1) time. If we're assured that all x-coordinates are distinct, sure: we know that anything with an x-coordinate less than the dividing line's is in XL, so with one comparison we know which array to partition any point p in Y into. But in the case where x-coordinates are not necessarily distinct (e.g. in the case where they all lie on the same vertical line), do we require a hash map to determine whether a point in Y is in XL or XR and successfully break Y down into YL and YR in O(n) time? Or is there another strategy?
Yes, there are at least two approaches that work here.
The first, as Bing Wang suggests, is to apply a rotation. If the angle is sufficiently small, this amounts to breaking ties by y coordinate after comparing by x, no other math needed.
The second is to adjust the algorithm on G4G to use a linear-time partitioning algorithm to divide the instance, and a linear-time sorted merge to conquer it. Presumably this was not done because the author valued the simplicity of sorting relative to the previously mentioned algorithms in most programming languages.
Kleinberg & Tardos suggest annotating each point with its position (index) in X.
You could do this in O(n) time, or, if you really, really want to, you could do it "for free" during the sorting operation.
With this annotation, you can do your O(1) partitioning: take the position pr of the right-most point in XL in O(1), and use it to determine whether a point in Y goes in YL (position <= pr) or YR (position > pr). This does not require an extra data structure like a hash map, but it does require that the same positions are used in X and Y.
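As a concrete illustration of that annotation trick, here is a minimal Python sketch; the tuple layout (x, y, pos_in_X) and the function name are made up for the example.

def split_y(Y, pr):
    # Y: points sorted by y-coordinate, each stored as (x, y, pos_in_X),
    # where pos_in_X is the point's index in the x-sorted array X.
    # pr: index in X of the right-most point that belongs to X_L.
    Y_L, Y_R = [], []
    for p in Y:
        if p[2] <= pr:      # compare annotated positions, not x-coordinates
            Y_L.append(p)
        else:
            Y_R.append(p)
    return Y_L, Y_R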
NB:
It is not immediately obvious to me that the partitioning of Y is the only problem that arises when multiple points share the same x-coordinate. It seems to me that the proof that the number of comparisons needed across partitions is linear breaks, but I have only seen the proof that you need at most 15 comparisons, not the proof for the stricter 7-point version, so I cannot be sure.

Get spiral index from location in 1d, 3d

Based on a known {x,y,z,...} coordinate, I'm looking for the index of a location. A 2-dimensional (2d) solution is provided here.
I'm now trying to extend this to two other dimensions: 1d and 3d (and possibly to generalize to higher dimensions).
For 1d, I ended up with the following algorithm (Matlab code), where the walk alternates between the right and left side of the axis:
n = 20;          % number of values
X = -n/2:n/2;    % X values (1d)
% we want 'p', the index of the location:
for i = 1:numel(X)
    if (X(i) > 0)
        p(i) = 2*X(i) - 1;
    else
        p(i) = -2*X(i);
    end
end
resulting in the following indexes:
However, I have difficulty visualizing how the indexing should take place in 3d (i.e. how the index walks through the nodes in 3d). I'm primarily interested in a C/C++ solution but any other language is fine.
EDIT
Reflecting @Spektre's comments and suggestions: I aim at finding the indexes of a set of 3d coordinates {x,y,z}. This can be seen as a way to map the 3d coordinates onto a set of indexes (1d). The spiral provides a convenient way to perform such a task in 2d, but cannot be directly extended to 3d.
Well, as you chose a line/square/cube-like coil of screws to fill your 1D/2D/3D space like this:
I would:
Implement line/square/cube maps
These map between the 1D index ix and a coordinate for a known "radius". It will be similar to this:
template<class T,int N> class cube_map
See the member functions ix2dir and dir2ix, which map between an index and a direction vector. There is no need to store the surface data; you just need these conversion functions. However, you need to tweak them so that the order of points/indexes represents the pattern you want... In 1D and 2D this is easy, but for 3D I would choose something like a surface spiral on a cube, similar to this:
How to distribute points evenly on the surface of hyperspheres in higher dimensions?
Do not forget to handle even and odd screws in 3D differently (mirror them) so the screws join at the correct locations...
Just to be complete: the cube map is like a single screw in your 3D system. It holds the surface of a cube and can convert between a direction vector dir and a 1D index ix, back and forth. It is used to speed up algorithms in both graphics and geometry... I think its first use was for fast vector normalization in bump mapping. By using cube_map you can do the analogous thing in any dimensionality; even 2D and 1D are easy, you just use square_map and line_map instead without any algorithmic changes. I have tested it on OBB in 2D/3D; after conversion to cube/square maps the algorithm stayed the same, only the vectors had a different number of coordinates (without the maps the algorithms were very different).
Create/derive an equation/LUT that describes how many points a given radius covers (inner screws included)
So, a function that returns the number of points contained in a coil of r screws together. It is a series, so the equation should be easy to derive/infer. Let's call this:
points = volume(r);
Now the conversion ix -> (x,y,z,...)
First find which radius r your ix belongs to, so that (consistent with the reverse conversion below):
volume(r-1) <= ix < volume(r)
A simple for loop will do, but even better would be a binary search for r. Then convert your ix into an index iy within the line/square/cube map:
iy = ix - volume(r-1);
Now just use the ix2dir(iy) function to obtain your point position...
Reverse conversion (x,y,z...) -> ix
r = max(|x|,|y|,|z|,...)
iy = dir2ix(x,y,z,...)
ix = volume(r-1) + iy
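To make the flow concrete, here is a rough Python sketch of how the pieces fit together for the 3D case. volume(r) counts lattice points with Chebyshev radius at most r; the shell ordering used by ix2dir/dir2ix below is a plain lexicographic enumeration, i.e. a placeholder for the tuned spiral-on-cube ordering described above.

from bisect import bisect_right

def volume(r):
    # number of grid points covered by screws 0..r (inner screws included)
    return (2*r + 1)**3 if r >= 0 else 0

def shell(r):
    # all points with Chebyshev radius exactly r, in a fixed (lexicographic) order;
    # replace this ordering with the spiral pattern you actually want
    if r == 0:
        return [(0, 0, 0)]
    return [(x, y, z)
            for x in range(-r, r + 1)
            for y in range(-r, r + 1)
            for z in range(-r, r + 1)
            if max(abs(x), abs(y), abs(z)) == r]

def ix2dir(r, iy):
    return shell(r)[iy]

def dir2ix(p):
    # p: a point given as a tuple (x, y, z)
    r = max(abs(c) for c in p)
    return r, shell(r).index(p)

def ix_to_point(ix, r_max=64):
    # find r with volume(r-1) <= ix < volume(r), here by binary search
    vols = [volume(r) for r in range(r_max)]
    r = bisect_right(vols, ix)
    iy = ix - volume(r - 1)
    return ix2dir(r, iy)

def point_to_ix(p):
    r, iy = dir2ix(p)
    return volume(r - 1) + iy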

How does Principal Component Initialization work for determining the weights of the map vectors in Self Organizing Maps?

I studied a fundamental SOM initialization and was looking to understand exactly how this process, PCI, works for initializing the weight vectors on the map. My understanding is that for a two-dimensional map, this initialization method looks at the eigenvectors for the two largest eigenvalues of the data matrix and then uses the subspace spanned by these eigenvectors to initialize the map. Does that mean that, in order to get the initial map weights, this method takes random linear combinations of the two largest eigenvectors to generate the map weights? Is there a pattern?
For example, for 40 input data vectors on the map, does the lininit initialization method take combinations a1*[e1] + a2*[e2] where [e1] and [e2] are the two largest eigenvectors and a1 and a2 are random integers ranging from -3 to 3? Or is there a different mechanism? I was looking to make sure I knew exactly how lininit takes the two largest eigenvectors of the input data matrix and uses them to construct the initial weight vectors for the map.
The SOM creates a map that has the neighbourhood relationship between nearby nodes. Random initialisation does not help this process, since the nodes start randomly. Therefore, the idea of using the PCA initialisation is just a shortcut to get the map closer to the final state. This saves a lot of computation.
So how does this work? The first two principal components (PCs) are used. The initial weights are set as linear combinations of the PCs. Rather than using random a1 and a2, the weights are set in a range that corresponds to the scale of the principal components.
For example, for a 5x3 map, a1 and a2 can both be in the range (-1, 1) with the relevant number of elements. In other words, for the 5x3 map, a1 = [-1.0 -0.5 0.0 0.5 1.0] and a2 = [-1.0 0.0 1.0], with 5 nodes and 3 nodes, respectively.
Then set the weight of each node. For a rectangular SOM, each node has indices [m, n]. Use the values a1[m] and a2[n]. Thus, for all m = [1 2 3 4 5] and n = [1 2 3]:
weight[m, n] = a1[m] * e1 + a2[n] * e2
That is how to initialize the weights using the principal components. This makes the initial state globally ordered, so now the SOM algorithm is used to create the local ordering.
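A minimal NumPy sketch of that scheme for a 5x3 map; the random data matrix and variable names are illustrative, not taken from lininit.

import numpy as np

data = np.random.rand(40, 10)            # 40 input vectors with 10 features
X = data - data.mean(axis=0)

# eigenvectors of the covariance matrix; eigh returns eigenvalues in ascending
# order, so the last two columns belong to the two largest eigenvalues
_, eigvec = np.linalg.eigh(np.cov(X, rowvar=False))
e1, e2 = eigvec[:, -1], eigvec[:, -2]

a1 = np.linspace(-1.0, 1.0, 5)           # one coefficient per map row
a2 = np.linspace(-1.0, 1.0, 3)           # one coefficient per map column

# weight[m, n] = a1[m] * e1 + a2[n] * e2, for every node [m, n]
weights = a1[:, None, None] * e1 + a2[None, :, None] * e2   # shape (5, 3, 10)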
The Principal Component part of the name is a reference to https://en.wikipedia.org/wiki/Principal_component_analysis.
Here is the idea. You start with data points described by vectors of many underlying factors. But those factors may be correlated in your data. So, for example, if you're measuring height, weight, blood pressure, etc., you expect that tall people will weigh more. But what you want to do is replace this with vectors of factors that are not correlated with each other in your data.
So your principal component is a vector of length 1 which is as strongly correlated as possible with the variation in your dataset.
Your secondary component is the vector of length 1 at right angles to the first which is as strongly correlated as possible with the rest of the variation in your data set.
Your tertiary component is the vector of length 1 at right angles to the first two which is as strongly correlated as possible with the rest of the variation in your data set.
And so on.
In practice you may start with many factors, but most of the information is captured in just the first few. For example in the results of intelligence testing the first component is IQ and the second is the difference between how you are at verbal and quantitative reasoning.
How this applies to SOM initialization is that a simple linear model built off of PCA analysis is a pretty good guess for the answer that you're looking for, so starting there reduces how much work you have to do to finish getting the answer.
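A small NumPy illustration of the claim above that most of the information is captured by the first few components; the synthetic height/weight data is made up purely to show the effect.

import numpy as np

rng = np.random.default_rng(0)
height = rng.normal(170, 10, 500)
weight = 0.9 * height + rng.normal(0, 5, 500)    # correlated with height
other = rng.normal(0, 1, 500)                    # an independent factor
X = np.column_stack([height, weight, other])

# eigenvalues of the covariance matrix, largest first
eigvals = np.linalg.eigvalsh(np.cov(X - X.mean(axis=0), rowvar=False))[::-1]
print(eigvals / eigvals.sum())   # the first component carries most of the variance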

Normalized Graph Cuts Image Segmentation

I'm implementing the normalized graph-cuts algorithm in MATLAB. Can someone please explain how to proceed after bi-partitioning using the second smallest eigenvector? Now I have 2 segments; what is the meaning of "recursively bi-partitioning the segmented parts"?
Recursively bi-partitioning means that you need to write a recursive function; this function should bi-partition your current segment in each call.
You have an image I. You partition that image into two segments, I1 and I2. Then you bi-partition each of those, which you can call I11, I12 and I21, I22. Then you bi-partition each of those segments again, which you can call I111, I112 and I121, I122, and I211, I212 and I221, I222, and you continue in this way...
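A skeletal Python sketch of that recursion; the bipartition callback (which would build W, solve the eigenproblem and threshold the second eigenvector) and the simple depth-based stopping rule are placeholders, not part of the answer.

def recursive_ncut(segment, bipartition, depth=0, max_depth=3):
    # segment: e.g. a list of pixel indices belonging to the current region
    # bipartition: a function that splits one segment into two (I -> I1, I2)
    if depth >= max_depth or len(segment) < 2:
        return [segment]
    part1, part2 = bipartition(segment)
    return (recursive_ncut(part1, bipartition, depth + 1, max_depth) +
            recursive_ncut(part2, bipartition, depth + 1, max_depth))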
I assume you use the Matlab function eigs to solve the generalized eigenvalue problem with the 'sm' option:
[eigvectors,eigvalues]=eigs(L,D,6,'sm')
L is the Laplacian matrix, L = D - W
D is the diagonal degree matrix (D(i,i) is the sum of row i of W)
W is your weight (affinity) matrix
The number 6 means that you are looking for 6 eigenvectors, and 'sm' means you want the eigenvalues of smallest magnitude.
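For reference, a small dense Python/SciPy sketch of the same generalized eigenproblem L v = lambda D v; the random points and the Gaussian affinity are illustrative only, and MATLAB's eigs with 'sm' is the sparse analogue of this.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
pts = rng.random((200, 2))                             # toy "pixels" as 2D points
d2 = ((pts[:, None, :] - pts[None, :, :])**2).sum(-1)
W = np.exp(-d2 / 0.1)                                  # Gaussian affinity matrix
np.fill_diagonal(W, 0)

D = np.diag(W.sum(axis=1))                             # diagonal degree matrix
L = D - W                                              # Laplacian

vals, vecs = eigh(L, D)        # generalized problem, eigenvalues ascending
second = vecs[:, 1]            # second smallest eigenvector, used to bi-partition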
My thesis for my Master's degree in AI was about improving segmentation using normalized cuts; feel free to ask any questions.

permuting the rows and columns of a matrix for clustering [closed]

I have a distance matrix that is 1000x1000 in dimension and symmetric, with 0s along the diagonal. I want to form groupings of distances (clusters) by simultaneously reordering the rows and columns of the matrix. This is like reordering a matrix before you visualize its clusters with a heatmap. I feel like this should be an easy problem, but I am not having much luck finding code that does the permutations online. Can anyone help?
Here is one approach that came to mind:
"Sparsify" the matrix so that only "sufficiently close" neighbors have a nonzero value in the matrix.
Use a Cuthill-McKee algorithm to compress the bandwidth of the sparse matrix.
Do a symmetric reordering of the original matrix using the results from Step 2.
Example
I will use Octave (everything I am doing should also work in Matlab) since it has a Reverse Cuthill-McKee (RCM) implementation built in.
First, we need to generate a distance matrix. This function creates a random set of points and their distance matrix:
function [x, y, A] = make_rand_dist_matrix(n)
    x = rand(n, 1);
    y = rand(n, 1);
    A = sqrt((repmat(x, 1, n) - repmat(x', n, 1)).^2 + ...
             (repmat(y, 1, n) - repmat(y', n, 1)).^2);
end
Let's use that to generate and visualize a 100-point example.
[x, y, A] = make_rand_dist_matrix(100);
surf(A);
Viewing the surface plot from above gets the image below (yours will be different, of course).
Warm colors represent greater distances than cool colors. Row (or column, if you prefer) i in the matrix contains the distances between point i and all points. The distance between point i and point j is in entry A(i, j). Our goal is to reorder the matrix so that the row corresponding to point i is near rows corresponding to points a short distance from i.
A simple way to sparsify A is to make all entries greater than some threshold zero, and that is what is done below, although more sophisticated approaches may prove more effective.
B = A < 0.2; % sparsify A -- only values less than 0.2 are nonzeros in B
p = symrcm(B); % compute reordering by Reverse Cuthill-McKee
surf(A(p, p)); % visualize reordered distance matrix
The matrix is now ordered in a way that brings nearby points closer together in the matrix. This result is not optimal, of course. Sparse matrix bandwidth compression is computed using heuristics, and RCM is a very simple approach. As I mentioned above, more sophisticated approaches for producing the sparse matrix may give better results, and different algorithms may also yield better results for the problem.
Just for Fun
Another way to look at what happened is to plot the points and connect a pair of points if their corresponding rows in the matrix are adjacent. Your goal is to have the lines connecting pairs of points that are near each other. For a more dramatic effect, we use a larger set of points than above.
[x, y, A] = make_rand_dist_matrix(2000);
plot(x, y); % plot the points in their initial, random order
Clearly, connections are all over the place and are occurring over a wide variety of distances.
B = A < 0.2; % sparsify A
p = symrcm(B);
plot(x(p), y(p)) % plot the reordered points
After reordering, the connections tend to be over much smaller distances and much more orderly.
Two Matlab functions do this: symrcm and symamd.
Note that there is no unique solution to this problem. Clustering is another approach.
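As a rough sketch of that clustering route in Python/SciPy (the random distance matrix and the 'average' linkage are arbitrary choices): hierarchical clustering of the distance matrix gives a leaf order that can be used to permute rows and columns.

import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

# A: symmetric distance matrix with zero diagonal (stand-in for your 1000x1000 one)
rng = np.random.default_rng(0)
pts = rng.random((1000, 2))
A = np.sqrt(((pts[:, None, :] - pts[None, :, :])**2).sum(-1))

Z = linkage(squareform(A, checks=False), method='average')
p = leaves_list(Z)               # leaf order of the dendrogram
A_reordered = A[np.ix_(p, p)]    # permute rows and columns consistently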
