I have a project that needs to categorize 3D models based on their complexity.
By "complexity" I mean, for example, that a 3D model of modern-style furniture has low complexity, while a 3D model of royal-style furniture has very high complexity.
All the 3D models are meshes. I only need a very rough estimate; the reliability doesn't have to be high, but it should be correct most of the time.
Please suggest an approach or algorithm for this purpose (not based on vertex count).
It would be best if we can process it inside MeshLab, but any other tool is fine too.
Thanks!
Let's consider a sphere: it looks simple, but it can be made of many vertices. I don't think that counting vertices gives a good estimate of complexity. A sphere's vertices show very little diversity.
Now consider the old versus the simple, modern furniture: the old piece potentially has many different vertices, and their organization is not "simple".
To measure complexity, I propose considering:
the number of distinct angles (and solid angles) between edges
the number of distinct edge lengths (e.g., distances between connected vertices)
So far so good, but we got here by counting global complexity. What if, with the same set of edges and vertices, we ordered them and built something that changes in a monotonic manner? So we also need to take local complexity into account: say, the complexity within a limited chunk of space.
An algorithm is taking form:
divide the space into smaller cells
count the sets of distinct edges by angle and length
You can also take several scales into account by varying the size of the space divisions, counting the sets each time, and finally multiplying or adding the results.
I think you have something interesting here. The thing is, this algorithm is rather close to some methods for estimating the dimension of a fractal object.
See papers (Google Scholar) on "estimating fractal dimension".
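To make the multi-scale counting concrete, here is a minimal box-counting sketch (Python/NumPy), assuming the mesh is available as an (n, 3) array of vertex positions; for brevity it counts occupied cells per scale rather than the full angle/length sets:

```python
import numpy as np

def box_counting_dimension(vertices, scales=(2, 4, 8, 16, 32, 64)):
    """Estimate a box-counting (fractal) dimension from mesh vertex positions."""
    mins = vertices.min(axis=0)
    span = float((vertices.max(axis=0) - mins).max()) or 1.0
    counts = []
    for s in scales:
        # Assign each vertex to a cell of an s x s x s grid over the bounding box.
        cells = np.floor((vertices - mins) / span * s).clip(0, s - 1).astype(int)
        counts.append(len(set(map(tuple, cells))))
    # The slope of log(#occupied cells) vs log(scale) approximates the dimension.
    return np.polyfit(np.log(scales), np.log(counts), 1)[0]
```

A densely tessellated sphere and a coarse one give similar slopes, while an ornate model keeps filling new cells at every scale, which is exactly the behaviour the proposal is after.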
3D models are composed of vertices, and vertices are connected by edges to form faces. A rough measure of complexity, from a computational standpoint, would be to count the vertices or faces.
This approach falls down when trying to categorize the two chairs: it's entirely possible for a simple chair to have more vertices and faces than the regal chair.
To address that limitation, I would merge adjacent faces with congruent normal vectors. If two faces share an edge and have congruent normal vectors, they can be said to be coplanar. This has the effect of simplifying the 3D model: a simple object should end up with fewer vertices/faces after this operation than a more complex model. At least in theory.
I'm sure there's a name for this algorithm, but I don't know it.
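For what it's worth, a hedged sketch of that merge, assuming the mesh comes as triangle index triples plus unit face normals (both names illustrative); union-find groups faces across shared edges when their normals nearly agree, and the number of resulting groups serves as the complexity score:

```python
import numpy as np

def coplanar_region_count(faces, normals, tol=1e-3):
    """Count groups of adjacent faces with (nearly) equal unit normals."""
    parent = list(range(len(faces)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    # Map each edge to the faces that share it.
    edge_faces = {}
    for fi, f in enumerate(faces):
        for a, b in ((f[0], f[1]), (f[1], f[2]), (f[2], f[0])):
            edge_faces.setdefault(frozenset((a, b)), []).append(fi)
    # Merge face pairs across each shared edge when normals are nearly parallel.
    for adj in edge_faces.values():
        for i in range(len(adj) - 1):
            a, b = find(adj[i]), find(adj[i + 1])
            if a != b and np.dot(normals[adj[i]], normals[adj[i + 1]]) > 1 - tol:
                parent[a] = b
    return len({find(i) for i in range(len(faces))})
```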
Related
I am looking for a numerical method to calculate the volume of the intersection of more than two cylinders at arbitrary angles (not just 90°, the Steinmetz solid). There is an old paper by Hubbell (1965), but it only works for two cylinders.
Evidently I could do the calculation by hand, but I need a numerical method since I am computing millions of random intersections.
Exact computation of the intersection volume looks like quite an endeavour: the graph of edges can have high complexity, and the edges are complicated skew curves.
I would try a voxelization of space with one bit per voxel (2000³ voxels require 1 GB of memory). An octree representation might help lower the storage requirement, with the number of cells required being closer to the area than to the volume.
In any case, filling the cylinders will take a significant amount of time.
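A minimal sketch of the voxel idea, assuming each cylinder is given as (axis point, unit direction, radius) and treated as infinite along its axis; the bounding box and grid resolution are the only other inputs (all names are mine):

```python
import numpy as np

def intersection_volume(cylinders, lo, hi, n=128):
    """Bit-per-voxel estimate of the common intersection over [lo, hi]^3."""
    axes = [np.linspace(lo[k], hi[k], n) for k in range(3)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    pts = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
    inside = np.ones(len(pts), dtype=bool)          # one bit per voxel
    for point, direction, radius in cylinders:
        d = pts - point
        # Distance from each voxel centre to the cylinder axis.
        axial = d @ direction
        inside &= np.linalg.norm(d - axial[:, None] * direction, axis=1) <= radius
    voxel = np.prod((np.asarray(hi) - np.asarray(lo)) / n)
    return inside.sum() * voxel

# Sanity check: two perpendicular unit cylinders (Steinmetz) -> 16/3 ~ 5.33.
cyls = [(np.zeros(3), np.array([0.0, 0.0, 1.0]), 1.0),
        (np.zeros(3), np.array([0.0, 1.0, 0.0]), 1.0)]
print(intersection_volume(cyls, lo=(-1, -1, -1), hi=(1, 1, 1)))
```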
We have some polylines (lists of points with a start and an end point, not cyclic) and polygons (cyclic lists of points, with no endpoints).
We want to map each polyline to a new polyline and each polygon to a new polygon so that the total number of edges is small enough.
Let's say the number of edges is originally N, and we want our result to have M edges, where N is much larger than M.
Polylines need to keep their start and end points, so each contributes at least 1 edge, one less than its vertex count. Polygons need to remain polygons, so each contributes at least 3 edges, equal to its vertex count. M will be at least large enough to meet these requirements.
The outputs should be as close as possible to the inputs. This ends up being an optimization problem: minimizing some metric to within some small tolerance of the true optimal solution. Originally I'd have used the area of the symmetric difference between original and result (the area between them), but if another metric makes this easier, I'll gladly take that instead.
It's okay if the results only include vertices from the originals; the fit will be a little worse, but that might be necessary to keep the time complexity down.
Since I'm asking for an algorithm, it'd be nice to also see an implementation. I'll likely have to re-implement it for where I'll be using it anyway, so details like the language or data structures won't matter too much.
As for how good the approximation needs to be: about what you'd expect from tracing a vector image from a bitmap. The actual use is a tool for a game, though, and there are some odd game-specific details; that's why the output edge count is fixed rather than the tolerance.
It's pretty hard to find information on this kind of thing, so even short of a full workable algorithm, some pointers would be very much appreciated.
The Ramer–Douglas–Peucker algorithm (mentioned in the comments) is definitely good, but it has some disadvantages:
It requires an open polyline on input; for a closed polygon one has to fix an arbitrary starting point, which either decreases the final quality or forces testing many starting points and decreases performance.
The vertices of the simplified polyline are a subset of the original polyline's vertices; other options are not considered. This permits very fast implementations, but again decreases the precision of the simplified polyline.
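For reference, a compact sketch of the classic RDP recursion on an open 2-D polyline (function and parameter names are mine):

```python
import numpy as np

def rdp(points, eps):
    """Ramer-Douglas-Peucker: keep points farther than eps from the chord."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return pts.tolist()
    start, end = pts[0], pts[-1]
    chord = end - start
    norm = np.linalg.norm(chord) or 1.0
    # Perpendicular distance of each point to the start-end chord.
    d = np.abs(chord[0] * (pts[:, 1] - start[1])
               - chord[1] * (pts[:, 0] - start[0])) / norm
    i = int(np.argmax(d))
    if d[i] <= eps:
        return [start.tolist(), end.tolist()]
    # Keep the farthest point and recurse on both halves.
    return rdp(pts[: i + 1], eps)[:-1] + rdp(pts[i:], eps)
```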
Another alternative is to take the well-known algorithm for simplifying triangular meshes, Surface Simplification Using Quadric Error Metrics, and adapt it for polylines:
distances to planes containing triangles are replaced with distances to lines containing polyline segments,
the quadratic forms lose one dimension if the polyline is two-dimensional.
The majority of the algorithm is kept, including the queue of edge contractions (a min-heap) ordered by the estimated distortion each contraction produces in the polyline.
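Not the full quadric machinery, but a toy stand-in that shows the min-heap contraction loop; the error metric here is each removable vertex's triangle area (closer to Visvalingam-Whyatt than to true QEM), which keeps the sketch short while preserving the endpoints as the question requires:

```python
import heapq

def simplify_polyline(points, target_edges):
    """Greedily drop interior vertices with the smallest area error."""
    pts = list(points)
    alive = [True] * len(pts)
    prev = list(range(-1, len(pts) - 1))   # doubly linked indices
    nxt = list(range(1, len(pts) + 1))

    def area(i):
        (ax, ay), (bx, by), (cx, cy) = pts[prev[i]], pts[i], pts[nxt[i]]
        return abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2

    heap = [(area(i), i) for i in range(1, len(pts) - 1)]
    heapq.heapify(heap)
    edges = len(pts) - 1
    while heap and edges > target_edges:
        err, i = heapq.heappop(heap)
        if not alive[i] or err != area(i):       # stale heap entry
            continue
        alive[i] = False                         # contract: drop vertex i
        nxt[prev[i]], prev[nxt[i]] = nxt[i], prev[i]
        edges -= 1
        for j in (prev[i], nxt[i]):              # re-score the neighbours
            if 0 < j < len(pts) - 1 and alive[j]:
                heapq.heappush(heap, (area(j), j))
    return [p for p, a in zip(pts, alive) if a]
```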
Here is an example of the adapted algorithm in action:
Red is the original polyline and blue the simplified one. Note that not all of the simplified polyline's vertices lie on the original polyline, while the general shape is preserved as much as possible with so few line segments.
it'd be nice to also see an implementation.
An implementation can be found in MeshLib; see MRPolylineDecimate.h/.cpp.
I'm currently writing a web application for creating and manipulating graphs (in the graph theory sense, not charts). For this, I want to implement a number of "arrange as ..." functions that take the selected vertices and arrange them into certain shapes (you can ignore the edges).
Now writing simple algorithms to arrange the vertices into a grid or circle is trivial. What I want to do though is to find a general algorithm for taking n actual vertex coordinates and n destination vertex coordinates, and finding an optimal (or near optimal) mapping from the former to the latter so that the sum or average (whichever is easiest) of distances the vertices need to be moved is minimized. The idea is that these functions should mostly just "clean up" an existing arrangement without fundamentally altering relative positions if the vertices are somewhat similar to the desired arrangement already.
For example, if I have 12 vertices arranged in a rough circle, labeled 1-12 like the hours on a clock, I would like my "arrange as circle" algorithm to snap them to a perfect circle with the same ordering 1-12 like the hours on a clock. If I have 25 vertices arranged in a rough 5x5 grid, I would like my "arrange as grid" algorithm to snap them to a perfect 5x5 grid with the same ordering.
Of course I could theoretically use a generalized constraints-optimization / hill-climbing algorithm or brute-force the permutation, but both are too inefficient to perform client-side in the browser. Is there a more specific, known method for finding good "low-energy" 1:1 mappings between lists of 2d coordinates?
This is known as the assignment problem, or more specifically the linear assignment problem (since the number of objects and the number of destinations are the same). There are various algorithms to solve it, most notably the Hungarian algorithm.
See https://en.wikipedia.org/wiki/Assignment_problem
Your cost function C(i,j) will simply be
C(i,j) = distance between points i and j
where the points i are your current locations and the points j are your destination locations.
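As a quick illustration, SciPy's linear_sum_assignment solves exactly this (the coordinate arrays here are made up):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

current = np.array([[0.1, 1.0], [0.9, 0.2], [0.0, 0.0]])   # actual positions
targets = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]])   # destination shape

# C[i, j] = distance between current point i and destination j.
cost = np.linalg.norm(current[:, None, :] - targets[None, :, :], axis=2)
rows, cols = linear_sum_assignment(cost)    # Hungarian-style optimal matching
print(list(zip(rows, cols)), cost[rows, cols].sum())
```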
Suppose we are given a small number of objects and "distances" between them -- what algorithm exists for fitting these objects to points in two-dimensional space in a way that approximates these distances?
The difficulty here is that the "distance" is not a distance in Euclidean space -- this is why we can only fit/approximate.
(For those interested in precisely what the notion of distance is: it is the symmetric difference metric on the power set of a (finite) set.)
Given that the number of objects is small, you can create an undirected weighted graph, where the objects are nodes and the edge between any two nodes has a weight corresponding to the distance between those two objects. You end up with n*(n-1)/2 edges.
Once the graph is created, there is plenty of visualization software, and plenty of layout algorithms, that work on graphs.
Try a triangulation method, something like this:
Start by taking three objects with known distances between them, and create a triangle in an arbitrary grid based on the side lengths.
For each object that has not yet been placed, find at least three already-placed objects with known distances to it, and use those distances to place the object by distance/distance intersection (i.e. the intersection point of the three circles centred on the fixed points, with radii equal to the distances); a sketch of this step follows at the end of this answer.
Repeat until all objects have been placed, or no more objects can be placed.
For unplaced objects, you could start another similar exercise, then use any available distances to relate the separate clusters. Look up triangulation and trilateration networks for more information.
Edit: as per the comment below, where the distances are approximate and include an element of error, the above approach may be used to establish provisional coordinates for each object, and those coordinates may then be adjusted using a least squares method such as variation of coordinates. This would also cater for weighting distances by their magnitude as required; for a more detailed description, see Ghilani & Wolf's book on the subject. Much depends on the nature of the differences between your distances and true Euclidean distances, and on how you would like your objects represented based on them; that relationship needs to be modelled and applied as part of any solution.
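An illustrative sketch of the distance/distance intersection step: subtracting circle equations removes the quadratic terms, and least squares tolerates slightly inconsistent radii (all names here are mine):

```python
import numpy as np

def trilaterate(anchors, radii):
    """Place a 2-D point from >= 3 fixed points and measured distances."""
    (x1, y1), r1 = anchors[0], radii[0]
    A, b = [], []
    for (xi, yi), ri in zip(anchors[1:], radii[1:]):
        # Subtracting circle i's equation from circle 1's linearizes the system.
        A.append([2 * (xi - x1), 2 * (yi - y1)])
        b.append(r1**2 - ri**2 + xi**2 - x1**2 + yi**2 - y1**2)
    return np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)[0]

# The point (2, 2) seen from three anchors, each at distance sqrt(8).
print(trilaterate([(0, 0), (4, 0), (0, 4)], [8**0.5, 8**0.5, 8**0.5]))
```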
This is an example of multidimensional scaling or, more generally, nonlinear dimensionality reduction. There are a fair number of tools/libraries for doing this (see the second link for a list).
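For instance, a minimal sketch with scikit-learn's MDS, assuming D is the symmetric matrix of pairwise "distances" between the objects:

```python
import numpy as np
from sklearn.manifold import MDS

# D is the precomputed dissimilarity matrix (toy values for illustration).
D = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])

# Metric MDS finds 2-D coordinates whose Euclidean distances approximate D.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)
print(coords)
```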
Problem Statement:
I have the following problem:
There are more than a billion points in 3D space. The goal is to find the top N points that have the largest number of neighbors within a given distance R. An additional condition is that the distance between any two of those top N points must be greater than R. The distribution of the points is not uniform; it is very common for certain regions of the space to contain many points.
Goal:
Find an algorithm that scales well to many processors and has a small memory requirement.
Thoughts:
Normal spatial decomposition is not sufficient for this kind of problem due to the non-uniform distribution. An irregular spatial decomposition that evenly divides the number of points might help. I would really appreciate it if someone could shed some light on how to solve this problem.
Use an octree. For 3D data with a limited value domain, it scales very well to huge data sets.
Many of the aforementioned methods, such as locality-sensitive hashing, are approximate versions designed for much higher dimensionality, where you can't split sensibly anymore.
Splitting at each level into 8 bins (2^d for d = 3) works very well. And since you can stop when there are too few points in a cell, and build a deeper tree where there are a lot of points, this should fit your requirements quite well.
For more details, see Wikipedia:
https://en.wikipedia.org/wiki/Octree
Alternatively, you could try to build an R-tree. But the R-tree tries to balance, making it harder to find the densest areas; for your particular task, this drawback of the octree is actually helpful! The R-tree puts a lot of effort into keeping the tree depth equal everywhere, so that each point can be found in approximately the same time. You, however, are only interested in the dense areas, which lie on the longest paths in the octree, without your even having to look at the actual points yet!
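A toy stand-in for one level of this idea (a flat grid of 2^depth cells per axis rather than a real recursive octree; names are illustrative):

```python
import numpy as np
from collections import Counter

def octree_level_counts(points, depth):
    """Count points per cell of the regular grid at a given octree depth."""
    mins = points.min(axis=0)
    span = float((points.max(axis=0) - mins).max()) or 1.0
    k = 1 << depth                                  # cells per axis at this depth
    cells = np.floor((points - mins) / span * k).clip(0, k - 1).astype(int)
    return Counter(map(tuple, cells))

pts = np.random.rand(100_000, 3)
# The fullest cells at a deep level flag the dense regions to descend into.
print(octree_level_counts(pts, depth=4).most_common(5))
```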
I don't have a definite answer for you, but I have a suggestion for an approach that might yield a solution.
I think it's worth investigating locality-sensitive hashing. Dividing the points evenly and then applying this kind of LSH to each set should be readily parallelisable. If you design your hashing algorithm such that the bucket size is defined in terms of R, it seems likely that, for a given set of points divided into buckets, the points satisfying your criteria will be found in the fullest buckets.
Having performed this locally, perhaps you can apply some kind of map-reduce-style strategy to combine spatial buckets from different parallel runs of the LSH algorithm in a step-wise manner, making use of the fact that you can begin to exclude parts of the problem space by discounting entire buckets. Obviously you'll have to be careful about edge cases that span different buckets, but I suspect that at each merging stage you could apply different bucket sizes/offsets to remove this effect (e.g. merge spatially equivalent buckets as well as adjacent ones). I believe this method could keep memory requirements small (i.e. you shouldn't need to store much more than the points themselves at any given moment, and you are always operating on small(ish) subsets).
If you're looking for a heuristic, then I think this will immediately yield something resembling a "good" solution, i.e. a small number of probable points which you can check against your criteria. If you need an exact answer, then you'll have to apply some other methods to trim the search space as you begin to merge parallel buckets.
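A sketch of the bucketing with cell size tied to R, so each point's neighbours within R lie in its own cell or the 26 adjacent ones (everything here is illustrative):

```python
import numpy as np
from collections import defaultdict

def bucket_by_R(points, R):
    """Hash each point to the integer grid cell of side R that contains it."""
    buckets = defaultdict(list)
    for i, p in enumerate(points):
        buckets[tuple(np.floor(p / R).astype(int))].append(i)
    return buckets

pts = np.random.rand(100_000, 3)
buckets = bucket_by_R(pts, R=0.05)
# The fullest buckets are the first candidates for the top-N dense points.
print(sorted(buckets, key=lambda c: len(buckets[c]), reverse=True)[:5])
```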
Another thought I had was that this could relate to finding the metric k-center. It's definitely not the exact same problem, but perhaps some of the methods used to solve it are applicable here. The problem is that this assumes you have a metric space in which computing the distance metric is cheap; in your case, however, the presence of a billion points makes any kind of global traversal (e.g. sorting the distances between points) undesirable and difficult. As I said, just a thought, and perhaps a source of further inspiration.
Here are some possible parts of a solution. There are various choices at each stage, which will depend on Ncluster, on how fast the data changes, and on what you want to do with the means.
Three steps: quantize, box, K-means.
1) Quantize: reduce the input XYZ coordinates to, say, 8 bits each, by taking 2^8 percentiles of X, Y, Z separately. This will speed up the whole flow without much loss of detail. You could sort all 1G points, or just a random 1M, to get 8-bit boundaries x0 < x1 < ... < x256, y0 < y1 < ... < y256, z0 < z1 < ... < z256 with 2^(30-8) points in each range. To map a float X to an 8-bit x, an unrolled binary search is fast; see Bentley, Pearls p. 95.
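A sketch of the percentile quantization, with np.searchsorted standing in for the unrolled binary search; pts is an (n, 3) float array and the subsampling mirrors "just a random 1M":

```python
import numpy as np

def quantize8(pts, sample=1_000_000):
    """Map float XYZ to 8-bit codes via per-axis percentile boundaries."""
    sub = pts[np.random.choice(len(pts), min(sample, len(pts)), replace=False)]
    # 255 interior percentile boundaries per axis -> 256 bins each.
    bounds = [np.percentile(sub[:, k], np.linspace(0, 100, 257)[1:-1])
              for k in range(3)]
    return np.stack([np.searchsorted(bounds[k], pts[:, k]).astype(np.uint8)
                     for k in range(3)], axis=1)
```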
Added: k-d trees split any point cloud into different-sized boxes, each with ~Leafsize points; that's much better than splitting X, Y, Z as above. But AFAIK you'd have to roll your own k-d tree code to split only the first, say, 16M boxes, and to keep counts only, not the points.
2) Box: count the number of points in each 3D box [xj .. xj+1, yj .. yj+1, zj .. zj+1]. The average box will have 2^(30-3*8) points; the distribution will depend on how clumpy the data is. If some boxes are too big or get too many points, you could a) split them into 8, or b) track the centre of the points in each box; otherwise just take box midpoints.
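Continuing the sketch: with 8-bit coordinates each box becomes one 24-bit key, so a single bincount yields all 2^(3*8) box occupancies (at the cost of a 16M-entry array):

```python
import numpy as np

def box_counts(q):
    """q: (n, 3) uint8 codes from quantize8; returns counts for all 2^24 boxes."""
    keys = ((q[:, 0].astype(np.int64) << 16)
            | (q[:, 1].astype(np.int64) << 8)
            | q[:, 2])
    return np.bincount(keys, minlength=1 << 24)
```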
3) K-means clustering on the 2^(3*8) box centres. (Google parallel "k means" -> 121k hits.) This depends strongly on K, a.k.a. Ncluster, and also on your radius R. A rough approach would be to grow a heap of the, say, 27*Ncluster boxes with the most points, then take the biggest ones subject to your radius constraint. (I like to start with a minimum spanning tree, then remove the K-1 longest links to get K clusters.) See also color quantization.
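The "biggest boxes subject to the radius constraint" step might look like this, working on box centres and counts from the previous step (a greedy sketch, not a tuned implementation):

```python
import numpy as np

def top_boxes(centres, counts, R, n_top):
    """Pick the fullest boxes whose centres stay pairwise farther than R."""
    chosen = []
    for i in np.argsort(-counts):               # fullest boxes first
        if all(np.linalg.norm(centres[i] - centres[j]) > R for j in chosen):
            chosen.append(i)
            if len(chosen) == n_top:
                break
    return chosen
```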
I'd make Nbit, here 8, a parameter from the beginning. What is your Ncluster?
Added: if your points are moving in time, see collision-detection-of-huge-number-of-circles on SO.
I would also suggest using an octree. The OctoMap framework is very good at dealing with huge 3D point clouds. It does not store all the points directly, but updates the occupancy density of every node (i.e. 3D box).
After the tree is built, you can use a simple iterator to find the node with the highest density. If you would like to model the point density or distribution inside the nodes, OctoMap is very easy to adapt.
Here you can see how it was extended to model the point distribution using a planar model.
Just an idea: create a graph with the given points, with an edge between two points whenever their distance is < R.
Creating this kind of graph is similar to a spatial decomposition, and your questions can be answered by local search in the graph: the first part is finding the vertices of maximal degree, the second is finding a maximal independent set among those max-degree vertices.
I think both the graph creation and the search can be made parallel. The approach can have a large memory requirement, but splitting the domain and working with graphs for smaller volumes can reduce the memory needed. A toy version on a small subvolume is sketched below.
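The toy sketch: the dense O(n^2) adjacency is only workable after splitting the domain as suggested above; the degree is the neighbour count within R, and the greedy pass enforces pairwise distance > R among the selected points:

```python
import numpy as np

def top_n_by_degree(points, R, n_top):
    """Rank points by neighbours within R; keep a set pairwise farther than R."""
    diff = points[:, None, :] - points[None, :, :]
    adj = np.linalg.norm(diff, axis=2) < R
    np.fill_diagonal(adj, False)
    degree = adj.sum(axis=1)                    # neighbours within R
    chosen = []
    for i in np.argsort(-degree):               # highest degree first
        if not any(adj[i, j] for j in chosen):  # reject if within R of a pick
            chosen.append(i)
            if len(chosen) == n_top:
                break
    return chosen

pts = np.random.rand(5_000, 3)                  # one small subvolume
print(top_n_by_degree(pts, R=0.05, n_top=10))
```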