Clustering Data in a 3D matrix with another matrix

Clustering Data in a 3D matrix with another matrix - algorithm

I Have got 2 Data cubes represented as 3D matrices. Both of them will be of same dimensions. we have to do rule based ordering. our condition is that if any sub cube of both of them ( sub cube must match exactly in location and orientation) matches atleast p% we can tell that they are similar. now given two 3D matrices containing the data , we have to write an algorithm which prints the number of similar subcubes that are similar in the given two cubes.
I tried brute force algorithm but it turned out to be very slow on large data sets. Is there any specific algorithm I can use here or any technique??
Thanks in advance.

We can adapt the first solution in this question. Construct another 3D matrix called count and fill all its edge cells corresponding to matching data with 1s. Then, starting from count(1,1,1), consider the cells in lexicographic order and set the count(i, j, k) for i,j,k such that the data matches to the smallest value of any of its neighbours which have already been set. If the data doesn't match, set count(i, j, k) = 0.
At the end, the non-zero elements of count contain the matching cubes, and their value denotes the width of the cube.

Related

Largest Rectangle size after each query (Algorithm)

I recently came across this algorithmic question in an interview. The question goes something like:
Initially there is a rectangle (starting at the origin (0,0) and ending at (n,m)) given. Then there are q queries like x=r or y=c which basically divides the initial rectangles into smaller rectangles. After each query, we have to return the largest rectangle size currently present.
See the diagram:
So, here we were initially given a rectangle from (0,0) to (6,6) [a square in fact!!]. Now after the 1st query (shown as dotted line above) x = 2, the largest rectangle size is 24. After the second query y = 1, the largest rectangle size is 20. And this is how it goes on and on.
My approach to solving this:
At every query, find:
The largest interval on the x axis (maxX) [keep storing all the x = r values in a list]
The largest interval on y axis (maxY) [keep storing all the y = c values in another list]
At every query, your answer is (maxX * maxY)
For finding 1 and 2, I will have to iterate through the whole list, which is not very efficient.
So, I have 2 questions:
Is my solution correct? If not, what is the correct approach to the problem. If yes, how can I optimise my solution?

It's correct but takes O(n) time per query.
You could, for each dimension, have one binary search tree (or other sorted container with O(log n) operations) for the coordinates (initially two) and one for the interval sizes. Then for each query in that dimension:
Add the new coordinate to the coordinates.
From its neighbors, compute the interval's old size and remove that from the sizes.
Compute the two new intervals' sizes and add them to the sizes.
The largest size is at the end of the sizes.
Would be O(log n) per query.

Yes, your algorithm is correct.
To optimize it, first of all, consider only one dimension, because the two dimensions in your geometry are fully orthogonal.
So, you need to have a data structure which holds a partitioning of an interval into sub-intervals, and supports fast application of these two operations:
Split a given interval into two
Find a largest interval
You can do that by using two sorted lists, one sorted by coordinate, and the other sorted by size. You should have pointers from one data structure to the other, and vice-versa.
To implement the "splitting" operation:
Find the interval which you should split, using binary search in the coordinate-sorted list
Remove the interval from both lists
Add two smaller intervals to both lists

Grouping large set of similar vectors

I have a 3d mesh of ~200,000 triangles.
To find all the flat (or near enough flat) surfaces on the model I thought I could try and group triangles by their normal vectors (giving me ones which face the same way) and then I can search these smaller sets for ones which are similar in position or connected.
I cannot think of a good way to practically do this while also keeping things relatively speedy. I have come up with solutions which would take n² but none which are elegant and quicker than that.
I have vertex information and triangle information (vertices, centre and normal).
Any suggestions would be appreciated.

It is possible that I have misunderstood the problem so I am stating what I think you need to do : "Given a set of vectors, group parallel vectors together".
You could use a hash-map to solve this problem. I am assuming that you stored the normal vectors in the form:
a + b + c = 0
You just need to write a function that converts a vector to an integer, for example, if I know that 0 <= a, b, c <= 1000, then I can use F(a, b, c) = a + 1000b + 1000000c which guarantees unique integer for every unique vector. After this, its just a matter of creating a hashmap which maps some integer to a list and store all the parallel vectors in the same list.

You want to find connected components on the graph from your triangles. The only thing you need is to store adjacency information in a convenient form.
Create a list of all edges (min, max), if all edges have two triangles adjacent, then there are 300'000 edges. This can be done in linear time:
For every vertex count number of adjacent vertices with greater index, do the partial sum on these numbers.
Allocate and fill an array for edges (second vertex and utility data). Use array from step 1 to access edges adjacent to a vertex. Such an access can be done in the constant time if we know that the number of edges adjacent to a vertex is bounded from above by a constant and the whole step can be done in the linear time.
So, mentioned utility data is the numbers of pair of triangles adjacent to the edge.
Ok, now you have adjacency info. It is time to find connected components. You can use DFS for it. It will work in the linear time because every triangle has three (constant number of) neighbors.
Here you need to allocate 200'000 * sizeof(int) * 4 bytes. And it can be done in the linear time.
You could also want to read about doubly connected edge list, but it is pretty expensive.

2D grid data structure for nearest free cell

Consider a 2000 x 2000 2D bool array. 100,000 elements are set to true, the rest to false.
Given a cell (x1,y1) we need to find the nearest cell (x2,y2) (by manhattan distance: abs(x1-x2) + abs(y1-y2)) that is false.
One way to do that would be to:
for (int dist = 0; true; dist++)
for ((x2,y2) in all cells dist away from (x1,y1))
if (!array[x2,y2])
return (x2,y2);
In the worst case we would have to iterate through 100,000 cells before finding the free one.
Is there a data structure we could use rather than a 2D array that would allow us to perform this search quicker?

If the data is constant and you have many queries on it:
You might want to use a k-d tree, and look for the nearest neighbor. Insert (i,j) for each element such that arr[i][j] = false. The standard k-d tree uses euclidean distance but I think one can modify it to use manhattan distances instead..
If the data is used for one query:
You will need at least Omega(n*m) ops to read the data and insert it into any data structure - so no point in doing that - the suggested solution will outperform only the build up of any data structure.

You might be interested into look into Region QuadTree. Here initially the entire image is modeled as the root since the image contains all 0s (assumption). Then when a particular pixel is set, the image is divided into 4 quadrants first and the 3 quadrants where the pixel is not included are left as leaves. The remaining quadrant is subdivided again and so on. This is reached till we have 4 point leaves out of which one is set.
This representation will help to rule-out entire regions during the search and the search time can be optimized to O(log n)

Querying a collection of rectangles for the overlap of an input rectangle

In a multi-dimensional space, I have a collection of rectangles, all of which are aligned to the grid. (I am using the word "rectangles" loosely - in a three dimensional space, they would be rectangular prisms.)
I want to query this collection for all rectangles that overlap an input rectangle.
What is the best data structure for holding the collection of rectangles? I will be adding rectangles to and removing rectangles from the collection from time to time, but these operations will be infrequent. The operation I want to be fast is the query.
One solution is to keep the corners of the rectangles in a list, and do a linear scan over the list, finding which rectangles overlap the query rectangle and skipping over the ones that don't.
However, I want the query operation to be faster than linear.
I've looked at the R-tree data structure, but it holds a collection of points, not a collection of rectangles, and I don't see any obvious way to generalize it.
The coordinates of my rectangles are discrete, in case you find that helpful.
I am interested in the general solution, but I will also tell you the properties of my specific problem: my problem space has three dimensions, and their multiplicity varies wildly. The first dimension has two possible values, the second dimension has 87 values, and the third dimension has 1.8 million values.

You can probably use KD-Trees which can be used for rectangles according to the wiki page:
Variations
Instead of points
Instead of points, a kd-tree can also
contain rectangles or
hyperrectangles[5]. A 2D rectangle is
considered a 4D object (xlow, xhigh,
ylow, yhigh). Thus range search
becomes the problem of returning all
rectangles intersecting the search
rectangle. The tree is constructed the
usual way with all the rectangles at
the leaves. In an orthogonal range
search, the opposite coordinate is
used when comparing against the
median. For example, if the current
level is split along xhigh, we check
the xlow coordinate of the search
rectangle. If the median is less than
the xlow coordinate of the search
rectangle, then no rectangle in the
left branch can ever intersect with
the search rectangle and so can be
pruned. Otherwise both branches should
be traversed. See also interval tree,
which is a 1-dimensional special case.

Let's call the original problem by PN - where N is number of dimensions.
Suppose we know the solution for P1 - 1-dimensional problem: find if a new interval is overlapping with a given collection of intervals.
Once we know to solve it, we can check if the new rectangle is overlapping with the collection of rectangles in each of the x/y/z projections.
So the solution of P3 is equivalent to P1_x AND P1_y AND P1_z.
In order to solve P1 efficiently we can use sorted list. Each node of the list will include coordinate and number-of-opened-intetrvals-up-to-this-coordinate.
Suppose we have the following intervals:
[1,5]
[2,9]
[3,7]
[0,2]
then the list will look as follows:
{0,1} , {1,2} , {2,2}, {3,3}, {5,2}, {7,1}, {9,0}
if we receive a new interval, say [6,7], we find the largest item in the list that is smaller than 6: {5,2} and smllest item that is greater than 7: {9,0}.
So it is easy to say that the new interval does overlap with the existing ones.
And the search in the sorted list is faster than linear :)

You have to use some sort of a partitioning technique. However, because your problem is constrained (you use only rectangles), the data-structure can be a little simplified. I haven't thought this through in detail, but something like this should work ;)
Using the discrete value constraint - you can create a secondary table-like data-structure where you store the discrete values of second dimension (the 87 possible values). Assume that these values represent planes perpendicular to this dimension. For each of these planes you can store, in this secondary table, the rectangles that intersect these planes.
Similarly for the third dimension you can use another table with as many equally spaced values as you need (1.8 million is too much, so you would probably want to make this at least a couple of magnitudes smaller), and create a map the rectangles that are between two chosen values.
Given a query rectangle you can query the first table in constant time to determine a set of tables which possibly intersects this query. Then you can do another query on the second table, and do an intersection of the results from the first and the second query results. This should narrow down the number of actual intersection tests that you have to perform.

Visiting the points in a triangle in a random order

For a right triangle specified by an equation aX + bY <= c on integers
I want to plot each pixel(*) in the triangle once and only once, in a pseudo-random order, and without storing a list of previously hit points.
I know how to do this with a line segment between 0 and x
pick a random point'o' along the line,
pick 'p' that is relatively prime to x
repeat for up to x times: Onext = (Ocur + P) MOD x
To do this for a triangle, I would
1. Need to count the number of pixels in the triangle sans lists
2. Map an integer 0..points into a x,y pair that is a valid pixel inside the triangle
I hope any solution could be generalized to pyramids and higher dimensional shapes.
(*) I use the CG term pixel for the pair of integer points X,Y such that the equation is satisfied.

Since you want to guarantee visiting each pixel once and only once, it's probably better to think in terms of pixels rather than the real triangles.
You can slice the triangles horizontally and get bunch of horizontal scan lines. Connect the scan lines together and you have converted your "triangle" into a long line. Apply your point visiting algorithm to your long chain of scan lines.
By the way, this mapping only needs to happen on paper, all you need is a function that can return (x, y) given (t) along the virtual scan line.
Edit:
To convert two points to a line segment, you can look for Bresenham's scan conversion. Once you get the 3 line segments converted into series of points, you can put all points into a bucket and group all points by y. Within the same y-value, sort points by x. The smallest x within a y-value is the begin point of the scan line and the largest x within the y-value is the end point of the scan line. This is called "scan converting triangle". You can find more info if you Google.

Here's a solution for Triangle Point Picking.
What you have to do is choose two vectors (sides) of your triangle, multiply each with a random number in [0,1] and add them up. This will provide a uniform distribution in the quadrilateral defined by the vectors. You'll have to check whether the result lies inside the original triangle; if it doesn't either transform it back in or simply discard it and try again.

One method is to put all of the pixels into an array and then shuffle the array (this is O(n)), then visit the pixels in the order in the shuffled array. This could require quite a lot of memory though.

Here's a method which wastes some CPU time but probably doesn't waste as much as a more complicated method would do.
Compute a rectangle that circumscribes the triangle. It will be easy to "linearize" that rectangle, each scan line followed by the next. Use the algorithm that you already know in order to traverse the pixels of the rectangle. When you hit each pixel, check if the pixel is in the triangle, and if not then skip it.

I would consider the lines of the triangle as single line, which is cut into segments. The segments would be stored in an array where the length of the segment also stored as well as the offset in the total length of the lines. Then depending on the value of O, you can select which array element contains the pixel you want to draw at that moment based on this information and paint the pixel based on the values in the element.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Clustering Data in a 3D matrix with another matrix - algorithm

Related

Largest Rectangle size after each query (Algorithm)

Grouping large set of similar vectors

2D grid data structure for nearest free cell

Querying a collection of rectangles for the overlap of an input rectangle

Visiting the points in a triangle in a random order

Categories

Resources