Kd tree for triangle meshes is too slow - raytracing

The Kd algorithm starts with the creation of a root BSP node by partitioning an array of primitives (triangles, spheres, ...) in order to create two new arrays (left and right primitives) used for the creation of its two subtrees.
The left and right primitives are calculated by partitioning a given primitives array into two arrays. The partitioning plane position is calculated by taking the median of the middelpoints of the interval (projected onto a given axis (x,y or z)) of each triangle.
For example a triangle with x-coordinates: 1, 2, 3 has a middelpoint 1=(3-1)/2 (along the x-axis)
A triangle with x-coordinates: 2, 3, 8 has a middelpoint 3=(8-2)/2 (along the x-axis)
A triangle with x-coordinates: 4, 3, 8 has a middelpoint 2.5=(8-3)/2 (along the x-axis)
The primitive array containing these three triangles is partitioned by a plane through x=2.5 (median) parallel to the yz-plane. The resulting left primitives array contains the three triangles, the resulting right primitives array contains the three triangles.
The two resulting arrays with the left and right primitives are used to construct the left and right subtree of a Kd node (primitives are only stored at leaf nodes).
For the left subtree:
If (the left primitives are empty) then the left subtree points to NULL
else if (the number of left primitives is smaller than the minimal number || the depth == 1) then the left subtree is a leaf node
else the left subtree is another tree.
create the left subtree with the left primitives along the axis (++axis % 3) with --depth as depth and the same minimal number.
Analoguous for the right subtree.
The algorithm implemented works, but it's very slow because the tree is not very well partitioned. When ray-tracing a bunny of 5500 triangles, there are a lot of leaf nodes of 1 triangle and the last leaf node still contains 762 triangles.
Is there a better partitioning algorithm (because mine is just an implementation of a Kd tree for single points converted to surfaces), that balances the tree?
UPDATE: I searched for an algorithm or heuristic that can partition an array of intervals [t0,t1] into two arrays of intervals according to a cutting point. The left array contains the intervals to the left of the cutting point and those that contain the cutting point. Analoguous for the right array. Both arrays must have approximately the same number of intervals and the number of duplicate intervals must be as least as possible. This algorithm may not be of complexity O(n^2).

I'd suggest you to use Surface Area Heuristic (SAH) for finding optimal split.
The probability of intersection of the ray with subtree - is proportional to the surface area of the bounding box of that subtree.
But, if leaf-subtree contains lots of triangles - it means that we have to traverse them all.
So, the main idea of SAH is to minimize both: surface areas of subtrees and number of polygons inside of them.
Lets look at small 2D example:
Also, using of SAH - helps to determine terminate condition during the building of kd-tree:
1) At each step of building of kd-tree, before the splitting of subtree - you have to calculate SAH of current subtree
SAH_initial = number_of_polygons * area_of_subtree
2) Aftre that you have to find SAH of the most optimal split of current subtree
SAH_optimal = min(S_left * N_left + S_right * N_right)
3) Aftre that you have to compare:
define some split_cost
...
if( SAH_optimal + split_cost < SAH_initial ) {
it would be optimal to split that subtree
} else {
you don't have to split current subtree
}
Here is another answer on Stackoverflow, which contains referebces to the articles, about SAH.

The computation of the middlepoints looks incorrect. For the triangle with x coordinates 2,3,8, the middlepoint should be 2 + (8-2)/2 = 5. The triangle with x coordinates 1,2,3 should have middlepoint 1 + (3-1)/2 = 2. This could explain the unbalanced leaves.

Related

Efficient algorithm to find the largest rectangle from a set of points

I have an array of points, and my goal is to pick two so that I maximize the area of the rectangle formed by the two points (one representing the low left corner and the other one the right top corner).
I could do this in O(n^2) by just doing two for loops and calculating every single possible area, but I think there must be a more efficient solution:
max_area = 0
for p1 in points:
for p2 in points:
area = p2[0]p2[1] + p1[0]p1[1] - p2[1]p1[0] - p2[0]p1[1]
if area > max_area:
max_area = area
It's clear that I want to maximize the area of the second point with the origin (0,0) (so p2[0]p2[1]), but I'm not sure how to go forward with that.
Yes, there's an O(n log n)-time algorithm (that should be matched by an element distinctness lower bound).
It suffices to find, for each p1, the p2 with which it has the largest rectangular area, then return the overall largest. This can be expressed as a 3D extreme point problem: each p2 gives rise to a 3D point (p2[0], p2[1], p2[0] p2[1]), and each p1 gives rise to a 3D vector (-p1[0], -p1[1], 1), and we want to maximize the dot product (technically plus p1[0] p1[1], but this constant offset doesn't affect the answer). Then we "just" have to follow Kirkpatrick's 1983 construction.
Say you have a rectangle formed by four points: A (top left), B (top right), C (bottom right) and D (bottom left).
The idea is to find two points p1 and p2 that are the closest to B and D respectively. This means that p1 and p2 are the furthest possible from each other.
def nearest_point(origin, points):
nearest = None
mindist = dist(origin, points[0])
for p in points[1:]:
d = dist(origin, p)
if mindist > d:
mindist = d
nearest = p
return nearest
Call it for B and D as origins:
points = [...]
p1 = nearest_point(B, points) # one for loop
p2 = nearest_point(D, points) # one for loop
Note that there can be multiples closest points which are equally distant from the origin (B or D). In this case, nearest_point() should return an array of points. You have to do two nested for loops to find the furthest two points.
Divide and conquer.
Note: This algorithm presumes that the rectangle is axis-aligned.
Step 1: Bucket the points into a grid of 4x4 buckets. Some buckets
may get empty.
Step 2: Using the corners of the buckets, calculate
maximum areas by opposite corners between not empty buckets. This may result in
several pairs of buckets, because your work with corners, not points. Notice also that you use left corners for left buckets, and so for bottom, right, top corners for those b,r,t buckets. That's why an even number is used for the size of the grid.
Step 3: Re-bucket each bucket selected in step 2 as a new, smaller, 4x4 grid.
Repeat steps 2 & 3 until you get only a pair of buckets that contain only a point in each bucket.
I have not calculated the complexity of this algorithm. Seems O(n log(n)).

Depth of a grid map when represented as a tree

I have a nxn grid map. I want to represent this as a tree where each node is a block in the grid and each edge is the movement(left,right,up,and down) from one block to another. I want to know, if this grid was represented as a tree, what would be its depth.
I came up this formula but I am not sure if this is correct:
Since there are at most 4 movements possible from each block, at each level t of the tree there are at most 4^t nodes. So total number of nodes in tree is at most
4^0 + 4^1 + 4^2+.....4^d where d is the depth of the tree and this is greater than or equal to n^2(actual number of blocks in the grid).
When I solve this equation(by taking sum of the geometric series on the left side and then taking log base 4 on both sides) I get the following result:
d >= log(3n^2 + 1) - 1 (log is base 4)
But this still does not give me the exact depth of the tree.
Is there a way to find exact depth of the tree?

How to find independent points in a unit square in O(n log n)?

Consider a unit square containing n 2D points. We say that two points p and q are independent in a square, if the Euclidean distance between them is greater than 1. A unit square can contain at most 3 mutually independent points. I would like to find those 3 mutually independent points in the given unit square in O(n log n). Is it possible? Please help me.
Can this problem be solved in O(n^2) without using any spatial data structures such as Quadtree, kd-tree, etc?
Use a spatial data structure such as a Quadtree to store your points. Each node in the quadtree has a bounding box and a set of 4 child nodes, and a list of points (empty except for the leaf nodes). The points are stored in the leaf nodes.
The point quadtree is an adaptation of a binary tree used to represent two-dimensional point data. It shares the features of all quadtrees but is a true tree as the center of a subdivision is always on a point. The tree shape depends on the order in which data is processed. It is often very efficient in comparing two-dimensional, ordered data points, usually operating in O(log n) time.
For each point, maintain a set of all points that are independent of that point.
Insert all your points into the quadtree, then iterate through the points and use the quadtree to find the points that are independent of each:
main()
{
for each point p
insert p into quadtree
set p's set to empty
for each point p
findIndependentPoints(p, root node of quadtree)
}
findIndependentPoints(Point p, QuadTreeNode n)
{
Point f = farthest corner of bounding box of n
if distance between f and p < 1
return // none of the points in this node or
// its children are independent of p
for each point q in n
if distance between p and q > 1
find intersection r of q's set and p's set
if r is non-empty then
p, q, r are the 3 points -> ***SOLVED***
add p to q's set of independent points
add q to p's set of independent points
for each subnode m of n (up 4 of them)
findIndependentPoints(p, m)
}
You could speed up this:
find intersection r of q's set and p's set
by storing each set as a quadtree. Then you could find the intersection by searching in q's quadtree for a point independent of p using the same early-out technique:
// find intersection r of q's set and p's set:
// r = findMututallyIndependentPoint(p, q's quadtree root)
Point findMututallyIndependentPoint(Point p, QuadTreeNode n)
{
Point f = farthest corner of bounding box of n
if distance between f and p < 1
return // none of the points in this node or
// its children are independent of p
for each point r in n
if distance between p and r > 1
return r
for each subnode m of n (up 4 of them)
findMututallyIndependentPoint(p, m)
}
An alternative to using Quadtrees is using K-d trees, which produces more balanced trees where each leaf node is a similar depth from the root. The algorithm for finding independent points in that case would be the same, except that there would only be up to 2 and not 4 child nodes for each node in the data structure, and the bounding boxes at each level would be of variable size.
You might want to try this out.
Pick the top left point (Y) with coordinate (0,1). Calculate distance from each point from the List to point Y.
Sort the result in increasing order into SortedPointList (L)
If the first point (A) and the last point (B) in list L are independent:
Foreach point P in list L:
if P is independent to both A and B:
Return A, B, P
Pick the top right point (X) with coordinate (1,1). Calculate distance from each point from the List to point X.
Sort the result in increasing order into SortedPointList (S)
If the first point (C) and the last point (D) in list L are independent:
Foreach point O in list S:
if P is independent to both C and D:
Return C, D, O
Return null
This is a wrong solution. Kept it just for comments. If one finds another solution based on smallest enclosing circle, please put a link as a comment.
Solve the Smallest-circle problem.
If diameter of a circle <= 1, return null.
If the circle is determined by 3 points, check which are "mutually independent". If there are only two of them, try to find the third by iteration.
If the circle is determined by 2 points, they are "mutually independent". Try to find the third one by iteration.
Smallest-sircle problem can be solved in O(N), thus the whole problem complexity is also O(N).

Maximum minimum manhattan distance

Input:
A set of points
Coordinates are non-negative integer type.
Integer k
Output:
A point P(x, y) (in or not in the given set) whose manhattan distance to closest is maximal and max(x, y) <= k
My (naive) solution:
For every (x, y) in the grid which contain given set
BFS to find closest point to (x, y)
...
return maximum;
But I feel it run very slow for a large grid, please help me to design a better algorithm (or the code / peseudo code) to solve this problem.
Should I instead of loop over every (x, y) in grid, just need to loop every median x, y
P.S: Sorry for my English
EDIT:
example:
Given P1(x1,y1), P2(x2,y2), P3(x3,y3). Find P(x,y) such that min{dist(P,P1), dist(P,P2),
dist(P,P3)} is maximal
Yes, you can do it better. I'm not sure if my solution is optimal, but it's better than yours.
Instead of doing separate BFS for every point in the grid. Do a 'cumulative' BFS from all the input points at once.
You start with 2-dimensional array dist[k][k] with cells initialized to +inf and zero if there is a point in the input for this cell, then from every point P in the input you try to go in every possible direction. The further you are from the start point the bigger integer you put in the array dist. If there is a value in dist for a specific cell, but you can get there with a smaller amount of steps (smaller integer) you overwrite it.
In the end, when no more moves can be done, you scan the array dist to find the cell with maximum value. This is your point.
I think this would work quite well in practice.
For k = 3, assuming 1 <= x,y <= k, P1 = (1,1), P2 = (1,3), P3 = (2,2)
dist would be equal in the beginning
0, +inf, +inf,
+inf, 0, +inf,
0, +inf, +inf,
in the next step it would be:
0, 1, +inf,
1, 0, 1,
0, 1, +inf,
and in the next step it would be:
0, 1, 2,
1, 0, 1,
0, 1, 2,
so the output is P = (3,1) or (3,3)
If K is not large enough and you need to find a point with integer coordinates, you should do, as another answer suggested - Calculate minimum distances for all points on the grid, using BFS, strarting from all given points at once.
Faster solution, for large K, and probably the only one which can find a point with float coordinates, is as following. It has complexity of O(n log n log k)
Search for resulting maximum distance using dihotomy. You have to check if there is any point inside the square [0, k] X [0, k] which is at least given distance away from all points in given set. Suppose, you can check that fast enough for any distance. It is obvious, that if there is such point for some distance R, there always will be some point for all smaller distances r < R. For example, the same point would go. Thus you can search for maximum distance using binary search procedure.
Now, how to fast check for existence (and also find) a point which is at least r units away from all given points. You should draw "Manhattan spheres of radius r" around all given points. These are set of points at most r units away from given point. They are tilted by 45 degrees squares with diagonal equal to 2r. Now turn the picture by 45 degrees, and all squares will be parallel to the axis. Now you can check for existence of any point outside such squares using sweeping line algorithm. You have to sort all vertical edges of squares, and then process them one by one from left to right. Left borders will add segment mark to sweeping line, Left borders will erase it. And you have to check if there is any non marked point on the line. You can implement it using segment tree. Then, you have to check if there is any non marked point on the line inside the initial square [0,k]X[0,k].
So, again, overall solution will be binary search for r. Inside of it you will have to check if there is any point at least r units away from all given points. Do that by constructing "manhattans spheres of radius r" and then scanning them with a diagonal line from left-top corner to right-bottom. While moving line you should store number of opened spheres at each point at the line in the segment tree. between opening and closing of any spheres, line does not change, and if there is any free point there, it means, that you found it for distance r.
Binary search contributes log k to complexity. Each checking procedure is n log n for sorting squares borders, and n log k (n log n?) for processing them all.
Voronoi diagram would be another fast solution and could also find non integer answer. But it is much much harder to implement even for Manhattan measure.
First try
We can turn a 2D problem into a 1D problem by projecting onto the lines y=x and y=-x. If the points are (x1,y1) and (x2,y2) then the manhattan distance is abs(x1-x2)+abs(y1-y2). Change coordinate to a u-v system with basis U = (1,1), V = (1,-1). Coords of the two points in this basis are u1 = (x1-y1)/sqrt(2), v1= (x1+y1), u2= (x1-y1), v2 = (x1+y1). And the manhatten distance is the largest of abs(u1-u2), abs(v1-v2).
How this helps. We can just work with the 1D u-values of each points. Sort by u-value, loop through points and find the largest difference between pains of points. Do the same of v-values.
Calculating u,v coords of O(n), quick sorting is O(n log n), looping through sorted list is O(n).
Alas does not work well. Fails if we have point (-10,0), (10,0), (0,-10), (0,10). Lets try a
Voronoi diagram
Construct a Voronoi diagram
using Manhattan distance. This can be calculate in O(n log n) using https://en.wikipedia.org/wiki/Fortune%27s_algorithm
The vertices in the diagram are points which have maximum distance from its nearest vertices. There is psudo-code for the algorithm on the wikipedia page. You might need to adapt this for Manhattan distance.

Finding a square side length is R in 2D plane ?

I was at the high frequency Trading firm interview, they asked me
Find a square whose length size is R with given n points in the 2D plane
conditions:
--parallel sides to the axis
and it contains at least 5 of the n points
running complexity is not relative to the R
they told me to give them O(n) algorithm
Interesting problem, thanks for posting! Here's my solution. It feels a bit inelegant but I think it meets the problem definition:
Inputs: R, P = {(x_0, y_0), (x_1, y_1), ..., (x_N-1, y_N-1)}
Output: (u,v) such that the square with corners (u,v) and (u+R, v+R) contains at least 5 points from P, or NULL if no such (u,v) exist
Constraint: asymptotic run time should be O(n)
Consider tiling the plane with RxR squares. Construct a sparse matrix, B defined as
B[i][j] = {(x,y) in P | floor(x/R) = i and floor(y/R) = j}
As you are constructing B, if you find an entry that contains at least five elements stop and output (u,v) = (i*R, j*R) for i,j of the matrix entry containing five points.
If the construction of B did not yield a solution then either there is no solution or else the square with side length R does not line up with our tiling. To test for this second case we will consider points from four adjacent tiles.
Iterate the non-empty entries in B. For each non-empty entry B[i][j], consider the collection of points contained in the tile represented by the entry itself and in the tiles above and to the right. These are the points in entries: B[i][j], B[i+1][j], B[i][j+1], B[i+1][j+1]. There can be no more than 16 points in this collection, since each entry must have fewer than 5. Examine this collection and test if there are 5 points among the points in this collection satisfying the problem criteria; if so stop and output the solution. (I could specify this algorithm in more detail, but since (a) such an algorithm clearly exists, and (b) its asymptotic runtime is O(1), I won't go into that detail).
If after iterating the entries in B no solution is found then output NULL.
The construction of B involves just a single pass over P and hence is O(N). B has no more than N elements, so iterating it is O(N). The algorithm for each element in B considers no more than 16 points and hence does not depend on N and is O(1), so the overall solution meets the O(N) target.
Run through set once, keeping the 5 largest x values in a (sorted) local array. Maintaining the sorted local array is O(N) (constant time performed N times at most).
Define xMin and xMax as the x-coordinates of the two points with largest and 5th largest x values respectively (ie (a[0] and a[4]).
Sort a[] again on Y value, and set yMin and yMax as above, again in constant time.
Define deltaX = xMax- xMin, and deltaY as yMax - yMin, and R = largest of deltaX and deltaY.
The square of side length R located with upper-right at (xMax,yMax) meets the criteria.
Observation if R is fixed in advance:
O(N) complexity means no sort is allowed except on a fixed number of points, as only a Radix sort would meet the criteria and it requires a constraint on the values of xMax-xMin and of yMax-yMin, which was not provided.
Perhaps the trick is to start with the point furthest down and left, and move up and right. The lower-left-most point can be determined in a single pass of the input.
Moving up and right in steps and counitng points in the square requries sorting the points on X and Y in advance, which to be done in O(N) time requiress that the Radix sort constraint be met.

Resources