Rasterize polygons on GPU for hit-testing

Rasterize polygons on GPU for hit-testing - performance

Task
Perform many hit-tests on polygons. I have about 10,000 polygons with 1,000 points each. About 100,000,000 hit-tests will be performed against these polygons. The polygons may touch each other but never overlap.
Solution
Simple point-in-polygon test for every hit-test.
Problem
Far too slow.
Improvement
Rasterize the polygons and check which polygon the pixel of the hit-tests belongs to.
New problem
Rasterization of the polygons is slow. I'm using this algorithm: http://alienryderflex.com/polygon_fill/
Idea
Rasterize the polygons on the GPU since it's optimized for this task and can perform it in hardware.
Question
What do you think about my idea and can you give me advice where to start? Will it lead to high performance?
Sidenote
The area is sparsly covered with polygons. I want to maintain a list of pixels instead of a bitmap.

This answer suggests to use the GPU if you can:
How can I determine whether a 2D Point is within a Polygon?
Your case seems to be dominated by the hit tests, which are O(1) with rasterization. GPUs are able to rasterize to memory, but I am not aware of any GPU / rendering API that is able to work with a list of points. Also, with a list of points, you may lose the O(1) runtime for the hit check. You may be able to save some memory by using a 16 bit raster buffer (not sure if that's feasible with your GPU setup -- but you only seem to need 10000 colors).

Related

Fast 3D mesh generation from pointcloud

I would like to build a simple mesh from a set of points in the fastest way possible. Hypothethically my point cloud could be in a very low number of points range (something like 1000 to 50000).
I've seen about 3D Delenauy triangularization and some other methods, but most of the time I don't find speed reported on papers and other times I see huge computational times in the order of minutes.
An interesting algorithm I've found is this: https://doc.cgal.org/latest/Poisson_surface_reconstruction_3/index.html
My main concern is that this is used for making 2D surfaces in 3D space, while I have points in my pointcloud which would lie in the interior of the final volume.
Could you suggest me some algorithms which could be useful in my scenario? And also a raw estimate of computational times? Is it possible to make such task in less than 5 seconds?
Think I'm not trying to make human faces, sculptures, or stuff like that. The meshes I'm trying to reconstruct are always pretty polyhedrical.
Thanks for your attention

Very fast boolean difference between two meshes

Let's say I have a static object and a movable object which can be moved and rotated, what is the best way to very quickly calculate the difference of those two meshes?
Precision here is not so important, speed is though, since I have to use it in the update phase of the main loop.
Maybe, given the strict time limit, modifying the static object's vertices and triangles directly is to be preferred. Should voxels be preferred here instead?
EDIT: The use case is an interactive viewer of a wood panel (parallelepiped) and a milling tool (a revolved contour, some like these).
The milling tool can be rotated and can work oriented at varying degrees (5 axes).
EDIT 2: The milling tool may not pierce the wood.
EDIT 3: The panel can be as large as 6000x2000mm and the milling tool can be as little as 3x3mm.

If you need the best possible performance then the generic CSG approach may be too slow for you (but still depending on meshes and target hardware).
You may try to find some specialized algorithm, coded for your specific meshes. Let's say you have two cubes - one is a 'wall' and second is a 'window' - then it's much easier/faster to compute resulting mesh with your custom code, than full CSG. Unfortunately you don't say anything about your meshes.
You may also try to make it a 2D problem, use some simplified meshes to compute the result that will 'look like expected'.
If the movement of your meshes is somehow limited you may be able to precompute full or partial results for different mesh combinations to use at runtime.
You may use some space partitioning like BSP or Octrees to divide your meshes during precomputing stage. This way you could split one big problem into many smaller ones that may be faster to compute or at least to make the solution multi-threaded.
You've said about voxels - if you're fine with their look and limits you may voxelize both meshes and just read and mix two voxel values, instead of one. Then you would triangulate it using algorithm like Marching Cubes.
Those are all just some general ideas but we'll need better info to help you more.
EDIT:
With your description it looks like you're modeling some bas-relief, so you may use Relief Mapping to fake this effect. It's based on a height map stored as a texture, so you'd need to just update few pixels of the texture and render a plane. It should be quite fast compared to other approaches, the downside is that it's based on height map, so you can't get shapes that Tee Slot or Dovetail cutter would create.
If you want the real geometry then I'd start from a simple plane as your panel (don't need full 3D yet, just a front surface) and divide it with a 2D grid. The grid element should be slightly bigger than the drill size and every element is a separate mesh. In the frame update you'd cut one, or at most 4 elements that are touched with a drill. Thanks to this grid all your cutting operations will be run with very simple mesh so they may work with your intended speed. You can also cut all current elements in separate threads. After the cutting is done you'll upload to the GPU only currently modified elements so you may end up with quite complex mesh but small modifications per frame.

What is the fastest way of edge detection?

I am thinking of implement a image processing based solution for industrial problem.
The image is consists of a Red rectangle. Inside that I will see a matrix of circles. The requirement is to count the number of circles under following constraints. (Real application : Count the number of bottles in a bottle casing. Any missing bottles???)
The time taken for the operation should be very low.
I need to detect the red rectangle as well. My objective is to count the
items in package and there are no
mechanism (sensors) to trigger the
camera. So camera will need to capture
the photos continuously but the
program should have a way to discard
the unnecessary images.
Processing should be realtime.
There may be a "noise" in image capturing. You may see ovals instead of circles.
My questions are as follows,
What is the best edge detection algorithm that matches with the given
scenario?
Are there any other mechanisms that I can use other than the edge
detection?
Is there a big impact between the language I use and the performance of
the system?

AHH - YOU HAVE NOW TOLD US THE BOTTLES ARE IN FIXED LOCATIONS!
IT IS AN INCREDIBLY EASIER PROBLEM.
All you have to do is look at each of the 12 spots and see if there is a black area there or not. Nothing could be easier.
You do not have to do any edge or shape detection AT ALL.
It's that easy.
You then pointed out that the box might be rotatated, things could be jiggled. That the box might be rotated a little (or even a lot, 0 to 360 each time) is very easily dealt with. The fact that the bottles are in "slots" (even if jiggled) massively changes the nature of the problem. You're main problem (which is easy) is waiting until each new red square (crate) is centered under the camera. I just realised you meant "matrix" literally and specifically in the sentence in your original questions. That changes everything totally, compared to finding a disordered jumble of circles. Finding whether or not a blob is "on" at one of 12 points, is a wildly different problem to "identifying circles in an image". Perhaps you could post an image to wrap up the question.
Finally I believe Kenny below has identified the best solution: blob analysis.
"Count the number of bottles in a bottle casing"...
Do the individual bottles sit in "slots"? ie, there are 4x3 = 12 holes, one for each bottle.
In other words, you "only" have to determine if there is, or is not, a bottle in each of the 12 holes.
Is that correct?
If so, your problem is incredibly easier than the more general problem of a pile of bottles "anywhere".
Quite simply, where do we see the bottles from? The top, sides, bottom, or? Do we always see the tops/bottoms, or are they mixed (ie, packed top-to-tail). These issues make huge, huge differences.

Surf/Sift = overkill in this case you certainly don't need it.
If you want real time speed (about 20fps+ on a 800x600 image) I recommend using Cuda to implement edge detection using a standard filter scheme like sobel, then implement binarization + image closure to make sure the edges of circles are not segmented apart.
The hardest part will be fitting circles. This is assuming you already got to the step where you have taken edges and made sure they are connected using image closure (morphology.) At this point I would proceed as follows:
run blob analysis/connected components to segment out circles that do not touch. If circles can touch the next step will be trickier
for each connected componet/blob fit a circle or rectangle using RANSAC which can run in realtime (as opposed to Hough Transform which I believe is very hard to run in real time.)
Step 2 will be much harder if you can not segment the connected components that form circles seperately, so some additional thought should be invested on how to guarantee that condition.
Good luck.
Edit
Having thought about it some more, I feel like RANSAC is ideal for the case where the circle connected components do touch. RANSAC should hypothetically fit the circle to only a part of the connected component (due to its ability to perform well in the case of mostly outlier points.) This means that you could add an extra check to see if the fitted circle encompasses the entire connected component and if it does not then rerun RANSAC on the portion of the connected component that was left out. Rinse and repeat as many times as necessary.
Also I realize that I say circle but you could just as easily fit an ellipse instead of circles using RANSAC.
Also, I'd like to comment that when I say CUDA is a good choice I mean CUDA is a good choice to implement the sobel filter + binirization + image closing on. Connected components and RANSAC are probably best left to the CPU, but you can try pushing them onto CUDA though I don't know how much of an advantage a GPU will give you for those 2 over a CPU.

For the circles, try the Hough transform.
other mechanisms: dunno
Compiled languages will possibly be faster.

SIFT should have a very good response to circular objects - it is patented, though. GLOHis a similar algorithm, but I do not know if there are any implementations readily available.
Actually, doing some more research, SURF is an improved version of SIFT with quite a few implementations available, check out the links on the wikipedia page.

Sum of colors + convex hull to detect boundary. You need, mostly, 4 corners of a rectangle, and not it's sides?
No motion, no second camera, a little choice - lot of math methods against a little input (color histograms, color distribution matrix). Dunno.
Java == high memory consumption, Lisp == high brain consumption, C++ == memory/cpu/speed/brain use optimum.

If the contrast is good, blob analysis is the algorithm for the job.

How does 3D collision / object detection work?

I'v always wondered this. In a game like GTA where there are 10s of thousands of objects, how does the game know as soon as you're on a health pack?
There can't possibly be an event listener for each object? Iterating isn't good either? I'm just wondering how it's actually done.

There's no one answer to this but large worlds are often space-partitioned by using something along the lines of a quadtree or kd-tree which brings search times for finding nearest neighbors below linear time (fractional power, or at worst O( N^(2/3) ) for a 3D game). These methods are often referred to as BSP for binary space partitioning.
With regards to collision detection, each object also generally has a bounding volume mesh (set of polygons forming a convex hull) associated with it. These highly simplified meshes (sometimes just a cube) aren't drawn but are used in the detection of collisions. The most rudimentary method is to create a plane that is perpendicular to the line connecting the midpoints of each object with the plane intersecting the line at the line's midpoint. If an object's bounding volume has points on both sides of this plane, it is a collision (you only need to test one of the two bounding volumes against the plane). Another method is the enhanced GJK distance algorithm. If you want a tutorial to dive through, check out NeHe Productions' OpenGL lesson #30.
Incidently, bounding volumes can also be used for other optimizations such as what are called occlusion queries. This is a process of determining which objects are behind other objects (occluders) and therefore do not need to be processed / rendered. Bounding volumes can also be used for frustum culling which is the process of determining which objects are outside of the perspective viewing volume (too near, too far, or beyond your field-of-view angle) and therefore do not need to be rendered.
As Kylotan noted, using a bounding volume can generate false positives when detecting occlusion and simply does not work at all for some types of objects such as toroids (e.g. looking through the hole in a donut). Having objects like these occlude correctly is a whole other thread on portal-culling.

Quadtrees and Octrees, another quadtree, are popular ways, using space partitioning, to accomplish this. The later example shows a 97% reduction in processing over a pair-by-pair brute-force search for collisions.

A common technique in game physics engines is the sweep-and-prune method. This is explained in David Baraff's SIGGRAPH notes (see Motion with Constraints chapter). Havok definitely uses this, I think it's an option in Bullet, but I'm not sure about PhysX.
The idea is that you can look at the overlaps of AABBs (axis-aligned bounding boxes) on each axis; if the projection of two objects' AABBs overlap on all three axes, then the AABBs must overlap. You can check each axis relatively quickly by sorting the start and end points of the AABBs; there's a lot of temporal coherence between frames since usually most objects aren't moving very fast, so the sorting doesn't change much.
Once sweep-and-prune detects an overlap between AABBs, you can do the more detailed check for the objects, e.g. sphere vs. box. If the detailed check reveals a collision, you can then resolve the collision by applying forces, and/or trigger a game event or play a sound effect.

Correct. Normally there is not an event listener for each object. Often there is a non-binary tree structure in memory that mimics your games map. Imagine a metro/underground map.
This memory strucutre is a collection of things in the game. You the player, monsters and items that you can pickup or items that might blowup and do you harm. So as the player moves around the game the player object pointer is moved in the game/map memory structure.
see How should I have my game entities knowledgeable of the things around them?

I would like to recommend the solid book of Christer Ericson on real time collision detection. It presents the basics of collision detection while providing references on the contemporary research efforts.
Real-Time Collision Detection (The Morgan Kaufmann Series in Interactive 3-D Technology)

There are a lot of optimizations can be used.
Firstly - any object (say with index i for example) is bounded by cube, with center coordinates CXi,CYi, and size Si
Secondly - collision detection works with estimations:
a) Find all pairs cubes i,j with condition: Abs(CXi-CXj)<(Si+Sj) AND Abs(CYi-CYj)<(Si+Sj)
b) Now we work only with pairs got in a). We calculate distances between them more accurately, something like Sqrt(Sqr(CXi-CXj)+Sqr(CYi-CYj)), objects now represented as sets of few numbers of simple figures - cubes, spheres, cones - and we using geometry formulas to check these figures intersections.
c) Objects from b) with detected intersections are processed as collisions with physics calculating etc.

How to speed up marching cubes?

I'm using this marching cube algorithm to draw 3D isosurfaces (ported into C#, outputting MeshGeomtry3Ds, but otherwise the same). The resulting surfaces look great, but are taking a long time to calculate.
Are there any ways to speed up marching cubes? The most obvious one is to simply reduce the spatial sampling rate, but this reduces the quality of the resulting mesh. I'd like to avoid this.
I'm considering a two-pass system, where the first pass samples space much more coarsely, eliminating volumes where the field strength is well below my isolevel. Is this wise? What are the pitfalls?
Edit: the code has been profiled, and the bulk of CPU time is split between the marching cubes routine itself and the field strength calculation for each grid cell corner. The field calculations are beyond my control, so speeding up the cubes routine is my only option...
I'm still drawn to the idea of trying to eliminate dead space, since this would reduce the number of calls to both systems considerably.

I know this is a bit old, but I recently implemented Marching Cubes based on much the same source. There is a LOT of inefficiency here. At a minimum if you were doing something like
for (int x=0; x<densityArrayWidth; x++)
for (int z=0; z<densityArrayLength; z++)
for (int y=0; y<densityArrayHeight; y++)
Polygonize(Gridcell, isolevel, Triangles)
Look at how many times you'd be reallocating the edgeTable and Tritable! Those immediately need to move out to the overall class. I ditched the gridCell object as well, going directly from the points/values to the triangles.
In short it isn't just the algorithmic complexity, memory allocations (and in the base this does a huge amount of them) take time also.

Just in case anyone else ends up here, dead-space elimination through a coarser sampling rate makes virtually no difference at all. Any remotely safe (ie: allowing a border for sampling artifacts) coarser sampling ends up grabbing most of the grid anyway in any remotely non-trivial field.
Speeding up the underlying field evaluation (with heavy memoisation) seemed to mostly solve the performance problems.

Try marching tetrahedra instead -- the math is simpler, allowing you to consider fewer cases per cell.

each cube has 12 edges, if you go through each cube and find 12 intersection points, you are doing 4 times too many calculations for intersection points- you have to only use 3 edges in the bottom left corner of each cube, with an extra row in the top right corner of the zone, and then use a special upgrade to access all the values that you have found. I'm going to do a topic on this because it needs to be discussed and it's complicated.
Also, testing for areas in space that need polygons, by assessing the ISO level using Octree, and skipping areas far from the ISO level.
I had a look at propagation, but it isn't that reliable and efficient.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio