best way to handle a set of 2D matrices in opencl - parallel-processing

I'm wondering what the best way to handle a bunch of 2D arrays in openCL would be. The target platform in my case is GPGPU. Based on my problem, I think it would work best to have each workgroup manage an array.
As far as passing the array to each workgroup goes, I'm tempted to set global_worksize=numArrays*N*M and local_worksize=N*M in clEnqueueNDRangeKernel. Then I would address the array like a 3D array:
(numArrays*localSize*wgroupID)+localSize*x+y
does this make sense? I've been trying to search the internet to recover some best practices or examples, but I'm having a very difficult time doing so.
Thanks!

Related

What is the best data structure for an AABB collision checking physics engine?

I need an engine which consists of a world populated with axis-aligned bounding boxes (AABBs). A continuous loop will be executed, doing the following:
for box_a in world
box_a = do_something(box_a)
for box_b in world
if (box_a!=box_b and collides(box_a, box_b))
collide(box_a, box_b)
collide(box_b, box_a)
The problem with that is, obviously, that this is O(n^2). I have managed to make this loop much faster partitioning the space in chunks, so this became:
for box_a in world
box_a = do_something(box_a)
for chunk in box_a.neighbor_chunks
for box_b in chunk
if (box_a!=box_b and collides(box_a, box_b))
collide(box_a, box_b)
collide(box_b, box_a)
This is much faster but a little crude. Given that there is such a faster algorithm with not a lot of effort, I'd bet there is a data structure I'm not aware of that generalizes what I've done here, providing much better scalability for this algorithm.
So, my question is: what is the name of this problem and what are the optimal algorithms and data-structures to implement it?
this is indeed a generic problem of computer science : space partitionning.
its used in raytracing, path tracing, raster rendering, physics, IA, games, and pretty sure in HPC, databases, matrix maths, whatever science (molecules, pharmacy....), and I bet thousands of other stuff.
there is no 1 best structure, I have a friend who did his master on an algorithm to tesselate a point of cloud coming out of a laser scanner (billions of data) and in his case the best data structure was to mix a collection of uniforms 3D grids with some octree.
For other people kd-tree is the best, for other people, BVH trees are the best.
I like the grid system but it cannot work if the space is too wide because all cells has to exist.
One day I even implemented a sparse grid system using a hash map, it worked, I didn't bother to profile or investigate the performance so I wouldn't know if its an excellent way, I know its one way though.
To do that, you make a KEY class which is basically a 3D position vector hasher, first you apply an integer division on the coordinates to define the size of one grid cell. Then you stupidely hash all coordinates into one hash and provide a hash_value method or friend method. an equality operator and then its usable in a hash map.
You can use a google::sparse_map or something along these lines. I personally used boost::unordered and it was enough in my case.
Then the thing to consider is the presence of AABB into more than one cell. You can store a reference in every cell covered by your AABB, its just something to be aware of in every algorithm : "there is no 1-1 relationship between cell references and AABB." that's all.
good luck

Any good nearest-neighbors algorithm for similar images?

I am looking for an algorithm that can search for similar images in a large collection.
I'm currently using a SURF implementation in OpenCL.
At first I used the KNN search algorithm to compare every image's interrest points to the rest of the collection but tests revealed that it doesn't scale well. I've also tried a Hadoop implementation of KNN-Join which really takes a lot of temporary space in HDFS, way too much compared to the amount of input data. In fact pairwise distance approach isn't really appropriate because of the dimension of my input vectors (64).
I heard of Locally Sensitive Hashing and wondered if there was any free implementation, or if it's worth implementing it, maybe there's another algorithm I am not aware of ?
IIRC the flann algorithm is a good compromise:
http://people.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN

How should I index for a simple world of rectangles?

The world consists of many (1k-10k) rectangles of similar sizes, and I need to be able to quickly determine potential overlaps when trying to add a new rectangle. Rectangles will be added and removed dynamically. Are R-Trees appropriate here? If so, are there any good libraries I should consider? (I'm open to suggestions in any language).
R-Trees would be suitable, yes.
quad trees are also a good data structure for quickly finding objects in a region of 2D space. They are really a more uniform version of r-trees. Using these structures you can quickly zero in on a small region of space, with very few tests, even with massive data sets.
There is a c# implementation here, though I have not looked at it.
This kind of data structure (and it's 3D version called Octrees) are often used in games to manage the large data sets of objects that need to know if they are near any other objects for collision testing, and all kinds of other fun reasons.
You should be able to find lots of articles and examples of these kinds of data structures in the games industry sites, like gamasutra and opengl.org
You can also look up to kd-trees.
I don't know of any implementation but in 3D at least they are usually considered more performant than Octrees. For example, here is a return of experience I just googled it.
You may want to consider alternative to quad trees if you ever have a problem of performance.
However it should be noted that kd-trees are hard to rebalance...

Storing Large 2D Game Worlds

I've been experimenting with different ideas of how to store a 2D game world. I'm interested in hearing techniques of storing large quantities of objects while managing the set that's visible ( lets say 100,000 tiles square ). Obviously the techniques can vary based on how the game renders that space.
Lets assume that we're describing a scrolling 2d game world rather than screen based as you could fairly easily do screen based rendering from such a setup while the converse is a bit more messy.
Looking for language agnostic solutions here so it's more helpful to others.
Edit: I think a good answer here would be a general review of the ideas to consider when thinking about this, as some of the responders have attempted, but also begin to explain how different solutions would apply to those scenarios. It's a somewhat complex question, so I would expect a good answer to reflect that.
Quadtrees are a fairly efficient solution for storing data about a large 2-dimensional world and the objects within it.
You might get some ideas on how to implement this from some spatial data structures like range or kd trees.
However, the answer to this question would vary considerably depending exactly on how your game works.
Are we talking a 2D platformer with 10 enemies onscreen, 20 more offscreen but "active", and an unknown number more "inactive"? If so, you can probably store your whole level as an array of "screens" where you manipulate the ones closest to you.
Or do you mean a true 2D game with lots of up/down movement too? You might have to be a bit more careful here.
The platform is also of some importance. If you're implementing a simple platformer for desktop PCs, you probably wouldn't have to worry about performance as much as you would on an embedded device. This is no excuse to be naive about it, but you might not have to be terribly clever either.
This is a somewhat interesting question I think. Presumably someone smarter than I who has experience with implementing platformers has thought these things out already.
Break the world into smaller areas, and deal with them. Any solution to this problem is going to boil down to this concept (such as quadtrees, mentioned in another answer). The differences will be in how they subdivide the world.
How much data is stored per tile? How fast can players move across the world? What's the behavior of NPCs, etc., that are offscreen? Do they just reset when the player comes back (like old Zelda games)? Do they simply resume where they were? Do they do some kind of catch-up script?
How much different rendering data is going to be needed for different areas?
How much of the world can be seen at one time?
All of these questions are going to immpact your solution, as well as the capabilities of your platform. Coming up with a general answer for these without having a reasonable idea of these parameters is going to be a bit difficult.
Assuming that your game will only update what is visible and some area around what is visible, just break the world in "screens" (a "screen" is a rectangular area on the tilemap that can fill the whole screen). Keep in memory the "screens" around the visible area (and some more if you want to update entities which are close to the character - but there is little reason to update an entity that far away) and have the rest on disk with a cache to avoid loading/unloading of commonly visited areas when you move around. Some setup like:
+---+---+---+---+---+---+---+
|FFF|FFF|FFF|FFF|FFF|FFF|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|VVV|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|FFF|FFF|FFF|FFF|FFF|FFF|
+---+---+---+---+---+---+---+
Where "V" part is the "screen" where the center (hero or whatever) is, the "N" parts are those who are nearby and have active (updating) entities, are checked for collisions, etc and "F" parts are far parts which might get updated infrequently and are prone to be "swapped" out (stored to disk). Of course you might want to use more "N" screens than two :-).
Note btw that since 2D games do not usually hold much data instead of saving the far away parts to disk you might want to just keep them in memory compressed.
You probably want to use a single int or byte array that links to block types. If you need more optimization from there, then you'll want to link to more complicated data structures like oct trees from your array. There is a good discussion on a Java game forum here: http://www.javagaming.org/index.php/topic,20505.30.html text
Anything with links becomes very expensive because the pointer takes up something like 8 bytes each, depending upon the language, so depending upon how populated your world is it can get expensive very quickly (8 pointers 8 bytes each is 64 bytes per item, and a byte array is 1 byte per item). So unless 1/64 of your world is empty, a byte array is going to be a much better option. You're also going to need to spend a lot of time iterating down the tree whenever you're doing a lookup for collision or whatever else - a byte array will be an instantaneous lookup.
Hopefully that's detailed enough for you. :-)

What algorithm to use to obtain Objects from an Image

I would like to know what algorithm is used to obtain an image and get the objects present in the image and process (give information about) it. And also, how is this done?
I agree with Sid Farkus, there is no simple answer to this question.
Maybe you can get started by checking out the Open Computer Vision Library. There is a Wiki page on object detection with links to a How-To and to papers.
You may find other examples and approaches (i.e. algorithms); it's likely that the algorithms differ by application (i.e. depending on what you actually want to detect).
There are many ways to do Object Detection and it still an open problem.
You can start with template matching, It is probably the simplest way to solve, which consists of making a convolution with the known image (IA) on the new image (IB). It is a fairly simple idea because it is like applying a filter on the signal, the filter will generate a maximum point in the image when it finds the object, as shown in the video. But that technique has several cons, does not handle variants in scale or rotation so it has no real application.
Also you can find another option more robust feature matching, which consist on create a dataset with features such as SIFT, SURF or ORB of different objects with this you can train a SVM to recognize objects
You can also check deformable part models. However, The state of the art object detection is based on deep-learning such as Faster R-CNN, Alexnet, which learn the features that will be used to detect/recognize the objects
Well this is hardly an answerable question, but for most computer vision applications a good starting point is the Hough Transform

Resources