Best data structure for point cloud updates? - computational-geometry

I'm working on a robot using the new Jetson Nano. I'm generating points from the depth image of my camera and am working towards building a scene as the robot moves around. My issue is that simply throwing every point into the data structure each frame would make me run out of memory very quickly, so I want a heuristic that says: if a point meets some condition, don't add it.
For this I imagine I need an acceleration structure like an octree, k-d tree, BVH, or maybe something else. While I am familiar with them and find lots of info on how to build them, I'm a little confused about which of them would be easiest to update each frame, and whether some require complete rebuilds rather than incremental updates. Could some be parallelized? Any insight on what type of data structure to use, maybe with a link about it, would be super helpful.
Edit:
I believe the best structure for this is likely a sparse voxel octree. You can find some general ideas on how to build one in this blog post from Nvidia: https://devblogs.nvidia.com/thinking-parallel-part-iii-tree-construction-gpu/
If a Morton code maps to a specific voxel, that voxel is 'filled'. Redundant points are taken care of automatically, since a voxel is either filled or unfilled. For removal, I think I can ray trace against the octree: if a ray hits a filled voxel before the depth I expect, I delete that voxel. There are some resolution problems, but I think I can handle this with a hybrid approach.
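The "filled or unfilled" idea can be sketched in a few lines. This is only a minimal illustration, not the GPU construction from the Nvidia post: the `VoxelSet` class, its `insert` method, the 10-bit index range, and the use of a plain Python set in place of an octree are all assumptions made for the example.

```python
import math

def morton3d(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three voxel indices into one Morton code."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code

class VoxelSet:
    """Hypothetical occupancy store: a voxel is 'filled' if its Morton code is in the set."""

    def __init__(self, voxel_size, bits=10):
        self.voxel_size = voxel_size
        self.bits = bits
        self.filled = set()

    def insert(self, x, y, z):
        """Return True if the point filled a new voxel, False if redundant or out of range."""
        idx = [math.floor(c / self.voxel_size) for c in (x, y, z)]
        if any(i < 0 or i >= (1 << self.bits) for i in idx):
            return False  # outside the grid covered by `bits` bits per axis
        code = morton3d(*idx, bits=self.bits)
        if code in self.filled:
            return False  # voxel already filled, so the point adds nothing
        self.filled.add(code)
        return True
```

Memory then grows with the number of occupied voxels rather than with the number of frames, which is the property the heuristic is after.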

Related

Bidirectional path tracing, algorithm explanation

I'm trying to understand path tracing. So far, I have only dealt with the very basics: a ray is launched from each intersection point in a random direction within the hemisphere, then again, and so on recursively, until the ray hits the light source. As a result, with small light sources the image is extremely noisy.
The following images show the noise level depending on the number of samples (rays) per pixel.
I am also not sure that I am doing everything correctly, because the Monte Carlo method, as far as I understand, implies that several rays are launched from each intersection point and their results are summed and averaged. But then the number of rays grows exponentially and after 6 bounces becomes unmanageable, so I decided that it is better to launch several rays per pixel initially (each slightly shifted from the center of the pixel in a random direction) and generate only 1 ray at each intersection. I do not know whether this approach counts as Monte Carlo or not, but at least this way the rendering does not last forever.
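For what it's worth, averaging many independent single-ray samples is still a Monte Carlo estimate. The toy sketch below is not a renderer: it only estimates the integral of cos(theta) over the hemisphere (whose exact value is pi) by averaging one random direction per sample. The function name and the choice of uniform hemisphere sampling are assumptions made for the illustration.

```python
import math
import random

def estimate_cosine_integral(num_samples, rng):
    """Monte Carlo estimate of the hemisphere integral of cos(theta); exact value is pi."""
    pdf = 1.0 / (2.0 * math.pi)  # density of a uniformly sampled hemisphere direction
    total = 0.0
    for _ in range(num_samples):
        # For directions uniform over the hemisphere, cos(theta) is uniform on [0, 1).
        cos_theta = rng.random()
        total += cos_theta / pdf  # one sample: integrand divided by its probability density
    return total / num_samples    # averaging the single samples reduces the noise
```

Each pixel sample in a path tracer plays the role of one loop iteration here: one random path, weighted by its probability, averaged with the others.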
Bidirectional path tracing
I started looking for ways to reduce the amount of noise, and came across bidirectional path tracing. But unfortunately, I couldn't find a detailed explanation of this algorithm in simple words. All I understood is that rays are generated from both the camera and the light sources, and then the endpoints of these paths are checked to see whether they can be connected.
As you can see, if the intersection points of the blue ray from the camera and the white ray from the light source can be freely connected (there are no obstacles in the connection path), then we can assume that the ray from the camera can pass through the points y1, y0 directly to the light source.
But there are a lot of questions:
If the light source is not a point, but has some shape, then the point from which the ray is launched must be randomly selected on the surface of this shape? If you take only the center - then there will be no difference from a point light source, right?
Do I need to build a path from the light source for each path from the camera, or should there be only one path from the light source, while several paths (samples) are built from the camera for one pixel at once?
The number of bounces/re-reflections/refractions should be the same for the path from the camera and the light source? Or not?
But the questions don't end there. I have heard that the bidirectional trace method allows you to model caustics well (in comparison with regular path tracing). But I completely did not understand how the method of bidirectional path tracing can somehow help for this.
Example 1
Here the path will eventually be built, but the number of bounces will be extremely large, so no caustics will work here, despite the fact that the ray from the camera is directed almost to the same point where the path of the ray from the light source ends.
Example 2
Here the path will not be built, because there is an obstacle between the endpoints of the paths, although it could be built if point x3 was connected to point y1, but according to the algorithm (if I understand everything correctly), only the last points of the paths are connected.
Question:
What is the use of such an algorithm, if in a significant number of cases the paths either cannot be built, or are unnecessarily long? Maybe I misunderstand something? I came across many articles and documents where this algorithm was described, but mostly it was described mathematically (using all sorts of magical terms like biased/unbiased, PDF, BSDF, and others), and not algorithmically. I am not that strong in mathematics and mathematical notation; I would just like to understand WHAT TO DO: how to implement it correctly in code, how these paths are connected, in what order, and so on. This can be explained in simple words or pseudocode, right? I would be extremely grateful if someone would finally shed some light on all this.
Some references that helped me to understand path tracing properly:
https://www.scratchapixel.com/ (every rendering student should begin with this)
https://en.wikipedia.org/wiki/Path_tracing
If you're looking for more references, path tracing is used for "global illumination", as opposed to "direct illumination", which only relies on a straight line from the point to the light.
What's more, caustics are well known to be a hard problem, so don't begin with them! The Monte Carlo method is a good, straightforward method to begin with, but it has its limitations (e.g. caustics and tiny lights).
Some advice for rendering newbies
Mathematical notation is surely not the coolest thing, and everyone would of course prefer ready-to-go code. But maths is the most rigorous way to describe the world. It also lets you model a whole physical interaction in a small formula instead of many lines of code that don't fit the real problem. I suggest you read more carefully, since a good mathematical formula always has its terms explained. If some variables are not specified, don't lose your time: look for another reference.

Is there a way to create simple animations "on the fly" in modern OpenGL?

I think this requires a bit of background information:
I have been modding Minecraft for a while now, but I always wanted to make my own game, so I started digging into the freshly released LWJGL3 to actually get things done. Yes, I know it's a bit low level and I should use an engine and stuff... indeed, I already tried some engines and they never quite matched what I want to do, so I decided to tackle the problem at its root.
So far, I kind of understand how to render meshes, move the "camera", etc. and I'm willing to take the learning curve.
But the thing is, at some point all the tutorials start to explain how to load models and create skeletal animations and so on... but I think I do not really want to go that way. A lot of things in working with Minecraft code were awful, but I liked how I could create models and animations from Java code. Sure, it did not look super realistic, but since I'm not great with Blender either, I doubt having "classic" models and animations would help. Anyway, in that code, I could rotate a box around to make a creature look at a player, I could use a sine function to move legs and arms (or wings, in my case) and that was working, since Minecraft used immediate mode and Java could directly tell the graphics card where to draw each vertex.
So, actual question(s): Is there any good way to make dynamic animations in modern (3.3+) OpenGL? My models would basically be a hierarchy of shapes (boxes or whatever) and I want to be able to rotate them on the fly. But I'm not sure how to organize that. Would I store all the translation/rotation-matrices for each sub-shape? Would that put a hard limit on the amount of sub-shapes a model could have? Did anyone try something like that?
Edit: For clarification, what I did looked something like this:
Create a model: https://github.com/TheOnlySilverClaw/Birdmod/blob/master/src/main/java/silverclaw/birds/client/model/ModelOstrich.java
The model is created as a bunch of boxes in the constructor, the render and setRotationAngles methods set scale and rotations.
You should follow an OpenGL tutorial in order to understand the basics.
Let me suggest "Learning Modern 3D Graphics Programming", and especially this chapter, where you move one robot arm with multiple joints.
I did a port in Java using JOGL here, but you can easily port it over to LWJGL.
What you are looking for is exactly skeletal animation, the only difference being the fact you do not want to load animations for your bones but want to compute / generate transforms on the fly.
You basically have a hierarchy of bones, and geometry attached to it. It looks like you want to manipulate this geometry "rigidly", so before sending your meshes and transforms to the GPU (the classic way), you start by computing the new transforms in model or world space, then send those freshly computed matrices to the GPU and draw your geometry the standard way.
As Sorin said, to compute each transform you simply have to iterate over your hierarchy and accumulate transforms given the transform of the parent bone and your local transform w.r.t the parent.
Yes and no.
You can have your hierarchy of shapes and store a relative transform for each.
For example the "player" would have a translation to 100,100,10 (where the player is), and then the "head" subcomponent would have an additional translation of 0,0,5 (just a bit higher on the z axis).
You can store these as matrices (they can encode translation, rotation and scaling) and use glPushMatrix and glPopMatrix to add and remove a matrix to a stack maintained by OpenGL. Note that these calls belong to the legacy fixed-function pipeline and are not available in a 3.3+ core profile.
The draw() function(or whatever you call it) should look something like :
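glPushMatrix(); // save the parent's transform
glMultMatrixf(my_transform); // apply the local transform; glTranslatef, glRotatef, etc. also work
// Draw my mesh
for (child : children) { child.draw(); } // children inherit the accumulated transform
glPopMatrix(); // restore the parent's transform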
This gives you a hierarchical setup so that objects move with their parent. Alternatively you can keep a stack in main memory and do the multiplications yourself (use a library). I think the OpenGL stack may have a limit (implementation dependent), but if you handle it yourself the only limit is the amount of RAM you can use. Once all the matrices are multiplied, rendering takes the same amount of time; that is, it doesn't matter for performance how deep a mesh is in the hierarchy.
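Doing the multiplications yourself might look roughly like the sketch below, which uses the player/head example. It is written in Python for brevity rather than Java, and the `Node` class, the helper functions, and the row-major 4x4 list representation are all assumptions made for the illustration. In a real program the resulting matrices would be uploaded as shader uniforms.

```python
from dataclasses import dataclass, field

def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def translation(tx, ty, tz):
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

@dataclass
class Node:
    name: str
    local: list                       # transform relative to the parent
    children: list = field(default_factory=list)

def world_transforms(node, parent_world=None, out=None):
    """Walk the hierarchy and accumulate parent * local for every node."""
    if out is None:
        out = {}
    if parent_world is None:
        parent_world = identity()
    world = mat_mul(parent_world, node.local)
    out[node.name] = world
    for child in node.children:
        world_transforms(child, world, out)  # children inherit the accumulated transform
    return out
```

With `player` at (100, 100, 10) and `head` offset by (0, 0, 5), the head's world matrix carries the translation (100, 100, 15), and the recursion depth is bounded only by the size of the hierarchy.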
For actual animations you need to compute the intermediate transformations. For example for a crouch animation you probably want to have a few frames in between so that the camera doesn't just jump to the low position. You can do this with a time based linear interpolation between the start and end positions, but this only covers simple animations and you still have to implement it yourself.
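A time-based linear interpolation for the crouch example could be sketched as follows. The function names and the idea of interpolating a single camera height are assumptions chosen for the illustration.

```python
def lerp(start, end, t):
    """Linear interpolation between start and end for t in [0, 1]."""
    return start + (end - start) * t

def crouch_height(standing, crouched, elapsed, duration):
    """Camera height `elapsed` seconds into a crouch that lasts `duration` seconds."""
    t = min(max(elapsed / duration, 0.0), 1.0)  # clamp so the motion stops at the end pose
    return lerp(standing, crouched, t)
```

Calling `crouch_height` once per frame with the time since the animation started gives the in-between positions, independent of frame rate.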
Anything more complicated (e.g. modifying the mesh based on the bone links) you would also need to implement yourself.

Match 3D point cloud to CAD model

I have a point cloud of an object, obtained with a laser scanner, and a CAD surface model of that object.
How can I match the point cloud to the surface, to obtain the translation and rotation between cloud and model?
I suppose I could sample the surface and try the Iterative Closest Point (ICP) algorithm to match the resulting sampled point cloud to the scanner point cloud.
Would that actually work?
And are there better algorithms for this task?
In recent versions of OpenCV, I have implemented a surface matching module to match a 3D model to a 3D scene. No initial pose is required and the detection process is fully automatic. The module also includes an ICP step.
To get an idea, please check out the video here (though it was not generated by the implementation in OpenCV):
https://www.youtube.com/watch?v=uFnqLFznuZU
The full source code is here and the documentation is here.
You mentioned that you needed to sample your CAD model. This is correct and we have given a sampling algorithm suited for point pair feature matching, such as the one implemented in OpenCV:
Birdal, Tolga, and Slobodan Ilic. A point sampling algorithm for 3D matching of irregular geometries. 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017.
http://campar.in.tum.de/pub/tbirdal2017iros/tbirdal2017iros.pdf
Yes, ICP can be applied to this problem, as you suggest, by sampling the surface. It would be best if all faces of the model are visible in your laser scan; otherwise you may have to remove the invisible faces from your model (depending on how many of these there are).
One way of automatically preparing a model by getting rid of some of the hidden faces is to calculate the concave hull which can be used to discard hidden faces (which are for example faces that are not close to the concave hull). Depending on how involved the model is this may or may not be necessary.
ICP works well if given a good initial guess because it ignores points that are not close with respect to the current guess. If ICP is not coming up with a good alignment you may try it with multiple random restarts to try and fix this problem, choosing the best alignment.
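To make the loop concrete, here is a minimal ICP sketch. It is deliberately simplified to 2D, where the best rotation has a closed form, and uses brute-force nearest neighbours. A real 3D version would use an SVD-based fit and a spatial index, and the function names here are assumptions made for the illustration.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired points src -> dst."""
    n = len(src)
    csx, csy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    cdx, cdy = sum(p[0] for p in dst) / n, sum(p[1] for p in dst) / n
    s_cos = s_sin = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay, bx, by = sx - csx, sy - csy, dx - cdx, dy - cdy
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)

def transform_points(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(scan, model, iterations=20):
    """Align `scan` to `model`; returns (theta, tx, ty) of the accumulated transform."""
    current = list(scan)
    for _ in range(iterations):
        # Match every scan point to its closest model point under the current guess.
        matched = [min(model, key=lambda m: (m[0] - p[0]) ** 2 + (m[1] - p[1]) ** 2)
                   for p in current]
        theta, tx, ty = best_rigid_2d(current, matched)
        current = transform_points(current, theta, tx, ty)
    # `current` is a rigid transform of `scan`, so this fit recovers the total motion.
    return best_rigid_2d(scan, current)
```

Because the matching step only looks at the closest point, the result depends on the starting pose, which is why a good initial guess or several restarts matter.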
A more involved solution is to do local feature matching. You sample and calculate an invariant descriptor like SHOT or FPFH. You find the best matches, reject non-consistent matches, use them to come up with a good initial alignment and then refine with ICP. But you may not need this step depending on how robust and fast the random-restart ICP is.
There's an open source library for point cloud algorithms which implements registration against other point clouds. Maybe you can try some of their methods to see if any fit.
As a starter, if they don't have anything specific to fit against a polygon mesh, you can treat the mesh vertices as another point cloud and fit your point cloud against it. This is something that they definitely support.

What's the best depth map generation algorithm?

I'm working on a 2D-to-3D application project and I'm looking for a method to produce the depth map of a single input image, without any other external information. I know that's a sort of "artificial intelligence" matter, but maybe an efficient algorithm exists.
At the moment I've found this one: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.7959&rep=rep1&type=pdf but I'm wondering if there is a better method before I start implementing. Suggestions? Thanks!
I've written quite a few automatic depth map generators. I don't think there's one that's better than all others in all cases. It all depends on the stereo pair you're starting with. I personally think a depth map generator based on local method (window or block based) with an edge preserving smoother is probably the best all-around depth map generator.
In any case, on this page:
depth map generation software
you can find depth map generator software based on optical flow, weight-based windows, graph cuts, and many other things that relate to depth map generation and lenticular creation. The best part is that it's all free.
For 2D to 3D conversion (which is more what you are asking about), there's a piece of software called DMAG4 that uses a sparsely populated depth map (typically done in Gimp with the paint brush) to indicate the main depths, and then fills the unfilled areas using interpolation while maintaining the edges of the objects (edge-preserving).
DMAG4 can be found here (it's free to use):
2d to 3d conversion software DMAG4
Another way to do 2D to 3D conversion is to use a sculpting program like Gimpel3d or Blender, both free. Clearly, this goes beyond a depth map, since you're essentially creating a 3D scene in which you can then move around (using the camera movement in Blender). This is often referred to as "camera mapping".
Well, I have recently come upon this:
http://make3d.cs.cornell.edu/code.html
which comes together with code, although the license might be too restrictive
("Noncommercial — You may not use this work for commercial purposes").
The gallery is impressive:
http://make3d.stanford.edu/images/showall

Looking for ways for a robot to locate itself in the house

I am hacking a vacuum cleaner robot to control it with a microcontroller (Arduino). I want to make it more efficient when cleaning a room. For now, it just goes straight and turns when it hits something.
But I have trouble finding the best algorithm or method for determining its position in the room. I am looking for an idea that stays cheap (less than $100) and is not too complex (one that doesn't require a PhD thesis in computer vision). I can add some discrete markers in the room if necessary.
Right now, my robot has:
One webcam
Three proximity sensors (around 1 meter range)
Compass (not used for now)
Wi-Fi
Its speed can vary if the battery is full or nearly empty
A netbook Eee PC is embedded on the robot
Do you have any idea for doing this? Does any standard method exist for these kind of problems?
Note: if this question belongs on another website, please move it, I couldn't find a better place than Stack Overflow.
The problem of figuring out a robot's position in its environment is called localization. Computer science researchers have been trying to solve this problem for many years, with limited success. One problem is that you need reasonably good sensory input to figure out where you are, and sensory input from webcams (i.e. computer vision) is far from a solved problem.
If that didn't scare you off: one of the approaches to localization that I find easiest to understand is particle filtering. The idea goes something like this:
You keep track of a bunch of particles, each of which represents one possible location in the environment.
Each particle also has an associated probability that tells you how confident you are that the particle really represents your true location in the environment.
When you start off, all of these particles might be distributed uniformly throughout your environment and be given equal probabilities. Here the robot is gray and the particles are green.
When your robot moves, you move each particle. You might also degrade each particle's probability to represent the uncertainty in how the motors actually move the robot.
When your robot observes something (e.g. a landmark seen with the webcam, a wifi signal, etc.) you can increase the probability of particles that agree with that observation.
You might also want to periodically replace the lowest-probability particles with new particles based on observations.
To decide where the robot actually is, you can either use the particle with the highest probability, the highest-probability cluster, the weighted average of all particles, etc.
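The steps above can be sketched for a deliberately tiny case: a robot moving along a 1D corridor and measuring its distance to a wall. Everything specific here is an assumption made for the illustration: the wall at `wall_x`, the Gaussian noise levels, and the use of resampling (drawing a fresh particle set in proportion to the weights) as one common way to replace low-probability particles.

```python
import math
import random

def particle_filter_step(particles, weights, move, measured_range, rng,
                         wall_x=10.0, motion_noise=0.1, sensor_noise=0.5):
    """One move + observe + resample cycle for a robot on a line facing a wall at wall_x."""
    # Motion update: shift every particle, adding noise for imperfect motors.
    moved = [p + move + rng.gauss(0.0, motion_noise) for p in particles]
    # Measurement update: particles whose predicted range agrees with the sensor gain weight.
    updated = []
    for p, w in zip(moved, weights):
        error = measured_range - (wall_x - p)
        updated.append(w * math.exp(-0.5 * (error / sensor_noise) ** 2))
    total = sum(updated)
    if total == 0.0:
        updated = [1.0 / len(moved)] * len(moved)  # every particle disagreed; start over
    else:
        updated = [w / total for w in updated]
    # Resampling: low-weight particles tend to be dropped, high-weight ones duplicated.
    resampled = rng.choices(moved, weights=updated, k=len(moved))
    return resampled, [1.0 / len(resampled)] * len(resampled)

def estimate_position(particles, weights):
    """Weighted average of all particles."""
    return sum(p * w for p, w in zip(particles, weights))
```

Starting from particles spread uniformly along the corridor and calling `particle_filter_step` after every move, the weighted average should settle near the true position within a few steps.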
If you search around a bit, you'll find plenty of examples: e.g. a video of a robot using particle filtering to determine its location in a small room.
Particle filtering is nice because it's pretty easy to understand. That makes implementing and tweaking it a little less difficult. There are other similar techniques (like Kalman filters) that are arguably more theoretically sound but can be harder to get your head around.
A QR Code poster in each room would not only make an interesting Modern art piece, but would be relatively easy to spot with the camera!
If you can place some markers in the room, using the camera could be an option. If 2 known markers have an angular displacement (left to right) then the camera and the markers lie on a circle whose radius is related to the measured angle between the markers. I don't recall the formula right off, but the arc segment (on that circle) between the markers will be twice the angle you see. If you have the markers at known height and the camera is at a fixed angle of inclination, you can compute the distance to the markers. Either of these methods alone can nail down your position given enough markers. Using both will help do it with fewer markers.
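The geometry above follows from the inscribed angle theorem: two markers a distance d apart, seen under an angle theta, lie on a circle of radius d / (2 sin theta) that also passes through the camera. A small sketch of both measurements follows, assuming angles in radians and a marker height measured relative to the camera. The function names are made up for the example.

```python
import math

def circle_radius(marker_separation, observed_angle):
    """Radius of the circle through both markers and the camera (inscribed angle theorem)."""
    return marker_separation / (2.0 * math.sin(observed_angle))

def horizontal_distance(height_above_camera, elevation_angle):
    """Ground distance to a marker seen at `elevation_angle` above the camera's horizon."""
    return height_above_camera / math.tan(elevation_angle)
```

One pair of markers only constrains the camera to a circle, so a second pair (or the height-based distance) is needed to pin down a single point.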
Unfortunately, those methods are imperfect due to measurement errors. You get around this by using a Kalman estimator to incorporate multiple noisy measurements and arrive at a good position estimate. You can then feed in some dead reckoning information (which is also imperfect) to refine it further. This part goes pretty deep into math, but I'd say it's a requirement to do a great job at what you're attempting. You can do OK without it, but if you want an optimal solution (in terms of the best position estimate for the given input) there is no better way. If you actually want a career in autonomous robotics, this will play a large part in your future.
Once you can determine your position you can cover the room in any pattern you'd like. Keep using the bump sensor to help construct a map of obstacles and then you'll need to devise a way to scan incorporating the obstacles.
Not sure if you've got the math background yet, but here is the book:
http://books.google.com/books/about/Applied_optimal_estimation.html?id=KlFrn8lpPP0C
This doesn't replace the accepted answer (which is great, thanks!) but I might recommend getting a Kinect and using that instead of your webcam, either through Microsoft's recently released official drivers or using the hacked drivers if your EeePC doesn't have Windows 7 (presumably it does not).
That way the positioning will be improved by the 3D vision. Observing landmarks will now tell you how far away the landmark is, and not just where in the visual field that landmark is located.
Regardless, the accepted answer doesn't really address how to pick out landmarks in the visual field, and simply assumes that you can. While the Kinect drivers may already have feature detection included (I'm not sure) you can also use OpenCV for detecting features in the image.
One solution would be to use a strategy similar to "flood fill" (wikipedia). To get the controller to accurately perform sweeps, it needs a sense of distance. You can calibrate your bot using the proximity sensors: e.g. run motor for 1 sec = xx change in proximity. With that info, you can move your bot for an exact distance, and continue sweeping the room using flood fill.
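As a rough illustration of the flood fill idea, the room can be treated as a grid of robot-sized cells. The sketch below assumes the obstacle map is already known (0 = free, 1 = blocked) and only lists which cells are reachable and in what breadth-first order. Driving between consecutive cells, and discovering obstacles on the way, is left out.

```python
from collections import deque

def flood_fill_order(grid, start):
    """Return the free cells reachable from `start` in breadth-first order."""
    rows, cols = len(grid), len(grid[0])
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        order.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in visited):
                visited.add((nr, nc))  # mark on enqueue so each cell is visited once
                queue.append((nr, nc))
    return order
```

A depth-first variant would keep consecutive cells adjacent more often, which may suit a physical robot better.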
Assuming you are not looking for a generalised solution, you may actually know the room's shape, size, potential obstacle locations, etc. When the bot leaves the factory there is no info about its future operating environment, which kind of forces it to be inefficient from the outset.
If that's your case, you can hardcode that info, and then use basic measurements (e.g. rotary encoders on the wheels plus the compass) to precisely figure out its location in the room or house. No need for Wi-Fi triangulation or crazy sensor setups in my opinion. At least for a start.
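The encoder plus compass idea amounts to dead reckoning. A minimal sketch, assuming the encoders give the distance travelled since the last update and the compass heading is expressed in degrees counterclockwise from the room's x axis (both conventions are assumptions made for the example):

```python
import math

def dead_reckon(x, y, distance, heading_deg):
    """Advance the position estimate by `distance` along the compass heading."""
    heading = math.radians(heading_deg)
    return x + distance * math.cos(heading), y + distance * math.sin(heading)
```

Errors from wheel slip accumulate with every update, so the estimate drifts unless it is corrected from time to time, for example when the robot touches a wall whose position is known.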
Ever considered GPS? Every position on Earth has unique GPS coordinates, with a resolution of 1 to 3 metres, and with differential GPS you can get down to the sub-10 cm range. More info here:
http://en.wikipedia.org/wiki/Global_Positioning_System
And Arduino does have lots of options of GPS-modules:
http://www.arduino.cc/playground/Tutorials/GPS
After you have collected the coordinates of all the key points in the house, you can write the routine for the Arduino to move the robot from point to point (as collected above), assuming it also handles obstacle avoidance.
More information can be found here:
http://www.google.com/search?q=GPS+localization+robots&num=100
And inside the list I found this - specifically for your case: Arduino + GPS + localization:
http://www.youtube.com/watch?v=u7evnfTAVyM
I was thinking about this problem too. But I don't understand why you can't just triangulate. Have two or three beacons (e.g. IR LEDs of different frequencies) and a rotating IR sensor 'eye' on a servo. You could then get an almost constant fix on your position. I expect the accuracy would be in the low cm range and it would be cheap. You can then easily map anything you bump into.
Maybe you could also use any interruption in the beacon beams to plot objects that are quite far from the robot too.
You said you have a camera? Did you consider looking at the ceiling? There is little chance that two rooms have identical dimensions, so you can identify which room you are in. Your position in the room can be computed from the angular distance to the borders of the ceiling, and your direction can probably be extracted from the position of the doors.
This will require some image processing, but since the vacuum cleaner has to move slowly to clean efficiently, it will have enough time to compute.
Good luck!
Use an ultrasonic sensor such as the HC-SR04 or similar.
As suggested above, sense the distance from the robot to the walls with the sensors, and identify the room with a QR code.
When you are near a wall, turn 90 degrees, move forward by the width of your robot, turn 90 degrees again (i.e. another 90 degree left turn), and continue moving. I think it will help :)
