Rotation and translation in 3D reconstruction using 2D images - image

I am trying to do 3D model reconstruction using 2D images from different views. I am following this example code from Matlab to get the desired results:
Structure From Motion From Two Views.
Following are the test images taken from the camera:
Manually taken images of 1st and 2nd image with translation of 1cm:
Overlay with matched features of first and second image:
Manually taken images of 1st and 2nd image with translation of 2cm:
Overlay with matched features of first and second image:
These are the translation vectors and rotation matrices I get for each case:
1cm translation:
translation vector:[0.0245537412606279 -0.855696925927505 -0.516894461905255]
rotation matrix:
[0.999958322438693 0.00879926762261436 0.00243439415451741;
-0.00887800587357739 0.999365801035702 0.0344844418829408;
-0.00212941243132160 -0.0345046172211024 0.999402269855899]
2cm translation:
translation vector:[-0.215835469166982 -0.228607603749042 -0.949291111175908]
rotation matrix:
[0.999989695803078 -0.00104036790630347 -0.00441881457943975;
0.00149220346018613 0.994626852476622 0.103514238930121;
0.00428737874479779 -0.103519766069424 0.994618156086259]
In documentation, it says it is relative rotation and translation between the 2 images.
But I am unable to understand what these numbers mean and what is the unit of the above values.
Can anyone at least let me know in what units are we getting the translation and rotation or how to extract the rotation and translation which is in any way comparable to the real world values like cm/mm and radians/degrees respectively?

You can translate the rotation matrix into a axis-angle-representation where you get the angles in radians. This can be done using the vrrotmat2vec function or by implementing a translater yourself by following this if you don't have access to the package. The angle will then be in radians.
When it comes to translation however you wont get it in a unit that makes sense in the real world, since you don't know the scale. This is unfortunately a problem with structure from motion in general. It is impossible to know if you take a image close to something small or far away from something large.
When using structure from motion to construct a 3D model this is fortunately not a problem since you still get relative distances correctly. Therefore you will be able to capture the scene (by following the rest of the tutorial) but you wont be able to say if something is 2cm or 2km tall, unless you have something in the image that you know the real life size of.
Hope it helps :)

Related

Latent space image interpolation

Can someone tell me how (or the name of it, so that I could look it up) I can implement this interpolation effect? https://www.youtube.com/watch?v=36lE9tV9vm0&t=3010s&frags=pl%2Cwn
I tried to use r = r+dr, g = g+dr and b = b+db for the RGB values in each iteration, but it looks way too simple compared to the effect from the video.
"Can someone tell me how I can implement this interpolation effect?
(or the name of it, so that I could look it up)..."
It's not actually a named interpolation effect. It appears to interpolate but really it's just realtime updated variations of some fictional facial "features" (the hair, eyes, nose, etc are synthesized pixels taking hints from a library/database of possible matching feature types).
For this technique they used Neural Networks to do a process similar to DFT Image Reconstruction. You'll be modifying the image data in Frequency domain (with u,v), not Time domain (using x,y).
You can read about it at this PDF: https://research.nvidia.com/sites/default/files/pubs/2017-10_Progressive-Growing-of/karras2018iclr-paper.pdf
The (Python) source code:
https://github.com/tkarras/progressive_growing_of_gans
For ideas, on Youtube you can look up:
DFT image reconstruction (there's a good example with b/w Nicholas Cage photo reconstructed in stages. Loud music warning).
Image Synthesis with neural networks (one clip had salternative shoe and hand-bag designs (item photos) being "synthesized" by an N.N. after it analyzed features from other existing catalogue photos as "inspiration".
Image Enhancement Super Resolution using neural networks This method is closest to answering your question. One example has very low-res blurry pixelated image in b/w. Cannot tell if boy or girl. During a test, The network synthesizes various higher quality face images that it thinks is the correct match for the testing input.
After understanding what/how they're achieve it, you could think of shortcuts to get similar effect without needing networks eg: only using regular pixel editing functions.
Found it in another video, it is called "latent space interpolation", it has to be applied on the compressed images. If I have image A and the next image is image B, I have first to encode A and B, use the interpolation on the encoded data and finally decode the resulted image.
As of today, I found out that this kind of interpolation effect can be easily implemented for 3d image data. That is if the image data is available in a normalized and at 3d origin centred way, like for example in a unit sphere around the origin and the data of each faceimage is inside that unit sphere. Having the data of two images stored this way the interpolation can be calculated by taking the differences of rays going through the origin center and through each area of the sphere at some desired resolution.

Image comparison algorithm for unit testing

I have a small render engine written for fun. I would like to have some unit testing that would render automatically an image and then compare it to a stored image to check for differences. This should give some sort of metric to be able to gauge if the image is too far off or if we can attribute that to just different timings in animations. If it can also produce the location in the image of the differences that would be great, but not necessary. We can also assume that the 2 images are the exact same size.
What are the classic papers/techniques for that sort of thing ?
(the language is Go, probably nothing exists for it yet and I'd like to implement it myself to understand what's going on. The renderer is github.com/luxengine)
Thank you
One idea could be to see your problem as a case in Image Registration.
The following figure (taken from http://it.mathworks.com/help/images/point-mapping.html) gives a flow-chart for a method to solve the image registration problem.
Using the above figure terms, the basic idea is:
find some interest points in the Fixed image;
find in the Moving image the same corresponding points;
estimate the transformation between the two images using the point correspondences. One of the simplest transformation is a translation represented by a 2D vector; the magnitude of this vector is a measure of differences between the two images, in your case it can be related to the shift you wrote about in your comment. A richer transformation is an homography described by a 3x3 matrix, its distance from the identity matrix is again a measure of differences between the two images.
you can apply the transformation back, for example in the case of the translation you apply the translation to the Moving image and then the warped image can be compared (here I am simplifying a little) pixel by pixel to the Reference image.
Some more ideas are here: Image comparison - fast algorithm

how to improve keypoints detection and matching

I have been working a self project in image processing and robotics where instead robot as usual detecting colors and picking out the object, it tries to detect the holes(resembling different polygons) on the board. For a better understanding of the setup here is an image:
As you can see I have to detect these holes, find out their shapes and then use the robot to fit the object into the holes. I am using a kinect depth camera to get the depth image. The pic is shown below:
I was lost in thought of how to detect the holes with the camera, initially using masking to remove the background portion and some of the foreground portion based on the depth measurement,but this did not work out as, at different orientations of the camera the holes would merge with the board... something like inranging (it fully becomes white). Then I came across adaptiveThreshold function
adaptiveThreshold(depth1,depth3,255,ADAPTIVE_THRESH_GAUSSIAN_C,THRESH_BINARY,7,-1.0);
With noise removal using erode, dilate, and gaussian blur; which detected the holes in a better manner as shown in the picture below. Then I used the cvCanny edge detector to get the edges but so far it has not been good as shown in the picture below.After this I tried out various feature detectors from SIFT, SURF, ORB, GoodFeaturesToTrack and found out that ORB gave the best times and the features detected. After this I tried to get the relative camera pose of a query image by finding its keypoints and matching those keypoints for good matches to be given to the findHomography function. The results are as shown below as in the diagram:
In the end i want to get the relative camera pose between the two images and move the robot to that position using the rotational and translational vectors got from the solvePnP function.
So is there any other method by which I could improve the quality of the
holes detected for the keypoints detection and matching?
I had also tried contour detection and approxPolyDP but the approximated shapes are not really good:
I have tried tweaking the input parameters for the threshold and canny functions but
this is the best I can get
Also ,is my approach to get the camera pose correct?
UPDATE : No matter what I tried I could not get good repeatable features to map. Then I read online that a depth image is cheap in resolution and its only used for stuff like masking and getting the distances. So , it hit me that the features are not proper because of the low resolution image with its messy edges. So I thought of detecting features on a RGB image and using the depth image to get only the distances of those features. The quality of features I got were literally off the charts.It even detected the screws on the board!! Here are the keypoints detected using GoodFeaturesToTrack keypoint detection..
I met an another hurdle while getting the distancewith the distances of the points not coming out properly. I searched for possible causes and it occured to me after quite a while that there was a offset in the RGB and depth images because of the offset between the cameras.You can see this from the first two images. I then searched the net on how to compensate this offset but could not find a working solution.
If anyone one of you could help me in compensate the offset,it would be great!
UPDATE: I could not make good use of the goodFeaturesToTrack function. The function gives the corners in Point2f type .If you want to compute the descriptors we need the keypoints and converting Point2f to Keypoint with the code snippet below leads to the loss of scale and rotational invariance.
for( size_t i = 0; i < corners1.size(); i++ )
{
keypoints_1.push_back(KeyPoint(corners1[i], 1.f));
}
The hideous result from the feature matching is shown below .
I have to start on different feature matchings now.I'll post further updates. It would be really helpful if anyone could help in removing the offset problem.
Compensating the difference between image output and the world coordinates:
You should use good old camera calibration approach for calibrating the camera response and possibly generating a correction matrix for the camera output (in order to convert them into real scales).
It's not that complicated once you have printed out a checkerboard template and capture various shots. (For this application you don't need to worry about rotation invariance. Just calibrate the world view with the image array.)
You can find more information here: http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/own_calib.html
--
Now since I can't seem to comment on the question, I'd like to ask if your specific application requires the machine to "find out" the shape of the hole on the fly. If there are finite amount of hole shapes, you may then model them mathematically and look for the pixels that support the predefined models on the B/W edge image.
Such as (x)^2+(y)^2-r^2=0 for a circle with radius r, whereas x and y are the pixel coordinates.
That being said, I believe more clarification is needed regarding the requirements of the application (shape detection).
If you're going to detect specific shapes such as the ones in your provided image, then you're better off using a classifer. Delve into Haar classifiers, or better still, look into Bag of Words.
Using BoW, you'll need to train a bunch of datasets, consisting of positive and negative samples. Positive samples will contain N unique samples of each shape you want to detect. It's better if N would be > 10, best if >100 and highly variant and unique, for good robust classifier training.
Negative samples would (obviously), contain stuff that do not represent your shapes in any way. It's just for checking the accuracy of the classifier.
Also, once you have your classifier trained, you could distribute your classifier data (say, suppose you use SVM).
Here are some links to get you started with Bag of Words:
https://gilscvblog.wordpress.com/2013/08/23/bag-of-words-models-for-visual-categorization/
Sample code:
http://answers.opencv.org/question/43237/pyopencv_from-and-pyopencv_to-for-keypoint-class/

Get POINT CLOUD through 360 Degree Rotation and Image Processing

My Question is as below in two parts……
QUESTION (IN SHORT):
• To generate point cloud of real-world object….
• Through 360 degree rotation of it….on rotating table
• Getting 360 images… one image at each degree (1° to 360°).
• I know how to process image and getting pixel value of it.
• See one sample image below…you can see image is black and white...because I have to deal with the objects which are much shiny (glittery)…and it is DIAMOND. So I have setting up background so that shiny object (diamond) converted in to B/W object. And so I can easily scan outer edge of object (e.g. Diamond).
• And one thing to consider is I don’t using any laser… I just using one rotating table and one camera for taking image…you can see one sample project over here… but there MATLAB hides all the things…because that guy using MATLAB’s in Built functionality.
• Actually I am looking for Math routine or Algorithm or any Technique which helping me out to how getting point cloud…….using the way I have mentioned……..
MORE ELABORATION:
I need to have point-cloud of real-world object. So, I can display it in Computer Screen.
For that I am using one rotating table. I will put my object on it and I will rotate table a complete 360° degree rotation and I will take 360 images…one image at each degree (1° to 360°).
Camera which is used for taking image is well calibrated. I have given one sample image as below. I also know how to scan image and getting pixel value of it.
Also take in consideration that my images are Silhouette type…means just black and white... No color images.
But my problem is or where I am trapped down is in...
Getting Points cloud of object…….from the data which I have getting through processing of image.
One same kind of project I found over here……..
But it just using built in MATLAB functions…I am using Microsoft Visual C#.Net so I have to build the entire algorithm myself….because MATLAB hides all the things which I want to know….
Is there any master…….who know this entire thing well and getting me out of trap...!!!!
Thanks…..
I have no experience of this but If I wanted to do something like this I would have tried this:
Use a single color light source
if Possible create a lightsource which falls on a thin verticle slice of the object.
have 360 B/W Images, those Images will be images of a verticle line having variyng intensity. If you use matlab your matrix will have a/few column with sime values.
now asume a verticle line(your Axis of rotation).
5 plot or convert (imageno, rownoOfMatrix, ValueInPopulatedColumnInSameRow)... [Assuming numbering Image from 0 to 360]
under ideal conditions A lame way To get X and Y use K1 * cos imgNo * ValInCol and K1 * sin imgNo * ValInCol, and Z will be some K2 * rowNum.. K1 and K2 can be caliberated knowing actual size of object.
I mean Something like this:
http://fab.cba.mit.edu/content/processes/structured_light/
but instead of using structured light using a single verticle light
http://www.geom.uiuc.edu/~samuelp/del_project.html This link might help in triangulation...

Liquify filter/iwarp

I'm trying to build something like the Liquify filter in Photoshop. I've been reading through image distortion code but I'm struggling with finding out what will create similar effects. The closest reference I could find was the iWarp filter in Gimp but the code for that isn't commented at all.
I've also looked at places like ImageMagick but they don't have anything in this area
Any pointers or a description of algorithms would be greatly appreciated.
Excuse me if I make this sound a little simplistic, I'm not sure how much you know about gfx programming or even what techniques you're using (I'd do it with HLSL myself).
The way I would approach this problem is to generate a texture which contains offsets of x/y coordinates in the r/g channels. Then the output colour of a pixel would be:
Texture inputImage
Texture distortionMap
colour(x,y) = inputImage(x + distortionMap(x, y).R, y + distortionMap(x, y).G)
(To tell the truth this isn't quite right, using the colours as offsets directly means you can only represent positive vectors, it's simple enough to subtract 0.5 so that you can represent negative vectors)
Now the only problem that remains is how to generate this distortion map, which is a different question altogether (any image would generate a distortion of some kind, obviously, working on a proper liquify effect is quite complex and I'll leave it to someone more qualified).
I think liquefy works by altering a grid.
Imagine each pixel is defined by its location on the grid.
Now when the user clicks on a location and move the mouse he's changing the grid location.
The new grid is again projected into the 2D view able space of the user.
Check this tutorial about a way to implement the liquify filter with Javascript. Basically, in the tutorial, the effect is done transforming the pixel Cartesian coordinates (x, y) to Polar coordinates (r, α) and then applying Math.sqrt on r.

Resources