I have a matrix 'A' with two columns containing 2D points (coordinates 'x' and 'y'). These points were earlier projected onto a plane from a 3D point cloud, so they form the 2D shape of some object.
Second, I have a noisy image 'B' (a 4k x 4k matrix) containing a similar (but translated and scaled) shape. What I want to do is correlate the points from matrix 'A' with the image and use them as a binary mask for the object in image 'B'. Currently I don't have the slightest idea how to do it.
Thanks for any help.
Following up on what AnonSubmitter85 suggested with the pattern recognition method, something like a SIFT (Scale-Invariant Feature Transform) detector might be helpful in the case of a scaled and rotated object. Similarly, MATLAB has a set of functions that do SURF (Speeded Up Robust Features) detection:
http://www.mathworks.com/help/vision/examples/find-image-rotation-and-scale-using-automated-feature-matching.html
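If MATLAB is not a hard requirement, here is a rough sketch of the same idea in Python with OpenCV, using ORB features as a freely available stand-in for SIFT/SURF; the file names and parameter values are placeholders, not anything from the original question:

```python
import cv2
import numpy as np

# Placeholder inputs: a rendered image of the shape from matrix A, and the noisy image B.
template = cv2.imread('shape_template.png', cv2.IMREAD_GRAYSCALE)
scene = cv2.imread('noisy_image_B.png', cv2.IMREAD_GRAYSCALE)

# Detect and describe features in both images.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(template, None)
kp2, des2 = orb.detectAndCompute(scene, None)

# Match descriptors and keep the best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]
src = np.float32([kp1[m.queryIdx].pt for m in matches])
dst = np.float32([kp2[m.trainIdx].pt for m in matches])

# Robustly estimate a similarity transform (rotation + uniform scale + translation).
M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
scale = np.sqrt(M[0, 0]**2 + M[0, 1]**2)
angle = np.degrees(np.arctan2(M[1, 0], M[0, 0]))
print('scale:', scale, 'rotation (deg):', angle)
```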
Hopefully this can stimulate some new ideas.
I am working on fusing lidar and camera images in order to perform an object classification algorithm using a CNN.
I want to use the KITTI dataset, which provides synchronized lidar and RGB image data. A lidar is a 3D scanner, so its output is a 3D point cloud.
I want to use the depth information from the point cloud as a channel for the CNN, but I have never worked with point clouds, so I am asking for some help. Will projecting the point cloud onto the camera image plane (using the projection matrix provided by KITTI) give me the depth map that I want? Is the Python library pcl useful, or should I move to C++ libraries?
If you have any suggestions, thank you in advance.
I'm not sure what the projection matrix provided by KITTI includes, so the answer is: it depends. If this projection matrix only contains a transformation matrix, you cannot generate a depth map from it. The 2D image has distortion that comes from the camera, while the point cloud usually doesn't have distortion, so you cannot "precisely" map the point cloud to the RGB image without the intrinsic and extrinsic parameters.
PCL is not required to do this.
A depth map is essentially a mapping of depth values onto the RGB image. You can treat each point in the point cloud (each laser return of the lidar) as a pixel of the RGB image. Therefore, I think all you need to do is find which point in the point cloud corresponds to the first pixel (the top-left corner) of the RGB image, then read the depth values from the point cloud based on the RGB image's resolution.
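As an illustration only, a minimal NumPy sketch of this projection step might look like the one below. It assumes you have a 3x4 projection matrix P for the camera and that the lidar points have already been transformed into that camera's coordinate frame (otherwise apply the lidar-to-camera extrinsics first); the function and variable names are my own, not from KITTI:

```python
import numpy as np

def depth_map_from_points(points_cam, P, height, width):
    """points_cam: (N, 3) lidar points in the camera frame (metres).
    P: 3x4 camera projection matrix. Returns a (height, width) depth image."""
    depth = np.zeros((height, width), dtype=np.float32)

    # Keep only points in front of the camera.
    pts = points_cam[points_cam[:, 2] > 0]

    # Project to pixel coordinates: [u, v, 1]^T ~ P @ [x, y, z, 1]^T.
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    proj = (P @ homog.T).T
    u = (proj[:, 0] / proj[:, 2]).astype(int)
    v = (proj[:, 1] / proj[:, 2]).astype(int)
    z = pts[:, 2]

    # Keep projections inside the image; keep the nearest point per pixel.
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for ui, vi, zi in zip(u[valid], v[valid], z[valid]):
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:
            depth[vi, ui] = zi
    return depth
```

The resulting map is sparse (there are far fewer lidar returns than image pixels), which is where the interpolation mentioned in the other answer comes in.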
This has nothing to do with the camera; it is all about the point cloud data. Let's say you have 10 million points, and each point has x, y, z in metres. If the data is not in metres, convert it first. Then you need the position of the lidar. When you subtract the lidar's (the car's) position from all the points, one by one, you bring the lidar to the origin (0, 0, 0), and then you can simply plot the points onto a white image.

The rest is simple math, and there are many ways to do it. The first that comes to my mind: treat RGB as the digits of a base-256 number. Say a 1 cm change corresponds to a change of 1 in blue, a 256 cm change to a change of 1 in green, and a 256 x 256 = 65536 cm change to a change of 1 in red. We know the camera is at (0, 0, 0), so if the RGB of a point is (1, 0, 0), that means it is 1 x 65536 + 0 x 256 + 0 x 1 = 65536 cm away from the camera. This could be done in C++. You can also use interpolation and closest-point algorithms to fill in the blanks, if there are any.
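The answer suggests C++; purely to illustrate the base-256 depth encoding described above, here is the same arithmetic as a small NumPy sketch (the function names and array shapes are my own):

```python
import numpy as np

def encode_depth_rgb(depth_cm):
    """Encode per-pixel distances (in centimetres) into an RGB image,
    treating (R, G, B) as the base-256 digits of the distance."""
    d = np.clip(depth_cm.astype(np.uint32), 0, 256**3 - 1)
    r = (d // 65536).astype(np.uint8)        # 65536 cm per unit of red
    g = ((d // 256) % 256).astype(np.uint8)  # 256 cm per unit of green
    b = (d % 256).astype(np.uint8)           # 1 cm per unit of blue
    return np.dstack([r, g, b])

def decode_depth_rgb(rgb):
    r = rgb[..., 0].astype(np.uint32)
    g = rgb[..., 1].astype(np.uint32)
    b = rgb[..., 2].astype(np.uint32)
    return r * 65536 + g * 256 + b           # distance in centimetres

depth = np.array([[65536, 300], [1, 0]], dtype=np.float64)  # centimetres
rgb = encode_depth_rgb(depth)
assert np.array_equal(decode_depth_rgb(rgb), depth.astype(np.uint32))
```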
I am currently building an augmented reality application and am stuck on a problem that seems quite easy but is very hard for me... The problem is as follows:
My device's camera is calibrated and detects a 2D marker (such as a QR code). I know the focal length, the sensor's position, the distance between my camera and the center of the marker, the real size of the marker, and the coordinates of the 4 corners of the marker and of its center on the 2D image I got from the camera. See the following image:
On the image, we know the distances a, b, c, d and the coordinates of the red dots.
What I need to know is the position and orientation of the camera relative to the marker (as represented in the image, the origin is the center of the marker).
Is there an easy and fast way to do this? I tried a method I came up with myself (using Al-Kashi's formula, i.e. the law of cosines), but it ended up with too much error :(. Could someone point out a way to get me out of this?
You can find some example code for the EPnP algorithm on this webpage. The code consists of one header file and one source file, plus one file for the usage example, so it shouldn't be too hard to include in your code.
Note that this code is released for research/evaluation purposes only, as mentioned on this page.
EDIT:
I just realized that this code needs OpenCV to work. By the way, although this would add a pretty big dependency to your project, the current version of OpenCV has a built-in function called solvePnP, which does what you want.
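For illustration, a minimal sketch using OpenCV's Python bindings might look like the following; the marker size, camera matrix, and corner coordinates are placeholders you would replace with your calibrated values. solvePnP returns the marker's pose in the camera frame, which you can invert to get the camera's pose in the marker frame:

```python
import cv2
import numpy as np

# Marker corners in the marker's own coordinate system (origin at the centre),
# here a 10 cm square marker as a placeholder.
s = 0.10 / 2
object_points = np.array([[-s,  s, 0],
                          [ s,  s, 0],
                          [ s, -s, 0],
                          [-s, -s, 0]], dtype=np.float64)

# Pixel coordinates of the 4 detected corners, in the same order (placeholders).
image_points = np.array([[410., 230.],
                         [560., 245.],
                         [550., 395.],
                         [400., 380.]], dtype=np.float64)

# Intrinsics from your calibration (fx, fy, cx, cy) and distortion coefficients.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)

# Pose of the camera expressed in the marker's coordinate system.
camera_position = -R.T @ tvec
camera_orientation = R.T
print(camera_position.ravel())
```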
You can compute the homography between the image points and the corresponding world points. Then from the homography you can compute the rotation and translation mapping a point from the marker's coordinate system into the camera's coordinate system. The math is described in the paper on camera calibration by Zhang.
Here's an example in MATLAB using the Computer Vision System Toolbox, which does most of what you need. It uses the extrinsics function, which computes a 3D rotation and a translation from matching image and world points. The points need not come from a checkerboard.
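For reference, the homography decomposition described above (Zhang's method for a plane at z = 0) can be sketched in Python as follows, reusing the same placeholder corner coordinates and intrinsics as in the solvePnP sketch; this is an illustration of the math, not production code:

```python
import cv2
import numpy as np

# Placeholder data: marker corners on the z = 0 plane and their pixel positions.
s = 0.05
obj_xy = np.array([[-s, s], [s, s], [s, -s], [-s, -s]], dtype=np.float64)
img_pts = np.array([[410., 230.], [560., 245.], [550., 395.], [400., 380.]])
K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])

# Homography from the marker plane to the image.
H, _ = cv2.findHomography(obj_xy, img_pts)

# Zhang-style decomposition: H ~ K [r1 r2 t] for points with z = 0.
B = np.linalg.inv(K) @ H
lam = 1.0 / np.linalg.norm(B[:, 0])   # scale so that r1 has unit length
if B[2, 2] < 0:                       # flip sign so the marker sits in front of the camera
    lam = -lam
r1, r2, t = lam * B[:, 0], lam * B[:, 1], lam * B[:, 2]
R = np.column_stack([r1, r2, np.cross(r1, r2)])

# Noise breaks orthogonality, so snap R back to the nearest rotation matrix.
U, _, Vt = np.linalg.svd(R)
R = U @ Vt
print(R, t)
```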
So I have a matrix A of size 300x500 which contains a binary image of some object (background 0, object 1), and a noisy image B depicting the same object. I want to match the binary mask A to the image B. The object in the mask has exactly the same shape as the object in the noisy image. The problem is that the images have different sizes (both the image planes and the objects depicted on them), and moreover the object in the mask is located in the middle of the plane, whereas in image B it is translated. Does anyone know a simple solution for matching these images?
Provided you don't rotate or scale your object, the peak in the cross-correlation should give you the shift between the two objects.
From the Signal Processing Toolbox you can use xcorr2(A, B) to do this. The help even has this as one of its examples.
The peak position indicates the offset needed to get from one to the other. The fact that one input is noisy will introduce some uncertainty into your answer, but this is inevitable, as they do not match exactly.
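If you end up doing this outside MATLAB, scipy.signal.correlate is the usual stand-in for xcorr2. A small self-contained sketch with toy data shaped like the question's arrays (the exact numbers are made up for illustration):

```python
import numpy as np
from scipy.signal import correlate

rng = np.random.default_rng(0)

# Toy stand-ins: A is the binary mask, B is a larger noisy image
# containing the same shape translated by (+380, +500).
A = np.zeros((300, 500))
A[120:180, 200:320] = 1
B = 0.2 * rng.standard_normal((900, 1200))
B[500:560, 700:820] += 1

# Subtracting the means makes the peak less sensitive to a bright background;
# method='fft' keeps this fast for large images.
c = correlate(B - B.mean(), A - A.mean(), mode='full', method='fft')

peak = np.unravel_index(np.argmax(c), c.shape)
# Row/column offset of the mask's top-left corner when placed on B.
offset = (peak[0] - A.shape[0] + 1, peak[1] - A.shape[1] + 1)
print(offset)  # should come out close to (380, 500)
```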
I have an image with arbitrarily shaped regions (say, objects); let's assume the background pixels are labeled as zeros, whereas each object has a unique label (pixels of object 1 are labeled 1, pixels of object 2 are labeled 2, ...). Now, for every object, I need to find the best elliptical fit of its pixels. This requires finding the center of the object, the major and minor axes, and the rotation angle. How can I find these?
Thank you.
Principal Component Analysis (PCA) is one way to go. See Wikipedia here.
The centroid is easy enough to find if your shapes are convex - just a weighted average of intensities over the xy positions - and PCA will give you the major and minor axes, hence the orientation.
Once you have the centre and axes, you have the basis for a set of ellipses that cover your shape. Extending the axes - in proportion - and testing each pixel for in/out, you can find the ellipse that just covers your shape. Or if you prefer, you can project each pixel position onto the major and minor axes and find the rough limits in one pass and then test in/out on "corner" cases.
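As an illustration of the PCA route, a small Python/NumPy sketch might look like this (the function name and the toy label image are my own; the returned axis values are the standard deviations along the principal axes, which you would then grow as described above until the ellipse covers the region):

```python
import numpy as np

def ellipse_params(label_image, label):
    """Centroid, principal-axis scales, and orientation of one labeled region via PCA."""
    ys, xs = np.nonzero(label_image == label)
    pts = np.column_stack([xs, ys]).astype(float)

    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order

    # Standard deviations along the minor and major axes; grow these in
    # proportion until the ellipse just covers the shape.
    minor, major = np.sqrt(eigvals)
    angle = np.arctan2(eigvecs[1, 1], eigvecs[0, 1])  # orientation of the major axis

    return centroid, major, minor, angle

# Toy usage: one rectangular "object" labeled 1.
img = np.zeros((100, 100), dtype=int)
img[30:50, 20:80] = 1
print(ellipse_params(img, 1))
```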
It may help if you post an example image.
As you seem to be using MATLAB, you can simply use the regionprops command, given that you have the Image Processing Toolbox.
It can extract all the information you need (and many more properties of image regions), and it will do the PCA for you, if the PCA-based approach suits your needs.
The documentation is here; look for the 'Centroid', 'Orientation', 'MajorAxisLength' and 'MinorAxisLength' properties specifically.
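If you would rather work in Python, scikit-image's regionprops is the closest equivalent and exposes the same properties (this is a stand-in for the MATLAB function mentioned above, not part of the original answer):

```python
import numpy as np
from skimage.measure import regionprops

# Tiny example label image: 0 = background, 1 and 2 are two objects.
label_image = np.zeros((60, 60), dtype=int)
label_image[10:30, 10:20] = 1
label_image[35:50, 25:55] = 2

for region in regionprops(label_image):
    cy, cx = region.centroid
    print(region.label, (cx, cy),
          region.major_axis_length, region.minor_axis_length,
          region.orientation)  # orientation in radians, measured from the row axis
```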
Basically I was trying to achieve this: impose an arbitrary image onto a pre-defined uneven surface (see the examples below).
[example images: the flat source image --> the image warped onto the t-shirt]
I do not have a lot of experience with image processing or 3D algorithms, so here is the best method I can think of so far:
1. Predefine a set of grid coordinates (say, for a 10x10 grid of points, we have 100 coordinates starting with (0,0), (0,10), (0,20), ... etc.). There will be 9x9 = 81 grid cells.
2. Record the transformation of each individual coordinate on the t-shirt image, e.g. (0,0) becomes (51,31), (0,10) becomes (51,35), etc.
3. Triangulate the original image into 81x2 = 162 triangles (2 triangles per grid cell). Transform each triangle of the image based on the coordinate transformations obtained in Step 2 and draw it on the t-shirt image.
Problems/questions I have:
I don't know how to smooth out each triangle so that the image on the t-shirt does not look ragged.
Is there a better way to do it? I want to make sure I'm not reinventing the wheel here before I proceed with an implementation.
Thanks!
This is called digital image warping. There was a popular graphics text on it in the 1990s (probably from somebody's thesis). You can also find an article on it from Dr. Dobb's Journal.
Your process is essentially correct. If you work pixel by pixel, rather than trying to use triangles, you'll avoid some of the problems you're facing. Scan across the pixels in the target bitmap, and apply the local transformation based on the cell you're in to determine the coordinates of the corresponding pixel in the source bitmap. Copy that pixel over.
For a smoother result, you do your coordinate transformations in floating point and interpolate the pixel values from the source image using something like bilinear interpolation.
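In OpenCV this per-pixel inverse mapping with bilinear interpolation is exactly what cv2.remap does, once you have a map of source coordinates for every target pixel. Here is a self-contained sketch with a synthetic source image and a toy distortion map; in practice you would build the maps by interpolating your 10x10 grid correspondences over each cell:

```python
import cv2
import numpy as np

# Synthetic source "artwork" (a horizontal gradient) standing in for the flat image.
src_h, src_w = 800, 600
source = np.zeros((src_h, src_w, 3), dtype=np.uint8)
source[..., 1] = np.linspace(0, 255, src_w, dtype=np.uint8)[None, :]

# map_x/map_y give, for every target (t-shirt) pixel, the (x, y) coordinate to
# sample in the source image.
h, w = 800, 600
xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                     np.arange(h, dtype=np.float32))
map_x = xs + 8 * np.sin(ys / 40.0)   # toy "ripple" distortion for illustration
map_y = ys

# Inverse warp with bilinear interpolation: the per-pixel scheme described above.
warped = cv2.remap(source, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                   borderMode=cv2.BORDER_CONSTANT)
```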
This is not really a solution to the problem, just a workaround:
If you have a 3D model that represents the t-shirt, you can use DirectX/OpenGL and apply your image as a texture to the t-shirt. Then you can render the picture you want from any point of view.