Wrong Homography Transformation - homography

Recently, I studied homography transformation (which adds an additional dimension to 2D points: [x, y] -> [wx, wy, w]) and projective geometry.
When I tried to transform the bottom part of photo 1 to the top of a new image, the result was wrong.
I call it wrong because, although the bottom part of photo 1 was projected onto the top part of the new image as expected, the top part of photo 1 was projected onto the bottom part of the new image. The top part of photo 1 should land "outside" the image (negative y).
Then I found that the result is correct if I eliminate the transformed pixels whose corresponding w is positive. The result is shown in photo 3.
I don't know how to explain, mathematically or intuitively, why the sign of w leads to a different result. I guess negative w might sometimes work or be reasonable.
If anyone wants to think about my post, I can provide my source code (MATLAB) along with the photos.
Thank you!
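One way to see the effect of the sign of w is to push two points through a small homography by hand. The sketch below is in Python rather than MATLAB, and the matrix H is a made-up example (not the one from the photos) in which w = 1 - 0.01*y, so w changes sign at the row y = 100. Points on either side of that row land on opposite sides of the output after the division by w, which resembles the mirrored top part described above. Since H and -H describe the same mapping, which sign of w counts as valid depends on how H happens to be scaled; that may be why discarding positive w worked in the question.

```python
def apply_homography(H, x, y):
    """Map the 2D point (x, y) through the 3x3 matrix H.

    Returns (x', y', w), where (x', y') is the point after dividing by w.
    """
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w, w


# Hypothetical homography for illustration only: w = 1 - 0.01 * y.
H = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, -0.01, 1.0]]

p_pos = apply_homography(H, 50.0, 50.0)    # y < 100, so w is positive
p_neg = apply_homography(H, 50.0, 150.0)   # y > 100, so w is negative

# Keep only the points whose w has the sign treated as valid (positive here,
# which is an assumption tied to this particular scaling of H).
kept = [(px, py) for px, py, w in (p_pos, p_neg) if w > 0]
```

Here p_pos ends up at positive coordinates while p_neg is flipped to negative ones, so filtering on the sign of w removes the mirrored copy.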

Related

Field of view/ convexity map

Given a shape in a logical image, I am trying to extract the field of view from any point inside the shape in MATLAB:
I tried testing each line going through the point, but it is really slow. (I hope to do it for each point of the shape, or at least each point of its contour, which is quite a few points.)
I think a faster method would work iteratively by expanding a disk from the considered point, but I am not sure how to do it.
How can I find this field of view in an efficient way?
Any ideas or solutions would be appreciated, thanks.
Here is a possible approach (the principle behind the function I wrote, available on Matlab Central):
I created this test image and an arbitrary point of view:
testscene=zeros(500);
testscene(80:120,80:120)=1;
testscene(200:250,400:450)=1;
testscene(380:450,200:270)=1;
viewpoint=[250, 300];
imsize=size(testscene); % checks the size of the image
It looks like this (the circle marks the view point I chose):
The next line computes the longest distance to the edge of the image from the viewpoint:
maxdist=max([norm(viewpoint), norm(viewpoint-[1 imsize(2)]), norm(viewpoint-[imsize(1) 1]), norm(viewpoint-imsize)]);
angles=1:360; % use smaller increment to increase resolution
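Then generate a set of points uniformly distributed on a circle around the viewpoint:
endpoints=bsxfun(@plus, maxdist*[cosd(angles)' sind(angles)'], viewpoint);
intersec=zeros(numel(angles),2); % one [x y] intersection per angle
for k=1:numel(angles)
    % sample the image along the line from the viewpoint to the k-th endpoint
    [CX,CY,C] = improfile(testscene,[viewpoint(1), endpoints(k,1)],[viewpoint(2), endpoints(k,2)]);
    idx=find(C); % indices of the nonzero samples along that line
    intersec(k,:)=[CX(idx(1)), CY(idx(1))]; % keep the first one
end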
This draws a line from the viewpoint in each direction specified in the array angles and looks for the position of its intersection with an obstacle or the edge of the image.
This should help visualize the process:
Finally, let's use the built-in roipoly function to create a binary mask from a set of coordinates:
FieldofView = roipoly(testscene,intersec(:,1),intersec(:,2));
Here is what it looks like (obstacles in white, visible field in gray, viewpoint in red):
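For readers without the Image Processing Toolbox, the ray-casting idea can be sketched in plain Python. This is not the MATLAB function mentioned above: the name field_of_view, the (row, col) viewpoint convention, and the step parameter are assumptions made for the example, and the final polygon fill done by roipoly is not reproduced. The function only returns, for each direction, the first obstacle cell hit or the border cell where the ray leaves the image.

```python
import math


def field_of_view(grid, viewpoint, n_angles=360, step=0.5):
    """Cast n_angles rays from viewpoint (row, col) over a 0/1 grid.

    Returns one (row, col) per ray: the first obstacle cell hit, or the
    border cell where the ray leaves the grid.
    """
    rows, cols = len(grid), len(grid[0])
    r0, c0 = viewpoint
    hits = []
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        dr, dc = math.sin(theta), math.cos(theta)
        t = 0.0
        while True:
            t += step  # march along the ray in small steps
            r = int(round(r0 + t * dr))
            c = int(round(c0 + t * dc))
            if not (0 <= r < rows and 0 <= c < cols):
                # The ray left the image: clamp to the nearest border cell.
                hits.append((min(max(r, 0), rows - 1),
                             min(max(c, 0), cols - 1)))
                break
            if grid[r][c]:
                hits.append((r, c))  # the ray is blocked by an obstacle
                break
    return hits


# Tiny test scene: 11x11 grid with a single obstacle to the right of the viewer.
scene = [[0] * 11 for _ in range(11)]
scene[5][8] = 1
hits = field_of_view(scene, (5, 5))
```

The returned points play the role of intersec in the MATLAB code; joining them in order gives the outline of the visible region.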

3D-Anaglyph creation algorithm, using depth map image: where to find?

I'm looking for a generic algorithm to compute a red/cyan anaglyph from an original image and its black-and-white depth map (example: http://www.swell3d.com/2008/07/turn-2d-painting-into-3d-anagl.html)
Such algorithms are used, for example, in Photoshop, but I can't find a readable explanation that would let me reproduce one.
Thanks
After some research I found what I was looking for.
First, I read some Photoshop/Gimp tutorials that describe how to make anaglyphs from two inputs: an image and its grayscale depth map. The core of the process is the use of the "Displace Tool" with the depth map as a displacement map.
One of the several YouTube tutorials: http://www.youtube.com/watch?v=gfYMe_vYhu4
So, I studied the documentation of Gimp's Displace Tool at http://docs.gimp.org/en/plug-in-displace.html and looked directly at the source code of the tool (the method is very similar to the one proposed by Asgeir).
This lets us produce two stereo images from the input by looking at the depth map. The red and cyan colors of each image are calculated as described on this page: http://3dtv.at/Knowhow/AnaglyphComparison_en.aspx (the "Optimized" matrices are the best ones).
Then, summing the two images into one produces the final anaglyph. Thanks everybody.
There are two algorithms involved. The first uses the original image and the depth map to produce a left and a right image. The second combines these images into a red-cyan anaglyph.
There are a couple ways to accomplish the first part. One is to take the original image and texture map it onto a fine mesh that lies flat in the XY plane. Then you tweak the Z values of each vertex in the mesh according to the corresponding value in the depth map. You've basically created a textured bas relief. You then use a 3D rendering algorithm to render the image from two vantage points that are offset horizontally by a small amount (essentially from the vantage point of a person's left and right eyes as they would view the bas relief).
There is probably a way to directly shift the pixels left and right which is a good fast approximation to what I described above.
Once you have the left and right images, you pass one through a cyan filter and one through a red filter. If you have RGB sources, that's as simple as taking the red channel from one image and combining it with the green and blue channels from the other image.
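The channel combination described above is short enough to sketch directly. In this Python example, images are lists of rows of (R, G, B) tuples, and the choice to take red from the left image and green/blue from the right one is an assumption (the text only says "one image" and "the other"); make_anaglyph is a name invented for the example.

```python
def make_anaglyph(left, right):
    """Combine two same-sized RGB images into a red-cyan anaglyph.

    Each image is a list of rows, each row a list of (R, G, B) tuples.
    Red comes from `left`; green and blue come from `right`.
    """
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]


# Two 1x2 example images.
left_img = [[(200, 10, 20), (100, 110, 120)]]
right_img = [[(5, 50, 60), (7, 70, 80)]]
anaglyph = make_anaglyph(left_img, right_img)
```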
Anaglyphs work best with muted colors. If you have strong primaries, it won't look as good. You can use an algorithm to reduce the color saturation of the original image before you begin.
From the description in the link you provided I would assume that it is something like
for each pixel (x, y) in depthmap
    x_offset = (depthmap[x][y] / 255.0f) * MAX_PIXEL_OFFSET * DIRECTION
    output[x + x_offset][y] = color_buffer[x][y]
blend output with color_buffer
Where MAX_PIXEL_OFFSET is the maximum shift in pixels and DIRECTION is -1 for one color and 1 for the other. This is assuming that the depthbuffer is one byte per pixel, range [0..255] and that 0 in the depthbuffer represents maximum distance.
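A runnable version of this pseudocode might look like the following Python sketch. It makes two changes that are not in the pseudocode and should be read as assumptions: instead of a separate blend step, the output starts as a copy of the source so that holes keep their original pixel, and a per-row depth check lets nearer pixels (larger depth values) win when two pixels are shifted onto the same column. Images are indexed as image[y][x], and shift_by_depth is a name made up for the example.

```python
def shift_by_depth(image, depth, max_offset, direction):
    """Shift each pixel horizontally by an amount proportional to its depth.

    image and depth are lists of rows indexed [y][x]; depth values are in
    0..255 with 0 meaning farthest. direction is -1 or +1.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # holes fall back to the source pixel
    for y in range(height):
        nearest = [-1] * width  # depth of the pixel written to each column
        for x in range(width):
            d = depth[y][x]
            nx = x + int(round(d / 255.0 * max_offset)) * direction
            # Only overwrite if this pixel is nearer than the current owner.
            if 0 <= nx < width and d > nearest[nx]:
                out[y][nx] = image[y][x]
                nearest[nx] = d
    return out


row_image = [[10, 20, 30, 40]]
row_depth = [[0, 255, 0, 0]]  # only the second pixel is close to the viewer
shifted_right = shift_by_depth(row_image, row_depth, 1, +1)
shifted_left = shift_by_depth(row_image, row_depth, 1, -1)
```

Calling it once with direction -1 and once with +1 gives the two views that are then tinted and merged.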

How to find a correspondence or mapping between two similar images of the same object

I have two photos of a house; the camera was moved only about 1 meter (roughly 3 feet) between the first and second photo. So the two photos are very much the same, with some minor differences in perspective.
I want to generate a mapping, a correspondence between the first photo and the second photo. I want to know, for each pixel in the first photo, where it maps to in the second photo, and vice versa.
I guess that there is some way to detect similar structures between the photos, which would give me a rough estimate of where the pixels went.
As a second part of this question, how can I solve this problem if some features are hidden or revealed? For example, if there is a tree between the house and the camera, moving the camera will reveal some pixels and hide others.
You might want to look into the SIFT algorithm.
I want to note that SIFT and SURF are not going to solve this problem; they find an image in another image, that is, they recognize the location of image A inside image B.
However, when the camera has moved a slight distance, some objects have moved and overlap each other. So what is needed is to find which objects have moved relative to each other, and which ones are overlapping others.

Best approach for specific Object/Image Recognition task?

I'm searching for a certain object in my photograph:
Object: Outline of a rectangle with an X in the middle. It looks like a rectangular checkbox. That's all. So, no fill, just lines. The rectangle will always have the same ratio of length to width, but it could be any size or any rotation in the photograph.
I've looked at a whole bunch of image recognition approaches, but I'm trying to determine the best one for this specific task. Most importantly, the object is made of lines and is not a filled shape. Also, there is no perspective distortion, so the rectangular object will always have right angles in the photograph.
Any ideas? I'm hoping for something that I can implement fairly easily.
Thanks all.
You could try using a corner detector (e.g. Harris) to find the corners of the box, the ends and the intersection of the X. That simplifies the problem to finding points in the right configuration.
Edit (response to comment):
I'm assuming you can find the corner points in your image, the 4 corners of the rectangle, the 4 line endings of the X and the center of the X, plus a few other corners in the image due to noise or objects in the background. That simplifies the problem to finding a set of 9 points in the right configuration, out of a given set of points.
My first try would be to look at each corner point A. Then I'd iterate over the points B close to A. Now if I assume that (e.g.) A is the upper left corner of the rectangle and B is the lower right corner, I can easily calculate, where I would expect the other corner points to be in the image. I'd use some nearest-neighbor search (or a library like FLANN) to see if there are corners where I'd expect them. If I can find a set of points that matches these expected positions, I know where the symbol would be, if it is present in the image.
You will have to test whether that is good enough for your application. If you have too many false positives (sets of corners of other objects that accidentally form a rectangle + X), you could check if there are lines (i.e. high contrast in the right direction) where you would expect them. And you could check if there is low contrast where there are no lines in the pattern. This should be relatively straightforward once you know the points in the image that correspond to the corners/line endings in the object you're looking for.
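The "calculate where I would expect the other corner points" step can be sketched as follows. This Python example assumes A and B are opposite corners of the rectangle and that the width/height ratio is known; the function names and the brute-force neighbour check (standing in for a library such as FLANN) are made up for illustration. It checks only the two remaining corners and the centre of the X, and only one of the two mirror-image rectangles that share the diagonal AB.

```python
import math


def expected_corners(a, b, aspect):
    """Given opposite corners a and b and aspect = width / height,
    return the other two corners and the centre of the rectangle."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    alpha = math.atan2(1.0, aspect)  # angle between diagonal and width edge
    cos_a, sin_a = math.cos(alpha), math.sin(alpha)
    # Rotate the diagonal by -alpha and scale by cos(alpha): the width edge.
    ux = cos_a * (dx * cos_a + dy * sin_a)
    uy = cos_a * (-dx * sin_a + dy * cos_a)
    c = (a[0] + ux, a[1] + uy)
    d = (b[0] - ux, b[1] - uy)
    centre = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    return c, d, centre


def has_point_near(points, target, tol):
    """Brute-force check for a detected point within tol of target."""
    return any(math.hypot(p[0] - target[0], p[1] - target[1]) <= tol
               for p in points)


def matches_pattern(points, a, b, aspect, tol):
    """True if all expected positions have a detected corner nearby."""
    return all(has_point_near(points, q, tol)
               for q in expected_corners(a, b, aspect))


# Detected corner points: a slightly noisy 4x3 box plus one stray corner.
detected = [(0.0, 0.0), (4.0, 3.0), (4.1, 0.0), (0.0, 2.9),
            (2.0, 1.5), (9.0, 9.0)]
found = matches_pattern(detected, (0.0, 0.0), (4.0, 3.0), 4.0 / 3.0, 0.2)
not_found = matches_pattern(detected, (0.0, 0.0), (9.0, 9.0), 4.0 / 3.0, 0.2)
```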
I'd suggest the Generalized Hough Transform. It seems you have a fairly simple, fixed shape. The generalized Hough transform should be able to detect that shape at any rotation or scale in the image. You may need to threshold the original image, or pre-process it in some way, for this method to be useful though.
You can use local features to identify the object in the image (see the Wikipedia article on feature detection).
For example, you can calculate features on a reference image which contains only the object you're looking for and save the results, let's say, to a plain text file. After that you can search for the object just by comparing newly calculated features (on images with some complex scenes containing the object) with the reference ones.
Here's a good resource on local features:
Local Invariant Feature Detectors: A Survey

Liquify filter/iwarp

I'm trying to build something like the Liquify filter in Photoshop. I've been reading through image distortion code but I'm struggling with finding out what will create similar effects. The closest reference I could find was the iWarp filter in Gimp but the code for that isn't commented at all.
I've also looked at places like ImageMagick, but they don't have anything in this area.
Any pointers or a description of algorithms would be greatly appreciated.
Excuse me if I make this sound a little simplistic, I'm not sure how much you know about gfx programming or even what techniques you're using (I'd do it with HLSL myself).
The way I would approach this problem is to generate a texture which contains offsets of x/y coordinates in the r/g channels. Then the output colour of a pixel would be:
Texture inputImage
Texture distortionMap
colour(x,y) = inputImage(x + distortionMap(x, y).R, y + distortionMap(x, y).G)
(To tell the truth, this isn't quite right: using the colours as offsets directly means you can only represent positive vectors. It's simple enough to subtract 0.5 so that you can represent negative vectors.)
Now the only problem that remains is how to generate this distortion map, which is a different question altogether. (Any image would generate a distortion of some kind, obviously; working on a proper liquify effect is quite complex and I'll leave it to someone more qualified.)
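On the CPU, the lookup above could be sketched like this in Python (a shader would do the same per fragment). The distortion map stores (R, G) pairs in the range 0..1, 0.5 is subtracted so that offsets can be negative, and the scale parameter that converts them to pixels is an assumption of this example, as are the nearest-pixel sampling and the clamping at the borders.

```python
def apply_distortion(image, distortion_map, scale):
    """Sample image through a distortion map of (R, G) offsets in 0..1.

    image is indexed [y][x]; a map value of (0.5, 0.5) means no offset.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            r, g = distortion_map[y][x]
            # Centre the stored values on zero, then convert to pixels.
            sx = x + int(round((r - 0.5) * scale))
            sy = y + int(round((g - 0.5) * scale))
            # Clamp the sampling position to the image borders.
            sx = min(max(sx, 0), width - 1)
            sy = min(max(sy, 0), height - 1)
            row.append(image[sy][sx])
        out.append(row)
    return out


strip = [[1, 2, 3]]
identity = apply_distortion(strip, [[(0.5, 0.5)] * 3], 2)
pulled = apply_distortion(strip, [[(1.0, 0.5)] * 3], 2)  # sample 1 px right
```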
I think liquefy works by altering a grid.
Imagine each pixel is defined by its location on the grid.
Now when the user clicks on a location and moves the mouse, he's changing the grid location.
The new grid is then projected back into the user's viewable 2D space.
Check this tutorial about a way to implement the liquify filter in JavaScript. Basically, in the tutorial, the effect is achieved by transforming the pixel's Cartesian coordinates (x, y) to polar coordinates (r, α) and then applying Math.sqrt to r.
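The tutorial itself is not reproduced here, so the following Python sketch is only one plausible reading of "convert to polar coordinates and take the square root of r". It assumes the effect is limited to a circle of a given radius around a centre point and that r is normalised by that radius before the square root, so that the centre and the circle's edge stay fixed. Whether the resulting position is used as a source or a destination coordinate is left open; sqrt_warp and its parameters are invented for the example.

```python
import math


def sqrt_warp(x, y, cx, cy, radius):
    """Remap (x, y) inside a circle by taking the square root of its
    normalised distance from the centre (cx, cy)."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0.0 or r >= radius:
        return float(x), float(y)  # centre and outside stay unchanged
    alpha = math.atan2(dy, dx)                  # polar angle
    r_new = radius * math.sqrt(r / radius)      # square root on normalised r
    return cx + r_new * math.cos(alpha), cy + r_new * math.sin(alpha)


moved = sqrt_warp(125, 100, 100, 100, 100)
centre_point = sqrt_warp(100, 100, 100, 100, 100)
outside_point = sqrt_warp(300, 100, 100, 100, 100)
```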
