Let I(V) -> R^3 be an image defined on a set of pixels V.
This is common line which I come across in most of the research papers related to image processing. I am confused about the R^3 factor here. Does it represent the RGB model or something else?
Context : www.eecs.berkeley.edu/~carreira/papers/cvpr2010_2.pdf
Yes, when the pixel values are tuples, the image must be multispectral. And in the particular case of three components, red/green/blue channels are the most likely.
Related
Folks,
I have read a number of articles on Discrete Wavelet Transform (DWT) and looked at some sample code as well. However, I am not clear on what exactly does DWT achieve.
Here is what I understand. For a two dimensional image in YUV format, I can pass in the Y plane (brightness) to DWT function as a parameter. The function returns me a matrix of the original width and height containing coefficient values.
What are these coefficient values telling me? Is it how fast or slow the brightness of a pixel has changed compared to its neighbors?
Further, the returned matrix is rearranged in four quarters. As the coefficients have been rearranged, I no longer know which coefficient belongs to which pixel. This is confusing. If I cannot associate the coefficient to its corresponding pixel location, how can I really use the coefficients?
A little bit of background. I am looking at hiding some information in an image as an invisible watermark. From what I understand, DWT can help me identify the best region to hide the information. However, I have not been able to put the whole picture together.
Ok. I figured out how DWT works. I was under the assumption that the coefficients generated have a relationship with the original image. However, the transform converts the input luma into a completely different set. It is possible to run the reverse transform on the new values to once again obtain the original values.
Regards,
Peter
I have a query on calculation of best matching point of one image to another image through intensity based registration. I'd like to have some comments on my algorithm:
Compute the warp matrix at this iteration
For every point of the image A,
2a. We warp the particular image A pixel coordinates with the warp matrix to image B
2b. Perform interpolation to get the corresponding intensity form image B if warped point coordinate is in image B.
2c. Calculate the similarity measure value between warped pixel A intensity and warped image B intensity
Cycle through every pixel in image A
Cycle through every possible rotation and translation
Would this be okay? Is there any relevant opencv code we can reference?
Comments on algorithm
Your algorithm appears good although you will have to be careful about:
Edge effects: You need to make sure that the algorithm does not favour matches where most of image A does not overlap image B. e.g. you may wish to compute the average similarity measure and constrain the transformation to make sure that at least 50% of pixels overlap.
Computational complexity. There may be a lot of possible translations and rotations to consider and this algorithm may be too slow in practice.
Type of warp. Depending on your application you may also need to consider perspective/lighting changes as well as translation and rotation.
Acceleration
A similar algorithm is commonly used in video encoders, although most will ignore rotations/perspective changes and just search for translations.
One approach that is quite commonly used is to do a gradient search for the best match. In other words, try tweaking the translation/rotation in a few different ways (e.g. left/right/up/down by 16 pixels) and pick the best match as your new starting point. Then repeat this process several times.
Once you are unable to improve the match, reduce the size of your tweaks and try again.
Alternative algorithms
Depending on your application you may want to consider some alternative methods:
Stereo matching. If your 2 images come from stereo camera then you only really need to search in one direction (and OpenCV provides useful methods to do this)
Known patterns. If you are able to place a known pattern (e.g. a chessboard) in both your images then it becomes a lot easier to register them (and OpenCV provides methods to find and register certain types of pattern)
Feature point matching. A common approach to image registration is to search for distinctive points (e.g. types of corner or more general places of interest) and then try to find matching distinctive points in the two images. For example, OpenCV contains functions to detect SURF features. Google has published a great paper on using this kind of approach in order to remove rolling shutter noise that I recommend reading.
I'm looking for a generic algorithm to calculate a red/cian anaglyph starting from the original image and his b/w depth map (example: http://www.swell3d.com/2008/07/turn-2d-painting-into-3d-anagl.html)
That algorythm are used, for example, in Photoshop but I can't find a readable explanation to reproduce it.
Thanks
After some researches I found what I was looking for.
First, I've readed some Photoshop/Gimp tutorials that describes how to make anaglyphs from two inputs: an image and its grayscale depth map. The core of the process is the use of "Displace Tool" and the depth map as a displacement map.
One of the several youtube tutorials: http://www.youtube.com/watch?v=gfYMe_vYhu4
So, I took some documentation about Gimp's Displace Tool by looking at this http://docs.gimp.org/en/plug-in-displace.html and directly at the source code of the tool (the method is very similar to the one proposed by Asgeir).
This lets us to produce two stereo images from the input, by looking at the depth map. The red and cyan colors of every image are calculated by reading this page http://3dtv.at/Knowhow/AnaglyphComparison_en.aspx ("Optimized" matrices are the best ones).
Then, the sum of the two images in one will produce the final anaglyph. Thanks everybody.
There are two algorithms involved. The first uses the original image and the depth map to produce a left and a right image. The second combines these images into a red-cyan anaglyph.
There are a couple ways to accomplish the first part. One is to take the original image and texture map it onto a fine mesh that lies flat in the XY plane. Then you tweak the Z values of each vertex in the mesh according to the corresponding value in the depth map. You've basically created a textured bas relief. You then use a 3D rendering algorithm to render the image from two vantage points that are offset horizontally by a small amount (essentially from the vantage point of a person's left and right eyes as they would view the bas relief).
There is probably a way to directly shift the pixels left and right which is a good fast approximation to what I described above.
Once you have the left and right images, you pass one through a cyan filter and one through a red filter. If you have RGB sources, that's as simple as taking the red channel from one image and combing it with the green and blue channels from the other image.
Anaglyphs work best with muted colors. If you have strong primaries, it won't look as good. You can use an algorithm to reduce the color saturation of the original image before you begin.
From the description in the link you provided I would assume that it is something like
for each pixel in depthmap
x_offset = (depthmap[x][y] / 255.0f) * MAX_PIXEL_OFFSET * DIRECTION
output[x + x_offset][y] = color_buffer[x][y]
blend output with color_buffer
Where MAX_PIXEL_OFFSET is the maximum shift in pixels and DIRECTION is -1 for one color and 1 for the other. This is assuming that the depthbuffer is one byte per pixel, range [0..255] and that 0 in the depthbuffer represents maximum distance.
I'd like to implement a Filter that allows resampling of an image by moving a number of control points that mark edges and tangent directions. The goal is to be able to freely transform an image as seen in Photoshop when you use "Free Transform" and chose Warpmode "Custom". The image is fitted into a some kind of Spline-Patch (if that is a valid name) that can be manipulated.
I understand how simple splines (paths) work but how do you connect them to form a patch?
And how can you sample such a patch to render the morphed image? For each pixel in the target I'd need to know what pixel in the source image corresponds. I don't even know where to start searching...
Any helpful info (keywords, links, papers, reference implementations) are greatly appreciated!
This document will get you a good insight into warping: http://www.gson.org/thesis/warping-thesis.pdf
However, this will include filtering out high frequencies, which will make the implementation a lot more complicated but will give a better result.
An easy way to accomplish what you want to do would be to loop through every pixel in your final image, plug the coordinates into your splines and retrieve the pixel in your original image. This pixel might have coordinates 0.4/1.2 so you could bilinearly interpolate between 0/1, 1/1, 0/2 and 1/2.
As for splines: there are many resources and solutions online for the 1D case. As for 2D it gets a bit trickier to find helpful resources.
A simple example for the 1D case: http://www-users.cselabs.umn.edu/classes/Spring-2009/csci2031/quad_spline.pdf
Here's a great guide for the 2D case: http://en.wikipedia.org/wiki/Bicubic_interpolation
Based upon this you could derive an own scheme for splines for the 2D case. Define a bivariate (with x and y) polynomial and set your constraints to solve for the coefficients of the polynomial.
Just keep in mind that the borders of the spline patches have to be consistent (both in value and derivative) to avoid ugly jumps.
Good luck!
So I want to detect lines on grayscale images. I have a lot of data 9x9 matrices of pixel ints 1 to 256 and 1*4 matrices of ponnts coords X ,Y, X,Y We have 1 line per 9x9 image or non lines. So what structure should have my NN?
Assuming that you're using the most common variety of neural networks, multillayer perceptrons, you'll have exactly as many input nodes as there are features.
The inputs may include transformed variables, in addition to the raw variables. The number of hidden nodes is selected by you, but you should have enough to permit the neural network to adequately make the mapping.
The number of output nodes will be determined by the number of classes and the representation you choose. Assuming two classes ("line", "not line" seems likely), you may use 1 output node, which indicates the estimated probability of one class (the probability of the remaining class being 1 minus the probability of the first class).
Detecting simple lines on a grayscale image is a well known problem. A Hough transform would be suffice for the job. See http://opencv.willowgarage.com/documentation/cpp/imgproc_feature_detection.html?highlight=hough%20line#cv-houghlines for a function that implement finding lines using Hough Transform.
Can you try the above function and see if it works?
If it doesn't, please update your question with a sample image.