I would like to ask a question I already asked on the OpenCV board but did not get an answer to: http://answers.opencv.org/question/189206/questions-about-the-fundamental-matrix-and-homographies/.
After learning about the fundamental matrix, I have a question that I could not answer by googling. The fundamental matrix is a more general case of the homography, as it is independent of the scene's structure. So I was wondering whether it could be used for image stitching instead of a homography, but all the papers I found use only homographies. So I reread the material about the properties of the fundamental matrix, and now I am wondering:
Is it impossible to use the fundamental matrix for stitching because of its rank deficiency and the fact that it only relates points in image 1 to lines (epipolar lines) in image 2?
Another question I have regarding homographies: all the papers I read about image stitching use homographies for rotational panoramas. What if I want to create a panorama based only on translation between images? Can I still use a homography? The answers a Google search provides vary quite a lot.
Kind regards and thanks for your help!
Conundraah
About using the fundamental matrix for stitching:
It actually depends on how you want to stitch the images together.
The problem is that even if you estimate the fundamental matrix, you still need a homography to do the actual image transformation when you stitch: the fundamental matrix only maps a point in one image to an epipolar line in the other, not to a point, so it gives you no pixel-to-pixel mapping to apply. So what would be the point of using the fundamental matrix, unless you figure out how to handle the different depths within the same image?
In the case of panorama images, the assumption is that the scene is far enough away to be treated as planar, so the translation between shots is comparatively negligible and can be ignored. If that is not the case, the translation has to be taken into account.
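To make that concrete, here is a minimal sketch of the homography-based stitching pipeline using OpenCV's Python bindings (the file names, the choice of ORB as the detector, and the output canvas width are illustrative assumptions):

```python
import cv2
import numpy as np

# Load the two images to stitch (placeholder file names).
img1 = cv2.imread("left.jpg")
img2 = cv2.imread("right.jpg")

# Detect keypoints and compute binary descriptors with ORB.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors; Hamming distance suits ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Estimate a point-to-point homography with RANSAC.
src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Warp the second image into the first image's frame and overlay the first.
h, w = img1.shape[:2]
pano = cv2.warpPerspective(img2, H, (w * 2, h))
pano[0:h, 0:w] = img1
cv2.imwrite("panorama.jpg", pano)
```

Note that there is no analogous warp you could build from the fundamental matrix alone: it only constrains where a point may lie (its epipolar line), so it never yields the pixel-to-pixel mapping that warpPerspective needs.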
I'm new to the computer vision area, and I hope you can help me with some fundamental questions regarding CNN architectures.
I know some of the most well-known ones are:
VGGNet
ResNet
DenseNet
Inception
Xception
They usually expect input images of around 224x224x3, though I have also seen 32x32x3.
Regarding my specific problem: my goal is to train on biomedical images of size 80x80 for 4-class classification, so at the end I'll have a dense layer with 4 units. Also, my dataset is quite small (1000 images), so I wanted to use transfer learning.
Could you please help me with the following questions? It seems to me that there is no single correct answer to them, but I need to understand the correct way of thinking about them. I would also appreciate any pointers.
Should I scale my images up? What about the opposite, shrinking them to fit 32x32 inputs?
Should I change the input of the CNNs to 80x80 instead? Which parameters would I mainly need to change? Is there a specific ratio for the kernel sizes and other parameters?
I also have another problem: the input requires 3 channels (RGB), but I'm working with grayscale images. Will that change the results a lot?
Instead of scaling, should I just fill the surroundings (between the 80x80 image and the 224x224 input) with background? Should the images be centered in that case?
Do you have any recommendations regarding what architecture to choose?
I've seen some adaptations of these architectures to 3D/volume inputs instead of 2D/images. I have a problem similar to the one I described here, but with 3D inputs. Is there any common reasoning when choosing a 3D CNN architecture instead of a 2D one?
Thanks in advance!
I am assuming you have basic know-how in using CNNs for classification.
Answering questions 1-3
You scale your images for several purposes. The smaller the image, the faster the training and inference, but you lose important information in the process of shrinking it. There is no one right answer; it all depends on your application. Is real-time processing important? If the answer is no, stick to the original size.
You will also need to resize your images to fit the input size of the predefined models if you plan to retrain them. However, since your images are grayscale, you will need to either find models trained on grayscale data or create a 3-channel image by copying the same value into the R, G and B channels. This is not efficient, but it lets you reuse the high-quality models trained by others.
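For example, a minimal sketch of that channel-copy trick with OpenCV and NumPy (the function name and the 224x224 target are illustrative assumptions):

```python
import cv2
import numpy as np

def prepare_for_pretrained(gray, target_size=(224, 224)):
    """Resize a grayscale image and replicate it into 3 identical channels."""
    resized = cv2.resize(gray, target_size, interpolation=cv2.INTER_LINEAR)
    # Stack the single channel three times to mimic an RGB input.
    return np.stack([resized, resized, resized], axis=-1)
```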
The best option I see for you is to train everything from scratch. 1000 images can seem like a small amount of data, but since your domain is specific and you only need 4 classes, training from scratch doesn't seem that bad.
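If you do train from scratch, a small network is plenty for 80x80 inputs and 4 classes. A minimal Keras sketch (the layer sizes are illustrative assumptions, not tuned values):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(80, 80, 1)),        # 80x80 grayscale input
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),                    # regularization for ~1000 images
    tf.keras.layers.Dense(4, activation="softmax"),  # 4-class output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```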
Question 4
When the size is different, always scale. Filling the surroundings with background will cause the model to learn the empty space, and that is not what we want.
Also make sure the input size and format during inference are the same as the input size and format used during training.
Question 5
If processing time is not a problem, ResNet. If processing time is important, then MobileNet.
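For example, Keras ships pretrained versions of both. A minimal sketch of loading MobileNet (here MobileNetV2) with a fresh 4-class head; the input size and the decision to freeze the backbone are assumptions:

```python
import tensorflow as tf

# Pretrained backbone without its ImageNet classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze for transfer learning

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),  # new 4-class head
])
```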
Question 6
It depends on your input data. If you have 3D data, then you can use a 3D architecture. More input data usually helps classification, but 2D will be enough to solve certain problems: if you can classify by looking at the 2D images alone, then 2D images will most probably be enough to complete the task.
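The 2D building blocks carry over almost mechanically to 3D: convolutions and pooling simply gain a depth dimension. A minimal Keras sketch (the volume shape is a made-up assumption):

```python
import tensorflow as tf

model3d = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 80, 80, 1)),    # depth x height x width x channels
    tf.keras.layers.Conv3D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling3D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),
])
```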
I hope this clears up some of your problems and points you toward a proper solution.
I have a small render engine written for fun. I would like to have some unit tests that automatically render an image and then compare it to a stored reference image to check for differences. This should give some sort of metric to gauge whether the image is too far off or whether we can attribute the difference to just different timings in animations. If it can also produce the locations of the differences in the image, that would be great, but it is not necessary. We can also assume that the two images are exactly the same size.
What are the classic papers/techniques for that sort of thing?
(The language is Go; probably nothing exists for it yet, and I'd like to implement it myself to understand what's going on. The renderer is github.com/luxengine.)
Thank you
One idea could be to see your problem as a case of image registration.
The following figure (taken from http://it.mathworks.com/help/images/point-mapping.html) gives a flow chart for a method of solving the image registration problem.
Using the terms from that figure, the basic idea is:
find some interest points in the Fixed image;
find the corresponding points in the Moving image;
estimate the transformation between the two images using the point correspondences. One of the simplest transformations is a translation, represented by a 2D vector; the magnitude of this vector is a measure of the difference between the two images, and in your case it can be related to the shift you wrote about in your comment. A richer transformation is a homography, described by a 3x3 matrix; its distance from the identity matrix is again a measure of the difference between the two images;
apply the transformation back: for example, in the case of a translation you apply it to the Moving image, and then the warped image can be compared (here I am simplifying a little) pixel by pixel to the Fixed image.
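Here is a minimal sketch of that pipeline with OpenCV's Python bindings (your engine is in Go, so treat this as executable pseudocode for the steps; ORB as the interest-point detector is my assumption, not a requirement):

```python
import cv2
import numpy as np

def registration_distance(fixed, moving):
    """Return (distance of the homography from identity,
    mean per-pixel difference after warping the Moving image back)."""
    orb = cv2.ORB_create(1000)
    kp_f, des_f = orb.detectAndCompute(fixed, None)
    kp_m, des_m = orb.detectAndCompute(moving, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_m, des_f)

    src = np.float32([kp_m[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_f[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

    # Distance of the transformation from the identity matrix.
    h_dist = float(np.linalg.norm(H / H[2, 2] - np.eye(3)))

    # Warp the Moving image back and compare pixel by pixel.
    h, w = fixed.shape[:2]
    warped = cv2.warpPerspective(moving, H, (w, h))
    pixel_dist = float(np.mean(cv2.absdiff(fixed, warped)))
    return h_dist, pixel_dist
```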
Some more ideas are here: Image comparison - fast algorithm
I'm working on a small program for optical mark recognition.
The processing of the scanned form consists of two steps:
1) Find the form in the scanned image, deskew it and crop the borders.
2) With this "normalized" form, I can simply search for the marks using coordinates from the original document, and so on.
For the first step, I'm currently using the homography functions from OpenCV and a perspective transform to map the points. I also tried the SurfDetector.
However, both algorithms are quite slow and do not really meet the speed requirements for scanning forms from a document scanner.
Can anyone point me to an alternative algorithm/solution for this specific problem?
Thanks in advance!
Try the ORB or FAST detector: they should be faster than SURF (documentation here).
If those don't meet your speed requirements, you should probably use a different approach. Do you need scale and rotation invariance? If not, you could try cross-correlation.
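For example (a minimal sketch with OpenCV's Python bindings; the file names are placeholders), ORB is nearly a drop-in replacement for SURF, and cross-correlation is exposed as template matching:

```python
import cv2

scan = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)      # scanned page
marker = cv2.imread("marker.png", cv2.IMREAD_GRAYSCALE)  # patch cropped from the reference form

# ORB: a fast, patent-free alternative to SURF for keypoint-based alignment.
orb = cv2.ORB_create(1000)
keypoints, descriptors = orb.detectAndCompute(scan, None)

# Without scale/rotation invariance, normalized cross-correlation can
# locate a known anchor patch directly.
result = cv2.matchTemplate(scan, marker, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)  # max_loc = top-left of best match
```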
The Viola-Jones cascade classifier is pretty quick. It is used in OpenCV for face detection, but you can train it for a different purpose. Depending on the appearance of what you call your "form", you could also use simpler algorithms such as cross-correlation, as Muffo said.
Please suggest any template matching algorithms that are independent of size and rotation.
(Any source code as examples, if possible, please.)
EDIT 1:
Actually, I understand how the algorithm works: we can resize the template and rotate it. It is computationally expensive, but we can use image pyramids. The real problem for me now is when the picture is taken at an angle to the object, so that only a perspective transform can correct the image. I mean that even if we rotate or scale the image, we will not get a good match if the object in the image has been perspectively transformed. Of course, it is possible to generate many templates at different perspectives, but I think that is a very bad idea.
EDIT 2:
One more problem arises when using template matching based on shape matching: what if the image doesn't have many sharp edges? For example, a plate or a dish?
EDIT 3:
I've also heard about camera calibration for object detection. What algorithm is used for that purpose? I don't understand how it can be used for template matching.
I don't think there is an efficient template matching algorithm that is affine-invariant (rotation + scale + translation).
You can make template matching somewhat robust to scale and rotation by using a distance transform (see Chamfer-matching-style methods). You should probably also look at SIFT and MSER to get a sense of how the research area has been shaped over the past decade, but note that these are not template matching algorithms.
Check out this recent 2013 paper on efficient affine template matching: "Fast-Match". http://www.eng.tau.ac.il/~simonk/FastMatch/
MATLAB code is available on that website. The basic idea is to search the affine space exhaustively, but to do it in the sparsest way possible based on how smooth the image is. The method has a formal approximation guarantee, although it won't always find the absolute best answer.
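For comparison, the naive alternative to Fast-Match is an exhaustive search over a coarse scale/rotation grid, sketched here with OpenCV (file names and grid steps are arbitrary assumptions; this is the slow baseline, not Fast-Match):

```python
import cv2
import numpy as np

image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
th, tw = template.shape

best = (-1.0, None)  # (score, (x, y, scale, angle))
for scale in np.linspace(0.5, 1.5, 11):
    for angle in range(0, 360, 15):
        # Rotate and scale the template about its center.
        M = cv2.getRotationMatrix2D((tw / 2, th / 2), angle, scale)
        warped = cv2.warpAffine(template, M, (tw, th))
        result = cv2.matchTemplate(image, warped, cv2.TM_CCOEFF_NORMED)
        _, score, _, loc = cv2.minMaxLoc(result)
        if score > best[0]:
            best = (score, (loc[0], loc[1], scale, angle))

score, (x, y, scale, angle) = best
```

Image pyramids, as mentioned in the question, are the standard way to cut the cost of the inner matchTemplate calls.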
Recently I saw a demonstration of an image distortion algorithm that preserves certain objects in the image, for example changing the width/height ratio while keeping people's faces looking realistic. The problem is that I can't find the reference to that demonstration, nor do I know the name of such transformations.
Can anybody point me to references for such algorithms?
Perhaps you are referring to Liquid Rescale?
Seam carving is the current favourite. Liquid Rescale is an implementation of it.
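The core of seam carving is a dynamic program over an energy map. A minimal NumPy sketch that removes one vertical seam, using gradient magnitude as the energy:

```python
import numpy as np

def remove_vertical_seam(gray):
    """Remove the lowest-energy vertical seam from a 2D grayscale array."""
    h, w = gray.shape
    # Energy: sum of absolute vertical and horizontal gradients.
    gy, gx = np.gradient(gray.astype(np.float64))
    energy = np.abs(gx) + np.abs(gy)

    # Cumulative minimum energy from the top row down.
    cost = energy.copy()
    for y in range(1, h):
        left = np.roll(cost[y - 1], 1)
        right = np.roll(cost[y - 1], -1)
        left[0] = np.inf    # guard the wrapped-around edges
        right[-1] = np.inf
        cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)

    # Backtrack the cheapest seam from the bottom row up, then delete it.
    seam = np.empty(h, dtype=np.int64)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return gray[keep].reshape(h, w - 1)
```

To preserve faces, implementations let you paint a protection mask and add a large constant to the energy inside it, so seams route around the protected region.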