I have some doubts about the findHomography algorithm, so I wrote a program to test it. In this program I rotate an image and look for descriptors in both the original and the rotated image. After matching, I use findHomography to retrieve the transformation and compute the squared error for the RANSAC, LMEDS and RHO methods. I also wrote my own Levenberg-Marquardt implementation (using Numerical Recipes). I can add some noise to the point locations. The NR algorithm is best without noise. The problem appears when the noise increases: NR is still always best, while the other methods (RANSAC, LMEDS and RHO) become completely wrong. I fit only six parameters using NR; I think this is like in findHomography (see original post).
Anyone can check my code here. If you want to compare against NR, you can download the full code on GitHub.
Is my code correct? Why are the OpenCV results always worse than mine?
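To make the setup concrete, here is a simplified sketch of the kind of test I mean (a reconstruction for illustration, not my actual code; ORB and the file name are just placeholders to keep the example self-contained):

```python
# Sketch: rotate an image, match descriptors, estimate the homography with
# each robust method, and compare reprojection errors against the known
# ground-truth rotation. ORB and "test.png" are illustrative choices.
import cv2
import numpy as np

img = cv2.imread("test.png", cv2.IMREAD_GRAYSCALE)           # placeholder file name
h, w = img.shape
R = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 1.0)          # known transform (30 degrees)
rot = cv2.warpAffine(img, R, (w, h))

orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img, None)
kp2, des2 = orb.detectAndCompute(rot, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
# optional: add noise to the point locations, as in the experiment above
# dst += np.random.normal(0, 1.0, dst.shape).astype(np.float32)

truth = np.vstack([R, [0, 0, 1]])                             # ground-truth homography
for name, method in [("RANSAC", cv2.RANSAC), ("LMEDS", cv2.LMEDS), ("RHO", cv2.RHO)]:
    H, _ = cv2.findHomography(src, dst, method, 3.0)
    proj_est = cv2.perspectiveTransform(src, H)
    proj_true = cv2.perspectiveTransform(src, truth)
    err = np.mean(np.sum((proj_est - proj_true) ** 2, axis=2))
    print(name, "mean squared error:", err)
```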
PS: The original post is on answers.opencv.org, with three links and this title:
FindHomography algorithm : some doubt
Related
I'm trying to find a way to reliably determine the location of a puzzle piece in an image. The puzzle piece varies both in shape and in how easy it is to find. Which algorithm(s) in the OpenCV module would help me with the task at hand? Or is what I'm trying to do beyond the scope of the module?
Example images below:
Update
The original title was "Detecting obscure shapes with Opencv Python". However, I am interested in the image-processing concepts that would solve such a problem: how do you find a pasted image inside a bigger image?
Assume the following:
The jigsaw shapes are always of the same (rectangular) boundary size (ie: a template-based searching method could work).
The jigsaw shape is not rotated to any angle (ie: there will be straight(-ish) horizontal and vertical lines to find).
The jigsaw shape is always "pasted" into some other "original" image (ie: a paste-detection method could work).
The solution can be OpenCV (as requested by the asker), but the core concepts should be applicable when using any appropriate software (ie: can loop through image pixels to process their values, in order to achieve the described solution).
I myself use JavaScript, but of course I understand that openCV.calcHist() becomes a histogram function in JS code. I have no problem translating a good explanation into code, and I will treat OpenCV code as pseudo-code towards a working idea.
In my opinion the best approach for a canonical answer was suggested in the comments by Christoph, which is training a CNN:
Implement a generator for shapes of puzzle pieces.
Get a large set of natural images from the net.
Generate tons of sample images with puzzle pieces (a rough sketch of this step follows the list).
Train your model to detect those puzzle pieces.
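As a rough sketch of how the sample generation (steps 1 and 3) could look; this is purely illustrative: the piece shape is reduced to a square with a single circular tab, the images are assumed to be larger than the piece, and the file names are placeholders.

```python
# Cut a (simplified) puzzle-piece-shaped region out of one natural image and
# paste it into another, recording the bounding box as the training label.
import cv2
import numpy as np

def make_piece_mask(size=100, tab_radius=18):
    """Binary mask: a square body with one circular tab on the right edge."""
    pad = tab_radius + 2
    mask = np.zeros((size, size + pad), np.uint8)
    cv2.rectangle(mask, (0, 0), (size - 1, size - 1), 255, -1)
    cv2.circle(mask, (size - 1, size // 2), tab_radius, 255, -1)
    return mask

def make_sample(src_img, dst_img, rng):
    """Paste a piece cut from src_img into dst_img; return image and bbox."""
    mask = make_piece_mask()
    mh, mw = mask.shape
    sy = rng.integers(0, src_img.shape[0] - mh)
    sx = rng.integers(0, src_img.shape[1] - mw)
    piece = src_img[sy:sy + mh, sx:sx + mw]

    out = dst_img.copy()
    dy = rng.integers(0, out.shape[0] - mh)
    dx = rng.integers(0, out.shape[1] - mw)
    roi = out[dy:dy + mh, dx:dx + mw]            # view into `out`
    np.copyto(roi, piece, where=mask[..., None].astype(bool))
    return out, (dx, dy, mw, mh)                 # image + bounding-box label

rng = np.random.default_rng(0)
src = cv2.imread("natural_1.jpg")                # placeholder file names
dst = cv2.imread("natural_2.jpg")
sample, bbox = make_sample(src, dst, rng)
```

A real generator would of course vary tabs and blanks on all four sides and randomize the piece size and texture, but the pipeline stays the same: paste, record the box, train a detector on the pairs.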
Histogram of Largest Error
This is a rough concept of a possible algorithm.
The idea comes from an unfounded premise that seems plausible enough.
The premise is that adding the puzzle piece drastically changes the histogram of the image.
Let's assume that the puzzle piece is bounded by a 100px by 100px square.
We are going to use this square as a mask to mask out pixels that are used to calculate the histogram.
The algorithm is to find the placement of the square mask on the image such that the error between the histogram of the masked image and the original image is maximized.
There are many norms to experiment with to measure the error; you could start with the sum of the squared error components.
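A rough sketch of that search, treating the OpenCV calls as pseudo-code (the window size matches the 100px assumption; the stride, bin count and file name are arbitrary illustrative choices):

```python
# Slide a 100x100 mask over the image, compute the histogram with that square
# masked out, and keep the placement that maximizes the squared error against
# the full-image histogram.
import cv2
import numpy as np

def find_max_histogram_error(gray, win=100, stride=10, bins=64):
    full_hist = cv2.calcHist([gray], [0], None, [bins], [0, 256]).ravel()
    best_err, best_xy = -1.0, (0, 0)
    h, w = gray.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            mask = np.full((h, w), 255, np.uint8)
            mask[y:y + win, x:x + win] = 0                   # mask out the candidate square
            masked_hist = cv2.calcHist([gray], [0], mask, [bins], [0, 256]).ravel()
            err = np.sum((masked_hist - full_hist) ** 2)     # sum of squared error components
            if err > best_err:
                best_err, best_xy = err, (x, y)
    return best_xy          # top-left corner of the most "histogram-disturbing" square

gray = cv2.imread("puzzle.jpg", cv2.IMREAD_GRAYSCALE)        # placeholder file name
print(find_max_histogram_error(gray))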
I'll throw in my own attempt. It fails on the first image and only works on the other two images. I am open to other pixel-processing-based techniques where possible.
I do not use OpenCV so the process is explained with words (and pictures). It is up to the reader to implement the solution in their own chosen programming language/tool.
Background:
I wondered if there was something inherent in pasted images (something maybe revealed by pixel processing or even by frequency domain analysis, eg: could a Fourier signal analysis help here?).
After some research I came across Error Level Analysis (or ELA). This page has a basic introduction for beginners.
Process: In 7 easy steps, this detects the location of a pasted puzzle piece.
(1) Take a provided cat picture and re-save 3 times as JPEG in this manner:
Save copy #1 as JPEG of quality setting 2.
Reload (to force a decode of) copy #1 then re-save copy #2 as JPEG of quality setting 5.
Reload (to force a decode of) copy #2 then re-save copy #3 as JPEG of quality setting 2.
(2) Do a difference blend-mode with the original cat picture as base layer versus the re-saved copy #3 image. The resulting image will be almost black, so we increase Levels.
(3) Increase Levels to make the ELA-detected area(s) more visible. Note: I recommend working in BT.709 or BT.601 grayscale at this point. It is not necessary, but it gives "cleaner" results when blurring later on.
(4) Alternate between applying a box blur to the image and increasing levels, to a point where the islands disappear and a large blob remains.
(5) The blob itself is also emphasized with an increase of levels.
(6) Finally, a Gaussian blur is used to smooth the selection area.
(7) Mark the blob area (draw an outline stroke) and compare to the input image...
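For readers who do want code, a rough OpenCV/Python translation of the steps might look like the sketch below. Treat it as pseudo-code: the quality values, level boosts and kernel sizes are only approximations of what is described above, and the file names are placeholders.

```python
import cv2
import numpy as np

img = cv2.imread("cat_with_piece.jpg")              # placeholder file name

# (1) re-save three times at low JPEG quality, re-decoding each time
work = img
for q in (2, 5, 2):
    ok, buf = cv2.imencode(".jpg", work, [cv2.IMWRITE_JPEG_QUALITY, q])
    work = cv2.imdecode(buf, cv2.IMREAD_COLOR)

# (2)+(3) difference blend against the original, then boost levels in grayscale
ela = cv2.absdiff(img, work)
gray = cv2.cvtColor(ela, cv2.COLOR_BGR2GRAY)        # BT.601-weighted grayscale
gray = cv2.convertScaleAbs(gray, alpha=8.0)         # crude "increase Levels"

# (4)+(5) alternate box blur and level boosts until small islands fade
for _ in range(3):
    gray = cv2.blur(gray, (15, 15))
    gray = cv2.convertScaleAbs(gray, alpha=2.0)

# (6) Gaussian blur to smooth the blob, (7) threshold and outline it
gray = cv2.GaussianBlur(gray, (21, 21), 0)
_, blob = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(blob, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4 signature
out = img.copy()
cv2.drawContours(out, contours, -1, (0, 255, 0), 3)
cv2.imwrite("marked.jpg", out)
```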
So I have an implementation of a neural network that I followed from a YouTube tutorial. The guy uses SGD (with momentum) as the optimization algorithm and hyperbolic tangent as the activation function. I already changed the transfer function to Leaky ReLU (for the hidden layers) and Sigmoid (for the output layer).
But now I've decided I should also change the optimization algorithm to Adam, so I looked up SGD (with momentum) on Wikipedia for a deeper understanding of how it works, and I noticed something is off. The formula the guy uses in the clip is different from the one on Wikipedia, and I'm not sure whether that's a mistake or not. The clip is one hour long, but I'm not asking you to watch the entire video; the relevant part is the 54m37s mark and the Wikipedia formula, right here:
https://youtu.be/KkwX7FkLfug?t=54m37s
https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum
So if you look at the guy's implementation and then at the Wikipedia formula for SGD with momentum, basically the only difference is in how the delta weight is calculated.
Wikipedia says the new delta weight is the momentum times the old delta weight minus the learning rate times the gradient (times the output value of the neuron), whereas in the tutorial the guy adds those two terms instead of subtracting. The formula for the new weight is the same in both: it simply adds the delta weight to the old weight.
So my question is: did the guy in the tutorial make a mistake, or is there something I am missing? Because somehow I trained a neural network with it and it behaves as expected, so I can't really tell what the problem is here. Thanks in advance.
I have seen momentum implemented in different ways. Personally, I followed this guide in the end: http://ruder.io/optimizing-gradient-descent
There, momentum and weights are updated separately, which I think makes it clearer.
I do not know enough about the variables in the video, so I am not sure about that, but the Wikipedia version is definitely correct.
In the video, the gradient*learning_rate gets added instead of subtracted, which is fine if you calculate and propagate your error accordingly.
Also, where the video says "neuron_getOutputVal()*m_gradient", if it is as I think it is, that whole expression is the gradient. What I mean is that you have to multiply what you propagate by the outputs of your neurons to get the actual gradient.
For gradient descent without momentum, once you have your actual gradient, you multiply it by a learning rate and subtract (or add, depending on how you calculated and propagated the error, but usually subtract) it from your weights.
With momentum, you do it as the Wikipedia article says, using the last "change to your weights" (the "delta weights") as part of the formula.
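To make the two conventions concrete, here is a minimal sketch (hypothetical variable names, not the code from the video):

```python
# Both conventions converge: the sign of the gradient term just has to match
# the sign convention used when the error was propagated.
def sgd_momentum_wikipedia(w, dw_prev, grad, lr=0.01, momentum=0.9):
    """Wikipedia convention: dw = momentum*dw_prev - lr*grad; w += dw."""
    dw = momentum * dw_prev - lr * grad
    return w + dw, dw

def sgd_momentum_tutorial(w, dw_prev, grad, lr=0.01, momentum=0.9):
    """Tutorial-style convention: the gradient term is added instead of
    subtracted, which works if the propagated 'gradient' already carries
    the opposite sign (i.e. it is really the negative gradient)."""
    dw = momentum * dw_prev + lr * grad
    return w + dw, dw

# Tiny demo: minimize f(w) = w**2 with both conventions.
w1 = w2 = 5.0
dw1 = dw2 = 0.0
for _ in range(200):
    g = 2 * w1                       # true gradient of w**2
    w1, dw1 = sgd_momentum_wikipedia(w1, dw1, g)
    g = -2 * w2                      # sign-flipped "gradient", as propagated in the tutorial
    w2, dw2 = sgd_momentum_tutorial(w2, dw2, g)
print(w1, w2)                        # both end up near 0
```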
First of all, I have to say I'm new to the field of computer vision and I'm currently facing a problem I tried to solve with OpenCV (Java wrapper) without success.
Basically, I have a picture of a part of a model taken by a camera (different angles, resolutions, rotations...) and I need to find the position of that part in the model.
Example Picture:
Model Picture:
So one question is: Where should I start/which algorithm should I use?
My first try was to use keypoint matching with SURF as detector and descriptor and a brute-force (BF) matcher.
It worked for about 2 pictures out of 10. I used the default parameters and tried other detectors, without any improvement. (Maybe it's a question of the right parameters. But how do I find the right parameters combined with the right algorithm?)
Two examples:
My second try was to use colour to differentiate certain elements in the model and to compare the structure with the model itself (in addition to the picture of the model I also have an XML representation of it).
Right now I extract the colour red out of the image and adjust the H, S, V values manually to get the best detection; it works for about 4 pictures and fails for the others.
Two examples:
I also tried to use edge detection (Canny, grayscale, with histogram equalization) to detect geometric structures. For some results I could imagine that it would work, but using the same Canny parameters for other pictures fails. Two examples:
As I said, I'm not familiar with computer vision and just tried out some algorithms. My problem is that I don't know which combination of algorithms and techniques is best, and, in addition, which parameters I should use. Testing them manually seems to be impossible.
Thanks in advance
gemorra
Your initial idea of using SURF features was actually very good; just try to understand how the parameters of this algorithm work and you should be able to register your images. A good starting point would be to vary only the Hessian threshold, and to be fearless while doing so: your features are quite well defined, so try thresholds around 2000 and above (increasing in steps of 500-1000 until you get good results is totally OK).
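A minimal sketch of that kind of SURF matching could look like this (Python for brevity; it assumes an OpenCV build that includes the contrib "xfeatures2d" (non-free) module, and the file names and ratio-test threshold are placeholders):

```python
import cv2
import numpy as np

part  = cv2.imread("part.jpg",  cv2.IMREAD_GRAYSCALE)      # camera picture of the part
model = cv2.imread("model.jpg", cv2.IMREAD_GRAYSCALE)      # picture of the whole model

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=2000)  # start high, tune in steps of 500-1000
kp1, des1 = surf.detectAndCompute(part,  None)
kp2, des2 = surf.detectAndCompute(model, None)

# Brute-force matching with a ratio test to reject ambiguous matches.
bf = cv2.BFMatcher(cv2.NORM_L2)
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

# With enough good matches, estimate where the part sits in the model.
if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is not None:
        print("inliers:", int(inliers.sum()))
```

The Java wrapper exposes equivalent classes, so the same structure carries over directly.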
Alternatively, you can try to detect your ellipses, calculate an affine warp that normalizes them, and run a cross-correlation to register them. This alternative implies much more work, but is quite fascinating. Some ideas on that normalization using the covariance matrix and its Cholesky decomposition are given here.
I found this very interesting code for a total variation filter: tvmfilter.
The additional functions this code uses are very confusing, but the denoising is far better than all the filters I have tried so far.
I have figured out the code on my own :)
His additional function "tv" denoises with the ROF model, which has been a major research topic for two decades now. See http://www.ipol.im/pub/algo/g_tv_denoising/ for a summary of current methods.
Briefly, the idea behind ROF is to approximate the given noisy image with a piecewise-constant image by solving an optimization problem that penalizes the total variation (i.e. the L1 norm of the gradient) of the image.
The reason this performs well is that the other denoising methods you are probably working with denoise by smoothing the image via convolution with a Gaussian (i.e. penalizing the L2 norm of the gradient, which amounts to solving the heat equation on the image). While fast to compute, denoising by smoothing blurs edges and thus results in poor image quality; L1-norm optimization preserves edges.
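To illustrate, here is a rough sketch of the ROF model solved by plain gradient descent on a smoothed TV energy (the slow, textbook approach mentioned below; the parameters are only illustrative):

```python
import numpy as np

def tv_denoise(f, lam=5.0, tau=0.01, eps=0.05, iters=300):
    """Minimize  TV_eps(u) + lam/2 * ||u - f||^2  by explicit gradient descent."""
    u = f.astype(np.float64).copy()
    for _ in range(iters):
        # forward differences approximate the gradient of u
        ux = np.roll(u, -1, axis=1) - u
        uy = np.roll(u, -1, axis=0) - u
        mag = np.sqrt(ux**2 + uy**2 + eps**2)    # smoothed |grad u|
        px, py = ux / mag, uy / mag
        # backward differences give the divergence of (px, py)
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u -= tau * (-div + lam * (u - f))        # descend the energy gradient
    return u

# toy usage: a noisy step edge, values roughly in [0, 1]
img = np.zeros((64, 64)); img[:, 32:] = 1.0
noisy = img + 0.1 * np.random.randn(64, 64)
clean = tv_denoise(noisy)
```

Note how the edge survives while the flat regions are smoothed; a Gaussian blur of comparable strength would soften the edge as well.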
It's not clear how Guy solves the TV problem in the code you linked. He references the original ROF paper, so it's possible that he's just using the original method (gradient descent), which is quite slow to converge. I suggest you give this code/paper a try: http://www.stanford.edu/~tagoldst/Tom_Goldstein/Split_Bregman.html as it's probably faster than the .m file you are using.
Also, as was mentioned in the comments, you will get better denoising (i.e. a higher SNR) using non-local means. However, the non-local means algorithm will take much longer to run, as it requires searching the entire image for similar patches and computing weights based on them.
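For comparison, OpenCV ships a non-local means implementation; a minimal usage sketch (8-bit grayscale input, untuned tutorial-style parameters, placeholder file names):

```python
import cv2

img = cv2.imread("noisy.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
den = cv2.fastNlMeansDenoising(img, h=10, templateWindowSize=7, searchWindowSize=21)
cv2.imwrite("denoised.png", den)
```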
It looks like a very simple question.
There are many lines, each given by its two endpoints.
The question is how to discretize them into a matrix, so that the matrix can be used for image-processing purposes.
In the following figure, example lines (yellow) and their corresponding pixelated renderings are shown.
A piece of code in any language would be of great help and is appreciated in advance.
Note that performance and accuracy are very important factors.
Also, as demonstrated, each point of the line must be associated with exactly one pixel (i.e., one element of the matrix).
The easiest way is to use Bresenham's algorithm.
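For example, a minimal sketch in Python (plain integer Bresenham; it marks exactly one cell per step along the major axis, which matches the one-pixel-per-point and performance requirements above):

```python
import numpy as np

def bresenham(x0, y0, x1, y1):
    """Return the list of integer (x, y) cells on the line from (x0, y0) to (x1, y1)."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:        # step in x
            err += dy
            x0 += sx
        if e2 <= dx:        # step in y
            err += dx
            y0 += sy
    return points

# usage: rasterize a line into a binary matrix
grid = np.zeros((10, 10), dtype=np.uint8)
for x, y in bresenham(1, 1, 8, 5):
    grid[y, x] = 1
print(grid)
```

Everything is integer arithmetic, so it is both fast and exact; to rasterize many lines, just loop over the endpoint pairs and write into the same matrix.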