I have a project where I am required to subtract an empty template image from an incoming user filled image. The document type is a normal Bank cheque.
The aim is to extract the handwritten fields from it by subtracting one image from the empty template image.
The issue what i am facing is in aligning these two images, as there is scaling, translation, rotation etc
Any ideas on how to align the template image with the incoming image?
UPDATE 1:
I am posting an example image from the wikipedia page but in the monochrome format as my image is in monochrome format.
When working with Image processing for industrial projects we have in most of the cases a fiducial. A fiducial is like a mark - can be a hole, an cross mark - that never changes, is always in the same positions.
Generally two fiducials are enough to correct misaligning problems like rotation, translation and also scale. For instance If you know the distance between the two, you can always check it to make sure the scale factor is right, or correct it based on the difference of the current distance against the right distance.
In your case, what I would ask you is: Does the template and the incoming image share any visual sign that are invariant and can easily be segmented?
If you have the answer for that question, all the rest will be more simple - the difference itself is a quite straightforward algorithm.
The basic answer is write a function that takes two images and a 2D transform and tells you how aligned they are once you apply the transform to the target image. The function needs to be continuous based on the transform and have a local minima (0) where the images are aligned perfectly. This is called a cost function.
Then use any optimization algorithm over the function and inputs -- you are trying to optimize the transform (translation, scale, rotation). Examples are hill climbing, genetic, simulated annealing, etc.
There are products that do this -- usually they are called Forms Recognition, Forms Registration, Forms Processing, etc. Some are SDKs, but there are also applications that can do it without programming.
Disclaimer: I work at Atalasoft, where we sell a Forms Processing add-on to our .NET imaging SDK.
Related
I'm trying to find a way to reliably determine the location of a puzzle piece in an image. The puzzle piece varies in both shape and how easy it is to find it. What algorithm(s) in the opencv module would help me with the task at hand? Or is what I'm trying to do beyond the scope of the module?
Example images below:
Update
The original title was "Detecting obscure shapes with Opencv Python". However I am interested in concepts of image-processing that would solve such a problem: How to find a pasted image inside the bigger image?
Assume the following:
The jigsaw shapes are always of same (rectangle) boundary size (ie: a template-based searching method could work).
The jigsaw shape is not rotated to any angle (ie: there will be straight(-ish) horizontal and vertical lines to find.
The jigsaw shape is always "pasted" into some other "original" image (ie: a paste-detection method could work).
The solution can be OpenCV (as requested by the asker), but the core concepts should be applicable when using any appropriate software (ie: can loop through image pixels to process their values, in order to achieve the described solution).
I myself use JavaScript, but of course I will understand that openCV.calcHist() becomes a histogram function in JS code. I have no problem translating a good explanation into code. I will consider OpenCV code as pseudo-code towards a working idea.
In my opinion the best approach for a canonical answer was suggested in the comments by Christoph, which is training a CNN:
Implement a generator for shapes of puzzle pieces.
Get a large set of natural images from the net.
Generate tons of sample images with puzzle pieces.
Train your model to detect those puzzle pieces.
Histogram of Largest Error
This is a rough concept of a possible algorithm.
The idea comes from an unfounded premise that seems plausible enough.
The premise is that adding the puzzle piece drastically changes the histogram of the image.
Let's assume that the puzzle piece is bounded by a 100px by 100px square.
We are going to use this square as a mask to mask out pixels that are used to calculate the histogram.
The algorithm is to find the placement of the square mask on the image such that the error between the histogram of the masked image and the original image is maximized.
There are many norms to experiment with to measure the error: You could start with the sum over the error components squared.
I'll throw in my own attempt. It fails on the first image, only works fine on the next two images. I am open to other pixel-processing based techniques where possible.
I do not use OpenCV so the process is explained with words (and pictures). It is up to the reader to implement the solution in their own chosen programming language/tool.
Background:
I wondered if there was something inherent in pasted images (something maybe revealed by pixel processing or even by frequency domain analysis, eg: could a Fourier signal analysis help here?).
After some research I came across Error Level Analysis (or ELA). This page has a basic introduction for beginners.
Process: In 7 easy steps, this detects the location of a pasted puzzle piece.
(1) Take a provided cat picture and re-save 3 times as JPEG in this manner:
Save copy #1 as JPEG of quality setting 2.
Reload (to force a decode of) copy #1 then re-save copy #2 as JPEG of quality setting 5.
Reload (to force a decode of) copy #2 then re-save copy #3 as JPEG of quality setting 2.
(2) Do a difference blend-mode with original cat picture as base layer versus the re-saved copy #3 image. Thimage will be black so we increase Levels.
(3) Increase Levels to make the ELA detected area(s) more visible. note: I recommend working in BT.709 or BT.601 grayscale at this point. Not necessary, but it gives "cleaner" results when blurring later on.
(4) Alternate between applying a box blur to the image and also increasing levels, to a point where the islands disappear and a large blob remains..
(5) The blob itself is also emphasized with an increase of levels.
(6) Finally a Gaussian blur is used to smoothen the selection area
(7) Mark the blob area (draw an outline stroke) and compare to input image...
I have a small render engine written for fun. I would like to have some unit testing that would render automatically an image and then compare it to a stored image to check for differences. This should give some sort of metric to be able to gauge if the image is too far off or if we can attribute that to just different timings in animations. If it can also produce the location in the image of the differences that would be great, but not necessary. We can also assume that the 2 images are the exact same size.
What are the classic papers/techniques for that sort of thing ?
(the language is Go, probably nothing exists for it yet and I'd like to implement it myself to understand what's going on. The renderer is github.com/luxengine)
Thank you
One idea could be to see your problem as a case in Image Registration.
The following figure (taken from http://it.mathworks.com/help/images/point-mapping.html) gives a flow-chart for a method to solve the image registration problem.
Using the above figure terms, the basic idea is:
find some interest points in the Fixed image;
find in the Moving image the same corresponding points;
estimate the transformation between the two images using the point correspondences. One of the simplest transformation is a translation represented by a 2D vector; the magnitude of this vector is a measure of differences between the two images, in your case it can be related to the shift you wrote about in your comment. A richer transformation is an homography described by a 3x3 matrix, its distance from the identity matrix is again a measure of differences between the two images.
you can apply the transformation back, for example in the case of the translation you apply the translation to the Moving image and then the warped image can be compared (here I am simplifying a little) pixel by pixel to the Reference image.
Some more ideas are here: Image comparison - fast algorithm
For my bachelor thesis I need to analyse images taken in the ocean to count and measure the size of water particles.
my problem:
besides the wanted water particles, the images show hexagonal patches all over the image in:
- different sizes
- not regular shape
- different greyscale values
(Example image below!)
It is clear that these patches will falsify my image analysis concerning the size and number of particles.
For this reason this patches need to be detected and deleted somehow.
Since it will be just a little part of the work in my thesis, I don't want to spend much time in it and already tried classic ways like: (imageJ)
playing with the threshold (resulting in also deleting wanted water particles)
analyse image including the hexagonal patches and later sort out the biggest areas (the hexagonal patches have quite the biggest areas, but you will still have a lot of haxagons)
playing with filters: using gaussian filter on a duplicated image and subtract the copy from the original deletes many patches (in reducing the greyscale value) but also deletes little wanted water particles and so again falsifies the result
a more complicated and time consuming solution would be to use a implemented library in for example matlab or opencv to detect points, that describe the shapes.
but so far I could not find any code that fits my task.
Does anyone of you have created such a code I could use for my task or any other idea?
You can see a lot of hexagonal patches in different depths also.
the little spots with an greater pixel value are the wanted particles!
Image processing is quite an involved area so there are no hard and fast rules.
But if it was me I would 'Mask' the image. This involves either defining what you want to keep or remove as a pixel 'Mask'. You then scan the mask over the image recursively and compare the mask to the image portion selected. You then select or remove the section (depending on your method) if it meets your criterion.
One such example of a criteria would be the spatial and grey-scale error weighted against a likelihood function (eg Chi-squared, square mean error etc.) or a Normal distribution that you define the uncertainty..
Some food for thought
Maybe you can try with the Hough transform:
https://en.wikipedia.org/wiki/Hough_transform
Matlab have an built-in function, hough, wich implements this, but only works for lines. Maybe you can start from that and change it to recognize hexagons.
I need to do template matching in 360 degrees.
Mostly template is 80*120 and image is 640*480 grayscale (8 bit).
For non-rotation I am using opencv cvmatchtemplate which is working pretty fine.
I tried rotating template at various angles and doing cvmatchtemplate, it's working but consuming too much time.
For normal template match it is taking 12 ms, and for 360 degrees less than 50 ms is required.
If you convert your template and image to polar co-ordinates you can do the search as if it is a translation. This should be much faster because it is only one transform - you can implement this efficiently.
I think that expecting to get a good result for 360 degrees is challenging. The template must have changed during that transform. If it was only a few degrees then it is less likely to change.
Take a look at "An FFT based technique for translation, rotation and scale invariant image registration", Reddy and Chatterji, IEEE Transactions on Image Processing, 1996.
Search in Google Scholar for "synthetic discriminant functions" or "composite correlation filters". This is a good starting point: http://www.opticsinfobase.org/abstract.cfm?URI=ao-31-23-4773. If you can find the book "Correlation Pattern Recognition", section 6.2 explains composite filters as well.
The main idea is that you take the templates generated by rotating your images and generate a single synthetic template. You do this by formulating a system of linear equations of the form
Ax = c
Where A is the coefficient matrix generated from the templates you have available. x is the synthetic template you're going to determine, and c is a constraints vector. The constraints can be set to include some templates and to reject others.
The problem is that when you combine too many templates into one you start loosing matching performance. There are, of course, ways to overcome this problem depending on what additional information you have available about the images in which you plan to use your synthetic templates.
I am amazed at how well (and fast) this software works. I hovered my phone's camera over a small area of a book cover in dim light and it only took a couple of seconds for Google Shopper to identify it. It's almost magical. Does anyone know how it works?
I have no idea how Google Shopper actually works. But it could work like this:
Take your image and convert to edges (using an edge filter, preserving color information).
Find points where edges intersect and make a list of them (including colors and perhaps angles of intersecting edges).
Convert to a rotation-independent metric by selecting pairs of high-contrast points and measuring distance between them. Now the book cover is represented as a bunch of numbers: (edgecolor1a,edgecolor1b,edgecolor2a,edgecolor2b,distance).
Pick pairs of the most notable distance values and ratio the distances.
Send this data as a query string to Google, where it finds the most similar vector (possibly with direct nearest-neighbor computation, or perhaps with an appropriately trained classifier--probably a support vector machine).
Google Shopper could also send the entire picture, at which point Google could use considerably more powerful processors to crunch on the image processing data, which means it could use more sophisticated preprocessing (I've chosen the steps above to be so easy as to be doable on smartphones).
Anyway, the general steps are very likely to be (1) extract scale and rotation-invariant features, (2) match that feature vector to a library of pre-computed features.
In any case, the Pattern Recognition/Machine Learning methods often are based on:
Extract features from the image that can be described as numbers. For instance, using edges (as Rex Kerr explained before), color, texture, etc. A set of numbers that describes or represents an image is called "feature vector" or sometimes "descriptor". After extracting the "feature vector" of an image it is possible to compare images using a distance or (dis)similarity function.
Extract text from the image. There are several method to do it, often based on OCR (optical character recognition)
Perform a search on a database using the features and the text in order to find the closest related product.
It is also likely that the image is also cuted into subimages, since the algorithm often finds a specific logo on the image.
In my opinion, the image features are send to different pattern classifiers (algorithms that are able to predict a "class" using as input a feature vector), in order to recognize logos and, afterwards, the product itself.
Using this approach, it can be: local, remote or mixed. If local, all processing is carried out on the device, and just the "feature vector" and "text" are sent to a server where the database is. If remote, the whole image goes to the server. If mixed (I think this is the most probable one), partially executed locally and partially at the server.
Another interesting software is the Google Googles, that uses CBIR (content-based image retrieval) in order to search for other images that are related to the picture taken by the smartphone. It is related to the problem that is addressed by Shopper.
Pattern Recognition.