Best way to scan and compare images in real time - windows

Need your help.
For example i hv game. It is 500x500 pixels. shows random pictures. When there is a particular picture of smiling dog, you need to press space in 1 second.
Lets assume that we allready have that particular image of smiling dog in our memory.(Lets call it target image from now) And all that is left, is just to compare scan image to a target one. So we need to: scan this 500x500 area in real time(one-ten times per second) and compare scan result to target image, and then, do some stuff, depending on a result of comparison.
I need to do it as fast, as reliable and as efficient(in terms of computing power) as possible.
Also i need to be able to move window with game, so it would not brake screen capture(I need screen capture to be anchored to game window, not to whole screen)
So whats your suggestions on handling scanning and comparative phases of this programm in fastest, most reliable and most efficient way possible?
/*
EDIT:I think i need to use OpenCV for screen scan and comparision. But what language would be preferable in terms of speed and efficient computing power distribution? C++ or Python?
*/
Also i need this program to work Windows and Mac OS in future, for now it is just windows.
Ty in advance.
/*
EDIT: What do you guys think of this method :
let our scan result image be "X", and let our target image(image that we allready have aka "image of smiling dog" and want our X to be identical to this image) be "Y".
First we downscale Y from 500x500 to 50x50, than we dowunscale X from 500x500 to 50x50, then we transfer both images to grey scale. And then run normalize sum of squared differene, or even normalized cross corelation to detect similarities betwen X and Y. What do you think? Will it be faster than my suggestion:
{
downsample captured image to 50x50 pixels. Then compare every 25th pixel one by one, gradually(If 1st pixel is same as in target image - proceed to next pixel. If not - wait untill next scanning event and scann first pixel again) to downsampled target image. If all pixels are identical - then we are good, and we found that exact picture.
}
*/

Related

Detecting individual images in an array of images

I'm building a photographic film scanner. The electronic hardware is done now I have to finish the mechanical advance mechanism then I'm almost done.
I'm using a line scan sensor so it's one pixel width by 2000 height. The data stream I will be sending to the PC over USB with a FTDI FIFO bridge will be just 1 byte values of the pixels. The scanner will pull through an entire strip of 36 frames so I will end up scanning the entire strip. For the beginning I'm willing to manually split them up in Photoshop but I would like to implement something in my program to do this for me. I'm using C++ in VS. So, basically I need to find a way for the PC to detect the near black strips in between the images on the film, isolate the images and save them as individual files.
Could someone give me some advice for this?
That sounds pretty simple compared to the things you've already implemented; you could
calculate an average pixel value per row, and call the resulting signal s(n) (n being the row number).
set a threshold for s(n), setting everything below that threshold to 0 and everything above to 1
Assuming you don't know the exact pixel height of the black bars and the negatives, search for periodicities in s(n). What I describe in the following is total overkill, but that's how I roll:
use FFTw to calculate a discrete fourier transform of s(n), call it S(f) (f being the frequency, i.e. 1/period).
find argmax(abs(S(f))); that f represents the distance between two black bars: number of rows / f is the bar distance.
S(f) is complex, and thus has an argument; arctan(imag(S(f_max))/real(S(f_max)))*number of rows will give you the position of the bars.
To calculate the width of the bars, you could do the same with the second highest peak of abs(S(f)), but it'll probably be easier to just count the average length of 0 around the calculated center positions of the black bars.
To get the exact width of the image strip, only take the pixels in which the image border may lie: r_left(x) would be the signal representing the few pixels in which the actual image might border to the filmstrip material, x being the coordinate along that row). Now, use a simplistic high pass filter (e.g. f(x):= r_left(x)-r_left(x-1)) to find the sharpest edge in that region (argmax(abs(f(x)))). Use the average of these edges as the border location.
By the way, if you want to write a source block that takes your scanned image as input and outputs a stream of pixel row vectors, using GNU Radio would offer you a nice method of having a flow graph of connected signal processing blocks that does exactly what you want, without you having to care about getting data from A to B.
I forgot to add: Use the resulting coordinates with something like openCV, or any other library capable of reading images and specifying sub-images by coordinates as well as saving to new images.

"Barcode" reading from scanned image

I want to read a barcode from a scanned image that I printed. The image format is not relevant. I found that the scanned images are of very low quality and can understand why it normal barcodes fail.
My idea is to create a non standard and very simple barcode at the top of each page printed. It will be 20 squares in a row forming a simple binary code.Filled = 1, open = 0. It will be large enough on aA4 to make detection easy.
At this stage I need to load the image and find the barcode somewhere at the top. It will not be exactly at the same spot as it is scanned in. Step into each block and build the ID.
Any knowledge or links to info would be awesome.
If you can preset a region of interest that contains the code and nothing else, then detection is pretty easy. Scan a few rays across this region and find the white/black and black/white transitions. Then, knowing where the "cells" should be, you known their polarity.
For this to work, you need to frame your cells with two black ones on both ends to make sure to know where it starts/stops (if the scale is fixed, you can do with just a start cell, but I would not recommend this).
You could have a look at https://github.com/zxing/zxing. I would suggest to use a 1D bar code, but wide enough to match the low resolution of the scanner.
You could also invent your own bar code encoding and try to parse it your self. Use thick bars for 1 and thin lines for 0. A thick bar would be for instance 2 white pixels, 4 black pixels. A thin line would be 2 white pixels, 2 black pixels and 2 white pixels. The last two pixels encode the bit value.
The pixel should be the size of the scanned image pixel.
You then process the image scan line by scan line, trying to locate the bar code.
We locate the bar code by comparing a given pixel value sequence with a pattern. This is performed by computing a score function. The sum of squared difference is a good pick. When computing the score we ignore the two pixels encoding the bit value.
When the score is below a threshold, we found a matching pattern. It is good to add parity bits to the encoded value so that it's validity can be checked.
Computing a sum of square on a sliding window can be optimized.

resample an image from pixel to millimiters

I have an image (logical values), like this
I need to get this image resampled from pixel to mm or cm; this is the code I use to get the resampling:
function [ Ires ] = imresample3( I, pixDim )
[r,c]=size(I);
x=1:1:c;
y=1:1:r;
[X,Y]=meshgrid(x,y);
rn=r*pixDim;
cn=c*pixDim;
xNew=1:pixDim:cn;
yNew=1:pixDim:rn;
[Xnew,Ynew]=meshgrid(xNew,yNew);
Id=double(I);
Ires=interp2(X,Y,Id,Xnew,Ynew);
end
What I get is a black image. I suspect that this code does something that is not what I have in mind: it seems to take only the upper-left part of the image.
What I want is, instead, to have the same image on a mm/cm scale: what I expect is that every white pixel should be mapped from the original position to the new position (in mm/cm); what happen is certainly not what I expect.
I'm not sure that interp2 is the right command to use.
I don't want to resize the image, I just want to go from pixel world to mm/cm world.
pixDim is of course the dimension of the image pixel, obtained dividing the height of the ear in cm by the height of the ear in mm (and it is on average 0.019 cm).
Any ideas?
EDIT: I was quite sure that the code had no sense, but someone told me to do that way...anyway, if I have two edged ears, I need first to scale both the the real dimension and then perform some operations on them. What I mean with "real dimension" is that if one has size 6.5x3.5cm and the other has size 6x3.2cm, I need to perform operations on this dimensions.
I don't get how can I move from the pixel dimension to cm dimension BEFORE doing operation.
I want to move from one world to the other because I want to get rid of the capturing distance (because I suppose that if a picture of the ear is taken near and the other is taken far, they should have different size in pixel dimension).
Am I correct? There is a way to do it? I thought I can plot the ear scaling the axis, but then I suppose I cannot subtract one from the other, right?
Matlab does not use units. To apply your factor of 0.019cm/pixel you have to scale by a factor of 0.019 to have a 1cm grid, but this would cause any artefact below a size of 1cm to be lost.
Best practice is to display the data using multiple axis, one for cm and one for pixels. It's explained here: http://www.mathworks.de/de/help/matlab/creating_plots/using-multiple-x-and-y-axes.html
Any function processing the data should be independent of the scale or use the scale factor as an input argument, everything else is a sign of some serious algorithmic issues.

Invoice / OCR: Detect two important points in invoice image

I am currently working on OCR software and my idea is to use templates to try to recognize data inside invoices.
However scanned invoices can have several 'flaws' with them:
Not all invoices, based on a single template, are correctly aligned under the scanner.
People can write on invoices
etc.
Example of invoice: (Have to google it, sadly cannot add a more concrete version as client data is confidential obviously)
I find my data in the invoices based on the x-values of the text.
However I need to know the scale of the invoice and the offset from left/right, before I can do any real calculations with all data that I have retrieved.
What have I tried so far?
1) Making the image monochrome and use the left and right bounds of the first appearance of a black pixel. This fails due to the fact that people can write on invoices.
2) Divide the invoice up in vertical sections, use the sections that have the highest amount of black pixels. Fails due to the fact that the distribution is not always uniform amongst similar templates.
I could really use your help on (1) how to identify important points in invoices and (2) on what I should focus as the important points.
I hope the question is clear enough as it is quite hard to explain.
Detecting rotation
I would suggest you start by detecting straight lines.
Look (perhaps randomly) for small areas with high contrast, i.e. mostly white but a fair amount of very black pixels as well. Then try to fit a line to these black pixels, e.g. using least squares method. Drop the outliers, and fit another line to the remaining points. Iterate this as required. Evaluate how good that fit is, i.e. how many of the pixels in the observed area are really close to the line, and how far that line extends beyond the observed area. Do this process for a number of regions, and you should get a weighted list of lines.
For each line, you can compute the direction of the line itself and the direction orthogonal to that. One of these numbers can be chosen from an interval [0°, 90°), the other will be 90° plus that value, so storing one is enough. Take all these directions, and find one angle which best matches all of them. You can do that using a sliding window of e.g. 5°: slide accross that (cyclic) region and find a value where the maximal number of lines are within the window, then compute the average or median of the angles within that window. All of this computation can be done taking the weights of the lines into account.
Once you have found the direction of lines, you can rotate your image so that the lines are perfectly aligned to the coordinate axes.
Detecting translation
Assuming the image wasn't scaled at any point, you can then try to use a FFT-based correlation of the image to match it to the template. Convert both images to gray, pad them with zeros till the originals take up at most 1/2 the edge length of the padded image, which preferrably should be a power of two. FFT both images in both directions, multiply them element-wise and iFFT back. The resulting image will encode how much the two images would agree for a given shift relative to one another. Simply find the maximum, and you know how to make them match.
Added text will cause no problems at all. This method will work best for large areas, like the company logo and gray background boxes. Thin lines will provide a poorer match, so in those cases you might have to blur the picture before doing the correlation, to broaden the features. You don't have to use the blurred image for further processing; once you know the offset you can return to the rotated but unblurred version.
Now you know both rotation and translation, and assumed no scaling or shearing, so you know exactly which portion of the template corresponds to which portion of the scan. Proceed.
If rotation is solved already, I'd just sum up all pixel color values horizontally and vertically to a single horizontal / vertical "line". This should provide clear spikes where you have horizontal and vertical lines in the form.
p.s. Generated a corresponding horizontal image with Gimp's scaling capabilities, attached below (it's a bit hard to see because it's only one pixel high and may get scaled down because it's > 700 px wide; the url is http://i.stack.imgur.com/Zy8zO.png ).

How do I locate black rectangles in a grid and extract the binary code from that

i'm working in a project to recognize a bit code from an image like this, where black rectangle represents 0 bit, and white (white space, not visible) 1 bit.
Somebody have any idea to process the image in order to extract this informations? My project is written in java, but any solution is accepted.
thanks all for support.
I'm not an expert in image processing, I try to apply Edge Detection using Canny Edge Detector Implementation, free java implementation find here. I used this complete image [http://img257.imageshack.us/img257/5323/colorimg.png], reduce it (scale factor = 0.4) to have fast processing and this is the result [http://img222.imageshack.us/img222/8255/colorimgout.png]. Now, how i can decode white rectangle with 0 bit value, and no rectangle with 1?
The image have 10 line X 16 columns. I don't use python, but i can try to convert it to Java.
Many thanks to support.
This is recognising good old OMR (optical mark recognition).
The solution varies depending on the quality and consistency of the data you get, so noise is important.
Using an image processing library will clearly help.
Simple case: No skew in the image and no stretch or shrinkage
Create a horizontal and vertical profile of the image. i.e. sum up values in all columns and all rows and store in arrays. for an image of MxN (width x height) you will have M cells in horizontal profile and N cells in vertical profile.
Use a thresholding to find out which cells are white (empty) and which are black. This assumes you will get at least a couple of entries in each row or column. So black cells will define a location of interest (where you will expect the marks).
Based on this, you can define in lozenges in the form and you get coordinates of lozenges (rectangles where you have marks) and then you just add up pixel values in each lozenge and based on the number, you can define if it has mark or not.
Case 2: Skew (slant in the image)
Use fourier (FFT) to find the slant value and then transform it.
Case 3: Stretch or shrink
Pretty much the same as 1 but noise is higher and reliability less.
Aliostad has made some good comments.
This is OMR and you will find it much easier to get good consistent results with a good image processing library. www.leptonica.com is a free open source 'C' library that would be a very good place to start. It could process the skew and thresholding tasks for you. Thresholding to B/W would be a good start.
Another option would be IEvolution - http://www.hi-components.com/nievolution.asp for .NET.
To be successful you will need some type of reference / registration marks to allow for skew and stretch especially if you are using document scanning or capturing from a camera image.
I am not familiar with Java, but in Python, you can use the imaging library to open the image. Then load the height and the widths, and segment the image into a grid accordingly, by Height/Rows and Width/Cols. Then, just look for black pixels in those regions, or whatever color PIL registers that black to be. This obviously relies on the grid like nature of the data.
Edit:
Doing Edge Detection may also be Fruitful. First apply an edge detection method like something from wikipedia. I have used the one found at archive.alwaysmovefast.com/basic-edge-detection-in-python.html. Then convert any grayscale value less than 180 (if you want the boxes darker just increase this value) into black and otherwise make it completely white. Then create bounding boxes, lines where the pixels are all white. If data isn't terribly skewed, then this should work pretty well, otherwise you may need to do more work. See here for the results: http://imm.io/2BLd
Edit2:
Denis, how large is your dataset and how large are the images? If you have thousands of these images, then it is not feasible to manually remove the borders (the red background and yellow bars). I think this is important to know before proceeding. Also, I think the prewitt edge detection may prove more useful in this case, since there appears to be less noise:
The previous method of segmenting may be applied, if you do preprocess to bin in the following manner, in which case you need only count the number of black or white pixels and threshold after some training samples.

Resources