I am integrating Tesseract OCR in an app. Unfortunately the quality of the recognition is... not that great. The answer seems to be doing some very basic image cleaning before sending the image off for OCR.
Basically I plan to build a small pipeline that does the following:
1. Crop to a white bounding box, on the assumption that most users will try to recognize ordinary black print on a white background (optional).
2. Convert to black/white.
3. Despeckle to remove artifacts caused by step 2.
I have step 2 down (the easy part), and am looking for input on how to do step 3 and, optionally, step 1.
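Despeckling (step 3) is commonly done with a median filter, which removes isolated specks while keeping strokes that are wider than the filter window. A minimal pure-Python sketch, assuming the image is already a list of rows of grey values (0..255); in the app itself you would use the equivalent filter from ImageMagick or another library rather than this loop:

```python
def despeckle(img):
    """3x3 median filter over a list of rows of grey values (0..255).

    Border pixels use only the neighbours that fall inside the image.
    Taking the upper median keeps the output binary when the input is binary.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            out[y][x] = window[len(window) // 2]
    return out
```

A single black pixel on a white page disappears, while a 3x3 (or larger) black blob keeps its centre.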
Well... it turns out that Martin's suggestion of using ImageMagick is probably the best option in my case.
There's a Core Image (CI) filter that does noise removal, but it isn't available on iOS, and I will have to use ImageMagick to convert PDFs to TIFF for OCR anyway, so ImageMagick it is.
An alternative is the small image-processing library that Chris Greening made. If you don't need the full force of ImageMagick, it will do most of the light lifting for you, and some of the heavy lifting too.
I am working on a project in which I need to highlight the differences between a pair of scanned images of text.
Example images are here and here.
I am building a web app based on HTML and JS for this.
I found that OpenCV supports highlighting the differences between two images.
I also saw that ImageMagick has similar support.
Does OpenCV support automatic registration (alignment) of images?
And is there a JS module for OpenCV?
Which one is more suited for my purpose?
1. Simplistic way:
Suppose the images are perfectly aligned and similarly illuminated: subtract one image from the other pixel by pixel, then threshold the result, filter out small noisy blobs, and select the biggest ones. Good enough for a school project.
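The simplistic way can be sketched in a few lines of pure Python. This is an illustration, not OpenCV's API: it assumes both images are already aligned, equally sized lists of rows of grey values, and the `threshold` and `min_area` defaults are arbitrary guesses you would tune on real scans:

```python
from collections import deque


def diff_blobs(a, b, threshold=40, min_area=5):
    """Return bounding boxes (x0, y0, x1, y1) of changed regions, biggest first."""
    h, w = len(a), len(a[0])
    # 1. absolute per-pixel difference, thresholded into a binary mask
    mask = [[abs(a[y][x] - b[y][x]) > threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    blobs = []
    # 2. group changed pixels into 4-connected blobs with a flood fill
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # 3. filter out small, noisy blobs
            if len(pixels) >= min_area:
                xs = [p[0] for p in pixels]
                ys = [p[1] for p in pixels]
                blobs.append((len(pixels), (min(xs), min(ys), max(xs), max(ys))))
    # 4. biggest blobs first
    blobs.sort(key=lambda item: item[0], reverse=True)
    return [box for _, box in blobs]
```

The returned boxes are what you would draw as highlights on top of one of the images.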
2. A bit more complicated:
Align the images, then find a way to make the illumination uniform, then apply the simplistic way.
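One simple (and admittedly crude) way to even out illumination is to rescale one image so that its mean brightness and contrast match the other's. A sketch assuming greyscale lists of rows; it only fixes a global difference, so uneven lighting across a single page would need a local method instead:

```python
from statistics import mean, pstdev


def match_brightness(src, ref):
    """Linearly rescale src so its mean and standard deviation match ref."""
    s = [v for row in src for v in row]
    r = [v for row in ref for v in row]
    s_mean, s_std = mean(s), pstdev(s)
    r_mean, r_std = mean(r), pstdev(r)
    gain = r_std / s_std if s_std else 1.0  # avoid dividing by zero on flat images
    return [
        [min(255, max(0, round((v - s_mean) * gain + r_mean))) for v in row]
        for row in src
    ]
```

For example, a washed-out scan `[[50, 150]]` matched against `[[0, 200]]` is stretched to `[[0, 200]]`.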
How to align:
Find the text area in the two images (it is darker than the background color).
Find its corners
Use getPerspectiveTransform() to find the transform between images.
Use warpPerspective() to warp one image onto the other.
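To show what getPerspectiveTransform() computes from the four corners, here is a pure-Python sketch that solves the same 8-unknown linear system (with the bottom-right entry of the 3x3 matrix fixed to 1). The function names are hypothetical; with OpenCV available you would simply call getPerspectiveTransform() and warpPerspective():

```python
def perspective_transform(src, dst):
    """Return the 3x3 homography mapping 4 src points onto 4 dst points."""
    m = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (a x + b y + c) / (g x + h y + 1), and likewise for v
        m.append([x, y, 1, 0, 0, 0, -x * u, -y * u, u])
        m.append([0, 0, 0, x, y, 1, -x * v, -y * v, v])
    n = 8
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corners (three points collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    h = [m[i][n] / m[i][i] for i in range(n)]
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(H, point):
    """Map one (x, y) point through H, including the perspective divide."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Warping an image then amounts to mapping every output pixel back through the inverse of this matrix and sampling the source, which is the job warpPerspective() does for you.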
Another way to register the two images is by feature matching, which OpenCV supports quite extensively. findHomography() will then estimate the transform (homography) between the two images from a larger set of matching points.
3. Canonical answer:
Align the images.
Convert it to text with an OCR engine.
Compare the text in the two images.
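Once both images have gone through OCR, the comparison step is ordinary text diffing. A sketch using Python's standard difflib at the word level (the choice of word granularity is an assumption; character-level diffs are also possible):

```python
import difflib


def text_differences(text_a, text_b):
    """List (tag, old_words, new_words) for every word-level change."""
    a, b = text_a.split(), text_b.split()
    changes = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

For example, `text_differences("the quick brown fox", "the quick red fox")` returns `[("replace", "brown", "red")]`.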
Well, besides the great help given by vasile, you also need the web app answer.
To make it work on a server, you will probably need a file upload form, as well as a response from the server containing the result of the algorithm. There are several ways to do this, depending on the restrictions of your server. If you can run command-line programs, you would probably implement the highlighting algorithm in OpenCV as a command-line program and pass it the two input files and an output file. A PHP script would then handle the upload, call the command-line program, and return the result to the user.
Another approach could be to use Java and JavaCV in a web container such as Apache Tomcat.
Best regards,
Daniel
I'm just curious whether someone knows how to analyse an image. For example:
I have a heatmap picture, and now I want to extract the color values and the x,y coordinates and redraw the image with JavaScript and canvas.
Another example would be to recognize patterns in the image (lines, arrows) and extract their direction and length.
A popular image/video analysis library I would recommend is OpenCV. I have used this GitHub fork of the ruby-opencv gem with success. If you scroll down in the README, you'll see an example of face detection. The unit tests demonstrate how to do other things, like drawing shapes. At a glance, I don't see any tests on extracting pixel data, but it is most definitely possible.
If you need something simpler, you can try out devil. It's more user-friendly and is focused on image manipulation, but you can probably extract pixel data with it.
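Whichever library decodes the image, turning a heatmap back into (x, y, value) data is mostly a colour lookup. A minimal sketch, assuming you already have the pixels as rows of (r, g, b) tuples and that you build the `legend` yourself from the heatmap's colour scale (both are assumptions, not something either library provides):

```python
def heatmap_values(pixels, legend):
    """Turn heatmap pixels into (x, y, value) triples.

    pixels: list of rows of (r, g, b) tuples.
    legend: list of ((r, g, b), value) pairs read off the colour scale.
    Each pixel gets the value of the nearest legend colour.
    """
    def nearest(colour):
        # squared Euclidean distance in RGB space
        return min(
            legend,
            key=lambda entry: sum((c - l) ** 2 for c, l in zip(colour, entry[0])),
        )[1]

    return [
        (x, y, nearest(rgb))
        for y, row in enumerate(pixels)
        for x, rgb in enumerate(row)
    ]
```

The resulting triples are exactly what you would feed to a canvas drawing loop to redraw the map.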
It sounds like you'll be leaning towards OpenCV. It might be useful to look at this previous question, specifically the mention of the Hough transform.
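The Hough transform finds lines by letting every edge pixel vote for all lines that could pass through it, parameterised as rho = x·cos(theta) + y·sin(theta); the bins with the most votes are the lines. A pure-Python sketch with 1-degree steps (OpenCV's HoughLines() is the practical choice; this only illustrates the voting):

```python
import math
from collections import Counter


def hough_lines(points, top=1):
    """Return the `top` strongest (votes, rho, theta_degrees) line candidates.

    points: (x, y) coordinates of edge pixels, e.g. from an edge detector.
    """
    votes = Counter()
    for x, y in points:
        for theta in range(180):  # 1-degree resolution
            t = math.radians(theta)
            rho = round(x * math.cos(t) + y * math.sin(t))
            votes[(rho, theta)] += 1
    return [(count, rho, theta) for (rho, theta), count in votes.most_common(top)]
```

A vertical run of pixels at x = 5 comes out as rho = 5, theta = 0 degrees; theta gives the line's direction, and the extreme points that voted for the winning bin give its length.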
I'm trying to get an image of a blackboard readable by OCR. Naturally, most OCR software doesn't like dirty images. What image processing should I try to put the image through to clean the image up?
Have you tried the OCR software yet? It's likely that the OCR software is well suited to reading what's essentially already a black and white image.
However, if you were required to do so you could try to:
Threshold the image: essentially, take a greyscale version of the image and turn it into black/white pixels.
Perform binary dilation to grow the remaining objects.
Perform binary erosion.
The idea is that by dilating and then eroding (a morphological closing) you smooth out rough/noisy edges and fill small gaps, and then you can pass the cleaned-up image to the OCR.
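The threshold / dilate / erode sequence can be sketched in plain Python with a 3x3 square structuring element. The grey-list representation and the fixed threshold of 128 are assumptions for illustration; a real pipeline would use a library's morphology functions and probably an adaptive threshold for uneven blackboard lighting:

```python
def threshold(grey, level=128):
    """Grey (0..255) -> binary: True where the pixel is dark (ink)."""
    return [[v < level for v in row] for row in grey]


def _morph(binary, op):
    # op is any() for dilation and all() for erosion over a 3x3 neighbourhood
    h, w = len(binary), len(binary[0])
    return [
        [op(binary[yy][xx]
            for yy in range(max(0, y - 1), min(h, y + 2))
            for xx in range(max(0, x - 1), min(w, x + 2)))
         for x in range(w)]
        for y in range(h)
    ]


def dilate(binary):
    return _morph(binary, any)


def erode(binary):
    return _morph(binary, all)


def clean(grey, level=128):
    """Threshold, then dilate followed by erode (a morphological closing)."""
    return erode(dilate(threshold(grey, level)))
```

A one-pixel break in a horizontal stroke is bridged by the dilation, and the erosion then shrinks the stroke back to its original thickness.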
There are probably plenty of methods to achieve a similar result. Given that there are entire books devoted to computer vision, this answer will hardly do them justice.
The only texts I have are from 1997, but surely there's been more written on the subject since.
Algorithms for Image Processing and Computer Vision - J.R. Parker
Digital Image Processing - Gonzalez / Woods
Offhand, I'd say invert the image (reverse the colors, so that the writing is black on white) and increase the contrast a bit. You can try modifying the brightness to get the erased chalk fogginess to disappear into the background.
In Photoshop, the Levels dialog may be your most useful image adjustment. Mimicking it in code is another subject entirely.
The basis of Levels is that you adjust the maximum, minimum and midpoint of the brightness levels. These are usually shown on a histogram: you adjust the points to obtain the desired amount of contrast, and also move the midpoint so that the text in the image is as well defined as possible, which is critical for OCR applications. By moving the midpoint you can "eliminate" the greyscale fuzz that ordinarily surrounds handwriting, making it disappear into the light (or dark) areas of the image.
Also, you might try converting the image to 1-bit after such an adjustment, forcing every pixel to black or white. Sometimes this speeds up the OCR process, but be careful: it also discards detail.
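Mimicking Levels in code is less involved than it sounds for a single grey value. The sketch below uses a common approximation, treating the midpoint slider as a gamma exponent; Photoshop's exact internals are not documented here, so consider the formula an assumption. It also includes the inversion suggested above for chalk on a dark board:

```python
def invert(value):
    """Swap light and dark, e.g. white chalk on a dark board -> dark on light."""
    return 255 - value


def levels(value, black=0, white=255, gamma=1.0):
    """Approximate Photoshop-style Levels for one 0..255 grey value.

    black/white: input values that map to 0 and 255 (everything beyond is clipped).
    gamma: stands in for the midpoint slider; >1 brightens mid-tones, <1 darkens them.
    """
    t = (value - black) / (white - black)
    t = min(1.0, max(0.0, t))
    return round(255 * t ** (1.0 / gamma))
```

Raising `black` pushes faint erased-chalk smudges to pure black, raising `gamma` lifts the faint grey fuzz towards white, and a final threshold gives the 1-bit image.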
Have you tried edge detection techniques such as the Roberts cross and Sobel operators to filter noise out of the image? Without seeing the quality of the image, I can't say how effective that would be.
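For reference, the Sobel operator convolves the image with two 3x3 kernels, one for horizontal and one for vertical gradients, and combines them into a gradient magnitude. A pure-Python sketch on a list-of-rows greyscale image (border pixels are simply left at 0 here, which is one of several possible conventions). Edge operators respond to noise as well as to strokes, so they are usually combined with some smoothing:

```python
import math


def sobel_magnitude(grey):
    """Sobel gradient magnitude for the interior pixels of a grey image."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal gradient
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical gradient
    h, w = len(grey), len(grey[0])
    out = [[0.0] * w for _ in range(h)]  # border pixels stay 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = grey[y + j - 1][x + i - 1]
                    gx += kx[j][i] * p
                    gy += ky[j][i] * p
            out[y][x] = math.hypot(gx, gy)
    return out
```

On a flat region the output is 0; across a sharp black-to-white step it peaks at 4 × 255 = 1020.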
Not sure how constrained you are in the choice of OCR solution, but the ABBYY OCR engine (and a web API based on it, http://www.wisetrend.com/wisetrend_ocr_cloud.shtml ) includes automatic image cleanup / texture removal options.
There are commercial solutions but cleaning up board images appears to be an open problem. Add OCR to an unsolved problem, and you get... an unsolved problem.
What guidelines should I follow if I want good-quality images in a LaTeX document? These images are mostly screenshots of a software application or flow charts.
Below are two such images.
Flow Chart
Screenshot
Thanks
For diagrams, the rule is to use vector formats as much as you can: PDF, EPS or native LaTeX packages. With vector graphics, the picture does not lose resolution and can be scaled freely. For a flow chart, I would either export it from the drawing application as a PDF, or use PGF/TikZ to produce it from LaTeX (see also examples). If your drawing application does not have a PDF export, consider using one that does, e.g., UMLet.
If you can't use vector graphics (e.g., because it is a screenshot), make sure you use high-enough resolution to begin with. If it is an academic paper, the publisher usually has guidelines for this.
If you use pdfLaTeX, you can use PNG images, and in that case you should definitely use PNG rather than JPEG. PNG compression is lossless, so you get the best quality at the expense of file size.
The second important point is to create the images with sufficient resolution. For printing it should be about 300-600 dpi; higher is better, but the file size of the images and of the resulting document will increase. For documents that will only be viewed on screen you can use a lower resolution; about 72-100 dpi should be enough.
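The arithmetic behind these numbers is simply pixels = inches × dpi, which tells you how large a screenshot must be captured for a given printed width. A tiny sketch (the function names are just for illustration):

```python
def pixels_needed(width_inches, dpi):
    """Pixel width a bitmap needs to print at width_inches at the given dpi."""
    return round(width_inches * dpi)


def max_print_width(width_pixels, dpi):
    """Widest print (in inches) a bitmap supports at the given dpi."""
    return width_pixels / dpi
```

For example, a figure printed 6 inches wide at 300 dpi needs 1800 pixels, while a 1280-pixel-wide screenshot only covers about 4.3 inches at 300 dpi.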
For diagrams you should create vector graphics (EPS or PDF) if possible; that way you do not lose any quality.
For screenshots, there is not much to do, but for flow charts, I'd suggest to create them in PDF format (vectorized) and to compile your LaTeX source with pdflatex.
For the flowchart I'd suggest TikZ; then your chart is typeset directly in TeX. Here's an example: http://www.texample.net/tikz/examples/simple-flow-chart/
Screenshots are pretty much a lost cause. I've had a good experience saving them as PDF and then embedding them, but you want to make sure you're on a high-res capture to begin with.
Charts are very easy. Most graphics programs (e.g., Visio, OmniGraffle) will let you save them as EPS or PDF, and scaling works fairly well.
I need to compare 2 same-size, nearly identical images for exact differences in the RGBs of every pixel.
I would like to find a tool that already does this, but strangely it seems to be nowhere to be found on Google.
If I could even find a tool to print out the RGB values of every pixel I could compute it by hand (the images are small enough) or load that input for my tool. Again, couldn't find anything.
Otherwise, I'm looking for a simple C library to decode GIFs and access each pixel. Any recommendations? I see quite a few on Google, but most look old and have no documentation.
I hope someone with more exposure to image processing can help me solve this somewhat trivial task one way or another without spending too many hours!
If you have ImageMagick installed, it already does this: its compare tool highlights the differences between two images.
What about SDL + SDL_image?
You can easily open GIFs and load them into SDL_Surfaces to retrieve the pixel information you need.
If you're not opposed to Python, one option would be to use the Python Imaging Library (PIL), which provides Python bindings for native decoders for many file formats, including PNG and GIF.
This past summer, I wrote a few small apps to do RGB-wise comparisons of PNG images, in C++, pure Python, and Python using PIL. It would be trivial to make the PIL code work with GIF images.
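A rough sketch of the PIL approach (written for this answer, not the code mentioned above): decode each image with Pillow into a flat list of (r, g, b) tuples, then walk both lists together and report every pixel whose values differ. Pillow is imported inside the loader only, so the comparison itself works on any pixel source:

```python
def rgb_differences(pixels_a, pixels_b, width):
    """Yield (x, y, (dr, dg, db)) for every pixel whose RGB values differ.

    pixels_a / pixels_b: flat, row-major sequences of (r, g, b) tuples.
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must be the same size")
    for index, (pa, pb) in enumerate(zip(pixels_a, pixels_b)):
        if pa != pb:
            y, x = divmod(index, width)
            yield x, y, tuple(b - a for a, b in zip(pa, pb))


def load_rgb(path):
    """Decode a GIF/PNG with Pillow (PIL) into (width, flat RGB pixel list)."""
    from PIL import Image  # imported here so rgb_differences works without Pillow
    with Image.open(path) as img:
        rgb = img.convert("RGB")  # GIFs are palette-based; expand to RGB
        width, height = rgb.size
        px = rgb.load()
        return width, [px[x, y] for y in range(height) for x in range(width)]
```

Usage, with hypothetical file names: `w, a = load_rgb("before.gif")`, `_, b = load_rgb("after.gif")`, then `for x, y, delta in rgb_differences(a, b, w): print(x, y, delta)`.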
If you want to roll your own, the "standard" C library for simple image manipulation is GD.
Beyond Compare will do image comparisons and highlight differences.
http://www.scootersoftware.com/