Tesseract-OCR (3.02) recognition accuracy and speed - image

I have a group of very small images (w: 70-100; h: 12-20), like the one below:
Those images contain nothing but the nickname of a group member. I want to read the text from these simple images; they all share one background, only the nicknames are different. So, what I've done with that image:
I am using the code below to get text from the second image:
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
tesseract::TessBaseAPI ocr;
ocr.Init(NULL, "eng");
PIX* pix = pixRead("D:\\image.png");
ocr.SetImage(pix);
char* text = ocr.GetUTF8Text();   // Tesseract allocates this buffer; the caller must delete[] it
std::string result(text);
delete[] text;
pixDestroy(&pix);
I have 2 problems with that:
The call to ocr.GetUTF8Text() is slow: 650-750 ms. The image is small, so why does it take so long?
From the image above I am getting results like "iwillkillsm", "iwillkillsel", etc. That image is simple, and I believe Tesseract gurus are able to recognize it with 100% accuracy.
What should I do with the image/code, or what should I read (and where) about Tesseract-OCR (something about recognition speed and quality) to solve those problems?

It may sound odd, but I've always had the best luck with Tesseract when I increased the dimensions of the image. The image would look "worse" to me, but Tesseract ran faster and had much better accuracy.
There is a limit to how big you can make the images before the results start getting worse again, however :) I think I remember shooting for about 600px in the past. You'll have to play with it, though.
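As a rough illustration of that advice, here is a minimal sketch using Leptonica's pixScale to enlarge the image before handing it to Tesseract; the 3x factor is just an assumption to start experimenting with:
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
tesseract::TessBaseAPI ocr;
ocr.Init(NULL, "eng");
PIX* pix = pixRead("D:\\image.png");
// Upscale the tiny source 3x before recognition; the exact factor is something to tune.
PIX* scaled = pixScale(pix, 3.0f, 3.0f);
ocr.SetImage(scaled);
char* text = ocr.GetUTF8Text();
std::string result(text);
delete[] text;
pixDestroy(&scaled);
pixDestroy(&pix);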

Related

Steganography resistant to JPEG compression and manipulation

I want to hide small image data inside a much bigger image, i.e. image steganography, and I want to make it resistant to JPEG compression/re-compression, manipulation, etc.
I was wondering if there are some widely used/standard algorithms/methods for this.
I came across many algorithms, like F5, but they seem not to be resistant against re-compression.
Also, as mentioned in some papers, methods that resist JPEG compression usually have less capacity.
Thanks in advance.
Think of it like a QR code -- with as much error correction as possible. That's why blurry pictures, or bad camera hardware, still work with most QR codes (and also why some people are able to put logos in the middle of QR codes and still get them to scan).
Basically, you'll need to design in some really good redundancy (as much as possible), and then test the image by running a Gaussian blur on it (or compressing it). There will always be a level where it won't work, but you'll have to play with that to find the best range for your business case.
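A minimal sketch of such a stress test, using OpenCV (my choice, not the answerer's); extractPayload stands in for whatever extraction routine your scheme provides, and the quality and blur settings are only assumptions to sweep over:
#include <opencv2/opencv.hpp>
#include <vector>

// Simulate the attacks the hidden payload has to survive: JPEG re-compression plus blur.
cv::Mat simulateAttacks(const cv::Mat& stego, int jpegQuality) {
    std::vector<uchar> buf;
    cv::imencode(".jpg", stego, buf, {cv::IMWRITE_JPEG_QUALITY, jpegQuality});
    cv::Mat recompressed = cv::imdecode(buf, cv::IMREAD_COLOR);
    cv::Mat blurred;
    cv::GaussianBlur(recompressed, blurred, cv::Size(5, 5), 0);
    return blurred;
}
// Run extractPayload(simulateAttacks(stego, q)) for q = 95, 85, 75, ... and record
// how much of the payload survives at each level.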

Tesseract not recognizing anything after image processing

This is the image before I processed it:
After processing, it looks like this:
I think the second one is great, but apparently I am very wrong. When I ran Tesseract on the original image, it recognized some of the text. But when I run it on the second one (the black and white one), it does not recognize anything at all! Why is that? What am I doing wrong here?
I've tested a little with your images. I think the main problem is the poor image quality; try images in a higher resolution, that could work a lot better.
I assume you only want the item names? If so, also delete the "Buy Now for:" text and the coins with their numbers (just paint them black like the rest); that made it better for me.
Also play around with the different settings of Tesseract; I had the best results with psm 1 and 6, I think.
Conclusion: higher image quality (resolution) should work best!
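For reference, the page segmentation modes mentioned above can be set through the C++ API as well as the command line; a minimal sketch, where the file name is a placeholder (PSM_SINGLE_BLOCK corresponds to psm 6, PSM_AUTO_OSD to psm 1):
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
tesseract::TessBaseAPI ocr;
ocr.Init(NULL, "eng");
// psm 6: assume a single uniform block of text (psm 1 would be PSM_AUTO_OSD).
ocr.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
PIX* pix = pixRead("processed.png");
ocr.SetImage(pix);
char* text = ocr.GetUTF8Text();
std::string result(text);
delete[] text;
pixDestroy(&pix);
From the 3.x command line the equivalent is the -psm flag, e.g. tesseract processed.png out -psm 6.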

Enlarging image without affecting clarity

I need to enlarge a downloaded image without affecting its clarity, but when it is resized the clarity is gone. Can anyone help?
Given the context, by clarity I assume you mean visual appearance. You want your upscaled image (again, I believe you are dealing with upscaling and not downscaling, since it is not specified in your problem) to look visually good. We actually can create detail almost magically, though probably not perfect detail. There are techniques specifically for working with pixelated images, hqx or http://research.microsoft.com/en-us/um/people/kopf/pixelart/paper/pixel.pdf for instance. Since that is not clear from your description either, I'm simply assuming you have images of any kind.
With these considerations, you have yet to describe what you tried. Let me guess that you tried nearest-neighbor interpolation, so you get something like:
There are other common types of interpolation, like bicubic, Lanczos, or something more modern like ICBI or http://www.cs.huji.ac.il/~raananf/projects/lss_upscale/paper.pdf. Considering the first three of those, we get the respective results:
It may be a little hard to visualize the differences among these last three, but if you zoom into the actual images you will be able to notice them. ICBI gives the sharpest edges in this case.
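If OpenCV is an option (my assumption, it is not mentioned in the question), the first three interpolation modes are built in; a minimal sketch, with a 4x factor chosen purely as an example (ICBI is not part of OpenCV):
#include <opencv2/opencv.hpp>

cv::Mat src = cv::imread("input.png");
cv::Mat nn, cubic, lanczos;
// Same 4x upscale with three different interpolation kernels.
cv::resize(src, nn,      cv::Size(), 4.0, 4.0, cv::INTER_NEAREST);
cv::resize(src, cubic,   cv::Size(), 4.0, 4.0, cv::INTER_CUBIC);
cv::resize(src, lanczos, cv::Size(), 4.0, 4.0, cv::INTER_LANCZOS4);
cv::imwrite("nn.png", nn);
cv::imwrite("cubic.png", cubic);
cv::imwrite("lanczos.png", lanczos);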
Image resizing will always affect clarity, unless you downloaded a vector graphics image. See if the image is available in a vector format, and if so, download that.
Failing that, you could check whether larger image sizes are available, as shrinking generally hurts image quality less than enlarging does.

Restoring an old manuscript with image processing

Say I have this old manuscript. What I am trying to do is process the manuscript so that all the characters in it can be recognized perfectly. What should I keep in mind while approaching such a problem, and are there any methods for doing this?
Please help, thank you.
Some graphics applications have macro recorders (e.g. Paint Shop Pro). They can record a sequence of operations applied to an image and store them as a macro script. You can then run the macro in a batch process in order to process all the images contained in a folder automatically. This might be a better option than re-inventing the wheel.
I would start by playing around with the different functions manually, in order to see what they do to your image. There is an awful lot you can try: sharpening, smoothing, and noise removal with a lot of different methods and options. You can work on the contrast in many different ways (stretching, gamma correction, expansion, and so on).
In addition, if your image has a yellowish background, then working on the red or green channel alone will probably lead to better results, because the blue channel has poor contrast there.
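A minimal OpenCV sketch of isolating a single channel in that situation (OpenCV and the file name are my assumptions; which channel works best is something to experiment with):
#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat img = cv::imread("manuscript.png");      // OpenCV loads images as BGR
std::vector<cv::Mat> channels;
cv::split(img, channels);                        // channels[0]=B, [1]=G, [2]=R
// On a yellowish page the blue channel tends to have the worst contrast,
// so continue processing with the green (or red) channel instead.
cv::Mat working = channels[1];
cv::imwrite("green_channel.png", working);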
Do you mean that you want to make it easier for people to read the characters, or are you trying to improve image quality so that optical character recognition (OCR) software can read them?
I'd recommend that you select a specific goal for readability. For example, you might want readers to be able to read the text 20% faster if the image has been processed. If you're using OCR software to read the text, set a read rate you'd like to achieve. Having a concrete goal makes it easier to keep track of your progress.
The image processing book Digital Image Processing by Gonzalez and Woods (3rd edition) has a nice example showing how to convert an image like this to a black-on-white representation. Once you have black text on a white background, you can perform a few additional image processing steps to "clean up" the image and make it a little more readable.
Sample steps (a minimal sketch of the first three follows the list):
Convert the image to grayscale.
Apply a moving average threshold to the image. If the characters are usually about the same size in an image, then you shouldn't have much trouble selecting values for the two parameters of the moving average threshold algorithm.
Once the image has been converted to just black characters on a white background, try simple operations such as a morphological "close" to fill in small gaps.
Present the original image and the cleaned image to adult readers, and time how long it takes for them to read each sample. This will give you some indication of the improvement in image quality.
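Here is that sketch, assuming OpenCV and using an adaptive mean threshold as the moving-average step; the block size and offset below are just starting values to tune against your character size:
#include <opencv2/opencv.hpp>

cv::Mat img = cv::imread("manuscript.png");
cv::Mat gray, bw, cleaned;
cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
// Moving-average style threshold: block size and offset roughly track character size.
cv::adaptiveThreshold(gray, bw, 255, cv::ADAPTIVE_THRESH_MEAN_C,
                      cv::THRESH_BINARY, 25, 10);
// Morphological close to fill small gaps in the strokes.
cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
cv::morphologyEx(bw, cleaned, cv::MORPH_CLOSE, kernel);
cv::imwrite("cleaned.png", cleaned);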
A technique called the Stroke Width Transform has been discussed on SO previously. It can be used to extract character strokes from even very complex backgrounds. The SWT would be harder to implement, but it could work for quite a wide variety of images:
Stroke Width Transform (SWT) implementation (Java, C#...)
The texture of the paper could present a problem for many algorithms. However, there are techniques for denoising images based on the Fast Fourier Transform (FFT), an algorithm you can use to find 1D or 2D sinusoidal patterns in an image (e.g. grid patterns). About halfway down the following page you can see examples of FFT-based techniques for removing periodic noise:
http://www.fmwconcepts.com/misc_tests/FFT_tests/index.html
If you find a technique that works for the images you're testing, I'm sure a number of people would be interested to see the unprocessed and processed images.
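As a starting point for that FFT route, here is a minimal OpenCV sketch that just computes the log-magnitude spectrum (OpenCV and the file name are my assumptions); periodic noise shows up as isolated bright spikes that you would then mask out before inverting the transform:
#include <opencv2/opencv.hpp>

cv::Mat gray = cv::imread("manuscript.png", cv::IMREAD_GRAYSCALE);
cv::Mat grayF;
gray.convertTo(grayF, CV_32F);
cv::Mat planes[] = {grayF, cv::Mat::zeros(gray.size(), CV_32F)};
cv::Mat complexImg;
cv::merge(planes, 2, complexImg);
cv::dft(complexImg, complexImg);                 // forward 2D DFT
cv::split(complexImg, planes);
cv::Mat mag;
cv::magnitude(planes[0], planes[1], mag);        // magnitude spectrum
mag += cv::Scalar::all(1);
cv::log(mag, mag);                               // log scale so the spikes are visible
cv::normalize(mag, mag, 0, 255, cv::NORM_MINMAX);
cv::Mat spectrum;
mag.convertTo(spectrum, CV_8U);
cv::imwrite("spectrum.png", spectrum);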

How to detect subjective image quality

For an image-upload tool I want to detect the (subjective) quality of an image automatically, resulting in a rating of the quality.
I have the following idea to realize this heuristically:
Obviously incorporate the resolution into the rating.
Compress it to JPG (75%), decompress it and compare the JPG size vs. the decompressed size to get a ratio. The blurrier the image is, the higher the ratio.
Obviously my approach would use up a lot of cycles and memory if large images are rated, although this would do in my scenario (fat server, not many uploads), and I could always build in a "short circuit" around the more expensive steps if the image exceeds a certain resolution.
Is there something else I can try, or is there a way to do this more efficiently?
Assessing image quality (the same goes for sound or video) is not an easy task, and there are numerous publications tackling the problem.
Much depends on the nature of the image: a different set of criteria is appropriate for artificially created images (e.g. diagrams) than for natural images (e.g. photographs). There are subtle effects that have to be taken into consideration, like color masking, luminance masking, and contrast perception. For some images a given compression ratio is perfectly adequate, while for others it will result in a significant loss of quality.
Here is a free-access publication giving a brief introduction to the subject of image quality evaluation.
The method you mentioned - compressing the image and comparing the result with the original - is far from perfect. What metric do you plan to use? MSE? MSE per block? It is certainly not too difficult to implement, but the results will be difficult to interpret (consider images with and without high-frequency components).
And if you want to delve deeper into the area of image quality assessment, there is also a lot of research done by the machine learning community.
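To make the metric question concrete, here is a minimal sketch of the compress-and-compare idea with plain MSE over a JPEG round trip (OpenCV is my assumption; the 75% quality comes from the question):
#include <opencv2/opencv.hpp>
#include <vector>

// Mean squared error between an image and its JPEG round trip at the given quality.
double jpegRoundTripMSE(const cv::Mat& img, int quality = 75) {
    std::vector<uchar> buf;
    cv::imencode(".jpg", img, buf, {cv::IMWRITE_JPEG_QUALITY, quality});
    cv::Mat decoded = cv::imdecode(buf, cv::IMREAD_COLOR);
    cv::Mat diff;
    cv::absdiff(img, decoded, diff);
    diff.convertTo(diff, CV_32F);
    diff = diff.mul(diff);
    cv::Scalar s = cv::sum(diff);
    return (s[0] + s[1] + s[2]) / static_cast<double>(diff.total() * diff.channels());
}
// As noted above, the raw number is hard to interpret on its own: smooth images score
// low even when they are low quality, so it only makes sense relative to similar content.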
You could try looking at the EXIF tags of the image (using something like exiftool); what you get will vary a lot. On my SLR, for example, you even get which of the focus points were active when the image was taken. There may also be something about compression quality.
The other thing to check is the image histogram: watch out for images biased to the left, which suggests under-exposure, or lots of saturated pixels.
For image blur you could look at the high-frequency components of the Fourier transform; this is probably accessing parameters related to the JPG compression anyway.
This is a bit of a tricky area because most "rules" you might be able to implement could arguably be broken for artistic effect.
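One cheap proxy for high-frequency content (a simpler stand-in for the Fourier approach mentioned above, not the answerer's exact method) is the variance of the Laplacian; a minimal OpenCV sketch, where the cutoff value is purely an assumption to calibrate on your own images:
#include <opencv2/opencv.hpp>

// Variance of the Laplacian: low values mean few high-frequency details, i.e. likely blur.
double blurScore(const cv::Mat& img) {
    cv::Mat gray, lap;
    cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
    cv::Laplacian(gray, lap, CV_64F);
    cv::Scalar mean, stddev;
    cv::meanStdDev(lap, mean, stddev);
    return stddev[0] * stddev[0];
}
// e.g. flag images scoring below ~100 as "probably blurry"; the threshold needs tuning,
// and artistic shallow depth of field will trip it up, as discussed above.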
I'd like to shoot down the "obviously incorporate resolution" idea. Resolution tells you nothing. I can scale an image by a factor of 2, quadrupling the number of pixels. This adds no information whatsoever, nor does it improve quality.
I am not sure about the "compress to JPG" idea either. JPG is a photo-oriented algorithm, and not all images are photos. Besides, a blue sky compresses quite well, and a uniformly grey one even better. Do you think the exact cloud types determine the image quality?
Sharpness is a bad idea, for similar reasons. Depth of field is not trivially related to image quality. Items photographed against a black background will have a lot of pixels with quite low intensity, intentionally. Again, this does not signal underexposure, so the histogram isn't a good quality indicator by itself either.
But what if the photos are "commercial"? Do the existing techniques still have value if the photos are of every-day objects and purposefully non-artistic?
If I hire hundreds of people to take pictures of park benches, I want to quickly know which pictures are of better quality (in focus, well lit) and which aren't. I don't want pictures of kittens, people, sunsets, etc.
Or what if the pictures are supposed to be of items for a catalog? No models, just garments. Would image-quality processing help there?
I'm also really interested in working out how blurry a photograph is.
What about this:
measure the byte size of the image when compressed as JPEG
downscale the image to 1/4th
upscale it 4x, using some kind of basic interpolation
compress that version using JPEG
compare the sizes of the two compressed images.
If the size did not go down much (past some percentage threshold), then downscaling and upscaling did not lose much information, so the original image is effectively the same as something that has already been zoomed, as sketched below.
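A minimal sketch of that comparison, assuming OpenCV; the JPEG quality, the linear 1/4 downscale, and the interpolation choices are all just assumptions:
#include <opencv2/opencv.hpp>
#include <vector>

// Ratio close to 1.0 means the 1/4-size round trip barely shrank the compressed size,
// which suggests the original had little real detail (i.e. it was effectively "zoomed").
double roundTripSizeRatio(const cv::Mat& img) {
    std::vector<uchar> original, roundTrip;
    cv::imencode(".jpg", img, original, {cv::IMWRITE_JPEG_QUALITY, 75});
    cv::Mat small, restored;
    cv::resize(img, small, cv::Size(), 0.25, 0.25, cv::INTER_AREA);
    cv::resize(small, restored, img.size(), 0, 0, cv::INTER_LINEAR);
    cv::imencode(".jpg", restored, roundTrip, {cv::IMWRITE_JPEG_QUALITY, 75});
    return static_cast<double>(roundTrip.size()) / static_cast<double>(original.size());
}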
