For my new assignment I am looking for a method to detect the presence of text in an image. The image is a map, for example a Google map. The task is to detect where the street and city labels are placed.
I know that the OpenCV library has algorithms that can detect features (for example human faces), such as the Haar classifier or HOG (histogram of oriented gradients), but I have heard that training such algorithms is quite difficult.
Do you know of any algorithm, method or library that could do that (detect the presence of text in an image)?
Thanks,
John
There is a standard problem in computer vision called text detection in images. It is quite different from OCR: OCR is concerned with what the text says, while text detection is about determining whether there is text in the image. Adi Shavit's third link is a method to address this problem. You can look on Google Scholar for well-cited articles on text detection.
There are several possible approaches you can take.
Use OCR. A search for OCR on Stack Overflow will show many options, including Tesseract and OCRopus.
If your text uses a very specific, fixed font, you may get away with simple template matching.
In the more general case, you might want to take a look at "Detecting Text in Natural Scenes with Stroke Width Transform".
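For the fixed-font case, template matching means sliding an image of each glyph over the map and scoring every placement. A minimal pure-Python sketch using the sum of squared differences (SSD), assuming grayscale images stored as 2D lists; OpenCV's matchTemplate with TM_SQDIFF computes the same score much faster:

```python
def match_template_ssd(image, template):
    """Slide `template` over `image` (both 2D lists of grayscale values)
    and return (row, col, score) of the placement with the lowest sum of
    squared differences; a score of 0 means an exact match."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = 0
            for dr in range(th):
                for dc in range(tw):
                    d = image[r + dr][c + dc] - template[dr][dc]
                    ssd += d * d
            # Keep the first placement with the smallest SSD.
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best
```

In practice you would accept every placement whose score is below some threshold, since a map contains many labels, rather than only the single best one.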
UPDATE Jan. 2017
The OpenCV 3.2 contrib module now has a text detection module.
It also includes a sample (C++, Python) of how to use it.
You need to tune this to a specific type of map image, or the problem is going to be very difficult (see the previous post for links to articles).
OCR is the way to go, and you should use an existing library. However, OCR is mainly done on text on plain white backgrounds. To reduce your problem to a regular OCR problem, you should work in the color space of the map: the map text most likely has a very specific color, and this may be enough to find its pixels. You can then filter the detected pixels based on the size of their connected regions.
If you only want to find the locations of the text labels, you can do the above and skip the OCR step. If the labels are not too close together, simple clustering algorithms can be used to find their respective positions.
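The color-plus-region-size idea can be sketched in pure Python: mark pixels whose color is close to the label color, group them into 8-connected regions, drop tiny regions, and report each region's bounding box. The label color, per-channel tolerance `tol` and `min_size` below are hypothetical parameters you would tune for a given map style; the clustering of nearby boxes into whole labels is not shown.

```python
from collections import deque


def find_label_regions(pixels, text_rgb, tol=30, min_size=2):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    regions whose color is within `tol` of `text_rgb` on every channel,
    keeping only regions with at least `min_size` pixels."""
    h, w = len(pixels), len(pixels[0])
    mask = [[all(abs(p - t) <= tol for p, t in zip(pixels[y][x], text_rgb))
             for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill of one connected region.
            queue = deque([(y, x)])
            seen[y][x] = True
            count, top, left, bottom, right = 0, y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                count += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Size filter: isolated specks of the right color are discarded.
            if count >= min_size:
                boxes.append((top, left, bottom, right))
    return boxes
```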
Related
I am thinking of using the OpenCV library for image analysis. Basically, I want my project to automate the extraction of the label image from a wine bottle.
This is the sample input image:
This is the sample output:
I am wondering what my general strategy for extracting the label should be. I am not asking for code; I just want to know the general approach to solving the problem.
Thanks!
Sorry for the vague answer, but in applied computer vision there is no such thing as a general approach.
Some will disagree, of course, but in reality all CV applications are custom-made for a specific purpose or task.
In your case, the idea is to find a cylindrical, probably standing, object (the bottle) and then find the irregular parts in it.
I would do it like this:
1. Remove as much noise as possible (smoothing/sharpening filters).
2. (Optionally) reduce the image data (for example via the (i)FT or (i)DCT).
3. Segment the objects (usually by homogeneity of color, by edge detection, or by both).
4. Identify the bottle object (by color, shape, or illumination, since glass is transparent).
5. Identify the objects inside the bottle (homogeneous, not transparent, usually with sharp edges; color is not a good cue because some labels are black on dark glass).
6. (Optionally) project the label back from cylindrical space to a flat texture.
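Step 6 can be approximated if you assume an orthographic view of an upright cylinder: a point at arc length s from the front of the bottle appears at horizontal offset r·sin(s/r) from the axis. A minimal sketch under those assumptions, where the axis column `cx` and radius (both in pixels) are taken as already known from step 4; a real photo has perspective distortion, so this is only a first approximation.

```python
import math


def unwrap_cylinder(image, cx, radius):
    """Map the visible half of a vertical cylinder back to a flat strip.

    Assumes orthographic viewing, the cylinder axis at column `cx` and a
    radius in pixels. Output column k corresponds to arc length
    s = k - half_arc, sampled (nearest neighbour) from source column
    cx + radius * sin(s / radius)."""
    half_arc = int(radius * math.pi / 2)
    width = len(image[0])
    flat = []
    for row in image:
        out_row = []
        for k in range(2 * half_arc + 1):
            s = k - half_arc
            x = int(round(cx + radius * math.sin(s / radius)))
            x = min(max(x, 0), width - 1)  # clamp to the image bounds
            out_row.append(row[x])
        flat.append(out_row)
    return flat
```

Columns near the bottle's silhouette get stretched the most, which is exactly the foreshortening being undone.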
[notes]
Create an app with many scrollbars and checkboxes, so you can change all thresholds and enable or disable filters (or change their order) on the fly.
All parts will take a lot of tweaking of thresholds and weights.
You will have to do a lot of trial-and-error runs to find the best filters and their configuration for your task.
I am learning image processing and I am starting my first project: simple number recognition in an image.
So far I have applied thresholding to the image. Now I would like to know some algorithms by which my system can recognize the number in the image. The algorithm should preferably be simple, and it doesn't have to be robust, as I will be generating the images in Paint using the same font.
I have looked at similar questions here on SO and they all point to using libraries. Remember that I am trying to learn, so please don't point me to libraries.
Are the numbers printed or hand-written?
The Computer Vision System Toolbox (MATLAB) includes a function called ocr, which recognizes both letters and numbers.
If you are looking for hand-written digit recognition, please take a look at this example.
I am working on a project in which I need to highlight the differences between a pair of scanned images of text.
Example images are here and here.
I am building a web app based on HTML and JS for this.
I found that OpenCV supports highlighting differences between two images.
I also saw that ImageMagick has similar support.
Does OpenCV support automatic image registration?
Is there a JS module for OpenCV?
Which one is better suited to my purpose?
1. Simplistic way:
Suppose the images are perfectly aligned and similarly illuminated: subtract one image from the other pixel by pixel, then threshold the result, filter out noisy blobs, and select the biggest ones. Good for a school project.
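The simplistic way can be sketched in pure Python on grayscale images stored as 2D lists: take the absolute pixel difference, threshold it, and drop changed pixels that have no changed neighbour, a crude noise filter. The threshold of 40 and the one-neighbour rule are illustrative assumptions; selecting the biggest blobs would follow with a connected-components pass, which is not shown here.

```python
def diff_mask(img_a, img_b, threshold=40):
    """Absolute pixel difference of two aligned grayscale images,
    thresholded into a binary change mask (1 = changed)."""
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(ra, rb)]
            for ra, rb in zip(img_a, img_b)]


def remove_isolated(mask, min_neighbors=1):
    """Keep a changed pixel only if at least `min_neighbors` of its
    8 neighbours are also changed."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                n = sum(mask[ny][nx]
                        for ny in range(max(0, y - 1), min(h, y + 2))
                        for nx in range(max(0, x - 1), min(w, x + 2))) - 1
                if n >= min_neighbors:
                    out[y][x] = 1
    return out
```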
2. A bit more complicated:
Align the images, then find a way to make the illumination uniform, then apply the simplistic way.
How to align:
Find the text area in both images, as the region darker than the background color.
Find its corners.
Use getPerspectiveTransform() to find the transform between the images.
Use warpPerspective() to warp one image onto the other.
Another way to register the two images is by feature matching, which has quite extensive support in OpenCV: findHomography() will estimate the transform (homography) between the two images from a larger set of matching points.
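getPerspectiveTransform() computes the 3x3 homography that maps four source corners onto four destination corners, by fixing the bottom-right entry to 1 and solving an 8x8 linear system. A minimal pure-Python sketch of that computation, assuming the corner pairs from the previous step are already known (the coordinates in the test are made up); warping every pixel, which is what warpPerspective() does, is not shown.

```python
def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def perspective_transform(src, dst):
    """3x3 homography H (with H[2][2] = 1) mapping four src points to four dst points.
    Each pair gives u*(h6*x + h7*y + 1) = h0*x + h1*y + h2, and likewise for v."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(H, x, y):
    """Map one point through H, including the perspective division."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```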
3. Canonical answer:
Align the images.
Convert it to text with an OCR engine.
Compare the text in the two images.
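Once an OCR engine has produced a transcript for each image, the comparison step can be done at word level with Python's standard difflib module. A minimal sketch, assuming the OCR output is already available as two strings; the OCR call itself is not shown.

```python
import difflib


def text_changes(old_text, new_text):
    """List word-level edits between two OCR transcripts as
    (operation, old_words, new_words) tuples, skipping unchanged runs."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    return [(op, old_words[i1:i2], new_words[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes() if op != "equal"]
```

For example, text_changes("the quick brown fox", "the slow brown fox jumps") reports one replacement (quick to slow) and one insertion (jumps). Because OCR also returns word positions, each changed word can then be highlighted in the image.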
Well, besides the great help given by vasile, you also need an answer for the web app part.
In order to make it work on a server, you will probably need a file upload form, as well as a response from the server with the result of the algorithm. There are several ways to do it, depending on your server restrictions. If you can run command-line programs, you would probably implement the highlighting algorithm with OpenCV as a program that takes the two input files and an output file as arguments. A PHP script could then be used for uploading the files, calling the command-line program, and returning the result to the user.
Another approach could be to use Java and JavaCV in a web container such as Apache Tomcat.
Best regards,
Daniel
Say I have this old manuscript. I am trying to process it so that all the characters in it can be recognized perfectly. What should I keep in mind?
Are there any methods for approaching such a problem?
Please help, thank you.
Some graphics applications have macro recorders (e.g. Paint Shop Pro). They can record a sequence of operations applied to an image and store them as a macro script. You can then run the macro in a batch process, in order to process all the images in a folder automatically. This might be a better option than reinventing the wheel.
I would start by playing around with the different functions manually, in order to see what they do to your image. There are an awful lot of things you can try: sharpening, smoothing and noise removal, with many different methods and options. You can work on the contrast in many different ways (stretching, gamma correction, expansion, and so on).
In addition, if your image has a yellowish background, then working on the red or green channel alone would probably lead to better results, because the blue channel then has poor contrast.
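The channel choice can also be made automatically by measuring each channel's contrast. A minimal sketch that uses the population standard deviation (RMS contrast) as the contrast measure, which is an assumption rather than the only sensible choice; pixels are (R, G, B) tuples in a 2D list.

```python
import statistics


def best_channel(pixels):
    """Return the index (0 = R, 1 = G, 2 = B) of the channel with the
    highest RMS contrast (population standard deviation) over all pixels."""
    flat = [p for row in pixels for p in row]
    contrasts = [statistics.pstdev([p[i] for p in flat]) for i in range(3)]
    return max(range(3), key=lambda i: contrasts[i])
```

On a yellowish page (high red and green, low blue) with dark ink, red or green wins, because ink and paper differ strongly there, while in the blue channel both are dark.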
Do you mean that you want to make it easier for people to read the characters, or are you trying to improve image quality so that optical character recognition (OCR) software can read them?
I'd recommend that you select a specific goal for readability. For example, you might want readers to be able to read the text 20% faster if the image has been processed. If you're using OCR software to read the text, set a read rate you'd like to achieve. Having a concrete goal makes it easier to keep track of your progress.
The image processing book Digital Image Processing by Gonzalez and Woods (3rd edition) has a nice example showing how to convert an image like this to a black-on-white representation. Once you have black text on a white background, you can perform a few additional image processing steps to "clean up" the image and make it a little more readable.
Sample steps:
Convert the image to grayscale.
Apply a moving average threshold to the image. If the characters are usually about the same size in an image, then you shouldn't have much trouble selecting values for the two parameters of the moving average threshold algorithm.
Once the image has been converted to just black characters on a white background, try simple operations such as morphological "close" to fill in small gaps.
Present the original image and the cleaned image to adult readers, and time how long it takes for them to read each sample. This will give you some indication of the improvement in image quality.
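The moving-average threshold scans the image line by line in a zigzag pattern; the threshold at each pixel is b times the mean of the last n scanned values, and pixels brighter than that become background. A pure-Python sketch of that idea followed by a 3x3 binary closing of the ink. The defaults n=20 and b=0.5 are illustrative, and the border handling (ignoring out-of-image neighbours) is an assumption.

```python
from collections import deque


def moving_average_threshold(gray, n=20, b=0.5):
    """Zigzag moving-average thresholding of a grayscale 2D list.
    A pixel becomes background (255) if it is brighter than b times the
    mean of the last n scanned pixels (itself included), else ink (0).
    Until n pixels have been seen, the missing ones count as 0."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    window, total = deque(), 0
    for y in range(h):
        xs = range(w) if y % 2 == 0 else range(w - 1, -1, -1)  # zigzag scan
        for x in xs:
            window.append(gray[y][x])
            total += gray[y][x]
            if len(window) > n:
                total -= window.popleft()
            out[y][x] = 255 if gray[y][x] > b * total / n else 0
    return out


def _morph(ink, dilate):
    """3x3 dilation (any neighbour is ink) or erosion (all neighbours are ink),
    considering only neighbours inside the image."""
    h, w = len(ink), len(ink[0])
    op = any if dilate else all
    return [[op(ink[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)] for y in range(h)]


def close_ink(binary):
    """Morphological closing of the black (0) ink: dilate, then erode."""
    ink = [[v == 0 for v in row] for row in binary]
    closed = _morph(_morph(ink, True), False)
    return [[0 if v else 255 for v in row] for row in closed]
```

Closing bridges gaps narrower than the structuring element, such as a broken stroke, while leaving the overall outline of each character roughly unchanged.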
A technique called the Stroke Width Transform (SWT) has been discussed on SO previously. It can be used to extract character strokes even from very complex backgrounds. The SWT would be harder to implement, but could work for quite a wide variety of images:
Stroke Width Transform (SWT) implementation (Java, C#...)
The texture in the paper could present a problem for many algorithms. However, there are techniques for denoising images based on the Fast Fourier Transform (FFT), an algorithm you can use to find 1D or 2D sinusoidal patterns in an image (e.g. grid patterns). About halfway down the following page you can see examples of FFT-based techniques for removing periodic noise:
http://www.fmwconcepts.com/misc_tests/FFT_tests/index.html
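The core of FFT-based periodic-noise removal is a notch filter: transform, find the strongest non-constant frequency, zero it and its mirror bin, and transform back. A 1D pure-Python illustration using a plain O(n²) DFT, assuming a single scan line with one dominant periodic component; on a real page you would use a 2D FFT (for example numpy.fft.fft2) and notch the peaks in the 2D spectrum.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform, computed directly (O(n^2))."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def remove_strongest_periodic(signal):
    """Zero the strongest non-DC frequency bin k and its mirror n - k
    (a notch filter), then invert. Returns (k, filtered signal)."""
    X = dft(signal)
    n = len(X)
    k = max(range(1, n // 2 + 1), key=lambda i: abs(X[i]))
    X[k] = 0
    X[(n - k) % n] = 0
    return k, [v.real for v in idft(X)]
```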
If you find a technique that works for the images you're testing, I'm sure a number of people would be interested to see the unprocessed and processed images.
I'm trying to find an efficient method, of acceptable complexity, to:
detect an object in an image so I can isolate it from its surroundings;
segment that object into its sub-parts and label them so I can then fetch them at will.
It's been three weeks since I entered the image processing world, and I've read about so many algorithms (SIFT, snakes, more snakes, Fourier-related, etc.) and heuristics that I don't know where to start or which one is "best" for what I'm trying to achieve. Given that the image dataset of interest is a pretty large one, I don't even know whether I should use an algorithm implemented in OpenCV or implement my own.
To summarize:
Which methodology should I focus on? Why?
Should I use OpenCV for that kind of stuff or is there some other 'better' alternative?
Thank you in advance.
EDIT -- More info regarding the datasets
Each dataset consists of 80K images of products that share the same:
concept, e.g. t-shirts, watches, shoes
size
orientation (90% of them)
background (95% of them)
All pictures in each dataset look almost identical, apart from the product itself. To make things a little clearer, let's consider only the "watch dataset":
All the pictures in the set look almost exactly like this:
(again, apart from the watch itself). I want to extract the strap and the dial. The thing is that there are lots of different watch styles and therefore shapes. From what I've read so far, I think I need a template algorithm that allows bending and stretching, so that it can match straps and dials of different styles.
Instead of creating three distinct templates (upper part of the strap, lower part of the strap, dial), it would be reasonable to create only one and segment it into three parts. That way, I could be confident that the parts were detected in their intended positions relative to each other, e.g. that the dial would not be detected below the lower part of the strap.
Of all the algorithms and methodologies I've encountered, active shape/appearance models seem to be the most promising. Unfortunately, I haven't managed to find a decent implementation, and I'm not confident enough that this is the best approach to go ahead and write one myself.
If anyone could point out what I should be really looking for (algorithm/heuristic/library/etc.), I would be more than grateful. If again you think my description was a bit vague, feel free to ask for a more detailed one.
From what you've said, here are a few things that pop up at first glance:
The simplest thing to do is to binarize the image and run connected components labeling using OpenCV or the cvBlob library. For simple images with a non-complex background, this usually yields the objects.
However, looking at your sample image, texture-based segmentation techniques may work better: the watch dial, the straps and the background differ widely in texture/roughness, and this could be an ideal way to separate them.
The roughness of a region can be easily found with the Eigen transform (explained a bit on SO; check the link to the research paper provided there), and the Mean Shift filter can then be applied to the output of the Eigen transform. This will give regions clearly separated according to texture. Both pyramidal Mean Shift and finding eigenvalues by SVD are implemented in OpenCV, so unless you can optimize your own code, it's better (and easier) to use the built-in functions as far as speed and efficiency are concerned.
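For reference, connected components labeling on a binarized image is usually done in two passes: assign provisional labels while recording which ones touch (with union-find), then replace each label by its set's final number. A pure-Python sketch with 4-connectivity; OpenCV's connectedComponents does the same job natively.

```python
def label_components(binary):
    """Two-pass 4-connected component labeling of a binary image
    (2D list of 0/1). Returns (labels, count); background stays 0."""
    h, w = len(binary), len(binary[0])
    parent = [0]  # parent[i] is the union-find parent of provisional label i

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    labels = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up and left:
                # Both neighbours labelled: merge their sets.
                ru, rl = find(up), find(left)
                labels[y][x] = min(ru, rl)
                parent[max(ru, rl)] = min(ru, rl)
            elif up or left:
                labels[y][x] = up or left
            else:
                parent.append(len(parent))  # new provisional label
                labels[y][x] = len(parent) - 1
    # Second pass: replace provisional labels by compact final labels.
    final = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)
```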
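Mean shift groups samples by moving each one uphill in density until it settles on a mode; samples that reach the same mode form one region. OpenCV's pyrMeanShiftFiltering applies this jointly in image space and color space; the sketch below only illustrates the idea in a 1D feature space with a flat kernel. The per-pixel roughness values in the test are hypothetical stand-ins for the output of the Eigen transform.

```python
def mean_shift_1d(values, bandwidth, tol=1e-6, max_iter=100):
    """Shift every value to its local density mode using a flat kernel:
    x <- mean of all samples within `bandwidth` of x, until x stops moving."""
    modes = []
    for v in values:
        x = v
        for _ in range(max_iter):
            near = [u for u in values if abs(u - x) <= bandwidth]
            new_x = sum(near) / len(near)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes


def group_by_mode(modes, merge_dist):
    """Give each sample a region id, merging modes closer than merge_dist."""
    centers, ids = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) <= merge_dist:
                ids.append(i)
                break
        else:
            centers.append(m)
            ids.append(len(centers) - 1)
    return ids
```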
I think I would turn the problem around. Instead of hunting for the dial, I would use a set of robust features from the watch to "stitch" the target image onto a template. The first watch has a set of white squares in the dial; the second watch has a number of white circles. For each type of watch, I would:
Segment out the squares or circles in the dial. Segmentation steps can be tricky, as they are usually both scale- and lighting-dependent.
Estimate the centers or corners of the feature areas found above. These are the new feature points.
Use the Hungarian algorithm to match features between the template watch and the target watch. Alternatively, one can take the surroundings of each feature point in the original image and match these using cross-correlation.
Use the matching features between the template and the target to estimate scaling, rotation and translation.
Stitch the image.
As the image is now in a known form, one can extract the regions simply via preset coordinates.
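The matching and estimation steps can be sketched in pure Python. For the assignment, the sketch substitutes brute force over all permutations, which finds the same optimal one-to-one matching the Hungarian algorithm would but is only practical for a handful of points; the cost matrix (for example descriptor or cross-correlation distances) is assumed to be given. The scale, rotation and translation are then estimated by least squares, treating 2D points as complex numbers so that the model is w = a·z + b with a = s·e^(iθ).

```python
import cmath
import itertools


def match_points(cost):
    """Optimal assignment for a square cost matrix: returns perm such that
    template point i is matched to target point perm[i]. Brute force,
    same result as the Hungarian algorithm but only for small n."""
    n = len(cost)
    best = min(itertools.permutations(range(n)),
               key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)))
    return list(best)


def estimate_similarity(src, dst):
    """Least-squares scale s, rotation theta (radians) and translation t
    such that dst is approximately s * R(theta) * src + t."""
    z = [complex(*p) for p in src]
    w = [complex(*p) for p in dst]
    zm, wm = sum(z) / len(z), sum(w) / len(w)
    # Optimal a after centring both point sets; b follows from the means.
    a = (sum((wi - wm) * (zi - zm).conjugate() for zi, wi in zip(z, w))
         / sum(abs(zi - zm) ** 2 for zi in z))
    b = wm - a * zm
    return abs(a), cmath.phase(a), (b.real, b.imag)
```

With the matched pairs reordered by the permutation, the returned scale, angle and offset are what you need to warp the target watch onto the template before cutting out regions at preset coordinates.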