Improving speed of text recognition using OpenCL - parallel-processing

I'm using Tesseract's GetUTF8Text() method to extract text from an image. However, it takes about 10 - 30 seconds depending on the size of the image. Can I use OpenCL to call the same function multiple times on different parts of the image? Or does Tesseract already use OpenCL to speed up the text recognition process? I want to know if I will be able to get any speedup by splitting my image into multiple parts and extracting text using GetUTF8Text().
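The splitting idea in the question can be sketched roughly as follows. This is only an illustration of dividing an image into overlapping horizontal strips and processing them concurrently. It does not use OpenCL, `recognize_strip` is a hypothetical placeholder rather than a Tesseract call, and whether a real OCR engine would run faster this way is not established here. The overlap is an assumption meant to reduce the chance of cutting a text line in half.

```python
from concurrent.futures import ThreadPoolExecutor


def horizontal_strips(width, height, count, overlap=0):
    """Return (left, top, right, bottom) boxes covering the image in `count` strips."""
    base = height // count
    boxes = []
    for i in range(count):
        top = max(0, i * base - overlap)
        # The last strip always extends to the bottom edge of the image.
        bottom = height if i == count - 1 else min(height, (i + 1) * base + overlap)
        boxes.append((0, top, width, bottom))
    return boxes


def recognize_strip(box):
    """Hypothetical placeholder: a real version would run OCR on this region."""
    left, top, right, bottom = box
    return f"<text from rows {top}-{bottom}>"


# Image dimensions are made up for the example.
boxes = horizontal_strips(width=2000, height=3000, count=4, overlap=20)
with ThreadPoolExecutor(max_workers=4) as pool:
    texts = list(pool.map(recognize_strip, boxes))
```

Text that falls inside an overlap region would be recognized twice, so a real pipeline would need to deduplicate it when joining the results.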

Related

Does image file type matter in terms of accuracy or speed when training/evaluating in machine learning?

I would like to know if the image file type matters at all in image classification using Keras, Tensorflow, or any other machine learning library. For example:
If I were to train using only JPG files, will the accuracy be significantly affected if I were to evaluate the model using only PNG files?
If so, will it be better to train using both JPG and PNG files so I can evaluate using both types?
Or does the image file type not matter at all?
The file type does not matter.
During training (and inference, for that matter), images are converted into tensors (you can think of a tensor as a multidimensional array), where each pixel is represented by a small group of numbers (or a single number for black-and-white images).
Machine learning is performed on these tensors rather than the image itself so the original file format really doesn't matter.
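As a rough illustration of that point, the sketch below decodes the same 2x2 image from two different file encodings and ends up with the same nested array. It uses the ASCII (P3) and binary (P6) PPM formats as stand-ins, because decoding JPG or PNG needs a third-party library. The `decode_ppm` helper is hypothetical, handles only comment-free files with a maximum value of 255, and is not how Keras or TensorFlow load images.

```python
import re


def decode_ppm(data):
    """Decode a comment-free P3 (ASCII) or P6 (binary) PPM into rows of [R, G, B]."""
    if data[:2] == b"P3":
        tokens = data.split()
        width, height = int(tokens[1]), int(tokens[2])
        values = [int(t) for t in tokens[4:]]  # tokens[3] is the max value
    elif data[:2] == b"P6":
        header = re.match(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s", data)
        width, height = int(header.group(1)), int(header.group(2))
        values = list(data[header.end():])  # raw bytes, one per channel
    else:
        raise ValueError("unsupported format")
    pixels = [values[i:i + 3] for i in range(0, len(values), 3)]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


# The same red, green, blue, white image stored in two encodings.
ascii_file = b"P3\n2 2\n255\n255 0 0  0 255 0\n0 0 255  255 255 255\n"
binary_file = b"P6\n2 2\n255\n" + bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255])

tensor_a = decode_ppm(ascii_file)
tensor_b = decode_ppm(binary_file)
print(tensor_a == tensor_b)
```

Both results have shape height x width x 3, which is the kind of array a model would receive regardless of how the file was stored.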

How to do ocr fast in MATLAB?

The task is relatively simple: it only needs to recognize numbers. It needs to be fast (within 0.5 seconds on a 2016 computer). A sample picture is attached. I need to create a function that recognizes the two sets of numbers in the attached picture and puts them into two arrays. I tried the OCR function in the MATLAB Computer Vision Toolbox, but it produced many errors and was very slow.
I am thinking of predefining the 10 digits (0 to 9) and then using pixel-by-pixel comparison. I think that will give high accuracy, but it will be slow.
Any suggestion?
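The pixel-by-pixel comparison idea from the question could look roughly like the sketch below. It is written in Python rather than MATLAB, the 3x5 templates are made up for illustration (only three digits are shown), and it assumes each digit has already been cropped and binarized to the template size, which is the harder part in practice.

```python
# Hypothetical 3x5 binary templates; a real set would cover all ten digits.
TEMPLATES = {
    0: ["111", "101", "101", "101", "111"],
    1: ["010", "110", "010", "010", "111"],
    7: ["111", "001", "010", "010", "010"],
}


def score(glyph, template):
    """Count the pixels on which the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def classify(glyph):
    """Return the digit whose template agrees with the glyph on the most pixels."""
    return max(TEMPLATES, key=lambda digit: score(glyph, TEMPLATES[digit]))


# A "7" with one pixel flipped in the fourth row.
noisy_seven = ["111", "001", "010", "011", "010"]
print(classify(noisy_seven))
```

Each comparison touches only width x height pixels per template, so the cost grows with the template size and the number of templates rather than with the size of the whole picture.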

Performance issue with Java2D gradients and iText PDF

I am using iText PDF 5.4 along with the Java2D interface (java.awt.Graphics canvas), and I have a severe problem with gradients.
I am painting many rectangular shapes whose paint is a LinearGradientPaint. This results in large files (e.g., 10 MB), and trying to open the results in, e.g., Preview.app brings the computer to a total halt. The problem seems to be memory usage, because the first few dozen boxes paint rather quickly and then performance degrades roughly linearly with more boxes, which means that a typical page takes >10 minutes to open.
Adobe Acrobat is also slow but at least it takes some 4 or 5 seconds instead of several minutes.
Is this a bug in iText? Is there a setting or tweak in iText that controls the representation of gradients? I guess it decomposes them into hundreds of separate paint commands instead of using a direct gradient component (if that exists; I know it exists in SVG, but for PDF I have no clue).
The constraint is that I stay within awt.Graphics; I cannot rewrite my rendering code to not use Java2D.
An alternative idea would be to use Apache Batik and output to SVG instead. There is an example that shows how to enable the correct transcoding of LinearGradientPaint to the SVG equivalent.
EDIT: There seems to be a new Java2D-to-SVG library JFreeSVG. Recent changes indicate that gradients are implemented.

Matlab SVM for getting output in binary form

I need to use an SVM on features extracted from images. The output of the SVM should be binary. Please share any resources that are available. I am working on sclera detection. I manually marked the sclera region in the training images and extracted features in the form of histogram values for each sub-region of the image. Now I will extract the same features from the test image. Once I have the features from the test image, I need to use the SVM to compare the test and training features and decide whether a particular block corresponds to sclera or not. If I get the output as 0s and 1s, I can use it to segment the region more easily.
Look at libsvm for Matlab, which is probably the most widely used free SVM toolbox in Matlab.
There is a useful post about libsvm in Matlab right here. When you download libsvm, it contains several examples for Matlab that I was able to run in a few minutes. You can easily modify one of those examples to get your binary output.

Optimize the SVG output from Gnuplot

I've been trying to plot a dataset containing about 500,000 values using gnuplot. Although the plotting went well, the SVG file it produced was too large (about 25 MB) and takes ages to render. Is there some way I can reduce the file size?
I have a vague understanding of the SVG file format, and I realize that this is because SVG is a vector format and thus has to store all 500,000 points individually.
I also tried Scour and re-printing the SVG without any success.
The time it takes to render your SVG file is proportional to the amount of information in it. Thus, the only way to speed up rendering is to reduce the amount of data.
I think it is a little tedious to fiddle with an already generated SVG file. I would suggest reducing the amount of data that gnuplot has to plot.
Maybe gnuplot's every keyword or some other reduction of the data can help, such as splitting the data into multiple plots...
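One way to reduce the data before it reaches gnuplot is to thin it in a small script. The sketch below shows two options under stated assumptions: `every_nth` keeps every N-th point, similar in spirit to gnuplot's every keyword, while `minmax_buckets` is a swapped-in alternative that keeps the lowest and highest value in each block so that spikes survive. The data is synthetic and the function names are made up for the example.

```python
def every_nth(points, step):
    """Keep every step-th (x, y) point."""
    return points[::step]


def minmax_buckets(points, bucket_size):
    """Keep the lowest and highest y in each bucket of consecutive points."""
    reduced = []
    for start in range(0, len(points), bucket_size):
        bucket = points[start:start + bucket_size]
        lowest = min(bucket, key=lambda p: p[1])
        highest = max(bucket, key=lambda p: p[1])
        # The set drops the duplicate when both extremes are the same point;
        # sorting restores x order within the bucket.
        reduced.extend(sorted({lowest, highest}))
    return reduced


# Synthetic stand-in for the 500,000 values mentioned in the question.
points = [(x, (x * 37) % 101) for x in range(500_000)]

thinned = every_nth(points, 100)
envelope = minmax_buckets(points, 100)
print(len(points), len(thinned), len(envelope))
```

Either reduced list can be written back out as a two-column text file and plotted as usual. Plain subsampling is simpler, but it can miss narrow peaks that the min/max variant retains.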
I would recommend keeping it in vector graphic format and then choosing a resolution for the document that you put it in later.
Main reason for doing this is that you might one day use that image in a poster (for example) and print it at hundreds of times the current resolution.
I normally convert my final pdf into djvu format.
pdf2djvu --dpi=600 -o my_file_600.djvu my_file.pdf
This lets me specify the resolution of the document as a whole (including the text), rather than different resolutions scattered throughout.
On the downside, it does mean having a large PDF for the original document. However, this can be mitigated if you are using LaTeX to make your original PDF: you can use the draft option until you have finished, so that images are not imported during your day-to-day editing of the text (where rendering large images would be annoying).
Did you try printing to PDF and then converting to SVG?
On Linux, you can do that with ImageMagick, which you may even be able to use to reduce the size of your original SVG file.
Or there are online converters, such as http://image.online-convert.com/convert-to-svg
