Tesseract OCR limit area in command line

Is there any way to limit the area of an image that Tesseract recognizes from the command line? That is, I want to set the coordinates of a region, in effect cropping the image.
I looked for this in the usage information but couldn't find it. Maybe I can do this with a config file. I'd appreciate any comment on the subject.

This thread has the answer to your question:
Tesseract: Specifying regions of text
Here is the answer from that link:
Calling tesseract with the parameter "-psm 4" and giving the uzn file the same base name as the image seems to work.
Example: If we have C:\input.tif and C:\input.uzn, we do this:
tesseract -psm 4 C:\input.tif C:\output
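
The same workflow can be sketched in Python. This is a minimal sketch, assuming the .uzn file holds one zone per line in the form `left top width height label` (an assumption about the file layout; the answer above does not spell it out). The helper names `write_uzn` and `tesseract_command`, the coordinates, and the label are hypothetical.

```python
import os
import tempfile


def write_uzn(image_path, zones):
    """Write zones next to the image as <image base name>.uzn and return its path.

    Each zone is (left, top, width, height, label). The line format is an
    assumption about the uzn layout, not something the answer confirms.
    """
    uzn_path = os.path.splitext(image_path)[0] + ".uzn"
    with open(uzn_path, "w") as fh:
        for left, top, width, height, label in zones:
            fh.write(f"{left} {top} {width} {height} {label}\n")
    return uzn_path


def tesseract_command(image_path, output_base):
    """Build the command from the answer: tesseract -psm 4 <image> <output base>."""
    return ["tesseract", "-psm", "4", image_path, output_base]


# Demo in a temporary directory; the image itself is not created here.
demo_dir = tempfile.mkdtemp()
image = os.path.join(demo_dir, "input.tif")
uzn = write_uzn(image, [(15, 10, 480, 200, "Text")])
cmd = tesseract_command(image, os.path.join(demo_dir, "output"))
print(uzn)
print(" ".join(cmd))
# To run OCR for real: subprocess.run(cmd, check=True) with tesseract on PATH.
```

Newer Tesseract releases (4 and later) spell the option `--psm`, so the command list may need adjusting for the installed version.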

Related

Postscript PPD Watermark on every page

I am trying to print a watermark on every printed page with the help of a custom PPD for a printer. I believe (please correct me if I'm wrong) that I will need to call the script that adds the watermarks from inside the PPD, with a line similar to this one, which I found online:
cupsFilter: "application/vnd.cups-postscript 100 /usr/lib/cups/filter/watermark"
What would the watermark filter need to look like? I could not find any examples of one.
Is it possible to incorporate a PostScript file as the watermark? How is the watermark added to the printed document?
Thank you very much in advance!
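
One possible shape for such a filter, as a rough sketch rather than a tested CUPS setup: CUPS runs a filter with the arguments job, user, title, copies, options and an optional file name, the filter reads PostScript from that file or from stdin, and it writes the result to stdout. The sketch below injects a PostScript `BeginPage` procedure through `setpagedevice` so that a grey text is drawn on every page before the page content. The function names, the text "DRAFT", the font, and the position are all hypothetical choices, and the sketch assumes the printer's interpreter supports PostScript Level 2 and that the document does not replace `BeginPage` itself.

```python
import sys

# PostScript snippet; BeginPage runs at the start of every page.
# It receives the page count on the operand stack, hence the "pop".
WATERMARK_PS = (
    "<< /BeginPage {{ pop gsave 0.85 setgray "
    "/Helvetica-Bold findfont 72 scalefont setfont "
    "150 250 translate 45 rotate 0 0 moveto ({text}) show "
    "grestore }} >> setpagedevice\n"
)


def escape_ps(text):
    """Escape backslashes and parentheses for a PostScript string literal."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def add_watermark(ps_source, text="DRAFT"):
    """Insert the watermark snippet after the DSC header comments if present."""
    prolog = WATERMARK_PS.format(text=escape_ps(text))
    marker = ps_source.find("%%EndComments")
    if marker != -1:
        anchor = marker
    elif ps_source.startswith("%!"):
        anchor = 0
    else:
        return prolog + ps_source
    line_end = ps_source.find("\n", anchor)
    if line_end == -1:
        return ps_source + "\n" + prolog
    return ps_source[: line_end + 1] + prolog + ps_source[line_end + 1:]


def main(argv):
    # CUPS invokes a filter as: filter job user title copies options [file]
    if len(argv) > 6:
        with open(argv[6], "rb") as fh:
            data = fh.read()
    else:
        data = sys.stdin.buffer.read()
    # latin-1 maps bytes 0-255 one to one, so binary sections survive the round trip.
    text = add_watermark(data.decode("latin-1"))
    sys.stdout.buffer.write(text.encode("latin-1"))


# Only act as a filter when called with the CUPS argument list.
if __name__ == "__main__" and len(sys.argv) >= 6:
    main(sys.argv)
```

Because the snippet is plain PostScript, a more elaborate watermark could be pasted into the `BeginPage` procedure in place of the text-drawing commands.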

Windows Tesseract OCR getting scattered HOCR out put instead of clean standard format

Quick help would be highly appreciated. I am extracting the text from a TIFF image with Tesseract OCR.
The output I am looking for is hOCR (HTML).
The content of the output is perfect, but the formatting looks very disorganized when I open the file in Notepad.
When I open the same file in Notepad++, the format is clean.
The Windows command line is given below:
Tesseract "Path\image.tiff" "Path\output" HOCR
I need your help getting the organized hOCR format in Notepad, as enclosed.
How do I get organized hOCR data when I open the file with Notepad?

The problem is not in Tesseract but in Notepad. Use a more capable text editor such as Notepad++ or ConTEXT.
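
If the file really has to be opened in Notepad, one thing to try is converting its line endings. This is a small sketch under the assumption that the hOCR file uses Unix-style LF line endings, which older versions of Notepad display as a single long line; the answer above does not confirm that this is the cause. The function name `to_crlf` and the sample bytes are hypothetical.

```python
def to_crlf(data):
    """Convert LF-only line endings to CRLF without doubling existing CRLF pairs."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


sample = b"<div class='ocr_page'>\n<span class='ocrx_word'>150</span>\r\n</div>\n"
converted = to_crlf(sample)
print(converted)

# For a real file (path is a placeholder):
# with open("output.hocr", "rb") as fh:
#     fixed = to_crlf(fh.read())
# with open("output_crlf.hocr", "wb") as fh:
#     fh.write(fixed)
```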

How to set image in axes in matlab GUI?

I have code that produces several images as output, and I want to show each of these images in a particular axes of a MATLAB GUI. I'm trying to build a GUI around the code.
For example:
figure,imshow(s1);
figure,imshow(s2);
figure,imshow(s2&s1);
I want to show the output image of the first command in, say, axes3, the output image of the second command in axes4, and the last output image in axes5.
I know I need to use the
set(handles.axes...)
command, but I don't know the exact syntax for showing an image in a particular axes.
Please explain how to do this with a suitable example. Thanks in advance.

A one-line solution (for each image) is to set the axes as the parent of the image within the imshow command:
imshow(image_Data,'Parent',handles.axes1)
There should be no need to open the additional figure windows (assuming the axes are within the GUI).
So specifically for the question above:
imshow(s1,'Parent',handles.axes3);
imshow(s2,'Parent',handles.axes4);
imshow(s2&s1,'Parent',handles.axes5);

First, create an axes box in your GUI and give it a name in its Tag property, e.g. "original". Then, in the editor, use it in code like this:
A = imread (Path);
axes(handles.original);
imshow(A);
I hope this helps.

OSX: automated (every 1-2sec) screenshot (not full screen but (x,y,w,h)) using python

I want to take screenshots on OSX using Python. I don't want full-screen shots, only certain rectangles on the screen, something like (291,305,213,31). I need the exact pixels because the image files are afterwards processed by OCR (python-tesseract) to extract the text.
By the way, this is the first time I have programmed in 6 years, and so far I only know a bit of Java. I started yesterday and gave up this morning at 4am, so basically I have no clue yet. For example, I still cannot build with Sublime because of path settings, but that's a different story. I can't figure out everything in one day.
I have already tried the following:
- wxPython
  The results are black images; see also:
  stackoverflow.com/questions/8644908/take-screenshot-in-python-cross-platform
  Additionally, it only works in 32-bit mode, but when I do OCR using python-tesseract, OpenCV requires 64-bit.
- autopy
  I got errors when trying to install it; see also:
  stackoverflow.com/questions/12993126/errors-while-installing-python-autopy
- ImageGrab
  Windows only:
  effbot.org/imagingbook/imagegrab.htm
- Command-line screencapture
  os.system('screencapture test.png')
  When I found this I thought it looked nice, but according to man screencapture it only does full screen. Then I found this: guides.macrumors.com/screencapture
  -R capture screen rect
  That would already be enough, but on OSX 10.7.5 I don't have this option. Any ideas?
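- Quartz.CoreGraphics (PyObjC)
  neverfear.org/blog/view/156/OS_X_Screen_capture_from_Python_PyObjC
  import Quartz.CoreGraphics as CG
  # Capture rectangle (x, y, w, h); values from the example above
  region = CG.CGRectMake(291, 305, 213, 31)
  # Create screenshot as CGImage
  image = CG.CGWindowListCreateImage(
      region,
      CG.kCGWindowListOptionOnScreenOnly,
      CG.kCGNullWindowID,
      CG.kCGWindowImageDefault)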
Unfortunately, the result is not an image file but a CGImage, and I have no idea how to save it as a file.
So, if possible, I would like to use the command-line screencapture with -R, if somebody knows how, just as a start.
Are there any other command-line tools available?
What about other libraries that I have missed?
Cheers
M

Given that you can get a CGImageRef, you can get its pixel data using the techniques described in Technical Q&A QA1509: Getting the pixel data from a CGImage object. In particular, it shows this function for getting the pixel data as a CFDataRef:
CFDataRef CopyImagePixels(CGImageRef inImage)
{
    return CGDataProviderCopyData(CGImageGetDataProvider(inImage));
}
and says:
The pixel data returned by CGDataProviderCopyData has not been color
matched and is in the format that the image is in, as described by the
various CGImageGet functions …
It shows an alternative for getting the pixel data in other formats if you need that.
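
For the command-line route raised in the question, here is a minimal Python sketch. It assumes a macOS version whose screencapture supports `-R<x,y,w,h>` (the asker reports that 10.7.5 does not), and it includes a helper for the fallback of capturing the full screen and cropping afterwards. The function names and the file name are hypothetical, and the cropping step in the comments assumes the Pillow library is installed.

```python
def screencapture_command(path, rect=None):
    """Build a screencapture call; rect is (x, y, w, h) or None for full screen."""
    cmd = ["screencapture", "-x"]  # -x: do not play the shutter sound
    if rect is not None:
        x, y, w, h = rect
        cmd.append(f"-R{x},{y},{w},{h}")  # only on versions that support -R
    cmd.append(path)
    return cmd


def crop_box(rect):
    """Convert (x, y, w, h) to the (left, upper, right, lower) box used for cropping."""
    x, y, w, h = rect
    return (x, y, x + w, y + h)


rect = (291, 305, 213, 31)
print(screencapture_command("shot.png", rect))
print(crop_box(rect))

# Fallback when -R is missing (requires macOS and Pillow):
# import subprocess
# from PIL import Image
# subprocess.run(screencapture_command("full.png"), check=True)
# Image.open("full.png").crop(crop_box(rect)).save("shot.png")
```

On high-resolution displays the saved image may use a different pixel scale than the screen coordinates, so the crop box may need scaling.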

Why only 1 image out of 2 is correctly read by tesseract?

It's my first experience with tesseract, I'm trying to read the digits contained in these tiff images:
http://imageshack.us/g/703/64553021.png/
As you can see, they are in the same format and have the same width and height. I don't know why tesseract returns the correct output only for the second image ("150") and returns blank output for the first one.
Maybe I should modify them to suit tesseract better? How? I can use ImageMagick if needed.
Thanks in advance.

In the readme they say:
In the executable, page layout analysis is enabled by default. You may need to turn it off to process small images. No command-line control for this yet. Sorry. See tesseractmain.cpp.
I think your images are too small. Try editing the code and recompiling.
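
An alternative that avoids recompiling, offered as an untested sketch: enlarge the image with ImageMagick first, then run tesseract with a page segmentation mode meant for a single line of text. This assumes ImageMagick's `convert` is on the PATH and a Tesseract 3.x build where `-psm` is available, which is newer than the version the readme quote describes. The helper names, the file names, and the 300% scale factor are hypothetical.

```python
def upscale_command(src, dst, percent=300):
    """ImageMagick call that enlarges src by the given percentage into dst."""
    return ["convert", src, "-resize", f"{percent}%", dst]


def ocr_command(image, output_base, psm=7):
    """Tesseract 3.x style call; mode 7 treats the image as a single text line."""
    return ["tesseract", image, output_base, "-psm", str(psm)]


steps = [
    upscale_command("digits.tif", "digits_big.tif"),
    ocr_command("digits_big.tif", "digits"),
]
for step in steps:
    print(" ".join(step))
# To execute: subprocess.run(step, check=True) for each step.
```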
