Extract text from rectangle on Windows screen without using OCR

Given a rectangle that represents an area on a Windows screen that contains text, what is the best way to extract the text?
I know that it is possible using OCR, but even after significant preprocessing, the quality is really poor.
Getting the window text using the Win32 API does not always work either.
Assuming that the text was rendered using a font, is it possible to get it from there?
Any directions would be extremely helpful. Thanks!

Given a rectangle that represents an area on a Windows screen, the best way to extract text is indeed OCR. Use a better OCR library like this one from Microsoft.
Getting the window text through the Win32 API often does not work well because the rectangle may contain multiple windows. You would have to find every window the rectangle contains and send each one a message to get its text. That is difficult but not impossible, and even if you manage it, you will run into issues such as text alignment. OCR is your best option.
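The step of working out which windows the rectangle contains can be sketched separately from the Win32 calls. This is a minimal Python sketch, assuming the child windows have already been enumerated (for example with EnumChildWindows and GetWindowRect) and their text fetched (for example with WM_GETTEXT) into plain records. The `Control` record and both function names are hypothetical, not part of any API.

```python
from typing import List, NamedTuple


class Control(NamedTuple):
    """Hypothetical record for one enumerated window: screen rect plus its text."""
    left: int
    top: int
    right: int
    bottom: int
    text: str


def intersects(c: Control, left: int, top: int, right: int, bottom: int) -> bool:
    """True if the control's rectangle overlaps the target rectangle."""
    return c.left < right and c.right > left and c.top < bottom and c.bottom > top


def text_in_rect(controls: List[Control], left: int, top: int,
                 right: int, bottom: int) -> List[str]:
    """Return the text of every overlapping control in rough reading order."""
    hits = [c for c in controls
            if c.text and intersects(c, left, top, right, bottom)]
    # Sort top-to-bottom, then left-to-right, to approximate reading order.
    hits.sort(key=lambda c: (c.top, c.left))
    return [c.text for c in hits]


# Made-up sample data standing in for real enumerated windows.
controls = [
    Control(200, 10, 300, 30, "OK"),
    Control(10, 10, 100, 30, "Name:"),
    Control(10, 500, 100, 520, "Status"),
]
print(text_in_rect(controls, 0, 0, 400, 100))
```

Sorting by top edge and then left edge is only a rough stand-in for layout. It does nothing about the alignment problems mentioned above, and controls that draw their own text will still return nothing useful.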

It does seem possible without using OCR, as NirSoft SysExporter can do this:
https://www.nirsoft.net/utils/sysexp.html
This may be suitable for programmatic use as it can be run from a command line:
Starting from version 1.70, you can export the content of Windows
control from command-line, without displaying any user interface.
You may not be able to target it at a specific rectangle on the screen, but maybe the same result could be achieved by first scraping everything followed by some post-processing.
Further basic info:
SysExporter utility allows you to grab the data stored in standard
list-views, tree-views, list boxes, combo boxes, text-boxes, and
WebBrowser/HTML controls from almost any application running on your
system, and export it to text, HTML or XML file.
...
Known Limitations
SysExporter can export data from most combo boxes, list boxes,
tree-view, and list-view controls, but not from all of them. There are
some applications that use these controls to display data, but the
data itself is not actually stored in the control, but in another
location in the computer's memory. In such cases, SysExporter won't be
able to export the data.
Personally I've used it to grab text from what look like label controls.

Related

Text Block Text Animation Like Live Tile Text

Hi everyone, see the image below.
In this screen, the three blocks are images. I am trying to bind some text to a TextBlock and display it at timed intervals with a Live Tile look and feel. Can anyone suggest how I can do this?
If you want to mimic Live Tiles, I suggest that you take a look at The Windows Phone Toolkit. It contains the HubTile control which is live and is probably the thing you are looking for.
If not, you have the source code and you can check how they did it. This way you can replicate the behavior and then customize it.

Extract text from X11 GUIs?

I have a point-of-sale application written in Perl/Tk. I use X11::GUITest to do automated testing of it, driving the app via hot-keys bound to the buttons and other widgets (it's normally touch-screen driven). However, X11::GUITest doesn't have a way to "read" text back from the screen, so I resort to augmenting the app to write temp files as well as putting data on the screen. The test scripts then look at the temp files, not the GUI. But I'd love to extend X11::GUITest or make a new CPAN module that can scrape text strings from X11 GUIs. I'm not after graphics-to-text conversion; it's my (faint) understanding that somewhere in the depths of the X window system, label text and such are stored as text strings and rendered to bitmap form late in the pipeline (?).
Anyone have guidance on how to do this, or pointers on where to start?
Yeah, I know I should've adhered to better MVC separation and not actually test at the GUI level, but just below it; however reality got in the way and it is what it is!
The best way to do this is to make your program work with an accessibility framework like ATK (used in GTK applications) and then use that to query for the strings as a screen reader would have to for text-to-speech translation. This is the approach taken by the Linux Desktop Testing Project and dogtail testing frameworks. You get the bonus of using existing, well tested code and making your application more usable by disabled users (as may be required by laws such as the Americans with Disabilities Act in the US and similar laws in many other countries).
If your application is using modern font frameworks, like libXft2, this may be your only choice, as those strings are only in the client application, not the X server, and the character to pixmap conversion is done in the client. (If your text is antialiased, it must be using these instead of the legacy X11 APIs.)
Even with the legacy X11 APIs though, the X server doesn't store the strings once the text to bitmap conversion is done, so there's no good way to query them short of intercepting them in that case.
The program listres lists resources in widgets, including label texts and I think the contents of text entry boxes. You may be able to use its output directly, extracting what you need, or you may need to look at the source and see how it's done.

Image/form to Pascal/Delphi code converter?

Does anyone know of an editor that allows you to visually design a form (by form I do not mean a DFM or Delphi form, but a "paper form", like those pre-printed forms that you fill in with some info) and that generates Pascal commands to draw that form on a Printer (or Image) canvas?
What I want is an easy way to draw/design this form visually, composed just of lines and text, and a way to convert this to Pascal commands that, when run, will draw that form on a Canvas (Image or Printer), respecting the original layout and scale regardless of the DPI of the Canvas it is drawn on.
Update: Maybe I wasn't clear enough about what I need and why I need it. I developed an open source component called TFreeBoleto (freeboleto.sf.net). It is used to generate and print bank billets (a common method for billing people in Brazil). Right now, the component uses a TBitmap image containing the "billet" mask, and TextOut methods for the dynamic areas (i.e. billet number, customer name, etc.). It looks fine on the screen, but some people complain that the quality of the printed image is not good. The component uses a BltTBitmapAsDib procedure to maximize the quality of printing, but some people still think it is not good enough. So, my idea was to avoid using a bitmap image as the form layout, and draw everything directly on the canvas (both form and printer). Check here for a sample of what a bank billet looks like.
Of course ReportBuilder and/or FastReport could solve the problem, but they are not free, so I cannot include them in the component. I need a "native" solution that any standard Delphi install would be able to compile.
You might get what you want out of the Fast Reports Report Designer which is a commercial reporting system for Delphi. Remember that a report is just a page. That page can be shown on the screen or printed on the printer.
You also might find that something like TRichView helps you.
Whether using TRichView in particular or not, I would look into using HTML to do what you want. I would use HTML+CSS to do both a screen and printer layout, that can also be viewed on the web. For simple text layout plus text boxes I think even bare HTML and HTML tables might be sufficient. To visually design simple text pages, using a Delphi application, I would use TRichView.
In both cases, you would be creating documents, not code. To create code that creates a page, without using any document system, would be very difficult indeed, and I am not sure what you would really do with that code, since you would need a compiler or interpreter to convert that code into something that you could use. Please clarify what you mean by "creating code", and what syntax you would want that code to be using. If HTML is code in your definition of "code" then maybe HTML is the best kind of "code" for your problem.
I do my form-work with WPTools. It is also a commercial product. The core is a very good wordprocessor and form-designer. The engine can render text and forms to any canvas (screen, printer, also create pdf) and is highly flexible. Output is mainly rtf and html.
I also see no advantage in creating Pascal code to redraw the form. What you need, I think, is a good WYSIWYG editor which creates a document that fits your needs.
Check out ReportBuilder at http://www.digital-metaphors.com/
It is a commercial reporting tool for Delphi - around a long time, very high quality, with all native Delphi source code packaged with it. I am using it for an important commercial project right now and I recommend it highly (I'm not working for them.) I've used MANY Delphi reporting tools over the years and this one is the best IMO.
RBuilder also has extensive support for paper form emulation; see:
http://www.digital-metaphors.com/products/report_design/form_emulation.html
I haven't worked with that feature, but you can download a full-featured demo and try it.
You can use Adobe Acrobat (full version) to create forms.
Then you can use the free Acrobat Reader, or another COM object in your application, to display and print the forms.
I think it is the best solution for you.
PS
All reporting tools that are included with Delphi are free to use for designing forms and free to distribute if the user only previews and prints already-designed reports.
The same applies to Adobe Acrobat (you may distribute forms), but you added that you need to print the form with some text over it. It may be easier with reports, but the same is possible with PDF.
Most report engines are not open source but are free to distribute. There are many components for creating PDF: paid (one-time), free, and open source.
PPS
I have read your update a second time. Since you are using TBitmap and can already use TextOut, you could use TMetafile instead. There are many editors for metafiles, and metafiles are free to distribute.
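The requirement in the question, drawing lines and text directly at the right scale whatever the canvas DPI, comes down to keeping the layout in physical units and converting to pixels per device. This is a minimal Python sketch of that arithmetic, not Delphi code and not how TFreeBoleto works. The layout data and function names are hypothetical, and the command names only mirror the TCanvas methods MoveTo, LineTo and TextOut.

```python
from typing import List, Tuple

MM_PER_INCH = 25.4


def mm_to_px(mm: float, dpi: int) -> int:
    """Convert a length in millimetres to device pixels at the given DPI."""
    return round(mm * dpi / MM_PER_INCH)


# Hypothetical form description: lines and labels positioned in millimetres.
LINES = [(10.0, 10.0, 200.0, 10.0)]          # x1, y1, x2, y2
LABELS = [(12.0, 12.0, "Customer name")]     # x, y, text


def draw_commands(dpi: int) -> List[Tuple]:
    """Translate the layout into pixel-based drawing commands for one device."""
    cmds: List[Tuple] = []
    for x1, y1, x2, y2 in LINES:
        cmds.append(("MoveTo", mm_to_px(x1, dpi), mm_to_px(y1, dpi)))
        cmds.append(("LineTo", mm_to_px(x2, dpi), mm_to_px(y2, dpi)))
    for x, y, text in LABELS:
        cmds.append(("TextOut", mm_to_px(x, dpi), mm_to_px(y, dpi), text))
    return cmds


for cmd in draw_commands(96):    # assumed screen resolution
    print(cmd)
for cmd in draw_commands(600):   # assumed printer resolution
    print(cmd)
```

The same layout yields different pixel coordinates per device but the same physical size on paper. Font sizes and line widths would need the same conversion, which this sketch leaves out.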

Get the word under the mouse cursor in Windows

Greetings everyone,
A friend and I are discussing the possibility of a new project: A translation program that will pop up a translation whenever you hover over any word in any control, even static, non-editable ones. I know there are many browser plugins to do this sort of thing on webpages; we're thinking about how we would do it system-wide (on Windows).
Of course, the key difficulty is figuring out the word the user is hovering over. I'm aware of MSAA and Automation, but as far as I can tell, those things only allow you to get the entire contents of a control, not the specific word the mouse is over.
I stumbled upon this (proprietary) application that does pretty much exactly what we want to do: http://www.gettranslateit.com/
Somehow they are able to get the exact word the user is hovering over in almost any application (It seems to have trouble in a few apps, notably Windows Explorer). It even grabs text out of obviously custom-drawn controls, somehow. At first I thought it must be using OCR. But even when I shrink the font so far down that the text becomes a completely unreadable blob, it can still recognize words perfectly. (And yet, it doesn't recognize anything if I change the font to Wingdings. But maybe that's by design?)
Any ideas as to how it's achieving this seemingly impossible task?
EDIT: It doesn't work with Wingdings, but it does work with some other nonsense fonts, so I've confirmed it can't be OCR.
You could capture the GDI calls that output text to the display, and then figure out which word's bounding box the cursor falls in.
Well, for GDI controls you can get the position and size of the control, and you can usually get the font info. For example, with static text controls you'd use WM_GETFONT. Then once you have that you can get the position of the mouse relative to the position of the control and use one of the font functions, perhaps something like GetTextExtentPoint32 to figure out what is under the cursor. I'm pretty sure the answer lies in that direction...
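The hit-testing step described above, measuring text extents and finding which word the cursor falls in, can be sketched without any Windows dependency. This is a minimal Python sketch assuming a single line of left-aligned text and a cursor offset already made relative to the control. The `measure` callback is hypothetical: on Windows it could wrap GetTextExtentPoint32 with the font obtained via WM_GETFONT, while here a fixed-width stand-in is used.

```python
import re
from typing import Callable, Optional


def word_at_x(text: str, x: int, measure: Callable[[str], int]) -> Optional[str]:
    """Return the word whose horizontal extent contains pixel offset x.

    `measure` is a hypothetical callback returning the rendered width in
    pixels of a string in the control's font.
    """
    for match in re.finditer(r"\S+", text):
        start = measure(text[:match.start()])   # width of everything before the word
        end = measure(text[:match.end()])       # width up to the end of the word
        if start <= x < end:
            return match.group()
    return None  # cursor is over whitespace or past the end of the text


def fixed_width(s: str) -> int:
    """Stand-in for a real font: every character is 8 pixels wide."""
    return 8 * len(s)


print(word_at_x("Hello big world", 52, fixed_width))  # prints "big"
```

Measuring the whole prefix rather than summing per-word widths is deliberate, since real fonts apply kerning and the widths of pieces need not add up exactly. Multi-line, centered, or right-aligned text would need extra handling that this sketch does not attempt.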
You can run dumpbin /imports on the other application and see what APIs they are calling.

Mirroring a portion of the screen to an external display (in OSX)

I would like to write a program that can mirror a portion of the main display into a new window. Ideally this new window could then be displayed on an external monitor. I have seen a utility for a flight sim that does this on a PC (a multifunction display extractor).
Click here for a screenshot of the program (MFD Extractor)
This would be a live window, i.e. a constantly updated video display, not just a static graphic.
I have looked at screen magnifiers and VNC clients for ideas, but I think I need to write something from scratch. I have tried to do some reading on OS X programming, but where do I start in terms of gaining access to the display? I somehow need to extract the graphics from a particular program. Is it best to go near the final output stage (the individual pixels sent to the display) or somewhere nearer the window management stage?
Any ideas or pointers would be much appreciated. I just need somewhere to start from.
Regards,
There are a few ways to do this:
Quartz Display Services will let you get access to the video memory for a screen.
Quartz Window Services (a.k.a. CGWindow) will let you create an image of everything that lies below a window. If you create a borderless, transparent, empty, high-level window whose frame occupies an entire screen, everything below it will be everything on that screen. (Of course, you could create a smaller window in order to copy a section of the screen.)
There's also a way to do it using OpenGL that I never fully understood. That technique is demonstrated by a couple of code samples, OpenGLScreenSnapshot and OpenGLCaptureToMovie. It's more or less obsoleted by CGWindow, though.
Each of those will get you an image that you can then show or write to a file or something.
To show an image, use NSImageView or IKImageView. If you want to magnify it, IKImageView has a zoomFactor property, but if you want nearest-neighbor scaling (like Pixie, DigitalColor Meter, or xScope), I think you'll need to write a custom view for that (but even that isn't all that hard).
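The nearest-neighbor scaling mentioned above is simple enough to sketch on its own. This is a language-neutral illustration in Python, operating on a plain 2D grid of pixel values rather than on a Cocoa view or a captured CGImage. The function name is hypothetical, and only integer magnification factors are handled.

```python
from typing import List, Sequence


def scale_nearest(pixels: Sequence[Sequence[int]], factor: int) -> List[List[int]]:
    """Enlarge a 2D grid of pixel values by an integer factor.

    Each source pixel becomes a factor x factor block, with no blending
    between neighbours.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out: List[List[int]] = []
    for row in pixels:
        # Repeat every pixel horizontally...
        wide = [p for p in row for _ in range(factor)]
        # ...then repeat the widened row vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


tile = [[1, 2],
        [3, 4]]
for row in scale_nearest(tile, 2):
    print(row)
```

A custom view would do the equivalent while drawing, typically by turning interpolation off in the graphics context rather than by copying pixels by hand.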
