Greetings everyone,
A friend and I are discussing the possibility of a new project: A translation program that will pop up a translation whenever you hover over any word in any control, even static, non-editable ones. I know there are many browser plugins to do this sort of thing on webpages; we're thinking about how we would do it system-wide (on Windows).
Of course, the key difficulty is figuring out the word the user is hovering over. I'm aware of MSAA and Automation, but as far as I can tell, those things only allow you to get the entire contents of a control, not the specific word the mouse is over.
I stumbled upon this (proprietary) application that does pretty much exactly what we want to do: http://www.gettranslateit.com/
Somehow they are able to get the exact word the user is hovering over in almost any application (It seems to have trouble in a few apps, notably Windows Explorer). It even grabs text out of obviously custom-drawn controls, somehow. At first I thought it must be using OCR. But even when I shrink the font so far down that the text becomes a completely unreadable blob, it can still recognize words perfectly. (And yet, it doesn't recognize anything if I change the font to Wingdings. But maybe that's by design?)
Any ideas as to how it's achieving this seemingly impossible task?
EDIT: It doesn't work with Wingdings, but it does work with some other nonsense fonts, so I've confirmed it can't be OCR.
You could capture the GDI calls that output text to the display, and then figure out which word's bounding box the cursor falls in.
Well, for GDI controls you can get the position and size of the control, and you can usually get the font info. For example, with static text controls you'd use WM_GETFONT. Then once you have that you can get the position of the mouse relative to the position of the control and use one of the font functions, perhaps something like GetTextExtentPoint32 to figure out what is under the cursor. I'm pretty sure the answer lies in that direction...
You can run dumpbin /imports on the other application and see what APIs they are calling.
Related
For a screen sharing collaborative app, it is desirable to give the remote party a mouse cursor also.
Is it possible to leverage NSCursor?
If not, I guess I must create a small accurately positioned top-level window displaying a bitmap (I've asked here if there is some way to retrieve the bitmaps from the system cursors).
Code/link contributions appreciated. If I figure it out first I will (as always) answer my own question.
Given a rectangle that represents an area on a Windows screen that contains text, what is the best way to extract the text?
I know that it is possible using OCR, but even after significant pre processing, the quality is really poor.
Getting the Window Text using Win32 API does not always work as well.
Assuming that the text was rendered using a font, is it possible to get it from there?
Any directions would be extremely helpful. Thanks!
Given a rectangle that represents an area on window screen, the best way to extract text is indeed OCR. Use a better OCR library like this one from Microsoft.
The reason getting the window text using Win32 API does not work well is because there may be multiple windows in that rectangle. You will have to find out what all windows the rectangle contains and send a message to get the text for each window. It is not impossible but difficult to do and even if you manage to do that, you will run into issues of text alignment, etc. OCR is your best option.
It does seem possible without using OCR, as NirSoft SysExporter can do this:
https://www.nirsoft.net/utils/sysexp.html
This may be suitable for programmatic use as it can be run from a command line:
Starting from version 1.70, you can export the content of Windows
control from command-line, without displaying any user interface.
You may not be able to target it at a specific rectangle on the screen, but maybe the same result could be achieved by first scraping everything followed by some post-processing.
Further basic info:
SysExporter utility allows you to grab the data stored in standard
list-views, tree-views, list boxes, combo boxes, text-boxes, and
WebBrowser/HTML controls from almost any application running on your
system, and export it to text, HTML or XML file.
...
Known Limitations
SysExporter can export data from most combo boxes, list boxes,
tree-view, and list-view controls, but not from all of them. There are
some applications that use these controls to display data, but the
data itself is not actually stored in the control, but in another
location in the computer's memory. In such cases, SysExporter won't be
able to export the data.
Personally I've used it to grab text from what look like label controls.
I would like to write a program that can mirror a portion of the main display into a new window. Ideally this new window could then be displayed on an external monitor. I have seen this uiltity for a flightsim that does this on a pc (a multifunction display extractor).
CLick here for a screenshot of the program (MFD Extractor)
This would be a live window ie. constantaly updated video display not just a static graphic.
I have looked at screen magnifiers or vnc clients for ideas but I think I need to write something from scratch. I have tried to do some reading on osx programing but where do I start in terms of gaining access to the display? I somehow need to extract the graphics from a particular program. Is it best to go near the final output stage (the individual pixels sent to the display) or somewhere nearer the window management stage.
Any ideas or pointers would be much appreciated. I just need somewhere to start from.
Regards,
There are a few ways to do this:
Quartz Display Services will let you get access to the video memory for a screen.
Quartz Window Services (a.k.a. CGWindow) will let you create an image of everything that lies below a window. If you create a borderless, transparent, empty, high-level window whose frame occupies an entire screen, everything below it will be everything on that screen. (Of course, you could create a smaller window in order to copy a section of the screen.)
There's also a way to do it using OpenGL that I never fully understood. That technique is demonstrated by a couple of code samples, OpenGLScreenSnapshot and OpenGLCaptureToMovie. It's more or less obsoleted by CGWindow, though.
Each of those will get you an image that you can then show or write to a file or something.
To show an image, use NSImageView or IKImageView. If you want to magnify it, IKImageView has a zoomFactor property, but if you want nearest-neighbor scaling (like Pixie, DigitalColor Meter, or xScope), I think you'll need to write a custom view for that (but even that isn't all that hard).
I'm developing a Vista/Win7 Desktop Gadget that uses a translucent g:background (doc) area with g:text (doc) on top. I'm adding the text via addTextObject (doc), and this all works as expected.
However, I can't figure out how to set that text to bold style. There doesn't seem to be a way to do this directly via the exposed properties that I can see, and I can't use regular text + CSS in this case due to the fact this text is placed onto a g:background object.
I have also tried specifying a bold font directly, such as Arial Bold (doesn't work) instead of Arial (works).
So how can this be done?
Edit: I have tried setting font-weight:bold for both the body and the g:background object that parents my text; no luck.
See Flip Calendar, by Jonathan Abbott. His code is usually well commented so maybe you can get some ideas from that.
EDIT
The source of my information was from the early days of Vista Beta 2 where that was the official word from MS. I also found the following response to a thread on the MSDN forums regarding the Flip Calendar gadget itself:
http://social.msdn.microsoft.com/Forums/en-US/sidebargadfetdevelopment/thread/841e9d5e-32e9-453f-bd0e-dc5a4e607c33/
The gadget has options for setting bold font on the day of the month (a g:text object) but on closer inspection it doesn't work. Sorry about that. The MS guys have been known to be wrong as well on one or more occasions. I can honestly say that I don't use the g:text object.
This means your only (well, non activex route) option is VML text, which provides a lot of flexibility on layout. However, you will have to place it on a fully opaque area of the gadget which is probably why you wanted to use the addTextObject in the first place. Gary Beene's site really helped me out when I was getting started, but it doesn't go into any detail on the v:textbox element and the v:textpath element, though the MSDN documentation goes into enough detail on these.
If you need to place the text on a non-fully opaque area of the gadget, then you could still go the VML route and place an image behind the text that acts as a shadow, starting out fully opaque and fading to fully transparent. This is how Microsoft does text in window title bars with aero enabled.
Alternatively, you could create an ActiveXObject that draws the text you need in the font you want and saves the image to a temporary file in the gadget folder. Then you set that to the src of an addImageObject. I've done something similar in a gadget and it's fast enough not to be noticeable. You can also set min/max dimensions so shrinking/stretching to fit becomes a breeze.
Pardon my frustration. I've asked about this in many places and I seriously don't think that there wouldn't be a way in Windows 7 SDK to accomplish this.
All I want, is to capture part of a 'child window' ( setParent() ) created by a parent. I used to do this with bitblt() but the catch is that the child window can be any type of application, and in my case has OpenGL running in a section of it. If I bitblt() that, then the OGL part comes blank, doesn't get written to the BMP.
DWM, particularly dwmRegisterThumbnail() doesn't allow thumbnail generation of child windows. So please give me a direction.
Thanks.
It's been a while since I did any of this, so my explanation might be a bit vague, but from what I remember, the Windows doesn't "see" the OpenGL rendered inside the window.
What Windows does is create the window at the specified size and then "hands it over" to OpenGL for rendering. This means that you can't get at the pixels as rendered from the Windows side of the code.
When we wanted to capture the 3D we had to re-render the screen to an off screen bitmap which was then saved (or printed).
Obviously a whole screen capture (Print Screen) works because it's reading the final pixels.
I suggest that you:
Forget the Thumbnail part of the task (in terms of capture).
Calculate where your window is.
Capture full screen.
Excise the area you are interested in (using data from step 2).
Rescale to the appropriate thumbnail size.
Sorry, its more work, but it should work, which is better than what you have right now.
This may help:
http://code.google.com/p/telekinesis/source/browse/trunk/Mac/Source/glgrab.c?r=140
http://www.codeproject.com/KB/dialog/screencap.aspx
Also Java's Robot class (http://java.sun.com/javase/6/docs/api/java/awt/Robot.html#createScreenCapture%28java.awt.Rectangle%29)
I don't have access to the source code of any child window that may be open including the one with OpenGL