My goal is to screen-scrape a portion of a program which constantly updates with new text. I have tried OCR with Tesseract but I believe it would be much more efficient to somehow intercept the text if possible. I have attempted using the GetWindowText() function, but it only returns the window title. Using Window Detective I have determined that whenever the window updates in the way I wish to capture, a WM_PAINT message is reliably sent to the window.
I have looked a bit into Windows API Hooks, but it seems that most of these techniques involving DLL injection are intended at sending new messages, not accessing the content of already sent messages.
How should I approach this problem?
When you say 'screen-scrape', is that what you really mean? Reading your post, it sounds like you actually want to get at the text in the child window or control in question - as text, and not just as a bitmap. To do that, you will need to:
Determine which child window or control actually contains the text you want to get at. It sounds like you may have already done that but if not, the tool of choice is generally Spy++. (Please note: the version of Spy that you use must match the 'bitness' of your application.)
Then, firstly, try to figure out whether the text in that window can be retrieved somehow. If it's a standard Windows control (specifically EDIT or RICHEDIT) then there are documented ways to do that, see MSDN.
If that doesn't pan out, you might have some success hooking calls to ExtTextOut(), although that's not a pleasant proposition and I think you might struggle to achieve it. That said, I believe the accepted way (in some sense of the word 'accepted') is here.
With reference to point 3, even if you achieve it, how would you know whether any particular call to ExtTextOut() was drawing to the window you're interested in? Answer, most likely, HWND WindowFromDC().
I hope that helps a little. Please don't come back at me with a bunch of detailed questions about how this might apply to your particular use-case. I'm not really interested in that, these are just intended as a few signposts.
Related
I've written an NSTextView subclass that does frequent programmatic modification of the text within itself (kinda like an IDE's code formatting - auto-insertion of close braces, for example).
My initial implementation of this used NSTextView's insertText:. This actually appeared to work completely fine. But then while reading the NSTextView documentation (which I do for fun sometimes), I noticed in the Discussion section for insertText:
This method is the entry point for inserting text typed by the user and is generally not suitable for other purposes. Programmatic modification of the text is best done by operating on the text storage directly.
Oh, my bad, I thought. So I dutifully went around changing all my insertText calls to calls to the underlying NSTextStorage (replaceCharactersInRange:withString:, mostly). That appeared to work OK, until I noticed that it completely screws up Undo (of course, because Undo is handled by NSTextView, not NSTextStorage).
So before I haul off and put a buncha undo code in my text storage, I wonder if maybe I've been Punk'd, and really insertText: isn't so bad?
Right, so my question is this: is NSTextView's insertText: call really "not suitable" for programmatic modification of the text of an NSTextView, and if so, why?
insertText: is a method of NSResponder -- in general these would be thought of as methods that respond to user events. To a certain degree they imply a "user action." When the docs tell you to edit the NSTextStorage directly if you want to change things programmatically, the word "programmatically" is being used to distinguish user intent from application operation. If you want your changes to be undoable as if they were user actions, then insertText: seems like it would be OK to use. That said, most of the time, if the modification was not initiated by a user action, then the user won't consider it to be an undoable action, and making it a unit of undoable action would lead to confusion.
For example, say I pasted a word, "foo", into your text view, and your application then colored that word red (for whatever reason). If I then select undo, I expect my action to be the thing that's undone, not the coloring. The coloring isn't part of my user intent. If I then have to hit Cmd-Z again to actually undo my action, I'm left thinking, "WTF?"
NSUndoManager has support for grouping events via beginUndoGrouping and endUndoGrouping. This can allow the unit of user intent (the paste operation) to be grouped with the application coloring into a single "unit" of undo. In the simplest case, what you might want to try here is to set groupsByEvent on the NSUndoManager to YES and then find a way to trigger your application's action to occur in the same pass of the runLoop as the user action. This will cause NSUndoManager to group them automatically.
Beyond that, say if your programmatic modifications need to come in asynchronously or something, you'll need to manage these groupings somehow yourself, but that's likely going to be non-trivial to implement.
I don't know if you saw my similar question from a couple years ago, but I can tell you that #ipmcc is correct and that trying to manually manage the undo stack while making programmatic changes to the NSTextStorage is extremely non-trivial. I spent weeks on it and failed.
But your question and #ipmcc's answer make me think that what I was trying to do (and it sounds like pretty much the exact same thing that you are trying to do) may actually be more in the realm of responding to user intent than what the docs mean by programmatic change. So maybe your original solution of using insertText: is the right way to do it. It's been so long since I abandoned my project that I can't remember for sure if I ever tried that or not, but I don't think I did because I was trying to build my editor using just delegate methods, without subclassing NSTextView.
In my case, as an example, if the user selects some text and hits either the open or closed bracket key, instead of the default behavior of replacing the selected text with the bracket, what I want to do is wrap the selected text in brackets. And if the user then hits cmd-Z, I want the brackets to disappear.
Since I posted this question, I've moved forward with my original implementation, and it appears to be working great. So I think the answer to this question is:
insertText: is completely suitable for the programmatic modification
of text, for this particular use case.
I believe that the note is referring to pure programmatic modification of text, for example, setting all the text of a textview. I could definitely see that insertText: would not be appropriate for that. For my intended purpose, however - adding to or editing characters in direct response to user actions - insertText: is entirely appropriate.
In order to make my text modifications atomic with the user interactions that triggered them (as #ipmcc mentions in his answer), I do my own undo handling in an insertText: override. I wrote about this in #pjv's similar question. There's also a sample project on github, and I'll probably write it up on my blog at some point.
I've just got a fresh device context (DC):
GetDC(someForeignHwnd)
Most normal people now want to paint on this. I don't. I want to display the context in my own program. Or duplicate, I wouldn't even mind the window I stole the context from beeing empty.
In my case, I want it in a TPanel in Delphi, but anything else helping me understanding goes.
Afterwards, I'll probably find the DC invalid by the time I get to display it.
My main problem is: Showing the content of another window in my own. But that isn't important. First of all, I want to know how these DC are of any use. Can I do something like the following?
Canvas.Draw(0, 0, MyNewDC);
The answer can be in Java, C, or Pascal. Is it just not possible or just a stupid idea?
While it's possible to use a device context that you retrieve via GetDC() as the SOURCE for BitBlt(), etc., you will likely not get the results that you're looking for. When you call GetDC() for a specific window, Windows essentially returns a device context for the screen, but with a clipping region set to exclude any portions of the screen where the window is not visible. For example, if there happens to be another window overlapping the source window, the portion of the source window that is covered is clipped from the device context. Therefore, you can only "retrieve" the bits that are actually visible.
You may have better luck sending a WM_PRINT or WM_PRINTCLIENT message to the window. However, not all windows respond to these messages, so this isn't a universal solution.
I have a bit of a challenge.
In an earlier version of our product, we had an error message window (last resort, unhandled exception) that showed the exception message, type, stack trace + various bits and pieces of information.
This window was printscreen-friendly, in that if the user simply did a printscreen-capture, and emailed us the screenshot, we had almost everything we needed to start diagnosing the problem.
However, the form was deemed too technical and "scary" for normal users, so it was toned down to a more friendly one, still showing the error message, but not the stack trace and some of the more gory details that I'd still like to get. In addition, the form was added the capabilities of emailing us a text file containing everything we had before + lots of other technical details as well, basically everything we need.
However, users still use PrintScreen to capture the contents of the form and email that back to us, which means I now have a less than optimal amount of information to go on.
So I was wondering. Would it be possible for me to pre-render a bitmap the same size as my form, with everything I need on it, detect that PrintScreen was hit and quickly swap out the form contents with my bitmap before capture, and then back again afterwards?
And before you say "just educate the users", yes, that's not going to work. These are not out users, they're users at our customers place, so we really cannot tell them to wisen up all that much.
Or, barring this, is there a way for me to detect PrintScreen, tell Windows to ignore it, and instead react to it, by dumping the aformentioned prerendered bitmap onto the clipboard ready for placing into an email?
The code is C# 3.0 in .NET 3.5, if it matters, but pointers for something to look at/for is good enough.
Our error-reporting window has these capabilities:
Show a screenshot that was taken when the error occured (contains all the open windows of the program at the time, before the error dialog was shown)
Show a text file containing every gory detail we can think of (but no sensitive stuff)
Save the above two files to disk, for latter attaching to an email or whatnot by the user
Sending the above two files to us by email, either by opening a new support case, or entering an existing support case number to add more information to it
Ignore the problem and hope it goes away (return to app)
Exit the application (last resort)
We still get screenshots from some users. Not all, mind you, so my question is basically how I can make the PrintScreen button help us a bit more for those users that still use it.
One option: Put the stack trace and other scary stuff into the error screen using small, low-contrast type -- e.g. dark gray on light gray -- so that the user doesn't really even see it, but the Print Screen captures it.
But if you want to detect the PrintScreen and do your own thing, this looks like an example of what you want.
Wouldn't it be possible to disable the Print Screen button altogether when the error popup is active? Have it display a message along the lines of "Please use the clearly visible button in the middle of your screen to report the error" I agree it breaks expected functionality, but if your users are really that stupid, what can you do...
Alternatively, have it report errors automatically (or store the data locally, to be fetched later, if you can't send without asking for some reason), without asking the user. If you want to be able to connect print screened screenshots with detailed error data, have it send a unique ID with the data that's also displayed in the corner of the popup.
What about offering them a "Print Screen" button that performs these actions as well as performing the print screen? If you're locked into this method of having your customers send error details, this may be an easier route to take.
Lifted from my comment below for easier reference (looks helpful, perhaps):
codeproject.com/KB/cs/PrintScreen.aspx
This is in theory...the best way to deal with it I would think
Intercept a WM_PRINT message or inject one into your process... see this article here
Install a system-wide keyboard hook and intercept the print-screen key and swap it around with your contents prior to the capture. Now, I can point you to several places for this, here on CodeProject, and here also, keyboard spy, and finally, global Mouse and keyboard hook on CodeProject.
Now, once you intercept the print screen, invoke the WM_PRINT message by capturing the contents that you want to capture.
I know this is brief and short, but I hope this should get you going.
The only solution i came up with was to offer big, large, easy to read toolbar buttons that give the user every opportunity to save the contents of the error dialog:
Save
Copy to clipboard
Send using e-mail
Print
And after all that, i use the Windows function SetWindowDisplayAffinity in order to show the user a black box where the form should be:
This function and GetWindowDisplayAffinity are designed to support the window content protection feature that is new to Windows 7. This feature enables applications to protect their own onscreen window content from being captured or copied through a specific set of public operating system features and APIs. However, it works only when the Desktop Window Manager(DWM) is composing the desktop.
It is important to note that unlike a security feature or an implementation of Digital Rights Management (DRM), there is no guarantee that using SetWindowDisplayAffinity and GetWindowDisplayAffinity, and other necessary functions such as DwmIsCompositionEnabled, will strictly protect windowed content, for example where someone takes a photograph of the screen.
If their screenshots show a big black box, hopefully they'll get the hint.
I did add a defeat, if they hold down shift while clicking "show error details", i don't add the protection during form construction:
//Code released into public domain. No attribution required.
if (!IsShiftKeyPressed())
SetWindowDisplayAffinity(this.Handle, WDA_MONITOR); //Please don't screenshot the form, please e-mail me the contents!
I have a 3rd party ActiveX control I want to render within other presentation technologies (Direct3D and WPF). To do this, I need the ActiveX to render to a system memory bitmap instead of the screen. I know there is a way to do this, but not sure where to start. I'm not afraid of doing any native method hooking, but I'm not sure where to start. I was thinking BeginPaint(...) might be the hot ticket...
Has anyone done this or seen examples/samples floating around?
BTW, I do not want to do a WM_PRINT type solution. I'd rather this code be reactive, than proactive and forcing the hwnd to repaint.
EDIT:
Both answers were right in my case, so I gave each a +1. I wish to have a lower level solution to be more flexible, but this is good enough at the moment.
You're best bet is to see if the ActiveX object supports IViewObject and call the Draw() method to get it to render into your DC.
If the object doesn't support standard OLE interfaces, you're going to have a very difficult time getting this to work. You'll have to resort to doing evil things like re-writing the DLL's import table to redirect BitBlt(), etc. I've done this sort of thing. I don't recommend it.
Your next problem will be how to correctly map input events.
Does your site support the IOleInPlaceSiteWindowless interface?
Windowless support in ActiveX controls is unfortunately optional. If the 3rd party interface does support it then you can render the control using IViewObject::Draw onto whatever surface you want.
I'm currently working on a program that has many of those "the user SHOULD read it but he'll click OK like a stupid monkey" dialogs... So I was thinking of adding something like a captcha in order to avoid click-without thinking...
My ideas were:
Randomly change buttons
Randomly position buttons somewhere on the form
The user must click on a randomly colored word within the text he should read
add captcha
add captcha that includes the message for the user
Has anybody made any experience with such a situation. What would you suggest to do?
Well, you asked for opinions and here goes mine, but I don't think this is what you would like to hear...
Users like programs that they can depend on. They don't like when things change and they don't like to do extra work.
Randomly change buttons and Randomly position buttons somewhere on the form will only make them either press the wrong button or become annoyed with your application, because as you say, they don't read the text, and if you think about it, neither do we. As an example think of an Ok/Cancel dialog, you allways expect the ok button to be on the left, and most times i press it without reading it. It Will happen exactly the same with your users.
The user must click on a randomly colored word within the text he should read
add captcha
add captcha that includes the message for the user
With these 3 option you will add extra work to your application, your users will curse you for that. Just think of something that you would have to do 10x per day, let's say check in your code to source safe. How would you feel if your boss told you that from now on you will have to fill a captcha for each file you try to check in?
I think it's our jobs to make the lives of the people that use our software easier. If they must read some kind of text and they don't want to, there is absolutely no way you can make them do it.
You can´t make people work right, all you can do is provide them with the best possible tools and hope that they are professional enough to do their jobs.
So basically all i'm saying is, do your best to ease their work. If this is really important than you(or whoever is in charge) should talk to them and EXPLAIN WHY this is important.
You would be surprised on how people commit to things they understand.
I suggest that you don't; and that, unless you know better, you emulate respectable well-known, well-tested UIs like <big online retailer's> or <online banking site>.
Playing games with the user in order to get them to read messages is doomed. Users will focus mental resources on completing your game, rather than understanding the message. Your users may be less likely to actually understand the important part of the message if you have things like moved buttons, relabeling, scavenger hunts, captchas, or delays. They’ll focus on the instructions for the game, not on the real issue. Errors are likely to increase.
Users’ refusal to read message boxes is due to users wanting to get things done quickly rather than take the time to read stuff, and it is also due to message boxes being overused and misused so badly in so many apps. Including silly games in message boxes will just make users resent them all the more, compounding the problem.
Here’s what you can do:
Rule 1. Don’t use messages boxes. They should only appear for exceptional circumstances. An app should not have “many” message boxes. It should not be necessary to read a whole lot of documentation each time the user uses an app. If normal use of your app results in a message box, then your UI is wrong. Find another way.
Instead of verification messages, show clearly in the main window what has happened and provide a clear way to Undo it.
Use auto-correction, pictured/masked fields, and disabling rather than error messages.
Use good defaults and automation to avoid messages. For example, rather than showing an error message saying the user can’t upload because they’re not connected to the server, simply reconnect automatically.
Break commands along options. Rather that a message box to ask if the user wants paste with or without format, provide two different commands in the menu.
Don’t have information messages spontaneously popping up telling the user everything worked fine (e.g., “Preferences Saved!”)
Don’t have pop-ups providing helpful hints or documentation. Provide a tutorial or balloon help if you can’t make your UI self-documenting.
Don’t have nagging “upgrade me” messages.
Consider providing message text in the main window rather than in a separate message box (e.g., “Page may not look or act right because ActiveX is off for security.”). Pop-ups from web surfing have conditioned users to automatically dismiss anything that pops up as irrelevant.
Rule 2. If you have to use a message:
Make the text as brief as possible to get the key information across. More text is not equivalent to more helpful. Use “No match to [filemask] in [path].” Don’t use “Nonfatal Error 307: Search action aborted. [Appname] is unable to complete your string search for the regular expression you provided because the file mask you gave, namely [filemask], does not result in any matching files in the directory that you specified (which was [path]). Please check your filemask or path selection and again re-enter it or them in the Files to Search dialog box. Click the OK button below on this message box to return to the Files to Search dialog box. Click the Cancel Button on the Files to Search dialog when you get there to cancel your search for strings.” If there are some users who will need more explanation than can be achieved in a brief message, provide a Help button or a “How do I…” link in the message box.
Use plain language and no jargon in the message. That includes “innocent” words like “dialog,” “database,” and “toner.” Do not take raw exception text and throw it in a error message. Do not include any error numbers or dumps; log these instead. Purge your app of any debugging message boxes left by developers. Better to simply let the app disappear on a fatal error than to put up a message full of jargon and then the app disappears.
Label the buttons of a message box with what the action does, not “OK.” At the very least, the users have to focus on the activating button to dismiss a message box. If that button is labeled something like “Delete” or “Install,” it should give them pause. You should never have to explain in your message text what each button does. BTW, such labeling is a GUI standard on most platforms.
Redesign your application so that it does not use message boxes.
My suggestion, live with it or redesign your dialogs/interface. Do not add randomness to dialogs or otherwise treat the user like an idiot, even though you may think most are :-).
I just recently read a Joel on Software article, Designing for People Who Have Better Things To Do With Their Lives. It makes the point that most people won't read anything and discusses ways to work around that or at least not make it worse.
You could try with a timer which waits for the "supposed reading time" before enabling the submit button. You can even calculate the supposed reading time from the number of words.
I think that subtle ways to force the user to read your text (like moving around buttons or asking them to read a captcha) can make them feel like stupid monkeys.
You could use a choice question based on what the user should read.