My goal is to screen-scrape a portion of a program which constantly updates with new text. I have tried OCR with Tesseract but I believe it would be much more efficient to somehow intercept the text if possible. I have attempted using the GetWindowText() function, but it only returns the window title. Using Window Detective I have determined that whenever the window updates in the way I wish to capture, a WM_PAINT message is reliably sent to the window.
I have looked a bit into Windows API Hooks, but it seems that most of these techniques involving DLL injection are intended at sending new messages, not accessing the content of already sent messages.
How should I approach this problem?
When you say 'screen-scrape', is that what you really mean? Reading your post, it sounds like you actually want to get at the text in the child window or control in question - as text, and not just as a bitmap. To do that, you will need to:
Determine which child window or control actually contains the text you want to get at. It sounds like you may have already done that but if not, the tool of choice is generally Spy++. (Please note: the version of Spy that you use must match the 'bitness' of your application.)
Then, firstly, try to figure out whether the text in that window can be retrieved somehow. If it's a standard Windows control (specifically EDIT or RICHEDIT) then there are documented ways to do that, see MSDN.
If that doesn't pan out, you might have some success hooking calls to ExtTextOut(), although that's not a pleasant proposition and I think you might struggle to achieve it. That said, I believe the accepted way (in some sense of the word 'accepted') is here.
With reference to point 3, even if you achieve it, how would you know whether any particular call to ExtTextOut() was drawing to the window you're interested in? Answer, most likely, HWND WindowFromDC().
I hope that helps a little. Please don't come back at me with a bunch of detailed questions about how this might apply to your particular use-case. I'm not really interested in that, these are just intended as a few signposts.
I'm working on a custom cross platform UI library that needs a synchronous "ShowPopup" method that shows a popup, runs an event loop until it's finished and automatically cancels when clicking outside the popup or pressing escape. Keyboard, mouse and scroll wheel events need to be dispatched to the popup but other events (paint, draw, timers etc...) need to be dispatched to their regular targets while the loop runs.
Edit: for clarification, by popup, I mean this kind of menu style popup window, not an alert/dialog etc...
On Windows I've implemented this fairly simply by calling GetMessage/DispatchMessage and filtering and dispatching messages as appropriate. Works fine.
I've much less experience with Cocoa/OS X however and finding the whole event loop/dispatch paradigm a bit confusing. I've seen the following article which explains how to implement a mouse tracking loop which is very similar to what I need:
http://stpeterandpaul.ca/tiger/documentation/Cocoa/Conceptual/EventOverview/HandlingMouseEvents/chapter_5_section_4.html
but... there's some things about this that concern me.
The linked article states: "the application’s main thread is unable to process any other requests during an event-tracking loop and timers might not fire". Might not? Why not, when not, how to make sure they do?
The docs for nextEventMatchingMask:untilDate:inMode:dequeue: states "events that do not match one of the specified event types are left in the queue.". That seems a little odd. Does this mean that if an event loop only asks for mouse events then any pressed keys will be processed once the loop finishes? That'd be weird.
Is it possible to peek at a message in the event queue without removing it. eg: the Windows version of my library uses this to close the popup when it's clicked outside, but leaves the click event in the queue so that clicking outside the popup on a another button doesn't require a second click.
I've read and re-read about run loop modes but still don't really get it. A good explanation of what these are for would be great.
Are there any other good examples of implementing an event loop for a popup. Even better would be pseudo-code for what the built in NSApplication run loop does.
Another way of putting all this... what's the Cocoa equivalent of Windows' PeekMessage(..., PM_REMOVE), PeekMessage(..., PM_NOREMOVE) and DispatchMessage().
Any help greatly appreciated.
What exactly is a "popup" as you're using the term? That term means different things in different GUI APIs. Is it just a modal dialog window?
Update for edits to question:
It seems you just want to implement a custom menu. Apple provides a sample project, CustomMenus, which illustrates that technique. It's a companion to one of the WWDC 2010 session videos, Session 145, "Key Event Handling in Cocoa Applications".
Depending on exactly what you need to achieve, you might want to use an NSAlert. Alternatively, you can use a custom window and just run it modally using the -runModalForWindow: method of NSApplication.
To meet your requirement of ending the modal session when the user clicks outside of the window, you could use a local event monitor. There's even an example of just such functionality in the (modern, current) Cocoa Event Handling Guide: Monitoring Events.
All of that said, here are (hopefully no longer relevant) answers to your specific questions:
The linked article states: "the application’s main thread is unable to process any other requests during an event-tracking loop and
timers might not fire". Might not? Why not, when not, how to make
sure they do?
Because timers are scheduled in a particular run loop mode or set of modes. See the answer to question 4, below. You would typically use the event-tracking mode when running an event-tracking loop, so timers which are not scheduled in that mode will not run.
You could use the default mode for your event-tracking loop, but it really isn't a good idea. It might cause unexpected re-entrancy.
Assuming your pop-up is similar to a modal window, you should probably use NSModalPanelRunLoopMode.
The docs for nextEventMatchingMask:untilDate:inMode:dequeue:
states "events that do not match one of the specified event types are
left in the queue.". That seems a little odd. Does this mean that if
an event loop only asks for mouse events then any pressed keys will be
processed once the loop finishes? That'd be weird.
Yes, that's what it means. It's up to you to prevent that weird outcome. If you were to read a version of the Cocoa Event Handling Guide from this decade, you'd find there's a section on how to deal with this. ;-P
Is it possible to peek at a message in the event queue without removing it. eg: the Windows version of my library uses this to close
the popup when it's clicked outside, but leaves the click event in the
queue so that clicking outside the popup on a another button doesn't
require a second click.
Yes. Did you notice the "dequeue:" parameter of nextEventMatchingMask:untilDate:inMode:dequeue:? If you pass NO for that, then the event is left in the queue.
I've read and re-read about run loop modes but still don't really get it. A good explanation of what these are for would be great.
It's hard to know what to tell you without knowing what you're confused about and how the Apple guide failed you.
Are you familiar with handling multiple asynchronous communication channels using a loop around select(), poll(), epoll(), or kevent()? It's kind of like that, but a bit more automated. Not only do you build a data structure which lists the input sources you want to monitor and what specific events on those input sources you're interested in, but each input source also has a callback associated with it. Running the run loop is like calling one of the above functions to wait for input but also, when input arrives, calling the callback associated with the source to handle that input. You can run a single turn of that loop, run it until a specific time, or even run it indefinitely.
With run loops, the input sources can be organized into sets. The sets are called "modes" and identified by name (i.e. a string). When you run a run loop, you specify which set of input sources it should monitor by specifying which mode it should run in. The other input sources are still known to the run loop, but just ignored temporarily.
The -nextEventMatchingMask:untilDate:inMode:dequeue: method is, more or less, running the thread's run loop internally. In addition to whatever input sources were already present in the run loop, it temporarily adds an input source to monitor events from the windowing system, including mouse and key events.
Are there any other good examples of implementing an event loop for a popup. Even better would be pseudo-code for what the built in
NSApplication run loop does.
There's old Apple sample code, which is actually their implementation of GLUT. It provides a subclass of NSApplication and overrides the -run method. When you strip away some stuff that's only relevant for application start-up or GLUT, it's pretty simple. It's just a loop around -nextEventMatchingMask:... and -sendEvent:.
I am writing an app with raw windows API (opensource Win32++) where I have a ListView.
The problem I have now is that whenever an item in the ListView is clicked, the system/app will generate a warning tone/sound "ding". Furthermore, I noticed I cannot get the LVN_ITEMACTIVATE through item-dbl-click or item-keypress-enter, which would normally work if this problem had not occur.
Would anyone have any idea how this might be happening?
I believe there is nothing wrong with Win32++, it just could be one of the things I do is causing this. But my program has become quite big to dissect plus I have no idea where to start looking.
Thanks.
PS: I had my computer muted for the longest time, hence, I don't know when this started eventhough I had the listview since a long time ago. T_T
Start looking with a tool that can show you the Windows messages that the control generates and receives. Like Microsoft's Spy++. Compare it with a working list view to have an idea what might be amiss. Also check the parent window. I haven't otherwise heard of listviews that dingaling, the LVN_ACTIVATEITEM should fire the first WM_LBUTTONDOWN, no double-click necessary.
I am not sure which class I need. I am looking to implement the following:
User clicks on NSButton ( Have this working)
Have a background task run once he does this. (have this working)
While the task is in progress, I want to display a small popup which indicates the task is in progress and asks the user to wait/give him some more informtion. ( This is where I need help)
I am not really sure what class would help me get this popup..NSPopup /NsAlert ?
Appreciate any pointers. Thanks.
In addition to Peter's answer: Spend some time perusing Interface Builder's Library palette so you know what standard controls are available and what they're called. Spend more time reading the AppKit API reference to get a feel for how they work in general.
The specific control to display progress is an NSProgressIndicator. As Peter said, you can put this in a view or a sheet but the best approach really depends on the factors Peter mentioned.
It should simply be a view in your window.
If the user cannot do anything in the window until the operation finishes (and you should do everything you can to prevent such obstruction), then it should be a sheet over the window.
If the user can do more than one of these operations at the same time and they don't prevent the user from doing other things with the window, then a view in the window will become unwieldy, and you should move the progress display to a dedicated progress window, as done by Transmit, Mail, Adium, Finder, Amadeus Pro, and many other applications.
This is inspired by the question OK-Cancel or Cancel-OK?.
I remember reading somewhere about the concept of switching OK-Cancel/Cancel-OK in certain situations to prevent the user from clicking through information popups or dialog boxes without reading their content. As far as I remember, this also included moving the location of the OK button (horizontally, left to right) to prevent the user from just remembering where to click.
Does this really make sense? Is this a good way to force the user to "think/read first, then click"? Are there any other concepts applicable to this kind of situation?
I am particularly thinking of a safety-related application, where thoughtlessly pressing OK out of habit can result in a potentially dangerous situation whereas Cancel would lead to a safe state.
Please don't do this unless you are really, really, really sure it's absolutely required. This is a case of trying to fix carelessness and stupidity by technological means, and that sort of thing almost never works.
What you could do is use verbs or nouns instead of the typical Windows OK / Cancel button captions. That will give you an instant attention benefit without sacrificing predictability.
NOOOOOOOOOOOO!
In one of our products we have a user option to require Ctrl+Click for safety related commands.
But startling the user with buttons that swap place or move around is bad design in my book.
NO. If you make it harder for the user to click OK by mistake and force them to think, they will still only think harder about how to click OK -- they will not think about the actual thing they're trying to carry out. See usability expert Aza Raskin's article: Never use a warning when you mean Undo. Quote:
What about making the warning
impossible to ignore? If it’s
habituation on the human side that is
causing the problem, why not design
the interface such that we cannot form
a habit. That way we’ll always be
forced to stop and think before
answering the question, so we’ll
always choose the answer we mean.
That’ll solve the problem, right?
This type of thinking is not new: It’s
the
type-the-nth-word-of-this-sentence-to-continue approach. In the game Guild Wars, for
example, deleting a character requires
first clicking a “delete” button and
then typing the name of the character
as confirmation. Unfortunately, it
doesn’t always work. In particular:
It causes us to concentrate on the unhabitual-task at hand and not on
whether we want to be throwing away
our work. Thus, the
impossible-to-ignore warning is little
better than a normal warning: We end
up losing our work either way. This
(losing our work) is the worst
software sin possible.
It is remarkably annoying, and because it always requires our
attention, it necessarily distracts us
from our work (which is the second
worst software sin).
It is always slower and more work-intensive than a standard
warning. Thus, it commits the third
worst sin—requiring more work from us
than is necessary.
[If you want a Microsoftish one, this one by a .NET guy on MSDN says the same thing!]
If you must use a dialog, put descriptive captions on the buttons within the dialog.
For example, instead of OK and Cancel buttons, have them say "Send Invoice" and "Go Back", or whatever is appropriate in the context of your dialog.
That way, the text is right under their cursor and they have a good chance of understanding.
The Apple Human Interface Guideline site is a great reference, and very readable. This page on that site talks about Dialogs.
Here is an example image:
(source: apple.com)
No, it doesn't make sense. You're not going to "make" users read. If the decision is that crucial, then you're better off finding a way to mitigate the danger rather than handing a presumed-careless user a loaded gun.
Making the "safe" button default (triggered by enter/spacebar/etc.) is a good idea regardless, simply because if they surprise the user then a keystroke intended for the expected window won't accidentally trigger the unexpected action. But even in that scenario, you must be aware that by the time the user has realized what they've done, the choice is already gone (along with any explanatory text on the dialog). Again, you're better off finding another way to give them information.
What I've done in some instances was to compare the time of the message box being shown with the time of it being dismissed. If it was less than 'x' amount of seconds, it popped right back up. This forced them, in most cases, to actual read what was on the screen rather than just clicking through it blindly.
Fairly easy to do, as well....
Something like this:
Dim strStart As DateTime = Now
While Now < strStart.AddSeconds(5)
MessageBox.Show("Something just happened", "Pay Attention", MessageBoxButtons.OK)
If Now < strStart.AddSeconds(5) Then strStart = Now Else Exit While
End While
At the end of the day you can't force a user to do something they're unwilling to do... they will always find a way around it
Short cut keys to bypass the requirement to move the mouse to a moving button.
Scrolling down to the bottom of the EULA without reading it to enable to continue.
Starting the software and then going to get their cup of tea while waiting for the nag screen to enable the OK button.
The most reliable way I've seen this done is to give a multiple choice question based on what is written. If they don't get the answer correct, they can't continue... of course after a couple of times, they'll realise that they can just choose each of the answers in turn until the button enables and then click it. Once again meaning they don't read what was written.
You can only go so far before you have to put the responsibility on the user for their actions. Telling the user that their actions are logged will make them more careful - if they're being held accountable, they're more likely to do things right. Especially if there's a carefully crafted message that says something like:
This is being logged and you will be held accountable for any
repercussions of this decision. You have instructed me to delete
the table ALL_CORPORATE_DATA. Doing so will cause the entire company's
database to stop working, thus grinding the whole company to a halt.
You must select the checkbox to state that you accept this responsibility
before you can choose to continue...
And then a checkbox with "Yes, I accept the responsibility for my actions" and two buttons:
"YES, I WANT TO DELETE IT" this button should only be enabled if the checkbox is checked.
"OH CRAP, THAT'S NOT WHAT I MEANT AT ALL" this button can always be enabled.
If they delete the table and the company grids to a halt, they get fired. Then the backup is restored and everyone's happy as Larry [whoever Larry is] again.
Do NOT do it, please. This will have no positive effect: You are trying to AVOID people's clicking OK instead of Cancel, by making them potentially click Cancel instead of OK (okay, they may try again). But! you might as well achieve people's clicking OK when they really want to cancel and that could be a real disaster. It's just no good.
Why not reformulate the UI to make the OK the "safe choice"?
The problem is better solved with a combination of good feedback, communication of a system model and built-in tolerance.
In favor of the confirmation mechanism speaks the simplicity of implementation. From programmer's point of view it's the easiest way of shifting responsibility onto user: "Hey, I've asked you if you really want to shoot yourself into the foot, haven't I? Now there is no one to blame but yourself..."
From user point of view:
There is a productivity penalty of having to confirm operation twice every time even though actual mistakes take up just a fraction of total number of actions, any switching of buttons, breaking the habitual workflow or inserting a pause into confirmation just increases the penalty.
The mechanism doesn't really provide much safety net for frequent users whose reflexes work ahead of the concious mind. Personally I have many times done a complex sequence of actions only to realise a moment later when observing the consequences that my brain somehow took the wrong route!
A better for the user, but more complex (from software development point of view) solution would be:
Where possible communicate in advance what exact affect the action is going to make on the system (for instance Stack Overflow shows message preview above Post Your Answer button).
Give an immediate feedback to confirm once the action took place (SO highlights the freshly submitted answer, gmail displayes a confirmation when a message is sent etc).
Allow to undo or correct possible mistake (i.e. in SO case delete or edit the answer, Windows lets restore a file from recycle bin etc). For certain non-reversible actions it's still possible to give an undo capability but for a limited timeframe only (i.e. letting to cancel or change an online order during the first 10 minutes after its submission, or letting to recall an e-mail during the first 60 seconds after its been "sent", but actually queued in the outbox etc).
Sure, this is much more initial work than inserting a confimation message box, but instead of shifting the responsibility it attempts to solve the problem.
But if the OK/Cancels are not consistent, that might throw off or upset the user.
And don't do like some EULAs where a user is forced to scroll a panel to the bottom before the Agree button becomes clickable. Sometimes you just won't be able to get a user to read everything carefully.
If they really need to read it, maybe a short delay should happen before the buttons appear? This could also potentially be annoying to the user, but if it is a very critical question, it'd be worth it.
Edit: Or require some sort of additional mechanism than just clicking to "accept" the very important decision. A check box, key press, password, etc.
I recommend informing the user that this is a critical operation by using red text and explaining why is this an unsafe operation.
Also, rather than two buttons, have two radio buttons and one "Ok" button, with the "don't continue" radio button selected as default.
This will present the user with an uncommon interface, increasing cognitive load and slowing him down. Which is what you want here.
As always with anything with user interaction, you have a small space between helping the user and being annoying. I don't know you exact requirements but your idea seems OK(pun intended) to me.
It sounds like your user is going through a type of input wizard in the safety app.
Some ideas as alternatives to moving buttons.
Have a final screen to review all input before pressing the final ok.
Have a confirmation box after they hit ok explaining what the result of this action will be.
A disclaimer that require you to agree to it by checking a box before the user could continue.
Don't switch it around - you'll only confuse more than you'll help.
Instead, do like FireFox and not activate the control for 5 sec. - just make sure you include a timer or some sort of indicator that you're giving them a chance to read it over. If they click on it, it cuts off the timer, but requires they click one more time.
Don't know how much better it will be, but it could help.
Just remember, as the man said: You can't fix stupid.
This will give me headache. Especially when I accidentally close the application and forget to save my file :(
I see another good example of forcing user to "read" before click: Firefox always grayed out the button (a.k.a disable) the "OK" button. Therefore the user have to wait around 5 seconds before he can proceed to do anything. I think this is the best effort I have seen in forcing user to read (and think)
Another example I have seen is in "License and Agreements" page of the installer. Some of them required the user to scroll down to the end of the page before he/she can proceed to next step.
Keyboard shortcuts would still behave as before (and you'd be surprised how few people actually use mice (especially in LOB applications).
Vista (and OSX IIRC) have moved towards the idea of using specific verbs for each question (like the "Send"/"Don't send" when an app wants to crash and wants to submit a crashdump to MS)
In my opinion, I like the approach used by Outlook when an app tries to send an email via COM, with a timer before the buttons are allowed to be used (also affects keyboard shortcuts)
If you use Ok and Cancel as your interface you will always be allow the user to just skip your message or screen. If you then rearrange the Ok and Cancel you will just annoy your user.
A solution to this, if your goal is to insure the users understanding, is:
Question the user about the content. If you click Ok you are agreeing to Option 1, or if you click Ok you are agreeing to option 2. If they choose the correct answer, allow the action.
This will annoy the user, so if you can keep track of users, only do it to them once per message.
This is what I responded to Submit/Reset button order question and I think the same principle can be used here. The order does not really matter as far as you make sure the user can distinguish the two buttons. In the past what I have done is used a button for (submit/OK) button and used a link for (reset/cancel) button. The users can instantly tell that these two items are functionally different and hence treat them that way.
I am not really for OK/Cancel. It's overused and requires you to read the babbling in order to say what you are OKing or Canceling. Follow the idea of MacOSX UI: the button contains a simple, easy phrase that is meaningful by itself. Exampleç you change a file extension and a dialog pops up saying:
"Are you sure you want to change the extension from .py to .ps?"
If you perform the change, the document could be opened by a different application.
(Use .ps) (Keep .py)
It is way more communicative than OK/Cancel, and your question becomes almost superfluous, that is, you just need to keep active the rightmost button, which seems to be the standard.
As it concerns the raw question you posed. Never do it. Ever. Not even at gunpoint. Consistency is an important requisite for GUIs. If you are not consistent you will ruin the user experience, and your users will most likely to see this as a bug than a feature (indeed it would be a BUG). Consistency is very important. To break it, you must have very good reason, and there must not be another different, standard way to achieve the same effect.
I wonder if you're thinking about the option that exists in Visual Basic where you can set various prompts and response options; and one option is to allow you to switch Cancel and OK based on which should be the default; so the user could just hit enter and most of the time get the proper action.
If you really want to head in this direction (which I think is a bad idea, and I'm sure you will too after little reflection and reading all the oher posts) it would work even better to include a capcha display for OK.