Well the title seems pretty clear about what I want to do.
More precisely: I want to create a program (C++ or Java preferred) that manipulates the mouse in two ways: changing its position and performing clicks.
I was thinking about using Allegro (it has mouse routines for the manipulations mentioned above) or SDL (which I don't know has that kind of routine). I tried with Allegro, unsuccessfully. My problem was that I couldn't virtually "do" clicks. I also couldn't redirect the input generated by my program to some other window.
Any tips?
There are a couple of ways to try automating other applications on Windows...
At the simplest level, one can use PostMessage to post keyboard and mouse messages to another application's windows. This has the advantage that it can work even if the other application is minimized. Unfortunately, this approach skips the majority of the input-processing logic, so applications that directly access key state using GetAsyncKeyState will not see (for example) the Ctrl key being 'down' no matter how many WM_KEYDOWN, vk=VK_CONTROL messages you send.
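As a rough illustration, posting a mouse click to a specific window might look like this (the window title and click coordinates are placeholder assumptions):

    #include <windows.h>

    int main() {
        // Find the target window by its title (assumed for illustration).
        HWND hwnd = FindWindowA(NULL, "Untitled - Notepad");
        if (!hwnd) return 1;

        // Client-area coordinates, packed into lParam.
        LPARAM pos = MAKELPARAM(50, 50);

        // Post a left-button press and release. This can reach a minimized
        // window, but bypasses much of the normal input processing.
        PostMessageA(hwnd, WM_LBUTTONDOWN, MK_LBUTTON, pos);
        PostMessageA(hwnd, WM_LBUTTONUP, 0, pos);
        return 0;
    }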
As Hans Passant commented, SendInput places input events in a lower-level input event queue, and so can fully simulate modifier keys. These input events are not posted to windows, however, so to get them delivered successfully the normal Windows rules of activation and focus need to be followed. That said, this is the approach used by most test-automation software (and is why most such software requires the application under test to be the active application).
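Since the original question was about moving the mouse and clicking, a minimal SendInput sketch could look like the following; the screen coordinates are arbitrary placeholders:

    #include <windows.h>

    int main() {
        // MOUSEEVENTF_ABSOLUTE expects coordinates normalized to 0..65535.
        LONG x = (400 * 65535) / GetSystemMetrics(SM_CXSCREEN);
        LONG y = (300 * 65535) / GetSystemMetrics(SM_CYSCREEN);

        INPUT in[3] = {};
        in[0].type = INPUT_MOUSE;                 // move the cursor
        in[0].mi.dx = x;
        in[0].mi.dy = y;
        in[0].mi.dwFlags = MOUSEEVENTF_MOVE | MOUSEEVENTF_ABSOLUTE;
        in[1].type = INPUT_MOUSE;                 // press the left button
        in[1].mi.dwFlags = MOUSEEVENTF_LEFTDOWN;
        in[2].type = INPUT_MOUSE;                 // release it
        in[2].mi.dwFlags = MOUSEEVENTF_LEFTUP;

        SendInput(3, in, sizeof(INPUT));
        return 0;
    }

The click lands on whatever is under the cursor, which is why this approach requires the target application to be visible and in the foreground.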
The last of the automation methods I'll mention, and sadly the least likely to work, is the Microsoft UI Automation framework. This framework is intended to allow applications to be used by disabled and/or special-needs users. Sadly, very few software vendors bother to implement this API in their products.
After receiving some feedback on this question: How to create lParam for WM_CHAR or WM_KEYUP/WM_KEYDOWN?, I've started looking for a broader answer and a more general solution. One thing I realized is that using the Windows APIs might not work for every app and in every case.
My first step in the follow-up research was to make an Arduino-powered servo that physically pressed the keys (yeah, I know the concept is horrible).
But that prompted yet another idea: a small hardware-augmented numpad keyboard, also operated by an Arduino and controlled via another USB connection. This was at least somewhat usable, but still not very.
Then I tried a Digispark ATtiny85 microcontroller, which in turn used the DigiKeyboard library. This solution was much better, but the necessity of having a Digispark stuck in your USB port was a bit frustrating.
This made me curious whether there are ways to emulate a keyboard, or any other HID device, using software only. Some brief googling pointed me to kernel drivers and virtual COM ports, but those seem to be a bit over the top for me to process.
So can that task indeed be achieved by writing a kernel driver? Can it be done in any other manner? In either case, are there any pointers you can give me on the topic?
The SendInput function can be used to generate keyboard and mouse input. This input goes to the foreground window as if generated by real hardware (although low-level hooks can tell that it was software-generated). It might not let you generate Ctrl+Alt+Delete or control a UAC prompt, but other than that it should be good enough in most cases. Writing a driver to overcome these limitations is normally not worth it.
There is no general way to generate input to a specific application/window if it is not the foreground window.
If you want to control a specific application you should use UI Automation.
Faking key up/down/char messages with PostMessage is not uncommon, but it does not always work (the application might be using raw input, the input is not synchronized with real hardware, etc.). If you are determined to use this method anyway, make sure you send the messages to the correct window (the HWND with keyboard focus, not just the top-level window). Use the Spy++ tool to view the messages and confirm they are going to the correct window.
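A sketch of locating the focused HWND before posting, assuming a hypothetical Notepad target; the lParam values here are simplified (see the linked lParam question for the full bit layout):

    #include <windows.h>

    int main() {
        // Locate the target top-level window (title assumed for illustration).
        HWND top = FindWindowA(NULL, "Untitled - Notepad");
        if (!top) return 1;

        // Ask the window's thread which of its windows has keyboard focus.
        DWORD tid = GetWindowThreadProcessId(top, NULL);
        GUITHREADINFO gti = { sizeof(gti) };
        if (!GetGUIThreadInfo(tid, &gti) || !gti.hwndFocus) return 1;

        // Post the key messages to the focused control, not the frame window.
        PostMessageA(gti.hwndFocus, WM_KEYDOWN, 'A', 1);
        PostMessageA(gti.hwndFocus, WM_CHAR, 'a', 1);
        PostMessageA(gti.hwndFocus, WM_KEYUP, 'A', 0xC0000001);
        return 0;
    }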
I was wondering if this (VS Code's multiple-cursor editing): https://code.visualstudio.com/docs/editor/codebasics could be implemented with WinAPI or by DLL calls/injection globally, in every application?
Which API calls would be relevant to get me started?
This cannot be done. There are too many problems that need to be solved for which there is no general solution.
The standard caret has a hard limit of one caret per message queue. With that in mind, you would have to solve not one but two problems: getting a custom caret implementation, and fighting the system-provided one.
That might still sound doable, but even just getting your custom-rendered carets injected into foreign windows isn't possible. There is no infrastructure in the system that allows you to safely tap into the rendering of arbitrary (or even standard) controls you do not own.
Now even if you solved all of the above, how would you communicate those multiple selection marks back to client code? EM_GETSEL is strictly limited to one selection mark at most.
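For illustration, this is all the selection state a standard edit control can hand back; hwndEdit is assumed to be a valid EDIT control handle obtained elsewhere:

    #include <windows.h>
    #include <stdio.h>

    // Query the single selection an edit control is able to report.
    void PrintSelection(HWND hwndEdit) {
        DWORD start = 0, end = 0;
        // EM_GETSEL fills exactly one (start, end) pair; the message has
        // no way to express a second selection range.
        SendMessageA(hwndEdit, EM_GETSEL, (WPARAM)&start, (LPARAM)&end);
        printf("selection: %lu..%lu\n", start, end);
    }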
So far, this has mostly been about standard controls. Things won't get any easier with custom control implementations. WPF, to my knowledge, uses a pretty much closed-down control library that doesn't even provide the customization points of Windows' common controls. The same goes for UI toolkits like Qt: while open source, Qt doesn't allow for any external customization.
I'm sure there are more problems that don't have a general solution. While not an exhaustive list, the above problems prevent implementation of multi-selection in arbitrary UIs outside your control.
I've been challenged to write an app that simulates keystrokes. After pressing a shortcut, the app should send a predefined key combination to the currently active application. This functionality is provided by many existing applications, but I want to write it on my own. The app should be usable on Windows.
Could you provide me with suggestions about:
Which programming language should I choose?
Are there any libraries providing such functionality?
EDIT: To be precise: both applications are standalone Windows apps.
The native WinAPI function for doing this is SendInput (a minimal sketch follows below).
To answer your questions:
Which programming language? This is up to you. Because this is just a simple native API call, use any language that allows you to call native API.
Don't bother with libraries if this is all you need, because it's very simple.
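A minimal sketch of the whole idea, assuming a hypothetical Ctrl+Alt+K shortcut that sends the two keystrokes 'H' and 'I' to whatever application is active:

    #include <windows.h>

    int main() {
        // Register a global hotkey: Ctrl+Alt+K (hotkey id 1).
        if (!RegisterHotKey(NULL, 1, MOD_CONTROL | MOD_ALT, 'K')) return 1;

        MSG msg;
        while (GetMessageA(&msg, NULL, 0, 0)) {
            if (msg.message != WM_HOTKEY) continue;

            // Build a down/up pair for each virtual key to send.
            const WORD keys[2] = { 'H', 'I' };
            INPUT in[4] = {};
            for (int i = 0; i < 2; ++i) {
                in[2 * i].type = INPUT_KEYBOARD;
                in[2 * i].ki.wVk = keys[i];                 // key down
                in[2 * i + 1].type = INPUT_KEYBOARD;
                in[2 * i + 1].ki.wVk = keys[i];
                in[2 * i + 1].ki.dwFlags = KEYEVENTF_KEYUP; // key up
            }
            SendInput(4, in, sizeof(INPUT));
        }
        return 0;
    }

Note that the physical Ctrl and Alt keys may still be held down when the hotkey fires, so a real implementation typically waits for the modifiers to be released (or injects compensating modifier-up events) before sending.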
Now, to go further, I know you didn't ask this but many many people continue by asking how to send keystrokes to windows that don't have keyboard focus - like sending keystrokes to a specific app. This is much more difficult and error-prone. And since it goes outside the behavior of actual real keystrokes, it can behave unpredictably. Here is one such question.
You can do this easily in C# (SendKeys lives in System.Windows.Forms), using
SendKeys.Send("key here");
or
SendKeys.SendWait("key here");
Some keys use other key codes; you can see them in the SendKeys documentation on MSDN.
For about a year and a half, I've been working with SilkTest, a GUI automation tool for both desktop and web applications. It simulates mouse and keyboard input, which eventually simulates end-user behaviour. However, I find that it is a bit flaky: Button.Click() or DialogBox.Close() calls that work just fine 9 times in a row seem to fail on the 10th call, only to go back to working on the 11th. Normally I would just chalk this up to a quirk in SilkTest (or the application under test, or the OS, or what have you), but then I see that there are similar issues with other GUI automation tools like Selenium:
Selenium Click() fails with Anchor Elements
Selenium Click() fails clicking button object
I know that for desktop apps each GUI control/dialog has a tag element associated with it (at least in Windows-based GUIs), and that for web pages there is the Document Object Model hierarchy of page elements. My guess is that these tools sometimes run into issues navigating those hierarchies and finding unique elements and controls. But what is going on here? SilkTest is a relatively old commercial software package, while Selenium is relatively new, open source, and constantly evolving. The fact that both can have similar problems raises a couple of flags with me.
Also, is this the case with other GUI test tools? Or have I just had a somewhat unusual experience?
There are two things you are talking about here. The first is the concept of finding the object in the application under test that you want to automate. Your description of how SilkTest (and other tools) does this is quite accurate; as long as there is something the automation software can use to identify the control, you are fine.
The second is why the automation itself fails randomly. Since the tool has not reported that it could not find the control, it must think that it sent the appropriate action to the application, e.g. a click or a type. It could be that the application was not ready to accept the action you sent; this is similar to you attempting to click on something "before it was ready", in which case the application can decide to buffer the input or to discard it.
So, how do you fix this? One way would be to use the capabilities of the tool to work out when the application is ready for input, rather than blindly sending it a stream of input. SilkTest has capabilities that allow you to do this (as does TestPartner). I cannot comment on Selenium, as it is something I have not used.
A simple way of testing this would be to insert a pause of a couple of seconds before the offending action, then run it in a loop to see whether that solves the problem. If it does, then this is your problem; if it does not, then something else is going on and you should contact the vendor of the testing tool.
Remember that applications are getting more and more complex (multi-threading, communications, and so on); any one of these could cause the automatic synchronisation to fail, causing actions to fail.
Hope that helps.
So I have this program that I really like, and it doesn't support AppleScript. I'd like to automate it a little bit. Now, I know that I could use AppleScript to tell the program to tell the menu to tell the submenu to tell the menu item to activate, or whatever, but frankly I don't like AppleScript very much anyway.
When I open the NIB file in IB, I can see the messages that are being sent to FirstResponder; for example, the Copy menu item sends "copy:". Is there any way for me to invoke this directly from another program?
No. It's called protected memory for a reason, you know. The other program is completely insulated from your application. There are ways to put code into other apps, but (a) it's very inadvisable, (b) it requires root privileges, which means the rest of your app needs to be ROCK SOLID AND IMPREGNABLE, and (c) writing such code is a black art requiring knowledge of the operating system's kernel interfaces, virtual memory management, the ABI, the internals of the linker/loader, assembler programming, and the operational parameters and other specifics of the particular processor on which your app happens to be running.
Really, Apple Events and other such IPC mechanisms are there for a reason.
Your other alternatives for accessing the data you're looking for (all of which are a bit hacky, to be honest, and give you the fairly significant burden of ensuring the target app is in the state you want/expect) are:
The Accessibility APIs from the ApplicationServices framework, through which you can traverse the UI tree to grab the text from wherever you need it directly, or can activate the menu item. Access for your app has to be explicitly granted by the user, however (although this is much the same as the requirement for UI scripting).
You can use the CoreGraphics APIs (within the ApplicationServices framework again) to send keyboard events to the target application (or just to the system) directly. This would mean sending four events: Command-down, C-down, C-up, Command-up (see the sketch after this list).
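A rough sketch of that four-event sequence using the CGEvent APIs; key codes 55 and 8 are kVK_Command and kVK_ANSI_C, and setting the Command flag on the 'C' events is an extra step that is commonly needed for the combination to register:

    #include <ApplicationServices/ApplicationServices.h>

    int main() {
        CGKeyCode cmd = 55, c = 8; // kVK_Command, kVK_ANSI_C

        CGEventRef cmdDown = CGEventCreateKeyboardEvent(NULL, cmd, true);
        CGEventRef cDown   = CGEventCreateKeyboardEvent(NULL, c, true);
        CGEventRef cUp     = CGEventCreateKeyboardEvent(NULL, c, false);
        CGEventRef cmdUp   = CGEventCreateKeyboardEvent(NULL, cmd, false);

        // Mark the 'C' events as occurring with Command held down.
        CGEventSetFlags(cDown, kCGEventFlagMaskCommand);
        CGEventSetFlags(cUp, kCGEventFlagMaskCommand);

        // Command-down, C-down, C-up, Command-up.
        CGEventPost(kCGHIDEventTap, cmdDown);
        CGEventPost(kCGHIDEventTap, cDown);
        CGEventPost(kCGHIDEventTap, cUp);
        CGEventPost(kCGHIDEventTap, cmdUp);

        CFRelease(cmdDown); CFRelease(cDown);
        CFRelease(cUp); CFRelease(cmdUp);
        return 0;
    }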
None of these are ideal. To be honest, your best approach would be to look at your requirements and figure out how you can best engineer around the problem by changing those requirements in some way, e.g. instead of grabbing something directly, ask the user to provide some input.
You might be interested in SIMBL or in mach_inject. SIMBL is a daemon (in my fork based on mach_inject; in the original version based on injection via a ScriptingAdditions hack) which does the injection for you, so you just need to put a bundle with your code into the SIMBL directory and SIMBL will inject it into the target application for you. Or you can do the injection yourself via mach_inject, or, probably more convenient, mach_inject_framework, which injects and runs code that just loads a framework.
I think Jim may overstate the point a bit; he's not wrong, but it seems misleading. There are lots of ways to cause a Cocoa program to execute its own code under your control (Carbon is harder). The Accessibility API is very commonly used this way (so commonly that I expect it to be repurposed eventually). F-Script can give you all kinds of access to the innards of another Cocoa program. While Input Managers may well exit the scene at some point, SIMBL is still out there today to do this kind of stuff.
Whether you like AppleScript or not, Apple Events are the primary way Apple provides for inter-program control. Have you double-checked Script Editor's Open Library function to find out whether the program really does have any AppleScript support? You can code Apple Events entirely in Objective-C these days using Leopard's Scripting Bridge. I wrote up a tutorial, if you like (it's still under-documented by Apple).
Cocoa is a reverse-engineer's dream. The same guys who host SIMBL have a nice intro to the subject. "Wolf" also writes a lot of useful information on this.
Jim's right. Many of these approaches can completely destabilize the system if done incorrectly (sometimes even if done correctly). I don't do much of this stuff on my production systems; I need them to work. But there are a lot of things you can make a Mac app do, and it's a good part of a Mac developer's training to understand how all the pieces really work.