How do you get current keyboard focus coordinates in Mac OS X's Accessibility API?

I'm looking for a Mac OS X Accessibility API to get the coordinates of the location of the current keyboard (not mouse) focus. According to page 2 of the document I found at http://www.apple.com/accessibility/pdf/Mac_OS_X_Tiger_vpat.pdf, it's doable:
Supported:
Mac OS X exposes the location of the current keyboard and
mouse focus to assistive technologies via the Accessibility API and
also provides a visual indication of the focus on-screen.
Despite the statement above, I can't seem to find the API itself. I'm a seasoned dev (coding since 1982), but have never developed on Mac OS X; please be gentle.

OS X appears to have an asymmetric accessibility API: you use the NSAccessibility protocol to make your own app accessible, but to inspect another app's accessibility information you have to use a separate set of C interfaces and objects, AXUIElement and friends.
I found an article on retrieving the window that has focus that may be of use here. The key steps seem to be:
Use AXUIElementCreateSystemWide to create a 'system wide' accessibility object
Ask that object for the currently focused application by calling AXUIElementCopyAttributeValue asking for kAXFocusedApplicationAttribute
Ask the returned object for the focused window, again using AXUIElementCopyAttributeValue, but this time for kAXFocusedWindowAttribute (the C counterpart of NSAccessibilityFocusedWindowAttribute). It looks like you can skip this step and go straight from the focused application to the focused UI element.
Ask the returned object for the currently focused element using the same API again, this time with kAXFocusedUIElementAttribute (the C counterpart of NSAccessibilityFocusedUIElementAttribute)
Ask that element for its kAXPositionAttribute / kAXSizeAttribute; the values come back as AXValue objects wrapping a CGPoint and a CGSize
You might also want to check out the source code for UIElementInspector which displays information about the element under the mouse pointer (though it doesn't appear to do anything with focus).
It also looks like you'll need to enable the accessibility API, either via the GUI (see the article above) or via the terminal, for any of the above to work; presumably this is to give users a defense against rogue apps taking control of their desktop.
I haven't used any of these personally (yet); but I'm familiar enough with accessibility APIs to know where to look - hope this helps.
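The steps above can be sketched in C roughly as follows. This is a minimal, untested-on-every-release sketch, not a definitive implementation: the `FocusFrame` type and the function name `copy_keyboard_focus_frame` are made up for illustration, it skips the optional focused-window step, and it assumes the process has already been granted accessibility access. The `#ifdef __APPLE__` guard is only there so the file still compiles elsewhere, where the function simply reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef __APPLE__
#include <ApplicationServices/ApplicationServices.h>
#endif

/* Hypothetical result type: position and size of the focused element. */
typedef struct {
    double x, y, width, height;
} FocusFrame;

/* Fills *out and returns true if the focused element's frame could be read.
   Returns false on any failure, and always on non-Apple platforms. */
bool copy_keyboard_focus_frame(FocusFrame *out) {
    assert(out != NULL);
#ifdef __APPLE__
    AXUIElementRef systemWide = AXUIElementCreateSystemWide();
    CFTypeRef app = NULL, element = NULL, posValue = NULL, sizeValue = NULL;
    CGPoint pos = {0, 0};
    CGSize size = {0, 0};
    bool ok = false;

    if (systemWide != NULL &&
        /* System-wide object -> focused application. */
        AXUIElementCopyAttributeValue(systemWide, kAXFocusedApplicationAttribute,
                                      &app) == kAXErrorSuccess &&
        /* Focused application -> focused UI element. */
        AXUIElementCopyAttributeValue((AXUIElementRef)app, kAXFocusedUIElementAttribute,
                                      &element) == kAXErrorSuccess &&
        /* Focused element -> position and size, wrapped in AXValue objects. */
        AXUIElementCopyAttributeValue((AXUIElementRef)element, kAXPositionAttribute,
                                      &posValue) == kAXErrorSuccess &&
        AXUIElementCopyAttributeValue((AXUIElementRef)element, kAXSizeAttribute,
                                      &sizeValue) == kAXErrorSuccess &&
        /* Older SDKs spell these constants kAXValueCGPointType / kAXValueCGSizeType. */
        AXValueGetValue((AXValueRef)posValue, kAXValueTypeCGPoint, &pos) &&
        AXValueGetValue((AXValueRef)sizeValue, kAXValueTypeCGSize, &size)) {
        out->x = pos.x;
        out->y = pos.y;
        out->width = size.width;
        out->height = size.height;
        ok = true;
    }

    /* Everything obtained from a Create/Copy call must be released. */
    if (sizeValue != NULL) CFRelease(sizeValue);
    if (posValue != NULL) CFRelease(posValue);
    if (element != NULL) CFRelease(element);
    if (app != NULL) CFRelease(app);
    if (systemWide != NULL) CFRelease(systemWide);
    return ok;
#else
    (void)out;
    return false;
#endif
}
```

On a Mac this would be built with something like `-framework ApplicationServices`. Not every focused element exposes a position and size, so a `false` result does not necessarily mean that access was denied.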

Related

Options for getting/setting Windows window states for a browser window

Our agency has a Java web application that uses a third-party Java applet as a TIFF viewer. In this application JavaScript spawns a separate window (and yes, it must be a separate window). We are currently using an in-house ActiveX control so that the HTML window can have the window state, window size and position, and screen device persisted and restored via localStorage. Our agency is IE11 only, so for now using it is not a problem. It's embedded in the page and operates on the "containing window".
This control was made many years ago in VB6. It appears that Visual Studio doesn't support the creation of ActiveX controls, and I don't personally have the knowledge or skill to create one another way.
Please do not suggest the HTML DOM window and screen objects! Most answers I've found just refer to these! They can't get the entire browser window size (viewport + chrome + toolbars + titlebar), nor does JavaScript have any idea about Windows-specific things like window state and screen device (it's always the OS primary screen). I understand the HTML DOM can't be tied to anything OS-specific in order for it to be implementable on multiple platforms.
Up until now this control has worked pretty well: you can, for example, maximize on a given screen and restore the window to the same screen. But I realize that ActiveX won't be supported forever and it's IE only. It seems to me, though, that the need for such functionality can't be totally uncommon. I'm guessing that some may suggest that the application UI should be changed so that this is less of a problem, but being able to put the viewer on a different monitor from the application, and having that window be restored exactly to its last position, is important to us. I do understand that if you reuse that window the user only has to position it once, but we'd rather not require this of the user.
Question: are there any alternatives to COM/ActiveX for interacting with the Win32 API? Or are there any other methods of accurate window persistence, aware of screens other than the OS primary and of maximized/minimized/normal window states, that use something other than the Win32 API?

Intercepting keyboard and mouse events from focused applications on OS X

Soon I will have to work with OS X, and tools like Hammerspoon are missing some capabilities that are important to me. I need to be able to intercept keyboard and mouse events completely, before they reach the focused application. Say I ctrl+alt+apple+left_click on an application: I don't want the application to know about that left click. So far the only thing I have come up with is to build a transparent fullscreen application, though I'm not sure how feasible that is yet.
Any better idea or hints how to go about this in a language of your choice?
Thanks!
You will need to create an event tap (CGEventTapCreate). However, the application must either run as the root user or be authorized by the user to use the accessibility features.
Apple's documentation can be found here.
Interestingly enough, I am in the process of writing a blog post about how to use event taps (including an Objective-C API that I wrote for my own use), but the post won't be available for another week or so.
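As a rough illustration of the event tap approach, here is a minimal C sketch that swallows left clicks made while Control, Option and Command are all held, which is the example from the question. The helper `has_all_modifiers` and the function `run_click_filter` are hypothetical names, and the choice of a session-level tap inserted at the head of the chain is an assumption, not the only option. The `#ifdef __APPLE__` guard just lets the file compile on other platforms, where the filter reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __APPLE__
#include <ApplicationServices/ApplicationServices.h>
#endif

/* True if every bit in `required` is also set in `flags`. */
bool has_all_modifiers(uint64_t flags, uint64_t required) {
    assert(required != 0);
    return (flags & required) == required;
}

#ifdef __APPLE__
/* Called for each tapped event; returning NULL drops the event so the
   focused application never sees it. */
static CGEventRef tap_callback(CGEventTapProxy proxy, CGEventType type,
                               CGEventRef event, void *refcon) {
    const CGEventFlags required = kCGEventFlagMaskControl |
                                  kCGEventFlagMaskAlternate |
                                  kCGEventFlagMaskCommand;
    (void)proxy;
    (void)refcon;
    if ((type == kCGEventLeftMouseDown || type == kCGEventLeftMouseUp) &&
        has_all_modifiers(CGEventGetFlags(event), required)) {
        return NULL;
    }
    return event; /* pass everything else through unchanged */
}
#endif

/* Installs the tap and runs the current run loop. Returns -1 if the tap
   could not be created (for example, missing authorization), 0 otherwise. */
int run_click_filter(void) {
#ifdef __APPLE__
    CGEventMask mask = CGEventMaskBit(kCGEventLeftMouseDown) |
                       CGEventMaskBit(kCGEventLeftMouseUp);
    CFMachPortRef tap = CGEventTapCreate(kCGSessionEventTap, kCGHeadInsertEventTap,
                                         kCGEventTapOptionDefault, mask,
                                         tap_callback, NULL);
    if (tap == NULL) {
        return -1;
    }
    CFRunLoopSourceRef source =
        CFMachPortCreateRunLoopSource(kCFAllocatorDefault, tap, 0);
    CFRunLoopAddSource(CFRunLoopGetCurrent(), source, kCFRunLoopCommonModes);
    CGEventTapEnable(tap, true);
    CFRunLoopRun(); /* blocks until the run loop is stopped */
    CFRelease(source);
    CFRelease(tap);
    return 0;
#else
    return -1;
#endif
}
```

A real tool would probably also want to filter drag events and re-enable the tap if the system disables it, both of which this sketch leaves out.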

Mac OS X 10.10 Find window by title, find button by label and press it

I use Mac OS X 10.10 and I would like to write a program that continuously checks the titles of all open windows, looking for a particular window. When that window appears, the program should look for a button with a specific label and, once it finds it, send it a "pressed" message.
I would be able to do this under Windows, but I am not so familiar with the Mac.
I have found a question related to mine (How do I get a list of the window titles on the Mac OSX?), but I think the most difficult part is finding the button and sending it a "pressed message".
Thank you in advance!
What you are looking for is the Accessibility APIs. These are mostly Core Foundation-style C APIs, typically prefixed with AX.
You might also want to consider additional identifiers beyond window title as window titles are not necessarily unique.
Using the AX APIs is not easy and is extremely verbose. You can use them to explore the UI, find things and interact with them, but you might have more limited success observing user interaction. That might require a more fragile combination with event monitoring, using an NSEvent global monitor (addGlobalMonitorForEventsMatchingMask:handler:) or a CGEventTap, depending on the UI widgets involved.
Also note that using the AX APIs to control anything outside your own app is not compatible with the App Sandbox.
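To give a feel for how verbose this gets, here is a minimal C sketch that looks for a window by title in one application and presses the first button with a given title. It is a sketch under assumptions: the function names, the depth limit, and the idea of already knowing the target's process ID are all invented for illustration, and real UIs may label buttons through a description or a child element rather than kAXTitleAttribute. The `#ifdef __APPLE__` guard only keeps the file compilable on other platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef __APPLE__
#include <ApplicationServices/ApplicationServices.h>

enum { kMaxDepth = 32 }; /* arbitrary guard against runaway recursion */

/* True if the element's attribute is a string equal to `wanted`. */
static bool attr_equals(AXUIElementRef el, CFStringRef attr, CFStringRef wanted) {
    CFTypeRef value = NULL;
    bool match = false;
    if (AXUIElementCopyAttributeValue(el, attr, &value) == kAXErrorSuccess &&
        value != NULL) {
        match = CFGetTypeID(value) == CFStringGetTypeID() &&
                CFStringCompare((CFStringRef)value, wanted, 0) == kCFCompareEqualTo;
        CFRelease(value);
    }
    return match;
}

/* Depth-first search below `el`; presses the first button titled `title`. */
static bool press_button_under(AXUIElementRef el, CFStringRef title, int depth) {
    CFTypeRef children = NULL;
    bool pressed = false;
    if (depth > kMaxDepth) return false;
    if (attr_equals(el, kAXRoleAttribute, kAXButtonRole) &&
        attr_equals(el, kAXTitleAttribute, title)) {
        return AXUIElementPerformAction(el, kAXPressAction) == kAXErrorSuccess;
    }
    if (AXUIElementCopyAttributeValue(el, kAXChildrenAttribute, &children) ==
            kAXErrorSuccess && children != NULL) {
        if (CFGetTypeID(children) == CFArrayGetTypeID()) {
            CFIndex n = CFArrayGetCount((CFArrayRef)children);
            for (CFIndex i = 0; i < n && !pressed; i++) {
                pressed = press_button_under(
                    (AXUIElementRef)CFArrayGetValueAtIndex((CFArrayRef)children, i),
                    title, depth + 1);
            }
        }
        CFRelease(children);
    }
    return pressed;
}
#endif

/* Returns true if a matching button was found and pressed. */
bool press_button_in_window(int pid, const char *window_title,
                            const char *button_title) {
    assert(window_title != NULL && button_title != NULL);
#ifdef __APPLE__
    bool pressed = false;
    CFStringRef wTitle = CFStringCreateWithCString(kCFAllocatorDefault, window_title,
                                                   kCFStringEncodingUTF8);
    CFStringRef bTitle = CFStringCreateWithCString(kCFAllocatorDefault, button_title,
                                                   kCFStringEncodingUTF8);
    AXUIElementRef app = AXUIElementCreateApplication((pid_t)pid);
    CFTypeRef windows = NULL;

    if (wTitle != NULL && bTitle != NULL && app != NULL &&
        AXUIElementCopyAttributeValue(app, kAXWindowsAttribute, &windows) ==
            kAXErrorSuccess &&
        windows != NULL && CFGetTypeID(windows) == CFArrayGetTypeID()) {
        CFIndex n = CFArrayGetCount((CFArrayRef)windows);
        for (CFIndex i = 0; i < n && !pressed; i++) {
            AXUIElementRef win =
                (AXUIElementRef)CFArrayGetValueAtIndex((CFArrayRef)windows, i);
            if (attr_equals(win, kAXTitleAttribute, wTitle)) {
                pressed = press_button_under(win, bTitle, 0);
            }
        }
    }
    if (windows != NULL) CFRelease(windows);
    if (app != NULL) CFRelease(app);
    if (bTitle != NULL) CFRelease(bTitle);
    if (wTitle != NULL) CFRelease(wTitle);
    return pressed;
#else
    (void)pid;
    return false;
#endif
}
```

To watch "continuously", one option is to call a function like this from a timer; the AX API also offers notifications through AXObserver, which this sketch does not use.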

Creating a window manager type overlay for Mac OS X

I want to make my own window manager for OS X, or at least give it the appearance of a new one. I have many designs written down in a book, and would like to implement them. These include altering, or even completely removing, menu bars, creating entirely new GUIs for switching applications, etc.
I know that OS X does not have a separate window manager, and that basically the functions that an X11 window manager would perform are done by Carbon, Cocoa, the Dock application, and the window server. I've read that it would take an incredible amount of reverse engineering to write my own API, etc. at the hardware level. I am still not that good at programming though, and don't have that kind of time. That's why I was thinking of maybe running an application on top of OS X that will function like a separate window manager and do everything that the normal OS GUI / window manager would do.
Is this possible? For example: making a custom button that would appear upon a certain key combination, that could be clicked to access a document viewer, change the time, minimize a window, etc. Is there some way to access functionality to basic tasks / actions like this without using the default OS X button controls, and implementing them with my own GUI? I am talking about more than a simple theme change, I want to completely change the user experience. This means that this application would be run in a full screen mode that blocks out default OS X menu bars.
I've heard something about using graphics architectures to plug in your own window manager? Would this be an option too? If so, how would I go about doing that?

Cocoa Accessibility API and Spaces?

I'm facing a problem for an application I'm writing (http://code.google.com/p/blazingstars/issues/detail?id=25), where my program is a menulet (menu bar) application that uses the Accessibility API to interact with and control another program. I do the usual things like registering for the API notifications and getting the window list through API calls, etc., but I realized a while ago that if my program is started in a second Space (virtual desktop) after the program I'm interacting with is started in the first, my program will crash and burn because it can't access any information about its target. (Is there a way around that problem I'm missing?)
A simple solution would be to pop up a dialog asking the user to restart the program in the correct Space, but for the life of me I can't figure out how to tell which Space my target is in, either through NSWorkspace or the Accessibility API, so that I can compare it to the Space that I'm in. Any ideas?
Note that setting the collection behaviour to NSWindowCollectionBehaviorCanJoinAllSpaces isn't going to do me any good because I have to do a bunch of work upon launch, so I have to be in the same space as my target right from the start.
I think you can do this with the APIs in CGWindow.h.
Specifically see CGWindowListCopyWindowInfo() and kCGWindowWorkspace.
I've used these APIs to do all types of things like getting window contents, window frames, etc...
If that doesn't work then you might want to try this private API:
extern CGSError CGSGetWindowWorkspace(const CGSConnectionID cid,
                                      CGSWindowID wid,
                                      CGSWorkspaceID *workspace);
The trick would be getting the connection ID of the target process.
You should probably redesign your app so that it delays its initialization until the app you want to control is in the current space.
There is no easy way to do this under Leopard because there are no official "space change" notifications, but the blog post and comments on this page may help.
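One way the CGWindowListCopyWindowInfo() suggestion might be used to delay initialization is sketched below in C. It rests on an assumption worth stating loudly: that the kCGWindowListOptionOnScreenOnly option leaves out windows sitting in other Spaces, so "the target owns an on-screen window" is treated as a stand-in for "the target is in the current Space". The function name is hypothetical, the sketch avoids the kCGWindowWorkspace key (which I believe is deprecated in later SDKs), and the `#ifdef __APPLE__` guard is only there so the file compiles elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef __APPLE__
#include <ApplicationServices/ApplicationServices.h>
#endif

/* Returns true if the process `pid` owns at least one window that the
   window server currently reports as on screen. */
bool pid_has_onscreen_window(int pid) {
    assert(pid > 0);
#ifdef __APPLE__
    bool found = false;
    CFArrayRef windows =
        CGWindowListCopyWindowInfo(kCGWindowListOptionOnScreenOnly, kCGNullWindowID);
    if (windows == NULL) {
        return false;
    }
    CFIndex count = CFArrayGetCount(windows);
    for (CFIndex i = 0; i < count && !found; i++) {
        /* Each entry is a dictionary describing one window. */
        CFDictionaryRef info = (CFDictionaryRef)CFArrayGetValueAtIndex(windows, i);
        CFNumberRef owner =
            (CFNumberRef)CFDictionaryGetValue(info, kCGWindowOwnerPID);
        int ownerPid = 0;
        if (owner != NULL && CFNumberGetValue(owner, kCFNumberIntType, &ownerPid)) {
            found = (ownerPid == pid);
        }
    }
    CFRelease(windows);
    return found;
#else
    (void)pid;
    return false;
#endif
}
```

The app could poll this from a timer and run its launch-time work only once it returns true, instead of asking the user to restart in the right Space.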

Resources