So I have this program that I really like, and it doesn't support AppleScript. I'd like to automate it a little bit. Now, I know that I could use AppleScript to tell the program to tell the menu to tell the submenu to tell the menu item to activate or whatever, but frankly I don't like AppleScript very much anyway.
When I open the NIB file in IB, I can see the messages that are being sent to FirstResponder; for example, the Copy menu item sends "copy:". Is there any way for me to invoke this directly from another program?
No. It's called protected memory for a reason, you know. The other program is completely insulated from your application. There are ways to put code into other apps, but (a) it's very inadvisable (b) requires root privileges, which means the rest of your app needs to be ROCK SOLID AND IMPREGNABLE, and (c) writing such code is a black art requiring knowledge of the operating system kernel interfaces, virtual memory management, the ABI, the internals of the linker/loader, assembler programming, and the operational parameters and other specifics of the particular processor upon which your app happens to be running.
Really, AppleEvents and other such IPC mechanisms are there for a reason.
Your other alternatives (all of which are a bit hacky, to be honest, and give you the fairly significant burden of ensuring the target app is in the state you want/expect) to access the data you're looking for are:
The Accessibility APIs from the ApplicationServices framework, through which you can traverse the UI tree to grab the text from wherever you need it directly, or can activate the menu item. Access for your app has to be explicitly granted by the user, however (although this is much the same as the requirement for UI scripting).
You can use the CoreGraphics APIs (within the ApplicationServices framework again) to send keyboard events to the target application (or just to the system) directly. This would mean sending four events: Command-down, C-down, C-up, Command-up.
None of these are ideal. To be honest, your best approach would be to look at your requirements and figure out how you can best engineer around the problem by changing those requirements in some way, i.e. instead of grabbing something directly, ask the user to provide some input, etc.
You might be interested in SIMBL or in mach_inject. SIMBL is a daemon (in my fork based on mach_inject, in the original version based on injection via some ScriptingAdditions hack) which does the injection for you, so you just need to put a bundle with your code into the SIMBL directory and SIMBL will inject it for you into the target application. Or you can do so yourself via mach_inject. Or probably more convenient, mach_inject_framework which injects and runs code which just loads some framework.
I think Jim may overstate the point a bit; he's not wrong, but it seems misleading. There are lots of ways to cause a Cocoa program to execute its own code under your control (Carbon is harder). The Accessibility API is very commonly used this way (so commonly that I expect it to be repurposed eventually). F-Script can give you all kinds of access to the innards of another Cocoa program. While Input Managers may well exit the scene at some point, SIMBL is still out there today to do this kind of stuff.
Whether you like AppleScript or not, Apple Events are the primary way Apple provides for inter-program control. Have you double-checked Script Editor's Open Library function to find out if the program really does have any AppleScript support? You can code Apple Events entirely in Objective-C these days using Leopard's Scripting Bridge. I wrote up a tutorial if you like (it's still under-documented by Apple).
Cocoa is a reverse-engineer's dream. The same guys who host SIMBL have a nice intro to the subject. "Wolf" also writes a lot of useful information on this.
Jim's right. Many of these approaches can completely destabilize the system if done incorrectly (sometimes even if done correctly). I don't do much of this stuff on my production systems; I need them to work. But there are a lot of things you can make a Mac app do, and it's a good part of a Mac developer's training to understand how all the pieces really work.
I have a simple problem, and I will be straightforward.
Suppose I have a third-party cocoa application running that has a chat box inside. Well, I need to capture the text inside that chat-box in real time from another application and write a logfile in real time with that information.
I am sure there is a way, I just don't know where to start. I have experience with Cocoa and Objective-C, and I have some apps in the iPhone App Store.
Thank you very much
Unless the app is suitably scriptable (e.g. AppleScript) or has some kind of external API then you're not going to be able to do this.
In short: Contact the developer of the application, but don't get your hopes up.
Unfortunately, in this day and age of protected memory and whatnot, we more or less have to be content with what the applications give us to work with.
However: You are not entirely without recourse. Using F-Script you might be able to attach to the process and cause some controller or other to emit notifications that you can capture and log.
Edit: If, as appears to be the case, it's a Carbon application, you are well and truly hosed:
F-Script and similar tools are unlikely to work.
Even if it is, trying injection on a Carbon app, that is to say, a C++ app, is likely to be an exercise in futility and disappointment, if not completely impossible.
Seeing as how Carbon is deprecated (and how!), the application is unlikely to be updated with a proper API for that sort of thing.
All of the above.
Re-edit: One tiny caveat: it is possible, although unlikely, that you can achieve something using Interface Scripting, but again, I wouldn't get my hopes up.
I have an idea to start developing a bare-bones Qt/GTK+-like framework, but I want to know some things before I start this project:
What is the structure of GTK+ and Qt?
Do I need to develop a window manager to build my own framework?
Some resources to start?
Developing a GUI/application framework is a significant undertaking. You might want to be very clear about why you need to write yet another framework.
Both projects you mention are open source. Why not start there?
GTK: git clone git://git.gnome.org/gtk+
Qt: git clone git://gitorious.org/qt/qt.git
Ed You ask what the structure of GTK and Qt is, whether you need to write your own window manager (answer: no) and how to get started. Answers to at least the first two are in the source code. Don't forget, great practitioners in any field learn by watching others. Reading code is no different.
Writing a GUI/app framework would be a great learning experience, but even a fairly small app framework would be a very big job, and not something you really should tackle until you're fairly expert in writing applications using several other frameworks and widget toolkits.
I did something like this once, back in the early years of this decade. That was after I'd been programming for the Mac for over 15 years, Windows over 10, and had programmed both directly to their native graphics, event, and widget APIs, as well as various object-oriented toolkits for them including PowerPlant, MFC, and MacApp. When I started working on a PalmOS application, I spent a couple of weeks writing a very small app framework modeled on PowerPlant. But I could not have succeeded at all without those decades of broad and deep experience with so many GUI systems.
Doing this for Linux/X11 is even more work. That's because, unlike Mac OS and Windows, neither X11 nor Linux supply built-in user interface widgets, or much in the way of graphics primitives or text layout capabilities. GTK+ is part of the GNOME ecosystem; it provides the widgets, gets its message queue and internal communications from GObject, relies on GDK to abstract and simplify its graphics and event communications with X11, and uses Pango and Cairo for text rendering and layout. I work all through that system, and it probably represents many dozens of person-years of hard work by a lot of really smart people. And I'm sure Qt is very similar.
So if you really want to do this, I would recommend you:
Write programs with a lot of different app and widget toolkits, on multiple operating systems. That will help you learn not just how such systems work, but why they are designed as they are. And it will give you some feeling for what works well, and what works poorly.
Contribute bug fixes or new features to one or more of the various open-source frameworks. GTK+ has a list of tasks for beginners to work on. Another great open-source framework is wxWidgets.
Become an expert-level C/C++ programmer.
When you've done that for a few years, you will have the expertise suitable for tackling your own framework.
That sounds like a major undertaking, at least as a starting project.
Not sure what you mean by "the structure" of e.g. GTK+. You can see the object hierarchy for GTK+, that tells you at least how the implemented objects (GTK+ is an object-oriented API) relate to each other. You can guess how the code can be structured, from that information.
And no, you don't need to write your own window manager; the toolkits mainly concern themselves with what happens inside windows, not with the window management itself. Of course you could decide that your "platform" should have a wider scope, and include a WM.
I think some of the answers here might exaggerate a bit. Obviously making something of the same quality, breadth and depth as Qt and GTK+ is a huge undertaking. But you can make simpler stuff and still learn a lot about how it works. I suggest doing what I did in university: use OpenGL with GLUT. Then you have basic drawing functionality and an event system in place already. You then need to create classes for buttons, text fields etc.
If you want to make it really simple, then each component just needs to know where it is drawn and have some sort of bounding box that you use to check whether mouse clicks are inside it or not. You also need to create some system which makes it possible for buttons, check boxes etc. to tell the rest of your code that they were clicked.
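To make the bounding-box idea concrete, here is a minimal sketch in Python. The names `Button`, `Window`, `on_click` and `dispatch_click` are made up for illustration and do not come from GLUT or any real toolkit; drawing is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Button:
    """A hypothetical minimal widget: a rectangle plus click listeners."""
    label: str
    x: int
    y: int
    width: int
    height: int
    listeners: List[Callable[..., None]] = field(default_factory=list)

    def contains(self, px: int, py: int) -> bool:
        # Bounding-box test: is the point inside this widget's rectangle?
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

    def on_click(self, listener: Callable[..., None]) -> None:
        # Let the rest of the program register interest in clicks.
        self.listeners.append(listener)

    def click(self) -> None:
        # Notify every registered listener, passing the widget itself.
        for listener in self.listeners:
            listener(self)


class Window:
    """Owns the widgets and routes mouse clicks to the one that was hit."""

    def __init__(self) -> None:
        self.widgets: List[Button] = []

    def add(self, widget: Button) -> Button:
        self.widgets.append(widget)
        return widget

    def dispatch_click(self, px: int, py: int) -> Optional[Button]:
        # Widgets added later are treated as being on top (an assumption).
        for widget in reversed(self.widgets):
            if widget.contains(px, py):
                widget.click()
                return widget
        return None
```

With GLUT or a game engine you would call something like `dispatch_click(x, y)` from the mouse callback the library already gives you, and draw each widget from its rectangle.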
This isn't really the rocket science people here make it out to be. Games have made their own very simple GUI toolkits for years. You can try that approach as well. I have modeled a simple GUI toolkit on top of a game engine before. Your buttons and text fields could simply be sprites.
But yeah, if you want to make something that will compete with Gtk+ and Qt, forget about it. That is a team effort over many years.
Is it worth trying to keep your GUI within the system look?
Every major program has its own anyway...
(Visual Studio, Internet Explorer, Firefox, Symantec utilities, Adobe ...)
Or should just the frame and dialogs be kept within the system look and feel?
update:
One easy example: if you want to add a close button to your tab, you usually design it to match your current desktop theme. But if the user has a different theme, your close button is out of place and doesn't fit the system look anymore.
I played with the uxtheme API, but there is not much you can do with it, and some themes I've seen are incomplete sets.
So the best way I see to address this issue is to do what Visual Studio/Firefox/Chrome do: roll your own tab control with your own theme...
I think that unless your program becomes a very major part of the user's life, you should strive to minimize "surprises" and maximize recognizability (is that even a word?).
So, if you are making something that is used by 1,000 people for 10 minutes a day, go with the system look and mechanisms.
If, on the other hand, you are making something that 100 people are using for 6 hours a day, I would start exploring what UI improvements and shortcuts I could cram in to make those 6 hours easier to deal with.
Notice, however, that UI fixes must not come at the expense of performance. This is almost always the case in the beginning, when someone thinks that simply overriding the OnPaint event in .NET will be sufficient.
Before you know it, you are once again intercepting WM_NCPAINT and WM_ERASEBKGND and using all those little tricks to make it go as fast as the built-in controls.
I tend to agree with others here- especially Soraz and Smaci.
One thing I'll add, though. If you do feel that the OS L&F is too constraining, and you have good grounds for going beyond it, I'd strive to follow the principle of "pacing and leading" (which I'm borrowing here from an NLP context).
The idea is that you still want to capitalise as much as possible on your intended audience's familiarity with the host OS (there will be rare exceptions to this, as Smaci has already covered). So you use as much as possible of the "standard" controls and behaviours (this is the "pacing") but extend them where necessary in ways that still "fit in" as much as possible (the "leading").
You've already mentioned some good examples of this principle at work: Visual Studio, and even Office to some extent (Office is "special", as new UI styles that cut their teeth here often find their way back into future OS versions or become de facto standards).
I'm bringing this up to contrast the type of apps that just "do it their way" - usually because they've been ported from another platform, or have been written to be cross-platform in GUI as well as core. Java apps often fall into this category, but they're not the only ones. It's not as bad as it used to be, but even today most pro audio apps have mongrel UIs, showing their lineage as they have been ported from one platform to another through the years. While there might be good business reasons for these examples, it remains that their UIs tend to suck and going this route should be avoided if in any way possible!
The overriding principle is still to follow the path of least surprise, and take account of your user's familiarity with the OS, and ratio of their time using your app to others on the OS.
Yes, if only because it enables the OS to use any accessibility features that are built in, like text-to-speech. There is nothing more annoying for someone who needs accessibility features than to have yet another UI that breaks all the tools they are used to.
I'd say it depends on the users, the application and the platform. The interface should be intuitive to the users, which is only the same as following system UI standards if they are appropriate for those users. For example, in the past I have been involved in developing handheld systems for dairy and bread delivery on Windows CE handhelds. The users in this case typically were not computer literate, and had a weak educational background. The user interface focussed on ease of use through simple language and was modelled on a pre-existing paper form system. It made no attempt to follow the Windows look and feel as this would not have been appropriate.
Currently, I develop very graphical software for a user group that is typically 3rd level educated and very computer literate. The expectation here is that the software will adhere to and extend the Windows look and feel.
Software should be easy and intuitive where possible, and how to achieve this is entirely context dependent.
I'd like to reply with another question (Not really Stackoverflow protocol, but I think that, in this case, it's justified)
The question is 'Is it worth breaking the OS look and feel?'
In other words,
Do you have justification for doing so? (In order to present data in some way that's not possible within normal L&F)
What do you gain from doing so? (Improving usability?)
What do you lose from doing so? (Intuitiveness & familiarity?)
Don't simply do it 'To be different'
It depends on how wide you would define system look'n feel... But in general, you should keep it.
Do not surprise the user by deviating from what he is used to. That's one of the reasons why we call him user ;-)
Firefox and Adobe products usually don't because they are targeting several platforms which all have their own L&F. But Visual Studio keeps the typical Windows L&F. And, as long as you are developing only for Windows, so should you.
Apart from the fact that there is no well-defined look-n-feel on Windows, you should always try to follow the host platform's native L&F. Note, however, that look-n-feel is just as much about how a program behaves as how it looks. Programs which behave in a counter-intuitive way are just as annoying as programs sporting their own ugly widgets.
Fraps is a good example (IMHO) of a program which is actually very useful, but breaks several user interface guidelines and looks really ugly.
If you're developing for Apple's Mac OS X or Microsoft Windows, the vendors supply interface guidelines which should be followed for any application to be "native".
See Are there any standards to follow in determining where to place menu items? for more information.
If you are on (or develop for) a Mac, then definitely YES!
And this should be true for Windows also.
In general, yes. But there's the occasional program that does well despite not being tailored to all the OSes it runs on. For example, Emacs runs pretty much contrary to every interface guideline on OS X or Windows (and probably even GNOME/KDE) and it's not going away any time soon.
I strongly recommend making your application look native.
A common mistake that developers who are porting an application to a new platform seem to make is that the new application should look-and-feel like it does on the old platform.
No, the new application should look and feel like all the other applications that the user is used to on the new platform.
Otherwise, you get abominations like iTunes on Windows. The same UI design may be exactly right on one platform and very wrong on the next.
You will find that your users may not be able to pinpoint why they dislike your application; they just find it hard to use.
Yes, there are valid exceptions, but they are rare (and sure enough, they tend to be the major applications like Office and Firefox, rather than the little ones). If you are unsure enough to have to ask on StackOverflow, your application isn't one of them.
What would be the best way of inserting functionality into a binary application (3rd-party, closed source)?
The target application is on OSX and seems to have been compiled using gcc 3+. I can see the listing of functions implemented in the binary and have debugged and isolated one particular function which I would like to remotely call.
Specifically, I would like to call this function - let's call it void zoomByFactor(x,y) - when I receive certain data from a complex HIDevice.
I can easily modify or inject instructions into the binary file itself (ie. the patching does not need to occur only in RAM).
What would you recommend as a way of "nicely" doing this?
Edit:
I do indeed need the entire application, so I can't ditch it and use a library. (For those who need an ethical explanation: this is a proprietary piece of CAD software whose company website hasn't been updated since 2006. I have paid for this product (quite a lot of money for what it is, really) and have project data which I cannot easily migrate away from it. The product suits me just fine as it is, but I want to use a new HID which I recently got. I've examined the internals of the application, and I'm fairly confident that I can call the correct function with the relevant data and get it to work properly.)
Here's what I've done so far, and it is quite ghetto.
I've already modified parts of the application through this process:
xxd -g 0 binary > binary.hex
cat binary.hex | awk 'substitute work' > modified.hex
xxd -r modified.hex > newbinary
chmod 777 newbinary
I'm doing this kind of jumping through hoops because the binary is almost 100 megs large.
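The hex dump round trip above can also be avoided by patching the bytes in place. A minimal Python sketch, assuming the replacement has the same length as the original byte sequence (so no offsets shift) and that the sequence occurs exactly once; `patch_bytes` is a hypothetical helper name, not part of any existing tool:

```python
import mmap


def patch_bytes(path: str, old: bytes, new: bytes) -> int:
    """Overwrite the single occurrence of `old` with `new`; return its file offset."""
    if len(old) != len(new):
        # A different length would shift everything after the patch.
        raise ValueError("replacement must be the same length as the original")
    # Memory-map the file so a 100 MB binary is not copied around as text.
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        offset = mm.find(old)
        if offset == -1:
            raise ValueError("pattern not found")
        if mm.find(old, offset + 1) != -1:
            # Refuse to guess which occurrence was meant.
            raise ValueError("pattern occurs more than once")
        mm[offset:offset + len(new)] = new
        mm.flush()
        return offset
```

Work on a copy of the binary; the file is modified directly and there is no undo.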
The gist of what I'm thinking is that I'd jmp somewhere in the main application loop, launch a thread, and return to the main function.
Now, the questions are: where can I insert the new code? do I need to modify symbol tables? alternatively, how could I make a dylib load automatically so that the only "hacking" I need to do is inserting a call to a normally loaded dylib into the main function?
For those interested in what I've ended up doing, here's a summary:
I've looked at several possibilities. They fall into runtime patching, and static binary file patching.
As far as file patching is concerned, I essentially tried two approaches:
modifying the assembly in the code segments (__TEXT) of the binary.
modifying the load commands in the mach header.
The first method requires there to be free space, or methods you can overwrite. It also suffers from extremely poor maintainability. Any new binaries will require hand patching them once again, especially if their source code has even slightly changed.
The second method was to try to add an LC_LOAD_DYLIB entry into the Mach header. There aren't many Mach-O editors out there, so it's hairy, but I actually modified the structures so that my entry was visible with otool -l. However, this didn't actually work, as there was a "dyld: bad external relocation length" error at runtime. I'm assuming I need to muck around with import tables etc. And this is way too much effort to get right without an editor.
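For anyone poking at the load commands by hand, a small read-only parser helps to check what the header currently declares, roughly the LC_LOAD_DYLIB part of what otool -l prints. This is a sketch under assumptions: it handles only thin (non-fat) Mach-O images, it relies on the field layout and constants from `<mach-o/loader.h>`, and `list_load_dylibs` is a name made up for illustration. It does not add or fix an entry.

```python
import struct

MH_MAGIC, MH_MAGIC_64 = 0xFEEDFACE, 0xFEEDFACF  # 32-bit and 64-bit magic
LC_LOAD_DYLIB = 0xC


def list_load_dylibs(data: bytes):
    """Return the library paths named by LC_LOAD_DYLIB commands in a thin Mach-O image."""
    # Try both byte orders: PowerPC images are big-endian, Intel little-endian.
    for endian in ("<", ">"):
        (magic,) = struct.unpack_from(endian + "I", data, 0)
        if magic in (MH_MAGIC, MH_MAGIC_64):
            break
    else:
        raise ValueError("not a thin Mach-O image (fat binaries are not handled)")

    # The 64-bit header has one extra reserved field after the 7 common ones.
    header_size = 32 if magic == MH_MAGIC_64 else 28
    # ncmds and sizeofcmds are the 5th and 6th 32-bit fields of the header.
    ncmds, _sizeofcmds = struct.unpack_from(endian + "II", data, 16)

    names = []
    offset = header_size
    for _ in range(ncmds):
        # Every load command starts with its type and total size.
        cmd, cmdsize = struct.unpack_from(endian + "II", data, offset)
        if cmdsize < 8:
            raise ValueError("corrupt load command")
        if cmd == LC_LOAD_DYLIB:
            # The path is a NUL-terminated string stored inside the command,
            # at an offset measured from the start of the command.
            (name_offset,) = struct.unpack_from(endian + "I", data, offset + 8)
            raw = data[offset + name_offset:offset + cmdsize]
            names.append(raw.split(b"\0", 1)[0].decode("utf-8"))
        offset += cmdsize
    return names
```

Calling it on the bytes of the patched binary shows whether the new entry is at least structurally readable; it says nothing about whether dyld will accept it at runtime.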
The second path was to inject code at runtime. There isn't much out there to do this, even for apps you have control over (i.e. a child application you launch). Maybe there's a way to fork() and get the initialization process launched, but I never got that far.
There is SIMBL, but this requires your app to be Cocoa because SIMBL will pose as a system wide InputManager and selectively load bundles. I dismissed this because my app was not Cocoa, and besides, I dislike system wide stuff.
Next up was mach_inject and the mach_star project. There is also a newer project called PlugSuit hosted at google which seems to be nothing more than a thin wrapper around mach_inject.
Mach_inject provides an API to do what the name implies. I did find a problem in the code though. On 10.5.4, the mmap method in the mach_inject.c file requires there to be a MAP_SHARED or'd with the MAP_READ or else the mmap will fail.
Aside from that, the whole thing actually works as advertised. I ended up using mach_inject_bundle to do what I had intended to do with the static addition of a DYLIB to the mach header: namely launching a new thread on module init that does its dirty business.
Anyways, I've made this a wiki. Feel free to add, correct or update information. There's practically no information available on this kind of work on OSX. The more info, the better.
In Mac OS X releases prior to 10.5 you'd do this using an Input Manager extension. Input Manager was intended to handle things like input for non-Roman languages, where the extension could pop up a window to input the appropriate glyphs and then pass the completed text to the app. The application only needed to make sure it was Unicode-clean, and didn't have to worry about the exact details of every language and region.
Input Manager was wildly abused to patch all sorts of unrelated functionality into applications, and often destabilized the app. It was also becoming an attack vector for trojans, such as "Oompa-Loompa". Mac OS X 10.5 tightens restrictions on Input Managers: it won't run them in a process owned by root or wheel, nor in a process which has modified its uid. Most significantly, 10.5 won't load an Input Manager into a 64-bit process, and Apple has indicated that even 32-bit use is unsupported and will be removed in a future release.
So if you can live with the restrictions, an Input Manager can do what you want. Future MacOS releases will almost certainly introduce another (safer, more limited) way to do this, as the functionality really is needed for language input support.
I believe you could also use the DYLD_INSERT_LIBRARIES method.
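A rough sketch of what that looks like from a launcher script, in Python. The helper names `injection_env` and `launch_with_dylib` and all paths are hypothetical; the only real piece is the `DYLD_INSERT_LIBRARIES` environment variable, which dyld reads as a colon-separated list of libraries to load before the program's own. Newer OS releases may ignore the variable for protected binaries, so treat this as something to try rather than a guarantee.

```python
import os
import subprocess


def injection_env(dylib_path, base_env=None):
    """Return a copy of the environment with dylib_path added to DYLD_INSERT_LIBRARIES."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("DYLD_INSERT_LIBRARIES")
    # Append to any libraries that are already being inserted.
    env["DYLD_INSERT_LIBRARIES"] = (
        f"{existing}:{dylib_path}" if existing else dylib_path
    )
    return env


def launch_with_dylib(executable, dylib_path, *args):
    """Start the target with the library inserted (only meaningful on OS X)."""
    return subprocess.Popen([executable, *args], env=injection_env(dylib_path))
```

The inserted library still has to do something once loaded, for example start a thread from an initializer, which is the same job the bundle does in the mach_inject approach.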
This post is also related to what you were trying to do;
I recently took a stab at injection/overriding using the mach_star sources. I ended up writing a tutorial for it since documentation for this stuff is always so sketchy and often out of date.
http://soundly.me/osx-injection-override-tutorial-hello-world/
Interesting problem. If I understand you correctly, you'd like to add the ability to remotely call functions in a running executable.
If you don't really need the whole application, you might be able to strip out the main function and turn it into a library file that you can link against. It'll be up to you to figure out how to make sure all the required initialization occurs.
Another approach could be to act like a virus. Inject a function that handles the remote calls, probably in another thread. You'll need to launch this thread by injecting some code into the main function, or wherever else is appropriate. Most likely you'll run into major issues with initialization, thread safety, and/or maintaining proper program state.
The best option, if it's available, is to get the vendor of your application to expose a plugin API that lets you do this cleanly and reliably in a supported manner.
If you go with either hack-the-binary route, it'll be time consuming and brittle, but you'll learn a lot in the process.
On Windows, this is simple to do, is actually very widely done and is known as DLL/code injection.
There is a commercial SDK for OSX which allows doing this: Application Enhancer (free for non-commercial use).
Is there anything similar on Windows that would achieve the same as the InputManager on OS X?
If you are looking to inject code into processes (which is what Input Managers are most commonly used for), the Windows equivalents are:
AppInit_DLLs to automatically load your DLL into new processes,
CreateRemoteThread to start a new thread in a particular existing process, and
SetWindowsHookEx to allow the capture of window events (keyboard, mouse, window creating, drawing, etc).
All of these methods require a DLL which will be injected into the remote process. C would be the best language to write such a DLL in, as it needs to be quite lightweight so as not to bog the system down. IPC mechanisms such as named pipes can be used to communicate with a master process should this be required.
Googling for these three APIs will turn up general sample code for these methods.
I'm pretty sure Windows has an API that developers can use to create new kinds of text input systems. I gather there are a wide variety of text input systems in use in non-Roman-derived markets, many of which are provided by third parties.
It's unclear if that's what you were really asking about, though, because you just assumed everyone knows what you would want to use an Input Manager for on Mac OS X.
If you want to create a new type of input method, ask how to do that.
If you want to get your own code running inside other applications, ask how to do that.
Don't just assume people can read your mind when asking questions, and don't assume that they have the same experience that you do and will recognize all the same platform-specific terminology.