Why Are Automated GUI Tools So Fragile? - user-interface

For about a year and a half, I've been working with SilkTest, which is a GUI automation tool for both desktop and web applications. It simulates mouse and keyboard input, which in turn simulates end-user behaviour. However, I find that it is a bit flaky; Button.Click() or DialogBox.Close() method calls that work just fine 9 times in a row seem to fail on the 10th call, only to go back to working on the 11th. Normally I would just chalk this up to a quirk of SilkTest (or the application under test, or the OS, or what have you), but then I see that there are similar issues with other GUI automation tools like Selenium:
Selenium Click() fails with Anchor Elements
Selenium Click() fails clicking button object
I know that for desktop apps, each GUI control/dialog has a tag element associated with it (at least in Windows-based GUIs) and that for web pages there is the Document Object Model (DOM) hierarchy of page elements. My guess is that these tools sometimes run into issues navigating these hierarchies and finding unique elements and controls. But what is going on here? SilkTest is a relatively old, commercial software package, while Selenium is relatively new, open source and constantly evolving. The fact that they both can have similar problems raises a couple of flags with me.
Also, is this the case with other GUI test tools? Or have I just had a somewhat unusual experience?

There are two things you are talking about here. The first is the concept of finding the object in the application under test that you want to automate. Your description of how SilkTest (and other tools) does this is quite accurate: as long as there is something that the automation software can use to identify the control, you are fine.
The second is why the automation itself fails randomly. Since the tool has not reported that it could not find the control, it must think that it sent the appropriate action to the application, e.g. a Click or a Type. It could be that the application is not ready to accept the action that you are sending it. This is similar to you attempting to click on something "before it was ready"; in that case the application can decide either to buffer the input or to discard it.
So, how do you fix this? One way would be to use the capabilities of the tool to try to work out when the application is ready for input, rather than sending it a stream of input blindly. SilkTest has capabilities that allow you to do this (as does TestPartner). I cannot comment on Selenium as it is something I have not used.
A simple way of testing this would be to insert a pause of a couple of seconds before the offending action, then run it in a loop to see whether that solves the problem. If it does, then timing is your problem. If it does not, then something else is going on and you need to contact the vendor of the testing tool.
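The difference between a fixed pause and waiting for readiness can be sketched in a few lines. This is only an illustration in Python, not SilkTest or Selenium code: wait_until, FakeButton, is_enabled and click_when_ready are hypothetical names standing in for whatever readiness check your tool actually provides.

```python
import time


def wait_until(is_ready, timeout=5.0, interval=0.1):
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeButton:
    """Hypothetical stand-in for a control that is not ready immediately."""

    def __init__(self, ready_after_polls):
        self.polls = 0
        self.ready_after_polls = ready_after_polls
        self.clicked = False

    def is_enabled(self):
        # Becomes ready only after it has been polled a few times.
        self.polls += 1
        return self.polls >= self.ready_after_polls

    def click(self):
        self.clicked = True


def click_when_ready(button, timeout=5.0, interval=0.01):
    """Send the click only once the control reports that it is ready."""
    if not wait_until(button.is_enabled, timeout=timeout, interval=interval):
        raise TimeoutError("control never became ready for input")
    button.click()
```

Unlike a fixed two-second pause, this returns as soon as the control is ready and fails loudly when it never becomes ready, which makes an intermittent failure easier to diagnose.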
Remember that applications are getting more and more complex (multi-threading, network communications and so on), and any one of these could cause the automatic synchronisation to fail, which in turn causes actions to fail.
Hope that helps.


TWebBrowser control slow performance only in Delphi

Can someone explain to me why the TWebBrowser control is so slow in all XE editions of Delphi, including XE5 and possibly XE6? To test this, create a new Delphi project and put a TWebBrowser control on the form. In the form's show event, navigate to this website:
Please test this on Windows 7 or later. When navigation is complete, run the setImmediate test and watch the results. It takes a huge amount of time to complete: about a minute.
When you open the real Internet Explorer browser and do the same thing, the test completes almost instantly (~200 milliseconds).
Some additional weird information:
When you repeat this procedure in an old version of Delphi (Delphi 7, to be precise), the web control works as fast as it should and the test completes instantly. But the HTML5 speed test still runs slowly (the alternative test on this page).
Another weird thing is that the same slow behavior can be seen in C++ Builder but not in Visual Studio products. Is Microsoft deliberately slowing down TWebBrowser in Embarcadero products? I can't believe this.
I have tried to overcome this problem with different methods, such as:
Trying different feature options in the registry, such as:
FEATURE_ALIGNED_TIMERS (undocumented option),
FEATURE_ALLOW_HIGHFREQ_TIMER (undocumented option),
Setting timeBeginPeriod(1) - no effect.
Please, if anyone has any clue how to fix this issue, share it with me.
I made a standalone test app if anyone cares. It can be downloaded here: http://mp.org.pl/download/ietest.zip It contains the source and an exe together with an htm file. The htm file contains a JS procedure that runs 10 times faster in standalone IE than in the TWebBrowser control. It uses setImmediate as a test (the same procedure used in the test described above), but it may be easier to test this way.
I can also see the behavior described (in your original post and in the comments). I have a few thoughts, but not necessarily an answer.
One should expect some difference in performance between the WebBrowser control and IE, in part because your Delphi app will need to build in support for certain features/APIs that IE supports out of the box.
For example, the WebBrowser control fires notifications related to tabbed browsing (old, but relevant), but it does not intrinsically handle those notifications or update the UI. You have to respond to the notifications and draw the tabs yourself. By default, IE is hardware accelerated and uses certain Windows APIs that may not be directly supported by Delphi's VCL (for resource/performance reasons). (Hardware acceleration could account for some of the performance differences you've noticed.)
(And, for the record, I don't believe a list of differences between IE and the WebBrowser control was ever documented. I certainly don't remember seeing one in the portfolio.)
Also, the default values for various feature controls vary between IE and applications hosting the WebBrowser control. Part of the reason for this stems from the idea that IE needs to highlight performance over compatibility whereas applications generally need to emphasize compatibility over performance. You may wish to review the feature control reference to see if there are other FCKs you need to enable for your app.
Second, your loops are very tight, perhaps too tight. You've got one request piling on earlier requests and you're not really leaving much room for processing, even with setImmediate. (IIRC, we're not really supposed to use anything smaller than 250ms for setInterval without risking performance hits from the sheer number of requests.) The remarks in the setImmediate reference page provide some guidance, as does this article on requestAnimationFrame.
One reason why dragging the window appears to improve performance may be the priority of window drag repaint requests. They may be forcing your loops to hold long enough (or even break) to allow other events to process. Hard to say without tracing the system with a debugger.
Have you ever had to add application.processMessages() to your Delphi apps in order to allow the system a chance to handle the work you've already assigned? A similar need may be coming into play given the nature of your test.
Performance testing and timing is a tricky thing. You need to make sure the test isn't imposing so much overhead that it interferes with the actual work you're trying to perform.
Finally, there were some questions about the document mode of the page as it's loaded into your project. When I first started messing around with your sample, I couldn't get project4 to load slowtest.html in anything other than IE5 quirks mode (notoriously slow). Here's what eventually started working for me:
<!DOCTYPE html>
<!-- saved from url=(0023)http://www.contoso.com/ -->
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<script type="text/javascript">
(Note: I deleted your initial doctype declaration and rewrote it to resolve a syntax error that was being reported by the F12 tools debugger.)
A few key style points here:
I used a mark of the web to load the page in the Internet zone. I find this makes it easier to load the page in edge mode, as pages in the intranet zone are loaded in compatibility view by default (unless you map the zones differently).
The x-ua-compatible header needs to be one of the first in the head block. It can follow title, but not much else.
Stylistically, elements need to be specified in lower case these days. There's a possibility that not following the current conventions forces the parser to fall back to an earlier rendering that supports the conventions.
Once I was able to control the documentMode at runtime, I found results I expected: older document modes ran more slowly. I also found that using requestAnimationFrame instead of setImmediate led to even better performance, but also surfaced the timing issue almost immediately.
In the end, this may be a case where the test highlights a problem, but not necessarily one you're trying to solve. (Insert Inigo meme here.) I get that you're trying to resolve a bottleneck. Are you sure you've found the correct bottleneck?
You may not be able to replicate the same performance of the native browser, but perhaps you can refactor the code to perform adequately without the extra overhead? Is there anything that might be better handled with a worker or some other implementation technique?
Hope this helps...

Poor performance of Microsoft UI Automation Library

I am currently trying to automate a Windows Forms application by using the Microsoft UI Automation Library and C#, but I have big problems with performance. Identifying a single element with a PropertyCondition, or iterating over all elements of a window, takes a very long time (up to 4 minutes). As soon as I have an AutomationElement, everything is fine (e.g. GetCurrentPropertyValue responds within 100ms).
The poor performance only applies to one application. I don't have access to the source, but if something needs to be changed or checked, I can talk to the responsible programmer. As far as I know, some event handlers (e.g. paint) were overridden in the application. A typical window of the application contains about 100 elements, which are found by the FindAll method.
I also tried the COM interface of the UI automation library, which is about two times faster but this does not really solve the problem.
Does anyone have an idea how to solve this problem or experienced similar behavior?
We found the answer when we took a closer look at the main loop. In most cases Application.Run is used to start up the main window and run the application but for some reason the following code was used:
while DoStop == false
As the Microsoft UI Automation Library uses window messages, all the System.Threading.Thread.Sleep(10) calls added up and made object detection really slow. This does not happen if Application.Run is used.
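A rough back-of-the-envelope model of why the sleeps add up, written in Python rather than C# purely for illustration. It assumes (the post does not confirm this) that the loop handles one queued window message per pass and then sleeps, and that the automation client waits for each reply before sending the next request. The figure of 20 messages per element is made up for the arithmetic; drain_time_ms is a hypothetical helper.

```python
from collections import deque


def drain_time_ms(message_count, sleep_ms, handle_ms=0.0, messages_per_pass=1):
    """Simulated time (ms) for a hand-rolled polling loop to answer
    `message_count` queued window messages.

    Each pass handles up to `messages_per_pass` messages and then sleeps
    for `sleep_ms`, mimicking a loop that calls Thread.Sleep on every
    iteration. No real time passes; the clock is only simulated.
    """
    queue = deque(range(message_count))
    elapsed = 0.0
    while queue:
        for _ in range(min(messages_per_pass, len(queue))):
            queue.popleft()
            elapsed += handle_ms
        elapsed += sleep_ms  # the pause that a blocking message pump avoids
    return elapsed


# Illustrative numbers only: 100 elements, an assumed 20 messages each.
elements = 100
assumed_messages_per_element = 20
total_messages = elements * assumed_messages_per_element

with_sleep = drain_time_ms(total_messages, sleep_ms=10)
without_sleep = drain_time_ms(total_messages, sleep_ms=0)
print(f"with 10 ms sleep per pass: {with_sleep / 1000:.0f} s")
print(f"without sleep:             {without_sleep / 1000:.0f} s")
```

Under these assumptions the sleep alone costs 2000 x 10 ms = 20 seconds for a single window, before any real work is counted. A blocking message pump such as Application.Run wakes as soon as a message arrives, so it does not pay this per-message penalty.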

Visual VoiceXML/VXML development tool?

Does anyone know of any tools out there that will let me run and debug a VXML application visually? There are a ton of VXML development tools, but they all require you to build your application within them.
I have an existing application that uses JSPs to generate VXML, and I'm looking for a way to navigate through and debug the rendered VXML in much the same way that Firebug allows one to do this with HTML. I have some proxy-like tools that let me inspect the rendered code as it is sent to the VXML browser, but there's a ton of JS, which makes traversing the code by hand rather difficult.
Has anyone worked with a product that allows for this?
IVR Avenger
There is the JigSaw test suite; it has a free trial license and is reasonably priced.
There is IBM's debugger - part of WebSphere Voice Toolkit.
Many other products have debuggers - a very good summary is here
Disclaimer: I am the development manager for Voiyager (www.voiyager.com), a VoiceXML testing tool. It doesn't meet your criteria nor do I believe it is the type of tool you want, but I thought it was worth mentioning it.
As far as I know, there isn't such a test tool for VoiceXML. In fact, there are very few VoiceXML tools on the market and hardly any of them are for testing or analysis. The vendors that created development tools have all been acquired by other companies. Some of them did offer various forms of debugging that were specific to their tool set or stayed at the dialog (caller input) level. From your question, I'm assuming you need much lower-level debugging capabilities.
I think the alternative paths are minimal and somewhat difficult. I believe your primary goal is to debug or rewrite an existing application, but you haven't provided any specific challenges beyond the JavaScript. Some thoughts or approaches that may help:
Isolate the JavaScript and place the code into a unit test harness. That will go a long way to understanding the logic of the application. Any encapsulation of the JavaScript you perform will probably go a long way towards better code maintainability.
Attempt to run the VoiceXML through a translation layer to HTML so you can use Firebug. The largest challenge would involve caller input (i.e. processing the SRGS grammars). You could probably cheat here by just having the form accept a JSON string that populates the field values. There are tools on the market to test grammars. Depending on the nature of your problems, you could take a simple and light approach and attempt this over just the trouble areas.
Plumb the application with a lot of logging. This can be done through the VoiceXML LOG element, or push the variable space back to the server. By adding intermediate forms, you may be able to provide a dump from each via the VoiceXML Data element.
See if your application will run in one of the open source VoiceXML browsers (not sure of the state of the open source browsers as we've built and bought for our various product lines). If you can get it mostly working, you can use the development debugger to provide some ability to step through the logic. However, it is probably one of the more difficult paths as you'll really need to understand the browser to know when and where to stick your breakpoints and to figure out how to expose the data you want.
Good luck on the challenge. If you find another approach, I would be interested in seeing it posted.
An alternative debug env is to use something like Asterisk with a voicexml browser plugin like the one from http://www.voiceglue.org/ or for a limited licence, i6net.
You can keep all the pieces separate (the dynamic HTML and VXML application in php/jsp/j2ee, TTS processing, and optional ASR processing) as separate virtual machines with something like VirtualBox. If the logic can be kept the same, then it is just a matter of changing the UI based on the channel.
A softphone is all you need to call a minimal asterisk machine, which has the voicexml browser with the url of the vxml in the call plan.
I just used Zend Framework, as PHP is used in this environment, and changed the view suffix (phtml vs vxml) based on the user-agent string.
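The suffix switch can be sketched as follows. This is Python rather than the PHP/Zend Framework code the answer refers to, and the marker substrings are hypothetical: check what your own VoiceXML browser actually sends in its user-agent header.

```python
# Hypothetical substrings that identify a VoiceXML browser; adjust them to
# whatever your voice platform really reports in its User-Agent header.
VOICE_BROWSER_MARKERS = ("voicexml", "vxml")


def view_suffix(user_agent):
    """Pick the view template suffix for the requesting channel."""
    ua = (user_agent or "").lower()
    if any(marker in ua for marker in VOICE_BROWSER_MARKERS):
        return "vxml"
    return "phtml"


def view_filename(action, user_agent):
    """Build the template name for an action, e.g. 'menu.vxml'."""
    return f"{action}.{view_suffix(user_agent)}"


print(view_filename("menu", "ExampleVoiceXMLBrowser/1.0"))
print(view_filename("menu", "Mozilla/5.0"))
```

The controller logic stays the same for both channels; only the template that renders the result differs.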
Flite is fine as a TTS engine for debugging. When your app is ready, you can either record phrases or improve Flite's output: there was a page on the Ubuntu forums with directions for increasing Flite quality with some additional sound files.
Have you tried Eclipse VTP or InVision Studio?
Eclipse VTP
This is an Eclipse plugin, but I find it a little user-unfriendly (from a Japanese user's viewpoint).
InVision Studio (requires creating a user account)
This is Convergys's IVR tool. It has a mode for editing standard VXML. (Unfortunately, it's not an exact match.)
For just debugging vxml, I use Nuance Cafe's VoiceXML checker. It doesn't give you a visual tree or anything, but it's pretty good at spotting syntax errors and is free. I think they might also have more advanced debugging tools if you look into it, but I haven't had the need. (Note: I have no association with them)
I'm looking into the same problem, and most of the links are down. I found a document that proposes an open source solution, which works as a plugin for Asterisk (https://www.researchgate.net/publication/228873959_Open_Source_VoiceXML_Interpreter_over_Asterisk_for_Use_in_IVR_Applications) and is available at https://sourceforge.net/projects/voxy/
I would like to know if there are current options to create a VXML structure graphically, like the next image.

Why do update at-startup-background-update-services exist?

I think one of the main causes of winrot is the sheer number of services that run at startup (and don't shut down) and phone home every x seconds to see if there is a new version of some piece of software.
Me personally, I disable every single one of them because they seem utterly useless to me. Most of the software packages that use these things also have an option to check for updates whenever you launch the program itself. This looks way more efficient to me.
I was asking myself why companies like Adobe and Apple create such services, which bog clients' computers down and at the same time increase the burden on their own update servers, for what looks to me like very little return value for either of them.
My client requests such a service, but I don't see any reason for it. I want to make sure I'm not missing a piece of the puzzle, so I can come back with an educated opinion on why this should or shouldn't be desired functionality.
It's usually a desire by management to get brand recognition. It goes something like this:
Oh no. If our program just does its job, the user will never see that it's there, and they'll never find out who we are, and what a great company we are.
We need an icon in the tray; we need a shortcut on the desktop, and in the quick launch toolbar, and at the top level of the Start menu. If we could add a control panel applet, and an item on the right-click menu in Windows Explorer, and an icon in Internet Explorer, that'd be fantastic.
Of course, since our program's so important, the user's going to be using it a lot. Let's add a "speed boost" program that runs at startup, that makes sure that all of our binaries and dependencies are pre-loaded in the cache.
Oh, and we'll need an automated update program, to make sure that all of these components are as wham-bam-great as we can make them.
And can you put a splash screen on that as well?
Can you tell I'm bitter?
Roger's spot on.
Plus, once an application has developed to the point where it already has all the features you could expect it to cover for its intended purpose, the vendor is stuck. They need to keep banging out exciting new versions, so scope bloat creeps in. Instead of doing one thing well and getting out of the way, we must do everything related to it. We must always be in the user's face; they must never be allowed to use software that isn't ours; they must always be interacting with our brand. And of course we must take care to always start an updater task in the background, because we added a completely unnecessary internet-facing browser plugin/toolbar/ActiveX thing that will surely turn out to have security holes.
Acquisitive software is a huge problem that is steadily degrading the user experience on Windows. And it's an arms race: Microsoft hide old application surface interfaces (deprecating the classic start menu, removing quick launch, hiding system tray icons, auto-removing inactive Desktop icons) as they become so full of acquisitive-software junk that they're basically unusable, whilst introducing new ones that "will be better". But how long until applications start "helpfully" adding themselves to the Start menu's MRU list (because you're definitely going to want to use our great software a lot!) and pinning themselves to the Windows 7 dock?
Linux is doing better here because the distros own access to the user and aren't going to put up with any of this crap. Not something Microsoft can get away with though unfortunately.
Bonus Did You Know Fun Fact: Once upon a time, Nero was a nice, elegant CD-burning tool.

Do Character User Interfaces have a future?

We've got products built both with GUI and CHUI. Going forward, we're looking at redesigning a lot of our software and mainly taking the route of going all GUI. My question to the group is, do we need to account for keeping a CHUI around? What are the advantages of CHUI over GUI? Many times in the past people have said that CHUI is faster because you don't need a mouse. I argue that GUI can be just as fast with the right keyboard shortcuts, hotkeys and/or touch screens.
Is CHUI something we should no longer consider if hardware no longer provides a constraint?
Also to clarify, when I speak about CHUI I mean a CHaracter based User Interface, and I'm also mainly concerned with the effective presentation of data to an end user.
There have been some fantastic responses that have highlighted the importance of having a command line based interface for automation and scripting based tasks which I will certainly take to heart when we begin the design!
The primary benefits of a CHUI (that is, something with forms and fields, not necessarily a command line interface) are keyboard navigation and a consistent layout. That is key.
If your GUI can be completely, and efficiently, keyboard navigated, then your CHUI user base should be happy. This is because in time, the users simply "type" their commands in to the system without "seeing the interface". They don't need to "discover" the interface, which is a primary feature of the GUI.
While CHUIs appear to be dinosaurs, they are still functional and usable. Most folks once they're trained (notably POS/Counter workers, but even back office scenarios like factory or warehouse floor, etc) have no problem using a CHUI.
But the key is the keyboard support, so the user doesn't have to wait for the screen to catch up with them. A skilled operator with a mastery of the keyboard can make an application fly. You barely have a chance to see popup windows and whatnot.
You should poll your customers, not programmers. If your customers, who use your applications, want a CHUI, even if all your developers think it's a waste of time, you build it, because the customer is always right (except for when they're wrong).
You should absolutely still consider it. Most importantly, command line programs can be automated (and chained together in scripts) much more easily than GUIs (typically). I can't imagine working with a source control tool which didn't have a command line interface - although obviously having a GUI is useful too.
Now whether you need a command line version for your particular app is hard to say without knowing what your app does. Do you need automation and scripting? Might someone want to VPN in and run it from a very bad connection, and thus appreciate low bandwidth?
Note that MS certainly doesn't believe the command line is dead - or they wouldn't have created PowerShell.
I agree with Eli that your customers should have the final say, but if you can keep the meat of your program from being too interwoven with the GUI (or CHUI), then the production cost of making both available should be minimal.
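A minimal sketch of that separation, in Python. Everything here is hypothetical (the catalog, lookup_price, run_cli); the point is only that the core function knows nothing about how it is presented, so a character front end and a GUI front end can share it.

```python
import argparse


def lookup_price(catalog, sku):
    """Core logic: no printing, no prompting, no widgets."""
    if sku not in catalog:
        raise KeyError(f"unknown SKU: {sku}")
    return catalog[sku]


def format_price_line(sku, price_cents):
    """Presentation helper that any front end can reuse."""
    return f"{sku}: {price_cents / 100:.2f}"


def run_cli(argv, catalog):
    """Thin character front end: parse arguments, call the core, format."""
    parser = argparse.ArgumentParser(prog="price")
    parser.add_argument("sku")
    args = parser.parse_args(argv)
    try:
        return format_price_line(args.sku, lookup_price(catalog, args.sku))
    except KeyError as exc:
        return f"error: {exc.args[0]}"


# A GUI front end would call lookup_price() from a button handler instead
# of from run_cli(); the core function itself would not change.
catalog = {"A100": 1250}
print(run_cli(["A100"], catalog))
```

Because run_cli returns a string instead of printing, the same function is also easy to script and to unit test.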
If you write apps for unix and you need to handle users who telnet / ssh to your box then you will need command line interfaces.
I would say it depends on your target. Do you script your code from other apps? That would be a requirement to keep the interactive version (or some piece to avoid the GUI startup).
We usually do one or the other. But sometimes we have utils that have to be deployable through ftp and run ssh. Or we have tools that our users embed into their apps and don't want to expose a UI (data migration / conversion).
To this day, some of the most efficient user interfaces I've ever seen were plain old terminal-based character interfaces.
Anecdote: I was once part of a project to "modernize" a terminal application used by 500 customer service representatives. We published sexy GUI mockups and everyone, including the users, was suitably impressed. We worked for six months on the application, and all the user acceptance testing seemed to indicate we had a winner.
But when the application was finally launched, it failed miserably. As it turns out, CSRs are measured for performance daily, right down to the average number of seconds per call handled. And no matter how hard they tried, they could not match the same level of efficiency in the GUI as they could in the terminal interface. They could get close with tabs and shortcuts, but not quite there.
Hard lessons learned. Modern programmers may abhor "dinosaurs", but do users really care about slick interfaces? Usually they just want to get their work done.
When I first read this, my immediate thought was that this is probably one of those apps that's basically a series of forms, but displays inside a terminal. Often you see such dinosaurs running on cash registers. I also recall seeing such an app used to apply for a loan when I bought my car. This type of application doesn't seem to have a place in the modern world -- any system with even a tiny bit of processing power can handle a normal GUI nowadays. Unless you're trying to support really low-end legacy customers, get rid of this user interface. A GUI with decent keyboard shortcuts (please, please, please put some thought into keyboard-only use of your GUI programs...) is going to be equally effective for the users coming from the old CHUI system and much friendlier to those used to a GUI, without having to have 2 versions of your app.
I don't see why everyone is bringing up command line apps. I think most people recognize that the command line isn't going away. It's far faster for many tasks than a GUI, largely because the programs tend to be non-interactive (and thus easily scriptable). As soon as your app becomes interactive (or, at least, doesn't have a param to make it non-interactive), running it from the command line is much less important. Even awesome programs like Vim that are terminal-based are transitioning to their graphical counterparts (gVim) because it gives you the best of both worlds.
Even GUI apps like Firefox can benefit from command line interfaces like Ubiquity. If there's a way to provide the command line from within the GUI then why not have the best of both worlds?
A lot of CAD programs have command line interfaces that show you what the GUI interaction you just performed equates to on the command line. That way you can learn the command line operations for the things that you do frequently, and where the command line is quicker to interact with, whilst still having the discoverability of the GUI.
See this youtube video demonstrating Rhino3D's command line
CHUI is faster in execution speed, not user interaction speed. I write embedded systems (as well as GUIs), so I'll always have a use for command line apps.
Every study I have ever read showed that CHUIs are much faster for experienced users. GUIs are easier for new users and for applications that are only occasionally used. Also, for a given screen size, you can display more information on a CHUI than on a GUI. A good GUI can give you a quick overview at a glance.
In addition to the other benefits mentioned above, I've frequently found another reason to keep around an alternative UI--it keeps you and your interfaces honest. When an application is built with only one user interface, it becomes much easier to let design principles slide and for your business logic, etc. and your GUI to become an intertwined ball of spaghetti--despite best intentions. Regardless of the importance of your customers having a command-line interface, soon there might come a time when an alternative GUI (read: presentation layer) might be needed, and you'll want to be prepared. This might not be relevant to your requirements, but I think it's something good to keep in mind...
One of the big issues that we encountered was multisession capability which is almost nonexistent with the GUI technologies I have seen. Our users were quick to point out that with the current character based interface they could have over a dozen Telnet based terminal sessions going at the same time on their PC screen which enabled them to multitask or task switch with high efficiency. They rated multitasking as the killer feature which they benefitted from in our fast paced environment where interruptions are frequent. Being able to have concurrent access to multiple instances of a particular ERP application or multiple different ERP applications while always retaining session states was important to our user community.
I think the problem comes from design practices in GUI forms. We tend to place more objects on them, especially with a vertical scroll bar and tab capabilities. This also makes loading slower. Going through CHUI menus with the keyboard is faster once you've memorized the sequences, and holding the Ctrl key isn't required. There is something about the menu bar in Windows, where the shortcut key descriptions are off to the right. The character-based menus seemed easier to remember after a while.
A) - This Menu
B) - That Menu
C) - Some other Menu
Or you could arrow through the choices and you just seemed to have some muscle memory where That Menu is the second choice.
As soon as you present some data, someone's going to want to query against it. You can integrate that with a gui, no problem. If you think some of your customers are going to want to script certain tasks, set it up. Anything to do with automation is better done from the command line (y harlo thar cron job!)
I love guis. I'm a mac user. But there is a time and a place for a CLI.
I was sysadmin at a university math department when the registration system went from a character based system using telnet, to a gui system on a PeopleSoft app.
The gals in the front office HATED the new system. Now part of this was the whole bit about old shoes being more comfortable. But when I asked about it, Christine said that even after a week of doing several hundred registrations per day, the new system took several times as long to do anything. Lots of things only doable with a mouse. The old system could accept input as fast as they could type. Screen repaints were under a tenth of a second. New system had lots of 3/4 to 2 second pauses -- just long enough to be annoying, not long enough to do anything else.
