We have an application with one or more text console windows that all essentially represent serial ports (text input and output, character by character). These windows have turned into a major performance problem in the way they are currently coded; we manage to spend a very significant chunk of time in them.
The current code is structured so that each window lives its own little life, with the main application thread driving it through SendMessage() calls. This message passing seems to be the cause of enormous overhead; basically, taking a detour through the OS feels like the wrong thing to do.
Note that we do draw text lines as a whole where appropriate, so that easy optimization is already done.
I am not an expert in Windows coding, so I need to ask the community: is there some other architecture for driving the display of text in a window than sending messages like this? It seems pretty heavyweight.
Note that this is in C++ or plain C, as the main application is a portable program (C, C++ and a few other languages) that also runs on Linux and Solaris.
We did some more investigation; it seems that half of the overhead is preparing and sending each message using SendMessage, and the other half is the actual screen drawing. The SendMessage call happens between functions in the same file...
So I guess all the advice given below is correct:
Look for how much things are redrawn
Draw things directly
Chunk drawing operations in time, so that not every character is sent to the screen individually, aiming for a 10 to 20 Hz update rate for the serial console.
Can you accept ALL answers?
I agree with Will Dean that drawing in a console window or a text box is a performance bottleneck in itself. You first need to be sure that this isn't your problem. You say that you draw each line as a whole, but even this can be a problem if the data throughput is too high.
I recommend that you don't use SendMessage to pass data from the main application to the text window. Instead, use some other means of communication. Are these in the same process? If not, you could use shared memory. Even a file on disk could do in some circumstances: have the main application write to the file and the text console read from it. You could still send a SendMessage notification to the text console to tell it to update its view, but do not send the message whenever a new line arrives; define a minimum interval between two subsequent updates.
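A minimal sketch of that throttled-notification idea for the same-process case; the names (AppendConsoleText, WM_APP_CONSOLE_DATA) and the 100 ms interval are illustrative assumptions, not taken from the original code:

    // Sketch only: the serial producer appends to a shared in-process buffer and
    // posts a cheap, rate-limited notification; the character data itself never
    // travels inside a window message.
    #include <windows.h>
    #include <mutex>
    #include <string>

    static const UINT  WM_APP_CONSOLE_DATA  = WM_APP + 1;   // custom "new data" message
    static const DWORD kMinNotifyIntervalMs = 100;          // at most ~10 notifications/s

    static std::mutex  g_bufferLock;
    static std::string g_consoleBuffer;                     // pending, not yet drawn text
    static DWORD       g_lastNotifyTick = 0;

    // Called by the serial I/O code for every character or line it produces.
    void AppendConsoleText(HWND consoleWnd, const char* text, size_t len)
    {
        {
            std::lock_guard<std::mutex> lock(g_bufferLock);
            g_consoleBuffer.append(text, len);               // cheap, no OS round trip
        }
        DWORD now = GetTickCount();
        if (now - g_lastNotifyTick >= kMinNotifyIntervalMs) {
            g_lastNotifyTick = now;
            PostMessage(consoleWnd, WM_APP_CONSOLE_DATA, 0, 0);  // non-blocking nudge
        }
    }

    // In the console window procedure:
    //   case WM_APP_CONSOLE_DATA: {
    //       std::string pending;
    //       { std::lock_guard<std::mutex> lock(g_bufferLock); pending.swap(g_consoleBuffer); }
    //       AppendToDisplayAndRepaint(pending);   // hypothetical drawing helper
    //       return 0;
    //   }

A real version would also need a timer (or a final nudge) to flush whatever is left in the buffer when output pauses between intervals.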
You should try profiling properly, but in lieu of that I would stop worrying about the SendMessage, which is almost certainly not your problem, and think about the redrawing of the window itself.
You describe these as 'text console windows', but then say you have multiple of them - are they actually Windows consoles? Or are they something your application is drawing?
If the latter, then I would be looking at measuring my paint code, and checking whether I'm invalidating too much of the window on each update.
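To make the "invalidate less" point concrete, here is a rough Win32 sketch; the helper name and the per-line height parameter are assumptions for illustration, not anything from the question:

    // Invalidate only the strip occupied by the line that changed, rather than
    // the whole client area; WM_PAINT can then restrict drawing to ps.rcPaint.
    #include <windows.h>

    void InvalidateConsoleLine(HWND consoleWnd, int lineIndex, int lineHeightPx)
    {
        RECT lineRect;
        GetClientRect(consoleWnd, &lineRect);
        lineRect.top    = lineIndex * lineHeightPx;   // assumes no extra scroll offset
        lineRect.bottom = lineRect.top + lineHeightPx;
        InvalidateRect(consoleWnd, &lineRect, FALSE); // only this strip joins the update region
    }

For the scroll-on-newline case, ScrollWindowEx can move the existing pixels and leave only the newly exposed bottom line in the update region, so a newline costs one line of repainting rather than a full redraw.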
Are the output windows part of the same application? It almost sounds like they aren't...
If they are, you should look into the Observer design pattern to get away from SendMessage(). I've used it for the same type of use case, and it worked beautifully for me.
If you can't make a change like that, perhaps you could buffer your output for something like 100 ms so that you don't send so many outgoing messages per second, while the window still updates at a comfortable rate.
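Since everything is in one process, the Observer idea can be as small as an interface the console window implements, so the serial code calls it directly instead of going through SendMessage. A hedged sketch; all of the names (IConsoleSink, SerialPortSource, ConsoleWindow) are invented for illustration:

    #include <string>
    #include <vector>

    struct IConsoleSink {                        // observer interface
        virtual ~IConsoleSink() = default;
        virtual void OnSerialData(const std::string& chunk) = 0;
    };

    class SerialPortSource {                     // subject: one serial port
    public:
        void Attach(IConsoleSink* sink) { sinks_.push_back(sink); }
        void OnBytesReceived(const char* data, size_t len) {
            std::string chunk(data, len);
            for (IConsoleSink* s : sinks_)       // a plain virtual call, no SendMessage
                s->OnSerialData(chunk);
        }
    private:
        std::vector<IConsoleSink*> sinks_;
    };

    class ConsoleWindow : public IConsoleSink {
    public:
        void OnSerialData(const std::string& chunk) override {
            pending_ += chunk;                   // just buffer; repaint on a ~100 ms timer
        }
    private:
        std::string pending_;
    };

If the serial data arrives on a worker thread, the sink still needs a thread-safe hand-off to the GUI thread (for example the buffered notification sketched earlier), since the actual drawing has to happen on the thread that owns the window.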
Are the output windows part of the same application? It almost sounds like they aren't...
Yes they are, all in the same process.
I did not write this code... but it seems like SendMessage is a bit heavy for this all-in-one-application case.
You describe these as 'text console windows', but then say you have multiple of them - are they actually Windows consoles? Or are they something your application is drawing?
Our app is drawing them; they are not regular Windows consoles.
Note that we also need to get data back when a user types into the console, as we quite often have interactive serial sessions. Think of it as very similar to what you would see in a serial terminal program -- but using an external application is obviously even more expensive than what we have now.
If you can't make a change like that, perhaps you could buffer your output for something like 100 ms so that you don't send so many outgoing messages per second, while the window still updates at a comfortable rate.
Good point. Right now, every single character output causes a message to be sent.
And when we scroll the window up as a newline arrives, we redraw it line by line.
Note that we also have a scrollback buffer of arbitrary size, but scrolling back is an interactive case with much lower performance requirements.
I want to do some tests on our Tcl/Tk application regarding user interaction. As the application has parts similar to a CAD program, for which every mouse movement is relevant, I would like to record all the events of some user interactions. My goal would be to play back these events later, after every program change, to discover potential changes - or, even better, to assure that the GUI always behaves the same and produces the same data.
I know that I can generate some Enter, Motion and Button events, but this would not be the same as the thousands of events generated by a real user interaction. And it is very important for me to have exactly those thousands of events.
Is there any possibility to achieve this?
It's relatively easy to record events of particular types with bind — you'll find that <ButtonPress>, <ButtonRelease>, <Enter>, <Leave>, <FocusIn>, <FocusOut>, <KeyPress> and <KeyRelease> cover pretty much everything that you are interested in — and then play them back with event generate. (You need to record quite a bit of information about each event in order to regenerate it correctly, but the underlying model is that of X events with similar names.) This assumes you don't want to support inter-application cut-and-paste or drag-and-drop for the purposes of recording; those complicate things a lot. You'll likely have a lot of events; recording to an SQLite database might make a lot of sense.
However, you should think carefully about which parts of the application you want to record in full detail. Does it matter if two buttons in the outer shell of the application, outside the CAD-like area, get swapped? For most users it isn't very important, provided you're clear about what the buttons do (through clear labels and icons), but for replaying recorded events it can matter hugely. So for the parts of the application that are simple buttons and edit fields, I would not record the raw details; instead I would just record when the buttons are clicked, the changes to the text content of entries, and so on. In effect, that is capturing higher-level events, and it is much easier to replay correctly. It's only when the user is in the main CAD area that you need the full detail.
Also, beware of changes to font sizes and screen sizes/scaling. They can change how things are laid out and may happen because of system-level alterations outside the scope of your application.
We started out the way you describe: recording all those thousands of motion events, etc., including exact timings, which are extremely important for a GUI application as well.
It quickly became apparent that those recordings were too hard to maintain. They are also overly brittle in the face of UI changes. Another problem was the hardcoded time values: a switch to a more powerful machine (or a CPU under load) would break the playback.
The two biggest improvements we introduced:
Event compression: recognize the high-level action the user wanted to perform (like selecting a menu item). The recorded activateItem command then performs the necessary work (event emulation) on replay.
Synchronization functions: instead of relying on particular timings, commands like waitForObject wait for an object to come into existence and become ready for interaction.
It took several years for this to work fluently, however, including a central Object Map repository, property and screenshot verifications, high-level test descriptions in BDD, and more. Feel free to take a look at the Squish for Tk product that came out of this work.
(I work in QA, this really is for legitimate use.)
I'm trying to come up with a way to introduce forced input lag for both keyboard and mouse (in Windows). Like, when I press 'A' on the keyboard, I want to introduce a very slight delay before the OS processes that A. Or if I move the mouse, I'd like the same mouse speed, but just with the same slight delay before it kicks in. This lag needs to be present across any threads, not just the one that kicked off the process. But, the lag doesn't have to be to-the-millisecond precise every time.
I'm not even sure how to go about setting this up. I'm capable of writing it in whatever language/environment we may need, I'm just not sure where to start. I think something like AutoHotkey may be able to do what I want by essentially making an arbitrary key call a macro that delays very slightly before sending that key, but I'm not sure what function calls I may need to make it happen. Or, maybe there's a way in C to get at the input across the OS before it kicks in. I'm just not sure.
Can anyone point me to some resources or a language/function(s) that can accomplish this? (Or even an already existing program or service.)
If you want a purely software solution, I'm afraid you'll need to develop a filter driver for your keyboard and mouse. That is very expensive to develop.
Instead, you could plug your mouse and keyboard in somewhere else, have the input messages come in over the network, and then introduce network latency. You could use a second PC + VNC software, a second PC + USB/IP software, or a hardware USB/IP device like this one.
There's an easier but less reliable way.
You could install system-wide WH_KEYBOARD_LL and WH_MOUSE_LL hooks, discard the original messages, and after a short delay re-send them with the SendInput API. This should mostly work; however, there are cases where it won't, e.g. I don't expect it to affect most video games, because they use raw input.
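A rough sketch of that hook approach, keyboard side only; the 50 ms delay, the thread-per-event hand-off and the function names are illustrative assumptions, and a real tool would treat WH_MOUSE_LL the same way:

    #include <windows.h>

    static const DWORD kDelayMs = 50;            // artificial lag (assumed value)

    // Re-injects one captured key event after the delay.
    static DWORD WINAPI DelayedSend(LPVOID param)
    {
        KBDLLHOOKSTRUCT* k = static_cast<KBDLLHOOKSTRUCT*>(param);
        Sleep(kDelayMs);

        INPUT in = {};
        in.type       = INPUT_KEYBOARD;
        in.ki.wVk     = static_cast<WORD>(k->vkCode);
        in.ki.dwFlags = (k->flags & LLKHF_UP) ? KEYEVENTF_KEYUP : 0;
        SendInput(1, &in, sizeof(in));           // a faithful version would also forward
        delete k;                                // the scan code and extended-key flag
        return 0;
    }

    static LRESULT CALLBACK KeyboardProc(int code, WPARAM wParam, LPARAM lParam)
    {
        if (code == HC_ACTION) {
            const KBDLLHOOKSTRUCT* k = reinterpret_cast<const KBDLLHOOKSTRUCT*>(lParam);
            if (!(k->flags & LLKHF_INJECTED)) {  // let our own SendInput events through
                HANDLE h = CreateThread(nullptr, 0, DelayedSend,
                                        new KBDLLHOOKSTRUCT(*k), 0, nullptr);
                if (h) CloseHandle(h);
                return 1;                        // swallow the original event
            }
        }
        return CallNextHookEx(nullptr, code, wParam, lParam);
    }

    int main()
    {
        HHOOK hook = SetWindowsHookEx(WH_KEYBOARD_LL, KeyboardProc,
                                      GetModuleHandle(nullptr), 0);
        MSG msg;                                 // the installing thread must pump messages
        while (GetMessage(&msg, nullptr, 0, 0) > 0) {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        UnhookWindowsHookEx(hook);
        return 0;
    }

As noted above, applications that read raw input (most games) bypass this path, and the thread-per-keystroke hand-off is only for brevity here.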
I somehow have the feeling that modern systems (runtime libraries, this exception handler, that built-in debugger) build up more and more layers between my (C++) programs and the CPU/rest of the hardware.
I'm thinking of something like this:
1 + 2 >> OS top layer >> Runtime library/helper/error handler >> a hell lot of DLL modules >> OS kernel layer >> Do you really want to run 1 + 2?-Windows popup (don't take this serious) >> OS kernel layer >> Hardware abstraction >> Hardware >> Go through at least 100 miles of circuits >> Eventually arrive at the CPU >> ADD 1, 2 >> Go all the way back to my program
Nearly all of the technical details are simply wrong, and in some random order, but you get my point, right?
How much longer/shorter is this chain when I run a C++ program that calculates 1 + 2 at runtime on Windows?
How about when I do this in an interpreter? (Python|Ruby|PHP)
Is this chain really as dramatic in reality? Does Windows really try "not to stand in the way"? E.g., is there a direct connection between my binary and the hardware?
"1 + 2" in C++ gets directly translated in an add assembly instruction that is executed directly on the CPU. All the "layers" you refer to really only come into play when you start calling library functions. For example a simple printf("Hello World\n"); would go through a number of layers (using Windows as an example, different OSes would be different):
CRT - the C runtime implements things like %d replacements and creates a single string, then it calls WriteFile in kernel32
kernel32.dll implements WriteFile, notices that the handle is a console and directs the call to the console system
the string is sent to the conhost.exe process (on Windows 7, csrss.exe on earlier versions) which actually hosts the console window
conhost.exe adds the string to an internal buffer that represents the contents of the console window and invalidates the console window
The Window Manager notices that the console window is now invalid and sends it a WM_PAINT message
In response to the WM_PAINT, the console window (inside conhost.exe still) makes a series of DrawString calls inside GDI32.dll (or maybe GDI+?)
The DrawString method loops through each character in the string and:
Looks up the glyph definition in the font file to get the outline of the glyph
Checks its cache for a rendered version of that glyph at the current font size
If the glyph isn't in the cache, it rasterizes the outline at the current font size and caches the result for later
Copies the pixels from the rasterized glyph to the graphics buffer you specified, pixel-by-pixel
Once all the DrawString calls are complete, the final image for the window is sent to the DWM where it's loaded into the graphics memory of your graphics card, and replaces the old window
When the next frame is drawn, the graphics card now uses the new image to render the console window and your new string is there
Now there are a lot of layers that I've simplified (e.g. the way the graphics card renders stuff is a whole 'nother layer of abstractions), and I may have made some errors (I don't know the internals of how Windows is implemented, obviously), but hopefully it gives you an idea.
The important point, though, is that each step along the way adds some kind of value to the system.
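To illustrate the first sentence of that answer, a bare addition involves none of those layers. A tiny example; the assembly in the comments is typical optimized x86-64 output and will vary by compiler and settings:

    int add(int a, int b)
    {
        return a + b;   // e.g. "lea eax, [rdi+rsi]" or "add eax, esi": a single CPU instruction
    }

    int three()
    {
        return 1 + 2;   // literal constants are folded at compile time to "mov eax, 3"
    }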
As codeka said, there's a lot that goes on when you call a library function, but what you need to remember is that printing a string or displaying a JPEG or whatever is a very complicated task. Even more so when the method used must work for everyone in every situation; there are hundreds of edge cases.
What this really means is that when you're writing serious number-crunching, chess-playing, weather-predicting code, don't call library functions. Instead, use only cheap operations which can and will be executed by the CPU directly. Additionally, planning where your expensive calls happen can make a huge difference (print everything at the end, not each time through the loop).
It doesn't matter how many levels of abstraction there are, as long as the hard work is done in the most efficient way.
In a general sense you suffer from "emulating" your lowest level, e.g. you suffer when emulating a 68K CPU on an x86 CPU running some poorly implemented app, but it won't perform worse than the original hardware would; otherwise you would not emulate it in the first place. For example, today most user-interface logic is implemented in high-level dynamic scripting languages, because that is more productive, while the hardcore stuff is handled by optimized low-level code.
When it comes to performance, it's always the hard work that hits the wall first. The things in between never suffer from performance issues. E.g. a key handler that processes 2-3 key presses a second can spend a fortune in badly written code without affecting the end-user experience, while the motion estimator in an MPEG encoder will fail utterly if it is simply implemented in software instead of using dedicated hardware.
If you try to close a Microsoft Office application when you've got a lot of its data on the clipboard (e.g. a whole Word document), it prompts you, asking whether you would like that data to remain available after the application is closed.
In this day and age, does it really matter if I have, say, 10 MB of stuff on the clipboard?
As I write my image processing app, should I follow the same rule as Office, or just forget about it and leave the data there?
Personally, I favor just forgetting about it and leaving it there. I've always found the MS Word approach more annoying than useful.
The reason Word asks about this is that it uses delayed rendering for the clipboard. This allows Word to create only the formats that are actually requested by a paste operation. However, any formats on the clipboard that were placed with delayed rendering will disappear if the program exits without rendering them. In the case of Excel, rendering thousands of empty cells in RTF would take megabytes of space, even though the actual data is tiny. Thus it asks whether you want the data rendered to the clipboard or whether you're fine with losing it.
For your own application, the question then becomes: should I use delayed rendering? The answer depends on how long it takes you to render the various formats you'd like to support. If you're only supporting one format that renders quickly, odds are there's no advantage to delayed rendering.
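For reference, a hedged Win32 sketch of both halves of delayed rendering; the CF_UNICODETEXT format and the placeholder text are assumptions, and a real owner would also verify GetClipboardOwner() in WM_RENDERALLFORMATS:

    #include <windows.h>
    #include <cstring>
    #include <string>

    static HGLOBAL MakeGlobalText(const std::wstring& s)
    {
        SIZE_T bytes = (s.size() + 1) * sizeof(wchar_t);
        HGLOBAL h = GlobalAlloc(GMEM_MOVEABLE, bytes);
        if (h) {
            std::memcpy(GlobalLock(h), s.c_str(), bytes);
            GlobalUnlock(h);
        }
        return h;
    }

    // 1) When the user copies: announce the format but pass NULL data.
    void CopyWithDelayedRendering(HWND owner)
    {
        if (OpenClipboard(owner)) {
            EmptyClipboard();
            SetClipboardData(CF_UNICODETEXT, nullptr);  // NULL handle = render on demand
            CloseClipboard();
        }
    }

    // 2) In the owner's window procedure: render only when someone actually pastes.
    LRESULT HandleClipboardMessages(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        switch (msg) {
        case WM_RENDERFORMAT:                           // a paste happened
            if (wParam == CF_UNICODETEXT)
                SetClipboardData(CF_UNICODETEXT, MakeGlobalText(L"...expensive text..."));
            return 0;
        case WM_RENDERALLFORMATS:                       // the owner is about to exit
            if (OpenClipboard(hwnd)) {
                SetClipboardData(CF_UNICODETEXT, MakeGlobalText(L"...expensive text..."));
                CloseClipboard();
            }
            return 0;
        }
        return DefWindowProc(hwnd, msg, wParam, lParam);
    }

WM_RENDERALLFORMATS is exactly what Office is dealing with when it shows that prompt: it is the owner's last chance to materialize the announced formats before exiting, and doing so can be expensive.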
If you're storing things on the clipboard without the user requesting it then you might just annoy a few people :)
Otherwise, yeah, just leave the data there.
How To Erase Windows Clipboard Data and Why.
Anyone who has access to the computer can easily see the clipboard information.
I'd guess malware today would also be snooping there.
This question might be more suitable on Super User.
I am trying to write a piece of code which allows the user to type text into a textbox, which then gets saved on the server. When the user types some more text into the textbox, I want only the difference to be sent to the server.
Is there a diff algorithm for JS which I can use to send only information about the difference? Essentially, it should be able to tell the difference between the contents of two text boxes.
It could also be language-agnostic and I can port it.
Thank you for your time.
UPDATE
In simple words: I have a text area which saves its text every X seconds. To save bandwidth, I only want it to send the difference from the last saved revision (which I can, say, keep in a variable; initially this will be empty). The JS then has to work out the difference between the last revision and the current state of the textbox and generate a change list to send to the server.
UPDATE 2
Something like www.etherpad.com
Google DiffMatchPatch has a JavaScript implementation; I've used it with much success.
http://code.google.com/p/google-diff-match-patch/
The Python difflib module does this and more. It's very flexible but might be challenging to port to Javascript.
Regarding your update, I'm first wondering why you need to worry about bandwidth. Unless your users are typing a lot of text into an edit box (which has its own usability issues) then there just aren't that many bytes to send. Send the whole text box each time you autosave. Users can't type fast enough to really notice the use of bandwidth.
Or, you could meet halfway. Every time you autosave, check to see whether the user has only added new text to the end compared to the last time. If so, send an "append" type update with just the new text. If the user has gone back and edited anything else, then send a "replace" type update where you send the whole text. This takes care of the common append-only case without severely complicating your implementation.
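A language-agnostic sketch of that append-or-replace check (written in C++ since the question says porting is fine; the TextUpdate structure and names are invented for illustration):

    #include <string>

    struct TextUpdate {
        enum Kind { Append, Replace } kind;
        std::string text;    // the new tail for Append, the full text for Replace
    };

    TextUpdate MakeUpdate(const std::string& lastSaved, const std::string& current)
    {
        // If the previously saved text is an exact prefix of the current text,
        // only the newly typed tail needs to go to the server.
        if (current.size() >= lastSaved.size() &&
            current.compare(0, lastSaved.size(), lastSaved) == 0) {
            return { TextUpdate::Append, current.substr(lastSaved.size()) };
        }
        // Otherwise the user edited earlier text; fall back to sending everything.
        return { TextUpdate::Replace, current };
    }

The same two-branch logic ports directly to JavaScript with startsWith and slice.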
Instead of calculating a diff between two texts, which is difficult, you could record the keystrokes and the caret position in the textbox while people are editing. If you send this over every now and then (and clear the buffer), the server can play back the exact same sequence.
This smells of premature optimization. Perhaps you should implement your solution first and then see about optimizing your transfer rates using diffs. How much text are you looking at? The request and response packets are going to be more or less the same size, with only a few bytes' difference for your message, so the savings could be very minimal.
At the very least, complete your solution without optimization and profile your network traffic using tools like Firebug and then test to see how much worse the performance is with what you would consider to be the maximum text block that could be sent.
Finally, you could always use the TypeWatch JQuery plugin to listen for change events in the textbox. You can set a delay so that once the user finishes typing and the delay elapses, the callback function is triggered. This means that the text will only be sent when the user types something, and only when they are finished typing. This will be significantly more efficient than repeatedly polling the server.
Depends on how far you are ready to go.
You might want to check out the svndiff delta algorithm, which is used by svn in particular: http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff