I have a windowed WinAPI/OpenGL app. The scene is drawn rarely (compared to games), in WM_PAINT, mostly triggered by user input - WM_MOUSEMOVE, clicks, etc.
I noticed that when the scene has not been moved by the mouse for a while (the application is "idle") and then some mouse action by the user starts, the first frame is drawn with an unpleasant delay - something like 300 ms. The following frames are fast again.
I implemented a 100 ms timer which only calls InvalidateRect, which is later followed by WM_PAINT/scene drawing. This "fixed" the problem, but I don't like the solution.
I'd like to know why this is happening, and also some tips on how to tackle it.
Does the OpenGL render context save resources when not used? Or could this be caused by some system behaviour, like processor underclocking/energy saving, etc.? (Although I noticed that the processor runs underclocked even when the app is under "load".)
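For reference, the timer workaround mentioned above looks roughly like this (a sketch, not the full window procedure; IDT_KEEPALIVE is just an arbitrary timer id I picked for the example):

    #define IDT_KEEPALIVE 1

    /* after the window is created: fire WM_TIMER every 100 ms */
    SetTimer(hWnd, IDT_KEEPALIVE, 100, NULL);

    /* inside the window procedure: */
    case WM_TIMER:
        if (wParam == IDT_KEEPALIVE)
            InvalidateRect(hWnd, NULL, FALSE);   /* schedules a WM_PAINT soon after */
        return 0;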
This sounds like the Windows virtual memory system at work. The sum of the memory use of all active programs is usually greater than the amount of physical memory installed on your system, so Windows swaps idle processes out to disk according to whatever rules it follows, such as the relative priority of each process and the amount of time it has been idle.
You are preventing the swap out (and delay) by artificially making the program active every 100ms.
If a swapped out process is reactivated, it takes a little time to retrieve the memory content from disc and restart the process.
It's unlikely that OpenGL is responsible for this delay.
You can improve the situation by starting your program with a higher priority.
https://superuser.com/questions/699651/start-process-in-high-priority
You can also use the VirtualLock function to prevent Windows from swapping out part of the memory, but this is not advisable unless you REALLY know what you are doing!
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366895(v=vs.85).aspx
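A minimal sketch of what that would look like (the buffer and its size are placeholders; locking can fail if it exceeds the process working-set quota):

    #include <windows.h>

    /* Commit a buffer and pin its pages in physical memory. */
    void *alloc_locked(SIZE_T size)
    {
        void *buf = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (buf == NULL)
            return NULL;
        if (!VirtualLock(buf, size)) {
            /* Can fail with ERROR_WORKING_SET_QUOTA; see SetProcessWorkingSetSize. */
            VirtualFree(buf, 0, MEM_RELEASE);
            return NULL;
        }
        return buf;   /* release later with VirtualUnlock + VirtualFree */
    }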
EDIT: You can certainly improve things by adding more memory; 4 GB sounds low for a modern PC, especially if you run Chrome with multiple tabs open.
If you want to be scientific before spending any hard-earned cash :-), then open Performance Monitor and look at Cache Faults/Sec. This will show the swap activity on your machine. (I have 16 GB on my PC, so this number is mostly very low.) To make sure you learn something, I would check Cache Faults/Sec before and after the memory upgrade - so you can quantify the difference!
Finally, there is nothing wrong with the solution you found already - to kick-start the graphics app every 100 ms or so.
The problem was in the NVIDIA driver's global 3D setting "Power management mode".
The options "Optimal Power" and "Adaptive" save power and cause the problem.
Only "Prefer Maximum Performance" does the right thing.
I have two computers that can talk to each other over a serial connection. The connection is made over a wireless network, so there is a variable, changing delay in the communication between the two systems. On both systems I have a counter running that increments by 1 every ms. Both counters start as soon as the applications start, and each computer may be started at a different time. How can I, using the serial connection, synchronize the counters so that systemA.counter equals systemB.counter, and so that both counters increment at the same time (or as close as possible)?
Ideally, once synchronized, the counters would drift apart only slowly, so that I would only need to re-synchronize once every 3 or 4 thousand increments.
I'm looking for good resources on the topic, example algorithms, example code (C/C++), anything to point me in the right direction.
Update
This is a closed system, no internet. For all intents and purposes there is no real protocol at all besides an open serial line over the wireless link. That link is Bluetooth at the moment, but I'm thinking of moving it to a ZigBee mesh. There are currently 2 nodes, but if I have 30 nodes all running this same application I would want them all to synchronize. There is no client/server designation, just a bunch of devices running the same program with a counter. I don't have access to anything like a time-of-day clock, just this counter that increments once a millisecond and whatever algorithm I can put in place.
Once I get this working, I would like to put a positioning and mapping system in place, but to figure out distances between nodes I need accurate, synchronized timing on the devices.
If you use these counters to order events in a system, you should look at vector clocks or Lamport timestamps.
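In case it helps, the Lamport-timestamp rules are tiny; a minimal sketch (note that this orders events - it does not keep the counters ticking in real-time lockstep):

    /* One logical clock per node. */
    static unsigned long lamport = 0;

    unsigned long on_local_event(void)        /* any local event, including sending */
    {
        return ++lamport;
    }

    unsigned long on_receive(unsigned long remote_stamp)
    {
        lamport = (remote_stamp > lamport ? remote_stamp : lamport) + 1;
        return lamport;
    }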
The obvious resource is NTP, which is documented for example at http://www.eecis.udel.edu/~mills/ntp.html and via the links off there. Basically, this uses timestamps to adjust the frequency at which local clocks run. The protocol has been around for years and has been the subject of continuous research - I can't see any pack of slides there which immediately makes it clear how it works. You might be better off seeing whether there is already an NTP implementation available than trying to re-implement it yourself.
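The core arithmetic is a four-timestamp exchange; a sketch using your millisecond counters (the function and variable names are mine, not from the question):

    /* A sends a request at t1, B receives it at t2, B replies at t3,
       A receives the reply at t4.  t1/t4 are A's counter, t2/t3 are B's. */
    long estimate_offset(long t1, long t2, long t3, long t4, long *round_trip)
    {
        *round_trip = (t4 - t1) - (t3 - t2);        /* time spent on the wire */
        return ((t2 - t1) + (t3 - t4)) / 2;         /* B's counter minus A's  */
    }

A would add the returned offset to its own counter; samples with an unusually large round trip are best discarded, since the offset estimate assumes the link delay is roughly symmetric.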
It appears (e.g. from searching) that there is a small industry of people working on time synchronisation algorithms, especially in the context of wireless sensor networks. One jumping-off point, apart from searches, is the survey paper at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.2012 - Time synchronization in sensor networks: A survey (2004)
I work for a company that develops psychological tests. One of these tests measures the reaction time of a candidate.
Does anyone have an idea of the maximum delay between a key press and the time that the key event becomes available to the application? What does it depend on?
Is there a guaranteed maximum response time? I read something about 5-25 ms.
What is the best way to handle key events so as to have minimal delay?
Thanks in advance,
Kevin
Windows UI processing is very complex. It includes algorithms like priority promotion on keypress, but if another process asks for a full CPU cycle, it will usually wait until the next scheduler tick (at worst 30 ms on desktop systems and 60 ms on server systems).
To overcome that, you would need a special keyboard driver that would still deliver the event with the same latency, but would also take an accurate timestamp. Accurate time measurement is possible on Windows systems if dynamic CPU clock switching is disabled (look up QueryPerformanceCounter(); you would need to know how to invoke it from the DDK). In that case the keyboard event would still arrive with unpredictable latency, but the original bus event would be time-stamped correctly. You are then left only with the bus latencies, which should be smaller than the standard deviation of your measurements. See also What happens from the moment we press a key on the keyboard, until it appears in your word document
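For the user-mode timing part, a minimal QueryPerformanceCounter sketch (timestamps taken in the application, not in a driver):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);        /* ticks per second */

        QueryPerformanceCounter(&t0);
        /* ... present the stimulus and wait for the key event here ... */
        QueryPerformanceCounter(&t1);

        double ms = (double)(t1.QuadPart - t0.QuadPart) * 1000.0 / (double)freq.QuadPart;
        printf("reaction time: %.3f ms\n", ms);
        return 0;
    }

This measures elapsed time precisely, but, as described above, it still includes the unpredictable latency between the physical key press and the moment the event reaches your code.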
Computers normally keep time with a built-in clock on the motherboard. But out of curiosity, can a computer determine when a certain interval of time has passed?
I would think not, as a computer can only execute instructions. Of course, you could rely on the computer knowing its own processing speed, but that would be an extremely crude hack that would take up way too many clock cycles and be error-prone.
Not without running constantly to keep track of the time, constantly pulling the time off the internet, or a piece of hardware that gets the time from a constantly broadcast signal.
In certain cases, this is possible. Some microcontrollers and older processors execute their instructions in a defined time period, so by tracking the number of instructions executed, one can keep track of periods of time. This works well, and is useful, for certain cases (such as oscillating to play a sound wave), but in general, you're right, it's not particularly useful.
In the olden days there was a fixed timer interrupt. Often every 60th of a second in the US.
Old OS's simply counted these interrupts -- effectively making it a clock.
And, in addition to counting them, the OS also used this to reschedule tasks, thereby preventing a long-running task from monopolizing the processor.
This scheme did mean that you had to set the time every time you turned the power on.
But in those days, powering down a computer (and getting it to reboot) was an epic task performed only by specialists.
I recall days when the machine wouldn't "IPL" (Initial Program Load) for hours because something was not right in the OS and the patches and the hardware.
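A sketch of the interrupt-counting scheme described above - the periodic timer interrupt (60 Hz in this example) increments a counter, and "the time" is simply that count plus whatever the operator typed in at boot:

    #define TICKS_PER_SECOND 60

    volatile unsigned long ticks;        /* written only by the interrupt handler */

    void timer_interrupt_handler(void)   /* hooked to the periodic timer interrupt */
    {
        ticks++;                         /* one tick = 1/60 s */
        /* an OS would also run its scheduler from here */
    }

    unsigned long seconds_since_power_on(void)
    {
        return ticks / TICKS_PER_SECOND;
    }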
In a typical handheld/portable embedded device, battery life is a major concern in the design of the hardware, the software and the features the device can support. From the software programming perspective, one is aware of MIPS-optimized and memory-optimized (data and program) code.
I am aware of the hardware deep-sleep and standby modes that are used to clock the hardware at lower frequencies, or to turn off the clock entirely to some unused circuits to save power, but I am looking for ideas from this point of view:
Given that my code is running and needs to keep executing, how can I write the code "power" efficiently so as to consume minimum watts?
Are there any special programming constructs, data structures or control structures which I should look at to achieve minimum power consumption for a given functionality?
Are there any high-level software design considerations which one should keep in mind at the time of code structure design, or during low-level design, to make the code as power efficient (least power consuming) as possible?
Like 1800 INFORMATION said, avoid polling; subscribe to events and wait for them to happen
Update window content only when necessary - let the system decide when to redraw it
When updating window content, ensure your code recreates as little of the invalid region as possible (see the sketch after this list)
With quick code the CPU goes back to deep sleep mode faster and there's a better chance that such code stays in L1 cache
Operate on small data at one time so data stays in caches as well
Ensure that your application doesn't do any unnecessary action when in background
Make your software not only power efficient, but also power aware - update graphics less often when on battery, disable animations, thrash the hard drive less
And read some other guidelines. ;)
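To illustrate the point about only redrawing the invalid region (a sketch; x, y, w, h stand for whatever dirty rectangle your code tracks):

    #include <windows.h>

    /* Invalidate only the rectangle that actually changed, not the whole client area. */
    void invalidate_dirty_region(HWND hWnd, int x, int y, int w, int h)
    {
        RECT dirty = { x, y, x + w, y + h };
        InvalidateRect(hWnd, &dirty, FALSE);   /* FALSE: don't erase the background */
    }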
Recently a series of posts called "Optimizing Software Applications for Power" started appearing on Intel Software Blogs. It may be of some use to x86 developers.
Zeroth, use a fully static machine that can stop its clock when idle. You can't beat zero Hz.
First up, switch to a tickless operating system scheduler. Waking up every millisecond or so wastes power. If you can't, consider slowing down the scheduler interrupt instead.
Secondly, ensure your idle thread executes a power-save, wait-for-next-interrupt instruction (see the sketch at the end of this answer).
You can do this in the sort of under-regulated "userland" most small devices have.
Thirdly, if you have to poll or perform user-confidence activities like updating the UI, sleep, wake up to do it, and get back to sleep.
Don't trust GUI frameworks that you haven't checked for "sleep and spin" kind of code, especially the event timer you may be tempted to use for #2.
Block a thread on a read instead of polling with select()/epoll()/WaitForMultipleObjects(). It puts stress on the thread scheduler (and your brain), but the devices generally do okay.
This ends up changing your high-level design a bit; it gets tidier!
A main loop that polls all the things you might do ends up slow and wasteful of CPU, but it does guarantee performance (guaranteed to be slow).
Cache results, lazily create things. Users expect the device to be slow so don't disappoint them. Less running is better. Run as little as you can get away with.
Separate threads can be killed off when you stop needing them.
Try to get more memory than you need; then you can insert into more than one hashtable and avoid ever having to search. This is a direct trade-off if the memory is DRAM.
Look at a more real-time system than you think you need. It saves time (sic) later.
Such systems also cope better with threading.
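To illustrate the "power save, wait for next interrupt" idle thread mentioned above, a sketch for a bare-metal ARM target built with GCC (other cores have equivalents, e.g. HLT on x86 or the LPM modes on MSP430):

    /* Lowest-priority task: halt the core until something happens. */
    static void idle_task(void)
    {
        for (;;) {
            __asm__ volatile ("wfi");   /* core sleeps here until an interrupt fires */
            /* on wake-up the scheduler runs; if work arrived it switches away */
        }
    }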
Do not poll. Use events and other OS primitives to wait for notifiable occurrences. Polling ensures that the CPU will stay active and use more battery life.
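A small sketch of the difference (hDataReady is a hypothetical event handle created elsewhere with CreateEvent(); the same idea applies to blocking reads, condition variables, etc.):

    #include <windows.h>

    /* Wasteful: the CPU keeps waking up even when nothing has happened. */
    void wait_by_polling(volatile int *data_ready)
    {
        while (!*data_ready)
            Sleep(1);
    }

    /* Better: the thread blocks and the CPU can drop into a low-power state. */
    void wait_by_blocking(HANDLE hDataReady)
    {
        WaitForSingleObject(hDataReady, INFINITE);
    }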
From my work using smart phones, the best way I have found of preserving battery life is to ensure that everything you do not need for your program to function at that specific point is disabled.
For example, only switch Bluetooth on when you need it, similarly the phone capabilities, turn the screen brightness down when it isn't needed, turn the volume down, etc.
The power used by these functions will generally far outweigh the power used by your code.
To avoid polling is a good suggestion.
A microprocessor's power consumption is roughly proportional to its clock frequency, and to the square of its supply voltage. If you have the possibility to adjust these from software, that could save some power. Also, turning off the parts of the processor that you don't need (e.g. floating-point unit) may help, but this very much depends on your platform. In any case, you need a way to measure the actual power consumption of your processor, so that you can find out what works and what not. Just like speed optimizations, power optimizations need to be carefully profiled.
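As a rough worked example of that relationship (the numbers are invented purely for illustration): with dynamic power roughly proportional to f x V^2, dropping from 100 MHz at 1.2 V to 50 MHz at 1.0 V gives (50/100) x (1.0/1.2)^2 ≈ 0.35, i.e. about a third of the original dynamic power - provided the slower clock still lets the work finish in time.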
Consider using the network interfaces as little as you can. You might want to gather information and send it out in bursts instead of sending it constantly.
Look at what your compiler generates, particularly for hot areas of code.
If you have low-priority intermittent operations, don't use specific timers to wake up to deal with them; deal with them when processing other events.
Use logic to avoid stupid scenarios where your app might go to sleep for 10 ms and then have to wake up again for the next event. For the kind of platform mentioned it shouldn't matter if both events are processed at the same time.
Having your own timer & callback mechanism might be appropriate for this kind of decision making. The trade off is in code complexity and maintenance vs. likely power savings.
Simply put, do as little as possible.
Well, to the extent that your code can execute entirely in the processor cache, you'll have less bus activity and save power. To the extent that your program is small enough to fit code+data entirely in the cache, you get that benefit "for free". OTOH, if your program is too big, and you can divide it into modules that are more or less independent of each other, you might get some power saving by dividing it into separate programs. (I suppose it's also possible to make a toolchain that spreads out related bundles of code and data into cache-sized chunks...)
I suppose that, theoretically, you can save some amount of unnecessary work by reducing the number of pointer dereferences, and by refactoring your jumps so that the most likely jumps are taken first -- but that's not realistic to do as a programmer.
Transmeta had the idea of letting the machine do some instruction optimization on-the-fly to save power... But that didn't seem to help enough... And look where that got them.
Set unused memory or flash to 0xFF, not 0x00. This is certainly true for flash and EEPROM; I'm not sure about SRAM or DRAM. For the PROMs there is an inversion, so a 0 is stored as a 1 and takes more energy, while a 1 is stored as a 0 and takes less. This is why you read 0xFFs after erasing a block.
Rather timely, this: an article on Hackaday today about measuring the power consumption of various commands:
Hackaday: the-effect-of-code-on-power-consumption
Aside from that:
- Interrupts are your friends
- Polling / wait() aren't your friends
- Do as little as possible
- Make your code as small/efficient as possible
- Turn off as many modules, pins and peripherals as possible in the micro (see the sketch after this list)
- Run as slowly as possible
- If the micro has settings for pin drive strength, slew rate, etc., check and configure them; the defaults are often full power / max speed
- Returning to the article above, go back, measure the power and see if you can drop it by altering things.
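A sketch of the peripheral point - the register name, address and bit positions below are made up for illustration; take the real ones from your MCU's reference manual:

    /* Hypothetical clock-gating register of a made-up MCU. */
    #define PERIPH_CLK_EN   (*(volatile unsigned int *)0x40021000u)
    #define CLK_EN_UART1    (1u << 4)
    #define CLK_EN_ADC      (1u << 9)

    void gate_unused_peripherals(void)
    {
        PERIPH_CLK_EN &= ~(CLK_EN_UART1 | CLK_EN_ADC);  /* stop clocking what you don't use */
    }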
Also, something that is not trivial to do: reduce the precision of your mathematical operations, go for the smallest data types available and, if your development environment supports it, pack data and aggregate operations.
Knuth's books could give you all the variants of specific algorithms you need to save memory or CPU, or to go with reduced precision while minimizing the rounding errors.
Also, spend some time checking all the embedded device APIs - for example, most Symbian phones can do audio encoding via specialized hardware.
Do your work as quickly as possible, and then go to some idle state waiting for interrupts (or events) to happen. Try to make the code run out of cache with as little external memory traffic as possible.
On Linux, install powertop to see how often which piece of software wakes up the CPU. And follow the various tips that the powertop site links to, some of which are probably applicable to non-Linux, too.
http://www.lesswatts.org/projects/powertop/
Choose efficient algorithms that are quick and have small basic blocks and minimal memory accesses.
Understand the cache size and functional units of your processor.
Don't access memory. Don't use objects, garbage collection or any other high-level constructs if they expand your working code or data set outside the available cache. If you know the cache size and associativity, lay out the entire working data set you will need in low-power mode and fit it all into the dcache (forget some of the "proper" coding practices that scatter the data around in separate objects or data structures if that causes cache thrashing). The same goes for all the subroutines: put your working code set all in one module if necessary to keep it all in the icache. If the processor has multiple levels of cache, try to fit in the lowest level of instruction or data cache possible. Don't use the floating-point unit or any other instructions that may power up other optional functional units unless you can make a good case that their use significantly shortens the time that the CPU is out of sleep mode.
etc.
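To illustrate the cache-fitting advice above, a sketch of keeping the low-power working set in one contiguous, cache-sized block (the fields are hypothetical examples):

    #include <stdint.h>

    /* Everything the low-power loop touches lives in one object that is
       smaller than the data cache - no pointer chasing, no scattered allocations. */
    struct low_power_working_set {
        uint16_t sample_ring[256];   /* 512 bytes */
        uint32_t last_wakeup_tick;   /*   4 bytes */
        uint8_t  flags[32];          /*  32 bytes */
    };

    static struct low_power_working_set ws;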
Don't poll, sleep
Avoid using power-hungry areas of the chip when possible. For example, multipliers are power hungry; if you can shift and add, you can save some joules (as long as you don't do so much shifting and adding that the multiplier would actually have been a win!).
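For instance, a sketch of a constant multiply done with shifts and adds (whether this actually saves power depends on the core, as noted above - measure it):

    /* 10*x without the multiplier: 10*x = 8*x + 2*x */
    unsigned mul10(unsigned x)
    {
        return (x << 3) + (x << 1);
    }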
If you are really serious, get a power-aware debugger which can correlate power usage with your source code.