What is the preferred way of synchronizing with monitor refreshes when vsync is not an option? We enable vsync, but some users disable it in the driver settings, and that setting overrides app preferences. We need reliable, predictable frame lengths to simulate the world correctly, do some visual effects, and synchronize audio (more precisely, we need to estimate how long a frame is going to be on screen, and when it will be on screen).
Is there any way to force drivers to enable vsync despite what the user set in the driver? Or to ask Windows when a monitor refresh is going to happen? We have issues with manual sleeping when our frame boundaries line up closely with vblank: it causes occasional missed frames and up to one extra frame of input latency.
We mainly use OpenGL, but Direct3D advice is also appreciated.
You should not build your application's timing on the basis of vsync and the exact timings of frame presentation. Games don't do that these days and have not done so for quite some time. This is what allows them to keep a consistent speed even if they start dropping frames; their timing, physics computations, AI, etc. aren't based on when a frame gets displayed but on actual elapsed time.
Game frame timings are typically sufficiently small (less than 50ms) that human beings cannot detect any audio/video synchronization issues. So if you want to display an image that should have a sound played alongside it, as long as the sound starts within about 30ms or so of the image, you're fine.
Oh and don't bother trying to switch to Vulkan/D3D12 to resolve this problem. They don't. Vulkan in particular decouples presentation from other tasks, making it basically impossible to know the exact time when an image starts appearing on the screen. You give Vulkan an image, and it presents it... at whatever is the next most opportune moment. You get some control over how that moment gets chosen, but even those choices can be restricted based on factors outside of your control.
Design your program to avoid the need for rigid vsync. Use internal timings instead.
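Here is a minimal sketch of that decoupled approach, assuming a fixed-timestep simulation driven by a monotonic clock; the 120 Hz step and the placeholder update/render calls are assumptions, not anything prescribed above:

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

void run_game() {
    constexpr std::chrono::duration<double> kStep(1.0 / 120.0);  // fixed simulation step
    auto previous = Clock::now();
    std::chrono::duration<double> accumulator(0.0);

    while (true /* !quit_requested() */) {
        auto now = Clock::now();
        accumulator += now - previous;
        previous = now;

        // Catch the simulation up in fixed increments, independent of when
        // (or whether) vsync presents the frame.
        while (accumulator >= kStep) {
            // update_world(kStep.count());     // placeholder
            accumulator -= kStep;
        }

        // Render the latest state; the leftover fraction can drive interpolation.
        // render_frame(accumulator / kStep);   // placeholder
    }
}
```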
I am attempting to create a kext which will allow me to lower the minimum display brightness. Which drivers would be relevant to this? Would this be an I/O Kit driver?
This pertains to the internal display on my MacBook Pro 14,1 running macOS 10.14.4 using the integrated Intel Iris Plus Graphics 640.
The driver that controls this is the "AppleBacklight.kext" kernel extension.
In general: Display backlight is usually (and this is the case on the MacBook Pro) controlled by a PWM (pulse-width modulation) signal from 0% to 100%. A controller - which could be the GPU or a dedicated IC - sends out the PWM signal according to the brightness level chosen by the user. In some cases this factors in an ambient light sensor.
The controller operates by dividing the usable PWM range into a number of settings (for example 20 separate steps). The whole PWM range is usually not available, as backlights have different minimum and maximum allowable PWM ranges. If you go outside that range, you're violating the specs and may cause damage to the display.
On modern Intel computers the PWM range is stored in the SSDT (Secondary System Description Table), accessible through ACPI (the Advanced Configuration and Power Interface). These tables are usually dumped into .aml/.dsl files. You'll be looking at, for example, the LMIN and LMAX parameters (LMIN/LMAX = backlight PWM minimum/maximum).
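To make the idea concrete, here is a hedged sketch of how a controller might map a user brightness step onto that PWM range; the LMIN/LMAX values and step count below are made up, and the real ones come from your machine's ACPI tables:

```cpp
#include <cstdint>

constexpr uint32_t kPwmMin = 150;    // hypothetical LMIN (raw PWM counts)
constexpr uint32_t kPwmMax = 4648;   // hypothetical LMAX (raw PWM counts)
constexpr uint32_t kSteps  = 20;     // number of user-visible brightness steps

// Linearly map a brightness step in [0, kSteps] onto the allowed PWM range,
// never going below the panel's minimum duty cycle.
uint32_t pwm_for_step(uint32_t step) {
    if (step > kSteps) step = kSteps;
    return kPwmMin + (kPwmMax - kPwmMin) * step / kSteps;
}
```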
You could also consider replacing the default backlight kernel extension with for example this:
https://github.com/RehabMan/OS-X-Intel-Backlight
It is only meant to be used with Hackintoshes, but it controls the same Intel integrated GPU as you have.
Here's a different kernel extension that uses the ACPI method described above to control the backlight:
https://github.com/RehabMan/OS-X-ACPI-Backlight
Again, it is intended for Hackintoshes.
If you want to try dumping and patching your SSDT manually, you can take a look at this guide:
https://www.tonymacx86.com/threads/guide-patching-laptop-dsdt-ssdts.152573/
Again note that it is intended to be used with Hackintoshes.
In general I wouldn't recommend trying to change the minimum display brightness on original Apple hardware. You run the risk of damaging circuits - but most likely you'll just experience a black screen when you lower the brightness below the minimum.
I have a windowed WinAPI/OpenGL app. The scene is drawn rarely (compared to games) in WM_PAINT, mostly triggered by user input - WM_MOUSEMOVE, clicks, etc.
I noticed that when the scene is not being moved by the mouse (the application is "idle") and the user then starts some mouse action, the first frame is drawn with an unpleasant delay - around 300 ms. The following frames are fast again.
I implemented a 100 ms timer which only calls InvalidateRect, which is later followed by WM_PAINT and a scene redraw. This "fixed" the problem, but I don't like this solution.
I'd like to know why this is happening, and also get some tips on how to tackle it.
Does the OpenGL render context release resources when not used? Or could this be caused by some system behaviour, like processor underclocking/energy saving? (Although I noticed that the processor runs underclocked even when the app is under "load".)
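For reference, the 100 ms keep-alive workaround described above looks roughly like this inside the window procedure (a sketch; the timer id and the drawScene placeholder are arbitrary):

```cpp
#include <windows.h>

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam) {
    switch (msg) {
    case WM_CREATE:
        SetTimer(hwnd, 1, 100, nullptr);        // tick roughly every 100 ms
        return 0;
    case WM_TIMER:
        InvalidateRect(hwnd, nullptr, FALSE);   // schedules a WM_PAINT
        return 0;
    case WM_PAINT: {
        PAINTSTRUCT ps;
        BeginPaint(hwnd, &ps);
        // drawScene();                         // the app's OpenGL redraw
        EndPaint(hwnd, &ps);
        return 0;
    }
    case WM_DESTROY:
        KillTimer(hwnd, 1);
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProc(hwnd, msg, wParam, lParam);
}
```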
This sounds like the Windows virtual memory system at work. The sum of the memory used by all active programs is usually greater than the amount of physical memory installed on the system, so Windows swaps idle processes out to disc according to whatever rules it follows, such as the relative priority of each process and how long it has been idle.
You are preventing the swap out (and delay) by artificially making the program active every 100ms.
If a swapped out process is reactivated, it takes a little time to retrieve the memory content from disc and restart the process.
It's unlikely that OpenGL is responsible for this delay.
You can improve the situation by starting your program with a higher priority.
https://superuser.com/questions/699651/start-process-in-high-priority
You can also use the VirtualLock function to prevent Windows from swapping out part of the memory, but this is not advisable unless you REALLY know what you are doing (see the sketch after the link below)!
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366895(v=vs.85).aspx
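A hedged sketch of what that might look like: allocate a buffer and pin it in physical RAM so it cannot be paged out. The sizes and working-set limits here are assumptions for illustration only:

```cpp
#include <windows.h>

// Allocate 'bytes' of memory and lock it into physical RAM.
// Returns nullptr on failure.
void* pin_buffer(SIZE_T bytes) {
    void* p = VirtualAlloc(nullptr, bytes, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (p == nullptr) return nullptr;

    if (!VirtualLock(p, bytes)) {
        // VirtualLock can fail until the process working-set size is raised;
        // the 16 MB / 64 MB limits below are placeholder values.
        SetProcessWorkingSetSize(GetCurrentProcess(), 16u << 20, 64u << 20);
        if (!VirtualLock(p, bytes)) {
            VirtualFree(p, 0, MEM_RELEASE);
            return nullptr;
        }
    }
    return p;
}
```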
EDIT: You can improve things for sure by adding more memory; 4 GB is on the low side for a modern PC, especially if you run Chrome with multiple tabs open.
If you want to be scientific before spending any hard earned cash :-), open Performance Monitor and look at Cache Faults/sec. This will show the swap activity on your machine. (I have 16 GB on my PC, so this number is mostly very low.) To quantify the difference, check Cache Faults/sec before and after the memory upgrade.
Finally, there is nothing wrong with the solution you found already - kick-starting the graphics app every 100 ms or so.
The problem was in the NVIDIA driver's global 3D setting "Power management mode".
Options "Optimal Power" and "Adaptive" save power and cause the problem.
Only "Prefer Maximum Performance" does the right thing.
I'm using MATLAB R2014b on Windows 8.1. I have a figure with two sub-plots. The data for the first sub-plot is about 700,000 points; the second is about 50,000 points. When I display it or manipulate it in any way (zoom, say), there's a huge lag, up to about 30 seconds. Obviously I'd like to improve performance. Here's what I know:
If I break it into 4 plots, each covering 1/4 of the data, performance is fast - far more than 4 times as fast; the slowdown seems to grow much faster than linearly with the number of points.
A colleague (running R2014A I believe) has a machine that should be slower but in fact the figure displays with near-realtime speed.
The problem is perhaps how the figure is being rendered. I ran MATLAB's "opengl info" and it reports that the Software flag is false. That should mean it's using the display's hardware rendering.
So maybe the display adapter isn't set quite right. My machine (it's a Lenovo laptop) has two display adapters: Intel HD Graphics 3000 and NVIDIA NVS 4200M. I don't know why there are both or whether there are any relevant settings.
Any thoughts on how to proceed?
It could be that you're running it through your integrated graphics processor (Intel HD Graphics 3000) rather than your dedicated graphics processor (NVIDIA NVS 4200M). If your Lenovo has "switchable graphics" enabled, you should be able to switch to the NVIDIA card, or check that you are indeed rendering through it. Right-click on your power manager in the taskbar. If you see a menu item that says "switchable graphics", you can change it to your NVIDIA. Note that you'll have to close MATLAB to do the switch.
It does sound like a slowdown caused by the rendering configuration. When you run opengl info in MATLAB, what device is listed as "Renderer"?
If you don't need to manipulate it (say you only want an image file), you can always create your figure with figure('Visible','Off') and save it without actually showing the figure on screen.
MATLAB releases ever since R2014b use a new graphics engine which is known to be extremely slow with large data sets; see for example
http://www.mathworks.com/matlabcentral/newsreader/view_thread/337755
The solution has nothing to do with graphics drivers etc. Revert to MATLAB R2014a and stay there.
I wrote a function, plotECG, that enables you to show plots with millions of samples. It includes sliders for quick scrolling and zooming.
If you have multiple time series and want them to be displayed in a synchronized way, you can pass them as a matrix all at once and define the key 'AutoStackSignals', followed by a cell array of strings with the signals' names. Then the signals are shown one below the other in the same axis, with the corresponding name as the YTickLabel.
https://de.mathworks.com/matlabcentral/fileexchange/59296
In a typical handheld/portable embedded device, battery life is a major concern in the design of the hardware, the software, and the features the device can support. From the software programming perspective, one is aware of code optimized for MIPS and for memory (data and program).
I am aware of the hardware deep sleep and standby modes that are used to clock the hardware at lower frequencies or turn off the clock entirely to some unused circuits to save power, but I am looking for ideas from this point of view:
Given that my code is running and needs to keep executing, how can I write it "power"-efficiently so that it consumes the minimum number of watts?
Are there any special programming constructs, data structures, or control structures which I should look at to achieve minimum power consumption for a given functionality?
Are there any software high-level design considerations which one should keep in mind when designing the code structure, or during low-level design, to make the code as power-efficient (least power-consuming) as possible?
Like 1800 INFORMATION said, avoid polling; subscribe to events and wait for them to happen (see the sketch after this list)
Update window content only when necessary - let the system decide when to redraw it
When updating window content, ensure your code recreates as little of the invalid region as possible
With quick code the CPU goes back to deep sleep mode faster and there's a better chance that such code stays in L1 cache
Operate on small data at one time so data stays in caches as well
Ensure that your application doesn't do any unnecessary action when in background
Make your software not only power efficient, but also power aware - update graphics less often when on battery, disable animations, less hard drive thrashing
And read some other guidelines. ;)
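A minimal sketch of "wait for the event" instead of polling, assuming a worker that blocks on a condition variable and therefore burns no cycles while idle (the event type and handler are placeholders):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex m;
std::condition_variable cv;
std::queue<int> events;

// Worker: sleeps inside cv.wait() until there is work, no busy loop.
void worker() {
    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !events.empty(); });
        int ev = events.front();
        events.pop();
        lock.unlock();
        // handle_event(ev);                    // placeholder
        (void)ev;
    }
}

// Producer: pushes an event and wakes the worker only when there is work.
void post_event(int ev) {
    { std::lock_guard<std::mutex> lock(m); events.push(ev); }
    cv.notify_one();
}
```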
Recently a series of posts called "Optimizing Software Applications for Power" started appearing on Intel Software Blogs. It may be of some use to x86 developers.
Zeroth, use a fully static machine that can stop when idle. You can't beat zero Hz.
First up, switch to a tickless operating system scheduler. Waking up every millisecond or so wastes power. If you can't, consider slowing down the scheduler interrupt instead.
Secondly, ensure your idle thread is a power-save, wait-for-next-interrupt instruction.
You can do this in the sort of under-regulated "userland" most small devices have.
Thirdly, if you have to poll or perform user-confidence activities like updating the UI: sleep, do it, and go back to sleep.
Don't trust GUI frameworks that you haven't checked for "sleep and spin" kind of code.
Especially the event timer you may be tempted to use for #2.
Block a thread on read instead of polling with select()/epoll()/WaitForMultipleObjects() (see the sketch below).
It puts stress on the thread scheduler (and your brain), but the devices generally do okay.
This ends up changing your high-level design a bit; it gets tidier!
A main loop that polls all the things you might do ends up slow and wasteful on CPU, but it does guarantee performance (guaranteed to be slow).
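A rough illustration of the blocking-read-per-thread idea, assuming a POSIX-style device node (the path is made up):

```cpp
#include <fcntl.h>
#include <unistd.h>

// One thread per device: it sleeps inside the blocking read() and is only
// scheduled when data actually arrives.
void device_reader() {
    int fd = open("/dev/hypothetical_sensor", O_RDONLY);  // assumed device node
    if (fd < 0) return;

    char buf[64];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);  // blocks; no CPU used while waiting
        if (n <= 0) break;
        // process(buf, n);                     // placeholder
    }
    close(fd);
}
```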
Cache results, lazily create things. Users expect the device to be slow so don't disappoint them. Less running is better. Run as little as you can get away with.
Separate threads can be killed off when you stop needing them.
Try to get more memory than you need; then you can insert into more than one hashtable and avoid ever searching. This is a direct tradeoff if the memory is DRAM.
Look at a realtime-ier system than you think you might need. It saves time (sic) later.
They cope better with threading too.
Do not poll. Use events and other OS primitives to wait for notifiable occurrences. Polling ensures that the CPU will stay active and use more battery life.
From my work using smart phones, the best way I have found of preserving battery life is to ensure that everything you do not need for your program to function at that specific point is disabled.
For example, only switch Bluetooth on when you need it, similarly the phone capabilities, turn the screen brightness down when it isn't needed, turn the volume down, etc.
The power used by these functions will generally far outweigh the power used by your code.
Avoiding polling is a good suggestion.
A microprocessor's power consumption is roughly proportional to its clock frequency, and to the square of its supply voltage. If you have the possibility to adjust these from software, that could save some power. Also, turning off the parts of the processor that you don't need (e.g. floating-point unit) may help, but this very much depends on your platform. In any case, you need a way to measure the actual power consumption of your processor, so that you can find out what works and what not. Just like speed optimizations, power optimizations need to be carefully profiled.
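For a back-of-the-envelope feel for those two factors, dynamic CMOS power is commonly approximated as proportional to frequency times voltage squared; the numbers below are assumptions purely for illustration:

```cpp
#include <cstdio>

// Dynamic power scales roughly as P ~ a * C * V^2 * f, so halving the clock
// and dropping the core voltage from 1.2 V to 1.0 V cuts dynamic power to
// about 35% of the original (static/leakage power is ignored here).
int main() {
    double f_ratio = 0.5;        // new clock / old clock
    double v_ratio = 1.0 / 1.2;  // new voltage / old voltage
    double p_ratio = f_ratio * v_ratio * v_ratio;
    std::printf("relative dynamic power: %.2f\n", p_ratio);  // ~0.35
    return 0;
}
```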
Consider using the network interfaces as little as you can. You might want to gather information and send it out in bursts instead of constantly sending it.
Look at what your compiler generates, particularly for hot areas of code.
If you have low-priority intermittent operations, don't use specific timers to wake up to deal with them; deal with them when processing other events.
Use logic to avoid stupid scenarios where your app might go to sleep for 10 ms and then have to wake up again for the next event. For the kind of platform mentioned it shouldn't matter if both events are processed at the same time.
Having your own timer & callback mechanism might be appropriate for this kind of decision making (see the sketch below). The trade-off is code complexity and maintenance vs. likely power savings.
Simply put, do as little as possible.
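One hedged way such a mechanism could coalesce nearby deadlines so the CPU wakes once instead of several times (the slack window and timer list are assumptions):

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Timer {
    Clock::time_point deadline;
    std::function<void()> callback;
};

// Sleep until the earliest deadline, then fire every timer whose deadline
// falls within 'slack' of now, so we don't wake up again almost immediately.
void run_coalesced(std::vector<Timer>& timers, Clock::duration slack) {
    while (!timers.empty()) {
        auto next = std::min_element(timers.begin(), timers.end(),
            [](const Timer& a, const Timer& b) { return a.deadline < b.deadline; })->deadline;
        std::this_thread::sleep_until(next);

        auto cutoff = Clock::now() + slack;
        for (auto it = timers.begin(); it != timers.end();) {
            if (it->deadline <= cutoff) { it->callback(); it = timers.erase(it); }
            else { ++it; }
        }
    }
}
```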
Well, to the extent that your code can execute entirely in the processor cache, you'll have less bus activity and save power. To the extent that your program is small enough to fit code+data entirely in the cache, you get that benefit "for free". OTOH, if your program is too big, and you can divide it into modules that are more or less independent of each other, you might get some power saving by dividing it into separate programs. (I suppose it's also possible to make a toolchain that spreads out related bundles of code and data into cache-sized chunks...)
I suppose that, theoretically, you can save some amount of unnecessary work by reducing the number of pointer dereferences, and by refactoring your jumps so that the most likely jumps are taken first -- but that's not realistic to do as a programmer.
Transmeta had the idea of letting the machine do some instruction optimization on-the-fly to save power... But that didn't seem to help enough... And look where that got them.
Set unused memory or flash to 0xFF, not 0x00. This is certainly true for flash and EEPROM; I'm not sure about SRAM or DRAM. For the (EEP)ROMs there is an inversion, so a 0 is stored as a 1 and takes more energy, while a 1 is stored as a 0 and takes less. This is why you read 0xFFs after erasing a block.
Rather timely, this: an article on Hackaday today about measuring the power consumption of various commands:
Hackaday: the-effect-of-code-on-power-consumption
Aside from that:
- Interrupts are your friends
- Polling / wait() aren't your friends
- Do as little as possible
- make your code as small/efficient as possible
- Turn off as many modules, pins, peripherals as possible in the micro
- Run as slowly as possible
- If the micro has settings for pin drive strength, slew rate, etc., check them & configure them; the defaults are often full power / max speed.
- returning to the article above, go back and measure the power & see if you can drop it by altering things.
Also, something that is not trivial to do: reduce the precision of the mathematical operations, go for the smallest data set available, and, if your development environment supports it, pack data and aggregate operations.
Knuth's books could give you all the variants of specific algorithms you need to save memory or CPU, or to go with reduced precision while minimizing the rounding errors.
Also, spend some time checking all the embedded device APIs - for example, most Symbian phones could do audio encoding via specialized hardware.
Do your work as quickly as possible, and then go to some idle state waiting for interrupts (or events) to happen. Try to make the code run out of cache with as little external memory traffic as possible.
On Linux, install powertop to see how often which piece of software wakes up the CPU. And follow the various tips that the powertop site links to, some of which are probably applicable to non-Linux, too.
http://www.lesswatts.org/projects/powertop/
Choose efficient algorithms that are quick and have small basic blocks and minimal memory accesses.
Understand the cache size and functional units of your processor.
Don't access memory. Don't use objects, garbage collection, or any other high-level constructs if they expand your working code or data set outside the available cache. If you know the cache size and associativity, lay out the entire working data set you will need in low-power mode and fit it all into the dcache (forget some of the "proper" coding practices that scatter the data around in separate objects or data structures if that causes cache thrashing). Same with all the subroutines: put your working code set all in one module if necessary to keep it all in the icache. If the processor has multiple levels of cache, try to fit in the lowest level of instruction or data cache possible. Don't use the floating-point unit or any other instructions that may power up other optional functional units unless you can make a good case that using those instructions significantly shortens the time the CPU is out of sleep mode. (See the layout sketch below.)
etc.
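For example, here is a hedged sketch of keeping the low-power-mode working set in one contiguous block instead of pointer-chasing through separately allocated objects (the sizes and names are made up):

```cpp
#include <array>
#include <cstdint>

// Everything the idle-mode loop touches lives in one flat, contiguous array:
// sequential accesses, no heap fragmentation, friendly to a small dcache.
struct SensorState {
    std::array<int16_t, 64> samples;
    int16_t filtered;
    uint8_t flags;
};

static std::array<SensorState, 8> g_sensors;

int16_t filter_all() {
    int32_t acc = 0;
    for (auto& s : g_sensors) {
        int32_t sum = 0;
        for (int16_t v : s.samples) sum += v;
        s.filtered = static_cast<int16_t>(sum / 64);  // simple box filter
        acc += s.filtered;
    }
    return static_cast<int16_t>(acc / 8);
}
```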
Don't poll, sleep
Avoid using power hungry areas of the chip when possible. For example multipliers are power hungry, if you can shift and add you can save some Joules (as long as you don't do so much shifting and adding that actually the multiplier is a win!)
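As an illustration of the shift-and-add idea (whether it actually saves power depends entirely on the target core, so measure first):

```cpp
#include <cstdint>

// Multiply by a constant with shifts and adds instead of the multiplier:
// 10x == 8x + 2x. On many cores the hardware multiplier is cheaper, so profile.
static inline uint32_t times10_mul(uint32_t x)   { return x * 10u; }
static inline uint32_t times10_shift(uint32_t x) { return (x << 3) + (x << 1); }
```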
If you are really serious, get a power-aware debugger, which can correlate power usage with your source code. Like this