glutMainLoop() vs glutTimerFunc()? - animation

I know that glutMainLoop() is used to call display over and over again, maintaining a constant frame rate. At the same time, I can also have glutTimerFunc(), which calls glutPostRedisplay() at the end, so it can maintain a different framerate.
When they are working together, what really happens? Does the timer function add on to the framerate of the main loop and make it faster? Or does it change the default refresh rate of the main loop? How do they work in conjunction?

I know that glutMainLoop() is used to call display over and over again, maintaining a constant frame rate.
Nope! That's not what glutMainLoop does. The purpose of glutMainLoop is to poll operating system events, check if timers have elapsed, see if windows have to be redrawn, and then call the respective callback functions registered by the user. This happens in a loop, and usually this loop is started from the main entry point of the program, hence the name "main loop".
When they are working together, what really happens? Does the timer function add on to the framerate of the main loop and make it faster? Or does it change the default refresh rate of the main loop? How do they work in conjunction?
As already said, dispatching timers is part of the responsibility of glutMainLoop, so you can't have GLUT timers without it. More importantly, if no events happened, no redisplay was posted, and no idle function is registered, glutMainLoop will "block" the program until something interesting happens (i.e. no CPU cycles are consumed).
Essentially it goes like this:
void glutMainLoop(void)
{
    for(;;) {
        /* ... */
        foreach(t in timers) {
            if( t.elapsed() ) {
                t.callback(…);
                continue;
            }
        }
        /* ... */
        if( display.posted ) {
            display.callback();
            display.posted = false;
            continue;
        }
        idle.callback();
    }
}
At the same time, I can also have glutTimerFunc(), which calls glutPostRedisplay() at the end, so it can maintain a different framerate.
The timers provided by GLUT make no guarantees about their precision and jitter. Hence they're not particularly well suited for framerate limiting.
Normally the framerate is limited by v-sync (or it should be), but blocking on v-sync means you cannot use that time to do something useful, because the process is blocked. A better approach is to register an idle function, in which you poll a high-resolution timer (on POSIX-compliant systems clock_gettime(CLOCK_MONOTONIC, …), on Windows QueryPerformanceCounter) and call glutPostRedisplay once one display refresh interval, minus the time required for rendering the frame, has elapsed.
Of course it's hard to predict exactly how long rendering is going to take, so the usual approach is to collect a sliding-window average and deviation and adjust with that. You also want to align that timer with the v-sync.
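As a concrete illustration, here is a minimal sketch of that idle-function approach, using clock_gettime(CLOCK_MONOTONIC, …) as mentioned above (so POSIX only); the 60 Hz refresh interval is an assumption and, for brevity, the sketch does not subtract the measured rendering time:

#include <GL/glut.h>
#include <time.h>

static struct timespec last_post;
static const double refresh_interval = 1.0 / 60.0;   /* assumed 60 Hz display */

static double seconds_since(const struct timespec *then)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - then->tv_sec) + (now.tv_nsec - then->tv_nsec) * 1e-9;
}

static void idle(void)
{
    /* post a redisplay once roughly one refresh interval has elapsed */
    if (seconds_since(&last_post) >= refresh_interval) {
        clock_gettime(CLOCK_MONOTONIC, &last_post);
        glutPostRedisplay();
    }
}

/* during setup: initialise last_post once and register glutIdleFunc(idle); */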
This is of course a solved problem (at least in electrical engineering), which can be addressed by a phase-locked loop (PLL). Essentially you have a "phase comparator" (i.e. something that compares whether your timer runs slower or faster than the thing you want to synchronize to), a "charge pump" (a variable to which you add, or from which you subtract, the delta from the phase comparator), a "loop filter" (a sliding-window average) and an "oscillator" (a timer) controlled by the loop-filtered value in the charge pump.
So you poll the status of the v-sync (not possible with GLUT functions, and not even possible with core OpenGL or even some of the swap control extensions – you'll have to use OS-specific functions for that) and check whether your timer lags behind or runs fast compared to it. You add that delta to the "charge pump", filter it and feed the result back into the timer. The nice thing about this approach is that it will automatically adjust to, and filter, the time spent rendering frames as well.
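To make the structure concrete, here is a hedged sketch of such a software PLL; the function and variable names, the loop gain, the window size, and the source of the v-sync/timer timestamps are all illustrative assumptions, not a fixed recipe:

#define PLL_WINDOW 32

static double window_buf[PLL_WINDOW];     /* "loop filter": sliding window  */
static int    window_pos = 0;
static double charge     = 0.0;           /* "charge pump"                  */
static double period     = 1.0 / 60.0;    /* "oscillator" period in seconds */

/* called once per v-sync with the v-sync timestamp and our timer's timestamp */
void pll_update(double vsync_time, double timer_time)
{
    double error = timer_time - vsync_time;       /* "phase comparator"     */
    charge += 0.1 * error;                        /* pump charge in or out  */

    window_buf[window_pos] = charge;              /* sliding-window average */
    window_pos = (window_pos + 1) % PLL_WINDOW;
    double filtered = 0.0;
    for (int i = 0; i < PLL_WINDOW; i++)
        filtered += window_buf[i];
    filtered /= PLL_WINDOW;

    period = 1.0 / 60.0 - filtered;               /* steer the "oscillator" */
}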

From the glutMainLoop doc pages:
glutMainLoop enters the GLUT event processing loop. This routine should be called at most once in a GLUT program. Once called, this routine will never return. It will call as necessary any callbacks that have been registered. (emphasis mine)
That means that the idea of glutMainLoop is just to process events, calling anything that is installed. Indeed, I do not believe that it keeps calling display over and over, but only when there is an event that requests a redisplay.
This is where glutTimerFunc() comes into play. It registers a timer event callback to be called by glutMainLoop when the timer fires. Note that this is one of several other event callbacks that can be registered. That explains why the docs use the expression at least.
(...) glutTimerFunc registers the timer callback func to be triggered in at least msecs milliseconds. (...)

Related

Triggering a software event from an interrupt (XMEGA, GCC)

I want to run a periodic "housekeeping" event, triggered regularly by a timer interrupt. The interrupt fires frequently (kHz+), while the function may take a long time to finish, so I can't simply execute it inline.
In the past, I've done this on an ATMEGA, where an ISR can simply permit other interrupts to fire (including itself again) with sei(). By wrapping the event in a "still executing" flag, it won't pile up on the stack and cause a... you know:
if (!inFunction) { inFunction = true; doFunction(); inFunction = false; }
I don't think this can be done -- at least as easily -- on the XMEGA, due to the PMIC interrupt controller. It appears the interrupt flags can only be reset by executing RETI.
So, I was thinking, it would be convenient if I could convince GCC to produce a tail call out of an interrupt. That would immediately execute the event, while clearing interrupts.
This would be easy enough to do in assembler: just push the address and RETI. (Well, some stack-mangling because ISR, but, yeah.) But I'm guessing it'll be a hack in GCC, possibly a custom ASM wrapper around a "naked" function?
Alternately, I would love to simply set a low priority software interrupt, but I don't see an intentional way to do this.
I could use software to trigger an interrupt from an otherwise unused peripheral. That's fine as a special case, but then, if I ever need to use that device, I have to find another. It's bad for code reuse, too.
Really, this is an X-Y problem and I know it. I think I want to do X, but really I need method Y that I just don't know about.
One better method is to set a flag, then let main() deal with it when it gets around to it. Unfortunately, I have blocking functions in main() (handling user input via serial), so that would take work, and be a mess.
The only "proper" method I know of offhand, is to do a full task switch -- but damned if I'm going to effectively implement an RTOS, or pull one in, just for this. There's got to be a better way.
Have I actually covered all the possibilities, and painted myself into a corner? Do I have to compromise and choose one of these? Am I missing anything better?
There are several possibilities to solve this.
1. Enable your timer interrupt as low priority. In this way the medium and high priority interrupts will be able to interrupt this low priority interrupt, and run unaffected.
This is similar to using sei(); in your interrupt handler on older processors (without a PMIC).
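A hedged sketch of option 1, assuming an XMEGA timer TCC0 as the periodic source and the standard avr-libc register/vector names (adjust the timer, prescaler and period for your part and clock):

#include <avr/io.h>
#include <avr/interrupt.h>

void housekeeping_timer_init(void)
{
    TCC0.PER      = 31250;                /* period value, pick to taste     */
    TCC0.CTRLA    = TC_CLKSEL_DIV64_gc;   /* prescaled peripheral clock      */
    TCC0.INTCTRLA = TC_OVFINTLVL_LO_gc;   /* overflow interrupt at LOW level */

    /* enable all three interrupt levels in the PMIC */
    PMIC.CTRL = PMIC_LOLVLEN_bm | PMIC_MEDLVLEN_bm | PMIC_HILVLEN_bm;
    sei();
}

ISR(TCC0_OVF_vect)
{
    /* Runs at LOW level, so MED/HIGH level ISRs can still preempt it -
     * the XMEGA equivalent of calling sei() at the top of an old-style ISR. */
    do_housekeeping();   /* hypothetical long-running function */
}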
2.a Set a flag (variable) in the interrupt. Poll the flag in the main loop. If the flag is set, clear it and do your stuff.
2.b Set up the timer but don't enable its interrupt. Poll the OVF interrupt flag of your timer in the main loop. If the flag is set, clear it and do your stuff.
These are timed less accurately, depending on what else the main loop does, so it depends on your accuracy expectations. For handling more tasks in the main loop without an OS, look into cooperative multitasking and state machines.
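And a minimal sketch of option 2a, where the ISR only sets a flag and main() does the heavy lifting; the vector name and the do_housekeeping()/handle_serial() functions are illustrative:

#include <avr/io.h>
#include <avr/interrupt.h>
#include <stdbool.h>

static volatile bool housekeeping_due = false;

ISR(TCC0_OVF_vect)               /* assumed timer overflow vector */
{
    housekeeping_due = true;     /* just flag it and return immediately */
}

int main(void)
{
    /* ... timer and PMIC setup ... */
    sei();
    for (;;) {
        if (housekeeping_due) {
            housekeeping_due = false;
            do_housekeeping();   /* hypothetical long-running work */
        }
        handle_serial();         /* hypothetical other main-loop work */
    }
}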

Kernel threads vs Timers

I'm writing a kernel module which uses a customized print-on-screen system. Basically each time a print is involved the string is inserted into a linked list.
Every X seconds I need to process the list and perform some operations on the strings before printing them.
Basically I have two choices to implement such a filter:
1) Timer (which restarts itself in the end)
2) Kernel thread which sleeps for X seconds
While the filter is performing its stuff nothing else can use the linked list and, of course, while inserting a string the filter function shall wait.
AFAIK a timer runs in interrupt context, so it cannot sleep, but what about kernel threads? Can they sleep? If so, is there some reason not to use them in my project? What other solution could be used?
To summarize: my filter function has got only 3 requirements:
1) Must be able to printk
2) When using the list everything else which is trying to access the list must block until the filter function finishes execution
3) Must run every X seconds (not a realtime requirement)
kthreads are allowed to sleep. (However, not all kthreads offer sleepful execution to all clients. softirqd for example would not.)
But then again, you could also use spinlocks (and their associated cost) and do without the extra thread (that's basically what the timer does: it uses spin_lock_bh). It's a tradeoff really.
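For reference, a hedged sketch of the kthread variant, assuming the list is protected by a mutex and that whoever inserts strings is also allowed to sleep (if the producers run in atomic context you would need a spinlock instead, as noted above); the names process_strings, print_lock and the 5-second period are illustrative:

#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/mutex.h>

#define FILTER_PERIOD_MS 5000            /* "every X seconds", illustrative */

static struct task_struct *filter_task;
static DEFINE_MUTEX(print_lock);         /* also taken by the insert path */

static int filter_fn(void *data)
{
    while (!kthread_should_stop()) {
        msleep_interruptible(FILTER_PERIOD_MS);  /* sleeping is fine in a kthread */
        mutex_lock(&print_lock);                 /* inserters block meanwhile */
        process_strings();                       /* may printk(), may sleep */
        mutex_unlock(&print_lock);
    }
    return 0;
}

/* module init: filter_task = kthread_run(filter_fn, NULL, "print_filter"); */
/* module exit: kthread_stop(filter_task); */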
each time a print is involved the string is inserted into a linked list
I don't really know if you meant print or printk. But if you're talking about printk(), you would need to allocate memory, and you are in trouble because printk() may be called in an atomic context. That leaves you the option of using a circular buffer (and thus you should be tolerant of dropping some strings, because you might not have enough memory to save them all).
Every X seconds I need to process the list and perform some operations on the strings before printing them.
In that case, I would not even do a kernel thread: I would do the processing in print() if not too costly.
Otherwise, I would create a new system call:
sys_get_strings() or something, that would dump the whole linked list into userspace (and remove entries from the list when copied).
This way the whole behavior is controlled by userspace. You could create a daemon that calls the syscall every X seconds. You could also do all the costly processing in userspace.
You could also create a new device, say /dev/print-on-screen:
dev_open would allocate the memory, and print() would no longer be a no-op, but would feed the data into the device's pre-allocated memory (so print() can be used in atomic context and so on).
dev_release would throw everything out
dev_read would get you the strings
dev_write could do something on your print-on-screen system

How to locate idle time (and network IO time, etc.) in XPerf?

Let's say I have a contrived program:
#include <Windows.h>

void useless_function()
{
    Sleep(5000);
}

void useful_function()
{
    // ... do some work
    useless_function();
    // ... do some more work
}

int main()
{
    useful_function();
    return 0;
}
Objective: I want the profiler to tell me useful_function() is needlessly calling useless_function(), which waits for no obvious reason. Under XPerf, this doesn't show up in any of the graphs I have because the call to WaitForMultipleObjects() seems to be accounted to Idle.exe instead of my own program.
And here's the xperf command line that I currently run:
xperf -on Latency -stackwalk Profile
Any ideas?
(This is not restricted to wait functions. The above might have been solved by placing breakpoints at NtWaitForMultipleObjects. Ideally there could be a way to see the stack sample that's taking up a lot of wall-clock time as opposed to only CPU time)
I think what you are looking for is the Wait Analysis with Ready Thread functionality in XPerf. It captures every context switch and gives you the call stack of the thread once it wakes up from sleep (or an otherwise blocked operation). In your case, you would see the stack just after the call to Sleep(5000) as well as the time spent sleeping.
The functionality is a bit obscure to use. But it is fortunately well described here:
Use Xperf's Wait Analysis for Application-Performance Troubleshooting
Wait Analysis is the way to do this. You should:
Record the CSWITCH provider, in order to get all context switches
Record call stacks on context switches by adding +CSWITCH to your -stackwalk argument
Probably record call stacks on the ready thread to get more information on who readied you (i.e. who released the mutex, CS, or semaphore, and where) by adding +ReadyThread to your -stackwalk
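For example, a capture along those lines might look like this (hedged: the exact kernel flags depend on what else you want to record, but the Latency group already includes the CSWITCH and PROFILE providers):

xperf -on Latency -stackwalk Profile+CSwitch+ReadyThread
<run the scenario>
xperf -d trace.etl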
Then you use CPU Usage (Precise) in WPA (or xperfview, but that's ancient) to look at the context switches and find where your TimeSinceLast is high on a thread that shouldn't be going idle. You'll typically want the columns in CPU Usage (Precise) in this sort of order:
NewProcess (your process being switched in)
NewThreadId
NewThreadStack
ReadyingProcess (who made your thread ready to run)
ReadyingThreadId (optional)
ReadyThreadStack (optional, requires +ReadyThread on -stackwalk)
Orange bar
Count
TimeSinceLast (us) - sort by this column, usually
Whatever other columns you want
For details see these particular articles from my blog:
- https://randomascii.wordpress.com/2014/08/19/etw-training-videos-available-now/
- https://randomascii.wordpress.com/2012/06/19/wpaxperf-trace-analysis-reimagined/
This "profiler" will tell you - just randomly pause it a few times and look at the stack. If do some work takes 5 seconds, and do some more work takes 5 seconds, then 33% of the time the stack will look like this
main: calling useful_function
useful_function: calling useless_function
useless_function: calling Sleep
So roughly 33% of your stack samples will show exactly that. Any line of code that's costing some fraction of wall-clock time will appear on roughly that fraction of samples.
On the rest of the samples you will see it doing the other things.
There are automated profilers that do the same thing in a more pretty way, such as Zoom and LTProf, although they don't actually show you the samples.
I looked at the xperf doc, trying to figure out if you could get stack samples on wall-clock time and get percentages at line-level resolution. It seems you have to be on Windows 7 or Vista. They only bother with functions, not lines, which, if you have realistically big functions, is important. I couldn't figure out how to get access to the individual samples, which I think is important for seeing why the program is spending its time.

MSG::time is later than timeGetTime

After noticing some timing discrepancies with events in my code, I boiled the problem all the way down to my Windows message loop.
Basically, unless I'm doing something strange, I'm experiencing this behaviour:
MSG message;
while (PeekMessage(&message, _applicationWindow.Handle, 0, 0, PM_REMOVE))
{
    int timestamp = timeGetTime();
    bool strange = message.time > timestamp; // strange == true!!!
    TranslateMessage(&message);
    DispatchMessage(&message);
}
The only rational conclusion I can draw is that MSG::time uses a different timing mechanism than timeGetTime() and is therefore free to produce differing results.
Is this the case, or am I missing something fundamental?
Could this be a signed unsigned issue? You are comparing a signed int (timestamp) to an unsigned DWORD (msg.time).
Also, the tick count wraps around about every 49 days - when that happens, strange could well be true.
As an aside, if you don't have a great reason to use timeGetTime, you can use GetTickCount here - it saves you bringing in winmm.
The code below shows how you should go about using times - you should never compare the times directly, because clock wrapping messes that up. Instead you should always subtract the start time from the current time and look at the interval.
// This is roughly equivalent code, however strange should never be true
// in this code. The subtraction is done in unsigned arithmetic, so it
// survives the tick-count wrap; casting the difference to a signed int
// makes a "message from the future" show up as a negative interval.
DWORD timestamp = GetTickCount();
bool strange = ((int)(timestamp - msg.time) < 0);
I don't think it's advisable to expect or rely on any particular relationship between the absolute values of timestamps returned from different sources. For one thing, the multimedia timer may have a different resolution from the system timer. For another, the multimedia timer runs in a separate thread, so you may encounter synchronisation issues. (I don't know if each CPU maintains its own independent tick count.) Furthermore, if you are running any sort of time synchronisation service, it may be making its own adjustments to your local clock and affecting the timestamps you are seeing.
Are you by any chance running an AMD dual core? There is an issue where since each core has a separate timer and can run at different speeds, the timers can diverge from each other. This can manifest itself in negative ping times, for example.
I had similar issues when measuring timeouts in different threads using GetTickCount().
Install this driver (IIRC) to resolve the issue.
MSG.time is based on GetTickCount(), and timeGetTime() uses the multimedia timer, which is completely independent of GetTickCount(). I would not be surprised to see that one timer has 'ticked' before the other.

Handle Events in wxWidgets

I'm creating a game engine using wxWidgets and OpenGL. I'm trying to set up a timer so the game can be updated regularly. I don't want to use wxTimer, because it's probably not accurate enough for what I need. Instead I'm using a while (true) loop and a wxStopWatch:
while (true) {
    stopWatch.Start();
    <handle events> // I need a function for this
    game->OnUpdate();
    game->Refresh();
    if (stopWatch.Time() < 1000 / 60)
        wxMilliSleep(1000 / 60 - stopWatch.Time());
}
What I need is a function that will handle all the wxWidgets events, because right now my app just freezes.
Instead of using a while (true) loop, I'm using EVT_IDLE, and it works perfectly.
UPDATE: It doesn't. It's slightly jerky on Windows, and when tested on a Mac, it was extremely jerky. Apparently EVT_IDLE doesn't get called consistently on Windows, and even less on a Mac.
UPDATE2: It actually mostly does. It's fine on a Mac; I misunderstood my Mac tester's reply.
"ave you requested idle events to be generated at the maximum rate? You have to call RequestMore() on the event, if you don't you will get the next idle event only after some other event has been processed. Note that constant idle processing will cause 100% CPU load on one core."
This works; I have the following code in a graphical window:
BEGIN_EVENT_TABLE(MyCanvas, wxScrolledWindow)
    EVT_PAINT  (MyCanvas::OnPaint)
    EVT_IDLE   (MyCanvas::OnIdle)
    EVT_MOTION (MyCanvas::OnMouseMove)
END_EVENT_TABLE()
The canvas needs to be updated when my_canvas->Refresh(bClearBackground) is called and not otherwise. To do this I needed to make a modification, as the program was eating up half of the CPU time (or 100% of one CPU on a dual core).
void MyCanvas::OnIdle(wxIdleEvent &event)
{
    wxPaintEvent unused;
    OnPaint(unused);
    event.RequestMore(false);
}
Setting the parameter of RequestMore() to false makes the app only ask for more idle events when it's needed, i.e. only when Refresh() has been called.
Have you requested idle events to be generated at the maximum rate? You have to call RequestMore() on the event; if you don't, you will get the next idle event only after some other event has been processed. Note that constant idle processing will cause 100% CPU load on one core.
Even if you request more idle events you can't be sure how long it will take for the next one to arrive. Therefore to get smooth animation you will need to calculate the elapsed time since the last event, and update the display accordingly.
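A minimal sketch of that elapsed-time idle handler, reusing the MyCanvas class from the answer above; the m_stopWatch and m_game members and the signature of OnUpdate() are illustrative assumptions:

void MyCanvas::OnIdle(wxIdleEvent &event)
{
    // Advance the simulation by the wall-clock time that actually passed
    // since the last idle event instead of assuming a fixed step.
    long elapsedMs = m_stopWatch.Time();
    m_stopWatch.Start();            // restart for the next interval

    m_game->OnUpdate(elapsedMs);    // hypothetical: update takes elapsed ms
    Refresh(false);                 // schedule a repaint

    event.RequestMore();            // keep the idle events coming
}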

Resources