Given that the standard number of ticks for a cycle in a WP7 app is 333,333 ticks (or it is if you set it that way), how much of that time slice do I actually have to work with?
To put it another way, how many ticks do the standard processes eat up (drawing the screen, clearing buffers, etc.)?
I worked out a process for doing something in a spike (as I often do), but it is currently eating up about 14 ms, roughly half the time slice I have available, and I am concerned about what will happen if it runs past that point.
The conventional way of doing computationally intensive things is to do them on a background thread - this means that the UI thread(s) don't block while the computations are occurring - typically the UI threads are scheduled ahead of the background threads so that the screen drawing continues smoothly even though the CPU is 100% busy. This approach allows you to queue as much work as you want to.
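As a rough illustration of that idea, here is a minimal sketch in C++ using std::thread (a WP7 app would use the equivalent C#/.NET background-thread facilities); the heavy_computation function and its simulated 14 ms of work are placeholders, not anything from your app:

#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical "expensive" work item; in a real app this would be
// pathfinding, procedural generation, etc.
static int heavy_computation(int input)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(14)); // simulate ~14 ms of work
    return input * 2;
}

int main()
{
    std::atomic<bool> done{false};
    int result = 0;

    // Run the expensive work off the "UI" thread so frame drawing is not blocked.
    std::thread worker([&] {
        result = heavy_computation(21);
        done.store(true, std::memory_order_release);
    });

    // Simplified stand-in for the UI/game loop: it keeps "drawing" every frame
    // and only consumes the result once the background work has finished.
    while (!done.load(std::memory_order_acquire)) {
        // draw_frame();  // placeholder for the per-frame work that must stay smooth
        std::this_thread::sleep_for(std::chrono::milliseconds(33)); // ~30 Hz frame
    }

    worker.join();
    // use 'result' here
    return 0;
}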
If you need to do the computational work within the UI thread - e.g. because it's part of the game mechanics or part of the "per frame" update/drawing logic - then conventionally what happens is that the game frame rate slows down a bit, because the phone is waiting on your logic before it can draw.
If your question is "what is a decent frame rate?" then that depends a bit on the type of app/game, but generally (at my age...) I think anything 30 Hz and above is OK - so up to 33 ms for each frame - and it is important that the frame rate is smooth, i.e. each frame takes about the same time.
I hope that approximately answers your question... wasn't entirely sure I understood it!
Related
I was searching for ways to improve my game loop and how to offer more performance options to the players when I found the term UPS. I know it means updates per second, but how does it affect performance? And should I worry about it?
Let's assume you have an extremely simple game with a single thread and a basic loop, like "while(running) { get_input(); update_world_state(); update_video(); }". In this case you end up with "UPS = FPS" (and no reason to track UPS separately from FPS); and if the GPU is struggling to keep up the entire game slows down (e.g. if you're getting 15 frames per second, then things that have nothing to do with graphics might take 4 times longer than they should, even when you have 8 CPUs doing nothing while waiting for GPU to finish).
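A rough sketch of that single-threaded case (the game functions are empty placeholders, update_video() pretends to block on a 60 Hz display, and the counter just makes the "UPS = FPS" point visible):

#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical stand-ins for the real game functions.
void get_input() {}
void update_world_state() {}
void update_video() { std::this_thread::sleep_for(std::chrono::milliseconds(16)); } // pretend vsync

int main()
{
    using clock = std::chrono::steady_clock;
    bool running = true;
    int frames_this_second = 0;
    int total_frames = 0;
    auto second_start = clock::now();

    while (running) {
        get_input();
        update_world_state();
        update_video();

        // One world update per rendered frame, so UPS == FPS by construction;
        // if update_video() stalls on the GPU, the world updates stall with it.
        ++frames_this_second;
        ++total_frames;
        if (clock::now() - second_start >= std::chrono::seconds(1)) {
            std::printf("UPS = FPS = %d\n", frames_this_second);
            frames_this_second = 0;
            second_start = clock::now();
        }
        running = total_frames < 300; // keep the sketch finite
    }
    return 0;
}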
For one alternative, what if you had 2 threads, where one thread does "while(running) { get_input(); update_world_state(); }" and the other thread does "while(running) { update_video(); }"? In this case there's no reason to expect UPS to have anything to do with FPS. The problem here is that most games aren't smart enough to handle variable timing, so you'd end up with something more like "while(running) { get_input(); update_world_state(); wait_until_next_update_starts(); }" to make sure that the game can't run too fast (e.g. cars that are supposed to be moving at a speed of 20 Km per hour moving at 200 Km per hour because update_world_state() is being called too often). Depending on things and stuff, you might get 60 UPS (regardless of what FPS is); but if the CPU can't keep up the game can/will still slow down and you might get 20 UPS (regardless of what FPS is). Of course there's no point updating the video if the world state hasn't changed; so you'd want the graphics loop to be more like "while(running) { wait_for_world_state_update(); update_video(); }", where wait_for_world_state_update() makes sure FPS <= UPS (and where wait_for_world_state_update() returns immediately without any delay when UPS is keeping up).
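A minimal sketch of that two-thread arrangement (C++ for illustration; the fixed 60 UPS step, the world_version counter and the condition variable are choices made for this sketch, not the only way to do it):

#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the real game functions.
void get_input() {}
void update_world_state() {}
void update_video() {}

std::atomic<bool> running{true};
std::mutex m;
std::condition_variable world_changed;
uint64_t world_version = 0;   // bumped every time the world state is updated

// Update thread: fixed 60 UPS regardless of how fast rendering is.
void update_loop()
{
    using clock = std::chrono::steady_clock;
    const auto step = std::chrono::microseconds(16667); // ~1/60 s
    auto next_update = clock::now();

    while (running) {
        get_input();
        update_world_state();
        {
            std::lock_guard<std::mutex> lock(m);
            ++world_version;
        }
        world_changed.notify_one();

        // wait_until_next_update_starts(): absolute deadlines, so timing errors don't accumulate
        next_update += step;
        std::this_thread::sleep_until(next_update);
    }
    world_changed.notify_one(); // wake the render thread so it can exit
}

// Render thread: only draws when the world has actually changed, so FPS <= UPS.
void render_loop()
{
    uint64_t last_drawn = 0;
    while (running) {
        {
            std::unique_lock<std::mutex> lock(m);
            world_changed.wait(lock, [&] { return world_version != last_drawn || !running; });
            last_drawn = world_version;
        }
        update_video();
    }
}

int main()
{
    std::thread updater(update_loop);
    std::thread renderer(render_loop);
    std::this_thread::sleep_for(std::chrono::seconds(1)); // run the sketch briefly
    running = false;
    updater.join();
    renderer.join();
    return 0;
}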
The next step beyond this is "tickless". In this case you might have one high priority thread monitoring user input and assigning timestamps to input events (e.g. "at time = 12356 the user fired their main weapon") and storing them in a list. Then you might have a second (lower priority, to avoid messing up the accuracy of the user input timestamps) thread with a main loop like "while(running) { next_frame_time = estimate_when_next_frame_will_actually_be_visible(); update_world_state_until(next_frame_time); update_video(); }", where update_world_state_until() uses a whole pile of maths to predict what the game state will be at a specific point in time (and consumes the list of stored user input events while taking their timestamps into account). In this case UPS doesn't really make any sense (you'd only care about FPS). This is also much more complicated (due to the maths involved in calculating the world state at any point in time); but the end result is like having "infinite UPS" without the overhead of updating the world state more than once per frame; and it allows you to hide any graphics latency (e.g. things seen 16.66 ms later than they should); which makes it significantly better than other options (much smoother, significantly less likely for performance problems to cause simulation speed variations, etc).
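A heavily simplified sketch of the tickless idea, assuming a single object with piecewise-constant velocity so the "whole pile of maths" collapses to one line of extrapolation; every name here (InputEvent, update_world_state_until, the 16.6 ms frame-latency estimate) is invented for the sketch:

#include <chrono>
#include <cstdio>
#include <deque>
#include <mutex>

using Clock = std::chrono::steady_clock;

// A timestamped input event recorded by a (hypothetical) high-priority input thread.
struct InputEvent {
    Clock::time_point when;
    double new_velocity;   // e.g. "player changed speed at this instant"
};

std::mutex input_mutex;
std::deque<InputEvent> pending_input;

// World state expressed as "position/velocity valid at base_time";
// this is what lets us compute the state at ANY later point in time.
struct World {
    Clock::time_point base_time = Clock::now();
    double position = 0.0;
    double velocity = 0.0;
};

// update_world_state_until(): advance the analytic state to 'target',
// consuming any input events whose timestamps fall before it.
void update_world_state_until(World& w, Clock::time_point target)
{
    std::lock_guard<std::mutex> lock(input_mutex);
    while (!pending_input.empty() && pending_input.front().when <= target) {
        const InputEvent ev = pending_input.front();
        pending_input.pop_front();
        // Integrate up to the event, then apply it at its exact timestamp.
        double dt = std::chrono::duration<double>(ev.when - w.base_time).count();
        w.position += w.velocity * dt;
        w.velocity = ev.new_velocity;
        w.base_time = ev.when;
    }
    // Finally extrapolate from the last event to the predicted frame time.
    double dt = std::chrono::duration<double>(target - w.base_time).count();
    w.position += w.velocity * dt;
    w.base_time = target;
}

int main()
{
    World world;
    world.velocity = 5.0; // units per second

    // One iteration of the render loop; estimate_when_next_frame_will_actually_be_visible()
    // is approximated here as "now + one 60 Hz frame of latency".
    Clock::time_point next_frame_time = Clock::now() + std::chrono::microseconds(16600);
    update_world_state_until(world, next_frame_time);
    std::printf("predicted position at next frame: %f\n", world.position);
    // update_video(world); ...
    return 0;
}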
I have a windowed WinAPI/OpenGL app. The scene is drawn rarely (compared to games) in WM_PAINT, mostly triggered by user input - WM_MOUSEMOVE, clicks, etc.
I noticed that when the scene has not been moved by the mouse for a while (the application is "idle") and the user then starts some mouse action, the first frame is drawn with an unpleasant delay - something like 300 ms. The following frames are fast again.
I implemented a 100 ms timer which only calls InvalidateRect, which is later followed by WM_PAINT and the scene draw. This "fixed" the problem, but I don't like the solution.
I'd like to know why this is happening, and also get some tips on how to tackle it.
Does the OpenGL render context release resources when not used? Or could this be caused by some system behaviour, like processor underclocking or energy saving? (Although I noticed that the processor runs underclocked even when the app is under "load".)
This sounds like the Windows virtual memory system at work. The sum of the memory used by all active programs is usually greater than the amount of physical memory installed on your system, so Windows swaps idle processes out to disc according to whatever rules it follows, such as the relative priority of each process and the amount of time it has been idle.
You are preventing the swap out (and delay) by artificially making the program active every 100ms.
If a swapped out process is reactivated, it takes a little time to retrieve the memory content from disc and restart the process.
It's unlikely that OpenGL is responsible for this delay.
You can improve the situation by starting your program with a higher priority.
https://superuser.com/questions/699651/start-process-in-high-priority
You can also use the VirtualLock function to prevent Windows from swapping out part of the memory, but this is not advisable unless you REALLY know what you are doing!
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366895(v=vs.85).aspx
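A minimal Win32 sketch of both suggestions - raising the process priority from inside the app and locking one performance-critical buffer with VirtualLock. The 64 KB buffer size is arbitrary; locking larger amounts may also require SetProcessWorkingSetSize, which is exactly the kind of detail that makes this "only if you really know what you are doing":

#include <windows.h>
#include <cstdio>

int main()
{
    // Raise the process priority so the scheduler favours it over background work.
    if (!SetPriorityClass(GetCurrentProcess(), ABOVE_NORMAL_PRIORITY_CLASS))
        std::printf("SetPriorityClass failed: %lu\n", GetLastError());

    // Lock a specific, performance-critical buffer into physical memory so it
    // cannot be paged out while the app is idle. Use sparingly!
    const SIZE_T size = 64 * 1024; // hypothetical 64 KB working buffer
    void* buffer = VirtualAlloc(nullptr, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (buffer && !VirtualLock(buffer, size))
        std::printf("VirtualLock failed: %lu\n", GetLastError());

    // ... use the buffer ...

    if (buffer) {
        VirtualUnlock(buffer, size);
        VirtualFree(buffer, 0, MEM_RELEASE);
    }
    return 0;
}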
EDIT: You can improve things for sure by adding more memory, and 4 GB certainly sounds low for a modern PC, especially if you run Chrome with multiple tabs open.
If you want to be scientific before spending any hard-earned cash :-), then open Performance Monitor and look at Cache Faults/Sec. This will show the swap activity on your machine. (I have 16 GB on my PC, so this number is mostly very low.) To make sure you learn something, I would check Cache Faults/Sec both before and after the memory upgrade - so you can quantify the difference!
Finally, there is nothing wrong with the solution you already found - kick-starting the graphics app every 100 ms or so.
The problem was in the NVIDIA driver's global 3D setting "Power management mode".
The options "Optimal Power" and "Adaptive" save power and cause the problem.
Only "Prefer Maximum Performance" does the right thing.
I'm a newbie game developer and I'm having an issue I'm trying to deal with. I am working on a game for Android using Java.
The thing is, I'm using deltaTime to get smooth movement and so on across devices, but I have come across a problem. At a specific moment in the game it performs a fairly expensive operation, which increases the deltaTime for the next iteration. Because of this, that next iteration lags a bit, and on old, slow devices it can be really bad.
To fix this, I have thought of a solution I would like to share with you and get a bit of feedback on what could happen with it. The algorithm is the following:
1) Every iteration, the deltaTime is added to an "average deltaTime variable" which keeps an average over all the iterations
2) If in an iteration the deltaTime is at least twice the value of the "average variable", then I reassign its value to the average
With this, the game will adapt to the actual performance of the device and will not lag on one specific iteration.
What do you think? I just made this up; I suppose other people have come across this and there is a better solution... I need tips! Thanks
There is a much simpler and more accurate method than storing averages. I don't believe your proposal will ever get you the results you want.
Take the total span of time (including fraction) since the previous frame began - this is your delta time. It is often milliseconds or seconds.
Multiply your move speed by delta time before you apply it.
This gives you frame rate independence. You will want to experiment until your speeds are correct.
Let's consider the example from my comment above:
If you have one frame that takes 1 ms, an object that moves 10 units per frame is moving at a speed of 10 units per millisecond. However, if a frame takes 10 ms, your object slows to 1 unit per millisecond.
In the first frame, we multiply the speed (10) by 1 (the delta time). This gives us a movement of 10 units for that frame.
In the second frame, our delta is 10 - the frame was ten times slower. If we multiply our speed (10) by the delta (10) we get 100 units of movement. Spread over that 10 ms frame, this is the same speed the object was moving at in the 1 ms frame.
We now have consistent movement speeds in our game, regardless of how often the screen updates.
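Here is a small self-contained sketch of the same idea (C++ rather than Java, and the speed of 10 units per millisecond is just the number from the example above):

#include <chrono>
#include <cstdio>
#include <thread>

int main()
{
    using clock = std::chrono::steady_clock;

    double position    = 0.0;   // units
    const double speed = 10.0;  // units per millisecond, as in the example above

    auto previous = clock::now();
    for (int frame = 0; frame < 5; ++frame) {
        auto now = clock::now();
        // Delta time in milliseconds, including the fractional part.
        double delta_ms = std::chrono::duration<double, std::milli>(now - previous).count();
        previous = now;

        // Scale movement by the delta, so a 10 ms frame moves the object
        // ten times further than a 1 ms frame - same speed either way.
        position += speed * delta_ms;
        std::printf("frame %d: delta = %.3f ms, position = %.1f\n", frame, delta_ms, position);

        std::this_thread::sleep_for(std::chrono::milliseconds(frame % 2 ? 10 : 1)); // vary frame length
    }
    return 0;
}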
EDIT:
In response to your comments.
A faster computer is the answer ;) There is no easy fix for framerate consistency and it can manifest itself in a variety of ways - screen tearing being the grimmest dilemma.
What are you doing in the frames with wildly inconsistent deltas? Consider optimizing that code. The following operations can really kill your framerate:
AI routines like Pathing
IO operations like disk/network access
Generation of procedural resources
Physics!
Anything else that isn't rendering code...
These will all cause the delta to increase by X, depending on the order of the algorithms and quantity of data being processed. Consider performing these long running operations in a separate thread and act on/display the results when they are ready.
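For example, something along these lines (a sketch using std::async; compute_path and its 200 ms of fake work are placeholders for whatever long-running job you actually have):

#include <chrono>
#include <cstdio>
#include <future>
#include <thread>
#include <vector>

// Hypothetical long-running job, e.g. pathfinding or procedural generation.
std::vector<int> compute_path()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(200)); // pretend it is slow
    return {1, 2, 3};
}

int main()
{
    // Kick the work off on another thread instead of doing it inside a frame.
    std::future<std::vector<int>> path = std::async(std::launch::async, compute_path);

    bool have_path = false;
    for (int frame = 0; frame < 30 && !have_path; ++frame) {
        // ...normal per-frame update/render here, so the delta stays small...

        // Poll without blocking; only act on the result once it is ready.
        if (path.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            std::vector<int> result = path.get();
            std::printf("path ready after frame %d, %zu nodes\n", frame, result.size());
            have_path = true;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(16)); // ~60 Hz frame
    }
    return 0;
}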
More edits:
What you are effectively doing in your solution is slowing everything back down to avoid the jump in on-screen position, regardless of the game rules.
Consider a shooter, where reflexes are everything and judging velocity is hugely important. What happens if a frame takes twice as long and you effectively halve the rotation speed of the player for that frame? Now the player has experienced a spike in frame time AND their cross-hair moved slower than they expected. Worse, because you are using a running average, subsequent frames will also have their movement slowed.
That seems like quite a knock-on effect for one slow frame. If you had a physics engine, that slow frame may even have a very real impact on the game world.
Final thought: the idea of delta time is to disconnect the game rules from the hardware you are running on - your solution reconnects them.
There are plenty of examples in Windows of applications triggering code at fairly high and stable framerates without spiking the CPU.
WPF/Silverlight/WinRT applications can do this, for example. So can browsers and media players. How exactly do they do this, and what API calls would I make to achieve the same effect from a Win32 application?
Clock polling doesn't work, of course, because that spikes the CPU. Neither does Sleep(), because you only get around 50ms granularity at best.
They are using multimedia timers. You can find information on MSDN here
Only the view is invalidated (e.g. with InvalidateRect) on each multimedia timer event. Drawing happens in the WM_PAINT / OnPaint handler.
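A bare-bones sketch of a periodic multimedia timer (timeSetEvent from winmm); here the callback just counts ticks, but in a real app it would only call InvalidateRect on your window handle and leave the actual drawing to WM_PAINT:

#include <windows.h>
#include <mmsystem.h>
#include <atomic>
#include <cstdio>
#pragma comment(lib, "winmm.lib")

std::atomic<long> ticks{0};

// Called by the multimedia timer on a timer thread roughly every uDelay milliseconds.
void CALLBACK TimerProc(UINT, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR)
{
    ++ticks;
    // In a real app you would only invalidate the view here, e.g.:
    //   InvalidateRect(hwnd, nullptr, FALSE);
    // and let the drawing happen in the WM_PAINT handler.
}

int main()
{
    timeBeginPeriod(1); // ask for 1 ms timer resolution while we need it

    // ~60 Hz periodic callback; the process sleeps between ticks instead of polling.
    MMRESULT timerId = timeSetEvent(16, 1, TimerProc, 0,
                                    TIME_PERIODIC | TIME_CALLBACK_FUNCTION);

    Sleep(1000); // stand-in for the application's message loop
    std::printf("callbacks in 1 second: %ld\n", ticks.load());

    timeKillEvent(timerId);
    timeEndPeriod(1);
    return 0;
}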
Actually, there's nothing wrong with sleep.
You can use a combination of QueryPerformanceCounter/QueryPerformanceFrequency to obtain very accurate timings and on average you can create a loop which ticks forward on average exactly when it's supposed to.
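A sketch of such a loop: absolute deadlines derived from QueryPerformanceCounter, a Sleep for most of the remaining budget, then a short spin to hit the deadline. The 30 Hz target and the millisecond of slack left for Sleep jitter are arbitrary choices for the sketch:

#include <windows.h>
#include <cstdio>

int main()
{
    LARGE_INTEGER freq, now;
    QueryPerformanceFrequency(&freq);          // counts per second
    QueryPerformanceCounter(&now);

    const double period_s = 1.0 / 30.0;        // 30 Hz target, i.e. ~33 ms per tick
    const LONGLONG period_counts = (LONGLONG)(period_s * freq.QuadPart);
    LONGLONG next_tick = now.QuadPart + period_counts;

    for (int tick = 0; tick < 90; ++tick) {    // run the sketch for ~3 seconds
        // ...per-tick game logic / rendering goes here...

        // Sleep away most of the remaining budget, then recheck the clock.
        QueryPerformanceCounter(&now);
        double remaining_ms = (next_tick - now.QuadPart) * 1000.0 / freq.QuadPart;
        if (remaining_ms > 2.0)
            Sleep((DWORD)(remaining_ms - 1.0)); // leave slack for Sleep jitter
        do {
            QueryPerformanceCounter(&now);
        } while (now.QuadPart < next_tick);     // short spin to hit the deadline precisely

        // Advance by an absolute amount so small errors don't accumulate (no drift).
        next_tick += period_counts;
    }
    std::printf("done\n");
    return 0;
}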
I have never seen a Sleep miss its deadline by as much as 50 ms. However, I've seen plenty of naive timers that drift, i.e. accumulate a small delay and consequently update at noticeably irregular intervals. This is what causes uneven framerates.
If you play a very short beep on every nth frame, this is very audible.
Also, logic and rendering can be run independently of each other. The CPU might not appear to be that busy, but I bet you the GPU is hard at work.
Now, about not hogging the CPU: CPU usage is just a breakdown of CPU time spent by a process during a given sample (the thread scheduler actually tracks this). If you have a target of 30 Hz for your game, you're limited to 33 ms per frame, otherwise you'll be lagging behind (too slow a CPU or too slow code). If you can't hit this target you won't be running at 30 Hz, and if you finish in under 33 ms then you can yield processor time, effectively freeing up resources.
This might be an interesting read for you as well.
On a side note, instead of yielding time you could effectively be doing prep work for future computations. Some games, when they are not under the heaviest of loads, actually do things such as sorting and memory defragmentation - a little bit here and there adds up in the end.
Recently I was doing some deep timing checks on a DirectShow application I have in Delphi 6, using the DSPACK components. As part of my diagnostics, I created a Critical Section class that adds a time-out feature to the usual Critical Section object found in most Windows programming languages. If the time duration between the first Acquire() and the last matching Release() is more than X milliseconds, an Exception is thrown.
Initially I set the time-out at 10 milliseconds. The code I have wrapped in Critical Sections is pretty fast using mostly memory moves and fills for most of the operations contained in the protected areas. Much to my surprise I got fairly frequent time-outs in seemingly random parts of the code. Sometimes it happened in a code block that iterates a buffer list and does certain quick operations in sequence, other times in tiny sections of protected code that only did a clearing of a flag between the Acquire() and Release() calls. The only pattern I noticed is that the durations found when the time-out occurred were centered on a median value of about 16 milliseconds. Obviously that's a huge amount of time for a flag to be set in the latter example of an occurrence I mentioned above.
So my questions are:
1) Is it possible for Windows thread management code, on a fairly frequent basis (about once every few seconds), to switch out an unblocked thread and not return to it for 16 milliseconds or longer?
2) If that is a reasonable scenario, what steps can I take to lessen that occurrence and should I consider elevating my thread priorities?
3) If it is not a reasonable scenario, what else should I look at or try as an analysis technique to diagnose the real problem?
Note: I am running on Windows XP on an Intel i5 Quad Core with 3 GB of memory. Also, the reason why I need to be fast in this code is due to the size of the buffer in milliseconds I have chosen in my DirectShow filter graphs. To keep latency at a minimum audio buffers in my graph are delivered every 50 milliseconds. Therefore, any operation that takes a significant percentage of that time duration is troubling.
Thread priorities determine when ready threads are run. There is, however, a starvation-prevention mechanism. There's a so-called Balance Set Manager that wakes up every second and looks for ready threads that haven't been run for about 3 or 4 seconds, and if there's one, it'll boost its priority to 15 and give it double the normal quantum. It does this for no more than 10 threads at a time (per second) and scans no more than 16 threads at each priority level at a time. At the end of the quantum, the boosted priority drops to its base value. You can find out more in the Windows Internals book(s).
So what you observe is pretty normal behavior; threads may not be run for seconds.
You may need to elevate priorities or otherwise consider other threads that are competing for the CPU time.
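If you do go the priority route, the call itself is trivial (whether ABOVE_NORMAL is the right level for your filter thread is something you would have to measure):

#include <windows.h>
#include <cstdio>

int main()
{
    // Raise the priority of the time-critical thread (here, the current thread)
    // so other ready threads are less likely to keep it off the CPU for ~16 ms.
    if (!SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL))
        std::printf("SetThreadPriority failed: %lu\n", GetLastError());

    // ...time-sensitive buffer processing would run here...
    return 0;
}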
Sounds like normal Windows behaviour with respect to timer resolution, unless you explicitly go for some of the high-precision timers. Some details are in this MSDN link.
First of all, I am not sure that Delphi's Now is a good choice for millisecond-precision measurements. The GetTickCount and QueryPerformanceCounter APIs would be a better choice.
When there is no collision on the critical section lock, everything runs pretty fast. However, if you try to enter a critical section which is currently locked by another thread, you eventually hit a wait operation on an internal kernel object (a mutex or event), which involves yielding control of the thread and waiting for the scheduler to give control back later.
The "later" above would depend on a few things, including priorities mentioned above, and there is one important things you omitted in your test - what is the overall CPU load at the time of your testing. The more is the load, the less chances to get the thread continue execution soon. 16 ms time looks perhaps a bit still within reasonable tolerance, and all in all it might depends on your actual implementation.