Lag with deltaTime - performance

I'm a newbie game developer and I'm having an issue I'm trying to deal with. I am working on a game for Android using Java.
I'm using deltaTime to get smooth movement and so on across devices, but I've run into a problem. At one specific point, the game performs a quite expensive operation, which inflates the deltaTime for the next iteration. Because of that, the next iteration lags a bit, and on old, slow devices it can be really bad.
To fix this, I have thought of a solution I would like to share with you, and I'd like a bit of feedback on what could happen with it. The algorithm is the following:
1) Every iteration, the deltaTime is added to an "average deltaTime" variable which keeps an average over all iterations
2) If in an iteration the deltaTime is at least twice the value of the average variable, I reassign its value to the average
With this, the game adapts to the actual performance of the device and doesn't lag on one particular iteration.
What do you think? I just made this up, so I suppose other people have come across it and there is a better solution out there... I need tips! Thanks
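For reference, here is a minimal Java sketch of the clamping idea described above, assuming deltaTime is measured in seconds each frame (all names are illustrative):

    // Running-average clamp: spiking frames are pulled down to the average.
    class DeltaSmoother {
        private float averageDelta = 1f / 60f; // seeded with a plausible frame time
        private long frames = 0;

        float smooth(float deltaTime) {
            // If this frame took at least twice the average, clamp it
            if (frames > 0 && deltaTime >= 2f * averageDelta) {
                deltaTime = averageDelta;
            }
            // Incrementally update the running average
            frames++;
            averageDelta += (deltaTime - averageDelta) / frames;
            return deltaTime;
        }
    }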

There is a much simpler and more accurate method than storing averages. I don't believe your proposal will ever get you the results that you want.
Take the total span of time (including fractions) since the previous frame began - this is your delta time. It is usually expressed in milliseconds or seconds.
Multiply your move speed by the delta time before you apply it.
This gives you frame rate independence. You will want to experiment until your speeds are correct.
Let's consider the example from my comment above:
If you have one frame that takes 1 ms, an object that moves 10 units per frame is moving at a speed of 10 units per millisecond. However, if a frame takes 10 ms, your object slows to 1 unit per millisecond.
In the first frame, we multiply the speed (10) by 1 (the delta time), so the object moves 10 units.
In the second frame, our delta is 10 - the frame was ten times slower. If we multiply our speed (10) by the delta (10) we get 100 units. Covering 100 units in that 10 ms frame is the same speed the object was moving at in the 1 ms frame.
We now have consistent movement speeds in our game, regardless of how often the screen updates.
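In code, this boils down to one multiplication per update. A minimal Java sketch, assuming deltaTime is given in seconds and speed is defined in units per second (the names are illustrative):

    // Frame-rate independent movement: position advances by speed * elapsed time.
    class Player {
        float x = 0f;
        float speedUnitsPerSecond = 10f;

        void update(float deltaTimeSeconds) {
            // A frame ten times longer moves the object ten times farther,
            // so the on-screen speed stays the same regardless of frame rate.
            x += speedUnitsPerSecond * deltaTimeSeconds;
        }
    }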
EDIT:
In response to your comments.
A faster computer is the answer ;) There is no easy fix for framerate consistency and it can manifest itself in a variety of ways - screen tearing being the grimmest dilemma.
What are you doing in the frames with wildly inconsistent deltas? Consider optimizing that code. The following operations can really kill your framerate:
AI routines like pathfinding
IO operations like disk/network access
Generation of procedural resources
Physics!
Anything else that isn't rendering code...
These will all inflate the delta by some amount, depending on the complexity of the algorithms and the quantity of data being processed. Consider performing these long-running operations in a separate thread and acting on/displaying the results when they are ready.
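A rough sketch of that last suggestion in plain Java, using an ExecutorService: submit the expensive job once, keep updating and rendering every frame, and only consume the result when it is ready (computePath and applyPath are placeholders, not from any particular engine):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    class PathfindingOffload {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private Future<int[]> pendingPath; // result of the expensive job, if any

        void requestPath(int fromX, int fromY, int toX, int toY) {
            // Kick the expensive operation off the game loop thread
            pendingPath = worker.submit(() -> computePath(fromX, fromY, toX, toY));
        }

        void update() {
            // Called once per frame; never blocks the game loop
            if (pendingPath != null && pendingPath.isDone()) {
                try {
                    applyPath(pendingPath.get()); // already done, returns immediately
                } catch (Exception e) {
                    e.printStackTrace();
                }
                pendingPath = null;
            }
        }

        private int[] computePath(int fx, int fy, int tx, int ty) {
            // Stand-in for a long-running job: pathfinding, IO, generation...
            return new int[] { fx, fy, tx, ty };
        }

        private void applyPath(int[] path) {
            // Act on / display the result now that it is ready
        }
    }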
More edits:
What you are effectively doing in your solution is slowing everything back down to avoid the jump in on-screen position, regardless of the game rules.
Consider a shooter, where reflexes are everything and estimating velocity is hugely important. What happens when a frame takes twice as long and you clamp the delta - effectively halving the rotation applied to the player for that frame? Now the player has experienced a frame time spike AND their cross-hair moved slower than they expected. Worse, because you are using a running average, subsequent frames will have their movement affected too.
This seems like quite a knock-on effect for one slow frame. If you had a physics engine, that slow frame may even have a very real impact on the game world.
Final thought: the idea of delta time is to disconnect the game rules from the hardware you are running on - your solution reconnects them.

Related

What is the difference between FPS and UPS and should I keep track of UPS on my game loop?

I was searching for ways to improve my game loop and how to offer more performance options to players when I found the term UPS. I know it means updates per second, but how does it affect performance? And should I worry about it?
Let's assume you have an extremely simple game with a single thread and a basic loop, like "while(running) { get_input(); update_world_state(); update_video(); }". In this case you end up with "UPS = FPS" (and no reason to track UPS separately from FPS); and if the GPU is struggling to keep up, the entire game slows down (e.g. if you're getting 15 frames per second, then things that have nothing to do with graphics might take 4 times longer than they should compared to running at 60 FPS, even when you have 8 CPUs doing nothing while waiting for the GPU to finish).
For one alternative, what if you had 2 threads, where one thread does "while(running) { get_input(); update_world_state(); }" and the other thread does "while(running) { update_video(); }"? In this case there's no reason to expect UPS to have anything to do with FPS. The problem here is that most games aren't smart enough to handle variable timing, so you'd end up with something more like "while(running) { get_input(); update_world_state(); wait_until_next_update_starts(); }" to make sure that the game can't run too fast (e.g. cars that are supposed to be moving at a speed of 20 Km per hour moving at 200 Km per hour because update_world_state() is being called too often). Depending on things and stuff, you might get 60 UPS (regardless of what FPS is); but if the CPU can't keep up the game can/will still slow down and you might get 20 UPS (regardless of what FPS is). Of course there's no point updating the video if the world state hasn't changed; so you'd want the graphics loop to be more like "while(running) { wait_for_world_state_update(); update_video(); }", where wait_for_world_state_update() makes sure FPS <= UPS (and where wait_for_world_state_update() returns immediately without any delay when UPS is keeping up).
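A single-threaded variant of the same idea - a fixed update rate with rendering free to run at whatever rate it manages - might look like this sketch (updateWorldState() and render() are placeholders):

    // Fixed-timestep loop: the world updates at a constant 60 UPS, rendering
    // runs as often as it can, and the two rates are counted separately.
    class GameLoop {
        static final double NS_PER_UPDATE = 1_000_000_000.0 / 60.0;

        volatile boolean running = true;

        void run() {
            long previous = System.nanoTime();
            long secondTimer = previous;
            double lag = 0.0;
            int ups = 0, fps = 0;

            while (running) {
                long now = System.nanoTime();
                lag += now - previous;
                previous = now;

                // Catch up on world updates; each advances by one fixed step
                while (lag >= NS_PER_UPDATE) {
                    updateWorldState();
                    ups++;
                    lag -= NS_PER_UPDATE;
                }

                render();
                fps++;

                if (now - secondTimer >= 1_000_000_000L) {
                    System.out.println("UPS: " + ups + "  FPS: " + fps);
                    ups = 0;
                    fps = 0;
                    secondTimer = now;
                }
            }
        }

        void updateWorldState() { /* game rules go here */ }
        void render() { /* drawing goes here */ }
    }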
The next step beyond this is "tickless". In this case you might have one high priority thread monitoring user input and assigning timestamps to input events (e.g. "at time = 12356 the user fired their main weapon") and storing them in a list. Then you might have a second (lower priority, to avoid messing up the accuracy of the user input timestamps) thread with a main loop like "while(running) { next_frame_time = estimate_when_next_frame_will_actually_be_visible(); update_world_state_until(next_frame_time); update_video(); }", where update_world_state_until() uses a whole pile of maths to predict what the game state will be at a specific point in time (and consumes the list of stored user input events while taking their timestamps into account). In this case UPS doesn't really make any sense (you'd only care about FPS). This is also much more complicated (due to the maths involved in calculating the world state at any point in time); but the end result is like having "infinite UPS" without the overhead of updating the world state more than once per frame; and it allows you to hide any graphics latency (e.g. things seen 16.66 ms later than they should be); which makes it significantly better than other options (much smoother, significantly less likely for performance problems to cause simulation speed variations, etc).

Instantiating multiple objects in one frame or one object per frame?

The idea is simple: 'instantiating a map' in Awake with random values.
But the question is:
Should I instantiate the whole map in one frame (using a loop)?
Or is it better to instantiate one object per frame?
Because I don't want to ruin the player's RAM by instantiating 300 gameobjects in less than a second.
Whether you instantiate all gameobjects in one frame or not, they will always end up in RAM the same way. The only way to "ruin" someone's RAM would be to instantiate so many gameobjects that there is no memory left. Considering that a typical prefab in Unity is only a few KB in size and a typical amount of RAM nowadays is a few GB, that would take roughly a million gameobjects.
Never ever make things depend on frames, never!!
There are some exceptions where this can be good, but most of the time it's not.
Good case:
- Incremental garbage collection (still has drawbacks)
Bad case:
- Your case; loading a map should happen at the beginning
Why should I not make my game frame dependent?
Because PCs have different computational speeds. A good example was Harry Potter II: the game was developed for machines capable of 30 frames per second, and modern machines can run it extremely fast, so the game is basically sped up - you have to manually throttle the CPU to make it playable.
Another example is Unity's deltaTime; the reason you use it when moving objects over multiple frames is that it takes the previous frame's computation time into account.
Also, 300 objects is nothing when loading a game. And from a player's point of view:
What is better: 10 seconds of loading, or 30 seconds at 15 FPS and then normal speed?
(the above example is exaggerated, though)
When loading a map you can do it asynchronously at the start of entering the scene. This way you can show a loading screen during the loading time. This is a good way to do it if you are making a single-player game. If it's a multiplayer game you need to sync it on the server for every other player as well. The method for loading a scene asynchronously is SceneManager.LoadSceneAsync().
If you're trying to instantiate objects during runtime because you want to randomize certain objects, I would recommend loading every object that doesn't need randomizing by placing it in the scene beforehand (so dropping them in the scene).
This is how I interpreted your question; tell me if I am wrong.
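Outside Unity, the same "load in the background, show a loading screen until it finishes" pattern can be sketched in plain Java roughly like this (SceneManager.LoadSceneAsync() is the Unity equivalent; the class and method names below are made up for illustration):

    // Load the map on a background thread while the render loop shows a
    // loading screen; switch over once the flag flips.
    class MapLoader {
        private volatile boolean loaded = false;
        private volatile GameMap map;

        void startLoading() {
            Thread loader = new Thread(() -> {
                map = generateMap(300); // build all 300 objects with random values
                loaded = true;
            }, "map-loader");
            loader.setDaemon(true);
            loader.start();
        }

        void renderFrame() {
            if (!loaded) {
                drawLoadingScreen();
            } else {
                drawMap(map);
            }
        }

        private GameMap generateMap(int objectCount) { return new GameMap(objectCount); }
        private void drawLoadingScreen() { /* ... */ }
        private void drawMap(GameMap m) { /* ... */ }

        static class GameMap {
            GameMap(int objectCount) { /* placeholder map data */ }
        }
    }

Note that in Unity itself GameObjects can only be created on the main thread, so there the engine's async APIs (or spreading the work over frames with a coroutine) do this job instead.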

Is it possible to find hotspots in a parallel application using a sampling profiler?

As far as I understand, a sampling profiler works as follows: it interrupts the program execution at regular intervals and reads out the call stack. It notes which part of the program is currently executing and increments a counter that represents this part of the program. In a post-processing step, for each function of the program, the fraction of the whole execution time that the function is responsible for is computed. This is done by looking at the counter C for this specific function and the total number of samples N:
ratio of the function = C / N
Finding the hotspots is then easy, as these are the parts of the program with a high ratio.
But how can this be done for a parallel program running on parallel hardware? As far as I know, when the program execution is interrupted, the currently executing parts of the program on ALL processors are determined. Because of that, a function which is executed in parallel gets counted multiple times. Thus the number of samples C for this function can no longer be used to compute its share of the whole execution time.
Is my thinking correct? Are there other ways the hotspots of a parallel program can be identified - or is this just not possible with sampling?
You're on the right track.
Whether you need to sample all the threads depends on whether they are doing the same thing or different things.
It is not essential to sample them all at the same time.
You need to look at the threads that are actually working, not just idling.
Some points:
Sampling should be on wall-clock time, not CPU time, unless you want to be blind to needless I/O and other blocking calls.
You're not just interested in which functions are on the stack, but which lines of code, because they convey the purpose of the time being spent. It is more useful to look for a "hot purpose" than a "hot spot".
The cost of a function or line of code is just the fraction of samples it appears on. To appreciate that, suppose samples are taken every 10ms for a total of N samples. If the function or line of code could be made to disappear, then all the samples in which it is on the stack would also disappear, reducing N by that fraction. That's what speedup is.
In spite of the last point, in sampling, quality beats quantity. When the goal is to understand what opportunities you have for speedup, you get farther faster by manually scrutinizing 10-20 samples to understand the full reason why each moment in time is being spent. That's why I take samples manually. Knowing the amount of time with statistical precision is really far less important.
I can't emphasize enough the importance of finding and fixing more than one problem. Speed problems come in severals, and each one you fix has a multiplier effect on those done already. The ones you don't find end up being the limiting factor.
Programs that involve a lot of asynchronous inter-thread message-passing are more difficult, because it becomes harder to discern the full reason why a moment in time is being spent.
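To make the sampling idea concrete for a multi-threaded Java program, here is a toy sketch that walks every live thread's stack on a wall-clock schedule and counts how often each top frame shows up; a frame's share of the recorded samples approximates its share of total thread time (this is an illustration, not a substitute for a real profiler):

    import java.util.HashMap;
    import java.util.Map;

    // Toy wall-clock sampler: every 10 ms, record the top stack frame of each
    // live thread and tally it.
    class StackSampler extends Thread {
        private final Map<String, Integer> counts = new HashMap<>();
        private int totalSamples = 0;
        volatile boolean running = true;

        @Override
        public void run() {
            while (running) {
                for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                    Thread t = e.getKey();
                    StackTraceElement[] stack = e.getValue();
                    // Skip the sampler itself; a real tool would also skip threads
                    // that are merely idling in a pool, while keeping threads that
                    // are blocked inside I/O that is part of the actual work.
                    if (t != this && stack.length > 0) {
                        String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
                        counts.merge(frame, 1, Integer::sum);
                        totalSamples++;
                    }
                }
                try { Thread.sleep(10); } catch (InterruptedException ie) { return; }
            }
        }

        void report() {
            counts.forEach((frame, c) ->
                System.out.printf("%-60s %5.1f%%%n", frame, 100.0 * c / totalSamples));
        }
    }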

Win32 game loop that doesn't spike the CPU

There are plenty of examples in Windows of applications triggering code at fairly high and stable framerates without spiking the CPU.
WPF/Silverlight/WinRT applications can do this, for example. So can browsers and media players. How exactly do they do this, and what API calls would I make to achieve the same effect from a Win32 application?
Clock polling doesn't work, of course, because that spikes the CPU. Neither does Sleep(), because you only get around 50ms granularity at best.
They are using multimedia timers. You can find information on MSDN.
Only the view is invalidated (e.g. with InvalidateRect) on each multimedia timer event. Drawing happens in the WM_PAINT / OnPaint handler.
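The analogous pattern in plain Java would be a UI timer that only requests a repaint, with the actual drawing done in the paint callback (the rough equivalent of WM_PAINT). A small Swing sketch, just to illustrate the structure:

    import javax.swing.JPanel;
    import javax.swing.Timer;
    import java.awt.Graphics;

    // The timer event only advances the state and invalidates the view;
    // painting happens later in paintComponent, and no thread busy-waits.
    class GameView extends JPanel {
        GameView() {
            Timer timer = new Timer(16, e -> {   // roughly 60 Hz
                updateGameState();               // advance the simulation
                repaint();                       // request a repaint
            });
            timer.start();
        }

        @Override
        protected void paintComponent(Graphics g) {
            super.paintComponent(g);
            // draw the current game state here
        }

        private void updateGameState() { /* ... */ }
    }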
Actually, there's nothing wrong with sleep.
You can use a combination of QueryPerformanceCounter/QueryPerformanceFrequency to obtain very accurate timings, and with them you can create a loop which, on average, ticks forward exactly when it's supposed to.
I have never seen a sleep miss its deadline by as much as 50 ms. However, I've seen plenty of naive timers that drift, i.e. accumulate a small delay and consequently update at noticeably irregular intervals. This is what causes uneven framerates.
If you play a very short beep on every nth frame, this is very audible.
Also, logic and rendering can be run independently of each other. The CPU might not appear to be that busy, but I bet you the GPU is hard at work.
Now, about not hogging the CPU. CPU usage is just a breakdown of the CPU time spent by a process during a given sample period (the thread scheduler actually tracks this). If you have a target of 30 Hz for your game, you're limited to 33 ms per frame, otherwise you'll be lagging behind (too slow a CPU or too slow code). If you can't hit this target you won't be running at 30 Hz, and if you finish in under 33 ms you can yield processor time, effectively freeing up resources.
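The same "accurate clock plus sleep" approach carries over to other platforms. A rough Java sketch, with System.nanoTime() standing in for QueryPerformanceCounter, that targets 30 Hz and yields whatever is left of each 33 ms slice (scheduling from an absolute deadline rather than "now + 33 ms" is what keeps it from drifting):

    // Frame pacing: do the frame's work, then sleep away the unused part of
    // the 33 ms budget instead of spinning on the clock.
    class PacedLoop {
        static final long FRAME_NANOS = 1_000_000_000L / 30; // 30 Hz target

        volatile boolean running = true;

        void run() throws InterruptedException {
            long nextFrame = System.nanoTime();
            while (running) {
                updateAndRender();

                nextFrame += FRAME_NANOS; // absolute deadline, avoids drift
                long remaining = nextFrame - System.nanoTime();
                if (remaining > 0) {
                    // Yield the spare time back to the scheduler
                    Thread.sleep(remaining / 1_000_000L, (int) (remaining % 1_000_000L));
                } else {
                    // Missed the budget; resynchronize instead of spiraling
                    nextFrame = System.nanoTime();
                }
            }
        }

        void updateAndRender() { /* frame work goes here */ }
    }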
On a side note, instead of yielding time you could effectively be doing prep work for future computations. Some games, when they are not under the heaviest of loads, actually do things such as sorting and memory defragmentation - a little bit here and there adds up in the end.

How much time is too much?

Given that the standard number of ticks for a cycle in a WP7 app is 333,333 ticks (or it is if you set it as such - at 100 ns per tick that is roughly 33 ms, i.e. 30 frames per second), how much of this time slice does someone have to work with?
To put it another way, how many ticks do the standard processes eat up (drawing the screen, clearing buffers, etc.)?
I worked out a process for doing something in a spike (as I often do), but it is eating up about 14 ms right now (about half the time slice I have available) and I am concerned about what will happen if it runs past that point.
The conventional way of doing computationally intensive things is to do them on a background thread - this means that the UI thread(s) don't block while the computations are running. The UI threads are typically scheduled ahead of the background threads, so screen drawing continues smoothly even when the CPU is 100% busy. This approach lets you queue as much work as you want.
If you need to do the computational work within the UI thread - e.g. because it's part of the game mechanics or part of the per-frame update/drawing logic - then conventionally the game frame rate slows down a bit, because the phone is waiting on your logic before it can draw.
If your question is "what is a decent frame rate?", then that depends a bit on the type of app/game, but generally (at my age...) I think anything 30 Hz and above is OK - so up to 33 ms for each frame - and it is important that the frame rate is smooth, i.e. each frame takes about the same time.
I hope that approximately answers your question... wasn't entirely sure I understood it!
