Metal API - GPU FPS is 0 - Xcode

I have been playing around with Signed Distance Functions and the frame rate has dropped to 30 fps, so I took a look at the debugger in Xcode:
and realized that all the processing seems to be done on the CPU, and the GPU doesn't appear to be running at all.
Almost all of my code is inside a .metal file as a compute shader; the CPU only compiles and launches the app.
What could possibly be happening here? Is there any way for me to test and inspect this issue?
I am using macOS 10.12.2 and Xcode 8.3.2.

You shouldn't pay too much attention to those gauges; they're both lying to you. The GPU meter always reports 0 utilization on some AMD GPUs, despite the fact that your SDF raymarcher is probably quite taxing on the GPU. The apparently high CPU utilization is caused by the fact that the frame time is calculated from the beginning of the frame to the end, rather than from the amount of time the CPU is actually busy (e.g., if the GPU takes 30 ms to complete the frame, that will show up as ~30 ms on the CPU, even though the CPU was mostly idle during that time). Notice that the CPU utilization is actually only ~3% on the left; this is a more accurate reflection of how little work you're doing to encode the frame.
In short: the gauges are unreliable. Your shader is expensive, and that's why your frame rate is suffering.
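To make that concrete, here is a tiny standalone C simulation (not Metal code; the function names and the 1 ms / 30 ms delays are invented purely for illustration) of how a gauge that measures wall-clock frame time ends up attributing GPU-bound waiting to the CPU:

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-ins for real work; the names and delays are invented. */
static void encode_commands(void)    { usleep(1000);  }  /* ~1 ms of CPU-side encoding */
static void wait_for_gpu_frame(void) { usleep(30000); }  /* ~30 ms waiting on the GPU  */

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

int main(void)
{
    double frame_start = now_ms();
    encode_commands();                 /* the only time the CPU is genuinely busy */
    double encode_done = now_ms();
    wait_for_gpu_frame();              /* CPU idles here while the GPU renders    */
    double frame_end = now_ms();

    printf("CPU-busy (encode) time: %.1f ms\n", encode_done - frame_start);
    printf("Wall-clock frame time (what the gauge reports): %.1f ms\n",
           frame_end - frame_start);
    return 0;
}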

Related

Unreasonable CPU consumption for server build with nographics

I have built my game in server mode on macOS and attached the profiler to it. In the profiler I can see unreasonably high CPU load.
Other scripts take a lot of CPU time. How can this be optimized?
Otherwise, VSync takes a lot of time. How can that be, since I have disabled VSync, built as a server, run with -nographics, and even removed cameras and UI?
I don't know what your game is doing in terms of calculations in custom scripts, but if it's not just an empty project, then I don't see why 33 ms is unreasonable. Check your server's specs, maybe? Also, VSync just idles for the needed amount of time to reach the fixed target FPS, meaning it's not actually under load, even though the profiler shows it as a big lump of color. Think of it more as headroom: how much processing you can still do per frame while keeping your target FPS.

GPU affects core calculation and/or RAM access (high jitter)?

I have a kthread which runs alone on one core of a multi-core CPU. This kthread disables all IRQs for that core, runs a loop as fast as possible, and measures the maximum loop duration with the help of the TSC. All the ACPI stuff is disabled (no frequency scaling, no power saving, etc.).
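For reference, a minimal sketch of the kind of measurement loop described here, written as an x86 kernel module (the names are made up, the pinned core is hard-coded as an example, and a real version has to worry about watchdog warnings and clean shutdown):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/err.h>
#include <linux/irqflags.h>
#include <asm/timex.h>          /* get_cycles(): reads the TSC on x86 */

static struct task_struct *jitter_task;

static int jitter_fn(void *data)
{
    cycles_t prev, now, delta, max_delta = 0;

    local_irq_disable();        /* no IRQs on this core while measuring */
    prev = get_cycles();

    while (!kthread_should_stop()) {
        now = get_cycles();
        delta = now - prev;
        if (delta > max_delta)
            max_delta = delta;
        prev = now;
    }

    local_irq_enable();
    pr_info("jitter: max loop duration = %llu TSC cycles\n",
            (unsigned long long)max_delta);
    return 0;
}

static int __init jitter_init(void)
{
    jitter_task = kthread_create(jitter_fn, NULL, "tsc_jitter");
    if (IS_ERR(jitter_task))
        return PTR_ERR(jitter_task);
    kthread_bind(jitter_task, 3);   /* example: pin to core 3 */
    wake_up_process(jitter_task);
    return 0;
}

static void __exit jitter_exit(void)
{
    kthread_stop(jitter_task);
}

module_init(jitter_init);
module_exit(jitter_exit);
MODULE_LICENSE("GPL");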
My problem is that the maximum loop duration apparently depends on the GPU.
When the system is used normally (a bit of office, Internet, and programming work; not really busy), the maximum loop duration is around 5 µs :-(
The same situation, but with a stressed CPU (the other three cores 100% busy), leads to a maximum loop duration of approximately 1 µs :-|
But when the GPU switches into idle mode (the screen turns off), the maximum loop duration goes down to less than 300 ns :-)
Why is that? And how can I influence this behavior? I thought the CPU and the RAM were directly connected. I noticed that, in the first situation, the maximum loop duration improves on a system with an external graphics card. For the second and third cases I couldn't see a difference. I also tested AMD and Intel systems without success - always the same :-(
I'm fine with the second case. But is it possible to achieve that without stressing the CPU additionally?
Many thanks in advance!
Billy

Why is the data transfer from GPU to CPU slow?

Today I figured out something that really made me wonder. I have the Samsung Exynos 4412 ARM9 CPU, which has a GPU400 (quad core). I tried to get a texture from the GPU to the CPU by all known methods, and it is really slow. The same scenario, with the same slow speed, also happens with modern CPUs and GPUs on the PC platform. What puzzles me is how this happens: the Samsung Exynos is an SoC, both of them share the same memory, and I shouldn't have to care about the bus.
Why does that happen?
I have tried transferring the data from the GPU to the CPU with several methods: glReadPixels, glTexSubImage2D, glTexImage2D, and an FBO.
The frame rate drops from 40 FPS to about 7 FPS while using any of those methods, on a 1024×1024 24-bit texture.
Possible answers taken from the OpenGL forums:
- Latency: it takes time for the read command to reach the hardware.
- OpenGL command buffering: reading the data requires the OpenGL driver to complete all outstanding commands.
- Hardware buffering: the hardware must empty all GPU core pipelines before doing a readback.
Possible solution:
- Copy the data internally on the GPU to another location and read it back some number of frames after computing it. This should allow everything writing to that location to have completed before you attempt to read it.
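A rough sketch of that approach using a pixel buffer object (this assumes desktop OpenGL or OpenGL ES 3.0 with a current context; a plain ES 2.0 stack generally has no PBO support, and the names below are illustrative):

/* Delayed readback via a ring of pixel buffer objects (PBOs).
 * Error handling omitted for brevity; RGBA is used to keep rows aligned. */
#include <GL/glew.h>            /* or <GLES3/gl3.h> on mobile */

#define PBO_COUNT 3             /* readback lags PBO_COUNT - 1 frames behind */
static GLuint pbo[PBO_COUNT];
static int frame = 0;

void init_readback(int width, int height)
{
    glGenBuffers(PBO_COUNT, pbo);
    for (int i = 0; i < PBO_COUNT; ++i) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
        glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, NULL,
                     GL_STREAM_READ);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

/* Call once per frame, after rendering. */
void async_readback(int width, int height, void (*consume)(const void *pixels))
{
    int write_idx = frame % PBO_COUNT;
    int read_idx  = (frame + 1) % PBO_COUNT;    /* the oldest buffer */

    /* Kick off the copy into a PBO; this returns immediately because the
     * pixels stay on the GPU side (the last argument is an offset, not a
     * client pointer). */
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[write_idx]);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);

    /* Map the buffer that was filled PBO_COUNT - 1 frames ago; by now the
     * GPU has almost certainly finished writing it, so this does not stall. */
    if (frame >= PBO_COUNT - 1) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[read_idx]);
        const void *pixels = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0,
                                              width * height * 4,
                                              GL_MAP_READ_BIT);
        if (pixels) {
            consume(pixels);
            glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        }
    }

    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    ++frame;
}

The CPU sees data that is a couple of frames old, which trades a little latency for not draining the GPU pipeline every frame.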

Win32 game loop that doesn't spike the CPU

There are plenty of examples in Windows of applications triggering code at fairly high and stable framerates without spiking the CPU.
WPF/Silverlight/WinRT applications can do this, for example. So can browsers and media players. How exactly do they do this, and what API calls would I make to achieve the same effect from a Win32 application?
Clock polling doesn't work, of course, because that spikes the CPU. Neither does Sleep(), because you only get around 50 ms granularity at best.
They are using multimedia timers. You can find information on MSDN here
Only the view is invalidated (e.g., with InvalidateRect) on each multimedia timer event. Drawing happens in the WM_PAINT / OnPaint handler.
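Roughly, that looks like this with the winmm multimedia timer API (a sketch only: it assumes a window created elsewhere whose WM_PAINT handler does the drawing; link against winmm.lib):

/* Drive redraws from a periodic multimedia timer instead of a busy loop. */
#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

static HWND     g_hwnd;
static MMRESULT g_timer;

/* Called by the system on a worker thread roughly every 16 ms. */
static void CALLBACK TimerProc(UINT id, UINT msg, DWORD_PTR user,
                               DWORD_PTR dw1, DWORD_PTR dw2)
{
    /* Only invalidate here; the actual drawing stays in WM_PAINT. */
    InvalidateRect(g_hwnd, NULL, FALSE);
}

void StartFrameTimer(HWND hwnd)
{
    g_hwnd = hwnd;
    timeBeginPeriod(1);                          /* request 1 ms timer resolution */
    g_timer = timeSetEvent(16, 1, TimerProc, 0,  /* ~60 Hz, periodic */
                           TIME_PERIODIC | TIME_CALLBACK_FUNCTION);
}

void StopFrameTimer(void)
{
    timeKillEvent(g_timer);
    timeEndPeriod(1);
}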
Actually, there's nothing wrong with Sleep.
You can use a combination of QueryPerformanceCounter/QueryPerformanceFrequency to obtain very accurate timings, and you can create a loop which, on average, ticks forward exactly when it's supposed to.
I have never seen a Sleep miss its deadline by as much as 50 ms. However, I've seen plenty of naive timers that drift, i.e. accumulate a small delay and consequently update at noticeably irregular intervals. This is what causes uneven frame rates.
If you play a very short beep on every n-th frame, this is very audible.
Also, logic and rendering can be run independently of each other. The CPU might not appear to be that busy, but I bet you the GPU is hard at work.
Now, about not hogging the CPU. CPU usage is just a breakdown of CPU time spent by a process during a given sample (the thread scheduler actually tracks this). If you have a target of 30 Hz for your game, you're limited to 33 ms per frame, otherwise you'll be lagging behind (too slow a CPU or too slow code). If you can't hit this target you won't be running at 30 Hz, and if you finish under 33 ms then you can yield processor time, effectively freeing up resources.
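As a sketch, a 30 Hz loop in that spirit, timing each frame with QueryPerformanceCounter and handing the leftover budget back to the OS with Sleep (update() and render() are hypothetical placeholders):

#include <windows.h>

void update(void);   /* placeholder for game logic      */
void render(void);   /* placeholder for frame rendering */

void run_loop(volatile int *running)
{
    LARGE_INTEGER freq, start, now;
    const double target_ms = 1000.0 / 30.0;     /* 30 Hz -> ~33.3 ms per frame */

    QueryPerformanceFrequency(&freq);

    while (*running) {
        QueryPerformanceCounter(&start);

        update();
        render();

        QueryPerformanceCounter(&now);
        double elapsed_ms = (now.QuadPart - start.QuadPart) * 1000.0
                            / (double)freq.QuadPart;

        /* Finished early: yield the rest of the 33 ms budget to the OS. */
        if (elapsed_ms < target_ms)
            Sleep((DWORD)(target_ms - elapsed_ms));
        /* else: over budget; just start the next frame immediately. */
    }
}

A more careful version schedules against an absolute next-frame deadline (and calls timeBeginPeriod(1) for ~1 ms Sleep resolution) instead of sleeping per-frame remainders, which is what avoids the slow drift mentioned above.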
This might be an interesting read for you as well.
On a side note, instead of yielding time you could effectively be doing prep work for future computations. Some games, when they are not under the heaviest of loads, actually do things like sorting and memory defragmentation; a little bit here and there adds up in the end.

Windows Phone 7 Frame Rate Performance

Reading Jeff Wilcox on frame rate counters, I realized my application rarely hits 60 fps. I'm not satisfied with the overall performance of my app (compared to its iPhone counterpart), but the numbers seem weird to me.
When the app is doing nothing, even just after launch, it's sometimes at 0 fps. And the highest I hit is 50 fps.
Overall, my application is not blazing fast, but not really slow either. So how should I interpret the numbers? How can I spot the issue that gives my app a bad frame rate?
A low frame rate doesn't necessarily indicate poor performance.
If you're testing on an actual device and you see poor performance, then an investigation may uncover an underlying problem that also happens to impact the frame rate.
Don't worry too much about getting a high frame rate all the time. Focus on actual performance experienced by the user.
If the actual performance is poor and the frame rate is low, that's when you should worry about the frame rate.
What's important is testing on an actual device and what performance is like there.
Jeff Wilcox notes in his post that:
Frame rate counters may be 0 when there is no animation being updated on the thread at any particular moment. You can add a very simple, continually animating and repeating, animation to your application during development & testing if you want to ensure that there is always some frame rate value available.
So the 0 fps reading does not seem to be an issue, since no screen updates need to be rendered.
