How to request use of the integrated GPU when using the Metal API on macOS?

According to Apple documentation, adding the value "YES" (or true) for the key "NSSupportsAutomaticGraphicsSwitching" to an app's Info.plist on macOS allows the integrated GPU to be used on dual-GPU systems (as opposed to forcing the discrete GPU). This is useful because the integrated GPU, while less performant, is adequate for my app's needs and consumes less energy.
Unfortunately, building as per above and subsequently inspecting the Activity Monitor (Energy tab: "Requires High Perf GPU" column) reveals that my Metal API-enabled app still uses the discrete GPU, despite requesting the integrated GPU.
Is there any way I can give a hint to the Metal system itself to use the integrated GPU?

The problem was that the Metal API defaults to the discrete GPU. Using the following code, along with the Info.plist configuration described above, results in the integrated GPU being used:
NSArray<id<MTLDevice>> *devices = MTLCopyAllDevices();
gpu_ = nil;

// A low-power (integrated) device is sufficient - try to use it.
for (id<MTLDevice> device in devices) {
    if (device.isLowPower) {
        gpu_ = device;
        break;
    }
}

// Fallback: probably not necessary, since there is always an
// integrated GPU on these systems, but it doesn't hurt.
if (gpu_ == nil) {
    gpu_ = MTLCreateSystemDefaultDevice();
}
If you're using an MTKView, remember to pass gpu_ to its initWithFrame:device: initializer.
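For reference, the Info.plist entry described in the question is a boolean key; in the XML form of the plist it looks like this:
<key>NSSupportsAutomaticGraphicsSwitching</key>
<true/>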

Related

Best Practice for testing Managed Buffers?

On Macs with discrete graphics cards, managed buffers should be used instead of shared buffers; however, they come with the additional requirement of keeping the CPU and GPU copies in sync via -[MTLBuffer didModifyRange:].
However, on Apple Silicon, if I force the use of managed buffers by pretending [MTLDevice hasUnifiedMemory] returns NO, and remove the calls to didModifyRange:, the rendering still works just fine.
What's the best way to test Managed buffers on Apple Silicon where the GPU memory is unified so that I can be sure my code will work on older Macs?
The best practice for testing hardware compatibility is to test on the actual hardware you are targeting. If you plan on supporting discrete GPUs, which are substantially different from Apple Silicon, it would be best to have access to one for testing.
You might approximate the behavior, but remember that it is only an emulation, and there is no way to ensure the actual hardware will behave the same.
It would be akin to developing with the Simulator only, which is not at all a good practice.
UPDATE: There are numerous services that rent access to bare-metal Macs. The MacInCloud service allows you to configure a machine with an external GPU (such as an AMD RX 580). It is only $0.99 for the first 24 hours.
There are many similar services out there, but that is the first one where I was able to verify that a discrete GPU is an option.
In my experience, there is no single best practice for testing code when it comes to rendering APIs, since many different factors are involved: GPU and CPU vendors (Apple, AMD, Intel), operating systems, and drivers.
I agree with Jeshua: the best practice for testing hardware compatibility is to test on the actual hardware you are targeting.
That said, there are several things that can make development and testing easier.
You can detect the GPU vendor from the device name:
id<MTLDevice> device = MTLCreateSystemDefaultDevice();
BOOL appleGPU  = [device.name containsString:@"Apple"];
BOOL intelGPU  = [device.name containsString:@"Intel"];
BOOL amdGPU    = [device.name containsString:@"AMD"];
BOOL nvidiaGPU = [device.name containsString:@"Nvidia"];
The following properties tell you what type of GPU you have:
bool externalGPU   = [device isRemovable];
bool integratedGPU = [device isLowPower];
bool discreteGPU   = ![device isLowPower];
Note: on a Mac with an Apple silicon M1 chip, isLowPower is NO, because the GPU runs with both high performance and low power.
Determine whether the GPU uses a TBDR architecture:
if (@available(macOS 10.15, *)) {
    if ([device supportsFamily:MTLGPUFamilyApple4]) {
        // The GPU supports the tile-based deferred rendering (TBDR) technique.
    }
}
Understand managed mode:
In a unified memory model, a resource with MTLStorageModeManaged resides in system memory accessible to both the CPU and the GPU; it behaves like MTLStorageModeShared and there is only one copy of its contents.
Note: in a unified memory model, Metal may ignore synchronization calls completely, because it only creates a single memory allocation for the resource.
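To keep code correct on both kinds of machines, a common pattern (a minimal Swift sketch, not from the original answer; the helper names are made up) is to pick the storage mode from hasUnifiedMemory and only call didModifyRange(_:) when the buffer is actually managed:
import Metal

// Pick managed storage only on discrete-GPU Macs; use shared storage on unified-memory machines.
func makeBuffer(device: MTLDevice, bytes: UnsafeRawPointer, length: Int) -> MTLBuffer? {
    let options: MTLResourceOptions = device.hasUnifiedMemory ? .storageModeShared
                                                              : .storageModeManaged
    return device.makeBuffer(bytes: bytes, length: length, options: options)
}

// After a CPU-side write, notify Metal only if the buffer is actually managed.
func markDirty(_ buffer: MTLBuffer, range: Range<Int>) {
    if buffer.storageMode == .managed {
        buffer.didModifyRange(range)
    }
}
Note that with this pattern the managed branch never runs on Apple Silicon, which is exactly why the answers above recommend also testing on a machine with a discrete GPU.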
You can also check some implementations by other developers:
PixarAnimationStudios/USD:
HgiMetalCapabilities::HgiMetalCapabilities(id<MTLDevice> device)
{
    if (@available(macOS 10.14.5, ios 12.0, *)) {
        _SetFlag(HgiDeviceCapabilitiesBitsConcurrentDispatch, true);
    }

    defaultStorageMode = MTLResourceStorageModeShared;
    bool unifiedMemory = false;
    if (@available(macOS 100.100, ios 12.0, *)) {
        unifiedMemory = true;
    } else if (@available(macOS 10.15, ios 13.0, *)) {
#if defined(ARCH_OS_IOS) || (defined(__MAC_10_15) && __MAC_OS_X_VERSION_MAX_ALLOWED >= __MAC_10_15)
        unifiedMemory = [device hasUnifiedMemory];
#else
        unifiedMemory = [device isLowPower];
#endif
    }

    _SetFlag(HgiDeviceCapabilitiesBitsUnifiedMemory, unifiedMemory);

#if defined(ARCH_OS_MACOS)
    if (!unifiedMemory) {
        defaultStorageMode = MTLResourceStorageModeManaged;
    }
#endif
}
KhronosGroup/MoltenVK:
// Metal Managed:
// - applies to both buffers and textures
// - default mode for textures on macOS
// - two copies of each buffer or texture when discrete memory available
// - convenience of shared mode, performance of private mode
// - on unified systems behaves like shared memory and has only one copy of content
// - when writing, use:
// - buffer didModifyRange:
// - texture replaceRegion:
// - when reading, use:
// - encoder synchronizeResource: followed by
// - cmdbuff waitUntilCompleted (or completion handler)
// - buffer/texture getBytes:
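As a concrete illustration of the read path described in those notes, a CPU read-back of a managed buffer on macOS could look roughly like this (a Swift sketch, not taken from MoltenVK):
import Foundation
import Metal

// Synchronize a managed buffer back to CPU-visible memory before reading it.
// For shared (unified-memory) buffers the blit step is simply skipped.
func readBack(_ buffer: MTLBuffer, using queue: MTLCommandQueue) -> Data {
    if buffer.storageMode == .managed,
       let commandBuffer = queue.makeCommandBuffer(),
       let blit = commandBuffer.makeBlitCommandEncoder() {
        blit.synchronize(resource: buffer)   // encoder synchronizeResource:
        blit.endEncoding()
        commandBuffer.commit()
        commandBuffer.waitUntilCompleted()   // wait for the synchronization to finish
    }
    return Data(bytes: buffer.contents(), count: buffer.length)
}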

What is the correct usage of FrameTiming and FrameTimingManager

I'm trying to log the time the GPU takes to render a frame. To do this I found that Unity provides a FrameTiming struct and a FrameTimingManager class.
The FrameTiming struct has a gpuFrameTime property, which sounds like exactly what I need; however, the value is never set, and the documentation on it doesn't provide much help either:
public double gpuFrameTime;
Description: The GPU time for a given frame, in ms.
Looking further I found the FrameTimingManager class, which contains a static GetGpuTimerFrequency() method whose documentation is not much more helpful, stating only:
Returns: ulong. GPU timer frequency for the current platform.
Description: This returns the frequency of the GPU timer on the current platform, used to interpret timing results. If the platform does not support returning this value it will return 0.
Calling this method in an update loop only ever yields 0 (both on Windows 10 running Unity 2019.3 and on an Android phone running Android 10).
private void OnEnable()
{
    frameTiming = new FrameTiming();
}

private void Update()
{
    FrameTimingManager.CaptureFrameTimings();

    var result = FrameTimingManager.GetGpuTimerFrequency();
    Debug.LogFormat("result: {0}", result); // logs 0

    var gpuFrameTime = frameTiming.gpuFrameTime;
    Debug.LogFormat("gpuFrameTime: {0}", gpuFrameTime); // logs 0
}
So what's the deal here: am I using the FrameTimingManager incorrectly, or are Windows and Android not supported? (Unity mentions in the docs that not all platforms are supported, but nowhere do they give a list of supported devices...)
While grabbing documentation links for the question I stumbled across some forum posts that shed light on the issue, so leaving it here for future reference.
The FrameTimingManager turned out not to be supported on my setups: on Windows it requires DirectX 12, and on Android only Vulkan devices are supported. As explained by jwtan_Unity on the forums here (emphasis mine):
FrameTimingManager was introduced to support Dynamic Resolution. Thus, it is only supported on platforms that support Dynamic Resolution. These platforms are currently Xbox One, PS4, Nintendo Switch, iOS, macOS and tvOS (Metal only), Android (Vulkan only), Windows Standalone and UWP (DirectX 12 only).
Now, to be able to use FrameTimingManager.GetGpuTimerFrequency(), we first need to take a snapshot of the current timings using FrameTimingManager.CaptureFrameTimings (this needs to be done every frame). From the docs:
This function triggers the FrameTimingManager to capture a snapshot of FrameTiming's data, that can then be accessed by the user.
The FrameTimingManager tries to capture as many frames as the platform allows but will only capture complete timings from finished and valid frames so the number of frames it captures may vary. This will also capture platform specific extended frame timing data if the platform supports more in depth data specifically available to it.
As explained by Timothyh_Unity on the forums here:
CaptureFrameTimings() - This should be called once per frame (presuming you want timing data that frame). Basically this function captures a user-facing collection of timing data.
So the complete code to get the GPU timer frequency (on a supported device) would be:
private void Update()
{
    FrameTimingManager.CaptureFrameTimings();

    var result = FrameTimingManager.GetGpuTimerFrequency();
    Debug.LogFormat("result: {0}", result);
}
Note that all FrameTimingManager methods are static, so you do not need to instantiate a manager first.
Why none of this is properly documented by Unity beats me...

How to pick the right Metal Device for GPU processing on a Mac Pro

When creating a new CIContext backed by a Metal device, one has to specify which device (i.e. which GPU) to use:
let context = CIContext(
    mtlDevice: device
)
On my MacBook Pro, for development purposes, I always pick the device associated with the screen using the MTLCreateSystemDefaultDevice() method:
guard let device: MTLDevice = MTLCreateSystemDefaultDevice() else {
    exit(EXIT_FAILURE)
}
However, on a Mac Pro that will be used in production in headless mode, there are two GPU cards I can target. To get all available devices one can use the MTLCopyAllDevices() method, which gives the following output on my Mac Pro:
[
    <MTLDebugDevice: 0x103305450> -> <BronzeMtlDevice: 0x10480a200>
        name = AMD Radeon HD - FirePro D700
    <MTLDebugDevice: 0x103307730> -> <BronzeMtlDevice: 0x104814800>
        name = AMD Radeon HD - FirePro D700
]
This Mac Pro will be utilised heavily, with hundreds of small tasks per second, and every time a new task comes in I need to select the GPU device on which it will be processed.
Now the question is: is picking a random device from the above array a good idea?
let devices = MTLCopyAllDevices()                         // get all available devices
let rand = Int(arc4random_uniform(UInt32(devices.count))) // random index
let device = devices[rand]                                // randomly selected GPU to use
let context = CIContext(
    mtlDevice: device
)
Since there are two identical GPUs in the Mac Pro, always targeting one of them would be a waste of resources. Logic tells me that with the above code both GPUs will be utilised roughly equally, but maybe I'm wrong and macOS offers some kind of abstraction layer that intelligently picks the GPU that is less utilised at the time of execution?
Thank you in advance.
Why not just alternate between them? Even if you're committing command buffers from multiple threads, the work should be spread roughly evenly:
device = devices[taskIndex % devices.count]
Also, avoid creating a CIContext for every operation; contexts are expensive to create, so keep a list of contexts (one per device) and reuse them, as in the sketch below.
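A minimal sketch of that idea (Swift; the round-robin counter is illustrative and would need to be made thread-safe if tasks arrive on multiple threads):
import CoreImage
import Metal

// One CIContext per GPU, created once up front and reused for every task.
let devices = MTLCopyAllDevices()
let contexts = devices.map { CIContext(mtlDevice: $0) }

var taskIndex = 0
func contextForNextTask() -> CIContext {
    precondition(!contexts.isEmpty, "no Metal devices found")
    defer { taskIndex += 1 }
    return contexts[taskIndex % contexts.count]   // alternate between the GPUs
}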
Note that if you're doing any of your own Metal work (as opposed to just Core Image filtering), you'll need to have a command queue for each device, and any resources you want to use will need to be allocated by their respective device (resources can't be shared by MTLDevices).

OSG uses client-side GPU not host GPU

I made a simple OSG off-screen renderer that renders without popping up a window:
osg::ref_ptr<osg::GraphicsContext::Traits> traits = new osg::GraphicsContext::Traits;
traits->x = 0;
traits->y = 0;
traits->width = screenWidth;
traits->height = screenHeight;
if (offScreen) {
    traits->windowDecoration = false;
    traits->doubleBuffer = true;
    traits->pbuffer = true;
} else {
    traits->windowDecoration = true;
    traits->doubleBuffer = true;
    traits->pbuffer = false;
}
traits->sharedContext = 0;
std::cout << "DisplayName : " << traits->displayName() << std::endl;
traits->readDISPLAY();

osg::GraphicsContext* _gc = osg::GraphicsContext::createGraphicsContext(traits.get());
if (!_gc) {
    osg::notify(osg::NOTICE) << "Failed to create pbuffer, falling back to normal graphics window." << std::endl;
    traits->pbuffer = false;
    _gc = osg::GraphicsContext::createGraphicsContext(traits.get());
}
However, if I ssh into the server and run the application, it actually uses the client's GPU rather than the server's GPU. There are four GeForce GPUs on the server. I tried changing DISPLAY to hostname:0.0, but it did not work.
What should I do to make the application use the server's GPU rather than the client's GPU on Linux?
First a little bit of nomenclature: in X11, the system the display is connected to is the server, so you have your terminology reversed. To make use of the GPUs on the remote system for OpenGL rendering, the currently existing Linux driver model requires an X11 server to be running (this is about to change with Wayland, but there's still a lot of work to be done before it can be used). Essentially, the driver is loaded into the X server, hence you need one.
Of course, an X server cannot be accessed by just any user; an XAuthority token is required (see the xauth manpage). Also, if no monitors are connected, you may have to do extra configuration to convince the GPU driver not to refuse to start, and you probably want to disable the use of input devices.
Then, with an X server running and an XAuthority token for the user that shall run the OSG program, you can run the OSG program. Yes, it is tedious, but at the moment we're stuck with that.
I've done some searching, and for those who wind up at this question, I'll summarize what I found; I'll update later with the specific commands that enable server-side off-screen rendering. And yes, it is definitely possible.
Use VirtualGL to route the OpenGL commands back to the server.
VirtualGL is an X11-specific solution that captures OpenGL commands and executes them on the server-side GPU. However, this might change server-side OpenGL behavior, so I would not recommend it if other users are using OpenGL on the server at the same time.
Off-screen rendering using the Mesa graphics library.
Mesa is an open-source implementation of the OpenGL specification, a system for rendering interactive 3D graphics.
A variety of device drivers allows Mesa to be used in many different environments, ranging from software emulation to complete hardware acceleration for modern GPUs.
Mesa lets the user create a graphics context that resides in server-side memory, which allows off-screen rendering. I'll update with some code later.

AudioQueue kAudioQueueParam_Pitch

The documentation for Audio Queue Services under OS X 10.6 now includes a pitch parameter:
kAudioQueueParam_Pitch
The number of cents to pitch-shift the audio queue's playback, in the range -2400 through 2400 cents (where 1200 cents corresponds to one musical octave).
This parameter is usable only if the time/pitch processor is enabled.
Other sections of the same document still say that volume is the only available parameter, and I can't find any reference to the time/pitch processor mentioned above.
Does anyone know what this refers to? Directly writing a value to the parameter has no effect on playback (although no error is thrown), whereas writing the volume parameter does work.
Frustrating as usual with no support from Apple.
This was available only on OS X until iOS 7: if you look at AudioQueue.h you'll find it is conditionally available on iOS only from iOS 7 onwards. [Note: on re-reading I see you were referring to OS X, not iOS, but hopefully the following is cross-platform.]
Also, you need to enable the queue for time/pitch processing before setting the time/pitch algorithm, and only the Spectral algorithm supports pitch (all of them support rate):
result = AudioQueueNewOutput(&(pAqData->mDataFormat), aqHandleOutputBuffer, pAqData,
                             0, kCFRunLoopCommonModes, 0, &(pAqData->mQueue));

// Enable the time/pitch processor on the queue.
UInt32 trueValue = 1;
AudioQueueSetProperty(pAqData->mQueue, kAudioQueueProperty_EnableTimePitch,
                      &trueValue, sizeof(trueValue));

// Only the Spectral algorithm supports pitch (all algorithms support rate).
UInt32 timePitchAlgorithm = kAudioQueueTimePitchAlgorithm_Spectral;
AudioQueueSetProperty(pAqData->mQueue, kAudioQueueProperty_TimePitchAlgorithm,
                      &timePitchAlgorithm, sizeof(timePitchAlgorithm));
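// With the processor enabled and the Spectral algorithm selected, the pitch
// parameter itself can now be set (a sketch; the value is in cents, so 1200 = one octave up).
AudioQueueSetParameter(pAqData->mQueue, kAudioQueueParam_Pitch, 1200);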
