Best Practice for testing Managed Buffers? - xcode

On Macs with discrete graphics cards, managed buffers should be used instead of shared buffers; however, there are additional requirements to maintain synchronisation using [MTLBuffer didModifyRange:].
On Apple Silicon, however, if I force the use of managed buffers by pretending [MTLDevice hasUnifiedMemory] returns NO and remove the calls to didModifyRange:, rendering still works just fine.
What's the best way to test managed buffers on Apple Silicon, where GPU memory is unified, so that I can be sure my code will work on older Macs?

The best practice for testing hardware compatibility is to test on the actual hardware whose compatibility you are verifying. If you plan on supporting discrete GPUs, which are substantially different from Apple Silicon, it would be best to have access to one for testing.
You might approximate the behavior, but remember that it is only an emulation, and there is no way to ensure that the actual hardware will behave the same.
It would be akin to developing with the Simulator only, which is not at all a good practice.
UPDATE: There are numerous services that rent access to bare metal Macs. The MacInCloud service allows you to configure a machine with an external GPU (such as an AMD RX 580). It is only $0.99 for the first 24 hours.
There are many similar services out there, but that is the first one I was able to verify offers discrete GPUs as an option.

In my experience, there is no single best practice for testing code when it comes to rendering APIs, since there are many different factors at play:
GPU and CPU vendors (Apple, AMD, Intel), operating systems, drivers.
I agree with Jeshua:
The best practice for testing hardware compatibility is to test on the
actual hardware whose compatibility you are verifying.
There are several useful techniques that can make development and testing easier:
You can detect the GPU vendor from the device name:
id<MTLDevice> device = MTLCreateSystemDefaultDevice();
BOOL appleGPU  = [device.name containsString:@"Apple"];
BOOL intelGPU  = [device.name containsString:@"Intel"];
BOOL amdGPU    = [device.name containsString:@"AMD"];
BOOL nvidiaGPU = [device.name containsString:@"Nvidia"];
With the following properties you can determine the GPU type:
BOOL externalGPU = [device isRemovable];
BOOL integratedGPU = [device isLowPower];
BOOL discreteGPU = ![device isLowPower];
Note:
[device isLowPower];
On a Mac with an Apple silicon M1 chip, the property is NO because the
GPU runs with both high performance and low power.
Determine whether the GPU uses the TBDR architecture:
if (@available(macOS 10.15, *)) {
    if ([device supportsFamily:MTLGPUFamilyApple4]) {
        // The GPU supports the Tile-Based Deferred Rendering (TBDR) technique
    }
}
Understand the Managed Mode:
In a unified memory model, a resource with a MTLStorageModeManaged
mode resides in system memory accessible to both the CPU and the GPU.
It behaves like MTLStorageModeShared and keeps only one copy of the content.
Note:
In a unified memory model, Metal may ignore
synchronization calls completely because it only creates a single
memory allocation for the resource.
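To exercise this write path consistently on any Mac, one option is to funnel every CPU-side buffer update through a small helper that always calls didModifyRange: for managed buffers. The following is a minimal Swift sketch, not taken from the question above; the helper names preferredStorageMode and update are illustrative, and it assumes a macOS 10.15+ deployment target so that hasUnifiedMemory is available:
import Metal

// Pick the storage mode based on the device's memory architecture.
// (Illustrative helper; assumes macOS 10.15+ so hasUnifiedMemory exists.)
func preferredStorageMode(for device: MTLDevice) -> MTLResourceOptions {
    #if os(macOS)
    return device.hasUnifiedMemory ? .storageModeShared : .storageModeManaged
    #else
    return .storageModeShared
    #endif
}

// Copy a plain value into the buffer and notify Metal of the modified range.
// On unified memory, didModifyRange is effectively a no-op, but calling it
// keeps the code path identical to the discrete-GPU (managed) case.
func update<T>(_ buffer: MTLBuffer, with value: T, at offset: Int = 0) {
    withUnsafeBytes(of: value) { bytes in
        buffer.contents()
            .advanced(by: offset)
            .copyMemory(from: bytes.baseAddress!, byteCount: bytes.count)
    }
    #if os(macOS)
    if buffer.storageMode == .managed {
        buffer.didModifyRange(offset ..< offset + MemoryLayout<T>.size)
    }
    #endif
}
Because all writes go through one place, the code behaves the same whether the buffer ends up shared (Apple Silicon) or managed (discrete GPU).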
You can also check some implementations by other developers:
PixarAnimationStudios/USD:
HgiMetalCapabilities::HgiMetalCapabilities(id<MTLDevice> device)
{
    if (@available(macOS 10.14.5, ios 12.0, *)) {
        _SetFlag(HgiDeviceCapabilitiesBitsConcurrentDispatch, true);
    }

    defaultStorageMode = MTLResourceStorageModeShared;
    bool unifiedMemory = false;
    // macOS 100.100 never matches, so this branch is only taken on iOS 12+.
    if (@available(macOS 100.100, ios 12.0, *)) {
        unifiedMemory = true;
    } else if (@available(macOS 10.15, ios 13.0, *)) {
#if defined(ARCH_OS_IOS) || (defined(__MAC_10_15) && __MAC_OS_X_VERSION_MAX_ALLOWED >= __MAC_10_15)
        unifiedMemory = [device hasUnifiedMemory];
#else
        unifiedMemory = [device isLowPower];
#endif
    }

    _SetFlag(HgiDeviceCapabilitiesBitsUnifiedMemory, unifiedMemory);

#if defined(ARCH_OS_MACOS)
    if (!unifiedMemory) {
        defaultStorageMode = MTLResourceStorageModeManaged;
    }
#endif
}
KhronosGroup/MoltenVK
// Metal Managed:
// - applies to both buffers and textures
// - default mode for textures on macOS
// - two copies of each buffer or texture when discrete memory available
// - convenience of shared mode, performance of private mode
// - on unified systems behaves like shared memory and has only one copy of content
// - when writing, use:
// - buffer didModifyRange:
// - texture replaceRegion:
// - when reading, use:
// - encoder synchronizeResource: followed by
// - cmdbuff waitUntilCompleted (or completion handler)
// - buffer/texture getBytes:
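Following the MoltenVK notes above, the read-back path on a discrete GPU can be wrapped in a similar helper. This is an illustrative Swift sketch, assuming you already have a MTLCommandQueue for the same device; the function name readBack is made up for this example:
import Foundation
import Metal

// Read back a buffer the GPU has written. On a discrete GPU the blit
// synchronization copies the GPU's copy of the managed buffer back into
// system memory; on unified memory the managed branch is never taken.
func readBack(_ buffer: MTLBuffer, using queue: MTLCommandQueue) -> Data {
    #if os(macOS)
    if buffer.storageMode == .managed,
       let commandBuffer = queue.makeCommandBuffer(),
       let blit = commandBuffer.makeBlitCommandEncoder() {
        blit.synchronize(resource: buffer)    // schedule the GPU -> CPU copy
        blit.endEncoding()
        commandBuffer.commit()
        commandBuffer.waitUntilCompleted()    // or use a completion handler
    }
    #endif
    return Data(bytes: buffer.contents(), count: buffer.length)
}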

Related

How to pick the right Metal Device for GPU processing on a Mac Pro

When creating a new CIContext with a Metal device, one has to specify which device (a GPU) to use:
let context = CIContext(
mtlDevice: device
)
On my MacBook Pro, for development purposes, I always pick the device associated with the screen, using the MTLCreateSystemDefaultDevice() method:
guard let device: MTLDevice = MTLCreateSystemDefaultDevice() else {
    exit(EXIT_FAILURE)
}
However, on a Mac Pro which will be used in production in headless mode, there are two GPU cards that I can target. In order to get all available devices, one can use the MTLCopyAllDevices() method, which gives the following output on my Mac Pro:
[
<MTLDebugDevice: 0x103305450> -> <BronzeMtlDevice: 0x10480a200>
name = AMD Radeon HD - FirePro D700
<MTLDebugDevice: 0x103307730> -> <BronzeMtlDevice: 0x104814800>
name = AMD Radeon HD - FirePro D700
]
This Mac Pro will be utilised heavily, with hundreds of small tasks per second, and every time a new task comes in I need to select a GPU device on which the task will be processed.
Now the question is whether picking a random device from the above array is a good idea:
let devices = MTLCopyAllDevices() // get all available devices
let rand = Int(arc4random_uniform(UInt32(devices.count))) // random index
let device = devices[rand] // randomly selected GPU to use
let context = CIContext(
mtlDevice: device
)
Since there are two equal GPU devices in a Mac Pro, always targeting one of them would be a waste of resources. Logic tells me that with the above code both GPUs will be utilised equally, but maybe I'm wrong and macOS offers some kind of abstraction layer that will intelligently pick the GPU that is less utilised at the time of execution?
Thank you in advance.
Why not just alternate between them? Even if you're committing command buffers from multiple threads, the work should be spread roughly evenly:
device = devices[taskIndex % devices.count]
Also, make sure to avoid creating CIContexts for every operation; those are expensive, so you should keep a list of contexts (one per device) instead.
Note that if you're doing any of your own Metal work (as opposed to just Core Image filtering), you'll need to have a command queue for each device, and any resources you want to use will need to be allocated by their respective device (resources can't be shared by MTLDevices).
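Putting those two suggestions together (one CIContext per device, simple round-robin selection), a sketch could look like the following. This is illustrative Swift; the names contexts, taskIndex and nextContext are not from the question, and it assumes at least one Metal device is present:
import CoreImage
import Metal

// One CIContext per GPU, created once up front (contexts are expensive to create).
let devices = MTLCopyAllDevices()
let contexts = devices.map { CIContext(mtlDevice: $0) }

var taskIndex = 0

// Hand out contexts round-robin; assumes devices is non-empty.
func nextContext() -> CIContext {
    defer { taskIndex += 1 }
    return contexts[taskIndex % contexts.count]
}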

Windowed Rendering Different Adapter

I have a laptop with two adapters - an Intel and an NVIDIA. I'm running Windows 10 and there is no option in the BIOS for turning off the embedded Intel adapter. I can specify that specific applications should use the NVIDIA adapter, or make it the default for all Direct3D device creation. When I use the Intel adapter (which is the fixed adapter for the Windows desktop), my 3D application in windowed mode works fine.
If I change the NVIDIA global setting to force the NVIDIA adapter for all Direct3D devices, or change my code to select the NVIDIA adapter, the code executes without any errors (I have the DirectX debug device attached) but nothing gets rendered in my window.
I believe that it is not possible to have a windowed swap chain output attached to an adapter that isn't the adapter used by the Windows desktop, but I have never seen this made explicit.
This means that on a laptop using an embedded hardware adapter for the Windows desktop I cannot make use of the more powerful NVIDIA adapter in a window and will have to use full-screen mode.
Can anyone confirm this, or suggest a device creation method that successfully allows me to address the second adapter in a window?
For clarity, my device creation code is:
private static void initializeDirect3DGraphicsDevice(System.Windows.Forms.Control winFormsControl, out Device device, out SharpDX.DXGI.SwapChain sc)
{
    SharpDX.DXGI.SwapChainDescription destination = new SharpDX.DXGI.SwapChainDescription()
    {
        BufferCount = 1,
        ModeDescription = new SharpDX.DXGI.ModeDescription(
            winFormsControl.ClientSize.Width,
            winFormsControl.ClientSize.Height,
            new SharpDX.DXGI.Rational(60, 1),
            SharpDX.DXGI.Format.R8G8B8A8_UNorm),
        IsWindowed = true,
        OutputHandle = winFormsControl.Handle,
        SampleDescription = new SharpDX.DXGI.SampleDescription(1, 0),
        SwapEffect = SharpDX.DXGI.SwapEffect.Discard,
        Usage = SharpDX.DXGI.Usage.RenderTargetOutput
    };
    using (SharpDX.DXGI.Factory1 factory = new SharpDX.DXGI.Factory1())
    {
        // Pick the adapter with the most dedicated video memory - this is the NVIDIA adapter
        List<SharpDX.DXGI.Adapter> adapters = factory.Adapters.OrderBy(item => (long)item.Description.DedicatedVideoMemory).Reverse().ToList();
        SharpDX.DXGI.Adapter bestAdapter = adapters.First();
        foreach (SharpDX.DXGI.Output output in bestAdapter.Outputs)
        {
            System.Diagnostics.Debug.WriteLine("Adapter " + bestAdapter.Description.Description.Substring(0,20) + " output " + output.Description.DeviceName);
        }
        device = new Device(bestAdapter, DeviceCreationFlags.Debug);
        // Uncomment the below to allow the NVIDIA control panel to select the adapter for me.
        //device = new Device(SharpDX.Direct3D.DriverType.Hardware, DeviceCreationFlags.Debug);
        sc = new SharpDX.DXGI.SwapChain(factory, device, destination);
        factory.MakeWindowAssociation(winFormsControl.Handle, SharpDX.DXGI.WindowAssociationFlags.IgnoreAll);
        System.Diagnostics.Debug.WriteLine("Device created with feature level " + device.FeatureLevel + " on adapter " + bestAdapter.Description.Description.Substring(0, 20));
        System.Diagnostics.Debug.WriteLine("");
    }
}
The proprietary technology NVIDIA uses to manage both an Intel integrated device and an NVIDIA discrete part is known as Optimus. AMD has a similar technology they call PowerXpress. They both play tricks with the default Direct3D device in the driver to control this behavior, which can be a bit strange to cope with as a developer.
The hardware solution for these 'hybrid graphics' devices deals with the issue of merging the scanout from both GPUs so that the monitor is always attached to just a single device.
The user can always choose to force an application to use one or the other through the control panel, which is the best user experience. The problem is that the default is often not a good choice for games. The solution for Win32 classic desktop apps is to put a 'magic export' into your EXE that the NVIDIA/AMD software will use to pick a default for an application not in its database:
// Indicates to hybrid graphics systems to prefer the discrete part by default
extern "C"
{
    __declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
    __declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
}
The other option is to not use the default adapter when creating the device and to enumerate the adapters explicitly. This should work, but it means that the user no longer has an easy way to change which device is being used. For sample enumeration code, see DeviceResources, and the GetHardwareAdapter method in particular. The drivers mess around with the enumeration as I noted above, so the 'magic export' is probably the best general solution.

How to request use of integrated GPU when using Metal API?

According to Apple documentation, when the value "YES" (or true) is added for the key "NSSupportsAutomaticGraphicsSwitching" in an OSX app's Info.plist file, the integrated GPU will be used on dual-GPU systems (as opposed to the discrete GPU). This is useful, as the integrated GPU -- while less performant -- is adequate for my app's needs and consumes less energy.
Unfortunately, building as per above and subsequently inspecting the Activity Monitor (Energy tab: "Requires High Perf GPU" column) reveals that my Metal API-enabled app still uses the discrete GPU, despite requesting the integrated GPU.
Is there any way I can give a hint to the Metal system itself to use the integrated GPU?
The problem was that the Metal API defaults to using the discrete GPU. Using the following code, along with the correct Info.plist configuration detailed above, results in the integrated GPU being used:
NSArray<id<MTLDevice>> *devices = MTLCopyAllDevices();
gpu_ = nil;

// Low power device is sufficient - try to use it!
for (id<MTLDevice> device in devices) {
    if (device.isLowPower) {
        gpu_ = device;
        break;
    }
}

// below: probably not necessary since there is always
// integrated GPU, but doesn't hurt.
if (gpu_ == nil)
    gpu_ = MTLCreateSystemDefaultDevice();
If you're using an MTKView, remember to pass gpu_ to its initWithFrame:device: method.
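For reference, a Swift-flavored sketch of the same idea (the names lowPowerDevice and metalView, and the frame size, are illustrative and not from the answer above):
import MetalKit

// Pick the low-power (integrated) device, falling back to the default,
// then hand it to the MTKView initializer.
let lowPowerDevice = MTLCopyAllDevices().first(where: { $0.isLowPower })
    ?? MTLCreateSystemDefaultDevice()
let metalView = MTKView(frame: CGRect(x: 0, y: 0, width: 800, height: 600),
                        device: lowPowerDevice)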

OSG uses client-side GPU not host GPU

I made a simple OSG off-screen renderer that renders without popping up a window.
osg::ref_ptr<osg::GraphicsContext::Traits> traits = new osg::GraphicsContext::Traits;
traits->x = 0;
traits->y = 0;
traits->width = screenWidth;
traits->height = screenHeight;
if (offScreen) {
    traits->windowDecoration = false;
    traits->doubleBuffer = true;
    traits->pbuffer = true;
} else {
    traits->windowDecoration = true;
    traits->doubleBuffer = true;
    traits->pbuffer = false;
}
traits->sharedContext = 0;
std::cout << "DisplayName : " << traits->displayName() << std::endl;
traits->readDISPLAY();

osg::GraphicsContext* _gc = osg::GraphicsContext::createGraphicsContext(traits.get());
if (!_gc) {
    osg::notify(osg::NOTICE) << "Failed to create pbuffer, falling back to normal graphics window." << std::endl;
    traits->pbuffer = false;
    _gc = osg::GraphicsContext::createGraphicsContext(traits.get());
}
However, if I ssh to the server and run the application, it actually uses the client GPU rather than the server GPU. There are four GeForce GPUs on the server. I tried changing DISPLAY to hostname:0.0, but it did not work.
What should I do to make the application use the server GPU, not the client GPU, on Linux?
First, a little bit of nomenclature: in X11, the system to which the display is connected is the server, so you have your terminology reversed. To make use of the GPUs on the remote system for OpenGL rendering, the currently existing Linux driver model requires an X11 server to be running (this is about to change with Wayland, but there's still a lot of work to be done before it can be used). Essentially, the driver is loaded into the X server, hence you need one.
Of course, an X server cannot be accessed by just any user. An XAuthority token is required (see the xauth manpage). Also, if no monitors are connected, you may have to do extra configuration to convince the GPU's driver not to refuse to start. You probably also want to disable the use of input devices.
Then, with an X server running and the user that will run the OSG program holding an XAuthority token, you can run the program. Yes, it is tedious, but at the moment we're stuck with that.
I've done some searching, and for those who wind up at this question, I'll summarize what I found and later update with the specific commands that enable server-side off-screen rendering.
And yes, it is definitely possible.
Use VirtualGL to route all the commands back to the server.
VirtualGL is an X11-specific API that captures OpenGL commands and executes them on the server-side GPU. However, this might change server-side OpenGL behavior, so I would not recommend it if other users are using OpenGL at the same time.
Off-screen rendering using the Mesa graphics library.
Mesa is an open-source implementation of the OpenGL specification - a system for rendering interactive 3D graphics.
A variety of device drivers allows Mesa to be used in many different environments, ranging from software emulation to complete hardware acceleration for modern GPUs.
Mesa lets the user create a GraphicsContext that resides in server-side memory and allows off-screen rendering. I'll update with some code.

How do you retrieve stylus pressure information on windows?

Is anyone aware of a sane way to get tablet/stylus pressure information on Windows?
It's possible to distinguish stylus from mouse with ::GetMessageExtraInfo, but you can't get any more information beyond that. I also found the WinTab API in an out-of-the-way corner of the Wacom site, but that's not part of Windows as far as I can tell, and it has a completely separate event/messaging system from the message queue.
Given that all I want is the most basic pressure information, surely there is a standard Win32/COM API - is anyone aware of what it might be?
The current way to do this is to handle the WM_POINTERnnn messages.
Note this is for Windows 8 and later.
Note that you will get these messages for touch AND pen, so you'll need to check the pointerType in order to test for pen. The WPARAM received by a WNDPROC for WM_POINTERnnn messages such as WM_POINTERUPDATE contains the pointer ID, which you will need in order to request more info. Empirically, I found that WM_POINTERUPDATE results in info that contains pressure data, whereas if the pointer flags indicate down/up there is no pressure info.
const WORD wid = GET_POINTERID_WPARAM(wParam);
POINTER_INFO piTemp = {NULL};
GetPointerInfo(wid, &piTemp);
if (piTemp.pointerType == PT_PEN)
{
    UINT32 entries = 0;
    UINT32 pointers = 0;
    GetPointerFramePenInfoHistory(wid, &entries, &pointers, NULL); // ask how many entries there are
    // TODO: allocate the space needed for the info, process the data in a loop to retrieve it,
    // and test pointerInfo.pointerFlags for down/up/update.
}
Once you know you are dealing with a pen, you can get the pressure info from the POINTER_PEN_INFO struct.
This is similar to handling touch, although for touch you'd want gesture recognition and inertia. There is a Microsoft sample illustrating the use of these functions.
It's part of a Build talk:
https://channel9.msdn.com/Events/Build/2013/4-022
You need to use the Tablet PC Pen/Ink API. The COM version of the API lives in InkObj.dll. Here is a starting point for documentation: http://msdn.microsoft.com/en-us/library/ms700664.aspx
If I remember correctly, InkObj.dll is available on Windows XP SP2 and all later Windows client OSes, regardless of whether the machine is a Tablet PC.
UPDATE:
It's been a number of years since I initially provided this answer, but WinTab has become the de facto standard, and N-trig more or less folded, eventually building a wrapper that allows the WinTab API to be accessed via its digitizer.
(http://www.tabletpcbuzz.com/showthread.php?37547-N-trig-Posts-WinTAB-Support-Driver)
This is a pretty late response, but recently my wife and I purchased a Dell XT tablet PC, which as it turns out actually uses N-trig, a suite of interfaces that utilize Ink, the accepted new Windows API that shipped with Windows XP Tablet Edition, then SP2, and all versions thereafter.
A lot of Wacom tablets and others use the WinTab API, which is not currently open nor really permitted for use. From what I hear, the folks who maintain it are pretty sue-happy.
So it depends on what type of tablet you're using and the drivers you have installed for it. In my biased opinion, you should work with Ink, as it provides (or at least, through N-trig and Windows 7, WILL provide) multi-touch capability and will likely be the new standard for tablet interfaces. But as of now, N-trig devices do not translate their pressure and angle information to common WinTab-based applications, such as Photoshop or Corel Painter. Those applications tend to require at least some support for Microsoft's Tablet API in order to function properly.
If you're using the UWP Windows Runtime, it's quite straightforward. The PointerEventArgs event seems to have all the necessary data.
Here is a modified Core App (C++/WinRT) template project snippet from Visual Studio 2019:
void OnPointerMoved(IInspectable const &, PointerEventArgs const &args)
{
    if (m_selected)
    {
        float2 const point = args.CurrentPoint().Position();
        m_selected.Offset(
        {
            point.x + m_offset.x,
            point.y + m_offset.y,
            0.0f
        });

        // (new!) Change sprite color based on pen pressure and tilt
        auto sprite = m_selected.as<SpriteVisual>();
        auto const props = args.CurrentPoint().Properties();
        auto const pressure = props.Pressure();
        auto const orientation = props.Orientation() / 360.0f;
        auto const tiltx = (props.XTilt() + 90) / 180.0f;
        auto const tilty = (props.YTilt() + 90) / 180.0f;
        Compositor compositor = m_visuals.Compositor();
        sprite.Brush(compositor.CreateColorBrush({
            (uint8_t)(pressure * 0xFF),
            (uint8_t)(tiltx * 0xFF),
            (uint8_t)(tilty * 0xFF),
            (uint8_t)(orientation * 0xFF)
        }));
    }
}
Similar code will likely work in C#, JavaScript, etc.
