What APIs do I need to use, and what precautions do I need to take, when writing to an IOSurface in an XPC process that is also being used as the backing store for an MTLTexture in the main application?
In my XPC service I have the following:
IOSurface *surface = ...;
CIRenderDestination *renderDestination = [... initWithIOSurface:surface];
// Send the IOSurface to the client using an NSXPCConnection.
// In the service, periodically write to the IOSurface.
In my application I have the following:
IOSurface *surface = ...; // Fetch the IOSurface from the NSXPCConnection.
id<MTLTexture> texture = [device newTextureWithDescriptor:... iosurface:surface];
// The texture is used in a fragment shader (Read-only)
I have an MTKView that is running its normal update loop. I want my XPC service to be able to periodically write to the IOSurface using Core Image and then have the new contents rendered by Metal on the app side.
What synchronization is needed to ensure this is done properly? A double or triple buffering strategy is one option, but that doesn't really work for me because I might not have enough memory to allocate 2x or 3x the number of surfaces. (The example above uses one surface for clarity, but in reality I might have dozens of surfaces I'm drawing to. Each surface represents a tile of an image. An image can be as large as JPG/TIFF/etc. allows.)
WWDC 2010-442 talks about IOSurface and briefly mentions that it all "just works", but that's in the context of OpenGL and doesn't mention Core Image or Metal.
I originally assumed that Core Image and/or Metal would be calling IOSurfaceLock() and IOSurfaceUnlock() to protect read/write access, but that doesn't appear to be the case at all. (And the comments in the header file for IOSurfaceRef.h suggest that the locking is only for CPU access.)
Can I really just let Core Image's CIRenderDestination write at-will to the IOSurface while I read from the corresponding MTLTexture in my application's update loop? If so, then how is that possible if, as the WWDC video states, all textures bound to an IOSurface share the same video memory? Surely I'd get some tearing of the surface's content if reading and writing occurred during the same pass.
The thing you need to do is ensure that the Core Image drawing has completed in the XPC service before the IOSurface is used to draw in the application. If you were using OpenGL or Metal on both sides, you would call glFlush() or -[MTLCommandBuffer waitUntilScheduled], respectively. I would assume that something in Core Image is making one of those calls.
It will likely be obvious if that's not happening: when things aren't properly synchronized, you get tearing, or images that are half new rendering and half old rendering. I've seen that happen when using IOSurfaces across XPC services.
One thing you can do is put some symbolic breakpoints on -waitUntilScheduled and -waitUntilCompleted and see if CI is calling them in your XPC (assuming the documentation doesn't explicitly tell you). There are other synchronization primitives in Metal, but I'm not very familiar with them. They may be useful as well. (It's my understanding that CI is all Metal under the hood now.)
Also, the IOSurface object has methods -incrementUseCount, -decrementUseCount, and -localUseCount. It might be worth checking those to see if CI sets them appropriately. (See <IOSurface/IOSurfaceObjC.h> for details.)
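Without double buffering, the hand-off described above amounts to passing ownership of the single surface back and forth: the service writes only after the app has released the surface, and announces a new frame only after its render has finished. The sketch below models that on the CPU with two pthreads in one process, in plain C (which is also valid Objective-C). The array `surface` is a hypothetical stand-in for the IOSurface, and the ownership flip stands in for whatever XPC message you would actually send. In the real service, the "frame ready" message would go after the Core Image work has completed, for example after -[CIRenderTask waitUntilCompletedAndReturnError:] returns for a task started with -[CIContext startTaskToRender:toDestination:error:]. The cost of this scheme is that the app can't sample the surface while the service is writing it, so it has to keep showing its previous frame during that time.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define SURFACE_BYTES 4096

/* Hypothetical stand-in for the one shared IOSurface (no extra copies). */
static unsigned char surface[SURFACE_BYTES];

enum surface_owner { OWNER_WRITER, OWNER_READER };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t changed = PTHREAD_COND_INITIALIZER;
static enum surface_owner owner = OWNER_WRITER;
static int writer_done = 0;
static int frames_to_write = 0;
static int frames_seen = 0, torn_frames = 0, out_of_order = 0;

/* With `lock` held: wait until the surface belongs to `who` or the writer is done. */
static void wait_for(enum surface_owner who) {
    while (owner != who && !writer_done)
        pthread_cond_wait(&changed, &lock);
}

/* Hand the surface to `who` and wake the other side. */
static void hand_to(enum surface_owner who) {
    pthread_mutex_lock(&lock);
    owner = who;
    pthread_cond_broadcast(&changed);
    pthread_mutex_unlock(&lock);
}

static void *writer(void *arg) {          /* plays the XPC service */
    (void)arg;
    for (int f = 1; f <= frames_to_write; f++) {
        pthread_mutex_lock(&lock);
        wait_for(OWNER_WRITER);            /* the app must be done sampling */
        pthread_mutex_unlock(&lock);
        /* "Render" a full frame. In the service, the hand-off below must come
           after the GPU work has completed, not merely been submitted. */
        memset(surface, f, SURFACE_BYTES);
        hand_to(OWNER_READER);             /* e.g. an XPC "frame ready" message */
    }
    pthread_mutex_lock(&lock);
    wait_for(OWNER_WRITER);                /* let the app finish the last frame */
    writer_done = 1;
    pthread_cond_broadcast(&changed);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *reader(void *arg) {          /* plays the app's draw loop */
    int last = 0;
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        wait_for(OWNER_READER);
        int have_frame = (owner == OWNER_READER);
        pthread_mutex_unlock(&lock);
        if (!have_frame)
            break;                         /* writer finished, nothing pending */
        unsigned char v = surface[0];      /* a torn frame mixes two frames' values */
        for (int i = 1; i < SURFACE_BYTES; i++)
            if (surface[i] != v) { torn_frames++; break; }
        if (v != last + 1)
            out_of_order++;
        last = v;
        frames_seen++;
        hand_to(OWNER_WRITER);             /* the writer may reuse the surface now */
    }
    return NULL;
}

/* Runs the hand-off for `frames` frames (1..255); returns the torn-frame count. */
int run_handoff(int frames, int *seen, int *out_of_order_count) {
    pthread_t w, r;
    assert(frames > 0 && frames < 256);
    frames_to_write = frames;
    frames_seen = torn_frames = out_of_order = 0;
    owner = OWNER_WRITER;
    writer_done = 0;
    int rc = pthread_create(&r, NULL, reader, NULL);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    *seen = frames_seen;
    *out_of_order_count = out_of_order;
    return torn_frames;
}
```

With the ownership checks in place, run_handoff(50, &seen, &disorder) returns 0 torn frames with seen == 50. If you drop the wait_for() calls, torn frames can start to appear, which is the same symptom described above for unsynchronized IOSurfaces.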
In the Windows world, a dedicated render thread would loop something similar to this:
void RenderThread()
{
    while (!quit)
    {
        UpdateStates();
        RenderToDirect3D();
        // Can either present with no synchronisation,
        // or synchronise after 1-4 vertical blanks.
        // See docs for IDXGISwapChain::Present
        PresentToSwapChain();
    }
}
What is the equivalent in Cocoa with CAMetalLayer? All the examples deal with updates being done on the main thread, either using MTKView (with its internal timer) or using CADisplayLink in the iOS examples.
I want to be in control of the whole render loop, rather than just receiving a callback at some non-specified interval (and ideally blocking for V-Sync if it's enabled).
At some level, you're going to be throttled by the availability of drawables. A CAMetalLayer has a fixed pool of drawables available, and calling nextDrawable will block the current thread until a drawable becomes available. This doesn't imply you have to call nextDrawable at the top of your render loop, though.
If you want to draw on your own schedule without getting blocked waiting on a drawable, render to an off-screen renderbuffer (i.e., an MTLTexture with dimensions matching your drawable size), and then blit from the most-recently-drawn texture to a drawable's texture and present on whatever cadence you prefer. This can be useful for getting frame timings, but every frame you draw and then don't display is wasted work. It also increases the risk of judder.
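To make the trade-off in the previous paragraph concrete, here is a small deterministic simulation in plain C. The names are hypothetical and no Metal is involved: every render_period ticks a frame is finished in an off-screen texture, and every present_period ticks the newest finished frame is "blitted" to a drawable and presented. At 180 ticks per second, render_period 2 and present_period 3 model rendering at 90 Hz against a 60 Hz display.

```c
#include <assert.h>

/* Counters for one simulated run. A step is the number of frames between
   consecutive presents; a step above 1 means frames were skipped. */
typedef struct {
    int rendered, presented, wasted;
    int min_step, max_step;
} sched_stats;

/* Simulates `ticks` clock ticks. Every render_period ticks a new frame is
   finished off-screen; every present_period ticks the newest finished frame
   is blitted to a drawable and presented. */
sched_stats simulate(int ticks, int render_period, int present_period) {
    assert(ticks > 0 && render_period > 0 && present_period > 0);
    sched_stats s = {0, 0, 0, 1 << 30, 0};
    int latest = -1;      /* newest off-screen frame, -1 if none yet */
    int shown = -1;       /* frame currently on screen */
    for (int t = 0; t < ticks; t++) {
        if (t % render_period == 0)
            latest = s.rendered++;
        if (t % present_period != 0 || latest == shown)
            continue;     /* not a present tick, or no new frame to show */
        if (shown >= 0) {
            int step = latest - shown;
            if (step < s.min_step) s.min_step = step;
            if (step > s.max_step) s.max_step = step;
        }
        shown = latest;
        s.presented++;
    }
    s.wasted = s.rendered - s.presented;  /* drawn but never displayed */
    return s;
}
```

simulate(180, 2, 3) renders 90 frames but presents only 60, so a third of the work is wasted, and the presented frames advance unevenly (1, 2, 1, 2 frames at a time), which is the judder mentioned above. With equal periods, simulate(180, 3, 3) wastes nothing and every step is one frame.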
Your options are limited when it comes to getting callbacks that match the v-sync cadence. Your best bet is almost certainly a CVDisplayLink scheduled in the default and tracking run loop modes, though this has caveats.
You could use something like a counting semaphore in concert with a display link if you want to free-run without getting too far ahead.
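A counting semaphore keeps the render loop from getting more than N frames ahead of the GPU: wait before encoding a frame, and signal when that frame completes. In a Metal app this is commonly done with dispatch_semaphore_create(N), dispatch_semaphore_wait() at the top of the frame, and dispatch_semaphore_signal() from the command buffer's -addCompletedHandler: block. The sketch below is a portable model of the same idea in plain C: the semaphore is built from a pthread mutex and condition variable, and a short-lived thread that sleeps 2 ms is a hypothetical stand-in for the GPU finishing a frame.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* A counting semaphore built from a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int count;
} counting_sem;

static void csem_init(counting_sem *s, int count) {
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->count = count;
}

static void csem_destroy(counting_sem *s) {
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->mutex);
}

static void csem_wait(counting_sem *s) {
    pthread_mutex_lock(&s->mutex);
    while (s->count == 0)
        pthread_cond_wait(&s->cond, &s->mutex);
    s->count--;
    pthread_mutex_unlock(&s->mutex);
}

static void csem_signal(counting_sem *s) {
    pthread_mutex_lock(&s->mutex);
    s->count++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->mutex);
}

#define MAX_FRAMES 64

static counting_sem frame_slots;
static pthread_mutex_t stats_mutex = PTHREAD_MUTEX_INITIALIZER;
static int in_flight, peak_in_flight;

/* Stands in for the GPU finishing a frame; the signal at the end is what a
   command buffer completion handler would do. */
static void *complete_frame(void *arg) {
    struct timespec gpu_time = {0, 2000000L}; /* pretend the GPU takes 2 ms */
    (void)arg;
    nanosleep(&gpu_time, NULL);
    pthread_mutex_lock(&stats_mutex);
    in_flight--;
    pthread_mutex_unlock(&stats_mutex);
    csem_signal(&frame_slots);
    return NULL;
}

/* Encodes `frames` frames without letting more than `max_in_flight` be
   outstanding at once. Returns the peak number observed in flight. */
int run_render_loop(int frames, int max_in_flight) {
    pthread_t gpu[MAX_FRAMES];
    assert(frames > 0 && frames <= MAX_FRAMES && max_in_flight > 0);
    in_flight = peak_in_flight = 0;
    csem_init(&frame_slots, max_in_flight);
    for (int i = 0; i < frames; i++) {
        csem_wait(&frame_slots);           /* blocks when the CPU is too far ahead */
        pthread_mutex_lock(&stats_mutex);
        if (++in_flight > peak_in_flight)
            peak_in_flight = in_flight;
        pthread_mutex_unlock(&stats_mutex);
        /* ...encode and commit frame i here... */
        int rc = pthread_create(&gpu[i], NULL, complete_frame, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < frames; i++)
        pthread_join(gpu[i], NULL);
    csem_destroy(&frame_slots);
    return peak_in_flight;
}
```

run_render_loop(20, 3) never reports more than 3 frames in flight, and run_render_loop(20, 1) degenerates to fully serialized CPU and GPU work. The drawable pool and maximumDrawableCount discussed below impose a similar upper bound on their own.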
If your application is able to maintain a real-time framerate, you'll normally be rendering a frame or two ahead of what's going on the glass, so you don't want to literally block on v-sync; you just want to inform the window server that you'd like presentation to match v-sync. On macOS, you do this by setting the layer's displaySyncEnabled to true (the default). Turning this off may cause tearing on certain displays.
At the point where you want to render to screen, you obtain the drawable from the layer by calling nextDrawable. You obtain the drawable's texture from its texture property. You use that texture to set up the render target (color attachment) of an MTLRenderPassDescriptor. For example:
id<CAMetalDrawable> drawable = layer.nextDrawable;
id<MTLTexture> texture = drawable.texture;
MTLRenderPassDescriptor *desc = [MTLRenderPassDescriptor renderPassDescriptor];
desc.colorAttachments[0].texture = texture;
From here, it's pretty similar to what you do in an MTKView's drawRect: method. You create a command buffer (if you don't already have one), create a render command encoder using the descriptor, encode drawing commands, end encoding, tell the command buffer to present the drawable (using a -presentDrawable:... method), and commit the command buffer. Whatever was drawn to the drawable's texture is what will end up on-screen when it's presented.
I agree with Warren that you probably don't really want to sync your loop with the display refresh. You want parallelism. You want the CPU to be working on the next frame while the GPU is rendering the most current frame (and the display is showing the last frame).
The fact that there's a limit on how many drawables may be in flight at once and that nextDrawable will block waiting for one will prevent your render loop from getting too far ahead. (You'll probably use some other synchronization before that, like for managing a small pool of buffers.) If you want only double-buffering and not triple-buffering, you can set the layer's maximumDrawableCount to 2 instead of its default value of 3.
TL;DR: In macOS 10.13, an MTLTexture has a maximum width and height of 16,384. What strategies can you use to be able to process and display images larger than 16,384 pixels using Metal?
In a photo viewer that I'm currently working on, I've moved most of the viewing of images into a Metal-backed view that uses Core Image for doing any image adjustments. This is working really well, but I've recently started testing against some really large images (panoramas) and I'm now hitting some limits that I'm not entirely sure how to work around while remaining relatively performant.
My current environment looks like this:
Load and decode an image from an NSURL into an IOSurface. This is done using either Image IO directly or a Core Image pipeline that renders into the IOSurface. The IOSurface is then passed from an XPC service back into the main app.
In the main app, a new MTLTexture is created that is backed by the IOSurface. Then, a CIImage is created from the MTLTexture and that CIImage is used throughout an image pipeline as the root "source image".
However, if I attempt to open an image larger than 16,384 pixels in either dimension, I'm unable to create the original IOSurface on my laptop (13" MBP-TB 2016).
But even if I could create a larger IOSurface, I'd still be stuck with the same limit on the MTLTexture.
See: Apple's Metal Feature Set Tables
I'm curious what strategies others would recommend to allow one to open large image files while still taking advantage of Core Image and Metal.
One attempt I've made is to just have the root source image be a CIImage that was created with a CGImageRef. However, there's a significant drop in performance between that arrangement and a CIImage backed by a texture for even smaller sized images.
Another idea I've had, but haven't yet explored, was to use CIImageProvider in some capacity but I'm not entirely sure how I'd go about "tiling" potentially several IOSurfaces or MTLTextures, and if that even makes sense or if it would be better to just allocate a single large buffer to read from. (Or perhaps use dispatch_data in some capacity?)
(macOS 10.13 or even 10.14 would be fine.)
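The tiling idea above could start from a simple grid split in which no tile exceeds the 16,384-pixel limit, so that each tile can get its own IOSurface/MTLTexture or be produced on demand (for example through a CIImageProvider). The sketch below is plain C with hypothetical names (tile_rect, make_tiles, tile_bytes). It assumes 4 bytes per pixel and tightly packed rows; real IOSurfaces may pad each row, so treat the byte counts as a lower bound. It also shows that a single full 16,384 × 16,384 tile at 4 bytes per pixel is already 1 GiB. Filters that sample neighboring pixels (blurs, for example) would need tiles that overlap by the filter's radius, which this sketch doesn't handle.

```c
#include <assert.h>
#include <stdint.h>

/* A tile's position and size in pixels, in image coordinates. */
typedef struct { int x, y, width, height; } tile_rect;

/* Ceiling division: how many tiles of size max_tile cover `extent` pixels. */
static int tiles_along(int extent, int max_tile) {
    return (extent + max_tile - 1) / max_tile;
}

/* Splits a width x height image into a grid of tiles no larger than
   max_tile x max_tile (e.g. 16384). Fills at most `cap` entries of `out`,
   row by row, and returns the total number of tiles needed. */
int make_tiles(int width, int height, int max_tile, tile_rect *out, int cap) {
    assert(width > 0 && height > 0 && max_tile > 0);
    int cols = tiles_along(width, max_tile);
    int rows = tiles_along(height, max_tile);
    int n = 0;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++, n++) {
            if (n >= cap)
                continue;               /* keep counting, but don't overflow `out` */
            int x = c * max_tile, y = r * max_tile;
            out[n].x = x;
            out[n].y = y;
            /* Tiles on the right and top edges may be smaller than max_tile. */
            out[n].width = (width - x < max_tile) ? width - x : max_tile;
            out[n].height = (height - y < max_tile) ? height - y : max_tile;
        }
    }
    return n;
}

/* Bytes needed to back one tile, assuming 4 bytes per pixel (8-bit RGBA)
   and tightly packed rows. */
uint64_t tile_bytes(const tile_rect *t) {
    return (uint64_t)t->width * (uint64_t)t->height * 4u;
}
```

For a 40,000 × 12,000 panorama and a 16,384 limit, make_tiles returns three tiles, the last one 7,232 pixels wide, and the first tile needs 786,432,000 bytes under the 4-bytes-per-pixel assumption.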
Is there any way in Mac OS X to share an OpenGL framebuffer between processes? That is, I want to render to an off-screen target in one process and display it in another.
You can do this with DirectX (actually DXGI) in Windows by creating a surface (the DXGI equivalent of an OpenGL framebuffer) in shared mode, getting an opaque handle for that surface, passing that handle to another process by whatever means you like, and then creating a surface in that other process by passing in the existing handle. You then use the surface as a render target in one process and as a texture in the other, to consume as you wish. In fact, the whole compositing window system works like this from Vista onwards.
If this isn't possible I can of course get the contents of the framebuffer into system memory and use cross-process shared memory to get it to the target process, then upload it again from there, but that would be unnecessarily slow.
Depending on what you're really trying to do, this sample code project may be what you want:
MultiGPUIOSurface sample code
It really depends upon the context of how you're using it.
Objects that may be shared between contexts include buffer objects, program and shader objects, renderbuffer objects, sampler objects, sync objects, and texture objects (except for the texture objects named zero).
Some of these objects may contain views (alternate interpretations) of part or all of the data store of another object. Examples are texture buffer objects, which contain a view of a buffer object’s data store, and texture views, which contain a view of another texture object’s data store. Views act as references on the object whose data store is viewed.
Objects which contain references to other objects include framebuffer, program pipeline, query, transform feedback, and vertex array objects. Such objects are called container objects and are not shared.
Chapter 5 / OpenGL-4.4 core specification
The reason you can do those things on Windows and not on OS X is that Windows provides an API that allows DirectX surfaces to be shared between processes. If OS X doesn't have that capability within the OpenGL API, you're going to have to come up with your own solution. Take a look at the OpenGL Programming Guide for Mac; there's a small section that describes using multiple OpenGL contexts.
I am trying to modify the Apple MultiGPUIOSurface sample (specifically the file http://developer.apple.com/library/mac/#samplecode/MultiGPUIOSurface/Listings/ServerOpenGLView_m.html) so that the server side will render to an IOSurface, without the need for a NSOpenGLView.
My modified version of that is at: http://pastebin.com/z3r715jJ
The difference in my approach is that I'm rendering to the IOSurface based on a timer, and not in drawRect. I also am not using the NSOpenGLView's context.
The problem is that I see a corrupt view of the IOSurface in the client application. However if I set the NSOpenGLView's context to the one I created, or use the context from the NSOpenGLView, it works. This leads me to think that the NSOpenGLView is doing something extra that I also need to do, but I'm not sure what.
Found a solution (though I don't understand why): create a pixel buffer.
I found some discussion about offscreen buffers and the need for a drawable (http://www.mentby.com/Group/mac-opengl/opengl-offscreen-rendering-without-a-window.html)
Anyway, my fix was adding these lines:
NSOpenGLPixelBuffer* pbuf = [[NSOpenGLPixelBuffer alloc] initWithTextureTarget:GL_TEXTURE_RECTANGLE_EXT textureInternalFormat:GL_RGBA textureMaxMipMapLevel:0 pixelsWide:512 pixelsHigh:512];
[_nsContext setPixelBuffer:pbuf cubeMapFace:0 mipMapLevel:0 currentVirtualScreen:[_nsContext currentVirtualScreen]];
I know how to do it the other way around. But how can I create a CIImage from a texture, without having to copy into CPU memory? [CIImage imageWithData]? CVOpenGLESTextureCache?
Unfortunately, I don't think there's any way to avoid having to read back pixel data using glReadPixels(). All of the inputs for a CIImage (data, CGImageRef, CVPixelBufferRef) are CPU-side, so I don't see a fast path to deliver that to a CIImage. It looks like your best alternative there would be to use glReadPixels() to pull in the raw RGBA data from your texture and send it into the CIImage using -initWithBitmapData:bytesPerRow:size:format:colorSpace: and a kCIFormatRGBA8 pixel format. (Update: 3/14/2012) On iOS 5.0, there is now a faster way to grab OpenGL ES frame data, using the new texture caches. I describe this in detail in this answer.
However, there might be another way to achieve what you want. If you simply want to apply filters on a texture for output to the screen, you might be able to use my GPUImage framework to do the processing. It already uses OpenGL ES 2.0 as the core of its rendering pipeline, with textures as the way that frames of images or video are passed from one filter to the next. It's also much faster than Core Image, in my benchmarks.
You can supply your texture as an input here, so that it never has to touch the CPU. I don't have a stock class for grabbing raw textures from OpenGL ES yet, but you can modify the code for one of the existing GPUImageOutput subclasses to use this as a source fairly easily. You can then chain filters on to that, and direct the output to the screen or to a still image. At some point, I'll add a class for this kind of data source, but the project's still fairly new.
As of iOS 6, you can use a built-in init method for this situation:
initWithTexture:size:flipped:colorSpace:
See the docs:
http://developer.apple.com/library/ios/#DOCUMENTATION/GraphicsImaging/Reference/QuartzCoreFramework/Classes/CIImage_Class/Reference/Reference.html
You might find these helpful:
https://developer.apple.com/library/ios/#samplecode/RosyWriter/Introduction/Intro.html
https://developer.apple.com/library/ios/#samplecode/GLCameraRipple/Listings/GLCameraRipple_RippleViewController_m.html
In general I think the image data will need to be copied from the GPU to the CPU. However the iOS features mentioned above might make this easier and more efficient.