I'm trying to benchmark the loading of large images in Corona SDK.
local startTime = system.getTimer()
local myImage = display.newImageRect( "someImage.jpg", 1024, 768 )
local endTime = system.getTimer()
print( endTime - startTime ) -- prints 8.4319999999998
This returns values of around 8 ms. I know it takes longer to load and display an image, because if it really took 8 ms I wouldn't notice the delay, but I do. I'd say it takes about 300 ms.
The FPS also drops drastically when loading a large image. I'm monitoring this with an enterFrame listener, and while the image loads it prints values of around 0.3 for one frame.
local function onEnterFrame( event )
    print( display.fps )
end

Runtime:addEventListener( "enterFrame", onEnterFrame )
The frame takes a long time to render when loading, even though the call that loads the image returns in far less than 1/60 of a second. I guess that means the rendering is happening asynchronously somewhere else.
So, how can I measure the time it takes to really load and display an image?
Since Corona SDK is closed source, we'll have to use the docs and imagination.
I see three possibilities here:
1) Corona is doing what it says, and your subjective experience is wrong.
2) Corona is loading the images in a background thread, so the call to display.newImageRect is non-blocking: it "starts" loading the image and then continues. When this happens in other SDKs (mostly javascript-based ones) you get a "ready callback" that you can use on the image object, but I could not find such a thing in the docs.
3) Corona loads the image quickly, but requires "extra work" afterwards. For example, it generates lots of garbage which has to be garbage-collected. So the image gets loaded fast, but then this extra work slows down the app.
My bet is on 3), but it doesn't really matter: whichever of these options is causing the slowdowns, they can all be solved the same way. Instead of loading the images right before you draw them, you have to preload them.
I don't use Corona SDK, but a quick google pointed me to the storyboard module, in particular to storyboard.loadScene.
Create a new scene, list all the images that you need on it, and load it before showing it - that way image loading will be done in advance, not slowing down your app.
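A minimal sketch of what that could look like (hedged: the scene/file name "preload" is illustrative, and this follows the storyboard template from the docs rather than tested code):

-- preload.lua: a scene whose only job is to create the heavy images early
local storyboard = require( "storyboard" )
local scene = storyboard.newScene()

-- createScene runs when the scene is loaded, not when it is shown,
-- so the expensive texture load happens here, in advance
function scene:createScene( event )
    local group = self.view
    local bigImage = display.newImageRect( "someImage.jpg", 1024, 768 )
    group:insert( bigImage )
end

scene:addEventListener( "createScene", scene )
return scene

Then call storyboard.loadScene( "preload" ) early in main.lua, so the work is already done by the time you show the scene with storyboard.gotoScene( "preload" ).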
Most likely the image is rendered during the scene's rendering loop, and there is no event to indicate that an image has been rendered. However, if you create the display object in the scene's create event handler or in a button click handler, and register an enterFrame listener at the same time, you can measure the time between that moment and the first enterFrame event. I can't try this here, but my guess is that this will give you an estimate of the time to render the image - don't use FPS. A larger image will probably give you a larger measurement. If you measure the time between enterFrame events, you will probably find that it is much smaller than the time between the create/click event and the first frame event, or between the first two frame events after the create/click event. Post a comment if you would like to see some example code.
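In the meantime, here is a rough sketch of that measurement (illustrative and untested; it reuses the question's image):

local startTime

local function onFirstFrame( event )
    -- fires on the first frame rendered after the image was created
    Runtime:removeEventListener( "enterFrame", onFirstFrame )
    print( "create-to-first-frame: " .. ( system.getTimer() - startTime ) .. " ms" )
end

startTime = system.getTimer()
local myImage = display.newImageRect( "someImage.jpg", 1024, 768 )
Runtime:addEventListener( "enterFrame", onFirstFrame )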
I am creating a simple photo catalogue application for macOS to see whether the latest APIs can significantly improve performance of loading directories with large numbers of images.
So far it looks pretty promising, and loading thumbnails for around 600 45 MB RAW images using QLThumbnailGenerator and CGImageSourceCreateWithURL is super fast, allowing thumbnail images and image metadata to be displayed almost instantly.
Displaying these images in an NSCollectionView using a CALayer in the NSCollectionViewItem's view also appears to be extremely fast, and scrolling is very smooth.
I did find that QLThumbnailGenerator seems to start failing after a few hundred images, returning error code 108, if I call the API in a continuous loop - I fixed that by calling CGImageSourceCopyPropertiesAtIndex immediately after the thumbnail generator API call - so maybe there is a timing issue, or not enough file handles, or something, if the API is called too quickly and for too long.
However I am still having trouble rendering a full-sized image to the display - here I am using an NSScrollView with a layer-backed NSView as the documentView. Everything is super fast until the following call:
view.layer.contents = cgImage
And at this point the entire main thread hangs until the image has loaded - and this may take a few seconds.
Once it has loaded it's fine and zooming in and out by changing the documentView frame size is very fast - scrolling around the full size image is also super smooth without any of the typical hiccups.
Is there a way of loading these images without causing the UI to freeze ?
I've seen the recent WWDC 2020 session where they demonstrate similar scrolling of large numbers of images, but I haven't been able to find anything useful on loading large images other than CATiledLayer - and it's not really clear whether that is the right answer for this problem.
The old Apple sample RawExpose seemed to be an option, but most of that code is deprecated and it seems one now has to use MetalKit instead of GLKit - unfortunately there is no example of using MetalKit with Core Image that I can find.
FYI - I tried using some of the new SwiftUI CollectionView and List, but they seem to be significantly slower than AppKit, and I found some of the collection view items never render - of course these could just be bugs in the macOS 11 beta.
OK - well I finally figured it out, and it's complicated but simple. It's complicated because there are so many options to choose from and so many outdated sample apps to look at. In any event, I think I have solved most if not all of the issues related to using Metal-backed CALayers and rendering realtime updates of the images as CIFilter adjustments are applied. There are many pieces to the puzzle, and I'm happy to share if anyone is looking for help.
Some key pointers:
I am using CAMetalLayer and NSView
I override the CAMetalLayer display(layer:) method and call layer.setNeedsDisplay() when the user slides an adjustment slider.
I chain together all the CIFilters, including the RAW filter created with CIFilter(imageURL:)
Most importantly, I use the RAW filter's scaleFactor parameter to size the image - I encountered major performance issues using any other method to resize the image for the view's size
Don't expect high performance if the image is zoomed right in - 50% seems to be the limit for 45-megapixel RAW images from a Nikon D850.
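As a rough sketch of the drawing path described above (class and property names here are illustrative, error handling is omitted, and this is a compressed reconstruction that assumes the CIFilter chain's output lands in the image property):

import AppKit
import CoreImage
import Metal

final class ImageLayerView: NSView, CALayerDelegate {
    private let device = MTLCreateSystemDefaultDevice()!
    private lazy var queue = device.makeCommandQueue()!
    private lazy var ciContext = CIContext(mtlDevice: device)
    private let metalLayer = CAMetalLayer()
    var image: CIImage?   // output of the chained CIFilters, RAW filter included

    override init(frame: NSRect) {
        super.init(frame: frame)
        metalLayer.device = device
        metalLayer.pixelFormat = .bgra8Unorm
        metalLayer.framebufferOnly = false   // Core Image must write to the drawable
        metalLayer.delegate = self
        layer = metalLayer
        wantsLayer = true
    }
    required init?(coder: NSCoder) { fatalError() }

    // Invoked after metalLayer.setNeedsDisplay(), e.g. from a slider action
    func display(_ layer: CALayer) {
        guard let image = image,
              let drawable = metalLayer.nextDrawable(),
              let buffer = queue.makeCommandBuffer() else { return }
        ciContext.render(image,
                         to: drawable.texture,
                         commandBuffer: buffer,
                         bounds: CGRect(origin: .zero, size: metalLayer.drawableSize),
                         colorSpace: CGColorSpaceCreateDeviceRGB())
        buffer.present(drawable)
        buffer.commit()
    }
}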
A short video of the result is here https://youtu.be/5wp0CIWAoIM
I'm trying to understand why these Raster processes have such a long duration, but I'm finding the documentation to be sparse.
From other people's questions, I thought it might be related to the images being painted, or to JavaScript listeners, or to elements being repainted due to suboptimal CSS transitions, but removing the images, JavaScript, and CSS transitions didn't do the trick.
Would someone be able to point me in the right direction? How do I narrow down which elements or scripts are causing this long process? It's been two days and I'm making no headway.
Thanks!
The "Raster" section represents all activities related to painting. Any HTML page, after all, is an "image". The browser converts your DOM and CSS to the image to display it on a screen. You can read about it here. So even if you don't have any image on the page you still would see as a minimum one rasterizer thread in "Raster" which represents converting your HTML page to the "image".
By the way, Chrome(79.0.3945.79) provides some information if an image was initiated this thread.
Also, you can enable "Advanced paint instrumentation" in "Performance" settings to see in more details what is going on when the browser renders an image
After spending some hours with the same issue, I believe that the 4 ugly green rectangles called "Rasterize paint" are a bug in the profiler DISPLAY. My suspicion is based on:
1) The rectangles start some seconds after the profiler started, NOT after the page loaded, so it seems they are bound to the profiler, not to the page.
2) The starting point of the rectangles depends on the size of the profiling timeframe. If I capture 3 seconds they start after ~2 secs; if I capture 30 seconds they start after ~20 secs. So the "CPU load increase" depends on the time you press the stop button.
3) If I enable "Advanced paint instrumentation" as maksimr suggested, I can click on a rectangle to see the details, and the details show ~0.4 ms events in the "Paint profiler", just like before the death rectangles started (see screenshot, bottom right part).
3b) I can even click on different parts of the same rectangle, resulting in different ~0.4 ms long events in the Paint profiler...
In the Windows world, a dedicated render thread would loop something similar to this:
void RenderThread()
{
    while (!quit)
    {
        UpdateStates();
        RenderToDirect3D();
        // Can either present with no synchronisation,
        // or synchronise after 1-4 vertical blanks.
        // See docs for IDXGISwapChain::Present
        PresentToSwapChain();
    }
}
What is the equivalent in Cocoa with CAMetalLayer? All the examples deal with updates being done on the main thread, either using MTKView (with its internal timer) or using CADisplayLink in the iOS examples.
I want to be in control of the whole render loop, rather than just receiving a callback at some non-specified interval (and ideally blocking for V-Sync if it's enabled).
At some level, you're going to be throttled by the availability of drawables. A CAMetalLayer has a fixed pool of drawables available, and calling nextDrawable will block the current thread until a drawable becomes available. This doesn't imply you have to call nextDrawable at the top of your render loop, though.
If you want to draw on your own schedule without getting blocked waiting on a drawable, render to an off-screen renderbuffer (i.e., an MTLTexture with dimensions matching your drawable size), and then blit from the most-recently-drawn texture to a drawable's texture and present on whatever cadence you prefer. This can be useful for getting frame timings, but every frame you draw and then don't display is wasted work. It also increases the risk of judder.
Your options are limited when it comes to getting callbacks that match the v-sync cadence. Your best bet is almost certainly a CVDisplayLink, though this has caveats (for one, its callback arrives on a dedicated background thread, not your main thread).
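Setting one up looks roughly like this (a sketch; since the callback runs on the display link's own thread, it should signal your render thread rather than render directly):

static CVReturn DisplayLinkCallback(CVDisplayLinkRef displayLink,
                                    const CVTimeStamp *now,
                                    const CVTimeStamp *outputTime,
                                    CVOptionFlags flagsIn,
                                    CVOptionFlags *flagsOut,
                                    void *context)
{
    // Wake the render thread here (semaphore, condition variable, etc.)
    return kCVReturnSuccess;
}

// At setup time:
CVDisplayLinkRef displayLink;
CVDisplayLinkCreateWithActiveCGDisplays(&displayLink);
CVDisplayLinkSetOutputCallback(displayLink, &DisplayLinkCallback, (__bridge void *)self);
CVDisplayLinkStart(displayLink);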
You could use something like a counting semaphore in concert with a display link if you want to free-run without getting too far ahead.
If your application is able to maintain a real-time framerate, you'll normally be rendering a frame or two ahead of what's going on the glass, so you don't want to literally block on v-sync; you just want to inform the window server that you'd like presentation to match v-sync. On macOS, you do this by setting the layer's displaySyncEnabled to true (the default). Turning this off may cause tearing on certain displays.
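That is, only if you deliberately want unsynchronized presents on macOS would you do:

// macOS 10.13+: present without waiting for the vertical blank (may tear)
layer.displaySyncEnabled = NO;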
At the point where you want to render to screen, you obtain the drawable from the layer by calling nextDrawable. You obtain the drawable's texture from its texture property. You use that texture to set up the render target (color attachment) of a MTLRenderPassDescriptor. For example:
id<CAMetalDrawable> drawable = [layer nextDrawable];
id<MTLTexture> texture = drawable.texture;
MTLRenderPassDescriptor *desc = [MTLRenderPassDescriptor renderPassDescriptor];
desc.colorAttachments[0].texture = texture;
From here, it's pretty similar to what you do in an MTKView's drawRect: method. You create a command buffer (if you don't already have one), create a render command encoder using the descriptor, encode drawing commands, end encoding, tell the command buffer to present the drawable (using a -presentDrawable:... method), and commit the command buffer. Whatever was drawn to the drawable's texture is what will end up on-screen when it's presented.
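Continuing the snippet above, the rest of a frame might look like this (a sketch; it assumes a commandQueue created once at startup):

id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];
id<MTLRenderCommandEncoder> encoder =
    [commandBuffer renderCommandEncoderWithDescriptor:desc];
// ... encode your drawing commands here ...
[encoder endEncoding];
[commandBuffer presentDrawable:drawable];
[commandBuffer commit];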
I agree with Warren that you probably don't really want to sync your loop with the display refresh. You want parallelism. You want the CPU to be working on the next frame while the GPU is rendering the most current frame (and the display is showing the last frame).
The fact that there's a limit on how many drawables may be in flight at once and that nextDrawable will block waiting for one will prevent your render loop from getting too far ahead. (You'll probably use some other synchronization before that, like for managing a small pool of buffers.) If you want only double-buffering and not triple-buffering, you can set the layer's maximumDrawableCount to 2 instead of its default value of 3.
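For completeness, the counting-semaphore pattern mentioned earlier usually looks something like this (a sketch of the general pattern, not code from either answer):

// Created once; lets the CPU get at most 3 frames ahead of the GPU
dispatch_semaphore_t frameSemaphore = dispatch_semaphore_create(3);

// Each time around the render loop:
dispatch_semaphore_wait(frameSemaphore, DISPATCH_TIME_FOREVER);
// ... obtain the drawable, encode, call presentDrawable:, then:
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
    dispatch_semaphore_signal(frameSemaphore);   // GPU finished this frame
}];
[commandBuffer commit];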
I am developing an A-Frame project on my MacBook Pro, late 2013. When running the project, the fan of my computer always spins fast, regardless of which browser I use (Firefox, Safari, Chrome) and the project size (it also happens with a project just containing a simple a-box).
aframe-stats shows me that my project (1028244 vertices, 342748 faces) still runs at 20 fps.
Is it somehow possible to limit the frame rate to 10 fps in order to keep my computer quiet? Or is there any other way to limit the FLOP consumption of the A-Frame project? I already tried a native approach with sudo cputhrottle plugin-container 10, but that throttled not just the A-Frame renderer but the whole Firefox browser. Can I pull the brake somewhere in the JavaScript or the browser settings?
It's difficult to say without your project code. Large data sets will simply max out even a high-spec MacBook Pro. I have found it helpful to pause any rendering whenever possible to quiet the users' machines.
I personally removed automated next-animation-frame rendering in favor of waiting for controls and objects to change.
For example:
this.controls.addEventListener( 'change', function(e){ addToRenderStack(); });
A simple function addToRenderStack puts a new value in a list for a render, with the expectation that the render will occur at some point in the future and not right away. The list can also be used to log who requested the render in the call stack, and so narrow down performance hogs.
addToRenderStack places a render request in a list. In the requestAnimationFrame loop, if the list has any length, a render is called on the scene. The stack is immediately cleared rather than processed one by one. If controls or animations continue to make render requests, the list will have a length again, and the requestAnimationFrame loop will process them in the same way with another render.
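A stripped-down version of that pattern (illustrative names; it assumes the usual three.js renderer, scene and camera objects):

var renderStack = [];

function addToRenderStack( requester ) {
    // remember who asked, so performance hogs can be traced later
    renderStack.push( requester || 'unknown' );
}

function loop() {
    requestAnimationFrame( loop );
    if ( renderStack.length > 0 ) {
        renderStack.length = 0;            // clear all pending requests at once
        renderer.render( scene, camera );  // one render satisfies every request
    }
}
loop();

// e.g. this.controls.addEventListener( 'change', function () { addToRenderStack( 'controls' ); } );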
In this way, the code only renders when absolutely required. This saved me much grinding on framerate, and the fans only come on during intensive operations and then shut down when it's complete, much like a typical 3D game experience.
Your mileage may vary depending on what's happening in your app. I work in engineering so often the view of the 3d world is stopped as an engineer examines or shows a model.
I am hitting low FPS in one of the applications I am working on. I found that GPUView can be used for debugging graphics performance issues, and I have collected a Merged.etl file for the use case. This shows an FPS chart for my application. I am trying to understand the correlation between this chart and the GPU Hardware Queue and CPU Context Queue. Basically, I want to know how this FPS chart has been derived. If there is an event that can trace this information, I am thinking of adding real-time tracing of this event so that I can display the FPS as a widget while I run my application, something similar to http://blogs.msdn.com/b/jgoldb/archive/2008/09/26/etw-event-tracing-in-wpf.aspx
Look for D3D9-Present events, or DX-Flip for DX1x (DX10/11). Zoom to a significant span of time, say 1 second. Press Ctrl+E to open the event viewer, and in it select the D3D9-Present or DX-Flip events. The left pane in the event viewer gives you the number of these events, and you already have the time span you zoomed to. Dividing the number of events by the time gives you the average FPS for that duration - for example, 120 Present events over a 2-second span averages out to 60 FPS. GPUView itself might use a running average or an instantaneous value (1/deltaT, where deltaT is the time between two events).
Can you try this little math and post some results for readers :)
The FPS chart is derived from the "D3D9 - Present" event (there are similarly named events for D3D10/11).
The "D3D9 - Present" event is produced when the D3D runtime calls into the user-mode graphics driver's Present interface.
The FPS chart in GPUView only counts Present calls, but some applications, like Windows Media Player, call Present twice per frame when DWM is off, to stay tearing-free: once to present the top half and once to present the bottom half. In that case the FPS in GPUView is actually double the real FPS. To get the real FPS, you also need to look at the "DX - Blt" event: if there is "Present Blt" information in a "DX - Blt" event, it indicates that the blt is for a Present; then look at the Source Rect and Dest Rect information in that event to determine whether the Present covers a full frame or only part of one.