Windows GPUView - windows-7

Windows GPUView - windows-7

I am hitting low FPS on one of the application that I am working on. I found that GPUView can be used for debugging graphics performance issues. I have collected Merged.etl file for use case. This shows FPS chart for my application. I am trying to understand co-relation between this chart and GPU Hardware Queue and CPU Context Queue. Basically I want to know how this FPS chart has been derived? If there is any event that can trace this information I am thinking of adding real time tracing of this event so that I can display the FPS as widget while I run my application, something similar to http://blogs.msdn.com/b/jgoldb/archive/2008/09/26/etw-event-tracing-in-wpf.aspx

look for D3D9-Present, and DX-Flip for DX1X. span around for significant amount of time say 1 second. Press Ctrl+E, which will show event viewer, from that, select D3D9-Present or DX-Flip events. Left pane in the event viewer will give you number of these events and you already have the time span you have Zoomed to. I think dividing the number of events by the time would give you average fps for that duration. How gpu view does it, might be running average or instantaneous (1/ deltaT), where deltaT may be time between two events.
Can you try this little math and post some results for reader :)

The FPS chart is derived from the event "D3D9 - Present"(maybe other similar name for d3d10/11)
the "D3D9 - Present" event is produced when the D3D runtime calls into the user mode graphic driver's interface Present
The FPS chat in GPUview only consider the counter of Present called, but for some application, like windows media player, it will call Present twice for one frame in DWM off for tearing free, first one to present top half, and second one to present bottom half. In this case, the FPS in GPUview actually is double of real FPS. To get real FPS, you also need look the "DX - blt" event, if there are "Present Blt" information in "DX - Blt" event, it indicates the "DX-blt" is for "Present", and then look the Source Rect and Dest Rect information in this event to determine whether the Present is for a full frame or a part of frame.

Related

How to optimize Raster (Rasterizer Thread) as seen in the Chrome Dev Tools Performance tab?

I'm trying to understand why these Raster processes have such a long duration, but I'm finding the documentation to be sparse.
From other people's questions, I thought it might be related to the images being painted, or javascript listeners, or elements being repainted due to suboptimal CSS transitions but removing the images, javascript, and CSS transitions didn't do the trick.
Would someone be able to point me in the right direction? How do I narrow down which elements or scripts are causing this long process? It's been two days and I'm making no headway.
Thanks!

The "Raster" section represents all activities related to painting. Any HTML page, after all, is an "image". The browser converts your DOM and CSS to the image to display it on a screen. You can read about it here. So even if you don't have any image on the page you still would see as a minimum one rasterizer thread in "Raster" which represents converting your HTML page to the "image".
By the way, Chrome(79.0.3945.79) provides some information if an image was initiated this thread.
Also, you can enable "Advanced paint instrumentation" in "Performance" settings to see in more details what is going on when the browser renders an image

After spending some hours with the same, I believe that the 4 ugly green rectangles called "Rasterize paint" are a bug in the profiler DISPLAY. My suspicion based on:
1) The rectangles start some senconds after the profiler started. NOT after the page loaded, so it seems it is bound to the profiler, not to the page.
2) The starting point of the rectangles depends on the size of the profiling timeframe. If I capture 3 seconds it starts after ~2 secs, if I capture 30 seconds it starts after ~20 secs. So the "cpu load increase" depends on the time you press the stop button.
3) If I enable "Advanced paint instrumentation" as maksimr suggested, I can click on the rectangle to see the details, and the details show ~0.4 ms events in the "Paint profiler", just like before the death rectangles started. (see screenshot, bottom right part)
3b) I even can click on different parts of the same rectangle, resulting different ~0.4 ms long events in the Paint profiler...

Limit the frame rate on an aframe project

I am developing an aframe project on my MacBook pro, late 2013. When running the project, the fan of my computer always spins fast, regardless which browser I use (firefox, safari, chrome) and the project size (also happens with a project just containing a simple a-box).
aframe-stats show me that my project (1028244 vertices, 342748 faces) still runs with 20 fps.
Is it somehow possible to limit the frame rate to 10fps in order to keep my computer quite? Or any other way to limit the flop-consumption of the aframe project? I already tried a native approach with sudo cputhrottle plugin-container 10 but that did not just throttle the aframe-renderer but the whole firefox browser. Can I pull the break somewhere in the JavaScript or the Browser settings?

It's difficult to say without your project code. Large data sets will simply crank out even a high spec macbook pro. I have found it helpful to pause any rendering whenever possible to quiet the users' machines.
I personally removed automated next animation frame rendering in favor of waiting for controls and objects to change.
For example:
this.controls.addEventListener( 'change', function(e){ addToRenderStack(); });
A simple function addtorenderstack puts in a new value in a list for a render, with the expectation that the render will occur at some point in the future and not right away. the list can also be used to log who requested the render in the call stack, and narrow down performance hogs.
addtorenderstack places a render request in a list. In the requestanimationframe loop, if the list has any length, a render is called on the scene. The stack is immediately cleared rather than processed one by one. If controls or animations continue to make render requests, the list will have a length again and request animationframe will process them in the same way with another render.
In this way, the code only renders when absolutely required. This saved me much grinding on framerate and the fans only come on during intensive operations and then shutdown when its complete, much like a typical 3d game experience.
Your mileage may vary depending on what's happening in your app. I work in engineering so often the view of the 3d world is stopped as an engineer examines or shows a model.

Calling functions only after drawing is completed

I am making a drawing on NSView using a timer that is set to update every .02 seconds. On update some physical simulation makes a step, and then Canvas!.needsDisplay = true. It works when app is in foreground (usually), but when some lags happen, simulation progresses forward despite the fact that view hasn't reflected it yet. How do I pause the timer during these times to make simulation happen only when NSView can show it? I do not want to call step_over from inside drawRect, cause it seems like a bad idea, because then it would be harder to stop the simulation.

Generally this kind of update should be done the other way around, by letting the display ask you for frames as it can display them. This is done with a CADisplayLink CVDisplayLink (this is Mac; CADisplayLink is iOS). Configure it with a method you want to be called when a frame can be drawn.
Generally you do want your simulation to keep moving forward, even if it means dropping frames occasionally. For that, you check the timestamp and use that to work out what time to use for your new frame. But if you only want to move forward when the display can show it, then just update once per call.
Note that generating at 50fps is often going to mismatch the system that's trying to draw at 60fps, so you're going to wind up missing frames occasionally. That's one of several reasons not to try to push drawing with a timer.
See also Alternative of CADisplayLink for Mac OS X. Note that trying to draw at 50fps with Core Graphics usually isn't going to give good results in any case. The right tool here in OS X is Core Animation (or SpriteKit for games on 10.10, or OpenGL for more advanced high-speed rendering). You can do very basic animations with an NSTimer (and we did for years before Core Animation came along), but it's not really a tool for complex drawing.

measure elapsed time of image loading in Corona SDK

I'm trying to benchmark the loading of large images in Corona SDK.
local startTime = system.getTimer()
local myImage = display.newImageRect( "someImage.jpg", 1024, 768 )
local endTime = system.getTimer()
print( endTime - startTime ) -- prints 8.4319999999998
This returns values of around 8 ms. I know it takes longer to load an display an image because if it really took 8 ms I wouldn't notice the delay, but I do notice it. I'd say it takes about 300 ms.
Also the FPS drop drastically when loading a large image. I'm monitoring this using an enterFrame event and when loading the image it prints values of around 0.3 for 1 frame.
Runtime:addEventListener( "enterFrame", myListener )
function onEnterFrame (event)
print( display.fps )
end
The frame takes a long time to render when loading, even when the loading of the image takes less than 1/60 of a second. I guess it means the rendering is happening asynchronously somewhere else.
So, how can I measure the time it takes to really load and display an image?

Since Corona SDK is closed source, we'll have to use the docs and imagination.
I see three possibilities here:
Corona is doing what it says, and your subjective experience is wrong.
Corona is loading the images in a background thread, so the call to display.newImageRect is non-blocking: it "starts" loading the image, and then continues. When this happens in other SDKs (mostly javascript-based ones) you get a "ready callback" that you can use on the image object, but I could not find such thing in the docs.
Corona loads the image quickly, but requires "extra work afterwards". For example, it generates lots of garbage which has to be garbage-collected. So the image gets loaded fast, but then this "extra work" slows down the app.
My bet is on 3. But this doesn't really matter. Independently of which one of these options is causing the slowdowns, they can be solved the same way: instead of loading the images right before you draw them, you have to preload them.
I don't use Corona SDK, but a quick google pointed me to the storyboard module, in particular to storyboard.loadScene.
Create a new scene, list all the images that you need on it, and load it before showing it - that way image loading will be done in advance, not slowing down your app.

Most likely the image is rendered during the scene's rendering loop. There is no event to indicate that an image has been rendered. However if you create the display object in the scene's create event handler or a button click handler, and register an enterFrame event handler, you can measure the time between that and the first frame event. I can't try this here but my guess is this will give you an estimate of the time to render the image. Dont use FPS. Larger image will probably give you a larger measurement. If you measure the time between enterFrame events you will probably find that it is much smaller than the time between create/click event and the first frame event, or between the first two frame events after the create/click event. Post a comment if you would like to see some example code.

How does software like GotoMeeting capture an image of the desktop?

I was wondering how do software like GotoMeeting capture desktop. I can do a full screen (or block by block) capture using GDI but that just seems too wasteful to me. Also I have looked into Mirror devices but I was wondering if there's a simpler technique or a library out there which does this.
I need fast and efficient desktop screen capture (10p15 fps) which I am eventually going to convert into a video file and integrate with my application to send the captured feed over the network or something.
Thanks!

Yes, taking a screen capture and finding the diff between previous capture would be a good way to reduce bandwidth on transmission by sending the changes across only, of course this is similar to video encoding techniques which does this block by block.
Still means you need to do a capture plus extra processing for getting the difference, i.e, encoding it.

by using the Mirror devices you can get both the updated Rectangle that are change and also pointer to the Screen. updated Rectangle pointer point to all the rectangle that are change , these rectangle are the change rectangle that are frequently changing. filter out some of the rectangle because in one second you can get 1000 of rectangles.

I would either:
Do full screen captures, and then
perform image processing to isolate
parts of the screen that have changed
to save bandwidth.
-OR-
Use a program like CamStudio.

i got 20 to 30 frame per second using memory driver i am display on my picture box but when i get full screen update then these frame are buffered. as picture box is slow and have change to my own component this is some how fast but not good on full screen as well averge i display 10 fps in full screen. i am facing problem in rendering frames i can capture 20 to 30 frames per second but my rendering is 8 to 10 full screen frames in one second . if any one has achive rendering frames in full screen please replay me.

What language?
.NET provides Graphics.CopyFromScreen.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Windows GPUView - windows-7

Related

How to optimize Raster (Rasterizer Thread) as seen in the Chrome Dev Tools Performance tab?

Limit the frame rate on an aframe project

Calling functions only after drawing is completed

measure elapsed time of image loading in Corona SDK

How does software like GotoMeeting capture an image of the desktop?

Categories

Resources