Chrome WebGL performance wildly inconsistent?

function render(time, scene) {
  if (useFramebuffer) {
    gl.bindFramebuffer(gl.FRAMEBUFFER, scene.fb);
  }
  gl.viewport(0, 0, canvas.width, canvas.height);
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
  gl.enable(gl.DEPTH_TEST);
  renderScene(scene);
  gl.disable(gl.DEPTH_TEST);
  if (useFramebuffer) {
    // The scene went to the offscreen framebuffer; copy it to the canvas.
    gl.bindFramebuffer(gl.FRAMEBUFFER, null);
    copyFBtoBackBuffer(scene.fb);
  }
  window.requestAnimationFrame(function(time) {
    render(time, scene);
  });
}
I'm not able to share the exact code I use, but the mockup above illustrates my point.
I'm rendering a fairly complex scene and am also doing some ray tracing in WebGL. I've noticed two very strange performance issues.
1) Inconsistent frame rate between runs.
Sometimes, when the page starts, the first ~100 frames render in 25 ms each, then the frame time suddenly jumps to 45 ms, without any user input or changes to the scene. I'm not updating any buffer or texture data during a frame, only shader uniforms. When this happens, GPU memory usage stays constant.
2) Rendering to the default framebuffer is slower than using an extra pass.
If I render to a framebuffer I created and then blit it to the HTML canvas (the default framebuffer), I get a ~10% performance increase. So in the code snippet, performance is gained when useFramebuffer == true, which seems very counterintuitive.
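For reference, copyFBtoBackBuffer in the mockup could be as simple as a blit, assuming a WebGL2 context (with WebGL1 it would have to be a textured full-screen quad instead); this is just a sketch, not the actual code:

// Hypothetical sketch of copyFBtoBackBuffer, assuming WebGL2.
function copyFBtoBackBuffer(fb) {
  gl.bindFramebuffer(gl.READ_FRAMEBUFFER, fb);    // source: the offscreen framebuffer
  gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, null);  // destination: the canvas (default framebuffer)
  gl.blitFramebuffer(
    0, 0, canvas.width, canvas.height,            // source rectangle
    0, 0, canvas.width, canvas.height,            // destination rectangle
    gl.COLOR_BUFFER_BIT, gl.NEAREST);
}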
Edit 1:
Due to changes in requirements, the scene will always be rendered to a framebuffer and then copied to the canvas. This makes question 2) a non-issue.
Edit 2:
System specs of the PC this was tested on:
OS: Win 10
CPU: Intel i7-7700
GPU: Nvidia GTX 1080
RAM: 16 GB
Edit 3:
I profiled the scene using chrome://tracing. The first ~100-200 frames render in 16.6 ms.
Then it starts dropping frames.
I'll try profiling everything with timer queries, but I'm afraid that each render actually takes the same amount of time, and that the buffer swap randomly takes twice as long.
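For reference, this is roughly what the timer-query profiling would look like, assuming a WebGL2 context where the EXT_disjoint_timer_query_webgl2 extension is available (it isn't on every browser/GPU combination); renderScene and scene are the names from the mockup above:

// Sketch of per-frame GPU timing via EXT_disjoint_timer_query_webgl2.
const timerExt = gl.getExtension('EXT_disjoint_timer_query_webgl2');
let gpuQuery = null;

function renderTimed(time, scene) {
  if (timerExt && !gpuQuery) {
    gpuQuery = gl.createQuery();
    gl.beginQuery(timerExt.TIME_ELAPSED_EXT, gpuQuery);
    renderScene(scene);
    gl.endQuery(timerExt.TIME_ELAPSED_EXT);
  } else {
    renderScene(scene);
  }

  // Results arrive a few frames later; poll instead of stalling the pipeline.
  if (gpuQuery &&
      gl.getQueryParameter(gpuQuery, gl.QUERY_RESULT_AVAILABLE) &&
      !gl.getParameter(timerExt.GPU_DISJOINT_EXT)) {
    const ns = gl.getQueryParameter(gpuQuery, gl.QUERY_RESULT);
    console.log('GPU frame time: ' + (ns / 1e6).toFixed(2) + ' ms');
    gl.deleteQuery(gpuQuery);
    gpuQuery = null;
  }

  window.requestAnimationFrame(function(t) { renderTimed(t, scene); });
}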
Another thing I noticed is that this starts happening after I've been using Chrome for a while. Once the problems start, clearing the browser cache or killing the Chrome process doesn't help; only a system reboot does.
Is it possible that Chrome is throttling the GPU on a whim?
P.S.
The frame times changed because of some optimizations, but the core problem persists.

Related

WebGL scene doesn't render because of lost context

I have a 3D model of a road, with and without textures.
When I load the road without textures, everything works fine (~60 fps). But when I load the road with textures, there are two cases:
1) If the 3D model is not big, it loads and works, but at a very low frame rate (~10-20 fps).
2) If the 3D model is big, it loads without any errors or warnings, but after that I get this error:
WebGL: CONTEXT_LOST_WEBGL: loseContext: context lost
THREE.WebGLShader: gl.getShaderInfoLog() null
The error occurs here:
animate = function() {
  renderer.render(scene, camera); // <--- error occurs here
  stats.update();
  controls.update(clock.getDelta());
  return requestAnimationFrame(animate);
};
I've read that this error means 'the browser or the OS decides to reset the GPU to get control back', but how can I solve this problem?
1. The reason
It's exactly what you already said is happening.
Your browser hangs while rendering the WebGL scene and loses its context because it has effectively lost control of your GPU.
Either you have a huge memory leak in your application or your machine does not have enough power to run your textured model in a WebGL scene. Squeezing too much out of the GPU by trying to render a heavy object with very high-resolution textures can lead to a context loss.
2. Diagnosis
If 3D model is not big then it loads and works but with very low fps ~ 10-20
makes me think your machine actually can't handle 3D in a browser very well.
3. Troubleshooting
The first piece of advice is to decrease the resolution of your scene; you can do this by dividing the setSize of your renderer object by 2 or 3.
For performance-intensive games, you can also give setSize smaller values, like window.innerWidth/2 and window.innerHeight/2, for half the resolution. This does not mean that the game will only fill half the window; it will just look a bit blurry and scaled up.
This means you need to add this to your renderer:
renderer.setSize( window.innerWidth/2, window.innerHeight/2 );
Tweaking the camera's far rendering distance also helps to gain some performance. These are the most commonly used values: new THREE.PerspectiveCamera( 75, window.innerWidth / window.innerHeight, 0.1, 1000 ); if you bring the far plane down to 800 or 700, you can squeeze extra FPS out of your scene (at the price of rendering distance, of course).
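One way to combine the two tweaks (the numbers are just examples; the third setSize argument, false, skips updating the canvas CSS size so the scaled-up canvas still fills the window):

// Render at half resolution, display at full window size, shorten the far plane.
var renderer = new THREE.WebGLRenderer();
renderer.setSize( window.innerWidth/2, window.innerHeight/2, false );
renderer.domElement.style.width  = '100%';
renderer.domElement.style.height = '100%';
document.body.appendChild( renderer.domElement );

var camera = new THREE.PerspectiveCamera( 75, window.innerWidth / window.innerHeight, 0.1, 800 );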
If your application starts running fine after these tweaks, then you're actually facing resource-starvation problems, which means either your computer is not fit to run WebGL or you have a huge memory leak.
You can also test your application on another, better computer and see how it performs and how smooth it runs.
If your computer is a cutting-edge, high-end gaming machine, then the only thing I can suggest is to also look at the resolution of your textures and scale them down a bit.
I'll also leave this link: WebGL - HandlingContextLost (you have probably already seen it), which provides some troubleshooting steps and ways to recover a crashed WebGL instance.
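For completeness, a minimal sketch of the standard recovery hooks (the event names come from the WebGL spec; init, startRendering and animationHandle are placeholders for your own setup function, render-loop starter and requestAnimationFrame handle):

// Sketch of handling a lost/restored WebGL context.
canvas.addEventListener('webglcontextlost', function(event) {
  event.preventDefault();                 // signal that we intend to recover
  cancelAnimationFrame(animationHandle);  // stop the render loop
}, false);

canvas.addEventListener('webglcontextrestored', function() {
  init();            // all GPU resources are gone; recreate them
  startRendering();  // then restart the render loop
}, false);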
4. Solution (to this specific question)
After a quick chat with Eugene, the problem at the root of his project turned out to be RAM usage: his 3D model used so much RAM that Chrome was taking up to 1.6 GB.
The blue bumps are when his WebGL app is running.
After flagging this to him he came back with his solution:
I've solved my problem. I've concatenated the roads into one file and changed the renderer size.

No smooth animation for Processing sketch, yet normal GPU/CPU load and framerate

I'm working on the visualizations for an interactive installation, as seen here: http://vimeo.com/78977964. But I'm running into some issues with the smoothness of the animation. While it tells me it runs at a steady 30 or 60 fps, the actual image is not smooth at all; imagine a 15 fps animation with an unsteady clock. Can you give me some pointers on where to look when optimizing my sketch?
What I'm doing is receiving relative coordinates (0.-1. on the x and y axes) through oscP5. These go through a data handler that checks that there hasn't been input in that area for a certain amount of time. If all is OK, a new Wave object is created, which draws an expanding (modulated) circle at its location. As the installation had to be very flexible, all visual parameters are adjustable through a controlP5 GUI.
All of this is running on a computer with an i7-3770 @ 3.4 GHz, 8 GB RAM and two Radeon HD 7700s driving 4 to 10 Panasonic EX600 XGA projectors over VGA (simply drawing a 3072x1536 window). The CPU and GPU load is reasonable ( http://imgur.com/a/usNVC ), but the performance is not what we want it to be.
We tried a number of solutions, including changing the rendering mode, trying a different GPU, different drawing methods, changing process priority, and exporting to an application, but nothing made a noticeable improvement. So now I'm guessing it's either just Processing/Java not being able to run smoothly over multiple monitors, or something in my code is causing this...
This is how I draw the waves within the wave class (called from the main draw loop for every wave object):
public void draw() {
  this.diameter = map(this.frequency, lowLimitFrequency, highLimitFrequency, speedLowFreq, speedHighFreq) * (millis()-date)/5f;
  strokeWeight(map(this.frequency, lowLimitFrequency, highLimitFrequency, lineThicknessLowFreq, lineThicknessHighFreq)*map(this.diameter, 0, this.maxDiameter, 1., 0.1)*50);
  stroke(255, 255, 255, constrain((int)map(this.diameter, 0, this.maxDiameter, 255, 0), 0, 255));
  pushMatrix();
  beginShape();
  translate(h*this.x*width, v*this.y*height);
  // this draws a circle from line segments, modulated by a sine wave
  for (int i = 0; i < segments; i++) {
    vertex(
      (this.distortion*sin(map(i, 0, segments, 0, this.periods*TWO_PI))+1) * this.diameter * sin(i*TWO_PI/segments),
      (this.distortion*sin(map(i, 0, segments, 0, this.periods*TWO_PI))+1) * this.diameter * cos(i*TWO_PI/segments)
    );
  }
  // repeat the first vertex to close the circle
  vertex(
    (this.distortion*sin(map(0, 0, segments, 0, this.periods*TWO_PI))+1) * this.diameter * sin(0*TWO_PI/segments),
    (this.distortion*sin(map(0, 0, segments, 0, this.periods*TWO_PI))+1) * this.diameter * cos(0*TWO_PI/segments)
  );
  endShape();
  popMatrix();
}
I hope I've provided enough information to grasp what's going wrong!
My colleagues and I have had similar issues here running a PowerWall (6x3 monitors) from one PC using an Eyefinity setup. The short version is that, as you've discovered, there are a lot of problems running Processing sketches across multiple cards.
We've tended to work around it with a different approach: multiple copies of the application, each spanning one monitor only, rendering a subsection and syncing themselves up. This is the approach people tend to use when driving large displays from multiple machines, but it seems to sidestep these framerate problems as well.
For Processing, there are a couple of libraries that support this: Dan Shiffman's Most Pixels Ever and the Massive Pixel Environment from the Texas Advanced Computing Center. They both have reasonable examples that should help you through the setup phase.
One proviso, though: we kept encountering crashes from JOGL when we tried this with OpenGL rendering. That was about 6 months ago, so maybe it's fixed now. Your draw loop looks like it'll be OK using Java2D, so hopefully that won't be an issue for you.

OpenGL ES rendering to user-space memory

I need to implement off-screen rendering to a texture on an ARM device with PowerVR SGX hardware.
Everything is done (pixel buffers and the OpenGL ES 2.0 API were used). The only unsolved problem is the very slow glReadPixels function.
I'm not an expert in OpenGL ES, so I'm asking the community: is it possible to render textures directly into user-space memory? Or maybe there is some way to get the hardware address of a texture's memory region? Some other technique (EGL extensions)?
I don't need a universal solution, just a working one for PowerVR hardware.
Update: A little more information on the 'slow glReadPixels function'. Copying 512x512 RGB texture data to CPU memory:
glReadPixels(0, 0, WIDTH, HEIGHT, GL_RGBA, GL_UNSIGNED_BYTE, &arr) takes 210 ms,
glReadPixels(0, 0, WIDTH, HEIGHT, GL_BGRA, GL_UNSIGNED_BYTE, &arr) takes 24 ms (GL_BGRA is not standard for glReadPixels; it's a PowerVR extension),
memcpy(&arr, &arr2, WIDTH * HEIGHT * 4) takes 5 ms.
For bigger textures, the differences are bigger too.
Solved.
Here is how to force PowerVR hardware to render into user-allocated memory:
http://processors.wiki.ti.com/index.php/Render_to_Texture_with_OpenGL_ES#Pixmaps
An example of how to use it:
https://gforge.ti.com/gf/project/gleslayer/
After all of this, I can get the rendered image in as little as 5 ms.
When you call OpenGL functions, you're queuing commands into a render queue. Those commands are executed by the GPU asynchronously. When you call glReadPixels, the CPU must wait for the GPU to finish its rendering, so the call might be stalled waiting for that draw to finish. On most hardware (at least the hardware I work on), memory is shared by the CPU and the GPU, so the readback itself should not be that slow once rendering is done.
If you can wait for the result, or defer it to the next frame, you might not see that delay anymore.
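A minimal sketch of that deferral, written in WebGL-style JavaScript to match the rest of this page (the OpenGL ES 2.0 C calls are analogous); makeFbo, renderScene, WIDTH, HEIGHT and pixels are placeholders:

// Read back the frame rendered one frame ago, so the GPU has had time to finish it.
var fbos = [makeFbo(), makeFbo()];
var frameIndex = 0;

function frame() {
  var renderFbo = fbos[frameIndex % 2];
  var readFbo   = fbos[(frameIndex + 1) % 2];  // was rendered during the previous frame

  gl.bindFramebuffer(gl.FRAMEBUFFER, renderFbo);
  renderScene();                               // placeholder for the actual drawing

  if (frameIndex > 0) {
    gl.bindFramebuffer(gl.FRAMEBUFFER, readFbo);
    gl.readPixels(0, 0, WIDTH, HEIGHT, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
  }
  frameIndex++;
}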
Framebuffer objects are what you are looking for. They are supported in OpenGL ES, and on PowerVR SGX.
EDIT:
Keep in mind that GPU/CPU hardware is heavily optimized towards moving data in one direction, from the CPU side to the GPU side. The path back from GPU to CPU is often much slower (it's just not a priority to spend hardware resources on). So whatever technique you use (e.g. FBO/getTexImage), you're going to run up against this limit.

glTexSubImage2D extremely slow on Intel video card

My video card is Mobile Intel 4 Series. I'm updating a texture with changing data every frame; here's my main loop:
for (;;) {
  Timer timer;
  glBindTexture(GL_TEXTURE_2D, tex);
  glBegin(GL_QUADS); ... /* draw textured quad */ ... glEnd();
  glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 512, 512,
                  GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, data);
  swapBuffers();
  cout << timer.Elapsed();
}
Every iteration takes 120 ms. However, inserting glFlush before glTexSubImage2D brings the iteration time down to 2 ms.
The issue is not in the pixel format. I've tried the pixel formats BGRA, RGBA and ABGR_EXT together with the pixel types UNSIGNED_BYTE, BYTE, UNSIGNED_INT_8_8_8_8 and UNSIGNED_INT_8_8_8_8_EXT. The texture's internal pixel format is RGBA.
The order of calls matters. Moving the texture upload before the quad drawing, for example, fixes the slowness.
I also tried this on a GeForce GT 420M card, and it works fast there. My real app does have performance problems on non-Intel cards that are fixed by glFlush calls, but I haven't distilled those into a test case yet.
Any ideas on how to debug this?
One issue is that glTexImage2D performs a full reinitialization of the texture object. If only the data changes, but the format remains the same, use glTexSubImage2D to speed things up (just a reminder).
The other issue is that, despite its name, immediate mode, i.e. glBegin(…) … glEnd(), is not synchronous: the drawing calls return long before the GPU is done drawing. Adding a glFinish() will synchronize, but so will any call that modifies data still required by queued operations. So in your case glTexImage2D (and glTexSubImage2D) must wait for the drawing to finish.
Usually it's best to do all volatile resource uploads either at the beginning of the drawing function, or during the SwapBuffers block in a separate thread, through buffer objects. Buffer objects were introduced for that very reason: to allow asynchronous yet tight operation.
I assume you're actually using that texture for one or more of your quads?
Uploading textures is one of the most expensive operations possible. Since your texture data changes every frame, the upload is unavoidable, but you should try to do it when the texture isn't in use by shaders. Remember that glBegin(GL_QUADS); ... glEnd(); doesn't actually draw quads; it requests that the GPU render the quads. Until the rendering completes, the texture will be locked. Depending on the implementation, this might cause the texture upload to wait (à la glFlush), but it could also cause the upload to fail, in which case you've wasted megabytes of PCIe bandwidth and the driver has to retry.
It sounds like you already have a solution: upload all new textures at the beginning of the frame. So what's your question?
NOTE: Intel integrated graphics are horribly slow anyway.
When you make a draw call (glDrawElements, etc.), the driver simply adds the call to a buffer and lets the GPU consume these commands when it can.
If this buffer had to be consumed entirely at glSwapBuffers, the GPU would be idle after that, waiting for you to send new commands.
Drivers solve this by letting the GPU lag one frame behind. This is the first reason why glTexSubImage2D blocks: the driver waits until the GPU is no longer using the texture (for the previous frame) before beginning the transfer, so that you never get half-updated data.
The other reason is that glTexSubImage2D is synchronous. It will also block for the duration of the transfer.
You can solve the first issue by keeping 2 textures : one for the current frame, one for the previous frame. Upload the texture in the former, but draw with the latter.
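A minimal sketch of that two-texture idea, written in WebGL-style JavaScript to match the rest of this page (the desktop GL calls are the gl*-prefixed equivalents); createTexture512 and drawQuad are placeholder helpers:

// Upload into one texture while drawing with the one uploaded last frame.
var textures = [createTexture512(), createTexture512()];
var current = 0;

function frame(newData) {
  var drawTex   = textures[current];      // holds the data uploaded last frame
  var uploadTex = textures[1 - current];

  gl.bindTexture(gl.TEXTURE_2D, uploadTex);
  gl.texSubImage2D(gl.TEXTURE_2D, 0, 0, 0, 512, 512,
                   gl.RGBA, gl.UNSIGNED_BYTE, newData);

  gl.bindTexture(gl.TEXTURE_2D, drawTex);
  drawQuad();                             // placeholder for the actual draw

  current = 1 - current;                  // swap roles for the next frame
}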
You can solve the second issue by using a pixel buffer object (GL_PIXEL_UNPACK_BUFFER), which allows asynchronous transfers.
In your case, I suspect that calling glTexSubImage2D just before glSwapBuffers adds an extra synchronization in the driver, whereas drawing the quad just before glSwapBuffers simply appends the command to the buffer. 120 ms is probably a driver bug, though: even an Intel GMA doesn't need 120 ms to upload a 512x512 texture.

glFlush() takes very long time on window with transparent background

I used the code from How to make an OpenGL rendering context with transparent background? to create a window with a transparent background. My problem is that the frame rate is very low: I get around 20 frames/sec even when I draw one quad (made from 2 triangles). I tried to find out why, and glFlush() takes around 0.047 seconds. Do you have any idea why? The same thing rendered in a window without a transparent background runs at 6000 fps (when I remove the 60 fps limitation). It also takes one core to 100%. I'm testing on a Q9450 @ 2.66 GHz with an ATI Radeon 4800, using Win7.
I don't think you can get good performance this way. The linked example contains the following code:
void draw(HDC pdcDest)
{
  assert(pdcDIB);
  verify(BitBlt(pdcDest, 0, 0, w, h, pdcDIB, 0, 0, SRCCOPY));
}
BitBlt is a function executed on the CPU, whereas the OpenGL functions are executed by the GPU. So the rendered data has to crawl back from the GPU to main memory, and the bandwidth from the GPU to the CPU is somewhat limited (even more so because the data has to go back again once BitBlt'ed).
If you really want a transparent window with rendered content, you might want to look at Direct2D and/or Direct3D; maybe there is some way to do it there without the performance penalty of moving data around.
