OpenGL GLUT window very slow, why? - visual-studio-2010

The problem
I have just begun working with OpenGL using GLUT. The code below compiles and displays two wireframe cubes and a sphere. The problem is that when I drag or resize the window, there is a noticeable delay before it follows my mouse.
This problem does not occur on my colleague's computer with the same code.
I am working with Visual Studio 2012 C++ Express on a Windows 7 computer.
I am not an experienced programmer.
The code
// OpenGLHandin1.cpp : Defines the entry point for the console application.
#include "stdafx.h"
#include <GL/glut.h>

void initView(int argc, char * argv[]){
    //init here
    glutInit(&argc, argv);
    //Simple buffer
    glutInitDisplayMode( GLUT_SINGLE | GLUT_RGBA );
    glutCreateWindow("Handin 2");
}

void draw(){
    //Background color
    glTranslatef(0.6, 0, 0);
    glutWireCube(1.1); //Draw the cube
    glTranslatef(-0.5, 0, -0.2);
    glutWireCube(1.1); //Draw the cube
    glTranslatef(0, 1.2, 0);
    glRotatef(90, 1, 0, 0);
    glutWireSphere(0.6, 20, 20); //Draw the sphere
    //draw here
}

void reshape (int w, int h){
    glViewport(0,0,w ,h);
    gluPerspective(45, (float)w/(float)h, 1.5, 10);
    gluLookAt(1.5, 2.5, 4,
              0, 0.6, 0,
              0, 1, 0); //Orient the camera
    glRotatef(5, 0, 0, 1);
}

int main(int argc, char * argv[])

It seems that the simple solution of calling Sleep(1) in the render function worked. You also asked why. I'm not sure I can answer that properly, but here's my best guess:
Why does it even work?
Your colleague may have VSync turned on by default in the graphics driver. This makes the code run only as fast as the screen can refresh, most probably 60 fps. That gives you around 16 milliseconds to render each frame, and if the code is efficient (taking, say, 2 ms to render) it leaves plenty of time for the CPU to do other OS-related work, such as moving your window.
Now, if vertical sync is disabled, the program will try to render as many frames as possible, effectively starving all other processes. I suggested Sleep because it reveals this one particular issue. It doesn't really matter whether it's 1 or 3 ms; what it really does is say "hey, CPU, I'm not doing anything in particular right now, so you may do other things".
But isn't it slowing my program?
Using Sleep is a common technique. If you're concerned about losing 1 ms every frame, you can also try Sleep(0), which should have a similar effect: it gives the spare time back to the CPU. You could also try enabling vertical sync to verify whether my guess was correct.
As a side note, you can also look at CPU usage graphs with and without sleep. It should be 100% (or 50% on a dual-core CPU) without (running as fast as possible), and much lower with, depending on your program requirements and your CPU's speed.
Additional remarks about Sleep(0)
"After the sleep interval has passed, the thread is ready to run. If you specify 0 milliseconds, the thread will relinquish the remainder of its time slice but remain ready. Note that a ready thread is not guaranteed to run immediately. Consequently, the thread may not run until some time after the sleep interval elapses." (Quoted from Microsoft's documentation for the Sleep function.)
Also note that on Linux systems the behavior might be slightly different, but I'm not a Linux expert; perhaps a passer-by could clarify.
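The idea of handing spare frame time back to the OS can be sketched in portable C++ as follows. This is only a minimal sketch, not the asker's code: timeLeftInFrame and endOfFrame are hypothetical helper names, the budget is whatever frame time you pick (16 ms approximates 60 fps), and std::this_thread::sleep_for and std::this_thread::yield stand in for the Win32 Sleep(n) and Sleep(0) calls discussed above.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

// Time still unused in this frame's budget, never negative.
Millis timeLeftInFrame(Millis budget, Millis spent) {
    return spent >= budget ? Millis(0) : budget - spent;
}

// Call at the end of each frame: give the rest of the budget back to the OS.
void endOfFrame(Clock::time_point frameStart, Millis budget) {
    const Millis spent =
        std::chrono::duration_cast<Millis>(Clock::now() - frameStart);
    const Millis left = timeLeftInFrame(budget, spent);
    if (left > Millis(0)) {
        std::this_thread::sleep_for(left);  // roughly what Sleep(n) does on Windows
    } else {
        std::this_thread::yield();          // similar in spirit to Sleep(0)
    }
}
```

In a GLUT program one might record frameStart at the top of the display callback and call endOfFrame(frameStart, Millis(16)) at the bottom. With VSync enabled the driver already does the waiting, so the helper would then mostly take the yield branch.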


Why Compute Shader is slowing down the rendering API calls?

I am using a compute shader to process input buffer data and store the result in an output texture using imageStore().
After executing the compute shader, I issue 3 draw calls in sequence.
Compute Shader Code:
#version 310 es
precision mediump image2D;
layout(std430) buffer; // Sets the default layout for SSBOs
layout(local_size_x = 256) in; // 256 threads per work group
layout(binding = 0) readonly buffer InputBuf
{
    uint input_buff[];
} inputbuff;
layout (rgba32f, binding = 1 ) uniform writeonly image2D out_teximg;
void main()
{
    int idx = int(gl_GlobalInvocationID.x);
    int idy = int(gl_GlobalInvocationID.y);
    // GLSL has no "unsigned int"; the array is a member of the inputbuff block
    uint inputpix = inputbuff.input_buff[1024 * idy + idx];
    // some calculation on inputpix and output is rcolor, bcolor, gcolor
    imageStore(out_teximg, ivec2(idx , idy), vec4(rcolor, bcolor, gcolor, 1.0));
}
void initCompute()
{
    glGenTextures(1, &computeOutTex);
    glGenBuffers(1, &inSSBOId);
    uint inputBuffData = { .... }; // input buffer data
}

void execute_compute()
{
    // compute shader code starts...
    glBindTexture(GL_TEXTURE_2D, computeOutTex);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA32F, width, height);
    glBindImageTexture(1, computeOutTex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F); // binding is 1
    glUniform1i( glGetUniformLocation(computePgmId, "out_teximg"), 0);
    uint inputBuffSize = 1024 * 512 * 3;
    glBufferData(GL_SHADER_STORAGE_BUFFER, inputBuffSize, inputBuffData, GL_STATIC_DRAW);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0 , inSSBOId); // binding is 0
    glDispatchCompute(width / 256, height, 1);
    // glFinish();
    glBindImageTexture(1, 0, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F); // binding is 1
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, 0);// binding is 0
}

int draw()
{
    glBindFramebuffer(GL_FRAMEBUFFER, m_FBOId); // Offscreen Rendering
    glBindTexture(GL_TEXTURE_2D, computeOutTex);
    glDrawElements(); // Render the texture data
    // 2nd draw call
    // 3rd draw call
    glBindFramebuffer(GL_FRAMEBUFFER, 0); // unbind FBO
}
Here, only the 2nd draw call takes more time after the compute shader is added.
If glFinish() is called after glMemoryBarrier(), then only the execute_compute() call is slowed down.
Why is the compute shader slowing down the subsequent draw calls?
Is glFinish() really needed?
The compute shader does not slow down the subsequent draw call. However, the compute shader itself takes some time to execute. Since you are setting a memory barrier, the subsequent draws have to wait.
OpenGL commands are queued and are not executed immediately when they are called. GPU and CPU work in parallel. The CPU sends instructions to the GPU and the GPU processes them as quickly as possible.
glFinish does not return until all previously issued commands have completed. glFinish itself is not "costly". It only seems costly when you measure time on the CPU, because the measurement includes the time it takes to complete the previously issued OpenGL commands.
In any case, glFinish is not needed here. All you need is the memory barrier. When using the memory barrier, the following OpenGL commands that depend on this barrier appear to take longer to complete. However, they do not actually take longer; they just have to wait until the condition indicated by the barrier is met.
In your case you need to use GL_ALL_BARRIER_BITS or GL_TEXTURE_FETCH_BARRIER_BIT, which makes incoherent memory writes (e.g. image stores) issued before the barrier visible to texture fetches after the barrier.
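Why the cost seems to "move" to whichever call waits can be illustrated with a toy, CPU-only model. This is not OpenGL and not how a real driver is built (a real GPU works through the queue in parallel with the CPU); CommandQueue, submit, finish and runDemo are hypothetical names invented for the sketch. It only shows that recording a command is cheap and that the work becomes visible at the synchronisation point.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a driver command queue (not an OpenGL API).
class CommandQueue {
public:
    // Like issuing a GL call: the work is only recorded, not run.
    void submit(std::function<void()> command) {
        pending_.push_back(std::move(command));
    }

    // Like glFinish: do not return until everything recorded so far has run.
    void finish() {
        for (auto& command : pending_) command();
        pending_.clear();
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::vector<std::function<void()>> pending_;
};

// Records three commands and reports how many had run before and after
// the synchronisation point.
std::pair<int, int> runDemo() {
    CommandQueue queue;
    int executed = 0;
    for (int i = 0; i < 3; ++i) {
        queue.submit([&executed] { ++executed; });
    }
    const int before = executed;  // nothing has run yet at this point
    queue.finish();               // all of the work is accounted for here
    return {before, executed};
}
```

Timing submit() alone would suggest the commands are free, and timing finish() alone would suggest finish() is expensive. Neither conclusion is right, which is the point the answer makes about measuring glFinish and barrier-dependent draws on the CPU.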

Unbinding a WebGL buffer, worth it?

In various sources I've seen recommendations for 'unbinding' buffers after use, i.e. setting it to null. I'm curious if there is really a need for this. e.g.
var buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
// ... buffer related operations ...
gl.bindBuffer(gl.ARRAY_BUFFER, null); // unbinding
On the one hand, it's likely better for debugging as you'll probably get better error messages, but is there any significant performance loss from unbinding buffers all the time? It's generally recommended to reduce WebGL calls where possible.
The reason people often unbind buffers and other objects is to minimize the side effects of functions/methods. It's a general software development principle that functions should only perform their advertised operations, and not have any unexpected side effects. Therefore, it's a common practice that if a function binds objects, it unbinds them before returning.
Let's look at a typical example (with no particular language syntax). First, we define a function that creates a texture without any defined content:
function GLuint createEmptyTexture(int texWidth, int texHeight) {
    GLuint texId;
    glGenTextures(1, &texId);
    glBindTexture(GL_TEXTURE_2D, texId);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, texWidth, texHeight, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, 0);
    return texId;
}
Then, let's have another function to create a texture. But this one fills the texture with data from a buffer (which I believe is not supported in WebGL yet, but it still helps illustrate the general principle):
function GLuint createTextureFromBuffer(int texWidth, int texHeight,
                                        GLuint bufferId) {
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, bufferId);
    GLuint texId;
    glGenTextures(1, &texId);
    glBindTexture(GL_TEXTURE_2D, texId);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, texWidth, texHeight, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, 0);
    return texId;
}
Now, I can call these functions, and everything works as expected:
GLuint tex1 = createEmptyTexture(width, height);
GLuint tex2 = createTextureFromBuffer(width, height, bufferId);
But see what happens if I call them in the opposite order:
GLuint tex1 = createTextureFromBuffer(width, height, bufferId);
GLuint tex2 = createEmptyTexture(width, height);
This time, both textures will be filled with the buffer content, because the pixel unpack buffer was still bound after the first function returned, and therefore when the second function was called.
One way of avoiding this is to unbind the pixel unpack buffer at the end of the function that binds it. And to make sure that similar issues can not happen because the texture is still bound, it can unbind that one as well:
function GLuint createTextureFromBuffer(int texWidth, int texHeight,
                                        GLuint bufferId) {
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, bufferId);
    GLuint texId;
    glGenTextures(1, &texId);
    glBindTexture(GL_TEXTURE_2D, texId);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, texWidth, texHeight, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, 0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    glBindTexture(GL_TEXTURE_2D, 0);
    return texId;
}
With this implementation, both call sequences of using these two functions will produce the same result.
There are other approaches to address this. For example:
1. Each function documents its preconditions and side effects, and the caller is responsible for making any necessary state changes to meet the preconditions of the next function after calling a function with side effects.
2. Each function is completely responsible for setting up all its state. In the example above, this would mean that the createEmptyTexture() function would have to unbind the pixel unpack buffer, because it relies on none being bound.
Approach 1 does not really scale well, and will be painful to maintain in larger systems. Approach 2 is also unsatisfactory because OpenGL has a lot of state, and having to set up all relevant state in every function would be verbose and inefficient.
This is really part of a bigger question: How do you deal with the state based nature of OpenGL in a modular software architecture? Buffer bindings are just one example of state you need to deal with. This is typically not very difficult to handle in small programs that you write by yourself, but is a possible trouble spot in larger systems. It gets worse if components from different sources (e.g. different vendors) are mixed.
I don't think there's one single approach that is ideal in all possible scenarios. The important thing is that you pick one clearly defined strategy, and use it consistently. How to handle this best in various scenarios is somewhat beyond the scope of an answer here.
While unbinding buffers should be fairly cheap, I'm not a fan of unnecessary calls. So I would try to avoid those calls, unless you really feel you need them to enforce a clear and consistent policy for the software you are writing.
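If you do settle on a "leave no binding behind" policy in C++ code, a scope guard can enforce it so that nobody has to remember the trailing call. The sketch below is an illustration under stated assumptions, not a real GL wrapper: FakeContext is a hypothetical stand-in for one piece of binding state (in a real program it would wrap glBindBuffer and a binding query), and ScopedBufferBinding is an invented name. Note that it restores the previous binding rather than binding 0, which is a variation on the unbinding shown above.

```cpp
// Hypothetical stand-in for a single piece of GL binding state.
struct FakeContext {
    unsigned boundBuffer = 0;
    void bindBuffer(unsigned id) { boundBuffer = id; }
};

// Binds a buffer for the lifetime of the guard, then restores whatever
// was bound before, even on early return or exception.
class ScopedBufferBinding {
public:
    ScopedBufferBinding(FakeContext& ctx, unsigned id)
        : ctx_(ctx), previous_(ctx.boundBuffer) {
        ctx_.bindBuffer(id);
    }
    ~ScopedBufferBinding() { ctx_.bindBuffer(previous_); }

    ScopedBufferBinding(const ScopedBufferBinding&) = delete;
    ScopedBufferBinding& operator=(const ScopedBufferBinding&) = delete;

private:
    FakeContext& ctx_;
    unsigned previous_;
};

// Binds buffer 7 while it works and returns what was bound during the work.
unsigned uploadWithGuard(FakeContext& ctx) {
    ScopedBufferBinding guard(ctx, 7);
    return ctx.boundBuffer;
}
```

The trade-off is the one the answer describes: each guard costs an extra bind call on scope exit, so it buys predictability at the price of a few redundant state changes.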

How can I improve performance of Direct3D when I'm writing to a single vertex buffer thousands of times per frame?

I am trying to write an OpenGL wrapper that will allow me to use all of my existing graphics code (written for OpenGL) and will route the OpenGL calls to Direct3D equivalents. This has worked surprisingly well so far, except performance is turning out to be quite a problem.
Now, I admit I am most likely using D3D in a way it was never designed for. I am updating a single vertex buffer thousands of times per render loop. Every time I draw a "sprite" I send 4 vertices to the GPU with texture coordinates, etc., and when the number of "sprites" on the screen at one time gets to around 1k to 1.5k, the FPS of my app drops below 10.
Using the VS2012 Performance Analysis (which is awesome, btw), I can see that the ID3D11DeviceContext->Draw method is taking up the bulk of the time:
Screenshot Here
Is there some setting I'm not using correctly while setting up my vertex buffer, or during the draw method? Is it really, really bad to be using the same vertex buffer for all of my sprites? If so, what other options do I have that wouldn't drastically alter the architecture of my existing graphics code base (which are built around the OpenGL paradigm...send EVERYTHING to the GPU every frame!)
The biggest FPS killer in my game is when I'm displaying a lot of text on the screen. Each character is a textured quad, and each one requires a separate update to the vertex buffer and a separate call to Draw. If D3D or hardware doesn't like many calls to Draw, then how else can you draw a lot of text to the screen at one time?
Let me know if there is any more code you'd like to see to help me diagnose this problem.
Here's the hardware I'm running on:
Core i7 @ 3.5GHz
16 gigs of RAM
GeForce GTX 560 Ti
And here's the software I'm running:
Windows 8 Release Preview
VS 2012
DirectX 11
Here is the draw method:
void OpenGL::Draw(const std::vector<OpenGLVertex>& vertices)
{
    auto matrix = *;
    _constantBufferData.view = DirectX::XMMatrixTranspose(matrix);
    _context->UpdateSubresource(_constantBuffer, 0, NULL, &_constantBufferData, 0, 0);
    _context->VSSetShader(_vertexShader, nullptr, 0);
    _context->VSSetConstantBuffers(0, 1, &_constantBuffer);
    ID3D11ShaderResourceView* texture = _textures[_currentTextureId];
    // Set shader texture resource in the pixel shader.
    _context->PSSetShader(_pixelShaderTexture, nullptr, 0);
    _context->PSSetShaderResources(0, 1, &texture);
    D3D11_MAPPED_SUBRESOURCE mappedResource;
    auto hr = _context->Map(_vertexBuffer, 0, mapType, 0, &mappedResource);
    if (SUCCEEDED(hr))
    {
        OpenGLVertex *pData = reinterpret_cast<OpenGLVertex *>(mappedResource.pData);
        memcpy(&(pData[_currentVertex]), &vertices[0], sizeof(OpenGLVertex) * vertices.size());
        _context->Unmap(_vertexBuffer, 0);
    }
    UINT stride = sizeof(OpenGLVertex);
    UINT offset = 0;
    _context->IASetVertexBuffers(0, 1, &_vertexBuffer, &stride, &offset);
    _context->Draw(vertices.size(), _currentVertex);
    _currentVertex += (int)vertices.size();
}
And here is the method that creates the vertex buffer:
void OpenGL::CreateVertexBuffer()
{
    D3D11_BUFFER_DESC bd;
    ZeroMemory(&bd, sizeof(bd));
    bd.Usage = D3D11_USAGE_DYNAMIC;
    bd.ByteWidth = _maxVertices * sizeof(OpenGLVertex);
    bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;
    bd.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE; // required for D3D11_USAGE_DYNAMIC
    bd.MiscFlags = 0;
    bd.StructureByteStride = 0;
    D3D11_SUBRESOURCE_DATA initData;
    ZeroMemory(&initData, sizeof(initData));
    _device->CreateBuffer(&bd, NULL, &_vertexBuffer);
}
Here is my vertex shader code:
cbuffer ModelViewProjectionConstantBuffer : register(b0)
{
    matrix model;
    matrix view;
    matrix projection;
};

struct VertexShaderInput
{
    float3 pos : POSITION;
    float4 color : COLOR0;
    float2 tex : TEXCOORD0;
};

struct VertexShaderOutput
{
    float4 pos : SV_POSITION;
    float4 color : COLOR0;
    float2 tex : TEXCOORD0;
};

VertexShaderOutput main(VertexShaderInput input)
{
    VertexShaderOutput output;
    float4 pos = float4(input.pos, 1.0f);
    // Transform the vertex position into projected space.
    pos = mul(pos, model);
    pos = mul(pos, view);
    pos = mul(pos, projection);
    output.pos = pos;
    // Pass through the color without modification.
    output.color = input.color;
    output.tex = input.tex;
    return output;
}
What you need to do is batch vertexes as aggressively as possible, then draw in large chunks. I've had very good luck retrofitting this into old immediate-mode OpenGL games. Unfortunately, it's kind of a pain to do.
The simplest conceptual solution is to use some sort of device state (which you're probably tracking already) to create a unique stamp for a particular set of vertexes. Something like blend modes and bound textures is a good set. If you can find a fast hashing algorithm to run on the struct that holds that state, you can store it pretty efficiently.
Next, you need to do the vertex caching. There are two ways to handle that, both with advantages. The most aggressive, most complicated, and in the case of many sets of vertexes with similar properties, most efficient is to make a struct of device states, allocate a large (say 4KB) buffer, and proceed to store vertexes with matching states in that array. You can then dump the entire array into a vertex buffer at the end of the frame, and draw chunks of the buffer (to recreate original order). Keeping track of all the buffer and state and order is difficult, however.
The simpler method, which can provide a good bit of caching under good circumstances, is to cache vertexes in a large buffer until device state changes. At that point, prior to actually changing state, dump the array into a vertex buffer and draw. Then reset the array index, commit state changes, and go again.
If your application has large numbers of similar vertexes, which is very possible working with sprites (texture coordinates and colors may change, but good sprites will use a single texture atlas and few blending modes), even the second method can give some performance boosts.
The trick here is to build up a cache in system memory, preferably a large chunk of pre-allocated memory, then dump it to video memory just prior to drawing. This allows you to perform far fewer writes to video memory and draw calls, which tend to be expensive (especially together). As you've seen, the number of calls you make gets to be slow, and batching stands a good chance of helping with that. The trick is to not allocate memory each frame if you can help it, batch large enough chunks to be worthwhile, and maintain correct device state and order for each draw.
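The simpler of the two methods (cache until the device state changes, then flush) might look roughly like this. It is a sketch under assumptions rather than a drop-in for the wrapper in the question: Vertex, StateKey and SpriteBatcher are hypothetical names, the state key tracks only a texture id and a blend mode, and the draw callback is where the single Map / memcpy / Unmap / Draw sequence per batch would go.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Vertex { float x, y, u, v; };

// State that forces a new draw call when it changes (illustrative subset).
struct StateKey {
    int textureId;
    int blendMode;
    bool operator==(const StateKey& o) const {
        return textureId == o.textureId && blendMode == o.blendMode;
    }
};

class SpriteBatcher {
public:
    using DrawFn =
        std::function<void(const StateKey&, const std::vector<Vertex>&)>;

    SpriteBatcher(std::size_t capacity, DrawFn draw)
        : capacity_(capacity), draw_(std::move(draw)) {
        cache_.reserve(capacity);  // allocate once, not every frame
    }

    // Queue vertices; flush first if the state changed or the cache is full.
    void add(const StateKey& state, const Vertex* verts, std::size_t count) {
        const bool stateChanged = !cache_.empty() && !(state == current_);
        const bool wouldOverflow = cache_.size() + count > capacity_;
        if (stateChanged || wouldOverflow) flush();
        current_ = state;
        cache_.insert(cache_.end(), verts, verts + count);
    }

    // Issue one draw for everything cached so far (also call at end of frame).
    void flush() {
        if (cache_.empty()) return;
        draw_(current_, cache_);
        cache_.clear();  // keeps the allocated capacity
    }

private:
    std::size_t capacity_;
    DrawFn draw_;
    StateKey current_{0, 0};
    std::vector<Vertex> cache_;
};
```

With this shape, a thousand quads that share one texture atlas and blend mode reach the callback as a single batch of 4000 vertices instead of a thousand separate updates. Draw order is preserved because a flush happens before any state change is committed.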

How to avoid perceived flicker during scrolling in Qt?

I'm trying to use the Qt framework (4.7.4) to demonstrate a scrolling display in which new pixel data is added to the first row of the screen and the previous rows are scrolled down by one pixel on every refresh.
It is refreshed 20 times per second, and on every refresh random green points (pixels) are drawn on a black background.
The problem is that there is highly noticeable flicker on every refresh. I have searched the web and optimized my code as much as possible. I tried raster rendering with both QPainter (on QWidget) and QGraphicsScene (on QGraphicsView), and I even tried OpenGL rendering on QGLWidget. However, I still have the same flicker problem.
What may cause this flickering? I am beginning to suspect that my LCD monitor cannot refresh fast enough for black-to-green transitions. I have also noticed that if I select a gray background instead of black, no flicker occurs.
The effect you're seeing is purely psychovisual. It's a human defect, not a software defect. I'm serious. You can verify by fixing the value of x - you'll still be repainting the entire pixmap on the window, there won't be any flicker - because there is no flicker per se.
The psychovisual flicker occurs when the scroll rate is not tied to the passage of real time. When occasionally the time between updates varies due to CPU load, or due to system timer inaccuracies, our visual system integrates two images and it appears as if the overall brightness is changed.
You've correctly noticed that the perceived flicker is reduced as you reduce the contrast ratio of the image by setting the background to grey. This is an additional clue that the effect is psychovisual.
Below is a way of preventing this effect. Notice how the scroll distance is tied to the time (here: 1ms = 1pixel).
#include <QElapsedTimer>
#include <QPaintEvent>
#include <QBasicTimer>
#include <QApplication>
#include <QPainter>
#include <QPixmap>
#include <QWidget>
#include <QDebug>

static inline int rand(int range) { return (double(qrand()) * range) / RAND_MAX; }

class Widget : public QWidget
{
    float fps;
    qint64 lastTime;
    QPixmap pixmap;
    QBasicTimer timer;
    QElapsedTimer elapsed;

    void timerEvent(QTimerEvent * ev) {
        if (ev->timerId() == timer.timerId()) update();
    }
    void paintEvent(QPaintEvent * ev) {
        qint64 time = elapsed.elapsed();
        qint64 delta = time - lastTime;
        lastTime = time;
        if (delta > 0) {
            const float weight(0.05);
            fps = (1.0-weight)*fps + weight*(1E3/delta);
            if (pixmap.size() != size()) {
                pixmap = QPixmap(size());
            }
            // Scroll by the elapsed time: 1 ms = 1 pixel
            int dy = qMin((int)delta, pixmap.height());
            pixmap.scroll(0, dy, pixmap.rect());
            QPainter pp(&pixmap);
            pp.fillRect(0, 0, pixmap.width(), dy, Qt::black);
            for(int i = 0; i < 30; ++i){
                int x = rand(pixmap.width());
                pp.fillRect(x, 0, 3, dy, Qt::green);
            }
        }
        QPainter p(this);
        p.drawPixmap(ev->rect(), pixmap, ev->rect());
        p.fillRect(0, 0, 100, 50, Qt::black);
        p.drawText(rect(), QString("FPS: %1").arg(fps, 0, 'f', 0));
    }
public:
    explicit Widget(QWidget *parent = 0) : QWidget(parent), fps(0), lastTime(0), pixmap(size())
    {
        timer.start(1000/60, this);
        elapsed.start(); // elapsed() is only meaningful after start()
    }
};

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);
    Widget w;
    w.show();
    return a.exec();
}
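The code above hard-wires the rate at 1 ms = 1 pixel. If a slower rate is wanted (the question scrolls one row 20 times per second), the same idea needs to carry the fractional pixel between frames, otherwise short frames round down to zero and the time is lost. A minimal Qt-free sketch of that bookkeeping, with ScrollClock and advance as hypothetical names and integer milliseconds as the assumed time unit:

```cpp
#include <cstdint>

// Converts elapsed time into whole pixels of scroll at a fixed rate,
// carrying the fractional remainder so no time is lost between frames.
class ScrollClock {
public:
    explicit ScrollClock(std::int64_t pixelsPerSecond)
        : rate_(pixelsPerSecond) {}

    // deltaMs: milliseconds since the previous frame (assumed >= 0).
    // Returns how many whole pixels to scroll for this frame.
    std::int64_t advance(std::int64_t deltaMs) {
        carry_ += deltaMs * rate_;           // measured in 1/1000 of a pixel
        const std::int64_t pixels = carry_ / 1000;
        carry_ %= 1000;                      // keep the sub-pixel remainder
        return pixels;
    }

private:
    std::int64_t rate_;
    std::int64_t carry_ = 0;
};
```

In paintEvent the value returned by advance(delta) would take the place of dy. At 1000 pixels per second it reduces to the 1 ms = 1 pixel behaviour of the code above.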
I'd recommend you do not scroll the pixmap in-place, but create a second pixmap and use drawPixmap() to copy everything but one line from pixmap 1 to pixmap 2 (with the scroll offset). Then continue painting on pixmap 2. After the frame, exchange the references to both pixmaps, and start over.
The rationale is that copying from one memory area to a different one can be optimised more easily than modifying one memory area in-place.
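The two-buffer scheme can be sketched without Qt using plain byte buffers. This is an illustration of the suggestion only: ScrollingImage, pushRow and at are hypothetical names, pixels are single bytes, and newRow is assumed to hold at least one full row. With QPixmap the two copies would be drawPixmap() calls and the swap would exchange the two pixmap references.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class ScrollingImage {
public:
    ScrollingImage(std::size_t width, std::size_t height)
        : width_(width),
          front_(width * height, 0),
          back_(width * height, 0) {}

    // Scrolls the image down by one row and installs newRow as row 0.
    void pushRow(const std::vector<std::uint8_t>& newRow) {
        const auto w = static_cast<std::ptrdiff_t>(width_);
        // Copy every row except the last from front into back, one row lower.
        std::copy(front_.begin(), front_.end() - w, back_.begin() + w);
        // Fill the freed top row with the new data.
        std::copy(newRow.begin(), newRow.begin() + w, back_.begin());
        // Exchange the buffers; no pixel data is copied by the swap itself.
        std::swap(front_, back_);
    }

    std::uint8_t at(std::size_t x, std::size_t y) const {
        return front_[y * width_ + x];
    }

private:
    std::size_t width_;
    std::vector<std::uint8_t> front_;
    std::vector<std::uint8_t> back_;
};
```

Source and destination never overlap here, which is the property the answer argues makes the copy easier to optimise. Whether it is actually faster than an in-place scroll on a given platform would need measuring.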

Pretty Pixel-level Picture Painting, Programmatically

My mac laptop has 1,024,000 pixels.
What's the simplest way to turn my display completely black and go nuts with writing little programs to twiddle pixels to my heart's delight?
To make it more concrete, say I wanted to implement the Chaos Game to draw a Sierpinski triangle, at the pixel level, with nothing else on the screen.
What are ways to do that?
Perhaps pick Processing
Create Quartz in your Captured Console
Surely a Screen Saver would be a Serendipitous Solution ?
One approach would be to download sample code for a screen saver module and then use that as a template for your own screen saver. That way you don't have to write much beyond the actual drawing code, and you get your own custom screen saver module to boot.
If you're using C or C++ you can do this with SDL. It allows low-level access to the pixels of a single window or the full screen (plus the keyboard, mouse, and sound card) and works on Windows, OS X, and Linux.
There are some excellent tutorials on how to use this library. I think you want tutorial #31 after the first few.
A good way to go is GLUT, the (slightly) friendly multiplatform wrapper to OpenGL. Here's some code to twiddle some points:
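#include <GL/glut.h>
void reshape(int w, int h) {
    glViewport(0, 0, w, h);
    glMatrixMode(GL_PROJECTION); glLoadIdentity();
    glOrtho(0, w, 0, h, -1, 1);
}
void display() {
    glClear(GL_COLOR_BUFFER_BIT);
    glBegin(GL_POINTS);
    for (int i = 0; i<399; ++i)
        glVertex2i(i, (i*i)%399);
    glEnd(); glFlush();
}
int main(int argc, char **argv) {
    glutInit(&argc, argv);
    glutCreateWindow("some points");
    glutReshapeFunc(reshape); glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}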
That's C++ but it should be nearly identical in Python and many other languages.
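The Chaos Game from the question is independent of whichever of these libraries ends up putting pixels on the screen. A minimal sketch that only fills an in-memory greyscale buffer; chaosGame, the triangle layout and the seed parameter are choices made for this illustration, and copying the buffer to a window (SDL, GLUT, a screen saver view) is left out:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Plots `iterations` Chaos Game points into a width*height greyscale buffer
// (0 = black, 255 = lit) and returns it in row-major order.
std::vector<std::uint8_t> chaosGame(int width, int height,
                                    int iterations, unsigned seed) {
    std::vector<std::uint8_t> pixels(
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height), 0);

    // Triangle corners: bottom-left, bottom-right, top-centre.
    const double vx[3] = {0.0, width - 1.0, (width - 1.0) / 2.0};
    const double vy[3] = {height - 1.0, height - 1.0, 0.0};

    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, 2);

    double x = vx[0];
    double y = vy[0];
    for (int i = 0; i < iterations; ++i) {
        // Jump halfway towards a randomly chosen corner, then plot.
        const int k = pick(rng);
        x = (x + vx[k]) / 2.0;
        y = (y + vy[k]) / 2.0;
        const std::size_t row = static_cast<std::size_t>(y);
        const std::size_t col = static_cast<std::size_t>(x);
        pixels[row * static_cast<std::size_t>(width) + col] = 255;
    }
    return pixels;
}
```

Because every point is a midpoint between the previous point and a corner, all plotted points stay inside the triangle, so the buffer index never goes out of range. In the GLUT program above, each lit pixel could be emitted as one glVertex2i call inside the glBegin(GL_POINTS) block.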
