Why is my GLFW window so slow? - performance

For some reason the following code produces a result such as this:
It takes a second or two to render each frame. The code looks normal to me, and I can't see why it renders so slowly.
#define GLFW_INCLUDE_GLU
#define GLEW_STATIC
#include "GL/glew.c"
#include "GLFW/glfw3.h"
#include <cmath>
#include <ctime>
#include <stdlib.h>

using namespace std;

int main( ) {
    // init glfw
    if ( !glfwInit( ) ) return 0;
    // create window and set active context
    GLFWwindow* window = glfwCreateWindow( 640, 480, "Test", NULL, NULL );
    glfwMakeContextCurrent( window );
    // init glew
    if ( glewInit( ) != GLEW_OK ) return 0;
    // set swap interval
    glfwSwapInterval( 1 );
    // render loop
    while ( !glfwWindowShouldClose( window ) ) {
        srand( time( NULL ) );
        float r = ( rand( ) % 100 ) / 100.0;
        float ratio;
        int width, height;
        glfwGetFramebufferSize( window, &width, &height );
        ratio = width / ( float ) height;
        glViewport( 0, 0, width, height );
        // render
        glClear( GL_COLOR_BUFFER_BIT );
        glClearColor( r, 0, 0, 1 );
        // swap
        glfwSwapBuffers( window );
        // events
        glfwPollEvents( );
    }
    // terminate glfw
    glfwTerminate( );
    return 0;
}

Your GLFW code is correct, and it is performing much faster than you think.
The problem is incorrect usage of rand and srand. More concretely, you call srand with the current time (measured in seconds) every time you render, which, within the same second, reseeds the generator identically and produces exactly the same value for r every frame.
You therefore clear the screen to the same color several hundred times per second. This looks like it only renders at one frame per second, but it really isn't.
There are a few other problems with your code (such as using rand() % 100, which gives a biased distribution, and calling some OpenGL commands redundantly), but the one thing you need to fix for your immediate problem is: call srand once, not every time you render.
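For illustration, a minimal sketch of the corrected loop, keeping the rest of the program as-is: the seed is set once before the loop, and glClearColor is moved before glClear so each frame clears with the colour chosen for that frame.

    // seed the generator once, before the render loop
    srand( time( NULL ) );
    while ( !glfwWindowShouldClose( window ) ) {
        float r = ( rand( ) % 100 ) / 100.0f;  // fresh pseudo-random value every frame
        int width, height;
        glfwGetFramebufferSize( window, &width, &height );
        glViewport( 0, 0, width, height );
        // render
        glClearColor( r, 0, 0, 1 );
        glClear( GL_COLOR_BUFFER_BIT );
        // swap and poll
        glfwSwapBuffers( window );
        glfwPollEvents( );
    }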

Related

Optimizing Metal Compute - texture sampling using Gather?

I'm attempting to optimize a compute shader that calculates some values from texture samples and uses atomic operations to increment counters in a buffer, very similar to the following answer:
https://stackoverflow.com/a/68076730/5510818
kernel void compute(texture2d<half, access::read> inTexture [[texture(0)]],
                    volatile device atomic_uint *samples [[buffer(0)]],
                    ushort2 position [[thread_position_in_grid]])
{
    // Early bail
    if ( position.x >= inTexture.get_width() || position.y >= inTexture.get_height() )
    {
        return;
    }

    half3 color = inTexture.read(position).rgb;
    // do some math here

    // increment
    atomic_fetch_add_explicit( &( samples[offset] ), uint32_t( somevalue ), memory_order_relaxed );
}
And part of my encoder in Obj-C:
NSUInteger w = self.pass1PipelineState.threadExecutionWidth;
NSUInteger h = self.pass1PipelineState.maxTotalThreadsPerThreadgroup / w;
MTLSize threadsPerThreadGroup = MTLSizeMake(w, h, 1);
MTLSize threadsPerGrid = MTLSizeMake(frameMPSImage.width, frameMPSImage.height, 1);
[pass1Encoder dispatchThreads:threadsPerGrid threadsPerThreadgroup:threadsPerThreadGroup];
In an attempt to optimize, I am curious if I can leverage texture gather operations.
My understanding is that gather will fetch 4 samples 'about' the thread position in grid - and that it does so in an optimal manner. Am I right in understanding that I could in theory optimize this by fetching via gather, and doing 4x compute in my kernel, and write out 4x from a single thread group?
I would have to ensure that the thread width and height I pass to the encoder don't duplicate work (i.e. divide by 4?).
Something like:
kernel void compute(texture2d<half, access::read> inTexture [[texture(0)]],
                    volatile device atomic_uint *samples [[buffer(0)]],
                    ushort2 position [[thread_position_in_grid]])
{
    // Early bail
    if ( position.x >= inTexture.get_width() || position.y >= inTexture.get_height() )
    {
        return;
    }

    vec4<half3> colorGather = inTexture.gather(position).rgb;
    color1 = half3[0]
    // do some math here
    color2 = half3[1]
    // do some math here
    color3 = half3[2]
    // do some math here
    color4 = half3[3]
    // do some math here

    // increment 4x
    atomic_fetch_add_explicit( &( samples[offset1] ), uint32_t( somevalue1 ), memory_order_relaxed );
    atomic_fetch_add_explicit( &( samples[offset2] ), uint32_t( somevalue2 ), memory_order_relaxed );
    atomic_fetch_add_explicit( &( samples[offset3] ), uint32_t( somevalue3 ), memory_order_relaxed );
    atomic_fetch_add_explicit( &( samples[offset4] ), uint32_t( somevalue4 ), memory_order_relaxed );
}
Am I understanding gather correctly?
Are there any publicly available examples of gather? I cannot seem to find any!
Is there a way to do a mutex lock around the buffer so I am not locking 4x in the above code?
Am I correctly understanding needing to adjust my obj-c encoder pass to account for the fact I'd be sampling 4x in the shader?
Thank you.
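For reference, a minimal sketch of what a gather-based read could look like, under these assumptions: the texture is declared with access::sample and read through a nearest-filter sampler, and the grid is dispatched at half the image size so each thread covers a 2x2 block. Note that gather returns one chosen component from each of the four texels of a 2x2 quad, so fetching full RGB this way takes three calls; the per-texel math and atomics are elided.

    #include <metal_stdlib>
    using namespace metal;

    // Sketch only: names and layout are hypothetical, not the original kernel.
    kernel void computeGatherSketch(texture2d<half, access::sample> inTexture [[texture(0)]],
                                    volatile device atomic_uint *samples [[buffer(0)]],
                                    ushort2 position [[thread_position_in_grid]])
    {
        constexpr sampler s(coord::normalized, filter::nearest);
        float2 texSize = float2(inTexture.get_width(), inTexture.get_height());
        // Shared corner of the 2x2 quad this thread covers (grid dispatched at half size).
        float2 uv = (float2(position) * 2.0f + 1.0f) / texSize;
        // One gather call per colour component; each returns that component for all 4 texels.
        half4 reds   = inTexture.gather(s, uv, int2(0), component::x);
        half4 greens = inTexture.gather(s, uv, int2(0), component::y);
        half4 blues  = inTexture.gather(s, uv, int2(0), component::z);
        // reds[i], greens[i], blues[i] together form the colour of texel i of the quad;
        // the per-texel math and the four atomic increments would go here, as in the original kernel.
    }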

Separate image object into N sections of equal pixels (Approach)

Sorry in advance: this is more of an algorithmic problem than a coding problem, but I wasn't sure where else to put it. For simplicity's sake, say you have a binary image (white background, solid black object in the foreground).
Example:
sample input
I want to divide this object (meaning only the black pixels) into N sections, all with the same number of pixels (so each section should contain (1/N)*(total # of black pixels)).
With the current algorithm I'm using, I (1) find the total number of black pixels and (2) divide by N. Then I (3) scan the image row by row, assigning black pixels to sections in order as I go. The result looks something like this:
current output sketch
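For reference, a minimal sketch of that row-scan labelling (assumptions: the input is an 8-bit single-channel cv::Mat where black object pixels are 0; the function name is hypothetical; OpenCV is also what the answer below uses):

    #include <opencv2/core.hpp>
    #include <algorithm>

    // Assign every black pixel a section label 1..N in scan order, ~total/N pixels each.
    cv::Mat labelByRowScan( const cv::Mat& binary, int N )
    {
        cv::Mat labels = cv::Mat::zeros( binary.size(), CV_8UC1 );          // 0 = background
        int total = binary.rows * binary.cols - cv::countNonZero( binary ); // black pixel count
        if ( total == 0 || N <= 0 ) return labels;
        int quota = ( total + N - 1 ) / N;                                   // pixels per section
        int seen = 0;
        for ( int y = 0; y < binary.rows; y++ )
            for ( int x = 0; x < binary.cols; x++ )
                if ( binary.at<uchar>( y, x ) == 0 )                         // black pixel
                    labels.at<uchar>( y, x ) = (uchar)( 1 + std::min( seen++ / quota, N - 1 ) );
        return labels;
    }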
The problem with this is the last (yellow) section, which isn't continuous. I want to divide the image in a way that makes more sense, like this:
ideal output
Basically, I'd like the boundary between the sections to be as short as possible.
I've been stumped on this for a while, and my old code just isn't cutting it anymore. I only need an approach to identifying the sections; I'll ultimately be outputting each section as an individual image, as well as a grayscale copy of the input image where every pixel's value corresponds to its section number (I don't need help with those parts). Any ideas?
I only need an approach to identifying the sections
Based on this, I tried a couple of approaches; these may help as guidelines:
Find the contours of the image.
Find the moments of the contour and compute the mass center.
For the outer corners, you can simply use the convex hull.
Find the contour points closest to the mass center (these will be the inner corners).
Then you can separate the shape into the desired regions by using these key points.
Here is the result and code:
#include "opencv2/imgcodecs.hpp"
#include "opencv2/highgui.hpp"
#include "opencv2/imgproc.hpp"
#include <iostream>
using namespace cv;
using namespace std;
vector<Point>innerCorners;
bool isClose(Point test);
int main()
{
Mat src_gray;
int thresh = 100;
Mat src = imread("image/dir/star.png");
cvtColor( src, src_gray, COLOR_BGR2GRAY );
namedWindow( "Source",WINDOW_NORMAL );
Mat canny_output;
Canny( src_gray, canny_output, thresh, thresh*2 );
vector<vector<Point> > contours;
findContours( canny_output, contours, RETR_TREE, CHAIN_APPROX_SIMPLE );
vector<Vec4i> hierarchy;
vector<vector<Point> >hull( contours.size() );
vector<Moments> mu(contours.size() );
for( int i = 0; i <(int)contours.size(); i++ )
{ mu[i] = moments( contours[i], false ); }
for( size_t i = 0; i < contours.size(); i++ )
{
if(contours[i].size()>20)
convexHull( contours[i], hull[i] );
}
vector<Point2f> mc( contours.size() );
for( int i = 0; i <(int)contours.size(); i++ )
{ mc[i] = Point2f( mu[i].m10/mu[i].m00 , mu[i].m01/mu[i].m00 ); }
Mat drawing = Mat::zeros( canny_output.size(), CV_8UC3 );
int onlyOne = 1;
for( size_t i = 0; i< contours.size(); i++ )
{
if(contours[i].size()>20 && onlyOne)
{
circle( src, mc[i], 4, Scalar(0,255,255), -1, 8, 0 );
Scalar color = Scalar(255,0,0);
drawContours( drawing, contours, (int)i, color );
drawContours( src, hull, (int)i, color,5 );
Point centerMass = mc[i];
for(int a=0; a<(int)contours[i].size();a++)
{
if(cv::norm(cv::Mat(contours[i][a]),Mat(centerMass))<200 && isClose(contours[i][a]))
{
circle(src,contours[i][a],5,Scalar(0,0,255),10);
innerCorners.push_back(contours[i][a]);
line(src,contours[i][a],centerMass,Scalar(0,255,255),5);
}
}
onlyOne = 0;
}
}
namedWindow( "Hull demo",WINDOW_NORMAL );
imshow( "Hull demo", drawing );
imshow("Source", src );
waitKey();
return 0;
}
bool isClose(Point test){
if(innerCorners.size()==0)
return 1;
for(Point a:innerCorners)
if((cv::norm(cv::Mat(a),cv::Mat(test)))<70)
return 0;
return 1;
}

Why does this wobble?

Tested on Processing 2.2.1 & 3.0a2 on OS X.
The code I've tweaked below may look familiar to some of you; it's what Imgur now uses as their loading animation. It was posted on OpenProcessing.org, and I've been able to get it working in Processing, but the arcs are constantly wobbling around (relative movement within 1 pixel). I'm new to Processing and don't see anything in the sketch that could be causing this; it runs in ProcessingJS without issue (though with very high CPU utilization).
int num = 6;
float step, spacing, theta, angle, startPosition;

void setup() {
  frameRate( 60 );
  size( 60, 60 );
  strokeWeight( 3 );
  noFill();
  stroke( 51, 51, 51 );
  step = 11;
  startPosition = -( PI / 2 );
}

void draw() {
  background( 255, 255, 255, 0 );
  translate( width / 2, height / 2 );
  for ( int i = 0; i < num; i++ ) {
    spacing = i * step;
    angle = ( theta + ( ( PI / 4 / num ) * i ) ) % PI;
    float arcEnd = map( sin( angle ), -1, 1, -TWO_PI, TWO_PI );
    if ( angle <= ( PI / 2 ) ) {
      arc( 0, 0, spacing, spacing, 0 + startPosition, arcEnd + startPosition );
    }
    else {
      arc( 0, 0, spacing, spacing, TWO_PI - arcEnd + startPosition, TWO_PI + startPosition );
    }
  }
  arc( 0, 0, 1, 1, 0, TWO_PI );
  theta += .02;
}
If it helps, I'm trying to export this to an animated GIF. I tried doing this with ProcessingJS and jsgif, but hit some snags. I'm able to get it exported in Processing using gifAnimation just fine.
UPDATE
Looks like I'm going with hint( ENABLE_STROKE_PURE ), cleaned up with strokeCap( SQUARE ), both called within setup(). It doesn't look the same as the original, but I do like the straight edges. Sometimes when you compromise, the result ends up even better than the "ideal" solution.
I see the problem on 2.2.1 for OS X, and calling hint(ENABLE_STROKE_PURE) in setup() fixes it for me. I couldn't find good documentation for this call, though; it's just something that gets mentioned here and there.
As for the root cause, if I absolutely had to speculate, I'd guess that Processing's Java renderer approximates a circular arc using a spline with a small number of control points. The control points are spaced out between the endpoints, so as the endpoints move, so do the bumps in the approximation. The approximation might be good enough for a single frame, but the animation makes the bumps obvious. Setting ENABLE_STROKE_PURE might increase the number of control points, or it might force Processing to use a more expensive circular arc primitive in the underlying graphics library it's built upon. Again, though, this is just a guess as to why a drawing environment might have a bug like the one you've seen. I haven't read Processing's source code to verify the guess.

Core Animation / GLES2.0: Getting bitmap from CALayer into GL Texture

I am rewriting some Core Animation code in GL because it wasn't performing fast enough.
In my previous version each button was represented by a CALayer, containing sublayers for the overall shape and the text content.
What I would like to do is set shouldRasterize = YES on this layer, force it to render onto its own internal bitmap, and then send that over to my GL code, currently:
// Create A 512x512 greyscale texture
{
    // MUST be power of 2 for W & H or FAILS!
    GLuint W = 512, H = 512;
    printf( "Generating texture: [%d, %d]\n", W, H );

    // Create a pretty greyscale pixel pattern
    GLubyte *P = calloc( 1, ( W * H * 4 * sizeof( GLubyte ) ) );
    for ( GLuint i = 0; ( i < H ); ++i )
    {
        for ( GLuint j = 0; ( j < W ); ++j )
        {
            P[( ( i * W + j ) * 4 + 0 )] =
            P[( ( i * W + j ) * 4 + 1 )] =
            P[( ( i * W + j ) * 4 + 2 )] =
            P[( ( i * W + j ) * 4 + 3 )] = ( i ^ j );
        }
    }

    // Ask GL to give us a texture-ID for us to use
    glGenTextures( 1, &greyscaleTexture );

    // make it the ACTIVE texture, ie functions like glTexImage2D will
    // automatically know to use THIS texture
    glBindTexture( GL_TEXTURE_2D, greyscaleTexture );

    // set some params on the ACTIVE texture
    glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST );
    glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST );

    // WRITE/COPY from P into active texture
    glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA, W, H, 0, GL_RGBA, GL_UNSIGNED_BYTE, P );

    free( P );
    glLogAndFlushErrors();
}
Could someone help me patch this together?
EDIT: I actually want to create a black-and-white mask, so every pixel would be either 0x00 or 0xFF. Then I can make a bunch of quads, and for each quad I can set all of its vertices to a particular colour; hence I can easily get different coloured buttons from the same stencil...
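If a single-channel 0x00/0xFF mask is enough, the upload in the snippet above could be trimmed to one byte per pixel. A minimal sketch, assuming OpenGL ES 2.0 and a hypothetical maskTexture ID (GL_LUMINANCE replicates the byte into RGB when sampled):

    // Hypothetical sketch: upload a one-byte-per-pixel mask instead of RGBA.
    GLuint maskTexture;
    GLuint W = 512, H = 512;
    GLubyte *mask = calloc( 1, W * H );                 /* fill with 0x00 / 0xFF elsewhere */
    glGenTextures( 1, &maskTexture );
    glBindTexture( GL_TEXTURE_2D, maskTexture );
    glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST );
    glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST );
    glPixelStorei( GL_UNPACK_ALIGNMENT, 1 );            /* needed if W isn't a multiple of 4 */
    glTexImage2D( GL_TEXTURE_2D, 0, GL_LUMINANCE, W, H, 0,
                  GL_LUMINANCE, GL_UNSIGNED_BYTE, mask );
    free( mask );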
See "Generating Textures with Quartz": http://iphone-3d-programming.labs.oreilly.com/ch05.html#GeneratingTexturesWithQuartz
and the GLSprite example here: http://developer.apple.com/library/ios/navigation/#section=Frameworks&topic=OpenGLES

Screen grab of a window

I've been writing some code to do a screen grab of a window (in Windows). The code works fine; however, prior to the screen grab I have to bring the window I want to capture to the front and force a redraw.
I force the redraw with InvalidateRect, and I then have to pump some messages from the message loop in order for the WM_PAINT to get processed. This is obviously a bit lame, as I don't know how many messages to pump.
I tried using RedrawWindow with RDW_ALLCHILDREN, however the app I am grabbing a screen from is an MDI app and doesn't seem to redraw all of its children.
So my question is, is there a better way to redraw the window prior to the screen grab?
Cheers
Rich
Since you have not mentioned the language you are using, I hope the following code in C++ helps you!
void getScreenShot( int texWidth, int texHeight, unsigned char* pBuffer, HWND handle )
{
    /* Local variables */
    HDC screenDC;
    RECT screenRect;
    int extraBytesPerRow;
    BITMAPINFO bitmapInfo;
    HDC bitmapDC;
    void* bitmapDataPtr;
    HBITMAP hBitmap;
    HBITMAP hPrevBitmap;
    unsigned char* pIn;
    unsigned char* pOut;
    int rowIndex;
    int colIndex;

    /* Get a DC from the desktop window */
    screenDC = GetDC( handle );
    GetClientRect( handle, &screenRect );

    /* Determine the extra bytes we need per row (each row of bitmap data must end on a 32bit boundary) */
    extraBytesPerRow = ( texWidth * 3 ) % 4;
    extraBytesPerRow = extraBytesPerRow ? 4 - extraBytesPerRow : 0;

    /* Setup the bitmap info structure */
    memset( &bitmapInfo, 0, sizeof( bitmapInfo ) );
    bitmapInfo.bmiHeader.biSize        = sizeof( BITMAPINFOHEADER );
    bitmapInfo.bmiHeader.biWidth       = texWidth;
    bitmapInfo.bmiHeader.biHeight      = texHeight;
    bitmapInfo.bmiHeader.biPlanes      = 1;
    bitmapInfo.bmiHeader.biBitCount    = 24;
    bitmapInfo.bmiHeader.biCompression = BI_RGB;

    /* Create a bitmap device context (bitmapDataPtr will be a pointer to the bits in the bitmap) */
    bitmapDC = CreateCompatibleDC( NULL );
    hBitmap = CreateDIBSection( bitmapDC, ( BITMAPINFO* )&bitmapInfo.bmiHeader, DIB_RGB_COLORS, &bitmapDataPtr, NULL, 0 );
    hPrevBitmap = ( HBITMAP )SelectObject( bitmapDC, hBitmap );

    /* BitBlt or StretchBlt the image from the input DC into our bitmap DC */
    if ( ( texWidth != screenRect.right ) || ( texHeight != screenRect.bottom ) )
    {
        SetStretchBltMode( bitmapDC, HALFTONE );
        StretchBlt( bitmapDC, 0, 0, texWidth, texHeight, screenDC, 0, 0, screenRect.right, screenRect.bottom, SRCCOPY );
    }
    else
    {
        BitBlt( bitmapDC, 0, 0, texWidth, texHeight, screenDC, 0, 0, SRCCOPY );
    }

    /* Copy the data from the bitmap to the user's buffer (bitmap data is BGR and 4 byte aligned on each row, we want tightly-packed RGB) */
    pIn = ( unsigned char* )bitmapDataPtr;
    pOut = pBuffer;
    for ( rowIndex = 0; rowIndex < texHeight; rowIndex++ )
    {
        for ( colIndex = 0; colIndex < texWidth; colIndex++ )
        {
            pOut[ 0 ] = pIn[ 2 ];
            pOut[ 1 ] = pIn[ 1 ];
            pOut[ 2 ] = pIn[ 0 ];
            pOut += 3;
            pIn += 3;
        }
        pIn += extraBytesPerRow;
    }

    /* Free memory used by the bitmap */
    SelectObject( bitmapDC, hPrevBitmap );
    DeleteObject( hBitmap );
    DeleteDC( bitmapDC );

    /* Release the screen DC */
    ReleaseDC( handle, screenDC );
}
You don't actually need to force a redraw. But in case the window is minimised, you might need to bring it up before you call the function with the window handle. texWidth and texHeight are the dimensions of the window you are about to capture; to get these you can use GetWindowRect(..) or check out the link here: link
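A minimal caller sketch for the function above (the window title is a placeholder; GetClientRect is used here because it matches the rectangle the function itself blits from, and the output buffer must hold texWidth * texHeight * 3 bytes of tightly packed RGB, stored bottom-up because the DIB uses a positive biHeight):

    #include <windows.h>
    #include <vector>

    // Hypothetical usage of getScreenShot(): grab a window at its client-area size.
    int grabExample()
    {
        HWND hwnd = FindWindowA( NULL, "Untitled - Notepad" );  /* assumed window title */
        if ( !hwnd ) return 1;
        RECT rc;
        GetClientRect( hwnd, &rc );                             /* client area in pixels */
        int w = rc.right, h = rc.bottom;
        std::vector<unsigned char> rgb( w * h * 3 );            /* tightly packed RGB output */
        getScreenShot( w, h, rgb.data(), hwnd );
        /* rgb now holds the captured pixels (bottom-up row order) */
        return 0;
    }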
