I noticed that my 3D engine runs very slowly on AMD hardware. After some investigation, the slow code boiled down to creating an FBO with several attachments and writing to any nonzero attachment. In all tests I compared AMD performance against the same AMD GPU writing to the unaffected GL_COLOR_ATTACHMENT0, and against Nvidia hardware whose performance relative to my AMD device is well known.
Writing fragments to nonzero attachments is 2-3 times slower than expected.
This code is equivalent to how I create a framebuffer and measure performance in my test apps:
// Create a framebuffer with six RGBA8 texture attachments
static const auto attachmentCount = 6;
GLuint fb, att[attachmentCount];
glGenTextures(attachmentCount, att);
glGenFramebuffers(1, &fb);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
for (auto i = 0; i < attachmentCount; ++i) {
    glBindTexture(GL_TEXTURE_2D, att[i]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, att[i], 0);
}
// Route fragment output 1 to attachment 1; every other output is discarded
GLenum dbs[] = {
    GL_NONE,
    GL_COLOR_ATTACHMENT1,
    GL_NONE,
    GL_NONE,
    GL_NONE,
    GL_NONE};
glDrawBuffers(attachmentCount, dbs);
// Main loop: 100 fullscreen quads (two triangles each) per frame
while (shouldWork) {
    glClear(GL_COLOR_BUFFER_BIT);
    for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
    glfwSwapBuffers(window);
    glfwPollEvents();
    showFps();
}
Is anything wrong with it?
Fully reproducible minimal tests can be found here. I tried many other write patterns and OpenGL states, and described some of them on the AMD Community forum.
I suppose the problem is in AMD's OpenGL driver, but if it isn't, or if you have run into the same problem and found a workaround (a vendor extension?), please share.
UPD: moving the problem details here.
I prepared a minimal test pack in which the application creates an FBO with six RGBA UNSIGNED_BYTE attachments and renders 100 fullscreen rects per frame to it. There are four executables with four write patterns:
1. Write shader output 0 to attachment 0. Only output 0 is routed to the framebuffer with glDrawBuffers; all other outputs are set to GL_NONE.
2. Same as 1, but with output and attachment 1.
3. Write output 0 to attachment 0, but route all six shader outputs to attachments 0..5 respectively, and mask all draw buffers except 0 with glColorMaski.
4. Same as 3, but for attachment 1.
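The four patterns differ only in the array passed to glDrawBuffers and in the per-buffer glColorMaski state. A minimal sketch that builds both for six attachments; the helper names routeSingle and routeAllMaskOthers are hypothetical, and the GL_NONE / GL_COLOR_ATTACHMENT0 values (0 and 0x8CE0) are copied in as constants so the sketch compiles without a GL loader.

```cpp
#include <array>

// Numeric values from the OpenGL headers, copied so this sketch compiles
// without a GL loader. In real code use GL_NONE and GL_COLOR_ATTACHMENT0.
constexpr unsigned int kGlNone = 0u;                   // GL_NONE
constexpr unsigned int kGlColorAttachment0 = 0x8CE0u;  // GL_COLOR_ATTACHMENT0

constexpr int kAttachmentCount = 6;

struct DrawBufferConfig {
    std::array<unsigned int, kAttachmentCount> drawBuffers{};  // argument for glDrawBuffers
    std::array<bool, kAttachmentCount> colorMask{};            // per-buffer flag for glColorMaski
};

// Patterns 1 and 2: only output `target` is routed, every other slot is GL_NONE,
// and color writes stay at their default (enabled) state.
DrawBufferConfig routeSingle(int target) {
    DrawBufferConfig cfg;
    for (int i = 0; i < kAttachmentCount; ++i) {
        cfg.drawBuffers[i] = (i == target) ? kGlColorAttachment0 + i : kGlNone;
        cfg.colorMask[i] = true;
    }
    return cfg;
}

// Patterns 3 and 4: output i goes to attachment i for all six outputs,
// but writes are masked off for every buffer except `target`.
DrawBufferConfig routeAllMaskOthers(int target) {
    DrawBufferConfig cfg;
    for (int i = 0; i < kAttachmentCount; ++i) {
        cfg.drawBuffers[i] = kGlColorAttachment0 + i;
        cfg.colorMask[i] = (i == target);
    }
    return cfg;
}
```

In a real program, `glDrawBuffers(6, cfg.drawBuffers.data())` plus a loop calling `glColorMaski(i, m, m, m, m)` with `m = cfg.colorMask[i] ? GL_TRUE : GL_FALSE` would apply a configuration; this assumes `unsigned int` matches GLenum, as it does in the standard OpenGL headers.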
I ran all tests on two machines with similar CPUs and the following GPUs:
AMD Radeon RX550, driver version 19.30.01.16
Nvidia GeForce GTX 650 Ti, which is roughly half as powerful as the RX550
and got these results:
GeForce GTX 650 Ti:
attachment0: 195 FPS
attachment1: 195 FPS
attachment0 masked: 195 FPS
attachment1 masked: 235 FPS
Radeon RX550:
attachment0: 350 FPS
attachment1: 185 FPS
attachment0 masked: 330 FPS
attachment1 masked: 175 FPS
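To put the RX550 numbers in absolute terms, the effective fill rate can be estimated from the FPS. A rough sketch, assuming the 800x600 framebuffer used by the test programs, one fragment per pixel per rect, and ignoring the glClear work; the function name is hypothetical.

```cpp
#include <cstdint>

// Estimated fragments written per second: every rect covers the whole
// width x height target once, and glClear is not counted.
double fragmentsPerSecond(std::uint64_t width, std::uint64_t height,
                          std::uint64_t rectsPerFrame, double fps) {
    const std::uint64_t fragmentsPerFrame = width * height * rectsPerFrame;
    return static_cast<double>(fragmentsPerFrame) * fps;
}
```

For the RX550 this gives about 16.8 Gfragments/s for attachment0 versus about 8.9 Gfragments/s for attachment1, so the attachment1 path reaches roughly 53% of the attachment0 rate.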
Pre-built test executables are attached to the post or can be downloaded from Google Drive.
Test sources (with an MSVS-friendly CMake build system) are available here on GitHub.
All four programs show a black window and a console with an FPS counter.
We see that when writing to a nonzero attachment, the AMD GPU is much slower than the less powerful Nvidia GPU, and also much slower than itself writing to attachment 0. Masking draw buffer output with glColorMaski also costs some FPS.
I also tried using renderbuffers instead of textures, other image formats (the formats in the tests are the most compatible ones), and a power-of-two-sized framebuffer. The results were the same.
Explicitly turning off the scissor, stencil and depth tests does not help.
If I decrease the number of attachments, or reduce framebuffer coverage by multiplying the vertex coordinates by a value less than 1, test performance increases proportionally, and eventually the RX550 outperforms the GTX 650 Ti.
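Scaling the vertex coordinates of a fullscreen quad by a factor s shrinks both axes, so the covered area, and with it the fragment count, falls with s squared. A small sketch of that relationship, assuming frame time is purely proportional to the number of shaded fragments (fixed per-frame costs are ignored); the function names are hypothetical.

```cpp
// Fraction of the viewport covered when a fullscreen quad's NDC vertex
// coordinates are multiplied by s (0 < s <= 1): width and height both scale by s.
double coveredFraction(double s) {
    return s * s;
}

// Idealized FPS if frame time were purely proportional to covered fragments.
double idealFps(double fullscreenFps, double s) {
    return fullscreenFps / coveredFraction(s);
}
```

For example, halving the coordinates (s = 0.5) leaves a quarter of the pixels, so an ideal fill-rate-bound run at 185 FPS would approach 740 FPS; measured numbers will be lower because of fixed per-frame costs.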
glClear calls are also affected, and their performance under various conditions matches the observations above.
My teammate ran the tests on a Radeon HD 3000 under Linux, both natively and under Wine. Both runs showed the same large difference between the attachment0 and attachment1 tests. I don't know his exact driver version, but it is the one provided by the Ubuntu 19.04 repositories.
Another teammate tried the tests on a Radeon RX590 and got the same 2x difference.
Finally, let me paste two almost identical test examples here. This one is fast:
#include <iostream>
#include <cassert>
#include <string>
#include <sstream>
#include <chrono>
#include "GL/glew.h"
#include "GLFW/glfw3.h"
#include <vector>
#include <iterator> // std::size
#include <cstdint>  // uint32_t
static std::string getErrorDescr(const GLenum errCode)
{
// English descriptions are from
// https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
switch (errCode) {
case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
case GL_INVALID_VALUE: return "A numeric argument is out of range.";
case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
default:;
}
return "No description available.";
}
static std::string getErrorMessage()
{
const GLenum error = glGetError();
if (GL_NO_ERROR == error) return "";
std::stringstream ss;
ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
ss << "Error string: ";
ss << getErrorDescr(error);
ss << std::endl;
return ss.str();
}
[[maybe_unused]] static bool error()
{
const auto message = getErrorMessage();
if (message.length() == 0) return false;
std::cerr << message;
return true;
}
static bool compileShader(const GLuint shader, const std::string& source)
{
unsigned int linesCount = 0;
for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
const char** sourceLines = new const char*[linesCount];
int* lengths = new int[linesCount];
int idx = 0;
const char* lineStart = source.data();
int lineLength = 1;
const auto len = source.length();
for (unsigned int i = 0; i < len; ++i) {
if (source[i] == '\n') {
sourceLines[idx] = lineStart;
lengths[idx] = lineLength;
lineLength = 1;
lineStart = source.data() + i + 1;
++idx;
}
else ++lineLength;
}
glShaderSource(shader, linesCount, sourceLines, lengths);
glCompileShader(shader);
GLint logLength;
glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
if (logLength > 0) {
auto* const log = new GLchar[logLength + 1];
glGetShaderInfoLog(shader, logLength, nullptr, log);
std::cout << "Log: " << std::endl;
std::cout << log;
delete[] log;
}
GLint compileStatus;
glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
delete[] sourceLines;
delete[] lengths;
return bool(compileStatus);
}
static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
// Returns 0 on failure; 0 is never a valid program name
const auto vs = glCreateShader(GL_VERTEX_SHADER);
if (vs == 0) {
std::cerr << "Error: vertex shader is 0." << std::endl;
return 0;
}
const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
if (fs == 0) {
std::cerr << "Error: fragment shader is 0." << std::endl;
return 0;
}
// Compile shaders
if (!compileShader(vs, vertSource)) {
std::cerr << "Error: could not compile vertex shader." << std::endl;
return 0;
}
if (!compileShader(fs, fragSource)) {
std::cerr << "Error: could not compile fragment shader." << std::endl;
return 0;
}
// Link program
const auto program = glCreateProgram();
if (program == 0) {
std::cerr << "Error: program is 0." << std::endl;
return 0;
}
glAttachShader(program, vs);
glAttachShader(program, fs);
glLinkProgram(program);
// Get log
GLint logLength = 0;
glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);
if (logLength > 0) {
auto* const log = new GLchar[logLength + 1];
glGetProgramInfoLog(program, logLength, nullptr, log);
std::cout << "Log: " << std::endl;
std::cout << log;
delete[] log;
}
GLint linkStatus = 0;
glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
if (!linkStatus) {
std::cerr << "Error: could not link." << std::endl;
glDeleteProgram(program);
return 0;
}
glDeleteShader(vs);
glDeleteShader(fs);
return program;
}
static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
gl_Position = vec4(v, 0.0, 1.0);
}
)";
static const std::string fragSource = R"(
#version 330
layout(location = 0) out vec4 outColor0;
void main()
{
outColor0 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";
int main()
{
// Init
if (!glfwInit()) {
std::cerr << "Error: glfw init failed." << std::endl;
return 3;
}
static const int width = 800;
static const int height= 600;
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
GLFWwindow* window = nullptr;
window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
if (window == nullptr) {
std::cerr << "Error: window is null." << std::endl;
glfwTerminate();
return 1;
}
glfwMakeContextCurrent(window);
if (glewInit() != GLEW_OK) {
std::cerr << "Error: glew not OK." << std::endl;
glfwTerminate();
return 2;
}
// Shader program
const auto shaderProgram = createProgram(vertSource, fragSource);
glUseProgram(shaderProgram);
// Vertex buffer
GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
GLuint buffer;
glGenBuffers(1, &buffer);
glBindBuffer(GL_ARRAY_BUFFER, buffer);
float bufferData[] = {
-1.0f, -1.0f,
1.0f, -1.0f,
1.0f, 1.0f,
-1.0f, -1.0f,
1.0f, 1.0f,
-1.0f, 1.0f
};
glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));
glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
// Framebuffer
GLuint fb, att[6];
glGenTextures(6, att);
glGenFramebuffers(1, &fb);
glBindTexture(GL_TEXTURE_2D, att[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[1]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[2]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[3]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[4]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[5]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, att[0], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, att[1], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, att[2], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, att[3], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT4, GL_TEXTURE_2D, att[4], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT5, GL_TEXTURE_2D, att[5], 0);
GLuint dbs[] = {
GL_COLOR_ATTACHMENT0,
GL_NONE,
GL_NONE,
GL_NONE,
GL_NONE,
GL_NONE};
glDrawBuffers(6, dbs);
if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER)) {
std::cerr << "Error: framebuffer is incomplete." << std::endl;
return 1;
}
if (error()) {
std::cerr << "OpenGL error occurred." << std::endl;
return 2;
}
// Fpsmeter
static const uint32_t framesMax = 50;
uint32_t framesCount = 0;
auto start = std::chrono::steady_clock::now();
// Main loop
while (!glfwWindowShouldClose(window)) {
if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);
glClear(GL_COLOR_BUFFER_BIT);
for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
glfwSwapBuffers(window);
glfwPollEvents();
if (++framesCount == framesMax) {
framesCount = 0;
const auto now = std::chrono::steady_clock::now();
const auto duration = now - start;
start = now;
const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
}
}
// Shutdown
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindVertexArray(0);
glUseProgram(0);
glDeleteProgram(shaderProgram);
glDeleteBuffers(1, &buffer);
glDeleteVertexArrays(1, &vao);
glDeleteFramebuffers(1, &fb);
glDeleteTextures(6, att);
glfwMakeContextCurrent(nullptr);
glfwDestroyWindow(window);
glfwTerminate();
return 0;
}
And this one is just as fast as the first on Nvidia and Intel GPUs, but 2-3 times slower on AMD GPUs:
#include <iostream>
#include <cassert>
#include <string>
#include <sstream>
#include <chrono>
#include "GL/glew.h"
#include "GLFW/glfw3.h"
#include <vector>
#include <iterator> // std::size
#include <cstdint>  // uint32_t
static std::string getErrorDescr(const GLenum errCode)
{
// English descriptions are from
// https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
switch (errCode) {
case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
case GL_INVALID_VALUE: return "A numeric argument is out of range.";
case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
default:;
}
return "No description available.";
}
static std::string getErrorMessage()
{
const GLenum error = glGetError();
if (GL_NO_ERROR == error) return "";
std::stringstream ss;
ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
ss << "Error string: ";
ss << getErrorDescr(error);
ss << std::endl;
return ss.str();
}
[[maybe_unused]] static bool error()
{
const auto message = getErrorMessage();
if (message.length() == 0) return false;
std::cerr << message;
return true;
}
static bool compileShader(const GLuint shader, const std::string& source)
{
unsigned int linesCount = 0;
for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
const char** sourceLines = new const char*[linesCount];
int* lengths = new int[linesCount];
int idx = 0;
const char* lineStart = source.data();
int lineLength = 1;
const auto len = source.length();
for (unsigned int i = 0; i < len; ++i) {
if (source[i] == '\n') {
sourceLines[idx] = lineStart;
lengths[idx] = lineLength;
lineLength = 1;
lineStart = source.data() + i + 1;
++idx;
}
else ++lineLength;
}
glShaderSource(shader, linesCount, sourceLines, lengths);
glCompileShader(shader);
GLint logLength;
glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
if (logLength > 0) {
auto* const log = new GLchar[logLength + 1];
glGetShaderInfoLog(shader, logLength, nullptr, log);
std::cout << "Log: " << std::endl;
std::cout << log;
delete[] log;
}
GLint compileStatus;
glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
delete[] sourceLines;
delete[] lengths;
return bool(compileStatus);
}
static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
// Returns 0 on failure; 0 is never a valid program name
const auto vs = glCreateShader(GL_VERTEX_SHADER);
if (vs == 0) {
std::cerr << "Error: vertex shader is 0." << std::endl;
return 0;
}
const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
if (fs == 0) {
std::cerr << "Error: fragment shader is 0." << std::endl;
return 0;
}
// Compile shaders
if (!compileShader(vs, vertSource)) {
std::cerr << "Error: could not compile vertex shader." << std::endl;
return 0;
}
if (!compileShader(fs, fragSource)) {
std::cerr << "Error: could not compile fragment shader." << std::endl;
return 0;
}
// Link program
const auto program = glCreateProgram();
if (program == 0) {
std::cerr << "Error: program is 0." << std::endl;
return 0;
}
glAttachShader(program, vs);
glAttachShader(program, fs);
glLinkProgram(program);
// Get log
GLint logLength = 0;
glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);
if (logLength > 0) {
auto* const log = new GLchar[logLength + 1];
glGetProgramInfoLog(program, logLength, nullptr, log);
std::cout << "Log: " << std::endl;
std::cout << log;
delete[] log;
}
GLint linkStatus = 0;
glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
if (!linkStatus) {
std::cerr << "Error: could not link." << std::endl;
glDeleteProgram(program);
return 0;
}
glDeleteShader(vs);
glDeleteShader(fs);
return program;
}
static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
gl_Position = vec4(v, 0.0, 1.0);
}
)";
static const std::string fragSource = R"(
#version 330
layout(location = 1) out vec4 outColor1;
void main()
{
outColor1 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";
int main()
{
// Init
if (!glfwInit()) {
std::cerr << "Error: glfw init failed." << std::endl;
return 3;
}
static const int width = 800;
static const int height= 600;
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
GLFWwindow* window = nullptr;
window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
if (window == nullptr) {
std::cerr << "Error: window is null." << std::endl;
glfwTerminate();
return 1;
}
glfwMakeContextCurrent(window);
if (glewInit() != GLEW_OK) {
std::cerr << "Error: glew not OK." << std::endl;
glfwTerminate();
return 2;
}
// Shader program
const auto shaderProgram = createProgram(vertSource, fragSource);
glUseProgram(shaderProgram);
// Vertex buffer
GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
GLuint buffer;
glGenBuffers(1, &buffer);
glBindBuffer(GL_ARRAY_BUFFER, buffer);
float bufferData[] = {
-1.0f, -1.0f,
1.0f, -1.0f,
1.0f, 1.0f,
-1.0f, -1.0f,
1.0f, 1.0f,
-1.0f, 1.0f
};
glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));
glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
// Framebuffer
GLuint fb, att[6];
glGenTextures(6, att);
glGenFramebuffers(1, &fb);
glBindTexture(GL_TEXTURE_2D, att[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[1]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[2]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[3]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[4]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[5]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, att[0], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, att[1], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, att[2], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, att[3], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT4, GL_TEXTURE_2D, att[4], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT5, GL_TEXTURE_2D, att[5], 0);
GLuint dbs[] = {
GL_NONE,
GL_COLOR_ATTACHMENT1,
GL_NONE,
GL_NONE,
GL_NONE,
GL_NONE};
glDrawBuffers(6, dbs);
if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER)) {
std::cerr << "Error: framebuffer is incomplete." << std::endl;
return 1;
}
if (error()) {
std::cerr << "OpenGL error occurred." << std::endl;
return 2;
}
// Fpsmeter
static const uint32_t framesMax = 50;
uint32_t framesCount = 0;
auto start = std::chrono::steady_clock::now();
// Main loop
while (!glfwWindowShouldClose(window)) {
if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);
glClear(GL_COLOR_BUFFER_BIT);
for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
glfwSwapBuffers(window);
glfwPollEvents();
if (++framesCount == framesMax) {
framesCount = 0;
const auto now = std::chrono::steady_clock::now();
const auto duration = now - start;
start = now;
const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
}
}
// Shutdown
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindVertexArray(0);
glUseProgram(0);
glDeleteProgram(shaderProgram);
glDeleteBuffers(1, &buffer);
glDeleteVertexArrays(1, &vao);
glDeleteFramebuffers(1, &fb);
glDeleteTextures(6, att);
glfwMakeContextCurrent(nullptr);
glfwDestroyWindow(window);
glfwTerminate();
return 0;
}
The only difference between these examples is the color attachment used (and the matching fragment shader output location).
I deliberately wrote two nearly identical copy-pasted programs, instead of switching attachments in one program, to avoid any side effects of deleting and recreating the framebuffer.
UPD2: I also ran the test examples with an OpenGL 4.6 debug context on both Nvidia and AMD and got no performance warnings.
UPD3: RX470 results:
attachment0: 775 FPS
attachment1: 396 FPS
UPD4: I built the attachment0 and attachment1 tests for WebGL via Emscripten and ran them on the Radeon RX550. The full source is in the problem's GitHub repo; the build command lines are:
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment0_webgl.cpp -o attachment0.html
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment1_webgl.cpp -o attachment1.html
Both test programs issue a single drawcall: glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 1000);
First test: Firefox with default config, i.e. DirectX-backed ANGLE.
Unmasked Vendor: Google Inc.
Unmasked Renderer: ANGLE (Radeon RX550/550 Series Direct3D11 vs_5_0 ps_5_0)
attachment0: 38 FPS
attachment1: 38 FPS
Second test: Firefox with ANGLE disabled (about:config -> webgl.disable-angle = true), i.e. using native OpenGL:
Unmasked Vendor: ATI Technologies Inc.
Unmasked Renderer: Radeon RX550/550 Series
attachment0: 38 FPS
attachment1: 19 FPS
We see that DirectX is not affected by the problem, and that the OpenGL issue is reproducible in WebGL. This is expected, since gamers and developers have complained only about OpenGL performance.
P.S. My issue is probably the root cause of this and this performance drop.
AMD has fixed the problem as of (at least) the December 2019 driver. The fix is confirmed by the above-mentioned test programs and by our game engine's frame rate.
See also this thread.
Dear AMD OpenGL driver team, thank you very much!
When testing on Windows the code works as expected, but on Android the glGetTexImage API doesn't exist. Is there another way of getting all the pixels from OpenGL without caching them before creating the texture?
This is the code:
void Texture::Bind(int unit)
{
    glActiveTexture(GL_TEXTURE0 + unit);
    glBindTexture(GL_TEXTURE_2D, mTextureID);
}
GLubyte* Texture::GetPixels()
{
    Bind();
    int data_size = mWidth * mHeight * 4;
    GLubyte* pixels = new GLubyte[mWidth * mHeight * 4];
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    return pixels;
}
glGetTexImage doesn't exist in OpenGL ES.
In OpenGL ES, you have to attach the texture to a framebuffer and read the color plane from the framebuffer with glReadPixels:
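Bind();
int data_size = mWidth * mHeight * 4; // RGBA8: 4 bytes per texel
GLubyte* pixels = new GLubyte[data_size];
GLuint textureObj = ...; // the texture object - glGenTextures
// Attach the texture to a temporary framebuffer so it can be read back
GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, textureObj, 0);
// GL_RGBA / GL_UNSIGNED_BYTE is always a valid readback combination for an RGBA8 color buffer in OpenGL ES
if (glCheckFramebufferStatus(GL_FRAMEBUFFER) == GL_FRAMEBUFFER_COMPLETE)
    glReadPixels(0, 0, mWidth, mHeight, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
// Return to the default framebuffer and delete the temporary one
glBindFramebuffer(GL_FRAMEBUFFER, 0);
glDeleteFramebuffers(1, &fbo);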
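An allocation of mWidth * mHeight * 4 bytes is exact for GL_RGBA / GL_UNSIGNED_BYTE, because each row is already a multiple of the default GL_PACK_ALIGNMENT of 4. If a different format is read back, glReadPixels pads each row to GL_PACK_ALIGNMENT bytes. A hedged sketch of a helper (the name readPixelsBufferSize is hypothetical) that computes a safe buffer size:

```cpp
#include <cstddef>

// Safe buffer size for glReadPixels output: every row is rounded up to
// packAlignment bytes (1, 2, 4 or 8, the value set via GL_PACK_ALIGNMENT).
// Allocating stride * height may over-allocate the final row slightly.
std::size_t readPixelsBufferSize(int width, int height, int bytesPerPixel, int packAlignment) {
    const std::size_t rowBytes = static_cast<std::size_t>(width) * static_cast<std::size_t>(bytesPerPixel);
    const std::size_t align = static_cast<std::size_t>(packAlignment);
    const std::size_t stride = (rowBytes + align - 1) / align * align;  // round up to a multiple of align
    return stride * static_cast<std::size_t>(height);
}
```

Alternatively, calling glPixelStorei(GL_PACK_ALIGNMENT, 1) before glReadPixels makes the rows tightly packed, so width * height * bytesPerPixel is enough.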
How to display an image texture in GLES2?
The source below:
initializes the GLES2 display, surface, etc.,
creates an offscreen framebuffer,
loads the RGBA image into a texture,
clears the screen to blue,
tries to display the loaded image texture (I failed to find the correct API for GLES2),
reads the FBO and writes it to a file.
For displaying, the glEnableClientState and glVertexPointer APIs are not supported in GLES2.
How can I display the loaded image texture in GLES2?
With the source below, the buffer read back by glReadPixels contains only blue.
unsigned char *video_raw = loadFile("./video.raw");//RGBA raw image
int iConfigs;
EGLConfig eglConfig;
EGLint ai32ContextAttribs[] = { EGL_CONTEXT_CLIENT_VERSION, 2,EGL_NONE };
EGLDisplay eglDisplay = eglGetDisplay((EGLNativeDisplayType)0);
eglInitialize(eglDisplay, 0, 0);
eglBindAPI(EGL_OPENGL_ES_API);
EGLint pi32ConfigAttribs[5];
pi32ConfigAttribs[0] = EGL_SURFACE_TYPE;
pi32ConfigAttribs[1] = EGL_WINDOW_BIT;
pi32ConfigAttribs[2] = EGL_RENDERABLE_TYPE;
pi32ConfigAttribs[3] = EGL_OPENGL_ES2_BIT;
pi32ConfigAttribs[4] = EGL_NONE;
eglChooseConfig(eglDisplay, pi32ConfigAttribs, &eglConfig, 1, &iConfigs);
EGLSurface eglSurface = eglCreatePbufferSurface(eglDisplay, eglConfig, NULL);
EGLContext eglContext = eglCreateContext(eglDisplay, eglConfig, NULL, ai32ContextAttribs);
eglMakeCurrent(eglDisplay, eglSurface, eglSurface, eglContext);
GLuint fboId = 0;
GLuint renderBufferWidth = 960;
GLuint renderBufferHeight = 540;
glGenFramebuffers(1, &fboId);
glBindFramebuffer(GL_FRAMEBUFFER, fboId);
GLuint renderBuffer;
glGenRenderbuffers(1, &renderBuffer);
glBindRenderbuffer(GL_RENDERBUFFER, renderBuffer);
glRenderbufferStorage(GL_RENDERBUFFER, GL_RGB565, renderBufferWidth, renderBufferHeight);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, renderBuffer);
glClearColor(0.0,0.0,1.0,1.0);
glClear(GL_COLOR_BUFFER_BIT);
glEnable(GL_TEXTURE_2D);
GLuint texture_object_id;
glGenTextures(1, &texture_object_id);
glBindTexture(GL_TEXTURE_2D, texture_object_id);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, renderBufferWidth, renderBufferHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, video_raw);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
GLfloat vtx1[] = { -1, -1, 0, -1, 1, 0, 1, 1, 0, 1, -1, 0 };
GLfloat tex1[] = { 0, 0, 0, 1, 1, 1, 1, 0 };
/*glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, vtx1);
glTexCoordPointer(2, GL_FLOAT, 0, tex1);
glDrawArrays(GL_TRIANGLE_FAN, 0, 4);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_TEXTURE_COORD_ARRAY);*/
eglSwapBuffers( eglDisplay, eglSurface);
//read & write to a file
int size = 4 * renderBufferHeight * renderBufferWidth;
unsigned char *data2 = new unsigned char[size];
glReadPixels(0, 0, renderBufferWidth, renderBufferHeight, GL_RGBA, GL_UNSIGNED_BYTE, data2);
dumptoFile("./read1.raw", size, data2);
Edit 1:
@Rabbid76,
Thanks for the reply. When I used your vertex shader with "in vec3 inPos;\n", shader compilation failed, so I replaced "in" with "uniform".
With your suggestions added, the source below produces a black screen.
static const GLuint WIDTH = 960;
static const GLuint HEIGHT = 540;
static const GLchar* vertex_shader_source =
"#version 100\n"
"precision mediump float;\n"
"uniform vec3 inPos;\n"
"uniform vec2 inUV;\n"
"varying vec2 vUV;\n"
"void main(){\n"
" vUV = inUV;\n"
" gl_Position = vec4(inPos, 1.0);\n"
"}\n";
static const GLchar* fragment_shader_source =
"#version 100\n"
"precision mediump float;\n"
"varying vec2 vUV;\n"
"uniform sampler2D u_texture;\n"
"void main(){\n"
" gl_FragColor = texture2D(u_texture, vUV);\n"
"}\n";
int main(int argc, char **argv)
{
unsigned char *video_raw = loadFile("./video.raw");
int iConfigs;
EGLConfig eglConfig;
EGLint ai32ContextAttribs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
EGLDisplay eglDisplay = eglGetDisplay((EGLNativeDisplayType) 0);
eglInitialize(eglDisplay, 0, 0);
eglBindAPI(EGL_OPENGL_ES_API);
EGLint pi32ConfigAttribs[5];
pi32ConfigAttribs[0] = EGL_SURFACE_TYPE;
pi32ConfigAttribs[1] = EGL_WINDOW_BIT;
pi32ConfigAttribs[2] = EGL_RENDERABLE_TYPE;
pi32ConfigAttribs[3] = EGL_OPENGL_ES2_BIT;
pi32ConfigAttribs[4] = EGL_NONE;
eglChooseConfig(eglDisplay, pi32ConfigAttribs, &eglConfig, 1, &iConfigs);
EGLSurface eglSurface = eglCreatePbufferSurface(eglDisplay, eglConfig, NULL);
EGLContext eglContext = eglCreateContext(eglDisplay, eglConfig, NULL, ai32ContextAttribs);
eglMakeCurrent(eglDisplay, eglSurface, eglSurface, eglContext);
GLuint shader_program, framebuffer, renderBuffer;
glGenRenderbuffers(1, &renderBuffer);
glBindRenderbuffer(GL_RENDERBUFFER, renderBuffer);
glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA, WIDTH, HEIGHT);
glGenFramebuffers(1, &framebuffer);
glBindFramebuffer(GL_FRAMEBUFFER, framebuffer);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, renderBuffer);
glClearColor(0.0f, 0.0f, 1.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);
glViewport(0, 0, WIDTH, HEIGHT);
glEnable(GL_TEXTURE_2D);
shader_program = common_get_shader_program(vertex_shader_source, fragment_shader_source);
GLint vert_inx = glGetAttribLocation(shader_program, "inPos");
GLint uv_inx = glGetAttribLocation(shader_program, "inUV");
GLint tex_loc = glGetUniformLocation(shader_program, "u_texture");
GLuint texture_object_id;
glGenTextures(1, &texture_object_id);
glBindTexture(GL_TEXTURE_2D, texture_object_id);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, WIDTH, HEIGHT, 0, GL_RGBA, GL_UNSIGNED_BYTE, video_raw);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
GLfloat vtx1[] = { -1, -1, 0, -1, 1, 0, 1, 1, 0, 1, -1, 0 };
GLfloat tex1[] = { 0, 0, 0, 1, 1, 1, 1, 0 };
glVertexAttribPointer(vert_inx, 3, GL_FLOAT, GL_FALSE, 0, vtx1);
glEnableVertexAttribArray(vert_inx);
glVertexAttribPointer(uv_inx, 2, GL_FLOAT, GL_FALSE, 0, tex1);
glEnableVertexAttribArray(uv_inx);
glViewport(0, 0, WIDTH, HEIGHT);
glUseProgram(shader_program);
glUniform1i(tex_loc, 0);
glDrawArrays( GL_TRIANGLE_FAN, 0, 4);
glFlush();
int size = 4 * WIDTH * HEIGHT;
unsigned char *data2 = new unsigned char[size];
glReadPixels(0, 0, WIDTH, HEIGHT, GL_RGBA, GL_UNSIGNED_BYTE, data2);
dumptoFile("./read1.raw", size, data2);
return EXIT_SUCCESS;
}
You have to use a shader program and define the arrays of generic vertex attribute data. See also Vertex Specification.
Create, compile and link a very simple shader program like the following:
const char *sh_vert =
"#version 100\n"\
"precision mediump float;\n"\
"attribute vec3 inPos;\n"\
"attribute vec2 inUV;\n"\
"varying vec2 vUV;\n"\
"void main()\n"\
"{\n"\
" vUV = inUV;\n"\
" gl_Position = vec4(inPos, 1.0);\n"\
"}";
const char *sh_frag =
"#version 100\n"\
"precision mediump float;\n"\
"varying vec2 vUV;\n"\
"uniform sampler2D u_texture;\n"\
"void main()\n"\
"{\n"\
" gl_FragColor = texture2D(u_texture, vUV);\n"\
"}";
GLuint v_sh = glCreateShader( GL_VERTEX_SHADER );
glShaderSource( v_sh, 1, &sh_vert, nullptr );
glCompileShader( v_sh );
GLint status = GL_TRUE;
glGetShaderiv( v_sh, GL_COMPILE_STATUS, &status );
if ( status == GL_FALSE )
{
// compile error
}
GLuint f_sh = glCreateShader( GL_FRAGMENT_SHADER );
glShaderSource( f_sh, 1, &sh_frag, nullptr );
glCompileShader( f_sh );
status = GL_TRUE;
glGetShaderiv( f_sh, GL_COMPILE_STATUS, &status );
if ( status == GL_FALSE )
{
// compile error
}
GLuint prog = glCreateProgram();
glAttachShader( prog, v_sh );
glAttachShader( prog, f_sh );
glLinkProgram( prog );
status = GL_TRUE;
glGetProgramiv( prog, GL_LINK_STATUS, &status );
if ( status == GL_FALSE )
{
// link error
}
Get the attribute indices and the location of the texture sampler uniform:
GLint vert_inx = glGetAttribLocation( prog, "inPos" );
GLint uv_inx = glGetAttribLocation( prog, "inUV" );
GLint tex_loc = glGetUniformLocation( prog, "u_texture" );
Then define the arrays of generic vertex attribute data with glVertexAttribPointer and enable them with glEnableVertexAttribArray:
GLfloat vtx1[] = { -1, -1, 0, -1, 1, 0, 1, 1, 0, 1, -1, 0 };
GLfloat tex1[] = { 0, 0, 0, 1, 1, 1, 1, 0 };
glVertexAttribPointer( vert_inx, 3, GL_FLOAT, GL_FALSE, 0, vtx1);
glEnableVertexAttribArray( vert_inx );
glVertexAttribPointer( uv_inx, 2, GL_FLOAT, GL_FALSE, 0, tex1);
glEnableVertexAttribArray( uv_inx );
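The texture coordinates in tex1 are simply the positions from vtx1 remapped from normalized device coordinates [-1, 1] to texture space [0, 1], so the quad covers the whole viewport and texel (0, 0) lands in the bottom-left corner. A minimal sketch of that mapping; uvFromPositions is a hypothetical helper, not part of the code above.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kCorners = 4;

// Derive the UVs of a fullscreen quad from its NDC positions (x, y, z per corner).
// Each x and y in [-1, 1] is mapped to (value + 1) / 2 in [0, 1]; z is ignored.
std::array<float, 2 * kCorners> uvFromPositions(const std::array<float, 3 * kCorners>& xyz)
{
    std::array<float, 2 * kCorners> uv{};
    for (std::size_t i = 0; i < kCorners; ++i) {
        uv[2 * i]     = (xyz[3 * i]     + 1.0f) * 0.5f; // u from x
        uv[2 * i + 1] = (xyz[3 * i + 1] + 1.0f) * 0.5f; // v from y
    }
    return uv;
}
```

Applied to vtx1 it reproduces tex1 exactly, which is a quick way to check that the two arrays describe the corners in the same order.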
Set up the renderbuffer and the framebuffer and adjust the viewport:
GLuint fboId;
glGenFramebuffers(1, &fboId);
glBindFramebuffer(GL_FRAMEBUFFER, fboId);
GLuint renderBuffer;
glGenRenderbuffers(1, &renderBuffer);
glBindRenderbuffer(GL_RENDERBUFFER, renderBuffer);
glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA4, renderBufferWidth, renderBufferHeight); // ES 2.0 requires a sized format; GL_RGBA8_OES needs OES_rgb8_rgba8
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, renderBuffer);
glViewport(0,0,renderBufferWidth,renderBufferHeight);
glClearColor(0.0f, 0.0f, 1.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);
Use the program, set the texture sampler uniform and draw the geometry:
// use the program
glUseProgram( prog );
glUniform1i( tex_loc, 0 ); // 0 == texture unit 0
// draw the geometry
glDrawArrays( GL_TRIANGLE_FAN, 0, 4 );
glUseProgram( 0 );
Finally the image can be read:
int size = 4 * renderBufferHeight * renderBufferWidth;
unsigned char *data2 = new unsigned char[size];
glReadPixels(0, 0, renderBufferWidth, renderBufferHeight, GL_RGBA, GL_UNSIGNED_BYTE, data2);
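glReadPixels returns the rows starting at the bottom-left corner of the framebuffer, so a raw dump viewed with the first row at the top appears upside down. A minimal sketch of flipping the rows before writing the file, assuming tightly packed GL_RGBA / GL_UNSIGNED_BYTE data (which the default GL_PACK_ALIGNMENT of 4 guarantees for 4-byte pixels); flipRowsInPlace is a hypothetical helper. Using a std::vector also avoids leaking the buffer allocated with new[].

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// glReadPixels writes the BOTTOM row of the framebuffer first. Swap rows in place
// so that the first row in memory is the top of the image. Assumes tightly packed rows.
void flipRowsInPlace(std::vector<unsigned char>& pixels, std::size_t width,
                     std::size_t height, std::size_t bytesPerPixel = 4)
{
    const std::size_t rowBytes = width * bytesPerPixel;
    assert(pixels.size() >= rowBytes * height);
    unsigned char* data = pixels.data();
    for (std::size_t top = 0; top < height / 2; ++top) {
        const std::size_t bottom = height - 1 - top;
        std::swap_ranges(data + top * rowBytes, data + (top + 1) * rowBytes,
                         data + bottom * rowBytes);
    }
}
```

With std::vector<unsigned char> data2(size), the read becomes glReadPixels(0, 0, renderBufferWidth, renderBufferHeight, GL_RGBA, GL_UNSIGNED_BYTE, data2.data()); followed by flipRowsInPlace(data2, renderBufferWidth, renderBufferHeight);.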
I was curious to see the performance of texture uploads with my configuration using OpenGL and noticed something I think is odd. I create a 4K texture using glTexStorage2D with a format of GL_RGBA8. Then, every frame I use glTexSubImage2D to re-upload a static image buffer to the texture. Based on the frame rate I get about 5.19GB/s. Next, I changed the format of the texture to GL_SRGB8_ALPHA8 and retried the experiment. This time I am getting 2.81GB/s, a significant decrease. This seems odd because, as far as I know, uploading sRGB data should be no different from uploading RGB data: no conversion should take place (sRGB conversion should happen in the shader, during sampling).
Some additional information: for the first test I use GL_RGBA and GL_UNSIGNED_INT_8_8_8_8_REV in the call to glTexSubImage2D, as this is what the driver (through glGetInternalformativ) tells me is ideal. For the second test I use GL_UNSIGNED_INT_8_8_8_8, as per the driver's suggestion. A bit of testing confirms that these are the fastest formats to use in each case. This is on an Nvidia GeForce GTX 760 under Windows 7 x64 with the 332.21 drivers.
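The two packed types differ only in which end of the 32-bit word holds the first (red) component. A small sketch of that layout, following the packed-pixel tables of the OpenGL specification; unpackRGBA is a hypothetical helper. Note that the fill value used in the test program, 0xFF0000FF, reads the same from both ends, so both builds upload the same opaque red and the speed difference cannot come from different pixel data.

```cpp
#include <array>
#include <cstdint>

// Returns {R, G, B, A} stored in one 32-bit texel for format GL_RGBA.
// GL_UNSIGNED_INT_8_8_8_8:     R in bits 31..24, ..., A in bits 7..0.
// GL_UNSIGNED_INT_8_8_8_8_REV: R in bits 7..0,  ..., A in bits 31..24.
// On a little-endian CPU the _REV layout has the same byte order in memory
// as GL_RGBA with GL_UNSIGNED_BYTE.
std::array<std::uint8_t, 4> unpackRGBA(std::uint32_t texel, bool reversed)
{
    std::array<std::uint8_t, 4> rgba{};
    for (int i = 0; i < 4; ++i) {
        const int shift = reversed ? 8 * i : 8 * (3 - i);
        rgba[i] = static_cast<std::uint8_t>((texel >> shift) & 0xFFu);
    }
    return rgba;
}
```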
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <vector>
#include <cstdlib>
#include <cstdio>
#define SCREEN_SIZE_X 1024
#define SCREEN_SIZE_Y 1024
#define GLSL(src) "#version 440 core\n" #src
const char* vertex_shader = GLSL(
const vec2 data[4] = vec2[]
(
vec2(-1.0, 1.0),
vec2(-1.0, -1.0),
vec2( 1.0, 1.0),
vec2( 1.0, -1.0)
);
void main()
{
gl_Position = vec4(data[gl_VertexID], 0.0, 1.0);
}
);
const char* fragment_shader = GLSL(
layout(location = 0) uniform sampler2D texture0;
layout(location = 1) uniform vec2 screenSize;
out vec4 frag_color;
void main()
{
frag_color = texture(texture0, gl_FragCoord.xy / screenSize);
}
);
int main(int argc, char *argv[])
{
if(!glfwInit())
exit(EXIT_FAILURE);
glfwWindowHint(GLFW_RESIZABLE, GL_FALSE);
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 4);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 4);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);
GLFWwindow* window = glfwCreateWindow(SCREEN_SIZE_X, SCREEN_SIZE_Y, "OpenGL Texture Upload", nullptr, nullptr);
if(!window)
{
glfwTerminate();
exit(EXIT_FAILURE);
}
glfwMakeContextCurrent(window);
glfwSwapInterval(0);
glewExperimental = GL_TRUE;
if(glewInit() != GLEW_OK)
{
glfwTerminate();
exit(EXIT_FAILURE);
}
GLuint vao = 0;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
GLuint vs = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(vs, 1, &vertex_shader, nullptr);
glCompileShader(vs);
GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(fs, 1, &fragment_shader, nullptr);
glCompileShader(fs);
GLuint shader_program = glCreateProgram();
glAttachShader(shader_program, fs);
glAttachShader(shader_program, vs);
glLinkProgram(shader_program);
glUseProgram(shader_program);
glProgramUniform2f(shader_program, 1, SCREEN_SIZE_X, SCREEN_SIZE_Y);
GLuint texture = 0;
glGenTextures(1, &texture);
#ifdef USE_SRGB
glTextureStorage2DEXT(texture, GL_TEXTURE_2D, 1, GL_SRGB8_ALPHA8, 4096, 4096);
#else
glTextureStorage2DEXT(texture, GL_TEXTURE_2D, 1, GL_RGBA8, 4096, 4096);
#endif
glTextureParameteriEXT(texture, GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTextureParameteriEXT(texture, GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTextureParameteriEXT(texture, GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTextureParameteriEXT(texture, GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindMultiTextureEXT(GL_TEXTURE0, GL_TEXTURE_2D, texture);
glProgramUniform1i(shader_program, 0, 0);
std::vector<unsigned int> image_buffer(4096*4096, 0xFF0000FFul);
double lastTime = glfwGetTime();
double nbFrames = 0;
while(!glfwWindowShouldClose(window))
{
double currentTime = glfwGetTime();
nbFrames++;
if (currentTime - lastTime >= 1.0)
{
char cbuffer[50];
snprintf(cbuffer, sizeof(cbuffer), "OpenGL Texture Upload [%.1f fps, %.3f ms]", nbFrames, 1000.0 / nbFrames);
glfwSetWindowTitle(window, cbuffer);
nbFrames = 0;
lastTime++;
}
#ifdef USE_SRGB
glTextureSubImage2DEXT(texture, GL_TEXTURE_2D, 0, 0, 0, 4096, 4096, GL_RGBA, GL_UNSIGNED_INT_8_8_8_8, image_buffer.data());
#else
glTextureSubImage2DEXT(texture, GL_TEXTURE_2D, 0, 0, 0, 4096, 4096, GL_RGBA, GL_UNSIGNED_INT_8_8_8_8_REV, image_buffer.data());
#endif
glClear(GL_COLOR_BUFFER_BIT);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
glfwSwapBuffers(window);
glfwPollEvents();
}
glfwDestroyWindow(window);
glfwTerminate();
exit(EXIT_SUCCESS);
}
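For reference, the throughput figures in the question follow from the frame rate: one 4096x4096 RGBA8 upload moves 4096 x 4096 x 4 = 67,108,864 bytes (64 MiB), so 5.19 GB/s is roughly 77 uploads per second if GB means 10^9 bytes. The question does not say which definition was used; the helpers below are hypothetical and assume exactly one upload per frame.

```cpp
#include <cstdint>

// Bytes transferred by one full upload of a width x height image.
constexpr std::uint64_t uploadBytes(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t bytesPerPixel)
{
    return width * height * bytesPerPixel;
}

// Throughput in GB/s (1 GB = 1e9 bytes), assuming one upload per frame.
constexpr double uploadGBps(std::uint64_t bytesPerUpload, double framesPerSecond)
{
    return static_cast<double>(bytesPerUpload) * framesPerSecond / 1e9;
}

// Inverse: uploads per second implied by a measured throughput in GB/s.
constexpr double uploadsPerSecond(double gbps, std::uint64_t bytesPerUpload)
{
    return gbps * 1e9 / static_cast<double>(bytesPerUpload);
}
```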
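Apparently there is such a thing as a 'native pixel format': if the data you pass matches the layout the GPU uses internally, the driver can copy it directly; otherwise it has to convert it first. Look at this link from Nvidia, especially section 32.1.3.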
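To make the idea concrete: when the layout of the client data differs from the internal one, every texel may have to be repacked instead of being transferred with a straight memcpy-like copy. The sketch below shows such a repack, assuming purely for illustration that the internal layout is BGRA; which layout is native for GL_SRGB8_ALPHA8 on the GTX 760 is not stated here. rgbaToBgraInPlace is a hypothetical helper.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Illustration only: per-texel repacking from RGBA bytes to an (assumed) BGRA
// internal layout. Every texel is touched, so the upload is no longer a plain copy.
void rgbaToBgraInPlace(std::vector<unsigned char>& pixels)
{
    for (std::size_t i = 0; i + 3 < pixels.size(); i += 4)
        std::swap(pixels[i], pixels[i + 2]); // exchange R and B, keep G and A
}
```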
I'm programming a 2D game in OpenGL and I have to draw a level which consists of 20x15 fields.
Currently I draw a textured quad for each field, which is quite slow (300 textured quads per frame).
Since the level never changes, I wondered whether it's possible to combine the textures into one big texture before the game loop starts.
Then I would only have to draw one texture with 4 texture coordinates (0/0)(0/1)(1/1)(1/0) and 4 glVertex2f() calls, which specify its position in the window.
This is my current code for each of the 300 fields:
glColor3f(1,1,1);
glBindTexture(GL_TEXTURE_2D,textur);
glBegin(GL_QUADS);
glTexCoord2f(textArea.a.x,textArea.b.y);glVertex2f(display.a.x,display.a.y);
glTexCoord2f(textArea.a.x,textArea.a.y);glVertex2f(display.a.x,display.b.y);
glTexCoord2f(textArea.b.x,textArea.a.y);glVertex2f(display.b.x,display.b.y);
glTexCoord2f(textArea.b.x,textArea.b.y);glVertex2f(display.b.x,display.a.y);
glEnd();
Note that I have the images for all possible field types in one .tga file, so I select the right one with glTexCoord2f().
The image file with all tiles is loaded into
GLuint textur;
So I bind the same texture for every field.
My goal is to decrease CPU time. Display lists didn't help: there was so much data to load onto the graphics card that in the end they were even slower.
I also wasn't able to use VBOs because I don't use extensions like GLUT.
So my idea was to generate a single texture which should be quite easy and effective.
I hope you can give me feedback on how I can combine textures, and whether this would be the easiest way to increase performance.
EDIT: these are the OpenGL functions I use in my program:
When I start the program, I initialize the window:
glfwInit();
if( !glfwOpenWindow(windowSize.x,windowSize.y, 0, 0, 0, 0, 0, 0, GLFW_WINDOW ) )
{ glfwTerminate();
return;
}
And this is everything the game loop does with OpenGL:
int main()
{
//INIT HERE (see code above)
glBlendFunc(GL_SRC_ALPHA,GL_ONE_MINUS_SRC_ALPHA);
glEnable(GL_BLEND);
glAlphaFunc(GL_GREATER,0.1f);
glEnable(GL_ALPHA_TEST);
long loopStart;//measure loopcycle-time
do{
height = height > 0 ? height : 1;
glViewport( 0, 0, width, height ); //set Origin
glClearColor( 0.0f, 0.0f, 0.0f, 0.0f ); //background-color
glClear(GL_COLOR_BUFFER_BIT);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0,windowSize.x,0,windowSize.y,0,128); //2D-Mode
glMatrixMode(GL_MODELVIEW);
loopStart=clock();
//(...) OUTPUT HERE (code see above)
glfwSwapBuffers(); //output the rendered frame
printf("%4ldms -> ", (long)((clock() - loopStart) * 1000 / CLOCKS_PER_SEC)); // clock() counts ticks, not ms
}while(...);
glDisable(GL_ALPHA_TEST);
glDisable(GL_TEXTURE_2D);
glfwTerminate();
}
I see you're using GLFW. You can add GLEW and GLM and then you should use OpenGL 3.x or higher.
Here is a FULL example of how you can easily draw 2000 textured quads (with alpha blending) or more at 200 FPS or more on a low-budget laptop. It uses only one small texture, but it also works with a 4096x4096 texture atlas. You get a HUGE performance gain if the subtexture size in the big texture EXACTLY matches the size of the quad you draw! You should use 50x50 pixels in the big texture too! The following demo code also UPDATES ALL 2000 quads every frame and sends them to the GPU. If you don't need to update them every frame and instead pass the scroll coordinates to the shader, you will gain even more performance.
If you don't need blending, use alpha testing instead; you will gain even more speed. (A core profile has no fixed-function alpha test, so use discard in the fragment shader.)
#define GLEW_STATIC
#include "glew.h"
#include "glfw.h"
#include "glm.hpp"
#include "glm/gtc/matrix_transform.hpp"
#include "glm/gtx/transform.hpp"
#include <algorithm>
#include <cstdlib>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#define BUFFER_OFFSET(i) ((char *)NULL + (i))
std::ofstream logger("Log\\Ausgabe.txt", (std::ios::out | std::ios::app));
class Vertex
{
public:
float x;
float y;
float z;
float tx;
float ty;
};
class Quad
{
public:
float x;
float y;
float width;
float height;
};
int getHighResTimeInMilliSeconds(bool bFirstRun);
GLuint buildShader();
void addQuadToLocalVerticeArray(Vertex * ptrVertexArrayLocal, Quad *quad, int *iQuadCounter);
int main()
{
logger << "Start" << std::endl;
if(!glfwInit())
exit(EXIT_FAILURE);
glfwOpenWindowHint(GLFW_OPENGL_VERSION_MAJOR,3);
glfwOpenWindowHint(GLFW_OPENGL_VERSION_MINOR,3);
glfwOpenWindowHint(GLFW_OPENGL_FORWARD_COMPAT, 1);
glfwOpenWindowHint(GLFW_OPENGL_PROFILE,GLFW_OPENGL_CORE_PROFILE);
if( !glfwOpenWindow(1366, 768,8,8,8,8,32,32,GLFW_FULLSCREEN) )
{
glfwTerminate();
exit( EXIT_FAILURE );
}
glewExperimental = GL_TRUE; // older GLEW versions need this to load core-profile entry points
if (glewInit() != GLEW_OK)
exit( EXIT_FAILURE );
//Init
GLuint VertexArrayID;
GLuint vertexbuffer;
GLuint MatrixID;
GLuint TextureID;
GLuint Texture;
GLuint programID = buildShader();
//Create the texture in video memory
GLFWimage img;
int iResult = glfwReadImage("Graphics\\gfx.tga", &img, GLFW_NO_RESCALE_BIT);
if (!iResult)
{
logger << "Could not load Graphics\\gfx.tga" << std::endl;
exit( EXIT_FAILURE );
}
glGenTextures(1, &Texture);
glBindTexture(GL_TEXTURE_2D, Texture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, img.Width, img.Height, 0, GL_RGBA, GL_UNSIGNED_BYTE, img.Data);
glfwFreeImage(&img);
Vertex * ptrVertexArrayLocal = new Vertex[12000];
glGenVertexArrays(1, &VertexArrayID);
glBindVertexArray(VertexArrayID);
glGenBuffers(1, &vertexbuffer);
glBindBuffer(GL_ARRAY_BUFFER, vertexbuffer);
glBufferData(GL_ARRAY_BUFFER, sizeof(Vertex) * 12000, NULL, GL_DYNAMIC_DRAW);
glm::mat4 Projection = glm::ortho(0.0f, (float)1366,0.0f, (float)768, 0.0f, 100.0f);
glm::mat4 Model = glm::mat4(1.0f);
glm::mat4 MVP = Projection * Model;
glViewport( 0, 0, 1366, 768 );
MatrixID = glGetUniformLocation(programID, "MVP");
glEnable(GL_CULL_FACE);
glEnable (GL_BLEND);
glBlendFunc (GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
TextureID = glGetUniformLocation(programID, "myTextureSampler");
glUseProgram(programID);
glUniformMatrix4fv(MatrixID, 1, GL_FALSE, &MVP[0][0]);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, Texture);
glUniform1i(TextureID, 0);
int iQuadVerticeCounter=0;
int iNumOfQuads = 2000;
Quad * ptrQuads = new Quad[iNumOfQuads];
//LOCAL VERTICES CHANGES EACH LOOP
for (int i=0; i<iNumOfQuads; i++)
{
ptrQuads[i].width = 32;
ptrQuads[i].height = 32;
ptrQuads[i].x = (float)(rand() % (1334));
ptrQuads[i].y = (float)(rand() % (736));
}
int iCurrentTime=0;
int iFPS=0;
int iFrames=0;
int iFrameCounterTimeStart=0;
int running = GL_TRUE;
bool bFirstRun=true;
while( running )
{
iCurrentTime = getHighResTimeInMilliSeconds(bFirstRun);
bFirstRun=false;
//UPDATE ALL QUADS EACH FRAME!
for (int i=0; i<iNumOfQuads; i++)
{
ptrQuads[i].width = 32;
ptrQuads[i].height = 32;
ptrQuads[i].x = ptrQuads[i].x;
ptrQuads[i].y = ptrQuads[i].y;
addQuadToLocalVerticeArray(ptrVertexArrayLocal, &ptrQuads[i], &iQuadVerticeCounter);
}
//DO THE RENDERING
glClear( GL_COLOR_BUFFER_BIT );
glBindBuffer(GL_ARRAY_BUFFER, vertexbuffer);
glBufferSubData(GL_ARRAY_BUFFER, 0,sizeof(Vertex) * iQuadVerticeCounter, ptrVertexArrayLocal);
glEnableVertexAttribArray(0);
glBindBuffer(GL_ARRAY_BUFFER, vertexbuffer);
glVertexAttribPointer(0,3,GL_FLOAT,GL_FALSE,sizeof(Vertex),BUFFER_OFFSET(0));
glEnableVertexAttribArray(1);
glBindBuffer(GL_ARRAY_BUFFER, vertexbuffer);
glVertexAttribPointer(1,2,GL_FLOAT,GL_FALSE,sizeof(Vertex),BUFFER_OFFSET(3*sizeof(GLfloat)));
glDrawArrays(GL_TRIANGLES, 0, iQuadVerticeCounter);
glDisableVertexAttribArray(0);
glDisableVertexAttribArray(1);
iQuadVerticeCounter=0;
glfwSwapBuffers();
//END OF DOING THE RENDERING
running = !glfwGetKey( GLFW_KEY_ESC ) &&glfwGetWindowParam( GLFW_OPENED );
iFrames++;
if (iCurrentTime >= iFrameCounterTimeStart + 1000.0f)
{
iFPS = (int)(iFrames * 1000.0f / (iCurrentTime - iFrameCounterTimeStart));
iFrameCounterTimeStart = iCurrentTime;
iFrames = 0;
logger << "FPS: " << iFPS << std::endl;
}
}
glfwTerminate();
exit( EXIT_SUCCESS );
}
int getHighResTimeInMilliSeconds(bool bFirstRun)
{
if (bFirstRun)
glfwSetTime(0);
return (int)((float)glfwGetTime()*1000.0f);
}
GLuint buildShader()
{
//Hint: Shader in the TXT-File looks like this
/*std::stringstream ssVertexShader;
ssVertexShader << "#version 330 core"<< std::endl
<< "layout(location = 0) in vec3 vertexPosition_modelspace;"<< std::endl
<< "layout(location = 1) in vec2 vertexUV;"<< std::endl
<< "out vec2 UV;"<< std::endl
<< "uniform mat4 MVP;"<< std::endl
<< "void main(){"<< std::endl
<< "vec4 v = vec4(vertexPosition_modelspace,1);"<< std::endl
<< "gl_Position = MVP * v;"<< std::endl
<< "UV = vertexUV;"<< std::endl
<< "}"<< std::endl;*/
std::string strVertexShaderCode;
std::ifstream VertexShaderStream("Shader\\VertexShader.txt", std::ios::in);
if(VertexShaderStream.is_open())
{
std::string Line = "";
while(getline(VertexShaderStream, Line))
strVertexShaderCode += "\n" + Line;
VertexShaderStream.close();
}
//Hint: Shader in the TXT-File looks like this
/*std::stringstream ssFragmentShader;
ssFragmentShader << "#version 330 core\n"
"in vec2 UV;\n"
"out vec4 color;\n"
"uniform sampler2D myTextureSampler;\n"
"void main(){\n"
"color = texture( myTextureSampler, UV ).rgba;\n"
"}\n";*/
std::string strFragmentShaderCode;
std::ifstream FragmentShaderStream("Shader\\FragmentShader.txt", std::ios::in);
if(FragmentShaderStream.is_open())
{
std::string Line = "";
while(getline(FragmentShaderStream, Line))
strFragmentShaderCode += "\n" + Line;
FragmentShaderStream.close();
}
GLuint gluiVertexShaderId = glCreateShader(GL_VERTEX_SHADER);
char const * VertexSourcePointer = strVertexShaderCode.c_str();
glShaderSource(gluiVertexShaderId, 1, &VertexSourcePointer , NULL);
glCompileShader(gluiVertexShaderId);
GLint Result = GL_FALSE;
int InfoLogLength;
glGetShaderiv(gluiVertexShaderId, GL_COMPILE_STATUS, &Result);
glGetShaderiv(gluiVertexShaderId, GL_INFO_LOG_LENGTH, &InfoLogLength);
std::vector<char> VertexShaderErrorMessage( std::max(InfoLogLength, int(1)) );
glGetShaderInfoLog(gluiVertexShaderId, InfoLogLength, NULL, &VertexShaderErrorMessage[0]);
std::string strInfoLog = std::string(&VertexShaderErrorMessage[0]);
if (Result == GL_FALSE)
logger << "Vertex shader: " << strInfoLog << std::endl;
GLuint gluiFragmentShaderId = glCreateShader(GL_FRAGMENT_SHADER);
char const * FragmentSourcePointer = strFragmentShaderCode.c_str();
glShaderSource(gluiFragmentShaderId, 1, &FragmentSourcePointer , NULL);
glCompileShader(gluiFragmentShaderId);
Result = GL_FALSE;
glGetShaderiv(gluiFragmentShaderId, GL_COMPILE_STATUS, &Result);
glGetShaderiv(gluiFragmentShaderId, GL_INFO_LOG_LENGTH, &InfoLogLength);
std::vector<char> FragmentShaderErrorMessage( std::max(InfoLogLength, int(1)) );
glGetShaderInfoLog(gluiFragmentShaderId, InfoLogLength, NULL, &FragmentShaderErrorMessage[0]);
strInfoLog = std::string(&FragmentShaderErrorMessage[0]);
if (Result == GL_FALSE)
logger << "Fragment shader: " << strInfoLog << std::endl;
GLuint gluiProgramId = glCreateProgram();
glAttachShader(gluiProgramId, gluiVertexShaderId);
glAttachShader(gluiProgramId, gluiFragmentShaderId);
glLinkProgram(gluiProgramId);
Result = GL_FALSE;
glGetProgramiv(gluiProgramId, GL_LINK_STATUS, &Result);
glGetProgramiv(gluiProgramId, GL_INFO_LOG_LENGTH, &InfoLogLength);
std::vector<char> ProgramErrorMessage( std::max(InfoLogLength, int(1)) );
glGetProgramInfoLog(gluiProgramId, InfoLogLength, NULL, &ProgramErrorMessage[0]);
strInfoLog = std::string(&ProgramErrorMessage[0]);
if (Result == GL_FALSE)
logger << "Program: " << strInfoLog << std::endl;
glDeleteShader(gluiVertexShaderId);
glDeleteShader(gluiFragmentShaderId);
return gluiProgramId;
}
void addQuadToLocalVerticeArray(Vertex * ptrVertexArrayLocal, Quad *quad, int *ptrQuadVerticeCounter)
{
//Top left
ptrVertexArrayLocal[*ptrQuadVerticeCounter].x = quad->x;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].y = quad->y;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].z = 0.0f;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].tx = 0.0f;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].ty = 1.0f;
++(*ptrQuadVerticeCounter);
//Bottom left
ptrVertexArrayLocal[*ptrQuadVerticeCounter].x = quad->x;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].y = quad->y - quad->height;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].z = 0.0f;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].tx = 0.0f;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].ty = 0.0f;
++(*ptrQuadVerticeCounter);
//Bottom right
ptrVertexArrayLocal[*ptrQuadVerticeCounter].x = quad->x + quad->width;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].y = quad->y - quad->height;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].z = 0.0f;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].tx = 1.0f;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].ty = 0.0f;
++(*ptrQuadVerticeCounter);
//Bottom right
ptrVertexArrayLocal[*ptrQuadVerticeCounter].x = quad->x + quad->width;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].y = quad->y - quad->height;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].z = 0.0f;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].tx = 1.0f;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].ty = 0.0f;
++(*ptrQuadVerticeCounter);
//Top right
ptrVertexArrayLocal[*ptrQuadVerticeCounter].x = quad->x + quad->width;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].y = quad->y;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].z = 0.0f;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].tx = 1.0f;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].ty = 1.0f;
++(*ptrQuadVerticeCounter);
//Top left
ptrVertexArrayLocal[*ptrQuadVerticeCounter].x = quad->x;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].y = quad->y;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].z = 0.0f;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].tx = 0.0f;
ptrVertexArrayLocal[*ptrQuadVerticeCounter].ty = 1.0f;
++(*ptrQuadVerticeCounter);
}
I have now identified a huge time-killer: the textures I was using were too large, and their resolution was very inefficient.
The main texture, which contained the level sprites, had a resolution of 2200x2200 pixels. So the GPU increased its size to 4096x4096 and had to process a huge amount of data.
The image contains 10x10 different level tiles, each of which is drawn on screen at 50x50 pixels.
So I saved the tiles file at a lower resolution (1020x1020 pixels, i.e. 102x102 px per tile) and now I have a loop cycle time of <= 15 ms.
This isn't perfect, but compared with my previous 30-60 ms it is huge progress.
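The numbers behind that diagnosis: rounding 2200 up to the next power of two gives 4096, so a padded RGBA8 texture would take 4096 x 4096 x 4 bytes = 64 MiB instead of about 18.5 MiB, while 1020 rounds to 1024 (4 MiB). Non-power-of-two textures have been core since OpenGL 2.0, so whether the driver really pads depends on the hardware; drawing 220-pixel tiles at 50 pixels without mipmaps also wastes texture bandwidth. A sketch of the arithmetic; nextPowerOfTwo and paddedRGBA8Bytes are hypothetical helpers.

```cpp
#include <cstdint>

// Smallest power of two >= n. Precondition: 1 <= n <= 2^31 (otherwise p overflows).
constexpr std::uint32_t nextPowerOfTwo(std::uint32_t n)
{
    std::uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Size in bytes of an RGBA8 texture if both sides are padded to powers of two.
constexpr std::uint64_t paddedRGBA8Bytes(std::uint32_t w, std::uint32_t h)
{
    return std::uint64_t(nextPowerOfTwo(w)) * nextPowerOfTwo(h) * 4u;
}
```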