Zero OpenGL 3.2 pixel format matches found? - Windows

Today I finally found out what has been stalling my development process: even though no error code is set, the function wglChoosePixelFormatARB returns 0 pixel formats.
I am trying to set up an OpenGL context in my C++ application and I have managed to retrieve the function pointers for the extensions.
glGetIntegerv(GL_MAJOR_VERSION, &maj)
returns 4, so naturally I assumed it would be possible to create an OpenGL 3.2 context. However, after finding out there were no matches, I started commenting out some of the requirements in the attribList parameter. Still, there were no matches whatsoever.
Only when I, just to be certain, commented out
WGL_CONTEXT_MAJOR_VERSION_ARB, 3,
WGL_CONTEXT_MINOR_VERSION_ARB, 2,
did I finally get matches. Of the 8 pixel formats that satisfy the other requirements, not ONE of them seems to support version 3 of OpenGL.
Has anyone ever run into this? I have tried updating/reinstalling my video drivers, but nothing has changed. I am running this on Windows 7, MS Visual Studio 2008, and my graphics card is one from the AMD Radeon HD 7700 Series.

The WGL_CONTEXT_MAJOR_VERSION_ARB, WGL_CONTEXT_MINOR_VERSION_ARB and related attributes are not attributes of the Windows pixel format, so you must not use them with wglChoosePixelFormatARB().
Those options belong in the attribute list of wglCreateContextAttribsARB, as defined by the WGL_ARB_create_context extension.
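A minimal sketch of that split, assuming the extension function pointers mentioned in the question have already been retrieved, hdc is a valid device context, and the usual wglext.h constants are available (error handling omitted):

// Pixel format attributes only -- these go to wglChoosePixelFormatARB.
const int pixelAttribs[] = {
    WGL_DRAW_TO_WINDOW_ARB, GL_TRUE,
    WGL_SUPPORT_OPENGL_ARB, GL_TRUE,
    WGL_DOUBLE_BUFFER_ARB,  GL_TRUE,
    WGL_PIXEL_TYPE_ARB,     WGL_TYPE_RGBA_ARB,
    WGL_COLOR_BITS_ARB,     32,
    WGL_DEPTH_BITS_ARB,     24,
    WGL_STENCIL_BITS_ARB,   8,
    0
};
int pixelFormat = 0;
UINT numFormats = 0;
wglChoosePixelFormatARB(hdc, pixelAttribs, NULL, 1, &pixelFormat, &numFormats);

PIXELFORMATDESCRIPTOR pfd;
DescribePixelFormat(hdc, pixelFormat, sizeof(pfd), &pfd);
SetPixelFormat(hdc, pixelFormat, &pfd);

// Context version/profile attributes -- these go to wglCreateContextAttribsARB.
const int contextAttribs[] = {
    WGL_CONTEXT_MAJOR_VERSION_ARB, 3,
    WGL_CONTEXT_MINOR_VERSION_ARB, 2,
    WGL_CONTEXT_PROFILE_MASK_ARB,  WGL_CONTEXT_CORE_PROFILE_BIT_ARB,
    0
};
HGLRC rc = wglCreateContextAttribsARB(hdc, NULL, contextAttribs);
wglMakeCurrent(hdc, rc);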

Related

Adjust value set in IDXGISwapChain2::SetMaximumFrameLatency

I use the combination of DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT, GetFrameLatencyWaitableObject() and SetMaximumFrameLatency(UINT MaxLatency) to control the input lag vs. smoothness of my application, as explained at https://learn.microsoft.com/en-us/windows/uwp/gaming/reduce-latency-with-dxgi-1-3-swap-chains. A value of 1 gives the lowest input lag, but sometimes I need a higher value to reduce jitter/stutter/slowdown caused by the CPU and GPU not being able to work in parallel when the value is 1.
I want to be able to dynamically change this value based on the required input lag vs smoothness trade-off.
The problem I have noticed is that while I can increase this value between frames by calling SetMaximumFrameLatency with a higher value than before, decreasing it has no effect: calling the function again with a value lower than the maximum ever set for this swap chain does nothing. So if I ever set it to 2, it is not possible to set it back to 1 later. Is this a bug or an undocumented "feature"? Or did I do something wrong?
The API itself does not return any error or similar; from the API point of view it appears to apply the new lower value correctly.
To test this, I set BufferCount = 16 and then adjust the max latency value between 1 and 16, which makes the current latency obvious to the eye. It's therefore apparent that DXGI does not apply new, lower values.
I've tried calling the functions in different orders, and closing the handle of the waitable object and recreating a new one when modifying the latency, but nothing works. The only workaround I'm aware of so far is to fully recreate the swap chain, which is annoying due to the requirement to unbind all context objects, etc.
When initializing the game, I create the swap chain and set an initial latency using SetMaximumFrameLatency.
The game loop is then basically this:
Call WaitForSingleObject on the waitable object handle.
Process inputs.
Render and present a frame.
If it's decided that the latency should change at this point, call SetMaximumFrameLatency with the new value.
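In code, the loop described above is roughly this (a sketch only; swapChain is an IDXGISwapChain2* created with DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT from <dxgi1_3.h>, and ProcessInput(), RenderFrame(), running and newLatency are placeholder names, not from the original code):

HANDLE waitable = swapChain->GetFrameLatencyWaitableObject();
swapChain->SetMaximumFrameLatency(1);            // initial latency

while (running)
{
    WaitForSingleObjectEx(waitable, 1000, TRUE); // block until DXGI allows the next frame
    ProcessInput();
    RenderFrame();
    swapChain->Present(1, 0);

    if (latencyChangeRequested)
        swapChain->SetMaximumFrameLatency(newLatency); // raising works; lowering appears to be ignored
}
CloseHandle(waitable);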
Other info:
Renderer: Direct3D 11
OS: Windows 11 21H2 version 22000.675
Graphics card: Intel UHD Graphics 620 / Nvidia GeForce MX150 (tried with both cards) with latest drivers, supporting WDDM 3.0
App type: Win32 desktop application

How to use OpenGL ES with GLFW on Windows?

Since the NVIDIA DRIVE product supports the OpenGL ES 2 and 3 specifications, I want to run OpenGL ES code on Windows 10 with a GTX 2070, which would eliminate a …
Also, GLFW supports configuration like glfwWindowHint(GLFW_CLIENT_API, GLFW_OPENGL_ES_API). Is it possible to use GLFW to run OpenGL ES code on Windows 10?
First of all, make sure you have downloaded glad with GL ES.
https://glad.dav1d.de/
For the GLFW part, you need to set the GLFW_CLIENT_API window hint.
glfwWindowHint(GLFW_CLIENT_API, GLFW_OPENGL_ES_API);
And also choose which version you want, for example:
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 2);
Then specify the kind of context. In the case of OpenGL ES, according to the documentation, it must be GLFW_OPENGL_ANY_PROFILE.
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_ANY_PROFILE);
GLFW_OPENGL_PROFILE indicates the OpenGL profile used by the context. This is GLFW_OPENGL_CORE_PROFILE or GLFW_OPENGL_COMPAT_PROFILE if the context uses a known profile, or GLFW_OPENGL_ANY_PROFILE if the OpenGL profile is unknown or the context is an OpenGL ES context. Note that the returned profile may not match the profile bits of the context flags, as GLFW will try other means of detecting the profile when no bits are set.
However, GLFW_OPENGL_ANY_PROFILE is already the default value, so you don't really need to set it.
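Putting the hints together, a minimal sketch (assuming glad was generated from https://glad.dav1d.de/ with a GL ES profile; window size and title are arbitrary):

#include <glad/glad.h>   // glad generated with an OpenGL ES profile
#include <GLFW/glfw3.h>

int main(void)
{
    if (!glfwInit())
        return -1;

    glfwWindowHint(GLFW_CLIENT_API, GLFW_OPENGL_ES_API);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 2);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_ANY_PROFILE); // the default, shown for clarity

    GLFWwindow* window = glfwCreateWindow(640, 480, "OpenGL ES on Windows", NULL, NULL);
    if (!window)
    {
        glfwTerminate();
        return -1;
    }

    glfwMakeContextCurrent(window);
    gladLoadGLES2Loader((GLADloadproc)glfwGetProcAddress);  // load the GL ES entry points

    while (!glfwWindowShouldClose(window))
    {
        glClear(GL_COLOR_BUFFER_BIT);
        glfwSwapBuffers(window);
        glfwPollEvents();
    }

    glfwTerminate();
    return 0;
}

Keep in mind that actually getting a GL ES 3.2 context still depends on the driver exposing ES contexts on the desktop; if creation fails, it's worth trying a lower version such as 3.0 or 2.0.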

WinDbg not showing register values

Basically, this is the same question that was asked here.
When performing kernel debugging of a machine running Windows 7 or older, with WinDbg version 6.2 and up, the debugger doesn't show anything in the registers window. Pressing the Customize... button results in a message box that reads Registers are not yet known.
At the same time, issuing the r command results in perfectly valid register values being printed out.
What is the reason for this behaviour, and can it be fixed?
TL;DR: I wrote an extension DLL that fixes the bug. Available here.
The Problem
To understand the problem, we first need to understand that WinDbg is basically just a frontend to Microsoft's Windows Symbolic Debugger Engine, implemented inside dbgeng.dll. Other frontends include the command-line kd.exe (kernel debugger) and cdb.exe (user-mode debugger).
The engine implements everything we expect from a debugger: working with symbol files, reading and writing memory and registers, setting breakpoints, etc. The engine then exposes all of this functionality through COM-like interfaces (they implement IUnknown but are not registered components). This allows us, for instance, to write our own debugger (like this person did).
Armed with this knowledge, we can now make an educated guess as to how WinDbg obtains the values of the registers on the target machine.
The engine exposes the IDebugRegisters interface for manipulating registers. This interface declares the GetValues method for retrieving the values of multiple registers in one go. But how does WinDbg know how many registers there are? That's why we have the GetNumberRegisters method.
So, to retrieve the values of all registers on the target, we'll have to do something like this:
Call IDebugRegisters::GetNumberRegisters to get the total number of registers.
Call IDebugRegisters::GetValues with the Count parameter set to the total number of registers, the Indices parameter set to NULL, and the Start parameter set to 0.
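In code, those two steps look roughly like this (a sketch; registers is assumed to be an IDebugRegisters* obtained from the engine, and error handling is trimmed):

#include <dbgeng.h>
#include <vector>

// 'registers' is an IDebugRegisters* obtained from the debug client.
ULONG count = 0;
registers->GetNumberRegisters(&count);

std::vector<DEBUG_VALUE> values(count);
// Indices == NULL and Start == 0 asks for registers 0 .. count-1 in one go.
HRESULT hr = registers->GetValues(count, NULL, 0, values.data());
// ...and on a Windows 7 target this is where E_INVALIDARG comes back.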
One tiny problem, though: the second call fails with E_INVALIDARG.
Ehm, excuse me? How can it fail? Especially puzzling is the documentation for this return value:
The value of the index of one of the registers is greater than the number of registers on the target machine.
But I just asked you how many registers there are, so how can that value be out of range? Okay, let's continue reading the docs anyway, maybe something will become clear:
If the return value is not S_OK, some of the registers still might have been read. If the target was not accessible, the return type is E_UNEXPECTED and Values is unchanged; otherwise, Values will contain partial results and the registers that could not be read will have type DEBUG_VALUE_INVALID.
(Emphasis mine.)
Aha! So maybe the engine just couldn't read one of the registers! But which one? Turns out that the engine chokes on the xcr0 register. From the Intel 64 and IA-32 Architectures Software Developer’s Manual:
Extended control register XCR0 contains a state-component bitmap that specifies the user state components that software has enabled the XSAVE feature set to manage. If the bit corresponding to a state component is clear in XCR0, instructions in the XSAVE feature set will not operate on that state component, regardless of the value of the instruction mask.
Okay, so the register controls the operation of the XSAVE instruction, which saves the state of the CPU's extended features (like XMM and AVX). According to the last comment on this page, this instruction requires some support from the operating system. Although the comment states that Windows 7 (which is what the VM I was testing on was running) does support this instruction, the issue seems to be related to the OS anyway, since everything works fine when the target is Windows 8.
Really, it's unclear whether the bug is within the debugger engine, which reports more registers than it can retrieve values for, or within WinDbg, which refuses to show any values at all if the engine fails to produce all of them.
The Solution
We could, of course, bite the bullet and just use an older version of WinDbg for debugging older Windows versions. But where's the challenge in that?
Instead, I present to you a debugger extension that solves this problem. It does so by hooking (with the help of this library) the relevant debugger engine methods and returning S_OK if the only register that failed was xcr0. Otherwise, it propagates the failure. The extension supports runtime unload, so if you experience problems you can always disable the hooks.
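The gist of the hook, as a rough sketch rather than the extension's actual code (Original_GetValues stands for the trampoline to the real IDebugRegisters::GetValues installed by the hooking library):

#include <windows.h>
#include <dbgeng.h>

// Function-pointer type matching IDebugRegisters::GetValues when called
// through the vtable (explicit 'this' as the first argument).
typedef HRESULT (STDMETHODCALLTYPE *GetValues_t)(
    IDebugRegisters*, ULONG, PULONG, ULONG, PDEBUG_VALUE);
static GetValues_t Original_GetValues;   // filled in by the hooking library

HRESULT STDMETHODCALLTYPE Hooked_GetValues(IDebugRegisters* self, ULONG Count,
                                           PULONG Indices, ULONG Start,
                                           PDEBUG_VALUE Values)
{
    HRESULT hr = Original_GetValues(self, Count, Indices, Start, Values);
    if (hr == E_INVALIDARG && Values != NULL)
    {
        ULONG xcr0Index = 0;
        if (SUCCEEDED(self->GetIndexByName("xcr0", &xcr0Index)))
        {
            bool onlyXcr0Failed = true;
            for (ULONG i = 0; i < Count; ++i)
            {
                ULONG index = (Indices != NULL) ? Indices[i] : Start + i;
                if (Values[i].Type == DEBUG_VALUE_INVALID && index != xcr0Index)
                    onlyXcr0Failed = false;   // something other than xcr0 failed too
            }
            if (onlyXcr0Failed)
                hr = S_OK;   // hide the xcr0 failure so WinDbg shows the rest
        }
    }
    return hr;
}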
That's it, have fun!

OpenGL core profile incredible slowdown on OS X

I added a new GL renderer to my engine, which uses the core profile. While it runs fine on Windows and/or NVIDIA cards, it is about 10 times slower on OS X (3 fps instead of 30). The weird thing is that my compatibility profile renderer runs fine.
I collected some traces with Instruments and the GL profiler:
https://www.dropbox.com/sh/311fg9wu0zrarzm/31CGvUcf2q
It shows that the application spends its time in glDrawRangeElements.
I tried the following things:
use glDrawElements instead (no effect)
flip culling (no effect on speed)
disable some GL_DYNAMIC_DRAW buffers (no effect)
bind index buffer after VAO when drawing (no effect)
converted indices to 4 byte (no effect)
use GL_BGRA textures (no effect)
What I didn't try is aligning my vertices to a 16 byte boundary and/or converting indices to 4 bytes, but seriously, if that were the issue, then why the hell does the standard allow it?
I'm creating the context like this:
NSOpenGLPixelFormatAttribute attributes[] =
{
    NSOpenGLPFAColorSize, 24,
    NSOpenGLPFAAlphaSize, 8,
    NSOpenGLPFADepthSize, 24,
    NSOpenGLPFAStencilSize, 8,
    NSOpenGLPFADoubleBuffer,
    NSOpenGLPFAAccelerated,
    NSOpenGLPFANoRecovery,
    NSOpenGLPFAOpenGLProfile, NSOpenGLProfileVersion3_2Core,
    0
};

NSOpenGLPixelFormat* format = [[NSOpenGLPixelFormat alloc] initWithAttributes:attributes];
NSOpenGLContext* context = [[NSOpenGLContext alloc] initWithFormat:format shareContext:nil];
[self.view setOpenGLContext:context];
[context makeCurrentContext];
Tried on the following specs:
radeon 6630M, OS X 10.7.5
radeon 6750M, OS X 10.7.5
geforce GT 330M, OS X 10.8.3
Do you have any ideas what I might be doing wrong? Again, it works fine with the compatibility profile (not using VAOs, though).
UPDATE: reported to Apple.
UPDATE: Apple doesn't give a damn about the problem... Anyway, I created a small test program, which actually runs fine. I then compared the call stacks with Instruments and found that, when using the engine, glDrawRangeElements makes two calls:
gleDrawArraysOrElements_ExecCore
gleDrawArraysOrElements_Entries_Body
while in the test program it calls only the second. The first call does something like an immediate-mode render (gleFlushPrimitivesTCLFunc, gleRunVertexSubmitterImmediate), which obviously causes the slowdown.
Finally, I was able to reproduce the slowdown. This is just crazy... It is clearly caused by glBindAttribLocation being called on the "my_Position" attribute. Now I did some testing:
1 is default (as returned by glGetAttribLocation)
if I set it to zero, there's no problem
if I set it to 1, the rendering becomes slow
if I set it to any larger number, it is slow again
Obviously I relink the program (check the code). It is not a problem in my implementation; I tested it with "normal" values too.
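For reference, a sketch of the binding pattern that triggers it (vertexShader and fragmentShader are assumed to be already-compiled shader objects; "my_Position" is the attribute name from above):

GLuint program = glCreateProgram();
glAttachShader(program, vertexShader);
glAttachShader(program, fragmentShader);

// Binding "my_Position" to 0 is fine; binding it to 1 (or anything larger)
// sends glDrawRangeElements down the slow immediate-mode path on OS X.
glBindAttribLocation(program, 1, "my_Position");

glLinkProgram(program);   // the binding only takes effect after (re)linking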
Test program:
https://www.dropbox.com/s/dgg48g1fwgyc5h0/SLOWDOWN_REPRO.zip
How to repro:
open with XCode
open common/glext.h (don't be disturbed by the name)
modify the GLDECLUSAGE_POSITION constant from 0 to 1
compile and run => slow
changing back to zero => good
I have managed to get myself the same problem in the following circumstances under OS X Mavericks:
Instanced rendering using array buffers to give each instance its own modelToWorld and inverseNormal matrices; attribute locations are being specified through layout rather than using glGetAttribLocation
leaving one of these array buffers unused in the shader, where the location is declared but the attribute isn't actually used for anything in the GLSL code
In this case, a call to glDrawElementsInstanced takes up a LOT of CPU time (under normal circumstances, this call uses nearly zero CPU even when drawing several thousand instances).
You can tell that you're getting this specific problem if almost all of the CPU time used within glDrawElementsInstanced is spent in gleDrawArraysOrElements_ExecCore. Making sure that all of the array buffers are actually referenced in your shader code fixes the CPU time back to (nearly) zero.
I suspect that this is one of the situations where leaving a variable out of your main() in GLSL confuses the compiler into deleting all references to that variable, leaving you with a dangling reference to an attribute or uniform.

CreateThread() fails on 64 bit Windows, works on 32 bit Windows. Why?

Operating System: Windows XP 64 bit, SP2.
I have an unusual problem. I am porting some code from 32 bit to 64 bit. The 32 bit code works just fine, but when I call CreateThread() in the 64 bit version, the call fails. This happens in three places: two call CreateThread() directly, and one calls _beginthreadex(), which calls CreateThread().
All three calls fail with error code 0x3E6, "Invalid access to memory location".
The problem is all the input parameters are correct.
HANDLE h;
DWORD threadID;
h = CreateThread(0,            // default security
                 0,            // default stack size
                 myThreadFunc, // valid function to call
                 myParam,      // my param
                 0,            // no flags, start thread immediately
                 &threadID);
All three calls to CreateThread() are made from a DLL I've injected into the target program at the start of program execution (before the program has reached the start of main()/WinMain()). If I call CreateThread() from the target program with the same parameters, via say a menu, it works. Bizarre.
If I pass NULL instead of &threadID, it still fails.
If I pass NULL as myParam, it still fails.
I'm not calling CreateThread from inside DllMain(), so that isn't the problem. I'm confused and searching on Google etc hasn't shown any relevant answers.
If anyone has seen this before or has any ideas, please let me know.
Thanks for reading.
ANSWER
Short answer: Stack Frames on x64 need to be 16 byte aligned.
Longer answer:
After much banging my head against the debugger wall and posting responses to the various suggestions (all of which helped in some way, prodding me to try new directions), I started exploring what-ifs about what was on the stack prior to calling CreateThread(). This proved to be a red herring, but it did lead to the solution.
Adding extra data to the stack changes the stack frame alignment. Sooner or later, one of the tests gets you to 16 byte stack frame alignment, and at that point the code worked. So I retraced my steps and started putting NULL data onto the stack rather than what I thought were the correct values (I had been pushing return addresses to fake up a call frame). It still worked, so the data isn't important; it must be the actual stack addresses.
I quickly realised it was 16 byte alignment for the stack. Previously I was only aware of 8 byte alignment for data. This Microsoft document explains all the alignment requirements.
If the stack frame is not 16 byte aligned on x64, the compiler may put large (8 byte or more) data on the wrong alignment boundaries when it pushes data onto the stack.
Hence the problem I faced - the hooking code was called with a stack that was not aligned on a 16 byte boundary.
Quick summary of alignment requirements, expressed as size : alignment
1 : 1
2 : 2
4 : 4
8 : 8
10 : 16
16 : 16
Anything larger than 8 bytes is aligned on the next power of 2 boundary.
I think Microsoft's error code is a bit misleading. The initial STATUS_DATATYPE_MISALIGNMENT could be expressed as STATUS_STACK_MISALIGNMENT, which would be more helpful. But turning STATUS_DATATYPE_MISALIGNMENT into ERROR_NOACCESS actually disguises and misleads as to what the problem is. Very unhelpful.
Thank you to everyone that posted suggestions. Even if I disagreed with the suggestions, they prompted me to test in a wide variety of directions (including the ones I disagreed with).
Written a more detailed description of the problem of datatype misalignment here: 64 bit porting gotcha #1! x64 Datatype misalignment.
The only reason that 64bit would make a difference is that threading on 64bit requires 64bit aligned values. If threadID isn't 64bit aligned, you could cause this problem.
Ok, that idea's not it. Are you sure it's valid to call CreateThread before main/WinMain? It would explain why it works in a menu- because that's after main/WinMain.
In addition, I'd triple-check the lifetime of myParam. CreateThread returns (this I know from experience) long before the function you pass in is called.
Post the thread routine's code (or just a few lines).
It suddenly occurs to me: Are you sure that you're injecting your 64bit code into a 64bit process? Because if you had a 64bit CreateThread call and tried to inject that into a 32bit process running under WOW64, bad things could happen.
Starting to seriously run out of ideas. Does the compiler report any warnings?
Could the problem be due to a bug in the host program, rather than the DLL? There's some other code, such as loading a DLL if you used __declspec(import/export), that occurs before main/WinMain. That DllMain, for example, could have a bug in it.
I ran into this issue today, and I checked every argument fed into _beginthread/CreateThread/NtCreateThread via rohitab's Windows API Monitor v2. Every argument is aligned properly (AFAIK).
So, where does STATUS_DATATYPE_MISALIGNMENT come from?
The first few lines of NtCreateThread validate parameters passed from user mode.
ProbeForReadSmallStructure (ThreadContext, sizeof (CONTEXT), CONTEXT_ALIGN);
for i386
#define CONTEXT_ALIGN (sizeof(ULONG))
for amd64
#define STACK_ALIGN (16UI64)
...
#define CONTEXT_ALIGN STACK_ALIGN
On amd64, if the ThreadContext pointer is not aligned to 16 bytes, NtCreateThread will return STATUS_DATATYPE_MISALIGNMENT.
CreateThread (actually CreateRemoteThread) allocates ThreadContext on the stack and does nothing special to guarantee that the alignment requirement is satisfied. Things will work smoothly if every piece of your code follows the Microsoft x64 calling convention, which unfortunately is not true for me.
PS: The same code may work on newer Windows (say Vista and newer). I didn't check though. I'm facing this issue on Windows Server 2003 R2 x64.
I'm in the business of using parallel threads under Windows for calculations. No funny business, no DLL calls, and certainly no callbacks. The following works in 32 bit Windows: I set up the stack for my calculation, well within the area reserved for my program. All relevant data about the areas and start addresses is contained in a data structure that is passed to CreateThread as parameter 3. The address that is called contains a small assembler routine that uses this data structure. Indeed, this routine finds the address to return to on the stack, then the address of the data structure.
There is no reason to go far into this. It just works, and it calculates the number of primes below 2,000,000,000 just fine, in one thread, in two threads or in 20 threads.
Now CreateThread in 64 bits doesn't push the address of the data structure. That seems implausible, so I show you the smoking gun: a dump of a debug session. In the subwindow at the bottom right you see the stack, and there is merely the return address, amidst a sea of zeroes.
The mechanism I use to fill in parameters is portable between 32 and 64 bits. No other call exhibits a difference between word sizes. Moreover, why would the code address work but not the data address?
The bottom line: one would expect that CreateThread passes the data parameter on the stack in the same way in 64 bits as in 32 bits, then does a subroutine call. At the assembler level it doesn't work that way. If there are any hidden requirements on e.g. RSP that are automatically fulfilled in C++, that would be very nasty.
P.S. No, there are no 16 byte alignment problems. That lies ages behind me.
Try using _beginthread() or _beginthreadex() instead; you shouldn't be using CreateThread directly.
See this previous question.
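For completeness, a minimal _beginthreadex() sketch (a generic example rather than the poster's code; note the CRT-style thread routine signature, which differs from LPTHREAD_START_ROUTINE):

#include <windows.h>
#include <process.h>

unsigned __stdcall myThreadFunc(void* param)   // CRT thread routine signature
{
    (void)param;
    // ... thread work ...
    return 0;
}

int main()
{
    unsigned threadID = 0;
    HANDLE h = (HANDLE)_beginthreadex(
        NULL,          // default security
        0,             // default stack size
        myThreadFunc,  // CRT-style thread routine
        NULL,          // user parameter
        0,             // no flags, start immediately
        &threadID);
    if (h != NULL)
    {
        WaitForSingleObject(h, INFINITE);
        CloseHandle(h);
    }
    return 0;
}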
