Views over Append/Consume buffers in Direct3D 11 - gpgpu

I'm using append/consume buffers to reduce shading work in my path-tracer (immediately shade empty space + emitters in a pre-pass, append the remaining pixels for full processing), and I've heard that I should be using a UAV when I'm accessing through AppendStructuredBuffer<T> and an SRV when I'm accessing through ConsumeStructuredBuffer. I haven't seen that claim in any of Microsoft's documentation, but it might explain why my calls to [Consume()] are returning empty data - is it accurate?

I should have tested myself before I asked - shaders with ConsumeStructuredBuffers declared in SRV registers (tN) fail to compile and emit an error saying they're only bindable through UAVs.
My AppendStructuredBuffer bound through the UAV registers works fine, so it seems like I was just quoting hearsay; unordered-access-views should be used for both HLSL types.

Related

Adjust value set in IDXGISwapChain2::SetMaximumFrameLatency

I use the combination of DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT, GetFrameLatencyWaitableObject() and SetMaximumFrameLatency(UINT MaxLatency) to control the input lag vs. smoothness of my application as explained at https://learn.microsoft.com/en-us/windows/uwp/gaming/reduce-latency-with-dxgi-1-3-swap-chains. A value of 1 gives the lowest input lag, but sometimes I need a higher value to reduce jitter/stutter/slowdown caused by cpu and gpu cannot really work in parallel when the value is 1.
I want to be able to dynamically change this value based on the required input lag vs smoothness trade-off.
The problem I have noticed is that while it's possible to, between frames, increase this value by calling SetMaximumFrameLatency with a higher value than set before, I see no effect when decreasing this value by calling the function again with a lower value than the maximum value ever set for this swap chain by a previous call to the same function. So if I ever set it to 2, it is not possible to set it to 1 later. Is this a bug or undocumented "feature"? Or did I do something wrong?
The API itself does not return any error or similar; from the API point of view it appears to apply the new lower value correctly.
To test this, I have BufferCount = 16 and then adjust the max latency value from 1 to 16 which makes the current latency obvious to the eye. It's therefore apparent that dxgi does not apply new lower values.
I've tried to call functions in different orders, close the handle for the waitable object and recreate a new one when modifying the latency, but nothing works. The only workaround so far I'm aware of is to fully recreate the swap chain, which is annoying due to the requirement to unbind all context objects etc.
When initializing the game, I create the swap chain and set an initial latency using SetMaximumFrameLatency.
The game loop is then basically this:
Call WaitForSingleObject on the waitable object handle.
Process inputs.
Render and present a frame.
If it's decided that the latency should change at this point, call SetMaximumFrameLatency with the new value.
Other info:
Renderer: Direct3D 11
OS: Windows 11 21H2 version 22000.675
Graphics card: Intel UHD Graphics 620 / Nvidia GeForce MX150 (tried with both cards) with latest drivers, supporting WDDM 3.0
App type: Win32 desktop application

How can I bind a buffer resource that resides on the GPU to the input assembler (IA)?

I use compute shaders to compute a triangle list and to store it in a RWStructuredBuffer. For testing I read this buffer and pass it to the IA via context.InputAssembler.SetVertexBuffers (…). This approach works, but is valid only for testing the data for correctness.
Now I want to bind the (already existing) buffer to the IA stage using a resource view (aka without passing a pointer to the vertex buffer).
I am reading some good books (Frank D. Luna, Jason Zink), but they never mention this case.
===============
EDIT:
The syntax I am using here in imposed by the SharpDX wrapper.
I can bind the buffer to the vertex shader via context.VertexShader.SetShaderResource(...), bindig a ResoureceView. In the VS I use SV_VertexID to access the buffer. So I HAVE a working solution for moment, but there might be cases in the future where I must bind the buffer to the input assembler.
Simply put, you can't bind a structured buffer to the IA stage, at least directly, runtime will not allow this.
If you put ResourceOptionFlags.BufferStructured as OptionFlags, you are not allowed to use : VertexBuffer/IndexBuffer/StreamOutput/ConstantBuffer/RenderTarget/Depth as bind flags, Resource creation will fail.
One option, which costs you a GPU copy, is to create a second buffer with VertexBuffer BindFlags, and Default usage (same size as your structured buffer).
Once you are done processing your structuredbuffer, call:
DeviceContext.CopyResource
And you'll have a standard vertex buffer ready to use.

Directx Texture interface to existing memory

I'm writing a rendering app that communicates with an image processor as a sort of virtual camera, and I'm trying to figure out the fastest way to write the texture data from one process to the awaiting image buffer in the other.
Theoretically I think it should be possible with 1 DirectX copy from VRAM directly to the area of memory I want it in, but I can't figure out how to specify a region of memory for a texture to occupy, and thus must perform an additional memcpy. DX9 or DX11 solutions would be welcome.
So far, the docs here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb174363(v=vs.85).aspx have held the most promise.
"In Windows Vista CreateTexture can create a texture from a system memory pointer allowing the application more flexibility over the use, allocation and deletion of the system memory"
I'm running on Windows 7 with the June 2010 Directx SDK, However, whenever I try and use the function in the way it specifies, I the function fails with an invalid arguments error code. Here is the call I tried as a test:
static char s_TextureBuffer[640*480*4]; //larger than needed
void* p = (void*)s_TextureBuffer;
HRESULT res = g_D3D9Device->CreateTexture(640,480,1,0, D3DFORMAT::D3DFMT_L8, D3DPOOL::D3DPOOL_SYSTEMMEM, &g_ReadTexture, (void**)p);
I tried with several different texture formats, but with no luck. I've begun looking into DX11 solutions, it's going slowly since I'm used to DX9. Thanks!

Some Windows API calls fail unless the string arguments are in the system memory rather than local stack

We have an older massive C++ application and we have been converting it to support Unicode as well as 64-bits. The following strange thing has been happening:
Calls to registry functions and windows creation functions, like the following, have been failing:
hWnd = CreateSysWindowExW( ExStyle, ClassNameW.StringW(), Label2.StringW(), Style,
Posn.X(), Posn.Y(),
Size.X(), Size.Y(),
hParentWnd, (HMENU)Id,
AppInstance(), NULL);
ClassNameW and Label2 are instances of our own Text class which essentially uses malloc to allocate the memory used to store the string.
Anyway, when the functions fail, and I call GetLastError it returns the error code for "invalid memory access" (though I can inspect and see the string arguments fine in the debugger). Yet if I change the code as follows then it works perfectly fine:
BSTR Label2S = SysAllocString(Label2.StringW());
BSTR ClassNameWS = SysAllocString(ClassNameW.StringW());
hWnd = CreateSysWindowExW( ExStyle, ClassNameWS, Label2S, Style,
Posn.X(), Posn.Y(),
Size.X(), Size.Y(),
hParentWnd, (HMENU)Id,
AppInstance(), NULL);
SysFreeString(ClassNameWS); ClassNameWS = 0;
SysFreeString(Label2S); Label2S = 0;
So what gives? Why would the original functions work fine with the arguments in local memory, but when used with Unicode, the registry function require SysAllocString, and when used in 64-bit, the Windows creation functions also require SysAllocString'd string arguments? Our Windows procedure functions have all been converted to be Unicode, always, and yes we use SetWindowLogW call the correct default Unicode DefWindowProcW etc. That all seems to work fine and handles and draws Unicode properly etc.
The documentation at http://msdn.microsoft.com/en-us/library/ms632679%28v=vs.85%29.aspx does not say anything about this. While our application is massive we do use debug heaps and tools like Purify to check for and clean up any memory corruption. Also at the time of this failure, there is still only one main system thread. So it is not a thread issue.
So what is going on? I have read that if string arguments are marshalled anywhere or passed across process boundaries, then you have to use SysAllocString/BSTR, yet we call lots of API functions and there is lots of code out there which calls these functions just using plain local strings?
What am I missing? I have tried Googling this, as someone else must have run into this, but with little luck.
Edit 1: Our StringW function does not create any temporary objects which might go out of scope before the actual API call. The function is as follows:
Class Text {
const wchar_t* StringW () const
{
return TextStartW;
}
wchar_t* TextStartW; // pointer to current start of text in DataArea
I have been running our application with the debug heap and memory checking and other diagnostic tools, and found no source of memory corruption, and looking at the assembly, there is no sign of temporary objects or invalid memory access.
BUT I finally figured it out:
We compile our code /Zp1, which means byte aligned memory allocations. SysAllocString (in 64-bits) always return a pointer that is aligned on a 8 byte boundary. Presumably a 32-bit ANSI C++ application goes through an API layer to the underlying Unicode windows DLLs, which would also align the pointer for you.
But if you use Unicode, you do not get that incidental pointer alignment that the conversion mapping layer gives you, and if you use 64-bits, of course the situation will get even worse.
I added a method to our Text class which shifts the string pointer so that it is aligned on an eight byte boundary, and viola, everything runs fine!!!
Of course the Microsoft people say it must be memory corruption and I am jumping the wrong conclusion, but there is evidence it is not the case.
Also, if you use /Zp1 and include windows.h in a 64-bit application, the debugger will tell you sizeof(BITMAP)==28, but calling GetObject on a bitmap will fail and tell you it needs a 32-byte structure. So I suspect that some of Microsoft's API is inherently dependent on aligned pointers, and I also know that some optimized assembly (I have seen some from Fortran compilers) takes advantage of that and crashes badly if you ever give it unaligned pointers.
So the moral of all of this is, dont use "funky" compiler arguments like /Zp1. In our case we have to for historical reasons, but the number of times this has bitten us...
Someone please give me a "this is useful" tick on my answer please?
Using a bit of psychic debugging, I'm going to guess that the strings in your application are pooled in a read-only section.
It's possible that the CreateSysWindowsEx is attempting to write to the memory passed in for the window class or title. That would explain why the calls work when allocated on the heap (SysAllocString) but not when used as constants.
The easiest way to investigate this is to use a low level debugger like windbg - it should break into the debugger at the point where the access violation occurs which should help figure out the problem. Don't use Visual Studio, it has a nasty habit of being helpful and hiding first chance exceptions.
Another thing to try is to enable appverifier on your application - it's possible that it may show something.
Calling a Windows API function does not cross the process boundary, since the various Windows DLLs are loaded into your process.
It sounds like whatever pointer that StringW() is returning isn't valid when Windows is trying to access it. I would look there - is it possible that the pointer returned it out of scope and deleted shortly after it is called?
If you share some more details about your string class, that could help diagnose the problem here.

Converting glReadBuffer() / glDrawBuffer() calls into OpenGL ES

I'm having trouble understanding how to port glReadBuffer() & glDrawBuffer() calls into Open GL ES 1.1. Various forum posts on the internet just say "use VBOs," without going into more depth.
Can you please help me understand an appropriate conversion? Say I have:
glReadBuffer(GL_FRONT);
followed by
glDrawBuffer(GL_BACK_LEFT);
state->paint(state_id, f);
How can I write the pixels out?
glReadBuffer and glDrawBuffer just set the source and target for subsequent drawing operations. Assuming you're targeting a monoscopic device, such as the iPhone or an Android device, and have requested two buffers then you're already set for drawing to the back buffer. The only means of reading the colour buffer in GL ES is glReadPixels, which will read from the same buffer that you're drawing to.
All of these are completely unrelated to VBOs, which pass off management of arrays of data to the driver, often implicitly allowing them to be put into the GPU's direct address space.

Resources