Are there performance side effects, or any other side effects, from always defining standard DirectX buffers with the bind flags D3D11_BIND_SHADER_RESOURCE and D3D11_BIND_RENDER_TARGET (with the exception of more specialised buffers such as index, vertex, and constant buffers)?
Yes
Bind flags associate the resource with a shader stage, or specify how it will be accessed. If you set flags the resource doesn't actually need, the device will not be able to optimize for its real use.
It can also cause issues when you try to combine it with another flag; for example, D3D11_BIND_SHADER_RESOURCE cannot be used with D3D11_MAP_WRITE_NO_OVERWRITE.
http://msdn.microsoft.com/en-us/library/windows/desktop/ff476085(v=vs.85).aspx
Remarks
In general, binding flags can be combined using a logical OR (except the constant-buffer flag); however, you should use a single flag to allow the device to optimize the resource usage.
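As a minimal sketch (assuming an existing ID3D11Device* named device; error handling omitted), combining the two bind flags at creation time looks like this; it is exactly the combination the Remarks advise against when a single flag would do:
#include <d3d11.h>

D3D11_BUFFER_DESC desc = {};
desc.ByteWidth = 1024;              // hypothetical size
desc.Usage = D3D11_USAGE_DEFAULT;
// Legal, but gives the driver less room to optimize than a single flag:
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_RENDER_TARGET;

ID3D11Buffer* buffer = nullptr;
HRESULT hr = device->CreateBuffer(&desc, nullptr, &buffer);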
Suppose I have 2 OpenCL-capable devices on my machine (not including CPUs); and suppose that an evil colleague of mine creates a different context for each of them, which I have to work with.
I know I can't share buffers between contexts - not properly and officially, at least. But suppose that I create two OpenCL buffers, one in each context, and pass to each of them the same region of host memory, with the CL_MEM_USE_HOST_PTR flag. e.g.:
#include <CL/cl.h>
#include <assert.h>
#include <stdlib.h>

enum { size = 1234 };
//...
// (some_device_id and another_device_id obtained earlier, e.g. via clGetDeviceIDs)
cl_context context_1 = clCreateContext(NULL, 1, &some_device_id, NULL, NULL, NULL);
cl_context context_2 = clCreateContext(NULL, 1, &another_device_id, NULL, NULL, NULL);
void* host_mem = malloc(size);
assert(host_mem != NULL);
// Two buffers in two different contexts, both wrapping the same host region:
cl_mem buff_1 = clCreateBuffer(context_1, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, size, host_mem, NULL);
cl_mem buff_2 = clCreateBuffer(context_2, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, size, host_mem, NULL);
I realize that, officially,
The result of OpenCL commands that operate on multiple buffer objects created with the same host_ptr or overlapping host regions is considered to be undefined.
But what will actually happen if I copy to this buffer from one device, and from this buffer to another device? I'm specifically interested in the case of (relatively-recent) AMD and NVIDIA GPUs.
If your OpenCL implementation's vendor guarantees some kind of specific behaviour that goes beyond the standard, then go with that and make sure to follow any instructions about limitations to the letter.
If it doesn't, then you have to assume what the standard says: the behaviour is undefined.
I know I can't share buffers between contexts
It's not the contexts that are the problem. It's platforms. There are essentially two cases:
1) You want to share buffers between devices from the same platform. In that case, simply create a single context with all devices, don't complicate your life, and let the platform handle it (see the sketch after this list).
2) You need to share buffers between devices from different platforms. In that case, you're on your own.
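A minimal sketch of case 1), reusing the hypothetical device IDs and size from the question's snippet:
// One context spanning both devices of the same platform; the runtime can
// then migrate the buffer between the devices as needed.
cl_device_id devices[2] = { some_device_id, another_device_id };
cl_context shared_ctx = clCreateContext(NULL, 2, devices, NULL, NULL, NULL);
cl_mem shared_buf = clCreateBuffer(shared_ctx, CL_MEM_READ_WRITE, size, NULL, NULL);
// Create one command queue per device on this context; the same cl_mem is
// usable from both queues.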
waiting "for the split-context bug report to assigned and handled" isn't going to get you anywhere, because if it's contexts from same platform they'll tell you what i said in 1), and if it's contexts from different platforms they'll tell you it's impossible to support in any sane way.
"what will actually happen" ... depends (on a gajillion things). Some platforms will try to map the memory pointer (if it's properly aligned, for some definition of "properly") to the device address space. Some platforms will just silently copy it to device memory. Some platforms will also update the contents of the host memory after every enqueued command (which could mean a huge slowdown), while others will only update it at some specific "synchronization points".
My personal rule, from experience, is to avoid CL_MEM_USE_HOST_PTR unless I know I'm working with an iGPU or a CPU implementation (and have properly aligned pointers).
If you have AMD and NVIDIA GPUs in the same machine, I'm not aware of any official way they can share buffers efficiently, which means you'll have to go through host memory anyway... in which case I'd avoid any games with CL_MEM_USE_HOST_PTR and just rely on clEnqueueMapBuffer/clEnqueueUnmapMemObject or clEnqueueReadBuffer/clEnqueueWriteBuffer.
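For illustration, a minimal sketch of that portable path, assuming queue_1 and queue_2 are command queues created on context_1 and context_2, and that buff_1 and buff_2 were created without CL_MEM_USE_HOST_PTR:
void* staging = malloc(size);
assert(staging != NULL);
// Blocking read from the first device into host memory...
clEnqueueReadBuffer(queue_1, buff_1, CL_TRUE, 0, size, staging, 0, NULL, NULL);
// ...then a blocking write from host memory to the second device.
clEnqueueWriteBuffer(queue_2, buff_2, CL_TRUE, 0, size, staging, 0, NULL, NULL);
free(staging);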
I'm using append/consume buffers to reduce shading work in my path tracer (immediately shade empty space and emitters in a pre-pass, then append the remaining pixels for full processing). I've heard that I should be using a UAV when I'm accessing through AppendStructuredBuffer<T> and an SRV when I'm accessing through ConsumeStructuredBuffer<T>. I haven't seen that claim in any of Microsoft's documentation, but it might explain why my calls to Consume() are returning empty data - is it accurate?
I should have tested this myself before asking - shaders with ConsumeStructuredBuffers declared in SRV registers (tN) fail to compile, emitting an error saying they're only bindable through UAVs.
My AppendStructuredBuffer bound through the UAV registers works fine, so it seems I was just quoting hearsay; unordered access views should be used for both HLSL types.
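A minimal sketch of the binding side in native D3D11 (assuming an existing device, context, structured buffer, and element count; error handling omitted) - both HLSL types need a UAV created with the append flag:
#include <d3d11.h>

D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
uavDesc.Format = DXGI_FORMAT_UNKNOWN;                 // required for structured buffers
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.NumElements = elementCount;            // hypothetical element count
uavDesc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_APPEND;  // enables Append()/Consume()

ID3D11UnorderedAccessView* uav = nullptr;
device->CreateUnorderedAccessView(buffer, &uavDesc, &uav);

// Bind to a u-register; the last argument resets the hidden counter to 0.
UINT initialCount = 0;
context->CSSetUnorderedAccessViews(0, 1, &uav, &initialCount);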
I'm wondering whether we can bind an image (such as a texture) that a shader will use to a VkDeviceMemory allocated with the flags HOST_VISIBLE | HOST_COHERENT.
It can if your implementation allows it.
Before you can bind any VkImage to memory, you must first use vkGetImageMemoryRequirements to determine which memory types are allowed for that particular VkFormat and VkImageType. These are implementation-defined properties. If the implementation says that a particular memory type can be used for that image, then you can use memory allocated from that memory type for that VkImage (and for ones with similar parameters, as defined by the specification).
If it does not, then you cannot.
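A minimal sketch of that check (assuming an existing device, physicalDevice, and a created image):
#include <vulkan/vulkan.h>

VkMemoryRequirements memReq;
vkGetImageMemoryRequirements(device, image, &memReq);

VkPhysicalDeviceMemoryProperties memProps;
vkGetPhysicalDeviceMemoryProperties(physicalDevice, &memProps);

const VkMemoryPropertyFlags wanted =
    VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
int32_t hostVisibleType = -1;
for (uint32_t i = 0; i < memProps.memoryTypeCount; ++i) {
    const bool allowedForImage = (memReq.memoryTypeBits & (1u << i)) != 0;
    const bool hasProperties =
        (memProps.memoryTypes[i].propertyFlags & wanted) == wanted;
    if (allowedForImage && hasProperties) { hostVisibleType = (int32_t)i; break; }
}
// hostVisibleType >= 0 means the implementation allows it: allocate from that
// memory type with vkAllocateMemory and bind with vkBindImageMemory.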
Just as the title says: I am writing a networking program where I open a handle to a network driver using CreateFile, and I have been experimenting with the FILE_FLAG_NO_BUFFERING flag.
Most documentation doesn't even mention using this flag with communication devices, and the documentation that does (e.g. the MSDN reference) simply mentions that you can.
Does anyone have any idea how this may affect communication with the device?
It is a device driver implementation detail; the options you specify in the CreateFile() call are passed in the IRP_MJ_CREATE request. The one I linked is the one for file systems, a very fancy one. Click through the IrpSp->Parameters.Create.Options link to IoCreateFileSpecifyDeviceObjectHint()'s Options argument to see FILE_NO_INTERMEDIATE_BUFFERING.
The documentation for the IRP_MJ_CREATE request for serial ports is here. A very simple one, no arguments at all :) In general, the winapi-to-device-driver interface for communication ports is very straightforward. There's an (almost) direct mapping between each documented winapi function and its underlying IOCTL. The winapi function doesn't do much beyond basic error checking, then quickly passes the job to the driver.
So there isn't any way to pass along the FILE_FLAG_NO_BUFFERING option you specify, so it simply doesn't get used.
The logical conclusion is the opposite: serial port I/O is interrupt-driven, so the driver must buffer in order not to lose bytes and to keep an acceptable transfer rate. You can technically tinker with the buffer sizes through SetupComm() but, as documented, it is only a recommendation, with pretty high odds that the driver simply ignores very low values.
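For illustration, a minimal sketch (hypothetical port name; error handling abbreviated) - the flag is accepted at open time but, per the above, has no effect for a serial device, while SetupComm() merely recommends sizes:
#include <windows.h>

HANDLE h = CreateFileA("\\\\.\\COM1", GENERIC_READ | GENERIC_WRITE,
                       0, NULL, OPEN_EXISTING,
                       FILE_FLAG_NO_BUFFERING,  // accepted, but never reaches the serial driver
                       NULL);
if (h != INVALID_HANDLE_VALUE) {
    // Recommend 16 KB input / 4 KB output buffers; the driver may ignore this.
    SetupComm(h, 16384, 4096);
    CloseHandle(h);
}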
I use compute shaders to compute a triangle list and store it in a RWStructuredBuffer. For testing, I read this buffer back and pass it to the IA via context.InputAssembler.SetVertexBuffers(…). This approach works, but is only valid for testing the data for correctness.
Now I want to bind the (already existing) buffer to the IA stage using a resource view (aka without passing a pointer to the vertex buffer).
I am reading some good books (Frank D. Luna, Jason Zink), but they never mention this case.
===============
EDIT:
The syntax I am using here is imposed by the SharpDX wrapper.
I can bind the buffer to the vertex shader via context.VertexShader.SetShaderResource(...), binding a ResourceView. In the VS I use SV_VertexID to access the buffer. So I HAVE a working solution for the moment, but there might be cases in the future where I must bind the buffer to the input assembler.
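For reference, the same workaround in native D3D11 terms might look like this (a sketch; device, context, structuredBuffer, and elementCount are assumed):
#include <d3d11.h>

D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format = DXGI_FORMAT_UNKNOWN;             // structured buffers use UNKNOWN
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
srvDesc.Buffer.FirstElement = 0;
srvDesc.Buffer.NumElements = elementCount;        // hypothetical element count

ID3D11ShaderResourceView* srv = nullptr;
device->CreateShaderResourceView(structuredBuffer, &srvDesc, &srv);
// Bind to a t-register and index it in the vertex shader with SV_VertexID.
context->VSSetShaderResources(0, 1, &srv);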
Simply put, you can't bind a structured buffer to the IA stage, at least not directly; the runtime will not allow it.
If you set ResourceOptionFlags.BufferStructured in OptionFlags, you are not allowed to use VertexBuffer, IndexBuffer, StreamOutput, ConstantBuffer, RenderTarget, or DepthStencil as bind flags; resource creation will fail.
One option, which costs you a GPU copy, is to create a second buffer with VertexBuffer bind flags and Default usage (the same size as your structured buffer).
Once you are done processing your structured buffer, call:
DeviceContext.CopyResource
And you'll have a standard vertex buffer ready to use.
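A minimal sketch of that in native D3D11 (the SharpDX calls map one-to-one; device, context, and structuredBuffer are assumed):
#include <d3d11.h>

D3D11_BUFFER_DESC desc = {};
structuredBuffer->GetDesc(&desc);            // reuse the size of the structured buffer
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;   // plain vertex buffer, no structured misc flag
desc.CPUAccessFlags = 0;
desc.MiscFlags = 0;
desc.StructureByteStride = 0;

ID3D11Buffer* vertexBuffer = nullptr;
device->CreateBuffer(&desc, nullptr, &vertexBuffer);

// After the compute pass has written the structured buffer:
context->CopyResource(vertexBuffer, structuredBuffer);
// vertexBuffer is now a standard vertex buffer for IASetVertexBuffers().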