I just wonder if we can bind the image (such as texture) that shader will use to a VkDeviceMemory that is allocated by flags HOST_VISIBLE | HOST_COHERENT.
It can if your implementation allows it.
Before you can bind any VkImage to memory, you must first use vkGetImageMemoryRequirements to determine what memory types are allowed for that particular VkImageFormat and VkImageType. These are implementation defined properties. If the implementation says that a particular memory type can be used for that image, then you can use memory allocated from that memory type for that VkImage (and ones with similar parameters, as defined by the specification).
If it does not, then you cannot.
Related
I found information that general purpose registers r1-r23 and r26-r28 are used by the compiler to store local variables, but do they have any other purpose? Also which memory are this registers part of(cache/RAM)?
Finally what does global pointer gp in register r26 points to?
Also which memory are this registers part of(cache/RAM)?
Register are on-processors storage allowing a fast data transfer (2 reads/1 write per cycle). They store variables that can represent memory addresses, but, besides that, are completely unrelated to memory or cache.
I found information that general purpose registers r1-r23 and r26-r28 are used by the compiler to store local variables, but do they have any other purpose?
Registers are use with respect to hardware or software conventions. Hardware conventions are related to the instruction set architecture. For instance, the call instruction transfers control to a subroutine and stores return address in register r31 (ra). Very nasty things are likely to happen if you overwrite r31 register by any mean without precautions. Software conventions are supposed to insure a proper behavior if used consistently within software. They indicate which register have special use, which need to be saved when context switching, etc. These conventions can be changed without hardware modifications, but doing so will probably require changes in several software tools (compiler, linker, loader, OS, ...).
general purpose registers r1-r23 and r26-r28 are used by the compiler to store local variables
Actually, some registers are reserved.
r1 is used by asm for macro expansion. (sw)
r2-r7 are used by the compiler to pass arguments to functions or get return values. (sw)
r24-r25 can only be used by exception handlers. (sw)
r26-r28 hold different pointers (global, stack, frame) that are set either by the runtime or the compiler and cannot be modified by the programmer.(sw)
r29-r31 are hw coded returns addresses for subprograms or interrupts/exceptions. (hw)
So only r8-r23 can used by the compiler.
but do they have any other purpose?
No, and that's why they can be freely used by the compiler or programmer.
Finally what does global pointer in register r26 points to?
Accessing memory with load or stores have a based memory addressing. Effective address for ldx or stx (where 'x' is is b, bu, h, etc depending on data characteristics) is computed by adding a register and a 16 bits immediate. This only allows to go an an address within +/-32k of the content of register.
If the processor has the address of a var in a register (for instance the value returned by a malloc) the immediate allows to do a displacement to access fields in a struct, next array value, etc.
If the address is local or global, it must be computed by the program. Pointers registers are used to that purpose. Local vars addresses are computed by adding an immediate to the stack pointer (r27or sp).
Addresses of global or static vars are computed by adding an integer to the global pointer (r26 or gp). Content of gp corresponds to the start of the memory data segment and is initialized by the loader just before program execution and must not be modified. The immediate displacement with respect to the start of data segment is computed by the linker when it defines memory layout.
Note that this only allows to access 64k memory because of the 16 bits immediate width. If the size of global/static variables exceeds this value and a var is not within this range, a couple of instructions are required to enter the 32 bits of the address of the var before the data transfer. With gp this is not required and it is a way to provide a faster access to global variables.
I'd like to translate the delegate members .ptr and .funcptr to an absolute address that matches something in the executable image in DRAM.
The goal is not to call, neither to modify, but rather to allow the target to disassemble itself at run-time, when its own image is loaded in DRAM.
So far it already works with global functions.
Is it possible ?
The address of a delegate is the value of the .funcptr property. The type of this property is a bit misleading - it is of type function and does not list the hidden argument that is actually expected for passing the context in, but for just getting the address, you can ignore the type (explicitly casting to void* or size_t if you like to change the type) and just look at the address.
This isn't the address in physical memory, you'd have to ask the operating system for that, but since the virtual address it gives is automatically translated by the processor, it is most likely what you want anyway.
I use compute shaders to compute a triangle list and to store it in a RWStructuredBuffer. For testing I read this buffer and pass it to the IA via context.InputAssembler.SetVertexBuffers (…). This approach works, but is valid only for testing the data for correctness.
Now I want to bind the (already existing) buffer to the IA stage using a resource view (aka without passing a pointer to the vertex buffer).
I am reading some good books (Frank D. Luna, Jason Zink), but they never mention this case.
===============
EDIT:
The syntax I am using here in imposed by the SharpDX wrapper.
I can bind the buffer to the vertex shader via context.VertexShader.SetShaderResource(...), bindig a ResoureceView. In the VS I use SV_VertexID to access the buffer. So I HAVE a working solution for moment, but there might be cases in the future where I must bind the buffer to the input assembler.
Simply put, you can't bind a structured buffer to the IA stage, at least directly, runtime will not allow this.
If you put ResourceOptionFlags.BufferStructured as OptionFlags, you are not allowed to use : VertexBuffer/IndexBuffer/StreamOutput/ConstantBuffer/RenderTarget/Depth as bind flags, Resource creation will fail.
One option, which costs you a GPU copy, is to create a second buffer with VertexBuffer BindFlags, and Default usage (same size as your structured buffer).
Once you are done processing your structuredbuffer, call:
DeviceContext.CopyResource
And you'll have a standard vertex buffer ready to use.
Is there performance side effects or any other side effects from always defining standard directx buffers with the bind flags D3D11_BIND_SHADER_RESOURCE and D3D11_BIND_RENDER_TARGET with the exception of more specialised buffers such as index, vertex, constant etc.
Yes
Bind Flags will associated the resource with a shader stage or how to access it. The device will not be able to optimize for its use.
Also it can cause an issue when trying to combine with another flag, example: D3D11_BIND_SHADER_RESOURCE can not be use with D3D11_MAP_WRITE_NO_OVERWRITE
http://msdn.microsoft.com/en-us/library/windows/desktop/ff476085(v=vs.85).aspx
Remarks
In general, binding flags can be combined using a logical OR (except the constant-buffer flag);> however, you should use a single flag to allow the device to optimize the resource usage.
I want to move my caching library to a DLL and allow multiple applications to share a single pointer allocated within the DLL using GlobalAlloc(). How could I accomplish this, and would it result in a significant performance decrease?
You could certainly do this and there won't be any performance implication for a single pointer.
Rather than use GlobalAlloc, a legacy API, you should opt for a different shared heap. For example the simplest to use is the COM allocator, CoTaskMemAlloc. Or you can use HeapAlloc passing the process heap obtained by GetProcessHeap.
For example, and neglecting to show error checking:
void *mem = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, size);
Note that you only need to worry about heap sharing if you expect the memory to be deallocated in a different module from where it was created. If your DLL both creates and destroys the memory then you can use plain old malloc. Because all modules live in the same process address space, memory allocated by any module in that process, can be used by any other module.
Update
I failed on first reading of the question to pick up on the possibility that you may be wanting multiple process to have access to the same memory. If that's what you need then it is only possible with memory mapped files, or perhaps with some form of IPC.