API to get the graphics or video memory - Windows

I want to get the adapter RAM or graphics RAM which you can see in Display settings or Device Manager, using an API. I am working in a C++ application.
I have tried searching on the net, and as per my R&D I have come to the conclusion that we can get the graphics memory info from:
1. The DirectX SDK structure DXGI_ADAPTER_DESC. But what if I don't want to use the DirectX API?
2. Win32_VideoController: but this class does not always give you the AdapterRAM info if the availability of the video controller is offline. I have checked it on Vista.
Is there any other way to get the graphics RAM?

There is NO way to directly access graphics RAM on Windows; Windows prevents you from doing this, as it maintains control over what is displayed.
You CAN, however, create a DirectX device, get the back buffer surface and then lock it. After locking you can fill it with whatever you want, then unlock and call Present. This is slow, though, as you have to copy the video memory back across the bus into main memory. Some cards also use "swizzled" formats that have to be un-swizzled as they are copied. This adds further time, and some cards will even prevent you from doing it at all.
In general you want to avoid directly accessing the video card and let Windows/DirectX do the drawing for you. Under D3D1x I'm pretty sure you can do it via an IDXGIOutput, though. It really is something to try and avoid though ...
You can write to a linear array via standard Win32 (this example assumes C), but it's quite involved.
First you need the linear array.
unsigned int* pBits = malloc( width * height * sizeof( unsigned int ) );
Then you need to create a bitmap and select it to the DC.
HBITMAP hBitmap = ::CreateBitmap( width, height, 1, 32, NULL );
SelectObject( hDC, (HGDIOBJ)hBitmap );
You can then fill the pBits array as you please. When you've finished you can then set the bitmap's bits.
::SetBitmapBits( hBitmap, width * height * 4, (void*)pBits );
When you've finished using your bitmap don't forget to delete it (Using DeleteObject) AND free your linear array!
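Putting those steps together, a minimal sketch (my own assembly, not part of the original steps) could look like the following. It assumes hDC is a valid window DC and uses a memory DC so the bitmap can be selected and then blitted to the window:

#include <windows.h>
#include <stdlib.h>

void DrawLinearArray( HDC hDC, int width, int height )
{
    unsigned int* pBits = (unsigned int*)malloc( width * height * sizeof( unsigned int ) );
    HBITMAP hBitmap = ::CreateBitmap( width, height, 1, 32, NULL );
    HDC hMemDC = CreateCompatibleDC( hDC );
    HGDIOBJ hOld = SelectObject( hMemDC, (HGDIOBJ)hBitmap );

    /* Fill the linear array however you like (here, a simple pattern). */
    for ( int y = 0; y < height; ++y )
        for ( int x = 0; x < width; ++x )
            pBits[y * width + x] = ( (x ^ y) & 0xFF ) * 0x010101;

    ::SetBitmapBits( hBitmap, width * height * 4, (void*)pBits );
    BitBlt( hDC, 0, 0, width, height, hMemDC, 0, 0, SRCCOPY );

    /* Clean up: restore the old bitmap, then delete everything created above. */
    SelectObject( hMemDC, hOld );
    DeleteDC( hMemDC );
    DeleteObject( hBitmap );
    free( pBits );
}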
Edit: There is only one way to reliably get the video RAM, and that is to go through the DxDiag interfaces. Have a look at IDxDiagProvider and IDxDiagContainer in the DX SDK.
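For reference, here is a rough sketch of that DxDiag route. The container path ("DxDiag_DisplayDevices", child "0") and the szDisplayMemoryEnglish property name are what the DxDiag sample in the SDK uses as far as I recall, so treat them as assumptions and verify against your SDK version; link against dxguid.lib for the CLSID/IID:

#include <windows.h>
#include <dxdiag.h>
#include <stdio.h>

void PrintVideoMemory()
{
    CoInitialize( NULL );

    IDxDiagProvider* pProvider = NULL;
    CoCreateInstance( CLSID_DxDiagProvider, NULL, CLSCTX_INPROC_SERVER,
                      IID_IDxDiagProvider, (void**)&pProvider );

    DXDIAG_INIT_PARAMS params = {};
    params.dwSize = sizeof( params );
    params.dwDxDiagHeaderVersion = DXDIAG_DX9_SDK_VERSION;
    pProvider->Initialize( &params );

    IDxDiagContainer* pRoot = NULL;
    IDxDiagContainer* pDisplays = NULL;
    IDxDiagContainer* pDevice = NULL;
    pProvider->GetRootContainer( &pRoot );
    pRoot->GetChildContainer( L"DxDiag_DisplayDevices", &pDisplays );
    pDisplays->GetChildContainer( L"0", &pDevice );  // first display adapter

    VARIANT var;
    VariantInit( &var );
    if ( SUCCEEDED( pDevice->GetProp( L"szDisplayMemoryEnglish", &var ) ) )
        wprintf( L"Video memory: %s\n", var.bstrVal );

    VariantClear( &var );
    pDevice->Release();
    pDisplays->Release();
    pRoot->Release();
    pProvider->Release();
    CoUninitialize();
}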

Win32_VideoController is your best bet for getting the amount of graphics memory. That's how it's done in the Doom 3 source.
You say "..availability of video controller is offline. I have checked it on vista." Under what circumstances would the video controller be offline?
Incidentally, you can find the Doom 3 source here. The function you're looking for is called Sys_GetVideoRam and it's in a file called win_shared.cpp, although a solution-wide search will turn it up for you.
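If it helps, this is roughly what that WMI query looks like in C++ (a trimmed sketch of the same idea as Sys_GetVideoRam; error handling and the usual CoSetProxyBlanket call are omitted for brevity):

#include <windows.h>
#include <wbemidl.h>
#include <comdef.h>
#pragma comment(lib, "wbemuuid.lib")

unsigned long long GetAdapterRamBytes()
{
    unsigned long long bytes = 0;

    CoInitializeEx( NULL, COINIT_MULTITHREADED );
    CoInitializeSecurity( NULL, -1, NULL, NULL, RPC_C_AUTHN_LEVEL_DEFAULT,
                          RPC_C_IMP_LEVEL_IMPERSONATE, NULL, EOAC_NONE, NULL );

    IWbemLocator* pLocator = NULL;
    CoCreateInstance( CLSID_WbemLocator, NULL, CLSCTX_INPROC_SERVER,
                      IID_IWbemLocator, (void**)&pLocator );

    IWbemServices* pServices = NULL;
    pLocator->ConnectServer( _bstr_t( L"ROOT\\CIMV2" ), NULL, NULL, NULL,
                             0, NULL, NULL, &pServices );

    IEnumWbemClassObject* pResults = NULL;
    pServices->ExecQuery( _bstr_t( L"WQL" ),
                          _bstr_t( L"SELECT AdapterRAM FROM Win32_VideoController" ),
                          WBEM_FLAG_FORWARD_ONLY, NULL, &pResults );

    IWbemClassObject* pObj = NULL;
    ULONG returned = 0;
    while ( pResults->Next( WBEM_INFINITE, 1, &pObj, &returned ) == S_OK )
    {
        VARIANT v;
        VariantInit( &v );
        // AdapterRAM is a uint32 in WMI, so it caps out at 4 GB
        if ( SUCCEEDED( pObj->Get( L"AdapterRAM", 0, &v, NULL, NULL ) ) && v.vt == VT_I4 )
            bytes = (unsigned long)v.lVal;
        VariantClear( &v );
        pObj->Release();
    }

    pResults->Release();
    pServices->Release();
    pLocator->Release();
    CoUninitialize();
    return bytes;
}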

User-mode threads cannot access memory regions and I/O mapped from hardware devices, including the framebuffer. Anyway, why would you want to do that? Suppose you could access the framebuffer directly: now you must handle a LOT of possible pixel formats. You could assume a 32-bit RGBA or ARGB organization, but there is also the possibility of 15/16/24-bit displays (RGB555, RGBA5551, RGBA4444, RGB565, RGB888, ...), and that's before you also support the video-surface (overlay) formats such as the YUV-based ones.
So let the display driver and/or the underlying APIs do that work for you.
If you want to write to a display surface (which is not exactly the same as framebuffer memory, although it's conceptually close), there are a lot of options: DirectX, Win32, or you may try the SDL library (libsdl).

Related

How to best organize constant buffers

I'm having some trouble wrapping my head around how to organize the constant buffers in a very basic D3D11 engine I'm making.
My main question is: Where does the biggest performance hit take place? When using Map/Unmap to update buffer data or when binding the cbuffers themselves?
At the moment, I'm deciding between the following two implementations for a sort of "shader-wrapper" class:
Holding an array of 14 ID3D11Buffer*s
class VertexShader
{
    ...
public:
    void Bind(ID3D11DeviceContext* context)
    {
        // Bind all 14 buffers at once
        context->VSSetConstantBuffers(0, 14, &buffers[0]);
        context->VSSetShader(pVS, nullptr, 0);
    }
    // Set the data for a buffer in a particular slot
    void SetData(ID3D11DeviceContext* context, UINT slot, size_t size, const void* pData)
    {
        D3D11_MAPPED_SUBRESOURCE mappedBuffer = {};
        context->Map(buffers[slot], 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedBuffer);
        memcpy(mappedBuffer.pData, pData, size);
        context->Unmap(buffers[slot], 0);
    }
private:
    ID3D11Buffer* buffers[14];
    ID3D11VertexShader* pVS;
};
This approach would have the shader bind all the cbuffers in a single batch of 14. If the shader has cbuffers registered to b0, b1, b3 the array would look like -> [cb|cb|0|cb|0|0|0|0|0|0|0|0|0|0]
Constant Buffer wrapper that knows how to bind itself
class VertexShader
{
    ...
public:
    void Bind(ID3D11DeviceContext* context)
    {
        // All the buffers bind themselves
        for (auto& cb : bufferMap)
            cb.second->Bind(context);
        context->VSSetShader(pVS, nullptr, 0);
    }
    // Set the data for a buffer with a particular ID
    void SetData(ID3D11DeviceContext* context, const std::string& name, size_t size, const void* pData)
    {
        // table lookup into bufferMap, then Map/Unmap
    }
private:
    std::unordered_map<std::string, ConstantBuffer*> bufferMap;
    ID3D11VertexShader* pVS;
};
This approach would hold "ConstantBuffers" in a hash table; each one would know what slot it's bound to and how to bind itself to the pipeline. I would have to call VSSetConstantBuffers() individually for each cbuffer since the ID3D11Buffer*s wouldn't be contiguous anymore, but the organization is friendlier and has a bit less wasted space.
How would you typically organize the relationship between CBuffers, Shaders, SRVs, etc.? Not looking for a do-all solution, but some general advice and things to read more about from people hopefully more experienced than I am.
Also, if @Chuck Walbourn sees this, I'm a fan of your work and I'm using DirectXTK/WICTextureLoader for this project!
Thanks.
Constant Buffers were a major feature of Direct3D 10, so one of the best talks on the subject was given way back at Gamefest 2007:
Windows to Reality: Getting the Most out of Direct3D 10 Graphics in Your Games
See also Why Can Updating Constant Buffers be so painfully slow? (NVIDIA)
The original intention was for CBs to be organized by frequency of update: something like one CB for stuff that is set 'per level', another for stuff 'per frame', another for 'per object', another 'per pass', etc. Therefore the assumption is that if you changed any part of a CB, you were going to be uploading the whole thing. Bandwidth between the CPU and GPU is the real bottleneck here.
For this approach to be effective, you basically need to set up all your shaders to use the same scheme. This can be difficult to manage, especially when so many modern material systems are art-driven.
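As a rough illustration of that frequency-based layout (the struct names and fields below are mine, not from the talk), the C++ side might look something like this, with every shader expecting the same registers:

#include <d3d11.h>
#include <DirectXMath.h>

// Changes once per frame -> bound to register(b0) in every shader
struct PerFrameCB
{
    DirectX::XMFLOAT4X4 viewProj;
    DirectX::XMFLOAT3   eyePos;
    float               time;
};

// Changes per draw call -> bound to register(b1) in every shader
struct PerObjectCB
{
    DirectX::XMFLOAT4X4 world;
    DirectX::XMFLOAT4   tint;
};

// Dynamic, CPU-writable constant buffer; ByteWidth must be a multiple of 16.
ID3D11Buffer* CreateConstantBuffer(ID3D11Device* device, UINT byteWidth)
{
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth      = byteWidth;
    desc.Usage          = D3D11_USAGE_DYNAMIC;
    desc.BindFlags      = D3D11_BIND_CONSTANT_BUFFER;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;

    ID3D11Buffer* cb = nullptr;
    device->CreateBuffer(&desc, nullptr, &cb);
    return cb;
}

The per-frame buffer then gets mapped once per frame and the per-object one once per draw, so the buffer that changes most often stays small.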
Another approach to CBs is to use them like a dynamic VB for particle submission, where you fill it up with short-lived constants, submit work, and then reset the thing each frame. This approach is basically what people do for DirectX 12 in many cases. The problem is that without the ability to update parts of CBs, it's too slow. The 'partial constant buffer updates and offsets' optional features in DirectX 11.1 were a way to make this work. That said, this feature is not supported on Windows 7 and is 'optional' on newer versions of Windows, so you have to support two codepaths to use it.
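For completeness, a hedged sketch of that D3D11.1 path (bigBuffer here is a placeholder for a large dynamic constant buffer; the feature must be checked at runtime precisely because it's optional):

#include <d3d11_1.h>

void BindConstantWindow(ID3D11Device* device, ID3D11DeviceContext1* context1, ID3D11Buffer* bigBuffer)
{
    D3D11_FEATURE_DATA_D3D11_OPTIONS opts = {};
    device->CheckFeatureSupport(D3D11_FEATURE_D3D11_OPTIONS, &opts, sizeof(opts));
    if (!opts.ConstantBufferOffsetting)
        return; // optional feature: fall back to whole-buffer binds

    // Offsets and sizes are expressed in 16-byte constants and must be
    // multiples of 16 constants (i.e. 256-byte granularity).
    UINT firstConstant = 16;  // start at byte offset 256 into bigBuffer
    UINT numConstants  = 16;  // expose a 256-byte window to register b0
    context1->VSSetConstantBuffers1(0, 1, &bigBuffer, &firstConstant, &numConstants);
}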
TL;DR: you can technically have a lot of CBs bound at once, but the key thing is to keep the individual size small for the ones that change often. Also assume any change to a CB is going to require updating the whole thing to the GPU every time you do change it.

OpenCL inter-context buffer aliasing

Suppose I have 2 OpenCL-capable devices on my machine (not including CPUs); and suppose that an evil colleague of mine creates a different context for each of them, which I have to work with.
I know I can't share buffers between contexts - not properly and officially, at least. But suppose that I create two OpenCL buffers, one in each context, and pass to each of them the same region of host memory, with the CL_MEM_USE_HOST_PTR flag. e.g.:
enum { size = 1234 };
//...
context_1 = clCreateContext(NULL, 1, &some_device_id, NULL, NULL, NULL);
context_2 = clCreateContext(NULL, 1, &another_device_id, NULL, NULL, NULL);
void* host_mem = malloc(size);
assert(host_mem != NULL);
buff_1 = clCreateBuffer(context_1, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, size, host_mem, NULL);
buff_2 = clCreateBuffer(context_2, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, size, host_mem, NULL);
I realize that, officially,
The result of OpenCL commands that operate on multiple buffer objects created with the same host_ptr or overlapping host regions is considered to be undefined.
But what will actually happen if I copy to this buffer from one device, and from this buffer to another device? I'm specifically interested in the case of (relatively-recent) AMD and NVIDIA GPUs.
If your OpenCL implementation's vendor guarantees some kind of specific behaviour that goes beyond the standard, then go with that and make sure to follow any instructions about limitations to the letter.
If it doesn't, then you have to assume what the standard says.
I know I can't share buffers between contexts
It's not the contexts that are the problem. It's platforms. There are essentially two cases:
1) you want to share buffers between devices from the same platform. In that case, simply create a single context with all devices, don't complicate your life, and let the platform handle it (see the sketch after this list).
2) you need to share buffer between devices from different platforms. In that case, you're on your own.
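A sketch of case 1, reusing the device IDs and size from the question (and assuming both devices really do come from the same platform):

cl_device_id devices[2] = { some_device_id, another_device_id };
cl_int err = CL_SUCCESS;

cl_context ctx = clCreateContext(NULL, 2, devices, NULL, NULL, &err);
cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, size, NULL, &err);

// One queue per device; both may legally operate on 'buf',
// and the platform handles any migration between the devices.
cl_command_queue q0 = clCreateCommandQueue(ctx, devices[0], 0, &err);
cl_command_queue q1 = clCreateCommandQueue(ctx, devices[1], 0, &err);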
waiting "for the split-context bug report to assigned and handled" isn't going to get you anywhere, because if it's contexts from same platform they'll tell you what i said in 1), and if it's contexts from different platforms they'll tell you it's impossible to support in any sane way.
"what will actually happen" ... depends (on a gajillion things). Some platforms will try to map the memory pointer (if it's properly aligned, for some definition of "properly") to the device address space. Some platforms will just silently copy it to device memory. Some platforms will also update the contents of the host memory after every enqueued command (which could mean a huge slowdown), while others will only update it at some specific "synchronization points".
My personal experience is to avoid CL_MEM_USE_HOST_PTR unless I know I'm working with an iGPU or a CPU implementation (and have properly aligned pointers).
If you have AMD and NVIDIA GPUs in the same machine, I'm not aware of any official way they can share buffers efficiently, which means you'll have to go through host memory anyway... in which case I'd avoid any games with CL_MEM_USE_HOST_PTR and just rely on clEnqueueMapBuffer/clEnqueueUnmapMemObject or clEnqueueReadBuffer/clEnqueueWriteBuffer.
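And a sketch of that "go through host memory" route for the two-context case. queue_1 and queue_2 are assumed to be command queues on context_1 and context_2, and buff_1/buff_2 are assumed to be ordinary device buffers (created without CL_MEM_USE_HOST_PTR):

void* staging = malloc(size);

// device 1 -> host (blocking read, so 'staging' is valid afterwards)
clEnqueueReadBuffer(queue_1, buff_1, CL_TRUE, 0, size, staging, 0, NULL, NULL);

// host -> device 2
clEnqueueWriteBuffer(queue_2, buff_2, CL_TRUE, 0, size, staging, 0, NULL, NULL);

free(staging);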

DirectX texture interface to existing memory

I'm writing a rendering app that communicates with an image processor as a sort of virtual camera, and I'm trying to figure out the fastest way to write the texture data from one process to the awaiting image buffer in the other.
Theoretically I think it should be possible with 1 DirectX copy from VRAM directly to the area of memory I want it in, but I can't figure out how to specify a region of memory for a texture to occupy, and thus must perform an additional memcpy. DX9 or DX11 solutions would be welcome.
So far, the docs here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb174363(v=vs.85).aspx have held the most promise.
"In Windows Vista CreateTexture can create a texture from a system memory pointer allowing the application more flexibility over the use, allocation and deletion of the system memory"
I'm running on Windows 7 with the June 2010 DirectX SDK. However, whenever I try to use the function in the way it specifies, the call fails with an invalid-arguments error code. Here is the call I tried as a test:
static char s_TextureBuffer[640*480*4]; //larger than needed
void* p = (void*)s_TextureBuffer;
HRESULT res = g_D3D9Device->CreateTexture(640,480,1,0, D3DFORMAT::D3DFMT_L8, D3DPOOL::D3DPOOL_SYSTEMMEM, &g_ReadTexture, (void**)p);
I tried with several different texture formats, but with no luck. I've begun looking into DX11 solutions, it's going slowly since I'm used to DX9. Thanks!
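For what it's worth, my reading of that MSDN note is that the system-memory pointer is passed through the pSharedHandle parameter as the address of the pointer (with Usage 0, Levels 1 and D3DPOOL_SYSTEMMEM), rather than being cast to void**. An untested sketch of that usage, so take it with a grain of salt:

static char s_TextureBuffer[640*480*4]; // larger than needed, as before
HANDLE hMem = (HANDLE)s_TextureBuffer;  // the system-memory pointer to wrap

HRESULT res = g_D3D9Device->CreateTexture(640, 480, 1, 0,
                                          D3DFORMAT::D3DFMT_L8,
                                          D3DPOOL::D3DPOOL_SYSTEMMEM,
                                          &g_ReadTexture,
                                          &hMem); // address of the pointer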

Is it possible to save some data permanently in an AVR microcontroller?

Well, the question says it all.
What I would like to do is this: every time I power up the microcontroller, it should take some of the saved data and use it. It should not use any external flash chip.
If possible, please give a code snippet that I can use in AVR Studio 4. For example, if I save 8 uint16_t values, it should load those into an array of uint16_t.
You have to burn the data into the program memory of the chip if you don't need to update it programmatically; if you want read-write support, you should use the built-in EEPROM (a sketch of that follows the PROGMEM example below).
PROGMEM example:
#include <avr/pgmspace.h>
const uint16_t data[] PROGMEM = { 0, 1, 2, 3 };
int main()
{
    uint16_t x = pgm_read_word_near(data + 1); // access 2nd element
    return 0;
}
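And a corresponding sketch for the read-write EEPROM route mentioned above, using avr-libc's <avr/eeprom.h> (this matches the 8-value uint16_t example from the question):

#include <avr/eeprom.h>

// EEMEM places this array in the .eeprom section; its contents survive power cycles.
uint16_t EEMEM saved_data[8];

int main()
{
    uint16_t ram_copy[8];

    // Load the saved values into RAM at startup
    eeprom_read_block(ram_copy, saved_data, sizeof(ram_copy));

    // ... use / modify ram_copy ...

    // Write back; eeprom_update_block only rewrites bytes that changed,
    // which helps limit EEPROM wear.
    eeprom_update_block(ram_copy, saved_data, sizeof(ram_copy));

    for (;;) { }
}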
You need to get the datasheet for the part you are using. Microcontrollers like these typically contain at least one bank of flash, and sometimes multiple banks, to allow for different bootloaders while making it easy to erase one whole flash without affecting another. Likewise, some have EEPROM. This is all internal, not external. Especially since you say you need to save programmatically, this should work (remember how easy it is to wear out flash, so don't save unless you need to). Either EEPROM or flash will meet the requirement of having that information there when you power up (non-volatile), as well as being able to save it programmatically. Googling will find a number of examples of how to do this, in addition to the datasheet (which you should read) and the app notes that also contain this information. If you are looking for some sort of one-time-programmable fuse-blowing thing, there may be OTP versions of the AVR; you will have to read the datasheets, programmer's references and app notes on how to program that memory, and they should tell you whether OTP parts can be written programmatically or are treated differently.
Reading the data is just a matter of the memory map in the datasheet: write code to read those addresses. Writing is described in the datasheet (programmer's reference manual, user's guide, whatever Atmel calls it) as well, and there are many examples on the net.

Converting glReadBuffer() / glDrawBuffer() calls into OpenGL ES

I'm having trouble understanding how to port glReadBuffer() & glDrawBuffer() calls to OpenGL ES 1.1. Various forum posts on the internet just say "use VBOs," without going into more depth.
Can you please help me understand an appropriate conversion? Say I have:
glReadBuffer(GL_FRONT);
followed by
glDrawBuffer(GL_BACK_LEFT);
state->paint(state_id, f);
How can I write the pixels out?
glReadBuffer and glDrawBuffer just select the source buffer for subsequent pixel reads and the target buffer for subsequent drawing operations. Assuming you're targeting a monoscopic device, such as the iPhone or an Android device, and have requested two buffers, then you're already set up for drawing to the back buffer. The only means of reading the colour buffer in GL ES is glReadPixels, which will read from the same buffer that you're drawing to.
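For the read side, a minimal sketch of that glReadPixels path (GL_RGBA with GL_UNSIGNED_BYTE is the combination ES guarantees; width and height are assumed to be your viewport size):

GLubyte* pixels = (GLubyte*)malloc(width * height * 4);

// Reads from the colour buffer currently being rendered to.
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

// ... use the pixel data ...
free(pixels);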
All of these are completely unrelated to VBOs, which pass off management of arrays of data to the driver, often implicitly allowing them to be put into the GPU's direct address space.
