Performance loss with CopyResource() and then Map()/Unmap() - windows

The problem is that when I use these methods, the FPS drops by about half. For example, I had about 5000 FPS in a 3D scene, and it became about 2500. I know the cause is that the application waits for the copy to complete, but it's only 4 bytes... If I use the D3D11_MAP_FLAG_DO_NOT_WAIT flag, Map() always returns DXGI_ERROR_WAS_STILL_DRAWING. What can I do to use this method without losing FPS? Here is my code:
Init
D3D11_BUFFER_DESC outputDesc;
outputDesc.Usage = D3D11_USAGE_DEFAULT;
outputDesc.ByteWidth = sizeof(float);
outputDesc.BindFlags = D3D11_BIND_UNORDERED_ACCESS;
outputDesc.CPUAccessFlags = 0;
outputDesc.StructureByteStride = sizeof(float);
outputDesc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
FOG_TRACE(mDevice->CreateBuffer(&outputDesc, nullptr, &outputBuffer));
outputDesc.Usage = D3D11_USAGE_STAGING;
outputDesc.BindFlags = 0;
outputDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
FOG_TRACE(mDevice->CreateBuffer(&outputDesc, nullptr, &outputResultBuffer));
D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc{};
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_APPEND;
uavDesc.Buffer.NumElements = 1;
uavDesc.Format = DXGI_FORMAT_UNKNOWN;
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
FOG_TRACE(mDevice->CreateUnorderedAccessView(outputBuffer, &uavDesc, &unorderedAccessView));
Update
const UINT offset = 0;
mDeviceContext->OMSetRenderTargetsAndUnorderedAccessViews(1, &mRenderTargetView, mDepthStencilView, 1, 1, &unorderedAccessView, &offset);
mDeviceContext->ClearDepthStencilView(mDepthStencilView, D3D11_CLEAR_DEPTH | D3D11_CLEAR_STENCIL, 1.0f, 0);
ObjectManager::Draw();
mDeviceContext->CopyResource(outputResultBuffer, outputBuffer);
D3D11_MAPPED_SUBRESOURCE mappedBuffer;
HRESULT hr;
FOG_TRACE(hr = mDeviceContext->Map(outputResultBuffer, 0, D3D11_MAP_READ, 0/*D3D11_MAP_FLAG_DO_NOT_WAIT*/, &mappedBuffer));
if (SUCCEEDED(hr))
{
float* copy = (float*)(mappedBuffer.pData);
OutputDebugString(String::ToStr(*copy) + L"\n");
}
mDeviceContext->Unmap(outputResultBuffer, 0);
const UINT var[4]{};
mDeviceContext->ClearUnorderedAccessViewUint(unorderedAccessView, var);
I've already profiled and checked everything I can; the problem is definitely the wait for the pending copy. I would be very grateful if someone could explain this in detail :)

The problem was solved very simply, though it took a long time to find! I just don't issue another copy until I have read the data from the previous one. Here is a small workaround:
static bool isWait = false;
if (!isWait)
{
    mDeviceContext->CopyResource(outputResultBuffer, outputBuffer);
}
D3D11_MAPPED_SUBRESOURCE mappedBuffer;
HRESULT hr;
FOG_TRACE(hr = mDeviceContext->Map(outputResultBuffer, 0, D3D11_MAP_READ, D3D11_MAP_FLAG_DO_NOT_WAIT, &mappedBuffer));
if (SUCCEEDED(hr))
{
    // The copy has finished: read the value, release the mapping and reset the UAV.
    float* copy = (float*)(mappedBuffer.pData);
    OutputDebugString(String::ToStr(*copy) + L"\n");
    mDeviceContext->Unmap(outputResultBuffer, 0);
    const UINT var[4]{};
    mDeviceContext->ClearUnorderedAccessViewUint(unorderedAccessView, var);
    isWait = false;
}
else
{
    // The copy is still in flight: skip the next CopyResource and try mapping again next frame.
    isWait = true;
}
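For what it's worth, another common way to avoid the stall (not what was done above) is to keep a small ring of staging buffers and always map the copy that was issued a couple of frames earlier, so the data is usually ready by the time Map() is called. Below is a rough sketch of that idea, reusing the question's outputBuffer, mDevice and mDeviceContext; the kLatency constant and stagingRing array are my own names, not part of the original code.
// Sketch: hide readback latency with a small ring of staging buffers.
// Init (once):
const UINT kLatency = 3;                                 // frames between CopyResource and Map
ID3D11Buffer* stagingRing[kLatency] = {};
D3D11_BUFFER_DESC stagingDesc = {};
stagingDesc.ByteWidth = sizeof(float);                   // same size as outputBuffer
stagingDesc.Usage = D3D11_USAGE_STAGING;
stagingDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
for (UINT i = 0; i < kLatency; ++i)
    mDevice->CreateBuffer(&stagingDesc, nullptr, &stagingRing[i]);
// Update (per frame): copy into one slot, read the slot copied kLatency - 1 frames ago.
static UINT frame = 0;
mDeviceContext->CopyResource(stagingRing[frame % kLatency], outputBuffer);
ID3D11Buffer* readable = stagingRing[(frame + 1) % kLatency];    // oldest copy in the ring
D3D11_MAPPED_SUBRESOURCE mapped;
if (SUCCEEDED(mDeviceContext->Map(readable, 0, D3D11_MAP_READ, D3D11_MAP_FLAG_DO_NOT_WAIT, &mapped)))
{
    float value = *(float*)mapped.pData;                 // value is kLatency - 1 frames old
    OutputDebugString(String::ToStr(value) + L"\n");
    mDeviceContext->Unmap(readable, 0);
}
++frame;
The trade-off is that the value read back is a couple of frames old, which is usually acceptable for this kind of GPU-to-CPU readback.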

Related

Making a Cubemap in DX11 from 6 Textures

I want to load in a skysphere. I know how to do it using a DDS file, but I want to try to do it from 6 separate texture files.
The problem I have is that when I load the TextureCube, only 3 of the separate textures are visible; the other 3 are not. I'll show you my code and how it looks in Nsight. Right now I am using just 6 uniformly colored PNG files, all 512x512 in size.
std::vector<std::string> paths = { "../Resources/Textures/posX.png", "../Resources/Textures/negX.png",
"../Resources/Textures/posY.png",
"../Resources/Textures/negY.png",
"../Resources/Textures/posZ.png", "../Resources/Textures/negZ.png" };
ID3D11Texture2D* cubeTexture = NULL;
WRL::ComPtr<ID3D11ShaderResourceView> shaderResourceView = NULL;
//Description of each face
D3D11_TEXTURE2D_DESC texDesc = {};
texDesc.Width = 512;
texDesc.Height = 512;
texDesc.MipLevels = 1;
texDesc.ArraySize = 6;
texDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
texDesc.CPUAccessFlags = 0;
texDesc.SampleDesc.Count = 1;
texDesc.SampleDesc.Quality = 0;
texDesc.Usage = D3D11_USAGE_DEFAULT;
texDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
texDesc.CPUAccessFlags = 0;
texDesc.MiscFlags = D3D11_RESOURCE_MISC_TEXTURECUBE;
//The Shader Resource view description
D3D11_SHADER_RESOURCE_VIEW_DESC SMViewDesc = {};
SMViewDesc.Format = texDesc.Format;
SMViewDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURECUBE;
SMViewDesc.TextureCube.MipLevels = texDesc.MipLevels;
SMViewDesc.TextureCube.MostDetailedMip = 0;
D3D11_SUBRESOURCE_DATA pData[6] = {};
for (int i = 0; i < 6; i++)
{
ID3D11Resource* res = nullptr;
std::wstring pathWString(paths[i].begin(), paths[i].end());
HRESULT hr = DirectX::CreateWICTextureFromFileEx(Renderer::getDevice(), pathWString.c_str(), 0, D3D11_USAGE_STAGING, 0, D3D11_CPU_ACCESS_READ, 0,
WIC_LOADER_FLAGS::WIC_LOADER_DEFAULT,
&res, 0);
assert(SUCCEEDED(hr));
D3D11_MAPPED_SUBRESOURCE destRes = {};
Renderer::getContext()->Map(res, 0, D3D11_MAP_READ, 0, &destRes);
pData[i].pSysMem = destRes.pData;
pData[i].SysMemPitch = destRes.RowPitch;
pData[i].SysMemSlicePitch = destRes.DepthPitch;
Renderer::getContext()->Unmap(res, 0);
RELEASE_COM(res);
}
Renderer::getDevice()->CreateTexture2D(&texDesc, &pData[0], &cubeTexture);
Renderer::getDevice()->CreateShaderResourceView(cubeTexture, &SMViewDesc, shaderResourceView.GetAddressOf());
When graphics debugging, this is what the cubemap looks like: 3 of the textures are loaded in twice, overwriting the other 3.
Reading the documentation, it says the subresource index relates to mip levels.
If I loop 9 times instead of 6, the other 3 images are shown instead of the current 3. All 6 should have unique colors.
What I'm doing in the code is creating a Texture2D description and a shader resource view description, then I try to fetch data from the imported images using WIC; I put each one in a resource and then map it into a subresource struct.
Looking at the addresses of the subresources, all 6 are always unique, so it seems the textures load correctly. I have tried moving the row pitch around and changing the size of the images, but that only seems to affect the individual images inside the TextureCube; it doesn't move the duplicates around, if you understand what I mean.
Any help is greatly appreciated.
So I found some code from a Frank D. Luna example that was doing something else.
Here is the code I use that works; the mip levels had to be taken into account.
I hope this helps if someone has a similar issue in the future.
ID3D11Texture2D* cubeTexture = NULL;
WRL::ComPtr<ID3D11ShaderResourceView> shaderResourceView = NULL;
//Description of each face
D3D11_TEXTURE2D_DESC texDesc = {};
D3D11_TEXTURE2D_DESC texDesc1 = {};
//The Shader Resource view description
D3D11_SHADER_RESOURCE_VIEW_DESC SMViewDesc = {};
ID3D11Texture2D* tex[6] = { nullptr, nullptr, nullptr,nullptr, nullptr, nullptr };
for (int i = 0; i < 6; i++)
{
std::wstring pathWString(paths[i].begin(), paths[i].end());
HRESULT hr = DirectX::CreateWICTextureFromFileEx(Renderer::getDevice(), pathWString.c_str(), 0, D3D11_USAGE_STAGING, 0, D3D11_CPU_ACCESS_READ | D3D11_CPU_ACCESS_WRITE, 0,
WIC_LOADER_FLAGS::WIC_LOADER_DEFAULT,
(ID3D11Resource**)&tex[i], 0);
assert(SUCCEEDED(hr));
}
tex[0]->GetDesc(&texDesc1);
texDesc.Width = texDesc1.Width;
texDesc.Height = texDesc1.Height;
texDesc.MipLevels = texDesc1.MipLevels;
texDesc.ArraySize = 6;
texDesc.Format = texDesc1.Format;
texDesc.CPUAccessFlags = 0;
texDesc.SampleDesc.Count = 1;
texDesc.SampleDesc.Quality = 0;
texDesc.Usage = D3D11_USAGE_DEFAULT;
texDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
texDesc.CPUAccessFlags = 0;
texDesc.MiscFlags = D3D11_RESOURCE_MISC_TEXTURECUBE;
SMViewDesc.Format = texDesc.Format;
SMViewDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURECUBE;
SMViewDesc.TextureCube.MipLevels = texDesc.MipLevels;
SMViewDesc.TextureCube.MostDetailedMip = 0;
Renderer::getDevice()->CreateTexture2D(&texDesc, NULL, &cubeTexture);
for (int i = 0; i < 6; i++)
{
for (UINT mipLevel = 0; mipLevel < texDesc.MipLevels; ++mipLevel)
{
D3D11_MAPPED_SUBRESOURCE mappedTex2D;
HRESULT hr = (Renderer::getContext()->Map(tex[i], mipLevel, D3D11_MAP_READ, 0, &mappedTex2D));
assert(SUCCEEDED(hr));
Renderer::getContext()->UpdateSubresource(cubeTexture,
D3D11CalcSubresource(mipLevel, i, texDesc.MipLevels),
0, mappedTex2D.pData, mappedTex2D.RowPitch, mappedTex2D.DepthPitch);
Renderer::getContext()->Unmap(tex[i], mipLevel);
}
}
for (int i = 0; i < 6; i++)
{
RELEASE_COM(tex[i]);
}
Renderer::getDevice()->CreateShaderResourceView(cubeTexture, &SMViewDesc, shaderResourceView.GetAddressOf());
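As an aside on the first version: one likely contributor to the duplicated faces is that the D3D11_SUBRESOURCE_DATA entries were filled with pointers returned by Map(), and those pointers become invalid as soon as each staging resource is unmapped and released, so CreateTexture2D ends up reading stale or reused memory. If you wanted to keep the single CreateTexture2D call from the question (with one mip level), a hedged alternative is to copy each mapped face into CPU-side storage first; faceData and initData are my own names, and <vector> is assumed to be included.
// Sketch: keep each face's pixels alive in CPU memory until CreateTexture2D has consumed them.
std::vector<std::vector<uint8_t>> faceData(6);           // owns the pixel data for each face
D3D11_SUBRESOURCE_DATA initData[6] = {};
for (int i = 0; i < 6; i++)
{
    ID3D11Resource* res = nullptr;
    std::wstring pathWString(paths[i].begin(), paths[i].end());
    HRESULT hr = DirectX::CreateWICTextureFromFileEx(Renderer::getDevice(), pathWString.c_str(), 0,
        D3D11_USAGE_STAGING, 0, D3D11_CPU_ACCESS_READ, 0,
        WIC_LOADER_FLAGS::WIC_LOADER_DEFAULT, &res, 0);
    assert(SUCCEEDED(hr));
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    Renderer::getContext()->Map(res, 0, D3D11_MAP_READ, 0, &mapped);
    // Copy the rows out while the mapping is still valid.
    faceData[i].assign((const uint8_t*)mapped.pData,
                       (const uint8_t*)mapped.pData + size_t(mapped.RowPitch) * texDesc.Height);
    initData[i].pSysMem = faceData[i].data();
    initData[i].SysMemPitch = mapped.RowPitch;
    initData[i].SysMemSlicePitch = 0;
    Renderer::getContext()->Unmap(res, 0);               // safe: initData points at faceData, not at the mapping
    RELEASE_COM(res);
}
Renderer::getDevice()->CreateTexture2D(&texDesc, initData, &cubeTexture);
Renderer::getDevice()->CreateShaderResourceView(cubeTexture, &SMViewDesc, shaderResourceView.GetAddressOf());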

RGB32 data resource mapping using DirectX memcpy

I have been trying to solve this problem for a month of googling, but now I have to ask for help here.
I want to render a frame decoded by FFmpeg.
Using the frame (converted to RGB32 format), I try to render it with a Direct3D 11 2D texture.
ZeroMemory(&TextureDesc, sizeof(TextureDesc));
TextureDesc.Height = pFrame->height;
TextureDesc.Width = pFrame->width;
TextureDesc.MipLevels = 1;
TextureDesc.ArraySize = 1;
TextureDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT; //size 16
TextureDesc.SampleDesc.Count = 1;
TextureDesc.SampleDesc.Quality = 0;
TextureDesc.Usage = D3D11_USAGE_DYNAMIC;
TextureDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
TextureDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
TextureDesc.MiscFlags = 0;
result = m_device->CreateTexture2D(&TextureDesc, NULL, &m_2DTex);
if (FAILED(result)) return false;
ShaderResourceViewDesc.Format = TextureDesc.Format;
ShaderResourceViewDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
ShaderResourceViewDesc.Texture2D.MostDetailedMip = 0;
ShaderResourceViewDesc.Texture2D.MipLevels = 1;
D3D11_MAPPED_SUBRESOURCE S_mappedResource_tt = { 0, };
ZeroMemory(&S_mappedResource_tt, sizeof(D3D11_MAPPED_SUBRESOURCE));
result = m_deviceContext->Map(m_2DTex, 0, D3D11_MAP_WRITE_DISCARD, 0, &S_mappedResource_tt);
if (FAILED(result)) return false;
BYTE* mappedData = reinterpret_cast<BYTE *>(S_mappedResource_tt.pData);
for (auto i = 0; i < pFrame->height; ++i) {
memcpy(mappedData, pFrame->data, pFrame->linesize[0]);
mappedData += S_mappedResource_tt.RowPitch;
pFrame->data[0] += pFrame->linesize[0];
}
m_deviceContext->Unmap(m_2DTex, 0);
result = m_device->CreateShaderResourceView(m_2DTex, &ShaderResourceViewDesc, &m_ShaderResourceView);
if (FAILED(result)) return false;
m_deviceContext->PSSetShaderResources(0, 1, &m_ShaderResourceView);
But it just shows me a black screen (nothing renders).
I guess the memcpy size is wrong.
The biggest problem is that I don't know what the problem is.
Question 1:
Is there any problem with how I create the 2D texture for mapping?
Question 2:
What size should I pass to memcpy (in relation to the format)?
I based my code on the links below.
[1] https://www.gamedev.net/forums/topic/667097-copy-2d-array-into-texture2d/
[2] https://www.gamedev.net/forums/topic/645514-directx-11-maping-id3d11texture2d/
[3] https://www.gamedev.net/forums/topic/606100-solved-dx11-updating-texture-data/
Thank you for reading; please reply.
Nobody replied, so I solved the issue myself.
I modified some of the code, though I'm not sure that is what fixed it; the reason for the black screen was my matrix.
D3D11_TEXTURE2D_DESC TextureDesc;
D3D11_RENDER_TARGET_VIEW_DESC RenderTargetViewDesc;
D3D11_SHADER_RESOURCE_VIEW_DESC ShaderResourceViewDesc;
ZeroMemory(&TextureDesc, sizeof(TextureDesc));
TextureDesc.Height = pFrame->height;
TextureDesc.Width = pFrame->width;
TextureDesc.MipLevels = 1;
TextureDesc.ArraySize = 1;
TextureDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;/*DXGI_FORMAT_R8G8B8A8_UNORM_SRGB;*/ //size 32bit
TextureDesc.SampleDesc.Count = 1;
TextureDesc.SampleDesc.Quality = 0;
TextureDesc.Usage = D3D11_USAGE_DYNAMIC;
TextureDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
TextureDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
TextureDesc.MiscFlags = 0;
DWORD* pInitImage = new DWORD[pFrame->width*pFrame->height];
memset(pInitImage, 0, sizeof(DWORD)*pFrame->width*pFrame->height);
D3D11_SUBRESOURCE_DATA InitData;
InitData.pSysMem = pInitImage;
InitData.SysMemPitch = pFrame->width*sizeof(DWORD);
InitData.SysMemSlicePitch = 0;
result = m_device->CreateTexture2D(&TextureDesc, &InitData, &m_2DTex);
if (FAILED(result)) return false;
ShaderResourceViewDesc.Format = TextureDesc.Format;
ShaderResourceViewDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
ShaderResourceViewDesc.Texture2D.MostDetailedMip = 0;
ShaderResourceViewDesc.Texture2D.MipLevels = 1;
result = m_device->CreateShaderResourceView(m_2DTex, &ShaderResourceViewDesc, &m_ShaderResourceView);
if (FAILED(result)) return false;
D3D11_MAPPED_SUBRESOURCE S_mappedResource_tt;
ZeroMemory(&S_mappedResource_tt, sizeof(S_mappedResource_tt));
DWORD Stride = pFrame->linesize[0];
result = m_deviceContext->Map(m_2DTex, 0, D3D11_MAP_WRITE_DISCARD, 0, &S_mappedResource_tt);
if (FAILED(result)) return false;
BYTE * pFrameData = pFrame->data[0]; // now we have a pointer that points to begin of the destination buffer
BYTE* mappedData = (BYTE *)S_mappedResource_tt.pData;// +S_mappedResource_tt.RowPitch;
for (auto i = 0; i < pFrame->height; i++) {
memcpy(mappedData, pFrameData, Stride);
mappedData += S_mappedResource_tt.RowPitch;
pFrameData += Stride;
}
m_deviceContext->Unmap(m_2DTex, 0);
It works well. I hope it will be helpful to anyone doing the same thing as me.
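To answer question 2 explicitly: with a 4-byte-per-pixel format such as DXGI_FORMAT_R8G8B8A8_UNORM, each row holds width * 4 bytes of pixels, FFmpeg spaces rows linesize[0] bytes apart, and the mapped texture spaces them RowPitch bytes apart, which is why the copy has to go row by row. A small helper along the lines of the working code above (the function name and signature are my own, and the FFmpeg headers are assumed to be included):
// Sketch: row-by-row upload of a 4-byte-per-pixel AVFrame into a dynamic texture.
bool UploadFrame(ID3D11DeviceContext* ctx, ID3D11Texture2D* tex, const AVFrame* frame)
{
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (FAILED(ctx->Map(tex, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
        return false;
    const BYTE* src = frame->data[0];
    BYTE* dst = (BYTE*)mapped.pData;
    const size_t rowBytes = size_t(frame->width) * 4;    // 4 bytes per RGBA pixel
    for (int y = 0; y < frame->height; ++y)
    {
        memcpy(dst, src, rowBytes);                      // copy only the pixel bytes, not the padding
        src += frame->linesize[0];                       // FFmpeg rows are linesize[0] bytes apart
        dst += mapped.RowPitch;                          // D3D rows are RowPitch bytes apart
    }
    ctx->Unmap(tex, 0);
    return true;
}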

Image saved from Fingerprint sensor seems corrupted

I have been trying to put together code to save an image from a fingerprint sensor. I have already searched the forums, and this is my current code, which saves a file with the correct size, but when I open the image it is not an image of a fingerprint; rather, it looks corrupted. Here is what it looks like.
My code is given below. Any help will be appreciated. I am new to Windows development.
bool SaveBMP(BYTE* Buffer, int width, int height, long paddedsize, LPCTSTR bmpfile)
{
BITMAPFILEHEADER bmfh;
BITMAPINFOHEADER info;
memset(&bmfh, 0, sizeof(BITMAPFILEHEADER));
memset(&info, 0, sizeof(BITMAPINFOHEADER));
//Next we fill the file header with data:
bmfh.bfType = 0x4d42; // 0x4d42 = 'BM'
bmfh.bfReserved1 = 0;
bmfh.bfReserved2 = 0;
bmfh.bfSize = sizeof(BITMAPFILEHEADER) +
sizeof(BITMAPINFOHEADER) + paddedsize;
bmfh.bfOffBits = 0x36;
//and the info header:
info.biSize = sizeof(BITMAPINFOHEADER);
info.biWidth = width;
info.biHeight = height;
info.biPlanes = 1;
info.biBitCount = 8;
info.biCompression = BI_RGB;
info.biSizeImage = 0;
info.biXPelsPerMeter = 0x0ec4;
info.biYPelsPerMeter = 0x0ec4;
info.biClrUsed = 0;
info.biClrImportant = 0;
HANDLE file = CreateFile(bmpfile, GENERIC_WRITE, FILE_SHARE_READ,
NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
//Now we write the file header and info header:
unsigned long bwritten;
if (WriteFile(file, &bmfh, sizeof(BITMAPFILEHEADER),
&bwritten, NULL) == false)
{
CloseHandle(file);
return false;
}
if (WriteFile(file, &info, sizeof(BITMAPINFOHEADER),
&bwritten, NULL) == false)
{
CloseHandle(file);
return false;
}
//and finally the image data:
if (WriteFile(file, Buffer, paddedsize, &bwritten, NULL) == false)
{
CloseHandle(file);
return false;
}
//Now we can close our function with
CloseHandle(file);
return true;
}
HRESULT CaptureSample()
{
HRESULT hr = S_OK;
WINBIO_SESSION_HANDLE sessionHandle = NULL;
WINBIO_UNIT_ID unitId = 0;
WINBIO_REJECT_DETAIL rejectDetail = 0;
PWINBIO_BIR sample = NULL;
SIZE_T sampleSize = 0;
// Connect to the system pool.
hr = WinBioOpenSession(
WINBIO_TYPE_FINGERPRINT, // Service provider
WINBIO_POOL_SYSTEM, // Pool type
WINBIO_FLAG_RAW, // Access: Capture raw data
NULL, // Array of biometric unit IDs
0, // Count of biometric unit IDs
WINBIO_DB_DEFAULT, // Default database
&sessionHandle // [out] Session handle
);
// Capture a biometric sample.
wprintf_s(L"\n Calling WinBioCaptureSample - Swipe sensor...\n");
hr = WinBioCaptureSample(
sessionHandle,
WINBIO_NO_PURPOSE_AVAILABLE,
WINBIO_DATA_FLAG_RAW,
&unitId,
&sample,
&sampleSize,
&rejectDetail
);
wprintf_s(L"\n Swipe processed - Unit ID: %d\n", unitId);
wprintf_s(L"\n Captured %d bytes.\n", sampleSize);
PWINBIO_BIR_HEADER BirHeader = (PWINBIO_BIR_HEADER)(((PBYTE)sample) + sample->HeaderBlock.Offset);
PWINBIO_BDB_ANSI_381_HEADER AnsiBdbHeader = (PWINBIO_BDB_ANSI_381_HEADER)(((PBYTE)sample) + sample->StandardDataBlock.Offset);
PWINBIO_BDB_ANSI_381_RECORD AnsiBdbRecord = (PWINBIO_BDB_ANSI_381_RECORD)(((PBYTE)AnsiBdbHeader) + sizeof(WINBIO_BDB_ANSI_381_HEADER));
PBYTE firstPixel = (PBYTE)((PBYTE)AnsiBdbRecord) + sizeof(WINBIO_BDB_ANSI_381_RECORD);
SaveBMP(firstPixel, AnsiBdbRecord->HorizontalLineLength, AnsiBdbRecord->VerticalLineLength, AnsiBdbRecord->BlockLength, "D://test.bmp");
wprintf_s(L"\n Press any key to exit.");
_getch();
}
IInspectable is correct, the corruption looks like it's coming from your implicit use of color tables:
info.biBitCount = 8;
info.biCompression = BI_RGB;
If your data is actually just 24-bit RGB, you can do info.biBitCount = 24; to render a valid bitmap. If it's lower (or higher) than that, then you'll need to do some conversion work. You can check AnsiBdbHeader->PixelDepth to confirm that it's the 8 bits per pixel that you expect.
It also looks like your passing AnsiBdbRecord->BlockLength to SaveBMP isn't quite right. The docs for this field say:
WINBIO_BDB_ANSI_381_RECORD structure
BlockLength
Contains the number of bytes in this structure plus the number of bytes of sample image data.
So you'll want to make sure to subtract sizeof(WINBIO_BDB_ANSI_381_RECORD) before passing it as your bitmap buffer size.
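Applied to the question's code, the call would become roughly:
SaveBMP(firstPixel, AnsiBdbRecord->HorizontalLineLength, AnsiBdbRecord->VerticalLineLength,
        AnsiBdbRecord->BlockLength - sizeof(WINBIO_BDB_ANSI_381_RECORD), "D://test.bmp");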
Side note, make sure you free the memory involved after the capture.
WinBioFree(sample);
WinBioCloseSession(sessionHandle);
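If the data really is 8 bits per pixel (AnsiBdbHeader->PixelDepth will confirm), another option is to keep biBitCount = 8 and write an explicit 256-entry grayscale palette, since an 8-bpp BMP has no meaning without a color table. A minimal sketch of such a writer, based on the question's SaveBMP but not identical to it (each row is assumed to already be padded to a multiple of 4 bytes, as the BMP format requires):
// Sketch: write an 8-bpp grayscale BMP with an explicit palette.
bool SaveGray8BMP(const BYTE* pixels, int width, int height, long paddedsize, LPCTSTR bmpfile)
{
    BITMAPFILEHEADER bmfh = {};
    BITMAPINFOHEADER info = {};
    RGBQUAD palette[256];
    for (int i = 0; i < 256; ++i)                        // linear grayscale ramp
    {
        palette[i].rgbRed = palette[i].rgbGreen = palette[i].rgbBlue = (BYTE)i;
        palette[i].rgbReserved = 0;
    }
    bmfh.bfType = 0x4d42;                                // 'BM'
    bmfh.bfOffBits = sizeof(BITMAPFILEHEADER) + sizeof(BITMAPINFOHEADER) + sizeof(palette);
    bmfh.bfSize = bmfh.bfOffBits + paddedsize;
    info.biSize = sizeof(BITMAPINFOHEADER);
    info.biWidth = width;
    info.biHeight = -height;                             // negative height = top-down rows
    info.biPlanes = 1;
    info.biBitCount = 8;
    info.biCompression = BI_RGB;
    info.biClrUsed = 256;
    HANDLE file = CreateFile(bmpfile, GENERIC_WRITE, FILE_SHARE_READ, NULL,
                             CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return false;
    DWORD written = 0;
    bool ok = WriteFile(file, &bmfh, sizeof(bmfh), &written, NULL)
           && WriteFile(file, &info, sizeof(info), &written, NULL)
           && WriteFile(file, palette, sizeof(palette), &written, NULL)
           && WriteFile(file, pixels, paddedsize, &written, NULL);
    CloseHandle(file);
    return ok;
}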

Direct2D Create SwapChain

I am trying to program a Direct2D desktop app based on a Windows tutorial, but I am having problems creating the IDXGISwapChain1. In the code below, everything gets initialized until CreateSwapChainForHwnd: the pointer m_pDXGISwapChain1 stays NULL. All the pointers except pOutput are ComPtrs.
D2D1_FACTORY_OPTIONS options;
ZeroMemory(&options, sizeof(D2D1_FACTORY_OPTIONS));
HRESULT hr = D2D1CreateFactory(D2D1_FACTORY_TYPE_SINGLE_THREADED,
__uuidof(ID2D1Factory1), &options, &m_pD2DFactory1);
if(SUCCEEDED(hr))
{
UINT creationFlags = D3D11_CREATE_DEVICE_BGRA_SUPPORT;
D3D_FEATURE_LEVEL featureLevels[] = { D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0 };
hr = D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, 0, creationFlags,
featureLevels, ARRAYSIZE(featureLevels), D3D11_SDK_VERSION, &m_pD3DDevice,
&m_featureLevel, &m_pD3DDeviceContext);
}
if(SUCCEEDED(hr))
hr = m_pD3DDevice.As(&m_pDXGIDevice1);
if(SUCCEEDED(hr))
hr = m_pD2DFactory1->CreateDevice(m_pDXGIDevice1.Get(), &m_pD2DDevice);
if(SUCCEEDED(hr))
hr = m_pD2DDevice->CreateDeviceContext(D2D1_DEVICE_CONTEXT_OPTIONS_NONE, &m_pD2DDeviceContext);
if(SUCCEEDED(hr))
hr = m_pDXGIDevice1->GetAdapter(&m_pDXGIAdapter);
if(SUCCEEDED(hr))
hr = m_pDXGIAdapter->GetParent(IID_PPV_ARGS(&m_pDXGIFactory2));
DXGI_SWAP_CHAIN_DESC1 swapChainDesc1 = {0};
swapChainDesc1.Width = 0;
swapChainDesc1.Height = 0;
swapChainDesc1.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
swapChainDesc1.Stereo = false;
swapChainDesc1.SampleDesc.Count = 1;
swapChainDesc1.SampleDesc.Quality = 0;
swapChainDesc1.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
swapChainDesc1.BufferCount = 2;
swapChainDesc1.Scaling = DXGI_SCALING_NONE;
swapChainDesc1.SwapEffect = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;
swapChainDesc1.AlphaMode = DXGI_ALPHA_MODE_IGNORE;
swapChainDesc1.Flags = 0;
IDXGIOutput *pOutput;
m_pDXGIAdapter->EnumOutputs(0, &pOutput);
if(SUCCEEDED(hr))
hr = m_pDXGIFactory2->CreateSwapChainForHwnd(
static_cast<IUnknown*>(m_pD3DDevice.Get()), m_hwnd, &swapChainDesc1,
NULL, pOutput, &m_pDXGISwapChain1);
if(SUCCEEDED(hr))
hr = m_pDXGIDevice1->SetMaximumFrameLatency(1);
if(SUCCEEDED(hr))
hr = m_pDXGISwapChain1->GetBuffer(0, IID_PPV_ARGS(&m_pDXGIBackBuffer));
If all your pointers are ComPtr, then the call should look like this:
ComPtr<ID3D11Device> d3dDevice;
ComPtr<IDXGIFactory2> dxgiFactory;
// assuming d3dDevice and dxgiFactory are initialized correctly:
ComPtr<IDXGISwapChain1> swapChain;
dxgiFactory->CreateSwapChainForHwnd(d3dDevice.Get(), hWnd, &swapChainDescription, nullptr, nullptr, swapChain.GetAddressOf());
As for your swap chain description, if you're making a non-Windows Store App, you should set
swapChainDescription.Scaling = DXGI_SCALING_STRETCH;
swapChainDescription.SwapEffect = DXGI_SWAP_EFFECT_DISCARD;
Both of those values are 0, so you can leave them out.
Here is the complete swap chain description that I use:
DXGI_SWAP_CHAIN_DESC1 swapChainDescription = {};
swapChainDescription.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
swapChainDescription.SampleDesc.Count = 1;
swapChainDescription.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
swapChainDescription.BufferCount = 2;
See this article for full a walkthrough of how to set up Direct2D 1.1 properly - including the CreateSwapChainForHwnd call: http://msdn.microsoft.com/en-us/magazine/dn198239.aspx
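One more debugging aid, not mentioned in that article: in debug builds you can create the D3D11 device with the debug layer enabled, which usually prints the exact reason CreateSwapChainForHwnd rejects a description (for example an unsupported Scaling value on older versions of Windows) to the debugger output window. A minimal change to the question's device-creation code:
UINT creationFlags = D3D11_CREATE_DEVICE_BGRA_SUPPORT;
#if defined(_DEBUG)
creationFlags |= D3D11_CREATE_DEVICE_DEBUG;              // enables D3D11/DXGI debug messages
#endif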
Direct2D doesn't have a swap chain at the user level; the swap chain belongs to Direct3D/DXGI. I see some Direct3D 11 code in your post, so do you really want Direct2D, or Direct3D?

Slow transfer speed on remote computer

Hey there StackOverflow people!
I'm making an IOCP server and I have ironed out most issues so far, but one remains and I do not know where to start looking. When I run the client/server on my machine, everything is fine and dandy: it matches the speed of the Windows SDK sample, maybe a little faster, and definitely uses fewer CPU cycles. However, when I run the client from a separate computer, the transfer speed caps at 37 KB/s and has a round-trip latency of 200 ms (as opposed to 0). If I connect the client to the SDK sample server, I don't have that problem, so there must be something wrong with my code. As far as I know, the sockets are initialized the exact same way with the same options. I have also run my server in a profiler to check for bottlenecks, but I couldn't find any. Also, the computers I have tried it on were connected to the same gigabit switch (with gigabit adapters). I know this is kind of vague, but that's because I couldn't pinpoint the problem so far, and I would be eternally grateful if any of you could point me in the right direction.
Cheers,
-Roxy
EDIT2:
After following Mike's advice, I did some research on the code and found that when a remote client connects to the server, most of the time the code is waiting on GetQueuedCompletionStatus. This suggests that the IO requests are simply taking a long time to complete, but I still don't understand why. It only occurs when the client is on a remote computer. I'm thinking this has something to do with how I set up the sockets or how I'm posting the requests, but I don't see any difference from the sample code.
Any ideas?
EDIT (Added sample code):
Alright, here it is! It ain't pretty though!
If you have the Windows SDK installed, you can connect to it using the iocpclient sample (Program Files\Microsoft SDKs\Windows\v7.1\Samples\netds\winsock\iocp\client) and changing its default port at line 73 to 5000.
A weird thing I've just noticed when trying it myself is that the sample iocpclient doesn't seem to hit the same 37 KB/s cap... However, it looks like the sample code has a limit set to around 800 KB/s. I'll post a client if that can be of any help.
#pragma comment(lib, "Ws2_32.lib")
#include <WinSock2.h>
#include <stdio.h>
unsigned int connection = 0;
unsigned int upload = 0;
unsigned int download = 0;
#define IO_CONTEXT_COUNT 5
class NetClientHost
{
friend class gNetProtocolHost;
public:
enum Operation
{
kOperationUnknown,
kOperationRead,
kOperationWrite,
};
struct ClientData
{
SOCKET socket;
};
struct IOContext
{
WSAOVERLAPPED overlapped;
WSABUF wsaReceiveBuf;
WSABUF wsaSendBuf;
char *buf;
char *TESTbuf;
unsigned long bytesReceived;
unsigned long bytesSent;
unsigned long flags;
unsigned int bytesToSendTotal;
unsigned int remainingBytesToSend;
unsigned int chunk;
Operation operation;
};
NetClientHost()
{
memset((void *) &m_clientData, 0, sizeof(m_clientData));
}
NetClientHost::IOContext *NetClientHost::AcquireContext()
{
while (true)
{
for (int i = 0; i < IO_CONTEXT_COUNT; ++i)
{
if (!(m_ioContexts + i)->inUse)
{
InterlockedIncrement(&(m_ioContexts + i)->inUse);
//ResetEvent(*(m_hContextEvents + i));
if ((m_ioContexts + i)->ioContext.TESTbuf == 0)
Sleep(1);
return &(m_ioContexts + i)->ioContext;
}
}
//++g_blockOnPool;
//WaitForMultipleObjects(IO_CONTEXT_COUNT, m_hContextEvents, FALSE, INFINITE);
}
}
const ClientData *NetClientHost::GetClientData() const
{
return &m_clientData;
};
void NetClientHost::Init(unsigned int bufferSize)
{
_InitializeIOContexts(bufferSize ? bufferSize : 1024);
}
void NetClientHost::ReleaseContext(IOContext *ioContext)
{
int i = sizeof(_IOContextData), j = sizeof(IOContext);
_IOContextData *contextData = (_IOContextData *) (((char *) ioContext) - (i - j));
InterlockedDecrement(&contextData->inUse);
//SetEvent(*(m_hContextEvents + contextData->index));
}
struct _IOContextData
{
unsigned int index;
volatile long inUse;
IOContext ioContext;
};
ClientData m_clientData;
_IOContextData *m_ioContexts;
HANDLE *m_hContextEvents;
void _InitializeIOContexts(unsigned int bufferSize)
{
m_ioContexts = new _IOContextData[IO_CONTEXT_COUNT];
m_hContextEvents = new HANDLE[IO_CONTEXT_COUNT];
memset((void *) m_ioContexts, 0, sizeof(_IOContextData) * IO_CONTEXT_COUNT);
for (int i = 0; i < IO_CONTEXT_COUNT; ++i)
{
(m_ioContexts + i)->index = i;
(m_ioContexts + i)->ioContext.buf = new char[bufferSize];
(m_ioContexts + i)->ioContext.wsaReceiveBuf.len = bufferSize;
(m_ioContexts + i)->ioContext.wsaReceiveBuf.buf = (m_ioContexts + i)->ioContext.buf;
(m_ioContexts + i)->ioContext.TESTbuf = new char[10000];
(m_ioContexts + i)->ioContext.wsaSendBuf.buf = (m_ioContexts + i)->ioContext.TESTbuf;
*(m_hContextEvents + i) = CreateEvent(0, TRUE, FALSE, 0);
}
}
void _SetSocket(SOCKET socket)
{
m_clientData.socket = socket;
}
};
bool WriteChunk(const NetClientHost *clientHost, NetClientHost::IOContext *ioContext)
{
int status;
status = WSASend(clientHost->GetClientData()->socket, &ioContext->wsaSendBuf, 1, &ioContext->bytesSent, ioContext->flags, &ioContext->overlapped, 0);
if (status == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING)
{
// ...
return false;
}
return true;
}
bool Write(NetClientHost *clientHost, void *buffer, unsigned int size, unsigned int chunk)
{
//__ASSERT(m_clientHost);
//__ASSERT(m_clientHost->GetClientData()->remainingBytesToSend == 0);
NetClientHost::IOContext *ioContext = clientHost->AcquireContext();
if (!chunk)
chunk = size;
ioContext->wsaSendBuf.buf = ioContext->TESTbuf;
ioContext->operation = NetClientHost::kOperationWrite;
ioContext->flags = 0;
ioContext->wsaSendBuf.buf = new char[size];
memcpy((void *) ioContext->wsaSendBuf.buf, buffer, chunk);
ioContext->wsaSendBuf.len = chunk;
ioContext->chunk = chunk;
ioContext->bytesToSendTotal = size;
ioContext->remainingBytesToSend = size;
return WriteChunk(clientHost, ioContext);
}
void Read(NetClientHost *clientHost)
{
NetClientHost::IOContext *ioContext = clientHost->AcquireContext();
int status;
memset((void *) ioContext, 0, sizeof(NetClientHost::IOContext));
ioContext->buf = new char[1024];
ioContext->wsaReceiveBuf.len = 1024;
ioContext->wsaReceiveBuf.buf = ioContext->buf;
ioContext->flags = 0;
ioContext->operation = NetClientHost::kOperationRead;
status = WSARecv(clientHost->GetClientData()->socket, &ioContext->wsaReceiveBuf, 1, &ioContext->bytesReceived, &ioContext->flags, &ioContext->overlapped, 0);
int i = WSAGetLastError();
if (status == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING)
{
// ...
}
}
bool AddSocket(HANDLE hIOCP, SOCKET socket)
{
++connection;
int bufSize = 0;
LINGER lingerStruct;
lingerStruct.l_onoff = 1;
lingerStruct.l_linger = 0;
setsockopt(socket, SOL_SOCKET, SO_SNDBUF, (char *) &bufSize, sizeof(int));
setsockopt(socket, SOL_SOCKET, SO_RCVBUF, (char *) &bufSize, sizeof(int));
setsockopt(socket, SOL_SOCKET, SO_LINGER, (char *) &lingerStruct, sizeof(lingerStruct) );
NetClientHost *clientHost = new NetClientHost;
clientHost->_InitializeIOContexts(1024);
clientHost->Init(0);
clientHost->_SetSocket(socket);
// Add this socket to the IO Completion Port
CreateIoCompletionPort((HANDLE) socket, hIOCP, (DWORD_PTR) clientHost, 0);
Read(clientHost);
return true;
}
int read = 0, write = 0;
DWORD WINAPI WorkerThread(LPVOID param)
{
LPOVERLAPPED overlapped;
NetClientHost *clientHost;
HANDLE hIOCP = (HANDLE) param;
DWORD ioSize;
BOOL status;
while (true)
{
status = GetQueuedCompletionStatus(hIOCP, &ioSize, (PULONG_PTR) &clientHost, (LPOVERLAPPED *) &overlapped, INFINITE);
if (!(status || ioSize))
{
--connection;
//_CloseConnection(clientHost);
continue;
}
NetClientHost::IOContext *ioContext = (NetClientHost::IOContext *) overlapped;
switch (ioContext->operation)
{
case NetClientHost::kOperationRead:
download += ioSize;
Write(clientHost, ioContext->wsaReceiveBuf.buf, ioSize, 0);
write++;
clientHost->ReleaseContext(ioContext);
break;
case NetClientHost::kOperationWrite:
upload += ioSize;
if (ioContext->remainingBytesToSend)
{
ioContext->remainingBytesToSend -= ioSize;
ioContext->wsaSendBuf.len = ioContext->chunk <= ioContext->remainingBytesToSend ? ioContext->chunk : ioContext->remainingBytesToSend; // equivalent to min(clientData->chunk, clientData->remainingBytesToSend);
ioContext->wsaSendBuf.buf += ioContext->wsaSendBuf.len;
}
if (ioContext->remainingBytesToSend)
{
WriteChunk(clientHost, ioContext);
}
else
{
clientHost->ReleaseContext(ioContext);
Read(clientHost);
read++;
}
break;
}
}
return 0;
}
DWORD WINAPI ListenThread(LPVOID param)
{
SOCKET sdListen = (SOCKET) param;
HANDLE hIOCP = CreateIoCompletionPort(INVALID_HANDLE_VALUE, 0, 0, 0);
CreateThread(0, 0, WorkerThread, hIOCP, 0, 0);
CreateThread(0, 0, WorkerThread, hIOCP, 0, 0);
CreateThread(0, 0, WorkerThread, hIOCP, 0, 0);
CreateThread(0, 0, WorkerThread, hIOCP, 0, 0);
while (true)
{
SOCKET as = WSAAccept(sdListen, 0, 0, 0, 0);
if (as != INVALID_SOCKET)
AddSocket(hIOCP, as);
}
}
int main()
{
SOCKET sdListen;
SOCKADDR_IN si_addrlocal;
int nRet;
int nZero = 0;
LINGER lingerStruct;
WSADATA wsaData;
WSAStartup(0x202, &wsaData);
sdListen = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_IP, NULL, 0, WSA_FLAG_OVERLAPPED);
si_addrlocal.sin_family = AF_INET;
si_addrlocal.sin_port = htons(5000);
si_addrlocal.sin_addr.s_addr = htonl(INADDR_ANY);
nRet = bind(sdListen, (struct sockaddr *)&si_addrlocal, sizeof(si_addrlocal));
nRet = listen(sdListen, 5);
nZero = 0;
nRet = setsockopt(sdListen, SOL_SOCKET, SO_SNDBUF, (char *) &nZero, sizeof(nZero));
nZero = 0;
nRet = setsockopt(sdListen, SOL_SOCKET, SO_RCVBUF, (char *)&nZero, sizeof(nZero));
lingerStruct.l_onoff = 1;
lingerStruct.l_linger = 0;
nRet = setsockopt(sdListen, SOL_SOCKET, SO_LINGER, (char *)&lingerStruct, sizeof(lingerStruct) );
CreateThread(0, 0, ListenThread, (LPVOID) sdListen, 0, 0);
HANDLE console = GetStdHandle(STD_OUTPUT_HANDLE);
while (true)
{
COORD c = {0};
SetConsoleCursorPosition(console, c);
printf("Connections: %i \nUpload: %iKB/s \nDownload: %iKB/s ", connection, upload * 2 / 1024, download * 2 / 1024);
upload = 0;
download = 0;
Sleep(500);
}
return 0;
}
This kind of asynchronous system should be able to run at full datalink speed. Problems I've found in the past include:
- timeout settings causing needless retransmissions
- in the receiving process, received message A triggering a database update, so that received message B has to wait, causing an unnecessary delay in the response to message B back to the sender, when the DB update could actually have been done in idle time
Wireshark can give you some visibility into the message traffic.
I used to do it the hard way, with time-stamped message logs.
BTW: I would first use this method on the individual processes, to clean out any bottlenecks, before doing the asynchronous analysis. If you haven't done this, you can bet they're in there.
Just any old profiler isn't reliable. There are good ones, including Zoom.
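For reference, the time-stamped message log approach can be as simple as printing a QueryPerformanceCounter timestamp for every completion. Here is a rough sketch against the question's worker loop; the logging lines are mine, not part of the original code:
// Sketch: timestamp each completion to see where the 200 ms round trip is being spent.
LARGE_INTEGER freq, now;
QueryPerformanceFrequency(&freq);
while (true)
{
    status = GetQueuedCompletionStatus(hIOCP, &ioSize, (PULONG_PTR) &clientHost, (LPOVERLAPPED *) &overlapped, INFINITE);
    QueryPerformanceCounter(&now);
    double ms = 1000.0 * now.QuadPart / freq.QuadPart;   // milliseconds since an arbitrary reference point
    printf("%.3f ms: bytes=%lu status=%d\n", ms, ioSize, (int) status);
    // ... handle the completion exactly as in the original WorkerThread ...
}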
