I have an HLSL shader that is giving an unexpected error when I add a particular variable to a constant buffer. The entire shader is below. To rule everything out I included the entire shader.
The error occurs when I add the NewVariable to the third constant buffer.
I haven't found this in the documentation, but can I not perform the following packing, where a vector starts part-way into the c0 register?
float1 var1 : packoffset( c0.x );
float3 var2 : packoffset( c0.yzw );
I get the poorly documented error X3530 (ERR_BIND_INVALID). MSDN says, "Invalid binding operation was performed. For example, buffers can only be bound to one slot or one constant offset; invalid register specification because a particular binding was expected but didn't occur; can't mix packoffset elements with nonpackoffset elements in a cbuffer." This error message really doesn't tell me what is wrong, and the packoffset looks correct to me.
cbuffer SceneGlobals : register( b0 )
{
int NumAmbientLights : packoffset( c0 );
int NumDirectionalLights : packoffset( c0.y );
int NumPointLights : packoffset( c0.z );
int NumSpotLights : packoffset( c0.w );
}
cbuffer FrameGlobals : register( b1 )
{
float Time : packoffset( c0 );
float Timestep : packoffset( c0.y );
int NumCameras : packoffset( c0.z );
float4 CameraPosition[1] : packoffset( c1 );
float4x4 CameraView[1] : packoffset( c2 );
float4x4 CameraProjection[1] : packoffset( c6 );
float4x4 CameraViewProjection[1] : packoffset( c10 );
}
cbuffer ObjectGlobals : register( b2 )
{
int ActiveCamera : packoffset( c0 );
int3 NewVariable : packoffset( c0.yzw ); // ERROR OCCURS HERE
}
struct InputStruct
{
float4 Position : POSITION;
float4 Color : COLOR;
};
struct OutputStruct
{
float4 Position : SV_POSITION0;
float4 Color : COLOR0;
float4 Normal : NORMAL0;
float4 WorldPos : POSITION0;
float4 TexCoords : TEXCOORD0;
};
OutputStruct VS( InputStruct Input )
{
OutputStruct Output = (OutputStruct)0;
Output.Position = mul(CameraViewProjection[ActiveCamera], Input.Position);
Output.Color = Input.Color;
return Output;
}
packoffset is exactly what it sounds like: an offset, not a range. It simply indicates where your variable should start. xyz refers to 3 different locations, but naturally you can't start a variable in 3 locations, which is why you're getting the error. What you really want is for your new variable to start at y, so try packoffset( c0.y ) instead of c0.yzw.
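For completeness, a corrected version of the ObjectGlobals buffer from the question might look like this (a sketch; only the offset syntax changes):

```hlsl
cbuffer ObjectGlobals : register( b2 )
{
    int  ActiveCamera : packoffset( c0 );   // occupies c0.x
    int3 NewVariable  : packoffset( c0.y ); // starts at c0.y and fills .y, .z, .w
}
```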
Related
I am learning GLSL and CG now and came across this code:
float trace( vec3 origin, vec3 direction, out vec3 p ) //<-- What is "out"?
{
float totalDistanceTraveled = 0.0;
for( int i=0; i <64; ++i)
{
p = origin + direction * totalDistanceTraveled;
float distanceFromPointOnRayToClosestObjectInScene = map( p );
totalDistanceTraveled += distanceFromPointOnRayToClosestObjectInScene;
if( distanceFromPointOnRayToClosestObjectInScene < 0.0001 )
{
break;
}
if( totalDistanceTraveled > 10000.0 )
{
totalDistanceTraveled = 0.0000;
break;
}
}
return totalDistanceTraveled;
}
I am converting this code into shaders.metal so that I can use it with Xcode. But I am not sure what out is, or how to change it so that I can use this function in my Metal shader.
The out qualifier signifies that the value will be written to by the function. It's similar to (but not exactly like) pass-by-reference. The closest equivalent in Metal is a reference in the thread address space. An equivalent function declaration in Metal Shading Language looks like:
static float trace(float3 origin, float3 direction, thread float3 &p);
Using DirectX 11, I created a 3D volume texture that can be bound as a render target:
D3D11_TEXTURE3D_DESC texDesc3d;
// ...
texDesc3d.Usage = D3D11_USAGE_DEFAULT;
texDesc3d.BindFlags = D3D11_BIND_RENDER_TARGET;
// Create volume texture and views
m_dxDevice->CreateTexture3D(&texDesc3d, nullptr, &m_tex3d);
m_dxDevice->CreateRenderTargetView(m_tex3d, nullptr, &m_tex3dRTView);
I would now like to update the whole render target, filling it with procedural data generated in a pixel shader, similar to updating a 2D render target with a 'fullscreen pass'. All I need to generate the data is the UVW coordinates of the voxel in question.
For 2D, a simple vertex shader that renders a full screen triangle can be built:
struct VS_OUTPUT
{
float4 position : SV_Position;
float2 uv: TexCoord;
};
// input: three empty vertices
VS_OUTPUT main( uint vertexID : SV_VertexID )
{
VS_OUTPUT result;
result.uv = float2((vertexID << 1) & 2, vertexID & 2);
result.position = float4(result.uv * float2(2.0f, -2.0f) + float2(-1.0f, 1.0f), 0.0f, 1.0f);
return result;
}
I have a hard time wrapping my head around how to adapt this principle to 3D. Is this even possible in DirectX 11, or do I have to render to individual slices of the volume texture as described here?
Here is some sample code doing it with the pipeline version: you basically batch N triangles and route each instance to a volume slice using the geometry shader.
struct VS_OUTPUT
{
float4 position : SV_Position;
float2 uv: TexCoord;
uint index: SLICEINDEX;
};
VS_OUTPUT main( uint vertexID : SV_VertexID, uint ii : SV_InstanceID )
{
VS_OUTPUT result;
result.uv = float2((vertexID << 1) & 2, vertexID & 2);
result.position = float4(result.uv * float2(2.0f, -2.0f) + float2(-1.0f, 1.0f), 0.0f, 1.0f);
result.index= ii;
return result;
}
Now you need to call DrawInstanced with 3 vertices and N instances (i.e. `DrawInstanced(3, N, 0, 0)`), where N is your volume slice count.
Then you pass the triangles through the GS like this:
struct psInput
{
float4 pos : SV_POSITION;
float2 uv: TEXCOORD0;
uint index : SV_RenderTargetArrayIndex; //This will write your vertex to a specific slice, which you can read in pixel shader too
};
[maxvertexcount(3)]
void GS( triangle VS_OUTPUT input[3], inout TriangleStream<psInput> gsout )
{
psInput output;
for (uint i = 0; i < 3; i++)
{
output.pos = input[i].pos;
output.uv = input[i].uv;
output.index= input[0].index; //Use 0 as we need to push a full triangle to the slice
gsout.Append(output);
}
gsout.RestartStrip();
}
Now you have access to slice index in your pixel shader:
float4 PS(psInput input) : SV_Target
{
//Do something with the uvs, and use the slice index as Z
return float4(input.uv, (float)input.index, 1.0f); // placeholder so the shader compiles
}
Compute shader version (don't forget to create a UAV for your volume); the numthreads values here are totally arbitrary:
[numthreads(8,8,8)]
void CS(uint3 tid : SV_DispatchThreadID)
{
//Standard overflow safeguards
//Generate data using tid coordinates
}
Now instead you need to call Dispatch with width/8, height/8, depth/8 thread groups (rounded up when the dimensions are not multiples of 8).
I currently have the problem that a library creates a DX11 texture with a BGRA pixel format, but the displaying library can only display RGBA correctly (meaning the colors are swapped in the rendered image).
After looking around I found a simple for-loop to solve the problem, but the performance is not very good and scales badly with higher resolutions. I'm new to DirectX, and maybe I just missed a simple function to do the conversion.
// Get the image data
unsigned char* pDest = view->image->getPixels();
// Prepare source texture
ID3D11Texture2D* pTexture = static_cast<ID3D11Texture2D*>( tex );
// Get context
ID3D11DeviceContext* pContext = NULL;
dxDevice11->GetImmediateContext(&pContext);
// Copy data, fast operation
pContext->CopySubresourceRegion(texStaging, 0, 0, 0, 0, tex, 0, nullptr);
// Create mapping
D3D11_MAPPED_SUBRESOURCE mapped;
HRESULT hr = pContext->Map( texStaging, 0, D3D11_MAP_READ, 0, &mapped );
if ( FAILED( hr ) )
{
return;
}
// Calculate size
const size_t size = _width * _height * 4;
// Access pixel data
unsigned char* pSrc = static_cast<unsigned char*>( mapped.pData );
// Offsets
int offsetSrc = 0;
int offsetDst = 0;
int rowOffset = mapped.RowPitch - _width * 4; // row padding in bytes
// Loop through it, BGRA to RGBA conversion
for (int row = 0; row < _height; ++row)
{
for (int col = 0; col < _width; ++col)
{
pDest[offsetDst] = pSrc[offsetSrc+2];
pDest[offsetDst+1] = pSrc[offsetSrc+1];
pDest[offsetDst+2] = pSrc[offsetSrc];
pDest[offsetDst+3] = pSrc[offsetSrc+3];
offsetSrc += 4;
offsetDst += 4;
}
// Adjust the source offset to skip the row padding
offsetSrc += rowOffset;
}
// Unmap texture
pContext->Unmap( texStaging, 0 );
Solution:
Texture2D txDiffuse : register(t0);
SamplerState texSampler : register(s0);
struct VSScreenQuadOutput
{
float4 Position : SV_POSITION;
float2 TexCoords0 : TEXCOORD0;
};
float4 PSMain(VSScreenQuadOutput input) : SV_Target
{
return txDiffuse.Sample(texSampler, input.TexCoords0).bgra; // swizzle swaps the channels
}
Obviously iterating over a texture on your CPU is not the most efficient way. If you know that the colors in a texture are always swapped like that and you don't want to modify the texture itself in your C++ code, the most straightforward way is to do it in the pixel shader: when you sample the texture, simply swap the channels there. You won't even notice any performance drop.
I have a Texture2D readily available; I have an apparently-working shader texture sampler and shader texture variable that I can put that Texture2D in.
The only problem is, I don't know how to load a texture into a shader in DirectX 11, and either Google is being unhelpful or it's just my inability to construct good search terms.
What I need: Code that will take a Texture2D and load it into a shader. A link on how to do so, for example.
Anyway, here's my shader code:
cbuffer CameraSet : register(b0)
{
float4x4 ViewProj ;
} ;
cbuffer MeshSet : register(b1)
{
float4x4 World ;
texture2D Texture ;
SamplerState MeshTextureSampler
{
Filter = MIN_MAG_MIP_LINEAR ;
AddressU = WRAP ;
AddressV = WRAP ;
} ;
} ;
struct VShaderOutput
{
float4 WorldPosition : POSITION ;
float4 ScreenPosition : SV_POSITION ;
float2 UV : TEXCOORD;
} ;
VShaderOutput VShader( float4 position : POSITION, float2 uv : TEXCOORD )
{
VShaderOutput r ;
r.WorldPosition = mul( position, World ) ;
r.ScreenPosition = mul( r.WorldPosition, ViewProj ) ;
r.UV.x = abs( uv.x ) ;
r.UV.y = abs( uv.y ) ;
return r ;
}
struct PShaderOutput
{
float4 SV_Target : SV_TARGET ;
float SV_Depth : SV_DEPTH ;
};
PShaderOutput PShader( VShaderOutput input )
{
PShaderOutput r ;
r.SV_Depth = input.ScreenPosition.z;
r.SV_Target = Texture.Sample( MeshTextureSampler, input.UV ) ;
return r ;
}
Thanks.
...If it's
context.PixelShader.SetShaderResource(TextureShaderResourceView, 0);
I think I just answered my own question. But why does an array value of '0' work?
I use 2 shader Resources:
hlsl:
Texture2D<float4> Self : register(t0);
Texture2D<float4> Other : register(t1);
cs:
device.ImmediateContext.ComputeShader.SetShaderResource(resourceViewSelf, 0);
device.ImmediateContext.ComputeShader.SetShaderResource(resourceViewOther, 1);
It works because the numeric argument is the bind slot, which matches the HLSL register: slot 0 binds to register(t0) and slot 1 to register(t1).
I am trying some experiments in fractal rendering with DirectX11 Compute Shaders.
The provided example runs on a FeatureLevel_10 device.
My RWStructured output buffer has a data format of R32G32B32A32_FLOAT.
The problem is that when writing to the buffer, it seems that only the alpha (w) value gets written, nothing else...
Here is the shader code:
struct BufType
{
float4 value;
};
cbuffer ScreenConstants : register(b0)
{
float2 ScreenDimensions;
float2 Padding;
};
RWStructuredBuffer<BufType> BufferOut : register(u0);
[numthreads(1, 1, 1)]
void Main( uint3 DTid : SV_DispatchThreadID )
{
uint index = DTid.y * ScreenDimensions.x + DTid.x;
float minRe = -2.0f;
float maxRe = 1.0f;
float minIm = -1.2;
float maxIm = minIm + ( maxRe - minRe ) * ScreenDimensions.y / ScreenDimensions.x;
float reFactor = (maxRe - minRe ) / (ScreenDimensions.x - 1.0f);
float imFactor = (maxIm - minIm ) / (ScreenDimensions.y - 1.0f);
float cim = maxIm - DTid.y * imFactor;
uint maxIterations = 30;
float cre = minRe + DTid.x * reFactor;
float zre = cre;
float zim = cim;
bool isInside = true;
uint iterationsRun = 0;
for( uint n = 0; n < maxIterations; ++n )
{
float zre2 = zre * zre;
float zim2 = zim * zim;
if ( zre2 + zim2 > 4.0f )
{
isInside = false;
iterationsRun = n;
}
zim = 2 * zre * zim + cim;
zre = zre2 - zim2 + cre;
}
if ( isInside )
{
BufferOut[index].value = float4(1.0f,0.0f,0.0f,1.0f);
}
}
The code actually produces, in a sense, the correct result (the 2D Mandelbrot set), but it seems somehow only the alpha value is touched and nothing else is written, although the pixels inside the set should be colored red (the image is black and white).
Does anybody have a clue what's going on here?
After some fiddling around I found the problem.
I have not found any documentation from MS mentioning this, so it could also be an Nvidia-specific driver issue.
Apparently you are only allowed to write ONCE per compute shader invocation to the same element in a RWStructuredBuffer, and you also HAVE to write ONCE.
I changed the code to accumulate the correct color in a local variable and write it only once at the end of the shader.
Everything works perfectly now in that way.
I'm not sure, but shouldn't the BufferOut declaration be:
RWStructuredBuffer<BufType> BufferOut : register(u0);
instead of :
RWStructuredBuffer BufferOut : register(u0);
If you are only using a float4 write target, why not use just:
RWBuffer<float4> BufferOut : register (u0);
Maybe this could help.
After playing around today again, I ran into the same problem once again.
The following code produced all white output:
[numthreads(1, 1, 1)]
void Main( uint3 dispatchId : SV_DispatchThreadID )
{
float4 color = float4(1.0f,0.0f,0.0f,1.0f);
WriteResult(dispatchId,color);
}
The WriteResult method is a utility method from my hlsl standard library.
Long story short: after I upgraded from driver version 192 to 195 (beta), the problem went away.
Seems like the drivers still have some definite problems with compute shader support, so beware.
From what I've seen, compute shaders are only useful if you need a more general computational model than the traditional pixel shader provides, or if you can load data and then share it between threads in fast shared memory. I'm fairly sure you would get better performance with a pixel shader for the Mandelbrot shader.
On my setup (Win7, Feb 10 DX SDK, GTX 480), my compute shaders have a punishing setup time of over 0.2-0.3 ms (binding an SRV and a UAV and then calling Dispatch()).
If you do a PS implementation, please post your experiences.
I have no direct experience with DX compute shaders but...
Why are you setting alpha = 1.0?
IIRC, that makes the pixel 100% transparent, so your inside pixels are transparent red, and show up as whatever color was drawn behind them.
When alpha = 1.0, the RGB components are never used.