I am trying to create a pseudo random float between 0.0 (inclusive) and 1.0 (inclusive) in GLSL ES in order to process the mutations for a chromosome on the GPU rather than a CPU in a genetic algorithm. How would I go about this?
// The canonical one-liner hash. Note: fract() returns values in [0.0, 1.0),
// so 1.0 itself is never produced.
float random(vec2 st)
{
    return fract(sin(dot(st.xy, vec2(12.9898, 78.233))) * 43758.5453123);
}
If you want to learn more about this function, here's a link from The Book of Shaders: https://thebookofshaders.com/10/
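For intuition, here is a rough CPU-side C++ re-implementation of that one-liner (my own sketch, not from the answer). GPU float precision differs, so exact values won't match the shader, and since fract() returns [0, 1), the value 1.0 itself is never produced even though the question asked for 1.0 inclusive:

#include <cmath>
#include <cstdio>

// CPU sketch of the shader hash above, for intuition only.
float shader_random(float x, float y) {
    float d = x * 12.9898f + y * 78.233f;    // dot(st.xy, vec2(12.9898, 78.233))
    float s = std::sin(d) * 43758.5453123f;
    return s - std::floor(s);                // fract(s): fractional part, in [0, 1)
}

int main() {
    // Vary the seed coordinates per gene to get a stream of values.
    for (int i = 0; i < 5; ++i)
        std::printf("%f\n", shader_random(0.1f * i, 0.37f));
}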
I'm trying to do neighbour processing on the GPU with HLSL, and I'm wondering whether there is a way to load an array of neighbouring samples at once, rather than one sample at a time, so that I can use matrix math instead of for loops.
My current implementation, using the SampleLevel function, is something like:
float3 pixel = inputTex.SampleLevel(sampleState, uv + uvOffset, 0.0, 0.0);
Instead, I'd like to load more than one sample at a time, but I haven't found an API for that. Or if my approach for this is totally wrong, please let me know how else to go about utilizing vectorization and matrix math in HLSL. Thanks for any advice and have a great day!
As far as I know, the most you can get from a single texture access is a float4.
Your options are (assuming you only need the 4 neighbouring values):
Calling Load 4 times (Load takes integer texel coordinates, not UVs).
Calling Sample 4 times with a point sampler (requires offsetting the UV by 0.5/textureSize, so your samples would be (u - offset, v - offset), (u + offset, v - offset), (u - offset, v + offset), (u + offset, v + offset)); see the sketch after this list.
If your texture is single channel (or you only need the red channel), you can use Gather: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-to-gather
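For option 2, a rough sketch of the offset arithmetic (written in C++ for illustration; the same expressions go into the HLSL, and all names here are mine):

struct Float2 { float x, y; };

// Half-texel offsets put the four point-sampled taps on four distinct texels
// around the shared corner at uv.
void neighbour_uvs(Float2 uv, float texWidth, float texHeight, Float2 out[4]) {
    float du = 0.5f / texWidth;    // half a texel in u
    float dv = 0.5f / texHeight;   // half a texel in v
    out[0] = { uv.x - du, uv.y - dv };
    out[1] = { uv.x + du, uv.y - dv };
    out[2] = { uv.x - du, uv.y + dv };
    out[3] = { uv.x + du, uv.y + dv };
}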
I'm writing Cg shaders for advanced lighting calculations in a Unity-based game. Sometimes I need to sum all of a vector's components. There are two ways to do it:
Just write something like:
float sum = v.x + v.y + v.z;
Or do something like:
float sum = dot(v,float3(1,1,1));
I am really curious which is faster, and which is better code style.
It's obvious that for the same question on the CPU, the first, simple way is much better, because:
a) there is no need to allocate another float3(1,1,1) vector;
b) there is no need to multiply every component of the original vector v by 1.
But since this is shader code running on a GPU, I believe there is some good hardware optimization for the dot product, and maybe the float3(1,1,1) will be translated into no allocation at all.
float4 _someVector;

void surf (Input IN, inout SurfaceOutputStandard o)
{
    float sum = _someVector.x + _someVector.y + _someVector.z + _someVector.w;
    // vs.
    float sum2 = dot(_someVector, float4(1, 1, 1, 1));
}
Check this link.
A Vec3 Dot has a cost of 3 cycles, while a Scalar Add has a cost of 1.
Thus, on almost all platforms (AMD and NVIDIA):
float sum = v.x + v.y + v.z; has a cost of 2 (two scalar adds)
float sum = dot(v,float3(1,1,1)); has a cost of 3
So the first implementation should be faster.
Implementation of the Dot product in cg: https://developer.download.nvidia.com/cg/dot.html
IMHO the difference is immeasurable in 98% of cases, but the first one should be faster, because multiplication is a "more expensive" operation.
I'm just starting to get the hang of Perlin noise in general, but many sites I've read about terrain generation refer to a falloff value.
It seems quite typical in 3D (cube-based terrain) to use the result of a 3D Perlin noise function as a density test: if it's greater than 0 it's land, and if it's less than or equal to 0 it's air. Then you simply offset the function's result by the current y value before the density test to get smooth, semi-flat terrain.
What I don't understand is what is meant by a falloff value.
Can someone please explain what a falloff value in this sense is referring to, perhaps even using a code example?
The falloff determines the weights of the octaves. You can either use explicit weights, which gives you more freedom to customize the result, or implicit weights derived from a falloff value, which makes the weights an exponential (geometric) sequence.
E.g. if you have a falloff value of 0.5, the octaves' weights are as follows (unnormalized):
Octave 1: 1 = falloff ^ 0
Octave 2: 1 * 0.5 = 0.5 = falloff ^ 1
Octave 3: 0.5 * 0.5 = 0.25 = falloff ^ 2
Octave 4: 0.25 * 0.5 = 0.125 = falloff ^ 3
The overall result is calculated with
Sum over i of ( (value of octave i) * (weight of octave i) )
Typically a normalization is needed so that the weights sum to 1.
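Putting this together, a minimal C++ sketch (noise2 is a trivial stand-in for a real 2-D Perlin function, and the frequency doubling per octave is the usual fBm convention, not something the falloff itself dictates):

#include <cmath>

float noise2(float x, float y) {                  // placeholder, NOT real Perlin noise
    return std::sin(x * 12.9898f + y * 78.233f);
}

float fbm(float x, float y, int octaves, float falloff) {
    float sum = 0.0f;
    float weight = 1.0f;          // octave 1 has weight falloff^0 = 1
    float totalWeight = 0.0f;
    float frequency = 1.0f;
    for (int i = 0; i < octaves; ++i) {
        sum += noise2(x * frequency, y * frequency) * weight;
        totalWeight += weight;
        weight *= falloff;        // octave i+2 gets weight falloff^(i+1)
        frequency *= 2.0f;        // each octave typically doubles the frequency
    }
    return sum / totalWeight;     // normalize so the effective weights sum to 1
}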
I'm attempting to skin vertices using DirectCompute. The skinning method is such that a variable number of weights can influence each vertex (e.g. MD5 meshes are defined this way).
Basically, the inputs to the compute shader are:
JointsBuffer { float4 orientation, float4 position } Structured buffer SRV
WeightsBuffer { float3 normal, float4 position, float bias, uint jointIndex } Structured buffer SRV
VerticesBuffer { float2 texcoords, uint weightIndex, uint numWeights } Structured buffer SRV
and the output is
SkinnedVerticesBuffer { float3 normal, float4 position, float2 texcoord } Structured buffer UAV
Now the compute shader should run once per element in the vertex buffer; using SV_DispatchThreadID, the shader populates the corresponding SkinnedVertex in the SkinnedVerticesBuffer for every Vertex in the VerticesBuffer (1:1 correspondence).
So the problem is that many meshes have more than 65535 vertices, and Dispatch only allows that many threads per dimension. Now I could theoretically write something that splits a large count into a combination of three factors less than 65535, but I can't possibly do that for prime numbers.
So, for example, when some mesh with 71993 (a prime number) vertices comes up, I can't think of a way to handle it.
I can't over-dispatch, say, 72000 threads with context->Dispatch( 36000, 2, 1 ), because then SV_DispatchThreadID would run outside my buffer bounds.
Right now I'm leaning towards a constant buffer holding the vertex count, then over-dispatching to the nearest power of 2 and simply doing
if( SV_DispatchThreadID >= numVertices ) return;
Is this my only option? Has anyone else run into this snag?
I never have, but 65000 threads seems like an awful lot.
When I try to find documentation, it seems that the values you pass are not threads but thread groups. Someone on gamedev seems to have performance issues when passing a number as large as 768, so it looks like you will have to decrease that huge number.
I'm not sure, but I get the feeling you're misinterpreting these parameters. Try reading again what these values actually mean. (Just a layman's gut feeling, though.)
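For what it's worth, the bounds-check idea from the question combined with the groups-vs-threads point above is how this usually gets resolved. A hedged host-side sketch (D3D11; it assumes the shader declares [numthreads(64, 1, 1)], reads numVertices from a constant buffer, and guards with if (id.x >= numVertices) return; all names here are illustrative):

#include <d3d11.h>

const UINT THREADS_PER_GROUP = 64;   // must match [numthreads(64, 1, 1)] in the shader

void DispatchSkinning(ID3D11DeviceContext* context, UINT numVertices) {
    // Round the group count up: 71993 vertices -> 1125 groups = 72000 threads.
    // The shader-side guard makes the 7 extra threads return immediately.
    UINT groupCount = (numVertices + THREADS_PER_GROUP - 1) / THREADS_PER_GROUP;
    context->Dispatch(groupCount, 1, 1);
}

Since Dispatch() counts groups, not threads, this also keeps each dimension far below the 65535 limit for any realistic vertex count.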
I'm working on a data mining algorithm where I want to pick a random direction from a particular point in the feature space.
If I pick a random number for each of the n dimensions from [-1, 1] and then normalize the vector to a length of 1, will I get an even distribution across all possible directions?
I'm speaking only theoretically here since computer generated random numbers are not actually random.
One simple trick is to select each dimension from a Gaussian distribution, then normalize:
from random import gauss

def make_rand_vector(dims):
    vec = [gauss(0, 1) for i in range(dims)]
    mag = sum(x**2 for x in vec) ** .5
    return [x/mag for x in vec]
For example, if you want a 7-dimensional random vector, select 7 random values (from a Gaussian distribution with mean 0 and standard deviation 1). Then, compute the magnitude of the resulting vector using the Pythagorean formula (square each value, add the squares, and take the square root of the result). Finally, divide each value by the magnitude to obtain a normalized random vector.
If your number of dimensions is large, this has the strong benefit of always working immediately. By contrast, generating random vectors until you find one whose magnitude is less than one will simply hang your computer at more than a dozen dimensions or so, because the probability of any sample qualifying becomes vanishingly small.
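For a sense of scale: the acceptance probability of the rejection approach is the volume of the unit n-ball divided by the volume of the [-1, 1]^n cube, i.e. pi^(n/2) / (2^n * Gamma(n/2 + 1)). At n = 20 that is already about 2.5e-8, roughly one accepted vector per 40 million draws.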
You will not get a uniformly distributed ensemble of angles with the algorithm you described. The angles will be biased toward the corners of your n-dimensional hypercube.
This can be fixed by eliminating any points with distance greater than 1 from the origin. Then you're dealing with a spherical rather than a cubical (n-dimensional) volume, and your set of angles should then be uniformly distributed over the sample space.
For example, in C++ (n = number of dimensions, K = desired number of vectors):

#include <cmath>
#include <random>
#include <vector>

std::vector<std::vector<double>> random_directions(int n, int K) {
    std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> uni(-1.0, 1.0);
    std::vector<std::vector<double>> out;
    while ((int)out.size() < K) {
        std::vector<double> a(n);                       // n uniform values over [-1, 1]
        double r_squared = 0.0;
        for (double& x : a) { x = uni(rng); r_squared += x * x; }
        if (r_squared > 0.0 && r_squared <= 1.0) {      // inside the unit ball: accept
            for (double& x : a) x /= std::sqrt(r_squared);  // normalize to length 1
            out.push_back(a);
        }                                               // otherwise: reject this sample
    }
    return out;
}
There is a Boost implementation of the algorithm that samples from normal distributions: random::uniform_on_sphere
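If I remember the Boost.Random interface correctly, usage looks roughly like this (a sketch; check it against your Boost version):

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_on_sphere.hpp>
#include <vector>

int main() {
    boost::random::mt19937 rng;                          // any uniform random engine works
    boost::random::uniform_on_sphere<double> sphere(7);  // dimension of the vectors
    std::vector<double> v = sphere(rng);                 // one unit vector, uniform in direction
}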
I had the exact same question when developing an ML algorithm.
I got to the same conclusion as Jim Lewis after drawing samples for the 2-d case and plotting the resulting distribution of the angle.
Furthermore, if you derive the probability density of the direction in 2D when you draw x and y at random from [-1, 1], you will see that:
f_X(x) = 1/(4*cos²(x)) if 0 < x < 45°
and
f_X(x) = 1/(4*sin²(x)) if x > 45°
where x is the angle and f_X is the probability density function.
I have written about this here:
https://aerodatablog.wordpress.com/2018/01/14/random-hyperplanes/
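A quick Monte-Carlo check (my own C++ sketch, not from the blog post): histogram the angles of points drawn uniformly from the square into 16 bins of 22.5°. The bins adjacent to the diagonals come out roughly 40% heavier than the bins adjacent to the axes, matching the densities above.

#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double PI = std::acos(-1.0);
    const int BINS = 16;                       // 22.5-degree bins
    int hist[BINS] = {};
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(-1.0, 1.0);
    for (int i = 0; i < 1000000; ++i) {
        double x = uni(rng), y = uni(rng);
        double angle = std::atan2(y, x) + PI;  // shift to [0, 2*pi]
        int b = (int)(angle / (2.0 * PI) * BINS);
        if (b >= BINS) b = BINS - 1;           // guard the edge case angle == 2*pi
        ++hist[b];
    }
    for (int b = 0; b < BINS; ++b)
        std::printf("bin %2d: %d\n", b, hist[b]);  // heavy/light bins alternate in pairs
}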
// Marsaglia (1972): uniform random point on the unit 3-D sphere.
// unitrand() returns a uniform value in [-1, 1]; samples outside the
// unit disk are rejected so the result has exactly unit length.
double u, v, s;
do {
    u = unitrand();
    v = unitrand();
    s = u * u + v * v;
} while (s >= 1.0);
double w = 2.0 * sqrt(1.0 - s);
double x = w * u;
double y = w * v;
double z = 1.0 - 2.0 * s;   // x*x + y*y + z*z == 1