opencl kernel too many operands - xcode

I made the basic OpenCL example in "OpenCL Programming Guide for Mac" (rev. 2013-08-08). It worked perfect. Then I replaced and added my own code (for optical calcs) but then it crashes, the debugger show "gcl_log_cl_fatal(err, "Executing calculate failed");". After some experimentation I now know where the crash happens, It happens inside the kernel in the following lines of code,
float M = angle[i];
float K = sqrt(1.0f - M*M); // program crashes
Some experimentations with these lines of code gives various results
Example1
float M = angle[i];
float K = sqrt(M*M); // works perfect
Example2
float M = angle[i];
float MM = 1.0f - M*M; // program crash
float K = sqrt(MM);
Example3
float M = angle[i];
float MM = M*M;
float K = sqrt(1.0f - MM); // works perfect
It seems that when there is more than 2 operands per line, in the kernel code, it crashes. Probably not a hardware issue because APPLE's NBody-demo works (I have a ATI 4850; Xcode 5.1; osx 10.9). I using the recommended libraries with no extra libraries and I have checked the Build-Settings.
Making the program with 2-opernd statements works but is ugly, and i don't understand what happening.
Tommy Landberg, Stockholm, Sweden

Related

How to convert dp to px in xamarin.android?

I want to convert dp to px in my C# code in xamarin.android, but all I could find were java codes in android studio that have some problems in xamarin. I tried to use equivalent like using Resources instead of getResources() and I could solve some little problems, but there are some problems yet that I couldn't find any equivalent for them. here are original codes, my codes, and my codes problems in xamarin:
First code
(found from Programatically set height on LayoutParams as density-independent pixels)
java code
int height = (int)TypedValue.applyDimension(TypedValue.COMPLEX_UNIT_DIP, < HEIGHT >, getResources().getDisplayMetrics());
C# code
int height = (int)TypedValue.ApplyDimension(TypedValue.COMPLEX_UNIT_DIP, < HEIGHT >, Resources.DisplayMetrics);
problems:
'TypedValue' does not contain a definition for 'COMPLEX_UNIT_DIP'
Invalid expression term < (The same error for >)
The name 'HEIGHT' does not exist in the current context
Second code
(found from Formula px to dp, dp to px android)
java code
DisplayMetrics displayMetrics = getContext().getResources().getDisplayMetrics();
int px = Math.round(dp * (displayMetrics.xdpi / DisplayMetrics.DENSITY_DEFAULT));
C# code
DisplayMetrics displayMetrics = Application.Context.Resources.DisplayMetrics;
int pixel = Math.Round(dp * (displayMetrics.Xdpi / DisplayMetrics.DensityDefault));
problem
Operator '/' cannot be applied to operands of type 'float' and 'DisplayMetricsDensity'
Now I have actually two questions. Which code is more proper? What's equivalent code for them in xamarin.android?
Thanks in advance.
Solution to "First code":
Xamarin tends to move constants into their own enums. COMPLEX_UNIT_DIP can be found on ComplexUnitType enum. Also you cannot have < HEIGHT > in your code you actually need to pass in dips to get the equivalent pixel value. in the example below I am getting pixels in 100 dips.
var dp = 100;
int pixel = (int)TypedValue.ApplyDimension(ComplexUnitType.Dip, dp, Context.Resources.DisplayMetrics);
Solution to "Second code":
You need to explicitly cast 'DisplayMetrics.DensityDefault' to a float and the entire round to an int:
int pixel = (int)System.Math.Round(dp * (displayMetrics.Xdpi / (float)DisplayMetrics.DensityDefault));
I prefer the first approach as the second code is specifically for calculating along the "x dimension":
Per Android docs and Xamarin.Android docs, the Xdpi property is
"The exact physical pixels per inch of the screen in the X dimension."
These are from the project that I am currently working on..
public static float pxFromDp(Context context, float dp)
{
return dp * context.Resources.DisplayMetrics.Density;
}
public static float dpFromPx(Context context, float px)
{
return px / context.Resources.DisplayMetrics.Density;
}
From android source code textview settextsize
var convertToDp = TypedValue.ApplyDimension( ComplexUnitType.Dip, size, context.Resouces.DisplayMetrics);
dp to px:
DisplayMetrics displayMetrics = Application.Context.Resources.DisplayMetrics;
double pixel = Math.Round((dp * DisplayMetrics.DensityDefault) + 0.5);
Answer taken from here : https://stackoverflow.com/a/8490361/6949388
Sorry, not enough rep to comment.

Performance issues with LDS memory in OpenCL

I have a performance problem when using LDS memory with AMD Radeon HD 6850.
I have two kernels as parts of a N-particle simulation. Each work unit has to calculate force which acts on a corresponding particle based on relative position to other particles. The problematic kernel is:
#define UNROLL_FACTOR 8
//Vernet velocity part kernel
__kernel void kernel_velocity(const float deltaTime,
__global const float4 *pos,
__global float4 *vel,
__global float4 *accel,
__local float4 *pblock,
const float bound)
{
const int gid = get_global_id(0); //global id of work item
const int id = get_local_id(0); //local id of work item within work group
const int s_wg = get_local_size(0); //work group size
const int n_wg = get_num_groups(0); //number of work groups
const float4 myPos = pos[gid];
const float4 myVel = vel[gid];
const float4 dt = (float4)(deltaTime, deltaTime, 0.0f, 0.0f);
float4 acc = (float4)0.0f;
for (int jw = 0; jw < n_wg; ++jw)
{
pblock[id] = pos[jw * s_wg + id]; //cache a particle position; position in array: workgroup no. * size of workgroup + local id
barrier (CLK_LOCAL_MEM_FENCE); //wait for others in the work group
for (int i = 0; i < s_wg; )
{
#pragma unroll UNROLL_FACTOR
for (int j = 0; j < UNROLL_FACTOR; ++j, ++i)
{
float4 r = myPos - pblock[i];
float rSizeSquareInv = native_recip (r.x*r.x + r.y*r.y + 0.0001f);
float rSizeSquareInvDouble = rSizeSquareInv * rSizeSquareInv;
float rSizeSquareInvQuadr = rSizeSquareInvDouble * rSizeSquareInvDouble;
float rSizeSquareInvHept = rSizeSquareInvQuadr * rSizeSquareInvDouble * rSizeSquareInv;
acc += r * (2.0f * rSizeSquareInvHept - rSizeSquareInvQuadr);
}
}
barrier(CLK_LOCAL_MEM_FENCE);
}
acc *= 24.0f / myPos.w;
//update velocity only
float4 newVel = myVel + 0.5f * dt * (accel[gid] + acc);
//write to global memory
vel[gid] = newVel;
accel[gid] = acc;
}
The simulation runs fine in terms of results, but the problem is in the performance when using the local memory for caching the particle positions to relieve the big amount of reading from the global memory. Actually if the line
float4 r = myPos - pblock[i];
is replaced by
float4 r = myPos - pos[jw * s_wg + i];
the kernel runs faster. I don't really get that since reading from global should be much slower than reading from local.
Moreover, when the line
float4 r = myPos - pblock[i];
is removed completely and all following occurences of r are replaced by myPos - pblock[i], the speed is the same as before as if the line was not there at all. This I don't get even more as accessing private memory in r should be the fastest but the compiler somehow "optimizes" this line out.
Global work size is 4608, local worksize is 192. It is run with AMD APP SDK v2.9 and Catalyst drivers 13.12 in Ubuntu 12.04.
Can anyone please help me with this? Is that my fault or is that a problem of the GPU / drivers / ... ? Or is it a feature? :-)
I'm gonna make a wild guess:
When using float4 r = myPos - pos[jw * s_wg + i]; the compiler is smart enough to notice that the barrier put after the initialization of pblock[id] is not necessary anymore and remove it. Very likely all these barriers (in the for loop) impact your performances, so removing them is very noticeable.
Yeah but global access cost a lot too...So I'm guessing that behind the scene cache memories are well utilized. There is also the fact that you use vectors and as a matter of fact the architecture of the AMD Radeon HD 6850 uses VLIW processors...maybe it helps also to make a better use of the cache memories...maybe.
EDIT:
I've just found out a article benchmarking GPU/APU Cache and Memory Latencies. Your GPU is in the list. You might get some more answers (sorry didn't really read it - too tired).
After some more digging it turned out that the code causes some LDS bank conflicts. The reason is that for AMD there are 32 banks with 4 bytes length, but the float4 covers 16 bytes and therefore the half-wavefront accesses different addresses in the same banks. The solution was to make __local float* for x and y coordinates separately and read them also separately with the proper shift of array index as (id + i) % s_wg. Nevertheless, the overall gain in performance is small, most likely due to the overall latencies described in the link provided by #CaptainObvious (well then one has to increase the global work size to hide them).

OpenCV running slow

I have got OpenCV installed on vs2010/win7 however I'm seeing some behaviour that I can't figure out.
I am new to OpenCV so just have a basic program to pull frames from an avi file - it then splits that frame into single channel images and generates histograms for each of those (taken from an internet example). It actually all works fine, it's just extremely slow. It turns out that cvFillConvexPoly is actually taking 10-15 seconds (sometimes longer) to complete - but when it eventually returns it is correct.
This is the code snippet where I call the culprit function and as you can see I also tried cvFillPoly which took the same amount of time to complete.
IplImage* DrawHistogram(CvHistogram *hist, float sX)
{
float histMax = 0;
cvGetMinMaxHistValue(hist, 0, &histMax, 0, 0);
IplImage *imgHist = cvCreateImage(cvSize(256, 64), IPL_DEPTH_8U, 1);
cvZero(imgHist);
float histValue = 0;
float nextValue = 0;
for (int i = 0; i < ((BINS - 1)*sX); i++)
{
histValue = cvQueryHistValue_1D(hist, i);
nextValue = cvQueryHistValue_1D(hist, i + 1);
CvPoint p1 = cvPoint(i * sX, 64);
CvPoint p2 = cvPoint((i + 1) * sX, 64);
CvPoint p3 = cvPoint((i + 1) * sX, 64 - histValue*(64/histMax));
CvPoint p4 = cvPoint(i * sX, 64 - histValue*(64/histMax));
int n = 5;
CvPoint pts[] = {p1, p2, p3, p4};
cvFillConvexPoly(imgHist, pts, n, cvScalar(255));
//cvFillPoly(imgHist, pts, &n, 1,cvScalar(255));
}
return imgHist;
}
Any help is appreciated.
Compiled on Win7 x64 with CMake 2.8.2/VS2010 as 32 bit. Same behaviour when debugging and when running as a standalone.
Also have it running on Ubuntu 10.10 32 bit, compiled with gcc 4.4.5 where there isn't a problem.
Edit
I've tried recompiling with VS2008 and it still does the same thing. I don't understand what would cause it to run so slowly - unless it's the way 64bit windows "emulates" 32 bit which is causing the problem.
I can spot 2 possible bugs in your code, both of which have to do with bounds. Reading/writing outside array boundaries may result in all kinds of unexpected behavior, so it's a wonder your programs don't crash. Maybe GCC and/or the OpenCV library behaves differently on Ubuntu and Windows causing it not to crash on Ubuntu, but you should definitely take a look at the following 2 points.
I assume sX is a scaling factor? Your for loop should run from 0 to (BINS-1) no matter this scaling, since you're using i to index your histogram and there are BINS bins, not BINS*sX. As long as sX == 1 you won't run into trouble but any other value will invalidate your histogram drawing code. You are already using sX in its correct way too in the cvPoint declarations.
According to the docs of the cvFillConvexPoly function, n should be the number of points, which is 4 in your case, not 5.

What are some algorithms that will allow me to simulate planetary physics?

I'm interested in doing a "Solar System" simulator that will allow me to simulate the rotational and gravitational forces of planets and stars.
I'd like to be able to say, simulate our solar system, and simulate it across varying speeds (ie, watch Earth and other planets rotate around the sun across days, years, etc). I'd like to be able to add planets and change planets mass, etc, to see how it would effect the system.
Does anyone have any resources that would point me in the right direction for writing this sort of simulator?
Are there any existing physics engines which are designed for this purpose?
It's everything here and in general, everything that Jean Meeus has written.
You need to know and understand Newton's Law of Universal Gravitation and Kepler's Laws of Planetary Motion. These two are simple and I'm sure you've heard about them, if not studied them in high school. Finally, if you want your simulator to be as accurate as possible, you should familiarize yourself with the n-Body problem.
You should start out simple. Try making a Sun object and an Earth object that revolves around it. That should give you a very solid start and it's fairly easy to expand from there. A planet object would look something like:
Class Planet {
float x;
float y;
float z; // If you want to work in 3D
double velocity;
int mass;
}
Just remember that F = MA and the rest just just boring math :P
This is a great tutorial on N-body problems in general.
http://www.artcompsci.org/#msa
It's written using Ruby but pretty easy to map into other languages etc. It covers some of the common integration approaches; Forward-Euler, Leapfrog and Hermite.
You might want to take a look at Celestia, a free space simulator. I believe that you can use it to create fictitious solar systems and it is open source.
All you need to implement is proper differential equation (Keplers law) and using Runge-Kutta. (at lest this worked for me, but there are probably better methods)
There are loads of such simulators online.
Here is one simple one implemented in 500lines of c code. (montion algorhitm is much less)
http://astro.berkeley.edu/~dperley/programs/ssms.html.
Also check this:
http://en.wikipedia.org/wiki/Kepler_problem
http://en.wikipedia.org/wiki/Two-body_problem
http://en.wikipedia.org/wiki/N-body_problem
In physics this is known as the N-Body Problem. It is famous because you can not solve this by hand for a system with more than three planets. Luckily, you can get approximate solutions with a computer very easily.
A nice paper on writing this code from the ground up can be found here.
However, I feel a word of warning is important here. You may not get the results you expect. If you want to see how:
the mass of a planet affects its orbital speed around the Sun, cool. You will see that.
the different planets interact with each other, you will be bummed.
The problem is this.
Yeah, modern astronomers are concerned with how Saturn's mass changes the Earth's orbit around the Sun. But this is a VERY minor effect. If you are going to plot the path of a planet around the Sun, it will hardly matter that there are other planets in the Solar System. The Sun is so big it will drown out all other gravity. The only exceptions to this are:
If your planets have very elliptical orbits. This will cause the planets to potentially get closer together, so they interact more.
If your planets are almost the exact same distance from the Sun. They will interact more.
If you make your planets so comically large they compete with the Sun for gravity in the outer Solar System.
To be clear, yes, you will be able to calculate some interactions between planets. But no, these interactions will not be significant to the naked eye if you create a realistic Solar System.
Try it though, and find out!
Check out nMod, a n-body modeling toolkit written in C++ and using OpenGL. It has a pretty well populated solar system model that comes with it and it should be easy to modify. Also, he has a pretty good wiki about n-body simulation in general. The same guy who created this is also making a new program called Moody, but it doesn't appear to be as far along.
In addition, if you are going to do n-body simulations with more than just a few objects, you should really look at the fast multipole method (also called the fast multipole algorithm). It can the reduce number of computations from O(N^2) to O(N) to really speed up your simulation. It is also one of the top ten most successful algorithms of the 20th century, according to the author of this article.
Algorithms to simulate planetary physics.
Here is an implementation of the Keppler parts, in my Android app. The main parts are on my web site for you can download the whole source: http://www.barrythomas.co.uk/keppler.html
This is my method for drawing the planet at the 'next' position in the orbit. Think of the steps like stepping round a circle, one degree at a time, on a circle which has the same period as the planet you are trying to track. Outside of this method I use a global double as the step counter - called dTime, which contains a number of degrees of rotation.
The key parameters passed to the method are, dEccentricty, dScalar (a scaling factor so the orbit all fits on the display), dYear (the duration of the orbit in Earth years) and to orient the orbit so that perihelion is at the right place on the dial, so to speak, dLongPeri - the Longitude of Perihelion.
drawPlanet:
public void drawPlanet (double dEccentricity, double dScalar, double dYear, Canvas canvas, Paint paint,
String sName, Bitmap bmp, double dLongPeri)
{
double dE, dr, dv, dSatX, dSatY, dSatXCorrected, dSatYCorrected;
float fX, fY;
int iSunXOffset = getWidth() / 2;
int iSunYOffset = getHeight() / 2;
// get the value of E from the angle travelled in this 'tick'
dE = getE (dTime * (1 / dYear), dEccentricity);
// get r: the length of 'radius' vector
dr = getRfromE (dE, dEccentricity, dScalar);
// calculate v - the true anomaly
dv = 2 * Math.atan (
Math.sqrt((1 + dEccentricity) / (1 - dEccentricity))
*
Math.tan(dE / 2)
);
// get X and Y coords based on the origin
dSatX = dr / Math.sin(Math.PI / 2) * Math.sin(dv);
dSatY = Math.sin((Math.PI / 2) - dv) * (dSatX / Math.sin(dv));
// now correct for Longitude of Perihelion for this planet
dSatXCorrected = dSatX * (float)Math.cos (Math.toRadians(dLongPeri)) -
dSatY * (float)Math.sin(Math.toRadians(dLongPeri));
dSatYCorrected = dSatX * (float)Math.sin (Math.toRadians(dLongPeri)) +
dSatY * (float)Math.cos(Math.toRadians(dLongPeri));
// offset the origin to nearer the centre of the display
fX = (float)dSatXCorrected + (float)iSunXOffset;
fY = (float)dSatYCorrected + (float)iSunYOffset;
if (bDrawOrbits)
{
// draw the path of the orbit travelled
paint.setColor(Color.WHITE);
paint.setStyle(Paint.Style.STROKE);
paint.setAntiAlias(true);
// get the size of the rect which encloses the elliptical orbit
dE = getE (0.0, dEccentricity);
dr = getRfromE (dE, dEccentricity, dScalar);
rectOval.bottom = (float)dr;
dE = getE (180.0, dEccentricity);
dr = getRfromE (dE, dEccentricity, dScalar);
rectOval.top = (float)(0 - dr);
// calculate minor axis from major axis and eccentricity
// http://www.1728.org/ellipse.htm
double dMajor = rectOval.bottom - rectOval.top;
double dMinor = Math.sqrt(1 - (dEccentricity * dEccentricity)) * dMajor;
rectOval.left = 0 - (float)(dMinor / 2);
rectOval.right = (float)(dMinor / 2);
rectOval.left += (float)iSunXOffset;
rectOval.right += (float)iSunXOffset;
rectOval.top += (float)iSunYOffset;
rectOval.bottom += (float)iSunYOffset;
// now correct for Longitude of Perihelion for this orbit's path
canvas.save();
canvas.rotate((float)dLongPeri, (float)iSunXOffset, (float)iSunYOffset);
canvas.drawOval(rectOval, paint);
canvas.restore();
}
int iBitmapHeight = bmp.getHeight();
canvas.drawBitmap(bmp, fX - (iBitmapHeight / 2), fY - (iBitmapHeight / 2), null);
// draw planet label
myPaint.setColor(Color.WHITE);
paint.setTextSize(30);
canvas.drawText(sName, fX+20, fY-20, paint);
}
The method above calls two further methods which provide values of E (the mean anomaly) and r, the length of the vector at the end of which the planet is found.
getE:
public double getE (double dTime, double dEccentricity)
{
// we are passed the degree count in degrees (duh)
// and the eccentricity value
// the method returns E
double dM1, dD, dE0, dE = 0; // return value E = the mean anomaly
double dM; // local value of M in radians
dM = Math.toRadians (dTime);
int iSign = 1;
if (dM > 0) iSign = 1; else iSign = -1;
dM = Math.abs(dM) / (2 * Math.PI); // Meeus, p 206, line 110
dM = (dM - (long)dM) * (2 * Math.PI) * iSign; // line 120
if (dM < 0)
dM = dM + (2 * Math.PI); // line 130
iSign = 1;
if (dM > Math.PI) iSign = -1; // line 150
if (dM > Math.PI) dM = 2 * Math.PI - dM; // line 160
dE0 = Math.PI / 2; // line 170
dD = Math.PI / 4; // line 170
for (int i = 0; i < 33; i++) // line 180
{
dM1 = dE0 - dEccentricity * Math.sin(dE0); // line 190
dE0 = dE0 + dD * Math.signum((float)(dM - dM1));
dD = dD / 2;
}
dE = dE0 * iSign;
return dE;
}
getRfromE:
public double getRfromE (double dE, double dEccentricty, double dScalar)
{
return Math.min(getWidth(), getHeight()) / 2 * dScalar * (1 - (dEccentricty * Math.cos(dE)));
}
It looks like it is very hard and requires strong knowledge of physics but in fact it is very easy, you need to know only 2 formulas and basic understanding of vectors:
Attractional force (or gravitational force) between planet1 and planet2 with mass m1 and m2 and distance between them d: Fg = G*m1*m2/d^2; Fg = m*a. G is a constant, find it by substituting random values so that acceleration "a" will not be too small and not too big approximately "0.01" or "0.1".
If you have total vector force which is acting on a current planet at that instant of time, you can find instant acceleration a=(total Force)/(mass of current planet). And if you have current acceleration and current velocity and current position, you can find new velocity and new position
If you want to look it real you can use following supereasy algorythm (pseudocode):
int n; // # of planets
Vector2D planetPosition[n];
Vector2D planetVelocity[n]; // initially set by (0, 0)
double planetMass[n];
while (true){
for (int i = 0; i < n; i++){
Vector2D totalForce = (0, 0); // acting on planet i
for (int j = 0; j < n; j++){
if (j == i)
continue; // force between some planet and itself is 0
Fg = G * planetMass[i] * planetMass[j] / distance(i, j) ^ 2;
// Fg is a scalar value representing magnitude of force acting
// between planet[i] and planet[j]
// vectorFg is a vector form of force Fg
// (planetPosition[j] - planetPosition[i]) is a vector value
// (planetPosition[j]-planetPosition[i])/(planetPosition[j]-plantetPosition[i]).magnitude() is a
// unit vector with direction from planet[i] to planet[j]
vectorFg = Fg * (planetPosition[j] - planetPosition[i]) /
(planetPosition[j] - planetPosition[i]).magnitude();
totalForce += vectorFg;
}
Vector2D acceleration = totalForce / planetMass[i];
planetVelocity[i] += acceleration;
}
// it is important to separate two for's, if you want to know why ask in the comments
for (int i = 0; i < n; i++)
planetPosition[i] += planetVelocity[i];
sleep 17 ms;
draw planets;
}
If you're simulating physics, I highly recommend Box2D.
It's a great physics simulator, and will really cut down the amount of boiler plate you'll need, with physics simulating.
Fundamentals of Astrodynamics by Bate, Muller, and White is still required reading at my alma mater for undergrad Aerospace engineers. This tends to cover the orbital mechanics of bodies in Earth orbit...but that is likely the level of physics and math you will need to start your understanding.
+1 for #Stefano Borini's suggestion for "everything that Jean Meeus has written."
Dear Friend here is the graphics code that simulate solar system
Kindly refer through it
/*Arpana*/
#include<stdio.h>
#include<graphics.h>
#include<conio.h>
#include<math.h>
#include<dos.h>
void main()
{
int i=0,j=260,k=30,l=150,m=90;
int n=230,o=10,p=280,q=220;
float pi=3.1424,a,b,c,d,e,f,g,h,z;
int gd=DETECT,gm;
initgraph(&gd,&gm,"c:\tc\bgi");
outtextxy(0,10,"SOLAR SYSTEM-Appu");
outtextxy(500,10,"press any key...");
circle(320,240,20); /* sun */
setfillstyle(1,4);
floodfill(320,240,15);
outtextxy(310,237,"sun");
circle(260,240,8);
setfillstyle(1,2);
floodfill(258,240,15);
floodfill(262,240,15);
outtextxy(240,220,"mercury");
circle(320,300,12);
setfillstyle(1,1);
floodfill(320,298,15);
floodfill(320,302,15);
outtextxy(335,300,"venus");
circle(320,160,10);
setfillstyle(1,5);
floodfill(320,161,15);
floodfill(320,159,15);
outtextxy(332,150, "earth");
circle(453,300,11);
setfillstyle(1,6);
floodfill(445,300,15);
floodfill(448,309,15);
outtextxy(458,280,"mars");
circle(520,240,14);
setfillstyle(1,7);
floodfill(519,240,15);
floodfill(521,240,15);
outtextxy(500,257,"jupiter");
circle(169,122,12);
setfillstyle(1,12);
floodfill(159,125,15);
floodfill(175,125,15);
outtextxy(130,137,"saturn");
circle(320,420,9);
setfillstyle(1,13);
floodfill(320,417,15);
floodfill(320,423,15);
outtextxy(310,400,"urenus");
circle(40,240,9);
setfillstyle(1,10);
floodfill(38,240,15);
floodfill(42,240,15);
outtextxy(25,220,"neptune");
circle(150,420,7);
setfillstyle(1,14);
floodfill(150,419,15);
floodfill(149,422,15);
outtextxy(120,430,"pluto");
getch();
while(!kbhit()) /*animation*/
{
a=(pi/180)*i;
b=(pi/180)*j;
c=(pi/180)*k;
d=(pi/180)*l;
e=(pi/180)*m;
f=(pi/180)*n;
g=(pi/180)*o;
h=(pi/180)*p;
z=(pi/180)*q;
cleardevice();
circle(320,240,20);
setfillstyle(1,4);
floodfill(320,240,15);
outtextxy(310,237,"sun");
circle(320+60*sin(a),240-35*cos(a),8);
setfillstyle(1,2);
pieslice(320+60*sin(a),240-35*cos(a),0,360,8);
circle(320+100*sin(b),240-60*cos(b),12);
setfillstyle(1,1);
pieslice(320+100*sin(b),240-60*cos(b),0,360,12);
circle(320+130*sin(c),240-80*cos(c),10);
setfillstyle(1,5);
pieslice(320+130*sin(c),240-80*cos(c),0,360,10);
circle(320+170*sin(d),240-100*cos(d),11);
setfillstyle(1,6);
pieslice(320+170*sin(d),240-100*cos(d),0,360,11);
circle(320+200*sin(e),240-130*cos(e),14);
setfillstyle(1,7);
pieslice(320+200*sin(e),240-130*cos(e),0,360,14);
circle(320+230*sin(f),240-155*cos(f),12);
setfillstyle(1,12);
pieslice(320+230*sin(f),240-155*cos(f),0,360,12);
circle(320+260*sin(g),240-180*cos(g),9);
setfillstyle(1,13);
pieslice(320+260*sin(g),240-180*cos(g),0,360,9);
circle(320+280*sin(h),240-200*cos(h),9);
setfillstyle(1,10);
pieslice(320+280*sin(h),240-200*cos(h),0,360,9);
circle(320+300*sin(z),240-220*cos(z),7);
setfillstyle(1,14);
pieslice(320+300*sin(z),240-220*cos(z),0,360,7);
delay(20);
i++;
j++;
k++;
l++;
m++;
n++;
o++;
p++;
q+=2;
}
getch();
}

DirectX 11 Compute Shader - not writing all values

I am trying some experiments in fractal rendering with DirectX11 Compute Shaders.
The provided example runs on a FeatureLevel_10 device.
My RwStructured output buffer has a data format of R32G32B32A32_FLOAT
The problem is that when writing to the buffer, it seems that only the ALPHA ( w ) value gets written nothing else....
Here is the shader code:
struct BufType
{
float4 value;
};
cbuffer ScreenConstants : register(b0)
{
float2 ScreenDimensions;
float2 Padding;
};
RWStructuredBuffer<BufType> BufferOut : register(u0);
[numthreads(1, 1, 1)]
void Main( uint3 DTid : SV_DispatchThreadID )
{
uint index = DTid.y * ScreenDimensions.x + DTid.x;
float minRe = -2.0f;
float maxRe = 1.0f;
float minIm = -1.2;
float maxIm = minIm + ( maxRe - minRe ) * ScreenDimensions.y / ScreenDimensions.x;
float reFactor = (maxRe - minRe ) / (ScreenDimensions.x - 1.0f);
float imFactor = (maxIm - minIm ) / (ScreenDimensions.y - 1.0f);
float cim = maxIm - DTid.y * imFactor;
uint maxIterations = 30;
float cre = minRe + DTid.x * reFactor;
float zre = cre;
float zim = cim;
bool isInside = true;
uint iterationsRun = 0;
for( uint n = 0; n < maxIterations; ++n )
{
float zre2 = zre * zre;
float zim2 = zim * zim;
if ( zre2 + zim2 > 4.0f )
{
isInside = false;
iterationsRun = n;
}
zim = 2 * zre * zim + cim;
zre = zre2 - zim2 + cre;
}
if ( isInside )
{
BufferOut[index].value = float4(1.0f,0.0f,0.0f,1.0f);
}
}
The code actually produces in a sense the correct result ( 2D Mandelbrot set ) but it seems somehow only the alpha value is touched and nothing else is written, although the pixels inside the set should be colored red... ( the image is black & white )
Anybody has a clue what's going on here ?
After some fiddling around i found the problem.
I have not found any documentation from MS mentioning this, so it could also be a Nvidia
specific driver issue.
Apparently you are only allowed to write ONCE per Compute Shader Invocation to the same element in a RWSructuredBuffer. And you also HAVE to write ONCE.
I changed the code to accumulate the correct color in a local variable, and write it now only once at the end of the shader.
Everything works perfectly now in that way.
I'm not sure but, shouldn't it be for BufferOut decl:
RWStructuredBuffer<BufType> BufferOut : register(u0);
instead of :
RWStructuredBuffer BufferOut : register(u0);
If you are only using a float4 write target, why not use just:
RWBuffer<float4> BufferOut : register (u0);
Maybe this could help.
After playing around today again, i ran into the same problem once again.
The following code produced all white output:
[numthreads(1, 1, 1)]
void Main( uint3 dispatchId : SV_DispatchThreadID )
{
float4 color = float4(1.0f,0.0f,0.0f,1.0f);
WriteResult(dispatchId,color);
}
The WriteResult method is a utility method from my hlsl standard library.
Long story short. After i upgraded from Driver version 192 to 195(beta) the problem went away.
Seems like the drivers have some definitive problems in compute shader support left, so beware.
from what ive seen, computer shaders are only useful if you need a more general computational model than the tradition pixel shader, or if you can load data and then share it between threads in fast shared memory. im fairly sure u would get better performance with a pixel shader for the mandelbrot shader.
on my setup (win7, feb 10 dx sdk, gtx480) my compute shaders have a punishing setup time of over 0.2-0.3ms (binding a SRV and a UAV and then calling dispatch()).
if u do a PS implementation please post your experiences.
I have no direct experience with DX compute shaders but...
Why are you setting alpha = 1.0?
IIRC, that makes the pixel 100% transparent, so your inside pixels are transparent red, and show up as whatever color was drawn behind them.
When alpha = 1.0, the RGB components are never used.

Resources