Set one element of each float4 in an array using CUDA/thrust

I have a fairly standard float4 class and an array of said float4s on the GPU. Each float4 represents an (x,y,z,rgba) point, and I'd like to use thrust to set the rgba value for each float4 in my array to a specific value. Seems like thrust::fill() might work with a custom iterator, but I don't know how to write a thrust iterator. Any ideas for how to do this?
(x1, y1, z1, c1), (x2, y2, z2, c2), .... --> (x1, y1, z1, value), (x2, y2, z2, value), ....

"Seems like thrust::fill() might work with a custom iterator"
It won't. fill is simple assignment: the destination iterator is written but never read, so the existing values can't be modified. You don't want to assign over an iterator range; you want to modify an existing iterator range. transform is the correct algorithm for that.
Write a functor, something like:
struct set_c
{
float y;
__host__ __device__
set_c(const float val) : y(val) {};
__host__ __device__
myfloat4 operator()(const myfloat4& x)
{
myfloat4 val = x;
val.c = y;
return val;
}
};
Then apply that functor over your data with transform in-place:
thrust::device_vector<myfloat4> data(bigconstant);
// something ...
set_c op(5.f); // set each myfloat4.c = 5.f
thrust::transform(data.begin(), data.end(), data.begin(), op);
All of the code above was obviously written in the browser and has never been near a compiler. Use at your own risk.

Related

How to convert and use the GLSL mat2 type to its equivalent in RenderScript

What is the RenderScript equivalent of the GLSL mat2 type, and how is it used?
I came to the conclusion that it may be rs_matrix2x2, but I can't find any sample code showing how to use it.
I'm trying to convert the following GLSL snippet into RenderScript:
GLSL:
vec2 test(vec2 coord, float c, float s)
{
mat2 m = mat2(c, -s, s, c);
return m * coord;
}
RenderScript:
float2 test(float2 coord, float c, float s)
{
//???? -> mat2 m = mat2(c, -s, s, c);
return m * coord;
}
Just found the solution. In case it is of use to others, here is the conversion and usage, based on the sample snippet in the question:
float2 test(float2 coord, float c, float s)
{
rs_matrix2x2 m = {c, -s, s, c};
return rsMatrixMultiply(&m, coord);
}

How to implement this function without an if-else branch? (GLSL)

I'm working on a game using OpenGL ES 2.0.
I would like to eliminate branches in the fragment shader, if possible. But there is one function that I cannot improve:
float HS(in float p, in float c) {
float ap = abs(p);
if( ap > (c*1.5) ) {
return ap - c ;
} else {
return mod(ap+0.5*c,c)-0.5*c;
}
}
c is a constant in most cases, if that helps. I use the function like this:
vec3 op = sign(p1)*vec3(
HS(p1.x, cc),
HS(p1.y, cc),
HS(p1.z, cc)
);
Here's a trick that "eliminates" the branch. But the more important thing it does is vectorize your code. After all, the compiler probably eliminated the branch for you; it's far less likely that it realized it could do this:
vec3 HSvec(in vec3 p, const in float c)
{
vec3 ap = abs(p);
vec3 side1 = ap - c;
float val = 0.5 * c; // not const: a function parameter is not a constant expression in GLSL ES 1.00
vec3 side2 = mod(ap + val, vec3(c)) - val;
bvec3 tests = greaterThan(ap, vec3(c*1.5));
return mix(side2, side1, vec3(tests));
}
This eliminates lots of redundant computations, as well as doing lots of computations simultaneously.
The key here is the mix function. mix performs linear interpolation between the two arguments based on the third. But since a bool converted to a float will be exactly 1.0 or 0.0, it's really just selecting either side1 or side2. And this selection is defined by the results of the component-wise greaterThan operation.
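To make the selection behaviour concrete, here is a minimal scalar sketch in C rather than GLSL. The helpers abs_f, floor_f, mod_f and mix_f are hypothetical stand-ins written for this illustration (they follow the GLSL definitions of mod and mix, but they are not library functions), and hs_branch and hs_select are names made up here. It only shows that a 0/1 weight makes mix return one of its two inputs; it says nothing about what a shader compiler actually emits.

```c
#include <assert.h>

/* Hypothetical helpers that mimic the GLSL built-ins on scalars. */
static float abs_f(float x) { return x < 0.0f ? -x : x; }

/* floor() for values that fit in a long: truncate, then step down if needed. */
static float floor_f(float x) {
    float t = (float)(long)x;
    return (t > x) ? t - 1.0f : t;
}

/* GLSL mod(x, y) is defined as x - y * floor(x / y). */
static float mod_f(float x, float y) { return x - y * floor_f(x / y); }

/* GLSL mix(x, y, a) is defined as x * (1 - a) + y * a. */
static float mix_f(float x, float y, float a) { return x * (1.0f - a) + y * a; }

/* Branching version, transcribed from the question's HS(). */
float hs_branch(float p, float c) {
    float ap = abs_f(p);
    if (ap > c * 1.5f) {
        return ap - c;
    }
    return mod_f(ap + 0.5f * c, c) - 0.5f * c;
}

/* Select version: evaluate both sides, then let a 0/1 weight pick one. */
float hs_select(float p, float c) {
    assert(c > 0.0f);                     /* assumption: c is positive */
    float ap = abs_f(p);
    float side1 = ap - c;
    float side2 = mod_f(ap + 0.5f * c, c) - 0.5f * c;
    float test = (float)(ap > c * 1.5f);  /* 1.0f if true, 0.0f if false */
    return mix_f(side2, side1, test);
}
```

With c = 1.0, both hs_branch(2.0, 1.0) and hs_select(2.0, 1.0) take the ap - c side and give 1.0.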

Sorting an array of objects based on one attribute only in Processing

I have a series of randomly plotted lines from a class called Line.
I have put all the objects into an array. I would like to connect any lines that are near each other with a dotted line. The simplest way I can think of doing this is to say that if the x1 coordinate is less than 5 pixels from the x1 of another line, then draw a dotted line connecting the two x1 coordinates.
The problem I have is how to compare each x1 coordinate with all the other x1 coordinates. I think this should involve (1) sorting the array and then (2) comparing consecutive array elements. However, I want to sort only on x1 and I don't know how to do this.
Here is my code so far:
class Line{
public float x1;
public float y1;
public float x2;
public float y2;
public color cB;
public float rot;
public float fat;
public Line(float x1, float y1, float x2, float y2, color tempcB, float rot, float fat){
this.x1 = x1;
this.y1 = y1;
this.x2 = x2;
this.y2 = y2;
this.cB = tempcB;
this.rot = rot;
this.fat = fat;
}
void draw(){
line(x1, y1, x2, y2);
//float rot = random(360);
float fat = random(5);
strokeWeight(fat);
////stroke (red,green,blue,opacity)
stroke(fat*100, 0, 0);
rotate(rot);
}
}
//Create array of objects
ArrayList<Line> lines = new ArrayList<Line>();
void setup(){
background(204);
size(600, 600);
for(int i = 0; i < 200; i++){
float r = random(500);
float s = random(500);
lines.add(new Line(r,s,r+10,s+10,color(255,255,255),random(360),random(5)));
}
//Draw out all the lines from the array
for(Line line : lines){
line.draw();
//Print them all out
println(line.x1,line.y1,line.x2,line.y2,line.cB,line.rot,line.fat);
}
}
//Now create connections between the elements
//If the x1 of the line is <5 pixels from another line then create a dotted line between the x1 points.
Like the other answer said, you need to compare both end points for this to make any sense. You also don't have to sort anything.
You should be using the dist() function instead of trying to compare only the x coordinate. The dist() function takes 2 points and gives you their distance. You can use this to check whether two points are close to each other or not:
float x1 = 75;
float y1 = 100;
float x2 = 25;
float y2 = 125;
float distance = dist(x1, y1, x2, y2);
if(distance < 100){
println("close");
}
You can use this function in your Line class to loop through other Lines and check for close points, or find the closest points, whatever you want.
As always, I recommend you try something out and ask another question if you get stuck.
The problem lies in the fact that a Line is composed of two points, and despite their being tied together (pun intended), you need to check the points of each Line independently. The only point you really don't need to check is the other point in the same Line instance.
In this case, it might be in your best interest to have a Point class. Line would then use Point instances to define both ends rather than raw float coordinates. That way, you can have both a list of Lines and a list of Points.
You can then sort Points by x coordinate or y coordinate and grab all points within 5 pixels of your point (excluding, of course, the point itself and the other point of the same Line instance).
Being able to split handling into Points and Lines is important in that you're using multiple views to handle the same data. As a general rule, you should rearrange said data whenever it becomes cumbersome to deal with in its current form. However if I may make a recommendation, the sorting is not strictly necessary. If you're checking a single point with all other points, you'd have to sort repeatedly according to the current point which is more work than simply making a pass in a list to deal with all other points that are close enough.
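The single pass suggested above can be sketched as follows. This is written in C rather than Processing, and the Point struct, its line_id field, and the function names are all assumptions made for this illustration. It compares squared distances, which gives the same result as comparing dist() against the threshold without needing a square root.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical Point type: an endpoint plus the index of the line it belongs to. */
typedef struct {
    float x, y;
    int line_id;
} Point;

/* Squared Euclidean distance between two points. */
static float dist_sq(Point a, Point b) {
    float dx = a.x - b.x;
    float dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/*
 * Counts the points closer than max_dist to pts[self], skipping pts[self]
 * itself and the other endpoint of the same line. One pass, no sorting.
 * Assumes max_dist is non-negative.
 */
size_t count_close(const Point *pts, size_t n, size_t self, float max_dist) {
    assert(self < n);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == self || pts[i].line_id == pts[self].line_id) {
            continue;  /* same point, or same line: nothing to connect */
        }
        if (dist_sq(pts[self], pts[i]) < max_dist * max_dist) {
            count++;   /* a dotted connection would be drawn here */
        }
    }
    return count;
}
```

Calling count_close once per point costs O(n) each time, which is cheaper than re-sorting the list for every point.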

Two float[] outputs in one kernel pass (Sobel -> Magnitude and Direction)

I wrote the following rs code in order to calculate the magnitude and the direction within the same kernel as the sobel gradients.
#pragma version(1)
#pragma rs java_package_name(com.example.xxx)
#pragma rs_fp_relaxed
rs_allocation bmpAllocIn, direction;
int32_t width;
int32_t height;
// Sobel, magnitude and direction
float __attribute__((kernel)) sobel_XY(uint32_t x, uint32_t y) {
float sobX=0, sobY=0, magn=0;
// leave a border of 1 pixel
if (x>0 && y>0 && x<(width-1) && y<(height-1)){
uchar4 c11=rsGetElementAt_uchar4(bmpAllocIn, x-1, y-1); uchar4 c12=rsGetElementAt_uchar4(bmpAllocIn, x-1, y);uchar4 c13=rsGetElementAt_uchar4(bmpAllocIn, x-1, y+1);
uchar4 c21=rsGetElementAt_uchar4(bmpAllocIn, x, y-1);uchar4 c23=rsGetElementAt_uchar4(bmpAllocIn, x, y+1);
uchar4 c31=rsGetElementAt_uchar4(bmpAllocIn, x+1, y-1);uchar4 c32=rsGetElementAt_uchar4(bmpAllocIn, x+1, y);uchar4 c33=rsGetElementAt_uchar4(bmpAllocIn, x+1, y+1);
sobX= (float) c11.r-c31.r + 2*(c12.r-c32.r) + c13.r-c33.r;
sobY= (float) c11.r-c13.r + 2*(c21.r-c23.r) + c31.r-c33.r;
float d = atan2(sobY, sobX);
rsSetElementAt_float(direction, d, x, y);
magn= hypot(sobX, sobY);
}
else{
magn=0;
rsSetElementAt_float(direction, 0, x, y);
}
return magn;
}
And the Java part:
float[] gm = new float[width*height]; // gradient magnitude
float[] gd = new float[width*height]; // gradient direction
ScriptC_sobel script;
script=new ScriptC_sobel(rs);
script.set_bmpAllocIn(Allocation.createFromBitmap(rs, bmpGray));
// dirAllocation: reference to the global variable "direction" in rs script. This
// dirAllocation is actually the second output of the kernel. It will be "filled" by
// the rsSetElementAt_float() method that include a reference to the current
// element (x,y) during the passage of the kernel.
Type.Builder TypeDir = new Type.Builder(rs, Element.F32(rs));
TypeDir.setX(width).setY(height);
Allocation dirAllocation = Allocation.createTyped(rs, TypeDir.create());
script.set_direction(dirAllocation);
// outAllocation: the kernel will slide along this global float Variable, which is
// "formally" the output (in principle the roles of the outAllocation (magnitude) and the
// second global variable direction (dirAllocation)could have been switched, the kernel
// just needs at least one in- or out-Allocation to "slide" along.)
Type.Builder TypeOut = new Type.Builder(rs, Element.F32(rs));
TypeOut.setX(width).setY(height);
Allocation outAllocation = Allocation.createTyped(rs, TypeOut.create());
script.forEach_sobel_XY(outAllocation); //start kernel
// here comes the problem
outAllocation.copyTo(gm) ;
dirAllocation.copyTo(gd);
In a nutshell: this code works on my older Galaxy Tab2 (API 17), but it crashes (Fatal signal 7 (SIGBUS), code 2, fault addr 0x9e6d4000 in tid 6385) on my Galaxy S5 (API 21). The strange thing is that when I use a simpler kernel that just calculates the SobelX or SobelY gradients in the very same way (except for the second allocation, used here for the direction), it also works on the S5. So the problem cannot be a general compatibility issue. Also, as I said, the kernel itself runs without problems (I can log the magnitude and direction values); it is the .copyTo statements above that fail. As you can see, the gm and gd float arrays have the same dimensions (width*height) as all the other allocations used by the kernel. Any idea what the problem could be? Or is there an alternative, more robust way to do the whole thing?

find center of circle when three points are given

I studied this link and coded accordingly, but I am getting a wrong answer for the example explained in the link.
While solving the equations, I subtracted equation 2 from equation 1 and equation 3 from equation 2, and then proceeded from there. Please check the link for clarification.
My code is:
#include<stdio.h>
int is_formCircle(float a1,float b1,float a2,float b2,float a3,float b3) {
float p1=a2-a1;
float p2=a3-a2;
float p3=b2-b1;
float p4=b3-b2;
float alpha=(a1+a2)*(a1-a2) + (b1+b2)*(b1-b2);
float beta =(a2+a3)*(a2-a3) + (b2+b3)*(b2-b3);
float y1=p1*beta - p2*alpha;
float y2=p2*p3 - p1*p4;
if(y2==0 || y1==0) return 1;
float y=y1/y2;
float x1 = 2*p4*y + beta;
float x2 = 2*p2;
float x = x1/x2;
printf("x=%f y=%f\n",x,y);
return 0;
}
int main() {
float a1,a2,a3,a4,b1,b2,b3,b4;
a1=4.0;
b1=1.0;
a2=-3.0;
b2=7.0;
a3=5.0;
b3=-2.0;
is_formCircle(a1,b1,a2,b2,a3,b3);
return 0;
}
My other code:
#include<stdio.h>
int is_formCircle(float a1,float b1,float a2,float b2,float a3,float b3) {
float mid1,mid2,mid3,mid4,m1,m2,D,Dx,Dy,x,y;
mid1 = a1+(a2-a1)/2;
mid2 = b1+(b2-b1)/2;
mid3 = a2+(a3-a2)/2;
mid4 = b2+(b3-b2)/2;
m1=(b2-b1)/(a2-a1);
m2=(b3-b2)/(a3-a2);
m1=-1*m1;
m2=-1*m2;
D=m2-m1;
Dx=mid2-(m1*mid1) + (mid3*m2) - mid4;
Dy=(m1*(mid3*m2-mid4))-(m2*(mid1*m1-mid2));
x=Dx/D;
y=Dy/D;
printf("%f %f",x,y);
return 0;
}
int main() {
float a1,a2,a3,b1,b2,b3;
a1=4.0;
b1=1.0;
a2=-3.0;
b2=7.0;
a3=5.0;
b3=-2.0;
is_formCircle(a1,b1,a2,b2,a3,b3);
return 0;
}
Why is my code giving the wrong answer?
I have to say, if you're following the link you listed, it would've helped to keep the variable names the same. We could understand the algorithm much better seeing x1, y1, x2, y2, x3, y3 instead of p1, p2, p3, p4, alpha and beta. In fact, I don't see much in your algorithm that matches the link. I'm not trying to be as harsh as the comments were (and if you're worried about switching float to double, that was a perfectly good case for a typedef), but debugging algorithms is easiest when you don't have to convert variable names.
I would recommend simply using the expressions the link gives for h and k, which come down to calculating determinants of 3x3 matrices. You can find lots of references for that.
I'd make two functions, as follows:
float calculateH(float x1, float y1, float x2, float y2, float x3, float y3) {
float numerator = (x2*x2+y2*y2)*y3 - (x3*x3+y3*y3)*y2 -
((x1*x1+y1*y1)*y3 - (x3*x3+y3*y3)*y1) +
(x1*x1+y1*y1)*y2 - (x2*x2+y2*y2)*y1;
float denominator = (x2*y3-x3*y2) -
(x1*y3-x3*y1) +
(x1*y2-x2*y1);
denominator *= 2;
return numerator / denominator;
}
float calculateK(float x1, float y1, float x2, float y2, float x3, float y3) {
float numerator = x2*(x3*x3+y3*y3) - x3*(x2*x2+y2*y2) -
(x1*(x3*x3+y3*y3) - x3*(x1*x1+y1*y1)) +
x1*(x2*x2+y2*y2) - x2*(x1*x1+y1*y1);
float denominator = (x2*y3-x3*y2) -
(x1*y3-x3*y1) +
(x1*y2-x2*y1);
denominator *= 2;
return numerator / denominator;
}
Then your is_formCircle would simply be:
int is_formCircle(float x1, float y1, float x2, float y2, float x3, float y3) {
float h = calculateH(x1, y1, x2, y2, x3, y3);
float k = calculateK(x1, y1, x2, y2, x3, y3);
printf("x=%f y=%f\n",h,k);
return 0;
}
There are tons of ways to optimize this, and there's a chance I typoed any of the determinant calculations, but it should get you going.
The solution given in the link is a "blind" solution: you know the equation, so you just solve it.
However, if you understand what is going on behind the scenes, you will be able to:
Write more readable, reliable, and flexible code.
Debug more easily.
What happens when you subtract equation 1 from equation 2? You are actually finding the equation of the straight line made up of the points that are equidistant from point 1 and point 2. Then you do the same with points 2 and 3. Finally, you find the intersection of these two lines, which gives you the center of the circle.
How do you describe the straight line of points equidistant from points 1 and 2? You take the point midway between the two and move in the direction perpendicular to the direction from point 1 to point 2.
If this is not absolutely clear, take a sheet of paper and draw an example: plot points 1, 2, and 3, draw the two lines, and find their intersection.
Now that you understand everything, restructure your code into two functions: one that finds the line equidistant from two points, and another that computes the intersection of two lines.
After your edit, the code looks better, although it is still not simple to follow. I think the mistake is in how you solve for the intersection of the two lines. Don't forget that the lines are in parametric form, each starting at a midpoint and running along the direction (m, 1). With mid1..mid4, m1, m2 and D = m2 - m1 as in your second listing:
Dx = m2*(mid4-mid2) - (mid3-mid1);
lambda=Dx/D;
x = mid1 + lambda*m1;
y = mid2 + lambda*1.0;
Checked graphically using Matlab.
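The construction described in this answer (take the midpoint of each pair, move along the perpendicular direction, then intersect the two lines) can be sketched in C as follows. The Center struct and the function name circle_center are assumptions made for this illustration, and double is used instead of the question's float purely as a choice here. The collinearity test is an exact comparison with zero; a tolerance may be preferable for noisy input.

```c
#include <assert.h>

/* Hypothetical result type: ok is 0 when the three points are collinear. */
typedef struct {
    int ok;
    double x, y;
} Center;

/*
 * Each perpendicular bisector is written in parametric form:
 *   B12(t) = M12 + t * d12,  d12 = (-(y2 - y1), x2 - x1)
 *   B23(s) = M23 + s * d23,  d23 = (-(y3 - y2), x3 - x2)
 * where M12 and M23 are midpoints. The center is where the two lines meet.
 */
Center circle_center(double x1, double y1, double x2, double y2,
                     double x3, double y3) {
    Center c = {0, 0.0, 0.0};

    /* Midpoints of segments 1-2 and 2-3. */
    double m12x = (x1 + x2) / 2.0, m12y = (y1 + y2) / 2.0;
    double m23x = (x2 + x3) / 2.0, m23y = (y2 + y3) / 2.0;

    /* Bisector directions: each segment direction rotated by 90 degrees. */
    double d12x = -(y2 - y1), d12y = x2 - x1;
    double d23x = -(y3 - y2), d23y = x3 - x2;

    /* 2D cross product of the directions; zero means parallel bisectors. */
    double det = d12x * d23y - d12y * d23x;
    if (det == 0.0) {
        return c;  /* collinear points: no unique circle */
    }

    /* Solve M12 + t*d12 = M23 + s*d23 for t (Cramer's rule). */
    double rx = m23x - m12x, ry = m23y - m12y;
    double t = (rx * d23y - ry * d23x) / det;

    c.ok = 1;
    c.x = m12x + t * d12x;
    c.y = m12y + t * d12y;
    return c;
}
```

For the question's points (4, 1), (-3, 7) and (5, -2), this gives a center of approximately (-6.5, -4.1667), which is equidistant from all three points.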
