rfactor schedule for descriptor matching

I'm trying to use Halide for brute-force descriptor (e.g. SIFT) matching. I'd like to try rfactor in the schedule, but I can't seem to get the associativity prover to oblige. So far I have the following:
Var x("x"), y("y"), c("c"), i("i");
Func diff("diff"), diffSq("diffSq"), dotp("dotp"), out("out"),
inp1("inp1"), inp2("inp2"), minVal("minVal");
inp1(c,x) = input1(c,x);
inp2(c,y) = input2(c,y);
diff(x,y,c) = inp1(c, x) - inp2(c, y);
diffSq(x,y,c) = diff(x,y,c) * diff(x,y,c);
RDom rc(0,128);
dotp(x, y) = 0.f;
dotp(x, y) += diffSq(x, y, rc);
// Argmin, see https://github.com/halide/Halide/blob/master/test/correctness/rfactor.cpp#L804
RDom ry(0, input2.height(), "ry");
minVal(x) = {-1, std::numeric_limits<float>::max()};
minVal(x) = {
select(minVal(x)[1] < dotp(x, ry)
,minVal(x)[0]
,ry),
min(minVal(x)[1], dotp(x, ry))
};
out(x) = minVal(x)[0];
// Schedule
RVar ryo("ryo"), ryi("ryi");
Var yy("yy");
Func intermediate("inter");
dotp.compute_root();
minVal.update(0).split(ry, ryo, ryi, 16);
//intermediate = minVal.update(0).rfactor(ryo, yy);
The last line, when uncommented, sadly fails with:
|| Failed to call rfactor() on minVal.update(0) since it can't prove associativity of the operator
Thanks for any pointers as to how I could resolve this!

Quick answer: only one order of the Tuple elements is matched. Flipping them should allow rfactor. There will be a more complete answer on the list and we'll look at generalizing the matcher. (Answering to make sure the SO side doesn't get forgotten.)
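
As a rough illustration of what rfactor is being asked to do here, the following Python sketch factors an argmin reduction into per-chunk partial results and then merges them. It is not Halide code and says nothing about how the associativity prover matches Tuple patterns; the names argmin_combine, argmin_serial and argmin_factored are hypothetical, and the tie-breaking rule (keep the earlier index) is an assumption made for the sketch.

```python
INF = float("inf")

def argmin_combine(a, b):
    """Merge two (min_value, index) partial results; ties keep the left operand."""
    return a if a[0] <= b[0] else b

def argmin_serial(values):
    """Plain left-to-right reduction over all values."""
    acc = (INF, -1)
    for i, v in enumerate(values):
        acc = argmin_combine(acc, (v, i))
    return acc

def argmin_factored(values, chunk=16):
    """Reduce each chunk independently, then merge the partial results.

    This only gives the same answer as argmin_serial because
    argmin_combine is associative.
    """
    partials = []
    for start in range(0, len(values), chunk):
        acc = (INF, -1)
        for i in range(start, min(start + chunk, len(values))):
            acc = argmin_combine(acc, (values[i], i))
        partials.append(acc)
    result = (INF, -1)
    for p in partials:
        result = argmin_combine(result, p)
    return result
```

The inner loops over chunks are independent of each other, which is what makes the factored form attractive for parallel or vectorized schedules.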

Related

GLSL optimization: check if variable is within range

In my shader I have variable b and need to determine within which range it lies and from that assign the right value to variable a. I ended up with a lot of if statements:
float a = const1;
if (b >= 2.0 && b < 4.0) {
a = const2;
} else if (b >= 4.0 && b < 6.0) {
a = const3;
} else if (b >= 6.0 && b < 8.0) {
a = const4;
} else if (b >= 8.0) {
a = const5;
}
My question is could this lead to performance issues (branching) and how can I optimize it? I've looked at the step and smoothstep functions but haven't figured out a good way to accomplish this.

To solve this problem without branching, the usual technique is to find a series of math functions, one for each condition, that evaluate to 0 for all conditions except the one the variable satisfies. We can use these functions as gains to build a sum that evaluates to the right value each time.
In this case the conditions are simple intervals, so using the step functions we could write:
x in [a,b] as step(a,x)*step(x,b) (notice the inversion of x and b to get x<=b)
Or
x in [a,b[ as step(a,x)-step(b,x) as explained in this other post: GLSL point inside box test
Using this technique we obtain:
float a = (step(x,2.0)-(step(2.0,x)*step(x,2.0)))*const1 +
(step(2.0,x)-step(4.0,x))*const2 +
(step(4.0,x)-step(6.0,x))*const3 +
(step(6.0,x)-step(8.0,x))*const4 +
step(8.0,x)*const5;
This works for general disjoint intervals, but in the case of a step or staircase function as in this question, we can simplify it as:
float a = const1 + step(2.0,x)*(const2-const1) +
step(4.0,x)*(const3-const2) +
step(6.0,x)*(const4-const3) +
step(8.0,x)*(const5-const4);
We could also use a bool-to-float conversion as a means to express our conditions. For example, step(8.0,x)*(const5-const4) is equivalent to float(x>=8.0)*(const5-const4).
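
To check that the two branch-free forms above agree with the original if/else chain, here is a small Python sketch. The step helper mimics GLSL's step(edge, x), which returns 0.0 when x < edge and 1.0 otherwise; the function names and the list c standing in for const1..const5 are hypothetical.

```python
def step(edge, x):
    """GLSL-style step: 0.0 when x < edge, otherwise 1.0."""
    return 0.0 if x < edge else 1.0

def staircase_branchy(b, c):
    """The if/else chain from the question."""
    a = c[0]
    if 2.0 <= b < 4.0:
        a = c[1]
    elif 4.0 <= b < 6.0:
        a = c[2]
    elif 6.0 <= b < 8.0:
        a = c[3]
    elif b >= 8.0:
        a = c[4]
    return a

def staircase_intervals(x, c):
    """One indicator function per disjoint interval, used as a gain."""
    return ((step(x, 2.0) - step(2.0, x) * step(x, 2.0)) * c[0]
            + (step(2.0, x) - step(4.0, x)) * c[1]
            + (step(4.0, x) - step(6.0, x)) * c[2]
            + (step(6.0, x) - step(8.0, x)) * c[3]
            + step(8.0, x) * c[4])

def staircase_steps(x, c):
    """Simplified staircase: add each jump once its threshold is passed."""
    return (c[0]
            + step(2.0, x) * (c[1] - c[0])
            + step(4.0, x) * (c[2] - c[1])
            + step(6.0, x) * (c[3] - c[2])
            + step(8.0, x) * (c[4] - c[3]))
```

Comparing the three functions at and around the thresholds 2, 4, 6 and 8 is a quick way to catch a swapped step argument before moving the expression into a shader.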

You can avoid branching by creating kind of a lookup table:
float table[5] = {const1, const2, const3, const4, const5};
float a = table[int(clamp(b, 0.0, 8.0) / 2.0)];
But the performance will depend on whether the lookup table will have to be created in every shader or if it's some kind of uniform... As always, measure first...

It turned out Jaa-c's answer wasn't viable for me, as I'm targeting WebGL, which doesn't allow variables as array indices (unless the variable is a loop index). His solution might work well for other OpenGL implementations, though.
I came up with this solution using mix and step functions:
//Outside of main function:
uniform float constArray[5]; // Values are sent in to the shader
//Inside main function:
float a = constArray[0];
a = mix(a, constArray[1], step(2.0, b));
a = mix(a, constArray[2], step(4.0, b));
a = mix(a, constArray[3], step(6.0, b));
a = mix(a, constArray[4], step(8.0, b));
But after some testing it didn't give any visible performance boost. I finally ended up with this solution:
float a = constArray[0];
if (b >= 2.0)
a = constArray[1];
if (b >= 4.0)
a = constArray[2];
if (b >= 6.0)
a = constArray[3];
if (b >= 8.0)
a = constArray[4];
Which is both compact and easily readable. In my case both these alternatives and my original code performed equally, but at least here are some options to try out.

How to pass a parameter that changes with time to SciLab ode?

I'm trying to solve a heat transfer problem using SciLab's ode function. The thing is: one of the parameters, h(t), changes with time.
ODE
My question is: how can I pass an argument to ode function that is changing over time?

ode accepts extra function parameters as a list:
It may happen that the simulator f needs extra arguments. In this
case, we can use the following feature. The f argument can also be a
list lst=list(f,u1,u2,...un) where f is a Scilab function with
syntax: ydot = f(t,y,u1,u2,...,un) and u1, u2, ..., un are extra
arguments which are automatically passed to the simulator simuf.
Case where the extra parameter is a function of t:
function ydot = f(t,y,h)
// define y here depending on t and h(t),eg y = t + h(t)
endfunction
function y = h(t)
// define here h(t), eg y = t
endfunction
// define y0,t0 and t
y = ode(y0, t0, t, list(f,h)) // this will pass the h function as a parameter
Case where the extra parameter is a vector of samples from which we want to extract the value corresponding to t:
ode only evaluates f at the times t it needs, so one idea is to find the samples Ti < t < Tj that bracket t each time ode performs a computation, and interpolate to get Hi < h < Hj.
This is rather ugly but works:
function y = h(t,T,H)
res = abs(t - T) // looking for nearest value of t in T
minres = min(res) // getting the smallest distance
lower = find(res==minres) // getting the index : T(lower)
res(res==minres)=%inf // looking for 2nd nearest value of t in T: nearest is set to inf
minres = min(res) // getting the smallest distance
upper = find(minres==res) // getting the index: T(upper)
// Now t is between T(lower) (nearest) and T(upper) (farthest) (! T(lower) may be > T(upper))
y = ((T(upper)-t)*H(lower)+(t-T(lower))*H(upper))/(T(upper)-T(lower)) // linear interpolation between H(lower) and H(upper)
endfunction
function ydot=f(t, y,h,T,H)
hi = h(t,T,H) // if Ti< t < Tj; Hi<h(t,T,H)<Hj
disp([t,hi]) // with H = T, hi = t
ydot=y^2-y*sin(t)+cos(t) - hi // example of where to use hi
endfunction
// use base example of `ode`
y0=0;
t0=0;
t=0:0.1:%pi;
H = t // simple example
y = ode(y0,t0,t,list(f,h,t,H));
plot(t,y)
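
The same idea (pass the samples as extra arguments and interpolate inside the right-hand side) can be sketched outside Scilab. The Python below is an illustration only: interp_h, rk4 and rhs are hypothetical names, the fixed-step RK4 integrator stands in for ode, and, unlike the Scilab version above, values outside the sampled range are clamped to the end samples rather than extrapolated.

```python
from bisect import bisect_right

def interp_h(t, T, H):
    """Linearly interpolate the samples (T[i], H[i]) at time t.

    T must be sorted in ascending order. Outside [T[0], T[-1]] the
    nearest end sample is returned (an assumption of this sketch).
    """
    if t <= T[0]:
        return H[0]
    if t >= T[-1]:
        return H[-1]
    j = bisect_right(T, t)          # first index with T[j] > t
    i = j - 1
    w = (t - T[i]) / (T[j] - T[i])  # weight in [0, 1)
    return (1.0 - w) * H[i] + w * H[j]

def rk4(f, y0, t0, t1, steps, *extra):
    """Integrate ydot = f(t, y, *extra) from t0 to t1 with fixed-step RK4."""
    dt = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y, *extra)
        k2 = f(t + dt / 2, y + dt / 2 * k1, *extra)
        k3 = f(t + dt / 2, y + dt / 2 * k2, *extra)
        k4 = f(t + dt, y + dt * k3, *extra)
        y += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return y

def rhs(t, y, T, H):
    """Example right-hand side that uses the time-varying parameter."""
    return interp_h(t, T, H)
```

For example, rk4(rhs, 0.0, 0.0, 2.0, 20, T, H) integrates the interpolated h(t) itself, which makes it easy to compare against a hand-computed integral of the samples.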

Are there any restrictions with LUT: unbounded way in dimension

When trying to run the sample code below (similar to a look up table), it always generates the following error message: "The pure definition of Function 'out' calls function 'color' in an unbounded way in dimension 0".
RDom r(0, 10, 0, 10);
Func label, color, out;
Var x,y,c;
label(x,y) = 0;
label(r.x,r.y) = 1;
color(c) = 0;
color(label(r.x,r.y)) = 255;
out(x,y) = color(label(x,y));
out.realize(10,10);
Before calling realize, I have tried to statically set bound, like below, without success.
color.bound(c,0,10);
label.bound(x,0,10).bound(y,0,10);
out.bound(x,0,10).bound(y,0,10);
I also looked at the histogram examples, but they are a bit different.
Is this some kind of restriction in Halide?

Halide prevents any out of bounds access (and decides what to compute) by analyzing the range of the values you pass as arguments to a Func. If those values are unbounded, it can't do that. The way to make them bounded is with clamp:
out(x, y) = color(clamp(label(x, y), 0, 9));
In this case, the reason it's unbounded is that label has an update definition, which makes the analysis give up. If you wrote label like this instead:
label(x, y) = select(x >= 0 && x < 10 && y >= 0 && y < 10, 1, 0);
Then you wouldn't need the clamp.
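
The effect of the clamp can be pictured with a plain Python lookup table. This is only an analogy for the bounds reasoning, not Halide code: the names clamp, lookup, color and label are hypothetical stand-ins for the Funcs above.

```python
SIZE = 10

def clamp(value, lo, hi):
    """Restrict value to the closed range [lo, hi]."""
    return max(lo, min(value, hi))

def lookup(table, index):
    """Index a table after clamping, so any integer index is in bounds."""
    return table[clamp(index, 0, len(table) - 1)]

# color(c) = 0 everywhere, then color(1) = 255, as in the question.
color = [0] * SIZE
color[1] = 255

# label is 1 over the whole 10x10 region.
label = [[1] * SIZE for _ in range(SIZE)]

# out(x, y) = color(clamp(label(x, y), 0, 9))
out = [[lookup(color, label[y][x]) for x in range(SIZE)] for y in range(SIZE)]
```

Because the index is known to lie in [0, 9] whatever label contains, the required extent of the table is known in advance, which is the information the bounds analysis needs.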

How to save intermediate outputs into images on a multi-stage pipeline?

Say I have computation something like
Image resultA, resultB;
Func A, B, C, D, E;
Var x, y;
A(x,y) = C(x,y) * D(x,y);
B(x,y) = C(x,y) - D(x,y);
E(x,y) = abs(A(x,y)/B(x,y));
resultA(x,y) = sqrt(E(x,y));
resultB(x,y) = 2.f * E(x,y) + C(x,y);
How can I define an AOT schedule such that I can save both resultA and resultB?
E(x,y) is common to the computation of resultA and resultB.
Thank you in advance

If the results are the same size in all dimensions, you can return a Tuple:
result(x, y) = Tuple(resultA(x, y), resultB(x, y));
If they are not the same size, they can be added to a Pipeline and the Pipeline can be compiled to a filter that returns multiple Funcs.
See:
https://github.com/halide/Halide/blob/master/test/correctness/multiple_outputs.cpp

Least Squares Algorithm doesn't work

:) I'm trying to code a Least Squares algorithm and I've come up with this:
function [y] = ex1_Least_Squares(xValues,yValues,x) % a + b*x + c*x^2 = y
points = size(xValues,1);
A = ones(points,3);
b = zeros(points,1);
for i=1:points
A(i,1) = 1;
A(i,2) = xValues(i);
A(i,3) = xValues(i)^2;
b(i) = yValues(i);
end
constants = (A'*A)\(A'*b);
y = constants(1) + constants(2)*x + constants(3)*x.^2;
When I use this matlab script for linear functions, it works fine I think. However, when I'm passing 12 points of the sin(x) function I get really bad results.
These are the points I pass to the function:
xValues = [ -180; -144; -108; -72; -36; 0; 36; 72; 108; 144; 160; 180];
yValues = [sind(-180); sind(-144); sind(-108); sind(-72); sind(-36); sind(0); sind(36); sind(72); sind(108); sind(144); sind(160); sind(180) ];
And the result is sin(165°) = 0.559935259380508, when it should be sin(165°) = 0.258819

There is no reason why fitting a parabola to a full period of a sinusoid should give good results. These two curves are unrelated.

MATLAB already contains a least-squares polynomial fitting function, polyfit, and a complementary function, polyval. Although you are probably supposed to write your own, trying out something like the following will be educational:
xValues = [ -180; -144; -108; -72; -36; 0; 36; 72; 108; 144; 160; 180];
% you may want to experiment with different ranges of xValues
yValues = sind(xValues);
% try this with different values of n, say 2, 3, and 4
n = 3;
p = polyfit(xValues,yValues,n);
x = -180:36:180;
y = polyval(p,x);
plot(xValues,yValues);
hold on
plot(x,y,'r');
Also, more generally, you should avoid using loops as you have in your code. This should be equivalent:
points = size(xValues,1);
A = ones(points,3);
A(:,2) = xValues;
A(:,3) = xValues.^2; % .^ and ^ are different
The part of the loop involving b is equivalent to b = yValues. Either name the incoming variable b or just use yValues directly; there's no need to make a copy of it.
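
For readers who want to experiment outside MATLAB, the normal-equations fit from the question, (A'*A)\(A'*b), can be sketched in plain Python with a variable polynomial degree. The function names solve_linear, polyfit_normal and polyval are hypothetical helpers written for this sketch, not library calls, and coefficients are returned lowest power first (the opposite of MATLAB's polyfit ordering).

```python
def solve_linear(M, v):
    """Solve M a = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [rhs] for row, rhs in zip(M, v)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (aug[r][n] - s) / aug[r][r]
    return a

def polyfit_normal(xs, ys, degree):
    """Least-squares polynomial coefficients via the normal equations."""
    m = degree + 1
    A = [[x ** p for p in range(m)] for x in xs]
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(m)] for i in range(m)]
    Atb = [sum(row[i] * y for row, y in zip(A, ys)) for i in range(m)]
    return solve_linear(AtA, Atb)

def polyval(coeffs, x):
    """Evaluate the polynomial with coefficients given lowest power first."""
    return sum(c * x ** p for p, c in enumerate(coeffs))
```

Fitting the twelve sine samples from the question with degree 2 and then degree 3 and evaluating both at 165 should make the first answer's point visible. With x given in degrees, the powers of x grow quickly, so the normal equations are likely to become poorly conditioned at higher degrees; rescaling x (for instance to radians) before fitting is a common remedy.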
