Halide: Cannot print in Generator - "!function_takes_user_context(op->name)" - halide

When I try to print() an expression within a generator, I cannot build:
Internal Error at /home/halidenightly/build_bot/worker/linux-64-gcc53-800/halide/src/CodeGen_OpenCL_Dev.cpp:229 triggered by user code at :
Condition failed: !function_takes_user_context(op->name):
Aborted (core dumped)
I don't understand this error message, what is it?
EDIT 1: I've now included fuller code below.
#include "Halide.h"
using namespace Halide;
class SimpleGenerator : public Generator<SimpleGenerator>{
public:
Input<Buffer<uint8_t >> source{"src", 2};
Input<Buffer<uint8_t >> reference{"ref", 2};
Output<Buffer<uint8_t >> output{"out", 2};
void generate(){
intermediate(x, y) = print(source(x, y), "source at (", x,", ", y, ")") + print(reference(x, y));
output(x, y) = intermediate(x, y);
}
void schedule(){
Var xo("xo"), yo("yo"), xi("xi"), yi("yi");
if (get_target().has_gpu_feature()) {
std::cout << "Using GPU schedule\n";
output.gpu_tile(x, y, xo, yo, xi, yi, 16, 16, TailStrategy::GuardWithIf);
} else {
std::cout << "Using CPU schedule\n";
}
}
private:
Func intermediate{"intermediate"};
Var x{"x"}, y{"y"};
};
HALIDE_REGISTER_GENERATOR(SimpleGenerator, simple_generator)
EDIT 2: I narrowed down the issue: it occurs when I try to target the GPU with OpenCL. I remember reading somewhere that printing Halide Exprs on the GPU is buggy. Does anyone know how to solve this?

Your syntax seems to be fine; it matches the one in the tutorial (lesson 04):
Func f;
f(x, y) = sin(x) + print(cos(y), "<- this is cos(", y, ") when x =", x);
Could you share more of the 'final' function to get more context? Maybe it could be something prior to that

Related

How to use placeholders in std::bind

I give you a simple snippet of code:
#include <functional>
#include <iostream>
using namespace std;
void module2(int x, int y)
{
cout << "\n " << __PRETTY_FUNCTION__ << ":\t x = " << x << "\t y = " << y;
}
void module3(int x, int y, int z)
{
cout << "\n " << __PRETTY_FUNCTION__ << ":\t x = " << x << "\t y = " << y << "\t z = " << z;
}
int main()
{
using namespace std::placeholders;
int a = 39;
int b = 7;
int c = 3;
auto func_m2 = bind(&module2, _1, _2);
func_m2(a, b); // OK
auto func_m2_PH = bind(&module2, _2, _1);
func_m2_PH(b, a); // OK
//---------------------------------------------------------
auto func_m3 = bind(&module3, a, b, c);
func_m3(); // OK
cout << "\n With PlaceHolders:";
auto func_m3_PH_0 = bind(&module3, _1, _2, _3);
func_m3_PH_0(a, b, c); // OK
auto func_m3_PH_1 = bind(&module3, _2, _1, _3);
func_m3_PH_1(b, a, c); // OK
auto func_m3_PH_2 = bind(&module3, _3, _1, _2);
func_m3_PH_2(c, a, b); // KO !!!
auto func_m3_PH_3 = bind(&module3, _3, _2, _1);
func_m3_PH_3(c, b, a); // OK
auto func_m3_PH_4 = bind(&module3, _1, _3, _2);
func_m3_PH_4(a, c, b); // OK
auto func_m3_PH_5 = bind(&module3, _2, _3, _1);
func_m3_PH_5(b, c, a); // KO !!!
return 0;
}
When the first argument is a function that takes 2 arguments everything is fine: the code works as I expect.
However when the first std::bind's parameter is a function with 3 (or more) arguments the code stops working as I expect (these cases are marked with 'KO !!!' )
But, what do I expect from std::bind and its placeholders?
In this particular case I expect the output:
void module3(int, int, int): x = 39 y = 7 z = 3
every time that I invoke the function object generated from
bind(&module3, etc...)
but, more in general:
I expect that the parameter that replaces the placeholder named '_K' will be the K-th parameter passed to the underlying function (i.e. the first parameter of the std::bind).
What is wrong: my understanding of std::bind, or is there a bug in this function template?
Thanks for your time.
You have it backwards. The _K placeholder defines the mapping from the Kth argument passed to the generated functor (the result of bind) to the position of the placeholder in the parameters of the bound function. So putting _3 in the first argument position of bind means that the first argument given to the bound function will be the third parameter given to the generated function.
The other cases worked because your reversed logic just so happened to be the same as the correct version.
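To make the mapping concrete, here is a minimal sketch. The helper names `show` and `call_rotated` are made up for illustration and are not part of the question's code. It binds a three-parameter function with `_3, _1, _2` and returns what the bound function actually receives.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: reports the values the bound function receives.
std::string show(int x, int y, int z) {
    return "x=" + std::to_string(x) + " y=" + std::to_string(y) +
           " z=" + std::to_string(z);
}

// Builds bind(&show, _3, _1, _2) and calls the resulting functor with (a, b, c).
std::string call_rotated(int a, int b, int c) {
    using namespace std::placeholders;
    // _3 sits in the first slot, so x receives the THIRD call argument,
    // y receives the first, and z receives the second.
    auto f = std::bind(&show, _3, _1, _2);
    return f(a, b, c);
}
```

With this binding, `call_rotated(39, 7, 3)` yields `x=3 y=39 z=7`. To get `x=39 y=7 z=3` you would have to call it as `call_rotated(7, 3, 39)`, which is why the `(c, a, b)` call in the question did not print what was expected.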

Error: Input buffer filter is accessed at 63, which is beyond the max (15) in dimension 2 Aborted (core dumped)

I want to test my algorithm, written in Halide, with the Tiramisu compiler.
When I run it I get an error like this one:
Error: Input buffer filter is accessed at 63, which is beyond the max (15) in dimension 2
Aborted (core dumped)
So I decided to test only the call to the method, but even with the same parameters I get the same error, or a similar one such as:
Error: Input buffer bias is accessed at 15, which is beyond the max (4) in dimension 0
Aborted (core dumped)
Here is my wrapper_vgg.h:
#ifndef HALIDE__build___wrapper_vgg_o_h
#define HALIDE__build___wrapper_vgg_o_h
#include <tiramisu/utils.h>
#define RADIUS 3
#ifdef __cplusplus
extern "C" {
#endif
int vgg_tiramisu(halide_buffer_t *, halide_buffer_t *_b_input_buffer ,halide_buffer_t *filter,halide_buffer_t *bias,halide_buffer_t *conv,halide_buffer_t *filter2, halide_buffer_t *bias2 ,halide_buffer_t *conv2,halide_buffer_t *_b_output_buffer,halide_buffer_t *_negative_slope);
int vgg_tiramisu_argv(void **args);
int vgg_ref( halide_buffer_t *_b_input_buffer ,halide_buffer_t *filter,halide_buffer_t *bias,halide_buffer_t *filter2, halide_buffer_t *bias2 ,halide_buffer_t *_b_output_buffer);
int vgg_ref_argv(void **args);
// Result is never null and points to constant static data
const struct halide_filter_metadata_t *vgg_tiramisu_metadata();
const struct halide_filter_metadata_t *vgg_ref_metadata();
#ifdef __cplusplus
} // extern "C"
#endif
#endif // HALIDE__build___wrapper_vgg_o_h
And here is my vgg_ref.cpp:
#include "Halide.h"
#include "configure.h"
using namespace Halide;
int main(int argc, char **argv)
{
ImageParam input{Float(32), 4, "input"};
ImageParam filter{Float(32), 4, "filter"};
ImageParam bias{Float(32), 1, "bias"};
ImageParam filter2{Float(32), 4, "filter2"};
ImageParam bias2{Float(32), 1, "bias2"};
/* THE ALGORITHM */
Var x("x"), y("y"), z("z"), n("n");
Func f_conv("conv"), f_conv2("conv2");
Func f_ReLU("ReLU"), f_ReLU2("ReLU2") ;
//Func f_Maxpool("Maxpool");
Func f_vgg("vgg");
RDom r(0, K+1, 0, K+1, 0, FIn);
RDom r2(0, K+1, 0, K+1, 0, FOut);
// First conv computations
f_conv(x, y, z, n) = bias(z);
f_conv(x, y, z, n) += filter(r.x, r.y, r.z, z) * input(x + r.x, y + r.y, r.z, n);
//first relu
f_ReLU(x, y, z, n) = max(0, f_conv(x, y, z, n));
.....
.....
/* THE SCHEDULE */
// Provide estimates on the input image
.....
.....
f_vgg.compile_to_object("build/generated_fct_vgg_ref.o", {input, filter, bias, filter2, bias2}, "vgg_ref");
f_vgg.compile_to_lowered_stmt("build/generated_fct_vgg_ref.txt", {input, filter, bias, filter2, bias2}, Text);
return 0;
}
And here is the wrapper where I call the vgg_ref method:
...
#include "configure.h"
#include "wrapper_vgg.h"
#include <tiramisu/utils.h>
using namespace std;
int main(int, char**)
{
Halide::Buffer<float> input(N+K, N+K, FIn, BATCH_SIZE);
Halide::Buffer<float> filter(K+1, K+1, FIn, FOut);
Halide::Buffer<float> bias(FOut);
Halide::Buffer<float> conv(N, N, FOut, BATCH_SIZE);
Halide::Buffer<float> filter2(K+1, K+1, FOut, FOut);
Halide::Buffer<float> bias2(FOut);
Halide::Buffer<float> conv2_tiramisu(N-K, N-K, FOut, BATCH_SIZE);
Halide::Buffer<float> vgg_tiramisu_buff(N-2*K, N-2*K, FOut, BATCH_SIZE);
Halide::Buffer<int> parameters(5);
Halide::Buffer<float> negative_slope(1);negative_slope(0) = 1;
// Buffer for Halide
Halide::Buffer<float> vgg_halide(N-2*K, N-2*K, FOut, BATCH_SIZE);
std::vector<std::chrono::duration<double,std::milli>> duration_vector_1;
std::vector<std::chrono::duration<double,std::milli>> duration_vector_2;
/****************************************** Initialize Buffers *********************************************/
....
....
....
std::cout << "\t\tBuffers initialized" << std::endl;
/****************************************** Halide Part ********************************************************/
for (int i=0; i<NB_TESTS; i++)
{
auto start1 = std::chrono::high_resolution_clock::now();
vgg_ref(input.raw_buffer(), filter.raw_buffer(), bias.raw_buffer(), filter2.raw_buffer(), bias2.raw_buffer(), vgg_halide.raw_buffer());
auto end1 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double,std::milli> duration = end1 - start1;
duration_vector_2.push_back(duration);
}
std::cout << "\t\tHalide vgg duration" << ": " << median(duration_vector_1)/1000 << "; " << std::endl;
std::cout << "\t\t Result" << ": ";
/****************************************** Tiramisu Part ********************************************************/
/* // Initialize parameters[]
parameters(0) = N;
parameters(1) = K;
parameters(2) = FIn;
parameters(3) = FOut;
parameters(4) = BATCH_SIZE;
for (int i=0; i<NB_TESTS; i++)
{
// srand (1);
auto start1 = std::chrono::high_resolution_clock::now();
vgg_tiramisu(parameters.raw_buffer(), input.raw_buffer(), filter.raw_buffer(), bias.raw_buffer(), conv.raw_buffer(), filter2.raw_buffer(), bias2.raw_buffer(), conv2_tiramisu.raw_buffer(),vgg_tiramisu_buff.raw_buffer(),negative_slope.raw_buffer());
auto end1 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double,std::milli> duration = end1 - start1;
duration_vector_1.push_back(duration);
}
std::cout << "\t\tTiramisu vgg duration" << ": " << median(duration_vector_2)/1000 << "; " << std::endl;
std::cout << "\t\t Result" << ": ";
*/
}
I noticed that once I comment out this line in the Halide part, everything works well:
vgg_ref(input.raw_buffer(), filter.raw_buffer(), bias.raw_buffer(), filter2.raw_buffer(), bias2.raw_buffer(), vgg_halide.raw_buffer());
So the problem is in this call to the Halide function "vgg_ref".
But I do not know what this error relates to. I tried passing only one parameter and I always get the same problem, and I do not know how to fix it.
Thank you for any advice, or for drawing my attention to something I missed.
I was able to fix the problem later, AlhamduAllah.
I want to point out that it is impossible to run the benchmarks without creating the ".o" file, that is, without this line:
f_vgg.compile_to_object("build/generated_fct_vgg_ref.o", {input, filter, bias, filter2, bias2}, "vgg_ref");
So how did it run in my case?
Because the ".o" file had been generated by a previous execution.
Be careful here: checking for a stale ".o" should become a reflex, since many wrong results are caused by an old copy of that object file.
Even after I took care of that, I still got the same error, or a similar one.
What does this error refer to? It generally means that somewhere in your code an index does not match its definition in the wrapper.
So here are two things to verify to help fix this issue:
Verify the function call and its parameters: for example, if the function requires 5 parameters, check that you pass exactly 5, no more and no fewer.
Verify all the indices and their intervals.
My problem was in these 2 lines:
RDom r(0, K, 0, K, 0, FIn);
RDom r2(0, K, 0, K, 0, FOut);
An RDom (a multi-dimensional domain over which to iterate) lets you walk over a small window of the input matrix, for example to apply a filter to the input. The RDom above defines the intervals of x, y and z of the filter matrix.
In the wrapper I define the filter like this:
Halide::Buffer<float> filter(K+1, K+1, FIn, FOut);
So the RDom should use the same extent, K+1, for x and y, but I had only K, which is why I got the problem shown in the question.
So it should be written like this:
RDom r(0, K+1, 0, K+1, 0, FIn);
RDom r2(0, K+1, 0, K+1, 0, FOut);
And that fixed my problem.
So pay attention to these small errors: they may ruin your day, but that's OK since they help you learn more.
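For readers trying to decode the message itself: "accessed at 63, which is beyond the max (15)" compares the largest index the pipeline wants to read with the largest valid index of the buffer that was passed in. The sketch below is plain C++ with no Halide in it. The helper names are hypothetical, and the numbers 64 and 16 are chosen only to reproduce the figures in the error message, not taken from the actual configuration.

```cpp
// Largest index touched by a loop that starts at `min` and runs `extent` times.
int max_index_accessed(int min, int extent) {
    return min + extent - 1;
}

// True when every access stays inside a buffer dimension of size
// `buffer_extent`, whose valid indices are 0 .. buffer_extent - 1.
bool fits(int min, int extent, int buffer_extent) {
    return min >= 0 && max_index_accessed(min, extent) <= buffer_extent - 1;
}
```

For example, `fits(0, 64, 16)` is false because the loop reaches index 63 while the buffer only goes up to 15. Running the same check by hand for every dimension of every input buffer is a quick way to find which extent in the generator disagrees with the wrapper.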

Ceres: Compute uncertainty on parameter

I am using Ceres to make a fit, and would like to get an uncertainty for the fit parameters. It has been suggested to use the Covariance class, but I am not sure whether I read the documentation correctly. Here is what I tried in analogy to the documentation to get the uncertainties for a simple linear fit:
void Fit::fit_linear_function(const std::vector<double>& x, const std::vector<double>& y, int idx_start, int idx_end, double& k, double& d) {
Problem problem;
for (int i = idx_start; i <= idx_end; ++i) {
//std::cout << "i x y "<<i<< " " << x[i] << " " << y[i] << std::endl;
problem.AddResidualBlock(
new ceres::AutoDiffCostFunction<LinearResidual, 1,1, 1>(
new LinearResidual(x[i], y[i])),
NULL, &k, &d);
}
Covariance::Options options;
Covariance covariance(options);
std::vector<std::pair<const double*, const double *>> covariance_blocks;
covariance_blocks.push_back(std::make_pair(&k,&k));
covariance_blocks.push_back(std::make_pair(&d,&d));
CHECK(covariance.Compute(covariance_blocks,&problem));
double covariance_kk;
double covariance_dd;
covariance.GetCovarianceBlock(&k,&k, &covariance_kk);
covariance.GetCovarianceBlock(&d,&d, &covariance_dd);
std::cout << "Covariance test k " << covariance_kk << std::endl;
std::cout << "Covariance test d " << covariance_dd << std::endl;
}
It compiles and produces output, but the results are quite off from what I get from scipy so I must have made a mistake.
Solve the problem and then use the ceres::Covariance class.
http://ceres-solver.org/nnls_covariance.html
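As a cross-check that does not depend on Ceres, here is a small sketch of the quantity I understand the Covariance class to estimate, namely the inverse of J^T J under the assumption of unit-variance observations. For the line y = k*x + d that 2x2 matrix has a closed form. The struct and function names below are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Entries of the 2x2 covariance matrix for the parameters (k, d).
struct LineCovariance {
    double kk;  // variance of the slope k
    double kd;  // covariance between k and d
    double dd;  // variance of the offset d
};

// Computes inverse(J^T J) for residuals r_i = y_i - (k * x_i + d).
// Each Jacobian row is (-x_i, -1), so J^T J = [[sum x^2, sum x], [sum x, n]].
LineCovariance line_fit_covariance(const std::vector<double>& x) {
    double sxx = 0.0, sx = 0.0;
    const double n = static_cast<double>(x.size());
    for (double xi : x) {
        sxx += xi * xi;
        sx += xi;
    }
    const double det = sxx * n - sx * sx;
    assert(std::fabs(det) > 0.0);  // needs at least two distinct x values
    return {n / det, -sx / det, sxx / det};
}
```

If the numbers from Ceres match this but still differ from scipy, the difference may come from how the other tool scales the covariance rather than from the Ceres call itself. That is an assumption worth checking against the documentation of both libraries.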

Halide: Reduction over a domain for the specific values

I have a Func f(x, y, z) whose values are either 1 or 0, and I need to get the first 100 coordinates where the value equals 1, so that I can update them to 0.
This is very simple to do in C and other languages. However, I've been trying to solve it with Halide for a couple of days. Is there any function or algorithm that I can use to solve it in Halide Generators?
The question amounts to "How do I implement stream compaction in Halide?" There is much written on parallel stream compaction and it is somewhat non-trivial to do well. See this Stack Overflow answer on doing it in CUDA for some discussion and references: CUDA stream compaction algorithm
A quick implementation of simple stream compaction in Halide using a prefix sum looks like so:
#include "Halide.h"
#include <iostream>
using namespace Halide;
static void print_1d(const Buffer<int32_t> &result) {
std::cout << "{ ";
const char *prefix = "";
for (int i = 0; i < result.dim(0).extent(); i++) {
std::cout << prefix << result(i);
prefix = ", ";
}
std::cout << "}\n";
}
int main(int argc, char **argv) {
uint8_t vals[] = {0, 10, 99, 76, 5, 200, 88, 15};
Buffer<uint8_t> in(vals);
Var x;
Func prefix_sum;
RDom range(1, in.dim(0).extent() - 1);
prefix_sum(x) = (int32_t)0;
prefix_sum(range) = select(in(range - 1) > 42, prefix_sum(range - 1) + 1, prefix_sum(range - 1));
RDom in_range(0, in.dim(0).extent());
Func compacted_indices;
compacted_indices(x) = -1;
compacted_indices(clamp(prefix_sum(in_range), 0, in.dim(0).extent() - 1)) = select(in(in_range) > 42, in_range, - 1);
Buffer<int32_t> sum = prefix_sum.realize(8);
Buffer<int32_t> indices = compacted_indices.realize(8);
print_1d(sum);
print_1d(indices);
return 0;
}
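For comparison, the same idea can be written as plain sequential loops, which may make the two update definitions above easier to follow. This is only a sketch: `compact_indices` is a hypothetical name, and unlike the Halide version it scatters only the kept elements instead of also writing -1 for the rejected ones.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the indices of all elements greater than `threshold`, packed at the
// front of the result; unused trailing slots stay at -1.
std::vector<int32_t> compact_indices(const std::vector<uint8_t>& in,
                                     uint8_t threshold) {
    const std::size_t n = in.size();

    // Exclusive prefix sum: prefix[i] counts the kept elements before i,
    // which is exactly the output slot that element i should go to.
    std::vector<int32_t> prefix(n, 0);
    for (std::size_t i = 1; i < n; ++i) {
        prefix[i] = prefix[i - 1] + (in[i - 1] > threshold ? 1 : 0);
    }

    // Scatter each kept index to its slot.
    std::vector<int32_t> out(n, -1);
    for (std::size_t i = 0; i < n; ++i) {
        if (in[i] > threshold) {
            out[prefix[i]] = static_cast<int32_t>(i);
        }
    }
    return out;
}
```

With the input from the answer, `{0, 10, 99, 76, 5, 200, 88, 15}` and a threshold of 42, this returns `{2, 3, 5, 6, -1, -1, -1, -1}`. Taking the first 100 coordinates then amounts to reading the first 100 entries that are not -1.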

performance tuning on Eigen sparse matrix

I've implemented something using Eigen's SparseMatrix, basically it's something like,
SparseMatrix W;
...
W.row(i) += X.row(j); // X is another SparseMatrix, both W and X are row major.
...
I profiled the code with google-pprof, and I think the line above is the problem (see fig 1, fig 2 and fig 3).
It looks like operator+= brings in a lot of memory copying.
I don't know much about the internals of SparseMatrix operations, but is there any recommended way to optimize the above code?
If the sparsity pattern of X is a subset of the sparsity pattern of W, then you can write your own function that does the addition in place:
#include <Eigen/Sparse>
#include <cassert>
#include <iostream>
namespace Eigen {
template<typename Dst, typename Src>
void inplace_sparse_add(Dst &dst, const Src &src)
{
EIGEN_STATIC_ASSERT( ((internal::evaluator<Dst>::Flags&RowMajorBit) == (internal::evaluator<Src>::Flags&RowMajorBit)),
THE_STORAGE_ORDER_OF_BOTH_SIDES_MUST_MATCH);
using internal::evaluator;
evaluator<Dst> dst_eval(dst);
evaluator<Src> src_eval(src);
assert(dst.rows()==src.rows() && dst.cols()==src.cols());
for (Index j=0; j<src.outerSize(); ++j)
{
typename evaluator<Dst>::InnerIterator dst_it(dst_eval, j);
typename evaluator<Src>::InnerIterator src_it(src_eval, j);
while(src_it)
{
while(dst_it && dst_it.index()!=src_it.index())
++dst_it;
assert(dst_it);
dst_it.valueRef() += src_it.value();
++src_it;
}
}
}
}
Here is a usage example:
using namespace Eigen;
using namespace std;
int main()
{
int n = 10;
MatrixXd R = MatrixXd::Random(n,n);
SparseMatrix<double, RowMajor> A = R.sparseView(0.25,1), B = 0.5*R.sparseView(0.65,1);
cout << A.toDense() << "\n\n" << B.toDense() << "\n\n";
inplace_sparse_add(A, B);
cout << A.toDense() << "\n\n";
auto Ai = A.row(2);
inplace_sparse_add(Ai, B.row(2));
cout << A.toDense() << "\n\n";
}
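The inner loop is easier to see without the Eigen evaluator machinery. The sketch below uses a made-up `SparseRow` struct (sorted column indices plus values) in place of Eigen's storage and applies the same two-pointer walk to a single row. It assumes, as the answer does, that every index in the source also exists in the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one compressed sparse row:
// `index` holds column indices in ascending order, `value` the matching entries.
struct SparseRow {
    std::vector<int> index;
    std::vector<double> value;
};

// Adds src into dst without allocating or changing dst's sparsity pattern.
void inplace_row_add(SparseRow& dst, const SparseRow& src) {
    std::size_t d = 0;
    for (std::size_t s = 0; s < src.index.size(); ++s) {
        // Advance in dst until the matching column is found.
        while (d < dst.index.size() && dst.index[d] != src.index[s]) {
            ++d;
        }
        // Precondition: src's pattern is a subset of dst's pattern.
        assert(d < dst.index.size());
        dst.value[d] += src.value[s];
    }
}
```

Because both index lists are sorted, each row is processed in a single pass over dst, and nothing is reallocated. That is the point of the in-place version compared with `W.row(i) += X.row(j)`.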
