Argument forwarding in LLVM - performance

I need some advice on "forwarding" arguments to a callee (in the LLVM-IR).
Suppose I have a function F that is called at the beginning of all other functions in the module. From F I need to access (read) the arguments passed to its immediate caller.
Right now to do this I box all arguments in the caller inside a struct and pass a i8* pointer to the struct to F, alongside an identifier telling which caller F is being called from. F has then a giant switch that branches to the appropriate unboxing code. This must be done because the functions in the module have differing signatures (differing argument/return value count and types; even differing calling conventions), but it is obviously suboptimal (both from a performance and code size point-of-view) because I need to allocate the struct on the stack, copy the arguments inside of it, passing an additional pointer to F and then performing the unboxing.
I was wondering if there's a better way to do this, i.e. a way to access from a function the stack frame of its immediate caller (knowing, thanks to the identifier, which caller the function was called from) or, more in general, arbitrary values defined in its immediate caller. Any suggestions?
note: the whole point of what I'm working on is having a single function F that does all this; splitting/inlining/specializing/templating F is not an option.
to clarify, suppose we have the following functions FuncA and FuncB (note: what follows is just pseudo-C-code, always remember we are talking about LLVM-IR!)
Type1 FuncA(Type2 ArgA1) {
F();
// ...
}
Type3 FuncB(Type4 ArgB1, Type5 ArgB2, Type6 ArgB3) {
F();
// ...
}
what I need is an efficient way for the function F to do the following:
void F() {
switch (caller) {
case FuncA:
// do something with ArgA1
break;
case FuncB:
// do something with ArgB1, ArgB2, ArgB3
break;
}
}
as I explained in the first part, right now my F looks like this:
struct Args_FuncA { Type2 ArgA1 };
struct Args_FuncB { Type4 ArgB1, Type5 ArgB2, Type6 ArgB3 };
void F(int callerID, void *args) {
switch (callerID) {
case ID_FuncA:
Args_FuncA *ArgsFuncA = (Args_FuncA*)args;
Type2 ArgA1 = ArgsFuncA->ArgA1;
// do something with ArgA1
break;
case ID_FuncB:
Args_FuncB *ArgsFuncB = (Args_FuncB*)args;
Type4 ArgB1 = ArgsFuncB->ArgB1;
Type5 ArgB2 = ArgsFuncB->ArgB2;
Type6 ArgB3 = ArgsFuncB->ArgB3;
// do something with ArgB1, ArgB2, ArgB3
break;
}
}
and the two functions become:
Type1 FuncA(Type2 ArgA1) {
Args_FuncA args = { ArgA1 };
F(ID_FuncA, (void*)&args);
// ...
}
Type3 FuncB(Type4 ArgB1, Type5 ArgB2, Type6 ArgB3) {
Args_FuncB args = { ArgB1, ArgB2, ArgB3 };
F(ID_FuncB, (void*)&args);
// ...
}

IMHO you've done it right. While there are solutions in machinecode assembly, I am afraid there might be no solution in LLVM assembly, as it's "higher level". If you'd like to run a function on the beginning of some functions have you thought about checking
debugger sources (like gdb)
Binary Instrumentation with Valgrind
I know it's not direct answer, but I hope it might be helpful in some way ;).

Not sure if this helps, but I had a similar problem and got around the limitations of LLVM's tbaa analysis by using a llvm vector to store the intermediate values. LLVM optimization passes were later able to optimize the vector load / stores into scalar registers.
There were a few caveats as I recall. Let me know if you explore this route and I can dig up some code.

Related

Call golang function from Tcl sript

We use a third party Tcl parsing library to validation Tcl script for both syntax and semantic checking. The driver was written in C and defined a set of utility functions. Then it calls Tcl_CreateObjCommand so the script could call these C functions. Now we are in the process of porting the main program to go and I could not find a way to do this. Anyone know a way to call golang functions from Tcl script?
static int
create_utility_tcl_cmds(Tcl_Interp* interp)
{
if (Tcl_CreateObjCommand(interp, "ip_v4_address",
ip_address, (ClientData)AF_INET, NULL) == NULL) {
TCL_CHECKER_TCL_CMD_EVENT(0, "ip_v4_address");
return -1;
}
.....
return 0;
}
Assuming you've set the relevant functions as exported and built the Go parts of your project as in
Using Go code in an existing C project
[…]
The important things to note are:
The package needs to be called main
You need to have a main function, although it can be empty.
You need to import the package C
You need special //export comments to mark the functions you want callable from C.
I can compile it as a C callable static library with the following command:
go build -buildmode=c-archive foo.go
Then the core of what remains to be done is to write the C glue function from Tcl's API to your Go code. That will involve a function something like:
static int ip_address_glue(
ClientData clientData, Tcl_Interp *interp, int objc, Tcl_Obj *const *objv) {
// Need an explicit cast; ClientData is really void*
GoInt address_family = (GoInt) clientData;
// Check for the right number of arguments
if (objc != 2) {
Tcl_WrongNumArgs(interp, 1, objv, "address");
return TCL_ERROR;
}
// Convert the argument to a Go string
GoString address;
int len;
address.p = Tcl_GetStringFromObj(objv[1], &len);
address.n = len; // This bit is hiding a type mismatch
// Do the call; I assume your Go function is called ip_address
ip_address(address_family, address);
// Assume the Go code doesn't fail, so no need to map the failure back to Tcl
return TCL_OK;
}
(Credit to https://medium.com/learning-the-go-programming-language/calling-go-functions-from-other-languages-4c7d8bcc69bf for providing enough information for me to work out some of the type bindings.)
That's then the function that you register with Tcl as the callback.
Tcl_CreateObjCommand(interp, "ip_v4_address", ip_address_glue, (ClientData)AF_INET, NULL);
Theoretically, a command registration can fail. Practically, that only happens when the Tcl interpreter (or a few critical namespaces within it) is being deleted.
Mapping a failure into Tcl is going to be easiest if it is encoded at the Go level as an enumeration. Probably easiest to represent success as zero. With that, you'd then do:
GoInt failure_code = ip_address(address_family, address);
switch (failure_code) {
case 0: // Success
return TCL_OK;
case 1: // First type of failure
Tcl_SetResult(interp, "failure of type #1", TCL_STATIC);
return TCL_ERROR;
// ... etc for each expected case ...
default: // Should be unreachable, yes?
Tcl_SetObjResult(interp, Tcl_ObjPrintf("unexpected failure: %d", failure_code));
return TCL_ERROR;
}
Passing back more complex return types with tuples of values (especially a combination of a success indicator and a “real” result value) should also be possible, but I've not got a Go development environment in order to probe how they're mapped at the C level.

Why do I need to set the lambda capture?

I have not much experience in using lambda's - I was hoping someone could explain what I did below in 'layman's terms' (if possible).
I have a std::vector with a number of objects (or none). Each object has an id. I want to place the object with the id I am interested in at the back of the vector.
I did that like so
std::vector<my_ob> l_obs;
[...] // populate the vector
auto l_elem = std::find_if(l_obs.rbegin(),
l_obs.rend(), [](my_ob const& ob){ return ob.mv_id == 8;});
if(l_elem-l_obs.rbegin())
std::iter_swap(l_elem, l_obs.rbegin());
I am using a reverse iterator as I expect the match to already be at the back of the vector in most cases.
The above worked fine, until I moved it into a method and instead of trying to find '8', I wanted to find a value passed as a const int parameter. The compiler told me that the parameter I used was not captured, and that the lambda had no capture default. So I changed the lambda to
[=](my_ob const& ob){ return ob.mv_id == _arg;}
and this all seems to work now.
Why was this = sign needed?
Lambda expressions produce closure objects, which are function objects (similar to a struct with an overloaded operator()).
In order for closures to use variables in the outer scope, they must know how: either by copying the variable into the closure itself, or by referring to it.
Writing
[=](my_ob const& ob){ return ob.mv_id == _arg;}
is equivalent to
[_arg](my_ob const& ob){ return ob.mv_id == _arg;}
which roughly desugars to
struct LAMBDA
{
int _arg;
LAMBDA(int arg) : _arg{arg} { }
auto operator()(my_ob const& ob) const { return ob.mv_id == _arg; }
};
As you can see, _arg needs to be available in the scope of the generated LAMBDA function object, so it needs to be a data member of the closure.
When you were using a literal, no captures were needed as the generated closure looked like:
struct LAMBDA
{
auto operator()(my_ob const& ob) const { return ob.mv_id == 5; }
};

Reimplement insert iterator to make easy work with "related types" to the one stored on the container

We have the following lightweight classes:
struct A {};
struct B { A get_a() const { return /* sth */; } };
And let's suppose I have an ordered container of type A, and I want to copy objects from another container of type B to it:
std::copy(b_cont.begin(), b_cont.end(),
std::make_insert_iterator(a_cont, a_cont.end())
);
Of course, it won't work because a_cont and b_cont have different types, and classes A and B don't provide conversion operators. The most obvious solution is to call the function get_a for each B object on the range [b_cont.begin(), b_cont.end()), so, lets write a custom insert iterator:
template<template<class...> class container>
struct ba_insert_iterator : public std::insert_iterator<container<A> >
{
using std::insert_iterator<container<A>>::insert_iterator;
ba_insert_iterator& operator=(B const& o)
{
std::insert_iterator<container<A>>::operator=(o.get_a());
return *this;
}
};
template<template<class...> class container>
ba_insert_iterator<container> make_a_inserter(container<A>& c)
{ return ba_insert_iterator<container>(c, c.end()); }
Just an iterator that receives an object of type B, instead of another one of type A, and a function to create them easily. But when instantiating the template:
std::copy(b_cont.begin(), b_cont.end(), make_a_inserter(a_cont));
It says that it doesn't find the operator= because the second operand is an A object (as expected), but the first operand is an std::insert_iterator<std::set<A> >!!, so the compiler is "casting" the iterator to its clase base, which of course lacks of a method for receiving B objects to insert.
Why does it happen?
You inherited operator* (and operator++ too, for that matter) from insert_iterator.
And those return insert_iterator&, not ba_insert_iterator&.
For obvious reasons, std::copy dereferences the output iterator before assigning to it.

Trying to use lambda functions as predicate for condition_variable wait method

I am trying to make the producer-consumer method using c++11 concurrency. The wait method for the condition_variable class has a predicate as second argument, so I thought of using a lambda function:
struct LimitedBuffer {
int* buffer, size, front, back, count;
std::mutex lock;
std::condition_variable not_full;
std::condition_variable not_empty;
LimitedBuffer(int size) : size(size), front(0), back(0), count(0) {
buffer = new int[size];
}
~LimitedBuffer() {
delete[] buffer;
}
void add(int data) {
std::unique_lock<std::mutex> l(lock);
not_full.wait(l, [&count, &size]() {
return count != size;
});
buffer[back] = data;
back = (back+1)%size;
++count;
not_empty.notify_one();
}
int extract() {
std::unique_lock<std::mutex> l(lock);
not_empty.wait(l, [&count]() {
return count != 0;
});
int result = buffer[front];
front = (front+1)%size;
--count;
not_full.notify_one();
return result;
}
};
But I am getting this error:
[Error] capture of non-variable 'LimitedBuffer::count'
I don't really know much about c++11 and lambda functions so I found out that class members can't be captured by value. By value though, I am capturing them by reference, but it seems like it's the same thing.
In a display of brilliance I stored the struct members values in local variables and used them in the lambda function, and it worked! ... or not:
int ct = count, sz = size;
not_full.wait(l, [&ct, &sz]() {
return ct != sz;
});
Obviously I was destroying the whole point of the wait function by using local variables since the value is assigned once and the fun part is checking the member variables which may, should and will change. Silly me.
So, what are my choices? Is there any way I can make the wait method do what it has to do, using the member variables? Or I am forced to not use lambda functions so I'd have to declare auxiliary functions to do the work?
I don't really get why I can't use members variables in lambda functions, but since the masters of the universe dessigned lamba functions for c++11 this way, there must be some good reason.
count is a member variable. Member variables can not be captured directly. Instead, you can capture this to achieve the same effect:
not_full.wait(l, [this] { return count != size; });

C++11 Magic to test and assign from pointer if not nullptr

Okay, I believe in defensive programming. I assume that if I get a pointer it might be null (especially when using GSOAP). Therefore before I try to use the value of the pointer, I always check to make sure the pointer is not null.
In my current code, this is leading to a lot of nearly identical statements.
if (res->A) {
item.out_trace->a = *res->A;
}
if (res->B) {
item.out_trace->b = *res->B;
}
if (res->C) {
item.out_trace->b = *res->C;
}
I realize that I could always go and define a macro for this, but I am wondering if there is a neat C++11 trick to do that. I would love something like the C# ??
// Set y to the value of x if x is NOT null; otherwise,
// if x = null, set y to -1.
int y = x ?? -1;
Thanks.
Perhaps a template like this would meet your need:
template<typename T>
T safe_get( T const *ptr, T defval = T{} ) {
return ptr ? *ptr : std::move(defval);
}
It could be used like this:
item.out_trace->a = safe_get( rez->A );
Ideally it would be inlined and effectively zero-overhead (other than the inherent overhead of doing the safety check and having a branch, of course).

Resources