How to rb_protect everything in Ruby

I want to call ruby code from my own C code. In case an exception gets raised, I have to rb_protect the ruby code I call. rb_protect looks like this:
VALUE rb_protect(VALUE (* proc) (VALUE), VALUE data, int * state)
So proc has to be a function which takes VALUE arguments and returns VALUE. I have to call a lot of functions which do not work that way. How can I rb_protect them from raising exceptions?
I have thought of using Data_Make_Struct to wrap everything into one ruby object and call methods on it. Data_Make_Struct could itself raise an exception. How do I rb_protect Data_Make_Struct?

To use rb_protect in a flexible way (e.g., to call a Ruby function with an arbitrary number of arguments), pass a small dispatch function to rb_protect. Ruby requires that sizeof(VALUE) == sizeof(void*), and rb_protect blindly passes the VALUE-typed data to the dispatch function without inspecting or modifying it. This means that you can pass a pointer to whatever data you want to the dispatch function, then let it unpack the data and call the appropriate Ruby method(s).
For example, to rb_protect a call to a Ruby method, you might use something like this:
#define MAX_ARGS 16
struct my_callback_stuff {
    VALUE obj;
    ID method_id;
    int nargs;
    VALUE args[MAX_ARGS];
};
VALUE my_callback_dispatch(VALUE rdata)
{
    /* rdata is really a pointer to our struct, passed through a VALUE */
    struct my_callback_stuff* data = (struct my_callback_stuff*) rdata;
    return rb_funcall2(data->obj, data->method_id, data->nargs, data->args);
}
... in some other function ...
{
    /* need to call Ruby */
    struct my_callback_stuff stuff;
    stuff.obj = the_object_to_call;
    stuff.method_id = rb_intern("the_method_id");
    stuff.nargs = 3;
    stuff.args[0] = INT2FIX(1);
    stuff.args[1] = INT2FIX(2);
    stuff.args[2] = INT2FIX(3);
    int state = 0;
    VALUE ret = rb_protect(my_callback_dispatch, (VALUE)(&stuff), &state);
    if (state) {
        /* ... error processing happens here ... */
    }
}
Also, keep in mind that rb_rescue or rb_ensure may be a better approach for some problems.
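The dispatch pattern above can be tried without Ruby at all. The following is a minimal sketch in standalone C++, not Ruby's implementation: value_t is a hypothetical stand-in for VALUE, protect() is a hypothetical stand-in for rb_protect, and C++ exceptions play the role of Ruby exceptions. checked_divide, divide_args and safe_divide are names invented for the illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for Ruby's VALUE: an integer wide enough for a pointer.
using value_t = std::uintptr_t;

// Hypothetical stand-in for rb_protect: runs proc(data) and reports failure
// through *state instead of letting the exception propagate.
value_t protect(value_t (*proc)(value_t), value_t data, int* state)
{
    try {
        *state = 0;
        return proc(data);
    } catch (const std::exception&) {
        *state = 1;
        return 0;
    }
}

// Everything the real call needs, packed into one struct.
struct divide_args {
    int numerator;
    int denominator;
};

// The function we actually want to call; its signature does not fit protect().
int checked_divide(int numerator, int denominator)
{
    if (denominator == 0) throw std::domain_error("division by zero");
    return numerator / denominator;
}

// Dispatch function: unpack the struct and forward to the real function.
value_t divide_dispatch(value_t raw)
{
    const divide_args* args = reinterpret_cast<const divide_args*>(raw);
    return static_cast<value_t>(checked_divide(args->numerator, args->denominator));
}

// Convenience wrapper: returns true on success and stores the quotient.
bool safe_divide(int numerator, int denominator, int* quotient)
{
    divide_args args{numerator, denominator};
    int state = 0;
    value_t result = protect(divide_dispatch, reinterpret_cast<value_t>(&args), &state);
    if (state != 0) return false;
    *quotient = static_cast<int>(result);
    return true;
}
```

The struct lives on the caller's stack, which is safe here because protect() returns before the struct goes out of scope; the same holds for the stuff variable in the rb_protect example.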

Related

Ruby C extension : How do I know that a ruby VALUE generated in my C code will be correctly cleaned by GC?

I'm trying to write a really small C extension, so I don't want to make a whole Ruby class with an initializer, allocator, and so forth. All I want to do is add a static method to an existing class; the method will run an algorithm and return a result. Unfortunately, all the documentation I can find only talks about wrapping a C struct in a VALUE, but that's not my use case.
What I want to know: if I create a Ruby object (which will allocate memory) inside my C code and return it as the result of my function, will it be taken care of properly by the garbage collector, or is it going to leak?
Example:
/* defined before Init_my_extension so that it is declared at the point of use */
VALUE method_big_calc(VALUE self, VALUE input)
{
    VALUE result;
    result = rb_ary_new();
    return result;
}
void Init_my_extension()
{
    VALUE cFooModule;
    cFooModule = rb_const_get(rb_cObject, rb_intern("Foo"));
    rb_define_singleton_method(cFooModule, "big_calc", method_big_calc, 1);
}
Will the array allocated by rb_ary_new() be properly cleaned up when it's not used anymore? How is the garbage collector aware of references to this value?
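Yes. An array created with rb_ary_new() is an ordinary Ruby object, so the garbage collector reclaims it once nothing references it anymore. While your C function runs, Ruby's GC scans the C stack conservatively, so a VALUE held in a local variable is treated as live; once you return it, the caller's reference keeps it alive.
In my opinion, the question you really need answered is a different one: how to create your own object. See:
http://www.onlamp.com/pub/a/onlamp/2004/11/18/extending_ruby.html
First, register an allocator with rb_define_alloc_func(cYourObject, t_allocate), along these lines:
struct stru { char a; };
static void t_free(void *p) { xfree(p); }
static VALUE t_allocate(VALUE klass)
{
    struct stru *m = ALLOC(struct stru); /* allocate the struct to wrap */
    return Data_Wrap_Struct(klass, NULL, t_free, m);
}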

Why do I need to set the lambda capture?

I don't have much experience with lambdas, so I was hoping someone could explain what I did below in layman's terms (if possible).
I have a std::vector with a number of objects (or none). Each object has an id. I want to place the object with the id I am interested in at the back of the vector.
I did that like so
std::vector<my_ob> l_obs;
[...] // populate the vector
auto l_elem = std::find_if(l_obs.rbegin(),
l_obs.rend(), [](my_ob const& ob){ return ob.mv_id == 8;});
if(l_elem-l_obs.rbegin())
std::iter_swap(l_elem, l_obs.rbegin());
I am using a reverse iterator as I expect the match to already be at the back of the vector in most cases.
The above worked fine, until I moved it into a method and instead of trying to find '8', I wanted to find a value passed as a const int parameter. The compiler told me that the parameter I used was not captured, and that the lambda had no capture default. So I changed the lambda to
[=](my_ob const& ob){ return ob.mv_id == _arg;}
and this all seems to work now.
Why was this = sign needed?
Lambda expressions produce closure objects, which are function objects (similar to a struct with an overloaded operator()).
In order for closures to use variables in the outer scope, they must know how: either by copying the variable into the closure itself, or by referring to it.
Writing
[=](my_ob const& ob){ return ob.mv_id == _arg;}
is, in this case, equivalent to
[_arg](my_ob const& ob){ return ob.mv_id == _arg;}
because the capture default [=] copies every outer variable the body uses, and _arg is the only one. This roughly desugars to
struct LAMBDA
{
    int _arg;
    LAMBDA(int arg) : _arg{arg} { }
    auto operator()(my_ob const& ob) const { return ob.mv_id == _arg; }
};
As you can see, _arg needs to be available in the scope of the generated LAMBDA function object, so it needs to be a data member of the closure.
When you were using a literal, no captures were needed, as the generated closure looked like:
struct LAMBDA
{
    auto operator()(my_ob const& ob) const { return ob.mv_id == 8; }
};
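To make the capture rules concrete, here is a small self-contained sketch of the asker's scenario. The function names move_to_back and capture_difference are invented for this illustration, and the not-found check is an addition that the original snippet did not have.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

struct my_ob {
    int mv_id;
};

// Moves the last element whose id equals `id` to the back of the vector.
// Returns false if no element matches.
bool move_to_back(std::vector<my_ob>& obs, const int id)
{
    // `id` is a local of the enclosing function, so it must be captured.
    // [id] copies it into the closure; [=] would do the same here.
    auto match = std::find_if(obs.rbegin(), obs.rend(),
                              [id](my_ob const& ob) { return ob.mv_id == id; });
    if (match == obs.rend()) return false;   // not found: nothing to swap
    if (match != obs.rbegin()) std::iter_swap(match, obs.rbegin());
    return true;
}

// By-value capture takes a snapshot; by-reference capture sees later changes.
int capture_difference()
{
    int threshold = 8;
    auto by_value = [threshold] { return threshold; };
    auto by_reference = [&threshold] { return threshold; };
    threshold = 42;
    return by_reference() - by_value();   // 42 - 8
}
```

Naming the captured variable explicitly, as in [id], is often preferred over [=] because it documents exactly what the closure depends on.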

ruby c extension how to manage garbage collection between 2 objects

I have a C extension in which I have a main class (class A for example) created with the classical:
Data_Wrap_Struct
rb_define_alloc_func
rb_define_private_method(mymodule, "initialize" ...)
Class A has an instance method that generates B objects. B objects can only be created from A objects, and the C data they wrap depends on the data wrapped in the A instance.
If the A object is collected by the garbage collector before a B object, this could result in a segfault.
How can I tell the GC not to collect an A instance while some of its B objects still exist? I guess I have to use rb_gc_mark or something like that. Do I have to mark the A instance each time a B object is created?
Edit: More specific information
I am trying to write a Clang extension. With Clang, you first create a CXIndex, from which you can get a CXTranslationUnit, from which you can get a CXDiagnostic and/or a CXCursor, and so on. Here is a simple illustration:
Clangc::Index#new => Clangc::Index
Clangc::Index#create_translation_unit => Clangc::TranslationUnit
Clangc::TranslationUnit#diagnostic(index) => Clangc::Diagnostic
You can see some code here : https://github.com/cedlemo/ruby-clangc
Edit 2 : A solution
The stuff to build the "b" objects with a reference to the "a" object:
typedef struct B_t {
    void * data;
    VALUE instance_of_a;
} B_t;
static void
c_b_struct_free(B_t *s)
{
    if(s)
    {
        if(s->data)
            a_function_to_free_the_data(s->data);
        ruby_xfree(s);
    }
}
static void
c_b_mark(void *s)
{
    B_t *b =(B_t *)s;
    /* keep the owning A instance alive for as long as this B is alive */
    rb_gc_mark(b->instance_of_a);
}
VALUE
c_b_struct_alloc( VALUE klass)
{
    B_t * ptr;
    ptr = (B_t *) ruby_xmalloc(sizeof(B_t));
    ptr->data = NULL;
    ptr->instance_of_a = Qnil;
    return Data_Wrap_Struct(klass, c_b_mark, c_b_struct_free, (void *) ptr);
}
The c function that is used to build a "b" object from an "a" object:
VALUE c_A_get_b_object( VALUE self, VALUE arg)
{
    VALUE mModule = rb_const_get(rb_cObject, rb_intern("MainModule"));
    VALUE cKlass = rb_const_get(mModule, rb_intern("B"));
    VALUE b_instance = rb_class_new_instance(0, NULL, cKlass);
    B_t *b;
    Data_Get_Struct(b_instance, B_t, b);
    /*
    transform ruby value arg to C value c_arg
    */
    b->data = function_to_fill_the_data(c_arg);
    b->instance_of_a = self;
    return b_instance;
}
In the Init_mainModule function:
void Init_mainModule(void)
{
    VALUE mModule = rb_define_module("MainModule");
    /*some code ....*/
    VALUE cKlass = rb_define_class_under(mModule, "B", rb_cObject);
    rb_define_alloc_func(cKlass, c_b_struct_alloc);
}
The same use of rb_gc_mark can be found in mysql2/ext/mysql2/client.c (the rb_mysql_client_mark function) in the project https://github.com/brianmario/mysql2
In the mark function for your B class, you should mark the A Ruby object, telling the garbage collector not to garbage collect it.
The mark function can be specified as the second argument to Data_Wrap_Struct. You might need to modify your design somehow to expose a pointer to the A objects.
Another option is to let the A object be an instance variable of the B object. You should probably do this anyway so that Ruby code can obtain the A object from the B object. Doing this would have the side effect of making the garbage collector not collect the A before the B, but you should not be relying on this side effect because it would be possible for your Ruby code to accidentally mess up the instance variable and then cause a segmentation fault.
Edit: Another option is to use reference counting of the shared C data. Then when the last Ruby object that is using that shared data gets garbage collected, you would delete the shared data. This would involve finding a good, cross-platform, thread-safe way to do reference counting so it might not be trivial.
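The reference-counting option can be sketched in standalone C++ with std::shared_ptr, whose reference count is updated atomically. This is only an illustration of the ownership idea, not Ruby C API code: SharedData, A, B, make_b and data_survives_a are hypothetical names, and in a real extension the free functions of the wrapped structs would be the places that drop the references.

```cpp
#include <cassert>
#include <memory>

// Hypothetical shared C-level data owned by A and needed by every B.
struct SharedData {
    explicit SharedData(bool* freed) : freed_flag(freed) {}
    ~SharedData() { *freed_flag = true; }   // record when the data is really freed
    bool* freed_flag;
};

// Plays the role of the A wrapper.
struct A {
    std::shared_ptr<SharedData> data;
};

// Plays the role of the B wrapper: holds its own reference to A's data.
struct B {
    std::shared_ptr<SharedData> data;
};

B make_b(const A& a)
{
    return B{a.data};   // copying the shared_ptr bumps the reference count
}

// Destroys A first, then B, and reports whether the data outlived A
// and was released once the last owner went away.
bool data_survives_a()
{
    bool freed = false;
    bool alive_after_a = false;
    {
        std::unique_ptr<B> b;
        {
            A a{std::make_shared<SharedData>(&freed)};
            b.reset(new B(make_b(a)));
        }   // a is gone here, like A being collected first
        alive_after_a = !freed;
    }       // the last owner (b) is gone: the data is freed now
    return alive_after_a && freed;
}
```

With this design the order in which the two wrappers are destroyed no longer matters, at the cost of not keeping the A Ruby object itself alive.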

modify captured array c++11 lambda function

I'm writing a Windows Phone application with C++/CX. The function tries to copy the input array to the output array asynchronously:
IAsyncAction CopyAsync(const Platform::Array<byte, 1>^ input, Platform::WriteOnlyArray<byte, 1>^ output)
{
    byte *inputData = input->Data;
    byte *outputData = output->Data;
    int byteCount = input->Length;
    // if I put it here, there is no error
    //memcpy_s(outputData, byteCount, inputData, byteCount);
    return concurrency::create_async([&]() -> void {
        memcpy_s(outputData, byteCount, inputData, byteCount); // access violation exception
        return;
    });
}
This function compiles but cannot run correctly and produces an "Access violation exception". How can I modify values in the output array?
This is Undefined Behaviour: by the time you use your 3 captured (by reference) variables inputData/outputData/byteCount in the lambda, you already returned from CopyAsync and the stack has been trashed.
It's really the same issue as if you returned a reference to a local variable from a function (which we know is evil), except that here the references are hidden inside the lambda so it's a bit harder to see at first glance.
If you are sure that input and output won't change and will still be reachable between the moment you call CopyAsync and the moment you run the asynchronous action, you can capture your variables by value instead of by reference:
return concurrency::create_async([=]() -> void {
//                                ^ here
    memcpy_s(outputData, byteCount, inputData, byteCount);
    return;
});
Since they're only pointers (and an int), you won't be copying the pointed-to data, only the pointers themselves.
Or you could just capture input and output by value: since they're garbage-collected pointers this will at least make sure the objects are still reachable by the time you run the lambda:
return concurrency::create_async([=]() -> void {
    memcpy_s(output->Data, input->Length, input->Data, input->Length);
    return;
});
I for one prefer this second solution, it provides more guarantees (namely, object reachability) than the first one.
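The reachability argument can be reproduced in standard C++, with std::shared_ptr standing in for the garbage-collected ^ handles and std::function standing in for the async action. This is a sketch under those assumptions, not C++/CX code; Buffer, make_copy_task and run_after_release are names made up for the example.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <memory>
#include <vector>

using Buffer = std::vector<unsigned char>;

// Returns a deferred copy operation, standing in for the async action.
// The shared_ptr parameters are captured BY VALUE, so the closure co-owns both
// buffers and they stay alive until the closure itself is destroyed.
std::function<void()> make_copy_task(std::shared_ptr<const Buffer> input,
                                     std::shared_ptr<Buffer> output)
{
    return [input, output] {
        output->resize(input->size());
        if (!input->empty())
            std::memcpy(output->data(), input->data(), input->size());
    };
    // Capturing with [&] here would store references to the parameters
    // `input` and `output`, which no longer exist once this function returns.
}

// Runs the task after the caller has dropped its own reference to the input.
Buffer run_after_release()
{
    std::shared_ptr<const Buffer> input = std::make_shared<Buffer>(Buffer{1, 2, 3});
    auto output = std::make_shared<Buffer>();
    std::function<void()> task = make_copy_task(input, output);
    input.reset();   // the closure's copy still keeps the buffer alive
    task();
    return *output;
}
```

Capturing the raw data pointers by value instead would compile just as well, but nothing would then keep the buffers alive until the task runs.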

v8::FunctionTemplate referencing a non-global variable

Google's v8 documentation describes how to add a global function to a JavaScript context. We can implement a printf-like function quite easily using the new lambda feature from C++11:
Handle<ObjectTemplate> global = ObjectTemplate::New();
global->Set(String::New("print"), FunctionTemplate::New(
    [](const v8::Arguments &args) -> v8::Handle<v8::Value>
    {
        v8::String::AsciiValue ascii(args[0]);
        std::cout << *ascii << "\n";
    } ));
Persistent<Context> context = Context::New(NULL, global);
This works well for any global JavaScript function that is either stateless or references a global C++ variable (e.g. std::cout). But what if we want our global JavaScript function to reference a non-global C++ variable? For example, suppose we are creating several different JavaScript contexts, each with its own global print function that uses a different C++ std::ostream. If v8 function templates used std::function objects instead of function pointers, then we would do something like this:
Persistent<Context> create_context(std::ostream &out)
{
    Handle<ObjectTemplate> global = ObjectTemplate::New();
    global->Set(String::New("print"), FunctionTemplate::New(
        [&out](const v8::Arguments &args) -> v8::Handle<v8::Value>
        {
            v8::String::AsciiValue ascii(args[0]);
            out << *ascii << "\n";
        } ));
    return Context::New(NULL, global);
}
Unfortunately, v8 does not seem to support this. I assume (hope?) that v8 has a way of doing something functionally equivalent, but I find myself mystified by the Doxygen for v8::FunctionTemplate. Would anyone who has attempted something similar be willing to distill the process down into something more understandable? I would also like to learn how to create a global instance of a JavaScript object that is bound to an existing, non-global instance of a C++ object.
In answer to my own question... the key is to realize that v8::Arguments is not simply an array of arguments. It also contains the exceedingly useful This() and Data() methods. If the function is a method of a JavaScript object, then This() can be used to get hold of the instance the method was called on (Callee(), by contrast, returns the function object being called). Useful state information could then be stored in the object instance. You can also supply a data handle, which may point to any C++ object through void*, when adding a function template to an object. This function-specific data handle may then be accessed through the Data() method.
Below is a reasonably complete example of what I was trying to do in the question using v8::Arguments::Data(). Hopefully this will be useful to anyone who wants to do something similar. If you have an alternative strategy you like (and I am certain there is more than one way of doing this), please feel free to add it in another answer!
#include <iostream>
#include <ostream>
#include <v8.h>
// add print() function to an object template
void add_print(v8::Handle<v8::ObjectTemplate>& ot, std::ostream* out)
{
    // add function template to ot
    ot->Set(v8::String::New("print"), v8::FunctionTemplate::New(
        // parameter 1 is the function callback (implemented here as a lambda)
        [](const v8::Arguments& args)->v8::Handle<v8::Value>
        {
            // recover our pointer to an std::ostream from the
            // function template's data handle
            v8::Handle<v8::External> data = v8::Handle<v8::External>::Cast(args.Data());
            std::ostream* out = static_cast<std::ostream*>(data->Value());
            // verify that we have the correct number of function arguments
            if ( args.Length() != 1 )
                return v8::ThrowException(v8::String::New("print() expects exactly one argument."));
            // print the ascii representation of the argument to the output stream
            v8::String::AsciiValue ascii(args[0]);
            *out << *ascii << "\n";
            // like 'return void;' only in JavaScript
            return v8::Undefined();
        },
        // parameter 2 is the data handle with the pointer to an std::ostream
        v8::External::New(out)
    ));
}
int main()
{
    // create a stack-allocated handle scope
    v8::HandleScope handle_scope;
    // create a global template
    v8::Local<v8::ObjectTemplate> global = v8::ObjectTemplate::New();
    // add a print() function using std::cout to the global template
    add_print(global, &std::cout);
    // create a context
    v8::Persistent<v8::Context> context = v8::Context::New(nullptr, global);
    // enter the created context
    v8::Context::Scope context_scope(context);
    // create a string containing the JavaScript source code
    v8::Local<v8::String> source = v8::String::New("print('1 + 1 = ' + (1 + 1));");
    // compile the source code
    v8::Local<v8::Script> script = v8::Script::Compile(source);
    // run the script
    script->Run();
    // dispose of the persistent context
    context.Dispose();
    return 0;
}
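Stripped of the v8 specifics, the idea is a plain function pointer paired with an opaque data pointer that is handed back on every call. The following standalone sketch shows only that pattern; PrintCallback, BoundFunction, make_print and print_to_two_streams are hypothetical names and are not part of the v8 API.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a v8-style callback slot: a plain function
// pointer plus an opaque data pointer handed back on every call.
using PrintCallback = void (*)(const std::string& text, void* data);

struct BoundFunction {
    PrintCallback callback;
    void* data;
    void call(const std::string& text) const { callback(text, data); }
};

// Binds a print function to a specific stream without any global state.
BoundFunction make_print(std::ostream* out)
{
    // A captureless lambda converts implicitly to a function pointer;
    // a capturing lambda such as [out](...) would not.
    PrintCallback cb = [](const std::string& text, void* data) {
        std::ostream* stream = static_cast<std::ostream*>(data);
        *stream << text << "\n";
    };
    return BoundFunction{cb, out};
}

// Two print functions share one callback but write to different streams.
std::string print_to_two_streams()
{
    std::ostringstream first, second;
    BoundFunction print_first = make_print(&first);
    BoundFunction print_second = make_print(&second);
    print_first.call("a");
    print_second.call("b");
    print_first.call("c");
    return first.str() + "|" + second.str();
}
```

As with v8::External, the data pointer does not own the stream, so the caller has to keep the stream alive for as long as the bound function may be called.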
