ruby c extension how to manage garbage collection between 2 objects - ruby

I have a C extension in which I have a main class (class A for example) created with the classical:
Data_Wrap_Struct
rb_define_alloc_func
rb_define_private_method(mymodule, "initialize" ...)
This A class have an instance method that generate B object. Those B objects can only be generated from A objects and have C data wrapped that depends on the data wrapped in the A instance.
I the A object are collected by the garbage collector before a B object, this could result in a Seg Fault.
How can I tell the GC to not collect a A instance while some of his B objects are still remaining. I guess I have to use rb_gc_mark or something like that. Should I have to mark the A instance each time a B object is created ??
Edit : More specifics Informations
I am trying to write a Clang extension. With clang, you first create a CXIndex, from which you can get a CXTranslationUnit, from which you can get a CXDiagnostic and or a CXCursor and so on. here is a simple illustration:
Clangc::Index#new => Clangc::Index
Clangc::Index#create_translation_unit => Clangc::TranslationUnit
Clangc::TranslationUnit#diagnostic(index) => Clangc::Diagnostic
You can see some code here : https://github.com/cedlemo/ruby-clangc
Edit 2 : A solution
The stuff to build the "b" objects with a reference to the "a" object:
typedef struct B_t {
void * data;
VALUE instance_of_a;
} B_t;
static void
c_b_struct_free(B_t *s)
{
if(s)
{
if(s->data)
a_function_to_free_the_data(s->data);
ruby_xfree(s);
}
}
static void
c_b_mark(void *s)
{
B_t *b =(B_t *)s;
rb_gc_mark(b->an_instance_of_a);
}
VALUE
c_b_struct_alloc( VALUE klass)
{
B_t * ptr;
ptr = (B_t *) ruby_xmalloc(sizeof(B_t));
ptr->data = NULL;
ptr->an_instance_of_a = Qnil;
return Data_Wrap_Struct(klass, c_b_mark, c_b_struct_free, (void *) ptr);
}
The c function that is used to build a "b" object from an "a" object:
VALUE c_A_get_b_object( VALUE self, VALUE arg)
{
VALUE mModule = rb_const_get(rb_cObject, rb_intern("MainModule"));\
VALUE cKlass = rb_const_get(mModule, rb_intern("B"));
VALUE b_instance = rb_class_new_instance(0, NULL, cKlass);
B_t *b;
Data_Get_Struct(b_instance, B_t, b);
/*
transform ruby value arg to C value c_arg
*/
b->data = function_to_fill_the_data(c_arg);
b->instance_of_a = self;
return b_instance;
}
In the Init_mainModule function:
void Init_mainModule(void)
{
VALUE mModule = rb_define_module("MainModule");
/*some code ....*/
VALUE cKlass = rb_define_class_under(mModule, "B", rb_cObject);
rb_define_alloc_func(cKlass, c_b_struct_alloc);
}
Same usage of the rb_gc_mark can be found in mysql2/ext/mysql2/client.c ( rb_mysql_client_mark function) in the project https://github.com/brianmario/mysql2

In the mark function for your B class, you should mark the A Ruby object, telling the garbage collector not to garbage collect it.
The mark function can be specified as the second argument to Data_Wrap_Struct. You might need to modify your design somehow to expose a pointer to the A objects.
Another option is to let the A object be an instance variable of the B object. You should probably do this anyway so that Ruby code can obtain the A object from the B object. Doing this would have the side effect of making the garbage collector not collect the A before the B, but you should not be relying on this side effect because it would be possible for your Ruby code to accidentally mess up the instance variable and then cause a segmentation fault.
Edit: Another option is to use reference counting of the shared C data. Then when the last Ruby object that is using that shared data gets garbage collected, you would delete the shared data. This would involve finding a good, cross-platform, thread-safe way to do reference counting so it might not be trivial.

Related

Ruby C extension : How do I know that a ruby VALUE generated in my C code will be correctly cleaned by GC?

I'm trying to write a really small C extension. So I don't want to make a whole ruby class, with initializer, allocator, and so forth. All I want to do is add a static method to an existing class, method which will run an algorithm and return a result. Unfortunately, all documentation I find only speak about wrapping a C struct into a VALUE, but that's not my use case.
What I want to know : if I create a ruby object (which will allocate memory) inside my C code, and that I return it as the result of my function, will it be taken care of properly by the garbage collector, or is it going to leak ?
Example :
void Init_my_extension()
{
VALUE cFooModule;
cFooModule = rb_const_get(rb_cObject, rb_intern("Foo"));
rb_define_singleton_method(cFooModule, "big_calc", method_big_calc, 1);
}
VALUE method_big_calc(VALUE self, VALUE input)
{
VALUE result;
result = rb_ary_new();
return result;
}
Will the array that was allocated by rb_ary_new() be properly cleaned when it's not used anymore ? How is the garbage collector aware of references to this value ?
Yes, You code properly clean memory if You using rb_ary_new().
In my opinion You need answer on other question. How create you own object.
http://www.onlamp.com/pub/a/onlamp/2004/11/18/extending_ruby.html
first You must create rb_define_alloc_func(cYouObject,t_allocate);
similar this
struct stru { char a; };
void t_free(struct stru *a) { }
static VALUE t_allocate(VALUE obj) { return
Data_Wrap_Struct(obj,NULL,t_free,m); }

Reimplement insert iterator to make easy work with "related types" to the one stored on the container

We have the following lightweight classes:
struct A {};
struct B { A get_a() const { return /* sth */; } };
And let's suppose I have an ordered container of type A, and I want to copy objects from another container of type B to it:
std::copy(b_cont.begin(), b_cont.end(),
std::make_insert_iterator(a_cont, a_cont.end())
);
Of course, it won't work because a_cont and b_cont have different types, and classes A and B don't provide conversion operators. The most obvious solution is to call the function get_a for each B object on the range [b_cont.begin(), b_cont.end()), so, lets write a custom insert iterator:
template<template<class...> class container>
struct ba_insert_iterator : public std::insert_iterator<container<A> >
{
using std::insert_iterator<container<A>>::insert_iterator;
ba_insert_iterator& operator=(B const& o)
{
std::insert_iterator<container<A>>::operator=(o.get_a());
return *this;
}
};
template<template<class...> class container>
ba_insert_iterator<container> make_a_inserter(container<A>& c)
{ return ba_insert_iterator<container>(c, c.end()); }
Just an iterator that receives an object of type B, instead of another one of type A, and a function to create them easily. But when instantiating the template:
std::copy(b_cont.begin(), b_cont.end(), make_a_inserter(a_cont));
It says that it doesn't find the operator= because the second operand is an A object (as expected), but the first operand is an std::insert_iterator<std::set<A> >!!, so the compiler is "casting" the iterator to its clase base, which of course lacks of a method for receiving B objects to insert.
Why does it happen?
You inherited operator* (and operator++ too, for that matter) from insert_iterator.
And those return insert_iterator&, not ba_insert_iterator&.
For obvious reasons, std::copy dereferences the output iterator before assigning to it.

How to create a string on the heap in D?

I'm writing a trie in D and I want each trie object have a pointer to some data, which has a non-NULL value if the node is a terminal node in the trie, and NULL otherwise. The type of the data is undetermined until the trie is created (in C this would be done with a void *, but I plan to do it with a template), which is one of the reasons why pointers to heap objects are desirable.
This requires me to eventually create my data on the heap, at which point it can be pointed to by the trie node. Experimenting, it seems like new performs this task, much as it does in C++. However for some reason, this fails with strings. The following code works:
import std.stdio;
void main() {
string *a;
string b = "hello";
a = &b;
writefln("b = %s, a = %s, *a = %s", b, a, *a);
}
/* OUTPUT:
b = hello, a = 7FFF5C60D8B0, *a = hello
*/
However, this fails:
import std.stdio;
void main() {
string *a;
a = new string();
writefln("a = %s, *a = %s", a, *a);
}
/* COMPILER FAILS WITH:
test.d(5): Error: new can only create structs, dynamic arrays or class objects, not string's
*/
What gives? How can I create strings on the heap?
P.S. If anyone writing the D compiler is reading this, the apostrophe in "string's" is a grammatical error.
Strings are always allocated on the heap. This is the same for any other dynamic array (T[], string is only an alias to type immutable(char)[]).
If you need only one pointer there are two ways to do it:
auto str = "some immutable(char) array";
auto ptr1 = &str; // return pointer to reference to string (immutable(char)[]*)
auto ptr2 = str.ptr; // return pointer to first element in string (char*)
If you need pointer to empty string, use this:
auto ptr = &"";
Remember that you can't change value of any single character in string (because they are immutable). If you want to operate on characters in string use this:
auto mutableString1 = cast(char[])"Convert to mutable."; // shouldn't be used
// or
auto mutableString2 = "Convert to mutable.".dup; // T[].dup returns mutable duplicate of array
Generally you should avoid pointers unless you absolutely know what are you doing.
From memory point of view any pointer take 4B (8B for x64 machines) of memory, but if you are using pointers to arrays then, if pointer is not null, there are 12B (+ data in array) of memory in use. 4B if from pointer and 8B are from reference to array, because array references are set of two pointers. One to first and one to last element in array.
Remember that string is just immutable(char)[]. So you don't need pointers since string is already a dynamic array.
As for creating them, you just do new char[X], not new string.
The string contents are on the heap already because strings are dynamic arrays. However, in your case, it is better to use a char dynamic array instead as you require mutability.
import std.stdio;
void main() {
char[] a = null; // redundant as dynamic arrays are initialized to null
writefln("a = \"%s\", a.ptr = %s", a, a.ptr); // prints: a = "", a.ptr = null
a = "hello".dup; // dup is required because a is mutable
writefln("a = \"%s\", a.ptr = %s", a, a.ptr); // prints: a = "hello", a.ptr = 7F3146469FF0
}
Note that you don't actually hold the array's contents, but a slice of it. The array is handled by the runtime and it is allocated on the heap.
A good reading on the subject is this article http://dlang.org/d-array-article.html
If you can only use exactly one pointer and you don't want to use the suggestions in Marmyst's answer (&str in his example creates a reference to the stack which you might not want, str.ptr loses information about the strings length as D strings are not always zero terminated) you can do this:
Remeber that you can think of D arrays (and therefore strings) as a struct with a data pointer and length member:
struct ArraySlice(T)
{
T* ptr;
size_t length;
}
So when dealing with an array the array's content is always on the heap, but the ptr/length combined type is a value type and therefore usually kept on the stack. I don't know why the compiler doesn't allow you to create that value type on the heap using new, but you can always do it manually:
import core.memory;
import std.stdio;
string* ptr;
void alloc()
{
ptr = cast(string*)GC.malloc(string.sizeof);
*ptr = "Hello World!";
}
void main()
{
alloc();
writefln("ptr=%s, ptr.ptr=%s, ptr.length=%s, *ptr=%s", ptr, ptr.ptr, ptr.length, *ptr);
}

v8::FunctionTemplate referencing a non-global variable

Google's v8 documentation describes how to add a global function to a JavaScript context. We can implement a printf-like function quite easily using the new lambda feature from C++11:
Handle<ObjectTemplate> global = ObjectTemplate::New();
global->Set(String::New("print"), FunctionTemplate::New(
[](const v8::Arguments &args) -> v8::Handle<v8::Value>
{
v8::String::AsciiValue ascii(args[0]);
std::cout << *ascii << "\n";
} ));
Persistent<Context> context = Context::New(NULL, global);
This works well for any global JavaScript function that is either stateless or references a global C++ variable (i.e. std::cout). But what if we want our global JavaScript function to reference a non-global C++ variable? For example, suppose we are creating several different JavaScript contexts each with its own global print function that uses a different C++ std::ostream? If v8 function templates used std::function objects instead of function pointers, the we would do something like this:
Persistent<Context> create_context(std::ostream &out)
{
Handle<ObjectTemplate> global = ObjectTemplate::New();
global->Set(String::New("print"), FunctionTemplate::New(
[&out](const v8::Arguments &args) -> v8::Handle<v8::Value>
{
v8::String::AsciiValue ascii(args[0]);
out << *ascii << "\n";
} ));
return Context::New(NULL, global);
}
Unfortunately, v8 does not seem to support this. I assume (hope?) that v8 has a way of doing something functionally equivalent, but I find myself mystified by the Doxygen for v8::FunctionTemplate. Would anyone who has attempted something similar be willing to distill the process down into something more understandable? I would also like to learn how to create a global instance of a JavaScript object that is bound to an existing, non-global instance of a C++ object.
In answer to my own question... the key is to realize that v8::Arguments is not simply an array of arguments. It also contains the exceedingly useful Callee() and Data() methods. If the function is a method of a JavaScript object then Callee() can, I think, be used to get ahold of whatever instance of that object the method was called on. Useful state information could then be stored in the object instance. You can also supply a data handle, which may point to any C++ object through void*, when adding a function template to an object. This function-specific data handle may then be accessed through the Data() method.
Below is a reasonably complete example of what I was trying to do in the question using v8::Arguments::Data(). Hopefully this will be useful to anyone who wants to do something similar. If you have an alternative strategy you like (and I am certain there is more than one way of doing this), please feel free to add it in another answer!
#include <iostream>
#include <ostream>
#include <v8.h>
// add print() function to an object template
void add_print(v8::Handle<v8::ObjectTemplate>& ot, std::ostream* out)
{
// add function template to ot
ot->Set(v8::String::New("print"), v8::FunctionTemplate::New(
// parameter 1 is the function callback (implemented here as a lambda)
[](const v8::Arguments& args)->v8::Handle<v8::Value>
{
// recover our pointer to an std::ostream from the
// function template's data handle
v8::Handle<v8::External> data = v8::Handle<v8::External>::Cast(args.Data());
std::ostream* out = static_cast<std::ostream*>(data->Value());
// verify that we have the correct number of function arguments
if ( args.Length() != 1 )
return v8::ThrowException(v8::String::New("Too many arguments to print()."));
// print the ascii representation of the argument to the output stream
v8::String::AsciiValue ascii(args[0]);
*out << *ascii << "\n";
// like 'return void;' only in JavaScript
return v8::Undefined();
},
// parameter 2 is the data handle with the pointer to an std::ostream
v8::External::New(out)
));
}
int main()
{
// create a stack-allocated handle scope
v8::HandleScope handle_scope;
// create a global template
v8::Local<v8::ObjectTemplate> global = v8::ObjectTemplate::New();
// add a print() function using std::cout to the global template
add_print(global, &std::cout);
// create a context
v8::Persistent<v8::Context> context = v8::Context::New(nullptr, global);
// enter the created context
v8::Context::Scope context_scope(context);
// create a string containing the JavaScript source code
v8::Local<v8::String> source = v8::String::New("print('1 + 1 = ' + (1 + 1));");
// compile the source code
v8::Local<v8::Script> script = v8::Script::Compile(source);
// run the script
script->Run();
// dispose of the persistent context
context.Dispose();
return 0;
}

how to rb_protect everything in ruby

I want to call ruby code from my own C code. In case an exception gets raised, I have to rb_protect the ruby code I call. rb_protect looks like this:
VALUE rb_protect(VALUE (* proc) (VALUE), VALUE data, int * state)
So proc has to be a function which takes VALUE arguments and returns VALUE. I have to call a lot of functions which do not work that way. How can I rb_protect them from raising exceptions?
I have thought of using Data_Make_Struct to wrap everything into one ruby object and call methods on it. Data_Make_Struct could itself raise an exception. How do I rb_protect Data_Make_Struct?
To use rb_protect in a flexible way (e.g., to call a Ruby function with an arbitrary numbers of arguments), pass a small dispatch function to rb_protect. Ruby requires that sizeof(VALUE) == sizeof(void*), and rb_protect blindly passes the VALUE-typed data to the dispatch function without inspecting it or modifying it. This means that you can pass whatever data you want to the dispatch function, let it unpack the data and call the appropriate Ruby method(s).
For example, to rb_protect a call to a Ruby method, you might use something like this:
#define MAX_ARGS 16
struct my_callback_stuff {
VALUE obj;
ID method_id;
int nargs;
VALUE args[MAX_ARGS];
};
VALUE my_callback_dispatch(VALUE rdata)
{
struct my_callback_stuff* data = (struct my_callback_stuff*) rdata;
return rb_funcall2(data->obj, data->method_id, data->nargs, data->args);
}
... in some other function ...
{
/* need to call Ruby */
struct my_callback_stuff stuff;
stuff.obj = the_object_to_call;
stuff.method_id = rb_intern("the_method_id");
stuff.nargs = 3;
stuff.args[0] = INT2FIX(1);
stuff.args[1] = INT2FIX(2);
stuff.args[2] = INT2FIX(3);
int state = 0;
VALUE ret = rb_protect(my_callback_dispatch, (VALUE)(&stuff), &state);
if (state) {
/* ... error processing happens here ... */
}
}
Also, keep in mind that rb_rescue or rb_ensure may be a better approach for some problems.

Resources