Generator method not working as expected in Ruby native extension - ruby

I work in simulation and often need multiple data generators which are structurally identical but have different parameterizations. I'm trying to write a gem using C extensions to implement a factory method, which determines and creates an appropriate type of enumerator based on the parameterization provided. The following is not my actual code, but is intended as a minimal reproducible example to illustrate the behavior I find confusing:
#include "ruby.h"
VALUE rb_mTest = Qnil;
VALUE rb_cTest = Qnil;
static VALUE super_initialize(VALUE self) {
return self;
}
static VALUE rand_enum(VALUE self) {
RETURN_SIZED_ENUMERATOR(self, 0, 0, 0);
VALUE ary[1] = {LL2NUM(42)};
for(;;) {
rb_yield(rb_funcallv(rb_mKernel, rb_intern("rand"), 1, ary));
}
return Qnil;
}
static VALUE enum_by_2(int32_t argc, VALUE* argv , VALUE self) {
rb_check_arity(argc, 1, 1);
int64_t value = NUM2LL(argv[0]);
RETURN_SIZED_ENUMERATOR(self, 1, argv, (1ll << 63) - value);
for(;;) {
rb_yield(LL2NUM(value));
value += 2;
}
return Qnil;
}
static VALUE enum_factory(int32_t argc, VALUE* argv, VALUE self) {
VALUE argument;
rb_scan_args(argc, argv, "01", &argument);
switch (TYPE(argument)) {
case T_NIL:
printf("It's a nil\n");
return rand_enum(self);
case T_FIXNUM:
printf("It's a FIXNUM\n");
return enum_by_2(1, argv, self);
default:
printf("Unrecognized argument type\n");
return Qnil;
}
}
void Init_test(void) {
rb_mTest = rb_define_module("Test");
rb_cTest = rb_define_class_under(rb_mTest, "Tester", rb_cObject);
rb_define_method(rb_cTest, "initialize", super_initialize, 0);
rb_define_method(rb_cTest, "enum_factory", enum_factory, -1);
}
This code works, but in a way that I didn't expect. Here's an irb session from a test run after compiling the code above:
irb(main):001:0> require_relative 'lib/test'
=> true
irb(main):002:0> t = Test::Tester.new
=> #<Test::Tester:0x0000000104b23c70>
irb(main):003:0> g1 = t.enum_factory
It's a nil
=> #<Enumerator: ...>
irb(main):004:0> g2 = t.enum_factory(10)
It's a FIXNUM
=> #<Enumerator: ...>
irb(main):005:0> g1.take(5).to_a
It's a nil
=> [35, 21, 30, 20, 0]
irb(main):006:0> g2.take(5).to_a
It's a FIXNUM
=> [10, 12, 14, 16, 18]
As you can see, the enum_factory switches to return different enumerators based on the parameterization, and the printf statements and results indicate that the correct parameterization is being applied. My confusion stems from the printf output which shows that subsequent method calls applied directly to g1 and g2 still seem to be going through the factory method. My work often generates samples into the millions or even tens of millions, so my intentions were to maximize speed by writing this in C and to avoid parsing the parameters after doing it once in the factory, but clearly that's not what's happening. I'm probably misreading the (sparse) C API documentation and missing something trivial or obvious. I'd appreciate any pointers to where my misunderstanding lies.

You've only tagged this ruby. I know Ruby, but my C knowledge is zero; still, maybe this will help.
I added some extra output, because I kept losing track of when things actually execute in C versus Ruby.
static VALUE enum_by_2(int32_t argc, VALUE* argv , VALUE self) {
rb_check_arity(argc, 1, 1);
int64_t value = NUM2LL(argv[0]);
printf("before RETURN_SIZED_ENUMERATOR\n");
RETURN_SIZED_ENUMERATOR(self, 1, argv, (1ll << 63) - value);
printf("after RETURN_SIZED_ENUMERATOR\n");
for(;;) {
printf("for\n");
rb_yield(LL2NUM(value));
printf("after rb_yield\n");
value += 2;
}
return Qnil;
}
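As best I can tell, this is how the RETURN_SIZED_ENUMERATOR macro behaves. When no block is given, it returns an Enumerator bound to the Ruby method that was called, enum_factory, together with its arguments. When you chain another method onto it, enum_factory is called again, this time with a block: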
>> require_relative "test"; t = Test::Tester.new; g2 = t.enum_factory(1)
# It's a FIXNUM
# before RETURN_SIZED_ENUMERATOR
=> #<Enumerator: ...>
# ^
# NOTE: not very helpful
>> g2.inspect
=> "#<Enumerator: #<Test::Tester:0x00007f0ddf6cecd0>:enum_factory(1)>"
# ^
# NOTE: a little more context -----------------------'
So g2 now holds an Enumerator that calls enum_factory(1) whenever it is iterated.
>> g2.take(1)
# It's a FIXNUM
# before RETURN_SIZED_ENUMERATOR
# after RETURN_SIZED_ENUMERATOR
# for
=> [1]
# NOTE: take(2) actually does a second for loop.
# yes, take(3) does it three times. it makes so much sense now.
>> g2.take(2)
# It's a FIXNUM
# before RETURN_SIZED_ENUMERATOR
# after RETURN_SIZED_ENUMERATOR
# for
# after rb_yield
# for
=> [1, 3]
It also depends on how the Enumerator method being called is implemented. For example, next calls enum_factory only once and then resumes the suspended loop on later calls:
>> g2.next
# It's a FIXNUM
# before RETURN_SIZED_ENUMERATOR
# after RETURN_SIZED_ENUMERATOR
# for
=> 1
>> g2.next
# after rb_yield
# for
=> 3
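The difference between take and next can be reproduced in plain Ruby. The sketch below is only an analogy, not the C extension itself: it builds a generator with Enumerator.new and uses a hypothetical counter named starts to record how often the generator body is started from the top.

```ruby
starts = 0 # hypothetical counter: how many times the generator body begins

odds = Enumerator.new do |yielder|
  starts += 1
  value = 1
  loop do
    yielder << value
    value += 2
  end
end

# Each call to take iterates from scratch, so the body starts again.
first_take  = odds.take(2)   # => [1, 3]
second_take = odds.take(2)   # => [1, 3]
starts_after_takes = starts  # => 2

# next starts the body once, then resumes it where it left off.
first_next  = odds.next      # => 1
second_next = odds.next      # => 3
starts_after_nexts = starts  # => 3
```

This matches the transcript above: every take re-enters the bound method, while the second next prints only the output that follows rb_yield.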
From the Ruby perspective, this is analogous to each and anything else that returns an enumerator:
>> g3 = [1].each
=> #<Enumerator: ...>
>> g3.inspect
=> "#<Enumerator: [1]:each>"
# ^
# NOTE: looks like enum_factory
That's about as much as I could figure out.
Update
OK, I figured out a little more. The call ID2SYM(rb_frame_this_func()) inside the macro is what determines which method the returned Enumerator is bound to. So copying the macro body, which calls rb_enumeratorize_with_size, and fixing up some arguments does the job:
static VALUE enum_by_2(int32_t argc, VALUE* argv , VALUE self) {
rb_check_arity(argc, 1, 1);
int64_t value = NUM2LL(argv[0]);
/* RETURN_SIZED_ENUMERATOR(self, 1, argv, (1ll << 63) - value); */
if (!rb_block_given_p()){
return rb_enumeratorize_with_size(
(self), ID2SYM(rb_intern("enum_by_2")),
(argc), (argv), ((1ll << 63) - value)
);
}
/* RETURN_SIZED_ENUMERATOR also wraps its body in `do { ... } while (0)`,
the usual idiom for making a multi-statement macro act as one statement,
so it can be skipped here */
for(;;) {
rb_yield(LL2NUM(value));
value += 2;
}
return Qnil;
}
and define this method for ruby in Init_test:
rb_define_method(rb_cTest, "enum_by_2", enum_by_2, -1);
>> g2 = t.enum_factory(1)
# It's a FIXNUM
=> #<Enumerator: ...>
# NOTE: no FIXNUM the second time
>> g2.take(1)
=> [1]
>> g2.inspect
=> "#<Enumerator: #<Test::Tester:0x00007f6b11ba4158>:enum_by_2(1)>"
# ^
# NOTE: the final `enum_by_2` is returned in ruby ---'
This roughly resembles return enum_for(__callee__) unless block_given?.
...
#define RETURN_SIZED_ENUMERATOR(obj, argc, argv, size_fn) do {
...
https://github.com/ruby/ruby/blob/v3_1_2/include/ruby/internal/intern/enumerator.h#L198
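The same before-and-after behavior can be sketched in plain Ruby. This is an analogy under stated assumptions, not the extension's code: the class and the method names self_bound_factory, delegating_factory, and count_by_2 are hypothetical, and factory_calls is an illustrative counter standing in for the printf output.

```ruby
class Tester
  attr_reader :factory_calls

  def initialize
    @factory_calls = 0
  end

  # Like the original C code: the Enumerator is bound to the factory itself,
  # so every iteration re-enters the factory and re-parses the argument.
  def self_bound_factory(start)
    @factory_calls += 1
    return enum_for(:self_bound_factory, start) unless block_given?
    count_by_2(start) { |v| yield v }
  end

  # Like the fix: the factory runs once and hands back an Enumerator
  # that is bound to the worker method.
  def delegating_factory(start)
    @factory_calls += 1
    count_by_2(start)
  end

  def count_by_2(start)
    return enum_for(:count_by_2, start) unless block_given?
    value = start
    loop do
      yield value
      value += 2
    end
  end
end

t = Tester.new
slow_result = t.self_bound_factory(10).take(3) # => [10, 12, 14]
slow_calls  = t.factory_calls                  # => 2 (creation + iteration)

u = Tester.new
fast_result = u.delegating_factory(10).take(3) # => [10, 12, 14]
fast_calls  = u.factory_calls                  # => 1 (creation only)
```

Binding the Enumerator to the worker method is what keeps the parameter parsing out of the iteration path.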

Related

Hash Enumerable methods: Inconsistent behavior when passing only one parameter

Ruby's enumerable methods for Hash expect 2 parameters, one for the key and one for the value:
hash.each { |key, value| ... }
However, I notice that the behavior is inconsistent among the enumerable methods when you only pass one parameter:
student_ages = {
"Jack" => 10,
"Jill" => 12,
}
student_ages.each { |single_param| puts "param: #{single_param}" }
student_ages.map { |single_param| puts "param: #{single_param}" }
student_ages.select { |single_param| puts "param: #{single_param}" }
student_ages.reject { |single_param| puts "param: #{single_param}" }
# results:
each...
param: ["Jack", 10]
param: ["Jill", 12]
map...
param: ["Jack", 10]
param: ["Jill", 12]
select...
param: Jack
param: Jill
reject...
param: Jack
param: Jill
As you can see, for each and map, the single parameter gets assigned to a [key, value] array, but for select and reject, the parameter is only the key.
Is there a particular reason for this behavior? The docs don't seem to mention this at all; all of the examples given just assume that you are passing in two parameters.
I just checked Rubinius, and its behavior is consistent with CRuby. Looking at its Ruby-level implementation, the difference arises because #select yields two values:
yield(item.key, item.value)
while #each yields an array with two values:
yield [item.key, item.value]
Yielding two values to a block that expects one takes the first argument and ignores the second one:
def foo
yield :bar, :baz
end
foo { |x| p x } # => :bar
A yielded array is assigned whole if the block has one parameter, or unpacked and assigned element by element (as if you had passed the elements one by one) if the block has two or more parameters.
def foo
yield [:bar, :baz]
end
foo { |x| p x } # => [:bar, :baz]
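To round out the two cases, here is a small sketch that also covers a two-parameter block. The method names yield_array and yield_two are hypothetical stand-ins for the foo definitions above.

```ruby
def yield_array
  yield [:bar, :baz]
end

def yield_two
  yield :bar, :baz
end

# One parameter: the array arrives whole.
whole = yield_array { |x| x }            # => [:bar, :baz]

# Two parameters: the array is unpacked across them.
unpacked = yield_array { |x, y| [y, x] } # => [:baz, :bar]

# Two yielded values, one parameter: the second value is dropped.
first_only = yield_two { |x| x }         # => :bar
```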
As for why they made that decision, there probably isn't any good reason behind it; people simply weren't expected to call these methods with a one-parameter block.
My guess is that internally map is just each with collect. Interesting they don't work quite the same way.
As to each...
The source code is below. It checks how many parameters the block declares. If more than one, it calls each_pair_i_fast; otherwise, it calls each_pair_i.
static VALUE
rb_hash_each_pair(VALUE hash)
{
RETURN_SIZED_ENUMERATOR(hash, 0, 0, hash_enum_size);
if (rb_block_arity() > 1)
rb_hash_foreach(hash, each_pair_i_fast, 0);
else
rb_hash_foreach(hash, each_pair_i, 0);
return hash;
}
each_pair_i_fast yields two distinct values:
each_pair_i_fast(VALUE key, VALUE value)
{
rb_yield_values(2, key, value);
return ST_CONTINUE;
}
each_pair_i does not:
each_pair_i(VALUE key, VALUE value)
{
rb_yield(rb_assoc_new(key, value));
return ST_CONTINUE;
}
rb_assoc_new returns a two-element array (at least I'm assuming that is what rb_ary_new3 does):
rb_assoc_new(VALUE car, VALUE cdr)
{
return rb_ary_new3(2, car, cdr);
}
select looks like this:
rb_hash_select(VALUE hash)
{
VALUE result;
RETURN_SIZED_ENUMERATOR(hash, 0, 0, hash_enum_size);
result = rb_hash_new();
if (!RHASH_EMPTY_P(hash)) {
rb_hash_foreach(hash, select_i, result);
}
return result;
}
and select_i looks like this:
select_i(VALUE key, VALUE value, VALUE result)
{
if (RTEST(rb_yield_values(2, key, value))) {
rb_hash_aset(result, key, value);
}
return ST_CONTINUE;
}
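Note that select_i always calls rb_yield_values(2, key, value), so it yields two distinct values just as each_pair_i_fast does; rb_hash_aset merely stores the pair in the result hash.
Most importantly, notice that select and its relatives don't check the block's arity at all.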
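The two C code paths can be modeled in plain Ruby. This is only a rough sketch: each_pair_model and select_model are hypothetical names, and Proc#arity is used as a stand-in for rb_block_arity, which may differ from it in edge cases such as optional parameters.

```ruby
# Models rb_hash_each_pair: pick the yielding style from the block's arity.
def each_pair_model(hash, &block)
  hash.to_a.each do |key, value|
    if block.arity > 1
      block.call(key, value)   # like each_pair_i_fast: two distinct values
    else
      block.call([key, value]) # like each_pair_i: one two-element array
    end
  end
  hash
end

# Models rb_hash_select: always yields two values, with no arity check.
def select_model(hash, &block)
  result = {}
  hash.to_a.each do |key, value|
    result[key] = value if block.call(key, value)
  end
  result
end

student_ages = { "Jack" => 10, "Jill" => 12 }

each_seen = []
each_pair_model(student_ages) { |single_param| each_seen << single_param }
# each_seen => [["Jack", 10], ["Jill", 12]]

select_seen = []
select_model(student_ages) { |single_param| select_seen << single_param }
# select_seen => ["Jack", "Jill"]
```

A one-parameter proc silently drops the second value, which is why the select model only ever shows the key.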
Sources:
https://github.com/ruby/ruby/blob/d5c5d5c778a0e8d61ab07669132dc18fb1a2e874/hash.c
https://github.com/ruby/ruby/blob/9f44b77a18d4d6099174c6044261eb1611a147ea/array.c

Reverse words of a sentence

I want to reverse a sentence. For example, my string is like follows.
str = "I am a good boy"
I want the result "boy good a am I". I can reverse the string by using built in Ruby methods like:
str.split(" ").reverse.join(" ") #=> "boy good a am I"
Is there any way to do this without using Ruby built in methods?
Sure, there is a way; in fact, it's my favorite way. You said you don't want to use Ruby's builtins. Well, we won't. What about native extensions? I know people love them.
First, create a reverse/reverse.c file. I took most of the source from here.
#include "ruby/ruby.h"
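void reverse(char *begin, char *end); /* forward declaration: reverseWords calls it before its definition */
void reverseWords(char *s)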
{
char *word_begin = NULL;
char *temp = s;
while( *temp )
{
if (( word_begin == NULL ) && (*temp != ' ') )
{
word_begin=temp;
}
if(word_begin && ((*(temp+1) == ' ') || (*(temp+1) == '\0')))
{
reverse(word_begin, temp);
word_begin = NULL;
}
temp++;
}
reverse(s, temp-1);
}
void reverse(char *begin, char *end)
{
char temp;
while (begin < end)
{
temp = *begin;
*begin++ = *end;
*end-- = temp;
}
}
VALUE reverse_words(VALUE str)
{
char *s;
rb_str_modify(str); /* raise on frozen strings and unshare the buffer before writing to it */
s = RSTRING_PTR(str);
reverseWords(s);
return str;
}
void Init_reverse_words()
{
VALUE string = rb_const_get(rb_cObject, rb_intern("String"));
rb_define_method(string, "reverse_words!", reverse_words, 0);
}
Secondly, create reverse/extconf.rb file:
require 'mkmf'
create_makefile('reverse_words')
Thirdly, in terminal cd to reverse folder and run:
$ ruby extconf.rb
$ make && make install
Finally, test it at irb.
irb(main):001:0> require 'reverse_words'
=> true
irb(main):002:0> "foo bar baz".reverse_words!
=> "baz bar foo"
That's the way to reverse words order without using builtins.
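For readers who don't read C, the algorithm in reverseWords is: reverse each word in place, then reverse the whole buffer. Below is a rough Ruby transliteration under that reading. The names reverse_range! and reverse_words_sketch are hypothetical, and unlike the C version it does rely on a few Ruby builtins (chars, join, array indexing).

```ruby
# Swap elements pairwise from both ends, like the C reverse(begin, end).
def reverse_range!(chars, first, last)
  while first < last
    chars[first], chars[last] = chars[last], chars[first]
    first += 1
    last -= 1
  end
end

def reverse_words_sketch(sentence)
  chars = sentence.chars
  word_begin = nil
  chars.each_index do |i|
    # Remember where a word starts.
    word_begin = i if word_begin.nil? && chars[i] != " "
    # At the last character of a word, reverse that word in place.
    if word_begin && (i + 1 == chars.length || chars[i + 1] == " ")
      reverse_range!(chars, word_begin, i)
      word_begin = nil
    end
  end
  # Reversing the whole buffer restores each word and flips their order.
  reverse_range!(chars, 0, chars.length - 1)
  chars.join
end

reversed = reverse_words_sketch("I am a good boy") # => "boy good a am I"
```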
You can reverse an Array by appending its elements, last to first, to a new Array:
arr, new_arr = ["I", "am", "a", "good", "boy"], []
for i in 0...arr.length do
new_arr << arr[arr.length - 1 - i]
end
new_arr
# => ["boy", "good", "a", "am", "I"]
Try this:
def reverse(string)
reverse = ""
index = 0
while index < string.length
reverse = string[index] + reverse
index += 1
end
return reverse
end
reverse_sent = reverse("I am a good boy")
reverse_sent.split.map{|word| reverse(word)}.join(" ")
You cannot.
In Ruby, you'll always end up either calling other methods or reimplementing ones that already exist (as the other answers did).
In C you might manage it with a dynamic-length array, but in Ruby you must use built-in methods.
Is there any way to do this without using Ruby built in methods?
No.
Ruby is an object-oriented language. In OO, you do things by calling methods. There is no other way to perform any action.
Okay, technically, there are some corners where Ruby is not object-oriented: if, &&, ||, and and or have no corresponding methods. So, that's all you can use.

How can one know if a Proc is a lambda or not in Ruby

Suppose I have created a lambda instance, and I later want to query this object to see whether it is a proc or a lambda. How does one do that? The .class() method does not do the trick.
irb(main):001:0> k = lambda{ |x| x.to_i() +1 }
=> #<Proc:0x00002b931948e590#(irb):1>
irb(main):002:0> k.class()
=> Proc
Ruby 1.9.3 and higher
You are looking for the Proc#lambda? method.
k = lambda { |x| x.to_i + 1 }
k.lambda? #=> true
k = proc { |x| x.to_i + 1 }
k.lambda? #=> false
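As a brief illustration of why the distinction can matter, here is a sketch showing one behavioral difference that lambda? predicts: argument checking. The variable names strict and lenient are hypothetical.

```ruby
strict  = lambda { |x, y| [x, y] }
lenient = proc   { |x, y| [x, y] }

# A plain proc fills missing arguments with nil.
lenient_result = lenient.call(1) # => [1, nil]

# A lambda checks its argument count like a method does.
strict_error =
  begin
    strict.call(1)
  rescue ArgumentError => e
    e.class
  end
# strict_error => ArgumentError

flags = [strict.lambda?, lenient.lambda?] # => [true, false]
```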
Pre 1.9.3 solution
We are going to make a Ruby native extension.
Create a proc_lambda/proc_lambda.c file with the following content.
#include <ruby.h>
#include <node.h>
#include <env.h>
/* defined this way in eval.c */
#define BLOCK_LAMBDA 2
struct BLOCK {
NODE *var;
NODE *body;
VALUE self;
struct FRAME frame;
struct SCOPE *scope;
VALUE klass;
NODE *cref;
int iter;
int vmode;
int flags;
int uniq;
struct RVarmap *dyna_vars;
VALUE orig_thread;
VALUE wrapper;
VALUE block_obj;
struct BLOCK *outer;
struct BLOCK *prev;
};
/* this way of checking whether the flag is set is taken from the proc_invoke function in eval.c */
VALUE is_lambda(VALUE self)
{
struct BLOCK *data;
Data_Get_Struct(self, struct BLOCK, data);
return (data->flags & BLOCK_LAMBDA) ? Qtrue : Qfalse;
}
void Init_proc_lambda()
{
/* getting Proc class */
ID proc_id = rb_intern("Proc");
VALUE proc = rb_const_get(rb_cObject, proc_id);
/* extending Proc with lambda? method */
rb_define_method(proc, "lambda?", is_lambda, 0);
}
Create proc_lambda/extconf.rb file:
require 'mkmf'
create_makefile('proc_lambda')
In terminal cd to proc_lambda and run
$ ruby extconf.rb
$ make && make install
Test it in irb
irb(main):001:0> require 'proc_lambda'
=> true
irb(main):002:0> lambda {}.lambda?
=> true
irb(main):003:0> Proc.new {}.lambda?
=> false

Easy way to parse hashes and arrays

Typically, parsing XML or JSON returns a hash, array, or combination of them. Often, parsing through an invalid array leads to all sorts of TypeErrors, NoMethodErrors, unexpected nils, and the like.
For example, I have a response object and want to find the following element:
response['cars'][0]['engine']['5L']
If response is
{ 'foo' => { 'bar' => [1, 2, 3] } }
it will throw a NoMethodError exception, when all I want to see is nil.
Is there a simple way to look for an element without resorting to lots of nil checks, rescues, or Rails try methods?
Casper posted just before me and used the same idea (I don't remember where I found it; it was a while ago), but I believe my solution is sturdier:
module DeepFetch
def deep_fetch(*keys, &fetch_default)
throw_fetch_default = fetch_default && lambda {|key, coll|
args = [key, coll]
# only provide extra block args if requested
args = args.slice(0, fetch_default.arity) if fetch_default.arity >= 0
# If we need the default, we need to stop processing the loop immediately
throw :df_value, fetch_default.call(*args)
}
catch(:df_value){
keys.inject(self){|value,key|
block = throw_fetch_default && lambda{|*args|
# sneak the current collection in as an extra block arg
args << value
throw_fetch_default.call(*args)
}
value.fetch(key, &block) if value.class.method_defined? :fetch
}
}
end
# Overload [] to work with multiple keys
def [](*keys)
case keys.size
when 1 then super
else deep_fetch(*keys){|key, coll| coll[key]}
end
end
end
response = { 'foo' => { 'bar' => [1, 2, 3] } }
response.extend(DeepFetch)
p response.deep_fetch('cars') { nil } # nil
p response.deep_fetch('cars', 0) { nil } # nil
p response.deep_fetch('foo') { nil } # {"bar"=>[1, 2, 3]}
p response.deep_fetch('foo', 'bar', 0) { nil } # 1
p response.deep_fetch('foo', 'bar', 3) { nil } # nil
p response.deep_fetch('foo', 'bar', 0, 'engine') { nil } # nil
I tried to look through both the Hash documentation and also through Facets, but nothing stood out as far as I could see.
So you might want to implement your own solution. Here's one option:
class Hash
def deep_index(*args)
args.inject(self) { |e,arg|
break nil if e[arg].nil?
e[arg]
}
end
end
h1 = { 'cars' => [{'engine' => {'5L' => 'It worked'}}] }
h2 = { 'foo' => { 'bar' => [1, 2, 3] } }
p h1.deep_index('cars', 0, 'engine', '5L')
p h2.deep_index('cars', 0, 'engine', '5L')
p h2.deep_index('foo', 'bonk')
Output:
"It worked"
nil
nil
If you can live with getting an empty hash instead of nil when there is no key, then you can do it like this:
response.fetch('cars', {}).fetch(0, {}).fetch('engine', {}).fetch('5L', {})
or save some typing by defining a method Hash#_:
class Hash; def _ k; fetch(k, {}) end end
response._('cars')._(0)._('engine')._('5L')
or do it at once like this:
["cars", 0, "engine", "5L"].inject(response){|h, k| h.fetch(k, {})}
For the sake of reference, there are several projects I know of that tackle the more general problem of chaining methods in the face of possible nils:
andand
ick
zucker's egonil
methodchain
probably others...
There's also been considerable discussion in the past:
Ruby nil-like object - One of many on SO
Null Objects and Falsiness - Great article by Avdi Grimm
The 28 Bytes of Ruby Joy! - Very interesting discussion following J-_-L's post
More idiomatic way to avoid errors when calling method on variable that may be nil? on ruby-talk
et cetera
Having said that, the answers already provided probably suffice for the more specific problem of chained Hash#[] access.
I would suggest an approach of injecting custom #[] method to instances we are interested in:
def weaken_checks_for_brackets_accessor inst
inst.instance_variable_set(:@original_get_element_method, inst.method(:[])) \
unless inst.instance_variable_get(:@original_get_element_method)
singleton_class = class << inst; self; end
singleton_class.send(:define_method, :[]) do |*keys|
begin
res = (inst.instance_variable_get(:@original_get_element_method).call *keys)
rescue
end
weaken_checks_for_brackets_accessor(res.nil? ? inst.class.new : res)
end
inst
end
Being called on the instance of Hash (Array is OK as all the other classes, having #[] defined), this method stores the original Hash#[] method unless it is already substituted (that’s needed to prevent stack overflow during multiple calls.) Then it injects the custom implementation of #[] method, returning empty class instance instead of nil/exception. To use the safe value retrieval:
a = { 'foo' => { 'bar' => [1, 2, 3] } }
p (weaken_checks_for_brackets_accessor a)['foo']['bar']
p "1 #{a['foo']}"
p "2 #{a['foo']['bar']}"
p "3 #{a['foo']['bar']['ghgh']}"
p "4 #{a['foo']['bar']['ghgh'][0]}"
p "5 #{a['foo']['bar']['ghgh'][0]['olala']}"
Yielding:
#⇒ [1, 2, 3]
#⇒ "1 {\"bar\"=>[1, 2, 3]}"
#⇒ "2 [1, 2, 3]"
#⇒ "3 []"
#⇒ "4 []"
#⇒ "5 []"
Since Ruby 2.3, the answer is dig
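A minimal sketch of dig applied to the response hash from the question (requires Ruby 2.3 or later). One caveat worth showing: dig returns nil for missing keys and indexes, but it raises a TypeError when an intermediate value, such as an Integer, cannot be dug into.

```ruby
response = { 'foo' => { 'bar' => [1, 2, 3] } }

missing  = response.dig('cars', 0, 'engine', '5L') # => nil
found    = response.dig('foo', 'bar', 0)           # => 1
past_end = response.dig('foo', 'bar', 3)           # => nil

# Digging into a non-container raises instead of returning nil.
too_deep =
  begin
    response.dig('foo', 'bar', 0, 'engine')
  rescue TypeError => e
    e.class
  end
# too_deep => TypeError
```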

Difference between puts a and puts "#{a}"

I thought that doing puts "#{a}" would result in the same output as puts a, but found this not to be the case. Consider:
irb(main):001:0> a = [1,2]
=> [1, 2]
irb(main):002:0> puts a
1
2
=> nil
irb(main):003:0> puts "#{a}"
12
=> nil
irb(main):004:0>
In the above example it doesn't matter much, but it may matter when I want to print multiple variables on one line, such as (pseudocode):
puts "There are #{a.size} items in the whitelist: #{a}"
Why is the output different here? Do they actually do different things, or have different semantics?
That's because "#{a}" calls the #to_s method on the expression.
So:
puts a # equivalent to, well, puts a
puts "#{a}" # equivalent to the next line
puts a.to_s
Update:
To elaborate, puts eventually calls #to_s but it adds logic in front of the actual output, including special handling for arrays. It just happens that Array#to_s doesn't use the same algorithm. (See puts docs here.) Here is exactly what it does...
rb_io_puts(int argc, VALUE *argv, VALUE out)
{
int i;
VALUE line;
/* if no argument given, print newline. */
if (argc == 0) {
rb_io_write(out, rb_default_rs);
return Qnil;
}
for (i=0; i<argc; i++) {
if (TYPE(argv[i]) == T_STRING) {
line = argv[i];
goto string;
}
line = rb_check_array_type(argv[i]);
if (!NIL_P(line)) {
rb_exec_recursive(io_puts_ary, line, out);
continue;
}
line = rb_obj_as_string(argv[i]);
string:
rb_io_write(out, line);
if (RSTRING_LEN(line) == 0 ||
!str_end_with_asciichar(line, '\n')) {
rb_io_write(out, rb_default_rs);
}
}
return Qnil;
}
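The two code paths can be observed from Ruby by capturing the output in a StringIO, whose puts follows the same rules as IO#puts. This is a sketch for Ruby 1.9 and later, where Array#to_s is the same as inspect; the irb session above shows "12" because Ruby 1.8's Array#to_s joined the elements instead.

```ruby
require 'stringio'

a = [1, 2]

# puts with an Array: each element is written on its own line.
direct = StringIO.new
direct.puts a
direct_output = direct.string             # => "1\n2\n"

# puts with an interpolated String: a.to_s is called first,
# then the resulting single string is written.
interpolated = StringIO.new
interpolated.puts "#{a}"
interpolated_output = interpolated.string # => "[1, 2]\n"
```

To print an array inline in a message, interpolating a.inspect or a.join(", ") makes the intended format explicit.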
