Chaining enumerators that yield multiple arguments - ruby

I'm trying to figure out how Ruby handles chaining enumerators that yield multiple arguments. Take a look at this snippet:
a = ['a', 'b', 'c']
a.each_with_index.select{|pr| p pr}
# prints:
# ["a", 0]
# ["b", 1]
# ["c", 2]
a.each_with_index.map{|pr| p pr}
# prints:
# "a"
# "b"
# "c"
Why does select yield the arguments as an array, whereas map yields them as two separate arguments?

Try:
a.each_with_index.map{|pr,last| p "pr: #{pr} last: #{last}"}
map automatically destructures the values passed to it. The next question is why map does this destructuring and select doesn't.
If you look at the source given on the RDoc page for Array, the two methods are virtually identical; select differs only in that it tests the value yielded. There must be something happening elsewhere.
If we look at the Rubinius source (mainly because I'm better with Ruby than C;) for map (aliased from collect) it shows us:
each do |*o|
so it's splatting the arguments on the way through, whereas select (aliased from find_all) does not:
each do
Again, the design decision as to why is beyond me. You'll have to find out who wrote it, maybe ask Matz :)
I should add, looking at the Rubinius source again, that map actually splats both on each and on yield. I don't understand why you'd do both when only the yield splat is needed:
each do |*o|
ary << yield(*o)
end
whereas select doesn't.
each do
o = Rubinius.single_block_arg
ary << o if yield(o)
end
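The difference between the two Rubinius loops above can be imitated in plain Ruby. This is only a sketch: Pairs, splat_map and packed_select are hypothetical names, and the packing step stands in for Rubinius.single_block_arg on the assumption that it gathers several yielded values into one array, as the output in the question suggests.

```ruby
# Hypothetical collection whose each yields two separate values per step.
class Pairs
  def each
    yield 'a', 0
    yield 'b', 1
  end
end

# map-like: collect the yielded values with a splat, then splat them
# again when calling the caller's block.
def splat_map(source)
  result = []
  source.each { |*o| result << yield(*o) }
  result
end

# select-like: pack the yielded values into one object first
# (assumed stand-in for Rubinius.single_block_arg).
def packed_select(source)
  result = []
  source.each do |*o|
    value = o.size > 1 ? o : o.first
    result << value if yield(value)
  end
  result
end

mapped   = splat_map(Pairs.new) { |x| x }
selected = packed_select(Pairs.new) { |x| x }
p mapped    # => ["a", "b"]
p selected  # => [["a", 0], ["b", 1]]
```

A one-parameter block handed two separate values keeps only the first, while the same block handed one packed array sees the whole pair.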

According to the MRI source, it seems that the iterator used in select packs its incoming arguments into a single array, but the one used in map does not and passes them on unpacked; in your latter case the block silently ignores the extra arguments.
The iterator used in select:
static VALUE
find_all_i(VALUE i, VALUE ary, int argc, VALUE *argv)
{
ENUM_WANT_SVALUE();
if (RTEST(rb_yield(i))) {
rb_ary_push(ary, i);
}
return Qnil;
}
The iterator used in map:
static VALUE
collect_i(VALUE i, VALUE ary, int argc, VALUE *argv)
{
rb_ary_push(ary, enum_yield(argc, argv));
return Qnil;
}
I'm pretty sure the ENUM_WANT_SVALUE() macro is used to pack the values passed to the block into a single array (as opposed to passing them separately, with the later arguments silently ignored). That said, I don't know why it was designed this way.

From the discussion so far, it follows that we can analyze the source code, but we do not know the reasons behind it. The Ruby core team is quite responsive. I recommend that you sign in at http://bugs.ruby-lang.org/issues/ and post a bug report there. They will surely look at this issue within a few weeks at most, and you can probably expect it to be corrected in the next minor version of Ruby. (That is, unless there is a design rationale unknown to us for keeping things as they are.)

Let's look at the MRI source in enum.c. As @PlatinumAzure said, the magic happens in ENUM_WANT_SVALUE():
static VALUE
find_all_i(VALUE i, VALUE ary, int argc, VALUE *argv)
{
ENUM_WANT_SVALUE();
if (RTEST(rb_yield(i))) {
rb_ary_push(ary, i);
}
return Qnil;
}
And we can see that this macro actually expands to: do {i = rb_enum_values_pack(argc, argv);}while(0).
So let's dive into the rb_enum_values_pack function:
VALUE
rb_enum_values_pack(int argc, VALUE *argv)
{
if (argc == 0) return Qnil;
if (argc == 1) return argv[0];
return rb_ary_new4(argc, argv);
}
See? When there are two or more arguments, they are packed into an array by rb_ary_new4, which is defined in array.c.
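The three cases of rb_enum_values_pack can be restated in Ruby as a rough sketch. The method name values_pack is hypothetical, and this is a translation of the C logic shown above for illustration, not code taken from MRI:

```ruby
# Mirrors the three branches of rb_enum_values_pack:
# no values -> nil, one value -> that value, several -> an array.
def values_pack(*argv)
  case argv.size
  when 0 then nil
  when 1 then argv[0]
  else argv
  end
end

none   = values_pack
single = values_pack('a')
pair   = values_pack('a', 0)
p none    # => nil
p single  # => "a"
p pair    # => ["a", 0]
```

This is why the select block in the question sees ["a", 0] as a single object: each_with_index hands over two values, and they are packed before the block is called.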

Related

Why the following code prints garbage values for input strings greater than 128 bytes?

This is a CodeChef problem that I recently came across. The answer seems to be right for every test case where the input string is shorter than 128 bytes, as it passes a couple of test cases. For every input longer than 128 bytes it prints a large value which seems to be garbage.
std::string str;
std::cin>>str;
vector<pair<char,int>> v;
v.push_back(make_pair('C',0));
v.push_back(make_pair('H',0));
v.push_back(make_pair('E',0));
v.push_back(make_pair('F',0));
int i=0;
while(1)
{
if(str[i]=='C')
v['C'].second++;
else if (str[i]=='H')
{
v['H'].second++;
v['C'].second--;
}
else if (str[i]=='E')
{
v['E'].second++;
v['C'].second--;
}
else if (str[i]=='F')
v['F'].second++;
else
break;
i++;
}
Even enclosing the same code within
/*reading the string values from a file and not console*/
std::string input;
std::ifstream infile("input.txt");
while(getline(infile,input))
{
istringstream in(input);
string str;
in>>str;
/* above code goes here */
}
generates the same result. I am not looking for any solution(s) or hint(s) to get to the right answer, as I want to test the correctness of my algorithm. But I want to know why this happens, as I am new to vector containers.
if(str[i]=='C')
v['C'].second++;
You're modifying v[67], which is not contained in your vector, so you are touching either invalid or uninitialized memory.
You seem to be trying to use a vector as an associative array. There is already such a structure in C++: a std::map. Use that instead.
With v['C'] you actually access the element at index 67 ('C' is 67 in ASCII, where 'A' is 65) of a container that has only 4 items. That is undefined behavior; what you actually observe depends on the compiler and build mode (debug vs release).
What you probably wanted to use was map i.e. map<char,int> v; instead of vector<pair<char,int>> v; and simple v['C']++; instead of v['C'].second++;

Binding using std::bind vs lambdas. How expensive are they?

I was playing with bind and I was thinking, are lambdas as expensive as function pointers?
What I mean is, as I understand lambdas, they are syntactic sugar for functors and bind is similar. However, if you do this:
#include<functional>
#include<iostream>
void fn2(int a, int b)
{
std::cout << a << ", " << b << std::endl;
}
void fn1(int a, int b)
{
//auto bound = std::bind(fn2, a, b);
//static auto bound = std::bind(fn2, a, b);
//auto bound = [&]{ fn2(a, b); };
static auto bound = [&]{ fn2(a, b); };
bound();
}
int main()
{
fn1(3, 4);
fn1(1, 2);
return 0;
}
Now, if I use the 1st, auto bound = std::bind(fn2, a, b);, the output is 3, 4 followed by 1, 2. With the 2nd I get 3, 4 followed by 3, 4. With the 3rd and 4th I get the same output as the 1st.
Now I get why the 1st and 2nd work that way, they are getting initialised at the beginning of the function call (the static one, only the 1st time it is called). However, 3 and 4 seem to have compiler magic going on where the generated functors are not really creating references to the enclosing scope's variables, but are actually latching on to the symbols whether or not it is initialised only the first time or every time.
Can someone clarify what is actually happening here?
Edit: What I was missing is using static auto bound = std::bind(fn2, std::ref(a), std::ref(b)); to have it work as the 4th option.
You have this code:
static auto bound = [&]{ fn2(a, b); };
The initialization happens only the first time you invoke this function, because bound is static, so it runs only once. The compiler creates a closure when you write a lambda, so the references to a and b from the first call to fn1 were captured. This is very risky and may lead to dangling references. I'm surprised it didn't crash, since the closure holds references to function parameters passed by value, which are local variables.
I recommend this excellent article about lambdas: http://www.cprogramming.com/c++11/c++11-lambda-closures.html .
As a general rule, only use [&] lambdas when your closure is going to go away by the end of the current scope.
If it is going to outlast the current scope, and you need by-reference, explicitly capture the things you are going to capture, or create local pointers to the things you are going to capture and capture them by-value.
In your case, your static lambda code is full of undefined behavior, as you [&] capture a and b in the first call, then use it in the second call.
In theory, the compiler could rewrite your code to capture a and b by value instead of by reference, then call that every time, because the only difference between that implementation and the one you wrote occurs when the behavior is undefined, and the result will be much faster.
It could do a more efficient job by ignoring your static completely, as the entire state of your static object is undefined after you leave scope the first time you call, and the construction has no visible side effects.
To fix your problem with the lambdas, use [=] or [a,b] to introduce the lambda, and it will capture the a and b by value. I prefer to capture state explicitly on lambdas when I expect the lambda to persist longer than the current block.

Odd Ruby Behaviour

Consider the following Ruby code:
a = ["x"] * 3 # or a = Array.new(3, "x")
a[0].insert(0, "a")
a.each {|i| puts i}
I would expect the output to be ax, x, x (on new lines of course). However, with Ruby 1.9.1 the output is ax, ax, ax. What's going on? I've narrowed the problem down to the way the array a is defined. If I explicitly write out
a = ["x", "x", "x"]
then the code works as expected, but either version in the original code gives me this unexpected behaviour. It appears that the */initializer means the copies are actually references to the same copy of the string "x". However, if instead of the insert command I write
a[0] = "a" + a[0]
Then I get the desired output. Is this a bug, or is there some feature at work which I'm not understanding?
The documentation to Array.new(size=0, obj=nil):
... it is created with size copies of obj (that is, size references to the same obj).
and Array * int:
... returns a new array built by concatenating the int copies of self
So in both of the forms you're surprised by, you end up with three references to the same "x" object, just as you figured out. I'd say you might argue about the design decision, but it's a documented intentional behavior, not a bug.
The best way I know to get the behavior you want without manually writing the array literal (["x", "x", "x"]) is
a = Array.new(3) {"x"}
Of course, with just three elements, it doesn't much matter, but with anything much bigger, this form comes in handy.
In short, although "x" is just a literal, it is an object. You use ["x"] * 3, so a contains three references to the same object. When you insert "a" into one of them, all three appear changed.
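The shared-reference behavior described in the answers can be checked directly by counting distinct object ids. This is a small sketch; the variable names are illustrative, and "x".dup is used only so that the example mutates an ordinary unfrozen string:

```ruby
# One string object referenced three times.
shared = Array.new(3, "x".dup)
shared[0].insert(0, "a")

# The block form runs once per slot, so each slot gets its own string.
distinct = Array.new(3) { "x".dup }
distinct[0].insert(0, "a")

p shared    # => ["ax", "ax", "ax"]
p distinct  # => ["ax", "x", "x"]

shared_ids   = shared.map(&:object_id).uniq.size
distinct_ids = distinct.map(&:object_id).uniq.size
p shared_ids    # => 1
p distinct_ids  # => 3
```

Note that a[0] = "a" + a[0] from the question behaves differently because it builds a new string and rebinds only slot 0, leaving the shared object untouched.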

C++: shared_ptr as unordered_set's key

Consider the following code
#include <boost/unordered_set.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
int main()
{
boost::unordered_set<int> s;
s.insert(5);
s.insert(5);
// s.size() == 1
boost::unordered_set<boost::shared_ptr<int> > s2;
s2.insert(boost::make_shared<int>(5));
s2.insert(boost::make_shared<int>(5));
// s2.size() == 2
}
The question is: how come the size of s2 is 2 instead of 1? I'm pretty sure it must have something to do with the hash function. I tried looking at the boost docs and playing around with the hash function without luck.
Ideas?
make_shared allocates a new int, and wraps a shared_ptr around it. This means that your two shared_ptr<int>s point to different memory, and since you're creating a hash table keyed on pointer value, they are distinct keys.
For the same reason, this will result in a size of 2:
boost::unordered_set<int *> s3;
s3.insert(new int(5));
s3.insert(new int(5));
assert(s3.size() == 2);
For the most part you can consider shared_ptrs to act just like pointers, including for comparisons, except for the auto-destruction.
You could define your own hash function and comparison predicate, and pass them as template parameters to unordered_map, though:
struct your_equality_predicate
: std::binary_function<boost::shared_ptr<int>, boost::shared_ptr<int>, bool>
{
bool operator()(boost::shared_ptr<int> i1, boost::shared_ptr<int> i2) const {
return *i1 == *i2;
}
};
struct your_hash_function
: std::unary_function<boost::shared_ptr<int>, std::size_t>
{
std::size_t operator()(boost::shared_ptr<int> x) const {
return *x; // BAD hash function, replace with something better!
}
};
boost::unordered_map<boost::shared_ptr<int>, int, your_hash_function, your_equality_predicate> s4;
However, this is probably a bad idea for a few reasons:
You have the confusing situation where x != y but s4[x] and s4[y] are the same.
If someone ever changes the value pointed-to by a hash key your hash will break! That is:
boost::shared_ptr<int> tmp(new int(42));
s4[tmp] = 42;
*tmp = 24; // UNDEFINED BEHAVIOR
Typically with hash functions you want the key to be immutable; it will always compare the same, no matter what happens later. If you're using pointers, you usually want the pointer identity to be what is matched on, as in extra_info_hash[&some_object] = ...; this will normally always map to the same hash value whatever some_object's members may be. With keys that are mutable after insertion, it is all too easy to actually change them, resulting in undefined behavior in the hash.
Notice that in Boost <= 1.46.0, the default hash_value of a boost::shared_ptr is its boolean value, true or false.
For any shared_ptr that is not NULL, hash_value evaluates to 1 (one), as the (bool)shared_ptr == true.
In other words, you downgrade a hash set to a linked list if you are using Boost <= 1.46.0.
This is fixed in Boost 1.47.0, see https://svn.boost.org/trac/boost/ticket/5216 .
If you are using std::shared_ptr, please define your own hash function, or use boost/functional/hash/extensions.hpp from Boost >= 1.51.0
As you found out, the two objects inserted into s2 are distinct.

Code folding on consecutive collect/select/reject/each

I play around with arrays and hashes quite a lot in ruby and end up with some code that looks like this:
sum = two_dimensional_array.select{|i|
i.collect{|j|
j.to_i
}.sum > 5
}.collect{|i|
i.collect{|j|
j ** 2
}.average
}.sum
(Let's all pretend that the above code sample makes sense now...)
The problem is that even though TextMate (my editor of choice) picks up simple {...} or do...end blocks quite easily, it can't figure out (which is understandable since even I can't find a "correct" way to fold the above) where the above blocks start and end to fold them.
How would you fold the above code sample?
PS: considering that it could have 2 levels of folding, I only care about the outer consecutive ones (the blocks with the i)
To be honest, something that convoluted is probably confusing TextMate as much as anyone else who has to maintain it, and that includes you in the future.
Whenever you see something that rolls up into a single value, it's a good case for using Enumerable#inject.
sum = two_dimensional_array.inject(0) do |sum, row|
# Convert row to Fixnum equivalent
row_i = row.collect { |i| i.to_i }
if (row_i.sum > 5)
sum += row_i.collect { |i| i ** 2 }.average
end
sum # Carry through to next inject call
end
What's odd in your example is that you're using select to return the full array, allegedly converted using to_i, but in fact Enumerable#select does no such thing: it returns the original elements and only drops those for which the block returns nil or false. I'm presuming that's none of your values.
Also depending on how your .average method is implemented, you may want to seed the inject call with 0.0 instead of 0 to use a floating-point value.
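Here is a self-contained variant of the inject approach, as a sketch only. Since Array has no built-in average, the average helper below is an assumption about what the question's .average does (arithmetic mean as a float), and folded_total and the sample rows are hypothetical names and data:

```ruby
# Assumed meaning of the question's .average: arithmetic mean as a Float.
def average(values)
  values.sum.to_f / values.size
end

# Sum, over rows whose integer total exceeds 5, of the mean of squares.
def folded_total(rows)
  rows.inject(0.0) do |total, row|
    ints = row.map(&:to_i)
    next total unless ints.sum > 5   # skip the row, carry the total along
    total + average(ints.map { |v| v**2 })
  end
end

rows = [%w[1 2 3], %w[1 1 1], %w[4 4]]
result = folded_total(rows)
puts result
```

Seeding with 0.0 keeps the accumulator a Float throughout, and each step of the pipeline now sits on its own line inside one do...end block, which an editor can fold as a unit.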

Resources