Yield within Set to eliminate in an Array - ruby

I found the following code here for eliminating duplicate records in an array:
require 'set'
class Array
def uniq_by
seen = Set.new
select{ |x| seen.add?( yield( x ) ) }
end
end
And we can use the code above as follows:
#messages = Messages.all.uniq_by { |h| h.body }
I would like to know how and what happens when the method is called. Can someone explain the internals of the code above? In the uniq_by method, we did not do anything to handle block argument. How is the passed argument handled by uniq_by method?

Let's break it down :
seen = Set.new
Create an empty set
select{ |x| seen.add?( yield( x ) ) }
Array#select will keep elements when the block yields true.
seen.add?(yield(x)) will return true if the result of the block can be added in the set, or false if it can't.
Indeed, yield(x) will call the block passed to the uniq_by method, and pass x as an argument.
In our case, since our block is { |h| h.body }, it would be the same as calling seen.add?(x.body)
Since a set is unique, calling add? when the element already exists will return false.
So it will try to call .body on each element of the array and add it in a set, keeping elements where the adding was possible.

The method uniq_by accepts a block argument. This allows to specify, by what criteria you wish to identify two elements as "unique".
The yield statement will evaluate the value of the given block for the element and return the value of the elements body attribute.
So, if you call unique_by like above, you are stating that the attribute body of the elements has to be unique for the element to be unique.
To answer the more specific question you have: yield will call the passed block {|h| h.body} like a method, substituting h for the current x and therefore return x.body

In Ruby, when you are putting yield keyword inside any method(say #bar), you are explicitly telling #bar that, you will be using a block with the method #bar. So yield knows, inside the method block will be converted to a Proc object, and yield have to call that Proc object.
Example :
def bar
yield
end
p bar { "hello" } # "hello"
p bar # bar': no block given (yield) (LocalJumpError)
In the uniq_by method, we did not do anything to handle block argument. How is the passed argument handled by uniq_by method?
You did do, that is you put yield. Once you will put this yield, now method is very smart to know, what it supposed to so. In the line Messages.all.uniq_by { |h| h.body } you are passing a block { |h| h.body }, and inside the method definition of uniq_by, that block has been converted to a Proc object, and yield does Proc#call.
Proof:
def bar
p block_given? # true
yield
end
bar { "hello" } # "hello"
Better for understanding :
class Array
def uniq_by
seen = Set.new
select{ |x| seen.add?( yield( x ) ) }
end
end
is same as
class Array
def uniq_by
seen = Set.new
# Below you are telling uniq_by, you will be using a block with it
# by using `yield`.
select{ |x| var = yield(x); seen.add?(var) }
end
end
Read the doc of yield
Called from inside a method body, yields control to the code block (if any) supplied as part of the method call. If no code block has been supplied, calling yield raises an exception. yield can take an argument; any values thus yielded are bound to the block's parameters. The value of a call to yield is the value of the executed code block.

Array#select returns a new array containing all elements of the array for which the given block returns a true value.
The block argument of the select use Set#add? to determine whether the element is already there. add? returns nil if there is already the same element in the set, otherwise it returns the set itself and add the element to the set.
The block again pass the argument (an element of the array) to another block (the block passed to the uniq_by) using yield; Return value of the yield is return value of the block ({|h| h.body })
The select .. statement is basically similar to following statement:
select{ |x| seen.add?(x.body) }
But by using yield, the code avoid hard-coding of .body, and defers decision to the block.

Related

Advantages of using block, proc, lambda in Ruby

Example: LinkedList printing method.
For this object, you will find a printing method using block, proc, and lambda.
It is not clear to me what the advantages/disadvantages are (if any).
Thank you
What is a LinkedList?
A LinkedList is a node that has a specific value attached to it (which is sometimes called a payload), and a link to another node (or nil if there is no next item).
class LinkedListNode
attr_accessor :value, :next_node
def initialize(value, next_node = nil)
#value = value
#next_node = next_node
end
def method_print_values(list_node)
if list_node
print "#{list_node.value} --> "
method_print_values(list_node.next_node)
else
print "nil\n"
return
end
end
end
node1 = LinkedListNode.new(37)
node2 = LinkedListNode.new(99, node1)
node3 = LinkedListNode.new(12, node2)
#printing the linked list through a method defined within the scope of the class
node3.method_print_values(node3)
#---------------------------- Defining the printing method through a BLOCK
def block_print_value(list_node, &block)
if list_node
yield list_node
block_print_value(list_node.next_node, &block)
else
print "nil\n"
return
end
end
block_print_value(node3) { |list_node| print "#{list_node.value} --> " }
#---------------------------- Defining the printing method through a PROC
def proc_print_value(list_node, callback)
if list_node
callback.call(list_node) #this line invokes the print function defined below
proc_print_value(list_node.next_node, callback)
else
print "nil\n"
end
end
proc_print_value(node3, Proc.new {|list_node| print "#{list_node.value} --> "})
#---------------------------- Defining the printing method through a LAMBDA
def lambda_print_value(list_node, callback)
if list_node
callback.call(list_node) #this line invokes the print function defined below
lambda_print_value(list_node.next_node, callback)
else
print "nil\n"
end
end
lambda_print_value(node3, lambda {|list_node| print "#{list_node.value} --> "})
#---------------------------- Defining the printing method outside the class
def print_values(list_node)
if list_node
print "#{list_node.value} --> "
print_values(list_node.next_node)
else
print "nil\n"
return
end
end
print_values(node3)
Examples display how to use different things to do the same. So, there is no principal difference between them in this context:
my_proc = Proc.new { |list_node| print "#{list_node.value} --> " }
node3.block_print_values(node3, &my_proc)
node3.proc_print_value(node3, my_proc)
node3.lambda_print_value(node3, my_proc)
Also, there is possible to define a method by using any of them:
define_method(:my_method, p, &proc { puts p })
my_method 'hello' #=> hello
define_method(:my_method, p, &-> { puts p })
my_method 'hello' #=> hello
But Proc, Lambda, block are not the same. Firstly, need a bit more display how to works magic &. The great article can help with that:
&object is evaluated in the following way:
if object is a block, it converts the block into a simple proc.
if object is a Proc, it converts the object into a block while preserving the lambda? status of the object.
if object is not a Proc, it first calls #to_proc on the object and then converts it into a block.
But this does not show the differences between them. So, now let go to the ruby source:
Proc objects are blocks of code that have been bound to a set of local variables. Once bound, the code may be called in different contexts and still access those variables.
And
+lambda+, +proc+ and Proc.new preserve the tricks of a Proc object given by & argument.
lambda(&lambda {}).lambda? #=> true
proc(&lambda {}).lambda? #=> true
Proc.new(&lambda {}).lambda? #=> true
lambda(&proc {}).lambda? #=> false
proc(&proc {}).lambda? #=> false
Proc.new(&proc {}).lambda? #=> false
Proc created as:
VALUE block = proc_new(klass, FALSE);
rb_obj_call_init(block, argc, argv);
return block;
When lambda:
return proc_new(rb_cProc, TRUE);
Both are Proc. In this case, the difference is just in TRUE or FALSE. TRUE, FALSE - check the number of parameters passed when called.
So, lambda is like more strict Proc:
is_proc = !proc->is_lambda;
Summary of Lambda vs Proc:
Lambdas check the number of arguments, while procs do not.
Return within the proc would exit the method from where it is called.
Return within a lambda would exit it from the lambda and the method would continue executing.
Lambdas are closer to a method.
Blocks: They are called closures in other languages, it is a way of grouping code/statements. In ruby single line blocks are written in {} and multi-line blocks are represented using do..end.
Block is not an object and can not be saved in a variable. Lambda and Proc are both an object.
So, let do small code test based on this answer:
# ruby 2.5.1
user system total real
0.016815 0.000000 0.016815 ( 0.016823)
0.023170 0.000001 0.023171 ( 0.023186)
0.117713 0.000000 0.117713 ( 0.117775)
0.217361 0.000000 0.217361 ( 0.217388)
This shows that using block.call is almost 2x slower than using yield.
Thanks, #engineersmnky, for good references in comments.
Proc is an object wrapper over block. Lambda basically is a proc with different behavior.
AFAIK pure blocks are more rational to use compared to procs.
def f
yield 123
end
Should be faster than
def g(&block)
block.call(123)
end
But proc can be passed on further.
I guess you should find some articles with performance comparison on the toppic
IMO, your block_print_value method is poorly designed/named, which makes it impossible to answer your question directly. From the name of the method, we would expect that the method "prints" something, but the only printing is the border condition, which does a
print "nil\n"
So, while I would strongly vote against using this way to print the tree, it doesn't mean that the whole idea of using a block for the printing problem is bad.
Since your problem looks like a programming assignment, I don't post a whole solution, but give a hint:
Replace your block_print_value by a, say block_visit_value, which does the same like your current method, but doesn't do any printing. Instead, the "else" part could also invoke the block to let it do the printing.
I'm sure that you will see afterwards the advantage of this method. If not, come back here for a discussion.
At a high level, procs are methods that can be stored inside variables like so:
full_name = Proc.new { |first,last| first + " " + last }
I can call this in two ways, using the bracket syntax followed by the arguments I want to pass to it or use the call method to run the proc and pass in arguments inside of parentheses like so:
p full_name.call("Daniel","Cortes")
What I did with the first line above is create a new instance of Proc and assigned it to a variable called full_name. Procs can take a code block as a parameter so I passed it two different arguments, arguments go inside the pipes.
I can also make it print my name five times:
full_name = Proc.new { |first| first * 5 }
The block I was referring to is called a closure in other programming languages. Blocks allow you to group statements together and encapsulate behavior. You can create blocks with curly braces or do...end syntax.
Why use Procs?
The answer is Procs give you more flexibility than methods. With Procs you can store an entire set of processes inside a variable and then call the variable anywhere else in your program.
Similar to Procs, Lambdas allow you to store functions inside a variable and call the method from other parts of the program. So really the same code I had above can be used like so:
full_name = lambda { |first,last| first + " " + last }
p full_name["daniel","cortes"]
So what is the difference between the two?
There are two key differences in addition to syntax. Please note that the differences are subtle, even to the point that you may never even notice them while programming.
The first key difference is that Lambdas count the arguments you pass to them whereas Procs do not. For example:
full_name = lambda { |first,last| first + " " + last }
p full_name.call("Daniel","Cortes")
The code above works, however, if I pass it another argument:
p full_name.call("Daniel","Abram","Cortes")
The application throws an error saying that I am passing in the wrong number of arguments.
However, with Procs it will not throw an error. It simply looks at the first two arguments and ignores anything after that.
Secondly, Lambdas and Procs have different behavior when it comes to returning values from methods, for example:
def my_method
x = lambda { return }
x.call
p "Text within method"
end
If I run this method, it prints out Text within method. However, if we try the same exact implementation with a Proc:
def my_method
x = Proc.new { return }
x.call
p "Text within method"
end
This will return a nil value.
Why did this occur?
When the Proc saw the word return it exited out of the entire method and returned a nil value. However, in the case of the Lambda, it processed the remaining part of the method.

How does a code block in Ruby know what variable belongs to an aspect of an object?

Consider the following:
(1..10).inject{|memo, n| memo + n}
Question:
How does n know that it is supposed to store all the values from 1..10? I'm confused how Ruby is able to understand that n can automatically be associated with (1..10) right away, and memo is just memo.
I know Ruby code blocks aren't the same as the C or Java code blocks--Ruby code blocks work a bit differently. I'm confused as to how variables that are in between the upright pipes '|' will automatically be assigned to parts of an object. For example:
hash1 = {"a" => 111, "b" => 222}
hash2 = {"b" => 333, "c" => 444}
hash1.merge(hash2) {|key, old, new| old}
How do '|key, old, new|' automatically assign themselves in such a way such that when I type 'old' in the code block, it is automatically aware that 'old' refers to the older hash value? I never assigned 'old' to anything, just declared it. Can someone explain how this works?
The parameters for the block are determined by the method definition. The definition for reduce/inject is overloaded (docs) and defined in C, but if you wanted to define it, you could do it like so (note, this doesn't cover all the overloaded cases for the actual reduce definition):
module Enumerable
def my_reduce(memo=nil, &blk)
# if a starting memo is not given, it defaults to the first element
# in the list and that element is skipped for iteration
elements = memo ? self : self[1..-1]
memo ||= self[0]
elements.each { |element| memo = blk.call(memo, element) }
memo
end
end
This method definition determines what values to use for memo and element and calls the blk variable (a block passed to the method) with them in a specific order.
Note, however, that blocks are not like regular methods, because they don't check the number of arguments. For example: (note, this example shows the usage of yield which is another way to pass a block parameter)
def foo
yield 1
end
# The b and c variables here will be nil
foo { |a, b, c| [a,b,c].compact.sum } # => 1
You can also use deconstruction to define variables at the time you run the block, for example if you wanted to reduce over a hash you could do something like this:
# this just copies the hash
{a: 1}.reduce({}) { |memo, (key, val)| memo[key] = val; memo }
How this works is, calling reduce on a hash implicitly calls to_a, which converts it to a list of tuples (e.g. {a: 1}.to_a = [[:a, 1]]). reduce passes each tuple as the second argument to the block. In the place where the block is called, the tuple is deconstructed into separate key and value variables.
A code block is just a function with no name. Like any other function, it can be called multiple times with different arguments. If you have a method
def add(a, b)
a + b
end
How does add know that sometimes a is 5 and sometimes a is 7?
Enumerable#inject simply calls the function once for each element, passing the element as an argument.
It looks a bit like this:
module Enumerable
def inject(memo)
each do |el|
memo = yield memo, el
end
memo
end
end
And memo is just memo
what do you mean, "just memo"? memo and n take whatever values inject passes. And it is implemented to pass accumulator/memo as first argument and current collection element as second argument.
How do '|key, old, new|' automatically assign themselves
They don't "assign themselves". merge assigns them. Or rather, passes those values (key, old value, new value) in that order as block parameters.
If you instead write
hash1.merge(hash2) {|foo, bar, baz| bar}
It'll still work exactly as before. Parameter names mean nothing [here]. It's actual values that matter.
Just to simplify some of the other good answers here:
If you are struggling understanding blocks, an easy way to think of them is as a primitive and temporary method that you are creating and executing in place, and the values between the pipe characters |memo| is simply the argument signature.
There is no special special concept behind the arguments, they are simply there for the method you are invoking to pass a variable to, like calling any other method with an argument. Similar to a method, the arguments are "local" variables within the scope of the block (there are some nuances to this depending on the syntax you use to call the block, but I digress, that is another matter).
The method you pass the block to simply invokes this "temporary method" and passes the arguments to it that it is designed to do. Just like calling a method normally, with some slight differences, such as there are no "required" arguments. If you do not define any arguments to receive, it will happily just not pass them instead of raising an ArgumentError. Likewise, if you define too many arguments for the block to receive, they will simply be nil within the block, no errors for not being defined.

Using yield within a select statement

If I wanted to prune an array by a given set of parameters I would write something like this:
array = [4,5,6,7,8]
a = array.select{|i| i>=5}
puts a.inspect
which would return [5,6,7,8].
I want to write a function "filter" which accomplishes the same thing. In this case my first thought is to write something like:
array = [4,5,6,7,8]
a = filter(array) {|i| i >= 5}
puts a.inspect
What I can't figure out is how to properly call yield within the method to invoke the code block during the select statement:
a = array.select{yield}
Doesn't seem to work since it attempts to call the code block on nil, not the array within the function. What's the proper way of doing this?
Don't know if it makes sense for you, but try:
def filter(array)
array.select { |i| yield(i) }
end
array = [4,5,6,7,8]
p filter(array) {|i| i >= 5}
When you write code within braces {...} to be passed to a method, this code is called a block. It is normally passed implicitly to a method (i.e. it is not a named argument). To invoke this implicit block, you call yield.
In your case, you don't want to invoke the block yourself; you want your filter method to pass the block along to select, where the actual filtering takes place.
To "pass along a block", you can make the method's block argument explicit by using the & prefix. Note that the name block in this example is just convention; there is no special block keyword. The important part is the & character:
def filter(array, &block)
array.select(&block)
end
array = [4,5,6,7,8]
filter(array) { |i| i >= 5 } # => [5,6,7,8]

Simple way to understand returning from a block in ruby

My code is supposed to print integers in an array.
odds_n_ends = [:weezard, 42, "Trady Blix", 3, true, 19, 12.345]
ints = odds_n_ends.select { |x| if x.is_a?(Integer) then return x end }
puts ints
It gives me an error in the 2nd line - in 'block in <main>': unexpected return (LocalJumpError)
When I remove the return, the code works exactly as desired.
To find the mistake in my understanding of blocks, I read related posts post1 and post2. But, I am not able to figure out how exactly are methods and blocks being called and why my approach is incorrect.
Is there some call stack diagram explanation for this ? Any simple explanation ?
I am confused because I have only programmed in Java before.
You generally don't need to worry exactly what blocks are to use them.
In this situation, return will return from the outside scope, e.g. if these lines were in a method, then from that method. It's the same as if you put a return statement inside a loop in Java.
Additional tips:
select is used to create a copied array where only the elements satisfying the condition inside the block are selected:
only_ints = odds_n_ends.select { |x| x.is_a?(Integer) }
You're using it as a loop to "pass back" variables that are integers, in which case you'd do:
only_ints = []
odds_n_ends.each { |x| if x.is_a?(Integer) then only_ints << x end }
If you try to wrap your code in a method then it won't give you an error:
def some_method
odds_n_ends = [:weezard, 42, "Trady Blix", 3, true, 19, 12.345]
ints = odds_n_ends.select { |x| if x.is_a?(Integer) then return true end }
puts ints
end
puts some_method
This code output is true. But wait, where's puts ints??? Ruby didn't reach that. When you put return inside a Proc, then you're returning in the scope of the entire method. In your example, you didn't have any method in which you put your code, so after it encountered 'return', it didn't know where to 'jump to', where to continue to.
Array#select basically works this way: For each element of the array (represented with |x| in your code), it evaluates the block you've just put in and if the block evaluates to true, then that element will be included in the new array. Try removing 'return' from the second line and your code will work:
ints = odds_n_ends.select { |x| if x.is_a?(Integer) then true end }
However, this isn't the most Ruby-ish way, you don't have to tell Ruby to explicitly return true. Blocks (the code between the {} ) are just like methods, with the last expression being the return value of the method. So this will work just as well:
ints = odds_n_ends.select { |x| if x.is_a?(Integer) } # imagine the code between {} is
#a method, just without name like 'def is_a_integer?' with the value of the last expression
#being returned.
Btw, there's a more elegant way to solve your problem:
odds_n_ends = [:weezard, 42, "Trady Blix", 3, true, 19, 12.345]
ints = odds_n_ends.grep(Integer)
puts ints
See this link. It basically states:
Returns an array of every element in enum for which Pattern ===
element.
To understand Pattern === element, simply imagine that Pattern is a set (let's say a set of Integers). Element might or might not be an element of that set (an integer). How to find out? Use ===. If you type in Ruby:
puts Integer === 34
it will evalute to true. If you put:
puts Integer === 'hey'
it will evalute to false.
Hope this helped!
In ruby a method always returns it's last statement, so in generall you do not need to return unless you want to return prematurely.
In your case you do not need to return anything, as select will create a new array with just the elements that return true for the given block. As ruby automatically returns it's last statement using
{ |x| x.is_a?(Integer) }
would be sufficient. (Additionally you would want to return true and not x if you think about "return what select expects", but as ruby treats not nil as true it also works...)
Another thing that is important is to understand a key difference of procs (& blocks) and lambdas which is causing your problem:
Using return in a Proc will return the method the proc is used in.
Using return in a Lambdas will return it's value like a method.
Think of procs as code pieces you inject in a method and of lambdas as anonymous methods.
Good and easy to comprehend read: Understanding Ruby Blocks, Procs and Lambdas
When passing blocks to methods you should simply put the value you want to be returned as the last statement, which can also be in an if-else clause and ruby will use the last actually reached statement.

Accessing a passed block in Ruby

I have a method that accepts a block, lets call it outer. It in turn calls a method that accepts another block, call it inner.
What I would like to have happen is for outer to call inner, passing it a new block which calls the first block.
Here's a concrete example:
class Array
def delete_if_index
self.each_with_index { |element, i| ** A function that removes the element from the array if the block passed to delete_if_index is true }
end
end
['a','b','c','d'].delete_if_index { |i| i.even? }
=> ['b','d']
the block passed to delete_if_index is called by the block passed to each_with_index.
Is this possible in Ruby, and, more broadly, how much access do we have to the block within the function that receives it?
You can wrap a block in another block:
def outer(&block)
if some_condition_is_true
wrapper = lambda {
p 'Do something crazy in this wrapper'
block.call # original block
}
inner(&wrapper)
else
inner(&passed_block)
end
end
def inner(&block)
p 'inner called'
yield
end
outer do
p 'inside block'
sleep 1
end
I'd say opening up an existing block and changing its contents is Doing it WrongTM, maybe continuation-passing would help here? I'd also be wary of passing around blocks with side-effects; I try and keep lambdas deterministic and have actions like deleting stuff in the method body. In a complex application this will likely make debugging a lot easier.
Maybe the example is poorly chosen, but your concrete example is the same as:
[1,2,3,4].reject &:even?
Opening up and modifying a block strikes me as code smell. It'd be difficult to write it in a way that makes the side effects obvious.
Given your example, I think a combination of higher order functions will do what you're looking to solve.
Update: It's not the same, as pointed out in the comments. [1,2,3,4].reject(&:even?) looks at the contents, not the index (and returns [1,3], not [2,4] as it would in the question). The one below is equivalent to the original example, but isn't vary pretty.
[1,2,3,4].each_with_index.reject {|element, index| index.even? }.map(&:first)
So here's a solution to my own question. The passed in block is implicitly converted into a proc which can be received with the & parameter syntax. The proc then exists inside the closure of any nested block, as it is assigned to a local variable in scope, and can be called by it:
class Array
def delete_if_index(&proc)
ary = []
self.each_with_index { |a, i| ary << self[i] unless proc.call(i) }
ary
end
end
[0,1,2,3,4,5,6,7,8,9,10].delete_if_index {|index| index.even?}
=> [1, 3, 5, 7, 9]
Here the block is converted into a proc, and assigned to the variable proc, which is then available within the block passed to each_with_index.

Resources