I recently discovered Ruby's blocks and yielding features, and I was wondering: where does this fit in terms of computer science theory? Is it a functional programming technique, or something more specific?
Ruby's yield is not an iterator like in C# and Python. yield itself is actually a really simple concept once you understand how blocks work in Ruby.
Yes, blocks are a functional programming feature, even though Ruby is not properly a functional language. In fact, Ruby uses the method lambda to create block objects, which is borrowed from Lisp's syntax for creating anonymous functions — which is what blocks are. From a computer science standpoint, Ruby's blocks (and Lisp's lambda functions) are closures. In Ruby, methods usually take only one block. (You can pass more, but it's awkward.)
The yield keyword in Ruby is just a way of calling a block that's been given to a method. These two examples are equivalent:
def with_log
output = yield # We're calling our block here with yield
puts "Returned value is #{output}"
end
def with_log(&stuff_to_do) # the & tells Ruby to convert into
# an object without calling lambda
output = stuff_to_do.call # We're explicitly calling the block here
puts "Returned value is #{output}"
end
In the first case, we're just assuming there's a block and say to call it. In the other, Ruby wraps the block in an object and passes it as an argument. The first is more efficient and readable, but they're effectively the same. You'd call either one like this:
with_log do
a = 5
other_num = gets.to_i
#my_var = a + other_num
end
And it would print the value that wound up getting assigned to #my_var. (OK, so that's a completely stupid function, but I think you get the idea.)
Blocks are used for a lot of things in Ruby. Almost every place you'd use a loop in a language like Java, it's replaced in Ruby with methods that take blocks. For example,
[1,2,3].each {|value| print value} # prints "123"
[1,2,3].map {|value| 2**value} # returns [2, 4, 8]
[1,2,3].reject {|value| value % 2 == 0} # returns [1, 3]
As Andrew noted, it's also commonly used for opening files and many other places. Basically anytime you have a standard function that could use some custom logic (like sorting an array or processing a file), you'll use a block. There are other uses too, but this answer is already so long I'm afraid it will cause heart attacks in readers with weaker constitutions. Hopefully this clears up the confusion on this topic.
There's more to yield and blocks than mere looping.
The series Enumerating enumerable has a series of things you can do with enumerations, such as asking if a statement is true for any member of a group, or if it's true for all the members, or searching for any or all members meeting a certain condition.
Blocks are also useful for variable scope. Rather than merely being convenient, it can help with good design. For example, the code
File.open("filename", "w") do |f|
f.puts "text"
end
ensures that the file stream is closed when you're finished with it, even if an exception occurs, and that the variable is out of scope once you're finished with it.
A casual google didn't come up with a good blog post about blocks and yields in ruby. I don't know why.
Response to comment:
I suspect it gets closed because of the block ending, not because the variable goes out of scope.
My understanding is that nothing special happens when the last variable pointing to an object goes out of scope, apart from that object being eligible for garbage collection. I don't know how to confirm this, though.
I can show that the file object gets closed before it gets garbage collected, which usually doesn't happen immediately. In the following example, you can see that a file object is closed in the second puts statement, but it hasn't been garbage collected.
g = nil
File.open("/dev/null") do |f|
puts f.inspect # #<File:/dev/null>
puts f.object_id # Some number like 70233884832420
g = f
end
puts g.inspect # #<File:/dev/null (closed)>
puts g.object_id # The exact same number as the one printed out above,
# indicating that g points to the exact same object that f pointed to
I think the yield statement originated from the CLU language. I always wonder if the character from Tron was named after CLU too....
I think 'coroutine' is the keyword you're looking for.
E.g. http://en.wikipedia.org/wiki/Yield
Yield in computing and information science:
in computer science, a point of return (and re-entry) of a coroutine
Related
In a blog post about unconditional programming Michael Feathers shows how limiting if statements can be used as a tool for reducing code complexity.
He uses a specific example to illustrate his point. Now, I've been thinking about other specific examples that could help me learn more about unconditional/ifless/forless programming.
For example in this cat clone there is an if..else block:
#!/usr/bin/env ruby
if ARGV.length > 0
ARGV.each do |f|
puts File.read(f)
end
else
puts STDIN.read
end
It turns out ruby has ARGF which makes this program much simpler:
#!/usr/bin/env ruby
puts ARGF.read
I'm wondering if ARGF didn't exist how could the above example be refactored so there is no if..else block?
Also interested in links to other illustrative specific examples.
Technically you can,
inputs = { ARGV => ARGV.map { |f| File.open(f) }, [] => [STDIN] }[ARGV]
inputs.map(&:read).map(&method(:puts))
Though that's code golf and too clever for its own good.
Still, how does it work?
It uses a hash to store two alternatives.
Map ARGV to an array of open files
Map [] to an array with STDIN, effectively overwriting the ARGV entry if it is empty
Access ARGV in the hash, which returns [STDIN] if it is empty
Read all open inputs and print them
Don't write that code though.
As mentioned in my answer to your other question, unconditional programming is not about avoiding if expressions at all costs but about striving for readable and intention revealing code. And sometimes that just means using an if expression.
You can't always get rid of a conditional (maybe with an insane number of classes) and Michael Feathers isn't advocating that. Instead it's sort of a backlash against overuse of conditionals. We've all seen nightmare code that's endless chains of nested if/elsif/else and so has he.
Moreover, people do routinely nest conditionals inside of conditionals. Some of the worst code I've ever seen is a cavernous nightmare of nested conditions with odd bits of work interspersed within them. I suppose that the real problem with control structures is that they are often mixed with the work. I'm sure there's some way that we can see this as a form of single responsibility violation.
Rather than slavishly try to eliminate the condition, you could simplify your code by first creating an array of IO objects from ARGV, and use STDIN if that list is empty.
io = ARGV.map { |f| File.new(f) };
io = [STDIN] if !io.length;
Then your code can do what it likes with io.
While this has strictly the same number of conditionals, it eliminates the if/else block and thus a branch: the code is linear. More importantly, since it separates gathering data from using it, you can put it in a function and reuse it further reducing complexity. Once it's in a function, we can take advantage of early return.
# I don't have a really good name for this, but it's a
# common enough idiom. Perl provides the same feature as <>
def arg_files
return ARGV.map { |f| File.new(f) } if ARGV.length;
return [STDIN];
end
Now that it's in a function, your code to cat all the files or stdin becomes very simple.
arg_files.each { |f| puts f.read }
First, although the principle is good, you have to consider other things that are more importants such as readability and perhaps speed of execution.
That said, you could monkeypatch the String class to add a read method and put STDIN and the arguments in an array and start reading from the beginning until the end of the array minus 1, so stopping before STDIN if there are arguments and go on until -1 (the end) if there are no arguments.
class String
def read
File.read self if File.exist? self
end
end
puts [*ARGV, STDIN][0..ARGV.length-1].map{|a| a.read}
Before someone notices that I still use an if to check if a File exists, you should have used two if's in your example to check this also and if you don't, use a rescue to properly inform the user.
EDIT: if you would use the patch, read about the possible problems at these links
http://blog.jayfields.com/2008/04/alternatives-for-redefining-methods.html
http://www.justinweiss.com/articles/3-ways-to-monkey-patch-without-making-a-mess/
Since the read method isn't part of String the solutions using alias and super are not necessary, if you plan to use a Module, here is how to do that
module ReadString
def read
File.read self if File.exist? self
end
end
class String
include ReadString
end
EDIT: just read about a safe way to monkey patch, for your documentation see https://solidfoundationwebdev.com/blog/posts/writing-clean-monkey-patches-fixing-kaminari-1-0-0-argumenterror-comparison-of-fixnum-with-string-failed?utm_source=rubyweekly&utm_medium=email
I'm trying to understand when one should code blocks implicitly or explicitly. Given the following code blocks:
Implicit
def two_times_implicit
return "No block" unless block_given?
yield
yield
end
puts two_times_implicit { print "Hello "}
puts two_times_implicit
Explicit
def two_times_explicit (&i_am_a_block)
return "No block" if i_am_a_block.nil?
i_am_a_block.call
i_am_a_block.call
end
puts two_times_explicit { puts "Hello"}
puts two_times_explicit
Is it preferable to code using one over the other? Is there a standard practice and are there instances where one would work better or differently than the other and where one would not work at all?
Receiving a block via & creates a new proc object out of the block, so from the point of view of efficiency, it is better not to use it. However, using & generally makes it easier to define methods that may or may not take a block, and using &, you can also handle blocks together with arguments, so it is preferred by many.
Actually, according to one very interesting read, second variant is 439% slower (related thread on HackerNews).
TL;DR: Creating and passing a block via yield is a highly optimized common case in MRI, which is handled by dedicated C function in interpreter, while passing &block is implemented differently and has a big overhead of creating new environment and creating Proc itself on every call.
Summing up, use &block only if you need passing it further (for example, to a next function), or manipulate it somehow in other way. Otherwise, use yield, since it's way faster.
I came across three ways of writing a loop.
the_count = [1, 2, 3, 4, 5]
for loop 1
for number in the_count
puts "This is count #{number}"
end
for loop 2
the_count.each do |count|
puts "The counter is at: #{count}"
end
for loop 3
the_count.each {|i| puts "I got #{i}"}
Are there situations in which one way is a good practice or better solution than the other two? The first one is the most similar to the ones in other languages, and for me, the third one looks unorderly.
The first option is generally discouraged. It is possible in ruby to be more friendly towards developers coming from other languages (as they recognize the syntax) but it behaves a bit strange regarding variable visibility. Generally, you should avoid this variant everywhere and use only one of the block variants.
The advantage of the two other variants is that it works the same for all methods accepting a block, e.g. map, reduce, take_while and others.
The two bottom variants are mostly equivalent You use the each method and provide it with a block. The each method calls the block once for each element in the array.
Which one you use is mostly up to preference. Most people tend to use the one with braces for simple blocks which don't require a line-break. If you want to use a line-break in your block, e.g. if you have multiple statements there, you should use the do...end variant. This makes your code more readable.
There are other slightly more nuanced opinions on when you should use one or the other (e.g. some always use the braces form when writing functional block, i.e. ones which don't affect the outside of the block even when they are longer), but if you follow this above advice, you will please at least 98% of all ruby developers reading your code.
Thus, in conclusion, avoid the for i in ... variants (the same counts for while, until, ...) and always use the block-form. Use the do...end of block for complex blocks and the braces-form for simple one-line blocks.
When you use the the block form, you should however be aware of the slight differences in priority when chaining methods.
This
foo bar { |i| puts i }
is equivalent to
foo(bar{|i| puts i})
while
foo bar do |i|
puts i
end
is equivalent to
foo(bar) { |i| puts i }
As you can see, in the braces form, the block is passed to the right-most method while in the do...end form, the block is passed to the left-most method. You can always resolve the ambiguity with parenthesis though.
It should be noted that this is trade-off between idiomatic Ruby (solutions 2 and 3) and performant Ruby (using while loops, because for …in uses each under the hood) as pointed out in Yet Another Language Speed Test: Counting Primes:
Notably, it should be mentioned that writing idiomatic Python and Ruby results in much slower code than that used here. Ranges bad. While loops good.
While it's generally encouraged to opt for idiomatic Ruby, there are perfectly valid situations where you want to ignore that advice.
This question already has answers here:
How to sum array of numbers in Ruby?
(16 answers)
Closed 8 years ago.
Q: Write a method, sum which takes an array of numbers and returns the sum of the numbers.
A:
def sum(nums)
total = 0
i = 0
while i < nums.count
total += nums[i]
i += 1
end
# return total
total
end
There has to be another way to solve this without using while, right? Anyone know how?
Edit: This is not an exam or test. This is a practice problem provided on github for app academy. They provide the question and answer as an example. I just read however that good programmers don't like to use while or unless, so I was curious if I could learn something to solve this problem a better way. Like with enumerable? (Noob at Ruby here, obviously..)
Also, I would love any walkthrough or methods that I should learn.. This question is also different because I am asking for specific examples using this data.
The usual way of doing that would be this:
def sum(nums) nums.reduce(&:+) end
which is short for something like this:
def sum(nums) nums.reduce(0) { |total, num| total + num } end
I see that Neil posted a similar solution while I was typing this, so I'll just note that reduce and inject are two names for the same method - Ruby has several aliases like this so that people used to different other languages can find what they're looking for. He also left off the &, which is optional when using a named method for reduce/inject, but not in other cases.
Explanation follows.
In Ruby you don't normally use explicit loops (for, while, etc.). Instead you call methods on the collection you're iterating over, and pass them a block of code to execute on each item. Ruby's syntax places the block after the arguments to the method, between either do...end or {...}, so it looks like traditional imperative flow control, but it works differently.
The basic iteration method is each:
[1,2,3].each do |i| puts i end
That calls the block do |i| puts i end three times, passing it 1, then passing it 2, and finally passing it 3. The |i| is a block parameter, which tells Ruby where to put the value(s) passed into the block each time.
But each just throws away the return value of the block calls (in this case, the three nils returned by puts). If you want to do something with those return values, you have to call a different method. For example, map returns an array of the return values:
[1,2,3].map do |i| puts i end
#=> [nil, nil, nil]
That's not very interesting here, but it becomes more useful if the block returns something:
[1,2,3].map do |i| 2*i end
#=> [2,4,6]
If you want to combine the results into a single aggregate return value instead of getting back an array that's the same size as the input, that's when you reach for reduce. In addition to a block, it takes an extra argument, and the block itself is also called with an extra argument. The extra parameter corresponding to this argument is called the "accumulator"; the first time the block is called, it gets the argument originally passed to reduce, but from then on, it gets the return value of the previous call to the block, which is how each block call can pass information along to the next.
That makes reduce more general than map; in fact, you can build map out of reduce by passing in an empty array and having the block add to it:
[1,2,3].reduce([]) do |a,i| a + [2*i] end
#=> [2,4,6]
But since map is already defined, you would normally just use it for that, and only use reduce to do things that are more, well, reductive:
[1,2,3].reduce(0) do |s, i| s + 2*i end
#=> 12
...which is what we're doing in solving your problem.
Neil and I took a couple extra shortcuts. First, if a block does nothing but call a single method on its parameters and return the result, you can get an equivalent block by prefixing &: to the method name. That is, this:
some_array.reduce(x) do |a,b| a.some_method(b) end
can be rewritten more simply as this:
some_array.reduce(x, &:some_method)
and since a + b in Ruby is really just a more-familiar way of writing the method call a.+(b), that means that you can add up numbers by just passing in &:+:
[1,2,3].reduce(0, &:+)
#=> 6
Next, the initial accumulator value for reduce is optional; if you leave it out, then the first time the block is called, it gets the first two elements of the array. So you can leave off the 0:
[1,2,3].reduce(&:+)
#=> 6
Finally, you normally need the & any time you are passing in a block that is not a literal chunk of code. You can turn blocks into Proc objects and store them in variables and in general treat them like any other value, including passing them as regular arguments to method calls. So when you want to use one as the block on a method call instead, you indicate that with the &.
Some methods, including reduce, will also accept a bare Symbol (like :+) and create the Proc/block for you; and Neil took advantage of that fact. But other iterator methods, such as map, don't work that way:
irb(main):001:0> [-1,2,-3].map(:abs)
ArgumentError: wrong number of arguments (1 for 0)
from (irb):1:in `map'
from (irb):1
from /usr/bin/irb:12:in `<main>'
So I just always use the &.
irb(main):002:0> [-1,2,-3].map(&:abs)
#=> [1, 2, 3]
There are lots of good online tutorials for Ruby. For more general information about map/reduce and related concepts, and how to apply them to problem-solving, you should search for introductions to "functional programming", which is called that because it treats "functions" (that is, blocks of executable code, which in Ruby are realized as Proc objects) as values just like numbers and strings, which can be passed around, assigned to variables, etc.
Probably the most idiomatic way of doing this in Ruby is:
nums.inject(:+)
. . . although this basically hides all the working, so it depends what the test is trying to test.
Documentation for Array#inject
As is stated in the title, I was curious to know why Ruby decided to go away from classical for loops and instead use the array.each do ...
I personally find it a little less readable, but that's just my personal opinion. No need to argue about that. On the other hand, I suppose they designed it that way on purpose, there should be a good reason behind.
So, what are the advantages of putting loops that way? What is the "raison d'etre" of this design decision?
This design decision is a perfect example of how Ruby combines the object oriented and functional programming paradigms. It is a very powerful feature that can produce simple readable code.
It helps to understand what is going on. When you run:
array.each do |el|
#some code
end
you are calling the each method of the array object, which, if you believe the variable name, is an instance of the Array class. You are passing in a block of code to this method (a block is equivalent to a function). The method can then evaluate this block and pass in arguments either by using block.call(args) or yield args. each simply iterates through the array and for each element it calls the block you passed in with that element as the argument.
If each was the only method to use blocks, this wouldn't be that useful but many other methods and you can even create your own. Arrays, for example have a few iterator methods including map, which does the same as each but returns a new array containing the return values of the block and select which returns a new array that only contains the elements of the old array for which the block returns a true value. These sorts of things would be tedious to do using traditional looping methods.
Here's an example of how you can create your own method with a block. Let's create an every method that acts a bit like map but only for every n items in the array.
class Array #extending the built in Array class
def every n, &block #&block causes the block that is passed in to be stored in the 'block' variable. If no block is passed in, block is set to nil
i = 0
arr = []
while i < self.length
arr << ( block.nil? ? self[i] : block.call(self[i]) )#use the plain value if no block is given
i += n
end
arr
end
end
This code would allow us to run the following:
[1,2,3,4,5,6,7,8].every(2) #= [1,3,5,7] #called without a block
[1,2,3,4,5,6,7,8,9,10].every(3) {|el| el + 1 } #= [2,5,8,11] #called with a block
Blocks allow for expressive syntax (often called internal DSLs), for example, the Sinatra web microframework.
Sinatra uses methods with blocks to succinctly define http interaction.
eg.
get '/account/:account' do |account|
#code to serve of a page for this account
end
This sort of simplicity would be hard to achieve without Ruby's blocks.
I hope this has allowed you to see how powerful this language feature is.
I think it was mostly because Matz was interested in exploring what a fully object oriented scripting language would look like when he built it; this feature is based heavily on the CLU programming language's iterators.
It has turned out to provide some interesting benefits; a class that provides an each method can 'mix in' the Enumerable module to provide a huge variety of pre-made iteration routines to clients, which reduces the amount of tedious boiler-plate array/list/hash/etc iteration code that must be written. (Ever see java 4 and earlier iterators?)
I think you are kind of biased when you ask that question. Another might ask "why were C for loops designed that way?". Think about it - why would I need to introduce counter variable if I only want to iterate through array's elements? Say, compare these two (both in pseudocode):
for (i = 0; i < len(array); i++) {
elem = array[i];
println(elem);
}
and
for (elem in array) {
println(elem);
}
Why would the first feel more natural than the second, except for historical (almost sociological) reasons?
And Ruby, highly object-oriented as is, takes this even further, making it an array method:
array.each do |elem|
puts elem
end
By making that decision, Matz just made the language lighter for superfluous syntax construct (foreach loop), delegating its use to ordinary methods and blocks (closures). I appreciate Ruby the most just for this very reason - being really rational and economical with language features, but retaining expressiveness.
I know, I know, we have for in Ruby, but most of the people consider it unneccessary.
The do ... end blocks (or { ... }) form a so-called block (almost a closure, IIRC). Think of a block as an anonymous method, that you can pass as argument to another method. Blocks are used a lot in Ruby, and thus this form of iteration is natural for it: the do ... end block is passed as an argument to the method each. Now you can write various variations to each, for example to iterate in reverse or whatnot.
There's also the syntactic sugar form:
for element in array
# Do stuff
end
Blocks are also used for example to filter an array:
array = (1..10).to_a
even = array.select do |element|
element % 2 == 0
end
# "even" now contains [2, 4, 6, 8, 10]
I think it's because it emphasizes the "everything is an object" philosophy behind Ruby: the each method is called on the object.
Then switching to another iterator is much smoother than changing the logic of, for example, a for loop.
Ruby was designed to be expressive, to read as if it was being spoken... Then I think it just evolved from there.
This comes from Smalltalk, that implements control structures as methods, thus reducing the number of keywords and simplifying the parser. Thus allowing controll strucures to serve as proff of concept for the language definition.
In ST, even if conditions are methods, in the fashion:
boolean.ifTrue ->{executeIfBody()}, :else=>-> {executeElseBody()}
In the end, If you ignore your cultural bias, what will be easier to parse for the machine will also be easier to parse by yourself.