Why is Ruby's Array.map() also called Array.collect()? - ruby

Whenever I see Ruby code that says:
arrayNames.collect { ... }
I forget what collect is and have to look up what it is, and find that it is the same as map().
Map, I can understand, mapping 1 byte to a pixel, and function is to map an x to a y, a 2 to a 4, a 5 to a 25, etc. But where does the name "collect" come from? Maybe that will help to remember what a "collect" method is.

It comes from Smalltalk old days. Smalltalk used collect and select instead of map and filter (as used in many other languages) for iterating on its collections.

To add to the other answers, it is kind of an inside-joke in Smalltalk:
inject:into:
collect:
select:
reject:
detect:
Spot the pattern?

kriss is right that the method name has its origins in Smalltalk but to help remember what it does when you see it used you can think of it as "collecting the results from the block in a new array".

Related

Stuck with rdoc: Any way to add indentation to call-seq?

I know I should use YARD. Not an option in this environment.
So I'm using rdoc and call-seq is generally been great, but I want to add indentation for a call-seq that is always used in a block/yield style, something like:
call-seq:
someFunc {
example1(..)
example2(..)
}
But call-seq: removes any indentation and it becomes:
someFunc {
example1(..)
example2(..)
}
Which is sad. I know I could use a separate code block as an example, but this is really part of the general API for how this should be called, and it would be nice if I could use call-seq (instead of the ugly inverted colors of a code block).
I am guessing this isn't possible with the limitations of rdoc, but I thought I'd ask. Any thoughts?
I think you are confused about what the purpose of the calling sequence is, but the name unfortunately is really confusing. Normally, I would assume that it was designed by a non-native speaker, but if I remember my history correctly, RDoc was designed by David Thomas and Andy Hunt for their book Programming Ruby in 2000.
Anyway, calling sequence is not for documenting a sequence of calls, as one might surmise, i.e. it is not for documenting multiple lines of code.
It is for documenting what we could all in other languages multiple overloads of the method. Note that the syntax that is used for each overload is not even legal Ruby syntax (because of the →), so treating it as executable code that is to be indented does not make much sense:
inject(initial, sym) → obj
inject(sym) → obj
inject(initial) { |memo, obj| block } → obj
inject { |memo, obj| block } → obj
What this tells us is that the Enumerable#inject method has four different "overloads", 2×2 combinations: with or without initial value and specifying the binary operation either as a block or as a Symbol.
Hence why it is not possible to format code inside the calling sequence.

chaining ruby enumerator functions in a clean way

I just finished a course on ruby where the instructor takes a list of movies, groups them, then calls map, sort, and reverse. It works fine, but I don't find the syntax to be very readable and I'm trying to figure out if what I have in mind is valid. I come from a c# background.
#we can reformat our code to make it shorter
#note that a lot of people don't like calling functions on the
#end of function blocks. (I don't like the look, either)
count_by_month = movies.group_by do |movie|
movie.release_date.strftime("%B")
end.map do |month, list|
[month, list.size]
end.sort_by(&:last).reverse
What I am wondering is if I can do something like
#my question: can I do this?
count_by_month = movies.group_by(&:release_date.strftime("%B"))
.map(&:first, &:last.size)
.sort_by(&:last)
.reverse
#based on what I've seen online, I could maybe do something like
count_by_month = movies.groupBy({m -> m.release_date.strftime("%B")})
.map{|month, list| [month, list.size]}
.sort_by(&:last)
.reverse
As a number of people in the comments suggest, this is really a matter of style; that being said, I have to agree with the comments within the code and say that you want to avoid method chaining at the end of a do..end.
If you're going to split methods by line, use a do..end. {} and do...end are synonymous, as you know, but the braces are more often used (in my experience) for single-line pieces of code, and as 'mu is too short' pointed out, if you're set on using them, you may want to look into lambdas. But I'd stick to do..end in this case.
A general style rule I was taught that I follow is to split up chains if what is being worked with changes class in a way that might not be intuitive. ex: fizz = "buzz".split.reverse breaks up a string into an array, but it's clear what the code is doing.
In the example you provided, there's a lot going on that's a bit hard to follow; I like that you wrote out the group_by using hash notation in the last example because it's clear what the group_by is sorting by there and what the output is - I'd put it in a [well named] variable of its own.
grouped_by_month = movies.groupBy({m -> m.release_date.strftime("%B")})
count_by_month = grouped_by_month.map{|month, list| [month, list.size]}.sort_by(&:last).reverse
This splits up the code into one line that sets up the grouping hash and another line that manipulates it.
Again, this is style, so everyone has their own quirks; this is simply how I'd edit this based off a quick glance. You seem to be getting into Ruby quite well overall! Sometimes I just like the look of a chain of methods on one line, even if its against best practices (and I'm doing Project Euler or some other project of my own). I'd suggest looking at large projects on Github (ex: rails) to get a feel for how those far more experienced than myself write clean code. Good luck!

Can't all or most cases of `each` be replaced with `map`?

The difference between Enumerable#each and Enumerable#map is whether it returns the receiver or the mapped result. Getting back to the receiver is trivial and you usually do not need to continue a method chain after each like each{...}.another_method (I probably have not seen such case. Even if you want to get back to the receiver, you can do that with tap). So I think all or most cases where Enumerable#each is used can be replaced by Enumerable#map. Am I wrong? If I am right, what is the purpose of each? Is map slower than each?
Edit:
I know that there is a common practice to use each when you are not interested in the return value. I am not interested in whether such practice exists, but am interested in whether such practice makes sense other than from the point of view of convention.
The difference between map and each is more important than whether one returns a new array and the other doesn't. The important difference is in how they communicate your intent.
When you use each, your code says "I'm doing something for each element." When you use map, your code says "I'm creating a new array by transforming each element."
So while you could use map in place of each, performance notwithstanding, the code would now be lying about its intent to anyone reading it.
The choice between map or each should be decided by the desired end result: a new array or no new array. The result of map can be huge and/or silly:
p ("aaaa".."zzzz").map{|word| puts word} #huge and useless array of nil's
I agree with what you said. Enumerable#each simply returns the original object it was called on while Enumerable#map sets the current element being iterated over to the return value of the block, and then returns a new object with those changes.
Since Enumerable#each simply returns the original object itself, it can be very well preferred over the map when it comes to cases where you need to simply iterate or traverse over elements.
In fact, Enumerable#each is a simple and universal way of doing a traditional iterating for loop, and each is much preferred over for loops in Ruby.
You can see the significant difference between map and each when you're composing these enumaratiors.
For example you need to get new array with indixes in it:
array.each.with_index.map { |index, element| [index, element] }
Or for example you just need to apply some method to all elements in array and print result without changing the original array:
m = 2.method(:+)
[1,2,3].each { |a| puts m.call(a) } #=> prints 3, 4, 5
And there's a plenty another examples where the difference between each and map is important key in the writing code in functional style.

ruby shorthand for conditionals, when to use instance variables and efficiency of looping through parsed xml versus storing as hash

I'm learning ruby and have a few questions about some code I wrote for a newbie challenge. Purpose of challenge is to find country with largest population from an xml document.
I've included my code below. Questions I have are:
Is there a way to avoid having to initialize the #max_pop variable (#max_pop=0)?
Is there shorthand for combining the entire conditional block into 1 line?
Do I have to use instance vars #max_pop, #max_pop_country? Got error without them.
Which is more efficient:
Loop through each country and check if pop > max_pop (approach in code below)
Create pop hash (pop[:country]) and then find country with highest pop
Is there hash method to return key value pair for largest element in hash (to do 4.1)?
Source Code:
#max_pop=0
doc.elements.each("cia/country") do |country|
if country.attributes["population"].to_i > #max_pop
#max_pop=country.attributes["population"].to_i
#max_pop_country=country.attributes["name"]
end
end
puts "country with largest pop is #{#max_pop_country} with pop of #{#max_pop}
I am not familiar with rexml, but you ought to be able to simplify everything to something like this:
max_pop_elem = doc.elements.enum_for(:each, "cia/country").max_by { |c| c.attributes["population"].to_i }
max_pop_country = max_pop_elem.attributes["name"]
max_pop = max_pop_elem.attributes["population"].to_i
Yes, see above.
Yes, see above.
No. You should use local variables instead of instance variables when possible.
Don't worry about efficiency of CPU time until you have a slow program. Then use ruby-prof. Until then, just worry about the efficiency of coding time (do things the easy way).
Yes, just do key, value = hash.max_by{|k,v| v}.
In general, if you are going to be iterating over things you should learn about Ruby's Enumerable module. I made a reference sheet for it here.

Mathematica - can I define a block of code using a single variable?

It has been a while since I've used Mathematica, and I looked all throughout the help menu. I think one problem I'm having is that I do not know what exactly to look up. I have a block of code, with things like appending lists and doing basic math, that I want to define as a single variable.
My goal is to loop through a sequence and when needed I wanted to call a block of code that I will be using several times throughout the loop. I am guessing I should just put it all in a loop anyway, but I would like to be able to define it all as one function.
It seems like this should be an easy and straightforward procedure. Am I missing something simple?
This is the basic format for a function definition in Mathematica.
myFunc[par1_,par2_]:=Module[{localVar1,localVar2},
statement1; statement2; returnStatement ]
Your question is not entirely clear, but I interpret that you want something like this:
facRand[] :=
({b, x} = Last#FactorInteger[RandomInteger[1*^12]]; Print[b])
Now every time facRand[] is called a new random integer is factored, global variables b and x are assigned, and the value of b is printed. This could also be done with Function:
Clear[facRand]
facRand =
({b, x} = Last#FactorInteger[RandomInteger[1*^12]]; Print[b]) &
This is also called with facRand[]. This form is standard, and allows addressing or passing the symbol facRand without triggering evaluation.

Resources