Correct semantic usage of map - ruby

I am trying to understand the semantically right way to use map. Since map can behave the same way as each, you could modify the array any way you like. But my colleague told me that after map is applied, the array should have the same order and the same size.
For example, that would mean that using map with compact to return a filtered array is not the right way to use map:
array = [1,2,3,4]
array.map{|num| num unless num == 2 || num == 4}.compact
I've been using map and other Enumerator methods for ages and never thought about this too much. Would appreciate advice from experienced Ruby Developers.

In Computer Science, map according to Wikipedia:
In many programming languages, map is the name of a higher-order
function that applies a given function to each element of a list,
returning a list of results in the same order
This statement implies the returned value of map should be of the same length (because we're applying the function to each element). And the returned-elements are to be in the same order. So when you use map, this is what the reader expects.
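As a minimal sketch of that expectation (the variable names here are illustrative, not from the question), a conventional map call returns one result per input element, in the input's order:

```ruby
# Illustrative input; any array behaves the same way.
prices = [1, 2, 3, 4]

# map applies the block to every element and collects the results.
doubled = prices.map { |n| n * 2 }

# Same number of elements, and result i corresponds to input i.
same_size  = doubled.size == prices.size
same_order = doubled.each_with_index.all? { |value, i| value == prices[i] * 2 }

p doubled    # => [2, 4, 6, 8]
p same_size  # => true
p same_order # => true
```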
How not to use map
arr = [1, 2, 3]
arr.map {|i| arr.pop } #=> [3, 2]
This clearly betrays the intention of map since we have a different number of elements returned and they are not even in the original order of application. So don't use map like this. See "How to use ruby's value_at to get subhashes in a hash" and subsequent comments for further clarification and thanks to #meager for originally pointing this out to me.

Meditate on this:
array = [1,2,3,4]
array.map{|num| num unless num == 2 || num == 4} # => [1, nil, 3, nil]
.compact # => [1, 3]
The intermediate value is an array of the same size, however it contains undesirable values, forcing the use of compact. The fallout of this is CPU time is wasted generating the nil values, then deleting them. In addition, memory is being wasted generating another array that is the same size when it shouldn't be. Imagine the CPU and memory cost in a loop that is processing thousands of elements in an array.
Instead, using the right tool cleans up the code and avoids wasting CPU or memory:
array.reject { |num| num == 2 || num == 4 } # => [1, 3]
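If the elements also need transforming while some are dropped, a sketch along these lines keeps each step explicit. It assumes Ruby 2.7 or later for Enumerable#filter_map, and the squaring step is an invented example, not part of the question.

```ruby
array = [1, 2, 3, 4]

# Filtering only: select keeps elements for which the block is truthy.
odds = array.select(&:odd?)

# Filtering and transforming in one pass (Ruby 2.7+): filter_map drops
# falsy block results, so no intermediate array of nils is built.
odd_squares = array.filter_map { |num| num * num if num.odd? }

p odds        # => [1, 3]
p odd_squares # => [1, 9]
```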
I've been using map and other Enumerator methods for ages and never thought about this too much.
I'd recommend thinking about it. It's the little things like this that can make or break code or a system, and everything we do when programming needs to be done deliberately, avoiding all negative side-effects we can foresee.

Related

How can I return hash pairs of keys that sum up to less than a maximum value?

Given this hash:
numsHash = {5=>10, 3=>9, 4=>7, 2=>5, 20=>4}
How can I return the key-value pair of this hash if and when the sum of its keys would be under or equal to a maximum value such as 10?
The expected result would be something like:
newHash = { 5=>10, 3=>9, 2=>5 }
because the sum of these keys equals 10.
I've been obsessing with this for hours now and can't find anything that leads up to a solution.
Summary
In the first section, I provide some context and a well-commented working example of how to solve the defined knapsack problem in a matter of microseconds using a little brute force and some Ruby core classes.
In the second section, I refactor and expand on the code to demonstrate the conversion of the knapsack solution into output similar to what you want, although (as explained and demonstrated in the answer below) the correct output when there are multiple results must be a collection of Hash objects rather than a single Hash unless there are additional selection criteria not included in your original post.
Please note that this answer uses syntax and classes from Ruby 3.0, and was specifically tested against Ruby 3.0.3. While it should work on Ruby 2.7.3+ without changes, and with most currently-supported Ruby 2.x versions with some minor refactoring, your mileage may vary.
Solving the Knapsack Problem with Ruby Core Methods
This seems to be a variant of the knapsack problem, where you're trying to optimize filling a container of a given size. This is actually a complex problem that is NP-complete, so a real-world application of this type will have many different solutions and possible algorithmic approaches.
I do not claim that the following solution is optimal or suitable for general purpose solutions to this class of problem. However, it works very quickly given the provided input data from your original post.
Its suitability is primarily based on the fact that you have a fairly small number of Hash keys, and the built-in Ruby 3.0.3 core methods Array#permutation and Enumerable#sum are fast enough to solve this particular problem in anywhere from 44-189 microseconds on my particular machine. That seems more than sufficiently fast for the problem as currently defined, but your mileage and real objectives may vary.
# This is the size of your knapsack.
MAX_VALUE = 10
# It's unclear why you need a Hash or what you plan to do with the values of the
# Hash, but that's irrelevant to the problem. For now, just grab the keys.
#
# NB: You have to use hash rockets or the parser complains about using an
# Integer as a Symbol using the colon notation and raises SyntaxError.
nums_hash = {5 => 10, 3 => 9, 4 => 7, 2 => 5, 20 => 4}
keys = nums_hash.keys
# Any individual element above MAX_VALUE won't fit in the knapsack anyway, so
# discard it before permutation.
keys.reject! { _1 > MAX_VALUE }
# Brute force it by evaluating all possible permutations of your array, dropping
# elements from the end of each sub-array until all remaining elements fit.
keys.permutation.map do |permuted_array|
  loop { permuted_array.sum > MAX_VALUE ? permuted_array.pop : break }
  permuted_array
end
Returning an Array of Matching Hashes
The code above just returns the list of keys that will fit into your knapsack, but per your original post you then want to return a Hash of matching key/value pairs. The problem here is that you actually have more than one set of Hash objects that will fit the criteria, so your collection should actually be an Array rather than a single Hash. Returning only a single Hash would basically return the original Hash minus any keys that exceed your MAX_VALUE, and that's unlikely to be what's intended.
Instead, now that you have a list of keys that fit into your knapsack, you can iterate through your original Hash and use Hash#select to return an Array of unique Hash objects with the appropriate key/value pairs. One way to do this is to use Enumerable#reduce to call Hash#merge on each Hash element in the subarrays to convert the final result to an Array of Hash objects. Next, you should call Array#uniq to remove any Hash that is equivalent except for its internal ordering.
For example, consider this redesigned code:
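MAX_VALUE = 10

# Returns every key list that fits the knapsack, sorted for stable output.
def possible_knapsack_contents hash
  # Use reject, not reject!: reject! returns nil when nothing is removed.
  hash.keys.reject { _1 > MAX_VALUE }.permutation.map do |a|
    loop { a.sum > MAX_VALUE ? a.pop : break }; a
  end.sort
end

# Rebuilds a Hash for each key list, then drops order-insensitive duplicates.
def matching_elements_from hash
  possible_knapsack_contents(hash).map do |subarray|
    subarray.map { |i| hash.select { |k, _| k == i } }.
      reduce({}) { _1.merge _2 }
  end.uniq
end

hash = {5 => 10, 3 => 9, 4 => 7, 2 => 5, 20 => 4}
matching_elements_from hash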
Given the defined input, this would yield 24 hashes if you didn't address the uniqueness issue. However, by calling #uniq on the final Array of Hash objects, this will correctly yield the 7 unique hashes that fit your defined criteria if not necessarily the single Hash you seem to expect:
[{2=>5, 3=>9, 4=>7},
{2=>5, 3=>9, 5=>10},
{2=>5, 4=>7},
{2=>5, 5=>10},
{3=>9, 4=>7},
{3=>9, 5=>10},
{4=>7, 5=>10}]
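If what you actually want is the single Hash from your question, one possible extra selection criterion is "the largest key sum that still fits". The following is a sketch of that idea, not the approach used above: it swaps permutations for Array#combination (order does not matter for a sum) and uses Hash#slice (Ruby 2.5+). The method name sub_hashes_within and the max_by selection rule are assumptions of mine.

```ruby
nums_hash = { 5 => 10, 3 => 9, 4 => 7, 2 => 5, 20 => 4 }
max_key_sum = 10

# Hypothetical helper: every sub-hash whose keys sum to at most max.
def sub_hashes_within(hash, max)
  keys = hash.keys.reject { |key| key > max }
  (1..keys.size).flat_map do |size|
    keys.combination(size)
        .select { |combo| combo.sum <= max }
        .map { |combo| hash.slice(*combo) }
  end
end

candidates = sub_hashes_within(nums_hash, max_key_sum)

# Assumed selection rule: prefer the fullest knapsack.
best = candidates.max_by { |candidate| candidate.keys.sum }

p candidates.size                     # => 12
p best == { 5 => 10, 3 => 9, 2 => 5 } # => true
```

With this input only one candidate reaches the maximum of 10, so the result is unambiguous. With other inputs several candidates can tie, and max_by simply returns one of them.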

Can I count on partition preserving order?

Say I have a sorted Array, such as this:
myArray = [1, 2, 3, 4, 5, 6]
Suppose I call Enumerable#partition on it:
p myArray.partition(&:odd?)
Must the output always be the following?
[[1, 3, 5], [2, 4, 6]]
The documentation doesn't state this; this is what it says:
partition { |obj| block } → [ true_array, false_array ]
partition → an_enumerator
Returns two arrays, the first containing the elements of enum for which the block evaluates to true, the second containing the rest.
If no block is given, an enumerator is returned instead.
But it seems logical to assume partition works this way.
Through testing Matz's interpreter, it appears to be the case that the output works like this, and it makes full sense for it to be like this. However, can I count on partition working this way regardless of the Ruby version or interpreter?
Note: I tagged this implementation-agnostic because I couldn't find any other tag that describes my concern. Feel free to change the tag to something better if you know of one.
No, you can't rely on the order. The reason is parallelism.
A traditional serial implementation of partition would loop through each element of the array, evaluating the block one at a time in order. As each call to odd? returns, the element is immediately pushed onto the appropriate true or false array.
Now imagine an implementation which takes advantage of multiple CPU cores. It still iterates through the array in order, but each call to odd can return out of order. odd(myArray[2]) might return before odd(myArray[0]) resulting in [[3, 1, 5], [2, 4, 6]].
List processing idioms such as partition which run a list through a function (most of Enumerable) benefit greatly from parallel processing, and most computers these days have multiple cores. I wouldn't be surprised if a future Ruby implementation took advantage of this. The writers of the API documentation for Enumerable likely carefully omitted any mention of process ordering to leave this optimization possibility open.
The documentation makes no explicit mention of this, but judging from the official code, it does retain ordering:
static VALUE
partition_i(RB_BLOCK_CALL_FUNC_ARGLIST(i, arys))
{
    struct MEMO *memo = MEMO_CAST(arys);
    VALUE ary;
    ENUM_WANT_SVALUE();

    if (RTEST(enum_yield(argc, i))) {
        ary = memo->v1;
    }
    else {
        ary = memo->v2;
    }
    rb_ary_push(ary, i);
    return Qnil;
}
This code gets called from the public interface.
Essentially, the ordering in which your enumerable emits objects gets retained with the above logic.
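To make that concrete, here is a rough Ruby transliteration of the C logic above. The method name ordered_partition is made up for illustration, and the comparison at the end only shows what the interpreter you run it on does; it is not a guarantee from the documentation.

```ruby
# Hypothetical pure-Ruby equivalent of partition_i: each element is
# appended to one of two arrays in the order the enumerable yields it.
def ordered_partition(enum)
  truthy = []
  falsy  = []
  enum.each { |item| (yield(item) ? truthy : falsy) << item }
  [truthy, falsy]
end

unsorted = [5, 2, 9, 4, 1, 6]
manual   = ordered_partition(unsorted, &:odd?)
builtin  = unsorted.partition(&:odd?)

p manual            # => [[5, 9, 1], [2, 4, 6]]
p manual == builtin # true on an interpreter that preserves order
```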

Combine array of array into all possible combinations, forward only, in Ruby

I have an array of arrays, like so:
[['1','2'],['a','b'],['x','y']]
I need to combine those arrays into a string containing all possible combinations of all three sets, forward only. I have seen lots of examples of all possible combinations of the sets in any order, that is not what I want. For example, I do not want any of the elements in the first set to come after the second set, or any in the third set to come before the first, or second, and so on. So, for the above example, the output would be:
['1ax', '1ay', '1bx', '1by', '2ax', '2ay', '2bx', '2by']
The number of arrays, and length of each set is dynamic.
Does anybody know how to solve this in Ruby?
Know your Array#product:
a = [['1','2'],['a','b'],['x','y']]
a.first.product(*a[1..-1]).map(&:join)
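Since the number of arrays is dynamic, it may help to wrap this in a method. The sketch below uses a made-up name, forward_combinations, and assumes that an empty outer array should simply produce an empty result.

```ruby
# Hypothetical wrapper around Array#product for any number of sets.
def forward_combinations(sets)
  return [] if sets.empty?

  head, *tail = sets
  # product keeps the sets in order: one element from each, left to right.
  head.product(*tail).map(&:join)
end

three_sets = forward_combinations([%w[1 2], %w[a b], %w[x y]])
one_set    = forward_combinations([%w[1 2]])

p three_sets # => ["1ax", "1ay", "1bx", "1by", "2ax", "2ay", "2bx", "2by"]
p one_set    # => ["1", "2"]
```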
Solved using a simple recursive approach:
For n-arrays, combine the entries of the first array with each result on the remaining (n-1) arrays
For a single array, the answer is just that array
In code:
def variations(a)
  first = a.first
  if a.length==1 then
    first
  else
    rest = variations(a[1..-1])
    first.map{ |x| rest.map{ |y| "#{x}#{y}" } }.flatten
  end
end
p variations([['1','2'],['a','b'],['x','y']])
#=> ["1ax", "1ay", "1bx", "1by", "2ax", "2ay", "2bx", "2by"]
puts variations([%w[a b],%w[M N],['-'],%w[x y z],%w[0 1 2]]).join(' ')
#=> aM-x0 aM-x1 aM-x2 aM-y0 aM-y1 aM-y2 aM-z0 aM-z1 aM-z2 aN-x0 aN-x1 aN-x2
#=> aN-y0 aN-y1 aN-y2 aN-z0 aN-z1 aN-z2 bM-x0 bM-x1 bM-x2 bM-y0 bM-y1 bM-y2
#=> bM-z0 bM-z1 bM-z2 bN-x0 bN-x1 bN-x2 bN-y0 bN-y1 bN-y2 bN-z0 bN-z1 bN-z2
You could also reverse the logic, and with care you should be able to implement this non-recursively. But the recursive answer is rather straightforward. :)
Pure, reduce with product:
a = [['1','2'],['a','b'],['x','y']]
a.reduce() { |acc, n| acc.product(n).map(&:flatten) }.map(&:join)
# => ["1ax", "1ay", "1bx", "1by", "2ax", "2ay", "2bx", "2by"]

Count, size, length...too many choices in Ruby?

I can't seem to find a definitive answer on this and I want to make sure I understand this to the "n'th level" :-)
a = { "a" => "Hello", "b" => "World" }
a.count # 2
a.size # 2
a.length # 2
a = [ 10, 20 ]
a.count # 2
a.size # 2
a.length # 2
So which to use? If I want to know if a has more than one element then it doesn't seem to matter but I want to make sure I understand the real difference. This applies to arrays too. I get the same results.
Also, I realize that count/size/length have different meanings with ActiveRecord. I'm mostly interested in pure Ruby (1.9.2) right now but if anyone wants to chime in on the difference AR makes that would be appreciated as well.
Thanks!
For arrays and hashes size is an alias for length. They are synonyms and do exactly the same thing.
count is more versatile - it can take an element or predicate and count only those items that match.
> [1,2,3].count{|x| x > 2 }
=> 1
In the case where you don't provide a parameter to count it has basically the same effect as calling length. There can be a performance difference though.
We can see from the source code for Array that they do almost exactly the same thing. Here is the C code for the implementation of array.length:
static VALUE
rb_ary_length(VALUE ary)
{
    long len = RARRAY_LEN(ary);
    return LONG2NUM(len);
}
And here is the relevant part from the implementation of array.count:
static VALUE
rb_ary_count(int argc, VALUE *argv, VALUE ary)
{
    long n = 0;

    if (argc == 0) {
        VALUE *p, *pend;

        if (!rb_block_given_p())
            return LONG2NUM(RARRAY_LEN(ary));

        // etc..
    }
}
The code for array.count does a few extra checks but in the end calls the exact same code: LONG2NUM(RARRAY_LEN(ary)).
Hashes (source code) on the other hand don't seem to implement their own optimized version of count so the implementation from Enumerable (source code) is used, which iterates over all the elements and counts them one-by-one.
In general I'd advise using length (or its alias size) rather than count if you want to know how many elements there are altogether.
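The difference is easy to observe with a small custom Enumerable. The Countdown class below is invented for this illustration; it records how often each is called, which shows that the Enumerable version of count has to walk the elements while Array#length does not.

```ruby
# Hypothetical Enumerable that counts how many times #each is invoked.
class Countdown
  include Enumerable

  attr_reader :each_calls

  def initialize(start)
    @start = start
    @each_calls = 0
  end

  def each
    @each_calls += 1
    @start.downto(1) { |i| yield i }
  end
end

countdown = Countdown.new(3)
total = countdown.count
calls_after_count = countdown.each_calls

p total             # => 3
p calls_after_count # => 1 (count iterated over every element)

list = countdown.to_a # a second pass, to build a real Array
p list.length         # => 3, read from the Array without iterating
```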
Regarding ActiveRecord, on the other hand, there are important differences. Check out this post:
Counting ActiveRecord associations: count, size or length?
There is a crucial difference for applications which make use of database connections.
When you are using many ORMs (ActiveRecord, DataMapper, etc.) the general understanding is that .size will generate a query that requests all of the items from the database ('select * from mytable') and then give you the number of items resulting, whereas .count will generate a single query ('select count(*) from mytable') which is considerably faster.
Because these ORMs are so prevalent, I follow the principle of least astonishment. In general, if I have something in memory already, then I use .size, and if my code will generate a request to a database (or external service via an API), I use .count.
In most cases (e.g. Array or String) size is an alias for length.
count normally comes from Enumerable and can take an optional predicate block. Thus enumerable.count {cond} is [roughly] (enumerable.select {cond}).length -- it can of course bypass the intermediate structure as it just needs the count of matching predicates.
Note: I am not sure if count forces an evaluation of the enumeration if the block is not specified or if it short-circuits to the length if possible.
Edit (and thanks to Mark's answer!): count without a block (at least for Arrays) does not force an evaluation. I suppose without formal behavior it's "open" for other implementations, if forcing an evaluation without a predicate ever even really makes sense anyway.
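A small sketch of the equivalence described above, using made-up sample data. The last part relies on Enumerator getting count from Enumerable while not defining length.

```ruby
numbers = [1, 2, 3, 4, 5, 6]

# count with a block tallies matches directly...
even_count = numbers.count(&:even?)

# ...while select builds an intermediate array first.
even_via_select = numbers.select(&:even?).length

p even_count                    # => 3
p even_count == even_via_select # => true

# Enumerators respond to count (via Enumerable) but not to length.
slices = (1..10).each_slice(3)
p slices.count                  # => 4
p slices.respond_to?(:length)   # => false
```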
I found a good answer at http://blog.hasmanythrough.com/2008/2/27/count-length-size
In ActiveRecord, there are several ways to find out how many records
are in an association, and there are some subtle differences in how
they work.
post.comments.count - Determine the number of elements with an SQL
COUNT query. You can also specify conditions to count only a subset of
the associated elements (e.g. :conditions => {:author_name =>
"josh"}). If you set up a counter cache on the association, #count
will return that cached value instead of executing a new query.
post.comments.length - This always loads the contents of the
association into memory, then returns the number of elements loaded.
Note that this won't force an update if the association had been
previously loaded and then new comments were created through another
way (e.g. Comment.create(...) instead of post.comments.create(...)).
post.comments.size - This works as a combination of the two previous
options. If the collection has already been loaded, it will return its
length just like calling #length. If it hasn't been loaded yet, it's
like calling #count.
Also I have a personal experience:
<%= h(params.size.to_s) %> # works_like_that !
<%= h(params.count.to_s) %> # does_not_work_like_that !
There are several ways to find out how many elements are in an array: .length, .count and .size. However, it's better to use array.size rather than array.count, because .size performs better.
Adding more to Mark Byers' answer: in Ruby, array.size is an alias for Array#length. There is no technical difference between the two methods, and you probably won't see any difference in performance either. Array#count does the same job but offers some extra functionality.
It can be used to get the number of elements that meet some condition. count can be called in three ways:
Array#count # Returns number of elements in Array
Array#count n # Returns number of elements having value n in Array
Array#count{|i| i.even?} # Returns number of elements for which the block returns true
array = [1,2,3,4,5,6,7,4,3,2,4,5,6,7,1,2,4]
array.size # => 17
array.length # => 17
array.count # => 17
Here all three methods do the same job. However here is where the count gets interesting.
Let us say I want to find how many elements in the array have the value 2:
array.count 2 # => 3
The array has a total of three elements with the value 2.
Now, I want to count the array elements greater than 4:
array.count{|i| i > 4} # => 6
The array has a total of 6 elements that are greater than 4.
I hope it gives some info about count method.

Ruby - return an array in random order

What is the easiest way to return an array in random order in Ruby?
Anything that is nice and short that can be used in an IRB session like
[1,2,3,4,5].random()
# or
random_sort([1,2,3,4,5])
array.shuffle
If you don't have [].shuffle, [].sort_by{rand} works as pointed out by sepp2k. .sort_by temporarily replaces each element by something for the purpose of sorting, in this case, a random number.
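Roughly speaking, sort_by{rand} behaves like the sketch below: attach a random key to each element, sort by the keys, then throw the keys away. The method name and the optional random: parameter are my own additions so that a run can be reproduced with a seed.

```ruby
# Hypothetical spelled-out version of array.sort_by { rand }.
def shuffle_by_random_keys(array, random: Random.new)
  array.map { |item| [random.rand, item] } # decorate with a random key
       .sort_by(&:first)                   # sort on the key only
       .map(&:last)                        # strip the key again
end

original = [1, 2, 3, 4, 5]
shuffled = shuffle_by_random_keys(original, random: Random.new(42))

p shuffled.sort == original # => true (same elements, possibly reordered)
```

Because every element gets its key exactly once, the comparison used during sorting is consistent, which is what the sort{rand-0.5} variant discussed next lacks.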
[].sort{rand-0.5} however, won't properly shuffle. Some languages (e.g. some Javascript implementations) don't properly shuffle arrays if you do a random sort on the array, with sometimes rather public consequences.
JS Analysis (with graphs!): http://www.robweir.com/blog/2010/02/microsoft-random-browser-ballot.html
Ruby is no different! It has the same problem. :)
# sort a bunch of small arrays by rand-0.5
a=[]
100000.times{a << [0,1,2,3,4].sort{rand-0.5}}
# count how many times each number occurs in each position
b=[]
a.each do |x|
  x.each_index do |i|
    b[i] ||=[]
    b[i][x[i]] ||= 0
    b[i][x[i]] += 1
  end
end
p b
=>
[[22336, 18872, 14814, 21645, 22333],
[17827, 25005, 20418, 18932, 17818],
[19665, 15726, 29575, 15522, 19512],
[18075, 18785, 20283, 24931, 17926],
[22097, 21612, 14910, 18970, 22411]]
Each element should occur in each position about 20000 times. [].sort_by{rand} gives much better results.
#sort with elements first mapped to random numbers
a=[]
100000.times{a << [0,1,2,3,4].sort_by{rand}}
#count how many times each number occurs in each position
...
=>
[[19913, 20074, 20148, 19974, 19891],
[19975, 19918, 20024, 20030, 20053],
[20028, 20061, 19914, 20088, 19909],
[20099, 19882, 19871, 19965, 20183],
[19985, 20065, 20043, 19943, 19964]]
Similarly for [].shuffle (which is probably fastest)
[[20011, 19881, 20222, 19961, 19925],
[19966, 20199, 20015, 19880, 19940],
[20062, 19894, 20065, 19965, 20014],
[19970, 20064, 19851, 20043, 20072],
[19991, 19962, 19847, 20151, 20049]]
What about this?
Helper methods for Enumerable, Array, Hash, and String
that let you pick a random item or shuffle the order of items.
http://raa.ruby-lang.org/project/rand/
