How to merge sub-arrays within an array in Ruby? - ruby

I have an array which, for argument's sake, looks something like this:
a = [[1,100], [2,200], [3,300], [2,300]]
Of those four sub-arrays, I would like to merge any where the first element is a duplicate. So in the example above I would like to merge the 2nd and the 4th sub-arrays. However, the caveat is that where the second element in the matching sub-arrays is different, I would like to maintain the higher value.
So, I would like to see this result:
a = [[1,100], [3,300], [2,300]]
This little conundrum is a little above my Ruby skills, so I am turning to the community for help. Any guidance on how to tackle this is much appreciated.
Thanks

# Get a hash that maps the first entry of each subarray to the subarray
# requires 1.8.7+ or active_support (or facets, I think)
hash = a.group_by { |first, second| first }
# Take each entry in the hash and select the biggest entry for each unique key
hash.map {|k,v| v.max }
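As a minimal end-to-end sketch of the approach above, using the array from the question: the variable name merged is only illustrative, and max_by(&:last) is swapped in for max to make it explicit that the second element decides the winner. Note that the result follows the order in which each first element initially appears, so [2, 300] comes before [3, 300], which differs slightly from the ordering shown in the question.

```ruby
a = [[1, 100], [2, 200], [3, 300], [2, 300]]

# Group the pairs by their first element, then keep the pair with the
# largest second element from each group.
merged = a.group_by(&:first).map { |_key, pairs| pairs.max_by(&:last) }

p merged
```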

Related

How to use less memory generating Array permutation?

So I need to get all possible permutations of a string.
What I have now is this:
def uniq_permutations string
string.split(//).permutation.map(&:join).uniq
end
Now, my problem: this method works fine for small strings, but I want to be able to use it with strings of length 15 or maybe even 20. With this method it uses a lot of memory (>1 GB), so my question is: what could I change so that it doesn't use that much memory?
Is there a better way to generate permutations? Should I persist them to the filesystem and retrieve them when I need them (I hope not, because this might make my method slow)?
What can I do?
Update:
I actually don't need to save the result anywhere I just need to lookup for each in a table to see if it exists.
Just to reiterate what Sawa said. You do understand the scope? The number of permutations for any n elements is n!. It's about the most aggressive mathematical progression operation you can get. The results for n between 1-20 are:
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800, 39916800, 479001600,
6227020800, 87178291200, 1307674368000, 20922789888000, 355687428096000,
6402373705728000, 121645100408832000, 2432902008176640000]
Where the last number is approximately 2.4 quintillion, which is 2.4 billion billion.
Even at a single byte per permutation, that is roughly 2265820000 gigabytes.
You can save the results to disk all day long - unless you own all the Google datacenters in the world you're going to be pretty much out of luck here :)
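The table above can be reproduced with a couple of lines. This is only a sketch: the variable names are made up for illustration, and the storage estimate assumes (very optimistically) one byte per permutation.

```ruby
# The number of permutations of n distinct elements is n! (n factorial).
factorials = (1..20).map { |n| (1..n).inject(:*) }

# Assumption for illustration only: each permutation costs a single byte.
gigabytes_for_20 = factorials.last / 2**30

puts factorials.last
puts gigabytes_for_20
```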
Your call to map(&:join) is what is creating the array in memory, as map in effect turns an Enumerator into an array. Depending on what you want to do, you could avoid creating the array with something like this:
def each_permutation(string)
  string.split(//).permutation do |permutation|
    yield permutation.join
  end
end
Then use this method like this:
each_permutation(my_string) do |s|
lookup_string(s) #or whatever you need to do for each string here
end
This doesn’t check for duplicates (no call to uniq), but avoids creating the array. This will still likely take quite a long time for large strings.
However I suspect in your case there is a better way of solving your problem.
I actually don't need to save the result anywhere I just need to lookup for each in a table to see if it exists.
It looks like you’re looking for possible anagrams of a string in an existing word list. If you take any two anagrams and sort the characters in them, the resulting two strings will be the same. Could you perhaps change your data structures so that you have a hash, with keys being the sorted string and the values being a list of words that are anagrams of that string. Then instead of checking all permutations of a new string against a list, you just need to sort the characters in the string, and use that as the key to look up the list of all strings that are permutations of that string.
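A minimal sketch of that sorted-key lookup might look like the following. The word list, the anagram_index name and the anagrams_of helper are all hypothetical; in practice the index would be built once from whatever table you already have.

```ruby
# Hypothetical word list standing in for the existing table.
words = %w[listen silent enlist google tinsel banana]

# Build the index once: sorted characters => all words made of those characters.
anagram_index = words.group_by { |word| word.chars.sort.join }

# Looking up a new string is then one sort plus one hash access,
# instead of generating n! permutations.
def anagrams_of(string, index)
  index.fetch(string.chars.sort.join, [])
end

matches = anagrams_of("inlets", anagram_index)
p matches
```

For a 20-character string this sorts 20 characters once rather than walking through 20! candidates.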
Perhaps you don't need to generate all elements of the set, but rather only a random or constrained subset. I have written an algorithm to generate the m-th permutation in O(n) time.
First convert the key to a list representation of itself in the factorial number system. Then iteratively pull out of the old array the item at each index specified by that new list.
module Factorial
  def factorial num; (2..num).inject(:*) || 1; end

  # Returns [factorial, integer that generates it] for the largest
  # factorial that does not go over num.
  def factorial_floor num
    tmp_1 = 0
    1.upto(1.0/0.0) do |counter|
      break [tmp_1, counter - 1] if (tmp_2 = factorial counter) > num
      tmp_1 = tmp_2
    end
  end
end
class Array; include Factorial
  def generate_swap_list key
    swap_list = []
    key -= (swap_list << (factorial_floor key)).last[0] while key > 0
    swap_list
  end

  def reduce_swap_list swap_list
    swap_list = swap_list.map { |x| x[1] }
    ((length - 1).downto 0).map { |element| swap_list.count element }
  end

  def keyed_permute key
    apply_swaps reduce_swap_list generate_swap_list key
  end

  def apply_swaps swap_list
    swap_list.map { |index| delete_at index }
  end
end
Now, if you want to randomly sample some permutations, Ruby comes with Array#shuffle!, but this will let you copy and save permutations or iterate through the permutohedral space. Or maybe there's a way to constrain the permutation space for your purposes.
constrained_generator_thing do |val|
Array.new(sample_size) {array_to_permute.keyed_permute val}
end
Perhaps I am missing the obvious, but why not do
['a','a','b'].permutation.to_a.uniq

Add element to an array if it's not there already

I have a Ruby class
class MyClass
attr_writer :item1, :item2
end
my_array = get_array_of_my_class() #my_array is an array of MyClass
unique_array_of_item1 = []
I want to push MyClass#item1 to unique_array_of_item1, but only if unique_array_of_item1 doesn't contain that item1 yet. There is a simple solution I know: just iterate through my_array and check if unique_array_of_item1 already contains the current item1 or not.
Is there any more efficient solution?
@Coorasse has a good answer, though it should be:
my_array | [item]
And to update my_array in place:
my_array |= [item]
You can use Set instead of Array.
You don't need to iterate through my_array by hand.
my_array.push(item1) unless my_array.include?(item1)
Edit:
As Tombart points out in his comment, using Array#include? is not very efficient. I'd say the performance impact is negligible for small Arrays, but you might want to go with Set for bigger ones.
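For larger collections, a Set-based version could be sketched as follows. The Struct below is only a hypothetical stand-in for MyClass (with a readable item1), and seen is an illustrative helper name; Set#add? returns nil when the element is already present, so the membership check and the insert happen in one step.

```ruby
require 'set'

# Hypothetical stand-in for MyClass, with a readable item1.
MyClass = Struct.new(:item1, :item2)
my_array = [MyClass.new("a", 1), MyClass.new("b", 2), MyClass.new("a", 3)]

seen = Set.new
unique_array_of_item1 = []
my_array.each do |obj|
  # Set#add? returns nil if the set already contains the element.
  unique_array_of_item1 << obj.item1 if seen.add?(obj.item1)
end

# If all that is needed is the distinct item1 values in order of first
# appearance, map plus uniq gives the same result.
same_result = my_array.map(&:item1).uniq

p unique_array_of_item1
```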
You can convert item1 to array and join them:
my_array | [item1]
Important to keep in mind that the Set class and the | method (also called "Set Union") will yield an array of unique elements, which is great if you want no duplicates but which will be an unpleasant surprise if you have non-unique elements in your original array by design.
If you have at least one duplicate element in your original array that you don't want to lose, iterating through the array with an early return is worst-case O(n), which isn't too bad in the grand scheme of things.
class Array
def add_if_unique element
return self if include? element
push element
end
end
I'm not sure it's a perfect solution, but it worked for me:
host_group = Array.new if not host_group.kind_of?(Array)
host_group.push(host)

Cannot understand what the following code does

Can somebody explain to me what the code below is doing? before and after are hashes.
def differences(before, after)
before.diff(after).keys.sort.inject([]) do |diffs, k|
diff = { :attribute => k, :before => before[k], :after => after[k] }
diffs << diff; diffs
end
end
It is from the papertrail differ gem.
It's dense code, no question. So, as you say before and after are hash(-like?) objects that are handed into the method as parameters. Calling before.diff(after) returns another hash, which then immediately has .keys called on it. That returns all the keys in the hash that diff returned. The keys are returned as an array, which is then immediately sorted.
Then we get to the most complex/dense bit. Using inject on that sorted array of keys, the method builds up an array (called diffs inside the inject block) which will be the return value of the inject method.
That array is made up of records of differences. Each record is a hash - built up by taking one key from the sorted array of keys from the before.diff(after) return value. These hashes store the attribute that's being diffed, what it looked like in the before hash and what it looks like in the after hash.
So, in a nutshell, the method gets a bunch of differences between two hashes and collects them up in an array of hashes. That array of hashes is the final return value of the method.
Note: inject can be, and often is, much, much simpler than this. Usually it's used to simply reduce a collection of values to one result, by applying one operation over and over again and storing the results in an accumulator. You may know inject as reduce from other languages; reduce is an alias for inject in Ruby. Here's a much simpler example of inject:
[1,2,3,4].inject(0) do |sum, number|
sum + number
end
# => 10
0 is the initial value of the accumulator. In the pair |sum, number|, sum will be the accumulator and number will be each number in the array, one after the other. What inject does is add 1 to 0, store the result in sum for the next round, add 2 to sum, store the result in sum again and so on. The single final value of the accumulator sum will be the return value, here 10. The added complexity in your example is that the accumulator is different in kind from the values inside the block. That's less common, but not bad or unidiomatic. (Edit: Andrew Marshall makes the good point that maybe it is bad. See his comment on the original question. And @tokland points out that the inject here is just a very over-complex alternative for map. It is bad.) See the article I linked to in the comments to your question for more examples of inject.
Edit: As @tokland points out in a few comments, the code seems to need just a straightforward map. It would read much more easily then.
def differences(before, after)
before.diff(after).keys.sort.map do |k|
{ :attribute => k, :before => before[k], :after => after[k] }
end
end
I was too focused on explaining what the code was doing. I didn't even think of how to simplify it.
It finds the entries in before and after that differ according to the underlying objects, then builds up a list of those differences in a more convenient format.
before.diff(after) finds the entries that differ.
keys.sort gives you the keys (of the map of differences) in sorted order
inject([]) is like map, but starts with diffs initialized to an empty array.
The block creates a diff line (a hash) for each of these differences, and then appends it to diffs.
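To see the shape of the return value, here is a self-contained sketch. Hash#diff is not a core Ruby method, so this example substitutes a hypothetical helper, differing_keys, that collects the keys whose values differ between the two hashes; the sample hashes are made up too. It is an approximation for illustration, not the gem's own implementation.

```ruby
# Hypothetical replacement for before.diff(after).keys: every key whose
# value is not the same in both hashes.
def differing_keys(before, after)
  (before.keys | after.keys).reject { |k| before[k] == after[k] }
end

def differences(before, after)
  differing_keys(before, after).sort.map do |k|
    { :attribute => k, :before => before[k], :after => after[k] }
  end
end

before = { :name => "Ann", :age => 30, :city => "Oslo" }
after  = { :name => "Ann", :age => 31, :city => "Bergen" }

result = differences(before, after)
p result
```

Only :age and :city appear in the result, because :name is unchanged.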

Sorting a hash table of objects (by the attributes of the objects) in Ruby

Say I have a class called Person, and it contains things such as last name, first name, address, etc.
I also have a hash table of Person objects that needs to be sorted by last and first name.
I understand that a sort_by will not change the hash permanently, which is fine, I only need to print in that order. Currently, I am trying to sort/print in place using:
@hash.sort_by {|a,b| a <=> b}.each { |person| puts person.last}
I have overloaded the <=> operator to sort by last/first, but nothing appears to actually sort. The puts there simply outputs in the hash's original order. I have spent a good 4 days trying to figure this out (it is a school assignment, and my first Ruby program). Any ideas? I am sure this is easy, but I am having the hardest time bringing my brain out of the C++ way of thinking.
You appear to be confusing sort and sort_by
sort yields two objects from the collection to the block and expects you to return a <=>-like value: -1, 0 or 1, depending on whether the first argument should sort before, is equal to, or should sort after the second. For example:
%w(one two three four five).sort {|a,b| a.length <=> b.length}
Sorts the strings by length. This is the form to use if you want to use your <=> operator
sort_by yields one object from the collection at a time and expects you to return what you want to sort by - you shouldn't be doing any comparison here. Ruby then uses <=> on these objects to sort your collection. The previous example can be rewritten as
%w(one two three four five).sort_by {|s| s.length}
This is also known as a Schwartzian transform.
In your case the collection is a hash so things are slightly more complicated: the values that are passed into the block are arrays that contain key/value pairs, so you'll need to extract the person object from that pair. You could also just work on @hash.keys or @hash.values (depending on whether the person objects are keys or values).
If you've overridden the <=> operator to sort Person objects appropriately then you can simply do:
@hash.sort_by{ |key, person| person }
because sort_by will yield both the hash key and the object (in your case a person) to each iteration of the block. So the code above will sort your hash based on Person objects - for which you've already specified a <=> operator.
When you call sort_by on a Hash, the parameters passed to the block are 'key','value', not 'element a' and 'element b'. Try:
@hash.sort_by{|key,value| value.last}.each{|key,value| puts value.last}
Also, see Frederick Cheung's excellent explanation of #sort vs #sort_by.
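Putting the answers above together, a small runnable sketch could look like this. The Person class here is a hypothetical reconstruction (the question does not show it), with first and last readers and a <=> that compares last name and then first name; the sample data is invented.

```ruby
# Hypothetical Person class; the real one from the question is not shown.
class Person
  attr_reader :first, :last

  def initialize(first, last)
    @first = first
    @last = last
  end

  # Order by last name, then first name.
  def <=>(other)
    [last, first] <=> [other.last, other.first]
  end
end

people = {
  1 => Person.new("Zoe", "Smith"),
  2 => Person.new("Adam", "Smith"),
  3 => Person.new("Bea", "Jones")
}

# sort_by: return the sort key for each pair; no comparison in the block.
sorted_names = people.sort_by { |_id, person| [person.last, person.first] }
                     .map { |_id, person| "#{person.last}, #{person.first}" }

# sort: relies on the overloaded <=> directly.
sorted_people = people.values.sort

puts sorted_names
```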

Find lowest value in a hash

a = {
1 => ["walmart", "walmart.com", 300.0],
2 => ["amazon", "amazon.com", 350.0],
...
}
How do I find the element whose array contains the lowest float value?
min_by is available as a method from the Enumerable module.
It gets the array of all values in the Hash, and then picks the minimum value based on the last element of each array.
a.values.min_by(&:last)
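If the key is needed as well as the array, min_by can also be called on the hash itself. This is a small sketch using only the two entries visible in the question (the elided entries are left out); the names key and entry are illustrative.

```ruby
a = {
  1 => ["walmart", "walmart.com", 300.0],
  2 => ["amazon", "amazon.com", 350.0]
}

# Each hash element is yielded as a key/value pair; compare by the
# last item of the value array (the float).
key, entry = a.min_by { |_key, value| value.last }

p key
p entry
```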
Another useful method is sort_by, also from the Enumerable module. It will arrange your hash entries in ascending order of whatever the block returns, so sort on the float and then chain first to grab the lowest entry.
a.sort_by { |key, value| value.last }.first
See the min_by solution in the other answer. My original answer to this question was way less efficient, as pointed out in the comment.
