Fastest/One-liner way to remove duplicates (by key) in Ruby Array? - ruby

What's the fastest/one-liner way to remove duplicates in an array of objects, based on a specific key:value, or a result returned from a method?
For instance, I have 20 XML Element nodes that are all the same name, but they have different "text" values, some of which are duplicates. I would like to remove the duplicates by saying "if element.text == previous_element.text, remove it". How do I do that in Ruby in the shortest amount of code?
I've seen how to do it for simple string/integer values, but not for objects.

Here's the standard hashy way. Note the use of ||= operator, which is a more convenient (a ||= b) way to write a = b unless a.
array.inject({}) do |hash,item|
hash[item.text]||=item
hash
end.values.inspect
You can do it in a single line either.
The script needs O(n) equality checks of text strings. That's what's covered under O(n) when you see a hash.

This does it all:
Hash[*a.map{|x| [x.text, x]}].values
short? yep.
(asterisk is optional; seems to be required for 1.8.6).
For example:
a = [Thing.new('a'), Thing.new('b'), Thing.new('c'), Thing.new('c')]
=> [#<Thing a>, #<Thing b>, #<Thing c>, #<Thing c>]
Hash[a.map{|x| [x.text, x]}].values
=> [#<Thing a>, #<Thing b>, #<Thing c>]
Boring part: here's the little test class I used:
class Thing
attr_reader :text
def initialize(text)
#text = text
end
def inspect
"#<Thing #{text}>"
end
end

Use Array#uniq with a block. In your case:
array.uniq(&:text) # => array with duplicated `text` removed
This was introduced in Ruby 1.9.2, so if using an earlier version, you can use backports with require 'backports/1.9.2/array/uniq'

Related

Ruby Hash destructive vs. non-destructive method

Could not find a previous post that answers my question...I'm learning how to use destructive vs. non-destructive methods in Ruby. I found an answer to the exercise I'm working on (destructively adding a number to hash values), but I want to be clear on why some earlier solutions of mine did not work. Here's the answer that works:
def modify_a_hash(the_hash, number_to_add_to_each_value)
the_hash.each { |k, v| the_hash[k] = v + number_to_add_to_each_value}
end
These two solutions come back as non-destructive (since they all use "each" I cannot figure out why. To make something destructive is it the equals sign above that does the trick?):
def modify_a_hash(the_hash, number_to_add_to_each_value)
the_hash.each_value { |v| v + number_to_add_to_each_value}
end
def modify_a_hash(the_hash, number_to_add_to_each_value)
the_hash.each { |k, v| v + number_to_add_to_each_value}
end
The terms "destructive" and "non-destructive" are a bit misleading here. Better is to use the conventional "in-place modification" vs. "returns a copy" terminology.
Generally methods that modify in-place have ! at the end of their name to serve as a warning, like gsub! for String. Some methods that pre-date this convention do not have them, like push for Array.
The = performs an assignment within the loop. Your other examples don't actually do anything useful since each returns the original object being iterated over regardless of any results produced.
If you wanted to return a copy you'd do this:
def modify_a_hash(the_hash, number_to_add)
Hash[
the_hash.collect do |k, v|
[ k, v + number_to_add ]
end
]
end
That would return a copy. The inner operation collect transforms key-value pairs into new key-value pairs with the adjustment applied. No = is required since there's no assignment.
The outer method Hash[] transforms those key-value pairs into a proper Hash object. This is then returned and is independent of the original.
Generally a non-destructive or "return a copy" method needs to create a new, independent version of the thing it's manipulating for the purpose of storing the results. This applies to String, Array, Hash, or any other class or container you might be working with.
Maybe this slightly different example will be helpful.
We have a hash:
2.0.0-p481 :014 > hash
=> {1=>"ann", 2=>"mary", 3=>"silvia"}
Then we iterate over it and change all the letters to the uppercase:
2.0.0-p481 :015 > hash.each { |key, value| value.upcase! }
=> {1=>"ANN", 2=>"MARY", 3=>"SILVIA"}
The original hash has changed because we used upcase! method.
Compare to method without ! sign, that doesn't modify hash values:
2.0.0-p481 :017 > hash.each { |key, value| value.downcase }
=> {1=>"ANN", 2=>"MARY", 3=>"SILVIA"}

How to get Ruby hash to remove braces in output

I started working through some sample problems on Test-First, and had worked out a solution which passed all the RSpec tests using Ruby 1.8.7. I just upgraded my OS, and Ruby upgraded as well; my code no longer passes the RSpec test. Can anyone help me understand why this is not working anymore?
My code
def entries
#d
end
the error message
Failures:
1) Dictionary can add whole entries with keyword and definition
Failure/Error: #d.entries.should == {'fish' => 'aquatic animal'}
expected: {"fish"=>"aquatic animal"}
got: {["fish"]=>["aquatic animal"]} (using ==)
Diff:
## -1,2 +1,2 ##
-"fish" => "aquatic animal"
+["fish"] => ["aquatic animal"]
#
I can't figure out what to change about the formatting. (One of the RSpec tests is that the #d must be empty when created, so when I try modifying the #d by putting in explicit formatting it also fails, but I'm imagining that there's a straightforward type issue here I'm not understanding.)
Update: More code
class Dictionary
def initialize d = {}
#d = d
end
def entries
#d
end
def keywords
#d.keys.sort
end
def add words
n_key = words.keys
n_val = words.values
#d[n_key] = n_val
end
end
It looks like you're trying to do some kind of mass assignment by adding several words at once, but that's not the way to do it.
A Ruby Hash can have anything as a key, and this includes arrays of things. It's not like JavaScript where it will automatically cast to string, or other languages that have the same sort of conversion to a specific dictionary key type. In Ruby any object will do.
So your add words method should be:
def add words
words.each do |word, value|
#d[word] = value
end
end
As a note using names like #d is really bad form. Try and be more specific about what that is, or you risk confusing people endlessly. Programs filled with things like #d, x and S are awful to debug and maintain. Better to be clear if a bit verbose than terse and ambiguous.
Secondly, it's not clear how your Dictionary class is all that different from Hash itself. Maybe you could make it a subclass and save yourself some trouble. For example:
class Dictionary < Hash
def keywords
keys.sort
end
def add words
merge!(words)
end
end
In general terms it's always best to use the core Ruby classes to do what you want, then build out from there. Re-inventing the wheel leads to incompatibility and frustration. The built-in Hash class has a whole bunch of utility methods that are very handy for doing data transformation, conversion and iteration, things you're losing by creating your own opaque wrapper class.
The merge! method in particular adds data to an existing Hash, which is exactly what you want.

Detect if something is Enumerable in Ruby

I have a method that accepts either a single object or a collection of objects. What is the proper way of detecting if what's passed in is Enumerable? I'm currently doing the following (which works but I'm not sure it's the correct way):
def foo(bar)
if bar.respond_to? :map
# loop over it
else
# single object
end
end
I would use is_a?.
bar.is_a? Enumerable
But there’s a better way to take a single object or a collection, assuming that the caller knows which one they’re passing in. Use a splat:
def foo(*args)
args.each do |arg|
…
end
end
Then you can call it as foo(single_arg), foo(arg1, arg2), and foo(*argv).
I depends on your exact needs, but it's usually not a great idea to differentiate between a single object and an Enumerable. In particular, a Hash is an Enumerable, but in most cases it should be considered as a single object.
It's usually better to distinguish between a single object and an array-like argument. That's what Ruby often does. The best way to do this is to check if arg.respond_to? :to_ary. If it does, then all methods of Array should be available to you, if not treat it as a single object.
If you really want to check for Enumerable, you could test arg.is_a? Enumerable but consider that a Hash is an Enumerable and so are Lazy enumerators (and calling map on them won't even give you an array!)
If your purpose is to loop over it, then the standard way is to ensure it is an array. You can do it like this without condition.
def foo(bar)
[*bar] # Loop over it. It is ensured to be an array.
end
What about handling single items or a collection in one shot?
[*bar].each { |item| puts item }
This will work whether bar is a single item or an array or hash or whatever. This probably isn't the best for working with hashes, but with arrays it works pretty well.
Another way to ensure that something is an Array is with the Array "function (technically still a method):
def foo(bar)
Array(bar).map { |o| … }
end
Array will leave an array an array, and convert single elements to an array:
Array(["foo"]) # => ["foo"]
Array("foo") # => ["foo"]
Array(nil) # => []

Removing Identical Objects in Ruby?

I am writing a Ruby app at the moment which is going to search twitter for various things. One of the problems I am going to face is shared results between searches in close proximity to each other time-wise. The results are returned in an array of objects each of which is a single tweet. I know of the Array.uniq method in ruby which returns an array with all the duplicates removed.
My question is this. Does the uniq method remove duplicates in so far as these objects point to the same space in memory or that they contain identical information?
If the former, whats the best way of removing duplicates from an array based on their content?
Does the uniq method remove duplicates
in so far as these objects point to
the same space in memory or that they
contain identical information?
The method relies on the eql? method so it removes all the elements where a.eql?(b) returns true.
The exact behavior depends on the specific object you are dealing with.
Strings, for example, are considered equal if they contain the same text regardless they share the same memory allocation.
a = b = "foo"
c = "foo"
[a, b, c].uniq
# => ["foo"]
This is true for the most part of core objects but not for ruby objects.
class Foo
end
a = Foo.new
b = Foo.new
a.eql? b
# => false
Ruby encourages you to redefine the == operator depending on your class context.
In your specific case I would suggest to create an object representing a twitter result and implement your comparison logic so that Array.uniq will behave as you expect.
class Result
attr_accessor :text, :notes
def initialize(text = nil, notes = nil)
self.text = text
self.notes = notes
end
def ==(other)
other.class == self.class &&
other.text == self.text
end
alias :eql? :==
end
a = Result.new("first")
b = Result.new("first")
c = Result.new("third")
[a, b, c].uniq
# => [a, c]
For anyone else stumbling upon this question, it looks like things have changed a bit since this question was first asked and in newer Ruby versions (1.9.3 at least), Array.uniq assumes that your object also has a meaningful implementation of the #hash method, in addition to .eql? or ==.
uniq uses eql?, as documented in this thread.
See the official ruby documentation for the distinction between ==, equal?, and eql?.
I believe that Array.uniq detects duplicates via the objects' eql? or == methods, which means its comparing based on content, not location in memory (assuming the objects provide a meaningful implementation of eql? based on content).

What is the "right" way to iterate through an array in Ruby?

PHP, for all its warts, is pretty good on this count. There's no difference between an array and a hash (maybe I'm naive, but this seems obviously right to me), and to iterate through either you just do
foreach (array/hash as $key => $value)
In Ruby there are a bunch of ways to do this sort of thing:
array.length.times do |i|
end
array.each
array.each_index
for i in array
Hashes make more sense, since I just always use
hash.each do |key, value|
Why can't I do this for arrays? If I want to remember just one method, I guess I can use each_index (since it makes both the index and value available), but it's annoying to have to do array[index] instead of just value.
Oh right, I forgot about array.each_with_index. However, this one sucks because it goes |value, key| and hash.each goes |key, value|! Is this not insane?
This will iterate through all the elements:
array = [1, 2, 3, 4, 5, 6]
array.each { |x| puts x }
# Output:
1
2
3
4
5
6
This will iterate through all the elements giving you the value and the index:
array = ["A", "B", "C"]
array.each_with_index {|val, index| puts "#{val} => #{index}" }
# Output:
A => 0
B => 1
C => 2
I'm not quite sure from your question which one you are looking for.
I think there is no one right way. There are a lot of different ways to iterate, and each has its own niche.
each is sufficient for many usages, since I don't often care about the indexes.
each_ with _index acts like Hash#each - you get the value and the index.
each_index - just the indexes. I don't use this one often. Equivalent to "length.times".
map is another way to iterate, useful when you want to transform one array into another.
select is the iterator to use when you want to choose a subset.
inject is useful for generating sums or products, or collecting a single result.
It may seem like a lot to remember, but don't worry, you can get by without knowing all of them. But as you start to learn and use the different methods, your code will become cleaner and clearer, and you'll be on your way to Ruby mastery.
I'm not saying that Array -> |value,index| and Hash -> |key,value| is not insane (see Horace Loeb's comment), but I am saying that there is a sane way to expect this arrangement.
When I am dealing with arrays, I am focused on the elements in the array (not the index because the index is transitory). The method is each with index, i.e. each+index, or |each,index|, or |value,index|. This is also consistent with the index being viewed as an optional argument, e.g. |value| is equivalent to |value,index=nil| which is consistent with |value,index|.
When I am dealing with hashes, I am often more focused on the keys than the values, and I am usually dealing with keys and values in that order, either key => value or hash[key] = value.
If you want duck-typing, then either explicitly use a defined method as Brent Longborough showed, or an implicit method as maxhawkins showed.
Ruby is all about accommodating the language to suit the programmer, not about the programmer accommodating to suit the language. This is why there are so many ways. There are so many ways to think about something. In Ruby, you choose the closest and the rest of the code usually falls out extremely neatly and concisely.
As for the original question, "What is the “right” way to iterate through an array in Ruby?", well, I think the core way (i.e. without powerful syntactic sugar or object oriented power) is to do:
for index in 0 ... array.size
puts "array[#{index}] = #{array[index].inspect}"
end
But Ruby is all about powerful syntactic sugar and object oriented power, but anyway here is the equivalent for hashes, and the keys can be ordered or not:
for key in hash.keys.sort
puts "hash[#{key.inspect}] = #{hash[key].inspect}"
end
So, my answer is, "The “right” way to iterate through an array in Ruby depends on you (i.e. the programmer or the programming team) and the project.". The better Ruby programmer makes the better choice (of which syntactic power and/or which object oriented approach). The better Ruby programmer continues to look for more ways.
Now, I want to ask another question, "What is the “right” way to iterate through a Range in Ruby backwards?"! (This question is how I came to this page.)
It is nice to do (for the forwards):
(1..10).each{|i| puts "i=#{i}" }
but I don't like to do (for the backwards):
(1..10).to_a.reverse.each{|i| puts "i=#{i}" }
Well, I don't actually mind doing that too much, but when I am teaching going backwards, I want to show my students a nice symmetry (i.e. with minimal difference, e.g. only adding a reverse, or a step -1, but without modifying anything else).
You can do (for symmetry):
(a=*1..10).each{|i| puts "i=#{i}" }
and
(a=*1..10).reverse.each{|i| puts "i=#{i}" }
which I don't like much, but you can't do
(*1..10).each{|i| puts "i=#{i}" }
(*1..10).reverse.each{|i| puts "i=#{i}" }
#
(1..10).step(1){|i| puts "i=#{i}" }
(1..10).step(-1){|i| puts "i=#{i}" }
#
(1..10).each{|i| puts "i=#{i}" }
(10..1).each{|i| puts "i=#{i}" } # I don't want this though. It's dangerous
You could ultimately do
class Range
def each_reverse(&block)
self.to_a.reverse.each(&block)
end
end
but I want to teach pure Ruby rather than object oriented approaches (just yet). I would like to iterate backwards:
without creating an array (consider 0..1000000000)
working for any Range (e.g. Strings, not just Integers)
without using any extra object oriented power (i.e. no class modification)
I believe this is impossible without defining a pred method, which means modifying the Range class to use it. If you can do this please let me know, otherwise confirmation of impossibility would be appreciated though it would be disappointing. Perhaps Ruby 1.9 addresses this.
(Thanks for your time in reading this.)
Use each_with_index when you need both.
ary.each_with_index { |val, idx| # ...
The other answers are just fine, but I wanted to point out one other peripheral thing: Arrays are ordered, whereas Hashes are not in 1.8. (In Ruby 1.9, Hashes are ordered by insertion order of keys.) So it wouldn't make sense prior to 1.9 to iterate over a Hash in the same way/sequence as Arrays, which have always had a definite ordering. I don't know what the default order is for PHP associative arrays (apparently my google fu isn't strong enough to figure that out, either), but I don't know how you can consider regular PHP arrays and PHP associative arrays to be "the same" in this context, since the order for associative arrays seems undefined.
As such, the Ruby way seems more clear and intuitive to me. :)
Here are the four options listed in your question, arranged by freedom of control. You might want to use a different one depending on what you need.
Simply go through values:
array.each
Simply go through indices:
array.each_index
Go through indices + index variable:
for i in array
Control loop count + index variable:
array.length.times do | i |
Trying to do the same thing consistently with arrays and hashes might just be a code smell, but, at the risk of my being branded as a codorous half-monkey-patcher, if you're looking for consistent behaviour, would this do the trick?:
class Hash
def each_pairwise
self.each { | x, y |
yield [x, y]
}
end
end
class Array
def each_pairwise
self.each_with_index { | x, y |
yield [y, x]
}
end
end
["a","b","c"].each_pairwise { |x,y|
puts "#{x} => #{y}"
}
{"a" => "Aardvark","b" => "Bogle","c" => "Catastrophe"}.each_pairwise { |x,y|
puts "#{x} => #{y}"
}
I'd been trying to build a menu (in Camping and Markaby) using a hash.
Each item has 2 elements: a menu label and a URL, so a hash seemed right, but the '/' URL for 'Home' always appeared last (as you'd expect for a hash), so menu items appeared in the wrong order.
Using an array with each_slice does the job:
['Home', '/', 'Page two', 'two', 'Test', 'test'].each_slice(2) do|label,link|
li {a label, :href => link}
end
Adding extra values for each menu item (e.g. like a CSS ID name) just means increasing the slice value. So, like a hash but with groups consisting of any number of items. Perfect.
So this is just to say thanks for inadvertently hinting at a solution!
Obvious, but worth stating: I suggest checking if the length of the array is divisible by the slice value.
If you use the enumerable mixin (as Rails does) you can do something similar to the php snippet listed. Just use the each_slice method and flatten the hash.
require 'enumerator'
['a',1,'b',2].to_a.flatten.each_slice(2) {|x,y| puts "#{x} => #{y}" }
# is equivalent to...
{'a'=>1,'b'=>2}.to_a.flatten.each_slice(2) {|x,y| puts "#{x} => #{y}" }
Less monkey-patching required.
However, this does cause problems when you have a recursive array or a hash with array values. In ruby 1.9 this problem is solved with a parameter to the flatten method that specifies how deep to recurse.
# Ruby 1.8
[1,2,[1,2,3]].flatten
=> [1,2,1,2,3]
# Ruby 1.9
[1,2,[1,2,3]].flatten(0)
=> [1,2,[1,2,3]]
As for the question of whether this is a code smell, I'm not sure. Usually when I have to bend over backwards to iterate over something I step back and realize I'm attacking the problem wrong.
In Ruby 2.1, each_with_index method is removed.
Instead you can use each_index
Example:
a = [ "a", "b", "c" ]
a.each_index {|x| print x, " -- " }
produces:
0 -- 1 -- 2 --
The right way is the one you feel most comfortable with and which does what you want it to do. In programming there is rarely one 'correct' way to do things, more often there are multiple ways to choose.
If you are comfortable with certain way of doings things, do just it, unless it doesn't work - then it is time to find better way.
Using the same method for iterating through both arrays and hashes makes sense, for example to process nested hash-and-array structures often resulting from parsers, from reading JSON files etc..
One clever way that has not yet been mentioned is how it's done in the Ruby Facets library of standard library extensions. From here:
class Array
# Iterate over index and value. The intention of this
# method is to provide polymorphism with Hash.
#
def each_pair #:yield:
each_with_index {|e, i| yield(i,e) }
end
end
There is already Hash#each_pair, an alias of Hash#each. So after this patch, we also have Array#each_pair and can use it interchangeably to iterate through both Hashes and Arrays. This fixes the OP's observed insanity that Array#each_with_index has the block arguments reversed compared to Hash#each. Example usage:
my_array = ['Hello', 'World', '!']
my_array.each_pair { |key, value| pp "#{key}, #{value}" }
# result:
"0, Hello"
"1, World"
"2, !"
my_hash = { '0' => 'Hello', '1' => 'World', '2' => '!' }
my_hash.each_pair { |key, value| pp "#{key}, #{value}" }
# result:
"0, Hello"
"1, World"
"2, !"

Resources