How to select unique elements - ruby

I would like to extend the Array class with a uniq_elements method that returns the elements with a multiplicity of one. I would also like to be able to pass a block to my new method, as with uniq. For example:
t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements # => [1,3,5,6,8]
Example with closure:
t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements{|z| z.round} # => [2.0, 5.1]
Neither t-t.uniq nor t.to_set-t.uniq.to_set works. I don't care about speed: I call it only once in my program, so it can be slow.

Helper method
This method uses the helper:
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
This method is similar to Array#-. The difference is illustrated in the following example:
a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]
a - b #=> [1]
c = a.difference b #=> [1, 3, 2, 2]
As you see, a contains three 3's and b contains two, so the first two 3's in a are removed in constructing c (a is not mutated). When b contains at least as many instances of an element as a does, c contains no instances of that element. To remove elements beginning at the end of a:
a.reverse.difference(b).reverse #=> [3, 1, 2, 2]
Array#difference! could be defined in the obvious way.
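One possible reading of "the obvious way" is sketched below. The method names difference_by_count and difference_by_count! are hypothetical; they are used here only so the sketch does not clobber the core Array#difference that Ruby 2.6 added with the semantics of Array#-.

```ruby
class Array
  # Same counting logic as the helper above, under a hypothetical name.
  def difference_by_count(other)
    remaining = other.each_with_object(Hash.new(0)) { |e, counts| counts[e] += 1 }
    reject do |e|
      next false unless remaining[e] > 0  # nothing left to cancel: keep e
      remaining[e] -= 1                   # cancel one instance of e
      true
    end
  end

  # Hypothetical in-place variant: mutates the receiver and returns it.
  def difference_by_count!(other)
    replace(difference_by_count(other))
  end
end

a = [3, 1, 2, 3, 4, 3, 2, 2, 4]
b = [2, 3, 4, 4, 3, 4]
a.difference_by_count!(b)
a #=> [1, 3, 2, 2]
```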
I have found many uses for this method.
I have proposed that this method be added to the Ruby core.
When used with Array#-, this method makes it easy to extract the unique elements from an array a:
a = [1,3,2,4,3,4]
u = a.uniq #=> [1, 2, 3, 4]
u - a.difference(u) #=> [1, 2]
This works because
a.difference(u) #=> [3,4]
contains all the non-unique elements of a (each possibly more than once).
Problem at Hand
Code
class Array
def uniq_elements(&prc)
prc ||= ->(e) { e }
a = map { |e| prc[e] }
u = a.uniq
uniques = u - a.difference(u)
# keep the original elements whose converted value occurs exactly once
select { |e| uniques.include?(prc[e]) }
end
end
Examples
t = [1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements
#=> [1,3,5,6,8]
t = [1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements { |z| z.round }
# => [2.0, 5.1]

Here's another way.
Code
require 'set'
class Array
def uniq_elements(&prc)
prc ||= ->(e) { e }
uniques, dups = {}, Set.new
each do |e|
k = prc[e]
((uniques.key?(k)) ? (dups << k; uniques.delete(k)) :
uniques[k] = e) unless dups.include?(k)
end
uniques.values
end
end
Examples
t = [1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements #=> [1,3,5,6,8]
t = [1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements { |z| z.round } # => [2.0, 5.1]
Explanation
If uniq_elements is called with a block, it is received as the proc prc.
If uniq_elements is called without a block, prc is nil, so the first statement of the method sets prc to the default proc (a lambda).
An initially empty hash, uniques, holds the candidates for unique values. Its values are elements of the array self; its keys are what the proc prc returns when called with the element: k = prc[e].
The set dups contains the keys that have been found not to be unique. It is a set (rather than an array) to speed lookups. Alternatively, it could be a hash with the non-unique keys as keys and arbitrary values.
The following steps are performed for each element e of the array self:
k = prc[e] is computed.
If dups contains k, e is a dup, so nothing more needs to be done; else
if uniques has a key k, e is a dup, so k is added to the set dups and the element with key k is removed from uniques; else
the element k=>e is added to uniques as a candidate for a unique element.
The values of uniques are returned.

class Array
def uniq_elements
counts = Hash.new(0)
arr = map do |orig_val|
converted_val = block_given? ? (yield orig_val) : orig_val
counts[converted_val] += 1
[converted_val, orig_val]
end
uniques = []
arr.each do |(converted_val, orig_val)|
uniques << orig_val if counts[converted_val] == 1
end
uniques
end
end
t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
p t.uniq_elements
t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
p t.uniq_elements { |elmt| elmt.round }
--output:--
[1, 3, 5, 6, 8]
[2.0, 5.1]
Array#uniq does not find non-duplicated elements; rather, it removes duplicates.

Use Enumerable#tally:
class Array
def uniq_elements
tally.select { |_obj, nb| nb == 1 }.keys
end
end
t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements # => [1,3,5,6,8]
If you are using Ruby < 2.7, you can get tally with the backports gem
require 'backports/2.7.0/enumerable/tally'
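The tally-based method above does not accept a block. A minimal sketch of a block-aware variant follows, assuming Ruby 2.7+ (or the backport); the name uniq_elements_by_tally is hypothetical and is used only to keep it separate from the other definitions on this page.

```ruby
class Array
  # Hypothetical block-aware variant built on tally.
  def uniq_elements_by_tally
    # Converted value for each element (the element itself without a block).
    keys = block_given? ? map { |e| yield e } : self
    counts = keys.tally
    # Keep the original elements whose converted value occurs exactly once.
    zip(keys).select { |_e, k| counts[k] == 1 }.map(&:first)
  end
end

plain = [1, 2, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 9, 9].uniq_elements_by_tally
#=> [1, 3, 5, 6, 8]
rounded = [1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2].uniq_elements_by_tally { |z| z.round }
#=> [2.0, 5.1]
```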

class Array
def uniq_elements
zip( block_given? ? map { |e| yield e } : self )
.each_with_object Hash.new do |(e, v), h| h[v] = h[v].nil? ? [e] : false end
.values.reject( &:! ).map &:first
end
end
[1,2,2,3,4,4,5,6,7,7,8,9,9,9].uniq_elements #=> [1, 3, 5, 6, 8]
[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2].uniq_elements &:round #=> [2.0, 5.1]

Creating and calling a default proc is a waste of time, and
Cramming everything into one line using tortured constructs doesn't make the code more efficient--it just makes the code harder to understand.
In require statements, rubyists don't capitalize file names.
....
require 'set'
class Array
def uniq_elements
uniques = {}
dups = Set.new
each do |orig_val|
converted_val = block_given? ? (yield orig_val) : orig_val
next if dups.include? converted_val
if uniques.include?(converted_val)
uniques.delete(converted_val)
dups << converted_val
else
uniques[converted_val] = orig_val
end
end
uniques.values
end
end
t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
p t.uniq_elements
t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
p t.uniq_elements {|elmt|
elmt.round
}
--output:--
[1, 3, 5, 6, 8]
[2.0, 5.1]

Related

Why does my custom Ruby 'inject' method make 'self' an empty array instead of the array I'm calling it on?

I implemented a custom reduce/inject method as follows,
and I print the value of self at the start of the method:
def my_inject(arg=nil, &block)
puts self
arg.nil? ? memo = self.shift() : memo = arg
until self.empty?
memo = block.call(memo, self.shift)
end
memo
end
my_array = [3, 5, 8, 9]
#without arguments works great
p my_array.my_inject { |m, v| m + v}
=> [3, 5, 8, 9]
=> 25
#with an argument does not work so great
p my_array.my_inject(100) { |m, v| m + v }
=> []
=> 100
When I call the method on an array, I get what I expected only if I don't pass any arguments.
When I pass an argument (here 100), printing self shows an empty array, so my 'until' loop never runs. Why is that?
Thanks in advance for your help!
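The output shown is consistent with Array#shift mutating its receiver: the first call removes every element from my_array, so the second call starts with an array that is already empty. A minimal sketch of a non-mutating version follows, assuming the method is defined on Array; the name my_inject_safe is hypothetical.

```ruby
class Array
  # Hypothetical non-mutating variant of my_inject.
  def my_inject_safe(arg = nil)
    items = dup                              # shallow copy, so the receiver is left untouched
    memo = arg.nil? ? items.shift : arg      # seed with the first element when no argument is given
    memo = yield(memo, items.shift) until items.empty?
    memo
  end
end

my_array = [3, 5, 8, 9]
first  = my_array.my_inject_safe { |m, v| m + v }       #=> 25
second = my_array.my_inject_safe(100) { |m, v| m + v }  #=> 125
my_array                                                 #=> [3, 5, 8, 9]
```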

generalize map and reduce lab

I'm working on a lab that uses a generalized map method: it takes an array and a block, and returns a different result depending on the block.
I really struggled with this one. I found some responses, but they don't really make sense to me.
Here is the code:
def map(s)
new = []
i = 0
while i < s.length
new.push(yield(s[i]))
i += 1
end
new
end
Here is the test:
it "returns an array with all values made negative" do
expect(map([1, 2, 3, -9]){|n| n * -1}).to eq([-1, -2, -3, 9])
end
it "returns an array with the original values" do
dune = ["paul", "gurney", "vladimir", "jessica", "chani"]
expect(map(dune){|n| n}).to eq(dune)
end
it "returns an array with the original values multiplied by 2" do
expect(map([1, 2, 3, -9]){|n| n * 2}).to eq([2, 4, 6, -18])
end
it "returns an array with the original values squared" do
expect(map([1, 2, 3, -9]){|n| n * n}).to eq([1, 4, 9, 81])
end
end
I don't get how the above code can give you these 4 different results.
Could someone help me understand it?
Thank you for your help!
How your method map works
To see how your method operates let's modify your code to add some intermediate variables and some puts statements to show the values of those variables.
def map(s)
new = []
i = 0
n = s.length
puts "s has length #{n}"
while i < n
puts "i = #{i}"
e = s[i]
puts " Yield #{e} to the block"
rv = yield(e)
puts " The block's return value is #{rv}. Push #{rv} onto new"
new.push(rv)
puts " new now equals #{new}"
i += 1
end
puts "We now return the value of new"
new
end
Now let's execute the method with one of the blocks of interest.
s = [1, 2, 3, -9]
map(s) { |n| n * 2 }
#=> [2, 4, 6, -18] (return value of method)
The following is displayed.
s has length 4
i = 0
Yield 1 to the block
The block's return value is 2. Push 2 onto new
new now equals [2]
i = 1
Yield 2 to the block
The block's return value is 4. Push 4 onto new
new now equals [2, 4]
i = 2
Yield 3 to the block
The block's return value is 6. Push 6 onto new
new now equals [2, 4, 6]
i = 3
Yield -9 to the block
The block's return value is -18. Push -18 onto new
new now equals [2, 4, 6, -18]
We now return the value of new
It may be of interest to execute this modified method with different values of s and different blocks.
A replacement for Array#map?
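Is this a replacement for Array#map (or Enumerable#map, but for now let's just consider Array#map)? As you defined it at the top level, your map is an instance method of the class Object (a private one when the code runs from a file; irb makes top-level methods public):
Object.instance_methods.include?(:map) #=> true in irb
Object.private_instance_methods.include?(:map) #=> true when run from a file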
It must be invoked map([1,2,3]) { |n| ... } whereas Array#map is invoked [1,2,3].map { |n| ... }. Therefore, for your method map to be a replacement for Array#map you need to define it as follows.
class Array
def map
new = []
i = 0
while i < length
new.push(yield(self[i]))
i += 1
end
new
end
end
[1, 2, 3, -9].map { |n| n * 2 }
#=> [2, 4, 6, -18]
Simplify
We can simplify this method as follows.
class Array
def map
new = []
each { |e| new << yield(e) }
new
end
end
[1, 2, 3, -9].map { |n| n * 2 }
#=> [2, 4, 6, -18]
or, better:
class Array
def map
each_with_object([]) { |e,new| new << yield(e) }
end
end
See Enumerable#each_with_object.
Note that while i < length is equivalent to while i < self.length, because self., if omitted, is implicit, and therefore redundant. Similarly, each { |e| new << yield(e) } is equivalent to self.each { |e| new << yield(e) } and each_with_object([]) { ... } is equivalent to self.each_with_object([]) { ... }.
Are we finished?
If we examine the documentation for Array#map carefully, we see that there are two forms of the method. The first is when map takes a block. Our method Array#map mimics that behaviour, and that is the only behaviour needed to satisfy the given rspec tests.
There is a second form, however, where map is not given a block, in which case it returns an enumerator. That allows us to chain the method to another. For example (with Ruby's Array#map),
['cat', 'dog', 'pig'].map.with_index do |animal, i|
i.even? ? animal.upcase : animal
end
#=> ["CAT", "dog", "PIG"]
We could modify our Array#map to incorporate this second behaviour as follows.
class Array
def map
if block_given?
each_with_object([]) { |e,new| new << yield(e) }
else
to_enum(:map)
end
end
end
[1, 2, 3, -9].map { |n| n * 2 }
#=> [2, 4, 6, -18]
['cat', 'dog', 'pig'].map.with_index do |animal, i|
i.even? ? animal.upcase : animal
end
#=> ["CAT", "dog", "PIG"]
See Kernel#block_given? and Object#to_enum.
Notes
You might use, say, arr, rather than s as the variable holding the array, as s often denotes a string, just as h typically denotes a hash. One generally avoids names for variables and custom methods that are the names of core Ruby methods. That is also an objection to your use of new as a variable name, as there are many core methods named new.

Add numbers in mixed array

My expectation for this code is the number 3. Why doesn't this work?
mixed_array=[1,'cat',2]
mixed_array.inject(0) do |memo,num|
memo += num if num.is_a?Integer
end
NoMethodError: undefined method `+' for nil:NilClass
You were almost there:
mixed_array.inject(0) do |memo, num|
next memo unless num.is_a?(Integer)
memo + num
end
#=> 3
To make your own code work:
mixed_array.inject(0) do |memo, num|
memo += num if num.is_a?(Integer)
memo # return memo on each iteration, because it becomes nil with non-integer element
end
#=> 3
What you have doesn't work because:
The value of memo += num if num.is_a?Integer is nil when num is not an Integer.
The value of each block is fed to the next iteration as memo.
Your block evaluates to nil on the second iteration so you're going to end up trying to evaluate:
nil += 2 if 2.is_a? Integer
and there's your NoMethodError.
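The first point can be checked in isolation. The sketch below is only an illustration: it uses map rather than inject so that the value of each block can be collected and inspected.

```ruby
mixed_array = [1, 'cat', 2]

# Collect what the block evaluates to for each element.
block_values = mixed_array.map do |num|
  memo = 0
  memo += num if num.is_a?(Integer)  # a modifier-if whose condition is false evaluates to nil
end

block_values #=> [1, nil, 2]
```

The nil in the middle is what inject would hand to the next iteration as memo.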
You're probably better off doing this in two steps for clarity:
mixed_array.select { |e| e.is_a? Integer }.inject(:+)
or maybe even the looser version:
mixed_array.select { |e| e.is_a? Numeric }.inject(:+)
or with newer versions of Ruby:
mixed_array.select { |e| e.is_a? Numeric }.sum
If you're not dogmatically opposed to ternaries then you could also say things like:
mixed_array.inject(0) { |memo, num| memo + (num.is_a?(Integer) ? num : 0) }
mixed_array.sum { |e| e.is_a?(Integer) ? e : 0 }
If you know that the non-numeric elements of mixed_array are strings that don't look like numbers or start with numbers (i.e. nothing like '0', '11 pancakes', ...) then you could say:
mixed_array.map(&:to_i).inject(:+)
mixed_array.inject(0) { |memo, num| memo + num.to_i }
...
but that's probably making too many assumptions.
You can do this a lot more easily if you consider it as a two stage operation rather than one:
mixed_array=[1,'cat',2]
mixed_array.grep(Integer).inject(0, :+)
# => 3
This filters out all the non-Integer elements from the array and adds the rest together.
Remember that inject takes the return value from the previous iteration as the seed for the next. Your if clause must return an alternate value. You end up with this if you fix it:
memo += num.is_a?(Integer) ? num : 0
You could also go with a good-enough solution like:
memo += num.to_i
Depending on what sort of data you're trying to screen out.
mixed_array = [1, 'cat', 2, [3, 4], :a, { b: 5, c: 6 }, 7]
mixed_array.reduce(0) { |tot, obj| tot += Integer(obj) rescue 0 }
#=> 10
When the array may include one or more floats and you want to return a float:
mixed_array = [1, 'cat', 2, [3, 4], :a, { b: 5, c: 6 }, 7, 8.123]
mixed_array.reduce(0) { |tot, obj| tot += Float(obj) rescue 0 }
#=> 18.122999999999998
See Kernel::Integer and Kernel::Float.

What's the common fast way of expressing the infinite enumerator `(1..Inf)` in Ruby?

I think an infinite enumerator is very convenient for writing FP-style scripts, but I have yet to find a comfortable way to construct one in Ruby.
I know I can construct it explicitly:
a = Enumerator.new do |y|
i = 0
loop do
y << i += 1
end
end
a.next #=> 1
a.next #=> 2
a.next #=> 3
...
but that's annoyingly wordy for such a simple structure.
Another approach is sort of a "hack" of using Float::INFINITY:
b = (1..Float::INFINITY).each
b = (1..1.0/0.0).each
These two are probably the least clumsy solutions I can give, but I'd like to know if there is a more elegant way of constructing infinite enumerators. (By the way, why doesn't Ruby just make inf or infinity a literal for Float::INFINITY?)
Use #to_enum or #lazy to convert your Range to an Enumerator (a lazy one in the case of #lazy). For example:
(1..Float::INFINITY).to_enum
(1..Float::INFINITY).lazy
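As a short usage sketch, the lazy form lets you chain map and select over the infinite range and only evaluate as many elements as are requested. The variable names are illustrative; the endless range (1..) on the last line assumes Ruby 2.6 or later.

```ruby
squares = (1..Float::INFINITY).lazy.map { |n| n * n }

# Only as many elements as needed are generated.
first_even_squares = squares.select(&:even?).first(3)
#=> [4, 16, 36]

# With Ruby 2.6+, an endless range avoids Float::INFINITY altogether.
first_doubles = (1..).lazy.map { |n| n * 2 }.first(3)
#=> [2, 4, 6]
```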
I would personally create my own Ruby class for this.
class NaturalNumbers
def self.each
i = 0
loop { yield i += 1 }
end
end
NaturalNumbers.each do |i|
puts i
end
Ruby 2.7 introduced Enumerator.produce for creating an infinite enumerator from any block, which results in a very elegant, very functional way of implementing the original problem:
irb(main):001:0> NaturalNumbers = Enumerator.produce(0) { |x| x + 1 }
=> #<Enumerator: #<Enumerator::Producer:0x00007fadbd82d990>:each>
irb(main):002:0> NaturalNumbers.first(10)
=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
... which - if you're a fan of numbered block parameters (another Ruby 2.7 feature) - can also be written as:
irb(main):006:0> NaturalNumbers = Enumerator.produce(0) { _1 + 1 }
=> #<Enumerator: #<Enumerator::Producer:0x00007fadbc8b08f0>:each>
irb(main):007:0> NaturalNumbers.first(10)
=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

How to extend Ruby Enumerable class max_by to ignore nil?

a = [4, 3, 2, nil]
a.max_by { |v| v * 2 } => NoMethodError: undefined method `*' for nil:NilClass
How to overload max_by to ignore nil values?
You can use Array#compact to remove nils before you call max_by.
a.compact.max_by { |v| v * 2 }
Welcome to Ruby: there are so many ways to solve the problem!
A very simple solution is:
a.max_by { |v| v.to_f * 2 }
since nil.to_f returns 0.0. This doesn't handle negative values, but since nil is the single instance of a class called NilClass, we can open that class, as with all classes in Ruby, and let it learn a little maths:
class NilClass
# overloading * operator
def *(y)
# returning negative infinity: Ruby 1.8.7
-1.0/0.0
# returning negative infinity: Ruby 1.9.2
# -Float::INFINITY
end
end
now we have
a.max_by { |v| v * 2 }
returning 4.
Here's another one:
a.max_by { |v| v.nil? ? -Float::INFINITY : v }
#=> 4
For your example this is obviously more complicated than compact, but if you want to sort the array and keep the nil values it's a handy trick. Or if you want to sort in a strange way, like zeroes to the end:
[0,4,5,6,1,9].sort_by { |v| v.zero? ? Float::INFINITY : v }
#=> [1, 4, 5, 6, 9, 0]
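The same trick applied to nil, as a minimal sketch: the nil values are kept and pushed to the end of the sorted array. The sample array is made up for illustration.

```ruby
values = [4, nil, 3, 2, nil]

# Treat nil as larger than any number so it sorts last.
sorted = values.sort_by { |v| v.nil? ? Float::INFINITY : v }
#=> [2, 3, 4, nil, nil]
```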
