I have two arrays and I am building a hash of key-value pairs from them in Ruby. How can I detect a duplicate key while zipping the two arrays, and add a prefix like "A-" in front of the key name for the duplicates?
I am using .zip to merge the two arrays, taking one as the keys and the other as the values.
[0] = "David"
[1] = "John"
[2] = "Alex"
[3] = "Sam"
[4] = "Caleb"
[5] = "David"
[6] = "John"
[7] = "Alex"
[8] = "Sam"
[0] = "1"
[1] = "2"
[2] = "3"
[3] = "4"
[4] = "5"
[5] = "6"
[6] = "7"
[7] = "8"
[8] = "9"
name_number_key_value_pair_hash = first_names.zip(numbers).to_h
puts(name_number_key_value_pair_hash)
Expected:
{"David"=>"1", "John"=>"2", "Alex"=>"3", "Sam"=>"4", "Caleb"=>"5", "A-David"=>"6", "A-John"=>"7", "A-Alex"=>"8", "A-Sam"=>"9"}
Actual:
{"David"=>"6", "John"=>"7", "Alex"=>"8", "Sam"=>"9", "Caleb"=>"5"}
It seems straightforward. I have attached a code snippet:
names = %w[David John Alex Sam Caleb David John Alex Sam]
numbers = %w[1 2 3 4 5 6 7 8 9]
key_pair = {}
names.each_with_index do |name, index|
name = "A-#{name}" if key_pair[name]
key_pair[name] = numbers[index]
end
It generates the expected output:
{"David"=>"1", "John"=>"2", "Alex"=>"3", "Sam"=>"4", "Caleb"=>"5", "A-David"=>"6", "A-John"=>"7", "A-Alex"=>"8", "A-Sam"=>"9"}
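One caveat worth checking, sketched here under the assumption that a name can occur more than twice (the input below is made up for illustration): on the third occurrence the "A-" key already exists and is silently overwritten.

```ruby
# Hypothetical input: the same name three times.
names = %w[John John John]
numbers = %w[1 2 3]

key_pair = {}
names.each_with_index do |name, index|
  # Only the bare name is checked, so the second and third "John"
  # both end up under "A-John".
  name = "A-#{name}" if key_pair[name]
  key_pair[name] = numbers[index]
end

key_pair
#=> {"John"=>"1", "A-John"=>"3"}
```

If that can happen in your data, the key has to be checked again after the prefix is added.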
You basically just need to keep track of the state of the hash as you build it and, when you find a conflict, create a new key instead. This captures the general approach:
def hash_with_prefixes(a, b, prefixes)
kv_pairs = a.zip(b)
prefixes = prefixes.to_enum
result_hash = {}
kv_pairs.each do |initial_key, value|
final_key = initial_key
while result_hash.include? final_key
final_key = "#{prefixes.next}-#{initial_key}"
end
prefixes.rewind
result_hash[final_key] = value
end
result_hash
rescue StopIteration
fail "Insufficient prefixes to provide unique keys for input lists."
end
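To see the approach in action, here is a self-contained sketch. It repeats the method so that it runs on its own (reading the enumerator inside the loop as prefixes), and the input lists and the prefix list %w[A B] are assumptions chosen for illustration.

```ruby
def hash_with_prefixes(a, b, prefixes)
  prefixes = prefixes.to_enum
  result_hash = {}
  a.zip(b).each do |initial_key, value|
    final_key = initial_key
    # Try each prefix in turn until the key is unused.
    while result_hash.include?(final_key)
      final_key = "#{prefixes.next}-#{initial_key}"
    end
    prefixes.rewind # start from the first prefix again for the next pair
    result_hash[final_key] = value
  end
  result_hash
rescue StopIteration
  fail "Insufficient prefixes to provide unique keys for input lists."
end

hash_with_prefixes(%w[John John John Sam], %w[1 2 3 4], %w[A B])
#=> {"John"=>"1", "A-John"=>"2", "B-John"=>"3", "Sam"=>"4"}

# A fourth "John" would exhaust the two prefixes and raise a RuntimeError.
```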
At the slight expense of clarity, you can also write it in a rather shorter form:
def hash_with_prefixes(a, b, prefixes)
pi = Hash[a.map {|k| [k, prefixes.lazy.map {|p| "#{p}-#{k}"}]}]
a.zip(b).inject({}) {|h, kv| h[h.include?(kv[0]) ? pi[kv[0]].next : kv[0]] = kv[1]; h}
rescue StopIteration
fail "Insufficient prefixes to provide unique keys for input lists."
end
(Don't do this.)
This is really very simple.
names = ["John","John", "John", "David", "David", "Susan", "Sue"]
numbers = ["1", "2", "3", "4", "5", "6","7"]
def uniq_hash_keys(names, numbers)
hash = {}
names.each_with_index do |name,i|
if hash[name]
prefix = 'A1-'
key = prefix + name
while hash[key]
version = prefix.match(/A(\d+)-.*/i)[1].to_i
prefix = "A#{version + 1}-"
key = prefix + name
end
name = key
end
hash[name] = numbers[i]
end
hash
end
This function produces:
{
"John"=>"1",
"A1-John"=>"2",
"A2-John"=>"3",
"David"=>"4",
"A1-David"=>"5",
"Susan"=>"6",
"Sue"=>"7"
}
Notice that there are three Johns. That is why the function needs the while loop: it keeps incrementing the prefix number until it finds a key that is not yet in the hash.
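A possible variation, not taken from the answer above: counting how many times each name has been seen gives the same numbering without parsing the prefix back out with a regex. The method name uniq_hash_keys_counted is hypothetical, and the sketch assumes no input name already looks like a generated key such as "A1-John".

```ruby
# Hypothetical variant: number the duplicates with a per-name counter.
def uniq_hash_keys_counted(names, numbers)
  seen = Hash.new(0) # how many times each name has appeared so far
  names.zip(numbers).each_with_object({}) do |(name, number), hash|
    count = seen[name]
    key = count.zero? ? name : "A#{count}-#{name}"
    seen[name] += 1
    hash[key] = number
  end
end

names = ["John", "John", "John", "David", "David", "Susan", "Sue"]
numbers = ["1", "2", "3", "4", "5", "6", "7"]

uniq_hash_keys_counted(names, numbers)
#=> {"John"=>"1", "A1-John"=>"2", "A2-John"=>"3", "David"=>"4",
#    "A1-David"=>"5", "Susan"=>"6", "Sue"=>"7"}
```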
This is one way to create the desired hash. Note that in arr1 "John" appears three times.
arr1 = ["David", "John", "Alex", "Sam", "Caleb",
"David", "John", "Alex", "John", "Sam"]
arr2 = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
prefixes =
arr1.each_with_object({}) do |s,h|
if h.key?(s)
prefix = "A-"
(h[s].size-1).times { prefix = prefix.next }
h[s] << prefix
else
h[s] = ['']
end
end
#=> {"David"=>["", "A-"], "John"=>["", "A-", "B-"],
# "Alex"=>["", "A-"], "Sam"=>["", "A-"],
# "Caleb"=>[""]}
arr1.map { |s| "#{prefixes[s].shift}#{s}" }.zip(arr2).to_h
#=> {"David"=>"1", "John"=>"2", "Alex"=>"3", "Sam"=>"4",
# "Caleb"=>"5", "A-David"=>"6", "A-John"=>"7",
# "A-Alex"=>"8", "B-John"=>"9", "A-Sam"=>"10"}
Note that "A-".next #=> "B-" and "Z-".next #=> "AA-".
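That behaviour of String#next (an alias of String#succ) is what lets the loop above step from one prefix to the next. A small sketch; the helper name nth_prefix is hypothetical:

```ruby
# Returns the prefix used for the n-th duplicate of a name (n = 1 gives "A-").
def nth_prefix(n)
  prefix = "A-"
  (n - 1).times { prefix = prefix.next }
  prefix
end

nth_prefix(1)  #=> "A-"
nth_prefix(2)  #=> "B-"
nth_prefix(27) #=> "AA-"
```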
Alternative data structure
You may wish to consider a different data structure, one that returns
{"David"=>["1", "6"], "John"=>["2", "7", "9"],
"Alex" =>["3", "8"], "Sam" =>["4", "10"], "Caleb"=>["5"]}
You could do that as follows.
arr1.each_with_index.
group_by(&:first).
transform_values { |v| arr2.values_at(*v.map(&:last)) }
#=> {"David"=>["1", "6"], "John"=>["2", "7", "9"],
# "Alex" =>["3", "8"], "Sam" =>["4", "10"],
# "Caleb"=>["5"]}
See Enumerable#each_with_index, Enumerable#group_by, Hash#transform_values (see note 1) and Array#values_at. v.map(&:last) is here the same as v.map { |arr| arr.last }.
The steps are as follows.
a = arr1.each_with_index
#=> #<Enumerator: ["David", "John", "Alex", "Sam",
# "Caleb", "David", "John", "Alex", "John", "Sam"]:
# each_with_index>
We can see the values that will be generated by this enumerator by converting it to an array.
a.to_a
#=> [["David", 0], ["John", 1], ["Alex", 2], ["Sam", 3],
# ["Caleb", 4], ["David", 5], ["John", 6], ["Alex", 7],
# ["John", 8], ["Sam", 9]]
Continuing,
b = a.group_by(&:first)
#=> {"David"=>[["David", 0], ["David", 5]],
# "John"=> [["John", 1], ["John", 6], ["John", 8]],
# "Alex"=> [["Alex", 2], ["Alex", 7]],
# "Sam"=> [["Sam", 3], ["Sam", 9]],
# "Caleb"=>[["Caleb", 4]]}
b.transform_values { |v| arr2.values_at(*v.map(&:last)) }
#=> {"David"=>["1", "6"], "John"=>["2", "7", "9"],
# "Alex"=> ["3", "8"], "Sam"=> ["4", "10"], "Caleb"=>["5"]}
For the last step, the first value of the hash b is passed to the block and assigned to the block variable v.
v = b.values.first
#=> [["David", 0], ["David", 5]]
The block calculations are then as follows.
c = v.map(&:last)
#=> [0, 5]
arr2.values_at(*c)
#=> arr2.values_at(0, 5)
#=> ["1", "6"]
The calculations are similar for each of the remaining values of b that are passed to the block.
1. New in Ruby MRI v2.4.
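For Rubies older than 2.4, or simply as a more direct route, the same grouping can be sketched with zip and a hash whose default block creates an empty array per name. This is an alternative to the pipeline above, not part of it:

```ruby
arr1 = ["David", "John", "Alex", "Sam", "Caleb",
        "David", "John", "Alex", "John", "Sam"]
arr2 = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]

# The default block stores a fresh array the first time a name is seen.
grouped = arr1.zip(arr2).each_with_object(Hash.new { |h, k| h[k] = [] }) do |(name, number), h|
  h[name] << number
end
#=> {"David"=>["1", "6"], "John"=>["2", "7", "9"],
#    "Alex"=>["3", "8"], "Sam"=>["4", "10"], "Caleb"=>["5"]}
```

Note that grouped keeps its default block, so looking up an unknown name adds an empty array for it; setting grouped.default_proc = nil afterwards switches that off.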
This code is less readable but compact and functional-style.
It is conceptually the same as rahul mishra's code: https://stackoverflow.com/a/54697573/2109121
names = %w[David John Alex Sam Caleb David John Alex Sam]
numbers = %w[1 2 3 4 5 6 7 8 9]
result = names.zip(numbers).reduce({}) { |a, (b, c)| a.merge(a.key?(b) ? "A-#{b}" : b => c) }
Using zip and each_with_object
names = %w[David John Alex Sam Caleb David John Alex Sam]
numbers = %w[1 2 3 4 5 6 7 8 9]
names.zip(numbers).each_with_object({}) do |(name, number), hash|
key = hash.key?(name) ? "A-#{name}" : name
hash[key] = number
end
Assuming I get back a string:
"27,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,12,17,17,41,17,17,17,17,17,17,17,17,17,17,17,17,17,26,26,26,26,26,26,26,26,26,29,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,40,48,28,28,28,28,28,28,28,28,28,28,28,28,28,28,29,29,29,29,29,29,29,29,29,29,29,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,34,34,34,34,34,34,36,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,40,40,40,40,40,40,40,40,41,41,41,41,41,41,41,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,43,43,43,43,43,43,43,43,43,43,43,43,43,44,44,44,44,48,49,29,41,6,30,11,29,29,36,29,29,36,29,43,1,29,29,29,1,41"
I turn that into an array by calling
str.split(',')
Then turning it into a hash by calling
arr.compact.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
I would get back a hash that looks like
{"1"=>2, "6"=>1, "39"=>23, "36"=>23, "34"=>39, "32"=>31, "30"=>18, "3"=>8, "2"=>10, "28"=>36, "29"=>21, "26"=>41, "27"=>48, "49"=>1, "44"=>4, "43"=>14, "42"=>34, "48"=>2, "40"=>9, "41"=>10, "11"=>1, "17"=>15, "12"=>1}
However, I'd like to sort that hash by key.
I've tried the solutions listed here.
I believe my problem is related to the fact that the keys are strings.
The closest I got was using
Hash[h.sort_by{|k,v| k.to_i}]
Hashes shouldn't be treated as a sorted data structure. Their advantages and use cases lie elsewhere than in returning their values sequentially. As Mladen Jablanović already pointed out, an array of tuples might be the better data structure when you need sorted key/value pairs.
But in current versions of Ruby there is in fact a defined order in which key/value pairs are returned when you call, for example, each on a hash: the order of insertion. Using this behavior, you can build a new hash and insert all key/value pairs into it in the order you want. But keep in mind that the hash will no longer be sorted once you add more entries later on.
string = "27,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,12,17,17,41,17,17,17,17,17,17,17,17,17,17,17,17,17,26,26,26,26,26,26,26,26,26,29,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,40,48,28,28,28,28,28,28,28,28,28,28,28,28,28,28,29,29,29,29,29,29,29,29,29,29,29,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,34,34,34,34,34,34,36,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,34,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,36,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,39,40,40,40,40,40,40,40,40,41,41,41,41,41,41,41,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,43,43,43,43,43,43,43,43,43,43,43,43,43,44,44,44,44,48,49,29,41,6,30,11,29,29,36,29,29,36,29,43,1,29,29,29,1,41"
sorted_number_count_tupels = string.split(',').
group_by(&:itself).
map { |k, v| [k, v.size] }.
sort_by { |(k, v)| k.to_i }
#=> [["1",2],["2",10],["3",8],["6",1],["11",1],["12",1],["17",15],["26",41],["27",48],["28",36],["29",21],["30",18],["32",31],["34",39],["36",23],["39",23],["40",9],["41",10],["42",34],["43",14],["44",4],["48",2],["49",1]]
sorted_number_count_hash = sorted_number_count_tupels.to_h
#=> { "1" => 2, "2" => 10, "3" => 8, "6" => 1, "11" => 1, "12" => 1, "17" => 15, "26" => 41, "27" => 48, "28" => 36, "29" => 21, "30" => 18, "32" => 31, "34" => 39, "36" => 23, "39" => 23, "40" => 9, "41" => 10, "42" => 34, "43" => 14, "44" => 4, "48" => 2, "49" => 1}
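The caveat about later insertions can be seen in a minimal sketch (the counts below are made up for illustration):

```ruby
counts = { "2" => 10, "11" => 1, "1" => 2 } # made-up counts
sorted = counts.sort_by { |k, _| k.to_i }.to_h
sorted.keys #=> ["1", "2", "11"]

# A key added later goes to the end, whatever its value.
sorted["3"] = 8
sorted.keys #=> ["1", "2", "11", "3"]

# Re-sorting builds a correctly ordered hash again.
sorted = sorted.sort_by { |k, _| k.to_i }.to_h
sorted.keys #=> ["1", "2", "3", "11"]
```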
Suppose you started with
str = "27,2,2,2,41,26,26,26,48,48,41,6,11,1,41"
and created the following hash
h = str.split(',').inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
#=> {"27"=>1, "2"=>3, "41"=>3, "26"=>3, "48"=>2, "6"=>1, "11"=>1, "1"=>1}
I removed compact because the array str.split(',') contains only (possibly empty) strings, no nils.
Before continuing, you may want to change this last step to
h = str.split(/\s*,\s*/).each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
#=> {"27"=>1, "2"=>3, "41"=>3, "26"=>3, "48"=>2, "6"=>1, "11"=>1, "1"=>1}
Splitting on the regex allows for the possibility of one or more spaces before or after each comma, and Enumerable#each_with_object avoids the need for that pesky ; h. (Notice the block variables are reversed.)
Then
h.sort_by { |k,_| k.to_i }.to_h
#=> {"1"=>1, "2"=>3, "6"=>1, "11"=>1, "26"=>3, "27"=>1, "41"=>3, "48"=>2}
creates a new hash that contains h's key-value pairs sorted by the integer representations of the keys. See Enumerable#sort_by, which Hash inherits.
Notice we've created two hashes. Here's a way to do that by modifying h in place.
h.keys.sort_by(&:to_i).each { |k| h[k] = h.delete(k) }
#=> ["1", "2", "6", "11", "26", "27", "41", "48"] (each always returns the receiver)
h #=> {"1"=>1, "2"=>3, "6"=>1, "11"=>1, "26"=>3, "27"=>1, "41"=>3, "48"=>2}
Lastly, another alternative is to sort str.split(',') before creating the hash.
str.split(',').sort_by(&:to_i).each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
#=> {"1"=>1, "2"=>3, "6"=>1, "11"=>1, "26"=>3, "27"=>1, "41"=>3, "48"=>2}
Notes
compact
String#split cannot return a nil element, so compact is not useful here. split might return an empty string, though:
p "1,,2,3".split(',')
# ["1", "", "2", "3"]
p "1,,2,3".split(',').compact
# ["1", "", "2", "3"]
p "1,,2,3".split(',').reject(&:empty?)
# ["1", "2", "3"]
inject
If you have to use two statements inside an inject block, each_with_object might be a better idea:
arr.compact.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
can be rewritten :
arr.compact.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
Hash or Array?
If you need to sort results, an Array of pairs might be more suitable than a Hash.
String or Integer?
If you can accept integers as keys, your code might be easier to write, because integers sort numerically without any conversion.
Refactoring
Here's one way to rewrite your code:
str.split(',')
.reject(&:empty?)
.map(&:to_i)
.group_by(&:itself)
.map { |k, v| [k, v.size] }
.sort
It outputs :
[[1, 2], [2, 10], [3, 8], [6, 1], [11, 1], [12, 1], [17, 15], [26, 41], [27, 48], [28, 36], [29, 21], [30, 18], [32, 31], [34, 39], [36, 23], [39, 23], [40, 9], [41, 10], [42, 34], [43, 14], [44, 4], [48, 2], [49, 1]]
If you really want a Hash, you can add .to_h :
{1=>2, 2=>10, 3=>8, 6=>1, 11=>1, 12=>1, 17=>15, 26=>41, 27=>48, 28=>36, 29=>21, 30=>18, 32=>31, 34=>39, 36=>23, 39=>23, 40=>9, 41=>10, 42=>34, 43=>14, 44=>4, 48=>2, 49=>1}
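If Ruby 2.7 or newer can be assumed (an assumption; the code above also works on older versions), Enumerable#tally can replace the group_by and map steps. A sketch with a shortened, made-up input string:

```ruby
str = "27,2,,2,11,1,27,27" # made-up input, including one empty field

counts = str.split(',')
            .reject(&:empty?)
            .map(&:to_i)
            .tally # counts occurrences of each integer
            .sort  # sorts the [key, count] pairs by key
            .to_h
#=> {1=>1, 2=>2, 11=>1, 27=>3}
```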
You can assign the result of arr.compact.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h } to a variable and sort its keys:
num = arr.compact.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
num.keys.sort
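Note that this returns the keys as a sorted array rather than a sorted hash, and because the keys are strings they are compared as strings, so "11" comes before "2". Use num.keys.sort_by(&:to_i) for numeric order and look each value up in num.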
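A quick check of what sorting the keys returns when they are strings (the counts are made up for illustration):

```ruby
num = { "2" => 10, "11" => 1, "1" => 2 } # made-up counts with string keys

by_string  = num.keys.sort            #=> ["1", "11", "2"]  (string comparison)
by_integer = num.keys.sort_by(&:to_i) #=> ["1", "2", "11"]  (numeric comparison)

# The hash itself is left in its original order.
num.keys #=> ["2", "11", "1"]
```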
A Ruby hash will keep the order of keys added. If the array is small enough to sort I would just change
str.split(',').
to
str.split(',').sort_by(&:to_i)
so that the values, and therefore also your hash, come out sorted.