Array insert and excessive matches in nested for loops - ruby

I don't understand why, but I'm getting too many inserts and matches generated when I nest these two loops. Any help appreciated!
pseudocode
two arrays - nested for loops
search 2nd array for match of each element in 1st array
if there is a match in 2nd array, take the number after the match
insert number in 1st array after word that has been matched
end
problem code:
ary1 = ['a','b','c','d']
ary2 = ['e','f','g', 'a']
limit = ary1.count - 1
limit2 = ary2.count - 1
(0..limit).each do |i|
(0..limit2).each do |j|
if ary1[i] == ary2[j]
ary1.insert(i,ary2[j])
puts 'match!'
end
end
end
puts ary1
output:
match!
match!
match!
match!
a
a
a
a
a
b
c
d
provisional solution:
ary1 = ['a','b','c','d']
ary2 = ['e','f','g', 'a']
# have to make a copy to avoid excessive matches
ary_dup = Array.new(ary1)
limit = ary1.count - 1
limit2 = ary2.count - 1
(0..limit).each do |i|
(0..limit2).each do |j|
if ary1[i] == ary2[j]
ary_dup.insert(i,ary2[j])
puts 'match!'
end
end
end
puts ary_dup
output:
match!
a
a
b
c
d

It's happening because you're modifying the array under examination (ary1) on the fly.
You could achieve the desired result with this line of code:
(ary1 & ary2).each {|e| ary1.insert(ary1.index(e)+1,e)}
What it does:
ary1 & ary2 returns the intersection of the two arrays, i.e. a new array containing the elements that exist in both ary1 and ary2.
.each and the block that follows traverse this new array and insert each element into ary1 at the index of the original element + 1.
puts ary1 #=> ["a", "a", "b", "c", "d"]

The part below is not correct:
(0..limit).each do |i|
(0..limit2).each do |j|
if ary1[i] == ary2[j]
ary1.insert(i,ary2[j])
puts 'match!'
end
end
end
First pass:
ary1 = ['a','b','c','d']
ary2 = ['e','f','g', 'a']
When i = 0 and j = 3 there is a match. The line ary1.insert(0, ary2[j]) turns ary1 into ['a','a','b','c','d'].
Second pass:
ary1 = ['a','a','b','c','d']
ary2 = ['e','f','g', 'a']
When i = 1 and j = 3 there is a match again, because the insert shifted the original 'a' to index 1. The line ary1.insert(1, ary2[j]) turns ary1 into ['a','a','a','b','c','d'].
And so it goes on. The outer range 0..limit was fixed at 0..3 before the loop started, so each of the four passes finds an 'a' at index i and four extra 'a's are added to ary1. In the end it becomes ['a','a','a','a','a','b','c','d'].
The documentation for Array#insert says:
Inserts the given values before the element with the given index. Negative indices count backwards from the end of the array, where -1 is the last element.
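A minimal sketch of one way to sidestep the problem described above: build a new array instead of inserting into ary1 while looping over it. The variable name result is mine, and this is only one possible reading of the desired output (every element of ary1 that also occurs in ary2 is doubled).

```ruby
ary1 = ['a', 'b', 'c', 'd']
ary2 = ['e', 'f', 'g', 'a']

# Build a new array instead of inserting into ary1 while iterating over it.
# Each element that also occurs in ary2 is emitted twice, the rest once.
result = ary1.flat_map { |e| ary2.include?(e) ? [e, e] : [e] }

p result # => ["a", "a", "b", "c", "d"]
p ary1   # => ["a", "b", "c", "d"] (unchanged)
```

Because nothing is inserted into the array being traversed, the indices never shift under the loop. Note that this can differ from the intersection one-liner if ary1 itself contains repeated elements.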

Related

Using regular expressions to multiply and sum numeric string characters contained in a hash of mixed numeric strings

Without getting too much into biology, Proteins are made of Amino Acids. Each of the 20 Amino Acids that make up Proteins are represented by characters in a sequence. Each Amino Acid char has a different chemical formula, which I represent as strings. For example, "M" has a formula of "C5H11NO2S"
Given the 20 different formulas (and the varying frequency of each amino acid chars in a protein sequence) I want to compile all 20 of them into a single formula that will yield the total formula for the protein.
So first: multiply each formula by the frequency of its char in the sequence
Second : sum together all multiplied formulas into one formula.
To accomplish this, I first tried multiplying each amino acid char frequency in the sequence by the numbers in the chemical formula. I did this using .tally
sequence ="MGAAARTLRLALGLLLLATLLRPADACSCSPVHPQQAFCNADVVIRAKAVSEKEVDSGNDIYGNPIKRIQYEIKQIKMFKGPEKDIEFI"
sequence.chars.tally # => {"M"=>2, "G"=>5, "A"=>11, "R"=>5, "T"=>2, "L"=>9, "P"=>5, "D"=>5, "C"=>3, "S"=>4, "V"=>5, "H"=>1, "Q"=>4, "F"=>3, "N"=>3, "I"=>8, "K"=>7, "E"=>5, "Y"=>2}
Then, I listed all the amino acids chars and formulas into a hash
hash_of_formulas = {"A"=>"C3H7NO2", "R"=>"C6H14N4O2", "N"=>"C4H8N2O3", "D"=>"C4H7NO4", "C"=>"C3H7NO2S", "E"=>"C5H9NO4", "Q"=>"C5H10N2O3", "G"=>"C2H5NO2", "H"=>"C6H9N3O2", "I"=>"C6H13NO2", "L"=>"C6H13NO2", "K"=>"C6H14N2O2", "M"=>"C5H11NO2S", "F"=>"C9H11NO2", "P"=>"C5H9NO2", "S"=>"C3H7NO3", "T"=>"C4H9NO3", "W"=>"C11H12N2O2", "Y"=>"C9H11NO3", "V"=>"C5H11NO2"}
An example of what the process for my overall goal is:
In the sequence, "M" occurs twice, so "C5H11NO2S" becomes "C10H22N2O4S2". "C", which has the formula "C3H7NO2S", occurs 3 times in the sequence, so "C3H7NO2S" becomes "C9H21N3O6S3".
So summing together "C10H22N2O4S2" and "C9H21N3O6S3" yields "C19H43N5O10S5".
How can I repeat the process of multiplying each formula by its frequency and then summing together all multiplied formulas?
I know that I could use regex for multiplying a formula by its frequency for an individual string using
formula_multiplied_by_frequency = "C5H11NO2S".gsub(/\d+/) { |x| x.to_i * 4}
But I'm not sure of any methods to use regex on strings embedded within hashes
If I understand correctly, you want to compute the total formula for a given protein sequence. Here's how I'd do it:
NUCLEOTIDES = {"A"=>"C3H7NO2", "R"=>"C6H14N4O2", "N"=>"C4H8N2O3", "D"=>"C4H7NO4", "C"=>"C3H7NO2S", "E"=>"C5H9NO4", "Q"=>"C5H10N2O3", "G"=>"C2H5NO2", "H"=>"C6H9N3O2", "I"=>"C6H13NO2", "L"=>"C6H13NO2", "K"=>"C6H14N2O2", "M"=>"C5H11NO2S", "F"=>"C9H11NO2", "P"=>"C5H9NO2", "S"=>"C3H7NO3", "T"=>"C4H9NO3", "W"=>"C11H12N2O2", "Y"=>"C9H11NO3", "V"=>"C5H11NO2"}
NUCLEOTIDE_COMPOSITIONS = NUCLEOTIDES.each_with_object({}) { |(nucleotide, formula), compositions|
compositions[nucleotide] = formula.scan(/([A-Z][a-z]*)(\d*)/).map { |element, count| [element, count.empty? ? 1 : count.to_i] }.to_h
}
def formula(sequence)
sequence.each_char.with_object(Hash.new(0)) { |nucleotide, final_counts|
NUCLEOTIDE_COMPOSITIONS[nucleotide].each { |element, element_count|
final_counts[element] += element_count
}
}.map { |element, element_count|
# Omit the count when it is 1, following the usual formula notation
"#{element}#{element_count == 1 ? "" : element_count}"
}.join
end
sequence = "MGAAARTLRLALGLLLLATLLRPADACSCSPVHPQQAFCNADVVIRAKAVSEKEVDSGNDIYGNPIKRIQYEIKQIKMFKGPEKDIEFI"
p formula(sequence)
# => "C434H888N120O213S5"
You can't use regexp to multiply things. You can use it to parse a formula, but then it's on you and regular Ruby to do the math. The first job is to prepare a composition lookup by breaking down each nucleotide formula. Once we have a composition hash for each nucleotide, we can iterate over a nucleotide sequence, and add up all the elements of each nucleotide.
BTW, tally is not particularly useful here, since tally will need to iterate over the sequence, and then you have to iterate over tally anyway — and there is no aggregate operation going on that can't be done going over each letter independently.
EDIT: I probably made the regexp slightly more complicated than it needs to be, but it should parse stuff like CuSO4 correctly. I don't know if it's an accident or not that all nucleotides are only composed of elements with a single-character symbol... :P
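For the multiplication step the question asks about, here is a hedged sketch rather than part of the answer above. The helper name multiply_formula is made up for illustration. It uses the same element/count pattern as the answer and treats a missing count as 1, which a plain gsub(/\d+/) would miss for atoms like the "N" and "S" in "C5H11NO2S".

```ruby
# Multiply every element count in a formula by n.
# An element written without a count is treated as having a count of 1.
def multiply_formula(formula, n)
  formula.gsub(/([A-Z][a-z]*)(\d*)/) do
    element, count = $1, $2
    "#{element}#{(count.empty? ? 1 : count.to_i) * n}"
  end
end

p multiply_formula("C5H11NO2S", 2) # => "C10H22N2O4S2"
p multiply_formula("C3H7NO2S", 3)  # => "C9H21N3O6S3"
```

The regex only parses the string; the arithmetic happens in ordinary Ruby inside the block, which matches the point made above.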
Givens
We are given a string representing a protein comprised of amino acids:
sequence = "MGAAARTLRLALGLLLLATLLRPADACSCSPVHPQQAFCNADVVIR" +
"AKAVSEKEVDSGNDIYGNPIKRIQYEIKQIKMFKGPEKDIEFI"
and a hash that contains the formulas of amino acids:
formulas = {
"A"=>"C3H7NO2", "R"=>"C6H14N4O2", "N"=>"C4H8N2O3", "D"=>"C4H7NO4",
"C"=>"C3H7NO2S", "E"=>"C5H9NO4", "Q"=>"C5H10N2O3", "G"=>"C2H5NO2",
"H"=>"C6H9N3O2", "I"=>"C6H13NO2", "L"=>"C6H13NO2", "K"=>"C6H14N2O2",
"M"=>"C5H11NO2S", "F"=>"C9H11NO2", "P"=>"C5H9NO2", "S"=>"C3H7NO3",
"T"=>"C4H9NO3", "W"=>"C11H12N2O2", "Y"=>"C9H11NO3", "V"=>"C5H11NO2"
}
Obtain counts of atoms in each amino acid
As a first step we can calculate the numbers of each atom in each amino acid:
counts = formulas.transform_values do |s|
s.scan(/[CHNOS]\d*/).
each_with_object({}) do |s,h|
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
end
end
#=> {"A"=>{"C"=>3, "H"=>7, "N"=>1, "O"=>2},
# "R"=>{"C"=>6, "H"=>14, "N"=>4, "O"=>2},
# ...
# "M"=>{"C"=>5, "H"=>11, "N"=>1, "O"=>2, "S"=>1}
# ...
# "V"=>{"C"=>5, "H"=>11, "N"=>1, "O"=>2}}
Compute formula for protein
Then it's simply:
def protein_formula(sequence, counts)
sequence.each_char.
with_object("C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0) do |c,h|
counts[c].each { |aa,cnt| h[aa] += cnt }
end.each_with_object('') { |(aa,nbr),s| s << "#{aa}#{nbr}" }
end
protein_formula(sequence, counts)
#=> "C434H888N120O213S5"
Another example:
protein_formula("MCMPCFTTDHQMARKCDDCCGGKGRGKCYGPQCLCR", counts)
#=> "C158H326N52O83S11"
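If you would rather start from the tally in the question, the same totals can be reached by multiplying each amino acid's atom counts by its frequency. This is only a sketch: protein_formula_from_tally is a hypothetical name, and the formulas hash is cut down to two entries so the example reproduces the "M" twice, "C" three times case from the question.

```ruby
# Only two entries of the formulas hash, to keep the sketch short.
formulas = { "M" => "C5H11NO2S", "C" => "C3H7NO2S" }

# Same idea as the counts calculation above: atom symbol => number of atoms.
counts = formulas.transform_values do |formula|
  formula.scan(/[CHNOS]\d*/).each_with_object({}) do |token, h|
    h[token[0]] = token.size == 1 ? 1 : token[1..].to_i
  end
end

def protein_formula_from_tally(sequence, counts)
  totals = Hash.new(0)
  # tally gives the frequency of each amino acid, e.g. {"M"=>2, "C"=>3}
  sequence.each_char.tally.each do |amino_acid, frequency|
    counts.fetch(amino_acid).each { |atom, n| totals[atom] += n * frequency }
  end
  totals.map { |atom, n| "#{atom}#{n}" }.join
end

p protein_formula_from_tally("MMCCC", counts) # => "C19H43N5O10S5"
```

This needs Ruby 2.7 or later for tally. Whether it is clearer than walking the sequence character by character is a matter of taste; the result should be the same.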
Explanation of calculation of counts
This calculation:
counts = formulas.transform_values do |s|
s.scan(/[CHNOS]\d*/).each_with_object({}) do |s,h|
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
end
end
uses the method Hash#transform_values. It returns a hash having the same keys as the hash formulas, with the values of those keys in formulas modified by transform_values's block. For example, formulas["A"] ("C3H7NO2") is "transformed" to the hash {"C"=>3, "H"=>7, "N"=>1, "O"=>2} in the hash that is returned, counts.
transform_values passes each value of formulas to the block and sets the block variable equal to it. The first value passed is "C3H7NO2", so it sets:
s = "C3H7NO2"
We can write the block calculation more simply:
h = {}
s.scan(/[CHNOS]\d*/).each do |s|
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
end
h
(Once you understand this calculation, which I explain below, see Enumerable#each_with_object to understand why I used that method in my solution.)
After initializing h to an empty hash, the following calculations are performed:
h = {}
a = s.scan(/[CHNOS]\d*/)
#=> ["C3", "H7", "N", "O2"]
a is computed using String#scan with the regular expression /[CHNOS]\d*/. That regular expression, or regex, matches exactly one character in the character class [CHNOS] followed by zero or more (*) digits (\d). It therefore separates the string "C3H7NO2" into the substrings that are returned in the array shown under the calculation of a above. Continuing,
a.each do |s|
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
end
changes h to the following:
h #=> {"C"=>3, "H"=>7, "N"=>1, "O"=>2}
The block variable s is initially set equal to the first element of a that is passed to each's block:
s = "C3"
then we compute:
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
h["C"]  = 2 == 1 ? 1 : "3".to_i
        = false ? 1 : 3
        = 3
This is repeated for each element of a.
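As a variation on the walk-through above, here is a sketch that lets scan do the splitting with two capture groups, so that each match arrives as an [element, count] pair. The method name atom_counts is hypothetical, and the pattern is the one from the other answer, so it also copes with two-letter symbols.

```ruby
# Parse a formula into a hash of element symbol => atom count.
# With capture groups, scan returns [element, digits] pairs; empty digits mean 1.
def atom_counts(formula)
  formula.scan(/([A-Z][a-z]*)(\d*)/).to_h do |element, count|
    [element, count.empty? ? 1 : count.to_i]
  end
end

"C3H7NO2".scan(/[CHNOS]\d*/)       #=> ["C3", "H7", "N", "O2"]
"C3H7NO2".scan(/([A-Z][a-z]*)(\d*)/) #=> [["C", "3"], ["H", "7"], ["N", ""], ["O", "2"]]
atom_counts("C3H7NO2")             #=> {"C"=>3, "H"=>7, "N"=>1, "O"=>2}
atom_counts("CuSO4")               #=> {"Cu"=>1, "S"=>1, "O"=>4}
```

Array#to_h with a block needs Ruby 2.6 or later; on older versions map followed by to_h does the same job.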
Explanation of construction of the formula for the protein
We can simplify the following code (see note 1):
sequence.each_char.with_object("C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0) do |c,h|
counts[c].each { |aa,cnt| h[aa] += cnt }
end.each_with_object('') { |(aa,nbr),s| s << "#{aa}#{nbr}" }
to more or less the following:
h = { "C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0 }
ch = sequence.chars
#=> ["M", "G", "A",..., "F", "I"]
ch.each do |c|
counts[c].each { |aa,cnt| h[aa] += cnt }
end
h #=> {"C"=>434, "H"=>888, "N"=>120, "O"=>213, "S"=>5}
When the first value of ch ("M") is passed to each's block (when h = { "C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0 }), the following calculations are performed:
c = "M"
g = counts[c]
#=> {"C"=>5, "H"=>11, "N"=>1, "O"=>2, "S"=>1}
g.each { |aa,cnt| h[aa] += cnt }
h #=> {"C"=>5, "H"=>11, "N"=>1, "O"=>2, "S"=>1}
Lastly, (when h #=> {"C"=>434, "H"=>888, "N"=>120, "O"=>213, "S"=>5})
s = ''
h.each { |aa,nbr| s << "#{aa}#{nbr}" }
s #=> "C434H888N120O213S5"
When aa = "C" and nbr = 434,
"#{aa}#{nbr}"
#=> "C434"
is appended to the string s.
1. ("C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0) is shorthand for ({"C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0}).

How to exit a loop when the min of the sizes of two tables is reached

I am doing some data migration, comparing tables between a legacy and a new database. I have a loop that raises exceptions when two arrays do not have the same size.
array1.zip(array2).each do |ar1, ar2|
# some code here
end
I want to know how to exit the loop once the end of the shorter array is reached.
The loop ends when it reaches the last element of the receiver, i.e. the first array in the zip.
array1 = ['a', 'b', 'c', 'd']
array2 = ['x', 'y', 'z']
array1.zip(array2).each do |ar1, ar2|
puts "#{ar1} -- #{ar2}"
end
puts "-"*10
array2.zip(array1).each do |ar2, ar1|
puts "#{ar1} -- #{ar2}"
end
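To make the difference between the two loops visible, here is a small sketch that just inspects what zip returns in each direction (the names long_first and short_first are mine):

```ruby
array1 = ['a', 'b', 'c', 'd']
array2 = ['x', 'y', 'z']

# zip always yields one pair per element of the receiver.
long_first  = array1.zip(array2) # receiver is longer: missing values become nil
short_first = array2.zip(array1) # receiver is shorter: extra values are dropped

p long_first  # => [["a", "x"], ["b", "y"], ["c", "z"], ["d", nil]]
p short_first # => [["x", "a"], ["y", "b"], ["z", "c"]]
```

So the nil pairs only show up when the longer array is the receiver.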
You could swap the variables if the first array is bigger:
array1, array2 = array2, array1 if array1.size > array2.size
array1.zip(array2).each do |ar1, ar2|
puts "#{ar1} -- #{ar2}"
end
This only works if you just want to check whether the data are the same and don't need to keep track of where each value came from.
[array1.size, array2.size].min.times do |i|
# code here referencing array1[i] and array2[i]
end
Given arrays:
a = %w[ 1 2 3 10 ]
b = %w[ 1 4 5 1 ]
c = %w[ 5 4 3 ]
If you want to compare two arrays for length:
a.length == b.length
# => true
a.length == c.length
# => false
If you want to compare that the elements in the arrays are of the same length and that the arrays are the same length:
def equal_size_elements(a, b)
return false unless a.length == b.length
a.zip(b).all? do |_a, _b|
_a.length == _b.length
end
end
That checks that every pair of elements has the same length: if they all match the result is true, otherwise false. all? will halt iterating as soon as it finds a mismatch, because at that point they can't all pass.
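For illustration, this is how the method behaves on the three arrays given above. The method is repeated, with a shorter block, so the snippet runs on its own.

```ruby
def equal_size_elements(a, b)
  return false unless a.length == b.length
  # all? stops at the first pair whose lengths differ
  a.zip(b).all? { |x, y| x.length == y.length }
end

a = %w[ 1 2 3 10 ]
b = %w[ 1 4 5 1 ]
c = %w[ 5 4 3 ]

p equal_size_elements(a, b) # => false ("10" and "1" differ in length)
p equal_size_elements(a, c) # => false (the arrays differ in length)
p equal_size_elements(b, b) # => true
```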
I found a solution, maybe there is better one than mine, I am just a beginner in ruby :
j = 0
array1.zip(array2).each do |ar1, ar2|
# stop once every pair that exists in both arrays has been processed
break if j == [array1.size, array2.size].min
j += 1
....
end
Just a bit better than your solution:
array1.zip(array2).each_with_index do |zipped, index|
break if index == [array1.size, array2.size].min
puts zipped.first
puts zipped.last
puts
end
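Another option, sketched here under the assumption that you only care about the pairs present in both arrays: cut both arrays down to the common length before zipping, so no break is needed at all. The names n and pairs are mine.

```ruby
array1 = ['a', 'b', 'c', 'd']
array2 = ['x', 'y', 'z']

# Length of the shorter array
n = [array1.size, array2.size].min

# first(n) returns new arrays, so array1 and array2 are left untouched.
pairs = array1.first(n).zip(array2.first(n))

pairs.each { |ar1, ar2| puts "#{ar1} -- #{ar2}" }
p pairs # => [["a", "x"], ["b", "y"], ["c", "z"]]
```

This keeps the variables in their original order, unlike the swap shown earlier, at the cost of two temporary arrays.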

Using .min and max and pushing to an array

I'm working through Chris Pine's "learn to program" book and I am at the exercise in Chapter 10 where he asks you to alphabetize a list of words without using .sort. I used min/max (which he probably doesn't intend you to use either, but it's a start). This works, except when I use .min and push that value to the sorted array, the result comes out z to a, rather than a to z as I expected. When I use max (which I use in my code below just to make it work), it comes out a to z. Any idea why?
puts "Enter a list of words, separated by commas. Hit enter when done."
puts "This program will sort your words alphabetically."
word_list = gets.chomp.downcase
word_array = word_list.split(", ")
def sort_words (words)
sorted_array = [] if sorted_array.nil?
words = [] if words.nil?
until words.length == 0
first_word = words.max #method should be .min (?)
words.delete(first_word)
sorted_array.push(first_word)
sort_words(words)
end
puts sorted_array
end
sort_words(word_array)
Think of it like this.
unsorted = [1, 3, 2, 5, 4]
sorted = []
unsorted.max is 5. Delete that and push it onto sorted.
unsorted = [1, 3, 2, 4]
sorted = [5]
unsorted.max is 4. Delete that and push it onto sorted.
unsorted = [1, 3, 2]
sorted = [5, 4]
I think you can see where the mistake lies. push adds to the end of an array, so you want to build sorted from the smallest to the largest. So use unsorted.min rather than unsorted.max.
The problem with your code is you call sort_words(words) inside the loop after removing the max. This is a form of recursion. While you can write this sort routine using recursion, mixing the loop with recursion is causing your problem.
What is happening is the loop is removing the max element, then calling sort_words again with the same list less the max element. Then it does that again, and again, and again. You wind up with a stack of calls like...
call_stack                 sorted_array (local to each call)
sort_words([1,3,2,5,4])    [5]
sort_words([1,3,2,4])      [4]
sort_words([1,3,2])        [3]
sort_words([1,2])          [2]
sort_words([1])            [1]
sort_words([])             []
Since words is a reference, it isn't copied in each call: every call to sort_words is working on the same word list. Each call shrinks words by one. When words is empty all the loops exit and print their results, but the stack unwinds from the bottom first! You get what looks like
1
2
3
4
5
But if you change puts sorted_array to puts "sorted array: #{sorted_array}" you'll see what's really happening.
sorted array: []
sorted array: ["1"]
sorted array: ["2"]
sorted array: ["3"]
sorted array: ["4"]
sorted array: ["5"]
Got it, thanks. Overdid it with the recursion in the loop. Deleted the call to the method within the loop. Also converted the sorted_array to a string for output.
puts "Enter a list of words, separated by commas. Hit enter when done."
puts "This program will sort your words alphabetically."
word_list = gets.chomp.downcase
word_array = word_list.split(", ")
def sort_words (words)
sorted_array = [] if sorted_array.nil?
words = [] if words.nil?
until words.length == 0
first_word = words.min
words.delete(first_word)
sorted_array.push(first_word)
end
sorted_words = sorted_array.join(", ")
puts sorted_words
end
sort_words(word_array)
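Two things may still be worth a look in the version above: words.delete removes every copy of a repeated word, and the caller's array ends up empty. Here is a hedged sketch that works on a copy and removes one occurrence at a time; the names remaining and smallest are mine, and it returns the array instead of printing it.

```ruby
def sort_words(words)
  remaining = words.dup # leave the caller's array alone
  sorted = []
  until remaining.empty?
    smallest = remaining.min
    # delete_at removes only this one occurrence, so repeated words survive
    remaining.delete_at(remaining.index(smallest))
    sorted.push(smallest)
  end
  sorted
end

input = %w[pear apple fig apple]
p sort_words(input) # => ["apple", "apple", "fig", "pear"]
p input             # => ["pear", "apple", "fig", "apple"]
```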

Combine and sort 2 arrays

This question was asked somewhere else, but I just wanted to check if what I did was applicable given the rspec circumstances:
Write a method that takes two sorted arrays and produces the sorted array that combines both.
Restrictions:
1. Do not call sort anywhere.
2. Do not in any way modify the two arrays given to you.
3. Do not circumvent (2) by cloning or duplicating the two arrays, only to modify the copies.
Hint: you will probably need indices into the two arrays.
combine_arrays([1, 3, 5], [2, 4, 6]) == [1, 2, 3, 4, 5, 6]
Can you just combine the two arrays into a single array and then run a typical bubble sort?
def combine_arrays(arr1,arr2)
final = arr1 + arr2
sorted = true
while sorted do
sorted = false
(0..final.length - 2).each do |x|
if final[x] > final[x+1]
final[x], final[x+1] = final[x+1], final[x]
sorted = true
end
end
end
final
end
p combine_arrays([1,3,5],[2,4,6]) # => [1, 2, 3, 4, 5, 6]
Here is a variant which relies solely on Ruby's enumerators. The result is short and sweet.
# merge two sorted arrays into a sorted combined array
def merge(a1, a2)
[].tap do |combined|
e1, e2 = a1.each, a2.each
# The following three loops terminate appropriately because
# StopIteration acts as a break for Kernel#loop.
# First, keep appending the smaller element until one of
# the enumerators runs out of data
loop { combined << (e1.peek <= e2.peek ? e1 : e2).next }
# At this point, one of these enumerators is "empty" and will
# break immediately. The other appends all remaining data.
loop { combined << e1.next }
loop { combined << e2.next }
end
end
The first loop keeps grabbing the minimum of the two enumerator values until one of the enumerators runs out of values. The second loop then appends all remaining (which may be none) values from the first array's enumerator, the third loop does the same for the second array's enumerator, and tap hands back the resulting array.
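The behaviour the three loops rely on can be seen in isolation. This is just a small sketch of peek, next and StopIteration on an external enumerator; the names e, raised and seen are mine.

```ruby
e = [1, 2].each

e.peek # => 1 (looks at the next value without consuming it)
e.next # => 1
e.next # => 2

# The enumerator is now exhausted, so a bare e.next raises StopIteration.
raised = begin
  e.next
  false
rescue StopIteration
  true
end
p raised # => true

# Kernel#loop rescues StopIteration and simply ends instead of raising.
seen = []
e.rewind
loop { seen << e.next }
p seen # => [1, 2]
```

That is why an already exhausted enumerator makes its loop in merge finish immediately without adding anything.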
Sure, you can do that but you are overlooking a real gimmee - the two arrays you are given will already be sorted.
def combine_arrays(a1, a2)
  # Preallocate the result; it holds every element of both inputs
  ret_val = Array.new(a1.length + a2.length)
  i = 0
  j = 0
  while i < a1.length || j < a2.length
    # Take from a1 when a2 is used up or a1's current element is smaller
    if i < a1.length && (j >= a2.length || a1[i] < a2[j])
      ret_val[i + j] = a1[i]
      i += 1
    else
      ret_val[i + j] = a2[j]
      j += 1
    end
  end
  ret_val
end
This is based on the same logic as Dale M's post, but in proper ruby:
def combine_arrays(arr1,arr2)
[].tap do |out|
i1 = i2 = 0
while i1 < arr1.size || i2 < arr2.size
v1 = arr1[i1]
v2 = arr2[i2]
if v1 && (!v2 || v1 < v2)
out << v1
i1 += 1
else
out << v2
i2 += 1
end
end
end
end
combine_arrays([1,3,5], [2,4,6])
Take a look at this one:
def merge(arr1, arr2)
arr2.each { |n| arr1 = insert_into_place(arr1, n) }
arr1.empty? ? arr2 : arr1
end
def insert_into_place(array, number)
return [] if array.empty?
group = array.group_by { |n| n >= number }
# group_by leaves a key out when no element falls into that group,
# so default both halves to an empty array
bigger = group[true] || []
smaller = group[false] || []
smaller + [number] + bigger
end

Ruby - Return duplicates in an array using hashes, is this efficient?

I have solved the problem using normal loops and now using hashes, however I am not confident I used the hashes as well as I could have. Here is my code:
# 1-100 whats duplicated
def whats_duplicated?(array)
temp = Hash.new
output = Hash.new
# Write the input array numbers to a hash table and count them
array.each do |element|
if temp[element] >= 1
temp[element] += 1
else
temp[element] = 1
end
end
# Another hash, of only the numbers who appeared 2 or more times
temp.each do |hash, count|
if count > 1
output[hash] = count
end
end
# Return our sorted and formatted list as a string for screen
output.sort.inspect
end
### Main
# array_1 is an array 1-100 with duplicate numbers
array_1 = []
for i in 0..99
array_1[i] = i+1
end
# seed 10 random indexes which will likely be duplicates
for i in 0..9
array_1[rand(0..99)] = rand(1..100)
end
# print to screen the duplicated numbers & their count
puts whats_duplicated?(array_1)
My question is really what to improve? This is a learning exercise for myself, I am practising some of the typical brain-teasers you may get in an interview and while I can do this easily using loops, I want to learn an efficient use of hashes. I re-did the problem using hashes hoping for efficiency but looking at my code I think it isn't the best it could be. Thanks to anyone who takes an interest in this!
The easiest way to find duplicates in Ruby is to group the elements and then count how many are in each group:
def whats_duplicated?(array)
array.group_by { |x| x }.select { |_, xs| xs.length > 1 }.keys
end
whats_duplicated?([1,2,3,3,4,5,3,2])
# => [2, 3]
def whats_duplicated?(array)
array.each_with_object(Hash.new(0)) { |val, hsh| hsh[val] += 1 }.select { |k,v| v > 1 }.keys
end
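On Ruby 2.7 or later the counting step can also be left to Enumerable#tally. A minimal sketch along the same lines as the answers above; the method name duplicates_with_counts is made up, and it keeps the counts in case you want them as in the original question.

```ruby
# Return a hash of value => count for every value that occurs more than once.
def duplicates_with_counts(array)
  array.tally.select { |_, count| count > 1 }
end

dups = duplicates_with_counts([1, 2, 3, 3, 4, 5, 3, 2])
p dups.keys # => [2, 3]
dups[3]     #=> 3 (the value 3 occurs three times)
```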
I would do it this way:
def duplicates(array)
counts = Hash.new { |h,k| h[k] = 0 }
array.each do |number|
counts[number] += 1
end
counts.select { |k,v| v > 1 }.keys
end
array = [1,2,3,4,4,5,6,6,7,8,8,9]
puts duplicates(array)
# => [4,6,8]
Some comments about your code: The block if temp[element] == 1 seems not correct. I think that will fail if a number occurs three or more times in the array. You should at least fix it to:
if temp[element] # check if element exists in hash
temp[element] += 1 # if it does increment
else
temp[element] = 1 # otherwise init hash at that position with `1`
end
Furthermore I recommend not to use the for x in foo syntax. Use foo.each do |x| instead. Hint: I like to ask in interviews about the difference between both versions.
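One difference between the two forms, possibly not the only one meant here, is variable scope: for does not open a new scope, so its loop variable is still around afterwards, while a block parameter of each is local to the block. A small sketch:

```ruby
for x in [1, 2, 3]
  # nothing to do here
end
p x # => 3 (the loop variable leaks into the surrounding scope)

[1, 2, 3].each { |y| }
p defined?(y) # => nil (y only existed inside the block)
```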
