How do you compare key-value pairs in Ruby?

What is the easiest way to compare each key-value pair in a hash in Ruby with the others?
For example,
I want to sort this hash so that the three highest values come first. If several keys share the value in the third spot, then I want the one greatest key to go in that spot.
{"Aa"=>1, "DDD"=>1, "DdD"=>1, "aA"=>1, "aa"=>1, "bb"=>1, "cC"=>1, "cc"=>1, "ddd"=>3, "e"=>7}
I need the above hash to become {"e"=>7, "ddd"=>3, "aa"=>1}

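One line does the work: the last line below builds what you want. It sorts by descending count and breaks ties alphabetically by key, putting both criteria into a single sort key because sort_by is not guaranteed to be stable (so an earlier .sort cannot be relied on to keep tied keys in order).
Here is a solution:
# two sample inputs for testing; the second assignment overrides the first
m = %w(a a a a a DDD DD ddd Ddd ddd e e e cC cC cC cC b x XXX XXX XXX ZZZ ZZZ ZZZ)
m = %w(n n n KKK KKK KKK KKK KKK LLL LLL LLL kk kk kk kk kk kk kk kk)
# count how often each word occurs
m = m.inject(Hash.new(0)) { |h, n| h.update(n => h[n] + 1) }
# highest count first; equal counts ordered by key
m = m.sort_by { |k, val| [-val, k] }.to_h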
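Applied to the hash from the question and trimmed to three entries, a sort key of [-value, key] (descending value, then ascending key) gives the following minimal sketch; the variable names are illustrative. Ties are broken with plain String#<=>, under which uppercase letters sort before lowercase ones, so the third entry comes out as "Aa" rather than the "aa" the question expects; downcasing the words before counting, as in the kata discussed below, avoids that.

```ruby
counts = {"Aa"=>1, "DDD"=>1, "DdD"=>1, "aA"=>1, "aa"=>1, "bb"=>1,
          "cC"=>1, "cc"=>1, "ddd"=>3, "e"=>7}

# Sort key: negated value (largest first), then the key itself for ties.
# first(3) keeps three [key, value] pairs; to_h turns them back into a hash.
top_three = counts.sort_by { |key, value| [-value, key] }.first(3).to_h
# top_three == {"e"=>7, "ddd"=>3, "Aa"=>1}
```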

I would do it in three steps:
(1) Convert your Hash h to an Array of pairs:
kv_array = h.to_a
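(2) Sort the array according to your criterion, for example:
kv_array.sort! do |left, right|
  # implement your comparison here; this one puts higher values first and,
  # for equal values, orders by key (nonzero? turns a tie's 0 into nil)
  (right[1] <=> left[1]).nonzero? || (left[0] <=> right[0])
end
(3) Turn your sorted array into a Hash: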
h_sorted = Hash[kv_array]

As suggested by @tadman, using group_by and then sorting the relevant groups will get you what you want, although you will need to tweak it to fit your actual needs, as a lot of assumptions were made:
m = {"Aa"=>1, "DDD"=>1, "DdD"=>1, "aA"=>1, "aa"=>1, "bb"=>1, "cC"=>1, "cc"=>1, "ddd"=>3, "e"=>7}
m.group_by { |k, v| v }
 .map { |value, pairs| [pairs.map(&:first).min, value] }
 .sort_by { |key, value| -value }
 .to_h
=> {"e"=>7, "ddd"=>3, "Aa"=>1}
Explanation:
First we group by the value, which returns a hash with each value as a key and an array of the matching key-value pairs as its value.
Then we map over the grouped hash and pick the "first" key for each value (using min; *this is an assumption*), returning the whole thing as an array of [key, value] pairs.
Then we sort the new array by value, highest first.
Then we convert the array to a hash.
I added in newlines to aid readability (hopefully).
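Picking one key per value means that, when several keys share one of the top values, you can end up with fewer than three entries: for {"x"=>5, "y"=>5, "z"=>5, "w"=>1}, the min-per-group approach yields only "x" and "w". A variant that keeps every key of each group, sorts the keys inside each group, and only then takes the first n pairs might look like this; top_n_by_groups is a made-up name for illustration.

```ruby
def top_n_by_groups(counts, n)
  # value => [[key, value], ...], in order of first appearance
  groups = counts.group_by { |_key, value| value }
  # groups with the highest value first
  ordered = groups.sort_by { |value, _pairs| -value }
  # every key of every group, keys ascending within a group
  pairs = ordered.flat_map { |_value, group_pairs| group_pairs.sort }
  pairs.first(n).to_h
end

top_n_by_groups({"x"=>5, "y"=>5, "z"=>5, "w"=>1}, 3)
# returns {"x"=>5, "y"=>5, "z"=>5}
```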

I am so excited to get so many great suggestions! Thank you! This entire problem was to take a string and output the top three occurring words in an array. But, no special characters were allowed unless they were an apostrophe that was within the word. If there were multiple values and they were in the top three you had to pick the one that came closest to "a". In the beginning, I used the tally method to add everything up really quickly, and my plan was then to sort the hash by the value, but then when the values were the same I couldn't put the right key-value pair where it had to be if they shared the same value.
So, I came here and asked about sorting a hash, and then realized I needed to scratch my entire approach altogether. In the end, I found that I could sort the hash to an extent, but not to the place I needed/wanted so this is what I came up with!
def top_3_words(str)
str.scan(/[\w]+'?[\w]*/).sort.slice_when{|a, b| a != b}.max(3){|a, b| a.size <=> b.size}.flatten.uniq
end
p top_3_words("a a a b c c d d d d e e e e e") == ["e", "d", "a"]
p top_3_words("e e e e DDD ddd DdD: ddd ddd aa aA Aa, bb cc cC e e e") == ["e", "ddd", "aa"]
p top_3_words(" //wont won't won't ") == ["won't", "wont"]
p top_3_words(" , e .. ") == ["e"]
p top_3_words(" ... ") == []
p top_3_words(" ' ") == []
p top_3_words(" ''' ") == []
p top_3_words("""In a village of La Mancha, the name of which I have no desire to call to
mind, there lived not long since one of those gentlemen that keep a lance
in the lance-rack, an old buckler, a lean hack, and a greyhound for
coursing. An olla of rather more beef than mutton, a salad on most
nights, scraps on Saturdays, lentils on Fridays, and a pigeon or so extra
on Sundays, made away with three-quarters of his income.""") == ["a", "of", "on"]
My thought process was to call scan on str so I could strip everything I didn't want out of the words. Then I called sort on that result, because the last test case was really big and I needed all the identical words next to each other. After that, I called slice_when on that result, slicing whenever a doesn't equal b, which gave me a nested array (one sub-array per distinct word) that I could then call max on. Because I had sorted earlier, the groups were in alphabetical order, so if there was a shared count it would give me the right one. I passed 3 to max to get the top three, then called flatten to get a single array, and uniq to remove the duplicate words!!
This is all so different from my original question, but I thought I would share what I was working on in case it could ever help anyone in the future!!
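The kata's tests (for example "aa aA Aa" counting as three occurrences of "aa") suggest that words are compared case-insensitively, which the scan-and-slice version above does not do; Enumerable#max(n) also does not document which of several equally sized groups it returns. A sketch that downcases first and counts with Enumerable#tally (Ruby 2.7 or later), reusing the same word pattern, might look like this:

```ruby
def top_3_words(str)
  # Lowercase first so "aa", "aA" and "Aa" count as one word; the pattern is
  # the same as in the answer above, minus the redundant brackets.
  words = str.downcase.scan(/\w+'?\w*/)
  # Highest count first, ties broken alphabetically, keep at most three words.
  words.tally
       .sort_by { |word, count| [-count, word] }
       .first(3)
       .map(&:first)
end

p top_3_words("e e e e DDD ddd DdD: ddd ddd aa aA Aa, bb cc cC e e e")
# prints ["e", "ddd", "aa"]
```

Putting the tie-breaker into the sort key makes the result independent of how tied groups happen to be ordered.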

Related

Codewars - upper and lowercase letters are considered the same character - Ruby 2.5

So I'm on this kata:
def first_non_repeating_letter(s)
  a = s.chars
  a.select! { |char| a.count(char) == 1 }
  if a.empty?
    ""
  else
    a.first
  end
end
And the only thing I'm missing is:
"As an added challenge, upper- and lowercase letters are considered the same character, but the function should return the correct case for the initial letter. For example, the input 'sTreSS' should return 'T'."
Simply using s.downcase.chars doesn't work here, since it loses the original case. I tried .casecmp but remained unsuccessful. Should I use a regex?
If the given string is s, the computational complexity will clearly be at least O(s.size), since we need to examine the entire string to confirm that a given character appears exactly once in it. We may therefore look for a method with that same computational complexity, preferably one that employs relatively efficient Ruby built-in methods and is easy to understand and test.
Suppose the given string is as follows:
s = "TgHEtGgh"
The first character that appears only once, assuming case is not considered, is seen to be "E".
As a first step we may wish to compute the frequency of each character in the string, with lowercase and uppercase characters treated alike [1]:
sdn = s.downcase
#=> "tghetggh"
enum = sdn.each_char
#=> #<Enumerator: "tghetggh":each_char>
h = enum.tally
#=> {"t"=>2, "g"=>3, "h"=>2, "e"=>1}
This uses the methods String#downcase, String#each_char and Enumerable#tally. Each of these methods has a computational complexity of O(s.size), so the calculation of h has the same complexity. As a bonus, each of these methods is implemented in C.
We may, of course, chain these methods:
h = s.downcase.each_char.tally
#=> {"t"=>2, "g"=>3, "h"=>2, "e"=>1}
We may now simply step through the characters of the string s until a character c is found for which h[c.downcase] == 1 is true.
s.each_char.find { |c| h[c.downcase] == 1 }
#=> "E"
See Enumerable#find.
For this last step to have a computational complexity of O(s.size), each lookup h[c.downcase] must be O(1). Hash lookups are O(1) on average (though not in the worst case), so from a practical standpoint we may treat them as constant time.
[1] Note that we could have obtained the same result by writing arr = sdn.chars #=> ["t", "g", "h", "e", "t", "g", "g", "h"] and then h = arr.tally. This has the disadvantage that, unlike String#each_char, String#chars creates a temporary array, consuming memory, though in this case the memory saved by using each_char may be minimal.
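The question mentions Ruby 2.5, and Enumerable#tally was only added in Ruby 2.7. Combining the two steps above into the kata's method, with each_with_object(Hash.new(0)) standing in for tally so that it also runs on 2.5, might look like this sketch; returning "" when there is no such letter follows the question's own code.

```ruby
def first_non_repeating_letter(s)
  # Case-insensitive frequency of every character (tally-free, for Ruby 2.5)
  counts = s.downcase.each_char.each_with_object(Hash.new(0)) { |c, h| h[c] += 1 }
  # First character, in its original case, whose case-folded count is 1
  s.each_char.find { |c| counts[c.downcase] == 1 } || ""
end

first_non_repeating_letter("sTreSS")    # "T"
first_non_repeating_letter("TgHEtGgh")  # "E"
first_non_repeating_letter("aAbB")      # ""
```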
It is kind of tricky with your implementation, because there is no explicit comparison operation; instead, you use a trick with Array#count.
The thing is, this implementation is not only inflexible, it is also very inefficient. It's O(n^2) (because for each element traversed by select you call count on the whole array), so for large enough inputs your implementation will be extremely slow.
The good news is that by addressing the performance issue you will also be able to implement the added challenge easily: because the comparison operations become explicit, you can downcase only what needs to be downcased without affecting the input itself.
Let's think about the generic solution. What is a "non-repeating" character? It is a character whose first and last occurrences in a string are at the same position. So, if we iterate over the string and build an auxiliary data structure that a) keeps these first/last occurrences and b) allows constant-time lookup, we can solve the task in linear time.
Let's go then:
def first_non_repeating_letter(str)
  # We'd like to have a hash where keys are chars (downcased)
  # and values are arrays of [<first occurrence index>, <last occurrence index>]
  positions = {}
  str.each_char.with_index do |c, i|
    key = c.downcase
    if positions.key?(key)
      # We've seen it before, so all we need to do is update its last occurrence position
      positions[key][1] = i
    else
      # This is the first time we meet the char `c`, so we make its first
      # and last occurrence positions the same (i)
      positions[key] = [i, i]
    end
  end
  # At this point, for the given string 'sTreSS' (for example) we would have built
  # positions = {"s"=>[0, 5], "t"=>[1, 1], "r"=>[2, 2], "e"=>[3, 3]}
  # Now let's finally do the main job
  str.chars.find { |c| positions[c.downcase][0] == positions[c.downcase][1] } || ""
end
pry(main)> first_non_repeating_letter('sTreSS')
=> "T"
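For comparison, the first/last-occurrence definition can also be checked directly with String#index and String#rindex. This is quadratic again (each call scans the string), so it only illustrates the idea; the method name is made up, and the sketch assumes downcase does not change the string's length, which holds for ASCII input.

```ruby
def first_non_repeating_letter_naive(str)
  folded = str.downcase
  # A character is non-repeating when its first and last positions coincide.
  i = (0...str.size).find { |k| folded.index(folded[k]) == folded.rindex(folded[k]) }
  # Return the character in its original case, or "" if none qualifies.
  i ? str[i] : ""
end

first_non_repeating_letter_naive("sTreSS")  # "T"
```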

Using regular expressions to multiply and sum numeric string characters contained in a hash of mixed numeric strings

Without getting too much into biology: proteins are made of amino acids. Each of the 20 amino acids that make up proteins is represented by a character in a sequence. Each amino acid char has a different chemical formula, which I represent as strings. For example, "M" has a formula of "C5H11NO2S".
Given the 20 different formulas (and the varying frequency of each amino acid char in a protein sequence), I want to combine all 20 of them into a single formula that gives the total formula for the protein.
So, first: multiply each formula by the frequency of its char in the sequence.
Second: sum together all multiplied formulas into one formula.
To accomplish this, I first tried multiplying each amino acid char's frequency in the sequence by the numbers in its chemical formula. I got the frequencies using .tally:
sequence = "MGAAARTLRLALGLLLLATLLRPADACSCSPVHPQQAFCNADVVIRAKAVSEKEVDSGNDIYGNPIKRIQYEIKQIKMFKGPEKDIEFI"
sequence.chars.tally #=> {"M"=>2, "G"=>5, "A"=>11, "R"=>5, "T"=>2, "L"=>9, "P"=>5, "D"=>5, "C"=>3, "S"=>4, "V"=>5, "H"=>1, "Q"=>4, "F"=>3, "N"=>3, "I"=>8, "K"=>7, "E"=>5, "Y"=>2}
Then, I listed all the amino acids chars and formulas into a hash
hash_of_formulas = {"A"=>"C3H7NO2", "R"=>"C6H14N4O2", "N"=>"C4H8N2O3", "D"=>"C4H7NO4", "C"=>"C3H7NO2S", "E"=>"C5H9NO4", "Q"=>"C5H10N2O3", "G"=>"C2H5NO2", "H"=>"C6H9N3O2", "I"=>"C6H13NO2", "L"=>"C6H13NO2", "K"=>"C6H14N2O2", "M"=>"C5H11NO2S", "F"=>"C9H11NO2", "P"=>"C5H9NO2", "S"=>"C3H7NO3", "T"=>"C4H9NO3", "W"=>"C11H12N2O2", "Y"=>"C9H11NO3", "V"=>"C5H11NO2"}
An example of what the process for my overall goal is:
In the sequence, "M" occurs twice, so "C5H11NO2S" becomes "C10H22N2O4S2". "C", which has the formula "C3H7NO2S", occurs 3 times in the sequence, so "C3H7NO2S" becomes "C9H21N3O6S3".
So, summing together "C10H22N2O4S2" and "C9H21N3O6S3" yields "C19H43N5O10S5".
How can I repeat the process of multiplying each formula by its frequency and then summing together all multiplied formulas?
I know that I could use regex for multiplying a formula by its frequency for an individual string using
formula_multiplied_by_frequency = "C5H11NO2S".gsub(/\d+/) { |x| x.to_i * 4}
But I'm not sure of any methods to use regex on strings embedded within hashes
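The per-string gsub above can be extended to a whole hash by mapping over the frequencies. Note that /\d+/ alone misses atoms written without a count (the N and S in "C5H11NO2S"), so this sketch captures an element symbol plus an optional count instead. multiply_formula is a hypothetical helper, and only two entries of the question's hash are used. Summing the multiplied formulas still means parsing them back into numbers, which the answers below do directly.

```ruby
# Multiply every atom count in a formula string by n. A symbol with no
# digits after it (the "N" and "S" in "C5H11NO2S") stands for 1 atom.
def multiply_formula(formula, n)
  formula.gsub(/([A-Z][a-z]?)(\d*)/) do
    element, count = $1, $2
    "#{element}#{(count.empty? ? 1 : count.to_i) * n}"
  end
end

formulas = {"M"=>"C5H11NO2S", "C"=>"C3H7NO2S"}  # two entries from the question's hash
frequencies = {"M"=>2, "C"=>3}                   # e.g. taken from sequence.chars.tally

multiplied = frequencies.map { |aa, n| [aa, multiply_formula(formulas[aa], n)] }.to_h
# multiplied == {"M"=>"C10H22N2O4S2", "C"=>"C9H21N3O6S3"}
```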
If I understand correctly, you want to compute the total formula for a given protein sequence. Here's how I'd do it:
NUCLEOTIDES = {"A"=>"C3H7NO2", "R"=>"C6H14N4O2", "N"=>"C4H8N2O3", "D"=>"C4H7NO4", "C"=>"C3H7NO2S", "E"=>"C5H9NO4", "Q"=>"C5H10N2O3", "G"=>"C2H5NO2", "H"=>"C6H9N3O2", "I"=>"C6H13NO2", "L"=>"C6H13NO2", "K"=>"C6H14N2O2", "M"=>"C5H11NO2S", "F"=>"C9H11NO2", "P"=>"C5H9NO2", "S"=>"C3H7NO3", "T"=>"C4H9NO3", "W"=>"C11H12N2O2", "Y"=>"C9H11NO3", "V"=>"C5H11NO2"}
NUCLEOTIDE_COMPOSITIONS = NUCLEOTIDES.each_with_object({}) { |(nucleotide, formula), compositions|
  compositions[nucleotide] = formula.scan(/([A-Z][a-z]*)(\d*)/).map { |element, count| [element, count.empty? ? 1 : count.to_i] }.to_h
}
def formula(sequence)
  sequence.each_char.with_object(Hash.new(0)) { |nucleotide, final_counts|
    NUCLEOTIDE_COMPOSITIONS[nucleotide].each { |element, element_count|
      final_counts[element] += element_count
    }
  }.map { |element, element_count|
    # a count of 1 is conventionally left out of a chemical formula
    "#{element}#{element_count == 1 ? "" : element_count}"
  }.join
end
sequence = "MGAAARTLRLALGLLLLATLLRPADACSCSPVHPQQAFCNADVVIRAKAVSEKEVDSGNDIYGNPIKRIQYEIKQIKMFKGPEKDIEFI"
p formula(sequence)
# => "C434H888N120O213S5"
You can't use regexp to multiply things. You can use it to parse a formula, but then it's on you and regular Ruby to do the math. The first job is to prepare a composition lookup by breaking down each nucleotide formula. Once we have a composition hash for each nucleotide, we can iterate over a nucleotide sequence, and add up all the elements of each nucleotide.
BTW, tally is not particularly useful here, since tally will need to iterate over the sequence, and then you have to iterate over tally anyway — and there is no aggregate operation going on that can't be done going over each letter independently.
EDIT: I probably made the regexp slightly more complicated than it needs to be, but it should parse formulas like CuSO4 correctly. I don't know whether it's an accident that all the amino acids here (called nucleotides in the code above) are composed only of elements with single-character symbols... :P
Givens
We are given a string representing a protein comprised of amino acids:
sequence = "MGAAARTLRLALGLLLLATLLRPADACSCSPVHPQQAFCNADVVIR" +
"AKAVSEKEVDSGNDIYGNPIKRIQYEIKQIKMFKGPEKDIEFI"
and a hash that contains the formulas of amino acids:
formulas = {
"A"=>"C3H7NO2", "R"=>"C6H14N4O2", "N"=>"C4H8N2O3", "D"=>"C4H7NO4",
"C"=>"C3H7NO2S", "E"=>"C5H9NO4", "Q"=>"C5H10N2O3", "G"=>"C2H5NO2",
"H"=>"C6H9N3O2", "I"=>"C6H13NO2", "L"=>"C6H13NO2", "K"=>"C6H14N2O2",
"M"=>"C5H11NO2S", "F"=>"C9H11NO2", "P"=>"C5H9NO2", "S"=>"C3H7NO3",
"T"=>"C4H9NO3", "W"=>"C11H12N2O2", "Y"=>"C9H11NO3", "V"=>"C5H11NO2"
}
Obtain counts of atoms in each amino acid
As a first step we can calculate the numbers of each atom in each amino acid:
counts = formulas.transform_values do |s|
s.scan(/[CHNOS]\d*/).
each_with_object({}) do |s,h|
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
end
end
#=> {"A"=>{"C"=>3, "H"=>7, "N"=>1, "O"=>2},
# "R"=>{"C"=>6, "H"=>14, "N"=>4, "O"=>2},
# ...
# "M"=>{"C"=>5, "H"=>11, "N"=>1, "O"=>2, "S"=>1}
# ...
# "V"=>{"C"=>5, "H"=>11, "N"=>1, "O"=>2}}
Compute formula for protein
Then it's simply:
def protein_formula(sequence, counts)
sequence.each_char.
with_object("C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0) do |c,h|
counts[c].each { |aa,cnt| h[aa] += cnt }
end.each_with_object('') { |(aa,nbr),s| s << "#{aa}#{nbr}" }
end
protein_formula(sequence, counts)
#=> "C434H888N120O213S5"
Another example:
protein_formula("MCMPCFTTDHQMARKCDDCCGGKGRGKCYGPQCLCR", counts)
#=> "C158H326N52O83S11"
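The pattern /[CHNOS]\d*/ only recognizes those five single-letter elements, and h[s[0]] = ... overwrites rather than adds if an element appears twice in one formula (fine for the formulas given, which list each element once). If either case could come up, a more general parser might accept two-letter symbols and accumulate into a Hash.new(0). This is a sketch: atom_counts is a hypothetical name, and "CH3COOH" and "C3H7NO2Se" are just test strings.

```ruby
# Count atoms in a formula string: an uppercase letter plus an optional
# lowercase letter is an element symbol, and a missing number means 1.
def atom_counts(formula)
  formula.scan(/([A-Z][a-z]?)(\d*)/).each_with_object(Hash.new(0)) do |(element, digits), h|
    h[element] += digits.empty? ? 1 : digits.to_i   # += so repeated symbols add up
  end
end

atom_counts("CH3COOH")    # C appears twice: {"C"=>2, "H"=>4, "O"=>2}
atom_counts("C3H7NO2Se")  # two-letter symbol: {"C"=>3, "H"=>7, "N"=>1, "O"=>2, "Se"=>1}
```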
Explanation of calculation of counts
This calculation:
counts = formulas.transform_values do |s|
s.scan(/[CHNOS]\d*/).each_with_object({}) do |s,h|
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
end
end
uses the method Hash#transform_values. It returns a hash with the same keys as the hash formulas, with the value of each key in formulas replaced by the result of transform_values's block. For example, formulas["A"] ("C3H7NO2") is "transformed" to the hash {"C"=>3, "H"=>7, "N"=>1, "O"=>2} in the returned hash, counts.
transform_values passes each value of formulas to the block and sets the block variable equal to it. The first value passed is "C3H7NO2", so it sets:
s = "C3H7NO2"
We can write the block calculation more simply:
h = {}
s.scan(/[CHNOS]\d*/).each do |s|
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
end
h
(Once you understand this calculation, which I explain below, see Enumerable#each_with_object to understand why I used that method in my solution.)
After initializing h to an empty hash, the following calculations are performed:
h = {}
a = s.scan(/[CHNOS]\d*/)
#=> ["C3", "H7", "N", "O2"]
a is computed using String#scan with the regular expression /[CHNOS]\d*/. That regular expression, or regex, matches exactly one character in the character class [CHNOS] followed by zero or more (*) digits (\d). It therefore splits the string "C3H7NO2" into the substrings returned in the array shown under the calculation of a above. Continuing,
a.each do |s|
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
end
changes h to the following:
h #=> {"C"=>3, "H"=>7, "N"=>1, "O"=>2}
The block variable s is initially set equal to the first element of a that is passed to each's block:
s = "C3"
then we compute:
h[s[0]] = s.size == 1 ? 1 : s[1..-1].to_i
h["C"] = 2 == 1 ? 1 : "3".to_i
       = false ? 1 : 3
       = 3
This is repeated for each element of a.
Explanation of construction of formula for the protein
We can simplify the following code [1]:
sequence.each_char.with_object("C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0) do |c,h|
  counts[c].each { |aa,cnt| h[aa] += cnt }
end.each_with_object('') { |(aa,nbr),s| s << "#{aa}#{nbr}" }
to more or less the following:
h = { "C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0 }
ch = sequence.chars
#=> ["M", "G", "A",..., "F", "I"]
ch.each do |c|
counts[c].each { |aa,cnt| h[aa] += cnt }
end
h #=> {"C"=>434, "H"=>888, "N"=>120, "O"=>213, "S"=>5}
When the first value of ch ("M") is passed to each's block (when h = { "C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0 }), the following calculations are performed:
c = "M"
g = counts[c]
  #=> {"C"=>5, "H"=>11, "N"=>1, "O"=>2, "S"=>1}
g.each { |aa,cnt| h[aa] += cnt }
h #=> {"C"=>5, "H"=>11, "N"=>1, "O"=>2, "S"=>1}
Lastly, when h #=> {"C"=>434, "H"=>888, "N"=>120, "O"=>213, "S"=>5}:
s = ''
h.each { |aa,nbr| s << "#{aa}#{nbr}" }
s #=> "C434H888N120O213S5"
When aa = "C" and nbr = 434,
"#{aa}#{nbr}"
#=> "C434"
is appended to the string s.
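[1] ("C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0) is shorthand for ({"C"=>0, "H"=>0, "N"=>0, "O"=>0, "S"=>0}): the braces around a hash passed as the last argument of a method call may be omitted.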

Using a pair of values as a key

Very frequently I've had the need to hash a pair of values. Often, I just generate a range between num1 and num2 and hash that as a key, but that's pretty slow because the distance between those two numbers can be quite large.
How can one go about hashing a pair of values into a table? For example, say I'm iterating through an array and want to hash every possible pair of values into a hash table, where the key is the pair of nums and the value is their sum. What's an efficient way to do this? I've also thought about using an array as the key, but that doesn't work.
Also, how would one go about extending this to 3,4, or 5 numbers?
EDIT:
I'm referring to hashing for O(1) lookup in a hashtable.
Just do it.
You can simply hash on the array...
Verification
Let me show a little experiment:
array = [ [1,2], [3,4], ["a", "b"], ["c", 5] ]
hash = {}
array.each do |e|
e2 = e.clone
e << "dummy"
e2 << "dummy"
hash[e] = (hash[e] || 0) + 1
hash[e2] = (hash[e2] || 0) + 1
puts "e == e2: #{(e==e2).inspect}, e.id = #{e.object_id}, e.hash = #{e.hash}, e2.id = #{e2.object_id}, e2.hash = #{e2.hash}"
end
puts hash.inspect
As you can see, I take a few arrays, clone them, and modify them separately; after this, we are sure that e and e2 are different arrays (i.e. different object IDs), but they contain the same elements. The two different arrays are then used as hash keys, and since they have the same content, they are hashed together.
e == e2: true, e.id = 19797864, e.hash = -769884714, e2.id = 19797756, e2.hash = -769884714
e == e2: true, e.id = 19797852, e.hash = -642596098, e2.id = 19797588, e2.hash = -642596098
e == e2: true, e.id = 19797816, e.hash = 104945655, e2.id = 19797468, e2.hash = 104945655
e == e2: true, e.id = 19797792, e.hash = -804444135, e2.id = 19797348, e2.hash = -804444135
{[1, 2, "dummy"]=>2, [3, 4, "dummy"]=>2, ["a", "b", "dummy"]=>2, ["c", 5, "dummy"]=>2}
As you can see, not only can you use arrays as keys, Ruby also treats arrays with equal contents as the same key (rather than relying on object identity, which it could also have done).
Caveat
Obviously this works only to a point. The contents of the arrays must recursively be well-defined with regard to hashing, i.e. you can use sane things like strings, numbers, other arrays, even nil in there.
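One more caveat, documented for Hash: if you mutate an array after using it as a key, the hash is not updated automatically, and lookups may fail until you call Hash#rehash (freezing arrays before using them as keys avoids the problem altogether). A small sketch with illustrative names:

```ruby
sums = {}
[1, 4, 6].combination(2) { |pair| sums[pair] = pair.sum }
# A different Array object with equal contents finds the entry,
# because Array#hash and Array#eql? depend only on the contents.
found = sums[[1, 4]]          # 5

key = [2, 3]
table = { key => :stored }
key << 4                      # the key object itself is now [2, 3, 4]
# The entry is still filed under the hash value of [2, 3], so a lookup
# may fail here. Hash#rehash recomputes the hash values of all keys.
table.rehash
after = table[[2, 3, 4]]      # :stored
```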
Reference
From http://ruby-doc.org/core-2.4.0/Hash.html :
Two objects refer to the same hash key when their hash value is identical and the two objects are eql? to each other.
From http://ruby-doc.org/core-2.4.0/Array.html#method-i-eql-3F :
eql?(other) → true or false
Returns true if self and other are the same object, or are both arrays with the same content (according to Object#eql?).
hash → integer
Compute a hash-code for this array.
Two arrays with the same content will have the same hash code (and will compare using eql?).
Emphasis mine.
If you are using a range or array, you can also call hash on it and use the result as the key:
(num1..num2).hash
[num1, num2].hash
That returns an integer you can use as a key. I have no idea whether this is efficient; the range and array documentation show the source code for these methods. Keep in mind that a hash code is not unique: two different pairs can share one, and keying on the integer skips the eql? check that Hash performs on real keys, so such pairs would overwrite each other.
Another way I would do it is to turn the numbers into strings. This is the better solution if you are worried about hash collisions.
"#{num1}:#{num2}"
And the ruby-esque ways that I would solve your problem are:
number_array.combination(2).each { |arr| my_hash[arr.hash] = arr }
number_array.combination(2).each { |arr| my_hash[arr.join(":")] = arr }
A hash table, where the key is the pair of nums and the value is their sum:
h = {}
[1,4,6,8].combination(2){|ar| h[ar] = ar.sum}
p h #=>{[1, 4]=>5, [1, 6]=>7, [1, 8]=>9, [4, 6]=>10, [4, 8]=>12, [6, 8]=>14}
Note that using arrays as hash keys is no problem at all. To extend this to 3, 4, or 5 numbers, use combination(3) (or 4, or 5).

How to group Arrays with similar values

I'm looking for a smart way to group any number of arrays with similar values (not necessarily in the same order). The language I'm using is ruby but I guess the problem is pretty language agnostic.
Given
a = ['foo', 'bar']
b = ['bar', 'foo']
c = ['foo', 'bar', 'baz']
d = ['what', 'ever', 'else']
e = ['foo', 'baz', 'bar']
I'd like to have a function that tells me that
a & b are in one group
c & e are in one group
d is its own group
I can think of a number of not-so-smart, very inefficient ways of doing this; for example, I could compare each array's values to every other array's values.
Or I could check if ((a - b) + (b - a)).length == 0 for all combinations of arrays and group the ones that result in 0. Or I could check if a.sort == b.sort for all combinations of arrays.
I'm sure someone before me has solved this problem way more efficiently. I just can't seem to find how.
You can do it with sort without doing it "for all combinations of arrays": sort each array only once and use the sorted copy as the grouping key (the same compute-the-key-once idea as a Schwartzian transform).
arrays = [a, b, c, d, e]
arrays.group_by{|array| array.sort}.values
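With the arrays from the question, a sketch of this grouping looks like the following. Sorting keeps duplicates, so ['foo', 'foo', 'bar'] would not group with ['bar', 'foo']; if repeated values should not matter, grouping by array.uniq.sort is one option (the with_dups data is made up for illustration).

```ruby
a = ['foo', 'bar']
b = ['bar', 'foo']
c = ['foo', 'bar', 'baz']
d = ['what', 'ever', 'else']
e = ['foo', 'baz', 'bar']

arrays = [a, b, c, d, e]
# Each array is sorted once to build its grouping key; groups keep the
# order in which their first member appears.
groups = arrays.group_by(&:sort).values
# groups == [[a, b], [c, e], [d]]

# Variant that ignores repeated elements inside an array.
with_dups = [['foo', 'foo', 'bar'], ['bar', 'foo']]
loose = with_dups.group_by { |arr| arr.uniq.sort }.values
# loose holds a single group containing both arrays
```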

Two indexes in Ruby for loop

Can you have a Ruby for loop that has two indexes?
i.e.:
for i,j in 0..100
do something
end
I can't find anything on Google.
EDIT: Adding in more details
I need to compare two different arrays like such
Index: Array1: Array2:
0 a a
1 a b
2 a b
3 a b
4 b b
5 c b
6 d b
7 d b
8 e c
9 e d
10 e d
11 e
12 e
But knowing that they both have the same items (abcde)
This is my logic in pseudocode; let's assume this whole thing is inside a loop:
# these two if statements are for handling end-of-array cases
If Array1[index_a1] == nil
Errors += Array1[index_a1-1]
break
If Array2[index_a2] == nil
Errors += Array2[index_a2-1]
break
# this is for handling a mismatch
If Array1[index_a1] != Array2[index_a2]
Errors += Array1[index_a1-1] #of course, first entry of array will always be same
if Array1[index_a1] != Array1[index_a1 - 1]
index_a2++ until Array1[index_a1] == Array2[index_a2]
index_a2 -=1 (these two lines are for the loop's sake in next iteration)
index_a1 -=1
if Array2[index_a2] != Array2[index_a2 - 1]
index_a1++ until Array1[index_a1] == Array2[index_a2]
index_a2 -=1 (these two lines are for the loop's sake in next iteration)
index_a1 -=1
In a nutshell, in the example above,
Errors looks like this
a,b,e
As c and d are good.
You could iterate over two arrays using Enumerators instead of numerical indices. This example iterates over a1 and a2 simultaneously, echoing the first word in a2 that starts with the corresponding letter in a1, skipping duplicates in a2:
a1 = ["a", "b", "c", "d"]
a2 = ["apple", "angst", "banana", "clipper", "crazy", "dizzy"]
e2 = a2.each
a1.each do |letter|
puts e2.next
e2.next while e2.peek.start_with?(letter) rescue nil
end
(It assumes all letters in a1 have at least one word in a2 and that both are sorted -- but you get the idea.)
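A for loop, like each, walks one sequence at a time; for i, j in pairs only destructures each element of that sequence into two variables, so both still advance together. When two indexes need to move independently, as in the comparison described in the question, the usual translation is a while loop with two counters. Merging two sorted arrays is a standard illustration of the pattern; merge_sorted is a hypothetical helper, not code from the question.

```ruby
# Walk two sorted arrays with independent indexes i and j.
def merge_sorted(xs, ys)
  i = 0
  j = 0
  merged = []
  while i < xs.size && j < ys.size
    if xs[i] <= ys[j]
      merged << xs[i]
      i += 1          # only the first index moves
    else
      merged << ys[j]
      j += 1          # only the second index moves
    end
  end
  # One array is exhausted; append whatever remains of the other.
  merged + xs[i..-1] + ys[j..-1]
end

merge_sorted(%w[a a b d], %w[a c c e])  # ["a", "a", "a", "b", "c", "c", "d", "e"]
```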
The for loop is not the best way to approach iterating over an array in Ruby. With the clarification of your question, I think you have a few possible strategies.
You have two arrays, a and b.
If both arrays are the same length:
a.each_index do |index|
  if a[index] == b[index]
    # do something
  else
    # do something else
  end
end
This also works if a is shorter than b.
If you don't know which one is shorter, you could write control_array = a.length < b.length ? a : b to pick the control array, then use control_array.each_index. Or you could use (0...[a.length, b.length].min).each { |index| ... } to accomplish the same thing; the three-dot range stops just before the shorter length, which is one past the last valid index.
Looking over your edit to your question, I think I can rephrase it like this: given an array with duplicates, how can I obtain a count of each item in each array and compare the counts? In your case, I think the easiest way to do that would be like this:
a = [:a,:a,:a,:b,:b,:c,:c,:d,:e,:e,:e]
b = [:a,:a,:b,:b,:b,:c,:c,:c,:d,:e,:e,:e]
not_alike = []
a.uniq.each{|value| not_alike << value if a.count(value) != b.count(value)}
not_alike
Running that code gives me [:a,:b,:c].
If it is possible that a does not contain every symbol, then you will need to have an array which just contains the symbols and use that instead of a.uniq, and another and statement in the conditional could deal with nil or 0 counts.
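On Ruby 2.7 or later, Enumerable#tally computes each array's counts in a single pass (instead of calling count once per distinct value), and taking the union of both key sets handles values that appear in only one of the arrays, with fetch supplying 0 for the missing side. A sketch:

```ruby
a = [:a,:a,:a,:b,:b,:c,:c,:d,:e,:e,:e]
b = [:a,:a,:b,:b,:b,:c,:c,:c,:d,:e,:e,:e]

counts_a = a.tally
counts_b = b.tally
# Every value seen in either array; a missing value counts as 0.
not_alike = (counts_a.keys | counts_b.keys).reject do |value|
  counts_a.fetch(value, 0) == counts_b.fetch(value, 0)
end
# not_alike == [:a, :b, :c]
```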
the two arrays are practically the same, except for a few elements that I have to skip in one or the other every once in a while
Instead of skipping during iterating, could you pre-select the non-skippable ones?
a.select{ ... }.zip( b.select{ ... } ).each do |a1,b1|
# a1 is an entry from a's subset
# b1 is the paired entry from b's subset
end
