Comparing strings of equal lengths and noting where the differences occur - ruby

Given two strings of equal length such that
s1 = "ACCT"
s2 = "ATCT"
I would like to find the positions where the strings differ. So far I have done this (please suggest a better way of doing it; I bet there is one):
z = s1.chars.zip(s2.chars).each_with_index.map{|(a,b),index| index+1 if a!=b}.compact
z is an array of the (1-based) positions where the two strings differ. In this case z is [2].
Imagine that I add a new string
s3 = "AGCT"
and I wish to compare it with the others to see where the 3 strings differ. We could take the same approach as above, but this time
s1.chars.zip(s2.chars,s3.chars)
#=> [["A", "A", "A"], ["C", "T", "G"], ["C", "C", "C"], ["T", "T", "T"]]
returns an array of arrays. With two strings I was relying on just comparing two chars for equality, but this becomes overwhelming as I add more strings and as the strings become longer.
Running
s1.chars.zip(s2.chars,s3.chars).map{|item| item.uniq}
#=> [["A"], ["C", "T", "G"], ["C"], ["T"]]
can help reduce redundancy and identify the positions that are identical across all strings (subarrays of size 1). I could then print out the indices and contents of the subarrays of size > 1.
s1.chars.zip(s2.chars,s3.chars).map{|item| item.uniq}.each_with_index.map{|a,index| [index+1,a] unless a.size== 1}.compact.map{|h| Hash[*h]}
#=> [{2=>["C", "T", "G"]}]
I feel that this will grind to a halt or get slow as I increase the number of strings and as the strings get longer. What are some alternative, more efficient ways of doing this?
Thank you.

Here's where I'd start. I'm purposely using different strings to make it easier to see the differences:
str1 = 'jackdaws love my giant sphinx of quartz'
str2 = 'jackdaws l0ve my gi4nt sphinx 0f qu4rtz'
To get the first string's characters:
str1.each_char.with_index.to_a - str2.each_char.with_index.to_a
=> [["o", 10], ["a", 19], ["o", 30], ["a", 35]]
To get the second string's characters:
str2.each_char.with_index.to_a - str1.each_char.with_index.to_a
=> [["0", 10], ["4", 19], ["0", 30], ["4", 35]]
There will be a little slow down as the strings get bigger, but it won't be bad.
EDIT: Added more info.
If you have an arbitrary number of strings, and need to compare them all, use Array#combination:
str1 = 'ACCT'
str2 = 'ATCT'
str3 = 'AGCT'
require 'pp'
pp [str1, str2, str3].combination(2).to_a
>> [["ACCT", "ATCT"], ["ACCT", "AGCT"], ["ATCT", "AGCT"]]
In the above output you can see that combination cycles through the array, returning the various n sized combinations of the array elements.
pp [str1, str2, str3].combination(2).map{ |a,b| a.each_char.with_index.to_a - b.each_char.with_index.to_a }
>> [[["C", 1]], [["C", 1]], [["T", 1]]]
Using combination's output you could cycle through the array, comparing all the elements against each other. So, in the above returned array, in the "ACCT" and "ATCT" pair, 'C' was the difference between the two, located at position 1 in the string. Similarly, in "ACCT" and "AGCT" the difference is "C" again, in position 1. Finally for 'ATCT' and 'AGCT' it's 'T' at position 1.
Because we already saw in the longer string samples that the code will return multiple changed characters, this should get you pretty close.
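As a minimal sketch, the pairwise comparison above could be wrapped in a helper that merges the differing positions across all pairs. The method name diff_positions is hypothetical, and each_char.with_index is used on the assumption that chars returns a plain Array (as it does in current Ruby versions).

```ruby
# Hypothetical helper: 0-based positions at which any two of the given
# equal-length strings differ.
def diff_positions(strings)
  strings.combination(2).flat_map do |a, b|
    # [char, index] pairs present in a but not in b mark the mismatches
    (a.each_char.with_index.to_a - b.each_char.with_index.to_a).map(&:last)
  end.uniq.sort
end

p diff_positions(%w[ACCT ATCT AGCT])   # => [1]
```

Note that the number of pairs grows quadratically with the number of strings, so this is only a starting point for small inputs.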

Solution 1
strings = %w[ACCT ATCT AGCT]
First, join the strings, and make a hash of all the positions for each character.
joined = strings.join
positions = (0...joined.length).group_by{|i| joined[i]}
# => {"A"=>[0, 4, 8], "C"=>[1, 2, 6, 10], "T"=>[3, 5, 7, 11], "G"=>[9]}
Then, group the indices by their corresponding position within each string, remove those that are repeated as many times as the number of strings. This part is a variant of an algorithm that Jorg suggests.
length = strings.first.length
n = strings.length
diff = Hash[positions.map{|k, v|
  [k, v.group_by{|i| i % length}.reject{|i, is| is.length == n}.keys]
}]
This will give something like:
diff
# => {"A"=>[], "C"=>[1], "T"=>[1], "G"=>[1]}
which means that "A" appears in the same positions in all strings, and "C", "T", and "G" differ at position 1 (counting from 0) of the strings.
If you simply want to know the positions where the strings differ, take the union of the values:
diff["G"] | diff["A"] | diff["C"] | diff["T"]
# or diff.values.flatten.uniq
# => [1]
Solution 2
Note that if you maintain an array of the indices where a pairwise comparison fails, and keep adding indices to it, comparing s1 against the rest (s2, s3, ...) will suffice.
length = s1.length
diff = []
[s2, s3, ...].each{|s| diff |= (0...length).reject{|i| s1[i] == s[i]}}
Explanation in a bit more detail
Suppose
s1 = 'GGGGGGGGG'
s2 = 'GGGCGGCGG'
s3 = 'GGGAGGCGG'
After s1 and s2 are compared, we have the set of indices [3, 6] that represents where they differ. Now, when we add s3 into consideration, it does not matter whether we compare it with s1 or with s2. If s1[i] and s2[i] are different, then i is already included in the set [3, 6], so it makes no difference whether either of them differs from s3[i]: i is already in the set. On the other hand, if s1[i] and s2[i] are the same, it makes no difference which one of them we compare with s3[i]. Therefore, pairwise comparison of s1 with s2, s3, ... is enough.
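The reasoning above can be sketched as a small runnable helper. The method name differing_indices is hypothetical; it compares the first string against each of the others and records each mismatching index once.

```ruby
# Hypothetical helper: compare the first string against each of the others
# and collect every index where a mismatch occurs (each index only once).
def differing_indices(first, *others)
  others.each_with_object([]) do |s, diff|
    (0...first.length).each do |i|
      diff << i unless first[i] == s[i] || diff.include?(i)
    end
  end.sort
end

s1 = 'GGGGGGGGG'
s2 = 'GGGCGGCGG'
s3 = 'GGGAGGCGG'
p differing_indices(s1, s2, s3)   # => [3, 6]
```

This performs one pass per additional string, rather than one pass per pair of strings.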

You almost certainly don't want to be doing this analysis with your own code. Rather, you want to be handing it off to an existing multiple sequence alignment tool, like Clustal.
I realise this is not an answer to your question, but I hope it's a solution to your problem!

Related

Ruby Regular expression to find the binary gap

I want to find the binary gap using Ruby regex
Say I have 1000001001010011100000000000. From the left, I want to use a regex to match:
A. 1000001 should return 00000
B. 1001 should return 00
C. 101 should return 0
D. 1001 should return 00
My first attempt looks like this, but it's missing B and D
Update
A binary gap within a positive integer N is any maximal sequence of consecutive zeros that is surrounded by ones at both ends in the binary representation of N.
I think what you are looking for is:
/1(0+)(?=1)/
The problem with your pattern is that it consumes the "closing 1". As a consequence, the next search starts after this "closing 1".
But if you use a lookahead (a zero-width assertion that doesn't consume characters and only tests what follows), the "closing 1" isn't consumed and you get the desired result, because the next search starts after the last zero.
Note that if you don't need the zeros to be enclosed between ones, you can also simply use: /0+/
Other way: if you are sure that the string only contains 1s and 0s, you can also use the (non-)word-boundary assertion \B with this pattern: 1\K0++\B
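To make the difference concrete, here is a short sketch. The asker's actual first attempt is not shown, so /1(0+)1/ is only an assumed stand-in for a pattern that consumes the closing 1.

```ruby
str = "1000001001010011100000000000"

# Assumed stand-in for the first attempt: the closing 1 is consumed,
# so gaps that share a boundary 1 with the previous match are skipped.
consuming = str.scan(/1(0+)1/).flatten
p consuming   # => ["00000", "0"]

# The lookahead leaves the closing 1 available for the next match.
with_lookahead = str.scan(/1(0+)(?=1)/).flatten
p with_lookahead   # => ["00000", "00", "0", "00"]
```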
R = /
    (?=     # start a positive lookahead
      1     # match a one
      (0+)  # match one or more zeros in capture group 1
      1     # match a one
    )       # end positive lookahead
    /x      # free-spacing regex definition mode
str = "1000001001010011100000000000"
arr = []
str.scan(R) { |m| arr << [m.first, Regexp.last_match.begin(0)+1] }
arr
#=> [["00000", 1], ["00", 7], ["0", 10], ["00", 12]]
The elements of arr correspond to all substrings of one or more "0"s of str that are preceded and followed by a 1. The first element of each pair is the substring, the second is the offset into str where the substring begins.
Here's a second example.
str = "10011001010101001110001000100101"
arr = []
str.scan(R) { |m| arr << [m.first, Regexp.last_match.begin(0)+1] }
arr
#=> [["00", 1], ["00", 5], ["0", 8], ["0", 10], ["0", 12], ["00", 14],
# ["000", 19], ["000", 23], ["00", 27], ["0", 30]]
Note that one must use a positive lookahead, rather than a positive lookbehind, as (in Ruby) the latter does not permit variable-length strings (i.e., 0+).
@Stefan, in a comment, suggested an improvement:
R = /
    (?<=1)  # match a one in a positive lookbehind
    0+      # match one or more zeros
    (?=1)   # match a one in a positive lookahead
    /x      # free-spacing regex definition mode
str = "1000001001010011100000000000"
arr = []
str.scan(R) { |m| arr << [m, Regexp.last_match.begin(0)] }
arr
#=> [["00000", 1], ["00", 7], ["0", 10], ["00", 12]]
This is similar to what @Casimir suggests (/1(0+)(?=1)/), except that by putting the first 1 in a positive lookbehind there's no need for the capture group.
Here is another way that does not use a regex.
str = "1000001001010011100000000000"
(0..str.size-3).each_with_object([]) do |i,a|
  next if str[i] == '0' || str[i+1] == '1'
  ndx = str[i+2..-1].index('1')
  a << [str[i+1, 1+ndx], i+1] if ndx
end
#=> [["00000", 1], ["00", 7], ["0", 10], ["00", 12]]
To get only the zeros in between ones, you need to use a regex lookbehind and lookahead:
(?<=1)0+(?=1)
After that you only need to get the element with the maximum length.
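Putting the lookbehind/lookahead pattern and the "take the longest" step together might look like the sketch below. The method name longest_binary_gap is hypothetical, and returning 0 when there is no gap is an assumption rather than something stated in the question.

```ruby
# Hypothetical helper: length of the longest run of zeros enclosed by ones
# in the binary representation of n (0 if there is no such run).
def longest_binary_gap(n)
  gaps = n.to_s(2).scan(/(?<=1)0+(?=1)/)
  gaps.map(&:length).max || 0
end

p longest_binary_gap(0b1000001001010011100000000000)  # => 5
p longest_binary_gap(0b1111)                          # => 0
```

Trailing zeros are not counted because they are not followed by a 1, which matches the definition of a binary gap quoted in the question.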

Ruby - pushing values from an array combination to a new array

I am trying to print all the different sums of all combinations in this array [1,2,3]. I want to first push every sum result to a new array b, then print them using b.uniq so that none of the sum results are repeated.
However, with the code I have, the 3 repeats itself, and I think it is because of the way it is pushed into the array b.
Is there a better way of doing this?
a = [1,2,3]
b = []
b.push a
b.push a.combination(2).collect {|a,b| (a+b)}
b.push a.combination(3).collect {|a,b,c| (a+b+c)}
puts b.uniq
p b #[[1, 2, 3], [3, 4, 5], [6]]
Can someone please help me with this? I am still new in ruby.
Because an Array of arbitrary length can be summed using inject(:+), we can create a more general solution by iterating over the range 1..n, where n is the length of the Array.
(1..(a.size)).flat_map do |n|
  a.combination(n).map { |c| c.inject(&:+) }
end.uniq
#=> [1, 2, 3, 4, 5, 6]
By using flat_map, we can avoid getting the nested Array result, and can call uniq directly on it. Another option to ensure uniqueness would be to pass the result to a Set, for which Ruby guarantees uniqueness internally.
require "set"
sums = (1..(a.size)).flat_map do |n|
  a.combination(n).map { |c| c.inject(&:+) }
end
Set.new(sums)
#=> #<Set: {1, 2, 3, 4, 5, 6}>
This will work for any Array, as long as all elements are Integers (Fixnum in Ruby versions before 2.4).
If all you want is an array of the possible sums, flatten the array before getting the unique values.
puts b.flatten.uniq
What is happening is uniq is running over a multi-dimensional array. This causes it to look for duplicate arrays in your array. You'll need the array to be flattened first.
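A short sketch of the difference, reusing the array from the question (the block parameters are renamed to x, y, z here only to avoid shadowing a and b):

```ruby
a = [1, 2, 3]
b = []
b.push a
b.push a.combination(2).collect { |x, y| x + y }
b.push a.combination(3).collect { |x, y, z| x + y + z }

# uniq compares the three inner arrays with each other, and none are equal
p b.uniq          # => [[1, 2, 3], [3, 4, 5], [6]]
# after flattening, uniq compares the individual sums
p b.flatten.uniq  # => [1, 2, 3, 4, 5, 6]
```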

Adding unparallel elements between two arrays

I have a pair of arrays,
array_1 = [1,2,3,4,5]
array_2 = [10,9,8,7,6]
and I'm trying to subtract the (n-1)-th element of the first array from the n-th element of the second array, starting with the second element, yielding an array of:
[9-1, 8-2, 7-3, 6-4] = [8, 6, 4, 2]
I wrote it in a procedural fashion:
array_1.pop
array_2.shift
[array_2,array_1].transpose.map { |a,b| a-b }
but I do not wish to alter the arrays. Is there a method or another way to go about this?
Another way:
enum1 = array_1.to_enum
enum2 = array_2.to_enum
enum2.next
arr = []
loop do
  arr << enum2.next - enum1.next
end
arr
#=> [8, 6, 4, 2]
Use the non-destructive drop on the receiver, then zip, which stops when the receiver runs out of elements even if the argument has more.
array_2.drop(1).zip(array_1).map{|a, b| a - b}
I think you may be overthinking it a bit; as long as both arrays are the same length, you can just iterate over the indices you care about, and reference the other array by index - offset.
array_1 = [1,2,3,4,5]
array_2 = [10,9,8,7,6]
n = 1
(n...array_1.length).map {|i| array_2[i] - array_1[i - n] }
You can set n to whatever number you like and compute from that point onwards, so even if the arrays were tremendously large, you don't have to generate any intermediate arrays, and you don't have to perform any unnecessary work.
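As a sketch of that idea, the index arithmetic can be wrapped in a method. The name offset_difference is hypothetical, and it assumes the offset n applies both to the starting index and to how far back the first array is read.

```ruby
# Hypothetical helper: b[i] - a[i - n] for every index i from n onwards.
# Neither input array is modified.
def offset_difference(a, b, n = 1)
  (n...a.length).map { |i| b[i] - a[i - n] }
end

array_1 = [1, 2, 3, 4, 5]
array_2 = [10, 9, 8, 7, 6]
p offset_difference(array_1, array_2)      # => [8, 6, 4, 2]
p offset_difference(array_1, array_2, 2)   # => [7, 5, 3]
p array_1                                  # => [1, 2, 3, 4, 5]
```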

Alphabetical sorting of an array without using the sort method

I have been working through Chris Pine's tutorial for Ruby and am currently working on a way to sort an array of names without using sort.
My code is below. It works perfectly but is a step further than I thought I had got!
puts "Please enter some names:"
name = gets.chomp
names = []
while name != ''
  names.push name
  name = gets.chomp
end
names.each_index do |first|
  names.each_index do |second|
    if names[first] < names[second]
      names[first], names[second] = names[second], names[first]
    end
  end
end
puts "The names you have entered in alphabetical order are: " + names.join(', ')
It is the sorting that I am having trouble getting my head around.
My understanding of it is that each_index would look at the position of each item in the array. Then the if statement takes each item and if the number is larger than the next it swaps it in the array, continuing to do this until the biggest number is at the start. I would have thought that this would just have reversed my array, however it does sort it alphabetically.
Would someone be able to talk me through how this algorithm is working alphabetically and at what point it is looking at what the starting letters are?
Thanks in advance for your help. I'm sure it is something very straightforward but after much searching I can't quite figure it out!
I think the quick sort algorithm is one of the easier ones to understand:
def qsort arr
  return [] if arr.empty?
  pivot, *rest = arr  # destructure instead of shift so the caller's array is not modified
  less, more = rest.partition {|e| e < pivot }
  qsort(less) + [pivot] + qsort(more)
end
puts qsort(["George","Adam","Michael","Susan","Abigail"])
The idea is that you pick an element (often called the pivot), and then partition the array into elements less than the pivot and those that are greater or equal to the pivot. Then recursively sort each group and combine with the pivot.
I can see why you're puzzled -- I was too. Look at what the algorithm does at each swap. I'm using numbers instead of names to make the order clearer, but it works the same way for strings:
names = [1, 2, 3, 4]
names.each_index do |first|
  names.each_index do |second|
    if names[first] < names[second]
      names[first], names[second] = names[second], names[first]
      puts "[#{names.join(', ')}]"
    end
  end
end
=>
[2, 1, 3, 4]
[3, 1, 2, 4]
[4, 1, 2, 3]
[1, 4, 2, 3]
[1, 2, 4, 3]
[1, 2, 3, 4]
In this case, it started with a sorted list, then made a bunch of swaps, then put things back in order. If you only look at the first couple of swaps, you might be fooled into thinking that it was going to do a descending sort. And the comparison (swap if names[first] < names[second]) certainly seems to imply a descending sort.
The trick is that the relationship between first and second is not ordered; sometimes first is to the left, sometimes it's to the right. Which makes the whole algorithm hard to reason about.
This algorithm is, I guess, a strange implementation of a Bubble Sort, which I normally see implemented like this:
names.each_index do |first|
  (first + 1...names.length).each do |second|
    if names[first] > names[second]
      names[first], names[second] = names[second], names[first]
      puts "[#{names.join(', ')}]"
    end
  end
end
If you run this code on the same array of sorted numbers, it does nothing: the array is already sorted so it swaps nothing. In this version, it takes care to keep second always to the right of first and does a swap only if the value at first is greater than the value at second. So in the first pass (where first is 0), the smallest number winds up in position 0, in the next pass the next smallest number winds up in the next position, etc.
And if you run it on an array that is reverse sorted, you can see what it's doing:
[3, 4, 2, 1]
[2, 4, 3, 1]
[1, 4, 3, 2]
[1, 3, 4, 2]
[1, 2, 4, 3]
[1, 2, 3, 4]
Finally, here's a way to visualize what's happening in the two algorithms. First the modified version:
  0 1 2 3
0   X X X
1     X X
2       X
3
The numbers along the vertical axis represent values for first. The numbers along the horizontal represent values for second. The X indicates a spot at which the algorithm compares and potentially swaps. Note that it's just the portion above the diagonal.
Here's the same visualization for the algorithm that you provided in your question:
  0 1 2 3
0 X X X X
1 X X X X
2 X X X X
3 X X X X
This algorithm compares all the possible positions (pointlessly including the values along the diagonal, where first and second are equal). The important bit to notice, though, is that the swaps that happen below and to the left of the diagonal represent cases where second is to the left of first -- the backwards case. And also note that these cases happen after the forward cases.
So essentially, what this algorithm does is reverse sort the array (as you had suspected) and then afterwards forward sort it. Probably not really what was intended, but the code sure is simple.
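The two grids above can also be read as comparison counts. The sketch below only counts how many times the inner block runs for each loop structure; the method names are hypothetical.

```ruby
# Loop structure from the question: every (first, second) pair is visited.
def count_full_grid(n)
  count = 0
  n.times { n.times { count += 1 } }
  count
end

# Modified version: second always stays to the right of first.
def count_upper_triangle(n)
  count = 0
  n.times { |first| ((first + 1)...n).each { count += 1 } }
  count
end

p count_full_grid(4)        # => 16
p count_upper_triangle(4)   # => 6
```

For four elements that is 16 comparisons against 6, matching the number of X marks in each grid.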
Your understanding is just a bit off.
You said:
Then the if statement takes each item and if the number is larger than the next it swaps it in the array
But this is not what the if statement is doing.
First, the two blocks enclosing it are simply setting up iterators first and second, which each count from the first to the last element of the array each time through the block. (This is inefficient but we'll leave discussion of efficient sorting for later. Or just see Brian Adkins' answer.)
When you reach the if statement, it is not comparing the indices themselves, but the names which are at those indices.
You can see what's going on by inserting this line just before the if. Though this will make your program quite verbose:
puts "Comparing names[#{first}] which is #{names[first]} to names[#{second}] which is #{names[second]}..."
Alternatively, you can create a new array and use a while loop to append the names in alphabetical order. Delete the elements that have been appended in the loop until there are no elements left in the old array.
sorted_names = []
while names.length != 0
  sorted_names << names.min
  names.delete_at(names.index(names.min))  # delete one occurrence so duplicate names are kept
end
puts sorted_names
This is the recursive solution for this case
def my_sort(list, new_array = [])
  return new_array if list.empty?
  min = list.min
  new_array << min
  list.delete_at(list.index(min))  # remove one occurrence only, so duplicates survive
  my_sort(list, new_array)
end
puts my_sort(["George","Adam","Michael","Susan","Abigail"])
Here is my code to sort items in an array without using the sort or min method, taking into account various forms of each item (e.g. strings, integers, nil):
def sort(objects)
  sorted_objects = []
  # Repeatedly pull the smallest remaining item (compared as strings) out of objects
  until objects.empty?
    sorted_item = objects.reduce do |min, item|
      min.to_s > item.to_s ? item : min
    end
    sorted_objects << sorted_item
    objects.delete_at(objects.find_index(sorted_item))
  end
  sorted_objects
end
words_2 = %w{all i can say is that my life is pretty plain}
p sort(words_2)
=> ["all", "can", "i", "is", "is", "life", "my", "plain", "pretty", "say", "that"]
mixed_array_1 = ["2", 1, "5", 4, "3"]
p sort(mixed_array_1)
=> [1, "2", "3", 4, "5"]
mixed_array_2 = ["George","Adam","Michael","Susan","Abigail", "", nil, 4, "5", 100]
p sort(mixed_array_2)
=> ["", nil, 100, 4, "5", "Abigail", "Adam", "George", "Michael", "Susan"]

Adding two arrays in Ruby when the array length will always be the same

So, I need to add two arrays together to populate a third. EG
a = [1,2,3,4]
b = [3,4,5,6]
so that:
c = [4,6,8,10]
I read the answer given here: https://stackoverflow.com/questions/12584585/adding-two-ruby-arrays
but I'm using the codecademy labs ruby editor and it's not working there, plus the lengths of my arrays are ALWAYS going to be equal. Also, I don't have any idea what the method ".with_index" is or does and I don't understand why it's necessary to use ".to_i" when the value is already an integer.
It seems like this should be really simple?
a = [1,2,3,4]
b = [3,4,5,6]
a.zip(b).map { |i,j| i+j } # => [4, 6, 8, 10]
Here
a.zip(b) # => [[1, 3], [2, 4], [3, 5], [4, 6]]
and map converts each 2-tuple to the sum of its elements.
OPTION 1:
For a pure Ruby solution, try the transpose method:
a = [1,2,3,4]
b = [3,4,5,6]
c = [a, b].transpose.map{|x, y| x + y}
#=> [4,6,8,10]
OPTION 2:
If you're in a Rails environment, you can use the sum method that Rails adds (since Ruby 2.4, Array#sum is also built into core Ruby):
[a, b].transpose.map{|x| x.sum}
#=> [4,6,8,10]
EXPLANATION:
transpose works perfectly for your scenario, since it raises an IndexError if the sub-arrays don't have the same length. From the docs:
Assumes that self is an array of arrays and transposes the rows and columns.
If the length of the subarrays don’t match, an IndexError is raised.
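A brief sketch of that difference in behaviour; the array named short is a made-up example used only to show a length mismatch:

```ruby
a = [1, 2, 3, 4]
short = [3, 4, 5]   # hypothetical array that is one element too short

# zip silently pads the shorter argument with nil
zipped = a.zip(short)
p zipped   # => [[1, 3], [2, 4], [3, 5], [4, nil]]

# transpose refuses mismatched lengths
begin
  [a, short].transpose
rescue IndexError => e
  puts "IndexError: #{e.message}"
end
```

With zip, the nil would only surface later as an error in the addition, so transpose reports the mismatch closer to its cause.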
