Use regular expression to evaluate and modify string in Ruby? - ruby

I want to modify part of a string I have using Ruby.
The string is [x, y] where y is an integer that I want to change to its alphabetical letter. So say [1, 1] would become [1, A] and [1, 26] would become [1, Z].
Would a regular expression help me do this? Or is there an easier way? I am not too strong with regular expressions; I am reading up on them now.

The shortest way I can think of is the following
string = "[1,1]"
array = string.chop.reverse.chop.reverse.split(',')
new_string="[#{array.first},#{(array.last.to_i+64).chr}]"

Maybe this helps:
We do not yet have an alphabet in which to look up the position, so create one.
Converting a range to an array means you don't need to type out the letters yourself.
alphabet = ("A".."Z").to_a
Then we try to get the integer/position out of the string:
string_to_match = "[1,5]"
/(\d+)\]$/.match(string_to_match)
Maybe the regexp can be improved, but it works for this example.
The first capture group in the MatchData holds the second integer in your "string_to_match".
You can also get it via "$1".
Do not forget to convert it to an integer.
position_in_alphabet = $1.to_i
Also we need to remember that the index of arrays starts at 0 and not 1
position_in_alphabet -= 1
Finally, we can look up which character we actually get:
char = alphabet[position_in_alphabet]
Example:
alphabet = ("A".."Z").to_a #=> ["A", "B", "C", ..*snip*.. "Y", "Z"]
string_to_match = "[1,5]" #=> "[1,5]"
/(\d+)\]$/.match(string_to_match) #=> #<MatchData "5]" 1:"5">
position_in_alphabet = $1.to_i #=> 5
position_in_alphabet -= 1 #=> 4
char = alphabet[position_in_alphabet] #=> "E"
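The steps above can also be folded into a single String#sub call with a block. This is only a sketch, assuming the input always has the shape "[x,y]" and that y is between 1 and 26; the method name to_letter_pair is made up for illustration.

```ruby
# Hypothetical helper: replaces the trailing integer in "[x,y]" with a letter.
# Assumes y is between 1 and 26; any other value is left untouched.
def to_letter_pair(string)
  alphabet = ("A".."Z").to_a
  string.sub(/(\d+)\]\z/) do
    position = $1.to_i
    letter = position.between?(1, 26) ? alphabet[position - 1] : $1
    "#{letter}]"
  end
end

to_letter_pair("[1,1]")   #=> "[1,A]"
to_letter_pair("[1, 26]") #=> "[1, Z]"
```

The block form of sub is used so that the captured digits can be converted with ordinary Ruby code rather than a fixed replacement string.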
Greetings~

Related

Why can I assign two variables corresponding to an array in Ruby?

After about a year of Ruby, I just saw this somewhere and my mind is blown. Why in the world does this work?
>> words = ['uno', 'dos']
=> ["uno", "dos"]
>> first, second = words
=> ["uno", "dos"]
>> first
=> "uno"
>> second
=> "dos"
Specifically, how does this work:
>> first, second = ['uno', 'dos']
Why can I do this? It makes no syntactical sense!
It makes no syntactical sense
But this is part of Ruby's syntax! In the Ruby docs it is known as array decomposition:
Like Array decomposition in method arguments you can decompose an
Array during assignment using parenthesis:
(a, b) = [1, 2]
p a: a, b: b # prints {:a=>1, :b=>2}
You can decompose an Array as part of a larger multiple assignment:
a, (b, c) = 1, [2, 3]
p a: a, b: b, c: c # prints {:a=>1, :b=>2, :c=>3}
Since each decomposition is considered its own multiple assignment you
can use * to gather arguments in the decomposition:
a, (b, *c), *d = 1, [2, 3, 4], 5, 6
p a: a, b: b, c: c, d: d
# prints {:a=>1, :b=>2, :c=>[3, 4], :d=>[5, 6]}
Edit
As Stefan points out in the comments, the docs don't mention that array decomposition also occurs implicitly (i.e. without parentheses) if there is only one value on the right-hand side:
a, b = [1, 2] works like (a, b) = [1, 2]
Why can I do this? It makes no syntactical sense!
It makes perfect sense. It is an example of parallel assignment.
When you use = with a list of variables on the left, each variable is assigned the corresponding expression from the list on the right.
first, second = ['uno', 'dos']
# is equivalent to
first, second = 'uno', 'dos'
If there are more variables on the left than expressions on the right, the extra variables are assigned nil:
first, second = 'uno'
first #=> 'uno'
second #=> nil
As to
words = ['uno', 'dos']
first, second = words
first #=> 'uno'
second #=> 'dos'
Ruby does not assign the whole words array to first and leave second as nil, because during parallel assignment it tries to decompose the right-hand expression, and does so if it is an instance of Array.
[TIL] Moreover, it attempts to call to_ary on the right-hand expression and, if the object responds to that method, decomposes it according to the object's to_ary implementation (credit to @Stefan):
string = 'hello world'
def string.to_ary; split end
first, second = string
first #=> 'hello'
second #=> 'world'
This is called multiple assignment; it is handy for assigning several variables at once.
Example:
one, two = 1,2
puts one #=>1
puts two #=>2
one, two = [1,2] # this makes sense
one, two = 1 # only one value on the right, so two is assigned nil
Hope it's a bit clearer now.
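The cases discussed in these answers can be collected into one small script. This is just an illustrative sketch; the method name assignment_examples and the variable names are invented for the example.

```ruby
# Hypothetical helper: returns the results of a few multiple-assignment
# forms so they can be inspected side by side.
def assignment_examples
  lone, missing = 'uno'            # fewer values than variables: missing is nil
  first, second = ['uno', 'dos']   # a single array is decomposed implicitly
  head, *rest   = [1, 2, 3]        # a splat collects whatever is left over
  a, (b, c)     = 1, [2, 3]        # parentheses decompose a nested array

  { lone: lone, missing: missing, first: first, second: second,
    head: head, rest: rest, nested: [a, b, c] }
end

p assignment_examples
```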

Ruby element match

I am trying to find the index of the first and second instance of a string variable. I want to be able to use any predefined string variable but when I try to do that it gives me an error. I want to be able to declare multiple string variables like ss, aa, ff, etc and use them in place of xx. Can someone help me out?
#aa is a predefined array
xx = "--help--"
find_xx_instance = aa.each_with_index.select{|i,idx| i =~ /xx/}
#/--help--/works but not /xx/
find_xx_instance.map! {|i| i[1]}
#gives me info between the first two instances of string
puts aa[find_xx_instance[0]+1..find_xx_instance[1]-1]
As far as I understand, you just need to interpolate the variable into the regular expression. Try this:
find_xx_instance = aa.each_with_index.select{|i,idx| i =~ /#{xx}/}
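One caveat with plain interpolation is that any regex metacharacters in the variable are interpreted as part of the pattern. A possible way around that, sketched here under the assumption that the string should be matched literally, is Regexp.escape. The helper name lines_between and the sample array are made up for illustration.

```ruby
# Hypothetical helper: returns the elements between the first two
# elements of lines that contain marker (matched literally).
def lines_between(lines, marker)
  pattern = /#{Regexp.escape(marker)}/
  first, second = lines.each_index.select { |i| lines[i] =~ pattern }
  return [] if second.nil?          # fewer than two matches
  lines[(first + 1)...second]
end

aa = ["intro", "--help--", "usage", "options", "--help--", "footer"] # sample data
lines_between(aa, "--help--")                          #=> ["usage", "options"]
lines_between(["a+b", "x", "aab", "y", "a+b"], "a+b")  #=> ["x", "aab", "y"]
```

Without the escape, the second call would treat + as a quantifier and match "aab" as well.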
I have assumed you are given an array of strings, arr, a string str, and an integer nbr, and wish to return an array containing the indices of the first nbr instances of str in arr.
For example:
arr = %w| Now is the time for the Zorgs to attack the Borgs |
#=> ["Now", "is", "the", "time", "for", "the", "Zorgs", "to", "attack", "the", "Borgs"]
str = "the"
nbr = 2
This is one way:
b = arr.each_index.select { |i| arr[i]==str }
#=> [2, 5, 9]
b.first(nbr)
#=> [2, 5]
which can be written
arr.each_index.select { |i| arr[i]==str }.first(nbr)
For small problems like this one, that's fine, but if arr is large, it would be better to terminate the calculations after nbr instances of str have been found. We can do that by creating a Lazy enumerator:
arr.each_index.lazy.select { |i| arr[i]==str }.first(nbr)
#=> [2, 5]
Here's a second example that clearly illustrates that lazy is stopping the calculations after nbr strings str in arr have been found:
(0..Float::INFINITY).lazy.select { |i| arr[i] == str }.first(nbr)
#=> [2, 5]
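To see the early termination directly, one can count how many comparisons each version performs. The following is a rough sketch; the helper name first_indices and its lazy: keyword are invented for this illustration.

```ruby
# Hypothetical helper: returns [indices, number_of_comparisons].
def first_indices(arr, str, nbr, lazy: false)
  checks = 0
  indices = arr.each_index
  indices = indices.lazy if lazy
  found = indices.select { |i| checks += 1; arr[i] == str }.first(nbr)
  [found, checks]
end

arr = %w[Now is the time for the Zorgs to attack the Borgs]
first_indices(arr, "the", 2)             #=> [[2, 5], 11]
first_indices(arr, "the", 2, lazy: true) #=> [[2, 5], 6]
```

The eager version compares all 11 elements before taking the first two results, whereas the lazy version stops as soon as the second match (index 5) has been found.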

How to create two separate arrays from one input?

DESCRIPTION:
The purpose of my code is to take as input a sequence of R's and C's and simply store each number that comes after a letter in its proper array.
For example, the input format is as follows: R1C4R2C5
Column Array: [4, 5] Row Array: [1, 2]
My problem is I am getting the output like this:
[" ", 1]
[" ", 4]
[" ", 2]
[" ", 5]
How do I get all the row integers following R into one array, and all the column integers following C into another, separate array? I do not want to create multiple arrays, just two.
Help!
CODE:
puts 'Please input: '
input = gets.chomp
word2 = input.scan(/.{1,2}/)
col = []
row = []
word2.each {|a| col.push(a.split(/C/)) if a.include? 'C' }
word2.each {|a| row.push(a.split(/R/)) if a.include? 'R' }
col.each do |num|
puts num.inspect
end
row.each do |num|
puts num.inspect
end
x = "R1C4R2C5"
col = []
row = []
x.chars.each_slice(2) { |u| u[0] == "R" ? row << u[1] : col << u[1] }
p col
p row
The main problem with your code is that you replicate operations for rows and columns. You want to write "DRY" code, which stands for "don't repeat yourself".
Starting with your code as the model, you can DRY it out by writing a method like this to extract the information you want from the input string, and invoke it once for rows and once for columns:
def doit(s, c)
...
end
Here s is the input string and c is the string "R" or "C". Within the method you want to extract substrings that begin with the value of c and are followed by digits. Your decision to use String#scan was a good one, but you need a different regex:
def doit(s, c)
s.scan(/#{c}\d+/)
end
I'll explain the regex, but let's first try the method. Suppose the string is:
s = "R1C4R2C5"
Then
rows = doit(s, "R") #=> ["R1", "R2"]
cols = doit(s, "C") #=> ["C4", "C5"]
This is not quite what you want, but easily fixed. First, though, the regex. The regex first looks for the character given by #{c}. #{c} interpolates the value of the variable c into the regex, which in this case will be "R" or "C". \d+ means that character must be followed by one or more digits 0-9, as many as are present before the next non-digit (here an "R" or "C") or the end of the string.
Now let's fix the method:
def doit(s, c)
a = s.scan(/#{c}\d+/)
b = a.map {|str| str[1..-1]}
b.map(&:to_i)
end
rows = doit(s, "R") #=> [1, 2]
cols = doit(s, "C") #=> [4, 5]
Success! As before, a => ["R1", "R2"] if c => "R" and a => ["C4", "C5"] if c => "C". a.map {|str| str[1..-1]} maps each element of a into a string comprising all characters but the first (e.g., "R12"[1..-1] => "12"), so we have b => ["1", "2"] or b => ["4", "5"]. We then apply map once again to convert those strings to their Integer equivalents (Fixnum in Ruby versions before 2.4). The expression b.map(&:to_i) is shorthand for
b.map {|str| str.to_i}
The last computed quantity is returned by the method, so if it is what you want, as it is here, there is no need for a return statement at the end.
This can be simplified, however, in a couple of ways. Firstly, we can combine the last two statements by dropping the last one and changing the one above to:
a.map {|str| str[1..-1].to_i}
which also gets rid of the local variable b. The second improvement is to "chain" the two remaining statements, which also rids us of the other temporary variable:
def doit(s, c)
s.scan(/#{c}\d+/).map { |str| str[1..-1].to_i }
end
This is typical Ruby code.
Notice that by doing it this way, there is no requirement for row and column references in the string to alternate, and the numeric values can have arbitrary numbers of digits.
Here's another way to do the same thing, that some may see as being more Ruby-like:
s.scan(/[RC]\d+/).each_with_object([[],[]]) {|n,(r,c)|
(n[0]=='R' ? r : c) << n[1..-1].to_i}
Here's what's happening. Suppose:
s = "R1C4R2C5R32R4C7R18C6C12"
Then
a = s.scan(/[RC]\d+/)
#=> ["R1", "C4", "R2", "C5", "R32", "R4", "C7", "R18", "C6", "C12"]
scan uses the regex /[RC]\d+/ to extract substrings that begin with 'R' or 'C' followed by one or more digits up to the next letter or end of the string.
b = a.each_with_object([[],[]]) {|n,(r,c)|(n[0]=='R' ? r : c) << n[1..-1].to_i}
#=> [[1, 2, 32, 4, 18], [4, 5, 7, 6, 12]]
The row values are given by [1, 2, 32, 4, 18]; the column values by [4, 5, 7, 6, 12].
Enumerable#each_with_object (v1.9+) is passed an array comprised of two empty arrays, [[],[]], which it hands to the block on every iteration and finally returns. The first subarray will contain the row values, the second, the column values. These two subarrays are represented by the block variables r and c, respectively.
The first element of a is "R1". This is represented in the block by the variable n. Since
"R1"[0] #=> "R"
"R1"[1..-1] #=> "1"
we execute
r << "1".to_i #=> [1]
so now
[r,c] #=> [[1],[]]
The next element of a is "C4", so we will execute:
c << "4".to_i #=> [4]
so now
[r,c] #=> [[1],[4]]
and so on.
rows, cols = "R1C4R2C5".scan(/R(\d+)C(\d+)/).flatten.partition.with_index {|_, index| index.even? }
> rows
=> ["1", "2"]
> cols
=> ["4", "5"]
Or
rows = "R1C4R2C5".scan(/R(\d+)/).flatten
=> ["1", "2"]
cols = "R1C4R2C5".scan(/C(\d+)/).flatten
=> ["4", "5"]
And to fix your code use:
word2.each {|a| col.push(a.delete('C')) if a.include? 'C' }
word2.each {|a| row.push(a.delete('R')) if a.include? 'R' }
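Yet another variant, offered only as a sketch, captures the letter and the digits separately and groups the matches by letter. The method name rows_and_cols is made up, and the code assumes the input contains only R and C entries followed by digits.

```ruby
# Hypothetical helper: returns [rows, cols] as arrays of integers.
def rows_and_cols(input)
  # scan with two capture groups yields pairs such as ["R", "1"], ["C", "4"]
  grouped = input.scan(/([RC])(\d+)/).group_by(&:first)
  rows = grouped.fetch("R", []).map { |_, digits| digits.to_i }
  cols = grouped.fetch("C", []).map { |_, digits| digits.to_i }
  [rows, cols]
end

rows, cols = rows_and_cols("R1C4R2C5")
rows #=> [1, 2]
cols #=> [4, 5]
```

Like the scan-based answers above, this does not require rows and columns to alternate and handles multi-digit numbers.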

Ruby inject with index and brackets

I am trying to clean up my code. The first version uses each_with_index. In the second version I tried to compact the code with the Enumerable.inject_with_index construct that I found here.
It works now, but it seems to me as obscure as the first version.
And even worse, I don't understand the brackets around element,index in
.. .inject(groups) do |group_container, (element,index)|
but they are necessary
What is the use of these brackets?
How can I make the code clear and readable?
FIRST VERSION -- WITH "each_with_index"
class Array
  # splits as good as possible to groups of same size
  # elements are sorted. I.e. low elements go to the first group,
  # and high elements to the last group
  #
  # the default for number_of_groups is 4
  # because the intended use case is
  # splitting statistic data in 4 quartiles
  #
  # a = [1, 8, 7, 5, 4, 2, 3, 8]
  # a.sorted_in_groups(3) # => [[1, 2, 3], [4, 5, 7], [8, 8]]
  #
  # b = [[7, 8, 9], [4, 5, 7], [2, 8]]
  # b.sorted_in_groups(2) {|sub_ary| sub_ary.sum } # => [ [[2, 8], [4, 5, 7]], [[7, 8, 9]] ]
  def sorted_in_groups(number_of_groups = 4)
    groups = Array.new(number_of_groups) { Array.new }
    return groups if size == 0
    average_group_size = size.to_f / number_of_groups.to_f
    sorted = block_given? ? self.sort_by {|element| yield(element)} : self.sort
    sorted.each_with_index do |element, index|
      group_number = (index.to_f / average_group_size).floor
      groups[group_number] << element
    end
    groups
  end
end
SECOND VERSION -- WITH "inject" AND index
class Array
  def sorted_in_groups(number_of_groups = 4)
    groups = Array.new(number_of_groups) { Array.new }
    return groups if size == 0
    average_group_size = size.to_f / number_of_groups.to_f
    sorted = block_given? ? self.sort_by {|element| yield(element)} : self.sort
    sorted.each_with_index.inject(groups) do |group_container, (element,index)|
      group_number = (index.to_f / average_group_size).floor
      group_container[group_number] << element
      group_container
    end
  end
end
What is the use of these brackets?
It's a very nice feature of ruby. I call it "destructuring array assignment", but it probably has an official name too.
Here's how it works. Let's say you have an array
arr = [1, 2, 3]
Then you assign this array to a list of names, like this:
a, b, c = arr
a # => 1
b # => 2
c # => 3
You see, the array was "destructured" into its individual elements. Now, to each_with_index. As you know, it's like a regular each, but also yields an index. inject doesn't care about any of this; it takes the input elements and passes them to its block as is. If an input element is an array (an element/index pair from each_with_index), then we can either take it apart in the block body
sorted.each_with_index.inject(groups) do |group_container, pair|
element, index = pair
# or
# element = pair[0]
# index = pair[1]
# rest of your code
end
Or destructure that array right in the block signature. Parentheses there are necessary to give ruby a hint that this is a single parameter that needs to be split in several.
Hope this helps.
lines = %w(a b c)
indexes = lines.each_with_index.inject([]) do |acc, (el, ind)|
acc << ind - 1 if el == "b"
acc
end
indexes # => [0]
What is the use of these brackets?
To understand the brackets, first you need to understand how destructuring works in Ruby. The simplest example I can think of is this:
1.8.7 :001 > [[1,3],[2,4]].each do |a,b|
1.8.7 :002 > puts a, b
1.8.7 :003?> end
1
3
2
4
You should know how each works, and that its block receives one parameter. So what happens when you declare two parameters? It takes the first element, [1,3], and tries to split (destructure) it in two, and the result is a=1 and b=3.
Now, inject passes two arguments to its block, so the parameter list usually looks like |a,b|. So by writing |group_container, (element,index)| we are in fact taking the first argument as is and destructuring the second into two others (so, if the second argument is [1,3], element=1 and index=3). The parentheses are needed because if we wrote |group_container, element, index| Ruby could not know whether we want to destructure the first or the second argument, so the parentheses serve as disambiguation.
(In fact, things work a bit differently under the hood, but let's set that aside for this question.)
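The difference the parentheses make can be checked with a small experiment. This is only a sketch; the two method names are invented for the comparison.

```ruby
# With parentheses: the second block argument is split into element and index.
def labels_with_parens(items)
  items.each_with_index.inject([]) do |memo, (element, index)|
    memo << "#{index}:#{element}"
  end
end

# Without parentheses: element receives the whole [element, index] pair
# and the third parameter is simply left as nil.
def pairs_without_parens(items)
  items.each_with_index.inject([]) do |memo, element, index|
    memo << [element, index]
  end
end

labels_with_parens(%w[a b])   #=> ["0:a", "1:b"]
pairs_without_parens(%w[a b]) #=> [[["a", 0], nil], [["b", 1], nil]]
```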
It seems there are already some answers with good explanations. I want to add some information regarding the "clear and readable" part.
Instead of the solution you chose, it is also possible to extend Enumerable and add this functionality.
module Enumerable
  # The block parameter is not needed but creates more readable code.
  def inject_with_index(memo = self.first, &block)
    skip = memo.equal?(self.first)
    index = 0
    self.each_entry do |entry|
      if skip
        skip = false
      else
        memo = yield(memo, index, entry)
      end
      index += 1
    end
    memo
  end
end
This way you can call inject_with_index like so:
# m = memo, i = index, e = entry
(1..3).inject_with_index(0) do |m, i, e|
puts "m: #{m}, i: #{i}, e: #{e}"
m + i + e
end
#=> 9
If you do not pass an initial value, the first element will be used as the memo, so the block is not executed for the first element.
In case someone is reading this in 2013 or later, you have each_with_object and with_index for your needs:
records.each_with_object({}).with_index do |(record, memo), index|
memo[record.uid] = "#{index} in collection"
end
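Here is a self-contained version of that idea that can be run as is. It is a sketch only: Record is a stand-in Struct, since the original snippet does not say what records contains, and index_by_uid is a made-up method name.

```ruby
# Stand-in for whatever objects the records collection holds (assumption).
Record = Struct.new(:uid)

# Hypothetical helper: maps each record's uid to its position in the collection.
def index_by_uid(records)
  records.each_with_object({}).with_index do |(record, memo), index|
    memo[record.uid] = index
  end
end

index_by_uid([Record.new("a1"), Record.new("b2")]) #=> {"a1"=>0, "b2"=>1}
```

Note the order inside the parentheses: each_with_object yields the element first and the memo second, and with_index then appends the index as a separate argument.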

Comparing strings of equal lengths and noting where the differences occur

Given two strings of equal length such that
s1 = "ACCT"
s2 = "ATCT"
I would like to find out the positions where these strings differ, so I have done this (please suggest a better way of doing it; I bet there is one):
z = s1.chars.zip(s2.chars).each_with_index.map{|(c1,c2),index| index+1 if c1!=c2}.compact
z is an array of the (1-based) positions where the two strings are different. In this case z is [2].
Imagine that I add a new string
s3 = "AGCT"
and I wish to compare it with the others and see where the 3 strings differ. We could take the same approach as above, but this time
s1.chars.zip(s2.chars,s3.chars)
returns an array of arrays:
#=> [["A", "A", "A"], ["C", "T", "G"], ["C", "C", "C"], ["T", "T", "T"]]
With two strings I was relying on simply comparing two chars for equality, but as I add more strings, and as the strings become longer, it starts to become overwhelming.
Running
s1.chars.zip(s2.chars,s3.chars).map{|item| item.uniq}
#=> [["A"], ["C", "T", "G"], ["C"], ["T"]]
can help reduce redundancy: positions where all the strings agree become subarrays of size 1. I could then print out the indices and contents of the subarrays that are of size > 1.
s1.chars.zip(s2.chars,s3.chars).map{|item| item.uniq}.each_with_index.map{|a,index| [index+1,a] unless a.size== 1}.compact.map{|h| Hash[*h]}
#=> [{2=>["C", "T", "G"]}]
I feel that this will grind to a halt or get slow as I increase the number of strings and as the strings get longer. What are some alternative ways of doing this efficiently?
Thank you.
Here's where I'd start. I'm purposely using different strings to make it easier to see the differences:
str1 = 'jackdaws love my giant sphinx of quartz'
str2 = 'jackdaws l0ve my gi4nt sphinx 0f qu4rtz'
To get the first string's characters:
str1.each_char.with_index.to_a - str2.each_char.with_index.to_a
=> [["o", 10], ["a", 19], ["o", 30], ["a", 35]]
To get the second string's characters:
str2.each_char.with_index.to_a - str1.each_char.with_index.to_a
=> [["0", 10], ["4", 19], ["0", 30], ["4", 35]]
There will be a slight slowdown as the strings get bigger, but it won't be bad.
EDIT: Added more info.
If you have an arbitrary number of strings, and need to compare them all, use Array#combination:
str1 = 'ACCT'
str2 = 'ATCT'
str3 = 'AGCT'
require 'pp'
pp [str1, str2, str3].combination(2).to_a
>> [["ACCT", "ATCT"], ["ACCT", "AGCT"], ["ATCT", "AGCT"]]
In the above output you can see that combination cycles through the array, returning the various n sized combinations of the array elements.
pp [str1, str2, str3].combination(2).map{ |a,b| a.each_char.with_index.to_a - b.each_char.with_index.to_a }
>> [[["C", 1]], [["C", 1]], [["T", 1]]]
Using combination's output you could cycle through the array, comparing all the elements against each other. So, in the above returned array, in the "ACCT" and "ATCT" pair, 'C' was the difference between the two, located at position 1 in the string. Similarly, in "ACCT" and "AGCT" the difference is "C" again, in position 1. Finally for 'ATCT' and 'AGCT' it's 'T' at position 1.
Because we already saw in the longer string samples that the code will return multiple changed characters, this should get you pretty close.
Solution 1
strings = %w[ACCT ATCT AGCT]
First, join the strings, and make a hash of all the positions for each character.
joined = strings.join
positions = (0...joined.length).group_by{|i| joined[i]}
# => {"A"=>[0, 4, 8], "C"=>[1, 2, 6, 10], "T"=>[3, 5, 7, 11], "G"=>[9]}
Then, group the indices by their corresponding position within each string, and remove the groups that contain as many indices as there are strings. This part is a variant of an algorithm that Jorg suggests.
length = strings.first.length
n = strings.length
diff = Hash[positions.map{|k, v|
[k, v.group_by{|i| i % length}.reject{|i, is| is.length == n}.keys]
}]
This will give something like:
diff
# => {"A"=>[], "C"=>[1], "T"=>[1], "G"=>[1]}
which means that "A" appears in the same positions in all strings, and "C", "T", and "G" differ at position 1 (counting from 0) of the strings.
If you simply want to know the positions where the strings differ, do
diff["G"] | diff["A"] | diff["C"] | diff["T"]
# or diff.values.flatten.uniq
# => [1]
Solution 2
Note that comparing s1 against each of the rest (s2, s3, ...) will suffice, provided you maintain an array of the indices where a pairwise comparison fails and keep adding indices to it.
length = s1.length
diff = []
[s2, s3, ...].each{|s| diff += (0...length).reject{|i| s1[i] == s[i]}}
Explanation in a bit more detail
Suppose
s1 = 'GGGGGGGGG'
s2 = 'GGGCGGCGG'
s3 = 'GGGAGGCGG'
After s1 and s2 are compared, we have the set of indices [3, 6] that represents where they differ. Now, when we bring s3 into consideration, it does not matter whether we compare it with s1 or with s2. If s1[i] and s2[i] are different, then i is already included in the set [3, 6], so it makes no difference whether either of them differs from s3[i]. On the other hand, if s1[i] and s2[i] are the same, it makes no difference which of them we compare with s3[i]. Therefore, pairwise comparison of s1 with s2, s3, ... is enough.
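The argument above can be turned into a small runnable method. This is a sketch rather than the answer's exact code: the name differing_positions is invented, and the method assumes every string has the same length as the first.

```ruby
# Hypothetical helper: 0-based positions at which the strings are not all equal.
# Only compares the first string against each of the others, as argued above.
def differing_positions(strings)
  first, *rest = strings
  (0...first.length).reject do |i|
    rest.all? { |s| s[i] == first[i] }
  end
end

differing_positions(%w[GGGGGGGGG GGGCGGCGG GGGAGGCGG]) #=> [3, 6]
differing_positions(%w[ACCT ATCT AGCT])                #=> [1]
```

Iterating over positions rather than over strings yields each differing index once, so no duplicate removal is needed.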
You almost certainly don't want to be doing this analysis with your own code. Rather, you want to be handing it off to an existing multiple sequence alignment tool, like Clustal.
I realise this is not an answer to your question, but I hope it's a solution to your problem!
