Assignment Branch Condition size is too high - ruby

I'm writing a method that takes a multiline string (logs) and writes new strings to an array.
def task_2(str)
result = []
str.each_line do |x|
ip = x[/^.* - -/]
datetime = x[/[\[].*[\]]/]
address = x[/T .* H/]
if !ip.nil? && !datetime.nil? && !address.nil?
result << datetime[1..-2] + ' FROM: ' + ip[0..-4] + 'TO:' + address[1..-3]
end
end
result
end
I need it to pass RuboCop analysis with the default configuration, but it reports AbcSize 18.68/15.
I'm sure that's because of the if..end statement, but how can I rewrite it?
Log example:
10.6.246.103 - - [23/Apr/2018:20:30:39 +0300] "POST /test/2/messages HTTP/1.1" 200 48 0.0498
10.6.246.101 - - [23/Apr/2018:20:30:42 +0300] "POST /test/2/run HTTP/1.1" 200 - 0.2277

The ABC size is calculated by doing the following:
√(assignments² + branches² + conditionals²)
Let's first have a look at the assignments:
result = []
ip = x[/^.* - -/]
datetime = x[/[\[].*[\]]/]
address = x[/T .* H/]
This leaves us with 4 assignments.
Next up, the branches. First I have to mention that most operators are methods (and thus count towards branches). For example, 1 + 1 could also be written as 1.+(1), because + is a method on integers. The same applies to string[regex], which could be written as string.[](regex), because [] is a method on strings. And !value could be written as value.!, because ! is a method on all objects.
With that out of the way let's count the branches.
str.each_line
x[/^.* - -/]
x[/[\[].*[\]]/]
x[/T .* H/]
!ip.nil? # counts for 2 (! and .nil?)
!datetime.nil? # counts for 2 (! and .nil?)
!address.nil? # counts for 2 (! and .nil?)
result << ...
datetime[1..-2]
ip[0..-4]
address[1..-3]
+ # 4 times in result << ... + ... + ....
This leaves us with 18 branches.
The last things to count are the conditionals. Since Ruby short-circuits the && and || operators, they count towards conditionals.
if
&& # 2 times
This leaves us with 3 conditionals.
√(4² + 18² + 3²) ≈ 18.68
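To double-check the arithmetic, here is a minimal sketch; `abc_size` is a hypothetical helper name (not part of RuboCop) that just applies the formula above to the counts derived so far.

```ruby
# Hypothetical helper: the ABC size is the length of the
# (assignments, branches, conditions) vector, rounded to two decimals.
def abc_size(assignments, branches, conditions)
  Math.sqrt(assignments**2 + branches**2 + conditions**2).round(2)
end

score = abc_size(4, 18, 3)
puts score       # prints 18.68
puts score > 15  # prints true: above the limit of 15 from the question
```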
Now that we understand where the number comes from, we can try to reduce it. The easiest way to reduce the ABC size is to reduce the component with the largest count, since that count is squared. In your case that is the branches. You already spotted where the issue lies in the question.
if !ip.nil? && !datetime.nil? && !address.nil?
result << datetime[1..-2] + ' FROM: ' + ip[0..-4] + 'TO:' + address[1..-3]
end
Could be simplified to:
if ip && datetime && address
result << "#{datetime[1..-2]} FROM: #{ip[0..-4]}TO:#{address[1..-3]}"
end
This takes away a total of 10 branches: three !something.nil? checks (each counts for 2, since ! and .nil? are both counted as branches) and four + calls.
Leaving you with:
√(4² + 8² + 3²) ≈ 9.43

def task_2(str)
result = []
str.each_line do |x|
ip = x[/^.* - -/]
datetime = x[/[\[].*[\]]/]
address = x[/T .* H/]
if ip && datetime && address
result << datetime[1..-2] + ' FROM: ' + ip[0..-4] + 'TO:' + address[1..-3]
end
end
result
end
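Having !variable.nil? is redundant. Basically, you are checking presence here, so the #present? method (from Rails' ActiveSupport) would suffice, but in plain Ruby any value other than nil or false is already truthy, so it is more idiomatic to just use the form I used in the if statement. Dropping the three ! and .nil? pairs removes 6 branches, bringing the score down to √(4² + 12² + 3²) = 13, which solves the ABC issue.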
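Combining both suggestions (truthiness checks instead of !x.nil?, and interpolation instead of String#+), a runnable sketch against the sample log from the question might look like this. The expected lines below were worked out from the regexes; note that ip[0..-4] keeps a trailing space, which is why "TO:" needs no leading space.

```ruby
# Refactored sketch: truthiness checks plus string interpolation.
def task_2(str)
  result = []
  str.each_line do |x|
    ip = x[/^.* - -/]
    datetime = x[/[\[].*[\]]/]
    address = x[/T .* H/]
    # nil (no match) is falsy, so this skips incomplete lines.
    if ip && datetime && address
      result << "#{datetime[1..-2]} FROM: #{ip[0..-4]}TO:#{address[1..-3]}"
    end
  end
  result
end

log = <<~LOG
  10.6.246.103 - - [23/Apr/2018:20:30:39 +0300] "POST /test/2/messages HTTP/1.1" 200 48 0.0498
  10.6.246.101 - - [23/Apr/2018:20:30:42 +0300] "POST /test/2/run HTTP/1.1" 200 - 0.2277
LOG

puts task_2(log)
```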

I don't use rubocop, but I did test the following with this data:
data = <<FILE
10.6.246.103 - - [23/Apr/2018:20:30:39 +0300] "POST /test/2/messages HTTP/1.1" 200 48 0.0498
10.6.246.101 - - [23/Apr/2018:20:30:42 +0300] "POST /test/2/run HTTP/1.1" 200 - 0.2277
12.55.123.255 - - Hello
FILE
Using String#gsub! and Enumerable#select (reports an AbcSize of 3):
def task_2(str)
str.each_line.select do |x|
# Without named groups
# x.gsub!(/\A([\d+\.\d+]+).*(?<=\[)(.*)(?=\]).*(?<=\s)((?:\/\w+)*?)(?=\s).*\z/m,
# '\2 FROM \1 TO \3')
x.gsub!(/\A(?<ip>[\d+\.\d+]+).*(?<=\[)(?<date_time>.*)(?=\]).*(?<=\s)(?<address>(?:\/\w+)*?)(?=\s).*\z/m,
'\k<date_time> FROM \k<ip> TO \k<address>')
end
end
task_2(data)
# => ["23/Apr/2018:20:30:39 +0300 FROM 10.6.246.103 TO /test/2/messages",
# "23/Apr/2018:20:30:42 +0300 FROM 10.6.246.101 TO /test/2/run"]
Here we are using String#gsub! with a pattern replacement, which returns nil if no replacement is made, so Enumerable#select rejects that line.
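The nil-or-receiver return value of gsub! is what makes this work. A tiny illustration with a throwaway pattern (not the log regex):

```ruby
# Lines produced by each_line are new strings, so mutating them is safe.
lines = "a1\nb\nc2\n".each_line

# gsub! returns the modified receiver, or nil when nothing matched,
# so select keeps exactly the lines that were rewritten.
kept = lines.select { |s| s.gsub!(/\d/, "#") }
p kept                           # prints ["a#\n", "c#\n"]

p "b".dup.gsub!(/\d/, "#")       # prints nil
```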
Similar solution, although likely less efficient, using String#match, Enumerable#map, and Array#compact (Reports AbcSize of 7.14)
def task_2(str)
str.each_line.map do |x|
match = x.match(/\A(?<ip>[\d+\.\d+]+).*(?<=\[)(?<date_time>.*)(?=\]).*(?<=\s)(?<address>(?:\/\w+)*?)(?=\s)/)
"#{match['date_time']} FROM #{match['ip']} TO #{match['address']}" if match
end.compact
end
Here we are using String#match to extract the match data and then confirming a match and outputting the desired format if there is a match. A string that does not match will output nil and thus we compact the Array to remove the nil values.
Another option could just be to scan the whole String all at once and break out the matching groups: (Reports AbcSize of 5)
def task_2(str)
str.scan(/^([\d+\.\d+]+).*(?<=\[)(.*)(?=\]).*(?<=\s)((?:\/\w+)*?)(?=\s).*$/)
.map {|a| "#{a[1]} FROM #{a[0]} TO #{a[2]}"}
end
You can get the last one as low as 2.24 via:
def task_2(str)
r = []
str.scan(/^([\d+\.\d+]+).*(?<=\[)(.*)(?=\]).*(?<=\s)((?:\/\w+)*?)(?=\s).*$/) do |ip, date_time, address |
r << "#{date_time} FROM #{ip} TO #{address}"
end
r
end
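Why can the block take |ip, date_time, address|? When the pattern has capture groups, String#scan produces one array of captures per match, and a block receiving that array destructures it into its parameters. A minimal illustration with a throwaway pattern:

```ruby
# Without a block: one array of captures per match.
pairs = "a1 b2".scan(/(\w)(\d)/)
p pairs   # prints [["a", "1"], ["b", "2"]]

# With a block: each capture array is destructured into the parameters.
joined = []
"a1 b2".scan(/(\w)(\d)/) { |letter, digit| joined << "#{digit}#{letter}" }
p joined  # prints ["1a", "2b"]
```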

Any time I run into an ABC-too-high warning (or similar complexity/length warnings), I'm pretty quick to just chop the method up. Readability, testability, and maintainability almost always improve.
The quickest way is to yank out the body of a loop or conditional into a new method. Repeat as needed until you can read each method in one breath.
Similarly, if you've got big complex conditionals/loop constructs, pull that out to a new method as well.
Combining those two strategies enough times will reduce any method into roughly two method calls. That might be a bit overzealous in some cases...but it's never too far.
Here's one way you could apply that strategy to your code:
def task_2(str)
result = []
str.each_line do |x|
ip, datetime, address = parse_line(x)
if [ip, datetime, address].all?
result << "#{datetime[1..-2]} FROM: #{ip[0..-4]}TO:#{address[1..-3]}"
end
end
result
end
def parse_line(x)
ip = x[/^.* - -/]
datetime = x[/[\[].*[\]]/]
address = x[/T .* H/]
return [ip, datetime, address]
end
s =<<EOF
123.123.123.999 - - [2009-12-31 13:13:13] T www.google.com H"
456.456.456.999 - - [2009-12-31 13:13:13] 404"
678.678.678.999 - - [2009-12-31 13:13:13] T www.amazon.com H"
EOF
puts task_2(s)
Produces the output:
2009-12-31 13:13:13 FROM: 123.123.123.999 TO: www.google.com
2009-12-31 13:13:13 FROM: 678.678.678.999 TO: www.amazon.com
If you wanted to go even further, you could pull the body of each_line out to a new method, process_line, etc. And if you created a class, you could avoid the messy (to my eye) multi-value returns.
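One way to sketch that class-based idea, assuming a Struct is enough: `LogLine`, `parse`, and `complete?` are hypothetical names, and the regexes and slicing are the ones from the code above, so the output matches the two lines shown there.

```ruby
# Hypothetical value object: parse a line once and expose named fields
# instead of returning three values from parse_line.
LogLine = Struct.new(:ip, :datetime, :address) do
  def self.parse(line)
    new(line[/^.* - -/], line[/[\[].*[\]]/], line[/T .* H/])
  end

  # True only when all three regexes matched.
  def complete?
    to_a.all?
  end

  def to_s
    "#{datetime[1..-2]} FROM: #{ip[0..-4]}TO:#{address[1..-3]}"
  end
end

def task_2(str)
  str.each_line.map { |line| LogLine.parse(line) }.select(&:complete?).map(&:to_s)
end

s = <<~EOF
  123.123.123.999 - - [2009-12-31 13:13:13] T www.google.com H"
  456.456.456.999 - - [2009-12-31 13:13:13] 404"
  678.678.678.999 - - [2009-12-31 13:13:13] T www.amazon.com H"
EOF

puts task_2(s)
```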

This is a problem where it is convenient to use named capture groups.
R = /
(?= # begin a positive lookahead
(?<ip>.*\s-\s-) # match the string in a capture group named 'ip'
) # end positive lookahead
(?= # begin a positive lookahead
.* # match any number of characters
(?<datetime>[\[].*[\]]) # match the string in a capture group named 'datetime'
) # end positive lookahead
(?= # begin a positive lookahead
.* # match any number of characters
(?<address>T\s.*\sH) # match the string in a capture group named 'address'
) # end positive lookahead
/x # free-spacing regex definition mode
def task_2(str)
str.each_line.with_object([]) do |s, result|
m = s.match(R)
result << m[:datetime][1..-2] + ' FROM: ' + m[:ip][0..-4] +
'TO:' + m[:address][1..-3] unless m.nil?
end
end
str =<<_
123.123.123.999 - - [2009-12-31 13:13:13] T www.google.com H"
456.456.456.999 - - [2009-12-31 13:13:13] 404"
678.678.678.999 - - [2009-12-31 13:13:13] T www.amazon.com
_
task_2 str
#=> ["2009-12-31 13:13:13 FROM: 123.123.123.999 TO: www.google.com"]
# The second and third lines contain no "T ... H" address, so they are skipped.
The regular expression is conventionally written as follows.
R = /(?=(?<ip>\A.* - -))(?=.*(?<datetime>[\[].*[\]]))(?=.*(?<address>T .* H))/
Notice that where I have spaces here I had whitespace characters (\s) when writing the regex in free-spacing mode. That's because in free-spacing mode spaces are stripped out before the expression is evaluated. Alternatively, spaces can be preserved in free-spacing mode by enclosing them in character classes ([ ]).
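A quick way to see the space-stripping rule in action, using throwaway patterns rather than the log regex:

```ruby
# In free-spacing (x) mode a literal space is ignored...
p(/a b/x.match?("ab"))    # prints true
p(/a b/x.match?("a b"))   # prints false
# ...so whitespace has to be written as \s or kept inside a character class.
p(/a\sb/x.match?("a b"))  # prints true
p(/a[ ]b/x.match?("a b")) # prints true
```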

Related

Ruby - Issues iterating through multiple strings and populating arrays

I have a response variable that contains the following lines (this output comes from a telnet session, which I'm not showing in the code below for the sake of simplicity):
mydummyhost# show ip bgp 43.245.43.105
BGP routing table entry for 43.245.43.0/24
Paths: (2 available, best #1, table Default-IP-Routing-Table)
Not advertised to any peer
38561 2914 55432, (aggregated by 55532 202.68.67.134)
202.158.215.44 from 201.158.202.44 (202.158.215.62)
Community: 7575:1002 7575:2462 7575:3002 7575:6001 7575:8001
Last update: Tue Sep 22 12:25:17 2020
38561 2914 55433, (aggregated by 55433 202.68.67.135)
202.158.215.52 from 202.158.215.52 (202.158.215.62)
Community: 7575:1002 7575:2462 7575:3002 7575:6001 7575:8001
Last update: Mon Sep 21 06:44:58 2020
I have a piece of code that I'm using to iterate through the lines above and get the following result:
Required result:
43.245.43.105 is domestic peering (On-Net) originated by AS 55432 via path 38561 2914 55432
43.245.43.105 is domestic peering (On-Net) originated by AS 55433 via path 38561 2914 55433
Code:
#!/usr/bin/env ruby
require 'net/telnet'
ipaddress = "43.245.43.105"
cat = []
response = ""
origin = []
paths = []
net = []
community = []
onoffnet= {
ond:"domestic (On-Net)",
oni:"international research (On-Net)",
opd:"domestic peering (On-Net)",
ofd:"domestic transit (Off-Net)",
opi:"international peering (Off-Net)",
ofi:"international transit (Off-Net)"
}
response.each_line do |line|
if line =~ /BGP routing table entry for (\d+\.\d+\.\d+\.\d+.*)/
net[i] = $1
elsif line =~ /Community: (.*)$/
community[i] = $1
elsif line =~ /^\s+([0-9\ ]+),.*/
paths, aggregatedBy = line.split(", ")
paths[i] = paths.strip
origin[i] == aggregatedBy.split(" ")[2]
elsif line =~ /Last update:/
i += 1
end
end
if i == 0
print "ERROR, no data found for the IP."
else
i = 0
net.each do | ip |
if community[i] =~ /7575\:1000/
cat[i] = onoffnet.fetch(:ond)
elsif community[i] =~ /7575\:1001/
cat[i] = onoffnet.fetch(:oni)
elsif community[i] =~ /7575\:1002/
if community[i] =~ /7575\:6001/
cat[i] = onoffnet.fetch(:opd)
else
cat[i] = onoffnet.fetch(:opi)
end
elsif community[i] =~ /7575\:1003/
if community[i] =~ /7575\:6001/
cat[i] = onoffnet.fetch(:ofd)
else
cat[i] = onoffnet.fetch(:ofi)
end
end
i += 1
if origin[i].to_s.length > 0 && paths[i].to_s.length > 0
puts "#{ipaddress} is cat[i] network ip[i] originated by AS #{origin} via path #{paths} ."
else
puts #{ipaddress} + "is" + cat[i] + "network" + ip[i] + "\n"
puts "Test"
end
end
end
When I run this, only "Test" is shown as output:
[root@mydummyhost]# ./telnet.rb
Test
Basically, I'm trying to use line =~ /Last update:/ to determine that I have another separate set of paths and origin to display, by incrementing i, if that makes sense.
So I know it reached that part of the code; however, I'm not sure why it's not showing the first print line, puts #{ipaddress} + "is" + cat[i] + "network" + ip[i] + "\n", which appears to be empty.
EDIT:
After changing puts #{ipaddress} + "is" + cat[i] + "network" + ip[i] + "\n" to puts "#{ipaddress} is #{cat[i]} network #{ip[i]}\n" (as suggested by @trueunlessfalse, thanks!), I can now see some output:
[root@dummyhost]# ./telnet.rb
43.245.43.105 is network 3
Unfortunately, this isn't even close to the required result I'm looking for.
I know the problem is somehow with the way I iterate over the response variable and populate the arrays (I have Perl code that works and am trying to convert it to Ruby), but I'm not sure how to fix this or what a better way of going through that output would be.
Any suggestions how I should iterate response so the arrays can be populated based on the i value that gets incremented when it comes to the end of the "block" marked by Last update:?
Thanks
J
The expected result appears to depend on the following types of values contained in the given string:
BGP ip (e.g., "43.245.43.105")
path values (e.g., "38561 2914 55432")
originated values (e.g., "38561")
community values (e.g., "7575:1002 7575:2462 7575:3002 7575:6001 7575:8001")
I suggest you first concentrate on extracting these values, after which it is fairly straightforward to construct the desired strings. My answer is limited to this initial task.
To produce some numbers I will first construct your string response.
response =<<~BITTER_END
mydummyhost# show ip bgp 43.245.43.105
BGP routing table entry for 43.245.43.0/24
Paths: (2 available, best #1, table Default-IP-Routing-Table)
Not advertised to any peer
38561 2914 55432, (aggregated by 55532 202.68.67.134)
202.158.215.44 from 201.158.202.44 (202.158.215.62)
Community: 7575:1002 7575:2462 7575:3002 7575:6001 7575:8001
Last update: Tue Sep 22 12:25:17 2020
38561 2914 55433, (aggregated by 55433 202.68.67.135)
202.158.215.52 from 202.158.215.52 (202.158.215.62)
Community: 7575:1002 7575:2462 7575:3002 7575:6001 7575:8001
Last update: Mon Sep 21 06:44:58 2020
BITTER_END
I don't know much about Telnet, but it appears response contains a block of data that begins with a line starting with 'mydummyhost'. I have written this to permit multiple such blocks (each beginning with a line that starts with 'mydummyhost'), so as a first step I will apply String#scan with a regular expression as follows.
arr = response.scan(/^mydummyhost\D+.+?(?=\z|^mydummyhost)/m)
#=> ["mydummyhost# show ip bgp 43.245.43.105\nBGP routing table entry for 43.245.43.0/24\nPaths: (2 available, best #1, table Default-IP-Routing-Table)\n Not advertised to any peer\n 38561 2914 55432, (aggregated by 55532 202.68.67.134)\n 202.158.215.44 from 201.158.202.44 (202.158.215.62)\n Community: 7575:1002 7575:2462 7575:3002 7575:6001 7575:8001\n Last update: Tue Sep 22 12:25:17 2020\n\n 38561 2914 55433, (aggregated by 55433 202.68.67.135)\n 202.158.215.52 from 202.158.215.52 (202.158.215.62)\n Community: 7575:1002 7575:2462 7575:3002 7575:6001 7575:8001\n Last update: Mon Sep 21 06:44:58 2020\n"]
We can write the regular expression in free-spacing mode to make it self-documenting.
/
^mydummyhost\D+ # match 'mydummyhost' at the beginning of a line followed
# by 1+ characters other than digits (\D)
.+ # match 1+ characters, including line terminators
? # make previous match lazy (aka non-greedy)
(?= # begin a positive lookahead
\z # match end of string
| # or
^mydummyhost # match '^mydummyhost' at the beginning of a line
) # end positive lookahead
/mx # specify multiline (m) and free-spacing regex definition modes
Multiline mode (called single-line or dot-all mode in most other regex flavors) causes the dot to match line terminators (\n and \r\n) as well as other characters.
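A two-line check of what the m flag changes:

```ruby
text = "a\nb"
p text.match?(/a.b/)   # prints false: by default . does not match "\n"
p text.match?(/a.b/m)  # prints true: Ruby's m flag lets . match "\n"
```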
You will see here that arr contains a single element. The next step is to map arr to values for each block. To simplify the presentation I will assume the string contains one block, namely, response, but it should be evident how it could be generalized.
Extract the BGP ip
bgp_rgx = /^mydummyhost\D+\K\d{1,3}(?:\.\d{1,3}){3}$/
bgp_ip = response[bgp_rgx]
#=> "43.245.43.105"
See String#[]. The regex in free-spacing mode:
bgp_rgx =
/
^mydummyhost\D+ # match 'mydummyhost' at the beginning of a line (^),
# followed by 1+ characters other than digits (\D)
\K # reset the beginning of the match to the current location
# and discard any previously-matched characters from the
# match that is returned
\d{1,3} # match 1-3 digits
(?:\.\d{1,3}) # match '.' followed by 1-3 digits, save to a
# non-capture group
{3} # execute the foregoing non-capture group 3 times
$ # match end of line
/x # specify free-spacing regex definition mode
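The effect of \K is easiest to see on a toy string (not the telnet output):

```ruby
line = "price: 42"
p line[/price: \K\d+/]  # prints "42": \K drops "price: " from the returned match
p line[/price: \d+/]    # prints "price: 42"
```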
Extract the path values
path_rgx = /(?<=^ {2})\d+(?: +\d+){2}(?=,)/
paths = response.scan(path_rgx)
#=> ["38561 2914 55432", "38561 2914 55433"]
In free-spacing mode:
path_rgx =
/
(?<=^[ ]{2}) # use a positive lookbehind (?<=...) to assert that the
# match that follows is preceded by two spaces at the
# beginning of a line
\d+ # match 1+ digits
(?:[ ]+\d+) # match 1+ spaces followed by 1+ digits, save to non-capture group
{2} # execute the foregoing non-capture group 2 times
(?=,) # use a positive lookahead (?=...) to assert that the
# preceding match is followed by ','
/x # specify free-spacing regex definition mode
Note that when writing regular expressions in free-spacing mode, all spaces are removed before the expression is parsed. It is necessary, therefore, to protect all spaces that should not be stripped out. I've done that by putting a space character in a character class ([ ]). There are other ways to protect spaces, but that's not important here.
Extract the originated values from the path values
originated = paths.map { |s| s[/\d+/] }
#=> ["38561", "38561"]
The regex reads, "match one or more digits".
Extract the community values
community_rgx = /^ {6}Community: +\K\d+:\d+(?: +\d+:\d+)+/
community = response.scan(community_rgx)
#=> ["7575:1002 7575:2462 7575:3002 7575:6001 7575:8001",
# "7575:1002 7575:2462 7575:3002 7575:6001 7575:8001"]
In free-spacing mode:
community_rgx =
/
^[ ]{6} # match 6 spaces at beginning of a line
Community:[ ]+ # match 'Community:' followed by 1+ spaces
\K # reset the beginning of the match to the current location
# and discard any previously-matched characters from the
# match that is returned
\d+:\d+ # match 1+ digits, ':', 1+ digits
(?:[ ]+\d+:\d+) # match 1+ spaces, 1+ digits, ':', 1+ digits, save
# to a non-capture group
+ # execute the foregoing non-capture group 1+ times
/x # specify free-spacing regex definition mode
Combine values into a hash (optional)
params = {
bgp_ip: bgp_ip,
values: originated.zip(paths, community).map do |o,p,c|
{ originated: o, path: p, community: c }
end
}
#=> {:bgp_ip=>"43.245.43.105",
# :values=>[
# {:originated=>"38561", :path=>"38561 2914 55432",
# :community=>"7575:1002 7575:2462 7575:3002 7575:6001 7575:8001"},
# {:originated=>"38561", :path=>"38561 2914 55433",
# :community=>"7575:1002 7575:2462 7575:3002 7575:6001 7575:8001"}
# ]
# }
See Array#zip.
Again, if the string contains multiple blocks, an array of hashes such as params would be returned.
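For the remaining step of building the desired strings, here is a sketch under stated assumptions: paths and communities are arrays like the ones extracted above, the category rules are transcribed from the question's if/elsif chain, and `category` and `describe` are hypothetical names. The required result names the last AS of each path (55432, 55433), so this sketch takes the origin from the end of the path.

```ruby
# Labels copied from the question's onoffnet hash.
ONOFFNET = {
  ond: "domestic (On-Net)",
  oni: "international research (On-Net)",
  opd: "domestic peering (On-Net)",
  ofd: "domestic transit (Off-Net)",
  opi: "international peering (Off-Net)",
  ofi: "international transit (Off-Net)"
}.freeze

# Same decision order as the question's if/elsif chain on the community string.
def category(community)
  peering = community.include?("7575:6001")
  if community.include?("7575:1000")
    ONOFFNET[:ond]
  elsif community.include?("7575:1001")
    ONOFFNET[:oni]
  elsif community.include?("7575:1002")
    ONOFFNET[peering ? :opd : :opi]
  elsif community.include?("7575:1003")
    ONOFFNET[peering ? :ofd : :ofi]
  end
end

# Hypothetical helper: one line per path; the origin AS is the last
# number of the path.
def describe(bgp_ip, paths, communities)
  paths.zip(communities).map do |path, community|
    "#{bgp_ip} is #{category(community)} originated by AS #{path.split.last} via path #{path}"
  end
end

paths = ["38561 2914 55432", "38561 2914 55433"]
communities = ["7575:1002 7575:2462 7575:3002 7575:6001 7575:8001"] * 2
puts describe("43.245.43.105", paths, communities)
```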
Update: This answer does not address all the issues with the author's code, only the question of why one of the puts calls gives no output.
I would have expected this line:
puts #{ipaddress} + "is" + cat[i] + "network" + ip[i] + "\n"
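to throw an error, because I had never before tried to interpolate without first opening a string. Outside a string literal, however, # starts a comment, so Ruby silently ignores everything after it on that line.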
I tried this in the console, and indeed it just prints an empty line:
irb(main):002:0> a = "foo"
=> "foo"
irb(main):003:0> puts #{a}
=> nil
However this works as expected:
irb(main):004:0> puts "#{a}"
foo
=> nil
Try rewriting this line as:
puts "#{ipaddress} is #{cat[i]} network #{ip[i]}\n"
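The same behaviour can be checked outside irb. This sketch captures what puts writes by temporarily swapping $stdout for a StringIO; `capture` is a hypothetical helper name.

```ruby
require "stringio"

# Hypothetical helper: run the block and return whatever it printed.
def capture
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original
end

a = "foo"

bare = capture do
  puts #{a} <- outside a string, everything from # on is a comment
end

quoted = capture do
  puts "#{a}"
end

p bare    # prints "\n"
p quoted  # prints "foo\n"
```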

How can I transform "email#domain.com" to "em***#domain.com"?

I want to transform emails so that only the first two characters are shown, the rest are replaced with '*', and the '#domain.com' part is kept as it is. How can I do it?
I already have this code, which works as expected, but I would like a shorter way to do it. I don't know if there is a way to do it with the gsub method and some ranges; any help will be appreciated.
email = 'email#domain.com'
splitted_email = email.partition('#')
splitted_email.first[2...email.size-2]
splitted_email[0] = splitted_email.first[0...2] + ('*' *(splitted_email.first.size-2))
splitted_email.join
=> em***#domain.com
You may use a regex for this. A more or less readable one is
.gsub(/(?<=\A[^#]{2})[^#]+/) { |m| "*" * m.length }
It matches:
(?<=\A[^#]{2}) - a positive lookbehind that matches a location immediately preceded by:
\A - string start
[^#]{2} - two chars other than #
[^#]+ - any 1+ chars other than #.
The replacement is the block where we can manipulate the match: the * char is repeated the same number of times as the length of the match value.
See the Ruby demo:
email = 'email#domain.com'
p email.gsub(/(?<=\A[^#]{2})[^#]+/) { |m| "*" * m.length }
# => "em***#domain.com"
email.gsub(/(^.{2})[^#]*(#.*)$/,'\1***\2')
But it does not work when there is only one character before the #.
One possible solution using Object#then:
email.split('#').then { |a, b| a[0..-4] + '*' * 3 + '#' + b }
or
email.partition('#').then { |a, b, c| a[0..-4]+ '*' * 3 + b + c }
For the following cases it returns
email = 'email#domain.com' #=> "em***#domain.com"
email = 'emai#domain.com' #=> "e***#domain.com"
email = 'ema#domain.com' #=> "***#domain.com"
email = 'e#domain.com' #=> "***#domain.com"
But I don't know how you intend to handle a local part shorter than five characters, where fewer than two characters stay visible.
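If short local parts should still show their first two characters, a partition-based sketch that derives the number of stars from the length avoids that edge case. `mask_local` and its keyword arguments are hypothetical; like the rest of this question, it treats '#' as the separator.

```ruby
# Hypothetical helper: keep the first `keep` characters before the
# separator, mask the rest with '*', and leave the domain untouched.
def mask_local(email, keep: 2, sep: "#")
  local, separator, domain = email.partition(sep)
  hidden = [local.size - keep, 0].max
  local[0, keep] + "*" * hidden + separator + domain
end

puts mask_local("email#domain.com") # prints em***#domain.com
puts mask_local("emai#domain.com")  # prints em**#domain.com
puts mask_local("e#domain.com")     # prints e#domain.com
```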

Ruby, looping through a string deleting groups of characters until a desired output is achieved

I have a coding problem I solved and want to refactor. I know there has to be a cleaner way of doing what I did.
The goal is to write a method that takes a string of "!" and "?" and reduces the string by eliminating all odd groupings of each symbol.
Example - a string "????!!!" would have an odd grouping of "!!!" because there are three in a row. These would be deleted from the string.
If there is only one "!" or "?", it's left alone because it is not in a group.
Ex -
remove("!????!!!?") answer == "!"
# => ("!????!!!?" --> "!?????" --> "!")
In the first string, the only odd grouping is "!!!"; once it is removed, the new string has an odd grouping "?????". Removing that next odd grouping leaves "!", which is the desired output.
Another example
remove("!???!!") == ""
# => ("!???!!" --> "!!!" --> "")
Current code:
def remove(s)
arr = [s]
i = 0
until i == arr[0].length
s = s.chars.chunk{|c|c}.map{ |n,a| a.join }.select{|x| x if x.length.even? || x.length <= 1}.join
arr << s
i += 1
end
return arr[-1]
end
My code solves this problem and all test cases. I have a suspicion that my until loop can be removed/refactored so that I could solve this problem in one line and have spent hours trying to figure it out with no luck.
Suppose
str = "???!!!???!"
If we first remove the two groups "???" we are left with "!!!!", which cannot be reduced further.
If we first remove the group "!!!" we are left with "??????!", which cannot be reduced further.
If we are permitted to remove all odd groups of either character without reference to the effect that either has on the other, we obtain !, which cannot be reduced further.
It's not clear what rule is to be used. Here are three possibilities and code to implement each.
I will use the following two regular expressions, and in the first two cases a helper method.
Rq = /
(?<!\?) # do not match a question mark, negative lookbehind
\? # match a question mark
(\?{2})+ # match two question marks one or more times
(?!\?) # do not match a question mark, negative lookahead
/x # free-spacing regex definition mode
which is commonly written /(?<!\?)\?(\?{2})+(?!\?)/.
Similarly,
Rx = /(?<!!)!(!{2})+(?!!)/
def sequential(str, first_regex, second_regex)
s = str.dup
loop do
size = s.size
s = s.gsub(first_regex,'').gsub(second_regex,'')
return s if s.size == size
end
end
I apply each of the three methods below to two example strings:
str1 = "???!!!???!"
str2 = 50.times.map { ['?', '!'].sample }.join
#=> "?!!!?!!!?!??????!!!?!!??!!???!?!????!?!!!?!?!???!?"
Replace all odd groups of "?" then odd groups of "!" then repeat until no further removals are possible
def question_before_exclamation(str)
sequential(str, Rq, Rx)
end
question_before_exclamation str1 #=> "!!!!"
question_before_exclamation str2 #=> "??!??!?!!?!?!!?"
Replace all odd groups of "!" then odd groups of "?" then repeat until no further removals are possible
def exclamation_before_question(str)
sequential(str, Rx, Rq)
end
exclamation_before_question str1 #=> "??????!"
exclamation_before_question str2 #=> "??!????!!?!?!!?!?!!?"
Replace all odd groups of both "?" and "!" then repeat until no further removals are possible
Rqx = /#{Rq}|#{Rx}/
#=> /(?-mix:(?<!\?)\?(\?{2})+(?!\?))|(?-mix:(?<!!)!(!{2})+(?!!))/
def question_and_exclamation(str)
s = str.dup
loop do
size = s.size
s = s.gsub(Rqx,'')
return s if s.size == size
end
end
question_and_exclamation str1 #=> "!"
question_and_exclamation str2 #=> "??!?!!?!?!!?!?!!?"
I don't know the exact Ruby syntax for this, but you could simplify your solution by using regular expressions:
1. Gather all matches of consecutive characters.
2. If all matches are of even length or of length 1, exit.
3. Test whether each match has odd length:
   - if it does, replace it with the empty string;
   - otherwise, do nothing.
4. Go to step 1.
A solution in Perl would be:
#!perl
use strict;
use warnings;
use feature qw(say);
my $string = '!????!!!?';
sub reduce {
my ($s) = @_;
while ( my @matches = $s =~ m/((.)\2+)/g ) {
last if ! grep { length($_) > 1 && length($_) % 2 == 1 } @matches;
foreach my $match ( @matches ) {
$s =~ s/\Q$match// if length($match) > 1 && length($match) % 2 == 1;
}
}
return $s;
}
say reduce($string);
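In Ruby the same steps can be written with String#scan. This is a sketch, not a line-by-line translation of the Perl: `reduce_runs` is a hypothetical name, and it collects every run (single characters included) with ((.)\2*) so the string can be rebuilt by joining the runs that survive.

```ruby
# Repeatedly drop runs of one repeated character whose length is odd
# and greater than 1, until nothing changes.
def reduce_runs(s)
  loop do
    # Gather maximal runs of a single repeated character.
    runs = s.scan(/((.)\2*)/).map(&:first)
    # Stop when no run is odd and longer than one character.
    return s if runs.none? { |r| r.size > 1 && r.size.odd? }
    # Drop the odd runs, keep the rest, and go around again.
    s = runs.reject { |r| r.size > 1 && r.size.odd? }.join
  end
end

p reduce_runs("!????!!!?") # prints "!"
p reduce_runs("!???!!")    # prints ""
```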
I could be wrong (this is ruby, after all) but I don't think you'll find a one-liner for this because ruby's utility functions generally aren't recursive. But you can use regex to simplify your logic, at the very least:
def remove(s)
while s =~ /(?<!\!)\!([\!]{2})+(?!\!)/ || s =~ /(?<!\?)\?([\?]{2})+(?!\?)/
s.gsub! /(?<!\!)\!([\!]{2})+(?!\!)/, "" # remove odd !
s.gsub! /(?<!\?)\?([\?]{2})+(?!\?)/, "" # remove odd ?
end
return s
end
To make the regex less mind-boggling, it helps to look at them with 'a' instead of '?' and '!':
/(?<!a)a([a]{2})+(?!a)/ #regex for 'a'
(?<!a) #negative lookbehind: the match cannot be preceded by an 'a'
a([a]{2})+ #the match should be an 'a' followed by 1 or more pairs
(?!a) #negative lookahead: the match cannot be followed by an 'a'
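A few probes make the behaviour of the 'a' version concrete:

```ruby
odd_a = /(?<!a)a([a]{2})+(?!a)/

p "aaa".match?(odd_a)     # prints true: a run of three
p "aaaaa".match?(odd_a)   # prints true: a run of five
p "aaaa".match?(odd_a)    # prints false: every candidate touches another 'a'
p "a".match?(odd_a)       # prints false: a lone 'a' needs at least one pair
p "baaab".gsub(odd_a, "") # prints "bb"
```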
It should be simple enough with a regular expression replacement
def remove(string)
begin
original = string.dup
string.gsub!(/(\!{3,})|(\?{3,})/) { |s| s.length.even? ? s : '' }
end until original == string
string
end
puts remove("!????!!!?").inspect # answer == "!"
puts remove("!???!!").inspect # answer == ""
puts remove("!????!!").inspect # answer == "!????!!"

How do I extract all possible methionine residues to the end from a protein sequence?

I am looking to extract, from a sequence, every stretch that runs from a methionine (M) residue to the end.
In the below sequence:
MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG
Original nucleotide sequence:
atgtttgaaatcgaagaacatatgaaggattcacaggtggaatacataattggccttcataatatcccattattgaatgcaactatttcagtgaagtgcacaggatttcaaagaactatgaatatgcaaggttgtgctaataaatttatgcaaagacattatgagaatcccctgacgggg
I want to extract from the sequence any M residue to the end, and obtain the following:
- MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG
- MKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG
- MNMQGCANKFMQRHYENPLTG
- MQGCANKFMQRHYENPLTG
- MQRHYENPLTG
With the data I am working with there are cases where there are a lot more "M" residues in the sequence.
The script I currently have is below. This script translates the genomic data first and then works with the amino acid sequences. This does the first two extractions but nothing further.
I have tried to repeat the same scan method after the second scan (See the commented part in the script below) but this just gives me an error:
private method scan called for #<Array:0x7f80884c84b0> No Method Error
I understand I need to make a loop of some kind and have tried, but all in vain. I have also tried matching but I haven't been able to do so - I think that you cannot match overlapping characters a single match method but then again I'm only a beginner...
So here is the script I'm using:
#!/usr/bin/env ruby
require "bio"
def extract_open_reading_frames(input)
file_output = File.new("./output.aa", "w")
input.each_entry do |entry|
i = 1
entry.naseq.translate(1).scan(/M\w*/i) do |orf1|
file_output.puts ">#{entry.definition.to_s} 5\'3\' frame 1:#{i}\n#{orf1}"
i = i + 1
orf1.scan(/.(M\w*)/i) do |orf2|
file_output.puts ">#{entry.definition.to_s} 5\'3\' frame 1:#{i}\n#{orf2}"
i = i + 1
# orf2.scan(/.(M\w*)/i) do |orf3|
# file_output.puts ">#{entry.definition.to_s} 5\'3\' frame 1:#{i}\n#{orf3}"
# i = i + 1
# end
end
end
end
file_output.close
end
biofastafile = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
extract_open_reading_frames(biofastafile)
The script has to be in Ruby since this is part of a much longer script that is in Ruby.
You can do:
str = "MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG"
str.scan(/(?=(M.*))./).flatten
#=> ["MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG", "MKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG", "MNMQGCANKFMQRHYENPLTG", "MQGCANKFMQRHYENPLTG", "MQRHYENPLTG"]
This works by capturing, inside a zero-width lookahead, everything from each M to the end, while the trailing . advances the scan one character at a time, so the captured suffixes can overlap.
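Wrapped in a method, the lookahead trick can be checked against a simpler index-based version. Both helper names are hypothetical and neither uses BioRuby; they take a plain protein string. The endless range protein[i..] needs Ruby 2.6 or later.

```ruby
# Every suffix that starts with M. The capture sits inside a zero-width
# lookahead, so successive matches may overlap; the trailing . moves on.
def m_suffixes(protein)
  protein.scan(/(?=(M.*))./).flatten
end

# Same result without a regex: slice from every index holding an M.
def m_suffixes_by_index(protein)
  protein.each_char.with_index.select { |c, _| c == "M" }.map { |_, i| protein[i..] }
end

seq = "MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG"
puts m_suffixes(seq)
p m_suffixes("MMMM") # prints ["MMMM", "MMM", "MM", "M"]
```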
str = "MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG"
pos = 0
while pos < str.size
if md = str.match(/M.*/, pos)
puts md[0]
pos = md.offset(0)[0] + 1
else
break
end
end
--output:--
MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG
MKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG
MNMQGCANKFMQRHYENPLTG
MQGCANKFMQRHYENPLTG
MQRHYENPLTG
md -- stands for the MatchData object.
match() -- returns nil if there is no match, the second argument is the start position of the search.
md[0] -- is the whole match (md[1] would be the first parenthesized group, etc.).
md.offset(n) -- returns an array containing the beginning and ending position in the string of md[n].
Running the program on the string "MMMM" produces the output:
MMMM
MMM
MM
M
I have also tried matching but I haven't been able to do so - I think
that you cannot match overlapping characters a single match method but
then again I'm only a beginner...
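Yes, that's true of ordinary matching: String#scan will not find overlapping matches on its own, because after scan finds a match, the search continues from the end of that match. Perl has some ways to make regexes back up; the zero-width lookahead with a capture group used in the answer above is one such trick, and Ruby supports it too.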
Edit:
For Ruby 1.8.7:
str = "MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG"
pos = 0
while true
str = str[pos..-1]
if md = str.match(/M.*/)
puts md[0]
pos = md.offset(0)[0] + 1
else
break
end
end

Replace the last match in a string

I'm playing around with Ruby to do some file versioning for me. I have a string 2.0.0.65. I split it up and increment the build number (65 --> 66), then I want to replace the 65 with the 66. In this replacement, though, I only want to replace the last match in the string. What's the best way to do this in Ruby?
version_text = IO.read('C:\\Properties')
puts version_text
version = version_text.match(/(\d+\.\d+\.\d+\.\d+)/)[1]
puts version
build_version = version.split('.')[3]
puts build_version
incremented_version = build_version.to_i + 1
puts incremented_version
...
If you just want to increment the integer at the very end of a string then try this:
s = '2.0.0.65'
s.sub(/\d+\Z/) {|x| x.to_i + 1} # => '2.0.0.66'
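As a usage sketch, here is the same sub call wrapped in a hypothetical `bump_build` helper. String#sub converts the block's Integer result with to_s, and the number simply grows when the build reaches 100.

```ruby
# Hypothetical helper around the one-liner above. \Z matches at the end of
# the string or just before a final newline, so text read from a file with
# a trailing "\n" still works.
def bump_build(version)
  version.sub(/\d+\Z/) { |build| build.to_i + 1 }
end

p bump_build("2.0.0.65")   # prints "2.0.0.66"
p bump_build("2.0.0.99")   # prints "2.0.0.100"
p bump_build("2.0.0.65\n") # prints "2.0.0.66\n"
```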
You can do something like this:
parts = "2.0.0.65".split('.')
parts[3] = parts[3].to_i + 1
puts parts.join(".")
output:
2.0.0.66
This gives you more control than a plain string replacement, because you can now more easily increment other parts of the version string if needed.
Once you have the string with the build number, you only need to use 'succ' method
'2.0.0.65'.succ()
Which gives you the string
'2.0.0.66'
sample = '2.0.0.65'
def incr_version(version)
parts = version.split('.')
parts[-1] = parts[-1].to_i + 1
parts.join('.')
end
incr_version(sample) # => '2.0.0.66'
For fun, if you want to increment the last integer in any string you could do this:
str = "I have 3 cats and 41 rabbits"
str.reverse.sub(/\d+/){ |s| (s.reverse.to_i+1).to_s.reverse }.reverse
#=> "I have 3 cats and 42 rabbits"
This trick only works if the regex is also written to match the reversed text (\d+ happens to work unchanged in either direction).
More generally, you can do this:
class String
# Replace the last occurrence of a regex in a string.
# As with `sub` you may specify matches in the replacement string,
# or pass a block instead of the replacement string.
# Unlike `sub` the captured sub-expressions will be passed as
# additional parameters to your block.
def rsub!(pattern,replacement=nil)
if n=rindex(pattern)
found=match(pattern,n)
self[n,found[0].length] = if replacement
replacement.gsub(/\\\d+/){ |s| found[s[1..-1].to_i] || s }
else
yield(*found).to_s
end
end
end
def rsub(pattern,replacement=nil,&block)
dup.tap{ |s| s.rsub!(pattern,replacement,&block) }
end
end
str = "I have 3 cats and 41 rabbits"
puts str.rsub(/(?<=\D)(\d+)/,'xx')
#=> I have 3 cats and xx rabbits
puts str.rsub(/(?<=\D)(\d+)/,'-\1-')
#=> I have 3 cats and -41- rabbits
puts str.rsub(/(?<=\D)(\d+)/){ |n| n.to_i+1 }
#=> I have 3 cats and 42 rabbits
Note that (as with rindex) because the regex search starts from the end of the string you may need to make a slightly more complex regex to force your match to be greedy.
