Ruby - Issues iterating through multiple strings and populating arrays - ruby

I have a response variable containing the following text (I pulled this output from a telnet session, which I've left out of the code below for simplicity):
mydummyhost# show ip bgp 43.245.43.105
BGP routing table entry for 43.245.43.0/24
Paths: (2 available, best #1, table Default-IP-Routing-Table)
Not advertised to any peer
38561 2914 55432, (aggregated by 55532 202.68.67.134)
202.158.215.44 from 201.158.202.44 (202.158.215.62)
Community: 7575:1002 7575:2462 7575:3002 7575:6001 7575:8001
Last update: Tue Sep 22 12:25:17 2020
38561 2914 55433, (aggregated by 55433 202.68.67.135)
202.158.215.52 from 202.158.215.52 (202.158.215.62)
Community: 7575:1002 7575:2462 7575:3002 7575:6001 7575:8001
Last update: Mon Sep 21 06:44:58 2020
I have a piece of code that I'm using to try to iterate through the string lines above and basically get the following result :
Required result:
43.245.43.105 is domestic peering (On-Net) originated by AS 55432 via path 38561 2914 55432
43.245.43.105 is domestic peering (On-Net) originated by AS 55433 via path 38561 2914 55433
Code:
#!/usr/bin/env ruby
require 'net/telnet'
ipaddress = "43.245.43.105"
cat = []
response = ""
origin = []
paths = []
net = []
community = []
onoffnet= {
ond:"domestic (On-Net)",
oni:"international research (On-Net)",
opd:"domestic peering (On-Net)",
ofd:"domestic transit (Off-Net)",
opi:"international peering (Off-Net)",
ofi:"international transit (Off-Net)"
}
response.each_line do |line|
if line =~ /BGP routing table entry for (\d+\.\d+\.\d+\.\d+.*)/
net[i] = $1
elsif line =~ /Community: (.*)$/
community[i] = $1
elsif line =~ /^\s+([0-9\ ]+),.*/
paths, aggregatedBy = line.split(", ")
paths[i] = paths.strip
origin[i] == aggregatedBy.split(" ")[2]
elsif line =~ /Last update:/
i += 1
end
end
if i == 0
print "ERROR, no data found for the IP."
else
i = 0
net.each do | ip |
if community[i] =~ /7575\:1000/
cat[i] = onoffnet.fetch(:ond)
elsif community[i] =~ /7575\:1001/
cat[i] = onoffnet.fetch(:oni)
elsif community[i] =~ /7575\:1002/
if community[i] =~ /7575\:6001/
cat[i] = onoffnet.fetch(:opd)
else
cat[i] = onoffnet.fetch(:opi)
end
elsif community[i] =~ /7575\:1003/
if community[i] =~ /7575\:6001/
cat[i] = onoffnet.fetch(:ofd)
else
cat[i] = onoffnet.fetch(:ofi)
end
end
i += 1
if origin[i].to_s.length > 0 && paths[i].to_s.length > 0
puts "#{ipaddress} is cat[i] network ip[i] originated by AS #{origin} via path #{paths} ."
else
puts #{ipaddress} + "is" + cat[i] + "network" + ip[i] + "\n"
puts "Test"
end
end
end
When I run this, only "Test" is shown as output:
[root@mydummyhost]# ./telnet.rb
Test
Basically I'm trying to use line =~ /Last update:/ to determine that I have another separate paths and origin to display by incrementing i, if that makes sense.
So I know it had hit the if condition; however, I'm not sure why it's not showing the first print line puts #{ipaddress} + "is" + cat[i] + "network" + ip[i] + "\n". Its output appears to be empty.
EDIT:
After changing puts #{ipaddress} + "is" + cat[i] + "network" + ip[i] + "\n" to puts "#{ipaddress} is #{cat[i]} network #{ip[i]}\n" (as suggested by @trueunlessfalse, thanks!), I can now see some output:
[root@dummyhost]# ./telnet.rb
43.245.43.105 is network 3
Unfortunately this isn't even close to the required result I'm looking for.
I know the problem is somehow with the way I am iterating over the response variable and populating the arrays. I have Perl code that works and am trying to convert it to Ruby, but I'm not sure how to fix this or what a better way of going through that output would be.
Any suggestions how I should iterate response so the arrays can be populated based on the i value that gets incremented when it comes to the end of the "block" marked by Last update:?
Thanks
J

The expected result appears to depend on the following types of values contained in the given string:
BGP ip (e.g., "43.245.43.105")
path values (e.g., "38561 2914 55432")
originated values (e.g., "38561")
community values (e.g., "7575:1002 7575:2462 7575:3002 7575:6001 7575:8001")
I suggest you first concentrate on extracting these values, after which it is fairly straightforward to construct the desired strings. My answer is limited to this initial task.
To produce some numbers I will first construct your string response.
response =<<~BITTER_END
mydummyhost# show ip bgp 43.245.43.105
BGP routing table entry for 43.245.43.0/24
Paths: (2 available, best #1, table Default-IP-Routing-Table)
Not advertised to any peer
38561 2914 55432, (aggregated by 55532 202.68.67.134)
202.158.215.44 from 201.158.202.44 (202.158.215.62)
Community: 7575:1002 7575:2462 7575:3002 7575:6001 7575:8001
Last update: Tue Sep 22 12:25:17 2020
38561 2914 55433, (aggregated by 55433 202.68.67.135)
202.158.215.52 from 202.158.215.52 (202.158.215.62)
Community: 7575:1002 7575:2462 7575:3002 7575:6001 7575:8001
Last update: Mon Sep 21 06:44:58 2020
BITTER_END
I don't know much about Telnet, but it appears response contains a block of data that begins with a line that starts with 'mydummyhost'. I have written this to permit multiple such blocks (each beginning with a line that begins, 'mydummyhost'), so as a first step I will apply String#scan with a regular expression as follows.
arr = response.scan(/^mydummyhost\D+.+?(?=\z|^mydummyhost)/m)
#=> ["mydummyhost# show ip bgp 43.245.43.105\nBGP routing table entry for 43.245.43.0/24\nPaths: (2 available, best #1, table Default-IP-Routing-Table)\n Not advertised to any peer\n 38561 2914 55432, (aggregated by 55532 202.68.67.134)\n 202.158.215.44 from 201.158.202.44 (202.158.215.62)\n Community: 7575:1002 7575:2462 7575:3002 7575:6001 7575:8001\n Last update: Tue Sep 22 12:25:17 2020\n\n 38561 2914 55433, (aggregated by 55433 202.68.67.135)\n 202.158.215.52 from 202.158.215.52 (202.158.215.62)\n Community: 7575:1002 7575:2462 7575:3002 7575:6001 7575:8001\n Last update: Mon Sep 21 06:44:58 2020\n"]
We can write the regular expression in free-spacing mode to make it self-documenting.
/
^mydummyhost\D+ # match 'mydummyhost' at the beginning of a line followed
# by 1+ characters other than digits (\D)
.+ # match 1+ characters, including line terminators
? # make previous match lazy (aka non-greedy)
(?= # begin a positive lookahead
\z # match end of string
| # or
^mydummyhost # match '^mydummyhost' at the beginning of a line
) # end positive lookahead
/mx # specify multiline (/m) and free-spacing (/x) regex definition modes
Multiline mode (named differently by other languages) causes the dot to match line terminators (\n and \r\n) as well as other characters.
You will see here that arr contains a single element. The next step is to map arr to values for each block. To simplify the presentation I will assume the string contains one block, namely, response, but it should be evident how it could be generalized.
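As a rough sketch of that generalization, the block regex and the IP regex from this answer can be combined with a map over the blocks. The second query (10.1.2.3) is made up purely for illustration, and the variable names are my own.

```ruby
# Two blocks for illustration; the second query (10.1.2.3) is invented.
response = <<~OUT
  mydummyhost# show ip bgp 43.245.43.105
  BGP routing table entry for 43.245.43.0/24
  mydummyhost# show ip bgp 10.1.2.3
  BGP routing table entry for 10.1.2.0/24
OUT

block_rgx = /^mydummyhost\D+.+?(?=\z|^mydummyhost)/m
bgp_rgx   = /^mydummyhost\D+\K\d{1,3}(?:\.\d{1,3}){3}$/

# Split the response into one string per block, then pull the IP out of each.
blocks  = response.scan(block_rgx)
bgp_ips = blocks.map { |block| block[bgp_rgx] }

p bgp_ips
```

The same map could return a hash per block instead of just the IP.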
Extract the BGP ip
bgp_rgx = /^mydummyhost\D+\K\d{1,3}(?:\.\d{1,3}){3}$/
bgp_ip = response[bgp_rgx]
#=> "43.245.43.105"
See String#[]. The regex in free-spacing mode:
bgp_rgx =
/
^mydummyhost\D+ # match 'mydummyhost' at the beginning of a line (^),
# followed by 1+ characters other than digits (\D)
\K # reset the beginning of the match to the current location
# and discard any previously-matched characters from the
# match that is returned
\d{1,3} # match 1-3 digits
(?:\.\d{1,3}) # match '.' followed by 1-3 digits, save to a
# non-capture group
{3} # execute the foregoing non-capture group 3 times
$ # match end of line
/x # specify free-spacing regex definition mode
Extract the path values
path_rgx = /(?<=^ {2})\d+(?: +\d+){2}(?=,)/
paths = response.scan(path_rgx)
#=> ["38561 2914 55432", "38561 2914 55433"]
In free-spacing mode:
path_rgx =
/
(?<=^[ ]{2}) # use a positive lookbehind (?<=...) to assert that the
# match that follows is preceded by two spaces at the
# beginning of a line
\d+ # match 1+ digits
(?:[ ]+\d+) # match 1+ spaces followed by 1+ digits, save to non-capture group
{2} # execute the foregoing non-capture group 2 times
(?=,) # use a positive lookahead (?=...) to assert that the
# preceding match is followed by ','
/x # specify free-spacing regex definition mode
Note that when writing regular expressions in free-spacing mode all spaces are removed before the expression is parsed. It is necessary, therefore, to protect all spaces that should not be stripped out. I've done that by putting a space character in a character class ([ ]). There are other ways to protect spaces, but that's not important.
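A small sketch of that behaviour, using throwaway patterns that are not part of the answer:

```ruby
loose = /a b/x    # the space is stripped, so this behaves like /ab/
kept  = /a[ ]b/x  # the space is protected inside a character class
ws    = /a\sb/x   # \s also survives, but it matches any whitespace character

puts loose.match?("a b")  # false
puts loose.match?("ab")   # true
puts kept.match?("a b")   # true
puts ws.match?("a\tb")    # true
```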
Extract the originated values from the path values
originated = paths.map { |s| s[/\d+/] }
#=> ["38561", "38561"]
The regex reads, "match one or more digits".
Extract the community values
community_rgx = /^ {6}Community: +\K\d+:\d+(?: +\d+:\d+)+/
community = response.scan(community_rgx)
#=> ["7575:1002 7575:2462 7575:3002 7575:6001 7575:8001",
# "7575:1002 7575:2462 7575:3002 7575:6001 7575:8001"]
In free-spacing mode:
community_rgx =
/
^[ ]{6} # match 6 spaces at beginning of a line
Community:[ ]+ # match 'Community:' followed by 1+ spaces
\K # reset the beginning of the match to the current location
# and discard any previously-matched characters from the
# match that is returned
\d+:\d+ # match 1+ digits, ':', 1+ digits
(?:[ ]+\d+:\d+) # match 1+ spaces, 1+ digits, ':', 1+ digits, save
# to a non-capture group
+ # execute the foregoing non-capture group 1+ times
/x # specify free-spacing regex definition mode
Combine values into a hash (optional)
params = {
bgp_ip: bgp_ip,
values: originated.zip(paths, community).map do |o,p,c|
{ originated: o, path: p, community: c }
end
}
#=> {:bgp_ip=>"43.245.43.105",
# :values=>[
# {:originated=>"38561", :path=>"38561 2914 55432",
# :community=>"7575:1002 7575:2462 7575:3002 7575:6001 7575:8001"},
# {:originated=>"38561", :path=>"38561 2914 55433",
# :community=>"7575:1002 7575:2462 7575:3002 7575:6001 7575:8001"}
# ]
# }
See Array#zip.
Again, if the string contains multiple blocks an array of hashes such as params would be returned.
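For completeness, here is one way the extracted values might be turned into the strings the question asks for. This is only a sketch: the category table mirrors the question's onoffnet hash, the helper names are hypothetical, and I am assuming (from the required result) that the originating AS is the last number in each path.

```ruby
# Category labels copied from the question's onoffnet hash.
CATEGORIES = {
  ond: "domestic (On-Net)",
  oni: "international research (On-Net)",
  opd: "domestic peering (On-Net)",
  ofd: "domestic transit (Off-Net)",
  opi: "international peering (Off-Net)",
  ofi: "international transit (Off-Net)"
}.freeze

# Hypothetical helper: same decision order as the question's if/elsif chain.
def category_for(community)
  tags = community.split
  if tags.include?("7575:1000")
    CATEGORIES[:ond]
  elsif tags.include?("7575:1001")
    CATEGORIES[:oni]
  elsif tags.include?("7575:1002")
    tags.include?("7575:6001") ? CATEGORIES[:opd] : CATEGORIES[:opi]
  elsif tags.include?("7575:1003")
    tags.include?("7575:6001") ? CATEGORIES[:ofd] : CATEGORIES[:ofi]
  end
end

# Assumption: the origin AS is the last AS number in the path.
def describe(ip, path, community)
  origin = path.split.last
  "#{ip} is #{category_for(community)} originated by AS #{origin} via path #{path}"
end

bgp_ip    = "43.245.43.105"
paths     = ["38561 2914 55432", "38561 2914 55433"]
community = ["7575:1002 7575:2462 7575:3002 7575:6001 7575:8001"] * 2

lines = paths.zip(community).map { |path, comm| describe(bgp_ip, path, comm) }
puts lines
```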

Update: This answer does not address all issues with the author's code, only the question of why one of the puts calls is not giving any output.
I would have expected this line:
puts #{ipaddress} + "is" + cat[i] + "network" + ip[i] + "\n"
to throw an error, because I never before tried to interpolate a string without first opening a string.
I tried this in the console, and indeed it just prints an empty line (outside a string literal, # starts a comment, so Ruby only sees a bare puts):
irb(main):002:0> a = "foo"
=> "foo"
irb(main):003:0> puts #{a}
=> nil
However this works as expected:
irb(main):004:0> puts "#{a}"
foo
=> nil
Try rewriting this line as:
puts "#{ipaddress} is #{cat[i]} network #{ip[i]}\n"
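As for the iteration itself, one possible sketch (not the asker's Perl logic, just an assumption about what is wanted) is to collect each path's fields in a hash and push that hash onto an array whenever a "Last update:" line closes the entry. That avoids keeping a manual index in step across several arrays. The sample text below is a shortened version of the question's output.

```ruby
response = <<~OUT
  BGP routing table entry for 43.245.43.0/24
    38561 2914 55432, (aggregated by 55532 202.68.67.134)
        Community: 7575:1002 7575:6001
        Last update: Tue Sep 22 12:25:17 2020
    38561 2914 55433, (aggregated by 55433 202.68.67.135)
        Community: 7575:1002 7575:6001
        Last update: Mon Sep 21 06:44:58 2020
OUT

records = []
current = {}

response.each_line do |line|
  case line
  when /^\s*([\d ]+),\s*\(aggregated by/
    current[:path] = $1.strip          # e.g. "38561 2914 55432"
  when /Community:\s*(.*)$/
    current[:community] = $1.strip
  when /Last update:/
    records << current                 # the entry is complete
    current = {}
  end
end

records.each { |r| puts "#{r[:path]} | #{r[:community]}" }
```

If records is empty after the loop, no path entries were found for the IP.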

Related

How to check with ruby if a word is repeated twice in a file

I have a large file, and I want to be able to check if a word is present twice.
puts "Enter a word: "
$word = gets.chomp
if File.read('worldcountry.txt') # do something if the word entered is present twice...
How can I check whether the file worldcountry.txt contains the $word I entered twice?
I found what I needed in this question: count-the-frequency-of-a-given-word-in-text-file-in-ruby
It is in Gerry's post, with this code:
word_count = 0
my_word = "input"
File.open("texte.txt", "r") do |f|
f.each_line do |line|
line.split(' ').each do |word|
word_count += 1 if word == my_word
end
end
end
puts "\n" + word_count.to_s
Thanks, I will pay more attention next time.
If the file is not overly large, it can be gulped into a string. Suppose:
str = File.read('cat')
#=> "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."
puts str
There was a dog 'Henry' who
was pals with a dog 'Buck' and
a dog 'Sal'.
Suppose the given word is 'dog'.
Confirm the file contains at least two instances of the given word
One can attempt to match the regular expression
r1 = /\bdog\b.*\bdog\b/m
str.match?(r1)
#=> true
Demo
Confirm the file contains exactly two instances of the given word
Using a regular expression to determine if the file contains exactly two instances of the given word is somewhat more complex. Let
r2 = /\A(?:(?:(?!\bdog\b).)*\bdog\b){2}(?!.*\bdog\b)/m
str.match?(r2)
#=> false
Demo
The two regular expressions can be written in free-spacing mode to make them self-documenting.
r1 = /
\bdog\b # match 'dog' surrounded by word breaks
.* # match zero or more characters
\bdog\b # match 'dog' surrounded by word breaks
/mx # cause . to match newlines and invoke free-spacing mode
r2 = /
\A # match beginning of string
(?: # begin non-capture group
(?: # begin non-capture group
(?! # begin negative lookahead
\bdog\b # match 'dog' surrounded by word breaks
) # end negative lookahead
. # match one character (one that does not begin 'dog')
) # end non-capture group
* # execute preceding non-capture group zero or more times
\bdog\b # match 'dog' surrounded by word breaks
) # end non-capture group
{2} # execute preceding non-capture group twice
(?! # begin negative lookahead
.* # match zero or more characters
\bdog\b # match 'dog' surrounded by word breaks
) # end negative lookahead
/xm # cause . to match newlines and invoke free-spacing mode
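If the word is not known in advance, a simpler route may be to count matches with String#scan and compare the count. This is a sketch with a hypothetical helper name; Regexp.escape guards against words that contain regex metacharacters.

```ruby
# Hypothetical helper: counts whole-word occurrences of word in text.
def word_count(text, word)
  text.scan(/\b#{Regexp.escape(word)}\b/).size
end

str = "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."

puts word_count(str, "dog")        # 3
puts word_count(str, "dog") == 2   # false: exactly two?
puts word_count(str, "dog") >= 2   # true: at least two?
```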

Assignment Branch Condition size is too high

I'm making method that takes multiline string (logs) and writes new strings to array.
def task_2(str)
result = []
str.each_line do |x|
ip = x[/^.* - -/]
datetime = x[/[\[].*[\]]/]
address = x[/T .* H/]
if !ip.nil? && !datetime.nil? && !address.nil?
result << datetime[1..-2] + ' FROM: ' + ip[0..-4] + 'TO:' + address[1..-3]
end
end
result
end
and I need it to pass rubocop analysis with default configuration, but it gives AbcSize 18.68/15
I'm fairly sure that's because of the if..end statement, but how can I rewrite it?
Log example:
10.6.246.103 - - [23/Apr/2018:20:30:39 +0300] "POST /test/2/messages HTTP/1.1" 200 48 0.0498
10.6.246.101 - - [23/Apr/2018:20:30:42 +0300] "POST /test/2/run HTTP/1.1" 200 - 0.2277
The ABC size is calculated by doing the following:
√(assignments² + branches² + conditionals²)
Let's first have a look at the assignments:
result = []
ip = x[/^.* - -/]
datetime = x[/[\[].*[\]]/]
address = x[/T .* H/]
This leaves us with 4 assignments.
Next up are the branches. For this I have to mention that most operators are methods (and thus count towards branches). For example, 1 + 1 could also be written as 1.+(1), because + is a method on an integer. The same applies to string[regex], which could be written as string.[](regex), because [] is a method on strings, and to !value, which could be written as value.!, because ! is a method on all objects.
With that out of the way let's count the branches.
str.each_line
x[/^.* - -/]
x[/[\[].*[\]]/]
x[/T .* H/]
!ip.nil? # counts for 2 (! and .nil?)
!datetime.nil? # counts for 2 (! and .nil?)
!address.nil? # counts for 2 (! and .nil?)
result << ...
datetime[1..-2]
ip[0..-4]
address[1..-3]
+ # 4 times in result << ... + ... + ....
This leaves us with 18 branches.
The last thing to count are the conditionals. Since Ruby uses short circuiting with the && and || operators they will count towards conditionals.
if
&& # 2 times
This leaves us with 3 conditionals.
√(4² + 18² + 3²) ≈ 18.68
Now that we have an understanding of where the number is coming from, we can try and reduce it. The easiest way to reduce ABC size is by reducing the thing with the largest number, since this number is squared. In your case these are the branches. You already spotted where the issue lies in the question.
if !ip.nil? && !datetime.nil? && !address.nil?
result << datetime[1..-2] + ' FROM: ' + ip[0..-4] + 'TO:' + address[1..-3]
end
Could be simplified to:
if ip && datetime && address
result << "#{datetime[1..-2]} FROM: #{ip[0..-4]}TO:#{address[1..-3]}"
end
Taking a total of 10 branches away. 3 times !something.nil? (which count for 2, since ! and .nil? are both counted towards branches) and 4 times +.
Leaving you with:
√(4² + 8² + 3²) ≈ 9.43
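The arithmetic is easy to check in irb. The helper below is only the formula from this answer, not RuboCop's own implementation, whose counting rules may differ between versions.

```ruby
# The ABC formula as stated above; the method name is my own.
def abc_size(assignments, branches, conditionals)
  Math.sqrt(assignments**2 + branches**2 + conditionals**2).round(2)
end

puts abc_size(4, 18, 3)  # 18.68, the original method
puts abc_size(4, 8, 3)   # 9.43, after removing 10 branches
```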
def task_2(str)
result = []
str.each_line do |x|
ip = x[/^.* - -/]
datetime = x[/[\[].*[\]]/]
address = x[/T .* H/]
if ip && datetime && address
result << datetime[1..-2] + ' FROM: ' + ip[0..-4] + 'TO:' + address[1..-3]
end
end
result
end
Having !variable.nil? is redundant. Basically, you are checking presence here, so a #present? method (which comes from ActiveSupport, not core Ruby) would suffice, but any value that is not nil or false is considered true, so it is more idiomatic to just use the form I used in the if statement. This solves the ABC issue.
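A quick illustration of that truthiness rule, with a throwaway helper that is not part of the solution:

```ruby
# Hypothetical helper: reports how Ruby treats a value in a condition.
def truthy?(value)
  value ? true : false
end

ip      = "10.6.246.103 - - rest"[/^.* - -/]  # a String, so truthy
missing = "no match here"[/T .* H/]           # nil, so falsy

puts truthy?(ip)       # true
puts truthy?(missing)  # false
puts truthy?(0)        # true: only nil and false are falsy
```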
I don't use rubocop, but I did test the following with this data:
data = <<FILE
10.6.246.103 - - [23/Apr/2018:20:30:39 +0300] "POST /test/2/messages HTTP/1.1" 200 48 0.0498
10.6.246.101 - - [23/Apr/2018:20:30:42 +0300] "POST /test/2/run HTTP/1.1" 200 - 0.2277
12.55.123.255 - - Hello
FILE
using String#gsub! and Enumerable#select (Reports AbcSize of 3)
def task_2(str)
str.each_line.select do |x|
# Without named groups
# x.gsub!(/\A([\d+\.\d+]+).*(?<=\[)(.*)(?=\]).*(?<=\s)((?:\/\w+)*?)(?=\s).*\z/m,
# '\2 FROM \1 TO \3')
x.gsub!(/\A(?<ip>[\d+\.\d+]+).*(?<=\[)(?<date_time>.*)(?=\]).*(?<=\s)(?<address>(?:\/\w+)*?)(?=\s).*\z/m,
'\k<date_time> FROM \k<ip> TO \k<address>')
end
end
task_2(data)
# => ["23/Apr/2018:20:30:39 +0300 FROM 10.6.246.103 TO /test/2/messages",
# "23/Apr/2018:20:30:42 +0300 FROM 10.6.246.101 TO /test/2/run"]
Here we are using String#gsub! with a pattern replacement, which will return nil if no replacement is made thus rejecting it from Enumerable#select.
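The nil return is the key point, so here is a tiny sketch of it on made-up strings:

```ruby
line = "hello".dup
no_change = line.gsub!(/z/, "y")  # nil: nothing was replaced
changed   = line.gsub!(/l/, "L")  # "heLLo": the string was modified in place

# select keeps only the elements whose block result is truthy,
# and those elements have already been rewritten by gsub!.
words = ["apple", "berry", "cherry"].map(&:dup)
kept  = words.select { |w| w.gsub!(/rr/, "RR") }

p no_change, changed, kept
```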
Similar solution, although likely less efficient, using String#match, Enumerable#map, and Array#compact (Reports AbcSize of 7.14)
def task_2(str)
str.each_line.map do |x|
match = x.match(/\A(?<ip>[\d+\.\d+]+).*(?<=\[)(?<date_time>.*)(?=\]).*(?<=\s)(?<address>(?:\/\w+)*?)(?=\s)/)
"#{match['date_time']} FROM #{match['ip']} TO #{match['address']}" if match
end.compact
end
Here we are using String#match to extract the match data and then confirming a match and outputting the desired format if there is a match. A string that does not match will output nil and thus we compact the Array to remove the nil values.
Another option could just be to scan the whole String all at once and break out the matching groups: (Reports AbcSize of 5)
def task_2(str)
str.scan(/^([\d+\.\d+]+).*(?<=\[)(.*)(?=\]).*(?<=\s)((?:\/\w+)*?)(?=\s).*$/)
.map {|a| "#{a[1]} FROM #{a[0]} TO #{a[2]}"}
end
Can make the last one as low as 2.24 via
def task_2(str)
r = []
str.scan(/^([\d+\.\d+]+).*(?<=\[)(.*)(?=\]).*(?<=\s)((?:\/\w+)*?)(?=\s).*$/) do |ip, date_time, address |
r << "#{date_time} FROM #{ip} TO #{address}"
end
r
end
Any time I run into an ABC too high (or similar complexity/length warnings), I'm pretty quick to just chop the method up. Your readability, testability, and maintainability almost always improve.
The quickest way is to yank out the body of a loop or conditional into a new method. Repeat as needed until you can read each method in one breath.
Similarly, if you've got big complex conditionals/loop constructs, pull that out to a new method as well.
Combining those two strategies enough times will reduce any method into roughly two method calls. That might be a bit overzealous in some cases...but it's never too far.
Here's one way you could apply that strategy to your code:
def task_2(str)
result = []
str.each_line do |x|
ip, datetime, address = parse_line(x)
if [ip, datetime, address].all?
result << "#{datetime[1..-2]} FROM: #{ip[0..-4]} TO: #{address[1..-3]}"
end
end
result
end
def parse_line(x)
ip = x[/^.* - -/]
datetime = x[/[\[].*[\]]/]
address = x[/T .* H/]
return [ip, datetime, address]
end
s =<<EOF
123.123.123.999 - - [2009-12-31 13:13:13] T www.google.com H"
456.456.456.999 - - [2009-12-31 13:13:13] 404"
678.678.678.999 - - [2009-12-31 13:13:13] T www.amazon.com H"
EOF
puts task_2(s)
Produces the output:
2009-12-31 13:13:13 FROM: 123.123.123.999 TO: www.google.com
2009-12-31 13:13:13 FROM: 678.678.678.999 TO: www.amazon.com
If you wanted to go even farther, you could pull the body of each_line out to a new method, process_line, etc. And if you created a class, you could avoid the messy (to my eye) multi-value returns.
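A rough sketch of the class idea, using a Struct. LogEntry and its methods are names I made up, and the slice offsets differ slightly from the code above so that the surrounding spaces are trimmed.

```ruby
# Hypothetical value object for one parsed log line.
LogEntry = Struct.new(:ip, :datetime, :address) do
  def self.parse(line)
    new(line[/^.* - -/], line[/\[.*\]/], line[/T .* H/])
  end

  # True only when all three fields were found.
  def complete?
    to_a.all?
  end

  def to_s
    "#{datetime[1..-2]} FROM: #{ip[0..-5]} TO: #{address[2..-3]}"
  end
end

def task_2(str)
  str.each_line.map { |line| LogEntry.parse(line) }.select(&:complete?).map(&:to_s)
end

log = <<~EOF
  123.123.123.999 - - [2009-12-31 13:13:13] T www.google.com H"
  456.456.456.999 - - [2009-12-31 13:13:13] 404"
EOF
puts task_2(log)
```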
This is a problem where it is convenient to use named capture groups.
R = /
(?= # begin a positive lookahead
(?<ip>.*\s-\s-) # match the string in a capture group named 'ip'
) # end positive lookahead
(?= # begin a positive lookahead
.* # match any number of characters
(?<datetime>[\[].*[\]]) # match the string in a capture group named 'datetime'
) # end positive lookahead
(?= # begin a positive lookahead
.* # match any number of characters
(?<address>T\s.*\sH) # match the string in a capture group named 'address'
) # end positive lookahead
/x # free-spacing regex definition mode
def task_2(str)
str.each_line.with_object([]) do |s, result|
m = s.match(R) # match the current line (s), not the whole string (str)
result << m[:datetime][1..-2] + ' FROM: ' + m[:ip][0..-4] +
'TO:' + m[:address][1..-3] unless m.nil?
end
end
str =<<_
123.123.123.999 - - [2009-12-31 13:13:13] T www.google.com H"
456.456.456.999 - - [2009-12-31 13:13:13] 404"
678.678.678.999 - - [2009-12-31 13:13:13] T www.amazon.com
_
task_2 str
#=> ["2009-12-31 13:13:13 FROM: 123.123.123.999 TO: www.google.com"]
# (the second and third lines contain no 'T ... H' address, so they are skipped)
The regular expression is conventionally written as follows.
R = /(?=(?<ip>\A.* - -))(?=.*(?<datetime>[\[].*[\]]))(?=.*(?<address>T .* H))/
Notice that where I have spaces here I had whitespace characters (\s) when writing the regex in free-spacing mode. That's because in free-spacing mode spaces are stripped out before the expression is evaluated. Alternatively, spaces can be preserved in free-spacing mode by enclosing them in character classes ([ ]).

How to extract portion of a line in ruby?

I have a line say
line = "start running at Sat April 1 07:30:37 2017"
and I want to extract
"Sat April 1 07:30:37 2017"
I tried this...
line = "start running at Sat April 1 07:30:37 2017"
if (line =~ /start running at/)
line.split("start running at ").last
end
... but is there any other way of doing this?
This is a way to extract, from an arbitrary string, a substring that represents a time in the given format. I've assumed there is at most one such substring in the string.
require 'time'
R = /
(?:#{Date::ABBR_DAYNAMES.join('|')})\s
# match day name abbreviation in non-capture group. space
(?:#{Date::MONTHNAMES[1,12].join('|')})\s
# match month name in non-capture group, space
\d{1,2}\s # match one or two digits, space
\d{2}: # match two digits, colon
\d{2}: # match two digits, colon
\d{2}\s # match two digits, space
\d{4} # match 4 digits
(?!\d) # do not match digit (negative lookahead)
/x # free-spacing regex def mode
# /
# (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)\s
# (?:January|February|March|...|November|December)\s
# \d{1,2}\s
# \d{2}:
# \d{2}:
# \d{2}\s
# \d{4}
# (?!\d)
# /x
def extract_time(str)
s = str[R]
return nil if s.nil?
(DateTime.strptime(s, "%a %B %e %H:%M:%S %Y") rescue nil) ? s : nil
end
str = "start eating breakfast at Sat April 1 07:30:37 2017"
extract_time(str)
#=> "Sat April 1 07:30:37 2017"
str = "go back to sleep at Cat April 1 07:30:37 2017"
extract_time(str)
#=> nil
Alternatively, if there is a match against R, but DateTime.strptime raises an exception (meaning s is not a valid time for the given time format), one could raise an exception to advise the user.
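That variant might look something like the sketch below. parse_time! is a hypothetical name, and the error message is my own wording. It assumes the string has already been extracted with R.

```ruby
require 'date'

TIME_FORMAT = "%a %B %e %H:%M:%S %Y"

# Hypothetical helper: parses s, or raises with a friendlier message.
def parse_time!(s)
  DateTime.strptime(s, TIME_FORMAT)
rescue ArgumentError
  raise ArgumentError, "#{s.inspect} is not a valid time for format #{TIME_FORMAT.inspect}"
end

t = parse_time!("Sat April 1 07:30:37 2017")
puts t.hour

begin
  parse_time!("Sat April 1 25:30:37 2017")  # hour 25 is out of range
rescue ArgumentError => e
  puts e.message
end
```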
Try:
line.sub(/start running at (.*)/, '\1')
The standard way to do this with regular expressions would be:
if md = line.match(/start running at (.*)/)
md[1]
end
But you don't need regular expressions, you can do regular string operations:
prefix = 'start running at '
if line.start_with?(prefix)
line[prefix.size..-1]
end
Here's another (as it turns out, slightly faster) option using #partition:
# will return an empty string if there is no match, whereas split(...).last would return the whole line
line.partition('start running at ').last
I was interested how this performs against regexp match, so here's a quick benchmark with 1 million executions each:
line.sub(/start running at (.*)/, '\1')
# => @real=1.7465
line.partition('start running at ').last
# => #real=0.712406
# => this is faster, but you'd need to be calling this quite a bit for it to make a significant difference
Bonus: it also makes it really easy to cater for a more general case e.g. if you have lines that start with "start running at" and others that start with "stop running at". Then something like line.partition(' at ').last will cater for both (and actually run slightly faster).
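To make the return values concrete, here is what String#partition gives back in the three cases discussed. The "stop running at" line is an invented example.

```ruby
line = "start running at Sat April 1 07:30:37 2017"

# partition always returns three parts: [before, separator, after]
parts = line.partition("start running at ")
# => ["", "start running at ", "Sat April 1 07:30:37 2017"]

no_match = "something else".partition("start running at ")
# => ["something else", "", ""]

general = "stop running at Sun April 2 08:00:00 2017".partition(" at ").last
# => "Sun April 2 08:00:00 2017"

p parts.last, no_match.last, general
```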
And yet another alternative:
puts $1 if line =~ /start running at (.*)/
The shortest would be line["Sat April 1 07:30:37 2017"] which would return your "Sat April 1 07:30:37 2017" string if present and nil if not.
The [] notation on a String is a shorthand for getting a substring out of the string and can be used with another string or a Regular Expression. See https://ruby-doc.org/core-2.2.0/String.html#method-i-5B-5D
In case the string is unknown you can use this shorthand also like Cary suggested
line[/start running at (.*)/, 1]
In case you want to be sure the date extracted is valid you would need the regular expression from his answer but you still could use this method.

Ruby parsing and regex

Picked up Ruby recently and have been fiddling around with it. I wanted to learn how to use regex or other Ruby tricks to check for certain words, whitespace characters, valid format etc in a given text line.
Let's say I have an order list that looks strictly like this in this format:
cost: 50 items: book,lamp
One space after each colon, no space after each comma, no trailing whitespace at the end, and so on.
How can I check for errors in this format using Ruby? This for example should fail my checks:
cost: 60 items:shoes,football
My goal was to split the string by a " " and check to see if the first word was "cost:", if the second word was a number and so on but I realized that splitting on a " " doesn't help me check for extra whitespaces as it just eats it up. Also doesn't help me check for trailing whitespaces. How do I go about doing this?
You could use the following regular expression.
r = /
\A # match beginning of string
cost:\s # match "cost:" followed by a space
\d+\s # match > 0 digits followed by a space
items:\s # match "items:" followed by a space
[[:alpha:]]+ # match > 0 lowercase or uppercase letters
(?:,[[:alpha:]]+) # match a comma followed by > 0 lowercase or uppercase
# letters in a non-capture group (?: ... )
* # perform the match on non-capture group >= 0 times
\z # match the end of the string
/x # free-spacing regex definition mode
"cost: 50 items: book,lamp" =~ r #=> 0 (a match, beginning at index 0)
"cost: 50 items: book,lamp,table" =~ r #=> 0 (a match, beginning at index 0)
"cost: 60 items:shoes,football" =~ r #=> nil (no match)
The regex can of course be written in the normal manner:
r = /\Acost:\s\d+\sitems:\s[[:alpha:]]+(?:,[[:alpha:]]+)*\z/
or
r = /\Acost: \d+ items: [[:alpha:]]+(?:,[[:alpha:]]+)*\z/
though a whitespace character (\s) cannot be replaced by a space in the free-spacing mode definition (/x).
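Wrapped in a small predicate (valid_order? is a name I picked), the checks the question asks about can be tried directly:

```ruby
ORDER_RGX = /\Acost: \d+ items: [[:alpha:]]+(?:,[[:alpha:]]+)*\z/

# Hypothetical helper: true only for a strictly formatted order line.
def valid_order?(line)
  ORDER_RGX.match?(line)
end

puts valid_order?("cost: 50 items: book,lamp")      # true
puts valid_order?("cost: 60 items:shoes,football")  # false: no space after "items:"
puts valid_order?("cost: 50 items: book,lamp ")     # false: trailing space
puts valid_order?("cost:  50 items: book")          # false: two spaces
puts valid_order?("cost: 50 items: book, lamp")     # false: space after comma
```

Because the pattern ends with \z rather than $, a trailing newline also fails, so chomp the line first if it comes from gets or a file.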

Not understanding regex (1[0-2]|0?[1-9]) for validating time strings [duplicate]

I found and tested a regex to validate a time string such as 11:30 AM:
^(1[0-2]|0?[1-9]):([0-5][0-9])(\s[A|P]M)\)?$
I understand most of it except the beginning:
(1[0-2]|0?[1-9])
Can someone explain what is going on? Does 1[0-2] mean there is a first digit that can be between 0 and 2? And I don't understand |0?[1-9].
(1[0-2]|0?[1-9])
| separates the regex into two parts, where
1[0-2]
matches 10, 11 or 12, and
0?[1-9]
matches 1 to 9, with an optional leading 0.
I will explain by writing the regex in extended mode, which permits comments:
r = /
^ # match the beginning of the string
( # begin capture group 1
1 # match 1
[0-2] # match one of the characters 0,1,2
| # or
0? # optionally match a zero
[1-9] # match one of the characters between 1 and 9
) # end capture group 1
: # match a colon
( # begin capture group 2
[0-5] # match one of the characters between 0 and 5
[0-9] # match one of the characters between 0 and 9
) # end capture group 2
( # begin capture group 3
\s # match one whitespace character
[A|P] # match one of the characters A, | or P
M # match M
) # end capture group 3
\)? # optionally match a right parenthesis
$ # match the end of the string
/x # extended mode
As noticed by @Mischa, [A|P] is incorrect. It should be [AP]. That's because "|" is just an ordinary character when it's within a character class.
Also, I think the regex would be improved by moving \s out of capture group 3. We therefore might write:
r = /^(1[0-2]|0?[1-9]):([0-5][0-9])\s([AP]M)\)?$/
It could be used thusly:
result = "11:39 PM" =~ r
if result
puts "It's #{$2} minutes past #{$1}, #{ $3=='AM' ? 'ante' : 'post' } meridiem."
else
# raise exception?
end
#=> It's 39 minutes past 11, post meridiem.
In words, the revised regex reads as follows:
match the beginning of the string.
match "10", "11", "12", or one of the digits "1" to "9", optionally preceded by a zero, and save the match to capture group 1.
match a colon.
match a digit between "0" and "5", then a digit between "0" and "9", and save the two digits to capture group 2.
match a whitespace character.
match "A", or "P", followed by "M", and save the two characters to capture group 3.
optionally match a right parenthesis.
match the end of the string.
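To see which inputs the hour alternation accepts and rejects, here is a small sketch. It uses a slightly stricter variant of the revised regex (anchored with \A and \z, and without the optional parenthesis), which is my own simplification. parse_time is a hypothetical helper.

```ruby
TIME_RGX = /\A(1[0-2]|0?[1-9]):([0-5][0-9])\s([AP]M)\z/

# Hypothetical helper: returns the parts as a hash, or nil if invalid.
def parse_time(str)
  m = TIME_RGX.match(str)
  m && { hour: m[1].to_i, minute: m[2].to_i, period: m[3] }
end

p parse_time("11:39 PM")  # hour 11 via the 1[0-2] branch
p parse_time("9:05 AM")   # hour 9 via 0?[1-9] without the leading zero
p parse_time("09:05 AM")  # hour 9 via 0?[1-9] with the leading zero
p parse_time("13:00 PM")  # nil: 13 fits neither branch
p parse_time("00:30 AM")  # nil: [1-9] excludes 0
```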

Resources