Regex doesn't work on the first time - ruby

I have a string e.g. 02112016. I want to make a datetime from this string.
I have tried:
s = "02112016"
s.sub(/(\d{2})(\d{2})(\d{4})/, "#{$1}-#{$2}-#{$3}")
But there is a problem. It returns "--".
If I run s.sub(/(\d{2})(\d{2})(\d{4})/, "#{$1}-#{$2}-#{$3}") again, it works: "02-11-2016". Now I can use the to_datetime method.
But why doesn't s.sub(/(\d{2})(\d{2})(\d{4})/, "#{$1}-#{$2}-#{$3}") work the first time?

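It's really a simple change here. The double-quoted replacement string is interpolated before sub is even called, and $1 and friends are only assigned after a match succeeds, so on the first call they are still nil. If you want the values from the current match, use backreferences in a single-quoted string: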
s = "02112016"
s.sub(/(\d{2})(\d{2})(\d{4})/, '\1-\2-\3')
# => "02-11-2016"
Here \1 corresponds to what will be assigned to $1. This is especially important if you're using gsub since $1 tends to be the last match only while \1 is evaluated for each match individually.
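As a small sketch of the difference, the block form of sub and gsub is another option: the block only runs after a match has been found, so $1 and friends are already set when the string is built. The method names below are hypothetical and only there for illustration.

```ruby
# Block form: the block is evaluated after each match, so $1..$3 are set.
def format_date_with_block(s)
  s.sub(/(\d{2})(\d{2})(\d{4})/) { "#{$1}-#{$2}-#{$3}" }
end

# Backreference form: \1..\3 are resolved by sub itself for the current match.
def format_date_with_backrefs(s)
  s.sub(/(\d{2})(\d{2})(\d{4})/, '\1-\2-\3')
end

puts format_date_with_block("02112016")    # 02-11-2016
puts format_date_with_backrefs("02112016") # 02-11-2016

# With gsub, both forms are evaluated once per match.
puts "ab cd".gsub(/(\w)(\w)/) { "#{$2}#{$1}" } # ba dc
puts "ab cd".gsub(/(\w)(\w)/, '\2\1')          # ba dc
```

Either form works on the very first call, because nothing is interpolated before the match happens.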

I prefer the following.
r = /
\d{2} # match two digits
(?=\d{4}) # match four digits in a positive lookahead
/x # free-spacing regex definition mode
which is the same as
r = /\d{2}(?=\d{4})/
to be used with String#gsub:
s.gsub(r) { |s| "#{s}-" }
Try it:
"02112016".gsub(r) { |s| "#{s}-" }
#=> "02-11-2016"
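Since the stated goal is a datetime, the hyphenated intermediate string may not be needed at all. The sketch below parses the digits directly with DateTime.strptime from the standard date library. It assumes the input is in day, month, year order (the question does not say), and parse_compact_date is a hypothetical name.

```ruby
require "date"

# Assumption: the eight digits are DDMMYYYY. Swap %d and %m for MMDDYYYY.
def parse_compact_date(s)
  DateTime.strptime(s, "%d%m%Y")
end

dt = parse_compact_date("02112016")
puts dt.strftime("%d-%m-%Y") # 02-11-2016
```

Parsing with an explicit format avoids relying on how a generic parser would guess the order of day and month.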

What is happening is that the first time you run it, $1, $2, and $3 are nil, and nil interpolates as an empty string.
You are essentially replacing the digits with empty strings joined by hyphens.
So if we do
s = "02112016"
p $1 #=> nil
p $2 #=> nil
p $3 #=> nil
s.sub(/(\d{2})(\d{2})(\d{4})/, "#{$1}-#{$2}-#{$3}") #=> "--"
p $1 #=> "02"
p $2 #=> "11"
p $3 #=> "2016"
s.sub(/(\d{2})(\d{2})(\d{4})/, "#{$1}-#{$2}-#{$3}") #=> "02-11-2016"
That is why it works the second time.
Since the string is always the same length, you can use the [] method to break it up.
s = "#{s[0..1]}-#{s[2..3]}-#{s[4..-1]}"
This will return the desired result
"02-11-2016"

Related

Remove a string pattern and symbols from string

I need to clean up a string by removing the phrase "not" and hash characters (#). (I also have to get rid of spaces and caps lock and return the results in arrays, but I have those three taken care of.)
Expectation:
"not12345" #=> ["12345"]
" notabc " #=> ["abc"]
"notone, nottwo" #=> ["one", "two"]
"notCAPSLOCK" #=> ["capslock"]
"##doublehash" #=> ["doublehash"]
"h#a#s#h" #=> ["hash"]
"#notswaggerest" #=> ["swaggerest"]
This is the code I have
def some_method(string)
string.split(", ").map{|n| n.sub(/(not)/,"").downcase.strip}
end
All of the above tests do what I need except for the hash ones. I don't know how to get rid of the hashes; I have tried changing the regex part to n.sub(/(#not)/), n.sub(/#(not)/), and n.sub(/[#]*(not)/), to no avail. How can I make the regex remove #?
arr = ["not12345", " notabc", "notone, nottwo", "notCAPSLOCK",
"##doublehash:", "h#a#s#h", "#notswaggerest"]
arr.flat_map { |str| str.downcase.split(',').map { |s| s.gsub(/#|not|\s+/,"") } }
#=> ["12345", "abc", "one", "two", "capslock", "doublehash:", "hash", "swaggerest"]
When the block variable str is set to "notone, nottwo",
s = str.downcase
#=> "notone, nottwo"
a = s.split(',')
#=> ["notone", " nottwo"]
b = a.map { |s| s.gsub(/#|not|\s+/,"") }
#=> ["one", "two"]
Because I used Enumerable#flat_map, "one" and "two" are added to the array being returned. When str #=> "notCAPSLOCK",
s = str.downcase
#=> "notcapslock"
a = s.split(',')
#=> ["notcapslock"]
b = a.map { |s| s.gsub(/#|not|\s+/,"") }
#=> ["capslock"]
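The steps walked through above can be wrapped in one method. This is only a sketch: clean_tags is a hypothetical name, and unlike the gsub version it strips "not" only at the start of each entry, which is an assumption about the intent (so a word such as "knot" would be left alone).

```ruby
# Hypothetical helper: split on commas, then clean each entry.
def clean_tags(string)
  string.split(",").map do |part|
    part.downcase          # normalise case
        .delete("#")       # drop every hash, wherever it appears
        .sub(/\A\s*not/, "") # drop one leading "not" (assumed intent)
        .strip             # drop surrounding whitespace
  end
end

p clean_tags("notone, nottwo")  # ["one", "two"]
p clean_tags("h#a#s#h")         # ["hash"]
p clean_tags("#notswaggerest")  # ["swaggerest"]
```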
Here is one more solution that uses a different technique of capturing what you want rather than dropping what you don't want: (for the most part)
a = ["not12345", " notabc", "notone, nottwo",
"notCAPSLOCK", "##doublehash:","h#a#s#h", "#notswaggerest"]
a.map do |s|
s.downcase.delete("#").strip.scan(/(?<=not)\w+|^[^not]\w+/)
end
#=> [["12345"], ["abc"], ["one", "two"], ["capslock"], ["doublehash"], ["hash"], ["swaggerest"]]
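I had to delete the # characters first because of h#a#s#h; otherwise delete could have been avoided with a regex like /(?<=not|^#[^not])\w+/. Note that [^not] is a character class (any single character other than n, o, or t), not a negated word, so an entry that starts with one of those letters and has no "not" prefix is not captured.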
You can use this regex with sub to solve your problem. It handles leading whitespace, leading hashes, and a leading "not".
/^\s*#*(not)*/
^ means match start of string
\s* matches any space at the start
#* matches 0 or more #
(not)* matches the phrase "not" zero or more times.
Note: this regex won't work where "not" comes before "#" (not#hash would return #hash), or where # appears inside a word, as in h#a#s#h.
Fun problem because it can use the most common string functions in Ruby:
result = values.map do |string|
string.strip # Remove spaces in front and back.
.tr('#','') # Transform single characters. In this case remove #
.gsub('not','') # Substitute patterns
.split(', ') # Split into arrays.
end
p result #=>[["12345"], ["abc"], ["one", "two"], ["CAPSLOCK"], ["doublehash"], ["hash"], ["swaggerest"]]
I prefer this way rather than a regexp as it is easy to understand the logic of each line.
In Ruby regular expressions, # starts a comment in free-spacing (/x) mode and #{...} triggers interpolation, so to match a literal octothorpe (#) safely you can escape it:
"#foo".sub(/\#/, "") #=> "foo"

Removing trailing zeros in a string

I have a string and I need to remove trailing zeros after the 2nd decimal place:
remove_zeros("1,2,3,4.2300") #=> "1,2,3,4.23"
remove_zeros("1,2,3,4.20300") #=> "1,2,3,4.203"
remove_zeros("1,2,3,4.0200") #=> "1,2,3,4.02"
remove_zeros("1,2,3,4.0000") #=> "1,2,3,4.00"
Missing zeros don't have to be appended, i.e.
remove_zeros("1,2,3,4.0") #=> "1,2,3,4.0"
How could I do this in Ruby? I tried converting to Float, but parsing stops at the first ,. Can I write a regular expression for this?
Yes, a regular expression could be used.
R = /
\. # match a decimal point
\d*? # match zero or more digits lazily
\K # forget all matches so far
0+ # match one or more zeroes
(?!\d) # do not match a digit (negative lookahead)
/x # free-spacing regex definition mode
def truncate_floats(str)
str.gsub(R,"")
end
truncate_floats "1,2,3,4.2300"
#=> "1,2,3,4.23"
truncate_floats "1.34000,2,3,4.23000"
#=> "1.34,2,3,4.23"
truncate_floats "1,2,3,4.23003500"
#=> "1,2,3,4.230035"
truncate_floats "1,2,3,4.3"
#=> "1,2,3,4.3"
truncate_floats "1,2,3,4.000"
#=> "1,2,3,4."
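The last example shows that this regex strips every trailing zero, whereas the question asks to keep the first two decimal places. A possible variation, offered as a sketch, requires two digits after the decimal point before the \K, so only zeros beyond the second place are removed. KEEP_TWO_DECIMALS is a hypothetical name; remove_zeros is the name used in the question.

```ruby
# Two mandatory digits after the point, then lazily any more digits,
# then \K discards what was matched so far, leaving only the zeros to remove.
KEEP_TWO_DECIMALS = /\.\d{2}\d*?\K0+(?!\d)/

def remove_zeros(str)
  str.gsub(KEEP_TWO_DECIMALS, "")
end

puts remove_zeros("1,2,3,4.2300")  # 1,2,3,4.23
puts remove_zeros("1,2,3,4.20300") # 1,2,3,4.203
puts remove_zeros("1,2,3,4.0000")  # 1,2,3,4.00
puts remove_zeros("1,2,3,4.0")     # 1,2,3,4.0
```

Numbers with fewer than three decimal digits never match, so nothing is appended or removed for them.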
> a = "1,2,3,4.2300"
> a.split(",").map{|e| e.include?(".") ? e.to_f : e}.join(",")
#=> "1,2,3,4.23"
> a = "1,2,3,4.20300"
> a.split(",").map{|e| e.include?(".") ? e.to_f : e}.join(",")
#=> "1,2,3,4.203"
First, you need to parse the string into its component numbers, then remove the trailing zeros on each number. This can be done by:
1) splitting the string on ',' to get an array of numeric strings
2) for each numeric string, convert it to a Float, then back to a string:
#!/usr/bin/env ruby
def parse_and_trim(string)
number_strings = string.split(',')
number_strings.map { |s| Float(s).to_s }.join(',')
end
p parse_and_trim('1,2,3,4.2300') # => "1.0,2.0,3.0,4.23"
If you really want to remove the trailing '.0' fragments, you could replace the script with this one:
#!/usr/bin/env ruby
def parse_and_trim_2(string)
original_strings = string.split(',')
converted_strings = original_strings.map { |s| Float(s).to_s }
trimmed_strings = converted_strings.map do |s|
s.end_with?('.0') ? s[0..-3] : s
end
trimmed_strings.join(',')
end
p parse_and_trim_2('1,2,3,4.2300') # => "1,2,3,4.23"
These could of course be made more concise, but I've used intermediate variables to clarify what's going on.

Regexp to match repeated substring

I would like to verify a string containing repeated substrings. The substrings have a particular structure, and so does the whole string (substrings separated by "|"). For instance, the string can be:
1=23.00|6=22.12|12=21.34|112=20.34
1=23.00|6=22.12|12=21.34
1=23.00|12=21.34
1=23.00**
How can I check that all repeated substrings match a regexp? I tried to check it with:
"1=23.00|6=22.12|12=21.34".match(/([1-9][0-9]*[=][0-9\.]+)+/)
But checking gives true even when several substrings do not match the regexp:
"1=23.00|6=ass|=21.34".match(/([1-9][0-9]*[=][0-9\.]+)+/)
# => #<MatchData "1=23.00" 1:"1=23.00">
The question is whether every repeated substring matches a regex. I understand that the substrings are separated by the character | or $/, the latter being the end of a line. We first need to obtain the repeated substrings:
a = str.split(/[#{$/}\|]/)
.map(&:strip)
.group_by {|s| s}
.select {|_,v| v.size > 1 }
.keys
Next we specify whatever regex you wish to use. I am assuming it is this:
REGEX = /[1-9][0-9]*=[1-9]+\.[0-9]+/
but it could be altered if you have other requirements.
As we wish to determine if all repeated substrings match the regex, that is simply:
a.all? {|s| s =~ REGEX}
Here are the calculations:
str =<<_
1=23.00|6=22.12|12=21.34|112=20.34
1=23.00|6=22.12|12=21.34
1=23.00|12=21.34
1=23.00**
_
c = str.split(/[#{$/}\|]/)
#=> ["1=23.00", "6=22.12", "12=21.34", "112=20.34", "1=23.00",
# "6=22.12", "12=21.34", "1=23.00", "12=21.34", "1=23.00**"]
d = c.map(&:strip)
# same as c, possibly not needed or not wanted
e = d.group_by {|s| s}
# => {"1=23.00" =>["1=23.00", "1=23.00", "1=23.00"],
# "6=22.12" =>["6=22.12", "6=22.12"],
# "12=21.34" =>["12=21.34", "12=21.34", "12=21.34"],
# "112=20.34"=>["112=20.34"], "1=23.00**"=>["1=23.00**"]}
f = e.select {|_,v| v.size > 1 }
#=> {"1=23.00"=>["1=23.00", "1=23.00" , "1=23.00"],
# "6=22.12"=>["6=22.12", "6=22.12"],
# "12=21.34"=>["12=21.34", "12=21.34", "12=21.34"]}
a = f.keys
#=> ["1=23.00", "6=22.12", "12=21.34"]
a.all? {|s| s =~ REGEX}
#=> true
This will return true if any value appears more than once, false if not:
s = "1=23.00|6=22.12|12=21.34|112=20.34|3=23.00"
arr = s.split("|").map { |pair| pair.sub(/\A\d+=/, "") }
arr != arr.uniq # => true
If you want to solve it with a regexp alone (not Ruby code), you should match the whole string, not substrings. I added an optional | separator and start and end anchors to your regexp, and it should work the way you want.
^([1-9][0-9]*[=][0-9\.]+[|]*)+$
Try it out.
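For comparison, here is a sketch that validates the whole string by building the line pattern from the pattern for a single pair. PAIR, LINE, and valid_line? are hypothetical names, and the value format (digits, a dot, digits) is an assumption that is stricter than the [0-9\.]+ used in the question.

```ruby
# One key=value pair: a positive integer key, then a decimal value.
PAIR = /[1-9][0-9]*=[0-9]+\.[0-9]+/

# The whole string: one pair, then any number of "|pair" repetitions.
# \A and \z anchor the match to the entire string.
LINE = /\A#{PAIR}(?:\|#{PAIR})*\z/

def valid_line?(line)
  line.match?(LINE)
end

p valid_line?("1=23.00|6=22.12|12=21.34") # true
p valid_line?("1=23.00|6=ass|=21.34")     # false
```

Anchoring at both ends is what makes a single bad substring fail the whole check, instead of the regex quietly matching the first good pair.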

Separating a code chunk into the main parts and the expected return parts

What is the best way to separate a code chunk (string) into its "main parts" and its "expected return parts"? Here are my definitions:
An expected return part is a line that matches /^[ \t]*#[ \t]*=>/ followed by zero or more consecutive lines that do not match /^[ \t]*#[ \t]*=>/ but match /[ \t]*#(?!\{)/.
A main part is any run of consecutive lines that is not an expected return part.
Main parts and expected return parts may appear multiple times in a code chunk.
Given a code chunk as a string, I want to get an array of arrays, each of which holds a flag saying whether the part is an expected return part, and the string itself. What is the best way to do this? For example, given a string code whose content is:
def foo bar
"hello" if bar
end
#=> foo(true) == "hello"
#=> foo(false) == nil
a = (0..3).to_a
#=> a == [
# 0,
# 1,
# 2,
# 3
# ]
I would like a return that would be equivalent to this:
[[false, <<CHUNK1], [true, <<CHUNK2], [true, <<CHUNK3], [false, <<CHUNK4], [true, <<CHUNK5]]
def foo bar
"hello" if bar
end
CHUNK1
#=> foo(true) == "hello"
CHUNK2
#=> foo(false) == nil
CHUNK3
a = (0..3).to_a
CHUNK4
#=> a == [
# 0,
# 1,
# 2,
# 3
# ]
CHUNK5
This regex should match all expected returns:
^([ \t]*#[ \t]*=>.+(?:\n[ \t]*#(?![ \t]*=>).+)*)
Extract and then replace all expected returns from your string with a separator. Then split your string by the separator and you will have all main parts.
Test it here: http://rubular.com/r/ZYjqPQND28
There is a slight problem with your definition pertaining to the regex /[ \t]*#(?!\{)/, by which I am assuming you meant /[ \t]*#(?!=>)/, because otherwise
#=> foo(true) == "hello"
#=> foo(false) == nil
would count as one chunk.
Another approach would be to use this regex (completely unoptimised):
^([ \t]*#[ \t]*=>.+(?:\n[ \t]*#(?![ \t]*=>).+)*|(?:[ \t]*(?!#[ \t]*=>).+\n)*)
to simply split it into chunks correctly, then do a relatively simple regex test on each chunk to see if it is an expected return or main part.
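As an alternative to a single large regex, the definitions in the question can be applied line by line. The sketch below is one way to do that; split_chunk, RETURN_START, and COMMENT_LINE are hypothetical names, and anchoring the second pattern with ^ is an assumption (the question's version is unanchored).

```ruby
RETURN_START = /^[ \t]*#[ \t]*=>/ # first line of an expected return part
COMMENT_LINE = /^[ \t]*#(?!\{)/   # continuation line (assumed anchored)

# Returns [[flag, text], ...] where flag is true for expected return parts.
def split_chunk(code)
  parts = []
  code.each_line do |line|
    last = parts.last
    if line.match?(RETURN_START)
      parts << [true, line.dup]               # always starts a new return part
    elsif last && last[0] && line.match?(COMMENT_LINE)
      last[1] << line                         # comment continuing a return part
    elsif last && !last[0]
      last[1] << line                         # continue the current main part
    else
      parts << [false, line.dup]              # start a new main part
    end
  end
  parts
end
```

Joining the second elements of the result gives back the original string, so nothing is lost in the split.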

How do I get the match data for all occurrences of a Ruby regular expression in a string?

I need the MatchData for each occurrence of a regular expression in a string. This is different than the scan method suggested in Match All Occurrences of a Regex, since that only gives me an array of strings (I need the full MatchData, to get begin and end information, etc).
input = "abc12def34ghijklmno567pqrs"
numbers = /\d+/
numbers.match input # #<MatchData "12"> (only the first match)
input.scan numbers # ["12", "34", "567"] (all matches, but only the strings)
I suspect there is some method that I've overlooked. Suggestions?
You want
"abc12def34ghijklmno567pqrs".to_enum(:scan, /\d+/).map { Regexp.last_match }
which gives you
[#<MatchData "12">, #<MatchData "34">, #<MatchData "567">]
The "trick" is, as you see, to build an enumerator in order to get each last_match.
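A short sketch of how that enumerator can be wrapped and used to get the begin and end offsets the question asks for. The method name all_matches is hypothetical.

```ruby
# Collect one MatchData per occurrence by reading Regexp.last_match
# each time scan yields.
def all_matches(string, pattern)
  string.to_enum(:scan, pattern).map { Regexp.last_match }
end

input = "abc12def34ghijklmno567pqrs"
all_matches(input, /\d+/).each do |m|
  puts "Found #{m[0]} at #{m.begin(0)} until #{m.end(0)}"
end
# Found 12 at 3 until 5
# Found 34 at 8 until 10
# Found 567 at 19 until 22
```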
My current solution is to add an each_match method to Regexp:
class Regexp
def each_match(str)
start = 0
while matchdata = self.match(str, start)
yield matchdata
start = matchdata.end(0)
end
end
end
Now I can do:
numbers.each_match input do |match|
puts "Found #{match[0]} at #{match.begin(0)} until #{match.end(0)}"
end
Tell me there is a better way.
I’ll put it here to make the code available via a search:
input = "abc12def34ghijklmno567pqrs"
numbers = /\d+/
input.gsub(numbers) { |m| p $~ }
The result is as requested:
⇒ #<MatchData "12">
⇒ #<MatchData "34">
⇒ #<MatchData "567">
See "Matching data in Ruby for all occurrences in a string" for more information.
I'm surprised nobody mentioned the amazing StringScanner class included in Ruby's standard library:
require 'strscan'
s = StringScanner.new('abc12def34ghijklmno567pqrs')
while s.skip_until(/\d+/)
num, offset = s.matched.to_i, [s.pos - s.matched_size, s.pos - 1]
# ..
end
No, it doesn't give you the MatchData objects, but it does give you an index-based interface into the string.
input = "abc12def34ghijklmno567pqrs"
n = Regexp.new("\\d+")
[n.match(input)].tap { |a| a << n.match(input,a.last().end(0)+1) until a.last().nil? }[0..-2]
=> [#<MatchData "12">, #<MatchData "34">, #<MatchData "567">]

Resources