Match String with Number Placeholder - ruby

I would like to match the following strings using String#match (https://apidock.com/ruby/String/match):
"The account 340394034 is finalized"
"The account 9394834 is finalized"
"The account 12392039483 is finalized"
"The account 3493849384 is finalized"
"The account 32984938434983 is finalized"
Which regex do I have to use to match these strings, which contain a number placeholder? Thanks
"The account {number_placeholder} is finalized"

To match just the number, this is the full regex:
\d+
Depending on input, assuming there is a possibility of other numbers in the string, you could use this instead and get the contents of capture group 1:
account\s+(\d+)
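As a minimal sketch of the capture-group approach: String#[] accepts a regex plus a group index and returns only that group, or nil when nothing matches. The helper name account_number_from is made up for illustration.

```ruby
# Hypothetical helper: returns the digits after "account" as a String, or nil.
def account_number_from(text)
  # The second argument (1) selects capture group 1 instead of the whole match.
  text[/account\s+(\d+)/, 1]
end

p account_number_from("The account 9394834 is finalized") # => "9394834"
p account_number_from("No digits here")                   # => nil
```

Because the pattern keys on the word "account", other numbers earlier in the string are ignored.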

If you just want to use the match method to determine whether a given string matches the pattern in your examples, you can do this:
example = "The account 32984938434983 is finalized"
if example.match(/The account \d+ is finalized/)
  puts "it matched"
else
  puts "it didn't match"
end
The match method returns a MatchData object, which describes the part of the string that matched the regex (in this case, the whole string). On a non-matching string it returns nil, so you can use the result of match directly as the condition of an if-statement.
If you want to extract the number in the string, only if the string matches the pattern, you could do this:
example = "The account 32984938434983 is finalized"
match_result = example.match(/The account (\d+) is finalized/)
number = if match_result
  match_result.captures.first.to_i
else
  nil # or 0 / some other default value
end
The parentheses in the regex form a "capture group". The captures method on the result gives an array of all the capture group matches. The first method gets the first (and in this case only) element from that array, and the to_i method converts the string into an integer.
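One caveat worth sketching: without anchors, the pattern also matches when the sentence is only part of a longer string. Assuming the whole string should have exactly this shape, \A and \z can pin the match to the start and end. The constant and method names here are hypothetical.

```ruby
# Hypothetical names, used only for this sketch.
# \A and \z anchor the pattern to the very start and end of the string.
FINALIZED_PATTERN = /\AThe account (\d+) is finalized\z/

def finalized_account_number(text)
  match = FINALIZED_PATTERN.match(text)
  # match[1] is the first capture group; return nil when there is no match.
  match ? match[1].to_i : nil
end

p finalized_account_number("The account 340394034 is finalized")   # => 340394034
p finalized_account_number("Note: The account 1 is finalized now") # => nil
```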

Related

Ruby : get value between parenthesis (without those parenthesis)

I have strings like "untitled", "untitled(1)" or "untitled(2)".
I want to get the last integer value between parentheses, if there is one. So far I have tried a lot of regexes; the ones that make sense to me (I am new to regex) look like this:
number= string[/\([0-9]+\)/]
number= string[/\(([0-9]+)\)/]
but it still returns the value with the parentheses.
If there is no left-then-right pair of parentheses, getting an empty string (or nil) would be nice. For a case such as "untitled($)", getting the '$' char, nil, or an empty string would do the trick. For "untitled(3) animal(4)", I want to get 4.
I have been looking at a lot of topics about how to do that, but it never seems to work... what am I missing here?
/(?<=\()\w+(?=\)$)/ matches one or more word characters (letter, number, underscore) within parentheses, right before the end of the line:
words = %w[
untitled
untitled(1)
untitled(2)
untitled(foo)
unti(tle)d
]
p words.map { |word| word[/(?<=\()\w+(?=\)$)/] }
# => [nil, "1", "2", "foo", nil]
When you use a Regexp as the argument to String#[], you can optionally pass the index of a capture group to extract that group. From the documentation:
If a Regexp is supplied, the matching portion of the string is returned. If a capture follows the regular expression, which may be a capture group index or name, follows the regular expression that component of the MatchData is returned instead.
string = "untitled(1)"
number = string[/\(([0-9]+)\)/, 1]
puts number
#=> 1
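For the "untitled(3) animal(4)" case from the question, where the last parenthesized integer is wanted, one possible sketch uses String#scan and takes the final match. The method name is hypothetical, and the result is returned as a String (or nil), which is an assumption about what the asker wants.

```ruby
# Hypothetical helper: last run of digits found between parentheses, or nil.
def last_parenthesized_number(text)
  # scan returns one single-element array per match, e.g. [["3"], ["4"]].
  # &. (Ruby 2.3+) skips the call to first when there were no matches.
  text.scan(/\((\d+)\)/).last&.first
end

p last_parenthesized_number("untitled(3) animal(4)") # => "4"
p last_parenthesized_number("untitled")              # => nil
p last_parenthesized_number("untitled($)")           # => nil
```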

Different return values of `String#split` method using regex

Why does the second split in the following return the punctuation? Why does using parentheses in the regular expression change the output?
str = "This is a test string. Let's split it."
str.split(/\. /)
# =>["This is a test string", "Let's split it."]
str.split(/(\. )/)
# =>["This is a test string", ". ", "Let's split it."]
Because the second call uses a regex that contains a capture group. From the String#split documentation:
If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters. If pattern contains groups, the respective matches will be returned in the array as well.
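If grouping is needed but the separators should not appear in the result, a non-capturing group avoids the behaviour described above. As a small sketch (the constant names are made up for this example), a lookbehind can also be used so that each sentence keeps its period:

```ruby
SAMPLE_TEXT = "This is a test string. Let's split it."

# Non-capturing group (?:...): groups without adding the separator to the result.
WITHOUT_SEPARATOR = SAMPLE_TEXT.split(/(?:\. )/)

# Lookbehind (?<=\.): split on the space only, so the period stays attached.
KEEPING_PERIOD = SAMPLE_TEXT.split(/(?<=\.) /)

p WITHOUT_SEPARATOR # => ["This is a test string", "Let's split it."]
p KEEPING_PERIOD    # => ["This is a test string.", "Let's split it."]
```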

Use regular expression to fetch 3 groups from string

This is my expected result:
input a string and get three strings returned.
I have no idea how to do this with a regex in Ruby.
This is my rough idea:
match(/(.*?)(_)(.*?)(\d+)/)
Input and expected output
# "R224_OO2003" => R224, OO, 2003
# "R2241_OOP2003" => R2241, OOP, 2003
If the example description I gave in my comment on the question is correct, you need a very straightforward regex:
r = /(.+)_(.+)(\d{4})/
Then:
"R224_OO2003".scan(r).flatten #=> ["R224", "OO", "2003"]
"R2241_OOP2003".scan(r).flatten #=> ["R2241", "OOP", "2003"]
Assuming that your three parts consist of (R and one or more digits), then an underbar, then (one or more non-whitespace characters), before finally (a 4-digit numeric date), then your regex could be something like this:
^(R\d+)_(\S+)(\d{4})$
The ^ indicates start of string, and the $ indicates end of string. \d+ indicates one or more digits, while \S+ says one or more non-whitespace characters. The \d{4} says exactly four digits.
To recover data from the matches, you could either use the pre-defined globals that line up with your groups, or you could use named captures.
To use the match globals just use $1, $2, and $3. In general, you can figure out the number to use by counting the left parentheses of the specific group.
To use the named captures, include ?<name> right after the left paren of a particular group. For example:
x = "R2241_OOP2003"
match_data = /^(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})$/.match(x)
puts match_data['first'], match_data['second'], match_data['third']
yields
R2241
OOP
2003
as expected.
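The two retrieval styles described above can be compared side by side. This is only a sketch: the constant and method names are invented here, and MatchData#named_captures assumes Ruby 2.4 or later.

```ruby
# Hypothetical names for illustration only.
CODE_PATTERN = /^(R\d+)_(\S+)(\d{4})$/

# Style 1: numbered globals, which are set by a successful =~.
def parts_via_globals(text)
  return nil unless text =~ CODE_PATTERN
  [$1, $2, $3]
end

# Style 2: named captures, returned as a Hash keyed by group name.
def parts_via_names(text)
  match = /^(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})$/.match(text)
  match && match.named_captures
end

p parts_via_globals("R2241_OOP2003") # => ["R2241", "OOP", "2003"]
p parts_via_names("R2241_OOP2003")
# => {"first"=>"R2241", "second"=>"OOP", "third"=>"2003"}
```

Named captures tend to be easier to maintain, since adding a group does not renumber the others.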
As long as your pattern covers all possibilities, then you just need to use the match object to return the 3 strings:
my_match = "R224_OO2003".match(/(.*?)(_)(.*?)(\d+)/)
#=> #<MatchData "R224_OO2003" 1:"R224" 2:"_" 3:"OO" 4:"2003">
puts my_match[0] #=> "R224_OO2003"
puts my_match[1] #=> "R224"
puts my_match[2] #=> "_"
puts my_match[3] #=> "OO"
puts my_match[4] #=> "2003"
A MatchData object holds each capture group, starting at index [1]. As you can see, index [0] returns the entire matched text. If you don't want to capture the "_", you can leave its parentheses out.
Also, I'm not sure you are getting what you want with the part:
(.*?)
The ? after .* makes the quantifier lazy (non-greedy): it matches zero or more of any character, but as few as possible.
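To see what the lazy ? changes in practice, the following sketch runs the question's pattern (without the group around "_") next to a greedy variant. The method names are hypothetical.

```ruby
# Lazy quantifiers: each .*? takes as few characters as possible.
def lazy_captures(text)
  text.match(/(.*?)_(.*?)(\d+)/)&.captures
end

# Greedy quantifiers: the second .* takes as much as it can,
# leaving only a single digit for \d+.
def greedy_captures(text)
  text.match(/(.*)_(.*)(\d+)/)&.captures
end

p lazy_captures("R224_OO2003")   # => ["R224", "OO", "2003"]
p greedy_captures("R224_OO2003") # => ["R224", "OO200", "3"]
```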

Ruby Regex: negative lookahead with unlimited matching before

I'm trying to be able to match a phrase like:
I request a single car
// or
I request a single person
// or
I request a single coconut tree
but not
I request a single car by id
// nor
I request a single person by id with friends
// nor
I request a single coconut tree by id with coconuts
Something like this works:
/^I request a single person(?!\s+by id.*)/
for strings like this:
I request a single person
I request a single person with friends
But when I replace the person with a matcher (.*) or add the $ to the end, it stops working:
/^I request a single (.*)(?!\s+by id.*)$/
How can I accomplish this but still match in the first match everything before the negative lookahead?
There's no ) to match ( in (.*\). Perhaps that's a typo, since you tested. After fixing that, however, there's still a problem:
"I request a single car by idea" =~ /^I request a single (?!.*by id.*)(.*)$/
#=> nil
Presumably, that should be a match. If you only want to know if there's a match, you can use:
r = /^I request a single (?!.+?by id\b)/
Then:
"I request a single car by idea" =~ r #=> 0
"I request a single person by id with friends" =~ r #=> nil
\b matches a word boundary, which includes the case where the previous character is the last one in the string. Notice that if you are just checking for a match, there's no need to include anything beyond the negative lookahead.
If you want to return whatever follows "single " when there's a match, use:
r = /^I request a single (?!.+?by id\b)(.*)/
"I request a single coconut tree"[r,1] #=> "coconut tree"
"I request a single person by id with friends"[r,1] #=> nil
OK, I think I just got it, right after asking the question. Instead of creating a lookahead after the thing I want to capture, I create a lookahead before the thing I want to capture, like so:
/^I request a single (?!.*by id.*)(.*[^\s])?\s*$/
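As a quick check of the "lookahead before the capture" idea, this sketch wraps the \b-based pattern from the earlier answer in a helper and runs it against the phrases from the question. The constant and method names are invented for the example.

```ruby
# Hypothetical names; the pattern is the \b-based one from the answer above.
SINGLE_REQUEST = /^I request a single (?!.+?by id\b)(.*)/

# Returns whatever follows "single ", or nil when the phrase contains "by id".
def requested_thing(phrase)
  phrase[SINGLE_REQUEST, 1]
end

p requested_thing("I request a single coconut tree")              # => "coconut tree"
p requested_thing("I request a single car by idea")               # => "car by idea"
p requested_thing("I request a single person by id with friends") # => nil
```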

Validate that first char has to be integer or letter

I have this method where I "spot" user input I want to prevent.
def has_forbidden_prefix?(string)
  string =~ %r{^(http://|www.)}
end
If has_forbidden_prefix? is true, then I don't want to accept the input.
For example:
Allowed: google.com
Not allowed: www.google.com, http://google.com, http://www.google.com
Now I want to detect also any beginning special characters in my method.
Not allowed: .google.com, /google.com, ...
What do I have to include in my regex?
The regex to check whether the first character is a letter or a digit is:
^[a-zA-Z0-9]
where ^ stands for the start of the string the regex pattern is applied to. For more info refer to http://www.regular-expressions.info/reference.html
That regex matches the inputs you want to allow, which is the opposite of what has_forbidden_prefix? checks.
So, expanding on it with a negated character class, you could have:
def has_forbidden_prefix?(string)
  disallowed_string = string =~ %r{\A(http://|www)}
  non_alphanumeric = string =~ /\A[^a-zA-Z0-9]/
  disallowed_string || non_alphanumeric
end
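A possible variant, sketched under two assumptions that are not in the original: the dot after www is escaped so that only a literal "www." is rejected, and String#match? (Ruby 2.4+) is used so the method returns true or false rather than an index or nil. The method name forbidden_prefix? is hypothetical.

```ruby
# Hypothetical variant of the method above, returning a strict boolean.
def forbidden_prefix?(string)
  # Assumption: "www." must be followed by a literal dot, hence the \.
  has_scheme_or_www = string.match?(%r{\A(?:http://|www\.)})
  # Any first character that is not an ASCII letter or digit is rejected.
  starts_with_special = string.match?(/\A[^a-zA-Z0-9]/)
  has_scheme_or_www || starts_with_special
end

p forbidden_prefix?("google.com")        # => false
p forbidden_prefix?("www.google.com")    # => true
p forbidden_prefix?("http://google.com") # => true
p forbidden_prefix?(".google.com")       # => true
```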
