ruby regex test for exact match of string - ruby

I am trying to write a regex to exactly match if a string is a mongo id, not if a string contains a mongo id. My regex for a mongo id is
/[a-z,0-9]{24}/
which works great, but I can't figure out how to write a regex that rejects a URL that contains a mongo id, for example. So, I get this, which is not what I want:
pattern = /[a-z,0-9]{24}/
str1 = "589ab375c3c6416310171b5b"
str2 = "http://somewhere.com/589ab375c3c6416310171b5b?review"
str1.match pattern
output:
#<MatchData "589ab375c3c6416310171b5b">
And the second string
str2.match pattern
output:
#<MatchData "589ab375c3c6416310171b5b">
I want the first to match but the second to not.
Thanks for any help,
kevin

For your example data your could use anchors \A and \z. The data does not contain a comma so you can omit that from the character class.
For example:
pattern = /\A[a-f0-9]{24}\z/
Demo

You can use ^ and $ to match exactly a whole string. ^ matches the start of a line, and $ matches the end. So:
/^[a-z,0-9]{24}$/
will match a string containing only the 24 characters of a Mongo ID.

Related

best way to find substring in ruby using regular expression

I have a string https://stackverflow.com. I want a new string that contains the domain from the given string using regular expressions.
Example:
x = "https://stackverflow.com"
newstring = "stackoverflow.com"
Example 2:
x = "https://www.stackverflow.com"
newstring = "www.stackoverflow.com"
"https://stackverflow.com"[/(?<=:\/\/).*/]
#⇒ "stackverflow.com"
(?<=..) is a positive lookbehind.
If string = "http://stackoverflow.com",
a really easy way is string.split("http://")[1]. But this isn't regex.
A regex solution would be as follows:
string.scan(/^http:\/\/(.+)$/).flatten.first
To explain:
String#scan returns the first match of the regex.
The regex:
^ matches beginning of line
http: matches those characters
\/\/ matches //
(.+) sets a "match group" containing any number of any characters. This is the value returned by the scan.
$ matches end of line
.flatten.first extracts the results from String#scan, which in this case returns a nested array.
You might want to try this:
#!/usr/bin/env ruby
str = "https://stackoverflow.com"
if mtch = str.match(/(?::\/\/)(/S)/)
f1 = mtch.captures
end
There are two capturing groups in the match method: the first one is a non-capturing group referring to your search pattern and the second one referring to everything else afterwards. After that, the captures method will assign the desired result to f1.
I hope this solves your problem.

Ruby Regex Rubular vs reality

I have a string and I want to remove all non-word characters and whitespace from it. So I thought Regular expressions would be what I need for that.
My Regex looks like that (I defined it in the string class as a method):
/[\w&&\S]+/.match(self.downcase)
when I run this expression in Rubular with the test string "hello ..a.sdf asdf..," it highlioghts all the stuff I need ("hellloasdfasdf") but when I do the same in irb I only get "hello".
Has anyone any ideas about why that is?
Because you use match, with returns one matching element. If you use scan instead, all should work properly:
string = "hello ..a.sdf asdf..,"
string.downcase.scan(/[\w&&\S]+/)
# => ["hello", "a", "sdf", "asdf"]
\w means [a-zA-Z0-9_]
\S means any non-whitespace character [a-zA-Z_-0-9!##$%^&*\(\)\\{}?><....etc]
so using a \w and \S condition is ambiguous.
Its like saying What is an intersection of India and Asia. Obviously its going to be India. So I will suggest you to use \w+.
and you can use scan to get all matches as mentioned in the second answer :
string = "hello ..a.sdf asdf..,"
string.scan(/\w+/)

Ruby regex - using optional named backreferences

I am trying to write a Ruby regex that will return a set of named matches. If the first element (defined by slashes) is found anywhere later in the string then I want the match to return that 2nd match onward. Otherwise, return the whole string. The closest I've gotten is (?<p1>top_\w+).*?(?<hier>\k<p1>.*) which doesn't work for the 3rd item. I've tried regex ifthen-else constructs but Rubular says it's invalid. I've tried (?<p1>[\w\/]+?)(?<hier>\k<p1>.*) which correct splits the 1st and 4th lines but doesn't work for the others. Please note: I want all results to return as the same named reference so I can iterate through "hier".
Input:
top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
top_bat/car[0]
top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
Output:
hier = top_cat/mouse/dog/elephant/horse
hier = top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
hier = top_bat/car[0]
hier = top_2/top_1/top_3/top_4/dog
Problem
The reason it does not match the second line is because the second instance of hat does not end with a slash, but the first instance does.
Solution
Specify that there is a slash between the first and second match
Regex
(top_.*)/(\1.*$)|(^.*$)
Replacement
hier = \2\3
Example
Regex101 Permalink
More info on the Alternation token
To explain how the | token works in regex, see the example: abc|def
What this regex means in plain english is:
Match either the regex below (attempting the next alternative only if this one fails)
Match the characters abc literally
Or match the regex below (the entire match attempt fails if this one fails to match)
Match the characters def literally
Example
Regex: alpha|alphabet
If we had a phrase "I know the alphabet", only the word alpha would be matched.
However, if we changed the regex to alphabet|alpha, we would match alphabet.
So you can see, alternation works in a left-to-right fashion.
paths = %w(
top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
top_ab12/hat/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
top_bat/car[0]
top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
test/test
)
paths.each do |path|
md = path.match(/^([^\/]*).*\/(\1(\/.*|$))/)
heir = md ? md[2] : path
puts heir
end
Output:
top_cat/mouse/dog/elephant/horse
top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
top_bat/car[0]
top_2/top_1/top_3/top_4/dog
test

Match consecutive list of exactly one character in set with regular expressions

I don't think I'll even try to explain this, I don't know the words to, but I'd like to achieve the following:
Given a string like this:
+++>><<<--
I'd like a match to give me: +++, but also match if any of the other characters were in the string consecutively like they are. So if the +++ wasn't there, I'd like to match >>.
I tried using the following regular expression:
([><\-\+]+)
However, given the string above, it would match the entire string, and not the first list of consecutive characters.
If it makes a difference, this is in Ruby (1.9.3).
Not sure about the ruby bit, but you can do this with backreferences in the pattern:
(.)\1+
What this does is to use a capturing group () to capture any character . followed by any number + of the same character \1. The \1 is a backreference to the the first captured group; in a pattern with more capturing groups \2 would be the second captured group and so on.
Java Example
Pattern p = Pattern.compile("(.)\\1+");
Matcher m = p.matcher("aaabbccaa");
m.find();
System.out.println(m.group(0)); // prints "aaa"
Ruby Example
# Return an array of matched patterns.
string = '+++>><<<--'
string.scan( /((.)\2+)/ ).collect { |match| match.first }

Ruby regular expression

Apparently I still don't understand exactly how it works ...
Here is my problem: I'm trying to match numbers in strings such as:
910 -6.258000 6.290
That string should gives me an array like this:
[910, -6.2580000, 6.290]
while the string
blabla9999 some more text 1.1
should not be matched.
The regex I'm trying to use is
/([-]?\d+[.]?\d+)/
but it doesn't do exactly that. Could someone help me ?
It would be great if the answer could clarify the use of the parenthesis in the matching.
Here's a pattern that works:
/^[^\d]+?\d+[^\d]+?\d+[\.]?\d+$/
Note that [^\d]+ means at least one non digit character.
On second thought, here's a more generic solution that doesn't need to deal with regular expressions:
str.gsub(/[^\d.-]+/, " ").split.collect{|d| d.to_f}
Example:
str = "blabla9999 some more text -1.1"
Parsed:
[9999.0, -1.1]
The parenthesis have different meanings.
[] defines a character class, that means one character is matched that is part of this class
() is defining a capturing group, the string that is matched by this part in brackets is put into a variable.
You did not define any anchors so your pattern will match your second string
blabla9999 some more text 1.1
^^^^ here ^^^ and here
Maybe this is more what you wanted
^(\s*-?\d+(?:\.\d+)?\s*)+$
See it here on Regexr
^ anchors the pattern to the start of the string and $ to the end.
it allows Whitespace \s before and after the number and an optional fraction part (?:\.\d+)? This kind of pattern will be matched at least once.
maybe /(-?\d+(.\d+)?)+/
irb(main):010:0> "910 -6.258000 6.290".scan(/(\-?\d+(\.\d+)?)+/).map{|x| x[0]}
=> ["910", "-6.258000", "6.290"]
str = " 910 -6.258000 6.290"
str.scan(/-?\d+\.?\d+/).map(&:to_f)
# => [910.0, -6.258, 6.29]
If you don't want integers to be converted to floats, try this:
str = " 910 -6.258000 6.290"
str.scan(/-?\d+\.?\d+/).map do |ns|
ns[/\./] ? ns.to_f : ns.to_i
end
# => [910, -6.258, 6.29]

Resources