Matching multiple parts of a string as first match - ruby

Given the following string:
details.html?id=8220&inr=4241&marke=Ford&modell=Focus&art=Gebrauchtwagen&standort=
I need to match 82204241 in a single expression. I need to extract all numbers from it as a single match. Any idea how this can be solved?
(\d+) will create two matches. I also tried with something like this without any luck: details\.html\?[id=|.*inr=]+(\d+)

A regex match is always a contiguous substring of the original string. Since 82204241 does not appear as a contiguous substring of the original string, it is impossible to get it as a single match with a regex.

How about joining the results of scan? Here:
a = "details.html?id=8220&inr=4241&marke=Ford&modell=Focus&art=Gebrauchtwagen&standort="
a.scan(/\d+/).join
# => "82204241"
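Note that scan(/\d+/) picks up every run of digits in the string, so digits in any other parameter would end up in the result too. A minimal sketch that restricts the scan to the id and inr parameters, assuming those are the only two values wanted (as in the question); the variable names are just for illustration:

```ruby
url = "details.html?id=8220&inr=4241&marke=Ford&modell=Focus&art=Gebrauchtwagen&standort="

# Capture digits only when they follow "id=" or "inr=" at the start of a
# query parameter, i.e. right after "?" or "&".
parts = url.scan(/[?&](?:id|inr)=(\d+)/) # one sub-array per match

# Flatten the sub-arrays and glue the pieces together.
combined = parts.flatten.join

puts combined
```

This is still two matches joined afterwards, not a single regex match.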

Related

Ensure non-matching of a pattern within a scope

I am trying to create a regex that matches a pattern in some part of a string, but not in another part of the string.
I am trying to match a substring that
(i) is surrounded by a balanced pair of one or more consecutive backticks `
(ii) and does not include as many consecutive backticks as in the surrounding patterns
(iii) where the surrounding patterns (sequence of backticks) are not adjacent to other backticks.
This is some variant of the syntax of inline code notation in Markdown syntax.
Examples of matches are as follows:
"xxx`foo`yyy" # => matches "foo"
"xxx``foo`bar`baz``yyy" # => matches "foo`bar`baz"
"xxx```foo``bar``baz```yyy" # => matches "foo``bar``baz"
One regex to achieve this is:
/(?<!`)(?<backticks>`+)(?<inline>.+?)\k<backticks>(?!`)/
which uses a non-greedy match.
I was wondering if I can get rid of the non-greedy match.
The idea comes from when the prohibited pattern is a single character. When I want to match a substring that is surrounded by a single quote ' that does not include a single quote in it, I can do either:
/'.+?'/
/'[^']+'/
The first one uses non-greedy match, and the second one uses an explicit non-matching pattern [^'].
I am wondering if it is possible to have something like the second form when the prohibited pattern is not a single character.
Going back to the original issue, there is the negative lookahead syntax (?!...), but I cannot restrict its effective scope. If I make my regex like this:
/(?<!`)(?<backticks>`+)(?<inline>(?!.*\k<backticks>).*)\k<backticks>(?!`)/
then the effect of (?!.*\k<backticks>) will not be limited to within (?<inline>...), but will extend to the rest of the string. And since that contradicts the \k<backticks> at the end, the regex fails to match.
Is there a regex technique to ensure non-matching of a pattern (not-necessarily a single character) within a certain scope?
You can match one or more characters, checking before each one that the delimiter does not start at that position:
/(?<!`)(?<backticks>`+)(?<inline>(?:(?!\k<backticks>).)+)\k<backticks>(?!`)/
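As a quick check, the pattern above can be run against the three examples from the question. This is only a sketch; the constant and variable names are made up for illustration:

```ruby
# The regex from the answer: each character of the inline part is consumed
# only if the opening backtick run does not start at that position.
INLINE_CODE = /(?<!`)(?<backticks>`+)(?<inline>(?:(?!\k<backticks>).)+)\k<backticks>(?!`)/

samples = [
  "xxx`foo`yyy",
  "xxx``foo`bar`baz``yyy",
  "xxx```foo``bar``baz```yyy",
]

# Collect the named "inline" group for each sample (nil if nothing matched).
results = samples.map do |text|
  m = INLINE_CODE.match(text)
  m && m[:inline]
end

results.each { |r| puts r.inspect }
```

The three results should be the matches listed in the question: foo, foo`bar`baz and foo``bar``baz.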

Regular Expression String Does Not Contain Numbers

I'm trying to match a pattern in a url that does not include a number.
For example:
/painters/1-joe-bob/dashboard
I would only want to match urls that are the following:
/painters
/painters/string
If the url includes /painters/1-something then there should be no match.
I've been trying the following with no luck:
\/{1}(painters|contractors)\/?[^0-9][a-z]*
This still matches on /painters/ or /contractors/
Please advise.
You can try this regex. It uses a negative lookahead to disallow a match if a number comes after your second forward slash.
^\/(painters|contractors)\/(?![0-9])
Note that if you don't want a number anywhere in the string, you can use a negative lookahead right at the beginning.
^(?!.*[0-9])\/(painters|contractors)\/
This construct will disallow any string containing numbers.
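To see how the two patterns differ, here is a small sketch that runs both against a few paths. The constant names and the extra test paths are assumptions for illustration, and match? needs Ruby 2.4 or later:

```ruby
# Pattern 1: no digit directly after the second slash.
AFTER_SLASH = /^\/(painters|contractors)\/(?![0-9])/
# Pattern 2: no digit anywhere in the string.
ANYWHERE    = /^(?!.*[0-9])\/(painters|contractors)\//

paths = [
  "/painters/string",
  "/painters/1-joe-bob/dashboard",
  "/painters/joe/2",
  "/painters",
]

paths.each do |path|
  puts format("%-32s after_slash=%-5s anywhere=%s",
              path, AFTER_SLASH.match?(path), ANYWHERE.match?(path))
end
```

Both patterns require the slash after painters or contractors, so a bare /painters is not matched by either of them as written.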

Match comma separated list with Ruby Regex

Given the following string, I'd like to match the elements of the list and parts of the rest after the colon:
foo,bar,baz:something
I.e. I am expecting the first three match groups to be "foo", "bar", "baz". No commas and no colon. The minimum number of elements is 1, and there can be arbitrarily many. Assume no whitespace and lower case.
I've tried this, which should work, but doesn't populate all the match groups for some reason:
^([a-z]+)(?:,([a-z]+))*:(something)
That matches foo in \1 and baz (or whatever the last element is) in \2. I don't understand why I don't get a match group for bar.
Any ideas?
EDIT: Ruby 1.9.3, if that matters.
EDIT2: Rubular link: http://rubular.com/r/pDhByoarbA
EDIT3: Add colon to the end, because I am not just trying to match the list. Sorry, oversimplified the problem.
This expression works for me: /(\w+)/i
If you want to do it with regex, how about this?
(?<=^|,)("[^"]*"|[^,]*)(?=,|$)
This matches comma-separated fields, including fields where commas appear inside quoted strings, such as 123,"Yes, No".
More verbosely:
(?<=^|,) # Must be preceded by start-of-line or comma
(
"[^"]*"| # A quote, followed by a bunch of non-quotes, followed by quote, OR
[^,]* # OR anything until the next comma
)
(?=,|$) # Must end with comma or end-of-line
Usage would be with something like Python's re.findall(), which returns all non-overlapping matches in the string (working from left to right, if that matters.) Don't use it with your equivalent of re.search() or re.match() which only return the first match found.
(NOTE: This actually doesn't work in Python because the lookbehind (?<=^|,) isn't fixed width. Grr. Open to suggestions on this one.)
Edit: Use a non-capturing group to consume start-of-line or comma, instead of a lookbehind, and it works in Python.
>>> import re
>>> test_str = '123,456,"String","String, with, commas","Zero-width fields next",,"",nyet,123'
>>> m = re.findall(r'(?:^|,)("[^"]*"|[^,]*)(?=,|$)', test_str)
>>> m
['123', '456', '"String"', '"String, with, commas"',
'"Zero-width fields next"', '', '""', 'nyet', '123']
Edit 2: The Ruby equivalent of Python's re.findall(needle, haystack) is haystack.scan(needle).
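A rough Ruby version of the Python session, using the non-capturing-group form of the pattern. The constant name and the shorter test string are made up for illustration:

```ruby
# Same pattern as in the Python example: consume start-of-line or a comma,
# then capture either a quoted field or everything up to the next comma.
FIELD = /(?:^|,)("[^"]*"|[^,]*)(?=,|$)/

test_str = '123,"Yes, No",,nyet'

# With a capture group, scan returns one sub-array per match, so flatten it.
fields = test_str.scan(FIELD).flatten

p fields
```

Here fields should come out as four strings: 123, the quoted "Yes, No" (quotes included), an empty string for the zero-width field, and nyet.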
Maybe split would be a better solution in this case?
'foo,bar,baz'.split(',')
=> ["foo", "bar", "baz"]
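As for why bar goes missing in the original attempt: a capture group inside a repetition keeps only the text from its last iteration, so \2 ends up holding baz. A minimal sketch that combines a regex with split for the full input, assuming the part after the colon is the literal something from the question:

```ruby
line = "foo,bar,baz:something"

# Capture the whole list as one group and the part after the colon as another.
m = line.match(/\A([a-z]+(?:,[a-z]+)*):(something)\z/)

if m
  items = m[1].split(",") # split the captured list into its elements
  rest  = m[2]
end

p items
p rest
```

The regex only validates the overall shape; the list elements come from split, so any number of them is handled.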
If I am interpreting your post correctly, you want everything separated by commas before the colon (:).
The appropriate regex for this would be:
[^\s:]*(,[^\s:]*)*(:.*)?
This should find everything you are looking for.

How to match anything EXCEPT this string?

How can I match a string that is NOT partners?
Here is what I have that matches partners:
/^partners$/i
I've tried the following to NOT match partners but doesn't seem to work:
/^(?!partners)$/i
Your regex
/^(?!partners)$/i
only matches empty lines because you didn't include the end-of-line anchor in your lookahead assertion. Lookaheads do just that - they "look ahead" without actually matching any characters, so only lines that match the regex ^$ will succeed.
This would work:
/^(?!partners$)/i
This reports a match for any string (or, since we're in Ruby here, any line in a multi-line string) that is different from partners. Note that it only matches the empty string at the start of the line. That is enough for validation purposes, but the match result will be "" (instead of the nil you'd get if the match failed entirely).
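A short sketch of what that looks like in practice; the constant name and the sample words are chosen only for illustration:

```ruby
# The corrected regex: the end-of-line anchor sits inside the lookahead.
NOT_PARTNERS = /^(?!partners$)/i

outcomes = %w[partners Partners partner partnership].map do |word|
  m = NOT_PARTNERS.match(word)
  [word, m && m[0]] # m[0] is the matched text, nil when there is no match
end

outcomes.each { |word, matched| puts "#{word}: #{matched.inspect}" }
```

Both spellings of partners give nil, while partner and partnership give an empty-string match.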
Not easily, but it can be done with the lookahead operator.
Here is the Ruby regex:
^((?!partners).)*$
Cheers
If you only want to get a true value when the string is not partners, then there is no need to use a regex and you can just use a case-insensitive string comparison.
For example, in Java:
!"partners".equalsIgnoreCase(aString)
If you for some reason need a positive regex match for any string which does not contain partners (if it's part of a larger regex, for example), you could use several different constructs, like:
^(?:(?!partners).)*$
or
^(?:[^p]+|p(?!artners))*$
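In Ruby, the two suggestions from this answer could look roughly like the sketch below. The method and constant names are hypothetical, and casecmp? and match? assume Ruby 2.4 or later:

```ruby
# Plain comparison, ignoring case: true for anything other than "partners".
def not_partners?(string)
  !"partners".casecmp?(string)
end

# Positive match for a line that does not contain "partners" anywhere
# (case-sensitive as written; add the i flag to ignore case).
NO_PARTNERS_INSIDE = /^(?:(?!partners).)*$/

puts not_partners?("Partners")
puts not_partners?("painters")
puts NO_PARTNERS_INSIDE.match?("our partner page")
puts NO_PARTNERS_INSIDE.match?("our partners page")
```

The comparison answers "is this exactly partners?", while the regex answers "does partners occur anywhere in the line?", so they are not interchangeable.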

Ruby String: how to match a Regexp from a defined position

I want to match a regexp from a ruby string only from a defined position. Matches before that position do not interest me. Moreover, I'd like \A to match this position.
I found this solution:
code[index..-1][/\A[a-z_][a-zA-Z0-9_]*/]
This matches the regexp at position index in the string code. If the match is not exactly at position index, it returns nil.
Is there a more elegant way to do this (I want to avoid creating the temporary string with the first slice)?
Thanks
You could use ^.{#{index}} inside the regular expression. Don't know if that's what you want, because I don't understand your question completely. Can you maybe add an example with the tested String? And have you heard of Rubular? Great way to test your regular expressions.
This is how you could do it if I understand your question correctly:
code.match(/^.{#{index}}your_regex_here/)
The index variable is interpolated into your regular expression. When index = 4, the pattern first consumes 4 characters from the beginning of the line, then applies your own regular expression, and returns a match only if yours matches at that position. Note that the match then also includes the skipped characters. I hope it helps. Good luck.
EDIT
And if you want to get the matched value for your regular expression:
code.scan(/^.{#{index}}([a-z_][a-zA-Z0-9_]*)/).join
It puts the matched result (inside the brackets) in an Array and joins it into a String.
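A small sketch comparing the interpolation approach with another option that is not mentioned above: the \G anchor, which matches at the start position passed as the second argument of String#match, so no temporary string is created. The sample string and variable names are made up for illustration:

```ruby
code  = "x = foo_bar + 1"
index = 4 # position of the "f" in "foo_bar"

# Interpolation approach: skip `index` characters, capture what follows.
# \A and the m flag are swapped in here so the skip also crosses newlines.
by_skip = code[/\A.{#{index}}([a-z_][a-zA-Z0-9_]*)/m, 1]

# \G approach: the match must begin exactly at the given start position.
anchored  = code.match(/\G[a-z_][a-zA-Z0-9_]*/, index)
by_anchor = anchored && anchored[0]

# Position 1 holds a space, so an anchored match there fails and gives nil.
miss = code.match(/\G[a-z_][a-zA-Z0-9_]*/, 1)

p by_skip
p by_anchor
p miss
```

Note that \G is not the same as \A: a pattern that relies on \A itself would still need the slice from the question.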
