Non-greedy regex for watir-webdriver - ruby

Say I want to get rid of the first occurrence of 'my' in the following string:
my Marsha Tammy
My current regex setup is greedy, I think:
.sub(/my/,"")
It gets rid of all instances. The data will look like this:
my Bill Port
my Samy Gonzalez
my Ulm Germany
I only want the first occurrence of 'my' gone.

Try .sub(/^my/,"")
This is a working example:
https://regex101.com/r/tD2jI2/1
EDIT: Or, even better, .sub(/^my /,"") to get rid of the space after "my" as well.
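A quick check of the anchored version against the sample data from the question (the `lines` array is simply that data copied into Ruby; the variable names are illustrative):

```ruby
lines = ["my Marsha Tammy", "my Bill Port", "my Samy Gonzalez", "my Ulm Germany"]

# ^ anchors the match to the start of the line, so the "my" inside
# "Tammy", "Samy" or "Germany" can never be the one that gets removed.
# The space in /^my / also removes the separator after the prefix.
cleaned = lines.map { |line| line.sub(/^my /, "") }

p cleaned # => ["Marsha Tammy", "Bill Port", "Samy Gonzalez", "Ulm Germany"]
```

For a single-line string ^ and \A behave the same; in a multi-line string ^ matches at the start of every line, so \A is the stricter choice there.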

According to the Ruby docs for String#sub:
Returns a copy of str with the first occurrence of pattern replaced by the second argument.
So you should be in the clear in terms of it only replacing the first instance of your regexp. If you wanted to replace all instances, you would need to use String#gsub.
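A minimal side-by-side sketch of the two methods on one of the sample lines (the variable names are just for illustration):

```ruby
s = "my Samy Gonzalez"

# String#sub replaces only the first match of the pattern...
first_only = s.sub(/my/, "")
# ...while String#gsub replaces every match, including the "my" inside "Samy".
all_removed = s.gsub(/my/, "")

p first_only  # => " Samy Gonzalez"
p all_removed # => " Sa Gonzalez"
```

Note the leading space left behind by both calls; that is why the anchored /^my / version is handy.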

Related

ruby regex: match URL recurring pattern

I want to be able to match all of the following cases using Ruby 1.8.7:
/pages/multiedit/16801,16809,16817,16825,16833
/pages/multiedit/16801,16809,16817
/pages/multiedit/16801
/pages/multiedit/1,3,5,7,8,9,10,46
I currently have:
\/pages\/multiedit\/\d*
This matches up to the first set of numbers. For example:
"/pages/multiedit/16801,16809,16817,16825,16833"[/\/pages\/multiedit\/\d*/]
# => "/pages/multiedit/16801"
See http://rubular.com/r/ruFPx5yIAF for example.
Thanks for the help, regex gods.
\/pages\/multiedit\/\d+(?:,\d+)*
Example: http://rubular.com/r/0nhpgki6Gy
Edit: Updated to use a non-capturing group (?:...) so nothing is captured, although the performance hit from capturing would be negligible. (Thanks, Tin Man.)
The currently accepted answer of
\/pages\/multiedit\/[\d,]+
may not be a good idea, because it will also match strings like these:
.../pages/multiedit/,,,
.../pages/multiedit/,1,
My answer requires there be at least one digit before the first comma, and at least one digit between commas, and it must end with a digit.
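A small sketch comparing the two patterns on well-formed and malformed inputs (the malformed strings are the ones listed above, minus the leading "..."; %r{} is used only to avoid escaping the slashes):

```ruby
strict = %r{/pages/multiedit/\d+(?:,\d+)*}
loose  = %r{/pages/multiedit/[\d,]+}

samples = [
  "/pages/multiedit/16801,16809,16817,16825,16833",
  "/pages/multiedit/16801",
  "/pages/multiedit/,,,",
  "/pages/multiedit/,1,"
]

# String#[] with a Regexp returns the matched text, or nil when nothing matches.
results = samples.map { |s| [s[strict], s[loose]] }

samples.zip(results).each do |s, (st, lo)|
  puts "#{s.inspect}: strict=#{st.inspect} loose=#{lo.inspect}"
end
```

Neither pattern is anchored, so if the whole string must be a valid URL you could add \A and \z around it.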
I'd use:
/\/pages\/multiedit\/[\d,]+/
Here's a demonstration of the pattern at http://rubular.com/r/h7VLZS1W1q
[\d,]+ means "find one or more digits or commas".
The reason \d* doesn't work is it means "find zero or more numbers". As soon as the pattern search runs into a comma it stops. You have to tell the engine that it's OK to find numbers and commas.
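A short sketch of the difference between \d* and [\d,]+ described above (the variable names are illustrative):

```ruby
star             = /\/pages\/multiedit\/\d*/
digits_or_commas = /\/pages\/multiedit\/[\d,]+/
url = "/pages/multiedit/16801,16809,16817"

# \d* stops at the first comma, because a comma is not a digit.
up_to_comma = url[star]             # => "/pages/multiedit/16801"
# [\d,]+ accepts digits and commas, so it runs to the end of the list.
whole_list  = url[digits_or_commas] # => "/pages/multiedit/16801,16809,16817"

# \d* can also match zero digits, so a URL with no id at all still "matches";
# the + in [\d,]+ requires at least one character.
empty_star  = "/pages/multiedit/"[star]             # => "/pages/multiedit/"
empty_class = "/pages/multiedit/"[digits_or_commas] # => nil

p up_to_comma, whole_list, empty_star, empty_class
```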

ruby regex make sub stop at first match

I am trying to replace a specific pattern in a text string.
That pattern is a link (an <a href> tag) containing the word "Sak".
My script currently looks like this:
ccontent=ccontent.sub(/<a .+?href=\"([^\"]+)\"[^\>]*>Sak<\/a>/, '')
The problem is that this replaces the entire string (the string contains two links).
The problem is somewhere around the `a .+?` symbols: the match runs through the link I want to replace entirely, goes into the next link, and replaces that whole link as well.
But I want it to STOP when the first pattern match is reached, so that it only erases the "Sak" link.
How do I make the pattern match stop the first time it reaches the 'href'?
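The problem is not really greediness: .+? is lazy, but it can match any character, including >. The match starts at the first <a , and .+? keeps extending through the first link and into the next one for as long as that lets the rest of the pattern match.
Just use the [^>]* character set you're already using at the end of the regex, so the match cannot leave the opening tag: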
ccontent.sub(/<a [^>]*href=\"([^\"]+)\"[^>]*>Sak<\/a>/, '')
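A minimal sketch of the difference, using a hypothetical two-link string (the question's actual HTML is not shown). The first pattern is the question's regex with its redundant backslash escapes dropped:

```ruby
# Hypothetical input: two links, only the second of which is the "Sak" link.
html = '<a class="y" href="/foo">Foo</a> <a class="x" href="/sak/1">Sak</a>'

# .+? may cross the first link's closing ">", so the leftmost match starts
# at the first <a and swallows both links.
lazy_dot = /<a .+?href="([^"]+)"[^>]*>Sak<\/a>/
# [^>]* cannot leave the opening tag in which the match started.
in_tag   = /<a [^>]*href="([^"]+)"[^>]*>Sak<\/a>/

removed_lazy   = html.sub(lazy_dot, "")
removed_in_tag = html.sub(in_tag, "")

p removed_lazy   # => ""
p removed_in_tag # => "<a class=\"y\" href=\"/foo\">Foo</a> "
```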

Ruby regex: remove first name, leave last name

I am parsing a text and I want to ignore people's first names.
Examples (cases):
B.Obama => Obama
B. Obama => Obama
B . Obama => Obama
I manage to write this working Ruby regex:
"B.Obama".gsub(/\p{L}+\.(\p{L}+)/, '\\1')
However, it solves only one case. Also, it doesn't check whether the first letter is a capital.
So what should a regex that combines all these cases look like?
Details: Ruby 1.9.2 and UTF-8 strings.
I gave it a little more thought, and I like this better:
/^(\w+)[ .,](.+$)/
This will capture the first name and the last name in separate capturing groups, e.g.:
"Mark del cato".scan(/^(\w+)[ .,](.+$)/)
See Rubular for an example.
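A quick sketch of what that scan returns, on the answer's example and on two of the question's cases (variable names are illustrative):

```ruby
re = /^(\w+)[ .,](.+$)/

# scan returns one array per match, each holding one entry per capturing group.
mark   = "Mark del cato".scan(re) # => [["Mark", "del cato"]]
obama  = "B.Obama".scan(re)       # => [["B", "Obama"]]
# [ .,] consumes only one separator character, so with "B . Obama"
# the rest of the separator ends up in the second group.
spaced = "B . Obama".scan(re)     # => [["B", ". Obama"]]

p mark, obama, spaced
```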
Or try:
^[^ .]+
This will pick up the first word on a line, i.e. everything before the first dot or space.
Hope it helps; see the example at Rubular.
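A sketch of that pattern in use. The second part goes one step beyond the answer, as an assumption: it also consumes the dots and spaces after the first word and subs the whole prefix away, which covers all three cases from the question:

```ruby
first_word = /^[^ .]+/

first = "B . Obama"[first_word] # => "B"

# Assumed extension: drop the first word plus any following dots/spaces.
names = ["B.Obama", "B. Obama", "B . Obama"]
stripped = names.map { |n| n.sub(/^[^ .]+[ .]*/, "") }

p first    # => "B"
p stripped # => ["Obama", "Obama", "Obama"]
```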
Try
(\w+)$
\w+ matches one or more 'word' characters.
The $ is a zero-length anchor that matches at the end of a line (for a single-line string, the end of the string).
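A sketch applying (\w+)$ to the question's three cases. Since the question mentions UTF-8 strings, it also shows that \w in Ruby is ASCII-only ([a-zA-Z0-9_]), so \p{L} is the safer choice for accented names; "J. Ångström" is a made-up example:

```ruby
# encoding: utf-8
names = ["B.Obama", "B. Obama", "B . Obama"]

# String#[](regexp, 1) returns capture group 1 of the first match.
last_names = names.map { |n| n[/(\w+)$/, 1] } # => ["Obama", "Obama", "Obama"]

# \w does not match "ö", so only the trailing ASCII run is found.
ascii_only = "J. Ångström"[/\w+\z/]    # => "m"
# \p{L} matches any Unicode letter.
unicode    = "J. Ångström"[/\p{L}+\z/] # => "Ångström"

p last_names, ascii_only, unicode
```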
Do you want to be able to pull second names from a piece of text? That could get very difficult. Can you post an excerpt of your text?

Very odd issue with Ruby and regex

I am getting completely different results from string.scan and several regex testers...
I am just trying to grab the domain from the string, it is the last word.
The regex in question:
/([a-zA-Z0-9\-]*\.)*\w{1,4}$/
The string (a single line, verified in Ruby's runtime, by the way):
str = 'Show more results from software.informer.com'
This works fine in the regex testers, but in Ruby....
irb(main):050:0> str.scan /([a-zA-Z0-9\-]*\.)*\w{1,4}$/
=> [["informer."]]
I would expect to get a match on software.informer.com, which is my goal.
Your regex is correct, the result has to do with the way String#scan behaves. From the official documentation:
"If the pattern contains groups, each individual result is itself an array containing one entry per group."
Basically, if you put parentheses around the whole regex, the first element of each array in your results will be what you expect.
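A sketch of the "parentheses around the whole regex" suggestion on the question's string (variable names are illustrative):

```ruby
str = 'Show more results from software.informer.com'

# Original pattern: scan returns only the last capture of the repeated group.
original = str.scan(/([a-zA-Z0-9\-]*\.)*\w{1,4}$/)
# Wrapping the whole pattern in parentheses makes group 1 the full match.
wrapped  = str.scan(/(([a-zA-Z0-9\-]*\.)*\w{1,4})$/)

p original # => [["informer."]]
p wrapped  # => [["software.informer.com", "informer."]]
```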
It does not look as if you expect more than one result (especially as the regex is anchored). In that case there is no reason to use scan.
'Show more results from software.informer.com'[ /([a-zA-Z0-9\-]*\.)*\w{1,4}$/ ]
#=> "software.informer.com"
If you do need to use scan (in which case you obviously need to remove the anchor), you can use (?:) to create non-capturing groups.
'foo.bar.baz lala software.informer.com'.scan( /(?:[a-zA-Z0-9\-]*\.)*\w{1,4}/ )
#=> ["foo.bar.baz", "lala", "software.informer.com"]
You are getting a match on software.informer.com. Check the value of $&. The return of scan is an array of the captured groups. Add capturing parentheses around the suffix, and you'll get the .com as part of the return value from scan as well.
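A sketch of both suggestions: String#scan leaves $~ set to its last match, so $& holds the full matched text, and capturing the suffix returns it as a second entry:

```ruby
str = 'Show more results from software.informer.com'

captures = str.scan(/([a-zA-Z0-9\-]*\.)*\w{1,4}$/)
# Read $& right after scan, before any other regex operation overwrites it.
whole = $&

# Capturing the suffix adds it to each result array.
with_suffix = str.scan(/([a-zA-Z0-9\-]*\.)*(\w{1,4})$/)

p captures    # => [["informer."]]
p whole       # => "software.informer.com"
p with_suffix # => [["informer.", "com"]]
```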
The regex testers and Ruby are not disagreeing about the fundamental issue (the regex itself). Rather, their interfaces are differing in what they are emphasizing. When you run scan in irb, the first thing you'll see is the return value from scan (an Array of the captured subpatterns), which is not the same thing as the matched text. Regex testers are most likely oriented toward displaying the matched text.
How about this:
/([a-zA-Z0-9\-]*\.*\w{1,4})$/
This returns
informer.com
on your test string.
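A quick check of that claim. Because \.* only allows dots between a single run of letters/digits/hyphens and the suffix, the match covers just one label before the suffix, so it yields informer.com rather than the full software.informer.com:

```ruby
str = 'Show more results from software.informer.com'

# String#[] returns the matched text of the leftmost match.
tail = str[/([a-zA-Z0-9\-]*\.*\w{1,4})$/]

p tail # => "informer.com"
```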
http://rubular.com/regexes/13670

Not grab group match in Ruby Regex

I am trying to break down the following string:
"#command Text1 #command2 Text2"
in Ruby. I want to extract "Text1" and "Text2" into an array. To do this I am using the scan method like this:
text.scan(/#* (.*?)(#|$)/)
However, when run, the script is pulling out the # symbol in the middle as a separate match (presumably because parentheses are used in Ruby to indicate which part of the input you want to pull out):
Text1
#
Text2
My question is, how can I pull out Text1 and Text2 bearing in mind the expression needs to stop matching at both "#" and the end of a string?
If you want a non-capturing group, use (?:...):
text.scan(/#* (.*?)(?:#|$)/)
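A sketch of what the non-capturing version returns on the question's string. Each match is a one-element array, and the first one keeps the space before "#", so a flatten and strip gives the plain words:

```ruby
text = "#command Text1 #command2 Text2"

# (?:#|$) still has to match, but it is no longer returned by scan.
result = text.scan(/#* (.*?)(?:#|$)/)
words  = result.flatten.map(&:strip)

p result # => [["Text1 "], ["Text2"]]
p words  # => ["Text1", "Text2"]
```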
As a sidenote, your regular expression looks like it might contain an error. Perhaps you meant this instead?
text.scan(/#\w+ (\w+)(?= #|$)/)
The difference is that your expression also matches strings like " foo" that have no #command in front of them (because #* allows zero # characters), which I guess is not intentional.
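A sketch comparing the two patterns; " foo" is a made-up input with no #command in front:

```ruby
text = "#command Text1 #command2 Text2"

# (?= #|$) is a lookahead: it checks what follows without consuming it,
# so no trailing space or "#" ends up in the capture.
commands = text.scan(/#\w+ (\w+)(?= #|$)/).flatten # => ["Text1", "Text2"]

# #* allows zero "#" characters, so the question's pattern accepts " foo";
# #\w+ requires an actual #command.
loose  = " foo".scan(/#* (.*?)(?:#|$)/)    # => [["foo"]]
strict = " foo".scan(/#\w+ (\w+)(?= #|$)/) # => []

p commands, loose, strict
```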
text.scan(/#* (.*?)(?:#|$)/)
In your regex, you don't need the parentheses around '#|$'. The following avoids returning the '#' in a separate match group:
text.scan(/#* (.*?)[#\$]/)
Since you're looking only for a single character in that group, the square brackets will match any one character within them. Be aware, though, that inside square brackets $ is a literal dollar sign, not the end-of-string anchor, so this version only finds text followed by '#' or '$' and misses the final "Text2".
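A quick check of the bracketed version against the alternation version shows the catch with $ inside a character class:

```ruby
text = "#command Text1 #command2 Text2"

# [#\$] means "a literal # or a literal $"; it is not an anchor, so the
# last segment (which ends at the end of the string) never matches.
bracketed = text.scan(/#* (.*?)[#\$]/)
# (?:#|$) keeps $ as the end-of-line anchor.
grouped   = text.scan(/#* (.*?)(?:#|$)/)

p bracketed # => [["Text1 "]]
p grouped   # => [["Text1 "], ["Text2"]]
```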
Here's how I'd do it:
text.scan(/#[^\s]* ([^#]*)/)
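A sketch of that pattern on the question's string. [^#]* takes everything up to the next # (including the space before it), so a strip tidies the first result:

```ruby
text = "#command Text1 #command2 Text2"

# #[^\s]* skips the command word; ([^#]*) captures up to the next "#" or the end.
result = text.scan(/#[^\s]* ([^#]*)/)
parts  = result.flatten.map(&:strip)

p result # => [["Text1 "], ["Text2"]]
p parts  # => ["Text1", "Text2"]
```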
How does this regex look?
http://rubular.com/regexes/13264

Resources