Ruby regex: "capture string unless it is followed by..."

My regex captures quoted phrases:
"([^"]*)"
I want to improve it so that it ignores quoted phrases that are followed by ', -' (a comma, a space and a dash, in that order).
How do I do this?
The test: http://rubular.com/r/xls6vN1w92

This should do it, using a Negative Lookahead:
"(?!, -)([^"]*)"(?!, -)
It's a little icky, but it works. You need to make sure neither quote is followed by your string; otherwise the match can start at a closing quote.
http://rubular.com/r/yFMyUKJOHL
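A minimal Ruby sketch of the double-lookahead pattern above, applied with String#scan. The sample string is made up for illustration; it is not the Rubular test data:

```ruby
# Negative lookahead after BOTH quotes: skip phrases followed by ", -",
# and never let a closing quote act as the opening quote of a new match.
quoted = /"(?!, -)([^"]*)"(?!, -)/

text = '"keep me" and "ignore me", - then "keep too"'

# scan returns one array per match (one element per capture group),
# so flatten gives a plain list of the captured phrases.
phrases = text.scan(quoted).flatten
p phrases # => ["keep me", "keep too"]
```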

Regex
"(.*?)"(?!, -)
Working Example
http://rubular.com/r/9kOmZLxLfy
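A quick Ruby check of this lazy pattern against a made-up sample string (an assumption, not the Rubular data) suggests one caveat: because . can also match a quote, a phrase followed by ", -" is not simply skipped. The match keeps expanding until it reaches a quote that is not followed by ", -":

```ruby
lazy = /"(.*?)"(?!, -)/

text = '"keep me" and "ignore me", - then "keep too"'

# The second match starts at the quote before "ignore me" and runs on
# to the quote before "keep too", because .*? is allowed to cross quotes.
result = text.scan(lazy).flatten
p result # => ["keep me", "ignore me\", - then "]
```

Using [^"]* instead of .*? keeps each match within a single pair of quotes.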

This is unparsable in your context; it's open-ended. The only way to parse it is to consume the phrases you don't want as well as the ones you do, but it's still an invalid premise.
/"([^"]*?)"(?!, -)|"[^"]*?"(?=, -)/
Then check for capture group 1 on each match, something like this (in Perl):
$rx = qr/"([^"]*?)"(?!, -)|"[^"]*?"(?=, -)/;
while (' "ignore me", - "but not me" ' =~ /$rx/g) {
print "'$1'\n" if defined $1
}
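In Ruby, roughly the same idea can be sketched with String#scan, which returns the capture groups of every match; matches from the second branch have a nil group 1 and can simply be dropped. The sample string is the one from the Perl snippet:

```ruby
# Branch 1 captures phrases NOT followed by ", -".
# Branch 2 consumes (without capturing) phrases that ARE followed by ", -",
# so their closing quote can never start another match.
rx = /"([^"]*?)"(?!, -)|"[^"]*?"(?=, -)/

text = ' "ignore me", - "but not me" '

# Each scan entry is [group1]; group1 is nil when branch 2 matched.
kept = text.scan(rx).map(&:first).compact
p kept # => ["but not me"]
```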

Add (?!...) at the end of the regex:
"([^"\n]*)"(?!, -)

Related

Regex to detect period at end of string, but not '...'

Using a regex, how can I match strings that end with exactly one . as:
This is a string.
but not those that end with more than one . as:
This is a string...
I have a regex that detects a single .:
/[\.]{1}\z/
but I do not want it to match strings that end in ....
What you want is a 'negative lookbehind' assertion:
(?<!\.)\.\z
This looks for a period at the end of a string that isn't preceded by a period. The other answers won't match the following string: "."
Also, you may need to look out for Unicode ellipsis characters: …
You can detect this like so: str =~ /\u{2026}/
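A small Ruby check of the lookbehind pattern and the ellipsis test, assuming Ruby 2.4 or newer for String#match? (on older versions, =~ behaves the same in a boolean context). Apart from the question's own examples, the sample strings are made up:

```ruby
single_dot = /(?<!\.)\.\z/

p "This is a string.".match?(single_dot)   # => true
p "This is a string...".match?(single_dot) # => false
p ".".match?(single_dot)                   # => true

# The character-class variant needs a non-dot character before the final
# dot, so it cannot match a string that is just ".".
p ".".match?(/[^\.][\.]\z/)                # => false

# A Unicode ellipsis (U+2026) is one character, not three periods.
p "Wait for it\u2026".match?(/\u{2026}/)   # => true
```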
You can use:
[^\.][\.]\z
That is, you are looking for a string in which the character before the last dot is not a dot.
I like Regexr a lot!
Solution similar to Dekel:
[^.]+[.]

How exactly does string.split(/\?|\.|!/).size work?

I know, or at least I think I know, what string.split(/\?|\.|!/).size does: it splits the string at every ending punctuation mark into an array and then gets the size of that array.
The part I am confused with is (/\?|\.|!/).
Thank you for your explanation.
Regular expressions are surrounded by slashes / /
The backslash before the question mark and dot means use those characters literally (don't interpret them as special instructions)
The vertical pipes are "or"
So you have / then question mark \? then "or" | then period \. then "or" | then exclamation point ! then / to end the expression.
/\?|\.|!/
It's a Regular Expression. That particular one matches any '?', '.' or '!' in the target string.
You can learn more about them here: http://regexr.com/
A regular expression splitting on the char "a" would look like this: /a/. A regular expression splitting on "a" or "b" looks like this: /a|b/. So splitting on "?", "!" and "." would look like /?|!|./, but it does not. Unfortunately, "?" and "." have special meanings in regexps that we do not want in this case, so they must be escaped with "\".
A way to avoid this is to use Regexp.union("?","!",".") which results in /\?|!|\./
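A short Ruby sketch of the Regexp.union approach; the sample sentence is made up for illustration:

```ruby
# Regexp.union escapes each string argument and joins them with |.
r = Regexp.union("?", "!", ".")
p r.source # => "\\?|!|\\."

text = "Hi. How are you? Fine!"
# Trailing empty strings are dropped by split, so the final "!" adds no element.
p text.split(r)      # => ["Hi", " How are you", " Fine"]
p text.split(r).size # => 3
```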
(/\?|\.|!/)
Working outside in:
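The parentheses () enclose the argument passed to split; they belong to the method call, not to the regex. (Parentheses inside the slashes would create a capture group.)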
The // tell Ruby you're using a Regular Expression.
\? Matches any ?
\. Matches any .
! Matches any !
The preceding \ tells Ruby we want to find these specific characters in the string, rather than using them as special characters.
Special characters (which need to be escaped to be matched literally) are:
. | ( ) [ ] { } + \ ^ $ * ?.
There is a nice guide to Ruby RegEx at:
http://rubular.com/ & http://www.tutorialspoint.com/ruby/ruby_regular_expressions.htm
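Parentheses written inside the slashes behave differently from those of the method call: when the regex contains a capture group, String#split also returns the matched delimiters. A minimal sketch with a made-up string:

```ruby
text = "Done? Yes. Great!"

# No capture group: the delimiters are dropped.
p text.split(/\?|\.|!/)   # => ["Done", " Yes", " Great"]

# With a capture group: each matched delimiter is included in the result.
p text.split(/(\?|\.|!)/) # => ["Done", "?", " Yes", ".", " Great", "!"]
```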
For SO answers that involve regular expressions, I often use the "extended" mode, which makes them self-documenting. This one would be:
r = /
\? # match a question mark
| # or
\. # match a period
| # or
! # match an exclamation mark
/x # extended mode
str = "Out, damn'd spot! out, I say!—One; two: why, then 'tis time to " +
"do't.—Hell is murky.—Fie, my lord, fie, a soldier, and afeard?"
str.split(r)
#=> ["Out, damn'd spot",
# " out, I say",
# "—One; two: why, then 'tis time to do't",
# "—Hell is murky",
# "—Fie, my lord, fie, a soldier, and afeard"]
str.split(r).size #=> 5
@steenslag mentioned Regexp::union. You could also use Regexp::new to write (with single quotes):
r = Regexp.new('\?|\.|!')
#=> /\?|\.|!/
but it really doesn't buy you anything here. You might find it useful in other situations, however.

Ruby regex: "and" operator

I have a string containing an email address that looks like "<luke#example.com>".
I would like to use regex for deleting "<" and ">", so I wanted something like
"<luke#example.com>".sub /<>/, ""
The problem is clear: /<>/ doesn't match what I want. I tried different regexes, but I don't know how to select both < and >. Is there an "and" operator with which I can say "match this and this"?
As written, your regex matches the literal substring "<>" only. You need to use [] to make them a character class so that they're matched individually, and gsub to replace all matches:
"<luke#example.com>".gsub(/[<>]/, "") # => "luke#example.com"
"<luke#example.com>".gsub /[<>]/, ""
http://regex101.com/r/hP3sY2
If you only want to strip the < and > from the start and end of the string, you can use this:
'<luke#example.com>'.sub(/\A<([^<>]+)>\z/, '\1')
You don't need, nor should you use, a regex.
string[1..-2]
is enough.
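A side-by-side Ruby sketch of the three approaches above. The second string (without brackets) is a made-up case showing that string[1..-2] drops the first and last characters unconditionally, so it assumes the brackets are always present:

```ruby
email = '<luke#example.com>'

p email.gsub(/[<>]/, "")            # => "luke#example.com"
p email.sub(/\A<([^<>]+)>\z/, '\1') # => "luke#example.com"
p email[1..-2]                      # => "luke#example.com"

# Without brackets, slicing still removes two characters,
# while the anchored sub leaves the string unchanged.
bare = 'luke#example.com'
p bare[1..-2]                       # => "uke#example.co"
p bare.sub(/\A<([^<>]+)>\z/, '\1')  # => "luke#example.com"
```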

Ruby regex: split string with match beginning with either a newline or the start of the string?

Here's my regular expression that I have for this. I'm in Ruby, which — if I'm not mistaken — uses POSIX regular expressions.
regex = /(?:\n^)(\*[\w+ ?]+\*)\n/
Here's my goal: I want to split a string with a regex that is *delimited by asterisks*, including those asterisks. However: I only want to split by the match if it is prefaced with a newline character (\n), or it's the start of the whole string. This is the string I'm working with.
"*Friday*\nDo not *break here*\n*But break here*\nBut again, not this"
My regular expression is not splitting properly at the *Friday* match, but it is splitting at the *But break here* match (it's also throwing in a here split). My issue is somewhere in the first group, I think: (?:\n^) — I know it's wrong, and I'm not entirely sure of the correct way to write it. Can someone shed some light? Here's my complete code.
regex = /(?:\n^)(\*[\w+ ?]+\*)\n/
str = "*Friday*\nDo not *break here*\n*But break here*\nBut again, not this"
str.split(regex)
Which results in this:
>>> ["*Friday*\nDo not *break here*", "*But break here*", "But again, not this"]
I want it to be this:
>>> ["*Friday*", "Do not *break here*", "*But break here*", "But again, not this"]
Edit #1: I've updated my regex and result. (2011/10/18 16:26 CST)
Edit #2: I've updated both again. (16:32 CST)
What if you just add a '\n' to the front of each string? That simplifies the processing quite a bit:
regex = /(?:\n)(\*[\w+ ?]+\*)\n/
str = "*Friday*\nDo not *break here*\n*But break here*\nBut again, not this"
res = ("\n"+str).split(regex)
res.shift if res[0] == ""
res
=> [ "*Friday*", "Do not *break here*",
"*But break here*", "But again, not this"]
We have to watch for the initial extra match but it's not too bad. I suspect someone can shorten this a bit.
Capture groups 1 and 2 of the regex below:
(?:\A|\n)(\*.*?\*)|(?:\A|\n)(.*?)(?=\n|\Z)
will give you your desired output. I am no Ruby expert, so you will have to create the list yourself :)
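A minimal Ruby sketch of building the list from those two groups with String#scan, writing the newline escape as \n (in a Ruby regex literal, \\n would instead match a backslash followed by n). Each scan entry holds both groups, one of which is nil:

```ruby
str = "*Friday*\nDo not *break here*\n*But break here*\nBut again, not this"

# Group 1: an *asterisk-delimited* run at the start of the string or after a newline.
# Group 2: any other line (lazy, up to the next newline or the end of the string).
rx = /(?:\A|\n)(\*.*?\*)|(?:\A|\n)(.*?)(?=\n|\Z)/

# Flatten the [group1, group2] pairs and drop the nil from each pair.
parts = str.scan(rx).flatten.compact
p parts
# => ["*Friday*", "Do not *break here*", "*But break here*", "But again, not this"]
```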
Why not just split at newlines? From your example, it looks like that's what you're really trying to do.
str.split("\n")

How to remove the first 4 characters from a string if it matches a pattern in Ruby

I have the following string:
"h3. My Title Goes Here"
I basically want to remove the first four characters from the string so that I just get back:
"My Title Goes Here".
The thing is I am iterating over an array of strings and not all have the h3. part in front so I can't just ditch the first four characters blindly.
I checked the docs and the closest thing I could find was chomp, but that only works for the end of a string.
Right now I am doing this:
"h3. My Title Goes Here".reverse.chomp(" .3h").reverse
This gives me my desired output, but there has to be a better way. I don't want to reverse a string twice for no reason. Is there another method that will work?
To alter the original string, use sub!, e.g.:
my_strings = [ "h3. My Title Goes Here", "No h3. at the start of this line" ]
my_strings.each { |s| s.sub!(/^h3\. /, '') }
To return the result without altering the original, remove the exclamation point, i.e. use sub. In general, if your regular expression can match more than once and you want to replace every match, use gsub! or gsub; without the g, only the first match is replaced (which is what you want here). Note that in Ruby ^ matches at the start of any line, not just the start of the string; use \A if the string may contain newlines.
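For the iterate-over-an-array case in the question, a hedged sketch of two options: String#delete_prefix (assuming Ruby 2.5 or newer) for one fixed prefix, and sub with a \A anchor when the heading level varies. The third title is a made-up example:

```ruby
titles = ["h3. My Title Goes Here", "No h3. at the start of this line", "h2. Another Title"]

# Option 1 (Ruby 2.5+): remove one fixed prefix only when it is present.
fixed_prefix = titles.map { |t| t.delete_prefix("h3. ") }
p fixed_prefix
# => ["My Title Goes Here", "No h3. at the start of this line", "h2. Another Title"]

# Option 2: \A anchors at the start of the string; \d+ accepts h1., h2., h3., ...
any_level = titles.map { |t| t.sub(/\Ah\d+\. /, "") }
p any_level
# => ["My Title Goes Here", "No h3. at the start of this line", "Another Title"]
```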
You can use sub with a regular expression:
s = 'h3. foo'
s.sub!(/^h[0-9]+\. /, '')
puts s
Output:
foo
The regular expression should be understood as follows:
^ Match from the start of the string.
h A literal "h".
[0-9] A digit from 0-9.
+ One or more of the previous (i.e. one or more digits)
\. A literal period.
(space) A literal space (yes, spaces are significant by default in regular expressions!)
You can modify the regular expression to suit your needs. See a regular expression tutorial or syntax guide, for example here.
A standard approach would be to use regular expressions:
"h3. My Title Goes Here".gsub /^h3\. /, '' #=> "My Title Goes Here"
gsub means globally substitute and it replaces a pattern by a string, in this case an empty string.
The regular expression is enclosed in / and consists of:
^ means beginning of the line (in Ruby, \A is the beginning of the whole string)
h3 is matched literally, so it means h3
\. - a dot normally means any character so we escape it with a backslash
(space) is matched literally
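In Ruby, ^ matches at the start of every line, not only at the start of the string, which matters if a string can contain newlines. A minimal sketch with a made-up two-line string:

```ruby
text = "Intro line\nh3. My Title Goes Here"

# ^ matches after the newline too, so the heading on line 2 is stripped.
p text.sub(/^h3\. /, "")  # => "Intro line\nMy Title Goes Here"

# \A matches only at the very start of the string, so nothing changes here.
p text.sub(/\Ah3\. /, "") # => "Intro line\nh3. My Title Goes Here"
```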
