Regex: match something except within arbitrary delimiters - ruby

My string:
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"
Using Ruby's regex flavor, I would like to do something like:
a.gsub /regex_pattern/, '_'
And obtain:
"Please_match_spaces_here_<but not here>._Again_match_here_<while ignoring these>"

This should do it:
result = subject.gsub(/\s+(?![^<>]*>)/, '_')
This regex assumes there's nothing tricky like escaped angle brackets. Also be aware that \s matches newlines, TABs and other whitespace characters as well as spaces. That's probably what you want, but you have the option of matching only spaces:
/ +(?![^<>]*>)/
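As a quick check, here is a minimal sketch that runs both variants against the string from the question. The variable names and the second sample string are illustrative assumptions, not part of the original answer.

```ruby
# Sample string from the question.
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"

# Whitespace NOT followed by "non-bracket characters, then >",
# i.e. whitespace that is not sitting inside a <...> pair.
any_whitespace = /\s+(?![^<>]*>)/
spaces_only    = / +(?![^<>]*>)/

result = a.gsub(any_whitespace, '_')
puts result

# The two patterns differ once tabs or newlines appear outside the brackets.
mixed       = "one\ttwo <keep\tthis> three"   # hypothetical input
with_any    = mixed.gsub(any_whitespace, '_') # the outer tab is replaced
with_spaces = mixed.gsub(spaces_only, '_')    # the outer tab is left alone
```

Like the answer itself, this assumes the angle brackets are balanced and never nested or escaped.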

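I think this works, although note that it removes the bracketed sections rather than replacing the spaces outside them: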
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"
pattern = /<(?:(?!<).)*>/
a.gsub(pattern, '')
# => "Please match spaces here . Again match here "

Related

Ruby: Regular expression to match strings made up of only spaces, tabs, or newlines and nothing else

I am trying to formulate a regular expression that matches only strings made up entirely of three kinds of characters: tab, space, and newline. For example:
String1 = " \t "
String2 = "\n\n"
String3 = " \t \n \n \n "
All of the above strings should match the regular expression.
I tried this: %r/[ \n]+/
But it also matches strings that contain other characters in addition to spaces and newlines, like
string4 = " I am a boy \n"
My expression also matches string4, which it should not.
I am not able to fix it. It would be great if someone could come up with a solution.
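You need to tell the regex that the WHOLE string must fit, rather than just part of it. In Ruby, ^ and $ anchor to the start and end of a line, so a multi-line string can still match if any one of its lines fits. To anchor to the whole string, use \A and \z, which mean 'start of string' and 'end of string' respectively:
/\A[\t\n ]+\z/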
This site, and sites like it, can be useful:
http://regex101.com/
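To illustrate the difference between line anchors and string anchors in Ruby, here is a small sketch. The three matching samples and string4 come from the question; the multi-line sample and all variable names are assumptions added for illustration.

```ruby
# Strings from the question: the first three should match, string4 should not.
samples = [" \t ", "\n\n", " \t \n \n \n ", " I am a boy \n"]

# \A and \z anchor to the start and end of the whole string.
whitespace_only = /\A[ \t\n]+\z/
results = samples.map { |s| s.match?(whitespace_only) }
p results

# A hypothetical multi-line input with one blank-looking line in the middle.
multi = "abc\n \nabc"

# ^ and $ anchor to line boundaries, so the middle line alone satisfies them.
line_anchored   = multi.match?(/^[\t\n ]+$/)
string_anchored = multi.match?(whitespace_only)
```

String#match? requires Ruby 2.4 or later; on older versions, `!!(s =~ pattern)` gives the same boolean.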

Ruby regex to split text

I am using the regex below to split a text at certain ending punctuation; however, it doesn't work with quotes.
text = "\"Hello my name is Kevin.\" How are you?"
text.scan(/\S.*?[...!!??]/)
=> ["\"Hello my name is Kevin.", "\" How are you?"]
My goal is to produce the following result, but I am not very good with regular expressions. Any help would be greatly appreciated.
=> ["\"Hello my name is Kevin.\"", "How are you?"]
text.scan(/"(?>[^"\\]+|\\{2}|\\.)*"|\S.*?[...!!??]/)
The idea is to check for quoted parts first. The subpattern is a bit more elaborate than a simple "[^"]*" so that it can deal with escaped quotes (* see the end for a more efficient pattern).
Pattern details:
" # literal: a double quote
(?> # open an atomic group: all that can be between quotes
[^"\\]+ # all that is not a quote or a backslash
| # OR
\\{2} # 2 backslashes (the idea is to skip even numbers of backslashes)
| # OR
\\. # an escaped character (in particular a double quote)
)* # repeat zero or more times the atomic group
" # literal double quote
| # OR
\S.*?[...!!??]
To deal with single quotes too, you can add '(?>[^'\\]+|\\{2}|\\.)*'| to the pattern (the most efficient option), but if you want to make it shorter you can write this:
text.scan(/(['"])(?>[^'"\\]+|\\{2}|\\.|(?!\1)["'])*\1|\S.*?[...!!??]/)
where \1 is a backreference to the first capturing group (the found quote) and (?!\1) means not followed by the found quote.
(*) Instead of writing "(?>[^"\\]+|\\{2}|\\.)*", you can use "[^"\\]*+(?:\\.[^"\\]*)*+", which is more efficient.
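A minimal sketch of the quoted-part pattern in action follows. The character class is written here as [.!?], which contains the same characters as the class shown above; the second input with escaped quotes and the variable names are assumptions added for illustration.

```ruby
# Quoted part first (with escaped quotes allowed), otherwise a plain sentence.
sentence_or_quote = /"(?>[^"\\]+|\\{2}|\\.)*"|\S.*?[.!?]/

# Text from the question.
text  = %q("Hello my name is Kevin." How are you?)
parts = text.scan(sentence_or_quote)
p parts

# Hypothetical input containing backslash-escaped quotes inside the quotation.
escaped_text  = %q("She said \"hi\" to me." Really?)
escaped_parts = escaped_text.scan(sentence_or_quote)
p escaped_parts
```

Because the pattern contains no capturing groups, String#scan returns whole matches as strings. The backreference variant with (['"]) does contain a capturing group, so scan would return arrays of captures there instead.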
Add an optional quote (["']?) to the pattern:
text.scan(/\S.*?[...!!??]["']?/)
# => ["\"Hello my name is Kevin.\"", "How are you?"]

Why $ doesn't match \r\n

Can someone explain this:
str = "hi there\r\n\r\nfoo bar"
rgx = /hi there$/
str.match rgx # => nil
rgx = /hi there\s*$/
str.match rgx # => #<MatchData "hi there\r\n\r">
On the one hand it seems like $ does not match \r. But then if I first capture all the white spaces, which also include \r, then $ suddenly does appear to match the second \r, not continuing to capture the trailing "\nfoo bar".
Is there some special rule here about consecutive \r\n sequences? The docs on $ simply say it will match "end of line" which doesn't explain this behavior.
$ is a zero-width assertion. It doesn't match any character, it matches at a position. Namely, it matches either immediately before a \n, or at the end of string.
/hi there\s*$/ matches because \s* matches "\r\n\r", which allows the $ to match at the position before the second \n. The $ could have also matched at the position before the first \n, but the \s* is greedy and matches as much as it can, while still allowing the overall regex to match.
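The explanation above can be checked with a short sketch. The first two patterns are from the question; the \r? and lazy \s*? variants, along with the variable names, are assumptions added for comparison.

```ruby
str = "hi there\r\n\r\nfoo bar"

# "$" needs a "\n" (or end of string) right after "there", but "\r" is in the way.
plain = str.match(/hi there$/)

# Consuming the "\r" explicitly lets "$" sit just before the first "\n".
with_cr = str.match(/hi there\r?$/)

# Greedy \s* grabs as much as it can while still leaving a "\n" after the match.
greedy = str.match(/hi there\s*$/)

# Lazy \s*? stops at the first position where "$" can succeed.
lazy = str.match(/hi there\s*?$/)

p plain, with_cr[0], greedy[0], lazy[0]

# The character immediately after the greedy match is the second "\n".
next_char = str[greedy.end(0)]
```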

What is the Ruby regex to match a string with at least one period and no spaces?

What is the regex to match a string with at least one period and no spaces?
You can use this:
/^\S*\.\S*$/
It works like this:
^ <-- Starts with
\S <-- Any character but whitespace (notice the upper case) (same as [^ \t\r\n\f\v])
* <-- Repeated but not mandatory
\. <-- A period
\S <-- Any character but white spaces
* <-- Repeated but not mandatory
$ <-- Ends here
You can replace \S with [^ ] to work strictly with spaces (not with tabs etc.)
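Here is a small sketch exercising the pattern. It also includes a string-anchored variant using \A and \z, which is an assumption added here because ^ and $ in Ruby anchor to lines rather than to the whole string. The sample inputs and names are illustrative.

```ruby
line_anchored   = /^\S*\.\S*$/    # pattern from the answer
string_anchored = /\A\S*\.\S*\z/  # variant anchored to the whole string

ok_inputs  = ["test.txt", "a.b.c", "."]
bad_inputs = ["test with spaces.txt", "nodot", ""]

accepted = ok_inputs.map  { |s| s.match?(string_anchored) }
rejected = bad_inputs.map { |s| s.match?(string_anchored) }
p accepted, rejected

# With line anchors, one well-formed line is enough for a multi-line string.
multi = "a.b\nhas space"
multi_line_result   = multi.match?(line_anchored)
multi_string_result = multi.match?(string_anchored)
```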
Something like
^[^ ]*\.[^ ]*$
(match any non-spaces, then a period, then some more non-spaces)
No need for a regular expression. Keep it simple:
>> s="test.txt"
=> "test.txt"
>> s["."] and s.count(" ")<1
=> true
>> s="test with spaces.txt"
=> "test with spaces.txt"
>> s["."] and s.count(" ")<1
=> false
Try this:
/^\S*\.\S*$/

pattern matching in ruby

Could anybody tell me how this expression works?
output = "#{output.gsub(/grep .*$/,'')}"
Before that operation, the value of output is
"df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"
but after the operation it becomes
"df -h | \n/dev/mtdblock4 248.5M 248.5M 130.7M 117.8M 53% /mnt/nand\r\n "
Please help me.
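Your expression is effectively equivalent to:
output.gsub!(/grep .*$/,'')
which is much easier to read. (One difference: gsub! modifies the string in place and returns nil when nothing was replaced.)
The . in the regular expression matches all characters except newline (\n) by default, which means it does match \r. So, in the string provided, it matches "grep /mnt/nand\r", and will substitute an empty string for that. The result is the provided string without the matched substring, which is why the \r before the first \n disappears.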
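As a hedged illustration of how gsub and gsub! differ, consider the sketch below. The shortened input strings and variable names are assumptions; .dup is used only so the example also works when string literals are frozen.

```ruby
original = "df -h | grep /mnt/nand".dup

# gsub returns a new string and leaves the receiver untouched.
copy = original.gsub(/grep .*$/, '')
untouched = original.dup

# gsub! changes the receiver itself and returns it when a substitution happened.
in_place = original.gsub!(/grep .*$/, '')

# When nothing matches, gsub! returns nil, so its result should not be reassigned blindly.
no_change = "nothing to strip".dup.gsub!(/grep .*$/, '')

p copy, untouched, original, no_change
```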
Here is a simpler example:
"hello\n\n\nworld".gsub(/hello.*$/,'') => "\n\n\nworld"
In both your provided regex, and the example above, the $ is not necessary. It is used as an anchor to match the end of a line, but since the pattern immediately before it (.*) matches everything up to a newline, it is redundant (but does not cause harm).
Since gsub returns a string, your first line is exactly the same as
output = output.gsub(/grep .*$/, '')
which takes the string and removes any occurrence of the regexp pattern
/grep .*$/
i.e. all parts of the string that start with 'grep ' until the end of the string or a line break.
There's a good regexp tester/reference here. This one matches the word "grep", then a space, then any number of characters up to the next newline (\n). Note that in Ruby "." also matches \r, so a \r just before the \n is part of the match. "." by itself means any character except a newline, and ".*" together means any number of them, as many as possible. "$" means the end of a line.
For the '$', see here http://www.regular-expressions.info/reference.html
".*$" means "take every character up to the end of the line": the regex engine treats "\n" as the end of a line, so the match stops there.
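To see why the \r vanishes from the first line, here is a minimal sketch using the input from the question. The [^\r\n]* alternative is an assumption added for comparison, not something proposed in the answers.

```ruby
output = "df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"

# "." matches "\r" but not "\n", so the match swallows the "\r" before the first "\n".
matched = output[/grep .*$/]
cleaned = output.gsub(/grep .*$/, '')

# Excluding both "\r" and "\n" keeps the original "\r\n" line ending intact.
cleaned_keep_cr = output.gsub(/grep [^\r\n]*/, '')

p matched, cleaned, cleaned_keep_cr
```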

Resources