Why $ doesn't match \r\n - ruby

Can someone explain this:
str = "hi there\r\n\r\nfoo bar"
rgx = /hi there$/
str.match rgx # => nil
rgx = /hi there\s*$/
str.match rgx # => #<MatchData "hi there\r\n\r">
On the one hand it seems like $ does not match \r. But if I first match all the whitespace with \s*, which also includes \r, then $ suddenly does appear to match after the second \r, instead of the match continuing on to capture the trailing "\nfoo bar".
Is there some special rule here about consecutive \r\n sequences? The docs on $ simply say it will match "end of line" which doesn't explain this behavior.

$ is a zero-width assertion. It doesn't match any character, it matches at a position. Namely, it matches either immediately before a \n, or at the end of string.
/hi there\s*$/ matches because \s* matches "\r\n\r", which allows the $ to match at the position before the second \n. The $ could have also matched at the position before the first \n, but the \s* is greedy and matches as much as it can, while still allowing the overall regex to match.
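A short sketch of the explanation above, using the question's string. The lazy \s*? and the \r? variants are added only for comparison and are not from the original answer:

```ruby
str = "hi there\r\n\r\nfoo bar"

# $ asserts a position: just before a "\n" or at the end of the string.
# The "\r" after "there" is a real character, so /hi there$/ cannot match.
p str.match(/hi there$/)          # => nil

# Greedy \s* takes "\r\n\r", the longest run after which $ still matches.
p str.match(/hi there\s*$/)[0]    # => "hi there\r\n\r"

# Lazy \s*? stops at the first position where $ can match.
p str.match(/hi there\s*?$/)[0]   # => "hi there\r"

# Allowing an optional "\r" handles CRLF line endings explicitly.
p str.match(/hi there\r?$/)[0]    # => "hi there\r"
```

If the input may contain Windows line endings, the \r? form states the intent more clearly than relying on \s* backtracking.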

Related

Regex: match something except within arbitrary delimiters

My string:
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"
Using Ruby's regex flavor, I would like to do something like:
a.gsub /regex_pattern/, '_'
And obtain:
"Please_match_spaces_here_<but not here>._Again_match_here_<while ignoring these>"
This should do it:
result = subject.gsub(/\s+(?![^<>]*>)/, '_')
This regex assumes there's nothing tricky like escaped angle brackets. Also be aware that \s matches newlines, tabs and other whitespace characters as well as spaces. That's probably what you want, but you have the option of matching only spaces:
/ +(?![^<>]*>)/
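Both patterns from this answer can be checked against the question's string, with a as the subject. Like the answer, this sketch assumes there are no nested or escaped angle brackets:

```ruby
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"

# (?![^<>]*>) fails when a '>' can be reached without crossing a '<',
# which is the case whenever the whitespace sits inside <...>.
any_whitespace = a.gsub(/\s+(?![^<>]*>)/, '_')

# Same idea, but only runs of literal spaces are replaced.
spaces_only = a.gsub(/ +(?![^<>]*>)/, '_')

puts any_whitespace
# prints: Please_match_spaces_here_<but not here>._Again_match_here_<while ignoring these>
```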
I think this works:
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"
pattern = /<(?:(?!<).)*>/
a.gsub(pattern, '')
# => "Please match spaces here . Again match here "
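Note that this pattern deletes the bracketed text instead of replacing the spaces outside it. A different technique, not taken from either answer, is to match the bracketed chunks and the whitespace in one alternation and decide in a block what to emit. A minimal sketch, again assuming no nested or escaped brackets:

```ruby
a = "Please match spaces here <but not here>. Again match here <while ignoring these>"

# Each match is either a whole <...> chunk or a run of whitespace outside one.
# Bracketed chunks are returned unchanged; whitespace runs become '_'.
result = a.gsub(/<[^<>]*>|\s+/) { |m| m.start_with?('<') ? m : '_' }

puts result
# prints: Please_match_spaces_here_<but not here>._Again_match_here_<while ignoring these>
```

This avoids a lookahead entirely, at the cost of a block call per match.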

Ruby - substitute \n if not \\n

I'm trying to do a regex with lookbehind that changes \n to but not if it's a \\n.
My closest attempt has no effect:
text.gsub /(?<!\\)\n/, ''
Unfortunately, no number of backslashes in the lookbehind seem to fix the problem. How can I address this?
You need to double the backslash before the n in the regex, otherwise it's looking for a newline instead of a literal backslash followed by n:
irb(main):001:0> puts "hello\\nthere\\\\n".gsub(/(?<!\\)\\n/, ' ')
hello there\\n
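The difference between a newline and a literal backslash-n is easy to check in a sketch like this one (the variable names are illustrative):

```ruby
newline = "\n"   # one character: a line feed
literal = '\n'   # two characters: a backslash followed by "n"
p newline.length # => 1
p literal.length # => 2

# Double-quoted, so "\\n" is backslash + n and "\\\\n" is two backslashes + n.
text = "hello\\nthere\\\\n"

# \\n in the regex is a literal backslash followed by "n"; the lookbehind
# (?<!\\) rejects it when another backslash comes right before.
puts text.gsub(/(?<!\\)\\n/, ' ')
# prints: hello there\\n
```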
You don't need anything special. "\n" is a single character. It does not include a "\" or "n" character.
text.gsub(/\n/, "")
But instead of that, you should do:
text.gsub("\n", "")
or
text.tr("\n", "")
But I would do:
text.tr($/, "")
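A quick check that the suggested alternatives agree, on an illustrative sample string:

```ruby
text = "line one\nline two\n"

# "\n" is a single character, so no lookbehind or escaping is needed.
via_regex  = text.gsub(/\n/, "")
via_string = text.gsub("\n", "")   # plain string pattern, no regex involved
via_tr     = text.tr("\n", "")     # character-by-character translation

# $/ is the input record separator, "\n" unless it has been changed.
via_sep    = text.tr($/, "")

p via_regex  # => "line oneline two"
```

The $/ form only behaves like the others while the record separator keeps its default value.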

Regexp for certain character to end of line

I have a string
"So on and so forth $5.99"
I would like to extract everything after the $ until the end of the line.
/$ finds the character $. How do I select the rest of the string? I know it's something \z but I can't get the syntax right.
In a regexp, $ represents the end of the line.
So in your case you need \$.*$: the escaped \$ matches the dollar sign itself, and .* matches everything up to the end of the line ($).
No, /$ does not match that character. You need to escape it with a backslash (\$) to match a literal dollar sign.
string = "So on and so forth $5.99"
result = string.match(/\$(.*)$/)
puts result[1] # prints 5.99
If you want to capture everything after the $ up to the very end of the string, you'll want:
/\$(.*)\z/
See http://rubular.com/r/T4fR1SEl3j
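The two anchors only behave differently once there is text after a newline. A sketch with a hypothetical two-line string added for illustration:

```ruby
string = "So on and so forth $5.99"

# \$ is a literal dollar sign; (.*) captures the rest of the line.
puts string.match(/\$(.*)$/)[1]   # prints 5.99

# $ stops at the line end, while \z demands the end of the whole string,
# which . cannot reach across "\n" unless the /m flag is given.
multi = "Total: $5.99\nThanks"
p multi[/\$(.*)$/, 1]     # => "5.99"
p multi[/\$(.*)\z/, 1]    # => nil
p multi[/\$(.*)\z/m, 1]   # => "5.99\nThanks"
```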

Ruby regex: ^ matches start of line even without m modifier?

Ruby 1.8.7. I'm using a regex with a ^ to match a pattern at the start of the string. The problem is that if the pattern is found at the start of any line in the string it still matches. This is the behaviour I would expect if I were using the 'm' modifier but I'm not:
$ irb
irb(main):001:0> str = "hello\ngoodbye"
=> "hello\ngoodbye"
irb(main):002:0> puts str
hello
goodbye
=> nil
irb(main):004:0> str =~ /^goodbye/
=> 6
What am I doing wrong here?
start of the line: ^
end of the line: $
start of the string: \A
end of the string: \z
Use \A instead of ^.
Ruby regex reference: http://www.zenspider.com/ruby/quickref.html#regexen
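A short sketch of the four anchors, using the question's string plus a hypothetical "hello\n" to show \z versus \Z:

```ruby
str = "hello\ngoodbye"

p str =~ /^goodbye/    # => 6   (^ matches after any "\n")
p str =~ /\Agoodbye/   # => nil (\A matches only at the start of the string)
p str =~ /\Ahello/     # => 0

# \z is the absolute end; \Z also allows a single trailing "\n".
p "hello\n" =~ /hello\z/  # => nil
p "hello\n" =~ /hello\Z/  # => 0
```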
Your confusion is justified. In most regex flavors, ^ is equivalent to \A and $ is equivalent to \Z by default, and you have to set the "multiline" flag to make them take on their other meanings (i.e. line boundaries). In Ruby, ^ and $ always match at line boundaries.
To add to the confusion, Ruby has something it calls "multiline" mode, but it's really what everybody else calls "single-line" or "DOTALL" mode: it changes the meaning of the . metacharacter, allowing it to match line-separator characters (e.g. \r, \n) as well as all other characters.
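A minimal sketch of what Ruby's /m flag does and does not change, on an illustrative two-line string:

```ruby
text = "first line\nsecond line"

# Without /m, . matches anything except "\n", so .* stops at the line end.
p text[/first.*/]    # => "first line"

# Ruby's /m lets . match "\n" too (DOTALL in other flavors).
p text[/first.*/m]   # => "first line\nsecond line"

# ^ and $ match at line boundaries whether or not /m is given.
p text =~ /^second/  # => 11
```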
"^" is the start of the line. To make what you want, you can split de string and test just the first line. But I think exist some better method.
str.split("\n")[0] =~ /^hello/

pattern matching in ruby

Could anybody tell me how this expression works?
output = "#{output.gsub(/grep .*$/,'')}"
Before that operation, the value of output is
"df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"
but after the operation it becomes
"df -h | \n/dev/mtdblock4 248.5M 248.5M 130.7M 117.8M 53% /mnt/nand\r\n "
Please help.
Your expression is equivalent to:
output.gsub!(/grep .*$/,'')
which is much easier to read.
The . in the regular expression matches every character except newline (\n) by default. So, in the string provided, it matches "grep /mnt/nand\r" (the \r is included, since only \n is excluded) and substitutes an empty string for it. The result is the provided string without the matched substring.
Here is a simpler example:
"hello\n\n\nworld".gsub(/hello.*$/,'') => "\n\n\nworld"
In both your provided regex, and the example above, the $ is not necessary. It is used as an anchor to match the end of a line, but since the pattern immediately before it (.*) matches everything up to a newline, it is redundant (but does not cause harm).
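Running the substitution on the question's "before" string shows both points: the \r is consumed along with the grep command, and the $ makes no difference:

```ruby
output = "df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"

cleaned = output.gsub(/grep .*$/, '')
p cleaned
# => "df -h | \n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"

# . excludes only "\n", so the "\r" after "/mnt/nand" is removed as well,
# and dropping the redundant $ gives the same result.
p output.gsub(/grep .*/, '') == cleaned   # => true
```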
Since gsub returns a string, your first line is exactly the same as
output = output.gsub(/grep .*$/, '')
which takes the string and removes every occurrence of the regexp pattern
/grep .*$/
i.e. all parts of the string that start with 'grep ' until the end of the string or a line break.
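One caveat when choosing between the gsub and gsub! forms: they differ in what they return when nothing matches. A sketch with a hypothetical input that contains no "grep ":

```ruby
s = "df -h".dup

# gsub always returns a new string, even when nothing was replaced.
p s.gsub(/grep .*$/, '')    # => "df -h"

# gsub! modifies s in place and returns nil when nothing was replaced,
# so its return value should not be assigned or chained.
p s.gsub!(/grep .*$/, '')   # => nil
p s                         # => "df -h"
```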
There's a good regexp tester/reference here. This one matches the word "grep", then a space, then any number of characters until the next line break (\n). "." by itself means any character except \n, and ".*" together means any number of them, as many as possible. "$" means the end of a line.
For the '$', see here http://www.regular-expressions.info/reference.html
".*$" means "take every character from the end of the string" ; but the parser will interpret the "\n" as the end of a line, so it stops here.
