Ruby regex: ^ matches start of line even without m modifier? - ruby

Ruby 1.8.7. I'm using a regex with a ^ to match a pattern at the start of the string. The problem is that if the pattern is found at the start of any line in the string it still matches. This is the behaviour I would expect if I were using the 'm' modifier but I'm not:
$ irb
irb(main):001:0> str = "hello\ngoodbye"
=> "hello\ngoodbye"
irb(main):002:0> puts str
hello
goodbye
=> nil
irb(main):004:0> str =~ /^goodbye/
=> 6
What am I doing wrong here?

start of the line: ^
end of the line: $
start of the string: \A
end of the string: \z

Use \A instead of ^.
Ruby regex reference: http://www.zenspider.com/ruby/quickref.html#regexen
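A minimal sketch of the difference between the two anchors, using the string from the question. The helper names `starts_a_line?` and `starts_the_string?` are made up for illustration and are not part of Ruby.

```ruby
# Hypothetical helpers, named for illustration only.

# True if word occurs at the start of ANY line in str (^ is a line anchor).
def starts_a_line?(str, word)
  !(str =~ /^#{Regexp.escape(word)}/).nil?
end

# True only if str itself begins with word (\A is a string anchor).
def starts_the_string?(str, word)
  !(str =~ /\A#{Regexp.escape(word)}/).nil?
end

str = "hello\ngoodbye"
p str =~ /^goodbye/    # => 6
p str =~ /\Agoodbye/   # => nil
p str =~ /\Ahello/     # => 0
```

With `\A`, "goodbye" on the second line no longer counts as a match.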

Your confusion is justified. In most regex flavors, ^ is equivalent to \A and $ is equivalent to \Z by default, and you have to set the "multiline" flag to make them take on their other meanings (i.e. line boundaries). In Ruby, ^ and $ always match at line boundaries.
To add to the confusion, Ruby has something it calls "multiline" mode, but it's really what everybody else calls "single-line" or "DOTALL" mode: it changes the meaning of the . metacharacter, allowing it to match line-separator characters (e.g. \r, \n) as well as all other characters.
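To illustrate, here is a small sketch showing that Ruby's /m flag changes what the dot matches but leaves ^ alone. The helper `first_match_positions` is a hypothetical name used only for this example.

```ruby
# Hypothetical helper: reports where each pattern first matches in str
# (nil means no match).
def first_match_positions(str)
  {
    dot_without_m:   str =~ /hello.goodbye/,   # dot cannot cross "\n"
    dot_with_m:      str =~ /hello.goodbye/m,  # dot matches "\n" too
    caret_without_m: str =~ /^goodbye/,        # ^ matches at a line start
    caret_with_m:    str =~ /^goodbye/m        # unchanged by /m
  }
end

positions = first_match_positions("hello\ngoodbye")
p positions[:dot_without_m]    # => nil
p positions[:dot_with_m]       # => 0
p positions[:caret_without_m]  # => 6
p positions[:caret_with_m]     # => 6
```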

"^" matches at the start of a line. To get what you want, you can split the string and test just the first line, though there is probably a better way.
str.split("\n")[0] =~ /^hello/

Related

Why does this regular expression return true in Ruby?

I have just started learning regular expressions in Ruby and ran into a problem: the regular expression below does not work as I expected.
/^[\s]*$/ -- I expect this to match only if the input is empty or contains nothing but whitespace.
For example,
str = "
abc
"
if str =~ /^[\s]*$/
  puts "Condition is true"
else
  puts "Condition is false"
end
I expected this condition to be false, but it is true, and I don't know why.
In sed or grep it works as expected, so why doesn't it work in Ruby?
The reason is that in Ruby regex, ^ and $ match the start/end of a line. Change to \A and \z and you will get a false result.
See this Ruby demo at Ideone. The /\A\s*\z/ will only match strings that are either empty or have whitespace symbols only.
As for \s, it is a synonym for [ \t\r\n\f], not just [ \t\n]. See this Ruby Character Class reference:
/\s/ - A whitespace character: /[ \t\r\n\f]/
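A short sketch of the fix, using the string from the question. The helper name `blank_string?` is hypothetical and chosen only for illustration.

```ruby
# Hypothetical helper: true only when the WHOLE string is empty or whitespace.
def blank_string?(str)
  !(str =~ /\A\s*\z/).nil?
end

str = "\nabc\n"

# Line anchors: the empty first line satisfies ^[\s]*$ on its own.
p !(str =~ /^[\s]*$/).nil?   # => true
# String anchors: "abc" in the middle prevents a match.
p blank_string?(str)         # => false
p blank_string?(" \t\n")     # => true
p blank_string?("")          # => true
```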

Ruby - substitute \n if not \\n

I'm trying to write a regex with a lookbehind that replaces \n, but not when it's a \\n.
My closest attempt has no effect:
text.gsub /(?<!\\)\n/, ''
Unfortunately, no number of backslashes in the lookbehind seems to fix the problem. How can I address this?
You need to double the backslash before the n in the regex, otherwise it's looking for a newline instead of a literal backslash followed by n:
irb(main):001:0> puts "hello\\nthere\\\\n".gsub(/(?<!\\)\\n/, ' ')
hello there\\n
You don't need anything special. "\n" is a single character. It does not include a "\" or "n" character.
text.gsub(/\n/, "")
But instead of that, you should do:
text.gsub("\n", "")
or
text.tr("\n", "")
But I would do:
text.tr($/, "")
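The two answers read the question differently, so here is a sketch that puts both readings side by side. The method names are hypothetical, and the lookbehind assumes Ruby 1.9 or later.

```ruby
# Reading 1: remove real newline characters. "\n" is a single character,
# so no backslash handling is needed.
def remove_newlines(text)
  text.gsub("\n", "")
end

# Reading 2: replace a literal backslash followed by "n", unless that
# backslash is itself preceded by another backslash.
def replace_literal_backslash_n(text, replacement = " ")
  text.gsub(/(?<!\\)\\n/, replacement)
end

puts remove_newlines("one\ntwo\\nthree")                  # prints onetwo\nthree
puts replace_literal_backslash_n("hello\\nthere\\\\n")    # prints hello there\\n
```

In the first call the real newline disappears while the literal backslash-n is left alone; in the second, only the backslash-n that is not preceded by another backslash is replaced.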

Regexp for certain character to end of line

I have a string
"So on and so forth $5.99"
I would like to extract everything after the $ until the end of the line.
/$ finds the character $. How do I select the rest of the string? I know it's something \z but I can't get the syntax right.
In regexp $ represents the end of the line.
So in your case you need \$.*$ to match the escaped $ and everything (.*) up to the end of the line ($).
No, /$ does not match that character. You need to escape it with a backslash (\$) to match a literal $.
string = "So on and so forth $5.99"
result = string.match(/\$(.*)$/)
puts result[1] # prints 5.99
If you want to capture everything after the $, you'll want:
/\$(.*)\z/
See http://rubular.com/r/T4fR1SEl3j
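A sketch comparing the two patterns on a multi-line input. The helper names are hypothetical, and adding the /m flag to the \z variant is an assumption on my part (it is not in the answer above) so that the dot can run past newlines to the end of the string.

```ruby
# Hypothetical helper: text after the first "$" up to the end of that line.
def after_dollar_to_line_end(string)
  match = string.match(/\$(.*)$/)
  match && match[1]
end

# Hypothetical helper: text after the first "$" up to the end of the string.
# The /m flag (an assumption) lets the dot cross newlines.
def after_dollar_to_string_end(string)
  match = string.match(/\$(.*)\z/m)
  match && match[1]
end

p after_dollar_to_line_end("So on and so forth $5.99")    # => "5.99"
p after_dollar_to_line_end("costs $5.99\nplus tax")       # => "5.99"
p after_dollar_to_string_end("costs $5.99\nplus tax")     # => "5.99\nplus tax"
p after_dollar_to_line_end("no price here")               # => nil
```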

Why $ doesn't match \r\n

Can someone explain this:
str = "hi there\r\n\r\nfoo bar"
rgx = /hi there$/
str.match rgx # => nil
rgx = /hi there\s*$/
str.match rgx # => #<MatchData "hi there\r\n\r">
On the one hand it seems like $ does not match \r. But then if I first capture all the white spaces, which also include \r, then $ suddenly does appear to match the second \r, not continuing to capture the trailing "\nfoo bar".
Is there some special rule here about consecutive \r\n sequences? The docs on $ simply say it will match "end of line" which doesn't explain this behavior.
$ is a zero-width assertion. It doesn't match any character, it matches at a position. Namely, it matches either immediately before a \n, or at the end of string.
/hi there\s*$/ matches because \s* matches "\r\n\r", which allows the $ to match at the position before the second \n. The $ could have also matched at the position before the first \n, but the \s* is greedy and matches as much as it can, while still allowing the overall regex to match.
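The behavior described above can be checked with a short sketch. The helper `matched_text` is a hypothetical name; the lazy and `\r?` variants are additions for comparison and do not come from the answer.

```ruby
# Hypothetical helper: the matched substring, or nil when there is no match.
def matched_text(str, pattern)
  match = str.match(pattern)
  match && match[0]
end

str = "hi there\r\n\r\nfoo bar"

# $ needs a "\n" (or end of string) right after it, but "\r" comes first.
p matched_text(str, /hi there$/)       # => nil
# Greedy \s* takes "\r\n\r", leaving $ just before the second "\n".
p matched_text(str, /hi there\s*$/)    # => "hi there\r\n\r"
# Lazy \s*? stops at the first position where $ can match.
p matched_text(str, /hi there\s*?$/)   # => "hi there\r"
# Allowing an optional "\r" handles Windows line endings explicitly.
p matched_text(str, /hi there\r?$/)    # => "hi there\r"
```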

How to escape newline in regex scan

str = "This\n is a sample text for test"
str.scan(/\S.{0,15}\S(?=\s|$)|\S+/)
# => ["This", "is a sample text", "for test"]
Here, it splits when the newline (\n) is present. I actually want the output as,
["This\n is a", "sample text for", "test"]
How can I achieve that?
Use the /m modifier which allows the dot to match newlines:
str.scan(/\S.{0,15}\S(?=\s|\z)|\S+/m)
Also, I suggest you use \z instead of $, because $ matches at the end of any line; \z matches only at the very end of the string (\Z also matches just before a final newline). It doesn't matter in this example, but it's a good habit to get into. Ruby differs from most other regex flavors in these two points.
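A sketch of the suggested pattern in use, next to the original one. The helper names `chunk_lines` and `chunk_text` are hypothetical wrappers added for illustration.

```ruby
# Hypothetical wrapper around the pattern from the question:
# the dot stops at "\n", so no chunk can span a line break.
def chunk_lines(str)
  str.scan(/\S.{0,15}\S(?=\s|$)|\S+/)
end

# Hypothetical wrapper around the pattern from the answer:
# /m lets the dot match "\n", and \z anchors at the end of the string.
def chunk_text(str)
  str.scan(/\S.{0,15}\S(?=\s|\z)|\S+/m)
end

str = "This\n is a sample text for test"
p chunk_lines(str)   # => ["This", "is a sample text", "for test"]
p chunk_text(str)    # => ["This\n is a sample", "text for test"]
```

Each chunk can be up to 17 characters long (one \S, up to 15 of anything, one \S), so the first chunk now runs across the newline; lowering the 15 gives shorter chunks.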
