Why does this regular expression return true in Ruby? - ruby

I have started learning regular expressions in Ruby, and I ran into a problem: the regular expression below does not work as expected.
/^[\s]*$/ -- I expected this to match only if the input contains nothing but whitespace, or is empty.
For example,
str = "
abc
"
if str =~ /^[\s]*$/
  puts "Condition is true"
else
  puts "Condition is false"
end
I expected this condition to be false, but it is true. I don't know why.
In sed or grep it works as expected, so why does it not work in Ruby?

The reason is that in Ruby regexes, ^ and $ match the start and end of any line, not only of the whole string. Change them to \A and \z and you will get a false result.
See this Ruby demo at Ideone. The pattern /\A\s*\z/ matches only strings that are empty or consist entirely of whitespace characters.
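A minimal sketch to check this locally, using the string from the question (the multi-line literal evaluates to "\nabc\n"). The variable names are only illustrative:

```ruby
str = "\nabc\n" # same value as the multi-line literal in the question

# ^ and $ are line anchors: the empty first line satisfies /^\s*$/,
# so =~ returns the index 0, which Ruby treats as true.
line_result = str =~ /^\s*$/

# \A and \z anchor at the start and end of the whole string.
string_result = str =~ /\A\s*\z/

p line_result   # prints 0
p string_result # prints nil
puts(string_result ? "Condition is true" : "Condition is false") # prints Condition is false
```

Note that =~ returns a match index rather than true, and 0 is truthy in Ruby, which is why the original if branch was taken.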
As for \s, it is a synonym for [ \t\r\n\f], not just [ \t\n] (Ruby 2.2 and later also include the vertical tab, \v). See this Ruby Character Class reference:
/\s/ - A whitespace character: /[ \t\r\n\f]/
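A quick check of that difference, as a sketch: the string below contains only a carriage return, a newline and a form feed, so it counts as whitespace-only for \s but not for a hand-written class. It also shows that the brackets in the question's [\s] are redundant:

```ruby
crlf_ff = "\r\n\f"

# \r (carriage return) and \f (form feed) count as whitespace for \s ...
p(crlf_ff =~ /\A\s*\z/)      # prints 0
# ... but not for a class that lists only space, tab and newline.
p(crlf_ff =~ /\A[ \t\n]*\z/) # prints nil
# Wrapping \s in brackets changes nothing: [\s] is the same class as \s.
p(crlf_ff =~ /\A[\s]*\z/)    # prints 0
```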

Related

matching a double-quote via quote vs pattern

Why does check_char1 fail to find the double-quote?
#!/usr/bin/env ruby
line = 'hello, "bob"'
def check_char1(line, _char)
puts "check_char1 found #{_char} in #{line}" if line =~ /_char/
end
check_char1(line, '"')
def check_char2(line, _char)
puts "check_char2 found #{_char.inspect} in #{line}" if line =~ _char
end
check_char2(line, /"/)
...and can it be made to work using line =~ /_char/? (How should the double-quote be passed to the method?)
If _char is just a string (i.e. no regex pattern matching is needed), then just use String#include?:
if line.include?(_char)
If you must use a regex for this, then Regexp.escape is your friend:
if line =~ /#{Regexp.escape(_char)}/
if line =~ Regexp.new(Regexp.escape(_char))
And if you want _char to be treated like a regex (e.g. '.' matches any character), then drop the Regexp.escape:
if line =~ /#{_char}/
if line =~ Regexp.new(_char)
In check_char1, _char in /_char/ is treated as a literal, not a variable. You need /#{_char}/.
If _char were treated as a variable, how could one write a regex that matches the literal name of a variable, method or constant?
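The options from both answers can be put side by side in a small sketch. The three helper names (contains_text?, contains_escaped?, matches_pattern?) are hypothetical, chosen only for this illustration:

```ruby
line = 'hello, "bob"'

# /_char/ contains the literal text "_char", not the variable's value.
p(line =~ /_char/) # prints nil

# Hypothetical helper: plain substring search, no regex involved.
def contains_text?(line, char)
  line.include?(char)
end

# Hypothetical helper: interpolate the value, escaping regex metacharacters.
def contains_escaped?(line, char)
  !!(line =~ /#{Regexp.escape(char)}/)
end

# Hypothetical helper: interpolate the value unescaped, so it acts as a pattern.
def matches_pattern?(line, char)
  !!(line =~ /#{char}/)
end

p contains_text?(line, '"')    # prints true
p contains_escaped?(line, '"') # prints true
p contains_escaped?(line, '.') # prints false: line has no literal dot
p matches_pattern?(line, '.')  # prints true: "." matches any character
```

The double quote itself needs no special handling; passing '"' as a plain string works once its value, rather than the name _char, ends up in the pattern.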

Regexp for certain character to end of line

I have a string
"So on and so forth $5.99"
I would like to extract everything after the $ until the end of the line.
I thought /$/ finds the character $. How do I select the rest of the string? I know it involves \z, but I can't get the syntax right.
In a regexp, $ represents the end of a line.
So in your case you need \$.*$: the escaped \$ matches the dollar sign itself, .* matches everything after it, and the final $ anchors at the end of the line.
No, /$/ does not match that character. You need to escape it with a backslash (\$) to match a literal dollar sign.
string = "So on and so forth $5.99"
result = string.match(/\$(.*)$/)
puts result[1] # prints 5.99
If you want to capture everything after the $ up to the very end of the string, you'll want:
/\$(.*)\z/
See http://rubular.com/r/T4fR1SEl3j
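A sketch comparing the two anchors, using String#[] with a capture index. The second, multi-line string is a made-up example, added only to show where $ and \z diverge:

```ruby
string = "So on and so forth $5.99"

# \$ matches a literal dollar sign; (.*) captures what follows it.
p string[/\$(.*)$/, 1]  # prints "5.99"
p string[/\$(.*)\z/, 1] # prints "5.99"

# The anchors only differ once the string contains a newline.
multi = "Price: $5.99\nTax included"
p multi[/\$(.*)$/, 1]   # prints "5.99" ($ stops at the end of the first line)
p multi[/\$(.*)\z/, 1]  # prints nil (. cannot cross "\n" to reach \z)
p multi[/\$(.*)\z/m, 1] # prints "5.99\nTax included" (/m lets . match "\n")
```

On a single-line string like the one in the question, either pattern works; the choice only matters when newlines can follow the $.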

Ruby regex: ^ matches start of line even without m modifier?

Ruby 1.8.7. I'm using a regex with a ^ to match a pattern at the start of the string. The problem is that if the pattern is found at the start of any line in the string it still matches. This is the behaviour I would expect if I were using the 'm' modifier but I'm not:
$ irb
irb(main):001:0> str = "hello\ngoodbye"
=> "hello\ngoodbye"
irb(main):002:0> puts str
hello
goodbye
=> nil
irb(main):004:0> str =~ /^goodbye/
=> 6
What am I doing wrong here?
start of the line: ^
end of the line: $
start of the string: \A
end of the string: \z
Use \A instead of ^.
Ruby regex reference: http://www.zenspider.com/ruby/quickref.html#regexen
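A short sketch of the four anchors on the string from the question, plus \Z, which (like $) also matches just before a final newline:

```ruby
str = "hello\ngoodbye"

p(str =~ /^goodbye/)  # prints 6: ^ matches right after the "\n"
p(str =~ /\Agoodbye/) # prints nil: \A matches only at index 0
p(str =~ /\Ahello/)   # prints 0

# $ and \Z also match just before a trailing newline; \z does not.
p("hello\n" =~ /hello$/)  # prints 0
p("hello\n" =~ /hello\Z/) # prints 0
p("hello\n" =~ /hello\z/) # prints nil
```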
Your confusion is justified. In most regex flavors, ^ is equivalent to \A and $ is equivalent to \Z by default, and you have to set the "multiline" flag to make them take on their other meanings (i.e. line boundaries). In Ruby, ^ and $ always match at line boundaries.
To add to the confusion, Ruby has something it calls "multiline" mode, but it's really what everybody else calls "single-line" or "DOTALL" mode: it changes the meaning of the . metacharacter, allowing it to match line-separator characters (e.g. \r, \n) as well as all other characters.
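A sketch showing that Ruby's /m only changes what . matches, while ^ keeps matching at every line start:

```ruby
str = "hello\ngoodbye"

# Without /m, . does not match "\n" ...
p(str =~ /hello.goodbye/)  # prints nil
# ... with Ruby's /m it does (what other flavors call DOTALL or single-line mode).
p(str =~ /hello.goodbye/m) # prints 0
# /m does not change ^: it still matches at every line start.
p(str =~ /^goodbye/m)      # prints 6
```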
"^" is the start of a line. To get what you want, you can split the string and test only the first line. But I think there is a better method.
str.split("\n")[0] =~ /^hello/

pattern matching in ruby

Could anybody tell me how this expression works?
output = "#{output.gsub(/grep .*$/,'')}"
Before that operation, the value of output is
"df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"
but after the operation it becomes
"df -h | \n/dev/mtdblock4 248.5M 248.5M 130.7M 117.8M 53% /mnt/nand\r\n "
Please help me.
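Your expression is effectively equivalent to the following (except that gsub! modifies output in place instead of assigning a new string):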
output.gsub!(/grep .*$/,'')
which is much easier to read.
The . in the regular expression matches all characters except newline (\n) by default. So, in the string provided, it matches "grep /mnt/nand\r" (the . also consumes the \r, because only \n ends the line here) and substitutes an empty string for it. The result is the provided string without the matched substring.
Here is a simpler example:
"hello\n\n\nworld".gsub(/hello.*$/,'') #=> "\n\n\nworld"
In both your provided regex, and the example above, the $ is not necessary. It is used as an anchor to match the end of a line, but since the pattern immediately before it (.*) matches everything up to a newline, it is redundant (but does not cause harm).
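A sketch that replays the substitution on the input string from the question; it also checks the claim that the $ is redundant:

```ruby
output = "df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"

# "grep " plus everything up to (not including) the first "\n" is removed,
# including the "\r", because . matches "\r".
cleaned = output.gsub(/grep .*$/, '')
p cleaned
# prints "df -h | \n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"

# Dropping the redundant $ gives the same result, because .* already stops at "\n".
p cleaned == output.gsub(/grep .*/, '') # prints true
```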
Since gsub returns a string, your first line is exactly the same as
output = output.gsub(/grep .*$/, '')
which takes the string and removes any occurrence of the regexp pattern
/grep .*$/
i.e. all parts of the string that start with 'grep ' until the end of the string or a line break.
There's a good regexp tester/reference here. This one matches the word "grep", then a space, then any number of characters until the next newline (\n). Note that in Ruby . does match \r, which is why the \r before the \n disappears as well. "." by itself means any character except newline, and ".*" together means any number of them, as many as possible. "$" means the end of a line.
For the '$', see here http://www.regular-expressions.info/reference.html
".*$" means "take every character up to the end"; but $ matches at the end of a line, so the match stops at the "\n".
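If the goal were to keep the CRLF line ending intact (an assumption, not something the question asks for), a negated character class that excludes both \r and \n is one way to do it. A sketch:

```ruby
output = "df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"

# . matches "\r", so /grep .*$/ also removes the carriage return.
p output.gsub(/grep .*$/, '').start_with?("df -h | \n")       # prints true
# A negated class that excludes both \r and \n keeps the CRLF intact.
p output.gsub(/grep [^\r\n]*/, '').start_with?("df -h | \r\n") # prints true
```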

Backslashes in gsub (escaping and backreferencing)

Consider the following snippet:
puts 'hello'.gsub(/.+/, '\0 \\0 \\\0 \\\\0')
This prints (as seen on ideone.com):
hello hello \0 \0
This was very surprising, because I'd expect to see something like this instead:
hello \0 \hello \\0
My argument is that \ is an escape character, so you write \\ to get a literal backslash, thus \\0 is a literal backslash \ followed by 0, etc. Obviously this is not how gsub is interpreting it, so can someone explain what's going on?
And what do I have to do to get the replacement I want above?
Escaping is limited when using single quotes rather than double quotes:
puts 'single\nquote'
puts "double\nquote"
"\0" is the null character (used e.g. in C to mark the end of a string), whereas '\0' is "\\0". Therefore both 'hello'.gsub(/.+/, '\0') and 'hello'.gsub(/.+/, "\\0") return "hello", but 'hello'.gsub(/.+/, "\0") returns a string containing just the null character.
'hello'.gsub(/.+/, '\\0') also returns "hello", because inside single quotes \\ is an escaped backslash, so '\\0' == "\\0", just as '\0' == "\\0". In fact, this has nothing to do with gsub; it is how single-quoted strings work. Following this logic, both '\\\0' and '\\\\0' equal "\\\\0", which (when printed) gives you \\0.
As gsub uses \x to insert capture group x, you need a way to escape \x, which is \\x, or in its string representation: "\\\\x".
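The string equalities above can be checked directly, as a sketch. In single quotes only \\ and \' are escapes, so every other backslash stays in the string:

```ruby
# In single quotes, only \\ and \' are escapes; any other backslash stays literal.
p('\0' == "\\0")      # prints true
p('\\0' == "\\0")     # prints true
p('\\\0' == "\\\\0")  # prints true
p('\\\\0' == "\\\\0") # prints true

# In double quotes, "\0" is a single null character.
p "\0".bytes # prints [0]
```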
Therefore the line
puts 'hello'.gsub(/.+/, "\\0 \\\\0 \\\\\\0 \\\\\\\\0")
indeed results in
hello \0 \hello \\0
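An alternative worth knowing, offered as a sketch rather than the answer's method: with the block form of gsub, the block's return value is inserted as-is, without backreference processing, so only normal double-quote escaping applies:

```ruby
# Replacement string: gsub sees \0 \\0 \\\0 \\\\0 after string-literal escaping.
with_string = 'hello'.gsub(/.+/, "\\0 \\\\0 \\\\\\0 \\\\\\\\0")
puts with_string # prints hello \0 \hello \\0

# Block form: the match is passed in, and the return value is used literally.
with_block = 'hello'.gsub(/.+/) { |m| "#{m} \\0 \\#{m} \\\\0" }
puts with_block  # prints hello \0 \hello \\0
```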

Resources