Ruby Regex Different from Rubular

I'm trying the same super simple regex on Rubular.com and on my Linux VM with Ruby 1.9.2, and I don't know why I'm getting different outputs:
VM:
my_str = "Madam Anita"
puts my_str[/\w/]
This outputs: Madam
On Rubular it outputs: MadamAnita
Rubular:
http://www.rubular.com/r/qyQipItdes
I would love some help. I'm stuck here, and I will not be able to test my code for hw1.

No, it doesn't really. It matches all characters in "Madam" and "Anita", but not the space. The problem you are having is that my_str[/\w/] only returns a single match for the given regular expression, whereas Rubular highlights all possible matches.
If you need all occurrences, you could do this:
1.9.3p194 :002 > "Madam Anita".scan(/\w+/)
=> ["Madam", "Anita"]

Actually, \w matches a single character. The result in Rubular contains spaces between adjacent characters to tell you this (though I wish they'd also make the highlighting more obvious...). Compare with the output from matching \w+, which matches two strings (Madam and Anita).
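The difference between "first match only" and "all matches" can be checked directly in Ruby. This is a minimal sketch reusing the string from the question; the helper names first_match and all_matches are made up for illustration and are not part of any library:

```ruby
# Hypothetical helpers, named here only for illustration.

# String#[] with a Regexp returns only the first match (or nil).
def first_match(str, pattern)
  str[pattern]
end

# String#scan returns every non-overlapping match as an array.
def all_matches(str, pattern)
  str.scan(pattern)
end

my_str = "Madam Anita"

puts first_match(my_str, /\w/)       # prints: M
puts first_match(my_str, /\w+/)      # prints: Madam
p all_matches(my_str, /\w+/)         # prints: ["Madam", "Anita"]
p all_matches(my_str, /\w/).length   # prints: 10
```

Rubular behaves like the scan calls here: it shows every match, not just the first one.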

Related

What is wrong with this extremely simple regex?

I'm trying to test that a regex will match a 2-digit number. I get:
11 =~ /^\d{1,2}$/
# => nil
Yet the regex works flawlessly on Rubular. What am I doing wrong?

The problem is that you are testing the regex against a number and not a string. Regexes are intended for matching strings. Simply:
'11' =~ /^\d{1,2}$/
or
11.to_s =~ /^\d{1,2}$/

You are calling Kernel#=~, which always returns nil.
Rubular does not interpret your input as Ruby code; it interprets it as a string literal. That is why it works there.

You are applying the regex to a number instead of a string, so convert it to a string and try again.
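One way to avoid the number/string mix-up is to convert the value before matching. A small sketch, assuming a helper named one_or_two_digits? (a hypothetical name, not from the question):

```ruby
# Hypothetical helper: true when value, read as a string, is one or two digits.
def one_or_two_digits?(value)
  # to_s guarantees the regex is applied to a String, whatever was passed in.
  !(value.to_s =~ /^\d{1,2}$/).nil?
end

p one_or_two_digits?(11)     # prints: true
p one_or_two_digits?("11")   # prints: true
p one_or_two_digits?(123)    # prints: false
```

Calling to_s inside the helper means callers do not have to remember which type they are holding.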

Ruby regex too greedy with back to back matches

I'm working on some text processing in Ruby 1.8.7 to support some custom shortcodes that I've created. Here are some examples of my shortcode:
[CODE first-part]
[CODE first-part second-part]
I'm using the following RegEx to grab the
text.gsub!( /\[CODE (\S+)\s?(\S?)\]/i, replacementText )
The problem is this: the regex doesn't work on the following text:
[CODE first-part][CODE first-part-again]
The results are as follows:
1. first-part][CODE
2. first-part-again
It seems that the \s? is the problematic part of the regex: the match keeps going until it hits the last space, not the first one. When I change the regex to the following:
/\[CODE ([\w-]+)\s?(\S*)\]/i
it works fine. My only concern is what \w covers compared to \S, as I want to make sure \w will match URL-safe characters.
I'm sure there's a perfectly valid explanation, but it's eluding me. Any ideas? Thanks!

Actually, thinking about it, just using [^\]] might not be enough, as it will swallow up all spaces as well. You also need to exclude those:
/\[CODE[ ]([^\]\s]+)\s?([^\]\s]*)\]/i
Note the [ ] - I just think it makes literal spaces more readable.
Working demo.
Explained in free-spacing mode:
\[CODE[ ] # match your identifier
( # capturing group 1
[^\]\s]+ # match one or more non-], non-whitespace characters
) # end of group 1
\s? # match an optional whitespace character
( # capturing group 2
[^\]\s]* # match zero or more non-], non-whitespace characters
) # end of group 2
\] # match the closing ]
As none of the character classes in the pattern includes ], you can never possibly go beyond the end of the square bracketed expression.
By the way, if you find unnecessary escapes in regex as obscuring as I do, here is the minimal version:
/\[CODE[ ]([^]\s]+)\s?([^]\s]*)]/i
But that is definitely a matter of taste.

The problem was with the greedy \S+ in this
/\[CODE (\S+)\s?(\S?)\]/i
You could try:
/\[CODE (\S+?)\s?(\S?)\]/i
but actually your new character class is IMO superior.
Even better might be:
/\[CODE ([^\]]+?)\s?([^\]]*)\]/i
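To see the over-matching and the fix side by side, here is a rough sketch. The constant names GREEDY and BOUNDED are made up, and GREEDY is an assumed variant that uses \S* for the second group, chosen because it reproduces the results described in the question:

```ruby
# Greedy variant, assumed here for illustration: \S also matches "]" and "[".
GREEDY = /\[CODE (\S+)\s?(\S*)\]/i

# Negated character classes suggested earlier in this thread:
# neither group can cross a "]" or a whitespace character.
BOUNDED = /\[CODE[ ]([^\]\s]+)\s?([^\]\s]*)\]/i

text = "[CODE first-part][CODE first-part-again]"

p text.scan(GREEDY)   # prints: [["first-part][CODE", "first-part-again"]]
p text.scan(BOUNDED)  # prints: [["first-part", ""], ["first-part-again", ""]]
p "[CODE first-part second-part]".scan(BOUNDED)  # prints: [["first-part", "second-part"]]
```

Because "]" is excluded from both groups, each match has to end at the first closing bracket it meets.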

Replace specific characters between brackets in ruby

I have a string
str = "'${1:textbox}',[${2:x},${3:y},${4:w},${5:h}]"
and I would like to replace all , between [ and ] with a single space.
I have attempted to use something like
str.gsub!(/(?<=\[)\,*?(?=\])/," ")
without success. However, if I replace \, in my expression with ., I get the expected output:
str.gsub!(/(?<=\[).*?(?=\])/," ")
== "'${1:textbox}',[ ]"
Could someone please explain the proper regex technique to use in this situation, and perhaps also explain why the examples I have posted above have failed and succeeded?
I am using ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin10.8.0]

It may be possible to do this with a single regex, but even if it is, I can guarantee it'll be ugly beyond description. It's a lot simpler to use "nested" substitution - use one gsub to find bracketed substrings, and then use another to swap out the commas:
str.gsub(/\[.*?\]/) do |substr|
  substr.gsub(',', ' ')
end
I'm afraid I can't explain why your attempts have failed - neither of them would run for me (ruby 1.8.7 / irb 0.9.5). IRB gave errors that vaguely said "Bad regexp syntax." And I can't quite grok how they're supposed to work (edit: mu is too short has an awesome breakdown in his answer - check that out). Hope this is helpful anyway!

This regex:
/(?<=\[)\,*?(?=\])/
is looking for an opening bracket followed by a sequence of commas (of any length) followed by a closing bracket. That means things like this:
[]
[,]
[,,,,,,,,,,,]
Your string doesn't look like that so your first gsub! doesn't do anything. If you do this:
'[,,,,,,]'.gsub(/(?<=\[),*?(?=\])/, " ")
You'll get a '[ ]' for your troubles.
Your second regex:
/(?<=\[).*?(?=\])/
works because .*? matches anything (subject to newlines and /m and /s modifiers of course) and the portion of your string between [ and ] certainly qualifies as anything.
If you're trying to produce this:
"'${1:textbox}',[${2:x} ${3:y} ${4:w} ${5:h}]"
then I'd go with Xavier Holt's nested gsub approach, that's simple and clean.
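For reference, the nested gsub approach can be wrapped up and run against the string from the question. A minimal sketch; the method name commas_to_spaces_in_brackets is hypothetical:

```ruby
# Hypothetical helper wrapping the nested gsub approach.
def commas_to_spaces_in_brackets(str)
  # Outer gsub finds each [...] group (non-greedy, so groups stay separate);
  # inner gsub swaps the commas inside that group only.
  str.gsub(/\[.*?\]/) { |bracketed| bracketed.gsub(",", " ") }
end

str = "'${1:textbox}',[${2:x},${3:y},${4:w},${5:h}]"
puts commas_to_spaces_in_brackets(str)
# prints: '${1:textbox}',[${2:x} ${3:y} ${4:w} ${5:h}]
```

The comma outside the brackets is left alone because the inner gsub only ever sees the bracketed substring.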

Rubular/Ruby discrepancy in captured text

I've carefully cut and pasted from this Rubular window http://rubular.com/r/YH8Qj2EY9j to my code, yet I get different results. The Rubular match capture is what I want. Yet
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
  puts description = $1
end
only gets me the first line, i.e.
<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
I don't think it's my test data, but that's possible. What am I missing?
(ruby 1.9 on Ubuntu 10.10)

Paste your test data into an editor that is able to display control characters and verify your line break characters. Normally it should be only \n on a Linux system as in your regex. (I had unusual linebreaks a few weeks ago and don't know why.)
The other check you can do is to change your brackets and print your capturing groups, so that you can see which part of your regex matches what.
/^<DD>(.*)\n?(.*)\n/
Another idea to get this to work is to change the .*: don't say "match any character", say "match anything but \n".
^<DD>([^\n]*\n?[^\n]*)\n

I believe you need the multiline modifier in your code:
/m Multiline mode: dot matches newlines. (In Ruby, ^ and $ match at line starts and endings with or without this modifier.)
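What /m changes in Ruby can be checked with a two-line string. A quick sketch with made-up sample text; in Ruby the modifier only affects the dot, and the anchors already work per line without it:

```ruby
text = "first\nsecond"

p text[/first.second/]    # prints: nil (dot does not match "\n")
p text[/first.second/m]   # prints: "first\nsecond"
p text[/^second$/]        # prints: "second" (no /m needed)
```

So /m matters for the pattern in the question only if a single .* is expected to run across a line break.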

The following:
#!/usr/bin/env ruby
desc= '<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
<DT>la la this should not be matched oh good'
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
  puts description = $1
end
prints
#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
on my system (Linux, Ruby 1.8.7).
Perhaps your line breaks are really \r\n (Windows style)? What if you try:
desc_pattern = /^<DD>(.*\r?\n?.*)\r?\n/
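If the line endings are in doubt, another option is to normalize them before matching, so the original pattern can stay as it is. A sketch under that assumption; normalize_newlines and description are hypothetical helper names and the sample strings are made up:

```ruby
# Hypothetical helper: turn "\r\n" and lone "\r" into "\n".
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

# Hypothetical helper: capture up to two lines following <DD>.
def description(desc)
  # String#[] with a capture index returns that group, or nil if nothing matches.
  normalize_newlines(desc)[/^<DD>(.*\n?.*)\n/, 1]
end

p description("<DD>first\nsecond\n<DT>rest")      # prints: "first\nsecond"
p description("<DD>first\r\nsecond\r\n<DT>rest")  # prints: "first\nsecond"
```

Printing the raw input with p (which calls inspect) is also a quick way to see whether it contains \r characters at all.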

Ruby RegEx problem text.gsub[^\W-], '') fails

I'm trying to learn RegEx in Ruby, based on what I'm reading in "The Rails Way". But, even this simple example has me stumped. I can't tell if it is a typo or not:
text.gsub(/\s/, "-").gsub([^\W-], '').downcase
It seems to me that this would replace all spaces with -, then anywhere a string starts with a non letter or number followed by a dash, replace that with ''. But, using irb, it fails first on ^:
syntax error, unexpected '^', expecting ']'
If I take out the ^, it fails again on the W.

>> text = "I love spaces"
=> "I love spaces"
>> text.gsub(/\s/, "-").gsub(/[^\W-]/, '').downcase
=> "--"
Missing //
Although this makes a little more sense :-)
>> text.gsub(/\s/, "-").gsub(/([^\W-])/, '\1').downcase
=> "i-love-spaces"
And this is probably what is meant
>> text.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
=> "i-love-spaces"
\W means "not a word character"
\w means "a word character"
The // create a Regexp object
/[^\W-]/.class
=> Regexp

Step 1: Add this to your bookmarks. Whenever I need to look up regexes, it's my first stop
Step 2: Let's walk through your code
text.gsub(/\s/, "-")
You're calling the gsub function, and giving it 2 parameters.
The first parameter is /\s/, which is ruby for "create a new regexp containing \s" (the // are like special "" for regexes).
The second parameter is the string "-".
This will therefore replace all whitespace characters with hyphens. So far, so good.
.gsub([^\W-], '').downcase
Next you call gsub again, passing it 2 parameters.
The first parameter is [^\W-]. Because we didn't quote it in forward-slashes, ruby will literally try to run that code. [] creates an array, then it tries to put ^\W- into the array, which is not valid code, so it breaks.
Changing it to /[^\W-]/ gives us a valid regex.
Looking at the regex, the [] says 'match any character in this group', and the leading ^ negates the group. The group contains \W (which means non-word character) and -, so the regex matches any character that is neither a non-word character nor a hyphen, in other words any word character.
As the second thing you pass to gsub is an empty string, it ends up replacing all the word characters with the empty string (thereby stripping them out and leaving only the hyphens). That suggests the book meant /[^\w-]/, with a lowercase \w.
.downcase
Which just converts the string to lower case.
Hope this helps :-)

You forgot the slashes. It should be /[^\W-]/

Well, .gsub(/[^\W-]/,'') says replace anything that's neither a non-word character nor a - with nothing.
You probably want
>> text.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
=> "i-love-spaces"
Lower case \w (\W is just the opposite)

The slashes are to say that the thing between them is a regular expression, much like quotes say the thing between them is a string.
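Putting the corrected pieces together, the chain from the question can be tried as a small method. A sketch only; the name slugify and the second sample string are made up:

```ruby
# Hypothetical helper built from the corrected chain.
def slugify(text)
  text.gsub(/\s/, "-")       # whitespace becomes hyphens
      .gsub(/[^\w-]/, "")    # drop anything that is not a word character or hyphen
      .downcase
end

puts slugify("I love spaces")   # prints: i-love-spaces
puts slugify("Hello, World!")   # prints: hello-world

# With the uppercase \W the class is inverted and the word characters are removed:
puts "I love spaces".gsub(/\s/, "-").gsub(/[^\W-]/, "")   # prints: --
```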
