Ruby RegEx problem text.gsub[^\W-], '') fails - ruby

I'm trying to learn RegEx in Ruby, based on what I'm reading in "The Rails Way". But, even this simple example has me stumped. I can't tell if it is a typo or not:
text.gsub(/\s/, "-").gsub([^\W-], '').downcase
It seems to me that this would replace all spaces with -, then anywhere a string starts with a non letter or number followed by a dash, replace that with ''. But, using irb, it fails first on ^:
syntax error, unexpected '^', expecting ']'
If I take out the ^, it fails again on the W.

>> text = "I love spaces"
=> "I love spaces"
>> text.gsub(/\s/, "-").gsub(/[^\W-]/, '').downcase
=> "--"
Missing //
Although this makes a little more sense :-)
>> text.gsub(/\s/, "-").gsub(/([^\W-])/, '\1').downcase
=> "i-love-spaces"
And this is probably what is meant
>> text.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
=> "i-love-spaces"
\W means "a non-word character"
\w means "a word character" (letters, digits and underscore)
The // generate a regexp object
/[^\W-]/.class
=> Regexp
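To see what each character class actually picks out, here is a minimal sketch. The sample string is my own and not from the question; scan is used only to list the characters each pattern matches.

```ruby
# Sample string chosen for illustration; it is not from the question.
sample = "Hi-there, you!"

# [^\W-] : characters that are neither non-word characters nor hyphens,
# in other words the word characters themselves.
word_chars = sample.scan(/[^\W-]/).join

# [^\w-] : characters that are neither word characters nor hyphens,
# in other words the punctuation and whitespace you would want to strip.
junk_chars = sample.scan(/[^\w-]/)

puts word_chars   # Hithereyou
p junk_chars      # [",", " ", "!"]
```

So the upper-case version deletes the very characters you want to keep, which is why the result collapses to "--".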

Step 1: Add this to your bookmarks. Whenever I need to look up regexes, it's my first stop
Step 2: Let's walk through your code
text.gsub(/\s/, "-")
You're calling the gsub function, and giving it 2 parameters.
The first parameter is /\s/, which is Ruby for "create a new regexp containing \s" (the // are like special "" for regexes).
The second parameter is the string "-".
This will therefore replace all whitespace characters with hyphens. So far, so good.
.gsub([^\W-], '').downcase
Next you call gsub again, passing it 2 parameters.
The first parameter is [^\W-]. Because we didn't quote it in forward slashes, Ruby will literally try to run that code. [] creates an array, then it tries to put ^\W- into the array, which is not valid code, so it breaks.
Changing it to /[^\W-]/ gives us a valid regex.
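Looking at the regex, the [] says "match any character in this group", and the leading ^ negates the group. The group contains \W (which means non-word character) and -, so the regex matches any character that is neither a non-word character nor a hyphen; in other words, it matches every word character.
As the second thing you pass to gsub is an empty string, it ends up replacing all the word characters with an empty string (thereby stripping out the letters and digits and leaving only the hyphens). The book most likely meant /[^\w-]/, which strips everything except word characters and hyphens.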
.downcase
Which just converts the string to lower case.
Hope this helps :-)
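Putting the steps together, a rough sketch of the whole chain might look like this. The method name slugify and the second input string are my own, and the lower-case \w is the correction suggested in the other answers, not what the book printed.

```ruby
# Hypothetical helper wrapping the three steps discussed above.
def slugify(text)
  text.gsub(/\s/, "-")      # whitespace -> hyphen
      .gsub(/[^\w-]/, "")   # drop anything that is not a word char or hyphen
      .downcase             # lower-case the result
end

# The same chain with the upper-case \W, as printed in the question.
as_printed = "I love spaces".gsub(/\s/, "-").gsub(/[^\W-]/, "").downcase

puts slugify("I love spaces")   # i-love-spaces
puts slugify("Hello, World!")   # hello-world
puts as_printed                 # --
```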

You forgot the slashes. It should be /[^\W-]/

Well, .gsub(/[^\W-]/,'') says: replace anything that is neither a non-word character nor a - with nothing. In other words, it deletes every word character.
You probably want
>> text.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
=> "i-love-spaces"
Lower case \w (\W is just the opposite)

The slashes are to say that the thing between them is a regular expression, much like quotes say the thing between them is a string.

Related

ruby gsub new line characters

I have a string with newline characters that I want to gsub out for white space.
"hello I\r\nam a test\r\n\r\nstring".gsub(/[\\r\\n]/, ' ')
Something like this ^ only my regex seems to be replacing the 'r' and 'n' letters as well. The other constraint is that sometimes the pattern repeats itself twice and thus would be replaced with two whitespaces in a row; although this is not preferable, it is better than all the text being cut apart.
Is there a way to select only the newline characters? Or, even better, is there a more Ruby-like way of approaching this without resorting to regex?
If you have mixed consecutive line breaks that you want to replace with a single space, you may use the following regex solution:
s.gsub(/\R+/, ' ')
See the Ruby demo.
The \R matches any type of line break and + matches one or more occurrences of the quantified subpattern.
Note that in case you have to deal with an older version of Ruby, you will need to use the character class [\r\n] that matches either \r or \n:
.gsub(/[\r\n]+/, ' ')
or - add all possible linebreaks:
.gsub(/(?:\u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029])+/, ' ')
This should work for your test case:
"hello I\r\nam a test\r\n\r\nstring".gsub(/[\r\n]/, ' ')
If you don't want successive \r\n characters to result in duplicate spaces you can use this instead:
"hello I\r\nam a test\r\n\r\nstring".gsub(/[\r\n]+/, ' ')
(Note the addition of the + after the character class.)
As Wiktor mentioned, you're using \\ in your regex, which inside the regex literal /.../ escapes a backslash, so your character class matches a literal backslash \, the letter r, or the letter n. Inside a regex literal, \r and \n are already the escapes for carriage return and line feed, so they need no extra backslash.
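Here is a small side-by-side sketch of the patterns discussed in this thread, run against the test string from the question. The tr/squeeze line at the end is my own suggestion for a regex-free route, not something from the answers, and it also collapses runs of spaces that were already in the text.

```ruby
s = "hello I\r\nam a test\r\n\r\nstring"

# Pattern from the question: \\ is a literal backslash, so the class holds
# backslash, "r" and "n"; the real line breaks are left untouched.
broken = s.gsub(/[\\r\\n]/, ' ')

# One space per CR or LF character.
each_char = s.gsub(/[\r\n]/, ' ')

# One space per run of CR/LF characters.
collapsed = s.gsub(/[\r\n]+/, ' ')

# \R matches any line break, treating "\r\n" as a single unit.
with_r = s.gsub(/\R+/, ' ')

# Assumption: a regex-free alternative using String#tr and String#squeeze.
no_regex = s.tr("\r\n", " ").squeeze(" ")

p broken      # "hello I\r\nam a test\r\n\r\nst i g"
p each_char   # "hello I  am a test    string"
p collapsed   # "hello I am a test string"
p with_r      # "hello I am a test string"
p no_regex    # "hello I am a test string"
```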

How to write \1 in string [closed]

I'm trying to write \1 in a string, but I can't do it. I would appreciate if somebody helped me with this strange behaviour. Here is an example with some explaining.
EDIT: Adding example output
puts "\1 <- null"
puts "\\1 <- slash one"
works!
but typing
"\1"
"\\1"
in the irb command line yields
"\1"
=> "\u0001"
"\\1"
=> "\\1"
There are a few ways to get it:
"\\1"
'\1'
?\\ + ?1
Remember that irb will always display it as "\\1", which means literal backslash, one, which is what you want. The way to know that this is correct is to use puts:
puts "\\1"
# => \1
Inside of double-quoted strings, backslashes have significant meaning. \n means the newline character. In single quoted strings, that's two characters: backslash and n.
You can even test this:
"\\1".chars
# => ["\\", "1"]
'\1'.chars
# => ["\\", "1"]
So you can see Ruby is interpreting that as two characters, not three. Don't be fooled by the second backslash inside a double-quoted string. That's how a literal backslash is represented.
Have you tried puts '\1'? (single quotes instead of double)
I'm not 100% sure what you're asking but if that helps, cheers.
Your command line shows "\\1" because irb calls .inspect on the object, which escapes the string. So essentially \1 is properly stored, but when displaying it, irb adds another \ to indicate that the backslash is escaped.
When I'm in IRB and type "\1", the value returned is "\u0001", which is Ruby's way of representing the character.
When I write puts("\1"), the behavior is the same in IRB and when running a script. I see a Unicode character map as follows:
0 0
0 1
This won't be the same output on all platforms (it depends on how Unicode is displayed). So that's probably why you see no output on the repl.it example.
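To pull the answers above together, here is a minimal sketch comparing the three spellings. The variable names are mine, and the gsub line at the end is only my guess at why a literal \1 is wanted in the first place.

```ruby
ctrl   = "\1"    # double quotes: octal escape, a single U+0001 character
double = "\\1"   # double quotes: escaped backslash followed by "1"
single = '\1'    # single quotes: backslash and "1" kept as typed

p ctrl.length        # 1
p double.length      # 2
p double == single   # true
p double             # "\\1"  (the inspect form, which is what irb echoes)
puts double          # \1     (the actual contents)

# Assumption: a typical use of a literal \1 is a gsub backreference.
wrapped = "abc".gsub(/(b)/, '[\1]')
p wrapped            # "a[b]c"
```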

GSUB and Forward Slash usage in Ruby

I often see the gsub function being called with the pattern parameter enclosed in forward slashes. For example:
>> phrase = "*** and *** ran to the ###."
>> phrase.gsub(/\*\*\*/, "WOOF")
=> "WOOF and WOOF ran to the ###."
I thought maybe it had something to do with escaping asterisks, but using single quotes and double quotes works just as well:
>> phrase = "*** and *** ran to the ###."
>> phrase.gsub('***', "WOOF")
=> "WOOF and WOOF ran to the ###."
>> phrase.gsub("***", "WOOF")
=> "WOOF and WOOF ran to the ###."
Is it just convention to use forward slash? What am I missing?
Use forward slashes if you need to use regular expressions.
If you use a string argument with gsub, it will just do a plain character match.
In your example, you need backslashes to escape the asterisks when using a regular expression, because asterisks have a special meaning in regex (match the preceding element zero or more times). They are not necessary when using a string, because they are just matched exactly.
In your example, you probably don't need to use a regular expression, since it is a simple pattern. However, if you wanted to match *** only when it was at the beginning of a string (e.g. the first bunch in your example), then you would want to use a regex. Note that in Ruby ^ anchors at the start of each line, while \A anchors at the start of the whole string. For example:
phrase.gsub(/^\*{3}/, "WOOF")
For more information on regular expressions, see: http://www.regular-expressions.info/.
For more information on using regular expressions in Ruby, see: http://ruby-doc.org/core-2.2.0/Regexp.html.
To play with regular expressions as they work in Ruby, try: http://rubular.com/.
You are missing reading the documentation:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backslash followed by 'd', instead of a digit.
http://ruby-doc.org/core-2.1.4/String.html#method-i-gsub
In other words, you can give a string or a regular expression. Regular expressions can be delimited several ways:
Regexps are created using the /.../ and %r{...} literals, and by the Regexp::new constructor.
http://ruby-doc.org/core-2.2.2/Regexp.html
The benefit of %r and of the alternate %r delimiters is you can usually find a delimiter that doesn't collide with characters in the pattern, which would force escaping them, as in your example.
* has to be escaped because it has special meaning in a regex, but in a string it does not.
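As a rough illustration of the points above, the sketch below runs the same replacement with a string pattern and with the three ways of building a regex. The variable names and the 'a\d1' sample are my own; Regexp.escape is shown as one way to avoid hand-escaping.

```ruby
phrase = "*** and *** ran to the ###."

# String pattern: matched literally, so no escaping is needed.
from_string = phrase.gsub('***', 'WOOF')

# Equivalent regex patterns written three ways.
from_slashes = phrase.gsub(/\*\*\*/, 'WOOF')
from_percent = phrase.gsub(%r{\*{3}}, 'WOOF')
from_new     = phrase.gsub(Regexp.new(Regexp.escape('***')), 'WOOF')

# Only a regex can say "at the very start of the string".
first_only = phrase.gsub(/\A\*{3}/, 'WOOF')

# The '\d' case from the documentation quote.
sample        = 'a\d1'                  # a, backslash, d, 1
literal_match = sample.gsub('\d', 'X')  # string: matches backslash + d
digit_match   = sample.gsub(/\d/, 'X')  # regex: matches the digit

puts from_string     # WOOF and WOOF ran to the ###.
puts first_only      # WOOF and *** ran to the ###.
puts literal_match   # aX1
puts digit_match     # a\dX
```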

ruby regex about escape a escape

I am trying to write a regex in Ruby to test a string such as:
"GET \"anything/here.txt\""
The point is, anything can appear inside the outer double quotes, but all double quotes inside the outer double quotes must be escaped by a backslash (otherwise it doesn't match). So for example
"GET "anything/here.txt""
this will not be a proper line.
I tried many ways to write the regex but it doesn't work. Can anyone help me with this? Thank you
You can use positive lookbehind:
/\A"((?<=\\)"|[^"])*"\z/
This does exactly what you asked for: "if a double quote appears inside the outer double quotes without a backslash prefixed, it doesn't match."
Some comments:
\A,\z: These match only at the beginning and end of the string. So the pattern has to match against the whole string, not a part of it.
(?<=): This is the syntax for positive lookbehind; it asserts that a pattern must match directly before the current position. So (?<=\\)" matches "a double quote which is preceded by a backslash".
[^"]: This matches "any character which is not a double quote".
One point about this regex, is that it will match an inner double quote which is preceded by two backslashes. If that is a problem, post a comment and I'll fix it.
If your version of Ruby doesn't have lookbehind, you could do something like:
/\A"(\\.|[^"\\])*"\z/
Note that unlike the first regexp, this one does not count a double backslash as escaping a quote (rather, the first backslash escapes the second one), so "\\"" will not match.
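A quick way to compare the two patterns is to run them against a few inputs. This is only a sketch: the variable names and the third test string are mine, and single-quoted Ruby literals are used so the backslashes stay visible. String#match? needs Ruby 2.4 or later.

```ruby
lookbehind  = /\A"((?<=\\)"|[^"])*"\z/
escape_pair = /\A"(\\.|[^"\\])*"\z/

good   = '"GET \"anything/here.txt\""'   # inner quotes escaped
bad    = '"GET "anything/here.txt""'     # inner quotes bare
tricky = '"\\\\""'                       # quote, two backslashes, two quotes

p lookbehind.match?(good)     # true
p lookbehind.match?(bad)      # false
p lookbehind.match?(tricky)   # true  (double backslash still "escapes")

p escape_pair.match?(good)    # true
p escape_pair.match?(bad)     # false
p escape_pair.match?(tricky)  # false (first backslash escapes the second)
```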
This works:
/"(?<method>[A-Z]*)\s*\\\"(?<file>[^\\"]*)\\""/
See it on Rubular.
Edit:
"(?<method>[A-Z]*)\s(?<content>(\\\"|[a-z\/\.]*)*)"
See it here.
Edit 2: without (? ...) sequence (for Ruby 1.8.6):
"([A-Z]*)\s((\\\"|[a-z\/\.]*)*)"
Rubular here.
Tested this on Rubular successfully:
\"GET \\\".*\\\"\"
Breakdown:
\" - Escape the " for the regex string, meaning the literal character "
GET - Assuming you just want GET, then this is explicit
\\" - Escape \ and " to get the literal string \"
.* - 0 or more of any character other than \n
\\"\" - Escapes for the literal \""
I'm not sure a regex is really your best tool here, but if you insist on using one, I recommend thinking of the string as a sequence of tokens: a quote, then a series of things that are either \\, \" or anything that isn't a quote, then a closing quote at the end. So this:
^"(\\\\|\\"|[^"])*"$

count quotes in a string that do not have a backslash before them

Hey, I'm trying to use a regex to count the number of quotes in a string that are not preceded by a backslash.
for example the following string:
"\"Some text
"\"Some \"text
The code I have was previously using String#count('"'),
which is obviously not good enough.
When I count the quotes in both of these examples, I need the result to be just 1.
I have been searching here for similar questions and I've tried using lookbehinds, but I cannot get them to work in Ruby.
I have tried the following regexes on Rubular from this previous question:
/[^\\]"/
^"((?<!\\)[^"]+)"
^"([^"]|(?<!\)\\")"
None of them give me the results I'm after.
Maybe a regex is not the way to do that. Maybe a programmatic approach is the solution.
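How about string.count('"') - string.scan('\\"').size? (String#count treats its argument as a set of characters rather than a substring, so it cannot count the two-character sequence \" on its own.)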
result = subject.scan(
/(?: # match either
^ # start-of-string\/line
| # or
\G # the position where the previous match ended
| # or
[^\\] # one non-backslash character
) # then
(?:\\\\)* # match an even number of backslashes (0 is even, too)
" # match a quote/x)
gives you an array with one entry for each quote character that is not escaped by a backslash, so result.size is the count you are after.
The \G anchor is needed to match successive quotes, and the (\\\\)* makes sure that backslashes are only counted as escaping characters if they occur in odd numbers before the quote (to take Amarghosh's correct caveat into account).
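As a sketch of how this could be wrapped up for counting, assuming the helper name count_unescaped_quotes (mine, not from the answer) and with the backslash pairs in a non-capturing group so that scan returns whole matches:

```ruby
# Hypothetical helper; the pattern follows the answer above.
def count_unescaped_quotes(str)
  # (start of line | end of previous match | one non-backslash character),
  # then an even number of backslashes, then the quote itself.
  str.scan(/(?:^|\G|[^\\])(?:\\\\)*"/).size
end

# Single-quoted literals keep the backslashes exactly as typed.
first  = '"\"Some text'
second = '"\"Some \"text'
third  = 'a\\\\"'   # a, two backslashes, quote: the quote is NOT escaped

p count_unescaped_quotes(first)    # 1
p count_unescaped_quotes(second)   # 1
p count_unescaped_quotes(third)    # 1
p count_unescaped_quotes('""')     # 2
```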

Resources