Stripping non-alphanumeric chars but leaving spaces in Ruby - ruby

Trying to change this:
"The basketball-player is great! (Kobe Bryant)"
into this:
"the basketball player is great kobe bryant"
Want to downcase and remove all punctuation but leave spaces...
Tried string.downcase.gsub(/[^a-z]/, '') but it removes the spaces

You can simply add \s (whitespace)
string.downcase.gsub(/[^a-z0-9\s]/i, '')
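A quick check of this answer, using the question's string plus a made-up string with digits (the second sample is an assumption for illustration). Digits and whitespace survive, while the hyphen is deleted outright rather than turned into a space:

```ruby
sample = "The basketball-player is great! (Kobe Bryant)"
with_digits = "Kobe Bryant wore #24 and #8!" # hypothetical sample

# Keep letters (case-insensitive via /i), digits and any whitespace
cleaned = sample.downcase.gsub(/[^a-z0-9\s]/i, '')
cleaned_digits = with_digits.downcase.gsub(/[^a-z0-9\s]/i, '')

p cleaned        # => "the basketballplayer is great kobe bryant"
p cleaned_digits # => "kobe bryant wore 24 and 8"
```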

If you want to catch non-latin characters, too:
str = "The basketball-player is great! (Kobe Bryant) (ひらがな)"
str.downcase.gsub(/[^[:word:]\s]/, '')
#=> "the basketballplayer is great kobe bryant ひらがな"
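The difference also shows up with accented Latin letters. A small sketch with a made-up sample string, assuming Ruby 2.0 or later (where source files are UTF-8 by default): the ASCII-only class treats the accented letters as junk, while [[:word:]] keeps them.

```ruby
dessert = "Crème Brûlée!" # hypothetical sample

# ASCII-only class: accented letters are removed along with the "!"
ascii_only = dessert.downcase.gsub(/[^a-z0-9\s]/, '')

# [[:word:]] is Unicode-aware for UTF-8 strings, so accented letters survive
unicode_aware = dessert.downcase.gsub(/[^[:word:]\s]/, '')

p ascii_only    # => "crme brle"
p unicode_aware # => "crème brûlée"
```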

Some fine solutions, but simplest is usually best:
string.downcase.gsub /\W+/, ' '
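Running this on the question's string shows one side effect worth knowing: because the closing parenthesis is also a run of non-word characters, it turns into a trailing space. Chaining strip (an addition, not part of the answer above) removes it:

```ruby
sample = "The basketball-player is great! (Kobe Bryant)"

# Each run of non-word characters becomes a single space...
spaced = sample.downcase.gsub(/\W+/, ' ')
p spaced # => "the basketball player is great kobe bryant "

# ...so the final ")" leaves a trailing space; strip removes it
trimmed = spaced.strip
p trimmed # => "the basketball player is great kobe bryant"
```

Since \w includes the underscore, underscores are kept by this approach.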

All the other answers strip out numbers as well. That works for the example given but doesn't really answer the question, which is how to strip out non-alphanumeric characters.
string.downcase.gsub(/[^\w\s]/, '')
Note this will not strip out underscores. If you need that then:
string.downcase.gsub(/[^a-zA-Z\s\d]/, '')
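To see the underscore difference side by side, here is a sketch on a made-up string (the sample is an assumption, chosen to contain an underscore and a digit):

```ruby
sample = "snake_case_name is 2 cool!" # hypothetical sample

# \w covers letters, digits and underscore, so underscores are kept
keep_underscores = sample.downcase.gsub(/[^\w\s]/, '')

# Spelling out letters and digits explicitly drops underscores too
drop_underscores = sample.downcase.gsub(/[^a-zA-Z\s\d]/, '')

p keep_underscores # => "snake_case_name is 2 cool"
p drop_underscores # => "snakecasename is 2 cool"
```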

a.downcase.gsub(/[^a-z ]/, "")
Note the whitespace I have added after a-z.
Also, if you want to keep all whitespace (not only spaces), use \s as proposed by gmalette.
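The difference between a literal space and \s only matters for tabs and newlines. A sketch with a made-up string containing both:

```ruby
sample = "Tab\tHere\nnew line!" # hypothetical sample

# A literal space in the class keeps only ordinary spaces
spaces_only = sample.downcase.gsub(/[^a-z ]/, "")

# \s also keeps tabs and newlines
any_whitespace = sample.downcase.gsub(/[^a-z\s]/, "")

p spaces_only    # => "tabherenew line"
p any_whitespace # => "tab\there\nnew line"
```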

Most of the previous answers make basketball-player into basketballplayer or remove numbers entirely, which is not exactly what is required.
The following code does exactly what you asked:
text.downcase
  .gsub(/[^[:word:]\s]/, ' ') # Replace each non-alphanumeric char with a space
  .squeeze(' ')               # Collapse runs of spaces into a single space
  .strip                      # Drop leading and trailing spaces
Hope this helps someone!

Related

Ruby regex and special characters like dash (—) and »

I'm trying to replace all punctuation and the like in some text with just a space. So I have the line
text = "—Bonne chance Harry murmura t il »"
How can I remove the dash and the »? I tried
text.gsub( /»|—/, ' ')
which gives an error, not surprisingly. I'm new to Ruby and just trying to get the hang of things by writing a script to pull all the words out of a chapter of a book. I figure I'd just remove the punctuation and symbols and then use text.split. Any help would be appreciated; I couldn't find much.
It turns out the problem had to do with the utf-8 encoding. Adding
# encoding: utf-8
solved my issues, and what @Andrewlton said works great.
This should properly substitute in the way you were trying to do it; just add brackets and remove the pipe:
text.gsub(/[»—]/, ' ')
The Unicode punctuation property also works:
text.gsub(/\p{P}/, ' ')
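Both substitutions can be checked together, followed by the split the asker planned. This sketch assumes Ruby 2.0 or later, where source files are read as UTF-8 by default, so no magic comment is needed. The em dash is in Unicode category Pd and » is in Pf, so \p{P} catches both:

```ruby
text = "—Bonne chance Harry murmura t il »"

# Character class listing the two offending characters
by_class = text.gsub(/[»—]/, ' ')

# Unicode general category P (all punctuation)
by_property = text.gsub(/\p{P}/, ' ')

# String#split with no argument splits on runs of whitespace
# and ignores leading whitespace
words = by_property.split
p words # => ["Bonne", "chance", "Harry", "murmura", "t", "il"]
```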
You should be able to use regexp pretty universally, coming from whatever language you know. Hope this helps!

Regex to match string excluding first character

I'm writing Ruby and need some help with regex; I'm a real beginner with regular expressions.
I have a string like this
/hello/world
I would like to #gsub this string to change the second slash to %2F.
The challenge for me is to ignore the first slash and change only the second one.
I tried this one
[^/]/
but it matches o/ instead of just the slash in
/hello/world
Please, help me. Thanks!!
You can simply capture the character before the slash in a group and use that in the replacement, for example:
"/hello/world".gsub(/([^\/])\//, '\1%2F') #=> "/hello%2Fworld"
Or if you just want to match any / that appears after the first character, you can simplify this to:
"/hello/world".gsub(/(.)\//, '\1%2F') #=> "/hello%2Fworld"
Or like this:
"/hello/world".gsub(/(?<!^)\//, '%2F') #=> "/hello%2Fworld"
And now for an uglier, regexless alternative:
"/" + "/hello/world".split("/").tap(&:shift).join("%2F") #=> "/hello%2Fworld"
I'll see myself out.
You can use a subpattern in () (a capture group) to grab the substring:
/^\/(.*)$/
or
/^.(.*)$/
This pattern excludes the first character. Then replace / within that substring:
(?!^\/)\/
http://rubular.com/r/IRWptAJdLs is a working example.
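The answer above gives the patterns without the surrounding Ruby. One way to wire them up is sketched below (the variable names are illustrative). The two-step version and the single negative-lookahead pass should agree on this input:

```ruby
path = "/hello/world"

# Step 1: capture everything after the leading slash
rest = path[/^\/(.*)$/, 1] # => "hello/world"

# Step 2: encode the remaining slashes, then restore the leading one
two_step = "/" + rest.gsub("/", "%2F")

# Alternative: one pass; (?!^\/) rejects only the slash at the start of the line
one_pass = path.gsub(/(?!^\/)\//, '%2F')

p two_step # => "/hello%2Fworld"
p one_pass # => "/hello%2Fworld"
```

Note that ^ in Ruby anchors at the start of any line, not just the start of the string; \A is the string-start anchor if the input could contain newlines.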
change the second / to %2F:
'/hello/world'.sub /(\/.*?)\//, '\1%2F'
#=> "/hello%2Fworld"
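The answers above differ once a path has more than two slashes: sub changes only the second slash, while the gsub variants encode every slash after the first character. A sketch on a made-up path:

```ruby
path = "/a/b/c" # hypothetical path with three slashes

# sub stops after the first match, so only the second slash changes
only_second = path.sub(/(\/.*?)\//, '\1%2F')

# gsub with (.) keeps going and encodes every slash after the first character
all_but_first = path.gsub(/(.)\//, '\1%2F')

p only_second   # => "/a%2Fb/c"
p all_but_first # => "/a%2Fb%2Fc"
```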

easy issue about Ruby

I would like to know what it does:
File.open(filename, "r").each_line do |line|
  if !line.strip.empty? && !line.start_with?(" ")
    # ....
    # .....
  end
end
Especially, what is strip? Thanks for your time!
strip removes all leading and trailing whitespace characters from a string. In essence, the code you pasted checks that the line contains something other than whitespace AND that its first character is not a space.
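A small sketch of both parts. A StringIO with made-up contents stands in for the file (an assumption for illustration), and start_with? is used, which is the core Ruby spelling:

```ruby
require 'stringio'

# strip returns a copy with leading/trailing whitespace removed
stripped = "  hello world \t\n".strip
p stripped # => "hello world"

# StringIO stands in for the file here (hypothetical contents)
io = StringIO.new("top-level\n  indented\n   \t\n\n\ttabbed\n")

# Same test as the question: not blank, and not starting with a literal space
kept = io.each_line.select { |line| !line.strip.empty? && !line.start_with?(" ") }
p kept # => ["top-level\n", "\ttabbed\n"]
```

Only a literal space is checked, so a tab-indented line still passes the filter.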

Ruby gsub issues

I have a piece of text that resembles the following:
==EXCLUDE
#lots of lines of text
==EXCLUDE
#this is what I actually want
And so I was trying to remove the unwanted bit by doing:
str.gsub!(/==EX.*?==EXCLUDE/, '')
However, it's not working. When I tried to remove the \n chars first, it worked like a dream. The issue is that I can't actually remove the \n characters. How can I do a substitution like this while leaving newlines in place?
By default, the . does not match line break chars. If you enable the m modifier in Ruby (in other languages, this is the s modifier), it should work:
str.gsub!(/==EX.*?==EXCLUDE/m, '')
Here's a live demo on Rubular: http://rubular.com/r/YxLSB1Iq95
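The effect of /m can also be checked locally. The sample text below is made up to mirror the question's layout:

```ruby
str = "==EXCLUDE\nlots of lines of text\nmore text\n==EXCLUDE\nthis is what I actually want\n"

# Without /m, . stops at "\n", so the pattern never spans both markers
without_m = str.gsub(/==EX.*?==EXCLUDE/, '')

# With /m, . also matches "\n"; the lazy .*? stops at the first closing marker
with_m = str.gsub(/==EX.*?==EXCLUDE/m, '')

p without_m == str # => true
p with_m           # => "\nthis is what I actually want\n"
```

Newlines outside the matched block are left in place.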
Try str.gsub!(/==EX.*?==EXCLUDE/m, '')
That should let . match across newlines.

Ruby RegEx problem text.gsub[^\W-], '') fails

I'm trying to learn RegEx in Ruby, based on what I'm reading in "The Rails Way". But, even this simple example has me stumped. I can't tell if it is a typo or not:
text.gsub(/\s/, "-").gsub([^\W-], '').downcase
It seems to me that this would replace all spaces with -, then anywhere a string starts with a non letter or number followed by a dash, replace that with ''. But, using irb, it fails first on ^:
syntax error, unexpected '^', expecting ']'
If I take out the ^, it fails again on the W.
>> text = "I love spaces"
=> "I love spaces"
>> text.gsub(/\s/, "-").gsub(/[^\W-]/, '').downcase
=> "--"
You're missing the // around the regex.
Although this makes a little more sense :-)
>> text.gsub(/\s/, "-").gsub(/([^\W-])/, '\1').downcase
=> "i-love-spaces"
And this is probably what is meant
>> text.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
=> "i-love-spaces"
\W means "a non-word character"
\w means "a word character" (a letter, digit, or underscore)
The // delimiters create a Regexp object:
/[^\W-]/.class
=> Regexp
Step 1: Add this to your bookmarks. Whenever I need to look up regexes, it's my first stop
Step 2: Let's walk through your code
text.gsub(/\s/, "-")
You're calling the gsub function, and giving it 2 parameters.
The first parameter is /\s/, which is Ruby for "create a new regexp containing \s" (the // are like special quotes for regexes).
The second parameter is the string "-".
This will therefore replace all whitespace characters with hyphens. So far, so good.
.gsub([^\W-], '').downcase
Next you call gsub again, passing it 2 parameters.
The first parameter is [^\W-]. Because we didn't wrap it in forward slashes, Ruby will literally try to run that code: [] creates an array, then it tries to put ^\W- into the array, which is not valid code, so it breaks.
Changing it to /[^\W-]/ gives us a valid regex.
Looking at the regex, [^...] says 'match any character NOT in this group'. The group contains \W (which means non-word character) and -, so the regex matches any character that is neither a non-word character nor a hyphen; in other words, any word character.
As the second thing you pass to gsub is an empty string, it ends up deleting all the word characters, which is why irb returns "--". To strip out the non-word characters and keep the hyphens, use /[^\w-]/ instead.
.downcase
Which just converts the string to lower case.
Hope this helps :-)
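The negated classes are easy to misread, so here is a sketch that uses scan to list what each one matches on a made-up string:

```ruby
sample = "I-love spaces!" # hypothetical sample

# [^\W-] = not a non-word char and not a hyphen = any word character
word_chars = sample.scan(/[^\W-]/).join
p word_chars # => "Ilovespaces"

# [^\w-] = not a word char and not a hyphen = the characters to delete
junk = sample.scan(/[^\w-]/)
p junk # => [" ", "!"]

# So the corrected chain deletes with [^\w-], not [^\W-]
cleaned = "I love spaces!".gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
p cleaned # => "i-love-spaces"
```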
You forgot the slashes. It should be /[^\W-]/
Well, .gsub(/[^\W-]/, '') says "replace anything that is neither a non-word character nor a - with nothing", so it deletes the word characters.
You probably want
>> text.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase
=> "i-love-spaces"
Lower case \w (\W is just the opposite)
The slashes are to say that the thing between them is a regular expression, much like quotes say the thing between them is a string.
