Replace all characters other than english letters and numbers to underscore - ruby

I have a string, and I would like to replace all special characters with underscores.
In other words, I just want 26 english letters (lower and upper cases) and 0-9 and the "_" character.
Also note that there are the non-english characters and they need to be replaced with "_" as well.
What is the most elegant way to do this in Ruby?

It sounds like you want to replace all non-word characters with underscores. In Ruby, \w matches only ASCII letters, digits, and the underscore, so non-English letters are replaced too:
result = subject.gsub(/[^\w]/, '_')
Are you okay with this also replacing newlines and other whitespace characters?
If not, change it to
result = subject.gsub(/[^\w\s]/, '_')
Regex explanation
[^\w\s] # any character except: word characters (a-z, A-Z, 0-9, _) and whitespace (\n, \r, \t, \f, and " ")
Note
As @CarySwoveland mentions, [^\w] can also be written with the shorthand \W.
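To make the difference between the two patterns concrete, here is a minimal sketch. The helper names and the sample string are made up for illustration; the regexes are the ones from the answer above.

```ruby
# Hypothetical helper names; only the regexes come from the answer.

# Replace every character that is not an ASCII letter, digit, or underscore.
def underscore_non_word(text)
  text.gsub(/\W/, '_')
end

# Same replacement, but whitespace is left untouched.
def underscore_non_word_keep_space(text)
  text.gsub(/[^\w\s]/, '_')
end

sample = "h\u00e9llo w\u00f6rld!" # "héllo wörld!"
puts underscore_non_word(sample)            # h_llo_w_rld_
puts underscore_non_word_keep_space(sample) # h_llo w_rld_
```

Both helpers use gsub rather than gsub!, so the original string is not modified.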

Related

Replacing all but alphabetic characters with spaces in python, in any language

The code
phrase = "".join([c if c.isalpha() else " " for c in phrase])
substitutes every non-alphabetic character with a space. It works very well with strings made up of Western-language characters.
But giving it the value:
phrase = u'इसका स्वामित्व और नियंत्रण किया। इसके'
the result is u'इसक स व म त व और न य त रण क य इसक ', while it shouldn't change, since the string is only made of alphabetic characters and spaces.
I think the reason is that some characters are surrogate pairs.
Is it a bug in Python's isalpha() method?
Or, if not, how can I deal properly with characters represented by surrogate pairs?
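One likely explanation, which the question does not state: the characters that disappear are Devanagari vowel signs and similar combining marks. Unicode classifies these as marks (category M) rather than letters (category L), so a letters-only test rejects them even though they are not surrogate pairs. The sketch below reproduces the effect in Ruby, the language used elsewhere on this page, and shows one possible fix. The helper names are hypothetical, and the sample is only the first word of the question's string.

```ruby
# Hypothetical helpers illustrating letters (\p{L}) versus marks (\p{M}).

# Keep only letters and whitespace; everything else becomes a space.
def keep_letters(text)
  text.gsub(/[^\p{L}\s]/, ' ')
end

# Also keep combining marks, which carry the Devanagari vowel signs.
def keep_letters_and_marks(text)
  text.gsub(/[^\p{L}\p{M}\s]/, ' ')
end

# U+0907 U+0938 U+0915 are letters; U+093E is a vowel sign (a mark).
word = "\u0907\u0938\u0915\u093E"

puts keep_letters(word) == word           # false: the vowel sign was replaced
puts keep_letters_and_marks(word) == word # true: the word is unchanged
```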

Username Regular Expression

I need the username to be two or more characters of a-z and 0-9, all lowercase. This is the current regex I am using:
USER_REGEX = /\A[a-z0-9][-a-z0-9]{1,19}\z/i
With this regex, users are able to use uppercase characters in their username. How do I modify the current regex to prevent that?
The regular expression to filter for two to twenty lower-case characters or digits is
/^[a-z0-9]{2,20}$/
which means:
^ at the front of input
a-z accept lower-case 'a' through 'z'
0-9 accept '0' through '9'
{2,20} accept 2 to 20 elements from the preceding [] block
$ until the end of input
You can make a regular expression case-insensitive with a trailing i, as in your example; that appears to be the root of the problem. That said, I don't know Ruby's peculiarities with respect to regular expressions.
If you must keep the regex, remove the "i" from the end. Change
USER_REGEX = /\A[a-z0-9][-a-z0-9]{1,19}\z/i
to
USER_REGEX = /\A[a-z0-9][-a-z0-9]{1,19}\z/
The "i" flag makes the regex case-insensitive, but you want it to be case-sensitive and match only lowercase letters.
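A short sketch of the case-sensitive version in use. The valid_username? helper is a hypothetical name, not part of the original code. Note that the pattern keeps \A and \z: in Ruby, ^ and $ match at line boundaries, so a pattern anchored with them could accept a multi-line string.

```ruby
# The regex from the question with the trailing "i" flag removed.
USER_REGEX = /\A[a-z0-9][-a-z0-9]{1,19}\z/

# Hypothetical helper: true when the whole name matches the pattern.
def valid_username?(name)
  USER_REGEX.match?(name)
end

puts valid_username?("bob42")  # true
puts valid_username?("Bob42")  # false: uppercase is rejected
puts valid_username?("a")      # false: fewer than two characters
puts valid_username?("ab-cd")  # true: a hyphen is allowed after the first character
```

Regexp#match? needs Ruby 2.4 or later; on older versions, `!!(name =~ USER_REGEX)` is one possible substitute.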

regex any non-digit with exception

I've got strings like these:
+996999966966AA
-996999966966AA
I am using this code:
"+996999966966AA".gsub!(/\D/, "")
to get rid of every character except digits, but the + sign is also being stripped. How can my code retain the +?
Use:
[^+\d]
to match anything that isn't + or a digit.
You can also use \W, the "non-word character" class, which matches any character that is not a word character (alphanumeric or underscore). In the pattern below, the first capture group holds the sign and the digits:
(\W\d+)\w+
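The two suggestions behave differently for the "-" input, which a small sketch makes visible. The helper names are hypothetical. The sketch uses gsub rather than gsub! because gsub! returns nil when no substitution is made, which is easy to trip over.

```ruby
# Hypothetical helpers wrapping the two patterns suggested above.

# Remove everything that is neither "+" nor a digit.
def digits_with_plus(text)
  text.gsub(/[^+\d]/, '')
end

# Return the first capture group: one non-word character plus the digits after it.
def signed_prefix(text)
  text[/(\W\d+)\w+/, 1]
end

puts digits_with_plus("+996999966966AA") # +996999966966
puts digits_with_plus("-996999966966AA") # 996999966966
puts signed_prefix("+996999966966AA")    # +996999966966
puts signed_prefix("-996999966966AA")    # -996999966966
```

The character-class version keeps only "+", while the capture-group version keeps whatever non-word character precedes the digits, including "-".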

Regexp non alphanumerical but not German characters

I would like to remove all non-alphanumeric characters from a string, except spaces, hyphens, and some German characters.
Example
regexp = "mönchengladbach."
regexp.gsub(/[^0-9a-z \-]/i, '')
=> mnchengladbach
I need this:
=> mönchengladbach
It should also not replace other German characters such as:
ä ö ü ß
Thanks!
Edit:
It was just me not testing properly. The IRB did not accept special characters. This works for me:
regexp.gsub(/[^0-9a-z \-äüöß]/i, '')
To remove all that is not a letter or a space you can use this:
str.gsub(/[^\p{L}\s]+/, '')
This uses a negated character class: [^\p{L}\s] matches anything that is not a letter (in any language) or a whitespace character (space, tab, newline).
\p{L} is the Unicode character class for letters.
You can easily add other characters you want to preserve like -:
str.gsub(/[^\p{L}\s-]+/, '')
example script:
# encoding: UTF-8
str = "mönchengladbach."
str = str.gsub(/[^\p{L}\s]+/, '#')
puts str
I think you want:
/[^[:alnum:] -]/
Note that the i flag is not necessary, and there is no need to escape - when it is at the end of a character class.
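As a rough comparison of the two Unicode-aware approaches above, the sketch below keeps letters, digits, spaces, and hyphens. The helper names are hypothetical. Adding \p{N} to the first pattern is an assumption made here so that digits survive, as the question asks; the answer's own pattern keeps letters and whitespace only.

```ruby
# Hypothetical helpers; the second regex is taken from the answer above.

# Unicode properties: keep letters (\p{L}), numbers (\p{N}), spaces, hyphens.
def strip_with_property(text)
  text.gsub(/[^\p{L}\p{N} -]+/, '')
end

# POSIX bracket: [[:alnum:]] is Unicode-aware for UTF-8 strings in Ruby.
def strip_with_posix(text)
  text.gsub(/[^[:alnum:] -]/, '')
end

city = "m\u00f6nchengladbach." # "mönchengladbach."
puts strip_with_property(city) # mönchengladbach
puts strip_with_posix(city)    # mönchengladbach
```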

how to remove leading and trailing non-alphabetic characters in ruby

I want to remove any leading and trailing non-alphabetic characters from my string.
For example, given ":----- pt-br:-", I want "pt-br".
Thanks
result = subject.gsub(/\A[\d_\W]+|[\d_\W]+\Z/, '')
will remove non-letters from the start and end of the string.
\A and \Z anchor the regex at the start/end of the string (^/$ would also match after/before a newline which is probably not what you want - but that might not matter in this case);
[\d_\W]+ matches one or more digits, the underscore or anything else that is not an alphanumeric character, leaving only letters.
| is the alternation operator.
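In Ruby 1.9.1:
":----- pt-br:-".partition( /[a-zA-Z](...)[a-zA-Z]/ )[1]
partition searches for the pattern in the string and returns the part before it, the match, and the part after it. Note that (...) matches exactly three characters, so this pattern as written only fits a five-character result such as "pt-br".
Another option is to strip each end in its own pass:
result = subject.gsub(/^[^a-zA-Z]+/, '').gsub(/[^a-zA-Z]+$/, '')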
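The approaches above can be combined into a small sketch that works for results of any length. Both helper names are hypothetical, and the patterns are variations on the answers rather than the answerers' exact code. \A and \z are used so that only the ends of the whole string are trimmed, not the ends of each line.

```ruby
# Hypothetical helpers, shown as one possible generalization.

# Trim non-letters from both ends of the whole string in a single pass.
def trim_non_letters(text)
  text.gsub(/\A[^a-zA-Z]+|[^a-zA-Z]+\z/, '')
end

# Extract the span from the first letter to the last letter instead.
# Returns "" when the string contains no letters at all.
def letter_span(text)
  text[/[a-zA-Z](?:.*[a-zA-Z])?/m].to_s
end

puts trim_non_letters(":----- pt-br:-") # pt-br
puts letter_span(":----- pt-br:-")      # pt-br
puts trim_non_letters("42_abc_7")       # abc
```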

Resources