Currently my regular expression looks like this: /\A[a-zA-Z]{2,50}\z/i
Now I would like it to also accept -, whitespace, and ÄÖÜ.
I tried rubular.com but I'm a total regex noob.
You probably want to use Unicode properties and scripts. You could then write your regex as
\A[\p{L}\s-]{2,50}\z
\p{L} is a Unicode property that matches any letter in any language.
If you want to match ÄÖÜ, you probably also want ß; with Unicode properties you don't have to think about such things.
If you want to restrict the allowed letters a bit, you can use a Unicode script such as Latin:
\A[\p{Latin}\s-]{2,50}\z
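A minimal Ruby sketch of the two property-based patterns above. The helper name valid_name? and the sample names are illustrative assumptions, and Regexp#match? needs Ruby 2.4 or later.

```ruby
# Any letter vs. Latin-script letters only, plus whitespace and hyphen,
# 2 to 50 characters in total.
ANY_LETTER = /\A[\p{L}\s-]{2,50}\z/
LATIN_ONLY = /\A[\p{Latin}\s-]{2,50}\z/

# Hypothetical helper: true when the whole string fits the pattern.
def valid_name?(name, pattern)
  pattern.match?(name)
end

["Jürgen Groß", "Anne-Marie", "李小龍", "R2-D2", "A"].each do |name|
  puts "#{name}: any=#{valid_name?(name, ANY_LETTER)} latin=#{valid_name?(name, LATIN_ONLY)}"
end
```

The Han-script name passes the \p{L} pattern but not the \p{Latin} one, while ß is accepted by both without being listed explicitly.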
Just add them to the character class:
/\A[a-zA-Z\-\sÄÖÜ]{2,50}\z/i
Regex is an invaluable cross-language tool, and the sooner you learn it, the better off you will be. I suggest putting in the time to learn it.
In a regex, whitespace is represented by the shorthand character class \s.
Inside a character class, a hyphen/minus denotes a range, so escape it with a backslash (\-) or place it first or last in the class.
Ä, Ö and Ü are normal characters, so you can just add them as they are.
/\A[a-zA-Z\s\-ÄÖÜ]{2,50}\z/i
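A small Ruby check of the point about hyphens, as a sketch: inside a character class, an escaped hyphen and an unescaped hyphen placed last should behave the same. The constant names and sample strings are illustrative.

```ruby
# The answer's pattern, with the hyphen escaped.
ESCAPED  = /\A[a-zA-Z\s\-ÄÖÜ]{2,50}\z/i
# Same class with the hyphen moved to the end, where it is literal.
TRAILING = /\A[a-zA-Z\sÄÖÜ-]{2,50}\z/i

["Jan Ötzi", "Anne-Marie", "R2D2", "X"].each do |s|
  # Both patterns should agree on every sample.
  puts "#{s.inspect}: #{ESCAPED.match?(s)} / #{TRAILING.match?(s)}"
end
```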
I am trying to remove whitespace from a Japanese word.
input "かいしゃ（会社）"
output "かいしゃ(会社)"
There is no space here: the gap comes from the parentheses themselves. They are not regular ASCII parentheses but the "full-width" variety.
If you want to replace them with ASCII parentheses, you can do it like this:
compact_input = input.gsub("\uFF08", '(').gsub("\uFF09", ')')  # full-width ( and ) to ASCII
This might make your string look odd in Japanese, though (I don't know the language well enough to say).
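As a sketch of the same fix, assuming the input uses the full-width parentheses U+FF08 and U+FF09: String#tr swaps both brackets in one call, and Unicode NFKC normalization (String#unicode_normalize, Ruby 2.2+) is an alternative that also maps them to ASCII.

```ruby
input = "かいしゃ\uFF08会社\uFF09"   # full-width ( and )

# Option 1: translate just the two full-width brackets.
via_tr = input.tr("\uFF08\uFF09", "()")

# Option 2: NFKC compatibility normalization also turns full-width
# brackets into ASCII, but it rewrites other full-width and
# half-width forms in the string too.
via_nfkc = input.unicode_normalize(:nfkc)

puts via_tr    # => かいしゃ(会社)
puts via_nfkc  # => かいしゃ(会社)
```

tr is the narrower change; NFKC is handy when the text may contain other full-width punctuation or digits as well.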
I want to split a string in Java on any non-alphanumeric characters.
Currently I am doing it like this:
words = Str.split("\\W+");
However, I want to keep apostrophes (') in there. Is there a regular expression that preserves apostrophes but drops the rest of the junk? Thanks.
words = Str.split("[^\\w']+");
Just add it to the character class: \W is equivalent to [^\w], so you can add ' inside that negated class.
Do note, however, that \w also includes underscores. If you want to split on underscores as well, use [^a-zA-Z0-9'] instead.
For basic English characters, use:
words = Str.split("[^a-zA-Z0-9']+");
If you want to keep English words with accented characters (such as fiancé) intact, or handle languages that use non-English letters, go with:
words = Str.split("[^\\p{L}0-9']+");
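Ruby's String#split accepts the same character classes, so here is a sketch in Ruby (the sample sentence is made up) showing how the three patterns differ on underscores and accented letters. Ruby's \w, like Java's default \w, is ASCII-only.

```ruby
text = "don't stop, it's fiancé_day"

# \w is ASCII-only and includes "_", so é splits but _ is kept.
by_word  = text.split(/[^\w']+/)
# ASCII letters and digits only: both é and _ act as separators.
by_ascii = text.split(/[^a-zA-Z0-9']+/)
# Any Unicode letter: é stays inside the word, _ still separates.
by_letter = text.split(/[^\p{L}0-9']+/)

p by_word    # ["don't", "stop", "it's", "fianc", "_day"]
p by_ascii   # ["don't", "stop", "it's", "fianc", "day"]
p by_letter  # ["don't", "stop", "it's", "fiancé", "day"]
```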
I need a regexp for strings like "EWD-eb-AEW-97-QOW".
The general pattern is:
3 uppercase letters, hex, 3 uppercase letters, hex, 3 uppercase letters.
I use:
/[A-Z]{3}-\h-[A-Z]{3}-\h-[A-Z]{3}/
but it doesn't work. Can anyone help with it and explain why it doesn't work?
\h matches only a single hex digit, not a two-digit hex number. Use this regex instead:
/[A-Z]{3}-[A-F0-9]{2}-[A-Z]{3}-[A-F0-9]{2}-[A-Z]{3}/i
In addition to anubhava's answer: you can add a {2} quantifier to each \h in your own regex to achieve the same thing.
/[A-Z]{3}-\h{2}-[A-Z]{3}-\h{2}-[A-Z]{3}/i
I'd use something like:
/(?:[A-Z]{3}-[a-f0-9]{2}-){2}[A-Z]{3}/
https://www.regex101.com/r/aY5eF6/1
(?:[A-Z]{3}-[a-f0-9]{2}-)
groups three letters, '-', two hexadecimal digits, and '-'; the {2} repeats that group twice, and three more letters follow.
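A quick Ruby check of the patterns discussed here against the sample string. The anchored VALID_CODE variant (with \A and \z) is an addition, assuming you want to validate a whole string rather than find a code inside a longer one.

```ruby
sample = "EWD-eb-AEW-97-QOW"

original = /[A-Z]{3}-\h-[A-Z]{3}-\h-[A-Z]{3}/         # \h is one hex digit
fixed    = /[A-Z]{3}-\h{2}-[A-Z]{3}-\h{2}-[A-Z]{3}/i  # two hex digits each
grouped  = /(?:[A-Z]{3}-[a-f0-9]{2}-){2}[A-Z]{3}/     # repeated group

# Anchored version: rejects extra text before or after the code.
VALID_CODE = /\A(?:[A-Z]{3}-\h{2}-){2}[A-Z]{3}\z/

p original.match?(sample)             # false
p fixed.match?(sample)                # true
p grouped.match?(sample)              # true
p VALID_CODE.match?("xx#{sample}xx")  # false
```

Without anchors, all the working patterns also accept the code embedded in surrounding text.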
Regarding using Ruby's \h Regexp extension:
I'd be careful using special shorthands like \h in a mixed-language environment, or one that supports old versions of Ruby. For example, in Perl and PCRE \h means horizontal whitespace, not a hex digit. We use YAML to hold patterns shared among various languages, and something like this would open up a very hard-to-track bug. I'd recommend spelling out the class, e.g. [a-f0-9] (or [0-9a-fA-F] to match \h exactly), unless you KNOW you'll never run into that problem.
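To make the equivalence concrete, a sketch comparing Ruby's \h with explicit classes over all ASCII characters: \h is exactly [0-9a-fA-F], so the lowercase-only [a-f0-9] is narrower.

```ruby
ascii = (0..127).map(&:chr)

# Characters accepted by the shorthand and by explicit classes.
shorthand = ascii.grep(/\A\h\z/)
explicit  = ascii.grep(/\A[0-9a-fA-F]\z/)
lowercase = ascii.grep(/\A[a-f0-9]\z/)

p shorthand == explicit  # true
p shorthand - lowercase  # ["A", "B", "C", "D", "E", "F"]
```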
I have a display name field which I have to validate using a Ruby regex. We have to match letters from all languages, such as French, Arabic, Chinese, German, and Spanish, in addition to English letters, but not special characters like *()!##$%^&.... I am stuck on how to match those non-Latin characters.
There are two possibilities:
Create a regex with a negated character class containing every symbol you don't want to match:
if name =~ /\A[^*!#%\^]+\z/  # list every forbidden symbol; if the whole name matches, you are good
This solution may not be feasible, since there is a massive number of symbols you'd have to insert, even if you only included the most common ones.
Use Oniguruma (see also: Oniguruma for Ruby main). It supports Unicode and Unicode properties, so all letters can be matched using:
if name =~ /\A[\p{L}\p{M}]+\z/  # letters and combining marks only
You can see what these are all about here: Unicode Regular Expressions
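A sketch of the property-based approach in plain Ruby, where Oniguruma (Onigmo since Ruby 2.0) is the built-in engine, so no gem is needed. Allowing spaces between names and the helper name display_name_ok? are assumptions, not part of the question.

```ruby
# Letters (\p{L}) and combining marks (\p{M}) from any script, plus
# whitespace (assumed here, since display names often contain spaces).
DISPLAY_NAME = /\A[\p{L}\p{M}\s]+\z/

# Hypothetical helper; returns true or false.
def display_name_ok?(name)
  DISPLAY_NAME.match?(name)
end

["François", "محمد", "张伟", "Jose\u0301", "Bob!", "a*b"].each do |n|
  puts "#{n}: #{display_name_ok?(n)}"
end
```

\p{M} matters for decomposed text such as "Jose\u0301", where the accent is a separate combining character rather than part of the letter.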
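Starting from Ruby 1.9, the String and Regexp classes are Unicode-aware. Note, however, that in current Ruby versions \w matches only ASCII word characters ([a-zA-Z0-9_]); to match word characters in any script, use the POSIX class [[:word:]]:
"可口可樂!?!".gsub(/[[:word:]]/, 'Ha')
#=> "HaHaHaHa!?!"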
In Ruby newer than 1.9.1 (and possibly earlier versions), you can use \p{L} to match letters in all languages, without the oniguruma gem described in a previous answer.
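A quick comparison of the three classes in current Ruby, as a sketch: \w leaves the Chinese characters alone, while [[:word:]] and \p{L} both replace them.

```ruby
text = "可口可樂!?!"

ascii_word   = text.gsub(/\w/, "Ha")          # \w is [a-zA-Z0-9_] only
unicode_word = text.gsub(/[[:word:]]/, "Ha")  # letters, marks, numbers, _
letters_only = text.gsub(/\p{L}/, "Ha")       # letters in any script

p ascii_word    # "可口可樂!?!"
p unicode_word  # "HaHaHaHa!?!"
p letters_only  # "HaHaHaHa!?!"
```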
I am using Ruby on Rails 3.0.9 and I would like to validate a string that can contain only letters (in either case), blank spaces, and numbers.
More:
special characters are not allowed (e.g. !"£$%&/()=?^) except - and _;
accented characters are allowed (e.g. à, è, é, ò, ...);
The regex that I know from this question is ^[a-zA-Z\d\s]*$, but it rejects - and _ as well as accented characters.
So how should I improve the regex?
I wrote the ^(?:[^\W_]|\s)*$ answer in the question you referred to (which actually would have been different if I'd known you wanted to allow _ and -). Not being a Ruby guy myself, I didn't realize that Ruby defaults to not using Unicode for regex matching.
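Sorry for my lack of Ruby experience. What you want to do is use the u flag, which switches matching to Unicode (UTF-8) so accented characters are caught. (On Ruby 1.9 and later, \w stays ASCII-only even with the u flag, so there you need [[:word:]] in place of \w.) Here's the pattern you want:
/^[\w\s-]*$/u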
And here it is in action at Rubular. This should do the trick, I think.
The u flag works on my original answer as well, though that one isn't meant to allow _ or - characters.
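A plain-Ruby sketch of the validation, outside Rails and with a hypothetical method name. It uses [[:word:]] so accented letters match on current Ruby, and it anchors with \A and \z because ^ and $ match at every line break, so a newline would let forbidden characters through.

```ruby
# Word characters in any script, whitespace, and hyphen.
NAME_FORMAT = /\A[[:word:]\s-]*\z/
# The same class with line anchors, for comparison.
LINE_ANCHORED = /^[[:word:]\s-]*$/

def acceptable_name?(value)
  NAME_FORMAT.match?(value)
end

p acceptable_name?("Città di Castello")  # true
p acceptable_name?("foo_bar-baz 42")     # true
p acceptable_name?("ciao!")              # false
# A newline lets the ^...$ version accept forbidden characters:
p LINE_ANCHORED.match?("ok\n<script>")   # true
p acceptable_name?("ok\n<script>")       # false
```

Later Rails versions (4.0 onward) refuse ^ and $ in format validations unless you pass multiline: true, for this reason.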
Something like ^[\w\s\-]*$ should validate letters, digits, blank spaces, hyphens, and underscores.
Alternatively, validate only against the characters that are not allowed, in this case |, <, >, " and &:
^[^|<>\"&]*$
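The blacklist version as a quick Ruby check; anchoring with \A and \z instead of ^ and $ is a change from the answer, so that a newline cannot hide a forbidden character. The sample strings are made up.

```ruby
# Accept the string only if it contains none of | < > " &
NO_FORBIDDEN = /\A[^|<>"&]*\z/

p NO_FORBIDDEN.match?("O'Brien")      # true
p NO_FORBIDDEN.match?("Tom & Jerry")  # false
p NO_FORBIDDEN.match?("a < b")        # false
p NO_FORBIDDEN.match?("ok\n<x")       # false
```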