I'm trying to write a regular expression that matches only a single standalone letter, such as a,C,f,G, but NOT abc or de, for instance.
I tried [a-zA-z], but all of the above match.
What should I do in this case?
^[a-zA-Z]$
Add the ^ and $ anchors to limit the match to exactly one character.
or
(?:^|(?<=[^a-zA-Z]))[a-zA-Z](?=[^a-zA-Z]|$)
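A minimal Ruby sketch of both patterns, in case it helps. The method and constant names (single_letter?, standalone_letters, STANDALONE_LETTER) are made up for illustration. The whole-string check uses \A and \z rather than ^ and $, on the assumption that the input might contain newlines, since Ruby's ^ and $ match at line boundaries.

```ruby
# Whole-string check: true only when the input is exactly one ASCII letter.
# \A and \z anchor to the string itself; ^ and $ would anchor to each line.
def single_letter?(input)
  input.match?(/\A[a-zA-Z]\z/)
end

# Extraction: the lookaround pattern from the answer above, which finds
# letters that have no letter directly before or after them.
STANDALONE_LETTER = /(?:^|(?<=[^a-zA-Z]))[a-zA-Z](?=[^a-zA-Z]|$)/

def standalone_letters(text)
  text.scan(STANDALONE_LETTER)
end

p single_letter?('a')                             # => true
p single_letter?('abc')                           # => false
p standalone_letters('a,C,f,G, but, NOT abc de')  # => ["a", "C", "f", "G"]
```

The first form suits validating one field; the second suits picking letters out of a longer text.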
There are several ways to do this, depending on your content. This could work:
[^a-zA-Z][a-zA-Z][^a-zA-Z]
Or there's a regex token for that, the word boundary \b:
\b[a-zA-Z]\b
which is more useful since it allows matches at the start and end of a line.
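To see the difference between the two patterns, here is a small Ruby comparison on the sample text from the question. The helper names are only for illustration.

```ruby
# The negated classes consume the neighbouring characters, so a letter at the
# very start or end is missed and two matches cannot share one separator.
def consuming_matches(text)
  text.scan(/[^a-zA-Z][a-zA-Z][^a-zA-Z]/)
end

# \b is zero-width: it asserts a word boundary without consuming anything.
def boundary_matches(text)
  text.scan(/\b[a-zA-Z]\b/)
end

sample = 'a,C,f,G, but, NOT abc de'
p consuming_matches(sample)  # => [",C,", ",G,"]
p boundary_matches(sample)   # => ["a", "C", "f", "G"]
```

Note that \b treats digits and underscores as word characters too, so the a in a1 would not count as standalone with this pattern.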
Your regex [a-zA-z] matches not only letters but also matches [, ], \, ^, _ and `. Moreover, it has no anchors and thus will match both a and t in at.
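A quick way to check this yourself (the helper names are hypothetical): scan the six ASCII characters that sit between Z and a with both classes.

```ruby
# The six ASCII characters between 'Z' and 'a': [ \ ] ^ _ `
BETWEEN_Z_AND_A = "[\\]^_`"

# The class from the question, with the A-z typo.
def typo_class_hits(text)
  text.scan(/[a-zA-z]/)
end

# The intended class.
def letter_class_hits(text)
  text.scan(/[a-zA-Z]/)
end

p typo_class_hits(BETWEEN_Z_AND_A)    # => ["[", "\\", "]", "^", "_", "`"]
p letter_class_hits(BETWEEN_Z_AND_A)  # => []
p letter_class_hits('at')             # => ["a", "t"]  (no anchors, so both letters match)
```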
You can make use of the POSIX bracket expression alpha to match a single letter substring together with a word boundary \b:
puts 'a,C,f,G, but, NOT abc de'.scan(/\b[[:alpha:]]\b/)
See IDEONE demo
Output:
a
C
f
G
So, there are a number of regular expressions that match a particular group of characters, like the following:
/./ - Any character except a newline.
/./m - Any character (the m modifier enables multiline mode)
/\w/ - A word character ([a-zA-Z0-9_])
/\s/ - Any whitespace character
And in Ruby:
/[[:punct:]]/ - Punctuation character
/[[:space:]]/ - Whitespace character ([:blank:], newline, carriage return, etc.)
/[[:upper:]]/ - Uppercase alphabetical
So, here is my question: how do I get a regexp to match a group like this, but exclude one character?
Examples:
match all punctuation apart from the question mark
match all whitespace characters apart from the newline
match all words apart from "go"... etc
Thanks.
You can use character class subtraction.
Rexegg:
The syntax […&&[…]] allows you to use a logical AND on several character classes to ensure that a character is present in them all. Intersecting with a negated character, as in […&&[^…]] allows you to subtract that class from the original class.
Consider this code:
s = "./?!"
res = s.scan(/[[:punct:]&&[^!]]/)
puts res
Output is only ., / and ? since ! is excluded.
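The same intersection syntax covers the first two scenarios from the question. This is just a sketch; the method names are made up for illustration.

```ruby
# Whitespace except the newline: intersect [[:space:]] with "anything but \n".
def non_newline_whitespace(text)
  text.scan(/[[:space:]&&[^\n]]/)
end

# Punctuation except the question mark.
def punctuation_without_question_mark(text)
  text.scan(/[[:punct:]&&[^?]]/)
end

p non_newline_whitespace("a b\tc\nd")                 # => [" ", "\t"]
p punctuation_without_question_mark('Hi? Yes, ok!')   # => [",", "!"]
```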
Restricting with a lookahead (as sawa has written just now) is also possible, but is not required when you have this subtraction supported. When you need to restrict some longer values (more than 1 character) a lookahead is required.
In many cases, a lookahead must be anchored to a word boundary to return correct results. As an example of using a lookahead to restrict punctuation (single character matching generic pattern):
/(?:(?!!)[[:punct:]])+/
This will match one or more punctuation symbols, except !.
The puts "./?!".scan(/(?:(?!!)[[:punct:]])+/) code will output ./? (see demo)
Use character class subtraction whenever you need to exclude single characters; it is more efficient than using lookaheads.
So, the 3rd scenario regex must look like:
/\b(?!go\b)\w+\b/
        ^^
If you write /(?!\bgo\b)\b\w+\b/, the regex engine will check each position in the input string. If you use a \b at the beginning, only word boundary positions will be checked, and the pattern will yield better performance. Also note that the ^^ \b is very important since it makes the regex engine check for the whole word go. If you remove it, it will only restrict to the words that do not start with go.
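Here is a small illustration of why the \b inside the lookahead matters. The sample sentence and method names are my own.

```ruby
# With \b after "go", only the exact word "go" is excluded.
def words_except_go(text)
  text.scan(/\b(?!go\b)\w+\b/)
end

# Without it, every word that merely starts with "go" is excluded as well.
def words_not_starting_with_go(text)
  text.scan(/\b(?!go)\w+\b/)
end

sample = 'go gone ago going go-go'
p words_except_go(sample)             # => ["gone", "ago", "going"]
p words_not_starting_with_go(sample)  # => ["ago"]
```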
Put what you want to exclude inside a negative lookahead in front of the match. For example,
To match all punctuation apart from the question mark,
/(?!\?)[[:punct:]]/
To match all words apart from "go",
/(?!\bgo\b)\b\w+\b/
This is a general approach that is sometimes useful:
a = []
".?!,:;-".scan(/[[:punct:]]/) { |s| a << s unless s == '?' }
a #=> [".", "!", ",", ":", ";", "-"]
The content of the block is limited only by your imagination.
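If you prefer to keep the filtering separate from the scan, the same idea can be sketched with reject. The punctuation_except helper is hypothetical, not something from the standard library.

```ruby
# Match the whole group first, then drop the unwanted characters afterwards.
def punctuation_except(text, excluded)
  text.scan(/[[:punct:]]/).reject { |ch| excluded.include?(ch) }
end

p punctuation_except('.?!,:;-', ['?'])  # => [".", "!", ",", ":", ";", "-"]
```

This trades a little efficiency for a pattern that stays simple and an exclusion list that can be built at runtime.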
So I have a string that looks like this:
#jackie#test.com, #mike#test.com
What I want to do is remove the # that comes before each email in this comma-separated list. The issue I keep running into is that if I try a regular \A anchor like so /[\A#]+/, it finds all the instances of # in that string, including the crucial # in the middle.
The same thing happens if I do /[\s#]+/. I can't figure out how to just look at the beginning of each string, where each string is a complete email address.
Edit 1
Note that all I need is the regex, I already have the rest of the stuff I need to do what I want. Specifically, I am achieving everything else like this:
str.gsub(/#/, '').split(',').map(&:strip)
Where str is my string.
All I am looking for is the regex portion for my gsub.
You may use the below negative lookbehind based regex.
str.gsub(/(?<!\S)#/, '').split(',').map(&:strip)
(?<!\S) is a negative lookbehind asserting that the character we are about to match is not preceded by a non-space character. So this matches a # at the start of the string or a # that directly follows a whitespace character.
The difference between my answer and hwnd's str.gsub(/\B#/, '') is that mine won't match the # in :#, but hwnd's does. \B matches between two word characters or between two non-word characters.
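A side-by-side sketch of the two patterns, with method names invented for the comparison. On the sample list they agree; they differ only when a # follows another non-word character such as a colon.

```ruby
# Remove a # only at the start of the string or right after whitespace.
def strip_hash_lookbehind(str)
  str.gsub(/(?<!\S)#/, '')
end

# Remove a # wherever there is no word boundary directly in front of it.
def strip_hash_non_boundary(str)
  str.gsub(/\B#/, '')
end

list = '#jackie#test.com, #mike#test.com'
p strip_hash_lookbehind(list)            # => "jackie#test.com, mike#test.com"
p strip_hash_non_boundary(list)          # => "jackie#test.com, mike#test.com"

p strip_hash_lookbehind('note:#tag')     # => "note:#tag"
p strip_hash_non_boundary('note:#tag')   # => "note:tag"

p strip_hash_lookbehind(list).split(',').map(&:strip)
# => ["jackie#test.com", "mike#test.com"]
```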
Here is one solution
str = "#jackie#test.com, #mike#test.com"
p str.split(/,[ ]+/).map{ |i| i.gsub(/^#/, '')}
Output
["jackie#test.com", "mike#test.com"]
I am trying to make a regular expression that validates letters, numbers and spaces ONLY. I don't want any special characters accepted (i.e. # )(*&^%$#!,)
I have been trying several things but nothing has given me letters(uppercase & lowercase), numbers, and spaces.
So it should accept something like this...
John Stevens 12
james stevens
willcall12
or
12cell space
but not this
12cell space!
John#Stevens
james 12 Fall#
I have tried the following
^[a-zA-Z0-9]+$
[\w _]+
^[\w_ ]+$
but they either allow special characters or don't allow spaces. This is for a Ruby validation.
You almost got it right. You could use this:
/\A[a-z0-9\s]+\Z/i
\s matches any whitespace character, including tab and newline. You could use a literal space within the square brackets instead if you need to match only the space character.
/i at the end means match is not case sensitive.
Take a look at Rubular for testing your regexes.
EDIT: As pointed out by Jesus Castello, for some scenarios one should use \A and \Z instead of ^ and $ to denote string boundaries, because in Ruby ^ and $ match at the start and end of each line. See Difference between \A \Z and ^ $ in Ruby regular expressions for the explanation.
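To make the anchor difference concrete, here is a sketch with three variants of the pattern. The constant names are mine. One detail worth knowing: \Z still tolerates a single trailing newline, while lowercase \z does not, so \z is the stricter choice for validation.

```ruby
LINE_ANCHORED   = /^[a-z0-9 ]+$/i    # ^ and $ match per line
LENIENT_END     = /\A[a-z0-9 ]+\Z/i  # \Z also matches before one trailing "\n"
STRING_ANCHORED = /\A[a-z0-9 ]+\z/i  # \z matches only at the very end

multi_line = "John Stevens 12\n<script>"

p LINE_ANCHORED.match?(multi_line)        # => true  (the first line alone passes)
p STRING_ANCHORED.match?(multi_line)      # => false

p LENIENT_END.match?("willcall12\n")      # => true
p STRING_ANCHORED.match?("willcall12\n")  # => false
p STRING_ANCHORED.match?('12cell space')  # => true
```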
Here is a working example that will print matching results:
VALIDATION = /\A[a-zA-Z0-9 ]+\Z/
words = ["willcall12", "John Stevens 12", "12cell space!", "John#Stevens"]
words.each do |word|
  m = word.match(VALIDATION)
  puts m[0] if m
end
I can recommend this article if you would like to learn more about regular expressions.
Is there any way to scan only if there is nothing before what I am scanning for?
For example, I have a post and I am scanning for a forward slash and what follows it, but I do not want to match a forward slash if it is not the beginning character.
I want to scan for /this but I do not want to scan for this/this or http://this.com.
The regular expression I am currently using is..
/\/(\w+)/
I am using this with gsub to link each /forwardslash.
I think what you are asking for is to only match words that begin with '/', not strings or lines beginning with '/'. If that is true, I believe the following regex will work: %r{(?:^|\s+)/(\w+)}
For example:
"/foo /this this/that http://this".scan %r{(?:^|\s+)/(\w+)} # => [["foo"], ["this"]]
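Since the question mentions gsub, here is a rough sketch of the linking step. It swaps the (?:^|\s+) group for a negative lookbehind, (?<!\S), so the leading whitespace is not consumed and does not need to be put back in the replacement. The link markup and the link_slash_words name are assumptions for illustration, not anything from the question.

```ruby
# Wrap every /word that starts the string or follows whitespace in a link.
# (?<!\S) means "not preceded by a non-whitespace character".
def link_slash_words(text)
  text.gsub(%r{(?<!\S)/(\w+)}, '<a href="/\1">/\1</a>')
end

puts link_slash_words('/foo see this/that http://this.com /bar')
# <a href="/foo">/foo</a> see this/that http://this.com <a href="/bar">/bar</a>
```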
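The caret (^) character means "beginning of line" and a dollar sign ($) means "end of line." (In Ruby, the beginning and end of the whole string are \A and \z.)
So
/^\/(\w+)/
...will get you what you want if the slash must be the first character: it only matches at the beginning of a line. Use \A instead of ^ if it must be the beginning of the string.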
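One thing to watch in Ruby: ^ matches at the start of every line, not only at the start of the string. A short sketch of the difference, with helper names made up for the example:

```ruby
# ^ matches at the start of any line in the string.
def slash_word_at_line_start(text)
  text[/^\/(\w+)/, 1]
end

# \A matches only at the very start of the string.
def slash_word_at_string_start(text)
  text[/\A\/(\w+)/, 1]
end

post = "intro\n/second"
p slash_word_at_line_start(post)          # => "second"
p slash_word_at_string_start(post)        # => nil
p slash_word_at_string_start('/this up')  # => "this"
```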
First thing: since your regex contains slashes, change the delimiter to something else; then you won't have to escape the forward slashes and it will be easier to read.
Secondly, if you want to replace the slash as well then include it in the capture.
On to the regex.
...if it is not the beginning character...
...of a line:
!^(/\w+)!
if it is not the beginning character...
...of a word:
!\s(/\w+)!
but that won't match if it's at the very beginning of a line. For that you'll need something a lot more complex, so I'd just run both the regexes here instead of creating that monster.
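Running both regexes could look roughly like this in Ruby, where the ! delimiters are written as %r!...!. The slash_words name is hypothetical, and note that the result lists the line-start matches first rather than in document order.

```ruby
# Collect /word tokens found at the start of a line, then those found
# right after a whitespace character, using the two patterns above.
def slash_words(text)
  at_line_start = text.scan(%r!^(/\w+)!).flatten
  after_space   = text.scan(%r!\s(/\w+)!).flatten
  at_line_start + after_space
end

p slash_words('/foo /this this/that http://this.com')  # => ["/foo", "/this"]
```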
Why does this snippet:
'He said "Hello"' =~ /(\w)\1/
match "ll"? I thought that the \w part matches "H", and hence \1 refers to "H", so nothing should be matched. Why this result?
I thought that the \w part matches "H"
\w matches any alphanumeric character (and underscore). It does match H, but that's not terribly interesting, since the regular expression then requires the same character to appear again immediately. H can't satisfy that in your text (it doesn't appear twice consecutively), and neither can any of the other characters except l. So the regular expression matches ll.
You're thinking of /^(\w)\1/. The caret symbol specifies that the match must start at the beginning of the line. Without that, the match can start anywhere in the string (it will find the first match).
And you're right, nothing was matched at that position. The regex engine then moved further along the string and found a match, which it returned to you.
\w, of course, matches any word character, not just 'H'.
The point is, "\1" is a backreference: it matches the same text that the "(\w)" group has just captured. Only the letter "l" is doubled, so only "ll" matches your regex.
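You can inspect what the engine actually found with a MatchData object. This is only an illustrative sketch; first_doubled is not a built-in.

```ruby
SENTENCE = 'He said "Hello"'

# Returns [whole match, captured group, start index] or nil if nothing matched.
def first_doubled(text, pattern = /(\w)\1/)
  m = text.match(pattern)
  m && [m[0], m[1], m.begin(0)]
end

p first_doubled(SENTENCE)              # => ["ll", "l", 11]
p first_doubled(SENTENCE, /^(\w)\1/)   # => nil  (anchored: "H" is not followed by "H")
```

The engine tries "H" at index 0, fails because the next character is "e", and keeps advancing until the captured character is followed by itself.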
A nice page for toying around with ruby and regular expressions is Rubular