PregMatch . space and #? - preg-match

Can someone tell me, what's wrong in this code:
if ((!preg_match("[a-zA-Z0-9 \.\s]", $username)) || (!preg_match("[a-zA-Z0-9 \.\s]", $password)));
exit("result_message=Error: invalid characters");
}
??

Several things are wrong. I assume that the code you are looking for is:
if (preg_match('~[^a-z0-9\h.]~i', $username) || preg_match('~[^a-z0-9\h.]~i', $password))
exit('result_message=Error: invalid characters');
What is wrong in your code?
the pattern [a-zA-Z0-9 \.\s] is false for multiple reasons:
a regex pattern in PHP must by enclosed by delimiters, the most used is /, but as you can see, I have choosen ~. Example: /[a-zA-Z \.\s]/
the character class is strange because it contains a space and the character class \s that contains the space too. IMO, to check a username or a password, you only need the space and why not the tab, but not the carriage return or the line feed character! You can remove \s and let the space, or you can use the \h character class that matches all horizontal white spaces. /[a-zA-Z\h\.]/ (if you don't want to allow tabs, replace the \h by a space)
the dot has no special meaning inside a character class and doesn't need to be escaped: /[a-zA-Z\h.]/
you are trying to verify a whole string, but your pattern matches a single character! In other words, the pattern checks only if the string contains at least an alnum, a space or a dot. If you want to check all the string you must use a quantifier + and anchors for the start ^ and the end $ of the string. Example ∕^[a-zA-Z0-9\h.]+$/
in fine, you can shorten the character class by using the case-insensitive modifier i: /^[a-z0-9\h.]+$/i
But there is a faster way, instead of negate with ! your preg_match assertion and test if all characters are in the character range you want, you can only test if there is one character you don't want in the string. To do this you only need to negate the character class by inserting a ^ at the first place:
preg_match('/[^a-z0-9\h.]/i', ...
(Note that the ^ has a different meaning inside and outside a character class. If ^ isn't at the begining of a character class, it is a simple literal character.)

Related

Regex to select all the commas from string that do not have any white space around them

I want to select all the commas in a string that do not have any white space around. Suppose I have this string:
"He,she, They"
I want to select only the comma between he and she. I tried this in rubular and came up with this regex:
(,[^(,\s)(\s,)])
This selects the comma that I want, but also selects an s which is a character after it.
In your regex (,[^(,\s)(\s,)]) you capture a comma followed by a negated character class that matches not any of the specified characters, which could also be written as (,[^)(,\s]) which will capture for example ,s in a group,
What you could do is use a positive lookahead and a positve lookbehind to check what is on the left and what is on the right is not a \S whitespace character:
(?<=\S),(?=\S)
Regex demo
In Ruby, you may use [[:space:]] to match any (Unicode) whitespace and [^[:space:]] to match any char other than whitespace. Using these character classes inside lookarounds solves the problem:
/(?<=[^[:space:]]),(?=[^[:space:]])/
See the Rubular demo
Here,
(?<=[^[:space:]]) - a positive lookbehind that matches a location that is immediately preceded with a non-whitespace char (if the string start position should also be matched, replace with (?<![[:space:]]))
, - a comma
(?=[^[:space:]]) - a positive lookahead that matches a location that is immediately followed with a non-whitespace char (if the string end position should also be matched, replace with (?![[:space:]])).
Check the regex below and use the code hope it will help you!
re = /[^\s](,)[^\s]/m
str = 'check ,my,domain, qwe,sd'
# Print the match result
str.scan(re) do |match|
puts match.to_s
end
Check LIVE DEMO HERE

Special character Oracle REGEXP

I need to allow only set of characters i.e.,
a to z A to Z 0 to 9 . !##$% *()_=+|[]{}"'';:?/.,-
but When I add dash(-) character to below query it is not working please help me at earliest.
SELECT :p_string FROM dual
WHERE NOT REGEXP_LIKE (translate(:p_string,chr(10)||chr(11)||chr(13), ' '),'[^]^A-Z^a-z^0-9^[^.^{^}^!^#^#^$^%^*^(^)^_^=^+^|^\^{^}^"^''^;^:^?^/^,^-^ ]' );
[.-.] will work fine on this query .
The extra ^ symbols inside the bracket expression in your pattern are not, as I think you expect, negations; only the first ^ inside the brackets does that.
The main issue that is causing, apart from allowing that actual circumflex symbol to be matched when you didn't seem to want it, is that you end up with ^-^ being treated as a range.
To include a literal - it has to be the first or last thing in the brackets; from the docs:
To specify a right bracket (]) in the bracket expression, place it first in the list (after the initial circumflex (^), if any).
To specify a hyphen in the bracket expression, place it first in the list (after the initial circumflex (^), if any), last in the list, or as an ending range point in a range expression.
So as you need to do both, make the hyphen last; you can change your pattern to:
'[^]A-Za-z0-9[.{}!##$%*()_=+|\{}"'';:?/, -]'
You could also skip the tralsnate step by including those special characters in the pattern too:
'[^]A-Za-z0-9[.{}!##$%*()_=+|\{}"'';:?/, '||chr(10)||chr(11)||chr(13)||'-]'
Looks like you need to permit only (7-bit) ASCII characters with exception of ~ and ^
In this case I would try it like this:
WHERE CONVERT(p_string, 'US7ASCII') = p_string
AND NOT REGEXP_LIKE(p_string, '~|\^')
Instead of CONVERT(p_string, 'US7ASCII') = p_string you can also use ASCIISTR(REPLACE(p_string, '\', '/')) = REPLACE(p_string, '\', '/')

Regexp_like check for symbol in string

If I want to check that string contains #symbol I can write something like
REGEXP_LIKE(source_column,'#')
or
REGEXP_LIKE(source_column, '.*#.*')
What is the difference between these two forms?
And why REGEXP_LIKE(source_column,'#') returns true even if string has other symbols than #? For example it matches with mail#mail.com and 12#
Naturally '#' looks like exact string match for me, and '.*#.*' I read as 'any string with that symbol'.
These three all function identically and will return true if any number of characters precede or follow the # symbol:
REGEXP_LIKE(source_column,'#')
REGEXP_LIKE(source_column,'.*#.*')
REGEXP_LIKE(source_column,'^.*#.*$', 'n')
(You need the 'n' match parameter for the last example if you have multi-line data otherwise the . wildcard character will not match newlines and the match will fail.)
If you want an exact match then look for the start-of-string (^) and end-of-string ($) immediately preceding and following the symbol:
REGEXP_LIKE(source_column,'^#$')

ruby parametrized regular expression

I have a string like "{some|words|are|here}" or "{another|set|of|words}"
So in general the string consists of an opening curly bracket,words delimited by a pipe and a closing curly bracket.
What is the most efficient way to get the selected word of that string ?
I would like do something like this:
#my_string = "{this|is|a|test|case}"
#my_string.get_column(0) # => "this"
#my_string.get_column(2) # => "is"
#my_string.get_column(4) # => "case"
What should the method get_column contain ?
So this is the solution I like right now:
class String
def get_column(n)
self =~ /\A\{(?:\w*\|){#{n}}(\w*)(?:\|\w*)*\}\Z/ && $1
end
end
We use a regular expression to make sure that the string is of the correct format, while simultaneously grabbing the correct column.
Explanation of regex:
\A is the beginnning of the string and \Z is the end, so this regex matches the enitre string.
Since curly braces have a special meaning we escape them as \{ and \} to match the curly braces at the beginning and end of the string.
next, we want to skip the first n columns - we don't care about them.
A previous column is some number of letters followed by a vertical bar, so we use the standard \w to match a word-like character (includes numbers and underscore, but why not) and * to match any number of them. Vertical bar has a special meaning, so we have to escape it as \|. Since we want to group this, we enclose it all inside non-capturing parens (?:\w*\|) (the ?: makes it non-capturing).
Now we have n of the previous columns, so we tell the regex to match the column pattern n times using the count regex - just put a number in curly braces after a pattern. We use standard string substition, so we just put in {#{n}} to mean "match the previous pattern exactly n times.
the first non skipped column after that is the one we care about, so we put that in capturing parens: (\w*)
then we skip the rest of the columns, if any exist: (?:\|\w*)*.
Capturing the column puts it into $1, so we return that value if the regex matched. If not, we return nil, since this String has no nth column.
In general, if you wanted to have more than just words in your columns (like "{a phrase or two|don't forget about punctuation!|maybe some longer strings that have\na newline or two?}"), then just replace all the \w in the regex with [^|{}] so you can have each column contain anything except a curly-brace or a vertical bar.
Here's my previous solution
class String
def get_column(n)
raise "not a column string" unless self =~ /\A\{\w*(?:\|\w*)*\}\Z/
self[1 .. -2].split('|')[n]
end
end
We use a similar regex to make sure the String contains a set of columns or raise an error. Then we strip the curly braces from the front and back (using self[1 .. -2] to limit to the substring starting at the first character and ending at the next to last), split the columns using the pipe character (using .split('|') to create an array of columns), and then find the n'th column (using standard Array lookup with [n]).
I just figured as long as I was using the regex to verify the string, I might as well use it to capture the column.

How to remove the first 4 characters from a string if it matches a pattern in Ruby

I have the following string:
"h3. My Title Goes Here"
I basically want to remove the first four characters from the string so that I just get back:
"My Title Goes Here".
The thing is I am iterating over an array of strings and not all have the h3. part in front so I can't just ditch the first four characters blindly.
I checked the docs and the closest thing I could find was chomp, but that only works for the end of a string.
Right now I am doing this:
"h3. My Title Goes Here".reverse.chomp(" .3h").reverse
This gives me my desired output, but there has to be a better way. I don't want to reverse a string twice for no reason. Is there another method that will work?
To alter the original string, use sub!, e.g.:
my_strings = [ "h3. My Title Goes Here", "No h3. at the start of this line" ]
my_strings.each { |s| s.sub!(/^h3\. /, '') }
To not alter the original and only return the result, remove the exclamation point, i.e. use sub. In the general case you may have regular expressions that you can and want to match more than one instance of, in that case use gsub! and gsub—without the g only the first match is replaced (as you want here, and in any case the ^ can only match once to the start of the string).
You can use sub with a regular expression:
s = 'h3. foo'
s.sub!(/^h[0-9]+\. /, '')
puts s
Output:
foo
The regular expression should be understood as follows:
^ Match from the start of the string.
h A literal "h".
[0-9] A digit from 0-9.
+ One or more of the previous (i.e. one or more digits)
\. A literal period.
A space (yes, spaces are significant by default in regular expressions!)
You can modify the regular expression to suit your needs. See a regular expression tutorial or syntax guide, for example here.
A standard approach would be to use regular expressions:
"h3. My Title Goes Here".gsub /^h3\. /, '' #=> "My Title Goes Here"
gsub means globally substitute and it replaces a pattern by a string, in this case an empty string.
The regular expression is enclosed in / and constitutes of:
^ means beginning of the string
h3 is matched literally, so it means h3
\. - a dot normally means any character so we escape it with a backslash
is matched literally

Resources