How below REGEXP_REPLACE works? - oracle

I have query in my project and that is having REGEXP_REPLACE
i tried to find how it works by searching but i found it like
w+ Matches a word character (that is, an alphanumeric or underscore
(_) character).
but not able to find '"\w+\":' why these "" are used and what is mean by '{|}|"',''
UPDATE (SELECT data,data_value FROM TEMP) t
SET t.DATA_VALUE=REGEXP_REPLACE(REGEXP_REPLACE(t.data, '"\w+\":',''),'{|}|"','');
can you please tell me how it works?

This appear to be a regular expression for stripping keys and enclosing brackets from a JSON string - unfortunately, if this is the case then it does not work in all situations.
The regular expression
'"\w+\":'
will match:
A " double quotation mark;
\w+ one-or-more word (a-z or A-Z or 0-9 or _) characters;
\" another double quotation mark - note: the \ character is not necessary; then
A : colon.
So:
REGEXP_REPLACE(
'{"key":"value","key2":"value with \"quote"}',
'"\w+":', -- Pattern matched
'' -- Replacement string
)
Will output:
{"value","value with \"quote"}
The second pattern {|}|" will match either a {, or a } or a " character (and could have been equivalently written as [{}"]) so:
REGEXP_REPLACE(
'{"value","value with \"quote"}',
'{|}|"', -- Pattern matched
'' -- Replacement string
)
Will output:
value,value with \quote
Which is fine, until (like my example) you have an escaped double quote (or curly braces) in the value string; in which case those will also get stripped leaving the escape character.
(Note: you would not typically find this but it is possible to include escaped quotes in the key. So {"keywith\":quote":"value"} would get replaced to {quote":"value"} and then quote:value which is not the intended output.)
If parsing JSON is what you are trying to do (pre-Oracle 12) then you can use:
REGEXP_REPLACE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'^{|"(\\"|[^"])+":(")?((\\"|[^"])+?)\2((,)|})',
'\3\6'
)
Which outputs:
value,value with \"quote,value with \"{}
Or in Oracle 12 you can do:
SELECT *
FROM JSON_TABLE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'$.*' NULL ON ERROR
COLUMNS (
value VARCHAR2(4000) PATH '$'
)
)
Which outputs:
VALUE
-----------------
value
value with "quote
value with "{}

example:::REGEXP_REPLACE( string, pattern [, replacement_string [, start_position [, nth_appearance [, match_parameter ] ] ] ] )
| is or(CAN MEAN MORE THAN ONE ALTERNATIVE ) , is for at least as in {n,} at least n times
https://www.techonthenet.com/oracle/functions/regexp_replace.php
"where I got my info"
'"\w+\":' why these "" are used and what is mean by '{|}|"',''
Matches a word character(\w)One or more times(+) this has to be messed up it's missing the right quantity of close parentheses by putting \" w+ \"
they allow the " to be shown. This expression takes one expression changes it then uses that as the basis for the next change. Good luck figuring the rest out. Regular expressions aren't too bad, pretty intuitive once you get the basics down.

Related

Special character Oracle REGEXP

I need to allow only set of characters i.e.,
a to z A to Z 0 to 9 . !##$% *()_=+|[]{}"'';:?/.,-
but When I add dash(-) character to below query it is not working please help me at earliest.
SELECT :p_string FROM dual
WHERE NOT REGEXP_LIKE (translate(:p_string,chr(10)||chr(11)||chr(13), ' '),'[^]^A-Z^a-z^0-9^[^.^{^}^!^#^#^$^%^*^(^)^_^=^+^|^\^{^}^"^''^;^:^?^/^,^-^ ]' );
[.-.] will work fine on this query .
The extra ^ symbols inside the bracket expression in your pattern are not, as I think you expect, negations; only the first ^ inside the brackets does that.
The main issue that is causing, apart from allowing that actual circumflex symbol to be matched when you didn't seem to want it, is that you end up with ^-^ being treated as a range.
To include a literal - it has to be the first or last thing in the brackets; from the docs:
To specify a right bracket (]) in the bracket expression, place it first in the list (after the initial circumflex (^), if any).
To specify a hyphen in the bracket expression, place it first in the list (after the initial circumflex (^), if any), last in the list, or as an ending range point in a range expression.
So as you need to do both, make the hyphen last; you can change your pattern to:
'[^]A-Za-z0-9[.{}!##$%*()_=+|\{}"'';:?/, -]'
You could also skip the tralsnate step by including those special characters in the pattern too:
'[^]A-Za-z0-9[.{}!##$%*()_=+|\{}"'';:?/, '||chr(10)||chr(11)||chr(13)||'-]'
Looks like you need to permit only (7-bit) ASCII characters with exception of ~ and ^
In this case I would try it like this:
WHERE CONVERT(p_string, 'US7ASCII') = p_string
AND NOT REGEXP_LIKE(p_string, '~|\^')
Instead of CONVERT(p_string, 'US7ASCII') = p_string you can also use ASCIISTR(REPLACE(p_string, '\', '/')) = REPLACE(p_string, '\', '/')

PregMatch . space and #?

Can someone tell me, what's wrong in this code:
if ((!preg_match("[a-zA-Z0-9 \.\s]", $username)) || (!preg_match("[a-zA-Z0-9 \.\s]", $password)));
exit("result_message=Error: invalid characters");
}
??
Several things are wrong. I assume that the code you are looking for is:
if (preg_match('~[^a-z0-9\h.]~i', $username) || preg_match('~[^a-z0-9\h.]~i', $password))
exit('result_message=Error: invalid characters');
What is wrong in your code?
the pattern [a-zA-Z0-9 \.\s] is false for multiple reasons:
a regex pattern in PHP must by enclosed by delimiters, the most used is /, but as you can see, I have choosen ~. Example: /[a-zA-Z \.\s]/
the character class is strange because it contains a space and the character class \s that contains the space too. IMO, to check a username or a password, you only need the space and why not the tab, but not the carriage return or the line feed character! You can remove \s and let the space, or you can use the \h character class that matches all horizontal white spaces. /[a-zA-Z\h\.]/ (if you don't want to allow tabs, replace the \h by a space)
the dot has no special meaning inside a character class and doesn't need to be escaped: /[a-zA-Z\h.]/
you are trying to verify a whole string, but your pattern matches a single character! In other words, the pattern checks only if the string contains at least an alnum, a space or a dot. If you want to check all the string you must use a quantifier + and anchors for the start ^ and the end $ of the string. Example ∕^[a-zA-Z0-9\h.]+$/
in fine, you can shorten the character class by using the case-insensitive modifier i: /^[a-z0-9\h.]+$/i
But there is a faster way, instead of negate with ! your preg_match assertion and test if all characters are in the character range you want, you can only test if there is one character you don't want in the string. To do this you only need to negate the character class by inserting a ^ at the first place:
preg_match('/[^a-z0-9\h.]/i', ...
(Note that the ^ has a different meaning inside and outside a character class. If ^ isn't at the begining of a character class, it is a simple literal character.)

Regular Expression with Double Quotes

Given a string in between quotations, as such "Hello"
The following regular expression will print out a match of the string without the double quotations:
/"([^"]+)"/
I don't understand how it is capturing the characters. I believe what this should be capturing is just the initial double quote. What this regular expression is saying is find an expression that starts and ends with double quotes and again has one or more double quotes at the beginning. And it captures that one or more double quotes at the beginning. How does it end up matching the string here with [^"]+ ?
The expression [^"]+ means literally to match all characters which are not the double quote ". So when placed inside (), all characters following the first " and up to the next " are captured. This is because the ^ inside a [] character class implies negation rather than the start of a string as it would mean outside the []. So [^"] literally means anything but a ".
The () itself is the capture group, and the regex will only capture the expression which exists inside (). Depending on the programming language you use, it may also record the entire string matched "Hello" by the entire expression /"([^"]+)"/ in a separate variable, but the purpose of () is to capture its contents.
Full breakdown of the expression:
" - first literal quote
( - begin capture
[^"]+ all subsequent characters up to but not including "
) - end capture group
" - final closing quote literal

Ruby, weird substitution

For example:
str1 = "pppp(m)pppp"
str2 = "(m)"
str1 = str1.sub(/#{str2}/, "<>#{str2}<>")
I will got this:
"pppp(<>(m)<>)pppp"
I expected to get this:
"pppp<>(m)<>pppp"
Why it's happening and how to avoid this?
In ( and ) have a special meaning in regexen and do not actually match the characters ( and ). The regex /(m)/ will match any m whether or not it is enclosed in parentheses (and if it is, it won't match the parentheses).
To match literal parentheses use \( and \) - or in a case like this where you're interpolating a string, you can just use Regexp.escape on the string, i.e. /#{ Regexp.escape(str2) }/.
The regular expression is viewing the "(m)" as a capture group because the parenthesis are operators in regular expressions to get a literal "(m)" you need to use the escape char \ ["\(m\)"].

How to remove the first 4 characters from a string if it matches a pattern in Ruby

I have the following string:
"h3. My Title Goes Here"
I basically want to remove the first four characters from the string so that I just get back:
"My Title Goes Here".
The thing is I am iterating over an array of strings and not all have the h3. part in front so I can't just ditch the first four characters blindly.
I checked the docs and the closest thing I could find was chomp, but that only works for the end of a string.
Right now I am doing this:
"h3. My Title Goes Here".reverse.chomp(" .3h").reverse
This gives me my desired output, but there has to be a better way. I don't want to reverse a string twice for no reason. Is there another method that will work?
To alter the original string, use sub!, e.g.:
my_strings = [ "h3. My Title Goes Here", "No h3. at the start of this line" ]
my_strings.each { |s| s.sub!(/^h3\. /, '') }
To not alter the original and only return the result, remove the exclamation point, i.e. use sub. In the general case you may have regular expressions that you can and want to match more than one instance of, in that case use gsub! and gsub—without the g only the first match is replaced (as you want here, and in any case the ^ can only match once to the start of the string).
You can use sub with a regular expression:
s = 'h3. foo'
s.sub!(/^h[0-9]+\. /, '')
puts s
Output:
foo
The regular expression should be understood as follows:
^ Match from the start of the string.
h A literal "h".
[0-9] A digit from 0-9.
+ One or more of the previous (i.e. one or more digits)
\. A literal period.
A space (yes, spaces are significant by default in regular expressions!)
You can modify the regular expression to suit your needs. See a regular expression tutorial or syntax guide, for example here.
A standard approach would be to use regular expressions:
"h3. My Title Goes Here".gsub /^h3\. /, '' #=> "My Title Goes Here"
gsub means globally substitute and it replaces a pattern by a string, in this case an empty string.
The regular expression is enclosed in / and constitutes of:
^ means beginning of the string
h3 is matched literally, so it means h3
\. - a dot normally means any character so we escape it with a backslash
is matched literally

Resources