Removing/Replacing reserved or invalid url characters in a string - ruby

I'm in the process of creating sef urls for my app. I just encountered an error where one of my objects contains the following characters:
##!*
My desired output is the following where anything illegal outside of reserved/unreserved will get replaced by an underscore:
#_!*
I planned on using this regular expression to filter the bad characters:
[^]A-Za-z0-9_.~!*''();:#&=+$,/?#[%-]+
And do the replacement via gsub
'##!*'.gsub!(/[^]_.~!*''();:#&=+$,/?#[%-]+/, '_')
But not getting anything returned at all. What's going on here?

'<##_!*>'.gsub(/[\[\]^A-Za-z0-9.~!*''();:#&=+$,\/?#%+-]/, '_')
#=> "<_____>"
'[', ']' and '/' must be escaped, '-' must be at the beginning or end of the character class and '^' cannot be at the beginning of the character class (the character class being denoted by the outer '[' and ']' characters). There's no point in replacing '' with '' so I've not included that character in the character class.
Do you wish to also replace '<' and '>'? Are you sure letters and digits "reserved characters"?

Related

Special character Oracle REGEXP

I need to allow only set of characters i.e.,
a to z A to Z 0 to 9 . !##$% *()_=+|[]{}"'';:?/.,-
but When I add dash(-) character to below query it is not working please help me at earliest.
SELECT :p_string FROM dual
WHERE NOT REGEXP_LIKE (translate(:p_string,chr(10)||chr(11)||chr(13), ' '),'[^]^A-Z^a-z^0-9^[^.^{^}^!^#^#^$^%^*^(^)^_^=^+^|^\^{^}^"^''^;^:^?^/^,^-^ ]' );
[.-.] will work fine on this query .
The extra ^ symbols inside the bracket expression in your pattern are not, as I think you expect, negations; only the first ^ inside the brackets does that.
The main issue that is causing, apart from allowing that actual circumflex symbol to be matched when you didn't seem to want it, is that you end up with ^-^ being treated as a range.
To include a literal - it has to be the first or last thing in the brackets; from the docs:
To specify a right bracket (]) in the bracket expression, place it first in the list (after the initial circumflex (^), if any).
To specify a hyphen in the bracket expression, place it first in the list (after the initial circumflex (^), if any), last in the list, or as an ending range point in a range expression.
So as you need to do both, make the hyphen last; you can change your pattern to:
'[^]A-Za-z0-9[.{}!##$%*()_=+|\{}"'';:?/, -]'
You could also skip the tralsnate step by including those special characters in the pattern too:
'[^]A-Za-z0-9[.{}!##$%*()_=+|\{}"'';:?/, '||chr(10)||chr(11)||chr(13)||'-]'
Looks like you need to permit only (7-bit) ASCII characters with exception of ~ and ^
In this case I would try it like this:
WHERE CONVERT(p_string, 'US7ASCII') = p_string
AND NOT REGEXP_LIKE(p_string, '~|\^')
Instead of CONVERT(p_string, 'US7ASCII') = p_string you can also use ASCIISTR(REPLACE(p_string, '\', '/')) = REPLACE(p_string, '\', '/')

How below REGEXP_REPLACE works?

I have query in my project and that is having REGEXP_REPLACE
i tried to find how it works by searching but i found it like
w+ Matches a word character (that is, an alphanumeric or underscore
(_) character).
but not able to find '"\w+\":' why these "" are used and what is mean by '{|}|"',''
UPDATE (SELECT data,data_value FROM TEMP) t
SET t.DATA_VALUE=REGEXP_REPLACE(REGEXP_REPLACE(t.data, '"\w+\":',''),'{|}|"','');
can you please tell me how it works?
This appear to be a regular expression for stripping keys and enclosing brackets from a JSON string - unfortunately, if this is the case then it does not work in all situations.
The regular expression
'"\w+\":'
will match:
A " double quotation mark;
\w+ one-or-more word (a-z or A-Z or 0-9 or _) characters;
\" another double quotation mark - note: the \ character is not necessary; then
A : colon.
So:
REGEXP_REPLACE(
'{"key":"value","key2":"value with \"quote"}',
'"\w+":', -- Pattern matched
'' -- Replacement string
)
Will output:
{"value","value with \"quote"}
The second pattern {|}|" will match either a {, or a } or a " character (and could have been equivalently written as [{}"]) so:
REGEXP_REPLACE(
'{"value","value with \"quote"}',
'{|}|"', -- Pattern matched
'' -- Replacement string
)
Will output:
value,value with \quote
Which is fine, until (like my example) you have an escaped double quote (or curly braces) in the value string; in which case those will also get stripped leaving the escape character.
(Note: you would not typically find this but it is possible to include escaped quotes in the key. So {"keywith\":quote":"value"} would get replaced to {quote":"value"} and then quote:value which is not the intended output.)
If parsing JSON is what you are trying to do (pre-Oracle 12) then you can use:
REGEXP_REPLACE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'^{|"(\\"|[^"])+":(")?((\\"|[^"])+?)\2((,)|})',
'\3\6'
)
Which outputs:
value,value with \"quote,value with \"{}
Or in Oracle 12 you can do:
SELECT *
FROM JSON_TABLE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'$.*' NULL ON ERROR
COLUMNS (
value VARCHAR2(4000) PATH '$'
)
)
Which outputs:
VALUE
-----------------
value
value with "quote
value with "{}
example:::REGEXP_REPLACE( string, pattern [, replacement_string [, start_position [, nth_appearance [, match_parameter ] ] ] ] )
| is or(CAN MEAN MORE THAN ONE ALTERNATIVE ) , is for at least as in {n,} at least n times
https://www.techonthenet.com/oracle/functions/regexp_replace.php
"where I got my info"
'"\w+\":' why these "" are used and what is mean by '{|}|"',''
Matches a word character(\w)One or more times(+) this has to be messed up it's missing the right quantity of close parentheses by putting \" w+ \"
they allow the " to be shown. This expression takes one expression changes it then uses that as the basis for the next change. Good luck figuring the rest out. Regular expressions aren't too bad, pretty intuitive once you get the basics down.

PregMatch . space and #?

Can someone tell me, what's wrong in this code:
if ((!preg_match("[a-zA-Z0-9 \.\s]", $username)) || (!preg_match("[a-zA-Z0-9 \.\s]", $password)));
exit("result_message=Error: invalid characters");
}
??
Several things are wrong. I assume that the code you are looking for is:
if (preg_match('~[^a-z0-9\h.]~i', $username) || preg_match('~[^a-z0-9\h.]~i', $password))
exit('result_message=Error: invalid characters');
What is wrong in your code?
the pattern [a-zA-Z0-9 \.\s] is false for multiple reasons:
a regex pattern in PHP must by enclosed by delimiters, the most used is /, but as you can see, I have choosen ~. Example: /[a-zA-Z \.\s]/
the character class is strange because it contains a space and the character class \s that contains the space too. IMO, to check a username or a password, you only need the space and why not the tab, but not the carriage return or the line feed character! You can remove \s and let the space, or you can use the \h character class that matches all horizontal white spaces. /[a-zA-Z\h\.]/ (if you don't want to allow tabs, replace the \h by a space)
the dot has no special meaning inside a character class and doesn't need to be escaped: /[a-zA-Z\h.]/
you are trying to verify a whole string, but your pattern matches a single character! In other words, the pattern checks only if the string contains at least an alnum, a space or a dot. If you want to check all the string you must use a quantifier + and anchors for the start ^ and the end $ of the string. Example ∕^[a-zA-Z0-9\h.]+$/
in fine, you can shorten the character class by using the case-insensitive modifier i: /^[a-z0-9\h.]+$/i
But there is a faster way, instead of negate with ! your preg_match assertion and test if all characters are in the character range you want, you can only test if there is one character you don't want in the string. To do this you only need to negate the character class by inserting a ^ at the first place:
preg_match('/[^a-z0-9\h.]/i', ...
(Note that the ^ has a different meaning inside and outside a character class. If ^ isn't at the begining of a character class, it is a simple literal character.)

regex any non-digit with exception

I've got strings like these:
+996999966966AA
-996999966966AA
I am using this code:
"+996999966966AA".gsub!(/\D/, "")
to get rid of any character except digits, but the sign + also being stripped. How can my code retain the +?
Use:
[^+\d]
to match anything that isn't + or a digit.
You can also use \W, "non-word character" which matches any character that is not a word character (alphanumeric & underscore)).
(\W\d+)\w+

How to find string containing regex special chars with regex

I have the fallowing piece of code :
details =~ /.#{action.name}.*/
If action.name contains regular string such as "abcd" then everything goes ok ,
but if action.string contains special chars such as . or / ,im getting an exception.
Is there a way to check the action.name string without having to put \ before every special char inside action.name ?
You can escape all special characters using Regexp::escape.
Try:
details =~ /.#{Regexp.escape(action.name)}.*/

Resources