Why doesn't translate work on some characters? - oracle

I am trying to remove certain characters from a VARCHAR2 using TRANSLATE. Characters 160 (some kind of space) and 243 (paragraph control character?), however, appear to be "phantom" characters that are undetectable by both INSTR and TRANSLATE. LENGTH works, but only if it's the only character in a string. LENGTH(CHR(160)) returns 1, but LENGTH(CHR(160) || CHR(110)) also returns 1 when you'd think it would return 2. I've found that REPLACE works in stripping these phantom characters from a string, but I like TRANSLATE better because it's easier to read and maintain, whereas a long nesting of REPLACE functions is just cumbersome.
Is there some other way to strip these characters from a VARCHAR2 without using replace?
EDIT: It appears that character 243 elsewhere registers as ≤. However, Oracle has no problem displaying this character when I selected it explicitly. When I select CHR(243), it just displays the block replacement character. Plus, this source points 243 to the paragraph character which makes more sense since that's a control code.

What about using a regular expression? This removes your trouble characters by replacing char 160 or 243 with nothing:
SQL> select regexp_replace('abc' || chr(160) || chr(243) || 'def', '(' || chr(160) || '|' || chr(243) || ')', '') from dual;
REGEXP
------
abcdef
SQL>
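Before stripping anything, it may help to look at the bytes Oracle actually stores. The sketch below is only a diagnostic and rests on an assumption the question does not confirm: that the database character set is a multibyte one (for example AL32UTF8), in which case CHR(160) produces a lone byte rather than a complete character, which could explain the odd LENGTH results. The column aliases are made up for illustration.

```sql
-- Diagnostic sketch: compare character count, byte count and raw bytes.
-- DUMP format 1016 prints the bytes in hex together with the character set name.
SELECT DUMP(CHR(160) || CHR(110), 1016) AS raw_bytes,
       LENGTH(CHR(160) || CHR(110))     AS len_chars,
       LENGTHB(CHR(160) || CHR(110))    AS len_bytes
FROM dual;
```

If len_chars and len_bytes differ, the "phantom" behaviour is probably a character set issue rather than a problem with TRANSLATE itself.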

Related

Oracle Contains statement with special characters

I would like to search for this string 'A&G BROS, INC.' using oracle contains statement
FROM contact
WHERE CONTAINS
(name, 'A&G BROS, INC.') > 0
But I do not get accurate results. I get over 300,000 records, basically anything containing INC.
I tried escaping the & char using
FROM contact
WHERE CONTAINS
(name, 'A&' || 'G BROS, INC.') > 0
I still get the same massive result set.
Any idea how to run this query with these special characters? I want to narrow the results down so I can at least get results that start with "A&G". Note that "LIKE" and "INSTR" cannot be used.
Another way to deal with the special characters is to use the function CHR(n), where n is the ASCII value of the special character. For &, it is 38, so instead of
'A&G BROS, INC.' you can use 'A'||CHR(38)||'G BROS, INC.'
Using these special characters directly in literals can be tricky, because they can behave differently in different environments.
You can find the ASCII value of a character using the ASCII function, like this:
select ascii('&') from dual;
ASCII('&')
38
The & is AND, but the , is also ACCUM. The behaviour of those operators explains what you are seeing.
You need to escape those characters:
To query on words or symbols that have special meaning in query expressions such as and & or | accum, you must escape them. There are two ways to escape characters in a query expression...
So you could do:
FROM contact
WHERE CONTAINS
(name, 'A\&G BROS\, INC.') > 0
or
FROM contact
WHERE CONTAINS
(name, 'A{&}G BROS{,} INC.') > 0
or
FROM contact
WHERE CONTAINS
(name, '{A&G BROS, INC.}') > 0
If you can't stop your client prompting for substitution variables - which is really a separate issue to the contains escapes - then you could combine this with your original approach:
FROM contact
WHERE CONTAINS
(name, '{A&' || 'G BROS, INC.}') > 0
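If the prompting comes from SQL*Plus or SQL Developer, one option is to switch substitution off for the session and use the braces escape on its own. A minimal sketch, assuming one of those clients; the SELECT list is my guess, since the question only shows the query from FROM onward.

```sql
-- Client-side setting: stop treating & as a substitution variable prefix.
SET DEFINE OFF

-- Braces escape the whole phrase, so & and , are no longer read as
-- the AND and ACCUM operators.
SELECT name
FROM contact
WHERE CONTAINS(name, '{A&G BROS, INC.}') > 0;
```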

Special character Oracle REGEXP

I need to allow only set of characters i.e.,
a to z A to Z 0 to 9 . !##$% *()_=+|[]{}"'';:?/.,-
but when I add the dash (-) character to the query below, it does not work. Please help.
SELECT :p_string FROM dual
WHERE NOT REGEXP_LIKE (translate(:p_string,chr(10)||chr(11)||chr(13), ' '),'[^]^A-Z^a-z^0-9^[^.^{^}^!^#^#^$^%^*^(^)^_^=^+^|^\^{^}^"^''^;^:^?^/^,^-^ ]' );
[.-.] will work fine in this query.
The extra ^ symbols inside the bracket expression in your pattern are not, as I think you expect, negations; only the first ^ inside the brackets does that.
The main issue that is causing, apart from allowing that actual circumflex symbol to be matched when you didn't seem to want it, is that you end up with ^-^ being treated as a range.
To include a literal - it has to be the first or last thing in the brackets; from the docs:
To specify a right bracket (]) in the bracket expression, place it first in the list (after the initial circumflex (^), if any).
To specify a hyphen in the bracket expression, place it first in the list (after the initial circumflex (^), if any), last in the list, or as an ending range point in a range expression.
So as you need to do both, make the hyphen last; you can change your pattern to:
'[^]A-Za-z0-9[.{}!##$%*()_=+|\{}"'';:?/, -]'
You could also skip the translate step by including those special characters in the pattern too:
'[^]A-Za-z0-9[.{}!##$%*()_=+|\{}"'';:?/, '||chr(10)||chr(11)||chr(13)||'-]'
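The hyphen rule is easier to see with a much smaller character set. The following is a minimal sketch, not the full pattern from the question: it allows only letters, digits, a space and a hyphen, and the sample strings and the alias s are made up for illustration.

```sql
-- The hyphen is the last character inside the brackets, so it is a literal,
-- not a range operator. Only the leading ^ negates the set.
WITH samples AS (
  SELECT 'abc-123' AS s FROM dual UNION ALL
  SELECT 'abc~123' AS s FROM dual
)
SELECT s
FROM samples
WHERE NOT REGEXP_LIKE(s, '[^A-Za-z0-9 -]');
```

Only the first sample should come back, because the second contains ~, which is outside the allowed set.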
Looks like you need to permit only (7-bit) ASCII characters with exception of ~ and ^
In this case I would try it like this:
WHERE CONVERT(p_string, 'US7ASCII') = p_string
AND NOT REGEXP_LIKE(p_string, '~|\^')
Instead of CONVERT(p_string, 'US7ASCII') = p_string you can also use ASCIISTR(REPLACE(p_string, '\', '/')) = REPLACE(p_string, '\', '/')

How below REGEXP_REPLACE works?

I have a query in my project that uses REGEXP_REPLACE.
I tried to find out how it works by searching, but all I found was:
\w+ Matches a word character (that is, an alphanumeric or underscore (_) character).
I could not find an explanation of '"\w+\":'. Why are the "" used, and what is meant by '{|}|"',''?
UPDATE (SELECT data,data_value FROM TEMP) t
SET t.DATA_VALUE=REGEXP_REPLACE(REGEXP_REPLACE(t.data, '"\w+\":',''),'{|}|"','');
can you please tell me how it works?
This appears to be a regular expression for stripping keys and enclosing brackets from a JSON string - unfortunately, if this is the case then it does not work in all situations.
The regular expression
'"\w+\":'
will match:
A " double quotation mark;
\w+ one-or-more word (a-z or A-Z or 0-9 or _) characters;
\" another double quotation mark - note: the \ character is not necessary; then
A : colon.
So:
REGEXP_REPLACE(
'{"key":"value","key2":"value with \"quote"}',
'"\w+":', -- Pattern matched
'' -- Replacement string
)
Will output:
{"value","value with \"quote"}
The second pattern {|}|" will match either a {, or a } or a " character (and could have been equivalently written as [{}"]) so:
REGEXP_REPLACE(
'{"value","value with \"quote"}',
'{|}|"', -- Pattern matched
'' -- Replacement string
)
Will output:
value,value with \quote
Which is fine, until (like my example) you have an escaped double quote (or curly braces) in the value string; in which case those will also get stripped leaving the escape character.
(Note: you would not typically find this but it is possible to include escaped quotes in the key. So {"keywith\":quote":"value"} would get replaced to {quote":"value"} and then quote:value which is not the intended output.)
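For reference, the two steps can be combined into one standalone statement, using the bracket-expression form [{}"] mentioned above in place of {|}|". This is a minimal sketch against the same sample string; the alias data_value is borrowed from the question's column name.

```sql
-- Inner call strips the "key": parts, outer call strips braces and quotes.
SELECT REGEXP_REPLACE(
         REGEXP_REPLACE('{"key":"value","key2":"value with \"quote"}',
                        '"\w+":', ''),
         '[{}"]',
         ''
       ) AS data_value
FROM dual;
```

It should give the same result as the two separate calls, including the stray backslash left behind by the escaped quote.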
If parsing JSON is what you are trying to do (pre-Oracle 12) then you can use:
REGEXP_REPLACE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'^{|"(\\"|[^"])+":(")?((\\"|[^"])+?)\2((,)|})',
'\3\6'
)
Which outputs:
value,value with \"quote,value with \"{}
Or in Oracle 12 you can do:
SELECT *
FROM JSON_TABLE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'$.*' NULL ON ERROR
COLUMNS (
value VARCHAR2(4000) PATH '$'
)
)
Which outputs:
VALUE
-----------------
value
value with "quote
value with "{}
The general syntax is: REGEXP_REPLACE( string, pattern [, replacement_string [, start_position [, nth_appearance [, match_parameter ] ] ] ] )
| means "or" (and can separate more than two alternatives). A comma is used inside a quantifier, as in {n,}, which means at least n times.
Source for the above: https://www.techonthenet.com/oracle/functions/regexp_replace.php
As for '"\w+\":' and why the "" are used, and what is meant by '{|}|"','':
\w matches a word character and + means one or more times. The statement as posted looks messed up to me; it seems to be missing the right number of closing parentheses. Writing \" before and after \w+ lets the literal " characters be matched.
The expression applies one replacement and then uses that result as the input for the next replacement. Good luck figuring the rest out. Regular expressions aren't too bad, and are pretty intuitive once you get the basics down.

Format string in Oracle

I'm building a string in oracle, where I get a number from a column and make it a 12 digit number with the LPad function, so the length of it is 12 now.
Example: LPad(nProjectNr,12,'0') and I get 000123856812 (for example).
Now I want to split this string in parts of 3 digit with a "\" as prefix, so that the result will look like this \000\123\856\812.
How can I achieve this in a select statement? What function can accomplish this?
Assuming strings of 12 digits, regexp_replace could be a way:
select regexp_replace('000123856812', '(.{3})', '\\\1') from dual
The regexp matches each sequence of 3 characters and adds a \ as a prefix. In the replacement string, \\ stands for a literal backslash and \1 for the matched group.
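Combined with the LPAD call from the question, the whole thing could look like the sketch below. The inline view projects is a hypothetical stand-in for the real table; only the column name nProjectNr comes from the question.

```sql
-- Pad to 12 digits first, then prefix every group of 3 characters with \.
WITH projects AS (
  SELECT 123856812 AS nProjectNr FROM dual
)
SELECT REGEXP_REPLACE(LPAD(nProjectNr, 12, '0'), '(.{3})', '\\\1') AS str
FROM projects;
```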
It is much easier to do this using TO_CHAR(number) with the proper format model. Suppose we use \ as the thousands separator.... (alas we can't start a format model with a thousands separator - not allowed in TO_CHAR - so we still need to concatenate a \ to the left):
See also edit below
select 123856812 as n,
'\' || to_char(123856812, 'FM000G000G000G000', 'nls_numeric_characters=.\') as str
from dual
;
N STR
--------- ----------------
123856812 \000\123\856\812
Without the FM format model modifier, TO_CHAR will add a leading space (placeholder for the sign, plus or minus). FM means "shortest possible string representation consistent with the model provided" - that is, in this case, no leading space.
Edit - it just crossed my mind that we can exploit TO_CHAR() even further and not need to concatenate the first \. The thousands separator, G, may not be the first character of the string, but the currency symbol, placeholder L, can!
select 123856812 as n,
to_char(123856812, 'FML000G000G000G000',
'nls_numeric_characters=.\, nls_currency=\') as str
from dual
;
SUBSTR returns a substring of a string passed as the first argument. You can specify where the substring starts and how many characters it should be.
Try
SELECT '\'||SUBSTR('000123856812', 1,3)||'\'||SUBSTR('000123856812', 4,3)||'\'||SUBSTR('000123856812', 7,3)||'\'||SUBSTR('000123856812', 10,3) FROM dual;

oracle regexp_replace delete last occurrence of special character

I have a pl sql string as follows :
String := 'ctx_ddl.add_stopword(''"SHARK_IDX19_SPL"'',''can'');
create index "SCOTT"."SHARK_IDX2"
on "SCOTT"."SHARK2"
("DOC")
indextype is ctxsys.context
parameters(''
datastore "SHARK_IDX2_DST"
filter "SHARK_IDX2_FIL"
section group "SHARK_IDX2_SGP"
lexer "SHARK_IDX2_LEX"
wordlist "SHARK_IDX2_WDL"
stoplist "SHARK_IDX2_SPL"
storage "SHARK_IDX2_STO"
sync (every "SYSDATE+(1/1)" memory 67108864)
'')
/
';
I have to find the final occurrence of '/' and add ';' to it. I also need to escape the quotes present in parameters ('') so that they have extra quotes. I need output like
String := 'ctx_ddl.add_stopword(''"SHARK_IDX19_SPL"'',''can'');
create index "SCOTT"."SHARK_IDX2"
on "SCOTT"."SHARK2"
("DOC")
indextype is ctxsys.context
parameters(''''
datastore "SHARK_IDX2_DST"
filter "SHARK_IDX2_FIL"
section group "SHARK_IDX2_SGP"
lexer "SHARK_IDX2_LEX"
wordlist "SHARK_IDX2_WDL"
stoplist "SHARK_IDX2_SPL"
storage "SHARK_IDX2_STO"
sync (every "SYSDATE+(1/1)" memory 67108864)
'''')
/;
';
Any help.
There's an age-old saying: "Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems"
Unless you are confronted by a problem that truly requires regular expressions, I'd recommend working with basic string manipulation functions.
Semi-colon:
Use INSTR to find the last occurrence of '/' (a negative start position, as in INSTR(str, '/', -1), searches backwards from the end); call this P1.
Result = substr from position 1 through P1 || ';' || substr from P1+1 through to end-of-string
Parameters substitution:
Use INSTR to find where the parameter list starts (i.e. find "parameters(" in your string) and ends (presumably the last closing parenthesis ")" in your string). Call these P2 and P3.
Result = substr from 1 through P2 || REPLACE(substr from P2+1 through P3-1,'''','''''') || substr from P3 to end-of-string
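Put together as an anonymous PL/SQL block, those steps might look like the sketch below. It uses a shortened, made-up stand-in for the string in the question, the variable names are hypothetical, and it assumes both "parameters(" and "/" are present (INSTR returns 0 otherwise, which this sketch does not handle). Here P2 is taken as the position of the opening parenthesis itself.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- Shortened sample; the real string from the question is longer.
  v_src VARCHAR2(4000) :=
       'create index "SCOTT"."SHARK_IDX2" on "SCOTT"."SHARK2" ("DOC")' || CHR(10)
    || 'parameters('''               || CHR(10)
    || 'stoplist "SHARK_IDX2_SPL"'   || CHR(10)
    || ''')'                         || CHR(10)
    || '/'                           || CHR(10);
  v_p1  PLS_INTEGER;  -- position of the last '/'
  v_p2  PLS_INTEGER;  -- position of the '(' that opens the parameters list
  v_p3  PLS_INTEGER;  -- position of the last ')'
  v_out VARCHAR2(4000);
BEGIN
  v_p2 := INSTR(v_src, 'parameters(') + LENGTH('parameters(') - 1;
  v_p3 := INSTR(v_src, ')', -1);

  -- Double every single quote between the two parentheses only.
  v_out := SUBSTR(v_src, 1, v_p2)
        || REPLACE(SUBSTR(v_src, v_p2 + 1, v_p3 - v_p2 - 1), '''', '''''')
        || SUBSTR(v_src, v_p3);

  -- Add ';' directly after the last '/'.
  v_p1  := INSTR(v_out, '/', -1);
  v_out := SUBSTR(v_out, 1, v_p1) || ';' || SUBSTR(v_out, v_p1 + 1);

  DBMS_OUTPUT.PUT_LINE(v_out);
END;
/
```

The quote doubling is done before the semicolon is added so that the positions found for the parentheses are not shifted by the extra character.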

Resources