Oracle Regexp to replace \n,\r and \t with space - oracle

I am trying to select a column from a table that contains newline (NL) characters (and possibly others \n, \r, \t). I would like to use the REGEXP to select the data and replace (only these three) characters with a space, " ".

No need for regex. This can be done easily with the ASCII codes and boring old TRANSLATE()
select translate(your_column, chr(10)||chr(11)||chr(13), ' ')
from your_table;
This replaces newline, tab and carriage return with space.
TRANSLATE() is much more efficient than its regex equivalent. However, if your heart is set on that approach, you should know that we can reference ASCII codes in regex. So this statement is the regex version of the above.
select regexp_replace(your_column, '([\x0A|\x0B|`\x0D])', ' ')
from your_table;
The tweak is to reference the ASCII code in hexadecimal rather than base 10.

select translate(your_column, chr(10)||chr(11)||chr(13), ' ') from your_table;
to clean it is essential to serve non-null value as params ...
(oracle function basically will return null once 1 param is null, there are few excpetions like replace-functions)
select translate(your_column, ' '||chr(10)||chr(11)||chr(13), ' ') from your_table;
this examples uses ' '->' ' translation as dummy-value to prevent Null-Value in parameter 3

Related

REGEXP_LIKE Oracle equivalent to count characters in Snowflake

I am trying to come up with an equivalent of the below Oracle statement in Snowflake. This would check if the different parts of the string separated by '.' matches the number of characters in the REGEXP_LIKE expression. I have come up with a rudimentary version to perform the check in Snowflake but I am sure there's a better and cleaner way to do it. I am looking to come up with a one-liner regular expression check in Snowflake similar to Oracle. Appreciate your help!
-- Oracle
SELECT -- would return True
CASE
WHEN REGEXP_LIKE('AB.XYX.12.34.5670.89', '^\w{2}\.\w{3}\.\w{2}') THEN 'True'
ELSE NULL
END AS abc
FROM DUAL
-- Snowflake
SELECT -- would return True
REGEXP_LIKE(SPLIT_PART('AB.XYX.12.34.5670.89', '.', 1), '[A-Z0-9]{2}') AND
REGEXP_LIKE(SPLIT_PART('AB.XYX.12.34.5670.89', '.', 2), '[A-Z0-9]{3}') AND
REGEXP_LIKE(SPLIT_PART('AB.XYX.12.34.5670.89', '.', 3), '[A-Z0-9]{2}') AS abc
You need to add a .* at the end as the REGEXP_LIKE adds explicit ^ && $ to string:
The function implicitly anchors a pattern at both ends (i.e. '' automatically becomes '^$', and 'ABC' automatically becomes '^ABC$'). To match any string starting with ABC, the pattern would be 'ABC.*'.
select
column1 as str,
REGEXP_LIKE(str, '\\w{2}\\.\\w{3}\\.\\w{2}.*') as oracle_way
FROM VALUES
('AB.XYX.12.34.5670.89')
;
gives:
STR
ORACLE_WAY
AB.XYX.12.34.5670.89
TRUE
Or in the context of your question:
SELECT IFF(REGEXP_LIKE('AB.XYX.12.34.5670.89', '\\w{2}\\.\\w{3}\\.\\w{2}.*'), 'True', null) AS abc;
Your use of \w seems to suggest you don't need delimited strings to be strictly [A-Z0-9] since word characters allow underscore and period. If all bets were off and the only requirement was to have . at 3rd, 7th and 10th position, you could have used like this way.
select 'AB.XGH.12.34.5670.89' like '__.___.__.%' ;

Oracle query to find any special character in first position or end position of the field value

I have a table in Oracle database with special characters attached at first and last position in the field value. I want to eliminate those special characters while querying the table. I have used INSTR function but I had to apply for each and every special character using CASE expression.
Is there a way to eliminate any special characters that is attached only at first and last positions in one shot?
The query I am using as is below:
CASE WHEN
INSTR(emp_address,'"')=1 THEN REPLACE((emp_address,'"', '').
.
.
.
You can use regular expressions to replace the leading and trailing character of a string if they match the regular expression pattern. For example, if your definition of a "special character" is anything that is not an alpha-numeric character then you can use the regular expression:
^ the start-of-the-string then
[^[:alnum:]] any single character that does not match the POSIX alpha-numeric character group
| or
[^[:alnum:]] any single character that does not match the POSIX alpha-numeric character group then
$ the end-of-the-string.
Like this:
SELECT emp_address,
REGEXP_REPLACE(
emp_address,
'^[^[:alnum:]]|[^[:alnum:]]$'
) AS simplified_emp_address
FROM table_name
Which, for the sample data:
CREATE TABLE table_name (emp_address) AS
SELECT 'test' FROM DUAL UNION ALL
SELECT '"test2"' FROM DUAL UNION ALL
SELECT 'Not "this" one' FROM DUAL;
Outputs:
EMP_ADDRESS
SIMPLIFIED_EMP_ADDRESS
test
test
"test2"
test2
Not "this" one
Not "this" one
If you have a more complicated definition of a special character then change the regular expression appropriately.
db<>fiddle here

Oracle regex_replace not working as expected

I have following SQL query (Oracle 18c):
SELECT
--FIRST
translate(
' sOmE tEsT
eNdOfLiNe',
chr(10)||chr(11)||chr(13), 'replText'
) "Result1",
--SECOND
regexp_replace(
' sOmE tEsT
eNdOfLiNe',
'[\x0A|\x0B|`\x0D]', 'replText'
) "Result2",
--THIRD
regexp_replace(
' sOmE tEsT
eNdOfLiNe',
'[\r\n\t]', 'replText', 1, 0
) "Result3"
FROM dual
What I would like to do is replace all tabs, return carriages and new line indicators with new string but it seems like regexp replace is not working (returns initial text). I am really sorry about formatting but I need to handle text in exact format as above with \r \n \t mixed chars.
Here is fiddle: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=63834f9bcab93136635366f18c375b13
I am learning Oracle right now and don't understand why second and third solution returns initial text. The first solution seems to work but I would like to achieve the same effect in SECOND and THIRD solution. What I missed?
I'm pretty sure Oracle does not allow escape sequences in a character class. I believe this is what you have to do. In response to your comment on another answer here and as you are learning, regex is most definitely not regex. Especially Oracle's implementation.
EDIT to explain the regex: The regex pattern is building a string of a regex character class containing 3 characters, hence the concatenation. You can't just have escape characters in the regex as then regex would take those characters as part of the character class pattern itself.
SELECT REGEXP_REPLACE(
' sOmE tEsT
eNdOfLiNe', '['||CHR(9)||CHR(10)||CHR(13)||']', 'X') Result3
FROM dual;
RESULT3
------------------------------
sOmE tEsTXXXXXXXX eNdOfLiNe
1 row selected.
You can try the below using similar format as translate
select regexp_replace(
' sOmE tEsT
eNdOfLiNe',
chr(10)||'|'||chr(11)||'|'||chr(13), 'replText') "Result3"
FROM dual

Replacing multiple CHR() from PLSQL string

I have a PLSQL string which contains chr() special characters like chr(10), chr(13). I want to replace these special characters from the string. I tried the following ways
select regexp_replace('Hello chr(10)Goodchr(13)Morning','CHR(10)|chr(13)','') from dual;
This do not works since regexp_replace do not replace chr() function. Then I tried
select translate('Hello chr(10)Goodchr(13)Morning', 'chr(10)'||'chr(13)', ' ') from dual;
It works partially, ie; I am forced to replace chr() with white space(third parameter). No option if I do not want to replace chr() with white spaces . If I pass third character as null then above query returns null result.
Anybody have any other methods?
The problem with your replacement isn't the logic, it's the syntax. Parentheses are regex metacharacters, and as such, if you want to replace a literal parenthesis, you need to escape it. So your pattern should be this:
chr\(13\)|chr\(10\)
Here is a working query:
select
regexp_replace('Hello chr(10)Goodchr(13)Morning','chr\(13\)|chr\(10\)','', 1, 0, 'i')
from dual
The fifth parameters in the above call to regexp_replace is 'i' and indicates that we want to do a case insensitive replacement.
Demo
The above logic removes the literal text chr(13) and chr(10) from your text. If instead you want to remove the actual control characters chr(13) and chr(10), then you may add those control characters to the alternation, e.g.
select
regexp_replace('Hello chr(10)Goodchr(13)Morning','chr\(13\)|chr\(10\)|chr(10)|chr(13)','', 1, 0, 'i')
from dual
Since regular expression functions are relatively expensive in Oracle I think it's worth showing the alternate method which just uses REPLACE for the same effect.
This replaces occurrences of the each control characters with a space;
SELECT REPLACE(REPLACE('ABC'||CHR(10)||'DEF'||CHR(13)||'GHI'||CHR(10)||'JKL',CHR(13),' '),CHR(10),' ') from dual
And this replaces occurrences of the strings 'CHR(13)' and 'CHR(10)';
select REPLACE(REPLACE('ABCCHR(10)DEFCHR(13)GHICHR(10)JKL','CHR(13)',' '),'CHR(10)',' ') from dual

Oracle: Find control characters except line feed

I'm trying to all rows where a column contains any control charters with the exception of the line feed character (hex value of A). I've tried the following, but this only returns results that have a control character and don't have a line feed. I really want the set of characters that are control characters, LESS the line feed character. Is there a 'minus' operation for character sets, where you can exclude particular ones from it?
SELECT *
FROM MyTable
WHERE REGEXP_LIKE(MyColumn, '[:cntrl: &&[^' || UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('A')) || ']]{1,}');
Any thoughts?
Thanks!
Well here's a first try that will work but I'm sure this can be made more elegant and efficient:
SELECT *
FROM MyTable
WHERE regexp_like(MyColumn, '[[:cntrl:]]')
AND MyColumn NOT like '%' || chr(10) || '%';

Resources