REGEXP_LIKE Oracle equivalent to count characters in Snowflake - oracle

I am trying to come up with an equivalent of the below Oracle statement in Snowflake. This would check if the different parts of the string separated by '.' matches the number of characters in the REGEXP_LIKE expression. I have come up with a rudimentary version to perform the check in Snowflake but I am sure there's a better and cleaner way to do it. I am looking to come up with a one-liner regular expression check in Snowflake similar to Oracle. Appreciate your help!
-- Oracle
SELECT -- would return True
CASE
WHEN REGEXP_LIKE('AB.XYX.12.34.5670.89', '^\w{2}\.\w{3}\.\w{2}') THEN 'True'
ELSE NULL
END AS abc
FROM DUAL
-- Snowflake
SELECT -- would return True
REGEXP_LIKE(SPLIT_PART('AB.XYX.12.34.5670.89', '.', 1), '[A-Z0-9]{2}') AND
REGEXP_LIKE(SPLIT_PART('AB.XYX.12.34.5670.89', '.', 2), '[A-Z0-9]{3}') AND
REGEXP_LIKE(SPLIT_PART('AB.XYX.12.34.5670.89', '.', 3), '[A-Z0-9]{2}') AS abc

You need to add a .* at the end as the REGEXP_LIKE adds explicit ^ && $ to string:
The function implicitly anchors a pattern at both ends (i.e. '' automatically becomes '^$', and 'ABC' automatically becomes '^ABC$'). To match any string starting with ABC, the pattern would be 'ABC.*'.
select
column1 as str,
REGEXP_LIKE(str, '\\w{2}\\.\\w{3}\\.\\w{2}.*') as oracle_way
FROM VALUES
('AB.XYX.12.34.5670.89')
;
gives:
STR
ORACLE_WAY
AB.XYX.12.34.5670.89
TRUE
Or in the context of your question:
SELECT IFF(REGEXP_LIKE('AB.XYX.12.34.5670.89', '\\w{2}\\.\\w{3}\\.\\w{2}.*'), 'True', null) AS abc;

Your use of \w seems to suggest you don't need delimited strings to be strictly [A-Z0-9] since word characters allow underscore and period. If all bets were off and the only requirement was to have . at 3rd, 7th and 10th position, you could have used like this way.
select 'AB.XGH.12.34.5670.89' like '__.___.__.%' ;

Related

Oracle query to find any special character in first position or end position of the field value

I have a table in Oracle database with special characters attached at first and last position in the field value. I want to eliminate those special characters while querying the table. I have used INSTR function but I had to apply for each and every special character using CASE expression.
Is there a way to eliminate any special characters that is attached only at first and last positions in one shot?
The query I am using as is below:
CASE WHEN
INSTR(emp_address,'"')=1 THEN REPLACE((emp_address,'"', '').
.
.
.
You can use regular expressions to replace the leading and trailing character of a string if they match the regular expression pattern. For example, if your definition of a "special character" is anything that is not an alpha-numeric character then you can use the regular expression:
^ the start-of-the-string then
[^[:alnum:]] any single character that does not match the POSIX alpha-numeric character group
| or
[^[:alnum:]] any single character that does not match the POSIX alpha-numeric character group then
$ the end-of-the-string.
Like this:
SELECT emp_address,
REGEXP_REPLACE(
emp_address,
'^[^[:alnum:]]|[^[:alnum:]]$'
) AS simplified_emp_address
FROM table_name
Which, for the sample data:
CREATE TABLE table_name (emp_address) AS
SELECT 'test' FROM DUAL UNION ALL
SELECT '"test2"' FROM DUAL UNION ALL
SELECT 'Not "this" one' FROM DUAL;
Outputs:
EMP_ADDRESS
SIMPLIFIED_EMP_ADDRESS
test
test
"test2"
test2
Not "this" one
Not "this" one
If you have a more complicated definition of a special character then change the regular expression appropriately.
db<>fiddle here

Oracle regex_replace not working as expected

I have following SQL query (Oracle 18c):
SELECT
--FIRST
translate(
' sOmE tEsT
eNdOfLiNe',
chr(10)||chr(11)||chr(13), 'replText'
) "Result1",
--SECOND
regexp_replace(
' sOmE tEsT
eNdOfLiNe',
'[\x0A|\x0B|`\x0D]', 'replText'
) "Result2",
--THIRD
regexp_replace(
' sOmE tEsT
eNdOfLiNe',
'[\r\n\t]', 'replText', 1, 0
) "Result3"
FROM dual
What I would like to do is replace all tabs, return carriages and new line indicators with new string but it seems like regexp replace is not working (returns initial text). I am really sorry about formatting but I need to handle text in exact format as above with \r \n \t mixed chars.
Here is fiddle: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=63834f9bcab93136635366f18c375b13
I am learning Oracle right now and don't understand why second and third solution returns initial text. The first solution seems to work but I would like to achieve the same effect in SECOND and THIRD solution. What I missed?
I'm pretty sure Oracle does not allow escape sequences in a character class. I believe this is what you have to do. In response to your comment on another answer here and as you are learning, regex is most definitely not regex. Especially Oracle's implementation.
EDIT to explain the regex: The regex pattern is building a string of a regex character class containing 3 characters, hence the concatenation. You can't just have escape characters in the regex as then regex would take those characters as part of the character class pattern itself.
SELECT REGEXP_REPLACE(
' sOmE tEsT
eNdOfLiNe', '['||CHR(9)||CHR(10)||CHR(13)||']', 'X') Result3
FROM dual;
RESULT3
------------------------------
sOmE tEsTXXXXXXXX eNdOfLiNe
1 row selected.
You can try the below using similar format as translate
select regexp_replace(
' sOmE tEsT
eNdOfLiNe',
chr(10)||'|'||chr(11)||'|'||chr(13), 'replText') "Result3"
FROM dual

Oracle string operation to exclude specific characters based on delimiter

From the String ES-123456-PSA Spain-101, I need to extract only ES-123456-101 Delimiter position is fixed.
Tried REGEXP_SUBSTR('ES-123456-PSA Spain-101','[^-]+',2,3 ) which gives PSA Spain.
Is there a way to ignore those specific characters and returns rest of them.
If you want ES-123456-101 then use this:
SELECT REGEXP_REPLACE('ES-123456-PSA Spain-101', '[^-]+-', '', 1, 3 )
FROM dual;
If you want ES-12345-101 then could you explain the logic for 12345 not 123456? Typo or omit the last character?
you can also use subtr and instr
with t as
(
select 'ES-123456-PSA Spain-101' as text from dual
)
select substr(text,1,instr(text,'-',1,2)) -- ES-123456-
||substr(text,instr(text,'-',1,3)+1) -- 101
from t

REGEXP_SUBSTR for portion of string

I would like to get:
82961_01B04WZXQQSUGJ4YMRRT2A7TRHK_MR_2_1of1
from the following expression
LASTNAME_FIRSTNAME_82961_01B04WZXQQSUGJ4YMRRT2A7TRHK_MR_2_1of1
Does someone know how I can get this using regexp_substr ?
EDIT
Basically I have a field which has 7 sets each separated by _ . The string I gave is just one example. I wanted to retrieve everything after the second _ . There is no fixed character length so I can not use a substr function. Hence I was using regexp_substr. I was able to get away by using a simplified version
Select FILE_NAME, ( (REGEXP_SUBSTR(FILE_NAME,'[^_]+_',1,3)) ||
(REGEXP_SUBSTR(FILE_NAME,'[^_]+_',1,4)) ||
(REGEXP_SUBSTR(FILE_NAME,'[^_]+_',1,5)) ||
(REGEXP_SUBSTR(FILE_NAME,'[^_]+_',1,6)) ||
(REGEXP_SUBSTR(FILE_NAME,'[^_]+',1,7)) ) as RegExp
from tbl
Here is some more data from the FILE_NAME field
LAST_FIRST_82961_01B04WZXQQSUGJ4YMRRT2A7TRHK_MR_2_1of1
SMITH_JOHN_82961_0130BPQX9QZN9G4P5RDTPA9HR4R_MR_1_1of1
LASTNAME_FIRSTNAME_99999_01V0MU4XUQK0Y24Y9RYTFA7W1CM_MR_3_1of1
To get everything after the second underscore, you do not need regular expressions, but can use something like the following:
select substr(FILE_NAME, instr(FILE_NAME, '_', 1, 2) +1 ) from tbl
The instr returns the position of the second occurrence of '_', starting by the first character; the substr simply gets everything starting from the position given by instr + 1
From your Requirement, you can just go ahead and use the simple SUBSTRfunction. Its faster, and it addresses the simple need to remove the String LASTNAME_FIRSTNAME.
select substr('LASTNAME_FIRSTNAME_82961_01B04WZXQQSUGJ4YMRRT2A7TRHK_MR_2_1of1', 20) data_string
from dual;
Output:
data_string
-----------------
82961_01B04WZXQQSUGJ4YMRRT2A7TRHK_MR_2_1of1
Unless you have another underlying logic you need to address?
Kindly clarify so i can edit the answer accordingly.

Oracle Regexp to replace \n,\r and \t with space

I am trying to select a column from a table that contains newline (NL) characters (and possibly others \n, \r, \t). I would like to use the REGEXP to select the data and replace (only these three) characters with a space, " ".
No need for regex. This can be done easily with the ASCII codes and boring old TRANSLATE()
select translate(your_column, chr(10)||chr(11)||chr(13), ' ')
from your_table;
This replaces newline, tab and carriage return with space.
TRANSLATE() is much more efficient than its regex equivalent. However, if your heart is set on that approach, you should know that we can reference ASCII codes in regex. So this statement is the regex version of the above.
select regexp_replace(your_column, '([\x0A|\x0B|`\x0D])', ' ')
from your_table;
The tweak is to reference the ASCII code in hexadecimal rather than base 10.
select translate(your_column, chr(10)||chr(11)||chr(13), ' ') from your_table;
to clean it is essential to serve non-null value as params ...
(oracle function basically will return null once 1 param is null, there are few excpetions like replace-functions)
select translate(your_column, ' '||chr(10)||chr(11)||chr(13), ' ') from your_table;
this examples uses ' '->' ' translation as dummy-value to prevent Null-Value in parameter 3

Resources