Regexp_substr find string not matching a group of characters - oracle

I have a string like mystr = 'value1~|~value2~|~ ... valuen". I need it as one column separated on rows like this:
value1
value2
...
valuen
I'm trying this
select regexp_substr(mystr, '[^(~\|~)]', 1 , lvl) from dual, (select level as lvl from dual connect by level <= 5);
The problem is that ~|~ is not treated as a group, if I add ~ to anywhere in the string it gets separated; also () are treated as separators.
Any help is highly appreciated! Thanks! ~|~

Quick and dirty solution:
with t as (
select rtrim(regexp_substr('value1~|~value2~|~value3~|~value4', '(.+?)($|~\|~)', 1,level,''),'~|~')value from dual connect by level<10
) select * from t where value is not null;

[] signifies a single character match and [^] signifies a single character that does not match any of the contained characters.
So [^(~\|~)] will match any one character that is not ( or ~ or \ or | or ~ (again) or ).
What you want is a match that is terminated by your separator:
SELECT REGEXP_SUBSTR(
mystr,
'(.*?)(~\|~)',
1,
LEVEL,
NULL,
1
)
FROM DUAL
CONNECT BY LEVEL < REGEXP_COUNT( mystr, '(.*?)(~\|~)' );
(or if you cannot have zero-width matches, you can use the regular expression '(.+?)(~\|~)' and <= in the CONNECT BY clause.)

This will parse the delimited list and the format of the regex will handle NULL list elements should they occur as shown in the example.
SQL> with tbl(str) as (
select 'value1~|~value2~|~~|~value4' from dual
)
select regexp_substr(str, '(.*?)(~\|~|$)', 1, level, NULL, 1) parsed
from tbl
connect by level <= regexp_count(str, '~\|~')+1;
PARSED
--------------------------------
value1
value2
value4
SQL>

Related

How to replace string using Regexp_Replace in oracle

I want to replace this:
"STORES/KOL#10/8/36#1718.00#4165570.00#119539388#PT3624496#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL"
with this:
"STORES/KOL#10#8#36#1718.00#4165570.00#119539388#PT3624496#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL"
basically this is conditional based replace I want to replace / with #
like STORES/KOL string should be STORES/KOL
but 10/8/36 string should be 10#8#36
This will replace the 2nd and 3rd / character with a #:
Oracle Setup:
CREATE TABLE test_data ( value ) AS
SELECT '"STORES/KOL#10/8/36#1718.00#4165570.00#119539388#PT3624496#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL"'
FROM DUAL;
Query:
SELECT REGEXP_REPLACE(
value,
'^(.*?/.*?)/(.*?)/(.*)$',
'\1#\2#\3'
) AS replacement
FROM test_data
Output:
| REPLACEMENT |
| :---------------------------------------------------------------------------------------------------------------- |
| "STORES/KOL#10#8#36#1718.00#4165570.00#119539388#PT3624496#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL" |
db<>fiddle here
Here is one option using REGEXP_REPLACE. We can try targeting the following regex pattern:
#(\d+)/(\d+)/(\d+)#
Then replace using the three capture groups, replacing the path separators with pound signs.
WITH yourTable AS (
SELECT 'STORES/KOL#10/8/36#1718.00#4165570.00#119539388#PT3624496#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL' AS input FROM dual
)
SELECT
input,
REGEXP_REPLACE(input, '#(\d+)/(\d+)/(\d+)#', '#\1#\2#\3#') AS output
FROM yourTable;
Demo
Whether or not this regex replacement is specific enough/accurate for the rest of your data depends on that data, which you never showed us.
with s as (select '"STORES/KOL#10/8/36#1718.00#4165570.00#119539388#PT3624496#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL"' str from dual)
select
replace(replace(str, '/', '#'), 'STORES#KOL', 'STORES/KOL') result_str_1,
regexp_replace(str, '(\d)/', '\1#') result_str_2
from s;

PL/SQL - Split string into an associative array

In plsql is there a way to split a string into an associative array?
Sample string: 'test1:First string, test2: Second string, test3: Third string'
INTO
TYPE as_array IS TABLE OF VARCHAR2(50) INDEX BY VARCHAR2(50);
a_array as_array;
dbms_output.put_line(a_array('test1')); // Output 'First string'
dbms_output.put_line(a_array('test2')); // Output 'Second string'
dbms_output.put_line(a_array('test3')); // Output 'Third string'
The format of the string does not matter for my purposes. It could be 'test1-First string; test2-Second string; test3-Third string'. I could do this with a very large function manually splitting by commas first and then splitting each of those but I'm wondering if there is something built in to the language.
Like I said, I am not looking to do it through a large function (especially using substr and making it look messy). I am looking for something that does my task simpler.
There is no built in function for such a requirement.
But you can easily build a query like below to parse these strings:
SELECT y.*
FROM (
select trim(regexp_substr(str,'[^,]+', 1, level)) as str1
from (
SELECT 'test1:First string, test2: Second string, test3: Third string' as Str
FROM dual
)
connect by regexp_substr(str, '[^,]+', 1, level) is not null
) x
CROSS APPLY(
select trim(regexp_substr(str1,'[^:]+', 1, 1)) as key,
trim(regexp_substr(str1,'[^:]+', 1, 2)) as value
from dual
) y
KEY VALUE
------ --------------
test1 First string
test2 Second string
test3 Third string
Then you may use this query in your function and pass it's result to the array.
I leave this exercise for you, I believe you can manage it (tip: use Oracle's bulk collect feature)
This method handles NULL list elements if you need to still show that element 2 is NULL for example. Note the second element is NULL:
-- Original data with multiple delimiters and a NULL element for testing.
with orig_data(str) as (
select 'test1:First string,, test3: Third string' from dual
),
--Split on first delimiter (comma)
Parsed_data(rec) as (
select regexp_substr(str, '(.*?)(,|$)', 1, LEVEL, NULL, 1)
from orig_data
where str is not null
CONNECT BY LEVEL <= REGEXP_COUNT(str, ',') + 1
)
-- For testing-shows records based on 1st level delimiter
--select rec from parsed_data;
-- Split the record into columns
select trim(regexp_replace(rec, '^(.*):.*', '\1')) key,
trim(regexp_replace(rec, '^.*:(.*)', '\1')) value
from Parsed_data;
Watch out for the regex form of [^,]+ for parsing delimited strings, it fails on NULL elements. More Information

Regex is not handling chr(10) character

SELECT
REGEXP_SUBSTR (
str,
'(.*?)(~-delim~-|$)',
1,
LEVEL,
NULL,
1
) output
FROM (
SELECT 'Line1~-delim~-Line2' AS str FROM DUAL
)
CONNECT BY LEVEL <= REGEXP_COUNT (str, '~-delim~-') + 1
Output is
Line1
Line2
SELECT
REGEXP_SUBSTR (
str,
'(.*?)(~-delim~-|$)',
1,
LEVEL,
NULL,
1
) output
FROM (
SELECT 'Line1'||chr(10)||'~-delim~-Line2' AS str FROM DUAL
)
CONNECT BY LEVEL <= REGEXP_COUNT (str, '~-delim~-') + 1
Output is
Line2
Why the the newline is causing the string to suffer? I would expect the output to be:
Line1
Line2
By default, the Oracle regexp engine does not match the wildcard . to the newline character (chr(10)).
You can change that behavior by using the fifth argument to regexp_substr. Currently you assigned null to it. Change that to 'n' (including the single-quotes) and try again. You will get what you need.
Then check the documentation for REGEXP_SUBSTR, specifically about the fifth parameter - you will see what other options are available. You may not need them now, but perhaps you will remember them when you need them in the future.

How to apply regular expression on the below given string

i have a string 'MCDONALD_YYYYMMDD.TXT' i need to use regular expressions and append the '**' after the letter 'D' in the string given . (i.e In the string at postion 9 i need to append '*' based on a column value 'star_len'
if the star_len = 2 the o/p = ''MCDONALD??_YYYYMMDD.TXT'
if the star_len = 1 the o/p = ''MCDONALD?_YYYYMMDD.TXT'
with
inputs ( filename, position, symbol, len ) as (
select 'MCDONALD_20170812.TXT', 9, '*', 2 from dual
)
-- End of simulated inputs (for testing purposes only, not part of the solution).
-- SQL query begins BELOW THIS LINE.
select substr(filename, 1, position - 1) || rpad(symbol, len, symbol)
|| substr(filename, position) as new_str
from inputs
;
NEW_STR
-----------------------
MCDONALD**_20170812.TXT
select regexp_replace('MCDONALD_YYYYMMDD.TXT','MCDONALD','MCDONALD' ||
decode(star_len,1,'*',2,'**'))
from dual
This is how you could do it. I don't think you need it as a regular expression though if it is always going to be "MCDONALD".
EDIT: If you need to be providing the position in the string as well, I think a regular old substring should work.
select substr('MCDONALD_YYYYMMDD.TXT',1,position-1) ||
decode(star_len,1,'*',2,'**') || substr('MCDONALD_YYYYMMDD.TXT',position)
from dual
Where position and star_len are both columns in some table you provide(instead of dual).
EDIT2: Just to be more clear, here is another example using a with clause so that it runs without adding a table in.
with testing as
(select 'MCDONALD_YYYYMMDD.TXT' filename,
9 positionnum,
2 star_len
from dual)
select substr(filename,1,positionnum-1) ||
decode(star_len,1,'*',2,'**') ||
substr(filename,positionnum)
from testing
For the fun of it, here's a regex_replace solution. I went with a star since that what your variable was called even though your example used a question mark. The regex captures the filename string in 2 parts, the first being from the start up to 1 character before the position value, the second the rest of the string. The replace puts the captured parts back together with the stars in between.
with tbl(filename, position, star_len ) as (
select 'MCDONALD_20170812.TXT', 9, 2 from dual
)
select regexp_replace(filename,
'^(.{'||(position-1)||'})(.*)$', '\1'||rpad('*', star_len, '*')||'\2') as fixed
from tbl;

trim value till specified string in oracle pl/sql

i want to trim value of the given string till specified string in oracle pl/sql.
some thing like below.
OyeBuddy$$flex-Flex_Image_Rotator-1443680885520.
In the above string i want to trim till $$ so that i will get "flex-Flex_Image_Rotator-1443680885520".
You can use different ways; here are two methods, with and without regexp:
with test(string) as ( select 'OyeBuddy$$flex-Flex_Image_Rotator-1443680885520.' from dual)
select regexp_replace(string, '(.*)(\$\$)(.*)', '\3')
from test
union all
select substr(string, instr(string, '$$') + length('$$'))
from test
You want to do a SUBSTR where the starting position is going to be the position of '$$' + 2 . +2 is because the string '$$' is of length 2, and we don't want to include that string in the result.
Something like -
SELECT SUBSTR (
'ABCDEF$$some_big_text',
INSTR ('ABCDEF$$some_big_text', '$$') + 2)
FROM DUAL;

Resources