Oracle regexp_replace to pick out pattern matching groups

I'm struggling to extract the groups that match a pattern from a string in Oracle 11g.
It's nearly working, but I don't understand why the final non-matching part is still there:
select regexp_replace('ST1_12 text1, KG32_1 text2, VI7_08 text3','.*?(\w+\d+_\d+).*','\1,') c1
from dual
Current result: ST1_12,KG32_1,VI7_08, text3
Expected result: ST1_12,KG32_1,VI7_08
It seems to me that the trailing part is not covered by the search pattern and is simply appended to the result, but how do I get rid of it?

After it matches the third group it starts looking for the next match from text3; the trailing .* is effectively ignored. For the earlier groups that's what you want - otherwise the trailing .* on the first group would include the rest of the string and you'd lose the other groups. When it starts from text3 it doesn't find another match, so the original value (at that point) is returned.
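The "no match, so the value is returned as-is" step can be seen in isolation by running the same pattern against only the leftover tail. This is a minimal sketch; the literal ' text3' is an assumption standing in for whatever remains after the third match.

```sql
-- The tail contains no \w+\d+_\d+ token (there is no underscore at all),
-- so the pattern cannot match and regexp_replace returns its input unchanged.
select regexp_replace(' text3', '.*?(\w+\d+_\d+).*', '\1,') as c1
from dual;
```

Because nothing matches, the result is still ' text3', which is exactly the piece left glued to the end of the original output.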
If the values are always comma-separated then you could add a comma or end-of-string anchor to the pattern, so that the remaining text is consumed by the match but is not part of \1:
select regexp_replace('ST1_12 text1, KG32_1 text2, VI7_08 text3','.*?(\w+\d+_\d+).*(,|$)', '\1,') c1
from dual;
ST1_12,KG32_1,VI7_08,
You can use a trim function to get rid of the trailing comma:
select rtrim(regexp_replace('ST1_12 text1, KG32_1 text2, VI7_08 text3','.*?(\w+\d+_\d+).*(,|$)','\1,', 1, 0, null),
',') as c1
from dual;
ST1_12,KG32_1,VI7_08
Another option, which doesn't rely on the commas existing, is to split the string into multiple values:
select regexp_substr('ST1_12 text1, KG32_1 text2, VI7_08 text3', '(\w+\d+_\d+)', 1, level, null, 1) as c1
from dual
connect by level <= regexp_count('ST1_12 text1, KG32_1 text2, VI7_08 text3', '(\w+\d+_\d+)');
ST1_12
KG32_1
VI7_08
and then aggregate them back together:
select listagg(
regexp_substr('ST1_12 text1, KG32_1 text2, VI7_08 text3', '(\w+\d+_\d+)', 1, level, null, 1),
',') within group (order by level) as c1
from dual
connect by level <= regexp_count('ST1_12 text1, KG32_1 text2, VI7_08 text3', '(\w+\d+_\d+)');
ST1_12,KG32_1,VI7_08

Related

Find if the fifth position is a letter and not a number using ORACLE

How can I find if the fifth position is a letter and thus not a number using Oracle?
My last try was using the following statement:
REGEXP_LIKE (table_column, '([abcdefghijklmnopqrstuvxyz])');
Perhaps you'd rather check whether the 5th position contains a digit (which means that it is not something else), i.e. do the opposite of what you're doing now.
Why? Because a "letter" isn't only ASCII; have a look at the 4th row in my example. It contains Croatian characters, and these aren't in [a-z] (nor in [A-Z]).
with test (col) as
(select 'abc_3def' from dual union all
select 'A435D887' from dual union all
select '!#$%&/()' from dual union all
select 'ASDĐŠŽĆČ' from dual
)
select col,
case when regexp_like(substr(col, 5, 1), '\d+') then 'number'
else 'not a number'
end result
from test;
COL      RESULT
-------- ------------
abc_3def number
A435D887 not a number
!#$%&/() not a number
ASDĐŠŽĆČ not a number
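Anchor the pattern to the start of the string, or you may get unexpected results. The query below returns no rows, because the fifth character of 'abcd4fg' is a digit. Remove the caret (the start-of-string anchor) and it returns 'TRUE', because .{4}[A-Z] can then match further along the string (for example 'bcd4f'). Note that it uses the case-insensitive flag 'i'.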
select 'TRUE'
from dual
where regexp_like('abcd4fg', '^.{4}[A-Z]', 'i');
Yet another way to do it:
regexp_like(table_column, '^....[[:alpha:]]')
The character class [[:alpha:]] picks up all letters (upper case, lower case, accented and so on) but ignores digits, punctuation and whitespace characters.
If what you care about is that the character is not a number, then use
not regexp_like(table_column, '^....[[:digit:]]')
or
not regexp_like(table_column, '^....\d')
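The two character classes above can be combined into one query that labels the fifth character as a digit, a letter, or anything else. This is only a sketch: the inline sample rows and the column alias fifth_char are made up for illustration, and SUBSTR is used to isolate the character instead of anchoring with ^.....

```sql
-- Classify the 5th character of each sample value.
-- Values shorter than 5 characters would yield NULL from SUBSTR
-- and therefore fall through to 'other'.
with test (col) as (
  select 'abc_3def' from dual union all
  select 'A435D887' from dual union all
  select '!#$%&/()' from dual
)
select col,
       case
         when regexp_like(substr(col, 5, 1), '[[:digit:]]') then 'digit'
         when regexp_like(substr(col, 5, 1), '[[:alpha:]]') then 'letter'
         else 'other'
       end as fifth_char
from test;
```

For these three rows the fifth characters are 3, D and &, so the labels come out as digit, letter and other respectively.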
Try:
REGEXP_LIKE (table_column, '^....[a-z]')
Or:
SUBSTR (table_column, 5, 1 ) BETWEEN 'a' AND 'z'

PL/SQL - Split string into an associative array

In plsql is there a way to split a string into an associative array?
Sample string: 'test1:First string, test2: Second string, test3: Third string'
INTO
TYPE as_array IS TABLE OF VARCHAR2(50) INDEX BY VARCHAR2(50);
a_array as_array;
dbms_output.put_line(a_array('test1')); -- Output 'First string'
dbms_output.put_line(a_array('test2')); -- Output 'Second string'
dbms_output.put_line(a_array('test3')); -- Output 'Third string'
The format of the string does not matter for my purposes. It could be 'test1-First string; test2-Second string; test3-Third string'. I could do this with a very large function manually splitting by commas first and then splitting each of those but I'm wondering if there is something built in to the language.
Like I said, I am not looking to do it through a large function (especially using substr and making it look messy). I am looking for something that does my task simpler.
There is no built-in function for such a requirement.
But you can easily build a query like the one below to parse these strings:
SELECT y.*
FROM (
select trim(regexp_substr(str,'[^,]+', 1, level)) as str1
from (
SELECT 'test1:First string, test2: Second string, test3: Third string' as Str
FROM dual
)
connect by regexp_substr(str, '[^,]+', 1, level) is not null
) x
CROSS APPLY(
select trim(regexp_substr(str1,'[^:]+', 1, 1)) as key,
trim(regexp_substr(str1,'[^:]+', 1, 2)) as value
from dual
) y
KEY VALUE
------ --------------
test1 First string
test2 Second string
test3 Third string
Then you may use this query in your function and pass its result to the array.
I leave this as an exercise for you; I believe you can manage it (tip: use Oracle's bulk collect feature).
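One possible way to finish that exercise is sketched below. It makes a few assumptions that are not in the original answer: the variable l_str and the aliases item, k and v are hypothetical names, the CROSS APPLY is replaced by a plain inline view so the block does not depend on 12c syntax, and a cursor FOR loop is used instead of BULK COLLECT because, as far as I am aware, bulk fetches cannot target an associative array indexed by VARCHAR2.

```sql
DECLARE
  TYPE as_array IS TABLE OF VARCHAR2(50) INDEX BY VARCHAR2(50);
  a_array as_array;
  -- Hypothetical input variable holding the sample string from the question.
  l_str VARCHAR2(200) :=
    'test1:First string, test2: Second string, test3: Third string';
BEGIN
  -- Split on commas first, then split each piece on the colon.
  FOR r IN (
    SELECT TRIM(REGEXP_SUBSTR(item, '[^:]+', 1, 1)) AS k,
           TRIM(REGEXP_SUBSTR(item, '[^:]+', 1, 2)) AS v
    FROM (
      SELECT REGEXP_SUBSTR(l_str, '[^,]+', 1, LEVEL) AS item
      FROM dual
      CONNECT BY REGEXP_SUBSTR(l_str, '[^,]+', 1, LEVEL) IS NOT NULL
    )
  ) LOOP
    a_array(r.k) := r.v;  -- key -> value
  END LOOP;

  DBMS_OUTPUT.PUT_LINE(a_array('test1'));
  DBMS_OUTPUT.PUT_LINE(a_array('test2'));
  DBMS_OUTPUT.PUT_LINE(a_array('test3'));
END;
/
```

With server output enabled this should print the three values from the question. Note that the [^,]+ form skips empty list elements, so it is only suitable when every element is populated.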
This method handles NULL list elements, in case you still need to show, for example, that element 2 is NULL. Note that the second element in the test data is NULL:
-- Original data with multiple delimiters and a NULL element for testing.
with orig_data(str) as (
select 'test1:First string,, test3: Third string' from dual
),
--Split on first delimiter (comma)
Parsed_data(rec) as (
select regexp_substr(str, '(.*?)(,|$)', 1, LEVEL, NULL, 1)
from orig_data
where str is not null
CONNECT BY LEVEL <= REGEXP_COUNT(str, ',') + 1
)
-- For testing-shows records based on 1st level delimiter
--select rec from parsed_data;
-- Split the record into columns
select trim(regexp_replace(rec, '^(.*):.*', '\1')) key,
trim(regexp_replace(rec, '^.*:(.*)', '\1')) value
from Parsed_data;
Watch out for the [^,]+ form of the regex when parsing delimited strings: it fails on NULL elements.

Regexp_substr find string not matching a group of characters

I have a string like mystr = 'value1~|~value2~|~ ... valuen'. I need it as one column separated on rows like this:
value1
value2
...
valuen
I'm trying this
select regexp_substr(mystr, '[^(~\|~)]', 1 , lvl) from dual, (select level as lvl from dual connect by level <= 5);
The problem is that ~|~ is not treated as a group: if I add ~ anywhere in the string, the string gets split there, and ( and ) are treated as separators too.
Any help is highly appreciated! Thanks! ~|~
Quick and dirty solution:
with t as (
select rtrim(regexp_substr('value1~|~value2~|~value3~|~value4', '(.+?)($|~\|~)', 1,level,''),'~|~')value from dual connect by level<10
) select * from t where value is not null;
[] signifies a single character match and [^] signifies a single character that does not match any of the contained characters.
So [^(~\|~)] will match any one character that is not ( or ~ or \ or | or ~ (again) or ).
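A small check makes this concrete. The sample literal 'ab~cd' is an assumption chosen for illustration; it contains a single ~ rather than the full ~|~ separator.

```sql
-- A lone ~ is enough to end a run of "allowed" characters,
-- because the bracket expression excludes ~ as an individual character.
select regexp_substr('ab~cd', '[^(~\|~)]+', 1, 1) as first_piece,
       regexp_substr('ab~cd', '[^(~\|~)]+', 1, 2) as second_piece
from dual;
```

The two pieces come back as ab and cd, even though the input never contained the intended ~|~ separator.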
What you want is a match that is terminated by your separator:
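SELECT REGEXP_SUBSTR(
mystr,
'(.*?)(~\|~|$)',
1,
LEVEL,
NULL,
1
)
FROM DUAL
CONNECT BY LEVEL < REGEXP_COUNT( mystr, '(.*?)(~\|~|$)' );
The end-of-string alternative lets the last value match even though no separator follows it. It also allows one extra zero-width match at the very end of the string, which is why the CONNECT BY clause uses < rather than <=. (If you cannot have zero-width matches, you can use the regular expression '(.+?)(~\|~|$)' and <= in the CONNECT BY clause.)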
This parses the delimited list, and this form of the regex handles NULL list elements should they occur. In the example the third element is NULL and comes back as an empty row:
with tbl(str) as (
select 'value1~|~value2~|~~|~value4' from dual
)
select regexp_substr(str, '(.*?)(~\|~|$)', 1, level, NULL, 1) parsed
from tbl
connect by level <= regexp_count(str, '~\|~')+1;
PARSED
--------------------------------
value1
value2

value4

trim value till specified string in oracle pl/sql

I want to trim a given string up to a specified substring in Oracle PL/SQL.
For example:
OyeBuddy$$flex-Flex_Image_Rotator-1443680885520.
In the above string I want to trim everything up to and including $$, so that I get "flex-Flex_Image_Rotator-1443680885520".
You can use different ways; here are two methods, with and without regexp:
with test(string) as ( select 'OyeBuddy$$flex-Flex_Image_Rotator-1443680885520.' from dual)
select regexp_replace(string, '(.*)(\$\$)(.*)', '\3')
from test
union all
select substr(string, instr(string, '$$') + length('$$'))
from test
You want a SUBSTR whose starting position is the position of '$$' plus 2. The + 2 is there because the string '$$' is two characters long and should not be included in the result.
Something like:
SELECT SUBSTR (
'ABCDEF$$some_big_text',
INSTR ('ABCDEF$$some_big_text', '$$') + 2)
FROM DUAL;
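One edge case the SUBSTR/INSTR approach does not cover is a string without the delimiter. INSTR then returns 0, so the start position becomes 2 and the first character is silently dropped. The sketch below guards against that; the second sample row and the alias trimmed are assumptions added for illustration, and returning the string unchanged is just one possible policy.

```sql
-- Only cut when the delimiter is actually present;
-- otherwise return the string as-is.
with test (string) as (
  select 'OyeBuddy$$flex-Flex_Image_Rotator-1443680885520' from dual union all
  select 'no_delimiter_here' from dual
)
select string,
       case
         when instr(string, '$$') > 0
           then substr(string, instr(string, '$$') + length('$$'))
         else string
       end as trimmed
from test;
```

The first row yields flex-Flex_Image_Rotator-1443680885520 and the second row is returned unchanged.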

How to split varchar in oracle

I have a procedure that takes an array of strings as an input parameter.
Each string looks like 5-Deal deleted.
I want to split this varchar into 5 and Deal deleted.
The split delimiter is the hyphen (-).
Try using regexp_substr:
select regexp_substr ('5-Deal deleted' , '[^-]+', 1, level) split
from dual
connect by level <= length (regexp_replace ('5-Deal deleted' , '[^-]+')) + 1;
Then you can use BULK COLLECT INTO to store the result in a variable.
For such a simple task, I would go with SUBSTR and INSTR; a regular expression would be unnecessarily resource-intensive here.
INSTR finds the position of the hyphen, and SUBSTR picks out the required portion of the string.
Or,
if all the rows look like your example data, just extract the digit and alpha parts from the string and concatenate them. This would obviously need regular expressions.
Try this:
SELECT SUBSTR ('5-Deal deleted', 1, INSTR ('5-Deal deleted', '-') - 1)
AS FIRST,
SUBSTR ('5-Deal deleted', INSTR ('5-Deal deleted', '-') + 1) AS second
FROM DUAL
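Since the question is about a procedure, the same SUBSTR/INSTR logic can be sketched in PL/SQL. The variable names (l_input, l_pos, l_code, l_text) are hypothetical, and l_input stands in for one element of the procedure's input array. The ELSE branch, which keeps the whole string as the first part when there is no hyphen, is an assumption about the desired behaviour.

```sql
DECLARE
  l_input VARCHAR2(100) := '5-Deal deleted';  -- stands in for one array element
  l_pos   PLS_INTEGER;
  l_code  VARCHAR2(100);
  l_text  VARCHAR2(100);
BEGIN
  l_pos := INSTR(l_input, '-');  -- position of the first hyphen, 0 if absent
  IF l_pos > 0 THEN
    l_code := SUBSTR(l_input, 1, l_pos - 1);  -- part before the hyphen
    l_text := SUBSTR(l_input, l_pos + 1);     -- part after the hyphen
  ELSE
    l_code := l_input;  -- no hyphen: keep everything in the first part
    l_text := NULL;
  END IF;
  DBMS_OUTPUT.PUT_LINE('code=' || l_code || ', text=' || l_text);
END;
/
```

With server output enabled this prints code=5, text=Deal deleted. Splitting on the first hyphen only means any later hyphens stay in the text part.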
