Regular Expression with Connect by

Regular Expression with Connect by - oracle

I have this query:
select regexp_substr('1,2,3,4,5','[^,]+',1,level)
from test1
connect by instr('1,2,3,4,5',',',1,level-1)>0;
Please help me to understand this query especially the use of a level and connect by and level-1.

LEVEL and CONNECT BY are used to generate sequences, see this simple example:
select level
from dual
connect by level < 10;
LEVEL
=======
1
2
3
4
5
6
7
8
9
instr('1,2,3,4,5', ',' , 1, level-1) > 0 counts the number of elements separated by comma maybe this version is easier to understand, it does the same:
select regexp_substr('1,2,3,4,5','[^,]+',1,level)
from dual
connect by LEVEL <= REGEXP_COUNT('1,2,3,4,5', ',')+1;
LEVEL <= REGEXP_COUNT('1,2,3,4,5', ',')+1 or instr('1,2,3,4,5', ',' , 1, level-1) > 0 means "run five times".
So, your select is basically a shortcut for
select regexp_substr('1,2,3,4,5','[^,]+', 1, 1) from dual union all
select regexp_substr('1,2,3,4,5','[^,]+', 1, 2) from dual union all
select regexp_substr('1,2,3,4,5','[^,]+', 1, 3) from dual union all
select regexp_substr('1,2,3,4,5','[^,]+', 1, 4) from dual union all
select regexp_substr('1,2,3,4,5','[^,]+', 1, 5) from dual;

#Wernfried's answer is excellent, but you should know there is a big risk with the regex of the format '[^,]+' which one sees often as an example of how to parse delimited strings.
This only works when all elements of the list are present. For an eye-opener, try it with the 2nd element as NULL:
select regexp_substr('1,,3,4,5', '[^,]+', 1, 3) from dual;
REGEXP_SUBSTR('1,,3,4,5','[^,]+',1,3)
-------------------------------------
4
1 row selected.
WHAT? '4' is most definitely NOT the 3rd element of that list! As you can see the NULL is not handled.
Please use this format of REGEXP_SUBSTR which does handle NULL list elements:
select regexp_substr('1,,3,4,5','(.*?)(,|$)', 1, 3, NULL, 1) from dual;
REGEXP_SUBSTR('1,,3,4,5','(.*?)(,|$)',1,3,NULL,1)
-------------------------------------------------
3
1 row selected.
The regex defines defines 2 groups, an optional set of any characters followed by a comma or the end of the line. The arguments to REGEXP_SUBSTR
say to return the 1st group of the 3rd instance of this match.

Related

How to select second split of column data from oracle database

I want to select the data from a Oracle table, whereas the table columns contains the data as , [ex : key,value] separated values; so here I want to select the second split i.e, value
table column data as below :
column_data
++++++++++++++
asper,worse
tincher,good
golder
null -- null values need to eliminate while selection
www,ewe
from the above data, desired output like below:
column_data
+++++++++++++
worse
good
golder
ewe
Please help me with the query

According to data you provided, here are two options:
result1: regular expressions one (get the 2nd word if it exists; otherwise, get the 1st one)
result2: SUBSTR + INSTR combination
SQL> with test (col) as
2 (select 'asper,worse' from dual union all
3 select 'tincher,good' from dual union all
4 select 'golder' from dual union all
5 select null from dual union all
6 select 'www,ewe' from dual
7 )
8 select col,
9 nvl(regexp_substr(col, '\w+', 1, 2), regexp_substr(col, '\w+', 1,1 )) result1,
10 --
11 nvl(substr(col, instr(col, ',') + 1), col) result2
12 from test
13 where col is not null;
COL RESULT1 RESULT2
------------ -------------------- --------------------
asper,worse worse worse
tincher,good good good
golder golder golder
www,ewe ewe ewe
SQL>

Specific Pattern Matching in Oracle

I have a close to 1000 records which have first two characters as alphabets and then an number of characters.
eg.
- BE123
- QT12124
- ST1000
- XY12345
and similar data....
I have a table X in Oracle having a column 'Serial Number' which will be having similar data , but it is of standard length 7 and also start with first two characters as alphabets.
I want to do a pattern matching on the column Serial Number where i can use LIKE and '%' matching on the characters after first two characters in the column for eg
if the column has a data
- BE00123 , it should start give me BE123 as matched data
- QT12124 , it is matched data
- ST11001 , unmatched data
- XY12345, matched data

Well, this returns what you asked. See if it helps.
SQL> with t_one (col) as
2 (select 'BE123' from dual union all
3 select 'QT12124' from dual union all
4 select 'ST1000' from dual union all
5 select 'XY12345' from dual
6 ),
7 t_two (col) as
8 (select 'BE00123' from dual union all
9 select 'QT12124' from dual union all
10 select 'ST11001' from dual union all
11 select 'XY12345' from dual
12 )
13 select o.col
14 from t_one o join t_two t on substr(o.col, 1, 2) = substr(t.col, 1, 2)
15 and instr(t.col, substr(o.col, 3)) > 0;
COL
-------
BE123
QT12124
XY12345
SQL>
Line 14: match first two characters
Line 15: check whether characters from T_ONE table, starting from position 3, are contained in the T_TWO table's column value

Parse Values with Delimiters Using REGEXP_SUPSTR in Oracle 10g

I have a table called TVL_DETAIL that contains column TVL_CD_LIST. Column TVL_CD_LIST contains three records:
TVL_CD_LIST:
M1180_Z6827
K5900_Z6828
I2510
I've used the following code in an attempt to return the values only(so excluding the underscore):
SELECT
TVL_CD_LIST
FROM TVL_DETAIL
WHERE TVL_CD_LIST IN (SELECT regexp_substr(TVL_CD_LIST,'[^_]+', 1, level) FROM DUAL
CONNECT BY regexp_substr(TVL_CD_LIST,'[^_]+', 1, level) IS NOT NULL)
What I was expecting to see returned in separate rows was:
M1180
Z6827
K5900
Z6828
I2510
But it only returns I2510(which is the original value that doesn't contain an underscore).
What am I doing wrong? Any help is appreciated. Thanks!

To answer your question, you are querying for the list where it matches a sub-element and that will only happen where the list is comprised of one element. What you really wanted to select are the sub-elements themselves.
Note: Explanation of why parsing strings using the regex form '[^_]+' is bad here: https://stackoverflow.com/a/31464699/2543416
You want to parse the list, selecting the elements:
SQL> with TVL_DETAIL(TVL_CD_LIST) as (
select 'M1180_Z6827' from dual union
select 'K5900_Z6828' from dual union
select 'I2510' from dual
)
SELECT distinct regexp_substr(TVL_CD_LIST, '(.*?)(_|$)', 1, level, NULL, 1) element
FROM TVL_DETAIL
CONNECT BY level <= LENGTH(regexp_replace(TVL_CD_LIST, '[^_]', '')) + 1;
-- 11g CONNECT BY level <= regexp_count(TVL_CD_LIST, '_') + 1;
ELEMENT
-----------
Z6827
K5900
M1180
I2510
Z6828
SQL>
And this is cool if you want to track by row and element within row:
SQL> with TVL_DETAIL(row_nbr, TVL_CD_LIST) as (
select 1, 'M1180_Z6827' from dual union
select 2, 'K5900_Z6828' from dual union
select 3, 'I2510' from dual
)
SELECT row_nbr, column_value substring_nbr,
regexp_substr(TVL_CD_LIST, '(.*?)(_|$)', 1, column_value, NULL, 1) element
FROM TVL_DETAIL,
TABLE(
CAST(
MULTISET(SELECT LEVEL
FROM dual
CONNECT BY level <= LENGTH(regexp_replace(TVL_CD_LIST, '[^_]', '')) + 1
-- 11g CONNECT BY LEVEL <= REGEXP_COUNT(TVL_CD_LIST, '_')+1
) AS sys.OdciNumberList
)
)
order by row_nbr, substring_nbr;
ROW_NBR SUBSTRING_NBR ELEMENT
---------- ------------- -----------
1 1 M1180
1 2 Z6827
2 1 K5900
2 2 Z6828
3 1 I2510
SQL>
EDIT: Oops, edited to work with 10g as REGEXP_COUNT is not available until 11g.

The query you have used creates the list but you are comparing the list of record with column it self using the in clause, as such M1180 or Z6827 cannot be equal to M1180_Z6827 and so for K5900_Z6828. I2510 has only one value so it gets matched.
You can use below query if your requirement is exactly what you have mentioned in your desired output.
SQL> WITH tvl_detail AS
2 (SELECT 'M1180_Z6827' tvl_cd_list FROM dual
3 UNION ALL
4 SELECT 'K5900_Z6828' FROM dual
5 UNION ALL
6 SELECT 'I2510' FROM dual)
7 ---------------------------
8 --- End of data preparation
9 ---------------------------
10 SELECT regexp_substr(tvl_cd_list, '[^_]+', 1, LEVEL) AS tvl_cd_list
11 FROM tvl_detail
12 CONNECT BY regexp_substr(tvl_cd_list, '[^_]+', 1, LEVEL) IS NOT NULL
13 AND PRIOR tvl_cd_list = tvl_cd_list
14 AND PRIOR sys_guid() IS NOT NULL;
OUTPUT:
TVL_CD_LIST
--------------------------------------------
I2510
K5900
Z6828
M1180
Z6827

how to add different column with one code into one column?

id text
1 hi
1 how are u
1 fine ?
2 rad
2 qey
3
I am searching for a query where it can let me insert id = 1 to another table in one column
id test
1 hi how are you fine ?
2 rad qey
with the listagg function , i will have such result : hi how are you fine ? .. can i have it such
hi
how are u
fine ?

This query can be used to achieve your result:
WITH tab(id,text) AS (
SELECT 1, 'hi' FROM dual UNION ALL
SELECT 1, 'how are u' FROM dual UNION ALL
SELECT 1, 'fine ?' FROM dual UNION ALL
SELECT 2, 'rad' FROM dual UNION ALL
SELECT 2, 'qey' FROM dual UNION ALL
SELECT 3, NULL FROM dual)
-----
--End of data
-----
SELECT ID,
listagg(text, ' ') within GROUP (ORDER BY ROWNUM) AS text
FROM tab
GROUP BY ID;
Output
ID TEXT
1 hi how are u fine ?
2 rad qey
3
But there is one problem, although order by rownum is used but you cannot guaranty the order of text, its better to have one more column in your table that can define the order of the text as below
id text order_text
1 hi 1
1 how are u 2
1 fine ? 3
2 rad 1
2 qey 2
3 1
and the use the query as
SELECT ID,
listagg(text, ' ') within GROUP (ORDER BY order_text) AS text
FROM tab
GROUP BY ID;

You can try the LISTAGG function.
For more information, see here.

CASE Oracle SQL for State

I have a field that can have one or multiple states listed in it (callcenter.stateimpact). If the callcenter.stateimpact contains "OK","TX","AK","TN","NC","SC","GA","FL","AL","MS" or "LA" I need the output field of the SQL to say "South" and if not those, the output needs to say "North". If the callcenter.stateimpact has both South & North states, it needs to say "BOTH" in the output. How do I do this in the Select statement? The fields in this table are callcenter.callid, callcenter.stateimpact, callcenter.callstart and callcenter.callstop. You help is greatly appreciated.

This is tough to explain, so there's a SQL Fiddle here that lays out the values involved.
The best approach I could come up with (other than normalizing the StateImpact value) was to use REGEXP_REPLACE to suck all the "South" states out of the string and then look at the length of what was left. First, here's what REGEXP_REPLACE(StateImpact, '(OK|TX|AK|TN|NC|SC|GA|FL|AL|MS|LA)') will do to a few sample values:
StateImpact REGEXP_REPLACE(StateImpact, '(OK|TX|AK|TN|NC|SC|GA|FL|AL|MS|LA)')
----------------------------- -----------------------------------------------------------------
OK,TX,AK,TN,NC,SC,GA,FL,AL,MS ,,,,,,,,,
MI,MA MI,MA
TX null
TX,MI,MA ,MI,MA
So if you're left with all commas or with a null, all the states were South. If you're left with the original string, all states were North. Anything else and it's Both. That makes for a pretty big and confusing CASE statement no matter how you write it. I went with comparing lengths before and after, like so:
Length after replace = 0 (or null): South
Length after replace = (length before + 1) * 3 - 1: South
Length after replace = length before replace: North
Anything else: Both
The second one above is just some math to account for the fact that if (for example) there are five states in StateImpact and they're all South, you'll be left with four commas. Hard to explain but it works :)
Here's the query:
SELECT
StateImpact,
CASE NVL(LENGTH(REGEXP_REPLACE(StateImpact, '(OK|TX|AK|TN|NC|SC|GA|FL|AL|MS|LA)')), 0)
WHEN LENGTH(StateImpact) THEN 'North'
WHEN (LENGTH(StateImpact) + 1) / 3 - 1 THEN 'South'
ELSE 'Both'
END AS RegionImpact
FROM CallCenter
The SQL Fiddle referenced above also shows the length before and after the REGEXP_REPLACE, which will hopefully help explain the calculations.

One of the ways to reach desired result is to use multiset operators.
But first we need to break string separated by , into rows. One of the way to do that is trick with connect by :
-- Trick with building resultset from tokenized string
with dtest_string as (
select 'OK,TX,AK,TN,NC,SC,GA,FL,AL,MS' StateImpact from dual
)
select
level lvl,
substr( -- Extract part of source string
StateImpact,
-- from N-th occurence of separator
decode( level, 1, 1, instr(StateImpact,',',1,level-1)+1 ),
-- with length of substring from N-th to (N+1)-th occurence of separator or to the end.
decode( instr(StateImpact,',',1,level), 0, length(StateImpact)+1, instr(StateImpact,',',1,level) )
-
decode( level, 1, 1, instr(StateImpact,',',1,level-1)+1 )
) code
from test_string
start with
StateImpact is not null -- no entries for empty string
connect by
instr(StateImpact,',',1,level-1) > 0 -- continue if separator found on previous step
Just for fun: same trick with ANSI syntax on SQLFiddle
Next, we need to declare type which we can use to store collections:
create or replace type TCodeList as table of varchar2(100);
After that it's possible to build a query:
with all_south_list as (
-- prepare list of south states
select 'OK' as code from dual union all
select 'TX' as code from dual union all
select 'AK' as code from dual union all
select 'TN' as code from dual union all
select 'NC' as code from dual union all
select 'SC' as code from dual union all
select 'GA' as code from dual union all
select 'FL' as code from dual union all
select 'AL' as code from dual union all
select 'MS' as code from dual union all
select 'LA' as code from dual
)
select
StateImpact,
-- Make decision based on counts
case
when total_count = 0 then 'None'
when total_count = south_count then 'South'
when south_count = 0 then 'North'
else 'Both'
end RegionImpact,
total_count,
south_count,
north_count
from (
select
StateImpact,
-- count total number of states in StateImpact
cardinality(code_list) total_count,
-- count number of south states in StateImpact
cardinality(code_list multiset intersect south_list) south_count,
-- count number of non-south states in StateImpact
cardinality(code_list multiset except south_list) north_count
from (
select
StateImpact,
(
cast(multiset( -- Convert set of values into collection which acts like a nested table
select -- same trick as above
substr(
StateImpact,
decode( level, 1, 1, instr(StateImpact,',',1,level-1)+1 ),
decode( instr(StateImpact,',',1,level), 0, length(StateImpact)+1, instr(StateImpact,',',1,level) )
-
decode( level, 1, 1, instr(StateImpact,',',1,level-1)+1 )
) code
from dual
start with StateImpact is not null
connect by instr(StateImpact,',',1,level-1) > 0
) as TCodeList
)
) code_list,
-- Build collection from south states list
cast(multiset(select code from all_south_list) as TCodeList) south_list
from
CallCenter
)
)
Link to SQLFiddle

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Regular Expression with Connect by - oracle

I have this query: select regexp_substr('1,2,3,4,5','[^,]+',1,level) from test1 connect by instr('1,2,3,4,5',',',1,level-1)>0; Please help me to understand this query especially the use of a level and connect by and level-1.

Related

How to select second split of column data from oracle database

Specific Pattern Matching in Oracle

Parse Values with Delimiters Using REGEXP_SUPSTR in Oracle 10g

how to add different column with one code into one column?

CASE Oracle SQL for State

Categories

Resources