REGEXP to capture values delimited by a set of delimiters - oracle

My column value looks something like below: [Just an example i created]
{BASICINFOxxxFyyy100x} {CONTACTxxx12345yyy20202x}
It can contain 0 or more blocks of data... I have created the below query to split the blocks
with x as
(select
'{BASICINFOxxxFyyy100x}{CONTACTxxx12345yyy20202x}' a from dual)
select REGEXP_SUBSTR(a,'({.*?x})',1,rownum,null,1)
from x
connect by rownum <= REGEXP_COUNT(a,'x}')
However I would like to further split the output into 3 columns like below:
ColumnA | ColumnB | ColumnC
------------------------------
BASICINFO | F |100
CONTACT | 12345 |20202
The delimiters are always standard. I failed to create a pretty query which gives me the desired output.
Thanks in advance.

SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE your_table ( str ) AS
SELECT '{BASICINFOxxxFyyy100x}{CONTACTxxx12345yyy20202x}' from dual
/
Query 1:
select REGEXP_SUBSTR(
t.str,
'\{([^}]*?)xxx([^}]*?)yyy([^}]*?)x\}',
1,
l.COLUMN_VALUE,
NULL,
1
) AS col1,
REGEXP_SUBSTR(
str,
'\{([^}]*?)xxx([^}]*?)yyy([^}]*?)x\}',
1,
l.COLUMN_VALUE,
NULL,
2
) AS col2,
REGEXP_SUBSTR(
str,
'\{([^}]*?)xxx([^}]*?)yyy([^}]*?)x\}',
1,
l.COLUMN_VALUE,
NULL,
3
) AS col3
FROM your_table t
CROSS JOIN
TABLE(
CAST(
MULTISET(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.str,'\{([^}]*?)xxx([^}]*?)yyy([^}]*?)x\}')
) AS SYS.ODCINUMBERLIST
)
) l
Results:
| COL1 | COL2 | COL3 |
|-----------|-------|-------|
| BASICINFO | F | 100 |
| CONTACT | 12345 | 20202 |
Note:
Your query:
select REGEXP_SUBSTR(a,'({.*?x})',1,rownum,null,1)
from x
connect by rownum <= REGEXP_COUNT(a,'x}')
Will not work when you have multiple rows of input - In the CONNECT BY clause, the hierarchical query has nothing to restrict it connecting Row1-Level2 to Row1-Level1 or to Row2-Level1 so it will connect it to both and as the depth of the hierarchies gets greater it will create exponentially more duplicate copies of the output rows. There are hacks you can use to stop this but it is much more efficient to put the row generator into a correlated sub-query which can then be CROSS JOINed back to the original table (it is correlated so it won't join to the wrong rows) if you are going to use hierarchical queries.
Better yet would be to fix your data structure so you are not storing multiple values in delimited strings.

SQL> with x as
2 (select '{BASICINFOxxxFyyy100x}{CONTACTxxx12345yyy20202x}' a from dual
3 ),
4 y as (
5 select REGEXP_SUBSTR(a,'({.*?x})',1,rownum,null,1) c1
6 from x
7 connect by rownum <= REGEXP_COUNT(a,'x}')
8 )
9 select
10 substr(c1,2,instr(c1,'xxx')-2) z1,
11 substr(c1,instr(c1,'xxx')+3,instr(c1,'yyy')-instr(c1,'xxx')-3) z2,
12 rtrim(substr(c1,instr(c1,'yyy')+3),'x}') z3
13 from y;
Z1 Z2 Z3
--------------- --------------- ---------------
BASICINFO F 100
CONTACT 12345 20202

Here is another solution, which is derived from the place you left. Your query had already resulted into splitting of a row to 2 row. Below will make it in 3 columns:
WITH x
AS (SELECT '{BASICINFOxxxFyyy100x}{CONTACTxxx12345yyy20202x}' a
FROM DUAL),
-- Your query result here
tbl
AS ( SELECT REGEXP_SUBSTR (a,
'({.*?x})',
1,
ROWNUM,
NULL,
1)
Col
FROM x
CONNECT BY ROWNUM <= REGEXP_COUNT (a, 'x}'))
--- Actual Query
SELECT col,
REGEXP_SUBSTR (col,
'(.*?{)([^x]+)',
1,
1,
'',
2)
AS COL1,
REGEXP_SUBSTR (REGEXP_SUBSTR (col,
'(.*?)([^x]+)',
1,
2,
'',
2),
'[^y]+',
1,
1)
AS COL2,
REGEXP_SUBSTR (REGEXP_SUBSTR (col,
'[^y]+x',
1,
2),
'[^x]+',
1,
1)
AS COL3
FROM tbl;
Output:
SQL> /
COL COL1 COL2 COL3
------------------------------------------------ ------------------------------------------------ ------------------------------------------------ ------------------------------------------------
{BASICINFOxxxFyyy100x} BASICINFO F 100
{CONTACTxxx12345yyy20202x} CONTACT 12345 20202

Related

ORACLE - How to use LAG to display strings from all previous rows into current row

I have data like below:
group
seq
activity
A
1
scan
A
2
visit
A
3
pay
B
1
drink
B
2
rest
I expect to have 1 new column "hist" like below:
group
seq
activity
hist
A
1
scan
NULL
A
2
visit
scan
A
3
pay
scan, visit
B
1
drink
NULL
B
2
rest
drink
I was trying to solve with LAG function, but LAG only returns one row from previous instead of multiple.
Truly appreciate any help!
Use a correlated sub-query:
SELECT t.*,
(SELECT LISTAGG(activity, ',') WITHIN GROUP (ORDER BY seq)
FROM table_name l
WHERE t."GROUP" = l."GROUP"
AND l.seq < t.seq
) AS hist
FROM table_name t
Or a hierarchical query:
SELECT t.*,
SUBSTR(SYS_CONNECT_BY_PATH(PRIOR activity, ','), 3) AS hist
FROM table_name t
START WITH seq = 1
CONNECT BY
PRIOR seq + 1 = seq
AND PRIOR "GROUP" = "GROUP"
Or a recursive sub-query factoring clause:
WITH rsqfc ("GROUP", seq, activity, hist) AS (
SELECT "GROUP", seq, activity, NULL
FROM table_name
WHERE seq = 1
UNION ALL
SELECT t."GROUP", t.seq, t.activity, r.hist || ',' || r.activity
FROM rsqfc r
INNER JOIN table_name t
ON (r."GROUP" = t."GROUP" AND r.seq + 1 = t.seq)
)
SEARCH DEPTH FIRST BY "GROUP" SET order_rn
SELECT "GROUP", seq, activity, SUBSTR(hist, 2) AS hist
FROM rsqfc
Which, for the sample data:
CREATE TABLE table_name ("GROUP", seq, activity) AS
SELECT 'A', 1, 'scan' FROM DUAL UNION ALL
SELECT 'A', 2, 'visit' FROM DUAL UNION ALL
SELECT 'A', 3, 'pay' FROM DUAL UNION ALL
SELECT 'B', 1, 'drink' FROM DUAL UNION ALL
SELECT 'B', 2, 'rest' FROM DUAL;
All output:
GROUP
SEQ
ACTIVITY
HIST
A
1
scan
null
A
2
visit
scan
A
3
pay
scan,visit
B
1
drink
null
B
2
rest
drink
db<>fiddle here
To aggregate strings in Oracle we use LISAGG function.
In general, you need a windowing_clause to specify a sliding window for analytic function to calculate running total.
But unfortunately LISTAGG doesn't support it.
To simulate this behaviour you may use model_clause of the select statement. Below is an example with explanation.
select
group_
, activity
, seq
, hist
from t
model
/*Where to restart calculation*/
partition by (group_)
/*Add consecutive numbers to reference "previous" row per group.
May use "seq" column if its values are consecutive*/
dimension by (
row_number() over(
partition by group_
order by seq asc
) as rn
)
measures (
/*Other columnns to return*/
activity
, cast(null as varchar2(1000)) as hist
, seq
)
rules update (
/*Apply this rule sequentially*/
hist[any] order by rn asc =
/*Previous concatenated result*/
hist[cv()-1]
/*Plus comma for the third row and tne next rows*/
|| presentv(activity[cv()-2], ',', '') /**/
/*lus previous row's value*/
|| activity[cv()-1]
)
GROUP_ | ACTIVITY | SEQ | HIST
:----- | :------- | --: | :---------
A | scan | 1 | null
A | visit | 2 | scan
A | pay | 3 | scan,visit
B | drink | 1 | null
B | rest | 2 | drink
db<>fiddle here
Few more variants (without subqueries):
SELECT--+ NO_XML_QUERY_REWRITE
t.*,
regexp_substr(
listagg(activity, ',')
within group(order by SEQ)
over(partition by "GROUP")
,'^([^,]+,){'||(row_number()over(partition by "GROUP" order by seq)-1)||'}'
)
AS hist1
,xmlcast(
xmlquery(
'string-join($X/A/B[position()<$Y]/text(),",")'
passing
xmlelement("A", xmlagg(xmlelement("B", activity)) over(partition by "GROUP")) as x
,row_number()over(partition by "GROUP" order by seq) as y
returning content
)
as varchar2(1000)
) hist2
FROM table_name t;
DBFIddle: https://dbfiddle.uk/?rdbms=oracle_21&fiddle=9b477a2089d3beac62579d2b7103377a
Full test case with output:
with table_name ("GROUP", seq, activity) AS (
SELECT 'A', 1, 'scan' FROM DUAL UNION ALL
SELECT 'A', 2, 'visit' FROM DUAL UNION ALL
SELECT 'A', 3, 'pay' FROM DUAL UNION ALL
SELECT 'B', 1, 'drink' FROM DUAL UNION ALL
SELECT 'B', 2, 'rest' FROM DUAL
)
SELECT--+ NO_XML_QUERY_REWRITE
t.*,
regexp_substr(
listagg(activity, ',')
within group(order by SEQ)
over(partition by "GROUP")
,'^([^,]+,){'||(row_number()over(partition by "GROUP" order by seq)-1)||'}'
)
AS hist1
,xmlcast(
xmlquery(
'string-join($X/A/B[position()<$Y]/text(),",")'
passing
xmlelement("A", xmlagg(xmlelement("B", activity)) over(partition by "GROUP")) as x
,row_number()over(partition by "GROUP" order by seq) as y
returning content
)
as varchar2(1000)
) hist2
FROM table_name t;
GROUP SEQ ACTIV HIST1 HIST2
------ ---------- ----- ------------------------------ ------------------------------
A 1 scan
A 2 visit scan, scan
A 3 pay scan,visit, scan,visit
B 1 drink
B 2 rest drink, drink

SQL | SPLIT COLUMNS INTO ROWS

How can I split the column data into rows with basic SQL.
COL1 COL2
1 A-B
2 C-D
3 AAA-BB
Result
COL1 Col2
1 A
1 B
2 C
2 D
3 AAA
3 BB
From Oracle 12, if it is always two delimited values then you can use:
SELECT t.col1,
l.col2
FROM table_name t
CROSS JOIN LATERAL (
SELECT SUBSTR(col2, 1, INSTR(col2, '-') - 1) AS col2 FROM DUAL
UNION ALL
SELECT SUBSTR(col2, INSTR(col2, '-') + 1) FROM DUAL
) l
Which, for the sample data:
CREATE TABLE table_name (COL1, COL2) AS
SELECT 1, 'A-B' FROM DUAL UNION ALL
SELECT 2, 'C-D' FROM DUAL UNION ALL
SELECT 3, 'AAA-BB' FROM DUAL;
Outputs:
COL1
COL2
1
A
1
B
2
C
2
D
3
AAA
3
BB
db<>fiddle here
Snowflake is tagged, so here's the snowflake way of doing this:
WITH TEST (col1, col2) as
(select 1, 'A-B' from dual union all
select 2, 'C-D' from dual union all
select 3, 'AAA-BB' from dual
)
SELECT test.col1, table1.value
FROM test, LATERAL strtok_split_to_table(test.col2, '-') as table1
ORDER BY test.col1, table1.value;
As of Oracle:
SQL> with test (col1, col2) as
2 (select 1, 'A-B' from dual union all
3 select 2, 'C-D' from dual union all
4 select 3, 'AAA-BB' from dual
5 )
6 select col1,
7 regexp_substr(col2, '[^-]+', 1, column_value) col2
8 from test cross join
9 table(cast(multiset(select level from dual
10 connect by level <= regexp_count(col2, '-') + 1
11 ) as sys.odcinumberlist))
12 order by col1, col2;
COL1 COL2
---------- ------------------------
1 A
1 B
2 C
2 D
3 AAA
3 BB
6 rows selected.
SQL>
For MS-SQL 2016 and higher you can use:
SELECT Col1, x.value
FROM t CROSS APPLY STRING_SPLIT(t.Col2, '-') as x;
BTW: If Col2 contains null, it does not appear in the result.

Count column comma delimited values oracle

Is it possible to count and also group by comma delimited values in the oracle database table? This is a table data example:
id | user | title |
1 | foo | a,b,c |
2 | bar | a,d |
3 | tee | b |
The expected result would be:
title | count
a | 2
b | 2
c | 1
d | 1
I wanted to use concat like this:
SELECT a.title FROM Account a WHERE concat(',', a.title, ',') LIKE 'a' OR concat(',', a.title, ',') LIKE 'b' ... GROUP BY a.title?
But I'm getting invalid number of arguments on concat. The title values are predefined, therefore I don't mind if I have to list all of them in the query. Any help is greatly appreciated.
This uses simple string functions and a recursive sub-query factoring and may be faster than using regular expressions and correlated joins:
Oracle Setup:
CREATE TABLE account ( id, "user", title ) AS
SELECT 1, 'foo', 'a,b,c' FROM DUAL UNION ALL
SELECT 2, 'bar', 'a,d' FROM DUAL UNION ALL
SELECT 3, 'tee', 'b' FROM DUAL;
Query:
WITH positions ( title, start_pos, end_pos ) AS (
SELECT title,
1,
INSTR( title, ',', 1 )
FROM account
UNION ALL
SELECT title,
end_pos + 1,
INSTR( title, ',', end_pos + 1 )
FROM positions
WHERE end_pos > 0
),
items ( item ) AS (
SELECT CASE end_pos
WHEN 0
THEN SUBSTR( title, start_pos )
ELSE SUBSTR( title, start_pos, end_pos - start_pos )
END
FROM positions
)
SELECT item,
COUNT(*)
FROM items
GROUP BY item
ORDER BY item;
Output:
ITEM | COUNT(*)
:--- | -------:
a | 2
b | 2
c | 1
d | 1
db<>fiddle here
Split titles to rows and count them.
SQL> with test (id, title) as
2 (select 1, 'a,b,c' from dual union all
3 select 2, 'a,d' from dual union all
4 select 3, 'b' from dual
5 ),
6 temp as
7 (select regexp_substr(title, '[^,]', 1, column_value) val
8 from test cross join table(cast(multiset(select level from dual
9 connect by level <= regexp_count(title, ',') + 1
10 ) as sys.odcinumberlist))
11 )
12 select val as title,
13 count(*)
14 From temp
15 group by val
16 order by val;
TITLE COUNT(*)
-------------------- ----------
a 2
b 2
c 1
d 1
SQL>
If titles aren't that simple, then modify REGEXP_SUBSTR (add + sign) in line #7, e.g.
SQL> with test (id, title) as
2 (select 1, 'Robin Hood,Avatar,Star Wars Episode III' from dual union all
3 select 2, 'Mickey Mouse,Avatar' from dual union all
4 select 3, 'The Godfather' from dual
5 ),
6 temp as
7 (select regexp_substr(title, '[^,]+', 1, column_value) val
8 from test cross join table(cast(multiset(select level from dual
9 connect by level <= regexp_count(title, ',') + 1
10 ) as sys.odcinumberlist))
11 )
12 select val as title,
13 count(*)
14 From temp
15 group by val
16 order by val;
TITLE COUNT(*)
------------------------------ ----------
Avatar 2
Mickey Mouse 1
Robin Hood 1
Star Wars Episode III 1
The Godfather 1
SQL>

Need a query to seperate char values and number values from table in oracle [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 3 years ago.
Improve this question
I have table TAB1 which is having one column COL1. as shown below.
TAB1
COL1
123
Xyz
CM
44
I need single query which will give following output.
Ccol | Ncol
Xyz | 123
CM | 45
From Oracle 12 you can define a function in a sub-query factoring clause and this can easily determine whether a value is numeric:
Oracle Setup:
CREATE TABLE table_name (COL1) AS
SELECT '123' FROM DUAL UNION ALL
SELECT 'Xyz' FROM DUAL UNION ALL
SELECT 'CM' FROM DUAL UNION ALL
SELECT '44' FROM DUAL UNION ALL
SELECT '1E3' FROM DUAL UNION ALL
SELECT '-1.2' FROM DUAL
Query:
WITH
FUNCTION isNumeric( value VARCHAR2 ) RETURN NUMBER
IS
n NUMBER;
BEGIN
n := TO_NUMBER( value );
RETURN 1;
EXCEPTION
WHEN OTHERS THEN
RETURN 0;
END;
SELECT Ccol,
TO_NUMBER( Ncol ) AS Ncol
FROM (
SELECT col1,
isNumeric( col1 ) AS isNumber,
ROW_NUMBER() OVER ( PARTITION BY isNumeric( col1 ) ORDER BY ROWNUM ) AS rn
FROM table_name
)
PIVOT ( MAX( Col1 ) FOR isNumber IN ( 0 AS Ccol, 1 AS Ncol ) )
ORDER BY rn
Output:
CCOL | NCOL
:--- | ---:
Xyz | 123
CM | 44
null | 1000
null | -1.2
db<>fiddle here
In earlier versions you can use CREATE FUNCTION rather than defining it in the query.
You can try this query:
WITH TAB1(COL1) AS
(
SELECT '123' FROM DUAL UNION ALL
SELECT 'Xyz' FROM DUAL UNION ALL
SELECT 'CM' FROM DUAL UNION ALL
SELECT '44' FROM DUAL
)
-- Actual query starts from here
, CTE AS (SELECT
COL1,
NUMERIC,
ROW_NUMBER() OVER(
PARTITION BY NUMERIC
ORDER BY LENGTH(COL1) DESC -- here I considered that Xyz and 123 both have length 3 and are related and same for CM and 44
) AS RN
FROM
(
SELECT
COL1,
CASE
WHEN REGEXP_LIKE ( COL1,
'^[[:digit:]]+$' ) THEN 'NUMBER'
ELSE 'NOT NUMBER'
END AS NUMERIC
FROM
TAB1
))
SELECT
C.COL1 AS "Ccol",
N.COL1 AS "Ncol"
FROM
CTE N
FULL OUTER JOIN CTE C ON ( N.RN = C.RN )
WHERE
N.NUMERIC = 'NUMBER'
AND C.NUMERIC = 'NOT NUMBER';
Output:
Cco Nco
--- ---
Xyz 123
CM 44
db<>fiddle demo
Cheers!!

Match characters in oracle sql where condition

I want to match Col1 from Table a to colum1 from table B.
A B
123 123-ab
234 234-bc
3443 3443-dd
However, value in table b has concatenated data. I want to match only the characters until before special character occurs(-).
I tried this : substr(table1.a,1,3) = substr(table2.b,1,3)
But this doesn’t work as some values have 4 digits.
use join and substr
select * from table_a
inner join table_b on table_a.col_a = substr(table_b.col_b, 1, length(table_a.col_a);
Using REGEXP_SUBSTR() to match on one or more numbers from the beginning of the string up to but not including the first hyphen:
SQL> with a(col1) as (
select '123' from dual union
select '234' from dual union
select '3443' from dual
),
b(col1) as (
select '123-ab' from dual union
select '234-bc' from dual union
select '3443-dd' from dual
)
select a.col1, b.col1
from a, b
where a.col1 = regexp_substr(b.col1, '^(\d+)-', 1, 1, NULL, 1);
COL1 COL1
---- -------
123 123-ab
234 234-bc
3443 3443-dd
SQL>

Resources