Replacing Text which does not match a pattern in Oracle - oracle
I have the text below in a CLOB column of a table.
Table Name: tbl1
Columns
col1 - number (Primary Key)
col2 - clob (as below)
Row#1
-----
Col1 = 1
Col2 =
1331882981,ab123456,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqr123223,Some more text...
Row#2
-----
Col1 = 2
Col2 =
1331882981,abc333,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqrs23,Some more text...
I need to know how to get the output below:
Col1 Value
---- ---------------------
1 1331882981,ab123456
1 1331890329,pqr123223
2 1331882981,abc333
2 1331890329,pqrs23
([0-9]{10},[a-z 0-9]+.), ==> This is the regular expression that matches "1331890329,pqrs23". I need to know how to remove the text that does not match this regex and then split the matches into multiple rows.
EDIT#1
I am on Oracle 10.2.0.5.0 and hence cannot use the REGEXP_COUNT function :-( Also, col2 is a CLOB and it is massive.
EDIT#2
I've tried the query below and it works fine for some records (i.e. if I add a "where" clause). But when I remove the "where", it never returns any result. I've tried putting it into a view and inserting into a table, and left it running overnight, but it still had not completed :(
with t as (select col1, col2 from temp_table)
select col1,
cast(substr(regexp_substr(col2, '[^~]+', 1, level), 1, 50) as
varchar2(50)) data
from t
connect by level <= length(col2) - length(replace(col2, '~')) + 1
EDIT#3
# of Chars in Clob Total
----------- -----
0 - 1k 3196
1k - 5k 2865
5k - 25k 661
25k - 100k 36
> 100k 2
----------- -----
Grand Total 6760
I have ~7k rows of clobs which have the distribution as shown above...
Well, you could try something like:
with v as
(
select 1 col1, '1331882981,ab123456,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqr123223,Some more text...' col2 from dual
union all
select 2 col1, '133188298777,abc333,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqrs23,Some more text...' col2 from dual
)
select distinct col1, regexp_substr(col2, '([0-9]{10},[a-z 0-9]+)', 1, level) split
from v
connect by level <= REGEXP_COUNT(col2, '([0-9]{10},[a-z0-9]+)')
order by col1
;
This gives:
1 1331882981,ab123456
1 1331890329,pqr123223
2 1331890329,pqrs23
2 3188298777,abc333
EDIT: on 10g, REGEXP_COUNT does not exist, but there are workarounds. Here I replace each match of the pattern with a marker that I hope does not occur in the text (XYZXYZ here, but you can choose something much more complex to be confident), take the difference in length against the same text with each match replaced by the empty string, and then divide by the marker length (here, 6):
with v as
(
select 1 col1, '1331882981,ab123456,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqr123223,Some more text...' col2 from dual
union all
select 2 col1, '133188298777,abc333,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqrs23,Some more text...' col2 from dual
)
select distinct col1, regexp_substr(col2, '([0-9]{10},[a-z 0-9]+)', 1, level) split
from v
connect by level <= (length(REGEXP_REPLACE(col2, '([0-9]{10},[a-z 0-9]+)', 'XYZXYZ')) - length(REGEXP_REPLACE(col2, '([0-9]{10},[a-z 0-9]+)', ''))) / 6
order by col1
;
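The length-difference trick is easier to see outside SQL. The following is a minimal Python sketch of the same arithmetic, not Oracle code; the names `count_matches`, `PATTERN` and `MARKER` are made up for illustration, and Python's `re` stands in for REGEXP_REPLACE.

```python
import re

# Same pattern as in the query above; MARKER plays the role of 'XYZXYZ'.
PATTERN = re.compile(r"[0-9]{10},[a-z 0-9]+")
MARKER = "XYZXYZ"


def count_matches(text: str) -> int:
    """Count pattern matches without a dedicated count function."""
    with_marker = PATTERN.sub(MARKER, text)  # every match becomes 6 characters
    without = PATTERN.sub("", text)          # every match becomes 0 characters
    # Both strings share the same unmatched text, so the lengths differ
    # by exactly 6 characters per match.
    return (len(with_marker) - len(without)) // len(MARKER)


sample = (
    "1331882981,ab123456,Some text here\n"
    "which can run multiple lines and have a lot of text...\n"
    "~1331890329,pqr123223,Some more text..."
)
print(count_matches(sample))
```

Because only the two lengths are compared, the result depends on the marker's length and not on its content, so in this sketch it does not matter whether the marker also occurs in the text.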
EDIT 2: CLOBs (and LOBs in general) and the regexp functions don't seem to fit well together:
ORA-00932: inconsistent datatypes: expected - got CLOB
Converting the CLOB to a string (regexp_substr(to_char(col2), ...)) seems to fix the issue.
EDIT 3: CLOBs don't like DISTINCT either, so converting the split result to char in an inner query and then applying DISTINCT in the outer query succeeds:
select distinct col1, split from
(
select col1, to_char(regexp_substr(col2, '([0-9]{10},[a-z 0-9]+)', 1, level)) split
from temp_epn
connect by level <= (length(REGEXP_REPLACE(col2, '([0-9]{10},[a-z 0-9]+)', 'XYZXYZ')) - length(REGEXP_REPLACE(col2, '([0-9]{10},[a-z 0-9]+)', ''))) / 6
order by col1
);
The above solutions didn't work for me; below is what I did instead.
update temp_table set col2=regexp_replace(col2,'([0-9]{10},[a-z0-9]+)','(\1)') ;
update temp_table set col2=regexp_replace(col2,'\),[\s\S]*~\(','(\1)$');
update temp_table set col2=regexp_replace(col2,'\).*?\(','$');
update temp_table set col2=replace(regexp_replace(col2,'\).*',''),'(','');
After these 4 update commands, the col2 will have something like
1 1331882981,ab123456$1331890329,pqr123223
2 1331882981,abc333$1331890329,pqrs23
Then I wrote a function to split this value. I went for a function because I need to split by "$" and because col2 can still have more than 10k characters.
create or replace function parse( p_clob in clob ) return sys.odciVarchar2List
pipelined
as
l_offset number := 1;
l_clob clob := translate( p_clob, chr(13)|| chr(10) || chr(9), ' ' ) || '$';
l_hit number;
begin
loop
--Find occurrence of "$" from l_offset
l_hit := instr( l_clob, '$', l_offset );
exit when nvl(l_hit,0) = 0;
--Extract string from l_offset to l_hit
pipe row ( substr(l_clob, l_offset , (l_hit - l_offset)) );
--Move offset
l_offset := l_hit+1;
end loop;
--A pipelined function ends with a RETURN that carries no value
return;
end;
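For readers who want to follow the offset loop outside PL/SQL, here is a minimal Python sketch of the same idea. It is an illustration rather than a translation: the name `split_on` is hypothetical, the whitespace cleanup done by translate() is left out, and `str.find` stands in for INSTR.

```python
from typing import Iterator


def split_on(text: str, sep: str = "$") -> Iterator[str]:
    """Yield the pieces of text between separators, one at a time."""
    text = text + sep  # sentinel so the last piece is emitted too
    offset = 0
    while True:
        hit = text.find(sep, offset)  # next separator at or after offset
        if hit == -1:
            break
        yield text[offset:hit]        # piece between offset and the separator
        offset = hit + 1              # continue just past the separator


pieces = list(split_on("1331882981,ab123456$1331890329,pqr123223"))
for piece in pieces:
    timestamp, code = piece.split(",")
    print(timestamp, code)
```

Yielding one piece at a time mirrors the pipelined function: the caller can consume rows as they are found instead of waiting for the whole value to be split.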
I then called
select col1,
REGEXP_SUBSTR(column_value, '[^,]+', 1, 1) col3,
REGEXP_SUBSTR(column_value, '[^,]+', 1, 2) col4
from temp_table, table(parse(temp_table.col2));
Related
Remove comma separated string from another comma separated string in oracle
Column1 = A,B,C,D,E,F
Column2 = C,D,A,F,C,B (it can have duplicates)
I need to remove the Column2 values from Column1 and get the missing value.
Desired output: (Column1) - (Column2) = E
Split the columns' contents into rows, then use the MINUS set operator. Sample data is in lines #1 - 3; the query begins at line #4.
SQL> with test (col1, col2) as
  2    (select 'A,B,C,D,E,F', 'C,D,A,F,C,B' from dual
  3    )
  4  select regexp_substr(col1, '[^,]+', 1, level) val
  5  from test
  6  connect by level <= regexp_count(col1, ',') + 1
  7  minus
  8  select regexp_substr(col2, '[^,]+', 1, level) val
  9  from test
 10  connect by level <= regexp_count(col2, ',') + 1
 11  /
VAL
--------------------------------------------
E
SQL>
If you're comparing columns in a multi-row table, the above approach won't work well, as it'll retrieve duplicates and will be slow. In that case, rewrite it to:
with test (id, col1, col2) as
  (select 1, 'A,B,C,D,E,F', 'C,D,A,F,C,B' from dual union all
   select 2, 'A,B,C,D,E,F', 'A,B,B,B' from dual
  )
select id, listagg(val, ',') within group (order by val) missing_letters
from
  (
   select id,
          regexp_substr(col1, '[^,]+', 1, column_value) val
   from test cross join
        table(cast(multiset(select level from dual
                            connect by level <= regexp_count(col1, ',') + 1
                           ) as sys.odcinumberlist))
   minus
   select id,
          regexp_substr(col2, '[^,]+', 1, column_value) val
   from test cross join
        table(cast(multiset(select level from dual
                            connect by level <= regexp_count(col2, ',') + 1
                           ) as sys.odcinumberlist))
  )
group by id;
        ID MISSING_LETTERS
---------- --------------------
         1 E
         2 C,D,E,F
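The split-and-MINUS approach above amounts to a set difference on the tokens. A minimal Python sketch of that logic follows; it is not Oracle code, and the function name `missing_values` is made up for illustration.

```python
from typing import List


def missing_values(col1: str, col2: str) -> List[str]:
    """Return the tokens of col1 that do not appear in col2, sorted."""
    to_remove = set(col2.split(","))  # duplicates in col2 collapse here
    remaining = set(col1.split(",")) - to_remove
    return sorted(remaining)          # sorted, like ORDER BY val in LISTAGG


print(",".join(missing_values("A,B,C,D,E,F", "C,D,A,F,C,B")))
print(",".join(missing_values("A,B,C,D,E,F", "A,B,B,B")))
```

Like MINUS, the set difference also removes duplicates from the first list, so this sketch assumes that repeated values in col1 do not need to be preserved.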
You may use the translate function with additional cleanup logic to remove all remaining commas. This will work only for single-character values (one character between commas), but it doesn't require splitting the string into tokens and uses simple string functions.
with a(col1, col2) as (
  select 'A,B,C,D,E,F', 'C,D,A,F,C,B' from dual
)
select
  /*Then remove leading and trailing commas*/
  trim(',' from
    /*Then condense all intermediate commas and spaces*/
    regexp_replace(
      /*Do actual replacement*/
      translate(col1, replace(col2, ','), ' '),
      '[, ]+', ','
    )
  ) as res
from a
| RES |
| :-- |
| E |
You do not need to split the string. If your delimited values do not have any characters with special meaning in regular expressions, then you can double up the delimiters in col1, convert col2 to a regular expression, replace the matches with an empty string and then remove the excess delimiters:
SELECT col1,
       col2,
       TRIM(
         BOTH ',' FROM
         REPLACE(
           REGEXP_REPLACE(
             ',' || REPLACE(col1, ',', ',,') || ',',
             ',(' || REPLACE(col2, ',', '|') || '),'
           ),
           ',,',
           ','
         )
       ) AS missing
FROM   table_name;
Which, for the sample data:
CREATE TABLE table_name ( col1, col2 ) AS
SELECT 'A,B,C,D,E,F', 'C,D,A,F,C,B' FROM DUAL UNION ALL
SELECT 'A,AB,BA,B,', 'A,B' FROM DUAL;
Outputs:
COL1         COL2         MISSING
A,B,C,D,E,F  C,D,A,F,C,B  E
A,AB,BA,B,   A,B          AB,BA
If you do have characters with special meaning, then you can do a similar replacement using a recursive sub-query:
WITH replacements ( col1, col2 ) AS (
  SELECT ',' || REPLACE( col1, ',', ',,') || ',',
         col2 || ','
  FROM   table_name
UNION ALL
  SELECT REPLACE(col1, ',' || SUBSTR(col2, 1, INSTR(col2, ','))),
         SUBSTR(col2, INSTR(col2, ',') + 1)
  FROM   replacements
  WHERE  col2 IS NOT NULL
)
SELECT TRIM(BOTH ',' FROM REPLACE(col1, ',,', ',')) AS missing
FROM   replacements
WHERE  col2 IS NULL
Which outputs:
MISSING
AB,BA
E
Note: both of these queries only require a single table scan.
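To see why the delimiters are doubled, it can help to trace the first query's steps in a general-purpose language. The following is a rough Python sketch of those steps, not Oracle code; the name `remove_values` is hypothetical, and `re.escape` is an addition of this sketch (the SQL version assumes values without special regex characters instead).

```python
import re


def remove_values(col1: str, col2: str) -> str:
    """Remove every value listed in col2 from the delimited string col1."""
    # Double the delimiters and wrap in commas so that every value owns
    # both of its surrounding commas: 'A,B' becomes ',A,,B,'.
    padded = "," + col1.replace(",", ",,") + ","
    # Build ',(C|D|A|F|C|B),' from col2, escaping each value.
    alternatives = "|".join(re.escape(v) for v in col2.split(","))
    stripped = re.sub(",(" + alternatives + "),", "", padded)
    # Collapse the doubled delimiters and drop the outer ones.
    return stripped.replace(",,", ",").strip(",")


print(remove_values("A,B,C,D,E,F", "C,D,A,F,C,B"))
print(remove_values("A,AB,BA,B,", "A,B"))
```

Without the doubling, two adjacent values would share one comma, and removing the first would consume the delimiter that the pattern needs in order to match the second.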
Using ora:tokenize you could do something like this (including a few test cases in the with clause; you should remove it, and use your actual table and column names in the main query):
with
  inputs (col1, col2) as (
    select 'A,B,C,D,E,F', 'C,D,A,F,C,B' from dual union all
    select 'D,,F'       , 'F,A'         from dual union all
    select 'A,B,E,F'    , 'E'           from dual union all
    select 'ABC'        , 'A,B,ABC'     from dual
  )
-- END OF TEST DATA; QUERY BEGINS **BELOW THIS LINE**
select i.col1, i.col2, l.diff
from   inputs i cross join lateral
       ( select listagg(token, ',') within group (order by null) as diff
         from   xmltable('ora:tokenize(.,",")' passing i.col1 || ','
                         columns token varchar2(10) path '.')
         where  not ',' || col2 || ',' like '%,' || token || ',%'
       ) l
;
COL1        COL2        DIFF
----------- ----------- --------------------
A,B,C,D,E,F C,D,A,F,C,B E
D,,F        F,A         D
A,B,E,F     E           A,B,F
ABC         A,B,ABC
How to select second split of column data from oracle database
I want to select data from an Oracle table whose column contains comma-separated values (e.g. key,value). I want to select the second part, i.e. the value. The table column data is as below:
column_data
++++++++++++++
asper,worse
tincher,good
golder
null -- null values need to be eliminated in the selection
www,ewe
From the above data, the desired output is:
column_data
+++++++++++++
worse
good
golder
ewe
Please help me with the query.
According to the data you provided, here are two options:
result1: regular expressions (get the 2nd word if it exists; otherwise, get the 1st one)
result2: SUBSTR + INSTR combination
with test (col) as
  (select 'asper,worse' from dual union all
   select 'tincher,good' from dual union all
   select 'golder' from dual union all
   select null from dual union all
   select 'www,ewe' from dual
  )
select col,
       nvl(regexp_substr(col, '\w+', 1, 2), regexp_substr(col, '\w+', 1,1 )) result1,
       --
       nvl(substr(col, instr(col, ',') + 1), col) result2
from test
where col is not null;
COL          RESULT1              RESULT2
------------ -------------------- --------------------
asper,worse  worse                worse
tincher,good good                 good
golder       golder               golder
www,ewe      ewe                  ewe
REGEXP to capture values delimited by a set of delimiters
My column value looks something like below (just an example I created):
{BASICINFOxxxFyyy100x}
{CONTACTxxx12345yyy20202x}
It can contain 0 or more blocks of data. I have created the query below to split the blocks:
with x as
(select '{BASICINFOxxxFyyy100x}{CONTACTxxx12345yyy20202x}' a from dual)
select REGEXP_SUBSTR(a,'({.*?x})',1,rownum,null,1)
from x
connect by rownum <= REGEXP_COUNT(a,'x}')
However, I would like to further split the output into 3 columns like below:
ColumnA   | ColumnB | ColumnC
------------------------------
BASICINFO | F       | 100
CONTACT   | 12345   | 20202
The delimiters are always standard. I failed to create a pretty query which gives me the desired output. Thanks in advance.
Oracle 11g R2 Schema Setup:
CREATE TABLE your_table ( str ) AS
SELECT '{BASICINFOxxxFyyy100x}{CONTACTxxx12345yyy20202x}' from dual
/
Query 1:
select REGEXP_SUBSTR(
         t.str,
         '\{([^}]*?)xxx([^}]*?)yyy([^}]*?)x\}',
         1,
         l.COLUMN_VALUE,
         NULL,
         1
       ) AS col1,
       REGEXP_SUBSTR(
         str,
         '\{([^}]*?)xxx([^}]*?)yyy([^}]*?)x\}',
         1,
         l.COLUMN_VALUE,
         NULL,
         2
       ) AS col2,
       REGEXP_SUBSTR(
         str,
         '\{([^}]*?)xxx([^}]*?)yyy([^}]*?)x\}',
         1,
         l.COLUMN_VALUE,
         NULL,
         3
       ) AS col3
FROM   your_table t
       CROSS JOIN
       TABLE(
         CAST(
           MULTISET(
             SELECT LEVEL
             FROM   DUAL
             CONNECT BY LEVEL <= REGEXP_COUNT( t.str,'\{([^}]*?)xxx([^}]*?)yyy([^}]*?)x\}')
           ) AS SYS.ODCINUMBERLIST
         )
       ) l
Results:
| COL1      | COL2  | COL3  |
|-----------|-------|-------|
| BASICINFO | F     | 100   |
| CONTACT   | 12345 | 20202 |
Note: Your query:
select REGEXP_SUBSTR(a,'({.*?x})',1,rownum,null,1)
from x
connect by rownum <= REGEXP_COUNT(a,'x}')
will not work when you have multiple rows of input. In the CONNECT BY clause, the hierarchical query has nothing to restrict it from connecting Row1-Level2 to Row1-Level1 or to Row2-Level1, so it will connect it to both, and as the depth of the hierarchies gets greater it will create exponentially more duplicate copies of the output rows.
There are hacks you can use to stop this, but if you are going to use hierarchical queries it is much more efficient to put the row generator into a correlated sub-query, which can then be CROSS JOINed back to the original table (it is correlated, so it won't join to the wrong rows).
Better yet would be to fix your data structure so you are not storing multiple values in delimited strings.
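The pattern with three capture groups can be tried out quickly outside the database. Below is a small Python sketch that applies the same pattern to the sample value; it only illustrates what the groups capture and is not a substitute for the Oracle query (the names `BLOCK` and `rows` are made up here).

```python
import re

# Same pattern as in the query: three lazy groups separated by the fixed
# delimiters 'xxx', 'yyy' and the closing 'x}'.
BLOCK = re.compile(r"\{([^}]*?)xxx([^}]*?)yyy([^}]*?)x\}")

value = "{BASICINFOxxxFyyy100x}{CONTACTxxx12345yyy20202x}"

# findall returns one tuple per block, with one element per capture group.
rows = BLOCK.findall(value)
for col1, col2, col3 in rows:
    print(col1, col2, col3)
```

Each tuple corresponds to one output row, and the position inside the tuple corresponds to the sub-expression number passed as the last argument of REGEXP_SUBSTR.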
with x as
  (select '{BASICINFOxxxFyyy100x}{CONTACTxxx12345yyy20202x}' a from dual
  ),
y as (
  select REGEXP_SUBSTR(a,'({.*?x})',1,rownum,null,1) c1
  from x
  connect by rownum <= REGEXP_COUNT(a,'x}')
)
select
  substr(c1,2,instr(c1,'xxx')-2) z1,
  substr(c1,instr(c1,'xxx')+3,instr(c1,'yyy')-instr(c1,'xxx')-3) z2,
  rtrim(substr(c1,instr(c1,'yyy')+3),'x}') z3
from y;
Z1              Z2              Z3
--------------- --------------- ---------------
BASICINFO       F               100
CONTACT         12345           20202
Here is another solution that picks up where you left off. Your query already splits the value into 2 rows; the query below splits each of them into 3 columns:
WITH x
     AS (SELECT '{BASICINFOxxxFyyy100x}{CONTACTxxx12345yyy20202x}' a
         FROM DUAL),
     -- Your query result here
     tbl
     AS (SELECT REGEXP_SUBSTR (a, '({.*?x})', 1, ROWNUM, NULL, 1) Col
         FROM x
         CONNECT BY ROWNUM <= REGEXP_COUNT (a, 'x}'))
--- Actual Query
SELECT col,
       REGEXP_SUBSTR (col, '(.*?{)([^x]+)', 1, 1, '', 2) AS COL1,
       REGEXP_SUBSTR (REGEXP_SUBSTR (col, '(.*?)([^x]+)', 1, 2, '', 2), '[^y]+', 1, 1) AS COL2,
       REGEXP_SUBSTR (REGEXP_SUBSTR (col, '[^y]+x', 1, 2), '[^x]+', 1, 1) AS COL3
FROM tbl;
Output:
COL                        COL1      COL2  COL3
-------------------------- --------- ----- -----
{BASICINFOxxxFyyy100x}     BASICINFO F     100
{CONTACTxxx12345yyy20202x} CONTACT   12345 20202
Oracle instr position
I have a 15-character string and need to loop through it, pulling the position of each occurrence of the letter 'a'. I was going to use a cursor to loop through the string, but wasn't sure how to save the position of each occurrence.
Something like this, to break the string into individual characters and then filter on your desired value?
-- data setup to create a single value to test
WITH dat as (select 'ABCDEACDFA' val from DUAL)
--
SELECT lvl, strchr
from (
  -- query to break the string into individual characters, returning a row for each
  SELECT level lvl, substr(dat.val,level,1) strchr
  FROM dat
  CONNECT BY level <= length(val)
)
WHERE strchr = 'A';
returns:
LVL STRCHR
1   A
6   A
10  A
Here's a different method using one less select and a regex. I don't believe it will help your performance issue though. Please try it and let us know:
SQL> with tbl(str) as (
       select 'Aabjggaklkjha' from dual
     )
     select level as position
     from tbl
     where upper(REGEXP_SUBSTR(str, '.', 1, level)) = 'A'
     connect by level <= length(str);
  POSITION
----------
         1
         2
         7
        13
SQL>
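Both answers walk the string one character at a time and keep the positions where the character matches. A minimal Python sketch of that idea, with a hypothetical helper name `positions_of` and 1-based positions to match the LEVEL values above:

```python
from typing import List


def positions_of(text: str, letter: str, ignore_case: bool = False) -> List[int]:
    """Return the 1-based positions at which letter occurs in text."""
    if ignore_case:
        text, letter = text.upper(), letter.upper()
    # enumerate(..., start=1) plays the role of LEVEL in the queries above.
    return [pos for pos, ch in enumerate(text, start=1) if ch == letter]


print(positions_of("ABCDEACDFA", "A"))
print(positions_of("Aabjggaklkjha", "a", ignore_case=True))
```

The `ignore_case` flag corresponds to the upper() call in the second query; the first query compares characters exactly.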
oracle query to split the example#gmail.com into columns whenever a special char is encountered
Here is the code I have written, but its output still contains the special characters. My requirement is to ask the user for an email address dynamically and split it wherever a special character occurs, so that the output contains no special characters.
col1       col2  col3
------------------
example123 gmail com
select substr('exapmle123#gmail.com',instr('example123#gmail.com','#'),instr('example123#gmail.com','.')) as col1 ,
       substr('exapmle123#gmail.com',1,instr('example123#gmail.com','#')) as col2,
       substr('exapmle123#gmail.com',instr('example123#gmail.com','.'),length('example123#gmail.com')) as col3
from dual;
I suggest using REGEXP_SUBSTR for splitting strings.
Approach 1
In the example below, there is a row for every word, and the row and column numbers are part of the result set. I suggest this approach since you cannot know the number of words/columns beforehand.
Query1
with MyString as
( select 'exapmle123#gmail.com' Str, 1 rnum from dual
)
,pivot as (
  Select Rownum Pnum
  From dual
  Connect By Rownum <= 100
)
SELECT REGEXP_SUBSTR (ms.Str,'([[:alnum:]])+',1,pv.pnum), ms.rnum, pv.pnum colnum
FROM MyString ms
    ,pivot pv
where REGEXP_SUBSTR (ms.Str,'([[:alnum:]])+',1,pv.pnum) is not null
Result1
REGEXP_SUBSTR(MS.STR       RNUM     COLNUM
-------------------- ---------- ----------
exapmle123                    1          1
gmail                         1          2
com                           1          3
Approach 2
If you know how many words/columns you'll have, then you can use
Query2
with MyString as
( select 'exapmle123#gmail.com' Str, 1 rnum from dual
)
SELECT REGEXP_SUBSTR (ms.Str,'([[:alnum:]])+',1,1) col1,
       REGEXP_SUBSTR (ms.Str,'([[:alnum:]])+',1,2) col2,
       REGEXP_SUBSTR (ms.Str,'([[:alnum:]])+',1,3) col3
FROM MyString ms
Result2
COL1       COL2  COL
---------- ----- ---
exapmle123 gmail com