Oracle instr position

I have a 15-character string and need to loop through it, pulling out the position of each occurrence of the letter 'a'. I was going to use a cursor to loop through the string, but wasn't sure how to save the position of each occurrence.

Something like this, which breaks the string into individual characters and then filters on your desired value?
-- data setup to create a single value to test
WITH dat as (select 'ABCDEACDFA' val from DUAL)
--
SELECT lvl, strchr
from (
-- query to break the string into individual characters, returning a row for each
SELECT level lvl, substr(dat.val,level,1) strchr
FROM dat
CONNECT BY level <= length(val)
) WHERE strchr = 'A';
Returns:
LVL STRCHR
1 A
6 A
10 A
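The same idea as a Python sketch (an illustration for checking expected results, not Oracle code): walk the string one character at a time, the way LEVEL walks it in the query above, and keep the 1-based positions that match. The ignore_case flag is an assumption added here to cover the upper() variant as well.

```python
from typing import List


def char_positions(text: str, target: str, ignore_case: bool = False) -> List[int]:
    """Return the 1-based positions of target in text, like SUBSTR(text, LEVEL, 1)."""
    if ignore_case:
        text, target = text.upper(), target.upper()
    # enumerate(..., start=1) mirrors Oracle's 1-based LEVEL and SUBSTR positions
    return [pos for pos, ch in enumerate(text, start=1) if ch == target]


print(char_positions("ABCDEACDFA", "A"))                       # [1, 6, 10]
print(char_positions("Aabjggaklkjha", "a", ignore_case=True))  # [1, 2, 7, 13]
```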

Here's a different method using one fewer SELECT and a regex. I don't believe it will help your performance issue, though. Please try it and let us know:
SQL> with tbl(str) as (
select 'Aabjggaklkjha' from dual
)
select level as position
from tbl
where upper(REGEXP_SUBSTR(str, '.', 1, level)) = 'A'
connect by level <= length(str);
POSITION
----------
1
2
7
13
SQL>

Related

How to reverse a string in Oracle (11g) SQL without using REVERSE() function

I am trying to reverse a string without using the REVERSE function. I came across one example, something like:
select listagg(letter) within group(order by lvl)
from
(SELECT LEVEL lvl, SUBSTR ('hello', LEVEL*-1, 1) letter
FROM dual
CONNECT BY LEVEL <= length('hello'));
Apart from this approach, is there a better way to do this?
If you're trying to avoid the undocumented reverse() function, you could use the utl_raw.reverse() function instead, with appropriate conversion to and from RAW:
select utl_i18n.raw_to_char(
utl_raw.reverse(
utl_i18n.string_to_raw('Some string', 'AL32UTF8')), 'AL32UTF8')
from dual;
UTL_I18N.RAW_TO_CHAR(UTL_RAW.REVERSE(UTL_I18N.STRING_TO_RAW('SOMESTRING','AL32UT
--------------------------------------------------------------------------------
gnirts emoS
So that takes the original value, runs utl_i18n.string_to_raw() on it, passes the result to utl_raw.reverse(), and then converts that back with utl_i18n.raw_to_char().
I'm not entirely sure how that will cope with multibyte characters (utl_raw.reverse() reverses bytes, not characters), or what you'd want to happen to those anyway...
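To see why reversing bytes is risky for multibyte data, here is a small Python sketch (an illustration, not Oracle code) that does the rough equivalent of string_to_raw, utl_raw.reverse and raw_to_char with UTF-8, next to a plain character-level reverse. The sample string 'café' is an arbitrary choice. Python refuses to decode the reversed bytes; what the database returns in the same situation is not shown here and may be garbage rather than an error.

```python
def reverse_bytes(text: str, encoding: str = "utf-8") -> str:
    """Reverse the encoded bytes (like utl_raw.reverse on a RAW), then decode."""
    return text.encode(encoding)[::-1].decode(encoding)


def reverse_chars(text: str) -> str:
    """Reverse by character instead of by byte."""
    return text[::-1]


print(reverse_bytes("Some string"))  # gnirts emoS (every character is one byte)
print(reverse_chars("café"))         # éfac
try:
    reverse_bytes("café")  # 'é' is two bytes in UTF-8; reversing swaps their order
except UnicodeDecodeError as exc:
    print("byte reversal broke a multibyte character:", exc.reason)
```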
Or a variation from the discussion @RahulTripathi linked to, without the character set handling:
select utl_raw.cast_to_varchar2(utl_raw.reverse(utl_raw.cast_to_raw('Some string')))
from dual;
UTL_RAW.CAST_TO_VARCHAR2(UTL_RAW.REVERSE(UTL_RAW.CAST_TO_RAW('SOMESTRING')))
--------------------------------------------------------------------------------
gnirts emoS
But that thread also notes it only works for single-byte characters.
You could do it like this:
with strings as (select 'hello' str from dual union all
select 'fred' str from dual union all
select 'this is a sentance.' from dual)
select str,
replace(sys_connect_by_path(substr (str, level*-1, 1), '~|'), '~|') rev_str
from strings
where connect_by_isleaf = 1
connect by prior str = str --added because of running against several strings at once
and prior sys_guid() is not null --added because of running against several strings at once
and level <= length(str);
STR REV_STR
------------------- --------------------
fred derf
hello olleh
this is a sentance. .ecnatnes a si siht
N.B. I used a delimiter of ~| simply because that's something unlikely to be part of your string. You need to supply a non-null delimiter to SYS_CONNECT_BY_PATH, which is why I didn't just leave it blank!
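The CONNECT BY answers here take one character per LEVEL, counting from the end with SUBSTR(str, level*-1, 1), and glue the pieces back together. Here is that loop as a Python sketch (illustration only, not Oracle code), checked against the three sample strings above:

```python
def reverse_like_substr(text: str) -> str:
    """Concatenate SUBSTR(text, -level, 1) for level = 1 .. LENGTH(text)."""
    pieces = []
    for level in range(1, len(text) + 1):
        # a negative index counts from the end, like a negative SUBSTR position
        pieces.append(text[-level])
    return "".join(pieces)


for s in ("hello", "fred", "this is a sentance."):
    print(s, reverse_like_substr(s))
```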
SELECT LISTAGG(STR) WITHIN GROUP (ORDER BY RN DESC)
FROM
(
SELECT ROWNUM RN, SUBSTR('ORACLE',ROWNUM,1) STR FROM DUAL
CONNECT BY LEVEL <= LENGTH('ORACLE')
);
You can try using this query:
with t as (select 'Reverse' as txt from dual)
select replace(sys_connect_by_path(ch,'|'),'|') as reversed_string
from (
select length(txt)-rownum as rn, substr(txt,rownum,1) ch
from t
connect by rownum <= length(txt)
)
where connect_by_isleaf = 1
connect by rn = prior rn + 1
start with rn = 0;
select listagg(rev) within group (order by rownum)
from
(select substr('Oracle', level*-1, 1) rev from dual
connect by level <= length('Oracle'));

Oracle PL/SQL Tokenize String with empty position

I have a string like this:
AAA,BBB,,DDD
And I would like to tokenize it on commas and retrieve a table like this:
VALUE LEVEL
AAA 1
BBB 2
(null) 3
DDD 4
I need to know each string and the position in which I found it, without missing the empty (null) strings.
I've tried code like this, but I miss the empty position:
SELECT regexp_substr ('AAA,BBB,,DDD', '[^,]+', 1, level), level
FROM dual
CONNECT BY LEVEL <= LENGTH(regexp_replace ('AAA,BBB,,DDD', '[^,]+'));
The output is this:
VALUE LEVEL
AAA 1
BBB 2
DDD 3
Another simple answer is to replace each comma (,) with a comma followed by a space (, ), like below:
SELECT trim(regexp_substr (replace('AAA,BBB,,DDD',',',', '), '[^,]+', 1, level)), level
FROM dual
CONNECT BY LEVEL <= REGEXP_COUNT (replace('AAA,BBB,,DDD',',',', '), '[^,]+');
This also works: http://sqlfiddle.com/#!4/b255d/26
SELECT token, lvl FROM (
SELECT regexp_substr ('AAA,BBB,,DDD', '[^,]*', 1, LEVEL) token, LEVEL lvl,
lag(regexp_substr ('AAA,BBB,,DDD', '[^,]*', 1, LEVEL)) over(order by level) prev_token
FROM dual
CONNECT BY LEVEL <= LENGTH(regexp_replace ('AAA,BBB,,DDD', '[^,]+'))*2
) WHERE prev_token is null;
In Oracle 11g Release 2 and later, you can use a recursive WITH clause, something like this query:
with
tab1(pointer,test,split_test) as
(select
1 as pointer,test,substr(test,0,case when instr(test,',',1,1) = 0 then LENGTH(test)
else instr(test,',',1,1)-1 end) split_test from table1
union all
select
pointer + 1 as pointer,test,
substr(test,instr(test,',',1,pointer) + 1,case when instr(test,',',1,pointer + 1) = 0 then LENGTH(test) else
instr(test,',',1,pointer + 1) - instr(test,',',1,pointer) - 1 end) split_test
from tab1 where pointer - 1 < LENGTH(test)-LENGTH(REPLACE(test,',','')))
select split_test as "value",pointer as "level" from tab1;
SQL Fiddle
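The recursive WITH query walks from one comma to the next with INSTR. Here is the same walk as a Python sketch (illustration only; str.find plays the role of INSTR and returns -1 instead of 0 when nothing is found). Empty tokens become None, mirroring how Oracle treats '' as NULL.

```python
from typing import List, Optional, Tuple


def tokenize(text: str, sep: str = ",") -> List[Tuple[Optional[str], int]]:
    """Split on sep, keeping empty positions as None, with 1-based levels."""
    tokens = []
    start, level = 0, 1
    while True:
        hit = text.find(sep, start)            # like INSTR(text, sep, start + 1)
        end = len(text) if hit == -1 else hit
        token = text[start:end]
        tokens.append((token or None, level))  # '' becomes None, as '' is NULL in Oracle
        if hit == -1:
            return tokens
        start, level = hit + len(sep), level + 1


for value, level in tokenize("AAA,BBB,,DDD"):
    print(value, level)
```

Unlike the '[^,]+' pattern in the question, this never skips a position, because it advances one separator at a time instead of one non-empty match at a time.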

Replacing Text which does not match a pattern in Oracle

I have the text below in a CLOB column of a table:
Table Name: tbl1
Columns
col1 - number (Primary Key)
col2 - clob (as below)
Row#1
-----
Col1 = 1
Col2 =
1331882981,ab123456,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqr123223,Some more text...
Row#2
-----
Col1 = 2
Col2 =
1331882981,abc333,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqrs23,Some more text...
Now I need to know how to get the output below:
Col1 Value
---- ---------------------
1 1331882981,ab123456
1 1331890329,pqr123223
2 1331882981,abc333
2 1331890329,pqrs23
([0-9]{10},[a-z 0-9]+.), ==> this is the regular expression that matches "1331890329,pqrs23". I need to know how to replace the text that does not match this regex and then split the matches into multiple rows.
EDIT#1
I am on Oracle 10.2.0.5.0 and hence cannot use the REGEXP_COUNT function :-( Also, col2 is a CLOB, and it is massive.
EDIT#2
I've tried the query below and it works fine for some records (i.e. if I add a WHERE clause). But when I remove the WHERE clause, it never returns any results. I've tried putting this into a view and inserting into a table, and left it running overnight, but it still had not completed :(
with t as (select col1, col2 from temp_table)
select col1,
cast(substr(regexp_substr(col2, '[^~]+', 1, level), 1, 50) as
varchar2(50)) data
from t
connect by level <= length(col2) - length(replace(col2, '~')) + 1
EDIT#3
# of Chars in Clob Total
----------- -----
0 - 1k 3196
1k - 5k 2865
5k - 25k 661
25k - 100k 36
> 100k 2
----------- -----
Grand Total 6760
I have ~7k rows of CLOBs with the distribution shown above...
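One likely reason the EDIT#2 query never finishes, offered as a hedged explanation: its CONNECT BY LEVEL <= ... condition has no PRIOR term. When such a query runs over many rows at once, every row at one level becomes the parent of every row that still satisfies the condition at the next level. The string-reversal answer above adds "prior str = str" and "prior sys_guid() is not null" for exactly this reason. The Python sketch below only counts the rows such a query would generate under that model, compared with the rows you actually want. The per-row piece counts are made-up inputs.

```python
from typing import List


def rows_without_prior(pieces_per_row: List[int]) -> int:
    """Rows produced by CONNECT BY LEVEL <= n(row) over all rows, with no PRIOR term."""
    total = 0
    nodes_at_level = len(pieces_per_row)  # with no START WITH, every row is a root at LEVEL 1
    level = 1
    while nodes_at_level:
        total += nodes_at_level
        level += 1
        # each node at the previous level gets every row whose limit allows this level
        children_per_node = sum(1 for n in pieces_per_row if n >= level)
        nodes_at_level *= children_per_node
    return total


def rows_wanted(pieces_per_row: List[int]) -> int:
    """One output row per piece of each input row."""
    return sum(pieces_per_row)


print(rows_without_prior([3, 3, 3]), rows_wanted([3, 3, 3]))  # 39 9
```

With ~7k rows, if most of them have two or more pieces, LEVEL 2 alone is already on the order of 7k × 7k rows, which would explain an overnight run that never completes.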
Well, you could try something like:
with v as
(
select 1 col1, '1331882981,ab123456,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqr123223,Some more text...' col2 from dual
union all
select 2 col1, '133188298777,abc333,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqrs23,Some more text...' col2 from dual
)
select distinct col1, regexp_substr(col2, '([0-9]{10},[a-z 0-9]+)', 1, level) split
from v
connect by level <= REGEXP_COUNT(col2, '([0-9]{10},[a-z0-9]+)')
order by col1
;
This gives:
1 1331882981,ab123456
1 1331890329,pqr123223
2 1331890329,pqrs23
2 3188298777,abc333
EDIT: for 10g, REGEXP_COUNT does not exist, but there are workarounds. Here I replace each match of the pattern with a marker I hope I won't find in the text (here XYZXYZ, but you can choose something much more complex to be confident), subtract the length of the same text with the matches replaced by the empty string, then divide by the marker length (here, 6):
with v as
(
select 1 col1, '1331882981,ab123456,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqr123223,Some more text...' col2 from dual
union all
select 2 col1, '133188298777,abc333,Some text here
which can run multiple lines and have a lot of text...
~1331890329,pqrs23,Some more text...' col2 from dual
)
select distinct col1, regexp_substr(col2, '([0-9]{10},[a-z 0-9]+)', 1, level) split
from v
connect by level <= (length(REGEXP_REPLACE(col2, '([0-9]{10},[a-z 0-9]+)', 'XYZXYZ')) - length(REGEXP_REPLACE(col2, '([0-9]{10},[a-z 0-9]+)', ''))) / 6
order by col1
;
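The counting trick can be checked outside the database. This is a Python sketch (illustration only), with Python's re.sub standing in for REGEXP_REPLACE on the second sample value from the answer: each match contributes len(MARKER) characters to one string and zero to the other, so the length difference divided by the marker length is the number of matches.

```python
import re

PATTERN = r"([0-9]{10},[a-z 0-9]+)"
MARKER = "XYZXYZ"  # the division below uses its length (6)


def count_matches_10g(text: str) -> int:
    """Count matches without REGEXP_COUNT: (len(with marker) - len(removed)) / len(marker)."""
    with_marker = re.sub(PATTERN, MARKER, text)
    removed = re.sub(PATTERN, "", text)
    return (len(with_marker) - len(removed)) // len(MARKER)


sample = ("133188298777,abc333,Some text here\n"
          "which can run multiple lines and have a lot of text...\n"
          "~1331890329,pqrs23,Some more text...")
print(count_matches_10g(sample))                           # 2
print([m.group(0) for m in re.finditer(PATTERN, sample)])  # ['3188298777,abc333', '1331890329,pqrs23']
```

The second printed list also shows why the answer's output contains 3188298777,abc333: the 12-digit prefix only matches starting from its third digit.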
EDIT 2: CLOBs (and LOBs in general) and regexp don't seem to fit well together:
ORA-00932: inconsistent datatypes: expected - got CLOB
Converting the CLOB to a string (regexp_substr(to_char(col2), ...)) seems to fix the issue.
EDIT 3: CLOBs don't like DISTINCT either, so converting the split result to a string in an inner query and applying DISTINCT in the outer query succeeds!
select distinct col1, split from
(
select col1, to_char(regexp_substr(col2, '([0-9]{10},[a-z 0-9]+)', 1, level)) split
from temp_epn
connect by level <= (length(REGEXP_REPLACE(col2, '([0-9]{10},[a-z 0-9]+)', 'XYZXYZ')) - length(REGEXP_REPLACE(col2, '([0-9]{10},[a-z 0-9]+)', ''))) / 6
)
-- ORDER BY belongs in the outer query: DISTINCT does not preserve an inner ordering
order by col1;
The above solutions didn't work for me; below is what I did.
update temp_table set col2=regexp_replace(col2,'([0-9]{10},[a-z0-9]+)','(\1)') ;
update temp_table set col2=regexp_replace(col2,'\),[\s\S]*~\(','(\1)$');
update temp_table set col2=regexp_replace(col2,'\).*?\(','$');
update temp_table set col2=replace(regexp_replace(col2,'\).*',''),'(','');
After these four UPDATE statements, col2 will contain something like:
1 1331882981,ab123456$1331890329,pqr123223
2 1331882981,abc333$1331890329,pqrs23
Then I wrote a function to split this. I went with a function both to split on "$" and because col2 can still hold more than 10k characters.
create or replace function parse( p_clob in clob ) return sys.odciVarchar2List
pipelined
as
l_offset number := 1;
l_clob clob := translate( p_clob, chr(13)|| chr(10) || chr(9), ' ' ) || '$';
l_hit number;
begin
loop
--Find occurrence of "$" from l_offset
l_hit := instr( l_clob, '$', l_offset );
exit when nvl(l_hit,0) = 0;
--Extract string from l_offset to l_hit
pipe row ( substr(l_clob, l_offset , (l_hit - l_offset)) );
--Move offset
l_offset := l_hit+1;
end loop;
end;
I then called it like this:
select col1,
REGEXP_SUBSTR(column_value, '[^,]+', 1, 1) col3,
REGEXP_SUBSTR(column_value, '[^,]+', 1, 2) col4
from temp_table, table(parse(temp_table.col2));
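Here is a Python sketch of what parse() plus the two REGEXP_SUBSTR calls do (an illustration for checking results, not a replacement for the PL/SQL). TRANSLATE with a one-character replacement string maps CHR(13) to a space and drops CHR(10) and CHR(9). A zero-length SUBSTR is NULL in Oracle, which is modelled here as None.

```python
from typing import Iterator, List, Optional, Tuple

# TRANSLATE(p, CHR(13)||CHR(10)||CHR(9), ' '): CR becomes a space, LF and TAB are removed
_TRANSLATE = {13: " ", 10: None, 9: None}


def parse(text: str) -> Iterator[Optional[str]]:
    """Yield the pieces between '$' separators, like the pipelined parse() function."""
    buf = text.translate(_TRANSLATE) + "$"
    offset = 0
    while True:
        hit = buf.find("$", offset)    # INSTR(l_clob, '$', l_offset), -1 instead of 0
        if hit == -1:
            return
        yield buf[offset:hit] or None  # a zero-length SUBSTR is NULL in Oracle
        offset = hit + 1


def split_pairs(text: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Mimic REGEXP_SUBSTR(column_value, '[^,]+', 1, 1) and (..., 1, 2) for each piece."""
    rows = []
    for piece in parse(text):
        parts = [p for p in (piece or "").split(",") if p]  # '[^,]+' skips empty runs
        rows.append((parts[0] if parts else None, parts[1] if len(parts) > 1 else None))
    return rows


print(split_pairs("1331882981,ab123456$1331890329,pqr123223"))
```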

CASE Oracle SQL for State

I have a field that can have one or multiple states listed in it (callcenter.stateimpact). If callcenter.stateimpact contains "OK","TX","AK","TN","NC","SC","GA","FL","AL","MS" or "LA", I need the output field of the SQL to say "South", and if not those, the output needs to say "North". If callcenter.stateimpact has both South and North states, it needs to say "BOTH" in the output. How do I do this in the SELECT statement? The fields in this table are callcenter.callid, callcenter.stateimpact, callcenter.callstart and callcenter.callstop. Your help is greatly appreciated.
This is tough to explain, so there's a SQL Fiddle here that lays out the values involved.
The best approach I could come up with (other than normalizing the StateImpact value) was to use REGEXP_REPLACE to suck all the "South" states out of the string and then look at the length of what was left. First, here's what REGEXP_REPLACE(StateImpact, '(OK|TX|AK|TN|NC|SC|GA|FL|AL|MS|LA)') will do to a few sample values:
StateImpact REGEXP_REPLACE(StateImpact, '(OK|TX|AK|TN|NC|SC|GA|FL|AL|MS|LA)')
----------------------------- -----------------------------------------------------------------
OK,TX,AK,TN,NC,SC,GA,FL,AL,MS ,,,,,,,,,
MI,MA MI,MA
TX null
TX,MI,MA ,MI,MA
So if you're left with all commas or with a null, all the states were South. If you're left with the original string, all states were North. Anything else and it's Both. That makes for a pretty big and confusing CASE statement no matter how you write it. I went with comparing lengths before and after, like so:
Length after replace = 0 (or null): South
Length after replace = (length before + 1) / 3 - 1: South
Length after replace = length before replace: North
Anything else: Both
The second one above is just some math to account for the fact that if (for example) there are five states in StateImpact and they're all South, you'll be left with four commas. Hard to explain but it works :)
Here's the query:
SELECT
StateImpact,
CASE NVL(LENGTH(REGEXP_REPLACE(StateImpact, '(OK|TX|AK|TN|NC|SC|GA|FL|AL|MS|LA)')), 0)
WHEN LENGTH(StateImpact) THEN 'North'
WHEN (LENGTH(StateImpact) + 1) / 3 - 1 THEN 'South'
ELSE 'Both'
END AS RegionImpact
FROM CallCenter
The SQL Fiddle referenced above also shows the length before and after the REGEXP_REPLACE, which will hopefully help explain the calculations.
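As a worked check of the length rule: n two-letter codes separated by commas take 3n - 1 characters, and removing all of them leaves n - 1 commas. So five South states give 14 characters before and (14 + 1) / 3 - 1 = 4 afterwards. Here is a Python sketch of the CASE (illustration only, with re.sub standing in for REGEXP_REPLACE), assuming every code is exactly two letters and there are no spaces around the commas:

```python
import re

SOUTH = r"(OK|TX|AK|TN|NC|SC|GA|FL|AL|MS|LA)"


def region_by_length(state_impact: str) -> str:
    """Mirror the CASE on lengths before and after stripping the South codes."""
    before = len(state_impact)
    # Oracle returns NULL for an empty result, hence NVL(..., 0); len('') is already 0
    after = len(re.sub(SOUTH, "", state_impact))
    if after == before:
        return "North"
    if after == (before + 1) / 3 - 1:  # n codes: 3n - 1 chars before, n - 1 commas after
        return "South"
    return "Both"


for value in ("OK,TX,AK,TN,NC,SC,GA,FL,AL,MS", "MI,MA", "TX", "TX,MI,MA"):
    print(value, region_by_length(value))
```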
One way to reach the desired result is to use multiset operators.
But first we need to break the comma-separated string into rows. One way to do that is the CONNECT BY trick:
-- Trick with building resultset from tokenized string
with test_string as (
select 'OK,TX,AK,TN,NC,SC,GA,FL,AL,MS' StateImpact from dual
)
select
level lvl,
substr( -- Extract part of source string
StateImpact,
-- from N-th occurrence of separator
decode( level, 1, 1, instr(StateImpact,',',1,level-1)+1 ),
-- with length of substring from N-th to (N+1)-th occurrence of separator or to the end.
decode( instr(StateImpact,',',1,level), 0, length(StateImpact)+1, instr(StateImpact,',',1,level) )
-
decode( level, 1, 1, instr(StateImpact,',',1,level-1)+1 )
) code
from test_string
start with
StateImpact is not null -- no entries for empty string
connect by
instr(StateImpact,',',1,level-1) > 0 -- continue if separator found on previous step
Just for fun: same trick with ANSI syntax on SQLFiddle
Next, we need to declare a type that we can use to store collections:
create or replace type TCodeList as table of varchar2(100);
After that it's possible to build a query:
with all_south_list as (
-- prepare list of south states
select 'OK' as code from dual union all
select 'TX' as code from dual union all
select 'AK' as code from dual union all
select 'TN' as code from dual union all
select 'NC' as code from dual union all
select 'SC' as code from dual union all
select 'GA' as code from dual union all
select 'FL' as code from dual union all
select 'AL' as code from dual union all
select 'MS' as code from dual union all
select 'LA' as code from dual
)
select
StateImpact,
-- Make decision based on counts
case
when total_count = 0 then 'None'
when total_count = south_count then 'South'
when south_count = 0 then 'North'
else 'Both'
end RegionImpact,
total_count,
south_count,
north_count
from (
select
StateImpact,
-- count total number of states in StateImpact
cardinality(code_list) total_count,
-- count number of south states in StateImpact
cardinality(code_list multiset intersect south_list) south_count,
-- count number of non-south states in StateImpact
cardinality(code_list multiset except south_list) north_count
from (
select
StateImpact,
(
cast(multiset( -- Convert set of values into collection which acts like a nested table
select -- same trick as above
substr(
StateImpact,
decode( level, 1, 1, instr(StateImpact,',',1,level-1)+1 ),
decode( instr(StateImpact,',',1,level), 0, length(StateImpact)+1, instr(StateImpact,',',1,level) )
-
decode( level, 1, 1, instr(StateImpact,',',1,level-1)+1 )
) code
from dual
start with StateImpact is not null
connect by instr(StateImpact,',',1,level-1) > 0
) as TCodeList
)
) code_list,
-- Build collection from south states list
cast(multiset(select code from all_south_list) as TCodeList) south_list
from
CallCenter
)
)
Link to SQLFiddle
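Here is the counting logic of that query as a Python sketch (illustration only, not Oracle code). collections.Counter is used because Counter & Counter keeps the minimum count and Counter - Counter keeps positive differences, which matches MULTISET INTERSECT and MULTISET EXCEPT with their default ALL behaviour. A NULL or empty StateImpact yields an empty collection, as with START WITH StateImpact IS NOT NULL.

```python
from collections import Counter
from typing import Optional, Tuple

SOUTH_LIST = Counter(["OK", "TX", "AK", "TN", "NC", "SC", "GA", "FL", "AL", "MS", "LA"])


def region_by_multiset(state_impact: Optional[str]) -> Tuple[str, int, int, int]:
    """Return (RegionImpact, total_count, south_count, north_count)."""
    code_list = Counter(state_impact.split(",")) if state_impact else Counter()
    total = sum(code_list.values())                 # CARDINALITY(code_list)
    south = sum((code_list & SOUTH_LIST).values())  # MULTISET INTERSECT (ALL)
    north = sum((code_list - SOUTH_LIST).values())  # MULTISET EXCEPT (ALL)
    if total == 0:
        label = "None"
    elif total == south:
        label = "South"
    elif south == 0:
        label = "North"
    else:
        label = "Both"
    return label, total, south, north


print(region_by_multiset("TX,MI,MA"))
```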

Optimizing row by row (cursor) processing in Oracle 11g

I have to process a large table (2.5B records) row by row in order to keep track of two variables. As one can imagine, this is quite slow. I am looking for ideas on how to tune this procedure. Thank you.
declare
cursor c_data is select /* +index(data data_pk) */ * from data order by data_id;
r_data c_data%ROWTYPE;
lst_b_prc number(15,8);
lst_a_prc number(15,8);
begin
open c_data;
loop
fetch c_data into r_data;
exit when c_data%NOTFOUND;
if r_data.BATS = 'B' then
lst_b_prc := r_data.PRC;
end if;
if r_data.BATS = 'A' then
lst_a_prc := r_data.PRC;
end if;
if r_data.BATS = 'T' then
insert into trans .... lst_a_prc , lst_b_prc
end if;
end loop;
close c_data;
end;
The issue really comes down to finding efficient SQL to track the latest PRC value when BATS='A' and BATS='B' for each BATS='T' record.
If I understand your problem correctly, with a table of data like this:
create table data as
select 1 data_id, 'T' bats, 1 prc from dual union all
select 2 data_id, 'A' bats, 2 prc from dual union all
select 3 data_id, 'B' bats, 3 prc from dual union all
select 4 data_id, 'T' bats, 4 prc from dual union all
select 5 data_id, 'A' bats, 5 prc from dual union all
select 6 data_id, 'T' bats, 6 prc from dual union all
select 7 data_id, 'B' bats, 7 prc from dual union all
select 8 data_id, 'T' bats, 8 prc from dual union all
select 9 data_id, 'T' bats, 9 prc from dual;
You want to insert one row for each T, using the last PRC values for A and B, which would look something like this:
T data_id Last A Last B
--------- ------ ------
1 null null
4 2 3
6 5 3
8 5 7
9 5 7
This query should work:
select data_id, last_A, last_B
from
(
select data_id, bats, prc
-- LAST_VALUE ... IGNORE NULLS keeps the most recent non-null price,
-- so the result stays correct even when prices go down
,last_value(case when bats = 'A' then prc end ignore nulls) over
(order by data_id
rows between unbounded preceding and current row) last_A
,last_value(case when bats = 'B' then prc end ignore nulls) over
(order by data_id
rows between unbounded preceding and current row) last_B
from data
)
where bats = 'T';
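To check expected results on small samples, here is the original cursor loop as a Python sketch (illustration only). It carries forward the most recent A and B prices rather than the largest ones, so any set-based rewrite should reproduce its output even when prices fall. The tuple layout (data_id, bats, prc) is an assumption matching the sample table.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[int, str, float]


def last_a_b(rows: Iterable[Row]) -> List[Tuple[int, Optional[float], Optional[float]]]:
    """For each 'T' row, in data_id order, report the latest A and B prices seen so far."""
    last_a: Optional[float] = None
    last_b: Optional[float] = None
    out = []
    for data_id, bats, prc in sorted(rows):  # ORDER BY data_id
        if bats == "A":
            last_a = prc
        elif bats == "B":
            last_b = prc
        elif bats == "T":
            out.append((data_id, last_a, last_b))
    return out


data = [(1, "T", 1), (2, "A", 2), (3, "B", 3), (4, "T", 4), (5, "A", 5),
        (6, "T", 6), (7, "B", 7), (8, "T", 8), (9, "T", 9)]
for row in last_a_b(data):
    print(row)
```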
With so much data, you'll probably want to use direct path writes and parallelism.
The performance will largely depend on whether the sorting for the analytic functions can be done in memory or on disk. Optimizing memory can be very difficult; you'll probably need to work with a DBA to allow your process to use as much memory as possible without causing problems for other processes.
There are several options. Most importantly, you're probably keeping a huge UNDO/REDO log for all your inserts. You could occasionally commit your work, say every 1000 inserts.
Another option is to use a SQL MERGE statement (or a simpler INSERT ... SELECT statement), which lets your Oracle instance operate on sets rather than on single records. The execution plan of that single statement can then be optimised as a whole for INSERT performance.
