For a given tablespace, why doesn't the sum of bytes in dba_extents equal the sum of bytes in dba_segments? (Additional questions follow the sample script.)
SQL> with
"SEG" as
( select 'segment_bytes' what
, to_char(sum(bytes), '9,999,999,999,999') bytes
from dba_segments
where tablespace_name = 'MYDATA'
)
, "EXT" as
( select 'extent_bytes' what
, to_char(sum(bytes), '9,999,999,999,999') bytes
from dba_extents
where tablespace_name = 'MYDATA'
)
, "FS" as
( select tablespace_name
, sum(bytes) free_bytes
from dba_free_space
where tablespace_name = 'MYDATA'
group by tablespace_name
),
"DF" as
( select tablespace_name
, sum(bytes) alloc_bytes
, sum(user_bytes) user_bytes
from dba_data_files
where tablespace_name = 'MYDATA'
group by tablespace_name
)
select what, bytes from SEG
union all select 'datafile_bytes-freespace' what
, to_char(alloc_bytes - nvl(free_bytes, 0), '9,999,999,999,999') used_file_bytes
from DF
left join FS
on DF.tablespace_name = FS.tablespace_name
union all select 'datafile_userbytes-freespace' what
, to_char(user_bytes - nvl(free_bytes, 0), '9,999,999,999,999') used_user_bytes
from DF
left join FS
on DF.tablespace_name = FS.tablespace_name
union all select what, bytes from EXT
;
WHAT BYTES
---------------------------- ------------------
segment_bytes 2,150,514,819,072
datafile_bytes-freespace 2,150,528,540,672
datafile_userbytes-freespace 2,150,412,845,056
extent_bytes 2,150,412,845,056
4 rows selected.
I would have expected segment_bytes to equal either extent_bytes or datafile_bytes-freespace, but it falls somewhere in between.
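To put numbers on "somewhere in between", here is a small Python sketch that uses only the four figures from the output above (no database access). It shows that extent_bytes equals datafile_userbytes-freespace exactly, and that the two gaps on either side of segment_bytes add up to the difference between BYTES and USER_BYTES in dba_data_files.

```python
# Figures copied from the MYDATA query output above (bytes).
SEGMENT_BYTES = 2_150_514_819_072
DATAFILE_MINUS_FREE = 2_150_528_540_672    # sum(bytes) - sum(free bytes)
USERBYTES_MINUS_FREE = 2_150_412_845_056   # sum(user_bytes) - sum(free bytes)
EXTENT_BYTES = 2_150_412_845_056


def gaps():
    """Return the differences between the reported totals, in bytes."""
    seg_over_ext = SEGMENT_BYTES - EXTENT_BYTES           # dba_segments vs dba_extents
    file_over_seg = DATAFILE_MINUS_FREE - SEGMENT_BYTES   # used file space vs dba_segments
    # The free-space term cancels out here, leaving sum(bytes) - sum(user_bytes).
    file_overhead = DATAFILE_MINUS_FREE - USERBYTES_MINUS_FREE
    return seg_over_ext, file_over_seg, file_overhead


if __name__ == "__main__":
    a, b, c = gaps()
    print(f"segments - extents  = {a:,}")   # 101,974,016
    print(f"datafile - segments = {b:,}")   # 13,721,600
    print(f"bytes - user_bytes  = {c:,}")   # 115,695,616
    print("gaps add up:", a + b == c)       # True
```

So dba_segments reports roughly 97 MiB more than dba_extents, and that extra amount plus the remaining gap to the datafile figure is exactly the BYTES minus USER_BYTES difference.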
Is segment_bytes more than extent_bytes due to segment "overhead" (keeping track of all of the extents)?
If so, then is it also true that this segment "overhead" is part of the datafile "overhead"?
Oracle 19.1 Enterprise Edition. Thanks in advance.
For example, the difference between dba_segments and dba_extents might come from objects in the recycle bin. Please look at the results from my test database:
with
seg as (
select segment_name,sum(bytes) b1
from dba_segments
group by segment_name
)
,ext as (
select segment_name,sum(bytes) b2
from dba_extents
group by segment_name
)
select
seg.segment_name seg1
,ext.segment_name seg2
,b1,b2
from seg full outer join ext on seg.segment_name=ext.segment_name
where lnnvl(b1=b2)
order by 1,2;
Results:
SEG1                           SEG2                                   B1         B2
------------------------------ ------------------------------ ---------- ----------
BIN$xi7yNJwFcIrgUwIAFaxDaA==$0                                     65536
BIN$xi7yNJwGcIrgUwIAFaxDaA==$0                                     65536
_SYSSMU10_2262159254$          _SYSSMU10_2262159254$                   0    4325376
_SYSSMU1_3588498444$           _SYSSMU1_3588498444$                    0    3276800
_SYSSMU2_2971032042$           _SYSSMU2_2971032042$                    0    2228224
_SYSSMU3_3657342154$           _SYSSMU3_3657342154$                    0    2228224
_SYSSMU4_811969446$            _SYSSMU4_811969446$                     0    2293760
_SYSSMU5_3018429039$           _SYSSMU5_3018429039$                    0    3276800
_SYSSMU6_442110264$            _SYSSMU6_442110264$                     0    2228224
_SYSSMU7_2728255665$           _SYSSMU7_2728255665$                    0    2097152
_SYSSMU8_801938064$            _SYSSMU8_801938064$                     0    2228224
_SYSSMU9_647420285$            _SYSSMU9_647420285$                     0    3276800
12 rows selected.
As you can see, the first 2 rows are objects from the recycle bin, so you can run the same query and check whether your objects are in the recycle bin too. They are not visible in dba_extents because they are filtered out by the segment_flags predicates in the view definition:
select text_vc from dba_views where view_name='DBA_EXTENTS';
select ds.owner, ds.segment_name, ds.partition_name, ds.segment_type,
ds.tablespace_name,
e.ext#, f.file#, e.block#, e.length * ds.blocksize, e.length, e.file#
from sys.uet$ e, sys.sys_dba_segs ds, sys.file$ f
where e.segfile# = ds.relative_fno
and e.segblock# = ds.header_block
and e.ts# = ds.tablespace_id
and e.ts# = f.ts#
and e.file# = f.relfile#
and bitand(NVL(ds.segment_flags,0), 1) = 0
and bitand(NVL(ds.segment_flags,0), 65536) = 0
union all
select
ds.owner, ds.segment_name, ds.partition_name, ds.segment_type,
ds.tablespace_name,
e.ktfbueextno, f.file#, e.ktfbuebno,
e.ktfbueblks * ds.blocksize, e.ktfbueblks, e.ktfbuefno
from sys.sys_dba_segs ds, sys.x$ktfbue e, sys.file$ f
where e.ktfbuesegfno = ds.relative_fno
and e.ktfbuesegbno = ds.header_block
and e.ktfbuesegtsn = ds.tablespace_id
and ds.tablespace_id = f.ts#
and e.ktfbuefno = f.relfile#
and bitand(NVL(ds.segment_flags, 0), 1) = 1
and bitand(NVL(ds.segment_flags,0), 65536) = 0;
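The two bitand filters in the view text reduce to a small decision on the segment_flags bitmask. Below is a Python sketch of just those predicates (not the join conditions, and not what the bits mean internally, which the view text does not explain). The flag values in the usage lines are the ones that appear in the predicate check further down: 1 for the undo segments and 1 + 65536 for the BIN$ objects.

```python
from typing import Optional

# Bit values taken from the DBA_EXTENTS view text above.
BIT_1 = 1          # must be 0 in the sys.uet$ branch and 1 in the sys.x$ktfbue branch
BIT_65536 = 65536  # must be 0 in both branches


def extents_branch(segment_flags: Optional[int]) -> Optional[str]:
    """Return the UNION ALL branch of DBA_EXTENTS whose flag predicates a segment
    passes ('uet$' or 'x$ktfbue'), or None if both branches exclude it."""
    flags = segment_flags if segment_flags is not None else 0  # NVL(segment_flags, 0)
    if flags & BIT_65536:      # bitand(..., 65536) = 0 fails in both branches
        return None
    if flags & BIT_1:          # bitand(..., 1) = 1: second branch
        return "x$ktfbue"
    return "uet$"              # bitand(..., 1) = 0: first branch


if __name__ == "__main__":
    print(extents_branch(1))          # x$ktfbue  (undo segments: listed)
    print(extents_branch(1 | 65536))  # None      (BIN$ objects: filtered out)
    print(extents_branch(None))       # uet$      (NVL turns NULL into 0)
```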
So if we comment out those predicates (bitand(NVL(segment_flags,0), ...)) and check the segments that differ (the BIN$... and _SYSSMU... objects), we can see which predicates filter them out:
with
my_dba_extents(
OWNER,SEGMENT_NAME,PARTITION_NAME
,SEGMENT_TYPE,TABLESPACE_NAME,EXTENT_ID,FILE_ID
,BLOCK_ID,BYTES,BLOCKS,RELATIVE_FNO
,segment_flags)
as (
select ds.owner, ds.segment_name, ds.partition_name, ds.segment_type,
ds.tablespace_name,
e.ext#, f.file#, e.block#, e.length * ds.blocksize, e.length, e.file#
,segment_flags
from sys.uet$ e, sys.sys_dba_segs ds, sys.file$ f
where e.segfile# = ds.relative_fno
and e.segblock# = ds.header_block
and e.ts# = ds.tablespace_id
and e.ts# = f.ts#
and e.file# = f.relfile#
-- and bitand(NVL(ds.segment_flags,0), 1) = 0
-- and bitand(NVL(ds.segment_flags,0), 65536) = 0
union all
select
ds.owner, ds.segment_name, ds.partition_name, ds.segment_type,
ds.tablespace_name,
e.ktfbueextno, f.file#, e.ktfbuebno,
e.ktfbueblks * ds.blocksize, e.ktfbueblks, e.ktfbuefno
,segment_flags
from sys.sys_dba_segs ds, sys.x$ktfbue e, sys.file$ f
where e.ktfbuesegfno = ds.relative_fno
and e.ktfbuesegbno = ds.header_block
and e.ktfbuesegtsn = ds.tablespace_id
and ds.tablespace_id = f.ts#
and e.ktfbuefno = f.relfile#
-- and bitand(NVL(ds.segment_flags, 0), 1) = 1
-- and bitand(NVL(ds.segment_flags,0), 65536) = 0
)
select
segment_name
,bitand(NVL(segment_flags, 0), 1) as predicate_1
,bitand(NVL(segment_flags,0), 65536) as predicate_2
,case when bitand(NVL(segment_flags,0), 1) = 0 then 'y' else 'n' end pred_1_res
,case when bitand(NVL(segment_flags,0), 65536) = 0 then 'y' else 'n' end pred_2_res
from my_dba_extents e
where e.segment_name like 'BIN$%'
or e.segment_name like '\_SYSSMU%' escape '\';
SEGMENT_NAME PREDICATE_1 PREDICATE_2 PRED_1_RES PRED_2_RES
------------------------------ ----------- ----------- -------------- --------------
_SYSSMU1_3588498444$ 1 0 n y
_SYSSMU1_3588498444$ 1 0 n y
_SYSSMU1_3588498444$ 1 0 n y
_SYSSMU1_3588498444$ 1 0 n y
_SYSSMU1_3588498444$ 1 0 n y
_SYSSMU2_2971032042$ 1 0 n y
_SYSSMU2_2971032042$ 1 0 n y
...
_SYSSMU10_2262159254$ 1 0 n y
_SYSSMU10_2262159254$ 1 0 n y
_SYSSMU10_2262159254$ 1 0 n y
BIN$xi7yNJwGcIrgUwIAFaxDaA==$0 1 65536 n n
BIN$xi7yNJwFcIrgUwIAFaxDaA==$0 1 65536 n n
Re "datafile_bytes-freespace": don't forget that each datafile has its own header, which neither dba_segments nor dba_extents counts.
P.S. The other 10 rows are undo segments, but they don't apply to your case, since your query checks only the MYDATA tablespace, not UNDO.
I have 2 tables:
table1
no  a   b   c
--  --  --  --
x1  2   3   4
x2  10  11  12
x3  20  21  22

table2
from_val  in_out  cf_pv  term
--------  ------  -----  ----
a         out     cf     b
b         out     pv     b
c         in      cf     e
Define sum_out as the sum of the table1 columns a, b, c whose table2 row has in_out = 'out', and sum_cf as the sum of those whose table2 row has cf_pv = 'cf'.
In short, the values of from_val in table2 are column names (a, b, c) of table1.
How can I extract and calculate sum_out and sum_cf for every no in Oracle?
sum_out of x1 = 2 + 3
sum_out of x2 = 10 + 11
sum_out of x3 = 20 + 21
sum_cf of x1 = 2 + 4
sum_cf of x2 = 10 + 12
sum_cf of x3 = 20 + 22
Thanks!
In addition, I want to calculate the sum over the columns that are both 'out' and 'cf':
sum_out_and_cf of x1 = 2 (= a)
sum_out_and_cf of x2 = 10 (= a)
sum_out_and_cf of x3 = 20 (= a)
Sample data
WITH
tbl_1 AS
(
Select 'x1' "COL_NO", 2 "A", 3 "B", 4 "C" From Dual Union All
Select 'x2' "COL_NO", 10 "A", 11 "B", 12 "C" From Dual Union All
Select 'x3' "COL_NO", 20 "A", 21 "B", 22 "C" From Dual
),
tbl_2 AS
(
Select 'A' "FROM_VAL", 'out' "IN_OUT", 'cf' "CF_PV", 'begin' "TERM" From Dual Union All
Select 'B' "FROM_VAL", 'out' "IN_OUT", 'pv' "CF_PV", 'begin' "TERM" From Dual Union All
Select 'C' "FROM_VAL", 'in' "IN_OUT", 'cf' "CF_PV", 'end' "TERM" From Dual
),
Create a CTE (formulas) that generates the formulas for IN_OUT = 'out' and for CF_PV = 'cf':
formulas AS
(
Select
CASE WHEN IN_OUT = 'out' THEN IN_OUT END "IN_OUT",
LISTAGG(FROM_VAL, ' + ') WITHIN GROUP (ORDER BY FROM_VAL) OVER(PARTITION BY IN_OUT) "IN_OUT_FORMULA",
CASE WHEN CF_PV = 'cf' THEN CF_PV END "CF_PV",
LISTAGG(FROM_VAL, ' + ') WITHIN GROUP (ORDER BY FROM_VAL) OVER(PARTITION BY CF_PV) "CF_PV_FORMULA"
From
tbl_2
),
IN_OUT  IN_OUT_FORMULA  CF_PV  CF_PV_FORMULA
------  --------------  -----  -------------
        C               cf     A + C
out     A + B           cf     A + C
out     A + B                  B
Another CTE (grid) connects each COL_NO to the formulas:
grid AS
(
Select
t1.COL_NO,
CASE WHEN f1.IN_OUT = 'out' THEN f1.IN_OUT END "IN_OUT", CASE WHEN f1.IN_OUT = 'out' THEN f1.IN_OUT_FORMULA END "IN_OUT_FORMULA",
CASE WHEN f1.CF_PV = 'cf' THEN f1.CF_PV END "CF_PV", CASE WHEN f1.CF_PV = 'cf' THEN f1.CF_PV_FORMULA END "CF_PV_FORMULA"
From
tbl_1 t1
Left Join
formulas f1 ON(f1.IN_OUT Is Not Null AND f1.CF_PV Is Not Null)
)
COL_NO  IN_OUT  IN_OUT_FORMULA  CF_PV  CF_PV_FORMULA
------  ------  --------------  -----  -------------
x1      out     A + B           cf     A + C
x2      out     A + B           cf     A + C
x3      out     A + B           cf     A + C
Main SQL to get the final result
SELECT
g.COL_NO,
g.IN_OUT,
g.IN_OUT_FORMULA,
CASE WHEN g.IN_OUT = 'out' And INSTR(IN_OUT_FORMULA, 'A') > 0 THEN A ELSE 0 END +
CASE WHEN g.IN_OUT = 'out' And INSTR(IN_OUT_FORMULA, 'B') > 0 THEN B ELSE 0 END +
CASE WHEN g.IN_OUT = 'out' And INSTR(IN_OUT_FORMULA, 'C') > 0 THEN C ELSE 0 END "CALC_OUT",
--
g.CF_PV,
g.CF_PV_FORMULA,
CASE WHEN g.CF_PV = 'cf' And INSTR(CF_PV_FORMULA, 'A') > 0 THEN A ELSE 0 END +
CASE WHEN g.CF_PV = 'cf' And INSTR(CF_PV_FORMULA, 'B') > 0 THEN B ELSE 0 END +
CASE WHEN g.CF_PV = 'cf' And INSTR(CF_PV_FORMULA, 'C') > 0 THEN C ELSE 0 END "CALC_CF"
FROM
grid g
INNER JOIN
tbl_1 t1 ON(g.COL_NO = t1.COL_NO)
Result:
COL_NO  IN_OUT  IN_OUT_FORMULA  CALC_OUT  CF_PV  CF_PV_FORMULA  CALC_CF
------  ------  --------------  --------  -----  -------------  -------
x1      out     A + B                  5  cf     A + C                6
x2      out     A + B                 21  cf     A + C               22
x3      out     A + B                 41  cf     A + C               42
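A different technique from the formula-string approach above: unpivot table1 into (col_no, column name, value) rows, join to table2 on the column name, and sum conditionally. The column-to-category mapping then stays in table2, and the "out and cf" addendum from the question is one more conditional sum. The sketch runs in SQLite through Python's sqlite3 module as a stand-in for Oracle. The column is named col_no as in the answer's sample data, term is left out because no calculation uses it, and in Oracle the UNION ALL block could be replaced by an UNPIVOT clause.

```python
import sqlite3


def sums_by_row():
    """Return (col_no, sum_out, sum_cf, sum_out_and_cf) for every row of table1."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (col_no TEXT, a INTEGER, b INTEGER, c INTEGER);
        INSERT INTO table1 VALUES ('x1', 2, 3, 4);
        INSERT INTO table1 VALUES ('x2', 10, 11, 12);
        INSERT INTO table1 VALUES ('x3', 20, 21, 22);
        -- from_val holds the name of a table1 column
        CREATE TABLE table2 (from_val TEXT, in_out TEXT, cf_pv TEXT);
        INSERT INTO table2 VALUES ('a', 'out', 'cf');
        INSERT INTO table2 VALUES ('b', 'out', 'pv');
        INSERT INTO table2 VALUES ('c', 'in', 'cf');
    """)
    rows = con.execute("""
        WITH unpivoted AS (   -- one row per (col_no, column name, value)
            SELECT col_no, 'a' AS col, a AS val FROM table1
            UNION ALL SELECT col_no, 'b', b FROM table1
            UNION ALL SELECT col_no, 'c', c FROM table1
        )
        SELECT u.col_no,
               SUM(CASE WHEN t.in_out = 'out' THEN u.val ELSE 0 END),
               SUM(CASE WHEN t.cf_pv = 'cf' THEN u.val ELSE 0 END),
               SUM(CASE WHEN t.in_out = 'out' AND t.cf_pv = 'cf' THEN u.val ELSE 0 END)
          FROM unpivoted u
          JOIN table2 t ON t.from_val = u.col
         GROUP BY u.col_no
         ORDER BY u.col_no
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in sums_by_row():
        print(row)
```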
I have these data in 3 tables:
table 1: BU
BU_CODE  ARCHIVE_FLG
-------  -----------
1001     Y
1002     Y
1003     Y
1004     N
1005     Y

table 2: STG_ACCOUNT
BU_CODE  ACCOUNT_ID
-------  ----------
1001     A0001
1001     A0003
1002     A0002

table 3: STG_CONTRACT
BU_CODE  CONTRACT_ID
-------  -----------
1002     C0001
1002     C0002
These 2 queries work fine:
Query 1:
SELECT
T2.BU_CODE, COUNT(T1.ACCOUNT_ID) AS COUNT_OF_ACCOUNT
FROM STG_ACCOUNT T1
FULL JOIN S_BU T2 ON T2.BU_CODE = T1.BU_CODE
WHERE T2.ARCHIVE_FLG = '1'
GROUP BY T2.BU_CODE
ORDER BY T2.BU_CODE;
BU_CODE  COUNT_OF_ACCOUNT
-------  ----------------
1001     2
1002     1
1003     0
1005     0
Query 2:
SELECT
T2.BU_CODE, COUNT(T1.CONTRACT_ID) AS COUNT_OF_CONTRACT
FROM STG_CONTRACT T1
FULL JOIN S_BU T2 ON T2.BU_CODE = T1.BU_CODE
WHERE T2.ARCHIVE_FLG = '1'
GROUP BY T2.BU_CODE
ORDER BY T2.BU_CODE;
BU_CODE  COUNT_OF_CONTRACT
-------  -----------------
1001     0
1002     2
1003     0
1005     0
Now I would like to merge the result of these 2 queries to show a more elegant output:
BU_CODE  COUNT_OF_ACCOUNT  COUNT_OF_CONTRACT
-------  ----------------  -----------------
1001     2                 0
1002     1                 2
1003     0                 0
1005     0                 0
What Oracle SQL function can help me?
One option is to use common table expressions (CTEs) and join their results. ORDER BY belongs in the outer query; inside a CTE it has no effect on the final order:
with x as
(
SELECT
T2.BU_CODE, COUNT(T1.ACCOUNT_ID) AS COUNT_OF_ACCOUNT
FROM STG_ACCOUNT T1
FULL JOIN S_BU T2 ON T2.BU_CODE = T1.BU_CODE
WHERE T2.ARCHIVE_FLG = '1'
GROUP BY T2.BU_CODE
),
y as
(
SELECT
T2.BU_CODE, COUNT(T1.CONTRACT_ID) AS COUNT_OF_CONTRACT
FROM STG_CONTRACT T1
FULL JOIN S_BU T2 ON T2.BU_CODE = T1.BU_CODE
WHERE T2.ARCHIVE_FLG = '1'
GROUP BY T2.BU_CODE
)
select x.bu_code, x.count_of_account, y.count_of_contract
from x join y on x.bu_code = y.bu_code
order by x.bu_code;
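Alternatively, you can join all three tables in one query. COUNT(DISTINCT ...) is needed because joining both child tables multiplies the rows for a BU_CODE that has both accounts and contracts (1002 has 1 account × 2 contracts = 2 joined rows), so a plain COUNT would overcount: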
SELECT
T1.BU_CODE AS BU_CODE, COUNT(DISTINCT T2.ACCOUNT_ID) AS COUNT_OF_ACCOUNT, COUNT(DISTINCT T3.CONTRACT_ID) AS COUNT_OF_CONTRACT
FROM S_BU T1
LEFT JOIN STG_ACCOUNT T2 ON T1.BU_CODE = T2.BU_CODE
LEFT JOIN STG_CONTRACT T3 ON T1.BU_CODE = T3.BU_CODE
WHERE T1.ARCHIVE_FLG = '1'
GROUP BY T1.BU_CODE
ORDER BY T1.BU_CODE;
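A quick way to see why the single-query version needs COUNT(DISTINCT ...) is to run it with and without DISTINCT. This is a minimal sketch in SQLite via Python's sqlite3 module as a stand-in for Oracle, loaded with the sample data from the question. One assumption: the question's queries filter on ARCHIVE_FLG = '1' while the sample data shows Y/N, so the sketch filters on 'Y', which matches the expected output (1004 is the only N row).

```python
import sqlite3


def bu_counts(distinct=True):
    """Per-BU account/contract counts from one LEFT JOIN query.
    With distinct=False the plain COUNT exposes the row fan-out."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE s_bu (bu_code INTEGER, archive_flg TEXT);
        INSERT INTO s_bu VALUES (1001,'Y'), (1002,'Y'), (1003,'Y'), (1004,'N'), (1005,'Y');
        CREATE TABLE stg_account (bu_code INTEGER, account_id TEXT);
        INSERT INTO stg_account VALUES (1001,'A0001'), (1001,'A0003'), (1002,'A0002');
        CREATE TABLE stg_contract (bu_code INTEGER, contract_id TEXT);
        INSERT INTO stg_contract VALUES (1002,'C0001'), (1002,'C0002');
    """)
    agg = "COUNT(DISTINCT {})" if distinct else "COUNT({})"
    sql = f"""
        SELECT t1.bu_code,
               {agg.format('t2.account_id')},
               {agg.format('t3.contract_id')}
          FROM s_bu t1
          LEFT JOIN stg_account t2 ON t1.bu_code = t2.bu_code
          LEFT JOIN stg_contract t3 ON t1.bu_code = t3.bu_code
         WHERE t1.archive_flg = 'Y'   -- assumption: 'Y' marks the BUs to report
         GROUP BY t1.bu_code
         ORDER BY t1.bu_code
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(bu_counts())                # (1002, 1, 2): correct
    print(bu_counts(distinct=False))  # (1002, 2, 2): account counted once per contract
```

If the ID columns were not unique per BU_CODE, aggregating each child table in its own subquery before joining (as the CTE answer does) would be the safer choice.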
I have the following table (with a lot of rows):
ID A_1 B_1 A_2 B_2 A_3 B_3
-- ---- --- --- ---- --- ---
1 0 0 0 0 0 0
2 1 0 0 0 0 0
I need to get the following output table, with one row per column pair of each ID: (ID, A_1, B_1), (ID, A_2, B_2), and so on.
ID A B
--- -- --
1 0 0
1 0 0
1 0 0
2 1 0
2 0 0
2 0 0
I tried UNION and UNPIVOT, but I get only one row for each ID instead of three.
How can I do this?
You have to use UNION ALL so that duplicate rows are kept (UNION removes them):
SELECT ID, A_1 A, B_1 B FROM TABLE1
UNION ALL
SELECT ID, A_2, B_2 FROM TABLE1
UNION ALL
SELECT ID, A_3, B_3 FROM TABLE1 ORDER BY ID;
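Why plain UNION returned too few rows: UNION removes duplicate rows across the whole result, so ID 1's three identical (1, 0, 0) rows collapse into one, while UNION ALL keeps all three. A minimal check in SQLite through Python's sqlite3 module, as a stand-in for Oracle (both treat UNION and UNION ALL this way), using the two sample rows from the question and the table name TABLE1 from the query above:

```python
import sqlite3


def unpivot_pairs(set_operator):
    """Stack the (A_n, B_n) pairs of table1 using the given set operator."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (id INTEGER, a_1 INTEGER, b_1 INTEGER,
                             a_2 INTEGER, b_2 INTEGER, a_3 INTEGER, b_3 INTEGER);
        INSERT INTO table1 VALUES (1, 0, 0, 0, 0, 0, 0);
        INSERT INTO table1 VALUES (2, 1, 0, 0, 0, 0, 0);
    """)
    sql = f"""
        SELECT id, a_1 AS a, b_1 AS b FROM table1
        {set_operator} SELECT id, a_2, b_2 FROM table1
        {set_operator} SELECT id, a_3, b_3 FROM table1
        ORDER BY id
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(len(unpivot_pairs("UNION")))      # 3: duplicates removed
    print(len(unpivot_pairs("UNION ALL")))  # 6: three rows per ID
```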
Or use UNPIVOT with several column pairs at once:
select *
from t
unpivot (
(A,B)
for z in (
(A_1, B_1),
(A_2 , B_2),
(A_3, B_3)
)
);
Full test case with the results:
with t (ID, A_1, B_1, A_2 , B_2 , A_3, B_3) as (
select 1, 0 , 0, 0, 0, 0, 0 from dual union all
select 2, 1 , 0, 0, 0, 0, 0 from dual
)
select *
from t
unpivot (
(A,B)
for z in (
(A_1, B_1),
(A_2 , B_2),
(A_3, B_3)
)
);
Results:
ID Z A B
---------- ------- ---------- ----------
1 A_1_B_1 0 0
1 A_2_B_2 0 0
1 A_3_B_3 0 0
2 A_1_B_1 1 0
2 A_2_B_2 0 0
2 A_3_B_3 0 0
I have a table with the data shown below:
no s d
100 I D
100 C D
101 C null
101 I null
102 C D
102 I null
Then I'm using this query to partition the rows and keep one row per no:
create table pinky nologging as
select no,status,dead
from(select no,status,dead,
row_number() over(partition by no order by dead desc) seq
from PINK) d
where seq = 1;
I'm getting these results:
100 I D
101 C null
102 I null
but I want data like this:
100 C D
101 I NULL
102 I NULL
i.e.,
for an I and C combination where both d values are D, pick C;
for an I and C combination where both d values are null, pick I;
for an I and C combination where one d is null and the other is D, pick the row whose d is null.
Assuming that there can be only one row with dead is null for each no:
with
-- your data; remove it when running the query
-- in your environment ...
pink (no, status, dead) as
(select 100, 'I', 'D' from dual union
select 100, 'C', 'D' from dual union
select 101, 'C', null from dual union
select 101, 'I', null from dual union
select 102, 'C', 'D' from dual union
select 102, 'I', null from dual
),
-- ... end of your data
--
temp as ( -- a temporary table (CTE) where we make some
-- preliminary calculations
select pink.*,
-- count rows with status = 'I' for this no
sum(case when status = 'I' then 1 else 0 end) over(partition by no) ni,
-- count rows with status = 'C' for this no
sum(case when status = 'C' then 1 else 0 end) over(partition by no) nc,
-- count rows with dead = 'D' for this no
sum(case when dead = 'D' then 1 else 0 end) over(partition by no) nd,
-- total number of rows (in case it's not always = 2)
count(*) over(partition by no) n
from pink
)
select no, status, dead
from temp
where -- pick 'C' if there's also 'I' and all dead = 'D'
status = 'C' and ni > 0 and nd = n
-- pick 'I' if there's also 'C' and all dead is null
or status = 'I' and nc > 0 and nd = 0
-- pick dead is null if there are I's and C's and
-- all other dead's = 'D'
or dead is null and ni > 0 and nc > 0 and n - nd = 1;
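To check the predicate logic without an Oracle instance, the same query can be run in SQLite (3.25 or later, for window functions) through Python's sqlite3 module. This is a port, not the answer's exact code: the CTE is renamed to counted and the no column is double-quoted, both as precautions against SQLite keywords (TEMP, NO), and the sample rows are the ones from the question.

```python
import sqlite3

PICK_SQL = """
WITH counted AS (
    SELECT p.*,
           SUM(CASE WHEN status = 'I' THEN 1 ELSE 0 END) OVER (PARTITION BY "no") AS ni,
           SUM(CASE WHEN status = 'C' THEN 1 ELSE 0 END) OVER (PARTITION BY "no") AS nc,
           SUM(CASE WHEN dead = 'D' THEN 1 ELSE 0 END) OVER (PARTITION BY "no") AS nd,
           COUNT(*) OVER (PARTITION BY "no") AS n
      FROM pink p
)
SELECT "no", status, dead
  FROM counted
 WHERE (status = 'C' AND ni > 0 AND nd = n)                -- I+C, all dead = 'D': pick C
    OR (status = 'I' AND nc > 0 AND nd = 0)                -- I+C, all dead null: pick I
    OR (dead IS NULL AND ni > 0 AND nc > 0 AND n - nd = 1) -- mixed: pick the null row
 ORDER BY "no"
"""

SAMPLE = [(100, "I", "D"), (100, "C", "D"),
          (101, "C", None), (101, "I", None),
          (102, "C", "D"), (102, "I", None)]


def pick_rows(rows):
    """Apply the answer's predicates to (no, status, dead) tuples via SQLite."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE pink ("no" INTEGER, status TEXT, dead TEXT)')
    con.executemany("INSERT INTO pink VALUES (?, ?, ?)", rows)
    result = con.execute(PICK_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pick_rows(SAMPLE):
        print(row)
```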
create or replace procedure prcdr_Clustering is
v_sampleCount number;
v_sampleFlag number;
v_matchPercent number;
v_SpendAmount Number(18, 2);
cursor cur_PDCSample is
SELECT *
FROM TBL_BIL
WHERE UDF_CHK = 'N';
rec_Pdcsample TBL_BIL%rowtype;
BEGIN
OPEN cur_PDCSample;
LOOP
FETCH cur_PDCSample
into rec_Pdcsample;
EXIT WHEN cur_PDCSample%NOTFOUND;
SELECT COUNT(*)
INTO v_sampleCount
FROM TBL_BIL
WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
IF v_sampleCount <> 0 THEN
UPDATE TBL_BIL
SET UDF_CHK = 'Y'
WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
IF v_sampleCount > 1 THEN
v_sampleFlag := 1;
ELSE
IF v_sampleCount = 1 THEN
v_sampleFlag := 2;
ELSE
v_sampleFlag := 0;
END IF;
END IF;
UPDATE TBL_BIL
SET UDF_SAMPLECOUNT = v_sampleCount, UDF_SAMPLEFLAG = v_sampleFlag
WHERE uniqueid = rec_Pdcsample.uniqueid;
UPDATE TBL_BIL
SET UDF_PID = rec_Pdcsample.uniqueid
WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
UPDATE TBL_BIL
SET UDF_PIDSPEND = v_SpendAmount
WHERE uniqueid = rec_Pdcsample.uniqueid;
UPDATE TBL_BIL
SET UDF_MATCHPERCENT = 1
WHERE uniqueid <> rec_Pdcsample.uniqueid
AND UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
END IF;
IF cur_PDCSample%ISOPEN THEN
CLOSE cur_PDCSample;
END IF;
OPEN cur_PDCSample;
END LOOP;
IF cur_PDCSample%ISOPEN THEN
CLOSE cur_PDCSample;
END IF;
end PrcdrClustering;
It takes days to execute; my table has 225,846 rows.
The structure of my table is:
UNIQUEID NUMBER NOT NULL PRIMARY KEY
VENDORNAME VARCHAR2(200)
SHORTTEXT VARCHAR2(500)
SPENDAMT NUMBER(18,2)
UDF_TOKENIZED VARCHAR2(999)
UDF_PID NUMBER(10)
UDF_SAMPLEFLAG NUMBER(4)
UDF_SAMPLECOUNT NUMBER(4)
UDF_MATCHPERCENT NUMBER(4)
UDF_TOKENCNT NUMBER(4)
UDF_PIDSPEND NUMBER(18,2)
UDF_CHK VARCHAR2(1)
Where to start? I have a number of points to make.
You're doing bulk updates; this implies that bulk collect ... forall would be far more efficient.
You're doing multiple updates of the same table, which multiplies the amount of DML.
As you've already selected from the table, re-entering it to do another count is pointless; use an analytic function to get the result you need.
Indentation, indentation, indentation. It makes your code much easier to read.
You can use elsif to reduce the number of statements to be evaluated (a very, very minor win).
If uniqueid is unique, you can use rowid to update the table.
You're updating udf_pidspend to null (v_SpendAmount is never assigned); whether this is intentional or not, there's no need to do a separate update for it.
You can do a lot more in the cursor, and there's no need to select every column; selecting only what you need decreases the amount of data read from disk.
You may need a couple of commits in there, though this means you can't roll back if it fails midway.
I hope tbl_bil is indexed on uniqueid.
As GolzeTrol noted, you're opening the cursor multiple times. There's no need for this.
As general rules:
If you're going to select / update or delete from a table do it once if possible and as few times as possible if not.
If you're doing bulk operations use bulk collect.
Never write select *
Use rowid where possible; it avoids index lookups entirely.
This will only work in 11g, because it references fields of a collection of records inside FORALL. I answered a question recently where I provided my own way of dealing with this implementation restriction in versions prior to 11g, and linked to Ollie's, Tom Kyte's and Sathya's.
I'm not entirely certain what you're trying to do here so please forgive me if the logic is a little off.
create or replace procedure prcdr_Clustering is

   -- one pass over the unchecked rows; the analytic count replaces the
   -- per-row select count(*) (note: it only counts rows still at 'N')
   cursor c_pdcsample is
    select rowid as rid
         , uniqueid
         , udf_tokenized
         , count(*) over ( partition by udf_tokenized ) as samplecount
         , 0 as sampleflag
         , max(uniqueid) over ( partition by udf_tokenized ) as udf_pid
      from tbl_bil
     where udf_chk = 'N';

   type t__pdcsample is table of c_pdcsample%rowtype index by binary_integer;
   t_pdcsample t__pdcsample;

begin

   open c_pdcsample;
   loop
      fetch c_pdcsample bulk collect into t_pdcsample limit 1000;
      exit when t_pdcsample.count = 0;

      -- derive the sample flag in memory for every fetched row
      for i in 1 .. t_pdcsample.count loop
         if t_pdcsample(i).samplecount > 1 then
            t_pdcsample(i).sampleflag := 1;
         elsif t_pdcsample(i).samplecount = 1 then
            t_pdcsample(i).sampleflag := 2;
         else
            t_pdcsample(i).sampleflag := 0;
         end if;
      end loop;

      -- one bulk update per batch, locating each row by its rowid
      forall i in 1 .. t_pdcsample.count
         update tbl_bil
            set udf_samplecount = t_pdcsample(i).samplecount
              , udf_sampleflag  = t_pdcsample(i).sampleflag
              , udf_pidspend    = null
              , udf_pid         = t_pdcsample(i).udf_pid
              , udf_chk         = 'Y'
          where rowid = t_pdcsample(i).rid;

      forall i in 1 .. t_pdcsample.count
         update tbl_bil
            set udf_matchpercent = 1
          where uniqueid <> t_pdcsample(i).uniqueid
            and udf_tokenized = t_pdcsample(i).udf_tokenized;

      commit;
   end loop;
   close c_pdcsample;

end prcdr_Clustering;
/
Lastly, prefixing every table name with tbl_ is a little unnecessary.
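The first general rule (touch the table as few times as possible) can be pushed all the way to a single set-based UPDATE. Below is a minimal sketch in SQLite via Python's sqlite3 module as a stand-in for Oracle. It covers only UDF_SAMPLECOUNT, UDF_SAMPLEFLAG and UDF_CHK for every unchecked row, leaves out UDF_PID, UDF_MATCHPERCENT and UDF_PIDSPEND, and takes the flag mapping (1 for more than one match, 2 for exactly one, 0 otherwise) from the question's IF block. Treating every 'N' row this way is an assumption about the intended behaviour.

```python
import sqlite3


def refresh_sample_counts(rows):
    """rows: (uniqueid, udf_tokenized, udf_chk) tuples.
    Returns (uniqueid, udf_samplecount, udf_sampleflag, udf_chk) after one UPDATE."""
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE tbl_bil (
                       uniqueid INTEGER PRIMARY KEY,
                       udf_tokenized TEXT,
                       udf_samplecount INTEGER,
                       udf_sampleflag INTEGER,
                       udf_chk TEXT)""")
    con.executemany(
        "INSERT INTO tbl_bil (uniqueid, udf_tokenized, udf_chk) VALUES (?, ?, ?)", rows)
    # One statement visits each unchecked row once; the group size is a correlated
    # COUNT(*) over all rows sharing the token, as in the original per-row select.
    con.execute("""
        UPDATE tbl_bil
           SET udf_samplecount = (SELECT COUNT(*) FROM tbl_bil t2
                                   WHERE t2.udf_tokenized = tbl_bil.udf_tokenized),
               udf_sampleflag = CASE (SELECT COUNT(*) FROM tbl_bil t2
                                       WHERE t2.udf_tokenized = tbl_bil.udf_tokenized)
                                  WHEN 0 THEN 0 WHEN 1 THEN 2 ELSE 1 END,
               udf_chk = 'Y'
         WHERE udf_chk = 'N'""")
    result = con.execute("""SELECT uniqueid, udf_samplecount, udf_sampleflag, udf_chk
                              FROM tbl_bil ORDER BY uniqueid""").fetchall()
    con.close()
    return result


if __name__ == "__main__":
    sample = [(1, "bl", "N"), (2, "bla", "N"), (3, "bla", "Y"), (4, "blah", "N")]
    for row in refresh_sample_counts(sample):
        print(row)
```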
Here is a variant using a single SQL statement. I'm not 100% certain that the logic is exactly the same, but for my test set it is. Also, the current procedure is non-deterministic when you have more than one record with udf_chk = 'N' and the same udf_tokenized ...
This is the refactored procedure
SQL> create procedure prcdr_clustering_refactored
is
begin
  merge into tbl_bil t
  using ( select tb1.uniqueid
               , count(*) over (partition by tb1.udf_tokenized) cnt
               , max(decode(udf_chk,'N',uniqueid)) over (partition by tb1.udf_tokenized order by tb1.udf_chk) pid
            from tbl_bil tb1
           where udf_chk = 'N'
              or exists
                 ( select 'dummy'
                     from tbl_bil tb2
                    where tb2.udf_tokenized = tb1.udf_tokenized
                 )
        ) q
     on ( t.uniqueid = q.uniqueid )
   when matched then
        update
           set t.udf_samplecount  = decode(t.udf_chk,'N',q.cnt,t.udf_samplecount)
             , t.udf_sampleflag   = decode(t.udf_chk,'N',decode(q.cnt,1,2,1),t.udf_sampleflag)
             , t.udf_pid          = q.pid
             , t.udf_pidspend     = decode(t.udf_chk,'N',null,t.udf_pidspend)
             , t.udf_matchpercent = decode(t.udf_chk,'N',t.udf_matchpercent,1)
             , t.udf_chk          = 'Y'
  ;
end;
/
Procedure created.
And here is a test:
SQL> select *
  from tbl_bil
 order by uniqueid
/
UNIQUEID VENDORNAME SHORTTEXT SPENDAMT UDF_TOKENI UDF_PID UDF_SAMPLEFLAG UDF_SAMPLECOUNT UDF_MATCHPERCENT UDF_TOKENCNT UDF_PIDSPEND U
-------- ---------- ---------- -------- ---------- ------- -------------- --------------- ---------------- ------------ ------------ -
1 a a 1 bl 0 0 0 0 0 0 N
2 a a 1 bla 0 0 0 0 0 0 N
3 a a 1 bla 0 0 0 0 0 0 Y
4 a a 1 bla 0 0 0 0 0 0 Y
5 a a 1 bla 0 0 0 0 0 0 Y
6 a a 1 blah 0 0 0 0 0 0 N
7 a a 1 blah 0 0 0 0 0 0 Y
8 a a 1 blah 0 0 0 0 0 0 Y
9 a a 1 blah 0 0 0 0 0 0 Y
10 a a 1 blah 0 0 0 0 0 0 Y
11 a a 1 blah 0 0 0 0 0 0 Y
11 rows selected.
SQL> exec prcdr_clustering
PL/SQL procedure successfully completed.
SQL> select *
  from tbl_bil
 order by uniqueid
/
UNIQUEID VENDORNAME SHORTTEXT SPENDAMT UDF_TOKENI UDF_PID UDF_SAMPLEFLAG UDF_SAMPLECOUNT UDF_MATCHPERCENT UDF_TOKENCNT UDF_PIDSPEND U
-------- ---------- ---------- -------- ---------- ------- -------------- --------------- ---------------- ------------ ------------ -
1 a a 1 bl 1 2 1 0 0 Y
2 a a 1 bla 2 1 4 0 0 Y
3 a a 1 bla 2 0 0 1 0 0 Y
4 a a 1 bla 2 0 0 1 0 0 Y
5 a a 1 bla 2 0 0 1 0 0 Y
6 a a 1 blah 6 1 6 0 0 Y
7 a a 1 blah 6 0 0 1 0 0 Y
8 a a 1 blah 6 0 0 1 0 0 Y
9 a a 1 blah 6 0 0 1 0 0 Y
10 a a 1 blah 6 0 0 1 0 0 Y
11 a a 1 blah 6 0 0 1 0 0 Y
11 rows selected.
SQL> rollback
/
Rollback complete.
SQL> exec prcdr_clustering_refactored
PL/SQL procedure successfully completed.
SQL> select *
  from tbl_bil
 order by uniqueid
/
UNIQUEID VENDORNAME SHORTTEXT SPENDAMT UDF_TOKENI UDF_PID UDF_SAMPLEFLAG UDF_SAMPLECOUNT UDF_MATCHPERCENT UDF_TOKENCNT UDF_PIDSPEND U
-------- ---------- ---------- -------- ---------- ------- -------------- --------------- ---------------- ------------ ------------ -
1 a a 1 bl 1 2 1 0 0 Y
2 a a 1 bla 2 1 4 0 0 Y
3 a a 1 bla 2 0 0 1 0 0 Y
4 a a 1 bla 2 0 0 1 0 0 Y
5 a a 1 bla 2 0 0 1 0 0 Y
6 a a 1 blah 6 1 6 0 0 Y
7 a a 1 blah 6 0 0 1 0 0 Y
8 a a 1 blah 6 0 0 1 0 0 Y
9 a a 1 blah 6 0 0 1 0 0 Y
10 a a 1 blah 6 0 0 1 0 0 Y
11 a a 1 blah 6 0 0 1 0 0 Y
11 rows selected.
Regards,
Rob.
I don't know why, but you open cur_PDCSample, which selects (I suspect) thousands of records. Then, in the loop, you close the cursor and reopen it, each time processing only the first record that is returned.
If you open the cursor once, process each record and then close it, your procedure will probably go a lot faster.
Actually, since you do not always update TBL_BIL.UDF_CHK to 'Y', it seems to me that your current procedure may never terminate.