Fastest way of doing field comparisons in the same table with large amounts of data in Oracle

I am receiving information in a CSV file from one department to compare with the same information from a different department, to check for discrepancies (about 3/4 of a million rows of data with 44 columns in each row). After I have the data in a table, I have a program that takes the data and sends reports per HQ. I feel like the way I am going about this is not the most efficient. I am using Oracle for this comparison.
Here is what I have:
I have a VB.NET program that parses the data and inserts it into an extract table
I run a procedure to do a full outer join on the two tables into a new table, with the fields from one department suffixed with '_c'
I run another procedure to compare the old/new data and update 2 different tables with detail and summary information. Here is code from inside the procedure:
DECLARE
CURSOR Cur_Comp IS SELECT * FROM T.AEC_CIS_COMP;
BEGIN
FOR compRow in Cur_Comp LOOP
--If service pipe exists in CIS but not in FM and the service pipe has status of retired in CIS, ignore the variance
IF (compRow.pipe_num = '' AND cis_status_c = 'R') THEN
CONTINUE;
END IF;
--If there is not a summary record for this HQ in the table for this run, create one
INSERT INTO t.AEC_CIS_SUM (HQ, RUN_DATE)
SELECT compRow.HQ, to_date(sysdate, 'DD/MM/YYYY') from dual WHERE NOT EXISTS
(SELECT null FROM t.AEC_CIS_SUM WHERE HQ = compRow.HQ AND RUN_DATE = to_date(sysdate, 'DD/MM/YYYY'));
-- Check fields and update the tables accordingly
If (compRow.cis_loop <> compRow.cis_loop_c) Then
--Insert information into the details table
INSERT INTO T.AEC_CIS_DET( Fac_id, Pipe_Num, Hq, Address, AutoUpdatedFl,
DateTime, Changed_Field, CIS_Value, FM_Value)
VALUES(compRow.Fac_ID, compRow.Pipe_Num, compRow.Hq, compRow.Street_Num || ' ' || compRow.Street_Name,
'Y', sysdate, 'Cis_Loop', compRow.cis_loop, compRow.cis_loop_c);
-- Update information into the summary table
UPDATE AEC_CIS_SUM
SET cis_loop = cis_loop + 1
WHERE Hq = compRow.Hq
AND Run_Date = to_date(sysdate, 'DD/MM/YYYY');
End If;
END LOOP;
END;
Any suggestions of an easier way of doing this rather than an if statement for all 44 columns of the table? (This is run once a week if it matters)
Update: Just to clarify, there are 88 columns of data (44 pairs to compare, one column of each pair suffixed with _c). One table lists each field in a row that is different, so one source row can produce 30+ records in that table. The other table keeps a tally of the number of discrepancies for each week.

First of all, I believe that your task can (and actually should) be implemented with straight SQL. No fancy cursors, no loops, just selects, inserts and updates. I would start with unpivoting your source data (it is not clear whether you have a primary key to join the two sets; I guess you do):
Col0_PK    Col1  Col2  Col3  Col4
---------------------------------
Row1_val   A     B     C     D
Row2_val   E     F     G     H
Above is your source data. Using the UNPIVOT clause, we convert it to:
Col0_PK    Col_Name  Col_Value
------------------------------
Row1_val   Col1      A
Row1_val   Col2      B
Row1_val   Col3      C
Row1_val   Col4      D
Row2_val   Col1      E
Row2_val   Col2      F
Row2_val   Col3      G
Row2_val   Col4      H
I think you get the idea. Say we have table1 with one set of data and an identically structured table2 with the second set of data. It is a good idea to use index-organized tables.
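To make the unpivot step concrete, here is a small sketch in Python using the standard-library sqlite3 module. SQLite has no UNPIVOT clause, so the sketch emulates it with one SELECT per column glued together with UNION ALL; in Oracle the same shape would come from UNPIVOT INCLUDE NULLS (col_value FOR col_name IN (col1, col2, ...)). The table, the unpivot_sql helper and the data are illustrative only. With 44 columns of mixed types, you will likely need to convert them to a common type (e.g. with TO_CHAR) before unpivoting.

```python
import sqlite3
from typing import Sequence

def unpivot_sql(table: str, pk: str, columns: Sequence[str]) -> str:
    """Build a query that turns each listed column into its own row.

    SQLite has no UNPIVOT clause, so this emulates Oracle's
    UNPIVOT INCLUDE NULLS with one SELECT per column joined by UNION ALL.
    Table and column names are interpolated directly: pass only trusted names.
    """
    parts = [
        f"SELECT {pk} AS col0_pk, '{col}' AS col_name, {col} AS col_value FROM {table}"
        for col in columns
    ]
    return "\nUNION ALL\n".join(parts)

conn = sqlite3.connect(":memory:")
# Illustrative wide table matching the Col0_PK / Col1..Col4 example.
conn.execute("CREATE TABLE table1 (col0_pk TEXT PRIMARY KEY, col1, col2, col3, col4)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [("Row1_val", "A", "B", "C", "D"), ("Row2_val", "E", "F", "G", "H")],
)

sql = unpivot_sql("table1", "col0_pk", ["col1", "col2", "col3", "col4"])
rows = conn.execute(f"SELECT * FROM ({sql}) ORDER BY col0_pk, col_name").fetchall()
for row in rows:
    print(row)  # first line: ('Row1_val', 'col1', 'A')
```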
The next step is comparing the rows to each other and storing the difference details. Something like:
insert into diff_details(some_service_info_columns_here)
select some_service_info_columns_here_along_with_data_difference
from table1 t1 inner join table2 t2
on t1.Col0_PK = t2.Col0_PK
and t1.Col_name = t2.Col_name
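-- NULL-safe comparison: DECODE treats two NULLs as equal, whereas two different NVL dummies would report NULL vs NULL as a difference
and decode(t1.Col_value, t2.Col_value, 0, 1) = 1;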
And in the last step we fill the difference summary table:
insert into diff_summary(summary_columns_here)
select diff_row_id, count(*) as diff_count
from diff_details
group by diff_row_id;
It's just a rough draft to show my approach; I'm sure there are many more details that should be taken into account. To summarize, I suggest two things:
UNPIVOT data
Use SQL statements instead of cursors
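The detail and summary steps can be tried end to end with the same sqlite3 stand-in. This sketch assumes the data is already unpivoted (the rows are made up) and uses SQLite's IS NOT, which is NULL-safe, as the "values differ" test; in Oracle, DECODE(a, b, 0, 1) = 1 gives the same NULL-safe behaviour. The diff_details and diff_summary layouts are simplified placeholders, not the question's real AEC_CIS_DET / AEC_CIS_SUM tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative, already-unpivoted data for the two departments.
for name in ("table1", "table2"):
    conn.execute(f"CREATE TABLE {name} (col0_pk TEXT, col_name TEXT, col_value TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    ("Row1_val", "Col1", "A"), ("Row1_val", "Col2", "B"), ("Row1_val", "Col3", None),
    ("Row2_val", "Col1", "E"), ("Row2_val", "Col2", None),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [
    ("Row1_val", "Col1", "A"), ("Row1_val", "Col2", "X"), ("Row1_val", "Col3", None),
    ("Row2_val", "Col1", "Z"), ("Row2_val", "Col2", "F"),
])

# Detail step: one INSERT ... SELECT covers every column of every row.
conn.execute("CREATE TABLE diff_details (col0_pk TEXT, col_name TEXT, value1 TEXT, value2 TEXT)")
conn.execute("""
    INSERT INTO diff_details
    SELECT t1.col0_pk, t1.col_name, t1.col_value, t2.col_value
    FROM table1 t1
    JOIN table2 t2 ON t1.col0_pk = t2.col0_pk AND t1.col_name = t2.col_name
    WHERE t1.col_value IS NOT t2.col_value  -- NULL-safe: NULL vs NULL is not a difference
""")

# Summary step: a GROUP BY over the details, no per-row counters needed.
conn.execute("CREATE TABLE diff_summary (col0_pk TEXT, diff_count INTEGER)")
conn.execute("""
    INSERT INTO diff_summary
    SELECT col0_pk, COUNT(*) FROM diff_details GROUP BY col0_pk
""")

summary = conn.execute("SELECT col0_pk, diff_count FROM diff_summary ORDER BY col0_pk").fetchall()
print(summary)  # [('Row1_val', 1), ('Row2_val', 2)]
```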

You have several issues in your code:
IF (compRow.pipe_num = '' AND cis_status_c = 'R') THEN
CONTINUE;
END IF;
"cis_status_c" is not declared. Is it a variable or a column in AEC_CIS_COMP?
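If it is a column, just put the condition into the cursor query, i.e. SELECT * FROM T.AEC_CIS_COMP WHERE NOT (pipe_num IS NULL AND cis_status_c = 'R'). (Oracle treats the empty string '' as NULL, so pipe_num = '' is never true; test with IS NULL instead.)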
to_date(sysdate, 'DD/MM/YYYY')
That's nonsense: you convert a date into a date (via an implicit conversion to a string that depends on your NLS settings). Simply use TRUNC(SYSDATE).
Anyway, I think you can use three plain SQL statements instead of a cursor:
INSERT INTO t.AEC_CIS_SUM (HQ, RUN_DATE)
SELECT DISTINCT comp.HQ, trunc(sysdate) -- DISTINCT: NOT EXISTS does not see rows inserted by this same statement
from t.AEC_CIS_COMP comp
WHERE NOT EXISTS
(SELECT null FROM t.AEC_CIS_SUM WHERE HQ = comp.HQ AND RUN_DATE = trunc(sysdate));
INSERT INTO T.AEC_CIS_DET( Fac_id, Pipe_Num, Hq, Address, AutoUpdatedFl, DateTime, Changed_Field, CIS_Value, FM_Value)
select comp.Fac_ID, comp.Pipe_Num, comp.Hq, comp.Street_Num || ' ' || comp.Street_Name, 'Y', sysdate, 'Cis_Loop', comp.cis_loop, comp.cis_loop_c
from T.AEC_CIS_COMP comp
where decode(comp.cis_loop, comp.cis_loop_c, 0, 1) = 1; -- NULL-safe "<>": DECODE treats two NULLs as equal
-- Set the counter to the number of differences per HQ, instead of adding 1 once per HQ
UPDATE t.AEC_CIS_SUM s
SET cis_loop = (SELECT count(*) FROM T.AEC_CIS_COMP comp
WHERE comp.Hq = s.Hq
AND decode(comp.cis_loop, comp.cis_loop_c, 0, 1) = 1)
WHERE s.Run_Date = trunc(sysdate);
They are untested, but they should give you an idea of how to do it.
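As for avoiding 44 IF blocks: the per-column detail statements differ only in the column name, so they can be generated from a list of column names rather than written by hand. The Python sketch below builds one INSERT ... SELECT per compared column, modelled on the detail-table INSERT above; COMPARED_COLUMNS, the predicate templates and detail_inserts are hypothetical names. The default (Oracle) form uses DECODE as a NULL-safe comparison and is untested against Oracle; the sketch exercises the generator against SQLite (with IS NOT instead) only so it can run without a database server.

```python
import sqlite3
from typing import List, Sequence

# Hypothetical subset of the 44 compared columns; each has a "<name>_c" partner.
COMPARED_COLUMNS = ["cis_loop", "cis_status"]

ORACLE_DIFFERS = "DECODE({a}, {b}, 0, 1) = 1"  # DECODE matches NULL with NULL
SQLITE_DIFFERS = "{a} IS NOT {b}"              # SQLite's NULL-safe inequality

def detail_inserts(columns: Sequence[str], differs: str = ORACLE_DIFFERS,
                   now: str = "SYSDATE", src: str = "T.AEC_CIS_COMP",
                   dest: str = "T.AEC_CIS_DET") -> List[str]:
    """Return one set-based INSERT ... SELECT per compared column."""
    statements = []
    for col in columns:
        a, b = f"comp.{col}", f"comp.{col}_c"
        statements.append(
            f"INSERT INTO {dest} (Fac_id, Pipe_Num, Hq, Address, AutoUpdatedFl, "
            f"DateTime, Changed_Field, CIS_Value, FM_Value) "
            f"SELECT comp.Fac_ID, comp.Pipe_Num, comp.Hq, "
            f"comp.Street_Num || ' ' || comp.Street_Name, 'Y', {now}, "
            f"'{col}', {a}, {b} "
            f"FROM {src} comp WHERE {differs.format(a=a, b=b)}"
        )
    return statements

# Exercise the generator against an in-memory SQLite copy with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aec_cis_comp (Fac_ID, Pipe_Num, Hq, Street_Num, Street_Name, "
             "cis_loop, cis_loop_c, cis_status, cis_status_c)")
conn.execute("CREATE TABLE aec_cis_det (Fac_id, Pipe_Num, Hq, Address, AutoUpdatedFl, "
             "DateTime, Changed_Field, CIS_Value, FM_Value)")
conn.executemany("INSERT INTO aec_cis_comp VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, "P1", "HQ1", "10", "Main St", "L1", "L1", "A", "R"),    # status differs
    (2, "P2", "HQ1", "12", "Main St", "L1", "L2", None, None),  # loop differs
    (3, "P3", "HQ2", "14", "Oak Ave", None, None, "A", "A"),    # no differences
])
for stmt in detail_inserts(COMPARED_COLUMNS, SQLITE_DIFFERS, "CURRENT_TIMESTAMP",
                           "aec_cis_comp", "aec_cis_det"):
    conn.execute(stmt)

changes = conn.execute("SELECT Fac_id, Changed_Field FROM aec_cis_det ORDER BY Fac_id").fetchall()
print(changes)  # [(1, 'cis_status'), (2, 'cis_loop')]
```

In Oracle, a conditional multitable INSERT ALL (WHEN ... THEN INTO ...) could write all the detail rows in a single pass over AEC_CIS_COMP instead of one scan per column; for a weekly job on roughly 750,000 rows, separate statements are probably fast enough.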

Related

Transferring data to a test table

There is a table CONTACT_HISTORY with 1,244,000,000 rows (from 04.03.2022 to 05.06.2022) and with the fields contact_dt and contact_dttm. I tried to transfer all the data to a test table, day by day on contact_dt, with this script:
DECLARE
dat date;
begin
dat:= TO_DATE('04.03.2022', 'dd.mm.yyyy');
while dat<= TO_DATE('05.06.2022', 'dd.mm.yyyy') loop
INSERT /*+ append enable_parallel_dml parallel(16)*/
INTO CONTACT_HISTORY_TEST ct
SELECT -- + parallel(16)
ch.sas_contact_id,
ch.contact_source,
ch.client_id,
ch.contact_dttm,
ch.contact_dt,
ch.sas_contact_error_desc,
ch.sas_contact_status
FROM CONTACT_HISTORY ch
WHERE ch.contact_dt = dat;
commit;
dat:= dat+1;
end loop;
end;
The problem is that SELECT COUNT(*) FROM CONTACT_HISTORY_TEST shows only 1,200,000,000 rows in the test table, while the main table has 1,244,000,000.
And there is something odd: when checking
SELECT COUNT(*)
FROM CONTACT_HISTORY
WHERE CONTACT_DT>= TO_DATE('04.03.2021', 'dd.mm.yyyy')
AND CONTACT_DT<= TO_DATE('05.06.2022', 'dd.mm.yyyy');
SELECT COUNT(*)
FROM CONTACT_HISTORY_TEST
WHERE CONTACT_DT>= TO_DATE('04.03.2021', 'dd.mm.yyyy')
AND CONTACT_DT<= TO_DATE('05.06.2022', 'dd.mm.yyyy');
both tables return 1,200,000,000 rows. Please tell me where the remaining 44 million rows have gone, and how I can transfer the data from the table completely, or how to do it right.
I presume that the contact_dt column contains date values that have a time component; for example, it isn't just 04.03.2021, but 04.03.2021 13:23:45.
The code you posted handles the "start" of the period correctly, as 04.03.2021 actually represents 04.03.2021 00:00:00.
However, the last day of that period isn't handled correctly: you're missing (almost) the whole last day because you copied only rows whose contact_dt is equal to 05.06.2022 00:00:00. What about e.g. 05.06.2022 08:32:13?
Therefore, change the upper bound. If the contact_dt column is indexed, you shouldn't apply TRUNC to it (a plain index on contact_dt can't be used for TRUNC(contact_dt)), so the simplest option is to change this
while dat <= TO_DATE('05.06.2022', 'dd.mm.yyyy') loop
to
while dat < TO_DATE('06.06.2022', 'dd.mm.yyyy') loop
As @APC commented, the WHERE clause should then also be fixed to
where ch.contact_dt >= dat and ch.contact_dt < dat + 1
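The difference between the equality test and the half-open range can be seen in a small Python sketch. The timestamps are invented, and plain datetime comparisons stand in for Oracle DATE comparisons (an Oracle DATE also carries a time of day).

```python
from datetime import datetime, timedelta

# Invented contact_dt values; some carry a time of day.
contact_dts = [
    datetime(2022, 6, 4, 0, 0, 0),
    datetime(2022, 6, 5, 0, 0, 0),
    datetime(2022, 6, 5, 8, 32, 13),
    datetime(2022, 6, 5, 23, 59, 59),
]

def rows_equal_to(day):
    """Like WHERE ch.contact_dt = dat: only values exactly at midnight match."""
    return [d for d in contact_dts if d == day]

def rows_in_day(day):
    """Like WHERE ch.contact_dt >= dat AND ch.contact_dt < dat + 1."""
    return [d for d in contact_dts if day <= d < day + timedelta(days=1)]

def copy_day_by_day(start, end_exclusive):
    """Like the corrected loop: WHILE dat < end_exclusive, one day per pass."""
    copied = []
    day = start
    while day < end_exclusive:
        copied.extend(rows_in_day(day))
        day += timedelta(days=1)
    return copied

last_day = datetime(2022, 6, 5)
print(len(rows_equal_to(last_day)))  # 1
print(len(rows_in_day(last_day)))    # 3
print(len(copy_day_by_day(datetime(2022, 6, 4), datetime(2022, 6, 6))))  # 4
```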
To verify the number of rows and the date values, run the following code in both schemas and then post the result (edit the question; don't post it as a comment):
alter session set nls_date_format = 'dd.mm.yyyy hh24:mi:ss';
select min(contact_dt) min_dat, max(contact_dt) max_dat, count(*) cnt
from contact_history;

Translate hierarchical Oracle query to DB2 query

I work primarily with SAS and Oracle and am still new to DB2. I'm faced with needing a hierarchical query to split a CLOB into chunks that can be pulled into SAS. SAS has a limit of 32K for character variables, so I can't just pull the dataset in normally.
I found an old Stack Overflow question about the best way to pull a CLOB into a SAS data set, but it is written for Oracle:
Import blob through SAS from ORACLE DB
Since I am new to DB2 and the syntax for this type of join seems very different, I was hoping to find someone who could help convert it and explain the syntax. I find the Oracle syntax much easier to understand. I'm not sure whether in DB2 you would use a recursive CTE like this https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/apsg/src/tpc/db2z_xmprecursivecte.html or hierarchical queries like this https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_71/sqlp/rbafyrecursivequeries.htm
Here is the Oracle query.
SELECT
id
, level as chunk_id
, regexp_substr(clob_value, '.{1,32767}', 1, level, 'n') as clob_chunk
FROM (
SELECT id, clob_value
FROM schema.table
WHERE id = 1
)
CONNECT BY LEVEL <= regexp_count(clob_value, '.{1,32767}',1,'n')
order by id, chunk_id;
The table has two fields, id and clob_value, and would look like this:
ID   CLOB_VALUE
1    really large clob
2    medium clob
3    another large clob
The thought is that I would want this result. I would only ever do this one row at a time, with id = whichever row I am processing.
ID   CHUNK_ID   CLOB
1    1          clob_chunk1of3
1    2          clob_chunk2of3
1    3          clob_chunk3of3
Thanks for any time spent reading and helping.
Here is a solution that should work in DB2 with few changes (but please be advised that I don't know DB2 at all; I am just using Oracle features that are in the SQL Standard, so they should be implemented identically - or almost so - in DB2).
Below I create a table with your sample data; then I show how to chunk it into substrings of length at most 8 characters. Although the strings are short, I defined the column as CLOB and I am using CLOB tools; this should work on much larger CLOBs.
You can make both the chunk size and the id into bind parameters, if needed. In my demo below I hardcoded the chunk size and I show the result for all IDs in the table. In case the CLOB is NULL, I do return one chunk (which is NULL, of course).
Note that touching CLOBs in a query is very expensive; so most of the work is done without touching the CLOBs. I only work on them as little as possible.
PREP WORK
drop table tbl purge; -- If needed
create table tbl (id number, clob_value clob);
insert into tbl (id, clob_value)
select 1, 'really large clob' from dual union all
select 2, 'medium clob' from dual union all
select 3, 'another large clob' from dual union all
select 4, null from dual -- added to check handling
;
commit;
QUERY
with
prep(id, len) as (
select id, dbms_lob.getlength(clob_value)
from tbl
)
, rec(id, len, ord, pos) as (
select id, len, 1, 1
from prep
union all
select id, len, ord + 1, pos + 8
from rec
where len >= pos + 8
)
select id, ord, dbms_lob.substr(clob_value, 8, pos) as chunk
from tbl inner join rec using (id)
order by id, ord
;
ID ORD CHUNK
---- ---- --------
1 1 really l
1 2 arge clo
1 3 b
2 1 medium c
2 2 lob
3 1 another
3 2 large cl
3 3 ob
4 1
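The recursive-CTE approach can be tried locally with Python's built-in sqlite3 module, since SQLite also supports recursive common table expressions. In this sketch SQLite's length() and substr() stand in for dbms_lob.getlength and dbms_lob.substr; note that substr takes (string, start, length) while dbms_lob.substr takes (lob, amount, offset). The chunk size is a bound parameter, and the data is the demo table above.

```python
import sqlite3

CHUNK = 8  # the demo size above; 32767 would match the SAS limit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, clob_value TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [
    (1, "really large clob"), (2, "medium clob"),
    (3, "another large clob"), (4, None),
])

rows = conn.execute("""
    WITH RECURSIVE
      prep(id, len) AS (SELECT id, length(clob_value) FROM tbl),
      rec(id, len, ord, pos) AS (
        SELECT id, len, 1, 1 FROM prep
        UNION ALL
        SELECT id, len, ord + 1, pos + :chunk FROM rec WHERE len >= pos + :chunk
      )
    SELECT tbl.id, rec.ord, substr(tbl.clob_value, rec.pos, :chunk) AS chunk
    FROM tbl JOIN rec ON tbl.id = rec.id
    ORDER BY tbl.id, rec.ord
""", {"chunk": CHUNK}).fetchall()

for row in rows:
    print(row)  # first line: (1, 1, 'really l')
```

As in the Oracle version, only the final SELECT touches the text column; the recursion works on lengths and positions alone.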
Another option is to enable the Oracle compatibility in Db2 and just issue the hierarchical query.
This GitHub repository has background information on SQL recursion in DB2, including the Oracle-style syntax and a side-by-side example (both work against the Db2 sample database):
-- both queries are against the SAMPLE database
-- and should return the same result
SELECT LEVEL, CAST(SPACE((LEVEL - 1) * 4) || '/' || DEPTNAME
AS VARCHAR(40)) AS DEPTNAME
FROM DEPARTMENT
START WITH DEPTNO = 'A00'
CONNECT BY NOCYCLE PRIOR DEPTNO = ADMRDEPT;
WITH tdep(level, deptname, deptno) as (
SELECT 1, CAST( DEPTNAME AS VARCHAR(40)) AS DEPTNAME, deptno
FROM department
WHERE DEPTNO = 'A00'
UNION ALL
SELECT t.LEVEL+1, CAST(SPACE(t.LEVEL * 4) || '/' || d.DEPTNAME
AS VARCHAR(40)) AS DEPTNAME, d.deptno
FROM DEPARTMENT d, tdep t
WHERE d.admrdept=t.deptno and d.deptno<>'A00')
SELECT level, deptname
FROM tdep;
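The recursive-CTE side of that comparison can likewise be replayed with SQLite. The department rows below are made up (they are not the Db2 SAMPLE data); like the query above, they assume the root department A00 lists itself as its administrating department. That is why the recursive member needs the deptno <> 'A00' guard: without it, the recursion would keep revisiting A00, which is what NOCYCLE protects against in the CONNECT BY form. Indentation is done in Python because SQLite has no SPACE() function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (deptno TEXT, deptname TEXT, admrdept TEXT)")
conn.executemany("INSERT INTO department VALUES (?, ?, ?)", [
    ("A00", "HEAD OFFICE", "A00"),  # root: administrated by itself
    ("B01", "PLANNING", "A00"),
    ("C01", "INFORMATION CENTER", "A00"),
    ("E01", "SUPPORT SERVICES", "A00"),
    ("E11", "OPERATIONS", "E01"),
])

rows = conn.execute("""
    WITH RECURSIVE tdep(lvl, deptname, deptno) AS (
        SELECT 1, deptname, deptno FROM department WHERE deptno = 'A00'
        UNION ALL
        SELECT t.lvl + 1, d.deptname, d.deptno
        FROM department d JOIN tdep t ON d.admrdept = t.deptno
        WHERE d.deptno <> 'A00'  -- stop the root from joining to itself
    )
    SELECT lvl, deptno, deptname FROM tdep ORDER BY lvl, deptno
""").fetchall()

for lvl, deptno, deptname in rows:
    # Mirrors the recursive query above: no prefix at level 1, then 4 spaces per level and '/'.
    prefix = "" if lvl == 1 else " " * ((lvl - 1) * 4) + "/"
    print(lvl, prefix + deptname)
```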

Oracle: Merge equivalent of insert all?

I've tried to find an answer on several forums with no luck, so perhaps you can help me out.
I've got an INSERT ALL statement that inserts thousands of rows at once.
INSERT ALL
INTO my_table (field_x, field_y, field_z) VALUES ('value_x1', 'value_y1', 'value_z1')
INTO my_table (field_x, field_y, field_z) VALUES ('value_x2', 'value_y2', 'value_z2')
...
INTO my_table (field_x, field_y, field_z) VALUES ('value_xn', 'value_yn', 'value_zn')
SELECT * FROM DUAL;
Now I'd like to amend it to update rows when some criteria are met. For each row, I could have something like:
MERGE INTO my_table m
USING (SELECT 'value_xi' x, 'value_yi' y, 'value_zi' z FROM DUAL) s
ON (m.field_x = s.x and m.field_y = s.y)
WHEN MATCHED THEN UPDATE SET
field_z = s.z
WHEN NOT MATCHED THEN INSERT (field_x, field_y, field_z)
VALUES (s.x, s.y, s.z);
Is there a way for me to do a kind of "MERGE ALL" that would let me combine all those MERGE statements into one?
Or maybe I'm missing the point and there's a better way to do this?
Thanks,
Edit: One possible solution is to use "UNION ALL" for a set of selects from dual, as follows:
MERGE INTO my_table m
USING (
select '' as x, '' as y, '' as z from dual where 1 = 0 -- header row only: '' is NULL in Oracle, so unfiltered it would be inserted as a row of NULLs
union all select 'value_x1', 'value_y1', 'value_z1' from dual
union all select 'value_x2', 'value_y2', 'value_z2' from dual
[...]
union all select 'value_xn', 'value_yn', 'value_zn' from dual
) s
ON (m.field_x = s.x and m.field_y = s.y)
WHEN MATCHED THEN UPDATE SET
field_z = s.z
WHEN NOT MATCHED THEN INSERT (field_x, field_y, field_z)
VALUES (s.x, s.y, s.z);
NB: I've used a first dummy row (filtered out with WHERE 1 = 0) so that I can generate all the other rows in the same format when I write the statement. It also specifies the column names.
Another solution would be to create a temporary table, INSERT ALL data into it, then merge with the target table and delete the temporary table.
If you're passing in tens of thousands of rows from your Python script, I would:
Create a global temporary table (GTT: this is a permanent table that holds data at session level)
Get your Python script to insert the rows into the GTT
Use the GTT in the Merge statement, e.g.:
merge into your_main_table tgt
using your_gtt src
on (<join conditions>)
when matched then
update ...
when not matched then
insert ...;
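A rough sketch of the staging-table approach, with Python's sqlite3 module standing in for Oracle: a TEMP table plays the role of the GTT, executemany() loads all incoming rows in one batched call, and because SQLite has no MERGE, the WHEN MATCHED / WHEN NOT MATCHED branches are emulated with an UPDATE followed by an INSERT ... WHERE NOT EXISTS. Against Oracle you would keep the single MERGE shown above; the table contents here are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (field_x TEXT, field_y TEXT, field_z TEXT)")
conn.execute("INSERT INTO my_table VALUES ('x1', 'y1', 'old')")

# A TEMP table is private to this connection, standing in for the GTT.
conn.execute("CREATE TEMP TABLE staging (x TEXT, y TEXT, z TEXT)")

# Invented incoming rows; one batched call instead of thousands of INTO clauses.
incoming = [("x1", "y1", "new"), ("x2", "y2", "z2"), ("x3", "y3", "z3")]
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", incoming)

# WHEN MATCHED THEN UPDATE ...
conn.execute("""
    UPDATE my_table
    SET field_z = (SELECT s.z FROM staging s
                   WHERE s.x = my_table.field_x AND s.y = my_table.field_y)
    WHERE EXISTS (SELECT 1 FROM staging s
                  WHERE s.x = my_table.field_x AND s.y = my_table.field_y)
""")
# WHEN NOT MATCHED THEN INSERT ...
conn.execute("""
    INSERT INTO my_table (field_x, field_y, field_z)
    SELECT s.x, s.y, s.z FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM my_table m
                      WHERE m.field_x = s.x AND m.field_y = s.y)
""")
conn.commit()

result = conn.execute("SELECT * FROM my_table ORDER BY field_x").fetchall()
print(result)  # [('x1', 'y1', 'new'), ('x2', 'y2', 'z2'), ('x3', 'y3', 'z3')]
```

One Oracle-specific caution: a GTT created with the default ON COMMIT DELETE ROWS loses its rows at commit, so either load and merge in the same transaction or create it with ON COMMIT PRESERVE ROWS.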

How can I make this MERGE/UPDATE statement more efficient? It's taking too much time

MERGE INTO ////////1 GFO
USING
(SELECT *
FROM
(SELECT facto/////rid,
p-Id,
PRE/////EDATE,
RU//MODE,
cre///date,
ROW_NUMBER() OVER (PARTITION BY facto/////id ORDER BY cre///te DESC) col
FROM ///////////2
) x
WHERE x.col = 1) UFD
ON (GFO.FACTO-/////RID=UFD.FACTO////RID)
WHEN MATCHED THEN UPDATE
SET
GFO.PRE////DATE=UFD.PRE//////DATE
WHERE UFD.CRE/////DATE IS NOT NULL
AND UFD.RU//MODE= 'S'
AND GFO.P////ID=:2
Hi everyone, my MERGE statement above is taking too long. It has to run 40 times on table 1 using table 2 (each has more than 4 million records), once for each of 40 different P_ID values. Please suggest a more efficient way, as it currently takes 40+ minutes.
It updates only one column, using a column from table 2.
I am unable to execute the query; it returns
Error: cannot fetch last explain plan from PLAN_TABLE
[Screenshot of the explain plan]
The plan shown seems to be OK; the observed problem stems from the loop over P_ID, which does not scale.
I assume you are doing something like this (strongly simplified), with the P_IDs to be processed stored in a table TAB_PID:
begin
for cur in (select p_id from tab_pid) loop
merge INTO tab1 USING tab2 ON (tab1.r_id = tab2.r_id)
WHEN MATCHED THEN
UPDATE SET tab1.col1=tab2.col1 WHERE tab1.p_id = cur.p_id;
end loop;
end;
/
A HASH JOIN on large tables (in non-parallel mode) with an elapsed time of 60 seconds is not a catastrophic result, but looping 40 times turns that into your 40 minutes.
So I'd suggest trying to integrate the loop into the MERGE statement. Without knowing the details, something like this (you may also need to adjust the MERGE join condition):
merge INTO tab1 USING tab2 ON (tab1.r_id = tab2.r_id)
WHEN MATCHED THEN
UPDATE SET tab1.col1=tab2.col1
WHERE tab1.p_id in (select p_id from tab_pid);
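To check that folding the loop into one statement changes only the number of passes, not the result, here is a small sqlite3 sketch with invented data. SQLite has no MERGE, so a correlated UPDATE plays the part of MERGE ... WHEN MATCHED THEN UPDATE; the table names follow the simplified example above (tab1, tab2, tab_pid). It demonstrates that the results match; it says nothing about timings.

```python
import sqlite3

def make_db():
    """Invented sample: tab1 is the target, tab2 the source, tab_pid the P_IDs to process."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab1 (r_id INTEGER, p_id INTEGER, col1 TEXT)")
    conn.execute("CREATE TABLE tab2 (r_id INTEGER, col1 TEXT)")
    conn.execute("CREATE TABLE tab_pid (p_id INTEGER)")
    conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?)", [(1, 10, "a"), (2, 20, "b"), (3, 30, "c")])
    conn.executemany("INSERT INTO tab2 VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
    conn.executemany("INSERT INTO tab_pid VALUES (?)", [(10,), (20,)])
    return conn

# Correlated UPDATE standing in for MERGE ... WHEN MATCHED THEN UPDATE.
SET_COL1 = """
    UPDATE tab1
    SET col1 = (SELECT tab2.col1 FROM tab2 WHERE tab2.r_id = tab1.r_id)
    WHERE EXISTS (SELECT 1 FROM tab2 WHERE tab2.r_id = tab1.r_id)
"""

# Loop style: one statement (one pass over the data) per P_ID.
looped = make_db()
for (p_id,) in looped.execute("SELECT p_id FROM tab_pid").fetchall():
    looped.execute(SET_COL1 + " AND tab1.p_id = ?", (p_id,))

# Set-based style: one statement covering every P_ID.
single = make_db()
single.execute(SET_COL1 + " AND tab1.p_id IN (SELECT p_id FROM tab_pid)")

q = "SELECT r_id, col1 FROM tab1 ORDER BY r_id"
print(looped.execute(q).fetchall() == single.execute(q).fetchall())  # True
```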

pl-sql include column names in query

A weird request, maybe. My boss wants me to create an admin version of a page we have that displays data from an Oracle query in a table.
The admin page, instead of displaying the data (the query returns 1 row), needs to return the table name and column name.
For example, instead of:
Name Initial
==================
Bob A
I want:
Name Initial
============================
Users.FirstName Users.MiddleInitial
I realize I can do this in code but would rather just modify the query to return the data I want so I can leave the report generation code mostly alone.
I don't want to do it in a stored procedure.
So when I spit out the data in the report using something like:
blah blah = MyDataRow("FirstName")
I can leave that as is but instead of it displaying "BOB" it would display "Users.FirstName"
And I want to do the query using select * if possible, instead of listing all the columns.
So for each of the columns I am querying in the * , I want to get (instead of the column value) the tablename.ColumnName or tablename|columnName
Hope you are following; I am confusing myself...
pseudo:
select tablename + '.' + Columnname as WhateverTheColumnNameIs
from Table1
left join Table2 on whatever...
Join Table_Names on blah blah
Whew, after writing all this, I think I will just do it on the code side.
But if you are up for it, maybe it's a fun challenge.
Oracle does not provide a built-in way (there is no pseudocolumn) to get the column names of a table as the result of a query against that table. But you might consider these two approaches:
Extract the column names from an XMLType formed by passing a cursor expression (your query) to the xmltable() function:
-- your table
with t1(first_name, middle_name) as(
select 1,2 from dual
), -- your query
t2 as(
select * -- col1 as "t1.col1"
--, col2 as "t1.col2"
--, col3 as "t1.col3"
from t1
)
select *
from ( select q.object_value.getrootelement() as col_name
, rownum as rn
from xmltable('//*'
passing xmltype(cursor(select * from t2 where rownum = 1))
) q
where q.object_value.getrootelement() not in ('ROWSET', 'ROW')
)
pivot(
max(col_name) for rn in (1 as "name", 2 as "initial")
)
Result:
name initial
--------------- ---------------
FIRST_NAME MIDDLE_NAME
Note: for the column names to be prefixed with the table name, you need to list them explicitly in the select list of the query and supply an alias manually.
PL/SQL approach: starting from Oracle 11g you could use the DBMS_SQL package, specifically its DESCRIBE_COLUMNS procedure, to get the names of the columns in the cursor (your select).
This might be what you are looking for: try selecting from the data dictionary views USER_TAB_COLS or ALL_TAB_COLS.
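If you end up doing it on the code side, most database drivers expose the column names of a result set, so the admin view can be produced from select * without listing columns. The sketch below uses Python's DB-API cursor.description (here with sqlite3; python-oracledb cursors expose the same attribute); the Users table and the admin_row helper are hypothetical. The table name has to be supplied by the caller, since cursor.description does not report which table a column came from. In VB.NET, the data reader's GetName(i) gives the column names in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table matching the question's example.
conn.execute("CREATE TABLE Users (FirstName TEXT, MiddleInitial TEXT)")
conn.execute("INSERT INTO Users VALUES ('Bob', 'A')")

def admin_row(conn, table):
    """Run SELECT * against `table` and map each column to 'table.column'.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    # cursor.description: one 7-item sequence per column, the name comes first
    return {col[0]: f"{table}.{col[0]}" for col in cur.description}

row = admin_row(conn, "Users")
print(row["FirstName"])  # Users.FirstName
```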
