There is a table contact_history with 1.244.000.000 rows (from 04.03.2022 to 05.06.2022) and with columns contact_dt and contact_dttm. I tried to transfer all of the data to a test table using contact_dt with this script:
DECLARE
    dat date;
begin
    dat := TO_DATE('04.03.2022', 'dd.mm.yyyy');
    while dat <= TO_DATE('05.06.2022', 'dd.mm.yyyy') loop
        INSERT /*+ append enable_parallel_dml parallel(16) */
        INTO CONTACT_HISTORY_TEST ct
        SELECT -- + parallel(16)
               ch.sas_contact_id,
               ch.contact_source,
               ch.client_id,
               ch.contact_dttm,
               ch.contact_dt,
               ch.sas_contact_error_desc,
               ch.sas_contact_status
        FROM CONTACT_HISTORY ch
        WHERE ch.contact_dt = dat;
        commit;
        dat := dat + 1;
    end loop;
end;
The problem is that SELECT COUNT(*) FROM CONTACT_HISTORY_TEST shows only 1.200.000.000 rows in the test table, while the source table has 1.244.000.000.
And there is another point: when checking
SELECT COUNT(*)
FROM CONTACT_HISTORY
WHERE CONTACT_DT>= TO_DATE('04.03.2021', 'dd.mm.yyyy')
AND CONTACT_DT<= TO_DATE('05.06.2022', 'dd.mm.yyyy');
SELECT COUNT(*)
FROM CONTACT_HISTORY_TEST
WHERE CONTACT_DT>= TO_DATE('04.03.2021', 'dd.mm.yyyy')
AND CONTACT_DT<= TO_DATE('05.06.2022', 'dd.mm.yyyy')
In both tables there are 1.200.000.000 rows. Please tell me where the remaining 44 million rows have gone, and how I can completely transfer the data from the table, or how to do it right?
I presume that the contact_dt column contains date values that have a time component; for example, not just 04.03.2021 but 04.03.2021 13:23:45.
The code you posted handles the "start" of the period correctly, as 04.03.2021 actually represents 04.03.2021 00:00:00.
However, the last day of the period isn't handled correctly - you're missing (almost) the whole last day, because you copied only rows whose contact_dt is exactly 05.06.2022 00:00:00. What about e.g. 05.06.2022 08:32:13?
Therefore, modify the code. If the contact_dt column is indexed you shouldn't truncate it, so the simplest option is to change this
while dat <= TO_DATE('05.06.2022', 'dd.mm.yyyy') loop
to
while dat < TO_DATE('06.06.2022', 'dd.mm.yyyy') loop
As @APC commented, the WHERE clause should then also be fixed to
where ch.contact_dt >= dat and ch.contact_dt < dat + 1
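Putting the two changes together, the copy loop might look something like this (just a sketch based on the code you posted, with your original hints kept as-is):
declare
    dat date;
begin
    dat := to_date('04.03.2022', 'dd.mm.yyyy');
    while dat < to_date('06.06.2022', 'dd.mm.yyyy') loop
        insert /*+ append enable_parallel_dml parallel(16) */
        into contact_history_test ct
        select ch.sas_contact_id,
               ch.contact_source,
               ch.client_id,
               ch.contact_dttm,
               ch.contact_dt,
               ch.sas_contact_error_desc,
               ch.sas_contact_status
        from   contact_history ch
        where  ch.contact_dt >= dat
        and    ch.contact_dt <  dat + 1;  -- covers the whole day, including rows with a time part
        commit;
        dat := dat + 1;
    end loop;
end;
/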
To verify the number of rows and the date values, run the following code in both schemas and then post the result (edit the question; don't post it as a comment):
alter session set nls_date_format = 'dd.mm.yyyy hh24:mi:ss';
select min(contact_dt) min_dat, max(contact_dt) max_dat, count(*) cnt
from contact_history;
I'm trying to update a table based on another one's information:
Source_Table (Table 1) columns:
TABLE_ROW_ID (Based on trigger-sequence when insert)
REP_ID
SOFT_ASSIGNMENT
Description (Table 2) columns:
REP_ID
NEW_SOFT_ASSIGNMENT
This is my loop statement:
SELECT count(table_row_id) INTO V_ROWS_APPROVED FROM Source_Table;

FOR i IN 1..V_ROWS_APPROVED LOOP
    SELECT REQUESTED_SOFT_MAPPING INTO V_SOFT FROM Source_Table WHERE ROW_ID = i;
    SELECT REP_ID INTO V_REP_ID FROM Source_Table WHERE ROW_ID = i;

    UPDATE Description_Table D
    SET D.NEW_SOFT_ASSIGNMENT = V_SOFT
    WHERE D.REP_ID = V_REP_ID;
END LOOP;
END;
The end result of this loop is a beautiful "504 Gateway Time-out".
I know the issue is with the UPDATE query, but there's no other way (that I can think of) of doing it.
Can someone give me a hand, please?
Thanks
Unless your row_id values are contiguous - i.e. count(row_id) == max(row_id) - this will get a no-data-found error. Sequences aren't gapless, so that seems fairly likely. We have no way of telling whether that is happening and somehow leaving your connection hanging until it times out, or whether it's just taking a long time because you're doing a lot of individual queries and updates over a large data set. (And you may be squashing any errors that do occur, though you haven't shown that.)
You don't need to query and update in a loop though, or even use PL/SQL; you can apply all the values in the source table to the description table with a single update or merge:
merge into description_table d
using source_table s
on (s.rep_id = d.rep_id)
when matched then
update set d.new_soft_assignment = s.requested_soft_mapping;
db<>fiddle with some dummy data, including a non-contiguous row_id, to show that error happening.
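For completeness, the single-statement UPDATE alternative could look roughly like this (assuming rep_id is unique in source_table - otherwise the scalar subquery raises ORA-01427 - with the EXISTS clause keeping description_table rows that have no match untouched):
update description_table d
set    d.new_soft_assignment = (select s.requested_soft_mapping
                                from   source_table s
                                where  s.rep_id = d.rep_id)
where  exists (select null
               from   source_table s
               where  s.rep_id = d.rep_id);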
MERGE INTO ////////1 GFO
USING
(SELECT *
FROM
(SELECT facto/////rid,
p-Id,
PRE/////EDATE,
RU//MODE,
cre///date,
ROW_NUMBER() OVER (PARTITION BY facto/////id ORDER BY cre///te DESC) col
FROM ///////////2
) x
WHERE x.col = 1) UFD
ON (GFO.FACTO-/////RID=UFD.FACTO////RID)
WHEN MATCHED THEN UPDATE
SET
GFO.PRE////DATE=UFD.PRE//////DATE
WHERE UFD.CRE/////DATE IS NOT NULL
AND UFD.RU//MODE= 'S'
AND GFO.P////ID=:2
Hi everyone, my above MERGE statement is taking too long; it has to run 40 times on table 1 using table 2, each having 4 million plus records, for 40 different p--id values. Please suggest a more efficient way, as it currently takes 40+ minutes.
It is updating only one column, using a column from table 2.
I am unable to execute the query; it returns
Error: cannot fetch last explain plan from PLAN_TABLE
[Screenshot of the EXPLAIN PLAN output (with costs) attached]
The plan shown seems to be OK; the observed problem stems from the loop over P_ID, which does not scale.
I assume you perform something like this (strongly simplified), assuming the P_IDs to be processed are in a table TAB_PID:
begin
  for cur in (select p_id from tab_pid) loop
    merge INTO tab1 USING tab2 ON (tab1.r_id = tab2.r_id)
    WHEN MATCHED THEN
      UPDATE SET tab1.col1 = tab2.col1 WHERE p_id = cur.p_id;
  end loop;
end;
/
A HASH JOIN on large tables (in no-parallel mode) with an elapsed time of 60 seconds is not a catastrophic result, but looping 40 times is what makes up your 40 minutes.
So I'd suggest trying to integrate the loop into the MERGE statement. Without knowing the details, something like this (maybe you'll also need to adjust the MERGE join condition):
merge INTO tab1 USING tab2 ON (tab1.r_id = tab2.r_id)
WHEN MATCHED THEN
UPDATE SET tab1.col1=tab2.col1
WHERE p_id in (select p_id from tab_pid);
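If the single MERGE is still too slow and you are licensed for it (Enterprise Edition, 11gR2 or later for statement-level PARALLEL hints), you could also try parallel DML on top of that. A sketch only - the degree of 8 is just an example, and you should qualify p_id with the correct table for your schema:
alter session enable parallel dml;

merge /*+ parallel(8) */ INTO tab1 USING tab2 ON (tab1.r_id = tab2.r_id)
WHEN MATCHED THEN
  UPDATE SET tab1.col1 = tab2.col1
  WHERE p_id in (select p_id from tab_pid);

commit;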
I have 2 DELETE statements that are taking a long time to complete. There are several indexes on the columns in the WHERE clause.
What is a duplicate?
If 2 or more records have the same values in the columns id, cid, type, trefid, ordrefid, amount and paydt, then they are duplicates.
The DELETEs delete about 1 million records.
Can they be re-written in any way to make them quicker?
DELETE FROM TABLE1 A WHERE loaddt < (
SELECT max(loaddt) FROM TABLE1 B
WHERE
a.id=b.id and
a.cid=b.cid and
NVL(a.type,'-99999') = NVL(b.type,'-99999') and
NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
NVL(a.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))=NVL(b.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))
);
COMMIT;
DELETE FROM TABLE1 a where rowid > (
Select min(rowid) from TABLE1 b
WHERE
a.id=b.id and
a.cid=b.cid and
NVL(a.type,'-99999') = NVL(b.type,'-99999') and
NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
NVL(a.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))=NVL(b.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))
);
commit;
Explain Plan:
DELETE TABLE1
  HASH JOIN                          1296491
    Access Predicates
      AND
        A.ID=ITEM_1
        A.CID=ITEM_2
        ITEM_3=NVL(TYPE,'-99999')
        ITEM_4=NVL(TREFID,'-99999')
        ITEM_5=NVL(ORDREFID,'-99999')
        ITEM_6=NVL(AMOUNT,(-99999))
        ITEM_7=NVL(PAYDT,TO_DATE(' 9999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
    Filter Predicates
      LOADDT<MAX(LOADDT)
    TABLE ACCESS TABLE1 FULL         267904
    VIEW VW_SQ_1                     690385
      SORT GROUP BY                  690385
        TABLE ACCESS TABLE1 FULL     267904
How large is the table? If the count of deleted rows is up to 12% of the table, then you may think about an index.
Could you somehow partition your table - like week by week - and then scan only the current week? A sketch of that idea follows.
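If partitioning is available to you (it is an Enterprise Edition option, and interval partitioning needs 11g or later), the weekly idea could look roughly like this - the table name and the boundary date are placeholders, and it assumes loaddt is never null:
create table table1_part
partition by range (loaddt)
interval (numtodsinterval(7, 'DAY'))
( partition p_initial values less than (date '2014-01-01') )
as
select * from table1;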
Maybe this could be more efficient. When you use an aggregate function, Oracle must walk through all the relevant rows (in your case a full scan), but when you use EXISTS it stops as soon as the first occurrence is found. (And of course the query would be much faster if there were a function-based (because of the NVLs) index on all the columns in the WHERE clause - see the sketch after the query below.)
DELETE FROM TABLE1 A
WHERE exists (
SELECT 1
FROM TABLE1 B
WHERE
a.loaddt < b.loaddt and -- a newer duplicate exists, matching the MAX(loaddt) logic of the original
a.id=b.id and
a.cid=b.cid and
NVL(a.type,'-99999') = NVL(b.type,'-99999') and
NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
NVL(a.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))=NVL(b.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))
);
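Regarding the function-based index mentioned above, a rough sketch might be the following. The index name is made up, I've assumed AMOUNT is numeric, and I've swapped the TO_DATE call for a DATE literal - which means the DELETE's NVL on paydt would have to be written with the same literal, since the query expressions must match the indexed expressions exactly for the optimizer to use it:
create index table1_dedup_fbi on table1
( id,
  cid,
  nvl(type,     '-99999'),
  nvl(trefid,   '-99999'),
  nvl(ordrefid, '-99999'),
  nvl(amount,   -99999),
  nvl(paydt, date '9999-12-31')
);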
Although some may disagree, I am a proponent of running large, long-running deletes procedurally. In my view it is much easier to control and track progress (and your DBA will like you better ;-). Also, I'm not sure why you need to join table1 to itself to identify duplicates (and I'd be curious whether you ever run into snapshot-too-old issues with your current approach). You also shouldn't need multiple delete statements; all duplicates should be handled in one process. Finally, you should check WHY you're constantly re-introducing duplicates each week, and perhaps change the load process (maybe doing a merge/upsert rather than all inserts, as sketched below).
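On that last point, such an upsert could look roughly like this - stg_table1 is a made-up name for wherever the weekly feed lands, the ON clause mirrors your duplicate definition, and the column list is assumed:
merge into table1 t
using stg_table1 s
on (    t.id  = s.id
    and t.cid = s.cid
    and nvl(t.type,     '-99999') = nvl(s.type,     '-99999')
    and nvl(t.trefid,   '-99999') = nvl(s.trefid,   '-99999')
    and nvl(t.ordrefid, '-99999') = nvl(s.ordrefid, '-99999')
    and nvl(t.amount,   -99999)   = nvl(s.amount,   -99999)
    and nvl(t.paydt, date '9999-12-31') = nvl(s.paydt, date '9999-12-31'))
when not matched then
  insert (id, cid, type, trefid, ordrefid, amount, paydt, loaddt)
  values (s.id, s.cid, s.type, s.trefid, s.ordrefid, s.amount, s.paydt, s.loaddt);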
That said, you might try something like:
-- first create mat view to find all duplicates
create materialized view my_dups_mv
tablespace my_tablespace
build immediate
refresh complete on demand
as
select id,cid,type,trefid,ordrefid,amount,paydt, count(1) as cnt
from table1
group by id,cid,type,trefid,ordrefid,amount,paydt
having count(1) > 1;
-- dedup data (or put into procedure and schedule along with mat view refresh above)
declare
-- make sure my_dups_mv is refreshed first
cursor dup_cur is
select * from my_dups_mv;
type duprec_t is record(row_id rowid);
duprec duprec_t;
type duptab_t is table of duprec_t index by pls_integer;
duptab duptab_t;
l_ctr pls_integer := 0;
l_dupcnt pls_integer := 0;
begin
for rec in dup_cur
loop
l_ctr := l_ctr + 1;
-- assuming needed indexes exist
select rowid
bulk collect into duptab
from table1
where id = rec.id
and cid = rec.cid
and type = rec.type
and trefid = rec.trefid
and ordrefid = rec.ordrefid
and amount = rec.amount
and paydt = rec.paydt
-- order by whatever makes sense to make the "keeper" float to top
order by loaddt desc
;
for i in 2 .. duptab.count
loop
l_dupcnt := l_dupcnt + 1;
delete from table1 where rowid = duptab(i).row_id;
end loop;
if (mod(l_ctr, 10000) = 0) then
-- log to log table here (calling autonomous procedure you'll need to implement)
insert_logtable('Table1 deletes', 'Commit reached, deleted ' || l_dupcnt || ' rows');
commit;
end if;
end loop;
commit;
end;
Check your log table for progress status.
1. Parallel
alter session enable parallel dml;
DELETE /*+ PARALLEL */ FROM TABLE1 A WHERE loaddt < (
...
Assuming you have Enterprise Edition, a sane server configuration, and you are on 11g. If you're not on 11g, the parallel syntax is slightly different.
2. Reduce memory requirements
The plan shows a hash join, which is probably a good thing. But without any useful filters, Oracle has to hash the entire table. (Tbone's query, which only uses a GROUP BY, looks nicer and may run faster. But it will also probably run into the same problem trying to sort or hash the entire table.)
If the hash can't fit in memory it must be written to disk, which can be very slow. Since you run this query every week, only one of the tables needs to look at all the rows. Depending on exactly when it runs, you can add something like this to the end of the query: ) where b.loaddt >= sysdate - 14. This may significantly reduce the amount of writing to the temporary tablespace, and it may also reduce read IO if you use some partitioning strategy like jakub.petr suggested.
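To make that concrete: the only place the filter can legally reference B is inside the correlated subquery, so the first DELETE would become something like the following. The 14 days is only an example - make sure the window covers how far back newly loaded duplicates can reach, because older duplicate pairs are no longer touched:
DELETE FROM TABLE1 A WHERE loaddt < (
SELECT max(loaddt) FROM TABLE1 B
WHERE
a.id=b.id and
a.cid=b.cid and
NVL(a.type,'-99999') = NVL(b.type,'-99999') and
NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
NVL(a.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))=NVL(b.paydt,TO_DATE('9999-12-31','YYYY-MM-DD')) and
b.loaddt >= sysdate - 14  -- only rows with a duplicate loaded in the last two weeks get deleted
);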
3. Active Report
If you want to know exactly what your query is doing, run the Active Report:
select dbms_sqltune.report_sql_monitor(sql_id => 'YOUR_SQL_ID_HERE', type => 'active')
from dual;
(Save the output to an .html file and open it with a browser.)
How do I get the total number of inserts/updates that have occurred in an Oracle database over a period of time?
Assuming that you've configured AWR to retain data for all SQL statements (the default is to only retain the top 30 by CPU, elapsed time, etc. if the STATISTICS_LEVEL is 'TYPICAL' and the top 100 if the STATISTICS_LEVEL is 'ALL') via something like
BEGIN
dbms_workload_repository.modify_snapshot_settings (
topnsql => 'MAXIMUM'
);
END;
and assuming that SQL statements don't age out of the cache before a snapshot captures them, you can use the AWR tables for some of this.
You can gather the number of times that an INSERT statement was executed and the number of times that an UPDATE statement was executed
SELECT sum( stat.executions_delta ) insert_executions
FROM dba_hist_sqlstat stat
JOIN dba_hist_sqltext txt ON (stat.sql_id = txt.sql_id )
JOIN dba_hist_snapshot snap ON (stat.snap_id = snap.snap_id)
WHERE snap.begin_interval_time BETWEEN <<start time>> AND <<end time>>
AND txt.command_type = 2;
SELECT sum( stat.executions_delta ) update_executions
FROM dba_hist_sqlstat stat
JOIN dba_hist_sqltext txt ON (stat.sql_id = txt.sql_id )
JOIN dba_hist_snapshot snap ON (stat.snap_id = snap.snap_id)
WHERE snap.begin_interval_time BETWEEN <<start time>> AND <<end time>>
AND txt.command_type = 6;
Note that these queries include both statements that your application issues and statements that Oracle issues in the background. You could add additional criteria if you want to filter out certain SQL statements.
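For example, assuming your release's DBA_HIST_SQLSTAT exposes the PARSING_SCHEMA_NAME column, you could restrict the first query to statements parsed by your application schema (the schema name here is just a placeholder):
SELECT sum( stat.executions_delta ) insert_executions
FROM dba_hist_sqlstat stat
JOIN dba_hist_sqltext txt ON (stat.sql_id = txt.sql_id )
JOIN dba_hist_snapshot snap ON (stat.snap_id = snap.snap_id)
WHERE snap.begin_interval_time BETWEEN <<start time>> AND <<end time>>
AND txt.command_type = 2
AND stat.parsing_schema_name = 'MY_APP_SCHEMA';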
Similarly, you could get the total number of distinct INSERT and UPDATE statements
SELECT count( distinct stat.sql_id ) distinct_insert_stmts
FROM dba_hist_sqlstat stat
JOIN dba_hist_sqltext txt ON (stat.sql_id = txt.sql_id )
JOIN dba_hist_snapshot snap ON (stat.snap_id = snap.snap_id)
WHERE snap.begin_interval_time BETWEEN <<start time>> AND <<end time>>
AND txt.command_type = 2;
SELECT count( distinct stat.sql_id ) distinct_update_stmts
FROM dba_hist_sqlstat stat
JOIN dba_hist_sqltext txt ON (stat.sql_id = txt.sql_id )
JOIN dba_hist_snapshot snap ON (stat.snap_id = snap.snap_id)
WHERE snap.begin_interval_time BETWEEN <<start time>> AND <<end time>>
AND txt.command_type = 6;
Oracle does not, however, track the number of rows that were inserted or updated in a given interval. So you won't be able to get that information from AWR. The closest you could get would be to try to leverage the monitoring Oracle does to determine if statistics are stale. Assuming MONITORING is enabled for each table (it is by default in 11g and I believe it is by default in 10g), i.e.
ALTER TABLE table_name
MONITORING;
Oracle will periodically flush the approximate number of rows that are inserted, updated, and deleted for each table to the SYS.DBA_TAB_MODIFICATIONS view. But this only shows the activity since statistics were last gathered on a table, not the activity in a particular interval. You could, however, write a process that periodically captures this data into a table of your own and reports off that (a sketch of this is at the end of this answer).
If you instruct Oracle to flush the monitoring information from memory to disk (otherwise there is a lag of up to several hours)
BEGIN
dbms_stats.flush_database_monitoring_info;
END;
you can get an approximate count of the number of rows that have changed in each table since statistics were last gathered
SELECT table_owner,
table_name,
inserts,
updates,
deletes
FROM sys.dba_tab_modifications
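If you go the capture-it-yourself route, a rough sketch of such a job follows. The table and job names are made up, the counters still reset whenever statistics are gathered on a table (so you have to account for that when reporting), and I believe the owning schema needs SELECT access on the view plus the ANALYZE ANY privilege for the flush call:
create table tab_mod_history (
  capture_time  date,
  table_owner   varchar2(128),
  table_name    varchar2(128),
  inserts       number,
  updates       number,
  deletes       number
);

begin
  dbms_scheduler.create_job(
    job_name        => 'CAPTURE_TAB_MODIFICATIONS',
    job_type        => 'PLSQL_BLOCK',
    job_action      => q'[begin
                            dbms_stats.flush_database_monitoring_info;
                            insert into tab_mod_history
                              select sysdate, table_owner, table_name,
                                     inserts, updates, deletes
                                from sys.dba_tab_modifications;
                            commit;
                          end;]',
    repeat_interval => 'FREQ=HOURLY',
    enabled         => true );
end;
/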