How do I get the number of inserts/updates occuring in an Oracle database? - oracle

How do I get the total number of inserts/updates that have occurred in an Oracle database over a period of time?

Assuming that you've configured AWR to retain data for all SQL statements (the default is to only retain the top 30 by CPU, elapsed time, etc. if the STATISTICS_LEVEL is 'TYPICAL' and the top 100 if the STATISTICS_LEVEL is 'ALL') via something like
BEGIN
dbms_workload_repository.modify_snapshot_settings (
topnsql => 'MAXIMUM'
);
END;
and assuming that SQL statements don't age out of the cache before a snapshot captures them, you can use the AWR tables for some of this.
You can gather the number of times that an INSERT statement was executed and the number of times that an UPDATE statement was executed
SELECT sum( stat.executions_delta ) insert_executions
FROM dba_hist_sqlstat stat
JOIN dba_hist_sqltext txt ON (stat.sql_id = txt.sql_id )
JOIN dba_hist_snapshot snap ON (stat.snap_id = snap.snap_id)
WHERE snap.begin_interval_time BETWEEN <<start time>> AND <<end time>>
AND txt.command_type = 2;
SELECT sum( stat.executions_delta ) update_executions
FROM dba_hist_sqlstat stat
JOIN dba_hist_sqltext txt ON (stat.sql_id = txt.sql_id )
JOIN dba_hist_snapshot snap ON (stat.snap_id = snap.snap_id)
WHERE snap.begin_interval_time BETWEEN <<start time>> AND <<end time>>
AND txt.command_type = 6;
Note that these queries include both statements that your application issues and statements that Oracle issues in the background. You could add additional criteria if you want to filter out certain SQL statements.
Similarly, you could get the total number of distinct INSERT and UPDATE statements
SELECT count( distinct stat.sql_id ) distinct_insert_stmts
FROM dba_hist_sqlstat stat
JOIN dba_hist_sqltext txt ON (stat.sql_id = txt.sql_id )
JOIN dba_hist_snapshot snap ON (stat.snap_id = snap.snap_id)
WHERE snap.begin_interval_time BETWEEN <<start time>> AND <<end time>>
AND txt.command_type = 2;
SELECT count( distinct stat.sql_id ) distinct_update_stmts
FROM dba_hist_sqlstat stat
JOIN dba_hist_sqltext txt ON (stat.sql_id = txt.sql_id )
JOIN dba_hist_snapshot snap ON (stat.snap_id = snap.snap_id)
WHERE snap.begin_interval_time BETWEEN <<start time>> AND <<end time>>
AND txt.command_type = 6;
Oracle does not, however, track the number of rows that were inserted or updated in a given interval. So you won't be able to get that information from AWR. The closest you could get would be to try to leverage the monitoring Oracle does to determine if statistics are stale. Assuming MONITORING is enabled for each table (it is by default in 11g and I believe it is by default in 10g), i.e.
ALTER TABLE table_name
MONITORING;
Oracle will periodically flush the approximate number of rows that are inserted, updated, and deleted for each table to the SYS.DBA_TAB_MODIFICATIONS table. But this will only show the activity since statistics were gathered on a table, not the activity in a particular interval. You could, however, try to write a process that periodically captured this data to your own table and report off that.
If you instruct Oracle to flush the monitoring information from memory to disk (otherwise there is a lag of up to several hours)
BEGIN
dbms_stats.flush_database_monitoring_info;
END;
you can get an approximate count of the number of rows that have changed in each table since statistics were last gathered
SELECT table_owner,
table_name,
inserts,
updates,
deletes
FROM sys.dba_tab_modifications

Related

Translate SQL's first_value and partition by into SAS

I have this code in SQL
SELECT acc_id,
time,
approved_amount,
balance,
coalesce(approved_amount,
first_value(balance) OVER (PARTITION BY acc_id
ORDER BY time)) orig_amount
FROM table;
Is it possible somehow to translate it into SAS? It is not working in proc sql step.
I don't use nor know SAS, however if it is something what does not support window functions, you can replace it by joins. I assume you want second argument of coalesce as the balance of oldest record of those in acc_id group, hence:
select acc_id,
time,
approved_amount,
balance,
coalesce(approved_amount, acc_id_to_balance.balance_fallback)
from table t
join (
select t.acc_id, t.balance as balance_fallback
from (
select acc_id, min(time) as min_time
from table
group by acc_id
) acc_id_to_min_time
join table t on acc_id_to_min_time.acc_id = t.acc_id and acc_id_to_min_time.min_time = t.time
) acc_id_to_balance on t.acc_id = acc_id_to_balance.acc_id
Just worked out in head, didn't try. Problems might appear in case of duplicate minimal time, which would require another level of grouping.
This is how you would do that in SAS since unlike SQL when you use a data step it will process the data in the order that it appears in the source dataset.
data want;
set table ;
by acc_id time;
if first.id then first_balance=balance;
retain first_balance;
orig_amount = coalesce(approved_amount,first_balance);
run;

Oracle SELECT * FROM LARGE_TABLE - takes minutes to respond

So I have a simple table with 5 or so columns, one of which is a clob containing some JSON data.
I am running
SELECT * FROM BIG_TABLE
SELECT * FROM BIG_TABLE WHERE ROWNUM < 2
SELECT * FROM BIG_TABLE WHERE ROWNUM = 1
SELECT * FROM BIG_TABLE WHERE ID=x
I expect that any fractionally intelligent relational database would return the data immediately. We are not imposing order by/group by clauses, so why not return the data as and when you find it?
Of all the forms of SELECT statements above, only 4. returned in a sub-second manner. This is unexpected for 1-3 which are returning between 1 and 10 minutes before the query shows any responses in SQL Developer. SQL Developer has the standard SQL Array Fetch Size of 50 (JDBC Fetch size of 50 rows) so at a minimum, it is taking 1-10 minutes to return 50 rows from a simple table with no joins on a super high-performance RAC cluster backed by fancy 4-tiered EMC disk subsystem.
Explain plans show a table scan. Fine, but why should I wait 1-10 minutes for the results with rownum in the WHERE clause?
What is going on here?
OK - I found the issue. ROWNUM does not operate like I thought it did and in the code above it never stops the full table scan.
This is because:
RowNum is assigned during the predicate operation (where clause evaluation) and incremented afterwards, i.e.: your row makes it into the result set and then gets rownum assigned.
In order to filter by rownum you need to already have it exist, something like ...
SELECT * FROM (SELECT * FROM BIG_TABLE) WHERE ROWNUM < 1
In effect what this means is that there is no way to filter out the top 5 rows from a table without having first selected the entire table if no other filter criteria are involved.
I solved my problem like this...
SELECT * FROM (SELECT * FROM BIG_TABLE WHERE
DATE_COL BETWEEN :Date1 AND :Date2) WHERE ROWNUM < :x;

Retrieving a large dataset from a very large table

I have a very large table in oracle that contains 140+ million rows. Currently we are doing three full table scans on this table nightly, and using some of the results to populate a tmp table. That tmp table is then turned into a very large report (usually 140K + lines).
The big table is called tasklog and has the following structure has:
tasklog_id (number) - PK
document_id (number)
date_time_in (date)
+ a few more rows that aren't relevant
There are millions of different document ids each repeated between 1 and several hundred times, date_time_in is the time this entry was put into the database.
All of the full table scans looks like this
DECLARE
n_prevdocid number;
cursor tasks is
select *
from tasklog
order by document_id, date_time_in DESC;
BEGIN
for tk in tasks
loop
if n_prevdocid <> tk.document_id then
-- *code snipped*
end if;
n_prevdocid = tk.document_id;
end loop;
END;
/
So my question: is there a quick (ish) way to get a distinct list of document_ids with the row having the most recent date_time_in. This could dramatically speed up the whole thing. Or can anyone think of a better way of retrieving this data daily?
Things that may be relevant, this table only ever has rows inserted with current date time. It is not range paritioned but I can't see how that might help me. No rows are ever updated or deleted. There are about 70k - 80k rows inserted daily.
I don't think that you're going to get away from doing at least one full table scan, as the only way that it would be efficient would be is if the ratio of distinct document_id's to total records was pretty small. The clustering on the document_id is going to be very poor due to the way that the data is generated and inserted.
How about:
create table tmp nologging compress -- or pctfree 0
as
select ...
from (
select t.*,
max(date_time_in) over (partition by document_id) max_date_time_in
from tasklog t)
where date_time_in = max_date_time_in
Possibly, having created this once, you could then optimise further refreshes by merging into this set only the newer records. Something like ...
merge into tmp
using (
select ...
from (
select t.*,
max(date_time_in) over (partition by document_id) max_date_time_in
from tasklog t
where date_time_in > (select max(date_time_in) from tmp))
where date_time_in = max_date_time_in)
on ... blah blah
Have you tried:
select document_id
from tasklog t1
where date_time_in = (select max(date_time_in)
from tasklog t2
where t1.document_id=t2.document_id)
You can do something like this:
select document_id , date_time from tasklog group by date_time,document_id order by date_time desc;
By this you can retrieve distinct document_id with latest date_time colums.

optimizing a dup delete statement Oracle

I have 2 delete statements that are taking a long time to complete. There are several indexes on the columns in where clause.
What is a duplicate?
If 2 or more records have same values in columns id,cid,type,trefid,ordrefid,amount and paydt then there are duplicates.
The DELETEs delete about 1 million record.
Can they be re-written in any way to make it quicker.
DELETE FROM TABLE1 A WHERE loaddt < (
SELECT max(loaddt) FROM TABLE1 B
WHERE
a.id=b.id and
a.cid=b.cid and
NVL(a.type,'-99999') = NVL(b.type,'-99999') and
NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
NVL(a.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))=NVL(b.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))
);
COMMIT;
DELETE FROM TABLE1 a where rowid > (
Select min(rowid) from TABLE1 b
WHERE
a.id=b.id and
a.cid=b.cid and
NVL(a.type,'-99999') = NVL(b.type,'-99999') and
NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
NVL(a.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))=NVL(b.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))
);
commit;
Explain Plan:
DELETE TABLE1
HASH JOIN 1296491
Access Predicates
AND
A.ID=ITEM_1
A.CID=ITEM_2
ITEM_3=NVL(TYPE,'-99999')
ITEM_4=NVL(TREFID,'-99999')
ITEM_5=NVL(ORDREFID,'-99999')
ITEM_6=NVL(AMOUNT,(-99999))
ITEM_7=NVL(PAYDT,TO_DATE(' 9999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Filter Predicates
LOADDT<MAX(LOADDT)
TABLE ACCESS TABLE1 FULL 267904
VIEW VW_SQ_1 690385
SORT GROUP BY 690385
TABLE ACCESS TABLE1 FULL 267904
How large is the table? If count of deleted rows is up to 12% then you may think about index.
Could you somehow partition your table - like week by week and then scan only actual week?
Maybe this could be more effecient. When you're using aggregate function, then oracle must walk through all relevant rows (in your case fullscan), but when you use exists it stops when the first occurence is found. (and of course the query would be much faster, when there was one function-based(because of NVL) index on all columns in where clause)
DELETE FROM TABLE1 A
WHERE exists (
SELECT 1
FROM TABLE1 B
WHERE
A.loaddt != b.loaddt
a.id=b.id and
a.cid=b.cid and
NVL(a.type,'-99999') = NVL(b.type,'-99999') and
NVL(a.trefid,'-99999')=NVL(b.trefid,'-99999') and
NVL(a.ordrefid,'-99999')= NVL(b.ordrefid,'-99999') and
NVL(a.amount,'-99999')=NVL(b.amount,'-99999') and
NVL(a.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))=NVL(b.paydt,TO_DATE('9999-12-31','YYYY-MM-DD'))
);
Although some may disagree, I am a proponent of running large, long running deletes procedurally. In my view it is much easier to control and track progress (and your DBA will like you better ;-) Also, not sure why you need to join table1 to itself to identify duplicates (and I'd be curious if you ever run into snapshot too old issues with your current approach). You also shouldn't need multiple delete statements, all duplicates should be handled in one process. Finally, you should check WHY you're constantly re-introducing duplicates each week, and perhaps change the load process (maybe doing a merge/upsert rather than all inserts).
That said, you might try something like:
-- first create mat view to find all duplicates
create materialized view my_dups_mv
tablespace my_tablespace
build immediate
refresh complete on demand
as
select id,cid,type,trefid,ordrefid,amount,paydt, count(1) as cnt
from table1
group by id,cid,type,trefid,ordrefid,amount,paydt
having count(1) > 1;
-- dedup data (or put into procedure and schedule along with mat view refresh above)
declare
-- make sure my_dups_mv is refreshed first
cursor dup_cur is
select * from my_dups_mv;
type duprec_t is record(row_id rowid);
duprec duprec_t;
type duptab_t is table of duprec_t index by pls_integer;
duptab duptab_t;
l_ctr pls_integer := 0;
l_dupcnt pls_integer := 0;
begin
for rec in dup_cur
loop
l_ctr := l_ctr + 1;
-- assuming needed indexes exist
select rowid
bulk collect into duptab
from table1
where id = rec.id
and cid = rec.cid
and type = rec.type
and trefid = rec.trefid
and ordrefid = rec.ordrefid
and amount = rec.amount
and paydt = rec.paydt
-- order by whatever makes sense to make the "keeper" float to top
order by loaddt desc
;
for i in 2 .. duptab.count
loop
l_dupcnt := l_dupcnt + 1;
delete from table1 where rowid = duptab(i).row_id;
end loop;
if (mod(l_ctr, 10000) = 0) then
-- log to log table here (calling autonomous procedure you'll need to implement)
insert_logtable('Table1 deletes', 'Commit reached, deleted ' || l_dupcnt || ' rows');
commit;
end if;
end loop;
commit;
end;
Check your log table for progress status.
1. Parallel
alter session enable parallel dml;
DELETE /*+ PARALLEL */ FROM TABLE1 A WHERE loaddt < (
...
Assuming you have Enterprise Edition, a sane server configuration, and you are on 11g. If you're not on 11g, the parallel syntax is slightly different.
2. Reduce memory requirements
The plan shows a hash join, which is probably a good thing. But without any useful filters, Oracle has to hash the entire table. (Tbone's query, that only use a GROUP BY, looks nicer and may run faster. But it will also probably run into the same problem trying to sort or hash the entire table.)
If the hash can't fit in memory it must be written to disk, which can be very slow. Since you run this query every week, only one of the tables needs to look at all the rows. Depending on exactly when it runs, you can add something like this to the end of the query: ) where b.loaddt >= sysdate - 14. This may significantly reduce the amount of writing to temporary tablespace. And it may also reduce read IO if you use some partitioning strategy like jakub.petr suggested.
3. Active Report
If you want to know exactly what your query is doing, run the Active Report:
select dbms_sqltune.report_sql_monitor(sql_id => 'YOUR_SQL_ID_HERE', type => 'active')
from dual;
(Save the output to an .html file and open it with a browser.)

How to check Oracle database for long running queries

My application, which uses an Oracle database, is going slow or appears to have stopped completely.
How can find out which queries are most expensive, so I can investigate further?
This one shows SQL that is currently "ACTIVE":-
select S.USERNAME, s.sid, s.osuser, t.sql_id, sql_text
from v$sqltext_with_newlines t,V$SESSION s
where t.address =s.sql_address
and t.hash_value = s.sql_hash_value
and s.status = 'ACTIVE'
and s.username <> 'SYSTEM'
order by s.sid,t.piece
/
This shows locks. Sometimes things are going slow, but it's because it is blocked waiting for a lock:
select
object_name,
object_type,
session_id,
type, -- Type or system/user lock
lmode, -- lock mode in which session holds lock
request,
block,
ctime -- Time since current mode was granted
from
v$locked_object, all_objects, v$lock
where
v$locked_object.object_id = all_objects.object_id AND
v$lock.id1 = all_objects.object_id AND
v$lock.sid = v$locked_object.session_id
order by
session_id, ctime desc, object_name
/
This is a good one for finding long operations (e.g. full table scans). If it is because of lots of short operations, nothing will show up.
COLUMN percent FORMAT 999.99
SELECT sid, to_char(start_time,'hh24:mi:ss') stime,
message,( sofar/totalwork)* 100 percent
FROM v$session_longops
WHERE sofar/totalwork < 1
/
Try this, it will give you queries currently running for more than 60 seconds. Note that it prints multiple lines per running query if the SQL has multiple lines. Look at the sid,serial# to see what belongs together.
select s.username,s.sid,s.serial#,s.last_call_et/60 mins_running,q.sql_text from v$session s
join v$sqltext_with_newlines q
on s.sql_address = q.address
where status='ACTIVE'
and type <>'BACKGROUND'
and last_call_et> 60
order by sid,serial#,q.piece
v$session_longops
If you look for sofar != totalwork you'll see ones that haven't completed, but the entries aren't removed when the operation completes so you can see a lot of history there too.
Step 1:Execute the query
column username format 'a10'
column osuser format 'a10'
column module format 'a16'
column program_name format 'a20'
column program format 'a20'
column machine format 'a20'
column action format 'a20'
column sid format '9999'
column serial# format '99999'
column spid format '99999'
set linesize 200
set pagesize 30
select
a.sid,a.serial#,a.username,a.osuser,c.start_time,
b.spid,a.status,a.machine,
a.action,a.module,a.program
from
v$session a, v$process b, v$transaction c,
v$sqlarea s
Where
a.paddr = b.addr
and a.saddr = c.ses_addr
and a.sql_address = s.address (+)
and to_date(c.start_time,'mm/dd/yy hh24:mi:ss') <= sysdate - (15/1440) -- running for 15 minutes
order by c.start_time
/
Step 2: desc v$session
Step 3:select sid, serial#,SQL_ADDRESS, status,PREV_SQL_ADDR from v$session where sid='xxxx' //(enter the sid value)
Step 4: select sql_text from v$sqltext where address='XXXXXXXX';
Step 5: select piece, sql_text from v$sqltext where address='XXXXXX' order by piece;
You can use the v$sql_monitor view to find queries that are running longer than 5 seconds. This may only be available in Enterprise versions of Oracle. For example this query will identify slow running queries from my TEST_APP service:
select to_char(sql_exec_start, 'dd-Mon hh24:mi'), (elapsed_time / 1000000) run_time,
cpu_time, sql_id, sql_text
from v$sql_monitor
where service_name = 'TEST_APP'
order by 1 desc;
Note elapsed_time is in microseconds so / 1000000 to get something more readable
You can generate an AWR (automatic workload repository) report from the database.
Run from the SQL*Plus command line:
SQL> #$ORACLE_HOME/rdbms/admin/awrrpt.sql
Read the document related to how to generate & understand an AWR report. It will give a complete view of database performance and resource issues. Once we are familiar with the AWR report it will be helpful to find Top SQL which is consuming resources.
Also, in the 12C EM Express UI we can generate an AWR.
You can check the long-running queries details like % completed and remaining time using the below query:
SELECT SID, SERIAL#, OPNAME, CONTEXT, SOFAR,
TOTALWORK,ROUND(SOFAR/TOTALWORK*100,2) "%_COMPLETE"
FROM V$SESSION_LONGOPS
WHERE OPNAME NOT LIKE '%aggregate%'
AND TOTALWORK != 0
AND SOFAR <> TOTALWORK;
For the complete list of troubleshooting steps, you can check here:Troubleshooting long running sessions
select sq.PARSING_SCHEMA_NAME, sq.LAST_LOAD_TIME, sq.ELAPSED_TIME, sq.ROWS_PROCESSED, ltrim(sq.sql_text), sq.SQL_FULLTEXT
from v$sql sq, v$session se
order by sq.ELAPSED_TIME desc, sq.LAST_LOAD_TIME desc;

Resources