Query cost: Global Temporary Tables vs. Collections (Virtual Arrays) - oracle

I have a query whose results are stored in a GTT (Global Temporary Table) and in a Collection.
Selecting the data from the GTT again, I get a very small cost: 103.
SELECT
...
FROM my_table_gtt
JOIN table2 ...
JOIN table3 ...
But when I switch this from the GTT to a Collection (VA - Virtual Array), the cost skyrockets to 78,000, yet the difference in execution time between the two is very small.
SELECT
...
FROM TABLE(CAST(my_table_va as my_table_tt))
JOIN table2 ...
JOIN table3 ...
My question is: why is there such a big difference in cost between the two approaches? As far as I know, GTTs don't store table statistics, so why does the GTT get a better cost than the VA?

Global temporary tables can have statistics just like any other table. In fact they are like any other table: they have data segments, just in the temporary tablespace.
In 11g the statistics are global, so they sometimes cause issues with execution plans. In 12c they are session-based, so each session gets its own proper statistics (if available).
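As an aside, here is a minimal sketch of gathering statistics on a GTT (the table and row count are made up for illustration; in 12c the gathered statistics are session-private by default):
CREATE GLOBAL TEMPORARY TABLE demo_gtt (id NUMBER, val VARCHAR2(30)) ON COMMIT PRESERVE ROWS;
INSERT INTO demo_gtt SELECT LEVEL, 'x' FROM dual CONNECT BY LEVEL <= 1000;
BEGIN
  -- gather stats so the optimizer sees the real row count instead of a default guess
  DBMS_STATS.GATHER_TABLE_STATS(user, 'DEMO_GTT');
END;
/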
The cardinality estimate for a collection is derived from the DB block size; for the default 8 kB block it is 8168. Collection content is stored in the PGA. It's quite common to use a cardinality hint on collections in complex queries to guide the optimizer. You can also use the extensible optimizer interface to implement your own way of calculating the cost.
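For example, a minimal, hedged illustration of the cardinality hint (sys.ODCINumberList is just a convenient built-in collection type; 500 stands for whatever row count you actually expect in the collection):
SELECT /*+ CARDINALITY(t 500) */ COLUMN_VALUE
FROM TABLE(sys.ODCINumberList(1, 2, 3)) t;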
Edit - added tests:
CREATE TYPE STRINGTABLE IS TABLE OF VARCHAR2(255);
CREATE GLOBAL TEMPORARY TABLE TMP (VALUE VARCHAR2(255));
INSERT INTO TMP SELECT 'Value' || LEVEL FROM DUAL CONNECT BY LEVEL <= 1000000;
DECLARE
  x   STRINGTABLE;
  cnt NUMBER;
BEGIN
  SELECT VALUE BULK COLLECT INTO x FROM TMP;
  DBMS_OUTPUT.PUT_LINE(TO_CHAR(SYSTIMESTAMP, 'MI:SS.FF3'));
  SELECT SUM(LENGTH(VALUE)) INTO cnt FROM TMP;
  DBMS_OUTPUT.PUT_LINE(TO_CHAR(SYSTIMESTAMP, 'MI:SS.FF3'));
  SELECT SUM(LENGTH(COLUMN_VALUE)) INTO cnt FROM TABLE(x);
  DBMS_OUTPUT.PUT_LINE(TO_CHAR(SYSTIMESTAMP, 'MI:SS.FF3'));
END;
/
In this case access to the GTT is about twice as fast as access to the collection, roughly 200 ms vs. 400 ms on my test machine. When I increased the number of rows to 10,000,000, I got ORA-22813: operand value exceeds system limits on the second query.

The most important difference between collections and GTTs in SQL is that the CBO (cost-based optimizer) has limitations with the TABLE() function (kokbf$...); for example, JPPD (join predicate push-down) doesn't work with TABLE() functions.
Some workarounds: http://orasql.org/2019/05/30/workarounds-for-jppd-with-view-and-tablekokbf-xmltable-or-json_table-functions/

Related

Optimizing SQL by using temporary table in Oracle

I have a data clean-up procedure which clears the card column in the rows of two tables.
Both of these UPDATE statements use the same subquery to detect which rows should be updated.
UPDATE table_1 SET card = NULL WHERE id in
(select id from sub_table WHERE /* complex clause here */);
UPDATE table_2 SET card = NULL WHERE id in
(select id from sub_table WHERE /* complex clause here */);
Is using Oracle Temporary table good solution for optimizing my code?
CREATE TEMPORARY TABLE tmp_sub_table AS
select id from sub_table WHERE /* complex clause here */;
UPDATE table_1 SET card = NULL WHERE id in (select * from tmp_sub_table);
UPDATE table_2 SET card = NULL WHERE id in (select * from tmp_sub_table);
Should I use Local temporary table or Global Temporary table?
Global Temporary Tables are persistent data structures. When we INSERT, the data is written to disk; when we SELECT, the data is read from disk. So that's quite a lot of disk I/O: the saving from not running the same sub-query twice must be greater than the cost of all those writes and reads.
One thing to watch out for is that GTTs are built in a temporary tablespace, so you might get contention with other long-running processes which are doing sorts, etc. It's a good idea to have a separate temporary tablespace just for GTTs, but not many DBAs do this.
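If you do go the GTT route, a minimal sketch might look like this (assuming ON COMMIT PRESERVE ROWS so the rows survive until the session ends; the DDL is created once, not per run):
CREATE GLOBAL TEMPORARY TABLE tmp_sub_table (id NUMBER) ON COMMIT PRESERVE ROWS;

INSERT INTO tmp_sub_table
  SELECT id FROM sub_table WHERE /* complex clause here */;
UPDATE table_1 SET card = NULL WHERE id IN (SELECT id FROM tmp_sub_table);
UPDATE table_2 SET card = NULL WHERE id IN (SELECT id FROM tmp_sub_table);
COMMIT;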
An alternative solution would be to use a collection to store subsets of the records in memory and use bulk processing.
-- assumes a schema-level nested table type of numbers, created once,
-- so that MEMBER OF can be used inside the SQL statements:
create type number_list as table of number;
/

declare
  l_ids number_list;
  cursor l_cur is
    select id from sub_table WHERE /* complex clause here */
    order by id;
begin
  open l_cur;
  loop
    fetch l_cur bulk collect into l_ids limit 5000;
    exit when l_ids.count() = 0;
    update table_1
    set card = null
    where id member of l_ids;
    update table_2
    set card = null
    where id member of l_ids;
  end loop;
  close l_cur;
end;
/
"updating many rows with one update statement ... works much faster than updating separately using Looping over cursor"
That is the normal advice, and this is a bulk operation: it updates five thousand rows at a time, so it is much faster than row-by-row processing. The size of the batch is governed by the BULK COLLECT ... LIMIT clause: you don't want to make the value too high because the collection sits in session memory, but as you're only selecting one column, and a number at that, you can probably make it higher.
As always tuning is a matter of benchmarking. Have you established that running this sub-query twice is a high-cost operation?
select id from sub_table WHERE /* complex clause here */
If it seems too slow you need to test other approaches and see whether they're faster. Maybe a Global Temporary Table is faster than a bulk operation. Generally memory access is faster than disk access, but you need to see which works best for you.

postgres not using index on SELECT COUNT(*) for a large table

I have four tables; two for current data, two for archive data. One of the archive tables has tens of millions of rows. All tables have a couple narrow indexes and are very similar.
Given the following queries:
SELECT (SELECT COUNT(*) FROM A)
UNION SELECT (SELECT COUNT(*) FROM B)
UNION SELECT (SELECT COUNT(*) FROM C_LargeTable)
UNION SELECT (SELECT COUNT(*) FROM D);
A, B and D perform index scans. C_LargeTable uses a seq scan and the query takes about 20 seconds to execute. Table D has millions of rows as well, but is only about 10% of the size of C_LargeTable.
If I then modify my query to use the following logic, which sufficiently narrows the counts, I get the same results, the index is used, and the query takes about 5 seconds, or a quarter of the time:
...
SELECT (SELECT COUNT(*) FROM C_LargeTable WHERE idx_col < 'G')
+ (SELECT COUNT(*) FROM C_LargeTable WHERE idx_col BETWEEN 'G' AND 'Q')
+ (SELECT COUNT(*) FROM C_LargeTable WHERE idx_col > 'Q')
...
It does not make sense to me to pay the I/O overhead of a full table scan for a count when perfectly good indexes exist and there is a covering primary key which ensures uniqueness. My understanding of Postgres is that a PRIMARY KEY isn't like a SQL Server clustered index in that it doesn't determine the physical sort order, but it does implicitly create a btree index to enforce uniqueness, which I assume should require significantly less I/O than a full table scan.
Is this potentially an indication of an optimization that I may need to perform to organize data within C_LargeTable?
There isn't a covering index on the primary key because PostgreSQL doesn't support them (true up to and including 9.4 anyway).
The heap scan is required because of MVCC visibility. The index doesn't contain visibility information. Pg can do an index scan, but it still has to check visibility info from the heap, and with an index scan that'd be random I/O to read the whole table, so a seqscan will be much faster.
Make sure you run 9.2 or newer, and that autovacuum is configured to run frequently on the table. You should then be able to do an index-only scan where the visibility map is used. This only works under limited circumstances as Horse notes; see the wiki page on count and on index-only scans. If you aren't letting autovacuum run regularly enough the visibility map will be outdated and Pg won't be able to do an index-only scan.
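As a hedged illustration (the table name is taken from the question), vacuum the table and then check the plan to confirm an index-only scan is being used:
VACUUM (ANALYZE) c_largetable;       -- refreshes the visibility map and the statistics
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM c_largetable;   -- look for "Index Only Scan" and a low "Heap Fetches" count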
In future, make sure you post EXPLAIN, or preferably EXPLAIN ANALYZE, output with any queries.

Use of index in multiple join condition oracle

I have two tables: tableA and tableB
TableA has millions of records and tableB has around 1000 records.
Table A {
aid
city, (city is indexed)
state,
X,
Y
}
Table B {
bid,
city,
state
}
Now my query is
SELECT X, Y, COUNT(*) FROM A,B
WHERE A.city = B.city
and A.state=B.state
group by X,Y
This query runs very slowly. However, when we joined only on city, everything worked very quickly.
Now my query is
SELECT X, Y, COUNT(*) FROM A,B
WHERE A.city = B.city
group by X,Y
So I looked at the explain plan: in the first (slow) case the plan does not use the index, whereas in the second case it uses the city index. I tried adding a state index on table A, which did not help as expected. I also tried the index hint /*+ INDEX(A,city_idx) */ after the SELECT, which did not help much. Can you help me out in this case?
Creating indexes for both tables on city and state is likely to help.
Create a composite index on the table A that has all the four columns: city, state, X, Y:
CREATE INDEX index_name ON table_name (city, state, X, Y);
In this way, your query won't need to access table A at all, only the newly created index. Of course, the downside of yet another index is that inserts, updates and deletes on this table will be slower.
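As a quick sanity check (a sketch only, reusing the query from the question), you can verify whether the optimizer actually picks the new index:
EXPLAIN PLAN FOR
  SELECT X, Y, COUNT(*)
  FROM A, B
  WHERE A.city = B.city AND A.state = B.state
  GROUP BY X, Y;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- ideally the plan shows the new index (e.g. an INDEX FAST FULL SCAN) instead of TABLE ACCESS FULL on A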
TableA has millions of records and tableB has around 1000
In this case using nested loops seems like the most suitable access path for the job.
You are requesting an aggregation based on two columns from table A, meaning Oracle will have to access pretty much all the blocks in that table anyway. In this case creating an index on the big table will be useless; creating an index on the small, inner table of the join will make sense.
WHERE A.city = B.city and A.state=B.state
WHERE A.city = B.city
Can the same city exist in two states? It sounds unlikely... If a city cannot exist in more than one state, then any index on state (in either table) will be redundant.
As @Florin Ghita noted in his comment, you can use the USE_NL hint to force Oracle to use nested loops, but personally I highly recommend avoiding hints (for many reasons, mostly maintenance).
My suggestions are:
1) Gather stats on both tables to make sure Oracle knows the proportions and has sufficient data to estimate cardinalities: exec dbms_stats.gather_table_stats(user,'tableX').
2) Test the query with parallel execution - parallel is great at speeding up NL joins between a small and a big table by broadcasting the entire small table to the slave processes working on chunks of the big table (you can get even further with compression on the small table). A sketch of both suggestions follows this list.
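A hedged sketch of both suggestions (the table names A and B come from the question; the parallel degree of 8 is arbitrary):
EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'A');
EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'B');

SELECT /*+ PARALLEL(A 8) */ X, Y, COUNT(*)   -- parallel hint used only for the test run
FROM A, B
WHERE A.city = B.city AND A.state = B.state
GROUP BY X, Y;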
Cities and states are related but the optimizer does not understand that. Oracle can probably accurately predict each condition separately but not together.
For example, assume that 10% of all states match and 10% of all cities match. When both conditions are present, Oracle will estimate 0.1 * 0.1 = 0.01. The real number is probably closer to 0.1: if the city matches, the state will almost always match.
Adding extended statistics tells Oracle about this column relationship. And these statistics can help any query, not just the current problem query.
declare
v_name varchar2(100);
begin
v_name := dbms_stats.create_extended_stats(user, 'A', '(city, state)');
v_name := dbms_stats.create_extended_stats(user, 'B', '(city, state)');
dbms_stats.gather_table_stats(user, 'A');
dbms_stats.gather_table_stats(user, 'B');
end;
/
Without the plans we can't accurately predict whether this will solve the problem or not. But giving the optimizer more accurate information usually helps and almost never hurts.

Optimal way to DELETE specified rows from Oracle

I have a project that needs to occasionally delete several tens of thousands of rows from one of six tables of varying sizes, which hold about 30 million rows between them. Because of the structure of the data I've been given, I don't know which of the six tables holds any given row that needs to be deleted, so I have to run every delete against all the tables. I've built an index on the ID column to try to speed things up, but it can be dropped if that would be faster.
My problem is that I can't seem to find an efficient way to actually perform the delete. For the purposes of my testing I'm deleting 7384 rows from a single test table which has about 9400 rows. I've tested a number of possible query solutions in Oracle SQL Developer:
7384 separate DELETE statements took 203 seconds:
delete from TABLE1 where ID=1000001356443294;
delete from TABLE1 where ID=1000001356443296;
etc...
7384 separate SELECT statements took 57 seconds:
select ID from TABLE1 where ID=1000001356443294
select ID from TABLE1 where ID=1000001356443296
etc...
7384 separate DELETE from (SELECT) statements took 214 seconds:
delete from (select ID from TABLE1 where ID=1000001356443294);
delete from (select ID from TABLE1 where ID=1000001356443296);
etc...
1 SELECT statement that has 7384 OR clauses in the where took 127.4s:
select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...
1 DELETE from (SELECT) statement that has 7384 OR clauses in the where took 74.4s:
delete from (select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...)
While the last may be the fastest, upon further testing it's still very slow when scaled up from the 9000-row table to even just a 200,000-row table (which is still < 1% of the final table-set size), where the same statement takes 14 minutes to run. While > 50% faster per row, that still extrapolates to about a day when run against the full dataset. I have it on good authority that the piece of software we used to use for this task could do it in about 20 minutes.
So my questions are:
Is there a better way to delete?
Should I use a round of SELECT statements (i.e., like the second test) to discover which table any given row is in and then shoot off delete queries? Even that looks quite slow but...
Is there anything else I can do to speed the deletes up? I don't have DBA-level access or knowledge.
In advance of my questions being answered, this is how I'd go about it:
Minimize the number of statements issued and the work they do, in relative terms.
All scenarios assume you have a table of IDs (PURGE_IDS) to delete from TABLE_1, TABLE_2, etc.
Consider Using CREATE TABLE AS SELECT for really large deletes
If there's no concurrent activity, and you're deleting 30+ % of the rows in one or more of the tables, don't delete; perform a create table as select with the rows you wish to keep, and swap the new table out for the old table. INSERT /*+ APPEND */ ... NOLOGGING is surprisingly cheap if you can afford it. Even if you do have some concurrent activity, you may be able to use Online Table Redefinition to rebuild the table in-place.
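A hedged sketch of that approach (the _keep/_old names are made up; indexes, constraints and grants still have to be recreated on the new table):
CREATE TABLE table_1_keep NOLOGGING AS
  SELECT t.*
  FROM table_1 t
  WHERE NOT EXISTS (SELECT 1 FROM purge_ids p WHERE p.id = t.id);

-- then swap the tables
ALTER TABLE table_1 RENAME TO table_1_old;
ALTER TABLE table_1_keep RENAME TO table_1;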
Don't run DELETE statements you know won't delete any rows
If an ID value exists in at most one of the six tables, then keep track of which IDs you've deleted - and don't try to delete those IDs from any of the other tables.
CREATE TABLE TABLE1_PURGE NOLOGGING
AS
SELECT PURGE_IDS.ID FROM PURGE_IDS INNER JOIN TABLE_1 ON PURGE_IDS.ID = TABLE_1.ID;
DELETE FROM TABLE_1 WHERE ID IN (SELECT ID FROM TABLE1_PURGE);
DELETE FROM PURGE_IDS WHERE ID IN (SELECT ID FROM TABLE1_PURGE);
DROP TABLE TABLE1_PURGE;
and repeat.
Manage Concurrency if you have to
Another way is to use PL/SQL looping over the tables, issuing a rowcount-limited delete statement. This is most likely appropriate if there's significant insert/update/delete concurrent load against the tables you're running the deletes against.
declare
  l_sql varchar2(4000);
begin
  for i in (select table_name from all_tables
            where table_name in ('TABLE_1', 'TABLE_2', ...)
            order by table_name)
  loop
    l_sql := 'delete from ' || i.table_name ||
             ' where id in (select id from purge_ids) ' ||
             ' and rownum <= 1000000';
    loop
      commit;
      execute immediate l_sql;
      exit when sql%rowcount <> 1000000; -- if we delete less than 1,000,000
    end loop;                            -- no more rows need to be deleted!
  end loop;
  commit;
end;
/
Store all the to-be-deleted IDs in a table. Then there are 3 ways:
1) Loop through all the IDs in the table and delete one row at a time, committing every X rows. X can be 100 or 1000. This works in an OLTP environment and you can control the locks.
2) Use an Oracle bulk (FORALL) delete (a sketch follows below).
3) Use a correlated delete query.
A single query is usually faster than multiple queries because of less context switching and possibly less parsing.
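A hedged sketch of option 2, a batched bulk (FORALL) delete driven by a PURGE_IDS table as in the answer above (the batch size of 1000 is arbitrary):
DECLARE
  TYPE t_id_tab IS TABLE OF purge_ids.id%TYPE;
  l_ids t_id_tab;
  CURSOR c IS SELECT id FROM purge_ids;
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO l_ids LIMIT 1000;
    EXIT WHEN l_ids.COUNT = 0;
    FORALL i IN 1 .. l_ids.COUNT
      DELETE FROM table_1 WHERE id = l_ids(i);
    COMMIT;  -- commit each batch to keep undo usage small
  END LOOP;
  CLOSE c;
END;
/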
First, disabling the index during the deletion would be helpful.
Try a MERGE INTO statement:
1) Create a temp table with the IDs and one additional column from TABLE1, and test with the following:
MERGE INTO table1 tgt
USING (SELECT id, col1
       FROM test_merge_delete) src
ON (tgt.id = src.id)
WHEN MATCHED THEN
UPDATE
SET tgt.col1 = src.col1
DELETE
WHERE tgt.id = src.id
I have tried this code and it's working fine in my case.
DELETE FROM NG_USR_0_CLIENT_GRID_NEW WHERE rowid IN
( SELECT rid FROM
  (
    SELECT rowid AS rid, wi_name, relationship, ROW_NUMBER() OVER (ORDER BY rowid DESC) RN
    FROM NG_USR_0_CLIENT_GRID_NEW
    WHERE wi_name = 'NB-0000001385-Process'
  )
  WHERE RN = 2
);

where rownum=1 query taking time in Oracle

I am trying to execute a query like
select * from tableName where rownum=1
This query is basically to fetch the column names of the table. There are more than a million records in the table. When I put the above condition it takes a very long time to fetch the first row. Is there any alternative way to get the first row?
This question has already been answered; I will just provide an explanation of why a ROWNUM=1 or ROWNUM <= 1 filter can sometimes result in a long response time.
When it encounters a ROWNUM filter (on a single table), the optimizer will produce a FULL SCAN with COUNT STOPKEY. This means that Oracle will read rows until it has found the first N rows (here N=1). A full scan reads blocks from the first extent up to the high water mark. Oracle has no way to determine beforehand which blocks contain rows and which don't, so all blocks will be read until N rows are found. If the first blocks are empty, this can result in many reads.
Consider the following:
SQL> /* rows will take a lot of space because of the CHAR column */
SQL> create table example (id number, fill char(2000));
Table created
SQL> insert into example
2 select rownum, 'x' from all_objects where rownum <= 100000;
100000 rows inserted
SQL> commit;
Commit complete
SQL> delete from example where id <= 99000;
99000 rows deleted
SQL> set timing on
SQL> set autotrace traceonly
SQL> select * from example where rownum = 1;
Elapsed: 00:00:05.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=7 Card=1 Bytes=2015)
1 0 COUNT (STOPKEY)
2 1 TABLE ACCESS (FULL) OF 'EXAMPLE' (TABLE) (Cost=7 Card=1588 [..])
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
33211 consistent gets
25901 physical reads
0 redo size
2237 bytes sent via SQL*Net to client
278 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
As you can see, the number of consistent gets is extremely high for a single row. This situation can arise when, for example, you insert rows with the /*+ APPEND */ hint (thus above the high water mark) and also delete the oldest rows periodically, leaving a lot of empty space at the beginning of the segment.
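If that is the cause, one possible (hedged) remedy is to reclaim the empty space below the high water mark, reusing the EXAMPLE table from above:
ALTER TABLE example ENABLE ROW MOVEMENT;
ALTER TABLE example SHRINK SPACE;   -- requires an ASSM tablespace; lowers the high water mark
-- or, more drastically: ALTER TABLE example MOVE; (rebuilds the segment; indexes must be rebuilt afterwards)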
Try this:
select * from tableName where rownum<=1
There are some weird ROWNUM bugs; sometimes changing the query very slightly will fix it. I've seen this happen before, but I can't reproduce it.
Here are some discussions of similar issues: http://jonathanlewis.wordpress.com/2008/03/09/cursor_sharing/ and http://forums.oracle.com/forums/thread.jspa?threadID=946740&tstart=1
Surely Oracle has meta-data tables that you can use to get column names, like the sysibm.syscolumns table in DB2?
And, after a quick web search, that appears to be the case: see ALL_TAB_COLUMNS.
I'd use those rather than go to the actual table, something like (untested):
SELECT COLUMN_NAME
FROM ALL_TAB_COLUMNS
WHERE TABLE_NAME = 'MYTABLE'
ORDER BY COLUMN_NAME;
If you are hell-bent on finding out why your query is slow, you should revert to the standard method: asking your DBMS to explain the execution plan of the query for you. For Oracle, see section 9 of this document.
There's a conversation over at Ask Tom - Oracle that seems to suggest the row numbers are created after the select phase, which may mean the query is retrieving all rows anyway. The explain will probably help establish that. If it contains FULL without COUNT STOPKEY, then that may explain the performance.
Beyond that, my knowledge of Oracle specifics diminishes and you will have to analyse the explain further.
Your query is doing a full table scan and then returning the first row.
Try
SELECT * FROM table WHERE primary_key = primary_key_value;
The first row, particularly as it pertains to ROWNUM, is arbitrarily decided by Oracle. It may not be the same from query to query, unless you provide an ORDER BY clause.
So, picking a primary key value to filter by is as good a method as any to get a single row.
I think you're slightly missing the concept of ROWNUM - according to Oracle docs: "ROWNUM is a pseudo-column that returns a row's position in a result set. ROWNUM is evaluated AFTER records are selected from the database and BEFORE the execution of ORDER BY clause."
So it returns ANY row that it considers #1 in the result set, which in your case will contain 1M rows.
You may want to check out a ROWID pseudo-column: http://psoug.org/reference/pseudocols.html
I recently had the same problem you're describing: I wanted one row from a very large table as a quick, dirty, simple introspection, and "where rownum=1" alone behaved very poorly. Below is a remedy which worked for me.
Select the max() of the first column of some index, and then use it to choose a small fraction of all rows with "rownum=1". Suppose my table has an index on the numerical column "group_id"; then compare this:
select * from my_table where rownum = 1;
-- Elapsed: 00:00:23.69
with this:
select * from my_table where rownum = 1
and group_id = (select max(group_id) from my_table);
-- Elapsed: 00:00:00.01
