I am incrementally pulling records from a Web Service. I need to set the status of any records in our local table that were not returned by the Web Service to 'DELETED'.
I had intended to store the complete list of record IDs in a PL/SQL table and then perform a single UPDATE statement based on that.
I now estimate the memory usage of the record set to be ~615MB. How does the memory footprint of a PL/SQL table compare to using a global temporary table instead? Would it use a different part of Oracle's memory, PGA vs SGA for instance?
Performance isn't a major concern because this nightly job already runs in Production in an acceptable amount of time. I don't believe adding the 'DELETED' status piece will increase the run duration to affect users.
A global temporary table would not use memory in that way. A PL/SQL collection is held entirely in your session's PGA, whereas a global temporary table stores the values in a temporary segment in your temporary tablespace, to which only your session has access and which is dropped when no longer needed.
Have you considered what happens if your session disconnects? In either method you lose the values you have accumulated. You might like to just use a regular table.
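If memory is the concern, a minimal sketch of the global temporary table approach might look like this (the table and column names are invented for illustration; substitute your real record table and key):

CREATE GLOBAL TEMPORARY TABLE ws_current_ids
(
  record_id  NUMBER NOT NULL
)
ON COMMIT PRESERVE ROWS;   -- one-time DDL; rows survive commits but vanish at session end

-- load the IDs returned by the web service (e.g. a bulk insert from the fetch loop)
INSERT INTO ws_current_ids (record_id) VALUES (:id);

-- flag everything the web service no longer returns
UPDATE local_records lr
SET    lr.status = 'DELETED'
WHERE  lr.status <> 'DELETED'
AND    NOT EXISTS (SELECT NULL
                   FROM   ws_current_ids w
                   WHERE  w.record_id = lr.record_id);

The loaded rows spill to your temporary tablespace rather than sitting in session PGA, which is the point of the exercise.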
We have a very large partitioned table from which we need to drop partitions periodically. The business system needs to operate 24*7.
We use a global index.
From the following article, we know that Oracle supports asynchronous index update.
https://oracle-base.com/articles/12c/asynchronous-global-index-maintenance-for-drop-and-truncate-partition-12cr1
But : "The actual index maintenance is performed at a later time"\
Does it affect the normal business when it is actually executed.( Query/Insert/Update/Delete )
No it doesn't, as you can read here (I assume you are running 12.1, since you didn't specify your database version and you linked a 12.1 article).
The parts that are of interest to you are the following:
The partitions of tables containing local indexes are locked to prevent DML operations against the affected table partitions, except for an ONLINE MOVE operation. However, unlike the index maintenance for local indexes, any global index is still fully available for DML operations and does not affect the online availability of the OLTP system.
[...]
For example, dropping an old partition is semantically equivalent to deleting all the records of the old partition using the SQL DELETE statement. In the DML case, all index entries of the deleted data set have to be removed from any global index as a standard index maintenance operation, which does not affect the availability of an index for SELECT and DML operations.
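As a rough sketch of what that looks like in practice (the table, partition and index names below are invented for illustration):

ALTER TABLE sales DROP PARTITION p_2015 UPDATE GLOBAL INDEXES;

-- the global index stays fully usable; orphaned entries are only flagged
SELECT index_name, orphaned_entries
FROM   user_indexes
WHERE  table_name = 'SALES';

-- clean the orphaned entries up during a quiet period
-- (Oracle also does this via a scheduled overnight maintenance job)
ALTER INDEX sales_gidx COALESCE CLEANUP;

The drop itself remains a quick metadata operation; only the deferred cleanup touches the index structure, and it does so without blocking queries or DML.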
I have some SQL that runs quickly on its own, but is 60 times slower when the same SQL is inside a stored proc. The SQL uses temp tables in many of its queries. Why is it slower in a sp?
tl;dr
After populating the (temp) table, consider a) running create index and/or b) running update statistics on the table; both of these provide the following benefits:
forces a recompilation of any follow-on queries that reference the (temp) table
provides the compiler with stats about the (temp) table [and index]
Hard to say with 100% certainty without a minimal, reproducible example (to include ASE version and query plans) but some ideas to consider:
1) I'm guessing the SQL that runs quickly on its own is based on a multi-batch process (eg, temp table created in one batch, temp table populated in one batch, query(s) run in yet another batch), in which case the query(s) is compiled with the benefit of having access to some stats for the non-empty temp table
2) with the proc, the steps (create temp table, populate temp table, run query(s)) are all part of a single batch and therefore, as pointed out in eric's answer, the query(s) will (initially) be compiled based on some assumptions about the temp table size
3) another possibility: the 1st time the proc is executed it's compiled based on user1's (temp table) data set; user2 then runs the proc and ends up using the cached version of user1's execution plan, but if user2's (temp table) data set is considerably different (such that user2's data would generate a different query plan) then applying user1's query plan to user2's data could lead to a poor/slow execution
Regardless of why the proc's query(s) is run with a less-than-optimal query plan, the general 'solution' is to make sure the query(s) is compiled with some up-to-date info about the temp table.
How/when to (re)compile the proc's query(s) depends on ASE version and any number of associated scenarios:
newer ASE versions have config parameters deferred name resolution, procedure deferred compilation and optimize temp table resolution which can dictate when individual queries, within a proc, are compiled; but even this may not be enough if idea #2 (above) is in play
the proc could be created with recompile (to force a recompilation on each run) but this may not be enough if (again) idea #2 (above) is in play
creating an index (after the temp table has been populated) and/or running update statistics (again, preferably after the temp table has been populated) should force a recompilation of any follow-on queries (see the sketch after this list)
(re)compilation of large, complex queries may extend the overall run time for the query (as opposed to re-using a pre-existing query plan); net result is that excessive recompilations can lead to overall degradation in performance
the use of abstract query plans (as well as other optimizer hints), while not capable of eliminating recompilations, can help reduce the time to (re)compile queries [this starts getting into advanced P&T work - likely a bit more than what is needed for this particular Q&A]
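A minimal sketch of the create index / update statistics idea mentioned above (proc, table and column names are invented, and exact behaviour varies by ASE version):

create procedure p_slow_report
as
begin
    create table #work (cust_id int, amount money)

    insert #work
    select cust_id, amount
    from   orders
    where  order_date >= dateadd(dd, -1, getdate())

    -- index + stats after the load give the optimizer real row counts
    -- and force recompilation of the follow-on query
    create index ix_work_cust on #work (cust_id)
    update statistics #work

    select cust_id, sum(amount)
    from   #work
    group by cust_id
end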
While the documentation's recommendation to create the temp table(s) prior to calling the proc may have some benefit, there are a few pitfalls here, too:
having to manage/remember the relationship between parent process (batch or proc where temp table is created) and the child proc (where the problematic query is run)
for a child proc called by multiple/different parent processes, the same temp table DDL must be used by all parent processes; otherwise the child proc could end up with excessive recompilations due to schema changes in the temp table. The issue is that if the proc notices a different temp table structure (eg, different datatypes, columns named in a different order, different indexes) then it will force a recompilation
Sybase says, "When you create a table in the same stored procedure or batch where it is used, the query optimizer cannot determine how large the table is because the table was not created when the query was optimized. This applies to both temporary tables and regular user tables."
Sybase recommends creating the temp tables outside of the stored proc. There are also some workarounds involving creating indexes in certain ways. See the Sybase docs for more specifics.
Does your stored procedure have parameters?
If yes, SQL Server uses a process called parameter sniffing when it executes stored procedures that have parameters. When the procedure is compiled or recompiled, the value passed into the parameter is evaluated and used to create an execution plan. That value is then stored with the execution plan in the plan cache. On subsequent executions, that same value – and same plan – is used.
What happens when the values in a table you’re querying aren’t evenly distributed? What if one value would return 10 rows and another value would return 10,000 rows, or 10 million rows?
What will happen is that the first time the procedure is run and the plan is compiled, whatever value is passed in is stored with the plan. Every time it’s executed, until it’s recompiled, the same value and plan will be used – regardless of whether it is the fastest or best plan for that value.
You can force SQL Server to recompile the stored procedure each time it is run. The benefit here is that the best query plan will be created each time it is run. However, recompiling is a CPU-intensive operation. This may not be an ideal solution for stored procedures that are run frequently, or on a server that is constrained by CPU resources already.
ALTER PROCEDURE SP_Test
    @ProductID INT
WITH RECOMPILE
AS
SELECT OrderID, OrderQty
FROM SalesOrderDetail
WHERE ProductID = @ProductID
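If recompiling the whole procedure every time is too expensive, a statement-level recompile hint is a lighter-weight option that confines the extra compile cost to the one query hurt by parameter sniffing (same hypothetical table as above):

ALTER PROCEDURE SP_Test
    @ProductID INT
AS
SELECT OrderID, OrderQty
FROM SalesOrderDetail
WHERE ProductID = @ProductID
OPTION (RECOMPILE)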
I have the same query running on two different DB servers with almost identical config. The query is doing a full table scan (FTS) on one table:
SELECT COUNT (1) FROM tax_proposal_dtl WHERE tax_proposal_no = :b1 AND taxid != :b2 AND INSTR(:b3 , ',' || STATUS || ',' ) > 0 ;
On the 1st DB I get the result in less than 3 seconds with 0 disk reads, while on the 2nd DB disk reads are high and the elapsed time is approximately 9 seconds.
The only difference between the table config on the two DBs is that on the 1st the table has CACHE = 'Y' while on the 2nd CACHE = 'N'. My understanding is that in the case of an FTS the buffer cache won't be used and a direct path read will be used instead, so why is the performance of the same query impacted by CACHE/NOCACHE (given that this is the only difference between the two environments, and even the execution plan is the same)?
As suggested by Jon, and after doing further research on this topic (specifically with regard to _SMALL_TABLE_THRESHOLD), I am adding more details.
Current version: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit
Details of 2nd DB:
Total block count of table from DBA_SEGMENTS = 196736
Details of 1st DB:
Total block count of table from DBA_SEGMENTS = 172288
The execution plan on both DBs is the same, but there are two major differences:
a) On the 2nd DB the cache option is false on the table (I tried ALTER TABLE ... CACHE, but it still had no impact on performance).
b) On the 2nd DB the _STT parameter is 23920, so as per the 5*_STT rule the table does not qualify as a medium sized table, while on the 1st DB the _STT parameter is 48496, so as per the 5*_STT rule the table does qualify as a medium sized table.
Below is a chart based on my research so far on the _STT and CACHE parameters, showing how the system behaves for different table sizes.
Please let me know if my understanding is correct in assuming that the CACHE option has no impact on medium or large sized tables, but does help retain small sized tables longer in the LRU list. Based on the above assumptions and the chart presented, I am concluding that in the case of the 2nd DB the table is classified as a large sized table, hence DPR (direct path read) and more elapsed time, while in the case of the 1st DB it is classified as a medium sized table, hence cached reads and less elapsed time.
As per this link, I have set the _STT parameter at the session level on the 2nd DB:
alter session set "_small_table_threshold"=300000;
So performance has improved considerably and is now almost the same as on the 1st DB, with 0 disk reads, which implies that the table is now being treated as small sized.
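For reference, the hidden parameter and the segment size can be compared like this (the x$ views are only visible to SYS, so this has to be run as a privileged user):

-- current value of the hidden parameter
SELECT a.ksppinm AS parameter, b.ksppstvl AS value
FROM   x$ksppi a, x$ksppcv b
WHERE  a.indx = b.indx
AND    a.ksppinm = '_small_table_threshold';

-- block count to compare against the 5 * _STT rule of thumb
SELECT blocks
FROM   dba_segments
WHERE  segment_name = 'TAX_PROPOSAL_DTL';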
I have used the following articles in my research:
https://jonathanlewis.wordpress.com/2011/03/24/small-tables/
https://hoopercharles.wordpress.com/2010/06/17/_small_table_threshold-parameter-and-buffer-cache-what-is-wrong-with-this-quote/?unapproved=43522&moderation-hash=be8d35c5530411ff0ca96388a6fa8099#comment-43522
https://dioncho.wordpress.com/tag/full-table-scan/
https://mikesmithers.wordpress.com/2016/06/23/oracle-pinning-table-data-in-the-buffer-cache/
http://afatkulin.blogspot.com/2012/07/serial-direct-path-reads-in-11gr2-and.html
http://afatkulin.blogspot.com/2009/01/11g-adaptive-direct-path-reads-what-is.html
The keywords CACHE and NOCACHE are a bit misleading - they don't simply enable or disable caching, they only make cache reads more or less likely by changing how the data is stored in the cache. Like most memory systems, the Oracle buffer cache is constantly adding new data and aging out old data. The default, NOCACHE, will still add table data from full table scans to the buffer cache, but it will mark it as the first piece of data to age out.
According to the SQL Language Reference:
CACHE
For data that is accessed frequently, this clause indicates that the blocks retrieved for this table are placed at the most recently used end of the least recently used (LRU) list in the buffer cache when a full table scan is performed. This attribute is useful for small lookup tables.
...
NOCACHE
For data that is not accessed frequently, this clause indicates that the blocks retrieved for this table are placed at the least recently used end of the LRU list in the buffer cache when a full table scan is performed. NOCACHE is the default for LOB storage.
The real behavior can be much more complicated. The in-memory option, result caching, OS and SAN caching, direct path reads (usually for parallelism), the small table threshold (where Oracle doesn't cache the whole table if it exceeds a threshold), and probably other features I can't think of may affect how data is cached and read.
Edit: I'm not sure if I can add much to your analysis. There's not a lot of official documentation around these thresholds and table scan types. Looks like you know as much about the subject as anyone else.
I would caution that this kind of full table scan optimization should only be needed in rare situations. Why is a query frequently doing a full table scan of a 1GB table? Isn't there an index or a materialized view that could help instead? Or maybe you just need to add more memory if you need the development environment to match production.
Another option, instead of changing the small table threshold, is to change the perceived size of the table. Modify the statistics so that Oracle thinks the table is small. This way no other tables are affected.
begin
dbms_stats.set_table_stats(ownname => user, tabname => 'TAX_PROPOSAL_DTL', numblks => 999);
dbms_stats.lock_table_stats(ownname => user, tabname => 'TAX_PROPOSAL_DTL');
end;
/
I have a pretty complex query where we make use of a temporary table (this is in Oracle running on AWS RDS service).
INSERT INTO TMPTABLE (inserts about 25.000 rows in no time)
SELECT FROM X JOIN TMPTABLE (joins with temp table also in no time)
DELETE FROM TMPTABLE (takes no time in a copy of the production database, up to 10 minutes in the production database)
If I change the delete to a truncate it is as fast as in development.
So this change I will of course deploy. But I would like to understand why this occurs. The AWS team has been quite helpful, but they are a bit biased and like to tell me that my 3000 USD a month database server is not fast enough (I don't think so). I am not that fluent in Oracle administration, but I have understood that if the redo logs are constantly filled, this can cause issues. I have increased their size quite substantially, but then again, this doesn't really add up.
This is a fairly standard issue when deleting large amounts of data. The delete operation has to process each and every row individually: every deleted row is logged (undo and redo in Oracle; a transaction-log record with an LSN in engines such as SQL Server).
truncate, on the other hand, skips all that and simply deallocates the data in the table.
You'll find this behavior is consistent across various RDBMS solutions. Oracle, MSSQL, PostgreSQL, and MySQL all have the same issue.
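In Oracle the swap itself is a one-liner, but note that TRUNCATE is DDL and issues an implicit commit, so it is only suitable if the temp table's contents don't need to roll back with the rest of the transaction:

-- row by row: undo and redo generated for every deleted row
DELETE FROM TMPTABLE;

-- deallocates the segment in one operation (DDL, implicit COMMIT)
TRUNCATE TABLE TMPTABLE;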
I suggest you use an Oracle global temporary table. They are fast, and the rows don't need to be explicitly deleted: they are removed automatically at the end of the transaction or session, depending on the ON COMMIT clause.
For example:
CREATE GLOBAL TEMPORARY TABLE TMP_T
(
ID NUMBER(32)
)
ON COMMIT DELETE ROWS;
See https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#ADMIN11633
I need to perform a query 2.5 million times. This query generates some rows which I need to AVG(column) and then use this AVG to filter out all values below the average. I then need to INSERT these filtered results into a table.
The only way to do such a thing with reasonable efficiency seems to be by creating a TEMPORARY TABLE for each query/postmaster Python thread. I am just hoping these TEMPORARY TABLEs will not be persisted to the hard drive (at all) and will remain in memory (RAM), unless they exceed working memory, of course.
I would like to know if a TEMPORARY TABLE will incur disk writes (which would interfere with the INSERTs, i.e. slow the whole process down).
Please note that, in Postgres, the default behaviour for temporary tables is that they are not automatically dropped at the end of a transaction, and their data is preserved across commits. See ON COMMIT.
Temporary tables are, however, dropped at the end of a database session:
Temporary tables are automatically dropped at the end of a session, or
optionally at the end of the current transaction.
There are multiple considerations you have to take into account:
If you do want to explicitly DROP a temporary table at the end of a transaction, create it with the CREATE TEMPORARY TABLE ... ON COMMIT DROP syntax.
In the presence of connection pooling, a database session may span multiple client sessions; to avoid clashes in CREATE, you should drop your temporary tables -- either prior to returning a connection to the pool (e.g. by doing everything inside a transaction and using the ON COMMIT DROP creation syntax), or on an as-needed basis (by preceding any CREATE TEMPORARY TABLE statement with a corresponding DROP TABLE IF EXISTS, which has the advantage of also working outside transactions e.g. if the connection is used in auto-commit mode.)
While the temporary table is in use, how much of it will fit in memory before overflowing on to disk? See the temp_buffers option in postgresql.conf
Anything else I should worry about when working often with temp tables? A vacuum is recommended after you have DROPped temporary tables, to clean up any dead tuples from the catalog. Postgres will automatically vacuum every 3 minutes or so for you when using the default settings (autovacuum).
Also, unrelated to your question (but possibly related to your project): keep in mind that, if you have to run queries against a temp table after you have populated it, then it is a good idea to create appropriate indices and issue an ANALYZE on the temp table in question after you're done inserting into it. By default, the cost based optimizer will assume that a newly created temp table has ~1000 rows and this may result in poor performance should the temp table actually contain millions of rows.
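A minimal sketch of that pattern (table and column names are invented for illustration; the source and target tables would be whatever your job uses):

BEGIN;

-- dropped automatically when the transaction ends
CREATE TEMPORARY TABLE tmp_scores
(
    item_id bigint,
    score   numeric
)
ON COMMIT DROP;

INSERT INTO tmp_scores (item_id, score)
SELECT item_id, score
FROM   measurements
WHERE  batch_id = 42;

-- index + fresh stats so the follow-on query gets a sensible plan
CREATE INDEX ON tmp_scores (item_id);
ANALYZE tmp_scores;

-- keep only the rows at or above the average
INSERT INTO results (item_id)
SELECT item_id
FROM   tmp_scores
WHERE  score >= (SELECT avg(score) FROM tmp_scores);

COMMIT;  -- tmp_scores disappears here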
Temporary tables provide only one guarantee - they are dropped at the end of the session. For a small table you'll probably have most of your data in the backing store. For a large table I guarantee that data will be flushed to disk periodically as the database engine needs more working space for other requests.
EDIT:
If you're absolutely in need of RAM-only temporary tables you can create a table space for your database on a RAM disk (/dev/shm works). This reduces the amount of disk IO, but beware that it is currently not possible to do this without a physical disk write; the DB engine will flush the table list to stable storage when you create the temporary table.
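For example (paths and names here are made up; this assumes a Linux host where /dev/shm is a tmpfs mount and the directory is pre-created and owned by the postgres OS user):

-- as a superuser, on the database server
CREATE TABLESPACE ram_temp LOCATION '/dev/shm/pg_ram_temp';

-- point temporary objects at it, per session or in postgresql.conf
SET temp_tablespaces = 'ram_temp';

CREATE TEMPORARY TABLE tmp_fast (id bigint);

Bear in mind that anything in a tmpfs tablespace is gone after a reboot, so this is only appropriate for genuinely disposable data.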