I am trying to compare two very large tables in my system (Oracle 10g).
I compare them with the MINUS operator.
Because the tables are so large, I want to monitor the temporary tablespace usage
in real time.
I googled some ways to get the usage of the temporary tablespace, but I am not sure
which one is right. Here are the three ways:
1. SELECT TABLESPACE_NAME, BYTES_USED, BYTES_FREE FROM V$TEMP_SPACE_HEADER;
2. SELECT BYTES_USED, BYTES_CACHED FROM V$TEMP_EXTENT_POOL;
What is the difference between BYTES_USED and BYTES_CACHED?
3. SELECT USED_EXTENTS, USED_BLOCKS FROM V$SORT_SEGMENT;
These three approaches confuse me a lot, and I don't know how they differ.
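A single MINUS only returns the rows of the first table that are missing from the second. A full comparison therefore usually runs it in both directions. The sketch below illustrates this with Python's built-in sqlite3 module, where SQLite's EXCEPT stands in for Oracle's MINUS. The table names old_t and new_t and their rows are made up. Like MINUS, EXCEPT is a set operation that removes duplicates. That is why the database typically has to sort or hash both inputs, and on very large tables that work spills into temporary space.

```python
import sqlite3


def compare_tables(conn: sqlite3.Connection, left: str, right: str):
    """Return (rows only in left, rows only in right).

    SQLite spells Oracle's MINUS as EXCEPT. Both are set operators, so
    duplicate rows collapse and the result order is not guaranteed.
    """
    # Table names are interpolated directly: only pass trusted identifiers.
    only_left = conn.execute(
        f"SELECT * FROM {left} EXCEPT SELECT * FROM {right}"
    ).fetchall()
    only_right = conn.execute(
        f"SELECT * FROM {right} EXCEPT SELECT * FROM {left}"
    ).fetchall()
    return sorted(only_left), sorted(only_right)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_t (id INTEGER, name TEXT);
    CREATE TABLE new_t (id INTEGER, name TEXT);
    INSERT INTO old_t VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO new_t VALUES (1, 'a'), (2, 'B'), (4, 'd');
""")
only_old, only_new = compare_tables(conn, "old_t", "new_t")
print(only_old)  # [(2, 'b'), (3, 'c')]
print(only_new)  # [(2, 'B'), (4, 'd')]
```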
Look at the dynamic performance views v$sql_workarea and v$sql_workarea_active. They tell you not only how much space the query is using, but also how much of it is attributable to different phases of the execution plan, what kind of work area it is (hash join, sort, etc.) and how it is being used (one-pass, etc.). That makes them a much more effective tool for performance tuning.
The V$SORT_SEGMENT view can be used to get used/free extent and used/free block information for TEMPORARY tablespaces.
The V$TEMP_SPACE_HEADER and V$TEMP_EXTENT_POOL views are almost the same: both provide used-bytes information. However, V$TEMP_EXTENT_POOL is more reliable, because the former is only updated when the database is restarted or the tablespace is recreated.
Note: from Oracle 11g onward, the DBA_TEMP_FREE_SPACE view can be used to get TEMPORARY tablespace information.
When I execute a huge query, the temp tablespace of the Oracle instance runs out of space. The query is at the following link:
https://dl.dropboxusercontent.com/u/96203352/Query/title_block.sql
The temp tablespace is 30 GB, and because of the client's concerns I cannot extend it any further. I tried to reduce the sort operations, but in vain. Is there any way to optimize this query or reduce its sort operations?
The PLAN_TABLE output with the plan statistics is at the following link:
https://dl.dropboxusercontent.com/u/96203352/Query/PLAN_TABLE_INFO.txt
The query and the explain plan are far too large to post in this question, so I have to share them via links. Sorry for the inconvenience.
One more thing: I cannot remove DISTINCT from the SELECT statement, because the data returned contains duplicates.
Please help.
The query plan says it all at the very top: all the temp space is being used by the DISTINCT operation. That operation requires so much memory because your query rows are so fat: around 10,000 bytes each!
I'm not sure why your query rows are so fat, but one thing I would try is to change your query to do the DISTINCT operation and then, in a later step, CAST the necessary columns to VARCHAR2(3999). Logically that shouldn't make a difference, but I've seen strange behaviour with CAST over the years, so I wouldn't trust it enough to skip trying my suggestion.
I've been trying to figure out what performance impact temporary tables have on an Oracle database. We want to use them in our ETL process to store intermediate results. Currently we use physical tables for this purpose and truncate them at the beginning of the ETL process. I know that truncating is very expensive, so I wondered whether it would be better to use temporary tables instead.
Has anyone experienced a performance boost from using temporary tables in this scenario?
The only answers I found on this topic were about SQL Server, as in this question, and I don't know whether those recommendations also apply to Oracle.
It would be nice if someone could list the advantages and disadvantages of this feature and point out the scenarios in which it is applicable.
Thanks in advance.
First of all: TRUNCATE is not expensive; a DELETE with no condition is.
Second: do your temporary tables have indexes? What about foreign keys?
Those could affect performance.
Temporary tables work more or less as they do in SQL Server (the syntax differs, of course, e.g. CREATE GLOBAL TEMPORARY TABLE), and in both cases they are just tables.
You won't get any performance gain from temporary tables compared with normal tables; they are essentially the same: they have a definition in the database, can have indexes, and are logged.
The only difference is that temporary tables are private to your session (except for global ones). That means that if multiple scripts in multiple sessions refer to the same table, each one reads and writes its own copy, so they cannot lock each other. In that case you could gain performance, but I think that is rarely the case.
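The session-private behaviour described above can be illustrated with a minimal sketch using Python's sqlite3 module, where SQLite TEMP tables stand in for Oracle global temporary tables. There is one difference worth keeping in mind. In Oracle the table definition is created once (for example CREATE GLOBAL TEMPORARY TABLE staging (...) ON COMMIT PRESERVE ROWS) and only the rows are private to each session. SQLite, by contrast, creates the whole table per connection. The file name etl.db and the table name staging are hypothetical.

```python
import os
import sqlite3
import tempfile

# A file-backed database, so several connections (standing in for
# sessions) share the same permanent schema.
path = os.path.join(tempfile.mkdtemp(), "etl.db")
s1 = sqlite3.connect(path)
s2 = sqlite3.connect(path)

for session, tag in ((s1, "s1"), (s2, "s2")):
    # TEMP tables live in a per-connection schema, so each session gets its
    # own private "staging" table even though the name is the same.
    session.execute("CREATE TEMP TABLE staging (val TEXT)")
    session.execute("INSERT INTO staging VALUES (?)", (tag,))
    session.commit()

rows1 = s1.execute("SELECT val FROM staging").fetchall()
rows2 = s2.execute("SELECT val FROM staging").fetchall()
print(rows1, rows2)  # [('s1',)] [('s2',)]

# A third session sees neither of the other sessions' temp tables.
s3 = sqlite3.connect(path)
seen_by_s3 = s3.execute("SELECT name FROM sqlite_temp_master").fetchall()
print(seen_by_s3)  # []

# Closing a connection drops its temp tables; there is nothing to TRUNCATE.
for session in (s1, s2, s3):
    session.close()
```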
I have some very large tables (to me, anyway): millions of rows. I am loading them from a legacy system and it is taking forever. Assuming the hardware is fast enough, how can I speed this up? I have tried exporting from one system to CSV and loading with SQL*Loader: slow. I have also tried a direct link from one system to the other, so there is no intermediate CSV file; I just unload from one and load into the other.
One person said something about pre-staging tables, which could somehow make things faster. I don't know what that is or whether it would help, so I was hoping for input. Thank you.
We are using Oracle 11g.
Update: my database is clustered, so I don't know whether I can do anything to speed things up.
What you can try:
disabling all constraints and only enabling them after the load process
CTAS (create table as select)
What you really should do is understand what your bottleneck is: network, file I/O, constraint checking ... Then fix that problem. For me, looking at the explain plan is usually the first step.
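The first suggestion (disable, load, re-enable) can be sketched as follows. In Oracle this would be ALTER TABLE target DISABLE CONSTRAINT constraint_name before the load and ENABLE CONSTRAINT afterwards. SQLite cannot disable constraints, so this Python/sqlite3 sketch uses the closest analogue it has: it drops a secondary index before the load and recreates it once afterwards. The table target and the index ix_target_code are hypothetical.

```python
import sqlite3


def bulk_load(conn: sqlite3.Connection, rows) -> None:
    """Drop the secondary index, load all rows, then rebuild the index once."""
    conn.execute("BEGIN")
    try:
        conn.execute("DROP INDEX IF EXISTS ix_target_code")
        conn.executemany("INSERT INTO target (id, code) VALUES (?, ?)", rows)
        # Building the index once over the loaded data replaces the
        # per-row index maintenance the load would otherwise pay for.
        conn.execute("CREATE INDEX ix_target_code ON target (code)")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None: the sketch issues BEGIN/COMMIT itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, code TEXT)")
conn.execute("CREATE INDEX ix_target_code ON target (code)")
bulk_load(conn, ((i, f"C{i % 100:03d}") for i in range(10_000)))
print(conn.execute("SELECT count(*) FROM target").fetchone()[0])  # 10000
```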
As Jens Schauder suggested, if you can connect to your source legacy system via DB link, CTAS would be the best compromise between performance and simplicity, as long as you don't need any joins on the source side.
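CTAS creates and fills the target table in a single statement that reads directly from the source, with no intermediate file. In Oracle the source would be addressed through the database link, for example SELECT ... FROM customers@legacy_link. The sketch below imitates that with Python's sqlite3 module by attaching a second database under the name legacy. The link name, tables and columns are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the DB link: a second database attached as "legacy".
conn.execute("ATTACH DATABASE ':memory:' AS legacy")
conn.executescript("""
    CREATE TABLE legacy.customers (id INTEGER, name TEXT, active INTEGER);
    INSERT INTO legacy.customers VALUES (1, 'Ann', 1), (2, 'Bob', 0), (3, 'Cy', 1);
""")

# CTAS: create and populate the target in one statement, reading straight
# from the source with no intermediate CSV file.
conn.execute("""
    CREATE TABLE main.customers AS
    SELECT id, name FROM legacy.customers WHERE active = 1
""")
rows = conn.execute("SELECT * FROM main.customers ORDER BY id").fetchall()
print(rows)  # [(1, 'Ann'), (3, 'Cy')]
```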
Otherwise, you should consider using SQL*Loader and tweaking some settings. Using direct path, I was able to load 100M records (~10 GB) in 12 minutes on a six-year-old ProLiant.
EDIT: I used the data format defined for the Datamation sort benchmark. The generator for it is available in the Apache Hadoop distribution. It generates fixed-width records with 99 bytes of data plus a newline character per line of the file. The SQL*Loader control file I used for the numbers quoted above was:
OPTIONS (SILENT=FEEDBACK, DIRECT=TRUE, ROWS=1000)
LOAD DATA
INFILE 'rec100M.txt' "FIX 100"
INTO TABLE BENCH (
BENCH_KEY POSITION(1:10),
BENCH_REC_NBR POSITION(13:44),
BENCH_FILLER POSITION(47:98))
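The POSITION(start:end) clauses in the control file are 1-based, inclusive byte ranges within each fixed-length record. Before loading a large file it can be worth checking a sample against those ranges. Here is a minimal Python sketch that applies the same slices. It assumes each record is 100 bytes long including its line terminator (99 bytes of data plus the newline, as described above). The make_record helper and the sample records are made up for illustration.

```python
RECORD_LEN = 100  # bytes per record, including the line terminator

# 1-based, inclusive byte ranges copied from the POSITION(start:end) clauses
FIELDS = {
    "BENCH_KEY": (1, 10),
    "BENCH_REC_NBR": (13, 44),
    "BENCH_FILLER": (47, 98),
}


def parse_fixed(data: bytes, record_len: int = RECORD_LEN) -> list:
    """Split data into fixed-length records and slice out each field."""
    if len(data) % record_len:
        raise ValueError("data is not a whole number of records")
    rows = []
    for offset in range(0, len(data), record_len):
        record = data[offset:offset + record_len]
        # POSITION(s:e) is 1-based and inclusive, i.e. the slice [s-1:e]
        rows.append({name: record[s - 1:e].decode("ascii")
                     for name, (s, e) in FIELDS.items()})
    return rows


def make_record(key: bytes, rec_nbr: int, filler: bytes) -> bytes:
    """Hypothetical helper: build a 100-byte test record in this layout."""
    return key + b"  " + b"%032X" % rec_nbr + b"  " + filler + b"\r\n"


sample = make_record(b"K" * 10, 1, b"F" * 52) + make_record(b"J" * 10, 2, b"G" * 52)
rows = parse_fixed(sample)
print(rows[1]["BENCH_KEY"], rows[1]["BENCH_REC_NBR"][-2:])  # JJJJJJJJJJ 02
```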
What is the configuration you are using?
Does the database you import into have something like a standby database coupled to it? If so, it very likely has force logging enabled.
You can check this with:
SELECT force_logging FROM v$database;
It can also be enabled at the tablespace level:
SELECT tablespace_name, force_logging FROM dba_tablespaces;
If your database, or the target tablespace, runs with force logging, this will affect the import speed.
If this is not the case, check whether archivelog mode is enabled:
SELECT log_mode FROM v$database;
If so, it could be that the archived logs are not written fast enough. In that case, increase the size of the online redo log files.
If the database is not running in archivelog mode, it still has to write to the redo log files unless you use direct-path inserts. In that case, check how quickly the redo can be written. Normally, 200 GB/h is quite possible when indexes are not playing a role.
The important thing is to find which link is causing the poor performance. It could be the input or the output; here I focused on the output.
I’ve been tasked with optimizing a rather nasty stored procedure in a legacy system. It’s a database dedicated to search, and a new copy is generated every day, with many complex joins denormalized. No writes are performed, only SELECTs, so I figured some easy improvements could be made by making the whole database read-only and changing the recovery model to “Simple”.
Much to my surprise, this didn’t help at all! The stored procedure still takes the same amount of time to complete. In fact, I’m so surprised that I figured I must have done it wrong!
My questions:
Do I need to do anything other than setting “Database read-only” to “true”?
Am I wrong to expect significant performance improvement by making the database read-only?
Same for the recovery model: Shouldn’t “Simple” have some noticeable impact?
Are there other similar database-wide configurations that can improve performance in this scenario?
The stored procedure is huge, with temporary tables, 40+ tables joined in 20+ queries. But I’d like to optimize the database itself before I edit this proc.
Since your SP performs no writes, there is no reason to expect a noticeable performance improvement from changing the recovery model or the read-write mode.
As others mentioned, you should look into the query plan and optimize your queries.
Another hint: indexes in the database might get fragmented as the database is filled. Since the data is not going to be modified any more, it might help to rebuild all the indexes with a fill factor of 100, which may remove fragmentation and compact the data.
Call this for each table in the database: ALTER INDEX ALL ON table_name REBUILD WITH (FILLFACTOR = 100).
Generally, I wouldn't expect much performance improvement from this, but it depends on the particular database.
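Running the rebuild for every table is easy to script. Below is a minimal Python sketch that only builds the T-SQL text, one ALTER INDEX ALL ... REBUILD statement per table. It uses bracket quoting so that names containing spaces or ']' stay valid. The table list is hypothetical; in practice it would come from the sys.tables and sys.schemas catalog views.

```python
def quote(name: str) -> str:
    """T-SQL bracket quoting: a literal ']' is escaped by doubling it."""
    return "[" + name.replace("]", "]]") + "]"


def rebuild_statements(tables, fillfactor: int = 100) -> list:
    """Return one ALTER INDEX ALL ... REBUILD statement per (schema, table)."""
    if not 1 <= fillfactor <= 100:
        raise ValueError("this sketch only accepts FILLFACTOR 1..100")
    return [
        f"ALTER INDEX ALL ON {quote(schema)}.{quote(table)} "
        f"REBUILD WITH (FILLFACTOR = {fillfactor});"
        for schema, table in tables
    ]


# Hypothetical table list; a real script would read it from sys.tables.
for stmt in rebuild_statements([("dbo", "Orders"), ("sales", "Order Lines")]):
    print(stmt)
# ALTER INDEX ALL ON [dbo].[Orders] REBUILD WITH (FILLFACTOR = 100);
# ALTER INDEX ALL ON [sales].[Order Lines] REBUILD WITH (FILLFACTOR = 100);
```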
Speaking of query optimization, SQL Server 2005 and later have very useful features: the Execution Related and Index Related Dynamic Management Views. In particular, sys.dm_exec_query_stats and the missing-index views are of interest.
These give you almost the same information as the Tuning Advisor, but based on your real-life workload, so you don't need to simulate it and feed it to the Advisor.
Have you tried using the Database Engine Tuning Advisor included in SQL Server? It will analyze your query and suggest new indexes that will improve the performance of the query. Some of them will be good, some will be bad (for example, I've seen it suggest adding every column in a table to an index, sometimes like 30 of them!), so I don't follow it blindly. Generally I'll add a few indexes and then retest, to find the suggestions that are the most important. I've used it to optimize many queries that I thought I had properly indexed, only to find I could get a lot more performance out of them.
I had a similar setup, large stored procedures with lots of large temp tables.
Our problem was that the joins with and between the temp tables were very slow.
I recommend that you look at your execution plan and try to add relevant indexes to the temp tables too if you have not already.
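Whether an index on a temp table actually gets used shows up in the execution plan. As a small stand-in for a SQL Server execution plan, this Python sketch runs SQLite's EXPLAIN QUERY PLAN on two hypothetical temp tables, before and after indexing the join column. The exact wording of the plan steps varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TEMP TABLE tmp_orders (order_id INTEGER, cust_id INTEGER);
    CREATE TEMP TABLE tmp_customers (cust_id INTEGER, name TEXT);
""")
conn.executemany("INSERT INTO tmp_orders VALUES (?, ?)",
                 [(i, i % 50) for i in range(1000)])
conn.executemany("INSERT INTO tmp_customers VALUES (?, ?)",
                 [(i, f"c{i}") for i in range(50)])

QUERY = """SELECT o.order_id, c.name
           FROM tmp_orders AS o JOIN tmp_customers AS c ON c.cust_id = o.cust_id"""


def plan(sql: str) -> list:
    """Return the human-readable step of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Without an index, SQLite has to scan or build a throwaway automatic index.
before = plan(QUERY)
conn.execute("CREATE INDEX ix_tmp_customers_cust ON tmp_customers (cust_id)")
after = plan(QUERY)
print(before)
print(after)  # one step should now search c using ix_tmp_customers_cust
```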
We have noticed that our queries run slower on databases that had big chunks of data added (bulk insert) than on databases with similar amounts of data that was added on a record-by-record basis.
We use SQL Server 2005 Express, and we tried reindexing all indexes without any better results.
Do you know of some kind of structural problem on the database that can be caused by inserting data in big chunks instead of one by one?
Thanks
One tip I've seen is to turn off Auto-create stats and Auto-update stats before doing the bulk insert:
ALTER DATABASE databasename SET AUTO_CREATE_STATISTICS OFF WITH NO_WAIT
ALTER DATABASE databasename SET AUTO_UPDATE_STATISTICS OFF WITH NO_WAIT
Afterwards, create statistics manually with one of two methods:
--generate statistics quickly using a sample of data from the table
exec sp_createstats
or
--generate statistics using a full scan of the table
exec sp_createstats @fullscan = 'fullscan'
You should probably also turn Auto-create and Auto-update stats back on when you're done.
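The same pattern (load in one batch, then build statistics once at the end) can be sketched with Python's sqlite3 module. Here SQLite's ANALYZE stands in for sp_createstats. It writes its statistics to the sqlite_stat1 table, and SQLite only recomputes them when asked, so the analogy with SQL Server's auto-stats settings is loose. The readings table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)")
conn.execute("CREATE INDEX ix_readings_sensor ON readings (sensor)")

rows = [(i, f"s{i % 20}", i * 0.5) for i in range(5_000)]
with conn:  # the sqlite3 context manager commits once, after all rows are in
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Build optimizer statistics explicitly, once, after the load.
conn.execute("ANALYZE")
stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE idx = 'ix_readings_sensor'"
).fetchall()
print(stats)  # the first number in 'stat' is the number of rows in the index
```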
Another option is to check and defrag the indexes after a bulk insert. Check out Pinal Dave's blog post.
Probably SQL Server allocated new disk space in many small chunks. When doing big transactions, it's better to pre-allocate plenty of space in both the data and log files.
That's an interesting question.
I would have guessed that Express and non-Express have the same storage layout, so when you're Googling for other people with similar problems, don't restrict yourself to problems in the Express version. On the other hand, bulk insert is a commonplace operation and performance is important, so I wouldn't consider it likely that this is a previously undetected bug.
One obvious question: which is the clustered index? Is the clustered index also the primary key? Is the primary key unassigned when you insert, and therefore initialized by the database? If so then maybe there's a difference (between the two insert methods) in the pattern or sequence of successive values assigned by the database, which affects the way in which the data is clustered, which then affects performance.
Something else: as well as indexes, people say that SQL Server uses statistics (which it created as a result of running previous queries) to optimize its execution plan. I don't know the details of that, but as well as "reindexing all indexes", check the execution plans of your queries in the two test cases to ensure that the plans are identical (and/or check the associated statistics).