I have a few sprocs that execute some number of more complex queries and liberally use collections.
My DBA is complaining that they occasionally consume a S#$%ton of in-memory TEMP tablespace.
I can optimize the queries, but I also want to be as noninvasive as possible, and to do that I need to see the effect my changes have on the TEMP tablespace.
QUESTION:
How can I see what cost my query has on the TEMP tablespace?
One thing to consider is that I don't have DBA access.
Thanks in advance.
Depends what you mean by the cost your query has on temp.
If you can select from v$tempseg_usage, you can see how much space you are consuming in temp - on a DEV database there is no reason your DBA cannot give you access to that view.
As was mentioned by gpeche - autotrace will give you a good idea about how many IOs you are doing from temp - so that combined with the space usage will give you a good idea about what is going on.
Large collections are generally a bad idea. They consume a lot of memory in the PGA (which is very different from TEMP), and that memory comes out of the server memory every other session depends on, so this will be what your DBA is concerned about. How large is large depends on your system: low thousands of small records probably isn't too bad, but with hundreds of thousands or millions of records in a collection I would be getting worried.
Before doing all kinds of interesting queries and tricks, estimate the data volume that should be sorted, after filtering. If this is larger than what fits in the sort area, the sort will move blocks from memory to temp and read them back later. Add a little overhead to the raw data size; use 30% overhead. This should give a reasonable estimation for the needed total sort size.
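The estimate described above is a few lines of arithmetic. Here is a minimal sketch in Python. The row count, average row size, and sort-area size are made-up inputs rather than values from this thread, and the 30% overhead is just the rule of thumb suggested above.

```python
def estimate_sort_bytes(row_count, avg_row_bytes, overhead_percent=30):
    """Raw data volume plus a percentage overhead, in bytes."""
    raw_bytes = row_count * avg_row_bytes
    return raw_bytes * (100 + overhead_percent) // 100


def spills_to_temp(row_count, avg_row_bytes, sort_area_bytes):
    """True when the estimated sort size does not fit in the sort area."""
    return estimate_sort_bytes(row_count, avg_row_bytes) > sort_area_bytes


# Hypothetical numbers: 1,000,000 rows of ~100 bytes after filtering,
# and a 64 MiB sort area.
needed = estimate_sort_bytes(1_000_000, 100)
print(needed)                                            # 130000000
print(spills_to_temp(1_000_000, 100, 64 * 1024 * 1024))  # True
```

If the second value is True, expect the sort to write to TEMP, and compare the estimate with what you actually observe.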
Use the same strategy for collections. There has to be room for the data somewhere; there is no magic or compression that makes your data volume smaller. If you have memory for 1,000 rows at most and try to use it for 1,000,000 rows, it won't fit. In that case talk to your DBA and try to find a solution. It could be that you end up partitioning your workload.
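Partitioning the workload usually means processing the rows in bounded batches instead of loading everything into one collection. The sketch below shows the idea in Python; `batches`, `process_in_batches`, and the batch size of 1,000 are illustrative names and values, not anything from the original procedures. In PL/SQL the same pattern is normally written with FETCH ... BULK COLLECT INTO ... LIMIT n.

```python
from itertools import islice


def batches(rows, batch_size):
    """Yield successive lists of at most batch_size items from rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_in_batches(rows, batch_size, handle_batch):
    """Apply handle_batch to one bounded batch at a time; return rows processed."""
    processed = 0
    for batch in batches(rows, batch_size):
        handle_batch(batch)  # only batch_size rows are held in memory here
        processed += len(batch)
    return processed


# Hypothetical usage: 1,000,000 rows handled 1,000 at a time.
sizes = []
total = process_in_batches(range(1_000_000), 1000, lambda b: sizes.append(len(b)))
print(total, max(sizes))  # 1000000 1000
```

Peak memory is then bounded by the batch size rather than by the total row count.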
Without having DBA access, I would try with AUTOTRACE. It will not give you TEMP tablespace consumption, but you can get a lot of useful information for tuning your queries (logical reads, number of sorts to disk, recursive SQL, redo consumption, network roundtrips). Note that you need some privileges granted to use AUTOTRACE, but not full DBA rights.
While your query is running you can query v$sql_workarea_active, or after it has run you can query v$sql_workarea.
These will show you the temp tablespace usage in terms of memory used, disk space used, and (most importantly) the number of passes (space usage is only part of the issue -- multipass sorts are very expensive), and correlate the usage to steps in the explain plan.
You can then consider whether modifying memory management would help you reduce temp tablespace usage both in terms of absolute space used and in the pass count.
Newb here. We have an old Oracle 10g instance that has to be kept alive until it is replaced. The nightly jobs have been very slow, causing some issues. Every other week there is a large process that does large amounts of DML (deletes, inserts, updates). Some of these tables have 2+ million rows. I noticed that on some of the tables the HWM is higher than expected, and in Toad I ran a database advisor check that recommended shrinking some tables. I am concerned that the tables may need the space for DML operations. Will shrinking them make the process faster or slower?
We cannot add CPU due to licensing costs.
If you are accessing the tables with full scans and have a lot of empty space below the HWM, then yes, definitely reorg those (alter table move). There is no downside, only benefit. But if your slow jobs are using indexes, then the benefit will be minimal.
Don't assume that your slow jobs are due to space fragmentation. Use ASH (v$active_session_history) and SQL monitor (v$sql_plan_monitor) data or a graphical tool that utilizes this data to explore exactly what your queries are doing. Understand how to read execution plans and determine whether the correct plan is being used for your data. Tuning is unfortunately not a simple thing that can be addressed with a question on this forum.
In general, shrinking tables or rebuilding indexes should speed up reads of the table, or anything that does full table scans. It should not affect other DML operations.
When selecting or searching data, all of the empty blocks in the table and any indexes used by the query must still be read, so rebuilding them to reduce empty space and lower the high water mark will generally improve performance. This is especially true in indexes, where space lost to deleted rows is not recovered for reuse.
I am new to Oracle Exadata. My question is, to Index or not to Index in Exadata?
I found some blogs that say not to create database indexes and to rely only on storage indexes, which are temporary, but there is no official documentation from Oracle that says not to index on Exadata.
What are the issues if I index on Exadata (since it is built on in-memory concepts)? Will it improve or degrade performance? Is it better to drop an index if it has already been created?
We have a large data set, 15 million plus rows and growing, in Oracle Exadata with VARCHARs, CLOBs and other common datatypes. No indexes have been created except the primary keys. Why does a query take 10 to 12 minutes to execute (a simple select with a few where conditions against 15 million records)? Oracle says Exadata is the fastest database on the planet.
The decision for an index is independent of the platform. It is always the same process, namely:
Do the benefits of having the index outweigh the cost of having the index?
Costs
has to be maintained
space overhead
might increase contention in high insert/update/delete frequency environments
Benefits
faster response times
The reason you might have fewer indexes on Exadata is that if other mechanisms (storage indexes, compression, flash, etc.) can give you response times that meet your business requirements, then you can avoid the drawbacks of those indexes.
But the decision process remains identical - cost vs benefit.
A common technique to assess an existing index is to make it invisible and see if there is an adverse (or beneficial) impact. In that way, if you have to revert and keep the index, there is no cost in doing so.
In addition to Connor's answer, be aware that an index is not always the best way to access the data. This is true even on non-Exadata storage systems. The process and considerations for deciding whether to use an index are independent of Exadata; what Exadata does is give you more reasons and capabilities not to use one.
The OakTable article (linked in an earlier comment) shows why, on Exadata, it is almost always better not to have an index, and the note from Oracle below explains why. In a non-Exadata DB, dumb storage returns blocks (usually 8 KB), not rows, so for large tables a full table scan (FTS) is almost always a bad thing (unless you need most of the rows). Exadata has smart storage that receives information from the query and tries to eliminate bytes that won't answer it. It tries to return only the bytes (not blocks) that may answer the query, which lowers the I/O sent back to the DB for processing. This way an FTS is not so bad and may actually be preferred. As a DBA, I have a 12 TB DB, and many times I have to stick in a NO_INDEX hint to improve queries. This goes against normal modeling theory. Retrieving data from disk is the slowest process in the DB. Exadata removes unneeded data early in the process (at the storage level) and lessens the amount of data sent back to the DB for processing. Many times, my FTS on an 8 billion row table is much faster than using an index... only on Exadata ;)
http://www.oracle.com/technetwork/testcontent/o31exadata-354069.html
I have a read-only database (product) that resides on its own SQL Server 2008 instance.
I already optimized queries by looking at most expensive queries in activity monitor - report. I ordered the report by CPU-cost. I now have something like 50 queries/second and no query is longer than 300ms.
CPU-Time is ok (30%) and Memory is only used by 20% (out of 64GB).
There is one issue: disk time is at a steady 100% (I looked at the idle time performance counter and used Idera's SQL diagnostic manager). I can see that the product DB behaves differently from my order DB, which is on a different machine and has smaller tables: in a Profiler trace, queries in the product DB show values higher than 50,000 in the "Reads" column, whereas in my order DB these values are never higher than 1,000. The queries in the product DB use a lot of common table expressions and work on large tables (some have around 5 million rows).
I am not sure whether I should invest time in optimizing queries for I/O performance or just add a server. While optimizing for query duration I already added the missing indexes. Is optimizing for I/O something that is usually done?
In short, yes. Optimize for both CPU and IO.
Queries with high CPU tend to be doing unnecessary in-memory sorts, (sometimes inefficient) hash joins, or complex logic.
Queries with high IO (Page Reads) tend to be doing full table scans or working in other inefficient ways.
9 times out of 10, the same queries will be near the top of the list, but if you've worked on the high CPU and you still are unhappy with performance, then by all means, work on the high IO procs next.
"There's always a next bottleneck," they say.
Now that you've tuned CPU usage, it's only natural that I/O load emerges as dominant. Is your performance already acceptable? If yes stop, if no you have to estimate how many hours you will have to invest in further tuning and if buying another server or more hard disks might be cheaper.
Regarding the I/O tuning again, try to see what you can achieve with easy measures. Sometimes you can trade CPU for I/O and vice versa; compression is an example of this. You would then tune whichever component is your current bottleneck.
Before you seek to make the I/O faster try to reduce the I/O that is generated.
Look for obvious IO performance improvements for your query, but more importantly, look at how you can improve your IO performance at the server level.
If your other resources (CPU and memory) aren't overloaded, you probably don't need a new server. Consider adding an SSD for logs and temp files, and/or consider if you can affordably fit your whole DB onto an array of SSDs.
Of course, clearing out your disk IO bottleneck is likely to raise CPU usage, but if your performance is close to acceptable, this will probably improve things to the point that you can stop optimizing for now.
Unless you are using SSDs or a DB-optimized SAN, IO is almost always the limit in database applications.
So yes, optimize to get rid of it as much as possible.
Table indexes are the first thing to do.
Then, add as much RAM as you possibly can, up to the complete size of your DB files.
Then partition your data tables (if that is a reasonable thing to do) so that any necessary table or index scans are done on only one or two table partitions.
Then I suppose you either buy bigger machines with even more RAM and/or buy SSDs or a SAN or a SAN with SSDs.
Alternatively you rebuild your entire database application to use something like NoSQL or database sharding, and implement all your relations, joins, constraints, etc in a middle interface layer.
We have about 10K rows in a table. We want to have a form where we have a select drop down that contains distinct values of a given column in this table. We have an index on the column in question.
To increase performance I created a little cache table that contains the distinct values so we didn't need to do a select distinct field from table against 10K rows. Surprisingly it seems doing select * from cachetable (10 rows) is no faster than doing the select distinct against 10K rows. Why is this? Is the index doing all the work? At what number of rows in our main table will there be a performance improvement by querying the cache table?
For a DB, 10K rows is nothing. You're not seeing much difference because the actual calculation time is minimal, with most of it consumed by other, constant, overhead.
It's difficult to predict when you'd start noticing a difference, but it would probably be at around a million rows.
If you've already set up caching and it's not detrimental, you may as well leave it in.
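To check the "10K rows is nothing" claim yourself, you can time both queries side by side. The sketch below uses Python's built-in sqlite3 module purely as a stand-in; `main_table`, `cache_table`, and `field` are placeholder names, and absolute numbers will differ from your database. In a client/server database the network round trip adds a constant cost to both queries, which this in-process sketch does not include.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany(
    "INSERT INTO main_table (field) VALUES (?)",
    [(f"value_{i % 10}",) for i in range(10_000)],  # 10K rows, 10 distinct values
)
conn.execute("CREATE INDEX idx_field ON main_table (field)")
conn.execute("CREATE TABLE cache_table AS SELECT DISTINCT field FROM main_table")


def timed(sql, repeats=100):
    """Run sql repeatedly; return (rows of the last run, average seconds per run)."""
    start = time.perf_counter()
    for _ in range(repeats):
        rows = conn.execute(sql).fetchall()
    return rows, (time.perf_counter() - start) / repeats


distinct_rows, distinct_secs = timed("SELECT DISTINCT field FROM main_table")
cache_rows, cache_secs = timed("SELECT field FROM cache_table")

print(len(distinct_rows), len(cache_rows))
print(f"distinct: {distinct_secs * 1e6:.0f} us, cache: {cache_secs * 1e6:.0f} us")
```

Compare the two averages with the total response time your form actually sees; if the rest of the request dominates, the cache table is not buying much yet.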
10k rows is not much... start caring when you reach 500k ~ 1 million rows.
Indexes do a great job, especially if you have just 10 different values for that index.
This depends on numerous factors - the amount of memory your DB has, the size of the rows in the table, use of a parameterised query and so forth, but generally 10K is not a lot of rows and particularly if the table is well indexed then it's not going to cause any modern RDBMS any sweat at all.
As a rule of thumb I would generally only start paying close attention to performance issues on a table when it passes the 100K rows mark, and 500K doesn't usually cause much of a problem if the table is indexed correctly and accessed through those indexes. Performance usually tends to fall off catastrophically on large tables (you may be fine on 500K rows but crawling on 600K), but you have a long way to go before you are at all likely to hit such problems.
Is the index doing all the work?
You can tell how the query is being executed by viewing the execution plan.
For example, try this:
explain plan for select distinct field from table;
select * from table(dbms_xplan.display);
I notice that you didn't include an ORDER BY on that. Without an ORDER BY, the order of the result set is not guaranteed, particularly if Oracle uses a hash-based algorithm to build the distinct list. You ought to check that.
So I'd look at the execution plans for the original query that you think is using an index, and at the one based on the cache table. Maybe post them and we can comment on what's really going on.
Incidentally, the cache table would usually be implemented as a materialised view, particularly if the master table is generally pretty static.
Serious premature optimization. Just let the database do its job, maybe with some tweaking to the configuration (especially if it's MySQL, which has several cache types and settings).
Your query on 10K rows most probably uses a HASH UNIQUE or SORT UNIQUE operation.
As 10K rows most probably fit into the buffer cache and hash_area_size, all operations are performed in memory, and you won't notice any difference.
But if the query will be used as a part of a more complex query, or will be swapped out by other data, you may need disk I/O to access the data, which will slow your query down.
Run your query in a loop in several sessions (as many sessions as there will be users connected), and see how it performs in that case.
For future plans and for scalability, you may want to look into an indexing service that uses pure memory or something faster than the TCP DB round-trip. A lot of people (including myself) use Lucene to achieve this by normalizing the data into flat files.
Lucene has a built-in Ram Drive directory indexer, which can build the index all in memory - removing the dependency on the file system, and greatly increasing speed.
Lately, I've architected systems that have a single RAM drive index wrapped by a web service. Then I have my Ajax-like dropdowns query that web service for high availability and high speed: no DB layer, no file system, just pure memory and, if the service is remote, TCP packet speed.
If you have an index on the column, then all the values are in the index and the DBMS never has to look in the table. It just looks in the index, which has only 10 distinct entries. If this is mostly read-only data, then cache it in memory. Caching helps scalability a lot by relieving the database of work. A query that is quick on a database with no users might perform poorly if 30 queries are running at the same time.
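You can check whether the query is answered from the index alone by looking at the plan. The sketch below does this with Python's built-in sqlite3 module and EXPLAIN QUERY PLAN, only as a stand-in for the EXPLAIN PLAN approach shown earlier in this thread; the table, column, and index names are placeholders, and the plan wording depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE main_table (id INTEGER PRIMARY KEY, field TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO main_table (field, payload) VALUES (?, ?)",
    [(f"value_{i % 10}", "x" * 50) for i in range(10_000)],
)
conn.execute("CREATE INDEX idx_field ON main_table (field)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT DISTINCT field FROM main_table"
).fetchall()

# The last column of each plan row holds the human-readable description.
plan_text = " | ".join(row[-1] for row in plan)
print(plan_text)
```

If the description mentions the index and no separate table access, the distinct list is being read from the index only.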
I have to load data into a table and run queries on it at the same time. Because of the nature of the data, I can trade integrity for performance. How can I minimize the overhead of transactions?
Unfortunately, alternatives like MySQL cannot be used (due to non-technical reasons).
Other than the general optimization practices that apply to all databases such as eliminating full table scans, removing unused or inefficient indexes, etc., etc., here are a few things you can do.
Run in NOARCHIVELOG mode. This sacrifices recoverability for speed.
For inserts use the /*+ APPEND */ hint. This puts data into the table above the high water mark, which generates almost no UNDO for the inserted rows. The disadvantage is that existing free space is not used.
On the hardware side, RAID 0 over a larger number of smaller disks will give you the best insert performance, but depending on your usage RAID 10 with its better read performance may provide a better fit.
This said, I don't think you will gain much from any of these changes.
Perhaps I'm missing something, but since in Oracle readers don't block writers and writers don't block readers, what exactly is the problem you are trying to solve?
From the perspective of the sessions that are reading the data, sessions that are doing inserts aren't really adding any overhead (updates might add a bit of overhead as the reader would have to look at data in the UNDO tablespace in order to reconstruct a read-consistent view of the data). From the perspective of the sessions that are inserting the data, sessions that are doing reads aren't really adding any overhead. Of course, your system as a whole might have a bottleneck that causes the various sessions to contend for resources (i.e. if your inserts are using up 100% of the available I/O bandwidth, that is going to slow down queries that have to do physical I/O), but that isn't directly related to the type of operations that the different sessions are doing-- you can flood an I/O subsystem with a bunch of reporting users just as easily as with a bunch of insert sessions.
You want transaction isolation read uncommitted. I don't recommend it but that's what you asked for :)
This will allow you to breach transaction isolation and read uncommitted inserted data.
Please read this Ask Tom article: http://www.oracle.com/technology/oramag/oracle/05-nov/o65asktom.html.
UPDATE: I was actually mistaken. Oracle doesn't really support the read uncommitted isolation level; they just mention it :).
How about you try disabling all constraints on your table (or at least deferring their checks until commit), then inserting all the data, then enabling them again?
e.g. alter session set constraints = deferred;
However, if the constraints were not declared deferrable when they were created, that setting has no effect on them, which might be a problem.
What kind of performance volumes are you looking at? Are inserts batched or numerous small ones?
Before banging your head against the wall trying to think of clever ways to have good performance, did you create any simple prototypes which would give you a better picture of the out-of-the-box performance? It could easily turn out that you don't need to do anything special to meet the goals.