I read that Oracle's CBO (on recent versions) is so good that even if the worst possible join order is given, the CBO automagically picks the best join order. So will hints like ORDERED do any good on recent versions (10, 11)? Is it possible for the CBO to miss an optimal join order? Thanks.
My personal experience is that with Oracle 10g/11g, you can hardly improve any query with hints. Most often, the plan you think is optimal turns out to perform worse. The CBO has truly improved drastically.
There are still some (very remote) limitations to the query transformation facilities, especially with rather complex OUTER JOIN constructs, because internally the ANSI syntax is still mapped to some archaic Oracle constructs involving the (+) operator. Other than that and a few other very remote cases, I don't think hints are necessary anymore.
You can tune a lot better by rewriting some SQL or adding constraints/indexes. Of course, these things hold true only if your statistics and histograms are correct, up to date and sensible, as skaffman correctly commented.
Oracle CBO results depend on so many parameters that I can hardly predict the resulting plan in a real system. As a general rule, I suggest you don't use hints unless you're sure you must use them.
Regarding the ORDERED hint: in my experience there are very few cases where you should tell Oracle the exact join order.
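For illustration only, here is roughly what forcing a join order looks like, using the classic EMP/DEPT demo tables; ORDERED makes Oracle join the tables in FROM-clause order, and USE_NL asks for a nested-loops join with the named table as the inner table:

select /*+ ordered use_nl(d) */ e.ename, d.dname
from emp e, dept d          -- ORDERED: drive from emp, then join dept
where e.deptno = d.deptno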
Anyway, the best way to find out is to run some experiments yourself: try the alternatives and choose the best solution :)
I suggest reading Tom Kyte's Effective Oracle by Design; it has a very good chapter on query optimization and how the CBO works.
As I come up to speed on Oracle (having been a DB2 guy for the last couple of decades), I see A LOT of existing code that uses optimizer hints in its queries.
From what I've read on various Oracle-focused websites, several Oracle "experts" advise AGAINST putting optimizer hints in production code because:
With every Oracle patch or upgrade, the hint will probably be wrong.
With every DDL change, the hint will probably be wrong.
One "expert" says:
The reason to be wary of hinting is that by embedding hints in your SQL, you are overriding the optimizer and saying that you know more than it does – not just now, but every time in the future that your SQL will be run, irrespective of any other changes that may happen to your database. The likely consequence of this is that your SQL will possibly run sub-optimally now and almost certainly in the future.
(See http://allthingsoracle.com/a-beginners-guide-to-optimizer-hints/)
So, if optimizer hints are commonly known to be "unwise", why are they so frequently used (at least in the code I've seen)?
To the extent that hints are common, it's generally because someone in the past prioritized fixing an acute issue rather than dealing with the underlying statistics problem. It's entirely possible that this prioritization was reasonable (particularly at the time) but it introduces technical debt that will probably have to be paid back.
When a system becomes unresponsive because a query plan changed and became massively less efficient, fixing the acute production issue generally takes precedence over identifying the root cause. Slapping a hint on a single query is generally both quicker and easier than figuring out the underlying issue or learning how to use the various tools Oracle provides to ensure query plan stability or to evolve plans over time.
Of course, having slapped a band-aid on a single query, if you don't then invest the time to understand why statistics sent the optimizer down the wrong path or why your approach to plan stability didn't work, it's likely that whatever statistics issue you have will cause other queries to perform poorly. It's very rare that misleading statistics would cause just one query in the system to perform poorly. Generally, that either leads to a vicious circle, where more queries start performing badly and more hints get added, or a virtuous circle, where DBAs take a step back, work through what's going wrong to cause query performance to suffer, fix the underlying issue, and then remove the hints.
All that being said, there are a few hints that may be reasonably used relatively commonly depending on the sort of code you're using. If you've got a function that returns a sys_refcursor that is returned to a client application that you know is going to fetch the first few rows, display them to the user, and only ask for the next set of rows if they don't find what they're looking for, it makes sense to use a FIRST_ROWS hint pretty liberally because you know something the optimizer can't possibly know. You know that users are much more interested in the first few rows rather than the complete set of results. If you have lots of code that is using collections in SQL, you probably want to use a lot of CARDINALITY hints because otherwise Oracle has no idea how many elements the collection is likely to have.
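To make that concrete, here are hedged sketches of the two hints just mentioned (the table and collection names are made up):

-- tell the optimizer the client will only fetch the first ~25 rows
select /*+ first_rows(25) */ *
from orders
order by order_date desc;

-- tell the optimizer roughly how many elements the bound collection holds
select /*+ cardinality(t 10) */ *
from table(:my_collection) t;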
In my case, most of the hints I see in our production code are like these:
/*+APPEND*/
/*+ full(MP) parallel(MP, 16) */
/*+ PARALLEL(ME1, 16) FULL(ME1) DRIVING_SITE(ME1) */
Rarely (but every so often), I'll see an index hint:
e.g.
/*+ index(MSL XCL01000) */
Thanks for your valuable input. I certainly understand the dangers and necessities of hints a bit better now.
I have some SQL running on an Oracle 10g database. Some of these statements use the Oracle SAMPLE clause.
e.g.
select /*+ use_nl(emp, dept) */ *
from emp sample(10), dept
where emp.deptno = dept.deptno
Now the Oracle 10g database will be upgraded to 11g. We have to prove that the upgrade has no performance impact on the SAMPLE clause. In other words, I need to prove that the SAMPLE clause still performs well in Oracle 11g.
But I spent a whole day searching Google and found no answer.
Could you give me an answer or a suggestion? Thank you so much.
Well, the good news is that it's exactly the same clause used internally by Oracle for sampling -- most notably by the DBMS_STATS programs. So I'd say it's rather unlikely that it's done anything other than get faster.
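For instance, a typical DBMS_STATS call that exercises the same sampling machinery looks like this (the table name is just an example):

begin
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'EMP',
    estimate_percent => 10);  -- sample roughly 10% of the rows
end;
/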
The bad news is that there are an enormous number of issues that might harm or improve performance during an upgrade, but where there is a problem you are most likely going to find out pretty quickly and there will be a fix available. However, someone doesn't want to be held to blame if anything at all changes during the upgrade, and is covering their arse by making you waste your time on efforts that are bound to be unproductive and inconclusive.
The correct way to investigate this is to set up a new system with the new version that you are actually going to be using -- no cut-down data sets, no export-import if what you're going to do is an in-place upgrade (you want the physical data layout to be exactly the same as production), no "small representative set of queries", using the same storage architecture, processor type and memory configuration. Run your actual application on real data and compare performance before and after upgrade in a meaningful way.
It is absolutely the only way to be sure, because the fundamental problem with the task that you've been set is that you have to look for evidence of something (a performance problem) not existing. It's like trying to prove the non-existence of invisible pixies in your washing machine that eat one sock out of every pair -- practically impossible! The burden of proof lies with those who say that something might exist.
http://www.logicallyfallacious.com/index.php/logical-fallacies/146-proving-non-existence
Here are some constructive steps you can take:
Search Metalink -- this is the number one authoritative source for upgrade problems, because problems get reported via Metalink and either a bug is raised or an explanation is published.
Search the Oracle forums -- if there is a general change in behaviour that people are encountering then they'll question it here.
Search the internet generally.
If you've done all of those then that's the limit of the efforts you can be expected to take.
If that's not enough then you or the people behind this migration just have to escalate it, and ask some awkward questions: Was this level of proof required in moving from 9i to 10g? Is it required for upgrades from 10.2.0.2 to 10.2.0.5?
I'd really love to know the politics behind this -- it sounds like a dreadful place to work.
I have a web site that's been progressively expanding in both traffic and complexity of database design. I've always worked as a developer first and foremost, and never really been much of a DB administrator beyond what I need to do to get my code running. This needs to change: I need to improve efficiency on the database side of things.
To give a vague example, I'm looking for how to go about learning:
Optimising complex tables/relationships for performance/scaling
How to index efficiently. (At the moment I throw indexes on foreign keys, and that's about it)
General design principles for complex databases
Most of the resources I've found are either directed more towards the basics of SQL ("this is a SELECT query, a JOIN, etc") or focus primarily on performance issues outside the DB.
So, I know this is a little vague, but where should I look to ensure my database is designed in the most efficient and integral manner possible?
Learn about data modeling. Choosing the right data structure is always a crucial first step, for programming in general and databases in particular. Performance cannot be "bolted" on top of a bad data structure! The ERwin Methods Guide is probably not a bad way to start learning about data modeling.
Learn how DBMSes organize data at the physical level. This will help you immensely in understanding how to "shape" your data for performance and how to effectively leverage many of the performance mechanisms modern DBMSes put at your disposal. Use The Index, Luke! is an excellent tutorial on the topic.
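As a small illustration of "shaping" data for access (names invented): a composite index that matches both the filter and the sort lets the engine avoid a full scan and a separate sort step.

create index emp_dept_hired_ix on emp (deptno, hiredate);

select ename, hiredate
from emp
where deptno = 10
order by hiredate;   -- the index can satisfy both the WHERE and the ORDER BY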
Learn how to efficiently access the database and make sure you really understand the client API that will be called from your code. Different APIs have their own idiosyncrasies, but they all share some common themes, such as parameter binding, query preparation and fetching. Even if you are "shielded" by an ORM from ever having to, say, bind parameters manually, this is still taking place "under the covers" and understanding it raises your ability to write performant code.
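For example, here is a minimal SQL*Plus sketch of parameter binding; the statement text stays constant, so it is parsed once and reused with different values:

variable dept_id number
exec :dept_id := 10
select ename from emp where deptno = :dept_id;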
Measure, measure, measure. Modern information systems are immensely complex and even experts find themselves making incorrect assumptions, so don't rely on assumptions!
I would suggest some reading in performance tuning. It is very specialized depending on the database backend you use. But here are some books to consider:
SQL Server
http://www.amazon.com/Server-Query-Performance-Tuning-Distilled/dp/1590594215/ref=sr_1_2?s=books&ie=UTF8&qid=1334154710&sr=1-2
http://www.amazon.com/Performance-Tuning-Server-Dynamic-Management/dp/1906434476/ref=sr_1_12?s=books&ie=UTF8&qid=1334154710&sr=1-12
MySQL
http://www.amazon.com/High-Performance-MySQL-Optimization-ebook/dp/B0028N4W7Y/ref=sr_1_3?ie=UTF8&qid=1334154504&sr=8-3
Oracle
http://www.amazon.com/Oracle-Database-Release-Performance-Techniques/dp/0071780262/ref=sr_1_2?s=books&ie=UTF8&qid=1334154909&sr=1-2
General performance Tuning
http://www.amazon.com/SQL-Performance-Tuning-Peter-Gulutzan/dp/0201791692/ref=sr_1_18?s=books&ie=UTF8&qid=1334154964&sr=1-18
First and foremost, I'd recommend learning how to use EXPLAIN and what its output means. Run it on your most common queries and study the output. Are the queries using sensible indexes? Are they using indexes at all? Queries that look very simple at a glance might end up being quite costly.
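In Oracle, for instance, inspecting a plan looks like this (the query itself is just a placeholder):

explain plan for
select e.ename, d.dname
from emp e join dept d on e.deptno = d.deptno;

select * from table(dbms_xplan.display);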
Next, I'd suggest finding your slowest queries. Postgres (for example) has a feature that allows you to log the SQL source for all queries that take longer than N seconds to run. Are they slow because they're unindexed, very complex, or operating on a huge amount of data?
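As a hedged sketch, in PostgreSQL that feature is the log_min_duration_statement setting (set in postgresql.conf, or via ALTER SYSTEM on newer versions):

alter system set log_min_duration_statement = 1000;  -- log anything slower than 1s
select pg_reload_conf();                              -- apply without a restart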
Third, I'd look at the number of times a particular query is run. Are you using the database to store static data, and hitting a table over and over again to grab a record that never changes? You could probably cache the result somewhere.
Are there any good scripts that I could run against my Oracle database to test for SQL defects or maybe common performance issues?
Edit: Everything in an Oracle database can be queried, from the PL/SQL packages to indexes and SQL runtime statistics. The performance books say to look in a given place, where you'll find some absolute values that the developer has to be able to interpret. Has anyone combined their knowledge to build this interpretation into the scripts?
Are you asking for the information in this book?
http://www.amazon.com/Oracle-Database-Performance-Techniques-Osborne/dp/0072263059/ref=sr_1_1?ie=UTF8&s=books&qid=1264619796&sr=1-1
Are you asking about this wiki?
http://wiki.oracle.com/page/Performance+Tuning
Or are you asking for this vendor information?
http://www.oracle.com/technology/deploy/performance/index.html
Edit: There is no magical set of queries that you simply run to set the various tuning options.
Oracle is very complicated. Changing a parameter to make one thing fast can make several other things faster or slower, or make the instance consume more real memory than you have installed. It's hard to generalize this into magical queries. You have tools, but even then, the tools give you tuning options and you may need to run different experiments.
Performance is a balance. You have to strike a balance between physical I/O time and CPU time. It's not possible to generalize this into a magical query. Your system may need faster physical I/O (data warehouses, for instance, often need this) because it can't effectively work from cache. My system may need faster processor time and will have to work in cache to achieve this.
Performance is a function of your application. No magical query of Oracle will reveal a single thing about how your application is designed to work.
Enterprise Manager and its associated performance tools are a good place to start looking for queries that are consuming the most resources. There you can see the plans generated for your SQL, view traces of long-running queries, etc.
If you have a budget, there is Spotlight by Quest. I've only used the trial version, but I found it useful.
I would recommend checking out the book Optimizing Oracle Performance and any of Cary Millsap's other writings. It is a waste of time to think about optimizing every query. You really need an approach to finding out where your performance bottlenecks are. His Method R approach is a very good one to read up on. Also most of Tom Kyte's books go into detail about performance issues.
I was planning to benchmark this, but since it's a lot of work, I'd like to check first that I'm not missing an obvious answer.
I have a huge query that gets some more details for each row with a subquery.
Each row is then used in a ListAdapter that is plugged into a ListView, so another loop takes each row one by one to make it a ListItem.
What do you think is more efficient (both options are sketched below the question):
Keeping the subqueries in the SQL mess, counting on the SQL engine to make optimizations.
Taking the subqueries out into the ListAdapter loop, so we lazy-load the details on display: much more readable, but I'm afraid too many hits would slow things down.
Two important things:
I can't rewrite the big SQL chunk to get rid of the subqueries. I know it would be better, but I failed to do so.
As far as I can tell, a list won't contain more than 1000 items, and it's a desktop app, so there is no concurrency. Is it even relevant to worry about performance in that case? If not, I'd still be interested in the answer for a high-traffic web site anyway. It's good to know...
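For reference, here is an invented sketch of the two shapes being compared:

-- (1) one big query with a correlated subquery per row
select i.id, i.title,
       (select count(*) from comments c where c.item_id = i.id) as n_comments
from items i;

-- (2) lazy loading: fetch the list first, then one small query per displayed row
select id, title from items;
select count(*) from comments where item_id = ?;  -- run only when a row is shown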
SQLite is a surprisingly good little engine, but it's not really about extra clever optimizations, and I wouldn't really consider it for a "high traffic web site". One big plus (for uses within its limitations) is that it can run in-process, so that the overhead of multiple queries is really small compared to one big query; if that's easiest to code, for your specific use case, I would really consider it (and doing it in a "lazy load" way, as you hint, might actually make the first screen of data appear faster!). As you suspect, it's unlikely that this will be a performance bottleneck, in your use case, so going for simpler and thus more reliable coding is an important plus.
If I were doing a high-traffic site, and using a richer, "heavier" engine such as PostgreSQL, Oracle, SQL Server, or DB2, I would trust the optimizer much more. One thing I've noticed, however, is that I can often (alas, not always) change sub-queries into joins, and that often tends to improve performance (joins make it easier for the optimizer to use good indices, I think -- I have never coded a SQL optimizer myself, but that's my impression from staring at query execution plans from many engines for alternative forms of queries... that, of course, DOES assume you have good indices!). This would have to be confirmed with a benchmark of the specific case in question, of course, but it would be my initial working assumption.
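For example, here is the kind of rewrite I mean (tables invented; whether it actually helps must be benchmarked):

-- subquery form
select *
from orders o
where o.customer_id in (select c.id from customers c where c.region = 'EU');

-- equivalent join form (assuming customers.id is unique), which often gives
-- the optimizer more freedom to use good indices
select o.*
from orders o
join customers c on c.id = o.customer_id
where c.region = 'EU';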
What about using cursors?
I would prefer using a big query and let my SQL engine optimize my query.
Also, I can't think of an example where it's better to do a loop outside SQL instead of using a "big" query or cursors.
But the best way to know what's better is to benchmark it.
Good luck!