I am a Java guy who is familiar to basic SQL and PL/SQL. I can write simple procedures and functions. Now, I am expected to do some Oracle performance tuning. Can anyone help me with some crash course material? Thanks in advance.
--Siddharth
Along with the earlier suggestions (all excellent in their own rights) there are a few simple things you can do which will make you into an Instant Performance Guru (tm):
Carry a briefcase full of papers and books. Dog-eared books with titles like "Oracle Performance Tuning For Highly Effective People"+ and pieces of paper with boxes and arrows scribbled on them work well. If the books are for outdated versions of Oracle so much the better as it makes it look like you've been doing this for a while - plus, you can buy 'em cheap from the 'clearance' table at your local bookstore. For best effect the briefcase should be well worn - if you're forced to purchase a new briefcase you can get that weathered effect by backing over it with a car and/or tying a rope to the handle and dragging it through the sand/dirt/mud for ten minutes or so. This all helps to impress the natives. A bullet hole or two can be interesting conversation starters. You can also use the briefcase to carry your lunch and other important stuff like a towel.
Add full-key indexes for all queries.
Ensure all foreign keys have full-key indexes backing them.
This may give you the idea that "performance analysis" consists mostly of adding the indexes that the people who wrote the software never bothered to add because in their next-to-empty development database everything ran really fast. This is not correct. A complete fabrication. Utter nonsense. At best it's only about, like, 95% of it. Pay no attention to that man behind the curtain - he is of no importance whatsoever...
You now know the Secrets of the Oracle Performance Masters. (Well, most of it anyways, except for the Secret Handshake, which is tough to explain in a text message (and besides, you need six fingers on each hand), and the Hidden Mysteries, which consist principally of a lot of stuff about frogs which you don't technically need to know but which has been know to make well-informed people giggle a lot - which is not attractive...).
Go ye forth and do ye good works.
+This is actually slightly incorrect. The book you really want to have in your briefcase is "Oracle Performance Tuning For People Who Are Smarter Than 99.9% Of The Inhabitants Of This Planet". Since 99.99999%++ of the inhabitants of this planet are single-celled organisms or managers (and sometimes both) this is not difficult to accomplish.
++This is a real number. You can't just make this stuff up+++.
+++Actually, you can and I did. But it's not "lying" if you use an exact number - it's "creative re-imagining"++++.
++++That's "lying" in consultant-speak.
Performance Tuning is all about "It depends". If there existed a check list for how to improve performance, it would already be implemented in the database.
Hang around the OTN forums and look for topics related to tuning. Try to really understand what the experts write. Why was there a problem? What clues was there to aid in discovering a solution? What is the difference in solutions provided? What tradeoffs had to be made in order to select one solution over the other? Re-create the problems in your database and experiment with them yourself.
Don't be afraid to ask if you don't understand why some adviced was posted!
Links to the forums
SQL and PLSQL- Forums
General - Forums
Oh, and please don't go down the usual path which is throwing hints at queries until something happens. Without a solid understanding of access statistics, selectivity, costing, join mechanisms, access paths or just basic knowledge of the architecture of the database, it makes no sense whatsoever to override the optimizer with hints.
If I could travel back in time and give myself three books, I would bring:
Expert Oracle Database Architecture (Tom Kyte)
Cost-based Oracle Fundamentals (Jonathan Lewis)
SQL Tuning (Dan Tow)
If you want to know more on SQL preformance tuning, then you need to know the oracle hints and how to check the explain plan of the query:
http://www.adp-gmbh.ch/ora/sql/hints/index.html
Other important performance related things are the analytical functions
http://www.adp-gmbh.ch/ora/sql/analytical/index.html
This can be useful if you are interested more in Oracle Specific options. The other answers might be more useful if you are generally interested in performance tuning
Two Tom Kyte books:
"Expert Oracle Database Architecture" - A crash course in Oracle internals from the perspective of a developer.
"Effective Oracle by Design" - How to write hi-performance applications. Demystification of tools like EXPLAIN PLAN, AUTOTRACE, TKPROF, Statspack, and DBMS_PROFILER; understanding the optimizer; schema design, and many more.
Any other materials written by Mr. Kyte are worth reading as well.
You could start with the Introduction to Performance Tuning section of the Oracle docs. I have linked to the Oracle 10G version.
Learning to generate and interpret trace files might be a good way to begin.
Try statspack. Check details here
Performance is usually a black magic and it requires a lot of patience :)
Related
I have some SQLs running on ORACLE 10g database. Some of these SQLs were using oracle sample clause.
e.g.
select /*+ use_nl(emp,dept) +*/ *
from emp, dept
sample(10)
where emp.deptno=dept.deptno
Now, the oracle 10g will be upgraded to 11g. We have to proove that there is no performance impact for the sample clause for the database upgrade. In other words, I should proove the sample clause working well in oracle 11g on performance aspect.
But I spent a whole day on search on google, no expected answer.
Could you give me a answer or suggestion on it? Thank you so much.
Well the good news is that it's exactly the same clause used internally by Oracle for sampling -- most notably for DBMS_Stats programs. So I'd say that it's rather unlikely that it's done anything other than get faster.
The bad news is that there are an enormous number of issues that might harm or improve performance during an upgrade, but where there is a problem you are most likely going to find out pretty quickly and there will be a fix available. However, someone doesn't want to be held to blame if anything at all changes during the upgrade, and is covering their arse by making you waste your time on efforts that are bound to be unproductive and inconclusive.
The correct way to investigate this is to set up a new system with the new version that you are actually going to be using -- no cut-down data sets, no export-import if what you're going to do is an in-place upgrade (you want the physical data layout to be exactly the same as production), no "small representative set of queries", using the same storage architecture, processor type and memory configuration. Run your actual application on real data and compare performance before and after upgrade in a meaningful way.
It is absolutely the only way to be sure, because the fundamental problem with the task that you've been set is that you have to look for evidence of something (a performance problem) not existing. It's like trying to prove the non-existence of invisible pixies in your washing machine that eat one sock out of every pair -- practically impossible! The burden of proof lies with those who say that something might exist.
http://www.logicallyfallacious.com/index.php/logical-fallacies/146-proving-non-existence
Here are some constructive steps you can take:
Search Metalink -- this is the number one authoratative source for upgrade problems, because problems get reported via Metalink and either a bug is raised, or an explanation is published.
Search the Oracle forums -- if there is a general change in behaviour that people are encountering then they'll question it here.
Search the internet generally.
If you've done all of those then that's the limit of the efforts you can be expected to take.
If that's not enough then you or the people behind this migration just have to escalate it, and ask some awkward questions: Was this level of proof required in moving from 9i to 10g? Is it required for upgrades from 10.2.0.2 to 10.2.0.5?
I'd really love to know the politics behind this -- it sounds like a dreadful place to work.
I have a web site that's been progressivelly expanding in both traffic and complexity of database design. I've always worked as a developer first & foremost, and never really been much of a DB administrator beyond what I need to do to get my code running. This needs to change - I need to improve efficiency on the database side of things.
To give a vague example, I'm looking for how to go about learning:
Optimising complex tables/relationships for performance/scaling
How to index efficiently. (At the moment I throw indexes on foreign keys, and that's about it)
General design principles for complex databases
Most of the resources I've found are either directed more towards the basics of SQL ("this is a SELECT query, a JOIN, etc") or focus primarily on performance issues outside the DB.
So, I know this is a little vague - but where should I look to ensure my database is designed in the most most efficient & integral manner possible?
Learn about data modeling. Choosing the right data structure is always a crucial first step, for programming in general and databases in particular. Performance cannot be "bolted" on top of a bad data structure! The ERwin Methods Guide is probably not a bad way to start learning about data modeling.
Learn how DBMSes organize data at the physical level. This will help you immensely in understanding how to "shape" your data for performance and how to effectively leverage many of the performance mechanisms modern DBMSes put at your disposal. Use The Index, Luke! is an excellent tutorial on the topic.
Learn how to efficiently access the database and make sure you really understand the client API that will be called from your code. Different APIs have their own idiosyncrasies, but they all share some common themes, such as parameter binding, query preparation and fetching. Even if you are "shielded" by an ORM from ever having to, say, bind parameters manually, this is still taking place "under the covers" and understanding it raises your ability to write performant code.
Measure, measure, measure. Modern information systems are immensely complex and even experts find themselves making incorrect assumptions, so don't rely on assumptions!
I would suggest some reading in performance tuning. It is very specialized depending on the database backend you use. BUt here are some books to consider:
SQl Server
http://www.amazon.com/Server-Query-Performance-Tuning-Distilled/dp/1590594215/ref=sr_1_2?s=books&ie=UTF8&qid=1334154710&sr=1-2
http://www.amazon.com/Performance-Tuning-Server-Dynamic-Management/dp/1906434476/ref=sr_1_12?s=books&ie=UTF8&qid=1334154710&sr=1-12
MySQL
http://www.amazon.com/High-Performance-MySQL-Optimization-ebook/dp/B0028N4W7Y/ref=sr_1_3?ie=UTF8&qid=1334154504&sr=8-3
Oracle
http://www.amazon.com/Oracle-Database-Release-Performance-Techniques/dp/0071780262/ref=sr_1_2?s=books&ie=UTF8&qid=1334154909&sr=1-2
General performance Tuning
http://www.amazon.com/SQL-Performance-Tuning-Peter-Gulutzan/dp/0201791692/ref=sr_1_18?s=books&ie=UTF8&qid=1334154964&sr=1-18
First and foremost, I'd recommend learning how to use EXPLAIN and what its output means. Run it on your most common queries and study the output. Are the queries using sensible indexes? Are they using indexes at all? Queries that look very simple at a glance might end up being quite costly.
Next, I'd suggest finding your slowest queries. Postgres (for example) has a feature that allows you to log the SQL source for all queries that take longer than N seconds to run. Are they slow because they're unindexed, very complex, or operating on a huge amount of data?
Third, I'd look at the number of times a particular query is run. Are you using the database to store static data, and hitting a table over and over again to grab a record that never changes? You could probably cache the result somewhere.
This isn't a question of "which is the fastest ORM", nor is it a question on "how to write good code with ORMs". This is the other side: the code's been written, it's gone live, several thousand users are hitting the application, but there's a perceived overall performance problem. A SQL Profiler trace can only be ran for a short amount of time: 5 mins gives several hundred thousand results.
The question is simply this: having used SQL Profiler to narrow down a number of slow queries (duration greater than a given amount of time), what techniques and solutions exist for tracing these SQL queries back into the problematic component? A releated question is that if a specific area is slow, how can we identify the SQL that this area is executing so it can be suitably filtered in SQL Profiler?
The background to this is that we have a rather large application with a fairly complex table structure, and is currently based around data-access via stored procedures. If a SQL performance problem arises, it's usually case of pulling out SQL profiler, find out if there's anything slow (filter by duration) or if a the area being complained about is slow (filter by stored procedure), and tune the stored procedures (or the schema - through indexing).
Now there's a push to move our code over from a mostly-sproc solution to a mostly-ORM solution, however the big push against the move is how performance problems, if they arise, can be traced back to problematic code. I've read around and it seems that more often than not, it may involve third-party tools (ORM tracing utilities like NHProf or .NET tracing utils like dottrace) that we'd need to install on the server. Now whether additional tools can be installed on a live environment is another question, so if things like this can be performed without additional tools, then that may be a bonus.
I'm mostly interested in solutions with SQL Server 2008, but it's probably generic enough for any RDBMS. As far as the ORM tech, on this I have no specific focus as nothing's currently in use, so be interested to hear how techniques differ (or are common) twixt nHibernate, fluent-nhibernate and Entity Framework. Other ORMs are welcome though if they offer something else :-)
I've read through How to find and fix performance problems (...), and I think the issue is simply the section on there that says "isolate". A problem that is easily reproducible only on a live system is going to be difficult to isolate. The figures I quoted in para 2 are figures the types of volumes that we can get from a profile as well...
If you have real-world experience of ORM tracing on live, so much the better :-)
Update, 2016-10-21: Just for completeness, we eventually solved this for NHibernate by writing code, and overriding NHibernate methods. Full details in this other SO question I asked: NHibernate and Interceptors - measuring SQL round trip times. I expect this will be a similar approach for many different ORMs.
There exists profilers for ORM tools, like UberProf. It finds out which SQL statements that are generated by the ORM can be problematic.
Like the select n+1 problem, for instance. These kind of tools might give you an indication of which ORM query statements result in poor SQL code, and perhaps even how you could improve them.
We had a Java/Hibernate app with issues, so we used SET CONTEXT_INFO with a different value. If we saw, say, 0x14 on the same SPID just before a WTF query, we could narrow it to module x.
Not being a Java guy, I don't know exactly what they did, and of course it may not apply to .net. IIRC you have to be careful about when connections are opened/closed
We could also control the client load at this time so we didn't have too much superfluous traffic.
YMMV of course, but it may be useful
I just found these which could be useful too
Temporary tables, sessions and logging in SQL Server?
Why is my CONTEXT_INFO() empty?
I know that you shouldnt optimize too early, and you should instead aim for maintainability. My question is, at what point is it too late?
I'm working on a website, similar to yahoo answers, and my database structure is exactly what I feel it should be. Table for users, questions, answers, question_comments, answer_comments, etc.
My question is, IF the site were to grow, how would this architecture scale? I'm thinking of putting both questions and answers in a single table (posts), separating them by type, and then putting both question_comments and answer_comments in the same table (comments). I believe this is similar to stackoverflow's DB scheme.
I know what you guys are gonna say, "Dont worry about it until it becomes an actual problem". But wouldn't it be a little too late to worry about it then?
Thanks
The reason why it's a bad practice to optimize early is you don't know where your bottlenecks will be until your website sees a significant amount of traffic. How your users access and interact with your site is an unknown at this point.
It's almost always best to start with a 'good' architecture (normalized database, MVC architecture, DRY, well-written frontend code, etc) and go from there. It will be much easier to scale a clean, organized architecture than one that was prematurely optimized.
At best right now you can do some load testing via ab or another load testing tool to see where your current bottlenecks are. It certainly won't find all of them, but it will find some.
If you're really worried about this (and you shouldn't be yet), install Nagios or Munin on your server to monitor performance. Use a third party tool to measure page load time daily. Once you start seeing issues then you can profile and tune.
You absolutely should optimize if a fast service is a fundamental requirement of the application.
If sub-second responses are not a requirement, than you can write clean code and optimize later.
A good example of this was JavaScript before the latest version of browsers, people who wrote nice, clean, extensible JS for their pages had terrible performance and had to start from scratch.
One huge table is generally harder to maintain. People usually cut their tables into partitions and even their databases into shards.
I don't see how putting all comments into the same table would save you a join. Really, putting questions and answers into the same table won't save you a join either, you'll just be joining by the same table.
If you want to save on joins, I'd expect you use a document-oriented NoSQL database, such as MongoDB. That's where you can store a question with all related answers and comments in a single 'record', fetchable with one operation.
Databases need to be designed with performance in mind not wait until you havea problem later. Premature optimization doesn't mean don't do it in design, it means don't get ridiculously excessive about it. However, there are known performance killers for every database backend and it is foolish to design to use one of those when a differnt technique will be faster and take the same amount of time to write code for if you are familar with it. So before designing any database, read up on performance tuning and you will never write database code the same way again.
Are there any good scripts that I could run against my Oracle database to test for SQL defects or maybe common performance issues?
Edit: Everything in an Oracle database can be queried. From the PL/SQL packages, indexes and sql running stats. The performance books say look in this place and it will show some absolute values that need the developer to be able to interpret. Has anyone combined their knowledge to include this interpretation within the scripts?
Are you asking for the information in this book?
http://www.amazon.com/Oracle-Database-Performance-Techniques-Osborne/dp/0072263059/ref=sr_1_1?ie=UTF8&s=books&qid=1264619796&sr=1-1
Are you asking about this wiki?
http://wiki.oracle.com/page/Performance+Tuning
Or are you asking for this vendor information?
http://www.oracle.com/technology/deploy/performance/index.html
Edit. There is no magical set of queries that you simply run and set the various tuning options.
Oracle is very complicated. Changing a parameter to make one thing fast can make several other things faster or slower. Or makes makes the instance consume more real memory than you have installed. It's hard to generalize this into magical queries. You have tools, but even then, the tools give you tuning options and you may need to run different experiments.
Performance is a balance. You have to strike a balance between physical I/O time and CPU time. It's not possible to generalize this into a magical query. Your system may need faster physical I/O (data warehouses, for instance, often need this) because it can't effectively work from cache. My system may need faster processor time and will have to work in cache to achieve this.
Performance is a function of your application. No magical query of Oracle will reveal a single thing about how your application is designed to work.
Enterprise Manager and it's associated performance tools are a good place to start looking for queries that are consuming the most resources. Here you can see the plans generated for your SQL, view traces of long running queries, etc.
If you have a budget, there is Spotlight by Quest. I've only used the trial version, but I found it useful.
I would recommend checking out the book Optimizing Oracle Performance and any of Cary Millsap's other writings. It is a waste of time to think about optimizing every query. You really need an approach to finding out where your performance bottlenecks are. His Method R approach is a very good one to read up on. Also most of Tom Kyte's books go into detail about performance issues.