How to approach performance issues? - performance

We are developing a client-server desktop application (WinForms with SQL Server 2008, using LINQ to SQL). We are now finding many issues related to performance. These relate to querying too much data with LINQ, bad database design, not much caching, etc. What do you suggest we should do? How should we go about solving these performance issues? One thing I am doing is SQL profiling and trying to fix some queries. As far as caching is concerned, we have static lists. But how do we keep them updated? We don't have any server-side implementation, so these lists can be stale if someone changes data.
regards

Performance analysis without tools is fruitless, with the wrong tools frustrating. SQL Profiler is the wrong tool to rely on for what you are looking at. I think it is at best giving you a hint of what is wrong.
You need to use a code profiler to determine why and when these queries are being executed. You should be able to find one by Googling and run it on an x-day trial.
The key questions are:
1. Are queries being run multiple times when there is no reason to at all? Is the data already in memory (even if not stored statically)? This happens a lot: data has already been retrieved, but some action in the code loads it again. Class properties are a big culprit here.
2. Should certain data be stored statically across the application? How volatile is that data? Can you afford to show stale data?
The only way to decide on #2 is to have hard data on the cost of a particular transaction. For example, if I know it takes me 1983 ms to create a new invoice, what will it be after I start caching data? After caching, are the savings significant? But recognize you can't answer that question until you know it takes 1983 ms to create an invoice.
When I profile an application transaction I focus on the big contributor and try to determine why it is so big. I look for individual methods that are slow and for any code that is executed frequently. It is often the latter, the death of a thousand cuts, that gets you.
One more thing: it is also very important to know when to stop working on a performance issue.

I found Jeff Atwood's articles on this quite interesting:
Compiled Or Bust
All Abstractions Are Failed Abstractions

For updating, you can create a table. I called mine ListVersions.
Just store the list id, name and version.
When you change a list, just increment its version. In your application, you then only need to compare versions and reload a list if its version has changed. Reload only the lists whose version was incremented, not all of them.
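A rough sketch of the idea, assuming Python with an in-memory SQLite database standing in for the shared SQL Server database. Only the ListVersions table comes from the description above; the `Countries` list and the names `ListCache`, `load_countries` and `add_country` are invented for the example.

```python
import sqlite3


def make_db():
    """Create a throwaway database with one cached list and its version row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ListVersions (list_id INTEGER PRIMARY KEY, name TEXT, version INTEGER);
        CREATE TABLE Countries (name TEXT);
        INSERT INTO ListVersions VALUES (1, 'Countries', 1);
        INSERT INTO Countries VALUES ('France');
    """)
    return conn


def load_countries(conn):
    """Full (expensive) load of the list."""
    return [row[0] for row in conn.execute("SELECT name FROM Countries ORDER BY name")]


class ListCache:
    """Client-side cache that reloads a list only when its version has changed."""

    def __init__(self, conn, loaders):
        self.conn = conn
        self.loaders = loaders  # list name -> function that loads the list
        self.items = {}         # list name -> cached rows
        self.versions = {}      # list name -> version seen at last load
        self.reloads = 0        # how many full loads actually happened

    def get(self, name):
        # One cheap query for the version instead of re-reading the whole list.
        row = self.conn.execute(
            "SELECT version FROM ListVersions WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        if self.versions.get(name) != row[0]:
            self.items[name] = self.loaders[name](self.conn)
            self.versions[name] = row[0]
            self.reloads += 1
        return self.items[name]


def add_country(conn, country):
    """Writer side: change the list and bump its version in one transaction."""
    with conn:
        conn.execute("INSERT INTO Countries VALUES (?)", (country,))
        conn.execute(
            "UPDATE ListVersions SET version = version + 1 WHERE name = 'Countries'"
        )
```

Every writer has to bump the version in the same transaction as the change, otherwise clients will keep serving the stale list. A trigger on the list table is another way to enforce that.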
I've described it in my answer to this question
What is the preferred method of refreshing a combo box when the data changes?
Good Luck!

A general recipe for performance issues:
Measure (wall clock time, CPU time, memory consumption etc.)
Design & implement an algorithm that you think could be faster than current code.
Measure again to assess the impact of your fix.
Many times the biggest bottlenecks aren't exactly where you thought they were. So, base your actions on measured data.
Try to keep the number of SQL queries small. You're more likely to get performance improvements by reducing the number of queries than by restructuring the SQL syntax of an individual query.
I recommend adding some server-side logic instead of firing the SQL queries directly from the client. You could implement caching on the server side that is shared by all clients.
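To illustrate the point about the number of queries, here is a small sketch (Python with an in-memory SQLite database; the tables and the `CountingConnection` wrapper are made up for the example). Both functions return the same totals, but the first issues one query per customer while the second issues a single joined query.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts round trips (one per execute call)."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
    """)
    return CountingConnection(conn)


def totals_one_query_per_customer(db):
    """One query for the customers plus one more per customer."""
    result = {}
    for cid, name in db.execute("SELECT id, name FROM customers").fetchall():
        row = db.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?", (cid,)
        ).fetchone()
        result[name] = row[0]
    return result


def totals_single_query(db):
    """Same result from a single joined, grouped query."""
    rows = db.execute("""
        SELECT c.name, COALESCE(SUM(o.total), 0)
        FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
    """)
    return dict(rows.fetchall())
```

With three customers the difference is 4 round trips versus 1. Against a real server over a network, with thousands of rows, the per-query latency usually dominates long before the SQL itself does.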

Related

Temporary Dashboard/Reporting Solution while Building a Data Warehouse

Our situation is that we are going to start to build a data warehouse. The data warehouse is going to take some time, if we are going to do it right. It will be built looking at individual processes and growing from there.
We only have three databases that we will be pulling data from. All three databases hold distinct information (financial info, scheduling and patient information - visits, diagnosis,etc).
I am thinking of using a dashboard/reporting tool such as (for example) http://www.jedox.com/en/ or http://www.board.com/us/ to display the information to the business. It will slowly start incorporating the DW as it is being designed and pushed to production.
My question after all this is: what is the best way to present the data to the application (dashboard/reporter) on the back end that would be efficient, yet not so time-consuming that I would be better off just building the data warehouse? For example: views, materialized views, a small separate DB containing a subset of the data from the main DBs, etc.
This may not be answering your question directly, but rather than find a temporary solution I would just build your warehouse faster.
First, if you can build it quickly then you don't need a temporary one; if you can't build it quickly then you won't be able to build a temporary solution quickly either. You even mentioned developing a "small separate DB containing subset data"; that's exactly what a reporting database is!
Second, any temporary solution will have to be maintained and supported too: if it's too useful then your temporary solution will become your permanent one anyway. That might actually be a good thing because if the 'temporary' solution meets your requirements then why not keep it?
Anyway, I would start by identifying one or two key reports that have high value for your users and commit to delivering them in 2 months (1 month would be even better). Develop the most basic, minimal database and ETL/reporting processes possible to deliver those reports, even if it seems like a horrible, hacked-together mess. Make sure the reports are internal ones that no one will send to an outside customer; that means you can avoid spending time on making them pretty.
After you've delivered those reports, you can now step back and look at what you did. Hopefully you will find yourself in a position where:
1. Your users got some useful reports very quickly
2. The reports are ugly but the numbers are correct
3. You've learned a lot about the users' needs and how they interpret and use the data
4. Your technical implementation is a mess, but you know that and you also know how to improve it
If #1 and #2 are true then you'll have delivered a lot of business value quickly while also setting the user expectation that correct is often more valuable than pretty (that's really helpful on a reporting project). If #3 and #4 are true then your second iteration will be a big improvement on the first one and even if you find yourself in the worst case scenario of having to re-develop the whole thing from scratch, you'll do it faster and better because you've learned so much.
This is simply agile development, of course: there's no reason you can't use rapid prototyping and incremental delivery in a data warehouse project. Like any IT solution the warehouse will continuously grow and be maintained over time so there's absolutely no reason to try to get everything complete and correct in the first version. It's highly likely that your users don't even really know what they want (in detail) so this approach helps to clarify their expectations and requirements more quickly too.

My Database Design skills stink. Where to seek remedy?

I have a web site that's been progressively expanding in both traffic and complexity of database design. I've always worked as a developer first & foremost, and never really been much of a DB administrator beyond what I need to do to get my code running. This needs to change - I need to improve efficiency on the database side of things.
To give a vague example, I'm looking for how to go about learning:
Optimising complex tables/relationships for performance/scaling
How to index efficiently. (At the moment I throw indexes on foreign keys, and that's about it)
General design principles for complex databases
Most of the resources I've found are either directed more towards the basics of SQL ("this is a SELECT query, a JOIN, etc") or focus primarily on performance issues outside the DB.
So, I know this is a little vague - but where should I look to ensure my database is designed in the most efficient & integral manner possible?
Learn about data modeling. Choosing the right data structure is always a crucial first step, for programming in general and databases in particular. Performance cannot be "bolted" on top of a bad data structure! The ERwin Methods Guide is probably not a bad way to start learning about data modeling.
Learn how DBMSes organize data at the physical level. This will help you immensely in understanding how to "shape" your data for performance and how to effectively leverage many of the performance mechanisms modern DBMSes put at your disposal. Use The Index, Luke! is an excellent tutorial on the topic.
Learn how to efficiently access the database and make sure you really understand the client API that will be called from your code. Different APIs have their own idiosyncrasies, but they all share some common themes, such as parameter binding, query preparation and fetching. Even if you are "shielded" by an ORM from ever having to, say, bind parameters manually, this is still taking place "under the covers" and understanding it raises your ability to write performant code.
Measure, measure, measure. Modern information systems are immensely complex and even experts find themselves making incorrect assumptions, so don't rely on assumptions!
I would suggest some reading on performance tuning. It is very specialized depending on the database backend you use. But here are some books to consider:
SQL Server
http://www.amazon.com/Server-Query-Performance-Tuning-Distilled/dp/1590594215/ref=sr_1_2?s=books&ie=UTF8&qid=1334154710&sr=1-2
http://www.amazon.com/Performance-Tuning-Server-Dynamic-Management/dp/1906434476/ref=sr_1_12?s=books&ie=UTF8&qid=1334154710&sr=1-12
MySQL
http://www.amazon.com/High-Performance-MySQL-Optimization-ebook/dp/B0028N4W7Y/ref=sr_1_3?ie=UTF8&qid=1334154504&sr=8-3
Oracle
http://www.amazon.com/Oracle-Database-Release-Performance-Techniques/dp/0071780262/ref=sr_1_2?s=books&ie=UTF8&qid=1334154909&sr=1-2
General performance tuning
http://www.amazon.com/SQL-Performance-Tuning-Peter-Gulutzan/dp/0201791692/ref=sr_1_18?s=books&ie=UTF8&qid=1334154964&sr=1-18
First and foremost, I'd recommend learning how to use EXPLAIN and what its output means. Run it on your most common queries and study the output. Are the queries using sensible indexes? Are they using indexes at all? Queries that look very simple at a glance might end up being quite costly.
Next, I'd suggest finding your slowest queries. Postgres (for example) has a feature that allows you to log the SQL source for all queries that take longer than N seconds to run. Are they slow because they're unindexed, very complex, or operating on a huge amount of data?
Third, I'd look at the number of times a particular query is run. Are you using the database to store static data, and hitting a table over and over again to grab a record that never changes? You could probably cache the result somewhere.
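As a small illustration of the first step, here is a sketch using SQLite's `EXPLAIN QUERY PLAN` from Python. SQLite is only a stand-in here; MySQL, Postgres, SQL Server and Oracle each have their own EXPLAIN or plan output with a different format. The table, index name and `query_plan` helper are made up for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the readable 'detail' text of each step in SQLite's plan for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

before = query_plan(conn, sql, (42,))  # no usable index yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (42,))   # the plan should now mention the index

print("before:", before)
print("after: ", after)
```

The exact wording of the plan varies between SQLite versions, but the pattern to look for is the same everywhere: a scan of the whole table before the index exists, and a search using the index afterwards.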

Tracing ORM performance

This isn't a question of "which is the fastest ORM", nor is it a question on "how to write good code with ORMs". This is the other side: the code's been written, it's gone live, several thousand users are hitting the application, but there's a perceived overall performance problem. A SQL Profiler trace can only be run for a short amount of time: 5 mins gives several hundred thousand results.
The question is simply this: having used SQL Profiler to narrow down a number of slow queries (duration greater than a given amount of time), what techniques and solutions exist for tracing these SQL queries back into the problematic component? A related question is that if a specific area is slow, how can we identify the SQL that this area is executing so it can be suitably filtered in SQL Profiler?
The background to this is that we have a rather large application with a fairly complex table structure, currently based around data access via stored procedures. If a SQL performance problem arises, it's usually a case of pulling out SQL Profiler, finding out whether anything is slow (filter by duration) or whether the area being complained about is slow (filter by stored procedure), and tuning the stored procedures (or the schema, through indexing).
Now there's a push to move our code over from a mostly-sproc solution to a mostly-ORM solution, however the big push against the move is how performance problems, if they arise, can be traced back to problematic code. I've read around and it seems that more often than not, it may involve third-party tools (ORM tracing utilities like NHProf or .NET tracing utils like dottrace) that we'd need to install on the server. Now whether additional tools can be installed on a live environment is another question, so if things like this can be performed without additional tools, then that may be a bonus.
I'm mostly interested in solutions with SQL Server 2008, but it's probably generic enough for any RDBMS. As for the ORM tech, I have no specific focus as nothing's currently in use, so I'd be interested to hear how techniques differ (or are common) between NHibernate, Fluent NHibernate and Entity Framework. Other ORMs are welcome though if they offer something else :-)
I've read through How to find and fix performance problems (...), and I think the issue is simply the section there that says "isolate". A problem that is easily reproducible only on a live system is going to be difficult to isolate. The figures I quoted in para 2 are the kind of volumes that we get from a profile as well...
If you have real-world experience of ORM tracing on live, so much the better :-)
Update, 2016-10-21: Just for completeness, we eventually solved this for NHibernate by writing code, and overriding NHibernate methods. Full details in this other SO question I asked: NHibernate and Interceptors - measuring SQL round trip times. I expect this will be a similar approach for many different ORMs.
There exist profilers for ORM tools, such as UberProf. They identify which of the SQL statements generated by the ORM are problematic, the select n+1 problem for instance.
These kinds of tools can give you an indication of which ORM query statements result in poor SQL code, and perhaps even how you could improve them.
We had a Java/Hibernate app with issues, so we used SET CONTEXT_INFO with a different value for each module. If we saw, say, 0x14 on the same SPID just before a WTF query, we could narrow it to module x.
Not being a Java guy, I don't know exactly what they did, and of course it may not apply to .NET. IIRC you have to be careful about when connections are opened/closed.
We could also control the client load at this time so we didn't have too much superfluous traffic.
YMMV of course, but it may be useful
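For what it's worth, the tagging idea might look roughly like this. This is a sketch in Python, not what the Java team actually wrote (I don't know the details of that); the module names, tag values and the `run_tagged` helper are all hypothetical. The only real piece is the T-SQL statement `SET CONTEXT_INFO` with a binary constant.

```python
# Hypothetical mapping from application module to a one-byte tag.
MODULE_TAGS = {"invoicing": 0x14, "scheduling": 0x15}


def context_info_sql(module):
    """Build the T-SQL that tags the current session with the module's byte."""
    return f"SET CONTEXT_INFO 0x{MODULE_TAGS[module]:02X}"


def run_tagged(cursor, module, sql, params=()):
    """Tag the session, then run the query on the same connection.

    `cursor` is any DB-API style cursor. The tag only helps in a trace if
    both statements run on the same connection (same SPID).
    """
    cursor.execute(context_info_sql(module))
    cursor.execute(sql, params)
    return cursor


class RecordingCursor:
    """Stand-in cursor that records statements, so the sketch runs without a server."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=()):
        self.statements.append(sql)
        return self


cursor = RecordingCursor()
run_tagged(cursor, "invoicing", "SELECT * FROM Invoices WHERE Id = ?", (1,))
print(cursor.statements)
```

With connection pooling the tag stays on the pooled connection until something overwrites it, which is probably what the warning about connections being opened and closed refers to.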
I just found these which could be useful too
Temporary tables, sessions and logging in SQL Server?
Why is my CONTEXT_INFO() empty?

When is it too late to optimize for performance?

I know that you shouldn't optimize too early, and you should instead aim for maintainability. My question is, at what point is it too late?
I'm working on a website, similar to Yahoo Answers, and my database structure is exactly what I feel it should be: tables for users, questions, answers, question_comments, answer_comments, etc.
My question is, IF the site were to grow, how would this architecture scale? I'm thinking of putting both questions and answers in a single table (posts), separating them by type, and then putting both question_comments and answer_comments in the same table (comments). I believe this is similar to Stack Overflow's DB schema.
I know what you guys are gonna say: "Don't worry about it until it becomes an actual problem". But wouldn't it be a little too late to worry about it then?
Thanks
The reason why it's a bad practice to optimize early is you don't know where your bottlenecks will be until your website sees a significant amount of traffic. How your users access and interact with your site is an unknown at this point.
It's almost always best to start with a 'good' architecture (normalized database, MVC architecture, DRY, well-written frontend code, etc) and go from there. It will be much easier to scale a clean, organized architecture than one that was prematurely optimized.
At best right now you can do some load testing via ab or another load testing tool to see where your current bottlenecks are. It certainly won't find all of them, but it will find some.
If you're really worried about this (and you shouldn't be yet), install Nagios or Munin on your server to monitor performance. Use a third party tool to measure page load time daily. Once you start seeing issues then you can profile and tune.
You absolutely should optimize if a fast service is a fundamental requirement of the application.
If sub-second responses are not a requirement, then you can write clean code and optimize later.
A good example of this was JavaScript before the latest generation of browsers: people who wrote nice, clean, extensible JS for their pages had terrible performance and had to start from scratch.
One huge table is generally harder to maintain. People usually cut their tables into partitions and even their databases into shards.
I don't see how putting all comments into the same table would save you a join. Really, putting questions and answers into the same table won't save you a join either; you'll just be joining the table to itself.
If you want to save on joins, I'd expect you to use a document-oriented NoSQL database, such as MongoDB. That's where you can store a question with all related answers and comments in a single 'record', fetchable with one operation.
Databases need to be designed with performance in mind, rather than waiting until you have a problem later. Avoiding premature optimization doesn't mean ignoring performance in the design; it means not getting ridiculously excessive about it. However, there are known performance killers for every database backend, and it is foolish to design around one of those when a different technique will be faster and take the same amount of time to write code for, if you are familiar with it. So before designing any database, read up on performance tuning and you will never write database code the same way again.

Testing an Oracle database for common bugs/performance issues?

Are there any good scripts that I could run against my Oracle database to test for SQL defects or maybe common performance issues?
Edit: Everything in an Oracle database can be queried, from the PL/SQL packages to indexes and SQL running stats. The performance books say to look in a particular place, which shows some absolute values that the developer then has to interpret. Has anyone combined their knowledge to include this interpretation within the scripts?
Are you asking for the information in this book?
http://www.amazon.com/Oracle-Database-Performance-Techniques-Osborne/dp/0072263059/ref=sr_1_1?ie=UTF8&s=books&qid=1264619796&sr=1-1
Are you asking about this wiki?
http://wiki.oracle.com/page/Performance+Tuning
Or are you asking for this vendor information?
http://www.oracle.com/technology/deploy/performance/index.html
Edit. There is no magical set of queries that you simply run and set the various tuning options.
Oracle is very complicated. Changing a parameter to make one thing fast can make several other things faster or slower. Or it makes the instance consume more real memory than you have installed. It's hard to generalize this into magical queries. You have tools, but even then, the tools give you tuning options and you may need to run different experiments.
Performance is a balance. You have to strike a balance between physical I/O time and CPU time. It's not possible to generalize this into a magical query. Your system may need faster physical I/O (data warehouses, for instance, often need this) because it can't effectively work from cache. My system may need faster processor time and will have to work in cache to achieve this.
Performance is a function of your application. No magical query of Oracle will reveal a single thing about how your application is designed to work.
Enterprise Manager and its associated performance tools are a good place to start looking for queries that are consuming the most resources. Here you can see the plans generated for your SQL, view traces of long-running queries, etc.
If you have a budget, there is Spotlight by Quest. I've only used the trial version, but I found it useful.
I would recommend checking out the book Optimizing Oracle Performance and any of Cary Millsap's other writings. It is a waste of time to think about optimizing every query. You really need an approach to finding out where your performance bottlenecks are. His Method R approach is a very good one to read up on. Also most of Tom Kyte's books go into detail about performance issues.
