I want to test some performance-tuning techniques on a realistic database with many tables and a lot of data. I would like to do this in Oracle 11g Release 1, and I would like to know how best to go about it, or whether there is a website where I could get realistic datasets/databases.
Many thanks for your time.
Cheers,
Tunde
Good timing. There was a blog entry today on generating the data for the TPC-H benchmark.
Maybe you can choose something from that.
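If you would rather generate synthetic data directly inside Oracle than load the TPC-H flat files, a common trick is a row generator using CONNECT BY LEVEL against dual. A minimal sketch, with made-up table and column names:

-- Generate a million rows of fake order data
CREATE TABLE test_orders AS
SELECT level                                 AS order_id,
       MOD(level, 1000)                      AS customer_id,
       SYSDATE - DBMS_RANDOM.VALUE(0, 365)   AS order_date,
       ROUND(DBMS_RANDOM.VALUE(1, 10000), 2) AS amount
FROM dual
CONNECT BY level <= 1000000;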
Can I find any resources, like a PDF or user guide, for learning Vertica DB?
As I am a beginner in Vertica, I am also looking for the factors that affect performance while loading data.
All of the documentation is posted publicly on my.vertica.com. Data-load performance depends on many factors; you should probably start with Bulk-Loading Data and then review the many COPY parameters. For a general beginner introduction to Vertica, see Getting Started.
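As a taste of what the COPY documentation covers, a minimal bulk load might look like this (the table name and file path are placeholders):

-- DIRECT writes straight to disk storage (ROS), generally preferred for large loads
COPY sales_fact
FROM '/data/sales_fact.csv'
DELIMITER ','
DIRECT;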
What is considered the best way to save and use views, SPs, and UDFs for SSRS reports that will be used by many users, with some subscribed reports being sent out?
Do I:
Write the results to a table overnight via scheduled jobs, so reports do a direct read of the pre-saved query results?
Use an SP with indexed temp tables based on each view's SQL, to have it all in one place for SSRS?
If the answer is "it depends on what you want", I would be grateful if you could point me to any resources that give an idea of the ideal setup for getting query data to SSRS with minimal performance issues.
Thanking you kindly
Background/Explanation
SQL Server is not foreign to me, but I don't consider myself experienced enough (1 year) to have developed "etiquette" when it comes to crafting the parts of SQL Server. I feel I'm developing a lot of bad habits formed from basic SQL knowledge, online searches, and the odd MS SQL Server course. The amount of searching I've done has been endless, and I'm not saying there isn't an answer out there for each part of SQL Server (UDFs, SPs, and views).
The company I work for has many servers and many databases for the many outsourced front-end systems in use. The issue is performance, and the more I search, the more I realize the setup of our databases may be completely negligent and amateur. When I joined, the setup used a lot of views; each "end" view had a dependency tree of 4+ views, including the use of functions, with views doing everything from aggregate calculations for statistics to rearranging data via pivots and unpivots. The reason given to me was so that we could pinpoint which view had gone wrong. To no one's surprise, the server has now had enough of this and peaks at 100% every time a report or view is run, affecting front-end performance for the users.
The previous paragraph reflects my frustration, and my position with the company (code monkey) in trying to find an answer myself, which has resulted in me pushing the keys back into the keyboard with my opposable thumbs and appealing to the experts here.
This question is really too broad for Stack Overflow. I'll try to give you a quick overview of what I think you're asking, but you're really asking way too much for a single answer here. This site is mainly focused on solving specific problems, not the general process of development. I expect someone will probably come along and close your question.
Nightly table loads
Depending on the complexity of the task, this is exactly what SSIS (SQL Server Integration Services) is for. You can build automated processes that do data transformations and data loads; it exists to build maintainable data-integration solutions. Learning to use SSIS (especially properly) is a whole task in itself, though. In fact, the third exam for the SQL Server 2012 MCSA is exclusively about SSIS. That said, if your table loads are not that complicated, running them as SQL tasks could be just as effective.
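For a simple load, the SQL-task route can be as small as a scheduled SQL Server Agent job step that rebuilds a pre-aggregated reporting table overnight. A minimal sketch, with all object and column names made up for illustration:

-- Nightly refresh of a pre-aggregated reporting table (illustrative names)
BEGIN TRANSACTION;
TRUNCATE TABLE dbo.rpt_SalesSummary;
INSERT INTO dbo.rpt_SalesSummary (Region, SalesMonth, TotalAmount)
SELECT Region,
       DATEFROMPARTS(YEAR(OrderDate), MONTH(OrderDate), 1),
       SUM(Amount)
FROM dbo.Orders
GROUP BY Region, DATEFROMPARTS(YEAR(OrderDate), MONTH(OrderDate), 1);
COMMIT;

SSRS reports then read straight from dbo.rpt_SalesSummary, so the expensive aggregation happens once a night instead of on every report run.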
Database structure and use of views/SPs/functions/etc
This is an incredibly deep subject, and it depends entirely on what you're trying to do, how your data is structured, what kind of hardware you've got running, etc. Certainly, using views, functions, and stored procedures can be good: they enable code reuse and let you encapsulate the logic for SSRS reports away from the actual report writers.
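For instance, a report can read from a stored procedure that wraps a view, so report writers never touch the underlying tables. A sketch, with made-up names:

-- The view encapsulates the business logic; the procedure is what SSRS calls
CREATE VIEW dbo.vw_OrdersByRegion AS
SELECT Region, OrderDate, Amount
FROM dbo.Orders;
GO
CREATE PROCEDURE dbo.rpt_OrdersByRegion
    @Region nvarchar(50)
AS
SELECT Region, OrderDate, Amount
FROM dbo.vw_OrdersByRegion
WHERE Region = @Region;
GO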
However, the SQL needs to be well written, or it will suffer from performance problems, and that is true no matter where you put the code. Even if the SQL is just a dataset in an SSRS report, it will run slowly and hammer the server if it isn't written well. If the database isn't configured correctly, it can have terrible performance. Indexes and other techniques for speeding up databases will always be important.
Above all everything needs to be documented so that someone else (or your later self) can make sense of it in ten months when something breaks.
Training
I would highly recommend trying to convince your employer to send you on some courses to learn SQL Server if they expect you to be developing complex database solutions. Certainly taking the courses to get your MCSA in SQL Server 2012 would be very useful. Getting the certification certainly opened my eyes to many possibilities for achieving things that I didn't know about before or just hadn't thought of.
The first exam will cover writing SQL queries and the different things that can help performance and the many cool features that you can leverage when retrieving or writing data. The second exam will cover database server administration, troubleshooting, and some performance tuning. The third exam is all about SSIS and how to warehouse your data to enable better analysis and reporting.
Even if you just read the Microsoft Learning books for these exams and never take the tests, you will gain a lot of knowledge. There are other good books too, such as T-SQL Fundamentals by Itzik Ben-Gan, but ultimately it sounds like you need much deeper knowledge of the SQL Server platform before you can really make good design decisions about how to implement your solutions.
Conclusion
In the end, programming is programming. Trying to make a maintainable solution that works is your first goal. Tuning the performance of the system comes after that. The specifics of the languages and platforms don't take away from any of that. But in order to get the best performance out of a system you need knowledge about that system. An answer on here isn't going to be able to give you everything you need to know.
I am wondering whether I can write native SQL for add or delete operations instead of using QueryExpression, FetchXML, etc. I know QueryExpression is very useful, but my real concern is performance, and I thought writing SQL might be faster than the alternatives.
To put it simply, using direct SQL (especially for create/update actions) is not supported. DO NOT DO IT!
The database model for CRM is complex and updates to data can have effects that extend beyond a simple update to a single table or two.
my real concern is performance
Have you validated this concern? Take a look at this link which documents performance tests on CRM. This is an enterprise-level, scalable platform. If you have proven performance issues then perhaps your code needs optimising or your kit needs beefing up...? :)
I totally agree with Greg's answer; this is just a side note regarding performance. If you really are seeing "performance issues", maybe you should spend your time looking at whether adding an index would help. Database indexes aren't included within CRM solutions, will require manual propagation between dev, QA, staging, and prod environments, and are only supported for on-site installations, but they can make some queries 10 or 100 times faster. (Of course, if they are abused, they can slow everything down as well; know what you are doing before you use them.)
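For example, an index added directly in the organization database might be as plain as this (the table and column are illustrative, not a recommendation for a specific entity):

-- Hypothetical non-clustered index on a frequently filtered column
CREATE NONCLUSTERED INDEX IX_AccountBase_Name
ON dbo.AccountBase (Name);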
On top of what @Greg and @Daryl have said, when you say performance, do you mean it's quicker for you to write SQL?
Regardless, CRM has some unique ways of doing things.
For example, activating/deactivating a record, invoice-related actions, or the way CRM converts an Opportunity.
It's not that hard to do. You should spend some time in the SDK...
When you write rather complex SQL for Oracle, sooner or later you will have to apply the odd execution hint because Oracle can't seem to figure out the "best" execution plan itself.
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/hintsref.htm
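For example, an index hint is embedded in a comment placed immediately after the SELECT keyword (the table and index names here are illustrative):

-- Ask the optimizer to use a specific index for this query
SELECT /*+ INDEX(e emp_last_name_ix) */
       e.employee_id, e.last_name
FROM   employees e
WHERE  e.last_name = 'Smith';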
Now this is certainly not a SQL standard. But still, I'm wondering, are there any other RDBMS that support these kinds of hints, and I really mean hints that are "embedded" in SQL? Are they similar, syntactically (i.e. also placed between the SELECT keyword and the first selected column)? Do you know of a general documentation page comparing hints in various RDBMS?
N.B.: I'm mostly interested in these RDBMS: Postgres, MySQL, HSQLDB, H2, Derby, SQLite, DB2, Sybase, SQL Server.
I know that in DB2 plans can be fixed in some way, though I don't know how. In Oracle 11g there are other options besides adding hints to queries: SQL Profiles and SQL Plan Baselines, both very powerful. I just finished a performance-tuning project where we did not add even a single hint to the code; quite the contrary.
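As a taste of the SQL Plan Baseline approach, loading the current plan for a statement from the cursor cache is a short PL/SQL call (the sql_id below is a placeholder, and the call requires the appropriate SQL management privileges):

DECLARE
  plans_loaded PLS_INTEGER;
BEGIN
  -- Capture the statement's current plan as an accepted baseline
  plans_loaded := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(sql_id => 'abcd1234wxyz9');
END;
/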
You can add optimizer hints to any SQL Server query.
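For example, table hints go in a WITH clause and query-level hints in an OPTION clause at the end of the statement (the object names are illustrative):

-- Force a specific index and limit the query to one CPU
SELECT c.CustomerName
FROM dbo.Customers c WITH (INDEX (IX_Customers_CustomerName))
WHERE c.CustomerName LIKE N'A%'
OPTION (MAXDOP 1);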
The PLAN clause allows you to define a particular plan for your query in Firebird.
AFAIK there is nothing standard, nor close to it, but in general you can do this in a lot of RDBMSs, though not all.
I'd also remind you, if you are making some sort of comparison with other DB platforms, that hints in Oracle are entirely non-binding. Which is to say that Oracle is free to disregard your hint if it so chooses.
Hints can be helpful, but I find that I rarely use them anymore, at least not compared to the past, when I was working with the older optimizers in earlier Oracle versions. Back then, hints were much more of a staple of performance tuning than they are now.
I am stress testing a database table
I am looking for software that can connect to my database and show me metrics like the number of rows in a table, time per insert, inserts over time, table fragmentation (logical/physical), etc.
It would be great if the reporting tool could do the following:
1] Report in real time, or at least at some interval, so that I do not have to wait for the test to finish to get a first look at the data
2] Let me do things with the data later, like getting the 99.99th percentile, averages, etc.
3] Be mostly freely available :)
Does anyone have a suggestion for something I can use with my Oracle table? Any pointers would be great.
I could actually write scripts to log things like select count(*), etc., but then I would have to spend a lot of time parsing the data and tweaking the reporting rather than working on the tests.
I think something intelligent might already be out there?
Thanks
Edit:
I am looking at a piece of design for a new architecture. The tests are "comparison" tests for different designs, so as long as I run them on the same hardware, with the same schema, etc., they are comparable at some granularity. I want to monitor index fragmentation, response times, etc. If you think there are other things that can change, please let me know. I am trying to roll the table back to a particular state [basically truncate] for each new iteration of the test.
First, Oracle has built-in functionality for telling you the number of rows in a table: either use count(*), or search for 'gather statistics oracle' for another option.
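For example (the table name is a placeholder):

-- Exact count
SELECT COUNT(*) FROM my_table;

-- Or gather statistics and read the stored row count
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'MY_TABLE');
SELECT num_rows FROM user_tables WHERE table_name = 'MY_TABLE';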
But "stress testing a table" sounds to me like you're going down the wrong path. Most of the metrics you're mentioning ("time for inserts , inserts/time, table fragmentation[logical/physical] etc") are highly dependent on many factors:
what OS Oracle's running on
how the OS is tuned (i.e. other services running)
how the specific Oracle instance is configured
what underlying storage architecture Oracle's using (and how tablespaces are configured)
what other queries are being executed in the database at the exact same time as your test
But NONE of them would be related to the table design itself.
Now, if you're wondering if your normalized (or de-normalized) table schema is hurting your application, that's another matter. As is performance being degraded by improper/unneeded/missing indexes, triggers, or a host of other problems.
If you really want an app that gives you real-time monitoring, check out Quest Software's Spotlight on Oracle, but it's definitely not free.
Just to add to the other comments: I believe what you really want is to stress-test the queries you're running, not the table. The table is just a bunch of data blocks on a disk; the query is what makes the difference in performance as far as development is concerned. Stress-testing the queries will tell you whether you need different indexes or need to redesign the query.
On the other hand, if you're looking at it as a DBA or system administrator, you're probably more interested in OS level statistics especially disk latency, memory paging, and CPU utilization.
All of this is available in Enterprise Manager, which is my primary tuning tool for both development and DBA work. If you don't have that, read up on using sql_trace to profile your queries, and check your OS-specific documentation for how to get those stats.
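A minimal sql_trace run looks like the following; the resulting trace file is then formatted with the tkprof command-line utility (the file names are placeholders):

-- Inside the session you want to profile
ALTER SESSION SET sql_trace = TRUE;
-- ... run the workload ...
ALTER SESSION SET sql_trace = FALSE;

-- Then, from the OS shell:
--   tkprof orcl_ora_12345.trc report.txt sys=no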