I have three custom functions that do very similar things: they pull different data from the same, rather complicated set of joins -- generating the joined table is what takes the time -- and are usually all called in the same select. This is obviously inefficient and I would like to improve performance, but what is the best way to go about it?
Create a materialised view of the complex join, covering all parameters, and just refer to this in each of the functions (or just omit the functions, altogether).
Roll the three functions together and return all values at once in a custom type.
Something else?
The first option seems, to a rookie like me, probably the best solution; but it obviously has the drawback of creating a pretty large materialised view, which would need maintenance (so it's refreshed as required); although this MV might be useful elsewhere... The second option would be a bit of a hack; but is there anything else that I haven't considered?
I would prefer to omit the functions altogether and re-write the queries so that functions are not used. Any function that selects data (especially using a "complex join") is a sure way to slow down your query, since the function must be executed once for each row processed (not even necessarily returned) by the main query, maybe 1000s (or 100,000s) of times.
I would choose option 2 to be honest, unless you really want a materialised view. You can cast the function as a table so you can select from it as normal.
Related
I have data from multiple sources - a combination of Excel (table and non table), csv and, sometimes, even a tsv.
I create queries for each data source and then I am bringing them together one step at a time or, actually, it's two steps: merge and then expand to bring in the fields I want for each data source.
This doesn't feel very efficient and I think that maybe I should be just joining everything together in the Data Model. The problem when I did that was that I couldn't then find a way to write a single query to access all the different fields spread across the different data sources.
If it were Access, I'd have no trouble creating a single query one I'd created all my relationships between my tables.
I feel as though I'm missing something: How can I build a single query out of the data model?
Hoping my question is clear. It feels like something that should be easy to do but I can't home in on it with a Google search.
It is never a good idea to push the heavy lifting downstream in Power Query. If you can, work with database views, not full tables, use a modular approach (several smaller queries that you then connect in the data model), filter early, remove unneeded columns etc.
The more work that has to be performed on data you don't really need, the slower the query will be. Please take a look at this article and this one, the latter one having a comprehensive list for Best Practices (you can also just do a search for that term, there are plenty).
In terms of creating a query from the data model, conceptually that makes little sense, as you could conceivably create circular references galore.
I have a table with data relating to several moments in time that I have to keep updated. To save space and time, however, each row in my table refers to a given day and hourly and quarter-hourly data for that day are scattered throughout the several columns in that same row. When updating the data for a particular moment in time I, therefore, must choose the column that has to be be updated through some programming logic in my PL/SQL procedures and functions.
Is there a way to dynamically choose the column or columns involved in an update/merge operation without having to assemble the query string anew every time? Performance is a concern and the throughput must be high, so I can't do anything that would perform poorly.
Edit: I am aware of normalization issues. However I still would like to know a good way for choosing the columns to be updated/merged dynamically and programatically.
The only way to dynamically choose what column or columns to use for a DML statement is to use dynamic SQL. And the only way to use dynamic SQL is to generate a SQL statement that can then be prepared and executed. Of course, you can assemble the string in a more or less efficient manner, you can potentially parse the statement once and execute it multiple times, etc. in order to minimize the expense of using dynamic SQL. But using dynamic SQL that performs close to what you'd get with static SQL requires quite a bit more work.
I'd echo Ben's point-- it doesn't appear that you are saving time by structuring your table this way. You'll likely get much better performance by normalizing the table properly. I'm not sure what space you believe you are saving but I would tend to doubt that denormalizing your table structure is going to save you much if anything in terms of space.
One way to do what is required is to create a package with all possible updates (which aren't that many, as I'll only update one field at a given time) and then choosing which query to use depending on my internal logic. This would, however, lead to a big if/else or switch/case-like statement. Is there a way to achieve similar results with better performance?
I have been reading that cursors are pretty slow and one should unless out of options avoid them. I am trying to optimize my stored procedures and one of them uses a cursor. It frequently is being called by my application and with lot of users(20000) and rows to update. I was thinking maybe I should use something else as an alternative.
All I am trying to do or want is to get a list of records and then operate on depending on each row value. So for e.g we have say -
Employee - Id,Name,BenefitId,StartDate,EndDate
So based on benefitId I need to do different calculation using dates between StartDate and EndDate and update employee details. I am just making this contrived example to give a idea on my situation.
What are your thoughts on it ? Are there better alternatives for cursors like say using temp tables or user defined functions? When should you really opt for them or should we never be using cursors ? Thanks everyone for their help.
I once changed a stored procedure from cursors to set based logic. Running time went from 8 hours to 22 seconds. That's the kind of difference we're talking about.
Instead of taking different action a record at a time, use several passes on the data. Update and set field1=A where field2 is X, then update and set field1= B where field2 is Y, etc.
I've changed out cursors and moved from over 24 hours of processing time to less than a minute.
TO help you see how to fix your proc with set-based logic, read this:
http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them
A cursor does row-by-row processing, or "Row By Agonizing Row" if your name is Jeff Moden.
This is just one example of how to do set-based SQL programming as opposed to RBAR, but it depends ultimately on what your cursor is doing.
Also, have a look at this on StackOverflow:
RBAR vs. Set based programming for SQL
First off, it sounds like you are trying to mix some business logic in your stored procs. That's generally something you want to avoid. A better solution would be to have a middle tier layer which encapsulates that business logic. That way your data layer remains purely data.
To answer your original question, it really depends on what you are using the cursors for. In some cases you can use a table variable or a temp table. You have to remember to free up temp tables though so I would suggest using table variables whenever possible. Sometimes, though, there is just no way around using cursors. Maybe the original DBA's didn't normalize enough (or normalized too much) and you are forced to use a cursor to traverse through multiple tables without any foreign key relationships.
IF a create thousands of view, Does it hamper the database performance. I mean is there any problem with creating thousands of view in oracle. Please explain as I am new in this area...I am using oracle...
The simple existence of these views shouldn't harm performance at all. However, once those views start being used it's possible that there will be some negative performance impact. Oracle tries to "remember" the plan for each statement that it sees, but it compares statements by comparing the source code (the SQL). Your thousands of views will all be named differently since you can't have multiple views with the same name, and thus each time one of them is used Oracle is going to have to do a full parse of the SQL, even if it's something as basic as
SELECT * FROM VIEW_1;
and
SELECT * FROM VIEW_2;
All these re-parses will certainly take some time.
What's different about each of these views? I think it might be a good idea to step back and consider other possibilities. Questions I'd ask include
What is to be accomplished here?
Why are thousands of different views needed?
Is there some other way to accomplish what needs to be done without creating all these views?
I don't know the answers to 1 and 2, but I'm reasonably sure that the answer to #3 is "Yes".
Good luck.
Oracle views are an encapsulation of a
complex query and must be used with
care. Here are the key facts to
remember:
Views are not intended to improve SQL
performance. When you need to
encapsulate SQL, you should place it
inside a stored procedure rather than
use a view. Views hide the complexity
of the underlying query, making it
easier for inexperienced programmers
and end users to formulate queries.
Views can be used to tune queries with
hints, provided that the view is
always used in the proper context.
source: Guard against performance issues when using Oracle hints and views
View is as heavy to run as the select that creates it but Oracle loads balance and single select can't harm the DB. If you have thousands concurrent selects going then you might have a problem. The amount of views is not important but how heavy they are and how much you use them.
You would actually need to show the views code and tell what you are actually trying to do.
View should not affect performance, if optimizer is smart enough. I remember cases with other DB engines when Views do harm performance. As in many performance cases - I suggest to measure your particular case.
I am writing a stored procedure to perform a dynamic search that spans 10+ database tables. With millions of records in each table and a dynamic set of search parameters*, I am having some trouble optimizing the procedure.
Is there a "best practice" for building these kinds of queries? E.g. Use strings to build a dynamic query, use a huge list of IF THEN .. ELSE statements, etc? Can anyone provide a simple example or point me to some literature that will help? Here's some psuedocode for the stored procedure I am developing, which accepts a collection of parameters and a ref cursor.
v_query = "SELECT .....";
v_name = ... -- retrieve "name" parameter from collection
if v_name is not null then
v_query := v_query || ' AND table.Name = ' || v_name;
end if;
open search_cursor for v_query;
...
*By "dynamic set of search parameters," I mean that I pass in a collection of parameters. I figured this would be easier than making the caller pass in 20 parameters if they only want to search on one.
There are problems with using the static query approach; also be very careful about using the CURSOR_SHARING=FORCE option - it can really raise hell with your system if you haven't done a coverage test to ensure that all your other queries will work the way you want.
Problems with static queries:
The (x is null or x = col) predicates tend to kill any chance of using indexes. Since the query plan is computed at the time query is parsed the first time, the indexes you use will be based on the values for the first run of the query; later runs, which may not constrain on the same columns, will still use the same indexes.
Having one static statement with substitution variables will prevent the optimizer from making an intelligent choice about which index to use based on the data distribution. In a dynamic query (or in the first run of a query with bind variables), Oracle will see how selective your constraint is; a highly selective constraint will become a prime candidate for index use. For example, if your table had a row for every person in the U.S., STATE='Alaska' will be much more likely to use the index on STATE than STATE='California'.
Of course, in both these cases, if the dynamic columns in your WHERE clause are not indexed anyway, it doesn't matter, although I'd be surprised if that were the case in a database the size you're talking about.
Also, consider the real cost of all that hard parsing. Yes, hard parses serialize system resources, which makes them expensive, but only in the context of high volume queries. By their nature, ad-hoc queries do not get run very often. The cost you pay for all the hard parses you incur in an entire day will likely be hundreds of times less than the cost of a single query that uses the wrong indexes.
In the past, I've implemented these systems pretty much like you've done here - a base query portion, then iterating over a constraint list and adding WHERE clause predicates. I don't think it's hard for someone to maintain or understand, especially if you're talking about constraints that don't involve adding a lot of subqueries or extra tables to the FROM clause.
One thing to consider: If this system is primarily an offline one (in other words, not constantly being updated or inserted into - populated by periodic loads of bulk data), you may want to look into using BITMAP indexes. Bitmap indexes differ from regular b-tree indexes in that multiple indexes on a single table can be used simultaneously, and bitmap indexes are much, much smaller on disk than b-trees. They work very well for applications like this - where you will have a variety of constraints that can't be defined at design time. You will only want to put bitmap indexes on columns that have relatively few distinct values - say, one value constitutes no less than 1/1000 of the table - so don't use bitmaps on unique columns.
However, the downside is that bitmap indexes will noticeably degrade the performance of inserts and updates. The best practice for bitmaps is to use them in data warehouse applications, and they are dropped prior to loads and recreated afterwards.
Except in very particular cases, I don't think it is advisable (or even possible) to try to generate an optimized query. My advice is not to use dynamic SQL if you can : hard to read, hard to debug, hard to optimize, hard to maintain.
First, write a generic query that will work with any parameter sent to your procedure. According to your example, that would give something like :
SELECT * FROM table WHERE ((v_name IS NULL) OR (table.Name=v_name));
As you see, you could easily add other parameters to this query without using dynamic SQL. This query is much easier to read and debug. Ask your DBA for optimization tips.
Then, if you have a particular set of parameters that you know are often passed together, you could write a particular query for this set that you could specifically optimize. Pseudocode :
IF particular_set
THEN
/* Specific query */
ELSE
/* Generic query */
END IF;
The difficult part is to try not to have too many specific queries here, or you could fall into a maintenance hell.
We've had a similar requirement for one of our clients. They have half a dozen tables with millions of rows, and they wanted adhoc search capability on most of the columns.
The solution was a separate package for each table, which would take the search criteria and construct the SQL to run the search. We took advantage of the old system that was being replaced, to discover what the most common types of searches the users were doing, and made sure that those searches ran the best, by tuning the queries that were being generated (supported by the strategic use of indexes). Because each package was only responsible for queries against one table, it could have specific code designed to work with that table (including the odd hint, in a few rare cases).
One question/problem that you'll need to address is, do you hard-code the criteria (e.g. WHERE SURNAME='SMITH') or use bind variables? Using bind variables reduces hard parsing, which reduces load on the database server; however it can be impractical to use bind variables when the SQL is dynamically generated. The way we ended up going was to set CURSOR_SHARING=FORCE (which has its own disadvantages) which was a reasonable compromise in our case.
Read http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:6711305251199