Access and Filter predicates in Oracle execution plan - oracle

What is the difference between Access and Filter predicates in Oracle execution plan?
If I understand correctly, "access" is used to determine which data blocks need to be read, and "filter" is applied after the blocks are read. Hence, filtering is "evil".
In the example of Predicate Information section of the execution plan below:
10 - access("DOMAIN_CODE"='BLCOLLSTS' AND "CURRENT_VERSION_IND"='Y')
filter("CURRENT_VERSION_IND"='Y')
Why is "CURRENT_VERSION_IND" repeated in both the Access and Filter sections?
The corresponding operation is an INDEX RANGE SCAN on an index defined on the columns (DOMAIN_CODE, CODE_VALUE, CURRENT_VERSION_IND, DECODE_DISPLAY).
My guess is that because CURRENT_VERSION_IND is not the second column in the index, Oracle can't use it during the Access stage. Hence, it accesses index by DOMAIN_CODE column, fetches all the blocks, and then filters them by CURRENT_VERSION_IND. Am I right?

No, the access predicates in this example indicate that the index is being traversed by both DOMAIN_CODE and CURRENT_VERSION_IND.
I wouldn't worry about the filter predicate that appears to be redundant - it seems to be a quirk of explain plan, probably something to do with the fact that it has to do a sort of skip-scan on the index (it does a range scan on the first column, then a skip scan over CODE_VALUE, searching for any matching CURRENT_VERSION_INDs).
Whether you need to modify the index or create another index is another matter entirely.
Also, just to correct a minor misunderstanding: the blocks have to be fetched from the index BEFORE it can do anything, whether executing the "access" or "filter" steps. If you're referring to fetching blocks from the table, then also the answer is no - you said the filter predicate "10" was on the index access, not on a table access; and anyway, there's no reason Oracle can't evaluate the filter on CURRENT_VERSION_IND on the index - it doesn't need to access the table at all, unless it needs other columns not included in the index.
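As a rough mental model of the distinction discussed above, here is a toy Python sketch. It is not how Oracle implements an index; the sample rows and the function name range_scan are made up for illustration. The "index" is a sorted list of tuples in the column order from the question. The leading column decides where the scan starts and stops (the access part), and every entry inside that range is then checked against the third column (the filter part).

```python
from bisect import bisect_left

# Toy composite index: tuples sorted in index-column order
# (DOMAIN_CODE, CODE_VALUE, CURRENT_VERSION_IND, DECODE_DISPLAY).
# All rows are invented sample data.
INDEX = sorted([
    ("BLCOLLSTS", "A", "N", "Active (old)"),
    ("BLCOLLSTS", "A", "Y", "Active"),
    ("BLCOLLSTS", "C", "Y", "Closed"),
    ("BLCOLLSTS", "P", "N", "Pending (old)"),
    ("OTHERDOM", "X", "Y", "Something else"),
    ("AAA", "Z", "Y", "Earlier domain"),
])


def range_scan(index, domain_code, current_version_ind):
    """Return (entries_visited, matching_entries) for a toy range scan."""
    # Access: binary-search to the first entry for the leading column.
    start = bisect_left(index, (domain_code,))
    visited, matches = 0, []
    for entry in index[start:]:
        if entry[0] != domain_code:
            break  # left the range for this leading-column value: stop
        visited += 1
        # Filter: checked on every entry inside the range, because the
        # middle column (CODE_VALUE) is not constrained.
        if entry[2] == current_version_ind:
            matches.append(entry)
    return visited, matches


visited, matches = range_scan(INDEX, "BLCOLLSTS", "Y")
```

In this toy model the filter is evaluated on index entries only; no "table" is touched, which matches the point that a filter on an index step does not by itself imply a table access.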

I believe you are correct in your assessment of what Oracle is doing, but wrong to say that the filter step (or any other optimizer choice) is always "evil". It doesn't make sense to index absolutely every possible combination of columns that may be queried on, so filtering is frequently required.
However, if in this case adding CURRENT_VERSION_IND as the second column of the index improves performance significantly on frequently run queries and doesn't harm the performance of other queries, then it may make sense to do so.

Related

Does the presence of a single UPPER in CosmosDb SQL Queries force a full collection scan on partition?

Given the following SQL, the ManufacturerIdUpperCase is the partition key, and a lower cased value is passed as a hint to direct Cosmos to the correct partition. The "boat.OwnerIdUpperCase" is an indexed property. Will Cosmos use the ownerId to narrow the scan to the subset of documents for this owner, or does the use of the other two UPPER calls require a full collection scan?
SELECT * FROM boat
WHERE boat.ManufacturerIdUpperCase = @ManufacturerId
AND UPPER(boat.Owner.Type) = UPPER(@OwnerType)
AND boat.OwnerIdUppererCase = @BoatOwnerId
AND UPPER(boat.BoatType) = UPPER(@BoatType)
I'm trying to decide whether I need to maintain a lowercase copy of every property included in the various WHERE clauses, or whether I can do this for just one indexed property, so that it narrows the dataset and the remaining UPPER conversions only require a scan of the resulting subset, not the entire partition.
I've read the old posts like the one below, and run the SQL in the sandbox as proposed. In the simple scenario, I am seeing the same result as the author. However, my work scenario is more complex as described above.
DocumentDB: Performance impact of built-in string functions (like UPPER)
Victor, welcome to StackOverflow! I am from the Cosmos DB engineering team.
In this particular query, since all the filter predicates are intersections (ANDs), and not unions (ORs), Cosmos DB will narrow down the set of documents to evaluate and will not do a full scan. Please ensure that all the 4 fields (/ManufacturerIdUpperCase, /Owner/Type, /OwnerIdUppererCase, /BoatType) are indexed (added as part of "includedPaths" in the indexingPolicy).

performance issues while processing 2 tables in lockstep based on orderedBy from-to

Title is probably not very clear so let me explain.
I want to process an in-process join (Node.js) on 2 tables*, Session and SessionAction (1-N).
Since these tables are rather big (millions of records both) my idea was to get slices based on an orderBy sessionId (which they both share), and sort of lock-step walk through both tables in batches.
This, however, proves to be awfully slow. I'm using pseudocode as follows for both tables to get the batches:
table('x').orderBy({index: "sessionId"}).filter(row.sessionId > start && row.sessionId < y)
Even though I'm essentially filtering on an attribute, sessionId, which has an index, the query planner is not smart enough to see this, and every query does a complete table scan to do the orderBy before filtering afterwards (or so it seems).
Of course, this is incredibly wasteful but I don't see another option. E.g.:
Order after filter is not supported by Rethink.
Getting a slice of the ordered table doesn't work either, since slice-enumeration (i.e., the xth until the yth record), for lack of a better word, doesn't add up between the 2 tables.
Questions:
Is my approach indeed expected to be slow, due to having to do a table scan at each iteration/batch?
If so, how could I design my queries to get it working faster?
*) It's too involved to do it using Rethink Reql only.
filter is never indexed in RethinkDB. (In general a particular command will only use a secondary index if you pass index as one of its optional arguments.) You can write that query like this to avoid scanning over the whole table:
r.table('x').orderBy({index: 'sessionId'}).between(start, y, {index: 'sessionId'})
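To show why batching by key range (rather than by row position) keeps the two tables aligned, here is a small Python sketch of the lock-step join. It is only an illustration: the sample data, make_fetcher and lockstep_join are hypothetical names, and the in-memory fetchers stand in for an indexed range query like the between call above.

```python
from collections import defaultdict


def lockstep_join(fetch_sessions, fetch_actions, boundaries):
    """Yield (session, [actions]) pairs, one sessionId range at a time.

    fetch_sessions / fetch_actions take (lo, hi) and return the rows with
    lo <= sessionId < hi. boundaries is an ascending list of cut points.
    """
    for lo, hi in zip(boundaries, boundaries[1:]):
        # Both tables are asked for the SAME key range, so the batches
        # line up even though they contain different numbers of rows.
        actions_by_session = defaultdict(list)
        for action in fetch_actions(lo, hi):
            actions_by_session[action["sessionId"]].append(action)
        for session in fetch_sessions(lo, hi):
            yield session, actions_by_session.get(session["sessionId"], [])


# In-memory stand-ins for the two tables (invented sample data).
SESSIONS = [{"sessionId": i} for i in (1, 2, 5, 8)]
ACTIONS = [{"sessionId": s, "action": a}
           for s, a in ((1, "open"), (1, "click"), (5, "open"), (8, "close"))]


def make_fetcher(rows):
    # Stand-in for an indexed range query; a real database would answer
    # this from the secondary index instead of scanning a list.
    return lambda lo, hi: [r for r in rows if lo <= r["sessionId"] < hi]


joined = list(lockstep_join(make_fetcher(SESSIONS), make_fetcher(ACTIONS), [0, 4, 10]))
```

The design choice here is that the batch boundaries are sessionId values, not record counts, which sidesteps the problem that the xth-to-yth slice of one table does not correspond to the same slice of the other.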

Are indexes used when an UPDATE is fired without a WHERE clause

In Oracle, are indexes used when an UPDATE is fired without a WHERE clause?
By "used", do you mean "referred to" or "modified"?
An UPDATE without a WHERE clause boils down to an iteration over the entire table; I see no good reason why Oracle should refer to an index in this case, as there's no benefit to be had from that. (Although that's little more than a qualified guess.) nonnb is right that the index will be affected depending on what column you touch.
If your update affects indexed columns, then the index pages will need to be updated as well.
Will Oracle use the index to find the rows being updated? With no where clause, almost certainly not.
Will Oracle have to read one or more indexes, getting blocks in consistent mode to update them? If you're updating any columns that are indexed, have function-based indexes which will result in an updated indexed value, or cause row movement among partitions, then yes, indexes "will be used."

Force oracle to use index

Is there any way to force Oracle to use an index other than hints?
No. And if the optimizer doesn't use the index, it usually has a good reason for it. Index usage, if the index is poor, can actually slow your queries down.
Oracle doesn't use an index when it thinks the index:
is disabled
is invalid (for example, after a huge data load when the statistics about the index haven't been updated)
won't help (for example, when there are only two different values in 5 million rows)
So the first thing to check is that the index is enabled, then gather fresh statistics on your index/table/schema (for example, with the DBMS_STATS.GATHER_* procedures). If that doesn't help, Oracle thinks that loading your index will actually take more time than loading the actual row values. In this case, add more columns to the index to make it appear more "diverse".
You might take a look at Oracle stored outlines. You can take an existing query, create a stored outline for it, and tweak the plan much as you would with hints. They are just very hard to use, so do some research before you decide to implement stored outlines.
You can add hints into the query that will cause it to look more favorably on one index over another index.
In general if you have collected good statistics on all the tables and indexes Oracle usually implements very good execution plans.
If your query doesn't include the indexed field in its conditions, then the DB would be foolish to use the index. Thus, I second Donnie's answer.
Yes, technically, you can force Oracle to use an index (without hints), in one scenario: if the table is an index-organized table, then logically the only way to query the table is via its index because there is no table to query.

How can I optimize a dynamic search query in Oracle

I am writing a stored procedure to perform a dynamic search that spans 10+ database tables. With millions of records in each table and a dynamic set of search parameters*, I am having some trouble optimizing the procedure.
Is there a "best practice" for building these kinds of queries? E.g., use strings to build a dynamic query, use a huge list of IF THEN .. ELSE statements, etc.? Can anyone provide a simple example or point me to some literature that will help? Here's some pseudocode for the stored procedure I am developing, which accepts a collection of parameters and a ref cursor.
v_query := 'SELECT .....';
v_name := ... -- retrieve "name" parameter from collection
IF v_name IS NOT NULL THEN
  v_query := v_query || ' AND table.Name = ' || v_name;
END IF;
OPEN search_cursor FOR v_query;
...
*By "dynamic set of search parameters," I mean that I pass in a collection of parameters. I figured this would be easier than making the caller pass in 20 parameters if they only want to search on one.
There are problems with using the static query approach; also be very careful about using the CURSOR_SHARING=FORCE option - it can really raise hell with your system if you haven't done a coverage test to ensure that all your other queries will work the way you want.
Problems with static queries:
The (x is null or x = col) predicates tend to kill any chance of using indexes. Since the query plan is computed when the query is first parsed, the indexes you use will be based on the values for the first run of the query; later runs, which may not constrain on the same columns, will still use the same indexes.
Having one static statement with substitution variables will prevent the optimizer from making an intelligent choice about which index to use based on the data distribution. In a dynamic query (or in the first run of a query with bind variables), Oracle will see how selective your constraint is; a highly selective constraint will become a prime candidate for index use. For example, if your table had a row for every person in the U.S., STATE='Alaska' will be much more likely to use the index on STATE than STATE='California'.
Of course, in both these cases, if the dynamic columns in your WHERE clause are not indexed anyway, it doesn't matter, although I'd be surprised if that were the case in a database the size you're talking about.
Also, consider the real cost of all that hard parsing. Yes, hard parses serialize system resources, which makes them expensive, but only in the context of high volume queries. By their nature, ad-hoc queries do not get run very often. The cost you pay for all the hard parses you incur in an entire day will likely be hundreds of times less than the cost of a single query that uses the wrong indexes.
In the past, I've implemented these systems pretty much like you've done here - a base query portion, then iterating over a constraint list and adding WHERE clause predicates. I don't think it's hard for someone to maintain or understand, especially if you're talking about constraints that don't involve adding a lot of subqueries or extra tables to the FROM clause.
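As a sketch of the "base query plus one predicate per supplied constraint" approach just described, here is a small runnable Python example. It uses SQLite only so that it runs anywhere; it is not Oracle code, and the person table, the SEARCHABLE whitelist and build_search are made-up names. It also shows one way the generated SQL can keep its values in bind variables instead of concatenating them into the text.

```python
import sqlite3

# Hypothetical whitelist mapping search-parameter names to column names.
# Only names listed here can ever reach the SQL text.
SEARCHABLE = {"name": "name", "state": "state"}


def build_search(params):
    """Return (sql, binds) for the non-null search parameters."""
    sql = "SELECT id, name, state FROM person WHERE 1 = 1"
    binds = {}
    for key, value in params.items():
        if value is None:
            continue  # parameter not supplied: add no predicate
        column = SEARCHABLE[key]  # raises KeyError for unknown parameters
        sql += f" AND {column} = :{key}"  # value travels as a bind, not as text
        binds[key] = value
    return sql, binds


# Tiny in-memory table so the sketch can be run end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO person (name, state) VALUES (?, ?)",
    [("Smith", "Alaska"), ("Smith", "California"), ("Jones", "Alaska")],
)

sql, binds = build_search({"name": "Smith", "state": None})
rows = conn.execute(sql, binds).fetchall()
```

Each distinct combination of supplied parameters produces its own SQL text, so each combination can get its own plan, while the values themselves never appear in the statement.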
One thing to consider: If this system is primarily an offline one (in other words, not constantly being updated or inserted into - populated by periodic loads of bulk data), you may want to look into using BITMAP indexes. Bitmap indexes differ from regular b-tree indexes in that multiple indexes on a single table can be used simultaneously, and bitmap indexes are much, much smaller on disk than b-trees. They work very well for applications like this - where you will have a variety of constraints that can't be defined at design time. You will only want to put bitmap indexes on columns that have relatively few distinct values - say, one value constitutes no less than 1/1000 of the table - so don't use bitmaps on unique columns.
However, the downside is that bitmap indexes will noticeably degrade the performance of inserts and updates. The best practice for bitmaps is to use them in data warehouse applications, dropping them prior to loads and recreating them afterwards.
Except in very particular cases, I don't think it is advisable (or even possible) to try to generate an optimized query. My advice is not to use dynamic SQL if you can avoid it: it is hard to read, hard to debug, hard to optimize, and hard to maintain.
First, write a generic query that will work with any parameter sent to your procedure. According to your example, that would give something like:
SELECT * FROM table WHERE ((v_name IS NULL) OR (table.Name=v_name));
As you see, you could easily add other parameters to this query without using dynamic SQL. This query is much easier to read and debug. Ask your DBA for optimization tips.
Then, if you have a particular set of parameters that you know are often passed together, you could write a particular query for this set that you could specifically optimize. Pseudocode:
IF particular_set
THEN
  /* Specific query */
ELSE
  /* Generic query */
END IF;
The difficult part is to try not to have too many specific queries here, or you could fall into a maintenance hell.
We've had a similar requirement for one of our clients. They have half a dozen tables with millions of rows, and they wanted ad hoc search capability on most of the columns.
The solution was a separate package for each table, which would take the search criteria and construct the SQL to run the search. We took advantage of the old system that was being replaced, to discover what the most common types of searches the users were doing, and made sure that those searches ran the best, by tuning the queries that were being generated (supported by the strategic use of indexes). Because each package was only responsible for queries against one table, it could have specific code designed to work with that table (including the odd hint, in a few rare cases).
One question/problem that you'll need to address is, do you hard-code the criteria (e.g. WHERE SURNAME='SMITH') or use bind variables? Using bind variables reduces hard parsing, which reduces load on the database server; however it can be impractical to use bind variables when the SQL is dynamically generated. The way we ended up going was to set CURSOR_SHARING=FORCE (which has its own disadvantages) which was a reasonable compromise in our case.
Read http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:6711305251199
