Oracle ORDER BY optimization

I am running a query on a large table and expecting a large number of rows back.
Unfortunately I need to order the result by two columns, which makes the query quite slow.
I added an index on those specific columns, but was wondering if the order direction makes a difference:
one column is ordered descending and the other ascending.
thanks and best wishes,
e.

Your query might benefit from an index ordered the same way as your ORDER BY clause, e.g.
create index index1 on table1 (col1 desc, col2 asc);
Whether it will benefit depends on the relative cost of the index scans and table lookups versus a simple full table scan. If the number of rows you want is low relative to the total number of rows in the table the query might benefit.
The only way to know for sure is to try it.
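A minimal sketch of how you might test this, assuming a hypothetical table1 with columns col1 and col2 (names are illustrative):

```sql
-- Index sorted to match the ORDER BY directions, so Oracle can read
-- rows pre-sorted and potentially skip the SORT ORDER BY step.
CREATE INDEX idx_t1_sort ON table1 (col1 DESC, col2 ASC);

-- The query whose ORDER BY the index matches:
SELECT col1, col2
FROM   table1
ORDER  BY col1 DESC, col2 ASC;

-- Check whether the sort step disappeared from the plan:
EXPLAIN PLAN FOR
  SELECT col1, col2 FROM table1 ORDER BY col1 DESC, col2 ASC;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Note that if the indexed columns are nullable, Oracle may still decline to use the index for a query without a WHERE clause, since NULL rows would be missing from a single-column B-tree index.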

Do PostgreSQL query plans depend on table row count?

My users table doesn't have many rows... yet. 😏
Might the query plan of the same query change as the table grows?
I.e., to see how my application will scale, should I seed my users table with BILLIONS 🤑 of rows before using EXPLAIN?
Estimated row counts are probably the most important factor influencing which query plan is chosen.
Two examples that support this:
If you use a WHERE condition on an indexed column of a table, three things can happen:
If the table is very small or a high percentage of the rows match the condition, a sequential scan will be used to read the whole table and filter out the rows that match the condition.
If the table is large and a low percentage of the rows match the condition, an index scan will be used.
If the table is large and a medium percentage of rows match the condition, a bitmap index scan will be used.
If you join two tables, the estimated row counts on the tables will determine if a nested loop join is chosen or not.
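So rather than seeding billions of rows, it can be enough to seed a realistic volume and re-check the plan as the table grows. A hedged sketch in PostgreSQL, with illustrative names, showing the same query flipping plans once the statistics reflect a larger table:

```sql
-- Hypothetical users table; names are illustrative only.
CREATE TABLE users (id serial PRIMARY KEY, email text);
CREATE INDEX users_email_idx ON users (email);

-- With a handful of rows, the planner typically prefers a sequential scan:
EXPLAIN SELECT * FROM users WHERE email = 'a@example.com';

-- Seed many rows, refresh the statistics, and the plan can flip to an
-- index scan for the exact same query:
INSERT INTO users (email)
SELECT 'user' || g || '@example.com'
FROM   generate_series(1, 1000000) AS g;
ANALYZE users;

EXPLAIN SELECT * FROM users WHERE email = 'a@example.com';
```

The key point is that ANALYZE updates the row-count estimates the planner works from; the plan changes because the estimates change, not because the SQL changed.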

What is a skewed column in Oracle?

I found a bottleneck in a query that selects data from only a single table but still takes time. I used a non-unique index on the two columns used in the WHERE clause:
select name, isComplete from Student where year = '2015' and isComplete = 'F'
Now I found the concept of a "skewed column" on the internet. What is it?
How does a skewed column affect the performance of a query, and how do I resolve the problem?
Skewed columns are columns in which the data is not evenly distributed among the rows.
For example, suppose:
You have a table order_lines with 100,000,000 rows
The table has a column named customer_id
You have 1,000,000 distinct customers
Some (very large) customers can have hundreds of thousands or millions of order lines.
In the above example, the data in order_lines.customer_id is skewed. On average, you'd expect each distinct customer_id to have 100 order lines (100 million rows divided by 1 million distinct customers). But some large customers have many, many more than 100 order lines.
This hurts performance because Oracle bases its execution plan on statistics. So, statistically speaking, Oracle thinks it can access order_lines based on a non-unique index on customer_id and get only 100 records back, which it might then join to another table or whatever using a NESTED LOOP operation.
But, then when it actually gets 1,000,000 order lines for a particular customer, the index access and nested loop join are hideously slow. It would have been far better for Oracle to do a full table scan and hash join to the other table.
So, when there is skewed data, the optimal access plan depends on which particular customer you are selecting!
Oracle lets you avoid this problem by optionally gathering "histograms" on columns, so Oracle knows which values have lots of rows and which have only a few. That gives the Oracle optimizer the information it needs to generate the best plan in most cases.
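A hedged sketch of gathering such a histogram, assuming the example order_lines table from above (table and column names are illustrative):

```sql
-- Gather table statistics with a histogram on the skewed column, so the
-- optimizer can cost popular and rare customer_id values differently.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'ORDER_LINES',
    method_opt => 'FOR COLUMNS customer_id SIZE 254'
  );
END;
/

-- Verify that a histogram was created on the column:
SELECT column_name, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'ORDER_LINES';
```

With the histogram in place, the optimizer can choose an index access for a small customer and a full scan for a very large one, based on the bind or literal value it sees.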
Whether Oracle chooses a full table scan or an index scan can depend on the skewed column.
A skewed column is simply one with an uneven value spread: for example, a gender column containing 60 male and 40 female rows.

Can a query use multiple indexes from the same table?

My question is similar to this one, but with a small difference. I have a query running on a single table with multiple WHERE conditions.
Assuming my table has multiple columns (col1 - col9) and I have a query like:
SELECT
col1
, col5
FROM table1
WHERE col1 = 'a'
AND col2 = 'b'
AND col3 = 100
AND col4 = '10a'
AND col5 = 1
And my indexes are:
col1 - unique / non-partitioned
col2, col3 - non-unique / partitioned
col4, col5 - non-unique / partitioned
My question is, if I'm using columns in my WHERE clause that cover multiple indexes, will (should?) the query pick the unique index first to generate a result set and then on that result set use the other two indexes for further filtering, sequentially reducing the result set?
Or will each index go over the entire data in the table and each condition will use an index and later merge all of the result sets?
(I don't have access to a table/data, this is more theoretical than practical).
Thank you in advance for any help.
The Oracle optimiser (in more recent versions of Oracle, and unless you force it to behave otherwise) is cost based rather than rule based. When the query is first executed it will consider many different paths to obtain the answer, and choose the one with the lowest cost.
So it's generally impossible to say, ahead of time, how the database will choose to answer a particular query. The answer is always "it depends". It depends on:
The statistics for the table, and the number of distinct values on each column
The version of the database you are using
System and session parameters
Statistics for the index
In general, what it will do in most cases is to choose whatever is the most selective index. So if you only had one or two rows where col1='a', it would probably go in on that index, and then scan the rows within it.
As the other answer mentions, the database can combine B-Tree indexes by going through a bitmap conversion stage. This is relatively expensive, and not available in all Oracle versions, but it can happen.
So in summary, the database can do either of the approaches you mention. The only way to know what it will do in your circumstance is to use EXPLAIN PLAN or the equivalent tools to watch what it does.
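A minimal sketch of that check, using the question's hypothetical table1 and columns (index and table names are illustrative):

```sql
-- Show the plan Oracle actually chose for the multi-condition query:
EXPLAIN PLAN FOR
SELECT col1, col5
FROM   table1
WHERE  col1 = 'a'
AND    col2 = 'b'
AND    col3 = 100
AND    col4 = '10a'
AND    col5 = 1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- In the output, a single INDEX UNIQUE SCAN on the col1 index means
-- Oracle used the most selective index alone and filtered the rest.
-- BITMAP CONVERSION FROM ROWIDS / BITMAP AND steps would indicate it
-- combined several B-tree indexes instead.
```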

oracle partitioning on columns frequently used in joins and where conditions

The customer table contains 9.5 million records. The customer_id column is the primary key. The database is Oracle.
Questions:
1) Should the table contain main partitions or sub-partitions? How do I decide?
Also, I don't think indexing columnA or columnB will help here because of the type of data.
TableA.columnA (varchar) has more than 80% of the records for columnA values 5,6,7. The columnA has values from 1 to 7 only.
TableA.columnB (varchar) has 90% of the records for columnB value = 102. The columnB has values from 1 to 999.
Moreover, the typical queries are (in no particular order):
Query1: where tableA.columnA = values
Query2: where tableA.columnB = values
Query3: where tableA.columnA = values AND/OR tableA.columnB = values
2) When we create sub-partitions, what happens if the query only contains a where clause for sub-partition column? Does the query execution go directly to sub-partition or through main partition?
3) the join contains tableA.partitioned_column = tableB.indexed_column
(eg. customer_Table.branch_code = branch_table.branch_code)
Does partitioning help in the case of JOIN? Will it improve performance?
1) It's very difficult to answer without knowing the table structure, the way it's usually used, etc. But generally, for big tables partitioning is very often a necessity.
2) If you do not specify the partition, then Oracle will have to browse through all partitions to find where the subpartition is (which is not very slow), and then use partition pruning on the subpartition. It will still be significantly faster than not having subpartitions at all. But the best situation is to refer in the WHERE clause to both the partition and the subpartition.
3) In 99% of cases I think it will help, because Oracle can use partition pruning to get the needed rows from tableA at once. You will be 100% sure if you check the query plan. But the best situation is when both columns are partition keys.
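A hedged sketch of composite LIST-LIST partitioning on the two columns from the question; partition names and the value lists are illustrative only, and the real scheme would need one partition per interesting value:

```sql
-- Composite partitioning: partition on columnA, subpartition on columnB.
CREATE TABLE tableA (
  id      NUMBER PRIMARY KEY,
  columnA VARCHAR2(10),
  columnB VARCHAR2(10)
)
PARTITION BY LIST (columnA)
SUBPARTITION BY LIST (columnB) (
  PARTITION pa_5 VALUES ('5') (
    SUBPARTITION pa5_b102  VALUES ('102'),
    SUBPARTITION pa5_rest  VALUES (DEFAULT)
  ),
  PARTITION pa_rest VALUES (DEFAULT) (
    SUBPARTITION parest_b102 VALUES ('102'),
    SUBPARTITION parest_rest VALUES (DEFAULT)
  )
);

-- Filtering on either column lets the optimizer prune; check the
-- PSTART/PSTOP columns in the plan output to confirm:
EXPLAIN PLAN FOR SELECT * FROM tableA WHERE columnB = '102';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```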
If 80-90% of the rows have the same values in these columns, and they are the most often queried values, then partitioning will only help somewhat: you would be pruning just 10-20% of the data during these queries. You probably want to find another way for Oracle to home in on the data your query needs (dates, perhaps?).
The value distribution in your two columns also brings up the point of statistics and making sure they are being gathered properly (with histograms to describe the skew in these columns).
As @psur points out, without knowing the details of your system it's hard to give concrete suggestions.

Why is selecting a PK column faster than a non-indexed column?

I'm currently doing some tests and I noticed the following:
select field1 from table1
results in an INDEX FAST FULL SCAN when field1 is the primary key, and thus has a low cost (4690 in my case), whereas
select field2 from table1
results in a TABLE ACCESS FULL (there's no constraint or index on field2, yet even with a regular index the result is the same), with a cost of 117591.
I'm aware of the gain when indexes/constraints are involved in JOIN/WHERE clauses, but in my case nothing is filtered: I don't understand why the PK should be faster because, either way, I am retrieving all the rows...
Is it because of the uniqueness? Tom says that a unique index is structurally the same as a conventional index, which really makes me wonder why selecting the PK would cost less than any other column.
Thanks for your enlightenments :-)
rgds.
A single-column b-tree index does not store data for rows that are NULL. So if you have an index on field2 but field2 allows NULL, Oracle can't do a scan on the index without risking potentially returning incorrect data. A full table scan is, therefore, the only valid way for Oracle to retrieve the data for the field2 column for every row in table1. If you add a NOT NULL constraint to field2, Oracle should be able to at least consider doing a full scan of the index.
Of course, whether or not the optimizer chooses to use the index (and the cost it ultimately assigns to using the index) will depend on the statistics that you've gathered both on the index and on the table. If your statistics are inaccurate, the optimizer's cost estimates are going to be inaccurate, and so the plan that is generated is likely to be inefficient. That's one of the reasons people are usually advised to be cautious about putting too much credence in Oracle's estimate of the cost of a plan: if you're looking at a plan, it's likely because you suspect it is inefficient, which should imply that you can't rely on the cost. You're generally much better served looking at the cardinality estimates for each step and determining whether those make sense given your distribution of data.
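An illustrative sketch of the NOT NULL point, using the question's hypothetical table1 and field2:

```sql
-- With field2 nullable, an index on it is missing the NULL rows, so a
-- full-table query must use TABLE ACCESS FULL even if the index exists.
CREATE INDEX t1_field2_idx ON table1 (field2);

-- Once field2 is declared NOT NULL, every row is present in the index,
-- so an INDEX FAST FULL SCAN becomes a legal (and often cheaper) plan:
ALTER TABLE table1 MODIFY (field2 NOT NULL);

EXPLAIN PLAN FOR SELECT field2 FROM table1;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Whether the optimizer actually picks the index fast full scan still depends on the relative sizes of the index and the table, and on the statistics, as described above.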
