My question is similar to this one, but with a small difference. I have a query running on a single table with multiple WHERE conditions.
Assuming my table has multiple columns (col1 - col9) and I have a query like:
SELECT
col1
, col5
FROM table1
WHERE col1 = 'a'
AND col2 = 'b'
AND col3 = 100
AND col4 = '10a'
AND col5 = 1
And my indexes are:
col1 - unique / non-partitioned
col2, col3 - non-unique / partitioned
col4, col5 - non-unique / partitioned
My question is: if the columns in my WHERE clause cover multiple indexes, will (should?) the query pick the unique index first to generate a result set, and then apply the other two indexes to that result set for further filtering, sequentially reducing it?
Or will each condition use its own index over the entire table, with all of the result sets merged afterwards?
(I don't have access to a table/data, this is more theoretical than practical).
Thank you in advance for any help.
The Oracle optimiser (in more recent versions of Oracle, and unless you force it to behave otherwise) is cost based rather than rule based. When the query is first executed it will consider many different paths to obtain the answer, and choose the one with the lowest cost.
So it's generally impossible to say, ahead of time, how the database will choose to answer a particular query. The answer is always - it depends. It depends on
The statistics for the table, and the number of distinct values on each column
The version of the database you are using
System and session parameters
Statistics for the index
In most cases it will choose whatever is the most selective index. So if only one or two rows had col1 = 'a' (and with a unique index on col1 there can be at most one), it would probably go in on that index and then apply the remaining conditions to those rows.
As the other answer mentions, the database can combine B-Tree indexes by going through a bitmap conversion stage. This is relatively expensive, and not available in all Oracle versions, but it can happen.
So in summary, the database can do either of the approaches you mention. The only way to know what it will do in your circumstances is to use EXPLAIN PLAN or equivalent tools to watch what it does.
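For example, a quick way to see the chosen plan (a sketch, using the table and predicates from the question) is:
EXPLAIN PLAN FOR
SELECT col1, col5
FROM table1
WHERE col1 = 'a'
AND col2 = 'b'
AND col3 = 100
AND col4 = '10a'
AND col5 = 1;

-- show the plan the optimizer chose (index unique scan, index combination, full scan, etc.)
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);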
I need some help on how to perform auto partitioning on an integer column, similar to how we do it on a date column, e.g. PARTITION BY RANGE (DIM_DT_ID) INTERVAL (NUMTODSINTERVAL(1,'DAY')).
I have 90 million rows and performance is poor; our SLA on the query is 2 seconds, so I would like to partition the table. What is the best approach, and how do I enable auto partitioning on an integer column?
Our query will always filter on these columns, like:
select * from <tbname>
where ObjectID = 1346785
and patentnumber = 23456;
"i'm just making an example here, as i cant paste the original query for legality sake"
Fair enough, but the advice we give you will only be as good as the information you give us. So far, nothing you have posted suggests you need Partitioning.
The pasted query would perform well with a compound index, and would probably benefit from compression of the leading column:
create index your_table_lookup_index
on your_table(ObjectID, patentnumber) compress 1;
If that's a unique combination then make the index unique.
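For example (a sketch; this would replace the non-unique version above):
create unique index your_table_lookup_index
on your_table(ObjectID, patentnumber);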
how do i enable auto partition on a Integer column
However, if you think you do have a genuine use case for Partitioning then we can use Interval Partitioning with integers as well as dates. This statement will create a table partitioned on objectid with a partition for every ten values.
create table your_table (
objectid number,
patentnumber number,
created_date date
)
partition by range (objectid)
interval (10)
(
partition p_00010 values less than (10)
);
On your posted figures that would be about 400 partitions with around 225000 rows per partition. Is that a good choice? Who can tell? You know your data and your use cases, we don't: perhaps a partition per objectid (i.e. with interval (1)) would be better.
You already have a table, so you need to split it into Partitions. The standard way of doing this would be:
create a new table with your partitioning strategy (like above) but with the default partition ranged for values less than (MAXVALUE)
use partition exchange to move the existing table data into the new structure
drop the old table and rename the new table to the old name; resolve foreign keys and other dependencies
iteratively split the partition into the required ranges
This is a fairly time-consuming process. You have tagged your question [oracle12c]; if you're using Oracle 12c R2 you should definitely look at its online conversion mechanism, which is a single command. Find out more.
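A sketch of that 12.2 online conversion, reusing the partitioning clause from above (index handling can be added with an UPDATE INDEXES clause if needed):
alter table your_table modify
partition by range (objectid) interval (10)
(
partition p_00010 values less than (10)
)
online;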
Remember that Partitioning for performance is a tricky game. While it can improve queries which return a large number of rows aligned with the Partition key, it can make no difference to other queries, or even impair their performance. In particular, any query which does not include the partition key (objectid in your case) will likely perform worse after partitioning the table.
Final aside: as you know but for the benefit of future Seekers, Partitioning is a chargeable extra to the Enterprise Edition license. We're not allowed to use it unless we've paid for it.
I am running a query on a large table and I am expecting a large number of returned rows.
Unfortunately I need to order the result by 2 columns, which makes the query quite slow.
I added an index on those specific columns but was wondering if the sort direction makes a difference.
One column is ordered desc and one is ordered asc.
thanks and best wishes,
e.
Your query might benefit from an index ordered the same way as your ORDER BY clause, e.g.
create index index1 on table1 (col1 desc, col2 asc);
Whether it will benefit depends on the relative cost of the index scans and table lookups versus a simple full table scan. If the number of rows you want is low relative to the total number of rows in the table the query might benefit.
The only way to know for sure is try it.
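For example (a sketch, reusing the hypothetical names above), a query like this could read the index in order and avoid a separate sort step:
select col1, col2
from table1
where col1 between :low_val and :high_val
order by col1 desc, col2 asc;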
I have to sum a huge amount of data with aggregation and a WHERE clause, using the query below.
What I am doing is this: I have three tables; one contains terms, the second contains user terms, and the third contains the correlation factor between a term and a user term.
I want to calculate the similarity between the sentence that the user inserted and the already existing sentences, summing the correlation factors between the sentences' terms and keeping the results greater than 0.5.
The problem is that this query takes more than 15 minutes, because the tables are huge.
Any suggestions to improve performance, please?
insert into PLAG_SENTENCE_SIMILARITY
SELECT plag_terms.sentence_id,
plag_user_terms.sentence_id,
least(sum(plag_term_correlations3.correlation_factor) / plag_terms.sentence_length,
sum(plag_term_correlations3.correlation_factor) / plag_user_terms.sentence_length),
plag_terms.isn,
plag_user_terms.isn
FROM plag_term_correlations3,
plag_terms,
plag_user_terms
WHERE plag_terms.term_root = plag_term_correlations3.term1
AND plag_user_terms.term_root = plag_term_correlations3.term2
AND plag_user_terms.isn = 123
GROUP BY plag_user_terms.sentence_id, plag_terms.sentence_id, plag_terms.isn,
plag_terms.sentence_length, plag_user_terms.sentence_length, plag_user_terms.isn
HAVING least(sum(plag_term_correlations3.correlation_factor) / plag_terms.sentence_length,
sum(plag_term_correlations3.correlation_factor) / plag_user_terms.sentence_length) > 0.5;
plag_terms contains more than 50 million records and plag_term_correlations3 contains 500,000.
If you have a sufficient amount of free disk space, then create a materialized view
over the join of the three tables
fast-refreshable on commit (don't use the ANSI join syntax here, even if tempted to do so, or the mview won't be fast-refreshable ... a strange bug in Oracle)
with query rewrite enabled
properly physically organized for quick calculations
The query rewrite is optional. If you can modify the above insert-select, then you can just select from the materialized view instead of selecting from the join of the three tables.
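A minimal sketch of such a materialized view, using the tables from the posted query (the name plag_similarity_mv is just for illustration; the materialized view logs and the rowid columns in the select list are what allow a join-only mview to be fast-refreshable on commit):
create materialized view log on plag_terms with rowid;
create materialized view log on plag_user_terms with rowid;
create materialized view log on plag_term_correlations3 with rowid;

create materialized view plag_similarity_mv
refresh fast on commit
enable query rewrite
as
select t.rowid t_rid, u.rowid u_rid, c.rowid c_rid,
t.sentence_id t_sentence_id, u.sentence_id u_sentence_id,
t.sentence_length t_length, u.sentence_length u_length,
t.isn t_isn, u.isn u_isn,
c.correlation_factor
from plag_term_correlations3 c, plag_terms t, plag_user_terms u
where t.term_root = c.term1
and u.term_root = c.term2;
The aggregation can then select from plag_similarity_mv (or be rewritten to it automatically) instead of re-joining the base tables.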
As for the physical organization, consider
hash partitioning by Plag_User_Terms.ISN (with a sufficiently high number of partitions; don't hesitate to partition your table with e.g. 1024 partitions, if it seems reasonable) if you want to do a bulk calculation over all values of ISN
single-table hash clustering by Plag_User_Terms.ISN if you want to run your calculation over a single ISN at a time
If you don't have spare disk space, then just hint your query (see the sketch after this list) to
either use nested loop joins, since the number of rows processed seems to be quite low (judging by the estimates in the execution plan)
or full-scan the plag_term_correlations3 table in parallel
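The hint text might look like one of these sketches, placed right after the SELECT keyword of the statement above (the table names match the posted query, but the join order is only a guess):
-- nested loops, driving from the filtered plag_user_terms rows:
/*+ leading(plag_user_terms) use_nl(plag_term_correlations3 plag_terms) */

-- or a parallel full scan of the correlations table:
/*+ full(plag_term_correlations3) parallel(plag_term_correlations3 8) */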
Bottom line: Constrain your tables with foreign keys, check constraints, not-null constraints, unique constraints, everything! The Oracle optimizer is capable of using most of this information to its advantage, as are the people who tune SQL queries.
The customer table contains 9.5 million records. The customer_id column is the primary key. The database is Oracle.
Questions:
1) Should the table contain main partitions or sub-partitions? How do I decide?
Also, I don't think indexing columnA or columnB will help here because of the type of data.
TableA.columnA (varchar) has more than 80% of its records with columnA values 5, 6, 7. columnA only has values from 1 to 7.
TableA.columnB (varchar) has 90% of its records with columnB value = 102. columnB has values from 1 to 999.
Moreover, the typical queries are (in no particular order):
Query1: where tableA.columnA = values
Query2: where tableA.columnB = values
Query3: where tableA.columnA = values AND/OR tableA.columnB = values
2) When we create sub-partitions, what happens if the query only contains a where clause on the sub-partition column? Does the query execution go directly to the sub-partition, or through the main partition?
3) the join contains tableA.partitioned_column = tableB.indexed_column
(eg. customer_Table.branch_code = branch_table.branch_code)
Does partitioning help in the case of JOIN? Will it improve performance?
1) It's very difficult to answer without knowing the table structure, the way it's usually used, etc. But generally, for big tables, partitioning is very often a necessity.
2) If you do not specify the partition then Oracle will have to browse through all partitions to find where the subpartition is (which is not very slow), and then use partition pruning on the subpartition. That will still be significantly faster than not having subpartitions at all. But the best situation is to refer in the WHERE clause to both the partition and the subpartition.
3) I'm 99% sure it will help, because Oracle can use partition pruning to get the needed rows from tableA at once. You will be 100% sure if you check the query plan. But the best situation is when both columns are partition keys.
If 80-90% of the rows have the same values in these columns and those are the most often queried values, then partitioning will only help a little. You would be pruning just 10-20% of the data during these queries, so you probably want to find another way for Oracle to home in on the data your query needs (dates, perhaps?).
The value distribution in your two columns also brings up the point of statistics and making sure they are being gathered properly (with histograms to describe the skew in these columns).
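For example, a sketch of gathering histograms on those two columns (assuming the table really is called tableA; adjust names to suit):
begin
dbms_stats.gather_table_stats(
ownname => user,
tabname => 'TABLEA',
-- request histograms (up to 254 buckets) on the two skewed columns
method_opt => 'for columns size 254 columnA columnB'
);
end;
/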
As @psur points out, without knowing the details of your system it's hard to give concrete suggestions.
In general, every index on a table slows down INSERTs into the table by a factor of three; two indexes generally make the insert twice as slow as one index. (Yet, a two-part single index is not much worse than a single-part single index).
I got this from the book Oracle 9i Performance Tuning Tips and Techniques by Richard Niemiec (Osborne Oracle Press Series).
What do the following terms mean:
Two-part single index
Single-part single index
Are there any more kinds of indexes?
By two-part index I presume Rich means a composite index, that is an index built on multiple columns. Like this:
create index t23_t_idx on t23 (col4, col2);
Whereas a single part index indexes a single column:
create index t23_s_idx on t23(col1);
The indexes created above are b-tree indexes. Oracle has many other types of indexes. For starters, indexes can be unique, in which case they only allow one instance of a given value in the indexed column (or combination of values for composite columns).
There are also bit-mapped indexes, which impose a much higher performance penalty on DML but which speed up certain types of query; it is rare to come across bitmapped indexes outside of data warehouses.
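For example (a sketch; col3 here is just a hypothetical low-cardinality column on the same demo table):
create bitmap index t23_bmp_idx on t23(col3);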
We can create function-based indexes which allow us to index the results of a deterministic function (i.e. one that is guaranteed to produce the same result for a given input). This is how we can build an index on a date column which ignores the time element:
create index t23_fbi_idx on t23( trunc(col_34));
We can also build domain indexes on text columns. And there are special indexes for partitioned tables.
All of these are covered in more detail in the documentation. Find out more.
I would assume that the author is referring to a composite index when he talks about a "two-part single index". The term "composite index" is a far more common way to refer to an index on multiple columns of a table.
If you have a single composite index on two columns, there is only one index structure that needs to be maintained during an insert so the overhead of index maintenance is not much different than the overhead of maintaining one single-column index.
CREATE TABLE t1 (
col1 NUMBER,
col2 NUMBER,
col3 NUMBER
);
CREATE INDEX t1_composite_idx
ON t1( col1, col2 );
On the other hand, if you create separate indexes on each column individually, Oracle has to maintain two separate index structures, which roughly doubles the amount of index maintenance that is needed:
CREATE TABLE t1 (
col1 NUMBER,
col2 NUMBER,
col3 NUMBER
);
CREATE INDEX t1_idx1
ON t1( col1 );
CREATE INDEX t1_idx2
ON t1( col2 );
I would be rather leery, however, of the "factor of three" that the author quotes. There are a lot of variables that come into play that are not captured by that particular rule of thumb. It's useful to remember that adding indexes imposes potentially substantial costs on insert operations, but it's much more useful to measure the actual cost that you are imposing when you are weighing the trade-offs of creating another index.
Are there any more kinds of indexes?
As for your last question-- Oracle has quite a few different types of indexes (particularly if we are counting composite indexes as a different type of index). This answer has been solely dealing with b*-tree indexes which are what people normally mean when they refer to "indexes" without qualifiers. Oracle, however, supports a number of different types of indexes-- b*-tree indexes, bitmap indexes, Text indexes, etc. It creates LOB indexes. It supports user-defined extensible indexes. And within each type of index, there are often dozens of different options. For example, you can create a function-based b*-tree index or a bitmap join index, you can specify custom lexers for an Oracle Text index, or you can define your own index structure for your own custom type.
Since the author does not seem to actually ever define the term, I can only guess that they mean a two-part single index is a composite index comprised of two columns and a single-part single index is an index based on a single column.