I am trying to select a distinct list of a certain column from a table with many millions of rows, such as:
select distinct stylecode from bass.stock_snapshot
This query obviously takes a very long time. What performance tuning can I do on this table?
If there are no predicates to my query, will an index help at all?
" just did this on a test table and the explain plan shows it did use
the index."
Please bear in mind that you will have to maintain that index forevermore. I don't know your data, but it seems unlikely this index will be useful for other queries, and this doesn't seem like the sort of query you ought to be running on a frequent basis.
If this is a one-off, some other approach such as parallel query might be better.
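For a one-off run, an Oracle parallel query hint is one way to attack it; this is just a sketch, and the degree of parallelism of 8 is an arbitrary illustrative value:
-- Hedged sketch: parallel full scan for the one-off distinct list
SELECT /*+ PARALLEL(s, 8) */ DISTINCT stylecode
FROM bass.stock_snapshot s;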
If on the other hand it is a frequent requirement perhaps a reference table for STYLECODE would be a good idea.
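A minimal sketch of that reference-table idea; the VARCHAR2(20) column type and the object names other than STYLECODE are assumptions, not taken from the real schema:
-- Small lookup table holding one row per distinct STYLECODE
CREATE TABLE stylecode_ref (
    stylecode VARCHAR2(20) PRIMARY KEY
);

-- Populate it once from the big table...
INSERT INTO stylecode_ref (stylecode)
SELECT DISTINCT stylecode FROM bass.stock_snapshot;

-- ...then keep it authoritative with a foreign key, so the distinct
-- list can be read from the small table from now on
ALTER TABLE bass.stock_snapshot
    ADD CONSTRAINT stock_snapshot_stylecode_fk
    FOREIGN KEY (stylecode) REFERENCES stylecode_ref (stylecode);

SELECT stylecode FROM stylecode_ref;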
Related
Let's say I have a table called PEOPLE having three columns, ID, LastName, and FirstName. None of these columns are indexed.
LastName is more unique, and FirstName is less unique.
If I do two searches:
select * from PEOPLE where FirstName = 'F' and LastName = 'L'
select * from PEOPLE where LastName = 'L' and FirstName = 'F'
My belief is the second one is faster because the more unique criterion (LastName) comes first in the where clause, and records will get eliminated more efficiently. I don't think the optimizer is smart enough to optimize the first SQL query.
Is my understanding correct?
No, that order doesn't matter (or at least: shouldn't matter).
Any decent query optimizer will look at all the parts of the WHERE clause and figure out the most efficient way to satisfy that query.
I know the SQL Server query optimizer will pick a suitable index - no matter which order you have your two conditions in. I assume other RDBMS will have similar strategies.
What does matter is whether or not you have a suitable index for this!
In the case of SQL Server, it will likely use an index if you have:
an index on (LastName, FirstName)
an index on (FirstName, LastName)
an index on just (LastName), or just (FirstName) (or both)
On the other hand - again for SQL Server - if you use SELECT * to grab all columns from a table, and the table is rather small, then there's a good chance the query optimizer will just do a table (or clustered index) scan instead of using an index (because the lookup into the full data page to get all other columns just gets too expensive very quickly).
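For illustration, any of the index shapes in that list could be created along these lines (the index names are made up):
CREATE INDEX IX_People_Last_First ON PEOPLE (LastName, FirstName);
CREATE INDEX IX_People_First_Last ON PEOPLE (FirstName, LastName);
CREATE INDEX IX_People_Last ON PEOPLE (LastName);
CREATE INDEX IX_People_First ON PEOPLE (FirstName);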
The order of conditions in the WHERE clause should not make a difference in a database that conforms to the SQL standard. The order of evaluation is not guaranteed in most databases.
Do not think that SQL cares about the order. The following generates an error in SQL Server:
select *
from INFORMATION_SCHEMA.TABLES
where ISNUMERIC(table_name) = 1 and CAST(table_name as int) <> 0
If the first part of this clause were executed first, then only numeric table names would be cast as integers. However, it fails, providing a clear example that SQL Server (as with other databases) does not care about the order of conditions in the WHERE clause.
ANSI SQL Draft 2003 5WD-01-Framework-2003-09.pdf
6.3.3.3 Rule evaluation order
...
Where the precedence is not determined by the Formats or by parentheses, effective evaluation of expressions is generally performed from left to right. However, it is implementation-dependent whether expressions are actually evaluated left to right, particularly when operands or operators might cause conditions to be raised or if the results of the expressions can be determined without completely evaluating all parts of the expression.
copied from here
No, all the RDBMSs start by analysing the query and optimizing it, which may include reordering the conditions in your WHERE clause.
Depending on which RDBMS you are using, you can display the result of that analysis (search for EXPLAIN PLAN in Oracle, for instance).
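For example, in Oracle you could inspect the plan for the query from the question like this:
EXPLAIN PLAN FOR
SELECT * FROM PEOPLE WHERE LastName = 'L' AND FirstName = 'F';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);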
M.
It's true as far as it goes, assuming the names aren't indexed.
Different data could make it wrong, though. To work out which order is better, which could differ every time, the DBMS would have to run a distinct count for each column and compare the numbers; that would cost more than just shrugging and getting on with it.
Original OP statement:
"My belief is the second one is faster because the more unique criterion (LastName) comes first in the where clause, and records will get eliminated more efficiently. I don't think the optimizer is smart enough to optimize the first SQL."
I guess you are confusing this with choosing the order of columns when creating an index, where you put the most selective column first, then the second most selective, and so on.
By the way, for the above two queries the SQL Server optimizer will not do any such optimization; it will use a trivial plan as long as the total cost of the plan is below the cost threshold for parallelism.
I am investigating a problem where our application takes too much time to get data from an Oracle database. In my investigation, I found that the slowness of the query traces back to the join between tables and to the aggregate function SUM.
This may look simple, but I am not good at SQL query optimization.
The query is below
SELECT T1.TONNES, SUM(R.TONNES) AS TOTAL_TONNES
FROM
RECLAIMED R ,
(SELECT DELIVERY_OUT_ID, SUM(TONNES) AS TONNES FROM RECLAIMED WHERE DELIVERY_IN_ID=53773 GROUP BY DELIVERY_OUT_ID) T1
where
R.DELIVERY_OUT_ID = T1.DELIVERY_OUT_ID
GROUP BY
T1.TONNES
SUM(R.TONNES) is the total tonnes per delivery out.
SUM(TONNES) is the total tonnes per delivery in.
My table looks like
I have 16 million entries in this table, and trying several delivery_in_id values, the query takes about 6 seconds on average to come back.
I have a similar database (a complete copy, but with only 4 million entries), and the same query there takes less than 1 second.
They both have the same indexes, so I am confident the index is not the problem.
I am certain it is just the data; the first database (16 million) is heavy. I have a feeling that once this query is optimized the problem will be solved.
Open for suggestions : )
Are the two databases on the same server? If not, first compare the machine configuration, settings and running applications.
If there are no differences, check whether you have NULL values in the column you want to SUM. Use the NVL function to improve your query if there are some.
Also, you could analyse the index (or rebuild it). That cleans up the index (it's quite fast and safe for your data).
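As rough sketches of those two suggestions (the index name is purely an assumption):
-- NVL inside the aggregate, using the RECLAIMED table from the question
SELECT delivery_out_id, SUM(NVL(tonnes, 0)) AS tonnes
FROM reclaimed
WHERE delivery_in_id = 53773
GROUP BY delivery_out_id;

-- Rebuild a (hypothetical) index on the filter column
ALTER INDEX ix_reclaimed_delivery_in REBUILD;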
If that does not help, check whether the TABLESPACE of your table is full. It might have some impact... but I am not sure.
;-)
I've solved the performance problem by updating the stored procedure. It is optimized by filtering the first table before joining to the second. Below is the resulting query from the stored procedure:
SELECT R.DELIVERY_IN_ID, R.DELIVERY_OUT_ID, SUM(R.TONNES),
(SELECT SUM(TONNES) AS TONNES FROM RECLAIMED WHERE DELIVERY_OUT_ID=R.DELIVERY_OUT_ID) AS TOTAL_TONNES
FROM
CTSBT_RECLAIMED R
WHERE DELIVERY_IN_ID=53733
GROUP BY DELIVERY_IN_ID, R.DELIVERY_OUT_ID
The difference in timing/performance is huge in my case, since I am joining a huge table (16M rows). The query now runs in less than a second. The slowness, I suspect, was because the inline view T1 has no index; even though in my case it only holds about 20 rows, that is enough to slow the query down because it is compared against 16 million entries.
The optimized query filters the 16 million rows first and merges with T1 afterwards.
Is there a better way to optimize this? Probably. But I am happy with this result and it solves what I intended to solve. Now moving on.
Thanks for those who commented.
General Overview: I have an Oracle table 'product' that contains approximately 80 million records and I would like to improve the performance of joins that use this table. In most cases we are interested in a very small subset of records from (table) 'product' with (column) 'valid_until' date (value) 'mm/dd/9999'.
Possible solutions:
Partition on the 'mm/dd/9999' value and use partition exchange to quickly load new data.
Use an index on 'valid_until' date.
Do you guys have any other possible Oracle solutions or ideas?
Based on needing to find 1% of the records, I would expect an index to be adequate. It might also pay to include the table's PK in the index if the query only needs to find the PKs of the current products.
If there is no need to identify records by other valid_until dates, then it might be worth using Oracle's equivalent of a partial index by indexing on:
case valid_until
  when date '...whatever the date is...'
  then valid_until
  else null
end
... but that would mean changing the schema or the tool that generates the queries or both.
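Still, as a sketch of how that partial-index trick would look, assuming purely for illustration that the magic date is 31 December 9999 and inventing the index name:
CREATE INDEX product_current_ix ON product (
    CASE valid_until
        WHEN DATE '9999-12-31' THEN valid_until
        ELSE NULL
    END
);

-- The query must use the same expression for the index to be considered
SELECT *
FROM product
WHERE CASE valid_until
          WHEN DATE '9999-12-31' THEN valid_until
          ELSE NULL
      END = DATE '9999-12-31';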
You might keep an eye on the table's statistics to make sure that the cardinality of the selected rows is subject to a reasonably accurate estimation.
I wouldn't go for a partition-based solution as a first choice, as the overhead of row-migration during the update of the valid_until values would be fairly high, but if an index cannot deliver the query performance then by all means try.
I'm currently working on optimizing my database schema with regard to index structures. As I'd like to increase my DDL performance, I'm searching for potential drop candidates on my Oracle 12c system. Here's the scenario where I don't know what the consequences for query performance might be if I drop the index.
Given two indexes on the same table:
- non-unique, single column index IX_A (indexes column A)
- unique, combined index UQ_AB (indexes column A, then B)
Using index monitoring I found that the query optimizer didn't choose UQ_AB, but only IX_A (probably because it's smaller and thus faster to read). As UQ_AB contains column A and additionally column B, I'd like to drop IX_A. Though I'm not sure if I get any performance penalties if I do so. Does the higher selectivity of the combined unique index have any influence on the execution plans?
It could do, though it's quite likely to be minor (usually). Of course it depends on various things, for example how large the values in column B are.
You can look at various columns in USER_INDEXES to compare the two indexes, such as:
BLEVEL: tells you the "height" of the index tree (well, height is BLEVEL+1)
LEAF_BLOCKS: how many data blocks are occupied by the index values
DISTINCT_KEYS: how "selective" the index is
(You need to have analyzed the table first for these to be accurate). That will give you an idea of how much work Oracle needs to do to find a row using the index.
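For example, something along these lines will show those columns for the two indexes in question:
SELECT index_name, blevel, leaf_blocks, distinct_keys
FROM user_indexes
WHERE index_name IN ('IX_A', 'UQ_AB');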
Of course the only way to really be sure is to benchmark and compare timings or even trace output.
We have a huge table which currently has 144 million rows and is growing by about 1 million rows each day.
I would like to create a partitioned table on an Oracle 11g server, but I am not aware of the techniques. So I have two questions:
Is it possible to partition a table that doesn't have a PK?
What would you suggest for partitioning a table with this many records?
Yes, but keep in mind that the partition key must be part of the PK
Avoid global indexes
Choose the right partitioning key - have it prepared for future maintenance (cutting off the oldest or unnecessary partitions, placing them in separate tablespaces, etc.)
There are too many things to consider.
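Purely as a sketch of the kind of layout that makes that maintenance easy (the table, columns and dates below are invented, since the real structure wasn't posted):
-- Range partitioning on a date column, with a local index, so old
-- partitions can simply be dropped or moved to another tablespace
CREATE TABLE big_events (
    event_id     NUMBER,
    created_date DATE NOT NULL,
    payload      VARCHAR2(100)
)
PARTITION BY RANGE (created_date) (
    PARTITION p2014    VALUES LESS THAN (DATE '2015-01-01'),
    PARTITION p2015    VALUES LESS THAN (DATE '2016-01-01'),
    PARTITION p_future VALUES LESS THAN (MAXVALUE)
);

CREATE INDEX big_events_date_ix ON big_events (created_date) LOCAL;

-- Cheap removal of old data later on
ALTER TABLE big_events DROP PARTITION p2014;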
"There are several non-unique index on the table. But, the performance
is realy terrible! Just simple count function was return result after
5 minutes."
Partitioning is not necessarily a performance enhancer. The partition key will allow certain queries to benefit from partition pruning, i.e. those queries which drive off the partition key in the WHERE clause. Other queries may perform worse if their WHERE clause runs against the grain of the partition key.
It is difficult to give specific advice because the details you've posted are so vague. But here are some other possible ways of speeding up queries on big tables, each sketched briefly after the list:
index compression
parallel query
better, probably compound, indexes.
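Rough sketches of those three options; every object name here is invented for illustration:
-- Index key compression on an existing index
ALTER INDEX big_table_ix REBUILD COMPRESS 1;

-- Parallel query for a heavy scan such as the slow count
SELECT /*+ PARALLEL(t, 8) */ COUNT(*) FROM big_table t;

-- A compound (multi-column) index matching a common predicate
CREATE INDEX big_table_a_b_ix ON big_table (col_a, col_b) COMPRESS 1;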