Table joins with BETWEEN clause - performance

I want to fetch a column from one table using a BETWEEN condition, as in the query below. I joined the tables, but it takes a lot of time when the tables have 100k records. Is there any way to rewrite this?
I need a.grade in the result if my value lies between a.low and a.high; there can be many matches for one value.
Select a.grade, b.val
From tbl1 a, tbl2 b
Where b.val between a.low and a.high;
Also, I have an index on (low, high), but the optimiser is not using it.
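A quick, hedged experiment rather than a fix: force the index with hints and compare the two plans. The index name ix_low_high is illustrative; substitute the real name of your (low, high) index.
SELECT /*+ USE_NL(a) INDEX(a ix_low_high) */ a.grade, b.val
FROM tbl1 a, tbl2 b
WHERE b.val BETWEEN a.low AND a.high;
With a BETWEEN join, the index can only be range-scanned on low (the high bound becomes a filter), so the optimizer may legitimately estimate a full scan as cheaper; comparing the costs of the hinted and unhinted plans shows whether that estimate holds for your data.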

Oracle Composite Index Performance

My sample Oracle query structure is this:
SELECT <LIST_OF_COLUMNS> FROM <TABLE_NAME>
WHERE COLUMN_01 = <SOMETHING> AND COLUMN_02 = <SOMETHING> AND COLUMN_03 = <SOMETHING>
The table has over 1 million records. I have indexed COLUMN_01, COLUMN_02 and COLUMN_03 separately. The above query works fine and provides results as expected.
If I make COLUMN_01, COLUMN_02 and COLUMN_03 (all columns in WHERE clause) as composite index without changing existing indexes, will it improve performance? If so, is there an order for composite index columns?
If I use OR instead of AND like this query, will it improve performance?
SELECT <LIST_OF_COLUMNS> FROM <TABLE_NAME>
WHERE COLUMN_01 = <SOMETHING> OR COLUMN_02 = <SOMETHING> OR COLUMN_03 = <SOMETHING>
If you create a composite index based on the three columns AND you never query the table using only one column in the WHERE clause, you won't need the single-column indexes. Otherwise, it depends on which columns participate in a query, and this case should be well analyzed and tested.
Column order in a composite index does matter: columns should be ordered by uniqueness, with the least distinct column first. This helps trim down the number of rows matching the query predicate and thus speeds up performance.
It should also be noted that Oracle can use a composite index for queries that do not reference all of the index's columns in their predicates.
For example:
create index idx1 on table_name (col1, col2, col3);

/* In this query Oracle can use index idx1 as a standard one-column index,
   because col1 is the first column in the index. */
select * from table_name where col1 = 'some_value';

/* Here Oracle can still use the composite index, but in this case it will
   use an INDEX SKIP SCAN (logically probing the index once per distinct
   value of col1), which reduces query performance compared to an ordinary
   index. */
select * from table_name where col2 = 'some_value1' and col3 = 'some_value2';
The OR and AND operators do not really matter here. What matters more is the number of rows that match the given predicate.
If I make COLUMN_01, COLUMN_02 and COLUMN_03 (all columns in WHERE clause) as composite index without changing existing indexes, will it improve performance?
Probably. One index that satisfies all the WHERE criteria serves as a complete access path, and is hence more effective than a single-column index access path. With only single-column indexes, the optimizer picks one of them, reads all the rows matching (say) the COLUMN_02 criterion through that index, and then filters those rows using the other columns' criteria.
The price you pay for this improvement is the overhead of maintaining an additional index, so you should consider whether you still need all three single-column indexes (for other queries).
is there an order for composite index columns?
Yes. Put them in ascending order of distinct values. The leading index column should be the least discriminating column. Having a unique key as the leading column is probably a disaster, although there are edge cases, so be sure to benchmark.
If I use OR instead of AND like this query, will it improve performance?
You're going to be returning more rows, which in itself is more work. It is also hard to use indexes in such a situation, so most likely you're facing a Full Table Scan. But why not try it and see what happens?
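A common workaround worth testing, sketched here with the question's own placeholders: rewrite the OR as a UNION so that each branch can use its own single-column index (some optimizers can perform this rewrite, known as OR-expansion, on their own). Note that UNION also de-duplicates rows, so verify the results match your original query.
SELECT <LIST_OF_COLUMNS> FROM <TABLE_NAME> WHERE COLUMN_01 = <SOMETHING>
UNION
SELECT <LIST_OF_COLUMNS> FROM <TABLE_NAME> WHERE COLUMN_02 = <SOMETHING>
UNION
SELECT <LIST_OF_COLUMNS> FROM <TABLE_NAME> WHERE COLUMN_03 = <SOMETHING>;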
If I make COLUMN_01, COLUMN_02 and COLUMN_03 (all columns in WHERE clause) as composite index without changing existing indexes, will it improve performance?
For this query: likely. For INSERT/UPDATE/DELETE: the performance will deteriorate.
So you'll need to measure and see whether improvement in some queries justifies the deterioration in others.
If so, is there an order for composite index columns?
Not for this query. If you prefix-compress the index, you may choose the order that compresses best; otherwise it shouldn't matter much.
However, there may be other queries that use only some of the indexed columns, in which case you'd want to make sure the columns that are actually used are at the leading edge of the index.
If I use OR instead of AND like this query, will it improve performance?
No. Separate indexes (that you already have) are what is needed in this case.

fact table is being populated with too many records

There are 62,000 records in my fact table, which is not correct, because I only have six records in my time dim, 240 records in my student dim and 140 records in my placement dim. Does it have something to do with my WHERE clause? Any help would be much appreciated.
INSERT INTO fact_placements (
report_id,
no_of_placements,
no_of_students,
fk1_time_id,
fk2_placement_id,
fk3_student_id )
SELECT
fact_seq.nextval,
no_of_placements,
no_of_students,
time_id,
placement_id,
student_id
FROM
time_dim,
placement_dim,
student_dim
WHERE
placement_dim.year = time_dim.year AND
student_dim.year = time_dim.year;
Unless you do a cartesian join, i.e. one without any WHERE clause, you will get fewer than 140 (placement) * 240 (student) * 6 (time) = 201,600 fact records. Your current SQL joins the 3 tables on the year column, and that join condition is what filters the result down to the 62,000 records you are getting.
Your question title says that even this is "too many". If that is the case, then you need to understand the granularity of your dimensions and of the fact before joining them on any criteria. Are these all at the "year" level? If so, do you have one record per year in each of these tables and no duplicates based on year?
If not, you might need to re-think the fact table's granularity, or alternatively join unique records based on year in each dimension to get the actual (smaller) number of records you are expecting, which can also be done by summarizing these tables based on year, as sketched below.
Ideally the fact table contains the combinations of the dimension keys plus the factual metrics as additional columns (in this case no_of_placements and no_of_students). But depending on the available data, not all combinations will be present in the fact table.
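A minimal sketch of that summarize-then-join approach, assuming no_of_placements lives in placement_dim and no_of_students in student_dim; the MIN(...) surrogate keys are purely illustrative, so pick whichever key rule fits your model:
INSERT INTO fact_placements (
    report_id, no_of_placements, no_of_students,
    fk1_time_id, fk2_placement_id, fk3_student_id )
SELECT
    fact_seq.nextval,
    p.no_of_placements,
    s.no_of_students,
    t.time_id,
    p.placement_id,
    s.student_id
FROM time_dim t
JOIN ( SELECT year,
              MIN(placement_id)     AS placement_id,
              SUM(no_of_placements) AS no_of_placements
       FROM placement_dim
       GROUP BY year ) p ON p.year = t.year
JOIN ( SELECT year,
              MIN(student_id)     AS student_id,
              SUM(no_of_students) AS no_of_students
       FROM student_dim
       GROUP BY year ) s ON s.year = t.year;
This yields at most one fact row per time_dim row, because each dimension has been collapsed to one row per year before the join.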
Also, you might want to change the SQL syntax to use the INNER JOIN clause instead of the implicit joins written as commas between table names in the FROM clause, as shown below:
FROM time_dim
INNER JOIN placement_dim
        ON placement_dim.year = time_dim.year
INNER JOIN student_dim
        ON placement_dim.year = student_dim.year
There's no relationship between placement and student; that's why you have so many records.
Your query is saying: give me all the students and all the placements where the year is the same.
I'm not sure that's what you want. What is really strange here is that you are loading a fact table from dimension tables.

MonetDB simple join performance on 2 tables

Let's assume I have two tables with the same row count. Both tables contain a column that allows for a 1-1 join between them.
If those tables were turned into one table instead, and the JOIN thus eliminated from the query, would there be any performance benefit?
Another example... Let's assume I have a table with 10 columns. From that table I created a new table, taking only one column. If I issue a statement selecting that one column with a WHERE predicate on the same column, would there be any performance difference in executing this query on both tables?
What I'm trying to get at is: if performance is the same in the above cases, is it safe to say tables are only containers wrapping a number of columns together?
I ran a couple of tests, but with inconclusive results.
Let's assume I have two tables with the same row count. Both tables contain a column that allows for a 1-1 join between them. If those tables were turned into one table instead, and the JOIN thus eliminated from the query, would there be any performance benefit?
Performing that join for every query is of course more expensive than materializing the table once and then reading it. So yes, there would be a performance benefit.
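A minimal sketch of that one-off materialization, with illustrative table and column names (using MonetDB's CREATE TABLE ... AS syntax):
CREATE TABLE combined AS
SELECT a.*, b.extra_col
FROM t1 a
JOIN t2 b ON b.id = a.id
WITH DATA;
Queries then read combined directly, paying the join cost once at load time instead of on every query.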
Another example... Let's assume I have a table with 10 columns. From that table I created a new table, taking only one column. If I issue a statement selecting that one column with a WHERE predicate on the same column, would there be any performance difference in executing this query on both tables?
No, there would be no difference, since MonetDB represents tables as collections of columns, each stored in its own file.
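Illustratively (table names are made up), both of these statements touch only the storage for col1, so the width of the rest of the table is irrelevant:
SELECT col1 FROM wide_table WHERE col1 > 42;
SELECT col1 FROM narrow_table WHERE col1 > 42;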
What I'm trying to get at is: if performance is the same in the above cases, is it safe to say tables are only containers wrapping a number of columns together?
That is indeed safe to say.

Wrong index is chosen by Oracle

I have a problem with indexing in Oracle. I will try to explain my problem with an example, as follows.
I have a table TABLE1 with columns A, B, C, D
and another table TABLE2 with columns A, B, C, E, F, H.
I have created these indexes for TABLE1:
IX_1 A
IX_2 A,B
IX_3 A,C
IX_4 A,B,C
I have created these indexes for TABLE2:
IY_1 A,B,C
IY_2 A
When I run a query similar to this:
SELECT * FROM TABLE1 T1, TABLE2 T2
WHERE T1.A = T2.A
the explain plan shows that it is using neither IX_1 nor IY_2; it is taking IX_4 and IY_1.
Why is it not picking the right index?
EDITED:
Can anyone help me understand the difference between INDEX RANGE SCAN, INDEX UNIQUE SCAN and INDEX SKIP SCAN?
I guess SKIP SCAN means that a column of a composite index is skipped by Oracle; about the others I have no idea!
The main benefit of indexes is that you can select a few rows from a table without scanning the entire table.
If you ask for too many rows (let's say 30%; it depends on many things), the engine will prefer to scan the entire table for those rows.
That's because reading a row through an index incurs an overhead: reading some index blocks, and after that reading table blocks.
In your case, in order to join tables T1 and T2, Oracle needs all the rows from those tables. Reading the full index would be a useless operation, adding unnecessary cost.
UPDATE: A step further: if you run:
SELECT T1.B, T2.B FROM TABLE1 T1, TABLE2 T2
WHERE T1.A = T2.A
Oracle will probably use the indexes (IX_2 and IY_1), because it does not need to read anything from the tables: the values T1.B and T2.B are already present in those indexes.
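To see which access path Oracle actually picks, the standard EXPLAIN PLAN / DBMS_XPLAN combination works here:
EXPLAIN PLAN FOR
SELECT T1.B, T2.B FROM TABLE1 T1, TABLE2 T2 WHERE T1.A = T2.A;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
The plan output names the access path (INDEX RANGE SCAN, INDEX UNIQUE SCAN, INDEX SKIP SCAN or a full scan) next to each index it uses.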

How can I speed up a diff between tables?

I am working on doing a diff between tables in PostgreSQL; it takes a long time, as each table is ~13GB.
My current query is:
SELECT * FROM tableA EXCEPT SELECT * FROM tableB;
and
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
When I do a diff on the two (unindexed) tables it takes 1:40 hours (1 hour and 40 minutes). To get both the new and the removed rows I need to run the query twice, bringing the total time to 3:20 hours.
I ran EXPLAIN on the query to see what it was doing. It looks like it is sorting the first table, then the second, then comparing them. That made me think that if I indexed the tables they would be pre-sorted, and the diff query would be much faster.
Indexing each table took 45 minutes. Once indexed, each diff took 1:35 hours.
Why do the indexes shave only 5 minutes off each diff? I would assume it would be more than half, since in the unindexed queries I am sorting each table twice (I need to run the query twice).
Since one of these tables will not change much, it will only need to be indexed once; the other will be updated daily. So the total runtime for the indexed method is 45 minutes for the indexing plus 2 x 1:35 for the diffs, giving a total of 3:55 hours, almost 4 hours.
What am I doing wrong here? I can't see why, with the index, my net diff time is larger than without it.
This is in slight reference to my other question here: Postgresql UNION takes 10 times as long as running the individual queries
EDIT:
Here is the schema for the two tables; they are identical except for the table name.
CREATE TABLE bulk.blue
(
    "partA" text NOT NULL,
    "type"  text NOT NULL,
    "partB" text NOT NULL
)
WITH (
    OIDS=FALSE
);
In the statements above you are not using the indexes.
You could do something like:
SELECT * FROM tableA a
FULL OUTER JOIN tableB b ON a.someID = b.someID
You could then use the same statement to show which table had missing values:
SELECT * FROM tableA a
FULL OUTER JOIN tableB b ON a.someID = b.someID
WHERE a.someID IS NULL OR b.someID IS NULL
This should give you the rows that are missing from table A OR table B.
Confirm your indexes are being used (they likely are not in such a generic EXCEPT statement). You are not joining on a specified column or columns, so that lack of an explicit join will likely not make for an optimized query:
http://www.postgresql.org/docs/9.0/static/indexes-examine.html
This will help you view the EXPLAIN ANALYZE output more clearly:
http://explain.depesz.com
Also, make sure you run ANALYZE on the table after you create the index if you want it to perform well right away:
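For example, using the question's table names:
ANALYZE tableA;
ANALYZE tableB;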
The queries as specified require a comparison of every column of the tables.
For example, if tableA and tableB each have five columns, then the query has to compare tableA.col1 to tableB.col1, tableA.col2 to tableB.col2, ..., tableA.col5 to tableB.col5.
If there are just a few columns that uniquely identify a record, instead of all the columns in the table, then joining the tables on those specific columns will improve your performance.
The above assumes that a primary key has not been created. If a primary key has been defined to indicate which columns uniquely identify a record, then I believe the EXCEPT statement would take that into consideration.
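A hedged sketch of such a key-based diff against the posted schema, assuming ("partA", "type") uniquely identifies a row (the schema does not declare this, so verify it first; bulk.green stands in for the second table's name):
SELECT a.*
FROM bulk.blue a
LEFT JOIN bulk.green b
       ON b."partA" = a."partA"
      AND b."type"  = a."type"
WHERE b."partA" IS NULL;
Swap the table roles to get the rows that exist only in the other table.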
What kind of index did you apply? Indexes are mainly useful for WHERE conditions. If you're doing a SELECT *, you're grabbing all the fields, and the index is probably not doing anything except taking up space and adding a little processing behind the scenes for the db engine to compare the query to the index cache.
Instead of SELECT *, you can try selecting only your unique fields and creating an index on those unique fields.
You can also use an OUTER JOIN to show results from both tables that did not match on the unique fields.
You may also want to consider clustering your tables.
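In PostgreSQL that is a one-off physical reorder of the table along one of its indexes (the index name here is illustrative):
CLUSTER tableA USING tablea_unique_idx;
Note that CLUSTER takes an exclusive lock while it rewrites the table, and the ordering is not maintained for rows inserted later.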
What version of Postgres are you running?
When was the last time you vacuumed?
Other than the above, 13GB is pretty large, so you'll want to check your config settings. It shouldn't take hours to run this, unless you don't have enough memory on your system.
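For sort-heavy statements like these EXCEPT queries, the first setting to look at is probably work_mem, e.g. at session level (the value is illustrative; size it to your RAM):
SET work_mem = '1GB';
If work_mem is too small, the sorts spill to disk, which would go a long way toward explaining runtimes measured in hours.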
