Query does cartesian join unless - oracle

I've got a query that's supposed to return 2 rows. However, it returns 48 rows. It's acting like one of the tables that's being joined isn't there. But if I add a column from that table to the select clause, with no changes to the from or where parts of the query, it returns 2 rows.
Here's what "Explain plan" says without the "m.*" in the select:
Here it is again after adding m.* in the select:
Can anybody explain why it should behave this way?
Update: We only had this problem on one system and not another. The DBA verified that the one with the problem is running optimizer_features_enable set to 10.2.0.5, and the one where it doesn't happen is running optimizer_features_enable set to 10.2.0.4. Unfortunately the customer site is running 10.2.0.5.

It's about a join elimination that was introduced in 10gR2:
Table elimination (alternately called
"join elimination") removes redundant
tables from a query. A table is
redundant if its columns are only
referenced to in join predicates, and
it is guaranteed that those joins
neither filter nor expand the
resulting rows. There are several
cases where Oracle will eliminate a
redundant table.
Maybe that's kind of related bug or so. Have a look at this article.

Looks like a bug. What are the constraints ?
Logically, if all rows in MASTERSOURCE_FUNCTION had the function NON-OSDA then that wouldn't exclude any rows (or if none had that value, then all rows would be excluded).
Going one step further, if every row in MASTERSOURCE had one or zero NON-OSDA rows in MASTERSOURCE_FUNCTION, then it should be a candidate for exclusion. But there would also need to be a one-to-one between the MASTERSOURCE ID and NAME.
I'd pull the ROWIDs from ACCOUNTSOURCE for the 48 rows, then track the MASTERSOURCE ID and NAME and see on what grounds those rows are being duplicated or not excluded. That is, are there 12 duplicate names in MASTERSOURCE where it is expected to be unique through a NOVALIDATE constraint.

Related

MonetDB simple join performance on 2 tables

Let's assume I have two tables of the same row count. Both tables contain a column that allows for 1-1 join between them.
If those tables were turned into one table instead and thus JOIN statement eliminated from the query, would there be any performance benefit of that?
Another example... Let's assume I have table with 10 columns. From that table I created new table but only taking one column. If I issue statement selecting that one column with WHERE predicate on the same column would there be any performance difference in executing this query on both tables?
What I'm trying to get to is if performance is the same in above cases is it safe to say tables are only containers wrapping number of columns together?
I did run couple tests but with non conclusive results.
Let's assume I have two tables of the same row count. Both tables
contain a column that allows for 1-1 join between them. If those
tables were turned into one table instead and thus JOIN statement
eliminated from the query, would there be any performance benefit of
that?
Performing that join for every query is of course more expensive than materializing the table once and then reading it. So yes, there would be a performance benefit.
Another example... Let's assume I have table with 10 columns. From
that table I created new table but only taking one column. If I issue
statement selecting that one column with WHERE predicate on the same
column would there be any performance difference in executing this
query on both tables?
No, there would be no difference, since tables are represented as collections of columns, which are each stored in their own file.
What I'm trying to get to is if performance is the same in above cases
is it safe to say tables are only containers wrapping number of
columns together?
That is indeed safe to say.

Is there any use to create index on all the table columns in oracle?

In our one of production database, we have 4 column table and there are no PK,UK constraints on it. only one notnull constraint on one column. The inserts are slow on this table and when I checked the indexes , there is one index which is built on all columns.
It is a normal table and not IOT. I really don't see a need of all column index, but wondering why the developers has created it?
Appreciate your thoughts?
It might be usefull, i.e. if you (mainly) query all columns oracle doesn't have to access the table at all, but can get all the data from the index. Though inserts take longer because a larger index has to be maintained by the dbms everytime.
One case where it could be useful is,
Say for example, you are trying to check the existence of records in this table and for that you have to have joins on all four columns. So in such a case if you have written a correlated query like below,
SELECT <something>
FROM table_1 t1
WHERE EXISTS
(SELECT 1 FROM table_t2 t2 where t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3 and t1.c4=t2.c4)
Apart from above case, it looks an error to me from developer's side.
Indexes are good to better query optimization but causes slow updates/inserts because the indexes needs to be updated at each modification.
If these tables first use is querying and inserts happens only in a specific periods like a batch at the beginning or the end of the day only, then you can remove the indexes before updating tables and then restore them.
In addition, all the queries all these tables need to be analysed to see which indexes are useful and which are not?
Anyway, You need to ask developers before removing these indexes.

Deleting duplicate rows in oracle

Shouldn't the following query work fine for deleting duplicate rows in oracle
SQL> delete from sessions o
where 1<=(select count(*)
from sessions i
where i.id!=o.id
and o.data=i.data);
It seems to delete all the duplication rows!! (I wish to keep 1 tough)
Your statement doesn't work because your table has at least one row where two different ID's share the same values for DATA.
Although your intent may be to look for differing values of DATA ID by ID, what your SQL is saying is in fact set-based: "Look at my table as a whole. If there are any rows in the table such that the DATA is the same but the ID's are different (i.e., that inner COUNT(*) is anything greater than 0), then DELETE every row in the table."
You may be attempting specific, row-based logic, but your statement is big-picture (set-based). There's nothing in it to single out duplicate rows, as there is in the solution Ollie has linked to, for example.

Slow query on view when using single column order by clause

I have a view that joins 24 tables (all but 1 are left outer joins) and 83 columns. When I Select * from the view without an order by clause it returns 27k rows all columns in about 4:27 seconds. If I do the same select but add a 'order by requestId' clause it takes 83 minutes to complete.
The column being ordered by is indexed in the original table.
I've tried wrapping it in a Select * from (.......) order by requestId but get the same results.
Suggestions on where to look
Explain might tell you more, if you have the time to wade through it, but guessing I'd say it's doing a full sort of all 27,000 rows as it can't find a useful ordered index to avoid the extra sort.
It will be hard to spot amongst what you have but a simple scenario would be
TableA KeyColumn,DataColumn, where key column is the primary key
Select * From TableA Order By KeyColumn, will use the PK index which is in order, so no sort required.
select * From TableA Order By DataColumn, will read the table and then do a sort.
Add an Index for Datacolumn, and the sort won't be required.
As soon as you get to more complex scenarios, it might be that you have a useful index for ordering, but it's not the best for joining, so it does the join fast, and then spends all the time ordering.
If I was looking at this and what to do didn't leap out at me eg. no index at all for requestid, then I'd start chopping tables out of the query until I stopped getting the undesirable behaviour. Then put one back in to get it back, and then use this hopefully less arduous query and explain and see if I could get a usful index in or restate the query to use a more useful index.
Best of luck.
If you have order by on a column, then that column must be part of either:
- An index of its own where only that column exists
- Or in an index that has the fields in the WHERE clauses and then the fields in the ORDER BY clause in the exact order.
Best if you show me the query. Then I can brainstorm with you.

How can I speed up a diff between tables?

I am working on doing a diff between tables in postgresql, it takes a long time, as each table is ~13GB...
My current query is:
SELECT * FROM tableA EXCEPT SELECT * FROM tableB;
and
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
When I do a diff on the two (unindexed) tables it takes 1:40 hours (1 hour and 40 minutes) In order to get both the new and removed rows I need to run the query twice, bringing the total time to 3:30 hours.
I ran the Postgresql EXPLAIN query on it to see what it was doing. It looks like it is sorting the first table, then the second, then comparing them. Well that made me think that if I indexed the tables they would be presorted and the diff query would be much faster.
Indexing each table took 45 minutes. Once Indexed, each Diff took 1:35 hours.
Why do the indexes only shave off 5 minutes off the total diff time? I would assume that it would be more than half, since in the unindexed queries I am sorting each table twice (I need to run the query twice)
Since one of these tables will not be changing much, it will only need to be indexed once, the other will be updated daily. So the total runtime for the indexed method is 45 minutes for the index, plus 2x 1:35 for the diff, giving a total of 3:55 hours, almost 4hours.
What am I doing wrong here, I can't possibly see why with the index my net diff time is larger than without it?
This is in slight reference to my other question here: Postgresql UNION takes 10 times as long as running the individual queries
EDIT:
Here is the schema for the two tables, they are identical except the table name.
CREATE TABLE bulk.blue
(
"partA" text NOT NULL,
"type" text NOT NULL,
"partB" text NOT NULL
)
WITH (
OIDS=FALSE
);
In the statements above you are not using the indexes.
You could do something like:
SELECT * FROM tableA a
FULL OUTER JOIN tableB b ON a.someID = b.someID
You could then use the same statement to show which tables had missing values
SELECT * FROM tableA a
FULL OUTER JOIN tableB b ON a.someID = b.someID
WHERE ISNULL(a.someID) OR ISNULL(b.someID)
This should give you the rows that were missing in table A OR table B
Confirm you indexes are being used (they are likely not in such a generic except statement), but you are not joining against a specified column(s) so likely that lack of explicit join will not make for an optimized query:
http://www.postgresql.org/docs/9.0/static/indexes-examine.html
This will help you view the explain analyze more clearly:
http://explain.depesz.com
Also, make sure you do an analyze on the table after you create the index if you want it to perform well right away:}
The queries as specified require a comparison of every column of the tables.
For example if tableA and tableB each have five columns then the query is having to compare tableA.col1 to tableB.col1, tableA.col2 to tableB.col2, . . . tableA.col5 to tableB.col5
If there are just few columns that uniquely identify a record instead of all the columnS in the table then joining the tables on the specific columns that uniquely identify a record will improve your performance.
The above statement assumes that a primary key has not been created. If a primary key has been defined to indicated which columns uniquely identify a record then I believe the EXCEPT statement would take that into consideration.
What kind of index did you apply? Indexes are only useful to improve WHERE conditions. If you're doing a select *, you're grabbing all the fields and the index is probably not doing anything, but taking up space, and adding a little more processing behind the scenes for the db-engine to compare the query to the index cache.
Instead of SELECT *, you can try selecting your unique fields and create an index for those unique fields
You can also use an OUTER JOIN to show results from both tables that did not match on the unique fields
You may want to consider is clustering your tables
What version of Postgres are you running?
When was the last time you vacuumed?
Other than the above, 13GB is pretty large, so you'll want to check your config settings. It shouldn't take hours to run that, unless you don't have enough memory on your system.

Resources