MonetDB simple join performance on 2 tables - performance

Let's assume I have two tables of the same row count. Both tables contain a column that allows for 1-1 join between them.
If those tables were turned into one table instead and thus JOIN statement eliminated from the query, would there be any performance benefit of that?
Another example... Let's assume I have table with 10 columns. From that table I created new table but only taking one column. If I issue statement selecting that one column with WHERE predicate on the same column would there be any performance difference in executing this query on both tables?
What I'm trying to get to is if performance is the same in above cases is it safe to say tables are only containers wrapping number of columns together?
I did run couple tests but with non conclusive results.

Let's assume I have two tables of the same row count. Both tables
contain a column that allows for 1-1 join between them. If those
tables were turned into one table instead and thus JOIN statement
eliminated from the query, would there be any performance benefit of
that?
Performing that join for every query is of course more expensive than materializing the table once and then reading it. So yes, there would be a performance benefit.
Another example... Let's assume I have table with 10 columns. From
that table I created new table but only taking one column. If I issue
statement selecting that one column with WHERE predicate on the same
column would there be any performance difference in executing this
query on both tables?
No, there would be no difference, since tables are represented as collections of columns, which are each stored in their own file.
What I'm trying to get to is if performance is the same in above cases
is it safe to say tables are only containers wrapping number of
columns together?
That is indeed safe to say.

Related

Why Access and Filter Predicates are the same here?

When I get the autotrace output of the query above using the Oracle SQL Developer, I see that the join condition is used for access and filter predicates. My question is, does it read all the department_ids from the DEPT_ID_PK and then use these IDs to access and filter the employees table? If so, why the employees table has full table scan? Why does it read the employees table again by using the department_ids of the departments table? Could anyone please read this execution plan step by step simply, and explain the reason why the access and filter predicates are used here?
Best Regards
it is a merge join (a bit like hash join, Merge join is used when projections of the joined tables are sorted on the join columns. Merge joins are faster and uses less memory than hash joins).
so Oracle do a full table scan of in outer table (EMPLOYEES) and the it read the inner table in a ordred manner.
the filtre predicates is the column on which the projection will be done
more details: https://datacadamia.com/db/oracle/merge_join
It uses the primary key to avoid sorting, otherwise the plan would be like this
The distinction between "Access predicates" and "Filter predicates" is not particularly consistent, so take them with healthy amount of skepticism. For example, if you remove the USE_MERGE hint, then there would be no Fiter Predicates in the plan any more, and the Access Predicates node would be relocated under the HASH_JOIN node (where it makes more sense for MERGE_JOIN as well):

Is there any use to create index on all the table columns in oracle?

In our one of production database, we have 4 column table and there are no PK,UK constraints on it. only one notnull constraint on one column. The inserts are slow on this table and when I checked the indexes , there is one index which is built on all columns.
It is a normal table and not IOT. I really don't see a need of all column index, but wondering why the developers has created it?
Appreciate your thoughts?
It might be usefull, i.e. if you (mainly) query all columns oracle doesn't have to access the table at all, but can get all the data from the index. Though inserts take longer because a larger index has to be maintained by the dbms everytime.
One case where it could be useful is,
Say for example, you are trying to check the existence of records in this table and for that you have to have joins on all four columns. So in such a case if you have written a correlated query like below,
SELECT <something>
FROM table_1 t1
WHERE EXISTS
(SELECT 1 FROM table_t2 t2 where t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3 and t1.c4=t2.c4)
Apart from above case, it looks an error to me from developer's side.
Indexes are good to better query optimization but causes slow updates/inserts because the indexes needs to be updated at each modification.
If these tables first use is querying and inserts happens only in a specific periods like a batch at the beginning or the end of the day only, then you can remove the indexes before updating tables and then restore them.
In addition, all the queries all these tables need to be analysed to see which indexes are useful and which are not?
Anyway, You need to ask developers before removing these indexes.

comparing data in two tables taking time

I need to query table1 find all orders and created date ( key is order number an date)).
In table 2 ( key is order number an date) Check if the order exists for a a date.
For this i am scanning table 1 and for each record checking if it exists in table 2. Any better way to do this
In this situation in which your key is identical for both tables, it makes sense to have a single table in which you store both data for Table 1 and Table 2. In that way you can do a single scan on your data and know straight away if the data exists for both criteria.
Even more so, if you want to use this data in MapReduce, you would simply scan that single table. If you only want to get the relevant rows, you could define a filter on the Scan. For example, in the case where you will not be populating rows at all in Table 2, you would simply use a ColumnPrefixFilter
If, however, you do need to keep this data separately in 2 tables, you could pre-split the tables with the same region boundaries for both tables - this will be helpful when you do the query that you are aiming for - load all rows in Table 1 when row exists in Table 2. Essentially this would be a map-side join. You could define multiple inputs in your MapReduce job, and since the region borders are the same, the splits will be such that each mapper will have corresponding rows from both tables. You would probably need to implement your own MultipleInput format for that (the MultiTableInputFormat class recently introduced in 0.96 does not seem to do that map side join)

Deleting duplicate rows in oracle

Shouldn't the following query work fine for deleting duplicate rows in oracle
SQL> delete from sessions o
where 1<=(select count(*)
from sessions i
where i.id!=o.id
and o.data=i.data);
It seems to delete all the duplication rows!! (I wish to keep 1 tough)
Your statement doesn't work because your table has at least one row where two different ID's share the same values for DATA.
Although your intent may be to look for differing values of DATA ID by ID, what your SQL is saying is in fact set-based: "Look at my table as a whole. If there are any rows in the table such that the DATA is the same but the ID's are different (i.e., that inner COUNT(*) is anything greater than 0), then DELETE every row in the table."
You may be attempting specific, row-based logic, but your statement is big-picture (set-based). There's nothing in it to single out duplicate rows, as there is in the solution Ollie has linked to, for example.

Query does cartesian join unless

I've got a query that's supposed to return 2 rows. However, it returns 48 rows. It's acting like one of the tables that's being joined isn't there. But if I add a column from that table to the select clause, with no changes to the from or where parts of the query, it returns 2 rows.
Here's what "Explain plan" says without the "m.*" in the select:
Here it is again after adding m.* in the select:
Can anybody explain why it should behave this way?
Update: We only had this problem on one system and not another. The DBA verified that the one with the problem is running optimizer_features_enable set to 10.2.0.5, and the one where it doesn't happen is running optimizer_features_enable set to 10.2.0.4. Unfortunately the customer site is running 10.2.0.5.
It's about a join elimination that was introduced in 10gR2:
Table elimination (alternately called
"join elimination") removes redundant
tables from a query. A table is
redundant if its columns are only
referenced to in join predicates, and
it is guaranteed that those joins
neither filter nor expand the
resulting rows. There are several
cases where Oracle will eliminate a
redundant table.
Maybe that's kind of related bug or so. Have a look at this article.
Looks like a bug. What are the constraints ?
Logically, if all rows in MASTERSOURCE_FUNCTION had the function NON-OSDA then that wouldn't exclude any rows (or if none had that value, then all rows would be excluded).
Going one step further, if every row in MASTERSOURCE had one or zero NON-OSDA rows in MASTERSOURCE_FUNCTION, then it should be a candidate for exclusion. But there would also need to be a one-to-one between the MASTERSOURCE ID and NAME.
I'd pull the ROWIDs from ACCOUNTSOURCE for the 48 rows, then track the MASTERSOURCE ID and NAME and see on what grounds those rows are being duplicated or not excluded. That is, are there 12 duplicate names in MASTERSOURCE where it is expected to be unique through a NOVALIDATE constraint.

Resources