Why are Access and Filter Predicates the same here? - Oracle

When I get the autotrace output of the query above using Oracle SQL Developer, I see that the join condition is used for both the access and filter predicates. My question is: does it read all the department_ids from DEPT_ID_PK and then use these IDs to access and filter the employees table? If so, why does the employees table get a full table scan? Why does it read the employees table again using the department_ids of the departments table? Could anyone please walk through this execution plan step by step and explain why the access and filter predicates are used here?
Best Regards

It is a merge join (a bit like a hash join; a merge join is used when the projections of the joined tables are sorted on the join columns, and merge joins are faster and use less memory than hash joins).
So Oracle does a full table scan of the outer table (EMPLOYEES) and then reads the inner table in an ordered manner.
The filter predicate is the column on which the projection will be done.
more details: https://datacadamia.com/db/oracle/merge_join

It uses the primary key to avoid sorting; otherwise, the plan would be like this:
The distinction between "Access predicates" and "Filter predicates" is not particularly consistent, so take them with a healthy amount of skepticism. For example, if you remove the USE_MERGE hint, then there would be no Filter Predicates in the plan any more, and the Access Predicates node would be relocated under the HASH_JOIN node (where it makes more sense for MERGE_JOIN as well):
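A minimal sketch of that experiment, assuming the HR-style EMPLOYEES and DEPARTMENTS tables from the question:
EXPLAIN PLAN FOR
SELECT /*+ USE_MERGE(e d) */ e.last_name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- Re-run without the USE_MERGE hint: the join node typically becomes a
-- HASH JOIN and the Filter Predicates entry disappears from the plan.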

Related

Data consistency between Oracle tables

I have one big table A which has PK (C1, C2, C3) and many other columns. To make selects faster, a smaller table B was created with PK (C1, C2), so we can do a select by joining the two tables to find a row in A.
But the problem now is that data in B can become corrupted, so the join select returns nothing while we still have a row in A.
Am I doing something wrong with this design, and how can I ensure the data in those two tables stays consistent?
Thanks a lot.
The standard way - if those tables are in a master-detail relationship - is to create a foreign key constraint, which will prevent deleting the master while details exist.
If you can fix the data now, do it - then create the constraint.
If you can't, then create the foreign key constraint using the INITIALLY DEFERRED DEFERRABLE option (together with ENABLE NOVALIDATE) so that current values aren't checked, but future DML will be.
Finally, to fetch data although certain rows don't exist any more, use outer join.
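A minimal sketch of that constraint, assuming B is the master (PK on C1, C2) and A is the detail; the constraint name is hypothetical:
ALTER TABLE a
  ADD CONSTRAINT a_b_fk FOREIGN KEY (c1, c2)
  REFERENCES b (c1, c2)
  INITIALLY DEFERRED DEFERRABLE
  ENABLE NOVALIDATE;
-- NOVALIDATE: existing rows are not checked, so the corrupt data
-- doesn't block constraint creation; future DML on A must match B,
-- with the check deferred to commit time.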
"Am I doing something wrong with this design"
Well, it's hard to be sure without more details about your scenario, but probably you just needed a non-unique index on A(C1, C2).
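For instance, a one-line sketch (the index name is hypothetical):
-- Non-unique index on the leading PK columns, so the optimizer can
-- range-scan A directly instead of going through B.
CREATE INDEX a_c1_c2_ix ON a (c1, c2);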
Although I would like to see some benchmarking which proves an index-range scan on your primary key index was not up to the job. Especially as it seems likely the join on table B is using that access path.
Performance tuning an Oracle database is a matter of understanding and juggling many variables. It's not just a case of "bung on another index". We need to understand what the database is actually doing and why the optimiser made that choice. So, please read this post on asking Oracle tuning questions which will give you some insight into how to approach query optimisation.

Table joins in BigQuery

Do you know if there's a way to join two tables by, for example, using a foreign key constraint like in MySQL (I don't seem to find anything about this)? If not, is there a replacement?
Thanks!
I interpret your question as below -
Is there a way to limit the values that can be used on the tableX to only the IDs that exist on the tableY? For example via using a foreign key constraint like in MySQL!
BigQuery does not provide any direct mechanism for this to happen.
But you can easily achieve this by yourself.
For example, assume you need to insert some data into tableX, but you want to make sure that only those rows are inserted whose id exists in tableY.
So, you can "enforce" this via the query below:
#standardSQL
SELECT n.*
FROM newData AS n
JOIN tableY AS y
ON n.id = y.id
You can run this query with tableX as the destination, and only the needed rows will be inserted.
Hope you got an idea
Also you can check existing related feature requests -
https://issuetracker.google.com/issues/35906045
https://issuetracker.google.com/issues/35906043
Since you asked two questions (Stack Overflow suggests asking 1 question per question), I'll answer one:
Also, do you know if there's a way to join two tables by, for example, using a foreign key
In BigQuery you can join tables by any key - even by keys defined on the fly (this is pretty useful when you need to join two tables from different datasets that choose to encode identical values in different ways).
Why would you need a foreign key to do these joins?
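A hedged sketch of such an on-the-fly join key (dataset, table, and column names are made up for illustration):
#standardSQL
SELECT a.*, b.label
FROM datasetA.tableA AS a
JOIN datasetB.tableB AS b
-- The two datasets encode the same identifier differently,
-- so the join key is computed in the ON clause.
ON LOWER(a.ref_id) = LOWER(CAST(b.id AS STRING))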

MonetDB simple join performance on 2 tables

Let's assume I have two tables with the same row count. Both tables contain a column that allows for a 1-1 join between them.
If those tables were turned into one table instead, and the JOIN thus eliminated from the query, would there be any performance benefit?
Another example... Let's assume I have a table with 10 columns. From that table I created a new table taking only one column. If I issue a statement selecting that one column with a WHERE predicate on the same column, would there be any performance difference in executing this query on the two tables?
What I'm trying to get at is: if performance is the same in the above cases, is it safe to say tables are only containers wrapping a number of columns together?
I did run a couple of tests, but with inconclusive results.
Let's assume I have two tables with the same row count. Both tables contain a column that allows for a 1-1 join between them. If those tables were turned into one table instead, and the JOIN thus eliminated from the query, would there be any performance benefit?
Performing that join for every query is of course more expensive than materializing the table once and then reading it. So yes, there would be a performance benefit.
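For example, a minimal sketch of materializing the join once (table and column names are hypothetical):
-- One-time materialization; afterwards, queries read the combined
-- table without re-running the join.
CREATE TABLE combined AS
SELECT t1.*, t2.extra_col
FROM t1 JOIN t2 ON t1.id = t2.id
WITH DATA;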
Another example... Let's assume I have a table with 10 columns. From that table I created a new table taking only one column. If I issue a statement selecting that one column with a WHERE predicate on the same column, would there be any performance difference in executing this query on the two tables?
No, there would be no difference, since tables are represented as collections of columns, which are each stored in their own file.
What I'm trying to get at is: if performance is the same in the above cases, is it safe to say tables are only containers wrapping a number of columns together?
That is indeed safe to say.

Oracle: Having a join or a simple from/where clause has no effect on performance?

My manager just told me that having joins or a where clause in an Oracle query doesn't affect performance, even when you have a million records in each table. I am just not satisfied with this and want to confirm it.
Which of the following queries is better in performance, on Oracle and also in PostgreSQL?
1- select a.name,b.salary,c.address
from a,b,c
where a.id=b.id and a.id=c.id;
2- select a.name,b.salary,c.address
from a
JOIN b on a.id=b.id
JOIN C on a.id=c.id;
I have tried EXPLAIN in PostgreSQL for a small data set and the query time was the same (maybe because I have just a few rows), and right now I have no access to Oracle or the actual database to analyze the EXPLAIN output in a real environment.
Using JOINS makes the code easier to read, since it's self-explanatory.
In speed there is no difference (I have just tested it), and the execution plan is the same.
If the query optimizer is doing its job right, there should be no difference between those queries.
They are just two ways to specify the same desired result.
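If you want to verify this yourself once you have Oracle access, a minimal sketch using the tables from the question:
EXPLAIN PLAN FOR
SELECT a.name, b.salary, c.address
FROM a
JOIN b ON a.id = b.id
JOIN c ON a.id = c.id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- Repeat with the comma-separated FROM/WHERE version: the two plans
-- should come out identical.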

How do you create an index on a subquery factored temporary table?

I've got a query which has a WITH statement for a subquery at the top, and I'm then running a couple of CONNECT BYs on the subquery. The subquery can contain tens of thousands of rows, and there's no limit to the depth of the CONNECT BY hierarchy. Currently, this query takes upwards of 30 seconds; is it possible to specify indexes to put on the temporary table created for the factored subquery to speed up the CONNECT BYs, or speed it up another way?
There is no way to do it right in the query: Oracle does not support Eager Spool.
You can temporarily store your resultset in an indexed temporary table and issue the CONNECT BY query against it.
However, for the unsargable equality conditions in the query, the CONNECT BY usually builds a hash table which is in most cases even better than an index.
Could you please post your query here?
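A minimal sketch of that workaround (table, column, and index names are hypothetical; source_rows stands in for the factored subquery):
-- Session-scoped staging table for the subquery's rows.
CREATE GLOBAL TEMPORARY TABLE subq_tmp (
  id        NUMBER,
  parent_id NUMBER
) ON COMMIT PRESERVE ROWS;

CREATE INDEX subq_tmp_parent_ix ON subq_tmp (parent_id);

INSERT INTO subq_tmp (id, parent_id)
SELECT id, parent_id
FROM source_rows;

-- CONNECT BY now runs against the indexed temporary table.
SELECT id, parent_id, LEVEL
FROM subq_tmp
START WITH parent_id IS NULL
CONNECT BY PRIOR id = parent_id;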
You might be able to use the MATERIALIZE hint with subquery factoring so that the subquery isn't rerun iteratively. While the hint is undocumented, it seems to reliably flush the results of a WITH clause into a temporary table.
Jonathan Lewis' blog has several examples of how it can be used. There is some risk, however, due to the hint's undocumented nature.
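A hedged sketch of the hint (table and column names are hypothetical; the hint goes inside the factored subquery):
WITH subq AS (
  SELECT /*+ MATERIALIZE */ id, parent_id
  FROM source_rows
)
SELECT id, parent_id, LEVEL
FROM subq
START WITH parent_id IS NULL
CONNECT BY PRIOR id = parent_id;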
