Hash Join with Partition restriction from third table - oracle

my current problem is in 11g, but I am also interested in how this might be solved smarter in later versions.
I want to join two tables. Table A has 10 million rows, Table B is huge and has a billion of records across about a thousand partitions. One partition has around 10 million records. I am not joining on the partition key. For most rows of Table A, one or more rows in Table B will be found.
Example:
select * from table_a a
inner join table_b b on a.ref = b.ref
The above will return about 50 million rows, whereas the results come from about 30 partitions of table b. I am assuming a hash join is the correct join here, hashing table a and FTSing/index-scanning table b.
So, 970 partitions were scanned for no reason. And, I have a third query that could tell oracle which 30 partitions to check for the join.
Example of third query:
select partition_id from table_c
This query gives exactly the 30 partitions for the query above.
To my question:
In PL/SQL one can solve this by
select the 30 partition_ids into a variable (be it just a select listagg(partition_id,',') ... into v_partitions from table_c
Execute my query like so:
execute immediate 'select * from table_a a
inner join table_b b on a.ref = b.ref
where b.partition_id in ('||v_partitions||')' into ...
Let's say this completes in 10 minutes.
Now, how can I do this in the same amount of time with pure SQL?
Just simply writing
select * from table_a a
inner join table_b b on a.ref = b.ref
where b.partition_id in (select partition_id from table_c)
does not do the trick it seems, or I might be aiming at the wrong plan.
The plan I think I want is
hash join
table a
nested loop
table c
partition pruning here
table b
But, this does not come back in 10 minutes.
So, how to do this in SQL and what execution plan to aim at? One variation I have not tried yet that might be the solution is
nested loop
table c
hash join
table a
partition pruning here (pushed predicate from the join to c)
table b
Another feeling I have is that the solution might lie in joining table a to table c (not sure on what though) and then joining this result to table b.
I am not asking you to type everything out for me. Just a general concept of how to do this (getting partition restriction from a query) in SQL - what plan should I aim at?
thank you very much! Peter

I'm not an expert at this, but I think Oracle generally does the joins first, then applies the where conditions. So you might get the plan you want by moving the partition pruning up into a join condition:
select * from table_a a
inner join table_b b on a.ref = b.ref
and b.partition_id in (select partition_id from table_c);
I've also seen people try to do this sort of thing with an inline view:
select * from table_a a
inner join (select * from table_b
where partition_id in (select partition_id from table_c)) b
on a.ref = b.ref;

thank you all for your discussions with me on this one. In my case this was solved (not by me) by adding a join-path between table_c and table_a and by overloading the join conditions as below. In my case this was possible by adding column partition_id to table_a:
select * from
table_c c
JOIN table_a a ON (a.partition_id = c.partition_id)
JOIN table_b b ON (b.partition_id = c.partition_id and b.partition_id = a.partition_id and b.ref = a.ref)
And this is the plan you want:
leading(c,b,a) use_nl(c,b) swap_join_inputs(a) use_hash(a)
So you get:
hash join
table a
nested loop
table c
partition list iterator
table b

Related

Oracle Performance issues on using subquery in an "In" orperator

I have two query that looks close to the same but Oracle have very different performance.
Query A
Create Table T1 as Select * from FinalView1 where CustomerID in ('A0000001','A000002')
Query B
Create Table T1 as Select * from FinalView1 where CustomerID in (select distinct CustomerID from CriteriaTable)
The CriteriaTable have 800 rows but all belongs to Customer ID 'A0000001' and 'A000002'.
This means the subquery: "select distinct CustomerID from CriteriaTable" also only returns the same two elements('A0000001','A000002') as manually entered in query A
Following is the query under the FinalView1
create or replace view FinalView1_20200716 as
select
Customer_ID,
<Some columns>
from
Table1_20200716 T1
INNER join Table2_20200716 T2 on
T1.Invoice_number = T2.Invoice_number
and
T1.line_id = T2.line_id
left join Table3_20200716 T3 on
T3.id = T1.Customer_ID
left join Table4_20200716 T4 on
T4.Shipping_ID = T1.Shipping_ID
left join Table5_20200716 Table5 on
Table5.Invoice_ID = T1.Invoice_ID
left join Table6_20200716 T6 on
T6.Shipping_ID = T4.Shipping_ID
left join First_Order first on
first.Shipping_ID = T1.Shipping_ID
;
Table1_20200716,Table2_20200716,Table3_20200716,Table4_20200716,Table5_20200716,Table6_20200716 are views to the corresponding table with temporal validity feature. For example
The query under Table1_20200716
Create or replace view Table1_20200716 as
select
*
from Table1 as for period of to_date('20200716,'yyyymmdd')
However table "First_Order" is just a normal table as
Following is the performance for both queries (According to explain plan):
Query A:
Cardinality: 102
Cost : 204
Total Runtime: 5 secs max
Query B:
Cardinality:27921981
Cost: 14846
Total Runtime:20 mins until user cancelled
All tables are indexed using those columns that used to join against other tables in the FinalView1. According to the explain plan, they have all been used except for the FirstOrder table.
Query A used uniquue index on the FirstOrder Table while Query B performed a full scan.
For query B, I was expecting the Oracle will firstly query the sub-query get the result into the in operator, before executing the main query and therefore should only have minor impact to the performance.
Thanks in advance!
As mentioned from my comment 2 days ago. Someone have actually posted the solution and then have it removed while the answer actually work. After waiting for 2 days the So I designed to post that solution.
That solution suggested that the performance was slow down by the "in" operator. and suggested me to replace it with an inner join
Create Table T1 as
Select
FV.*
from
FinalView1 FV
inner join (
select distinct
CustomerID
from
CriteriaTable
) CT on CT.customerid = FV.customerID;
Result from explain plan was worse then before:
Cardinality:28364465 (from 27921981)
Cost: 15060 (from 14846)
However, it only takes 17 secs. Which is very good!

Oracle 11g - Selecting multiple records from a table column

I was just wondering how you select multiple records from a table column. Please see below the query.
SELECT DISTINCT DEPARTMENT_NAME, CITY, COUNTRY_NAME
FROM OEHR_DEPARTMENTS
NATURAL JOIN OEHR_EMPLOYEES
NATURAL JOIN OEHR_LOCATIONS
NATURAL JOIN OEHR_COUNTRIES
WHERE JOB_ID = 'SA_MAN' AND JOB_ID = 'SA_REP'
;
Basically, I want to be able to select records from the table column I have, however when you use AND it only displays SA_MAN and not SA_REP. I have also tried to use OR and it displays no rows selected. How would I actually be able to select both Job ID's without it just displaying one or the other.
Sorry this may sound like a stupid question (and probably not worded right), but I am pretty new to Oracle 11g SQL.
For your own comfort while debugging, I suggest you to use inner joins instead of natual joins.
That where clause is confusing, if not utterly wrong, because you don't make clear which tables' JOB_ID should be filtered. Use inner joins, give aliases to tables, and refer to those aliases in the where clause.
select distinct DEPARTMENT_NAME, CITY, COUNTRY_NAME
from OEHR_DEPARTMENTS t1
join OEHR_EMPLOYEES t2
on ...
join OEHR_LOCATIONS t3
on ...
join OEHR_COUNTRIES t4
on ...
where tn.JOB_ID = 'SA_MAN' AND tm.JOB_ID = 'SA_REP'
After rephrasing your query somehow like this, you'll have a clearer view on the logical operator you'll have to use in the where clause, which I bet will be an OR.
EDIT (after more details were given)
To list the departments that employ staff with both 'SA_MAN' and 'SA_REP' job_id, you have to join the departments table with the employees twice, once with the filter job_id='SA_MAN' and once with job_id='SA_REP'
select distinct DEPARTMENT_NAME, CITY, COUNTRY_NAME
from OEHR_DEPARTMENTS t1
join OEHR_EMPLOYEES t2
on t1.department_id = t2.department_id --guessing column names
join OEHR_EMPLOYEES t3
on t1.department_id = t3.department_id --guessing column names
join OEHR_LOCATIONS t4
on t1.location_id = t4.location_id --guessing column names
join OEHR_COUNTRIES t5
on t4.country_id = t5.country_id --guessing column names
where t2.job_id = 'SA_MAN' and t3.job_id = 'SA_REP'
order by 1, 2, 3

Insert Statement Returns ORA-01427 Error While Trying To Insert From Multiple Tables

I have this table F_Flight which I am trying to insert into from 3 different tables. The first, fourth and fifth columns are from the same, and the second and third columns from different tables. When I execute the code, I get a "single-row subquery returns more than one row" error.
insert when 1 = 1 then into F_Flight (planeid, groupid, dateid, flightduration, kmsflown) values
(planeid, (select b.groupid from BridgeTable b where exists (select p.p1id from pilotkeylookup p where b.pilotid = p.p1id)),
(select dd.id from D_Date dd where exists (select p.launchtime from PilotKeyLookup p where dd."Date" = p.launchtime)),
flightduration, kmsflown) select * from PilotKeyLookup p;
Your subqueries get multiple rows back, which is what the error message says. There is no correlation between the various bits of data and subqueries you're trying to insert into a single row.
This can be done as a much simpler insert...select with joins, something like:
insert into f_flight (planeid, groupid, dateid, flightduration, kmsflown)
select pkl.planeid, bt.groupid, dd.id, pkl.flightduration, pkl.kmsflown
from pilotkeylookup pkl
join bridgetable bt on bt.pilotid = pkl.p1id
join d_date dd on dd."Date" = pkl.launchtime;
This joins the main PilotKeyLookup table to the other two on the keys you used in your subqueries.
Storing an ID value instead of an actual date is unusual, and if launchtime has a time component - which seems likely from the name - and your d_date entries are just dates (i.e. all with time at midnight) then you won't find matches; you might need to do:
join d_date dd on dd."Date" = trunc(pkl.launchtime);
It also seems like this could be a view, as you're storing duplicate data - everything in f_flight could, obviously, be found from the other tables.

Worse query plan with a JOIN after ANALYZE

I see that running ANALYZE results in significantly poor performance on a particular JOIN I'm making between two tables.
Suppose the following schema:
CREATE TABLE a ( id INTEGER PRIMARY KEY, name TEXT );
CREATE TABLE b ( a NOT NULL REFERENCES a, value INTEGER, PRIMARY KEY(a, b) );
CREATE VIEW ab AS SELECT a.name, b.text, MAX(b.value)
FROM a
JOIN b ON b.a = a.id;
GROUP BY a.id
ORDER BY a.name
Table a is approximately 10K rows, table b is approximately 48K rows (~5 rows per row in table a).
Before ANALYZE
Now when I run the following query:
SELECT * FROM ab;
The query plan looks as follows:
1|0|0|SCAN TABLE b
1|1|1|SEARCH TABLE a USING INTEGER PRIMARY KEY (rowid=?)
This is a good plan, b is larger and I want it to be in the outer loop, making use of the index in table a. It finishes well within a second.
After ANALYZE
When I execute the same query again, the query plan results in two table scans:
1|0|1|SCAN TABLE a
1|1|0|SCAN TABLE b
This is far for optimal. For some reason the query planner thinks that an outer loop of 10K rows and an inner loop of 48K rows is a better fit. This takes about 1.5 minute to complete.
Should I adapt the index in table b to make it work after ANALYZE? Anything else to change to the indexing/schema?
I just try to understand the problem here. I worked around it using a CROSS JOIN, but that feels dirty and I don't really understand why the planner would go with a plan that is orders of magnitude slower than the un-analyzed plan. It seems to be related to GROUP BY, since the query planner puts table b in the outer loop without it (but that renders the query useless for what I want).
Accidentally found the answer by adjusting the GROUP BY clause in the view definition. Instead of joining on a.id, I group on b.a instead, although they have the same values.
CREATE VIEW ab AS SELECT a.name, b.text, MAX(b.value)
FROM a
JOIN b ON b.a = a.id;
GROUP BY b.a -- <== changed this from a.id to b.a
ORDER BY a.name
I'm still not entirely sure what the difference is, since it groups the same data.

Order of columns returned in SQL query (With table join)

I am wondering what determines the order of columns returned in a SQL query.
For example, SELECT * FROM SOMETABLE;
SQ_ID |BUS_TYPE |VOIP |LOCAL_PHONE
--------|-----------|---------|-------------
SQ000001|Business |Y |N
I am guessing the attribute COLUMN_ID determines this. In the case of a table join, for example, SELECT * FROM SOMETABLE LEFT JOIN OTHERTABLE USING (SOME_COL); how is the order now determine.
The order of columns in SELECT * FROM some_table is determined by the column_id of each column, as seen in USER_TAB_COLUMNS.
The order of columns in SELECT * FROM some_table JOIN other_table is all the columns for each table starting with the leftmost table after the FROM clause. In other words, this ...
SELECT * FROM some_table JOIN other_table
... is equivalent to this ...
SELECT some_table.*, other_table.* FROM some_table JOIN other_table
Changing that inner join to LEFT JOIN or RIGHT JOIN won't change the projection.
This is, of course, theoretical. We should never use select * in production code. Explicit column declarations, with table aliases when joining, are always safer. Apart from better expression of intent, explicit projections protect our code from future changes to the tables such as adding a LOB column or a column name which creates ambiguity with a joined table's column.
You can list the order of the colums in the Select statement:
SELECT SOME_COL, SOME_OTHER_COL
FROM SOMETABLE LEFT JOIN OTHERTABLE USING (SOME_COL)
But you also speak of the ID influencing the order and of ordering in general. So I think you could also be looking for ORDER BY to order the rows:
SELECT *
FROM SOMETABLE LEFT JOIN OTHERTABLE USING (SOME_COL)
ORDER BY SOME_COL
What also comes quite handy in this case is the use of aliases. Especally when both tables have coloums with the same name:
SELECT s.some_col, o.some_col
FROM SOMETABLE s LEFT JOIN OTHERTABLE o ON(o.id = s.id)
ORDER BY o.SOME_COL
I use the ON JOIN syntax in this case, because i find this more intuive when using aliases but it should also work with USING.

Resources