How to effeciently select data from two tables? - performance

I have two tables: A, B.
A has prisoner_id and prisoner_name columns.
B has all other info about prisoners included prisoner_name column.
First I select all of the data that I need from B:
WITH prisoner_datas AS
(SELECT prisoner_name, ... FROM B WHERE ...)
Then I want to know all of the id of my prisoner_datas. To do this I need to combine information by prisoner_name column, because it's common for both tables
I did the following
SELECT A.prisoner_id, prisoner_datas.prisoner_name, prisoner_datas. ...,
FROM A, prisoner_datas
WHERE A.prisoner_name = prisoner_datas.prisoner_name
But it works very slow. How can I improve performance?

Add an index on the prisoner_name join column in the B table. Then the following join should have some performance improvement:
SELECT
A.prisoner_id,
B.prisoner_name,
B.prisoner_datas.id -- and other columns if needed
FROM A
INNER JOIN B
ON A.prisoner_name = B.prisoner_name
Note here that I used an explicit join syntax here. It isn't required, and the query plan might not change, but it makes the query easier to read. I don't think the CTE will change much, but the lack of an index on the join column should be important here.

Related

Consecutive JOIN and aliases: order of execution

I am trying to use FULLTEXT search as a preliminary filter before fetching data from another table. Consecutive JOINs follow to further refine the query and to mix-and-match rows (in reality there are up to 6 JOINs of the main table).
The first "filter" returns the IDs of the rows that are useful, so after joining I have a subset to continue with. My issue is performance, however, and my lack of understanding of how the SQL query is executed in SQLite.
SELECT *
FROM mytbl AS t1
JOIN
(SELECT someid
FROM myftstbl
WHERE
myftstbl MATCH 'MATCHME') AS prior
ON
t1.someid = prior.someid
AND t1.othercol = 'somevalue'
JOIN mytbl AS t2
ON
t2.someid = prior.someid
/* Or is this faster? t2.someid = t1.someid */
My thought process for the query above is that first, we retrieve the matched IDs from the myftstbl table and use those to JOIN on the main table t1 to get a sub-selection. Then we again JOIN a duplicate of the main table as t2. The part that I am unsure of is which approach would be faster: using the IDs from the matches, or from t2?
In other words: when I refer to t1.someid inside the second JOIN, does that contain only the someids after the first JOIN (so only those at the intersection of prior and those for which t1.othercol = 'somevalue) OR does it contain all the original someids of the whole original table?
You can assume that all columns are indexed. In fact, when I use one or the other approach, I find with EXPLAIN QUERY PLAN that different indices are being used for each query. So there must be a difference between the two.
The query should be simplified to
SELECT *
FROM mytbl AS t1
JOIN myftstbl USING (someid) -- or ON t1.someid = myftstbl.someid
JOIN mytbl AS t2 USING (someid) -- or ON t1.someid = t2.someid
WHERE myftstbl.{???} MATCH 'MATCHME' -- replace {???} with correct column name
AND t1.othercol = 'somevalue'
PS. The query logic is not clear for me, so it is saved as-is.

Technical and syntax doubts about joins

i'm having a technical and syntax problem with JOINS in ORACLE.
If i have 7 tables, listed below:
FROM
QT_QTS.PLA_ORDEM_PRODUCAO pla,
qt_qts.res_tubo_austenitizacao aust,
qt_qts.res_tubo_revenimento1 res_rev1,
qt_qts.res_tubo_revenimento2 res_rev2,
limsprod.SAMPLE sp,
limsprod.test t,
limsprod.result r
I need to get ALL the data in the "limsprod.result r" table linked with similar corresponding data inside the qt_qts.res_tubo_austenitizacao aust, qt_qts.res_tubo_revenimento1 res_rev1 and qt_qts.res_tubo_revenimento2 res_rev2 tables.
How can I do this join using Oracle Database? I tried a left join, but it did not work.
It is impossible to answer that question. We have nothing but list of some tables. I'm not sure I'd even want to do that instead of you.
However, here's a suggestion: start with one table:
select * from limsprod.result r;
It'll return all rows. Then join it to another table:
select *
from limsprod.result r join qt_qts.res_tubo_austenitizacao aust on aust.id = r.id
and see what happens - did you get all rows you want? If not, should you add another JOIN condition? Perhaps an outer join? Don't move on to the third table until you sort that out. Once you're satisfied with the result, add another table:
select *
from limsprod.result r join qt_qts.res_tubo_austenitizacao aust on aust.id = r.id
join qt_qts.res_tubo_revenimento1 res_rev1 on res_rev1.idrr = aust.idrr
Repeat what's being said previously.

Hash Join with Partition restriction from third table

my current problem is in 11g, but I am also interested in how this might be solved smarter in later versions.
I want to join two tables. Table A has 10 million rows, Table B is huge and has a billion of records across about a thousand partitions. One partition has around 10 million records. I am not joining on the partition key. For most rows of Table A, one or more rows in Table B will be found.
Example:
select * from table_a a
inner join table_b b on a.ref = b.ref
The above will return about 50 million rows, whereas the results come from about 30 partitions of table b. I am assuming a hash join is the correct join here, hashing table a and FTSing/index-scanning table b.
So, 970 partitions were scanned for no reason. And, I have a third query that could tell oracle which 30 partitions to check for the join.
Example of third query:
select partition_id from table_c
This query gives exactly the 30 partitions for the query above.
To my question:
In PL/SQL one can solve this by
select the 30 partition_ids into a variable (be it just a select listagg(partition_id,',') ... into v_partitions from table_c
Execute my query like so:
execute immediate 'select * from table_a a
inner join table_b b on a.ref = b.ref
where b.partition_id in ('||v_partitions||')' into ...
Let's say this completes in 10 minutes.
Now, how can I do this in the same amount of time with pure SQL?
Just simply writing
select * from table_a a
inner join table_b b on a.ref = b.ref
where b.partition_id in (select partition_id from table_c)
does not do the trick it seems, or I might be aiming at the wrong plan.
The plan I think I want is
hash join
table a
nested loop
table c
partition pruning here
table b
But, this does not come back in 10 minutes.
So, how to do this in SQL and what execution plan to aim at? One variation I have not tried yet that might be the solution is
nested loop
table c
hash join
table a
partition pruning here (pushed predicate from the join to c)
table b
Another feeling I have is that the solution might lie in joining table a to table c (not sure on what though) and then joining this result to table b.
I am not asking you to type everything out for me. Just a general concept of how to do this (getting partition restriction from a query) in SQL - what plan should I aim at?
thank you very much! Peter
I'm not an expert at this, but I think Oracle generally does the joins first, then applies the where conditions. So you might get the plan you want by moving the partition pruning up into a join condition:
select * from table_a a
inner join table_b b on a.ref = b.ref
and b.partition_id in (select partition_id from table_c);
I've also seen people try to do this sort of thing with an inline view:
select * from table_a a
inner join (select * from table_b
where partition_id in (select partition_id from table_c)) b
on a.ref = b.ref;
thank you all for your discussions with me on this one. In my case this was solved (not by me) by adding a join-path between table_c and table_a and by overloading the join conditions as below. In my case this was possible by adding column partition_id to table_a:
select * from
table_c c
JOIN table_a a ON (a.partition_id = c.partition_id)
JOIN table_b b ON (b.partition_id = c.partition_id and b.partition_id = a.partition_id and b.ref = a.ref)
And this is the plan you want:
leading(c,b,a) use_nl(c,b) swap_join_inputs(a) use_hash(a)
So you get:
hash join
table a
nested loop
table c
partition list iterator
table b

Worse query plan with a JOIN after ANALYZE

I see that running ANALYZE results in significantly poor performance on a particular JOIN I'm making between two tables.
Suppose the following schema:
CREATE TABLE a ( id INTEGER PRIMARY KEY, name TEXT );
CREATE TABLE b ( a NOT NULL REFERENCES a, value INTEGER, PRIMARY KEY(a, b) );
CREATE VIEW ab AS SELECT a.name, b.text, MAX(b.value)
FROM a
JOIN b ON b.a = a.id;
GROUP BY a.id
ORDER BY a.name
Table a is approximately 10K rows, table b is approximately 48K rows (~5 rows per row in table a).
Before ANALYZE
Now when I run the following query:
SELECT * FROM ab;
The query plan looks as follows:
1|0|0|SCAN TABLE b
1|1|1|SEARCH TABLE a USING INTEGER PRIMARY KEY (rowid=?)
This is a good plan, b is larger and I want it to be in the outer loop, making use of the index in table a. It finishes well within a second.
After ANALYZE
When I execute the same query again, the query plan results in two table scans:
1|0|1|SCAN TABLE a
1|1|0|SCAN TABLE b
This is far for optimal. For some reason the query planner thinks that an outer loop of 10K rows and an inner loop of 48K rows is a better fit. This takes about 1.5 minute to complete.
Should I adapt the index in table b to make it work after ANALYZE? Anything else to change to the indexing/schema?
I just try to understand the problem here. I worked around it using a CROSS JOIN, but that feels dirty and I don't really understand why the planner would go with a plan that is orders of magnitude slower than the un-analyzed plan. It seems to be related to GROUP BY, since the query planner puts table b in the outer loop without it (but that renders the query useless for what I want).
Accidentally found the answer by adjusting the GROUP BY clause in the view definition. Instead of joining on a.id, I group on b.a instead, although they have the same values.
CREATE VIEW ab AS SELECT a.name, b.text, MAX(b.value)
FROM a
JOIN b ON b.a = a.id;
GROUP BY b.a -- <== changed this from a.id to b.a
ORDER BY a.name
I'm still not entirely sure what the difference is, since it groups the same data.

Why does this query result in a MERGE JOIN CARTESIAN in Oracle?

Here is my query:
select count(*)
from email_prod_junc j
inner join trckd_prod t5 on j.trckd_prod_sk = t5.trckd_prod_sk
inner join prod_brnd b on t5.prod_brnd_sk = b.prod_brnd_sk
inner join email e on j.email_sk = e.email_sk
inner join dm_geography_sales_pos_uniq u on (u.emp_sk = e.emp_sk and u.prod_brnd_sk = b.prod_brnd_sk)
The explain plan says:
Cartesian Join between DM_GEOGRAPHY_SALES_POS_UNIQ and EMAIL_PROD_JUNC.
I don't understand why because there is a join condition for each table.
I solved this by adding the ORDERED hint:
select /*+ ordered */
I got the information from here
If you specify the tables in the order you want them joined and use this hint, Oracle won't spend time trying to figure out the optimal join order, it will just join them as they are ordered in the FROM clause.
Without knowing your indexes and the full plan, it's hard to say why this is happening exactly. My best guess is that EMAIL_PROD_JUNC and DM_GEOGRAPHY_SALES_POS_UNIQ are relatively small and that there's an index on TRCKD_PROD(trckd_prod_sk, prod_brnd_sk). If that's the case, then the optimizer may have decided that the Cartesian on the two smaller tables is less expensive than filtering TRCKD_PROD twice.
I would speculate that it happens because of the on (x and y) condition of the last inner join. Oracle probably doesn't know how to optimize the multi-statement condition, so it does a full join, then filters the result by the condition after the fact. I'm not really familiar with Oracle's explain plan, so I can't say that with authority
Edit
If you wanted to test this hypothesis, you could try changing the query to:
inner join dm_geography_sales_pos_uniq u on u.emp_sk = e.emp_sk
where u.prod_brnd_sk = b.prod_brnd_sk
and see if that eliminates the full join from the plan

Resources