Consecutive JOIN and aliases: order of execution - performance

I am trying to use FULLTEXT search as a preliminary filter before fetching data from another table. Consecutive JOINs follow to further refine the query and to mix-and-match rows (in reality there are up to 6 JOINs of the main table).
The first "filter" returns the IDs of the rows that are useful, so after joining I have a subset to continue with. My issue is performance, however, and my lack of understanding of how the SQL query is executed in SQLite.
SELECT *
FROM mytbl AS t1
JOIN
(SELECT someid
FROM myftstbl
WHERE
myftstbl MATCH 'MATCHME') AS prior
ON
t1.someid = prior.someid
AND t1.othercol = 'somevalue'
JOIN mytbl AS t2
ON
t2.someid = prior.someid
/* Or is this faster? t2.someid = t1.someid */
My thought process for the query above is that first, we retrieve the matched IDs from the myftstbl table and use those to JOIN on the main table t1 to get a sub-selection. Then we again JOIN a duplicate of the main table as t2. The part that I am unsure of is which approach would be faster: using the IDs from the matches, or from t2?
In other words: when I refer to t1.someid inside the second JOIN, does that contain only the someids after the first JOIN (so only those at the intersection of prior and those for which t1.othercol = 'somevalue) OR does it contain all the original someids of the whole original table?
You can assume that all columns are indexed. In fact, when I use one or the other approach, I find with EXPLAIN QUERY PLAN that different indices are being used for each query. So there must be a difference between the two.

The query should be simplified to
SELECT *
FROM mytbl AS t1
JOIN myftstbl USING (someid) -- or ON t1.someid = myftstbl.someid
JOIN mytbl AS t2 USING (someid) -- or ON t1.someid = t2.someid
WHERE myftstbl.{???} MATCH 'MATCHME' -- replace {???} with correct column name
AND t1.othercol = 'somevalue'
PS. The query logic is not clear for me, so it is saved as-is.

Related

Oracle Performance issues on using subquery in an "In" orperator

I have two query that looks close to the same but Oracle have very different performance.
Query A
Create Table T1 as Select * from FinalView1 where CustomerID in ('A0000001','A000002')
Query B
Create Table T1 as Select * from FinalView1 where CustomerID in (select distinct CustomerID from CriteriaTable)
The CriteriaTable have 800 rows but all belongs to Customer ID 'A0000001' and 'A000002'.
This means the subquery: "select distinct CustomerID from CriteriaTable" also only returns the same two elements('A0000001','A000002') as manually entered in query A
Following is the query under the FinalView1
create or replace view FinalView1_20200716 as
select
Customer_ID,
<Some columns>
from
Table1_20200716 T1
INNER join Table2_20200716 T2 on
T1.Invoice_number = T2.Invoice_number
and
T1.line_id = T2.line_id
left join Table3_20200716 T3 on
T3.id = T1.Customer_ID
left join Table4_20200716 T4 on
T4.Shipping_ID = T1.Shipping_ID
left join Table5_20200716 Table5 on
Table5.Invoice_ID = T1.Invoice_ID
left join Table6_20200716 T6 on
T6.Shipping_ID = T4.Shipping_ID
left join First_Order first on
first.Shipping_ID = T1.Shipping_ID
;
Table1_20200716,Table2_20200716,Table3_20200716,Table4_20200716,Table5_20200716,Table6_20200716 are views to the corresponding table with temporal validity feature. For example
The query under Table1_20200716
Create or replace view Table1_20200716 as
select
*
from Table1 as for period of to_date('20200716,'yyyymmdd')
However table "First_Order" is just a normal table as
Following is the performance for both queries (According to explain plan):
Query A:
Cardinality: 102
Cost : 204
Total Runtime: 5 secs max
Query B:
Cardinality:27921981
Cost: 14846
Total Runtime:20 mins until user cancelled
All tables are indexed using those columns that used to join against other tables in the FinalView1. According to the explain plan, they have all been used except for the FirstOrder table.
Query A used uniquue index on the FirstOrder Table while Query B performed a full scan.
For query B, I was expecting the Oracle will firstly query the sub-query get the result into the in operator, before executing the main query and therefore should only have minor impact to the performance.
Thanks in advance!
As mentioned from my comment 2 days ago. Someone have actually posted the solution and then have it removed while the answer actually work. After waiting for 2 days the So I designed to post that solution.
That solution suggested that the performance was slow down by the "in" operator. and suggested me to replace it with an inner join
Create Table T1 as
Select
FV.*
from
FinalView1 FV
inner join (
select distinct
CustomerID
from
CriteriaTable
) CT on CT.customerid = FV.customerID;
Result from explain plan was worse then before:
Cardinality:28364465 (from 27921981)
Cost: 15060 (from 14846)
However, it only takes 17 secs. Which is very good!

How to effeciently select data from two tables?

I have two tables: A, B.
A has prisoner_id and prisoner_name columns.
B has all other info about prisoners included prisoner_name column.
First I select all of the data that I need from B:
WITH prisoner_datas AS
(SELECT prisoner_name, ... FROM B WHERE ...)
Then I want to know all of the id of my prisoner_datas. To do this I need to combine information by prisoner_name column, because it's common for both tables
I did the following
SELECT A.prisoner_id, prisoner_datas.prisoner_name, prisoner_datas. ...,
FROM A, prisoner_datas
WHERE A.prisoner_name = prisoner_datas.prisoner_name
But it works very slow. How can I improve performance?
Add an index on the prisoner_name join column in the B table. Then the following join should have some performance improvement:
SELECT
A.prisoner_id,
B.prisoner_name,
B.prisoner_datas.id -- and other columns if needed
FROM A
INNER JOIN B
ON A.prisoner_name = B.prisoner_name
Note here that I used an explicit join syntax here. It isn't required, and the query plan might not change, but it makes the query easier to read. I don't think the CTE will change much, but the lack of an index on the join column should be important here.

SQL: ORA-00918 column ambiguously defined in INNER JOIN

I'm getting this error no matter what I do with the INNER JOIN Statement
Here is my code:
SELECT Package_Code, Description, Duration, Site_Code
FROM tbl_Holiday_Details
INNER JOIN tbl_Site_Visted
ON tbl_Holiday_Details.Package_Code = tbl_Site_Visted.Package_Code
INNER JOIN tbl_Site_Visted
ON tbl_Site_Details.Site_Code = tbl_Site_Visted.Site_Code
I don't understand what is the problem.
ps. if needed i will provide more code
The immediate problem is that at least Package_Code and Site_Code exist in multiple tables but your select does not specify which table you want to return data from. Yes, you know that you're doing an inner join on those columns so it doesn't matter which table's value is returned but the SQL syntax doesn't allow Oracle to make that inference. Generally, I would advise that you always alias every column both so it is clear which table a particular attribute is coming from and so that you don't break code when you add an attribute to a different table that happens to have the same name.
SELECT tbl_Holiday_Details.Package_Code,
Description,
Duration,
tbl_Site_Visted.Site_Code
FROM tbl_Holiday_Details
INNER JOIN tbl_Site_Visted
ON tbl_Holiday_Details.Package_Code = tbl_Site_Visted.Package_Code
INNER JOIN tbl_Site_Visted
ON tbl_Site_Details.Site_Code = tbl_Site_Visted.Site_Code
will work assuming Description and Duration are defined only in one of the three tables. I would add aliases to Description and Duration as well but I don't know which of the tables should be used. Of course, I would generally use simpler aliases (say, tsv for tbl_Site_Visited) rather than the full table name.
If you want to avoid aliasing your columns, you could use the USING clause rather than the ON clause
SELECT Package_Code,
Description,
Duration,
Site_Code
FROM tbl_Holiday_Details
INNER JOIN tbl_Site_Visted
USING( Package_Code )
INNER JOIN tbl_Site_Visted
USING( Site_Code )

Worse query plan with a JOIN after ANALYZE

I see that running ANALYZE results in significantly poor performance on a particular JOIN I'm making between two tables.
Suppose the following schema:
CREATE TABLE a ( id INTEGER PRIMARY KEY, name TEXT );
CREATE TABLE b ( a NOT NULL REFERENCES a, value INTEGER, PRIMARY KEY(a, b) );
CREATE VIEW ab AS SELECT a.name, b.text, MAX(b.value)
FROM a
JOIN b ON b.a = a.id;
GROUP BY a.id
ORDER BY a.name
Table a is approximately 10K rows, table b is approximately 48K rows (~5 rows per row in table a).
Before ANALYZE
Now when I run the following query:
SELECT * FROM ab;
The query plan looks as follows:
1|0|0|SCAN TABLE b
1|1|1|SEARCH TABLE a USING INTEGER PRIMARY KEY (rowid=?)
This is a good plan, b is larger and I want it to be in the outer loop, making use of the index in table a. It finishes well within a second.
After ANALYZE
When I execute the same query again, the query plan results in two table scans:
1|0|1|SCAN TABLE a
1|1|0|SCAN TABLE b
This is far for optimal. For some reason the query planner thinks that an outer loop of 10K rows and an inner loop of 48K rows is a better fit. This takes about 1.5 minute to complete.
Should I adapt the index in table b to make it work after ANALYZE? Anything else to change to the indexing/schema?
I just try to understand the problem here. I worked around it using a CROSS JOIN, but that feels dirty and I don't really understand why the planner would go with a plan that is orders of magnitude slower than the un-analyzed plan. It seems to be related to GROUP BY, since the query planner puts table b in the outer loop without it (but that renders the query useless for what I want).
Accidentally found the answer by adjusting the GROUP BY clause in the view definition. Instead of joining on a.id, I group on b.a instead, although they have the same values.
CREATE VIEW ab AS SELECT a.name, b.text, MAX(b.value)
FROM a
JOIN b ON b.a = a.id;
GROUP BY b.a -- <== changed this from a.id to b.a
ORDER BY a.name
I'm still not entirely sure what the difference is, since it groups the same data.

Oracle ignores hint for index with synonym and 2 views

This is the query i am running:
select /*+ index(V_AMV_PLG_ORDER_HISTORY_200_MS.orders.T0 IDX_ORDER_VERSION_3) */ *
from V_AMV_PLG_ORDER_HISTORY_200_MS
where EXCHANGE_SK = 32 and PRODUCT_SK = 1000169
And it uses a different index than the one i am ordering it to.
As you can see, I am querying from the view V_AMV_PLG_ORDER_HISTORY_200_MS, you can see its sql query here:
V_AMV_PLG_ORDER_HISTORY_200_MS view SQL Query:
SELECT AMV_PERF_PROFILES_FRONTEND.AMV_PLG_GET_SEGMENT(200, orders.ORDER_GLOBAL_DATE_TIME) AS ORDER_DATE_TIME,
SUM(orders.BASE_VOLUME) AS VOLUME,
SUM(orders.BASE_CURR_LIMIT_PRICE*orders.BASE_VOLUME)/SUM(orders.BASE_VOLUME) AS PRICE,
orders.PRODUCT_SK AS PRODUCT_SK,
orders.EXCHANGE_SK AS EXCHANGE_SK,
orders.DIRECTION_CD AS DIRECTION_CD,
orders.AGG_UNIT_CD AS AGG_UNIT_CD,
orders.TRADER_KEY AS EXECUTING_REPRESENTATIVE_KEY,
orders.ACCOUNT_KEY AS ACCOUNT_KEY,
a.BUSINESS_UNIT_CD AS BUSINESS_UNIT_CD
FROM AMV_PERF_PROFILES_FRONTEND.S_AMV_ORDER_VERSION_NEW orders
INNER JOIN AMV_PERF_PROFILES_FRONTEND.S_AMV_ACCOUNT a
ON a.ACCOUNT_KEY = orders.ACCOUNT_KEY
WHERE BASE_VOLUME > 0
GROUP BY AMV_PERF_PROFILES_FRONTEND.AMV_PLG_GET_SEGMENT(200, orders.ORDER_GLOBAL_DATE_TIME),
orders.PRODUCT_SK,
orders.EXCHANGE_SK,
orders.ACCOUNT_KEY,
a.BUSINESS_UNIT_CD,
orders.AGG_UNIT_CD,
orders.TRADER_KEY,
orders.DIRECTION_CD;
He is getting the data using the Synonym S_AMV_ORDER_VERSION_NEW, Which directs to another Scheme, to a view called V_AMV_ORDER_VERSION and refering to it as orders, its sql query here:
V_AMV_ORDER_VERSION view Sql query:
SELECT T1.ENTITY_KEY ,
T2.AGG_UNIT_CD ,
T0.BASE_CURR_LIMIT_PRICE ,
T7.DIRECTION_CD ,
T0.EXCHANGE_SK,
T0.ORDER_LOCAL_DATE_TIME ,
T0.PRODUCT_SK,
T18.ENTITY_KEY ,
T19.ENTITY_KEY ,
T0.NOTIONAL_VALUE2 ,
T0.NOTIONAL_VALUE ,
T0.ORDER_GLOBAL_DATE_TIME ,
T0.BASE_VOLUME ,
T31.TRANSACTION_STATUS_CD ,
T0.ORDER_VERSION_KEY
FROM ETS_UDM_CDS_NEW.ORDER_VERSION T0
LEFT OUTER JOIN ETS_UDM_CDS_NEW.ENTITY T1
ON T0.ACCOUNT_SK = T1.ENTITY_SK
LEFT OUTER JOIN ETS_UDM_CDS_NEW.AGG_UNIT T2
ON T0.AGG_UNIT_SK = T2.ENTITY_SK
LEFT OUTER JOIN ETS_UDM_CDS_NEW.DIRECTION T7
ON T0.DIRECTION_SK = T7.ENTITY_SK
LEFT OUTER JOIN ETS_UDM_CDS_NEW.ENTITY T18
ON T0.LOCAL_TIME_ZONE_SK = T18.ENTITY_SK
LEFT OUTER JOIN ETS_UDM_CDS_NEW.ENTITY T19
ON T0.TRADER_SK = T19.ENTITY_SK
LEFT OUTER JOIN ETS_UDM_CDS_NEW.TRANSACTION_STATUS T31
ON T0.TRANSACTION_STATUS_SK = T31.ENTITY_SK;
Which takes its data from a table called ORDER_VERSION and refers to it as T0
this table has an index called IDX_ORDER_VERSION
The problem is that oracle ignores my hint, And uses a different index, Now, I have managed to use a hint to make oracle use an index i wanted when i was querying a view that gets data from a table, But this time I am querying a view which gets his data from another view which gets his data from a table.
And also, The second view in the line is on a different Scheme and i am using a synonym, So perhaps that is why i am missing something Cuz i tried many combinations of possible solutions i found on google but nothing seems to be working...
I would say that if i go one step forward and query directly from V_AMV_ORDER_VERSION (Without the synonym) IT works and i can make oracle work with any index i want, so this query works perfect:
select /*+ index(orders.T0 IDX_ORDER_VERSION_5) */ * from V_AMV_ORDER_VERSION orders
where EXCHANGE_SK =32 and PRODUCT_SK = 1000169
Well me and our company's DBA looked at it for a while, it seems like an Oracle bug in the Global Hint manifestation, We have created the view V_AMV_PLG_ORDER_HISTORY_200_MS using a regular join rather than an ANSI join, and now it works properly:
V_AMV_PLG_ORDER_HISTORY_200_MS view SQL Query:
SELECT AMV_PERF_PROFILES_FRONTEND.AMV_PLG_GET_SEGMENT(200, orders.ORDER_GLOBAL_DATE_TIME) AS ORDER_DATE_TIME,
SUM(orders.BASE_VOLUME) AS VOLUME,
SUM(orders.BASE_CURR_LIMIT_PRICE*orders.BASE_VOLUME)/SUM(orders.BASE_VOLUME) AS PRICE,
orders.PRODUCT_SK AS PRODUCT_SK,
orders.EXCHANGE_SK AS EXCHANGE_SK,
orders.DIRECTION_CD AS DIRECTION_CD,
orders.AGG_UNIT_CD AS AGG_UNIT_CD,
orders.TRADER_KEY AS EXECUTING_REPRESENTATIVE_KEY,
orders.ACCOUNT_KEY AS ACCOUNT_KEY,
a.BUSINESS_UNIT_CD AS BUSINESS_UNIT_CD
FROM AMV_PERF_PROFILES_FRONTEND.S_AMV_ORDER_VERSION_NEW orders,
AMV_PERF_PROFILES_FRONTEND.S_AMV_ACCOUNT a
WHERE BASE_VOLUME > 0 AND a.ACCOUNT_KEY = orders.ACCOUNT_KEY
GROUP BY AMV_PERF_PROFILES_FRONTEND.AMV_PLG_GET_SEGMENT(200, orders.ORDER_GLOBAL_DATE_TIME),
orders.PRODUCT_SK,
orders.EXCHANGE_SK,
orders.ACCOUNT_KEY,
a.BUSINESS_UNIT_CD,
orders.AGG_UNIT_CD,
orders.TRADER_KEY,
orders.DIRECTION_CD;

Resources