Performance: LEFT JOIN vs SUBQUERY - performance

I'm using PostgreSQL 9.3 and have the following tables (simplified to only show the relevant fields):
SITES:
id
name
...
DEVICES:
id
site_id
mac_address UNIQUE
...
Given the mac_address of a particular device, and I want to get the details of the associated site. I have the following two queries:
Using LEFT JOIN:
SELECT s.* FROM sites s
LEFT JOIN devices d ON s.id = d.site_id
WHERE d.mac_address = '00:00:00:00:00:00';
Using SUBQUERY:
SELECT s.* FROM sites s
WHERE s.id IN (SELECT d.site_id FROM devices d WHERE d.mac_address = '00:00:00:00:00:00');
Which of the two queries would have the best performance over an infinitely growing database? I have always leaned towards the LEFT JOIN option, but would be interested to know how the performance of both rates on a large data set.

It generally won't make any difference, because they should result in the same query plan. At least, an EXISTS subquery will; IN isn't as always as intelligently optimised.
For the subquery, rather than using IN (...) you should generally prefer EXISTS (...).
SELECT s.*
FROM sites s
WHERE EXISTS (
SELECT 1
FROM devices d
WHERE d.mac_address = '00:00:00:00:00:00'
AND d.site_id = s.id
);

Related

Optimizing Left Join With Group By and Order By (MariaDb)

I am attempting to optimize a query in MariaDb that is really bogged down by its ORDER BY clause. I can run it in under a tenth of a second without the ORDER BY clause, but it takes over 25 seconds with it. Here is the gist of the query:
SELECT u.id, u.display_name, u.cell_phone, u.email,
uv.year, uv.make, uv.model, uv.id AS user_vehicle_id
FROM users u
LEFT JOIN user_vehicles uv ON uv.user_id = u.id AND uv.current_owner=1
WHERE u.is_deleted = 0
GROUP BY u.id
ORDER BY u.display_name
LIMIT 0, 10;
I need it to be a left join because I want to include users that aren't linked to a vehicle.
I need the group by because I want only 1 result per user (and display_name is not guaranteed to be unique).
users table has about 130K rows, while user_vehicles has about 230K rows.
Here is the EXPLAIN of the query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE u index dms_cust_idx PRIMARY 4 null 124825 Using where; Using temporary; Using filesort
1 SIMPLE uv ref user_idx user_idx 4 awscheduler.u.id 1 Using where
I have tried these two indices to speed things up, but they don't seem to do much.
CREATE INDEX idx_display_speedy ON users(display_name);
CREATE INDEX idx_display_speedy2 ON users(id, display_name, is_deleted, dms_cust_id);
I am looking for ideas on how to speed this up. I attempted using nested queries, but since the order by is the bottleneck & order within the nested query is ignored, I believe that attempt was in vain.
how about:
WITH a AS (
SELECT u.id, u.display_name, u.cell_phone, u.email
FROM users u
WHERE u.is_deleted = 0
GROUP BY u.id
LIMIT 0, 10
)
SELECT a.id, a.display_name, a.cell_phone, a.email,
uv.year, uv.make, uv.model, uv.id AS user_vehicle_id
FROM a LEFT JOIN user_vehicles uv ON uv.user_id = a.id AND uv.current_owner=1
ORDER BY a.display_name;
The intention is we take a subset of users before joining it with user_vehicles.
Disclaimer: I haven't verified if its faster or not, but have similar experience in the past where this helps.
with a as (
SELECT u.id, u.display_name, u.cell_phone, u.email,
uv.year, uv.make, uv.model, uv.id AS user_vehicle_id
FROM users u
LEFT JOIN user_vehicles uv ON uv.user_id = u.id AND uv.current_owner=1
WHERE u.is_deleted = 0
GROUP BY u.id
)
select * from a
ORDER BY u.display_name;
)
I suspect it's not actually the ordering that is causing the problem... If you remove the limit, I bet the ordered and un-ordered versions will end up performing pretty close to the same.
Depending on if your actual query is as simple as the one you posted, you may be able to get good performance in a single query by using RowNum() as described here:
SELECT u.id, u.display_name, u.cell_phone, u.email,
uv.year, uv.make, uv.model, uv.id AS user_vehicle_id
FROM (
SELECT iu.id, iu.display_name, iu.cell_phone, iu.email
FROM users iu
WHERE iu.is_deleted = 0
ORDER BY iu.display_name) as u
LEFT JOIN user_vehicles uv ON uv.user_id = u.id AND uv.current_owner=1
WHERE ROWNUM() < 10
GROUP BY u.id
ORDER BY u.display_name
If that doesn't work, you probably need to select the users in one select and then select their vehicles in a second Select

Oracle Performance issues on using subquery in an "In" orperator

I have two query that looks close to the same but Oracle have very different performance.
Query A
Create Table T1 as Select * from FinalView1 where CustomerID in ('A0000001','A000002')
Query B
Create Table T1 as Select * from FinalView1 where CustomerID in (select distinct CustomerID from CriteriaTable)
The CriteriaTable have 800 rows but all belongs to Customer ID 'A0000001' and 'A000002'.
This means the subquery: "select distinct CustomerID from CriteriaTable" also only returns the same two elements('A0000001','A000002') as manually entered in query A
Following is the query under the FinalView1
create or replace view FinalView1_20200716 as
select
Customer_ID,
<Some columns>
from
Table1_20200716 T1
INNER join Table2_20200716 T2 on
T1.Invoice_number = T2.Invoice_number
and
T1.line_id = T2.line_id
left join Table3_20200716 T3 on
T3.id = T1.Customer_ID
left join Table4_20200716 T4 on
T4.Shipping_ID = T1.Shipping_ID
left join Table5_20200716 Table5 on
Table5.Invoice_ID = T1.Invoice_ID
left join Table6_20200716 T6 on
T6.Shipping_ID = T4.Shipping_ID
left join First_Order first on
first.Shipping_ID = T1.Shipping_ID
;
Table1_20200716,Table2_20200716,Table3_20200716,Table4_20200716,Table5_20200716,Table6_20200716 are views to the corresponding table with temporal validity feature. For example
The query under Table1_20200716
Create or replace view Table1_20200716 as
select
*
from Table1 as for period of to_date('20200716,'yyyymmdd')
However table "First_Order" is just a normal table as
Following is the performance for both queries (According to explain plan):
Query A:
Cardinality: 102
Cost : 204
Total Runtime: 5 secs max
Query B:
Cardinality:27921981
Cost: 14846
Total Runtime:20 mins until user cancelled
All tables are indexed using those columns that used to join against other tables in the FinalView1. According to the explain plan, they have all been used except for the FirstOrder table.
Query A used uniquue index on the FirstOrder Table while Query B performed a full scan.
For query B, I was expecting the Oracle will firstly query the sub-query get the result into the in operator, before executing the main query and therefore should only have minor impact to the performance.
Thanks in advance!
As mentioned from my comment 2 days ago. Someone have actually posted the solution and then have it removed while the answer actually work. After waiting for 2 days the So I designed to post that solution.
That solution suggested that the performance was slow down by the "in" operator. and suggested me to replace it with an inner join
Create Table T1 as
Select
FV.*
from
FinalView1 FV
inner join (
select distinct
CustomerID
from
CriteriaTable
) CT on CT.customerid = FV.customerID;
Result from explain plan was worse then before:
Cardinality:28364465 (from 27921981)
Cost: 15060 (from 14846)
However, it only takes 17 secs. Which is very good!

How to simplify this postgres query

How can I simplify this query?
What I am trying to do is derive the column S9_Unlock via a subquery in which I only look for user_ids which are returned from the main query but this looks very awkward to me, especially as this query here is just an excerpt. In reality I am doing multiple of these subqueries to derive different columns...
SELECT userid, CAST(to_char(S9_unlock,'YYYY/MM/DD') AS timestamp) AS "S9_Unlock"
FROM (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON inv.id = ca.invoice_id
LEFT JOIN shop_db.cart_items AS ci ON ca.id = ci.cart_id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
GROUP BY user_id) AS master
LEFT JOIN (
SELECT MIN(s3.unl) AS "S9_Unlock", s3.user_id
FROM (
SELECT user_id, challenge_codes.created AS unl,
MAX /* Check if license contains Suite9 */
(CASE WHEN substring(bundle_article_code,1,6) = 'BuSu90' THEN 1 ELSE 0 END) AS "S9_Unlock"
FROM licensing_db.serial_numbers
LEFT JOIN licensing_db.licenses ON licenses.id = serial_numbers.license_id
LEFT JOIN user_db.users ON users.id = licenses.user_id
LEFT JOIN licensing_db.challenge_codes ON challenge_codes.serial_number_id = serial_numbers.id
WHERE user_id IN (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON inv.id = ca.invoice_id
LEFT JOIN shop_db.cart_items AS ci ON ca.id = ci.cart_id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
GROUP BY user_id
)
GROUP BY user_id, challenge_codes.created) AS s3
)
WHERE "S9_Unlock" = 1
AND s3.unl IS NOT NULL
GROUP BY s3.user_id) AS "S9_Unlock" ON "S9_Unlock".user_id = master.userid
In your query you have two sub-queries that are identical; this screams for a CTE.
In the sub-query on licensing issues you can filter out the valid licenses after the GROUP BY clause using a HAVING clause. Make that a WITH QUERY too and you end up with the rather more readable:
WITH inv AS (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON ca.invoice_id = inv.id
LEFT JOIN shop_db.cart_items AS ci ON ci.cart_id = ca.id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
), s3 AS (
SELECT u.user_id, min(cc.created) AS first_unlocked, bundle_article_code
FROM licensing_db.serial_numbers AS sn
LEFT JOIN licensing_db.licenses AS lic ON lic.id = sn.license_id
LEFT JOIN user_db.users AS u ON u.id = lic.user_id
LEFT JOIN licensing_db.challenge_codes AS cc ON cc.serial_number_id = sn.id
WHERE u.user_id IN (SELECT userid FROM inv)
GROUP BY u.user_id, bundle_article_code
HAVING bundle_article_code LIKE 'BuSu90%'
AND first_unlocked IS NOT NULL
)
SELECT userid, date_trunc('day', first_unlocked) AS "S9_Unlock"
FROM inv
LEFT JOIN s3 ON s3.user_id = inv.userid;
So the main query is now reduced to 3 lines and both the WITH-QUERY's perform a logically self-contained query of the database. The other sub-queries you refer to can similarly become a WITH-QUERY and then you assemble them in the main query. Remember that you can refer to earlier named queries in the list of with-queries, as is shown above with inv being referred to by s3. While such CTE's are syntactically not providing new functionality (except for the RECURSIVE variant), they do make complex queries much more readable and therefore easier to maintain.
Another approach would be to factor out logical sub-components (such as the inv sub-query) and make a VIEW out of those. Then you can simply reference the view in the main query. Making the whole thing a view is probably also a good idea if you want to make the query more flexible. What if you want to query for Suite9.1 ('BuSu91%') on 27 March 2014? Taken those literals out and then using them as WHERE clauses in a view makes your query more versatile; this can be either with sub-queries or with the complete CTE.
(Please check if the semantics are still right in the s3 with-query because without your table structures and sample data I ccannot test my code above.)
Instead of solving your problem as one big monolithic relational sql query, I would seriously consider going the "procedural" way, by using the built-in "plpgsql" language of postgresql. This could bring a lot of clarity in your application.

Oracle ignores hint for index with synonym and 2 views

This is the query i am running:
select /*+ index(V_AMV_PLG_ORDER_HISTORY_200_MS.orders.T0 IDX_ORDER_VERSION_3) */ *
from V_AMV_PLG_ORDER_HISTORY_200_MS
where EXCHANGE_SK = 32 and PRODUCT_SK = 1000169
And it uses a different index than the one i am ordering it to.
As you can see, I am querying from the view V_AMV_PLG_ORDER_HISTORY_200_MS, you can see its sql query here:
V_AMV_PLG_ORDER_HISTORY_200_MS view SQL Query:
SELECT AMV_PERF_PROFILES_FRONTEND.AMV_PLG_GET_SEGMENT(200, orders.ORDER_GLOBAL_DATE_TIME) AS ORDER_DATE_TIME,
SUM(orders.BASE_VOLUME) AS VOLUME,
SUM(orders.BASE_CURR_LIMIT_PRICE*orders.BASE_VOLUME)/SUM(orders.BASE_VOLUME) AS PRICE,
orders.PRODUCT_SK AS PRODUCT_SK,
orders.EXCHANGE_SK AS EXCHANGE_SK,
orders.DIRECTION_CD AS DIRECTION_CD,
orders.AGG_UNIT_CD AS AGG_UNIT_CD,
orders.TRADER_KEY AS EXECUTING_REPRESENTATIVE_KEY,
orders.ACCOUNT_KEY AS ACCOUNT_KEY,
a.BUSINESS_UNIT_CD AS BUSINESS_UNIT_CD
FROM AMV_PERF_PROFILES_FRONTEND.S_AMV_ORDER_VERSION_NEW orders
INNER JOIN AMV_PERF_PROFILES_FRONTEND.S_AMV_ACCOUNT a
ON a.ACCOUNT_KEY = orders.ACCOUNT_KEY
WHERE BASE_VOLUME > 0
GROUP BY AMV_PERF_PROFILES_FRONTEND.AMV_PLG_GET_SEGMENT(200, orders.ORDER_GLOBAL_DATE_TIME),
orders.PRODUCT_SK,
orders.EXCHANGE_SK,
orders.ACCOUNT_KEY,
a.BUSINESS_UNIT_CD,
orders.AGG_UNIT_CD,
orders.TRADER_KEY,
orders.DIRECTION_CD;
He is getting the data using the Synonym S_AMV_ORDER_VERSION_NEW, Which directs to another Scheme, to a view called V_AMV_ORDER_VERSION and refering to it as orders, its sql query here:
V_AMV_ORDER_VERSION view Sql query:
SELECT T1.ENTITY_KEY ,
T2.AGG_UNIT_CD ,
T0.BASE_CURR_LIMIT_PRICE ,
T7.DIRECTION_CD ,
T0.EXCHANGE_SK,
T0.ORDER_LOCAL_DATE_TIME ,
T0.PRODUCT_SK,
T18.ENTITY_KEY ,
T19.ENTITY_KEY ,
T0.NOTIONAL_VALUE2 ,
T0.NOTIONAL_VALUE ,
T0.ORDER_GLOBAL_DATE_TIME ,
T0.BASE_VOLUME ,
T31.TRANSACTION_STATUS_CD ,
T0.ORDER_VERSION_KEY
FROM ETS_UDM_CDS_NEW.ORDER_VERSION T0
LEFT OUTER JOIN ETS_UDM_CDS_NEW.ENTITY T1
ON T0.ACCOUNT_SK = T1.ENTITY_SK
LEFT OUTER JOIN ETS_UDM_CDS_NEW.AGG_UNIT T2
ON T0.AGG_UNIT_SK = T2.ENTITY_SK
LEFT OUTER JOIN ETS_UDM_CDS_NEW.DIRECTION T7
ON T0.DIRECTION_SK = T7.ENTITY_SK
LEFT OUTER JOIN ETS_UDM_CDS_NEW.ENTITY T18
ON T0.LOCAL_TIME_ZONE_SK = T18.ENTITY_SK
LEFT OUTER JOIN ETS_UDM_CDS_NEW.ENTITY T19
ON T0.TRADER_SK = T19.ENTITY_SK
LEFT OUTER JOIN ETS_UDM_CDS_NEW.TRANSACTION_STATUS T31
ON T0.TRANSACTION_STATUS_SK = T31.ENTITY_SK;
Which takes its data from a table called ORDER_VERSION and refers to it as T0
this table has an index called IDX_ORDER_VERSION
The problem is that oracle ignores my hint, And uses a different index, Now, I have managed to use a hint to make oracle use an index i wanted when i was querying a view that gets data from a table, But this time I am querying a view which gets his data from another view which gets his data from a table.
And also, The second view in the line is on a different Scheme and i am using a synonym, So perhaps that is why i am missing something Cuz i tried many combinations of possible solutions i found on google but nothing seems to be working...
I would say that if i go one step forward and query directly from V_AMV_ORDER_VERSION (Without the synonym) IT works and i can make oracle work with any index i want, so this query works perfect:
select /*+ index(orders.T0 IDX_ORDER_VERSION_5) */ * from V_AMV_ORDER_VERSION orders
where EXCHANGE_SK =32 and PRODUCT_SK = 1000169
Well me and our company's DBA looked at it for a while, it seems like an Oracle bug in the Global Hint manifestation, We have created the view V_AMV_PLG_ORDER_HISTORY_200_MS using a regular join rather than an ANSI join, and now it works properly:
V_AMV_PLG_ORDER_HISTORY_200_MS view SQL Query:
SELECT AMV_PERF_PROFILES_FRONTEND.AMV_PLG_GET_SEGMENT(200, orders.ORDER_GLOBAL_DATE_TIME) AS ORDER_DATE_TIME,
SUM(orders.BASE_VOLUME) AS VOLUME,
SUM(orders.BASE_CURR_LIMIT_PRICE*orders.BASE_VOLUME)/SUM(orders.BASE_VOLUME) AS PRICE,
orders.PRODUCT_SK AS PRODUCT_SK,
orders.EXCHANGE_SK AS EXCHANGE_SK,
orders.DIRECTION_CD AS DIRECTION_CD,
orders.AGG_UNIT_CD AS AGG_UNIT_CD,
orders.TRADER_KEY AS EXECUTING_REPRESENTATIVE_KEY,
orders.ACCOUNT_KEY AS ACCOUNT_KEY,
a.BUSINESS_UNIT_CD AS BUSINESS_UNIT_CD
FROM AMV_PERF_PROFILES_FRONTEND.S_AMV_ORDER_VERSION_NEW orders,
AMV_PERF_PROFILES_FRONTEND.S_AMV_ACCOUNT a
WHERE BASE_VOLUME > 0 AND a.ACCOUNT_KEY = orders.ACCOUNT_KEY
GROUP BY AMV_PERF_PROFILES_FRONTEND.AMV_PLG_GET_SEGMENT(200, orders.ORDER_GLOBAL_DATE_TIME),
orders.PRODUCT_SK,
orders.EXCHANGE_SK,
orders.ACCOUNT_KEY,
a.BUSINESS_UNIT_CD,
orders.AGG_UNIT_CD,
orders.TRADER_KEY,
orders.DIRECTION_CD;

How to Optimize this Query

I'm learning MS SQL Server 2008 R2 so please excuse my ignorance.
This query takes 3 sec and I would like to do it in less than 1 sec.
the query is only for testing purposes, in reality I would join on different fields.
select * from
(
select row_number() over(order by t1.id) as n, t1.id as id1, t2.id as id2, t3.id as id3, t4.id as id4, t5.id as id5
from dbo.Context t1
inner join dbo.Context t2 on t1.id = t2.test
inner join dbo.Context t3 on t2.id = t3.test
inner join dbo.Context t4 on t3.id = t4.test
inner join dbo.Context t5 on t4.id = t5.test
) as t
where t.n between 950000 and 950009;
I'm afraid this will be worse by the time I have several billion records in this table.
Do I need to enable multi-threading from configuration or something?
There is no real way to optimize the paging portion of such a query, the part that is
t.n between 950000 and 950009
Which is really
{ ROW_NUMBER } between 950000 and 950009
Without fully materializing the INNER JOINs, there is no way to accurately row-number the result. This is unlike a single table with Row_Number - the Query Optimizer can sometimes just count the index keys and go to a direct range.
The only thing you can do is ensure that the JOIN conditions are all fully indexed and have the indexes INCLUDE the columns that will be selected (so they become COVERING INDEXes). There is no point showing specifics since those are not your real columns.
Do I need to enable multi-threading from configuration or something?
By default, parallelism is [already] turned on so such a query will very likely gather the data in multiple streams.
I'd suggest creating the inner query as an indexed view and then running your paging off of that. Since an indexed view actually has a real index on it the same optimization tricks that work on tables can be used.
See here for more information on indexed views including the limitations.

Resources