Optimizing Left Join With Group By and Order By (MariaDb) - performance

I am attempting to optimize a query in MariaDb that is really bogged down by its ORDER BY clause. I can run it in under a tenth of a second without the ORDER BY clause, but it takes over 25 seconds with it. Here is the gist of the query:
SELECT u.id, u.display_name, u.cell_phone, u.email,
uv.year, uv.make, uv.model, uv.id AS user_vehicle_id
FROM users u
LEFT JOIN user_vehicles uv ON uv.user_id = u.id AND uv.current_owner=1
WHERE u.is_deleted = 0
GROUP BY u.id
ORDER BY u.display_name
LIMIT 0, 10;
I need it to be a left join because I want to include users that aren't linked to a vehicle.
I need the group by because I want only 1 result per user (and display_name is not guaranteed to be unique).
users table has about 130K rows, while user_vehicles has about 230K rows.
Here is the EXPLAIN of the query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE u index dms_cust_idx PRIMARY 4 null 124825 Using where; Using temporary; Using filesort
1 SIMPLE uv ref user_idx user_idx 4 awscheduler.u.id 1 Using where
I have tried these two indices to speed things up, but they don't seem to do much.
CREATE INDEX idx_display_speedy ON users(display_name);
CREATE INDEX idx_display_speedy2 ON users(id, display_name, is_deleted, dms_cust_id);
I am looking for ideas on how to speed this up. I attempted using nested queries, but since the order by is the bottleneck & order within the nested query is ignored, I believe that attempt was in vain.

how about:
WITH a AS (
SELECT u.id, u.display_name, u.cell_phone, u.email
FROM users u
WHERE u.is_deleted = 0
GROUP BY u.id
LIMIT 0, 10
)
SELECT a.id, a.display_name, a.cell_phone, a.email,
uv.year, uv.make, uv.model, uv.id AS user_vehicle_id
FROM a LEFT JOIN user_vehicles uv ON uv.user_id = a.id AND uv.current_owner=1
ORDER BY a.display_name;
The intention is we take a subset of users before joining it with user_vehicles.
Disclaimer: I haven't verified if its faster or not, but have similar experience in the past where this helps.

with a as (
SELECT u.id, u.display_name, u.cell_phone, u.email,
uv.year, uv.make, uv.model, uv.id AS user_vehicle_id
FROM users u
LEFT JOIN user_vehicles uv ON uv.user_id = u.id AND uv.current_owner=1
WHERE u.is_deleted = 0
GROUP BY u.id
)
select * from a
ORDER BY u.display_name;
)

I suspect it's not actually the ordering that is causing the problem... If you remove the limit, I bet the ordered and un-ordered versions will end up performing pretty close to the same.
Depending on if your actual query is as simple as the one you posted, you may be able to get good performance in a single query by using RowNum() as described here:
SELECT u.id, u.display_name, u.cell_phone, u.email,
uv.year, uv.make, uv.model, uv.id AS user_vehicle_id
FROM (
SELECT iu.id, iu.display_name, iu.cell_phone, iu.email
FROM users iu
WHERE iu.is_deleted = 0
ORDER BY iu.display_name) as u
LEFT JOIN user_vehicles uv ON uv.user_id = u.id AND uv.current_owner=1
WHERE ROWNUM() < 10
GROUP BY u.id
ORDER BY u.display_name
If that doesn't work, you probably need to select the users in one select and then select their vehicles in a second Select

Related

Oracle - Aggregate CLOB within PIVOT

I have large user data containing their location permissions and groups of these permissions. I need to do a report containing only 1 row for every user.
There are so many permissions per one user in one group of them, I need to use my own aggregate function LISTAGG_CLOB which makes aggregate list but returns CLOB (more than 4000 characters).
This query gives me the list of all users with their permissions but it makes one row for every group of them.
select u.NAME,
lpg.NAME,
LISTAGG_CLOB(l.NAME)
from USERS u
left join LOCATION_PERM lp
on lp.USER_ID = u.ID
left join LOCATION l
on l.ID = lp.LOCATION
left join LOCATION_PERM_GROUP lpg
on lpg.ID = lp.GROUP
group by u.NAME, lpg.NAME
I tried to pivot these data but I can't get it to work, because Oracle doesn't recognize my own aggregate function as aggregate function and none aggregate funcitons (besides LISTAGG which is too small) is meant to aggregate strings.
Samples of what I tried are:
1)
select * from (
select u.NAME as USER,
lpg.NAME as PERM_GROUP,
LISTAGG_CLOB(l.NAME) as PERMISSIONS
from USERS u
left join LOCATION_PERM lp
on lp.USER_ID = u.ID
left join LOCATION l
on l.ID = lp.LOCATION
left join LOCATION_PERM_GROUP lpg
on lpg.ID = lp.GROUP
group by u.NAME, lpg.NAME)
pivot (LISTAGG(PERMISSIONS) within group (order by PERMISSIONS) for PERM_GROUP in ('Global', 'Orders', 'Admin') )
But it produces
Buffer too small for CLOB to CHAR or BLOB to RAW conversion
error
2)
select * from (
select u.NAME as USER,
lpg.NAME as PERM_GROUP,
LISTAGG_CLOB(l.NAME) as PERMISSIONS
from USERS u
left join LOCATION_PERM lp
on lp.USER_ID = u.ID
left join LOCATION l
on l.ID = lp.LOCATION
left join LOCATION_PERM_GROUP lpg
on lpg.ID = lp.GROUP
group by u.NAME, lpg.NAME)
pivot (DISTINCT(PERMISSIONS) for PERM_GROUP in ('Global', 'Orders', 'Admin') )
But it says "missing expression" so I guess I can't use DISTINCT (or UNIQUE) keyword in place of aggregate function.
I also tried MAX and other aggregate functions but they accept only numbers as input.
Any suggestions how to pivot these data?

Performance: LEFT JOIN vs SUBQUERY

I'm using PostgreSQL 9.3 and have the following tables (simplified to only show the relevant fields):
SITES:
id
name
...
DEVICES:
id
site_id
mac_address UNIQUE
...
Given the mac_address of a particular device, and I want to get the details of the associated site. I have the following two queries:
Using LEFT JOIN:
SELECT s.* FROM sites s
LEFT JOIN devices d ON s.id = d.site_id
WHERE d.mac_address = '00:00:00:00:00:00';
Using SUBQUERY:
SELECT s.* FROM sites s
WHERE s.id IN (SELECT d.site_id FROM devices d WHERE d.mac_address = '00:00:00:00:00:00');
Which of the two queries would have the best performance over an infinitely growing database? I have always leaned towards the LEFT JOIN option, but would be interested to know how the performance of both rates on a large data set.
It generally won't make any difference, because they should result in the same query plan. At least, an EXISTS subquery will; IN isn't as always as intelligently optimised.
For the subquery, rather than using IN (...) you should generally prefer EXISTS (...).
SELECT s.*
FROM sites s
WHERE EXISTS (
SELECT 1
FROM devices d
WHERE d.mac_address = '00:00:00:00:00:00'
AND d.site_id = s.id
);

How to simplify this postgres query

How can I simplify this query?
What I am trying to do is derive the column S9_Unlock via a subquery in which I only look for user_ids which are returned from the main query but this looks very awkward to me, especially as this query here is just an excerpt. In reality I am doing multiple of these subqueries to derive different columns...
SELECT userid, CAST(to_char(S9_unlock,'YYYY/MM/DD') AS timestamp) AS "S9_Unlock"
FROM (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON inv.id = ca.invoice_id
LEFT JOIN shop_db.cart_items AS ci ON ca.id = ci.cart_id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
GROUP BY user_id) AS master
LEFT JOIN (
SELECT MIN(s3.unl) AS "S9_Unlock", s3.user_id
FROM (
SELECT user_id, challenge_codes.created AS unl,
MAX /* Check if license contains Suite9 */
(CASE WHEN substring(bundle_article_code,1,6) = 'BuSu90' THEN 1 ELSE 0 END) AS "S9_Unlock"
FROM licensing_db.serial_numbers
LEFT JOIN licensing_db.licenses ON licenses.id = serial_numbers.license_id
LEFT JOIN user_db.users ON users.id = licenses.user_id
LEFT JOIN licensing_db.challenge_codes ON challenge_codes.serial_number_id = serial_numbers.id
WHERE user_id IN (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON inv.id = ca.invoice_id
LEFT JOIN shop_db.cart_items AS ci ON ca.id = ci.cart_id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
GROUP BY user_id
)
GROUP BY user_id, challenge_codes.created) AS s3
)
WHERE "S9_Unlock" = 1
AND s3.unl IS NOT NULL
GROUP BY s3.user_id) AS "S9_Unlock" ON "S9_Unlock".user_id = master.userid
In your query you have two sub-queries that are identical; this screams for a CTE.
In the sub-query on licensing issues you can filter out the valid licenses after the GROUP BY clause using a HAVING clause. Make that a WITH QUERY too and you end up with the rather more readable:
WITH inv AS (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON ca.invoice_id = inv.id
LEFT JOIN shop_db.cart_items AS ci ON ci.cart_id = ca.id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
), s3 AS (
SELECT u.user_id, min(cc.created) AS first_unlocked, bundle_article_code
FROM licensing_db.serial_numbers AS sn
LEFT JOIN licensing_db.licenses AS lic ON lic.id = sn.license_id
LEFT JOIN user_db.users AS u ON u.id = lic.user_id
LEFT JOIN licensing_db.challenge_codes AS cc ON cc.serial_number_id = sn.id
WHERE u.user_id IN (SELECT userid FROM inv)
GROUP BY u.user_id, bundle_article_code
HAVING bundle_article_code LIKE 'BuSu90%'
AND first_unlocked IS NOT NULL
)
SELECT userid, date_trunc('day', first_unlocked) AS "S9_Unlock"
FROM inv
LEFT JOIN s3 ON s3.user_id = inv.userid;
So the main query is now reduced to 3 lines and both the WITH-QUERY's perform a logically self-contained query of the database. The other sub-queries you refer to can similarly become a WITH-QUERY and then you assemble them in the main query. Remember that you can refer to earlier named queries in the list of with-queries, as is shown above with inv being referred to by s3. While such CTE's are syntactically not providing new functionality (except for the RECURSIVE variant), they do make complex queries much more readable and therefore easier to maintain.
Another approach would be to factor out logical sub-components (such as the inv sub-query) and make a VIEW out of those. Then you can simply reference the view in the main query. Making the whole thing a view is probably also a good idea if you want to make the query more flexible. What if you want to query for Suite9.1 ('BuSu91%') on 27 March 2014? Taken those literals out and then using them as WHERE clauses in a view makes your query more versatile; this can be either with sub-queries or with the complete CTE.
(Please check if the semantics are still right in the s3 with-query because without your table structures and sample data I ccannot test my code above.)
Instead of solving your problem as one big monolithic relational sql query, I would seriously consider going the "procedural" way, by using the built-in "plpgsql" language of postgresql. This could bring a lot of clarity in your application.

PL SQL - Join 2 tables and return max from right table

Trying to retrive the MAX doc in the right table.
SELECT F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC/100 As Received,
F431.PRAREC/100,
max(F431.PRDOC)
FROM PRODDTA.F43121 F431
LEFT OUTER JOIN PRODDTA.F4311 F43
ON
F43.PDKCOO=F431.PRKCOO
AND F43.PDDOCO=F431.PRDOCO
AND F43.PDDCTO=F431.PRDCTO
AND F43.PDSFXO=F431.PRSFXO
AND F43.PDLNID=F431.PRLNID
WHERE F431.PRDOCO = 401531
and F431.PRMATC = 2
and F43.PDLNTY = 'DC'
Group by
F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC,
F431.PRAREC/100
This query is still returning the two rows in the right table. Fairly new to SQL and struggling with the statement. Any help would be appreciated.
Without seeing your data it is difficult to tell where the problem might so I will offer a few suggestions that could help.
First, you are joining with a LEFT JOIN on the PRODDTA.F4311 but you have in the WHERE clause a filter for that table. You should move the F43.PDLNTY = 'DC' to the JOIN condition. This is causing the query to act like an INNER JOIN.
Second, you can try using a subquery to get the MAX(PRDOC) value. Then you can limit the columns that you are grouping on which could eliminate the duplicates. The query would them be similar to the following:
SELECT F43.PDDOCO,
F43.PDSFXO,
F43.PDLNID,
F43.PDAREC/100 As Received,
F431.PRAREC/100,
F431.PRDOC
FROM PRODDTA.F43121 F431
INNER JOIN
(
-- subquery to get the max
-- then group by the distinct columns
SELECT PDKCOO, max(PRDOC) MaxPRDOC
FROM PRODDTA.F43121
WHERE PRDOCO = 401531
and PRMATC = 2
GROUP BY PDKCOO
) f2
-- join the subquery result back to the PRODDTA.F43121 table
on F431.PRDOC = f2.MaxPRDOC
AND F431.PDKCOO = f2.PDKCOO
LEFT OUTER JOIN PRODDTA.F4311 F43
ON F43.PDKCOO=F431.PRKCOO
AND F43.PDDOCO=F431.PRDOCO
AND F43.PDDCTO=F431.PRDCTO
AND F43.PDSFXO=F431.PRSFXO
AND F43.PDLNID=F431.PRLNID
AND F43.PDLNTY = 'DC' -- move this filter to the join instead of the WHERE
WHERE F431.PRDOCO = 401531
and F431.PRMATC = 2
If you provide your table structures and some sample data, it will be easier to determine the issue.

Tsql: what is the best way to retrieve some records with a specific criteria?

I have a table (Cars) which saves some characteristics car like EngineNo, LastProductionStepId, NodyNo, ...
Besides, I have another table (CarSteps) which saves all steps that a specific car should pass during its manufacturing proccess like Engine Assigning(Id = 2), Engraving(3), PrePaint(4), Paint(5), AfterPaint(6), Confirmation(7), Delivery(8)
I would like to get all cars that are between PrePaint and Confirmation at the moment:
select cr.Id, cr.BodyNo, cr.LastStepId
from Cars cr WITH (NOLOCK)
inner join CarSteps steps WITH (NOLOCK) on cr.Id = trace.CarId and cr.LastStepId=trace.StepId
where
cr.LastStepId >= 4
AND cr.[Status] = 1
AND steps.[Status] = 1
AND not exists ( select *
from CarSteps steps1 WITH (NOLOCK)
where steps1.CarId = cr.Id
AND steps1.StepId >= 7 AND steps1.Status = 1
)
Because CarSteps has many records (44 million) the query is slow .
what is your opinion? is there any better way to get those cars?
Looking at your query I see a join from Cars to CarSteps, I see you joining to trace.CarId and trace.StepId is. trace is not defined in your query.
from
Cars cr WITH (NOLOCK) inner join
CarSteps steps WITH (NOLOCK) on
cr.Id = trace.CarId and
cr.LastStepId=trace.StepId
I may be able to help if i have better understand the fully query.
What does the execution plan reveal?

Resources