Oracle - Aggregate CLOB within PIVOT

I have a large set of user data containing location permissions and groups of those permissions. I need to produce a report containing exactly one row per user.
Because a user can have so many permissions within one group, I have to use my own aggregate function, LISTAGG_CLOB, which builds an aggregated list but returns a CLOB (so the result can exceed 4000 characters).
This query gives me the list of all users with their permissions, but it produces one row for every permission group:
select u.NAME,
lpg.NAME,
LISTAGG_CLOB(l.NAME)
from USERS u
left join LOCATION_PERM lp
on lp.USER_ID = u.ID
left join LOCATION l
on l.ID = lp.LOCATION
left join LOCATION_PERM_GROUP lpg
on lpg.ID = lp.GROUP
group by u.NAME, lpg.NAME
I tried to pivot this data, but I can't get it to work: Oracle doesn't recognize my own aggregate function as an aggregate function inside PIVOT, and none of the built-in aggregate functions (besides LISTAGG, whose result is too small) is meant to aggregate strings.
Samples of what I tried are:
1)
select * from (
select u.NAME as USER,
lpg.NAME as PERM_GROUP,
LISTAGG_CLOB(l.NAME) as PERMISSIONS
from USERS u
left join LOCATION_PERM lp
on lp.USER_ID = u.ID
left join LOCATION l
on l.ID = lp.LOCATION
left join LOCATION_PERM_GROUP lpg
on lpg.ID = lp.GROUP
group by u.NAME, lpg.NAME)
pivot (LISTAGG(PERMISSIONS) within group (order by PERMISSIONS) for PERM_GROUP in ('Global', 'Orders', 'Admin') )
But it produces
Buffer too small for CLOB to CHAR or BLOB to RAW conversion
error
2)
select * from (
select u.NAME as USER,
lpg.NAME as PERM_GROUP,
LISTAGG_CLOB(l.NAME) as PERMISSIONS
from USERS u
left join LOCATION_PERM lp
on lp.USER_ID = u.ID
left join LOCATION l
on l.ID = lp.LOCATION
left join LOCATION_PERM_GROUP lpg
on lpg.ID = lp.GROUP
group by u.NAME, lpg.NAME)
pivot (DISTINCT(PERMISSIONS) for PERM_GROUP in ('Global', 'Orders', 'Admin') )
But it says "missing expression" so I guess I can't use DISTINCT (or UNIQUE) keyword in place of aggregate function.
I also tried MAX and other aggregate functions but they accept only numbers as input.
Any suggestions how to pivot these data?

Related

Optimizing Left Join With Group By and Order By (MariaDb)

I am attempting to optimize a query in MariaDb that is really bogged down by its ORDER BY clause. I can run it in under a tenth of a second without the ORDER BY clause, but it takes over 25 seconds with it. Here is the gist of the query:
SELECT u.id, u.display_name, u.cell_phone, u.email,
uv.year, uv.make, uv.model, uv.id AS user_vehicle_id
FROM users u
LEFT JOIN user_vehicles uv ON uv.user_id = u.id AND uv.current_owner=1
WHERE u.is_deleted = 0
GROUP BY u.id
ORDER BY u.display_name
LIMIT 0, 10;
I need it to be a left join because I want to include users that aren't linked to a vehicle.
I need the group by because I want only 1 result per user (and display_name is not guaranteed to be unique).
users table has about 130K rows, while user_vehicles has about 230K rows.
Here is the EXPLAIN of the query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE u index dms_cust_idx PRIMARY 4 null 124825 Using where; Using temporary; Using filesort
1 SIMPLE uv ref user_idx user_idx 4 awscheduler.u.id 1 Using where
I have tried these two indices to speed things up, but they don't seem to do much.
CREATE INDEX idx_display_speedy ON users(display_name);
CREATE INDEX idx_display_speedy2 ON users(id, display_name, is_deleted, dms_cust_id);
I am looking for ideas on how to speed this up. I attempted using nested queries, but since the order by is the bottleneck & order within the nested query is ignored, I believe that attempt was in vain.
How about:
WITH a AS (
SELECT u.id, u.display_name, u.cell_phone, u.email
FROM users u
WHERE u.is_deleted = 0
GROUP BY u.id
LIMIT 0, 10
)
SELECT a.id, a.display_name, a.cell_phone, a.email,
uv.year, uv.make, uv.model, uv.id AS user_vehicle_id
FROM a LEFT JOIN user_vehicles uv ON uv.user_id = a.id AND uv.current_owner=1
ORDER BY a.display_name;
The intention is to take a subset of users before joining it with user_vehicles.
Disclaimer: I haven't verified whether it's faster, but I have had similar experiences in the past where this helped.
with a as (
SELECT u.id, u.display_name, u.cell_phone, u.email,
uv.year, uv.make, uv.model, uv.id AS user_vehicle_id
FROM users u
LEFT JOIN user_vehicles uv ON uv.user_id = u.id AND uv.current_owner=1
WHERE u.is_deleted = 0
GROUP BY u.id
)
select * from a
ORDER BY a.display_name;
I suspect it's not actually the ordering that is causing the problem... If you remove the limit, I bet the ordered and un-ordered versions will end up performing pretty close to the same.
Depending on whether your actual query is as simple as the one you posted, you may be able to get good performance in a single query by using ROWNUM() as described here:
SELECT u.id, u.display_name, u.cell_phone, u.email,
uv.year, uv.make, uv.model, uv.id AS user_vehicle_id
FROM (
SELECT iu.id, iu.display_name, iu.cell_phone, iu.email
FROM users iu
WHERE iu.is_deleted = 0
ORDER BY iu.display_name) as u
LEFT JOIN user_vehicles uv ON uv.user_id = u.id AND uv.current_owner=1
WHERE ROWNUM() <= 10
GROUP BY u.id
ORDER BY u.display_name
If that doesn't work, you probably need to select the users in one SELECT and then select their vehicles in a second SELECT.
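The idea of ordering and limiting the users first, and only then joining their vehicles, can be sketched as follows. This is a minimal illustration, not a benchmark: it uses Python's built-in sqlite3 module instead of MariaDB, and the sample rows are made up. As far as I know, MariaDB may ignore an ORDER BY inside a derived table unless it is paired with a LIMIT, which is why the sketch keeps both together in the inner query.

```python
import sqlite3

# In-memory SQLite database standing in for MariaDB; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, display_name TEXT, is_deleted INTEGER);
CREATE TABLE user_vehicles (id INTEGER PRIMARY KEY, user_id INTEGER, make TEXT, current_owner INTEGER);
INSERT INTO users VALUES (1, 'Zed', 0), (2, 'Amy', 0), (3, 'Bob', 0), (4, 'Aaron', 1);
INSERT INTO user_vehicles VALUES (100, 3, 'Ford', 1), (101, 3, 'Kia', 0), (102, 1, 'Audi', 1);
""")

# Pick the page of users first (ORDER BY + LIMIT in the derived table),
# then join only those few rows to user_vehicles.
page = conn.execute("""
SELECT u.id, u.display_name, uv.make
FROM (
    SELECT id, display_name
    FROM users
    WHERE is_deleted = 0
    ORDER BY display_name
    LIMIT 2
) AS u
LEFT JOIN user_vehicles uv ON uv.user_id = u.id AND uv.current_owner = 1
GROUP BY u.id
ORDER BY u.display_name
""").fetchall()

print(page)
```

With this shape the sort only has to consider the users table, and the join touches at most as many users as the page size.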

how to get the count(id) from the multiple tables using join query

I want to display counts based on the id from multiple tables. For two tables it works fine, but for three tables it does not display any data.
This is my query for three tables, which is not working:
select r.req_id
, r.no_of_positions
, count(j.cand_id) as no_of_closure
, count(cis.cand_id)
from requirement r
join joined_candidates j
on r.req_id=j.req_id
join candidate_interview_schedule cis
on cis.req_id=r.req_id
where cis.interview_status='Interview Scheduled'
group by r.req_id, r.no_of_positions;
Changed to left joins in case a value doesn't exist in a table.
Changed count to use a window function so counts are not artificially inflated by the joins.
Moved the where clause to the join criteria: on a left join, a where filter would eliminate the null rows, making it operate like an inner join.
...MAYBE...
SELECT r.req_id
, r.no_of_positions
, count(j.cand_id) over (partition by J.cand_ID) as no_of_closure
, count(cis.cand_id) over (partition by cis.cand_id) as no_of_CIS_CNT
FROM requirement r
LEFT join joined_candidates j
on r.req_id=j.req_id
LEFT join candidate_interview_schedule cis
on cis.req_id=r.req_id
and cis.interview_status='Interview Scheduled'
GROUP BY r.req_id, r.no_of_positions;
Or perhaps... (if I can assume j.cand_ID and cis.cand_ID are unique), which also eliminates the artificial count increase due to 1:M joins:
SELECT r.req_id
, r.no_of_positions
, count(distinct j.cand_id) as no_of_closure
, count(distinct cis.cand_id) as no_of_CIS_CNT
FROM requirement r
LEFT join joined_candidates j
on r.req_id=j.req_id
LEFT join candidate_interview_schedule cis
on cis.req_id=r.req_id
and cis.interview_status='Interview Scheduled'
GROUP BY r.req_id, r.no_of_positions;
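To see why the plain counts get inflated and how count(distinct ...) avoids it, here is a small sketch using Python's built-in sqlite3 module. The table and column names follow the question, but the sample rows are invented, and it assumes cand_id is unique per requirement in each child table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requirement (req_id INTEGER, no_of_positions INTEGER);
CREATE TABLE joined_candidates (req_id INTEGER, cand_id INTEGER);
CREATE TABLE candidate_interview_schedule (req_id INTEGER, cand_id INTEGER, interview_status TEXT);
INSERT INTO requirement VALUES (1, 5), (2, 3);
INSERT INTO joined_candidates VALUES (1, 10), (1, 11);
INSERT INTO candidate_interview_schedule VALUES
  (1, 20, 'Interview Scheduled'),
  (1, 21, 'Interview Scheduled'),
  (1, 22, 'Interview Scheduled'),
  (1, 23, 'Rejected');
""")

# Same query twice: {d} is either empty or the DISTINCT keyword.
QUERY = """
SELECT r.req_id, r.no_of_positions,
       count({d} j.cand_id), count({d} cis.cand_id)
FROM requirement r
LEFT JOIN joined_candidates j ON j.req_id = r.req_id
LEFT JOIN candidate_interview_schedule cis
       ON cis.req_id = r.req_id AND cis.interview_status = 'Interview Scheduled'
GROUP BY r.req_id, r.no_of_positions
ORDER BY r.req_id
"""

# Requirement 1 has 2 joined candidates and 3 scheduled interviews,
# so the two 1:M joins produce 2 * 3 = 6 rows before grouping.
plain_counts = conn.execute(QUERY.format(d="")).fetchall()
distinct_counts = conn.execute(QUERY.format(d="DISTINCT")).fetchall()

print(plain_counts)     # both counts report 6 for requirement 1
print(distinct_counts)  # 2 closures and 3 scheduled interviews
```

Requirement 2 has no rows in either child table; it still appears with zero counts because of the left joins and because the status filter sits in the join condition rather than the where clause.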

How to simplify this postgres query

How can I simplify this query?
What I am trying to do is derive the column S9_Unlock via a subquery in which I only look for user_ids which are returned from the main query but this looks very awkward to me, especially as this query here is just an excerpt. In reality I am doing multiple of these subqueries to derive different columns...
SELECT userid, CAST(to_char(S9_unlock,'YYYY/MM/DD') AS timestamp) AS "S9_Unlock"
FROM (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON inv.id = ca.invoice_id
LEFT JOIN shop_db.cart_items AS ci ON ca.id = ci.cart_id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
GROUP BY user_id) AS master
LEFT JOIN (
SELECT MIN(s3.unl) AS "S9_Unlock", s3.user_id
FROM (
SELECT user_id, challenge_codes.created AS unl,
MAX /* Check if license contains Suite9 */
(CASE WHEN substring(bundle_article_code,1,6) = 'BuSu90' THEN 1 ELSE 0 END) AS "S9_Unlock"
FROM licensing_db.serial_numbers
LEFT JOIN licensing_db.licenses ON licenses.id = serial_numbers.license_id
LEFT JOIN user_db.users ON users.id = licenses.user_id
LEFT JOIN licensing_db.challenge_codes ON challenge_codes.serial_number_id = serial_numbers.id
WHERE user_id IN (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON inv.id = ca.invoice_id
LEFT JOIN shop_db.cart_items AS ci ON ca.id = ci.cart_id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
GROUP BY user_id
)
GROUP BY user_id, challenge_codes.created) AS s3
WHERE "S9_Unlock" = 1
AND s3.unl IS NOT NULL
GROUP BY s3.user_id) AS "S9_Unlock" ON "S9_Unlock".user_id = master.userid
In your query you have two sub-queries that are identical; this screams for a CTE.
In the sub-query on licensing issues you can filter out the valid licenses after the GROUP BY clause using a HAVING clause. Make that a WITH QUERY too and you end up with the rather more readable:
WITH inv AS (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON ca.invoice_id = inv.id
LEFT JOIN shop_db.cart_items AS ci ON ci.cart_id = ca.id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
GROUP BY ca.user_id
), s3 AS (
SELECT lic.user_id, min(cc.created) AS first_unlocked, bundle_article_code
FROM licensing_db.serial_numbers AS sn
LEFT JOIN licensing_db.licenses AS lic ON lic.id = sn.license_id
LEFT JOIN user_db.users AS u ON u.id = lic.user_id
LEFT JOIN licensing_db.challenge_codes AS cc ON cc.serial_number_id = sn.id
WHERE lic.user_id IN (SELECT userid FROM inv)
GROUP BY lic.user_id, bundle_article_code
HAVING bundle_article_code LIKE 'BuSu90%'
AND min(cc.created) IS NOT NULL
)
SELECT userid, date_trunc('day', first_unlocked) AS "S9_Unlock"
FROM inv
LEFT JOIN s3 ON s3.user_id = inv.userid;
So the main query is now reduced to 3 lines, and both WITH queries perform a logically self-contained query of the database. The other sub-queries you refer to can similarly become WITH queries, which you then assemble in the main query. Remember that you can refer to earlier named queries in the list of WITH queries, as shown above with inv being referred to by s3. While such CTEs do not provide new functionality syntactically (except for the RECURSIVE variant), they do make complex queries much more readable and therefore easier to maintain.
Another approach would be to factor out logical sub-components (such as the inv sub-query) and make a VIEW out of those. Then you can simply reference the view in the main query. Making the whole thing a view is probably also a good idea if you want to make the query more flexible. What if you want to query for Suite9.1 ('BuSu91%') on 27 March 2014? Taking those literals out of the definition and supplying them as WHERE clauses on a view makes your query more versatile; this can be done either with sub-queries or with the complete CTE.
(Please check whether the semantics are still right in the s3 WITH query, because without your table structures and sample data I cannot test my code above.)
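The pattern of defining a named query once and referring to it from a later named query as well as from the main query can be tried out on a toy schema. The sketch below runs in Python's built-in sqlite3 module rather than PostgreSQL, and the two tables (invoices, unlocks) are simplified stand-ins invented for illustration, not the real shop_db or licensing_db structures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, flattened stand-ins for the invoice and licensing tables.
CREATE TABLE invoices (user_id INTEGER, created TEXT);
CREATE TABLE unlocks (user_id INTEGER, bundle_article_code TEXT, created TEXT);
INSERT INTO invoices VALUES
  (1, '2014-11-13'), (1, '2014-11-13'), (2, '2014-11-13'), (3, '2012-01-01');
INSERT INTO unlocks VALUES
  (1, 'BuSu90X', '2014-11-20'),
  (1, 'BuSu90X', '2014-11-15'),
  (2, 'Other',   '2014-11-16'),
  (3, 'BuSu90X', '2012-02-01');
""")

rows = conn.execute("""
WITH inv AS (              -- defined once ...
  SELECT user_id AS userid
  FROM invoices
  WHERE created BETWEEN '2014-11-13' AND '2014-11-14'
  GROUP BY user_id
), s3 AS (                 -- ... reused here ...
  SELECT user_id, min(created) AS first_unlocked
  FROM unlocks
  WHERE user_id IN (SELECT userid FROM inv)
    AND bundle_article_code LIKE 'BuSu90%'
  GROUP BY user_id
)
SELECT inv.userid, s3.first_unlocked
FROM inv                   -- ... and again in the main query
LEFT JOIN s3 ON s3.user_id = inv.userid
ORDER BY inv.userid
""").fetchall()

print(rows)
```

User 1 gets the earliest matching unlock date, user 2 is kept by the left join with a NULL unlock date, and user 3 never enters the result because the invoice filter in inv excludes them.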
Instead of solving your problem as one big monolithic relational sql query, I would seriously consider going the "procedural" way, by using the built-in "plpgsql" language of postgresql. This could bring a lot of clarity in your application.

Linq left outer group by, then left outer the group

I have this query that I'm trying to express in LINQ:
select *
from stuff
inner join stuffowner so on so.stuffID = stuff.stuffID
left outer join (select min(loanId) as loanId, stuffownerId from loan
where userid = 1 and status <> 2 group by stuffownerId) t on t.stuffownerid = so.stuffownerid
left outer join loan on t.LoanId = loan.LoanId
When this is done, I would like to do a LINQ group by with Stuff as the key and the stuff owners + loan as the value.
I can't seem to get to a nice query without a subquery (hence the double left outer join).
So basically, for each stuff in my database, my query brings in the owners, and then I want to bring in the first loan a user has made on that stuff.
I've tried various LINQ queries:
from stuff in Stuffs
join so in StuffOwners on stuff.StuffId equals so.StuffId
join tLoan in Loans on so.StuffOwnerId equals tLoan.StuffOwnerId into tmpJoin
from tTmpJoin in tmpJoin.DefaultIfEmpty()
group tTmpJoin by new {stuff} into grouped
select new {grouped, fluk = (int?)grouped.Max(w=> w.Status )}
This is not good because I don't get the stuff owner, and on top of that it seems to generate a lot of queries (LinqPad).
from stuff in Stuffs
join so in StuffOwners on stuff.StuffId equals so.StuffId
join tmpLoan in
(from tLoan in Loans group tLoan by tLoan.StuffOwnerId into g
select new {StuffOwnerId = g.Key, loanid = (from t2 in g select t2.LoanId).Max()})
on so.StuffOwnerId equals tmpLoan.StuffOwnerId
into tmptmp from tMaxLoan in tmptmp.DefaultIfEmpty()
select new {stuff, so, tmptmp}
This seems to generate a lot of subqueries as well.
I've tried the let keyword with:
from tstuffOwner in StuffOwners
let tloan = Loans.Where(p2 => tstuffOwner.StuffOwnerId == p2.StuffOwnerId).FirstOrDefault()
select new { qsdq = tstuffOwner, qsdsq= (int?) tloan.Status, kwk= (int?) tloan.UserId, kiwk= tloan.ReturnDate }
But the more info I get from tLoan, the longer the query gets, with more subqueries.
What would be the best way to achieve this?
Thanks

Best way to exclude records from multiple tables

I have the following tables (just an example): vehicles, vehicle_descriptions, vehicle_parts.
vehicles has a one-to-many relationship with both vehicle_descriptions and vehicle_parts. There may not be a corresponding vehicle_description/part for a given vehicle.
SELECT * FROM vehicles
LEFT OUTER JOIN vehicles d ON vehicles.vin = d.vin AND d.summary NOT LIKE 'honda'
LEFT OUTER JOIN
(SELECT vin, SUM(desc_total) FROM vehicle_descriptions WHERE desc NOT LIKE 'honda' GROUP BY vin) b
ON vehicles.vin = b.vin
LEFT OUTER JOIN
(SELECT vin, SUM(part_count) FROM vehicle_parts WHERE part_for NOT LIKE 'honda' GROUP BY vin) c ON vehicles.vin = c.vin
If any of vehicles, vehicle_descriptions, or vehicle_parts contains the exclusion term, the whole record should not show up in the result set. The query above will return a record even if one of the tables contains the exclusion term Honda. How would I fix the above query?
You're not using any of the information in either sum() as part of what you show, just to decide whether to include the vehicle. And you're doing an unnecessary self join in your first clause. Generally in situations like this, the "exists" and "not exists" clauses work well. So what about this? I'll use Oracle syntax, you can convert to ANSI of course.
SELECT * FROM vehicles v where summary <> 'honda'
and not exists (select 1 from vehicle_descriptions d where d.vin = v.vin and d.desc = 'honda')
and not exists (select 1 from vehicle_parts p where p.vin = v.vin and p.part_for = 'honda')
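A quick way to check the exclusion logic is to run it on a handful of rows. The sketch below uses Python's built-in sqlite3 module rather than Oracle, invented sample data, and a column named description instead of desc (renamed here only to avoid the reserved word). Each NOT EXISTS subquery looks for a row that does match the excluded term, so a vehicle is dropped as soon as any of the three tables mentions it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicles (vin TEXT, summary TEXT);
CREATE TABLE vehicle_descriptions (vin TEXT, description TEXT);
CREATE TABLE vehicle_parts (vin TEXT, part_for TEXT);
INSERT INTO vehicles VALUES
  ('A', 'honda'), ('B', 'ford'), ('C', 'ford'), ('D', 'ford'), ('E', 'ford');
INSERT INTO vehicle_descriptions VALUES ('C', 'honda'), ('C', 'ford'), ('D', 'ford');
INSERT INTO vehicle_parts VALUES ('D', 'honda'), ('E', 'ford');
""")

kept = [row[0] for row in conn.execute("""
SELECT v.vin
FROM vehicles v
WHERE v.summary <> 'honda'
  AND NOT EXISTS (SELECT 1 FROM vehicle_descriptions d
                  WHERE d.vin = v.vin AND d.description = 'honda')
  AND NOT EXISTS (SELECT 1 FROM vehicle_parts p
                  WHERE p.vin = v.vin AND p.part_for = 'honda')
ORDER BY v.vin
""")]

print(kept)
```

Vehicle A is excluded by its own summary, C by one of its descriptions, and D by one of its parts. B survives even though it has no child rows at all, which is the case an inner join would lose.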
