I have this query that takes about 5 minutes to return a result set and I can't figure out a better way to do it. The table in question has about 15 or 20 million rows at all times and its schema can be summarized as
create table conversation(
id raw, -- GUID
vendor varchar,
snumber number,
rcvdate date
)
where we store messages sent from or to a vendor. Each message has a sequence number shared by the whole conversation (a set of related messages). The problem arises because a vendor can have a parent, and a message can carry the parent's code instead (we can assume we know both the vendor's code and its parent's code at the time of the query). Suppose that A and B are two vendors with a common parent P; the table might look like
Vendor snumber date
------------------------------
A 1 01-JAN-2012
P 1 02-JAN-2012
A 1 02-JAN-2012
A 2 03-JAN-2012
P 2 03-JAN-2012
B 3 03-JAN-2012
P 3 04-JAN-2012
A 2 04-JAN-2012
We need to query the last N messages from/to A and get the messages with vendor=A OR (vendor=P and another record with vendor=A and same snumber), that is:
Vendor snumber date
------------------------------
A 1 01-JAN-2012
P 1 02-JAN-2012
A 1 02-JAN-2012
A 2 03-JAN-2012
P 2 03-JAN-2012
A 2 04-JAN-2012
What I did was to store the conversations to/from A in a temporary table T(id, snumber) and then return
select * from (
select * from conversation c
where
exists (select id from T where T.id = c.id) or
( c.vendor=l_parent and exists (select snumber from T where T.snumber=c.snumber) )
) where rownum <= l_N
Those two subqueries are killing the performance. The conversation table has indexes on all the columns I included in this example.
I'm thinking there has to be a clever way to group this information without having to use temporary tables or subqueries, but I can't think of one. Any help will be appreciated.
It sounds like you want something like this:
SQL> select vendor, snumber, rcvdate
2 from (select vendor, snumber, rcvdate,
3 max(case when vendor = 'A' then 'Y' end) over (partition by snumber) has_vendor
4 from conversation
5 where vendor in ( 'A', 'P' )
6 order by rcvdate desc)
7 where has_vendor = 'Y'
8 and rownum <= 100
9 order by rcvdate;
V SNUMBER RCVDATE
- ---------- --------------------
A 1 01-jan-2012 00:00:00
P 1 02-jan-2012 00:00:00
A 1 02-jan-2012 00:00:00
P 2 03-jan-2012 00:00:00
A 2 03-jan-2012 00:00:00
A 2 04-jan-2012 00:00:00
i.e.
max(case when vendor = 'A' then 'Y' end) over (partition by snumber) has_vendor
flags every row of an snumber that contains at least one 'A' row, so the "P"arent rows are returned only for conversations in which A took part, and are skipped otherwise.
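The effect of the analytic flag can be checked on the question's sample rows. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module, SQLite 3.25 or later for window functions) as a stand-in for Oracle; the ISO date strings and the variable name `flagged` are illustrative assumptions, and the rownum limit is left out.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table conversation (vendor text, snumber integer, rcvdate text)")
conn.executemany(
    "insert into conversation values (?, ?, ?)",
    [
        ("A", 1, "2012-01-01"),
        ("P", 1, "2012-01-02"),
        ("A", 1, "2012-01-02"),
        ("A", 2, "2012-01-03"),
        ("P", 2, "2012-01-03"),
        ("B", 3, "2012-01-03"),
        ("P", 3, "2012-01-04"),
        ("A", 2, "2012-01-04"),
    ],
)

# has_vendor is 'Y' on every row of an snumber that has at least one 'A' row,
# and NULL on rows of an snumber where only the parent appears.
flagged = conn.execute(
    """
    select vendor, snumber, rcvdate
    from (select vendor, snumber, rcvdate,
                 max(case when vendor = 'A' then 'Y' end)
                     over (partition by snumber) as has_vendor
          from conversation
          where vendor in ('A', 'P'))
    where has_vendor = 'Y'
    order by rcvdate, vendor
    """
).fetchall()

for row in flagged:
    print(row)
```

The parent's message for snumber 3 drops out because no 'A' row shares that sequence number.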
I have a table like this (example):
client_id photo_type date
------------------------------
1 license 13.10.2022
1 ident 12.10.2022
2 ident 15.10.2022
2 license 14.10.2022
3 license 15.10.2022
4 ident 16.10.2022
Each client can have two types of photo, and I need to delete one of them (license or ident) based on the date column, removing the oldest one. For example, for client_id 1 I need to delete "ident", and for client 2 I need to delete "license".
I need to run this process against a large amount of data.
Could you please provide a solution for this?
DELETE is a costly operation, especially for large data sets. A CREATE TABLE ... AS SELECT statement may be a quicker way to handle the same problem, such as
CREATE TABLE t1 AS
SELECT client_id, photo_type, "date"
FROM (SELECT t.*, ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY "date" DESC) AS rn
FROM t)
WHERE rn = 1
where the query only picks single row per each client_id even if ties occur for "date" values.
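To see which rows survive, the same ROW_NUMBER filter can be run on the question's sample data. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module, SQLite 3.25 or later) instead of Oracle; the column name `taken` replaces the reserved word "date" and, like the variable `kept`, is a hypothetical name.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table t (client_id integer, photo_type text, taken text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "license", "2022-10-13"),
        (1, "ident", "2022-10-12"),
        (2, "ident", "2022-10-15"),
        (2, "license", "2022-10-14"),
        (3, "license", "2022-10-15"),
        (4, "ident", "2022-10-16"),
    ],
)

# Build the replacement table from the newest row of each client
# instead of deleting the older rows in place.
conn.execute(
    """
    create table t1 as
    select client_id, photo_type, taken
    from (select t.*,
                 row_number() over (partition by client_id
                                    order by taken desc) as rn
          from t)
    where rn = 1
    """
)

kept = conn.execute(
    "select client_id, photo_type, taken from t1 order by client_id"
).fetchall()
print(kept)
```

Clients with a single photo keep it, since their only row is ranked first.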
Here's one option: find rowids for rows you'd want to delete, and then ... well, delete them:
Sample data:
SQL> SELECT *
2 FROM test
3 ORDER BY client_id, datum;
CLIENT_ID PHOTO_T DATUM
---------- ------- ----------
1 ident 12.10.2022
1 license 13.10.2022
2 license 14.10.2022
2 ident 15.10.2022
3 license 15.10.2022
4 ident 16.10.2022
6 rows selected.
Delete:
SQL> DELETE FROM
2 test a
3 WHERE a.ROWID IN
4 (SELECT x.rid
5 FROM (SELECT ROWID rid,
6 b.client_id,
7 ROW_NUMBER ()
8 OVER (PARTITION BY b.client_id
9 ORDER BY b.datum DESC) rn
10 FROM test b) x
11 WHERE x.client_id = a.client_id
12 AND x.rn > 1);
2 rows deleted.
Result:
SQL> SELECT *
2 FROM test
3 ORDER BY client_id, datum;
CLIENT_ID PHOTO_T DATUM
---------- ------- ----------
1 license 13.10.2022
2 ident 15.10.2022
3 license 15.10.2022
4 ident 16.10.2022
SQL>
What about this?
DELETE FROM t a
WHERE EXISTS (
SELECT 1 FROM t b
WHERE b.client_id = a.client_id
AND b."date" > a."date"
)
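The correlated delete can be tried on the sample rows as well. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) instead of Oracle; the table name `photos`, the column `taken` and the variables `deleted` and `remaining` are hypothetical names chosen for the illustration.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table photos (client_id integer, photo_type text, taken text)")
conn.executemany(
    "insert into photos values (?, ?, ?)",
    [
        (1, "license", "2022-10-13"),
        (1, "ident", "2022-10-12"),
        (2, "ident", "2022-10-15"),
        (2, "license", "2022-10-14"),
        (3, "license", "2022-10-15"),
        (4, "ident", "2022-10-16"),
    ],
)

# Delete every row for which the same client has a strictly newer row.
cur = conn.execute(
    """
    delete from photos
    where exists (select 1
                  from photos b
                  where b.client_id = photos.client_id
                    and b.taken > photos.taken)
    """
)
deleted = cur.rowcount

remaining = conn.execute(
    "select client_id, photo_type from photos order by client_id"
).fetchall()
print(deleted, remaining)
```

Because the comparison is strict, two rows of one client that share the newest date would both survive, unlike the ROW_NUMBER approach, which keeps exactly one row per client.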
I'm new to PL SQL and have to write a function, which has customer_id as an input and has to output a product_name of the best selling product for that customer_id.
The schema looks like this:
I found a lot of simple examples where it includes two tables, but I can't seem to find one where you have to do multiple joins and use a function, while selecting only the best selling product.
I could paste a lot of very bad code here and how I tried to approach this, but this seems to be a bit over my head for current knowledge, since I've been learning PL SQL for less than 3 days now and got this task.
With some sample data (minimal column set):
SQL> select * from products order by product_id;
PRODUCT_ID PRODUCT_NAME
---------- ----------------
1 BMW
2 Audi
SQL> select * From order_items;
PRODUCT_ID CUSTOM QUANTITY UNIT_PRICE
---------- ------ ---------- ----------
1 Little 100 1
2 Little 200 2
2 Foot 300 3
If we check some totals:
SQL> select o.product_id,
2 o.customer_id,
3 sum(o.quantity * o.unit_price) total
4 from order_items o
5 group by o.product_id, o.customer_id;
PRODUCT_ID CUSTOM TOTAL
---------- ------ ----------
2 Little 400
1 Little 100
2 Foot 900
SQL>
It says that
for customer Little, product 2 was sold with total = 400 - that's our choice for Little
for customer Little, product 1 was sold with total = 100
for customer Foot, product 2 was sold with total = 900 - that's our choice for Foot
Query might then look like this:
temp CTE calculates totals per each customer
rank_them CTE ranks them in descending order per each customer; row_number so that you get only one product, even if there are ties
finally, select the one that ranks as the highest
SQL> with
2 temp as
3 (select o.product_id,
4 o.customer_id,
5 sum(o.quantity * o.unit_price) total
6 from order_items o
7 group by o.product_id, o.customer_id
8 ),
9 rank_them as
10 (select t.customer_id,
11 t.product_id,
12 row_number() over (partition by t.customer_id order by t.total desc) rn
13 from temp t
14 )
15 select * From rank_them;
CUSTOM PRODUCT_ID RN
------ ---------- ----------
Foot 2 1 --> for Foot, product 2 ranks as the highest
Little 2 1 --> for Little, product 2 ranks as the highest
Little 1 2
SQL>
Moved to a function:
SQL> create or replace function f_product (par_customer_id in order_items.customer_id%type)
2 return products.product_name%type
3 is
4 retval products.product_name%type;
5 begin
6 with
7 temp as
8 (select o.product_id,
9 o.customer_id,
10 sum(o.quantity * o.unit_price) total
11 from order_items o
12 group by o.product_id, o.customer_id
13 ),
14 rank_them as
15 (select t.customer_id,
16 t.product_id,
17 row_number() over (partition by t.customer_id order by t.total desc) rn
18 from temp t
19 )
20 select p.product_name
21 into retval
22 from rank_them r join products p on p.product_id = r.product_id
23 where r.customer_id = par_customer_id
24 and r.rn = 1;
25
26 return retval;
27 end;
28 /
Function created.
SQL>
Testing:
SQL> select f_product ('Little') result from dual;
RESULT
--------------------------------------------------------------------------------
Audi
SQL> select f_product ('Foot') result from dual;
RESULT
--------------------------------------------------------------------------------
Audi
SQL>
Now, you can improve it so that you'd care about no data found issue (when customer didn't buy anything), ties (but you'd then return a collection or a refcursor instead of a scalar value) etc.
[EDIT] I'm sorry, ORDERS table has to be included into the temp CTE; your data model is correct, you don't have to do anything about it - my query was wrong (small screen + late hours issue; not a real excuse, just saying).
So:
with
temp as
(select i.product_id,
o.customer_id,
sum(i.quantity * i.unit_price) total
from order_items i join orders o on o.order_id = i.order_id
group by i.product_id, o.customer_id
),
The rest of my code is - otherwise - unmodified.
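The corrected query, with the ORDERS join in place, can be exercised end to end. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module, SQLite 3.25 or later) instead of Oracle and a plain Python function in place of the PL/SQL one; the table layouts, the sample rows, the CTE names `totals` and `ranked`, and the function name `best_product` are assumptions, since the question's schema image is not available.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table products (product_id integer, product_name text);
    create table orders (order_id integer, customer_id text);
    create table order_items (order_id integer, product_id integer,
                              quantity integer, unit_price integer);
    insert into products values (1, 'BMW'), (2, 'Audi');
    insert into orders values (1, 'Little'), (2, 'Foot');
    insert into order_items values (1, 1, 100, 1), (1, 2, 200, 2), (2, 2, 300, 3);
    """
)


def best_product(conn, customer_id):
    """Return the best selling product name, or None if nothing was bought."""
    row = conn.execute(
        """
        with totals as (
            select i.product_id, o.customer_id,
                   sum(i.quantity * i.unit_price) as total
            from order_items i join orders o on o.order_id = i.order_id
            group by i.product_id, o.customer_id
        ),
        ranked as (
            select t.customer_id, t.product_id,
                   row_number() over (partition by t.customer_id
                                      order by t.total desc) as rn
            from totals t
        )
        select p.product_name
        from ranked r join products p on p.product_id = r.product_id
        where r.customer_id = ? and r.rn = 1
        """,
        (customer_id,),
    ).fetchone()
    # fetchone() yields None for an unknown customer, which plays the role
    # of handling NO_DATA_FOUND in the PL/SQL function.
    return row[0] if row else None


print(best_product(conn, "Little"), best_product(conn, "Foot"))
```

Returning None for a customer without orders is one possible way to cover the "no data found" case mentioned above; ties are still broken arbitrarily by row_number.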
I have two Tables with a foreign key from t1.ID to T2.T_ID
T1:
ID PR_ID Version
-----------------
1 1 1
2 2 1
3 2 2
4 3 1
5 3 2
6 4 1
T2:
ID T_ID ab_nr
-----------------
1 1 56
2 2 3
3 3 76
4 4 4
5 5 87
6 6 64
I need a SELECT that returns the T2 rows belonging to the highest T1.Version for each PR_ID. For example, PR_ID 2 and PR_ID 3 each have two versions, so the end result should contain only the T1.IDs 1, 3, 5 and 6.
I tried it with:
SELECT * FROM T2
JOIN T1 ON T1.ID = T2.T_ID
WHERE T1.Version IN (SELECT MAX(VERSION) FROM T1);
but this doesn't work because it only returns the rows with Version 2 and nothing else.
There are always many ways to skin a SQL cat, but here's a simple one.
SELECT t2.*
FROM t1
INNER JOIN t2 ON t2.t_id = t1.id
WHERE NOT EXISTS ( SELECT 'higher version for the same PR_ID'
FROM t1 t1x
WHERE t1x.pr_id = t1.pr_id
AND t1x.version > t1.version )
That is, add a NOT EXISTS condition to filter out any results that are for old versions.
The way you tried to do it was on the right track, but you just needed to correlate your MAX(VERSION) subquery so that it got the max version for the current PR_ID. Like this:
SELECT * FROM T2
JOIN T1 ON T1.ID = T2.T_ID
WHERE T1.Version IN (SELECT MAX(VERSION) FROM T1 T1X
-- You missed this part, below
WHERE T1X.PR_ID = T1.PR_ID
);
Anyway, try either of these. If performance is not good, we can start looking at more efficient ways of doing it (e.g., MAX ... KEEP)
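Both variants can be compared on the question's sample tables. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) instead of Oracle; the variable names `via_not_exists` and `via_max` are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table t1 (id integer, pr_id integer, version integer);
    create table t2 (id integer, t_id integer, ab_nr integer);
    insert into t1 values (1,1,1), (2,2,1), (3,2,2), (4,3,1), (5,3,2), (6,4,1);
    insert into t2 values (1,1,56), (2,2,3), (3,3,76), (4,4,4), (5,5,87), (6,6,64);
    """
)

# Variant 1: drop rows for which a higher version of the same PR_ID exists.
via_not_exists = [
    r[0]
    for r in conn.execute(
        """
        select t2.id
        from t1 join t2 on t2.t_id = t1.id
        where not exists (select 1 from t1 t1x
                          where t1x.pr_id = t1.pr_id
                            and t1x.version > t1.version)
        order by t2.id
        """
    )
]

# Variant 2: compare against the maximum version of the current PR_ID.
via_max = [
    r[0]
    for r in conn.execute(
        """
        select t2.id
        from t2 join t1 on t1.id = t2.t_id
        where t1.version in (select max(t1x.version) from t1 t1x
                             where t1x.pr_id = t1.pr_id)
        order by t2.id
        """
    )
]

print(via_not_exists, via_max)
```

On this data both queries return the T2 rows attached to T1.IDs 1, 3, 5 and 6.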
There are two tables as below
Table1
ID Name Age Active PID
-----------------------------
1 A 2 Y 100
2 A 2 Y 100
3 A 2 Y 100
4 B 3 Y 200
5 B 3 Y 200
Table2
T2ID CID
---------
10 1
20 1
30 1
40 2
50 2
60 3
70 3
80 3
90 4
100 5
110 5
I am trying to deactivate the duplicate records in Table1 and reassign the Table2 records to the rows of Table1 that stay active. The result for Table1 and Table2 should be as below
ID Name Age Active PID
-----------------------------
1 A 2 Y 100
2 A 2 N 100
3 A 2 N 100
4 B 3 N 200
5 B 3 Y 200
T2ID CID
---------
10 1
20 1
30 1
40 1
50 1
60 1
70 1
80 1
90 5
100 5
110 5
Please help with an Oracle query to do this update.
You can do this by using two merge statements, like so:
Update table2:
MERGE INTO table2 tgt
USING (WITH t1 AS (SELECT ID,
NAME,
age,
active,
pid,
MIN(ID) OVER (PARTITION BY pid) min_id,
CASE WHEN COUNT(CASE WHEN active = 'Y' THEN 1 END) OVER (PARTITION BY pid) > 1 THEN 'Y' ELSE 'N' END multi_active_rows
FROM table1)
SELECT t2.t2id,
t2.cid old_cid,
t1.min_id new_cid
FROM t1
INNER JOIN table2 t2 ON t1.id = t2.cid
WHERE t1.multi_active_rows = 'Y') src
ON (tgt.t2id = src.t2id)
WHEN MATCHED THEN
UPDATE SET tgt.cid = src.new_cid;
Update table1:
MERGE INTO table1 tgt
USING (WITH t1 AS (SELECT ID,
NAME,
age,
active,
pid,
MIN(ID) OVER (PARTITION BY pid) min_id,
CASE WHEN COUNT(CASE WHEN active = 'Y' THEN 1 END) OVER (PARTITION BY pid) > 1 THEN 'Y' ELSE 'N' END multi_active_rows
FROM table1)
SELECT ID
FROM t1
WHERE multi_active_rows = 'Y'
AND ID != min_id) src
ON (tgt.id = src.id)
WHEN MATCHED THEN
UPDATE SET active = 'N';
Since we want to derive the results to update both table1 and table2 from the original dataset in table1, it's easier to update table2 first before updating table1.
This works by finding the lowest id across each set of pids in table1, plus checking to see if there is more than one active row for each pid (there's no need to do any updates if we have at most one active row available).
Once we have that information, we can use that to decide which rows to update in each table, and we can use the min_id to update table2 with, and we can update any rows in table1 where the id doesn't match the min_id to be not active.
N.B. If you could have a mix of Ys and Ns in your data, you may need to skip the and id != min_id check in the second merge statement and amend the update part to update the row to Y if the id is the min_id, otherwise set it to N.
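The lowest-id rule can be traced on the sample data. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module, SQLite 3.25 or later) instead of Oracle. SQLite has no MERGE, so a snapshot table plus two UPDATE statements is swapped in here; the snapshot name `src` and the column `active_cnt` are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table table1 (id integer, name text, age integer, active text, pid integer);
    create table table2 (t2id integer, cid integer);
    insert into table1 values (1,'A',2,'Y',100), (2,'A',2,'Y',100), (3,'A',2,'Y',100),
                              (4,'B',3,'Y',200), (5,'B',3,'Y',200);
    insert into table2 values (10,1), (20,1), (30,1), (40,2), (50,2), (60,3),
                              (70,3), (80,3), (90,4), (100,5), (110,5);

    -- Snapshot of the original table1: lowest id and active-row count per pid.
    create temp table src as
    select id,
           min(id) over (partition by pid) as min_id,
           sum(case when active = 'Y' then 1 else 0 end)
               over (partition by pid) as active_cnt
    from table1;

    -- Repoint table2 at the lowest id of each pid that has several active rows.
    update table2
    set cid = (select s.min_id from src s where s.id = table2.cid)
    where cid in (select id from src where active_cnt > 1);

    -- Deactivate every other row of those pids.
    update table1
    set active = 'N'
    where id in (select id from src where active_cnt > 1 and id <> min_id);
    """
)

t1_state = conn.execute("select id, active from table1 order by id").fetchall()
t2_state = conn.execute("select t2id, cid from table2 order by t2id").fetchall()
print(t1_state)
print(t2_state)
```

Because both updates read from the snapshot, their order no longer matters in this sketch. Note that the lowest-id rule keeps id 4 active for PID 200, whereas the expected output in the question keeps id 5; a different rule for choosing the surviving row would be needed to reproduce that exactly.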
I have a problem which can be handled by a recursive CTE, but not within an acceptable period of time. Can anyone point me at ways to improve the performance and/or get the same result a different way?
Here's my scenario!
I have : A large table which contains in each row an id, a start date, an end date, and a ranking number. There are multiple rows for each id and the date ranges often overlap. Dates are from 2010 onward.
I want: A table which contains a row for each combination of id + date which falls inside any date range for that id from the previous table. Each row should have the lowest ranking number for that id and day.
Eg:
ID Rank Range
1 1 1/1/2010-1/4/2010
1 2 1/2/2010-1/5/2010
2 1 1/1/2010-1/2/2010
becomes
ID Rank Day
1 1 1/1/2010
1 1 1/2/2010
1 1 1/3/2010
1 1 1/4/2010
1 2 1/5/2010
2 1 1/1/2010
2 1 1/2/2010
I can do this with a recursive CTE, but the performance is terrible (20-25 minutes for a relatively small data set which produces a final table with 31 million rows):
with enc(PersonID, EncounterDate, EndDate, Type_Rank) as (
select PersonID, EncounterDate, EndDate, Type_Rank
from Big_Base_Table
union all
select PersonID, EncounterDate + 1, EndDate, Type_Rank
from enc
where EncounterDate + 1 <= EndDate
)
select PersonID, EncounterDate, min(Type_Rank) Type_Rank
from enc
group by PersonID, EncounterDate
;
You could extract all possible dates from the table once in a CTE, and then join that back to the table:
with all_dates (day) as (
select start_date + level - 1
from (
select min(start_date) as start_date, max(end_date) as end_date
from big_base_table
)
connect by level <= end_date - start_date + 1
)
select bbt.id, min(bbt.type_rank) as type_rank, to_char(ad.day, 'YYYY-MM-DD') as day
from all_dates ad
join big_base_table bbt
on bbt.start_date <= ad.day
and bbt.end_date >= ad.day
group by bbt.id, ad.day
order by bbt.id, ad.day;
ID TYPE_RANK DAY
---------- ---------- ----------
1 1 2010-01-01
1 1 2010-01-02
1 1 2010-01-03
1 1 2010-01-04
1 2 2010-01-05
2 1 2010-01-01
2 1 2010-01-02
7 rows selected.
The CTE gets all dates from the lowest for any ID, up to the highest for any ID. You could also use a static calendar table for that if you have one, to save hitting the table twice (and getting min/max at the same time is slow in some versions at least).
You could also write it the other way round, as:
...
from big_base_table bbt
join all_dates ad
on ad.day >= bbt.start_date
and ad.day <= bbt.end_date
...
but I think the optimiser will probably end up treating them the same, with a single full scan of your base table. It is worth checking the plan it actually comes up with for both, though, to see whether one is more efficient than the other.
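The calendar-join idea can be reproduced on the three sample ranges. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) instead of Oracle. SQLite has no CONNECT BY, so a recursive CTE over the single overall date range is swapped in to build the calendar; the CTE name `bounds`, the ISO date strings and the variable `daily` are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table big_base_table (id integer, type_rank integer, "
    "start_date text, end_date text)"
)
conn.executemany(
    "insert into big_base_table values (?, ?, ?, ?)",
    [
        (1, 1, "2010-01-01", "2010-01-04"),
        (1, 2, "2010-01-02", "2010-01-05"),
        (2, 1, "2010-01-01", "2010-01-02"),
    ],
)

# The calendar is generated once, from the overall min/max dates,
# rather than once per base row as in the original recursive query.
daily = conn.execute(
    """
    with recursive
    bounds(first_day, last_day) as (
        select min(start_date), max(end_date) from big_base_table
    ),
    all_dates(day) as (
        select first_day from bounds
        union all
        select date(day, '+1 day') from all_dates, bounds
        where day < last_day
    )
    select bbt.id, min(bbt.type_rank) as type_rank, ad.day
    from all_dates ad
    join big_base_table bbt
      on bbt.start_date <= ad.day
     and bbt.end_date >= ad.day
    group by bbt.id, ad.day
    order by bbt.id, ad.day
    """
).fetchall()

for row in daily:
    print(row)
```

The recursion here only walks the five calendar days between the earliest start and the latest end, and the join then fans each day out to the ranges that cover it.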