Is it possible to add distinct to part of a sum clause in Oracle? - oracle

I have a pretty lengthy SQL query which I'm going to run on Oracle via hibernate. It consists of two nested selects. In the first select statement, a number of sums are calculated, but in one of them I want to filter the results using unique ids.
SELECT ...
SUM(NVL(CASE WHEN SECOND_STATUS= 50 OR SECOND_STATUS IS NULL THEN RECEIVE_AMOUNT END, 0) +
NVL(CASE WHEN FIRST_STATUS = 1010 THEN AMOUNT END, 0) +
NVL(CASE WHEN FIRST_STATUS = 1030 THEN AMOUNT END, 0) -
NVL(CASE WHEN FIRST_STATUS = 1010 AND (SECOND_STATUS= 50 OR SECOND_STATUS IS NULL) THEN RECEIVE_AMOUNT END, 0)) TOTAL, ...
And at the end:
... FROM (SELECT s.*, p.* FROM FIRST_TABLE s
JOIN SECOND_TABLE p ON s.ID = p.FIRST_ID
In one of the lines that start with NVL (second line actually), I want to add a distinct clause that sums the amounts only if first table ids are unique. But I don't know if this is possible or not. If yes, how would it be?

Assume such setup
select * from first;
ID AMOUNT
---------- ----------
1 10
2 20
select * from second;
SECOND_ID FIRST_ID AMOUNT2
---------- ---------- ----------
1 1 100
2 1 100
3 2 100
After the join you get the total sum of both amounts too high because the amount from the first table is duplicated.
select *
from first
join second on first.id = second.first_id;
ID AMOUNT SECOND_ID FIRST_ID AMOUNT2
---------- ---------- ---------- ---------- ----------
1 10 1 1 100
1 10 2 1 100
2 20 3 2 100
You must add a row_number that identifies the first occurence in the parent table and consider in the AMOUNT only the first row and resets it to NULL in the duplicated rows.
select ID,
case when row_number() over (partition by id order by second_id) = 1 then AMOUNT end as AMOUNT,
SECOND_ID, FIRST_ID, AMOUNT2
from first
join second on first.id = second.first_id;
ID AMOUNT SECOND_ID FIRST_ID AMOUNT2
---------- ---------- ---------- ---------- ----------
1 10 1 1 100
1 2 1 100
2 20 3 2 100
Now you can safely sum in a separate subquery
with tab as (
select ID,
case when row_number() over (partition by id order by second_id) = 1 then AMOUNT end as AMOUNT,
SECOND_ID, FIRST_ID, AMOUNT2
from first
join second on first.id = second.first_id
)
select id, sum(nvl(amount,0) + nvl(amount2,0))
from tab
group by id
;
ID SUM(NVL(AMOUNT,0)+NVL(AMOUNT2,0))
---------- ---------------------------------
1 210
2 120
Note also that this is an answer to your question. Some will argue that in your situation you should first aggregate and than join. This will also resolve your problem possible more elegantly.

Related

Need output in below format in oracle. I have tried with lag function it worked if we have only 2 rows in table

Consider below table table.
Id balance
1 100
2 500
3 4000
I need output in below format.
Id balance begin_bal end_bal
1 100 0 100
2 500 100 600
3 4000 600 4600
A little bit of analytics, as you presumed:
SQL> with test (id, balance) as
2 (select 1, 100 from dual union all
3 select 2, 500 from dual union all
4 select 3, 4000 from dual
5 ),
6 temp as
7 (select id, balance, sum(balance) over (order by id) rsum
8 from test
9 )
10 select id,
11 balance,
12 nvl(lag(rsum) over (order by id), 0) begin_bal,
13 rsum end_bal
14 from temp
15 order by id;
ID BALANCE BEGIN_BAL END_BAL
---------- ---------- ---------- ----------
1 100 0 100
2 500 100 600
3 4000 600 4600
SQL>

Oracle: prioritizing results based on column’s value

I have a data-set in which there are duplicate IDs in the first column. I'm hoping to obtain a single row of data for each ID based on the second column's value. The data looks like so:
ID Info_Source Prior?
A 1 Y
A 3 N
A 2 Y
B 1 N
B 1 N
B 2 Y
C 2 N
C 3 Y
C 1 N
Specifically the criteria would call for prioritizing based on the second column's value (3 highest priority; then 1; and lastly 2): if the 'Info_Source' column has a value of 3, return that row; if there is no 3 in the second column for a given ID, look for a 1 and if found return that row; and finally if there is no 3 or 1 associated with the ID, search for 2 and return that row for the ID.
The desired results would be a single row for each ID, and the resulting data would be:
ID Info_Source Prior?
A 3 N
B 1 N
C 3 Y
row_number() over() usually solves these needs nicely and efficiently e.g.
select ID, Info_Source, Prior
from (
select ID, Info_Source, Prior
, row_number() over(partition by id order by Info_source DESC) as rn
)
where rn = 1
For prioritizing the second column's value (3 ; then 1, then 2) use a case expression to alter the raw value into an order that you need.
select ID, Info_Source, Prior
from (
select ID, Info_Source, Prior
, row_number() over(partition by id
order by case when Info_source = 3 then 3
when Infor_source = 1 then 2
else 1 end DESC) as rn
)
where rn = 1

How to speed up EDIT_DISTANCE and Insert Query?

---------------
MASTER TABLE
---------------
DATA_KEY NUMBER
TEXT VARCHAR2(2000)
ORDER_NO NUMBER
---------------
DETAIL TABLE
---------------
DATA_KEY NUMBER
SIMILAR_DATA_KEY NUMBER
DISTANCE_COUNT NUMBER
---------------
INSERT QUERY
---------------
INSERT INTO DETAIL
(
SELECT DATA_KEY, SIMILAR_DATA_KEY, DISTANCE_COUNT
FROM
(
SELECT A.DATA_KEY AS DATA_KEY, B.DATA_KEY AS SIMILAR_DATA_KEY,
UTL_MATCH.EDIT_DISTANCE(A.TEXT, B.TEXT) AS DISTANCE_COUNT
FROM
(SELECT DATA_KEY, TEXT, ORDER_NO FROM MASTER) A
INNER JOIN
(SELECT DATA_KEY, TEXT, ORDER_NO FROM MASTER) B
ON (A.ORDER_NO < B.ORDER_NO)
)
WHERE DISTANCE_COUNT <= 5
)
I need compare MASTER table TEXT field with other TEXT field.
indexes are not exist.
master table 90,000 rows.
ORDER_NO field is for avoid duplicated compare. (1 .. 90000)
=============================================================
A.ORDER_NO < B.ORDER_NO
------------------------------------------
1, 1 <- exclude
1, 2 <- join
1, 3 <- join
1, 4 <- join
..
2, 1 <- exclude
2, 2 <- exclude
2, 3 <- join
2, 4 <- join
...
3, 1 <- exclude
3, 2 <- exclude
3, 3 <- exclude
3, 4 <- join
1. NOT need compare 1 and 1
2. Need compare 1 and 2
3. NOT need compare 2 and 1 (because, duplicate 2.)
so, for decrease compare count...
=============================================================
Slow zone is (WHERE DISTANCE_COUNT <= 5) ?
Slow zone is comparing rows (90000*89999/2) ?
Query elapse time is 7 days.
6,000 rows inserted to DETAIL table.
How to speed up?
I'm sorry for poor English...

CONNECT BY for two tables with two JOINS

I have 3 tables:
two with hierarchical structures
(like "dimensions" of recursive type of hierarchy);
one with summing data (like "facts" with X column).
They are here:
DIM1 (ID1, PARENT2, NAME1)
DIM2 (ID2, PARENT2, NAME2)
FACTS (ID1, ID2, X)
Example of DIM1 table:
-- 1 0 DIM1
---- 2 1 DIM1-A
------ 3 2 DIM1-A-A
-------- 4 3 DIM1-A-A-A
-------- 5 3 DIM1-A-A-B
------ 6 2 DIM1-A-B
-------- 7 6 DIM1-A-B-A
-------- 8 6 DIM1-A-B-B
------ 9 2 DIM1-A-C
---- 10 1 DIM1-B
------ 11 10 DIM1-B-C
------ 12 10 DIM1-B-D
---- 13 1 DIM1-C
Example of DIM2 table:
-- 1 0 DIM2
---- 2 1 DIM2-A
------ 3 2 DIM2-A-A
-------- 4 3 DIM2-A-A-A
-------- 5 3 DIM2-A-A-B
-------- 6 3 DIM2-A-B-C
------ 7 2 DIM2-A-B
---- 8 1 DIM2-B
---- 9 1 DIM2-C
Example of FACTS table:
1 1 100
1 2 30
1 3 500
-- ................
13 9 200
And I would like to create the only SELECT where I will specify the parent for DIM1 (for example ID1=2 for DIM1-A) and parent for DIM2 (for example ID2=2 for DIM2-A) and SELECT will generate a report like this:
Name_of_1 Name_of_2 Sum_of_X
--------- --------- ----------
DIM1-A-A DIM2-A-A (some sum)
DIM1-A-A DIM2-A-B (some sum)
DIM1-A-B DIM2-A-A (some sum)
DIM1-A-B DIM2-A-B (some sum)
DIM1-A-C DIM2-A-A (some sum)
DIM1-A-C DIM2-A-B (some sum)
I would like to use CONNECT BY phrase, START WITH phrase, SUM phrase, GROUP BY phrase, and OUTER or INNER (?) JOIN. I need no other extensions of Oracle 10.2.
In other words: only with "classic" SQL and
only Oracle extensions for hierarchy queries.
Is it possible?
I tried some experiments with question in
Mixing together Connect by, inner join and sum with Oracle
(where is a very nice solution but only for one
dimension table ("Tasks"), but I need to JOIN two dimension tables to one facts table), but I was not successful.
"Some sum" is not very descriptive, so I don't see why do you need CONNECT BY at all.
SELECT dim1.name, dim2.name, x
FROM (
SELECT id1, id2, SUM(x) AS x
FROM facts
GROUP BY
id1, id2
) f
JOIN dim1
ON dim1.id = f.id1
JOIN dim2
ON dim2.id = f.id2
I think what you're trying to do is get the sum of the value in the facts table for all of the children of the specified rows grouped by the topmost children. This would mean that in your example above, the results for the first row would be the sum any intersections of (DIM1-A-A, DIM1-A-A-A, DIM1-A-A-B) and (DIM2-A-A, DIM2-A-A-A, DIM2-A-A-B, DIM3-A-A-C) found in the FACTS table. With that assumption, I have come to the following solution:
SELECT root_name1, root_name2, SUM(X)
FROM ( SELECT CONNECT_BY_ROOT(name1) AS root_name,
id1
FROM dim1
CONNECT BY parent1 = PRIOR id1
START WITH parent1 = 2) d1
CROSS JOIN
( SELECT CONNECT_BY_ROOT(name2) AS root_name,
id2
FROM dim2
CONNECT BY parent2 = PRIOR id2
START WITH parent2 = 2) d2
LEFT OUTER JOIN
facts
ON d1.id1 = facts.id1
AND d2.id2 = facts.id2
GROUP BY root_name1, root_name2
(This also assumes that the columns of FACTS are named ID1, ID2, and X.)

Oracle Calculation Involving Results of Another Calculation

First off, I'm a total Oracle noob although I'm very familiar with SQL. I have a single cost column. I need to calculate the total cost, the percentage of the total cost, and then a running sum of the percentages. I'm having trouble with the running sum of percentages because the only way I can think to do this uses nested SUM functions, which isn't allowed.
Here's what works:
SELECT cost, SUM(cost) OVER() AS total, cost / SUM(cost) OVER() AS per
FROM my_table
ORDER BY cost DESC
Here's what I'm trying to do that doesn't work:
SELECT cost, SUM(cost) OVER() AS total, cost / SUM(cost) OVER() AS per,
SUM(cost/SUM(cost) OVER()) OVER(cost) AS per_sum
FROM my_table
ORDER BY cost DESC
Am I just going about it wrong, or is what I'm trying to do just not possible? By the way I'm using Oracle 10g. Thanks in advance for any help.
You don't need the order by inside that inline view, especially since the outer select is doing an order by the order way around. Also, cost / SUM(cost) OVER () equals RATIO_TO_REPORT(cost) OVER ().
An example:
SQL> create table my_table(cost)
2 as
3 select 10 from dual union all
4 select 20 from dual union all
5 select 5 from dual union all
6 select 50 from dual union all
7 select 60 from dual union all
8 select 40 from dual union all
9 select 15 from dual
10 /
Table created.
Your initial query:
SQL> SELECT cost, SUM(cost) OVER() AS total, cost / SUM(cost) OVER() AS per
2 FROM my_table
3 ORDER BY cost DESC
4 /
COST TOTAL PER
---------- ---------- ----------
60 200 .3
50 200 .25
40 200 .2
20 200 .1
15 200 .075
10 200 .05
5 200 .025
7 rows selected.
Quassnoi's query contains a typo:
SQL> SELECT cost, total, per, SUM(running) OVER (ORDER BY cost)
2 FROM (
3 SELECT cost, SUM(cost) OVER() AS total, cost / SUM(cost) OVER() AS per
4 FROM my_table
5 ORDER BY
6 cost DESC
7 )
8 /
SELECT cost, total, per, SUM(running) OVER (ORDER BY cost)
*
ERROR at line 1:
ORA-00904: "RUNNING": invalid identifier
And if I correct that typo. It gives the right results, but wrongly sorted (I guess):
SQL> SELECT cost, total, per, SUM(per) OVER (ORDER BY cost)
2 FROM (
3 SELECT cost, SUM(cost) OVER() AS total, cost / SUM(cost) OVER() AS per
4 FROM my_table
5 ORDER BY
6 cost DESC
7 )
8 /
COST TOTAL PER SUM(PER)OVER(ORDERBYCOST)
---------- ---------- ---------- -------------------------
5 200 .025 .025
10 200 .05 .075
15 200 .075 .15
20 200 .1 .25
40 200 .2 .45
50 200 .25 .7
60 200 .3 1
7 rows selected.
I think this is the one you are looking for:
SQL> select cost
2 , total
3 , per
4 , sum(per) over (order by cost desc)
5 from ( select cost
6 , sum(cost) over () total
7 , ratio_to_report(cost) over () per
8 from my_table
9 )
10 order by cost desc
11 /
COST TOTAL PER SUM(PER)OVER(ORDERBYCOSTDESC)
---------- ---------- ---------- -----------------------------
60 200 .3 .3
50 200 .25 .55
40 200 .2 .75
20 200 .1 .85
15 200 .075 .925
10 200 .05 .975
5 200 .025 1
7 rows selected.
Regards,
Rob.
SELECT cost, total, per, SUM(per) OVER (ORDER BY cost)
FROM (
SELECT cost, SUM(cost) OVER() AS total, cost / SUM(cost) OVER() AS per
FROM my_table
)
ORDER BY
cost DESC

Resources