how to loop through each row of every group (while doing "group by") in Oracle table - oracle

I have a table like this:
I want to group the table by the "customer_id" column and calculate a "day - day[0]" column, where "day" is the day field of each row in the group and "day[0]" is the day of the first row in the group. At the same time, I have to calculate the total risk, as follows:
This is the table after grouping:
This is the total risk formula:
In fact, I have to loop through each row of every group to calculate the total risk.
My sample table is like this:
CREATE TABLE risk_test (
    id          VARCHAR2(32) NOT NULL PRIMARY KEY,
    customer_id VARCHAR2(40 BYTE),
    risk        NUMBER,
    day         VARCHAR2(50 BYTE)
);
insert into risk_test values(1,102,15,1);
insert into risk_test values(2,102,16,1);
insert into risk_test values(3,104,11,1);
insert into risk_test values(4,102,17,2);
insert into risk_test values(5,102,10,2);
insert into risk_test values(6,102,13,3);
insert into risk_test values(7,104,14,2);
insert into risk_test values(8,104,13,2);
insert into risk_test values(9,104,17,1);
insert into risk_test values(10,104,16,2);
The sample answer is like this:
Would you please guide me on how I can do this in an Oracle database?
Any help is really appreciated.

Using the sample data that was provided, I believe this query should calculate the risks properly:
Query
SELECT o.*,
       ROUND (
          SUM (day_minus_day0 * risk) OVER (PARTITION BY customer_id)
          / SUM (day_minus_day0) OVER (PARTITION BY customer_id),
          5) AS total_risk
  FROM (SELECT rt.*,
               (rt.day - MIN (rt.day) OVER (PARTITION BY customer_id)) + 1 AS day_minus_day0
          FROM risk_test rt) o
 ORDER BY customer_id, TO_NUMBER (day), TO_NUMBER (id);
Result
   ID    CUSTOMER_ID    RISK    DAY    DAY_MINUS_DAY0    TOTAL_RISK
_____ ______________ _______ ______ _________________ _____________
    1            102      15      1                 1      13.77778
    2            102      16      1                 1      13.77778
    4            102      17      2                 2      13.77778
    5            102      10      2                 2      13.77778
    6            102      13      3                 3      13.77778
    3            104      11      1                 1         14.25
    9            104      17      1                 1         14.25
    7            104      14      2                 2         14.25
    8            104      13      2                 2         14.25
   10            104      16      2                 2         14.25
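As a quick sanity check on the arithmetic (plain Python, not part of the Oracle solution), the same weighted averages can be recomputed from the sample rows:

```python
# Recompute total_risk per customer as a weighted average,
# weighting each row's risk by (day - min(day) + 1).
from collections import defaultdict

rows = [  # (customer_id, risk, day) from the sample inserts
    (102, 15, 1), (102, 16, 1), (102, 17, 2), (102, 10, 2), (102, 13, 3),
    (104, 11, 1), (104, 17, 1), (104, 14, 2), (104, 13, 2), (104, 16, 2),
]

groups = defaultdict(list)
for cust, risk, day in rows:
    groups[cust].append((risk, day))

total_risk = {}
for cust, vals in groups.items():
    day0 = min(day for _, day in vals)
    weights = [day - day0 + 1 for _, day in vals]
    weighted = [w * risk for w, (risk, _) in zip(weights, vals)]
    total_risk[cust] = round(sum(weighted) / sum(weights), 5)

print(total_risk)  # {102: 13.77778, 104: 14.25}
```

This matches the TOTAL_RISK column above, which supports reading the formula as a day-weighted average.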

Your total risk calculation just looks like a weighted average to me. That is, it is the average risk of the rows for each customer, weighted according to the day offset (day - day[0]), so that risks on later days count for more.
To compute that, first use a common table expression to compute the day-weighted risk for each row. Then you can compute the weighted average by dividing.
The query below illustrates the approach, with comments.
-- This first WITH clause is just sample data. In your database you would
-- get rid of this and replace all references to "input" with your actual
-- table name
with input ( customer_id, risk, day ) AS (
    SELECT 1053, 100, 1 FROM DUAL UNION ALL
    SELECT 1053, 100, 1 FROM DUAL UNION ALL
    SELECT 1053, 100, 2 FROM DUAL UNION ALL
    SELECT 1053, 100, 2 FROM DUAL UNION ALL
    SELECT 1053, 100, 3 FROM DUAL UNION ALL
    SELECT 1054, 200, 1 FROM DUAL UNION ALL
    SELECT 1054, 200, 1 FROM DUAL UNION ALL
    SELECT 1054, 200, 3 FROM DUAL UNION ALL
    SELECT 1054, 200, 3 FROM DUAL UNION ALL
    SELECT 1054, 200, 4 FROM DUAL
),
-- This CTE computes the day offset for each row and multiplies it by the risk
-- to compute a day-weighted risk.
-- I added +1 to the day_offset, otherwise risks on the 1st day would not contribute
-- to the total risk, which I think is not what you intended(?)
weighted_input AS (
    SELECT i.customer_id,
           i.risk,
           i.day,
           i.day - min(i.day) over ( partition by i.customer_id ) + 1 day_offset,
           ( i.day - min(i.day) over ( partition by i.customer_id ) + 1 ) * i.risk day_weighted_risk
      FROM input i )
-- This is the main SELECT clause that gets all the weighted risks and computes
-- the group total risk, which appears the same in every row of each group.
SELECT wi.*,
       sum(wi.day_weighted_risk) over ( partition by wi.customer_id )
       / sum(wi.day_offset) over ( partition by wi.customer_id ) total_risk
  FROM weighted_input wi;
+-------------+------+-----+------------+-------------------+------------+
| CUSTOMER_ID | RISK | DAY | DAY_OFFSET | DAY_WEIGHTED_RISK | TOTAL_RISK |
+-------------+------+-----+------------+-------------------+------------+
|        1053 |  100 |   1 |          1 |               100 |        100 |
|        1053 |  100 |   1 |          1 |               100 |        100 |
|        1053 |  100 |   2 |          2 |               200 |        100 |
|        1053 |  100 |   2 |          2 |               200 |        100 |
|        1053 |  100 |   3 |          3 |               300 |        100 |
|        1054 |  200 |   1 |          1 |               200 |        200 |
|        1054 |  200 |   1 |          1 |               200 |        200 |
|        1054 |  200 |   3 |          3 |               600 |        200 |
|        1054 |  200 |   3 |          3 |               600 |        200 |
|        1054 |  200 |   4 |          4 |               800 |        200 |
+-------------+------+-----+------------+-------------------+------------+
For your database, which has the actual table and doesn't need the input CTE, it would be:
WITH weighted_input AS (
    -- This CTE computes the day offset for each row and multiplies it by the risk
    -- to compute a day-weighted risk.
    -- I added +1 to the day_offset, otherwise risks on the 1st day would not contribute
    -- to the total risk, which I think is not what you intended(?)
    SELECT i.customer_id,
           i.risk,
           i.day,
           i.day - min(i.day) over ( partition by i.customer_id ) + 1 day_offset,
           ( i.day - min(i.day) over ( partition by i.customer_id ) + 1 ) * i.risk day_weighted_risk
      FROM my_table i )
-- This is the main SELECT clause that gets all the weighted risks and computes
-- the group total risk, which appears the same in every row of each group.
SELECT wi.*,
       sum(wi.day_weighted_risk) over ( partition by wi.customer_id )
       / sum(wi.day_offset) over ( partition by wi.customer_id ) total_risk
  FROM weighted_input wi;

Related

SQL help to count number of locations for each item/branch

I'm a SQL rookie and am having trouble wrapping my head around how to do the following. I have a table that contains item information by branch. Within a branch, an item can be in multiple locations. The data I need to extract must include a column that provides the total number of locations (a count) the item is associated with for a given branch.
The output would look something like this:
I'm guessing this is a subquery, but to be honest I'm not sure how to get started... or the order in which this is done (subquery group by first, then join, etc.).
In purely logical terms:
SELECT
a.Branch,
a.Item,
a.Loc,
COUNT(a.Branch||a.Item) AS "LocCount"
FROM BranchInventoryFile a
GROUP BY a.Branch,a.Item
You can tackle this by using Oracle's analytic COUNT function. Be sure to read up on window/partitioning functions, as this unlocks quite a bit of functionality in SQL.
SQL:
SELECT
a.BRANCH,
a.ITEM,
a.LOC,
COUNT(a.ITEM) OVER (PARTITION BY a.BRANCH, a.ITEM) AS LOC_COUNT
FROM
BRANCH a;
Result:
| BRANCH | ITEM | LOC  | LOC_COUNT |
|--------|------|------|-----------|
| 100    | A    | 1111 | 2         |
| 100    | A    | 1112 | 2         |
| 200    | A    | 2111 | 1         |
| 200    | B    | 1212 | 2         |
| 200    | B    | 1212 | 2         |
| 300    | A    | 1222 | 1         |
total number of locations (count) the item is associated with for a given branch
The way you described it, you should remove location from the query:
SQL> with branchinventoryfile (branch, item, location) as
2 (select 100, 'A', 1111 from dual union all
3 select 100, 'A', 1112 from dual union all
4 select 200, 'A', 2111 from dual
5 )
6 select branch,
7 item,
8 count(distinct location) cnt
9 from BranchInventoryFile
10 group by branch, item;
    BRANCH I        CNT
---------- - ----------
       100 A          2
       200 A          1
SQL>
If you leave location in the SELECT, you have to group by it (and get the wrong result):
6 select branch,
7 item,
8 location,
9 count(distinct location) cnt
10 from BranchInventoryFile
11 group by branch, item, location;
    BRANCH I   LOCATION        CNT
---------- - ---------- ----------
       100 A       1111          1
       200 A       2111          1
       100 A       1112          1
SQL>
or include locations, but aggregate them, e.g.
6 select branch,
7 item,
8 listagg(location, ', ') within group (order by null) loc,
9 count(distinct location) cnt
10 from BranchInventoryFile
11 group by branch, item;
    BRANCH I LOC                         CNT
---------- - -------------------- ----------
       100 A 1111, 1112                    2
       200 A 2111                          1
SQL>
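For intuition (outside the database, not part of the Oracle answer), the same distinct-count-per-group logic is a few lines of Python over the sample rows:

```python
# Count distinct locations per (branch, item), mirroring
# COUNT(DISTINCT location) ... GROUP BY branch, item.
from collections import defaultdict

rows = [(100, 'A', 1111), (100, 'A', 1112), (200, 'A', 2111)]

locations = defaultdict(set)
for branch, item, loc in rows:
    locations[(branch, item)].add(loc)  # sets deduplicate locations

counts = {key: len(locs) for key, locs in locations.items()}
print(counts)  # {(100, 'A'): 2, (200, 'A'): 1}
```

The analytic-function version in the accepted answer does the same counting but repeats the count on every row instead of collapsing the groups.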

why adding order by in the query changes the aggregate value?

Following vertica example from https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/AnalyzingData/SQLAnalytics/AnalyticFunctionsVersusAggregateFunctions.htm?tocpath=Analyzing%20Data%7CSQL%20Analytics%7C_____2
CREATE TABLE employees(emp_no INT, dept_no INT);
INSERT INTO employees VALUES(1, 10);
INSERT INTO employees VALUES(2, 30);
INSERT INTO employees VALUES(3, 30);
INSERT INTO employees VALUES(4, 10);
INSERT INTO employees VALUES(5, 30);
INSERT INTO employees VALUES(6, 20);
INSERT INTO employees VALUES(7, 20);
INSERT INTO employees VALUES(8, 20);
INSERT INTO employees VALUES(9, 20);
INSERT INTO employees VALUES(10, 20);
INSERT INTO employees VALUES(11, 20);
COMMIT;
If I run this query without ORDER BY, I get the same count value for all rows:
dbadmin#b006bc38a718(*)=>
select
    emp_no
    , dept_no
    , count(*) over (partition by dept_no) as emp_count
from employees;
 emp_no | dept_no | emp_count
--------+---------+-----------
      6 |      20 |         6
      7 |      20 |         6
      8 |      20 |         6
      9 |      20 |         6
     10 |      20 |         6
     11 |      20 |         6
      1 |      10 |         2
      4 |      10 |         2
      2 |      30 |         3
      3 |      30 |         3
      5 |      30 |         3
(11 rows)
But if I add ORDER BY, I get an incremental value:
dbadmin#b006bc38a718(*)=>
select
    emp_no
    , dept_no
    , count(*) over (partition by dept_no order by emp_no) as emp_count
from employees;
 emp_no | dept_no | emp_count
--------+---------+-----------
      2 |      30 |         1
      3 |      30 |         2
      5 |      30 |         3
      1 |      10 |         1
      4 |      10 |         2
      6 |      20 |         1
      7 |      20 |         2
      8 |      20 |         3
      9 |      20 |         4
     10 |      20 |         5
     11 |      20 |         6
(11 rows)
Time: First fetch (11 rows): 85.075 ms. All rows formatted: 85.139 ms
What is the effect of ORDER BY? Why do I get an incremental value?
If the window clause only contains PARTITION BY, it returns the total count of the partition: the same value for each row of the partition.
If the window clause contains both PARTITION BY and ORDER BY, it returns the running count within the partition. That is, following the ORDER BY expression, it shows how many rows have been counted so far within the partition.
That's exactly how window functions work. They give you a whole world of possibilities...
That happens because Vertica applies a default frame clause, which is defined as:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
So to get the result you want, you may want to add the frame clause below after the ORDER BY in the OVER() clause:
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
This behaviour is documented as:
If the OVER clause omits specifying a window frame, the function creates a default window that extends from the current row to the first row in the current partition.

display avg of the last column on the last row in SQL query

Hello, I am having trouble trying to figure out this particular question, using Oracle SQL Developer.
I'm trying to write a query so that it will display exactly like the table/picture below.
The last row of the query should display the word AVERAGE: and show the average (of all the values in the sixth column) of the percentage above the minimum selling price for all the sales made, with all the remaining columns displaying "--------":
Code  ProductName  Title  ShopID  SalePrice  %SoldAbove Min.SellPrice
1     Martin       Robot  1       $49000     15%
2
3
4
---   ------       ----   ----    AVERAGE:   16.5%
Above is the last row of the output I am looking for, but I have no clue how to produce the "--------" in the remaining columns, let alone the AVERAGE: label and the average of all the values in the sixth column on the last row.
In summary, the last row of the output should show the average (in the sixth column) of the percentage sold above the minimum selling price for all the sales.
Use ROLLUP:
SELECT DECODE( GROUPING( Code ), 1, '----', code ) AS code,
DECODE( GROUPING( Code ), 1, '----', MAX(col1) ) AS Col1,
DECODE( GROUPING( Code ), 1, '----', MAX(col2) ) AS Col2,
DECODE( GROUPING( Code ), 1, 'Average:', MAX(col3) ) AS Col3,
AVG( value )
FROM table_name
GROUP BY ROLLUP(Code);
Which, for the sample data:
CREATE TABLE table_name ( code, col1, col2, col3, value ) AS
SELECT 1, 'AAA', 1, 'AA1', 15.0 FROM DUAL UNION ALL
SELECT 2, 'BBB', 2, 'BB2', 17.5 FROM DUAL UNION ALL
SELECT 3, 'CCC', 3, 'CC3', 20.0 FROM DUAL;
Outputs:
CODE | COL1 | COL2 | COL3     | AVG(VALUE)
:--- | :--- | :--- | :------- | ---------:
1    | AAA  | 1    | AA1      |         15
2    | BBB  | 2    | BB2      |       17.5
3    | CCC  | 3    | CC3      |         20
---- | ---- | ---- | Average: |       17.5
This is a job for UNION ALL. A good way to get this sort of result:
SELECT COALESCE(whatever1, '------') whatever1,
       COALESCE(whatever2, '------') whatever2,
       COALESCE(whatever3, '------') whatever3,
       whatever4
  FROM (
    SELECT whatever1, whatever2, whatever3, whatever4
      FROM whatever
    UNION ALL
    SELECT NULL, NULL, NULL, AVG(whatever4) FROM whatever
  ) r
 ORDER BY r.whatever1 NULLS LAST;
The COALESCE function puts your ----- characters into the output.
You can also investigate GROUP BY ROLLUP. It may do what you want.

use LAG with expression in oracle

I have a column (status) in a table that contains numbers whose values are 1, 2, or 4.
In a SQL query, I would like to add a calculated column (bitStatus) that will store the result of the bitwise OR operator applied to the status column of the current line and the bitStatus column of the previous line,
like so:
| id | status | bitStatus |
|----|--------|-----------|
| 1  | 1      | 1         |
| 2  | 2      | 3         |
| 3  | 4      | 7         |
| 4  | 1      | 7         |
So what I did is use the LAG function in Oracle, but I couldn't figure out how to do it, since I want to create only one calculated column, bitStatus.
My query is like:
select id, status,
       BITOR(LAG(bitStatus) OVER (ORDER BY 1), status) AS bitStatus
But as you know, I can't use LAG(bitStatus) when calculating bitStatus.
So how could I produce the desired table?
Thanks in advance.
Would this help?
lines #1 - 6 represent sample data
the TEMP CTE is here to fetch the LAG status value (to improve readability)
the final select does the BITOR operation, as bitor(a, b) = a - bitand(a, b) + b
SQL> with test (id, status) as
2 (select 1, 1 from dual union all
3 select 2, 2 from dual union all
4 select 3, 1 from dual union all
5 select 4, 4 from dual
6 ),
7 temp as
8 (select id, status,
9 lag(status) over (order by id) lag_status
10 from test
11 )
12 select id,
13 status,
14 status - bitand(status, nvl(lag_status, status)) + nvl(lag_status, status) as bitstatus
15 from temp
16 order by id;
        ID     STATUS  BITSTATUS
---------- ---------- ----------
         1          1          1
         2          2          3
         3          1          3
         4          4          5
SQL>
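As a quick check (outside Oracle), the bitor(a, b) = a - bitand(a, b) + b identity, and the LAG-based calculation above, can be replayed in Python on the sample status values:

```python
# OR keeps each bit that appears in a or b; subtracting the overlap
# (a AND b) before adding b avoids double-counting shared bits.
for a in range(16):
    for b in range(16):
        assert (a | b) == a - (a & b) + b

# The same trick applied row by row, with the previous status as LAG:
status = [1, 2, 1, 4]
prev = [None] + status[:-1]
bitstatus = [s if p is None else s - (s & p) + p
             for s, p in zip(status, prev)]
print(bitstatus)  # [1, 3, 3, 5]
```

This reproduces the BITSTATUS column of the answer. Note it only ORs each row with the single previous row; the running OR in the asker's expected table (1, 3, 7, 7) would need a cumulative calculation instead of LAG.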

Is there another solution to calculate data filtered by date without a "with" clause?

I need to calculate value where date < first_date + 3 days, per name, in BigQuery.
Example of data:
+-------+------------+----------+-------+
| Name  | date       | order_id | value |
+-------+------------+----------+-------+
| JONES | 2019-01-03 | 11       | 10    |
| JONES | 2019-01-05 | 12       | 5     |
| JONES | 2019-06-03 | 13       | 3     |
| JONES | 2019-07-03 | 14       | 20    |
| John  | 2019-07-23 | 15       | 10    |
+-------+------------+----------+-------+
My solution is:
WITH data AS (
SELECT "JONES" name, DATE("2019-01-03") date_time, 11 order_id, 10 value
UNION ALL
SELECT "JONES", DATE("2019-01-05"), 12, 5
UNION ALL
SELECT "JONES", DATE("2019-06-03"), 13, 3
UNION ALL
SELECT "JONES", DATE("2019-07-03"), 14, 20
UNION ALL
SELECT "John", DATE("2019-07-23"), 15, 10
),
data2 AS (
SELECT *, MIN(date_time) OVER (PARTITION BY name) min_date
FROM data
)
SELECT name,
ARRAY_AGG(STRUCT(order_id as f_id, date_time as f_date) ORDER BY order_id LIMIT 1)[OFFSET(0)].*,
sum(case when date_time< date_add(min_date,interval 3 day) then value end) as total_value_day3,
SUM(value) AS total
FROM data2
GROUP BY name
Output:
+-------+------+------------+------------------+-------+
| name  | f_id | f_date     | total_value_day3 | total |
+-------+------+------------+------------------+-------+
| JONES | 11   | 2019-01-03 | 15               | 38    |
| John  | 15   | 2019-07-23 | 10               | 10    |
+-------+------+------------+------------------+-------+
So my question: can I do the same calculation in a more efficient way?
Or is this solution OK for large datasets?
The following gets the same results without using window functions or array aggregations, so BQ has to do less ordering/partitioning. For this small example, my query takes longer to run, but there is less byte shuffling. If you run this against a much larger dataset, I think mine will be more efficient.
WITH data AS (
SELECT "JONES" name, DATE("2019-01-03") date_time, "11" order_id, 10 value UNION ALL
SELECT "JONES", DATE("2019-01-05"), "12", 5 UNION ALL
SELECT "JONES", DATE("2019-06-03"), "13", 3 UNION ALL
SELECT "JONES", DATE("2019-07-03"), "14", 20 UNION ALL
SELECT "John", DATE("2019-07-23"), "15", 10
),
aggs as (
select name, min(date_time) as first_order_date, min(order_id) as first_order_id, sum(value) as total
from data
group by 1
)
select
name,
first_order_id as f_id,
first_order_date as f_date,
sum(value) as total_value_day3,
total
from aggs
inner join data using(name)
where date_time < date_add(first_order_date, interval 3 day) -- <= perhaps
group by 1,2,3,5
Note, this makes the assumption that order_id is sequential (i.e., order_id 11 always occurs before order_id 12) in the same manner that dates are sequential.
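As a sanity check on the numbers (plain Python, not BigQuery), the expected output can be recomputed from the sample rows:

```python
# Sum value for rows within 3 days of each name's first date, plus the
# overall total, mirroring total_value_day3 / total in both queries.
from datetime import date, timedelta
from collections import defaultdict

rows = [
    ("JONES", date(2019, 1, 3), 11, 10),
    ("JONES", date(2019, 1, 5), 12, 5),
    ("JONES", date(2019, 6, 3), 13, 3),
    ("JONES", date(2019, 7, 3), 14, 20),
    ("John",  date(2019, 7, 23), 15, 10),
]

by_name = defaultdict(list)
for name, d, order_id, value in rows:
    by_name[name].append((d, value))

result = {}
for name, recs in by_name.items():
    first = min(d for d, _ in recs)
    day3 = sum(v for d, v in recs if d < first + timedelta(days=3))
    total = sum(v for _, v in recs)
    result[name] = (day3, total)

print(result)  # {'JONES': (15, 38), 'John': (10, 10)}
```

Both SQL versions should agree with these per-name (total_value_day3, total) pairs.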
