Example data (complete table has more columns and millions of rows):
invoice_number | year | department | euros
---------------+------+------------+------
1234           | 2010 | 1          | 200
1234           | 2011 | 1          | 200
1234           | 2011 | 2          | 200
4567           | 2010 | 1          | 450
4567           | 2010 | 2          | 450
4567           | 2010 | 3          | 450
My Objective:
I want to sum the euros for every year and every department in every possible combination.
How result should look:
year   | department | euros
-------+------------+------
2010   | 1          | 650
2010   | 2          | 450
2010   | 3          | 450
2010   | (null)     | 650
2011   | 1          | 200
2011   | 2          | 200
2011   | (null)     | 200
(null) | 1          | 650
(null) | 2          | 650
(null) | 3          | 450
(null) | (null)     | 650
My query:
select year
, department
, sum(euros)
from table1
group by cube (
year
, department
)
Problem:
One invoice number can occur in several categories. For example, one invoice can have items from 2010 and 2011. This is no problem when I want to show the data per year. However, when I want the total over all years the euros will be summed twice, one time for each year. I want the functionality of 'group by cube' but I want to sum only distinct invoice numbers for aggregations.
Problem table:
year   | department | euros
-------+------------+------
2010   | 1          | 650
2010   | 2          | 450
2010   | 3          | 450
2010   | (null)     | 1550
2011   | 1          | 200
2011   | 2          | 200
2011   | (null)     | 400
(null) | 1          | 850
(null) | 2          | 650
(null) | 3          | 450
(null) | (null)     | 1950
Is it possible to do what I want? So far my search has yielded no results. I have created a SQL Fiddle; I hope it works.
[Removed previous "solution"]
New attempt: here is quite an ugly solution, but it seems to work, even when two invoices have the same amount. Because it accesses the table twice, you should check whether performance is acceptable.
with table1_cubed as
( select year
       , department
       , grouping_id(year,department) gid
  from table1
  group by cube(year,department)
)
, join_distinct_invoices as
( select distinct x.*
       , r.invoice_number
       , r.euros
  from table1_cubed x
  inner join table1 r on (nvl(x.year,r.year) = r.year and nvl(x.department,r.department) = r.department)
)
select year
     , department
     , sum(euros)
from join_distinct_invoices
group by year
       , department
       , gid
order by year
       , department
/
YEAR DEPARTMENT SUM(EUROS)
---------- -------------------- ----------
2010 1 650
2010 2 450
2010 3 450
2010 650
2011 1 200
2011 2 200
2011 200
1 650
2 650
3 450
650
11 rows selected.
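The figures above can be cross-checked outside the database. The following is only a sketch in Python, not the SQL solution itself: it assumes that an invoice carries the same euros value on every one of its rows (as in the sample data), and the function name cube_distinct_sum is made up for this illustration. None stands in for the NULL that GROUP BY CUBE emits for a rolled-up column.

```python
from itertools import product

# Sample rows from the question: (invoice_number, year, department, euros)
rows = [
    (1234, 2010, 1, 200),
    (1234, 2011, 1, 200),
    (1234, 2011, 2, 200),
    (4567, 2010, 1, 450),
    (4567, 2010, 2, 450),
    (4567, 2010, 3, 450),
]


def cube_distinct_sum(rows):
    """Sum euros once per invoice for every (year, department) cube cell."""
    cells = {}
    for invoice, year, dept, euros in rows:
        # Each row belongs to four cube cells:
        # (year, dept), (year, None), (None, dept) and (None, None).
        for y, d in product((year, None), (dept, None)):
            # Keying by invoice number keeps one amount per invoice per cell.
            cells.setdefault((y, d), {})[invoice] = euros
    return {cell: sum(amounts.values()) for cell, amounts in cells.items()}


totals = cube_distinct_sum(rows)
print(totals[(2010, None)])  # 650
print(totals[(None, None)])  # 650
```

It yields the same 11 cells as the query output, for example 650 for year 2010 over all departments and 650 for the grand total.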
select year
,department
,case when GROUPING_id(year,department) in (3) then sum(dist_euro) else sum(euros) end sums
,decode(GROUPING_id(year,department),0,'NO GROUP',1,'DEPARTMENT IS NULL',2,'YEAR IS NULL',3,'TOTAL OVER ALL YEARS') info
from (
select year
, department
, euros
,case when row_number() over(partition by year order by year) = 1 then euros else 0 end dist_euro
from table1)
group by cube (
year
, department
)
order by GROUPING_id(year,department)
I'm a SQL rookie, and am having trouble wrapping my head around how to do the following. I have a table that contains item information by branch. Within a branch an item can be in multiple locations. The data I need to extract needs to include a column that provides the total number of locations (count) the item is associated with for a given branch.
Output would look something like this:
I'm guessing this needs a subquery, but to be honest I'm not sure how to get started, or in which order this is done (subquery with GROUP BY first, then join, etc.).
In purely logical terms:
SELECT
a.Branch,
a.Item,
a.Loc,
COUNT(a.Branch||a.Item) AS 'LocCount'
FROM BranchInventoryFile a
GROUP BY a.Branch,a.Item
You can tackle this with Oracle's COUNT analytic function. Be sure to read up on window/partitioning functions, as they unlock quite a bit of functionality in SQL.
SQL:
SELECT
a.BRANCH,
a.ITEM,
a.LOC,
COUNT(a.ITEM) OVER (PARTITION BY a.BRANCH, a.ITEM) AS LOC_COUNT
FROM
BRANCH a;
Result:
| BRANCH | ITEM | LOC | LOC_COUNT |
|--------|------|------|-----------|
| 100 | A | 1111 | 2 |
| 100 | A | 1112 | 2 |
| 200 | A | 2111 | 1 |
| 200 | B | 1212 | 2 |
| 200 | B | 1212 | 2 |
| 300 | A | 1222 | 1 |
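To try the window count without an Oracle instance, here is a small sketch that runs the same query shape through Python's built-in sqlite3 module. SQLite is only a stand-in here (it needs version 3.25 or later for window functions), and the rows are simply the ones shown in the result above, loaded into a table named branch as in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch INTEGER, item TEXT, loc INTEGER)")
conn.executemany(
    "INSERT INTO branch VALUES (?, ?, ?)",
    [
        (100, "A", 1111),
        (100, "A", 1112),
        (200, "A", 2111),
        (200, "B", 1212),
        (200, "B", 1212),
        (300, "A", 1222),
    ],
)

# COUNT(...) OVER (PARTITION BY ...) keeps every detail row and attaches
# the per-(branch, item) count to each of them; no GROUP BY is needed.
loc_counts = conn.execute(
    """
    SELECT branch, item, loc,
           COUNT(item) OVER (PARTITION BY branch, item) AS loc_count
    FROM branch
    ORDER BY branch, item, loc
    """
).fetchall()

for row in loc_counts:
    print(row)
```

Unlike a GROUP BY, the analytic form returns all six rows, each carrying the count for its branch and item.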
"total number of locations (count) the item is associated with for a given branch"
The way you described it, you should remove location from the query:
with branchinventoryfile (branch, item, location) as
  (select 100, 'A', 1111 from dual union all
   select 100, 'A', 1112 from dual union all
   select 200, 'A', 2111 from dual
  )
select branch,
       item,
       count(distinct location) cnt
from BranchInventoryFile
group by branch, item;
BRANCH I CNT
---------- - ----------
100 A 2
200 A 1
If you leave location in the select list, you have to group by it (and get the wrong result):
-- same WITH clause as in the first query
select branch,
       item,
       location,
       count(distinct location) cnt
from BranchInventoryFile
group by branch, item, location;
BRANCH I LOCATION CNT
---------- - ---------- ----------
100 A 1111 1
200 A 2111 1
100 A 1112 1
Or include the locations, but aggregate them, e.g.
-- same WITH clause as in the first query
select branch,
       item,
       listagg(location, ', ') within group (order by null) loc,
       count(distinct location) cnt
from BranchInventoryFile
group by branch, item;
BRANCH I LOC CNT
---------- - -------------------- ----------
100 A 1111, 1112 2
200 A 2111 1
I have a table like this:
I want to group the table by the "customer_id" column and calculate a "Day-day[0]" column, where "Day" is the Day field of each row in the group and "day[0]" is the day of the first row in that group. At the same time, I have to calculate the total risk, as follows:
This is the table after grouping by:
This is total risk formula:
In fact, I have to loop through each row of every group to calculate total risk.
My sample table is like this:
CREATE TABLE risk_test
(id VARCHAR2(32) NOT NULL PRIMARY KEY,
customer_id VARCHAR2(40 BYTE),
risk NUMBER,
day VARCHAR2(50 BYTE));
insert into risk_test values(1,102,15,1);
insert into risk_test values(2,102,16,1);
insert into risk_test values(3,104,11,1);
insert into risk_test values(4,102,17,2);
insert into risk_test values(5,102,10,2);
insert into risk_test values(6,102,13,3);
insert into risk_test values(7,104,14,2);
insert into risk_test values(8,104,13,2);
insert into risk_test values(9,104,17,1);
insert into risk_test values(10,104,16,2);
The sample answer is like this:
Would you please guide me how I can do this scenario in Oracle database?
Any help is really appreciated.
Using the sample data that was provided, I believe this query should calculate the risks properly:
Query
SELECT o.*,
       ROUND (
           SUM (day_minus_day0 * risk) OVER (PARTITION BY customer_id)
         / SUM (day_minus_day0) OVER (PARTITION BY customer_id),
         5) AS total_risk
FROM (SELECT rt.*,
             -- day is a VARCHAR2 column, so convert it before taking MIN;
             -- a string MIN would rank '10' before '2'
             (TO_NUMBER (rt.day) - MIN (TO_NUMBER (rt.day)) OVER (PARTITION BY customer_id)) + 1 AS day_minus_day0
      FROM risk_test rt) o
ORDER BY customer_id, TO_NUMBER (day), TO_NUMBER (id);
Result
ID CUSTOMER_ID RISK DAY DAY_MINUS_DAY0 TOTAL_RISK
_____ ______________ _______ ______ _________________ _____________
1 102 15 1 1 13.77778
2 102 16 1 1 13.77778
4 102 17 2 2 13.77778
5 102 10 2 2 13.77778
6 102 13 3 3 13.77778
3 104 11 1 1 14.25
9 104 17 1 1 14.25
7 104 14 2 2 14.25
8 104 13 2 2 14.25
10 104 16 2 2 14.25
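As a sanity check on these figures, the same day-weighted average can be recomputed by hand. This is only a sketch in Python using the rows from the question's INSERT statements; the function name total_risk is made up for the illustration, and the +1 on the day offset mirrors the query above.

```python
from collections import defaultdict

# (id, customer_id, risk, day) taken from the question's INSERT statements
rows = [
    (1, 102, 15, 1), (2, 102, 16, 1), (3, 104, 11, 1), (4, 102, 17, 2),
    (5, 102, 10, 2), (6, 102, 13, 3), (7, 104, 14, 2), (8, 104, 13, 2),
    (9, 104, 17, 1), (10, 104, 16, 2),
]


def total_risk(rows):
    """Return {customer_id: weighted average risk}, weight = day - day[0] + 1."""
    by_customer = defaultdict(list)
    for _id, customer, risk, day in rows:
        by_customer[customer].append((risk, day))

    result = {}
    for customer, items in by_customer.items():
        day0 = min(day for _risk, day in items)          # first day in the group
        weights = [day - day0 + 1 for _risk, day in items]
        weighted_sum = sum(w * risk for w, (risk, _day) in zip(weights, items))
        result[customer] = round(weighted_sum / sum(weights), 5)
    return result


risks = total_risk(rows)
print(risks[102])  # 13.77778
print(risks[104])  # 14.25
```

For customer 102 the weights are 1, 1, 2, 2, 3, so the total risk is 124 / 9; for customer 104 it is 114 / 8.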
Your total risk calculation just looks like a weighted average to me. That is, the average risk of the rows for each customer, weighted according to the day offset (day-day[0]), so that risks in later days count for more.
To compute that, you need a common table expression that first computes the day-weighted risk for each row. Then you can compute the weighted average by dividing the sum of the weighted risks by the sum of the weights.
The query below illustrates the approach, with comments.
-- This first WITH clause is just sample data. In your database you would
-- get rid of this and replace all references to "input" with your actual
-- table name
with input ( customer_id, risk, day ) AS (
SELECT 1053, 100, 1 FROM DUAL UNION ALL
SELECT 1053, 100, 1 FROM DUAL UNION ALL
SELECT 1053, 100, 2 FROM DUAL UNION ALL
SELECT 1053, 100, 2 FROM DUAL UNION ALL
SELECT 1053, 100, 3 FROM DUAL UNION ALL
SELECT 1054, 200, 1 FROM DUAL UNION ALL
SELECT 1054, 200, 1 FROM DUAL UNION ALL
SELECT 1054, 200, 3 FROM DUAL UNION ALL
SELECT 1054, 200, 3 FROM DUAL UNION ALL
SELECT 1054, 200, 4 FROM DUAL
),
-- This CTE computes the day offset for each row and multiplies by the risk to
-- compute a day-weighted risk.
-- I added +1 to the day_offset, otherwise risks on the 1st day would not contribute
-- to the total risk, which I think is not what you intended(?)
weighted_input AS (
SELECT i.customer_id,
i.risk,
i.day,
i.day - min(i.day) over ( partition by i.customer_id ) + 1 day_offset,
( i.day - min(i.day) over ( partition by i.customer_id ) + 1 ) * i.risk day_weighted_risk
FROM input i )
-- This is the main SELECT clause that gets all the weighted risks and computes
-- the group total risk, which appears the same in every row in each group.
SELECT wi.*,
sum(wi.day_weighted_risk) over ( partition by wi.customer_id ) / sum(wi.day_offset) over ( partition by wi.customer_id ) total_risk
FROM weighted_input wi;
+-------------+------+-----+------------+-------------------+------------+
| CUSTOMER_ID | RISK | DAY | DAY_OFFSET | DAY_WEIGHTED_RISK | TOTAL_RISK |
+-------------+------+-----+------------+-------------------+------------+
| 1053 | 100 | 1 | 1 | 100 | 100 |
| 1053 | 100 | 1 | 1 | 100 | 100 |
| 1053 | 100 | 2 | 2 | 200 | 100 |
| 1053 | 100 | 2 | 2 | 200 | 100 |
| 1053 | 100 | 3 | 3 | 300 | 100 |
| 1054 | 200 | 1 | 1 | 200 | 200 |
| 1054 | 200 | 1 | 1 | 200 | 200 |
| 1054 | 200 | 3 | 3 | 600 | 200 |
| 1054 | 200 | 3 | 3 | 600 | 200 |
| 1054 | 200 | 4 | 4 | 800 | 200 |
+-------------+------+-----+------------+-------------------+------------+
For your database, having the actual table and not needing the input CTE, it would be:
WITH weighted_input AS (
-- This CTE computes the day offset for each row and multiplies by the risk to
-- compute a day-weighted risk.
-- I added +1 to the day_offset, otherwise risks on the 1st day would not contribute
-- to the total risk, which I think is not what you intended(?)
SELECT i.customer_id,
i.risk,
i.day,
i.day - min(i.day) over ( partition by i.customer_id ) + 1 day_offset,
( i.day - min(i.day) over ( partition by i.customer_id ) + 1 ) * i.risk day_weighted_risk
FROM my_table i )
-- This is the main SELECT clause that gets all the weighted risks and computes
-- the group total risk, which appears the same in every row in each group.
SELECT wi.*,
sum(wi.day_weighted_risk) over ( partition by wi.customer_id ) / sum(wi.day_offset) over ( partition by wi.customer_id ) total_risk
FROM weighted_input wi;
I have a table in an Oracle database that tracks the mileage of 10 vehicles every hour of every day. For example:
Car   | Mileage | Timestamp
Honda | 23.4    | 11-Jan-17 08.00.00.000000 AM
Honda | 22      | 11-Jan-17 09.00.00.000000 AM
Honda | 21.3    | 11-Jan-17 10.00.00.000000 AM
Honda | 24.4    | 11-Jan-17 11.00.00.000000 AM
Honda | 23.2    | 12-Jan-17 08.00.00.000000 AM
Honda | 25      | 12-Jan-17 09.00.00.000000 AM
Honda | 26      | 12-Jan-17 10.00.00.000000 AM
I don't understand how I can write a query that covers the last year's worth of data and selects the mileage of every car before 9 am on each day.
Assuming you mean mileage later than midnight and prior to 09:00, the following will do the job and also cope with other car makes.
WITH base_data AS
  (SELECT car, mileage, read_date,
          ROW_NUMBER() OVER (PARTITION BY car, TRUNC(read_date) ORDER BY read_date DESC) AS ranking
   FROM wg_test
   WHERE EXTRACT(HOUR FROM read_date) BETWEEN 0 AND 8
   AND read_date > SYSDATE - 365)
SELECT car, mileage, read_date
FROM base_data
WHERE ranking = 1;
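The ranking idea can be tried locally with Python's built-in sqlite3 module. This is a sketch under assumptions, not the Oracle query itself: SQLite (3.25 or later) stands in for Oracle, strftime('%H', ...) and date(...) replace EXTRACT and TRUNC, the one-year filter is left out because the sample data is from 2017, and the 07:00 reading is a hypothetical row added only so that the ranking has something to choose between.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wg_test (car TEXT, mileage REAL, read_date TEXT)")
conn.executemany(
    "INSERT INTO wg_test VALUES (?, ?, ?)",
    [
        ("Honda", 23.0, "2017-01-11 07:00:00"),  # hypothetical extra reading
        ("Honda", 23.4, "2017-01-11 08:00:00"),
        ("Honda", 22.0, "2017-01-11 09:00:00"),
        ("Honda", 21.3, "2017-01-11 10:00:00"),
        ("Honda", 24.4, "2017-01-11 11:00:00"),
        ("Honda", 23.2, "2017-01-12 08:00:00"),
        ("Honda", 25.0, "2017-01-12 09:00:00"),
        ("Honda", 26.0, "2017-01-12 10:00:00"),
    ],
)

# Keep readings taken in hours 0..8, rank them per car and calendar day
# from latest to earliest, and return only the latest one (ranking = 1).
before_nine = conn.execute(
    """
    WITH base_data AS (
        SELECT car, mileage, read_date,
               ROW_NUMBER() OVER (PARTITION BY car, date(read_date)
                                  ORDER BY read_date DESC) AS ranking
        FROM wg_test
        WHERE CAST(strftime('%H', read_date) AS INTEGER) BETWEEN 0 AND 8
    )
    SELECT car, mileage, read_date
    FROM base_data
    WHERE ranking = 1
    ORDER BY car, read_date
    """
).fetchall()

for row in before_nine:
    print(row)
```

With this data the query returns the 08:00 reading for each of the two days, so the 07:00 row is correctly ranked out.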
I have an EMP table. I need to get the number of employees in each department, broken down by country name: 'INDIA', 'USA', 'AUSTRALIA'.
For example,
DEPARTMENT | #EMPLOYEE(INDIA) | #EMPLOYEE(USA) | #EMPLOYEE(AUSTRALIA)
ACCOUNTING | 5                | 2              | 3
IT         | 5                | 2              | 1
BUSINESS   | 1                | 4              | 3
I need to use PARTITION BY to do it. I am able to use PARTITION BY to get the total count of employees for each department, but I am not able to subgroup by country name.
Please give me suggestions.
Thank you.
Consider conditional count.
SELECT DEPARTMENT,
COUNT(CASE WHEN Country = 'INDIA' THEN 1 END) as emp_india,
COUNT(CASE WHEN Country = 'USA' THEN 1 END) as emp_usa,
COUNT(CASE WHEN Country = 'AUSTRALIA' THEN 1 END) as emp_australia
FROM EMP
GROUP BY DEPARTMENT;
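Conditional counting works because COUNT ignores NULLs, and a CASE without an ELSE branch yields NULL for non-matching rows. A small sketch with Python's built-in sqlite3 module shows the effect; SQLite is only a stand-in for Oracle here, and the five employee rows are hypothetical, since the question does not show the EMP data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (department TEXT, country TEXT)")
# Hypothetical rows, made up for this illustration only.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [
        ("ACCOUNTING", "INDIA"),
        ("ACCOUNTING", "INDIA"),
        ("ACCOUNTING", "USA"),
        ("IT", "INDIA"),
        ("IT", "AUSTRALIA"),
    ],
)

# Each CASE returns 1 for a matching row and NULL otherwise;
# COUNT skips the NULLs, so every column counts one country only.
pivot = conn.execute(
    """
    SELECT department,
           COUNT(CASE WHEN country = 'INDIA' THEN 1 END)     AS emp_india,
           COUNT(CASE WHEN country = 'USA' THEN 1 END)       AS emp_usa,
           COUNT(CASE WHEN country = 'AUSTRALIA' THEN 1 END) AS emp_australia
    FROM emp
    GROUP BY department
    ORDER BY department
    """
).fetchall()

for row in pivot:
    print(row)
```

A plain GROUP BY is enough for this shape of result; PARTITION BY would only be needed if every employee row had to be kept alongside the counts.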
I need to get the cost of an item at a certain date and time. I have these two tables:
create table sales ( product_id int, items_sold int, date_loaded date );
create table product ( product_id int, description string, item_cost double, date_loaded date );
The product table is a history of each item. If the cost of an item today is $1.00 but the cost of that item yesterday was $0.99 I would have two records one for each day. When I load my sales data I need to reflect the cost of the item yesterday and not today's cost.
Here is the query I am trying to execute:
SELECT s.product_id, s.items_sold, p.description, s.items_sold * p.item_cost as total_cost FROM sales s, product p
WHERE
p.product_id = s.product_id and
p.date_loaded <= (
SELECT MAX(pp.date_loaded)
FROM product pp
WHERE
pp.product_id = s.product_id and
pp.date_loaded <= s.date_loaded
)
SALES TABLE:
|PRODUCT_ID |ITEMS_SOLD |DATE_LOADED |
|1 |4 |2016-06-30 |
|1 |5 |2016-07-01 |
|1 |6 |2016-07-02 |
|1 |3 |2016-07-03 |
PRODUCT TABLE:
|PRODUCT_ID |DESCRIPTION |ITEM_COST |DATE_LOADED |
|1 |ITEM A |0.99 |2016-06-20 |
|1 |ITEM A |1.00 |2016-07-02 |
I would expect to see this result:
|PRODUCT_ID |ITEMS_SOLD |DESCRIPTION |ITEM_COST |TOTAL_COST |
|1 |4 |ITEM A |0.99 |3.96 |
|1 |5 |ITEM A |0.99 |4.95 |
|1 |6 |ITEM A |1.00 |6.00 |
|1 |3 |ITEM A |1.00 |3.00 |
From everything I have read, this form of subquery is not allowed. So how can I accomplish this in Hive?
It can be accomplished with a CTE and the LEAD window function:
WITH result AS (
  SELECT product_id, description, item_cost, date_loaded,
         -- fromdate is the date on which the next cost record of the same
         -- product takes effect; PARTITION BY keeps products separate
         LEAD(date_loaded, 1, '2999-01-01')
           OVER (PARTITION BY product_id ORDER BY date_loaded) AS fromdate
  FROM product )
SELECT s.product_id, s.items_sold, p.description, s.items_sold * p.item_cost AS total_cost
FROM sales s JOIN result p ON s.product_id = p.product_id
WHERE s.date_loaded >= p.date_loaded AND s.date_loaded < p.fromdate;
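The validity-range technique can be checked against the question's sample data without a Hive cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in (SQLite 3.25 or later for LEAD); dates are stored as ISO text so that string comparison orders them correctly, the alias next_date_loaded is a name chosen for this illustration, and ROUND is added only to keep the floating-point output tidy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, items_sold INTEGER, date_loaded TEXT)")
conn.execute(
    "CREATE TABLE product (product_id INTEGER, description TEXT, "
    "item_cost REAL, date_loaded TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 4, "2016-06-30"), (1, 5, "2016-07-01"), (1, 6, "2016-07-02"), (1, 3, "2016-07-03")],
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [(1, "ITEM A", 0.99, "2016-06-20"), (1, "ITEM A", 1.00, "2016-07-02")],
)

# Turn each cost record into a [date_loaded, next_date_loaded) range,
# then join every sale to the range that contains its date.
costs = conn.execute(
    """
    WITH result AS (
        SELECT product_id, description, item_cost, date_loaded,
               LEAD(date_loaded, 1, '2999-01-01')
                 OVER (PARTITION BY product_id ORDER BY date_loaded) AS next_date_loaded
        FROM product
    )
    SELECT s.product_id, s.items_sold, p.description, p.item_cost,
           ROUND(s.items_sold * p.item_cost, 2) AS total_cost
    FROM sales s
    JOIN result p ON s.product_id = p.product_id
    WHERE s.date_loaded >= p.date_loaded
      AND s.date_loaded < p.next_date_loaded
    ORDER BY s.date_loaded
    """
).fetchall()

for row in costs:
    print(row)
```

The four rows match the expected result in the question: the sales of 2016-06-30 and 2016-07-01 are priced at 0.99, and those from 2016-07-02 onwards at 1.00.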