Aggregating several columns in oracle sql - oracle

Having a difficult time phrasing this question. Let me know if there's a better title.
I have a query that produces data like this:
+----------+----------+----------+----------+----------+
| KEY | FEB_GRP1 | JAN_GRP1 | FEB_GRP2 | JAN_GRP2 |
+----------+----------+----------+----------+----------+
| 50840992 | 1 | 1 | 0 | 0 |
| 50840921 | 0 | 1 | 1 | 0 |
| 50848995 | 0 | 0 | 0 | 0 |
+----------+----------+----------+----------+----------+
Alternatively, I can produce data like this:
+----------+------+------+
| KEY | JAN | FEB |
+----------+------+------+
| 50840992 | <50 | ~<50 |
| 50840921 | <50 | <50 |
| 50848995 | ~<50 | ~<50 |
| 50840885 | <50 | <50 |
+----------+------+------+
Where <50 should be counter as "group 1" and ~<50 should be counter as "group 2".
And I want it to be like this:
+-------+------+------+
| MONTH | GRP1 | GRP2 |
+-------+------+------+
| JAN | 2 | 0 |
| FEB | 1 | 1 |
+-------+------+------+
I can already get JAN_GRP1_SUM just by summing JAN_GRP1, but I want that to just be a data point, not a column itself.
My query (generates the first diagram):
SELECT *
FROM (
SELECT KEY,
CASE WHEN "FEB-1-2016" = '<50' THEN 1 ELSE 0 END AS FEB_GRP1,
CASE WHEN "FEB-1-2016" != '<50' THEN 1 ELSE 0 END AS FEB_GRP2,
CASE WHEN "JAN-1-2016" = '<50' THEN 1 ELSE 0 END AS JAN_GRP1,
CASE WHEN "JAN-1-2016" != '<50' THEN 1 ELSE 0 END AS JAN_GRP2
FROM MY_TABLE);

Your data model doesn't make much sense, but from what you've shown you can do:
select 'JAN' as month,
count(case when "JAN-1-2016" = '<50' then 1 end) as grp1,
count(case when "JAN-1-2016" != '<50' then 1 end) as grp2
from my_table
union all
select 'FEB' as month,
count(case when "FEB-1-2016" = '<50' then 1 end) as grp1,
count(case when "FEB-1-2016" != '<50' then 1 end) as grp2
from my_table;
That doesn't scale well - if you have more months you need to add another union branch for each one.
If your query is based on a view or a previously calculated summary then it will probably be much easier to go back to the original data.
If you are stuck with this then another possible approach, which might be more manageable if you actually have more than two months to look at, could be to unpivot the data:
select *
from my_table
unpivot(value for month in ("JAN-1-2016" as date '2016-01-01',
"FEB-1-2016" as date '2016-02-01') --, etc. for other months
);
and then aggregate that:
select to_char(month, 'MON', 'NLS_DATE_LANGUAGE=ENGLISH') as month,
count(case when value = '<50' then 1 end) as grp1,
count(case when value != '<50' then 1 end) as grp2
from (
select *
from my_table
unpivot(value for month in ("JAN-1-2016" as date '2016-01-01',
"FEB-1-2016" as date '2016-02-01') --, etc. for other months
)
)
group by month;
Still not pretty and Oracle is doing pretty much the same thing under the hood I think, but fewer case expressions to create and maintain - the drudge part is the unpivot pairs. You might need to include the year in the `'month' field, depending on the range of data you have.

Related

How to count total amount of pending tickets for each day this week in oracle-sql?

I want to count the total amount of pending tickets for each day in this week. I was only able to get it for one day at a time. I have this query right now:
SELECT (n.TOTAL - v.TODAY) + d.GISTER AS GISTER
FROM
(
-- Counts yesterday
SELECT
COUNT(ID) AS Gister
FROM FRESHDESK_API
-- 4 = resolved 5 = closed
-- Both count as closed
WHERE STATUS IN(4, 5)
AND TRUNC(UPDATED_AT) = TRUNC(SYSDATE - 1)
) d
CROSS JOIN
(
-- Total pending
SELECT
COUNT(ID) AS TOTAL
FROM FRESHDESK_API
-- 3 is pending
WHERE STATUS IN(3)
) n
CROSS JOIN
(
-- Pending tickets today
SELECT
COUNT(ID) AS TODAY
FROM FRESHDESK_API
-- 3 is pending
WHERE STATUS IN(3)
AND TRUNC(UPDATED_AT) = TRUNC(SYSDATE)
) v
I want to get a result like this:
+----------------------------------+---------+----------+
| day | pending_tickets |
+----------------------------------+---------+----------+
| Monday | 20 |
| Tuesday | 22 |
| Wednesday | 25 |
| Thursday | 24 |
| Friday | 19 |
+----------------------------------+---------+----------+
The table is someting like this (left the unused data out):
+----------------------------------+---------+----------+---------+-----------+----------+----------+
| id | created_at | updated_at | status |
+----------------------------------+---------+----------+----------+----------+----------+----------+
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
+----------------------------------+---------+----------+---------+-----------+---------+-----------+
You can use left join and group by as follows:
Select to_char(tday.updated_at, 'day') as updated_at,
count(tday.id) - count(yday.id) as pending_tickets
From FRESHDESK_API tday
Left join FRESHDESK_API yday
On trunc(tday.UPDATED_AT) = trunc(yday.UPDATED_AT - 1)
And trunc(yday.UPDATED_AT + 1, 'iw') = trunc(sysdate, 'iw')
And yday.status in (4,5)
Where trunc(tday.UPDATED_AT, 'iw') = trunc(sysdate, 'iw')
And tday.status = 3
Group by to_char(tday.updated_at, 'day'), trunc(tday.updated_at)
Order by trunc(tday.updated_at);

Lag and Lead to next month

TABLE: HIST
CUSTOMER MONTH PLAN
1 1 A
1 2 B
1 2 C
1 3 D
If I query:
select h.*, lead(plan) over (partition by customer order by month) np from HIST h
I get:
CUSTOMER MONTH PLAN np
1 1 A B
1 2 B C
1 2 C D
1 3 D (null)
But I wanted
CUSTOMER MONTH PLAN np
1 1 A B
1 2 B D
1 2 C D
1 3 D (null)
Reason being, next month to 2 is 3, with D. I'm guessing partition by customer order by month doesn't work the way I thought.
Is there a way to achieve this in Oracle 12c?
One way to do it is to use RANGE partitioning with the MIN analytic function. Like this:
select h.*,
min(plan) over
(partition by customer
order by month
range between 1 following and 1 following) np
from HIST h;
+----------+-------+------+----+
| CUSTOMER | MONTH | PLAN | NP |
+----------+-------+------+----+
| 1 | 1 | A | B |
| 1 | 2 | B | D |
| 1 | 2 | C | D |
| 1 | 3 | D | |
+----------+-------+------+----+
When you use RANGE partitioning, you are telling Oracle to make the windows based on the values of the column you are ordering by rather than making the windows based on the rows.
So, e.g.,
ROWS BETWEEN 1 following and 1 following
... will make a window containing the next row.
RANGE BETWEEN 1 following and 1 following
... will make a window containing all the rows having the next value for month.
UPDATE
If it is possible that some values for MONTH might be skipped for a given customer, you can use this variant:
select h.*,
first_value(plan) over
(partition by customer
order by month
range between 1 following and unbounded following) np
from h
+----------+-------+------+----+
| CUSTOMER | MONTH | PLAN | NP |
+----------+-------+------+----+
| 1 | 1 | A | B |
| 1 | 3 | B | D |
| 1 | 3 | C | D |
| 1 | 4 | D | |
+----------+-------+------+----+
You can use LAG/LEAD twice. The first time to check for duplicate months and to set the value to NULL in those months and the second time use IGNORE NULLS to get the next monthly value.
It has the additional benefit that if months are skipped then it will still find the next value.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE HIST ( CUSTOMER, MONTH, PLAN ) AS
SELECT 1, 1, 'A' FROM DUAL UNION ALL
SELECT 1, 2, 'B' FROM DUAL UNION ALL
SELECT 1, 2, 'C' FROM DUAL UNION ALL
SELECT 1, 3, 'D' FROM DUAL UNION ALL
SELECT 2, 1, 'E' FROM DUAL UNION ALL
SELECT 2, 1, 'F' FROM DUAL UNION ALL
SELECT 2, 3, 'G' FROM DUAL UNION ALL
SELECT 2, 5, 'H' FROM DUAL;
Query 1:
SELECT CUSTOMER,
MONTH,
PLAN,
LEAD( np ) IGNORE NULLS OVER ( PARTITION BY CUSTOMER ORDER BY MONTH, PLAN, ROWNUM ) AS np
FROM (
SELECT h.*,
CASE MONTH
WHEN LAG( MONTH ) OVER ( PARTITION BY CUSTOMER ORDER BY MONTH, PLAN, ROWNUM )
THEN NULL
ELSE PLAN
END AS np
FROM hist h
)
Results:
| CUSTOMER | MONTH | PLAN | NP |
|----------|-------|------|--------|
| 1 | 1 | A | B |
| 1 | 2 | B | D |
| 1 | 2 | C | D |
| 1 | 3 | D | (null) |
| 2 | 1 | E | G |
| 2 | 1 | F | G |
| 2 | 3 | G | H |
| 2 | 5 | H | (null) |
Just so that it is listed here as an option for Oracle 12c (onward), you can use an apply operator for this style of problem
select
h.customer, h.month, h.plan, oa.np
from hist h
outer apply (
select
h2.plan as np
from hist h2
where h.customer = h.customer
and h2.month > h.month
order by month
fetch first 1 rows only
) oa
order by
h.customer, h.month, h.plan
I don't know of any Oracle 12c public fiddles so, an example in SQL Server can be found here: http://sqlfiddle.com/#!18/cd95e/1
| customer | month | plan | np |
|----------|-------|------|--------|
| 1 | 1 | A | C |
| 1 | 2 | B | D |
| 1 | 2 | C | D |
| 1 | 3 | D | (null) |

Not able to aggregate in case statement in hive query

I have data like below:
SELECT
mtrans.merch_num,
mtrans.card_num
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%' AND person_org_code='P' AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30;
+-----------+----------------------------+
| merch_num | card_num |
+-----------+----------------------------+
| 1 | 4658XXXXXXXXXXXXXXXXXXURMX |
| 2 | 4658XXXXXXXXXXXXXXXXXXIE6X |
| 2 | 4658XXXXXXXXXXXXXXXXXXDA8X |
| 2 | 4658XXXXXXXXXXXXXXXXXX7D1X |
| 2 | 4658XXXXXXXXXXXXXXXXXXTJ2X |
| 2 | 4658XXXXXXXXXXXXXXXXXXQQWX |
| 2 | 4659XXXXXXXXXXXXXXXXXXY4EX |
| 2 | 4658XXXXXXXXXXXXXXXXXXRDOX |
| 2 | 4658XXXXXXXXXXXXXXXXXX0O3X |
| 2 | 4658XXXXXXXXXXXXXXXXXXNVBX |
+-----------+----------------------------+
I want to aggregate trans_amt by merch_num only if I get unique card_num more than 1.
In simple Query I can do it:
SELECT
mtrans.merch_num,
FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
SUM(mtrans.trans_amt) AS total_age_less_30_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%' AND person_org_code='P' AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30
GROUP BY
mtrans.merch_num having count(distinct mtrans.card_num) > 1;
+-----------+---------------+---------------------+
| merch_num | process_month | total_age_less_30_1 |
+-----------+---------------+---------------------+
| 2 | Nov-2017 | 2147.5 |
+-----------+---------------+---------------------+
Here I am able to skip merchant - 5493036 as it doesn't have unique cards more than 1.
But I have multiple conditions in where & want to write 1 query only.
Using case statement I am able to do it like below:
SELECT mtrans.merch_num,
FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
NVL(SUM(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30)
THEN mtrans.trans_amt ELSE 0 END), NULL)
AS total_age_less_30_1,
NVL(SUM(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40)
THEN mtrans.trans_amt ELSE 0 END), NULL)
AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%'
AND person_org_code='P'
GROUP BY
mtrans.merch_num
+-----------+---------------+---------------------+-------------------+
| merch_num | process_month | total_age_less_30_1 | total_age_30_40_1 |
+-----------+---------------+---------------------+-------------------+
| 3 | Nov-2017 | 0 | 0 |
| 4 | Nov-2017 | 0 | 0 |
| 1 | Nov-2017 | 2.49 | 203.68 |
| 2 | Nov-2017 | 2147.5 | 4907 |
| 5 | Nov-2017 | 0 | 0 |
+-----------+---------------+---------------------+-------------------+
I want to make 2.49 as NULL as for that merchant, more than 1 unique card is not present.
I am not able to apply having condition to check if unique card no is more than 1 then only I have to show the sum(trans_amt)
when I apply and condition in case statement, I get below error:
SELECT
mtrans.merch_num,
FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
NVL(SUM(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30 and count(distinct mtrans.card_num) > 1)
THEN mtrans.trans_amt ELSE 0 END), NULL)
AS total_age_less_30_1,
NVL(SUM(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40 and count(distinct mtrans.card_num) > 1)
THEN mtrans.trans_amt ELSE 0 END), NULL)
AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%'
AND person_org_code='P'
GROUP BY
mtrans.merch_num;
ERROR: AnalysisException: aggregate function must not contain aggregate parameters: sum(CASE WHEN (round(datediff(mtrans.transaction_date, cdemo.date_birth) / 365) < 30 AND count(DISTINCT mtrans.card_num) > 1) THEN mtrans.trans_amt ELSE 0 END)
Can someone help?
The error seems to be because you have count inside the SUM statement. This is what you must try, Let me know how it goes :
SELECT
mtrans.merch_num,
FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
NVL(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30 and count(distinct mtrans.card_num) > 1)
THEN SUM(mtrans.trans_amt) ELSE 0 END, NULL)
AS total_age_less_30_1,
NVL(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40 and count(distinct mtrans.card_num) > 1)
THEN SUM(mtrans.trans_amt) ELSE 0 END, NULL)
AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%'
AND person_org_code='P'
GROUP BY
mtrans.merch_num;
I would suggest doing it in a better way as follows.
(PS: I didn't have any hive access, so I am doing this using Postgresql using regular SQL. So, it should be easier to adapt to Hive SQL).
Here is my SQL Table and records inserted in the table.
CREATE TEMPORARY TABLE hivetest (
merchant_id INTEGER,
card_number TEXT,
customer_dob TIMESTAMP,
transaction_dt TIMESTAMP,
transaction_amt DECIMAL
);
INSERT INTO hivetest VALUES
(1, 'A', '1997-12-01', '2017-11-01', 10.0),
(2, 'A', '1997-12-01', '2017-11-01', 11.0),
(2, 'B', '1980-12-01', '2017-11-01', 12.0),
(3, 'A', '1997-12-01', '2017-11-01', 13.0),
(3, 'A', '1997-12-01', '2017-11-01', 14.0),
(4, 'A', '1997-12-01', '2017-11-01', 15.0),
(4, 'C', '1980-12-01', '2017-11-01', 16.0);
First, you need to join the tables and generate a dataset that gives you the transaction_age (transaction_dt - customer_dob). I have most of the data for date subtraction in this single table, but simple INNER JOIN(s) should suffice to achieve this. Anyways, here is the query for the same.
SELECT
merchant_id, card_number, DATE(customer_dob) customer_dob, DATE(transaction_dt) transaction_dt,
DATE_PART('year', DATE(transaction_dt)) - DATE_PART('year', DATE(customer_dob)) transaction_age,
transaction_amt
FROM hivetest ORDER BY 1;
This results in the data as follows.
+-------------+-------------+--------------+----------------+-----------------+----------------+
| merchant_id | card_number | customer_dob | transaction_dt | transaction_age |transaction_amt |
+-------------+-------------+--------------+----------------+-----------------+----------------+
| 1 | A | 1997-12-01 | 2017-11-01 | 20 | 10.0 |
| 2 | A | 1997-12-01 | 2017-11-01 | 20 | 11.0 |
| 2 | B | 1980-12-01 | 2017-11-01 | 37 | 12.0 |
| 3 | A | 1997-12-01 | 2017-11-01 | 20 | 13.0 |
| 3 | A | 1997-12-01 | 2017-11-01 | 20 | 14.0 |
| 4 | A | 1997-12-01 | 2017-11-01 | 20 | 15.0 |
| 4 | C | 1980-12-01 | 2017-11-01 | 37 | 16.0 |
+-------------+-------------+--------------+----------------+-----------------+----------------+
The above dataset will allow you to categorise the SUM of transaction amounts based on the transaction_age as you want. The trick is to have the above query in a sub-query and use the results of this subquery to categorize. Here is the query to do the same.
SELECT
merchant_id,
-- Transaction Age less than 30
SUM(CASE WHEN transaction_age <= 30 THEN 1 ELSE 0 END) count_30,
SUM(CASE WHEN transaction_age <= 30 THEN transaction_amt ELSE 0 END) sum_30,
-- Transaction Age between 30 and 40
SUM(CASE WHEN transaction_age > 30 AND transaction_age <= 40 THEN 1 ELSE 0 END) case_30_40,
SUM(CASE WHEN transaction_age > 30 AND transaction_age <= 40 THEN transaction_amt ELSE 0 END) sum_30_40
FROM
(
SELECT
merchant_id, transaction_amt,
DATE_PART('year', DATE(transaction_dt)) - DATE_PART('year', DATE(customer_dob)) transaction_age
FROM hivetest
) m
GROUP BY merchant_id ORDER BY 1;
This results in the categorised output as below which gives you the count of transactions and sum of transaction amounts for each category for each merchant:
+-------------+----------+--------+------------+-----------+
| merchant_id | count_30 | sum_30 | case_30_40 | sum_30_40 |
+-------------+----------+--------+------------+-----------+
| 1 | 1 | 10.0 | 0 | 0 |
| 2 | 1 | 11.0 | 1 | 12.0 |
| 3 | 2 | 27.0 | 0 | 0 |
| 4 | 1 | 15.0 | 1 | 16.0 |
+-------------+----------+--------+------------+-----------+
Now, this is our dataset which is more or less the final result. However, as per your requirement, you are only interested in merchants which have more than 1 unique cards (COUNT(DISTINCT card_number) > 1).
So, lets write another query which gives us this. Below is the query which calculates this and based on the criteria, it marks the flag as TRUE or FALSE indicating whether or not we are interested in that merchant or not.
SELECT
merchant_id,
CASE
WHEN COUNT(DISTINCT card_number) > 1 THEN
TRUE
ELSE
FALSE
END has_distinct_cards_gt_1
FROM hivetest GROUP BY merchant_id ORDER BY 1
This gives the output as below.
+-------------+-------------------------+
| merchant_id | has_distinct_cards_gt_1 |
+-------------+-------------------------+
| 1 | false |
| 2 | true |
| 3 | false |
| 4 | true |
+-------------+-------------------------+
Now, we are almost done. We just need to join these two tables and then based on the has_distinct_cards_gt_1, display the columns accordingly from the dataset generated previously.
Here is the final join query and resultset data generated.
SELECT
merchants_all.merchant_id,
-- Age < 30
CASE
WHEN merchants_cards.has_distinct_cards_gt_1 THEN
sum_30
ELSE
0
END total_sum_30,
-- Age in 30 and 40
CASE
WHEN merchants_cards.has_distinct_cards_gt_1 THEN
sum_30_40
ELSE
0
END total_sum_30_40
FROM
(
SELECT
merchant_id,
SUM(CASE WHEN transaction_age <= 30 THEN transaction_amt ELSE 0 END) sum_30,
SUM(CASE WHEN transaction_age > 30 AND transaction_age <= 40 THEN transaction_amt ELSE 0 END) sum_30_40
FROM
(
SELECT merchant_id, DATE_PART('year', DATE(transaction_dt)) - DATE_PART('year', DATE(customer_dob)) transaction_age, transaction_amt
FROM hivetest
) m
GROUP BY merchant_id
) merchants_all
JOIN
(
SELECT merchant_id, CASE WHEN COUNT(DISTINCT card_number) > 1 THEN TRUE ELSE FALSE END has_distinct_cards_gt_1
FROM hivetest GROUP BY merchant_id ORDER BY 1
) merchants_cards
ON
(merchants_all.merchant_id = merchants_cards.merchant_id);
And this generates your final data, which you need.
+-------------+--------------+-----------------+
| merchant_id | total_sum_30 | total_sum_30_40 |
+-------------+--------------+-----------------+
| 1 | 0 | 0 |
| 2 | 11.0 | 12.0 |
| 3 | 0 | 0 |
| 4 | 15.0 | 16.0 |
+-------------+--------------+-----------------+
Let me know if this helps.
COUNT inside SUM is the problem.
Here is a solution. I haven't tested it though.
It's not obvious which table person_org_code belongs to. If it is in merch_trans_daily, then add person_org_code = 'P' to the where clause in the view. Let's know whether it works!
WITH mtrans_count AS
(SELECT merch_num,
COUNT(1) AS cnt
FROM a_sbp_db.merch_trans_daily
WHERE mtrans.transaction_date LIKE '2017-09%'
)
SELECT mtrans.merch_num
,FROM_UNIXTIME(UNIX_TIMESTAMP(), 'MMM-yyyy') AS process_month
,NVL(SUM(CASE
WHEN (
ROUND(DATEDIFF(mtrans.transaction_date, cdemo.date_birth) / 365) < 30
AND mtrans_count.cnt > 1
)
THEN mtrans.trans_amt
ELSE 0
END), NULL) AS total_age_less_30_1
,NVL(SUM(CASE
WHEN (
ROUND(DATEDIFF(mtrans.transaction_date, cdemo.date_birth) / 365) >= 30
AND ROUND(DATEDIFF(mtrans.transaction_date, cdemo.date_birth) / 365) < 40
AND mtrans_count.cnt > 1
)
THEN mtrans.trans_amt
ELSE 0
END), NULL) AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
INNER JOIN mtrans_count ON mtrans_count.merch_num = mtrans.merch_num
WHERE mtrans.transaction_date LIKE '2017-09%'
AND person_org_code = 'P'
GROUP BY mtrans.merch_num;

HIVE Pivot and Sum

I have a table that I am trying to figure out how to pivot and sum based on the values in a second column.
Example input:
|own|pet|qty|
|---|---|---|
|bob|dog| 2 |
|bob|dog| 3 |
|bob|dog| 1 |
|bob|cat| 1 |
|jon|dog| 1 |
|jon|cat| 1 |
|jon|cat| 1 |
|jon|cow| 4 |
|sam|dog| 3 |
|sam|cow| 1 |
|sam|cow| 2 |
Example output:
|own|dog|cat|cow|
|---|---|---|---|
|bob| 6 | 1 | |
|jon| 1 | 2 | 4 |
|sam| 1 | | 3 |
Use case and sum():
select own, sum(case when pet='dog' then qty end) as dog,
sum(case when pet='cat' then qty end) as cat,
sum(case when pet='cow' then qty end) as cow
from your_table
group by own;
For dynamic data you can use MAP
select own
,str_to_map(concat_ws(',',collect_list(concat(pet,':',cast(qty as string))))) as pet_qty
from (select own,pet
,sum(qty) qty
from mytable
group by own,pet
) t
group by own
;
+-----+---------------------------------+
| own | pet_qty |
+-----+---------------------------------+
| bob | {"cat":"1","dog":"6"} |
| jon | {"cat":"2","cow":"4","dog":"1"} |
| sam | {"cow":"3","dog":"3"} |
+-----+---------------------------------+

Oracle Recursive Subquery Factoring convert

I'm trying to use this recursive SQL feature but can't get it to do what I want, not even close. I've coded up the logic in an unrolled loop, asking if it can be converted into a single recursive SQL query, not the table update style I've used.
http://sqlfiddle.com/#!4/b7217/1
There are six players to be ranked. They have id, group id, score and rank.
Initial state
+----+--------+-------+--------+
| id | grp_id | score | rank |
+----+--------+-------+--------+
| 1 | 1 | 100 | (null) |
| 2 | 1 | 90 | (null) |
| 3 | 1 | 70 | (null) |
| 4 | 2 | 95 | (null) |
| 5 | 2 | 70 | (null) |
| 6 | 2 | 60 | (null) |
+----+--------+-------+--------+
I want to take the person with the highest initial score and give them rank 1. Then I apply 10 bonus points to the score of everyone who has the same group id. Take the next highest, assign rank 2, distribute bonus points and so on until there are no players left.
User id breaks ties.
The bonus points changes the ranking. id=4 initially appears to be second placed with 95, behind the leader with 100 but with the 10 pts bonus, id=2 moves up and takes the spot.
Final state
+-----+---------+--------+------+
| ID | GRP_ID | SCORE | RANK |
+-----+---------+--------+------+
| 1 | 1 | 100 | 1 |
| 2 | 1 | 100 | 2 |
| 4 | 2 | 95 | 3 |
| 3 | 1 | 90 | 4 |
| 5 | 2 | 80 | 5 |
| 6 | 2 | 80 | 6 |
+-----+---------+--------+------+
This is a quite a bit late, but I'm not sure this can be done using Recursive CTE. I did however come up with a solution using the MODEL clause:
WITH SAMPLE (ID,GRP_ID,SCORE,RANK) AS (
SELECT 1,1,100,NULL FROM DUAL UNION
SELECT 2,1,90,NULL FROM DUAL UNION
SELECT 3,1,70,NULL FROM DUAL UNION
SELECT 4,2,95,NULL FROM DUAL UNION
SELECT 5,2,70,NULL FROM DUAL UNION
SELECT 6,2,60,NULL FROM DUAL)
SELECT ID,GRP_ID,SCORE,RANK FROM SAMPLE
MODEL
DIMENSION BY (ID,GRP_ID)
MEASURES (SCORE,0 RANK,0 LAST_RANKED_GRP,0 ITEM_COUNT,0 HAS_RANK)
RULES
ITERATE (1000) UNTIL (ITERATION_NUMBER = ITEM_COUNT[1,1]) --ITERATE ONCE FOR EACH ITEM TO BE RANKED
(
RANK[ANY,ANY] = CASE WHEN SCORE[CV(),CV()] = MAX(SCORE) OVER (PARTITION BY HAS_RANK) THEN RANK() OVER (ORDER BY SCORE DESC,ID) ELSE RANK[CV(),CV()] END, --IF THE CURRENT ITEM SCORE IS EQUAL TO THE MAX SCORE OF UNRANKED, ASSIGN A RANK
LAST_RANKED_GRP[ANY,ANY] = FIRST_VALUE(GRP_ID) OVER (ORDER BY RANK DESC),
SCORE[ANY,ANY] = CASE WHEN RANK[CV(),CV()] = 0 AND CV(GRP_ID) = LAST_RANKED_GRP[CV(),CV()] THEN SCORE[CV(),CV()]+10 ELSE SCORE[CV(),CV()] END,
ITEM_COUNT[ANY,ANY] = COUNT(*) OVER (),
HAS_RANK[ANY,ANY] = CASE WHEN RANK[CV(),CV()] <> 0 THEN 1 ELSE 0 END --TO SEPARATE RANKED/UNRANKED ITEMS
)
ORDER BY RANK;
It's not very pretty, and I suspect there is a better way to go about this, but it does give the expected output.
Caveats:
You'd have to increase the iteration count if you have more than that number of rows.
This does a full re-ranking based on the score after each iteration. So if we took your sample data, but changed the initial score of item 2 to 95 rather than 90: after ranking item 1 and giving the 10 point bonus to item 2, it now has a score of 105. So we rank it as 1st and move item 1 down to 2nd. You'd have to make a few modifications if this is not the desired behavior.

Resources