Average Function with not null columns - Hive - hadoop

I want to calculate the average of the first 3 non-NULL yearly incomes, starting from the most recent year. For example:
employee id  2016  2015  2014  2013  2012  2011  2010
1            100   NULL  200   50    10    50    50
The average should be (100 + 200 + 50) / 3.
employee id  2016  2015  2014  2013  2012  2011  2010
2            NULL  100   NULL  50    NULL  25    100
The average should be (100 + 50 + 25) / 3.

Get one row per year with union all. Then rank the rows with row_number function so that non-null rows would be ranked first. Then get the average of first 3 rows.
select employee_id, avg(income)
from (select employee_id, yr, income,
             row_number() over (partition by employee_id
                                order by cast((income is not null) as int) desc, yr desc) as rnum
      from (select employee_id, 2016 as yr, `2016` as income from tbl
            union all
            select employee_id, 2015 as yr, `2015` as income from tbl
            union all
            select employee_id, 2014 as yr, `2014` as income from tbl
            union all
            select employee_id, 2013 as yr, `2013` as income from tbl
            union all
            select employee_id, 2012 as yr, `2012` as income from tbl
            union all
            select employee_id, 2011 as yr, `2011` as income from tbl
            union all
            select employee_id, 2010 as yr, `2010` as income from tbl
           ) t
     ) t
where rnum <= 3
group by employee_id
When only 2 columns have values, the result is (val1+val2)/2, because avg ignores the NULL row that fills the third rank.
When only one column has a value, the result is that value.
When all columns are NULL, NULL is returned.
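To sanity-check the expected averages outside Hive, the ranking logic above can be mirrored in a few lines of Python. This is only a sketch and not part of the original answer; the helper name `avg_first_non_null` and the dictionary layout are assumptions made for illustration.

```python
from typing import Dict, Optional


def avg_first_non_null(income_by_year: Dict[int, Optional[float]], n: int = 3) -> Optional[float]:
    """Average the n most recent non-NULL incomes (hypothetical helper)."""
    # Walk the years newest first and skip NULLs, like the row_number() ordering.
    values = [income_by_year[yr]
              for yr in sorted(income_by_year, reverse=True)
              if income_by_year[yr] is not None][:n]
    # Like avg() over only NULLs, return None when no value is left.
    return sum(values) / len(values) if values else None


emp1 = {2016: 100, 2015: None, 2014: 200, 2013: 50, 2012: 10, 2011: 50, 2010: 50}
emp2 = {2016: None, 2015: 100, 2014: None, 2013: 50, 2012: None, 2011: 25, 2010: 100}
print(avg_first_non_null(emp1))  # (100 + 200 + 50) / 3
print(avg_first_non_null(emp2))  # (100 + 50 + 25) / 3
```

With fewer than 3 non-NULL values the sketch averages whatever is available, which matches the behaviour described for the query.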

Related

Oracle query to keep looking until value is not 0 anymore

I am using Oracle 11.
I have 2 tables
TblA with columns id, entity_id and effective_date.
TblADetail with columns id and value.
If the value is 0 for the effective date, I want to keep looking at the next effective date until I find a value that is not 0.
The query below only looks for the value on 3/10/21.
If the value is 0, I want to look for the value on 3/11/21. If that is not 0, I want to stop.
If that is also 0, I want to look for the value on 3/12/21, and so on, until the value is not 0.
How can I do that?
SELECT SUM(pd.VALUE)
FROM TblA p,TblADetail pd
WHERE p.id = pd.id
AND p.effective_date = to_date('03/10/2021','MM/DD/YYYY')
AND TRIM (p.entity_id) = 123
Sample data:
TblA
id entity_id effective_date
1 123 3/10/21
2 123 3/11/21
3 123 3/12/21
TblADetail
id value
1 -136
1 136
2 2000
3 3000
In the above data, for entity_id 123, starting from effective_date 3/10/21, I would like to return the value 2000 (from TblADetail) for effective_date 3/11/21.
So, starting from a certain date, I want the results from the minimum date that has non-zero values.
Thank you.
You can do this by summing the values per effective date and using the MIN analytic function to find the earliest date whose sum is non-zero. Once you've done that, you simply need to select the row whose effective date matches that earliest date.
E.g.:
with tbla as (select 1 id, ' 123' entity_id, to_date('10/03/2021', 'dd/mm/yyyy') effective_date from dual union all
select 2 id, ' 123' entity_id, to_date('11/03/2021', 'dd/mm/yyyy') effective_date from dual union all
select 3 id, ' 123' entity_id, to_date('12/03/2021', 'dd/mm/yyyy') effective_date from dual),
tbla_detail as (select 1 id, -136 value from dual union all
select 1 id, 136 value from dual union all
select 2 id, 2000 value from dual union all
select 3 id, 3000 value from dual),
results as (select a.effective_date,
sum(ad.value) sum_value,
min(case when sum(ad.value) != 0 then a.effective_date end) over () min_effective_date
from tbla a
inner join tbla_detail ad on a.id = ad.id
where a.effective_date >= to_date('10/03/2021', 'dd/mm/yyyy')
and trim(a.entity_id) = '123'
group by a.effective_date)
select sum_value
from results
where effective_date = min_effective_date;
SUM_VALUE
----------
2000
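The same "sum per date, then take the earliest non-zero date" idea can be checked with a small Python sketch. It is an illustration only, not the answerer's code; the function name `first_nonzero_sum` and the tuple layouts are assumptions.

```python
from collections import defaultdict
from datetime import date


def first_nonzero_sum(headers, details, entity_id, start):
    """Return (effective_date, total) for the earliest date on or after
    start whose detail values sum to a non-zero total, else None."""
    # Keep header rows for the entity from the start date onwards.
    dates = {row_id: eff for row_id, ent, eff in headers
             if ent.strip() == entity_id and eff >= start}
    # Sum the detail values per effective date (inner-join semantics).
    totals = defaultdict(int)
    for row_id, value in details:
        if row_id in dates:
            totals[dates[row_id]] += value
    nonzero = [(eff, total) for eff, total in totals.items() if total != 0]
    return min(nonzero) if nonzero else None


tbla = [(1, ' 123', date(2021, 3, 10)),
        (2, ' 123', date(2021, 3, 11)),
        (3, ' 123', date(2021, 3, 12))]
tbla_detail = [(1, -136), (1, 136), (2, 2000), (3, 3000)]
print(first_nonzero_sum(tbla, tbla_detail, '123', date(2021, 3, 10)))
```

For the sample data this picks 3/11/21 with a total of 2000, because the two rows for 3/10/21 cancel out.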
Straightforward; read comments within code. Sample data in lines #1 - 13, query begins at line #14.
SQL> with
2 -- sample data
3 tbla (id, entity_id, effective_date) as
4 (select 1, 123, date '2021-03-10' from dual union all
5 select 2, 123, date '2021-03-11' from dual union all
6 select 3, 123, date '2021-03-12' from dual
7 ),
8 tblb (id, value) as
9 (select 1, -136 from dual union all
10 select 1, 136 from dual union all
11 select 2, 2000 from dual union all
12 select 3, 3000 from dual
13 ),
14 tblb_temp as
15 -- simple grouping per ID
16 (select id, sum(value) value
17 from tblb
18 group by id
19 )
20 -- return TBLA values whose ID equals TBLB_TEMP's minimum ID
21 -- whose value isn't zero
22 select a.id, a.entity_id, a.effective_date
23 from tbla a
24 where a.id = (select min(b.id)
25 from tblb_temp b
26 where b.value <> 0
27 );
ID ENTITY_ID EFFECTIVE_
---------- ---------- ----------
2 123 03/11/2021
SQL>

Get latest record for a period of months and aggregate its value using Oracle PL-SQL for each ID

I have an Oracle stored procedure that takes two dates as input parameters, defining a date range.
e.g.
sp_periodic_data(p_from_date DATE, p_to_date DATE) // let's take p_from_date = 01-Jan-2021 and p_to_date = 31-Mar-2021
I need to pick the latest record for each month from the table and add up the corresponding values over the time period.
Table Value:
ID  Date         value
1   1-jan-2021   10
1   10-jan-2021  20
2   15-jan-2021  15
2   16-jan-2021  20
2   02-feb-2021  10
2   06-feb-2021  15
1   17-feb-2021  10
1   5-mar-2021   15
1   17-mar-2021  10
2   10-mar-2021  10
Expected output: for each ID, the sum of the latest record (latest date) of every month from January to March.
40 --> for ID 1 (20+10+10)
45 --> for ID 2 (20+15+10)
For a start:
SQL for Beginners
Aggregate Functions
Analytical SQL in Oracle Database 12c
Example:
with
list_dates(id,dates,value) as
(
select 1,'1-jan-2021',10 from dual union all
select 1,'10-jan-2021',20 from dual union all
select 2,'15-jan-2021',15 from dual union all
select 2,'16-jan-2021',20 from dual union all
select 2,'02-feb-2021',10 from dual union all
select 2,'06-feb-2021',15 from dual union all
select 1,'17-feb-2021',10 from dual union all
select 1,'5-mar-2021',15 from dual union all
select 1,'17-mar-2021',10 from dual union all
select 2,'10-mar-2021',10 from dual
)
,step1 as (
select
id, trunc(to_date(dates,'dd-mon-yyyy'),'mm') mm,max(value) keep(dense_rank last order by to_date(dates,'dd-mon-yyyy')) value
from list_dates
group by id ,trunc(to_date(dates,'dd-mon-yyyy'),'mm')
)
select id,sum(value) val from step1
group by id;
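To verify the expected totals (40 and 45) independently of Oracle, here is a minimal Python sketch of the same two steps: keep the latest record per ID and month, then sum per ID. The names `parse_date` and `sum_latest_per_month` are hypothetical, and ties on the same date are resolved by taking the larger value, as `max(value) keep (dense_rank last ...)` does.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
     'jul', 'aug', 'sep', 'oct', 'nov', 'dec'], start=1)}


def parse_date(text):
    """Parse strings such as '5-mar-2021' without relying on the locale."""
    day, month, year = text.split('-')
    return date(int(year), MONTHS[month.lower()], int(day))


def sum_latest_per_month(rows):
    latest = {}  # (id, year, month) -> (date, value) of the latest record
    for rec_id, text, value in rows:
        day = parse_date(text)
        key = (rec_id, day.year, day.month)
        # A later date wins; on equal dates the larger value wins.
        if key not in latest or (day, value) > latest[key]:
            latest[key] = (day, value)
    totals = {}
    for (rec_id, _, _), (_, value) in latest.items():
        totals[rec_id] = totals.get(rec_id, 0) + value
    return totals


list_dates = [
    (1, '1-jan-2021', 10), (1, '10-jan-2021', 20),
    (2, '15-jan-2021', 15), (2, '16-jan-2021', 20),
    (2, '02-feb-2021', 10), (2, '06-feb-2021', 15),
    (1, '17-feb-2021', 10), (1, '5-mar-2021', 15),
    (1, '17-mar-2021', 10), (2, '10-mar-2021', 10),
]
print(sum_latest_per_month(list_dates))  # {1: 40, 2: 45}
```

Like the query above, the sketch does not filter on p_from_date and p_to_date; a range check on the parsed date would be needed inside the procedure.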

How do I fetch data for 90 days if there is no data in 10 days using SQL*Plus?

I wrote a query to fetch data for the last 10 days from the table. If there is no data for 10 days, then I need to fetch data for 50 days. But I don't know how to modify my query to fetch data for 90 days.
Query:
Select ep.NAME||'|'||s.id||'|'||s.SUBMISSION_DATE||'|'||E.VALUE
from SUMMARY_EXT e, summary s, enterprise ep
where e.id = id and e.name_res_key = 'Plan'
and s.id in (select id from summary where
trunc(start_date) > trunc(sysdate) -10 and service_name ='Dplan')
I want to modify my query so that if there is data for the last 10 days, it fetches those 10 days. If there is no data, it should fetch data for 90 days.
Analytic functions can help return rows based on the existence of other rows. First, use CASE expressions to categorize the rows into 10 day, 50 day, or 90 day buckets. Then use analytic functions to count the number of rows in each group. Finally, select only from the relevant groups depending on those counts.
For example:
-- Return 10 days, 50 days, or 90 days of data.
--
--#3: Only return certain rows depending on the counts.
select id, start_date
from
(
--#2: Count the number of rows in each category.
select id, start_date, is_lt_10, is_lt_50, is_lt_90
,sum(is_lt_10) over () total_lt_10
,sum(is_lt_50) over () total_lt_50
,sum(is_lt_90) over () total_lt_90
from
(
--#1: Put each row into a date category.
select
id, start_date,
case when trunc(start_date) > trunc(sysdate) - 10 then 1 else 0 end is_lt_10,
case when trunc(start_date) > trunc(sysdate) - 50 then 1 else 0 end is_lt_50,
case when trunc(start_date) > trunc(sysdate) - 90 then 1 else 0 end is_lt_90
from summary
where start_date > trunc(sysdate) - 90
)
)
where
(is_lt_10 = 1 and total_lt_10 > 0) or
(is_lt_50 = 1 and total_lt_10 = 0 and total_lt_50 > 0) or
(is_lt_90 = 1 and total_lt_50 = 0 and total_lt_90 > 0);
The below views can help simulate date ranges. For complicated queries like this, it's helpful to start as simple as possible, and add all the other joins and columns later.
--Data for 10 days only.
create or replace view summary as
select 1 id, sysdate start_date from dual union all
select 2 id, sysdate-49 start_date from dual union all
select 3 id, sysdate-89 start_date from dual union all
select 4 id, sysdate-99 start_date from dual;
--Data for 50 days only.
create or replace view summary as
select 2 id, sysdate-49 start_date from dual union all
select 3 id, sysdate-89 start_date from dual union all
select 4 id, sysdate-99 start_date from dual;
--Data for 90 days only.
create or replace view summary as
select 3 id, sysdate-89 start_date from dual union all
select 4 id, sysdate-99 start_date from dual;
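The fallback rule the query implements (use the smallest window that contains any row) can be illustrated with a short Python sketch. The function name `pick_window` and the (id, age in days) tuples are assumptions for illustration; the three test inputs correspond to the three views above.

```python
def pick_window(rows, windows=(10, 50, 90)):
    """Return the ids inside the smallest window that contains any row.

    rows holds (id, age_in_days) pairs, where age_in_days plays the role
    of trunc(sysdate) - trunc(start_date).
    """
    for limit in windows:
        hits = [row_id for row_id, age in rows if age < limit]
        if hits:
            return hits  # stop at the first window with data
    return []  # nothing within the widest window


print(pick_window([(1, 0), (2, 49), (3, 89), (4, 99)]))  # [1]
print(pick_window([(2, 49), (3, 89), (4, 99)]))          # [2]
print(pick_window([(3, 89), (4, 99)]))                   # [3]
```

The SQL version does the same thing in a single pass by counting the rows in each bucket with analytic functions instead of looping over the windows.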

Oracle SQL Select Query Getting Max Row As a Fraction of a Rollup Total

Hoping I might be able to get some advice regarding Oracle SQL…
I have a table roughly as follows (there are more columns, but they are not necessary for this example)…
LOCATION  USER  VALUE
1         1     10
1         2     20
1         3     30
2         4     10
2         5     10
2         6     20
1               60
2               40
                100
I’ve used rollup to get subtotals.
What I need to do is get the max(value) row for each location and express the max(value) as a percentage or fraction of the subtotal for each location
ie:
LOCATION USER FRAC
1 3 0.5
2 6 0.5
I could probably solve this using my limited knowledge of select queries, but I am guessing there must be a fairly quick and slick method.
Thanks in advance :)
Solution using analytic functions
(Please note the WITH MY_TABLE AS serving only as dummy datasource)
WITH MY_TABLE AS
( SELECT 1 AS LOC_ID,1 AS USER_ID, 10 AS VAL FROM DUAL
UNION
SELECT 1,2,20 FROM DUAL
UNION
SELECT 1,3,30 FROM DUAL
UNION
SELECT 2,4,10 FROM DUAL
UNION
SELECT 2,5,10 FROM DUAL
UNION
SELECT 2,6,20 FROM DUAL
)
SELECT LOC_ID,
USER_ID,
RATIO_IN_LOC
FROM
(SELECT LOC_ID,
USER_ID,
RATIO_IN_LOC,
RANK() OVER (PARTITION BY LOC_ID ORDER BY RATIO_IN_LOC DESC) AS ORDER_IN_LOC
FROM
(SELECT LOC_ID,
USER_ID,
VAL,
VAL/SUM(VAL) OVER (PARTITION BY LOC_ID) AS RATIO_IN_LOC
FROM MY_TABLE
)
)
WHERE ORDER_IN_LOC = 1
ORDER BY LOC_ID,
USER_ID;
Result
LOC_ID USER_ID RATIO_IN_LOC
1 3 0,5
2 6 0,5
with inputs ( location, person, value ) as (
select 1, 1, 10 from dual union all
select 1, 2, 20 from dual union all
select 1, 3, 30 from dual union all
select 2, 4, 10 from dual union all
select 2, 5, 10 from dual union all
select 2, 6, 20 from dual
),
prep ( location, person, value, m_value, total ) as (
select location, person, value,
max(value) over (partition by location),
sum(value) over (partition by location)
from inputs
)
select location, person, round(value/total, 2) as frac
from prep
where value = m_value;
Notes: Your table exists already? Then skip everything from "inputs" to the comma; your query should begin with with prep (...) as ( ...
I changed user to person since user is a keyword in Oracle; you shouldn't use it for table or column names (actually you can't unless you use double quotes, which is a very poor practice).
The query will output two or three or more rows per location if there are ties at the top. Presumably this is what you desire.
Output:
LOCATION PERSON FRAC
---------- ---------- ----------
1 3 .5
2 6 .5
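For readers who want to check the arithmetic, the per-location maximum and total can be reproduced with a small Python sketch. It is illustrative only; the function name `top_share` is an assumption, and ties at the top produce several rows per location, as in the query above.

```python
from collections import defaultdict


def top_share(rows):
    """Return (location, person, fraction) for each row holding the
    maximum value of its location."""
    totals = defaultdict(int)
    peaks = {}
    for location, person, value in rows:
        totals[location] += value  # sum(value) over (partition by location)
        peaks[location] = max(peaks.get(location, value), value)  # max(value) per location
    return [(location, person, round(value / totals[location], 2))
            for location, person, value in rows
            if value == peaks[location]]


inputs = [(1, 1, 10), (1, 2, 20), (1, 3, 30),
          (2, 4, 10), (2, 5, 10), (2, 6, 20)]
print(top_share(inputs))  # [(1, 3, 0.5), (2, 6, 0.5)]
```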

Oracle sql retrieve records based on maximum time

I have the data below.
table A
id
1
2
3
table B
id name data1 data2 datetime
1 cash 12345.00 12/12/2012 11:10:12
1 quantity 222.12 14/12/2012 11:10:12
1 date 20/12/2012 12/12/2012 11:10:12
1 date 19/12/2012 13/12/2012 11:10:12
1 date 13/12/2012 14/12/2012 11:10:12
1 quantity 330.10 17/12/2012 11:10:12
I want to retrieve data in one row like below:
tableA.id tableB.cash tableB.date tableB.quantity
1 12345.00 13/12/2012 330.10
I want to retrieve based on max(datetime).
The data model appears to be insane-- it makes no sense to join an ORDER_ID to a CUSTOMER_ID. It makes no sense to store dates in a VARCHAR2 column. It makes no sense to have no relationship between a CUSTOMER and an ORDER. It makes no sense to have two rows in the ORDER table with the same ORDER_ID. ORDER is also a reserved word so you cannot use that as a table name. My best guess is that you want something like
select *
from customer c
join (select order_id,
rank() over (partition by order_id
order by to_date( order_time, 'YYYYMMDD HH24:MI:SS' ) desc ) rnk
from order) o on (c.customer_id=o.order_id)
where o.rnk = 1
If that is not what you want, please (as I asked a few times in the comments) post the expected output.
These are the results I get with my query and your sample data (fixing the name of the ORDER table so that it is actually valid)
with orders as (
select 1 order_id, 'iphone' order_name, '20121201 12:20:23' order_time from dual union all
select 1, 'iphone', '20121201 12:22:23' from dual union all
select 2, 'nokia', '20110101 13:20:20' from dual ),
customer as (
select 1 customer_id, 'paul' customer_name from dual union all
select 2, 'stuart' from dual union all
select 3, 'mike' from dual
)
select *
from customer c
join (select order_id,
rank() over (partition by order_id
order by to_date( order_time, 'YYYYMMDD HH24:MI:SS' ) desc ) rnk
from orders) o on (c.customer_id=o.order_id)
where o.rnk = 1
CUSTOMER_ID CUSTOM ORDER_ID RNK
----------- ------ ---------- ----------
1 paul 1 1
2 stuart 2 1
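The "keep only the latest row per id, then join" step can also be sketched in Python to make the ranking easier to follow. This is an illustration under the same sample data, not the answerer's code; the function name `latest_orders` is an assumption, and unlike rank() the sketch keeps a single row when two orders share the same timestamp.

```python
from datetime import datetime


def latest_orders(customers, orders):
    """Join each customer to the most recent order with a matching id."""
    latest = {}  # order_id -> (timestamp, order_name)
    for order_id, name, stamp in orders:
        when = datetime.strptime(stamp, '%Y%m%d %H:%M:%S')
        if order_id not in latest or when > latest[order_id][0]:
            latest[order_id] = (when, name)
    # Inner join: customers without a matching order are dropped.
    return [(cid, cname, latest[cid][1], latest[cid][0])
            for cid, cname in customers if cid in latest]


orders = [(1, 'iphone', '20121201 12:20:23'),
          (1, 'iphone', '20121201 12:22:23'),
          (2, 'nokia', '20110101 13:20:20')]
customers = [(1, 'paul'), (2, 'stuart'), (3, 'mike')]
for row in latest_orders(customers, orders):
    print(row)
```

As in the SQL output, only paul and stuart are returned, because mike has no matching order.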
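Try something like
SELECT *
FROM CUSTOMER c
INNER JOIN ORDERS o -- ORDER is a reserved word, so the table is assumed to be named ORDERS
ON (o.CUSTOMER_ID = c.CUSTOMER_ID)
WHERE TO_DATE(o.ORDER_TIME, 'YYYYMMDD HH24:MI:SS') =
-- the subquery needs its own alias; otherwise o.ORDER_TIME refers to the outer row
(SELECT MAX(TO_DATE(o2.ORDER_TIME, 'YYYYMMDD HH24:MI:SS')) FROM ORDERS o2)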
Share and enjoy.
