How to Model Last Values for Missing Weekend Data - oracle

I'm trying to create a query that will carry data forward to null rows. This is a fairly simple query with analytic functions LAG and LEAD
The query works when there is no partition by window and only one security_id. But, when I add that partition by and a second security_id, the last_value function fails. The bold rows do not appear.
Any ideas?
Thanks!
select t.*
,last_value(weight_pct ignore nulls) over
(partition by security_id order by n desc) value1 from (
select
datekey,portfolio_id,security_id,weight_pct
, n
from (
select to_date(first_date, 'yyyymmdd') -
to_date(datekey, 'yyyymmdd') day
,datekey
,portfolio_id
,security_id
,weight_pct
from ((select FIRST_VALUE(datekey) OVER(ORDER BY datekey desc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as first_date,
datekey,
portfolio_id,
security_id,
WEIGHT_PCT
from foo d
where security_id in (7,658900)
and portfolio_id = 2))) d
right outer join (select level as n
from dual
connect by level <= 95) l
on d.days = l.n) t
order by n, security_id, datekey desc

Because you have set an IGNORE NULLS On line 2?

Related

In case second highest value is missing then use the highest value of salary from employee table

I have an employee table as below. As you can see that second highest salary is 200
Incase the second highest salary is missing then there will be only one row as shown at last . In this case the query should fetch only 100
I have written query as but it is not working. Please help! Thanks
select salary "SecondHighestSalary" from(
(select id,salary,rank() over(order by salary desc) rnk
from employee2)
)a
where (rnk) in coalesce(2,1)
I have also tried the following but it is fetching 2 rows but i need only 1
It sounds like you'd want something like
with ranked_emp as (
select e.*,
rank() over (order by e.sal desc) rnk,
count(*) over () cnt
from employee2 e
)
select salary "SecondHighestSalary"
from ranked_emp
where (rnk = 2 and cnt > 1)
or (rnk = 1 and cnt = 1)
Note that I'm still using rank since you're using that in your approach and you don't specify how you want to handle ties. If there are two employees with the same top salary, rank will assign both a rnk of 1 and no employee would have a rnk of 2 so the query wouldn't return any data. dense_rank would ensure that there was at least one employee with a rnk of 2 if there were employees with at least 2 different salaries. If there are two employees with the same top salary, row_number would arbitrarily assign one the rnk of 2. The query I posted isn't trying to handle those duplicate situations because you haven't outlined exactly how you'd want it to behave in each instance.
If you are in Oracle 12.2 or higher, you can try:
select distinct id,
nvl
(
nth_value(salary, 2) from first
over(partition by id
order by salary desc
range between unbounded preceding and unbounded
following),
salary
) second_max_salary
from employee2

How to select data from multiple row into one row with multiple column dynamically?

I am trying to select multiple rows of data into one row through multiple columns which will change dynamically.
This is in Oracle database. I want to count repeated work done by the LEAD_TECHNISIAN_ID within a duration. If the difference of last work delivery date and new work receive date is 15 or below 15 then LEAD_TECHNISIAN_ID has one repeated work.
List item
SELECT *
FROM (WITH CTE AS (
SELECT ROW_NUMBER () OVER (ORDER BY ID) AS RW,
RECEIVED_DATE,
DELIVERY_DATE,
SERVICE_NO,
LEAD_TECHNISIAN_ID,
ID,
SERVICE_CENTER
FROM ( SELECT cc.SERVICE_CENTER,
CC.ID,
CC.BARCODE,
TRUNC (cc.CREATED_DATE) RECEIVED_DATE,
TRUNC (CC.DELIVERY_DATE) DELIVERY_DATE,
cc.SERVICE_NO,
CC.LEAD_TECHNISIAN_ID
FROM customer_complains cc
WHERE cc.BARCODE IN (SELECT BARCODE
FROM (SELECT BARCODE,
COUNT (BARCODE)
FROM customer_complains c
WHERE c.BARCODE <> 'UNDEFINE'
AND C.BARCODE = NVL ('351950102757821', BARCODE)
AND c.SEGMENT3 = NVL ('',c.SEGMENT3)
AND c.SEGMENT3 IN (SELECT SEGMENT3
FROM ITEM_MST
WHERE PRODUCT_GROUP = NVL ('',PRODUCT_GROUP))
GROUP BY c.BARCODE
HAVING COUNT (c.BARCODE) >1))
ORDER BY ID DESC)
ORDER BY ID DESC)
SELECT a.id,
a.DELIVERY_DATE,
a.RECEIVED_DATE,
b.RECEIVED_DATE PRE_RCV,
b.DELIVERY_DATE PRE_DEL,
(a.RECEIVED_DATE - b.DELIVERY_DATE) AS DIFF,
a.SERVICE_NO,
a.LEAD_TECHNISIAN_ID,
b.LEAD_TECHNISIAN_ID PRE_TECH --, a.DELIVERY_DATE
FROM CTE a
LEFT JOIN CTE b ON a.RW = b.RW + 1
)
WHERE DIFF <= 15
Here is the output for a specific barcode. but when I try for All the barcode I have in My Customer_complains table. The query provides irrelevant output.
Currently your code is giving numbers 1,2,3,4... to rows irrespective of LEAD_TECHNISIAN_ID and then you are joining it with RW. It will not consider LEAD_TECHNISIAN_ID while giving row numbers.
RW must start with 1 for each LEAD_TECHNISIAN_ID.
You just need to change calculation of RW as following:
ROW_NUMBER () OVER (PARTITION BY LEAD_TECHNISIAN_ID ORDER BY ID) AS RW
Cheers!!

selecting records from hive table where there are duplicates with a given criteria

Table 1 has duplicate entries in column A with same frequency values. I need to select one random record out of those .If the duplicate entry contain 'unknown' as a column B value ( like in record "d") select one from other rows . I need a select statement which satisfy the above . Thanks .
These conditions can be prioritized using a case expression in order by with a function like row_number.
select A,B,frequency,timekey
from (select t.*
,row_number() over(partition by A order by cast((B = 'unknown') as int), B) as rnum
from tbl t
) t
where rnum = 1
Here for each group of A rows, we prioritize rows other than B = 'unknown' first, and then in the order of B values.
Use row_number analytic function. If you want to select not unknown record first, then use the query below:
select A, B, Frequency, timekey
from
(select
A, B, Frequency, timekey,
row_number() over(partition by A,Frequency order by case when B='unknown' then 1 else 0 end) rn
)s where rn=1
And if you want to select unknown if they exist, use this row_number in the query above:
row_number() over(partition by A,Frequency order by case when B='unknown' then 0 else 1 end) rn

RANK OVER function in Hive

I'm trying to run this query in Hive to return only the top 10 url which appear more often in the adimpression table.
select
ranked_mytable.url,
ranked_mytable.cnt
from
( select iq.url, iq.cnt, rank() over (partition by iq.url order by iq.cnt desc) rnk
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url
) iq
) ranked_mytable
where
ranked_mytable.rnk <= 10
order by
ranked_mytable.url,
ranked_mytable.rnk desc
;
Unfortunately I get an error message stating:
FAILED: SemanticException [Error 10002]: Line 26:23 Invalid column reference 'rnk'
I've tried to debug it and until the ranked_mytable sub-queries everything goes smooth. I've tried to comment the where ranked_mytable.rnk <= 10 clause but the error message keep appearing.
Hive is unable to order by a column that is not in the "output" of a select statement. To fix it, just include that column in the selected columns:
select
ranked_mytable.url,
ranked_mytable.cnt,
ranked_mytable.rnk
from
( select iq.url, iq.cnt, rank() over (partition by iq.url order by iq.cnt desc) rnk
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url
) iq
) ranked_mytable
where
ranked_mytable.rnk <= 10
order by
ranked_mytable.url,
ranked_mytable.rnk desc
;
If you don't want that 'rnk' column in your final output, I expect you could wrap that whole thing in another inner-query and just select out the 'url' and 'cnt' fields.
RANK OVER is not the best function to achieve this goal.
A better solution would be to use a combination of SORT BY and LIMIT. It's true in fact LIMIT picks randomly the rows in a table, but this might be avoided if used with the SORT BY function. From the Apache Wiki:
-- Top k queries. The following query returns the top 5 sales records wrt amount.
SET mapred.reduce.tasks = 1 SELECT * FROM sales SORT BY amount
DESC LIMIT 5
The query can be re-written in this way:
select
iq.url,
iq.cnt
from
( select url, count(*) cnt
from store.adimpression ai
inner join zuppa.adgroupcreativesubscription agcs
on agcs.id = ai.adgroupcreativesubscriptionid
inner join zuppa.adgroup ag
on ag.id = agcs.adgroupid
where ai.datehour >= '2014-05-15 00:00:00'
and ag.siteid = 1240
group by url ) iq
sort by
iq.cnt desc
limit
10
;
Remove the partition by iq.url clause from rank over() and re-run query.
Thanks & Regards,
Kamleshkumar Gujarathi
Put as before the rnk variable. It should work fine.

Taking the record with the max date

Let's assume I extract some set of data.
i.e.
SELECT A, date
FROM table
I want just the record with the max date (for each value of A). I could write
SELECT A, col_date
FROM TABLENAME t_ext
WHERE col_date = (SELECT MAX (col_date)
FROM TABLENAME t_in
WHERE t_in.A = t_ext.A)
But my query is really long... is there a more compact way using ANALYTIC FUNCTION to do the same?
The analytic function approach would look something like
SELECT a, some_date_column
FROM (SELECT a,
some_date_column,
rank() over (partition by a order by some_date_column desc) rnk
FROM tablename)
WHERE rnk = 1
Note that depending on how you want to handle ties (or whether ties are possible in your data model), you may want to use either the ROW_NUMBER or the DENSE_RANK analytic function rather than RANK.
If date and col_date are the same columns you should simply do:
SELECT A, MAX(date) FROM t GROUP BY A
Why not use:
WITH x AS ( SELECT A, MAX(col_date) m FROM TABLENAME GROUP BY A )
SELECT t.A, t.date FROM TABLENAME t JOIN x ON x.A = t.A AND x.m = t.col_date
Otherwise:
SELECT A, FIRST_VALUE(date) KEEP(dense_rank FIRST ORDER BY col_date DESC)
FROM TABLENAME
GROUP BY A
You could also use:
SELECT t.*
FROM
TABLENAME t
JOIN
( SELECT A, MAX(col_date) AS col_date
FROM TABLENAME
GROUP BY A
) m
ON m.A = t.A
AND m.col_date = t.col_date
A is the key, max(date) is the value, we might simplify the query as below:
SELECT distinct A, max(date) over (partition by A)
FROM TABLENAME
Justin Cave answer is the best, but if you want antoher option, try this:
select A,col_date
from (select A,col_date
from tablename
order by col_date desc)
where rownum<2
Since Oracle 12C, you can fetch a specific number of rows with FETCH FIRST ROW ONLY.
In your case this implies an ORDER BY, so the performance should be considered.
SELECT A, col_date
FROM TABLENAME t_ext
ORDER BY col_date DESC NULLS LAST
FETCH FIRST 1 ROW ONLY;
The NULLS LAST is just in case you may have null values in your field.
SELECT mu_file, mudate
FROM flightdata t_ext
WHERE mudate = (SELECT MAX (mudate)
FROM flightdata where mudate < sysdate)

Resources