I have a fairly big table (about 300 GB) like this:
event_day event_hour customer_id initial_bal final_bal topups debts
01/01 00 11111 0 50 60 10
01/01 01 11111 50 80 45 15
01/01 02 11111 80 30 0 50
...
I want to summarize it into whole days, e.g.:
event_day customer_id initial_bal final_bal topups debts
01/01 11111 0 30 105 75
...
But I'm having trouble with the analytic functions. I'm working with something like:
select *
from
(
select
event_day, customer_id, initial_bal, final_bal,
sum(topups) over (partition by event_day, customer_id order by event_day, customer_id asc) topups,
row_number() over (partition by event_day, customer_id order by event_day, customer_id asc) as initial_bal,
row_number() over (partition by event_day, customer_id order by event_day, customer_id desc) as final_bal
from MY_300GB_TABLE t
)
where initial_bal = 1 or final_bal = 1
order by customer_id, event_day
Which isn't doing what I expected. Could someone lend a hand?
I'm trying to avoid joins, sub-queries and such. I've simplified things here, but the actual project involves a few big tables and performance might be an issue. I'm using Oracle 12c.
Thanks!
This is a good occasion to aggregate with the FIRST (or LAST) option, i.e. keep (dense_rank first/last order by ...):
select event_day, customer_id,
max(initial_bal) keep (dense_rank first order by event_hour) initial_bal,
max(final_bal) keep (dense_rank last order by event_hour) final_bal,
sum(topups) topups, sum(debts) debts
from tla_t_balance_summary t
group by event_day, customer_id;
dbfiddle demo
Your query works too, but the order by inside row_number() is wrong: it has to order by event_hour. It also needs an additional aggregation, because the filter leaves up to two rows per customer and day:
select event_day, customer_id, max(topups), max(debts),
min(case rib when 1 then initial_bal end) ib, min(case rfb when 1 then final_bal end) fb
from (
select event_day, customer_id, initial_bal, final_bal,
sum(topups) over (partition by event_day, customer_id) topups,
sum(debts) over (partition by event_day, customer_id) debts,
row_number() over (partition by event_day, customer_id order by event_hour) as rib,
row_number() over (partition by event_day, customer_id order by event_hour desc) as rfb
from tla_t_balance_summary t)
where rib = 1 or rfb = 1
group by customer_id, event_day;
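To sanity-check the row_number() variant against the three sample rows from the question without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (for window functions), and the table name balance_summary is made up for the example. SQLite has no keep (dense_rank first ...) clause, so only the second query is reproduced.

```python
import sqlite3

# In-memory stand-in for the real table; "balance_summary" is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table balance_summary (
        event_day text, event_hour text, customer_id integer,
        initial_bal integer, final_bal integer, topups integer, debts integer)
""")
# The three hourly rows shown in the question.
conn.executemany(
    "insert into balance_summary values (?, ?, ?, ?, ?, ?, ?)",
    [("01/01", "00", 11111, 0, 50, 60, 10),
     ("01/01", "01", 11111, 50, 80, 45, 15),
     ("01/01", "02", 11111, 80, 30, 0, 50)],
)

daily = conn.execute("""
    select event_day, customer_id,
           min(case rib when 1 then initial_bal end) as initial_bal,
           min(case rfb when 1 then final_bal end)   as final_bal,
           max(topups) as topups, max(debts) as debts
    from (
        select event_day, customer_id, initial_bal, final_bal,
               sum(topups) over (partition by event_day, customer_id) as topups,
               sum(debts)  over (partition by event_day, customer_id) as debts,
               -- rib = 1 marks the first hour of the day, rfb = 1 the last one
               row_number() over (partition by event_day, customer_id
                                  order by event_hour) as rib,
               row_number() over (partition by event_day, customer_id
                                  order by event_hour desc) as rfb
        from balance_summary)
    where rib = 1 or rfb = 1
    group by event_day, customer_id
""").fetchall()

print(daily)  # [('01/01', 11111, 0, 30, 105, 75)]
```

The single output row matches the daily summary the question asks for (initial 0, final 30, topups 105, debts 75).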
If you are looking for the first and last entry for each day based on hour, your row_number function should reflect that, with something like:
select *
from
(
select
event_day, customer_id, initial_bal, final_bal,
sum(recharge_amount) over (partition by event_day, customer_id) topups,
row_number() over (partition by event_day, customer_id order by event_hour asc) as initial_hr,
row_number() over (partition by event_day, customer_id order by event_hour desc) as final_hr
from TLA_T_BALANCE_SUMMARY t
)
where initial_hr = 1 or final_hr = 1
order by customer_id, event_day
It's hard to comment exactly because your query doesn't really match the data in terms of columns etc.
I think you will need to use GROUP BY together with an analytic function, as follows:
SELECT
EVENT_DAY,
CUSTOMER_ID,
MAX(INITIAL_BAL) AS INITIAL_BAL,
MAX(FINAL_BAL) AS FINAL_BAL,
SUM(TOPUPS) AS TOPUPS,
SUM(DEBTS) AS DEBTS
FROM
(
SELECT
EVENT_DAY,
CUSTOMER_ID,
FIRST_VALUE(INITIAL_BAL) OVER(
PARTITION BY EVENT_DAY, CUSTOMER_ID
ORDER BY
EVENT_HOUR
) AS INITIAL_BAL,
FIRST_VALUE(FINAL_BAL) OVER(
PARTITION BY EVENT_DAY, CUSTOMER_ID
ORDER BY
EVENT_HOUR DESC
) AS FINAL_BAL,
TOPUPS,
DEBTS
FROM
TLA_T_BALANCE_SUMMARY T
-- No WHERE filter here: INITIAL_BAL = 1 would compare the stored balances, not a row number.
)
GROUP BY
EVENT_DAY,
CUSTOMER_ID
ORDER BY
CUSTOMER_ID,
EVENT_DAY;
Cheers!!
Related
I have this query
SELECT CASE WHEN LAG(emp_id) OVER( ORDER BY NULL ) = emp_id THEN '-'
ELSE emp_id END "Employee ID",
row_number() over (partition by emp_id order by emp_id) as "S/N",
family_mem_id "MemID",
CASE WHEN LAG(emp_id) OVER(ORDER BY NULL ) = emp_id THEN 0
ELSE (SUM(amount_paid) OVER(PARTITION BY emp_id)) END "Total amount"
FROM Employee
ORDER BY emp_id;
And it shows me a result like this: Resultset
I want to add a row number column (SN, as the first column) per Employee ID, and leave it null on the rows in between. For example, under an Employee ID, the row with S/N 2 (MemID F904) should have a null SN. How can I do that?
You can use the outer query to create the SN column as follows:
SELECT CASE WHEN "S/N" = 1 THEN '<1>' END AS SN, T.* FROM -- added this
(SELECT
CASE WHEN LAG(emp_id) OVER( ORDER BY NULL ) = emp_id THEN '-'
ELSE emp_id END "Employee ID",
row_number() over (partition by emp_id order by emp_id) as "S/N",
family_mem_id "MemID",
CASE WHEN LAG(emp_id) OVER(ORDER BY NULL ) = emp_id THEN 0
ELSE (SUM(amount_paid) OVER(PARTITION BY emp_id)) END "Total amount"
FROM Employee ) T
ORDER BY "Employee ID", "S/N"; -- Added this
Note that I have also changed the ORDER BY clause (which is moved to the outer query).
Cheers!!
Use dense_rank() to enumerate rows:
select case lag(emp_id) over( order by emp_id ) when emp_id then null
else '<'||dense_rank() over (order by emp_id)||'>'
end sn,
nullif(emp_id, lag(emp_id) over( order by emp_id )) empid,
row_number() over (partition by emp_id order by emp_id) rn,
family_mem_id memid,
case lag(emp_id) over(order by emp_id) when emp_id then null
else (sum(amount_paid) over(partition by emp_id))
end total
FROM Employee order by emp_id;
dbfiddle and sample output:
SN EMPID RN MEMID TOTAL
---- ----- ------ ----- ----------
<1> 101 1 F901 200
2 F904
<2> 102 1 F901 135
2 F901
<3> 103 1 F901 185
2 F901
3 F901
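For a quick check without Oracle, here is a small sketch in Python's sqlite3 (SQLite 3.25+ assumed). It is a variant rather than the exact query above: it keys on rn = 1 in an outer query instead of lag(), and it orders by family_mem_id inside each employee so the result is deterministic. The amount_paid values and the F902 member id are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employee (emp_id integer, family_mem_id text, amount_paid integer)")
# Hypothetical rows: only the per-employee totals (200, 135) mirror the sample output.
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [(101, "F901", 120), (101, "F904", 80),
     (102, "F901", 100), (102, "F902", 35)],
)

rows = conn.execute("""
    select case when rn = 1 then '<' || emp_rank || '>' end as sn,
           case when rn = 1 then emp_id end                 as empid,
           rn,
           family_mem_id                                    as memid,
           case when rn = 1 then total end                  as total
    from (
        select emp_id, family_mem_id,
               dense_rank() over (order by emp_id)          as emp_rank,
               row_number() over (partition by emp_id
                                  order by family_mem_id)   as rn,
               sum(amount_paid) over (partition by emp_id)  as total
        from employee)
    order by emp_id, rn
""").fetchall()

for row in rows:
    print(row)
```

Only the first row of each employee carries SN, the employee id and the total; the other rows get nulls in those columns.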
I'm trying to fetch the top n and bottom n rows. The query below gives me the result, but it takes a lot of time. I believe it scans the table twice.
Code used:
WITH T1 AS
(SELECT * FROM
(SELECT
Column1,
Column2,
Column3
FROM TABLE
ORDER BY DESC
)
WHERE ROWNUM<=5),
T2 AS
(SELECT * FROM
(SELECT
Column1,
Column2,
Column3
FROM TABLE
ORDER BY ASC
)
WHERE ROWNUM<=5)
SELECT * FROM T1
UNION ALL
SELECT * FROM T2
How can I fetch this faster, considering that the tables are updated regularly?
The best way to solve this problem depends in part on your Oracle version. Here is a very simple (and, I suspect, very efficient) solution using the match_recognize clause, added in version 12.1.
I illustrate it using the EMPLOYEES table in the standard HR schema, ordering by SALARY. The only trick here is to select the top and bottom five rows, and to ignore everything in between; that (the "ignoring") is what the {- ... -} operator does in the pattern sub-clause.
select employee_id, first_name, last_name, salary
from hr.employees
match_recognize(
order by salary desc
all rows per match
pattern ( a{5} {- a* -} a{5} )
define a as 0 = 0 -- For reasons known only to Oracle, DEFINE is required.
);
EMPLOYEE_ID FIRST_NAME LAST_NAME SALARY
----------- -------------------- ------------------------- ----------
100 Steven King 24000
101 Neena Kochhar 17000
102 Lex De Haan 17000
145 John Russell 14000
146 Karen Partners 13500
135 Ki Gee 2400
127 James Landry 2400
136 Hazel Philtanker 2200
128 Steven Markle 2200
132 TJ Olson 2100
You can combine into a single query and a single pass over the table using analytic functions, generating two pseudocolumns in this case:
select column1, column2, column3,
row_number() over (order by column1 desc) rn_desc,
row_number() over (order by column1 asc) rn_asc
from your_table;
and then filtering using that query as an inline view (or CTE):
select column1, column2, column3
from (
select column1, column2, column3,
row_number() over (order by column1 desc) as rn_desc,
row_number() over (order by column1 asc) as rn_asc
from your_table
)
where rn_desc <=5
or rn_asc <= 5;
I've assumed your ordering is on column1, and picked your_table as a table name as you didn't include that either, so change as appropriate. Depending on how you want to handle ties, you might want to use the rank() or dense_rank() functions instead.
From @mathguy's comment, this may well perform better:
select column1, column2, column3
from (
select column1, column2, column3,
row_number() over (order by column1 desc) as rn,
count(*) over () as cnt
from your_table
)
where rn <=5
or cnt - rn < 5;
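As a rough illustration of the rn / cnt filter, here is a sketch that runs the same shape of query through Python's sqlite3 (SQLite 3.25+ assumed). The twelve values and n = 3 are made up, and it says nothing about Oracle performance; it only shows that one pass with row_number() and count(*) over () returns both ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (column1 integer)")
# Hypothetical data: the values 1..12.
conn.executemany("insert into your_table values (?)", [(v,) for v in range(1, 13)])

n = 3  # how many rows to keep from each end (assumption for the demo)
rows = conn.execute("""
    select column1
    from (
        select column1,
               row_number() over (order by column1 desc) as rn,
               count(*) over ()                          as cnt
        from your_table)
    where rn <= ?          -- top n
       or cnt - rn < ?     -- bottom n
    order by rn
""", (n, n)).fetchall()

top_and_bottom = [r[0] for r in rows]
print(top_and_bottom)  # [12, 11, 10, 3, 2, 1]
```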
I have a table x_visa. I want to delete the duplicate rows from this table.
The query I am using for this is :
select * from (SELECT x_visa.*,
ROW_NUMBER() over (partition by effective_start_date, effective_end_date, person_id,
business_group_id, legislation_code , current_visa_permit, visa_permit_type, visa_permit_id, configuration_id
order by person_id) AS rn
from x_visa) T
WHERE rn > 1;
The delete statement below gives the error ORA-01752: cannot delete from view without exactly one key-preserved table
delete from
(select * from (SELECT x_visa.*,
ROW_NUMBER() over (partition by effective_start_date, effective_end_date, person_id,
business_group_id, legislation_code , current_visa_permit, visa_permit_type, visa_permit_id, configuration_id
order by person_id) AS rn
from x_visa) T
WHERE rn > 1 );
Is there a workaround to delete the duplicate data from this table ?
Each row has a rowid identifier, so you can delete the rows whose rowid is returned by your query.
delete from x_visa where rowid in (/*YOUR QUERY*/);
So we have:
delete from x_visa where rowid in (select r from (SELECT x_visa.rowid r, x_visa.*,
ROW_NUMBER() over (partition by effective_start_date, effective_end_date, person_id,
business_group_id, legislation_code , current_visa_permit, visa_permit_type, visa_permit_id, configuration_id
order by person_id) AS rn
from x_visa) T
WHERE rn > 1 );
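The same delete-by-rowid idea can be tried out in SQLite through Python's sqlite3 module (SQLite 3.25+ assumed). This is only a sketch: the table is cut down to two columns, the rows are invented, and SQLite's rowid is an integer key rather than Oracle's physical ROWID, although both identify a single row of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical version of x_visa with just two of the columns.
conn.execute("create table x_visa (person_id integer, visa_permit_id text)")
conn.executemany(
    "insert into x_visa values (?, ?)",
    [(1, "A"), (1, "A"), (2, "B"), (1, "A")],
)

# Number the rows inside each group of identical values and delete every row
# except the first one (rn = 1) of its group.
cur = conn.execute("""
    delete from x_visa
    where rowid in (
        select rid
        from (
            select rowid as rid,
                   row_number() over (partition by person_id, visa_permit_id
                                      order by rowid) as rn
            from x_visa)
        where rn > 1)
""")
deleted = cur.rowcount

remaining = conn.execute(
    "select person_id, visa_permit_id from x_visa order by rowid").fetchall()
print(deleted, remaining)  # 2 [(1, 'A'), (2, 'B')]
```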
I have a statement that collapses date ranges. When I execute the SQL by itself I get the properly collapsed result, but when I insert its output into a nested table inside a procedure, I get an extra row that should have been collapsed into the other row.
SELECT client_pk,
plan_id,
grp,
MIN(start_dt) start_dt,
MAX(end_dt) end_dt
FROM (
SELECT client_pk,
plan_id,
start_dt,
end_dt,
MAX(grp) OVER (PARTITION BY plan_id ORDER BY start_dt ASC) grp
FROM (
SELECT mp.client_pk,
mp.plan_id,
CASE
WHEN (LAG(mp.end_dt) OVER (PARTITION BY mp.plan_id ORDER BY mp.start_dt ASC)) BETWEEN mp.start_dt-1 AND
NVL(mp.end_dt,to_date('12/31/9999','MM/DD/YYYY'))
THEN NULL
ELSE ROWNUM
END grp,
mp.start_dt,
NVL(mp.end_dt,to_date('12/31/9999','MM/DD/YYYY')) end_dt
FROM client_plan mp
)
)
GROUP BY grp, plan_id, client_pk
So the innermost query gives an initial result set, with GRP taken from ROWNUM, of:
client_pk PLAN_ID GRP start_dt end_dt
8752 25171 3 1/1/2016 3/31/2016
8752 25171 1 2/1/2016 1/31/2016
and by the end it is collapsed appropriately when executed as a standalone query:
client_pk PLAN_ID GRP start_dt end_dt
8752 25171 3 1/1/2016 3/31/2016
But when run through a procedure that dumps these records into a nested table, which will eventually be inserted into the DB, both rows are still returned.
SELECT plan_spans_obj(client_pk, plan_id, start_dt, end_dt)
BULK COLLECT INTO plan_spans_ins_tbl
FROM (
SELECT client_pk,
plan_id,
start_dt,
end_dt
FROM ( SELECT client_pk,
plan_id,
grp,
MIN(start_dt) start_dt,
MAX(end_dt) end_dt
FROM (
SELECT client_pk,
plan_id,
start_dt,
end_dt,
MAX(grp) OVER (PARTITION BY plan_id ORDER BY start_dt ASC) grp
FROM (
SELECT mp.client_pk,
mp.plan_id,
CASE
WHEN (LAG(mp.end_dt) OVER (PARTITION BY mp.plan_id ORDER BY mp.start_dt ASC)) BETWEEN mp.start_dt-1 AND
NVL(mp.end_dt,to_date('12/31/9999','MM/DD/YYYY'))
THEN NULL
ELSE ROWNUM
END grp,
mp.start_dt,
NVL(mp.end_dt,to_date('12/31/9999','MM/DD/YYYY')) end_dt
FROM client_plan mp
)
)
GROUP BY grp, plan_id, client_pk
)
);
So how am I getting two different results from the same query, just executed differently? Is the order of operations different depending on where it is executed?
Also, the extra record is essentially a negative time span, end date being before the start date but this is handled in the query.
The answer was that I didn't initialize the collection. Can't believe it was something like that but apparently so. Seems to be working now.
I have two tables as follows--
ORDERS
create table orders (
ono number(5) not null primary key,
cno number(5) references customers,
eno number(4) references employees,
received date,
shipped date);
ODETAILS
create table odetails (
ono number(5) not null references orders,
pno number(5) not null references parts,
qty integer check(qty > 0),
primary key (ono,pno));
ODETAILS Table
Now I'm trying to figure out the highest and lowest selling product. Logically, PNO 10601, which has the highest QTY of 4, is the highest selling product. The following query returns the highest selling product.
SELECT PNO FROM
(SELECT od.PNO, SUM(od.QTY) AS TOTAL_QTY
FROM ODETAILS od
GROUP BY od.PNO
ORDER BY SUM(od.QTY) DESC)
WHERE ROWNUM =1
--Thanks to Bob Jarvis
How do I add a date condition to the WHERE clause of the SQL above so that I can find the highest selling product for a given month (let's say December)? The date I'm referring to is the RECEIVED column of the ORDERS table. I guess I need to join the tables first as well.
SQL Fiddle
Oracle 11g R2 Schema Setup:
create table orders (
ono number(5) not null primary key,
cno number(5),
eno number(4),
received date,
shipped date
);
INSERT INTO orders
SELECT 1020, 1, 1, DATE '2015-12-21', NULL FROM DUAL UNION ALL
SELECT 1021, 1, 1, DATE '2015-12-20', DATE '2015-12-20' FROM DUAL UNION ALL
SELECT 1022, 1, 1, DATE '2015-12-18', DATE '2015-12-20' FROM DUAL UNION ALL
SELECT 1023, 1, 1, DATE '2015-12-21', NULL FROM DUAL UNION ALL
SELECT 1024, 1, 1, DATE '2015-12-20', DATE '2015-12-20' FROM DUAL;
create table odetails (
ono number(5) not null references orders(ono),
pno number(5) not null,
qty integer check(qty > 0),
primary key (ono,pno)
);
INSERT INTO odetails
SELECT 1020, 10506, 1 FROM DUAL UNION ALL
SELECT 1020, 10507, 1 FROM DUAL UNION ALL
SELECT 1020, 10508, 2 FROM DUAL UNION ALL
SELECT 1020, 10509, 3 FROM DUAL UNION ALL
SELECT 1021, 10601, 4 FROM DUAL UNION ALL
SELECT 1022, 10601, 1 FROM DUAL UNION ALL
SELECT 1022, 10701, 1 FROM DUAL UNION ALL
SELECT 1023, 10800, 1 FROM DUAL UNION ALL
SELECT 1024, 10900, 1 FROM DUAL;
Query 1 - The ono and pno values for the pno which has sold the maximum total quantity in December 2015:
SELECT ono,
pno,
TOTAL_QTY
FROM (
SELECT q.*,
RANK() OVER ( ORDER BY TOTAL_QTY DESC ) AS rnk
FROM (
SELECT od.ono,
od.PNO,
SUM( od.QTY ) OVER ( PARTITION BY od.PNO ) AS TOTAL_QTY
FROM ODETAILS od
INNER JOIN
orders o
ON ( o.ono = od.ono )
WHERE TRUNC( o.received, 'MM' ) = DATE '2015-12-01'
-- WHERE EXTRACT( MONTH FROM o.received ) = 12
) q
)
WHERE rnk = 1
Swap the WHERE clause for the commented-out EXTRACT version to get the results for any December rather than just December 2015.
Results:
| ONO | PNO | TOTAL_QTY |
|------|-------|-----------|
| 1021 | 10601 | 5 |
| 1022 | 10601 | 5 |
Query 2 - The ono and pnos for the items which have sold the maximum quantity in a single order in December 2015:
SELECT ono,
pno,
qty
FROM (
SELECT od.*,
RANK() OVER ( ORDER BY od.qty DESC ) AS qty_rank
FROM ODETAILS od
INNER JOIN
orders o
ON ( o.ono = od.ono )
WHERE TRUNC( o.received, 'MM' ) = DATE '2015-12-01'
-- WHERE EXTRACT( MONTH FROM o.received ) = 12
)
WHERE qty_rank = 1
Swap the WHERE clause for the commented-out EXTRACT version to get the results for any December rather than just December 2015.
Results:
| ONO | PNO | QTY |
|------|-------|-----|
| 1021 | 10601 | 4 |
... where received between to_date('12/01/2015','MM/DD/YYYY') and to_date('12/31/2015','MM/DD/YYYY')
I believe I have solved it!
SELECT PNO
FROM (SELECT OD.PNO, SUM(OD.QTY) AS TOTAL_QTY
FROM ODETAILS OD INNER JOIN ORDERS ON OD.ONO = ORDERS.ONO
WHERE EXTRACT(MONTH FROM ORDERS.RECEIVED) = &MONTH_NUMBER
GROUP BY OD.PNO
ORDER BY SUM(OD.QTY) DESC)
WHERE ROWNUM =1;
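For anyone who wants to try the join-plus-month-filter idea without Oracle, here is a sketch in Python's sqlite3 using the fiddle data from the earlier answer. A few things are assumptions: the orders table is trimmed to ono and received, dates are stored as ISO text, order 1019 in November is invented to show that the filter matters, and SQLite's limit 1 stands in for the ROWNUM = 1 filter. The month is selected with a half-open date range instead of EXTRACT, so a single year's December is picked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (ono integer primary key, received text)")
conn.execute("create table odetails (ono integer, pno integer, qty integer)")
conn.executemany("insert into orders values (?, ?)", [
    (1019, "2015-11-30"),  # hypothetical November order, not in the fiddle
    (1020, "2015-12-21"), (1021, "2015-12-20"), (1022, "2015-12-18"),
    (1023, "2015-12-21"), (1024, "2015-12-20"),
])
conn.executemany("insert into odetails values (?, ?, ?)", [
    (1019, 10900, 9),      # hypothetical detail row for the invented order
    (1020, 10506, 1), (1020, 10507, 1), (1020, 10508, 2), (1020, 10509, 3),
    (1021, 10601, 4), (1022, 10601, 1), (1022, 10701, 1),
    (1023, 10800, 1), (1024, 10900, 1),
])

def top_product(first_day, next_month_first_day):
    """Return (pno, total_qty) of the best seller received in [first_day, next_month_first_day)."""
    return conn.execute("""
        select od.pno, sum(od.qty) as total_qty
        from odetails od
        join orders o on o.ono = od.ono
        where o.received >= ? and o.received < ?
        group by od.pno
        order by total_qty desc
        limit 1
    """, (first_day, next_month_first_day)).fetchone()

december = top_product("2015-12-01", "2016-01-01")
print(december)  # (10601, 5)
```

Note that limit 1 (like ROWNUM = 1) returns a single row even when two products tie for first place; the RANK() approach from the other answer keeps the ties.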
You can add some to_char calls to your query on the date columns to parse out year and month, or just month if you want all years divided by month (month and year seems more useful), then add that to your where clause. See my self-contained example:
with odetails as
(
select 1 as ono, 1 as pno, 4 as qty from dual
union all
select 1 as ono, 2 as pno, 1 as qty from dual
union all
select 1 as ono, 3 as pno, 2 as qty from dual
union all
select 1 as ono, 4 as pno, 1 as qty from dual
union all
select 2 as ono, 2 as pno, 1 as qty from dual
union all
select 2 as ono, 3 as pno, 2 as qty from dual
),
orders as
(
select 1 as ono, 1 as cno, 1 as eno, to_date('2015-10-12', 'YYYY-MM-DD') as received, to_date('2015-10-15', 'YYYY-MM-DD') as shipped from dual
union all
select 2 as ono, 1 as cno, 1 as eno, to_date('2015-11-12', 'YYYY-MM-DD') as received, to_date('2015-11-15', 'YYYY-MM-DD') as shipped from dual
)
select pno
from
(
select od.pno, Sum(od.qty) as total_qty, to_char(received, 'YYYY-MM') as year_month
from odetails od
join orders o
on o.ono = od.ono
group by od.pno, to_char(received, 'YYYY-MM')
order by Sum(od.qty) desc
)
where rownum = 1
and year_month = '2015-11'
;
This gives you a PNO of 3, since it has the highest quantity in November 2015.