Min(), Max() within date subset - oracle

Not sure if the title fits, but here's my problem:
I have the following table:
create table OpenTrades(
AccountNumber number,
SnapshotTime date,
Ticket number,
OpenTime date,
TradeType varchar2(4),
TradeSize number,
TradeItem char(6),
OpenPrice number,
CurrentAsk number,
CurrentBid number,
TradeSL number,
TradeTP number,
TradeSwap number,
TradeProfit number
);
alter table OpenTrades add constraint OpenTrades_PK Primary Key (AccountNumber, SnapshotTime, Ticket) using index tablespace MyNNIdx;
For every (SnapshotTime, account), I want to select min(OpenPrice) and max(OpenPrice) in such a way that the resulting min and max cover only the past, up to and including that SnapshotTime.
For instance, for any possible (account, tradeitem) pair, I may have 10 records with, say, Snapshottime=10-jun and openprice between 0.9 and 2.0, as well as 10 more records with SnapshotTime=11-jun and openprice between 1.0 and 2.1, as well as 10 more records with SnapshotTime=12-jun and openprice between 0.7 and 1.9.
In such scenario, the sought query should return something like this:
AccountNumber SnapshotTime MyMin MyMax
------------- ------------ ----- -----
1234567 10-jun 0.9 2.0
1234567 11-jun 0.9 2.1
1234567 12-jun 0.7 2.1
I've already tried this, but it only returns min() and max() within the same snapshottime:
select accountnumber, snapshottime, tradeitem, min(openprice), max(openprice)
from opentrades
group by accountnumber, snapshottime, tradeitem
Any help would be appreciated.

You can use the analytic versions of min() and max() for this, along with windowing clauses:
select distinct accountnumber, snapshottime, tradeitem,
min(openprice) over (partition by accountnumber, tradeitem
order by snapshottime, openprice
rows between unbounded preceding and current row) as min_openprice,
max(openprice) over (partition by accountnumber, tradeitem
order by snapshottime, openprice desc
rows between unbounded preceding and current row) as max_openprice
from opentrades
order by accountnumber, snapshottime, tradeitem;
ACCOUNTNUMBER SNAPSHOTTIME TRADEITEM MIN_OPENPRICE MAX_OPENPRICE
------------- ------------ --------- ------------- -------------
1234567 10-JUN-14 X .9 2
1234567 11-JUN-14 X .9 2.1
1234567 12-JUN-14 X .7 2.1
SQL Fiddle.
The partition by clause restricts the calculation to rows with the current row's accountnumber and tradeitem, and the rows between clause then limits it to a subset of that partition. Because of the order by, that subset covers every row from earlier snapshots plus the rows of the current snapshot up to the lowest openprice (for min) or the highest (for max, because of the desc), so each row gets the min/max over all snapshots up to and including its own.
The analytic result is calculated for every row. If you run it without the distinct then you see all your base data plus the same min/max for each snapshot (Fiddle). As you don't want any of the varying data you can suppress the duplication with distinct, or by making it a query with a row_number() that you then filter on, etc.
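To make the windowing logic concrete, here is a minimal Python sketch (not Oracle code) of what the query above ends up returning after the distinct: one row per (accountnumber, snapshottime, tradeitem) with the min and max over all snapshots up to and including that one. The function name `running_min_max` and the six sample rows are assumptions made for illustration; the prices are taken from the ranges in the question.

```python
from collections import defaultdict

def running_min_max(rows):
    """rows: iterable of (account, snapshot, item, open_price).

    Returns one tuple per (account, snapshot, item) holding the lowest and
    highest open_price seen in that snapshot or any earlier one.
    """
    # Group prices by partition key (account, item), then by snapshot.
    by_key = defaultdict(lambda: defaultdict(list))
    for account, snapshot, item, price in rows:
        by_key[(account, item)][snapshot].append(price)

    result = []
    for (account, item), snapshots in sorted(by_key.items()):
        lo = hi = None
        # Walk the snapshots in time order, carrying the extremes forward.
        for snapshot in sorted(snapshots):
            prices = snapshots[snapshot]
            lo = min(prices) if lo is None else min(lo, min(prices))
            hi = max(prices) if hi is None else max(hi, max(prices))
            result.append((account, snapshot, item, lo, hi))
    return result

# Hypothetical sample data; ISO date strings sort chronologically.
rows = [
    (1234567, "2014-06-10", "X", 0.9), (1234567, "2014-06-10", "X", 2.0),
    (1234567, "2014-06-11", "X", 1.0), (1234567, "2014-06-11", "X", 2.1),
    (1234567, "2014-06-12", "X", 0.7), (1234567, "2014-06-12", "X", 1.9),
]

for line in running_min_max(rows):
    print(line)
```

The carried-forward `lo` and `hi` play the role of the "unbounded preceding" part of the window: nothing from a later snapshot can influence an earlier row.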

Does this solve your problem?
select ot1.accountnumber, ot1.snapshottime, ot1.tradeitem,
min(ot2.openprice), max(ot2.openprice)
from opentrades ot1, opentrades ot2
where ot2.accountnumber = ot1.accountnumber
and ot2.tradeitem = ot1.tradeitem
and ot2.snapshottime <= ot1.snapshottime
group by ot1.accountnumber, ot1.snapshottime, ot1.tradeitem

Related

Power Pivot - Looking up details with start and end dates?

I need help in the formula correctness of my approach in this PowerPivot table in Excel.
I have two tables: daily timesheet and employee details. The idea is to lookup the department, sub-department, and managers from the employee_details table based on the employee number and shift date in the daily_timesheet table.
Here's a representation of the tables:
daily_timesheet
shift_date | emp_num | scheduled_hours | actual_worked_hrs | dept | sub_dept | mgr
2022-02-28 | 01234 | 7.5 | 7.34 | 16100 | 16181 | 05432
2022-03-15 | 01234 | 7.5 | 7.50 | 16200 | 16231 | 06543
employee_details
emp_num | dept_code | sub_dept_code | mgr_emp_num | start_date | end_date | is_current
01234 | 16000 | 16041 | 04321 | 2022-01-01 | 2022-01-31 | FALSE
01234 | 16100 | 16181 | 05432 | 2022-02-01 | 2022-01-28 | FALSE
01234 | 16200 | 16231 | 06543 | 2022-03-01 | null | TRUE
End dates have null values if it is the employee's current assignment; but it's never null if the is_current field is FALSE.
Here's what I've managed so far:
First, I check whether the start date of the current assignment in employee_details is less than or equal to the shift date.
If true, I use the LOOKUP function to return the department, sub-department, and manager by searching for the employee number and a TRUE value in the is_current field.
Else, I use the MIN function to get the value of those fields, wrapped in a CALCULATE function with a FILTER for each of: (1) emp_num matching the timesheet, (2) is_current being FALSE, (3) start_date less than or equal to the shift_date, and (4) end_date greater than or equal to the shift_date.
The crux of my question is the third step above. I know using the MIN function is incorrect, but I can't find a solution that works.
Here's the formula I've been using to get the dept in the daily_timesheet table from the employee_details table:
=IF(
LOOKUP(employee_details[start_date],
employee_details[emp_num],
daily_timesheet[emp_num],
employee_details[is_current] = TRUE) <= daily_timesheet[shift_date],
LOOKUP(employee_details[dept_code],
employee_details[emp_num],
daily_timesheet[emp_num],
employee_details[is_current] = TRUE),
CALCULATE(MIN(employee_details[dept_code]),
FILTER(employee_details, employee_details[emp_num] = daily_timesheet[emp_num]),
FILTER(employee_details, employee_details[is_current] = FALSE),
FILTER(employee_details, employee_details[start_date] <= daily_timesheet[shift_date]),
FILTER(employee_details, employee_details[end_date] >= daily_timesheet[shift_date]))
)
Any advice please?
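For reference, the rule the formula is trying to express can be written as a single date-range test in which a null end date is treated as open-ended. The Python sketch below only illustrates that rule; it is not DAX and not a tested fix for the formula. The function name `lookup_dept` is hypothetical, and only the first and last employee_details sample rows are used.

```python
from datetime import date

# Two of the sample employee_details rows; None stands for the null end_date.
employee_details = [
    {"emp_num": "01234", "dept_code": "16000",
     "start_date": date(2022, 1, 1), "end_date": date(2022, 1, 31)},
    {"emp_num": "01234", "dept_code": "16200",
     "start_date": date(2022, 3, 1), "end_date": None},
]

def lookup_dept(emp_num, shift_date, details=employee_details):
    """Return the dept_code of the assignment that covers shift_date.

    An assignment covers the date when start_date <= shift_date and the
    end_date is either missing (current assignment) or >= shift_date.
    Returns None when no assignment covers the date.
    """
    matches = [
        row["dept_code"]
        for row in details
        if row["emp_num"] == emp_num
        and row["start_date"] <= shift_date
        and (row["end_date"] is None or shift_date <= row["end_date"])
    ]
    if len(matches) > 1:
        # Overlapping assignments would make the lookup ambiguous.
        raise ValueError("more than one assignment covers this date")
    return matches[0] if matches else None

print(lookup_dept("01234", date(2022, 3, 15)))
print(lookup_dept("01234", date(2022, 1, 15)))
```

Written this way, the is_current branch is no longer needed, and no aggregate such as MIN is involved as long as the date ranges of one employee do not overlap.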

Generate order number based on a column value with reference to other columns

This is a question about Oracle views. I have a table with Emp_id, Start_Period and Key. The sample data is sorted in descending order of Start_Period, with 201909 on top. I need to generate a column named Key_order. (Eventually I plan to create a view with all 4 columns.)
With the sample data as shown, in the list sorted by Start_Period, whatever comes first gets order 1; from then on, whenever the Key changes, the order has to increment by one.
That is, in rows 1 and 2 the key is the same and the order is 1. In row 3 the key changes from SCD to ABC, so the order increments to 2. In row 4 the key changes again and the order becomes 3.
In rows 7 and 8 the value is the same, so the order remains 6 for both. I am trying to do this inside a view. I tried RANK(), but it sorts by the Key column and assigns the order based on that.
Please help. Sample Data
Set a one in each line that has a different key than the line before. Use LAG for this. Then build a running total of these ones with SUM OVER.
select
emp_id, start_period, key,
sum(chg) over (partition by emp_id order by start_period desc) as key_order
from
(
select
emp_id, start_period, key,
case when key = lag(key) over (partition by emp_id order by start_period desc)
then 0 else 1 end as chg
from mytable
)
order by emp_id, start_period desc;
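The same two steps (flag a change against the previous row, then keep a running total of the flags) can be sketched in a few lines of Python for a single employee. The function name `key_order` and the sample keys are made up for illustration; the input is assumed to be already sorted by start_period descending.

```python
def key_order(keys):
    """Number consecutive runs of equal keys: 1 for the first run,
    incremented by one every time the key differs from the previous row."""
    order = 0
    previous = object()  # sentinel that compares unequal to every real key
    result = []
    for key in keys:
        if key != previous:   # the "chg = 1" case in the SQL
            order += 1        # running total of the change flags
        result.append(order)
        previous = key
    return result

# Hypothetical keys for one emp_id, newest start_period first.
print(key_order(["SCD", "SCD", "ABC", "SCD", "XYZ", "XYZ"]))
```

Note that a key that reappears later (SCD in the sample) starts a new run with a new number, which is what distinguishes this from RANK() or DENSE_RANK() over the key.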

Oracle: Update values in table with aggregated values from same table

I am looking for a possibly better approach to this.
I have created a temp table in Oracle 11.2 that I'm using to pre calculate values that I will need in other selects instead of always generating them again with each select.
create global temporary table temp_foo (
DT timestamp(6), --only the date part will be used in this example but for later things I will need the time
Something varchar2(100),
Customer varchar2(100),
MinDate timestamp(6),
MaxDate timestamp(6),
Filecount int,
Errorcount int,
AvgFilecount int,
constraint PK_foo primary key (DT, Customer)
) on commit preserve rows;
I then first insert some fixed values for everything except AvgFilecount. AvgFilecount should contain the average for the Filecount for the 3 previous records (going by the date in DT). It doesn’t matter that the result will be converted to an int, I don’t need the decimal places
DT | Customer | Filecount | AvgFilecount
2019-04-30 | x | 10 | avg(2+3+9)
2019-04-29 | x | 2 | based on values before this
2019-04-28 | x | 3 | based on values before this
2019-04-27 | x | 9 | based on values before this
I thought about using a normal UPDATE statement, as this should be faster than looping through the values. I should mention that there are no gaps in the DT field, but obviously there is a first one for which I won't find any previous records. If I looped through, I could easily calculate AvgFilecount with (the record before previous record/2 + previous record)/3, which I cannot do with UPDATE, as I cannot guarantee the order in which the rows are processed. So I'm fine with just taking the last 3 records (going by DT) and calculating it from there.
What I thought would be an easy update is giving me headaches. I mostly work with SQL Server, where I would just join the 3 other records, but it seems to be a bit different in Oracle. I have found https://stackoverflow.com/a/2446834/4040068 and wanted to use the second approach in the answer.
update
(select curr.DT, curr.temp_foo, curr.Filecount, curr.AvgFilecount as OLD, (coalesce(Minus1.Filecount, 0) + coalesce(Minus2.Filecount, 0) + coalesce(Minus3.Filecount, 0)) / 3 as NEW
from temp_foo curr
left join temp_foo Minus1 ON Minus1.Customer = curr.Customer and trunc(Minus1.DT) = trunc(curr.DT-1)
left join temp_foo Minus2 ON Minus2.Customer = curr.Customer and trunc(Minus2.DT) = trunc(curr.DT-2)
left join temp_foo Minus3 ON Minus3.Customer = curr.Customer and trunc(Minus3.DT) = curr.DT-3
order by 1, 2
)
set OLD = NEW;
Which gives me an
ORA-01779: cannot modify a column which maps to a non key-preserved table
01779. 00000 - "cannot modify a column which maps to a non key-preserved table"
*Cause: An attempt was made to insert or update columns of a join view which
map to a non-key-preserved table.
*Action: Modify the underlying base tables directly.
I thought this should work as both join conditions are in the primary key and thus unique. I am currently implementing the first approach in the above mentioned answer but it is getting quite big and it feels like there should be a better solution to this.
Other things I thought about trying:
using a nested subselect (nested because Oracle doesn’t know top(n) and I need to sort the subselect) to select the previous 3 records ordered by DT, and then the outer select with rownum <= 3, so that I could just use AVG(). However, I was told subselects can be quite slow and joins are better in Oracle performance-wise. I don't know if that is really the case; I haven't done any testing.
Edit: My insert right now looks like this. I am already aggregating the Filecount for a day as there can be multiple records per DT per Customer per Something.
insert into temp_foo (DT, Something, Customer, Filecount)
select dates.DT, tbl1.Something, tbl1.Customer, coalesce(sum(tbl3.Filecount),0)
from table(Function_Returning_Daterange(NULL, NULL)) dates
cross join
(SELECT Something,
Code,
Value
FROM Table2 tbl2
WHERE (Something = 'Value')) tbl1
left outer join Table3 tbl3
on tbl3.Customer = tbl1.Customer
and trunc(tbl3.MinDate) = trunc(dates.DT)
group by dates.DT, tbl1.Something, tbl1.Customer;
You could use an analytic average with a window clause:
select dt, customer, filecount,
avg(filecount) over (partition by customer order by dt
rows between 3 preceding and 1 preceding) as avgfilecount
from tmp_foo
order by dt desc;
DT CUSTOMER FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x 10 4.66666667
2019-04-29 x 2 6
2019-04-28 x 3 9
2019-04-27 x 9
and then do the update part with a merge statement:
merge into tmp_foo t
using (
select dt, customer,
avg(filecount) over (partition by customer order by dt
rows between 3 preceding and 1 preceding) as avgfilecount
from tmp_foo
) s
on (s.dt = t.dt and s.customer = t.customer)
when matched then update set t.avgfilecount = s.avgfilecount;
4 rows merged.
select dt, customer, filecount, avgfilecount
from tmp_foo
order by dt desc;
DT CUSTOMER FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x 10 4.66666667
2019-04-29 x 2 6
2019-04-28 x 3 9
2019-04-27 x 9
You haven't shown your original insert statement; it might be possible to add the analytic calculation to that, and avoid the separate update step.
Also, if you want the first two date values to be calculated as if the 'missing' extra days before them had zero counts, you could use sum and division instead of avg:
select dt, customer, filecount,
sum(filecount) over (partition by customer order by dt
rows between 3 preceding and 1 preceding)/3 as avgfilecount
from tmp_foo
order by dt desc;
DT CUSTOMER FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x 10 4.66666667
2019-04-29 x 2 4
2019-04-28 x 3 3
2019-04-27 x 9
It depends what you expect those last calculated values to be.
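The difference between the two variants is easy to check outside the database. The following Python sketch mimics the window "rows between 3 preceding and 1 preceding" on the four sample file counts and returns both results per row: the average over the rows that actually exist, and the sum divided by a fixed 3. The function name `previous_window_stats` is an assumption for illustration.

```python
def previous_window_stats(values, size=3):
    """values must be ordered by date ascending.

    For each position, look at up to `size` values before it (the current
    value is excluded) and return a pair:
      (average over the values present, sum of those values / size).
    Both entries are None when there are no earlier values.
    """
    out = []
    for i in range(len(values)):
        window = values[max(0, i - size):i]
        if window:
            out.append((sum(window) / len(window), sum(window) / size))
        else:
            out.append((None, None))
    return out

# File counts for 2019-04-27 .. 2019-04-30, oldest first.
filecounts = [9, 3, 2, 10]
for stats in previous_window_stats(filecounts):
    print(stats)
```

The two numbers agree only once the window is full (the 2019-04-30 row); for the earlier rows the fixed divisor treats the missing days as zero counts.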

Oracle pagination ROWNUM column>=value challenge

Having some trouble with oracle pagination. Case:
Table with > 1 billion rows:
Measurement(Id Number, Classification VARCHAR, Value NUMBER)
Index:
ON Measurement(Value)
I need a query that gets the first match and the following 2000 matches ordered by Value. I also would like to use the index.
First idea:
SELECT * FROM Measurement WHERE Value >= 1234567890
AND ROWNUM <= 2000 ORDER BY Value ASC
Result:
The query just returns the first 2000 cases it can find in the table, starting from the top, where Value is higher or equal to 1234567890, and then orders that resultset ascending.
Second idea:
SELECT * FROM
(SELECT * FROM Measurement WHERE Value >= 1234567890 ORDER BY Value ASC)
WHERE ROWNUM <= 2000
Result:
Oracle does not understand that ROWNUM should limit the amount from the inner query, so oracle decides to get all rows where Value is greater or equal to 1234567890 first, and then order that giant resultset before returning the first 2000 rows. Because Oracle is guessing that most of the data in the table will be returned, it ignores any use of index as well.
None of these approaches are acceptable as the first one gives the wrong results, and the second one takes hours.
Is pagination supported at all in Oracle?
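The difference between the two attempts comes down to whether the row limit is applied before or after the sort. The Python sketch below is only an analogy, not a model of the Oracle optimizer: a list in arbitrary order stands in for the table, a sorted copy stands in for the index on Value, and all three function names are made up for illustration.

```python
from bisect import bisect_left

def limit_then_sort(values, threshold, n):
    """First idea: take the first n matches in storage order, then sort them."""
    matched = []
    for v in values:
        if v >= threshold:
            matched.append(v)
            if len(matched) == n:
                break
    return sorted(matched)

def sort_then_limit(values, threshold, n):
    """Second idea: sort every match, then keep the first n (correct, but
    it touches all matching rows)."""
    return sorted(v for v in values if v >= threshold)[:n]

def index_range_scan(sorted_values, threshold, n):
    """Desired behaviour: jump to the first match in an already sorted
    structure and read only the next n entries."""
    start = bisect_left(sorted_values, threshold)
    return sorted_values[start:start + n]

table = [50, 10, 40, 30, 20, 60]   # arbitrary storage order
index = sorted(table)              # stand-in for the index on Value

print(limit_then_sort(table, 20, 2))
print(sort_then_limit(table, 20, 2))
print(index_range_scan(index, 20, 2))
```

The first function returns a sorted but wrong pair of values, while the last two agree; the last one gets there without looking at the rest of the data.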
You can use the following
SELECT * FROM
(SELECT Id, Classification, Value, ROWNUM Rank FROM Measurement WHERE Value >= 1234567890)
WHERE Rank <= 2000
order by Rank
You do not need to order in the sub-query; it is simply unnecessary.
The above is not pagination but only the first page, I would suppose.
Not sure if you got the solution for your problem, but to put my two cents:
The first query will not answer your requirements as it will fetch 2000 random records that satisfy your query and then do an order by.
Coming to the second query :
Oracle will first execute the inner query and only then move to the outer query. So the rownum filter is applied only after the inner query has been executed.
You can try the approach below, which does an INDEX FAST FULL SCAN. I have tested it on a table with 2.76 million rows, and it has a lower cost than the other approach:
SELECT * from Measurement
where value in ( SELECT VALUE FROM
(SELECT Value FROM Measurement
WHERE Value >= 1234567890 ORDER BY Value ASC)
WHERE ROWNUM <= 2000)
Hope it Helps
Vishad
I think I have found a potential solution. However, it's not a query.
declare
cursor c is
SELECT * FROM Measurement WHERE Value >= 1234567890 ORDER BY Value ASC;
l_rec c%rowtype;
begin
open c;
for i in 1 .. 2000
loop
fetch c into l_rec;
exit when c%notfound;
end loop;
close c;
end;
/
Kindly experiment with more options
SELECT *
FROM( SELECT /*+ FIRST_ROWS(2000) */
Id,
Classification,
Value,
ROW_NUMBER() OVER (ORDER BY Value) AS rn
FROM Measurement
where Value > 1234567889
)
WHERE rn <=2000;
Update 1: Force the use of the index on Value. Here IDX_ON_VALUE is the name of the index on Value in Measurement.
SELECT * FROM
(SELECT /*+ INDEX(a IDX_ON_VALUE) */ a.* FROM Measurement a
WHERE a.Value >= 1234567890
ORDER BY a.Value ASC)
WHERE ROWNUM <= 2000

Oracle Analytic Rolling Percentile

Is it possible to use windowing with any of the percentile functions? Or do you know a work around to get a rolling percentile value?
It is easy with a moving average:
select avg(foo) over (order by foo_date rows
between 20 preceding and 1 preceding) foo_avg_ma
from foo_tab
But I can't figure out how to get the median (50% percentile) over the same window.
You can use PERCENTILE_CONT or PERCENTILE_DISC function to find the median.
PERCENTILE_CONT is an inverse distribution function that assumes a
continuous distribution model. It takes a percentile value and a sort
specification, and returns an interpolated value that would fall into
that percentile value with respect to the sort specification. Nulls
are ignored in the calculation.
...
PERCENTILE_DISC is an inverse distribution function that assumes a
discrete distribution model. It takes a percentile value and a sort
specification and returns an element from the set. Nulls are ignored
in the calculation.
...
The following example computes the median salary in each department:
SELECT department_id,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary DESC) "Median cont",
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary DESC) "Median disc"
FROM employees
GROUP BY department_id
ORDER BY department_id;
...
PERCENTILE_CONT and PERCENTILE_DISC may return different results.
PERCENTILE_CONT returns a computed result after doing linear
interpolation. PERCENTILE_DISC simply returns a value from the set of
values that are aggregated over. When the percentile value is 0.5, as
in this example, PERCENTILE_CONT returns the average of the two middle
values for groups with even number of elements, whereas
PERCENTILE_DISC returns the value of the first one among the two
middle values. For aggregate groups with an odd number of elements,
both functions return the value of the middle element.
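The difference described in the quoted documentation can be reproduced with a small Python sketch. It assumes an ascending sort and follows the usual definitions: linear interpolation between the two neighbouring rows for the continuous version, and the first value whose cumulative share reaches the percentile for the discrete one. The function names simply mirror the Oracle ones and are not an Oracle API.

```python
import math

def percentile_cont(values, p):
    """Interpolated percentile over the non-null values, ascending sort."""
    s = sorted(v for v in values if v is not None)
    if not s:
        return None
    rn = p * (len(s) - 1)                 # zero-based fractional row position
    lo, hi = math.floor(rn), math.ceil(rn)
    if lo == hi:
        return s[lo]                      # lands exactly on a row
    # Weight the two neighbouring rows by their distance from rn.
    return s[lo] * (hi - rn) + s[hi] * (rn - lo)

def percentile_disc(values, p):
    """Smallest value whose cumulative share of rows is >= p, ascending sort."""
    s = sorted(v for v in values if v is not None)
    if not s:
        return None
    k = max(1, math.ceil(p * len(s)))     # one-based row number
    return s[k - 1]

print(percentile_cont([1, 2, 3, 4], 0.5), percentile_disc([1, 2, 3, 4], 0.5))
print(percentile_cont([1, 2, 3], 0.5), percentile_disc([1, 2, 3], 0.5))
```

With an even number of elements the two functions differ (the average of the middle pair versus the first of the pair); with an odd number they both return the middle element. With a descending sort, as in the documentation example, the discrete version would pick the other element of the middle pair.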
A sample with a windowing simulation through a range self-join:
with sample_data as (
select /*+materialize*/ora_hash(owner) as table_key,object_name,
row_number() over (partition by owner order by object_name) as median_order,
row_number() over (partition by owner order by dbms_random.value) as any_window_sort_criteria
from dba_objects
)
select table_key,x.any_window_sort_criteria,x.median_order,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY y.median_order DESC) as rolling_median,
listagg(to_char(y.median_order), ',' )WITHIN GROUP (ORDER BY y.median_order) as elements
from sample_data x
join sample_data y using (table_key)
where y.any_window_sort_criteria between x.any_window_sort_criteria-3 and x.any_window_sort_criteria+3
group by table_key,x.any_window_sort_criteria,x.median_order
order by table_key, any_window_sort_criteria
/
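As a cross-check for whatever SQL workaround is chosen, the rolling median the question asks for can be computed directly in Python. The sketch below mirrors the window of the moving-average example ("rows between 20 preceding and 1 preceding"), so the current row is excluded; the function name `rolling_median` and the sample values are made up for illustration, and the input is assumed to be already ordered by foo_date.

```python
from statistics import median

def rolling_median(values, preceding=20):
    """Median of up to `preceding` values before each position.

    The current value is not part of its own window. Returns None for
    positions that have no earlier values.
    """
    out = []
    for i in range(len(values)):
        window = values[max(0, i - preceding):i]
        out.append(median(window) if window else None)
    return out

# Hypothetical foo values in foo_date order, with a short window of 3.
print(rolling_median([5, 1, 4, 2, 3], preceding=3))
```

`statistics.median` averages the two middle values of an even-sized window, so it corresponds to PERCENTILE_CONT(0.5) rather than PERCENTILE_DISC(0.5).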
