KDB:selecting data “around” time of certain events Part2 - intervals

Follow up to this question... KDB:selecting data "around" time of certain events
Consider a huge table of market data T. I am particularly interested in rows where Status=`SSS.
However, in addition to the rows given by (select from T where Status=`SSS), I also would like to select records that are within a certain short time interval around that event (as opposed to the earlier question, where we selected a fixed number of records surrounding the events). Note that in some cases, these intervals may overlap. What is an efficient way to do this?

We have an idea here that might help you:
q)n:10000000;
q)T:([]time:asc n?1D0;sym:n?3;price:n?100f;status:#[n?`3;-100000?n;:;`SSS])
q)f:{[t;x;d]t where 0<sums sum #[c#0;;+;]'[(-1+c:count t)&t[`time]binr/:x+/:-1 1*d;1 -1]}
q)f[T;exec time from T where status=`SSS;0D00:00:00.01]
time                 sym price     status
-----------------------------------------
0D00:00:01.169838756 2   77.1118   lbh
0D00:00:01.175813376 2   24.94157  emk
0D00:00:01.176316291 2   68.49994  SSS
0D00:00:01.180037856 1   81.54316  hhi
0D00:00:01.183518022 1   0.6516971 hni
0D00:00:01.291926205 2   51.94651  kjf
0D00:00:01.300173997 0   14.67675  SSS
0D00:00:01.309709250 1   82.77418  oji
The idea here is to extract out the time of event and use binr to find all the time windows that you need:
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10
`a `b `c `S `d `e `S `f `g `h
Say the windows t3 to t5 and t6 to t8 fall within the time interval. We mark the start of each window with 1 and the first row past its end with -1:
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10
`a `b `c `S `d `e `S `f `g `h
0  0  1  0  0  1  0  0  0  0
0  0  0  0  0  -1 0  0  -1 0
sums sum will highlight all the records you need:
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10
`a `b `c `S `d `e `S `f `g `h
0  0  1  1  1  1  1  1  0  0
then the rest is just straight forward...
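The marker idea is not specific to q. As a minimal sketch of the same technique in Python (the function name `rows_near_events` and the closed window `[e - d, e + d]` are assumptions of this example, not a translation of `f` above): binary search finds where each window starts and stops, a +1/-1 marker is placed at those positions, and a running sum shows which rows are covered by at least one window. Overlapping windows need no special handling because the running sum simply rises above 1.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def rows_near_events(times, event_times, d):
    """Return indices of rows whose time lies within d of any event.

    `times` must be sorted ascending. Windows are closed: [e - d, e + d].
    """
    n = len(times)
    delta = [0] * (n + 1)  # one spare slot for windows that run past the end
    for e in event_times:
        start = bisect_left(times, e - d)   # first row with time >= e - d
        stop = bisect_right(times, e + d)   # first row with time >  e + d
        delta[start] += 1
        delta[stop] -= 1
    depth = accumulate(delta[:n])  # number of windows covering each row
    return [i for i, c in enumerate(depth) if c > 0]


times = list(range(1, 11))  # t1..t10 as plain integers
selected = rows_near_events(times, event_times=[4, 7], d=1)
print(selected)  # [2, 3, 4, 5, 6, 7], i.e. rows t3..t8
```

The cost is one binary search pair per event plus a single pass over the rows, so the work does not grow with the amount of overlap between windows.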

An alternative method to this is to use window join, wj1, which allows you to pass in custom windows for each time and perform an aggregation on the data within that window. A simplified explanation of the syntax is:
wj[window pairs;common columns;table 1;(table 2;(function;column))]
Taking the following tables as an example:
q)trade / simplified trade table
time  sym
---------
09:00 a
09:30 a
10:00 a
q)quote / simplified quote table
time  sym px
------------------
09:12 a   9.420396
09:29 a   6.416515
10:07 a   8.53406
To sum all the quote prices within a 20 minute window either side of the trade time for each sym, we use the following method. First create pairs of start and end times for the windows:
q)show window:-20 20+\:trade`time
08:40 09:10 09:40
09:20 09:50 10:20
Each list has the same length as the trade table. Then pass this to wj1, along with the aggregation function sum for the prices px:
q)wj1[window;`sym`time;trade;(quote;(sum;`px))]
time  sym px
------------------
09:00 a   9.420396
09:30 a   15.83691
10:00 a   8.53406
To investigate the values being aggregated in each window, we can use the identity function (::) in place of the aggregation:
q)wj1[window;`sym`time;trade;(quote;(::;`px))]
time  sym px
---------------------------
09:00 a   ,9.420396
09:30 a   9.420396 6.416515
10:00 a   ,8.53406
It should be noted that wj1 only considers values inside the window, whereas wj also considers the prevailing value on entry to the window to be part of it.
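To make the windowing semantics concrete, here is a rough Python sketch of what the sum example computes. It is only an illustration of the idea, not how kdb+ implements the join: the function name `window_sum`, the use of minutes since midnight for times, and the closed window `[time - before, time + after]` are all assumptions of this example.

```python
from bisect import bisect_left, bisect_right


def window_sum(trades, quotes, before, after):
    """For each (time, sym) trade, sum the prices of quotes with the same sym
    whose time lies in the closed window [time - before, time + after]."""
    by_sym = {}
    for t, sym, px in sorted(quotes):  # sort by time so bisect can be used
        times, prices = by_sym.setdefault(sym, ([], []))
        times.append(t)
        prices.append(px)
    out = []
    for t, sym in trades:
        times, prices = by_sym.get(sym, ([], []))
        lo = bisect_left(times, t - before)
        hi = bisect_right(times, t + after)
        out.append((t, sym, sum(prices[lo:hi])))
    return out


# Times are minutes since midnight: 09:00 -> 540, 09:12 -> 552, and so on.
trades = [(540, "a"), (570, "a"), (600, "a")]
quotes = [(552, "a", 9.420396), (569, "a", 6.416515), (607, "a", 8.53406)]
for row in window_sum(trades, quotes, before=20, after=20):
    print(row)
```

Only quotes strictly inside each window contribute here, which corresponds to the wj1 behaviour described above rather than wj.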

Related

Oracle: Join 2 tables without an explicit key

I have 2 tables as follows:
Case_No.  Month    Month_Prev  Code  Stage  Code_Prev  Stage_Prev  Status
1         2022.09  2022.08     b     2      a          1           1
2         2022.09  2022.08     a     2      b          1           1
and
Month    Code  Stage  Rate  Status
2022.09  a     1      0.2   1
2022.09  a     2      0.1   1
2022.09  b     1      0.3   1
2022.09  b     2      0.1   1
2022.08  a     1      0.3   1
2022.08  a     2      0.2   1
2022.08  b     1      0.15  1
2022.08  b     2      0.25  1
My desired output:
Case_No.  Month    Month_Prev  Code  Stage  Code_Prev  Stage_Prev  Status  Rate  Rate_Prev
1         2022.09  2022.08     b     2      a          1           1       0.1   0.3
2         2022.09  2022.08     a     2      b          1           1       0.1   0.15
Basically, I want to obtain the rate corresponding to each individual set of {Month, Code, Stage, Status} and {Month_Prev, Code_Prev, Stage_Prev, Status}. I'm using Oracle. Can anyone help?
Well, you have already shown the keys for the join, so simply apply them. You'll have to join the second table twice, once for Month, once for Month_Prev.
select
  t1.*,
  this.rate,
  prev.rate as rate_prev
from t1
join t2 this on this.month = t1.month and this.code = t1.code and this.stage = t1.stage and this.status = t1.status
join t2 prev on prev.month = t1.month_prev and prev.code = t1.code_prev and prev.stage = t1.stage_prev and prev.status = t1.status
order by t1.month, t1.code, t1.stage, t1.status;
(If there can be t1 rows without a match in t2 and you still want to show those rows without a rate, change the inner joins to left outer joins.)
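The double join amounts to two lookups against the same rate table using different keys. As a rough illustration outside the database, here is a Python sketch under the assumption that both tables are lists of dictionaries with lower-case column names (the function name `add_rates` is hypothetical). A missing key yields `None`, which mirrors the left outer join variant.

```python
def add_rates(cases, rates):
    """Attach the current and previous rate to each case row."""
    # Index the rate table once by its composite key.
    lookup = {
        (r["month"], r["code"], r["stage"], r["status"]): r["rate"]
        for r in rates
    }
    out = []
    for c in cases:
        cur_key = (c["month"], c["code"], c["stage"], c["status"])
        prev_key = (c["month_prev"], c["code_prev"], c["stage_prev"], c["status"])
        out.append({**c,
                    "rate": lookup.get(cur_key),        # None if no match
                    "rate_prev": lookup.get(prev_key)})
    return out


rates = [
    {"month": m, "code": c, "stage": s, "rate": r, "status": 1}
    for m, c, s, r in [
        ("2022.09", "a", 1, 0.2), ("2022.09", "a", 2, 0.1),
        ("2022.09", "b", 1, 0.3), ("2022.09", "b", 2, 0.1),
        ("2022.08", "a", 1, 0.3), ("2022.08", "a", 2, 0.2),
        ("2022.08", "b", 1, 0.15), ("2022.08", "b", 2, 0.25),
    ]
]
cases = [
    {"case_no": 1, "month": "2022.09", "month_prev": "2022.08", "code": "b",
     "stage": 2, "code_prev": "a", "stage_prev": 1, "status": 1},
    {"case_no": 2, "month": "2022.09", "month_prev": "2022.08", "code": "a",
     "stage": 2, "code_prev": "b", "stage_prev": 1, "status": 1},
]
for row in add_rates(cases, rates):
    print(row["case_no"], row["rate"], row["rate_prev"])
```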

data.table create table from rows

I would like to analyze a table that reports job codes used by people over the course of several pay periods: I want to know how many times each person has used each job code.
The table lists people in the first column, and pay periods in subsequent columns -- I cannot transpose without creating new problems with names.
The table looks like this:
people  pp1  pp2  pp3  pp4
Bob     A    A    A    C
Ted     B    B    B    B
Alice   B    A    C    C
My desired output looks like this:
people  A  B  C
Bob     3  0  1
Ted     0  4  0
Alice   1  1  2
My code is as follows:
myDT <- data.table(
people = c('Bob','Ted','Alice'),
pp1 = c('A','B','B'),
pp2 = c('A','B','A'),
pp3 = c('A','B','C'),
pp4 = c('C','B','C')
)
id.col=paste('pp',1:3)
myDT[ , table(as.matrix(.SD)), .SDcols = id.col, by = 1:nrow(myDT)]
but it's nowhere close to working
melt(myDT, "people") |>
dcast(people ~ value, fun.aggregate = length)
#    people     A     B     C
#    <char> <int> <int> <int>
# 1:  Alice     1     1     2
# 2:    Bob     3     0     1
# 3:    Ted     0     4     0
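The melt-then-dcast step is, in effect, a per-person tally of values. For readers who want to see that logic spelled out, here is a small Python sketch of the same counting (the function name `count_codes` and the dictionary layout are assumptions of this example, not part of the data.table answer):

```python
from collections import Counter

rows = {
    "Bob": ["A", "A", "A", "C"],
    "Ted": ["B", "B", "B", "B"],
    "Alice": ["B", "A", "C", "C"],
}


def count_codes(rows):
    """Count how often each person used each job code, with zeros filled in."""
    codes = sorted({code for used in rows.values() for code in used})
    table = {}
    for person, used in rows.items():
        counts = Counter(used)  # a missing code counts as 0
        table[person] = {code: counts[code] for code in codes}
    return table


for person, counts in count_codes(rows).items():
    print(person, counts)
```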

Determine the minimum score in each fight

Two teams, T1 and T2, are playing a game. Both teams have N players. The powers of T1's players are given in an array, and likewise for T2's players.
The rules of the game are as follows:
1. There are only N fights.
2. No player of either team can play twice.
3. The score of each fight is (P1 + P2) % N, where P1 is the power of the T1 player and P2 is the power of the T2 player.
T2 knows the order in which T1 sends its players into the game, and T2 wants to obtain the minimum score in each fight.
Your task is to determine the order in which T2 should send its players so that they obtain the minimum score in each fight.
You are required to print the score for each round and the order of T2's players.
N=3
T1=[1,2,3]
T2=[2,0,1]
Output:[0,0,0]
N=4
T1=[0,1,2,1]
T2=[3,2,1,1]
output:[1,0,0,2]
Explanation for case 1 (N=3):
[(1+2)%3,(2+1)%3,(3+0)%3]
What is the optimal solution for this problem?
One approach is to check each T2 power against T1 one by one.
This isn't a complete answer, but here's how I would think about it. Having more interesting examples could help develop the idea.
Sort the arrays, one descending, one ascending:
N=4
T1=[0,1,2,1]
T2=[3,2,1,1]
0 1 1 2
3 2 1 1
Now use a pointer to "shift" the alignment while maintaining a variable for the current, ascending target remainder.
Target remainder 0
0 1 1 2
3 2 1 1
^
useless, shift pointer right
0 1 1 2
3 2 1 1
^
useless, shift pointer right
0 1 1 2
3 2 1 1
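As a minimal sketch of one greedy approach, assuming "minimum score in each fight" means taking the lowest achievable score fight by fight in T1's order (this reading reproduces both sample outputs, but the question does not state it explicitly): keep T2's remaining powers sorted by residue mod N, and for each T1 power p look up the smallest residue that is at least (N - p) % N. If none exists, the smallest remaining residue is the best choice. This uses a sorted list with binary search rather than the pointer shifting described above, and the function name `min_scores` is hypothetical.

```python
from bisect import bisect_left


def min_scores(t1, t2, n):
    """Greedily pick, for each T1 power in order, the unused T2 power that
    minimises (p + q) % n. Returns (scores, order of T2 powers used)."""
    # Sort by residue; keep the original power alongside it.
    remaining = sorted((q % n, q) for q in t2)
    scores, order = [], []
    for p in t1:
        target = (n - p % n) % n  # residue that would give a score of 0
        i = bisect_left(remaining, (target,))  # smallest residue >= target
        if i == len(remaining):
            i = 0  # nothing wraps past n, so the smallest residue is best
        _, q = remaining.pop(i)
        scores.append((p + q) % n)
        order.append(q)
    return scores, order


print(min_scores([1, 2, 3], [2, 0, 1], 3))        # ([0, 0, 0], [2, 1, 0])
print(min_scores([0, 1, 2, 1], [3, 2, 1, 1], 4))  # ([1, 0, 0, 2], [1, 3, 2, 1])
```

Removing from a Python list is linear, so this sketch is O(N^2) in the worst case; a balanced tree or similar ordered multiset would bring it to O(N log N).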

How to do a Range Bucket on a Count returned by a Group By query

I was hoping you could help me.
I am in a situation where I first need to do a Count Distinct of Funds and Group By Policies. Once I have done that, I need to put the count of policies in Range of Number of Funds.
This is the data available, where you can see the policies and different funds linked to it
**PolicyNum Fund**
1201 AB
1202 AC
1203 AB
1203 AC
1203 AD
1204 AB
1204 BC
1204 AC
1204 AD
1204 AE
1204 AF
Now I need to do a Count Distinct of Fund Grouped by Policy.
I have used this query to do that:
select fv.policy, count(distinct fv.fund)
from policy_fund fv
group by fv.policy
order by count(distinct fv.fund) desc
After using the above code, the following would come up
This is a view where you can see the number of funds linked to each policy
**Policy No. of Funds**
1201 1
1202 1
1203 3
1204 6
Now, the problem part: I want to arrive at the following view, which shows each range of number of funds and how many policies fall into that range:
**Range of Number of funds Number of policies**
0 to 1 2
2 to 3 1
4 to 5 0
5 to 6 1
You can left join a derived table that defines the ranges to your query, using the condition that the fund count falls within the range, and then group by the range and count the policies.
SELECT r.l || ' to ' || r.u "RANGE",
count(p) "COUNT"
FROM (SELECT 0 l,
1 u
FROM dual
UNION ALL
SELECT 2 l,
3 u
FROM dual
UNION ALL
SELECT 4 l,
5 u
FROM dual
UNION ALL
SELECT 6 l,
7 u
FROM dual) r
LEFT JOIN (SELECT fv.policy p,
count(distinct fv.fund) cof
FROM policy_fund fv
GROUP BY fv.policy) fpp
ON fpp.cof >= r.l
AND fpp.cof <= r.u
GROUP BY r.l,
r.u
ORDER BY r.l,
r.u;
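The same two-step logic (count distinct funds per policy, then count policies per range) can be sketched in Python for checking the expected numbers. The function name `bucket_policies` and the input layout are assumptions of this example. The ranges are the non-overlapping ones from the query above, and ranges with no policies still appear with a count of 0, as with the left join.

```python
def bucket_policies(pairs, ranges):
    """Count policies per inclusive (low, high) range of distinct fund counts."""
    funds = {}
    for policy, fund in pairs:
        funds.setdefault(policy, set()).add(fund)  # distinct funds per policy
    counts = {r: 0 for r in ranges}  # every range appears, even if empty
    for fund_set in funds.values():
        for low, high in ranges:
            if low <= len(fund_set) <= high:
                counts[(low, high)] += 1
    return counts


pairs = [
    (1201, "AB"), (1202, "AC"),
    (1203, "AB"), (1203, "AC"), (1203, "AD"),
    (1204, "AB"), (1204, "BC"), (1204, "AC"),
    (1204, "AD"), (1204, "AE"), (1204, "AF"),
]
ranges = [(0, 1), (2, 3), (4, 5), (6, 7)]
for (low, high), n in bucket_policies(pairs, ranges).items():
    print(f"{low} to {high}", n)
```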

Is there an algorithm that can divide a number into three parts and have their totals match the original number?

For example if you take the following example into consideration.
100.00 - Original Number
33.33 - 1st divided by 3
33.33 - 2nd divided by 3
33.33 - 3rd divided by 3
99.99 - Is the sum of the 3 division outcomes
But I want it to match the original 100.00.
One way I saw it could be done is to subtract the first two divisions from the original number; the result would be my third number. If I then add those 3 numbers, I get my original number.
100.00 - Original Number
33.33 - 1st divided by 3
33.33 - 2nd divided by 3
33.34 - 3rd number
100.00 - Which gives me my original number correctly. (33.33+33.33+33.34 = 100.00)
Is there a formula for this either in Oracle PL/SQL or a function or something that could be implemented?
Thanks in advance!
This version takes precision as a parameter as well:
with q as (select 100 as val, 3 as parts, 2 as prec from dual)
select rownum as no
,case when rownum = parts
then val - round(val / parts, prec) * (parts - 1)
else round(val / parts, prec)
end v
from q
connect by level <= parts
no v
=== =====
1 33.33
2 33.33
3 33.34
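The same "round every share, give the remainder to the last one" rule can be sketched in Python with exact decimal arithmetic. This is an illustration under assumptions: the function name `split_amount` is hypothetical, and `ROUND_HALF_UP` is chosen here to round halves away from zero for positive values.

```python
from decimal import Decimal, ROUND_HALF_UP


def split_amount(value, parts, prec=2):
    """Split value into `parts` shares rounded to `prec` decimals.
    The last share absorbs the rounding difference so the total is exact."""
    value = Decimal(str(value))
    quantum = Decimal(10) ** -prec  # 0.01 for prec=2
    share = (value / parts).quantize(quantum, rounding=ROUND_HALF_UP)
    last = value - share * (parts - 1)
    return [share] * (parts - 1) + [last]


shares = split_amount(100, 3)
print(shares)       # [Decimal('33.33'), Decimal('33.33'), Decimal('33.34')]
print(sum(shares))  # 100.00
```

Using `Decimal` rather than binary floats avoids introducing a second source of error on top of the rounding itself.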
For example, if you want to split the value among the number of days in the current month, you can do this:
with q as (select 100 as val
,extract(day from last_day(sysdate)) as parts
,2 as prec from dual)
select rownum as no
,case when rownum = parts
then val - round(val / parts, prec) * (parts - 1)
else round(val / parts, prec)
end v
from q
connect by level <= parts;
1 3.33
2 3.33
3 3.33
4 3.33
...
27 3.33
28 3.33
29 3.33
30 3.43
To apportion the value amongst each month, weighted by the number of days in each month, you could do this instead (change the level <= 3 to change the number of months it is calculated for):
with q as (
select add_months(date '2013-07-01', rownum-1) the_month
,extract(day from last_day(add_months(date '2013-07-01', rownum-1)))
as days_in_month
,100 as val
,2 as prec
from dual
connect by level <= 3)
,q2 as (
select the_month, val, prec
,round(val * days_in_month
/ sum(days_in_month) over (), prec)
as apportioned
,row_number() over (order by the_month desc)
as reverse_rn
from q)
select the_month
,case when reverse_rn = 1
then val - sum(apportioned) over (order by the_month
rows between unbounded preceding and 1 preceding)
else apportioned
end as portion
from q2;
01/JUL/13 33.7
01/AUG/13 33.7
01/SEP/13 32.6
Use rational numbers. You could store the numbers as fractions rather than simple values. That's the only way to assure that the quantity is truly split in 3, and that it adds up to the original number. Sure you can do something hacky with rounding and remainders, as long as you don't care that the portions are not exactly split in 3.
The "algorithm" is simply that
100/3 + 100/3 + 100/3 == 300/3 == 100
Store both the numerator and the denominator in separate fields, then add the numerators. You can always convert to floating point when you display the values.
The Oracle docs even have a nice example of how to implement it:
CREATE TYPE rational_type AS OBJECT
( numerator INTEGER,
denominator INTEGER,
MAP MEMBER FUNCTION rat_to_real RETURN REAL,
MEMBER PROCEDURE normalize,
MEMBER FUNCTION plus (x rational_type)
RETURN rational_type);
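To illustrate the rational-number idea outside Oracle, here is a short Python sketch using the standard library's `fractions.Fraction`, which stores a numerator and denominator exactly as described above. It is only a demonstration of the arithmetic, not an implementation of the `rational_type` object.

```python
from fractions import Fraction

# Three exact thirds of 100, each stored as numerator/denominator.
parts = [Fraction(100, 3)] * 3
total = sum(parts)

print(parts[0])          # 100/3
print(total == 100)      # True: no rounding error accumulates
print(float(parts[0]))   # convert to floating point only for display
```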
Here is a parameterized SQL version
SELECT COUNT (*), grp
FROM (WITH input AS (SELECT 100 p_number, 3 p_buckets FROM DUAL),
data
AS ( SELECT LEVEL id, (p_number / p_buckets) group_size
FROM input
CONNECT BY LEVEL <= p_number)
SELECT id, CEIL (ROW_NUMBER () OVER (ORDER BY id) / group_size) grp
FROM data)
GROUP BY grp
output:
COUNT(*) GRP
33 1
33 2
34 3
If you edit the input parameters (p_number and p_buckets) the SQL essentially distributes p_number as evenly as possible among the # of buckets requested (p_buckets).
I've solved this problem yesterday by subtracting 2 of 3 parts from the starting number, e.g. 100 - 33.33 - 33.33 = 33.34 and the result of summing it up is still 100.
