Context:
An activity has a grade.
Activities belong to a subject, and the subject_avg is simply the average of its activities' grades over a given time range.
The global_avg is the average of many subject_avgs (i.e., not to be confused with the average of all activity grades).
Problem:
"Efficiently" calculate global_avg over variable time windows.
I can calculate subject_avg "efficiently" for a single subject by accumulating the count and grade sum of its activities:
        date   grade
act1    day 1  0.5
act2    day 3  1
act3    day 3  0.8
act4    day 6  0.6
act5    day 6  0

        avg_sum  activity_count
day 1   0.5      1
day 3   2.3      3
day 6   2.6      5
I called it "efficiently" because if I need subject_avg between any 2 dates, I can obtain it with simple arithmetic over the second table:
subject_avg (day 2 to 5) = (2.3 - 0.5) / (3 - 1) = 0.9
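As an illustration only, here is a rough Python sketch of that lookup (the tuple layout, the cum/cum_at/subject_avg names and the bisect-based date lookup are my own, not part of the original design): take the cumulative row at the end of the window, subtract the cumulative row just before the start, and divide.

import bisect

# Cumulative table for one subject, sorted by day -- the same rows as the
# second table above: (day, avg_sum, activity_count).
cum = [(1, 0.5, 1), (3, 2.3, 3), (6, 2.6, 5)]

def cum_at(day):
    """Cumulative (avg_sum, activity_count) up to and including `day`."""
    i = bisect.bisect_right([d for d, _, _ in cum], day)
    return (0.0, 0) if i == 0 else (cum[i - 1][1], cum[i - 1][2])

def subject_avg(start, end):
    """Average grade of the activities with start <= day <= end."""
    sum_hi, cnt_hi = cum_at(end)
    sum_lo, cnt_lo = cum_at(start - 1)
    count = cnt_hi - cnt_lo
    return None if count == 0 else (sum_hi - sum_lo) / count

print(subject_avg(2, 5))   # (2.3 - 0.5) / (3 - 1) = 0.9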
Calculating global_avg:
subjectA
        avg_sum  activity_count
day 1   0.5      1
day 3   2.3      3
day 6   2.6      5

subjectB
        avg_sum  activity_count
day 4   0.8      1
day 6   1.8      2
global_avg (day 2 to 5) = (subjectA_avg + subjectB_avg) / 2 = (0.9 + 0.8) / 2 = 0.85
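And a similarly hedged sketch of the global_avg step: keep one cumulative table per subject, compute each subject's window average as above, and take the mean of those. Again the names (subjects, window_avg, global_avg) are illustrative only.

import bisect

# One cumulative table per subject: (day, avg_sum, activity_count), sorted by day.
subjects = {
    "subjectA": [(1, 0.5, 1), (3, 2.3, 3), (6, 2.6, 5)],
    "subjectB": [(4, 0.8, 1), (6, 1.8, 2)],
}

def window_avg(cum, start, end):
    """Subject average over [start, end] as a difference of cumulative rows."""
    days = [d for d, _, _ in cum]
    def at(day):
        i = bisect.bisect_right(days, day)
        return (0.0, 0) if i == 0 else (cum[i - 1][1], cum[i - 1][2])
    (s_hi, c_hi), (s_lo, c_lo) = at(end), at(start - 1)
    return None if c_hi == c_lo else (s_hi - s_lo) / (c_hi - c_lo)

def global_avg(start, end):
    """Mean of the per-subject window averages, skipping inactive subjects."""
    avgs = [a for a in (window_avg(c, start, end) for c in subjects.values())
            if a is not None]
    return sum(avgs) / len(avgs)

print(global_avg(2, 5))   # (0.9 + 0.8) / 2 = 0.85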
I have hundreds of subjects, so I need to know: is there any way I could pre-process the subject_avgs so that I don't need to individually calculate each subject's average over the given time window before calculating global_avg?
Related
I am reading up on event studies and need to regress the return over the announcement days t0 and t+1 on the pre-announcement daily volatility:
Return(t0,t+1) = Volatility(t-5 to t-1) + Controls.
Thus, I tried to create a dummy which is 0 from t-5 through t-1 and 1 on t0 and t+1. Then I set return to . (missing) where the dummy is not equal to 1, and volatility to . where the dummy is not equal to 0. Then:
proc reg data=regdata;
model return = volatility control1 control2;
quit;
Obviously, the dependent and independent variables' data now sit in different observations, so the procedure finds no valid observations.
Event_time  return  volatility
-5          .       0.5
-4          .       0.4
-3          .       0.6
-2          .       0.2
-1          .       0.4
 0          0.05    .
 1          0.06    .
How can I achieve this?
Thanks in advance!
I did it by running a proc means by stock and re-merging the resulting day 0 and day 1 mean value back onto the pre-announcement window.
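For comparison only, the same reshape sketched in Python/pandas rather than SAS (the column names stock, event_time, ret, vol and the helper remerge are assumptions, not the actual dataset): compute the per-stock mean return over days 0 and +1 and merge it back onto the pre-announcement rows, so both variables sit in the same observations.

import pandas as pd

# df: one row per stock per event_time (-5 .. +1), with hypothetical
# columns stock, event_time, ret, vol
def remerge(df: pd.DataFrame) -> pd.DataFrame:
    # mean return over the announcement days (0 and +1), per stock
    ann = (df[df.event_time.isin([0, 1])]
           .groupby("stock", as_index=False)["ret"].mean()
           .rename(columns={"ret": "ann_ret"}))
    # attach it to the pre-announcement rows so return and volatility share observations
    pre = df[df.event_time.between(-5, -1)]
    return pre.merge(ann, on="stock")

# regdata = remerge(df)   # then regress ann_ret on vol + controls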
Week Sales
1 100
2 250
3 350
4 145
5 987
6 26
7 32
8 156
I want to calculate the sales for the last 3 weeks only, so the total should be 156 + 32 + 26 = 214.
If new weeks are added, it should automatically calculate only the data from the last 3 rows.
I tried this formula, but it returns an incorrect sum:
sum(sales) over (lastperiod(3(week))
https://i.stack.imgur.com/6Y7h7.jpg
If you want only the last-3-weeks sum in a calculated column, you can use a simple if calculation:
If([week]>(Max([week]) - 3),Sum([sales]),0)
If you need the 3-week rolling calculation throughout the table, use the one below:
sum([sales]) OVER (LastPeriods(3,[week]))
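To make the difference between the two calculations concrete, here is the same logic sketched in Python/pandas (not Spotfire; the DataFrame and the Rolling3 column name are just for illustration): one scalar for the last 3 weeks only, and one rolling 3-week sum per row.

import pandas as pd

df = pd.DataFrame({"Week": range(1, 9),
                   "Sales": [100, 250, 350, 145, 987, 26, 32, 156]})

# total over the last 3 weeks only (26 + 32 + 156 = 214); stays correct as weeks are added
last3_total = df.loc[df.Week > df.Week.max() - 3, "Sales"].sum()

# rolling 3-week sum computed for every row of the table
df["Rolling3"] = df.sort_values("Week")["Sales"].rolling(3, min_periods=1).sum()

print(last3_total)   # 214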
Consider the following table:
Id Verb Qty Price
`1 Buy 6 10.0
`2 Sell 5 11.0
`3 Buy 4 10.0
`4 Sell 3 11.0
`5 Sell 8 9.0
`6 Buy 1 8.0
etc...
What I would like is to associate a PNL with each transaction, computed on a FIFO (first-in, first-out) basis. Thus, for Id=`1, I want the PNL to be -6*(10.0) + 5*(11.0) + 1*(11.0) = +$6.00; for Id=`3, the PNL is -4*(10.0) + 2*(11.0) + 2*(9.0) = $0; etc.
In layman's terms: for the first buy order of size 6, I want to offset it with the first 6 sells, and for the second buy order of size 4, offset it with the subsequent 4 sells that have not already been included in the PNL computation for the buy-6 order.
Any advice?
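For reference, a plain (non-q, non-vectorized) Python sketch of the FIFO matching described above; it is only meant to pin down the expected numbers, consumes sells in table order regardless of whether they precede or follow the buy, and reproduces the PNLs quoted for Id=`1 and Id=`3.

from collections import deque

# (id, side, qty, price) -- the table from the question
txns = [(1, "Buy", 6, 10.0), (2, "Sell", 5, 11.0), (3, "Buy", 4, 10.0),
        (4, "Sell", 3, 11.0), (5, "Sell", 8, 9.0), (6, "Buy", 1, 8.0)]

def fifo_pnl(txns):
    """Realized PNL per buy, offsetting each buy against sells in FIFO order."""
    sells = deque([qty, px] for _, side, qty, px in txns if side == "Sell")
    pnl = {}
    for tid, side, qty, px in txns:
        if side != "Buy":
            continue
        realized, remaining = 0.0, qty
        while remaining and sells:
            take = min(remaining, sells[0][0])      # fill as much as the oldest sell allows
            realized += take * (sells[0][1] - px)   # sell price minus buy price
            remaining -= take
            sells[0][0] -= take
            if sells[0][0] == 0:
                sells.popleft()                     # oldest sell fully consumed
        pnl[tid] = realized
    return pnl

print(fifo_pnl(txns))   # {1: 6.0, 3: 0.0, 6: 1.0}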
Take data from your example:
txn:([] t: til 6; side:`Buy`Sell`Buy`Sell`Sell`Buy; qty:6 5 4 3 8 1; px: 10.0 11.0 10.0 11.0 9.0 8.0)
It's best to maintain buy and sell transactions/fills separately in your database:
buys: select from txn where side=`Buy
sells: select from txn where side=`Sell
Functions we'll need [1]:
/ first-in first-out allocation of bid/buy and ask/sell fills
/ returns connectivity matrix of (b)id fills in rows and (a)sk fills in columns
fifo: {deltas each deltas sums[x] &\: sums[y]};
/ connectivity list from connectivity matrix
lm: {raze(til count x),''where each x};
/ realized profit & loss
rpnl: {[b;s]
t: l,'f ./: l:lm (f:fifo[exec qty from b;exec qty from s])>0;
pnl: (select bt:t, bqty:qty, bpx:px from b#t[;0]),'(select st:t, sqty:qty, spx:px from s#t[;1]),'([] qty: t[;2]);
select tstamp: bt|st, rpnl:qty*spx-bpx from pnl
}
Run:
q)rpnl[buys;sells]
tstamp rpnl
-----------
1 5
3 1
3 2
4 -2
5 1
According to my timings, this should be ~2x faster than the next best solution, since it's nicely vectorized.
Footnotes:
The fifo function is a textbook example from Q for Mortals. In your case, it looks like this:
q)fifo[exec qty from buys;exec qty from sells]
5 1 0
0 2 2
0 0 1
The lm function tells which buy/sell pairs were crossed (non-zero fills). More background here: [kdb+/q]: Convert adjacency matrix to adjacency list
q)lm fifo[exec qty from buys;exec qty from sells]>0
0 0
0 1
1 1
1 2
2 2
The cryptic first line of rpnl is then a combination of the two concepts above:
q)t: l,'f ./: l:lm (f:fifo[exec qty from buys;exec qty from sells])>0;
0 0 5
0 1 1
1 1 2
1 2 2
2 2 1
A similar approach to JPC, but keeping things tabular:
q)tab:([] Id:`1`2`3`4`5`6;Verb:`Buy`Sell`Buy`Sell`Sell`Buy;Qty:6 5 4 3 8 1;Price:10.0 11.0 10.0 11.0 9.0 8.0)
q)tab
Id Verb Qty Price
-----------------
1 Buy 6 10
2 Sell 5 11
3 Buy 4 10
4 Sell 3 11
5 Sell 8 9
6 Buy 1 8
pnlinfo:{[x;y]
b:exec first'[(Qty;Price)] from x where Id=y;
r:exec (remQty;fifo[remQty;b 0];Price) from x where Verb=`Sell;
x:update remQty:r 1 from x where Verb=`Sell;
update pnl:neg[(*) . b]+sum[r[2]*r[0]-r[1]] from x where Id=y
};
fifo:{x-deltas y&sums x};
pnlinfo/[update remQty:Qty from tab where Verb=`Sell;exec Id from tab where Verb=`Buy]
Id Verb Qty Price remQty pnl
----------------------------
1  Buy  6   10           6
2  Sell 5   11    0
3  Buy  4   10           0
4  Sell 3   11    0
5  Sell 8   9     5
6  Buy  1   8            1
Assumes that Buys will be offset against previous sells as well as future sells.
You could also in theory use other distributions such as
lifo:{x-reverse deltas y&sums reverse x}
but I haven't tested that.
Here is a first attempt to get the ball rolling. Not efficient.
q)t:([]id:1+til 6;v:`b`s`b`s`s`b;qty:6 5 4 3 8 1; px:10 11 10 11 9 8)
//how much of each sale offsets a given purchase
q)alloc:last each (enlist d`s){(fx-c;c:deltas y&sums fx:first x)}\(d:exec qty by v from t)`b
//revenues, ie allocated sale * appropriate price
q)revs:alloc*\:exec px from t where v=`s
q)(sum each revs)-exec qty*px from t where v=`b
6 0 1
Slightly different approach without using over/scan (except in sums...).
Here we create a list of duplicated indices (one per unit of Qty) for every Sell order and use cut to assign them to the appropriate Buy order; then we index into the Price of those Sells and take the difference with the Price of the appropriate Buy order.
This should scale with table size, but memory will blow up when Qty is large.
q)tab:([] Id:`1`2`3`4`5`6;Verb:`Buy`Sell`Buy`Sell`Sell`Buy;Qty:6 5 4 3 8 1;Price:10.0 11.0 10.0 11.0 9.0 8.0)
q)sideMap:`Buy`Sell!1 -1
q)update pnl:sum each neg Price - Price{sells:where neg 0&x; -1_(count[sells]&0,sums 0|x) _ sells}Qty*sideMap[Verb] from tab
Id Verb Qty Price pnl
---------------------
1 Buy 6 10 6
2 Sell 5 11 0
3 Buy 4 10 0
4 Sell 3 11 0
5 Sell 8 9 0
6 Buy 1 8 1
Using Ruby, how can you determine whether the coming Sunday is the first, second or third Sunday of the month?
@ElYusubov is close, but not quite right.
As a starting point, the division must be by seven (the number of days in a week). But time.day gives a day-of-the-month from 1 to 31, so first you need to subtract one, do an integer (floor) division by seven, and add one. The first eight days of any month give...
Day number (Day# - 1) / 7 Week#
---------- -------------- -----
1 0.0 1
2 0.14 1
3 0.29 1
4 0.43 1
5 0.57 1
6 0.71 1
7 0.86 1
8 1 2
Whatever day-of-the-month time.day gives, that week# tells you whether it is the first, second, etc. occurrence of that day-of-the-week. But you want the coming Sunday.
wday gives the weekday as 0 to 6, with 0 meaning Sunday. So how many days are there until the coming Sunday? Well, that depends on your definition of "coming", but if you exclude today == Sunday, you basically subtract today's weekday from 7.
Weekday today Days until next Sunday
------------- ----------------------
0 (Sun) 7
1 (Mon) 6
2 (Tue) 5
3 (Wed) 4
4 (Thu) 3
5 (Fri) 2
6 (Sat) 1
If you allow the "coming" Sunday to be today, then you do the same thing but replace the seven with zero for the Sunday row. You can either do a conditional check or use the modulo/remainder operator, i.e. (7 - wday) % 7.
Anyway, once you know how many days ahead the coming Sunday is, you can calculate the date value for it (add that many days to today's date) and then determine the week number in the month for that date instead of today, using the first method (subtract 1, divide by seven, add 1).
Relevant vocabulary...
Date.wday 0 to 6 (0 = sunday)
Date.day 1 to 31
I won't try to provide the code because I don't know Ruby.
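To make the recipe concrete, here is a rough sketch of the same arithmetic in Python rather than Ruby (the function name and the example date are illustrative only): map the weekday to Ruby-style wday with Sunday = 0, step forward to the coming Sunday, then apply the subtract-1 / divide-by-7 / add-1 rule.

from datetime import date, timedelta

def coming_sunday_ordinal(today: date, allow_today: bool = False) -> int:
    """Return 1 if the coming Sunday is the 1st Sunday of its month, 2 if the 2nd, etc."""
    wday = today.isoweekday() % 7                          # Ruby-style wday: Sunday == 0
    days_ahead = (7 - wday) % 7 if allow_today else 7 - wday
    sunday = today + timedelta(days=days_ahead)            # date of the coming Sunday
    return (sunday.day - 1) // 7 + 1                       # subtract 1, floor-divide by 7, add 1

print(coming_sunday_ordinal(date(2012, 9, 14)))   # Friday -> Sunday the 16th -> 3rd Sunday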
Given the following dataset for articles on my site:
Article 1
2/1/2010 100
2/2/2010 80
2/3/2010 60
Article 2
2/1/2010 20000
2/2/2010 25000
2/3/2010 23000
where column 1 is the date and column 2 is the number of pageviews for an article. What is a basic velocity calculation that can be done to determine whether an article is trending upwards or downwards over the most recent 3 days?
Caveats: an article does not know the total number of pageviews across the site, only its own totals. Ideally the result is a number between 0 and 1. Any pointers to what this class of algorithms is called?
Thanks!
Update: your data actually already is a list of velocities (pageviews/day). The following answer simply shows how to find the average velocity over the past three days. See my other answer for how to calculate pageview acceleration, which is the real statistic you are probably looking for.
Velocity is simply the change in a value (delta pageviews) over time:
For article 1 on 2/3/2010:
delta pageviews = 100 + 80 + 60
= 240 pageviews
delta time = 3 days
pageview velocity (over last three days) = [delta pageviews] / [delta time]
= 240 / 3
= 80 pageviews/day
For article 2 on 2/3/2010:
delta pageviews = 20000 + 25000 + 23000
= 68000 pageviews
delta time = 3 days
pageview velocity (over last three days) = [delta pageviews] / [delta time]
= 68,000 / 3
= 22,666 + 2/3 pageviews/day
Now that we know the maximum velocity, we can scale all the velocities to get relative velocities between 0 and 1 (or between 0% and 100%):
relative pageview velocity of article 1 = velocity / MAX_VELOCITY
= 80 / (22,666 + 2/3)
~ 0.0035294118
~ 0.35294118%
relative pageview velocity of article 2 = velocity / MAX_VELOCITY
= (22,666 + 2/3) / (22,666 + 2/3)
= 1
= 100%
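The same arithmetic as a small Python sketch (the dictionary layout is just for illustration; the pageview figures are the ones from the question):

views = {"article1": [100, 80, 60],            # pageviews per day == daily velocities
         "article2": [20000, 25000, 23000]}

# average velocity over the last three days, per article
velocity = {a: sum(v[-3:]) / 3 for a, v in views.items()}
max_velocity = max(velocity.values())

# scale into [0, 1] relative to the fastest article
relative = {a: v / max_velocity for a, v in velocity.items()}

print(velocity)   # {'article1': 80.0, 'article2': 22666.66...}
print(relative)   # {'article1': ~0.0035, 'article2': 1.0}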
"Pageview trend" likely refers to pageview acceleration, not velocity. Your dataset actually already is a list of velocities (pageviews/day). Pageviews are non-decreasing values, so pageview velocity can never be negative. The following describes how to calculate pageview acceleration, which may be negative.
PV_acceleration(t1,t2) = (PV_velocity{t2} - PV_velocity{t1}) / (t2 - t1)
("PV" == "Pageview")
Explanation:
Acceleration is simply change in velocity divided by change in time. Since your dataset is a list of page view velocities, you can plug them directly into the formula:
PV_acceleration("2/1/2010", "2/3/2010") = (60 - 100) / ("2/3/2010" - "2/1/2010")
= -40 / 2
= -20 pageviews per day per day
Note that the data for "2/2/2010" was not used. An alternate method is to calculate several PV_accelerations (each over a date range that goes back only a single day) and average them. There is not enough data in your example to do this for three days, but here is how to do it for the last two days:
PV_acceleration("2/2/2010", "2/3/2010") = (60 - 80) / ("2/3/2010" - "2/2/2010")
= -20 / 1
= -20 pageviews per day per day
PV_acceleration("2/1/2010", "2/2/2010") = (80 - 100) / ("2/2/2010" - "2/1/2010")
= -20 / 1
= -20 pageviews per day per day
PV_acceleration_average("2/1/2010", "2/3/2010") = (-20 + -20) / 2
= -20 pageviews per day per day
This alternate method did not make a difference for the article 1 data because the page view acceleration did not change between the two days, but it will make a difference for article 2.
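And a matching sketch for the acceleration statistic (again illustrative Python; the daily counts are treated as velocities, as described above), showing both the endpoint formula and the average of the daily accelerations:

views = [100, 80, 60]                      # article 1: pageviews/day, oldest first

# acceleration between consecutive days: change in velocity per day
accelerations = [b - a for a, b in zip(views, views[1:])]   # [-20, -20]

# endpoint method over the whole window vs. average of the daily accelerations
endpoint = (views[-1] - views[0]) / (len(views) - 1)        # (60 - 100) / 2 = -20.0
average = sum(accelerations) / len(accelerations)           # -20.0

print(endpoint, average)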
Just a link to an article about the "trending" algorithms that Reddit, StumbleUpon and Hacker News, among others, use:
http://www.seomoz.org/blog/reddit-stumbleupon-delicious-and-hacker-news-algorithms-exposed