SAS: Regressions of times around an event

I am reading up on event studies and need to regress the return on the announcement days (t0 and t+1) on the daily volatility of the pre-announcement window (t-5 to t-1):
Return(t0,t+1) = Volatility(t-5 to t-1) + Controls.
To do this, I created a dummy that is 0 from t-5 through t-1 and 1 on t0 and t+1. Then I set return to . where the dummy is not 1 and volatility to . where the dummy is not 0. Then:
proc reg data=regdata;
model return = volatility control1 control2;
run;
quit;
Obviously, the dependent and independent variables now sit in different observations, so the procedure finds no valid observations.
Event_time  return  volatility
-5          .       0.5
-4          .       0.4
-3          .       0.6
-2          .       0.2
-1          .       0.4
 0          0.05    .
 1          0.06    .
How can I achieve this?
Thanks in advance!

I solved it by running proc means by stock to get the mean of the day 0 and day 1 values, then merging that mean back onto the pre-announcement window.
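For readers outside SAS, the collapse-and-merge idea can be sketched in Python. This is only an illustration under assumptions: the stock label "AAA", the tuple layout and the function name collapse_event_windows are hypothetical, and the numbers are the ones from the question's table. Unlike the re-merge described above, the sketch collapses each event to a single row per stock, so that return and volatility end up in the same observation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format rows: (stock, event_time, return, volatility).
# None marks a value that is missing on that day, as in the question's table.
rows = [
    ("AAA", -5, None, 0.5),
    ("AAA", -4, None, 0.4),
    ("AAA", -3, None, 0.6),
    ("AAA", -2, None, 0.2),
    ("AAA", -1, None, 0.4),
    ("AAA", 0, 0.05, None),
    ("AAA", 1, 0.06, None),
]

def collapse_event_windows(rows):
    """Return one record per stock with both regression variables filled in."""
    pre_volatility = defaultdict(list)   # volatility on days -5..-1
    event_returns = defaultdict(list)    # returns on days 0 and 1
    for stock, event_time, ret, vol in rows:
        if -5 <= event_time <= -1 and vol is not None:
            pre_volatility[stock].append(vol)
        elif 0 <= event_time <= 1 and ret is not None:
            event_returns[stock].append(ret)
    # Keep only stocks that have data in both windows.
    return {
        stock: {
            "mean_return": mean(event_returns[stock]),
            "mean_volatility": mean(pre_volatility[stock]),
        }
        for stock in pre_volatility
        if stock in event_returns
    }

collapsed = collapse_event_windows(rows)
print(collapsed)
```

Each resulting record is one valid observation for a regression of mean_return on mean_volatility; control variables would have to be carried along in the same way.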

Related

How to convert from long to wide format when the column numbers per row are variable? (MATLAB)

I have a time series dataset of accelerometry values where there are many sub-seconds of measurements but the actual number of sub-seconds recorded per second is variable.
So I would be starting with something that looks like this:
Date time  Dec sec  Acc X
1          .00      0.5
1          .25      0.5
1          .50      0.6
1          .75      0.5
2          .00      0.6
2          .40      0.5
2          .80      0.5
3          .00      0.5
3          .50      0.5
4          .00      0.6
4          .25      0.5
4          .50      0.5
4          .75      0.5
And trying to convert it to wide format where each row is a second, and the columns are the decimal seconds corresponding to each second.
sub1  sub2  sub3  sub4
.5    .5    .6    .5
.6    .5    .5    NaN
.5    .5    NaN   NaN
.6    .5    .5    .5
In code this would look like:
%Preallocate some space
Dpts_observations = NaN(13,3);
%These are the "seconds" number
Dpts_observations(:,1)=[1 1 1 1...
2 2 2...
3 3...
4 4 4 4];
%These are the "decimal seconds"
Dpts_observations(:,2) = [0.00 0.25 0.50 0.75...
0.00 0.33 0.66...
0.00 0.50 ...
0.00 0.25 0.50 0.75]
%Here's actual acceleration values
Dpts_observations(:,3) = [0.5 0.5 0.5 0.5...
0.6 0.5 0.5...
0.4 0.5...
0.5 0.5 0.6 0.4]
%I have this in a separate file but I have summary data that helps me
%determine the row indexes corresponding to sub-seconds that belong to the same second and I use them to manually extract from long form to wide form.
%Create table to hold indexing information
Seconds = [1 2 3 4];
Obs_per_sec = [4 3 2 4];
Start_index = [1 5 8 10];
End_index = [4 7 9 13];
Dpts_attributes = table(Seconds, Obs_per_sec, Start_index, End_index);
%Preallocate new array
Acc_X = NaN(4,4);
%Loop through seconds
for i=1:max(size(Dpts_attributes))
Acc_X(i, 1:Dpts_attributes.Obs_per_sec(i))=Dpts_observations(Dpts_attributes.Start_index(i):Dpts_attributes.End_index(i),3);
end
This works, but it's very slow. In reality, I have a huge data set covering millions of seconds, and I'm hoping there is a better solution than my current one. My data is all numeric to keep everything as fast as possible.
Thank you!
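The target layout can also be sketched in Python. This only illustrates the transformation (group the samples by second, then pad short rows with NaN); it is not a tuned MATLAB solution, and the function name long_to_wide is hypothetical. The sample values are the ones from the MATLAB snippet above.

```python
import math

def long_to_wide(seconds, values, fill=math.nan):
    """Reshape long-format samples into one row per second, padded with `fill`."""
    rows = {}
    for sec, val in zip(seconds, values):
        # Samples arrive in time order, so appending keeps sub-second order.
        rows.setdefault(sec, []).append(val)
    width = max(len(row) for row in rows.values())
    return [rows[sec] + [fill] * (width - len(rows[sec])) for sec in sorted(rows)]

seconds = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4]
acc_x = [0.5, 0.5, 0.5, 0.5, 0.6, 0.5, 0.5, 0.4, 0.5, 0.5, 0.5, 0.6, 0.4]

wide = long_to_wide(seconds, acc_x)
for row in wide:
    print(row)
```

The column of each sample is simply its running position within its second, which is the same information the Start_index and Obs_per_sec columns encode.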

Avg of avgs in variable time windows

Context:
Activity has a grade
Activities belong to a subject, and the subject_avg is simply the average of its activities' grades in a given time range
The global_avg is the average of many subject_avg values (i.e., not to be confused with the average of all activity grades)
Problem:
"Efficiently" calculate global_avg in variable time windows
"Efficiently" calculating subject_avg for a single subject, by accumulating the amount and grade of its activities:
activity  date   grade
act1      day 1  0.5
act2      day 3  1
act3      day 3  0.8
act4      day 6  0.6
act5      day 6  0
date   avg_sum  activity_count
day 1  0.5      1
day 3  2.3      3
day 6  2.9      5
I called it "efficiently" because if I need subject_avg between any 2 dates, I can obtain it with simple arithmetic over the second table:
subject_avg (day 2 to 5) = (2.3 - 0.5) / (3 - 1) = 0.9
Calculating global_avg:
subjectA  avg_sum  activity_count
day 1     0.5      1
day 3     2.3      3
day 6     2.9      5
subjectB  avg_sum  activity_count
day 4     0.8      1
day 6     1.8      2
global_avg (day 2 to 5) = (subjectA_avg + subjectB_avg)/2 = (0.9 + 0.8) / 2 = 0.85
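The cumulative-table arithmetic can be sketched in Python as follows. This is a minimal illustration of the approach described in the question, not a faster alternative to it; the function names build_cumulative, window_avg and global_avg are hypothetical, days are assumed to be integers, and subjectB's individual grades (0.8 on day 4, 1.0 on day 6) are inferred from its cumulative table.

```python
from bisect import bisect_right

def build_cumulative(activities):
    """activities: (day, grade) pairs sorted by day -> parallel lists per distinct day."""
    days, sums, counts = [], [], []
    total, n = 0.0, 0
    for day, grade in activities:
        total += grade
        n += 1
        if days and days[-1] == day:
            sums[-1], counts[-1] = total, n      # same day: overwrite running totals
        else:
            days.append(day)
            sums.append(total)
            counts.append(n)
    return days, sums, counts

def window_avg(cumulative, first_day, last_day):
    """Average grade of activities dated first_day..last_day, or None if there are none."""
    days, sums, counts = cumulative

    def up_to(day):
        i = bisect_right(days, day)              # number of table rows with date <= day
        return (sums[i - 1], counts[i - 1]) if i else (0.0, 0)

    sum_hi, count_hi = up_to(last_day)
    sum_lo, count_lo = up_to(first_day - 1)
    n = count_hi - count_lo
    return (sum_hi - sum_lo) / n if n else None

def global_avg(subjects, first_day, last_day):
    """Mean of the per-subject averages, skipping subjects with no activity in the window."""
    avgs = [window_avg(c, first_day, last_day) for c in subjects]
    avgs = [a for a in avgs if a is not None]
    return sum(avgs) / len(avgs) if avgs else None

subject_a = build_cumulative([(1, 0.5), (3, 1.0), (3, 0.8), (6, 0.6), (6, 0.0)])
subject_b = build_cumulative([(4, 0.8), (6, 1.0)])

avg_a = window_avg(subject_a, 2, 5)
avg_b = window_avg(subject_b, 2, 5)
overall = global_avg([subject_a, subject_b], 2, 5)
print(avg_a, avg_b, overall)
```

Each window query costs two binary searches per subject, so the per-subject loop in global_avg is exactly the step the question would like to avoid.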
I have hundreds of subjects, so I need to know: is there any way I could pre-process the subject_avgs so that I don't need to calculate each subject's average in the given time window individually before calculating global_avg?

A variant of the Knapsack algorithm

I have a list of items, a, b, c,..., each of which has a weight and a value.
The 'ordinary' Knapsack algorithm will find the selection of items that maximises the value of the selected items, whilst ensuring that the weight is below a given constraint.
The problem I have is slightly different. I wish to minimise the value (easy enough by using the reciprocal of the value), whilst ensuring that the weight is at least the value of the given constraint, not less than or equal to the constraint.
I have tried re-routing the idea through the ordinary Knapsack algorithm, but this can't be done. I was hoping there is another combinatorial algorithm that I am not aware of that does this.

In the German Wikipedia article it's formalized as:
finite set of objects U
w: weight-function
v: value-function
w: U -> R
v: U -> R
B in R # constraint rhs
Find subset K of U subject to:
sum( w(u) ) <= B | all u in K
such that:
max sum( v(u) ) | all u in K
So there is no restriction like nonnegativity.
Just use negative weights, negative values and a negative B.
The basic concept is:
sum( w(u) ) <= B | all u in K
<->
-sum( w(u) ) >= -B | all u in K
So in your case:
classic constraint:  x0 + x1 <= B  |  3 + 7 <= 12 Y  |  3 + 10 <= 12 N
becomes:            -x0 - x1 <= -B | -3 - 7 <= -12 N | -3 - 10 <= -12 Y
Whether a given implementation accepts negative weights and values depends on the software. As an optimization problem there is no difficulty: the integer-programming formulation for your case is as natural as the classic one (and bounded).
Python Demo based on Integer-Programming
Code
import numpy as np
import scipy.sparse as sp
from cylp.cy import CyClpSimplex
np.random.seed(1)
""" INSTANCE """
weight = np.random.randint(50, size = 5)
value = np.random.randint(50, size = 5)
capacity = 50
""" SOLVE """
n = weight.shape[0]
model = CyClpSimplex()
x = model.addVariable('x', n, isInt=True)
model.objective = value # MODIFICATION: default = minimize!
model += sp.eye(n) * x >= np.zeros(n) # could be improved
model += sp.eye(n) * x <= np.ones(n) # """
model += np.matrix(-weight) * x <= -capacity # MODIFICATION
cbcModel = model.getCbcModel()
cbcModel.logLevel = True
status = cbcModel.solve()
x_sol = np.array(cbcModel.primalVariableSolution['x'].round()).astype(int) # assumes existence
print("INSTANCE")
print(" weights: ", weight)
print(" values: ", value)
print(" capacity: ", capacity)
print("Solution")
print(x_sol)
print("sum weight: ", x_sol.dot(weight))
print("value: ", x_sol.dot(value))
Small remarks
This code is just a demo using a fairly low-level library; there are other tools available which might be better suited (e.g. on Windows: pulp)
it's the classic integer-programming formulation from the wiki, modified as mentioned above
it should scale well, since the work is done by the underlying CBC MILP solver (see the output)
as written, it's solving the 0-1 knapsack (only the variable bounds would need to be changed for other variants)
Small look at the core-code:
# create model
model = CyClpSimplex()
# create one variable for each how-often-do-i-pick-this-item decision
# variable needs to be integer (or binary for 0-1 knapsack)
x = model.addVariable('x', n, isInt=True)
# the objective value of our IP: a linear-function
# cylp only needs the coefficients of this function: c0*x0 + c1*x1 + c2*x2...
# we only need our value vector
model.objective = value # MODIFICATION: default = minimize!
# WARNING: typically one should always use variable-bounds
# (cylp problems...)
# workaround: express bounds lower_bound <= var <= upper_bound as two constraints
# a constraint is an affine-expression
# sp.eye creates a sparse-diagonal with 1's
# example: sp.eye(3) * x >= 5
# 1 0 0 -> 1 * x0 + 0 * x1 + 0 * x2 >= 5
# 0 1 0 -> 0 * x0 + 1 * x1 + 0 * x2 >= 5
# 0 0 1 -> 0 * x0 + 0 * x1 + 1 * x2 >= 5
model += sp.eye(n) * x >= np.zeros(n) # could be improved
model += sp.eye(n) * x <= np.ones(n) # """
# cylp somewhat outdated: need numpy's matrix class
# apart from that it's just the weight-constraint as defined at wiki
# same affine-expression as above (but only a row-vector-like matrix)
model += np.matrix(-weight) * x <= -capacity # MODIFICATION
# internal conversion of type needed to treat it as IP (or else it would be LP)
cbcModel = model.getCbcModel()
cbcModel.logLevel = True
status = cbcModel.solve()
# type-casting
x_sol = np.array(cbcModel.primalVariableSolution['x'].round()).astype(int)
Output
Welcome to the CBC MILP Solver
Version: 2.9.9
Build Date: Jan 15 2018
command line - ICbcModel -solve -quit (default strategy 1)
Continuous objective value is 4.88372 - 0.00 seconds
Cgl0004I processed model has 1 rows, 4 columns (4 integer (4 of which binary)) and 4 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of 5
Cbc0038I Before mini branch and bound, 4 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (0.00 seconds)
Cbc0038I After 0.00 seconds - Feasibility pump exiting with objective of 5 - took 0.00 seconds
Cbc0012I Integer solution of 5 found by feasibility pump after 0 iterations and 0 nodes (0.00 seconds)
Cbc0001I Search completed - best objective 5, took 0 iterations and 0 nodes (0.00 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from 5 to 5
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Result - Optimal solution found
Objective value: 5.00000000
Enumerated nodes: 0
Total iterations: 0
Time (CPU seconds): 0.00
Time (Wallclock seconds): 0.00
Total time (CPU seconds): 0.00 (Wallclock seconds): 0.00
INSTANCE
weights: [37 43 12 8 9]
values: [11 5 15 0 16]
capacity: 50
Solution
[0 1 0 1 0]
sum weight: 51
value: 5
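As a cross-check that does not need a solver, the same instance can be run through a small dynamic program. This is a different technique from the integer-programming demo above, sketched under the assumption that all weights are non-negative integers and all values are non-negative; the function name min_value_cover is hypothetical.

```python
import math

def min_value_cover(weights, values, required_weight):
    """Smallest total value of a 0-1 selection whose total weight is >= required_weight.

    Assumes non-negative integer weights and non-negative values.
    Returns math.inf if even taking every item is too light.
    """
    # best[need] = cheapest way found so far to reach a total weight of at least `need`
    best = [0] + [math.inf] * required_weight
    for w, v in zip(weights, values):
        # Descending order so that each item is used at most once.
        for need in range(required_weight, 0, -1):
            candidate = best[max(0, need - w)] + v
            if candidate < best[need]:
                best[need] = candidate
    return best[required_weight]

# Instance printed by the demo above.
weights = [37, 43, 12, 8, 9]
values = [11, 5, 15, 0, 16]
optimum = min_value_cover(weights, values, 50)
print(optimum)
```

The table has required_weight + 1 entries and every item touches each entry once, so the running time grows with the number of items times the required weight. That is fine for small integer constraints and a reason to prefer the IP formulation for large or non-integer ones.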

Fitness Proportionate Selection when some fitnesses are 0

I have a question about what to do with fitnesses that are 0 when computing the fitness proportionate probabilities. Should the container for the members be sorted by highest fitness first, followed by code similar to this:
for all members of population
sum += fitness of this individual
end for
for all members of population
sum of probabilities += fitness / sum
probability = sum of probabilities
end for
loop until new population is full
do this twice
number = Random between 0 and 1
for all members of population
if number > probability but less than next probability then you have been selected
end for
end
create offspring
end loop
The problem I see when stepping through one iteration by hand with randomly generated members is that some members have a fitness of 0, and each of them ends up with the same cumulative probability as the last non-zero member before it. Is there a way I can separate the non-zero probabilities from the zero probabilities? Even if I sort by highest fitness, the last non-zero member would have the same probability as the zero-fitness members.

Consider this example:
individual  fitness(i)  probability(i)  partial_sum(i)
1           10          10/20 = 0.50    0.50
2            3           3/20 = 0.15    0.5+0.15 = 0.65
3            2           2/20 = 0.10    0.5+0.15+0.1 = 0.75
4            0           0/20 = 0.00    0.5+0.15+0.1+0.0 = 0.75
5            5           5/20 = 0.25    0.5+0.15+0.1+0.0+0.25 = 1.00
            ------
Sum         20
Now if number = Random between [0;1[ we are going to pick individual i if:
individual  condition
1           0.00                  <= number < partial_sum(1) = 0.50
2           0.50 = partial_sum(1) <= number < partial_sum(2) = 0.65
3           0.65 = partial_sum(2) <= number < partial_sum(3) = 0.75
4           0.75 = partial_sum(3) <= number < partial_sum(4) = 0.75
5           0.75 = partial_sum(4) <= number < partial_sum(5) = 1.00
If an individual has fitness 0 (e.g. I4) it cannot be selected because of its selection condition (e.g. I4 has the associated condition 0.75 <= number < 0.75).
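The interval conditions above can be sketched in Python with a binary search over the partial sums. This is a minimal illustration, not a full genetic-algorithm loop; the function name select_index is hypothetical, indices are 0-based (so I4 is index 3), and the partial sums are kept as raw fitness totals instead of probabilities to avoid rounding in the division.

```python
from bisect import bisect_right
from itertools import accumulate
import random

def select_index(fitnesses, number):
    """Fitness-proportionate pick for a draw `number` taken from [0, 1)."""
    partial_sums = list(accumulate(fitnesses))
    total = partial_sums[-1]
    if total <= 0:
        raise ValueError("at least one fitness must be positive")
    # First index whose partial sum is strictly greater than the scaled draw.
    # The hi bound keeps the result in range if number * total rounds up to total.
    return bisect_right(partial_sums, number * total, 0, len(partial_sums) - 1)

fitnesses = [10, 3, 2, 0, 5]          # the example table above
random.seed(0)
picks = [select_index(fitnesses, random.random()) for _ in range(10_000)]
print(sorted(set(picks)))
```

No sorting by fitness is needed: an individual with fitness 0 owns an empty interval wherever it sits in the list, so the search steps over it.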

How to calculate one certain value from a rolling-window estimation in Stata

I'm using Stata to estimate Value-at-Risk (VaR) with the historical simulation method. Basically, I create a rolling window of 100 observations to estimate VaR for each of the next 250 days (repeated 250 times). As far as I know, Stata's rolling command for time series should be useful in this case. Here is the process:
Input: 350 values
1. Sort the 100 values in the window in ascending order (by magnitude).
2. Take the 5th smallest value in the window.
3. Move the window forward by one observation and repeat, 250 times.
Output: a list of the 5th values (250 in total).
Sounds simple, but I cannot get it right. This was my attempt:
program his,rclass
sort lnreturn
return scalar actual=lnreturn in 5
end
tsset stt
time variable: stt, 1 to 350
delta: 1 unit
rolling actual=r(actual), window(100) saving(C:\result100.dta, replace) : his
(running his on estimation sample)
And the result is:
Start end actual
1 100 -.047856
2 101 -.047856
3 102 -.047856
4 103 -.047856
.... ..... ......
251 350 -.047856
What I want is 250 different 5th values in the column "actual", not the same value repeated.

If I understand this correctly, you want the 5th percentile of values in a window of 100. That should yield to summarize, detail or centile. I see no need to write a program.
Your bug is that your program his calculates the same thing each time it is called. There is no communication about windows other than what is explicit in your code. It is like saying
move here: now add 2 + 2
move there: now add 2 + 2
move to New York: now add 2 + 2
The result is invariant to your supposed position.
Note that I doubt that
return scalar actual=lnreturn in 5
really is your code. lnreturn[5] should work.
UPDATE You don't even need rolling here. Looping over data is easy enough. The data in this example are clearly fake.
clear
* sandpit
set obs 500
set seed 2803
gen y = ceil(exp(rnormal(3,2)))
l y in 1/5
* initialise
gen p5 = .
* rolling windows of length 100: 1..100, 2..101, ..., 401..500
quietly forval j = 1/401 {
local J = `j' + 99
su y in `j'/`J', detail
replace p5 = r(p5) in `j'
}
* check first calculation
su y in 1/100, detail
l in 1/5
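The same windowing logic can be sketched in Python, taking the question's literal "5th smallest value" per window. This is an illustration only: the function name rolling_kth_smallest and the toy series are hypothetical, and the 5th smallest of 100 values need not coincide exactly with the 5th percentile that summarize, detail reports.

```python
def rolling_kth_smallest(values, window=100, k=5):
    """Return the k-th smallest value of every full window of consecutive values."""
    if not 1 <= k <= window:
        raise ValueError("k must lie between 1 and the window length")
    return [
        # The slice depends on `start`, so each window is sorted separately.
        sorted(values[start:start + window])[k - 1]
        for start in range(len(values) - window + 1)
    ]

# Toy series, window of 3, 2nd smallest value per window.
toy = [5, 3, 1, 4, 2, 0]
result = rolling_kth_smallest(toy, window=3, k=2)
print(result)
```

The point mirrored from the Stata loop is that the window bounds are passed explicitly on every step. With 350 values and a window of 100 this yields 251 results, one per window from 1..100 to 251..350.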
