I have one IV, one DV, one moderator, and two control variables.
The panel data is strongly balanced with 5 years.
Is there any way to draw an interaction plot of the xtdpdml results?
I've tried the usual approach (iv: independent variable with values from 1 to 6; mo: moderator with values from 1 to 6):
egen iv = rowmean(q1 q2 q3 q4 q5 q6)
egen dv = rowmean(qd1 qd2 qd3 qd4 qd5 qd6)
egen mo = rowmean(qm1 qm2 qm3 qm4 qm5 qm6)
gen im = iv * mo    // interaction of IV and moderator
xtdpdml dv, inv(gender birthyear) pre(iv mo im) fiml
summarize mo
global moa = round(r(mean) + r(sd),0.1)
global mo = round(r(mean),0.1)
global mob = round(r(mean) - r(sd),0.1)
margins, at(iv=(1(1)6) mo=($moa $mo $mob))
and the error message 'iv ambiguous abbreviation' showed up.
EDIT This answer was in response to a previous version of the question using quite different variable names. It addresses only a side-issue, how to present numeric results rounded to a desired number of decimal places.
Your code doesn't mention pnv so it is hard to comment.
I see what you want to do, but that is an awkward and fallible way to do it. The problem is that Stata works in binary and can hold only a few of the decimal fractions 0.01(0.01)0.99 exactly: specifically 0.25, 0.50, and 0.75. That's 3 out of 99; the rest must be held as approximations. Stata's display formatting means that you won't always see the approximations, but that isn't guaranteed. I would try this, which turns the rounding into the formatting question it really is:
summarize sns
local snsa : di %2.1f r(mean) + r(sd)    // mean + 1 SD
local sns : di %2.1f r(mean)             // mean
local snsb : di %2.1f r(mean) - r(sd)    // mean - 1 SD
margins, at(priv=(0(1)6) sns=(`snsa' `sns' `snsb'))
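To see the binary issue directly (a quick demonstration of my own, not part of the original answer): a few decimal fractions such as 0.25 are exact in binary, while multiples of 0.1 generally are not.
display %20.18f 0.25    // 0.250000000000000000: exact in binary
display %20.18f 0.1     // 0.100000000000000006: an approximation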
I can't rule out the real problem being in some other part of the code.
I'd like to show students what happens when only a constant is used in a regression model. I specified one model as price ~ age for an OLS model of the price of used cars as a function of age plus a constant. Now I'd like to drop the age variable and just have the constant. How do I do this?
The formula fitting in statsmodels uses Patsy, which tries to mimic R-style model specifications.
Since you didn't specify a data source, I've taken a dataset from the statsmodels OLS guide to provide a worked example: can wealth explain lottery spending?
import statsmodels.api as sm
import statsmodels.formula.api as smf
# load example and trim to a few features
df = sm.datasets.get_rdataset("Guerry", "HistData").data
df = df[['Lottery', 'Literacy', 'Wealth', 'Region']].dropna()
# fit with y=mx + c
model1 = smf.ols(formula='Lottery ~ Wealth', data=df).fit()
print(model1.summary())
# fit with y=c (only an intercept)
model2 = smf.ols(formula='Lottery ~ 1', data=df).fit()
print(model2.summary())
For your question: a model with only an intercept just fits the mean of the response. But presumably you are interested in techniques for comparing different models, so let's do a quick comparison to see whether the more complex model fits better. One option is the F-test:
f_val, p_val, _ = model1.compare_f_test(model2)
print(f_val, p_val, p_val<0.01)
The p-value is below the 1% significance level, so we conclude that the more complex model fits significantly better in this case.
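As a quick sanity check of the claim that the intercept-only model just fits the mean (this snippet is my addition, not part of the original answer):
# the single fitted coefficient equals the sample mean of the response
print(model2.params['Intercept'], df['Lottery'].mean())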
For completeness, to specify a model without an intercept (useful, e.g., if we already mean-centered the data), we can remove it with -1 in the formula:
# y = mx
model3 = smf.ols(formula='Lottery ~ Wealth -1', data=df).fit()
print(model3.summary())
f_val, p_val, _ = model1.compare_f_test(model3)
print(f_val, p_val, p_val<0.01)
Again, p_val is below the 1% significance level, so including both intercept and slope improves the fit. (No multiple-testing correction is applied here, but the p-values are well below 1%.)
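A side note (my addition): the overall F-statistic printed in model1.summary() tests all slopes jointly, which for this single-slope model is exactly the comparison against the intercept-only model2:
# should reproduce the compare_f_test(model2) result above
print(model1.fvalue, model1.f_pvalue)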
This question is an extension of the excellent answer provided by Robert Picard here: How to Randomly Assign to Groups of Different Sizes
We have this dataset, which is the same as in the previous question, but adds the year variable:
sysuse census, clear
keep state region pop
order state pop region
decode region, gen(reg)
replace reg="NCntrl" if reg=="N Cntrl"
drop region
gen year=20
replace year=30 if _n>15
replace year=40 if _n>35
If I just wanted to randomly re-assign values of reg across all observations (without regard to group), I could implement the answer to the previous post:
tempfile orig
save `orig'
keep reg
rename reg reg_new
set seed 234
gen double u = runiform()
sort u reg_new                      // put reg_new in random order
merge 1:1 _n using `orig', nogen    // reattach to the original data
How would the code be modified so that reg is shuffled, but only within year? For example, there are 15 observations where year==20. These observations should be shuffled separately than the other years.
Shuffling one variable doesn't require any file choreography. This can probably be shortened:
sysuse auto, clear
set seed 2803
gen double shuffle = runiform()
* example 1: shuffle mpg across the whole dataset
sort shuffle
gen long which = _n         // a random permutation of 1..N
sort mpg
gen mpg_new = mpg[which]    // assign sorted values in random order
list which mpg*
* example 2: shuffle mpg within groups of foreign
bysort foreign (shuffle) : gen long which2 = _n
bysort foreign (mpg) : gen mpg2 = mpg[which2]
list which2 mpg mpg2, sepby(foreign)
All that said, I think sample does this so long as you specify a sample size equal to the number of observations in the dataset. It's overkill because you get all the variables.
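Applied to the census setup in the question, example 2 becomes something like this (my sketch, assuming the same data preparation as in the question):
sysuse census, clear
decode region, gen(reg)
gen year = 20
replace year = 30 if _n > 15
replace year = 40 if _n > 35
set seed 234
gen double shuffle = runiform()
bysort year (shuffle) : gen long which = _n     // random rank within year
bysort year (reg) : gen reg_new = reg[which]    // shuffle reg within year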
Recently Stan added the integrate_ode function. Unfortunately, the only documentation I can find is the Stan reference manual (p. 191ff). I have a model that requires a driving signal, and as I understand it the parameters x_r and x_i are supposed to be used for this.
For the sake of a concrete example, let's assume I want to implement the example from the documentation with the following change:
real[] sho(real t,
           real[] y,
           real[] theta,
           real[] x_r,
           int[] x_i) {
  real dydt[2];
  real input_signal;                    // change from here!!!
  input_signal <- how_to(t, x_r, x_i);
  dydt[1] <- y[2] + input_signal;       // change to here!!!
  dydt[2] <- -y[1] - theta[1] * y[2];
  return dydt;
}
The input signal is supposed to be an observed time series that I pass in. Say I supply input_signal_vector <- sin(ts) + rnorm(T, sd = 0.1) (a signal observed at the time points in ts), and for input_signal I plan to use the value closest in time.
The only way I can imagine is to concatenate ts and input_signal_vector into x_r and then search that array inside the function. But I cannot imagine that this is the intended use of these parameters, and it would also be extremely inefficient.
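For concreteness, here is a minimal sketch of that concatenation idea (my illustration only; how_to is the placeholder name from the snippet above, and I assume x_r packs the T time points followed by the T signal values, with T passed as x_i[1]):
real how_to(real t, real[] x_r, int[] x_i) {
  int T;
  int best;
  T <- x_i[1];            // number of grid points
  best <- 1;
  for (i in 2:T)
    if (fabs(x_r[i] - t) < fabs(x_r[best] - t))
      best <- i;
  return x_r[T + best];   // signal value at the closest time point
}
Even this is a linear scan on every evaluation of the right-hand side, which is exactly the inefficiency I am worried about.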
So if someone could show how such a case is supposed to be solved, I would be very grateful.
I have 3 measurements for a machine. Each measurement is triggered every time its value changes by a certain delta.
I have these 3 data sets, represented as Matlab objects: T1, T2 and O. Each of them has a obj.t containing the timestamp values and obj.y containing the measurement values.
I will measure T1 and T2 for a long time, but O only for a short period. The task is to reconstruct O_future from T1 and T2, using the existing values for O for training and validation.
Note that T1.t, T2.t and O.t are not equal, not even their frequency (I might call it 'variable sample rate', but not sure if this name applies).
Is it possible to solve this problem using Matlab or other software? Do I need to resample all data to a common time vector?
Concerning the common time vector: below is some basic code that does this (I guess you might know how to do it, but just in case). However, the second option might take you further.
% creating test signals
t1 = 1:2:100;
t2 = 1:3:200;
to = [5 6 100 140];
s1 = round(unifrnd(0, 1, size(t1)));
s2 = round(unifrnd(0, 1, size(t2)));
o = ones(size(to));
maxt = max([t1 t2 to]);
mint = min([t1 t2 to]);
% smallest sampling interval across the three signals (sets the resolution)
frequ = min([diff(t1) diff(t2) diff(to)]);
% create a common time vector with the highest resolution
tinterp = linspace(mint, maxt, (maxt - mint)/frequ + 1);
s1_interp = zeros(size(tinterp));
s2_interp = zeros(size(tinterp));
o_interp = zeros(size(tinterp));
for i = 1:length(t1)
    s1_interp(ceil(t1(i)) == floor(tinterp)) = s1(i);
end
for i = 1:length(t2)
    s2_interp(ceil(t2(i)) == floor(tinterp)) = s2(i);
end
for i = 1:length(to)
    o_interp(ceil(to(i)) == floor(tinterp)) = o(i);
end
figure
subplot(3, 1, 1)
hold on, plot(t1, s1, 'ro'), plot(tinterp, s1_interp, 'k-')
legend('observation', 'interpolation')
title('signal 1')
subplot(3, 1, 2)
hold on, plot(t2, s2, 'ro'), plot(tinterp, s2_interp, 'k-')
legend('observation', 'interpolation')
title('signal 2')
subplot(3, 1, 3)
hold on, plot(to, o, 'ro'), plot(tinterp, o_interp, 'k-')
legend('observation', 'interpolation')
title('O')
It's not ideal: for long recordings this can become inefficient, because the smallest sampling interval in any one of the signals determines the resolution of the common time vector.
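As a possible shortcut (my suggestion, not part of the original approach), interp1's nearest-neighbour mode maps a signal onto the common time vector without the explicit loops:
% nearest-neighbour resampling onto tinterp (same test signals as above)
s1_nn = interp1(t1, s1, tinterp, 'nearest', 'extrap');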
Another option would be to define a coarser time vector and look at the number of events that happened in each period, which might have some predictive power as well (I'm not sure about your setup). The structure would be something like:
coarse_t = 1:5:100;
s1_coarse = zeros(size(coarse_t));
s2_coarse = zeros(size(coarse_t));
o_coarse = zeros(size(coarse_t));
for i = 2:length(coarse_t)
    s1_coarse(i) = sum(nonzeros(s1(t1 < coarse_t(i) & t1 > coarse_t(i-1))));
    s2_coarse(i) = sum(nonzeros(s2(t2 < coarse_t(i) & t2 > coarse_t(i-1))));
    o_coarse(i) = sum(nonzeros(o(to < coarse_t(i) & to > coarse_t(i-1))));
end
In Matlab, I have a vector of 20 years of daily data (X) and a vector of the relevant dates (DATES). To find the mean value of the daily data per year, I use the following script:
A = fints(DATES,X); %convert to financial time series
B = toannual(A,'CalcMethod', 'SimpAvg'); %calculate average value per year
C = fts2mat(B); %Convert fts object to vector
C is a vector of 20 values showing the average of the daily data for each of the 20 years. So far, so good. Now I am trying to do the same thing, but instead of calculating annual means I need to calculate annual standard deviations, and there seems to be no such option in the toannual function.
Any ideas on how to do this?
Thank you in advance.
I'm assuming that X holds the financial data and that it is evenly distributed across the years; you'll have to modify this if that isn't the case. Just to clarify, by evenly distributed I mean that if there are 20 years and X has 200 values, each year contributes 10 values.
You should be able to do something like this:
num_years = length(C);
span_size = length(X)/num_years;    % observations per year
std_dev = zeros(num_years, 1);      % preallocate
for n = 0:num_years-1
    std_dev(n+1, 1) = std(X(1 + n*span_size : (n+1)*span_size));
end
The idea is that you simply pass the data for the given year (the day-to-day values) into Matlab's standard deviation function, which returns the std for that year. std_dev is then a column vector that lines up 1:1 with your C vector of yearly averages.
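Under the same even-distribution assumption, a vectorized equivalent (my sketch) is to reshape X so that each year occupies one column and take the std column-wise:
% each column of the reshaped matrix holds one year of data
std_dev = std(reshape(X, span_size, num_years))';    % 20x1 column vector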
Alternatively, if the years do not all contain the same number of observations, you can group by the dates themselves:
unique_Dates = unique(DATES);    % should return 20 elements if DATES holds one year identifier per observation
std_dev = zeros(size(unique_Dates));    % preallocate the standard deviation vector
for n = 1:length(unique_Dates)
    std_dev(n) = std(X(DATES == unique_Dates(n)));
end
This assumes that your DATES matrix can be passed to the unique function and that it returns the expected list of year identifiers. If the dates are in numeric form I know this will work; I'm just concerned about the dates being in string form.
In that event you can use regexp to parse the information, replace matching dates with a numeric identifier, and use the code above. Or you can take the basic idea behind this and adapt it to whatever works best for you!
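A more compact grouped computation (my addition, not part of the original answer) uses accumarray; here yrs is assumed to be a numeric column vector giving the year of each observation, with X a column vector of the same length:
[unique_years, ~, idx] = unique(yrs);      % idx maps each observation to its year group
std_dev = accumarray(idx, X, [], @std);    % one std per year, aligned with unique_years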