How to create a matrix with p-values from ANOVA

I performed an ANOVA and corrected it with Tukey's test, so I got several p-values.
Now I would like to build a heatmap from these values, and for that I need to create a matrix of the p-values.
The first question is: how do I fill a matrix with the ANOVA p-values?
Then I ran an ANCOVA and obtained other p-values.
Now I would like to make a heatmap to compare the p-values between the ANOVA and the ANCOVA.
Can someone help me?
Here is an example:
anova_model <- aov( X ~ groups , data = T1)
postHocs <- glht(anova_model, linfct = mcp(groups = "Tukey"))
summary(postHocs)
This ANOVA gave me several p-values.
ancova_model <- aov( X ~ groups + age , data = T1)
postHocs <- glht(ancova_model, linfct = mcp(groups = "Tukey"))
summary(postHocs)
This ANCOVA gave me several other p-values.
I would now like to create a heatmap to compare these p-values, to see, for example, where age interferes a lot and where it does not. I believe the ideal first step is to create a matrix, but I'm actually kind of lost.
Could someone help me?
Thank you very much
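One way to picture the matrix step, sketched in Python rather than R: take the pairwise comparisons and write each p-value into a symmetric groups-by-groups grid. This assumes the comparison labels look like "B - A" (the style Tukey contrasts are usually printed in) and that the adjusted p-values have already been pulled out of the post hoc summary. The function name pvalue_matrix and all numbers below are hypothetical placeholders, not results from the data in the question.

```python
def pvalue_matrix(pvalues, sep=" - "):
    """Build a symmetric matrix (nested lists) from pairwise p-values.

    `pvalues` maps comparison labels such as "B - A" to a p-value.
    Returns (levels, matrix); the diagonal is left as None.
    Assumes group names do not themselves contain `sep`.
    """
    levels = sorted({name for label in pvalues for name in label.split(sep)})
    index = {name: i for i, name in enumerate(levels)}
    matrix = [[None] * len(levels) for _ in levels]
    for label, p in pvalues.items():
        first, second = label.split(sep)
        i, j = index[first], index[second]
        matrix[i][j] = p
        matrix[j][i] = p  # pairwise comparisons are symmetric
    return levels, matrix


# Hypothetical placeholder values, NOT results from the question's data.
anova_p = {"B - A": 0.01, "C - A": 0.20, "C - B": 0.50}
ancova_p = {"B - A": 0.03, "C - A": 0.15, "C - B": 0.50}

levels, anova_m = pvalue_matrix(anova_p)
_, ancova_m = pvalue_matrix(ancova_p)

# Element-wise difference: where adjusting for age changes the p-value.
diff_m = [
    [None if a is None else round(b - a, 10) for a, b in zip(row_a, row_b)]
    for row_a, row_b in zip(anova_m, ancova_m)
]
```

Either matrix, or the difference matrix, can then be handed to whatever heatmap function you prefer; the same fill-by-label logic carries over to an R matrix with row and column names.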

Related

For loop for a regression model with increasing number of predictors

How can I create a loop to fit models with an increasing number of predictors? The first iteration should
use one predictor, then two, and so on until all predictors are included. I have to compute the RMSE
on both the training and test data for each model, and store these values in a list/array.
predictors = ['bedrooms','bathrooms','sqft_living','sqft_lot','floors',
              'waterfront','view','condition','grade','sqft_above',
              'sqft_basement','yr_built','yr_renovated','zipcode','lat',
              'long','sqft_living15','sqft_lot15']
models = []
formula = 'price ~ bedrooms'
for p in predictors[0:19]:
    formula = formula + p
    print(formula)
    model_linear_kc_5 = smf.ols(formula=formula, data=df_train_kc)
    models.append(model_linear_kc_5.fit())
This is my code so far, but I know it isn't right and I'm stuck on how to fix it.
I have to put print(formula) inside the loop and then adjust the formula = … line until it does what I want.
I would really appreciate help in this regard. Thank you.
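A minimal sketch of the formula-building part, assuming the goal is "price ~ bedrooms", then "price ~ bedrooms + bathrooms", and so on. The helper names incremental_formulas and rmse are hypothetical, and the predictor list is shortened for illustration. The key difference from the loop above is that the terms are joined with " + " and each formula is built from a slice of the list, so nothing is concatenated twice.

```python
def incremental_formulas(response, predictors):
    """Return one formula per step: 1 predictor, then 2, ... then all."""
    return [
        f"{response} ~ " + " + ".join(predictors[:k])
        for k in range(1, len(predictors) + 1)
    ]


def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return (sum(squared) / len(squared)) ** 0.5


predictors = ["bedrooms", "bathrooms", "sqft_living"]  # shortened list
formulas = incremental_formulas("price", predictors)
for formula in formulas:
    print(formula)
```

Each formula can then be passed to smf.ols(formula=formula, data=df_train_kc).fit() as in the question, and the fitted model's predictions on the training and test frames can be scored with rmse and appended to two lists.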

How do I add noise/variability to a dataset in Python, given the CV?

Given a dataset of blood results, say cholesterol level, and knowing that the instrument that produced those results is subject to a known degree of variability, how would I add that variability back into the dataset? i.e. I want to assume the result in the original dataset is the true/mean value, and then produce new results that are subject to the known variability of the instrument.
In Excel you use =NORM.INV(RAND(), mean, std_dev), where RAND() provides a random value between 0 and 1, "mean" will be the original value and I have the CV so I can calculate the SD. NORM.INV then provides the inverse of the cumulative normal distribution function.
I've done the following to create a new column with my new values, but would like to know if it is valid, i.e., will each row have a different random number between 0 and 1 as the probability, and is this formula equivalent to NORM.INV?
df8000['HDL_1'] = norm.ppf(random(), loc = df8000['HDL_0'], scale = TAE_df.loc[0,'HDL'])
Thanks in advance!
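A hedged sketch of the per-row version, using only the standard library. The inverse normal CDF is what NORM.INV computes, and NormalDist.inv_cdf (like norm.ppf) is the same function. The thing to watch in the line above is that a single call to random() returns one number, so if it is the plain random.random or numpy.random.random, every row would share the same probability. The sketch below draws a fresh probability for each row and assumes SD = CV × value with the CV given as a fraction; the function name and the HDL numbers are hypothetical.

```python
import random
from statistics import NormalDist


def add_instrument_noise(values, cv, rng):
    """Return one noisy result per input value.

    Each value is treated as the true mean; SD = cv * value (cv as a fraction).
    A fresh uniform probability is drawn for every row.
    """
    noisy = []
    for mean in values:
        p = rng.random()
        while p == 0.0:  # inv_cdf requires 0 < p < 1
            p = rng.random()
        noisy.append(NormalDist(mu=mean, sigma=cv * mean).inv_cdf(p))
    return noisy


rng = random.Random(42)          # seeded only to make the sketch repeatable
hdl_0 = [1.2, 1.5, 0.9, 1.1]     # hypothetical HDL results
hdl_1 = add_instrument_noise(hdl_0, cv=0.05, rng=rng)
```

If the instrument's SD is a fixed number rather than proportional to the value (as scale = TAE_df.loc[0,'HDL'] suggests), replace cv * mean with that constant.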

Eigenvalues for matrices in a for loop

I need to calculate eigenvalues of a series of matrices and then save them in a separate file. My data has 5 columns and 10,000 rows. I use the following functions:
R<-NULL
A <- setwd("c:/location of the file on this computer")
for(i in 0:1){
  X<-read.table(file="Example.prn", skip=i*5, nrow=5)
  M <- as.matrix(X)
  E=eigen(M, only.values = TRUE)
  R<-rbind(R,E)
  print(E)
}
As an example I have used a data set with 10 rows and 5 columns. This gives me the following results:
$`values`
[1] 1.350000e+02+0.000e+00i -4.000000e+00+0.000e+00i 4.365884e-15+2.395e-15i 4.365884e-15-2.395e-15i
[5] 8.643810e-16+0.000e+00i
$vectors
NULL
$`values`
[1] 2.362320e+02+0.000000e+00i -4.960046e+01+1.258757e+01i -4.960046e+01-1.258757e+01i 9.689475e-01+0.000000e+00i
[5] 1.104994e-14+0.000000e+00i
$vectors
NULL
I have three questions and I would really appreciate any help:
I want to save the results in consecutive rows, such as:
Eigenvalue(1) Eigenvalue(3) Eigenvalue(5) Eigenvalue(7) Eigenvalue(9)
Eigenvalue(2) Eigenvalue(4) Eigenvalue(6) Eigenvalue(8) Eigenvalue(10)
Any thoughts?
Also, I don't understand the eigenvalues in the output. They do not look like plain numbers. For example, one of them is 2.362320e+02+0.000000e+00i. My first thought was that this is the sum of five determinants for a 5x5 matrix. However, "2.362320e+02+0.000000e+00i" seems to have only four numbers in it. Any thoughts? Doesn't the eigen() function calculate the final values of the eigenvalues?
How can I save my output to an Excel file? I have used the following code.
However, the result I get from the current code is:
> class(R)
[1] "matrix"
> print(R)
values vectors
E Complex,5 NULL
E Complex,5 NULL
I think you can get the values with the following code:
R<-NULL
A <- setwd("c:/location of the file on this computer")
for(i in 0:1){
  X<-read.table(file="Example.prn", skip=i*5, nrow=5)
  M <- as.matrix(X)
  E=eigen(M, only.values = TRUE)
  R<-rbind(R,E$values)
}
Then use the answer to this question to save R to a file.
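For comparison, here is the same idea sketched in Python with NumPy: compute the eigenvalues of each consecutive block and stack them as rows, one row per matrix. It also shows why the output contains an "i": a value such as 2.362320e+02+0.000000e+00i is a single complex number (real part plus imaginary part), which is what you get when a non-symmetric real matrix has complex eigenvalues. The function name block_eigenvalues and the tiny 2x2 blocks are hypothetical, and the sketch assumes the row count is an exact multiple of the block size.

```python
import csv
import io

import numpy as np


def block_eigenvalues(data, block_size):
    """Eigenvalues of consecutive square blocks, one row per block."""
    data = np.asarray(data, dtype=float)
    rows = []
    for start in range(0, data.shape[0], block_size):
        block = data[start:start + block_size]
        rows.append(np.linalg.eigvals(block))
    return np.array(rows)


# Hypothetical 4x2 input: two stacked 2x2 blocks.
data = [[2, 0],
        [0, 3],
        [0, -1],
        [1, 0]]
eigs = block_eigenvalues(data, block_size=2)

# Write one row of eigenvalues per block; a StringIO stands in for a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
for row in eigs:
    writer.writerow([str(v) for v in row])
```

The first block is diagonal, so its eigenvalues are real; the second is a rotation matrix, so its eigenvalues are a complex-conjugate pair. A CSV written this way opens directly in Excel.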

Stack multiple columns into one

I want to do a simple task but somehow I'm unable to do it. Assume that I have one column like:
a
z
e
r
t
How can I create a new column with the same value twice with the following result:
a
a
z
z
e
e
r
r
t
t
I've already tried duplicating my column and doing something like:
=TRANSPOSE(SPLIT(JOIN(";",A:A,B:B),";"))
but it creates:
a
z
e
r
t
a
z
e
r
t
So far I have been working from this answer.
Try this:
=SORT({A1:A5;A1:A5})
Here we use:
sort
{} to combine data
Taking your comment into account, you can use this formula:
=QUERY(SORT(ArrayFormula({row(A1:A5),A1:A5;row(A1:A5),A1:A5})),"select Col2")
The idea is to add a helper column holding the row number, sort by that row number, then query to return only the values.
The join→split method will do the same:
=TRANSPOSE(SPLIT(JOIN(",",ARRAYFORMULA(CONCAT(A1:A5&",",A1:A5))),","))
Here we use the range only twice, so this is easier to use. Also see the Concat + ArrayFormula sample.
A few hundred rows is nothing :)
I created index from 1 to n, then pasted it twice and sorted by index. But it's obviously fancier to do it with a formula :)
Assuming your list is in column A and (for now) the repeat count is in C1 (it can be replaced by a number in the formula), then something simple like this will do (starting in B1):
=INDEX(A:A,(INT(ROW()-1)/$C$1)+1)
Simply copy down as you need it (will give just 0 after the last item). No sorting. No array. No sheets/excel problems. No heavy calculations.
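The index arithmetic behind that last formula can be sketched in a few lines of Python: output row i (counting from 0) reads source item i divided by the repeat count, rounded down. The function name repeat_each is hypothetical.

```python
def repeat_each(items, times):
    """Repeat every item `times` times, keeping the original order."""
    # Output position i (0-based) maps to source index i // times,
    # the 0-based counterpart of dividing (ROW() - 1) by the repeat count.
    return [items[i // times] for i in range(len(items) * times)]


column = ["a", "z", "e", "r", "t"]
print(repeat_each(column, 2))
```

Unlike the plain SORT approach, this keeps the original order (a, z, e, r, t) because nothing is sorted by value.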

In Stata, how do I manipulate matrix elements by their name?

In Stata, after a regression I know it is possible to call the elements of stored results by name. For example, if I want to manipulate the coefficient on the variable precip, I just type _b[precip]. My question is how do I do the same after the tabstat command? For example, say I want to multiply the coefficient on precip by the sample mean of precip:
reg --variables in regression--
tabstat --variables in regression--
mat X=r(StatTotal)
mat Y=_b[precip]*X[1,precip]
Ah, if only it were that simple. But alas, in the last line X[1, precip] is invalid syntax. Oddly, Stata does recognize display X[1, precip]. And Stata would know what I'm trying to do if instead of precip I used the column number where precip appears in the X vector. If I were just doing this operation once, no problem. But I need to do this operation several times (for several different model specifications) and for several variables which change position in the vector from one model to the next, so I cannot just use the column number.
I am not yet sure I understand exactly what you want to do, but here's my attempt to reproduce what you are doing:
sysuse auto, clear
regress price mpg foreign weight
tabstat mpg foreign weight, save
matrix X = r(StatTotal)
matrix Y = _b[mpg]*X[1, colnumb(X, "mpg") ]
If you need to put this into a loop, that's doable, too:
matrix bb = e(b)
local explvar : colnames bb
foreach x in `explvar' {
    if "`x'" != "_cons" {
        matrix Y_`x' = _b[`x'] * X[1, colnumb(X, "`x'")]
    }
    else {
        matrix Y_`x' = _b[`x']
    }
}
You'd probably want to put this into a program that you will call after each regression model estimation call, e.g.:
program define reg2mat
    // parse the required option prefix(name)
    syntax , prefix(name)
    if "`e(cmd)'" != "regress" {
        // this will intentionally produce an error
        regress
    }
    tempname bb
    matrix `bb' = e(b)
    local explvar : colnames `bb'
    foreach x in `explvar' {
        if "`x'" != "_cons" {
            matrix `prefix'_`x' = _b[`x'] * X[1, colnumb(X, "`x'")]
        }
        else {
            matrix `prefix'_`x' = _b[`x']
        }
    }
end // of reg2mat
On many levels this is not ideal, because it manipulates (global) matrices in Stata's memory; most of the time that is a bad idea, since programs should only manipulate objects local to them.
I suspect that what you want to do is addressed, in one way or another, by the all-powerful margins command, by an appropriate predict, or by matrix score (which is the low-level version of predict). Attributing the effects to a variable only makes sense when your regressors are orthogonal, which only happens in carefully designed and conducted experiments.
