Convert a multilevel output from xthtaylor to a matrix in Stata - matrix

In Stata, after running the xthtaylor command, the command
matrix regtab = r(table)
yields an empty matrix. I think this is because of the multilevel of the output of this command
Being new to Stata, I haven't found how to fix this. The purpose here is to extract the coeffecient and standard errors to add them to another output (as is done in the accepted solution of How do I create a table wth both plain and robust standard errors?)

To expand on Nick's point: matrix regtab = r(table) gives you an empty matrix, because xthtaylor doesn't put anything into r(table).
To see this run the following example:
clear all // empties r(table) and everything else
webuse psidextract
* the example regression from `help xthtaylor`
xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, endog(exp exp2 occ ind union ed) constant(fem blk ed)
return list doesn't have anything in r(table), but ereturn list will show you that you have access to the coefficients through e(b) and the variance-covariance matrix through e(V).
You can assign these to their own matrices as follows:
matrix betas = e(b)
matrix varcovar = e(V)
Then you can use matrix commands (see help matrix) to manipulate these matrices.
As you discovered, ereturn display creates r(table) which appears quite convenient for your use. It's worth taking a look at help return for more information about the differences between the contents of return list and ereturn list.

Related

How to create a Matrix with p values from anova

I performed an ANOVA and corrected it with Tukey's test, so I got several values ​​of P.
Now I would like to build a Heatmap with these values ​​and for that I need to create an matrix with the values ​​of P to be able to make my Heat map
The first question would be how to fill a matrix with the anova p-values?
Then I made an ancova and obtained other p-values.
Now I would like to make a heatmap to compare these p-values ​​between the anova and the ancova.
Can someone help me ?
I will exemplify
anova_model <- aov( X ~ groups , data = T1)
postHocs <- glht(anova_model, linfct = mcp(groups = "Tukey"))
summary(postHocs)
This anova gave me several values ​​of P(!)
ancova_model <- aov( X ~ groups + age , data = T1)
postHocs <- glht(ancova_model, lymphct = mcp(groups = "Tukey"))
summary(postHocs)
This ancova gave me several other values ​​of P(!)
I would now like to create a Heat map to compare these P values. To see for example when age interferes a lot or not. I believe that before the ideal is to create a matrix before but I'm actually kind of lost.
Could someone help me?
Thank you very much

Extrapolating variance components from Weir-Fst on Vcftools

vcftools --vcf ALL.chr1.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf --weir-fst-pop POP1.txt --weir-fst-pop POP2.txt --out fst.POP1.POP2
The above script computes Fst distances on 1000 Genomes population data using Weir and Cokerham's 1984 formula. This formula uses 3 variance components, namely a,b,c (between populations; between individuals within populations; between gametes within individuals within populations).
The output directly provides the result of the formula but not the components that the program calculated to arrive at the final result. How can I ask Vcftools to output the values for a,b,c?
If you can get the data into the format for hierfstat, you can get the variance components from varcomp.glob. What I normally do is:
use vcftools with --012 to get genotypes
convert 0/1/2/-1 to hierfstat format (eg., 11/12/22/NA)
load the data into hierfstat and compute (see below)
R example:
library(hierfstat)
data = read.table("hierfstat.txt", header=T, sep="\t")
levels = data.frame(data$popid)
loci = data[,2:ncol(data)]
res = varcomp.glob(levels=levels, loci=loci, diploid=T)
print(res$loc)
print(res$F)
Fst for each locus (row) therefore is (without hierarchical design), from res$loc: res$loc[1]/sum(res$loc). If you have more complicated sampling, you'll need to interpret the variance components differently.
--update per your comment--
I do this in Pandas, but any language would do. It's a text replacement exercise. Just get your .012 file into a dataframe and convert as below. I read in row by row into numpy b/c I have tons of snps, but read_csv would work, too.
import pandas as pd
import numpy as np
z12_data = []
for i, line in enumerate(open(z12_file)):
line = line.strip()
line = [int(x) for x in line.split("\t")]
z12_data.append(np.array(line))
if i % 10 == 0:
print i
z12_data = np.array(z12_data)
z12_df = pd.DataFrame(z12_data)
z12_df = z12_df.drop(0, axis=1)
z12_df.columns = pd.Series(z12_df.columns)-1
hierf_trans = {0:11, 1:12, 2:22, -1:'NA'}
def apply_hierf_trans(series):
return [hierf_trans[x] if x in hierf_trans else x for x in series]
hierf = df.apply(apply_hierf_trans)
hierf.to_csv("hierfstat.txt", header=True, index=False, sep="\t")
Then, you'd read that file hierfstat.txt into R, these are your loci. You'd need to specify your levels in your sampling design (e.g., your population). Then call varcomp.glob() to get the variance components. I have a parallel version of this here if you want to use it.
Note that you are specifying 0 as the reference allele, in this case. May be what you want, maybe not. I often calculate minor allele frequency and make 2 the minor allele, but it depends on your study goal.

How to setup my function with blockproc to process the image in parts?

I have an image:
I want to divide this image into 3 equal parts and calculate the SIFT for each part individually and then concatenate the results.
I found out that Matlab's blockproc does just that, but I do not know how to get it to work with my function. Here is what I have:
[r c] = size(image);
c_new = floor(c/3); %round it
B = blockproc(image, [r c_new], #block_fun)
So according to Matlabs documentation the function, block_fun will be applied to the original image in blocks of size r and c_new.
this is what I wrote as block_fun
function feats = block_fun(img)
[keypoints, descriptors] = vl_sift(single(img));
feats = descriptors;
end
So, my matrix B should be a concatenation of the SIFT descriptors of all three parts of the same image? right?
But the error that I get when I run the command:
B = blockproc(image, [r c_new], #block_fun)
Function BLOCKPROC encountered an error while evaluating the user
supplied function handle, FUN.
The cause of the error was:
Error using single Conversion to single from struct is not possible.
For your custom function, blockproc sends in a structure where the image data is stored in a field called data. As such, you simply need to change your function so that it accesses the data field in the input. Like so:
function feats = block_fun(block_struct) %// Change
[keypoints, descriptors] = vl_sift(single(block_struct.data)); %// Change
feats = descriptors;
end
This error is caused by the fact that the function that is called via its handle by blockproc expects a block struct.
The real problem is that blockproc will attempt to concatenate all results and you will have a different set of 128xN feature vectors for each block, which blockproc doesn't allow.
I think that using im2col and reshape would be much more simple.

Stata: why is my matrix not clearing over the foreach loop

When I run the following code, the two output matrices (diffInDiffOne & diffInDiffTwo) are the same. My guess is that coeffs is not being replaced after each loop but I have no idea why . I think that the coefficients matrix is being overwritten but I have no idea how. I tried changing the for loop order but this surprisingly didn't solve my issue either:
local treatments treat_one treat_two
matrix diffInDiffOne = J(1,9,.)
matrix diffInDiffTwo = J(1,9,.)
foreach treatment in `treatments' {
reg science inSchool#`treatment'#male
matrix coeffs=e(b)
if treat_one==`treatment'{
matrix diffInDiffOne = diffInDiffOne\coeffs
}
if treat_two==`treatment'{
matrix diffInDiffTwo = diffInDiffTwo\coeffs
}
}
matrix list diffInDiffOne
matrix list diffInDiffTwo
When I list the matrix they are both the same, depsite the fact that two regressions give different answers. Any help with this issue is much appreciated. Thanks
This code appears at first sight to reduce to
reg science inSchool#treat_one#male
matrix li e(b)
reg science inSchool#treat_two#male
matrix li e(b)
apart from the detail of adding nine missing values to the matrix.
However, that is not your code, so what is biting you? I guess at something much more subtle.
You should need to be very careful with the if command. Variables evaluated in if commands are evaluated in their first observation. So, the first time round the loop
the conditions are
if treat_one[1] == treat_one[1]
if treat_two[1] == treat_one[1]
The second time, it is
if treat_one[1] == treat_two[1]
if treat_two[1] == treat_two[1]
If it is true in your data that treat_one[1] == treat_two[1] the effect will not be as you may imagine.
If you want to test for equality of strings, do something like
if "`treatment'" == "treat_one"
You may have in mind something more like
foreach treatment in treat_one treat_two {
reg science inSchool#`treatment'#male
matrix `treatment' = e(b)
matrix list `treatment`
}
You seem to be wanting to write very complicated code for rather simple problems. A while back, I recommended thinking in terms of do-files rather than programs. That may be advice to reconsider.

In Stata, how do I manipulate matrix elements by their name?

In Stata, after a regression I know it is possible to call the elements of stored results by name. For example, if I want to manipulate the coefficient on the variable precip, I just type _b[precip]. My question is how do I do the same after the tabstat command? For example, say I want to multiply the coefficient on precip by the sample mean of precip:
reg --variables in regression--
tabstat --variables in regression--
mat X=r(StatTotal)
mat Y=_b[precip]*X[1,precip]
Ah, if only it were that simple. But alas, in the last line X[1, precip] is invalid syntax. Oddly, Stata does recognize display X[1, precip]. And Stata would know what I'm trying to do if instead of precip I used the column number where precip appears in the X vector. If I were just doing this operation once, no problem. But I need to do this operation several times (for several different model specifications) and for several variables which change position in the vector from one model to the next, so I cannot just use the column number.
I am not yet sure I understand exactly what you want to do, but here's my attempt to reproduce what you are doing:
sysuse auto, clear
regress price mpg foreign weight
tabstat mpg foreign weight, save
matrix X = r(StatTotal)
matrix Y = _b[mpg]*X[1, colnumb(X, "mpg") ]
If you need to put this into a cycle, that's doable, too:
matrix bb = e(b)
local explvar : colnames bb
foreach x in `explvar' {
if "`x'" != "_cons" {
matrix Y_`x' = _b[`x'] * X[1, colnumb(X, "`x'")]
}
else {
matrix Y_`x' = _b[`x']
}
}
You'd probably want to put this into a program that you will call after each regression model estimation call, e.g.:
program define reg2mat , prefix( name )
if "`e(cmd)'" != "regress" {
// this will intentionally produce an error
regress
}
tempname bb
matrix `bb' = e(b)
local explvar : colnames `bb'
foreach x in `explvar' {
if "`x'" != "_cons" {
matrix `prefix'_`x' = _b[`x'] * X[1, colnumb(X, "`x'")]
}
else {
matrix `prefix'_`x' = _b[`x']
}
}
end // of reg2mat
At many levels, it is not ideal, as it manipulates with the (global) matrices in Stata memory; most of the time, it is a bad idea, as the programs should only manipulate with objects local to them.
I suspect that what you want to do is addressed, in one way or another, by either omnipowerful margins command, or by an appropriate predict, or by matrix score (which is the low level version of predict). Attributing the effects to a variable only makes sense when your regressors are orthogonal, which only happens in carefully designed and conducted experiments.

Resources