I have a matrix with 400 rows and 40 columns.
I would like to create a new matrix from these data in which each entry is the concordance between two variables, i.e., concord [A1,B1]=number1; concord [A1,B2]=number2; ...; concord [A1,B39]=number39. So number1 should be the first entry in the first row of the new matrix, number2 the second entry in the first row, and so on.
The end result is a new matrix that shows the rho_c for each pair of variables in the original data matrix.
The original matrix has a lot of empty cells. I could also create several matrices, each covering a subsection of the concordance calculations; it doesn't matter much. However, I don't quite understand how to write this command in Mata.
I've searched here: http://jasoneichorst.com/wp-content/uploads/2012/01/BeginMatrix.pdf
EDIT: The data looks like this (variable "Score1" is a rater). Not all raters rate the same item.
[image of the data not reproduced]
Assuming I fully understand the question, there are methods to do this. One which comes to mind involves the use of concord available from SSC (ssc install concord) along with some local macros and loops.
/* Clear and set up sample data */
clear *
set obs 60
forvalues i = 1/6 {
    gen A`i' = runiform()
}
replace A2 = . in 10/L
replace A3 = . in 1/5
replace A3 = . in 20/L
replace A4 = . in 1/20
replace A4 = . in 30/L
replace A5 = . in 1/15
replace A5 = . in 40/L
replace A6 = . in 1/40
/* End data set-up */
* describe, varlist will allow you to store your variables in a local macro
qui describe, varlist
local vars `r(varlist)'
* get number of variables in local macro vars
local varcount : word count `vars'
* Create a matrix to hold rho_c
mat rho = J(`varcount', `varcount', .)
mat rownames rho = `vars'
mat colnames rho = `vars'
* Loop through vars to run concord on all unique combinations of A1-A6,
* using the position of each variable in local vars to assign the var name
* to local x and local y.
* concord is executed only for j >= i so that the same pair of variables
* is not run twice (e.g., A1,A2 and A2,A1).
* rho_c is stored only when concord succeeds (_rc == 0); otherwise the cell stays missing.
forvalues i = 1/`varcount' {
    local y `: word `i' of `vars''
    forvalues j = 1/`varcount' {
        local x `: word `j' of `vars''
        if `j' >= `i' {
            capture noisily concord `y' `x'
            if _rc == 0 {
                mat rho[`i',`j'] = r(rho_c)
            }
        }
    }
}
* Display the results stored in the matrix, rho.
mat list rho
The above code should get you started, but there may need to be changes made depending on exactly what you want to do.
You will notice that inside the loop I have included capture noisily before concord. This is because, in the image you linked to, your variables have missing values across entire sections of observations, so some pairs will have no observations in common and concord will stop with an error (specifically, r(2000): no observations). capture forces Stata to continue executing the loop if an error occurs there, and noisily tells Stata to display the output from concord even though capture was specified.
Also, help concord in Stata shows that the concordance correlation coefficient is stored in r(rho_c). You can store these values as individual scalars inside the loop or, as in the example, collect them in a k x k matrix.
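To make explicit what each cell of the matrix holds, here is a minimal Python sketch of the same upper-triangle loop. It is not the concord implementation: it assumes Lin's concordance correlation coefficient with 1/n variances (concord's output may differ in small-sample details), represents missing values as None, and the helper names lin_ccc and pairwise_ccc are hypothetical.

```python
def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for two paired lists.

    Pairs with a missing value (None) are dropped. Returns None when fewer
    than two complete pairs remain or when the denominator is zero.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs) / n
    syy = sum((b - my) ** 2 for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs) / n
    denom = sxx + syy + (mx - my) ** 2
    if denom == 0:
        return None
    return 2 * sxy / denom


def pairwise_ccc(columns):
    """columns: dict mapping variable name -> list of values (None = missing).

    Returns (names, rho), where rho[i][j] is filled only for j >= i,
    mirroring the upper-triangle loop in the Stata example.
    """
    names = list(columns)
    k = len(names)
    rho = [[None] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            rho[i][j] = lin_ccc(columns[names[i]], columns[names[j]])
    return names, rho
```

Pairs of raters with no items in common simply come back as None, which plays the role of the missing cells in the Stata matrix.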
In Stata, after running the xthtaylor command, the command
matrix regtab = r(table)
yields an empty matrix. I think this is because the output of this command is multilevel.
Being new to Stata, I haven't found how to fix this. The purpose here is to extract the coefficients and standard errors to add them to another output (as is done in the accepted solution of How do I create a table wth both plain and robust standard errors?)
To expand on Nick's point: matrix regtab = r(table) gives you an empty matrix, because xthtaylor doesn't put anything into r(table).
To see this run the following example:
clear all // empties r(table) and everything else
webuse psidextract
* the example regression from `help xthtaylor`
xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, endog(exp exp2 occ ind union ed) constant(fem blk ed)
return list doesn't have anything in r(table), but ereturn list will show you that you have access to the coefficients through e(b) and the variance-covariance matrix through e(V).
You can assign these to their own matrices as follows:
matrix betas = e(b)
matrix varcovar = e(V)
Then you can use matrix commands (see help matrix) to manipulate these matrices.
As you discovered, ereturn display creates r(table) which appears quite convenient for your use. It's worth taking a look at help return for more information about the differences between the contents of return list and ereturn list.
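Since the goal is a table of coefficients and standard errors, it may help to recall that the standard errors are the square roots of the diagonal of the variance-covariance matrix. A minimal Python sketch of that arithmetic follows; the numbers are made up for illustration and the helper name standard_errors is hypothetical.

```python
import math


def standard_errors(vcov):
    """Square roots of the diagonal of a variance-covariance matrix."""
    return [math.sqrt(vcov[i][i]) for i in range(len(vcov))]


# Made-up values standing in for e(b) and e(V); not real estimates.
betas = [0.5, -1.2]
varcovar = [[0.04, 0.01],
            [0.01, 0.09]]

se = standard_errors(varcovar)
for b, s in zip(betas, se):
    print(f"coef = {b:6.3f}   se = {s:6.3f}")
```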
I wrote a function that acts on each combination of columns in an input matrix. It uses multiple for loops and is very slow, so I am trying to parallelize it to use the maximum number of threads on my computer.
I am having difficulty finding the correct syntax to set this up. I'm using the parallel package in Octave, and have tried several ways to set up the calls. Here are two of them, in a simplified form, as well as a non-parallel version that I believe works:
function A = parallelExample(M)
  pkg load parallel;
  # Get total count of columns
  ct = columns(M);
  # Generate column pairs
  I = nchoosek([1:ct],2);
  ops = rows(I);
  slice = ones(1, ops);
  Ic = mat2cell(I, slice, 2);
  ## # Non-parallel
  ## A = zeros(1, ops);
  ## for i = 1:ops
  ##   A(i) = cmbtest(Ic{i}, M);
  ## endfor
  # Parallelized call v1
  A = parcellfun(nproc, @cmbtest, Ic, {M});
  ## # Parallelized call v2
  ## afun = @(x) cmbtest(x, M);
  ## A = parcellfun(nproc, afun, Ic);
endfunction
# function to apply
function P = cmbtest(indices, matrix)
  colset = matrix(:,indices);
  product = colset(:,1) .* colset(:,2);
  P = sum(product);
endfunction
For both of these examples I generate every combination of two columns and convert those pairs into a cell array that the parcellfun function should split up. In the first, I attempt to convert the input matrix M into a 1x1 cell array so it goes to each parallel instance in the same form. I get the error 'C must be a cell array' but this must be internal to the parcellfun function. In the second, I attempt to define an anonymous function that includes the matrix. The error I get here specifies that 'cmbtest' is undefined.
(Naturally, the actual function I'm trying to apply is far more complex than cmbtest here)
Other things I have tried:
Put M into a global variable so it doesn't need to be passed. Seemed to be impossible to put a global variable in a function file, though I may just be having syntax issues.
Make cmbtest a nested function so it can access M (parcellfun doesn't support that)
I'm out of ideas at this point and could use help figuring out how to get this to work.
Converting my comments above to an answer.
When performing parallel operations, it is useful to think of each parallel worker as a separate and independent Octave instance, which needs access to every function and variable it requires in order to do its work.
Therefore, do not rely on subfunctions when calling parcellfun from a main function: a worker may be unable to access a subfunction defined inside another function's file, which leads to errors such as the 'undefined' one above.
In this case, separating the subfunction into its own file fixed the problem.
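The same principle shows up in other environments. As an illustration only (this uses Python's standard concurrent.futures module, not Octave's parallel package), the sketch below defines the worker function at module top level so that each worker process can look it up by name; a nested function or lambda could not be pickled and sent to the workers. The function names mirror the question's code but are otherwise hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from itertools import combinations


def cmbtest(indices, matrix):
    """Sum of the elementwise product of two columns of a row-major matrix."""
    i, j = indices
    return sum(row[i] * row[j] for row in matrix)


def column_pairs(n_cols):
    """Every unordered pair of column indices."""
    return list(combinations(range(n_cols), 2))


def serial_example(matrix):
    """Reference version: apply cmbtest to each column pair in turn."""
    return [cmbtest(pair, matrix) for pair in column_pairs(len(matrix[0]))]


def parallel_example(matrix, workers=None):
    """Same computation, with the pairs spread across worker processes.

    partial() of a top-level function can be pickled, so the matrix is
    shipped to the workers together with the function reference.
    """
    pairs = column_pairs(len(matrix[0]))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial(cmbtest, matrix=matrix), pairs))
```

When run as a script, parallel_example should be called from under an `if __name__ == "__main__":` guard so that worker processes do not re-run the call on import.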
I am writing a program in IDL that needs to read n images (each of m pixels) from a directory, convert them to grayscale, flatten each image into a single vector, and then form an m * n matrix from the data.
So far I have managed to read and convert a single image to a grayscale vector, but I can't figure out how to extend this to reading multiple image files.
Can anyone advise on how I could adapt my code in order to do this?
(The image files will all be of the same size, and stored in the same directory with convenient filenames - i.e. testpicture1, testpicture2, etc)
Thanks
pro readimage
  image = READ_IMAGE('Z:\My Documents\testpicture.jpg')
  redChannel = REFORM(image[0, *, *])
  greenChannel = REFORM(image[1, *, *])
  blueChannel = REFORM(image[2, *, *])
  grayscaleImage = BYTE(0.299*FLOAT(redChannel) + $
    0.587*FLOAT(greenChannel) + 0.114*FLOAT(blueChannel))
  imageVec = grayscaleImage[*]
end
Use FILE_SEARCH to find the names and number of the images of the given name:
filenames = FILE_SEARCH('Z:\My Documents\testpicture*.jpg', count=nfiles)
You will probably also want to declare an array to hold your results, where m is the number of pixels in each image:
imageVec = bytarr(m, nfiles)
Then loop over the files with a FOR loop, doing what you are doing already for each file:
for f = 0L, nfiles - 1L do begin
  image = READ_IMAGE(filenames[f])
  ; grayscale conversion as in your existing code
  imageVec[*, f] = grayscaleImage[*]
endfor
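For readers who want the data layout spelled out, here is a minimal pure-Python sketch of the same steps. It works on small in-memory RGB arrays rather than files read from disk, so no image library is assumed, and the function names are hypothetical.

```python
def to_gray_vector(rgb):
    """Flatten an image into a grayscale vector, row by row.

    rgb is a list of rows, each row a list of (r, g, b) tuples.
    The weighted sum is truncated toward zero by int().
    """
    return [int(0.299 * r + 0.587 * g + 0.114 * b)
            for row in rgb for (r, g, b) in row]


def stack_images(images):
    """Build an m x n matrix (m pixels, n images); column f holds image f."""
    vectors = [to_gray_vector(img) for img in images]
    m = len(vectors[0])
    if any(len(v) != m for v in vectors):
        raise ValueError("all images must have the same number of pixels")
    return [[v[p] for v in vectors] for p in range(m)]


# Two tiny 1 x 2 "images" standing in for files read from disk.
images = [
    [[(10, 20, 30), (0, 0, 0)]],
    [[(200, 100, 50), (255, 0, 0)]],
]
matrix = stack_images(images)
```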
I'm currently using Stata 13.1 to examine a long list of float variables (e.g., A1 - A60). Each of these variables represents the frequency of a different medical symptom (e.g., "Insomnia", "Anxiety", "Nausea"). I'd like to add labels to each variable to make data analysis a bit easier, but would prefer something more elegant than:
label var A1 "Insomnia"
label var A2 "Anxiety"
.
.
.
label var A60 "Nausea"
Any suggestions are very much appreciated!
First, you need to store the labels somewhere. You can use a local macro for that. Below is an example with variables that follow a naming pattern (as yours do).
clear
set more off
*----- example data -----
gen A1 = .
gen A2 = .
gen A3 = .
*----- what you want -----
local mylabels "Insomnia Anxiety Nausea"
local n: word count `mylabels'
forvalues i = 1/`n' {
    label variable A`i' "`: word `i' of `mylabels''"
}
describe
The looping over parallel lists technique is from: http://www.stata.com/support/faqs/programming/looping-over-parallel-lists/.
See also help macro and help extended_fcn.
In Stata, after a regression I know it is possible to call the elements of stored results by name. For example, if I want to manipulate the coefficient on the variable precip, I just type _b[precip]. My question is how do I do the same after the tabstat command? For example, say I want to multiply the coefficient on precip by the sample mean of precip:
reg --variables in regression--
tabstat --variables in regression--
mat X=r(StatTotal)
mat Y=_b[precip]*X[1,precip]
Ah, if only it were that simple. But alas, in the last line X[1, precip] is invalid syntax. Oddly, Stata does recognize display X[1, precip]. And Stata would know what I'm trying to do if instead of precip I used the column number where precip appears in the X vector. If I were just doing this operation once, no problem. But I need to do this operation several times (for several different model specifications) and for several variables which change position in the vector from one model to the next, so I cannot just use the column number.
I am not yet sure I understand exactly what you want to do, but here's my attempt to reproduce what you are doing:
sysuse auto, clear
regress price mpg foreign weight
tabstat mpg foreign weight, save
matrix X = r(StatTotal)
matrix Y = _b[mpg]*X[1, colnumb(X, "mpg") ]
If you need to put this into a loop, that's doable, too:
matrix bb = e(b)
local explvar : colnames bb
foreach x in `explvar' {
    if "`x'" != "_cons" {
        matrix Y_`x' = _b[`x'] * X[1, colnumb(X, "`x'")]
    }
    else {
        matrix Y_`x' = _b[`x']
    }
}
You'd probably want to put this into a program that you will call after each regression model estimation call, e.g.:
program define reg2mat
    // parse the required option prefix(), used to name the output matrices
    syntax , prefix(name)
    if "`e(cmd)'" != "regress" {
        // this will intentionally produce an error
        regress
    }
    tempname bb
    matrix `bb' = e(b)
    local explvar : colnames `bb'
    foreach x in `explvar' {
        if "`x'" != "_cons" {
            matrix `prefix'_`x' = _b[`x'] * X[1, colnumb(X, "`x'")]
        }
        else {
            matrix `prefix'_`x' = _b[`x']
        }
    }
end // of reg2mat
On many levels this is not ideal, because it manipulates (global) matrices in Stata's memory; most of the time that is a bad idea, since programs should only manipulate objects local to them.
I suspect that what you want to do is addressed, in one way or another, by the all-powerful margins command, by an appropriate predict, or by matrix score (which is the low-level version of predict). Attributing the effects to a variable only makes sense when your regressors are orthogonal, which only happens in carefully designed and conducted experiments.