Stata: why is my matrix not clearing over the foreach loop - for-loop

When I run the following code, the two output matrices (diffInDiffOne & diffInDiffTwo) are the same. My guess is that coeffs is not being replaced after each loop but I have no idea why . I think that the coefficients matrix is being overwritten but I have no idea how. I tried changing the for loop order but this surprisingly didn't solve my issue either:
local treatments treat_one treat_two
matrix diffInDiffOne = J(1,9,.)
matrix diffInDiffTwo = J(1,9,.)
foreach treatment in `treatments' {
reg science inSchool#`treatment'#male
matrix coeffs=e(b)
if treat_one==`treatment'{
matrix diffInDiffOne = diffInDiffOne\coeffs
}
if treat_two==`treatment'{
matrix diffInDiffTwo = diffInDiffTwo\coeffs
}
}
matrix list diffInDiffOne
matrix list diffInDiffTwo
When I list the matrix they are both the same, depsite the fact that two regressions give different answers. Any help with this issue is much appreciated. Thanks

This code appears at first sight to reduce to
reg science inSchool#treat_one#male
matrix li e(b)
reg science inSchool#treat_two#male
matrix li e(b)
apart from the detail of adding nine missing values to the matrix.
However, that is not your code, so what is biting you? I guess at something much more subtle.
You should need to be very careful with the if command. Variables evaluated in if commands are evaluated in their first observation. So, the first time round the loop
the conditions are
if treat_one[1] == treat_one[1]
if treat_two[1] == treat_one[1]
The second time, it is
if treat_one[1] == treat_two[1]
if treat_two[1] == treat_two[1]
If it is true in your data that treat_one[1] == treat_two[1] the effect will not be as you may imagine.
If you want to test for equality of strings, do something like
if "`treatment'" == "treat_one"
You may have in mind something more like
foreach treatment in treat_one treat_two {
reg science inSchool#`treatment'#male
matrix `treatment' = e(b)
matrix list `treatment`
}
You seem to be wanting to write very complicated code for rather simple problems. A while back, I recommended thinking in terms of do-files rather than programs. That may be advice to reconsider.

Related

Julia multiply 2 matrices in a for loop with a formula

Hello hopefully someone can help me... I am absolutely new to programming.
I just want to multiply two matrices in Julia with a for loop. It is about the frequency of a oscillation table at continuous casting in a steel plant. The below function calculates the frequency (=freqres) for a given mold stroke and casting speed(vc). The result is the frequency of the oscillation table for a given casting speed (from 1000mm/min to 6000mm/min).
vc = [1000:100:6000;]
function freqcalc(vc,stroke)
for i in vc
push!(freqres, 3i/4stroke)
end
end
Now I want to calculate the negative strip time with the following formula:
nst = (60/(pi*freqres))*acos(vc/(pi*stroke*freqres))
With the below command I did not achieve the desired result, I think due to a nested condition.
nstres = Float64[]
for i in vc, j in freqres
push!(nstres, (60/(pi*j))*acos(i/(pi*stroke*j)))
end
I just want to have the result of every entry of vc and freqres of the above formula in nstres.
push!(nstres, (60/(pi*freqres1))*acos(vc1/(pi*stroke*freqres1)))
push!(nstres, (60/(pi*freqres2))*acos(vc2/(pi*stroke*freqres2)))
I hope I could clarify my question, and thanks for your comments, I very appreciate it.

Designing an algorithm to check combinations

I´m having serious performance issues with a job that is running everyday and I think i cannot improve the algorithm; so I´m gonnga explain you what is the problem to solve and the algorithm we have, and maybe you have some other ideas to solve the problem better.
So the problem we have to solve is:
There is a set of Rules, ~ 120.000 Rules.
Every rule has a set of combinations of Codes. Codes are basically strings. So we have ~8 combinations per rule. Example of a combination: TTAAT;ZZUHH;GGZZU;WWOOF;SSJJW;FFFOLL
There is a set of Objects, ~800 objects.
Every object has a set of ~200 codes.
We have to check for every Rule, if there is at least one Combination of Codes that is fully contained in the Objects. It means =>
loop in Rules
Loop in Combinations of the rule
Loop in Objects
every code of the combination found in the Object? => create relationship rule/object and continue with the next object
end of loop
end of loop
end of loop
For example, if we have the Rule with this combination of two codes: HHGGT; ZZUUF
And let´s say we have an object with this codes: HHGGT; DHZZU; OIJUH; ZHGTF; HHGGT; JUHZT; ZZUUF; TGRFE; UHZGT; FCDXS
Then we create a relationship between the Object and the Rule because every code of the combination of the rule is contained in the codes of the object => this is what the algorithm has to do.
As you can see this is quite expensive, because we need 120.000 x 8 x 800 = 750 millions of times in the worst-case scenario.
This is a simplified scenario of the real problem; actually what we do in the loops is a little bit more complicated, that´s why we have to reduce this somehow.
I tried to think in a solution but I don´t have any ideas!
Do you see something wrong here?
Best regards and thank you for the time :)
Something like this might work better if I'm understanding correctly (this is in python):
RULES = [
['abc', 'def',],
['aaa', 'sfd',],
['xyy', 'eff',]]
OBJECTS = [
('rrr', 'abc', 'www', 'def'),
('pqs', 'llq', 'aaa', 'sdr'),
('xyy', 'hjk', 'fed', 'eff'),
('pnn', 'rrr', 'mmm', 'qsq')
]
MapOfCodesToObjects = {}
for obj in OBJECTS:
for code in obj:
if (code in MapOfCodesToObjects):
MapOfCodesToObjects[code].add(obj)
else:
MapOfCodesToObjects[code] = set({obj})
RELATIONS = []
for rule in RULES:
if (len(rule) == 0):
continue
if (rule[0] in MapOfCodesToObjects):
ValidObjects = MapOfCodesToObjects[rule[0]]
else:
continue
for i in range(1, len(rule)):
if (rule[i] in MapOfCodesToObjects):
codeObjects = MapOfCodesToObjects[rule[i]]
else:
ValidObjects = set()
break
ValidObjects = ValidObjects.intersection(codeObjects)
if (len(ValidObjects) == 0):
break
for vo in ValidObjects:
RELATIONS.append((rule, vo))
for R in RELATIONS:
print(R)
First you build a map of codes to objects. If there are nObj objects and nCodePerObj codes on average per object, this takes O(nObj*nCodePerObj * log(nObj*nCodePerObj).
Next you iterate through the rules and look up each code in each rule in the map you built. There is a relation if a certain object occurs for every code in the rule, i.e. if it is in the set intersection of the objects for every code in the rule. Since hash lookups have O(1) time complexity on average, and set intersection has time complexity O(min of the lengths of the 2 sets), this will take O(nRule * nCodePerRule * nObjectsPerCode), (note that is nObjectsPerCode, not nCodePerObj, the performance gets worse when one code is included in many objects).

An efficient, optimized code for matching rows in a huge matrix

I have a huge matrix on which I need to do some matching operation. Here's the code I have written and it works fine, but I think there is some room to make it more optimized or write another code that does the matching in less time. Could you please help me with that?
rowsMatched = find(bigMatrix(:, 1) == matchingRow(1, 1)
& bigMatrix(:, 2) == matchingRow(1, 2)
& bigMatrix(:, 3) == matchingRow(1, 3))
The problem with this code is that I cannot use && operand. So, in case one of the columns do not match, the program still checks the next condition. How can I avoid this?
Update: Here's the solution to this problem:
rowsMatched = find(all(bsxfun(#eq, bigMatrix, matchingRow),2));
Thank you
You can use BSXFUN to do it in a vectorized manner:
rowsMatched = find(all(bsxfun(#eq, bigMatrix, matchingRow),2));
Notice that it will work for any number of columns, just matchingRow should have the same number of columns as bigMatrix.

Removing a "row" from a structure array

This is similar to a question I asked before, but is slightly different:
So I have a very large structure array in matlab. Suppose, for argument's sake, to simplify the situation, suppose I have something like:
structure(1).name, structure(2).name, structure(3).name structure(1).returns, structure(2).returns, structure(3).returns (in my real program I have 647 structures)
Suppose further that structure(i).returns is a vector (very large vector, approximately 2,000,000 entries) and that a condition comes along where I want to delete the jth entry from structure(i).returns for all i. How do you do this? or rather, how do you do this reasonably fast? I have tried some things, but they are all insanely slow (I will show them in a second) so I was wondering if the community knew of faster ways to do this.
I have parsed my data two different ways; the first way had everything saved as cell arrays, but because things hadn't been working well for me I parsed the data again and placed everything as vectors.
What I'm actually doing is trying to delete NaN data, as well as all data in the same corresponding row of my data file, and then doing the very same thing after applying the Hampel filter. The relevant part of my code in this attempt is:
for i=numStock+1:-1:1
for j=length(stock(i).return):-1:1
if(isnan(stock(i).return(j)))
for k=numStock+1:-1:1
stock(k).return(j) = [];
end
end
end
stock(i).return = sort(stock(i).return);
stock(i).returnLength = length(stock(i).return);
stock(i).medianReturn = median(stock(i).return);
stock(i).madReturn = mad(stock(i).return,1);
end;
for i=numStock:-1:1
for j = length(stock(i+1).volume):-1:1
if(isnan(stock(i+1).volume(j)))
for k=numStock:-1:1
stock(k+1).volume(j) = [];
end
end
end
stock(i+1).volume = sort(stock(i+1).volume);
stock(i+1).volumeLength = length(stock(i+1).volume);
stock(i+1).medianVolume = median(stock(i+1).volume);
stock(i+1).madVolume = mad(stock(i+1).volume,1);
end;
for i=numStock+1:-1:1
for j=stock(i).returnLength:-1:1
if (abs(stock(i).return(j) - stock(i).medianReturn) > 3*stock(i).madReturn)
for k=numStock+1:-1:1
stock(k).return(j) = [];
end
end;
end;
end;
for i=numStock:-1:1
for j=stock(i+1).volumeLength:-1:1
if (abs(stock(i+1).volume(j) - stock(i+1).medianVolume) > 3*stock(i+1).madVolume)
for k=numStock:-1:1
stock(k+1).volume(j) = [];
end
end;
end;
end;
However, this returns an error:
"Matrix index is out of range for deletion.
Error in Failure (line 110)
stock(k).return(j) = [];"
So instead I tried by parsing everything in as vectors. Then I decided to try and delete the appropriate entries in the vectors prior to building the structure array. This isn't returning an error, but it is very slow:
%% Delete bad data, Hampel Filter
% Delete bad entries
id=strcmp(returns,'');
returns(id)=[];
volume(id)=[];
date(id)=[];
ticker(id)=[];
name(id)=[];
permno(id)=[];
sp500(id) = [];
id=strcmp(returns,'C');
returns(id)=[];
volume(id)=[];
date(id)=[];
ticker(id)=[];
name(id)=[];
permno(id)=[];
sp500(id) = [];
% Convert returns from string to double
returns=cellfun(#str2double,returns);
sp500=cellfun(#str2double,sp500);
% Delete all data for which a return is not a number
nanid=isnan(returns);
returns(nanid)=[];
volume(nanid)=[];
date(nanid)=[];
ticker(nanid)=[];
name(nanid)=[];
permno(nanid)=[];
% Delete all data for which a volume is not a number
nanid=isnan(volume);
returns(nanid)=[];
volume(nanid)=[];
date(nanid)=[];
ticker(nanid)=[];
name(nanid)=[];
permno(nanid)=[];
% Apply the Hampel filter, and delete all data corresponding to
% observations deleted by the filter.
medianReturn = median(returns);
madReturn = mad(returns,1);
for i=length(returns):-1:1
if (abs(returns(i) - medianReturn) > 3*madReturn)
returns(i) = [];
volume(i)=[];
date(i)=[];
ticker(i)=[];
name(i)=[];
permno(i)=[];
end;
end
medianVolume = median(volume);
madVolume = mad(volume,1);
for i=length(volume):-1:1
if (abs(volume(i) - medianVolume) > 3*madVolume)
returns(i) = [];
volume(i)=[];
date(i)=[];
ticker(i)=[];
name(i)=[];
permno(i)=[];
end;
end
As I said, this is very slow, probably because I'm using a for loop on a very large data set; however, I'm not sure how else one would do this. Sorry for the gigantic post, but does anyone have a suggestion as to how I might go about doing what I'm asking in a reasonable way?
EDIT: I should add that getting the vector method to work is probably preferable, since my aim is to put all of the return vectors into a matrix and get all of the volume vectors into a matrix and perform PCA on them, and I'm not sure how I would do that using cell arrays (or even if princomp would work on cell arrays).
EDIT2: I have altered the code to match your suggestion (although I did decide to give up speed and keep with the for-loops to keep with the structure array, since reparsing this data will be way worse time-wise). The new code snipet is:
stock_return = zeros(numStock+1,length(stock(1).return));
for i=1:numStock+1
for j=1:length(stock(i).return)
stock_return(i,j) = stock(i).return(j);
end
end
stock_return = stock_return(~any(isnan(stock_return)), : );
This returns an Index exceeds matrix dimensions error, and I'm not sure why. Any suggestions?
I could not find a convenient way to handle structures, therefore I would restructure the code so that instead of structures it uses just arrays.
For example instead of stock(i).return(j) I would do stock_returns(i,j).
I show you on a part of your code how to get rid of for-loops.
Say we deal with this code:
for j=length(stock(i).return):-1:1
if(isnan(stock(i).return(j)))
for k=numStock+1:-1:1
stock(k).return(j) = [];
end
end
end
Now, the deletion of columns with any NaN data goes like this:
stock_return = stock_return(:, ~any(isnan(stock_return)) );
As for the absolute difference from medianVolume, you can write a similar code:
% stock_return_length is a scalar
% stock_median_return is a column vector (eg. [1;2;3])
% stock_mad_return is also a column vector.
median_return = repmat(stock_median_return, stock_return_length, 1);
is_bad = abs(stock_return - median_return) > 3.* stock_mad_return;
stock_return = stock_return(:, ~any(is_bad));
Using a scalar for stock_return_length means of course that the return lengths are the same, but you implicitly assume it in your original code anyway.
The important point in my answer is using any. Logical indexing is not sufficient in itself, since in your original code you delete all the values if any of them is bad.
Reference to any: http://www.mathworks.co.uk/help/matlab/ref/any.html.
If you want to preserve the original structure, so you stick to stock(i).return, you can speed-up your code using essentially the same scheme but you can only get rid of one less for-loop, meaning that your program will be substantially slower.

In Stata, how do I manipulate matrix elements by their name?

In Stata, after a regression I know it is possible to call the elements of stored results by name. For example, if I want to manipulate the coefficient on the variable precip, I just type _b[precip]. My question is how do I do the same after the tabstat command? For example, say I want to multiply the coefficient on precip by the sample mean of precip:
reg --variables in regression--
tabstat --variables in regression--
mat X=r(StatTotal)
mat Y=_b[precip]*X[1,precip]
Ah, if only it were that simple. But alas, in the last line X[1, precip] is invalid syntax. Oddly, Stata does recognize display X[1, precip]. And Stata would know what I'm trying to do if instead of precip I used the column number where precip appears in the X vector. If I were just doing this operation once, no problem. But I need to do this operation several times (for several different model specifications) and for several variables which change position in the vector from one model to the next, so I cannot just use the column number.
I am not yet sure I understand exactly what you want to do, but here's my attempt to reproduce what you are doing:
sysuse auto, clear
regress price mpg foreign weight
tabstat mpg foreign weight, save
matrix X = r(StatTotal)
matrix Y = _b[mpg]*X[1, colnumb(X, "mpg") ]
If you need to put this into a cycle, that's doable, too:
matrix bb = e(b)
local explvar : colnames bb
foreach x in `explvar' {
if "`x'" != "_cons" {
matrix Y_`x' = _b[`x'] * X[1, colnumb(X, "`x'")]
}
else {
matrix Y_`x' = _b[`x']
}
}
You'd probably want to put this into a program that you will call after each regression model estimation call, e.g.:
program define reg2mat , prefix( name )
if "`e(cmd)'" != "regress" {
// this will intentionally produce an error
regress
}
tempname bb
matrix `bb' = e(b)
local explvar : colnames `bb'
foreach x in `explvar' {
if "`x'" != "_cons" {
matrix `prefix'_`x' = _b[`x'] * X[1, colnumb(X, "`x'")]
}
else {
matrix `prefix'_`x' = _b[`x']
}
}
end // of reg2mat
At many levels, it is not ideal, as it manipulates with the (global) matrices in Stata memory; most of the time, it is a bad idea, as the programs should only manipulate with objects local to them.
I suspect that what you want to do is addressed, in one way or another, by either omnipowerful margins command, or by an appropriate predict, or by matrix score (which is the low level version of predict). Attributing the effects to a variable only makes sense when your regressors are orthogonal, which only happens in carefully designed and conducted experiments.

Resources