I am trying to write a Stata program to help me run regressions.
I have two dependent variables, y1 and y2. On the right-hand side I have x1 x2 x3 x4 x5. I have defined my globals as below and wrote a program to help me regress y1 on each specification.
I was wondering how to add another argument so I can run another set of regressions by using y2 as my left-hand variable.
global xlist1 "x1"
global xlist2 "x2"
global xlist3 "x1 x2"
global xlist4 "x1 x3"
global xlist5 "x1 x4"
global xlist6 "x1 x5"
capture program drop reg_and_save
program define reg_and_save
    args j
    reg y1 ${spec`j'}
    est save results`j'.ster, replace
end

forvalues j = 1(1)6 {
    reg_and_save `j'
}
There are numerous possible answers here, depending on how literally we take your question, on whether this code is going to grow in some direction, and, if it is, in what direction.
My rather personal answer is that I see no reason for abstraction here. You are attempting to wire in various specific details into a program, but in most cases the point of a program is to provide generality, not specificity.
Similarly there is little or no point in defining globals here. Globals don't really play much part in most Stata programming.
The most obvious weakness of your code, and the one that stops it working, is that you define globals xlist1 to xlist6 but use a quite different set of global names (spec1 to spec6) within the program.
I can't see a strong case for this to be anything but part of a .do file.
local xlist1 "x1"
local xlist2 "x2"
local xlist3 "x1 x2"
local xlist4 "x1 x3"
local xlist5 "x1 x4"
local xlist6 "x1 x5"
forval k = 1/2 {
    forval j = 1/6 {
        reg y`k' `xlist`j''
        est save results`k'_`j'.ster, replace
    }
}
There's more caprice in your code. With 5 predictors, there are in principle 31 different regressions, so long as you have enough data. Trying all those regressions would be unusual but not out of the question. In specifying only 6 of those possible regressions, you may have been just inventing an example, or have subject-matter reasons for being interested in those regressions alone, but the point remains: your details don't encourage or suggest a more general command.
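As a sanity check on that count, the 31 comes from the 2^5 − 1 non-empty subsets of five predictors. A short sketch (Python here only to illustrate the combinatorics, not as Stata code) enumerates them:

```python
from itertools import combinations

predictors = ["x1", "x2", "x3", "x4", "x5"]

# Every non-empty subset of the predictors is one possible specification.
specs = [list(combo)
         for r in range(1, len(predictors) + 1)
         for combo in combinations(predictors, r)]

print(len(specs))  # 31, i.e. 2**5 - 1
```

The six specifications in the question are just six of these 31 subsets.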
I'm learning the Julia language and followed some tutorials to test OLS (ordinary least squares) estimation in Julia. First, I need to simulate a dataset of a dependent variable ("Y"), independent variables ("X"), error terms (epsilon), and parameters. The script is like:
# ols_simulate :generate necessary data
using Distributions
N=100000
K=3
genX = MvNormal(eye(K))
X = rand(genX,N)
X = X'
X_noconstant = X
constant = ones(N)
X = [constant X]
genEpsilon = Normal(0, 1)
epsilon = rand(genEpsilon,N)
trueParams = [0.1,0.5,-0.3,0.]
Y = X*trueParams + epsilon
and then I defined an OLS function
function OLSestimator(y, x)
    estimate = inv(x' * x) * (x' * y)
    return estimate
end
What I planned to do is first to simulate the data from the terminal with the command:
ols_simulate
and hope this step generates and stores the data properly, so that I can then call OLSestimator. But after trying this, when I typed mean(Y) in the Julia REPL, it gave me an error message like
ERROR: UndefVarError: Y not defined
It seems the data are not stored properly. More generally, if I have multiple scripts (scripts and functions), how can I use the data generated by one of them from the others in the terminal?
Thank you.
Each time you run the Julia REPL (the Julia "command line"), it begins with a fresh memory workspace. Thus, to define variables and then use them, you have to do both within the same session.
If I understand correctly, you have multiple scripts which do parts of the calculations. To run a script in the REPL and stay in it with all the global variables still defined, you can use
include("scriptname.jl")
(with scriptname changed to appropriate .jl filename).
In this case, the workflow could look like:
include("ols_simulate.jl")
estimate = OLSestimator(Y,X)
mean(Y)
In general, it's best to stay in the REPL; if you want to clear everything and start fresh, quitting and restarting is the way to go.
You need to save the script in a separate file and then load it into Julia. Say you have saved it with the name "ols_simulate.jl" in directory "dir1"; then navigate to that directory in the terminal and start up Julia. Once in Julia, you have to load "ols_simulate.jl", after which you can calculate the mean of Y and do whatever you want:
include("ols_simulate.jl")
mean(Y)
OLSestimator(Y, X)
For the kind of stuff that I think you are doing, you might find it useful to use a notebook interface like Jupyter.
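For comparison, and to check the workflow end to end without Julia, the same simulation and closed-form estimator can be sketched in Python with NumPy. The names mirror the Julia script above; this is an illustration, not the original code:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100_000, 3

# Simulate K standard-normal predictors and prepend a constant column.
X = np.column_stack([np.ones(N), rng.standard_normal((N, K))])
epsilon = rng.standard_normal(N)
true_params = np.array([0.1, 0.5, -0.3, 0.0])
Y = X @ true_params + epsilon

def ols_estimator(y, x):
    # Closed-form OLS; solving (X'X) b = X'y is numerically safer
    # than forming inv(X'X) explicitly.
    return np.linalg.solve(x.T @ x, x.T @ y)

estimate = ols_estimator(Y, X)  # should be close to true_params
```

With N = 100,000 observations the estimate recovers the true parameters to within a few hundredths.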
Mathematica has a bevy of useful functions (Solve, NDSolve, etc.). These functions output in a very strange manner, i.e. {{v -> 2.05334*10^-7}}. The major issue is that there does not appear to be any way to use the output of these functions in the program; that is to say, all of these appear to be terminal functions whose output is for human viewing only.
I have tried multiple methods (Part, /., etc.) to try to get the output of these functions into variables so the program can use them for further steps, but nothing works. The documentation says it can be done, but nothing it lists actually functions. For example, if I try to use /. to move variables, it continues to treat the variable I assigned to as empty and does symbolic math with it instead of seeing the value. If I try to access the variable, e.g. with [[1]], it says the variable is not that deep.
The only method I have found is to put the later steps in separate blocks and copy-paste the output to continue evaluation. Is there any way to get the output of these functions into variables programmatically?
Solve etc. produce a list of replacement rules. So you need to apply these rules to the pattern to be replaced. For instance
solutions = x /. Solve[x^2 == 3, x]
gives you all the solutions in a list.
Here is a quick way to get variable names for the solutions:
x1 = solutions[[1]]
x2 = solutions[[2]]
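For readers coming from Python, NumPy follows the same pattern for polynomial equations: the solver returns a plain array of roots that you index directly, a loose analogue of applying the rules to x (this is an illustration, not Mathematica):

```python
import numpy as np

# Roots of x^2 - 3 = 0, given by its coefficients [1, 0, -3].
solutions = np.roots([1, 0, -3])

# Index the container directly, like solutions[[1]] and solutions[[2]].
x1, x2 = solutions[0], solutions[1]
```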
So I have a whole lot of variables I need to declare, and the original code looked like this:
DIMENSION energy_t(20000),nrt(20000),npsh(1000),xx(1000),yy(1000),
:step(1000),stepz(1000),r1(1000),rr(1000),ic(1000),diffrr(1000)
And I rewrote it as this:
DIMENSION
:energy_t(20000),
:nrt(20000),
:npsh(1000),
:xx(1000),
:yy(1000),
:step(1000),
:stepz(1000),
:r1(1000),
:rr(1000),
:ic(1000),
:diffrr(1000)
Is this considered good style, or are there better ways? Note that the second way allows for comments with each variable, and I don't have to use line continuations if I might add another variable.
P.S.: is there a consensus/style bible/widely regarded source on Fortran programming style & good practices?
Good style is not to use the DIMENSION statement in the first place, especially if you use implicit typing. Every variable should have a declared type, and it is better to put the array dimension there. Use attributes with the type declaration (Fortran 90+).
real :: energy_t(20000), nrt(20000)
real, dimension(1000) :: npsh, xx, yy, step, stepz, r1, rr, ic, diffrr
Keep the lines reasonably short. Both ways of declaring the size (shape) are possible.
If you need Fortran 77, you are more limited, but still
real energy_t(20000), nrt(20000)
real npsh(1000), xx(1000), yy(1000), step(1000), stepz(1000)
real r1(1000), rr(1000), ic(1000), diffrr(1000)
is probably better.
Try to group related variables on one line and put the others on different lines. I would also suggest declaring parameter constants for the sizes 1000 and 20000.
Good style would be to parametrize the dimensions
integer, parameter :: NODES_MAX = 1000, TIMES_MAX = 20000, COORD_MAX = 1000
real energy_t(TIMES_MAX), ..
real npsh(NODES_MAX), xx(COORD_MAX) ...
so that the loops can be parameterized.
do ii = 1, COORD_MAX
xx(ii) = ...
yy(ii) = ..
end do
and error checks can be made
if (ii .gt. NODES_MAX) then
print *, 'Please increase NODES_MAX oldvalue=', NODES_MAX, ' required=', ii
pause
end if
This will also minimize the number of changes required when the dimensions are increased/decreased. This style could also have been applied 30+ years ago when F77 came out.
As the title says I'm curious about the difference between "call-by-reference" and "call-by-value-return". I've read about it in some literature, and tried to find additional information on the internet, but I've only found comparison of "call-by-value" and "call-by-reference".
I do understand the difference at memory level, but not at the "conceptual" level, between the two.
The called subroutine will have its own copy of the actual parameter value to work with but will, when it finishes executing, copy the new local value (bound to the formal parameter) back to the actual parameter of the caller.
When is call-by-value-return actually to prefer above "call-by-reference"? Any example scenario? All I can see is that it takes extra memory and execution time due to the copying of values in the memory-cells.
As a side question, is "call-by-value-return" implemented in 'modern' languages?
Call-by-value-return, from Wikipedia:
This variant has gained attention in multiprocessing contexts and Remote procedure call: if a parameter to a function call is a reference that might be accessible by another thread of execution, its contents may be copied to a new reference that is not; when the function call returns, the updated contents of this new reference are copied back to the original reference ("restored").
So, in more practical terms, it's entirely possible that a variable is in some undesired state in the middle of the execution of a function. With parallel processing this is a problem, since you can attempt to access the variable while it has this value. Copying it to a temporary value avoids this problem.
As an example:
policeCount = 0

everyTimeSomeoneApproachesOrLeaves()
    calculatePoliceCount(policeCount)

calculatePoliceCount(count)
    count = 0
    for each police official
        count++

goAboutMyDay()
    if policeCount == 0
        doSomethingIllegal()
    else
        doSomethingElse()
Assume everyTimeSomeoneApproachesOrLeaves and goAboutMyDay are executed in parallel.
So if you pass by reference, you could end up getting policeCount right after it was set to 0 in calculatePoliceCount, even if there are police officials around, then you'd end up doing something illegal and probably going to jail, or at least coughing up some money for a bribe. If you pass by value return, this won't happen.
Supported languages?
In my search, I found that Ada and Fortran support this. I don't know of others.
Suppose you have a call by reference function (in C++):
void foobar(int &x, int &y) {
    while (y-- > 0) {
        x++;
    }
}
and you call it thusly:
int z = 5;
foobar(z, z);
It will never terminate, because x and y are the same reference: each time you decrement y, that is subsequently undone by the increment of x (since they are both really z under the hood).
By contrast using call-by-value-return (in rusty Fortran):
subroutine foobar(x, y)
    integer, intent(inout) :: x, y
    do while (y > 0)
        y = y - 1
        x = x + 1
    end do
end subroutine foobar
If you call this routine with the same variable:
integer :: z = 5
call foobar(z,z)
it will still terminate, and at the end z will have a value of either 10 or 0, depending on which result is copied back last (I don't remember whether a particular order is required, and I can't find a quick answer to the question online).
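To make the copy-in/copy-out semantics concrete, here is a Python sketch that emulates call-by-value-return by hand. Python itself passes object references, so a one-element list stands in for a variable's storage, and the copy-back order at the end is a choice of this sketch:

```python
def foobar_value_return(cell_x, cell_y):
    # Copy in: the body works on private copies, as in the Fortran version.
    x, y = cell_x[0], cell_y[0]
    while y > 0:
        y -= 1
        x += 1
    # Copy out: write the final copies back to the caller's storage.
    # If cell_x and cell_y alias the same cell, the last write wins.
    cell_x[0] = x
    cell_y[0] = y

z = [5]
foobar_value_return(z, z)  # aliased call, like call foobar(z, z)
print(z[0])  # terminates; with this copy-back order z ends up 0
```

Copying y back before x instead would leave z at 10, which is exactly the order-dependence mentioned above.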
The program at the following link can give you a practical idea of the difference between these two:
Difference between call-by-reference and call-by-value
Leonid wrote in chapter iv of his book : "... Module, Block and With. These constructs are explained in detail in Mathematica Book and Mathematica Help, so I will say just a few words about them here. ..."
From what I have read (been able to find), I am still in the dark. For packaged functions I (simply) use Module, because it works and I know the construct. It may not be the best choice, though. It is not entirely clear to me (from the documentation) when, where or why to use With (or Block).
Question: Is there a rule of thumb / guideline on when to use Module, With or Block (for functions in packages)? Are there limitations compared to Module? The docs say that With is faster. I want to be able to defend my choice of Module (or another construct).
A more practical difference between Block and Module can be seen here:
Module[{x}, x]
Block[{x}, x]
(*
-> x$1979
x
*)
So if you wish to return eg x, you can use Block. For instance,
Plot[D[Sin[x], x], {x, 0, 10}]
does not work; to make it work, one could use
Plot[Block[{x}, D[Sin[x], x]], {x, 0, 10}]
(of course this is not ideal, it is simply an example).
Another use is something like Block[{$RecursionLimit = 1000},...], which temporarily changes $RecursionLimit (Module would not have worked as it renames $RecursionLimit).
One can also use Block to block evaluation of something, eg
Block[{Sin}, Sin[.5]] // Trace
(*
-> {Block[{Sin},Sin[0.5]],Sin[0.5],0.479426}
*)
ie, it returns Sin[0.5] which is only evaluated after the Block has finished executing. This is because Sin inside the Block is just a symbol, rather than the sine function. You could even do something like
Block[{Sin = Cos[#/4] &}, Sin[Pi]]
(*
-> 1/Sqrt[2]
*)
(use Trace to see how it works). So you can use Block to locally redefine built-in functions, too:
Block[{Plus = Times}, 3 + 2]
(*
-> 6
*)
As you mentioned there are many things to consider and a detailed discussion is possible. But here are some rules of thumb that I apply the majority of the time:
Module[{x}, ...] is the safest and may be needed if either
There are existing definitions for x that you want to avoid breaking during the evaluation of the Module, or
There is existing code that relies on x being undefined (for example code like Integrate[..., x]).
Module is also the only choice for creating and returning a new symbol. In particular, Module is sometimes needed in advanced Dynamic programming for this reason.
If you are confident there aren't important existing definitions for x or any code relying on it being undefined, then Block[{x}, ...] is often faster. (Note that, in a project entirely coded by you, being confident of these conditions is a reasonable "encapsulation" standard that you may wish to enforce anyway, and so Block is often a sound choice in these situations.)
With[{x = ...}, expr] is the only scoping construct that injects the value of x inside Hold[...]. This is useful and important. With can be either faster or slower than Block depending on expr and the particular evaluation path that is taken. With is less flexible, however, since you can't change the definition of x inside expr.
Andrew has already provided a very comprehensive answer. I would just summarize by noting that Module is for defining local variables that can be redefined within the scope of a function definition, while With is for defining local constants, which can't be. You also can't define a local constant based on the definition of another local constant you have set up in the same With statement, or have multiple symbols on the LHS of a definition. That is, the following does not work.
With[{{a, b} = OptionValue /@ {opt1, opt2}}, ...]
I tend to set up complicated function definitions with a Module enclosing a With. I set up all the local constants I can first inside the With, e.g. the Length of the data passed to the function if I need that, then the other local variables as needed. The reason is that With is a little faster if you genuinely do have constants rather than variables.
I'd like to mention that the official documentation on the difference between Block and Module is available at http://reference.wolfram.com/mathematica/tutorial/BlocksComparedWithModules.html.