How to run Julia script and function from terminal? - terminal

I'm learning the Julia language and followed some tutorials to test OLS (ordinary least squares) estimation in Julia. First, I need to simulate a dataset consisting of a dependent variable ("Y"), independent variables ("X"), error terms (epsilon), and parameters. The script looks like this:
# ols_simulate :generate necessary data
using Distributions
N=100000
K=3
genX = MvNormal(eye(K))
X = rand(genX,N)
X = X'
X_noconstant = X
constant = ones(N)
X = [constant X]
genEpsilon = Normal(0, 1)
epsilon = rand(genEpsilon,N)
trueParams = [0.1,0.5,-0.3,0.]
Y = X*trueParams + epsilon
and then I defined an OLS function
function OLSestimator(y,x)
    estimate = inv(x'*x)*(x'*y)
    return estimate
end
What I planned to do was first to simulate the data from the terminal with the command:
ols_simulate
hoping that this step would generate and store the data properly, so that I could then call OLSestimator. But after trying this, when I typed mean(Y) in the Julia REPL, it gave me an error message like
ERROR: UndefVarError: Y not defined
It seems the data are not stored properly. More generally, if I have multiple scripts (scripts and functions), how can I use the data generated by one from the others in the terminal?
Thank you.

Each time you start the Julia REPL (the Julia "command line"), it begins with a fresh workspace. To define variables and then use them, you therefore have to do both within a single session.
If I understand correctly, you have multiple scripts which do parts of the calculations. To run a script in the REPL and stay in it with all the global variables still defined, you can use
include("scriptname.jl")
(with scriptname changed to appropriate .jl filename).
In this case, the workflow could look like:
include("ols_simulate.jl")
estimate = OLSestimator(Y,X)
mean(Y)
In general, it's best to stay in the REPL. If you want to clear everything and start fresh, quitting and restarting is the way to go.

You need to save the script in a separate file and then load it into Julia. Say you have already saved it under the name "ols_simulate.jl" in the directory "dir1". Navigate to that directory in the terminal and start Julia. Once in Julia, load "ols_simulate.jl", after which you can calculate the mean of Y and do whatever else you want:
include("ols_simulate.jl")
mean(Y)
OLSestimator(Y, X)
For the kind of work I think you are doing, you might find a notebook interface like Jupyter useful.

Related

Small numbers misrepresented in windows

I am running a model written in Fortran (I only have the executable). In some runs it started to produce constant errors and apparently incoherent results. When I closely checked the results file (a text file with n columns of data), I realized that when the concentration of a certain mineral is very low, say 2.9984199E-306, the code omits the 'E' and the number is written as 2.9984199-306, which of course causes problems. Since I have no access to the source code of the executable, is there a way to avoid this problem in Windows? I have seen that on other computers these numbers are simply replaced by zero, but I was not able to find the specific configuration that achieves this.
You will need access to code to change the output formatting or you will need to post-process your output. You are seeing standard conforming Fortran behavior. Consider the simple program
program foo
   implicit none
   real(8) x
   integer i
   x = 1
   do i = 1, 10
      x = x / 5.4321e11
      write(*,'(ES15.7)') x
   end do
end program foo
Its output is
1.8409086E-12
3.3889446E-24
6.2387373E-36
1.1484945E-47
2.1142735E-59
3.8921843E-71
7.1651557E-83
1.3190397E-94
2.4282316-106
4.4701525-118
See Fortran 2018 Standard, 13.7.2.3.3 E and D editing, in particular, Table 13.1.
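If you go the post-processing route, a small script could put the missing 'E' back before the file is read by anything else. The following is only a sketch in Python: the function name restore_exponent is made up for illustration, and it assumes the only place a digit (or decimal point) is directly followed by a sign and exactly three digits is a real number whose exponent letter was dropped. Check that assumption against your own results file before relying on it.

```python
import re

# A digit or decimal point directly followed by a sign and exactly three
# digits: the shape Fortran produces when the exponent needs three digits
# and the exponent letter is dropped (e.g. 2.9984199-306).
_MISSING_E = re.compile(r"(?<=[0-9.])([+-][0-9]{3})(?![0-9])")


def restore_exponent(line: str) -> str:
    """Re-insert the 'E' in values such as 2.9984199-306 (hypothetical helper)."""
    return _MISSING_E.sub(r"E\1", line)


# Values that already carry an 'E' are left alone, because the sign is then
# preceded by a letter rather than by a digit.
fixed = restore_exponent("1.8409086E-12  2.4282316-106")
values = [float(field) for field in fixed.split()]
```

After this step every column parses with an ordinary float conversion. If you would rather have such tiny values treated as zero, you can clamp them after parsing.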

Using the output of functions in mathematica for further computation

Mathematica has a bevy of useful functions (Solve, NDSolve, etc.). These functions produce output in a very strange form, e.g. {{v -> 2.05334*10^-7}}. The major issue is that there does not appear to be any way to use the output of these functions in the program; that is to say, all of these appear to be terminal functions whose output is for human viewing only.
I have tried multiple methods (Part, /., etc.) to get the output of these functions into variables so the program can use them in further steps, but nothing works. The documentation says it can be done, but nothing it lists actually works. For example, if I try to use /. to move values, it continues to treat the variable I assigned to as empty and does symbolic math with it instead of seeing the value. If I try to access the variable with, e.g., [[1]], it says the variable is not that deep.
The only method I have found is to put the later steps in separate blocks and copy-paste the output to continue the evaluation. Is there any way to get the output of these functions into variables programmatically?
Solve etc. produce a list of replacement rules. So you need to apply these rules to the pattern to be replaced. For instance
solutions = x /. Solve[x^2 == 3, x]
gives you all the solutions in a list.
Here is a quick way to get variable names for the solutions:
x1 = solutions[[1]]
x2 = solutions[[2]]

try catch or type conversion performance in julia - (Julia 73 seconds, Python 0.5 seconds)

I have been playing with Julia because it seems syntactically similar to Python (which I like) but claims to be faster. However, I tried writing a script similar to one I have in Python for testing whether the values in a text file are numerical. It uses this function:
function isFloat(s)
    try:
        float64(s)
        return true
    catch:
        return false
    end
end
For some reason, this takes a great deal of time for a text file with a reasonable amount of rows of text (~500000).
Why would this be? Is there a better way to do this? What general feature of the language can I understand from this to apply to other languages?
Here are the two exact scripts I ran, with the times for reference:
python: ~0.5 seconds
import time
import numpy as np

def is_number(s):
    try:
        np.float64(s)
        return True
    except ValueError:
        return False

start = time.time()
file_data = open('SMW100.asc').readlines()
file_data = map(lambda line: line.rstrip('\n').replace(',',' ').split(), file_data)
bools = [(all(map(is_number, x)), x) for x in file_data]
print time.time() - start
julia: ~73.5 seconds
start = time()
function isFloat(s)
    try:
        float64(s)
        return true
    catch:
        return false
    end
end
x = map(x-> split(replace(x, ",", " ")), open(readlines, "SMW100.asc"))
u = [(all(map(isFloat, i)), i) for i in x]
print(start - time())
You can use the float64_isvalid function in the standard library to (a) check whether a string is a valid floating-point value and (b) return the value.
Note also that the colons (:) after try and catch in your isFloat code are wrong in Julia (this is a Pythonism).
A much faster version of your code should be:
const isFloat2_out = [1.0]
isFloat2(s::String) = float64_isvalid(s, isFloat2_out)
function foo(L)
    x = split(L, ",")
    (all(isFloat2, x), x)
end
u = map(foo, open(readlines, "SMW100.asc"))
On my machine, for a sample file with 100,000 rows and 10 columns of data, 50% of which are valid numbers, your Python code takes 4.21 seconds and my Julia code takes 2.45 seconds.
This is an interesting performance problem that might be worth submitting to julia-users to get more focused feedback than SO will probably provide. At first glance, I think you're hitting problems because (1) try/catch is just slightly slow to begin with and (2) you're using try/catch in a context where there's a considerable amount of type uncertainty, because of lots of function calls that don't return stable types. As a result, Julia spends its time trying to figure out the types of objects rather than doing your computation. It's a bit hard to tell exactly where the big bottlenecks are because you're doing a lot of things that are not very idiomatic in Julia. Also, you seem to be doing your computations in the global scope, where Julia's compiler can't perform many meaningful optimizations due to additional type uncertainty.
Python is oddly ambiguous on the subject of whether using exceptions for control flow is good or bad. See the question "Python using exceptions for control flow considered bad?". But even in Python, the consensus is that user code shouldn't use exceptions for control flow (although for some reason generators are allowed to do this). So the simple answer is that you should not be doing that: exceptions are for exceptional situations, not for control flow. That is why almost zero effort has been put into making Julia's try/catch construct faster; you shouldn't be using it like that in the first place. Of course, we will probably get around to making it faster at some point.
That said, the onus is on us as the designers of Julia's standard library to make sure that we provide APIs that never force you to use exceptions for control flow. In this case, you need a function that allows you to try to parse something as a floating-point value and indicate whether that was possible or not, not by throwing an exception, but by returning normal values. We don't provide such an API, so this is ultimately a shortcoming of Julia's standard library as it exists right now. I've opened an issue to discuss this API design question: https://github.com/JuliaLang/julia/issues/5704. We'll see how it pans out.
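To make the API shape described above concrete, here is a rough sketch in Python, the language of the reference script in the question. It reports failure through the return value (None) instead of raising. The names try_parse_float and parse_line are hypothetical, and the regular expression is an assumption about what counts as a valid number: plain decimal and scientific notation only, so strings such as "nan" or "inf" are rejected even though float() would accept them.

```python
import re
from typing import List, Optional, Tuple

# Assumed grammar: optional sign, digits with an optional fractional part
# (or a leading-dot fraction), and an optional exponent.
_FLOAT_RE = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def try_parse_float(s: str) -> Optional[float]:
    """Return the parsed value, or None if s is not a number (no exception)."""
    s = s.strip()
    if _FLOAT_RE.fullmatch(s):
        return float(s)
    return None


def parse_line(line: str) -> Tuple[bool, List[str]]:
    """Split a comma- or space-separated line and report if every field is numeric."""
    fields = line.rstrip("\n").replace(",", " ").split()
    return all(try_parse_float(f) is not None for f in fields), fields
```

The caller branches on an ordinary value, so the invalid case costs no more than the valid one. That is the property the answer is asking the standard library to provide.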

excel performance using Range

I am new to VBA and I'm now working on a project where speed is absolutely everything. So as I'm writing the code, I noticed a lot of the cells in the sheet are named ranges and are referenced in the functions explicitly like this:
function a()
    if range("x") > range("y") then
    end if
    ... (just imagine a lot of named ranges)
end function
My question is: should I modify these functions so that the values in these named ranges are passed in as parameters, like this:
'I can pass in the correct cells when I call the function
function a(x as integer, y as integer)
    if x > y then
    end if
    ...
end function
Will that speed things up a little bit? These functions are called almost constantly (except when the process is put to sleep on purpose) to communicate with an RTD server.
VBA is much slower at making "connections" to your worksheet than it is at dealing with its own variables. If your function refers to the same cell (or range) more than once, then it would be advantageous to load those values into memory before VBA works with them. For example, if range("x")>range("y") is the only place in the function where either x or y is referred to, then it won't matter. If you have if range("x")>range("a") and if range("x")>range("b") and so on, then you'd be much better off starting your function with
varX=range("x")
varY=range("y")
and then working with the VBA variables.
It might seem that parameterizing the function, as your second example shows, accomplishes my recommendation. This may or may not be the case, because Excel might just treat those variables as references to the worksheet and not as values (I'm not sure). Just to be safe, you should explicitly define new variables at the beginning of your function and then refer only to those variables in the rest of your function.
To sum up the above wall of text, your goal should be to minimize the number of times VBA "connects" to the worksheet.
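The effect is easy to see with a toy model. The following Python sketch does not touch Excel at all; CountingSheet is a made-up stand-in for a worksheet that simply counts how often a named range is read, so that the two styles can be compared.

```python
class CountingSheet:
    """Hypothetical stand-in for a worksheet; counts every named-range read."""

    def __init__(self, cells):
        self._cells = dict(cells)
        self.reads = 0

    def range(self, name):
        self.reads += 1  # each call models one "connection" to the sheet
        return self._cells[name]


def compare_uncached(sheet):
    # "x" is fetched from the sheet again for every comparison.
    return [sheet.range("x") > sheet.range(n) for n in ("a", "b", "c")]


def compare_cached(sheet):
    x = sheet.range("x")  # fetched once, then reused as a local variable
    return [x > sheet.range(n) for n in ("a", "b", "c")]


cells = {"x": 5, "a": 1, "b": 7, "c": 3}
uncached_sheet, cached_sheet = CountingSheet(cells), CountingSheet(cells)
same_result = compare_uncached(uncached_sheet) == compare_cached(cached_sheet)
```

Both functions return the same comparisons, but the cached version reads the sheet four times instead of six. The saving grows with every extra use of the same range.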

Mathematica - can I define a block of code using a single variable?

It has been a while since I've used Mathematica, and I looked all throughout the help menu. I think one problem I'm having is that I do not know what exactly to look up. I have a block of code, with things like appending lists and doing basic math, that I want to define as a single variable.
My goal is to loop through a sequence and when needed I wanted to call a block of code that I will be using several times throughout the loop. I am guessing I should just put it all in a loop anyway, but I would like to be able to define it all as one function.
It seems like this should be an easy and straightforward procedure. Am I missing something simple?
This is the basic format for a function definition in Mathematica.
myFunc[par1_,par2_]:=Module[{localVar1,localVar2},
statement1; statement2; returnStatement ]
Your question is not entirely clear, but I interpret that you want something like this:
facRand[] :=
({b, x} = Last@FactorInteger[RandomInteger[1*^12]]; Print[b])
Now every time facRand[] is called a new random integer is factored, global variables b and x are assigned, and the value of b is printed. This could also be done with Function:
Clear[facRand]
facRand =
({b, x} = Last@FactorInteger[RandomInteger[1*^12]]; Print[b]) &
This is also called with facRand[]. This form is standard, and allows addressing or passing the symbol facRand without triggering evaluation.
