When using the pmap function, an error occurs if the function is not defined on all of the worker processes. However, when that function calls other functions, or uses functions defined in another .jl file, putting an @everywhere macro in front of every related function is certainly not a good solution.
Is there a neat way to effectively send a function along with its helpers to all available workers?
I do not think there is a macro that sends a function definition, together with the definitions of all its helper functions, to all the worker processes.
However, there are better ways to make all the functions you need available than putting an @everywhere before each of them.
You can put all these functions in a file and include it everywhere with @everywhere include("mynewfile.jl"). If your functions use other functions defined in another .jl file, put the include statement for that other file in mynewfile.jl as well. If you are using modules from the other file, put the using or import statements inside mynewfile.jl too.
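As a minimal sketch (the file name and helper functions here are hypothetical), mynewfile.jl might look like this:

```julia
# mynewfile.jl -- everything the workers need, collected in one file
# include("other_helpers.jl")   # includes for helpers defined in other files
# using Statistics              # any modules the functions rely on

g(x) = x^2        # helper function
f(x) = 2 * g(x)   # the function you will pass to pmap
```

Loading it on every process is then a single `@everywhere include("mynewfile.jl")`.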
In a similar way, instead of a file, you can use an @everywhere begin ... end block. Put all these functions, using or import statements, includes, etc. into a begin ... end block and put an @everywhere before the begin. This is especially useful if you are working with IJulia notebooks.
julia> @everywhere begin
           g(x) = x^2
           f(x) = g(x)*2
       end
julia> pmap(f, 1:5)
5-element Array{Int64,1}:
2
8
18
32
50
You can also create modules/packages and just use a single @everywhere using MyNewModule. If you are using a module that is not part of a package, you should also include the file defining that module everywhere.
You might find it useful to read the relevant manual entry.
I'm currently writing a Ruby class that provides a menu of base lambdas that can be mixed-and-matched to create a new set of lambdas. (it's an evolutionary algorithm that requires a lot of customizing of the fitness function depending on the dataset)
The configuration file where this happens is full of stuff like this:
function_from_modifier.(base_function, either.(modifier_from.(case),->(x){x}) )
The identity function ->(x){x} pops up several times in the configuration file, and it looks ugly, so I was wondering if there is a more elegant way of doing this. Is something like Elixir's &(&1) possible in Ruby?
tl;dr summary: there is no identity function in the Ruby core or standard libraries. In fact, there are no functions (in the sense of pre-defined Proc instances) at all anywhere in the core or standard libraries.
First-class functions in Ruby are kind-of second-class (if that makes sense).
When Yukihiro "matz" Matsumoto first designed Ruby, he surveyed the standard libraries of other languages for uses of first-class and higher-order functions, and he found that the vast majority of uses were:
a single function argument
that is not stored, passed, or returned
that is only immediately invoked one or more times
A significant portion of the higher-order functions where this is not true are control structures (e.g. if, which takes a condition and two branches), which he wanted to model as built-in language constructs rather than library functions.
Therefore, he decided to optimize Ruby for the common case that he identified, and created blocks.
Blocks are not first-class functions:
they aren't objects
you can't send messages to them
they can't be stored in variables
they can't be returned
they can't be freely passed as arguments: you can pass at most one, and only at a special place in the argument list
As a result, real (in the sense that they are actual objects) first-class functions (Procs) are in some sense second-class language features compared to blocks:
they are more expensive in memory
calling them is slower
they are more syntactically cumbersome to create
So, in essence, it is not surprising that you are running into limitations when trying to use Ruby the way you do: that's not what Ruby was designed for.
In the past, I used to carry around a helper library with constants like this:
class Proc
Id = -> x { x }
end
But I stopped doing that. If I want to use Ruby, I use Ruby as an OO language; if I want to do fancy FP, I use Haskell or Scala or Clojure or Racket or …
There is no anonymous function capture in Ruby, because in OOP there are objects with methods defined on them, not free-standing functions.
The closest thing would be Object#itself, and while one might write method(:itself) instead of ->(x) { x }, it would not be exactly the same, for many reasons.
Why not just assign the anonymous function to a variable, like λ = ->(x) { x }, and then use λ everywhere?
Sidenote: using a bare lambda instead of a block in the call to either looks like bad design to me. Blocks are faster, and everywhere in the core library Ruby uses blocks in such cases.
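For illustration, here is how an explicit identity lambda compares with the Symbol#to_proc route via :itself (a sketch of the options, not a recommendation):

```ruby
id = ->(x) { x }                  # explicit identity lambda
id.call(42)                       # => 42

# Symbol#to_proc builds a Proc that sends :itself to its argument:
id2 = :itself.to_proc
id2.call(42)                      # => 42

# In enumerable pipelines, the &:itself shorthand often replaces an
# identity lambda entirely:
[1, 2, 2, 3].group_by(&:itself)   # => {1=>[1], 2=>[2, 2], 3=>[3]}
```

Note that `:itself.to_proc` sends a message to its argument rather than merely returning it, which is one of the reasons it is not exactly equivalent to `->(x) { x }`.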
I need to refer to the same variable in several functions in a python script. (The write context of a CGPDFDocument).
I can either write global writeContext in every function;
or add the variable as an argument to every function;
....both of which seem to be excessive duplication.
Other answers to questions about global variables in python suggest that they are "bad".
So what's the better way of handling it?
I have heard that MATLAB compiles functions on demand, which could create a lot of function-call overhead if you call a function many times, as in the following code:
function output = BigFunction(args)
    for i = 1:10000000
        SmallFunction(args);
    end
end
Is it faster to call the function SmallFunction() if I put it in the same file as BigFunction() as a local function? Or is there any good solution other than pasting the code from SmallFunction() into the BigFunction() to optimize the performance?
Edit: It may be a false assumption that the function-call overhead comes from on-demand compilation. The question is how to cut down on the overhead without making the code look awful.
MATLAB caches the functions it reads into memory. A function is only compiled once if it exists as an independent function in its own file. If you put BigFunction in BigFunction.m and SmallFunction in SmallFunction.m, then you should receive the optimization benefit of having each m-file compiled only once.
The answer to my first question is that a local function performs the same as a function in another file.
An idea for the second question is to, if possible, make SmallFunction() an inline function, which has less function-call overhead. I found more about function-call performance on the MathWorks forum, and I paste the question and answer below:
Question:
I have 7 different types of function call:
An inline function. The body of the function is written down directly (inline).
A function is defined in a separate MATLAB file. The arguments are passed by the calling function (file-pass).
A function is defined in a separate MATLAB file. The arguments are provided by referencing global variables; only indices are provided by the calling function (file-global).
A nested function. The arguments are passed by the enclosing function (nest-pass).
A nested function. The arguments are those shared with the enclosing function; only indices are provided by the enclosing function (nest-share).
A sub function. The arguments are passed by the calling function (sub-pass).
A sub function. The arguments are provided by referencing global variables; only indices are provided by the calling function (sub-global).
I would like to know which function call provides better performance than the others in general.
The answer from MathWorks Support Team pasted here:
The ordering of performance of each function call from the fastest to the slowest tends to be as follows:
inline > file-pass = nest-pass = sub-pass > nest-share > sub-global > file-global
(A>B means A is faster than B and A=B means A is as fast as B)
First, inline is the fastest as it does not incur overhead associated with function call.
Second, when the arguments are passed to the callee function, the calling function sets up the arguments in such a way that the callee function knows where to retrieve them. This setup associated with function call in general incurs performance overhead, and therefore file-pass, nest-pass, and sub-pass are slower than inline.
Third, if the workspace is shared with nested functions and the arguments to a nested function are those shared within the workspace, rather than passed by value, then performance of that function call is inhibited. If MATLAB sees a shared variable within the shared workspace, it searches the workspace for the variable. On the other hand, if the arguments are passed by the calling function, then MATLAB does not have to search for them. The time taken for this search explains why nest-share is slower than file-pass, nest-pass, and sub-pass.
Finally, when a function call involves global variables, performance is even more inhibited. This is because to look for global variables, MATLAB has to expand its search space to the outside of the current workspace. Furthermore, the reason a function call involving global variables appears a lot slower than the others is that MATLAB Accelerator does not optimize such a function call. When MATLAB Accelerator is turned off with the following command,
feature accel off
the difference in performance between inline and file-global becomes less significant.
Please note that the behaviors depend largely on various factors such as operating systems, CPU architectures, MATLAB Interpreter, and what the MATLAB code is doing.
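A quick way to see the inline-versus-call overhead on your own machine is a timing sketch like the following (the function and variable names are purely illustrative; absolute timings will vary with your MATLAB version and hardware):

```matlab
% Timing sketch: inline body vs. calling a local function in a loop.
n = 1e6;
args = 2;

tic
s1 = 0;
for i = 1:n
    s1 = s1 + args * i;          % inline: the body is written in place
end
t_inline = toc;

tic
s2 = 0;
for i = 1:n
    s2 = s2 + small(args, i);    % one local-function call per iteration
end
t_call = toc;

fprintf('inline: %.3fs  call: %.3fs\n', t_inline, t_call);

function y = small(a, i)         % local function in the same file
y = a * i;
end
```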
This should be easy to answer, but I couldn't find exactly what I was asking on google/stackoverflow.
I have a bash script with 18 functions (785 lines)- ridiculous, I know I need to learn another language for the lengthy stuff. I have to run these functions in a particular order because the functions later in the sequence use info from the database and/or text files that were modified by the functions preceding. I am pretty much done with the core functionality of all the functions individually and I would like a function to run them all (One ring to rule them all!).
So my questions are, if I have a function like so:
function precious()
{
rings_of #Functions in Sequence
elves #This function Modifies DB
men #This function uses DB to modify text
dwarves #This function uses that modified text
}
Would variables be carried from one function to the next if declared like so? (inside of a function):
function men()
{
...
frodo_sw_name=`some DB query returning the name of Frodo's sword`
...
}
Also, if the functions are called in a specific order, as seen above, will Bash wait for one function to finish before starting the next? - I am pretty sure the answer is yes, but I have a lot of typing to do either way, and since I couldn't find this answer quickly on the internet, I figured it might benefit others to have this answer posted as well.
Thanks!
Variables persist unless you run the function in a subshell. That happens if you run it as part of a pipeline, or group it with ( ... ) (use { ... } instead for grouping if you don't want to create a subshell).
The exception is if you explicitly declare the variables in the function with declare, typeset, or local, which makes them local to that function rather than global to the script. But you can also use the -g option to declare and typeset to declare global variables (this would obviously be inappropriate for the local declaration).
See this tutorial on variable scope in bash.
Commands are all run sequentially, unless you deliberately background them with & at the end. There's no difference between functions and other commands in this regard.
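A runnable sketch of both points, variable persistence and the subshell caveat, using names from the question's example:

```shell
#!/bin/bash

men() {
  frodo_sw_name="Sting"        # unqualified assignment: global, persists after return
  local secret="precious"      # 'local': disappears when men returns
}

dwarves() {
  echo "sword is: $frodo_sw_name"    # sees the variable set by men
}

men          # functions run sequentially: men finishes before dwarves starts
dwarves
echo "secret is: '${secret:-unset}'"

# Subshell caveat: assignments inside ( ... ) or a pipeline do not persist.
( frodo_sw_name="Glamdring" )
echo "after subshell: $frodo_sw_name"    # still Sting
```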
If I write a function or module that calls another module, how can I get the name of the calling function/module? This would be helpful for debugging purposes.
The Stack function will do almost exactly what you want, giving a list of the "tags" (for your purposes, read "functions") that are in the call stack. It's not bullet-proof, because of the existence of other functions like StackBegin and StackInhibit, but those are very exotic to begin with.
In most instances, Stack will return the symbols that name the functions being evaluated. To figure out what context those symbols are from, you can use the Context function, which is about as close as you can come to figuring out what package they're a part of. This requires some care, though, as symbols can be added to packages dynamically (via Get, Import, ToExpression or Symbol) and they can be redefined or modified (with new evaluation rules, for instance) in other packages as well.
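A minimal sketch (the function names here are hypothetical) of inspecting the stack from inside a helper:

```mathematica
ClearAll[outer, inner];
inner[] := Stack[];      (* the tags currently being evaluated *)
outer[] := inner[];
outer[]                  (* typically includes the symbols outer and inner *)
```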