I am using GLPK with Julia, and using the methods written by spencerlyon
sendto(2, lp = lp) #lp is type GLPK.Prob
However, I cant seem to send a type GLPK.Prob between workers. Whenever I do try to send a type GLPK.Prob, it gets 'sent' and calling
remotecall_fetch(2, whos)
confirms that the GLPK.Prob got sent
The problem appears when I try to solve it by calling
simplex(lp)
the error
GLPK.GLPKError("invalid GLPK.Prob")
appears. I know that the GLPK.Prob isnt originally an invalid GLPK.Prob and if I decide to construct the GLPK.Prob type explicitly on another worker, fx worker 2, calling simplex runs just fine
This is a problem as the GLPK.Prob is generated from a custom type of mine that is a bit on the heavy side
tl;dr Are there possibly some types that cannot be sent between workers properly?
Update
I see now that calling
remotecall_fetch(2, simplex, lp)
will return the above GLPK error
Furthermore I've just noticed that the GLPK module has got a method called
GLPK.copy_prob(GLPK.Prob, GLPK.Prob, Int)
but deepcopy (and certainly not copy) wont work when copying a GLPK.Prob
Example
function create_lp()
lp = GLPK.Prob()
GLPK.set_prob_name(lp, "sample")
GLPK.term_out(GLPK.OFF)
GLPK.set_obj_dir(lp, GLPK.MAX)
GLPK.add_rows(lp, 3)
GLPK.set_row_bnds(lp,1,GLPK.UP,0,100)
GLPK.set_row_bnds(lp,2,GLPK.UP,0,600)
GLPK.set_row_bnds(lp,3,GLPK.UP,0,300)
GLPK.add_cols(lp, 3)
GLPK.set_col_bnds(lp,1,GLPK.LO,0,0)
GLPK.set_obj_coef(lp,1,10)
GLPK.set_col_bnds(lp,2,GLPK.LO,0,0)
GLPK.set_obj_coef(lp,2,6)
GLPK.set_col_bnds(lp,3,GLPK.LO,0,0)
GLPK.set_obj_coef(lp,3,4)
s = spzeros(3,3)
s[1,1] = 1
s[1,2] = 1
s[1,3] = 1
s[2,1] = 10
s[3,1] = 2
s[2,2] = 4
s[3,2] = 2
s[2,3] = 5
s[3,3] = 6
GLPK.load_matrix(lp, s)
return lp
end
This will return a lp::GLPK.Prob() which will return 733.33 when running
simplex(lp)
result = get_obj_val(lp)#returns 733.33
However, doing
addprocs(1)
remotecall_fetch(2, simplex, lp)
will result in the error above
It looks like the problem is that your lp object contains a pointer.
julia> lp = create_lp()
GLPK.Prob(Ptr{Void} #0x00007fa73b1eb330)
Unfortunately, working with pointers and parallel processing is difficult - if different processes have different memory spaces then it won't be clear which memory address the process should look at in order to access the memory that the pointer points to. These issues can be overcome, but apparently they require individual work for each data type that involves said pointers, see this GitHub discussion for more.
Thus, my thought would be that if you want to access the pointer on the worker, you could just create it on that worker. E.g.
using GLPK
addprocs(2)
#everywhere begin
using GLPK
function create_lp()
lp = GLPK.Prob()
GLPK.set_prob_name(lp, "sample")
GLPK.term_out(GLPK.OFF)
GLPK.set_obj_dir(lp, GLPK.MAX)
GLPK.add_rows(lp, 3)
GLPK.set_row_bnds(lp,1,GLPK.UP,0,100)
GLPK.set_row_bnds(lp,2,GLPK.UP,0,600)
GLPK.set_row_bnds(lp,3,GLPK.UP,0,300)
GLPK.add_cols(lp, 3)
GLPK.set_col_bnds(lp,1,GLPK.LO,0,0)
GLPK.set_obj_coef(lp,1,10)
GLPK.set_col_bnds(lp,2,GLPK.LO,0,0)
GLPK.set_obj_coef(lp,2,6)
GLPK.set_col_bnds(lp,3,GLPK.LO,0,0)
GLPK.set_obj_coef(lp,3,4)
s = spzeros(3,3)
s[1,1] = 1
s[1,2] = 1
s[1,3] = 1
s[2,1] = 10
s[3,1] = 2
s[2,2] = 4
s[3,2] = 2
s[2,3] = 5
s[3,3] = 6
GLPK.load_matrix(lp, s)
return lp
end
end
a = #spawnat 2 eval(:(lp = create_lp()))
b = #spawnat 2 eval(:(result = simplex(lp)))
fetch(b)
See the documentation below on #spawn for more info on using it, as it can take a bit of getting used to.
The macros #spawn and #spawnat are two of the tools that Julia makes available to assign tasks to workers. Here is an example:
julia> #spawnat 2 println("hello world")
RemoteRef{Channel{Any}}(2,1,3)
julia> From worker 2: hello world
Both of these macros will evaluate an expression on a worker process. The only difference between the two is that #spawnat allows you to choose which worker will evaluate the expression (in the example above worker 2 is specified) whereas with #spawn a worker will be automatically chosen, based on availability.
In the above example, we simply had worker 2 execute the println function. There was nothing of interest to return or retrieve from this. Often, however, the expression we sent to the worker will yield something we wish to retrieve. Notice in the example above, when we called #spawnat, before we got the printout from worker 2, we saw the following:
RemoteRef{Channel{Any}}(2,1,3)
This indicates that the #spawnat macro will return a RemoteRef type object. This object in turn will contain the return values from our expression that is sent to the worker. If we want to retrieve those values, we can first assign the RemoteRef that #spawnat returns to an object and then, and then use the fetch() function which operates on a RemoteRef type object, to retrieve the results stored from an evaluation performed on a worker.
julia> result = #spawnat 2 2 + 5
RemoteRef{Channel{Any}}(2,1,26)
julia> fetch(result)
7
The key to being able to use #spawn effectively is understanding the nature behind the expressions that it operates on. Using #spawn to send commands to workers is slightly more complicated than just typing directly what you would type if you were running an "interpreter" on one of the workers or executing code natively on them. For instance, suppose we wished to use #spawnat to assign a value to a variable on a worker. We might try:
#spawnat 2 a = 5
RemoteRef{Channel{Any}}(2,1,2)
Did it work? Well, let's see by having worker 2 try to print a.
julia> #spawnat 2 println(a)
RemoteRef{Channel{Any}}(2,1,4)
julia>
Nothing happened. Why? We can investigate this more by using fetch() as above. fetch() can be very handy because it will retrieve not just successful results but also error messages as well. Without it, we might not even know that something has gone wrong.
julia> result = #spawnat 2 println(a)
RemoteRef{Channel{Any}}(2,1,5)
julia> fetch(result)
ERROR: On worker 2:
UndefVarError: a not defined
The error message says that a is not defined on worker 2. But why is this? The reason is that we need to wrap our assignment operation into an expression that we then use #spawn to tell the worker to evaluate. Below is an example, with explanation following:
julia> #spawnat 2 eval(:(a = 2))
RemoteRef{Channel{Any}}(2,1,7)
julia> #spawnat 2 println(a)
RemoteRef{Channel{Any}}(2,1,8)
julia> From worker 2: 2
The :() syntax is what Julia uses to designate expressions. We then use the eval() function in Julia, which evaluates an expression, and we use the #spawnat macro to instruct that the expression be evaluated on worker 2.
We could also achieve the same result as:
julia> #spawnat(2, eval(parse("c = 5")))
RemoteRef{Channel{Any}}(2,1,9)
julia> #spawnat 2 println(c)
RemoteRef{Channel{Any}}(2,1,10)
julia> From worker 2: 5
This example demonstrates two additional notions. First, we see that we can also create an expression using the parse() function called on a string. Secondly, we see that we can use parentheses when calling #spawnat, in situations where this might make our syntax more clear and manageable.
Related
I'm trying to distribute a function that outputs a vector into an array.
I followed this post with something like the following code:
a = distribute([Float64[] for _ in 1:nrow(df)])
#sync #distributed for i in 1:nrow(df)
append!(localpart(a)[i], foo(df[i]))
end
But I get the following error:
BoundsError: attempt to access 145-element Vector{Vector{Float64}} at index [147]
I've only ever parallelized with SharedArrays, which aren't an option, since I need to store vectors in the shared array. Any and all advice would be life-saving.
Each localpart is indexed starting from one.
Hence you need to convert between a global index and the local index.
The function DistributedArrays.localindices() returns a one element tuple that contains a range of global indices that are mapped to the localpart.
This information can be in turn used for the index conversion:
#sync #distributed for i in 1:nrow(df)
id = i - DistributedArrays.localindices(a)[1][1] + 1
push!(localpart(a)[id], f(df[i,1]))
end
EDIT
to understand how localindiced work look at this code:
julia> addprocs(4);
julia> a = distribute([Float64[] for _ in 1:14]);
julia> fetch(#spawnat 2 DistributedArrays.localindices(a))
(1:4,)
julia> fetch(#spawnat 3 DistributedArrays.localindices(a))
(5:8,)
julia> fetch(#spawnat 4 DistributedArrays.localindices(a))
(9:11,)
julia> fetch(#spawnat 5 DistributedArrays.localindices(a))
(12:14,)
I just encountered a very confusing julia behavior. I always thought that variables defined inside a function remain local to that function. But in the following example, the scope changes.
I define a simple function as below
using Distributed
addprocs(2)
function f()
#everywhere x = myid()
#everywhere println("x = ", x)
end
Executing the following code
f()
gives the result
x = 1
From worker 2: x = 2
From worker 3: x = 3
But since x is defined inside the function, I would expect the variable x to be not defined outside the function. However, upon executing the following code
x
I get the result
1
Even more confusing is the execution of the following code
#fetchfrom 3 x
which again gives
1
This is super confusing behavior. First, how does x become available outside the function? Second, why are all the processors/cores returning the same value of x? Thank you for your help.
I've started using ParallelDataTransfer.jl library (https://github.com/ChrisRackauckas/ParallelDataTransfer.jl) which is based on the solution given in Julia: How to copy data to another processor in Julia and I've come across a couple of questions which are listed as follows:
1- How to perform a computation on a worker using its own local variables? For example, let's define "x=pi" on process 2 using "sendto(2,x=pi)", now how to execute the function "sin(x)" on process 2 and put the result in variable "y" which also lives in process 2? (Note that when running "#spawnat 2 y=sin(x), the variable "y" doesn't appear in the scope of process 2 which can be accessed by running "#spawnat 2 whos()")
2- Following the first question, is it possible that a worker (say i) request a function to be executed on another worker (say j), using worker j's own local variables and then fetch the result on worker i? For instance in the above example, how can process 3 request process 2 to compute "sin(x)" where "x" lives in process 2, and then send the result to variable "z" which lives in process 3?
According to what Keegan Lensink suggested in a Julia Channel, these works can be done as follows:
1-
Define x on process 2
sendto(2, x = pi)
Evaluate the expression on 2, and store the result in Main
#fetchfrom 2 eval(Main, :(y = sin(x)))
Check that it worked by asking for y
#fetchfrom 2 y
2-
Process 3 is asking 2 to evaluate y = sin(x), then is assigning y to z in 3's module Main
#spawnat 3 eval(Main, :(z = eval(:(#fetchfrom 2 eval(Main, :(y = sin(x)))))))
#fetchfrom 3 z
I found this post - Shared array usage in Julia, which is clearly close but I still don't really understand what to do in my case.
I am trying to pass a shared array to a function I define, and call that function using #everywhere. The following, which has no shared array, works:
#everywhere mat = rand(3,3)
#everywhere foo1(x::Array) = det(x)
Then this
#everywhere println(foo1(mat))
properly produces different results from each worker. Now let me include a shared array:
test = SharedArray(Float64,10)
#everywhere foo2(x::Array,y::SharedArray) = det(x) + sum(y)
Then this
#everywhere println(foo2(mat,test))
fails on the workers.
ERROR: On worker 2:
UndefVarError: test not defined
etc. I can get what I want like this:
for w in procs()
#spawnat w println(foo2(eval(:mat),test))
end
This works - but is it optimal? Is there a way to make it work with #everywhere?
While it's tempting to use "named variables" on workers, it generally seems to work better if you access them via references. Schematically, you might do something like this:
mat = [#spawnat p rand(3,3) for p in workers()] # process 1 holds references to objects on workers
#sync for (i, p) in enumerate(workers())
#spawnat p foo(mat[i], sharedarray)
end
I want to run a parallel for loop. I need each of my processes to have access to 2 large dictionaries, gene_dict and transcript_dict. This is what I tried first
#everywhere( function EM ... end )
generefs = [ #spawnat i genes for i in 2:nprocs()]
dict1refs = [ #spawnat i gene_dict for i in 2:nprocs()]
dict2refs = [ #spawnat i transcript_dict for i in 2:nprocs()]
result = #parallel (vcat) for i in 1:length(genes)
EM(genes[i], gene_dict, transcript_dict)
end
but I get the following error on all processes (not just on 5):
exception on 5: ERROR: genes not defined
in anonymous at no file:1514
in anonymous at multi.jl:1364
in anonymous at multi.jl:820
in run_work_thunk at multi.jl:593
in run_work_thunk at multi.jl:602
in anonymous at task.jl:6
UndefVarError(:genes)
I thought #spawnat would move the three data structures I need to all of the processes. My first thought is maybe this move takes awhile and the parallel for loop tries to run before the data transfer is complete.
The data is moved by #spawnat but it is not bound to variables with the same name as the name on the master node. Instead the data is saved in the fairly hidden Dict named Base.PGRP on the workers. To access the values, you'll have to fetch the RemoteRefs which in your case would be something like
result = #parallel (vcat) for i in 1:length(genes)
EM(fetch(genes[i]), fetch(gene_dict[i]), fetch(transcript_dict[i]))
end