Can we do a parallel operation for the Quantum Monte Carlo method in Julia? - parallel-processing

This is my main code for the parallel operation:
using Distributed
using SharedArrays
nprocs()
addprocs(7)
Now, I need to store a time-dependent variable:
variable = SharedArray{ComplexF64, 3}(Dim, steps, paths)
Note that "steps" and "paths" denote time series and total number of trajectories, respectively. However, if i define this variable, i will meet with the out of memory probelm because Dim=10000, steps=600, and paths=1000, though i can use multiple kernels to achieve parallel operation. The code of parallel operation can be written as
@sync @distributed for path = 1:paths
    ...
    variable[:,:,path] = matrix_var
end
Actually, this variable is not my final result; the result is
final_var = sum(variable, dims=3)
which is the sum over all trajectories.
Thus, I want to solve the out-of-memory problem while still running in parallel. If I drop the "paths" dimension when I define this variable, the out-of-memory problem vanishes, but the parallel operation becomes invalid. I hope there is a solution that overcomes this.

It seems that for each value of path you should create the variable locally rather than write into one huge array. Your code might look more or less like this:
final_vars = @distributed (append!) for path = 1:paths
    # create a local variable for a single path
    locvariable = Array{ComplexF64, 2}(undef, Dim, steps)
    # at any time locvariable exists in at most nprocs() copies
    # load the data specific to this path into locvariable and do your job
    final_var = sum(locvariable, dims=2)
    [final_var]  # in this way you will get a vector of arrays
end
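If only the sum over all trajectories is needed, you can go one step further and let the @distributed loop do the reduction itself, so no per-path storage survives the loop at all. Below is a minimal sketch of that idea; simulate_path is a made-up placeholder for whatever computation fills matrix_var for one trajectory:

using Distributed
addprocs(7)

# hypothetical stand-in for the per-trajectory computation that produces matrix_var
@everywhere simulate_path(Dim, steps, path) = rand(ComplexF64, Dim, steps)

function run_paths(Dim, steps, paths)
    # (+) adds the per-path matrices pairwise on the workers, so only a few
    # Dim x steps matrices exist at any moment and the full
    # Dim x steps x paths array is never allocated
    @distributed (+) for path = 1:paths
        simulate_path(Dim, steps, path)
    end
end

final_var = run_paths(10000, 600, 1000)  # plays the role of sum(variable, dims=3)

With Dim=10000 and steps=600 each matrix is about 96 MB, so a handful of them per worker fits in memory even though the full 3-D array would not.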

Related

Write a sum of independent values in parallel

I have a function, say foo(), that returns an int value, and I have to pass two different values to this function to obtain two different results that have to be summed up, e.g.
result = foo(2) + foo(37)
and I would like foo(2) and foo(37) to be calculated in parallel (at the same time). It may help to have two versions of foo, one that uses a for loop and a recursive one. I am quite new to Julia and parallel programming but would like to get this problem going so that I can keep at it until I can build it as a web app with Genie.jl. Also, any resources for learning about parallel programming with Julia besides its docs will be highly appreciated!
If you want to use processes for the parallelisation, you can use the @distributed for loop:
8.2.3. Aggregate results
The second situation is when you want to perform a small operation on each of the items but you also want to perform an “aggregation function” at the end to retrieve a scalar value (or an array if the input is a matrix).
In these cases, you can use the @distributed (aggregationFunction) for construct.
As an example, you run a division by 2 in parallel and then use sum as the aggregation function (assume three worker processes are available):
using Distributed, BenchmarkTools
addprocs(3)

function f(n)
    s = 0.0
    for i = 1:n
        s += i/2
    end
    return s
end

function pf(n)
    s = @distributed (+) for i = 1:n  # aggregate using sum on variable s
        i/2  # the last expression of the for body is used by the aggregator
    end
    return s
end

@benchmark f(10000000)  # median time: 11.478 ms
@benchmark pf(10000000) # median time: 4.458 ms
(From Julia Quick Syntax Reference)
Alternatively you can use threads. Julia already has multi-threading, but Julia 1.3 (due in a few days/weeks, in rc4 at the time of writing) will introduce a comprehensive threading API.
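Applied to the question, a minimal sketch with processes could look like the following (foo here is just a toy loop-based sum standing in for the real function; @spawnat :any requires Julia 1.3 or newer):

using Distributed
addprocs(2)

# toy foo; any function that returns a number works the same way
@everywhere function foo(n)
    s = 0
    for i in 1:n
        s += i
    end
    return s
end

# start both calls without waiting for either, then fetch the two results and add them
t2  = @spawnat :any foo(2)
t37 = @spawnat :any foo(37)
result = fetch(t2) + fetch(t37)

Each fetch blocks only until its own computation is done, so the two calls can run concurrently on different workers.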

In Tensorflow, what is the difference between Session.partial_run and Session.run?

I always thought that Session.run required all placeholders in the graph to be fed, while Session.partial_run required only the ones specified through Session.partial_run_setup, but looking further that is not the case.
So how exactly do the two methods differentiate? What are the advantages/disadvantages of using one over the other?
With tf.Session.run, you usually give some inputs and request some outputs, and TensorFlow runs the operations in the graph to compute and return those outputs. If you later want some other output, even with the same input, you have to run all the necessary operations in the graph again, even if some intermediate results are the same as in the previous call. For example, consider something like this:
import tensorflow as tf

input_ = tf.placeholder(tf.float32)
result1 = some_expensive_operation(input_)
result2 = another_expensive_operation(result1)

with tf.Session() as sess:
    x = ...
    sess.run(result1, feed_dict={input_: x})
    sess.run(result2, feed_dict={input_: x})
Computing result2 requires running the operations from both some_expensive_operation and another_expensive_operation, but most of that computation repeats work already done when result1 was calculated. tf.Session.partial_run allows you to evaluate part of a graph, leave that evaluation "on hold", and complete it later. For example:
import tensorflow as tf

input_ = tf.placeholder(tf.float32)
result1 = some_expensive_operation(input_)
result2 = another_expensive_operation(result1)

with tf.Session() as sess:
    x = ...
    h = sess.partial_run_setup([result1, result2], [input_])
    sess.partial_run(h, result1, feed_dict={input_: x})
    sess.partial_run(h, result2)
Unlike before, here the operations from some_expensive_operation will only be run once in total, because the computation of result2 is just a continuation of the computation of result1.
This can be useful in several contexts, for example if you want to split the computational cost of a run into several steps, but also if you need to do some mid-evaluation checks out of TensorFlow, such as computing an input to the second half of the graph that depends on an output of the first half, or deciding whether or not to complete an evaluation depending on an intermediate result (these may also be implemented within TensorFlow, but there may be cases where you do not want that).
Note too that it is not only a matter of avoiding repeated computation. Many operations have state that changes on each evaluation, so the result of two separate evaluations and of one evaluation split into two partial ones may actually differ. This is the case with random operations, where you get a different value per run, and with other stateful objects such as iterators. Variables are also obviously stateful, so operations that change variables (like tf.assign or optimizers) will not produce the same results when they are run once and when they are run twice.
In any case, note that, as of v1.12.0, partial_run is still an experimental feature and is subject to change.

Collecting results of @parallel for-loop via remotecall

I use the #parallel for macro to run simulations for a range of parameters. Each run results in a 1-dimensional vector. In the end I would like to collect the results in a DataFrame.
Up until now I had always created an intermediate array and reduced the for-loop with vcat; then constructed the DataFrame. I thought it might also work to push! the result of each calculation to the master process via remotecall. A minimal example would look like
X = Float64[]
@sync @parallel for i in linspace(1., 10., 10)
    remotecall_fetch(() -> push!(X, i), 1)
end
The result is consistently an array X with 9, not 10, elements. The number of dropped elements grows as more workers are added.
This is on julia-0.6.1.
I thought I had understood Julia's parallel computing model, but it seems not.
What is the reason for this behavior? And how can I do it better and safely?
I suspect you're triggering a race condition, though I couldn't say where.
If you only need to return one value per iteration, I would suggest just using pmap:
pmap(linspace(1., 10., 10)) do i
    i
end
Otherwise, if each iteration could return multiple values, it would probably be best to use a RemoteChannel.
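If you do want to push results back to the master as they are produced, a RemoteChannel owned by process 1 gives you a safe way to do it. Here is a minimal sketch on a recent Julia version (where @distributed replaces the old @parallel and linspace is written as range); the loop body stands in for the real simulation:

using Distributed
addprocs(4)

# channel owned by process 1; put! calls coming from workers are serviced there one at a time
results = RemoteChannel(() -> Channel{Float64}(32))

@sync @distributed for i in range(1.0, 10.0, length=10)
    put!(results, i)  # stands in for pushing one simulation result
end

X = [take!(results) for _ in 1:10]  # collect all 10 values on the master (order not guaranteed)

Because every put! goes through the channel's owner, no element is lost the way it is with the unsynchronised push! in the question.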

Data management in a parallel for-loop in Julia

I'm trying to do some statistical analysis using Julia. The code consists of the files script.jl (e.g. initialisation of the data) and algorithm.jl.
The number of simulations is large (at least 100,000) so it makes sense to use parallel processing.
The code below is just some pseudocode to illustrate my question —
function script(simulations::Int64)
    # initialise input data
    ...
    # initialise other variables for statistical analysis using zeros()
    ...
    require("algorithm.jl")
    @parallel for z = 1:simulations
        while true
            choices = algorithm(data);
            if length(choices) == 0
                break
            else
                # process choices and pick one (which alters the data)
                ...
            end
        end
    end
    # display results of statistical analysis
    ...
end
and
function algorithm(data)
    # actual algorithm
    ...
    return choices;
end
As an example, I would like to know how many choices there are on average, what the most common choice is, and so on. For this purpose I need to save some data from choices (in the for-loop) to the statistical analysis variables (initialised before the for-loop) and display the results (after the for-loop).
I've read about using @spawn and fetch() and functions like pmap(), but I'm not sure how I should proceed. Just using the variables inside the for-loop does not work as each proc gets its own copy, so the values of the statistical analysis variables after the for-loop will just be zeros.
[Edit] In Julia I use include("script.jl") and script(100000) to run the simulations; there are no issues when using a single proc. However, when using multiple procs (e.g. after addprocs(3)), all statistical variables are zeros after the for-loop, which is to be expected.
It seems that you want to parallelize an inherently serial operation, because each iteration depends on the result of another one (in this case through data).
I think that if you can restructure the code along these lines:
@parallel (dosomethingwithdata) for z = 1:simulations
    while true
        choices = algorithm(data, z);
        if length(choices) == 0
            break
        else
            # process choices and pick one (which alters the data)
            ...
        end
    end
    data  # the last expression of the loop body is what the reducer aggregates
end
then you may find a parallel solution for the problem.
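If each simulation can be boiled down to a few numbers (say, how many choices it made), a reducing @distributed loop collects them without the workers having to share any state. A minimal sketch under that assumption, where toy_algorithm is a made-up stand-in for the algorithm(data) defined in algorithm.jl:

using Distributed
addprocs(3)

# made-up stand-in for algorithm(data): returns an empty vector roughly half the time
@everywhere toy_algorithm(data) = rand() < 0.5 ? Int[] : collect(1:rand(1:3))

function count_choices(simulations::Int)
    data = nothing  # placeholder for the real input data
    # each iteration yields the number of choices it saw; (+) sums them across workers
    total = @distributed (+) for z = 1:simulations
        n = 0
        while true
            choices = toy_algorithm(data)
            isempty(choices) && break
            n += length(choices)
        end
        n  # the last expression of the loop body is handed to the reducer
    end
    return total / simulations  # average number of choices per simulation
end

count_choices(100000)

Anything more structured (e.g. the most common choice) can be returned as a small array or dictionary per iteration and merged with a custom reducer in the same way.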

How to run a method in parallel using Julia?

I was reading the Parallel Computing docs of Julia and, having never done any parallel coding, I was left wanting a gentler intro. So I thought of a (probably) simple problem that I couldn't figure out how to code in Julia's parallel paradigm.
Let's say I have a matrix/dataframe df from some experiment. Its N rows are variables, and its M columns are samples. I have a method pwCorr(..) that calculates the pairwise correlation of rows. If I wanted an NxN matrix of all the pairwise correlations, I'd probably run a for-loop iterating over N*N/2 entries (the upper or lower triangle of the matrix) and fill in the values; however, this seems like a perfect thing to parallelize, since each of the pwCorr() calls is independent of the others. (Am I correct in thinking this way about what can and cannot be parallelized?)
To do this, I feel like I'd have to create a DArray that gets filled by a @parallel for loop. And if so, I'm not sure how this can be achieved in Julia. If that's not the right approach, I guess I don't even know where to begin.
This should work. First you need to propagate the top-level variable (data) to all the workers:
for pid in workers()
    remotecall(pid, x->(global data; data=x; nothing), data)
end
then perform the computation in chunks using the DArray constructor with some fancy indexing:
corrs = DArray((20,20)) do I
    out = zeros(length(I[1]), length(I[2]))
    for i = I[1], j = I[2]
        if i < j
            out[i-minimum(I[1])+1, j-minimum(I[2])+1] = 0.0
        else
            out[i-minimum(I[1])+1, j-minimum(I[2])+1] = cor(vec(data[i,:]), vec(data[j,:]))
        end
    end
    out
end
In more detail, the DArray constructor takes a function which takes a tuple of index ranges and returns a chunk of the resulting matrix which corresponds to those index ranges. In the code above, I is the tuple of ranges with I[1] being the first range. You can see this more clearly with:
julia> DArray((10,10)) do I
           println(I)
           return zeros(length(I[1]), length(I[2]))
       end
        From worker 2:  (1:10,1:5)
        From worker 3:  (1:10,6:10)
where you can see it split the array into two chunks on the second axis.
The trickiest part of the example was converting from these 'global' index ranges to local index ranges by subtracting off the minimum element and then adding back 1 for Julia's 1-based indexing.
Hope that helps!
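For comparison, here is a minimal pmap-based sketch on a recent Julia version; the sizes and the random data are made up, and cor comes from the Statistics standard library:

using Distributed
addprocs(4)
@everywhere using Statistics  # provides cor

N, M = 20, 100
data = rand(N, M)              # N variables (rows) x M samples (columns)
@everywhere data = $data       # copy the matrix to every worker

pairs = [(i, j) for i in 1:N for j in 1:i]   # lower triangle including the diagonal

# each pairwise correlation is computed on whichever worker is free
vals = pmap(pairs) do (i, j)
    cor(data[i, :], data[j, :])
end

# assemble the symmetric N x N matrix on the master process
C = zeros(N, N)
for (k, (i, j)) in enumerate(pairs)
    C[i, j] = vals[k]
    C[j, i] = vals[k]
end

This trades the distributed storage of the DArray approach for simplicity: the result lives on the master, which is fine as long as the NxN matrix fits in memory.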
