How can I pass multiple parameters to a parallel operation in Octave?

I wrote a function that acts on each combination of columns in an input matrix. It uses multiple for loops and is very slow, so I am trying to parallelize it to use the maximum number of threads on my computer.
I am having difficulty finding the correct syntax to set this up. I'm using the parallel package in Octave and have tried several ways to set up the calls. Here are two of them, in simplified form, as well as a non-parallel version that I believe works:
function A = parallelExample(M)
  pkg load parallel;
  # Get total count of columns
  ct = columns(M);
  # Generate column pairs
  I = nchoosek([1:ct], 2);
  ops = rows(I);
  slice = ones(1, ops);
  Ic = mat2cell(I, slice, 2);
  ## # Non-parallel
  ## A = zeros(1, ops);
  ## for i = 1:ops
  ##   A(i) = cmbtest(Ic{i}, M);
  ## endfor
  # Parallelized call v1
  A = parcellfun(nproc, @cmbtest, Ic, {M});
  ## # Parallelized call v2
  ## afun = @(x) cmbtest(x, M);
  ## A = parcellfun(nproc, afun, Ic);
endfunction
# function to apply
function P = cmbtest(indices, matrix)
  colset = matrix(:, indices);
  product = colset(:,1) .* colset(:,2);
  P = sum(product);
endfunction
For both of these examples I generate every pair of columns and convert those pairs into a cell array that parcellfun should split up. In the first, I attempt to wrap the input matrix M in a 1x1 cell array so it goes to each parallel instance in the same form. I get the error 'C must be a cell array', but this must be internal to parcellfun. In the second, I attempt to define an anonymous function that captures the matrix. The error I get here says that 'cmbtest' is undefined.
(Naturally, the actual function I'm trying to apply is far more complex than cmbtest here)
Other things I have tried:
- Putting M into a global variable so it doesn't need to be passed. It seemed impossible to declare a global variable in a function file, though I may just be having syntax issues.
- Making cmbtest a nested function so it can access M (parcellfun doesn't support that).
I'm out of ideas at this point and could use help figuring out how to get this to work.

Converting my comments above to an answer.
When performing parallel operations, it helps to think of each resulting parallel worker as a separate, independent Octave instance, which needs access to all the functions and variables it requires to do its independent work.
Therefore, do not rely on subfunctions when calling parcellfun from a main function, since this can lead to errors if a worker is unable to access the subfunction under the hood.
In this case, separating the subfunction into its own file fixed the problem.
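A minimal sketch of the fix, assuming cmbtest is saved in its own file cmbtest.m on the load path so every worker can find it, combined with the anonymous-function form (v2 above) to capture M:

# cmbtest.m -- its own file, visible to every worker
function P = cmbtest(indices, matrix)
  colset = matrix(:, indices);
  P = sum(colset(:,1) .* colset(:,2));
endfunction

# In parallelExample:
afun = @(x) cmbtest(x, M);
A = parcellfun(nproc, afun, Ic);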

Related

multiprocessing a geopandas.overlay() throws no error but seemingly never completes

I'm trying to pass a geopandas.overlay() to multiprocessing to speed it up.
I have used custom functions and functools.partial to pre-fill some function inputs, then passed the iterative component to the function to produce a series of dataframes that I then concat into one.
import functools
import numpy as np
import pandas as pd
import geopandas as gpd
from multiprocessing import Pool, cpu_count, get_context

def taska(id, points, crs):
    return make_break_points((points[points.ID == id]).reset_index(drop=True), crs)

points_gdf = ...  # geodataframe of points with an id field
grid_gdf = ...    # geodataframe polygon grid

partialA = functools.partial(taska, points=points_gdf, crs=grid_gdf.crs)
partialA_results = []
with Pool(cpu_count() - 4) as pool:
    for results in pool.map(partialA, list(points_gdf.ID.unique())):
        partialA_results.append(results)
bpts_gdf = pd.concat(partialA_results)
In the example above I use the list of unique values to subset the df, pass each subset to a processor to perform the function, and return the results. In the end, all the results are combined using pd.concat.
When I apply the same approach to a list of dataframes created using numpy.array_split(), the process starts with a number of processors, then they all close and everything hangs, with no indication that work is being done or that it will ever exit.
def taskc(tracks, grid):
    return gpd.overlay(tracks, grid, how='union').explode().reset_index(drop=True)

tracks_gdf = ...  # geodataframe of points with an id field
dfs = np.array_split(tracks_gdf, (cpu_count() - 4))
grid_gdf = ...    # geodataframe polygon grid

partialC_results = []
partialC = functools.partial(taskc, grid=grid_gdf)
with Pool(cpu_count() - 4) as pool:
    for results in pool.map(partialC, dfs):
        partialC_results.append(results)
results_df = pd.concat(partialC_results)
I tried using with get_context('spawn').Pool(cpu_count() - 4) as pool: based on the information at https://pythonspeed.com/articles/python-multiprocessing/, with no change in behavior.
Additionally, if I simply run geopandas.overlay(tracks_gdf, grid_gdf) the process is successful and the script carries on to the end with expected results.
Why does the partial function approach work on a list of items but not a list of dataframes?
Is the numpy.array_split() not an iterable object like a list?
How can I pass a single df into geopandas.overlay() in chunks to utilize multiprocessing capabilities and get back a single dataframe or a series of dataframes to concat?
This is my workaround, but I am also interested in whether there is a better way to perform this and similar tasks. Essentially, I modified the partial function so the df split happens inside it, and I create a list of values from range() as my iterable.
def taskc(num, tracks, grid):
    return gpd.overlay(np.array_split(tracks, cpu_count() - 4)[num], grid,
                       how='union').explode().reset_index(drop=True)

partialC = functools.partial(taskc, tracks=tracks_gdf, grid=grid_gdf)
dfrange = list(range(0, cpu_count() - 4))
partialC_results = []
with get_context('spawn').Pool(cpu_count() - 4) as pool:
    for results in pool.map(partialC, dfrange):
        partialC_results.append(results)
results_gdf = pd.concat(partialC_results)
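A possibly tidier variant of the same idea, sketched under the assumption that tracks_gdf and grid_gdf are as above (the helper name task_bounds and the row-range construction are illustrative, not from the original): compute explicit row ranges once, then send only small integer pairs through pool.map so each worker slices the frame itself.

def task_bounds(bounds, tracks, grid):
    lo, hi = bounds  # row range handled by this worker
    return gpd.overlay(tracks.iloc[lo:hi], grid, how='union').explode().reset_index(drop=True)

n = cpu_count() - 4
edges = np.linspace(0, len(tracks_gdf), n + 1, dtype=int)
bounds_list = list(zip(edges[:-1], edges[1:]))
partialD = functools.partial(task_bounds, tracks=tracks_gdf, grid=grid_gdf)
with get_context('spawn').Pool(n) as pool:
    results_gdf = pd.concat(pool.map(partialD, bounds_list))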

How can I use randjump instead of generating a certain number of random numbers?

I am trying to "skip forward" a few realizations by using the function Future.randjump(), but it doesn't seem to behave as I expect it to. The following code gives me the desired result, where jumping forward 1 step gives the same result as if I had called rand(rng) twice, i.e. the two println calls display the same number:
using Random, Future
rng = MersenneTwister(123);
new_rng = Future.randjump(rng, 1)
rand(rng)
rand(rng)
println(rand(rng))
println(rand(new_rng))
However, if I add one extra call to rand(rng) before the call to randjump(), the two printed numbers are completely different:
using Random, Future
rng = MersenneTwister(123);
rand(rng) # Added line
new_rng = Future.randjump(rng, 1)
rand(rng)
rand(rng)
println(rand(rng))
println(rand(new_rng))
I expected the two calls to println() to display the same thing even in the second case. How come they don't? Is there a way to use randjump() in the second case to get the same realizations as if I had called rand(rng) several times? Thank you in advance.
One unit of randjump corresponds to the generation of two floating point numbers.
Consider this example:
julia> rng = MersenneTwister(123);
julia> rng2 = Future.randjump(rng, 1);
julia> rand(rng, 4)
4-element Vector{Float64}:
0.7684476751965699
0.940515000715187
0.6739586945680673
0.3954531123351086
julia> rand(rng2,2)
2-element Vector{Float64}:
0.6739586945680673
0.3954531123351086
Note that in the second call (that is, rand(rng2, 2)) both numbers are identical to the last two numbers in the first call (that is, rand(rng, 4)).
Another issue is that different distributions might "consume" Float64 numbers from the stream at different rates, so you need to check, for a particular distribution, how fast it consumes floats from the stream (some might also use buffering, etc.).
Looking at the source code of randn (via @edit randn()), it consumes one float per value, and hence you get the same results for these two calls:
julia> randn(MersenneTwister(123),6)[3:end]
4-element Vector{Float64}:
1.142650902867199
0.45941562040708034
-0.396679079295223
-0.6647125451916877
julia> randn(Future.randjump(MersenneTwister(123),1),4)
4-element Vector{Float64}:
1.142650902867199
0.45941562040708034
-0.396679079295223
-0.6647125451916877
EDIT
Regarding your comment: the Mersenne Twister state is 19937 bits, and half-unit jumps are not supported. Running rand mutates this state, but not "half-way", so you end up with different bits. Note that an RNG is a sequence of states, and the actual values are calculated from the current state.
The correct pattern to synchronize random numbers in your computations is the following:
master_rng = MersenneTwister(123);
rng1 = Future.randjump(master_rng, big(10)^20)
# do whatever you want
rng2 = Future.randjump(master_rng, 2*big(10)^20)
# do whatever you want
rng3 = Future.randjump(master_rng, 3*big(10)^20)
# do whatever you want
With this pattern you can correctly maintain synchronization between random number streams and have full control over whether they should overlap or not.
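For instance, a minimal sketch (assuming, as the jump offsets imply, that each stream draws far fewer than 10^20 values) showing that the streams stay disjoint:

using Random, Future
master_rng = MersenneTwister(123);
rng1 = Future.randjump(master_rng, big(10)^20);
rng2 = Future.randjump(master_rng, 2 * big(10)^20);
# The streams start 10^20 jump units apart, so modest draws never overlap:
@assert rand(rng1, 1000) != rand(rng2, 1000)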

How do I parallelize a for-loop in Octave using pararrayfun (or any other function)?

I'm new to Octave and I want to know how to implement parallel execution of a for loop in Octave.
I'm looking for a parallel implementation of the code below (it's not the exact code that I'm trying to execute, but something similar):
% Read a csv file
master_sheet = csv2cell('master_sheet.csv');
delta = 0.001;
nprocs = nproc();
% Extract some values from the csv file and store them in variables
a = master_sheet{34,2};
b = master_sheet{38,2};
c = master_sheet{39,2};
for i = 0:1000
    % Create variants of a, b and c by adding a delta value
    a_adj = a + i*delta;
    b_adj = b + i*delta;
    c_adj = c + i*delta;
    % Club all the above variables into an array variable
    array_abc = [a_adj, b_adj, c_adj];
    % Send this array as an argument/parameter to a function.
    % processingData() performs a series of calculations and writes
    % the results to a file.
    processingData(array_abc);
endfor
Currently, I'm using parallel pkg (pararrayfun) to implement this, but if there is any other way(package) that could achieve the parallelization of for loop in octave, then I'm open to exploring that as well.
Thank you!
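A minimal sketch of one way to set this up with pararrayfun, assuming processingData is safe to run from several processes at once (e.g. it writes a distinct output file per call). The helper name runOne is hypothetical, and, as in the parcellfun question above, it should live in its own file runOne.m so every worker can find it:

pkg load parallel;

% runOne.m -- one iteration of the original loop body, in its own file
function s = runOne(i, a, b, c, delta)
    array_abc = [a + i*delta, b + i*delta, c + i*delta];
    processingData(array_abc);
    s = i;  % return a value so pararrayfun has something to collect
endfunction

% a, b, c and delta are captured by the anonymous function;
% pararrayfun hands each worker one value of i from the range.
fun = @(i) runOne(i, a, b, c, delta);
pararrayfun(nproc, fun, 0:1000);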

How to set up my function with blockproc to process the image in parts?

I have an image.
I want to divide this image into 3 equal parts, calculate the SIFT for each part individually, and then concatenate the results.
I found out that Matlab's blockproc does just that, but I do not know how to get it to work with my function. Here is what I have:
[r, c] = size(image);
c_new = floor(c/3); % round it
B = blockproc(image, [r c_new], @block_fun)
So according to Matlab's documentation, the function block_fun will be applied to the original image in blocks of size r by c_new.
This is what I wrote as block_fun:
function feats = block_fun(img)
    [keypoints, descriptors] = vl_sift(single(img));
    feats = descriptors;
end
So, my matrix B should be a concatenation of the SIFT descriptors of all three parts of the same image, right?
But this is the error that I get when I run the command:
B = blockproc(image, [r c_new], @block_fun)
Function BLOCKPROC encountered an error while evaluating the user supplied function handle, FUN.
The cause of the error was:
Error using single
Conversion to single from struct is not possible.
For your custom function, blockproc sends in a structure where the image data is stored in a field called data. As such, you simply need to change your function so that it accesses the data field of the input, like so:
function feats = block_fun(block_struct) %// Change
    [keypoints, descriptors] = vl_sift(single(block_struct.data)); %// Change
    feats = descriptors;
end
This error is caused by the fact that the function called via its handle by blockproc expects a block struct.
The real problem is that blockproc will attempt to concatenate all results, and you will get a different set of 128xN feature vectors for each block, which blockproc doesn't allow.
I think that using im2col and reshape would be much simpler.
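Since the descriptor count N differs per block, one hedged alternative is to skip blockproc entirely: slice the image manually and collect the descriptors in a cell array before concatenating. A sketch, assuming vl_sift is on the path and img is a grayscale matrix:

[r, c] = size(img);
c_new = floor(c/3);
feats = cell(1, 3);
for k = 1:3
    cols = (k-1)*c_new + 1 : min(k*c_new, c);  % columns of block k
    [~, feats{k}] = vl_sift(single(img(:, cols)));
end
all_descriptors = [feats{:}];  % 128-by-N, all blocks side by side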

Removing a "row" from a structure array

This is similar to a question I asked before, but is slightly different:
So I have a very large structure array in matlab. Suppose, for argument's sake, to simplify the situation, suppose I have something like:
structure(1).name, structure(2).name, structure(3).name,
structure(1).returns, structure(2).returns, structure(3).returns
(in my real program I have 647 structures).
Suppose further that structure(i).returns is a vector (a very large vector, approximately 2,000,000 entries) and that a condition comes along where I want to delete the jth entry from structure(i).returns for all i. How do you do this, or rather, how do you do this reasonably fast? I have tried some things, but they are all insanely slow (I will show them in a second), so I was wondering if the community knew of faster ways to do this.
I have parsed my data two different ways; the first way had everything saved as cell arrays, but because things hadn't been working well for me I parsed the data again and placed everything as vectors.
What I'm actually doing is trying to delete NaN data, as well as all data in the same corresponding row of my data file, and then doing the very same thing after applying the Hampel filter. The relevant part of my code in this attempt is:
for i = numStock+1:-1:1
    for j = length(stock(i).return):-1:1
        if (isnan(stock(i).return(j)))
            for k = numStock+1:-1:1
                stock(k).return(j) = [];
            end
        end
    end
    stock(i).return = sort(stock(i).return);
    stock(i).returnLength = length(stock(i).return);
    stock(i).medianReturn = median(stock(i).return);
    stock(i).madReturn = mad(stock(i).return, 1);
end;
for i = numStock:-1:1
    for j = length(stock(i+1).volume):-1:1
        if (isnan(stock(i+1).volume(j)))
            for k = numStock:-1:1
                stock(k+1).volume(j) = [];
            end
        end
    end
    stock(i+1).volume = sort(stock(i+1).volume);
    stock(i+1).volumeLength = length(stock(i+1).volume);
    stock(i+1).medianVolume = median(stock(i+1).volume);
    stock(i+1).madVolume = mad(stock(i+1).volume, 1);
end;
for i = numStock+1:-1:1
    for j = stock(i).returnLength:-1:1
        if (abs(stock(i).return(j) - stock(i).medianReturn) > 3*stock(i).madReturn)
            for k = numStock+1:-1:1
                stock(k).return(j) = [];
            end
        end;
    end;
end;
for i = numStock:-1:1
    for j = stock(i+1).volumeLength:-1:1
        if (abs(stock(i+1).volume(j) - stock(i+1).medianVolume) > 3*stock(i+1).madVolume)
            for k = numStock:-1:1
                stock(k+1).volume(j) = [];
            end
        end;
    end;
end;
However, this returns an error:
"Matrix index is out of range for deletion.
Error in Failure (line 110)
stock(k).return(j) = [];"
So instead I tried by parsing everything in as vectors. Then I decided to try and delete the appropriate entries in the vectors prior to building the structure array. This isn't returning an error, but it is very slow:
%% Delete bad data, Hampel Filter
% Delete bad entries
id = strcmp(returns, '');
returns(id) = [];
volume(id) = [];
date(id) = [];
ticker(id) = [];
name(id) = [];
permno(id) = [];
sp500(id) = [];
id = strcmp(returns, 'C');
returns(id) = [];
volume(id) = [];
date(id) = [];
ticker(id) = [];
name(id) = [];
permno(id) = [];
sp500(id) = [];
% Convert returns from string to double
returns = cellfun(@str2double, returns);
sp500 = cellfun(@str2double, sp500);
% Delete all data for which a return is not a number
nanid = isnan(returns);
returns(nanid) = [];
volume(nanid) = [];
date(nanid) = [];
ticker(nanid) = [];
name(nanid) = [];
permno(nanid) = [];
% Delete all data for which a volume is not a number
nanid = isnan(volume);
returns(nanid) = [];
volume(nanid) = [];
date(nanid) = [];
ticker(nanid) = [];
name(nanid) = [];
permno(nanid) = [];
% Apply the Hampel filter, and delete all data corresponding to
% observations deleted by the filter.
medianReturn = median(returns);
madReturn = mad(returns, 1);
for i = length(returns):-1:1
    if (abs(returns(i) - medianReturn) > 3*madReturn)
        returns(i) = [];
        volume(i) = [];
        date(i) = [];
        ticker(i) = [];
        name(i) = [];
        permno(i) = [];
    end;
end
medianVolume = median(volume);
madVolume = mad(volume, 1);
for i = length(volume):-1:1
    if (abs(volume(i) - medianVolume) > 3*madVolume)
        returns(i) = [];
        volume(i) = [];
        date(i) = [];
        ticker(i) = [];
        name(i) = [];
        permno(i) = [];
    end;
end
As I said, this is very slow, probably because I'm using a for loop on a very large data set; however, I'm not sure how else one would do this. Sorry for the gigantic post, but does anyone have a suggestion as to how I might go about doing what I'm asking in a reasonable way?
EDIT: I should add that getting the vector method to work is probably preferable, since my aim is to put all of the return vectors into a matrix and get all of the volume vectors into a matrix and perform PCA on them, and I'm not sure how I would do that using cell arrays (or even if princomp would work on cell arrays).
EDIT2: I have altered the code to match your suggestion (although I decided to give up speed and keep the for-loops to stay with the structure array, since reparsing this data would be far worse time-wise). The new code snippet is:
stock_return = zeros(numStock+1, length(stock(1).return));
for i = 1:numStock+1
    for j = 1:length(stock(i).return)
        stock_return(i,j) = stock(i).return(j);
    end
end
stock_return = stock_return(~any(isnan(stock_return)), : );
This returns an Index exceeds matrix dimensions error, and I'm not sure why. Any suggestions?
I could not find a convenient way to handle structures, so I would restructure the code to use plain arrays instead of structures.
For example, instead of stock(i).return(j) I would use stock_returns(i,j).
I will show, on one part of your code, how to get rid of the for-loops.
Say we deal with this code:
for j = length(stock(i).return):-1:1
    if (isnan(stock(i).return(j)))
        for k = numStock+1:-1:1
            stock(k).return(j) = [];
        end
    end
end
Now, the deletion of columns with any NaN data goes like this:
stock_return = stock_return(:, ~any(isnan(stock_return)) );
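As a side note, this is also why the EDIT2 snippet fails: any(isnan(stock_return)) returns a row mask over the columns, and using it as a row index is what triggers the 'Index exceeds matrix dimensions' error. Building the matrix and filtering columns might look like this (a sketch, assuming all return vectors have the same length):

% Rows are stocks, columns are observations.
stock_return = zeros(numStock+1, length(stock(1).return));
for i = 1:numStock+1
    stock_return(i, :) = stock(i).return(:).';  % copy one row, no inner loop
end
% Keep only the observations (columns) with no NaN for any stock.
stock_return = stock_return(:, ~any(isnan(stock_return), 1));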
As for the absolute-difference filter (the median/MAD test), you can write similar code:
% stock_return_length is a scalar
% stock_median_return is a column vector (e.g. [1;2;3])
% stock_mad_return is also a column vector.
median_return = repmat(stock_median_return, 1, stock_return_length);
mad_return = repmat(stock_mad_return, 1, stock_return_length);
is_bad = abs(stock_return - median_return) > 3 .* mad_return;
stock_return = stock_return(:, ~any(is_bad));
Using a scalar for stock_return_length of course means that the return lengths are all the same, but you implicitly assume that in your original code anyway.
The important point in my answer is using any. Logical indexing is not sufficient in itself, since in your original code you delete all the values if any of them is bad.
Reference to any: http://www.mathworks.co.uk/help/matlab/ref/any.html.
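For instance, a toy illustration of the column-wise filtering (not from the original code):

A = [1 NaN 3; 4 5 6];
bad = any(isnan(A), 1);  % -> [0 1 0]: flags columns containing any NaN
A = A(:, ~bad);          % keeps columns 1 and 3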
If you want to preserve the original structure and stick with stock(i).return, you can speed up your code using essentially the same scheme, but you will be able to eliminate one fewer for-loop, meaning your program will be substantially slower.
