How to implement pool.map or pool.starmap for multiple arguments with different lengths - multiprocessing

The function below has five arguments. "day" and "N" are both numbers. "P_net", "lamdba_z", and
"lamdba_l" are vectors with different lengths. I don't know how to use pool.map or pool.starmap to run this function in parallel. Moreover, all five arguments change from one call to the next. I would really appreciate the help.
def local_problem(day, P_net, lamdba_z, lamdba_l, N):
    Pnet=P_net[:,day].reshape(-1,1)
    P_grid_load = cp.Variable((24,1), nonneg=True)
    P_pv_grid = cp.Variable((24,1), nonneg=True)
    P_b = cp.Variable((24,1))
    E_b = cp.Variable((24,1), nonneg=True)
    E_b_end = E_b[23]
    E_b_0 = cp.Variable((1), nonneg=True)
    dsoh_cyc = cp.Variable((24,1), nonneg=True)
    ...
for par_index, N_b in enumerate(my_swarm.position):
    for k in range(0,dayi):
        input_data.append([k, P_net[:,par_index], lamdba_z[:,par_index], lamdba_l[:,par_index], N_b])
I tried to build a list "input_data" that holds these five arguments in 10000+ rows. I then want to distribute the rows across a pool so that they are computed in parallel, as follows:
results=pool.map(local_problem, input_data)
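One way to read the setup above: each row of input_data already holds the five arguments for one call, so pool.starmap can unpack a row into the five positional parameters, whereas pool.map would hand the whole row to the function as a single argument. The vectors may have different lengths because each one travels as a single argument. A minimal sketch, assuming a stand-in function local_problem_stub and a helper build_input_data (both hypothetical; the stub only sums its inputs instead of solving the cvxpy problem) and plain lists in place of the NumPy columns:

```python
from multiprocessing import Pool


def local_problem_stub(day, P_net, lamdba_z, lamdba_l, N):
    """Hypothetical stand-in for local_problem: returns a cheap summary."""
    return day + N + sum(P_net) + sum(lamdba_z) + sum(lamdba_l)


def build_input_data(n_days, particles):
    """Build one argument tuple per (particle, day) pair.

    `particles` is a list of (P_net, lamdba_z, lamdba_l, N_b) tuples; the
    three vectors may have different lengths because each one is passed
    as a single argument.
    """
    input_data = []
    for P_net, lamdba_z, lamdba_l, N_b in particles:
        for day in range(n_days):
            input_data.append((day, P_net, lamdba_z, lamdba_l, N_b))
    return input_data


# Toy data: two "particles" whose vectors have different lengths.
particles = [
    ([1.0, 2.0, 3.0], [0.5, 0.5], [0.1], 4),
    ([2.0, 2.0, 2.0], [1.0, 1.0], [0.2], 5),
]
input_data = build_input_data(2, particles)

if __name__ == "__main__":
    # starmap unpacks each tuple into the five positional arguments.
    with Pool(processes=2) as pool:
        results = pool.starmap(local_problem_stub, input_data)
    print(results)
```

With pool.map, the same effect would need a one-argument wrapper that unpacks the row itself.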

multiprocessing a geopandas.overlay() throws no error but seemingly never completes

I'm trying to pass a geopandas.overlay() to multiprocessing to speed it up.
I have used custom functions and functools to partially fill function inputs and then pass the iterative component to the function to produce a series of dataframes that I then concat into one.
def taska(id, points, crs):
    return make_break_points((vms_points[points.ID == id]).reset_index(drop=True), crs)
# points_gdf: geodataframe of points with an id field
# grid_gdf: geodataframe polygon grid
partialA = functools.partial(taska, points=points_gdf, crs=grid_gdf.crs)
partialA_results = []
with Pool(cpu_count()-4) as pool:
    for results in pool.map(partialA, list(points_gdf.ID.unique())):
        partialA_results.append(results)
bpts_gdf = pd.concat(partialA_results)
In the example above I use the list of unique values to subset the df and pass it to a processor to perform the function and return the results. In the end all the results are combined using pd.concat.
When I apply the same approach to a list of dataframes created using numpy.array_split() the process starts with a number of processors, then they all close and everything hangs with no indication that work is being done or that it will ever exit.
def taskc(tracks, grid):
    return gpd.overlay(tracks, grid, how='union').explode().reset_index(drop=True)
# tracks_gdf: geodataframe of points with an id field
dfs = np.array_split(tracks_gdf, (cpu_count()-4))
# grid_gdf: geodataframe polygon grid
partialC_results = []
partialC = functools.partial(taskc, grid=grid_gdf)
with Pool(cpu_count() - 4) as pool:
    for results in pool.map(partialC, dfs):
        partialC_results.append(results)
results_df = pd.concat(partialC_results)
I tried using with get_context('spawn').Pool(cpu_count() - 4) as pool: based on the information here https://pythonspeed.com/articles/python-multiprocessing/ with no change in behavior.
Additionally, if I simply run geopandas.overlay(tracks_gdf, grid_gdf) the process is successful and the script carries on to the end with expected results.
Why does the partial function approach work on a list of items but not a list of dataframes?
Is the numpy.array_split() not an iterable object like a list?
How can I pass a single df into geopandas.overlay() in chunks to utilize multiprocessing capabilities and get back a single dataframe or a series of dataframes to concat?
This is my workaround, but I am also interested in whether there is a better way to perform this and similar tasks. Essentially, I modified the function so that the df split happens inside it, and I pass a list of chunk numbers from range() as the iterable.
def taskc(num, tracks, grid):
    return gpd.overlay(np.array_split(tracks, cpu_count()-4)[num], grid, how='union').explode().reset_index(drop=True)
partialC = functools.partial(taskc, tracks=tracks_gdf, grid=grid_gdf)
dfrange = list(range(0, cpu_count() - 4))
partialC_results = []
with get_context('spawn').Pool(cpu_count() - 4) as pool:
    for results in pool.map(partialC, dfrange):
        partialC_results.append(results)
results_gdf = pd.concat(partialC_results)
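The split, map, and concatenate pattern used in both versions can be shown in isolation. The sketch below is only an illustration with plain lists standing in for the dataframes; split_into_chunks and process_chunk are hypothetical names, and process_chunk merely tags rows instead of calling gpd.overlay. It does not claim to explain why the dataframe version hangs.

```python
from functools import partial
from multiprocessing import Pool


def split_into_chunks(rows, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:stop])
        start = stop
    return chunks


def process_chunk(chunk, grid):
    """Hypothetical stand-in for the overlay step: tag each row with the grid."""
    return [(row, grid) for row in chunk]


tracks = list(range(10))                    # stand-in for the rows of tracks_gdf
chunks = split_into_chunks(tracks, 3)
task = partial(process_chunk, grid="grid")  # fix the shared argument once

if __name__ == "__main__":
    with Pool(processes=3) as pool:
        parts = pool.map(task, chunks)
    # Flatten the per-chunk results, as pd.concat would for dataframes.
    combined = [row for part in parts for row in part]
    print(len(combined))
```

Each chunk is pickled and sent to a worker along with the arguments bound by partial, so large shared objects are transferred once per task.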

For loop for a regression model with increasing number of predictors

How can I create a loop that fits models with an increasing number of predictors? The first iteration should
use one predictor, then two, and so on until all predictors are included. I have to compute the RMSE
on both the training and test data for this model, and store these values in a list/array.
predictors = ['bedrooms','bathrooms','sqft_living','sqft_lot','floors',
              'waterfront','view','condition','grade','sqft_above',
              'sqft_basement','yr_built','yr_renovated','zipcode','lat',
              'long','sqft_living15','sqft_lot15']
models = []
formula = 'price ~ bedrooms'
for p in predictors[0:19]:
    formula = formula + p
    print(formula)
    model_linear_kc_5 = smf.ols(formula=formula, data=df_train_kc)
    models.append(model_linear_kc_5.fit())
This is my code so far, but I know it isn't right and I am stuck on how to fix it.
I have to put print(formula) inside the loop and then adjust the formula = … line until it does what I want it to.
I would really appreciate help in this regard. Thank you.
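One possible way to build the growing formulas is to join the first k predictor names with " + " at each step, which avoids the missing separator in formula + p. The sketch below shows only the formula construction and an RMSE helper; incremental_formulas and rmse are hypothetical names, and the statsmodels calls are left as comments because they need the actual data (df_test_kc is an assumed name for the test set).

```python
import math


def incremental_formulas(target, predictors):
    """Return one formula per step: 1 predictor, then 2, ... up to all."""
    return [
        f"{target} ~ " + " + ".join(predictors[:k])
        for k in range(1, len(predictors) + 1)
    ]


def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


formulas = incremental_formulas("price", ["bedrooms", "bathrooms", "sqft_living"])
for formula in formulas:
    print(formula)

# With statsmodels, each step could then look like this (not run here;
# df_test_kc is an assumed name for the held-out data):
#   fit = smf.ols(formula=formula, data=df_train_kc).fit()
#   train_rmse.append(rmse(df_train_kc["price"], fit.predict(df_train_kc)))
#   test_rmse.append(rmse(df_test_kc["price"], fit.predict(df_test_kc)))
```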

How can I pass multiple parameters to a parallel operation in Octave?

I wrote a function that acts on each combination of columns in an input matrix. It uses multiple for loops and is very slow, so I am trying to parallelize it to use the maximum number of threads on my computer.
I am having difficulty finding the correct syntax to set this up. I'm using the Parallel package in octave, and have tried several ways to set up the calls. Here are two of them, in a simplified form, as well as a non-parallel version that I believe works:
function A = parallelExample(M)
  pkg load parallel;
  # Get total count of columns
  ct = columns(M);
  # Generate column pairs
  I = nchoosek([1:ct],2);
  ops = rows(I);
  slice = ones(1, ops);
  Ic = mat2cell(I, slice, 2);
  ## # Non-parallel
  ## A = zeros(1, ops);
  ## for i = 1:ops
  ##   A(i) = cmbtest(Ic{i}, M);
  ## endfor
  # Parallelized call v1
  A = parcellfun(nproc, @cmbtest, Ic, {M});
  ## # Parallelized call v2
  ## afun = @(x) cmbtest(x, M);
  ## A = parcellfun(nproc, afun, Ic);
endfunction
# function to apply
function P = cmbtest(indices, matrix)
  colset = matrix(:,indices);
  product = colset(:,1) .* colset(:,2);
  P = sum(product);
endfunction
For both of these examples I generate every combination of two columns and convert those pairs into a cell array that the parcellfun function should split up. In the first, I attempt to convert the input matrix M into a 1x1 cell array so it goes to each parallel instance in the same form. I get the error 'C must be a cell array' but this must be internal to the parcellfun function. In the second, I attempt to define an anonymous function that includes the matrix. The error I get here specifies that 'cmbtest' is undefined.
(Naturally, the actual function I'm trying to apply is far more complex than cmbtest here)
Other things I have tried:
Put M into a global variable so it doesn't need to be passed. Seemed to be impossible to put a global variable in a function file, though I may just be having syntax issues.
Make cmbtest a nested function so it can access M (parcellfun doesn't support that)
I'm out of ideas at this point and could use help figuring out how to get this to work.
Converting my comments above to an answer.
When performing parallel operations, it is useful to think of each parallel worker as a separate, independent Octave instance, which needs access to all the functions and variables it requires to do its work.
Therefore, do not rely on subfunctions when calling parcellfun from a main function, since this can lead to errors if the worker cannot access the subfunction directly under the hood.
In this case, separating the subfunction into its own file fixed the problem.

Neural network - solve a net with time arrays and different sample rate

I have 3 measurements for a machine. Each measurement is triggered every time its value changes by a certain delta.
I have these 3 data sets, represented as Matlab objects: T1, T2 and O. Each of them has an obj.t containing the timestamp values and an obj.y containing the measurement values.
I will measure T1 and T2 for a long time, but O only for a short period. The task is to reconstruct O_future from T1 and T2, using the existing values for O for training and validation.
Note that T1.t, T2.t and O.t are not equal, not even their frequency (I might call it 'variable sample rate', but not sure if this name applies).
Is it possible to solve this problem using Matlab or other software? Do I need to resample all data to a common time vector?
Concerning the common time vector: below is some basic code that does this (I guess you might know how to do it, but just in case). However, the second option might get you further...
% creating test signals
t1 = 1:2:100;
t2 = 1:3:200;
to = [5 6 100 140];
s1 = round (unifrnd(0,1,size(t1)));
s2 = round (unifrnd(0,1,size(t2)));
o = ones(size(to));
maxt = max([t1 t2 to]);
mint = min([t1 t2 to]);
% determining minimum frequency
frequ = min([t1(2:length(t1)) - t1(1:length(t1)-1) t2(2:length(t2)) - t2(1:length(t2)-1) to(2:length(to)) - to(1:length(to)-1)] );
% create a time vector with highest resolution
tinterp = linspace(mint,maxt,(maxt-mint)/frequ+1);
s1_interp = zeros(size(tinterp));
s2_interp = zeros(size(tinterp));
o_interp = zeros(size(tinterp));
for i = 1: length(t1)
    s1_interp(ceil(t1(i))==floor(tinterp)) =s1(i);
end
for i = 1: length(t2)
    s2_interp(ceil(t2(i))==floor(tinterp)) =s2(i);
end
for i = 1: length(to)
    o_interp(ceil(to(i))==floor(tinterp)) = o(i);
end
figure,
subplot 311
hold on, plot(t1,s1,'ro'), plot(tinterp,s1_interp,'k-')
legend('observation','interpolation')
title ('signal 1')
subplot 312
hold on, plot(t2,s2,'ro'), plot(tinterp,s2_interp,'k-')
legend('observation','interpolation')
title ('signal 2')
subplot 313
hold on, plot(to,o,'ro'), plot(tinterp,o_interp,'k-')
legend('observation','interpolation')
title ('O')
It's not ideal: for large vectors this can become inefficient, because the smallest time step found in any one of the signals sets the resolution of the whole common time vector.
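For comparison, here is a small Python sketch of resampling onto a common grid. Note that it swaps in a different rule from the Matlab code above: instead of writing zeros between samples, it holds the most recent sample, which may suit signals that are only recorded when the value changes. The function name hold_last_value and the toy data are assumptions made for illustration.

```python
from bisect import bisect_right


def hold_last_value(times, values, grid, fill=0.0):
    """Resample an irregular signal onto `grid` by holding the latest sample.

    `times` must be sorted ascending. Grid points before the first sample
    get `fill`.
    """
    resampled = []
    for t in grid:
        idx = bisect_right(times, t)  # number of samples with time <= t
        resampled.append(values[idx - 1] if idx > 0 else fill)
    return resampled


t1, s1 = [1, 3, 5, 7], [0.0, 1.0, 1.0, 0.0]
t2, s2 = [1, 4, 7], [1.0, 0.0, 1.0]

# Common grid from the earliest to the latest timestamp, step 1.
grid = list(range(min(t1 + t2), max(t1 + t2) + 1))
s1_grid = hold_last_value(t1, s1, grid)
s2_grid = hold_last_value(t2, s2, grid)
print(grid)
print(s1_grid)
print(s2_grid)
```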
Another option would be to define a coarser time vector and look at the number of events that happened in each period, which might have some predictive power as well (not sure about your setup).
The structure would be something like
coarse_t = 1:5:100;
s1_coarse = zeros(size(coarse_t));
s2_coarse = zeros(size(coarse_t));
o_coarse = zeros(size(coarse_t));
for i = 2:length(coarse_t)
    s1_coarse(i) = sum(nonzeros(s1(t1<coarse_t(i) & t1>coarse_t(i-1))));
    s2_coarse(i) = sum(nonzeros(s2(t2<coarse_t(i) & t2>coarse_t(i-1))));
    o_coarse(i) = sum(nonzeros(o(to<coarse_t(i) & to>coarse_t(i-1))));
end
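The same coarse binning can be sketched in Python. This version mirrors the strict inequalities of the loop above, so a sample that falls exactly on a bin edge is not counted in either neighbouring bin; whether that is desirable depends on the data. The function name sum_per_bin and the toy signal are assumptions.

```python
def sum_per_bin(times, values, edges):
    """Sum the values whose timestamps fall strictly between consecutive edges.

    Samples that sit exactly on an edge are not counted, matching the
    strict comparisons in the loop above. The first slot stays 0.
    """
    totals = [0.0] * len(edges)
    for i in range(1, len(edges)):
        lo, hi = edges[i - 1], edges[i]
        totals[i] = sum(v for t, v in zip(times, values) if lo < t < hi)
    return totals


coarse_t = list(range(1, 100, 5))   # 1, 6, 11, ...
t1 = list(range(1, 100, 2))         # 1, 3, 5, ...
s1 = [1.0] * len(t1)                # toy signal: every sample is an event
s1_coarse = sum_per_bin(t1, s1, coarse_t)
print(s1_coarse[:4])
```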

Formula for calculating Exotic wagers such as Trifecta and Superfecta

I am trying to create an application that will calculate the cost of exotic parimutuel wagers. I have found several formulas for certain types of bets, but never one that solves all the scenarios for a single bet type. If I could find an algorithm that calculates all the possible combinations, I could use it to solve my other problems.
Additional information:
I need to calculate the permutations of groups of numbers. For instance:
Group 1 = 1,2,3
Group 2 = 2,3,4
Group 3 = 3,4,5
What are all the possible permutations for these 3 groups of numbers, taking 1 number from each group per permutation? No repeats per permutation, meaning a number cannot appear in more than 1 position. So 2,4,3 is valid but 2,4,4 is not valid.
Thanks for all the help.
Like most interesting problems, your question has several solutions. The algorithm that I wrote (below) is the simplest thing that came to mind.
I found it easiest to think of the problem like a tree-search: The first group, the root, has a child for each number it contains, where each child is the second group. The second group has a third-group child for each number it contains, the third group has a fourth-group child for each number it contains, etc. All you have to do is find all valid paths from the root to leaves.
However, for many groups with lots of numbers this approach will prove to be slow without any heuristics. One thing you could do is sort the list of groups by group-size, smallest group first. That would be a fail-fast approach that would, in general, discover that a permutation isn't valid sooner than later. Look-ahead, arc-consistency, and backtracking are other things you might want to think about. [Sorry, I can only include one link because it's my first post, but you can find these things on Wikipedia.]
## Algorithm written in Python ##
## CodePad.org has a Python interpreter
Group1 = [1,2,3] ## Within itself, each group must be composed of unique numbers
Group2 = [2,3,4]
Group3 = [3,4,5]
Groups = [Group1,Group2,Group3] ## Must contain at least one Group
Permutations = [] ## List of valid permutations
def getPermutations(group, permSoFar, nextGroupIndex):
    for num in group:
        nextPermSoFar = list(permSoFar) ## Make a copy of the permSoFar list
        ## Only proceed if num isn't a repeat in nextPermSoFar
        if nextPermSoFar.count(num) == 0:
            nextPermSoFar.append(num) ## Add num to this copy of nextPermSoFar
            if nextGroupIndex != len(Groups): ## Call next group if there is one...
                getPermutations(Groups[nextGroupIndex], nextPermSoFar, nextGroupIndex + 1)
            else: ## ...or add the valid permutation to the list of permutations
                Permutations.append(nextPermSoFar)
## Call getPermutations with:
## * the first group from the list of Groups
## * an empty list
## * the index of the second group
getPermutations(Groups[0], [], 1)
## print results of getPermutations (Python 3 print calls)
print('There are', len(Permutations), 'valid permutations:')
print(Permutations)
This is the simplest general formula I know for trifectas.
A = the number of selections you have for first; B = the number of selections for second; C = the number of selections for third; AB = the number of selections you have in both first and second; AC = the number in both first and third; BC = the number in both second and third; and ABC = the number of selections in all of first, second, and third.
The formula is
(AxBxC)-(ABxC)-(ACxB)-(BCxA)+(2xABC)
So, for your example:
Group 1 = 1,2,3
Group 2 = 2,3,4
Group 3 = 3,4,5
The solution is: (3x3x3)-(2x3)-(1x3)-(2x3)+(2x1)=14. Hope that helps.
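The formula can be checked against a brute-force count. The sketch below computes the overlap terms with set intersections and compares the result with a direct enumeration; the function names are hypothetical and the check covers only the three-leg (trifecta) case.

```python
from itertools import product


def trifecta_count_formula(first, second, third):
    """Count tickets with the overlap formula quoted above."""
    a, b, c = set(first), set(second), set(third)
    ab, ac, bc = len(a & b), len(a & c), len(b & c)
    abc = len(a & b & c)
    return (len(a) * len(b) * len(c)
            - ab * len(c) - ac * len(b) - bc * len(a)
            + 2 * abc)


def trifecta_count_brute_force(first, second, third):
    """Count tickets by enumerating every combination with no repeats."""
    return sum(1 for combo in product(first, second, third)
               if len(set(combo)) == 3)


groups = ([1, 2, 3], [2, 3, 4], [3, 4, 5])
print(trifecta_count_formula(*groups))
print(trifecta_count_brute_force(*groups))
```

The subtraction terms remove tickets where two positions share a runner, and the 2xABC term restores the tickets with all three positions equal, which would otherwise be removed three times.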
There might be an easier method that I am not aware of. Now does anyone know a general formula for First4?
Revised after a few years:-
I logged back into my SE account after a while, noticed this question, and realised that what I'd written didn't even answer you:
Here is some python code
import itertools
def explode(value, unique):
    legs = [ leg.split(',') for leg in value.split('/') ]
    if unique:
        return [ tuple(ea) for ea in itertools.product(*legs) if len(ea) == len(set(ea)) ]
    else:
        return [ tuple(ea) for ea in itertools.product(*legs) ]
Calling explode assumes that the legs are separated by a / and the selections within each leg by a ,
For your trifecta calculation you can work it out as follows:
result = explode('1,2,3/2,3,4/3,4,5', True)
stake = 2.0
cost = stake * len(result)
print(cost)
for a superfecta
result = explode('1,2,3/2,4,5/1,3,6,9/2,3,7,9', True)
stake = 2.0
cost = stake * len(result)
print(cost)
for a pick4 (Set Unique to False)
result = explode('1,2,3/2,4,5/3,9/2,3,4', False)
stake = 2.0
cost = stake * len(result)
print(cost)
Hope that helps
As a punter, I can tell you there is a much simpler way:
For a trifecta, you need 3 combinations. Say there are 8 runners, the total number of possible permutations is 8 (total runners)* 7 (remaining runners after the winner omitted)* 6 (remaining runners after the winner and 2nd omitted) = 336
For an exacta (with 8 runners) 8 * 7 = 56
Quinellas are an exception, as you only need to take each bet once as 1/2 pays as well as 2/1 so the answer is 8*7/2 = 28
Simple
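These counts apply when every runner is allowed in every position. As a quick check, Python's math.perm and math.comb (available from Python 3.8) reproduce the same numbers:

```python
import math

runners = 8
trifecta = math.perm(runners, 3)   # ordered first three: 8 * 7 * 6
exacta = math.perm(runners, 2)     # ordered first two: 8 * 7
quinella = math.comb(runners, 2)   # unordered pair: 8 * 7 / 2
print(trifecta, exacta, quinella)
```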
The answer supplied by luskin is correct for trifectas. He posed another question I needed to solve regarding First4. I looked everywhere but could not find a formula. I did however find a simple way to determine the number of unique permutations, using nested loops to exclude repeated sequences.
Public Function fnFirst4PermCount(arFirst, arSecond, arThird, arFourth) As Integer
    Dim intCountFirst As Integer
    Dim intCountSecond As Integer
    Dim intCountThird As Integer
    Dim intCountFourth As Integer
    Dim intBetCount As Integer
    'Dim arFirst(3) As Integer
    'Dim arSecond(3) As Integer
    'Dim arThird(3) As Integer
    'Dim arFourth(3) As Integer
    'arFirst(0) = 1
    'arFirst(1) = 2
    'arFirst(2) = 3
    'arFirst(3) = 4
    '
    'arSecond(0) = 1
    'arSecond(1) = 2
    'arSecond(2) = 3
    'arSecond(3) = 4
    '
    'arThird(0) = 1
    'arThird(1) = 2
    'arThird(2) = 3
    'arThird(3) = 4
    '
    'arFourth(0) = 1
    'arFourth(1) = 2
    'arFourth(2) = 3
    'arFourth(3) = 4
    intBetCount = 0
    For intCountFirst = 0 To UBound(arFirst)
        For intCountSecond = 0 To UBound(arSecond)
            For intCountThird = 0 To UBound(arThird)
                For intCountFourth = 0 To UBound(arFourth)
                    If (arFirst(intCountFirst) <> arSecond(intCountSecond)) And (arFirst(intCountFirst) <> arThird(intCountThird)) And (arFirst(intCountFirst) <> arFourth(intCountFourth)) Then
                        If (arSecond(intCountSecond) <> arThird(intCountThird)) And (arSecond(intCountSecond) <> arFourth(intCountFourth)) Then
                            If (arThird(intCountThird) <> arFourth(intCountFourth)) Then
                                ' Debug.Print "First " & arFirst(intCountFirst), " Second " & arSecond(intCountSecond), "Third " & arThird(intCountThird), " Fourth " & arFourth(intCountFourth)
                                intBetCount = intBetCount + 1
                            End If
                        End If
                    End If
                Next intCountFourth
            Next intCountThird
        Next intCountSecond
    Next intCountFirst
    fnFirst4PermCount = intBetCount
End Function
This function takes four string arrays, one for each position. I left in the test code (commented out) so you can see how it works with 1/2/3/4 in each of the four positions.