MPI: gathering sparse data to all processes

I am parallelizing a scientific code with MPI, in which certain calculations are done on a grid (array) basis. This array stores a distributed probability that will then be needed for different computations in each of the tasks.
To distribute these calculations, I've come to the conclusion that the best way to balance the computational load is to have each MPI task compute every mpi_size-th position, starting at an offset equal to its task id + 1. For example, in an MPI execution with four tasks (mpi_size=4), if the array has 10 elements, task 1 would compute positions 1, 5, and 9; task 2: 2, 6, and 10; task 3: 3 and 7; task 4: 4 and 8.
So, essentially, in each task I am running a loop over the array with step mpi_size, starting at offset task id + 1. Afterwards, I need all the tasks to have the results that all the other tasks have computed. This looks like a typical MPI_Allgatherv case, but as far as I understand there is no way to gather data scattered in this strided way, since MPI_Allgatherv stores each task's message as one contiguous block. Graphical explanation of what I mean (before the MPI operation, "vn" marks the values updated by the current task while the white spaces are the values updated by other tasks):
Sparse computation of V (what I am trying to achieve):
task#1: V#1 = [v1][ ][ ][ ][v5][ ][ ][ ][v9][ ]
task#2: V#2 = [ ][v2][ ][ ][ ][v6][ ][ ][ ][v10] => V#1=V#2=V#3=V#4 = [v1][v2]...[v9][v10]
task#3: V#3 = [ ][ ][v3][ ][ ][ ][v7][ ][ ][ ] ^
task#4: V#4 = [ ][ ][ ][v4][ ][ ][ ][v8][ ][ ] ( HOW ??? )
Clustered computation of V
task#1: V#1 = [v1][v2][v3][ ][ ][ ][ ][ ][ ][ ]
task#2: V#2 = [ ][ ][ ][v4][v5][v6][ ][ ][ ][ ] => V#1=V#2=V#3=V#4 = [v1][v2]...[v9][v10]
task#3: V#3 = [ ][ ][ ][ ][ ][ ][v7][v8][ ][ ] ^
task#4: V#4 = [ ][ ][ ][ ][ ][ ][ ][ ][v9][v10] ( MPI_ALLGATHERV )
The simplest solution I have found is to use MPI_Allreduce with, for example, the MPI_SUM operation, where the elements that are not calculated in each rank (the white spaces above) are set to zero in that particular rank. Other operations instead of sum come to mind, but I am trying to avoid such overhead.
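To illustrate, a minimal sketch of that workaround, assuming a double array and using MPI_IN_PLACE (expensive_computation is a hypothetical stand-in for the real per-position work):

#include <mpi.h>

/* Hypothetical stand-in for the real per-position computation. */
double expensive_computation(int i) { return i + 1.0; }

void compute_and_share(double *v, int n, int rank, int size) {
    for (int i = 0; i < n; i++)
        v[i] = 0.0;                       /* zero the slots owned by other ranks */
    for (int i = rank; i < n; i += size)
        v[i] = expensive_computation(i);  /* this task's strided slots */
    /* Each slot is non-zero on exactly one rank, so summing across ranks
       fills the gaps and every rank ends up with the full array. */
    MPI_Allreduce(MPI_IN_PLACE, v, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
}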
Any ideas?

Understanding error in OOP backprop implementation

Good evening everybody.
I have been using neural networks quite often already, so I thought it was time to face the theory behind them.
As a result I have been spending quite a lot of hours with my C++ implementation of a neural network from scratch. Still, I do not get any useful output.
My issue is a clean and efficient OOP implementation, especially what I have to backpropagate from one Layer class to the next. I am aware that I'm skipping the full calculation/forwarding of the Jacobian matrices, but from my understanding this isn't necessary, since most of the entries cancel out.
I have a softmax class with size n:
Forward Pass: It takes an input vector input of length n and calculates an output vector output of the same length:
sum = 0; for (int i = 0; i < n; i++) sum += e^input[ i ];
output[ i ] = e^input[ i ] / sum
Backward Pass: It takes a target vector target of size n, holding the target values.
I do not have weights or biases in my softmax class, so I just calculate the feedback vector feedback of size n:
feedback[ i ] = output[ i ] - target[ i ]
That is what I return from my softmax layer.
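For concreteness, a condensed C++ sketch of the softmax layer exactly as described above (illustrative only; numerical-stability tricks such as subtracting the maximum input are omitted):

#include <cmath>
#include <vector>

// Softmax layer as described: no weights or biases, output cached for backward.
struct Softmax {
    std::vector<double> output;

    std::vector<double> forward(const std::vector<double>& input) {
        double sum = 0.0;
        for (double x : input) sum += std::exp(x);
        output.resize(input.size());
        for (std::size_t i = 0; i < input.size(); i++)
            output[i] = std::exp(input[i]) / sum;
        return output;
    }

    // feedback[ i ] = output[ i ] - target[ i ]
    std::vector<double> backward(const std::vector<double>& target) {
        std::vector<double> feedback(output.size());
        for (std::size_t i = 0; i < output.size(); i++)
            feedback[i] = output[i] - target[i];
        return feedback;
    }
};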
I have a fully connected class mapping m inputs to n outputs:
Forward Pass: It takes an input vector input of size m.
I calculate the net activity vector net of size n, and an output vector output of size n:
net[ i ] = b[ i ];
for (int j = 0; j < m; j++) net[ i ] += w[ i ][ j ] * input[ j ]
output [ i ] = 1 / (1 + e^-net[ i ])
Backward Pass: It takes a feedback vector of size n from the following layer.
b'[ i ] = b[ i ] + feedback[ i ] * 1 * learningRate
w'[ i ][ j ] = w[ i ][ j ] + feedback[ i ] * input[ j ] * learningRate
The new feedback array of size m:
feedback'[ i ] = 0;
for (int j = 0; j < n; j++) feedback'[ i ] += feedback[ j ] * w[ j ][ i ] * (output[ j ] * (1 - output[ j ]))
Of course, the feedback from one fully connected layer will be passed to the next, and so on.
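Again for concreteness, here is the fully connected layer condensed into a C++ sketch mirroring the formulas above (illustrative only; the initialization value is arbitrary, and the previous-layer feedback is computed before the weights are updated):

#include <cmath>
#include <vector>

// Fully connected layer (m inputs -> n outputs) with sigmoid activation,
// mirroring the formulas above.
struct FullyConnected {
    int m, n;
    double learningRate;
    std::vector<std::vector<double>> w;         // n x m weight matrix
    std::vector<double> b, net, output, input;  // input/output cached for backward

    FullyConnected(int m_, int n_, double lr = 0.01)
        : m(m_), n(n_), learningRate(lr),
          w(n_, std::vector<double>(m_, 0.1)), b(n_, 0.0), net(n_), output(n_) {}

    std::vector<double> forward(const std::vector<double>& in) {
        input = in;
        for (int i = 0; i < n; i++) {
            net[i] = b[i];
            for (int j = 0; j < m; j++) net[i] += w[i][j] * input[j];
            output[i] = 1.0 / (1.0 + std::exp(-net[i]));  // sigmoid
        }
        return output;
    }

    std::vector<double> backward(const std::vector<double>& feedback) {
        // Feedback for the previous layer (uses the not-yet-updated weights).
        std::vector<double> prev(m, 0.0);
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                prev[i] += feedback[j] * w[j][i] * output[j] * (1.0 - output[j]);
        // Parameter updates exactly as written above.
        for (int i = 0; i < n; i++) {
            b[i] += feedback[i] * learningRate;
            for (int j = 0; j < m; j++)
                w[i][j] += feedback[i] * input[j] * learningRate;
        }
        return prev;
    }
};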
I've been reading a few articles and found this one quite nice:
https://www.ics.uci.edu/~pjsadows/notes.pdf
I feel like my implementation should be identical to what I read in such papers, but even after a small number of training examples (~100) my network's output gets close to a constant, basically as if it depended only on the biases.
So, could someone please give me a hint if I'm wrong with my theoretical understanding, or if I just have some issues with my implementation?

Sorting a Maple dataframe by the contents of a column

I have a dataset stored in a Maple dataframe that I'd like to sort by the values in a given column. My actual example is larger, but the data is such that I have two columns, one with numeric values and the other with strings. So for example, say I have a dataframe constructed as:
Mydata := DataFrame(<<2,1,3,0>|<"Red","Blue","Green","Orange">>, columns = [Value,Color] );
I'd like something like the sort command to be able to return the same dataframe with the numbers in the Value column sorted in ascending or descending order, but the sort command doesn't seem to support dataframes. Any ideas on how I can sort this?
You're right that the sort command doesn't currently support DataFrames (but it should!). I've gotten around this by converting the DataFrame column (a DataSeries) to a Vector, sorting the Vector using the output = permutation option, and then indexing the DataFrame by the result. Using your example:
Mydata := DataFrame(<<2,1,3,0>|<"Red","Blue","Green","Orange">>, columns = [Value,Color] );
sort( convert( Mydata[Value], Vector ), output = permutation );
Which returns:
[4, 2, 1, 3]
Indexing the original DataFrame by this result then returns the sorted DataFrame in ascending order of the Value column:
Mydata[ sort( convert( Mydata[Value], Vector ), output = permutation ), .. ];
Mydata[ [4, 2, 1, 3], .. ];
returns:
[     Value  Color     ]
[ 4     0    "Orange"  ]
[ 2     1    "Blue"    ]
[ 1     2    "Red"     ]
[ 3     3    "Green"   ]
That said, I have needed to sort DataFrames a number of times, so I have also created a procedure that seems to work for most of my data sets. It uses a similar approach based on the sort command, but it doesn't require any data conversions, since it works on the Maple DataFrame object itself. To do so, I need to set kernelopts(opaquemodules = false) in order to work directly with the internal DataFrame data object (you could also make a bunch of conversions to intermediate Matrices and Vectors, but this approach limits the amount of duplicate internal data being created):
DSort := proc( self::{DataFrame,DataSeries}, {ByColumn := NULL} )
    local i, opacity, orderindex;
    opacity := kernelopts('opaquemodules' = false):
    if type( self, ':-DataFrame' ) and ByColumn <> NULL then
        orderindex := sort( self[ByColumn]:-data, ':-output' = ':-permutation', _rest );
    elif type( self, ':-DataSeries' ) and ByColumn = NULL then
        orderindex := sort( self:-data, ':-output' = ':-permutation', _rest );
    else
        return self;
    end if;
    kernelopts(opaquemodules = opacity): # Set opaquemodules back to its original setting
    if type( self, ':-DataFrame' ) then
        return DataFrame( self[ orderindex, .. ] );
    else
        return DataSeries( self[ orderindex ] );
    end if;
end proc:
For example:
DSort( Mydata, ByColumn=Value );
returns:
[     Value  Color     ]
[ 4     0    "Orange"  ]
[ 2     1    "Blue"    ]
[ 1     2    "Red"     ]
[ 3     3    "Green"   ]
This also works on strings, so DSort( Mydata, ByColumn=Color ); returns:
[     Value  Color     ]
[ 2     1    "Blue"    ]
[ 3     3    "Green"   ]
[ 4     0    "Orange"  ]
[ 1     2    "Red"     ]
In this procedure, I pass additional arguments to the sort command, which means that you can also add in the ascending or descending options, so you could do DSort( Mydata, ByColumn=Value, `>` ); to return the DataFrame in descending 'Value' order (this doesn't seem to play well with strings though).

Connecting disjoint sets in a 2D array

I am trying to generate a random grid with positions that are Traversable and Non-Traversable, and to ensure that there is a path from every Traversable position to every other Traversable position moving in one of the 4 directions {Right, Up, Left, Down}. Traversable positions are represented as "[ ]" and Non-Traversable positions as "[X]".
Here is a grid I have generated:
[ ][ ][ ][ ][ ][ ][ ][ ][X][ ][ ][X][ ][X]
[ ][ ][X][ ][ ][ ][X][ ][ ][X][X][ ][ ][ ]
[X][ ][ ][ ][ ][X][X][X][ ][ ][ ][X][ ][ ]
[ ][ ][ ][ ][ ][X][ ][ ][ ][X][ ][ ][X][ ]
[ ][X][ ][ ][ ][X][ ][ ][ ][ ][X][X][X][X]
[ ][ ][X][X][X][ ][ ][ ][X][X][X][X][X][X]
[ ][X][ ][ ][ ][X][ ][ ][ ][X][X][ ][ ][X]
[ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][X][ ][ ][ ]
[ ][ ][X][ ][ ][ ][X][ ][X][X][ ][ ][ ][ ]
[ ][X][ ][X][ ][ ][ ][ ][ ][ ][X][X][ ][ ]
[ ][ ][ ][ ][ ][ ][ ][ ][X][ ][ ][X][X][ ]
What algorithm can I use to find the disjoint sets in my grid and create a path between the disjoint sets? Thanks!
To find the disjoint components, you can use a breadth-first search (with a queue) or depth-first search (with a stack) starting at any traversable position. When the search terminates, it will have marked an entire component. Then if there are unmarked positions, use those as starting points for another search, etc., until you have marked all traversable positions.
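For illustration, a minimal C++ sketch of that labeling step, assuming the grid is given as a vector of strings with 'X' marking non-traversable cells (labelComponents and label are names made up for the sketch):

#include <queue>
#include <string>
#include <vector>

// Label each traversable cell with the id of its connected component.
// Returns the number of components found.
int labelComponents(const std::vector<std::string>& grid,
                    std::vector<std::vector<int>>& label) {
    int rows = (int)grid.size(), cols = rows ? (int)grid[0].size() : 0;
    label.assign(rows, std::vector<int>(cols, -1));        // -1 = unvisited
    const int dr[] = {0, -1, 0, 1}, dc[] = {1, 0, -1, 0};  // Right, Up, Left, Down
    int components = 0;
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++) {
            if (grid[r][c] == 'X' || label[r][c] != -1) continue;
            // Breadth-first search from an unmarked traversable cell.
            std::queue<std::pair<int, int>> q;
            q.push({r, c});
            label[r][c] = components;
            while (!q.empty()) {
                auto [cr, cc] = q.front(); q.pop();
                for (int d = 0; d < 4; d++) {
                    int nr = cr + dr[d], nc = cc + dc[d];
                    if (nr >= 0 && nr < rows && nc >= 0 && nc < cols &&
                        grid[nr][nc] != 'X' && label[nr][nc] == -1) {
                        label[nr][nc] = components;
                        q.push({nr, nc});
                    }
                }
            }
            components++;
        }
    return components;
}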
To figure out which non-traversable positions have to be removed, if you wish to remove (nearly) as few as possible, think of each of the "disjoint sets" (better to call them "connected components") as a single node in a graph, and look at a wide variety of paths connecting them. Count the number of X's that have to be removed for each path connecting one node to another, and use that count as the weight of an edge in the graph. Then find the minimum spanning tree of that graph, using, e.g., Kruskal's algorithm; a sketch follows below.
That method is not guaranteed to find the minimum number of X's to remove to connect the traversable positions; for example, in the grid you gave, removing a single X near the top right corner connects three components, whereas my suggestion might result in removing two X's. However, your problem is vaguely specified, so I believe the difference is not important.
To find the exact minimum number of X's to remove, you would have to solve the "node-weighted Steiner tree problem" which in general is NP-hard, I believe. You may be able to get a good approximation, given that your graphs are planar: http://www-math.mit.edu/~hajiagha/NodePlanarSteiner.pdf.
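Here is the promised sketch of the spanning-tree step, assuming the inter-component edge costs (the number of X's on some connecting path) have already been computed; Edge, DSU, and minimumSpanningTree are illustrative names:

#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical edge: connects components u and v; cost is the number of X's
// that must be removed along some path between them (assumed precomputed).
struct Edge { int u, v, cost; };

// Standard union-find with path compression.
struct DSU {
    std::vector<int> parent;
    explicit DSU(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// Kruskal: repeatedly pick the cheapest edge that joins two trees.
std::vector<Edge> minimumSpanningTree(int numComponents, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.cost < b.cost; });
    DSU dsu(numComponents);
    std::vector<Edge> tree;
    for (const Edge& e : edges)
        if (dsu.unite(e.u, e.v)) tree.push_back(e);  // edge joins two trees
    return tree;
}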

What is the best in place sorting algorithm to sort a singly linked list

I've been reading about in-place sorting algorithms for sorting linked lists. As per Wikipedia:
Merge sort is often the best choice for sorting a linked list: in this situation it is relatively easy to implement a merge sort in such a way that it requires only Θ(1) extra space, and the slow random-access performance of a linked list makes some other algorithms (such as quicksort) perform poorly, and others (such as heapsort) completely impossible.
To my knowledge, merge sort is not an in-place sorting algorithm: it has a worst-case auxiliary space complexity of O(n). With this taken into consideration, I am unable to decide whether there exists a suitable algorithm to sort a singly linked list with O(1) auxiliary space.
As pointed out by Fabio A. in a comment, the sorting algorithm implied by the following implementations of merge and split in fact requires O(log n) extra space in the form of stack frames to manage the recursion (or their explicit equivalent). An O(1)-space algorithm is possible using a quite different bottom-up approach.
Here's an O(1)-space merge algorithm that simply builds up a new list by moving the lower item from the top of each list to the end of the new list:
struct node {
    WHATEVER_TYPE val;
    struct node* next;
};

node* merge(node* a, node* b) {
    node* out;
    node** p = &out;  // Address of the next pointer that needs to be changed
    while (a && b) {
        if (a->val < b->val) {
            *p = a;
            a = a->next;
        } else {
            *p = b;
            b = b->next;
        }
        // Next loop iter should write to final "next" pointer
        p = &(*p)->next;
    }
    // At least one of the input lists has run out.
    if (a) {
        *p = a;
    } else {
        *p = b;  // Works even if b is NULL
    }
    return out;
}
It's possible to avoid the pointer-to-pointer p by special-casing the first item to be added to the output list, but I think the way I've done it is clearer.
Here is an O(1)-space split algorithm that simply breaks a list into 2 equal-sized pieces:
node* split(node* in) {
    if (!in) return NULL;  // Have to special-case a zero-length list
    node* half = in;       // Invariant: half != NULL
    while (in) {
        in = in->next;     // "in" takes two steps...
        if (!in) break;
        in = in->next;
        if (!in) break;
        half = half->next; // ...for every one step of "half"
    }
    node* rest = half->next;
    half->next = NULL;
    return rest;
}
Notice that half is only moved forward half as many times as in is. Upon this function's return, the list originally passed as in will have been changed so that it contains just the first ceil(n/2) items, and the return value is the list containing the remaining floor(n/2) items.
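For completeness, a minimal recursive driver might tie the two together like this (the recursion is what costs the O(log n) stack space noted above):

node* merge_sort(node* list) {
    if (!list || !list->next) return list;  // 0 or 1 nodes: already sorted
    node* rest = split(list);  // "list" keeps the first ceil(n/2) nodes
    return merge(merge_sort(list), merge_sort(rest));
}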
This somehow reminds me of my answer to a Dutch National Flag Problem question.
After giving it some thought, this is what I came up with; let's see if it works out. I suppose the main problem is doing the merging step of the merge sort in O(1) extra space.
Our representation of a linked-list:
[ 1 ] => [ 3 ] => [ 2 ] => [ 4 ]
  ^head                      ^tail
You end up with this merging step:
[ 1 ] => [ 3 ] => [ 2 ] => [ 4 ]
  ^p                ^q       ^tail
Here p and q are the pointers to the segments we want to merge.
Simply add your nodes after the tail pointer. If *p <= *q, you add p at the tail:
[ 1 ] => [ 3 ] => [ 2 ] => [ 4 ] => [ 1 ]
  ^p       ^pp      ^q/qq    ^tail    ^tt
Otherwise, add q.
[ 1 ] => [ 3 ] => [ 2 ] => [ 4 ] => [ 1 ] => [ 2 ]
  ^p       ^pp      ^q       ^qq/tail          ^tt
(Keeping track of the end of our list as q advances becomes tricky.)
Now, if you move the nodes around, you will rapidly lose track of where you are. You can beat this by moving your pointers in a clever way, or by adding the segment lengths into the equation. I definitely prefer the latter. The approach becomes:
[ 1 ] => [ 3 ] => [ 2 ] => [ 4 ]
  ^p(2)             ^q(2)    ^tail
[ 3 ] => [ 2 ] => [ 4 ] => [ 1 ]
  ^p(1)    ^q(2)             ^tail
[ 3 ] => [ 4 ] => [ 1 ] => [ 2 ]
  ^p(1)    ^q(1)             ^tail
[ 4 ] => [ 1 ] => [ 2 ] => [ 3 ]
  ^p(0)/q(1)                 ^tail
[ 1 ] => [ 2 ] => [ 3 ] => [ 4 ]
  ^q(0)                      ^tail
Now the lengths are the only extra state you need in order to move your elements, and that is O(1) extra space.

How to select specific item from cartesian product without calculating every other item

I'm mostly convinced that there is an answer to this problem, but for the life of me can't figure out how to do it.
Let's say I have three sets:
A = [ 'foo', 'bar', 'baz', 'bah' ]
B = [ 'wibble', 'wobble', 'weeble' ]
C = [ 'nip', 'nop' ]
And I know how to calculate the cartesian / cross product (it's covered all over the place, on this site and elsewhere), so I won't go over that here.
What I'm looking for is an algorithm that would allow me to simply select a specific item from the cartesian product without generating the whole set or iterating until I reach the nth item.
Of course, I could easily iterate for a small example set like this, but the code I am working on will be working with much larger sets.
Therefore, I'm looking for a function, let's call it 'CP', where:
CP(1) == [ 'foo', 'wibble', 'nip' ]
CP(2) == [ 'foo', 'wibble', 'nop' ]
CP(3) == [ 'foo', 'wobble', 'nip' ]
CP(4) == [ 'foo', 'wobble', 'nop' ]
CP(5) == [ 'foo', 'weeble', 'nip' ]
CP(6) == [ 'foo', 'weeble', 'nop' ]
CP(7) == [ 'bar', 'wibble', 'nip' ]
...
CP(22) == [ 'bah', 'weeble', 'nop' ]
CP(23) == [ 'bah', 'wobble', 'nip' ]
CP(24) == [ 'bah', 'wobble', 'nop' ]
And the answer is produced in O(1) time, more or less.
I've been following the idea that it should be possible (heck, even simple!) to calculate the indices of the elements from A, B, and C that I want and then simply return them from the original arrays, but my attempts to make this work correctly have so far, um, not worked.
I'm coding in Perl, but I can handily port a solution from Python, JavaScript, or Java (and probably a few others)
The number of possible combinations is given by
N = size(A) * size(B) * size(C)
and you can index all items by an index i ranging from 0 to N (exclusive) via
c(i) = [A[i_a], B[i_b], C[i_c]]
where
i_a = i/(size(B)*size(C))
i_b = (i/size(C)) mod size(B)
i_c = i mod size(C)
(all sets are assumed to be indexable starting with zero, and / is integer division).
In order to get your example you may shift the index by 1.
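For example, CP(7) corresponds to i = 6: with size(A) = 4, size(B) = 3, and size(C) = 2, this gives i_a = 6/(3*2) = 1, i_b = (6/2) mod 3 = 0, and i_c = 6 mod 2 = 0, selecting [ 'bar', 'wibble', 'nip' ], which matches CP(7) above.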
I made a Python version of the answer by Howard. Please let me know if you think it can be improved.
def ith_item_of_cartesian_product(*args, repeat=1, i=0):
    pools = [tuple(pool) for pool in args] * repeat
    len_product = len(pools[0])
    for j in range(1, len(pools)):
        len_product = len_product * len(pools[j])
    if i >= len_product:
        raise Exception("i is bigger than the length of the product")
    i_list = []
    for j in range(0, len(pools)):
        ith_pool_index = i
        denom = 1
        for k in range(j + 1, len(pools)):
            denom = denom * len(pools[k])
        ith_pool_index = ith_pool_index // denom
        if j != 0:
            ith_pool_index = ith_pool_index % len(pools[j])
        i_list.append(ith_pool_index)
    ith_item = []
    for j in range(0, len(pools)):
        ith_item.append(pools[j][i_list[j]])
    return ith_item
Here is a shorter Python code based on Howard's answer:
import functools
import operator

def nth_product(n, *iterables):
    sizes = [len(iterable) for iterable in iterables]
    indices = [
        (n // functools.reduce(operator.mul, sizes[i+1:], 1)) % sizes[i]
        for i in range(len(sizes))]
    return tuple(iterables[i][idx] for i, idx in enumerate(indices))
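For example, with the question's sets, nth_product(6, A, B, C) returns ('bar', 'wibble', 'nip'), again matching CP(7).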
