Why does printing out local data mess with MPI_GATHERV results?

The following code needs 12 processors to run properly.
program GameOfLife
  use mpi
  implicit none
  integer ierr, myid, numprocs
  integer send, recv, count, tag
  parameter (tag=111)
  integer recv_buff, send_buff, request
  integer stat(MPI_STATUS_SIZE)
  integer N, m, i, j, sum
  parameter (N=3)      !# of squares per a processors
  integer W, H
  parameter (W=4,H=3)  !# of processors up and across
  integer A(N,N), buff(0:N+1,0:N+1), G(N*H, N*W)
  ! real r
  integer sizes(2), subsizes(2), starts(2), recvcount(N*N)
  integer newtype, intsize, resizedtype
  integer(kind=MPI_ADDRESS_KIND) extent, begin
  integer disp(W*H)
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  ! fill up subgrid
  do i = 1, N
    do j = 1, N
      ! call random_number(r)
      A(i,j) = myid  ! floor(r*2)
    end do
  end do
  do i = 1, N
    print *, A(i, :)
  end do
  starts = [0,0]
  sizes = [N*H, N*W]
  subsizes = [N, N]
  call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
       MPI_ORDER_FORTRAN, MPI_INTEGER, newtype, ierr)
  call MPI_Type_size(MPI_INTEGER, intsize, ierr)
  extent = intsize*N
  begin = 0
  call MPI_Type_create_resized(newtype, begin, extent, resizedtype, ierr)
  call MPI_Type_commit(resizedtype, ierr)
  disp = [0, 1, 2, 9, 10, 11, 18, 19, 20, 27, 28, 29]
  recvcount = 1
  call MPI_GATHERV(A,N*N,MPI_INTEGER,G,recvcount,disp,resizedtype,0,MPI_COMM_WORLD,ierr)
  call MPI_WAIT(request, stat, ierr)
  if ( myid == 0 ) then
    do i = 1, N*H
      print *, G(i,:)
    end do
  endif
  call MPI_FINALIZE(ierr)
end program GameOfLife
When run without printing the matrix A, everything works mostly okay. But if I print A before passing it to the gather call, I get a jumbled mess.
What's going on here? I assume memory is being accessed at the same time, or something along those lines.
The output of G looks like this:
0 0 0 4 4 4 0 -1302241474 1 13 13 13
0 0 0 4 4 4 0 0 0 13 13 13
0 0 0 4 4 4 -10349344 -12542198 -10350200 13 13 13
1 1 1 5 5 5 59 59 59 14 14 14
1 1 1 5 5 5 -1342953792 0 0 14 14 14
1 1 1 5 5 5 32767 0 0 14 14 14
2 2 2 6 6 6 -1342953752 1451441280 0 15 15 15
2 2 2 6 6 6 32767 10985 0 15 15 15
2 2 2 6 6 6 -10350200 1 0 15 15 15
3 3 3 7 7 8 8 8 12 12 12 0
3 3 3 7 7 8 8 8 12 12 12 0
3 3 3 7 7 8 8 8 12 12 12 0
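To separate the layout from the memory question, here is a small Python model of where MPI_GATHERV places each rank's 3x3 block. It involves no MPI, and the function names are hypothetical. The model assumes a receive type built as in the code above: an N x N subarray resized to an extent of N integers. Under that type, a displacement d means d*N elements into the column-major G(N*H, N*W). With the displacements used, every block lands on its own tile, so the layout itself appears consistent.

Two other things in the code are worth checking:
- recvcount is declared as recvcount(N*N), which is only 9 entries. Yet 12 ranks take part, and the root reads one count per rank.
- MPI_WAIT is called on request, which no nonblocking call ever set. MPI_GATHERV is blocking and needs no request.

Both are undefined behavior: reading past the end of recvcount, and waiting on an uninitialized handle. That plausibly explains why an unrelated change such as a print statement alters the result.

```python
def block_origins(disp, n, rows):
    """(row, col), 0-based, where each rank's n x n block starts in a
    column-major receive buffer with `rows` rows, assuming the receive type
    is an n x n subarray resized to an extent of n elements."""
    origins = []
    for d in disp:
        offset = d * n  # displacements count in units of the extent (n elements)
        # Column-major storage: offset = row + col * rows
        origins.append((offset % rows, offset // rows))
    return origins


def coverage(disp, n, rows, cols):
    """How many times each cell of the rows x cols buffer is written."""
    hits = [[0] * cols for _ in range(rows)]
    for r0, c0 in block_origins(disp, n, rows):
        for i in range(n):
            for j in range(n):
                hits[r0 + i][c0 + j] += 1
    return hits


N, W, H = 3, 4, 3
disp = [0, 1, 2, 9, 10, 11, 18, 19, 20, 27, 28, 29]
origins = block_origins(disp, N, N * H)
hits = coverage(disp, N, N * H, N * W)
```

Every cell of the 9 x 12 buffer is hit exactly once. This suggests the displacement and extent choice is not what scrambles G.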


R margins outcome not consistent with expected raw data

We've run an Interrupted Time Series analysis on some aggregate count data using a Poisson regression. Code shown below - where Subject Total is the count, Quarter is time, int2 is the dummy variable for the intervention [0 pre, 1 post] and time_since_intervention2 the dummy variable for time since intervention [0 pre, 1:N post].
fit1a <- glm(`Subject Total` ~ Quarter + int2 + time_since_intervention2 , df, family = "poisson")
Quarter Subject Total int2 time_since_intervention2 subjectfit subcounter
1 1 34 0 0 34.20968 34.20968
2 2 32 0 0 33.39850 33.39850
3 3 36 0 0 32.60656 32.60656
4 4 34 0 0 31.83339 31.83339
5 5 23 0 0 31.07856 31.07856
6 6 34 0 0 30.34163 30.34163
7 7 33 0 0 29.62217 29.62217
8 8 24 0 0 28.91977 28.91977
9 9 31 0 0 28.23402 28.23402
10 10 32 0 0 27.56454 27.56454
11 11 21 0 0 26.91093 26.91093
12 12 26 0 0 26.27282 26.27282
13 13 22 0 0 25.64984 25.64984
14 14 28 0 0 25.04163 25.04163
15 15 28 0 0 24.44784 24.44784
16 16 22 0 0 23.86814 23.86814
17 17 14 1 1 17.88365 23.30218
18 18 16 1 2 17.01622 22.74964
19 19 20 1 3 16.19087 22.21020
20 20 19 1 4 15.40556 21.68355
21 21 13 1 5 14.65833 21.16939
22 22 15 1 6 13.94735 20.66743
23 23 16 1 7 13.27085 20.17736
24 24 8 1 8 12.62717 19.69892
Due to the need to exponentiate the outcome the summary is currently being derived using the margins package.
> summary(margins(fit1a))
factor AME SE z p lower upper
int2 -5.7843 5.1734 -1.1181 0.2635 -15.9241 4.3555
Quarter -0.5809 0.2469 -2.3526 0.0186 -1.0649 -0.0970
time_since_intervention2 -0.6227 0.9955 -0.6255 0.5316 -2.5738 1.3285
If I am reading the output correctly, it suggests that the level change between the final quarter of the pre-intervention period and the first quarter of the post-intervention period is -5.7843.
I've tried plugging the coefficient values into my model [initial intercept = 35.0405575], but they don't appear to match the subjectfit data at all, as I believed they would. Should the level change reported by the margins package replicate the difference in the full data.....?
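The arithmetic behind the printed AMEs can be checked from the table alone. The sketch below is Python, not R, and assumes that int2 and time_since_intervention2 entered the model as numeric columns, not factors. In that case margins reports, for each variable, the average over all 24 rows of the derivative of the fitted mean. Under the log link that derivative is beta_k * mu_i, so AME_k = beta_k * mean(mu). For a Poisson GLM with an intercept, mean(mu) equals the mean of the observed counts.

The sketch does three things:
- backs the coefficients out of the AMEs;
- checks them against the subjectfit and subcounter columns;
- reproduces the intercept value quoted in the question.

```python
import math

# Observed counts ("Subject Total") copied from the table above.
y = [34, 32, 36, 34, 23, 34, 33, 24, 31, 32, 21, 26, 22, 28, 28, 22,
     14, 16, 20, 19, 13, 15, 16, 8]
# AMEs as printed by summary(margins(fit1a)).
ame = {"int2": -5.7843, "Quarter": -0.5809, "time_since_intervention2": -0.6227}

# Poisson GLM with log link and intercept: fitted means average to the observed mean.
mean_mu = sum(y) / len(y)

# d mu / d x_k = beta_k * mu under a log link, so AME_k = beta_k * mean(mu).
# Assumption: all three regressors were numeric columns in the model.
beta = {k: v / mean_mu for k, v in ame.items()}

# Quarter 17 from the table: fitted (int2 = 1, time since = 1) vs counterfactual.
fit_q17, counter_q17 = 17.88365, 23.30218
level_change_q17 = fit_q17 - counter_q17          # difference on the count scale
log_ratio_q17 = math.log(fit_q17 / counter_q17)   # should equal beta_int2 + beta_time_since

# exp(intercept): fitted Quarter 1 value divided by exp(beta_Quarter * 1).
exp_intercept = 34.20968 / math.exp(beta["Quarter"])
```

If the assumption holds, the coefficients are consistent with the fitted columns. The -5.7843 is then an average effect over all quarters, while the fitted-minus-counterfactual difference at Quarter 17 is -5.41853.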

How do I rearrange elements in a 1D matrix array in-place?

I want to align the memory of a 5x5 matrix represented as a one-dimensional array.
The original array looks like this:
let mut a = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25];
or
[ 1 2 3 4 5 ]
[ 6 7 8 9 10 ]
a = [ 11 12 13 14 15 ]
[ 16 17 18 19 20 ]
[ 21 22 23 24 25 ]
with a length of 25 elements.
After resizing the memory to aligned bounds (a power of 2), the array looks like this:
a = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ];
or
[ 1 2 3 4 5 6 7 8 ]
[ 9 10 11 12 13 14 15 16 ]
[ 17 18 19 20 21 22 23 24 ]
[ 25 0 0 0 0 0 0 0 ]
a = [ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
The length of a is now 64 elements,
so it becomes an 8x8 matrix.
The goal is the following representation:
a = [1 2 3 4 5 0 0 0 6 7 8 9 10 0 0 0 11 12 13 14 15 0 0 0 16 17 18 19 20 0 0 0 21 22 23 24 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ];
or
[ 1 2 3 4 5 0 0 0 ]
[ 6 7 8 9 10 0 0 0 ]
[ 11 12 13 14 15 0 0 0 ]
[ 16 17 18 19 20 0 0 0 ]
[ 21 22 23 24 25 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
The background is to have memory aligned to a power of two, so that calculations can be partially done in parallel (for OpenCL float4, or whichever vector sizes are available). To keep memory consumption low, I also do not want to allocate a new array and simply insert the old elements at the correct positions.
At first, I thought about swapping the elements in the ranges where there should be zeros with the elements at the end of the array, keeping a pointer to the elements and simulating a queue. But elements would stack up towards the end, and I didn't come up with a working solution.
My language of choice is Rust. Is there any smart algorithm to achieve the desired result?
So you have an N * N matrix represented as a vector of size N^2, then you resize the vector to M^2 (M > N), so that the first N^2 elements are the original ones. Now you want to rearrange the original elements, so that the N * N sub-matrix in the upper left of the M * M matrix is the same as the original.
One thing to note is that if you go backwards you will never overwrite a value that you will need later.
The position of index X in the M * M matrix is row X / M (integer division) and column X % M.
The desired position of index X is row X / N and column X % N
An element at row R and column C in the M * M matrix has the index R * M + C
Now taking all this information we can come up with the formula to get the new index Y for the old index X:
Y = (X / N) * M + (X % N)
So you can just loop from N^2 - 1 down to N, copy each element to the new position calculated with the formula, and set its original position to 0. Indices 0 to N - 1 (the first row) are already in place. Everything here is 0-based, which matches Rust's indexing.
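The backward loop just described can be sketched in Python (the function name is hypothetical), applied to the 5x5 to 8x8 example from the question. Because M > N, the target index Y is strictly greater than X for every X >= N. Walking downwards therefore never overwrites an element that still has to be moved.

```python
def pad_in_place(a, n, m, fill=0):
    """Grow a row-major n x n matrix stored in list `a` to m x m (m > n),
    moving elements in place so the original sits in the top-left corner."""
    a.extend([fill] * (m * m - len(a)))      # resize; the new tail holds `fill`
    for x in range(n * n - 1, n - 1, -1):    # X = N^2 - 1 down to N
        y = (x // n) * m + (x % n)           # Y = (X / N) * M + (X % N)
        if y != x:
            a[y] = a[x]
            a[x] = fill                      # the old slot becomes padding
    return a


padded = pad_in_place(list(range(1, 26)), 5, 8)
```

Only the list's own extra slots are used, so no second array is needed.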
According to maraca's solution, the code would look like this:
fn zeropad<T: Copy>(
    fill: T,
    data: &mut Vec<T>,
    dims: (usize, usize),
) -> (usize, usize) {
    // Round both dimensions up to the next power of two.
    let r = dims.0.next_power_of_two();
    let c = dims.1.next_power_of_two();
    if (r, c) == dims {
        return (r, c);
    }
    let old_len = data.len();
    let old_col = dims.1;
    debug_assert_eq!(old_len, dims.0 * dims.1);
    // Resize: the new tail is filled with `fill`.
    data.resize(r * c, fill);
    // Walk backwards so no element is overwritten before it has been moved.
    // Row 0 is already in place, so start at index old_col.
    for i in (old_col..old_len).rev() {
        // Same (row, column) as in the smaller matrix, re-indexed with c columns.
        let pos_new = (i / old_col) * c + (i % old_col);
        // If only the row count grew, the element is already in place.
        if pos_new != i {
            data[pos_new] = data[i];
            data[i] = fill;
        }
    }
    (r, c)
}

Lua (trAInsported): trying to implement Wavefront Algorithm, not working

I'm trying to implement a wavefront algorithm and I have a problem with the function that produces the map with specific gradients.
I've tried several different versions of the code below and none of them worked.
The starting point for the algorithm (the goal) is set to 1 beforehand. From that point on, each neighbour's gradient should be increased (as in every wavefront algorithm) if the gradient hasn't been altered yet.
originX and originY are the goal, from which the algorithm should start. mapMatrix is a global variable.
mapMatrix looks like this:
0 0 0 0 0 0 0
0 0 N 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 N 0 0 N 0 N
N N 0 0 N 0 0
0 0 0 0 0 0 0
(0 for rails, N(nil) for obstacles)
expected output example:
7 6 5 4 3 4 5
6 5 N 3 2 3 4
5 4 3 2 1 2 3
6 5 4 3 2 3 4
7 N 5 4 N 4 N
N N 6 5 N 5 6
9 8 7 6 7 6 7
And with this code for example:
function pathFinder(originX, originY)
  northDir = originY - 1
  eastDir = originX + 1
  southDir = originY + 1
  westDir = originX - 1
  if northDir > 0 and mapMatrix[originX][northDir] == 0 then
    mapMatrix[originX][northDir] = mapMatrix[originX][originY] + 1
    pathFinder(originX, northDir)
  end
  if eastDir <= 7 and mapMatrix[eastDir][originY] == 0 then
    mapMatrix[eastDir][originY] = mapMatrix[originX][originY] + 1
    pathFinder(eastDir, originY)
  end
  if southDir <= 7 and mapMatrix[originX][southDir] == 0 then
    mapMatrix[originX][southDir] = mapMatrix[originX][originY] + 1
    pathFinder(originX, southDir)
  end
  if westDir > 0 and mapMatrix[westDir][originY] == 0 then
    mapMatrix[westDir][originY] = mapMatrix[originX][originY] + 1
    pathFinder(westDir, originY)
  end
end
I get this mapMatrix:
0 0 0 0 3 4 5
0 0 N 0 2 10 6
0 0 0 0 1 9 7
0 0 0 0 0 0 8
0 N 0 0 N 0 N
N N 0 0 N 0 0
0 0 0 0 0 0 0
And if I reorder the if statements, it produces a different version of mapMatrix.
EDIT: After making northDir etc. local, the output looks like this:
33 24 23 22 3 4 5
32 25 N 21 2 11 6
31 26 27 20 1 10 7
30 29 28 19 20 9 8
31 N 29 18 N 10 N
N N 30 17 N 11 12
33 32 31 16 15 14 13
If more code or information is needed, I'd be happy to provide it.
Your code is simply wrong. Because pathFinder is called recursively inside the first check, it keeps going in that direction until it hits an obstacle, then goes in the next direction, and so on. That is a depth-first traversal, while a wavefront has to expand breadth-first.
BFS (breadth-first search) is actually a pretty simple algorithm. It can easily be implemented iteratively with a queue, without any recursion, as follows:
1. Put the initial node in a queue.
2. Pop the first node from the queue and process it.
3. Push its unprocessed adjacent nodes to the end of the queue.
4. If the queue is not empty, go to step 2.
In Lua, on a rectangular matrix, it can be implemented in about two or three dozen lines:
-- Lua 5.2+ moved unpack into the table library; fall back to the 5.1 global.
local unpack = table.unpack or unpack
function gradient(matrix, originX, originY)
  -- Create queue and put origin position and initial value to it.
  local queue = { { originX, originY, 1 } }
  repeat
    -- Pop first position and value from the queue.
    local x, y, value = unpack(table.remove(queue, 1))
    -- Mark this position in the matrix.
    matrix[y][x] = value
    -- Check position to the top.
    if y > 1 and matrix[y - 1][x] == 0 then
      -- If it is not already processed, push it to the queue.
      table.insert(queue, { x, y - 1, value + 1 })
    end
    -- Check position on the left.
    if x > 1 and matrix[y][x - 1] == 0 then
      table.insert(queue, { x - 1, y, value + 1 })
    end
    -- Check position to the bottom.
    if y < #matrix and matrix[y + 1][x] == 0 then
      table.insert(queue, { x, y + 1, value + 1 })
    end
    -- Check position on the right.
    if x < #matrix[y] and matrix[y][x + 1] == 0 then
      table.insert(queue, { x + 1, y, value + 1 })
    end
  -- Repeat until the queue is empty.
  until #queue == 0
end
-- Just helper function to print a matrix.
function printMatrix(matrix)
  -- ipairs guarantees row and column order; pairs does not.
  for _, row in ipairs(matrix) do
    for _, value in ipairs(row) do
      io.write(string.format("%2s", value))
    end
    io.write('\n')
  end
end
local mapMatrix = {
  { 0, 0, 0, 0, 0, 0, 0, },
  { 0, 0, 'N', 0, 0, 0, 0, },
  { 0, 0, 0, 0, 0, 0, 0, },
  { 0, 0, 0, 0, 0, 0, 0, },
  { 0, 'N', 0, 0, 'N', 0, 'N', },
  { 'N', 'N', 0, 0, 'N', 0, 0, },
  { 0, 0, 0, 0, 0, 0, 0, },
}
gradient(mapMatrix, 5, 3)
printMatrix(mapMatrix)
--[[
Produces:
 7 6 5 4 3 4 5
 6 5 N 3 2 3 4
 5 4 3 2 1 2 3
 6 5 4 3 2 3 4
 7 N 5 4 N 4 N
 N N 6 5 N 5 6
 9 8 7 6 7 6 7
]]
This is a complete script, runnable in the console.
Although this code is simple for illustrative purposes, it is not very efficient: each table.remove(queue, 1) shifts all the remaining items of the queue down by one. For production code you should implement the queue as a linked list or something similar.
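As an illustration of the efficiency point, here is a Python sketch of the same breadth-first fill (not a translation of the Lua script). It uses collections.deque, whose popleft is O(1). It also marks a cell when it is enqueued rather than when it is dequeued, so each cell enters the queue at most once. Coordinates are 0-based (row, col), so the Lua call gradient(mapMatrix, 5, 3) corresponds to origin (2, 4). Obstacles are represented as None, which is an assumption of this sketch.

```python
from collections import deque


def gradient(grid, origin_row, origin_col):
    """Fill free cells (0) with 1 + BFS distance from the origin, in place.
    Any other value (e.g. None for an obstacle) is left untouched."""
    rows, cols = len(grid), len(grid[0])
    grid[origin_row][origin_col] = 1
    queue = deque([(origin_row, origin_col)])
    while queue:
        r, c = queue.popleft()  # O(1), unlike removing index 0 from a list
        for dr, dc in ((-1, 0), (0, -1), (1, 0), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                grid[nr][nc] = grid[r][c] + 1  # mark on enqueue: queued once
                queue.append((nr, nc))
    return grid


N = None
grid = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, N, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, N, 0, 0, N, 0, N],
    [N, N, 0, 0, N, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
gradient(grid, 2, 4)
```

The filled grid matches the expected wavefront shown above.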

Count the frequency of matrix values including 0

I have a vector
A = [ 1 1 1 2 2 3 6 8 9 9 ]
I would like to write a loop that counts the frequencies of values in my vector within a range I choose, this would include values that have 0 frequencies
For example, if I chose the range of 1:9 my results would be
3 2 1 0 0 1 0 1 2
If I picked 1:11 the result would be
3 2 1 0 0 1 0 1 2 0 0
Is this possible? Also, ideally I would do this for giant matrices and vectors, so the fastest way to calculate this would be appreciated.
Here's an alternative to histcounts, which appears to be ~8x faster on MATLAB R2015b:
A = [ 1 1 1 2 2 3 6 8 9 9 ];
maxRange = 11;
N = accumarray(A(:), 1, [maxRange,1])';
N =
3 2 1 0 0 1 0 1 2 0 0
Comparing the speed:
K>> tic; for i = 1:100000, N1 = accumarray(A(:), 1, [maxRange,1])'; end; toc;
Elapsed time is 0.537597 seconds.
K>> tic; for i = 1:100000, N2 = histcounts(A,1:maxRange+1); end; toc;
Elapsed time is 4.333394 seconds.
K>> isequal(N1, N2)
ans =
1
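For comparison outside MATLAB, here is a minimal Python sketch of what accumarray(A(:), 1, [maxRange,1]) computes here. Each value is used as a 1-based bin index and 1 is added to that bin, so bins that are never hit stay 0. The helper name count_in_range is hypothetical. The sketch assumes all values are integers in 1..maxRange and raises ValueError otherwise.

```python
def count_in_range(values, max_range):
    """Frequency of each integer 1..max_range in `values`, zeros included."""
    counts = [0] * max_range
    for v in values:
        if not 1 <= v <= max_range:
            raise ValueError(f"value {v} outside 1..{max_range}")
        counts[v - 1] += 1  # value v is the (1-based) bin index
    return counts


A = [1, 1, 1, 2, 2, 3, 6, 8, 9, 9]
```

This is a single pass over the data plus one allocation of maxRange counters, which is why the approach scales well to large inputs.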
Since you asked for a loop, here's a looped version, which should not be too slow given the recent overhaul of the MATLAB execution engine:
A = [ 1 1 1 2 2 3 6 8 9 9 ];
maxRange = 11; %// your range
output = zeros(1,maxRange); %// initialise output
for ii = 1:maxRange
    tmp = A==ii; %// temporary storage
    output(ii) = sum(tmp(:)); %// find the number of occurrences
end
which would result in
output =
3 2 1 0 0 1 0 1 2 0 0
Faster, and without a loop, would be @beaker's suggestion to use histcounts:
[N,edges] = histcounts(A,1:maxRange+1);
N =
3 2 1 0 0 1 0 1 2 0 0
where the +1 adds a final bin edge at maxRange+1, so that values equal to maxRange get their own bin.
Assuming the input A is sorted, and the range starts at 1 and extends to some value greater than or equal to the largest element in A, here's an approach using diff and find -
%// Inputs
A = [2 4 4 4 8 9 11 11 11 12]; %// Modified for variety
maxN = 13;
idx = [0 find(diff(A)>0) numel(A)]+1;
out = zeros(1,maxN); %// OR for better performance : out(maxN) = 0;
out(A(idx(1:end-1))) = diff(idx);
Output -
out =
0 1 0 3 0 0 0 1 1 0 3 1 0
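The diff/find answer counts the lengths of runs of equal values in sorted input. Here is the same idea in Python, as a hypothetical count_sorted helper built on itertools.groupby, which groups consecutive equal elements. Like the MATLAB version, it assumes the input is sorted and holds integers in 1..maxN.

```python
from itertools import groupby


def count_sorted(sorted_values, max_n):
    """Counts for 1..max_n from sorted input: one run of equal values = one count."""
    out = [0] * max_n
    for value, run in groupby(sorted_values):
        out[value - 1] = sum(1 for _ in run)  # run length
    return out


A = [2, 4, 4, 4, 8, 9, 11, 11, 11, 12]
```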
This can be done very easily with bsxfun.
Let the data be
A = [ 1 1 1 2 2 3 6 8 9 9 ]; %// data
B = 1:9; %// possible values
Then
result = sum(bsxfun(@eq, A(:), B(:).'), 1);
gives
result =
3 2 1 0 0 1 0 1 2
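The bsxfun answer compares every element of A against every value in B. This Python sketch of the same all-pairs comparison uses a hypothetical helper name. It makes the cost visible: numel(A) * numel(B) comparisons, which matters for the "giant matrices" case in the question. In return, the candidate values need not be positive integers.

```python
def count_by_comparison(values, candidates):
    """Count each candidate by comparing it with every element:
    len(values) * len(candidates) comparisons in total."""
    return [sum(1 for v in values if v == b) for b in candidates]


A = [1, 1, 1, 2, 2, 3, 6, 8, 9, 9]
```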

Selecting neighbours on a circle

Suppose we have N points on a circle, each assigned an index i = 1, 2, ..., N. For a randomly selected point, I want a vector containing the indices of 5 points: [two left neighbors, the point itself, two right neighbors].
See the figure below.
Some examples:
N = 18;
selectedPointIdx = 4;
sequence = [2 3 4 5 6];
selectedPointIdx = 1
sequence = [17 18 1 2 3]
selectedPointIdx = 17
sequence = [15 16 17 18 1];
The straightforward way to code this is to handle the edge cases with if-else statements, as I did:
if ii == 1
    lseq = [N-1 N ii ii+1 ii+2];
elseif ii == 2
    lseq = [N ii-1 ii ii+1 ii+2];
elseif ii == N-1
    lseq = [ii-2 ii-1 ii N 1];
elseif ii == N
    lseq = [ii-2 ii-1 ii 1 2];
else
    lseq = [ii-2 ii-1 ii ii+1 ii+2];
end
where ii is selectedPointIdx.
This becomes unwieldy if I want, for instance, 7 points instead of 5. What is a more efficient way?
How about this -
off = -2:2
out = mod((off + selectedPointIdx) + 17,18) + 1
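For a window size of 7, change off to -3:3. Here 18 is N and 17 is N - 1, so for a general N this is mod(off + selectedPointIdx + N - 1, N) + 1.
It uses the strategy of subtracting 1, taking the modulus and adding 1 back, as also discussed here.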
Sample run -
>> off = -2:2;
for selectedPointIdx = 1:18
disp(['For selectedPointIdx =',num2str(selectedPointIdx),' :'])
disp(mod((off + selectedPointIdx) + 17,18) + 1)
end
For selectedPointIdx =1 :
17 18 1 2 3
For selectedPointIdx =2 :
18 1 2 3 4
For selectedPointIdx =3 :
1 2 3 4 5
For selectedPointIdx =4 :
2 3 4 5 6
For selectedPointIdx =5 :
3 4 5 6 7
For selectedPointIdx =6 :
4 5 6 7 8
....
For selectedPointIdx =11 :
9 10 11 12 13
For selectedPointIdx =12 :
10 11 12 13 14
For selectedPointIdx =13 :
11 12 13 14 15
For selectedPointIdx =14 :
12 13 14 15 16
For selectedPointIdx =15 :
13 14 15 16 17
For selectedPointIdx =16 :
14 15 16 17 18
For selectedPointIdx =17 :
15 16 17 18 1
For selectedPointIdx =18 :
16 17 18 1 2
You can use modular arithmetic instead. Let p be one of N points numbered 1 to N. If you want m neighbors on each side, you can get them as follows:
(p - m - 1) mod N + 1
...
(p - 4) mod N + 1
(p - 3) mod N + 1
(p - 2) mod N + 1
p
p mod N + 1
(p + 1) mod N + 1
(p + 2) mod N + 1
(p + 3) mod N + 1
...
(p + m - 1) mod N + 1
In general, the neighbor at offset k (-m <= k <= m) is (p + k - 1) mod N + 1.
Code:
N = 18;
p = 2;
m = 3;
for i = p - m : p + m
    nb = mod((i - 1), N) + 1;
    disp(nb);
end
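The offset formula (p + k - 1) mod N + 1 can be sketched in Python as a hypothetical circular_window helper. It returns the whole window at once for any half-width m. Python's % always returns a non-negative result for a positive modulus, as MATLAB's mod does. That is why no extra "+ N" term is needed before taking the modulus.

```python
def circular_window(p, n, m):
    """Indices [p-m, ..., p, ..., p+m] of points on a circle numbered 1..n,
    wrapped around: shift to 0-based, take the modulus, shift back."""
    return [(p + k - 1) % n + 1 for k in range(-m, m + 1)]
```

For a window of 7 points, pass m = 3. The function assumes 2*m + 1 <= n; otherwise indices repeat.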
Note that avoiding an if statement will not necessarily improve performance; a benchmark would be needed to find out. In any case, the difference only becomes significant if you are processing tens of thousands of numbers.
