I grouped 8 processors into two groups of four each, and I want the root of each subgroup to communicate with its subordinates using the subroutine MPI_BCAST.
This raises a question: to indicate the root of a subgroup, should I use the rank the subgroup root has in the original MPI_COMM_WORLD communicator, or the new rank it has in the new communicator?
Take the code snippet below for example: I want P:0 to send data to its subordinates P:1, P:2, and P:3, and similarly I want P:4 to send its data to P:5, P:6, and P:7. To achieve this, should I specify the fourth argument of MPI_BCAST in line 36 as the root's rank in the new communicator (the same value for both subgroups), or as 0 and 4 respectively, depending on which subgroup head I am referring to?
Thanks.
Lee
1 program main
2 include 'mpif.h'
3 integer :: ierr, irank, num_procs, base_group
4 integer :: nrow, ncol, irow, icol
5 integer :: dummy_group, dummy_comm, new_comm, new_rank
6 integer :: i, j, roster(4), data(4)
7
8 call MPI_Init ( ierr )
9 call MPI_COMM_RANK( MPI_comm_world, irank, ierr )
10 call MPI_COMM_SIZE( MPI_comm_world, num_procs, ierr)
11 call MPI_COMM_GROUP( MPI_comm_world, base_group, ierr)
12 nrow = 4
13 ncol = 2
14 irow = mod( irank, nrow ) + 1
15 icol = irank/nrow + 1
16
17 roster(1) = 0
18 do i = 2, nrow
19 roster(i) = roster(i-1) + 1
20 enddo
21
22 do i = 1, ncol
23 call MPI_GROUP_INCL( base_group, nrow, roster, dummy_group, ierr )
24 call MPI_COMM_CREATE( MPI_COMM_WORLD, dummy_group, dummy_comm, ierr )
25 if( icol == i ) new_comm = dummy_comm
26 forall( j=1:nrow ) roster(j) = roster(j) + nrow
27 enddo
28
29 ! Here I want to initialize data for processors P:0 and P:4
30 if( irank == 0 ) data = 0
31 if( irank == 4 ) data = 4
32
33 ! In the code below I want to require P:0 to send data to
34 ! its subordinates P:1, P:2, and P:3. Similarly, I ask P:4
35 ! to send out its data to P:5, P:6, P:7.
36 call MPI_BCAST( data, 4, MPI_INTEGER, 0, new_comm, ierr)
37
38 call MPI_Finalize ( ierr )
39 end program
All rank-type arguments (origin, target, root, etc.) in MPI must be ranks in the communicator given by the communicator argument. In practice, this means that after creating a new communicator, each process in it must call MPI_Comm_rank and MPI_Comm_size to retrieve its rank and the communicator's size (unless you can deduce the new rank and size by other means in your code, of course).
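Applied to your snippet, the root argument in line 36 should be 0: because the heads are listed first in roster, world ranks 0 and 4 both become rank 0 inside their respective new_comm. A minimal sketch using the names from your code:
call MPI_COMM_RANK( new_comm, new_rank, ierr )            ! rank within the subgroup communicator
! the root argument is interpreted in new_comm, so 0 selects each subgroup head
call MPI_BCAST( data, 4, MPI_INTEGER, 0, new_comm, ierr )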
As an aside, since what you're doing is splitting the original communicator into two disjoint communicators, I think an easier way to accomplish that is to use MPI_Comm_split rather than setting up the groups manually as you have done.
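A minimal sketch of that variant (the color irank/nrow puts world ranks 0-3 and 4-7 into separate communicators, and using irank as the key keeps the original ordering, so each head is again rank 0 in its communicator):
call MPI_COMM_SPLIT( MPI_COMM_WORLD, irank/nrow, irank, new_comm, ierr )
call MPI_BCAST( data, 4, MPI_INTEGER, 0, new_comm, ierr )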
The following code needs 12 processors to run properly.
program GameOfLife
use mpi
implicit none
integer ierr, myid, numprocs
integer send, recv, count, tag
parameter (tag=111)
integer recv_buff, send_buff, request
integer stat(MPI_STATUS_SIZE)
integer N, m, i, j, sum
parameter (N=3) !# of squares per processor
integer W, H
parameter (W=4,H=3) !# of processors up and across
integer A(N,N), buff(0:N+1,0:N+1), G(N*H, N*W)
! real r
integer sizes(2), subsizes(2), starts(2), recvcount(N*N)
integer newtype, intsize, resizedtype
integer(kind=MPI_ADDRESS_KIND) extent, begin
integer disp(W*H)
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
! fill up subgrid
do i = 1, N
do j = 1, N
! call random_number(r)
A(i,j) = myid! floor(r*2)
end do
end do
do i = 1, N
print *, A(i, :)
end do
starts = [0,0]
sizes = [N*H, N*W]
subsizes = [N, N]
call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
MPI_ORDER_FORTRAN, MPI_INTEGER, newtype, ierr)
call MPI_Type_size(MPI_INTEGER, intsize, ierr)
extent = intsize*N
begin = 0
call MPI_Type_create_resized(newtype, begin, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)
disp = [0, 1, 2, 9, 10, 11, 18, 19, 20, 27, 28, 29]
recvcount = 1
call MPI_GATHERV(A,N*N,MPI_INTEGER,G,recvcount,disp,resizedtype,0,MPI_COMM_WORLD,ierr)
call MPI_WAIT(request, stat, ierr)
if ( myid == 0 ) then
do i = 1, N*H
print *, G(i,:)
end do
endif
call MPI_FINALIZE(ierr)
end program GameOfLife
When run without printing out the matrix A, everything works mostly okay. But if I try to print out A before feeding it to the gather statement, I get a jumbled mess.
What's going on here? I assume memory is being accessed by several processes at the same time, or something along those lines.
Output of G looks like
0 0 0 4 4 4 0 -1302241474 1 13 13 13
0 0 0 4 4 4 0 0 0 13 13 13
0 0 0 4 4 4 -10349344 -12542198 -10350200 13 13 13
1 1 1 5 5 5 59 59 59 14 14 14
1 1 1 5 5 5 -1342953792 0 0 14 14 14
1 1 1 5 5 5 32767 0 0 14 14 14
2 2 2 6 6 6 -1342953752 1451441280 0 15 15 15
2 2 2 6 6 6 32767 10985 0 15 15 15
2 2 2 6 6 6 -10350200 1 0 15 15 15
3 3 3 7 7 8 8 8 12 12 12 0
3 3 3 7 7 8 8 8 12 12 12 0
3 3 3 7 7 8 8 8 12 12 12 0
I am trying to figure out how to use MPI to work with matrices.
I have a 3x6 matrix filled with zeros and I am running the code with 3 processes. Rank 0 is the main one, rank 1 writes ones into columns 1-3 of the first row, and rank 2 writes twos into columns 4-6 of the second row.
I pass these pieces to the main process (rank 0) and get the correct result, but after that a memory error is printed to the console.
I can't figure out what I'm doing wrong. Can you please tell me where my mistake is?
program test
Use mpi
Implicit None
integer :: process_Rank, size_Of_Cluster, ierror = 0, win, size_s, n = 6
integer:: i , j
integer:: start, target_count = 9
integer :: mtx(3,6)
integer(kind = MPI_ADDRESS_KIND) :: nbytes = 4
!input matrix
do i = 1,3
do j =1,6
mtx(i,j) = 0
end do
end do
Call mpi_sizeof( mtx, size_s, ierror ) !Get the size of a matrix element
call MPI_INIT(ierror)
call MPI_COMM_SIZE(MPI_COMM_WORLD, size_Of_Cluster, ierror)
call MPI_COMM_RANK(MPI_COMM_WORLD, process_Rank, ierror)
!create windows
if(process_Rank == 0) then
call MPI_WIN_CREATE(mtx, size_s *6 * 3 * nbytes, 1, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierror)
else
call MPI_WIN_CREATE(mtx, size_s * 6* 3*nbytes,1, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierror)
end if
CALL MPI_Win_fence(0,win,ierror)
if(process_Rank == 1) then
!fill 3 columns of the first row with ones
start = 0
do i = 0,3
mtx(process_Rank,i+start) = process_Rank
end do
CALL MPI_PUT(mtx, size_s*3*6, MPI_INTEGER, 0, start * nbytes, target_count, MPI_INTEGER, win, ierror)
!print mtx
print *, process_Rank, ' put = '
do i = 1,3
print *, ''
do j = 1,3
write(*,fmt='(g0)', advance = 'no') mtx(i,j)
write(*,fmt='(g0)', advance = 'no') ' '
end do
end do
end if
CALL MPI_Win_fence(0, win,ierror)
if(process_Rank == 2) then
!fill the last 3 columns of the second row with twos
start = 3
do i = 1,3
mtx(process_Rank,i+start) = process_Rank
end do
CALL MPI_PUT(mtx(1:3,4:6), size_s* 3 *6, MPI_INTEGER, 0, 3 * 3 * nbytes, target_count, MPI_INTEGER, win, ierror)
!print mtx
print *, process_Rank, ' put = '
do i = 1,3
print *, ''
do j = 4,6
write(*,fmt='(g0)', advance = 'no') mtx(i,j)
write(*,fmt='(g0)', advance = 'no') ' '
end do
end do
end if
CALL MPI_Win_fence(0, win,ierror)
! print result
if(process_Rank == 0) then
print *, 'result = '
do i = 1,3
print *, ''
do j = 1,6
write(*,fmt='(g0)', advance = 'no') mtx(i,j)
write(*,fmt='(g0)', advance = 'no') ' '
end do
end do
end if
CALL MPI_Win_fence(0, win,ierror)
CALL MPI_WIN_FREE(win, ierror)
call MPI_FINALIZE(ierror)
end program test
Console:
1 put =
1 1 1
0 0 0
0 0 0
2 put =
0 0 0
2 2 2
0 0 0
result =
1 1 1 0 0 0
0 0 0 2 2 2
0 0 0 0 0 0
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7fd4447bcd01 in ???
#1 0x7fd4447bbed5 in ???
#2 0x7fd4445f020f in ???
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node alm-VirtualBox exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
If you use the -fcheck=all option, which Ian Bush suggested to you in the first comment under your question, you will get the reason for the error immediately, without having to wait many hours for feedback on the internet. I got:
At line 38 of file mpi_wins.f90 Fortran runtime error:
Index '0' of dimension 2 of array 'mtx' below lower bound of 1
Error termination. Backtrace:
#0 0x7f7ed3e75640 in ???
#1 0x7f7ed3e76185 in ???
#2 0x7f7ed3e7652a in ???
#3 0x4010e4 in test
at /home/lada/f/testy/stackoverflow/mpi_wins.f90:38
#4 0x401e78 in main
at /home/lada/f/testy/stackoverflow/mpi_wins.f90:3
You are indexing your mtx array using the process rank, but the array is defined to start from 1.
integer :: mtx(3,6)
However, MPI ranks start from 0, not from 1.
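One way to keep every index inside those bounds is to loop over the local columns 1 to 3 and let start carry the column offset; a minimal sketch for the rank-1 branch (assuming the intent is that rank 1 fills columns 1-3 of row 1):
if (process_Rank == 1) then
   start = 0
   do i = 1, 3   ! columns start+1 .. start+3, never column 0
      mtx(process_Rank, start + i) = process_Rank
   end do
end if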
Also notice that the backtrace now contains a better code location thanks to the -g compiler option.
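For reference, a compile-and-run line in the same spirit (the wrapper name may differ on your MPI installation):
$ mpifort -g -fcheck=all mpi_wins.f90 -o mpi_wins && mpirun -np 3 ./mpi_wins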
I'm trying to solve a problem with retaining values when I de/allocate arrays in the Fortran code shown below; I make a copy array, but the problem persists. I've already seen links related to the topic:
Fortran array automatically growing when adding a value
How to get priorly-unknown array as the output of a function in Fortran
It would be easy, but it would defeat the purpose of this code, to know the array dimension in advance (as input from the txt file).
I am probably making some mistakes (one of them is obvious: the dimension of minuto versus the expected total dimension), and I would be grateful if someone pointed them out. Beyond that, I can't understand how making a copy array can solve the problem, because I need to de/allocate both the temporary and the main variables.
So, is it possible to read a txt file without knowing the array dimension in advance, using repeated de/allocation?
Here is the code (f90):
program prueba
implicit none
integer, dimension(:), allocatable :: minuto, temp
integer :: iounit, ierr
integer :: i = 1
integer :: n = 1
open(newunit = iounit, file = 'datos.txt')
read(iounit,*)
allocate(minuto(n), temp(n))
minuto = 0; temp = 0
!-------------------------------------------
do
read(unit = iounit, fmt = '(i2)',iostat = ierr) temp(i)
if (ierr/=0) exit
if (mod(size(temp,1),5)==0) then
deallocate(minuto)
allocate(minuto(i))
minuto((i-4):i) = temp((i-4):i)
end if
i = i+1
deallocate(temp)
allocate(temp(i))
end do
close(iounit)
print*,minuto
end program prueba
(I know there are better ways to achieve the same goal; this is only an exercise to deepen my understanding.)
I use this example data (from the txt file):
min
5
10
15
20
25
30
35
40
45
50
55
0
That's the result:
-2144186072 1 -2144186072 1 25 0 35 40 45 50
In the reallocation process you deallocate minuto and don't save its old data.
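The usual idiom is to save the old contents before reallocating, for example with move_alloc; a minimal sketch (old_data and new_value are placeholder names):
integer, allocatable :: old_data(:)
call move_alloc(minuto, old_data)        ! old_data takes over minuto's storage, minuto becomes deallocated
allocate(minuto(size(old_data) + 1))
minuto(:size(old_data)) = old_data       ! restore the old contents
minuto(size(old_data) + 1) = new_value   ! append the value just read (placeholder)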
Here is a sample program that could work for you:
program prueba
implicit none
integer, allocatable :: minuto(:)
integer, parameter :: n = 2
integer :: iounit, ierr, temp(n), i
open (newunit = iounit, file = 'datos.txt')
read (iounit, *)
! init minuto. needed for move_alloc in first call
allocate (minuto(0))
i = 1
do
read (unit = iounit, fmt = '(i2)', iostat = ierr) temp(i)
! exit loop. still save temp(1:i-1)
if (ierr /= 0) then
if (i > 1) call save_temp(i-1)
exit
end if
! save all of temp
if (i == n) call save_temp(n)
i = mod(i, n) +1
end do
close (iounit)
print *, minuto
contains
subroutine save_temp(n_temp)
!! append temp(1:n_temp) to minuto
integer, intent(in) :: n_temp
integer, allocatable :: temp_reloc(:)
! save old data from minuto into temp_reloc
call move_alloc(minuto, temp_reloc)
allocate (minuto(size(temp_reloc) + n_temp))
! init first part of minuto by its old data
minuto(:size(temp_reloc)) = temp_reloc
! append temp's data
minuto(size(temp_reloc)+1:) = temp(1:n_temp)
end subroutine
end program
Output
$ gfortran -g3 -Wall -fcheck=all a.f90 && ./a.out
5 10 15 20 25 30 35 40 45 50 55 0
I have a question about the use of MPI_Init().
I want to initialize the random number array "randv" only on the root processor, using the code below. To check whether my goal is fulfilled, I have the program print out the array "randv" in a do loop placed immediately after the line "call RANDOM_NUMBER(randv)".
However, what is shown on the screen is the random number array repeated 8 times (the number of processors being 8). My question is: why are the processors other than the root one already running before the call to MPI_Init()? If all the processors are awake and have the same random number array before invoking MPI_Init, why bother to call MPI_Init() for initialization at all? Thanks.
Lee
Here is the example I use:
program main
include 'mpif.h'
integer :: i
integer :: ierr
integer :: irank
integer :: nrow, ncol
real, dimension(:,:), allocatable :: randv
nrow = 4
ncol = 2
allocate(randv(nrow,ncol))
call RANDOM_SEED
call RANDOM_NUMBER(randv)
do i = 1, nrow
write(*,'(2(f5.2,x))') randv(i,:)
enddo
call MPI_Init ( ierr )
allocate(row_list(ncol), col_list(nrow))
call MPI_Comm_rank ( MPI_COMM_WORLD, irank, ierr )
if( irank == 0 )then
do i = 1, nrow
write(*,'(2(f5.2,x))') randv(i,:)
enddo
endif
call MPI_Finalize ( ierr )
deallocate( randv )
end program
I think you misunderstand how MPI works. The program you wrote is executed by every process. MPI_Init initializes the MPI environment so that those processes can interact. After initialization, every process is uniquely identified by its rank. You have to make sure that, based on these ranks, each process works on a different portion of your data or performs a different task.
Typically, you should run MPI_Init before anything else in your program.
Using MPI_Comm_rank you can obtain the ID of the current process (its rank). The first process always has rank 0. Therefore, if you want to run parts of the code on the "master" process only, you can test for irank == 0:
program main
include 'mpif.h'
integer :: i
integer :: ierr
integer :: irank
integer :: nrow, ncol
real, dimension(:,:), allocatable :: randv
real, dimension(:), allocatable :: row_list, col_list ! assumed real; declarations needed so the allocate below compiles
! Initialize MPI
call MPI_Init ( ierr )
! Get process ID
call MPI_Comm_rank ( MPI_COMM_WORLD, irank, ierr )
! Executed on all processes
nrow = 4
ncol = 2
allocate(randv(nrow,ncol))
! Only executed on the master process
if ( irank == 0 ) then
call RANDOM_SEED
call RANDOM_NUMBER(randv)
do i = 1, nrow
write(*,'(2(f5.2,x))') randv(i,:)
enddo
endif
! Executed on all processes
allocate(row_list(ncol), col_list(nrow))
! Only executed on the master process
if ( irank == 0 ) then
do i = 1, nrow
write(*,'(2(f5.2,x))') randv(i,:)
enddo
endif
deallocate( randv )
! Finalize MPI, should always be executed last
call MPI_Finalize ( ierr )
end program
I have a Fortran MPI code in which a compute-intensive function is invoked on every element of a 2D array. I'm trying to split the work among the ranks: for example, if there are 30 columns and 10 ranks, then each rank gets 3 columns. The following code does this split and gathers the results using allgather. But the final array doesn't have the values from all ranks.
program allgather
include 'mpif.h'
!create a 2 x 30 myarray
integer :: x=2,y=30
integer :: numprocs,myid
integer :: i,j,k,myelements,mycolumns,jb,je
integer*4,dimension(:),allocatable :: displacement,recvcnt
real :: checksum
real,dimension(:,:),allocatable :: myarr,combinedarr
call MPI_INIT(IERR)
call MPI_COMM_SIZE(MPI_COMM_WORLD,NUMPROCS,IERR)
call MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
mycolumns = y/numprocs
myelements = x * mycolumns
allocate(displacement(numprocs),recvcnt(numprocs))
jb = 1 + ( myid * mycolumns )
je = ( myid + 1 ) * mycolumns
allocate(myarr(x,mycolumns))
allocate(combinedarr(x,y))
myarr(:,:) =0
do j=jb,je
do i=1,x
myarr(i,j) = 1
enddo
enddo
!myarr(:,:)=1
if(mod(y,numprocs) > 0) then
if(myid==numprocs-1) then
jb=(myid + 1) * mycolumns + 1
do j=jb,y
do i=1,x
myarr(i,j) = 1
enddo
enddo
endif
endif
combinedarr(:,:) =0
recvcnt(:)=myelements
do k=1,numprocs
displacement(k) = (k-1) *myelements
enddo
call MPI_ALLGATHERV(myarr,myelements,MPI_REAL,combinedarr,recvcnt,displacement,MPI_REAL,MPI_COMM_WORLD,IERR)
if(mod(y,numprocs) > 0) then
recvcnt(:) = 0
recvcnt(numprocs) = (x*y) - myelements * (numprocs)
displacement(numprocs) = displacement(numprocs) + myelements
call MPI_ALLGATHERV(myarr,recvcnt(numprocs),MPI_REAL,combinedarr,recvcnt,displacement,MPI_REAL,MPI_COMM_WORLD,IERR)
endif
if (myid==0) then
checksum=0
write(6,*) "mycolumns:",mycolumns,"myelements:",myelements
do j=1,y
do i=1,x
checksum = checksum + combinedarr(i,j)
enddo
enddo
write(6,*) checksum
endif
end
First of all, you are using MPI_ALLGATHERV() just as MPI_ALLGATHER() and get no benefit from its ability to send different numbers of elements from/to each process. But that's not the error in your program. The error lies in the way you fill myarr. You allocate it as myarr(x,mycolumns), but when filling it from column jb to column je you go past the end of the array in all processes except rank 0, since jb and je are greater than mycolumns there. Thus myarr contains ones only on rank 0 and zeroes on all the other ranks. So, yes, the final array does not have the values you expect, but that's because you filled it incorrectly, not because of the way the MPI subroutines are used.
Writing past the end of an allocatable array destroys the hidden structures that are used to manage heap allocation and usually crashes the program. In your case you were just lucky that it did not crash for you - when I ran your code with Open MPI, it crashed with core dumps every time.
And you are also missing a call to MPI_FINALIZE() at the end of your code.
Hint: use the Fortran 90 interface if available - replace include 'mpif.h' with use mpi
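Here is a minimal sketch of the fill fix, assuming each rank stores only its own mycolumns columns locally (the variable names are the ones from your program, and use mpi replaces include 'mpif.h' as suggested):
allocate( myarr(x, mycolumns) )   ! local block only: x rows, mycolumns columns
do j = 1, mycolumns               ! local column indices, not jb..je
   do i = 1, x
      myarr(i,j) = 1
   enddo
enddo
call MPI_ALLGATHERV( myarr, myelements, MPI_REAL, combinedarr, recvcnt, &
                     displacement, MPI_REAL, MPI_COMM_WORLD, ierr )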
Here is the final version of the code. I have implemented the fixes suggested by Hristo Iliev and also fixed the part where the number of ranks does not evenly divide the number of columns: the last rank does the computation on the leftover columns.
program allgather
include 'mpif.h'
!create a 4 x 6 myarray
integer :: x=4,y=6
integer :: numprocs,myid
integer :: i,j,k,myelements,mycolumns,jb,je,jbb
integer*4,dimension(:),allocatable :: displacement,recvcnt
real :: checksum
real,dimension(:,:),allocatable :: myarr,combinedarr
call MPI_INIT(IERR)
call MPI_COMM_SIZE(MPI_COMM_WORLD,NUMPROCS,IERR)
call MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
mycolumns = y/numprocs
myelements = x * mycolumns
allocate(displacement(numprocs),recvcnt(numprocs))
jb = 1 + ( myid * mycolumns )
je = ( myid + 1 ) * mycolumns
allocate(myarr(x,y))
allocate(combinedarr(x,y))
myarr(:,:) =0
do j=jb,je
do i=1,x
myarr(i,j) = (j-1) * x + i
enddo
enddo
if(mod(y,numprocs) > 0) then
if(myid==numprocs-1) then
jbb=(myid + 1) * mycolumns + 1
do j=jbb,y
do i=1,x
myarr(i,j) = (j-1) * x + i
enddo
enddo
endif
endif
combinedarr(:,:) =0
recvcnt(:)=myelements
do k=1,numprocs
displacement(k) = (k-1) *myelements
enddo
call MPI_ALLGATHERV(myarr(1,jb),myelements,MPI_REAL,combinedarr,recvcnt,displacement,MPI_REAL,MPI_COMM_WORLD,IERR)
if(mod(y,numprocs) > 0) then
recvcnt(:) = 0
recvcnt(numprocs) = (x*y) - myelements * (numprocs)
displacement(numprocs) = displacement(numprocs) + myelements
call MPI_ALLGATHERV(myarr(1,jbb),recvcnt(numprocs),MPI_REAL,combinedarr,recvcnt,displacement,MPI_REAL,MPI_COMM_WORLD,IERR)
endif
if (myid==0) then
checksum=0
write(6,*) "mycolumns:",mycolumns,"myelements:",myelements
do j=1,y
do i=1,x
checksum = checksum + combinedarr(i,j)
enddo
enddo
write(6,*) checksum
endif
end