MPI_BYTE error in allreduce on cluster

MPI_BYTE runs perfectly on one cluster but throws an error on the other.
Is there any reason for this? sizeof(bool) is 1 byte, and I would like 1 byte to be reduced.
Here is the code:
int main( int argcs, char *pArgs[] )
{
MPI_Init( &argcs, &pArgs );
int my_rank, comsize;
MPI_Comm_rank( MPI_COMM_WORLD, &my_rank );
MPI_Comm_size( MPI_COMM_WORLD, &comsize );
bool sb=false;
if(my_rank==comsize-1)
{
sb=true;
}
bool rb=true;
MPI_Request request0;
double t1;
t1 = MPI_Wtime();
MPI_Iallreduce( &sb, &rb, sizeof(bool), MPI_BYTE, MPI_MAX, MPI_COMM_WORLD, &request0 );
MPI_Wait( &request0, MPI_STATUS_IGNORE );
double t2 = MPI_Wtime();
MPI_Finalize();
}

I do not think the standard allows you to use MPI_BYTE with a C bool.
FWIW, in Fortran you can use MPI_LOGICAL.
Your assumption that sizeof(bool) == 1 is not guaranteed, since the size of bool is implementation-defined; please refer to "Is sizeof(bool) defined?" for the details.
From my point of view, your program is incorrect and hence has undefined behavior.
I am afraid you have to manually convert the bool to a byte in C, and then you can use MPI_BYTE.
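
One way to follow that advice is to copy the flag into an unsigned char and reduce that instead. Note also that the MPI standard defines only the bitwise operations (MPI_BAND, MPI_BOR, MPI_BXOR) for MPI_BYTE, not MPI_MAX, which may be why one implementation rejects the call. The sketch below shows only the conversion and uses a local loop as a stand-in for the reduction so that it runs without MPI; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers, not part of MPI. */

/* Map a C bool to exactly one byte holding 0 or 1. */
unsigned char bool_to_byte(bool b) { return b ? 1u : 0u; }

/* Map the reduced byte back to a bool. */
bool byte_to_bool(unsigned char c) { return c != 0; }

/* Local stand-in for a bitwise-OR reduction over one byte per rank.
 * With MPI the same result would come from
 *   MPI_Allreduce(&send, &recv, 1, MPI_BYTE, MPI_BOR, MPI_COMM_WORLD);
 * where send and recv are unsigned char. */
unsigned char reduce_bor(const unsigned char *per_rank, size_t nranks)
{
    unsigned char acc = 0;
    for (size_t i = 0; i < nranks; i++)
        acc |= per_rank[i];
    return acc;
}
```

Because every byte holds 0 or 1, the bitwise OR gives the answer MPI_MAX was meant to give: true if any rank set the flag.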

Related

MPI: How to ensure a subroutine is executed only on one processor on the default node?

I use a large-scale parallelized code, and I am new to MPI itself.
I try to run a set of shell commands from Fortran, and hence it would be entirely wasteful (and cause my results to be incorrect) if done on more than one processor.
The most relevant commands I have found are MPI_gather and MPI_reduce, but these seem problematic, because they are trying to take information from other processors and use them on the processor 0, but I have no information that I am calling from other processors.
Basically I want to do something like this :
if (MPI_node = 0 .and. MPI_process = 0) then
(execute a code)
end if

I recently had an issue like this. The way I solved it was to use MPI_Comm_split to create a communicator for each node. Something like this (C++):
char node_name[MPI_MAX_PROCESSOR_NAME];
int name_len;
int processor_hash_id;
int global_proc_id;
int global_proc_num;
int node_proc_id;
int node_proc_num;
MPI_Comm node_comm;
//Get global info
MPI_Comm_rank(MPI_COMM_WORLD, &global_proc_id);
MPI_Comm_size(MPI_COMM_WORLD, &global_proc_num);
MPI_Get_processor_name(node_name, &name_len);
//Hash the node name
processor_hash_id = get_hash_id(node_name);
//Make a new communicator for processes only on the node
// and get node info
MPI_Comm_split(MPI_COMM_WORLD, processor_hash_id, global_proc_id, &node_comm);
MPI_Comm_rank(node_comm, &node_proc_id);
MPI_Comm_size(node_comm, &node_proc_num);
//Now, if you know the name of the "root node" to execute shell commands on:
if (node_proc_id==0 && processor_hash_id == get_hash_id("name-of-root-node"))
{
//do whatever
}
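//Some hash function. MPI_Comm_split requires a non-negative color, so
//hash in unsigned arithmetic and clear the sign bit before returning.
int get_hash_id(const char* s)
{
unsigned int h = 37;
while (*s)
{
h = (h * 54059u) ^ ((unsigned char)s[0] * 76963u);
s++;
}
return (int)(h & 0x7fffffffu);
}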
Of course you will need to know the name of the root node.
If it doesn't matter what node it executes on, then I would suggest the following:
int global_proc_id;
int global_proc_num;
//Get global info
MPI_Comm_rank(MPI_COMM_WORLD, &global_proc_id);
MPI_Comm_size(MPI_COMM_WORLD, &global_proc_num);
if (global_proc_id==0)
{
//do whatever
}
global_proc_id==0 will only be true on one process (rank 0), whichever node that process runs on.

show location of biggest memory leak on Windows

My program has big leaks. I am using the debug heap by putting this in my stdafx.h:
#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>
Then I'm capturing all the leaks in a text file by putting this code just before exit:
HANDLE hLogFile;
hLogFile = CreateFile( "T:\\MyProject\\heap.txt", GENERIC_WRITE,
FILE_SHARE_WRITE, NULL, CREATE_ALWAYS,
FILE_ATTRIBUTE_NORMAL, NULL);
_CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE);
_CrtSetReportFile(_CRT_WARN, hLogFile);
_CrtDumpMemoryLeaks();
exit( EXIT_SUCCESS );
However, even then the data is reported leak by leak, which is far too low-level.

Stepping into _CrtDumpMemoryLeaks(), the code is actually easy to follow. I wrote my own function that summarizes the data, reporting bytes leaked for each line of code and sorting by leak size.
However it requires a static variable inside dbgheap.c in order to work. I've tried to make a version of dbgheap.c that doesn't have these as static symbols and tried to make a mini-DLL out of it (but it complains about a missing symbol I can't find anywhere in the MSFT code, _heap_regions). Instead what I've settled on is putting this code right before the code above calling _CrtDumpMemoryLeaks():
// Put a breakpoint here; step INTO the malloc, then in variable watch
// window evaluate: _CrtDumpMemoryLeakSummary( _pFirstBlock );
void* pvAccess = malloc(1);
And in turn this is the code for the _CrtDumpMemoryLeakSummary function:
#define _CRTBLD
#include "C:\Program Files\Microsoft Visual Studio 9.0\VC\crt\src\dbgint.h"
typedef struct {
const char* pszFileName;
int iLine;
int iTotal;
} Location_T;
#define MAX_SUMMARY 5000
static Location_T aloc[ MAX_SUMMARY ];
static int CompareFn( const void* pv1, const void* pv2 ) {
Location_T* ploc1 = (Location_T*) pv1;
Location_T* ploc2 = (Location_T*) pv2;
if ( ploc1->iTotal > ploc2->iTotal )
return -1;
if ( ploc1->iTotal < ploc2->iTotal )
return 1;
return 0;
}
void _CrtDumpMemoryLeakSummary( _CrtMemBlockHeader* pHead )
{
int iLocUsed = 0, iUnbucketed = 0, i;
for ( /*pHead = _pFirstBlock */;
pHead != NULL && /* pHead != _pLastBlock && */ iLocUsed < MAX_SUMMARY;
pHead = pHead->pBlockHeaderNext ) {
const char* pszFileName = pHead->szFileName ? pHead->szFileName : "<UNKNOWN>";
// Linear search is theoretically horribly slow but saves trouble of
// avoiding heap use while measuring heap use.
int i;
for ( i = 0; i < iLocUsed; i++ ) {
// To speed search, compare line number (fast) before strcmp() (slow).
// If szFileName were guaranteed to be __LINE__ then we could take advantage
// of __LINE__ always having the same address for any given file, and just
// compare pointers rather than using strcmp(). However, szFileName could
// be something else.
if ( pHead->nLine == aloc[i].iLine &&
strcmp( pszFileName, aloc[i].pszFileName ) == 0 ) {
aloc[i].iTotal += pHead->nDataSize;
break;
}
}
if ( i == iLocUsed ) {
aloc[i].pszFileName = pszFileName;
aloc[i].iLine = pHead->nLine;
aloc[i].iTotal = pHead->nDataSize;
iLocUsed++;
}
}
if ( iLocUsed == MAX_SUMMARY )
_RPT0( _CRT_WARN, "\n\n\nWARNING: RAN OUT OF BUCKETS! DATA INCOMPLETE!!!\n\n\n" );
qsort( aloc, iLocUsed, sizeof( Location_T ), CompareFn );
_RPT0(_CRT_WARN, "SUMMARY OF LEAKS\n" );
_RPT0(_CRT_WARN, "\n" );
_RPT0(_CRT_WARN, "bytes leaked code location\n" );
_RPT0(_CRT_WARN, "------------ -------------\n" );
for ( i = 0; i < iLocUsed; i++ )
_RPT3(_CRT_WARN, "%12d %s:%d\n", aloc[i].iTotal, aloc[i].pszFileName, aloc[i].iLine );
}
It produces output like this:
SUMMARY OF LEAKS
bytes leaked code location
------------ -------------
912997 <UNKNOWN>:0
377800 ..\MyProject\foo.h:205
358400 ..\MyProject\A.cpp:959
333672 ..\MyProject\B.cpp:359
8192 f:\dd\vctools\crt_bld\self_x86\crt\src\_getbuf.c:58
6144 ..\MyProject\Interpreter.cpp:196
4608 ..\MyProject\Interpreter.cpp:254
3634 f:\dd\vctools\crt_bld\self_x86\crt\src\stdenvp.c:126
2960 ..\MyProject\C.cpp:947
2089 ..\MyProject\D.cpp:1031
2048 f:\dd\vctools\crt_bld\self_x86\crt\src\ioinit.c:136
2048 f:\dd\vctools\crt_bld\self_x86\crt\src\_file.c:133

scanf,fgets, fgetc get skipped inside loop

I'm trying to make a recursive menu.
This program will later work with a tree (hojaNodo), which is why I keep track of the root.
Problem: for some reason the fgets/fgetc call is skipped inside the recursion on the second run. Why does this happen?
I want the user to input either 1, 2 or 3 (an int).
What would be the fix for this? And is this the best way to implement a menu?
Here's what I have right now (it compiles and runs, so you can test it, but it doesn't work the way I would like):
#include<stdio.h>
#include<stdlib.h>
typedef struct node{
char ch;
int i;
struct node *left;
struct node *right;
}hojaNodo;
int handle_menu(int eventHandler, hojaNodo **root);
int opcion_menu();
char get_symbol();
int get_userMenuInput();
int intro();
int main(){
hojaNodo *treeRoot = NULL;
intro();
// system("clear");
handle_menu(opcion_menu(), &treeRoot);
return 0;
}
int opcion_menu(){
int userOption;
printf("1.Agrega un Simbolo.\n");
printf("2.Listar Codigo\n");
printf("3.Exit");
userOption = get_userMenuInput();
printf("User: %d",userOption);
if(userOption < 4 && userOption > 0){
return userOption;
}
else
return -1;
}//eof opcion_menu
int handle_menu(int userOption,hojaNodo **root){
hojaNodo *tempRoot = NULL;
tempRoot = *root;
int valor;
char simbol;
switch(userOption){
case 1:
simbol = get_symbol();
printf("Simbol: %c", simbol);
break;
case 2:
printf("List Nodes\n");
break;
case 3:
printf("Exit");
userOption = -1;
// destroy_tree(root);
break;
default:
printf("userOption Error, Bye!");
break;
}//eof switch
if(userOption != -1)
handle_menu(opcion_menu(),&tempRoot);
// return userOption;
return -1;
}//eof menu()
char get_symbol(){
/*char userKey[3]
fgets(userKey,len,stdin);*/
char simbolo;
printf("Give me a symbol.");
simbolo = fgetc(stdin);
return simbolo;
}
int get_userMenuInput(){
char userKey[3];
int userOption;
size_t len;
len = sizeof(userKey);
fgets(userKey,len,stdin);
userOption = atoi(userKey);
//printf("User Option: %d\n", userOption);
return userOption;
}

Apart from all the comments about recursion and the other suggested changes, please check this out. After fgets() returns, any characters left in the input stream still have to be consumed. This can be done with fgetc(); avoid fflush(stdin), whose behavior is undefined in standard C.
A simple solution would be:
In function:
int opcion_menu(){
...
fgets(userKey,2,stdin);
fgetc(stdin); // Add this statement
Also in function:
int handle_menu(int userOption,hojaNodo **root)
case 1:
printf("Give me a choice : ");
fgets(userKey,2,stdin);
fgetc(stdin); // add this statement
fgets reads at most one less than size characters from the stream and stores them in the buffer it is given. With a size of 2 it reads a single character, so the newline is still in the input stream and has to be consumed. If it is not, it becomes the input for the next fgets call, which then appears to be skipped because it already has its input: the newline character.
fgetc(stdin) consumes this extra newline character.

I don't know if this might help anyone.
In my case, I had to 'free' the buffer of the leftover character with this function:
void clean(){
int cTemp;
while((cTemp = getchar()) != '\n' && cTemp != EOF)
;
}
I'm not really sure why this works, but it does (if anyone knows, please add it to my answer).
I call it right before I call get_userOption();
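
Both answers come down to the same thing: the newline typed after the symbol stays in stdin, and the next read picks it up. A loop like clean() works because it discards everything up to and including that newline. A minimal sketch of an alternative, assuming it is acceptable to read input one whole line at a time; read_menu_option and read_symbol are hypothetical names, not functions from the question:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: discard the remainder of an over-long line so the
 * next read starts on a fresh line. */
static void discard_rest_of_line(FILE *in, const char *line)
{
    if (strchr(line, '\n') == NULL) {
        int c;
        while ((c = fgetc(in)) != '\n' && c != EOF)
            ;
    }
}

/* Hypothetical helper: read one full line and parse it as a menu option.
 * Returns the number, or -1 on EOF or non-numeric input. */
int read_menu_option(FILE *in)
{
    char line[64];
    if (fgets(line, sizeof line, in) == NULL)
        return -1;                      /* EOF or read error */
    discard_rest_of_line(in, line);

    char *end;
    long value = strtol(line, &end, 10);
    if (end == line)
        return -1;                      /* no digits found */
    return (int)value;
}

/* Hypothetical helper: read one line and return its first character,
 * or '\0' on EOF. The trailing newline is consumed with the line. */
char read_symbol(FILE *in)
{
    char line[64];
    if (fgets(line, sizeof line, in) == NULL)
        return '\0';
    discard_rest_of_line(in, line);
    return line[0];
}
```

Calling read_menu_option(stdin) and read_symbol(stdin) in place of the fgets/atoi and fgetc calls would leave no stray newline between prompts.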

Task switching using a queue

I'm developing my own hobby OS, and now I'm stuck on a problem with the scheduler/task switching.
I planned to use a FIFO queue as the structure to hold processes. I implemented it using a linked list.
I also decided to use the iret method to switch from one task to another (so when the OS is serving an interrupt request, just before the iret I change the ESP register in order to move to the new task).
But I have a problem.
When the OS starts, it launches two tasks:
idle
shell
With these two I have no problem.
But if I try to launch two other tasks (with a simple printf inside), the task queue gets corrupted.
If I then print the queue, it shows only two tasks, the two just created; idle and shell have disappeared, but the OS continues to work (I think that at some point the esp field of the new tasks was replaced with the esp content of the shell).
The task data structure is:
typedef struct task_t{
pid_t pid;
char name[NAME_LENGTH];
void (*start_function)();
task_state status;
task_register_t *registers;
unsigned int cur_quants;
unsigned int eip;
long int esp;
unsigned int pdir;
unsigned int ptable;
struct task_t *next;
}task_t;
and the tss is:
typedef struct {
unsigned int edi; //+0
unsigned int esi; //+1
unsigned int ebp; //+2
unsigned int esp; //+3 (can be null)
unsigned int ebx; //+4
unsigned int edx; //+5
unsigned int ecx; //+6
unsigned int eax; //+7
unsigned int eip; //+8
unsigned int cs; //+9
unsigned int eflags; //+10
unsigned int end;
} task_register_t;
The scheduler function is the following:
void schedule(unsigned int *stack){
asm("cli");
if(active == TRUE){
task_t* cur_task = dequeue_task();
if(cur_task != NULL){
cur_pid = cur_task->pid;
dbg_bochs_print("#######");
dbg_bochs_print(cur_task->name);
if(cur_task->status!=NEW){
cur_task->esp=*stack;
} else {
cur_task->status=READY;
((task_register_t *)(cur_task->esp))->eip = cur_task->eip;
}
enqueue_task(cur_task->pid, cur_task);
cur_task=get_task();
if(cur_task->status==NEW){
cur_task->status=READY;
}
dbg_bochs_print(" -- ");
dbg_bochs_print(cur_task->name);
dbg_bochs_print("\n");
//load_pdbr(cur_taskp->pdir);
*stack = cur_task->esp;
} else {
enqueue_task(cur_task->pid, cur_task);
}
}
active = FALSE;
return;
asm("sti");
}
The tss is initialized with the following values:
void new_tss(task_register_t* tss, void (*func)()){
tss->eax=0;
tss->ebx=0;
tss->ecx=0;
tss->edx=0;
tss->edi =0;
tss->esi =0;
tss->cs = 8;
tss->eip = (unsigned)func;
tss->eflags = 0x202;
tss->end = (unsigned) suicide;
//tss->fine = (unsigned)end; // to put suicide there
return;
}
And the function that creates a new task is the following:
pid_t new_task(char *task_name, void (*start_function)()){
asm("cli");
task_t *new_task;
table_address_t local_table;
unsigned int new_pid = request_pid();
new_task = (task_t*)kmalloc(sizeof(task_t));
strcpy(new_task->name, task_name);
new_task->next = NULL;
new_task->start_function = start_function;
new_task->cur_quants=0;
new_task->pid = new_pid;
new_task->eip = (unsigned int)start_function;
new_task->esp = (unsigned int)kmalloc(STACK_SIZE) + STACK_SIZE-100;
new_task->status = NEW;
new_task->registers = (task_register_t*)new_task->esp;
new_tss(new_task->registers, start_function);
local_table = map_kernel();
new_task->pdir = local_table.page_dir;
new_task->ptable = local_table.page_table;
//new_task->pdir = 0;
//new_task->ptable = 0;
enqueue_task(new_task->pid, new_task);
//(task_list.current)->cur_quants = MAX_TICKS;
asm("sti");
return new_pid;
}
I'm sure that I just forgot something or missed some consideration, but I cannot figure out what I'm missing.
Currently I'm working only in kernel mode, and inside the same address space (paging is enabled, but for now I use the same page directory for all tasks).
The ISR macros are defined here:
https://github.com/inuyasha82/DreamOs/blob/master/include/processore/handlers.h
I declared four kinds of functions to handle ISRs:
EXCEPTION
EXCEPTION_EC (an exception with an error code)
IRQ
SYSCALL
Obviously the scheduler is called by an IRQ routine, so the macro looks like:
__asm__("INT_"#n":"\
"pushad;" \
"movl %esp, %eax;"\
"pushl %eax;"\
"call _irqinterrupt;"\
"popl %eax;"\
"movl %eax, %esp;"\
"popad;"\
"iret;")
the irq handler function is:
void _irqinterrupt(unsigned int esp){
asm("cli;");
int irqn;
irqn = get_current_irq();
IRQ_s* tmpHandler;
if(irqn>=0) {
tmpHandler = shareHandler[irqn];
if(tmpHandler!=0) {
tmpHandler->IRQ_func();
#ifdef DEBUG
printf("2 - IRQ_func: %d, %d\n", tmpHandler->IRQ_func, tmpHandler);
#endif
while(tmpHandler->next!=NULL) {
tmpHandler = tmpHandler->next;
#ifdef DEBUG
printf("1 - IRQ_func (_prova): %d, %d\n", tmpHandler->IRQ_func, tmpHandler);
#endif
if(tmpHandler!=0) tmpHandler->IRQ_func();
}
} else printf("irqn: %d\n", irqn);
}
else printf("IRQ N: %d E' arrivato qualcosa che non so gestire ", irqn); // Italian: "something arrived that I don't know how to handle"
if(irqn<=8 && irqn!=2) outportb(MASTER_PORT, EOI);
else if(irqn<=16 || irqn==2){
outportb(SLAVE_PORT, EOI);
outportb(MASTER_PORT, EOI);
}
schedule(&esp);
asm("sti;");
return;
}
And these are the enqueue_task and dequeue_task functions:
void enqueue_task(pid_t pid, task_t* n_task){
n_task->next=NULL;
if(task_list.tail == NULL){
task_list.head = n_task;
task_list.tail = task_list.head;
} else {
task_list.head->next=n_task;
task_list.head = n_task;
}
}
task_t* dequeue_task(){
if(task_list.head==NULL){
return NULL;
} else {
task_t* _task;
_task = task_list.tail;
task_list.tail=_task->next;
return _task;
}
return;
}
Thanks in advance,
and let me know if you need more details!

It is hard to tell. What does the assembly part of your ISR look like? What I suspect is the problem (since you can save and restore two tasks but not more) is that you don't push and pop all registers properly. You do use pusha and popa in the ISR, right?
I also want to add that using cli and sti the way you have done there can be dangerous. In your ISRs, make cli the first instruction. Then you won't need sti at all, because iret restores the interrupt flag for you (it is a bit in the EFLAGS register).
Good luck!
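
Separately from the ISR questions above, the enqueue_task and dequeue_task functions in the question leave task_list.head pointing at a removed node once the queue empties, and dequeue_task tests head rather than tail, which is the pointer it actually dereferences. Whether that is the cause of the corruption is not established here. A minimal sketch of a singly linked FIFO that keeps both ends consistent, using a reduced hypothetical node type rather than the question's task_t:

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for the question's task_t. */
typedef struct node {
    int pid;
    struct node *next;
} node_t;

typedef struct {
    node_t *front;   /* next node to be dequeued */
    node_t *back;    /* most recently enqueued node */
} queue_t;

/* Append n at the back of the queue. */
void queue_push(queue_t *q, node_t *n)
{
    n->next = NULL;
    if (q->back == NULL) {       /* empty queue: n becomes the front too */
        q->front = n;
    } else {
        q->back->next = n;
    }
    q->back = n;
}

/* Remove and return the front node, or NULL if the queue is empty. */
node_t *queue_pop(queue_t *q)
{
    node_t *n = q->front;
    if (n == NULL)
        return NULL;
    q->front = n->next;
    if (q->front == NULL)        /* removed the last node: reset back too */
        q->back = NULL;
    n->next = NULL;
    return n;
}
```

The emptiness test and the pointer that gets dereferenced are the same field here, and both ends are reset together when the last node leaves, so a pop followed by a push of the same node (as the scheduler does) stays safe even with a single task queued.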

Pipe and select : sample code not working

Am I missing something ?
I want to come out of select by calling write in another thread, but it never comes out of select.
Code is tested on OSX snow.
fd_set rio, wio;
int pfd[2];
void test(int sleep_time)
{
sleep(sleep_time);
char buf[] = "1";
write(pfd[1], buf, 1);
}
int main(int argc, char* argv[])
{
char buff[80];
int ended = 0;
pipe(pfd);
FD_ZERO(&rio);
FD_ZERO(&wio);
FD_SET(pfd[1], &wio);
FD_SET(pfd[0], &rio);
pthread_t tid; /* the thread identifier */
pthread_attr_t attr; /* set of thread attributes */
pthread_attr_init(&attr);
pthread_create(tid, NULL, test, 3);
while (!ended)
{
// Check my numbers ... they do not go over 1 ... so 2
if (select(2, &rio, &wio, NULL, 0) < 0)
perror("select");
else
{
if (FD_ISSET(pfd[1], &wio))
{
if ((read(pfd[0], &buff, 80))<0)
perror("read");
ended = 1;
}
}
}

I believe you have 2 errors:
1 - Your select call limits the check to descriptors below 2, while the pipe will probably have larger FDs, since 0, 1, and 2 are already open for stdin, stdout, and stderr. The pipe will presumably get FDs 3 and 4, so you need to determine the larger of the two pipe FDs and pass that value plus one as the first argument to select instead of 2.
int maxfd = pfd[1];
if( pfd[0] > maxfd ) {
maxfd = pfd[0];
}
...
2 - After select returns, you are checking wio and the pipe's write FD, when you need to check whether there is anything available to READ:
if (FD_ISSET(pfd[0], &rio)) {
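
Putting both points together, a minimal sketch of waiting on the read end of a pipe might look like the following. wait_readable is a hypothetical helper, not part of the question's code, and the timeout parameter is an addition for illustration.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable. Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_sec)
{
    fd_set rfds;
    struct timeval tv;

    /* select() overwrites the set and may modify the timeout, so both are
     * built fresh on every call. */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* The first argument is the highest descriptor in any set, plus one. */
    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}
```

In the question's loop this would be called as wait_readable(pfd[0], ...) before read(). Since select overwrites the sets it is given, FD_ZERO and FD_SET belong inside the loop rather than before it.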
