Running MPI on two hosts

I've looked through many examples and I'm still confused. I've compiled a simple latency check program from here, and it runs perfectly on a single host, but when I try to run it across two hosts it hangs. However, running something simple like hostname works fine:
[hamiltont@4 latency]$ mpirun --report-bindings --hostfile hostfile --rankfile rankfile -np 2 hostname
[4:16622] [[5908,0],0] odls:default:fork binding child [[5908,1],0] to slot_list 0
4
[5:12661] [[5908,0],1] odls:default:fork binding child [[5908,1],1] to slot_list 0
5
But here is what happens with the compiled latency program:
[hamiltont@4 latency]$ mpirun --report-bindings --hostfile hostfile --rankfile rankfile -np 2 latency
[4:16543] [[5989,0],0] odls:default:fork binding child [[5989,1],0] to slot_list 0
[5:12582] [[5989,0],1] odls:default:fork binding child [[5989,1],1] to slot_list 0
[4][[5989,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 10.0.2.5 failed: Connection timed out (110)
My current guess is that there is something wrong with my firewall rules (hostname does not need to communicate between hosts, but the latency program does).
[hamiltont@4 latency]$ cat rankfile
rank 0=10.0.2.4 slot=0
rank 1=10.0.2.5 slot=0
[hamiltont@4 latency]$ cat hostfile
10.0.2.4 slots=2
10.0.2.5 slots=2

There are two kinds of communication involved in running an Open MPI job. First, the job has to be launched. Open MPI uses a special framework to support many kinds of launchers, and you are probably using the rsh launcher, which starts remote processes over SSH. Your firewall is evidently already set up to allow SSH connections.
When an Open MPI job is launched and the processes are true MPI programs, they connect back to the mpirun process that spawned the job and learn all about the other processes in the job, most importantly the available network endpoints at each process. This message:
[4][[5989,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 10.0.2.5 failed: Connection timed out (110)
indicates that the process running on host 4 is unable to open a TCP connection to the process running on host 5. The most common reason for that is a firewall that limits inbound connections, so checking your firewall is the first thing to do.
Another common reason is that both nodes have additional network interfaces configured and up, with compatible network addresses but no actual connectivity between them. This often happens on newer Linux setups where various virtual and/or tunnelling interfaces are brought up by default. One can instruct Open MPI to skip those interfaces by listing them (either as interface names or as CIDR network addresses) in the btl_tcp_if_exclude MCA parameter, e.g.:
$ mpirun --mca btl_tcp_if_exclude "127.0.0.1/8,tun0" ...
(one always has to add the loopback interface when setting btl_tcp_if_exclude)
or one can explicitly specify which interfaces to use for communication by listing them in the btl_tcp_if_include MCA parameter:
$ mpirun --mca btl_tcp_if_include eth0 ...
Since the IP address in the error message matches the address of your second host in the hostfile, the problem most likely comes from an active firewall rule.
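As a quick check, one could temporarily allow all TCP traffic from the other node and rerun the test. This is only a sketch, assuming an iptables- or firewalld-managed firewall; the addresses mirror the hostfile:
$ sudo iptables -I INPUT -p tcp -s 10.0.2.5 -j ACCEPT
(run the mirror-image rule with -s 10.0.2.4 on the second host), or with firewalld:
$ sudo firewall-cmd --zone=trusted --add-source=10.0.2.5
If the latency program then completes, either make the rule permanent or restrict Open MPI's TCP BTL to a fixed port range and open just that range between the cluster nodes.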

Related

How to invoke InfiniBand with OpenMPI

I would like to know how to invoke InfiniBand hardware on a CentOS 6.x cluster with OpenMPI (static libs.) for running my C++ code. This is how I compile and run:
/usr/local/open-mpi/1.10.7/bin/mpic++ -L/usr/local/open-mpi/1.10.7/lib -Bstatic main.cpp -o DoWork
/usr/local/open-mpi/1.10.7/bin/mpiexec -mca btl tcp,self --hostfile hostfile5 -host node01,node02,node03,node04,node05 -n 200 DoWork
Here, "-mca btl tcp,self" reveals that TCP is used, and the cluster has InfiniBand.
What should be changed in compiling and running commands for InfiniBand to be invoked? If I just replace "-mca btl tcp,self" with "-mca btl openib,self" then I get plenty of errors with relevant one saying:
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.
Thanks very much!!!
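A quick diagnostic that would narrow this down (just a sketch, assuming ompi_info was installed alongside mpic++ and mpiexec) is to check whether the openib BTL component was built into this Open MPI installation at all:
$ /usr/local/open-mpi/1.10.7/bin/ompi_info | grep btl
If openib does not appear in the output, the installation has no InfiniBand support compiled in, and no run-time -mca flag can add it.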

Software network interface tunnel to localhost on OSX

I want to verify that my network monitoring program on Mac can handle network interfaces that come and go. For example, the user could attach a Wifi adapter via Thunderbolt, and my program must notice that.
So, I set up a Python server on localhost:8000. Running wget http://localhost:8000 on the command line gives me a valid response from the Python server. Direct communication with localhost succeeds. So far so good.
Next, I wrote a Python script that sets up a software network interface, tunneling traffic from 10.0.2.1 to localhost. However, the tunnel is obviously not set up correctly, because the script hangs on the wget part:
import os
try:
    os.system("ifconfig gif6 create")
    os.system("ifconfig gif6 inet 10.0.2.1 127.0.0.1 up")
    os.system("wget http://10.0.2.1:8000")
finally:
    os.system("ifconfig gif6 destroy")
What am I doing wrong when trying to set up the 10.0.2.1 <-> 127.0.0.1 tunnel? There is probably something wrong in the ifconfig commands but I'm unable to figure it out.
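As a first diagnostic, one might check outside the script whether the gif interface actually came up with the intended addresses and whether 10.0.2.1 is reachable at all; this is just a sketch using standard macOS tools:
$ ifconfig gif6
$ netstat -rn | grep 10.0.2
$ ping -c 1 10.0.2.1
If 10.0.2.1 is not reachable or is not routed the way you expect, the wget request never reaches the Python server, which would explain the hang.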

Adding processes to two different remote hosts

I have several servers that I'm planning to use to run some simulations in Julia. The problem is, I can only add remote processes to a single server; if I try to add processes on the next server I get an error. This is what I'm trying to do and what I get:
addprocs(["user#host1"], tunnel=true, dir="~/julia-483dbf5279/bin/", sshflags=`-p 6969`)
addprocs(["user#host2"], tunnel=true, dir="~/julia-483dbf5279/bin/", sshflags=`-p 6969`)
id: cannot find name for group ID 350
fatal error on 6: ERROR: connect: host is unreachable (EHOSTUNREACH)
in wait at ./task.jl:284
in wait at ./task.jl:194
in stream_wait at stream.jl:263
in wait_connected at stream.jl:301
in Worker at multi.jl:113
in anonymous at task.jl:905
Worker 6 terminated.
The host is reachable and I can connect to it via ssh. I had a similar problem when adding local processes, as I explained in this stackoverflow question:
Combining local processes with remote processes in Julia

How to start the echo service on OSX Mountain Lion to respond to autossh monitoring

I have the following two machines:
Machine A: an OS X machine which will act as the SSH server
Machine B: an SSH client connecting to the above using autossh
autossh allows persistent, self-healing connections to be made and restarts the child ssh process if it exits abnormally (the man page has details on what 'abnormal' means). Specifically, I am interested in making scenario #4 work:
Periodically (by default every 10 minutes), autossh attempts to pass
traffic on the monitor forwarded port. If this fails, autossh will
kill the child ssh process (if it is still running) and start a new
one
Questions:
What is the recommended monitoring approach with autossh? Would it be monitoring on specific ports, or using the echo service? Or would it be to rely on OpenSSH's ServerAliveInterval so that ssh connections exit in a timely fashion, and to disable autossh monitoring altogether?
If the echo service is the way to go, how do I start it on OS X? From the Wikipedia page on inetd:
As of version Mac OS X v10.4, Apple has merged the functionality of
inetd into launchd.
Therefore, how do I use launchd to start the echo service on OS X Mountain Lion?
I would suggest that you stick with ServerAliveInterval and ServerAliveCountMax, both of which can be specified in a local ssh_config file. This obviates the need for either a monitoring port or the echo service: the ssh client exits promptly when the connection dies, and autossh simply restarts it.
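A minimal sketch of that setup on machine B (the host alias, addresses, and forwarded ports below are illustrative):
# ~/.ssh/config on machine B
Host machine-a
    HostName machine-a.example.com
    ServerAliveInterval 30
    ServerAliveCountMax 3
# start the tunnel with autossh's own port monitoring disabled (-M 0),
# relying on the keep-alives above to detect dead connections
$ autossh -M 0 -N -R 2222:localhost:22 machine-a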

parallel ipython/ipcluster through head node

I want to use the parallel capabilities of ipython on a remote computer cluster. Only the head node is accessible from the outside. I have set up ssh keys so that I can connect to the head node with e.g. ssh head, and from there I can also ssh into any node without entering a password, e.g. ssh node3. So I can basically run any command on the nodes by doing:
ssh head ssh node3 command
Now what I really want to do is to be able to run jobs on the cluster from my own computer from ipython. The way to set up the hosts to use in ipcluster is:
send_furl = True
engines = { 'host1.example.com' : 2,
            'host2.example.com' : 5,
            'host3.example.com' : 1,
            'host4.example.com' : 8 }
But since I only have a host name for the head node, I don't think I can do this. One option is to set up ssh tunneling on the head node, but I cannot do this in my case, since it would require enough open ports to accommodate all the nodes (and that is not the case). Are there any alternatives?
I use ipcluster on the NERSC clusters by using the PBS queue:
http://ipython.org/ipython-doc/stable/parallel/parallel_process.html#using-ipcluster-in-pbs-mode
In summary, you submit jobs that run mpiexec ipengine (after having launched ipcontroller on the login node). Do you have PBS on your cluster?
This was working fine with IPython 0.10; it is now broken in the 0.11 alpha.
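A rough sketch of the kind of PBS batch script that approach uses (the job name and resource request are illustrative, and ipcontroller is assumed to already be running on the login node):
#!/bin/sh
#PBS -N ipengines
#PBS -l nodes=4:ppn=8
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
mpiexec ipengine
Each MPI rank starts one IPython engine, and the engines connect back to the controller on the login node.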
I would set up a VPN server on the master, and connect to that with a VPN client on my local machine. Once established, the virtual private network will allow all of the slaves to appear as if they're on the same LAN as my local machine (on a "virtual" network interface, in a "virtual" subnet), and it should be possible to ssh to them.
You could possibly establish that VPN over SSH ("ssh tunneling", as you mention); other options are OpenVPN and IPsec.
I don't understand what you mean by "this requires enough ports to be open to accommodate all the nodes". You will need: (i) one inbound port on the master, to provide the VPN/tunnel, (ii) inbound SSH on each slave, accessible from the master, (iii) another inbound port on each slave, over which the master drives the IPython engines. Wouldn't (ii) and (iii) be required in any setup? So all we've added is (i).
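For the SSH flavour of that VPN, a bare sketch (assuming root login on the head node is allowed and PermitTunnel yes is set in its sshd_config; the device numbers are illustrative):
$ sudo ssh -w 0:0 root@head
# this creates a tun device on each end; each one still needs an address assigned
# and a route for the cluster subnet added before the compute nodes become reachable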
