I am new to MPI and have some questions about job creation and launching. I tried to figure it out myself, but things are still quite confusing to me. The cluster architecture I am working on is like this: there are four nodes (A, B, C, D) connected to each other, and MPICH2 is installed on each node. mpiexec -info gives...
.....Configure options: '--prefix=/usr/local/mpich2-1.4.1-install/' '--with-pm=hydra' ....
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge manual persist
Topology libraries available: hwloc plpa
Resource management kernels available: user slurm ll lsf sge pbs
As far as I understand (please correct me if I am wrong), PMI is the process management interface; Hydra, mpirun, and mpiexec are process managers; and PMI provides a way for the processes to interact with the process manager even if different process managers are used. So my questions are:
1) Why does it show PMI as the process manager?
2) Does pbs play any role here?
3) Who is responsible for creating the copies of the executable on the different nodes? (I am launching the job from node A.)
I know this question is lengthy; I would also be thankful for suggestions of some good resources.
There are two types of clusters - those that are under the control of some distributed resource manager (DRM) like PBS, LSF, S/OGE, etc. and those that are not. A typical DRM provides mechanisms to launch remote processes within the granted allocation and to control those processes, e.g. send them signals and get back information about their launch and termination statuses. When the cluster is not under the control of a DRM, the MPI runtime has to implement its own process management. Different MPI libraries have different approaches, but almost all of them boil down to starting a daemon on the remote nodes via rsh or ssh to take care of the remote processes. Even when a DRM is in use, the library might still put its own process manager in between in order to provide portability.
MPICH comes with two process managers: MPD and Hydra. MPD stands for Multi-Purpose Daemon and is now considered legacy. Hydra is newer and better as it provides topology-aware process binding and other goodies. No matter what process manager is in use, the library has to talk to it somehow, e.g. obtain launch information or request that new processes are launched during MPI_COMM_SPAWN. This is done through the PMI interface.
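For illustration, here is a minimal sketch of what such a spawn request looks like from the application's side; the "./worker" executable and the count of 4 are hypothetical. When this call is made, the MPI library turns it into a request that the process manager services through PMI:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    int errcodes[4];

    MPI_Init(&argc, &argv);

    /* Ask the process manager (Hydra, via PMI) to start 4 copies of a
       hypothetical "./worker" executable and connect them to us. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, errcodes);

    printf("spawned 4 worker processes\n");

    MPI_Finalize();
    return 0;
}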
That being said, the mpiexec in your case is the Hydra process manager. The information that you list is the set of capabilities of Hydra itself. Since MPICH and its derivatives (e.g. Intel MPI) are probably the only MPI implementations that use Hydra, the latter doesn't need to provide any other process management interface than the one that is native to MPICH, namely PMI. The launchers are the mechanisms that Hydra can use in order to launch remote processes. ssh and rsh are the obvious choices when no DRM is in use; fork is for starting processes on the local node. Resource management kernels are mechanisms for Hydra to interact with DRMs in order to determine things like granted allocations. Some of those can also launch processes, e.g. pbs uses the tm interface of PBS or Torque.
To summarise:
1) Hydra implements the PMI interface in order to be able to talk to MPICH. It doesn't understand other interfaces, e.g. it cannot launch MPI executables compiled against Open MPI.
2) Hydra integrates with PBS-like DRMs (PBSPro, Torque). The integration means that, for example, you don't have to provide a list of hosts to mpiexec since the list of granted nodes is obtained automatically. It also uses the native tm interface of PBS to launch and monitor remote processes.
3) On a higher level, Hydra launches the remote copies. Ultimately, this is done either by the DRM or via rsh/ssh.
Related
I have a cluster that does not allow direct ssh access, but does permit me to submit commands through a proprietary interface.
Is there any way to launch Open MPI jobs manually, or any documentation on how to write a custom launcher?
I don't think you can do it without breaking some kind of agreement.
I assume you have some kind of web-based interface that allows you to fill in certain fields and maybe upload data, or something similar. What this interface probably does is generate a request/file for a scheduler, most likely SGE or PBS. Direct access to the cluster is limited in order to:
organize task priorities and order
prevent users from hogging the machines
make it easier to launch complicated tasks requiring complicated machine(s) configuration
So you effectively want to go around the scheduler. I don't think you can, or that you should.
However, clusters usually have so-called head nodes that do allow SSH access. These nodes serve as a place from which to submit scheduler requests and, perhaps, do small compilation/result-processing tasks (with very limited resources). Such a configuration would eliminate the web interface but still keep the scheduler, which is very important for a cluster used by many people concurrently.
Server Scenario:
Ubuntu 12.04 LTS
Torque w/ Maui Scheduler
Hadoop
I am building a small cluster (10 nodes). The users will have the ability to ssh into any child node (LDAP auth), but this is really unnecessary, since all the computation jobs they want to run can be submitted on the head node using Torque, Hadoop, or other resource managers tied to a scheduler to ensure priority and proper resource allocation across the nodes. Some users will have priority over others.
Problem:
You can't force a user to use a batch system like Torque. If they want to hog all the resources on one node or on the head node, they can just run their script or code directly from their terminal/ssh session.
Solution:
My main users or "superusers" want me to set up a remote login timeout, which is what their current cluster uses to eliminate this problem. (I do not have access to that cluster, so I cannot grab its configuration.) I want to set up a 30-minute timeout on all remote sessions that are inactive (no keystrokes); if a session is running processes, I also want it to be killed along with all of its job processes. This should stop people from bypassing the available batch system / scheduler.
Question:
How can I implement something like this?
Thanks for all the help!
I've mostly seen sysadmins solve this by not allowing ssh access to the nodes (often done using the pam module in TORQUE), but there are other techniques. One is to use pbstools. The reaver script can be set up to kill user processes that aren't part of jobs (or shouldn't be on those nodes). I believe it can also be configured to simply notify you. Some admins forcibly kill things, others educate users; that part is up to you.
Once you get people using jobs instead of ssh'ing directly, you may want to look into the cpuset feature in TORQUE as well. It can help you as you try to get users to use the amount of resources they request. Best of luck.
EDIT: noted that the pam module is one of the most common ways to restrict ssh access to the compute nodes.
I guess I may have missed the obvious, but I am at a loss for a good answer.
I am developing a stand-alone program that will be running on a Linux (Ubuntu?) embedded PC inside a piece of hardware. I want it to be the "thing" SNMP talks to. Well, short of compiling in my own SNMP "daemon" code and persuading Linux to let a general user have access to port 161, I think I'll opt for Net-SNMP's snmpd. I am open to suggestions for better products to use. LGPL, BSD, or MIT licenses, please.
I am working separately on the MIB and assigning OIDs, etc. I know what vars I want to set and get, etc.
I have read and reread the material on making an SNMP/snmpd agent and/or subagent. As near as I can tell, they are both compiled into snmpd or linked to it as a shared library. Right?
So, how do I get that agent to talk to my separate program running in a separate general user session? Is there a direct technique to use? D-Bus? popen()? Named pipes? Shared memory? Temp files? A UDP port? Something better? Or do I really want to turn my program into a .so and let snmpd launch it? I assume at that point I'd be able to tell snmpd where to call in to me to get/set vars. Right?
Thanks!
The "AgentX" protocol is a way for arbitrary applications to supply SNMP services to a running system SNMP daemon. Your application listens on some port other than 161 (typically a library will take care of the details for you), and the system snmpd will forward requests for your OIDs to your subagent. This method doesn't involve linking any code into the system snmpd.
Often an easier way is to configure the system snmpd to run a script to get or set data. The script can, if you like, use some other kind of IPC to talk to your application (such as JSON to an HTTP server, for example).
I am developing an application on Mac OS X. It has two parts: a UI element and a daemon (which needs to run continuously and must restart on being killed). Currently I am using launchctl to restart the daemon.
But there is another issue. I need the two parts of my application to communicate with each other. For this I am using distributed objects (as given here). However, this does not work when I launch the daemon with launchctl. Can anyone suggest an alternative?
I use NSDistributedNotifications to handle this pretty well in one app, even on 10.7. You have to do your own handshaking since this can be lossy (i.e. include an ack notification and resend in case of timeouts). A side effect of this approach is that if there are multiple clients running (particularly under fast user switching), all of them receive the notifications. That's good in the particular case of this app. It's also extremely simple to implement.
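The paragraph above describes NSDistributedNotifications from Objective-C; as a rough plain-C sketch of the same idea using the CoreFoundation distributed notification center (the notification name is hypothetical, and the ack/resend handshake is only hinted at in a comment):

#include <CoreFoundation/CoreFoundation.h>
#include <stdbool.h>

/* Called whenever the distributed notification is delivered. */
static void onNotify(CFNotificationCenterRef center, void *observer,
                     CFStringRef name, const void *object,
                     CFDictionaryRef userInfo)
{
    /* A real app would post an "ack" notification back from here so the
       sender can time out and resend lost messages. */
    CFShow(name);
}

int main(void)
{
    CFNotificationCenterRef center = CFNotificationCenterGetDistributedCenter();

    /* Observe a hypothetical app-specific notification name. */
    CFNotificationCenterAddObserver(center, NULL, onNotify,
                                    CFSTR("com.example.myapp.ping"), NULL,
                                    CFNotificationSuspensionBehaviorDeliverImmediately);

    /* The other process would post the same name like this: */
    CFNotificationCenterPostNotification(center, CFSTR("com.example.myapp.ping"),
                                         NULL, NULL, true);

    CFRunLoopRun();   /* notifications are delivered on the run loop */
    return 0;
}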
For another app, I use two FIFOs. The server writes to one and reads from the other, and the client does the opposite. You can of course also use a network socket to achieve the same thing. I tend to prefer FIFOs because you don't have to deal with locking down a network socket.
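As a minimal sketch of the FIFO approach (the path is hypothetical; the client simply opens the same path with O_WRONLY and write()s, and a second FIFO in the opposite direction carries replies):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define REQ_FIFO "/tmp/myapp.req"   /* hypothetical path: client -> server */

int main(void)
{
    char buf[256];

    mkfifo(REQ_FIFO, 0600);                   /* create the FIFO if it doesn't exist */

    int fd = open(REQ_FIFO, O_RDONLY);        /* server side: block until a client writes */
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("request: %s\n", buf);
    }
    close(fd);
    return 0;
}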
That said, what problem are you seeing using distributed objects under launchd? Are you just seeing problems on 10.7 (which changed the rules around the launchd context)?
Are you using launchd to lazy-load the daemon when the port is accessed (this is the normal way to do it)? Have you considered using a LaunchAgent instead of a LaunchDaemon?
EDIT:
Ah... the bootstrap server. Yes. You need to execute things in the correct bootstrap context in order to talk to them. The bootstrap context for the login session is rooted to the windowserver process. LaunchDaemons run in a different context, so they can't directly communicate with the login sessions. Some background reading:
Starting/stopping a launchd agent for all users with GUI sessions
How can you start a LaunchAgent for the first time without rebooting, when your code runs as a LaunchDaemon?
launch agent from daemon in user context
I am not aware of any way to get processes into the correct context without using launchctl bsexec. launchd technically has an API (launchctl uses it), but it is not well documented. You can pull the source from opensource.apple.com.
Even if you stay with NSDistributedObject, I would try to use something other than the bootstrap service if you can. As I mentioned, I tend to use other tools and avoid NSDistributedObject. In my opinion, for the same reasons that REST is better than SOAP, simple protocols are usually better than remote objects. (YMMV)
If you are launching your daemon using sudo launchctl, you should not use CFMessagePort or distributed objects for IPC. CFMessagePort and distributed objects are implemented using the bootstrap service. (Many Mac OS X subsystems work by exchanging Mach messages with a central service. For such a subsystem to work, it must be able to find the service. This is typically done using the Mach bootstrap service, which allows a process to look up a service by name.)
If you use DO or CFMessagePort, you will run into the bootstrap namespace problem:
when you launch your daemon using sudo launchctl, your service is registered in the root bootstrap namespace, so your clients (running in user sessions) will not be able to use that service.
You can check the registered bootstrap services using:
$ launchctl bslist
$ sudo launchctl bslist // if you are using sudo launchctl
You should use UNIX domain sockets. UNIX domain sockets are somewhat like TCP/IP sockets, except that the communication is always local to the computer. You access UNIX domain sockets using the same BSD sockets API that you'd use for TCP/IP sockets. The primary difference is the address format. For TCP/IP sockets, the address structure (that which you pass to bind, connect, and so on) is (struct sockaddr_in), which contains an IP address and port number. For UNIX domain sockets, the address structure is (struct sockaddr_un), which contains a path. For an example of using UNIX domain sockets in a client/server environment, see Sample Code 'CFLocalServer'.
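A minimal sketch of the server side (the socket path is hypothetical, and error checking is omitted); the client is identical except that it calls connect() on the same sockaddr_un instead of bind()/listen()/accept():

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define SOCK_PATH "/tmp/myapp.sock"   /* hypothetical socket path */

int main(void)
{
    int listener = socket(AF_UNIX, SOCK_STREAM, 0);

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCK_PATH, sizeof(addr.sun_path) - 1);

    unlink(SOCK_PATH);                          /* remove a stale socket file */
    bind(listener, (struct sockaddr *)&addr, sizeof(addr));
    listen(listener, 5);

    int client = accept(listener, NULL, NULL);  /* wait for the other process */
    char buf[128];
    ssize_t n = read(client, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("received: %s\n", buf);
    }
    close(client);
    close(listener);
    return 0;
}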
Take a look at Technical Note TN2083, Daemons and Agents, in particular:
Daemon IPC Recommendations
Mach Bootstrap Basics
Each user has a separate Mach namespace. You cannot communicate between namespaces. You'll need to use sockets (NSSocketPort) instead, which are not limited in such ways. [1]
I am wondering about actual examples or instances of inter-process communication (IPC) that we encounter on a daily basis (whether under the hood or otherwise) while using our laptops/desktops. I have only ever read about these theoretically, in textbooks.
For example:
Between a parent process and child processes: one example of this in Linux I know is when a shell starts other processes and we can kill those processes using their process IDs.
Between two unrelated (in hierarchy) but cooperating processes?
One way of doing IPC in both of the cases you mentioned is to use sockets.
I recommend taking a look at Beej's Guide to Unix Interprocess Communication for information and examples.
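For the parent/child case, a minimal sketch using socketpair() (for two unrelated processes you would instead bind a named AF_UNIX socket to a filesystem path, as in Beej's guide):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);   /* a connected pair of local sockets */

    if (fork() == 0) {                         /* child: send a message and exit */
        close(sv[0]);
        const char *msg = "hello from the child";
        write(sv[1], msg, strlen(msg));
        close(sv[1]);
        return 0;
    }

    close(sv[1]);                              /* parent: read what the child sent */
    char buf[64];
    ssize_t n = read(sv[0], buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("parent got: %s\n", buf);
    }
    close(sv[0]);
    wait(NULL);
    return 0;
}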
Some examples of IPC we encounter on a daily basis:
X applications communicate with the X server through network protocols.
Pipes are a form of IPC: grep foo file | sort (see the sketch at the end of this answer)
Servers like Apache spawn child processes to handle requests.
many more I can't think of right now
And I am not even mentioning examples of IPC where the processes are on different computers.
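To make the pipeline example above concrete, here is a minimal sketch of roughly what the shell does for something like grep foo file | sort: it creates a pipe, forks, and gives the write end to one process and the read end to the other (the two commands are only simulated here by a write and a read):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    pipe(fds);                        /* fds[0] is the read end, fds[1] the write end */

    if (fork() == 0) {                /* child: plays the producer ("grep foo file") */
        close(fds[0]);
        const char *line = "foo bar\n";
        write(fds[1], line, strlen(line));
        close(fds[1]);
        return 0;
    }

    close(fds[1]);                    /* parent: plays the consumer ("sort") */
    char buf[128];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    close(fds[0]);
    return 0;
}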