I am wondering about actual examples or instances of inter-process communication (IPC) that we encounter on a daily basis (whether under the hood or otherwise) while using our laptop/desktop. I have always read about these only in theory, from a textbook.
For example:
Between a parent process and child processes: one example of this in Linux I know of is when a shell starts other processes and we can kill those processes using their process IDs.
Between two unrelated (in hierarchy) but cooperating processes?
One way of doing IPC in both of the cases you mention is to use sockets.
I recommend taking a look at Beej's Guide to Unix Interprocess Communication for information and examples.
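To make this concrete, here is a minimal sketch in Python using a Unix domain socket as the rendezvous point (the path /tmp/ipc-demo.sock is made up; a fork stands in for the second process). Because the rendezvous is a filesystem name, the same approach works for unrelated processes, not just for a parent and its child:

    import os
    import socket

    PATH = "/tmp/ipc-demo.sock"              # hypothetical rendezvous path
    if os.path.exists(PATH):
        os.unlink(PATH)

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(PATH)
    server.listen(1)

    if os.fork() == 0:                       # the "client" could be any process
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(PATH)
        client.sendall(b"ping")
        client.close()
        os._exit(0)

    conn, _ = server.accept()
    print(conn.recv(16))                     # b'ping'
    conn.close()
    server.close()
    os.wait()
    os.unlink(PATH)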
Some examples of IPC we encounter on a daily basis:
X applications communicate with the X server through a network protocol (the X protocol, typically over a Unix domain socket or TCP).
Pipes are a form of IPC: grep foo file | sort (a minimal sketch follows at the end of this answer)
Servers like Apache spawn child processes to handle requests.
many more I can't think of right now
And I am not even mentioning examples of IPC where the processes are on different computers.
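And to illustrate the parent/child pipe case from the list above, here is a minimal sketch in Python, assuming a POSIX system (os.fork is not available on Windows):

    import os

    r, w = os.pipe()                 # one shared pipe, created before the fork
    pid = os.fork()
    if pid == 0:                     # child: write one message and exit
        os.close(r)
        os.write(w, b"hello from the child\n")
        os.close(w)
        os._exit(0)
    else:                            # parent: read what the child sent
        os.close(w)
        print(os.read(r, 1024).decode(), end="")
        os.close(r)
        os.waitpid(pid, 0)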
I am new to MPI and I have some doubts regarding job creation and launching. I tried to figure it out, but things are quite messy for me. The cluster architecture I am working on is like this: there are four nodes (A, B, C, D) connected to each other, with MPICH2 installed on each node. mpiexec -info gives...
.....Configure options: '--prefix=/usr/local/mpich2-1.4.1-install/' '--with-pm=hydra' ....
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge manual persist
Topology libraries available: hwloc plpa
Resource management kernels available: user slurm ll lsf sge pbs
According to my knowledge (please correct me if I am wrong), PMI is the process management interface; hydra, mpirun, and mpiexec are process managers; and PMI provides a way for processes to interact with the PM even when different PMs are in use. So my doubts are:
1. Why is it showing PMI as the process manager?
2. Is there any role for pbs?
3. Who is responsible for creating the copies of the executable on the different nodes? (I am launching the job from node A.)
I know the question is very lengthy; I would be thankful for suggestions of some good resources.
There are two types of clusters: those that are under the control of some distributed resource manager (DRM) like PBS, LSF, S/OGE, etc., and those that are not. A typical DRM provides mechanisms to launch remote processes within the granted allocation and to control those processes, e.g. send them signals and get back information about their launch and termination statuses. When the cluster is not under the control of a DRM, the MPI runtime has to implement its own process management. Different MPI libraries take different approaches, but almost all of them boil down to starting a daemon on the remote nodes via rsh or ssh to take care of the remote processes. Even when a DRM is in use, the library might still put its own process manager in between in order to provide portability.
MPICH comes with two process managers: MPD and Hydra. MPD stands for Multi-Purpose Daemon and is now considered legacy. Hydra is newer and better, as it provides topology-aware process binding and other goodies. No matter what process manager is in use, the library has to talk to it somehow, e.g. to obtain launch information or to request that new processes be launched during MPI_COMM_SPAWN. This is done through the PMI interface.
That being said, the mpiexec in your case is the Hydra process manager. The information that you list shows the capabilities of Hydra itself. Since MPICH and its derivatives (e.g. Intel MPI) are probably the only MPI implementations that use Hydra, the latter doesn't need to provide any process management interface other than the one native to MPICH, namely PMI. The launchers are the mechanisms Hydra can use to launch remote processes: ssh and rsh are the obvious choices when no DRM is in use, while fork is for starting processes on the local node. Resource management kernels are mechanisms for Hydra to interact with DRMs in order to determine things like granted allocations. Some of those can also launch processes, e.g. pbs uses the tm interface of PBS or Torque.
To summarise:
1) Hydra implements the PMI interface in order to be able to talk to MPICH. It doesn't understand other interfaces, e.g. it cannot launch MPI executables compiled against Open MPI.
2) Hydra integrates with PBS-like DRMs (PBSPro, Torque). The integration means that, for example, you don't have to provide a list of hosts to mpiexec since the list of granted nodes is obtained automatically. It also uses the native tm interface of PBS to launch and monitor remote processes.
3) On a higher level, Hydra launches the remote copies. Ultimately, this is done either by the DRM or via rsh/ssh.
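To make the launch flow concrete, here is a minimal sketch that assumes the mpi4py package is installed against your MPICH build. Started with something like mpiexec -n 4 -hosts A,B,C,D python hello_mpi.py, Hydra creates one copy per rank on the listed nodes (via ssh or the DRM), and each copy learns its rank through PMI:

    # hello_mpi.py -- minimal sketch; assumes mpi4py built against your MPICH
    from mpi4py import MPI

    comm = MPI.COMM_WORLD                    # communicator spanning all ranks
    print(f"rank {comm.Get_rank()} of {comm.Get_size()} "
          f"on {MPI.Get_processor_name()}")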
I'm working on a Ruby script that will be making hundreds of network requests (via open-uri) to various APIs, and I'd like to do this in parallel since each request is slow and blocking.
I have been looking at using Thread or Process to achieve this but I'm not sure which method to use.
With regard to network requests, when should I use a Thread over a Process, or does it not matter?
Before going into detail: there is already a library that solves your problem. Typhoeus is optimized to run a large number of HTTP requests in parallel and is based on the libcurl library.
Like a modern code version of the mythical beast with 100 serpent heads, Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic.
Threads will run in the same process as your application. Since Ruby 1.9, native threads are used as the underlying implementation. Resources can easily be shared across threads, as they all have access to the mutual state of the application. The problem, however, is that you cannot utilize multiple CPU cores with most Ruby implementations.
MRI, the reference Ruby implementation, uses a Global Interpreter Lock (GIL). The GIL is a locking mechanism that ensures the mutual state is not corrupted by parallel modifications from different threads. Other Ruby implementations like JRuby, Rubinius, or MacRuby take an approach without a GIL.
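The effect is easy to demonstrate in any implementation with a GIL; here is a quick sketch (in Python, since CPython has the same kind of lock) where two threads take roughly as long as running the same CPU-bound work serially:

    # Sketch: CPU-bound work gains nothing from threads under a GIL.
    import threading
    import time

    def burn():
        sum(i * i for i in range(2_000_000))   # pure CPU work, no I/O

    start = time.perf_counter()
    burn()
    burn()
    print("serial :", time.perf_counter() - start)

    start = time.perf_counter()
    threads = [threading.Thread(target=burn) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("threads:", time.perf_counter() - start)  # about the same, not ~2x faster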
Processes run separately from each other. Processes do not share resources, which means every process has its own state. This can be a problem if you want to share data across your requests. A process also allocates its own stack of memory. You could still share data by using a messaging bus like RabbitMQ.
I cannot recommend using either only threads or only processes. If you want to implement this yourself, you should use both: fork a new process for every n requests, and have each of those processes spawn a number of threads to issue the HTTP requests. Why?
If you fork another process for every HTTP request, this will result in too many processes. Although your operating system might be able to handle that, the overhead is still tremendous. Some HTTP requests finish very quickly, so why bother with an extra process? Just run them in another thread.
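Here is a rough sketch of that hybrid in Python; the URL list and chunk size are made up, and the pattern maps directly onto Ruby's Process.fork and Thread.new:

    import os
    import threading
    import urllib.request

    URLS = ["http://example.com/"] * 20          # placeholder workload
    CHUNK = 5                                    # requests per process

    def fetch(url):
        with urllib.request.urlopen(url, timeout=10) as r:
            print(os.getpid(), url, r.status)

    # Fork one worker process per chunk; each worker runs its chunk in threads.
    for i in range(0, len(URLS), CHUNK):
        if os.fork() == 0:                       # child process
            threads = [threading.Thread(target=fetch, args=(u,))
                       for u in URLS[i:i + CHUNK]]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            os._exit(0)

    for _ in range(0, len(URLS), CHUNK):         # parent: reap the children
        os.wait()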
I am doing a comparison of different IPC mechanisms available on Mac OS X (pipes, sockets, System V IPC, etc.), and I would like to see how Mach ports compare to the higher-level alternatives. However, I've run into a very basic issue: getting send rights to ports across processes (specifically, across a parent process and a child process).
Unlike file descriptors, ports are generally not carried over to forked processes. This means that some other way to transfer them must be established. Just about the only relevant page I could find about this was this one, and they state in an update that their method no longer works and was never guaranteed to, even though that method was suggested by an Apple engineer in 2009. (It involved replacing the bootstrap port, and doing that now breaks XPC.) The replacement they suggest uses deprecated functions, so that's not a very appealing solution.
Besides, one thing I liked about the old solution is that ports remained pretty much private between the processes that used them. There was no need to broadcast the existence of the port, just like pipes (from the pipe call) work once forked. (I can probably live without that if there's another solution, but it's a little annoying.)
So, how do you pass a send right to a Mach port from a parent process to a child process?
bootstrap_register is deprecated, but bootstrap_check_in isn't, and it can be used to register your port, which can later be retrieved by the child process using bootstrap_look_up. (This still doesn't provide the privacy you're looking for, unfortunately.)
The recommended solution is to not use Mach IPC directly at all, but to implement your child process as an XPC service. In that case you can use the XPC API, which uses Mach IPC behind the scenes, yet you don't have to deal with any of the details. You get an easy API for sending XPC messages in the parent and an easy API for receiving XPC messages in the client, which can also pass back replies easily. The system handles all the hard parts for you.
https://developer.apple.com/library/mac/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingXPCServices.html
If you cannot use the XPC API, keep in mind that when you register your service with bootstrap_check_in() (which is not deprecated), it won't be private, but if you do so in a user-space process, it will be private to your login session: root processes won't see it, and neither will processes of other users. If you do that in a root process, though, it will be visible to all sessions.
Also note, however, that you can control who may send you IPC messages and who may not. You can request a mach_msg_audit_trailer_t when receiving a Mach message. That way you get access to the audit_token_t of the sender, and using audit_token_to_pid() you can get the pid_t of the sender. As you know the PID of your child, you can simply ignore all messages not sent by your child process (passing them to mach_msg_destroy() to avoid leaking resources). So you cannot prevent your port from being discoverable, but you can prevent any process other than your child from using it.
And last but not least, you can just give your port a random name. After all, only your child process needs to know it, so you can dynamically generate a name in the parent process and pass it along to your child process. That way your port can still be found if software scans for ports, but most software just uses hardcoded names anyway.
One thing you might try (although it's a gross hack) is hijacking the exception ports as an inheritance mechanism: set a custom port as an exception port in the parent, fork the child, have the child retrieve the custom port from its exception port and send its task port to the parent; then the parent resets its own exception port, resets the child's exception port, and the two proceed from there with a communication channel. See task_set_exception_ports().
I am looking for a good way to poll a lot of servers for their status over TCP. I am currently using synchronous code and the Minecraft Query Protocol, but whenever a server is offline, the rest of the queue gets held up.
Another problem I am experiencing with my current code is that some servers tend to block the server I use for polling in their firewall, and thus those servers appear offline on my server list.
I am using a Ruby rake task with an infinite loop in which every Minecraft server in my MongoDB database gets checked and updated approximately every 10 minutes (I try to set this interval by letting the loop sleep (600 / s.count.to_i).ceil seconds).
Is there any way I can do this task efficiently (and prevent servers from blacklisting my IP in their firewalls), preferably with async code in Ruby?
You need to use non-blocking sockets, or multithreading, to do the checks. The best thing to do is spawn several threads at once to check several servers simultaneously; that way your main thread won't get held up.
This question contains a lot of information about multithreading in Ruby - you should be able to spawn multiple concurrent threads at once, or at least use non-blocking sockets.
Another point, given by @Lie Ryan: you can use IO.select to poll an array of servers all at once. It will return the sockets that are ready when it's done, i.e. the servers that turned out to be online; this could be more elegant than spawning multiple threads.
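To illustrate the select-based approach, here is a sketch in Python, whose select.select mirrors Ruby's IO.select (the host list is a placeholder): open non-blocking connections to every server at once, then wait for whichever sockets become writable, i.e. connected:

    import select
    import socket

    HOSTS = [("example.com", 80), ("example.org", 80)]   # placeholder servers

    socks = {}
    for host in HOSTS:
        s = socket.socket()
        s.setblocking(False)
        s.connect_ex(host)            # returns immediately (EINPROGRESS)
        socks[s] = host

    # Wait up to 5 seconds for connections to complete, all in one call.
    _, writable, _ = select.select([], list(socks), [], 5.0)
    for s in writable:
        err = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        print(socks[s], "online" if err == 0 else "offline")
    # Anything still pending after the timeout never answered at all.
    for s in socks:
        s.close()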
Question:
Can I use the multiprocessing module together with gevent on Windows in an efficient way?
Scenario:
I have a gevent based Python application doing asynchronous I/O on Windows. The application is mostly I/O bound, but there are spikes of higher CPU load as well. This application would need to control a console application via its stdin and stdout. I cannot modify this console application and the user will be able to use his own custom one, only the text (line) based communication protocol is fixed.
I have a working implementation using subprocess and threads, but I would rather move the whole subprocess based communication code together with those threads into a separate process to turn the main application back to single-threaded. I plan to use the multiprocessing module for this.
Prior reading:
I have been searching the Web a lot and have read some source code, so I know that the multiprocessing module uses a Pipe implementation based on named pipes on Windows. A pair of multiprocessing.Queue objects would be used to communicate with the second Python process. These queues are based on that Pipe implementation, i.e. the IPC would be done via named pipes.
The key question is whether calling the incoming Queue's get method would block gevent's main loop or not. There's a timeout for that method, so I could make it into a loop with a small timeout, but that's not a good solution, since it would still block gevent for small periods of time, hurting its low I/O latency.
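Such a loop would look roughly like this (a sketch; the 10 ms timeout is arbitrary and handle is a placeholder), and every get call would still stall the whole process, gevent's hub included, for up to that long:

    import gevent
    from multiprocessing import Queue
    from queue import Empty

    def handle(item):                     # placeholder for real processing
        print("got", item)

    def drain(q: Queue):
        while True:
            try:
                # Blocks the entire process, gevent hub included, for up to 10 ms.
                item = q.get(timeout=0.01)
            except Empty:
                gevent.sleep(0)           # at least yield to other greenlets
                continue
            handle(item)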
I'm also open to suggestions on how to circumvent the whole problem of using pipes on Windows, which is known to be hard and sometimes fragile. I'm not sure whether shared memory based IPC is possible on Windows or not. Maybe I could wrap the console application in a way which would allow communicating with the child process using network sockets, which is known to work well with gevent.
Please don't question my primary use case, if possible. Thanks.
The Queue's get method really is blocking. Using it with a timeout could potentially solve your problem, but it definitely wouldn't be the cleanest solution and, most importantly, it would introduce extra latency for no good reason. Even if it weren't blocking, that wouldn't be a good solution either: being non-blocking by itself is not enough; a good asynchronous call/API should integrate smoothly into the I/O framework in use, be that gevent for Python, libevent for C, or Boost ASIO for C++.
The easiest solution would be to use simple I/O by spawning your console application and attaching to its console in and out descriptors. There are two major factors to consider:
It will be extremely easy for your clients to write client applications. They will not have to work with any kind of IPC, socket, or other such code, which could be a very hard thing for many. With this approach, the application just reads from stdin and writes to stdout.
It will be extremely easy to test console applications using this approach, as you can start them manually, enter text into the console, and see the results.
Gevent is a perfect fit for async read/write here.
However, the downside is that you will have to start this application yourself, there will be no support for concurrent communication with it, and there will be no support for communication over the network. There is even a good example for starters.
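A minimal sketch of this approach using gevent's cooperative subprocess module, where ./worker stands in for the user's console application:

    from gevent.subprocess import Popen, PIPE

    # Spawn the console application and talk to it over stdin/stdout.
    proc = Popen(["./worker"], stdin=PIPE, stdout=PIPE)
    proc.stdin.write(b"ping\n")
    proc.stdin.flush()
    print(proc.stdout.readline())   # reading yields to gevent instead of blocking
    proc.stdin.close()
    proc.wait()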
To keep it simple but more flexible, you can use TCP/IP sockets; this works even if both client and server are running on the same machine, and a good operating system will use local IPC as the underlying implementation, so it will be fast. And if you are worried about the performance of this case, you probably should not be using Python at all, but looking at other technologies.
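With gevent, the server side of such a socket wrapper is only a few lines. A sketch, with an arbitrary port on localhost:

    from gevent.server import StreamServer

    def handle(sock, addr):
        # Each connection runs in its own greenlet; echo lines back.
        for line in sock.makefile("rb"):
            sock.sendall(b"echo: " + line)

    StreamServer(("127.0.0.1", 16000), handle).serve_forever()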
An even fancier solution: use ZeroC ICE. It is a very modern technology allowing almost seamless inter-process communication. It is a CORBA killer and very easy to use. It is heavily used by many, proven to be the fastest in its class, and rock stable. The beauty of this solution is that you can seamlessly integrate programs in many different languages, like Python, Java, C++, etc. But it will require some of your time to get familiar with the concept. If you decide to go this way, just spend a day reading through the documentation.
Hope it helps. Good luck!
Your question is already quite old. Nevertheless, I would like to recommend http://gehrcke.de/gipc which -- I believe -- would tackle the outlined challenge in a very straightforward fashion. Basically, it allows you to integrate multiprocessing-based child processes anywhere in your application (also on Windows). Interaction with Process objects (such as calling join()) is gevent-cooperative. Via its pipe management, it allows for cooperatively blocking inter-process communication. However, on Windows, IPC is currently much less efficient than on POSIX-compliant systems (since non-blocking I/O is imitated through a thread pool). Depending on the IPC messaging volume of your application, this may or may not be significant.
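A minimal sketch of what that looks like, based on the gipc documentation (details may differ between versions):

    import gipc

    def child(writer):
        # Runs in a separate process started by gipc.
        writer.put("hello from the child process")

    if __name__ == "__main__":
        with gipc.pipe() as (reader, writer):
            p = gipc.start_process(target=child, args=(writer,))
            print(reader.get())   # blocks only this greenlet, not gevent's loop
            p.join()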