What is the ZeroMQ underlying transport on Linux?

When ZeroMQ is used with the inproc:// or ipc:// transport on Linux, what exactly is the underlying implementation or operating-system construct that it uses? Is it Unix domain sockets? (Named) pipes? Anything else?

With the inproc:// Transport-class, both Access-Points of the Scalable Formal Communication Pattern archetype live inside the same process (address space), so no "underlying" transport is needed at all. It uses zero-copy, pointer-based hand-over of message references instead of any byte-moving. This is also why this Transport-class (if no other is used alongside it) does not even need any I/O threads in the Context()-instance (no engine, no extra queuing buffers; we can imagine it to work just by passing a pointer to the new message over to the opposite side).
The ipc:// Transport-class uses UNIX domain sockets (which is why it cannot be used on operating systems that do not provide them). Some bindings may still work around this limitation by emulating the externally declared ipc: Transport-class "under the hood" with a locally available TCP-based proxy transport, provided all counterparties that claim to use the ipc: Transport-class actually communicate through that same emulated service.
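As a quick illustration, here is a minimal pyzmq sketch (the socket types and endpoint names are arbitrary choices made for this example, not anything ZeroMQ mandates). While it runs, the ipc:// endpoint is visible on Linux as a UNIX domain socket file at /tmp/zmq-demo.sock (e.g. via ss -x), whereas the inproc:// pair never touches the operating system at all:

    # Minimal sketch, assuming pyzmq is installed; endpoint names are arbitrary.
    import zmq

    ctx = zmq.Context()

    # ipc:// is backed by a UNIX domain socket file on Linux.
    rep = ctx.socket(zmq.REP)
    rep.bind("ipc:///tmp/zmq-demo.sock")
    req = ctx.socket(zmq.REQ)
    req.connect("ipc:///tmp/zmq-demo.sock")

    req.send(b"over AF_UNIX")
    rep.send(rep.recv())              # echo the request back
    print(req.recv())                 # b'over AF_UNIX'

    # inproc:// stays inside the process; both sockets must share one Context.
    pull = ctx.socket(zmq.PULL)
    pull.bind("inproc://demo")
    push = ctx.socket(zmq.PUSH)
    push.connect("inproc://demo")
    push.send(b"pointer hand-off, no OS transport")
    print(pull.recv())

    for s in (req, rep, push, pull):
        s.close()
    ctx.term()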

Related

How is FastCGI implemented under Windows?

The official FastCGI documentation says that stdin is repurposed as a listening socket when a FastCGI module is started. That's great on Linux, where stdin and sockets are all ints, but I don't think it could work on Windows, where stdin is a FILE* and a socket is a HANDLE.
Since Windows servers do support FastCGI, someone has either found a way to make them compatible, or redefined the system for that OS. My Google-fu doesn't seem to be up to locating how though. Where can I find documentation on it?
FastCGI defines only the message-exchange protocol, but the people behind FastCGI also provide an implementation of that protocol for C++. In this implementation your app uses the provided FCGX_Request object to rewire the three provided FCGX_Stream objects to the usual ones (cin, cout, cerr). But I suspect that you don't have to rewire the streams and can use them directly. Check out this FastCGI Hello World to see how it's done.
So your app does not see a HANDLE or a FILE*. Instead it sees fcgi_streambuf, which inherits from std::streambuf. How the previously mentioned protocol is implemented is a detail you're not supposed to be concerned with: the implementation gets hold of a stream of bytes and provides it to the app, and the other way around.
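As a side illustration of the Unix-side convention the question refers to (the web server launches the FastCGI process with file descriptor 0 already bound to a listening socket), a minimal Python sketch of that mechanism might look as follows; this is an assumption-laden toy, not part of the C++ FastCGI kit, and it only works when a parent really does hand over such a descriptor:

    # Hypothetical Unix-side sketch: the parent (web server) starts this process
    # with file descriptor 0 already bound to, and listening on, a socket.
    import socket

    listener = socket.socket(fileno=0)   # wrap inherited fd 0 instead of reading stdin
    conn, _ = listener.accept()          # each accepted connection carries FastCGI records
    data = conn.recv(4096)               # parsing of FCGI_BEGIN_REQUEST etc. omitted
    conn.close()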

Where is send() implemented in OpenMPI?

In OpenMPI, if I follow the call stack of any collective operation (e.g. MPI_Reduce) deep enough, I find that it calls a function called send().
After a lot of grepping, I'm not sure where send() is implemented. I suspect that send() may be buried inside of a macro or obscure shim layer of some sort.
Where are the implementation(s) of send() located in the OpenMPI codebase?
I'm looking at OpenMPI v1.8.1, though I suspect that the organization of the source tree hasn't changed that much between versions.
send(2) is the BSD socket system call for sending data over network sockets. It is ultimately used by the tcp BTL of Open MPI to perform the actual network transfer from one process to another and its implementation is to be found in the source code of the standard C library and in the OS kernel.
If you are interested in the actual higher-level mechanism that Open MPI uses to transmit messages from one rank to another over TCP/IP networks, then the tcp BTL itself is to be found in $OMPI_SOURCE/ompi/mca/btl/tcp/ (for older Open MPI versions) or in $OMPI_SOURCE/opal/mca/btl/tcp/ (for newer versions).
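For intuition only (this is an illustrative assumption, not Open MPI code): at the very bottom, the tcp BTL ends up in the same send(2) call that any language reaches through a connected TCP socket, for example:

    # Illustrative only: socket.send() is a thin wrapper over the send(2)/sendto(2)
    # system calls that the tcp BTL ultimately reaches via the C library.
    import socket

    srv = socket.create_server(("127.0.0.1", 0))      # ephemeral loopback listener
    cli = socket.create_connection(srv.getsockname())
    conn, _ = srv.accept()

    cli.send(b"payload")      # shows up as send(2)/sendto(2) under strace
    print(conn.recv(7))       # b'payload'

    for s in (cli, conn, srv):
        s.close()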

Is libpcap faster than reading a socket for inter-process communication on localhost?

I have a (legacy) specialized packet-sniffing application which sniffs the Ethernet using libpcap and analyzes the received data ("the analyzer").
I'm adding another process which reads data from a PCI card, and I'd like to feed that data into the analyzer ("the sender").
Both the sender and the analyzer run on the same host, in different processes.
On the sender side, it's easy enough to read the PCI card and send the data over a socket. However, on the receiving side I could either
a) modify the existing libpcap code and set an appropriate filter, or
b) just open and read a socket
Speed and performance are the important parameters. There are several pairs of sender/receiver processes running, and the total across all of them is about 1 Gb/s.
Any insight on which method would be faster, more efficient, or "better"?
Modifying the libpcap receiver code would be pretty messy, but from reading other posts, pcap should be using lots of tricks to improve performance (mmap, etc.).
(But wouldn't reading a local socket use those same tricks?)
Thanks!
(The system environment is CentOS with a 3.16 kernel.)
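For concreteness, the receiving end of option (b) could be as small as the following sketch; the AF_UNIX family and the socket path are assumptions made for illustration, and a loopback TCP socket would look almost identical:

    # Minimal sketch of option (b): the analyzer-side end of a local stream socket.
    import os
    import socket

    PATH = "/tmp/analyzer.sock"          # hypothetical rendezvous path
    if os.path.exists(PATH):
        os.unlink(PATH)

    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(PATH)
    srv.listen(1)

    conn, _ = srv.accept()               # the sender connects after reading the PCI card
    while True:
        chunk = conn.recv(65536)
        if not chunk:
            break
        # analyze(chunk)                 # placeholder for the existing analysis path
    conn.close()
    srv.close()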

Named pipes vs. UDP for IPC on Windows

Why are named pipes preferable to UDP for IPC (inter-process communication) on a local Windows machine? Or might UDP sometimes be the better choice?
UDP packets can be lost even on localhost. Also, as UDP is datagram-based and has no guaranteed delivery, it's hard to transfer larger blocks of data. Finally, UDP on localhost is sometimes blocked by firewall software. In general, UDP is usually not even considered for single-computer IPC.
On Windows I recommend memory-mapped files plus synchronization primitives as the fastest and probably the easiest method. Named pipes usually work well once you get them working, but I see lots of questions here about how to make named pipes work at all (and I have yet to see a single complaint about MMFs).
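A minimal Python sketch of that idea (the mapping name, buffer size and crude length-prefix framing are assumptions made for illustration; a real design would add a proper synchronization primitive such as a named event or mutex):

    # Windows MMF sketch: two processes share a named mapping backed by the
    # paging file. The tag name, size and framing are illustrative assumptions.
    import mmap
    import struct

    TAG = "Local\\demo-ipc-buffer"       # hypothetical mapping name (tagname is Windows-only)
    SIZE = 64 * 1024

    # In the writer process:
    buf = mmap.mmap(-1, SIZE, tagname=TAG)
    payload = b"hello from the writer"
    buf[4:4 + len(payload)] = payload
    buf[0:4] = struct.pack("<I", len(payload))   # write the length last, as a crude "ready" flag

    # In the reader process (opens the very same named mapping):
    view = mmap.mmap(-1, SIZE, tagname=TAG)
    (n,) = struct.unpack("<I", view[0:4])
    if n:
        print(view[4:4 + n])                     # b'hello from the writer'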
We have a product, MsgConnect, which provides socket-, UDP- and MMF-based transports suitable for IPC locally or across a network, so I have practical experience with this topic. Named pipes were considered for support, but the idea was discarded in favor of other mechanisms.

Using gevent and multiprocessing together to communicate with a subprocess

Question:
Can I use the multiprocessing module together with gevent on Windows in an efficient way?
Scenario:
I have a gevent-based Python application doing asynchronous I/O on Windows. The application is mostly I/O-bound, but there are spikes of higher CPU load as well. This application needs to control a console application via its stdin and stdout. I cannot modify this console application, and the user will be able to supply a custom one; only the text (line-based) communication protocol is fixed.
I have a working implementation using subprocess and threads, but I would rather move the whole subprocess-based communication code, together with those threads, into a separate process to turn the main application back into a single-threaded one. I plan to use the multiprocessing module for this.
Prior reading:
I have been searching the Web a lot and read some source code, so I know that the multiprocessing module uses a Pipe implementation based on named pipes on Windows. A pair of multiprocessing.Queue objects would be used to communicate with the second Python process. These queues are built on that Pipe implementation, i.e. the IPC would be done via named pipes.
The key question is whether calling the incoming Queue's get method would block gevent's main loop or not. There's a timeout parameter for that method, so I could turn this into a polling loop with a small timeout, but that's not a good solution, since it would still block gevent for short periods, hurting its low I/O latency.
I'm also open to suggestions on how to circumvent the whole problem of using pipes on Windows, which is known to be hard and sometimes fragile. I'm not sure whether shared memory based IPC is possible on Windows or not. Maybe I could wrap the console application in a way which would allow communicating with the child process using network sockets, which is known to work well with gevent.
Please don't question my primary use case, if possible. Thanks.
The Queue's get method is really blocking. Using it with a timeout could potentially solve your problem, but it definitely wouldn't be the cleanest solution and, most importantly, it would introduce extra latency for no good reason. Even if it weren't blocking, that wouldn't be a good solution either: being non-blocking by itself is not enough; a good asynchronous call/API should integrate smoothly into the I/O framework in use, be that gevent for Python, libevent for C or Boost.Asio for C++.
The easiest solution would be to use simple I/O: spawn your console application and attach to its stdin and stdout descriptors. There are at least two major factors to consider:
It will be extremely easy for your clients to write such applications. They will not have to work with any kind of IPC, socket or other code, which can be a very hard thing for many. With this approach, the application just reads from stdin and writes to stdout.
It will be extremely easy to test console applications written this way, because you can start them manually, type text into the console and see the results.
Gevent is a perfect fit for async read/write here.
However, the downside is that you will have to start this application yourself, there will be no support for concurrent communication with it, and no support for communication over a network. There is even a good example to get you started.
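A minimal sketch of this approach using gevent's cooperative subprocess support; the child command ("my_tool") and the protocol lines are placeholders, not anything prescribed by the answer:

    # Drive a line-oriented console child cooperatively from gevent.
    import gevent
    from gevent.subprocess import Popen, PIPE   # cooperative counterpart of subprocess

    child = Popen(["my_tool"], stdin=PIPE, stdout=PIPE, bufsize=0)

    def reader():
        for line in child.stdout:               # yields to the gevent hub while waiting
            print("child said:", line.rstrip().decode())

    gevent.spawn(reader)
    child.stdin.write(b"do-something\n")        # the fixed line-based protocol goes here
    child.stdin.flush()
    gevent.sleep(1)                             # give the reader greenlet time to run
    child.stdin.close()
    child.wait()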
To keep it simple but more flexible, you can use TCP/IP sockets. If both client and server run on the same machine, a good operating system keeps the loopback traffic entirely in the kernel, so it will be fast. And if you are worried about performance at this level, you probably should not be using Python at all and should look at other technologies.
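A cooperative loopback TCP server with gevent could look like this sketch (the port number and newline-delimited framing are arbitrary choices for illustration):

    # gevent TCP echo server on the loopback interface; each connection is
    # handled in its own greenlet.
    from gevent.server import StreamServer

    def handle(sock, address):
        f = sock.makefile(mode="rwb")
        for line in f:                    # one request per line
            f.write(b"echo: " + line)
            f.flush()

    server = StreamServer(("127.0.0.1", 16000), handle)
    server.serve_forever()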
An even fancier solution: use ZeroC Ice. It is a very modern technology allowing almost seamless inter-process communication; a CORBA killer, and very easy to use. It is heavily used by many, proven to be among the fastest in its class, and rock stable. The beauty of this solution is that you can seamlessly integrate programs written in many different languages, like Python, Java, C++, etc. But it will require some of your time to get familiar with the concepts. If you decide to go this way, just spend a day reading through the documentation.
Hope it helps. Good luck!
Your question is already quite old. Nevertheless, I would like to recommend http://gehrcke.de/gipc which, I believe, tackles the outlined challenge in a very straightforward fashion. Basically, it allows you to integrate multiprocessing-based child processes anywhere in your application (also on Windows). Interaction with Process objects (such as calling join()) is gevent-cooperative. Via its pipe management, it allows for cooperatively blocking inter-process communication. However, on Windows, IPC is currently much less efficient than on POSIX-compliant systems (since non-blocking I/O is imitated through a thread pool). Depending on the IPC messaging volume of your application, this may or may not be significant.
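A small sketch of that pattern, following gipc's documented pipe()/start_process() API (the message payloads are placeholders):

    # gevent-cooperative child process plus pipe, via gipc.
    import gevent
    import gipc

    def child(reader):
        while True:
            msg = reader.get()            # cooperative blocking read in the child
            if msg is None:
                break
            print("child received:", msg)

    def main():
        with gipc.pipe() as (reader, writer):
            proc = gipc.start_process(target=child, args=(reader,))
            for i in range(3):
                writer.put({"seq": i})    # does not block the gevent hub
                gevent.sleep(0.1)
            writer.put(None)              # tell the child to exit
            proc.join()                   # gevent-cooperative join

    if __name__ == "__main__":
        main()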
