Speeding up IPC with Ruby - ruby

I am trying to do IPC between 2 processes on the same Linux box in Ruby, and I need to optimize the solution as far as practicable.
I had begun with a TCPSocket but I see that using UNIXSocket is probably faster, and perhaps does not copy data to kernel buffers.
I have been reading SO threads, and looks like mmap might be interesting to look at. But for mmap I need to a) install mmap gem and b) provide locking since multiple client processes would likely try connect with the server process (both running on same box).
My questions:
What other options would you recommend?
How do you recommend locking the memory with ruby mmap?
How do the numbers, if there's any available, stack up for UNIXSocket versus mmap?

Related

Ruby On Rails, and multi threading

I came across that Ruby doesn't really have any performance benefit when you do multi threading. because of GIL nature.
I see there is no point of using multi-threading in Rails app.
What is use case of multi-threading in Rails app?
An IO (input/output) operation is one that is not operating on your CPU, such as, reading from a hard drive, an API call to a service, a database operation of some kind.
Anything that is IO heavy would benefit from multi-threading even with GIL. IO operations are blocking in ruby while they wait for the result, so it's only reasonable, while you are waiting for the result of the operation, to want to switch to another thread to do some work.

When making network requests, when should I use Threads vs Processes?

I'm working on a Ruby script that will be making hundreds of network requests (via open-uri) to various APIs and I'd like to do this in parallel since each request is slow, and blocking.
I have been looking at using Thread or Process to achieve this but I'm not sure which method to use.
With regard to network request, when should i use a Thread over Process, or does it not matter?
Before going into detail, there is already a library solving your problem. Typhoeus is optimized to run a large number of HTTP requests in parallel and is based on the libcurl library.
Like a modern code version of the mythical beast with 100 serpent
heads, Typhoeus runs HTTP requests in parallel while cleanly
encapsulating handling logic.
Threads will be run in the same process as your application. Since Ruby 1.9 native threads are used as the underlying implementation. Resources can be easily shared across threads, as they all can access the mutual state of the application. The problem, however, is that you cannot utilize the multiple cores of your CPU with most Ruby implementations.
Ruby uses the Global Interpreter Lock (GIL). GIL is a locking mechanism to ensure that the mutual state is not corrupted due to parallel modifications from different threads. Other Ruby implementations like JRuby, Rubinius or MacRuby offer an approach without GIL.
Processes run separately from each other. Processes do not share resources, which means every process has its own state. This can be a problem, if you want to share data across your requests. A process also allocates its own stack of memory. You could still share data by using a messaging bus like RabitMQ.
I cannot recommend to use either only threads or only processes. If you want to implement that yourself, you should use both. Fork for every n requests a new processes which then again spawns a number of threads to issue the HTTP requests. Why?
If you fork for every HTTP request another process, this will result in too many processes. Although your operating system might be able to handle this, the overhead is still tremendous. Some HTTP requests might finish very fast, so why bother with an extra process, just run them in another thread.

Twisted process is huge

A Twisted app I have was constantly getting killed due to memory problems. The program grew in size, consuming all of the system's memory before being shut down by the os. Restart and repeat.
This is on a virtual server, so I doubled the memory, and the issue resolved - the daemon stabilized at around 1.25GB of memory
Does anyone have advice on how I can best profile this to tell what/where all the memory is getting sucked up into ?
If info on the app helps, I'm using the twisted reactor and internet.timer.TimerService to poll a database for items to update through three 'services'. the items to process are pushed into a twisted.internet.defer.DeferredList , and their processing occurs in a deferToThread block. In the deferred process there are a handful of blocking operations ( fetching web pages, etc ) and a lot of HTML parsing ( beautiful soup and other libraries ). I've suggested the reactor.threadpool size to be 10 and each 'service' defers to thread using a SemaphoreService that has 10 tokens. I really expected this daemon to max out at around 400MB of memory, not 3x that.
This is more of a generic share of thoughts how I debug memory leak/usage problems in my twisted applications.
Twisted has a ssh server support, and is something which I add in to almost all of my projects in development.
The ssh provides a interactive python interpreter access to the method which has python garbage collector available and a number of helper functions which allow me to a) inspect count of the instances from a same class, b) start and stop inspection of changes of that count over time and c) to get all references of that class. The nice thing with the interactive interpreter is that it allows ad-hoc introspection of offending instances, their relation to other objects and the state of process they are in. This so far has always proven a valuable instrument to pinpoint exact location where I have forgot / unforseen the ref release problems in my projects.

Optimized ping with threads in ruby, possible?

I am using ruby 1.8.7 and I cannot upgrade to 1.9+ anytime soon.
I understand that ruby has green threads, and anything cpu based does not gain much by way of multithreading.
However, I was trying a multithreaded ping in ruby, as in my script will try to ping N machines in the network -- in the time a machine replies back there's enough time to create a new thread and initiate connection with another host. However what I see is that multithreading has actually worsened performance.
Any suggestions to do an optimized ping with threads in ruby?
You should use EventMachine with an ICMP implementation. The author of icmp4em also provides two examples how to use all this stuff.

Using gevent and multiprocessing together to communicate with a subprocess

Question:
Can I use the multiprocessing module together with gevent on Windows in an efficient way?
Scenario:
I have a gevent based Python application doing asynchronous I/O on Windows. The application is mostly I/O bound, but there are spikes of higher CPU load as well. This application would need to control a console application via its stdin and stdout. I cannot modify this console application and the user will be able to use his own custom one, only the text (line) based communication protocol is fixed.
I have a working implementation using subprocess and threads, but I would rather move the whole subprocess based communication code together with those threads into a separate process to turn the main application back to single-threaded. I plan to use the multiprocessing module for this.
Prior reading:
I have been searching the Web a lot and read some source code, so I know that the multiprocessing module is using a Pipe implementation based on named pipes on Windows. A pair of multiprocessing.queue.Queue objects would be used to communicate with the second Python process. These queues are based on that Pipe implementation, e.g. the IPC would be done via named pipes.
The key question is, whether calling the incoming Queue's get method would block gevent's main loop or not. There's a timeout for that method, so I could make it into a loop with a small timeout, but that's not a good solution, since it would still block gevent for small time periods hurting its low I/O latency.
I'm also open to suggestions on how to circumvent the whole problem of using pipes on Windows, which is known to be hard and sometimes fragile. I'm not sure whether shared memory based IPC is possible on Windows or not. Maybe I could wrap the console application in a way which would allow communicating with the child process using network sockets, which is known to work well with gevent.
Please don't question my primary use case, if possible. Thanks.
The Queue's get method is really blocking. Using it with timeout could potentially solve your problem, but it definitely won't be a cleanest solution and, which is the most important, will introduce extra latency for no good reason. Even if it wasn't blocking, that won't be a good solution either. Just because non-blocking itself is not enough, the good asynchronous call/API should smoothly integrate into the I/O framework in use. Be that gevent for Python, libevent for C or Boost ASIO for C++.
The easiest solution would be to use simple I/O by spawning your console applications and attaching to its console in and out descriptors. There are at two major factors to consider:
It will be extremely easy for your clients to write client applications. They will not have to work with any kind of IPC, socket or other code, which could be very hard thing for many. With this approach, application will just read from stdin and write to stdout.
It will be extremely easy to test console applications using this approach as you can manually start them, enter text into console and see results.
Gevent is a perfect fit for async read/write here.
However, the downside is that you will have to start this application, there will be no support for concurrent communication with it, and there will be no support for communication over network. There is even a good example for starters.
To keep it simple but more flexible, you can use TCP/IP sockets. If both client and server are running on the same machine. Also, a good operating system will use IPC as an underlying implementation, so it will be fast. And, if you are worrying about performance of this case, you probably should not use Python at all and look at other technologies.
Even fancies solution – use ZeroC ICE. It is very modern technology allowing almost seamless inter-process communication. It is a CORBA killer, very easy to use. It is heavily used by many, proven to be fastest in its class and rock stable. The beauty of this solution is that you can seamlessly integrate programs in many different languages, like Python, Java, C++ etc. But this will require some of your time to get familiar with a concept. If you decide to go this way, just spend a day reading trough documentation.
Hope it helps. Good luck!
Your question is already quite old. Nevertheless, I would like to recommend http://gehrcke.de/gipc which -- I believe -- would tackle the outlined challenge in a very straight-forward fashion. Basically, it allows you to integrate multiprocessing-based child processes anywhere in your application (also on Windows). Interaction with Process objects (such as calling join()) is gevent-cooperative. Via its pipe management, it allows for cooperatively blocking inter-process communication. However, on Windows, IPC currently is much less efficient than on POSIX-compliant systems (since non-blocking I/O is imitated through a thread pool). Depending on the IPC messaging volume of your application, this might or might not be of significance.

Resources