Working with multiple processes in Ruby - ruby

Is there a module for Ruby that makes it easy to share objects between multiple processes? I'm looking for something similar to Python's multiprocessing, which supports process-safe queues and pipes that can be shared between processes.

I think you can do a lot of what you want using the facilities of Ruby IO; you're sharing between processes, not threads, correct?
If that's the case, IO.pipe will do what you need. Ruby doesn't have any built-in way of handling cross-process queues (to my knowledge), but you can also use FIFOs (if you're on Unix).
If you want something more fine-grained, and with good threading support, I'm fairly certain that you can piggyback on java.util.concurrent if you use JRuby. MRI has pretty lousy threading/concurrency support, so if that's what your aiming for, JRuby is probably a better place to go.

I've run into this library but I haven't tried it yet.
Parallel::ForkManager — A simple parallel processing fork manager.
http://parallelforkmgr.rubyforge.org/

Combining DRb, which provides simple inter-process communication, with Queue or SizedQueue, which are both threadsafe queues, should give you what you need.
You may also want to check out beanstalkd which is also hosted on github

Related

Simplest C++ library that supports distributed messaging - Observer Pattern

I need to do something relatively simple, and I don't really want to install a MOM like RabittMQ etc.
There are several programs that "register" with a central
"service" server through TCP. The only function of the server is to
call back all the registered clients when they all in turn say
"DONE". So it is a kind of "join" (edit: Barrier) for distributed client processes.
When all clients say "DONE" (they can be done at totally different times), the central server messages
them all saying "ALL-COMPLETE". The clients "block" until asynchronously called back.
So this is a kind of distributed asynchronous Observer Pattern. The server has to keep track of where the clients are somehow. It is ok for the client to pass its IP address to the server etc. It is constructable with things like Boost::Signal, BOOST::Asio, BOOST::Dataflow etc, but I don't want to reinvent the wheel if something simple already exists. I got very close with ZeroMQ, but non of their patterns support this use-case very well, AFAIK.
Is there a very simple system that does this? Notice that the server can be written in any language. I just need C++ bindings for the clients.
After much searching, I used this library
https://github.com/actor-framework
It turns out that doing this with this framework is relatively straightforward. The only real "impediment" to using it is that the library seems to have gotten an API transition recently and the documentation .pdf file has not completely caught up with the source. No biggie since the example programs and the source (.hpp) files get you over this hump. However, they need to bring the docs in sync with the source. In addition, IMO they need to provide more interesting examples on how to use c++ Actors for extreme performance. For my case it is not needed, but the idea of actors (shared nothing) in this use-case is one of the reasons people use it instead shared memory communication when using threads.
Also, getting used to the syntax that the library enforces (get used to lambdas!) if one is not used to state of the art c++11 programs it can be a bit of a mind-twister at first. Then, the triviality of remembering all the clients that registered with the server was the only other caveat.
STRONGLY RECOMMENDED.

Issues with EventMachine (and looking into Sinatra Async)

I've been trying to find a good way of dealing with asynchronous requests and organizing jobs that need to be repeated, and eventmachine seemed a good way to go, but I found some posts trying to discourage users from eventmachine (for example https://github.com/kyledrake/sinatra-synchrony). I was wondering what the issues they are referring to are? (and if someone would be nice enough, what the alternatives are?)
Considering you're basically searching for a job queue, take a look at Background Jobs at Ruby Toolbox and you'll find a plethora of good options. Manageability vs Speed goes something like this,
Delayed Job
Sidekiq/Resque
Beanstalkd
with DJ being slowest and most manageable and beanstalkd being fastest and least manageable. Your best bet is probably sidekiq or resque, they both depend on redis for managing their queue.
I'd discourage you to use EventMachine because:
It's hard to reason about the reactor pattern.
Fibers detangle reactor pattern's callback pyramid of doom into synchronous looking code but fiber support in third party apps tend to bite you.
You're limited to a very limited eco system when it comes to net-related code.
It's hard not to block the reactor and it's often not easy to catch it when you do.
There are finished solutions for background processing, you don't need to code your own.
It's not really maintained any more, just take a look at last commits and issue list on github.
There's celluloid and celluloid-io and dcell.
Actually, the Sinatra Synchrony people sum it up good:
This gem should not be considered for a new application. It is better
to use threads with Ruby, rather than EventMachine. It also tends to
break when new releases of ruby come out, and EM itself is not
maintained very well and has some pretty fundamental problems.
I will not be maintaining this gem anymore. If anyone is interested in
maintaining it, feel free to inquire, but I recommend not using
EventMachine or sinatra-synchrony anymore.
Use EM if it fits your workflow. Callbacks can be fine to work with as long as you don't get too crazy. We built a lot of software on top of EM at my last job.
There is pretty good support for third party protocols, just take a look at the protocol implementations page.
As to blocking the reactor, you just need to make sure you don't do work on the main thread, and if you do, make sure it's work you do fast. There are some things you can do to determine if this is working. The simplest is just to add a latency check into your code. It's as simple as adding a periodic timer for every x seconds and logging a message (in development). Printing out the time between the calls will tell you how lagged the reactor has become. The greater this time is then your x value the more work you're doing on the main thread.
So, I'd say, try it for yourself. Try celluloid, try straight up threads, try EM with EM-Synchrony and fibers.
It really comes down to personal preference.

ZeroMQ vs Crossroads I/O

I am looking into using ZeroMQ as the messaging/transport layer for a fairly large distributed system, mainly targeting monitoring and data collection (many producers, a few consumers).
As far as I can see there are currently two different implementations of the same concept; ZeroMQ and Crossroads I/O, the latter being a fork of ZeroMQ (in 2012?).
I am trying to figure out which one to use and wonder about the differences between them, but have so far not found much information regarding this.
For example:
Are they compatible on the wire?
Are they API compatible, i.e. some kind of common base API, possibly with different add-ons?
Do they both implement support for ZMTP (ZeroMQ Message Transport Protocol)?
Do they share some kind of common understanding of future development or will they continue in two separate and possible different directions?
What are the pros/cons in relation to the other?
Basically, how do one choose one over the other?
Crossroads.io is pretty dead since Martin Sustrik has started on a new stack, in C, called nano: https://github.com/250bpm/nanomsg
Crossroads.io does not, afaik, implement ZMTP/1.0 nor ZMTP/2.0 but its own version of the protocol.
Nano has pluggable transports and we'll probably make a ZMTP transport for that. Nano is really nice, a rethinking of the original libzmq library, and if it's successful would make a good new kernel.
Ideally, Nano would interoperate both at the API and the protocol level, so be a pluggable replacement for libzmq. It does have quite a long way to go, though.
Note that there are now several rewrites of libzmq emerging, including JeroMQ (Java) and NetMQ (C#). These two do implement ZMTP/1.0 and ZMTP/2.0 properly. There are also other libraries like Axon (https://github.com/visionmedia/axon) which are heavily inspired by 0MQ but not compatible.
Based on experience, users value interoperability more than almost anything else, so it's quite likely that different 0MQ-like stacks will end up speaking the same protocols.

Is communication between two ruby processes possible/easy?

If I have a ruby script Daemon that, as it's name implies, runs as a daemon, monitoring parts of the system and able to perform commands which require authentication, for example changing permissions, is there an easy way to have a second ruby script, say client, communicate to that script and send it commands / ask for information? I'm looking for a built in ruby way of doing this, I'd prefer to avoid building my own server protocol here.
Ruby provides many mechanisms for this including your standards such as: sockets, pipes, shared memory. But ruby also has a higher level library specifically for IPC which you can checkout Here, Drb. I haven't had a chance to play around with it too much but it looks really cool.
You may want to look into http://rubyeventmachine.com/

Ruby: Any gems for threadpooling?

Is there a gem for threadpooling anyone can recommend?
From my experience forking/process pooling is much more effective than thereadpooling in Ruby (assuming you do not need much in terms of thread communication). Some time ago I created a gem called process_pool, which is a very basic process pool with a file based job queue (you can check it out here: http://github.com/psyho/process_pool).
I would try https://github.com/ruby-concurrency/concurrent-ruby/ .
It's basically a port of the java.util.concurrent abstractions (including threadpools) to ruby -- except if you install it under Jruby, it'll use the java.util.concurrent stuff. So you can write code that'll work and do the same thing semantically (not neccesarily the same performance) under any ruby platform.
It also offers Futures, a higher level abstraction which may be more convenient to use than thread pools.

Resources