Ruby - how to thread across cores / processors

I'm (re)writing a socket server in Ruby in hopes of simplifying it. Reading about Ruby sockets, I ran across a site that says multithreaded Ruby apps only use one core / processor on a machine.
Questions:
Is this accurate?
Do I care? Each thread in this server will potentially run for several minutes and there will be lots of them. Is the OS (CentOS 6.5) smart enough to share the load?
Is this any different from threading in C++ (the language of the current socket server)? I.e., do pthreads use multiple cores automatically?
What if I fork instead of thread?

CRuby has a global interpreter lock (GIL), so it cannot run Ruby threads in parallel. JRuby and some other implementations can, but CRuby will never execute Ruby code from two threads at the same time. This means that, no matter how smart your OS is, it cannot spread CPU-bound Ruby threads across cores.
This is different from threading in C++. pthreads create real OS threads, and the kernel's scheduler will run them on multiple cores at the same time. Technically CRuby (since 1.9) uses pthreads as well, but the GIL prevents them from running Ruby code in parallel.
Fork creates a new process, and your OS's scheduler will almost certainly be smart enough to run it on a separate core. If you need parallelism in Ruby, either use an implementation without a GIL, or use fork.
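Here is a minimal sketch of the fork approach. It assumes a POSIX platform such as the CentOS machine in the question; Process.fork is not available on Windows or in JRuby. The names busy_sum and parallel_sums are hypothetical.

Each child process computes one chunk of CPU-bound work and writes the result back to the parent through a pipe. Because every child is a separate process with its own interpreter lock, the OS is free to schedule them on different cores.

```ruby
# Hypothetical CPU-bound workload: sum of squares from 1 to n.
def busy_sum(n)
  (1..n).reduce(0) { |acc, i| acc + i * i }
end

# Forks one child per chunk; each child writes its result to a pipe.
# Separate processes each have their own interpreter lock, so the OS
# scheduler can run them on different cores at the same time.
def parallel_sums(chunks)
  jobs = chunks.map do |n|
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close
      writer.write(busy_sum(n).to_s)
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # the parent only reads
    [pid, reader]
  end

  jobs.map do |pid, reader|
    result = Integer(reader.read) # blocks until the child closes its end
    reader.close
    Process.wait(pid) # reap the child so it does not linger as a zombie
    result
  end
end

p parallel_sums([10, 100]) # => [385, 338350]
```

For small inputs the cost of forking dominates. The benefit only shows up when each chunk does a substantial amount of work.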

There is a very nice gem called parallel that processes data with parallel threads or with multiple forked processes (the latter works around the GIL of the current CRuby implementation).

Due to the GIL in YARV, Ruby is not thread-friendly. If you want to write multithreaded Ruby, use JRuby or Rubinius. It would be even better to use a functional language with an actor model, such as Erlang or Elixir, and let the virtual machine handle the threads while you manage only Erlang processes.
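To illustrate the actor idea in Ruby terms, here is a rough sketch. The CounterActor class is hypothetical, and this is not how Erlang's VM works internally. A single thread owns the state, and every other thread talks to it only by pushing messages onto a Queue. Under MRI this gives safe concurrency, not parallelism.

```ruby
# Minimal actor-style sketch: one thread owns @count, and other threads
# interact with it only by sending messages through a Queue (its mailbox).
class CounterActor
  def initialize
    @mailbox = Queue.new
    @count = 0
    @thread = Thread.new { run }
  end

  def increment
    @mailbox << [:increment]
  end

  # Synchronous request: send a reply queue and wait for the answer.
  def value
    reply = Queue.new
    @mailbox << [:get, reply]
    reply.pop
  end

  def stop
    @mailbox << [:stop]
    @thread.join
  end

  private

  # Messages are processed one at a time, so @count needs no lock.
  def run
    loop do
      message, reply = @mailbox.pop
      case message
      when :increment then @count += 1
      when :get then reply << @count
      when :stop then break
      end
    end
  end
end

actor = CounterActor.new
workers = 4.times.map { Thread.new { 250.times { actor.increment } } }
workers.each(&:join)
p actor.value # => 1000
actor.stop
```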

Threading
If you want multi-core threading, you need to use an interpreter that actually uses multiple cores. MRI Ruby as of 2.1.3 still executes Ruby code on only one core at a time; JRuby and Rubinius allow access to multiple cores.
Threading Alternatives
Alternatives to changing your interpreter include:
DRb with multiple Ruby processes.
A queuing system with multiple workers.
Socket programming with multiple interpreters.
Forking processes, if the underlying platform supports the fork(2) system call.
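The sketch below applies the forking alternative to a socket server like the one in the question: it preforks worker processes that share one listening socket. It assumes a POSIX platform. start_prefork_server and its parameters are hypothetical names. Each worker handles a fixed number of connections only so that the example terminates.

```ruby
require "socket"

# Preforking sketch: the parent opens one listening socket, then forks
# worker processes that all call accept on it. The kernel hands each new
# connection to one waiting worker, and the workers can run on different cores.
def start_prefork_server(workers:, requests_per_worker:)
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS choose a free port
  port = server.addr[1]

  pids = Array.new(workers) do
    Process.fork do
      requests_per_worker.times do
        client = server.accept
        line = client.gets.to_s.chomp
        client.puts("worker #{Process.pid} got #{line}")
        client.close
      end
      exit!(0)
    end
  end

  server.close # the parent never accepts; each child keeps its own copy open
  [port, pids]
end

port, pids = start_prefork_server(workers: 2, requests_per_worker: 1)
replies = 2.times.map do
  TCPSocket.open("127.0.0.1", port) do |sock|
    sock.puts("ping")
    sock.gets.chomp
  end
end
pids.each { |pid| Process.wait(pid) }
puts replies
```

A long-running server would loop forever in each worker and restart workers that die. This is roughly the model that preforking Ruby servers use to get multi-core throughput on MRI.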

Related

Does HDF5 support concurrent reads, or writes to different files?

I'm trying to understand the limits of HDF5 concurrency.
There are two builds of HDF5: parallel HDF5 and default. The parallel version is currently supplied in Ubuntu, and the default one in Anaconda (judging by the --enable-parallel flag).
I know that parallel writes to the same file are impossible. However, I don't fully understand to what extent the following actions are possible with the default or the parallel build:
several processes reading from the same file
several processes reading from different files
several processes writing to different files.
Also, is there any reason Anaconda does not turn on the --enable-parallel flag by default? (https://github.com/conda/conda-recipes/blob/master/hdf5/build.sh)
AFAICT, there are three ways to build libhdf5:
with neither thread-safety nor MPI support (as in the conda recipe you posted)
with MPI support but no thread safety
with thread safety but no MPI support
That is, the --enable-threadsafe and --enable-parallel flags are mutually exclusive (https://www.hdfgroup.org/hdf5-quest.html#p5thread).
As for concurrent reads on one or even multiple files, the answer is that you need thread safety (https://www.hdfgroup.org/hdf5-quest.html#tsafe):
Concurrent access to one or more HDF5 file(s) from multiple threads in the same process will not work with a non-thread-safe build of the HDF5 library. The pre-built binaries that are available for download are not thread-safe.
Users are often surprised to learn that (1) concurrent access to different datasets in a single HDF5 file and (2) concurrent access to different HDF5 files both require a thread-safe version of the HDF5 library. Although each thread in these examples is accessing different data, the HDF5 library modifies global data structures that are independent of a particular HDF5 dataset or HDF5 file. HDF5 relies on a semaphore around the library API calls in the thread-safe version of the library to protect the data structure from corruption by simultaneous manipulation from different threads. Examples of HDF5 library global data structures that must be protected are the freespace manager and open file lists.
Edit: The links above no longer work because the HDF Group reorganised their website. There is a page, "Questions about thread-safety and concurrent access", in the HDF5 Knowledge Base that contains some useful information.
While the passage mentions only concurrent threads in a single process, it appears to apply equally to forked subprocesses: see this h5py multiprocessing example.
Now, for parallel access, you might want to use "Parallel HDF5", but those features require MPI. This pattern is supported by h5py but is more complicated and esoteric, and probably even less portable than thread-safe mode. More importantly, naively doing concurrent reads with a parallel build of libhdf5 will lead to unexpected results, because that build isn't thread-safe.
Besides efficiency, one limitation of the thread-safe build flag is lack of Windows support (https://www.hdfgroup.org/hdf5-quest.html#gconc):
The thread-safe version of HDF5 is currently not tested or supported on MS Windows platforms. A user was able to get this working on Windows 64-bit and contributed his Windows 64-bit Pthreads patches.
Getting weird corrupt results when reading (different!) files from Python is definitely unexpected and frustrating given how concurrent read access is one of the touted "features" of HDF5. Perhaps a better default recipe for conda would be to include --enable-threadsafe on those platforms that support it, but I guess then you would end up with platform-specific behavior. Maybe there ought to be separate packages for the three build modes instead?
Just to add:
I think independent concurrent processes (e.g., separate Python processes) doing read access should be fine.
HDF5 1.10 will support Single Writer Multiple Reader (SWMR), and h5py 2.5.0 will have support for it as well.

Difference between multi-process programming with fork and MPI

Is there a difference, in performance or otherwise, between creating a multi-process program using the Linux fork call and using the functions available in the MPI library?
Or is it just easier to do it in MPI because of the ready to use functions?
They don't solve the same problem. Note the difference between parallel programming and distributed-memory parallel programming.
The fork/join model you mentioned is usually used for parallel programming on a single physical machine. You generally don't distribute your work to other connected machines (with the exception of some of the models mentioned in the comments).
MPI is for distributed-memory parallel programming. Instead of using a single machine, you use a group of machines (even hundreds of thousands of processors) to solve a problem. While these are sometimes treated as one large logical machine, they are physically made up of many separate processors. The MPI functions are there to simplify communication between processes on distributed machines, so you don't have to do things like manually opening TCP sockets between all of your processes.
So there's not really a way to compare their performance unless you're only running your MPI program on a single machine, which isn't really what it's designed to do. Yes, you can run MPI on a single machine and people do that all the time for small test codes or small projects, but that's not the biggest use case.

JRuby and Rubinius support parallel computing, but what about gems that don't support this?

What I'm trying to understand is, practically speaking, how much benefit do I get from the parallel computing support in JRuby / Rubinius? A lot of Ruby libraries keep track of global internal state. Is there any way to deal with these libraries, or do they just become unusable if I decide to parallelize my Ruby script? Maybe Rubinius automatically puts a mutex in front of all usage of unsafe libraries?
If you want to use JRuby or Rubinius, you'll be limited to gems that are compatible with them.
Many gems are implemented using C extensions, and you cannot use them with JRuby, for example.
As for the rest, most of them clearly state their thread-safety status.
So choosing JRuby or Rubinius will narrow down your options regarding gems, but it also opens up a huge opportunity: you'd be able to use many mature Java libraries (in JRuby's case).
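Whether or not an implementation adds such locks for you, you can serialize access to a non-thread-safe library yourself by funnelling every call through one Mutex. Here is a minimal sketch. LegacyCounter is a hypothetical stand-in for a library with unsynchronized global state, and Synchronized is a hypothetical wrapper, not an API of JRuby or Rubinius.

```ruby
# Hypothetical stand-in for a library object with unsynchronized internal state.
class LegacyCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  def increment
    current = @count # read-modify-write: another thread could interleave here
    @count = current + 1
  end
end

# Wraps an object so that every forwarded method call runs while holding
# one shared Mutex, i.e. only one thread at a time is inside the library.
class Synchronized
  def initialize(target)
    @target = target
    @lock = Mutex.new
  end

  def method_missing(name, *args, **kwargs, &block)
    if @target.respond_to?(name)
      @lock.synchronize { @target.public_send(name, *args, **kwargs, &block) }
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

counter = Synchronized.new(LegacyCounter.new)
threads = 8.times.map { Thread.new { 1_000.times { counter.increment } } }
threads.each(&:join)
p counter.count # => 8000
```

This gives up parallelism inside the wrapped library in exchange for correctness. Code outside the lock can still run in parallel on a GIL-free implementation.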

Ruby threads not good enough?

How is it that JRuby's support for multithreading is any better than regular Ruby's? What's wrong with threads in plain old Ruby?
"Normal" ruby (or mri) has a great big lock that prevents more than one thread from running ruby code at a time (known as the GIL or GVL).
Rubinius and JRuby don't have this lock. In Ruby 1.8.x, threads were green threads; as of Ruby 1.9, Ruby threads are mapped to native threads, but the GVL stops you from gaining much benefit from that.
Native extensions can run code outside of the lock so that, for example, multiple MySQL queries can run simultaneously from different threads, but they can't call into the regular Ruby API while they don't hold the lock.

Does Ruby Enterprise use Green threads?

I was wondering about this and couldn't find anything except this:
"Thread scheduler bug fixes and performance improvements. Threading on Ruby Enterprise Edition can be more than 10 times faster than official Ruby 1.8"
REE is derived from MRI 1.8.7. As such, it uses only green threads. REE changes some parts of 1.8.7 (especially in the areas of memory management and garbage collection), but it still largely follows the design of upstream MRI (Matz's Ruby Interpreter).
While YARV (1.9) switched to OS native threads, it still has a global interpreter lock ensuring that only one of these threads runs at a time.
There are a couple of Ruby implementations with OS native threads and without a GIL. The most prominent are JRuby (based on the JVM) and Rubinius (with its own VM). These implementations offer "real" concurrent threads.
Besides JRuby and Rubinius, which have got rid of an interpreter lock entirely, CRuby/MRI has also made some progress with regard to concurrency.
One notable change is the Bitmap Marking GC by Narihiro Nakamura: as of Ruby 2.0, it removes another advantage REE had over CRuby. REE has a copy-on-write-friendly GC algorithm, which made it attractive for achieving concurrency through processes (forking) rather than through threading. The new Bitmap Marking GC has the same advantage of avoiding unnecessary copying of memory when forking a new process.
The GIL (or GVL, as it is officially called) is also not quite as bad as it sounds at first. For example, Ruby releases the interpreter lock when doing blocking IO. Another feature that we see much more often lately is that C extension developers can manually release the lock by calling rb_thread_blocking_region (superseded by rb_thread_call_without_gvl in Ruby 2.0), which executes a C-level function with the GIL released. This can have huge effects when an operation can be performed entirely in C without touching Ruby objects or calling the Ruby API. A nice example is RSA key generation: this runs completely in C with memory allocated by OpenSSL, so we can run it safely with the GIL released.
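Here is a quick way to see the IO point in practice on MRI, as a sketch. sleep stands in for a blocking read, since MRI also releases the GVL while a thread sleeps. overlapped_waits is a hypothetical helper.

```ruby
# Each thread simulates a blocking wait. MRI releases the GVL while a
# thread is blocked in sleep or in most I/O calls, so the waits overlap.
def overlapped_waits(threads:, seconds:)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Array.new(threads) { Thread.new { sleep(seconds) } }.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

elapsed = overlapped_waits(threads: 5, seconds: 0.2)
# Expected to be close to 0.2 s rather than the 1.0 s a sequential run would take.
puts format("5 threads x 0.2 s of waiting took %.2f s", elapsed)
```

Replacing sleep with pure-Ruby CPU work would typically show no such overlap on MRI. That is the situation the original question is worried about.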
Fibers, introduced in 1.9, and recent projects like Celluloid also cast Ruby concurrency in a much friendlier light today than a few years ago.
Last but not least, Koichi Sasada, the author of CRuby's VM, is actively working on MVM (multiple VM) technology, which will allow running multiple VMs in a single Ruby process, achieving concurrency in yet another way.
Taking all the other performance improvements into account, there are fewer and fewer arguments for using REE. It's safe to switch to 1.9.3, or to 2.0.0 once it's out, especially since the 1.8 series will no longer be actively developed and many popular projects have announced that they will drop support for 1.8 soon.
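Here is a small sketch of what Fibers provide: cooperatively scheduled coroutines on a single thread. fiber_counter is a hypothetical helper. The point is that Fibers interleave work under the caller's control but do not run on multiple cores.

```ruby
# A Fiber runs until it calls Fiber.yield; the caller decides when to
# resume it. This gives interleaving on one thread, not parallelism.
def fiber_counter(limit)
  Fiber.new do
    1.upto(limit) { |i| Fiber.yield(i) }
    nil # the final resume returns nil to signal the end
  end
end

fiber = fiber_counter(3)
values = []
while (v = fiber.resume)
  values << v
end
p values # => [1, 2, 3]

# Two fibers interleaved by hand on the same thread.
a = fiber_counter(2)
b = fiber_counter(2)
p [a.resume, b.resume, a.resume, b.resume] # => [1, 1, 2, 2]
```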
Edit:
As Holger pointed out, REE has also been EOLed, and there will be no port to 1.9 or further. So it's not only safe to switch, but also the right thing to do :)
