Why doesn't Ruby have a ThreadPool built-in? - ruby

I have a program that creates 10000 threads at once, and runs 8 at the same time.
But Ruby doesn't have a built-in ThreadPool like Java does. Is there a good reason?

Probably because it's easy to roll your own using the standard library "Queue" class.
q = Queue.new
3.times { Thread.new { while (something = (q.pop(true) rescue nil)); ... end } }
It's a good question though--I might suggest bringing it up with Ruby Core.
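To make the "roll your own" idea concrete, here is a minimal sketch of a fixed-size pool built on Queue. The class name TinyPool and its schedule/shutdown methods are made up for illustration; they are not part of Ruby. It assumes jobs don't raise (an exception would kill the worker thread).

```ruby
# A minimal fixed-size thread pool built on Queue (hypothetical class name).
class TinyPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job arrives; a nil job means "shut down".
        while (job = @jobs.pop)
          job.call
        end
      end
    end
  end

  # Enqueue a block to be run by one of the workers.
  def schedule(&block)
    @jobs << block
  end

  # Send one stop signal per worker, then wait for all of them to finish.
  def shutdown
    @workers.size.times { @jobs << nil }
    @workers.each(&:join)
  end
end

results = Queue.new
pool = TinyPool.new(8)
100.times { |i| pool.schedule { results << i * 2 } }
pool.shutdown
doubled = Array.new(results.size) { results.pop }.sort
```

Unlike the one-liner, the workers here block on pop instead of exiting as soon as the queue is empty, so jobs can be scheduled after the pool has started.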

My suspicion is that a ThreadPool wouldn't be that useful in C-based implementations of Ruby. You can use only one processor at a time with Matz's Ruby Interpreter (MRI) or Yet Another Ruby VM (YARV).
If you want multiple threads to be run on multiple processors, you need to use JRuby instead.

Most likely the reason is that Ruby (MRI 1.8, at least) doesn't have "real" threads. It has what are called green threads: the Ruby interpreter schedules thread execution itself, without using any underlying OS threads. This effectively makes Ruby single-threaded.

Related

Ruby GIL of MRI 1.9

In my understanding, here's how MRI 1.9 GIL works:
The interpreter spawns a new thread by calling the corresponding underlying C function and asks to acquire the "GIL".
If the "GIL" is free, we are happy. If not, the new thread waits and invokes a separate timer thread to set up a "timeslice".
When the currently executing thread hits certain boundaries, such as a return or a backward-branch check, the interpreter checks the timer to decide whether a context switch should happen.
However, as this article points out, atomicity is only guaranteed for methods implemented purely in C. In other words, if parts of our thread run Ruby code, we are still exposed to race conditions.
My question is: if a thread needs to acquire the GIL before executing, why do only methods implemented in C guarantee atomicity?
Thank you in advance!
The GVL guarantees that only one thread can execute Ruby code at any one time. But of course different Ruby threads can execute Ruby code at different times, so a thread switch can still happen between any two steps of a method written in Ruby.
Besides, the majority of Ruby implementations don't have a GVL anyway.

how to know what is NOT thread-safe in ruby?

Starting from Rails 4, everything has to run in a threaded environment by default. This means that all of the code we write AND ALL the gems we use are required to be thread-safe.
So, I have a few questions on this:
What is NOT thread-safe in Ruby/Rails, and what is thread-safe in Ruby/Rails?
Is there a list of gems that are known to be thread-safe, or vice versa?
Is there a list of common code patterns that are NOT thread-safe, for example @result ||= some_method?
Are the core data structures of the Ruby language, such as Hash, thread-safe?
On MRI, where there is a GVL/GIL, which means only one Ruby thread can run at a time except for IO, does the thread-safe change affect us?
None of the core data structures are thread safe. The only one I know of that ships with Ruby is the queue implementation in the standard library (require 'thread'; q = Queue.new).
MRI's GIL does not save us from thread safety issues. It only makes sure that two threads cannot run Ruby code at the same time, i.e. on two different CPUs at the exact same time. Threads can still be paused and resumed at any point in your code. If you write code like @n = 0; 3.times { Thread.start { 100.times { @n += 1 } } }, i.e. code that mutates a shared variable from multiple threads, the value of the shared variable afterwards is not deterministic. The GIL is more or less a simulation of a single-core system; it does not change the fundamental issues of writing correct concurrent programs.
Even if MRI had been single-threaded like Node.js you would still have to think about concurrency. The example with the incremented variable would work fine, but you can still get race conditions where things happen in non-deterministic order and one callback clobbers the result of another. Single threaded asynchronous systems are easier to reason about, but they are not free from concurrency issues. Just think of an application with multiple users: if two users hit edit on a Stack Overflow post at more or less the same time, spend some time editing the post and then hit save, whose changes will be seen by a third user later when they read that same post?
In Ruby, as in most other concurrent runtimes, anything that is more than one operation is not thread safe. @n += 1 is not thread safe, because it is multiple operations. @n = 1 is thread safe because it is one operation (it's lots of operations under the hood, and I would probably get into trouble if I tried to describe why it's "thread safe" in detail, but in the end you will not get inconsistent results from assignments). @n ||= 1 is not, and no other shorthand operation + assignment is either. One mistake I've made many times is writing return unless @started; @started = true, which is not thread safe at all.
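For what it's worth, here is a small sketch of the usual fix for the increment example: wrap the read-modify-write in a Mutex. The Counter class is a made-up name for illustration.

```ruby
# Hypothetical counter class; the Mutex makes the read-modify-write atomic
# with respect to other threads that go through the same lock.
class Counter
  attr_reader :n

  def initialize
    @n = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @n += 1 }
  end
end

counter = Counter.new
threads = Array.new(3) { Thread.new { 100.times { counter.increment } } }
threads.each(&:join)
total = counter.n
```

With the lock in place no increments are lost, so total ends up at 300 regardless of how the threads are scheduled. The same pattern covers the return unless @started; @started = true case: do the check and the assignment inside one synchronize block.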
I don't know of any authoritative list of thread safe and non-thread safe statements for Ruby, but there is a simple rule of thumb: if an expression only does one (side-effect free) operation it is probably thread safe. For example: a + b is ok, a = b is also ok, and a.foo(b) is ok, if the method foo is side-effect free (since just about anything in Ruby is a method call, even assignment in many cases, this goes for the other examples too). Side-effects in this context means things that change state. def foo(x); @x = x; end is not side-effect free.
One of the hardest things about writing thread safe code in Ruby is that all core data structures, including array, hash and string, are mutable. It's very easy to accidentally leak a piece of your state, and when that piece is mutable things can get really screwed up. Consider the following code:
class Thing
  attr_reader :stuff
  def initialize(initial_stuff)
    @stuff = initial_stuff
    @state_lock = Mutex.new
  end
  def add(item)
    @state_lock.synchronize do
      @stuff << item
    end
  end
end
An instance of this class can be shared between threads and they can safely add things to it, but there's a concurrency bug (it's not the only one): the internal state of the object leaks through the stuff accessor. Besides being problematic from the encapsulation perspective, it also opens up a can of concurrency worms. Maybe someone takes that array and passes it on to somewhere else, and that code in turn thinks it now owns that array and can do whatever it wants with it.
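One possible way to plug that leak, as a sketch rather than the only answer: copy the array on the way in and hand out frozen snapshots on the way out. SafeThing is a hypothetical name, and the copies are shallow, so the elements themselves are still shared.

```ruby
class SafeThing
  def initialize(initial_stuff)
    @stuff = initial_stuff.dup   # copy so the caller's array is not shared
    @state_lock = Mutex.new
  end

  def add(item)
    @state_lock.synchronize { @stuff << item }
  end

  # Return a frozen snapshot instead of the internal array.
  def stuff
    @state_lock.synchronize { @stuff.dup.freeze }
  end
end

original = [1, 2]
thing = SafeThing.new(original)
thing.add(3)
snapshot = thing.stuff
original << 99   # changes the caller's array only, not the thing's state
```

Callers can still read the contents, but any attempt to mutate the snapshot raises instead of silently corrupting state that is supposed to be guarded by the lock.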
Another classic Ruby example is this:
STANDARD_OPTIONS = {:color => 'red', :count => 10}
def find_stuff
  @some_service.load_things('stuff', STANDARD_OPTIONS)
end
find_stuff works fine the first time it's used, but returns something else the second time. Why? The load_things method happens to think it owns the options hash passed to it, and does color = options.delete(:color). Now the STANDARD_OPTIONS constant doesn't have the same value anymore. Constants are only constant in what they reference, they do not guarantee the constancy of the data structures they refer to. Just think what would happen if this code was run concurrently.
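A common defence, sketched here under the assumption that you control both sides: freeze the constant, and have the callee work on its own copy. The load_things below is a hypothetical stand-in for the method described above, not the real service.

```ruby
STANDARD_OPTIONS = { :color => 'red', :count => 10 }.freeze

# Hypothetical stand-in for the load_things method described above.
def load_things(name, options)
  options = options.dup            # private, unfrozen copy of the hash
  color = options.delete(:color)   # mutates only the copy
  "#{name}:#{color}:#{options[:count]}"
end

first  = load_things('stuff', STANDARD_OPTIONS)
second = load_things('stuff', STANDARD_OPTIONS)
```

If a callee forgets the dup, the delete on the frozen hash raises right away, which is a lot easier to track down than a constant that quietly changed. Note that freeze is shallow: the 'red' string inside the hash is still mutable.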
If you avoid shared mutable state (e.g. instance variables in objects accessed by multiple threads, data structures like hashes and arrays accessed by multiple threads) thread safety isn't so hard. Try to minimize the parts of your application that are accessed concurrently, and focus your efforts there. IIRC, in a Rails application, a new controller object is created for every request, so it is only going to get used by a single thread, and the same goes for any model objects you create from that controller. However, Rails also encourages the use of global variables (User.find(...) uses the global variable User, you may think of it as only a class, and it is a class, but it is also a namespace for global variables), some of these are safe because they are read only, but sometimes you save things in these global variables because it is convenient. Be very careful when you use anything that is globally accessible.
It's been possible to run Rails in threaded environments for quite a while now, so without being a Rails expert I would still go so far as to say that you don't have to worry about thread safety when it comes to Rails itself. You can still create Rails applications that aren't thread safe by doing some of the things I mention above. When it comes to other gems, assume that they are not thread safe unless they say that they are, and if they say that they are, assume that they are not, and look through their code. Just because you see that they do things like @n ||= 1 does not mean that they are not thread safe; that's a perfectly legitimate thing to do in the right context. You should instead look for things like mutable state in global variables, how the gem handles mutable objects passed to its methods, and especially how it handles options hashes.
Finally, being thread unsafe is a transitive property. Anything that uses something that is not thread safe is itself not thread safe.
In addition to Theo's answer, I'd add a couple of problem areas to look out for in Rails specifically, if you're switching to config.threadsafe!:
Class variables:
@@i_exist_across_threads
ENV:
ENV['DONT_CHANGE_ME']
Threads:
Thread.start
starting from Rails 4, everything would have to run in threaded environment by default
This is not 100% correct. Thread-safe Rails is just on by default. If you deploy on a multi-process app server like Passenger (community) or Unicorn, there will be no difference at all. This change only concerns you if you deploy in a multi-threaded environment like Puma or Passenger Enterprise > 4.0.
In the past, if you wanted to deploy on a multi-threaded app server, you had to turn on config.threadsafe!, which is now the default, because everything it did either had no effect or also applied to a Rails app running in a single process (Prooflink).
But if you do want all the Rails 4 streaming benefits and the other real-time features of a multi-threaded deployment, then maybe you will find this article interesting. As @Theo said, for a Rails app you actually just have to avoid mutating static state during a request. While this is a simple practice to follow, unfortunately you cannot be sure that every gem you find follows it. As far as I remember, Charles Oliver Nutter from the JRuby project had some tips about it in this podcast.
And if you want to write pure concurrent Ruby programs, where you need data structures that are accessed by more than one thread, you may find the thread_safe gem useful.

Ruby Semaphores?

I'm working on an implementation of the "Fair Barbershop" problem in Ruby. This is for a class assignment, but I'm not looking for any handouts. I've been searching like crazy, but I cannot seem to find a Ruby implementation of semaphores that mirrors those found in C.
I know there is Mutex, and that's great. Single implementation, does exactly what that kind of semaphore should do.
Then there's Condition Variables. I thought that this was going to work out great, but looking at these, they require a Mutex for every wait call, which looks to me like I can't put numerical values to the semaphore (as in, I have seven barbershops, 3 barbers, etc.).
I think I need a Counting Semaphore, but I think it's a little bizarre that Ruby doesn't (from what I can find) contain such a class in its core. Can anyone help point me in the right direction?
If you are using JRuby, you can import semaphores from Java as shown in this article.
require 'java'
java_import 'java.util.concurrent.Semaphore'
SEM = Semaphore.new(limit_of_simultaneous_threads)
SEM.acquire #To decrement the number available
SEM.release #To increment the number available
There's http://sysvipc.rubyforge.org/SysVIPC.html which gives you SysV semaphores. Ruby is perfect for eliminating the API blemishes of SysV semaphores, and SysV semaphores are the best around: they are interprocess semaphores, you can use SEM_UNDO so that even SIGKILLs won't mess up your global state (POSIX interprocess semaphores don't have this), and with SysV semaphores you can perform atomic operations on several semaphores at once as long as they're in the same semaphore set.
As for inter-thread semaphores, those should be perfectly emulatable with Condition Variables and Mutexes. (See Bernanrdo Martinez's link for how it can be done).
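A rough sketch of that emulation, in case the link goes stale. CountingSemaphore and its acquire/release methods are made-up names, and the barber/customer numbers are only for the demo.

```ruby
# Hypothetical CountingSemaphore built from Mutex and ConditionVariable.
class CountingSemaphore
  def initialize(permits)
    @permits = permits
    @lock = Mutex.new
    @available = ConditionVariable.new
  end

  # Block until a permit is free, then take it.
  def acquire
    @lock.synchronize do
      # Loop because a woken thread must re-check the condition.
      @available.wait(@lock) while @permits.zero?
      @permits -= 1
    end
  end

  # Return a permit and wake one waiting thread.
  def release
    @lock.synchronize do
      @permits += 1
      @available.signal
    end
  end
end

barbers = CountingSemaphore.new(3)
state_lock = Mutex.new
busy = 0
max_busy = 0

customers = Array.new(10) do
  Thread.new do
    barbers.acquire
    begin
      state_lock.synchronize { busy += 1; max_busy = [max_busy, busy].max }
      sleep 0.01   # simulated haircut
      state_lock.synchronize { busy -= 1 }
    ensure
      barbers.release
    end
  end
end
customers.each(&:join)
```

The single Mutex is only there to protect the permit count; the numeric value lives in @permits, which is how you get "3 barbers" out of primitives that are themselves binary.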
I also found this code:
https://gist.github.com/pettyjamesm/3746457
Someone might like this other option.
Since concurrent-ruby is stable (beyond 1.0) and widely used, the best solution (and one that is portable across Ruby implementations) is to use its Concurrent::Semaphore class.
Thanks to @x3ro for his link, which pointed me in the right direction. However, with the implementation that Fukumoto gave, Thread.critical isn't available (at least in Ruby 1.9.2). Furthermore, my attempts to replace the Thread.critical calls with Thread.exclusive{} simply resulted in deadlocks. It turns out that there is a proposed Semaphore patch for Ruby (linked below) that solves the problem by replacing Thread.exclusive{} with Mutex#synchronize{}, among a few other tweaks.
http://redmine.ruby-lang.org/attachments/1109/final-semaphore.patch
Since the other links here aren't working for me, I decided to quickly hack something together. I have not tested this, so input and corrections are welcome. It's based simply on the idea that a Mutex is a binary Semaphore, thus a Semaphore is a set of Mutexes.
https://gist.github.com/3439373
I think it might be useful to mention the Thread::Queue in this context for others arriving at this question.
The Queue is a thread-safe tool (implemented with some behind-the-scenes synchronization primitives) that can be used like a traditional multi-processing semaphore with just a hint of imagination. And it comes preloaded by default, at least in ruby v3:
#!/usr/bin/ruby
# hold_your_horses.rb
q = Queue.new
wait_thread = Thread.new {
  puts "Wait for it ..."
  q.pop
  puts "... BOOM!"
}
sleep 1
puts "... click, click ..."
q.push nil
wait_thread.join
And can be demonstrated simply enough:
user@host:~/scripts$ ruby hold_your_horses.rb
Wait for it ...
... click, click ...
... BOOM!
The docs for ruby v3.1 say a Queue can be initialized with an enumerable object to set up initial contents but that wasn't available in my v3.0. But if you want a semaphore with, say, 7 permits, it's easy to stuff the box with something like:
q = Queue.new
7.times{ q.push nil }
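As a sketch of how the stuffed queue then behaves like a counting semaphore: pop takes a permit (blocking when none are left) and push gives it back. The worker count and the peak bookkeeping below are just for the demo.

```ruby
permits = Queue.new
7.times { permits.push nil }     # seven permits

lock = Mutex.new
inside = 0
peak = 0

workers = Array.new(20) do
  Thread.new do
    permits.pop                  # acquire: blocks while the queue is empty
    begin
      lock.synchronize { inside += 1; peak = [peak, inside].max }
      sleep 0.01                 # simulated work
      lock.synchronize { inside -= 1 }
    ensure
      permits.push nil           # release
    end
  end
end
workers.each(&:join)
```

At most seven workers are ever inside the guarded section at once, and all seven permits are back in the queue at the end.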
I used the Queue to implement baton-passing between some worker-threads:
class WaitForBaton
  def initialize
    @q = Queue.new
  end
  def pass_baton
    @q.push nil
    sleep 0.0
  end
  def wait_for_baton
    @q.pop
  end
end
So that thread task_master could perform steps one and three with thread little_helper stepping in at the appropriate time to handle step two:
baton = WaitForBaton.new
task_master = Thread.new {
  step_one(ARGV[0])
  baton.pass_baton
  baton.wait_for_baton
  step_three(logfile)
}
little_helper = Thread.new {
  baton.wait_for_baton
  step_two(ARGV[1])
  baton.pass_baton
}
task_master.join
little_helper.join
Note that the sleep 0.0 in the .pass_baton method of my WaitForBaton class is necessary to prevent task_master from passing the baton to itself: unless thread scheduling happens to jump away from task_master right after baton.pass_baton, the very next thing that happens is task_master's baton.wait_for_baton - which takes the baton right back again. sleep 0.0 explicitly cedes execution to any other threads that might be waiting to run (and, in this case, blocking on the underlying Queue).
Ceding execution is not the default behavior because this is a somewhat unusual usage of semaphore technology - imagine that task_master could be generating many tasks for little_helpers to do and task_master can efficiently get right back to generating tasks right after passing a task off through a Thread::Queue's .push([object]) method.

How can I get synchronous/blocking I/O in ruby across multiple threads?

I simply want to use Ruby, yet I feel that I cannot if my goal includes using multiple threads that do any form of blocking IO. Even for what would be a small script, when I see the need for multiple threads I start to turn to Java. Is there a good way I can use Ruby to create multiple threads and have each block when needed? As many of you know, green threads do not support blocking IO, because a blocking call in one thread causes all threads to block.
Use 1.9, which introduces native threads (and a GIL), or use JRuby, which has fully concurrent native threads. That's what I would do, anyway :)
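A quick way to see this for yourself on 1.9 or later, as a sketch: park one thread in a blocking read on a pipe and check that the main thread keeps going. The variable names are arbitrary.

```ruby
reader, writer = IO.pipe

# This thread blocks in a read until a line is written to the pipe.
blocked = Thread.new { reader.gets }

# The main thread keeps running while the other thread is blocked.
progress = (1..5).map { |i| i * i }

writer.puts "hello"
writer.close
line = blocked.value   # joins the thread and returns the block's result
reader.close
```

The blocked reader does not stall the rest of the program; it simply wakes up once there is something to read.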

How do I run two threads in Ruby at the same time?

Is there some way to run 2 threads at the same time?
I want to have my app run its current function and then bring up another thread running another function, that can change variables in the first thread.
If you want to run two threads at the same time, the entire execution stack has to be capable of doing that. Let's start at the top:
Ruby itself is capable of running two threads at the same time, no problem there. However, Ruby is just a programming language, i.e. just a bunch of rules. In order to run your program, you need a Ruby implementation. Unfortunately, many popular Ruby implementations are not capable of running multiple threads at the same time, including MRI, YARV and Rubinius. In fact, the only production-ready Ruby implementation which can run threads simultaneously is JRuby. (IronRuby too, but that is technically not yet production-ready although the final 1.0 release is probably only days away.)
But JRuby (and IronRuby) don't actually implement threads themselves, they just use the underlying platform's threads. I.e. JRuby maps Ruby threads to JVM threads and IronRuby maps them to CLI threads. So, the underlying platform has to be able to run threads in parallel, too.
Again: both the JVM and the CLI are in principle capable of running threads in parallel, but the JVM and the CLI are just specifications, they are just pieces of paper. In order to run your code, you need an implementation of those specifications, and not all of them do support truly concurrent threads.
Even if your platform implementation supports truly concurrent threads, they might themselves delegate their threading implementation to the underlying OS, just like JRuby delegates to the JVM. .NET, Mono, HotSpot and JRockit for example (which are the most popular implementations of the CLI and the JVM respectively) use native OS threads for their platform threads. So, obviously, the OS has to be able to run threads in parallel. And again: not all of them are.
And, of course, all the parallelism in the OS doesn't help if you only have one CPU. If you want two threads to run at the same time, you need either two CPUs, two cores or two simultaneous hardware threads.
http://ruby-doc.org/core/classes/Thread.html
x = 2
Thread.new do
  x = 3
end
x = 4
For true concurrency, two or more cores or processors are required, but even then it may not work if the implementation is effectively single-threaded (such as MRI).
First, I'm gonna answer your question:
thread_1 = Thread.new do
  #do something here
end
thread_2 = Thread.new do
  #do something here
end
thread_1.join
thread_2.join(timeout_in_seconds)
Thread#join makes the calling thread (here, the main thread) wait until the joined thread finishes. If you specify a timeout in seconds, join gives up waiting once the timeout is reached and returns nil; the joined thread itself keeps running.
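Here is a small sketch for checking what join does with a timeout. The sleep durations are arbitrary; the only assumption is that the worker takes longer than the timeout.

```ruby
worker = Thread.new { sleep 0.5; :done }

first_join = worker.join(0.05)   # gives up waiting after 0.05 s
still_running = worker.alive?    # checked right after the timed-out join

second_join = worker.join        # no timeout: waits until the thread finishes
result = worker.value            # return value of the thread's block
```

The timed join comes back with nil while the thread is still alive; the plain join returns the thread object once it has finished. If you actually want to stop a thread after a deadline, you have to do that yourself, for example with Thread#kill.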
Now, the truth: there's no real concurrency in Ruby 1.8 with Matz's Ruby Interpreter (MRI), and there's no real concurrency with only one processor either. According to this page:
However, as part of this runtime, the interpreter also instantiates an instance of a Global Interpreter Lock (or more affectionately known as GIL), which is the culprit of our lack of concurrency
Read the article itself for more information.
MRI 1.8 tries to cheat you using what are called green threads, which means that the Ruby interpreter, not the OS, takes care of everything to do with threads. The other kind of threads, the ones that can really run concurrently, are called native threads. Ruby 1.9 supports them through YARV, but that doesn't mean that every Ruby thread runs in parallel, because YARV has a global VM lock (global interpreter lock, or GIL). So true concurrency is a myth in Ruby, and it will be for a long time.
http://ruby-doc.org/core/classes/Thread.html
Remember that threads are truly parallel only in JRuby (other interpreters implement a GIL). From here:
# mutexsyncex.rb
require 'thread' # For Mutex class in Ruby 1.8
# A BankAccount has a name, a checking amount, and a savings amount
class BankAccount
  def initialize(name, checking, savings)
    @name, @checking, @savings = name, checking, savings
    @lock = Mutex.new # For thread safety
  end
  # Lock account and transfer money from savings to checking
  def transfer_from_savings(x)
    @lock.synchronize {
      @savings -= x
      @checking += x
    }
  end
  # Lock account and report current balances
  def report
    @lock.synchronize {
      "#@name\nChecking: #@checking\nSavings: #@savings"
    }
  end
end
ba = BankAccount.new('me', 1, 400)
ba.transfer_from_savings(10)
puts ba.report

Resources