How do I run two threads in Ruby at the same time?

Is there some way to run 2 threads at the same time?
I want my app to keep running its current function while it spins up another thread running a different function that can change variables in the first thread.

If you want to run two threads at the same time, the entire execution stack has to be capable of doing that. Let's start at the top:
Ruby itself is capable of running two threads at the same time, no problem there. However, Ruby is just a programming language, i.e. just a bunch of rules. In order to run your program, you need a Ruby implementation. Unfortunately, many popular Ruby implementations are not capable of running multiple threads at the same time, including MRI, YARV and Rubinius. In fact, the only production-ready Ruby implementation which can run threads simultaneously is JRuby. (IronRuby too, but that is technically not yet production-ready although the final 1.0 release is probably only days away.)
But JRuby (and IronRuby) don't actually implement threads themselves, they just use the underlying platform's threads. I.e. JRuby maps Ruby threads to JVM threads and IronRuby maps them to CLI threads. So, the underlying platform has to be able to run threads in parallel, too.
Again: both the JVM and the CLI are in principle capable of running threads in parallel, but the JVM and the CLI are just specifications, just pieces of paper. In order to run your code, you need an implementation of those specifications, and not all of them support truly concurrent threads.
Even if your platform implementation supports truly concurrent threads, it might itself delegate its threading implementation to the underlying OS, just as JRuby delegates to the JVM. .NET, Mono, HotSpot and JRockit for example (which are the most popular implementations of the CLI and the JVM respectively) use native OS threads for their platform threads. So, obviously, the OS has to be able to run threads in parallel. And again: not all of them are.
And, of course, all the parallelism in the OS doesn't help if you only have one CPU. If you want two threads to run at the same time, you need either two CPUs, two cores or two simultaneous hardware threads.

http://ruby-doc.org/core/classes/Thread.html
x = 2
Thread.new do
  x = 3
end
x = 4
For true parallelism you need at least two cores or two processors - but even then it will not happen if the implementation is single-threaded (such as MRI).
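Note that the snippet above never waits for the spawned thread, so whether x = 3 runs before or after x = 4 is up to the scheduler. A minimal sketch of pinning the order down with Thread#join (the variable is the same x from above):
x = 2
t = Thread.new do
  x = 3   # runs in the new thread; it mutates the same local variable captured by the block
end
t.join    # wait for the spawned thread before moving on
x = 4
puts x    #=> 4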

First, I'm gonna answer your question:
thread_1 = Thread.new do
  # do something here
end

thread_2 = Thread.new do
  # do something here
end

thread_1.join
thread_2.join(timeout_in_seconds)
Thread#join makes the calling thread wait until the joined thread finishes. If you pass a timeout in seconds, join returns nil once that timeout is reached; the joined thread keeps running, it is not killed.
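A quick sketch of that timeout behaviour (the sleep durations are just for illustration):
slow = Thread.new { sleep 5; :done }

p slow.join(1)   #=> nil   (timed out; the thread is still running)
p slow.alive?    #=> true

slow.join        # no timeout: blocks until the thread finishes
p slow.value     #=> :done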
Now, the truth: there's no real parallelism in Ruby 1.8 with the Matz Ruby Interpreter (MRI), and there's none with only one processor either. According to this page:
However, as part of this runtime, the interpreter also instantiates an instance of a Global Interpreter Lock (or more affectionately known as GIL), which is the culprit of our lack of concurrency
Read the article itself for more information.
MRI tries to fake it using what are called green threads, which means the Ruby interpreter, not the OS, takes care of everything to do with threads. The other kind, the ones that can really run concurrently, are called native threads. Ruby 1.9 supports them through YARV, but that doesn't mean every Ruby thread runs in parallel, because YARV has a global VM lock (global interpreter lock, or GIL), so true parallel execution is a myth in Ruby and it will be for a long time.

http://ruby-doc.org/core/classes/Thread.html
Remember that only in JRuby are threads truly parallel (the other interpreters implement a GIL). From here:
# mutexsyncex.rb
require 'thread' # For Mutex class in Ruby 1.8

# A BankAccount has a name, a checking amount, and a savings amount
class BankAccount
  def initialize(name, checking, savings)
    @name, @checking, @savings = name, checking, savings
    @lock = Mutex.new # For thread safety
  end

  # Lock account and transfer money from savings to checking
  def transfer_from_savings(x)
    @lock.synchronize {
      @savings -= x
      @checking += x
    }
  end

  # Lock account and report current balances
  def report
    @lock.synchronize {
      "#{@name}\nChecking: #{@checking}\nSavings: #{@savings}"
    }
  end
end

ba = BankAccount.new('me', 1, 400)
ba.transfer_from_savings(10)
puts ba.report

Related

Why is a += 1 a thread-safe operation in Ruby?

Code Snippet:
a = 0
Array.new(50) {
  Thread.new {
    500_000.times { a += 1 }
  }
}.each(&:join)
p "a: #{a}"
Result: a = 25_000_000.
In my understanding, (MRI) Ruby uses a GIL, so only one Ruby thread can hold the CPU at a time, but when a thread switch happens, some of the thread's data is stored so the thread can be restored later. So, in theory, a += 1 may not be thread-safe.
But the result above suggests I'm wrong. Does Ruby make a += 1 atomic? If so, which operations can be considered thread-safe?
It's Neither Atomic Nor Thread-Safe
In your example, the apparent consistency is largely due to the global interpreter lock, but is also partly due to the way your Ruby engine and your code sequences (theoretically) asynchronous threads. You are getting consistent results because each loop in each thread is simply incrementing the current value of a, which is not a block-local or thread-local variable. With threads on the YARV virtual machine, only one thread at a time is inspecting or setting the current value of a, but I wouldn't really say that it's an atomic operation. It's just a byproduct of the engine’s lack of real-time concurrency between threads, and the underlying implementation of the Ruby virtual machine.
If you're concerned about preserving thread-safety in Ruby without relying on idiosyncratic behaviors that just happen to appear consistent, consider using a thread-safe library like concurrent-ruby. Otherwise, you may be relying on behaviors that aren't guaranteed across Ruby engines or Ruby versions.
Three consecutive runs of your code in JRuby (which does have truly concurrent threads) will generally yield a different result each time. For example:
#=> "a: 3353241"
#=> "a: 3088145"
#=> "a: 2642263"
Ruby doesn't have a well-defined Memory Model, so in some philosophical sense the question is nonsensical: without a Memory Model, the term "thread-safe" isn't even defined. For example, the ISO Ruby Language Specification doesn't even document the Thread class.
The way that people write concurrent code in Ruby without a well-defined Memory Model is essentially "guess-and-test". You guess what the implementations will do, then you test as many versions of as many implementations on as many platforms and as many operating systems on as many CPU architectures and as many different system sizes as possible.
As you can see in Todd's answer, even just testing one other implementation already reveals that your conclusion was wrong. (Pro tip: never make a generalization based on a sample size of 1!)
The alternative is to use a library that has already done the above, such as the concurrent-ruby library mentioned in Todd's answer. They do all the testing I mentioned above. They also work closely with the maintainers of the various implementations. E.g. Chris Seaton, the lead developer of TruffleRuby is also one of the maintainers of concurrent-ruby, and Charlie Nutter, the lead developer of JRuby, is one of the contributors.
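For instance, here is a sketch of the counter from the question rewritten with concurrent-ruby's Concurrent::AtomicFixnum (this assumes the concurrent-ruby gem is installed); the result no longer depends on any interpreter's scheduling quirks:
require 'concurrent'

counter = Concurrent::AtomicFixnum.new(0)

Array.new(50) {
  Thread.new { 500_000.times { counter.increment } }   # increment is atomic on every engine
}.each(&:join)

p "counter: #{counter.value}"   #=> "counter: 25000000"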
But the result above suggests I'm wrong.
The results are misleading. In Ruby, a += 1 is a shorthand for:
a = a + 1
where a + 1 is a method call that occurs before the assignment. Since integers are objects in Ruby, we can override that method:
module ThreadTest
  def +(other)
    super
  end
end

Integer.prepend(ThreadTest)
The above code doesn't do anything useful; it just calls super. But merely adding a Ruby implementation on top of the built-in C implementation is enough to break (or fix) your test:
Integer.prepend(ThreadTest)

a = 0
Array.new(50) {
  Thread.new {
    500_000.times { a += 1 }
  }
}.each(&:join)
p "a: #{a}"
#=> "a: 11916339"

Ruby GIL of MRI 1.9

In my understanding, here's how MRI 1.9 GIL works:
The interpreter spawns a new thread by calling the corresponding underlying C function and asks to acquire the GIL.
If the GIL is free, we are happy. If not, the new thread waits and a separate timer thread is started to set up a timeslice.
When the currently executing thread hits certain boundaries, such as a return or a backward-branch check, the interpreter checks the timer to decide whether a context switch should happen.
However, as pointed out by this article, atomicity is only guaranteed for methods implemented purely in C. That being said, if some parts of our thread contain Ruby code, we are still in danger of race conditions.
My question is: if a thread needs to acquire the GIL before executing, why do only C-implemented methods guarantee atomicity?
Thank you in advance!
The GVL guarantees that only one thread can execute Ruby code at the same time. But of course different Ruby threads can execute Ruby code at different times.
Besides, the majority of Ruby implementations don't have a GVL anyway.
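In other words, the GVL can hand control to another thread between the read and the write of a Ruby-level operation. A minimal sketch, with illustrative names, of making such a sequence atomic yourself with a Mutex:
counter = 0
lock = Mutex.new

threads = Array.new(4) do
  Thread.new do
    100_000.times do
      # counter += 1 alone is a read, an add and a write; the GVL does not
      # stop another thread from being scheduled between those steps.
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
p counter   #=> 400000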

how to know what is NOT thread-safe in ruby?

Starting from Rails 4, everything will have to run in a threaded environment by default. What this means is that all of the code we write AND ALL the gems we use are required to be thread-safe
So, I have a few questions on this:
What is NOT thread-safe in Ruby/Rails vs. what IS thread-safe in Ruby/Rails?
Is there a list of gems that are known to be thread-safe, or vice versa?
Is there a list of common code patterns that are NOT thread-safe, e.g. @result ||= some_method?
Are the data structures in the Ruby language core, such as Hash, thread-safe?
On MRI, where a GVL/GIL means only one Ruby thread can run at a time (except for IO), does the thread-safety change affect us?
None of the core data structures are thread safe. The only one I know of that ships with Ruby is the queue implementation in the standard library (require 'thread'; q = Queue.new).
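A small sketch of using that Queue to pass work between threads safely (the payloads and the :done sentinel are just illustrative):
require 'thread' # Queue (the explicit require matters on 1.8/1.9)

queue = Queue.new

producer = Thread.new do
  5.times { |i| queue << i }   # push is synchronized internally
  queue << :done
end

consumer = Thread.new do
  until (item = queue.pop) == :done   # pop blocks until something is available
    puts "got #{item}"
  end
end

[producer, consumer].each(&:join)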
MRI's GIL does not save us from thread safety issues. It only makes sure that two threads cannot run Ruby code at the same time, i.e. on two different CPUs at the exact same time. Threads can still be paused and resumed at any point in your code. If you write code like @n = 0; 3.times { Thread.start { 100.times { @n += 1 } } }, i.e. mutating a shared variable from multiple threads, the value of the shared variable afterwards is not deterministic. The GIL is more or less a simulation of a single-core system; it does not change the fundamental issues of writing correct concurrent programs.
Even if MRI had been single-threaded like Node.js you would still have to think about concurrency. The example with the incremented variable would work fine, but you can still get race conditions where things happen in non-deterministic order and one callback clobbers the result of another. Single threaded asynchronous systems are easier to reason about, but they are not free from concurrency issues. Just think of an application with multiple users: if two users hit edit on a Stack Overflow post at more or less the same time, spend some time editing the post and then hit save, whose changes will be seen by a third user later when they read that same post?
In Ruby, as in most other concurrent runtimes, anything that is more than one operation is not thread safe. @n += 1 is not thread safe, because it is multiple operations. @n = 1 is thread safe because it is one operation (it's lots of operations under the hood, and I would probably get into trouble if I tried to describe why it's "thread safe" in detail, but in the end you will not get inconsistent results from assignments). @n ||= 1 is not, and no other shorthand operation + assignment is either. One mistake I've made many times is writing return unless @started; @started = true, which is not thread safe at all.
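A sketch of the usual fix for that last mistake: guard the check and the set with one Mutex (the OneShot class and its names are hypothetical):
require 'thread' # Mutex on Ruby 1.8

class OneShot
  def initialize
    @lock = Mutex.new
    @started = false
  end

  def start
    @lock.synchronize do
      return if @started   # the check ...
      @started = true      # ... and the set happen under the same lock
    end
    # do the actual one-time work here
  end
end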
I don't know of any authoritative list of thread safe and non-thread safe statements for Ruby, but there is a simple rule of thumb: if an expression only does one (side-effect free) operation it is probably thread safe. For example: a + b is ok, a = b is also ok, and a.foo(b) is ok, if the method foo is side-effect free (since just about anything in Ruby is a method call, even assignment in many cases, this goes for the other examples too). Side-effects in this context mean things that change state. def foo(x); @x = x; end is not side-effect free.
One of the hardest things about writing thread safe code in Ruby is that all core data structures, including array, hash and string, are mutable. It's very easy to accidentally leak a piece of your state, and when that piece is mutable things can get really screwed up. Consider the following code:
class Thing
  attr_reader :stuff

  def initialize(initial_stuff)
    @stuff = initial_stuff
    @state_lock = Mutex.new
  end

  def add(item)
    @state_lock.synchronize do
      @stuff << item
    end
  end
end
An instance of this class can be shared between threads and they can safely add things to it, but there's a concurrency bug (it's not the only one): the internal state of the object leaks through the stuff accessor. Besides being problematic from the encapsulation perspective, it also opens up a can of concurrency worms. Maybe someone takes that array and passes it on to somewhere else, and that code in turn thinks it now owns that array and can do whatever it wants with it.
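One way to plug that particular leak, sketched under the assumption that callers only need a snapshot, is to hand out a copy instead of the live internal array:
require 'thread'

class Thing
  def initialize(initial_stuff)
    @stuff = initial_stuff.dup   # take our own copy of the caller's array
    @state_lock = Mutex.new
  end

  def add(item)
    @state_lock.synchronize { @stuff << item }
  end

  # Hand out a snapshot taken under the lock instead of the live internal array.
  def stuff
    @state_lock.synchronize { @stuff.dup }
  end
end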
Another classic Ruby example is this:
STANDARD_OPTIONS = {:color => 'red', :count => 10}

def find_stuff
  @some_service.load_things('stuff', STANDARD_OPTIONS)
end
find_stuff works fine the first time it's used, but returns something else the second time. Why? The load_things method happens to think it owns the options hash passed to it, and does color = options.delete(:color). Now the STANDARD_OPTIONS constant doesn't have the same value anymore. Constants are only constant in what they reference; they do not guarantee the constancy of the data structures they refer to. Just think what would happen if this code was run concurrently.
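A defensive sketch: freeze the constant so an accidental delete raises instead of silently corrupting shared state, and give callees their own copy:
STANDARD_OPTIONS = {:color => 'red', :count => 10}.freeze

def find_stuff
  # The frozen original would raise if load_things tried to mutate it;
  # passing a dup gives the callee a hash it genuinely owns.
  @some_service.load_things('stuff', STANDARD_OPTIONS.dup)
end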
If you avoid shared mutable state (e.g. instance variables in objects accessed by multiple threads, data structures like hashes and arrays accessed by multiple threads) thread safety isn't so hard. Try to minimize the parts of your application that are accessed concurrently, and focus your efforts there. IIRC, in a Rails application, a new controller object is created for every request, so it is only going to get used by a single thread, and the same goes for any model objects you create from that controller. However, Rails also encourages the use of global variables (User.find(...) uses the global variable User, you may think of it as only a class, and it is a class, but it is also a namespace for global variables), some of these are safe because they are read only, but sometimes you save things in these global variables because it is convenient. Be very careful when you use anything that is globally accessible.
It's been possible to run Rails in threaded environments for quite a while now, so without being a Rails expert I would still go so far as to say that you don't have to worry about thread safety when it comes to Rails itself. You can still create Rails applications that aren't thread safe by doing some of the things I mention above. When it comes to other gems, assume that they are not thread safe unless they say that they are, and if they say that they are, assume that they are not, and look through their code (but just because you see that they do things like @n ||= 1 does not mean that they are not thread safe, that's a perfectly legitimate thing to do in the right context -- you should instead look for things like mutable state in global variables, how they handle mutable objects passed to their methods, and especially how they handle options hashes).
Finally, being thread unsafe is a transitive property. Anything that uses something that is not thread safe is itself not thread safe.
In addition to Theo's answer, I'd add a couple of problem areas to look out for in Rails specifically if you're switching to config.threadsafe! (see the class-variable sketch after the list):
Class variables:
@@i_exist_across_threads
ENV:
ENV['DONT_CHANGE_ME']
Threads:
Thread.start
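As a toy illustration of the class-variable case (the PageHits class is hypothetical): every thread in the process reads and writes the very same @@count, so concurrent requests can interleave and lose updates:
class PageHits
  @@count = 0   # one shared slot for the entire process, visible to every thread

  def self.record!
    @@count += 1   # unguarded read-modify-write: concurrent requests can lose updates
  end

  def self.total
    @@count
  end
end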
Starting from Rails 4, everything will have to run in a threaded environment by default
This is not 100% correct. Thread-safe Rails is just on by default. If you deploy on a multi-process app server like Passenger (community) or Unicorn there will be no difference at all. This change only concerns you if you deploy in a multi-threaded environment like Puma or Passenger Enterprise > 4.0.
In the past, if you wanted to deploy on a multi-threaded app server, you had to turn on config.threadsafe!. It is the default now because everything it did either had no effect or applied equally to a Rails app running in a single process (Prooflink).
But if you do want all the Rails 4 streaming benefits and other real-time goodies of a multi-threaded deployment, then maybe you will find this article interesting. As @Theo said, for a Rails app you actually just have to avoid mutating static state during a request. While this is a simple practice to follow, unfortunately you cannot be sure about it for every gem you find. As far as I remember, Charles Oliver Nutter from the JRuby project had some tips about it in this podcast.
And if you want to write pure concurrent Ruby programs, where you need data structures that are accessed by more than one thread, you may find the thread_safe gem useful.
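A minimal sketch, assuming the thread_safe gem is installed; ThreadSafe::Hash is meant as a synchronized drop-in for Hash (compound check-then-act sequences still need their own locking):
require 'thread_safe'

results = ThreadSafe::Hash.new   # individual reads and writes are synchronized

Array.new(4) { |i|
  Thread.new { results[i] = i * i }   # concurrent writers, each on its own key
}.each(&:join)

p results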

how come ruby's single os thread doesn't block while copying a file?

My assumptions:
MRI ruby 1.8.X doesn't have native threads but green threads.
The OS is not aware of these green threads.
Issuing an IO-heavy operation should suspend the whole process until the corresponding IO interrupt comes back.
With these I've created a simple ruby program that does the following:
starts a thread that prints "working!" every second.
issues an IO request to copy a large (1gb) file on the "main" thread.
Now one would guess that, since green threads are invisible to the OS, the OS would put the whole process on the "blocked" queue and the "working!" green thread would not execute. Surprisingly, it works :S
Does anyone know what's going on there? Thanks.
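Here is a sketch of the experiment as described; the file names are placeholders, and the source file should be large (around 1 GB) for the effect to be visible:
require 'fileutils'

# Green thread that, per the assumptions above, ought to be starved during the copy.
ticker = Thread.new do
  loop do
    puts 'working!'
    sleep 1
  end
end

FileUtils.cp('large_file', 'large_file_copy')   # IO-heavy work on the main thread

ticker.kill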
There is no atomic kernel file copy operation. It's a lot of fairly short reads and writes that are entering and exiting the kernel.
As a result, the process is constantly getting control back. Signals are delivered.
Green threads work by hooking the Ruby-level thread dispatcher into low-level I/O and signal reception. As long as these hooks catch control periodically the green threads will act quite a bit like more concurrent threads would.
Unix originally had a quite thread-unaware but beautifully simple abstract machine model for the user process environment.
As the years went by, support for concurrency in general and threads in particular was added bit by bit, in two different ways.
Lots of little kludges were added to check if I/O would block, to fail (with later retry) if I/O would block, to interrupt slow tty I/O for signals but then transparently return to it, etc. When the Unix APIs were merged, each kludge existed in more than one form. Lots of choices. [1]
Direct support for threads in the form of multiple kernel-visible processes sharing an address space was also added. These threads are dangerous and untestable but widely supported and used. Mostly, programs don't crash. As time goes on, latent bugs become visible as the hardware supports more true concurrency. I'm not the least bit worried that Ruby doesn't fully support that nightmare.
[1] The good thing about standards is that there are so many of them.
When MRI 1.9 starts up, it spawns two native threads. One thread is for the VM, the other is used to handle signals. Rubinius uses this strategy, as does the JVM. Pipes can be used to communicate any info from other processes.
As for the FileUtils module, the cd, pwd, mkdir, rm, ln, cp, mv, chmod, chown, and touch methods are all, to some degree, outsourced to OS native utilities using the internal API of the StreamUtils submodule, while the second thread is left to wait for a signal from an outside process. Since these methods are quite thread-safe, there is no need to lock the interpreter, and thus the methods don't block each other.
Edit:
MRI 1.8.7 is quite smart, and knows that when a Thread is waiting for some external event (such as a browser to send an HTTP request), the Thread can be put to sleep and be woken up when data is detected. - Evan Phoenix from Engine Yard in Ruby, Concurrency, and You
Looking at the source, the basic implementation of FileUtils has not changed much since 1.8.7. 1.8.7 also uses a sleepy timer thread to wait for an IO response. The main difference in 1.9 is the use of native threads rather than green threads. Also, the thread source code is much more refined.
By thread-safe I mean that since there is nothing shared between the processes, there is no reason to lock the global interpreter. There is a misconception that Ruby "blocks" when doing certain tasks. Whenever a thread has to block, i.e. wait without using any cpu, Ruby simply schedules another thread. However in certain situations, like a rack-server using 20% of the CPU waiting for a response, it can be appropriate to unlock the interpreter and allow concurrent threads to handle other requests during the wait. These threads are, in a sense, working in parallel. The GIL is unlocked with the rb_thread_blocking_region API. Here is a good post on this subject.

Why doesn't Ruby have a ThreadPool built-in?

I have a program that creates 10000 threads at once, and runs 8 at the same time.
But Ruby doesn't have a ThreadPool built in like Java does. Is there a good reason?
Probably because it's easy to roll your own using the standard library's Queue class.
q = Queue.new
# each worker pops jobs until the queue is empty (pop(true) raises, rescued to nil), doing its work inside the loop
3.times { Thread.new { while something = (q.pop(true) rescue nil); end } }
It's a good question though--I might suggest bringing it up with Ruby Core.
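Fleshed out a little, such a hand-rolled pool might look like this sketch (the worker count and job payloads are purely illustrative):
require 'thread' # Queue

jobs = Queue.new
100.times { |i| jobs << i }          # enqueue all the work up front

workers = Array.new(8) do            # only 8 threads ever exist
  Thread.new do
    while (job = (jobs.pop(true) rescue nil))   # non-blocking pop; nil when the queue is empty
      puts "processed job #{job}"
    end
  end
end

workers.each(&:join)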
My suspicion would be it's because a ThreadPool wouldn't be that useful in C-based implementations of Ruby. You can use only one processor at a time with Matz's Ruby Interpreter or Yet Another Ruby VM.
If you want multiple threads to be run on multiple processors, you need to use JRuby instead.
Most likely the reason is that Ruby doesn't have "real" threads. It has what are called green threads: the Ruby interpreter takes care of scheduling execution threads without using any underlying OS threads. This effectively makes Ruby single-threaded.
