I'm trying to get some speedup in my program, and I've been told that Ruby Fibers are faster than threads and can take advantage of multiple cores. I've looked around, but I just can't find how to actually run different fibers concurrently. With threads you can do this:
threads = []
threads << Thread.new { do_something }
threads << Thread.new { do_something }
threads.each { |thread| thread.join }
I can't see how to do something like this with fibers. All I can find are yield and resume, which seem like just a bunch of starting and stopping between fibers. Is there a way to do true concurrency with fibers?
No, you cannot do concurrency with Fibers. Fibers simply aren't a concurrency construct, they are a control-flow construct, like Exceptions. That's the whole point of Fibers: they never run in parallel, they are cooperative and they are deterministic. Fibers are coroutines. (In fact, I never understood why they aren't simply called Coroutines.)
The only concurrency construct in Ruby is Thread.
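To make the control-flow nature concrete, here is a minimal sketch of a fiber handing control back and forth with its caller; the interleaving is fully deterministic, and only one piece of code ever runs at a time:

fiber = Fiber.new do
  puts "fiber: step 1"
  Fiber.yield          # suspend and hand control back to the caller
  puts "fiber: step 2"
end

puts "main: before first resume"
fiber.resume           # runs the fiber until its first Fiber.yield
puts "main: between resumes"
fiber.resume           # runs the fiber to completion
puts "main: done"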
There seems to be a terminology issue between concurrency and parallelism.
I just can't find how to actually run different fibers concurrently.
I think you're actually talking about parallelism, not concurrency:
Concurrency is when two tasks can start, run, and complete in overlapping time periods. It doesn't necessarily mean they'll ever both be running at the same instant. Eg. multitasking on a single-core machine. Parallelism is when tasks literally run at the same time, eg. on a multicore processor
Quoting: Concurrency vs Parallelism - What is the difference?
Also well illustrated here:
http://concur.rspace.googlecode.com/hg/talk/concur.html#title-slide
So to answer the question:
Fibers are primitives for implementing light weight cooperative concurrency in Ruby.
http://www.ruby-doc.org/core-2.1.1/Fiber.html
Which doesn't mean they can run in parallel.
So if you want true parallelism, you'll want to use threads with JRuby (which doesn't actually have fibers; it only has threads, one per fiber).
Another option is to "fork" new processes, which can run truly in parallel even on MRI.
Related
As a Windows user, I know that OS threads consume ~1 MB of memory each, since by default Windows allocates 1 MB for each thread's user-mode stack. How does golang manage ~8 KB of memory for each goroutine, when an OS thread is so much more gluttonous? Are goroutines a sort of virtual thread?
Goroutines are not threads, they are (from the spec):
...an independent concurrent thread of control, or goroutine, within the same address space.
Effective Go defines them as:
They're called goroutines because the existing terms—threads, coroutines, processes, and so on—convey inaccurate connotations. A goroutine has a simple model: it is a function executing concurrently with other goroutines in the same address space. It is lightweight, costing little more than the allocation of stack space. And the stacks start small, so they are cheap, and grow by allocating (and freeing) heap storage as required.
Goroutines don't have their own threads. Instead multiple goroutines are (may be) multiplexed onto the same OS threads so if one should block (e.g. waiting for I/O or a blocking channel operation), others continue to run.
The actual number of threads executing goroutines simultaneously can be set with the runtime.GOMAXPROCS() function. Quoting from the runtime package documentation:
The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit.
Note that in the current implementation, by default only one thread is used to execute goroutines (later Go releases changed this default to the number of available CPUs).
1 MiB is the default, as you correctly noted. You can pick your own stack size easily (however, the minimum is still a lot higher than ~8 KiB).
That said, goroutines aren't threads. They're just tasks with coöperative multi-tasking, similar to Python's. The goroutine itself is just the code and data required to do what you want; there's also a separate scheduler (which runs on one or more OS threads), which actually executes that code.
In pseudo-code:
loop forever
take job from queue
execute job
end loop
Of course, the execute job part can be very simple, or very complicated. The simplest thing you can do is just execute a given delegate (if your language supports something like that). In effect, this is simply a method call. In more complicated scenarios, there can be also stuff like restoring some kind of context, handling continuations and coöperative task yields, for example.
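As a toy illustration of that loop (a sketch in Ruby, using fibers as the cooperative "jobs"; this is not how Go's scheduler is actually implemented):

require "fiber"   # for Fiber#alive? on older Rubies

# Each "job" is a fiber that yields whenever it wants to hand control back
queue = []
queue << Fiber.new { 3.times { |i| puts "job A, step #{i}"; Fiber.yield } }
queue << Fiber.new { 3.times { |i| puts "job B, step #{i}"; Fiber.yield } }

# loop forever / take job from queue / execute job
until queue.empty?
  job = queue.shift
  job.resume                   # run the job until it yields or finishes
  queue << job if job.alive?   # re-queue jobs that aren't done yet
end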
This is a very light-weight approach, and very useful when doing asynchronous programming (which is almost everything nowadays :)). Many languages now support something similar - Python is the first one I've seen with this ("tasklets"), long before go. Of course, in an environment without pre-emptive multi-threading, this was pretty much the default.
In C#, for example, there's Tasks. They're not entirely the same as goroutines, but in practice they come pretty close - the main difference being that Tasks use threads from the thread pool (usually), rather than separate dedicated "scheduler" threads. This means that if you start 1000 tasks, it is possible for them to be run by 1000 separate threads; in practice, that would require you to write very bad Task code (e.g. using only blocking I/O, sleeping threads, waiting on wait handles, etc.). If you use Tasks for asynchronous non-blocking I/O and CPU work, they come pretty close to goroutines - in actual practice. The theory is a bit different :)
EDIT:
To clear up some confusion, here is what a typical C# asynchronous method might look like:
async Task<string> GetData()
{
    // httpClient is assumed to be an HttpClient instance;
    // GetStringAsync returns the response body as a string
    var html = await httpClient.GetStringAsync("http://www.google.com");
    var parsedStructure = Parse(html);
    var dbData = await DataLayer.GetSomeStuffAsync(parsedStructure.ElementId);
    return dbData.First().Description;
}
From point of view of the GetData method, the entire processing is synchronous - it's just as if you didn't use the asynchronous methods at all. The crucial difference is that you're not using up threads while you're doing the "waiting"; but ignoring that, it's almost exactly the same as writing synchronous blocking code. This also applies to any issues with shared state, of course - there isn't much of a difference between multi-threading issues in await and in blocking multi-threaded I/O. It's easier to avoid with Tasks, but just because of the tools you have, not because of any "magic" that Tasks do.
The main difference from goroutines in this respect is that Go doesn't really have blocking methods in the usual sense of the word. Instead of blocking, they queue their particular asynchronous request and yield. When the OS (and any other layers in Go - I don't have deep knowledge about the inner workings) receives the response, it posts it to the goroutine scheduler, which in turn knows that the goroutine that "waits" for the response is now ready to resume execution; when it actually gets a slot, it continues on from the "blocking" call as if it had really been blocking - but in effect, it's very similar to what C#'s await does. There's no fundamental difference; there are quite a few differences between C#'s approach and Go's, but they're not that huge.
And also note that this is fundamentally the same approach used on old Windows systems without pre-emptive multi-tasking - any "blocking" method would simply yield the thread's execution back to the scheduler. Of course, on those systems, you only had a single CPU core, so you couldn't execute multiple threads at once, but the principle is still the same.
Goroutines are what we call green threads. They are not OS threads; the Go scheduler is responsible for them. This is why they can have much smaller memory footprints.
I am building a Ruby application. I have a set of images that I want to greyscale. My code used to be like this:
def Tools.grayscale_all_frames(frames_dir, output_dir)
  number_of_frames = get_frames_count(frames_dir)
  img_processor = ImageProcessor.new(frames_dir)
  create_dir(output_dir)
  for i in 1..number_of_frames
    img_processor.load_image(frames_dir + "/frame_%04d.png" % i)
    img_processor.greyscale_image
    img_processor.save_image_in_dir(output_dir, "frame_%04d" % i)
  end
end
After threading, the code looks like this:
def Tools.greyscale_all_frames_threaded(frames_dir, output_dir)
  number_of_frames = get_frames_count(frames_dir)
  img_processor = ImageProcessor.new(frames_dir)
  create_dir(output_dir)
  greyscale_frames_threads = []
  for frame_index in 1..3
    greyscale_frames_threads << Thread.new(frame_index) { |frame_number|
      puts "Loading Image #{frame_number}"
      img_processor.load_image(frames_dir + "/frame_%04d.png" % frame_number)
      img_processor.greyscale_image
      img_processor.save_image_in_dir(output_dir, "frame_%04d" % frame_number)
      puts "Greyscaled Image #{frame_number}"
    }
  end
  puts "Starting Threads"
  greyscale_frames_threads.each { |thread| thread.join }
end
What I expected was a thread being spawned for each image. I have 1000 images at a resolution of 1920×1080. The way I see things is this: I have an array of threads that I call .join on. So will join take all the threads and start them, one after the other? Does that mean it will wait until thread 1 is done and then start thread 2? What is the point of multithreading then?
What I want is this:
Run all the threads at the same time and not one after the other. So, mathematically, it will finish all 1000 frames in the same time it takes to finish 1 frame, right?
Also, can somebody explain to me what .join does?
From my understanding, .join will stop the main thread until the thread (or threads) is done?
If you don't use .join, then the thread will run in the background and the main thread will just continue.
So what is the point of using .join? I want my main thread to continue running and have the other threads doing stuff in the background.
Thanks for any help/clarification!!
This is only true if you have 1000 CPU cores and massive amounts (read: many gigabytes) of RAM.
The point of join is not to start the thread, but to wait until the thread has finished. So calling join on an array of threads is a common pattern for waiting for them all to finish.
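In other words, the common pattern looks like this (a minimal sketch; the sleep stands in for real work):

threads = 4.times.map do |i|
  Thread.new do
    sleep rand               # stand-in for the actual work
    puts "thread #{i} done"
  end
end

threads.each(&:join)         # blocks here until every thread has finished
puts "all threads finished"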
Explaining all of this and clarifying your misconception requires digging a little deeper. At the C/assembly level, most modern OSes (Windows, Mac, Linux, and some others) use a preemptive scheduler. If you have only one core, two programs running in parallel is a complete illusion. In reality, the kernel is switching between the two every few milliseconds, giving all of us slow-processing humans the illusion of parallel execution.
In newer, more modern CPUs, there is often more than one core. The most powerful CPUs today go up to (I think) 16 real cores + 16 hyperthreaded cores. This means you could actually run 32 tasks completely in parallel. But even this does not ensure that if you start 32 threads they will all finish at the same time.
Because of competition for resources that are shared between cores (some caches, all of the RAM, the hard drive, the network card, etc.), and because of the essentially random nature of preemptive scheduling, the amount of time your thread takes can be estimated within a certain range, but not exactly.
Unfortunately, all of this breaks down when you get to Ruby. Because of some hairy internal details of the threading model and compatibility, only one thread can execute Ruby code at a time. So if your image processing is done in C, happy joy joy. If it's written in Ruby, well, all the threads in the world aren't going to help you now.
To be able to actually run Ruby code in parallel, you have to use fork. fork is only available on Unix-like systems (Linux, Mac), not on Windows, but you can think of it as a fork in a road: one process goes in, two processes come out. Multiple processes can run on all your different cores at once.
So, take @Stefan's advice: use a queue and a number of worker threads equal to the number of CPU cores. And don't expect so much of your computer. Now you know why ;).
So join will take all the threads and start them, one after the other?
No, the threads are started when invoking Thread#new. It creates a new thread and executes the given block within that thread.
join will stop the main thread until your thread(s) is or are done?
Yes, it will suspend execution until the receiver (each of your threads) exits.
So what is the point of using join?
Sometimes you want to start some tasks in parallel but you have to wait for each task to finish before you can continue.
I want my main thread to continue running and have the other threads in the background doing stuff
Then don't call join.
After all it's not a good idea to start 1,000 threads in parallel. Your machine is only capable of running as many tasks in parallel as CPUs are available. So instead of starting 1,000 threads, place your jobs / tasks in a queue / pool and process them using some worker threads (number of CPUs = number of workers).
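A minimal sketch of that worker-pool pattern (assuming Ruby 2.3+ for Queue#close and Etc.nprocessors; process_frame is a hypothetical stand-in for the real per-frame work):

require "etc"

jobs = Queue.new
(1..1000).each { |frame_number| jobs << frame_number }
jobs.close                           # pop returns nil once the queue is drained

workers = Etc.nprocessors.times.map do
  Thread.new do
    while (frame_number = jobs.pop)
      process_frame(frame_number)    # hypothetical per-frame work
    end
  end
end

workers.each(&:join)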
I have been reading about Ruby 1.9 threads, and I see that all Ruby threads go through the Global Interpreter Lock (GIL for friends) and that concurrency is actually non-existent.
I have done a test (without any signals or waiting), and the performance using threads not only fails to improve, the operations actually take more time than running them serially.
My question is basically: what's the point of these threads if they are not concurrent? Is there any hope that they will be concurrent in the future?
A lot of other Ruby interpreters (JRuby, Rubinius) don't actually have GILs. There has also been talk of MRI 2.0 doing away with the GIL, though that hasn't happened so far.
Also, in a lot of cases (such as when waiting for IO) the interpreter does switch to another thread. So while it's not true parallel execution (in the case of MRI/REE as of 1.9), you do get some of the benefits of multithreading.
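You can see this yourself with a minimal sketch that uses sleep as a stand-in for blocking IO (sleep, like real IO, lets the interpreter schedule other threads):

require "benchmark"

serial = Benchmark.realtime do
  5.times { sleep 1 }
end

threaded = Benchmark.realtime do
  5.times.map { Thread.new { sleep 1 } }.each(&:join)
end

puts "serial:   #{serial.round(2)}s"   # ~5 seconds
puts "threaded: #{threaded.round(2)}s" # ~1 second, even with the GIL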
Parallelism is nonexistent, but Ruby threads do not prevent concurrent execution of Ruby code. Even on a single-core machine, concurrent code execution is possible. I think you just conflated the terms 'concurrent' and 'parallel'.
See Working with Ruby Threads by Jesse Storimer for more details.
I've been looking at optimizing a Ruby program that's quite calculation-intensive on a lot of data. I don't know C and have chosen Ruby (not that I know it well either), and I'm quite happy with the results, apart from the time it takes to execute. It is a lot of data, and without spending any money, I'd like to know what I can do to make sure I'm maximizing my own system's resources.
When I run a basic Ruby program, does it use a single processor? If I have not specifically assigned tasks to a processor, Ruby won't read my program and magically load each processor to complete the program as fast as possible, will it? I'm assuming no...
I've been reading a bit on speeding up Ruby, and in another thread read that Ruby does not support true multithreading (though it said JRuby does). But if I were to "break up" my program into two chunks that can be run in separate instances and run these in parallel, would these two chunks run on two separate processors automatically? If I had four processors and opened up four shells and ran four separate parts (1/4 each) of the program, would it complete in 1/4 the time?
Update
After reading the comments I decided to give JRuby a shot. Porting the app over wasn't that difficult. I haven't used "peach" yet, but just by running it in JRuby, the app runs in 1/4 the time!!! Insane. I didn't expect that much of a change. Going to give .peach a shot now and see how that improves things. Still can't believe that boost.
Update #2
Just gave peach a try. Ended up shaving another 15% off the time. So switching to JRuby and using Peach was definitely worth it.
Thanks everyone!
Use JRuby and the peach gem, and it couldn't be easier. Just replace an .each with .peach and voila, you're executing in parallel. And there are additional options to control exactly how many threads are spawned, etc. I have used this and it works great.
You get close to an n-times speedup, where n is the number of CPUs/cores available. I find that the optimal number of threads is slightly more than the number of CPUs/cores.
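For example (a sketch based on the description above; greyscale_frame is a hypothetical stand-in for the real per-frame work):

require "peach"

frames = Dir.glob("frames/*.png")

# .peach is a drop-in replacement for .each that runs the block
# across multiple threads (which pays off under JRuby's native threads)
frames.peach do |frame|
  greyscale_frame(frame)
end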
Like others have said, the MRI implementation of Ruby (the one most people use) does not support native threads. Hence you cannot split work between CPU cores by launching more threads under MRI.
However if your process is IO-bound (restricted by disk or network activity for example), then you may still benefit from multiple MRI-threads.
JRuby on the other hand does support native threads, meaning you can use threads to split work between CPU cores.
But all is not lost. With MRI (and all the other ruby implementations), you can still use processes to split work.
This can be done using Process.fork for example like this:
Process.fork {
10.times {
# Do some work in process 1
sleep 1
puts "Hello 1"
}
}
Process.fork {
10.times {
# Do some work in process 2
sleep 1
puts "Hello 2"
}
}
# Wait for all child processes to finish
# (a single Process.wait only reaps one child)
Process.waitall
Using fork will split the processing between CPU cores, so if you can live without threads then separate processes are one way to do it.
As nice as Ruby is, it's not known for its speed of execution. That being said, if, as noted in your comment, you can break up the input into equal-sized chunks, you should be able to start up n instances of the program, where n is the number of cores you have, and the OS will take care of using all the cores for you.
In the best case it would run in 1/n the time, but this kind of thing can be tricky to get exactly right: some portions of the system, like memory, need to be shared between the processes, and contention between processes can keep things from scaling linearly. If the split is easy to do, I'd give it a try. You can also just try running the same program twice and see how long it takes: if running two takes the same amount of time as running one, you're likely all set; just split your data and go for it.
Trying JRuby and some threads would probably help, but that adds a fair amount of complexity. (It would probably be a good excuse to learn about threading.)
Threading is usually considered one of Ruby's weak points, but it depends more on which implementation of Ruby you use.
A really good writeup on the different threading models is "Does ruby have real multithreading?".
From my experience, and from what I've gathered from people who know better about this stuff, it seems that if you are going to choose a Ruby implementation, JRuby is the way to go. Though, if you are learning Ruby, you might want to choose another language such as Erlang, or maybe Clojure, which are popular choices if you want to use the JVM.
I have a code snippet like this:
myhash.each_value { |subhash|
  subhash['key'].each { |subsubhash|
    # statement that modifies the subsubhash and takes about 0.07 s to execute
  }
}
This loop runs 100+ times and, needless to say, slows down my application tremendously (about 7 seconds to run this loop).
Any pointers on how to make this faster? I have no control over the really expensive statement. Is there a way I can multithread within the loop so the statements can be executed in parallel?
threads = []
myhash.each_value { |subhash|
  subhash['key'].each { |subsubhash|
    threads << Thread.new do
      # statement that modifies the subsubhash and takes about 0.07 s to execute
    end
  }
}
threads.each { |t| t.join }
Note that MRI 1.8.x doesn't use real threads, but rather green ones, which do not correspond to real OS threads. However, if you use JRuby you might see a performance boost, as it supports real threads.
You could run each subhash processing loop in a separate thread but whether or not this results in a performance boost may depend on either (1) the Ruby interpreter you are using or (2) whether the innermost block is IO-bound or compute-bound.
The reason for #1 is that some Ruby interpreters (such as CRuby/MRI 1.8) use green threads which typically do not benefit from any actual parallel processing, even on multicore machines. However, YARV and JRuby both use native OS threads (JRuby even for 1.8 since the JVM uses native threads), so if you can target those interpreters specifically then you might see an improvement.
The reason for #2 is that if the innermost block is IO-bound then even a green thread based interpreter might improve performance since most OSes do a good job of scheduling threads around blocking IO calls. If the block is strictly compute-bound then only a native-thread based interpreter will likely show a performance boost using multiple threads.
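To see the compute-bound side of #2 for yourself, here is a minimal sketch: on a GIL-based interpreter the threaded version takes about as long as the serial one, while on a native-thread interpreter such as JRuby it should scale with the number of cores.

require "benchmark"

# pure CPU work; nothing here releases the lock on MRI
def busy_work
  200_000.times { |i| Math.sqrt(i) }
end

serial = Benchmark.realtime { 4.times { busy_work } }

threaded = Benchmark.realtime do
  4.times.map { Thread.new { busy_work } }.each(&:join)
end

puts "serial:   #{serial.round(2)}s"
puts "threaded: #{threaded.round(2)}s"  # roughly the same on MRI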