I've read that Ruby code (CRuby/YARV) only "runs" on a single processor core, but something is not clear yet:
I understand that the GIL prevents Ruby threads from running in parallel and that in recent Ruby versions threads are scheduled by the operating system.
Couldn't one thread possibly be "placed" on core 1 and another on core 2, even if they're never actually running at the same time?
Just trying to understand if the OS scheduler actually puts all Ruby threads on a single core. Thanks!
Edit: Another answer mentions that C++ uses pthreads and those are scheduled across cores, and that Ruby uses the same. I guess that's what I was looking for, but since most answers seem to equate not running threads in parallel with never running on multiple cores, I just wanted to confirm.
First off, we have to clearly distinguish between "Ruby Threads" and "Ruby Threads as implemented by YARV". Ruby Threads make no guarantees about how they are scheduled. They might be scheduled concurrently, they might not. They might be scheduled on multiple CPUs, they might not. They might be implemented as native platform threads, they might be implemented as green threads, they might be implemented as something else.
YARV implements Ruby Threads as native platform threads (e.g. pthreads on POSIX and Windows threads on Windows). However, unlike other Ruby implementations which use native platform threads (e.g. JRuby, IronRuby, Rubinius), YARV has a Giant VM Lock (GVL) which prevents two threads from entering the YARV bytecode interpreter at the same time. This makes it effectively impossible to run Ruby code in multiple threads at the same time.
Note, however, that the GVL only protects the YARV interpreter and runtime. This means that, for example, multiple threads can execute C code at the same time, and at the same time as another thread executes Ruby code. It just means that no two threads can execute Ruby code at the same time on YARV.
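As a quick illustration (a minimal sketch on plain CRuby/YARV, nothing beyond the standard library): blocking calls such as Kernel#sleep release the GVL, so the waits below overlap and the whole thing finishes in roughly one second rather than four.

    require 'benchmark'

    elapsed = Benchmark.realtime do
      threads = 4.times.map do
        Thread.new { sleep 1 }   # sleep releases the GVL while it waits
      end
      threads.each(&:join)
    end

    puts format('4 threads sleeping 1s each took %.2fs of wall time', elapsed)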
Note also that in recent versions of YARV, the "Giant" VM Lock is becoming ever smaller. Sections of code are being moved out from under the lock, and the lock itself is being broken down into smaller, more fine-grained locks. That is a very long process, but it means that in the future more and more Ruby code will be able to run in parallel on YARV.
But all of this has nothing to do with how the platform schedules the threads. Many platforms have some sort of heuristics for thread affinity to CPU cores, e.g. they may try to schedule the same thread on the same core, under the assumption that its working set is still in that core's cache, or they may try to identify threads that operate on shared data and schedule those threads on the same CPU, and so on. Therefore, it is hard, if not impossible, to predict how and where a thread will be scheduled.
Many platforms also provide a way to influence this CPU affinity, e.g. on Linux and Windows you can restrict a thread to one specific core or to a set of specific cores. However, YARV does not do that by default. (In fact, on some platforms influencing CPU affinity requires elevated privileges, which would mean YARV would have to run with elevated privileges, and that is not a good idea.)
So, in short: yes, depending on the platform, the hardware, and the environment, YARV threads may and probably will be scheduled on different cores. But, they won't be able to take advantage of that fact, i.e. they won't be able to run faster than on a single core (at least when running Ruby code).
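To make that concrete, here is a rough benchmark sketch (burn is just an arbitrary CPU-bound loop made up for the example; exact numbers will vary by machine): on YARV the threaded run takes roughly as long as the serial one, while on a GVL-free implementation with two free cores it can approach half the time.

    require 'benchmark'

    # Arbitrary CPU-bound Ruby work; under the GVL only one thread
    # can be executing this at any given moment.
    def burn
      x = 0
      5_000_000.times { x += 1 }
      x
    end

    serial   = Benchmark.realtime { 2.times { burn } }
    threaded = Benchmark.realtime do
      [Thread.new { burn }, Thread.new { burn }].each(&:join)
    end

    puts format('serial: %.2fs  threaded: %.2fs', serial, threaded)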
Related
Sorry if this sounds stupid.
What will happen if I run runtime.GOMAXPROCS(4) while runtime.NumCPU() == 2?
runtime.GOMAXPROCS controls how many operating-system threads may be executing the goroutines of your program (and of the runtime powering it) at the same time. (The runtime will create several more threads for its own needs, but this is beside the point.)
Basically, that is all that will happen.
But presumably you actually intended to ask something like "how would that affect the performance of my program?", right?
If yes, the answer is "it depends".
I'm not sure whether you have had a chance to work with systems having only a single CPU with a single core (basically most consumer-grade IBM PC-compatible computers up to the generation of the Pentium® CPUs which had the so-called "hyper-threading" technology), but those systems were routinely running hundreds to thousands of OS threads on a "single core" (the term did not really exist in the mainstream back then, but OK).
Another thing to consider is that your program does not run in isolation: there are other programs running on the same CPU, and the kernel itself has several in-kernel threads as well.
You may use a tool like top or htop to assess the number of threads your system is currently scheduling across all your cores.
By this time, you might be wondering why the Go runtime defaults to creating as many of these threads as there are CPUs reported by runtime.NumCPU().
Presumably, this comes from the simple fact that in a typical server-side workload your program will be sort of "the main one".
In other words, the contention of its threads with the threads of other processes and the kernel will be reasonably low.
I use OpenMP for parallel sorting at the start of my program. Once the data is loaded and sorted, the program runs as a daemon and OpenMP is not used any more. Is there a way to turn off the idle threads created by OpenMP? omp_set_num_threads() doesn't affect the idle threads which have already been created for a task.
Please look up OMP_WAIT_POLICY, which is new in OpenMP 4 [https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fWAIT_005fPOLICY.html].
There are non-portable alternatives like GOMP_SPINCOUNT if your OpenMP implementation isn't recent enough. I recall from OpenMP specification discussions that at least Intel, IBM, Cray, and Oracle support their own implementation of this feature already.
I don't believe there is a way to trigger the threads' destruction. Modern OpenMP implementations tend to keep threads around in a pool to speed up starting future parallel sections.
In your case I would recommend a two-program solution (one parallel program to sort and one serial program for the daemon). How you communicate the data between them is up to you. You could do something simple like writing it to a file and then reading it again. This may not be as slow as it sounds, since a modern Linux distribution might keep that file in memory in the file cache.
If you really want to be sure it stays in memory, you could launch the two processes simultaneously and allow them to share memory and allow the first parallel sort process to exit when it is done.
In theory, OpenMP has an implicit synchronization at the end of each parallel region. So, when the OpenMP parallel work ends, all the threads are deleted. You don't need to kill them or free them: OpenMP does that automatically.
Maybe "omp_get_num_threads()" is telling you the configured number of threads, not the number of active threads. I mean: if you set the number of threads to 4, OpenMP will tell you that the configuration is "4 threads", but this does not mean that there are actually 4 threads alive in the process.
I'm using ruby-head on Debian wheezy x64. When I run a multithreaded Ruby script, htop's per-core bars at the top show it touching multiple cores, but in the process list it sits at 100% CPU, i.e. only one core's worth of capacity. I assume it's possible to have multiple cores running at 100%, and this number seems too convenient to be a bottleneck in the program logic or some other hardware aspect. Is the OS limiting how much CPU my process can use, and if so, how do I stop this?
EDIT more info:
By "visually using multiple cores" I mean, e.g., 47% on core 1, 29% on core 2, and 24% on core 3. These percentages constantly shift up and down and across different sets of cores, but they always add up to roughly 100%-102%. More than 3 (of 8 total) cores are being used, but any cores other than the three most burdened sit at 2% or less. I guess I should also mention this is a Linode VPS.
EDIT:
Well, it looks like I was reading promises that 2.0 would feature truly parallel threads, not actual release information. Time to switch to JRuby...
You failed to mention which Ruby implementation you are using. Not all Ruby implementations are capable of scheduling Ruby threads to multiple CPUs.
In particular:
MRI implements Ruby threads as green threads inside the interpreter and schedules them itself; it cannot schedule more than one thread at a time and it cannot schedule them to multiple CPUs
YARV implements Ruby threads as native OS threads (POSIX threads or Windows threads) and lets the OS schedule them, however it puts a Giant VM Lock (GVL) around them, so that only one Ruby thread can be running at any given time
Rubinius implements Ruby threads as native OS threads (POSIX threads or Windows threads) and lets the OS schedule them, however it puts a Global Interpreter Lock (GIL) around them, so that only one Ruby thread can be running at any given time; Rubinius 2.0 is going to have fine-grained locks so that multiple Ruby threads can run at any given time
JRuby implements Ruby threads as JVM threads and uses fine-grained locking so that multiple threads can be running; however, whether or not those threads are scheduled to multiple CPUs depends on the JVM being used, some allow this, some don't
IronRuby implements Ruby threads as CLI threads and uses fine-grained locking so that multiple threads can be running; however, whether or not those threads are scheduled to multiple CPUs depends on the VES (Virtual Execution System) being used, some allow this, some don't
MacRuby implements Ruby threads as native OS threads and uses fine-grained locking so that multiple threads can be running on multiple CPUs at the same time
I don't know enough about Topaz, Cardinal, MagLev, MRuby and all the others.
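If you are not sure which implementation you are running, the interpreter can tell you directly (a tiny sketch; RUBY_ENGINE only exists on 1.9+ and on the alternative implementations, and RUBY_DESCRIPTION on 1.8.7+):

    puts RUBY_DESCRIPTION                      # full implementation/version string
    puts RUBY_ENGINE if defined?(RUBY_ENGINE)  # "ruby", "jruby", "rbx", ...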
MRI implements Ruby Threads as Green Threads within its interpreter. Unfortunately, it doesn't allow those threads to be scheduled in parallel; only one thread can run at a time.
See similar question here
I am running a parallel algorithm using lightweight threads and I am wondering how these are assigned to different cores when the system provides several cores and several chips. Are threads assigned to a single chip until all the cores on that chip are exhausted? Are threads assigned to cores on different chips in order to better distribute the work between chips?
You don't say what OS you're on, but on Linux, threads are assigned to a core based on the load on that core. A thread that is ready to run will be assigned to the core with the lowest load unless you specify otherwise by setting thread affinity. You can do this with sched_setaffinity(); see the man page for more details. In general, as meyes1979 said, this is something that is decided by the scheduler implemented in the OS you are using.
Depending upon the version of Linux you're using, there are two articles that might be helpful: this article describes early 2.6 kernels, up through 2.6.22, and this article describes kernels newer than 2.6.23.
Different threading libraries perform threading operations differently. The "standard" in Linux these days is NPTL, which schedules threads at the same level as processes. This is quite fine, as process creation is fast on Linux, and is intended to always remain fast.
The Linux kernel attempts to provide very strong CPU affinity for executing processes and threads to increase the ratio of cache hits to cache misses -- if a task always executes on the same core, its cache lines are more likely to already be populated.
This is usually a good thing, but I have noticed the kernel might not always migrate tasks away from busy cores to idle cores. This behavior is liable to change from version to version, but I have found multiple CPU-bound tasks all running on one core while three other cores were idle. (I found it by noticing that one core was six or seven degrees Celsius warmer than the other three.)
In general, the right thing should just happen; but when the kernel does not automatically migrate tasks to other processors, you can use the taskset(1) command to restrict the set of processors a program is allowed to run on, or you can modify your program to call pthread_setaffinity_np(3) to pin individual threads to specific cores. (This is perhaps best reserved for in-house applications -- one of your users might not want your program to use all available cores. If you do choose to include calls to this function within your program, make sure it is configurable via configuration files, to provide functionality similar to the taskset(1) program.)
I'm trying to understand the practical impact of different threading models between MRI Ruby 1.8 and JRuby.
What does this difference mean to me as a developer?
And also, are there any practical examples of code in MRI Ruby 1.8 that will have worse performance characteristics on JRuby due to different threading models?
State
ruby 1.8 has green threads; these are fast to create and delete (as objects), but they do not truly execute in parallel, and they are not even scheduled by the operating system but by the virtual machine
ruby 1.9 has real threads; these are slow to create and delete (as objects) because of OS calls, but because of the GIL (global interpreter lock), which allows only one thread to execute at a time, these are not truly parallel either
JRuby also has real threads scheduled by the OS, and they are truly concurrent
Conclusion
A threaded program running on a 2-core CPU will run faster on JRuby than on the other implementations, at least from the threading point of view
Notice!
Many existing Ruby libraries are not thread-safe, so the advantage of JRuby is often lost.
Also note that many common Ruby techniques (for example, class variables) will need additional programming effort to ensure thread safety (mutex locks, monitors, etc.) if one is to use threads.
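As a small illustration of that last point (a sketch with a made-up Counter class): a class variable shared between threads should be guarded by a Mutex so that concurrent increments don't interleave.

    require 'thread'   # Mutex lives here on 1.8; harmless on 1.9+

    class Counter
      @@count = 0
      @@lock  = Mutex.new

      def self.increment
        @@lock.synchronize { @@count += 1 }   # only one thread in here at a time
      end

      def self.count
        @@lock.synchronize { @@count }
      end
    end

    threads = 10.times.map { Thread.new { 1_000.times { Counter.increment } } }
    threads.each(&:join)
    puts Counter.count   # => 10000 on any implementation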
JRuby's threads are native system threads, so they give you all the benefits of threaded programming (including the use of multiple processor cores, if applicable). MRI/YARV, however, has a Global Interpreter Lock (GIL), which prevents multiple Ruby threads from running simultaneously. So the only real performance difference is the fact that your MRI/YARV Ruby applications won't be able to utilize all of your processor cores, but your JRuby applications will happily do so.
However, if that isn't an issue, MRI's threads are (theoretically, I haven't tested this) a little faster because they are green threads, which use fewer system resources. YARV (Ruby 1.9) uses native system threads.
I am a regular JRuby user and the biggest difference is that JRuby threads are truly concurrent. They are actual system-level threads, so they can be executed in parallel on multiple cores. I do not know of any place where MRI Ruby 1.8 code runs slower on JRuby. You might consider checking out this question: Does ruby have real multithreading?