Ruby garbage collection and Puma

When using MRI Ruby 2.1.2 with Puma (say, 1 worker with 8 threads), when is the GC run? Is it run by the worker process when all those threads become idle, or is it run by the process as needed, even when those threads are busy processing requests?
And how would this behaviour differ in Ruby 2.0 (without deferred GC)?

It's been answered on the GitHub issue.
It runs whenever the VM decides to run it. Puma does nothing to control that, nor can it, really.
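If you want to watch this happening under Puma, one low-tech option (a sketch; the middleware name is made up) is to wrap requests in a tiny Rack middleware and compare GC.count, which MRI exposes as the total number of collections run so far in the process:

class GCLogger
  def initialize(app)
    @app = app
  end

  def call(env)
    before = GC.count                 # process-wide count of GC runs so far
    response = @app.call(env)
    # With 8 threads the delta also includes collections triggered while other
    # requests were in flight; it only shows that the VM, not Puma, decided to collect.
    warn "GC ran #{GC.count - before} time(s) during #{env['PATH_INFO']}"
    response
  end
end

# In config.ru: add "use GCLogger" before the run line.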

Related

Why do Rails applications run the Garbage Collector at all?

I was pretty sure that all Rack application servers (I have some experience with Unicorn and Passenger) create a single process when they boot, and that its state is then "frozen".
Whenever the app server receives a request to handle, it forks from the master process, and all further changes to the forked process are separated from the original one. The forks benefit from copy-on-write optimizations and are safe to be "damaged" by processing the request: any changes to the environment affect only a single process that will be thrown away anyway.
If my picture of the RoR application stack were true, there should be almost no need for garbage collection, unless serving a single request took a lot of time and memory (which is usually not the case).
On the other hand, a question about GC measurements done with New Relic and its answers led me to the conclusion that I must be completely wrong.
Can someone clarify this process?
Rack application servers do not fork on each request, only during initialization:
First, the environment is loaded in one process
Then, the server forks several workers
Then all requests are distributed among these processes
That's why the garbage collector is needed: each worker serves many requests, so GC keeps each process's memory clean and stable.
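A stripped-down sketch of that lifecycle (hypothetical and Unix-only; not the internals of Unicorn, Puma or Passenger):

require 'socket'

# 1. The environment (your app) is loaded once, in the master process.
server = TCPServer.new(9292)

# 2. The master forks several workers; they inherit the loaded app and the
#    listening socket, so they benefit from copy-on-write.
3.times do
  fork do
    # 3. Each worker serves many requests in a row, which is exactly why GC
    #    is needed to keep its memory stable.
    loop do
      client = server.accept
      client.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
      client.close
    end
  end
end

Process.waitall   # the master only supervises; it never serves requests itself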

Sidekiq workers slower than Resque

I am trying out Sidekiq alongside my Resque system in production. Now, I know this isn't quite an apples-to-apples comparison, but my Resque jobs running on a Heroku worker take around 4s to complete. I am running only 50 threads on an Amazon large instance with Sidekiq, and the same jobs take on average around 18s. The jobs are very heavy on use of third-party APIs, so I am assuming my bottleneck is just my network connection, but I just wanted to see if anyone has suggestions as to how I can better configure Sidekiq.
Sidekiq workers will run in parallel only if you use JRuby or Rubinius, because MRI Ruby has a global interpreter lock.
Sidekiq workers will be faster only if you use JRuby or Rubinius with thread-safe libraries that do not block the resources they use. So the main reason for using Sidekiq instead of Resque is memory savings.
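That said, if the jobs are dominated by waiting on third-party APIs, the main Sidekiq knob to experiment with is the thread count. A hedged sketch (the worker class and endpoint are invented):

require 'sidekiq'
require 'net/http'

class ApiCallWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'api', retry: 3

  def perform(record_id)
    # Mostly time spent waiting on the network rather than computing.
    Net::HTTP.get(URI("https://api.example.com/records/#{record_id}"))
  end
end

# config/sidekiq.yml (or start with: sidekiq -c 25) -- raise or lower the
# concurrency until throughput stops improving:
#   :concurrency: 25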

Ruby 1.9 thread pools

As I understand it, Ruby 1.9 uses OS threads, but only one thread will actually be running concurrently (though one thread may be doing blocking IO while another is doing processing). The threading examples I've seen just use Thread.new to launch a new thread. Coming from a Java background, I typically use thread pools so as not to launch too many new threads, since they are "heavyweight."
Is there a thread pool construct built into Ruby? I didn't see one in the default language libraries. Or is there a standard gem that is typically used? Since OS-level threading is a newer feature of Ruby, I don't know how mature the libraries for it are.
You are correct in that the default C Ruby interpreter only executes one thread at a time (other C-based dynamic languages such as Python have similar restrictions). Because of this restriction, threading is not really that common in Ruby, and as a result there is no default thread-pool library. If there are tasks to be done in parallel, people typically use processes, since processes can scale over multiple servers.
If you do need to use threads, I would recommend you use https://github.com/meh/ruby-threadpool on the JRuby platform, which is a Ruby interpreter running on the JVM. That should be right up your alley, and because it is running on the virtual machine it will have true threading.
The accepted answer is correct, but there are many tasks for which threads are fine; after all, there is a reason they exist. Even though only one thread can run at a time, it can still be considered parallel in many real-life situations.
For example, suppose we have 100 long-running tasks, each taking approximately 10 minutes to complete. Using threads in Ruby, even with all those restrictions, a thread pool of 10 tasks at a time will finish much faster than the 100 × 10 minutes it would take without threads. Examples include live capturing of file changes and sending a large number of web requests (such as status checks).
You can understand how pooling works by reading https://blog.codeship.com/understanding-fundamental-ruby-abstraction-concurrency/. In production code, use https://github.com/meh/ruby-thread#pool
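For the 100-jobs, pool-of-10 example above, a minimal pool can be built from the standard library alone (a sketch, not the API of the gems linked above):

require 'thread'

jobs = Queue.new
(1..100).each { |n| jobs << n }        # 100 long-running, IO-bound tasks

workers = 10.times.map do
  Thread.new do
    until jobs.empty?
      n = jobs.pop(true) rescue break  # non-blocking pop; stop when drained
      sleep 1                          # stand-in for a slow network call
      puts "finished job #{n}"
    end
  end
end

workers.each(&:join)                   # wait for the pool to drain the queue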

Is Sinatra multi threaded?

Is Sinatra multi-threaded? I read elsewhere that "Sinatra is multi-threaded by default"; what does that imply?
Consider this example:

get "/multithread" do
  t1 = Thread.new {
    puts "sleeping for 10 sec"
    sleep 10
    # Actually make a call to a third-party API using Net::HTTP or whatever.
  }
  t1.join
  "multi thread"
end

get "/dummy" do
  "dummy"
end
If I access "/multithread" and then "/dummy" in another tab or browser, nothing can be served (in this case for 10 seconds) until the "/multithread" request is completed. In effect, activity freezes and the application becomes unresponsive.
How can we work around this without spawning another instance of the application?
tl;dr Sinatra works well with Threads, but you will probably have to use a different web server.
Sinatra itself does not impose any concurrency model; it does not even handle concurrency. That is done by the Rack handler (web server), like Thin, WEBrick or Passenger. Sinatra itself is thread-safe, meaning that if your Rack handler uses multiple threads to serve requests, it works just fine. However, since Ruby 1.8 only supports green threads and Ruby 1.9 has a global VM lock, threads are not that widely used for concurrency: on both versions, threads will not run truly in parallel. They will, however, on JRuby or the upcoming Rubinius 2.0 (both alternative Ruby implementations).
Most existing Rack handlers that use threads will use a thread pool in order to reuse threads instead of actually creating a thread for each incoming request, since thread creation is not free, especially on 1.9 where threads map 1:1 to native threads. Green threads have far less overhead, which is why fibers, which are basically cooperatively scheduled green threads, as used by the sinatra-synchrony gem mentioned below, have become so popular recently. You should be aware that any network communication will have to go through EventMachine, so you cannot use the mysql gem, for instance, to talk to your database.
Fibers scale well for network-intense processing but fail miserably for heavy computations. You are less likely to run into race conditions, a common pitfall with concurrency, if you use fibers, as they only do a context switch at clearly defined points (with synchrony, whenever you wait for IO). There is a third common concurrency model: processes. You can use a preforking server or fire up multiple processes yourself. While this seems a bad idea at first glance, it has some advantages: on the normal Ruby implementation, this is the only way to use all your CPUs simultaneously, and you avoid shared state, so there are no race conditions by definition. Also, multi-process apps scale easily over multiple machines. Keep in mind that you can combine multiple processes with other concurrency models (evented, cooperative, preemptive).
The choice is mainly made by the server and middleware you use:
Multi-Process, non-preforking: Mongrel, Thin, WEBrick, Zbatery
Multi-Process, preforking: Unicorn, Rainbows, Passenger
Evented (suited for sinatra-synchrony): Thin, Rainbows, Zbatery
Threaded: Net::HTTP::Server, Threaded Mongrel, Puma, Rainbows, Zbatery, Thin[1], Phusion Passenger Enterprise >= 4
[1] Since Sinatra 1.3.0, Thin will be started in threaded mode if it is started by Sinatra (i.e. with ruby app.rb, but not with the thin command, nor with rackup).
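For example, to get one of the threaded setups from that table without changing the app itself, it can be run behind Puma (a sketch; file names and thread counts are only illustrative):

# config.ru
require './app'              # the Sinatra routes from the question
run Sinatra::Application

# Started with, e.g.:
#   puma -t 8:16 config.ru
# the sleeping /multithread request no longer blocks /dummy, because each
# request is handled on its own thread from Puma's pool.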
While googling around, I found this gem:
sinatra-synchrony
which might help you, because it touches on your question.
There is also a benchmark; they did nearly the same thing you want (external calls).
Conclusion: EventMachine is the answer here!
Thought I might elaborate for people who come across this. Sinatra includes this little chunk of code:
server.threaded = settings.threaded if server.respond_to? :threaded=
Sinatra will detect what gem you have installed for a web server (e.g. Thin, Puma, whatever) and, if it responds to "threaded=", will set it to be threaded if requested. Neat.
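So with a server that responds to threaded= (Thin, for instance), asking for threading from the app itself looks roughly like this (a sketch, started with ruby app.rb):

require 'sinatra'

set :server, :thin      # a handler that responds to threaded=
set :threaded, true     # picked up by the line of Sinatra source quoted above

get '/multithread' do
  sleep 10              # stands in for the slow third-party call
  'multi thread'
end

get '/dummy' do
  'dummy'
end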
After making some changes to the code I was able to run a Padrino/Sinatra application on Mizuno. Initially I tried to run the Padrino application on JRuby, but it was simply too unstable and I did not investigate why; I was facing JVM crashes when running on JRuby. I also went through this article, which makes me wonder why even choose Ruby if deployment can be anything but easy.
Is there any discussion on deployment of applications in Ruby? Or can I spawn a new thread? :)
I've been getting into JRuby myself lately, and I am extremely surprised at how simple it is to switch from MRI to JRuby. It pretty much involves swapping out a few gems (in most cases).
You should take a look at the combination of JRuby and Trinidad (an app server). TorqueBox also seems to be an interesting all-in-one solution; it comes with a lot more than just an app server.
If you want an app server that supports threading, and you're familiar with Mongrel, Thin, Unicorn, etc., then Trinidad is probably the easiest to migrate to, since it's practically identical from the user's perspective. Loving it so far!

Ruby threading deadlocks

I'm writing a project at the moment that involves running two parallel threads to pull data from different sources at regular intervals. I am using the threading functionality in Ruby 1.9 to do this, but am unfortunately running up against deadlock problems. I also have a feeling that the Thread.join method is causing the threads to queue rather than run in parallel.
I'm new to multithreading programming and any advice would be greatly appreciated
Cheers
Patrick
EDIT: The shared resource that both these threads are accessing is a MySQL database, which could be the problem. The deadlock arises after a few iterations of these threads being run.
You can use synchronization mechanisms such as Mutex, Monitor, Queue, or SizedQueue from the standard library. Or is the problem in using them?
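If the deadlock comes from both threads sharing one MySQL connection, one way to use those primitives is to funnel all results through a Queue so that only one thread ever touches the database (a sketch; fetch_from and save_to_mysql are hypothetical helpers):

require 'thread'

results = Queue.new

pollers = %w[source_a source_b].map do |source|
  Thread.new do
    loop do
      data = fetch_from(source)    # hypothetical: pull data from one source
      results << [source, data]
      sleep 60                     # poll interval
    end
  end
end

writer = Thread.new do
  loop do
    source, data = results.pop     # blocks until a poller pushes something
    save_to_mysql(source, data)    # hypothetical: the only thread using the DB
  end
end

[*pollers, writer].each(&:join)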
It's very difficult to diagnose what could be going wrong without more details, but deadlock is (obviously) caused by multiple threads trying to acquire resources held by others. That really means that you must have at least two mutexes and two threads. Could that be happening in your code?
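To make that concrete, here is a minimal sketch of the two-mutex, two-thread situation and the usual way out of it:

require 'thread'

a = Mutex.new
b = Mutex.new

t1 = Thread.new { a.synchronize { sleep 0.1; b.synchronize { puts "t1" } } }
t2 = Thread.new { b.synchronize { sleep 0.1; a.synchronize { puts "t2" } } }

[t1, t2].each(&:join)   # MRI aborts here with a deadlock error

# The usual fix is to always acquire the mutexes in the same order, e.g. have
# t2 also lock a before b; then one thread simply waits instead of deadlocking.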
Thread.join doesn't have anything to do with parallel execution - it's a synchronization method to enable one (usually the master) thread to wait for one or more threads to complete.
Which Ruby 1.9 implementation are you using? YARV cannot run Ruby threads in parallel. At the moment, there is no production-ready implementation of Ruby 1.9 which can run threads in parallel. JRuby can run threads in parallel, but its Ruby 1.9 implementation is not quite complete yet. (Although it is stable, so if all the features you need are there, you can use it.)
