ActiveRecord inside the thread is too slow - ruby

I have a problem. I have serveral code blocks that has to work parallel and independent, but can have some connection dots like threadsafe queue. I'm not using rails, that is just a ruby script. But i'm using activerecord. And when I'm calling an activerecord model find in the main ruby thread it is ok and selects are taking about 0.3 ms. But if I call an activerecord model find inside a created thread like:
Thread.new do
3.times {
SomeModel.find(3)
}
end
It now takes 400 ms. Why does it happen and what to do, to low the request execution time?

I discivored that this is of ActiveRecord gets connection from a main thread, there it established the connection. Reestablishing connection in each thread does the trick. Another way is to override current_connection method in activerecord to share connection between the threads.

Related

Sinatra, Puma, ActiveRecord: No connection pool with 'primary' found

I am building a service in Ruby 2.4.4, with Sinatra 2.0.5, ActiveRecord 5.2.2, Puma 3.12.0. (I'm not using rails.)
My code looks like this. I have an endpoint which opens a DB connection (to a Postgres DB) and runs some DB queries, like this:
POST '/endpoint' do
# open a connection
ActiveRecord::Base.establish_connection(##db_configuration)
# run some queries
db_value = TableModel.find_by(xx: yy)
return whatever
end
after do
# after the endpoint finishes, close all open connections
ActiveRecord::Base.clear_all_connections!
end
When I get two parallel requests to this endpoint, one of them fails with this error:
2019-01-12 00:22:07 - ActiveRecord::ConnectionNotEstablished - No connection pool with 'primary' found.:
C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/activerecord-5.2.2/lib/active_record/connection_adapters/abstract/connection_pool.rb:1009:in `retrieve_connection'
C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/activerecord-5.2.2/lib/active_record/connection_handling.rb:118:in `retrieve_connection'
C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/activerecord-5.2.2/lib/active_record/connection_handling.rb:90:in `connection'
C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/activerecord-5.2.2/lib/active_record/core.rb:207:in `find_by'
...
My discovery process went this way so far.
I looked at the connection usage in Postgres, thinking I might leak connections - no, I didn't seem to.
Just in case, I increased the connection pool to 16 (corresponding to 16 Puma threads) - didn't help.
Then I looked into the ActiveRecord sources. Here I realized why 2) didn't help. The problem is not that I can't get a connection, but I can't get a connection pool (yes, yes, it says that in the exception). The #owner_to_pool map variable, from which a connection pool is obtained, stores the process_id as key, and as values - connection pools (actually, the value is also a map, where the key is a connection specification and the value, I presume, is an actual pool instance). In my case, I have only one connection spec to my only db.
But Puma is a multithreaded webserver. It runs all requests in the same process but in different threads.
Because of that, I think, the following happens:
The first request, starting in process_id=X, thread=Y, "checks out" the connection pool in establish_connection, based on process_id=X, - and "takes" it. Now it's not present in the #owner_to_pool.
The second request, starting in the same process_id=X, but different thread=Z, tries to do the same - but the connection pool for process_id=X is not present in owner_to_pool. So the second request doesn't get a connection pool and fails with that exception.
The first request finished successfully and puts the connection pool for process_id=X back in place by calling clear_all_connections.
Another request, starting after all that, and not having any parallel requests in parallel threads, will succeed, because it will pick up the connection pool and put it back again with no problems.
Although I am not sure I understand everything 100% correctly, but it seems to me that something like this happens.
Now, my question is: what do I do with all this? How do I make the multithreaded Puma webserver work correctly with ActiveRecord's connection pool?
Thanks a lot in advance!
This question seems similar, but unfortunately it doesn't have an answer, and I don't have enough reputation to comment on it and ask the author if they solved it.
So, basically, I didn't realize I was establish_connection is creating a connection pool. (Yes, yes, I said so myself in the question. Still, didn't quite realize it.)
What I ended up doing, is this:
require ....
# create the connection pool with the required configuration - once; it'll belong to the process
ActiveRecord::Base.establish_connection(db_configuration)
at_exit {
# close all connections on app exit
ActiveRecord::Base.clear_all_connections!
}
class SomeClass < Sinatra::Base
POST '/endpoint' do
# run some queries - they'll automatically use a connection from the pool
db_value = TableModel.find_by(xx: yy)
return whatever
end
end

How does one achieve parallel tasks with Ruby's Fibers?

I'm new to fibers and EventMachine, and have only recently found out about fibers when I was seeing if Ruby had any concurrency features, like go-lang.
There don't seem to be a whole lot of examples out there for real use cases for when you'd use a Fiber.
I did manage to find this: https://www.igvita.com/2009/05/13/fibers-cooperative-scheduling-in-ruby/ (back from 2009!!!)
which has the following code:
require 'eventmachine'
require 'em-http'
require 'fiber'
def async_fetch(url)
f = Fiber.current
http = EventMachine::HttpRequest.new(url).get :timeout => 10
http.callback { f.resume(http) }
http.errback { f.resume(http) }
return Fiber.yield
end
EventMachine.run do
Fiber.new{
puts "Setting up HTTP request #1"
data = async_fetch('http://www.google.com/')
puts "Fetched page #1: #{data.response_header.status}"
EventMachine.stop
}.resume
end
And that's great, async GET request! yay!!! but... how do I actually use it asyncily? The example doesn't have anything beyond creating the containing Fiber.
From what I understand (and don't understand):
async_fetch is blocking until f.resume is called.
f is the current Fiber, which is the wrapping Fiber created in the EventMachine.run block.
the async_fetch yields control flow back to its caller? I'm not sure what this does
why does the wrapping fiber have resume at the end? are fibers paused by default?
Outside of the example, how do I use fibers to say, shoot off a bunch of requests triggered by keyboard commands?
like, for example: every time I type a letter, I make a request to google or something? - normally this requires a thread, which the main thread would tell the parallel thread to launch a thread for each request. :-\
I'm new to concurrency / Fibers. But they are so intriguing!
If anyone can answers these questions, that would be very appreciated!!!
There is a lot of confusion regarding fibers in Ruby. Fibers are not a tool with which to implement concurrency; they are merely a way of organizing code in a way that may more clearly represent what is going on.
That the name 'fibers' is similar to 'threads' in my opinion contributes to the confusion.
If you want true concurrency, that is, distributing the CPU load across all available CPU's, you have the following options:
In MRI Ruby
Running multiple Ruby VM's (i.e. OS processes), using fork, etc. Even with multiple threads in Ruby, the GIL (Global Interpreter Lock) prevents the use of more than 1 CPU by the Ruby runtime.
In JRuby
Unlike MRI Ruby, JRuby will use multiple CPU's when assigning threads, so you can get truly concurrent processing.
If your code is spending most of its time waiting for external resources, then you may not have any need for this true concurrency. MRI threads or some kind of event handling loop will probably work fine for you.

Mutex for ActiveRecord Model

My User model has a nasty method that should not be called simultaneously for two instances of the same record. I need to execute two http requests in a row and at the same time make sure that any other thread does not execute the same method for the same record at the same time.
class User
...
def nasty_long_running_method
// something nasty will happen if this method is called simultaneously
// for two instances of the same record and the later one finishes http_request_1
// before the first one finishes http_request_2.
http_request_1 // Takes 1-3 seconds.
http_request_2 // Takes 1-3 seconds.
update_model
end
end
For example this would break everything:
user = User.first
Thread.new { user.nasty_long_running_method }
Thread.new { user.nasty_long_running_method }
But this would be ok and it should be allowed:
user1 = User.find(1)
user2 = User.find(2)
Thread.new { user1.nasty_long_running_method }
Thread.new { user2.nasty_long_running_method }
What would be the best way to make sure the method is not called simultaneously for two instances of the same record?
I found a gem Remote lock when searching for a solution for my problem. It is a mutex solution that uses Redis in the backend.
It:
is accessible for all processes
does not lock the database
is in memory -> fast and no IO
The method looks like this now
def nasty
$lock = RemoteLock.new(RemoteLock::Adapters::Redis.new(REDIS))
$lock.synchronize("capi_lock_#{user_id}") do
http_request_1
http_request_2
update_user
end
end
I would start with adding a mutex or semaphore. Read about mutex: http://www.ruby-doc.org/core-2.1.2/Mutex.html
class User
...
def nasty
#semaphore ||= Mutex.new
#semaphore.synchronize {
# only one thread at a time can enter this block...
}
end
end
If your class is an ActiveRecord object you might want to use Rails' locking and database transactions. See: http://api.rubyonrails.org/classes/ActiveRecord/Locking/Pessimistic.html
def nasty
User.transaction do
lock!
...
save!
end
end
Update: You updated your question with more details. And it seems like my solutions do not really fit anymore. The first solutions does not work if you have multiple instances running. The second locks only the database row, it does not prevent multiple thread from entering the code block at the same time.
Therefore if would think about building a database based semaphore.
class Semaphore < ActiveRecord::Base
belongs_to :item, :polymorphic => true
def self.get_lock(item, identifier)
# may raise invalid key exception from unique key contraints in db
create(:item => item) rescue false
end
def release
destroy
end
end
The database should have an unique index covering the rows for the polymorphic association to item. That should protect multiple thread from getting a lock for the same item at the same time. Your method would look like this:
def nasty
until semaphore
semaphore = Semaphore.get_lock(user)
end
...
semaphore.release
end
There are a couple of problems to solve around this: How long do you want to wait to get the semaphore? What happens if the external http requests take ages? Do you need to store additional pieces of information (hostname, pid) to identifier what thread lock an item? You will need some kind of cleanup task the removes locks that still exist after a certain period of time or after restarting the server.
Furthermore I think it is a terrible idea to have something like this in a web server. At least you should move all that stuff into background jobs. What might solve your problem, if your app is small and needs just one background job to get everything done.
You state that this is an ActiveRecord model, in which case the usual approach would be to use a database lock on that record. No need for additional locking mechanisms as far as I can see.
Take a look at the short (one page) Rails Guides section on pessimistic locking - http://guides.rubyonrails.org/active_record_querying.html#pessimistic-locking
Basically you can get a lock on a single record or a whole table (if you were updating a lot of things)
In your case something like this should do the trick...
class User < ActiveRecord::Base
...
def nasty_long_running_method
with_lock do
// something nasty will happen if this method is called simultaneously
// for two instances of the same record and the later one finishes http_request_1
// before the first one finishes http_request_2.
http_request_1 // Takes 1-3 seconds.
http_request_2 // Takes 1-3 seconds.
update_model
end
end
end
I recently created a gem called szymanskis_mutex. It is a module that you can include in the class User and provides the method mutual_exclusion(concern) to provide the functionality you want.
It doesnt rely on databases and doesn't depend on how many processes want to enter the critical section at any given moment.
Note that if the class is initialized in different servers it will not work.
I may suite your needs if your app is small enough. Your code would look like this:
class User
include SzymanskisMutex
...
def nasty_long_running_method
mutual_exclusion(:nasty_long) do
http_request_1 // Takes 1-3 seconds.
http_request_2 // Takes 1-3 seconds.
end
update_model
end
end
I suggest rethinking your architecture as this is not going to be scalable - imagine having multiple ruby processes, failing processes, timeouts etc. Also in-process locking and spawning threads is quite dangerous for application servers.
If you want to sleep well with production then try some async background processing framework for long running tasks with serial queue which will ensure order of running tasks. Just simple RabbitMQ or check this QA Best practice for Rails App to run a long task in the background? , eventually try DB but Optimistic Locking.

ActiveRecord with Ruby script, without Rails, connection pool timeout

I'm using ActiveRecord in a Ruby script, without using rails, just running the Ruby script. The script kicks off threads which access the database. Eventually, I get:
could not obtain a database connection within 20.000 seconds
database configuration:
The pool is set for 10 connection. The timeout is set for 20 seconds.
I tried without using connection poll calls directly, but I'm not clear on how that is supposed to be managed. So I put cxn = ActiveRecord::Base.connection_pool.checkout and ActiveRecord::Base.connection_pool.checkin(cxn) around the database access portions of the code. I still get the error.
I put some debug in the script, and I can see the checkout/checkin calls happening. There are 7 successful checkout/checkins. There are 13 total checkouts, so 6 left open.
I'm also seeing:
undefined method `run_callbacks'
when the timeout occurs.
Thank you,
Jerome
You need to explicitly release connections back to the ActiveRecord connection pool. You either need to explicitly call ActiveRecord::Base.clear_active_connections! or else define
a helper method which does the following:
def with_connection(&block)
ActiveRecord::Base.connection_pool.with_connection do
yield block
end
end
which you would use as follows:
def my_method
with_connection do
User.where(:id => 1)
end
end
which should release your connection when the block exits.
Be warned that ActiveRecord uses the current Thread ID to manage connections, so if you are spawning off threads each thread needs to release connections once done with them. Your Threads are most likely not clearing connections properly, or you are spawning more threads than you have connections available in your connection pool.

Why Sinatra request takes EM thread?

Sinatra app receives requests for long running tasks and EM.defer them, launching them in EM's internal pool of 20 threads. When there are more than 20 EM.defer running, they are stored in EM's threadqueue by EM.defer.
However, it seems Sinatra won't service any requests until there is an EM thread available to handle them. My question is, isn't Sinatra suppose to use the reactor of the main thread to service all requests? Why am I seeing an add on the threadqueue when I make a new request?
Steps to reproduce:
Access /track/
Launch 30 /sleep/ reqs to fill the threadqueue
Access /ping/ and notice the add in the threadqueue as well as the delay
Code to reproduce it:
require 'sinatra'
#monkeypatch EM so we can access threadpools
module EventMachine
def self.queuedDefers
#threadqueue==nil ? 0: #threadqueue.size
end
def self.availThreads
#threadqueue==nil ? 0: #threadqueue.num_waiting
end
def self.busyThreads
#threadqueue==nil ? 0: #threadpool_size - #threadqueue.num_waiting
end
end
get '/track/?' do
EM.add_periodic_timer(1) do
p "Busy: " + EventMachine.busyThreads.to_s + "/" +EventMachine.threadpool_size.to_s + ", Available: " + EventMachine.availThreads.to_s + "/" +EventMachine.threadpool_size.to_s + ", Queued: " + EventMachine.queuedDefers.to_s
end
end
get '/sleep/?' do
EM.defer(Proc.new {sleep 20}, Proc.new {body "DONE"})
end
get '/ping/?' do
body "pong"
end
I tried the same thing on Rack/Thin (no Sinatra) and works as it's supposed to, so I guess Sinatra is causing it.
Ruby version: 1.9.3.p125
EventMachine: 1.0.0.beta.4.1
Sinatra: 1.3.2
OS: Windows
Ok, so it seems Sinatra starts Thin in threaded mode by default causing the above behavior.
You can add
set :threaded, false
in your Sinatra configure section and this will prevent the Reactor defering requests on a separate thread, and blocking when under load.
Source1
Source2
Unless I'm misunderstanding something about your question, this is pretty much how EventMachine works. If you check out the docs for EM.defer, they state:
Don't write a deferred operation that will block forever. If so, the
current implementation will not detect the problem, and the thread
will never be returned to the pool. EventMachine limits the number of
threads in its pool, so if you do this enough times, your subsequent
deferred operations won't get a chance to run.
Basically, there's a finite number of threads, and if you use them up, any pending operations will block until a thread is available.
It might be possible to bump threadpool_size if you just need more threads, although ultimately that's not a long-term solution.
Is Sinatra multi threaded? is a really good question here on SO about Sinatra and threads. In short, Sinatra is awesome but if you need decent threading you might need to look elsewhere.

Resources