We have a website running in Windows Server 2008 + SQLServer 2008 + Ruby + Sinatra + Sequel/Puma
We've developed an API for our website.
When the access points are requested by many clients, at the same time, the clients start getting RequestTimeout exceptions.
I investigated a bit, and I noted that Puma is managing multi threading fine.
But Sequel (or any layer below Sequel) is processing one query at time, even if they came from different clients.
In fact, the RequestTimeout exceptions don't occur if I launch many web servers, each one listening one different port, and I assign one different port to each client.
I don't know yet if the problem is Sequel, ADO, ODBC, Windows, SQLServer or what.
The truth is that I cannot switch to any other technology (like TinyTDS)
Bellow is a little piece of code with screenshots that you can use to replicate the bug:
require 'sinatra'
require 'sequel'
CONNECTION_STRING =
"Driver={SQL Server};Server=.\\SQLEXPRESS;" +
"Trusted_Connection=no;" +
"Database=pulqui;Uid=;Pwd=;"
DB = Sequel.ado(:conn_string=>CONNECTION_STRING)
enable :sessions
configure { set :server, :puma }
set :public_folder, './public/'
set :bind, '0.0.0.0'
get '/delaybyquery.json' do
tid = params[:tid].to_s
begin
puts "(track-id=#{tid}).starting access point"
q = "select p1.* from liprofile p1, liprofile p2, liprofile p3, liprofile p4, liprofile p5"
DB[q].each { |row| # this query should takes a lot of time
puts row[:id]
}
puts "(track-id=#{tid}).done!"
rescue=>e
puts "(track-id=#{tid}).error:#{e.to_s}"
end
end
get '/delaybycode.json' do
tid = params[:tid].to_s
begin
puts "(track-id=#{tid}).starting access point"
sleep(30)
puts "(track-id=#{tid}).done!"
rescue=>e
puts "(track-id=#{tid}).error:#{e.to_s}"
end
end
There are 2 access points in the code above:
delaybyquery.json, that generates a delay by joining the same table 5
times. Note that the table must be about 1000 rows in order to get the
query working really slow; and
delaybycode.json, that generates a delay by just calling the ruby sleep
function.
Both access points receives a tid (tracking-id) parameter, and both write the
outout in the CMD, so you can follow the activity of both process in the same
window and check which access point is blocking incoming requests from other
browsers.
For testing I'm opening 2 tabs in the same chrome browser.
Below are the 2 testings that I'm performing.
Step #1: Run the webserver
c:\source\pulqui>ruby example.app.rb -p 81
I get the output below
Step #2: Testing Delay by Code
I called to this URL:
127.0.0.1:81/delaybycode.json?tid=123
and 5 seconds later I called this other URL
127.0.0.1:81/delaybycode.json?tid=456
Below is the output, where you can see that both calls are working in parallel
click here to see the screenshot
Step #3: Testing Delay by Query
I called to this URL:
127.0.0.1:81/delaybyquery.json?tid=123
and 5 seconds later I called this other URL
127.0.0.1:81/delaybyquery.json?tid=456
Below is the output, where you can see that calls are working 1 at time.
Each call to an access point is finishing with a query timeout exception.
click here to see the screenshot
This is almost assuredly due to win32ole (the driver that Sequel's ado adapter uses). It probably doesn't release the GVL during queries, which would cause the issues you are seeing.
If you cannot switch to TinyTDS or switch to JRuby, then your only option if you want concurrent queries is to run separate webserver processes, and have a reverse proxy server dispatch requests to them.
Related
We are using Sidekiq to process a number of backend jobs. One in particular is used very heavily. All I can really say about it is that it sends emails. It doesn't do the email creation (that's a separate job), it just sends them. We spin up a new worker for each email that needs to be sent.
We are trying to upgrade to Ruby 3 and having problems, though. Ruby 2.6.8 has no issues; in 3 (as well as 2.7.3 IIRC), if there is a large number of queued workers, it will get through maybe 20K of them, then it will start hemorrhaging FIFO pipes, on the order of 300-1000 ever 5 seconds or so. Eventually it gets to the ulimit on the system (currently set at 64K) and all sockets/connections fail due to insufficient resources.
In trying to debug this issue I did a run with 90% of what the email worker does entirely commented out, so it does basically nothing except make a couple database queries and do some string templating. I thought I was getting somewhere with that approach, as one run (of 50K+ emails) succeeded without the pipe explosion. However, the next run (identical parameters) did wind up with the runaway pipes.
Profiling with rbspy and ruby-prof did not help much, as they primarily focus on the Sidekiq infrastructure, not the workers themselves.
Looking through our code, I did see that nothing we wrote is ever using IO.* (e.g. IO.popen, IO.select, etc), so I don't see what could be causing the FIFO pipes.
I did see https://github.com/mperham/sidekiq/wiki/Batches#huge-batches, which is not necessarily what we're doing. If you look at the code snippet below, we're basically creating one large batch. I'm not sure whether pushing jobs in bulk as per the link will help with the problem we're having, but I'm about to give it a try once I rework things a bit.
No matter what I do I can't seem to figure out the following:
What is making these pipes? Why are they being created?
What is the condition by which the pipes start getting made exponentially? There are two FIFO pipes that open when we start Sidekiq, but until enough work has been done, we don't see more than 2-6 pipes open generally.
Any advice is appreciated, even along the lines of where to look next, as I'm a bit stumped.
Initializer:
require_relative 'logger'
require_relative 'configuration'
require 'sidekiq-pro'
require "sidekiq-ent"
module Proprietary
unless const_defined?(:ENVIRONMENT)
ENVIRONMENT = ENV['RACK_ENV'] || ENV['RAILS_ENV'] || 'development'
end
# Sidekiq.client_middleware.add Sidekiq::Middleware::Client::Batch
REDIS_URL = if ENV["REDIS_URL"].present?
ENV["REDIS_URL"]
else
"redis://#{ENV["REDIS_SERVER"]}:#{ENV["REDIS_PORT"]}"
end
METRICS = Statsd.new "10.0.9.215", 8125
Sidekiq::Enterprise.unique! unless Proprietary::ENVIRONMENT == "test"
Sidekiq.configure_server do |config|
# require 'sidekiq/pro/reliable_fetch'
config.average_scheduled_poll_interval = 2
config.redis = {
namespace: Proprietary.config.SIDEKIQ_NAMESPACE,
url: Proprietary::REDIS_URL
}
config.server_middleware do |chain|
require 'sidekiq/middleware/server/statsd'
chain.add Sidekiq::Middleware::Server::Statsd, :client => METRICS
end
config.error_handlers << Proc.new do |ex,ctx_hash|
Proprietary.report_exception(ex, "Sidekiq", ctx_hash)
end
config.super_fetch!
config.reliable_scheduler!
end
Sidekiq.configure_client do |config|
config.redis = {
namespace: Proprietary.config.SIDEKIQ_NAMESPACE,
url: Proprietary::REDIS_URL,
size: 15,
network_timeout: 5
}
end
end
Code snippet (sanitized)
def add_targets_to_batch
#target_count = targets.count
queue_counter = 0
batch.jobs do
targets.shuffle.each do |target|
send(campaign_target)
queue_counter += 1
end
end
end
def send(campaign_target)
TargetEmailWorker.perform_async(target[:id],
guid,
is_draft ? target[:email_address] : nil)
begin
Target.where(id: target[:id]).update(send_at: Time.now.utc)
rescue Exception => ex
Proprietary.report_exception(ex, self.class.name, { target_id: target[:id], guid: guid })
end
end
end
First I tried auditing our external connections for connection pooling, etc. That did not help the issue. Eventually I got to the point where I disabled all external connections and let the job run doing virtually nothing outside of a database query and some logging. This allowed one run to complete without issue, but on the second one, the FIFO pipes still grew exponentially after a certain (variable) amount of work was done.
PostgreSQL is installed on my workstation and I host an admin website (Sinatra using the PG gem on nginx) that only I have access too.
Every so often I get thei following error:
PG::ConnectionBad - FATAL: sorry, too many clients already
max_connections is indeed set to 100, but I'm using two on any page at any time. Looking at the pg_stat_activity, I see two simple SELECTs that are not problematic:
SELECT * FROM twils ORDER BY id DESC LIMIT 30 client backend
SELECT mdate FROM params WHERE mkey = $1 client backend
Both of these are completed before the page has time to even render (comparable speed) and should not stay open. So either I'm mistaken in that I need to force these closed, or they are not closing at the end of the page, basically garbage collection.
Since PG connections are stateless and don't carry from page to page, I am under the impression the end of the page means all connections closed for the user. I thought it was automatic. Am I wrong to assume this? Am I to deliberately perform a conn.close or conn.finish on every connection?
conn = PG.connect( dbname: 'hq' )
I am running something along the lines of the following:
results = queries.map do |query|
begin
Neo4j::Session.query(query)
rescue Faraday::TimeoutError
nil
end
end
After a few iterations I get an unrescued Faraday::TimeoutError: too many connection resets (due to Net::ReadTimeout - Net::ReadTimeout) and Neo4j needs switching off and on again.
I believe this is because the queries themselves aren't aborted - i.e. the connection times out but Neo4j carries on trying to run my query. I actually want to time them out, so simply increasing the timeout window won't help me.
I've had a scout around and it looks like I can find my queries and abort them via the Neo4j API, which will be my next move.
Am I right in my diagnosis? If so, is there a recommended way of managing queries (and aborting them) from neo4jrb?
Rebecca is right about managing queries manually. Though if you want Neo4j to automatically stop queries within a certain time period, you can set this in your neo4j conf:
dbms.transaction.timeout=60s
You can find more info in the docs for that setting.
The Ruby gem is using Faraday to connect to Neo4j via HTTP and Faraday has a built-in timeout which is separate from the one in Neo4j. I would suggest setting the Neo4j timeout as a bit longer (5-10 seconds perhaps) than the one in Ruby (here are the docs for configuring the Faraday timeout). If they both have the same timeout, Neo4j might raise a timeout before Ruby, making for a less clear error.
Query management can be done through Cypher. You must be an admin user.
To list all queries, you can use CALL dbms.listQueries;.
To kill a query, you can use CALL dbms.killQuery('ID-OF-QUERY-TO-KILL');, where the ID is obtained from the list of queries.
The previous statements must be executed as a raw query; it does not matter whether you are using an OGM, as long as you can input queries manually. If there is no way to manually input queries, and there is no way of doing this in your framework, then you will have to access the database using some other method in order to execute the queries.
So thanks to Brian and Rebecca for useful tips about query management within Neo4j. Both of these point the way to viable solutions to my problem, and Brian's explicitly lays out steps for achieving one via Neo4jrb so I've marked it correct.
As both answers assume, the diagnosis I made IS correct - i.e. if you run a query from Neo4jrb and the HTTP connection times out, Neo4j will carry on executing the query and Neo4jrb will not issue any instruction for it to stop.
Neo4jrb does not provide a wrapper for any query management functionality, so simply setting a transaction timeout seems most sensible and probably what I'll adopt. Actually intercepting and killing queries is also possible, but this means running your query on one thread so that you can look up its queryId in another. This is the somewhat hacky solution I'm working with atm:
class QueryRunner
DEFAULT_TIMEOUT=70
def self.query(query, timeout_limit=DEFAULT_TIMEOUT)
new(query, timeout_limit).run
end
def initialize(query, timeout_limit)
#query = query
#timeout_limit = timeout_limit
end
def run
start_time = Time.now.to_i
Thread.new { #result = Neo4j::Session.query(#query) }
sleep 0.5
return #result if #result
id = if query_ref = Neo4j::Session.query("CALL dbms.listQueries;").to_a.find {|x| x.query == #query }
query_ref.queryId
end
while #result.nil?
if (Time.now.to_i - start_time) > #timeout_limit
puts "killing query #{id} due to timeout"
Neo4j::Session.query("CALL dbms.killQuery('#{id}');")
#result = []
else
sleep 1
end
end
#result
end
end
I've got a small little ruby script that pours over 80,000 or so records.
The processor and memory load involved for each record is smaller than a smurf balls, but it still takes about 8 minutes to walk all the records.
I'd though to use threading, but when I gave it a go, my db ran out of connections. Sure it was when I attempted to connect 200 times, and really I could limit it better than that.. But when I'm pushing this code up to Heroku (where I have 20 connections for all workers to share), I don't want to chance blocking other processes because this one ramped up.
I have thought of refactoring the code so that it conjoins the all the SQL, but that is going to feel really really messy.
So I'm wondering is there a trick to letting the threads share connections? Given I don't expect the connection variable to change during processing, I am actually sort of surprised that the thread fork needs to create a new DB connection.
Well any help would be super cool (just like me).. thanks
SUPER CONTRIVED EXAMPLE
Below is a 100% contrived example. It does display the issue.
I am using ActiveRecord inside a very simple thread. It seems each thread is creating it's own connection to the database. I base that assumption on the warning message that follows.
START_TIME = Time.now
require 'rubygems'
require 'erb'
require "active_record"
#environment = 'development'
#dbconfig = YAML.load(ERB.new(File.read('config/database.yml')).result)
ActiveRecord::Base.establish_connection #dbconfig[#environment]
class Product < ActiveRecord::Base; end
ids = Product.pluck(:id)
p "after pluck #{Time.now.to_f - START_TIME.to_f}"
threads = [];
ids.each do |id|
threads << Thread.new {Product.where(:id => id).update_all(:product_status_id => 99); }
if(threads.size > 4)
threads.each(&:join)
threads = []
p "after thread join #{Time.now.to_f - START_TIME.to_f}"
end
end
p "#{Time.now.to_f - START_TIME.to_f}"
OUTPUT
"after pluck 0.6663269996643066"
DEPRECATION WARNING: Database connections will not be closed automatically, please close your
database connection at the end of the thread by calling `close` on your
connection. For example: ActiveRecord::Base.connection.close
. (called from mon_synchronize at /Users/davidrawk/.rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/monitor.rb:211)
.....
"after thread join 5.7263710498809814" #THIS HAPPENS AFTER THE FIRST JOIN.
.....
"after thread join 10.743254899978638" #THIS HAPPENS AFTER THE SECOND JOIN
See this gem https://github.com/mperham/connection_pool and answer, a connection pool might be what you need: Why not use shared ActiveRecord connections for Rspec + Selenium?
The other option would be to use https://github.com/eventmachine/eventmachine and run your tasks in EM.defer block in such a way that DB access happens in the callback block (within reactor) in a non-blocking way
Alternatively, and a more robust solution too, go for a light-weight background processing queue such as beanstalkd, see https://www.ruby-toolbox.com/categories/Background_Jobs for more options - this would be my primary recommendation
EDIT,
also, you probably don't have 200 cores, so creating 200+ parallel threads and db connections doesn't really speed up the process (slows it down actually), see if you can find a way to partition your problem into a number of sets equal to your number of cores + 1 and solve the problem this way,
this is probably the simplest solution to your problem
Here is my app: a price inflation calculator. If you click 'Calculate,' you'll probably get a server-related error. When I tested the app out in Sinatra, I got an error saying
PG::Error: FATAL: too many connections for role "********"
After looking into it, it turns out that Heroku limits its free databases to 20 connections.
I'd like to continue developing small apps (especially database-driven ones) on Heroku, but I may not be able to if I'm unable to get around this restriction. I would pay $50/month to get a database that allows more connections, but I do not yet know if it would be worth it.
My question is: Does anyone know if it's possible to get around this restriction for free, or if there are alternatives to Heroku that can host a database-driven Sinatra app?
Here's the code I use that adds inflation data to the database:
require 'rubygems'
require 'rest-client'
require 'nokogiri'
require 'sequel'
### MAKE CPI DATABASE ###
db_name = 'DATABASE_NAME_HERE'
DB = Sequel.postgres(db_name,:user=>'USER_NAME',:password=>'PASSWORD',:host=>'HOST',:port=>5432,:sslmode=>'require')
DB.create_table! :cpi_nsa_annual do
primary_key :id
Integer :year
Float :cpi
end # DONE: DB.create_table :cpi_nsa_annual do
cpi_annual = DB[:cpi_nsa_annual]
### DONE MAKING CPI DATABASE ###
post_url = "http://data.bls.gov/pdq/SurveyOutputServlet"
post_params = {
'delimiter'=>'comma',
'output_format'=>'html',
'output_type'=>'column',
'periods_option'=>'all_periods',
'series_id'=>'CUUR0000SA0',
'years_option'=>'all_years'
}
if page = RestClient.post(post_url,post_params)
npage = Nokogiri::HTML(page)
data = npage.css('table.regular-data tbody tr')
data.each{|row|
month_prefix = (row.css('th')[2].text)[0]
year = row.css('th')[1].text
month = (row.css('th')[2].text)[1..2]
cpi = row.css('td').text
if month_prefix=='M' and month=='13'
cpi_annual.insert(
:year=>year,
:cpi=>cpi
) # DONE: cpi_annual_insert
p ["YEAR",year,cpi]
end # DONE: month_prefix=='M' and month!='13'
}
end # DONE: if page
p cpi_annual.each{|row| p row}
It really doesn't seem like you need a database to accomplish what you want in that app. Can't you just store the annual rates in an array in your ruby code?
$inflation_rates = {"1999" => 2.19, "2000" => 2.97, "2001" => 3.73}
Or maybe I'm misunderstanding how your app works.
I've hosted small Sinatra apps with databases on Heroku without problems. The 20 connection limit has been enough. Are you sure its not a fault of your app? Maybe its using an excessive number of connections? Could you post your code?
Also, OpenKeyVal is an awesome free key/value data store with no connection limit (the only limit is the key must be under 64KB). It might force you to change your code around but its at least worth looking into. Because you can store data with JSONP calls you can build an app that uses the datastore in a single static html file.
meub's answer suggesting you store the data outside of a database is sound, but if you feel like you really want to stick with a database try looking into why your app is using so many connections.
If you run select * from pg_stat_database WHERE datname = 'yourdbname' the 'numbackends' field will tell you how many connections are being made to your database. If you do a fresh deploy to heroku and then visit your app a few times does the number of connections increase? Perhaps you need to close your connection.
If you add DB[:cpi_nsa_annual].disconnect at the end of your code does the number of connections stop going up with each page load?