Ruby PG gem Connections Not ClosingPG gem Connections Not Closing - ruby

PostgreSQL is installed on my workstation and I host an admin website (Sinatra using the PG gem on nginx) that only I have access too.
Every so often I get thei following error:
PG::ConnectionBad - FATAL: sorry, too many clients already
max_connections is indeed set to 100, but I'm using two on any page at any time. Looking at the pg_stat_activity, I see two simple SELECTs that are not problematic:
SELECT * FROM twils ORDER BY id DESC LIMIT 30 client backend
SELECT mdate FROM params WHERE mkey = $1 client backend
Both of these are completed before the page has time to even render (comparable speed) and should not stay open. So either I'm mistaken in that I need to force these closed, or they are not closing at the end of the page, basically garbage collection.
Since PG connections are stateless and don't carry from page to page, I am under the impression the end of the page means all connections closed for the user. I thought it was automatic. Am I wrong to assume this? Am I to deliberately perform a conn.close or conn.finish on every connection?
conn = PG.connect( dbname: 'hq' )

Related

Sequel + ADO + Puma is not threading queries

We have a website running in Windows Server 2008 + SQLServer 2008 + Ruby + Sinatra + Sequel/Puma
We've developed an API for our website.
When the access points are requested by many clients, at the same time, the clients start getting RequestTimeout exceptions.
I investigated a bit, and I noted that Puma is managing multi threading fine.
But Sequel (or any layer below Sequel) is processing one query at time, even if they came from different clients.
In fact, the RequestTimeout exceptions don't occur if I launch many web servers, each one listening one different port, and I assign one different port to each client.
I don't know yet if the problem is Sequel, ADO, ODBC, Windows, SQLServer or what.
The truth is that I cannot switch to any other technology (like TinyTDS)
Bellow is a little piece of code with screenshots that you can use to replicate the bug:
require 'sinatra'
require 'sequel'
CONNECTION_STRING =
"Driver={SQL Server};Server=.\\SQLEXPRESS;" +
"Trusted_Connection=no;" +
"Database=pulqui;Uid=;Pwd=;"
DB = Sequel.ado(:conn_string=>CONNECTION_STRING)
enable :sessions
configure { set :server, :puma }
set :public_folder, './public/'
set :bind, '0.0.0.0'
get '/delaybyquery.json' do
tid = params[:tid].to_s
begin
puts "(track-id=#{tid}).starting access point"
q = "select p1.* from liprofile p1, liprofile p2, liprofile p3, liprofile p4, liprofile p5"
DB[q].each { |row| # this query should takes a lot of time
puts row[:id]
}
puts "(track-id=#{tid}).done!"
rescue=>e
puts "(track-id=#{tid}).error:#{e.to_s}"
end
end
get '/delaybycode.json' do
tid = params[:tid].to_s
begin
puts "(track-id=#{tid}).starting access point"
sleep(30)
puts "(track-id=#{tid}).done!"
rescue=>e
puts "(track-id=#{tid}).error:#{e.to_s}"
end
end
There are 2 access points in the code above:
delaybyquery.json, that generates a delay by joining the same table 5
times. Note that the table must be about 1000 rows in order to get the
query working really slow; and
delaybycode.json, that generates a delay by just calling the ruby sleep
function.
Both access points receives a tid (tracking-id) parameter, and both write the
outout in the CMD, so you can follow the activity of both process in the same
window and check which access point is blocking incoming requests from other
browsers.
For testing I'm opening 2 tabs in the same chrome browser.
Below are the 2 testings that I'm performing.
Step #1: Run the webserver
c:\source\pulqui>ruby example.app.rb -p 81
I get the output below
Step #2: Testing Delay by Code
I called to this URL:
127.0.0.1:81/delaybycode.json?tid=123
and 5 seconds later I called this other URL
127.0.0.1:81/delaybycode.json?tid=456
Below is the output, where you can see that both calls are working in parallel
click here to see the screenshot
Step #3: Testing Delay by Query
I called to this URL:
127.0.0.1:81/delaybyquery.json?tid=123
and 5 seconds later I called this other URL
127.0.0.1:81/delaybyquery.json?tid=456
Below is the output, where you can see that calls are working 1 at time.
Each call to an access point is finishing with a query timeout exception.
click here to see the screenshot
This is almost assuredly due to win32ole (the driver that Sequel's ado adapter uses). It probably doesn't release the GVL during queries, which would cause the issues you are seeing.
If you cannot switch to TinyTDS or switch to JRuby, then your only option if you want concurrent queries is to run separate webserver processes, and have a reverse proxy server dispatch requests to them.

How to configure Redis connections with Rails 4, Puma and Sidekiq?

I am using Sidekiq (on Heroku with Puma) to send emails asynchronously and would like to use Redis to keep counters and cache models.
RedisCloud's free plan includes 30 connections to Redis. It is not clear to me how to manage:
redis connections used by Sidekiq
redis connections used in models (caching and counters)
Sidekiq Client size is configured like this:
Sidekiq.configure_client do |config|
config.redis = {url: ENV["REDISCLOUD_URL"], size: 3}
end
If I understood this correctly, Puma forks multiple processes, 2 in my case, which will result in:
2 (Puma Workers) * 3 (size) * 1 (Web Dyno) = 6 connections to redis used to push jobs.
Sidekiq Server
With Sidekiq taking 2 connections (or 5 in version 4), setting a concurrency of 10 would default in a server size of 12 or 15.
If I wanted to use all the remaining available connections (30 - 6 = 24), I could set :
Sidekiq.configure_client do |config|
config.redis = { size: 19 }
end
Total redis connections would be 19 + 5 (Sidekiq 4) = 24, and use the default concurrency of 25 would be ok.
As Mike Perham stated generally the concurrency must not be more than (server pool size - 2) * 2.
Now, where it starts to get confusing for me is the use of Redis out of Sidekiq.
# initializers/redis.rb
$redis = Redis.new(:url => uri)
Whenever I use Redis in a model or controller I call like so:
$redis.hincrby("mycounter", "key", 1)
As I understand it, all the puma threads wait on each other on a single Redis connection when $redis.whateverFunction is called.
In this answer What is the best way to use Redis in a Multi-threaded Rails environment? (Puma / Sidekiq), the recommended approach is using the connection_pool gem, related to the Sidekiq Wiki https://github.com/mperham/sidekiq/wiki/Advanced-Options#connection-pooling
require 'connection_pool'
$redis = ConnectionPool.new(size: 10) { Redis.new }
If I understand it right, it that case $redis.whateverFunction would have its own connection pool of 10, and sidekiq its own connection workers pool which would now be set out a new total of 20 redis connections ( 30 (available total) - 10 (redis model connections ), and Sidekiq client and server size would need to be changed.
How do you determine the size of the connection pool (here 10) needed for model/controller redis connections? Since Redis is single-threaded, how does increasing the connection pool actually increases redis operations performance?
Any thoughts on this would be of great help.
Thx!
Redis is single-threaded, but written in pure C, uses an event loop inside and handles connections asynchronously, so connection count does not affect it by much provided the same number of requests. It is capable of handling requests faster than your application can generate them because of network delay, ruby being slower than compiled and optimized C, etc, so you do not need to worry about it being single-threaded.
Increasing number of connections is beneficial for concurrent requests from different threads because there's no need to wait for response to be delivered over network to unlock connection, plus ruby can do parallel IOs.
Also you can tell if pool is too small when connection checkout times become worse than you expect/tolerate and corresponding thread/worker is idling while waiting for it, so benchmark your code and have a good look on your actual usage and behavior patterns.
On the other side i'd advise against using all of the connection count limit, there're times when you might need these extra connections. For example:
for graceful/"zero downtime" dyno restarts ("preboot") you need twice the connections, since old processes are still running for some time
keep at least one free connection for emergency debug as you may want to be able to connect from console/directly and see what data is inside when some unexpected highload comes

How to join multiple multicast groups on one interface

The Ruby version I have available is 1.8.7 and can't be upgraded as it is part of standard image that is used on all the companies Linux servers at this time and anything I do needs to be able to run on all of these servers without issue (I'm hoping though this won't be an issue)
The project I am doing is to recreate an application that currently runs on Windows on a Linux server. The application takes a list of multicast groups and interfaces and attempts to join the groups and then listens for any data (doesn't matter what) reporting whether it could join and the data was there. It helps us in our environment prove out network connectivity prior to deployment of actual software on to the server. The data that it will be receiving will be binary encoded financial information from an exchange so I don't need to output (hence the commented out line and the output) I just need to check it is available to the server.
I have read up online and found bits and pieces of code that I have cobbled together into a small version of this where it joins 1 multicast group bound to 1 interface and listens for data for a period of time reporting whether any data was received.
I then wanted to add a second multicast group and this is where my understanding is lacking in how to achieve this. My code is as follows:
#!/usr/bin/ruby
require 'socket'
require 'ipaddr'
require 'timeout'
MCAST_GROUP_A =
{
:addr => '233.54.12.111',
:port => 26477,
:bindaddr => '172.31.230.156'
}
MCAST_GROUP_B =
{
:addr => '233.54.12.111',
:port => 18170,
:bindaddr => '172.31.230.156'
}
ipA = IPAddr.new(MCAST_GROUP_A[:addr]).hton + IPAddr.new(MCAST_GROUP_A[:bindaddr]).hton
ipB = IPAddr.new(MCAST_GROUP_B[:addr]).hton + IPAddr.new(MCAST_GROUP_B[:bindaddr]).hton
begin
sockA = UDPSocket.open
sockA.setsockopt Socket::IPPROTO_IP, Socket::IP_ADD_MEMBERSHIP, ipA
sockA.setsockopt Socket::IPPROTO_IP, Socket::IP_ADD_MEMBERSHIP, ipB
sockA.bind Socket::INADDR_ANY, MCAST_GROUP_A[:port]
sockA.bind Socket::INADDR_ANY, MCAST_GROUP_B[:port]
timeoutSeconds = 10
Timeout.timeout(timeoutSeconds) do
msg, info = sockA.recvfrom(1024)
#puts "MSG: #{msg} from #{info[2]} (#{info[3]})/#{info[1]} len #{msg.size}"
puts "MSG: <garbled> from #{info[2]} (#{info[3]})/#{info[1]} len #{msg.size}"
end
rescue Timeout::Error
puts "Nothing received connection timedout\n"
ensure
sockA.close
end
The error I get when I run this is:
[root#dt1d-ddncche21a ~]# ./UDPServer.rb
./UDPServer.rb:35:in `setsockopt': Address already in use (Errno::EADDRINUSE)
from ./UDPServer.rb:35
So that's where I am at and could really do with firstly pointers as to what is wrong (hopefully with an update to the code) and then once I this example working the next step will to be add a second interface into the mix to listen to again multiple multicast groups,
Ok so I followed the advice given to bind to the interface first for each port and then add members for each of the multicast groups I want to listen to and this has resolved this particular issue and moved me on to the next issue I have. The next issue I will raise as a new topic.

In Ruby, is it possible to share a database connection across threads?

I've got a small little ruby script that pours over 80,000 or so records.
The processor and memory load involved for each record is smaller than a smurf balls, but it still takes about 8 minutes to walk all the records.
I'd though to use threading, but when I gave it a go, my db ran out of connections. Sure it was when I attempted to connect 200 times, and really I could limit it better than that.. But when I'm pushing this code up to Heroku (where I have 20 connections for all workers to share), I don't want to chance blocking other processes because this one ramped up.
I have thought of refactoring the code so that it conjoins the all the SQL, but that is going to feel really really messy.
So I'm wondering is there a trick to letting the threads share connections? Given I don't expect the connection variable to change during processing, I am actually sort of surprised that the thread fork needs to create a new DB connection.
Well any help would be super cool (just like me).. thanks
SUPER CONTRIVED EXAMPLE
Below is a 100% contrived example. It does display the issue.
I am using ActiveRecord inside a very simple thread. It seems each thread is creating it's own connection to the database. I base that assumption on the warning message that follows.
START_TIME = Time.now
require 'rubygems'
require 'erb'
require "active_record"
#environment = 'development'
#dbconfig = YAML.load(ERB.new(File.read('config/database.yml')).result)
ActiveRecord::Base.establish_connection #dbconfig[#environment]
class Product < ActiveRecord::Base; end
ids = Product.pluck(:id)
p "after pluck #{Time.now.to_f - START_TIME.to_f}"
threads = [];
ids.each do |id|
threads << Thread.new {Product.where(:id => id).update_all(:product_status_id => 99); }
if(threads.size > 4)
threads.each(&:join)
threads = []
p "after thread join #{Time.now.to_f - START_TIME.to_f}"
end
end
p "#{Time.now.to_f - START_TIME.to_f}"
OUTPUT
"after pluck 0.6663269996643066"
DEPRECATION WARNING: Database connections will not be closed automatically, please close your
database connection at the end of the thread by calling `close` on your
connection. For example: ActiveRecord::Base.connection.close
. (called from mon_synchronize at /Users/davidrawk/.rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/monitor.rb:211)
.....
"after thread join 5.7263710498809814" #THIS HAPPENS AFTER THE FIRST JOIN.
.....
"after thread join 10.743254899978638" #THIS HAPPENS AFTER THE SECOND JOIN
See this gem https://github.com/mperham/connection_pool and answer, a connection pool might be what you need: Why not use shared ActiveRecord connections for Rspec + Selenium?
The other option would be to use https://github.com/eventmachine/eventmachine and run your tasks in EM.defer block in such a way that DB access happens in the callback block (within reactor) in a non-blocking way
Alternatively, and a more robust solution too, go for a light-weight background processing queue such as beanstalkd, see https://www.ruby-toolbox.com/categories/Background_Jobs for more options - this would be my primary recommendation
EDIT,
also, you probably don't have 200 cores, so creating 200+ parallel threads and db connections doesn't really speed up the process (slows it down actually), see if you can find a way to partition your problem into a number of sets equal to your number of cores + 1 and solve the problem this way,
this is probably the simplest solution to your problem

Where/how can I host my small, database-driven, Sinatra-powered Ruby app? Heroku limits database connections

Here is my app: a price inflation calculator. If you click 'Calculate,' you'll probably get a server-related error. When I tested the app out in Sinatra, I got an error saying
PG::Error: FATAL: too many connections for role "********"
After looking into it, it turns out that Heroku limits its free databases to 20 connections.
I'd like to continue developing small apps (especially database-driven ones) on Heroku, but I may not be able to if I'm unable to get around this restriction. I would pay $50/month to get a database that allows more connections, but I do not yet know if it would be worth it.
My question is: Does anyone know if it's possible to get around this restriction for free, or if there are alternatives to Heroku that can host a database-driven Sinatra app?
Here's the code I use that adds inflation data to the database:
require 'rubygems'
require 'rest-client'
require 'nokogiri'
require 'sequel'
### MAKE CPI DATABASE ###
db_name = 'DATABASE_NAME_HERE'
DB = Sequel.postgres(db_name,:user=>'USER_NAME',:password=>'PASSWORD',:host=>'HOST',:port=>5432,:sslmode=>'require')
DB.create_table! :cpi_nsa_annual do
primary_key :id
Integer :year
Float :cpi
end # DONE: DB.create_table :cpi_nsa_annual do
cpi_annual = DB[:cpi_nsa_annual]
### DONE MAKING CPI DATABASE ###
post_url = "http://data.bls.gov/pdq/SurveyOutputServlet"
post_params = {
'delimiter'=>'comma',
'output_format'=>'html',
'output_type'=>'column',
'periods_option'=>'all_periods',
'series_id'=>'CUUR0000SA0',
'years_option'=>'all_years'
}
if page = RestClient.post(post_url,post_params)
npage = Nokogiri::HTML(page)
data = npage.css('table.regular-data tbody tr')
data.each{|row|
month_prefix = (row.css('th')[2].text)[0]
year = row.css('th')[1].text
month = (row.css('th')[2].text)[1..2]
cpi = row.css('td').text
if month_prefix=='M' and month=='13'
cpi_annual.insert(
:year=>year,
:cpi=>cpi
) # DONE: cpi_annual_insert
p ["YEAR",year,cpi]
end # DONE: month_prefix=='M' and month!='13'
}
end # DONE: if page
p cpi_annual.each{|row| p row}
It really doesn't seem like you need a database to accomplish what you want in that app. Can't you just store the annual rates in an array in your ruby code?
$inflation_rates = {"1999" => 2.19, "2000" => 2.97, "2001" => 3.73}
Or maybe I'm misunderstanding how your app works.
I've hosted small Sinatra apps with databases on Heroku without problems. The 20 connection limit has been enough. Are you sure its not a fault of your app? Maybe its using an excessive number of connections? Could you post your code?
Also, OpenKeyVal is an awesome free key/value data store with no connection limit (the only limit is the key must be under 64KB). It might force you to change your code around but its at least worth looking into. Because you can store data with JSONP calls you can build an app that uses the datastore in a single static html file.
meub's answer suggesting you store the data outside of a database is sound, but if you feel like you really want to stick with a database try looking into why your app is using so many connections.
If you run select * from pg_stat_database WHERE datname = 'yourdbname' the 'numbackends' field will tell you how many connections are being made to your database. If you do a fresh deploy to heroku and then visit your app a few times does the number of connections increase? Perhaps you need to close your connection.
If you add DB[:cpi_nsa_annual].disconnect at the end of your code does the number of connections stop going up with each page load?

Resources