Where/how can I host my small, database-driven, Sinatra-powered Ruby app? Heroku limits database connections - ruby

Here is my app: a price inflation calculator. If you click 'Calculate,' you'll probably get a server-related error. When I tested the app out in Sinatra, I got an error saying
PG::Error: FATAL: too many connections for role "********"
After looking into it, it turns out that Heroku limits its free databases to 20 connections.
I'd like to continue developing small apps (especially database-driven ones) on Heroku, but I may not be able to if I'm unable to get around this restriction. I would pay $50/month to get a database that allows more connections, but I do not yet know if it would be worth it.
My question is: Does anyone know if it's possible to get around this restriction for free, or if there are alternatives to Heroku that can host a database-driven Sinatra app?
Here's the code I use that adds inflation data to the database:
require 'rubygems'
require 'rest-client'
require 'nokogiri'
require 'sequel'
### MAKE CPI DATABASE ###
db_name = 'DATABASE_NAME_HERE'
DB = Sequel.postgres(db_name,:user=>'USER_NAME',:password=>'PASSWORD',:host=>'HOST',:port=>5432,:sslmode=>'require')
DB.create_table! :cpi_nsa_annual do
primary_key :id
Integer :year
Float :cpi
end # DONE: DB.create_table :cpi_nsa_annual do
cpi_annual = DB[:cpi_nsa_annual]
### DONE MAKING CPI DATABASE ###
post_url = "http://data.bls.gov/pdq/SurveyOutputServlet"
post_params = {
'delimiter'=>'comma',
'output_format'=>'html',
'output_type'=>'column',
'periods_option'=>'all_periods',
'series_id'=>'CUUR0000SA0',
'years_option'=>'all_years'
}
if page = RestClient.post(post_url,post_params)
npage = Nokogiri::HTML(page)
data = npage.css('table.regular-data tbody tr')
data.each{|row|
month_prefix = (row.css('th')[2].text)[0]
year = row.css('th')[1].text
month = (row.css('th')[2].text)[1..2]
cpi = row.css('td').text
if month_prefix=='M' and month=='13'
cpi_annual.insert(
:year=>year,
:cpi=>cpi
) # DONE: cpi_annual_insert
p ["YEAR",year,cpi]
end # DONE: month_prefix=='M' and month!='13'
}
end # DONE: if page
p cpi_annual.each{|row| p row}

It really doesn't seem like you need a database to accomplish what you want in that app. Can't you just store the annual rates in an array in your ruby code?
$inflation_rates = {"1999" => 2.19, "2000" => 2.97, "2001" => 3.73}
Or maybe I'm misunderstanding how your app works.
I've hosted small Sinatra apps with databases on Heroku without problems. The 20 connection limit has been enough. Are you sure its not a fault of your app? Maybe its using an excessive number of connections? Could you post your code?
Also, OpenKeyVal is an awesome free key/value data store with no connection limit (the only limit is the key must be under 64KB). It might force you to change your code around but its at least worth looking into. Because you can store data with JSONP calls you can build an app that uses the datastore in a single static html file.

meub's answer suggesting you store the data outside of a database is sound, but if you feel like you really want to stick with a database try looking into why your app is using so many connections.
If you run select * from pg_stat_database WHERE datname = 'yourdbname' the 'numbackends' field will tell you how many connections are being made to your database. If you do a fresh deploy to heroku and then visit your app a few times does the number of connections increase? Perhaps you need to close your connection.
If you add DB[:cpi_nsa_annual].disconnect at the end of your code does the number of connections stop going up with each page load?

Related

Ruby PG gem Connections Not ClosingPG gem Connections Not Closing

PostgreSQL is installed on my workstation and I host an admin website (Sinatra using the PG gem on nginx) that only I have access too.
Every so often I get thei following error:
PG::ConnectionBad - FATAL: sorry, too many clients already
max_connections is indeed set to 100, but I'm using two on any page at any time. Looking at the pg_stat_activity, I see two simple SELECTs that are not problematic:
SELECT * FROM twils ORDER BY id DESC LIMIT 30 client backend
SELECT mdate FROM params WHERE mkey = $1 client backend
Both of these are completed before the page has time to even render (comparable speed) and should not stay open. So either I'm mistaken in that I need to force these closed, or they are not closing at the end of the page, basically garbage collection.
Since PG connections are stateless and don't carry from page to page, I am under the impression the end of the page means all connections closed for the user. I thought it was automatic. Am I wrong to assume this? Am I to deliberately perform a conn.close or conn.finish on every connection?
conn = PG.connect( dbname: 'hq' )

Sequel + ADO + Puma is not threading queries

We have a website running in Windows Server 2008 + SQLServer 2008 + Ruby + Sinatra + Sequel/Puma
We've developed an API for our website.
When the access points are requested by many clients, at the same time, the clients start getting RequestTimeout exceptions.
I investigated a bit, and I noted that Puma is managing multi threading fine.
But Sequel (or any layer below Sequel) is processing one query at time, even if they came from different clients.
In fact, the RequestTimeout exceptions don't occur if I launch many web servers, each one listening one different port, and I assign one different port to each client.
I don't know yet if the problem is Sequel, ADO, ODBC, Windows, SQLServer or what.
The truth is that I cannot switch to any other technology (like TinyTDS)
Bellow is a little piece of code with screenshots that you can use to replicate the bug:
require 'sinatra'
require 'sequel'
CONNECTION_STRING =
"Driver={SQL Server};Server=.\\SQLEXPRESS;" +
"Trusted_Connection=no;" +
"Database=pulqui;Uid=;Pwd=;"
DB = Sequel.ado(:conn_string=>CONNECTION_STRING)
enable :sessions
configure { set :server, :puma }
set :public_folder, './public/'
set :bind, '0.0.0.0'
get '/delaybyquery.json' do
tid = params[:tid].to_s
begin
puts "(track-id=#{tid}).starting access point"
q = "select p1.* from liprofile p1, liprofile p2, liprofile p3, liprofile p4, liprofile p5"
DB[q].each { |row| # this query should takes a lot of time
puts row[:id]
}
puts "(track-id=#{tid}).done!"
rescue=>e
puts "(track-id=#{tid}).error:#{e.to_s}"
end
end
get '/delaybycode.json' do
tid = params[:tid].to_s
begin
puts "(track-id=#{tid}).starting access point"
sleep(30)
puts "(track-id=#{tid}).done!"
rescue=>e
puts "(track-id=#{tid}).error:#{e.to_s}"
end
end
There are 2 access points in the code above:
delaybyquery.json, that generates a delay by joining the same table 5
times. Note that the table must be about 1000 rows in order to get the
query working really slow; and
delaybycode.json, that generates a delay by just calling the ruby sleep
function.
Both access points receives a tid (tracking-id) parameter, and both write the
outout in the CMD, so you can follow the activity of both process in the same
window and check which access point is blocking incoming requests from other
browsers.
For testing I'm opening 2 tabs in the same chrome browser.
Below are the 2 testings that I'm performing.
Step #1: Run the webserver
c:\source\pulqui>ruby example.app.rb -p 81
I get the output below
Step #2: Testing Delay by Code
I called to this URL:
127.0.0.1:81/delaybycode.json?tid=123
and 5 seconds later I called this other URL
127.0.0.1:81/delaybycode.json?tid=456
Below is the output, where you can see that both calls are working in parallel
click here to see the screenshot
Step #3: Testing Delay by Query
I called to this URL:
127.0.0.1:81/delaybyquery.json?tid=123
and 5 seconds later I called this other URL
127.0.0.1:81/delaybyquery.json?tid=456
Below is the output, where you can see that calls are working 1 at time.
Each call to an access point is finishing with a query timeout exception.
click here to see the screenshot
This is almost assuredly due to win32ole (the driver that Sequel's ado adapter uses). It probably doesn't release the GVL during queries, which would cause the issues you are seeing.
If you cannot switch to TinyTDS or switch to JRuby, then your only option if you want concurrent queries is to run separate webserver processes, and have a reverse proxy server dispatch requests to them.

How to join multiple multicast groups on one interface

The Ruby version I have available is 1.8.7 and can't be upgraded as it is part of standard image that is used on all the companies Linux servers at this time and anything I do needs to be able to run on all of these servers without issue (I'm hoping though this won't be an issue)
The project I am doing is to recreate an application that currently runs on Windows on a Linux server. The application takes a list of multicast groups and interfaces and attempts to join the groups and then listens for any data (doesn't matter what) reporting whether it could join and the data was there. It helps us in our environment prove out network connectivity prior to deployment of actual software on to the server. The data that it will be receiving will be binary encoded financial information from an exchange so I don't need to output (hence the commented out line and the output) I just need to check it is available to the server.
I have read up online and found bits and pieces of code that I have cobbled together into a small version of this where it joins 1 multicast group bound to 1 interface and listens for data for a period of time reporting whether any data was received.
I then wanted to add a second multicast group and this is where my understanding is lacking in how to achieve this. My code is as follows:
#!/usr/bin/ruby
require 'socket'
require 'ipaddr'
require 'timeout'
MCAST_GROUP_A =
{
:addr => '233.54.12.111',
:port => 26477,
:bindaddr => '172.31.230.156'
}
MCAST_GROUP_B =
{
:addr => '233.54.12.111',
:port => 18170,
:bindaddr => '172.31.230.156'
}
ipA = IPAddr.new(MCAST_GROUP_A[:addr]).hton + IPAddr.new(MCAST_GROUP_A[:bindaddr]).hton
ipB = IPAddr.new(MCAST_GROUP_B[:addr]).hton + IPAddr.new(MCAST_GROUP_B[:bindaddr]).hton
begin
sockA = UDPSocket.open
sockA.setsockopt Socket::IPPROTO_IP, Socket::IP_ADD_MEMBERSHIP, ipA
sockA.setsockopt Socket::IPPROTO_IP, Socket::IP_ADD_MEMBERSHIP, ipB
sockA.bind Socket::INADDR_ANY, MCAST_GROUP_A[:port]
sockA.bind Socket::INADDR_ANY, MCAST_GROUP_B[:port]
timeoutSeconds = 10
Timeout.timeout(timeoutSeconds) do
msg, info = sockA.recvfrom(1024)
#puts "MSG: #{msg} from #{info[2]} (#{info[3]})/#{info[1]} len #{msg.size}"
puts "MSG: <garbled> from #{info[2]} (#{info[3]})/#{info[1]} len #{msg.size}"
end
rescue Timeout::Error
puts "Nothing received connection timedout\n"
ensure
sockA.close
end
The error I get when I run this is:
[root#dt1d-ddncche21a ~]# ./UDPServer.rb
./UDPServer.rb:35:in `setsockopt': Address already in use (Errno::EADDRINUSE)
from ./UDPServer.rb:35
So that's where I am at and could really do with firstly pointers as to what is wrong (hopefully with an update to the code) and then once I this example working the next step will to be add a second interface into the mix to listen to again multiple multicast groups,
Ok so I followed the advice given to bind to the interface first for each port and then add members for each of the multicast groups I want to listen to and this has resolved this particular issue and moved me on to the next issue I have. The next issue I will raise as a new topic.

In Ruby, is it possible to share a database connection across threads?

I've got a small little ruby script that pours over 80,000 or so records.
The processor and memory load involved for each record is smaller than a smurf balls, but it still takes about 8 minutes to walk all the records.
I'd though to use threading, but when I gave it a go, my db ran out of connections. Sure it was when I attempted to connect 200 times, and really I could limit it better than that.. But when I'm pushing this code up to Heroku (where I have 20 connections for all workers to share), I don't want to chance blocking other processes because this one ramped up.
I have thought of refactoring the code so that it conjoins the all the SQL, but that is going to feel really really messy.
So I'm wondering is there a trick to letting the threads share connections? Given I don't expect the connection variable to change during processing, I am actually sort of surprised that the thread fork needs to create a new DB connection.
Well any help would be super cool (just like me).. thanks
SUPER CONTRIVED EXAMPLE
Below is a 100% contrived example. It does display the issue.
I am using ActiveRecord inside a very simple thread. It seems each thread is creating it's own connection to the database. I base that assumption on the warning message that follows.
START_TIME = Time.now
require 'rubygems'
require 'erb'
require "active_record"
#environment = 'development'
#dbconfig = YAML.load(ERB.new(File.read('config/database.yml')).result)
ActiveRecord::Base.establish_connection #dbconfig[#environment]
class Product < ActiveRecord::Base; end
ids = Product.pluck(:id)
p "after pluck #{Time.now.to_f - START_TIME.to_f}"
threads = [];
ids.each do |id|
threads << Thread.new {Product.where(:id => id).update_all(:product_status_id => 99); }
if(threads.size > 4)
threads.each(&:join)
threads = []
p "after thread join #{Time.now.to_f - START_TIME.to_f}"
end
end
p "#{Time.now.to_f - START_TIME.to_f}"
OUTPUT
"after pluck 0.6663269996643066"
DEPRECATION WARNING: Database connections will not be closed automatically, please close your
database connection at the end of the thread by calling `close` on your
connection. For example: ActiveRecord::Base.connection.close
. (called from mon_synchronize at /Users/davidrawk/.rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/monitor.rb:211)
.....
"after thread join 5.7263710498809814" #THIS HAPPENS AFTER THE FIRST JOIN.
.....
"after thread join 10.743254899978638" #THIS HAPPENS AFTER THE SECOND JOIN
See this gem https://github.com/mperham/connection_pool and answer, a connection pool might be what you need: Why not use shared ActiveRecord connections for Rspec + Selenium?
The other option would be to use https://github.com/eventmachine/eventmachine and run your tasks in EM.defer block in such a way that DB access happens in the callback block (within reactor) in a non-blocking way
Alternatively, and a more robust solution too, go for a light-weight background processing queue such as beanstalkd, see https://www.ruby-toolbox.com/categories/Background_Jobs for more options - this would be my primary recommendation
EDIT,
also, you probably don't have 200 cores, so creating 200+ parallel threads and db connections doesn't really speed up the process (slows it down actually), see if you can find a way to partition your problem into a number of sets equal to your number of cores + 1 and solve the problem this way,
this is probably the simplest solution to your problem

Best way to concurrently check urls (for status i.e. 200,301,404) for multiple urls in database

Here's what I'm trying to accomplish. Let's say I have 100,000 urls stored in a database and I want to check each of these for http status and store that status. I want to be able to do this concurrently in a fairly small amount of time.
I was wondering what the best way(s) to do this would be. I thought about using some sort of queue with workers/consumers or some sort of evented model, but I don't really have enough experience to know what would work best in this scenario.
Ideas?
Take a look at the very capable Typhoeus and Hydra combo. The two make it very easy to concurrently process multiple URLs.
The "Times" example should get you up and running quickly. In the on_complete block put your code to write your statuses to the DB. You could use a thread to build and maintain the queued requests at a healthy level, or queue a set number, let them all run to completion, then loop for another group. It's up to you.
Paul Dix, the original author, talked about his design goals on his blog.
This is some sample code I wrote to download archived mail lists so I could do local searches. I deliberately removed the URL to keep from subjecting the site to DOS attacks if people start running the code:
#!/usr/bin/env ruby
require 'nokogiri'
require 'addressable/uri'
require 'typhoeus'
BASE_URL = ''
url = Addressable::URI.parse(BASE_URL)
resp = Typhoeus::Request.get(url.to_s)
doc = Nokogiri::HTML(resp.body)
hydra = Typhoeus::Hydra.new(:max_concurrency => 10)
doc.css('a').map{ |n| n['href'] }.select{ |href| href[/\.gz$/] }.each do |gzip|
gzip_url = url.join(gzip)
request = Typhoeus::Request.new(gzip_url.to_s)
request.on_complete do |resp|
gzip_filename = resp.request.url.split('/').last
puts "writing #{gzip_filename}"
File.open("gz/#{gzip_filename}", 'w') do |fo|
fo.write resp.body
end
end
puts "queuing #{ gzip }"
hydra.queue(request)
end
hydra.run
Running the code on my several-year-old MacBook Pro pulled in 76 files totaling 11MB in just under 20 seconds, over wireless to DSL. If you're only doing HEAD requests your throughput will be better. You'll want to mess with the concurrency setting because there is a point where having more concurrent sessions only slow you down and needlessly use resources.
I give it a 8 out of 10; It's got a great beat and I can dance to it.
EDIT:
When checking the remove URLs you can use a HEAD request, or a GET with the If-Modified-Since. They can give you responses you can use to determine the freshness of your URLs.
I haven't done anything multithreaded in Ruby, only in Java, but it seems pretty straightforward: http://www.tutorialspoint.com/ruby/ruby_multithreading.htm
From what you described, you don't need any queue and workers (well, I'm sure you can do it that way too, but I doubt you'll get much benefit). Just partition your urls between several threads, and let each thread do each chunk and update the database with the results. E.g., create 100 threads, and give each thread a range of 1000 database rows to process.
You could even just create 100 separate processes and give them rows as arguments, if you'd rather deal with processes than threads.
To get the URL status, I think you do an HTTP HEAD request, which I guess is http://apidock.com/ruby/Net/HTTP/request_head in ruby.
The work_queue gem is the easiest way to perform tasks asynchronously and concurrently in your application.
wq = WorkQueue.new 10
urls.each do |url|
wq.enqueue_b do
response = Net::HTTP.get_response(uri)
puts response.code
end
end
wq.join

Resources