Sqlite3 library won't open after 250 inserts - ruby

I'm trying to insert a large amount of information into a Sqlite3 database using a Ruby script. After 250 calls to db_prepare_location.execute, it stops working with:
.rvm/gems/ruby-1.9.2-p290/gems/sqlite3-1.3.6/lib/sqlite3/statement.rb:67:in `step': unable to open database file (SQLite3::CantOpenException)
from /Users/ashley/.rvm/gems/ruby-1.9.2-p290/gems/sqlite3-1.3.6/lib/sqlite3/statement.rb:67:in `execute'
from programs.rb:57:in `get_program_details'
from programs.rb:22:in `block in get_link'
from /Users/ashley/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/csv.rb:1768:in `each'
from /Users/ashley/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/csv.rb:1202:in `block in foreach'
from /Users/ashley/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/csv.rb:1340:in `open'
from /Users/ashley/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/csv.rb:1201:in `foreach'
from programs.rb:20:in `get_link'
from programs.rb:63:in `<module:Test>'
from programs.rb:15:in `<main>'
And here's my code:
require 'net/http'
require 'json'
require 'nokogiri'
require 'open-uri'
require 'csv'
require 'sqlite3'
require "bundler/setup"
require "capybara"
require "capybara/dsl"
Capybara.run_server = false
Capybara.default_driver = :selenium
Capybara.current_driver = :selenium
module Test
  class Tree
    include Capybara::DSL
    def get_link
      CSV.foreach("links.csv") do |row|
        link = row[0]
        get_details(link)
      end
    end
    def get_details(link)
      db = SQLite3::Database.open "development.sqlite3"
      address = []
      address_text = []
      visit("#{link}")
      name = find("#listing_detail_header").find("h3").text
      page.find(:xpath, "//div[@id='listing_detail_header']").all(:xpath, "//span/span").each {|span| address << span }
      if address.size == 4
        street_address = address[0].text
        address.shift
        address.each {|a| address_text << a.text }
        city_state_address = address_text.join(", ")
      else
        puts link
        street_address = ""
        city_state_address = ""
      end
      if page.has_css?('.provider-click_to_call')
        find(".provider-click_to_call").click
        phone_number = find("#phone_number").text.gsub(/[()]/, "").gsub(" ", "-")
      else
        phone_number = ""
      end
      if page.has_css?('.provider-website_link')
        website = find(".provider-website_link")[:href]
      else
        website = ""
      end
      description = find(".listing_details_list").find("p").text
      db_prepare_location = db.prepare("INSERT INTO programs(name, city_state_address, street_address, phone_number, website, description) VALUES (?, ?, ?, ?, ?, ?)")
      db_prepare_location.bind_params name, city_state_address, street_address, phone_number, website, description
      db_prepare_location.execute
    end
  end
  test = Test::Tree.new
  test.get_link
end
What is the problem here and what can I do to fix it? Let me know if additional info is needed.

You could be running out of file descriptors. Every time you call get_details, you open the SQLite database:
db = SQLite3::Database.open "development.sqlite3"
but you never explicitly close it; instead, you're relying on the garbage collector to clean up all those database objects and close their file descriptors. Each time you open the database a file descriptor is allocated, and closing the database frees it. If you call get_details faster than the GC cleans up, you will run out of file descriptors and subsequent SQLite3::Database.open calls will fail with this "unable to open database file" error. On macOS (which your /Users/ashley/... paths suggest), the default soft limit is 256 open files per process (check it with ulimit -n), which lines up with the failure after about 250 inserts.
Try adding db.close at the end of get_details.
You'll probably have to close the prepared statement as well, since SQLite won't close a connection that still has unfinalized statements, so call db_prepare_location.close before db.close:
def get_details(link)
  # ...
  db_prepare_location.close
  db.close
end
Yes, Ruby has garbage collection, but that doesn't mean you never have to manage your resources by hand.
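The same rule applies to anything in Ruby that wraps a file descriptor. A minimal sketch of the open/ensure/close pattern, shown with a plain File so it runs with the standard library alone; with_closing is a hypothetical helper, not part of any library. The loop runs 300 times, more iterations than the failing script managed, and still succeeds because each handle is closed before the next one is opened. The sqlite3 gem offers a block form with the same effect (db.prepare(sql) { |stmt| ... } closes the statement when the block exits), but check the documentation for your gem version.

```ruby
require 'tempfile'

# Hypothetical helper: yields an open resource and always closes it afterwards,
# even if the block raises (the equivalent of closing the statement and db).
def with_closing(resource)
  yield resource
ensure
  resource.close unless resource.closed?
end

tmp = Tempfile.new('rows')   # keep a reference so the file isn't cleaned up early
path = tmp.path
files = []

# Simulate get_link calling get_details once per CSV row.
300.times do |i|
  with_closing(File.open(path, 'a')) do |f|
    f.puts "row #{i}"
    files << f
  end
end

puts files.all?(&:closed?)       # every descriptor was released
puts File.readlines(path).size   # every row was written
```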
Another option (which DGM was hinting at) would be to open a connection to the database in your constructor:
def initialize
  @db = SQLite3::Database.open "development.sqlite3"
end
and then drop your SQLite3::Database.open call in get_details and use @db instead. You wouldn't need a db.close in get_details anymore, but you'd still want the db_prepare_location.close call.


NameError Exception: undefined local variable or method `products' for Wheyscrapper:Class

I'm building a small web scraper in Ruby and I'm now trying to refactor my code. Unfortunately, I'm running into some errors while refactoring; this is one of them.
Basically, I'm calling two separate methods from the first method, whey_scrapper. Each of these two methods is responsible for scraping a specific item on the webpage. When I run and debug this code with byebug and try to display the products or prices I've scraped, I get an error message saying that 'products' or 'prices' is undefined. This is my current code:
require 'open-uri'
require 'nokogiri'
require 'httparty'
require 'byebug'
require 'csv'
class Wheyscrapper
  def whey_scrapper
    company = 'Body+%26+fit'
    url = "https://www.bodyenfitshop.nl/afslanken/afslank-toppers/?manufacturer=#{company}"
    unparsed_page = open(url).read
    parsed_page = Nokogiri::HTML(unparsed_page)
    product_scrapper
    prices_scrapper
    # csv = CSV.open('wheyprotein.csv', 'wb')
  end
  def product_scrapper
    products = Array.new
    product_names = parsed_page.css('div.product-primary')
    product_names.each do |product_name|
      product = {
        name: product_name.css('h2.product-name').text
      }
      products << product
    end
  end
  def prices_scrapper
    prices = Array.new
    product_prices = parsed_page.css('div.price-box')
    product_prices.each do |product_price|
      price = {
        amount: product_price.css('span.price').text
      }
      prices << price
    end
  end
  byebug
  whey_scrapper
end
There's a lot going on here, but to make it more idiomatic Ruby you'd consider making those values lazy-initialized and giving the methods names that reflect what they return:
require 'open-uri'
require 'nokogiri'
class Wheyscrapper
  URL = "https://www.bodyenfitshop.nl/afslanken/afslank-toppers/?%s"
  attr_reader :company, :url
  def initialize(company:)
    @company = company
    # Use encode_www_form to encode query-string parameters
    @url = URL % URI.encode_www_form(manufacturer: company)
  end
  def document
    # Lazy-initialize a parsed version of the page (URI.open: Kernel#open no longer opens URLs in Ruby 3.0+)
    @document ||= Nokogiri::HTML(URI.open(url).read)
  end
  def products
    document.css('div.product-primary').map do |product_name|
      {
        name: product_name.css('h2.product-name').text
      }
    end
  end
  def prices
    document.css('div.price-box').map do |product_price|
      {
        amount: product_price.css('span.price').text
      }
    end
  end
end
This fixes a lot of the data-propagation problems you had in your original. A variable you assign inside a method is a local variable, meaning it doesn't exist outside of that particular call of that particular method. If you want to persist it for longer, you need to use instance variables, as in @products, or define methods that return the data you need.
The above approach combines that, using a lazy-initialized instance variable to persist the parsed document, and exposes that as a method the other methods can use.
Now you can spin this up:
scraper = Wheyscrapper.new(company: "Body & Fit")
After that, everything should be available directly:
scraper.prices
scraper.products
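To see both points in isolation, here is a minimal sketch that runs without Nokogiri or network access. LocalOnly reproduces the NameError from the question, and LazyScraper shows ||= caching a value in an instance variable; both class names and load_document (a stand-in for Nokogiri::HTML(URI.open(url).read)) are hypothetical.

```ruby
# Local variables vanish when the method returns: this reproduces the NameError.
class LocalOnly
  def fill
    products = ['Whey']   # local to this one call of fill
  end

  def show
    products              # not defined in this method, so Ruby raises NameError
  end
end

# Lazy initialization with ||= keeps the expensive value in an instance variable.
class LazyScraper
  attr_reader :load_count

  def initialize
    @load_count = 0
  end

  def document
    @document ||= load_document   # load_document runs only on the first call
  end

  def products
    document.fetch(:products)
  end

  def prices
    document.fetch(:prices)
  end

  private

  # Hypothetical stand-in for fetching and parsing the page
  def load_document
    @load_count += 1
    { products: ['Whey', 'Casein'], prices: ['29.95', '34.50'] }
  end
end

begin
  LocalOnly.new.show
rescue NameError => e
  puts e.message
end

scraper = LazyScraper.new
p scraper.products
p scraper.prices
p scraper.load_count
```

Because products and prices both go through document, the page is fetched and parsed once no matter how many times either method is called.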
When you learn how to use Ruby effectively you'll often find solutions to your problems that are really minimal. Usually a lot of Ruby code is a sign that it's not being used properly.
This should really be refactored in a better way, but based on my comments above, this should at least work without a refactor:
require 'open-uri'
require 'nokogiri'
require 'httparty'
require 'csv'
class Wheyscrapper
  def whey_scrapper
    company = 'Body+%26+fit'
    url = "https://www.bodyenfitshop.nl/afslanken/afslank-toppers/?manufacturer=#{company}"
    unparsed_page = URI.open(url).read # Kernel#open no longer opens URLs in Ruby 3.0+
    @parsed_page = Nokogiri::HTML(unparsed_page)
    product_scrapper
    prices_scrapper
    # csv = CSV.open('wheyprotein.csv', 'wb')
  end
  def product_scrapper
    @products = Array.new
    product_names = @parsed_page.css('div.product-primary')
    product_names.each do |product_name|
      product = {
        name: product_name.css('h2.product-name').text
      }
      @products << product
    end
  end
  def prices_scrapper
    @prices = Array.new
    @product_prices = @parsed_page.css('div.price-box')
    @product_prices.each do |product_price|
      price = {
        amount: product_price.css('span.price').text
      }
      @prices << price
    end
  end
end
w = Wheyscrapper.new.whey_scrapper

How to refresh a large database?

I built a rake task to download a zip from the Awin datafeed and import it into my Product model via activerecord-import.
require 'csv'
require 'zip'
require 'httparty'
require 'active_record'
require 'activerecord-import'
namespace :affiliate_datafeed do
  desc "Import products data from Awin"
  task import_product_awin: :environment do
    url = "https://productdata.awin.com"
    dir = "db/affiliate_datafeed/awin.zip"
    File.open(dir, "wb") do |f|
      f.write HTTParty.get(url).body
    end
    zip_file = Zip::File.open(dir)
    entry = zip_file.glob('*.csv').first
    csv_text = entry.get_input_stream.read
    products = []
    CSV.parse(csv_text, :headers=>true).each do |row|
      products << Product.new(row.to_h)
    end
    Product.import(products)
  end
end
How can I update the product table only if a product doesn't exist yet or if there is a newer date in the last_updated field? What is the best way to refresh a large database?
You could use something like the following in your rake task to keep checking the last_updated field (or the Last-Modified header of the HTTP response) and only run the update when it has changed.
require 'csv'
require 'date'
UPDATE_INTERVAL = 3600 # seconds between checks; use whatever interval you want
def get_date
  # Assumes last_updated is in the first row of the CSV; alternatively use the
  # Last-Modified header of your HTTP response
  first_row = CSV.foreach('CSV_raw.csv', headers: false).first
  Date.parse(first_row.compact[1])
end
run_once = ARGV.length > 0 # pass any argument to run once and test that it works; not sure if rake tasks accept args this way
puts "Daemon Mode" unless run_once
begin
  last_modified = get_date
  stored = File.exist?('last_update.txt') ? File.read('last_update.txt').strip : ''
  date_in_file = stored.empty? ? Date.parse('2001-02-03') : Date.parse(stored)
  if last_modified > date_in_file
    # your db updating method goes here
    File.write('last_update.txt', last_modified.to_s) # remember which version was imported
  end
  sleep UPDATE_INTERVAL unless run_once
end until run_once
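For the "only new or updated products" part, a minimal sketch of the filtering step. It assumes the feed has product_id and last_updated columns (hypothetical names; check the actual Awin headers) and that you have loaded the dates already stored in your database into a hash, for example with something like Product.pluck(:product_id, :last_updated).to_h.

```ruby
require 'csv'
require 'date'

# Keep only rows that are new, or whose last_updated is later than the stored date.
# `existing` maps product_id (String) => Date of the version already in the database.
def rows_to_refresh(csv_text, existing)
  CSV.parse(csv_text, headers: true).select do |row|
    stored = existing[row['product_id']]
    stored.nil? || Date.parse(row['last_updated']) > stored
  end
end

feed = <<~FEED
  product_id,name,last_updated
  1,Whey,2020-01-05
  2,Casein,2020-01-01
  3,Protein bar,2020-02-01
FEED

existing = { '1' => Date.new(2020, 1, 1), '2' => Date.new(2020, 1, 1) }
changed = rows_to_refresh(feed, existing)
p changed.map { |row| row['product_id'] }
```

The selected rows can then be turned into Product instances and passed to Product.import. activerecord-import also has an on_duplicate_key_update: option that lets the database decide between insert and update on adapters that support upserts; check its README for your database before relying on it.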

Sinatra - Saving Twilio SMS to CSV

I have a very simple Sinatra app that allows me to receive SMS messages through my Twilio number and will print them to the same terminal session that the app is running on. I would like to save these messages to a local .csv file. Adding CSV.open() to the app throws some errors.
require 'sinatra'
require 'twilio-ruby'
require 'csv'
post '/receive_sms' do
  @body = params["Body"].to_s
  @sid = params["MessageSid"].to_s
  @sender = params["From"].delete('+').to_i
  content_type 'text/xml'
  puts @body
  puts @sender
  puts @sid
  CSV.open("/home/ubuntu/Twilio_SMS/smsLog.csv", "a") do |csv|
    csv << [@sender, @body, @sid]
  end
end
This gives me the following errors:
ERROR IOError: closed stream
/home/ubuntu/.rvm/gems/ruby-2.2.1/gems/rack-1.6.4/lib/rack/body_proxy.rb:16:in `close'
/home/ubuntu/.rvm/gems/ruby-2.2.1/gems/rack-1.6.4/lib/rack/handler/webrick.rb:117:in `ensure in service'
/home/ubuntu/.rvm/gems/ruby-2.2.1/gems/rack-1.6.4/lib/rack/handler/webrick.rb:117:in `service'
/home/ubuntu/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/webrick/httpserver.rb:138:in `service'
/home/ubuntu/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/webrick/httpserver.rb:94:in `run'
/home/ubuntu/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/webrick/server.rb:294:in `block in start_thread'
I have tried moving the CSV call outside of the post route, but this only writes an empty row (,,) to the file every time I start the application.
What is the proper way to save this information to the CSV file and make sure that every message is added, even if they are received in rapid succession?
Try returning a valid value from the route block:
require 'sinatra'
require 'twilio-ruby'
require 'csv'
post '/receive_sms' do
  @body = params["Body"].to_s
  @sid = params["MessageSid"].to_s
  @sender = params["From"].delete('+').to_i
  content_type 'text/xml'
  puts @body
  puts @sender
  puts @sid
  CSV.open("/home/ubuntu/Twilio_SMS/smsLog.csv", "a") do |csv|
    csv << [@sender, @body, @sid]
  end
  'done'
end
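Because CSV.open was the last expression in the route, its return value became the response body. With a block, CSV.open returns whatever the block returns, here the CSV object itself (csv << returns the CSV), and that object has already been closed by the time the block finishes. Rack then tried to use that closed object as the response body (your backtrace shows the failure in body_proxy.rb's close), which raised IOError: closed stream. Returning a plain string such as 'done' avoids this.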
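On the second half of the question (messages arriving in rapid succession): WEBrick handles each request on its own thread, so two appends could in principle run at the same time. A minimal sketch, assuming a single server process, that serializes writes with a Mutex and also shows what CSV.open returns with a block. append_sms and the temp-file path are hypothetical; a Mutex does not protect against several processes writing the same file, which would need a file lock (File#flock) or a database.

```ruby
require 'csv'
require 'tempfile'

CSV_LOCK = Mutex.new # one lock shared by every request thread in this process

# Hypothetical helper: appends one SMS as a CSV row and returns a plain string,
# so a Sinatra route could end with append_sms(...) and still send a valid body.
def append_sms(path, sender, body, sid)
  CSV_LOCK.synchronize do
    CSV.open(path, 'a') { |csv| csv << [sender, body, sid] }
  end
  'done'
end

log = Tempfile.new(['smsLog', '.csv'])
path = log.path

# With a block, CSV.open returns the block's value: here the CSV object, already closed.
returned = CSV.open(path, 'a') { |csv| csv }
puts returned.class

# Simulate 50 messages arriving at once on separate threads.
threads = 50.times.map do |i|
  Thread.new { append_sms(path, "1555000#{i}", "message #{i}", "SM#{i}") }
end
threads.each(&:join)

rows = CSV.read(path)
puts rows.size
```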

EventMachine server and serial-port using SQL

I'm new to Ruby.
I'm trying to make an app that reads from a serial port and puts the values into a sqlite3 database. When a client connects via TCP socket, it should receive values from the db. Values written by the client should be sent via the serial port.
I have two questions regarding my app.
This would open one connection to the db on the main thread(?) and one for each client.
Is there a better way to use sqlite3?
I think I figured this out: sqlite3 is not thread-safe by default, so this seems like the way to do it.
How do I write to the serial port in the receive_data method? Is it okay to make serial a global variable?
#!/usr/bin/env ruby
#
# server_1
require 'rubygems'
require 'eventmachine'
require 'sqlite3'
require 'em-serialport'
require 'json'
module SocketClient
  def self.list
    @list ||= []
  end
  def post_init
    SocketClient.list << self
    @db = SQLite3::Database.new( "data.db" )
    values = []
    @db.execute("SELECT * FROM values") do |row|
      values << {row[0] => row[1]} #id => value
    end
    self.send_data "#{values.to_json}\n"
    p "Client connected"
  end
  def unbind
    SocketClient.list.delete self
    @db.close
  end
  def receive_data data
    p data
    # How do I send via the serial port from here??? serial.send_data data
  end
end
db = SQLite3::Database.new( "data.db" )
EM.run{
  EM.start_server '0.0.0.0', 8081, SocketClient
  serial = EM.open_serial '/dev/tty.usbserial-xxxxxxxx', 9600, 8, 1, 0
  serial.on_data do |data|
    #Parse data into an array called values
    db.execute("UPDATE values SET value = ? WHERE id = ?", values["value"], values["id"])
    SocketClient.list.each{ |c| c.send_data "#{values.to_json}\n" }
  end
}
db.close
Set up a constructor for your SocketClient so that it receives the shared serial connection:
module SocketClient
  def initialize serial
    @serial = serial
  end
  def receive_data data
    p data
    @serial.send_data data
  end
  # ... post_init and unbind as before
end
Then pass it in when you call EM.start_server; EventMachine hands any arguments after the handler to each connection's initialize:
EM.run{
  serial = EM.open_serial '/dev/tty.usbserial-xxxxxxxx', 9600, 8, 1, 0
  EM.start_server '0.0.0.0', 8081, SocketClient, serial
  # ... serial.on_data handler as before
}
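To see why this works without a global, here is a minimal stand-in that runs without EventMachine or a serial device. FakeSerial and fake_accept are hypothetical: fake_accept imitates what, as I understand the EventMachine source, EM.start_server does for each accepted connection when the handler is a module, namely build a class (a subclass of EM::Connection in the real library) that includes the module and pass the extra start_server arguments to its initialize. Every client ends up holding a reference to the same serial object.

```ruby
# Stand-in for the object returned by EM.open_serial: records what it was sent.
class FakeSerial
  attr_reader :sent

  def initialize
    @sent = []
  end

  def send_data(data)
    @sent << data
  end
end

module SocketClient
  def initialize(serial)
    @serial = serial          # shared serial connection, injected per client
  end

  def receive_data(data)
    @serial.send_data(data)   # forward client input to the serial port
  end
end

# Hypothetical imitation of EM.start_server's per-connection setup: build a class
# that includes the handler module and pass the extra arguments on to new.
def fake_accept(handler, *args)
  Class.new { include handler }.new(*args)
end

serial = FakeSerial.new
client_a = fake_accept(SocketClient, serial)
client_b = fake_accept(SocketClient, serial)
client_a.receive_data("LED ON\n")
client_b.receive_data("LED OFF\n")
p serial.sent
```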

How do I query a MS Access database table, and export the information to Excel using Ruby and win32ole?

I'm new to Ruby, and I'm trying to query an existing MS Access database for information for a report. I want this information stored in an Excel file. How would I do this?
Try one of these:
OLE:
require 'win32ole'
class AccessDbExample
  @ado_db = nil
  # Setup the DB connections
  def initialize filename
    @ado_db = WIN32OLE.new('ADODB.Connection')
    @ado_db['Provider'] = "Microsoft.Jet.OLEDB.4.0"
    @ado_db.Open(filename)
  rescue Exception => e
    puts "ADO failed to connect"
    puts e
  end
  def table_to_csv table
    sql = "SELECT * FROM #{table};"
    results = WIN32OLE.new('ADODB.Recordset')
    results.Open(sql, @ado_db)
    File.open("#{table}.csv", 'w') do |file|
      fields = []
      results.Fields.each{|f| fields << f.Name}
      file.puts fields.join(',')
      results.GetRows.transpose.each do |row|
        file.puts row.join(',')
      end
    end unless results.EOF
    self
  end
  def cleanup
    @ado_db.Close unless @ado_db.nil?
  end
end
AccessDbExample.new('test.mdb').table_to_csv('colors').cleanup
ODBC:
require 'odbc'
include ODBC
class AccessDbExample
  @odbc_db = nil
  # Setup the DB connections
  def initialize filename
    drv = Driver.new
    drv.name = 'AccessOdbcDriver'
    drv.attrs['driver'] = 'Microsoft Access Driver (*.mdb)'
    drv.attrs['dbq'] = filename
    @odbc_db = Database.new.drvconnect(drv)
  rescue
    puts "ODBC failed to connect"
  end
  def table_to_csv table
    sql = "SELECT * FROM #{table};"
    result = @odbc_db.run(sql)
    return nil if result == -1
    File.open("#{table}.csv", 'w') do |file|
      header_row = result.columns(true).map{|c| c.name}.join(',')
      file.puts header_row
      result.fetch_all.each do |row|
        file.puts row.join(',')
      end
    end
    self
  end
  def cleanup
    @odbc_db.disconnect unless @odbc_db.nil?
  end
end
AccessDbExample.new('test.mdb').table_to_csv('colors').cleanup
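One caveat with both versions: row.join(',') produces a broken file as soon as a value contains a comma, a quote, or a newline. A minimal sketch of the writing step using Ruby's csv library instead, which quotes fields as needed. rows_to_csv is a hypothetical helper; you could call it with the field names and the row arrays you already have (results.GetRows.transpose for ADO, result.fetch_all for ODBC), and the resulting file opens directly in Excel.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: writes a header row plus data rows, letting the csv
# library quote any value that contains commas, quotes, or newlines.
def rows_to_csv(path, headers, rows)
  CSV.open(path, 'w') do |csv|
    csv << headers
    rows.each { |row| csv << row }
  end
  path
end

path = File.join(Dir.tmpdir, 'colors.csv')
rows_to_csv(path, %w[id name notes], [[1, 'red', 'warm, bright'], [2, 'blue', 'said "cool"']])
puts File.read(path)
```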
Why do you want to do this? You can simply query your db from Excel directly. Check out this tutorial.
As Johannes said, you can query the database from Excel.
If, however, you would prefer to work with Ruby...
You can find info on querying Access/Jet databases with Ruby here.
Lots of info on automating Excel with Ruby can be found here.
