ORM for SQL Scripting - Ruby

What is the best way to run simple SQL scripts against a database (preferably in a DBMS-agnostic way)?
So, for illustration purposes, using your best/suggested way, I'd like to see a script that creates a few tables with names from an array ['cars_table', 'ice_cream_t'], deletes all rows with id=5 from a table, and does a join between two tables and prints the result formatted in some nice way.
I've heard of using Python or PL/SQL for this
Ruby/Datamapper seems very attractive
Java + JDBC, maybe
Others?
Some of these are mostly used in a full application or within a framework. I'd like to see them used simply in scripts.

Ruby/Sequel is currently my weapon of choice.
Short example from the site:
require "rubygems"
require "sequel"
# connect to an in-memory database
DB = Sequel.sqlite
# create an items table
DB.create_table :items do
  primary_key :id
  String :name
  Float :price
end
# create a dataset from the items table
items = DB[:items]
# populate the table
items.insert(:name => 'abc', :price => rand * 100)
items.insert(:name => 'def', :price => rand * 100)
items.insert(:name => 'ghi', :price => rand * 100)
# print out the number of records
puts "Item count: #{items.count}"
# print out the average price
puts "The average price is: #{items.avg(:price)}"

By using SQL DDL (Data Definition Language), which can be written database-agnostically if you're careful.
There are examples at the Wikipedia article:
http://en.wikipedia.org/wiki/Data_Definition_Language
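If you prefer the raw-DDL route, here is a small sketch (table and column names are made up, and it sticks to Ruby for consistency with the rest of the thread) that runs portable, ANSI-ish SQL through Sequel's DB.run and then queries with a plain SQL string:
require 'sequel'
DB = Sequel.sqlite  # any Sequel adapter accepts raw SQL the same way
# keep the statements close to ANSI SQL so they stay portable
DB.run "CREATE TABLE cars_table (id INTEGER PRIMARY KEY, label VARCHAR(50))"
DB.run "CREATE TABLE ice_cream_t (id INTEGER PRIMARY KEY, label VARCHAR(50))"
DB.run "DELETE FROM cars_table WHERE id = 5"
# a dataset can be built straight from an SQL string
DB["SELECT c.label AS car, i.label AS flavour FROM cars_table c JOIN ice_cream_t i ON c.id = i.id"].each do |row|
  puts format('%-12s %-12s', row[:car], row[:flavour])
end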

Related

Sequel: How to use 'order by' in a view with tinytds

I need to create a view with an ORDER BY clause using Sequel, TinyTDS and MSSQL.
When I do so, I get the error:
TinyTds::Error: The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified. (Sequel::DatabaseError)
My example code:
require 'sequel'
DB = Sequel.tinytds(
  :host     => 'server',
  :database => 'DB',
)
#Remove data from previous test
DB.drop_table(:testtab1) if DB.table_exists?(:testtab1)
DB.drop_view(:v_testtab1) rescue Sequel::DatabaseError
DB.drop_view(:v_testtab2) rescue Sequel::DatabaseError
DB.create_table(:testtab1){
  primary_key :id
  field :a, :type => :nvarchar, :size => 10
  field :b, :type => :nvarchar, :size => 10
}
#Here the error comes up
#"SELECT * FROM `testtab1` ORDER BY `b`"
DB.create_view(:v_testtab1, DB[:testtab1].order_by(:b))
The solution on the SQL side is easy. Instead of the
SELECT * FROM `testtab1` ORDER BY `b`
I need a
SELECT top 100 percent * FROM `testtab1` ORDER BY `b`
I found a solution using an additional dummy column (without the dummy column I get a syntax error from the dangling comma):
sel = DB[:testtab1].select(Sequel.lit('top 100 percent "" as dummy'), *DB[:testtab1].columns)
#SELECT top 100 percent "" as dummy, [ID], [A], [B] FROM [TESTTAB1]
DB.create_view(:v_testtab2, sel.order_by(:b))
A similar solution can be made with limit:
#Take a big number to get all entries.
#DB[:testtab1].count would fix the number at view creation time, not at query time.
sel = DB[:testtab1].limit(99999999999)
#SELECT TOP (99999999999) * FROM [TESTTAB1]
DB.create_view(:v_testtab3, sel.order_by(:b))
But I'm looking for a nicer solution. Is there a better way?
If it is important:
Ruby 2.1
Sequel 4.19
tiny_tds-0.6.2-x64-mingw32
MSSQL 10.50.2500.0, 64 bit

Stream based parsing and writing of JSON

I fetch about 20,000 datasets from a server in batches of 1,000. Each dataset is a JSON object. Persisted, this comes to around 350 MB of uncompressed plain text.
I have a memory limit of 1 GB. Hence, I append each batch of 1,000 JSON objects to a raw JSON file as an array.
The result is a file with 20 JSON arrays which need to be aggregated. I need to touch them anyway, because I want to add metadata. Generally the Ruby Yajl parser makes this possible, like so:
raw_file = File.new(path_to_raw_file, 'r')
json_file = File.new(path_to_json_file, 'w')
datasets = []
parser = Yajl::Parser.new
parser.on_parse_complete = Proc.new { |o| datasets += o }
parser.parse(raw_file)
hash = { date: Time.now, datasets: datasets }
Yajl::Encoder.encode(hash, json_file)
Where is the problem with this solution? The problem is that the whole JSON is still parsed into memory, which I must avoid.
Basically what I need is a solution which parses the JSON from one IO object and encodes it to another IO object at the same time.
I assumed Yajl offers this, but I haven't found a way, nor did its API give any hints, so I guess not. Is there a JSON Parser library which supports this? Are there other solutions?
The only solution I can think of is to use IO.seek. Write all the dataset arrays one after another [...][...][...] and, after every array, seek back and overwrite the ][ boundary with ,, effectively connecting the arrays manually.
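A rough sketch of that idea (purely illustrative; append_batch is a made-up helper, and it assumes every batch is already encoded as a complete JSON array string and that the file ends with ] and no trailing newline):
# merge a new batch (a complete JSON array string) into the single
# array already stored in the file
def append_batch(path, batch_json)
  if File.exist?(path) && File.size(path) > 0
    File.open(path, 'r+') do |f|
      f.seek(-1, IO::SEEK_END)     # land on the closing ']'
      f.write(',')                 # turn it into a separator ...
      f.write(batch_json[1..-1])   # ... and append the batch without its leading '['
    end
  else
    File.write(path, batch_json)   # the first batch starts the file
  end
end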
Why can't you retrieve a single record at a time from the database, process it as necessary, convert it to JSON, then emit it with a trailing/delimiting comma?
If you started with a file that contained only [, then appended each JSON string followed by a comma, left the comma off the final entry, and finished with a closing ], you'd have a JSON array of hashes and would only have to process one row's worth at a time.
It'd be a tiny bit slower (maybe) but wouldn't strain your system. And DB I/O can be very fast if you use blocking/paging to retrieve a reasonable number of records at a time.
For instance, here's a combination of some Sequel example code, and code to extract the rows as JSON and build a larger JSON structure:
require 'json'
require 'sequel'
DB = Sequel.sqlite # memory database
DB.create_table :items do
  primary_key :id
  String :name
  Float :price
end
items = DB[:items] # Create a dataset
# Populate the table
items.insert(:name => 'abc', :price => rand * 100)
items.insert(:name => 'def', :price => rand * 100)
items.insert(:name => 'ghi', :price => rand * 100)
add_comma = false
puts '['
items.order(:price).each do |item|
  puts ',' if add_comma
  add_comma ||= true
  print JSON[item]
end
puts "\n]"
Which outputs:
[
{"id":2,"name":"def","price":3.714714089426208},
{"id":3,"name":"ghi","price":27.0179624376119},
{"id":1,"name":"abc","price":52.51248221170203}
]
Notice the order is now by "price".
Validation is easy:
require 'json'
require 'pp'
pp JSON[<<EOT]
[
{"id":2,"name":"def","price":3.714714089426208},
{"id":3,"name":"ghi","price":27.0179624376119},
{"id":1,"name":"abc","price":52.51248221170203}
]
EOT
Which results in:
[{"id"=>2, "name"=>"def", "price"=>3.714714089426208},
{"id"=>3, "name"=>"ghi", "price"=>27.0179624376119},
{"id"=>1, "name"=>"abc", "price"=>52.51248221170203}]
This validates the JSON and demonstrates that the original data is recoverable. Each row retrieved from the database should be a minimal "bitesized" piece of the overall JSON structure you want to build.
Building upon that, here's how to read incoming JSON in the database, manipulate it, then emit it as a JSON file:
require 'json'
require 'sequel'
DB = Sequel.sqlite # memory database
DB.create_table :items do
  primary_key :id
  String :json
end
items = DB[:items] # Create a dataset
# Populate the table
items.insert(:json => JSON[:name => 'abc', :price => rand * 100])
items.insert(:json => JSON[:name => 'def', :price => rand * 100])
items.insert(:json => JSON[:name => 'ghi', :price => rand * 100])
items.insert(:json => JSON[:name => 'jkl', :price => rand * 100])
items.insert(:json => JSON[:name => 'mno', :price => rand * 100])
items.insert(:json => JSON[:name => 'pqr', :price => rand * 100])
items.insert(:json => JSON[:name => 'stu', :price => rand * 100])
items.insert(:json => JSON[:name => 'vwx', :price => rand * 100])
items.insert(:json => JSON[:name => 'yz_', :price => rand * 100])
add_comma = false
puts '['
items.each do |item|
  puts ',' if add_comma
  add_comma ||= true
  print JSON[
    JSON[
      item[:json]
    ].merge('foo' => 'bar', 'time' => Time.now.to_f)
  ]
end
puts "\n]"
Which generates:
[
{"name":"abc","price":3.268814929005337,"foo":"bar","time":1379688093.124606},
{"name":"def","price":13.871147312377719,"foo":"bar","time":1379688093.124664},
{"name":"ghi","price":52.720984131655676,"foo":"bar","time":1379688093.124702},
{"name":"jkl","price":53.21477190840114,"foo":"bar","time":1379688093.124732},
{"name":"mno","price":40.99364022416619,"foo":"bar","time":1379688093.124758},
{"name":"pqr","price":5.918738444452265,"foo":"bar","time":1379688093.124803},
{"name":"stu","price":45.09391752439902,"foo":"bar","time":1379688093.124831},
{"name":"vwx","price":63.08947792357426,"foo":"bar","time":1379688093.124862},
{"name":"yz_","price":94.04921035056373,"foo":"bar","time":1379688093.124894}
]
I added the timestamp so you can see that each row is processed individually, AND to give you an idea how fast the rows are being processed. Granted, this is a tiny, in-memory database with no network I/O to contend with, but a normal network connection through a switch to a database on a reasonable DB host should be pretty fast too. Telling the ORM to read the DB in chunks can speed up the processing, because the DBM can return larger blocks and fill the packets more efficiently. You'll have to experiment to determine what chunk size you need, because it varies with your network, your hosts, and the size of your records.
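With Sequel, Dataset#paged_each is one way to do that chunked reading. A minimal sketch (the rows-per-fetch value is just a starting point to tune, not a recommendation):
# fetch the rows page by page (LIMIT/OFFSET under the hood), so only one
# chunk is held in memory at a time; the dataset must have an order
items.order(:id).paged_each(:rows_per_fetch => 1000) do |item|
  print JSON[item]
end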
Your original design isn't good when dealing with enterprise-sized databases, especially when your hardware resources are limited. Over the years we've learned how to parse BIG databases, which makes 20,000-row tables seem minuscule. VM slices are common these days and we use them for crunching, so they're often the PCs of yesteryear: a single CPU with a small memory footprint and dinky drives. We can't beat them up or they'll become bottlenecks, so we have to break the data into the smallest atomic pieces we can.
Harping about DB design: Storing JSON in a database is a questionable practice. DBMs these days can spew JSON, YAML and XML representations of rows, but forcing the DBM to search inside stored JSON, YAML or XML strings is a major hit in processing speed, so avoid it at all costs unless you also have the equivalent lookup data indexed in separate fields so your searches are at the highest possible speed. If the data is available in separate fields, then doing good ol' database queries, tweaking in the DBM or your scripting language of choice, and emitting the massaged data becomes a lot easier.
It is possible via the JSON::Stream or Yajl::FFI gems. You will have to write your own callbacks, though. Some hints on how to do that can be found here and here.
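For example, with the json-stream gem the callback interface looks roughly like this (a sketch; the callback bodies are just placeholders, and in practice you'd buffer keys/values and re-emit them to your output IO):
require 'json/stream'
parser = JSON::Stream::Parser.new
parser.start_array { puts 'array start' }   # one of the 20 batch arrays begins
parser.end_array   { puts 'array end' }
parser.key         { |k| puts "key: #{k}" }
parser.value       { |v| puts "value: #{v}" }
# stream the raw file through the parser chunk by chunk
File.open(path_to_raw_file) do |f|
  parser << f.read(4096) until f.eof?
end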
Facing a similar problem, I created the json-streamer gem, which spares you the need to write your own callbacks. It yields each object to you one by one and removes it from memory afterwards. You could then pass these to another IO object as intended.
There is a library called oj that does exactly that. It can do parsing and generation. For example, for parsing you can use Oj::Doc:
Oj::Doc.open('[3,[2,1]]') do |doc|
  result = {}
  doc.each_leaf() do |d|
    result[d.where?] = d.fetch()
  end
  result
end #=> {"/1" => 3, "/2/1" => 2, "/2/2" => 1}
You can even move around in the document using doc.move(path). It seems very flexible.
For writing documents, you can use Oj::StreamWriter:
require 'oj'
doc = Oj::StreamWriter.new($stdout)

def write_item(doc, item)
  doc.push_object
  doc.push_key "type"
  doc.push_value "item"
  doc.push_key "value"
  doc.push_value item
  doc.pop
end

def write_array(doc, array)
  doc.push_object
  doc.push_key "type"
  doc.push_value "array"
  doc.push_key "value"
  doc.push_array
  array.each do |item|
    write_item(doc, item)
  end
  doc.pop
  doc.pop
end

write_array(doc, [{a: 1}, {a: 2}]) #=> {"type":"array","value":[{"type":"item","value":{":a":1}},{"type":"item","value":{":a":2}}]}

I want to do an autocomplete with Ruby and Sequel

I am using Sequel with Postgres and Sinatra. I want to do an autocomplete search. I've verified that my jQuery, which sends a GET request, works fine.
The Ruby code is:
get '/search' do
  search = params[:search]
  DB[:candidates].select(:last).where('last LIKE ?', '_a_').each do |row|
    l = row[:last]
  end
end
The problem is the Sequel query:
I have tried every possible configuration of the query that I can think of with no luck.
So, for example, in the above query I get all the people who have "a" in their last name but when I change the query to:
DB[:candidates].select(:last).where('last LIKE ?', 'search')
or
DB[:candidates].select(:last).where('last LIKE ?', search) # (without '')
I get nothing.
I have done warn params.inspect which indicates the param search is being passed, so I am stuck.
Any ideas how the query should be written?
Finally, the second part of the question: the results (when it works with '_a_') are rendered as {:last=>"Yao"}. I would like just Yao; how can I do that?
I have tried numerous different types of query including raw SQL but no luck. Or is the approach just plain wrong?
I just installed Sequel and made a working example:
require "rubygems"
require "sequel"
# connect to an in-memory database
DB = Sequel.sqlite
# create an items table
DB.create_table :items do
  primary_key :id
  String :name
  Float :price
end
# create a dataset from the items table
items = DB[:items]
# populate the table
items.insert(:name => 'abc', :price => rand * 100)
items.insert(:name => 'def', :price => rand * 100)
items.insert(:name => 'ghi', :price => rand * 100)
items.insert(:name => 'gui', :price => rand * 100)
# print out the number of records
puts "Item count: #{items.count}"
# print out the average price
puts "The average price is: #{items.avg(:price)}"
recs = items.select(:name).where(Sequel.like(:name, 'g%'))
recs.each do |rec|
  puts rec.values
end
I think you will get the point.
UPDATED
So in your case you should try this:
DB[:candidates]
  .select(:last)
  .where(Sequel.like(:last, "#{search}%"))
  .map{|rec| rec.values}.flatten
It should return an array of the matching strings.
Copy/pasting from the Sequel documentation:
You can search SQL strings in a case-sensitive manner using the Sequel.like method:
items.where(Sequel.like(:name, 'Acme%')).sql
#=> "SELECT * FROM items WHERE (name LIKE 'Acme%')"
You can search SQL strings in a case-insensitive manner using the Sequel.ilike method:
items.where(Sequel.ilike(:name, 'Acme%')).sql
#=> "SELECT * FROM items WHERE (name ILIKE 'Acme%')"
You can specify a Regexp as a like argument, but this will probably only work on PostgreSQL and MySQL:
items.where(Sequel.like(:name, /Acme.*/)).sql
#=> "SELECT * FROM items WHERE (name ~ 'Acme.*')"
Like can also take more than one argument:
items.where(Sequel.like(:name, 'Acme%', /Beta.*/)).sql
#=> "SELECT * FROM items WHERE ((name LIKE 'Acme%') OR (name ~ 'Beta.*'))"
Open up a Sequel console (not your Sinatra app) and play with the query until you get results back. Since you say you want only the last column your query should be something like:
# Search anywhere inside the last name
DB[:candidates].where( Sequel.ilike(:last, "%#{search}%") ).select_map(:last)
# Find last names starting with the search string
DB[:candidates].where( Sequel.ilike(:last, "#{search}%") ).select_map(:last)
Uglier alternatives:
DB[:candidates]
  .select(:last)
  .where( Sequel.ilike(:last, "%#{search}%") )
  .all
  .map{ |hash| hash[:last] }

DB[:candidates]
  .select(:last)
  .where( Sequel.ilike(:last, "%#{search}%") )
  .map( :last )
If you want to rank the search results by the best matches, you might be interested in my free LiqrrdMetal library. Instead of searching on the DB, you would pull a full list of all last names into Ruby and use LiqrrdMetal to search through them. This would allow a search string of "pho" to match both "Phong" as well as "Phrogz", with the former scoring higher in the rankings.
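To tie this back to the Sinatra route, here's a minimal sketch, assuming the jQuery autocomplete widget just wants a JSON array of strings back:
require 'json'
get '/search' do
  search = params[:search]
  # find last names starting with the search string
  matches = DB[:candidates].where(Sequel.ilike(:last, "#{search}%")).select_map(:last)
  content_type :json
  matches.to_json
end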

Can I retrieve objects with Sequel from a complex query that limits results to fields from a single table?

I have a model whose rows I always want to sort based on the values in another associated model and I was thinking that the way to implement this would be to use set_dataset in the model. This is causing query results to be returned as hashes rather than objects, though, so none of the methods from the class can be used when iterating over the dataset.
I basically have two classes:
class SortFields < Sequel::Model(:sort_fields)
  set_primary_key :objectid
end

class Items < Sequel::Model(:items)
  set_primary_key :objectid
  one_to_one :sort_fields, :class => SortFields, :key => :objectid
end
Some backstory: the data is imported from a legacy system into mysql. The values in sort_fields are calculated from multiple other associated tables (some one-to-many, some many-to-many) according to some complicated rules. The likely solution will be to just add the values in sort_fields to items (I want to keep the imported data separate from the calculated data, but I don't have to). First, though, I just want to understand how far you can go with a dataset and still get objects rather than hashes.
If I set the dataset to sort on a field in items like so:
class Items < Sequel::Model(:items)
  set_primary_key :objectid
  one_to_one :sort_fields, :class => SortFields, :key => :objectid
  set_dataset(order(:sortnumber))
end
then the expected clause is added to the generated SQL, e.g.:
>> Items.limit(1).sql
=> "SELECT * FROM `items` ORDER BY `sortnumber` LIMIT 1"
and queries still return objects:
>> Items.limit(1).first.class
=> Items
If I order it by the associated fields though...
class Items < Sequel::Model(:items)
  set_primary_key :objectid
  one_to_one :sort_fields, :class => SortFields, :key => :objectid
  set_dataset(
    eager_graph(:sort_fields).
    order(:sort1, :sort2, :sort3)
  )
end
...I get hashes:
>> Items.limit(1).first.class
=> Hash
My first thought was that this happens because all the fields from sort_fields are included in the results, and that maybe if I selected only the fields from items I would get Items objects again:
class Items < Sequel::Model(:items)
  set_primary_key :objectid
  one_to_one :sort_fields, :class => SortFields, :key => :objectid
  set_dataset(
    eager_graph(:sort_fields).
    select(:items.*).
    order(:sort1, :sort2, :sort3)
  )
end
The generated SQL is what I would expect:
>> Items.limit(1).sql
=> "SELECT `items`.* FROM `items` LEFT OUTER JOIN `sort_fields` ON (`sort_fields`.`objectid` = `items`.`objectid`) ORDER BY `sort1`, `sort2`, `sort3` LIMIT 1"
It returns the same rows as the set_dataset(order(:sortnumber)) version but it still doesn't work:
>> Items.limit(1).first.class
=> Hash
Before I add the sort fields to the items table so that they can all live happily in the same model, is there a way to tell Sequel to return an object when it wants to return a hash?
If you use #eager_graph, you must use #all instead of #each to retrieve the results in order for the graph to be processed (since you cannot eagerly load without having all instances up front), or use the eager_each plugin (which makes #each call #all internally).
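In code that might look like this (a sketch based on the class from the question, with the two options above applied; the plugin line is only needed if you want #each to keep working):
class Items < Sequel::Model(:items)
  plugin :eager_each   # optional: makes #each call #all internally
  set_primary_key :objectid
  one_to_one :sort_fields, :class => SortFields, :key => :objectid
  set_dataset(
    eager_graph(:sort_fields).
    order(:sort1, :sort2, :sort3)
  )
end

Items.limit(1).all.first.class  #=> Items (model instances, not hashes)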

Sequel default example fails when switched to postgres adapter

I'm trying to run the Sequel example from http://sequel.rubyforge.org/. Everything works fine on sqlite, but fails when I switch to postgres.
This is the connection code:
DB = Sequel.connect(:adapter=>'postgres', :host=>'localhost', :database=>'testing', :user=>'postgres', :default_schema=>'sequel')
This is the error I get:
postgres.rb:145:in `async_exec': PG::Error: ERROR: relation "items" does not exist (Sequel::DatabaseError)
LINE 1: INSERT INTO "items" ("price", "name") VALUES (12.45377636338...
I suspect the issue is that Sequel is trying to execute INSERT INTO "items" instead of INSERT INTO "sequel"."items", even though :default_schema is correctly set.
Does anyone have any idea what I'm doing wrong?
Thanks in advance.
Edit - this is the code used:
require "rubygems"
require "sequel"
# connect to an in-memory database
#DB = Sequel.sqlite
DB = Sequel.connect(:adapter=>'postgres', :host=>'localhost', :database=>'testing', :user=>'postgres', :default_schema=>'sequel')
# create an items table
DB.create_table :items do
  primary_key :id
  String :name
  Float :price
end
# create a dataset from the items table
items = DB[:items]
# populate the table
items.insert(:name => 'abc', :price => rand * 100)
items.insert(:name => 'def', :price => rand * 100)
items.insert(:name => 'ghi', :price => rand * 100)
# print out the number of records
puts "Item count: #{items.count}"
Looks like you're missing the password in the connect method (that's the only difference from the documentation example). It's common for the password to just be the username, so try that if you're not sure what the password is.
It's also suggested to use a different PostgreSQL user for each project, which also makes naming the user intuitive (the project name). That avoids potentially clashing names.
Anyway, see if this works:
DB = Sequel.postgres 'testing', host: 'localhost', default_schema: 'sequel',
                     user: 'postgres', password: 'postgres'
