I want to do an autocomplete with Ruby and Sequel

I am using Sequel with Postgres and Sinatra, and I want to build an autocomplete search. I've verified that my jQuery, which sends a GET request, works fine.
The Ruby code is:
get '/search' do
  search = params[:search]
  DB[:candidates].select(:last).where('last LIKE ?', '_a_').each do |row|
    l = row[:last]
  end
end
The problem is the Sequel query. I have tried every configuration of the query that I can think of, with no luck.
For example, with the query above I get all the people who have an "a" in their last name, but when I change the query to:
DB[:candidates].select(:last).where('last LIKE ?', 'search')
or
DB[:candidates].select(:last).where('last LIKE ?', search) # without the quotes
I get nothing.
I have run warn params.inspect, which shows that the search param is being passed, so I am stuck.
Any ideas how the query should be written?
Finally, the second part of the question: the results (when it works, with '_a_') are rendered as {:last=>"Yao"}. I would like just Yao; how can I do that?
I have tried numerous kinds of query, including raw SQL, with no luck. Or is the approach just plain wrong?

I just installed Sequel and made a working example:
require "rubygems"
require "sequel"
# connect to an in-memory database
DB = Sequel.sqlite
# create an items table
DB.create_table :items do
primary_key :id
String :name
Float :price
end
# create a dataset from the items table
items = DB[:items]
# populate the table
items.insert(:name => 'abc', :price => rand * 100)
items.insert(:name => 'def', :price => rand * 100)
items.insert(:name => 'ghi', :price => rand * 100)
items.insert(:name => 'gui', :price => rand * 100)
# print out the number of records
puts "Item count: #{items.count}"
# print out the average price
puts "The average price is: #{items.avg(:price)}"
recs = items.select(:name).where(Sequel.like(:name, 'g%'))
recs.each do |rec|
puts rec.values
end
I think you will get the point.
UPDATED
So in your case you should try this:
DB[:candidates]
  .select(:last)
  .where(Sequel.like(:last, "#{search}%"))
  .map { |rec| rec.values }
  .flatten
It should return an array of the matching strings.

Copy/pasting from the Sequel documentation:
You can search SQL strings in a case sensitive manner using the Sequel.like method:
items.where(Sequel.like(:name, 'Acme%')).sql
#=> "SELECT * FROM items WHERE (name LIKE 'Acme%')"
You can search SQL strings in a case insensitive manner using the Sequel.ilike method:
items.where(Sequel.ilike(:name, 'Acme%')).sql
#=> "SELECT * FROM items WHERE (name ILIKE 'Acme%')"
You can specify a Regexp as a like argument, but this will probably only work on PostgreSQL and MySQL:
items.where(Sequel.like(:name, /Acme.*/)).sql
#=> "SELECT * FROM items WHERE (name ~ 'Acme.*')"
Like can also take more than one argument:
items.where(Sequel.like(:name, 'Acme%', /Beta.*/)).sql
#=> "SELECT * FROM items WHERE ((name LIKE 'Acme%') OR (name ~ 'Beta.*'))"
Open up a Sequel console (not your Sinatra app) and play with the query until you get results back. Since you say you want only the last column, your query should be something like:
# Search anywhere inside the last name
DB[:candidates].where( Sequel.ilike(:last, "%#{search}%") ).select_map(:last)
# Find last names starting with the search string
DB[:candidates].where( Sequel.ilike(:last, "#{search}%") ).select_map(:last)
Uglier alternatives:
DB[:candidates]
  .select(:last)
  .where( Sequel.ilike(:last, "%#{search}%") )
  .all
  .map{ |hash| hash[:last] }

DB[:candidates]
  .select(:last)
  .where( Sequel.ilike(:last, "%#{search}%") )
  .map( :last )
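To tie this back to the original Sinatra route, here is a minimal sketch (assuming the candidates table from the question, and that the jQuery autocomplete expects a JSON array of strings):
require 'json'

get '/search' do
  search = params[:search].to_s
  # select_map returns plain strings like "Yao", not {:last=>"Yao"} hashes
  names = DB[:candidates]
            .where(Sequel.ilike(:last, "#{search}%"))
            .select_map(:last)
  content_type :json
  names.to_json
end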
If you want to rank the search results by the best matches, you might be interested in my free LiqrrdMetal library. Instead of searching on the DB, you would pull a full list of all last names into Ruby and use LiqrrdMetal to search through them. This would allow a search string of "pho" to match both "Phong" as well as "Phrogz", with the former scoring higher in the rankings.

Related

How to query multiple fields with Chewy

Let's say I have an index with multiple objects in it:
class ThingsIndex < Chewy::Index
  define_type User do
    field :full_name
  end

  define_type Post do
    field :title
  end
end
How do I search both users' full_name and posts' titles?
The docs only talk about querying one attribute like this:
ThingsIndex.query(term: {full_name: 'Foo'})
There are a couple ways you could do this. Chaining is probably the easiest:
ThingsIndex.query(term: {full_name: 'Foo'}).query(term: {title: 'Foo'})
If you need to do several queries, you might consider merging them:
query = ThingsIndex.query(term: {full_name: 'Foo'})
query = query.merge(ThingsIndex.query(term: {title: 'Foo'}))
Read more about merging here: Chewy #merge docs
Make sure to set your limit or else it only shows 10 results:
query.limit(50)
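If you would rather run a single query that matches either field, Chewy's query method also accepts raw Elasticsearch query DSL, so a bool/should query is another option (an untested sketch):
ThingsIndex.query(
  bool: {
    should: [
      { term: { full_name: 'Foo' } },
      { term: { title: 'Foo' } }
    ]
  }
).limit(50)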

How do I remove sphinx_deleted from a Sphinx query?

I am new to Ruby and ThinkingSphinx.
I have the following Sphinx query: SELECT * FROM user_core, user_delta WHERE sphinx_deleted = 0.
I do not want the WHERE sphinx_deleted = 0 condition. How do I remove it? I have removed sql_attr_uint = sphinx_deleted from my sphinx.conf file, yet I still see sphinx_deleted being passed in the query.
Here is the index file definition:
ThinkingSphinx::Index.define :user, :with => :active_record, :delta => true do
  indexes [first_name, last_name, display_name], :as => :name, :sortable => true
  indexes first_name, :sortable => true
  indexes last_name, :sortable => true
  indexes display_name, :sortable => true
  indexes email, :sortable => true
  indexes phone, :sortable => true
  indexes title, :sortable => true

  has id, :as => :user_id
  has roles(:id), :as => :role_ids
  has jurisdictions(:id), :as => :jurisdiction_ids

  set_property :delta => true
end
I do not have a sphinx_scope or default_sphinx_scope defined.
We are using thinking-sphinx-3.1.0 and ruby-2.1.0
The sphinx_deleted attribute is created by Thinking Sphinx, and is used in the following cases (using your scenario of a User model with core and delta indices in the examples):
When a User is deleted, sphinx_deleted is set to 1 for that record in both the core and delta indices - there's no point returning Sphinx records if the underlying ActiveRecord object no longer exists.
When a User is updated, the delta index is processed with the latest field and attribute details, and the core index's document has sphinx_deleted set to 1, so only the latest (accurate) information will match. e.g. if a user has their name changed from Fred to Georgina, a search for 'Fred' will not return Georgina, because the core index document (which does match) is filtered out.
That is why the attribute exists. You cannot tell Thinking Sphinx to not add it, nor can you remove that filter, short of mucking around in the internals of Thinking Sphinx.
If there is a specific reason for wanting to remove the attribute and filter, feel free to comment here, or you can open an issue on the GitHub repo, or post to the TS Google Group.
Update
Okay, further to this, there are three ways around it.
Option One:
The first way is to make the query to Sphinx yourself, using a Thinking Sphinx connection:
results = ThinkingSphinx::Connection.take do |connection|
  connection.execute "SELECT * FROM user_core, user_delta"
end
Keep in mind that this returns raw Sphinx values, not ActiveRecord instances.
Option Two:
A more complicated alternative, though, is to have your own search middleware stack. First, you'll want to create a custom subclass of ThinkingSphinx::Middlewares::SphinxQL that removes the :sphinx_deleted filter:
class SphinxQLWithoutFilter < ThinkingSphinx::Middlewares::SphinxQL
  def call(contexts)
    contexts.each do |context|
      Inner.new(context).call
    end

    app.call contexts
  end

  private

  class Inner < ThinkingSphinx::Middlewares::SphinxQL::Inner
    def inclusive_filters
      super.except :sphinx_deleted
    end
  end
end
Then, create a new middleware stack which uses this new SphinxQL query middleware:
WithoutFilterMiddleware = ::Middleware::Builder.new do
  use ThinkingSphinx::Middlewares::StaleIdFilter
  use SphinxQLWithoutFilter
  use ThinkingSphinx::Middlewares::Geographer
  use ThinkingSphinx::Middlewares::Inquirer
  use ThinkingSphinx::Middlewares::ActiveRecordTranslator
  use ThinkingSphinx::Middlewares::StaleIdChecker
  use ThinkingSphinx::Middlewares::Glazier
end
And then you can use that middleware stack in specific search queries:
User.search 'foo', :middleware => WithoutFilterMiddleware
It's worth noting the two middleware present in that stack for stale ids. They work together to catch any Sphinx results that do not have a matching ActiveRecord object, and re-run the Sphinx query up to three times filtering out those unmatched records. They're probably useful, but if you don't want to use them, you can remove them from your custom stack. However, without them, any Sphinx records that don't have matching ActiveRecord objects will be transformed into nils.
Option Three:
This is the more hackish version of the previous solution, but will apply to all searches, so probably isn't worthwhile: re-open the class that adds the filter with class_eval and change the method definition:
ThinkingSphinx::Middlewares::SphinxQL::Inner.class_eval do
  def inclusive_filters
    # normally:
    # (options[:with] || {}).merge({:sphinx_deleted => false})
    # but without the sphinx_deleted filter:
    options[:with] || {}
  end
end
Now, all that said: I presume you're not actually deleting users, but somehow the deletion callbacks are being fired anyway? Hence, users do exist but are currently being filtered out by Sphinx? If so, I highly recommend not using ActiveRecord's destroy method, and instead having a custom method to mark users as inactive. This avoids the callbacks, and thus avoids the need for any of the above 'solutions'.
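A minimal sketch of that last suggestion (the method name and the active column are made up for illustration):
class User < ActiveRecord::Base
  # Mark the user inactive instead of destroying the row,
  # so Thinking Sphinx's deletion callbacks never fire.
  def deactivate!
    update_column :active, false # skips callbacks and validations
  end
end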

Sequel: How to use 'order by' in a view with tinytds

I need to create a view with an ORDER BY clause using Sequel, TinyTDS, and MSSQL.
When I do so, I get the error
TinyTds::Error: The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified. (Sequel::DatabaseError)
My example code:
require 'sequel'

DB = Sequel.tinytds(
  :host     => 'server',
  :database => 'DB',
)

# Remove data from previous test
DB.drop_table(:testtab1) if DB.table_exists?(:testtab1)
DB.drop_view(:v_testtab1) rescue Sequel::DatabaseError
DB.drop_view(:v_testtab2) rescue Sequel::DatabaseError

DB.create_table(:testtab1){
  primary_key :id
  field :a, :type => :nvarchar, :size => 10
  field :b, :type => :nvarchar, :size => 10
}

# Here the error comes up
# "SELECT * FROM `testtab1` ORDER BY `b`"
DB.create_view(:v_testtab1, DB[:testtab1].order_by(:b))
The solution on the SQL side is easy. Instead of
SELECT * FROM `testtab1` ORDER BY `b`
I need
SELECT TOP 100 PERCENT * FROM `testtab1` ORDER BY `b`
I found a solution with an additional obsolete column (without the dummy column I get an invalid comma):
sel = DB[:testtab1].select(Sequel.lit('top 100 percent "" as dummy'), *DB[:testtab1].columns)
# SELECT top 100 percent "" as dummy, [ID], [A], [B] FROM [TESTTAB1]
DB.create_view(:v_testtab2, sel.order_by(:b))
A similar solution can be made with limit:
# Take a big number to get all entries.
# DB[:testtab1].count would take the number at the moment of view creation, not usage.
sel = DB[:testtab1].limit(99999999999)
# SELECT TOP (99999999999) * FROM [TESTTAB1]
DB.create_view(:v_testtab3, sel.order_by(:b))
But I'm looking for a nicer solution. Is there another better possibility?
If it is important:
Ruby 2.1
Sequel 4.19
tiny_tds-0.6.2-x64-mingw32
MSSQL 10.50.2500.0, 64 bit
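One possibly nicer route (an untested sketch): Sequel's create_view also accepts a raw SQL string, so the TOP 100 PERCENT form could be passed through directly:
DB.create_view(:v_testtab4,
  "SELECT TOP 100 PERCENT * FROM testtab1 ORDER BY b")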

Stream based parsing and writing of JSON

I fetch about 20,000 datasets from a server, in batches of 1,000. Each dataset is a JSON object. Persisted, this makes around 350 MB of uncompressed plain text.
I have a memory limit of 1GB. Hence, I write each 1,000 JSON objects as an array into a raw JSON file in append mode.
The result is a file with 20 JSON arrays which needs to be aggregated. I need to touch them anyway, because I want to add metadata. Generally the Ruby Yajl Parser makes this possible like so:
raw_file  = File.new(path_to_raw_file, 'r')
json_file = File.new(path_to_json_file, 'w')

datasets = []
parser = Yajl::Parser.new
parser.on_parse_complete = Proc.new { |o| datasets += o }
parser.parse(raw_file) # parse the raw file, not the empty datasets array

hash = { date: Time.now, datasets: datasets }
Yajl::Encoder.encode(hash, json_file)
Where is the problem with this solution? The problem is that the whole JSON is still parsed into memory, which I must avoid.
Basically, what I need is a solution that parses the JSON from one IO object and encodes it to another IO object at the same time.
I assumed Yajl offers this, but I haven't found a way, nor did its API give any hints, so I guess not. Is there a JSON Parser library which supports this? Are there other solutions?
The only solution I can think of is to use the IO.seek capabilities. Write all the datasets arrays one after another [...][...][...] and after every array, I seek back to the start and overwrite ][ with ,, effectively connecting the arrays manually.
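A rough sketch of that seek idea (variable names are hypothetical; assumes each batch has already been serialized into batch_json):
File.open(path_to_raw_file, 'r+') do |f|
  if f.size.zero?
    f.write(batch_json)        # first batch: write "[...]" as-is
  else
    f.seek(-1, IO::SEEK_END)   # position of the previous closing ']'
    f.write(',')               # overwrite ']' with ','
    f.write(batch_json[1..-1]) # append the new array minus its leading '['
  end
end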
Why can't you retrieve a single record at a time from the database, process it as necessary, convert it to JSON, then emit it with a trailing/delimiting comma?
If you started with a file that contained only [, appended all your JSON strings, and on the final entry skipped the comma and appended a closing ] instead, you'd have a JSON array of hashes and would only have to process one row's worth at a time.
It'd be a tiny bit slower (maybe) but wouldn't impact your system. And DB I/O can be very fast if you use batching/paging to retrieve a reasonable number of records at a time.
For instance, here's a combination of some Sequel example code, and code to extract the rows as JSON and build a larger JSON structure:
require 'json'
require 'sequel'

DB = Sequel.sqlite # memory database

DB.create_table :items do
  primary_key :id
  String :name
  Float :price
end

items = DB[:items] # Create a dataset

# Populate the table
items.insert(:name => 'abc', :price => rand * 100)
items.insert(:name => 'def', :price => rand * 100)
items.insert(:name => 'ghi', :price => rand * 100)

add_comma = false
puts '['
items.order(:price).each do |item|
  puts ',' if add_comma
  add_comma ||= true
  print JSON[item]
end
puts "\n]"
Which outputs:
[
{"id":2,"name":"def","price":3.714714089426208},
{"id":3,"name":"ghi","price":27.0179624376119},
{"id":1,"name":"abc","price":52.51248221170203}
]
Notice the order is now by "price".
Validation is easy:
require 'json'
require 'pp'
pp JSON[<<EOT]
[
{"id":2,"name":"def","price":3.714714089426208},
{"id":3,"name":"ghi","price":27.0179624376119},
{"id":1,"name":"abc","price":52.51248221170203}
]
EOT
Which results in:
[{"id"=>2, "name"=>"def", "price"=>3.714714089426208},
{"id"=>3, "name"=>"ghi", "price"=>27.0179624376119},
{"id"=>1, "name"=>"abc", "price"=>52.51248221170203}]
This validates the JSON and demonstrates that the original data is recoverable. Each row retrieved from the database should be a minimal, "bite-sized" piece of the overall JSON structure you want to build.
Building upon that, here's how to read JSON stored in the database, manipulate it, then emit it as a JSON file:
require 'json'
require 'sequel'

DB = Sequel.sqlite # memory database

DB.create_table :items do
  primary_key :id
  String :json
end

items = DB[:items] # Create a dataset

# Populate the table
items.insert(:json => JSON[:name => 'abc', :price => rand * 100])
items.insert(:json => JSON[:name => 'def', :price => rand * 100])
items.insert(:json => JSON[:name => 'ghi', :price => rand * 100])
items.insert(:json => JSON[:name => 'jkl', :price => rand * 100])
items.insert(:json => JSON[:name => 'mno', :price => rand * 100])
items.insert(:json => JSON[:name => 'pqr', :price => rand * 100])
items.insert(:json => JSON[:name => 'stu', :price => rand * 100])
items.insert(:json => JSON[:name => 'vwx', :price => rand * 100])
items.insert(:json => JSON[:name => 'yz_', :price => rand * 100])

add_comma = false
puts '['
items.each do |item|
  puts ',' if add_comma
  add_comma ||= true
  print JSON[
    JSON[
      item[:json]
    ].merge('foo' => 'bar', 'time' => Time.now.to_f)
  ]
end
puts "\n]"
Which generates:
[
{"name":"abc","price":3.268814929005337,"foo":"bar","time":1379688093.124606},
{"name":"def","price":13.871147312377719,"foo":"bar","time":1379688093.124664},
{"name":"ghi","price":52.720984131655676,"foo":"bar","time":1379688093.124702},
{"name":"jkl","price":53.21477190840114,"foo":"bar","time":1379688093.124732},
{"name":"mno","price":40.99364022416619,"foo":"bar","time":1379688093.124758},
{"name":"pqr","price":5.918738444452265,"foo":"bar","time":1379688093.124803},
{"name":"stu","price":45.09391752439902,"foo":"bar","time":1379688093.124831},
{"name":"vwx","price":63.08947792357426,"foo":"bar","time":1379688093.124862},
{"name":"yz_","price":94.04921035056373,"foo":"bar","time":1379688093.124894}
]
I added the timestamp so you can see that each row is processed individually, AND to give you an idea how fast the rows are being processed. Granted, this is a tiny, in-memory database with no network I/O to contend with, but a normal network connection through a switch to a database on a reasonable DB host should be pretty fast too. Telling the ORM to read the DB in chunks can speed up the processing, because the DBM will be able to return larger blocks and fill the packets more efficiently. You'll have to experiment to determine what size chunks you need, because it will vary based on your network, your hosts, and the size of your records.
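For instance, with Sequel you could let the dataset stream rows in fixed-size batches via paged_each (a sketch; the 1,000-row fetch size is an arbitrary starting point, and paged_each requires an ordered dataset):
items.order(:id).paged_each(:rows_per_fetch => 1000) do |item|
  # rows are yielded one at a time, but fetched from the DB in batches
  print JSON[item]
end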
Your original design isn't good when dealing with enterprise-sized databases, especially when your hardware resources are limited. Over the years we've learned how to parse BIG databases, which make 20,000-row tables appear minuscule. VM slices are common these days and we use them for crunching, so they're often the PCs of yesteryear: single CPU with small memory footprints and dinky drives. We can't beat them up or they'll become bottlenecks, so we have to break the data into the smallest atomic pieces we can.
Harping about DB design: Storing JSON in a database is a questionable practice. DBMs these days can spew JSON, YAML and XML representations of rows, but forcing the DBM to search inside stored JSON, YAML or XML strings is a major hit in processing speed, so avoid it at all costs unless you also have the equivalent lookup data indexed in separate fields so your searches are at the highest possible speed. If the data is available in separate fields, then doing good ol' database queries, tweaking in the DBM or your scripting language of choice, and emitting the massaged data becomes a lot easier.
It is possible via the JSON::Stream or Yajl::FFI gems. You will have to write your own callbacks, though. Some hints on how to do that can be found here and here.
Facing a similar problem, I created the json-streamer gem, which spares you the need to write your own callbacks. It yields each object one by one, removing it from memory afterwards. You can then pass these to another IO object as intended.
There is a library called oj that does exactly that. It can do parsing and generation. For example, for parsing you can use Oj::Doc:
Oj::Doc.open('[3,[2,1]]') do |doc|
  result = {}
  doc.each_leaf() do |d|
    result[d.where?] = d.fetch()
  end
  result
end #=> {"/1" => 3, "/2/1" => 2, "/2/2" => 1}
You can even backtrack in the file using doc.move(path). It seems very flexible.
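For instance, a small sketch of move (paths follow the same "/index" notation as above):
Oj::Doc.open('[3,[2,1]]') do |doc|
  doc.move('/2/1') # jump to the first element of the nested array
  puts doc.fetch   #=> 2
end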
For writing documents, you can use Oj::StreamWriter:
require 'oj'

doc = Oj::StreamWriter.new($stdout)

def write_item(doc, item)
  doc.push_object
  doc.push_key "type"
  doc.push_value "item"
  doc.push_key "value"
  doc.push_value item
  doc.pop
end

def write_array(doc, array)
  doc.push_object
  doc.push_key "type"
  doc.push_value "array"
  doc.push_key "value"
  doc.push_array
  array.each do |item|
    write_item(doc, item)
  end
  doc.pop
  doc.pop
end

write_array(doc, [{a: 1}, {a: 2}])
#=> {"type":"array","value":[{"type":"item","value":{":a":1}},{"type":"item","value":{":a":2}}]}

ORM for SQL Scripting

What is the best way to run simple SQL scripts against a database (preferably in a DBMS-agnostic way)?
So, for illustration purposes, using your best/suggested way, I'd like to see a script that creates a few tables with names from an array ['cars_table', 'ice_cream_t'], deletes all rows with id=5 in a table, and joins two tables, printing the result nicely formatted.
I've heard of Python and PL/SQL to do this
Ruby/Datamapper seems very attractive
Java + JDBC, maybe
Others?
Some of these are mostly used in a full application or within a framework. I'd like to see them used simply in scripts.
Ruby/Sequel is currently my weapon of choice.
Short example from the site:
require "rubygems"
require "sequel"
# connect to an in-memory database
DB = Sequel.sqlite
# create an items table
DB.create_table :items do
primary_key :id
String :name
Float :price
end
# create a dataset from the items table
items = DB[:items]
# populate the table
items.insert(:name => 'abc', :price => rand * 100)
items.insert(:name => 'def', :price => rand * 100)
items.insert(:name => 'ghi', :price => rand * 100)
# print out the number of records
puts "Item count: #{items.count}"
# print out the average price
puts "The average price is: #{items.avg(:price)}"
By using SQL DDL (Data Definition Language), which can be written DB-agnostically if you're careful.
There are examples at the Wikipedia article:
http://en.wikipedia.org/wiki/Data_Definition_Language
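For example, raw DDL can be executed from Ruby through Sequel's run, sticking to portable SQL where possible (a sketch):
DB.run <<-SQL
  CREATE TABLE cars_table (
    id    INTEGER PRIMARY KEY,
    label VARCHAR(50)
  )
SQL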
