I benchmarked two versions of my Solr index, the first with the following :include statement:
searchable(:auto_index => false, :auto_remove => true,
           :include => { :account => true,
                         :user_practice_contact => [:city],
                         :user_professional_detail => [:specialty, :subspecialties] }) do
  # field definitions...
end
The second:
searchable(:auto_index => false, :auto_remove => true) do
  # field definitions...
end
I was expecting the version with :include to be noticeably faster, but here is the outcome:
version with includes:
Benchmark.measure { User.limit(50).each do |u|; u.index; end; Sunspot.commit; }
=> #<Benchmark::Tms:0x1130b34e8 @real=6.8079788684845, @utime=5.05, @cstime=0.0, @cutime=0.0, @total=5.2, @label="", @stime=0.149999999999999>
and without the includes:
Benchmark.measure { User.limit(50).each do |u|; u.index; end; Sunspot.commit; }
=> #<Benchmark::Tms:0x112ef0fe8 @real=6.82465195655823, @utime=4.92, @cstime=0.0, @cutime=0.0, @total=5.07, @label="", @stime=0.15>
Does anybody know if the includes are supposed to work? And if so, am I doing it wrong?
I looked at the docs (http://outoftime.github.com/sunspot/rails/docs/) and see no mention of it.
According to the API, :include will:
allow ActiveRecord to load required associations when indexing.
Your benchmark does not measure this properly because you are indexing records one at a time in an ordinary Ruby iterator. Since Sunspot is handed a single record 50 separate times, it can't take advantage of eager loading at all. Instead you should index the whole batch at once:
Sunspot.index(User.limit(50))
Sunspot.commit
Oh, and could you test whether the following is faster than the above? I really want to know.
Sunspot.index(User.includes(:account).limit(50))
Sunspot.commit
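If you want to compare both batch variants side by side, here is a minimal sketch (assuming the same 50-user sample; the report labels are purely illustrative):

require 'benchmark'

Benchmark.bm(24) do |x|
  # Batch indexing without eager loading
  x.report('batch index:') do
    Sunspot.index(User.limit(50))
    Sunspot.commit
  end

  # Batch indexing with the account association eager-loaded
  x.report('batch + eager loading:') do
    Sunspot.index(User.includes(:account).limit(50))
    Sunspot.commit
  end
end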
Also, there is currently a bug where STI models ignore :include.
By looking at the SQL queries in the Rails log, you can see that :include on searchable causes eager loading while indexing, while :include on #search causes eager loading while searching.
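For the search side, sunspot_rails accepts an :include option on #search that it uses when loading the result records; a sketch (the fulltext term and association are just examples):

search = User.search(:include => [:account]) do
  fulltext 'smith'
end

# The account association is already loaded on each result,
# so no extra query is fired here.
search.results.each { |user| puts user.account.inspect }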
I am new to Ruby and ThinkingSphinx.
I have the following Sphinx query: SELECT * FROM user_core, user_delta WHERE sphinx_deleted = 0.
I do not want the WHERE sphinx_deleted = 0 condition. How do I remove it? I have removed sql_attr_uint = sphinx_deleted from my sphinx.conf file, yet sphinx_deleted is still being passed in the query.
Here is the index file definition:
ThinkingSphinx::Index.define :user, :with => :active_record, :delta => true do
  indexes [first_name, last_name, display_name], :as => :name, :sortable => true
  indexes first_name, :sortable => true
  indexes last_name, :sortable => true
  indexes display_name, :sortable => true
  indexes email, :sortable => true
  indexes phone, :sortable => true
  indexes title, :sortable => true

  has id, :as => :user_id
  has roles(:id), :as => :role_ids
  has jurisdictions(:id), :as => :jurisdiction_ids

  set_property :delta => true
end
I do not have a sphinx_scope or default_sphinx_scope defined.
We are using thinking-sphinx 3.1.0 and Ruby 2.1.0.
The sphinx_deleted attribute is created by Thinking Sphinx, and is used in the following cases (using your scenario of a User model with core and delta indices in the examples):
When a User is deleted, sphinx_deleted is set to 1 for that record in both the core and delta indices - there's no point returning Sphinx records if the underlying ActiveRecord object no longer exists.
When a User is updated, the delta index is processed with the latest field and attribute details, and the core index's document has sphinx_deleted set to 1, so only the latest (accurate) information will match. For example, if a user's name changes from Fred to Georgina, a search for 'Fred' will not return Georgina, because the core index document (which does match) is filtered out.
That is why the attribute exists. You cannot tell Thinking Sphinx not to add it, nor can you remove that filter, short of mucking around in the internals of Thinking Sphinx.
If there is a specific reason for wanting to remove the attribute and filter, feel free to comment here, or you can open an issue on the GitHub repo, or post to the TS Google Group.
Update
Okay, further to this, there are three ways around it.
Option One:
The first way is to make the query to Sphinx yourself, using a Thinking Sphinx connection:
results = ThinkingSphinx::Connection.take do |connection|
  connection.execute "SELECT * FROM user_core, user_delta"
end
Keep in mind that this returns raw Sphinx values, not ActiveRecord instances.
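If you do need ActiveRecord objects afterwards, you can load them yourself from the raw rows; a sketch, assuming Thinking Sphinx's standard sphinx_internal_id attribute (which holds the model's primary key) is present in the results:

rows  = results.to_a
# Map the raw Sphinx rows back to ActiveRecord objects by primary key
users = User.where(:id => rows.map { |row| row['sphinx_internal_id'] })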
Option Two:
A more complicated alternative, though, is to have your own search middleware stack. First, you'll want to create a custom subclass of ThinkingSphinx::Middlewares::SphinxQL that removes the :sphinx_deleted filter:
class SphinxQLWithoutFilter < ThinkingSphinx::Middlewares::SphinxQL
  def call(contexts)
    contexts.each do |context|
      Inner.new(context).call
    end
    app.call contexts
  end

  private

  class Inner < ThinkingSphinx::Middlewares::SphinxQL::Inner
    def inclusive_filters
      super.except :sphinx_deleted
    end
  end
end
Then, create a new middleware stack which uses this new SphinxQL query middleware:
WithoutFilterMiddleware = ::Middleware::Builder.new do
  use ThinkingSphinx::Middlewares::StaleIdFilter
  use SphinxQLWithoutFilter
  use ThinkingSphinx::Middlewares::Geographer
  use ThinkingSphinx::Middlewares::Inquirer
  use ThinkingSphinx::Middlewares::ActiveRecordTranslator
  use ThinkingSphinx::Middlewares::StaleIdChecker
  use ThinkingSphinx::Middlewares::Glazier
end
And then you can use that middleware stack in specific search queries:
User.search 'foo', :middleware => WithoutFilterMiddleware
It's worth noting the two stale-id middlewares present in that stack. They work together to catch any Sphinx results that no longer have a matching ActiveRecord object, re-running the Sphinx query up to three times to filter out those unmatched records. They're probably useful, but if you don't want them, you can remove them from your custom stack (see the sketch below). Without them, however, any Sphinx records that don't have matching ActiveRecord objects will be translated into nils.
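A version of the stack with the stale-id pair removed might look like this (the nil caveat above applies):

WithoutFilterOrStaleIds = ::Middleware::Builder.new do
  use SphinxQLWithoutFilter
  use ThinkingSphinx::Middlewares::Geographer
  use ThinkingSphinx::Middlewares::Inquirer
  use ThinkingSphinx::Middlewares::ActiveRecordTranslator
  use ThinkingSphinx::Middlewares::Glazier
end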
Option Three:
This is the more hackish version of the previous solution, but it will apply to all searches, so it's probably not worthwhile: re-open the class that adds the filter with class_eval and change the method definition:
ThinkingSphinx::Middlewares::SphinxQL::Inner.class_eval do
  def inclusive_filters
    # normally:
    #   (options[:with] || {}).merge({:sphinx_deleted => false})
    # but without the sphinx_deleted filter:
    options[:with] || {}
  end
end
Now, all that said: I presume you're not actually deleting users, but the deletion callbacks are somehow being fired anyway? Hence users do exist but are currently being filtered out by Sphinx? If so, I highly recommend not using ActiveRecord's destroy method, and instead having a custom method that marks users as inactive. This avoids the callbacks, and thus avoids the need for any of the above 'solutions'.
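A minimal sketch of that approach, assuming a hypothetical boolean active column on users:

class User < ActiveRecord::Base
  # Flip the (hypothetical) active flag instead of destroying the row,
  # so no destroy callbacks fire and sphinx_deleted is never set.
  def deactivate!
    update_column :active, false
  end
end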
Let's say we have a MongoDB collection called "images", and a MongoMapper-powered application with a corresponding "Image" model. If we set up a MongoMapper query using this model, we see that it is of type Plucky::Query and returns results of type Image:
>> Image.where(:file_type => 'image/jpeg').class
=> Plucky::Query
>> Image.where(:file_type => 'image/jpeg').first.class
=> Image
We can run the corresponding query directly on the Mongo adapter, mostly bypassing MongoMapper, by accessing the MongoMapper.connection. If we do it this way, the query is of type Mongo::Cursor and returns raw data results of type BSON::OrderedHash:
>> MongoMapper.connection.db(dbname).collection('images').find({ :file_type => 'image/jpeg' }).class
=> Mongo::Cursor
>> MongoMapper.connection.db(dbname).collection('images').find({ :file_type => 'image/jpeg' }).first.class
=> BSON::OrderedHash
The question is, is there a way to take a Plucky::Query like above and convert it to (or retrieve from it) a basic, non-extended Mongo::Cursor object?
At first I thought I found a solution with find_each, which does actually take a Plucky::Query and return a Mongo::Cursor:
>> Image.where(:file_type => 'image/jpeg').find_each.class
=> Mongo::Cursor
But it turns out this Mongo::Cursor is somehow extended or otherwise different from the one above, because it still returns Image objects instead of BSON::OrderedHash objects:
>> Image.where(:file_type => 'image/jpeg').find_each.first.class
=> Image
Update: I can't simply bypass MongoMapper's query magic altogether as I did in the second case, because I need MongoMapper features (specifically named scopes) to build up the query, so what I end up with is a Plucky::Query. But then I want the results as plain data objects, not models, because all I need is the data and I don't want the overhead of model instantiation.
If you drop to the driver, the transformer is nil by default:
1.9.3p194 :003 > Image.collection.find({ :file_type => 'image/jpeg' }, { :limit => 1 }).first.class
=> BSON::OrderedHash
MongoMapper achieves the conversion by setting a "transformer" lambda on the plucky query. You can see this in the MongoMapper source code:
def query(options={})
  query = Plucky::Query.new(collection, :transformer => transformer)
  ...
end

...

def transformer
  @transformer ||= lambda { |doc| load(doc) }
end
So after each Mongo document retrieval, this Plucky::Query runs the transformation that loads the model. Looking at the Plucky source code, we can see there is a simple []= setter we can use to disable this. So this is the solution:
plucky_query = Image.where(:file_type => 'image/jpeg')
plucky_query.first.class
# => Image
plucky_query[:transformer] = nil
plucky_query.first.class
# => BSON::OrderedHash
If you don't mind monkey-patching you can encapsulate like so:
module Plucky
  class Query
    def raw_data
      self[:transformer] = nil
      self
    end
  end
end
Then you could simply write:
Image.where(:file_type => 'image/jpeg').raw_data.first.class
# => BSON::OrderedHash
Reading the ElasticSearch documentation (http://www.elasticsearch.org/guide/reference/mapping/boost-field.html), it says you can boost a document based on a value. Is this behaviour implemented via Tire? I'm struggling with the syntax if it is.
Update:
It looks like

mapping do
  indexes :llt_code, :index => :not_analyzed
  indexes :llt_name, :analyzer => 'snowball'
  indexes :_boost, :as => '_boost'
end

is what I need, assuming the _boost column holds the boost value?
It's always worth checking what YourModel.mapping_to_hash outputs: this is what Tire will send over to Elasticsearch when it creates the mapping. As it stands, your code is wrong: _boost is a top-level option, whereas what you've posted puts it inside the properties part of the mapping.
mapping(:_boost => { :name => 'foo', :null_value => 1.0 }) do
  indexes ...
end
should tell Elasticsearch to use the field named foo for _boost, this time at the right level.
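To verify, inspecting the generated mapping should show _boost alongside, not inside, properties; the model name and exact output shape here are illustrative:

require 'pp'
pp YourModel.mapping_to_hash
# Expect something roughly like:
# { :your_model => { :_boost     => { :name => 'foo', :null_value => 1.0 },
#                    :properties => { :llt_code => { ... }, :llt_name => { ... } } } }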
I am fairly new to Ruby and MongoDB in particular. I use Mongo in a Ruby script to store and process thousands of Tweets in a collection. I would love to improve the legibility and "Ruby-ness" of this find command:
require 'rubygems'
require 'mongo'

db   = Mongo::Connection.new("localhost").db("db")
coll = db.collection("tweets")

cursor = coll.find({
  'geo_enabled'     => true,
  'status.text'     => { '$exists' => true },
  'followers_count' => { '$gte' => 10, '$lt' => 100 }
})

cursor.each_with_index { |row, idx|
  # do stuff
}
The MongoDB query syntax drives me nuts! Is there a more elegant, Ruby-like way to write queries?
You can use Mongoid; it has a nice query syntax, much like that of ActiveRecord/ActiveRelation.
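For example, the query above could look roughly like this in Mongoid (a sketch; assumes a Tweet model mapped to the tweets collection):

class Tweet
  include Mongoid::Document

  field :geo_enabled,     :type => Boolean
  field :followers_count, :type => Integer
end

# Symbol operators like .gte, .lt and .exists replace the raw $-syntax
tweets = Tweet.where(:geo_enabled => true,
                     :'status.text'.exists => true,
                     :followers_count.gte => 10,
                     :followers_count.lt => 100)

tweets.each_with_index { |tweet, idx|
  # do stuff
}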
It seems like MongoDB is using an inefficient query pattern when one index is a subset of another index.
class Model
  include Mongoid::Document

  field :status,   :type => Integer
  field :title,    :type => String
  field :subtitle, :type => String
  field :rating,   :type => Float

  index([
    [:status,   Mongo::ASCENDING],
    [:title,    Mongo::ASCENDING],
    [:subtitle, Mongo::ASCENDING],
    [:rating,   Mongo::DESCENDING]
  ])

  index([
    [:status, Mongo::ASCENDING],
    [:title,  Mongo::ASCENDING],
    [:rating, Mongo::DESCENDING]
  ])
end
The first index is used both when querying on status, title, and subtitle with a sort on rating, and when querying on just status and title with a sort on rating, even though explain() together with hint() in the JavaScript console shows that the second index is four times faster.
How can I tell Mongoid to tell MongoDB to use the second index?
You can pass options such as hint to Mongo::Collection using Mongoid::Criterion::Optional#extras.
An example:
criteria = Model.where(:status => true, :title => 'hello world').desc(:rating)
criteria.extras(:hint => {:status => 1, :title => 1, :rating => -1})
extras accepts anything that Mongo::Collection can handle
http://www.mongodb.org/display/DOCS/Optimization#Optimization-Hint
While the mongo query optimizer often performs very well, explicit "hints" can be used to force mongo to use a specified index, potentially improving performance in some situations.
db.collection.find({user:u, foo:d}).hint({user:1});
You'll need to work from http://www.rdoc.info/github/mongoid/mongoid/master/Mongoid/Cursor here, as I do not know Ruby well enough; it mentions hint.