Elasticsearch, Tire, and Nested queries / associations with ActiveRecord

I'm using ElasticSearch with Tire to index and search some ActiveRecord models, and I've been searching for the "right" way to index and search associations. I haven't found what seems like a best practice for this, so I wanted to ask if anyone has an approach that they think works really well.
As an example setup (this is made up but illustrates the problem), let's say we have a book, with chapters. Each book has a title and author, and a bunch of chapters. Each chapter has text. We want to index the book's fields and the chapters' text so you can search for a book by author, or for any book with certain words in it.
class Book < ActiveRecord::Base
include Tire::Model::Search
include Tire::Model::Callbacks
has_many :chapters
mapping do
indexes :title, :analyzer => 'snowball', :boost => 100
indexes :author, :analyzer => 'snowball'
indexes :chapters, type: 'object', properties: {
chapter_text: { type: 'string', analyzer: 'snowball' }
}
end
end
class Chapter < ActiveRecord::Base
belongs_to :book
end
So then I do the search with:
s = Book.search do
query { string query_string }
end
That doesn't work, even though it seems like that indexing should do it. If instead I index:
indexes :chapters, :as => "chapters.map{|c| c.chapter_text}.join('|')", :analyzer => 'snowball'
That makes the text searchable, but obviously it's not a nice hack and it loses the actual associated object. I've tried variations of the searching, like:
s = Book.search do
query do
boolean do
should { string query_string }
should { string "chapters.chapter_text:#{query_string}" }
end
end
end
With no luck there, either. If anyone has a good, clear example of indexing and searching associated ActiveRecord objects using Tire, it seems like that would be a really good addition to the knowledge base here.
Thanks for any ideas and contributions.

The support for ActiveRecord associations in Tire is working, but requires a couple of tweaks inside your application. There's no question the library should do a better job here, and in the future it certainly will.
That said, here is a full-fledged example of Tire configuration to work with Rails' associations in elasticsearch: active_record_associations.rb
Let me highlight a couple of things here.
Touching the parent
First, you have to ensure you notify the parent model of the association about changes in the association.
Given we have a Chapter model, which “belongs to” a Book, we need to do:
class Chapter < ActiveRecord::Base
belongs_to :book, touch: true
end
In this way, when we do something like:
book.chapters.create text: "Lorem ipsum...."
The book instance is notified about the added chapter.
Responding to touches
With this part sorted, we need to notify Tire about the change, and update the elasticsearch index accordingly:
class Book < ActiveRecord::Base
has_many :chapters
after_touch() { tire.update_index }
end
(There's no question Tire should intercept after_touch notifications by itself, and not force you to do this. It is, on the other hand, a testament to how easy it is to work your way around the library's limitations in a manner which does not hurt your eyes.)
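The chain of notifications described above can be sketched in plain Ruby. This is only a toy model: ToyBook and ToyChapter are hypothetical stand-ins, no ActiveRecord or Tire code runs, and the counter simply records how often the index update would have been triggered.

```ruby
# Toy stand-ins for the two models; no ActiveRecord or Tire involved.
class ToyBook
  attr_reader :chapters, :index_updates

  def initialize
    @chapters = []
    @index_updates = 0
  end

  # Plays the role of `after_touch { tire.update_index }`.
  def touch
    @index_updates += 1
  end
end

class ToyChapter
  def initialize(book, text)
    @book = book
    @text = text
  end

  # Plays the role of saving a record declared with `belongs_to :book, touch: true`.
  def save
    @book.chapters << self
    @book.touch
    self
  end
end

# Saves two chapters and reports how many index updates the book saw.
def toy_reindex_count
  book = ToyBook.new
  ToyChapter.new(book, "Lorem ipsum....").save
  ToyChapter.new(book, "Dolor sit amet").save
  book.index_updates
end

puts toy_reindex_count
```

Each saved chapter touches its book once, so the parent document is re-indexed once per change to the association.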
Proper JSON serialization in Rails < 3.1
Although the README mentions that you have to disable the automatic "adding root key in JSON" in Rails < 3.1, many people forget it, so you have to include it in the class definition as well:
self.include_root_in_json = false
Proper mapping for elasticsearch
Now comes the meat of our work -- defining proper mapping for our documents (models):
mapping do
indexes :title, type: 'string', boost: 10, analyzer: 'snowball'
indexes :created_at, type: 'date'
indexes :chapters do
indexes :text, analyzer: 'snowball'
end
end
Notice we index title with boosting, created_at as "date", and chapter text from the associated model. All the data are effectively “de-normalized” as a single document in elasticsearch (if such a term would make slight sense).
Proper document JSON serialization
As the last step, we have to properly serialize the document in the elasticsearch index. Notice how we can leverage the convenient to_json method from ActiveRecord:
def to_indexed_json
to_json( include: { chapters: { only: [:text] } } )
end
With all this setup in place, we can search in properties in both the Book and the Chapter parts of our document.
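To make the shape of the de-normalized document concrete, here is a minimal sketch using only Ruby's standard library. BookStub and ChapterStub are hypothetical stand-ins for the ActiveRecord models, and the sample data is made up; the real to_json output would also carry the remaining Book columns.

```ruby
require "json"

# Plain-Ruby stand-ins for the ActiveRecord models (hypothetical, for illustration only).
ChapterStub = Struct.new(:id, :text)
BookStub    = Struct.new(:title, :created_at, :chapters) do
  # Mirrors to_json(include: { chapters: { only: [:text] } }):
  # the book's own attributes plus only the `text` of each chapter.
  def to_indexed_json
    {
      title:      title,
      created_at: created_at,
      chapters:   chapters.map { |c| { text: c.text } }
    }.to_json
  end
end

# Builds a sample book with two chapters.
def sample_book
  BookStub.new(
    "Moby-Dick",
    "2013-01-01T00:00:00Z",
    [ChapterStub.new(1, "Call me Ishmael."), ChapterStub.new(2, "The Carpet-Bag.")]
  )
end

puts sample_book.to_indexed_json
```

The chapters end up as an array of objects inside the book document, which is what the nested `indexes :chapters do ... end` mapping describes.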
Please run the active_record_associations.rb Ruby file linked at the beginning to see the full picture.
For further information, please refer to these resources:
https://github.com/karmi/railscasts-episodes/commit/ee1f6f3
https://github.com/karmi/railscasts-episodes/commit/03c45c3
https://github.com/karmi/tire/blob/master/test/models/active_record_models.rb#L10-20
See this StackOverflow answer: ElasticSearch & Tire: Using Mapping and to_indexed_json for more information about mapping / to_indexed_json interplay.
See this StackOverflow answer: Index the results of a method in ElasticSearch (Tire + ActiveRecord) to see how to fight n+1 queries when indexing models with associations.

I created this as a solution in one of my applications; it indexes a deeply nested set of models:
https://gist.github.com/paulnsorensen/4744475
UPDATE: I have now released a gem that does this:
https://github.com/paulnsorensen/lifesaver

Related

How to query multiple fields with Chewy

Let's say I have an index with multiple objects in it:
class ThingsIndex < Chewy::Index
define_type User do
field :full_name
end
define_type Post do
field :title
end
end
How do I search both users' full_name and posts' titles?
The docs only talk about querying one attribute like this:
ThingsIndex.query(term: {full_name: 'Foo'})
There are a couple ways you could do this. Chaining is probably the easiest:
ThingsIndex.query(term: {full_name: 'Foo'}).query(term: {title: 'Foo'})
If you need to do several queries, you might consider merging them:
query = ThingsIndex.query(term: {full_name: 'Foo'})
query = query.merge(ThingsIndex.query(term: {title: 'Foo'}))
Read more about merging here: Chewy #merge docs
Make sure to set your limit or else it only shows 10 results:
query.limit(50)
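Whether chained queries are combined with AND or OR semantics may depend on the Chewy version, so it is worth checking the generated request. If the goal is to match either field, one option is a single Elasticsearch bool/should query. The sketch below only builds the query hash; either_field_query is a hypothetical helper, and passing its result to ThingsIndex.query is an assumption rather than something the docs quoted above confirm.

```ruby
require "json"

# Builds an Elasticsearch bool/should query body: a document matches when
# at least one of the term clauses matches.
def either_field_query(value, fields = %w[full_name title])
  { bool: { should: fields.map { |f| { term: { f => value } } } } }
end

puts JSON.pretty_generate(either_field_query("Foo"))
```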

How do I remove sphinx_deleted from a Sphinx query?

I am new to Ruby and ThinkingSphinx.
I have the following Sphinx query: SELECT * FROM user_core, user_delta WHERE sphinx_deleted = 0.
I do not want to see the condition "WHERE sphinx_deleted = 0". How do I remove this? I have removed the sql_attr_uint = sphinx_deleted line from my sphinx.conf file, yet I still see sphinx_deleted being passed in the query.
Here is the index file definition:
ThinkingSphinx::Index.define :user, :with => :active_record, :delta => true do
indexes [first_name,last_name,display_name], :as=>:name, :sortable=>true
indexes first_name, :sortable => true
indexes last_name, :sortable => true
indexes display_name, :sortable => true
indexes email, :sortable => true
indexes phone, :sortable => true
indexes title, :sortable => true
has id, :as => :user_id
has roles(:id), :as => :role_ids
has jurisdictions(:id), :as => :jurisdiction_ids
set_property :delta => true
end
I do not have a sphinx_scope or default_sphinx_scope defined.
We are using thinking-sphinx-3.1.0 and ruby-2.1.0
The sphinx_deleted attribute is created by Thinking Sphinx, and is used in the following cases (using your scenario of a User model with core and delta indices in the examples):
When a User is deleted, sphinx_deleted is set to 1 for that record in both the core and delta indices - there's no point returning Sphinx records if the underlying ActiveRecord object no longer exists.
When a User is updated, the delta index is processed with the latest field and attribute details, and the core index's document has sphinx_deleted set to 1, so only the latest (accurate) information will match. e.g. if a user has their name changed from Fred to Georgina, a search for 'Fred' will not return Georgina, because the core index document (which does match) is filtered out.
That is why the attribute exists. You cannot tell Thinking Sphinx to not add it, nor can you remove that filter, short of mucking around in the internals of Thinking Sphinx.
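The Fred/Georgina case can be mimicked with a toy model in plain Ruby. The two arrays below are hypothetical stand-ins for the core and delta indices, not real Sphinx data, and toy_search is an illustrative helper rather than anything Thinking Sphinx provides.

```ruby
# Toy model of a core + delta index pair; each "document" is a plain hash.
# Names and data are made up for illustration.
CORE_INDEX = [
  { id: 1, name: "Fred",  sphinx_deleted: 1 }, # stale copy, flagged after the update
  { id: 2, name: "Alice", sphinx_deleted: 0 }
].freeze

DELTA_INDEX = [
  { id: 1, name: "Georgina", sphinx_deleted: 0 } # fresh copy of user 1
].freeze

# Searches both indices by name; optionally applies the sphinx_deleted = 0 filter.
def toy_search(name, filter_deleted: true)
  docs = (CORE_INDEX + DELTA_INDEX).select { |d| d[:name] == name }
  docs = docs.select { |d| d[:sphinx_deleted] == 0 } if filter_deleted
  docs.map { |d| d[:id] }
end

p toy_search("Fred")                        # => []
p toy_search("Fred", filter_deleted: false) # => [1]
p toy_search("Georgina")                    # => [1]
```

Without the filter, a search for the old name still returns user 1 through the outdated core document.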
If there is a specific reason for wanting to remove the attribute and filter, feel free to comment here, or you can open an issue on the GitHub repo, or post to the TS Google Group.
Update
Okay, further to this, there are three ways around it.
Option One:
The first way is to make the query to Sphinx yourself, using a Thinking Sphinx connection:
results = ThinkingSphinx::Connection.take do |connection|
connection.execute "SELECT * FROM user_core, user_delta"
end
Keep in mind that this returns raw Sphinx values, not ActiveRecord instances.
Option Two:
A more complicated alternative, though, is to have your own search middleware stack. First, you'll want to create a custom subclass of ThinkingSphinx::Middlewares::SphinxQL that removes the :sphinx_deleted filter:
class SphinxQLWithoutFilter < ThinkingSphinx::Middlewares::SphinxQL
def call(contexts)
contexts.each do |context|
Inner.new(context).call
end
app.call contexts
end
private
class Inner < ThinkingSphinx::Middlewares::SphinxQL::Inner
def inclusive_filters
super.except :sphinx_deleted
end
end
end
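As an aside on the override above, the pattern is simply "call super, then drop one key". A minimal sketch with hypothetical stand-in classes (BaseInner and InnerWithoutDeleted are not Thinking Sphinx classes) shows the effect. It uses reject because, on Ruby versions before 3.0, Hash#except comes from ActiveSupport rather than core Ruby.

```ruby
# Hypothetical stand-in for the library class that builds the filter hash.
class BaseInner
  def initialize(options = {})
    @options = options
  end

  # Always adds the sphinx_deleted filter to whatever the caller passed in :with.
  def inclusive_filters
    (@options[:with] || {}).merge(sphinx_deleted: false)
  end
end

# The subclass reuses the parent's filters and drops a single key.
class InnerWithoutDeleted < BaseInner
  def inclusive_filters
    super.reject { |key, _| key == :sphinx_deleted }
  end
end

p BaseInner.new(with: { role_ids: 3 }).inclusive_filters
p InnerWithoutDeleted.new(with: { role_ids: 3 }).inclusive_filters
```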
Then, create a new middleware stack which uses this new SphinxQL query middleware:
WithoutFilterMiddleware = ::Middleware::Builder.new do
use ThinkingSphinx::Middlewares::StaleIdFilter
use SphinxQLWithoutFilter
use ThinkingSphinx::Middlewares::Geographer
use ThinkingSphinx::Middlewares::Inquirer
use ThinkingSphinx::Middlewares::ActiveRecordTranslator
use ThinkingSphinx::Middlewares::StaleIdChecker
use ThinkingSphinx::Middlewares::Glazier
end
And then you can use that middleware stack in specific search queries:
User.search 'foo', :middleware => WithoutFilterMiddleware
It's worth noting the two middleware present in that stack for stale ids. They work together to catch any Sphinx results that do not have a matching ActiveRecord object, and re-run the Sphinx query up to three times filtering out those unmatched records. They're probably useful, but if you don't want to use them, you can remove them from your custom stack. However, without them, any Sphinx records that don't have matching ActiveRecord objects will be transformed into nils.
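The retry behaviour of the stale-id pair can be pictured with a toy loop. This is only an illustration of the idea under made-up data: INDEXED_IDS, DATABASE_IDS and search_without_stale are hypothetical, and the real middleware works on Sphinx queries and ActiveRecord lookups rather than arrays.

```ruby
# Toy lookup tables; ids 2 and 4 exist in the index but not in the database.
INDEXED_IDS  = [1, 2, 3, 4].freeze
DATABASE_IDS = [1, 3].freeze

# Re-runs the toy search, excluding ids that had no matching record,
# for at most `max_retries` extra attempts.
def search_without_stale(max_retries: 3)
  excluded = []
  attempts = 0
  loop do
    found = INDEXED_IDS - excluded
    stale = found - DATABASE_IDS
    return found if stale.empty?
    # Out of retries: unmatched ids are returned as nils.
    return found.map { |id| DATABASE_IDS.include?(id) ? id : nil } if attempts >= max_retries
    excluded |= stale
    attempts += 1
  end
end

p search_without_stale                 # => [1, 3]
p search_without_stale(max_retries: 0) # => [1, nil, 3, nil]
```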
Option Three:
This is the more hackish version of the previous solution, but will apply to all searches, so probably isn't worthwhile: re-open the class that adds the filter with class_eval and change the method definition:
ThinkingSphinx::Middlewares::SphinxQL::Inner.class_eval do
def inclusive_filters
# normally:
# (options[:with] || {}).merge({:sphinx_deleted => false})
# but without the sphinx_deleted filter:
options[:with] || {}
end
end
Now, all that said: I presume you're not actually deleting users, but somehow the deletion callbacks are being fired anyway? Hence, users do exist but are currently being filtered out by Sphinx? If so, I highly recommend not using ActiveRecord's destroy method, and instead having a custom method to mark users as inactive. This avoids the callbacks, and thus avoids the need for any of the above 'solutions'.

Mongoid $project aggregation doesn't return anything

I'm trying to perform the following aggregation with Mongoid:
Award.collection.aggregate( [ {"$project" => {:"value.amount"=> 1}} ] )
This returns:
#<Mongo::Collection::View::Aggregation:0x0055cc6e8658b8
@options={},
@pipeline=[{"$project"=>{:"value.amount"=>1}}],
@view=#<Mongo::Collection::View:0x47168257993960
namespace='elvis_development.awards @selector={} @options={}>>
so no results but no errors either. This version has the same syntax as the example they give in the docs but I've tried different syntax too, with no success. In the mongo shell this:
db.awards.aggregate( [ { $project : { "value.amount" : 1 } } ] )
returns the desired results.
I use MongoDB v3.0.7 and Mongoid 5.0.1 and this is my model:
class Award
include Mongoid::Document
include Mongoid::Elasticsearch
# Associations
belongs_to :document
embeds_one :date, class_name: "AwardDate", inverse_of: :award
embeds_one :value, class_name: "Value", inverse_of: :award
accepts_nested_attributes_for :value, :date
# Fields
field :title, type: String
field :description, type: String
elasticsearch!({
prefix_name: false,
index_name: 'awards',
wrapper: :load
})
end
Am I doing something wrong? I noticed in this example on mongo_ruby_driver Github that the $project aggregation is supported, but I've tried with both nested and not nested attributes with the same result. I realize I could do this with normal retrieval but I would prefer aggregations since they are faster and I have a large data set. Any thoughts would be very much appreciated.
Modern releases of Mongoid (v5 and greater) use the official MongoDB Ruby driver rather than the older "moped" driver of Mongoid v3 and v4.
This means that .aggregate() returns a "cursor"-like object, specifically the Mongo::Collection::View::Aggregation shown in your output, instead of a plain array of documents, which is consistent with other modern driver releases.
So iterate the "cursor" instead, via the standard ways. i.e:
require "pp"
Award.collection.aggregate( [ {"$project" => { "value.amount"=> 1}} ] ).each do | doc |
pp doc
end
Which will give you output like this for each document in the response:
{"_id"=>BSON::ObjectId('564c4836023fb886145f8063'), "value"=>{"amount"=>1.0}}
Just like you asked for.
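The same behaviour can be mimicked in plain Ruby, which may help explain why inspecting the return value shows the pipeline rather than any documents. LazyAggregation below is a hypothetical stand-in, not the driver's class: it stores the pipeline and only yields documents once something iterates it.

```ruby
# Hypothetical stand-in for an aggregation view: it stores the pipeline and
# only produces documents when iterated.
class LazyAggregation
  include Enumerable

  attr_reader :runs

  def initialize(pipeline, documents)
    @pipeline  = pipeline
    @documents = documents
    @runs      = 0
  end

  # Enumerable is built on #each; this is where the "query" would be sent.
  def each(&block)
    @runs += 1
    @documents.each(&block)
  end
end

# Builds a view over one made-up result document.
def build_view
  LazyAggregation.new(
    [{ "$project" => { "value.amount" => 1 } }],
    [{ "value" => { "amount" => 1.0 } }]
  )
end

view = build_view
puts view.runs            # nothing has been executed yet
view.each { |doc| p doc } # iterating triggers the work
p view.to_a.length        # to_a also iterates, giving a plain array
```

Any Enumerable method (each, map, to_a, first) forces the iteration, so calling to_a on the result is an alternative to the each loop above when a plain array is wanted.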

Import from one index to a new index with a persistence model

I have an application that has a Nutch crawler sending results directly to an ElasticSearch index created by a Tire Persistence model.
I am looking for the best way to make changes to the index without deleting it and then recreating and re-populating it, because the index is the master data source. I've been trying to get the approach working where your index is an alias with concrete indexes behind it, and you import from the master index into a new index.
I have been trying to get the rake environment tire:import CLASS='Applicant' INDEX='index_new' command to do the job with this approach, without success. The import first fails with an undefined method 'paginate'; after I defined a 'paginate' method in my model, it fails with an undefined method 'count', raised at tire-0.60.0/lib/tire/model/import.rb:102.
I've been scouring for days looking for the right approach, and at this point I'm not convinced that I'm on the right path at all. I have included my model below for reference. I am using WillPaginate for pagination.
class Applicant
include Tire::Model::Persistence
include Tire::Model::Search
include Tire::Model::Callbacks
require 'will_paginate'
require 'will_paginate-bootstrap'
require 'will_paginate/array'
index_name 'index'
document_type 'doc'
mapping do
indexes :boost, type: 'string'
indexes :content, type: 'string'
indexes :digest, type: 'string'
indexes :id, type: 'string'
indexes :skill, type: 'string'
indexes :title, type: 'string'
indexes :tstamp, type: 'date', format: 'dateOptionalTime'
indexes :url, type: 'string'
indexes :domain, type: 'string'
end
property :boost
property :content
property :digest
property :id
property :skill
property :title
property :tstamp
property :url
property :domain
def self.search(params)
tire.search(page: params[:page], per_page: 20) do
query { string params[:query], default_operator: "AND" } if params[:query].present?
filter :term, domain: params[:domain_selected] if params[:domain_selected].present?
filter :term, skill: params[:skill_selected] if params[:skill_selected].present?
facet "domains" do
terms :domain
end
facet "skills" do
terms :skill
end
end
end
def self.paginate(params)
@page_results = WillPaginate::Collection.create(params[:page], per_page, total_entries) do |pager|
pager.replace(@self.to_array)
end
@page_results = @self.paginate(params[:current_page], params[:per_page])
end
end
On a side note, and a lower priority to me: I've been digging through the code trying to understand why the import needs pagination, and it's not clear to me.
Thanks in advance.
Well, the reason you're getting that error is that in your view, I would guess, you're referring to the paginate gem.
First thing to do is either check your view, and strip paginate out of the view and the controller, OR, if you need paginate, do this simple test:
Your application should load the will_paginate gem. To see if the library has been loaded, open the console for your app and try the following lines:
defined? WillPaginate
ActiveRecord::Base.respond_to? :paginate
If any of these lines return nil/false, will_paginate has not properly loaded in your app.
(( from https://github.com/mislav/will_paginate/wiki/Troubleshooting ))
If it fails out, make sure you have the following two lines in your Gemfile:
gem 'will_paginate', '~> 3.0.3'
gem 'bootstrap-will_paginate', '~> 0.0.6'
If that doesn't work for you, let me know, and we'll dig deeper.
So after 2 weeks of searching, I found the solution I was looking for. I basically accomplish the same result I was looking for using Article.create_elasticsearch_index followed by Tire.index('original-index-name').reindex 'new-index-name'. Karmi's tweet here is what led me to the right solution.
https://twitter.com/karmiq/status/185811361069142016
I'm also working on adapting jarosan's work here into working for my situation and will post soon.
https://gist.github.com/3124884
Thanks Michel and Karel.
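For the alias approach mentioned in the question, the final step after reindexing is usually to repoint the alias. A minimal sketch of the request body for Elasticsearch's `_aliases` endpoint follows; alias_swap_actions is a hypothetical helper, the index and alias names are made up, and sending the body (for example with an HTTP client) is left out.

```ruby
require "json"

# Builds the body for Elasticsearch's `_aliases` endpoint: both actions are
# applied atomically, so searches against the alias never see a gap.
def alias_swap_actions(alias_name, old_index, new_index)
  {
    actions: [
      { remove: { index: old_index, alias: alias_name } },
      { add:    { index: new_index, alias: alias_name } }
    ]
  }
end

# Index and alias names below are hypothetical.
puts JSON.pretty_generate(alias_swap_actions("applicants", "index", "index_new"))
```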

Tire gem - Does it support document boosting?

The ElasticSearch documentation (http://www.elasticsearch.org/guide/reference/mapping/boost-field.html) says that you can boost a document based on a value. Is this behaviour implemented via Tire? I'm struggling with the syntax if it is.
Update:
It looks like:
mapping do
indexes :llt_code, :index => :not_analyzed
indexes :llt_name, :analyzer => 'snowball'
indexes :_boost, :as => '_boost'
end
is what I need, assuming the _boost column holds the boost value?
It is always worth checking what YourModel.mapping_to_hash outputs: this is what Tire will send over to elasticsearch when it creates the mapping. As it is, your code is wrong: _boost is a top-level option, whereas what you've posted puts it in the properties part of the mapping.
mapping(:_boost => {:name => 'foo', :null_value => 1.0}) do
indexes ...
end
should tell elasticsearch to use the field named foo for _boost, and puts this option at the right level of the mapping.
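To show what "the right level" means, here is a sketch of the mapping hash with _boost sitting beside properties rather than inside it. This is an approximation of what mapping_to_hash might return, not its verified output; the :term type name is hypothetical, and the field definitions are copied from the question.

```ruby
require "json"

# Field definitions as they would appear under "properties".
PROPERTIES = {
  llt_code: { type: "string", index: "not_analyzed" },
  llt_name: { type: "string", analyzer: "snowball" }
}.freeze

# Wraps the properties in a type-level mapping, with _boost beside them.
def mapping_with_boost(type_name, boost_field)
  { type_name => { _boost: { name: boost_field, null_value: 1.0 }, properties: PROPERTIES } }
end

puts JSON.pretty_generate(mapping_with_boost(:term, "foo"))
```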

Resources