Haystack boosting based on specific value in specific field - elasticsearch

I am using Haystack with ElasticSearch and I would like to perform boosts that don't just boost a term in general, but instead boost a term only when it is found on a specific field.
For instance, on my UserIndex, I would like to prioritize (boost) search results where the user is marked as active. is_active is a BooleanField on the index model. I know how to filter so that I only fetch active users, but how can I boost active users but not outright filter out inactive users? I could apply a boost to the field in UserIndex, but that doesn't seem like it would work without some way other than an outright filter to search against that BooleanField (since otherwise there are no search terms that the field boost would affect). I could apply a boost to the SearchQuerySet, but the boost() function takes a string which appears to just be a straight-up search term, and you cannot specify a field for that term to occur in.
I might be able to solve that issue in isolation with order_by, but I have a bunch of other complex boosts I want to do:
I want to be able to boost matching users if they have IDs in a list specified by the application at runtime (this is so I can boost users relative to the context of the page where the search button was pressed). I could simply boost a search term containing the user's ID, but then if that number was coincidentally in another field, it would boost that field too and thus give very strange results.
I want to be able to boost the searching user's friends. I currently have the list of every user's friends in a MultiValueField on the search index model. I want to pass the searching user's ID in with the search query, and boost any users in the index who have the searching user's ID in their friends list. Again, I have the same problem as above -- I can boost the ID, but I can't specify that I only want to boost the occurrence of that ID in that specific field.
I have a second BooleanField I want to boost by, similar to is_active but boosted by a smaller amount.
All of this is easy-ish if I can boost by a combination of a term and a field, but it seems very hard if I can only boost a term and not a field.
The only thing I have been able to think of so far is basically a hack: instead of BooleanFields, use CharFields with magic strings in them. Then boost those magic strings as search terms, and count on nobody accidentally using the magic strings in their inputted text. Likewise, instead of raw ids in my MultiValueFields, use ids prepended with magic strings. This is awkward, fragile and potentially buggy given that the behavior of the ElasticSearch standard tokenizer may be unpredictable given nonsensical "magic strings".
Another option I considered was using a Raw input type and adding ElasticSearch-specific syntax, but usage of Raw with ElasticSearch is almost entirely undocumented and the ElasticSearch boosting documentation itself is very thin.
Is there any way to solve this that does not involve mangling my index data in such a fashion?

In your mapping you could add:
"is_active":{
"type":"boolean",
"boost":10.0
}
and
"friends":{
"type":"int",
"index":"not_analyzed",
"boost":5.0
}
And then wrap your original query in a boolean query with a MUST on your original query and a SHOULD on is_active:true and SHOULD on friends:1234

Related

Elastic Search and Search Ranking Models

I am new to Elastic Search. I would like to know if the following steps are how typically people use ES to build a search engine.
Use Elastic Search to get a list of qualified documents/results based on a user's input.
Build and use a search ranking model to sort this list.
Use this sorted list as the output of the search engine to the user.
I would probably add a few steps
Think about your information model.
What kinds of documents are you indexing?
What are the important fields and what field types are they?
What fields should be shown in the search result?
All this becomes part of your mapping
Index documents
Are the underlying data changing or can you index it just once?
How are you detecting new docuemtns/deletes/updates?
This will be included in your connetors, that can be set up in multiple ways, for example using the Documents API
A bit of trial and error to sort out your ranking model
Depending on your use case, the default ranking may be enough.
have a look at the Search API to try out different ranking.
Use the search result list to present the results to the end user

ElasticSearch and Searching in Arrays

We have an ES index which has a field which stores its data as an array. In this field, we include the original text, plus text without any punctuation, special characters, etc. The problem is, when searching on the field, the multiple values appears to be skewing the score.
For example, if we search on the term 'up', the document which has the array ['up, up and away', 'up up and away'] is scoring higher with a multi_match (we are using because we may search more than one field) than the document with the array as simply ['up'].
In the end, I guess what I am looking for is a score that emulates calculating a score for each item in the array and returning me the highest. I believe in this case, comparing 'up' to 'Up' and 'Up, Up and Away' will give me a higher score for 'Up'.
With my research, I believe I may need to do custom scoring on this field...? If that is true, am I looking at "score_mode": "max" as what I want?
I think you slightly over-engineered your index. You don't need to create duplicate fields for the same information and remove punctuation, lowercase fields yourself.
I'd recommend you to read what are elasticsearch token filters and how to create multiple analyzers for the same field.
For your exact use case, if you provided a document sample, it would certainly help. But in any case looking at what you are dealing with - index your array of strings with default analyzer and with a custom one that you'll build yourself. Then you can use the same field, but with different analyzers (differently processed text) to control your score.

How to sort (and give weight) by Availability dates in SolR

i'm facing a big problem in my SolR DB.
My objects have a datetime field "Available_From" and a datetime field "Available_To".
We also have a "Ranking" field for the sorting.
I can search correctly with direct queries (eg. give me all the items that are available at the moment) but when i do a regular search i cannot find a way to show the items that result "available NOW" in the first places in the results, usually sorted by "Ranking" field.
How can i do this? Am I forced to write some java classes (the nearest thing i've found is there https://medium.com/#devchaitu18/sorting-based-on-a-custom-function-in-solr-c94ddae99a12) or is there a way to do with standard SolR queries?
Thanks in advance to everyone!
In your case you actually don't want sorting, since that indicates that you want one field to determine the returned sequence of documents.
Instead, use boosting - apply a very large boost to those that are available now, either through bq or boost, then apply a boost based on ranking. You'll have to tweak the weights given to each part based on how you want the search results to be presented.

Per user behavior based scoring in Elasticsearch

We do understand the behavior of user by analyzing the tags he usually search for.
Now we need to give higher precedence for such tags for these users. I would like to know how we can achieve this using Elasticsearch in an elegant manner.
Well the best approach for this would be to
Analyse the behavior of the user
See which all keywords are of his interests
Maintain one document per user in another index which have all these keywords.
On the searches for that user , boost the occurrence of these keywords using function_score query
You can use terms filter inside boost function to achieve this.Add the boost function under functions in the function score query
In terms filter , you can point to this users document and get the values dynamically
Use custom filter key so that the cache key constructed wont eat too much memory
In this approach , you can avoid lots of code paths in client code.

Multiple field autocomplete with index type boost

What I'm trying to accomplish on a high level is an autocomplete input field which queries both customers and orders on multiple fields, with customers ranking higher for customer name searches.
It seems to me that there are various ways to approach this problem with the tools that elasticsearch provides.
The way that I have approached this is to use multi_match queries with prefix_phrase type in order to get partial queries to work across multiple fields.
For example, "bo" should return back matches for "Bob Smith" as well as "Adam Boss". I'm indexing fullname as a separate field from firstname and lastname, so that "adam boss" will return a valid prefix match as well.
In addition, I'd like to boost customer results - trying to do that with a boost param on the multi_match, but that doesn't seem to be working the way I'd expect it to.
What would be a straight forward way to tackle this problem?
One of the challenges I'm facing with the elasticsearch docs is that it's not always clear which properties and features apply to which others. For example, the multi_match documentation doesn't talk about using a custom boost, other than on a field-level.
I think the best way is using completion suggester of ES (v0.90.3+), please refer here for a real use case:
http://www.elasticsearch.org/blog/you-complete-me/
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html

Resources