Per user behavior based scoring in Elasticsearch - elasticsearch

We understand a user's behavior by analyzing the tags he usually searches for.
Now we need to give higher precedence to those tags for that user. I would like to know how we can achieve this in Elasticsearch in an elegant manner.

Well, the best approach for this would be to:
Analyse the behavior of the user.
See which keywords are of interest to him.
Maintain one document per user, in a separate index, that holds all of these keywords.
On searches by that user, boost the occurrence of these keywords using a function_score query.
You can use a terms filter inside a boost function to achieve this. Add the boost function under functions in the function_score query.
In the terms filter, you can point to this user's document and fetch the values dynamically (a terms lookup).
Use a custom filter cache key so that the constructed cache key won't eat too much memory.
With this approach, you avoid a lot of code paths in the client code.
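
The steps above can be sketched as a request body built in Python. This is only a minimal sketch under assumptions: the index name user_profiles, the fields keywords, tags and title, and the helper name build_user_boosted_query are all hypothetical, and the syntax follows recent Elasticsearch releases (older releases used boost_factor instead of weight and also required a type in the terms lookup).

```python
def build_user_boosted_query(user_id, text, weight=2.0):
    """Build a function_score body that boosts documents whose tags
    appear in the keyword list stored in the user's profile document."""
    terms_lookup = {
        "index": "user_profiles",  # hypothetical index: one document per user
        "id": str(user_id),        # ID of this user's profile document
        "path": "keywords",        # hypothetical field holding the keywords
    }
    return {
        "query": {
            "function_score": {
                # The user's actual search; "title" is a hypothetical field.
                "query": {"match": {"title": text}},
                "functions": [
                    # Documents matching the terms lookup get their score
                    # multiplied by `weight`; the rest keep their score.
                    {"filter": {"terms": {"tags": terms_lookup}}, "weight": weight}
                ],
                "boost_mode": "multiply",
            }
        }
    }


body = build_user_boosted_query(42, "kibana dashboards")
```

The resulting dictionary can be sent as the body of a search request. Because the keywords are resolved on the server through the lookup, the client never has to fetch or inline them.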

Related

Elasticsearch multiple score fields

Maybe a dummy question: is it possible to have multiple score fields?
I use a custom score based on a function_score query. This score is displayed to the user to show how much each document matches his/her preferences. So far so good.
But! The user should be able to filter the documents and (of course) sort them not only by the custom relevance (how much each document matches his/her preferences) but also by the common relevance, i.e. how much each document matches the filter criteria.
So my first idea was to store the score calculated by the function_score query in a custom field, but that does not seem to be supported.
Or am I completely wrong and I should use another approach?
I took a different approach: when the user applies a filter, I run the query without the function_score percolation, use the score calculated by ES, and sort by it. Then I take all the IDs from the result page and run the percolation query with these IDs to get the custom "matching score". It does not seem to cause a noticeable slowdown.
Anyway, I welcome any feedback.
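
A rough sketch of that two-pass idea in Python, assuming the second pass is a function_score query restricted to the page's IDs with an ids query. The helper names and the example preference function are hypothetical; only the hit keys _id and _score follow the usual search response shape.

```python
def build_preference_query(doc_ids, preference_functions):
    """Second pass: score only the documents of the current result page
    with the user's preference functions."""
    return {
        "size": len(doc_ids),
        "query": {
            "function_score": {
                "query": {"ids": {"values": list(doc_ids)}},
                "functions": list(preference_functions),
                # Use the function result alone as the score.
                "boost_mode": "replace",
            }
        },
    }


def merge_scores(relevance_hits, preference_hits):
    """Attach the preference score to each first-pass hit, keeping the
    first-pass (relevance) ordering."""
    preference_by_id = {hit["_id"]: hit["_score"] for hit in preference_hits}
    return [
        {
            "_id": hit["_id"],
            "relevance": hit["_score"],
            "preference": preference_by_id.get(hit["_id"]),
        }
        for hit in relevance_hits
    ]


# Hypothetical preference function: favour documents tagged "python".
functions = [{"filter": {"term": {"tags": "python"}}, "weight": 3.0}]
second_pass = build_preference_query(["a", "b"], functions)

merged = merge_scores(
    [{"_id": "a", "_score": 1.7}, {"_id": "b", "_score": 0.9}],
    [{"_id": "b", "_score": 3.0}, {"_id": "a", "_score": 1.0}],
)
```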

Elastic Search and Search Ranking Models

I am new to Elasticsearch. I would like to know whether the following steps are how people typically use ES to build a search engine.
Use Elastic Search to get a list of qualified documents/results based on a user's input.
Build and use a search ranking model to sort this list.
Use this sorted list as the output of the search engine to the user.
I would probably add a few steps:
Think about your information model.
What kinds of documents are you indexing?
What are the important fields and what field types are they?
What fields should be shown in the search result?
All this becomes part of your mapping
Index documents
Are the underlying data changing or can you index it just once?
How are you detecting new documents/deletes/updates?
This will be handled by your connectors, which can be set up in multiple ways, for example using the Documents API
A bit of trial and error to sort out your ranking model
Depending on your use case, the default ranking may be enough.
Have a look at the Search API to try out different rankings.
Use the search result list to present the results to the end user
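
If the default ranking turns out not to be enough, the "retrieve, then rank" flow from the question could look roughly like the sketch below. It assumes hits shaped like search response entries (_score and _source); the popularity field and the toy_model function are purely hypothetical stand-ins for a real ranking model.

```python
def rerank(hits, score_fn):
    """Re-sort the hits returned by Elasticsearch with a custom model,
    highest model score first."""
    return sorted(hits, key=score_fn, reverse=True)


def toy_model(hit):
    # Hypothetical model: blend the Elasticsearch score with a
    # "popularity" field stored on the document.
    return hit["_score"] + 0.1 * hit["_source"].get("popularity", 0)


hits = [
    {"_id": "b", "_score": 2.0, "_source": {"popularity": 0}},
    {"_id": "a", "_score": 1.0, "_source": {"popularity": 50}},
]
ranked_ids = [hit["_id"] for hit in rerank(hits, toy_model)]
```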

Elasticsearch - Lookup value for query score

I am using elasticsearch with a custom script. I need to create a custom score based on a value defined in another index. Let's call this value "ranking". The score should look like this:
Math.log(_score * 2) + doc['ranking'].value
The ranking depends on the provider field of the type being scored. My problem is that, since the ranking is in another index altogether, the script doesn't retrieve the value. What is the best way to compute such a score? Also, how could it be done so that the results are cached? For example, is there any way in mvel to retrieve the value from a redis key-value pair?
You must forget about mvel :)
You should use Groovy scripting
You could check this link for a caching technique: https://github.com/imotov/elasticsearch-native-script-example/blob/master/src/main/java/org/elasticsearch/examples/nativescript/script/LookupScript.java. It is a native Java plugin, though it could be translated into a Groovy script.
Another way (it depends on how your indexes are related, for example parent/child) is to use a has_child query and use the score field as a transfer block between indexes, but this is too limited.
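
To illustrate the cached-lookup idea only, here is the same formula computed outside Elasticsearch in Python. This is not the native script from the link: PROVIDER_RANKINGS is a hypothetical in-memory stand-in for the other index (or a Redis store), and both function names are made up for the example.

```python
import math
from functools import lru_cache

# Hypothetical stand-in for the "ranking" values kept in another index
# or in a key-value store such as Redis.
PROVIDER_RANKINGS = {"acme": 3.0, "globex": 1.5}


@lru_cache(maxsize=1024)
def lookup_ranking(provider):
    """Fetch the ranking for a provider; repeated calls are served from
    the cache instead of hitting the backing store again."""
    return PROVIDER_RANKINGS.get(provider, 0.0)


def custom_score(es_score, provider):
    """Math.log(_score * 2) + ranking, with a natural logarithm as in
    Java's Math.log. Requires es_score > 0."""
    return math.log(es_score * 2) + lookup_ranking(provider)


score = custom_score(0.5, "acme")
```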

Multiple field autocomplete with index type boost

What I'm trying to accomplish on a high level is an autocomplete input field which queries both customers and orders on multiple fields, with customers ranking higher for customer name searches.
It seems to me that there are various ways to approach this problem with the tools that elasticsearch provides.
The way that I have approached this is to use multi_match queries with the phrase_prefix type in order to get partial queries to work across multiple fields.
For example, "bo" should return back matches for "Bob Smith" as well as "Adam Boss". I'm indexing fullname as a separate field from firstname and lastname, so that "adam boss" will return a valid prefix match as well.
In addition, I'd like to boost customer results - trying to do that with a boost param on the multi_match, but that doesn't seem to be working the way I'd expect it to.
What would be a straightforward way to tackle this problem?
One of the challenges I'm facing with the elasticsearch docs is that it's not always clear which properties and features apply to which others. For example, the multi_match documentation doesn't talk about using a custom boost, other than on a field-level.
I think the best way is to use the completion suggester of ES (v0.90.3+). Please refer to these links for a real use case:
http://www.elasticsearch.org/blog/you-complete-me/
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
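
As a rough idea of what a completion suggester request looks like, here is a small Python sketch of the request body. It assumes the request shape of recent Elasticsearch releases (older releases, including 0.90.x, used a different shape), and the suggestion name name-suggest and the field name suggest are hypothetical; the field must be mapped with the completion type.

```python
def build_completion_request(prefix, field="suggest", size=5):
    """Build a search body asking the completion suggester for entries
    that start with `prefix`."""
    return {
        "suggest": {
            "name-suggest": {  # hypothetical name for this suggestion
                "prefix": prefix,
                "completion": {"field": field, "size": size},
            }
        }
    }


request_body = build_completion_request("bo")
```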

Haystack boosting based on specific value in specific field

I am using Haystack with ElasticSearch and I would like to perform boosts that don't just boost a term in general, but instead boost a term only when it is found on a specific field.
For instance, on my UserIndex, I would like to prioritize (boost) search results where the user is marked as active. is_active is a BooleanField on the index model. I know how to filter so that I only fetch active users, but how can I boost active users but not outright filter out inactive users? I could apply a boost to the field in UserIndex, but that doesn't seem like it would work without some way other than an outright filter to search against that BooleanField (since otherwise there are no search terms that the field boost would affect). I could apply a boost to the SearchQuerySet, but the boost() function takes a string which appears to just be a straight-up search term, and you cannot specify a field for that term to occur in.
I might be able to solve that issue in isolation with order_by, but I have a bunch of other complex boosts I want to do:
I want to be able to boost matching users if they have IDs in a list specified by the application at runtime (this is so I can boost users relative to the context of the page where the search button was pressed). I could simply boost a search term containing the user's ID, but then if that number was coincidentally in another field, it would boost that field too and thus give very strange results.
I want to be able to boost the searching user's friends. I currently have the list of every user's friends in a MultiValueField on the search index model. I want to pass the searching user's ID in with the search query, and boost any users in the index who have the searching user's ID in their friends list. Again, I have the same problem as above -- I can boost the ID, but I can't specify that I only want to boost the occurrence of that ID in that specific field.
I have a second BooleanField I want to boost by, similar to is_active but boosted by a smaller amount.
All of this is easy-ish if I can boost by a combination of a term and a field, but it seems very hard if I can only boost a term and not a field.
The only thing I have been able to think of so far is basically a hack: instead of BooleanFields, use CharFields with magic strings in them. Then boost those magic strings as search terms, and count on nobody accidentally using the magic strings in their inputted text. Likewise, instead of raw ids in my MultiValueFields, use ids prepended with magic strings. This is awkward, fragile and potentially buggy given that the behavior of the ElasticSearch standard tokenizer may be unpredictable given nonsensical "magic strings".
Another option I considered was using a Raw input type and adding ElasticSearch-specific syntax, but usage of Raw with ElasticSearch is almost entirely undocumented and the ElasticSearch boosting documentation itself is very thin.
Is there any way to solve this that does not involve mangling my index data in such a fashion?
In your mapping you could add:
    "is_active": {
        "type": "boolean",
        "boost": 10.0
    }
and
    "friends": {
        "type": "integer",
        "index": "not_analyzed",
        "boost": 5.0
    }
Then wrap your original query in a boolean query with a MUST clause on your original query, a SHOULD clause on is_active:true, and a SHOULD clause on friends:1234.
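
The wrapping query could be sketched in Python as below. Note one deliberate swap: instead of relying on the mapping-level boosts, this sketch puts the boosts on the term queries themselves, which is an assumption on my part and not what the answer specifies. The helper name wrap_with_boosts and the example match query are hypothetical.

```python
def wrap_with_boosts(original_query, searching_user_id):
    """Wrap a query so that it must match, while active users and
    friends of the searching user are scored higher but never required."""
    return {
        "bool": {
            "must": [original_query],
            "should": [
                # Query-time boosts (assumption: used here in place of
                # the mapping-level boosts).
                {"term": {"is_active": {"value": True, "boost": 10.0}}},
                {"term": {"friends": {"value": searching_user_id, "boost": 5.0}}},
            ],
        }
    }


wrapped = wrap_with_boosts({"match": {"text": "bob"}}, 1234)
```

Because the boosted clauses sit under should, inactive users and non-friends still match; they simply score lower.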
