Elastic Search and Search Ranking Models - elasticsearch

I am new to Elastic Search. I would like to know if the following steps are how typically people use ES to build a search engine.
Use Elastic Search to get a list of qualified documents/results based on a user's input.
Build and use a search ranking model to sort this list.
Use this sorted list as the output of the search engine to the user.

I would probably add a few steps
Think about your information model.
What kinds of documents are you indexing?
What are the important fields and what field types are they?
What fields should be shown in the search result?
All this becomes part of your mapping
Index documents
Are the underlying data changing or can you index it just once?
How are you detecting new docuemtns/deletes/updates?
This will be included in your connetors, that can be set up in multiple ways, for example using the Documents API
A bit of trial and error to sort out your ranking model
Depending on your use case, the default ranking may be enough.
have a look at the Search API to try out different ranking.
Use the search result list to present the results to the end user

Related

How to sort (and give weight) by Availability dates in SolR

i'm facing a big problem in my SolR DB.
My objects have a datetime field "Available_From" and a datetime field "Available_To".
We also have a "Ranking" field for the sorting.
I can search correctly with direct queries (eg. give me all the items that are available at the moment) but when i do a regular search i cannot find a way to show the items that result "available NOW" in the first places in the results, usually sorted by "Ranking" field.
How can i do this? Am I forced to write some java classes (the nearest thing i've found is there https://medium.com/#devchaitu18/sorting-based-on-a-custom-function-in-solr-c94ddae99a12) or is there a way to do with standard SolR queries?
Thanks in advance to everyone!
In your case you actually don't want sorting, since that indicates that you want one field to determine the returned sequence of documents.
Instead, use boosting - apply a very large boost to those that are available now, either through bq or boost, then apply a boost based on ranking. You'll have to tweak the weights given to each part based on how you want the search results to be presented.

Google Search Appliance sort by metadata content

I'm trying to refine the search results received by my application by including the sort parameter in my HTTP requests. I've combed through the documentation here, but I can't find exactly what I'm looking for.
I'm searching for DOC filetypes, and I am able to sort by date or sort by metadata, as in alphabetizing by title, author, etc. I can also filter by whether or not the title contains certain keywords. What I want to do is to sort by whether or not the title contains certain keywords (these documents appearing first in the results), but to still keep the other results.
For example, with keywords [winter, Christmas, holiday] I could do a descending sort by the sum of inmeta:title~winter, inmeta:title~Christmas, inmeta:title~holiday and the top result might be
Winter holidays other than Christmas
followed by documents with one or two of the keywords, followed by documents that meet the other search parameters but contain no keywords.
Is this possible in GSA?
I finally achieved what I was trying to do, so figured I'd post in case it helps anyone else.
As far as I know, it is impossible to create a query with this capability, but with Google's Custom Search API, you can create a search engine with the desired keywords in the context file (by editing the XML file directly or by adding keywords through the CSE console). Then you can formulate the query as usual, but perform the search on your personalized engine.
https://developers.google.com/custom-search/docs/ranking

Inject additional logic when elastic searches based on search query

I have a scenario where i need to extend search behavior and add some additional filter logic based on the document id . If the document can be visible to the user searching and then show in search result but have not found any.
Currently as part of search , search query is executed after applying all the filters. After the search results are fetched, we need to know if the resource can be shown to this user. Pretty much like ACL.
Now, if i apply these authorization/ audience type filters myself after getting results from elastic search , it creates a lot of problems like the aggregations count changes post filter. Also, the pagination of results gets impacted.
Is a there a way to implement such rules as hook provided by elastic search. That is if i can implement some logic implementing some interface and then call some web service returning a boolean and then adding the search result to the final collection based on that.
Some insight would be very useful.

Per user behavior based scoring in Elasticsearch

We do understand the behavior of user by analyzing the tags he usually search for.
Now we need to give higher precedence for such tags for these users. I would like to know how we can achieve this using Elasticsearch in an elegant manner.
Well the best approach for this would be to
Analyse the behavior of the user
See which all keywords are of his interests
Maintain one document per user in another index which have all these keywords.
On the searches for that user , boost the occurrence of these keywords using function_score query
You can use terms filter inside boost function to achieve this.Add the boost function under functions in the function score query
In terms filter , you can point to this users document and get the values dynamically
Use custom filter key so that the cache key constructed wont eat too much memory
In this approach , you can avoid lots of code paths in client code.

Haystack boosting based on specific value in specific field

I am using Haystack with ElasticSearch and I would like to perform boosts that don't just boost a term in general, but instead boost a term only when it is found on a specific field.
For instance, on my UserIndex, I would like to prioritize (boost) search results where the user is marked as active. is_active is a BooleanField on the index model. I know how to filter so that I only fetch active users, but how can I boost active users but not outright filter out inactive users? I could apply a boost to the field in UserIndex, but that doesn't seem like it would work without some way other than an outright filter to search against that BooleanField (since otherwise there are no search terms that the field boost would affect). I could apply a boost to the SearchQuerySet, but the boost() function takes a string which appears to just be a straight-up search term, and you cannot specify a field for that term to occur in.
I might be able to solve that issue in isolation with order_by, but I have a bunch of other complex boosts I want to do:
I want to be able to boost matching users if they have IDs in a list specified by the application at runtime (this is so I can boost users relative to the context of the page where the search button was pressed). I could simply boost a search term containing the user's ID, but then if that number was coincidentally in another field, it would boost that field too and thus give very strange results.
I want to be able to boost the searching user's friends. I currently have the list of every user's friends in a MultiValueField on the search index model. I want to pass the searching user's ID in with the search query, and boost any users in the index who have the searching user's ID in their friends list. Again, I have the same problem as above -- I can boost the ID, but I can't specify that I only want to boost the occurrence of that ID in that specific field.
I have a second BooleanField I want to boost by, similar to is_active but boosted by a smaller amount.
All of this is easy-ish if I can boost by a combination of a term and a field, but it seems very hard if I can only boost a term and not a field.
The only thing I have been able to think of so far is basically a hack: instead of BooleanFields, use CharFields with magic strings in them. Then boost those magic strings as search terms, and count on nobody accidentally using the magic strings in their inputted text. Likewise, instead of raw ids in my MultiValueFields, use ids prepended with magic strings. This is awkward, fragile and potentially buggy given that the behavior of the ElasticSearch standard tokenizer may be unpredictable given nonsensical "magic strings".
Another option I considered was using a Raw input type and adding ElasticSearch-specific syntax, but usage of Raw with ElasticSearch is almost entirely undocumented and the ElasticSearch boosting documentation itself is very thin.
Is there any way to solve this that does not involve mangling my index data in such a fashion?
In your mapping you could add:
"is_active":{
"type":"boolean",
"boost":10.0
}
and
"friends":{
"type":"int",
"index":"not_analyzed",
"boost":5.0
}
And then wrap your original query in a boolean query with a MUST on your original query and a SHOULD on is_active:true and SHOULD on friends:1234

Resources