Counting occurrences of search terms in Elasticsearch function score script - elasticsearch

I have an Elasticsearch index with document structure like below.
{
  "id": "foo",
  "tags": ["Tag1", "Tag2", "Tag3"],
  "special_tags": ["SpecialTag1", "SpecialTag2", "SpecialTag3"],
  "reserved_tags": ["ReservedTag1", "ReservedTag2", "Tag1", "SpecialTag2"],
  // rest of the document
}
The fields tags, special_tags and reserved_tags are stored separately to support multiple use cases. In one of the queries, I want to order the documents by the number of searched tags that occur in any of the three fields.
For example, if I search with the three tags Tag1, Tag4 and SpecialTag3, the total number of occurrences in the above document is 2. Using this number, I want to add a custom score to the document and sort by that score.
I am already using function_score because the score depends on a few other attributes. To compute the number of matches, I tried a Painless script like the one below.
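As a sanity check on the expected number, the count described above can be reproduced outside Elasticsearch. This is only a sketch in Python; the function name count_matched_tags is made up for illustration, and it counts each searched tag at most once, no matter how many of the three fields contain it.

```python
def count_matched_tags(doc, searched_tags,
                       fields=("tags", "special_tags", "reserved_tags")):
    """Count searched tags that appear in at least one of the given fields."""
    present = set()
    for field in fields:
        # A missing field is treated as an empty list of tags.
        present.update(doc.get(field, []))
    return sum(1 for tag in set(searched_tags) if tag in present)


doc = {
    "id": "foo",
    "tags": ["Tag1", "Tag2", "Tag3"],
    "special_tags": ["SpecialTag1", "SpecialTag2", "SpecialTag3"],
    "reserved_tags": ["ReservedTag1", "ReservedTag2", "Tag1", "SpecialTag2"],
}

matched = count_matched_tags(doc, ["Tag1", "Tag4", "SpecialTag3"])
print(matched)  # 2: Tag1 and SpecialTag3 are present, Tag4 is not
```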
def matchedTags = 0;
def searchedTags = ["Tag1", "Tag4", "SpecialTag3"];
for (int i = 0; i < searchedTags.length; ++i) {
  if (doc['tags'].contains(searchedTags[i])) {
    matchedTags++;
    continue;
  }
  if (doc['special_tags'].contains(searchedTags[i])) {
    matchedTags++;
    continue;
  }
  if (doc['reserved_tags'].contains(searchedTags[i])) {
    matchedTags++;
  }
}
// logic to score on matchedTags (returning matchedTags for simplicity)
return matchedTags;
This runs as expected, but it is extremely slow. I assume that ES has to count the occurrences for each document and cannot use the index here. (If someone can shed light on how this works internally, or provide links to documentation or other resources, that would be helpful.)
I want to have two scoring functions:
1. Score as a function of the number of occurrences.
2. Score higher for more occurrences. This is basically the same as 1, but repeated occurrences would be counted.
Is there any way to get both the benefit of faster searching and custom scoring with a script?
Any help is appreciated. Thanks.
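One possible direction, sketched here without having been run against a cluster, is to replace the per-document script with one filter function per searched tag inside function_score, so that the matching can use the index. The sketch below builds such a request body in Python. It assumes the three tag fields are mapped as keyword (so term and terms match exact values), and the helper name build_tag_count_query is hypothetical. The base query is restricted to documents that contain at least one searched tag, so every hit matches at least one function.

```python
import json


def build_tag_count_query(searched_tags, fields, count_repeats=False):
    """Build a function_score body whose score is the number of tag matches."""
    # Only score documents that contain at least one searched tag in any field.
    any_tag = {
        "bool": {
            "should": [{"terms": {f: list(searched_tags)}} for f in fields],
            "minimum_should_match": 1,
        }
    }
    functions = []
    for tag in searched_tags:
        if count_repeats:
            # One function per (tag, field) pair: a tag in two fields adds 2.
            for f in fields:
                functions.append({"filter": {"term": {f: tag}}, "weight": 1})
        else:
            # One function per tag: the tag adds 1 if any field contains it.
            functions.append({
                "filter": {
                    "bool": {
                        "should": [{"term": {f: tag}} for f in fields],
                        "minimum_should_match": 1,
                    }
                },
                "weight": 1,
            })
    return {
        "query": {
            "function_score": {
                "query": any_tag,
                "functions": functions,
                "score_mode": "sum",      # add up the weights of matching functions
                "boost_mode": "replace",  # ignore the base query score
            }
        }
    }


fields = ["tags", "special_tags", "reserved_tags"]
tags = ["Tag1", "Tag4", "SpecialTag3"]
distinct_body = build_tag_count_query(tags, fields)
repeated_body = build_tag_count_query(tags, fields, count_repeats=True)
print(json.dumps(distinct_body, indent=2))
```

Under these assumptions, the example document would be expected to score 2 with the first body (Tag1 and SpecialTag3 each count once) and 3 with the second (Tag1 occurs in both tags and reserved_tags). Further scoring logic on top of the count would still need its own function or script.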

Related

Represent enum in Elastic Search for sorting

I have a use case to represent an enum for difficulty level (EASY, MEDIUM, DIFFICULT) in Elasticsearch, with support for sorting on this field. If this field is indexed as a string, sorting will not work as expected.
One way to support this is to index an integer value for each enumeration value in ES and map it back to the string value when the sorted results are returned by ES.
Are there other alternatives such that ES itself takes care of sorting in the enumeration order while this field is indexed as a string? Can I specify a custom sort function for a field? function_score is an option, but given that I only have to sort based on enum ordering, is there a better way than defining a custom function_score?
In my use case there are multiple such enumerations defining a scale across dimensions like difficulty, height (low, medium, high), grades (good, average, poor), etc. Both of the above solutions require custom work whenever a new dimension is introduced. Can either of the above approaches be generalized?
You can check the answer to the same question here. You will need to use script_score like below:
GET /my-index-2/_search
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "if (doc['field name'].value == 'EASY') { return 2; } else if (doc['field name'].value == 'MEDIUM') { return 1; } else if (doc['field name'].value == 'DIFFICULT') { return 0; } return 0;"
      }
    }
  }
}
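To avoid writing a new if/else chain for every enumeration, the ranks could be passed to the script as parameters. The following Python sketch builds such a script_score body; it is untested against a cluster, the helper name build_enum_rank_query is hypothetical, and it assumes the field is a keyword field with a value in every document (values missing from the ordering fall back to a score of 0).

```python
def build_enum_rank_query(field, ordering, base_query=None):
    """Build a script_score body that scores by position in `ordering`.

    `ordering` lists the enum values from lowest to highest score.
    """
    rank = {value: position for position, value in enumerate(ordering)}
    # Painless source: look up the document's value in the params map.
    source = (
        "def v = doc[params.field].value; "
        "return params.rank.containsKey(v) ? params.rank.get(v) : 0;"
    )
    return {
        "query": {
            "script_score": {
                "query": base_query or {"match_all": {}},
                "script": {
                    "source": source,
                    "params": {"field": field, "rank": rank},
                },
            }
        }
    }


# Same scores as the hard-coded script: DIFFICULT=0, MEDIUM=1, EASY=2.
difficulty_body = build_enum_rank_query("difficulty", ["DIFFICULT", "MEDIUM", "EASY"])
# A new dimension only needs a new ordering, not a new script.
height_body = build_enum_rank_query("height", ["low", "medium", "high"])
```

Because the script source stays identical across dimensions, only the params change from one enumeration to the next.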

Search After (pagination) in Elasticsearch when sorting by score

The search_after values in Elasticsearch must match the sort parameters in count and order. So I was wondering how to get the score from the previous result (for example page 1) to use it as the search_after value for the next page.
I faced an issue when using the score of the last document in the previous search. The score was 1.0, and since all documents have a score of 1.0, the result for the next page turned out to be empty.
That actually makes sense, since I am asking Elasticsearch for results that rank lower than a score of 1.0, and there are none. So which score do I use to get the next page?
Note:
I am sorting by score and then by TieBreakerID, so one possible solution is using a high value (say 1000) for the score.
What you're doing sounds like it should work, as explained by an Elastic team member. It works for me (in ES 7.7) even with tied scores when using the document ID (copied into another indexed field) as a tiebreaker. It's true that indexing additional documents while paginating will make your scores slightly unstable, but not likely enough to cause a significant problem for an end user. If you need it to be reliable for a batch job, the Scroll API is the better choice.
{
  "query": {
    ...
  },
  "search_after": [
    12.276552,
    14173
  ],
  "sort": [
    { "_score": "desc" },
    { "id": "asc" }
  ]
}
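The search_after values do not have to be assembled by hand: when a sort is specified, each hit in the response carries a sort array with that hit's sort values, in the same order as the sort clauses. A minimal Python sketch of building the next page's request from the previous response follows; the helper name next_page_body and the sample response are made up for illustration.

```python
import copy


def next_page_body(body, previous_response):
    """Return the request body for the next page, or None if there is none."""
    hits = previous_response["hits"]["hits"]
    if not hits:
        return None
    next_body = copy.deepcopy(body)
    # Reuse the sort values of the last hit: [score, tiebreaker id].
    next_body["search_after"] = hits[-1]["sort"]
    return next_body


first_page = {
    "query": {"match_all": {}},
    "sort": [{"_score": "desc"}, {"id": "asc"}],
}
# Hypothetical response in which every document has the same score.
response = {
    "hits": {
        "hits": [
            {"_id": "a", "_score": 1.0, "sort": [1.0, 14172]},
            {"_id": "b", "_score": 1.0, "sort": [1.0, 14173]},
        ]
    }
}
second_page = next_page_body(first_page, response)
print(second_page["search_after"])  # [1.0, 14173]
```

With tied scores, the tiebreaker value in the second position is what moves the cursor forward, so it has to be unique per document.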

Discover historical trends in Elasticsearch (not visual)

I have some experience with Elastic as log storage, but I'm stuck on basic trend recognition over time periods, where I need to compare the found documents to each other.
A simple query should answer the following question:
Find all occurrences of document rows (rows are ordered by a growing/continuous #timestamp value) where a specific field (e.g. threads_count) is growing for a fixed number of documents or a fixed time period.
So if I have the thread_count of some application, logged every minute over a day together with a timestamp, and I specify that I'm looking for a growing trend over 10 minutes, the result should return the documents or document sets where thread_count was greater than in the document from the minute before for at least 10 documents in a row.
It is very similar to looking at a line graph and identifying the growing parts by eye.
Maybe I am just missing the proper function name to search for. I'm not interested in visualization; I would like to find such situations through the API and take the needed actions.
Any reference to documentation or simple example is welcome!
Well, a script cannot compare values between documents, so you will have to work on a payload.
In your query, sort the results by date.
https://www.elastic.co/guide/en/elastic-stack-overview/6.3/how-watcher-works.html
A script in the payload transform could tell you whether a field is increasing (something like the following, here checking thread_count from the question; I don't have access to an ES index right now, so this is untested). It returns 'FALSE' as soon as a value fails to increase and 'TRUE' otherwise:
"transform": {
  "script": {
    "lang": "painless",
    "source": "def previous = null; for (def hit : ctx.payload.hits.hits) { def current = hit._source.thread_count; if (previous != null && current <= previous) { return 'FALSE'; } previous = current; } return 'TRUE';"
  }
}
If you use Logstash to index your documents, take a look at the elapsed filter, which could be useful too: https://www.elastic.co/guide/en/logstash/current/plugins-filters-elapsed.html
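The same check can also be done on the client once the hits have been fetched sorted by date. The following Python sketch finds every stretch in which the value grew for at least a given number of consecutive documents; the function name find_growing_runs and the sample values are made up for illustration, and the values are assumed to be extracted from the hits in ascending timestamp order.

```python
def find_growing_runs(values, min_increases):
    """Return (start, end) index pairs (end inclusive) of strictly growing runs.

    A run is reported only if it contains at least `min_increases`
    consecutive increases, i.e. it spans min_increases + 1 documents.
    """
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the data or when the value stops growing.
        if i == len(values) or values[i] <= values[i - 1]:
            if (i - 1) - start >= min_increases:
                runs.append((start, i - 1))
            start = i
    return runs


# Hypothetical thread_count readings, one per minute.
thread_counts = [5, 6, 7, 8, 3, 4, 2, 3, 4, 5, 6]
runs = find_growing_runs(thread_counts, min_increases=3)
print(runs)  # [(0, 3), (6, 10)]
```

The returned index pairs can be mapped back to the original hits to get the documents or timestamps of each growing period.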

Possible to have a document always return above certain position

I've got a bunch of documents from a query which are sorted by a modified date. However I'd like certain documents (identified by a field value) to always return in the top ten results regardless of whether there are ten or more documents with a more recent modified date.
From what I've read about the various ways of sorting in Elasticsearch (score, boost, scripts) I don't think I have any way of determining the actual position of a document in the search results, let alone some way of manipulating the score to push a document into the top ten.
Assuming that you have a field called "important_field" which contains the value 1 for the documents you want on top and, say, 0 for all other documents, you can use multi-field sorting as below:
{
  "sort": [
    { "important_field": { "order": "desc" }},
    { "modified_date": { "order": "desc" }}
  ]
}
This way of sorting means the results are sorted by important_field first, and documents with the same important_field value are then sorted by modified_date. So all documents with important_field value 1 will come on top, and the rest will still be sorted by modified_date.
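The effect of the two sort keys can be illustrated outside Elasticsearch with a plain Python sort over a few made-up documents. This is only an analogy for the ordering, not how Elasticsearch implements it; the sample documents are hypothetical.

```python
docs = [
    {"id": "a", "important_field": 0, "modified_date": "2020-03-01"},
    {"id": "b", "important_field": 1, "modified_date": "2020-01-15"},
    {"id": "c", "important_field": 0, "modified_date": "2020-02-10"},
    {"id": "d", "important_field": 1, "modified_date": "2020-02-20"},
]

# Sort by important_field descending, then by modified_date descending.
ordered = sorted(
    docs,
    key=lambda d: (d["important_field"], d["modified_date"]),
    reverse=True,
)
ordered_ids = [d["id"] for d in ordered]
print(ordered_ids)  # ['d', 'b', 'a', 'c']
```

Documents b and d come first even though a has the most recent date, and within each group the more recent document wins.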

How to sort elastic search results by score + boost + field?

Given an index of books that have a title, an author, and a description, I'd like the resulting search results to be sorted this way:
all books that match the title sorted by downloads (a numeric value)
all books that match on author sorted by downloads
all books that match on description sorted by downloads
I use the search query below, but the problem is that each entry has a different score thus making sorting by downloads irrelevant.
e.g. when the search term is 'sorting' - title: 'sorting in elastic search' will score higher than title: 'postgresql sorting is awesome' (because of the word position).
query = QueryBuilders.multiMatchQuery(queryString, "title^16", "author^8", "description^4")
elasticClient.prepareSearch(Index)
.setTypes(Book)
.setQuery(query)
.addSort(SortBuilders.scoreSort())
.addSort(SortBuilders.fieldSort("downloads").order(SortOrder.DESC))
How do I construct my query so that I could get the desired book sorting?
I use the standard analysers and I need the search query to be analysed; I will also have to handle multi-word search query strings.
Thx.
What you need here is a way to compute a score based on three weighted fields and a numeric field. Sorting by _score first and by downloads second only uses downloads to break ties between equal scores, so downloads has almost no effect when every document has a different score.
Hence a better approach would be to multiply downloads with the score obtained by the match.
So I would recommend a function_score query:
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "sorting",
          "fields": [
            "title^16",
            "author^8",
            "description^4"
          ]
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "downloads"
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}
This will compute the score based on all three fields and then multiply that score by the value in the downloads field to get the final score. The boost_mode setting decides how the value computed by the functions is combined with the score computed by the query; here it is multiply.
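The arithmetic can be sketched in Python for three of the available boost_mode values. The scores and download counts below are invented for illustration, and the sketch assumes field_value_factor is used with its default factor and no modifier, so the function value is simply the downloads value.

```python
def combine(query_score, function_value, boost_mode="multiply"):
    """Combine the query score with the function value, per boost_mode."""
    if boost_mode == "multiply":
        return query_score * function_value
    if boost_mode == "sum":
        return query_score + function_value
    if boost_mode == "replace":
        return function_value
    raise ValueError(f"unsupported boost_mode: {boost_mode}")


# Hypothetical books: (query score from multi_match, downloads).
books = {
    "sorting in elastic search": (3.0, 10),
    "postgresql sorting is awesome": (2.0, 100),
}
final_scores = {
    title: combine(score, downloads)
    for title, (score, downloads) in books.items()
}
print(final_scores)
# {'sorting in elastic search': 30.0, 'postgresql sorting is awesome': 200.0}
```

In this made-up case the second book ranks first despite its lower text score, because its downloads value dominates the product.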
