Elasticsearch popularity fallback

I have a fallback in my queries to a popularity ranking if no hits are found. Every week I calculate a popRank field based on the number of times the doc is visited in the last month. This means that not all docs will have a popRank, only the ones visited in the last month.
The query below does not work with the must clause, even though there are docs that contain that category:
GET /index/docs/_search
{
"size": 10,
"query": {
"bool": {
"should": [{
"terms": {
"body": [<array of keyword strings>]
}
}, {
"constant_score": {
"filter": {
"match_all": {}
},
"boost": 0
}
}],
"must": [{
"terms": {
"category": ["DIY"],
"boost": 0
}
}],
"minimum_should_match": 1
}
},
"sort": [{
"_score": {
"order": "desc"
}
}, {
"popRank": {
"unmapped_type": "double",
"order": "desc"
}
}]
}
This query is supposed to return matching docs when the should clause is satisfied; when it is not, the popularity ranking should take over. In either case the results must be filtered by category. It works when a clause other than the match_all matches, but not when only the match_all matches.
This is an example doc:
{
"_index": "index",
"_type": "docs",
"_id": "Fridays",
"_score": 1,
"_source": {
"id": "Fridays",
"body": "text...",
"category": [
"DIY",
"Kitchen"
],
"popRank": 1
}
}
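The intended ranking can be modeled in plain Python. This is not Elasticsearch's scoring. The model keeps only docs in the requested category and gives each doc a simplified score equal to the number of keywords found in its body. It then sorts by that score descending and by popRank descending, with docs that have no popRank last. The keyword-count score, the helper name `rank` and the extra docs are hypothetical and serve only as illustration.

```python
from typing import Any, Optional


def rank(docs: list[dict[str, Any]], keywords: list[str], category: str) -> list[str]:
    """Model of the intended fallback ranking (not real Elasticsearch scoring)."""
    # Category restriction: only docs tagged with `category` are kept.
    kept = [d for d in docs if category in d.get("category", [])]

    def sort_key(doc: dict[str, Any]) -> tuple[int, bool, float]:
        # Simplified relevance: how many keywords occur as whitespace tokens in body.
        tokens = set(doc.get("body", "").lower().split())
        score = sum(1 for kw in keywords if kw.lower() in tokens)
        pop: Optional[float] = doc.get("popRank")
        # Score desc, then popRank desc; docs without popRank sort last.
        return (-score, pop is None, -(pop or 0.0))

    return [d["id"] for d in sorted(kept, key=sort_key)]


docs = [
    {"id": "Fridays", "body": "text...", "category": ["DIY", "Kitchen"], "popRank": 1},
    # Hypothetical docs added for illustration:
    {"id": "shelf", "body": "drill a shelf", "category": ["DIY"], "popRank": 5},
    {"id": "paint", "body": "paint tips", "category": ["DIY"]},
    {"id": "garden", "body": "drill holes", "category": ["Garden"], "popRank": 9},
]
print(rank(docs, ["paint"], "DIY"))    # ['paint', 'shelf', 'Fridays']
print(rank(docs, ["nothing"], "DIY"))  # ['shelf', 'Fridays', 'paint']
```

In the second call no keyword matches, so the order comes purely from popRank. That is the fallback described above.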

Related

ES: Sort on the result of a Query function

I'm quite new to ES and have been trying many different ways to sort a subset of the results of a query/filter. The aggs always sort on the whole collection instead of on the results of the query above. My final goal is to sort the query results (already sorted by _score and limited to 5 docs) by the price field:
{
"query": {
"bool": {
"must": {
"function_score": {
"functions": [....],
"query": {....},
"score_mode": "sum",
"max_boost": 1.5
}
},
"filter": [...]
}
},
"size": 5,
"from": 0,
"sort": {
"_score": "desc"
},
"_source": [
"title",
"price"
],
"aggs": {
"i_am_confused": {
"terms": {
"field": "price",
"order": {
"_term": "desc"
}
}
}
}
}
I don't want to sort on the client side, because the subset would be at least 700 docs.
I appreciate your help.
I've tried a couple of aggs, but none of them work the way I want; probably I didn't use them correctly.

Elasticsearch Bool query with minimum_should_match set to zero not honored

I add 3 documents:
POST test/_doc
{"value": 1}
POST test/_doc
{"value": 2}
POST test/_doc
{"value": 3}
Then I run the following query, which I expect to return all 3 docs, with the documents matching the should clause ranked higher:
GET /test/_search
{
"query": {
"bool": {
"minimum_should_match": 0,
"should": [
{
"range": {
"value": {
"gte": 2
}
}
}
]
}
}
}
But instead I get only 2 docs (values 2 and 3). "minimum_should_match": 0 has no effect until I add a filter or must clause to the bool query, like below:
GET /test/_search
{
"query": {
"bool": {
"filter": [ { "match_all": { } } ],
"should": [
{
"range": {
"value": {
"gte": 2
}
}
}
]
}
}
}
What I want:
Whether the must or filter clause of the bool query is empty or filled, the should clause must not filter out any documents; it should only participate in ranking. How can I achieve that? Thanks.
It's a little surprising that "minimum_should_match": 0 has no effect on the should clause. The minimum_should_match documentation may explain this. It states:
No matter what number the calculation arrives at, a value greater than
the number of optional clauses, or a value less than 1 will never be
used. (ie: no matter how low or how high the result of the calculation
result is, the minimum number of required matches will never be lower
than 1 or greater than the number of clauses.
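The behavior above can be captured in a rough mental model. This is a simplified sketch based on the quoted documentation and the results in the question, not Elasticsearch's implementation. In a bool query with only should clauses, a minimum below 1 is never used, so 0 acts like 1. Once a must or filter clause is present, a minimum of 0 makes the should clauses purely optional. The function names are hypothetical. The sketch does not model percentage values or values larger than the number of should clauses.

```python
from typing import Callable, Optional

Doc = dict
Clause = Callable[[Doc], bool]


def effective_min_should(n_should: int, msm: Optional[int], has_must_or_filter: bool) -> int:
    """Simplified model: how many should clauses a doc must match."""
    if n_should == 0:
        return 0
    if msm is None:
        # Default: 1 for a should-only bool query, 0 when must/filter is present.
        msm = 0 if has_must_or_filter else 1
    if not has_must_or_filter:
        # Should-only bool query: a minimum below 1 is never used.
        msm = max(msm, 1)
    return msm


def bool_matches(doc: Doc, must: list[Clause], filters: list[Clause],
                 should: list[Clause], msm: Optional[int] = None) -> bool:
    # must and filter clauses are mandatory for every hit.
    if not all(clause(doc) for clause in must + filters):
        return False
    needed = effective_min_should(len(should), msm, bool(must or filters))
    return sum(clause(doc) for clause in should) >= needed


def gte_2(d: Doc) -> bool:
    return d["value"] >= 2


def match_all(d: Doc) -> bool:
    return True


docs = [{"value": 1}, {"value": 2}, {"value": 3}]
print([d["value"] for d in docs if bool_matches(d, [], [], [gte_2], msm=0)])     # [2, 3]
print([d["value"] for d in docs if bool_matches(d, [], [match_all], [gte_2])])   # [1, 2, 3]
```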
There are two ways to get all the documents in the result while using the should clause only for ranking:
Use a must or filter clause with a match_all query, which you already figured out, as shown in the question above.
Another way is to use should clauses with the boost parameter:
Search Query:
{
"query": {
"bool": {
"should": [
{
"range": {
"value": {
"gte": 2,
"boost": 2.0
}
}
},
{
"range": {
"value": {
"lt": 2,
"boost": 1.0
}
}
}
]
}
}
}
Search Result will be
"hits": [
{
"_index": "68040640",
"_type": "_doc",
"_id": "2",
"_score": 2.0,
"_source": {
"value": 2
}
},
{
"_index": "68040640",
"_type": "_doc",
"_id": "3",
"_score": 2.0,
"_source": {
"value": 3
}
},
{
"_index": "68040640",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"value": 1
}
}
]
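This query returns all three docs because the two range clauses split the value space into gte 2 and lt 2. Every doc with a value therefore matches exactly one should clause. Range queries on numeric fields produce a constant score, so each doc's score is simply the boost of the clause it matched, as in the result above. The following sketch reproduces that arithmetic in Python. The helper `boosted_score` is hypothetical. A doc without a `value` field would match neither clause and would be dropped.

```python
def boosted_score(value: float) -> float:
    """Score under the two-clause should query: each matching range clause adds its boost."""
    score = 0.0
    if value >= 2:   # {"range": {"value": {"gte": 2, "boost": 2.0}}}
        score += 2.0
    if value < 2:    # {"range": {"value": {"lt": 2, "boost": 1.0}}}
        score += 1.0
    return score


values = [1, 2, 3]
# sorted() is stable even with reverse=True, so ties keep their input order.
ranked = sorted(values, key=boosted_score, reverse=True)
print(ranked)  # [2, 3, 1]
```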

ElasticSearch Aggregation + Sorting on a Non-Numeric Field (5.3)

I want to aggregate the data on one field and also get the aggregated data sorted by name.
My data is:
{
"_index": "testing-aggregation",
"_type": "employee",
"_id": "emp001_local000000000000001",
"_score": 10.0,
"_source": {
"name": [
"Person 01"
],
"groupbyid": [
"group0001"
],
"ranking": [
"2.0"
]
}
},
{
"_index": "testing-aggregation",
"_type": "employee",
"_id": "emp002_local000000000000001",
"_score": 85146.375,
"_source": {
"name": [
"Person 02"
],
"groupbyid": [
"group0001"
],
"ranking": [
"10.0"
]
}
},
{
"_index": "testing-aggregation",
"_type": "employee",
"_id": "emp003_local000000000000001",
"_score": 20.0,
"_source": {
"name": [
"Person 03"
],
"groupbyid": [
"group0002"
],
"ranking": [
"-1.0"
]
}
},
{
"_index": "testing-aggregation",
"_type": "employee",
"_id": "emp004_local000000000000001",
"_score": 5.0,
"_source": {
"name": [
"Person 04"
],
"groupbyid": [
"group0002"
],
"ranking": [
"2.0"
]
}
}
My query:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "name:emp*^1000.0"
}
}
]
}
},
"aggs": {
"groupbyid": {
"terms": {
"field": "groupbyid.raw",
"order": {
"top_hit_agg": "desc"
},
"size": 10
},
"aggs": {
"top_hit_agg": {
"terms": {
"field": "name"
}
}
}
}
}
}
My mapping is:
{
"name": {
"type": "text",
"fielddata": true,
"fields": {
"lower_case_sort": {
"type": "text",
"fielddata": true,
"analyzer": "case_insensitive_sort"
}
}
},
"groupbyid": {
"type": "text",
"fielddata": true,
"index": "analyzed",
"fields": {
"raw": {
"type": "keyword",
"index": "not_analyzed"
}
}
}
}
I am getting data based on the average of the relevance of grouped records. Now, what I wanted is the first club the records based on the groupid and then in each bucket sort the data based on the name field.
I want to group on one field and then, within each grouped bucket, sort on another field. The data above is sample data.
There are other fields, such as created_on and updated_on, that I also want to sort on. I also want to get the data grouped alphabetically.
I want to sort on a non-numeric (string) field; sorting on a numeric field works.
It works for the ranking field but not for the name field, which gives the error below:
Expected numeric type on field [name], but got [text];
You're asking for a few things, so I'll try to answer them in turn.
Step 1: Sorting buckets by relevance
I am getting data based on the average of the relevance of grouped records.
If this is what you're attempting to do, it's not what the aggregation you wrote is doing. Terms aggregations default to sorting the buckets by the number of documents in each bucket, descending. To sort the groups by "average relevance" (which I'll interpret as "average _score of documents in the group"), you'd need to add a sub-aggregation on the score and sort the terms aggregation by that:
"aggregations": {
"most_relevant_groups": {
"terms": {
"field": "groupbyid.raw",
"order": {
"average_score": "desc"
}
},
"aggs": {
"average_score": {
"avg": {
"script": {
"inline": "_score",
"lang": "painless",
}
}
}
}
}
}
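The following plain-Python sketch shows the ordering this aggregation produces. It buckets the hits by groupbyid, averages each bucket's _score and sorts the buckets by that average, descending. It reuses the _score values from the sample hits purely for illustration. In a real search the scores come from your query. The helper name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

hits = [
    {"name": "Person 01", "groupbyid": "group0001", "_score": 10.0},
    {"name": "Person 02", "groupbyid": "group0001", "_score": 85146.375},
    {"name": "Person 03", "groupbyid": "group0002", "_score": 20.0},
    {"name": "Person 04", "groupbyid": "group0002", "_score": 5.0},
]


def buckets_by_average_score(hits: list[dict]) -> list[tuple[str, float]]:
    """Mimic a terms agg on groupbyid ordered by an avg(_score) sub-aggregation."""
    groups: dict[str, list[float]] = defaultdict(list)
    for hit in hits:
        groups[hit["groupbyid"]].append(hit["_score"])
    averages = {key: mean(scores) for key, scores in groups.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


print(buckets_by_average_score(hits))  # [('group0001', 42578.1875), ('group0002', 12.5)]
```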
Step 2: Sorting employees by name
Now, what I wanted is the first club the records based on the groupid and then in each bucket sort the data based on the name field.
To sort the documents within each bucket, you can use a top_hits aggregation:
"aggregations": {
"most_relevant_groups": {
"terms": {
"field": "groupbyid.raw",
"order": {
"average_score": "desc"
}
},
"aggs": {
"employees": {
"top_hits": {
"size": 10, // Default will be 10 - change to whatever
"sort": [
{
"name.lower_case_sort": {
"order": "asc"
}
}
]
}
}
}
}
}
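Within each bucket, a top_hits aggregation sorted on name.lower_case_sort behaves roughly like a case-insensitive sort on name, truncated to `size`. This sketch assumes that the case_insensitive_sort analyzer, whose definition the question does not show, lowercases the whole name. The mixed-case names and the helper name are hypothetical.

```python
def top_hits_sorted_by_name(names: list[str], size: int = 10) -> list[str]:
    """Approximate a top_hits agg sorted on name.lower_case_sort ascending.

    Assumes the case_insensitive_sort analyzer lowercases the whole value,
    so the order is a case-insensitive string comparison.
    """
    return sorted(names, key=str.lower)[:size]


bucket = ["person 02", "Person 10", "PERSON 01"]  # hypothetical mixed-case names
print(sorted(bucket))                   # ['PERSON 01', 'Person 10', 'person 02']
print(top_hits_sorted_by_name(bucket))  # ['PERSON 01', 'person 02', 'Person 10']
```

A plain, case-sensitive sort puts uppercase letters first. This is why the answer sorts on the lowercased subfield rather than on name directly.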
Step 3: Putting it all together
Putting both of the above together, the following aggregation should suit your needs. Note that I used a function_score query to simulate "relevance" based on ranking; your query can be any query that produces the relevance you need:
POST /testing-aggregation/employee/_search
{
"size": 0,
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "ranking"
}
}
]
}
},
"aggs": {
"groupbyid": {
"terms": {
"field": "groupbyid.raw",
"size": 10,
"order": {
"average_score": "desc"
}
},
"aggs": {
"average_score": {
"avg": {
"script": {
"inline": "_score",
"lang": "painless"
}
}
},
"employees": {
"top_hits": {
"size": 10,
"sort": [
{
"name.lower_case_sort": {
"order": "asc"
}
}
]
}
}
}
}
}
}

Elastic Search v6.3: Query with filter never returns any matches

I'm facing some challenges with Elasticsearch. I want to query by some text and then filter by a category. I followed the Elasticsearch 6.3 documentation for queries, but the response from ES is always empty. I know for a fact that at least one entry should match the request. Below are my query and the entry that I know is present in my index. Any help is very much appreciated.
Query
{
"from": 0,
"size": 300,
"query": {
"bool": {
"filter": {
"term": {"category": "Soups"}
},
"should": [
{"term": {"instructions": "Matt"}},
{"term": {"introduction": "Matt"}},
{"term": {"recipe_name": "Matt"}},
],
"minimum_should_match": 1,
"boost": 1.0
}
}
}
Record Present in Elastic Search
{
"_index": "recipes",
"_type": "_doc",
"_id": "QMCScWoBkkkjW61rD81v",
"_score": 0.2876821,
"_source": {
"calories": 124,
"category": "Soups",
"cook_time": {
"hour": "2",
"min": "4"
},
"cooking_temp": "375",
"cooking_temp_units": "°F",
"creator_username": "virtualprodigy",
"ingredients": [
{
"majorQuantity": "1 ",
"measuring_units": "teaspoon",
"minorQuantity": " ",
"name": "mett"
}
],
"instructions": "instructions",
"introduction": "intro",
"prep_time": {
"hour": "1",
"min": "2"
},
"recipe_name": "Matt Test",
"servings": 1
}
}
Your fields are probably indexed with the standard analyzer, which splits text into tokens and lowercases them. The term query looks for an exact term and does not analyze its input. So you are searching for 'Matt' while the index only contains 'matt', and for 'Soups' while the index only contains 'soups'. The easiest fix is to change your term queries into match queries, e.g.:
{
"from": 0,
"size": 300,
"query": {
"bool": {
"filter": {
"match": {
"category": "Soups"
}
},
"should": [
{"match": {"instructions": "Matt"}},
{"match": {"introduction": "Matt"}},
{"match": {"recipe_name": "Matt"}}
],
"minimum_should_match": 1,
"boost": 1.0
}
}
}
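The following rough Python model shows why the term query misses and the match query hits. The `standard_analyze` helper is a hypothetical stand-in for the standard analyzer: it splits on non-word characters and lowercases. The real analyzer follows Unicode text segmentation rules and handles many more cases.

```python
import re


def standard_analyze(text: str) -> list[str]:
    """Crude approximation of the standard analyzer: tokenize, then lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def term_matches(indexed_text: str, term: str) -> bool:
    # term query: the search term is NOT analyzed; it is compared as-is.
    return term in standard_analyze(indexed_text)


def match_matches(indexed_text: str, query: str) -> bool:
    # match query: the query text goes through the same analyzer first
    # (default operator OR, so any shared token is enough).
    indexed = set(standard_analyze(indexed_text))
    return any(tok in indexed for tok in standard_analyze(query))


print(standard_analyze("Matt Test"))         # ['matt', 'test']
print(term_matches("Matt Test", "Matt"))     # False
print(match_matches("Matt Test", "Matt"))    # True
```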

Boosting results based on selected types in elasticsearch

I have different types indexed in Elasticsearch.
But if I want to boost the results of some selected types, what should I do?
I could use a type filter in a boosting query, but the type filter only allows one type. I need results to be boosted based on multiple types.
Example:
I have Person, Event, Location data indexed in elastic search where Person, Location and Event are my types.
I am searching for the keyword 'London' across all types, but I want Person and Event records to be boosted above Location records.
How could I achieve the same?
One way to get the desired functionality is to wrap your query in a bool query and then use the should clause to boost certain documents.
Small example:
POST test/person
{
"title": "london elise moore"
}
POST test/event
{
"title" : "london is a great city"
}
Without boost:
GET test/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "london"
}
}
]
}
}
}
With the following response:
"hits": {
"total": 2,
"max_score": 0.2972674,
"hits": [
{
"_index": "test",
"_type": "person",
"_id": "AVVx621GYvUb9aQn6r5X",
"_score": 0.2972674,
"_source": {
"title": "london elise moore"
}
},
{
"_index": "test",
"_type": "event",
"_id": "AVVx63LrYvUb9aQn6r5Y",
"_score": 0.26010898,
"_source": {
"title": "london is a great city"
}
}
]
}
And now with the added should clause:
GET test/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "london"
}
}
],
"should": [
{
"term": {
"_type": {
"value": "event",
"boost": 2
}
}
}
]
}
}
}
Which gives back the following response:
"hits": {
"total": 2,
"max_score": 1.0326607,
"hits": [
{
"_index": "test",
"_type": "event",
"_id": "AVVx63LrYvUb9aQn6r5Y",
"_score": 1.0326607,
"_source": {
"title": "london is a great city"
}
},
{
"_index": "test",
"_type": "person",
"_id": "AVVx621GYvUb9aQn6r5X",
"_score": 0.04235228,
"_source": {
"title": "london elise moore"
}
}
]
}
You could even leave out the extra boost in the should clause, because a matching should clause already boosts the result :)
Hope this helps!
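The question asks for several boosted types, and the should clause can simply hold one term clause per type. The following minimal Python sketch builds such a request body as a dict. It assumes the same `title` field and lowercase type names as the example above. The helper name and the boost values are hypothetical.

```python
import json


def build_type_boost_query(text: str, type_boosts: dict[str, float]) -> dict:
    """Build a bool query: `text` must match title; listed _types get a should boost."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "should": [
                    {"term": {"_type": {"value": type_name, "boost": boost}}}
                    for type_name, boost in type_boosts.items()
                ],
            }
        }
    }


body = build_type_boost_query("london", {"person": 2, "event": 2})
print(json.dumps(body, indent=2))
```

Types that are not listed, such as location here, still match through the must clause but get no extra boost, so they rank lower.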
I see two ways of doing that, but both use scripts.
1. Using sorting:
POST c1_1/_search
{
"from": 0,
"size": 10,
"sort": [
{
"_script": {
"order": "desc",
"type": "number",
"script": "double boost = 1; if(doc['_type'].value == 'Person') { boost *= 2 }; if(doc['_type'].value == 'Event') { boost *= 3}; return _score * boost; ",
"params": {}
}
},
{
"_score": {}
}
],
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "*",
"default_operator": "and"
}
}
],
"minimum_should_match": "1"
}
}
}
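The sort script multiplies each hit's _score by a per-type factor: 2 for Person, 3 for Event and 1 otherwise. Hits are then ordered by that product, descending. The following Python sketch applies the same arithmetic to hypothetical hits and scores.

```python
def script_sort_key(hit: dict) -> float:
    """Mirror the _script sort: _score times a per-type boost."""
    boost = 1.0
    if hit["_type"] == "Person":
        boost *= 2
    if hit["_type"] == "Event":
        boost *= 3
    return hit["_score"] * boost


# Hypothetical hits: Location has the best raw score but no boost.
hits = [
    {"_id": "loc", "_type": "Location", "_score": 1.0},
    {"_id": "per", "_type": "Person", "_score": 0.6},
    {"_id": "evt", "_type": "Event", "_score": 0.5},
]
ordered = [h["_id"] for h in sorted(hits, key=script_sort_key, reverse=True)]
print(ordered)  # ['evt', 'per', 'loc']
```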
2. Using function score:
POST c1_1/_search
{
"from": 0,
"size": 10,
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "*",
"default_operator": "and"
}
}
],
"minimum_should_match": "1"
}
},
"script_score": {
"script": "_score * (doc['_type'].value == 'Person' || doc['_type'].value == 'Event'? 2 : 1)"
}
}
}
}
