Elasticsearch Bool query with minimum_should_match set to zero not honored

I add 3 documents
POST test/_doc
{"value": 1}
POST test/_doc
{"value": 2}
POST test/_doc
{"value": 3}
Then I run the following query. I expect it to return all 3 docs, with the documents matching the should clause ranked higher:
GET /test/_search
{
"query": {
"bool": {
"minimum_should_match": 0,
"should": [
{
"range": {
"value": {
"gte": 2
}
}
}
]
}
}
}
But instead I get only 2 docs (values 2 and 3). "minimum_should_match": 0 does not have any effect until I add a filter or must clause to the bool query, like below:
GET /test/_search
{
"query": {
"bool": {
"filter": [ { "match_all": { } } ],
"should": [
{
"range": {
"value": {
"gte": 2
}
}
}
]
}
}
}
What I want
In the bool query, whether the must and filter clauses are empty or filled, the should clause must not filter out any documents but only participate in ranking. Please share how I can achieve that, thanks.

It's a little weird that minimum_should_match: 0 is not working with the should clause. This may be due to the following note in the documentation:
No matter what number the calculation arrives at, a value greater than
the number of optional clauses, or a value less than 1 will never be
used. (ie: no matter how low or how high the result of the calculation
result is, the minimum number of required matches will never be lower
than 1 or greater than the number of clauses.)
There are two ways to get all the documents in the result while using the should clause only for ranking:
1. Use a must or filter clause with a match_all query, which you already figured out, as shown in the question above.
2. Use a second should clause that covers the remaining documents, and set the boost parameter on each clause. Note that a document without a value field matches neither range and is still left out.
Search Query:
{
"query": {
"bool": {
"should": [
{
"range": {
"value": {
"gte": 2,
"boost": 2.0
}
}
},
{
"range": {
"value": {
"lt": 2,
"boost": 1.0
}
}
}
]
}
}
}
The search result will be:
"hits": [
{
"_index": "68040640",
"_type": "_doc",
"_id": "2",
"_score": 2.0,
"_source": {
"value": 2
}
},
{
"_index": "68040640",
"_type": "_doc",
"_id": "3",
"_score": 2.0,
"_source": {
"value": 3
}
},
{
"_index": "68040640",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"value": 1
}
}
]
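The first option can be wrapped in a small helper so that should clauses never filter, whether or not must or filter clauses are supplied. This is a minimal Python sketch, assuming the request body is built as a plain dict before being sent to Elasticsearch; the function name ranking_only_bool is hypothetical and not part of any client library.

```python
import json


def ranking_only_bool(should, must=None, filters=None):
    """Build a bool query in which `should` clauses only influence scoring."""
    bool_query = {"should": list(should)}
    if must:
        bool_query["must"] = list(must)
    if filters:
        bool_query["filter"] = list(filters)
    if not must and not filters:
        # With neither must nor filter present, at least one should clause
        # has to match, so add a match_all filter to keep every document.
        bool_query["filter"] = [{"match_all": {}}]
    return {"query": {"bool": bool_query}}


body = ranking_only_bool([{"range": {"value": {"gte": 2}}}])
print(json.dumps(body, indent=2))
```

The printed body has the same shape as the second query in the question. When a must or filter clause is passed in, the helper leaves it untouched and adds nothing.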

Related

Elasticsearch - Find documents missing two fields

I'm trying to create a query that returns how many documents don't have data for two fields (date.new and date.old). I have tried the query below, but it works as OR logic: all documents missing either date.new or date.old are counted. Does anyone know how I can make this return only documents missing both fields?
{
"aggs":{
"Missing_field_count1":{
"missing":{
"field":"date.new"
}
},
"Missing_field_count2":{
"missing":{
"field":"date.old"
}
}
}
}
Aggregations are not the right feature for this. You need to use the exists query wrapped within a bool/must_not query, like this:
GET index/_count
{
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "date.new"
}
},
{
"exists": {
"field": "date.old"
}
}
]
}
}
}
If you run the same query against _search instead of _count, hits.total.value gives the number of documents that match the request, and relation indicates whether that value is exact (eq) or a lower bound (gte).
Index Data:
{
"date": {
"new": 1501,
"old": 10
}
}
{
"title": "elasticsearch"
}
{
"title": "elasticsearch-query"
}
{
"date": {
"new": 1400
}
}
The search query given by @Val shows how to achieve your use case.
Search Result:
"hits": {
"total": {
"value": 2, <-- note this
"relation": "eq"
},
"max_score": 0.0,
"hits": [
{
"_index": "65112793",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"title": "elasticsearch"
}
},
{
"_index": "65112793",
"_type": "_doc",
"_id": "5",
"_score": 0.0,
"_source": {
"title": "elasticsearch-query"
}
}
]
}
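The must_not/exists pattern extends to any number of fields, since every must_not clause has to fail for a document to match. Below is a minimal Python sketch, assuming the request body is assembled as a plain dict; missing_all_fields_query and has_field are hypothetical helper names, and has_field is only a rough local stand-in for the exists query (it ignores details such as empty arrays).

```python
def missing_all_fields_query(fields):
    """Build a query matching documents in which none of `fields` exists."""
    return {
        "query": {
            "bool": {
                "must_not": [{"exists": {"field": name}} for name in fields]
            }
        }
    }


def has_field(doc, dotted_path):
    """Rough local stand-in for `exists`: follow a dotted path through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return current is not None


fields = ["date.new", "date.old"]
# Sample documents modeled on the index data shown above.
sample_docs = [
    {"date": {"new": 1501, "old": 10}},
    {"title": "elasticsearch"},
    {"title": "elasticsearch-query"},
    {"date": {"new": 1400}},
]
# A document qualifies only if it has neither field (AND logic).
missing_both = [d for d in sample_docs if not any(has_field(d, f) for f in fields)]
print(missing_all_fields_query(fields))
print(missing_both)
```

On these sample documents the local check keeps only the two title-only documents; the one that has date.new but lacks date.old is excluded.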

Query for : How many elements of an array are matching in a document attribute in ElasticSearch

I've many documents having an attribute that is an array of values like these:
{
"_index": "myindex",
"_type": "mytype",
"_id": "myid1",
"_source": {
"tags": [
"devid",
"batman",
"obama"
]
}
},
{
"_index": "myindex",
"_type": "mytype",
"_id": "myid2",
"_source": {
"tags": [
"devid",
"superman"
]
}
}
I have an array of elements like: ["devid", "batman", "pippo"]
I want to get all the documents matching at least one element of the array, sorted by how many elements are matched.
For example, I expect that myid1 will have a higher score than myid2.
How can I do this?
At the moment I'm "stuck" here:
{
"query": {
"function_score": {
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"terms": {
"tags": ["devid", "batman", "pippo"]
}
}
}
}
}
}
}
It only filters by terms and sets the score to 1 for both.
I'm new to Elasticsearch, so any hint is welcome!
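Using terms as the query, instead of as a filter, results in documents that match more terms getting a higher score. This held for older releases such as 1.x, where the terms query scored like a bool query of term clauses; in later versions the terms query returns a constant score, so one should clause per term is needed instead.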
Example :
{
"query": {
"terms": {
"tags": [
"devid",
"batman",
"pippo"
]
}
}
}
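Where the terms query does not score by the number of matches, the same ranking can be expressed as one should clause per tag. This is a minimal Python sketch under that assumption; tag_overlap_query and matched_count are hypothetical helper names. Each term is wrapped in constant_score so that every matching tag should contribute the same amount to the score, and matched_count only mirrors the intended ordering locally.

```python
def tag_overlap_query(field, values):
    """Build a bool query with one constant-score should clause per value."""
    should = [
        {"constant_score": {"filter": {"term": {field: value}}}}
        for value in values
    ]
    # minimum_should_match of 1 keeps only documents matching at least one tag.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


def matched_count(doc_tags, values):
    """Count how many of the wanted values appear in a document's tags."""
    return len(set(doc_tags) & set(values))


wanted = ["devid", "batman", "pippo"]
query = tag_overlap_query("tags", wanted)
print(query)
print(matched_count(["devid", "batman", "obama"], wanted))  # myid1
print(matched_count(["devid", "superman"], wanted))  # myid2
```

Locally, myid1 matches two of the wanted tags and myid2 matches one, which is the order the question asks for.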

Boosting results based on selected types in elasticsearch

I have different types indexed in Elasticsearch, and I want to boost results from some selected types. What should I do?
I could use a type filter in a boosting query, but the type filter allows only one type. I need results to be boosted on the basis of multiple types.
Example:
I have Person, Event and Location data indexed in Elasticsearch, where Person, Location and Event are my types.
I am searching for the keyword 'London' in all types, but I want Person and Event records to be boosted above Location.
How could I achieve this?
One way to get the desired functionality is to wrap your query inside a bool query and use the should clause to boost certain documents.
Small example:
POST test/person
{
"title": "london elise moore"
}
POST test/event
{
"title" : "london is a great city"
}
Without boost:
GET test/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "london"
}
}
]
}
}
}
With the following response:
"hits": {
"total": 2,
"max_score": 0.2972674,
"hits": [
{
"_index": "test",
"_type": "person",
"_id": "AVVx621GYvUb9aQn6r5X",
"_score": 0.2972674,
"_source": {
"title": "london elise moore"
}
},
{
"_index": "test",
"_type": "event",
"_id": "AVVx63LrYvUb9aQn6r5Y",
"_score": 0.26010898,
"_source": {
"title": "london is a great city"
}
}
]
}
And now with the added should clause:
GET test/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "london"
}
}
],
"should": [
{
"term": {
"_type": {
"value": "event",
"boost": 2
}
}
}
]
}
}
}
Which gives back the following response:
"hits": {
"total": 2,
"max_score": 1.0326607,
"hits": [
{
"_index": "test",
"_type": "event",
"_id": "AVVx63LrYvUb9aQn6r5Y",
"_score": 1.0326607,
"_source": {
"title": "london is a great city"
}
},
{
"_index": "test",
"_type": "person",
"_id": "AVVx621GYvUb9aQn6r5X",
"_score": 0.04235228,
"_source": {
"title": "london elise moore"
}
}
]
}
You could even leave out the extra boost in the should clause, because a matching should clause boosts the result on its own :)
Hope this helps!
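The question asks about several types at once, which the should clause can cover with a terms query on _type. This is a minimal Python sketch, assuming a version of Elasticsearch that still has mapping types and lets you query _type (they were removed in later releases); boost_types_query is a hypothetical helper name, and the type names follow the lowercase ones used in the example above.

```python
import json


def boost_types_query(text_query, boosted_types, boost=2.0):
    """Wrap a query so that documents of the given types score higher."""
    return {
        "query": {
            "bool": {
                # The must clause decides which documents match at all.
                "must": [text_query],
                # The should clause only adds score for the selected types.
                "should": [
                    {"terms": {"_type": list(boosted_types), "boost": boost}}
                ],
            }
        }
    }


body = boost_types_query({"match": {"title": "london"}}, ["person", "event"])
print(json.dumps(body, indent=2))
```

Because the must clause is present, the should clause does not filter anything; documents of other types are still returned, only with a lower score.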
I see two ways of doing that, but both use scripts.
1. Using sorting
POST c1_1/_search
{
"from": 0,
"size": 10,
"sort": [
{
"_script": {
"order": "desc",
"type": "number",
"script": "double boost = 1; if(doc['_type'].value == 'Person') { boost *= 2 }; if(doc['_type'].value == 'Event') { boost *= 3}; return _score * boost; ",
"params": {}
}
},
{
"_score": {}
}
],
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "*",
"default_operator": "and"
}
}
],
"minimum_should_match": "1"
}
}
}
2. Using function score
POST c1_1/_search
{
"from": 0,
"size": 10,
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "*",
"default_operator": "and"
}
}
],
"minimum_should_match": "1"
}
},
"script_score": {
"script": "_score * (doc['_type'].value == 'Person' || doc['_type'].value == 'Event'? 2 : 1)"
}
}
}
}

Complex aggregations with Elastic Search

Supposing this is my elasticsearch structure:
{
"_index": "my_index",
"_type": "person",
"_id": "ID",
"_source": {
...DATA...
}
}
{
"_index": "my_index",
"_type": "result",
"_id": "ID",
"_source": {
"personID": "personID",
"date": "timestamp",
"result": "integer",
"speciality": "categoryID"
}
}
I would like to get the 10 most "influential" people based on:
number of competitions in the last 30 days
number of competitions in the last year
competition results in the last 30 days
number of different specialities in the last 30 days
I'm thinking about using _score, but I don't know how to influence the score using values aggregated from the documents of type "result". This is what I'm trying to achieve:
POST my_index/_search?search_type=dfs_query_then_fetch
{
"size": 10,
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"term": {
"_type": {
"value": "person"
}
}
}
]
}
}
},
"functions": [
{
"field_value_factor": {
"field": {
"query": {
//competitions in the last 30 days
},
"aggs": {
//count
}
},
"factor": 1
},
"weight": 0.1
}
]
}
}
Is this possible with just 1 request?
Is this a good approach?
Any tip on what to look at is appreciated

Elasticsearch popularity fallback

I have a fallback in my queries to a popularity ranking if no hits are found. Every week I calculate a popRank field based on the number of times the doc is visited in the last month. This means that not all docs will have a popRank, only the ones visited in the last month.
The query below does not work with the must clause, even though there are items that contain that category:
GET /index/docs/_search
{
"size": 10,
"query": {
"bool": {
"should": [{
"terms": {
"body": [<array of keyword strings>]
}
}, {
"constant_score": {
"filter": {
"match_all": {}
},
"boost": 0
}
}],
"must": [{
"terms": {
"category": ["DIY"],
"boost": 0
}
}],
"minimum_should_match": 1
}
},
"sort": [{
"_score": {
"order": "desc"
}
}, {
"popRank": {
"unmapped_type": "double",
"order": "desc"
}
}]
}
This query is supposed to return the matching docs if the should clause is fulfilled; if not, the popularity ranking takes over. In either case the results must be filtered by the category. This works if something other than the match_all returns results, but does not work if only the match_all returns results.
This is an example doc.
{
"_index": "index",
"_type": "docs",
"_id": "Fridays",
"_score": 1,
"_source": {
"id": "Fridays",
"body": "text...",
"category": [
"DIY",
"Kitchen"
],
"popRank": 1
}
}
