Why does my Elasticsearch query retrieve all indexed documents?

I'm having trouble understanding the behavior of the following Elasticsearch (ES 6.4) query:
{
"query" : {
"bool" : {
"should" : [
{
"match" : {
"title" : {
"query" : "example",
"operator" : "AND",
"boost" : 2
}
}
},
{
"multi_match" : {
"type" : "best_fields",
"query" : "example",
"operator" : "AND",
"fields" : [
"author", "content", "tags"
],
"boost" : 1
}
}
],
"must" : [
{
"range" : {
"dateCreate" : {
"gte" : "2000-01-01T00:00:00+0200",
"lte" : "2019-02-12T23:59:59+0200"
}
}
},
{
"term" : {
"client" : {
"value" : "test",
"boost" : 1
}
}
}
]
}
},
"size" : 10,
"from" : 0,
"sort" : [
{
"_score" : {
"order" : "desc"
}
}
]
}
The query executes successfully but returns about 400,000 documents, which is the total number of documents in my index. In other words, every document ends up in the result set. Why? Is this really the correct behavior of the multi_match query?
When I was still using the query_string query, I only got the documents that actually matched, so this surprised me.

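You're missing minimum_should_match. Because the bool query also has must clauses, the should clauses are optional by default and only affect scoring, so every document that satisfies the must clauses is returned: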
"bool" : {
"minimum_should_match": 1, <--- add this
"should" : [
...
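A minimal sketch of the full request body with the fix applied, built as a Python dict so it can be inspected before sending. The helper name build_query is hypothetical; the field names and values are taken from the question.

```python
import json


def build_query(text, client, date_from, date_to):
    """Build the bool query from the question with minimum_should_match set."""
    return {
        "query": {
            "bool": {
                # Require at least one should clause to match. Without this,
                # should clauses only influence the score when must is present.
                "minimum_should_match": 1,
                "should": [
                    {"match": {"title": {"query": text, "operator": "AND", "boost": 2}}},
                    {
                        "multi_match": {
                            "type": "best_fields",
                            "query": text,
                            "operator": "AND",
                            "fields": ["author", "content", "tags"],
                            "boost": 1,
                        }
                    },
                ],
                "must": [
                    {"range": {"dateCreate": {"gte": date_from, "lte": date_to}}},
                    {"term": {"client": {"value": client, "boost": 1}}},
                ],
            }
        },
        "size": 10,
        "from": 0,
        "sort": [{"_score": {"order": "desc"}}],
    }


body = build_query(
    "example", "test", "2000-01-01T00:00:00+0200", "2019-02-12T23:59:59+0200"
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request.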

Related

ElasticSearch : constant_score query vs function_score query

I recently upgraded Elasticsearch from version 5.3 to version 5.6. This is my query:
"query" : {
"constant_score" : {
"query" : {
"bool" : {
"must" : {
"terms" : {
"customerId" : [ "ASERFE", "7004567457" ]
}
},
"must_not" : {
"terms" : {
"useCase" : [ "PAY", "COLLECT" ]
}
}
},
"bool" : {
"must" : {
"match" : {
"cardProductGroupName" : {
"query" : "Pre-fill Test birthday Present",
"type" : "phrase"
}
}
}
}
}
}
}
Executing the query above gave me the following error:
{"root_cause":[{"type":"parsing_exception","reason":"[constant_score] query does not support [query]","line":1,"col":37}],"type":"parsing_exception","reason":"[constant_score] query does not support [query]","line":1,"col":37}
So I searched for a solution and found the function_score query. Executing the query below gives me the same results I would have gotten with constant_score.
"query" : {
"function_score" : {
"query" : {
"bool" : {
"must" : {
"terms" : {
"customerId" : [ "ASERFE", "7004567457" ]
}
},
"must_not" : {
"terms" : {
"useCase" : [ "PAY", "COLLECT" ]
}
}
},
"bool" : {
"must" : {
"match" : {
"groupName" : {
"query" : "Pre-fill Test birthday Present",
"type" : "phrase"
}
}
}
}
},
"functions" : [ {
"script_score" : {
"script" : "1"
}
} ],
"boost_mode" : "replace"
}
}
So my question is: does this imply that function_score with script : "1" gives the same result as constant_score?
Yes, it gives the same result, though performance may be worse because the script still runs for each matching document.
That said, constant_score still exists in 5.6; you just have to use filter plus boost instead of query.
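A rough sketch of what the filter plus boost form could look like, written as a Python dict. Two things here are my assumptions rather than part of the answer: the two "bool" blocks from the question are merged into one (a JSON object cannot carry the same key twice), and match_phrase stands in for match with "type" : "phrase".

```python
import json

constant_score_body = {
    "query": {
        "constant_score": {
            # "filter" replaces the "query" key that the parser rejected.
            "filter": {
                "bool": {
                    "must": [
                        {"terms": {"customerId": ["ASERFE", "7004567457"]}},
                        # Assumption: match_phrase used in place of
                        # match with "type": "phrase".
                        {
                            "match_phrase": {
                                "cardProductGroupName": "Pre-fill Test birthday Present"
                            }
                        },
                    ],
                    "must_not": [
                        {"terms": {"useCase": ["PAY", "COLLECT"]}},
                    ],
                }
            },
            # Every matching document gets this score.
            "boost": 1.0,
        }
    }
}

print(json.dumps(constant_score_body, indent=2))
```

No script runs per document in this form, which is the performance difference the answer points at.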

How to query like this in Elasticsearch

I have documents like this:
{
'body': '',
'date': '',
'agency_id': ''
}
I want to get documents with these conditions:
body contains :
all of ['word1', 'word2 word3', 'word4']
Or all of: ['word5 word6', 'word7']
or all of: ['word8 word9', 'word10']
And agency_id in ['id1', 'id2', 'id3']
Would you please tell me how to create this query?
To achieve this you need two must clauses, one for body and one for agency_id. For body, put your three conditions in should clauses and set minimum_should_match to 1. It should look something like this:
{
"size" : 10,
"query" : {
"function_score" : {
"query" : {
"bool" : {
"must" : [ {
"bool" : {
"should" : [ {
"query_string" : {
"query" : "word1 word2 word3 word4",
"fields" : [ "body" ],
"default_operator" : "and"
}
}, {
"query_string" : {
"query" : "word5 word6 word7",
"fields" : [ "body" ],
"default_operator" : "and"
}
}, {
"query_string" : {
"query" : "word8 word9 word10",
"fields" : [ "body" ],
"default_operator" : "and"
}
} ],
"minimum_should_match" : "1"
}
}, {
"terms" : { "agency_id" : [ "id1", "id2", "id3" ] }
} ]
}
}
}
}
}
Just make sure the body field is analyzed so that each word becomes its own token. If you don't need any special features, the standard analyzer is enough.
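If entries such as 'word2 word3' are meant as exact phrases, one possible variation is to build each group from match_phrase clauses instead of query_string. This is only a sketch: the helper all_of is a hypothetical name, and it assumes agency_id holds exact, non-analyzed values so that terms can match them.

```python
import json


def all_of(field, parts):
    """Return a bool clause that requires every part to occur in the field.

    Multi-word parts are matched as phrases; for a single word a phrase
    match is the same as matching that one term.
    """
    return {"bool": {"must": [{"match_phrase": {field: part}} for part in parts]}}


groups = [
    ["word1", "word2 word3", "word4"],
    ["word5 word6", "word7"],
    ["word8 word9", "word10"],
]

query = {
    "size": 10,
    "query": {
        "bool": {
            "must": [
                {
                    # At least one of the three groups has to match in full.
                    "bool": {
                        "should": [all_of("body", group) for group in groups],
                        "minimum_should_match": 1,
                    }
                },
                # Assumption: agency_id stores exact values.
                {"terms": {"agency_id": ["id1", "id2", "id3"]}},
            ]
        }
    },
}

print(json.dumps(query, indent=2))
```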

Can't create the correct filter for my elasticsearch query

I'm having trouble setting the correct filter. My query looks like this:
{
"query" : {
"bool" : {
"must" : [
{
"query_string" : {
"query" : "example~",
"analyzer" : "standard",
"default_operator" : "OR",
"fuzziness" : "AUTO"
}
},
{
"term" : {
"client" : {
"value" : "MyClient",
"boost" : 1
}
}
},
{
"range" : {
"dateCreate" : {
"gte" : "2016-01-01T00:00:00+0200",
"lte" : "2016-12-31T23:59:59+0200"
}
}
},
{
"match" : {
"lang" : "php OR java"
}
}
]
}
},
"size" : 10,
"from" : 0,
"sort" : [
{
"_score" : {
"order" : "desc"
}
}
]
}
The "lang" field is of type text.
I expect to get all documents that match the given query string, and then I want to keep only the documents that have "PHP" or "Java" in their lang field. A lang field contains either "PHP" or "Java" but never both, so I thought about using exact matching, but I couldn't get it to work.
The result is actually a list of two documents but with total_count=2510.
One of my documents that doesn't match:
{
"id" : "d3295f18-a033-4934-941a-21a8bef901e8",
"client" : "MyClient",
"lang" : "PHP",
"author" : null,
"dateCreate" : "2016-03-31T00:00:00+0200",
"title" : "Sample document",
"content" : "This is a short text describing the deocument."
}
Yes, the client field is also of type text.
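The client field must be of type keyword for the term query to work, because a term query does not analyze its input while a text field is analyzed at index time. Otherwise, change the client clause from term to match: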
{
"match" : {
"client" : {
"query" : "MyClient",
"boost" : 1
}
}
}
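To see why the term clause finds nothing on a text field, here is a toy simulation in Python. The analyze function is only a rough stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase), and both matcher functions are hypothetical simplifications, not how Elasticsearch is implemented.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: tokenize and lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def term_matches(indexed_tokens, value):
    """term query: the value is compared to the indexed tokens as-is."""
    return value in indexed_tokens


def match_matches(indexed_tokens, text):
    """match query: the query text is analyzed first (default operator OR)."""
    return any(token in indexed_tokens for token in analyze(text))


# A text field stores the analyzed tokens, not the original string.
indexed = analyze("MyClient")

print(indexed)                             # ['myclient']
print(term_matches(indexed, "MyClient"))   # False
print(match_matches(indexed, "MyClient"))  # True
```

With a keyword field the original string "MyClient" is indexed unchanged, so the term clause would match it.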

Elasticsearch match_phrase doesn't perform the same as multi_match with type phrase?

I'm having some trouble turning a match_phrase query into a multi_match query for multiple fields. My original query:
{
"from" : 0,
"size" : 50,
"query" : {
"filtered" : {
"query" : {
"match_phrase" : {
"metadata.description" : "Search Terms"
}
},
"filter" : {
"bool" : {
"must" : [ {
"terms" : {
"collectionId" : [ "1", "2" ]
}
} ]
}
}
}
}
}
This returns results correctly, but when I rewrite the match_phrase piece as a multi_match to run against multiple fields:
{
"from" : 0,
"size" : 50,
"query" : {
"filtered" : {
"query" : {
"multi_match" : {
"query" : "Search Terms",
"fields" : [ "metadata.description", "metadata.title" ],
"type" : "phrase"
}
},
"filter" : {
"bool" : {
"must" : [ {
"terms" : {
"collectionId" : [ "1", "2" ]
}
} ]
}
}
}
}
}
I am not getting any results. Is there anything obvious I am doing wrong here?
EDIT:
It must be something to do with the filter, as
{
"from" : 0,
"size" : 50,
"query" : {
"match_phrase" : {
"metadata.description" : "Search Terms"
}
}
}
and
{
"from" : 0,
"size" : 50,
"query" : {
"multi_match" : {
"query" : "Search Terms",
"fields" : [ "metadata.description", "metadata.title" ],
"type" : "phrase"
}
}
}
both perform as expected.
I am not sure exactly why, but dropping the filtered query and applying the filter at the top level
{
"from" : 0,
"size" : 50,
"query" : {
"multi_match" : {
"query" : "Search Terms",
"fields" : [ "metadata.description", "metadata.title" ],
"type" : "phrase"
}
},
"filter" : {
"bool" : {
"must" : [ {
"terms" : {
"collectionId" : [ "1", "2" ]
}
} ]
}
}
}
resolves the problem.
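For later versions, note that the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, and the top-level filter has been called post_filter since 1.0. A hedged sketch of the same search as a bool query with a filter clause, built as a Python dict (the function name phrase_search is hypothetical, and which version the question ran on is not stated):

```python
import json


def phrase_search(text, fields, collection_ids, size=50, offset=0):
    """Phrase search over several fields, restricted by collectionId."""
    return {
        "from": offset,
        "size": size,
        "query": {
            "bool": {
                # Scored part: the phrase must occur in one of the fields.
                "must": [
                    {"multi_match": {"query": text, "fields": fields, "type": "phrase"}}
                ],
                # Non-scoring part: only restricts the set of documents.
                "filter": [{"terms": {"collectionId": collection_ids}}],
            }
        },
    }


body = phrase_search(
    "Search Terms", ["metadata.description", "metadata.title"], ["1", "2"]
)
print(json.dumps(body, indent=2))
```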

Query with multiple values on a property with one value in Elasticsearch

I am trying to build on this query a little. The index I am searching also has an "entity" field containing an id, so some records will have "entity" : 16, others "entity" : 156, and so on, depending on the id of the entity. I need to extend the query so that I can pass in an array or list of values, such as {:term => {:entity => [1, 16, 100]}}, and get back records whose entity value is one of these integers. I haven't had any luck so far. Can someone help me?
{
"query" : {
"bool" : {
"must" : [
{
"term" : {"user_type" : "alpha"}
},
{
"term" :{"area" : "16"}
}
],
"must_not" : [],
"should" : []
}
},
"filter": {
"or" : [{
"and" : [
{ "term" : { "area" : "16" } },
{ "term" : { "date" : "05072013" } }
]
}, {
"and" : [
{ "term" : { "area" : "16" } },
{ "term" : { "date" : "blank" } }
]
}
]
},
"from" : 0,
"size" : 100
}
Use "terms" instead of "term".
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
{ "terms" : { "entity" : [ 123, 1234, ... ] }}
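As an illustration, the terms clause can sit next to the existing term clauses in the must array. A small Python sketch that builds that part of the request (the helper entity_clause is a hypothetical name; the top-level filter from the question is left out here):

```python
import json


def entity_clause(entity_ids):
    """Return a terms clause matching documents whose entity is any given id."""
    return {"terms": {"entity": list(entity_ids)}}


query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"user_type": "alpha"}},
                {"term": {"area": "16"}},
                # Matches when entity equals any one of the listed values.
                entity_clause([1, 16, 100]),
            ]
        }
    },
    "from": 0,
    "size": 100,
}

print(json.dumps(query, indent=2))
```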
