Elasticsearch query returning far fewer records than expected

I am running the following Elasticsearch query from a Groovy script. Thousands of records meet these criteria, but only 10 records are returned.
{
  "query": {
    "bool": {
      "must": [
        { "match_all": {} },
        {
          "range": {
            "#Timestamp": {
              "gte": 1417511269270,
              "lte": 1575277669270,
              "format": "epoch_millis"
            }
          }
        },
        { "match_phrase": { "field1.keyword": { "query": "value1" } } },
        { "match_phrase": { "field2.keyword": { "query": "value2" } } },
        {
          "range": {
            "#Timestamp": {
              "gte": "2001-03-01",
              "lt": "2019-10-30"
            }
          }
        }
      ],
      "filter": [],
      "should": [],
      "must_not": []
    }
  }
}
What am I missing in my query?

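You are missing the size parameter, so Elasticsearch falls back to its default of 10 hits.
Add size at the top level of the request body, as a sibling of "query" (not inside it), e.g.:
{
  "size": 100,
  "query": { ... }
}
By default, from + size cannot exceed the index.max_result_window setting (10,000). To retrieve more hits than that, page through the results with search_after or the scroll API.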

Related

Elasticsearch DSL query returning a result for a condition that isn't true

I want three conditions in my Elasticsearch query, so I wrote the query below. However, it returns a document where the amount is 250 (so the field exists and is positive), even though my condition requires at least one of these two: the amount is less than or equal to zero, or the amount field does not exist.
Below is the DSL query:
{
  "from": 0,
  "size": 10,
  "track_total_hits": true,
  "_source": ["amount", "npa_stageid_loanaccounts"],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "npa_stageid_loanaccounts.keyword",
            "query": "Y"
          }
        },
        {
          "bool": {
            "minimum_should_match": 1,
            "should": [
              { "range": { "Amount": { "lte": 0 } } },
              {
                "bool": {
                  "must_not": [
                    { "exists": { "field": "Amount" } }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
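In your documents the field is called amount, but in your query you use Amount. Field names are case-sensitive, so Amount does not exist in any document. As a result, the range clause never matches, while the must_not exists clause matches every document, which satisfies your should condition. Use amount in both clauses.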
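To make the casing effect concrete, here is a sketch that builds the question's query for a given field name, plus a toy client-side model of the inner should clause. The model is an assumption for illustration only (it is not how Elasticsearch evaluates queries, and it ignores details such as null values); build_amount_query and amount_condition_matches are hypothetical names.

```python
def build_amount_query(field: str) -> dict:
    """The question's bool query, parameterised on the amount field name."""
    return {
        "bool": {
            "must": [
                {"query_string": {"default_field": "npa_stageid_loanaccounts.keyword", "query": "Y"}},
                {
                    "bool": {
                        "minimum_should_match": 1,
                        "should": [
                            {"range": {field: {"lte": 0}}},
                            {"bool": {"must_not": [{"exists": {"field": field}}]}},
                        ],
                    }
                },
            ]
        }
    }


def amount_condition_matches(doc: dict, field: str) -> bool:
    """Toy model of the should clause: "field <= 0" OR "field is missing".

    Dict lookups are case-sensitive, like Elasticsearch field names.
    """
    if field not in doc:
        return True  # the must_not/exists branch matches
    return doc[field] <= 0  # the range branch


doc = {"amount": 250, "npa_stageid_loanaccounts": "Y"}
print(amount_condition_matches(doc, "Amount"))  # True: "Amount" is missing
print(amount_condition_matches(doc, "amount"))  # False: 250 > 0
```

With the misspelled field the document slips through the "missing" branch; with the correct name it is excluded.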

Elasticsearch: need an AND query instead of OR

I'm trying to search for posts with certain prefixes (212, 215) in a certain node (663).
This query matches posts that have either prefix (OR), but I need posts that have both prefixes (AND). How can I do that? This query is generated by the CMS:
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "node": 663 } },
        { "terms": { "prefix": ["215", "212"] } },
        {
          "bool": {
            "should": [
              { "type": { "value": "post" } },
              { "type": { "value": "thread" } }
            ]
          }
        }
      ],
      "must": { "match_all": {} }
    }
  },
  "sort": [
    { "date": "desc" }
  ],
  "size": 8000,
  "docvalue_fields": ["discussion_id", "user", "date"],
  "_source": false
}
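A terms query matches documents whose prefix field contains any of the listed values, so it behaves like OR. If you're looking for docs whose list of prefix values contains both 212 and 215, use a separate clause for each value; every clause in a filter array must match, which gives you AND:
{
  "query": {
    "bool": {
      "filter": [
        ...
        {"match": {"prefix": "212"}},
        {"match": {"prefix": "215"}},
        ...
      ],
      ...
    }
  }
}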
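If the clauses are generated in code, a small helper keeps the AND form readable. This sketch uses term instead of the answer's match, assuming prefix is a keyword or numeric field so that exact matching is what you want; all_of and any_of are hypothetical helper names.

```python
def all_of(field: str, values: list) -> list:
    """One term clause per value: every value must be present (AND)."""
    return [{"term": {field: value}} for value in values]


def any_of(field: str, values: list) -> list:
    """A single terms clause: any one of the values is enough (OR)."""
    return [{"terms": {field: list(values)}}]


body = {
    "query": {
        "bool": {
            # Every filter clause must match: node 663 AND prefix 212 AND prefix 215.
            "filter": [{"term": {"node": 663}}] + all_of("prefix", ["212", "215"])
        }
    }
}
```

Swapping all_of for any_of reproduces the CMS's original OR behaviour, which makes the difference between the two forms easy to see side by side.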

How to reduce multiple conditions in ES

In the query below, I use match_phrase many times. How can I reduce the number of match_phrase clauses? In production, the Elasticsearch response is very slow.
GET /logs*/_search
{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "#timestamp": {
              "gte": "2020-02-10T11:13:19.7684961Z",
              "lte": "2020-02-11T11:13:19.7684961Z"
            }
          }
        }
      ],
      "must": [
        {
          "bool": {
            "must_not": [
              { "match_phrase": { "message": { "query": "System32" } } },
              { "match_phrase": { "message": { "query": "212.118.14.45" } } },
              { "match_phrase": { "message": { "query": " stopped state." } } },
              { "match_phrase": { "message": { "query": " running state" } } },
              { "match_phrase": { "message": { "query": " Share Name: \\\\*\\DLO-EBackup" } } },
              ... etc. ...
              { "match_phrase": { "message": { "query": "WFO15Installation" } } },
              { "match_phrase": { "message": { "query": "Windows\\SysWOW64" } } },
              { "match_phrase": { "message": { "query": "Bitvise" } } }
            ]
          }
        }
      ]
    }
  },
  "size": 10,
  "sort": [
    { "#timestamp": { "order": "desc" } }
  ]
}
Thank You!
To begin with, you could move the must_not block inside the filter block, so that the whole block runs in filter context: score calculation is skipped and Elasticsearch can cache the results. Something like:
"query": {
  "bool": {
    "filter": [
      {
        "range": {
          "#timestamp": {
            "gte": "2020-02-10T11:13:19.7684961Z",
            "lte": "2020-02-11T11:13:19.7684961Z"
          }
        }
      },
      {
        "bool": {
          "must_not": [
            { "match_phrase": { "message": { "query": "System32" } } },
            { "match_phrase": { "message": { "query": "212.118.14.45" } } },
            ...
          ]
        }
      }
    ],
    ...
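If the query is assembled in code, the long list of clauses can at least be generated from a plain list of phrases. This is a sketch under the assumption that the phrases live in a Python list; build_log_query is a hypothetical name. Note that it only shortens the code that produces the query: Elasticsearch still evaluates one match_phrase per phrase, so it does not by itself make the search faster.

```python
def build_log_query(gte: str, lte: str, excluded_phrases: list) -> dict:
    """Search body with the time range and all exclusions in filter context."""
    exclusions = [
        {"match_phrase": {"message": {"query": phrase}}} for phrase in excluded_phrases
    ]
    return {
        "from": 0,
        "size": 10,
        "sort": [{"#timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"range": {"#timestamp": {"gte": gte, "lte": lte}}},
                    # Documents matching any excluded phrase are dropped.
                    {"bool": {"must_not": exclusions}},
                ]
            }
        },
    }


body = build_log_query(
    "2020-02-10T11:13:19.7684961Z",
    "2020-02-11T11:13:19.7684961Z",
    ["System32", "212.118.14.45", " stopped state.", " running state"],
)
```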
However, as someone already mentioned in the comments, you should optimise your data for search before indexing your documents into Elasticsearch. A better solution than having so many filters in your query is to process your data and apply those filters at ingestion time, for example with an ingest pipeline (see the Elastic documentation) or Logstash. For example, you could evaluate the must_not conditions at index time and store the result in a boolean field (e.g., ignore) on every document, so that at query time you only need to filter on that field:
"query": {
  "bool": {
    "filter": [
      {
        "range": {
          "#timestamp": {
            "gte": "2020-02-10T11:13:19.7684961Z",
            "lte": "2020-02-11T11:13:19.7684961Z"
          }
        }
      },
      { "match": { "ignore": false } },
      ...
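As a rough illustration of the ingestion-time idea, this sketch computes the ignore flag in Python before the document is sent for indexing. It is an assumption-laden stand-in for an ingest pipeline or Logstash filter: EXCLUDED_PHRASES and add_ignore_flag are hypothetical, and plain substring matching only approximates match_phrase, which operates on analyzed tokens rather than raw text.

```python
# Hypothetical subset of the phrases from the question.
EXCLUDED_PHRASES = ["System32", "212.118.14.45", "WFO15Installation", "Bitvise"]


def add_ignore_flag(doc: dict, excluded: list = EXCLUDED_PHRASES) -> dict:
    """Return a copy of doc with a boolean "ignore" field computed before indexing.

    Substring matching is only an approximation of match_phrase.
    """
    message = doc.get("message", "")
    flagged = dict(doc)  # do not mutate the caller's document
    flagged["ignore"] = any(phrase in message for phrase in excluded)
    return flagged
```

Because the expensive check runs once per document at write time, every later search only pays for a single boolean filter.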

Elasticsearch Filtered Bool Query

I am running into some serious problems with a custom search. All I want is a wildcard search across three fields, with the results filtered by another field. In Elastica, this produces the following query:
{
  "bool": {
    "should": [
      { "wildcard": { "ean": "*180g*" } },
      { "wildcard": { "titel": "*180g*" } },
      { "wildcard": { "interpret": "*180g*" } }
    ],
    "filter": [
      {
        "term": {
          "genre": {
            "value": "Rock",
            "boost": 1
          }
        }
      }
    ]
  }
}
I can't find an error, but Elasticsearch does not give me the filtered results I expect: it returns ALL items that match the filter term, whether or not any of the should clauses match. When I add the filter as a must clause instead, I get the same results. What is wrong here?
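You need to add "minimum_should_match": 1 to your bool query. When a bool query contains a must or filter clause, minimum_should_match defaults to 0, so the should clauses become optional and only affect scoring. Setting it to 1 requires at least one of the wildcard clauses to match: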
{
  "bool": {
    "minimum_should_match": 1,
    "should": [
      { "wildcard": { "ean": "*180g*" } },
      { "wildcard": { "titel": "*180g*" } },
      { "wildcard": { "interpret": "*180g*" } }
    ],
    "filter": [
      {
        "term": {
          "genre": {
            "value": "Rock",
            "boost": 1
          }
        }
      }
    ]
  }
}
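The defaulting rule can be captured in a few lines. This sketch encodes the behaviour described in the Elasticsearch bool query documentation (default 1 when there are should clauses but no must or filter clauses, otherwise 0); effective_minimum_should_match is a hypothetical helper, and it only handles integer values, not percentages such as "75%".

```python
def effective_minimum_should_match(bool_query: dict) -> int:
    """Minimum number of should clauses a document must match (integer values only)."""
    if "minimum_should_match" in bool_query:
        return int(bool_query["minimum_should_match"])
    has_should = bool(bool_query.get("should"))
    has_required = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    return 1 if has_should and not has_required else 0


question_bool = {
    "should": [
        {"wildcard": {"ean": "*180g*"}},
        {"wildcard": {"titel": "*180g*"}},
        {"wildcard": {"interpret": "*180g*"}},
    ],
    "filter": [{"term": {"genre": {"value": "Rock", "boost": 1}}}],
}
print(effective_minimum_should_match(question_bool))  # 0: should clauses are optional

fixed_bool = dict(question_bool, minimum_should_match=1)
print(effective_minimum_should_match(fixed_bool))  # 1: one wildcard must match
```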

Combine two function_score queries in dis_max

I would like to build a query with two subqueries, each with its own scoring based on a function_score with a script. For example, this subquery:
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            { "term": { "rooms_count": 3 } },
            { "term": { "addresses": "d76255c8-3173-4db5-a39b-badd3ebdf851" } },
            { "exists": { "field": "zhk_id" } }
          ]
        }
      },
      "script_score": {
        "script": "1 * doc['price'].value/100000"
      },
      "boost_mode": "replace"
    }
  }
}
works fine, and its score is based on price (about 190 points). But if I combine two such subqueries in a dis_max query, the function_score has no effect and I get scores of about 1.
The explanation for each subquery looks like this:
"value": 100.9416, "description": "script score function, computed with script:"[script: 1 * doc['price'].value/100000, type: inline, lang: null, params: {}]" and parameters: {}",
and for the dis_max query like this:
"value": 1, "description": "ConstantScore(function score (#rooms_count: #addresses:d76255c8-3173-4db5-a39b-badd3ebdf851 #ConstantScore(fieldnames:zhk_id),function=script[script: 1 * doc['price'].value/100000, type: inline, lang: null, params: {}])), product of:",
Can anybody tell me how to combine function_score queries properly?
My full dis_max query is on pastebin.
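Thanks to Daniel Mitterdorfer on https://discuss.elastic.co/t/combine-two-function-score-queries-in-dis-max/70666, the correct query is below: each function_score is placed directly in the dis_max queries array, and dis_max then uses the highest score among them. The outer ConstantScore(...) in the dis_max explanation above suggests that in the original attempt the function_score was evaluated in a non-scoring context, so every hit received a constant score of 1.
Note that the missing query used below was removed in Elasticsearch 5.0; on newer versions, replace it with a bool query containing "must_not": [{ "exists": { "field": "zhk_id" } }].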
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "function_score": {
            "query": {
              "bool": {
                "filter": [
                  { "term": { "rooms_count": 3 } },
                  { "term": { "addresses": "d76255c8-3173-4db5-a39b-badd3ebdf851" } },
                  { "missing": { "field": "zhk_id" } }
                ]
              }
            },
            "script_score": {
              "script": "1 * doc['price'].value/100000"
            },
            "boost_mode": "replace"
          }
        },
        {
          "function_score": {
            "query": {
              "bool": {
                "filter": [
                  { "term": { "rooms_count": 3 } },
                  { "term": { "addresses": "d76255c8-3173-4db5-a39b-badd3ebdf851" } },
                  { "exists": { "field": "zhk_id" } }
                ]
              }
            },
            "script_score": {
              "script": "1 * doc['price'].value/100000"
            },
            "boost_mode": "replace"
          }
        }
      ]
    }
  }
}
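Because the two subqueries differ only in their zhk_id condition, they can be generated from one helper. This is a sketch, assuming a newer Elasticsearch version where the removed missing query is expressed as bool/must_not/exists; price_scored is a hypothetical helper name, and the script string is kept exactly as in the question.

```python
def price_scored(filters: list) -> dict:
    """function_score whose score is replaced by price / 100000, as in the question."""
    return {
        "function_score": {
            "query": {"bool": {"filter": filters}},
            "script_score": {"script": "1 * doc['price'].value/100000"},
            "boost_mode": "replace",
        }
    }


common = [
    {"term": {"rooms_count": 3}},
    {"term": {"addresses": "d76255c8-3173-4db5-a39b-badd3ebdf851"}},
]
# Replacement for the removed "missing" query on newer versions.
without_zhk = common + [{"bool": {"must_not": [{"exists": {"field": "zhk_id"}}]}}]
with_zhk = common + [{"exists": {"field": "zhk_id"}}]

# Each function_score sits directly in the dis_max "queries" array.
body = {"query": {"dis_max": {"queries": [price_scored(without_zhk), price_scored(with_zhk)]}}}
```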
