In which order are Elasticsearch filters applied? - elasticsearch

I am currently using Elasticsearch in my Rails application. My question is how filters are applied within a query: is there a priority or ranking that decides which filter runs first and which runs last, or are they applied top to bottom, or bottom to top?
Here is an example query with filters:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"match_all": {
}
}
],
"filter": [
{
"term": {
"status": true
}
},
{
"geo_bounding_box": {
"location": {
"top_left": {
"lat": 40.8888,
"lon": -73.888888
},
"bottom_right": {
"lat": 41.8888,
"lon": -74.88888
}
}
}
},
{
"range": {
"get_booking_cut_off_time": {
"gte": 5.46
}
}
},
{
"terms": {
"show_to_list_val": [
"Both",
"Direct"
]
}
},
{
"range": {
"min_days": {
"lte": 1
}
}
},
{
"terms": {
"midoffice_master_id": [
10,
14
]
}
}
]
}
}
}
},
"_source": {
"includes": [
"name",
"code"
]
},
"size": 20,
"from": 0
}
The example above has term, geo_bounding_box, range, and terms filters. Which filter is applied first, and which last, when the query hits the Elasticsearch API?
Any help would be appreciated. :)

Take a look at this very detailed blog post from Elastic about the inner workings of query and filter execution:
https://www.elastic.co/de/blog/elasticsearch-query-execution-order
In the Conclusion section at the end, they state:
Q: Does the order in which I put my queries/filters in the query DSL matter?
A: No, because they will be automatically reordered anyway based on their respective costs and match costs.
Hope this helps!
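To see why the written order cannot change the result, here is a toy model in Python. It is not how Elasticsearch or Lucene is implemented; it only assumes that filter clauses are ANDed together, so each clause can be pictured as a set of matching document IDs. The document IDs and the "cost = set size" estimate are made up for illustration.

```python
from functools import reduce


def intersect_in_order(posting_sets):
    """AND the clauses together in exactly the order they were written."""
    return reduce(lambda acc, s: acc & s, posting_sets)


def intersect_cheapest_first(posting_sets):
    """Reorder clauses by a crude cost estimate (set size), then AND them."""
    return intersect_in_order(sorted(posting_sets, key=len))


# Hypothetical document IDs matched by three of the filter clauses.
status_true = set(range(0, 1000))
in_bounding_box = {3, 7, 42, 500, 2000}
min_days_ok = set(range(0, 600, 2))

as_written = intersect_in_order([status_true, in_bounding_box, min_days_ok])
reordered = intersect_cheapest_first([status_true, in_bounding_box, min_days_ok])

# Intersection is commutative, so both orders give the same documents.
print(as_written == reordered)
```

Starting with the most selective clause keeps the intermediate result small, which is the intuition behind reordering by cost; the final set of matching documents is the same either way.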

Related

How to get 3 random search results in an Elasticsearch query

I have an Elasticsearch query that returns records whose publishedDate falls within a range:
{
"query": {
"bool": {
"filter": [
],
"must": {
"range": {
"publishedDate": {
"gte": "2018-11-01",
"lte": "2019-03-30"
}
}
}
}
},
"from": 0,
"size": 3
}
I need to show 3 random results every time I send this query.
The Elasticsearch documentation mentions that I can send a seed to get random results:
After following the documentation, I updated my query as:
{
"query" : {
"bool": {
"filter": [
],
"must": {
"range": {
"publishedDate": {
"gte": "2018-11-01",
"lte": "2019-03-30"
}
}
}
},
"function_score": {
"functions": [
{
"random_score": {
"seed": "123123123"
}
}
]
}
},
"from": 0,
"size": 3
}
But it is not working (Elasticsearch says the query is malformed). Can anyone suggest how to correct this query so that it returns 3 random search results?
If you just need random results, you can restructure the query so that the range query sits inside function_score, like this:
{
"query": {
"function_score": {
"query": {
"range": {
"publishedDate": {
"gte": "2018-11-01",
"lte": "2019-03-30"
}
}
},
"boost": "5",
"random_score": {},
"boost_mode": "multiply"
}
},
"from": 0,
"size": 3
}
Modified from the Elasticsearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
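If the request is built in application code, the same structure can be sketched in Python as a plain dictionary. The helper name `random_sample_query` is hypothetical, and `boost_mode: "replace"` is my own choice rather than part of the answer above. The optional `seed`/`field` pair reflects my understanding of recent versions (7.x), where a seed is paired with a field such as `_seq_no`; treat that as an assumption and check it against your version.

```python
import json


def random_sample_query(gte, lte, size=3, seed=None, seed_field="_seq_no"):
    """Build a search body that returns `size` randomly ordered hits in a date range."""
    # An empty random_score gives a different order on every request.
    random_score = {}
    if seed is not None:
        # A fixed seed makes the "random" order repeatable (assumed 7.x syntax).
        random_score = {"seed": seed, "field": seed_field}
    return {
        "query": {
            "function_score": {
                # The range query goes inside function_score, not next to it.
                "query": {"range": {"publishedDate": {"gte": gte, "lte": lte}}},
                "random_score": random_score,
                # Use only the random value as the score.
                "boost_mode": "replace",
            }
        },
        "from": 0,
        "size": size,
    }


body = random_sample_query("2018-11-01", "2019-03-30")
print(json.dumps(body, indent=2))
```

Leaving `seed` unset is what the question asks for: each request then gets a fresh random order.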

Elasticsearch multiple fields OR query

Here is an example record that I have stored in ES:
"taskCurateStatus": true,
"taskMigrateStatus": true,
"verifiedFields": 7,
"taskId": "abcdef123",
"operatorEmail": "test@test.com"
Example Query I'm making via /_search:
{
"sort": [
{
"@timestamp": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"match": {
"msg.operator_email": "test@test.com"
}
},
{
"range": {
"@timestamp": {
"gte": "2017-03-05",
"lte": "2017-03-12"
}
}
}
]
}
},
"from": 0,
"size": 50
}
Basically, I also want to restrict the results to documents where EITHER taskCurateStatus or taskMigrateStatus is true. Some messages have only one of them defined. I was thinking of using a should clause, but I'm not sure how that would work alongside the match query. Any help would be appreciated. Thanks.
You can add another bool query inside your must clause. This nested bool query implements the OR with a should clause that combines the two term checks on the boolean flags:
{
"sort": [{
"@timestamp": {
"order": "desc"
}
}],
"query": {
"bool": {
"must": [{
"match": {
"msg.operator_email": "test@test.com"
}
}, {
"range": {
"@timestamp": {
"gte": "2017-03-05",
"lte": "2017-03-12"
}
}
}, {
"bool": {
"should": [{
"term": {
"taskCurateStatus": {
"value": true
}
}
}, {
"term": {
"taskMigrateStatus": {
"value": true
}
}
}]
}
}]
}
},
"from": 0,
"size": 50
}
Take a look at the query above and see if that helps.
Thanks
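For anyone building this request from code, here is a minimal Python sketch of the same idea. The function name is hypothetical, and I am assuming the timestamp field is the usual Logstash `@timestamp`. Two details differ from the query above and are my own choices: the range and the OR group go into `filter` (they do not need to affect scoring), and `minimum_should_match` is spelled out so that the OR still holds if the group is ever moved next to a `must` or `filter` clause.

```python
import json


def either_flag_true_query(email, gte, lte, size=50):
    """Match an operator email in a date range where at least one status flag is true."""
    either_flag = {
        "bool": {
            "should": [
                {"term": {"taskCurateStatus": {"value": True}}},
                {"term": {"taskMigrateStatus": {"value": True}}},
            ],
            # At least one of the two should clauses has to match.
            "minimum_should_match": 1,
        }
    }
    return {
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "must": [{"match": {"msg.operator_email": email}}],
                "filter": [
                    {"range": {"@timestamp": {"gte": gte, "lte": lte}}},
                    either_flag,
                ],
            }
        },
        "from": 0,
        "size": size,
    }


print(json.dumps(either_flag_true_query("test@test.com", "2017-03-05", "2017-03-12"), indent=2))
```

A document with only one of the two flags defined still matches, as long as that flag is true.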

ElasticSearch 2 bucket level sorting

This is the mapping of the index:
{
"users": {
"mappings": {
"user": {
"properties": {
"credentials": {
"type": "nested",
"properties": {
"achievement_id": {
"type": "string"
},
"percentage_completion": {
"type": "integer"
}
}
},
"current_location": {
"type": "geo_point"
},
"locations": {
"type": "geo_point"
}
}
}
}
}
}
As you can see in the mapping, there are two geo_point fields: current_location and locations. I want to sort users by credentials.percentage_completion, which is a nested field. This works fine, for example with this query:
Example query:
GET /users/user/_search?size=23
{
"sort": [
{
"credentials.percentage_completion": {
"order": "desc",
"missing": "_last"
}
},
"_score"
],
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "100000000km",
"user.locations": {
"lat": 19.77,
"lon": 73
}
}
}
}
}
}
I want to change the sort order into buckets: first show all the people within a 100 km radius of user.current_location, sorted by credentials.percentage_completion, and then the rest of the users, again sorted by credentials.percentage_completion.
I tried putting a condition in the sort and making it multi-level, but that does not work because only nested sorts can have filters, and those apply only to the nested field's children.
I thought I could sort on _score and give more relevance to people within 1000 km, but geo-distance is a filter, and I can't find any way to assign relevance from a filter.
Is there anything I am missing here? Any help would be great.
Thanks
I finally solved it; I'm posting the solution here so that others who land here can use it. The approach is to give a constant relevance score to a particular query. Because geo-distance is a filter, I could not use it as a query directly, but then I found the constant_score query, which wraps a filter inside a query.
This is how the query looks:
GET /users/user/_search?size=23
{
"sort": [
"_score",
{
"credentials.percentage_completion": {
"order": "desc",
"missing": "_last"
}
}
],
"explain": true,
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"geo_distance": {
"distance": "100km",
"user.current_location": {
"lat": 19.77,
"lon": 73
}
}
},
"boost": 50
}
},
{
"constant_score": {
"filter": {
"geo_distance": {
"distance": "1000000km",
"user.locations": {
"lat": 19.77,
"lon": 73
}
}
},
"boost": 1
}
}
]
}
},
"filter": {
"geo_distance": {
"distance": "10000km",
"user.locations": {
"lat": 19.77,
"lon": 73
}
}
}
}
}
}
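The filtered query used above was removed in later Elasticsearch releases (5.0 onward, as far as I know). The following Python sketch shows how the same two-bucket idea might be expressed with a bool query that has a filter clause. The function name is hypothetical, the field names are copied from the question as-is, and newer versions may also need an explicit nested path in the sort, which I have left out.

```python
import json


def bucketed_sort_query(lat, lon, near="100km", outer="10000km"):
    """Score nearby users higher, then sort by completion inside each score bucket."""
    point = {"lat": lat, "lon": lon}
    return {
        "sort": [
            # First level: the constant score (50 for nearby users, 0 otherwise).
            "_score",
            # Second level: completion percentage within each bucket.
            {"credentials.percentage_completion": {"order": "desc", "missing": "_last"}},
        ],
        "query": {
            "bool": {
                # Optional clause: only adds score, never excludes a document.
                "should": [
                    {
                        "constant_score": {
                            "filter": {
                                "geo_distance": {
                                    "distance": near,
                                    "user.current_location": point,
                                }
                            },
                            "boost": 50,
                        }
                    }
                ],
                # Mandatory clause: limits the result set without scoring.
                "filter": [
                    {"geo_distance": {"distance": outer, "user.locations": point}}
                ],
            }
        },
    }


print(json.dumps(bucketed_sort_query(19.77, 73), indent=2))
```

Because a should clause next to a filter clause is optional, users outside the near radius still match and simply get a score of 0, which places them in the second bucket.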

How do I limit an ElasticSearch API count by date?

I'm trying to count the number of query matches over a given time range, hitting the URL /{index}/_count with the body indicated below.
I'm new to Query DSL, so it's quite possible I'm overlooking something obvious. However, the straightforward application of a count to an existing query doesn't work. I don't see anything in the docs that indicates a count query should receive special treatment.
I've tried adding a range and aggregations to the query, but I keep getting the following error or some variant:
indices:data/read/count[s]]]; nested:
QueryParsingException[[graylog2_NN] request does not support [{label}]]
Limit query by timestamp:
{
"query": {
"term": { "level":3 },
"range": {
"timestamp": {
"from": "2015-06-16 15:10:09.322",
"to": "2015-06-16 16:10:09.322",
"include_lower": true,
"include_upper": true
}
}
}
}
Use an aggregation:
{
"query": {
"term": { "level":3 }
},
"aggs": {
"range": {
"date_range": {
field: "_timestamp",
"ranges": {
{ "to": "now-1d" },
{ "from": "now-2d" },
}
}
}
}
}
I've also tried plugging in the query exported from the UI (bug icon on an individual stream display), no joy there either (one hour's worth of matches):
{
"from": 0,
"size": 100,
"query": {
"match_all": {}
},
"post_filter": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"from": "2015-06-16 15:10:09.322",
"to": "2015-06-16 16:10:09.322",
"include_lower": true,
"include_upper": true
}
}
},
{
"query": {
"query_string": {
"query": "streams:5568c9dbe4b0b31b781bf105"
}
}
}
]
}
},
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"highlight": {
"require_field_match": false,
"fields": {
"*": {
"fragment_size": 0,
"number_of_fragments": 0
}
}
}
}
I've found a query that both matches and lines up pretty closely with numbers I get from the UI ("Search in the last 1 day"):
{
"query": {
"filtered": {
"query": {
"term": { "level":3 }
},
"filter": {
"range": { "timestamp": { "gte": "now-1d" } }
}
}
}
}
Try the following query, which uses a bool query with must so that both conditions have to match. I use a different timestamp format, which is the default in Elasticsearch. Try that format first; if it doesn't work, change the timestamp format to match yours.
{
"query": {
"bool" : {
"must" : [
{
"term": { "level":3 }
},
{
"range": {
"timestamp": {
"from": "2015-06-16T15:10:09",
"to": "2015-06-16T16:10:09"
}
}
}
]
}
}
}
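A minimal Python sketch of a count body, under two assumptions of mine: the _count endpoint accepts only a top-level query (which would explain why bodies containing from, size, aggs, or post_filter were rejected), and the cluster is on a version where bool accepts a filter clause (2.0 or later). The function name is hypothetical, and the date format has to match the field's mapping.

```python
import json


def count_body(level, start, end):
    """Build a body for /{index}/_count: documents at `level` between `start` and `end`."""
    return {
        "query": {
            "bool": {
                # Both conditions must hold; filter clauses do no scoring,
                # which is all a count needs.
                "filter": [
                    {"term": {"level": level}},
                    {"range": {"timestamp": {"gte": start, "lte": end}}},
                ]
            }
        }
    }


body = count_body(3, "2015-06-16 15:10:09.322", "2015-06-16 16:10:09.322")
print(json.dumps(body, indent=2))
```

On older versions, the filtered query shown in the question plays the same role as the bool filter here.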

Multiple filters and an aggregate in elasticsearch

How can I combine a filter with an aggregation in Elasticsearch?
The official documentation gives only trivial examples for filters and for aggregations, and no formal description of the query DSL (compare it with, for example, the PostgreSQL documentation).
By trial and error I found the following query, which Elasticsearch accepts (no parsing errors) but which ignores the given filters:
{
"filter": {
"and": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398169707,
"to": 1400761707
}
}
}
]
},
"size": 0,
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
Some people suggest using query instead of filter, but the official documentation generally recommends the opposite for filtering on exact values. Another issue with query: filters offer an and, while query does not.
Can somebody point me to documentation, a blog, or a book that describes writing non-trivial queries: at least an aggregation plus multiple filters?
I ended up using a filter aggregation rather than a filtered query, so I now have three nested aggs elements.
I also use a bool filter instead of and, as recommended by @alex-brasetvik, because of http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
My final implementation:
{
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398176502000,
"to": 1400768502000
}
}
}
]
}
},
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
},
"size": 0
}
Put your filter in a filtered-query.
The top-level filter is for filtering search hits only, and not facets/aggregations. It was renamed to post_filter in 1.0 due to this quite common confusion.
Also, you might want to look into this post on why you often want to use bool and not and/or: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
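To make the scoping difference concrete, here is a small Python sketch that builds both request shapes. The function names are hypothetical, and the bool-with-filter form assumes a version where the filtered query has been replaced by bool (2.0 or later); on 1.x the filtered query mentioned above is the equivalent. The aggregation is copied from the question, including its interval setting.

```python
import json


def query_scoped_body(filters, aggs):
    """Filters in the query: aggregations only see the matching documents."""
    return {"size": 0, "query": {"bool": {"filter": filters}}, "aggs": aggs}


def post_filtered_body(filters, aggs):
    """Filters in post_filter: hits are narrowed, aggregations are not."""
    return {"size": 0, "post_filter": {"bool": {"filter": filters}}, "aggs": aggs}


filters = [
    {"term": {"dc": "eu-west-12"}},
    {"term": {"status": "204"}},
]
aggs = {
    "time_histo": {
        "date_histogram": {"field": "@timestamp", "interval": "1h"},
        "aggs": {
            "name": {
                "percentiles": {"field": "upstream_response_time", "percents": [98.0]}
            }
        },
    }
}

print(json.dumps(query_scoped_body(filters, aggs), indent=2))
```

The first shape is the one to use when the histogram should cover only the filtered documents; the second reproduces the behaviour described in the question, where the filters appear to be ignored by the aggregation.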
To add to @geekQ's answer: to support filter strings that contain spaces when matching on multiple terms, use the following:
{ "aggs": {
"aggresults": {
"filter": {
"bool": {
"must": [
{
"match_phrase": {
"term_1": "some text with space 1"
}
},
{
"match_phrase": {
"term_2": "some text with also space 2"
}
}
]
}
},
"aggs" : {
"all_term_3s" : {
"terms" : {
"field":"term_3.keyword",
"size" : 10000,
"order" : {
"_term" : "asc"
}
}
}
}
} }, "size": 0 }
Just for reference, as of version 7.2, I used something like the following to apply multiple filters to an aggregation:
Use a filter aggregation to restrict the documents being aggregated.
Use bool to set up the compound query.
POST movies/_search?size=0
{
"size": 0,
"aggs": {
"test": {
"filter": {
"bool": {
"must": {
"term": {
"genre": "action"
}
},
"filter": {
"range": {
"year": {
"gte": 1800,
"lte": 3000
}
}
}
}
},
"aggs": {
"year_hist": {
"histogram": {
"field": "year",
"interval": 50
}
}
}
}
}
}
