Elasticsearch multiple fields OR query - elasticsearch

Here is an example record that I have stored in ES:
{
"taskCurateStatus": true,
"taskMigrateStatus": true,
"verifiedFields": 7,
"taskId": "abcdef123",
"operatorEmail": "test@test.com"
}
Example Query I'm making via /_search:
{
"sort": [
{
"@timestamp": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"match": {
"msg.operator_email": "test@test.com"
}
},
{
"range": {
"@timestamp": {
"gte": "2017-03-05",
"lte": "2017-03-12"
}
}
}
]
}
},
"from": 0,
"size": 50
}
Basically, I also want to filter for documents where EITHER taskCurateStatus or taskMigrateStatus is true. Some messages have only one of them defined. I was thinking of using a should query but am not sure how that would work with the match query. Any help would be appreciated. Thanks

You can add another bool query inside your must clause. This inner bool uses a should clause containing one term query per boolean flag. Because the inner bool has only should clauses, at least one of them has to match, so a document passes if either flag is true.
{
"sort": [{
"@timestamp": {
"order": "desc"
}
}],
"query": {
"bool": {
"must": [{
"match": {
"msg.operator_email": "test@test.com"
}
}, {
"range": {
"@timestamp": {
"gte": "2017-03-05",
"lte": "2017-03-12"
}
}
}, {
"bool": {
"should": [{
"term": {
"taskCurateStatus": {
"value": true
}
}
}, {
"term": {
"taskMigrateStatus": {
"value": true
}
}
}],
"minimum_should_match": 1
}
}]
}
},
"from": 0,
"size": 50
}
Take a look at the above query and see if that helps.
Thanks
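If you build this body from Python (the next question here already uses a Python `body = {...}` dict), a minimal sketch of the same query assembled programmatically could look like this. The function name `build_status_query` is hypothetical, the timestamp field is assumed to be `@timestamp`, and `minimum_should_match: 1` is spelled out only to make the OR explicit. The resulting dict can be passed as the request body to whichever client you use.

```python
import json


def build_status_query(email, gte, lte, size=50):
    """Build the search body from the answer above as a Python dict.

    The outer bool ANDs its "must" clauses; the nested bool ORs the two
    status flags (at least one "should" clause has to match).
    """
    either_status = {
        "bool": {
            "should": [
                {"term": {"taskCurateStatus": {"value": True}}},
                {"term": {"taskMigrateStatus": {"value": True}}},
            ],
            # Explicit for readability: at least one flag must be true.
            "minimum_should_match": 1,
        }
    }
    return {
        "sort": [{"@timestamp": {"order": "desc"}}],  # assumed field name
        "query": {
            "bool": {
                "must": [
                    {"match": {"msg.operator_email": email}},
                    {"range": {"@timestamp": {"gte": gte, "lte": lte}}},
                    either_status,
                ]
            }
        },
        "from": 0,
        "size": size,
    }


if __name__ == "__main__":
    body = build_status_query("test@test.com", "2017-03-05", "2017-03-12")
    print(json.dumps(body, indent=2))
```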

Related

ElasticSearch query with prefix for aggregation

I am trying to add a prefix condition for my ES query in a "must" clause.
My current query looks something like this:
body = {
"query": {
"bool": {
"must":
{ "term": { "article_lang": 0 }}
,
"filter": {
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
I need to add a mandatory condition to my query so that it only returns articles whose id starts with "article-".
So far, I have tried this:
{
"query": {
"bool": {
"should": [
{ "term": { "article_lang": 0 }},
{ "prefix": { "article_id": {"value": "article-"} }}
],
"filter": {
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
I am fairly new to ES, and from the documentation online I know that "should" is used for "OR" conditions and "must" for "AND". This query returns some data, but, as the should clause implies, it contains documents with either article_lang=0 or an id starting with article-. When I use "must", it doesn't return anything.
I am certain that there are articles with ids starting with this prefix, because we currently iterate through the results to filter out such articles. What am I missing here?
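In your prefix query, you need to use the article_id.keyword field, not article_id. The article_id field is presumably an analyzed text field (the .keyword sub-field suggests the default dynamic mapping), so a value such as "article-123" is indexed as separate terms ("article", "123"), and no single term starts with "article-". The keyword sub-field, which your terms aggregation already uses, stores the whole value. Also, you should prefer filter over must, since you're simply doing yes/no matching. The query below replaces "should" with a "filter" array, targets article_id.keyword, and moves the range clause into the same filter array (a JSON object should not contain the filter key twice):
{
"query": {
"bool": {
"filter": [
{
"term": {
"article_lang": 0
}
},
{
"prefix": {
"article_id.keyword": {
"value": "article-"
}
}
},
{
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
]
}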
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
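To see why the prefix fails on article_id but works on article_id.keyword, here is a rough Python model of what each field indexes. It is only an approximation: `approx_standard_tokens` is a hypothetical stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters; the real analyzer follows Unicode text segmentation), and "article-123" is a made-up id. A prefix query is a term-level query, so "article-" itself is not analyzed and is compared with the indexed terms as-is.

```python
import re


def approx_standard_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, then split on
    anything that is not an ASCII letter or digit (illustration only)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def prefix_matches(terms, prefix):
    """A prefix query matches if any indexed term starts with the prefix."""
    return any(term.startswith(prefix) for term in terms)


article_id = "article-123"  # hypothetical document value

text_terms = approx_standard_tokens(article_id)  # roughly what a "text" field indexes
keyword_terms = [article_id]                     # what a "keyword" field indexes

print(text_terms)                                 # ['article', '123']
print(prefix_matches(text_terms, "article-"))     # False
print(prefix_matches(keyword_terms, "article-"))  # True
```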

Find distinct/unique people without a birthday or have a birthday earlier than 3/1/1963

We have some employees and need to find those whose birthday we haven't entered or who were born before 3/1/1963:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [{ "exists": { "field": "birthday" } }]
}
},
{
"bool": {
"filter": [{ "range": {"birthday": { "lte": 19630301 }} }]
}
}
]
}
}
}
We now need to get distinct names: we only want one Jason, one Susan, etc. How do we apply a distinct filter to the "name" field while still filtering on birthday as above? I've tried:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"bool": {
"filter": [
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
}
]
}
},
"aggs": {
"uniq_gender": {
"terms": {
"field": "name"
}
}
},
"from": 0,
"size": 25
}
but I still get hits with duplicate Jasons and Susans. At the bottom, the aggregation shows that there are 10 Susans and 12 Jasons. I'm not sure how to get unique ones.
EDIT:
My mapping is very simple. The name field doesn't need to be keyword; it can be text or anything else, as it is just a field that gets returned in the query.
{
"mappings": {
"birthdays": {
"properties": {
"name": {
"type": "keyword"
},
"birthday": {
"type": "date",
"format": "basic_date"
}
}
}
}
}
Without knowing your mapping, I'm guessing that your name field is not analyzed and can therefore be used properly in a terms aggregation.
I suggest using a filter aggregation:
{
"aggs": {
"filtered_employes": {
"filter": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"filtered_employes_by_name": {
"terms": {
"field": "name"
}
}
}
}
}
}
Note that the two conditions (missing birthday, born before the date) are alternatives, so they have to be combined with should, as in your original query. Combining them with must would require a document both to lack a birthday and to have one before the date, which matches nothing. Each bucket of the terms aggregation is one distinct name; the hits list will still contain every matching document.
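On the duplicates: the terms aggregation buckets already are the distinct names, while the hits section always lists every document the query matches, so duplicate Jasons there are expected. Below is a minimal Python sketch of two ways to get distinct names, assuming name stays a keyword field: skip the hits with "size": 0 and read the bucket keys, or use field collapsing ("collapse", which as far as I know needs Elasticsearch 5.3 or later and a keyword or numeric field). The function names are hypothetical, the cutoff is passed as a string in the field's basic_date format (yyyyMMdd), and the terms size is raised because a terms aggregation returns only 10 buckets by default.

```python
def build_distinct_names_query(cutoff="19630301", max_names=1000, use_collapse=False):
    """Search body for people with no birthday OR a birthday on or before
    `cutoff`, de-duplicated by the "name" field."""
    no_birthday_or_old = {
        "bool": {
            "should": [
                {"bool": {"must_not": [{"exists": {"field": "birthday"}}]}},
                {"range": {"birthday": {"lte": cutoff}}},
            ],
            # Either condition is enough: this is an OR.
            "minimum_should_match": 1,
        }
    }
    if use_collapse:
        # Field collapsing: one top hit per distinct "name" value.
        return {"query": no_birthday_or_old, "collapse": {"field": "name"}, "size": 25}
    # Aggregation route: suppress the hits and read the buckets instead.
    return {
        "query": no_birthday_or_old,
        "size": 0,
        "aggs": {"distinct_names": {"terms": {"field": "name", "size": max_names}}},
    }


def distinct_names(response):
    """Pull the bucket keys (the distinct names) out of a search response."""
    return [b["key"] for b in response["aggregations"]["distinct_names"]["buckets"]]
```

The aggregation route gives you only the names (plus a doc_count per name); the collapse route gives you one full document per name, which is handy if you need the other fields too.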

Filtered bool vs Bool query : elasticsearch

I have two queries in ES that conceptually do the same thing, but they have different response times on the same set of documents. I have a few questions:
1- What is the difference between these two?
2- Which one is better to use?
3- If both are the same, why do they perform differently?
1. Filtered bool
{
"from": 0,
"size": 5,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1987112602"
}
},
{
"term": {
"original_sender_address_number": "6870340319"
}
},
{
"range": {
"x_event_timestamp": {
"gte": "2016-07-01T00:00:00.000Z",
"lte": "2016-07-30T00:00:00.000Z"
}
}
}
]
}
}
}
},
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
2. Simple Bool
{
"query": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1277478699"
}
},
{
"term": {
"original_sender_address_number": "8020564722"
}
},
{
"term": {
"cause_code": "573"
}
},
{
"range": {
"x_event_timestamp": {
"gt": "2016-07-13T13:51:03.749Z",
"lt": "2016-07-16T13:51:03.749Z"
}
}
}
]
}
},
"from": 0,
"size": 10,
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
Mapping:
{
"ccp": {
"mappings": {
"type1": {
"properties": {
"original_sender_address_number": {
"type": "string"
},
"called_party_address_number": {
"type": "string"
},
"cause_code": {
"type": "string"
},
"x_event_timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
.
.
.
}
}
}
}
}
Update 1:
I tried a bool/must query and a bool/filter query on the same set of data, but found some strange behaviour.
1-
The bool/must query finds the desired document:
{
"query": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "8701662243"
}
},
{
"term": {
"cause_code": "401"
}
}
]
}
}
}
2-
The bool/filter query, however, does not find the document. If I remove the second condition, it finds the same record, whose cause_code value is 401.
{
"query": {
"bool": {
"filter": [
{
"term": {
"called_party_address_number": "8701662243"
}
},
{
"term": {
"cause_code": "401"
}
}
]
}
}
}
Update2:
I found a way to suppress the scoring phase of the bool/must query by wrapping it in "constant_score":
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1235235757"
}
},
{
"term": {
"cause_code": "304"
}
}
]
}
}
}
}
}
The record we are trying to match has "called_party_address_number": "1235235757" and "cause_code": "304".
The first one uses the old 1.x query/filter syntax (filtered queries have been deprecated in favor of bool/filter).
The second one uses the newer 2.x syntax, but not in a filter context (you're using bool/must instead of bool/filter), so a score is computed for every matching document. The 2.x query equivalent to your first one, which runs in a filter context without score calculation and is therefore typically faster, would be this one:
{
"query": {
"bool": {
"filter": [
{
"term": {
"called_party_address_number": "1277478699"
}
},
{
"term": {
"original_sender_address_number": "8020564722"
}
},
{
"term": {
"cause_code": "573"
}
},
{
"range": {
"x_event_timestamp": {
"gt": "2016-07-13T13:51:03.749Z",
"lt": "2016-07-16T13:51:03.749Z"
}
}
}
]
}
},
"from": 0,
"size": 10,
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
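The 1.x to 2.x rewrite described in this answer is mechanical, so here is a small Python sketch of it. The function `filtered_to_bool` is hypothetical and only handles a top-level filtered query: the filtered query's "query" becomes a scoring must clause and its "filter" becomes a non-scoring filter clause. A filtered query without an inner query matched all documents, so in that case the bool carries only the filter.

```python
def filtered_to_bool(query):
    """Rewrite a 1.x-style {"filtered": {...}} query into the 2.x+ bool form.

    "query"  -> "must"   (scored)
    "filter" -> "filter" (not scored, cacheable)
    """
    filtered = query["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = filtered["query"]
    if "filter" in filtered:
        bool_query["filter"] = filtered["filter"]
    return {"bool": bool_query}


# The first query from the question (trimmed to one clause for brevity).
old = {"filtered": {"filter": {"bool": {"must": [
    {"term": {"called_party_address_number": "1987112602"}}]}}}}
print(filtered_to_bool(old))
```

Keeping the inner bool nested under "filter" is equivalent to flattening its clauses into the filter array as the answer does; both run in a filter context.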

How do I limit an ElasticSearch API count by date?

I'm trying to count the number of query matches over a given time range, hitting the URL /{index}/_count with the body indicated below.
I'm new to Query DSL, so it's quite possible I'm overlooking something obvious. However, simply applying a count to an existing query doesn't work. I don't see anything in the docs that indicates a count query should receive special treatment.
I've tried adding a range and aggregations to the query, but I keep getting the following error or some variant:
indices:data/read/count[s]]]; nested:
QueryParsingException[[graylog2_NN] request does not support [{label}]]
Limit query by timestamp:
{
"query": {
"term": { "level":3 },
"range": {
"timestamp": {
"from": "2015-06-16 15:10:09.322",
"to": "2015-06-16 16:10:09.322",
"include_lower": true,
"include_upper": true
}
}
}
}
Use an aggregation:
{
"query": {
"term": { "level":3 }
},
"aggs": {
"range": {
"date_range": {
"field": "_timestamp",
"ranges": [
{ "to": "now-1d" },
{ "from": "now-2d" }
]
}
}
}
}
I've also tried plugging in the query exported from the UI (the bug icon on an individual stream display, covering one hour's worth of matches), with no joy there either:
{
"from": 0,
"size": 100,
"query": {
"match_all": {}
},
"post_filter": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"from": "2015-06-16 15:10:09.322",
"to": "2015-06-16 16:10:09.322",
"include_lower": true,
"include_upper": true
}
}
},
{
"query": {
"query_string": {
"query": "streams:5568c9dbe4b0b31b781bf105"
}
}
}
]
}
},
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"highlight": {
"require_field_match": false,
"fields": {
"*": {
"fragment_size": 0,
"number_of_fragments": 0
}
}
}
}
I've found a query that both matches and lines up pretty closely with numbers I get from the UI ("Search in the last 1 day"):
{
"query": {
"filtered": {
"query": {
"term": { "level":3 }
},
"filter": {
"range": { "timestamp": { "gte": "now-1d" } }
}
}
}
}
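Try the following query, which uses a bool query. Both clauses go under must, so a document has to match the level term AND fall within the time range (with should alone, matching either clause would be enough). I use a different timestamp format, which is the default in Elasticsearch. Try that format first; if that doesn't work, change the timestamp format to match yours.
{
"query": {
"bool" : {
"must" : [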
{
"term": { "level":3 }
},
{
"range": {
"timestamp": {
"from": "2015-06-16T15:10:09",
"to": "2015-06-16T16:10:09"
}
}
}
]
}
}
}
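A hedged Python sketch of a count body that limits by date, assuming Elasticsearch 2.0 or later (bool "filter" does not exist in 1.x, where the filtered query from the question is the way to go). The _count endpoint takes a single "query", so the term and range clauses are combined inside one bool instead of being listed side by side under "query", and no aggs are sent. The function name `build_count_body` is hypothetical, and the time bounds must use whatever format your timestamp field is mapped with.

```python
import json


def build_count_body(level, gte, lte):
    """Body for POST /{index}/_count: a single "query" key, nothing else.

    Both conditions sit in a bool "filter", so they are ANDed and no
    scores are computed (a count does not need them).
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"level": level}},
                    {"range": {"timestamp": {"gte": gte, "lte": lte}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_count_body(3, "now-1d", "now"), indent=2))
```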

Multiple filters and an aggregate in elasticsearch

How can I use a filter in connection with an aggregate in elasticsearch?
The official documentation gives only trivial examples for filters and for aggregations, and no formal description of the query DSL (compare it, e.g., with the PostgreSQL documentation).
Through trial and error I found the following query, which is accepted by Elasticsearch (no parsing errors) but ignores the given filters:
{
"filter": {
"and": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398169707,
"to": 1400761707
}
}
}
]
},
"size": 0,
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
Some people suggest using query instead of filter. But the official documentation generally recommends the opposite for filtering on exact values. Another issue with query: while filters offer an and, query does not.
Can somebody point me to documentation, a blog or a book that describes writing non-trivial queries, with at least an aggregation plus multiple filters?
I ended up using a filter aggregation, not a filtered query, so I now have three nested aggs elements.
I also use a bool filter instead of and, as recommended by @alex-brasetvik, because of http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
My final implementation:
{
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398176502000,
"to": 1400768502000
}
}
}
]
}
},
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
},
"size": 0
}
Put your filter in a filtered-query.
The top-level filter is for filtering search hits only, and not facets/aggregations. It was renamed to post_filter in 1.0 due to this quite common confusion.
Also, you might want to look into this post on why you often want to use bool and not and/or: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
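A toy Python model of the order of operations described above: the query selects documents, aggregations run over everything the query matched, and post_filter (the old top-level filter) then trims only the returned hits. The predicates are plain Python callables standing in for Elasticsearch clauses, the documents are made up, and `search` is a hypothetical helper; it illustrates the documented semantics, not how Elasticsearch executes them.

```python
from collections import Counter


def search(docs, query=None, post_filter=None, agg_field=None):
    """1. `query` selects documents; 2. aggregations see all of them;
    3. `post_filter` trims only the hits that are returned."""
    matched = [d for d in docs if query is None or query(d)]
    aggs = Counter(d[agg_field] for d in matched) if agg_field else None
    hits = [d for d in matched if post_filter is None or post_filter(d)]
    return hits, aggs


def is_204(doc):
    return doc["status"] == "204"


docs = [  # hypothetical log documents
    {"dc": "eu-west-12", "status": "204"},
    {"dc": "eu-west-12", "status": "500"},
    {"dc": "us-east-1", "status": "204"},
]

# Filter as post_filter (the old top-level "filter"): aggregations ignore it.
hits, aggs = search(docs, post_filter=is_204, agg_field="dc")
print(len(hits), dict(aggs))  # 2 {'eu-west-12': 2, 'us-east-1': 1}

# Filter inside the query (filtered query / bool filter): aggregations respect it.
hits, aggs = search(docs, query=is_204, agg_field="dc")
print(len(hits), dict(aggs))  # 2 {'eu-west-12': 1, 'us-east-1': 1}
```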
Building on @geekQ's answer: to filter on strings that contain spaces, across multiple fields, use match_phrase as below:
{
"aggs": {
"aggresults": {
"filter": {
"bool": {
"must": [
{
"match_phrase": {
"term_1": "some text with space 1"
}
},
{
"match_phrase": {
"term_2": "some text with also space 2"
}
}
]
}
},
"aggs" : {
"all_term_3s" : {
"terms" : {
"field":"term_3.keyword",
"size" : 10000,
"order" : {
"_term" : "asc"
}
}
}
}
}
},
"size": 0
}
Just for reference, with version 7.2 I used something like the following to apply multiple filters to an aggregation:
1- a filter aggregation to restrict which documents are aggregated;
2- a bool query inside it to combine the conditions.
POST movies/_search?size=0
{
"size": 0,
"aggs": {
"test": {
"filter": {
"bool": {
"must": {
"term": {
"genre": "action"
}
},
"filter": {
"range": {
"year": {
"gte": 1800,
"lte": 3000
}
}
}
}
},
"aggs": {
"year_hist": {
"histogram": {
"field": "year",
"interval": 50
}
}
}
}
}
}
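The same structure, a filter aggregation wrapping a bool with several conditions, can be generated from Python. Here is a minimal sketch with a hypothetical helper `filtered_agg` that reproduces the movies example above. As far as I know, a filter aggregation does not compute scores, so must and filter clauses should behave the same inside it.

```python
import json


def filtered_agg(name, must=(), filters=(), sub_aggs=None):
    """Wrap sub-aggregations in a `filter` aggregation whose bool query
    ANDs every clause in `must` and `filters`."""
    bool_query = {}
    if must:
        bool_query["must"] = list(must)
    if filters:
        bool_query["filter"] = list(filters)
    agg = {"filter": {"bool": bool_query}}
    if sub_aggs:
        agg["aggs"] = sub_aggs
    return {name: agg}


body = {
    "size": 0,
    "aggs": filtered_agg(
        "test",
        must=[{"term": {"genre": "action"}}],
        filters=[{"range": {"year": {"gte": 1800, "lte": 3000}}}],
        sub_aggs={"year_hist": {"histogram": {"field": "year", "interval": 50}}},
    ),
}
print(json.dumps(body, indent=2))
```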
