Find distinct/unique people without a birthday or with a birthday earlier than 3/1/1963 - elasticsearch

We have some employees and need to find those whose birthday we haven't entered or who were born before 3/1/1963:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [{ "exists": { "field": "birthday" } }]
}
},
{
"bool": {
"filter": [{ "range": {"birthday": { "lte": 19630301 }} }]
}
}
]
}
}
}
We now need to get distinct names...we only want 1 Jason or 1 Susan, etc. How do we apply a distinct filter to the "name" field while still filtering for the birthday as above? I've tried:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"bool": {
"filter": [
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
}
]
}
},
"aggs": {
"uniq_gender": {
"terms": {
"field": "name"
}
}
},
"from": 0,
"size": 25
}
but just get results with duplicate Jasons and Susans. At the bottom it will show me that there are 10 Susans and 12 Jasons. Not sure how to get unique ones.
EDIT:
My mapping is very simple. The name field doesn't need to be keyword...can be text or anything else as it is just a field that just gets returned in the query.
{
"mappings": {
"birthdays": {
"properties": {
"name": {
"type": "keyword"
},
"birthday": {
"type": "date",
"format": "basic_date"
}
}
}
}
}

Without knowing your mapping, I'm guessing that your name field is not analyzed and can therefore be used in a terms aggregation properly.
I suggest you use a filter aggregation:
{
"aggs": {
"filtered_employes": {
"filter": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
},
"aggs": {
"filtered_employes_by_name": {
"terms": {
"field": "name"
}
}
}
}
}
}
Note that the two conditions are mutually exclusive: a document cannot both lack the birthday field and match a range on it. They therefore have to be combined with should (OR), as in your query; with must the filter would match nothing. The hits will still contain duplicates, so read the distinct names from the buckets of filtered_employes_by_name, which hold one entry per name among the employees with (missing birthday) or (born on or before the date).
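For illustration, here is a minimal Python sketch of such a request, assuming the two birthday conditions are meant as alternatives (as the question states). The function names, the aggregation name `uniq_names`, the `max_names` parameter, and the sample response are hypothetical; only the request and response shapes follow the terms aggregation. The sketch builds plain dictionaries and does not talk to a cluster.

```python
import json


def distinct_names_body(cutoff=19630301, max_names=1000):
    """Build a search body that yields one bucket per distinct name."""
    birthday_missing = {"bool": {"must_not": [{"exists": {"field": "birthday"}}]}}
    born_on_or_before = {"range": {"birthday": {"lte": cutoff}}}
    return {
        "size": 0,  # skip the hits; only the buckets are needed
        "query": {
            "bool": {
                # OR: either condition is enough for a document to match
                "should": [birthday_missing, born_on_or_before],
                "minimum_should_match": 1,
            }
        },
        "aggs": {
            # a terms aggregation returns 10 buckets unless "size" is raised
            "uniq_names": {"terms": {"field": "name", "size": max_names}}
        },
    }


def names_from_response(response):
    """Pull the distinct names out of a terms aggregation response."""
    buckets = response["aggregations"]["uniq_names"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Hypothetical response fragment, shaped like a terms aggregation result.
sample_response = {
    "aggregations": {
        "uniq_names": {
            "buckets": [
                {"key": "Jason", "doc_count": 12},
                {"key": "Susan", "doc_count": 10},
            ]
        }
    }
}

body = distinct_names_body()
print(json.dumps(body, indent=2))
print(names_from_response(sample_response))
```

Each bucket key is a distinct name, and doc_count is the number of matching documents that share it, which is where the "10 Susans and 12 Jasons" figures come from.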

Related

ElasticSearch should with nested and bool must_not exists

With the following mapping:
"categories": {
"type": "nested",
"properties": {
"category": {
"type": "integer"
},
"score": {
"type": "float"
}
}
},
I want to use the categories field to return documents that either:
have a score above a threshold in a given category, or
do not have the categories field
This is my query:
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
<id>
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "categories"
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
It correctly returns documents both with and without the categories field, and orders the results so the ones I want are first, but it doesn't filter the results having score below the 0.5 threshold.
Great question.
That is because categories is not exactly a field from Elasticsearch's point of view (that is, a field for which an inverted index is created and used for querying/searching), but categories.category and categories.score are.
As a result, an exists query on categories finds nothing in any document, so the must_not clause is true for all the documents, which is why no document gets filtered out.
Modify the query as shown below and you should see your use case working correctly.
POST <your_index_name>/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
"100"
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [ <----- Note this
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "categories.category"
}
},
{
"exists": {
"field": "categories.score"
}
}
]
}
}
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
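The pattern in this answer (match the nested condition, or have no nested objects at all) can be sketched as a small Python builder. The function name `nested_match_or_absent` and its parameters are hypothetical, and the category value 100 is just the sample value used above; the sketch only assembles the query dictionary.

```python
def nested_match_or_absent(path, nested_query, probe_fields):
    """Match documents whose nested objects satisfy nested_query,
    or that have no nested object carrying all of the probe fields."""
    has_nested_objects = {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "must": [{"exists": {"field": name}} for name in probe_fields]
                }
            },
        }
    }
    return {
        "bool": {
            "should": [
                {"nested": {"path": path, "query": nested_query}},
                # the exists check has to run inside a nested query on the leaf fields
                {"bool": {"must_not": [has_nested_objects]}},
            ],
            "minimum_should_match": 1,
        }
    }


category_and_score = {
    "bool": {
        "must": [
            {"terms": {"categories.category": [100]}},
            {"range": {"categories.score": {"gte": 0.5}}},
        ]
    }
}

query = nested_match_or_absent(
    "categories",
    category_and_score,
    ["categories.category", "categories.score"],
)
print(query)
```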

combine two queries of elasticsearch?

I have "date_created_tranx" and "phone_number_cust" fields. A few entries of date_created_tranx are null. I want the documents for a particular phone number whose date_created_tranx is either within a date range or null.
a = {
"query": {
"bool": {
"must": [
{
"range": {
"date_created_tranx": {
"gte": "2019-12-01",
"lte": "2020-05-07"
}
}
},
{
"regexp": {
"phone_number_cust": ".*702625.*"
}
}
]
}
}
}
b = {
"query": {
"bool": {
"must": [{
"regexp": {
"phone_number_cust": ".*702625.*"
}
}],
"must_not": [{
"exists": {
"field": "date_created_tranx"
}
}
]
}
}
}
How can I combine these?
I cannot run them as two separate calls because the result is paginated.
I am totally new to Elasticsearch. Any leads will be helpful.
I tried:
doc2 = {
"query" :{
"bool" : {
"must":[
a,
b
]
}
}
}
It throws
Error: RequestError: RequestError(400, 'parsing_exception', 'no [query] registered for [query]')
The query you're looking for is the one below.
We have a constraint on the phone number, and we also check that date_created_tranx is either within bounds or does not exist (i.e. is null).
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"range": {
"date_created_tranx": {
"gte": "2019-12-01",
"lte": "2020-05-07"
}
}
},
{
"bool": {
"must_not": {
"exists": {
"field": "date_created_tranx"
}
}
}
}
],
"filter": [
{
"regexp": {
"phone_number_cust": ".*702625.*"
}
}
]
}
}
}
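The parsing error in the question comes from the fact that a and b are complete request bodies, each with its own top-level "query" key, so placing them inside must makes Elasticsearch look for a query type named "query". If you would rather keep the two dictionaries and merge them in Python, a minimal sketch could look like this; the helper name `combine_any` is hypothetical, and a and b are copied from the question.

```python
def combine_any(*bodies):
    """Merge search bodies so that a document matching any of them matches."""
    # Unwrap each body: only the inner query may be nested in a bool clause.
    queries = [body["query"] for body in bodies]
    return {"query": {"bool": {"should": queries, "minimum_should_match": 1}}}


a = {
    "query": {
        "bool": {
            "must": [
                {"range": {"date_created_tranx": {"gte": "2019-12-01", "lte": "2020-05-07"}}},
                {"regexp": {"phone_number_cust": ".*702625.*"}},
            ]
        }
    }
}

b = {
    "query": {
        "bool": {
            "must": [{"regexp": {"phone_number_cust": ".*702625.*"}}],
            "must_not": [{"exists": {"field": "date_created_tranx"}}],
        }
    }
}

combined = combine_any(a, b)
print(combined)
```

This repeats the regexp in both branches, so the hand-written query above is the tidier form; the sketch only shows how to avoid the nested "query" key.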

How do I recreate an "or" query now that "missing" is deprecated?

I am upgrading to elasticsearch 5.2 and have the following query, which now fails because the "missing" filter is deprecated:
{
"query": {
"bool": {
"should": [
{
"missing": {
"field": "birthday"
}
},
{
"range": {
"birthday": {
"lte": "20131231"
}
}
}
]
}
}
}
So, I am looking for documents that are either missing the birthday field or have a birthday less than 12/31/2013. The suggested replacement for "missing" is to use "must_not". I get that but how do I now do the same "or" query I had going on before? I have:
{
"query": {
"bool": {
"should": {
"range": {
"birthday": {
"lte": "20131231"
}
}
},
"must_not": {
"exists": {
"field": "birthday"
}
}
}
}
}
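You're on the right path and almost there. A must_not placed next to should applies to the whole bool query, so it has to be wrapped in its own bool and moved inside should to become one of the alternatives: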
{
"query": {
"bool": {
"should": [
{
"range": {
"birthday": {
"lte": "20131231"
}
}
},
{
"bool": {
"must_not": {
"exists": {
"field": "birthday"
}
}
}
}
]
}
}
}
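The difference between the two query shapes can be illustrated with a small plain-Python model of the boolean logic. This is not Elasticsearch code: the helpers `exists` and `range_lte`, the sample documents, and the assumption that at least one should clause has to match when no must or filter clause is present are all illustrative.

```python
CUTOFF = "20131231"  # yyyyMMdd strings compare correctly as plain text


def exists(doc, field):
    """True when the document carries a non-null value for the field."""
    return doc.get(field) is not None


def range_lte(doc, field, limit):
    """True when the field is present and not greater than the limit."""
    return exists(doc, field) and doc[field] <= limit


def attempted(doc):
    # should: [range] with a sibling must_not: [exists].
    # The must_not excludes every document that has a birthday,
    # so the range clause can never match what is left.
    return range_lte(doc, "birthday", CUTOFF) and not exists(doc, "birthday")


def corrected(doc):
    # should: [range, bool(must_not: [exists])] behaves like a logical OR.
    return range_lte(doc, "birthday", CUTOFF) or not exists(doc, "birthday")


docs = [
    {"name": "no birthday"},
    {"name": "early", "birthday": "20100115"},
    {"name": "late", "birthday": "20150620"},
]

attempted_names = [d["name"] for d in docs if attempted(d)]
corrected_names = [d["name"] for d in docs if corrected(d)]
print(attempted_names)
print(corrected_names)
```

In this model the attempted shape matches nothing, while the corrected shape matches the document without a birthday and the one born before the cutoff.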

Elasticsearch: execute a filter on nested document only if it exists

I am using ES 2.3 and have a query in which filter section looks as follows:
"filter": {
"query": {
"bool": {
"must": [
{
"nested": {
"path": "employees",
"query": {
"bool": {
"must": [
{
"range": {
"employees.max_age": {
"lte": 50
}
}
},
{
"range": {
"employees.min_age": {
"gte": 20
}
}
}
]
}
}
}
},
{
"exists": {
"field": "employees"
}
},
{
#....other filter here based on root document, not on nested employee document
}
]
}
}
}
}
I have a filter that checks some conditions on the nested document "employees" inside a bigger document called company. But I want to run this filter only if the "employees" object exists, as some of the documents may not have that nested document at all. So I added {"exists": {"field": "employees"}},
but this doesn't seem to work. Any idea what I should change to get it to work?
You can do it like this. However, if documents don't have the employees field, they will not be picked up anyway, so I'm not sure why you want/need that exists query in the first place.
{
"filter": {
"query": {
"bool": {
"must": [
{
"nested": {
"path": "employees",
"query": {
"exists": {
"field": "employees"
}
}
}
},
{
"nested": {
"path": "employees",
"query": {
"bool": {
"must": [
{
"range": {
"employees.max_age": {
"lte": 50
}
}
},
{
"range": {
"employees.min_age": {
"gte": 20
}
}
}
]
}
}
}
}
]
}
}
}
}

Filtered bool vs Bool query : elasticsearch

I have two queries in ES. They have different turnaround times on the same set of documents, although both are doing the same thing conceptually. I have a few questions:
1- What is the difference between these two?
2- Which one is better to use?
3- If both are the same, why do they perform differently?
1. Filtered bool
{
"from": 0,
"size": 5,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1987112602"
}
},
{
"term": {
"original_sender_address_number": "6870340319"
}
},
{
"range": {
"x_event_timestamp": {
"gte": "2016-07-01T00:00:00.000Z",
"lte": "2016-07-30T00:00:00.000Z"
}
}
}
]
}
}
}
},
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
2. Simple Bool
{
"query": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1277478699"
}
},
{
"term": {
"original_sender_address_number": "8020564722"
}
},
{
"term": {
"cause_code": "573"
}
},
{
"range": {
"x_event_timestamp": {
"gt": "2016-07-13T13:51:03.749Z",
"lt": "2016-07-16T13:51:03.749Z"
}
}
}
]
}
},
"from": 0,
"size": 10,
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
Mapping:
{
"ccp": {
"mappings": {
"type1": {
"properties": {
"original_sender_address_number": {
"type": "string"
},
"called_party_address_number": {
"type": "string"
},
"cause_code": {
"type": "string"
},
"x_event_timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
.
.
.
}
}
}
}
}
Update 1:
I tried a bool/must query and a bool/filter query on the same set of data, but I found strange behaviour:
1-
The bool/must query is able to find the desired document:
{
"query": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "8701662243"
}
},
{
"term": {
"cause_code": "401"
}
}
]
}
}
}
2-
The bool/filter query, however, is not able to find the document. If I remove the second field condition, it finds the same record, whose cause_code value is 401.
{
"query": {
"bool": {
"filter": [
{
"term": {
"called_party_address_number": "8701662243"
}
},
{
"term": {
"cause_code": "401"
}
}
]
}
}
}
Update2:
I found a way to suppress the scoring phase with a bool/must query: wrapping it in "constant_score".
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1235235757"
}
},
{
"term": {
"cause_code": "304"
}
}
]
}
}
}
}
}
The record we are trying to match has "called_party_address_number": "1235235757" and "cause_code": "304".
The first one uses the old 1.x query/filter syntax (i.e. filtered queries have been deprecated in favor of bool/filter).
The second one uses the new 2.x syntax but not in a filter context (i.e. you're using bool/must instead of bool/filter). The query with 2.x syntax which is equivalent to your first query (i.e. which runs in a filter context without score calculation = faster) would be this one:
{
"query": {
"bool": {
"filter": [
{
"term": {
"called_party_address_number": "1277478699"
}
},
{
"term": {
"original_sender_address_number": "8020564722"
}
},
{
"term": {
"cause_code": "573"
}
},
{
"range": {
"x_event_timestamp": {
"gt": "2016-07-13T13:51:03.749Z",
"lt": "2016-07-16T13:51:03.749Z"
}
}
}
]
}
},
"from": 0,
"size": 10,
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
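If the request bodies are built in Python, moving an existing bool/must query into filter context can be done mechanically. The following is a minimal sketch with a hypothetical helper name, `to_filter_context`; it only handles the top-level bool query, and moving the clauses is only appropriate when their scores are not needed, as in the timestamp-sorted queries above.

```python
import copy


def as_list(clauses):
    """Bool clauses may be given as one object or as a list of objects."""
    return [clauses] if isinstance(clauses, dict) else list(clauses)


def to_filter_context(body):
    """Return a copy of body with the top-level bool/must clauses
    moved into bool/filter, so that they run without scoring."""
    converted = copy.deepcopy(body)  # leave the caller's dictionary untouched
    bool_query = converted["query"]["bool"]
    must = as_list(bool_query.pop("must", []))
    existing = as_list(bool_query.get("filter", []))
    bool_query["filter"] = existing + must
    return converted


must_body = {
    "query": {
        "bool": {
            "must": [
                {"term": {"called_party_address_number": "8701662243"}},
                {"term": {"cause_code": "401"}},
            ]
        }
    }
}

filter_body = to_filter_context(must_body)
print(filter_body)
```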

Resources