Elasticsearch range query with multiple condition - spring-boot

I need to fetch records from Elasticsearch based on when they were created and updated. The documents have two fields, updatedDate and createdDate, and the conditions are:
Fetch records whose updatedDate falls within the past 3 years.
If updatedDate is null, fetch records whose createdDate falls within the past 3 years.
I have written the Java query that fetches records based on createdDate:
.must(QueryBuilders.rangeQuery("createdDate").from(startDate,true).to(endDate,true));
startDate and endDate hold the date range.
I am new to Elasticsearch and don't know how to implement the conditions above.

Since you have not provided any index data, here is a working example with sample index data, a mapping, a search query, and a search result that satisfies the conditions of your use case.
Index Mapping:
{
"mappings": {
"properties": {
"createdDate": {
"format": "yyyy-MM-dd'T'HH:mm:ss'Z'",
"type": "date"
},
"updatedDate": {
"format": "yyyy-MM-dd'T'HH:mm:ss'Z'",
"type": "date"
}
}
}
}
Index Data:
{
"createdDate": "2020-08-15T00:00:00Z"
}
{
"createdDate": "2019-08-15T00:00:00Z"
}
{
"createdDate": "2010-08-15T00:00:00Z"
}
{
"updatedDate": "2021-08-15T00:00:00Z",
"createdDate": "2002-08-15T00:00:00Z"
}
{
"updatedDate": "2018-08-15T00:00:00Z",
"createdDate": "2020-09-15T00:00:00Z"
}
{
"updatedDate": "2000-08-15T00:00:00Z",
"createdDate": "2020-09-15T00:00:00Z"
}
Search Query:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"bool": {
"filter": {
"range": {
"createdDate": {
"gte": "now-3y",
"lte": "now"
}
}
},
"must_not": {
"exists": {
"field": "updatedDate"
}
}
}
}
]
}
},
{
"bool": {
"filter": {
"range": {
"updatedDate": {
"gte": "now-3y",
"lte": "now"
}
}
}
}
}
],
"minimum_should_match": 1
}
}
}
Search Result:
"hits": [
{
"_index": "64965551",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"createdDate": "2020-08-15T00:00:00Z"
}
},
{
"_index": "64965551",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"createdDate": "2019-08-15T00:00:00Z"
}
},
{
"_index": "64965551",
"_type": "_doc",
"_id": "5",
"_score": 0.0,
"_source": {
"updatedDate": "2018-08-15T00:00:00Z",
"createdDate": "2020-09-15T00:00:00Z"
}
}
]
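As a rough sketch, the same request body can be built as a Python dict, with a small local predicate that mirrors the query logic so it can be checked without a cluster. The names build_date_fallback_query and matches, and the fixed "current" date, are illustrative assumptions and not part of any Elasticsearch client.

```python
from datetime import datetime


def build_date_fallback_query(window="now-3y"):
    """Body: updatedDate in range, OR (updatedDate missing AND createdDate in range)."""
    def in_window(field):
        return {"range": {field: {"gte": window, "lte": "now"}}}

    return {
        "query": {
            "bool": {
                "should": [
                    {"bool": {"filter": in_window("updatedDate")}},
                    {
                        "bool": {
                            "filter": in_window("createdDate"),
                            "must_not": {"exists": {"field": "updatedDate"}},
                        }
                    },
                ],
                "minimum_should_match": 1,
            }
        }
    }


def matches(doc, now, years=3):
    """Local stand-in for the query logic (not an Elasticsearch call)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start = now.replace(year=now.year - years)  # assumes `now` is not Feb 29
    updated = doc.get("updatedDate")
    # createdDate is only consulted when updatedDate is absent or null.
    value = updated if updated is not None else doc.get("createdDate")
    return value is not None and start <= datetime.strptime(value, fmt) <= now


# The six sample documents from the answer, in order (ids 1 to 6).
docs = [
    {"createdDate": "2020-08-15T00:00:00Z"},
    {"createdDate": "2019-08-15T00:00:00Z"},
    {"createdDate": "2010-08-15T00:00:00Z"},
    {"updatedDate": "2021-08-15T00:00:00Z", "createdDate": "2002-08-15T00:00:00Z"},
    {"updatedDate": "2018-08-15T00:00:00Z", "createdDate": "2020-09-15T00:00:00Z"},
    {"updatedDate": "2000-08-15T00:00:00Z", "createdDate": "2020-09-15T00:00:00Z"},
]

now = datetime(2020, 11, 23)  # hypothetical "current" date for the check
matched_ids = [i for i, d in enumerate(docs, start=1) if matches(d, now)]
print(matched_ids)  # [1, 2, 5]
```

With that assumed date, the local check selects the same documents (1, 2 and 5) as the search result above.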

Related

How do I get a terms aggregation to respect the pipeline above it?

(I've attached test data below the question)
I would like to know why a terms aggregation that uses min_doc_count inside a pipeline aggregation does not respect the results of the aggregation above it.
Here's my query:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "facets",
"query": {
"bool": {
"filter": [
{
"term": {
"facets.name": "brand"
}
},
{
"term": {
"facets.value": "hokey"
}
}
]
}
}
}
}
]
}
},
"aggs": {
"facets": {
"nested": {
"path": "facets"
},
"aggs": {
"names": {
"terms": {
"field": "facets.name"
},
"aggs": {
"values": {
"terms": {
"field": "facets.value",
"min_doc_count": 0
}
}
}
}
}
}
}
}
Looking at the above, with min_doc_count=0 on the facets.value terms aggregation, why am I seeing results for all possible facets.value entries even when they don't match the facets.name bucket above them?
Surely the aggs should be hierarchical and respect the higher levels? Do I need to run a script filter or something?
Please experiment with the data below to see what I mean. I don't want to run multiple queries for our search filtering: without min_doc_count, lots of facets.value entries are filtered out, but with it, we get too many irrelevant results in the lowest aggregation.
Mapping:
{
"mappings": {
"properties": {
"facets": {
"type": "nested",
"properties": {
"name": { "type": "keyword"},
"value": { "type": "keyword"}
}
}
}
}
}
Bulk documents:
{ "index": { "_index": "product-facets", "_id": 1} }
{"facets":[{"name":"brand","value":"ubest"},{"name":"color","value":"green"},{"name":"department","value":"soccer"}]}
{ "index": { "_index": "product-facets", "_id": 2} }
{"facets":[{"name":"brand","value":"ubest"},{"name":"color","value":"green"},{"name":"department","value":"adventure"}]}
{ "index": { "_index": "product-facets", "_id": 3} }
{"facets":[{"name":"brand","value":"beert"},{"name":"color","value":"white"},{"name":"department","value":"soccer"}]}
{ "index": { "_index": "product-facets", "_id": 4} }
{"facets":[{"name":"brand","value":"ubest"},{"name":"color","value":"yellow"},{"name":"department","value":"adventure"}]}
{ "index": { "_index": "product-facets", "_id": 5} }
{"facets":[{"name":"brand","value":"hokey"},{"name":"color","value":"yellow"},{"name":"department","value":"adventure"}]}
{ "index": { "_index": "product-facets", "_id": 6} }
{"facets":[{"name":"brand","value":"beert"},{"name":"color","value":"black"},{"name":"department","value":"casual"}]}
{ "index": { "_index": "product-facets", "_id": 7} }
{"facets":[{"name":"brand","value":"hokey"},{"name":"color","value":"white"},{"name":"department","value":"adventure"}]}
{ "index": { "_index": "product-facets", "_id": 8} }
{"facets":[{"name":"brand","value":"ubest"},{"name":"color","value":"black"},{"name":"department","value":"casual"}]}
{ "index": { "_index": "product-facets", "_id": 9} }
{"facets":[{"name":"brand","value":"hokey"},{"name":"color","value":"white"},{"name":"department","value":"soccer"}]}
{ "index": { "_index": "product-facets", "_id": 10} }
{"facets":[{"name":"brand","value":"hokey"},{"name":"color","value":"white"},{"name":"department","value":"adventure"}]}

elk's elastic search dsl case sensitive

I'm doing an Elasticsearch Query DSL query on ELK such as:
{
"query": {
"wildcard": {
"url.path": {
"value": "*download*",
"boost": 1,
"rewrite": "constant_score"
}
}
}
}
but it seems to be case sensitive: it only returns entries containing "download", not "Download" or "DOWNLOAD".
Can I disable this and search case-insensitively?
Version used: 7.9.1
The query below performs a case-insensitive search: it fetches results for *download, *Download and *DOWNLOAD. Replace <my-index> with your index and <field1> with the field you would like to search.
Search Query
GET /<my-index>/_search
{
"query" : {
"bool" : {
"must" : {
"query_string" : {
"query" : "*download",
"fields": ["<field1>"]
}
}
}
}
}
If you wish to run the same search on multiple fields, add them to the fields list.
Search on multiple fields
GET /<my-index>/_search
{
"query" : {
"bool" : {
"must" : {
"query_string" : {
"query" : "*download",
"fields": ["<field1>","<field2>","<field3>"]
}
}
}
}
}
There is a case_insensitive parameter available for wildcard query, but it was introduced in Elasticsearch 7.10.0, so you need to upgrade if you are still on 7.9.1.
If you can upgrade to 7.10.0 or higher:
Ideally, the field should use the wildcard type in the index mapping:
{
"mappings": {
"properties": {
"url.path": {
"type": "wildcard"
}
}
}
}
Then a wildcard query with case insensitivity enabled will find all the variants ("download", "Download", "DOWNLOAD", etc.):
{
"query": {
"wildcard": {
"url.path": {
"value": "*download*",
"boost": 1,
"rewrite": "constant_score",
"case_insensitive": true
}
}
}
}
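If the same client code has to talk to clusters on both sides of 7.10.0, one option is to add the parameter only when the server version allows it. The sketch below is a minimal illustration; build_wildcard_query and the version tuple argument are hypothetical names, and how the version is obtained is left out.

```python
def build_wildcard_query(field, pattern, es_version):
    """Return a wildcard query body for `field`.

    `es_version` is a (major, minor, patch) tuple supplied by the caller.
    case_insensitive is only added from 7.10.0 on, where it was introduced.
    """
    params = {"value": pattern, "boost": 1, "rewrite": "constant_score"}
    if es_version >= (7, 10, 0):
        params["case_insensitive"] = True
    return {"query": {"wildcard": {field: params}}}


new_body = build_wildcard_query("url.path", "*download*", (7, 10, 0))
old_body = build_wildcard_query("url.path", "*download*", (7, 9, 1))
print(new_body)
print(old_body)
```

On 7.9.1 the resulting query is still case sensitive, so it only helps when combined with a mapping that normalizes case at index time.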
If you must remain at 7.9.1:
Define your mapping in such a way that Elasticsearch treats the field contents as lowercase. The following will mimic wildcard type (it's a keyword, so only one token) indexed as lowercase.
{
"mappings": {
"properties": {
"url": {
"type": "text",
"analyzer": "lowercase-keyword"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"lowercase-keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
}
}
The query, without the case_insensitive parameter which is unsupported in this version:
{
"query": {
"wildcard": {
"url": {
"value": "*download*",
"boost": 1,
"rewrite": "constant_score"
}
}
}
}
Example results (note that searching for "*download*" and "*DoWnLoAd*" both work the same way):
{
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "PtbQe3wByTvslqtrs7Cn",
"_score": 1.0,
"_source": {
"url": "http://example.com/download"
}
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "P9bQe3wByTvslqtrvbDt",
"_score": 1.0,
"_source": {
"url": "http://example.com/Download"
}
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "QNbQe3wByTvslqtrzbDw",
"_score": 1.0,
"_source": {
"url": "http://example.com/DOWNLOAD"
}
}
]
}
}
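The effect of the lowercase analyzer can be imitated locally to see why both patterns behave the same. This is only a simulation under the assumption that both the indexed value and the query pattern end up lowercased; it uses Python's fnmatch, whose * and ? roughly correspond to the wildcard query operators (fnmatch also treats [ ] specially, which the wildcard query does not). The function name is illustrative.

```python
from fnmatch import fnmatchcase


def matches_lowercased(stored_value, pattern):
    """Simulate a wildcard match where value and pattern are both lowercased."""
    return fnmatchcase(stored_value.lower(), pattern.lower())


urls = [
    "http://example.com/download",
    "http://example.com/Download",
    "http://example.com/DOWNLOAD",
]

for pattern in ("*download*", "*DoWnLoAd*"):
    hits = [u for u in urls if matches_lowercased(u, pattern)]
    print(pattern, len(hits))  # each pattern matches all 3 URLs

# Without normalization the match is case sensitive:
print(fnmatchcase("http://example.com/Download", "*download*"))  # False
```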
You can use the case_insensitive parameter of the wildcard query. This parameter was introduced in version 7.10.0.
Here is a working example with index data, a mapping, a search query, and the search result.
Index Mapping:
{
"mappings": {
"properties": {
"url": {
"properties": {
"path": {
"type": "wildcard"
}
}
}
}
}
}
Index Data:
{
"url":{
"path":"xx/download"
}
}
Search Query:
{
"query": {
"wildcard": {
"url.path": {
"value": "*Download*",
"boost": 1,
"rewrite": "constant_score",
"case_insensitive": false
}
}
}
}
Search Result:
No results are returned when you search for *Download* or *DOWNLOAD*, because case_insensitive is set to false.
Update:
You can use the wildcard query with the "case_insensitive": true parameter.
Here is sample index data, a search query, and the search result.
Index Data:
{
"url": {
"path": "download"
}
}
{
"url": {
"path": "DOWNLOAD"
}
}
{
"url": {
"path": "Download"
}
}
Search Query:
{
"query": {
"wildcard": {
"url.path": {
"value": "*DOWNLOAD*",
"boost": 1,
"rewrite": "constant_score",
"case_insensitive": true
}
}
}
}
Search Result:
"hits": [
{
"_index": "67210888",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"url": {
"path": "download"
}
}
},
{
"_index": "67210888",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"url": {
"path": "Download"
}
}
},
{
"_index": "67210888",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"url": {
"path": "DOWNLOAD"
}
}
}
]

Filter elastic data on array count

How can we fetch candidates that have at least one phone number from the index data below, combined with other conditions such as must and should?
Using Elasticsearch version 6.*
{
"_index": "test",
"_type": "docs",
"_id": "1271",
"_score": 1.518617,
"_source": {
"record": {
"createdDate": "2020-10-16T10:49:51.53",
"phoneNumbers": [
{
"type": "Cell",
"id": 0,
"countryCode": "+1",
"phoneNumber": "7845200448",
"extension": "",
"typeId": 700
}
]
},
"entityType": "Candidate",
"dbId": "1271",
"id": "1271"
}
}
You can use the terms query, which returns documents that contain one or more exact terms in a provided field.
Search Query:
{
"query": {
"bool": {
"must": [
{
"terms": {
"record.phoneNumbers.phoneNumber.keyword": [
"7845200448"
]
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64388591",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"record": {
"createdDate": "2020-10-16T10:49:51.53",
"phoneNumbers": [
{
"type": "Cell",
"id": 0,
"countryCode": "+1",
"phoneNumber": "7845200448",
"extension": "",
"typeId": 700
}
]
},
"entityType": "Candidate",
"dbId": "1271",
"id": "1271"
}
}
]
Update 1: For version 7.*
You need to use a script query to filter documents based on the provided script.
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"source": "doc['record.phoneNumbers.phoneNumber.keyword'].length > 0",
"lang": "painless"
}
}
}
}
}
}
For version 6.*
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"source": "doc['record.phoneNumbers.phoneNumber.keyword'].values.length > 0",
"lang": "painless"
}
}
}
}
}
}
You can use an exists query for this purpose, as shown below. It is lightweight in comparison with scripts:
{
"query": {
"exists": {
"field": "record.phoneNumbers.phoneNumber"
}
}
}
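Since the question also asks about combining this with must and should conditions, the exists clause can sit in the filter part of a bool query. The sketch below only builds the request body; build_candidate_query is a hypothetical helper, and the entityType.keyword clause assumes default dynamic mapping for that field.

```python
import json


def build_candidate_query(must=None, should=None):
    """Bool query that requires at least one phone number via an exists filter.

    `must` and `should` are optional lists of extra query clauses.
    """
    bool_query = {
        "filter": [{"exists": {"field": "record.phoneNumbers.phoneNumber"}}]
    }
    if must:
        bool_query["must"] = list(must)
    if should:
        bool_query["should"] = list(should)
    return {"query": {"bool": bool_query}}


body = build_candidate_query(
    # Assumed example condition, based on the entityType field in the sample document.
    must=[{"term": {"entityType.keyword": "Candidate"}}],
)
print(json.dumps(body, indent=2))
```

Placing exists under filter keeps it out of scoring, so the must and should clauses alone determine relevance.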

How to query IP range in Elastic search?

I want to query the IP range from 172.16.0.0 to 172.31.0.0 in ELK.
I tried two query methods, but both failed.
{
"query": {
"bool": {
"should": [
{
"regexp": {
"DstIP": "172.(3[0-1]|1[6-9]|2[0-9]).*"
}
}
],
"minimum_should_match": 1
}
}
}
{
"query": {
"range": {
"DstIP": {
"gte": "172.16.0.0",
"lte": "172.31.0.0"
}
}
}
}
How can I query an IP range in ELK?
For range queries to work correctly on IP values, the field must be mapped with the ip data type.
Below is a working example with a mapping, sample docs, and a search query.
Mapping:
{
"mappings": {
"properties": {
"dest": {
"type": "ip"
}
}
}
}
Index data:
Then I've taken a couple of sample documents like this:
{ "dest":"172.16.0.0"}
{ "dest":"172.31.0.0"}
{ "dest":"172.21.0.0"}
{ "dest":"172.1.0.0" }
{ "dest":"172.12.0.0"}
Search Query :
{
"query": {
"range": {
"dest": {
"gte": "172.16.0.0",
"lte": "172.31.0.0"
}
}
}
}
Search Result :
"hits": [
{
"_index": "foo4",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"dest": "172.16.0.0"
}
},
{
"_index": "foo4",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"dest": "172.31.0.0"
}
},
{
"_index": "foo4",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"dest": "172.21.0.0"
}
}
]
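To illustrate why the field type matters, the sketch below compares a numeric IP comparison with a plain string comparison using Python's standard ipaddress module. It is a local simulation, not an Elasticsearch call, and it assumes the original field may have been mapped as a string type, where range bounds are compared lexicographically. The function names are illustrative.

```python
from ipaddress import ip_address

LOW, HIGH = "172.16.0.0", "172.31.0.0"


def in_ip_range(value, low=LOW, high=HIGH):
    """Numeric comparison, analogous to a range query on an ip field."""
    return ip_address(low) <= ip_address(value) <= ip_address(high)


def in_string_range(value, low=LOW, high=HIGH):
    """Lexicographic comparison, as would happen on a string-typed field."""
    return low <= value <= high


samples = ["172.16.0.0", "172.31.0.0", "172.21.0.0", "172.1.0.0", "172.12.0.0"]
numeric_hits = [s for s in samples if in_ip_range(s)]
print(numeric_hits)  # ['172.16.0.0', '172.31.0.0', '172.21.0.0']

# An address outside the range numerically but inside it as a string:
print(in_ip_range("172.2.0.0"), in_string_range("172.2.0.0"))  # False True
```

The numeric check returns the same three addresses as the search result above, while the string comparison would wrongly accept 172.2.0.0.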

How to highlight date fields in Elasticsearch?

My mapping:
"mappings": {
"my_type": {
"properties": {
"birthDate": {
"type": "date",
"format": "dateOptionalTime"
},
"name": {
"type": "string"
}
}
}
}
My search query:
GET my_index/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"name": "babken"
}
},
{
"term": {
"birthDate": {
"value": "1999-01-01"
}
}
}
]
}
},
"highlight": {
"fields": {
"*": {}
}
}
}
However in the response body, only the name field is highlighted, even though the birthDate field has matched as well:
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1a82fbb4-1268-42b9-9999-ef932f67a114",
"_score": 12.507131,
"_source": {
"name": "babken",
"birthDate": "1999-01-01",
},
"highlight": {
"name": [
"<em>babken</em>"
]
}
}
...
How can I make the birthDate field appear in "highlight" results as well if it has matched?
I'm using Elasticsearch 1.6
You would need to change the type to string to enable highlighting.
The bare minimum requirement for a field to be highlighted is that it is of string type.
The following issue has a little more discussion about it.

Resources