conditionally query for fields in elasticsearch - elasticsearch

I m new to Elasticsearch and before posting this question I have googled for help but not understanding how to write the query which i wanted to write.
My problem is I have few bunch of documents which i want to query, few of those documents has field "DueDate" and few of those has "PlannedCompletionDate" but not both exist in a single document. So I want to write a query which should conditionally query for a field from documents and return all documents.
For example below I m proving sample documents of each type and my query should return results from both the documents, I need to write query which should check for field existence and return the document
"_source": {
...
"plannedCompleteDate": "2019-06-30T00:00:00.000Z",
...
}
"_source": {
...
"dueDate": "2019-07-26T07:00:00.000Z",
...
}

You can use range query with the combination of the boolean query to achieve your use case.
Adding a working example with index mapping, data, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"plannedCompleteDate": {
"type": "date",
"format": "yyyy-MM-dd"
},
"dueDate": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
}
}
Index Data:
{
"plannedCompleteDate": "2019-05-30"
}
{
"plannedCompleteDate": "2020-06-30"
}
{
"dueDate": "2020-05-30"
}
Search Query:
{
"query": {
"bool": {
"should": [
{
"range": {
"plannedCompleteDate": {
"gte": "2020-01-01",
"lte": "2020-12-31"
}
}
},
{
"range": {
"dueDate": {
"gte": "2020-01-01",
"lte": "2020-12-31"
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "65808850",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"plannedCompleteDate": "2020-06-30"
}
},
{
"_index": "65808850",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"dueDate": "2020-05-30"
}
}
]

Related

elasticsearch find doc by time with datetime field

I'm trying to retrieve all documents that have a date between 2 dates and a time between 2 hours.
I can't get the query to work.
Is it possible ? If yes, how.
[
{
"_index": "a1",
"_type": "_doc",
"_id": "50c09e31-1fad-4d25-ab9d-35154a1b765b",
"_score": 5.0,
"_source":
{
"start_at": "2022-06-23 14:00",
"end_at": "2022-06-23 14:15",
...
}
},
{
"_index": "a1",
"_type": "_doc",
"_id": "d96ba291-63de-422a-9123-3d1a1d573861",
"_score": 5.0,
"_source":
{
"start_at": "2022-06-24 16:30",
"end_at": "2022-06-24 17:00",
...
}
}
]
GET /a1/_search?pretty
{
"query": {
"bool": {
"must": [
{
"range": {
"start_at": {
"gte": "2022-06-20",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"start_at": {
"lt": "2022-06-27",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"start_at": {
"gte": "14:00",
"format": "HH:mm"
}
}
},
{
"range": {
"start_at": {
"lt": "18:00",
"format": "HH:mm"
}
}
},
]
}
},
"size": 10
}
Thanks.
The immediate solution would be to use a query similar to this one but change the script part to:
doc['start_at'].value.getHourOfDay() ...
Since scripting can be bad for performance, a better solution would be to index the hours into a dedicated field and then perform a range query on it.

How to search over all fields and return every document containing that search in elasticsearch?

I have a problem regarding searching in elasticsearch.
I have a index with multiple documents with several fields. I want to be able to search over all the fields running a query and want it to return all the documents that contains the value specified in the query. I Found that using simple_query_string worked well for this. However, it does not return consistent results. In my index I have documents with several fields that contain dates. For example:
"revisionDate" : "2008-01-01T00:00:00",
"projectSmirCreationDate" : "2008-07-01T00:00:00",
"changedDate" : "1971-01-01T00:00:00",
"dueDate" : "0001-01-01T00:00:00",
Those are just a few examples, however when I index for example:
GET new_document-20_v2/_search
{
"size": 1000,
"query": {
"simple_query_string" : {
"query": "2008"
}
}
}
It only returns two documents, this is a problem because I have much more documents than just two that contains the value "2008" in their fields.
I also have problem searching file names.
In my index there are fields that contain fileNames like this:
"fileName" : "testPDF.pdf",
"fileName" : "demo.pdf",
"fileName" : "demo.txt",
When i query:
GET new_document-20_v2/_search
{
"size": 1000,
"query": {
"simple_query_string" : {
"query": "demo"
}
}
}
I get no results
But if i query:
GET new_document-20_v2/_search
{
"size": 1000,
"query": {
"simple_query_string" : {
"query": "demo.txt"
}
}
}
I get the proper result.
Is there any better way to search across all documents and fields than I did? I want it to return all the document matching the query and not just two or zero.
Any help would be greatly appreciated.
Elasticsearch uses a standard analyzer if no analyzer is specified. Since no analyzer is specified on "fileName", demo.txt gets tokenized to
{
"tokens": [
{
"token": "demo.txt",
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 0
}
]
}
Now when you are searching for demo it will not give any result, but searching for demo.txt will give the result.
You can instead use a wildcard query to search for a document having demo in fileName
{
"query": {
"wildcard": {
"fileName": {
"value": "demo*"
}
}
}
}
Search Result will be
"hits": [
{
"_index": "67303015",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"fileName": "demo.pdf"
}
},
{
"_index": "67303015",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"fileName": "demo.txt"
}
}
]
Since revisionDate, projectSmirCreationDate, changedDate, dueDate are all of type date, so you cannot do a partial search on these dates.
You can use multi-fields, to add one more field (of text type) in the above fields. Modify your index mapping as shown below
{
"mappings": {
"properties": {
"changedDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
},
"projectSmirCreationDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
},
"dueDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
},
"revisionDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
}
}
}
}
Index Data:
{
"revisionDate": "2008-02-01T00:00:00",
"projectSmirCreationDate": "2008-02-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
{
"revisionDate": "2008-01-01T00:00:00",
"projectSmirCreationDate": "2008-07-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
Search Query:
{
"query": {
"multi_match": {
"query": "2008"
}
}
}
Search Result:
"hits": [
{
"_index": "67303015",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"revisionDate": "2008-01-01T00:00:00",
"projectSmirCreationDate": "2008-07-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
},
{
"_index": "67303015",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156,
"_source": {
"revisionDate": "2008-02-01T00:00:00",
"projectSmirCreationDate": "2008-02-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
}
]

Elasticsearch - Trouble querying for exact date with range query

I have the following mapping definition in my events index:
{
"events": {
"mappings": {
"properties": {
"data": {
"properties": {
"reportDate": {
"type": "date",
"format": "M/d/YYYY"
}
}
}
}
}
}
And an example doc:
{
"_index": "events",
"_type": "_doc",
"_id": "12345",
"_version": 1,
"_seq_no": 90,
"_primary_term": 1,
"found": true,
"_source": {
"data": {
"reportDate": "12/4/2018",
}
}
}
My goal is query for docs with an exact data.reportDate of 12/4/2018, but when I run this query:
{
"query": {
"range": {
"data.reportDate": {
"lte": "12/4/2018",
"gte": "12/4/2018",
"format": "M/d/YYYY"
}
}
}
}
I instead get all of the docs that have a data.reportDate that is in the year 2018, not just 12/4/2018. I've tried setting relation to CONTAINS and WITHIN with no luck. Any ideas?
You need to change your date format from M/d/YYYY to M/d/yyyy. Refer to this ES official documentation to know more about date formats. You can even refer to this documentation to know about the difference between yyyy and YYYY
yyyy specifies the calendar year whereas YYYY specifies the year (of
“Week of Year”)
Adding a working example with index mapping, data, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"data": {
"properties": {
"reportDate": {
"type": "date",
"format": "M/d/yyyy"
}
}
}
}
}
}
Index Data:
{
"data": {
"reportDate": "12/3/2018"
}
}
{
"data": {
"reportDate": "12/4/2018"
}
}
{
"data": {
"reportDate": "12/5/2018"
}
}
Search Query:
{
"query": {
"bool": {
"must": {
"range": {
"data.reportDate": {
"lte": "12/4/2018",
"gte": "12/4/2018"
}
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "65312594",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"data": {
"reportDate": "12/4/2018"
}
}
}
]

Elasticsearch Date parsing error in 7.x version

Im using Elasticsearch 7.1 and i have defined the format in my index mappings as below :
"ManufacturerDate": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSS'ZZ'|| yyyy-MM-dd'T'HH:mm:ss.SSS'ZZ'||yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
}
But im getting date parsing error when searching against the date - "2020-07-09T00:12:22.011-00:00". The format yyyy-MM-dd'T'HH:mm:ss.SSSXXX is already defined as one of the accepted formats.
The error is
Failed to parse date field [2020-07-09T00:12:22.011-00:00] with format [yyyy-MM-dd'T'HH:mm:ss.SSS'ZZ'||yyyy-MM-dd'T'HH:mm:ss.SSS'ZZ'||yyyy-MM-dd'T'HH:mm:ss.SSSXXX]:
Can anyone please help?
Adding Working example with mapping and search query.
To know more about the Date data type refer to this documentation.
The search query mentioned below is for finding exact date type values.
To Return documents that contain terms within a provided range refer this
Mapping :
{
"mappings": {
"properties": {
"ManufacturerDate": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSS'ZZ'||yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
}
}
}
}
Search Query:
{
"query": {
"term": {
"ManufacturerDate": {
"value": "2020-07-09T00:12:22.011-00:00"
}
}
}
}'
Search Result:
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"ManufacturerDate": "2020-07-09T00:12:22.011-00:00"
}
}
]
Update 1:
You can even use Constant score query
Search query:
{
"query": {
"constant_score": {
"filter": {
"term": {
"ManufacturerDate": "2020-07-09T00:12:22.011-00:00"
}
},
"boost": 1.2
}
}
}
Search Result:
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 1.2,
"_source": {
"ManufacturerDate": "2020-07-09T00:12:22.011-00:00"
}
}
]
Update 2: By changing the order of patterns the query works (Using ES version 7.2)
Mapping:
{
"mappings": {
"properties": {
"ManufacturerDate": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSSXXX||yyyy-MM-dd'T'HH:mm:ss.SSS'ZZ'||yyyy-MM-dd'T'HH:mm:ss.SSS"
}
}
}
}
Index data:
{
"ManufacturerDate": "2020-07-09T00:12:22.011-00:00"
}
Search Query:
{
"query": {
"constant_score": {
"filter": {
"term": {
"ManufacturerDate": "2020-07-09T00:12:22.011-00:00"
}
},
"boost": 1.2
}
}
}
Search Result :
"hits": [
{
"_index": "my_index5",
"_type": "_doc",
"_id": "1",
"_score": 1.2,
"_source": {
"ManufacturerDate": "2020-07-09T00:12:22.011-00:00"
}
}
]

Elastic search ---: MUST_NOT query not working

I have a query in which i want to add a must_not clause that would discard all records that have blank data for a some field. I tried a lot of ways but none worked. when I issue the same query (mentioned below) with other specific fields then it works fine.
this query should get all records that do not have "registrationType1" field empty/blank
query:
{
"size": 20,
"_source": [
"registrationType1"
],
"query": {
"bool": {
"must_not": [
{
"term": {
"registrationType1": ""
}
}
]
}
}
}
the results below still contains "registrationType1" with empty values
results:
**"_source": {
"registrationType1": ""}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3842002",
"_score": 1,
"_source": {
"registrationType1": "A&R"}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3842033",
"_score": 1,
"_source": {
"registrationType1": "AMHA"}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3842213",
"_score": 1,
"_source": {
"registrationType1": "AMHA"}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3842963",
"_score": 1,
"_source": {
"registrationType1": ""}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3869063",
"_score": 1,
"_source": {
"registrationType1": ""}}**
PFB mappings for the field above
"registrationType1": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
You need to use the keyword subfield in order to do this:
{
"size": 20,
"_source": [
"registrationType1"
],
"query": {
"bool": {
"must_not": [
{
"term": {
"registrationType1.keyword": "" <-- change this
}
}
]
}
}
}
If you do not specify any text value on the text fields, there is basically nothing to analyze and return the documents accordingly.
In similar way, if you remove must_not and replace it with must, it would show empty results.
What you can do is, looking at your mapping, query must_not on keyword field. Keyword fields won't be analysed and in that way your query would return the results as you expect.
Query
POST myemptyindex/_search
{
"query": {
"bool": {
"must_not": [
{
"term": {
"registrationType1.keyword": ""
}
}
]
}
}
}
Hope this helps!
I am using elasticsearch version 7.2,
I replicated your data and ingested in my elastic index,and tried querying with and without .keyword.
I am getting the desired result when using the ".keyword" in the field name.It is not returning the docs which have registrationType1="".
Note - The query does not works when not using the ".keyword"
I have added my sample code below, have a look if that helps.
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.indices.create(index="test", ignore=400, body={
"mappings": {
"_doc": {
"properties": {
"registrationType1": {
"type": "text",
"field": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
})
data = {
"registrationType1": ""
}
es.index(index="test",doc_type="_doc",body=data,id=1)
search = es.search(index="test", body={
"size": 20,
"_source": [
"registrationType1"
],
"query": {
"bool": {
"must_not": [
{
"term": {
"registrationType1.keyword": ""
}
}
]
}
}
})
print(search)
Executing the above should not return any results as we are inserting empty for the field
There was some issue with the mappings itself, I deleted the index and re-indexed it with new mappings and its working now.

Resources