Elasticsearch Date Range Query

I am new to Elasticsearch and I am struggling with date range queries. I have to query the records which fall between particular dates. The JSON records pushed into the Elasticsearch database look like this:
"messageid": "Some message id",
"subject": "subject",
"emaildate": "2020-01-01 21:09:24",
"starttime": "2020-01-02 12:30:00",
"endtime": "2020-01-02 13:00:00",
"meetinglocation": "some location",
"duration": "00:30:00",
"employeename": "Name",
"emailid": "abc#xyz.com",
"employeecode": "141479",
"username": "username",
"organizer": "Some name",
"organizer_email": "cde#xyz.com",
I have to query the records whose start time falls between "2020-01-02 12:30:00" and "2020-01-10 12:30:00". I have written a query like this:
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "starttime": {
              "gte": "2020-01-02 12:30:00",
              "lte": "2020-01-10 12:30:00"
            }
          }
        }
      ]
    }
  }
}
This query is not giving the results I expect. I assume that the person who pushed the data into the Elasticsearch database at my office did not set an explicit mapping, so Elasticsearch dynamically mapped the data type of "starttime" as "text". Hence I am getting inconsistent results.
I can set the mapping like this:
PUT /meetings
{
  "mappings": {
    "dynamic": false,
    "properties": {
      ...
      "starttime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
      ...
    }
  }
}
and the query would work, but I am not allowed to do so (office policies). What alternatives do I have to achieve my task?
Update:
I assumed the data type to be "text", but by default Elasticsearch applies both "text" and "keyword" so that both full-text and keyword-based searches can be run. If it is also set as "keyword", will this benefit me in any way? I do not have access to a lot of things at the office, which is why I am unable to debug the query. I only have the search API, for which I have to build the query.
GET /meetings/_mapping output:
...
"starttime": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
...
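One note on that output: because the "yyyy-MM-dd HH:mm:ss" format is zero-padded, lexicographic order on the keyword sub-field coincides with chronological order, so a range filter on starttime.keyword may work as a stop-gap (a sketch, not a substitute for a proper date mapping):
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "starttime.keyword": {
              "gte": "2020-01-02 12:30:00",
              "lte": "2020-01-10 12:30:00"
            }
          }
        }
      ]
    }
  }
}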

Date range queries will not work on a text field; for that, you have to use a date field.
Since you are working with dates, best practice is to use the date field type.
I would suggest reindexing your index into another index so that you can change the type of your text field to date.
Step 1: Create index2 using index1's mapping, and make sure to change the type of your date field from text to date.
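For example, a minimal index2 creation might look like this (a sketch showing only the starttime change, using the date format from the question; the remaining fields would be carried over from index1's mapping as-is):
PUT /index2
{
  "mappings": {
    "properties": {
      "starttime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}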
Step 2: Run the Elasticsearch reindex API to copy all your data from index1 to index2. Since you have changed the field type to date, Elasticsearch will now recognize this field as a date:
POST _reindex
{
  "source": { "index": "index1" },
  "dest": { "index": "index2" }
}
Now you can run your normal date queries on index2.
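For example, the range filter from the question should now behave as expected when run against index2:
GET /index2/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "starttime": {
              "gte": "2020-01-02 12:30:00",
              "lte": "2020-01-10 12:30:00"
            }
          }
        }
      ]
    }
  }
}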

As @jzzfs suggested, the idea is to add a date sub-field to the starttime field. You first need to modify the mapping like this:
PUT meetings/_mapping
{
  "properties": {
    "starttime": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        },
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    }
  }
}
When done, you need to reindex your data using the update by query API so that the starttime.date field gets populated and indexed:
POST meetings/_update_by_query
When the update is done, you'll be able to leverage the starttime.date sub-field in your query:
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "starttime.date": {
              "gte": "2020-01-02 12:30:00",
              "lte": "2020-01-10 12:30:00"
            }
          }
        }
      ]
    }
  }
}

There are ways of parsing text fields as dates at search time but the overhead is impractical... You could, however, keep the starttime as text by default but make it a multi-field and query it using starttime.as_date, for example.
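A sketch of what such a multi-field could look like (the as_date sub-field name is illustrative, and the existing keyword sub-field is omitted for brevity); as above, an update by query would be needed to populate it:
PUT meetings/_mapping
{
  "properties": {
    "starttime": {
      "type": "text",
      "fields": {
        "as_date": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    }
  }
}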

Related

Can we query on a field if its mapping is not defined in ES?

Is it possible to query on a field which is not mapped, with order?
Using Elasticsearch 7.4, I've created an index with only one mapped field:
Index name: test_date_mapping_with_null
Dynamic mapping: false
Properties: city -> text
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "city": { "type": "text" }
    }
  }
}
Inserting documents with a published_at field:
POST test_date_mapping_with_null/_doc/1
{
  "city": "NY",
  "published_at": "2022-01-01T06:58:27.000Z"
}
POST test_date_mapping_with_null/_doc/2
{
  "city": "Paris",
  "published_at": "2022-01-02T06:58:27.000Z"
}
POST test_date_mapping_with_null/_doc/3
{
  "city": "Mumbai",
  "published_at": "2022-01-03T06:58:27.000Z"
}
POST test_date_mapping_with_null/_doc/4
{
  "city": "Tokyo",
  "published_at": "2022-01-04T06:58:27.000Z"
}
The mapping looks like this:
"mappings": {
  "_doc": {
    "dynamic": "false",
    "properties": {
      "city": {
        "type": "text"
      }
    }
  }
}
Now, upon running a search query:
GET test_date_mapping_with_null/_search
{
  "query": {
    "range": {
      "published_at": {
        "gte": "2022-01-01T00:58:27.000Z",
        "lte": "2022-01-03T23:58:27.000Z",
        "boost": 2.0
      }
    }
  }
}
Actual: ES returns all the docs.
Expected: ES should return only docs 1, 2 and 3 (i.e. the NY, Paris and Mumbai docs).
Your index mapping currently only includes a mapping for the city field; it does not have a mapping for the published_at field, because you have set "dynamic": "false" in your index mapping.
This means that published_at is stored in Elasticsearch, but the field is not indexed. In simple terms, you cannot perform any search on the published_at field.
No, you can't query fields that are not indexed in Elasticsearch (since you defined dynamic: false, they won't be indexed). You can, however, see them as part of _source when you fetch a document using _search or by document id.
If you want to query the field, either change the mapping from dynamic: false to dynamic: true, or add the field explicitly to the mapping (if you want to keep dynamic: false).
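A minimal sketch of the second option, keeping dynamic: false but adding published_at explicitly, followed by an update by query so that existing documents get indexed with the new field:
PUT test_date_mapping_with_null/_mapping
{
  "properties": {
    "published_at": { "type": "date" }
  }
}
POST test_date_mapping_with_null/_update_by_query
After this, the range query from the question should return only docs 1, 2 and 3.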
You can't query fields which are not specified in the mapping when dynamic is set to false. You can only store those fields in _source.
https://www.elastic.co/guide/en/elasticsearch/reference/7.4/dynamic.html

Partial search on date fields in Elasticsearch

I'm trying to implement partial search on a date field in Elasticsearch. For example, if startDate is stored as "2019-08-28", I should be able to retrieve it by querying just "2019", "2019-08" or "2019-0".
For other fields I'm doing this:
{
  "simple_query_string": {
    "fields": [
      "customer"
    ],
    "query": "* Andrew *",
    "analyze_wildcard": "true",
    "default_operator": "AND"
  }
}
which perfectly works on text fields, but the same doesn't work on date fields.
This is the mapping:
{"mappings":{"properties":{"startDate":{"type":"date"}}}}
Is there any way this can be achieved, be it a change in mapping or another query method? I also found this discussion related to partial dates in Elasticsearch; not sure how relevant it is, but here it is:
https://github.com/elastic/elasticsearch/issues/45284
Excerpt from the ES docs:
Internally, dates are converted to UTC (if the time-zone is specified) and stored as a long number representing milliseconds-since-the-epoch.
So it is not possible to search a date field the way we can a text field. However, we can tell ES to index the date field as both date and text, e.g.:
Index the date field as a multi-field:
PUT dates
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "year_month_day",   // i.e. yyyy-MM-dd
        "fields": {
          "formatted": {
            "type": "text",           // a second representation, of type TEXT, accessed as date.formatted
            "analyzer": "whitespace"  // whitespace analyzer (the standard analyzer would tokenize 2020-01-01 into 2020, 01 and 01)
          }
        }
      }
    }
  }
}
POST dates/_doc
{
  "date": "2020-01-01"
}
POST dates/_doc
{
  "date": "2019-01-01"
}
Use a wildcard query to search (you can even use n-grams at indexing time for faster search if required):
GET dates/_search
{
  "query": {
    "wildcard": {
      "date.formatted": {
        "value": "2020-0*"
      }
    }
  }
}
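If wildcard queries prove too slow, the n-gram idea could look like the following sketch: an edge_ngram tokenizer on the formatted sub-field so that date prefixes are indexed up front, and a plain match query at search time. (The index, tokenizer and analyzer names here are made up for illustration; prefixes shorter than min_gram would not match.)
PUT dates_ngram
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "date_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 4,
          "max_gram": 10,
          "token_chars": ["digit", "punctuation"]
        }
      },
      "analyzer": {
        "date_prefix": {
          "type": "custom",
          "tokenizer": "date_edge_ngram"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "year_month_day",
        "fields": {
          "formatted": {
            "type": "text",
            "analyzer": "date_prefix",       // index all prefixes of the date string
            "search_analyzer": "keyword"     // keep the query string as a single token
          }
        }
      }
    }
  }
}
GET dates_ngram/_search
{
  "query": {
    "match": {
      "date.formatted": "2020-0"
    }
  }
}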

How to use a filter in Elasticsearch?

I'm trying to implement a filter using Elasticsearch. I simply want to implement a range filter. I have the following data:
{
  "result": [
    {
      "Id": "144039",
      "posted_dt": 1506951883637,
      "submit_dt": 1507609800000,
      "title": "Request for Information (RFI) # 306-18-0018",
      "fname": "RODRI",
      "email": "",
      "desc": "dummy Text"
    }
  ]
}
I want to get the data from the last 3 or 5 days. I'm using this:
query = {
  "bool": {
    "must": [
      {
        "range": {
          "posted_dt": {
            "gte": "now-3d/d",
            "lt": "now/d"
          }
        }
      }
    ]
  }
}
My mapping for posted_dt is:
"posted_dt": {
  "type": "long"
},
I did try the filter as well but didn't succeed.
Please help.
Thanks,
Randheer
Your mapping of the "posted_dt" field is incorrect. You intend to store a date as epoch milliseconds, but you are storing it as the long type, so the date range filter won't work on it. Update the "posted_dt" field's mapping like this:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "posted_dt": {
          "type": "date",
          "format": "epoch_millis"
        }
      }
    }
  }
}
Refer to the date datatype in Elasticsearch.
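With that mapping in place, the date-math range from the question should work as intended, e.g.:
GET my_index/_search
{
  "query": {
    "range": {
      "posted_dt": {
        "gte": "now-3d/d",
        "lt": "now/d"
      }
    }
  }
}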
First, you need to share your mapping. Make sure that posted_dt and submit_dt are defined as date in your mapping; here you are using long, which is incorrect for dealing with dates.
A side note: you should use filter instead of must in your case. It will be faster, IMO.
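That is, something like this (the question's range clause moved from must into filter):
{
  "bool": {
    "filter": [
      {
        "range": {
          "posted_dt": {
            "gte": "now-3d/d",
            "lt": "now/d"
          }
        }
      }
    ]
  }
}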

Replacing (Bulk Update) Nested documents in ElasticSearch

I have an Elasticsearch index with vacation rentals (100K+), each including a property with nested documents for availability dates (1000+ per 'parent' document). Periodically (several times daily), I need to replace the entire set of nested documents for each property (to have fresh availability data per vacation rental property); however, Elasticsearch's default behavior is to merge nested documents.
Here is a snippet of the mapping (availability dates in the "bookingInfo"):
{
  "vacation-rental-properties": {
    "mappings": {
      "property": {
        "dynamic": "false",
        "properties": {
          "bookingInfo": {
            "type": "nested",
            "properties": {
              "avail": { "type": "integer" },
              "datum": { "type": "date", "format": "dateOptionalTime" },
              "in":    { "type": "boolean" },
              "min":   { "type": "integer" },
              "out":   { "type": "boolean" },
              "u":     { "type": "integer" }
            }
          }
          // this part left out
        }
      }
    }
  }
}
Unfortunately, our current underlying business logic does not allow us to replace or update parts of the "bookingInfo" nested documents; we need to replace the entire array of nested documents. With the default behavior, updating the 'parent' doc merely adds new nested docs to "bookingInfo" (unless they already exist, in which case they're updated), leaving the index with a lot of old dates that should no longer be there (if they're in the past, they're not bookable anyway).
How do I go about making the update call to ES?
Currently using a bulk call such as (two lines for each doc):
{ "update" : {"_id" : "abcd1234", "_type" : "property", "_index" : "vacation-rental-properties"} }
{ "doc" : {"bookingInfo" : ["all of the documents here"]} }
I have found this question that seems related, and wonder if the following will work (first enabling scripts via script.inline: on in the config file for version 1.6+):
curl -XPOST localhost:9200/the-index-and-property-here/_update -d '{
  "script": "ctx._source.bookingInfo = updated_bookingInfo",
  "params": {
    "updated_bookingInfo": {"field": "bookingInfo"}
  }
}'
How do I translate that to a bulk call for the above?
Using ElasticSearch 1.7, this is the way I solved it. I hope it can be of help to someone, as a future reference.
{ "update": { "_id": "abcd1234", "_retry_on_conflict" : 3} }\n
{ "script" : { "inline": "ctx._source.bookingInfo = param1", "lang" : "js", "params" : {"param1" : ["All of the nested docs here"]}}\n
...and so on for each entry in the bulk update call.
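Assembled into a single request, the bulk call might look like this (a sketch; the second _id is made up for illustration):
POST /vacation-rental-properties/property/_bulk
{ "update": { "_id": "abcd1234", "_retry_on_conflict": 3 } }
{ "script": { "inline": "ctx._source.bookingInfo = param1", "lang": "js", "params": { "param1": ["all of this property's nested docs"] } } }
{ "update": { "_id": "efgh5678", "_retry_on_conflict": 3 } }
{ "script": { "inline": "ctx._source.bookingInfo = param1", "lang": "js", "params": { "param1": ["the next property's nested docs"] } } }
Because the script assigns the whole bookingInfo array rather than merging into it, the old nested docs are replaced outright.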

Elasticsearch mapping - different data types in same field

I am trying to create a mapping that will allow me to have a document looking like this:
{
  "created_at": "2014-11-13T07:51:17+0000",
  "updated_at": "2014-11-14T12:31:17+0000",
  "account_id": 42,
  "attributes": [
    {
      "name": "firstname",
      "value": "Morten",
      "field_type": "string"
    },
    {
      "name": "lastname",
      "value": "Hauberg",
      "field_type": "string"
    },
    {
      "name": "dob",
      "value": "1987-02-17T00:00:00+0000",
      "field_type": "datetime"
    }
  ]
}
The attributes array must be of type nested, and dynamic, so I can add more objects to the array and index them by the field_type value.
Is this even possible?
I have been looking at dynamic_templates. Can I use those?
You actually can index multiple datatypes into the same field using a multi-field mapping and the ignore_malformed parameter, provided you are willing to query a type-specific sub-field when you want to do type-specific queries (like comparisons).
This will allow elasticsearch to populate the fields that are pertinent for each input, and ignore the others. It also means you don’t need to do anything in your indexing code to deal with the different types.
For example, for a field called user_input that you want to be able to do date or integer range queries over if that is what the user has entered, or a regular text search if the user has entered a string, you could do something like the following:
PUT multiple_datatypes
{
  "mappings": {
    "_doc": {
      "properties": {
        "user_input": {
          "type": "text",
          "fields": {
            "numeric": {
              "type": "double",
              "ignore_malformed": true
            },
            "date": {
              "type": "date",
              "ignore_malformed": true
            }
          }
        }
      }
    }
  }
}
We can then add a few documents with different user inputs:
PUT multiple_datatypes/_doc/1
{
  "user_input": "hello"
}
PUT multiple_datatypes/_doc/2
{
  "user_input": "2017-02-12"
}
PUT multiple_datatypes/_doc/3
{
  "user_input": 5
}
And when you search for these, ranges and other type-specific queries work as expected:
// Returns only document 2
GET multiple_datatypes/_search
{
  "query": {
    "range": {
      "user_input.date": {
        "gte": "2017-01-01"
      }
    }
  }
}
// Returns only document 3
GET multiple_datatypes/_search
{
  "query": {
    "range": {
      "user_input.numeric": {
        "lte": 9
      }
    }
  }
}
// Returns only document 1
GET multiple_datatypes/_search
{
  "query": {
    "term": {
      "user_input": {
        "value": "hello"
      }
    }
  }
}
No - you cannot have different datatypes for the same field within the same type.
E.g. the field index/type/value cannot be both a string and a date.
A dynamic template can be used to set the datatype and analyzer based on the format of the field name.
For example: set all fields with field names ending in "_dt" to type date.
But this won't help in your scenario: once the datatype is set, you can't change it.
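For illustration, such a dynamic template might look like this (a sketch in modern typeless syntax; the template name is arbitrary):
PUT my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "dates_by_suffix": {
          "match": "*_dt",
          "mapping": { "type": "date" }
        }
      }
    ]
  }
}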
