How to index date ranges with ElasticSearch 5.1

I have documents that I want to index/search with ElasticSearch. These documents may contain multiple dates, and in some cases, the dates are actually date ranges. I'm wondering if someone can help me figure out how to write a query that does the right thing (or how to properly index my document so I can query it).
An example is worth a thousand words. Suppose the document contains two marriage date ranges: 2005-05-05 to 2007-07-07 and 2012-12-12 to 2014-03-03.
If I index each date range in start and end date fields, and write a typical range query, then a search for 2008-01-01 will return this record because one marriage will satisfy one of the inequalities and the other will satisfy the other. I don't know how to get ES to keep the two date ranges separate. Obviously, having marriage1 and marriage2 fields would resolve this particular problem, but in my actual data set I have an unbounded number of dates.
I know that ES 5.2 supports the date_range data type, which I believe would resolve this issue, but I'm stuck with 5.1 because I'm using AWS's managed ES.
Thanks in advance.

You can use nested objects for this purpose. Each nested object is indexed as a separate hidden document, so a query inside the nested context is evaluated against one marriage at a time instead of across all of them.
PUT /records
{
  "mappings": {
    "record": {
      "properties": {
        "marriage": {
          "type": "nested",
          "properties": {
            "start": { "type": "date" },
            "end": { "type": "date" },
            "person1": { "type": "text" },
            "person2": { "type": "text" }
          }
        }
      }
    }
  }
}
PUT /records/record/1
{
  "marriage": [
    { "start": "2005-05-05", "end": "2007-07-07", "person1": "", "person2": "" },
    { "start": "2012-12-12", "end": "2014-03-03", "person1": "", "person2": "" }
  ]
}
POST /records/record/_search
{
  "query": {
    "nested": {
      "path": "marriage",
      "query": {
        "range": {
          "marriage.start": { "gte": "2008-01-01", "lte": "2015-02-03" }
        }
      }
    }
  }
}
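The query above matches marriages that start between the two dates. If you instead want marriages whose range contains a given date (the 2008-01-01 case from the question), a minimal sketch is to put two range clauses in a bool inside the nested query; because the nested query is evaluated per marriage, both conditions must hold for the same one:
POST /records/record/_search
{
  "query": {
    "nested": {
      "path": "marriage",
      "query": {
        "bool": {
          "filter": [
            { "range": { "marriage.start": { "lte": "2008-01-01" } } },
            { "range": { "marriage.end": { "gte": "2008-01-01" } } }
          ]
        }
      }
    }
  }
}
For the example document this correctly returns nothing, since neither marriage spans 2008-01-01.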

Related

How to search by non-tokenized field length in ElasticSearch

Say I create an index people which will take entries that have two properties: name and friends
PUT /people
{
  "mappings": {
    "properties": {
      "friends": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
and I put two entries, each of which has two friends.
POST /people/_doc
{
  "name": "Jack",
  "friends": [ "Jill", "John" ]
}
POST /people/_doc
{
  "name": "Max",
  "friends": [ "John", "John" ]
}
(Max has two friends, but both are named John.)
Now I want to search for people that have multiple friends
GET /people/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "script": {
            "script": {
              "source": "doc['friends.keyword'].length > 1"
            }
          }
        }
      ]
    }
  }
}
This only returns Jack and ignores Max. I assume this is because we are actually traversing the inverted index, and "John" and "John" produce only one token ('john'), so the number of tokens is actually 1 here.
Since my index is relatively small and performance is not key, I would like to traverse the _source and not the inverted index:
GET /people/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "script": {
            "script": {
              "source": "ctx._source.friends.length > 1"
            }
          }
        }
      ]
    }
  }
}
But according to https://github.com/elastic/elasticsearch/issues/20068, _source access in scripts is supported only when updating, not when searching, so I cannot.
One obvious solution seems to be to store the length of the field in the index, something like friends_count: 2, and then filter based on that. But that requires reindexing, and it feels like something that should have an obvious solution I am missing.
Thanks a lot.
There is a new feature in ES 7.11 called runtime fields. A runtime field is a field that is evaluated at query time. Runtime fields enable you to:
Add fields to existing documents without reindexing your data
Start working with your data without understanding how it’s structured
Override the value returned from an indexed field at query time
Define fields for a specific use without modifying the underlying schema
You can find more information about runtime fields here. As for how to use them, you can do something like this:
Index Time:
PUT my-index/
{
  "mappings": {
    "runtime": {
      "friends_count": {
        "type": "long",
        "script": {
          "source": "emit(params._source['friends'].size())"
        }
      }
    },
    "properties": {
      "@timestamp": { "type": "date" }
    }
  }
}
You can also define runtime fields at search time; for more information check here.
Search Time
GET my-index/_search
{
  "runtime_mappings": {
    "friends_count": {
      "type": "long",
      "script": {
        "source": "emit(params._source['friends'].size())"
      }
    }
  }
}
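The runtime field can then be referenced like a regular field in the same request. A minimal sketch, filtering on the friends_count field defined above (counting via _source keeps the duplicate "John" entries, so Max matches as well):
GET my-index/_search
{
  "runtime_mappings": {
    "friends_count": {
      "type": "long",
      "script": {
        "source": "emit(params._source['friends'].size())"
      }
    }
  },
  "query": {
    "range": {
      "friends_count": { "gt": 1 }
    }
  }
}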
Update:
POST my-index/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": "ctx._source.arrayLength = ctx._source.friends.size()"
  }
}
You can update all of your documents with the query above and then filter on the new field in your search.
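A minimal sketch of such a filter, assuming the arrayLength field written by the update above:
GET my-index/_search
{
  "query": {
    "range": {
      "arrayLength": { "gt": 1 }
    }
  }
}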
For everyone wondering about the same issue, I think @Kaveh's answer is the most likely way to go, but I did not manage to make it work in my case. It seems to me that _source is only materialized after the query has been performed, and therefore you cannot access _source for the purposes of a filtering query.
This leaves you with two options:
filter the results at the application level (an ugly and slow solution)
actually save the field length in a separate field, such as friends_count
possibly there is another option I don't know about(?).

How to use filter in ElasticSearch?

I'm trying to implement a filter using Elasticsearch. I simply want to implement a range filter. I have the following data:
{
  "result": [
    {
      "Id": "144039",
      "posted_dt": 1506951883637,
      "submit_dt": 1507609800000,
      "title": "Request for Information (RFI) # 306-18-0018",
      "fname": "RODRI",
      "email": "",
      "desc": "dummy Text"
    }
  ]
}
I want to get data from the last 3 or 5 days. I'm using this:
query = {
  "bool": {
    "must": [
      {
        "range": {
          "posted_dt": {
            "gte": "now-3d/d",
            "lt": "now/d"
          }
        }
      }
    ]
  }
}
My mapping for posted_dt is:
"posted_dt": {
  "type": "long"
},
I did try the filter as well but didn't succeed.
Please help.
Thanks
Randheer
Your mapping of the "posted_dt" field is incorrect. You intend to store a date as epoch millis, but you have mapped it as long, so date math expressions like now-3d/d in a range filter won't work against it. Update the "posted_dt" field's mapping like:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "posted_dt": {
          "type": "date",
          "format": "epoch_millis"
        }
      }
    }
  }
}
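Note that the type of an existing field cannot be changed in place; you would create a new index with the corrected mapping and copy the data over, roughly like this (index names are illustrative):
POST _reindex
{
  "source": { "index": "my_old_index" },
  "dest": { "index": "my_index" }
}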
Refer to the Date datatype in Elasticsearch.
First you need to share your mapping. Make sure that posted_dt and submit_dt are defined as date in your mapping; here you are using a long, which is the wrong type for dealing with dates.
A side note is that you should use filter instead of must in your case; filter clauses skip scoring and can be cached, so it will be faster IMO.
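For example, the same range in filter context (a sketch, assuming posted_dt has been remapped as a date as shown above):
query = {
  "bool": {
    "filter": [
      {
        "range": {
          "posted_dt": {
            "gte": "now-3d/d",
            "lt": "now/d"
          }
        }
      }
    ]
  }
}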

Multiple Paths in Nested Queries

I'm cross-posting this from the elasticsearch forums (https://discuss.elastic.co/t/multiple-paths-in-nested-query/96851/1)
Below is an example, but first I'll tell you about my use case, because I'm not sure if this is a good approach. I'm trying to automatically index a large collection of typed data. What this means is I'm trying to generate mappings, and queries on those mappings, all automatically based on information about my data. A lot of my data is relational, and I'm interested in being able to search across the relations, thus I'm also interested in using nested data types.
However, the issue is that many of these types have on the order of 10 relations, and I've got a feeling it's not a good idea to pass 10 identical copies of a nested query to Elasticsearch just to query 10 different nested paths the same way. Thus, I'm wondering if it's possible to instead pass multiple paths into a single query? Better yet, is it possible to search over all fields in the current document, and in all its nested documents and their fields, in a single query? I'm aware of object fields, and they're not a good fit because I want to retrieve some data from matched nested documents.
In this example, I create an index with multiple nested types and some top-level fields of its own, upload a document, and attempt to query the document and all its nested documents, but fail. Is there some way to do this without duplicating the query for each nested document, or is that actually a performant way to do this? Thanks
PUT /my_index
{
  "mappings": {
    "type1": {
      "properties": {
        "obj1": {
          "type": "nested",
          "properties": {
            "name": { "type": "text" },
            "number": { "type": "text" }
          }
        },
        "obj2": {
          "type": "nested",
          "properties": {
            "color": { "type": "text" },
            "food": { "type": "text" }
          }
        },
        "lul": { "type": "text" },
        "pucci": { "type": "text" }
      }
    }
  }
}
PUT /my_index/type1/1
{
  "obj1": [
    { "name": "liar", "number": "deer dog" },
    { "name": "one two three", "number": "you can call on me" },
    { "name": "ricky gervais", "number": "user 123" }
  ],
  "obj2": [
    { "color": "red green blue", "food": "meatball and spaghetti" },
    { "color": "orange", "food": "pineapple, fish, goat" },
    { "color": "none", "food": "none" }
  ],
  "lul": "lul its me user123",
  "field": "one dog"
}
POST /my_index/_search
{
  "query": {
    "nested": {
      "path": ["obj1", "obj2"],
      "query": {
        "query_string": {
          "query": "ricky",
          "all_fields": true
        }
      }
    }
  }
}
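For reference, the workaround the question hopes to avoid repeats the nested query once per path under a bool should; with 10 relations this means 10 nearly identical copies. A sketch:
POST /my_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "obj1",
            "query": { "query_string": { "query": "ricky", "all_fields": true } }
          }
        },
        {
          "nested": {
            "path": "obj2",
            "query": { "query_string": { "query": "ricky", "all_fields": true } }
          }
        }
      ]
    }
  }
}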

How to specify or target a field from a specific document type in queries or filters in Elasticsearch?

Given:
Documents of two different types, let's say 'product' and 'category', are indexed to the same Elasticsearch index.
Both document types have a field 'tags'.
Problem:
I want to build a query that returns results of both types, but the documents of type 'product' are allowed to have tags 'X' and 'Y', and the documents of type 'category' are only allowed to have tag 'Z'. How can I achieve this? It appears I can't use product.tags and category.tags, since ES would then look for a product/category field on the documents themselves, which is not what I intend.
Note:
While for the example above there might be some kind of workaround, I'm looking for a general way to target or specify fields of a specific document type when writing queries. I basically want to 'namespace' the field names used in my query so only documents of the type I want to work with are considered.
I think field aliasing would be the best answer for you, but it's not possible.
Instead you can use "copy_to", though it probably increases index size:
DELETE /test
PUT /test
{
  "mappings": {
    "product": {
      "properties": {
        "tags": { "type": "string", "copy_to": "ptags" },
        "ptags": { "type": "string" }
      }
    },
    "category": {
      "properties": {
        "tags": { "type": "string", "copy_to": "ctags" },
        "ctags": { "type": "string" }
      }
    }
  }
}
PUT /test/product/1
{ "tags":"X" }
PUT /test/product/2
{ "tags":"Y" }
PUT /test/category/1
{ "tags":"Z" }
And you can query one of the fields, or several of them:
GET /test/product,category/_search
{
  "query": {
    "term": {
      "ptags": {
        "value": "x"
      }
    }
  }
}
GET /test/product,category/_search
{
  "query": {
    "multi_match": {
      "query": "x",
      "fields": [ "ctags", "ptags" ]
    }
  }
}
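With the copied fields in place, the original constraint (products may have tags X or Y, categories only Z) can be expressed in a single query; a sketch (lowercase values because the tags fields are analyzed):
GET /test/product,category/_search
{
  "query": {
    "bool": {
      "should": [
        { "terms": { "ptags": [ "x", "y" ] } },
        { "term": { "ctags": "z" } }
      ],
      "minimum_should_match": 1
    }
  }
}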

Aggregation over "LastUpdated" property or _timestamp

My Elasticsearch mapping looks like roughly like this:
{
  "myIndex": {
    "mappings": {
      "myType": {
        "_timestamp": {
          "enabled": true,
          "store": true
        },
        "properties": {
          "LastUpdated": {
            "type": "date",
            "format": "dateOptionalTime"
          }
          /* lots of other properties */
        }
      }
    }
  }
}
So, _timestamp is enabled, and there's also a LastUpdated property on every document. LastUpdated can have a different value than _timestamp: sometimes, documents get updated physically (e.g. updates to denormalized data), which updates _timestamp, but LastUpdated remains unchanged because the document hasn't actually been "updated" from a business perspective.
Also, there are many documents without a LastUpdated value (mostly old data).
What I'd like to do is run an aggregation which counts the number of documents per calendar day (kindly ignore the fact that the dates need to be midnight-aligned, please). For every document, use LastUpdated if it's there, otherwise use _timestamp.
Here's what I've tried:
{
  "aggregations": {
    "counts": {
      "terms": {
        "script": "doc.LastUpdated == empty ? doc._timestamp : doc.LastUpdated"
      }
    }
  }
}
The bucketization appears to work to some extent, but the keys in the result look weird:
buckets: [
  {
    key: org.elasticsearch.index.fielddata.ScriptDocValues$Longs@7ba1f463,
    doc_count: 300544
  },
  {
    key: org.elasticsearch.index.fielddata.ScriptDocValues$Longs@5a298acb,
    doc_count: 257222
  },
  {
    key: org.elasticsearch.index.fielddata.ScriptDocValues$Longs@6e451b5e,
    doc_count: 101117
  },
  ...
]
What's the proper way to run this aggregation and get meaningful keys (i.e. timestamps) in the result?
I've tested this and put together a Groovy script for you:
POST index/type/_search
{
  "aggs": {
    "counts": {
      "terms": {
        "script": "ts=doc['_timestamp'].getValue();v=doc['LastUpdated'].getValue();rv=v?:ts;rv",
        "lang": "groovy"
      }
    }
  }
}
This returns the required result.
Hope this helps!! Thanks!!
