Elasticsearch: connect range and term to the same array item

I have a user document with a field called experiences which is an array of objects, like:
{
"experiences": [
{
"end_date": "2017-03-02",
"is_valid": false
},
{
"end_date": "2015-03-02",
"is_valid": true
}
]
}
With this document I have to find users that have an experience whose end_date falls within the last year and whose is_valid is true.
At the moment I have a bool query with two must clauses: a range on end_date and a term on is_valid.
{
"query": {
"bool": {
"must": [
{
"term": {
"experiences.is_valid": true
}
},
{
"range": {
"experiences.end_date": {
"gte": "now-1y",
"lte": "now"
}
}
}
]
}
}
}
The result is that this user is selected, because he has one experience with an end_date in the last year (the first one) and another experience with is_valid true.
Of course this is not what I need: end_date and is_valid must refer to the same object. How can we do this in Elasticsearch?
Mapping:
"experiences": {
"properties": {
"comment": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"end_date": {
"type": "date"
},
"id": {
"type": "long"
},
"is_valid": {
"type": "boolean"
},
"start_date": {
"type": "date"
}
}
}

You need to change the type of experiences to the nested data type.
Then use a nested query:
{
"query": {
"nested": {
"path": "experiences",
"query": {
"bool": {
"must": [
{
"term": {
"experiences.is_valid": true
}
},
{
"range": {
"experiences.end_date": {
"gte": "now-1y",
"lte": "now"
}
}
}
]
}
}
}
}
}
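This is needed because of the way arrays of objects are flattened in Elasticsearch: with the default object mapping, the values of all array items are merged into one multi-valued field per property, so the link between the end_date and the is_valid of the same item is lost.
See the Elasticsearch documentation on the nested field type for more.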
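To make the flattening concrete, here is a minimal Python sketch that only imitates the two matching behaviours in memory; it does not talk to Elasticsearch. The helper names (flatten, matches_flattened, matches_nested) are made up for illustration, and a fixed date window stands in for now-1y .. now so the outcome is reproducible.

```python
from datetime import date

# Stand-in for the user document from the question.
user = {
    "experiences": [
        {"end_date": "2017-03-02", "is_valid": False},
        {"end_date": "2015-03-02", "is_valid": True},
    ]
}


def flatten(doc):
    """Imitate the default object mapping: one multi-valued field per property."""
    flat = {}
    for experience in doc["experiences"]:
        for key, value in experience.items():
            flat.setdefault(f"experiences.{key}", []).append(value)
    return flat


def in_range(value, start, end):
    return start <= date.fromisoformat(value) <= end


def matches_flattened(doc, start, end):
    """term + range evaluated against the merged value lists."""
    flat = flatten(doc)
    has_valid = True in flat["experiences.is_valid"]
    has_recent = any(in_range(d, start, end) for d in flat["experiences.end_date"])
    return has_valid and has_recent


def matches_nested(doc, start, end):
    """term + range evaluated per array item, as a nested query does."""
    return any(
        exp["is_valid"] and in_range(exp["end_date"], start, end)
        for exp in doc["experiences"]
    )


# Assumed window, chosen so that only the first experience falls inside it.
start, end = date(2016, 6, 1), date(2017, 6, 1)
print(matches_flattened(user, start, end))  # True: the conditions hit different items
print(matches_nested(user, start, end))     # False: no single item satisfies both
```

The flattened check matches because each condition is satisfied by some item, which is the unwanted behaviour described in the question; the per-item check does not.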

Related

Query hashmap structure with elasticsearch

I have two questions regarding mapping and querying a Java HashMap in Elasticsearch.
Does this mapping make sense in Elasticsearch (is it the correct way to map a HashMap)?
{
"properties": {
"itemsMap": {
"type": "nested",
"properties": {
"key": {
"type": "date",
"format": "yyyy-MM-dd"
},
"value": {
"type": "nested",
"properties": {
"itemVal1": {
"type": "double"
},
"itemVal2": {
"type": "double"
}
}
}
}
}
}
}
Here is some example data:
{
"itemsMap": {
"2021-12-31": {
"itemVal1": 100.0,
"itemVal2": 150.0
},
"2021-11-30": {
"itemVal1": 200.0,
"itemVal2": 50.0
}
}
}
My queries don't seem to work. For example:
{
"query": {
"nested": {
"path": "itemsMap",
"query": {
"bool": {
"must": [
{
"match": {
"itemsMap.key": "2021-11-30"
}
}
]
}
}
}
}
}
Am I doing something wrong? How can I query such a structure? I have the possibility to change the mapping if it's necessary.
Thanks
TL;DR
With the way you are uploading your data, nothing is stored in key.
You end up with fields named 2021-11-30, 2021-12-31, ... and key stays empty.
This format is only viable if the number of distinct dates is limited (fewer than 1000, the default limit on the number of fields in an index); otherwise it is not sustainable in the long run.
If you don't want to change your docs, here is the query:
GET /71525899/_search
{
"query": {
"nested": {
"path": "itemsMap",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "itemsMap.2021-12-31"
}
}
]
}
}
}
}
}
To understand
Inspect the mapping by querying the index:
GET /<index_name>/_mapping
You will see that the number of fields named after your dates keeps growing.
In all of your docs, itemsMap.key is empty (this explains why my previous answer did not work).
A more viable option
Keep your mapping and change the shape of your docs.
They will look like this:
{
"itemsMap": [
{
"key": "2021-12-31",
"value": { "itemVal1": 100, "itemVal2": 150 }
},
{
"key": "2021-11-30",
"value": { "itemVal1": 200, "itemVal2": 50 }
}
]
}
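If the documents are produced from a map on the client side, the reshaping can be done before indexing. The following is a rough Python sketch under that assumption; map_to_entries is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json


def map_to_entries(items_map):
    """Turn {"<date>": {...}} into [{"key": "<date>", "value": {...}}, ...]."""
    return [{"key": key, "value": value} for key, value in items_map.items()]


# The example data from the question.
original = {
    "itemsMap": {
        "2021-12-31": {"itemVal1": 100.0, "itemVal2": 150.0},
        "2021-11-30": {"itemVal1": 200.0, "itemVal2": 50.0},
    }
}

reshaped = {"itemsMap": map_to_entries(original["itemsMap"])}
print(json.dumps(reshaped, indent=2))
```

Each date now lands in the key field of its own nested object instead of becoming a new field name in the mapping.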
DELETE /71525899
PUT /71525899/
{
"mappings": {
"properties": {
"itemsMap": {
"type": "nested",
"properties": {
"key": {
"type": "date",
"format": "yyyy-MM-dd"
},
"value": {
"type": "nested",
"properties": {
"itemVal1": {
"type": "double"
},
"itemVal2": {
"type": "double"
}
}
}
}
}
}
}
}
POST /_bulk
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2021-12-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2022-12-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2021-10-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
GET /71525899/_search
{
"query": {
"nested": {
"path": "itemsMap",
"query": {
"bool": {
"must": [
{
"match": {
"itemsMap.key": "2021-12-31"
}
}
]
}
}
}
}
}

ElasticSearch: Query to get a result only when all dates of a nested field are less than the present date

Mapping:
{"mappings": {
"supply": {
"properties": {
"rotation_list": {
"type": "nested",
"properties": {
"project_end_date": {
"type": "nested",
"properties": {
"end_date": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
},
"total_days": {
"type": "integer"
}
}
}
}
}}}
Below is the data
{"rotation_list": [
{
"project_end_date": [
{
"end_date": "2020-07-07"
},
{
"end_date": "2020-08-07"
}
],
"total_days": 25
},
{
"project_end_date": [
{
"end_date": "2020-08-07"
}
],
"total_days": 26
}]}
Query:
{"query": {
"bool": {
"filter": [
{
"nested": {
"path": "rotation_list.project_end_date",
"query": {
"range": {
"rotation_list.project_end_date.end_date": {
"lt": "now"
}
}
}
}
}
]
}}}
I only want a result if all the end_date values in a project_end_date section are earlier than now. In this example there are two project_end_date sections: the first has two end dates and the second has one. The check should be made per project_end_date section. Can anyone help me with this?
If you see the below section:
"project_end_date": [
{
"end_date": "2020-07-07"
},
{
"end_date": "2020-08-07"
}
]
If I compare today's date ("2020-07-08") with the dates above, one of them is not earlier than today, so the document should not be returned.

Need help combining wildcard search with range query within elasticsearch?

I am trying to combine a wildcard with a date range in an Elasticsearch query, but the response does not match what I expect from the wildcard search: it includes items that have an incorrect date range.
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"wildcard": {
"hostName": "*abc*"
}
},
{
"range": {
"requestDate": {
"gte": "2019-10-01T08:00:00.000Z"
}
}
}
]
}
}
]
}
}
}
The index mapping looks as below:
{
"index_history": {
"mappings": {
"applications_datalake": {
"properties": {
"query": {
"properties": {
"term": {
"properties": {
"server": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
},
"index-data-type": {
"properties": {
"attributes": {
"properties": {
"wwnListForServer": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"hostName": {
"type": "keyword"
},
"requestDate": {
"type": "date"
},
"requestedBy": {
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "keyword"
}
}
}
}
}
}
}
}
You are missing the minimum_should_match parameter.
See:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
I think your query should look like this:
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"wildcard": {
"hostName": "*abc*"
}
},
{
"range": {
"requestDate": {
"gte": "2019-10-01T08:00:00.000Z"
}
}
}
],
"minimum_should_match" : 2
}
}
]
}
}
}
From the documentation :
You can use the minimum_should_match parameter to specify the number
or percentage of should clauses returned documents must match.
If the bool query includes at least one should clause and no must or
filter clauses, the default value is 1. Otherwise, the default value
is 0.
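The quoted rule can be written out as a small Python sketch. It models only the counting logic described in the quote, not Elasticsearch itself, and both function names are made up for illustration.

```python
def default_minimum_should_match(n_should, n_must, n_filter):
    """Default described in the quote: 1 if there are only should clauses, else 0."""
    if n_should >= 1 and n_must == 0 and n_filter == 0:
        return 1
    return 0


def should_part_matches(should_results, minimum_should_match):
    """True when enough should clauses matched for this document."""
    return sum(should_results) >= minimum_should_match


# The inner bool of the question has two should clauses and nothing else.
default = default_minimum_should_match(n_should=2, n_must=0, n_filter=0)

# A document whose hostName matches the wildcard but whose requestDate is too old.
doc_results = [True, False]

print(default)                                    # 1
print(should_part_matches(doc_results, default))  # True: behaves like OR
print(should_part_matches(doc_results, 2))        # False: behaves like AND
```

With the default of 1 the two should clauses behave like an OR, which would explain hits outside the date range; setting minimum_should_match to 2 requires both clauses.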
According to your mappings, you have to spell out the fully qualified property name for the hostName and requestDate fields. Example:
"wildcard": {
"index-data-type.hostName": {
"value": "..."
}
}
You could also consider reducing your compound queries to just the main bool query, using the must clause for the wildcard and a filter for the range. Example:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"must": [
{
"wildcard": {
"index-data-type.hostName": {
"value": "*abc*"
}
}
}
],
"filter": {
"range": {
"index-data-type.requestDate": {
"gte": "2019-10-01T08:00:00.000Z"
}
}
}
}
}
}
The filter context doesn't contribute to the _score, yet it still reduces the number of hits.
Warning:
Using a leading asterisk (*) in a wildcard query can have a severe performance impact on your queries.

Elastic Search query for an AND condition on two properties of a nested object

I have the post_filter below, where I am trying to filter records where the school name is HILL SCHOOL and which contain a nested child object with name JOY and section A.
The school is present in the parent object, which holds the children list of nested objects.
All of the above are AND conditions.
But the query doesn't seem to work. Any idea why? And is there a way to combine the two nested queries?
GET /test_school/_search
{
"query": {
"match_all": {}
},
"post_filter": {
"bool": {
"must_not": [
{
"bool": {
"must": [
{
"term": {
"schoolname": {
"value": "HILL SCHOOL"
}
}
},
{
"nested": {
"path": "children",
"query": {
"bool": {
"must": [
{
"match": {
"name": "JACK"
}
}
]
}
}
}
},
{
"term": {
"children.section": {
"value": "A"
}
}
}
]
}
}
]
}
}
}
The schema is as below:
PUT /test_school
{
"mappings": {
"_doc": {
"properties": {
"schoolname": {
"type": "keyword"
},
"children": {
"type": "nested",
"properties": {
"name": {
"type": "keyword",
"index": true
},
"section": {
"type": "keyword",
"index": true
}
}
}
}
}
}
}
Sample data as below:
POST /test_school/_doc
{
"schoolname":"HILL SCHOOL",
"children":{
"name":"JOY",
"section":"A"
}
}
second record
POST /test_school/_doc
{
"schoolname":"HILL SCHOOL",
"children":{
"name":"JACK",
"section":"B"
}
}
https://stackoverflow.com/a/17543151/183217 suggests special mapping is needed to work with nested objects. You appear to be falling foul of the "cross object matching" problem.
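One way to avoid cross-object matching is to put both child conditions inside a single nested clause and to use the full field path (children.name, children.section). The Python sketch below only builds such a request body as a dict; nested_and is a hypothetical helper, the body has not been run against a cluster, and it shows the plain AND filter rather than the must_not wrapper from the question.

```python
import json


def nested_and(path, conditions):
    """Build one nested query whose term clauses must all hold for the same child."""
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "must": [
                        {"term": {f"{path}.{field}": value}}
                        for field, value in conditions.items()
                    ]
                }
            },
        }
    }


body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"schoolname": "HILL SCHOOL"}},
                nested_and("children", {"name": "JOY", "section": "A"}),
            ]
        }
    }
}

print(json.dumps(body, indent=2))
```

Because name and section sit in the same nested clause, they are checked against the same child object instead of against all children of the document.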

Must not with and in elastic search

I have 4 fields in an elastic search schema.
date
status
type
createdAt
Now, I need to fetch all the rows where
date = today
status = "confirmed"
and where type is not equal to "def".
However, it is OK if
type = def exists,
but only when the field createdAt is not equal to today.
My current query looks like this:
{
must: [
{ "bool":
{
"must": [
{"term": {"date": 'now/d'}},
{"term": {"status": 'confirmed'}},
]
}
}
],
mustNot: [
{"match": {'createdAt': 'now/d'}},
{"match":{"type": "def"}}
]
}
The rows where type is not equal to "def" are fetched.
However, if a row has type=def AND createdAt on any date but today, the row doesn't show up.
What am I doing wrong?
This query should work.
{
"query": {
"bool": {
"must": [
{ "term": {"date": "now/d" } },
{ "term": {"status": "confirmed" } }
],
"must_not": {
"bool": {
"must": [
{ "match": { "createdAt": "now/d" } },
{ "match": { "type": "def" } }
]
}
}
}
}
}
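I believe your version fails because every clause listed in must_not must individually not match, so a document is excluded as soon as either createdAt is today or type is def. Wrapping both clauses in a single bool with must excludes a document only when both conditions hold.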
https://www.elastic.co/guide/en/elasticsearch/guide/current/bool-query.html#_controlling_precision
All the must clauses must match, and all the must_not clauses must not match, but how many should clauses should match? By default, none of the should clauses are required to match, with one exception: if there are no must clauses, then at least one should clause must match.
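The difference between listing two clauses directly in must_not and wrapping them in one bool/must can be checked with a small truth table. This Python sketch models only the boolean logic; the function names are invented for illustration.

```python
def kept_with_separate_must_not(created_today, type_is_def):
    """must_not: [A, B] -> the document survives only if neither A nor B matches."""
    return (not created_today) and (not type_is_def)


def kept_with_combined_must_not(created_today, type_is_def):
    """must_not: {bool: {must: [A, B]}} -> excluded only if A and B both match."""
    return not (created_today and type_is_def)


for created_today in (False, True):
    for type_is_def in (False, True):
        print(
            f"createdAt today={created_today!s:5} type=def={type_is_def!s:5} "
            f"separate={kept_with_separate_must_not(created_today, type_is_def)!s:5} "
            f"combined={kept_with_combined_must_not(created_today, type_is_def)}"
        )
```

The two variants differ whenever exactly one of the conditions holds, which includes the row the question is about: type=def with createdAt on another day is dropped by the separate form and kept by the combined one.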
Assuming a setup like this:
PUT twitter
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
}
},
"mappings": {
"_doc": {
"properties": {
"date": {
"type": "date",
"format": "epoch_millis"
},
"createdAt": {
"type": "date",
"format": "epoch_millis"
},
"status": {
"type": "keyword"
},
"type": {
"type": "keyword"
}
}
}
}
}
and a sample doc like this, where date is the start of September 10, 2018 in UTC and createdAt is one millisecond earlier (adjust the values to test the query):
POST twitter/_doc/1
{
"date": 1536537600000,
"createdAt": 1536537599999,
"status": "confirmed",
"type": "def"
}
the following query should work:
GET twitter/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "now/d",
"lte": "now/d"
}
}
},
{
"term": {
"status": "confirmed"
}
}
],
"must_not": [
{
"bool": {
"must": [
{
"range": {
"createdAt": {
"gte": "now/d",
"lte": "now/d"
}
}
},
{
"term": {
"type": "def"
}
}
]
}
}
]
}
}
}
}
}
This is a filtered query, which I think is better for this scenario because it doesn't calculate the score. If you do want to calculate the score, just remove the bool and the filter from the top.
