Query hashmap structure with elasticsearch - elasticsearch

I have two questions regarding mapping and querying a java hashmap in elasticsearch.
Does this mapping make sense in elasticsearch (is it the correct way to map a hashmap)?:
{
"properties": {
"itemsMap": {
"type": "nested",
"properties": {
"key": {
"type": "date",
"format": "yyyy-MM-dd"
},
"value": {
"type": "nested",
"properties": {
"itemVal1": {
"type": "double"
},
"itemVal2": {
"type": "double"
}
}
}
}
}
}
}
Here is some example data:
{
"itemsMap": {
"2021-12-31": {
"itemVal1": 100.0,
"itemVal2": 150.0
},
"2021-11-30": {
"itemVal1": 200.0,
"itemVal2": 50.0
}
}
}
My queries don't seem to work. For example:
{
"query": {
"nested": {
"path": "itemsMap",
"query": {
"bool": {
"must": [
{
"match": {
"itemsMap.key": "2021-11-30"
}
}
]
}
}
}
}
}
Am I doing something wrong? How can I query such a structure? I have the possibility to change the mapping if it's necessary.
Thanks

TL;DR
With the way you are indexing your data, nothing is stored in key.
Each date becomes its own field name (2021-11-30, 2021-12-31, ...), and key stays empty.
This is only viable if the number of distinct dates is limited (fewer than 1000, which is the default limit on the number of fields in an index); otherwise your format is not viable in the long run.
If you don't want to change your docs, here is the query:
GET /71525899/_search
{
"query": {
"nested": {
"path": "itemsMap",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "itemsMap.2021-12-31"
}
}
]
}
}
}
}
}
To understand
Inspect the mapping by querying the index:
GET /<index_name>/_mapping
You will see that the number of fields named after your dates keeps growing.
And in all your docs, itemsMap.key is going to be empty (this explains why my previous answer did not work).
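The field growth described above can be illustrated without a cluster. The following is a minimal Python sketch, not Elasticsearch code: the helper field_paths is hypothetical and only approximates which leaf field names dynamic mapping would register for a document. It compares the map-shaped documents from the question with array-shaped documents that use a fixed key field.

```python
def field_paths(doc, prefix=""):
    """Return the dotted leaf field paths found in a document (rough model of dynamic mapping)."""
    paths = set()
    for name, value in doc.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            paths |= field_paths(value, path)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    paths |= field_paths(item, path)
                else:
                    paths.add(path)
        else:
            paths.add(path)
    return paths


# Map-shaped documents: every date ends up inside a field name.
map_docs = [
    {"itemsMap": {"2021-12-31": {"itemVal1": 100.0, "itemVal2": 150.0}}},
    {"itemsMap": {"2021-11-30": {"itemVal1": 200.0, "itemVal2": 50.0}}},
]

# Array-shaped documents: dates are values of one fixed "key" field.
array_docs = [
    {"itemsMap": [{"key": "2021-12-31", "value": {"itemVal1": 100.0, "itemVal2": 150.0}}]},
    {"itemsMap": [{"key": "2021-11-30", "value": {"itemVal1": 200.0, "itemVal2": 50.0}}]},
]

map_fields = set().union(*(field_paths(d) for d in map_docs))
array_fields = set().union(*(field_paths(d) for d in array_docs))

print(len(map_fields), sorted(map_fields))      # grows by two fields per new date
print(len(array_fields), sorted(array_fields))  # stays fixed, whatever the dates are
```

In this model the map-shaped documents never produce an itemsMap.key field at all, which is consistent with the original query finding nothing.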
A more viable option
Keep your mapping and change the shape of your docs.
They will look like this:
{
"itemsMap": [
{
"key": "2021-12-31",
"value": { "itemVal1": 100, "itemVal2": 150 }
},
{
"key": "2021-11-30",
"value": { "itemVal1": 200, "itemVal2": 50 }
}
]
}
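If the documents come from a map keyed by date, the reshaping can happen on the client before indexing. Here is a minimal Python sketch of that conversion; the function name to_entries is made up for illustration, and the same idea would apply to a Java HashMap by iterating over its entries.

```python
def to_entries(items_map):
    """Turn {date: {...}} into [{"key": date, "value": {...}}, ...]."""
    return [{"key": key, "value": value} for key, value in items_map.items()]


original = {
    "itemsMap": {
        "2021-12-31": {"itemVal1": 100.0, "itemVal2": 150.0},
        "2021-11-30": {"itemVal1": 200.0, "itemVal2": 50.0},
    }
}

# Same data, but every entry now uses the fixed field names from the mapping.
reshaped = {"itemsMap": to_entries(original["itemsMap"])}
print(reshaped)
```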
DELETE /71525899
PUT /71525899/
{
"mappings": {
"properties": {
"itemsMap": {
"type": "nested",
"properties": {
"key": {
"type": "date",
"format": "yyyy-MM-dd"
},
"value": {
"type": "nested",
"properties": {
"itemVal1": {
"type": "double"
},
"itemVal2": {
"type": "double"
}
}
}
}
}
}
}
}
POST /_bulk
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2021-12-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2022-12-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2021-10-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
GET /71525899/_search
{
"query": {
"nested": {
"path": "itemsMap",
"query": {
"bool": {
"must": [
{
"match": {
"itemsMap.key": "2021-12-31"
}
}
]
}
}
}
}
}

Related

Is there any way to iterate over an Elasticsearch array document with a script, as in other programming languages?

Mapping
{
"supply": {
"properties": {
"rotation_list": {
"type": "nested",
"properties": {
"project_end_date": {
"type": "nested",
"properties": {
"end_date": {
"type": "date",
"format": "yyyy-MM-ddTHH:mm:ss"
}
}
},
"total_days": {
"type": "integer"
}
}
}
}
}}
Data
{"rotation_list": [
{
"project_end_date": [
{
"end_date": "2020-08-07"
},
{
"end_date": "2020-06-07"
}
],
"total_days": 23
},
{
"project_end_date": [
{
"end_date": "2020-08-07"
}
],
"total_days": 26
}]}
query
{"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "rotation_list.project_end_date",
"query": {
"script": {
"script": {
"lang": "groovy",
"inline": "import org.elasticsearch.common.logging.*;logger=ESLoggerFactory.getLogger('myscript');def ratable =false;logger.info(doc['rotation_list.project_end_date.end_date'].values)"
}
}
}
}
}
]
}
}
}}}
Log result
[INFO ][myscript] [1596758400000] [INFO ][myscript] [1591488000000] [INFO ][myscript] [1596758400000]
I am not sure why this is happening. Is there any way to iterate over the values grouped as [1596758400000, 1591488000000] and [1596758400000]?
The data is saved in that grouped form, and I have declared the nested type in the mapping, so I am not sure why the values come back one at a time. Is there any way to iterate over them as in the original document I indexed?
It's impossible to access a nested doc's nested neighbor in a script query due to the nature of nested whereby each (sub)document is treated as a separate document -- be it on the top level or within an array of objects like your rotation_list.project_end_date.
The only permissible situation of having access to the whole context of a nested field is within script_fields -- but you unfortunately cannot query by them -- only construct them on the fly & retrieve them:
Using your mapping from above
GET supply_nested/_search
{
"script_fields": {
"combined_end_dates": {
"script": {
"lang": "painless",
"source": "params['_source']['rotation_list'][0]['project_end_date']"
}
}
}
}
Iterating within a script query would be possible only if rotation_list alone were nested but not project_end_date. Using 7.x here:
PUT supply_non_nested
{
"mappings": {
"properties": {
"rotation_list": {
"type": "nested",
"properties": {
"project_end_date": {
"type": "object",
"properties": {
"end_date": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
},
"total_days": {
"type": "integer"
}
}
}
}
}
}
Sync a doc:
POST supply_non_nested/_doc
{
"rotation_list": [
{
"project_end_date": [
{
"end_date": "2020-08-07"
},
{
"end_date": "2020-06-07"
}
],
"total_days": 23
},
{
"project_end_date": [
{
"end_date": "2020-08-07"
}
],
"total_days": 26
}
]
}
Query using painless instead of groovy because it's more secure & less verbose in this case:
GET supply_non_nested/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "rotation_list",
"query": {
"script": {
"script": {
"lang": "painless",
"source": "Debug.explain(doc['rotation_list.project_end_date.end_date'])"
}
}
}
}
}
]
}
}
}
}
}
yielding
...
"reason": {
...
"to_string": "[2020-06-07T00:00:00.000Z, 2020-08-07T00:00:00.000Z]",
"java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Dates",
}
...
It's not exactly clear from your snippet what you were trying to achieve in the query. Can you elaborate?
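To make the difference between the two mappings concrete, here is a small Python sketch. It is only a model of what each script invocation gets to see, not Elasticsearch code: the function names are invented, and it assumes that a script runs once per hidden nested document and that values come back sorted.

```python
from datetime import datetime, timezone


def to_millis(day):
    """Epoch milliseconds for a yyyy-MM-dd date at midnight UTC."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


rotation_list = [
    {"project_end_date": [{"end_date": "2020-08-07"}, {"end_date": "2020-06-07"}], "total_days": 23},
    {"project_end_date": [{"end_date": "2020-08-07"}], "total_days": 26},
]


def values_seen_when_inner_is_nested(rotations):
    """One invocation per project_end_date entry: each sees a single value."""
    return [[to_millis(d["end_date"])] for r in rotations for d in r["project_end_date"]]


def values_seen_when_inner_is_object(rotations):
    """One invocation per rotation_list entry: each sees all of that entry's dates."""
    return [sorted(to_millis(d["end_date"]) for d in r["project_end_date"]) for r in rotations]


print(values_seen_when_inner_is_nested(rotation_list))
print(values_seen_when_inner_is_object(rotation_list))
```

The first call reproduces the three single-value log lines from the question, and the second gives the grouped lists the question was hoping for.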

Need help combining wildcard search with range query within elasticsearch?

I am trying to combine a wildcard with a date range in an Elasticsearch query, but the results are not restricted by both conditions. The response includes items whose date is outside the requested range.
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"wildcard": {
"hostName": "*abc*"
}
},
{
"range": {
"requestDate": {
"gte": "2019-10-01T08:00:00.000Z"
}
}
}
]
}
}
]
}
}
}
The index mapping looks as below:
{
"index_history": {
"mappings": {
"applications_datalake": {
"properties": {
"query": {
"properties": {
"term": {
"properties": {
"server": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
},
"index-data-type": {
"properties": {
"attributes": {
"properties": {
"wwnListForServer": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"hostName": {
"type": "keyword"
},
"requestDate": {
"type": "date"
},
"requestedBy": {
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "keyword"
}
}
}
}
}
}
}
}
You missed the minimum_should_match parameter.
Check this out:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html.
I think your query should look like this:
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"wildcard": {
"hostName": "*abc*"
}
},
{
"range": {
"requestDate": {
"gte": "2019-10-01T08:00:00.000Z"
}
}
}
],
"minimum_should_match" : 2
}
}
]
}
}
}
From the documentation :
You can use the minimum_should_match parameter to specify the number
or percentage of should clauses returned documents must match.
If the bool query includes at least one should clause and no must or
filter clauses, the default value is 1. Otherwise, the default value
is 0.
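The default described in the quote explains the unexpected hits. Below is a rough Python model of that rule (the function bool_matches is hypothetical, and it handles only integer values of minimum_should_match, not percentages). Each clause is represented by a boolean saying whether a given document matches it.

```python
def bool_matches(must=(), filter_=(), should=(), minimum_should_match=None):
    """Decide whether a document matches a bool query, given per-clause results."""
    if minimum_should_match is None:
        # Default from the documentation: 1 if there are should clauses
        # and no must or filter clauses, otherwise 0.
        minimum_should_match = 1 if should and not must and not filter_ else 0
    required_ok = all(must) and all(filter_)
    should_ok = sum(1 for matched in should if matched) >= minimum_should_match
    return required_ok and should_ok


# A document whose hostName matches "*abc*" but whose requestDate is too old.
host_only = [True, False]

print(bool_matches(should=host_only))                          # default of 1: behaves like OR
print(bool_matches(should=host_only, minimum_should_match=2))  # both clauses required
```

With the default, the inner bool in the question acts as an OR, so a document matching only the wildcard is returned even though its date is out of range.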
According to your mappings, you have to use the fully qualified property names for the hostName and requestDate fields. Example:
"wildcard": {
"index-data-type.hostName": {
"value": "..."
}
}
You could also consider reducing your compound queries to just the main bool query, using the must clause for the wildcard and a filter for the range. Example:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"must": [
{
"wildcard": {
"index-data-type.hostName": {
"value": "*abc*"
}
}
}
],
"filter": {
"range": {
"index-data-type.requestDate": {
"gte": "2019-10-01T08:00:00.000Z"
}
}
}
}
}
}
The filter context doesn't contribute to the _score, yet it still reduces your number of hits.
Warning:
Using a leading asterisk (*) in a wildcard query can have a severe performance impact on your queries.
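One way to picture the cost of a leading asterisk is a toy model in which the index keeps its terms in a sorted list. This Python sketch is only an illustration under that assumption and is not how Lucene is actually implemented; the function names and host names are invented. A pattern with a literal prefix can jump to a small slice of the list, while a pattern starting with * has to be tested against every term.

```python
import bisect
import fnmatch


def prefix_candidates(sorted_terms, prefix):
    """Binary-search the sorted term list for the contiguous run sharing `prefix`."""
    start = bisect.bisect_left(sorted_terms, prefix)
    end = start
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


def wildcard_terms(sorted_terms, pattern):
    """Return (matching terms, number of terms that had to be examined)."""
    # Literal text before the first wildcard character, if any.
    first_special = min((i for i, ch in enumerate(pattern) if ch in "*?"), default=len(pattern))
    prefix = pattern[:first_special]
    candidates = prefix_candidates(sorted_terms, prefix) if prefix else sorted_terms
    matches = [term for term in candidates if fnmatch.fnmatchcase(term, pattern)]
    return matches, len(candidates)


terms = sorted(["abc-01", "db-abc", "web-01", "web-02", "xabcx"])

print(wildcard_terms(terms, "web*"))   # only the "web" slice is examined
print(wildcard_terms(terms, "*abc*"))  # every term is examined
```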

Elasticsearch - Applying multi level filter on nested aggregation bucket?

I'm trying to get distinct nested objects by applying multiple filters.
Basically, in Elasticsearch I have cities as top-level documents. Inside each city I have nested citizens documents, which in turn contain nested pets documents.
I am trying to get all citizens that have certain conditions applied on all of these 3 levels (cities, citizens and pets):
Give me all distinct citizens
that have age:"40",
that have pets "name":"Casper",
from cities with office_type="secondary"
I know that to filter 1st level I can use query condition, and then if I need to filter the nested citizens I can add a filter in the aggregation level.
I am using this article as an example: https://iridakos.com/tutorials/2018/10/22/elasticsearch-bucket-aggregations.html
Query working so far:
GET city_offices/_search
{
"size" : 10,
"query": {
"term" : { "office_type" : "secondary" }
},
"aggs": {
"citizens": {
"nested": {
"path": "citizens"
},
"aggs": {
"inner_agg": {
"filter": {
"term": { "citizens.age": "40" }
} ,
"aggs": {
"occupations": {
"terms": {
"field": "citizens.occupation"
}
}
}
}
}
}
}
}
BUT: How can I add the "pets" nested filter condition?
Mapping:
PUT city_offices
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"doc": {
"properties": {
"city": {
"type": "keyword"
},
"office_type": {
"type": "keyword"
},
"citizens": {
"type": "nested",
"properties": {
"occupation": {
"type": "keyword"
},
"age": {
"type": "integer"
},
"pets": {
"type": "nested",
"properties": {
"kind": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"age": {
"type": "integer"
}
}
}
}
}
}
}
}
}
Index data:
PUT /city_offices/doc/1
{
"city":"Athens",
"office_type":"secondary",
"citizens":[
{
"occupation":"Statistician",
"age":30,
"pets":[
{
"kind":"Cat",
"name":"Phoebe",
"age":14
}
]
},
{
"occupation":"Librarian",
"age":30,
"pets":[
{
"kind":"Rabbit",
"name":"Nino",
"age":13
}
]
},
{
"occupation":"Librarian",
"age":40,
"pets":[
{
"kind":"Rabbit",
"name":"Nino",
"age":13
}
]
},
{
"occupation":"Statistician",
"age":40,
"pets":[
{
"kind":"Rabbit",
"name":"Casper",
"age":2
},
{
"kind":"Rabbit",
"name":"Nino",
"age":13
},
{
"kind":"Dog",
"name":"Nino",
"age":15
}
]
}
]
}
So I found a solution for this.
Basically, I apply the top-level filters in the query section and then apply the rest of the conditions in the aggregations.
First I apply a filter aggregation at the citizens level. Then I go inside the nested pets and apply the pet filter. Finally I go back up to the citizens level (using reverse_nested with path citizens) and set the terms aggregation that generates the final bucket.
Query looks like this:
GET city_offices/_search
{
"size" : 10,
"query": {
"term" : { "office_type" : "secondary" }
},
"aggs": {
"citizens": {
"nested": {
"path": "citizens"
},
"aggs": {
"inner": {
"filter": {
"term": { "citizens.age": "40" }
} ,
"aggs": {
"occupations": {
"nested": {
"path": "citizens.pets"
},
"aggs": {
"inner_pets": {
"filter": {
"term": { "citizens.pets.name": "Casper" }
} ,
"aggs": {
"lll": {
"reverse_nested": {
"path": "citizens"
},
"aggs": {
"xxx": {
"terms": {
"field": "citizens.occupation",
"size": 10
}
}
}
}
}
}
}
}
}
}
}
}
}
}
The response bucket looks like this:
"xxx": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Librarian",
"doc_count": 1
},
{
"key": "Statistician",
"doc_count": 1
}
]
}
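The chain of aggregations can be checked by hand with a plain Python model of the same steps. This is a sketch, not Elasticsearch code; the function name is invented, and it runs only against the single Athens document from the question, so its counts need not match a response taken from a fuller index.

```python
from collections import Counter

city_docs = [
    {
        "city": "Athens",
        "office_type": "secondary",
        "citizens": [
            {"occupation": "Statistician", "age": 30, "pets": [{"kind": "Cat", "name": "Phoebe", "age": 14}]},
            {"occupation": "Librarian", "age": 30, "pets": [{"kind": "Rabbit", "name": "Nino", "age": 13}]},
            {"occupation": "Librarian", "age": 40, "pets": [{"kind": "Rabbit", "name": "Nino", "age": 13}]},
            {
                "occupation": "Statistician",
                "age": 40,
                "pets": [
                    {"kind": "Rabbit", "name": "Casper", "age": 2},
                    {"kind": "Rabbit", "name": "Nino", "age": 13},
                    {"kind": "Dog", "name": "Nino", "age": 15},
                ],
            },
        ],
    }
]


def occupations_of_matching_citizens(docs, office_type, age, pet_name):
    """Count citizens per occupation after the city, citizen and pet filters."""
    counts = Counter()
    for doc in docs:
        if doc["office_type"] != office_type:  # top-level term query
            continue
        for citizen in doc["citizens"]:        # nested aggregation on citizens
            if citizen["age"] != age:          # filter aggregation on citizens.age
                continue
            # nested on citizens.pets plus the name filter; any() counts the citizen
            # once, which is the role reverse_nested plays before the terms step.
            if any(pet["name"] == pet_name for pet in citizen["pets"]):
                counts[citizen["occupation"]] += 1
    return dict(counts)


print(occupations_of_matching_citizens(city_docs, "secondary", 40, "Casper"))
```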
Any other suggestions?

Elasticsearch query on data with multi level child

Given this sample data:
"users": {
"user1": {
"first": "john",
"last": "bellamy"
},
"user2": {
.....
.....
}
}
How can I set up Elasticsearch to query/search on the child fields first and last? Other tutorials only show one level of children, not two or more levels.
I tried looking for a solution, and I guess that it has something to do with the mapping options?
I just started with Elasticsearch a few days ago and have already managed to set it up and add data.
This works for me
{
"query": {
"bool": {
"must": [{
"term": {
"users.user2.firstname": {
"value": "sumit"
}
}
}]
}
}
}
nested users approach
mappings
{
"mappings": {
"test_type": {
"properties": {
"users": {
"type": "nested",
"properties": {
"firstname": {
"type": "text"
},
"lastname": {
"type": "text"
}
}
}
}
}
}
}
query
{
"query": {
"bool": {
"must": [{
"nested": {
"inner_hits": {},
"path": "users",
"query": {
"bool": {
"must": [{
"term": {
"users.firstname": {
"value": "ajay"
}
}
}]
}
}
}
}]
}
}
}

Elasticsearch: Querying a nested array

I have seen similar questions posted, but of course, none are exactly what I am trying to do.
When I run the query below I get this error:
"reason": "[nested] failed to find nested object under path [contentGroup]"
I think the problem is contentGroup.name does not exist because contentGroup is an array not an object. It needs to be something like this:
contentGroup[0].name
and
contentGroup[1].name
But I can't figure out how to do that.
Another thing that might be wrong is that I have two items nested within each other. I don't know if that is right or not.
Any help would be great!
My mapping:
{
"mappings": {
"articles": {
"properties": {
"contentGroups": {
"type": "nested",
"properties": {
"contentGroup": {
"type": "nested",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
}
}
}
}
What gets created when I input in an article ( Note the array being created ):
"contentGroups": {
"contentGroup": [
{
"name": "Breaking",
"id": "104"
},
{
"name": "News",
"id": "22"
}
]
}
My query:
{
"query": {
"bool": {
"must": [
{ "match": { "headline": "whatever" }},
{
"nested": {
"path": "contentGroup",
"query": {
"bool": {
"must": [
{ "match": { "contentGroup.name": "Breaking" }}
]
}
}
}
}
]
}
}
}
You should use a simpler mapping:
{
"mappings": {
"articles": {
"properties": {
"contentGroups": {
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
}
}
}
Each field in Elasticsearch already supports multiple values, so there is no need to specify this.
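With the simpler mapping, the search no longer needs a nested clause. Below is a minimal Python sketch that builds such a request body as a dictionary. The function name build_article_query is made up, and it assumes that documents are indexed with the groups directly under contentGroups, so that the field to match is contentGroups.name.

```python
import json


def build_article_query(headline, group_name):
    """Build a bool query that matches the headline and a content group name."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"headline": headline}},
                    # Plain object field: no nested wrapper and no array index needed.
                    {"match": {"contentGroups.name": group_name}},
                ]
            }
        }
    }


print(json.dumps(build_article_query("whatever", "Breaking"), indent=2))
```

One trade-off of plain object fields is that values from different array entries are not kept together, so a query combining a name and an id could match a document where they belong to different groups.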

Resources