Elasticsearch: Ordering aggregation by date on nested field

I am developing a query that counts how many unique "cp" values the most recent document contains.
The JSON is made up of several nested fields.
I am having trouble restricting the result to the document with the most recent date once nested fields are involved.
I have used nested aggregations and, at the end, a top_hits aggregation sorted in descending order with size 1, expecting it to return only the latest document.
But the query still returns results for all the documents, whatever their date.
JSON:
"data" : [
{
"addresses" : [
{
"cp" : "33.33.33",
"services" : [
{
"field1" : "true",
"field2" : "1234",
}
]
}
],
}
],
"created_at" : "2020-09-03 14:39:01",
"#timestamp" : "2020-09-04T05:53:22.341661Z",
}
},
QUERY:
{"size": 0,
"aggs": {
"nested": {
"nested": {
"path": "data.addresses"
},
"aggs": {
"nested": {
"nested": {
"path": "data.addresses.services"
},
"aggs": {
"filter": {
"filter": {
"term": {
"data.addresses.services.field1.keyword": "true"
}
},
"aggs": {
"unique": {
"cardinality": {
"field": "data.addresses.services.field2.keyword"
}
},
"range":{
"top_hits": {
"size": 1,
"sort": [
{"created_at.keyword": {"order": "desc"}}]
}
}
}
}
}
}
}
}
}
}
I have tried sorting by the "created_at" field and by #timestamp, but the result is the same.
Any advice on how to solve this?

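For this case the solution is to drop top_hits and instead wrap the nested aggregations in a terms aggregation on created_at.keyword with "size": 1 and
"order": {
"_key": "desc"
}
top_hits only picks which hits to display within a bucket; it does not limit the documents that the sibling cardinality aggregation counts. The terms aggregation keeps just the bucket for the most recent date, so the sub-aggregations only see those documents.
QUERY: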
{"size": 0,
"aggs": {
"filtrofecha": {
"terms": {
"field": "created_at.keyword",
"order": {
"_key": "desc"
},
"size": 1
},
"aggs": {
"nested": {
"nested": {
"path": "data.addresses"
},
"aggs": {
"nested": {
"nested": {
"path": "data.addresses.services"
},
"aggs": {
"filter": {
"filter": {
"term": {
"data.addresses.services.field1.keyword": "true"
}
},
"aggs": {
"unique": {
"cardinality": {
"field": "data.addresses.services.field2.keyword"
}
}
}
}
}
}
}
}
}
}
}
}
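To read the count back out of the response, one could walk the aggregation names used in the query above. The following is a minimal Python sketch, assuming the response has already been parsed into a dict. The helper name latest_unique_count and the sample response are illustrative and not part of the original answer.

```python
def latest_unique_count(response):
    """Return (created_at, unique count) for the most recent bucket, or None."""
    buckets = response["aggregations"]["filtrofecha"]["buckets"]
    if not buckets:
        return None
    # "size": 1 combined with "_key": "desc" leaves only the newest date.
    latest = buckets[0]
    # Sub-aggregations appear under the names given in the query:
    # nested (data.addresses) -> nested (data.addresses.services) -> filter -> unique
    unique = latest["nested"]["nested"]["filter"]["unique"]["value"]
    return latest["key"], unique


# Hypothetical response, trimmed to the fields the helper reads.
sample_response = {
    "aggregations": {
        "filtrofecha": {
            "buckets": [
                {
                    "key": "2020-09-03 14:39:01",
                    "doc_count": 1,
                    "nested": {
                        "doc_count": 1,
                        "nested": {
                            "doc_count": 1,
                            "filter": {"doc_count": 1, "unique": {"value": 1}},
                        },
                    },
                }
            ]
        }
    }
}

print(latest_unique_count(sample_response))
```

Note that ordering created_at.keyword by _key is a string sort. It matches chronological order here only because the "yyyy-MM-dd HH:mm:ss" format sorts lexicographically.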

Related

How to get the latest record from each unique value of key

How can I get the latest record for each unique value of a key (in this example, “combined.keyword”)?
I can see the buckets in the aggregations, but I also want the most recent record for each bucket.
Here is my query:
GET /new_csvindex/_search?pretty
{
"size" : 1,
"query": {
"bool" : {
"must_not":[
{"term": {"combined.keyword" : "combined"}}
]
}
},
"sort": [
{ "#timestamp": { "order": "desc" }}
],
"aggs" : {
"get_the_latest_record_from_each_bucket" : {
"terms" : { "field" : "combined.keyword", "exclude": [ "combined"]}
}
}
}
You are probably looking for the top_hits aggregation. Use it as below:
{
"size": 1,
"query": {
"bool": {
"must_not": [
{
"term": {
"combined.keyword": "combined"
}
}
]
}
},
"sort": [
{
"#timestamp": {
"order": "desc"
}
}
],
"aggs": {
"get_the_latest_record_from_each_bucket": {
"terms": {
"field": "combined.keyword",
"exclude": [
"combined"
]
},
"aggs": {
"latest": {
"top_hits": {
"sort": {
"#timestamp": "desc"
},
"size": 1
}
}
}
}
}
}
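The latest record of each bucket then sits under the "latest" sub-aggregation of that bucket. As a rough Python sketch, assuming the response is already parsed into a dict (the helper name latest_per_bucket and the sample response are made up for illustration):

```python
def latest_per_bucket(response, agg_name="get_the_latest_record_from_each_bucket"):
    """Map each terms bucket key to the _source of its newest document."""
    result = {}
    for bucket in response["aggregations"][agg_name]["buckets"]:
        # "latest" is the top_hits sub-aggregation; it returns a regular hits block.
        hits = bucket["latest"]["hits"]["hits"]
        if hits:
            result[bucket["key"]] = hits[0]["_source"]
    return result


# Hypothetical response, trimmed to the fields the helper reads.
sample_response = {
    "aggregations": {
        "get_the_latest_record_from_each_bucket": {
            "buckets": [
                {
                    "key": "a",
                    "doc_count": 3,
                    "latest": {
                        "hits": {
                            "hits": [
                                {"_source": {"combined": "a", "#timestamp": "2020-01-02T00:00:00Z"}}
                            ]
                        }
                    },
                }
            ]
        }
    }
}

print(latest_per_bucket(sample_response))
```

The top-level "size" and "sort" only affect the regular hits list, not the buckets. If only the per-bucket records are needed, the top-level "size" can be set to 0.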

Elasticsearch: Aggregation on filtered nested objects to find unique values

I have an array of objects (tags) in each document in Elasticsearch 5:
{
"tags": [
{ "key": "tag1", "value": "val1" },
{ "key": "tag2", "value": "val2" },
...
]
}
Now I want to find the unique tag values for a certain tag key. Something similar to this SQL query:
SELECT DISTINCT(tags.value) FROM tags WHERE tags.key='some-key'
I have come up with this DSL so far:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"filter" : { "terms": { "tags.key": "tag1" } },
"aggs": {
"my_tags_values": {
"terms" : {
"field" : "tags.value",
"size": 9999
}
}
}
}
}
}
}
But it shows me this error:
[terms] unknown field [tags.key], parser not found.
Is this the right approach to solve the problem? Thanks for your help.
Note: I have declared the tags field as a nested field in my mapping.
You mixed things up there. You probably wanted to add a filter aggregation, but you didn't give it a name (note also that terms expects an array of values):
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"my_filter": {
"filter": {
"terms": {
"tags.key": [
"tag1"
]
}
},
"aggs": {
"my_tags_values": {
"terms": {
"field": "tags.value",
"size": 9999
}
}
}
}
}
}
}
}
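The rule behind this fix is that every entry directly under "aggs" is a name, and the aggregation type goes one level below it. A small Python sketch of a builder that enforces this shape (the helper agg is hypothetical, not an Elasticsearch API):

```python
import json


def agg(name, agg_type, body, sub_aggs=None):
    """Build one named aggregation: {name: {agg_type: body, "aggs": {...}}}."""
    definition = {agg_type: body}
    if sub_aggs:
        definition["aggs"] = sub_aggs
    return {name: definition}


# Rebuild the corrected query from the inside out.
values = agg("my_tags_values", "terms", {"field": "tags.value", "size": 9999})
by_key = agg("my_filter", "filter", {"terms": {"tags.key": ["tag1"]}}, values)
query = {"size": 0, "aggs": agg("my_tags", "nested", {"path": "tags"}, by_key)}

print(json.dumps(query, indent=2))
```

Because the name is always the outer key, a filter can no longer end up where a name is expected.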
Try Bool Query inside the Filter-Aggregation:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"my_filter": {
"filter": {
"bool": {
"must": [
{
"term": {
"tags.key": "tag1"
}
}
]
}
},
"aggs": {
"my_tags_values": {
"terms": {
"field": "tags.value",
"size": 0
}
}
}
}
}
}
}
}
BTW: on Elasticsearch 1.x and 2.x you can write 0 instead of 9999 as the aggregation size to retrieve all buckets. This was removed in 5.0, where you have to pass an explicit size.

Aggregations on an array in a nested query

I am trying to query for all Users that have at least one color in common with a particular User. I have been able to do that, but I cannot figure out how to aggregate the results so that I get each user along with the colors they have in common.
Part of my document for a sample user is as follows:
{
// ... other fields
"colors" : [
{
"id" : 1,
"name" : "Green"
},
{
"id" : 7,
"name" : "Blue"
}
]
}
This is my query for getting the colors in common with another User that has the colors Red, Orange and Green:
{
"query": {
"nested": {
"path": "colors",
"scoreMode": "sum",
"query": {
"function_score": {
"filter": {
"terms": {
"colors.name": [
"Red","Orange","Green"
]
}
},
"functions": [
// Functions here for custom scoring
]
}
}
}
}
}
How can I aggregate the Users with the colors in common?
You need to use a nested aggregation, then apply a filter aggregation for the colors, and finally use top_hits to get the matching colors. I am using source filtering to return only the color value.
This is the query:
{
"size": 0,
"query": {
"nested": {
"path": "colors",
"query": {
"terms": {
"colors.color": [
"green",
"red"
]
}
}
}
},
"aggs": {
"user": {
"terms": { <----get users with unique name or user_id
"field": "name",
"size": 10
},
"aggs": {
"nested_color_path": { <---go inside nested documents
"nested": {
"path": "colors"
},
"aggs": {
"match_color": {
"filter": { <--- use the filter to match for colors
"terms": {
"colors.color": [
"green",
"red"
]
}
},
"aggs": {
"get_match_color": { <--- use this to get matched color
"top_hits": {
"size": 10,
"_source": {
"include": "name"
}
}
}
}
}
}
}
}
}
}
}
You have to use nested aggregations to achieve this. See the query below:
POST <index>/<type>/_search
{
"query": {
"nested": {
"path": "colors",
"query": {
"terms": {
"colors.name": [
"Red",
"Orange",
"Green"
]
}
}
}
},
"aggs": {
"users_with_common_colors": {
"terms": {
"field": "user_id",
"size": 0,
"order": {
"color_distribution>common": "desc" <-- This will sort the users in descending order of number of common colors
}
},
"aggs": {
"color_distribution": {
"nested": {
"path": "colors"
},
"aggs": {
"common": {
"filter": {
"terms": {
"colors.name": [
"Red",
"Orange",
"Green"
]
}
},
"aggs": {
"colors": {
"terms": {
"field": "colors.name",
"size": 0
}
}
}
}
}
}
}
}
}
}
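With this query, each user bucket carries its own list of matching color buckets. A minimal Python sketch for flattening that into a user-to-colors mapping, assuming a parsed response dict (the helper name common_colors_by_user and the sample response are illustrative only):

```python
def common_colors_by_user(response):
    """Map each user_id bucket key to the list of colors it has in common."""
    users = {}
    for user in response["aggregations"]["users_with_common_colors"]["buckets"]:
        # Path mirrors the query: color_distribution (nested) -> common (filter) -> colors (terms)
        color_buckets = user["color_distribution"]["common"]["colors"]["buckets"]
        users[user["key"]] = [color["key"] for color in color_buckets]
    return users


# Hypothetical response, trimmed to the fields the helper reads.
sample_response = {
    "aggregations": {
        "users_with_common_colors": {
            "buckets": [
                {
                    "key": 42,
                    "doc_count": 1,
                    "color_distribution": {
                        "doc_count": 2,
                        "common": {
                            "doc_count": 1,
                            "colors": {"buckets": [{"key": "Green", "doc_count": 1}]},
                        },
                    },
                }
            ]
        }
    }
}

print(common_colors_by_user(sample_response))
```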

Is it possible to perform elasticsearch nested stats aggregation on a field defined by the parent aggregation?

I'm trying to do a query to generate a plot. My data index looks like this:
"mappings": {
"mydata": {
"properties": {
"type": { "type": "string", "index": "not_analyzed" },
"stamp": { "type": "date", "format": "date_hour_minute_second_millis" },
"data": { "type": "object" }
}
}
Depending on the type, the data field will contain different objects, e.g.,
temperature_data = {
"type": "temperature",
"stamp": "2015-11-01T15:25:19.123",
"data": {"temperature": 23.4, "variance": 0.0}
}
humidity_data = {
"type": "humidity",
"stamp": "2015-11-01T15:26:21.063",
"data": {"humidity": 75.1, "variance": 0.0}
}
I'm trying to aggregate the data into buckets by type, and then run a date histogram to get the stats of each reading (temperature, humidity). My problem is how to set the field of the stats aggregation, since it changes with the type (for "type": "temperature" the field is data.temperature, for example):
query = {
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{"range" : {
"stamp" : {
"gt" : start_stamp,
"lt" : end_stamp
}
}}
]
}
}
}
},
"aggs": {
"pathes": {
"terms": {
"field": "type"
},
"aggs": {
"points": {
"date_histogram": {
"field": "stamp",
"interval": interval
},
"aggs": {
"point_stats": {
"stats": {
"field": "data."+field???
}
}
}
}
}
}
}
}
* UPDATE *
As suggested I added a data-type.groovy file to config/scripts/, the file contains the following:
return doc['data.temperature'].value
Elasticsearch is able to compile the script:
[2015-11-02 19:50:32,651][INFO ][script] [Atum] compiling script file [/home/user/elasticsearch-1.7.0/config/scripts/data-type.groovy]
I updated the query to load the script file:
query = {
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{"range" : {
"stamp" : {
"gt" : start_stamp,
"lt" : end_stamp
}
}}
]
}
}
}
},
"aggs": {
"pathes": {
"terms": {
"field": "type"
},
"aggs": {
"points": {
"date_histogram": {
"field": "stamp",
"interval": interval
},
"aggs": {
"point_stats": {
"stats": {
"script": {"file": "data-type"}
}
}
}
}
}
}
}
}
When I run the query I get the following output:
{u'status': 400, u'error': u'SearchPhaseExecutionException[Failed to execute phase [query], ... Parse Failure [Unexpected token START_OBJECT in [point_stats].]]; }]'}
There's only temperature data in the database. If I replace "script": {"file": "data-type"} with "field": "data.temperature", the query works.
One option is to rename the humidity and temperature fields to something identical, like value, so you can simply aggregate on that field and you're good. You'd already know what kind of value it is since you know it from the type field.
"aggs": {
"pathes": {
"terms": {
"field": "type"
},
"aggs": {
"points": {
"date_histogram": {
"field": "stamp",
"interval": interval
},
"aggs": {
"point_stats": {
"stats": {
"field": "data.value"
}
}
}
}
}
}
}
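The renaming for this first option could be done before indexing. Below is a minimal Python sketch, assuming (as in the two sample documents in the question) that the reading is stored under a key equal to the document's type. The function name normalize_reading is hypothetical.

```python
def normalize_reading(doc):
    """Return a copy of doc whose type-specific reading is stored under data.value."""
    reading_field = doc["type"]  # assumption: the reading is keyed by the type name
    data = dict(doc["data"])     # copy so the caller's document is left untouched
    data["value"] = data.pop(reading_field)
    return {**doc, "data": data}


temperature_data = {
    "type": "temperature",
    "stamp": "2015-11-01T15:25:19.123",
    "data": {"temperature": 23.4, "variance": 0.0},
}

print(normalize_reading(temperature_data))
```

After this step every document exposes data.value, so the stats aggregation can use one fixed field regardless of type.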
The second option is to use a script, but that would be less performant and less scalable if you were to add more types of data (pressure, etc.):
"aggs": {
"pathes": {
"terms": {
"field": "type"
},
"aggs": {
"points": {
"date_histogram": {
"field": "stamp",
"interval": interval
},
"aggs": {
"point_stats": {
"stats": {
"script": "return doc.type.value == 'temperature' ? doc['data.temperature'].value : doc['data.humidity'].value"
}
}
}
}
}
}
}
Note that for this second option you need to enable dynamic scripting.

Nested Aggregation Elasticsearch

I'm trying to build a nested aggregation in Elasticsearch but it keeps giving errors. It says "cannot find agg type tags". How can I fix it? Thank you for your help. By the way, I don't have nested documents; I have one document with 180 fields. Can I still apply this aggregation? Here is my code:
{
"aggs": {
"comments": {
"nested": {
"path": "comments"
},
"aggs" : {
"red_products": {
"filter": {
"not": {
"terms": {
"text": [
"06melihgokcek",
"t.co","??","????","???"
]
}
}
},
"aggs": {
"top_docs": {
"terms": {
"field": "text",
"size": 50
}
},
"aggs" : {
"tags" : {
"terms" : {
"field" : "text",
"include" : ".*avni.*",
"exclude" : "fuat_.*"
}
}
}
}
}
}
}}}
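Your innermost aggs (the one called tags at the bottom) is misplaced and should be a child element of top_docs. As written, it sits next to top_docs, so Elasticsearch reads "aggs" as an aggregation name and "tags" as its type, hence the error.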
{
"aggs": {
"comments": {
"nested": {
"path": "comments"
},
"aggs": {
"red_products": {
"filter": {
"not": {
"terms": {
"text": [
"06melihgokcek",
"t.co",
"??",
"????",
"???"
]
}
}
},
"aggs": {
"top_docs": {
"terms": {
"field": "text",
"size": 50
},
"aggs": { <---- this was the misplaced aggs
"tags": {
"terms": {
"field": "text",
"include": ".*avni.*",
"exclude": "fuat_.*"
}
}
}
}
}
}
}
}
}
}
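This kind of misplacement can be spotted mechanically before sending the request. The following Python sketch is a heuristic only: it flags any "aggs" or "aggregations" key that appears where an aggregation name is expected. The function name find_misplaced_aggs is made up, and the filter bodies are trimmed for brevity.

```python
AGGS_KEYS = ("aggs", "aggregations")


def find_misplaced_aggs(aggs, path="aggs"):
    """Return paths where an "aggs" key sits where an aggregation name is expected."""
    problems = []
    for name, definition in aggs.items():
        here = f"{path}.{name}"
        if name in AGGS_KEYS:
            # A sibling of named aggregations called "aggs" is almost certainly misplaced.
            problems.append(here)
            continue
        if not isinstance(definition, dict):
            continue
        for key in AGGS_KEYS:
            if isinstance(definition.get(key), dict):
                problems.extend(find_misplaced_aggs(definition[key], f"{here}.{key}"))
    return problems


tags = {"tags": {"terms": {"field": "text", "include": ".*avni.*"}}}
top_docs = {"terms": {"field": "text", "size": 50}}

# Shape of the query from the question: "aggs" is a sibling of top_docs.
broken = {"comments": {"nested": {"path": "comments"}, "aggs": {
    "red_products": {"filter": {"match_all": {}}, "aggs": {
        "top_docs": top_docs, "aggs": tags}}}}}

# Shape of the corrected query: "aggs" is a child of top_docs.
fixed = {"comments": {"nested": {"path": "comments"}, "aggs": {
    "red_products": {"filter": {"match_all": {}}, "aggs": {
        "top_docs": {**top_docs, "aggs": tags}}}}}}

print(find_misplaced_aggs(broken))
print(find_misplaced_aggs(fixed))
```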
