Elasticsearch: Aggregation on filtered nested objects to find unique values

I have an array of objects (tags) in each document in Elasticsearch 5:
{
  "tags": [
    { "key": "tag1", "value": "val1" },
    { "key": "tag2", "value": "val2" },
    ...
  ]
}
Now I want to find the unique tag values for a certain tag key, similar to this SQL query:
SELECT DISTINCT(tags.value) FROM tags WHERE tags.key='some-key'
I have come up with this DSL so far:
{
  "size": 0,
  "aggs": {
    "my_tags": {
      "nested": {
        "path": "tags"
      },
      "aggs": {
        "filter" : { "terms": { "tags.key": "tag1" } },
        "aggs": {
          "my_tags_values": {
            "terms" : {
              "field" : "tags.value",
              "size": 9999
            }
          }
        }
      }
    }
  }
}
But it fails with this error:
[terms] unknown field [tags.key], parser not found.
Is this the right approach to solve the problem? Thanks for your help.
Note: I have declared the tags field as a nested field in my mapping.

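You mixed things up there. You probably wanted to add a filter aggregation, but you didn't give it a name. Without a name, Elasticsearch reads "filter" as the aggregation name and "terms" as the aggregation type, so it tries to parse "tags.key" as an option of a terms aggregation, which is exactly what the error says. Give the filter aggregation its own name and put the terms query inside it: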
{
  "size": 0,
  "aggs": {
    "my_tags": {
      "nested": {
        "path": "tags"
      },
      "aggs": {
        "my_filter": {
          "filter": {
            "terms": {
              "tags.key": [
                "tag1"
              ]
            }
          },
          "aggs": {
            "my_tags_values": {
              "terms": {
                "field": "tags.value",
                "size": 9999
              }
            }
          }
        }
      }
    }
  }
}
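To see why the nested mapping matters, here is a minimal pure-Python sketch (no Elasticsearch involved; the function names and sample documents are hypothetical) that mimics the nested, filter and terms steps on in-memory documents. It contrasts them with what a plain object mapping would give: there, tags.key and tags.value are flattened into two unrelated arrays, so each key loses its association with its value.

```python
from collections import Counter

def distinct_values_nested(docs, key):
    """Mimic nested -> filter -> terms: every tag object is its own sub-document."""
    counts = Counter()
    for doc in docs:
        for tag in doc.get("tags", []):      # nested agg: one sub-document per tag
            if tag["key"] == key:            # filter agg on tags.key
                counts[tag["value"]] += 1    # terms agg on tags.value (nested doc count)
    return dict(counts)

def distinct_values_flattened(docs, key):
    """Mimic a plain object mapping: tags.key and tags.value are stored as two
    unrelated arrays, so a matching document contributes all of its values."""
    counts = Counter()
    for doc in docs:
        keys = [tag["key"] for tag in doc.get("tags", [])]
        values = [tag["value"] for tag in doc.get("tags", [])]
        if key in keys:                      # the document matches tags.key
            for value in sorted(set(values)):
                counts[value] += 1           # terms agg counts root documents
    return dict(counts)

docs = [
    {"tags": [{"key": "tag1", "value": "val1"}, {"key": "tag2", "value": "val2"}]},
    {"tags": [{"key": "tag1", "value": "val3"}]},
]

print(distinct_values_nested(docs, "tag1"))     # {'val1': 1, 'val3': 1}
print(distinct_values_flattened(docs, "tag1"))  # {'val1': 1, 'val2': 1, 'val3': 1}
```

The flattened version wrongly reports val2, which belongs to tag2. That is the cross-object mismatch the nested type prevents.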

Try a bool query inside the filter aggregation (the filter aggregation still needs a name of its own):
{
  "size": 0,
  "aggs": {
    "my_tags": {
      "nested": {
        "path": "tags"
      },
      "aggs": {
        "my_filter": {
          "filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "tags.key": "tag1"
                  }
                }
              ]
            }
          },
          "aggs": {
            "my_tags_values": {
              "terms": {
                "field": "tags.value",
                "size": 9999
              }
            }
          }
        }
      }
    }
  }
}
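BTW: in Elasticsearch 1.x and 2.x you could set the terms aggregation "size" to 0 to retrieve all buckets. That option was removed in 5.0 (the version used in the question), so there you have to pass an explicit, sufficiently large size such as 9999. From 6.1 on, the composite aggregation can page through all buckets instead.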
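If the request is built from code, a small helper can enforce both fixes: the filter aggregation gets its own name, and the terms size is an explicit positive number. The sketch below is a hypothetical helper that only builds the JSON body as a Python dict. Sending it (with an official client or plain HTTP) is left out.

```python
import json

def distinct_tag_values_body(tag_key, max_values=9999):
    """Build the nested -> named filter -> terms request body (hypothetical helper)."""
    if max_values <= 0:
        # "size": 0 meant "all buckets" only before Elasticsearch 5.0
        raise ValueError("terms size must be a positive number")
    return {
        "size": 0,
        "aggs": {
            "my_tags": {
                "nested": {"path": "tags"},
                "aggs": {
                    "my_filter": {  # the filter aggregation needs a name of its own
                        "filter": {"term": {"tags.key": tag_key}},
                        "aggs": {
                            "my_tags_values": {
                                "terms": {"field": "tags.value", "size": max_values}
                            }
                        },
                    }
                },
            }
        },
    }

print(json.dumps(distinct_tag_values_body("tag1"), indent=2))
```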

Related

ELASTICSEARCH - Ordering aggregation by date on nested field

I am developing a query where I count how many unique "cp" the most recent document contains.
The json is made up of several nested fields.
I am having trouble returning only the values from the document with the most recent date once nested fields are involved.
I use nested aggregations and, at the end, a top_hits aggregation sorted in descending order with "size": 1, so that it returns only the latest document.
But the results still include documents with different dates.
JSON:
{
  "data": [
    {
      "addresses": [
        {
          "cp": "33.33.33",
          "services": [
            {
              "field1": "true",
              "field2": "1234"
            }
          ]
        }
      ]
    }
  ],
  "created_at": "2020-09-03 14:39:01",
  "@timestamp": "2020-09-04T05:53:22.341661Z"
}
QUERY:
{
  "size": 0,
  "aggs": {
    "nested": {
      "nested": {
        "path": "data.addresses"
      },
      "aggs": {
        "nested": {
          "nested": {
            "path": "data.addresses.services"
          },
          "aggs": {
            "filter": {
              "filter": {
                "term": {
                  "data.addresses.services.field1.keyword": "true"
                }
              },
              "aggs": {
                "unique": {
                  "cardinality": {
                    "field": "data.addresses.services.field2.keyword"
                  }
                },
                "range": {
                  "top_hits": {
                    "size": 1,
                    "sort": [
                      { "created_at.keyword": { "order": "desc" } }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
I have tried sorting by the predefined field "created_at" and by @timestamp, but the result is the same.
Any advice that can help me to solve my problem?
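In this case the solution is to drop top_hits. top_hits only chooses which document to display; it does not restrict which documents the cardinality aggregation counts. Instead, wrap the nested aggregations in a terms aggregation on created_at.keyword with
"order": { "_key": "desc" }
and "size": 1, so that only the bucket for the most recent date is kept. This works because dates in the "yyyy-MM-dd HH:mm:ss" format sort lexicographically in chronological order.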
QUERY
{
  "size": 0,
  "aggs": {
    "filtrofecha": {
      "terms": {
        "field": "created_at.keyword",
        "order": {
          "_key": "desc"
        },
        "size": 1
      },
      "aggs": {
        "nested": {
          "nested": {
            "path": "data.addresses"
          },
          "aggs": {
            "nested": {
              "nested": {
                "path": "data.addresses.services"
              },
              "aggs": {
                "filter": {
                  "filter": {
                    "term": {
                      "data.addresses.services.field1.keyword": "true"
                    }
                  },
                  "aggs": {
                    "unique": {
                      "cardinality": {
                        "field": "data.addresses.services.field2.keyword"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
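The idea behind this fix can be checked without a cluster. The minimal Python sketch below runs on hypothetical in-memory documents shaped like the JSON in the question. It keeps only the documents whose created_at equals the largest string value, which is what a terms aggregation on created_at.keyword ordered by _key desc with size 1 selects. It then counts the distinct field2 values among services with field1 == "true", which covers the filter and cardinality part. Note that Elasticsearch's cardinality aggregation is approximate, while the set here is exact; make_doc is a hypothetical helper.

```python
def latest_unique_field2(docs):
    """Count distinct field2 values in the documents sharing the latest created_at."""
    if not docs:
        return None, 0
    # "yyyy-MM-dd HH:mm:ss" strings sort lexicographically in time order,
    # so max() on the raw string picks the most recent date.
    latest = max(doc["created_at"] for doc in docs)
    unique = set()
    for doc in docs:
        if doc["created_at"] != latest:
            continue  # with size 1, only the top terms bucket survives
        for item in doc["data"]:
            for address in item["addresses"]:           # nested: data.addresses
                for service in address["services"]:     # nested: data.addresses.services
                    if service["field1"] == "true":     # filter on field1
                        unique.add(service["field2"])   # cardinality on field2
    return latest, len(unique)

def make_doc(created_at, *services):
    """Hypothetical helper building a document shaped like the question's JSON."""
    return {
        "created_at": created_at,
        "data": [{"addresses": [{"cp": "33.33.33", "services": list(services)}]}],
    }

docs = [
    make_doc("2020-09-03 14:39:01", {"field1": "true", "field2": "1234"}),
    make_doc("2020-09-04 08:00:00",
             {"field1": "true", "field2": "1234"},
             {"field1": "true", "field2": "5678"},
             {"field1": "false", "field2": "9999"}),
]
print(latest_unique_field2(docs))  # ('2020-09-04 08:00:00', 2)
```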

Aggregations on an array in a nested query

I am trying to query for all users that have at least one color in common with a particular user, and I have been able to do that. However, I cannot figure out how to aggregate my results so that I get each user along with the colors they have in common.
Part of my document for a sample user is as follows:
{
  // ... other fields
  "colors" : [
    {
      "id" : 1,
      "name" : "Green"
    },
    {
      "id" : 7,
      "name" : "Blue"
    }
  ]
}
This is my query for getting the colors in common with another User that has the colors Red, Orange and Green:
{
  "query": {
    "nested": {
      "path": "colors",
      "scoreMode": "sum",
      "query": {
        "function_score": {
          "filter": {
            "terms": {
              "colors.name": [
                "Red","Orange","Green"
              ]
            }
          },
          "functions": [
            // Functions here for custom scoring
          ]
        }
      }
    }
  }
}
How can I aggregate the Users with the colors in common?
You need a nested aggregation, then a filter aggregation on the colors, and finally top_hits to get the matching colors. Source filtering keeps only the color name in each hit. From the outside in, the aggregations are:
- a terms aggregation that groups by user (a unique name or user_id),
- a nested aggregation that steps into the colors objects,
- a filter aggregation that keeps only the colors you are matching,
- a top_hits aggregation that returns those matched colors.
This is the query (using the colors.name field from the question):
{
  "size": 0,
  "query": {
    "nested": {
      "path": "colors",
      "query": {
        "terms": {
          "colors.name": [
            "Green",
            "Red"
          ]
        }
      }
    }
  },
  "aggs": {
    "user": {
      "terms": {
        "field": "name",
        "size": 10
      },
      "aggs": {
        "nested_color_path": {
          "nested": {
            "path": "colors"
          },
          "aggs": {
            "match_color": {
              "filter": {
                "terms": {
                  "colors.name": [
                    "Green",
                    "Red"
                  ]
                }
              },
              "aggs": {
                "get_match_color": {
                  "top_hits": {
                    "size": 10,
                    "_source": {
                      "include": "name"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
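As a sanity check of what that aggregation tree returns, here is a pure-Python sketch over hypothetical user documents. The "name" and "colors" fields follow the question; the data is made up. It keeps users with at least one matching color (the nested query), groups by user (the terms aggregation), and lists the colors that pass the filter (what top_hits shows). The sketch assumes user names are unique, whereas a real terms aggregation would merge users sharing a name.

```python
def common_colors_by_user(users, wanted):
    """Return {user name: [matching color names]} for users sharing any wanted color."""
    wanted = set(wanted)
    result = {}
    for user in users:
        # filter agg inside the nested agg: keep only the matching color objects
        matches = [c["name"] for c in user["colors"] if c["name"] in wanted]
        if matches:                          # nested query: at least one color matches
            result[user["name"]] = matches   # terms bucket + top_hits contents
    return result

users = [
    {"name": "alice", "colors": [{"id": 1, "name": "Green"}, {"id": 7, "name": "Blue"}]},
    {"name": "bob", "colors": [{"id": 2, "name": "Red"}, {"id": 1, "name": "Green"}]},
    {"name": "carol", "colors": [{"id": 7, "name": "Blue"}]},
]
print(common_colors_by_user(users, ["Green", "Red"]))
# {'alice': ['Green'], 'bob': ['Red', 'Green']}
```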
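You have to use nested aggregations to achieve this. The order clause on users_with_common_colors sorts users by the doc count of the common filter bucket, that is, by the number of colors they share, in descending order. Note that "size": 0 (return all buckets) only works in Elasticsearch 1.x and 2.x; from 5.0 on you must pass an explicit size. See the query below: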
POST <index>/<type>/_search
{
  "query": {
    "nested": {
      "path": "colors",
      "query": {
        "terms": {
          "colors.name": [
            "Red",
            "Orange",
            "Green"
          ]
        }
      }
    }
  },
  "aggs": {
    "users_with_common_colors": {
      "terms": {
        "field": "user_id",
        "size": 0,
        "order": {
          "color_distribution>common": "desc"
        }
      },
      "aggs": {
        "color_distribution": {
          "nested": {
            "path": "colors"
          },
          "aggs": {
            "common": {
              "filter": {
                "terms": {
                  "colors.name": [
                    "Red",
                    "Orange",
                    "Green"
                  ]
                }
              },
              "aggs": {
                "colors": {
                  "terms": {
                    "field": "colors.name",
                    "size": 0
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
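The ranking that the order clause produces can be reproduced in a few lines of Python on hypothetical data (user_id and colors as in the question). This is a sketch of the semantics only. Ties are broken by user_id purely to make the output deterministic; that is an assumption of the sketch, not a statement about Elasticsearch's tie-breaking.

```python
from collections import Counter

def rank_users_by_common_colors(users, wanted):
    """Return [(user_id, common_count, color_counts)] sorted by common_count desc."""
    wanted = set(wanted)
    ranking = []
    for user in users:
        # filter agg inside the nested agg: each matching color object counts once
        matching = [c["name"] for c in user["colors"] if c["name"] in wanted]
        if matching:  # the top-level nested query drops users with no match
            ranking.append((user["user_id"], len(matching), Counter(matching)))
    # mirrors "order": {"color_distribution>common": "desc"}
    ranking.sort(key=lambda row: (-row[1], row[0]))
    return ranking

users = [
    {"user_id": 1, "colors": [{"id": 1, "name": "Green"}, {"id": 7, "name": "Blue"}]},
    {"user_id": 2, "colors": [{"id": 3, "name": "Red"}, {"id": 4, "name": "Orange"},
                              {"id": 1, "name": "Green"}]},
    {"user_id": 3, "colors": [{"id": 7, "name": "Blue"}]},
]
for user_id, count, colors in rank_users_by_common_colors(users, ["Red", "Orange", "Green"]):
    print(user_id, count, dict(colors))
# 2 3 {'Red': 1, 'Orange': 1, 'Green': 1}
# 1 1 {'Green': 1}
```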

For each country/colour/brand combination, find the sum of the number of items in Elasticsearch

This is a portion of the data I have indexed in elasticsearch:
{
  "country" : "India",
  "colour" : "white",
  "brand" : "sony",
  "numberOfItems" : 3
}
I want to get the total sum of numberOfItems on a per country basis, per colour basis and per brand basis. Is there any way to do this in elasticsearch?
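The following should get you straight to the answer.
Make sure you enable scripting before using it. The script reads the "colour" field (as spelled in the document), and the '|' separators keep different combinations from producing the same key, for example "ab" + "c" and "a" + "bc":
{
  "aggs": {
    "keys": {
      "terms": {
        "script": "doc['country'].value + '|' + doc['colour'].value + '|' + doc['brand'].value"
      },
      "aggs": {
        "keySum": {
          "sum": {
            "field": "numberOfItems"
          }
        }
      }
    }
  }
}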
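The scripted terms aggregation builds one string key per combination and sums numberOfItems inside each bucket. A minimal Python sketch of the same grouping on hypothetical sample documents shows why a separator between the values matters: without one, distinct combinations can collapse into the same key.

```python
from collections import defaultdict

def sum_by_combination(docs, sep="|"):
    """Sum numberOfItems per country/colour/brand key, like the scripted terms agg."""
    totals = defaultdict(int)
    for doc in docs:
        key = sep.join([doc["country"], doc["colour"], doc["brand"]])
        totals[key] += doc["numberOfItems"]  # the sum sub-aggregation per bucket
    return dict(totals)

docs = [
    {"country": "India", "colour": "white", "brand": "sony", "numberOfItems": 3},
    {"country": "India", "colour": "white", "brand": "sony", "numberOfItems": 2},
    {"country": "ab", "colour": "c", "brand": "x", "numberOfItems": 1},
    {"country": "a", "colour": "bc", "brand": "x", "numberOfItems": 4},
]
print(sum_by_combination(docs))
# {'India|white|sony': 5, 'ab|c|x': 1, 'a|bc|x': 4}
print(sum_by_combination(docs, sep=""))
# {'Indiawhitesony': 5, 'abcx': 5}  (two different combinations merged)
```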
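To get a single result, you may apply a sum aggregation to a filtered query with a term (or terms) filter, for example (the filtered query was removed in Elasticsearch 5.0; there, put the term query in the filter clause of a bool query instead):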
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "country": "India"
        }
      }
    }
  },
  "aggs": {
    "total_sum": {
      "sum": {
        "field": "numberOfItems"
      }
    }
  }
}
To get statistics for all countries, colours and brands in a single pass over the data, you may use the following query with three multi-bucket terms aggregations, each of them containing a sum metric sub-aggregation:
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "countries": {
      "terms": {
        "field": "country"
      },
      "aggs": {
        "country_sum": {
          "sum": {
            "field": "numberOfItems"
          }
        }
      }
    },
    "colours": {
      "terms": {
        "field": "colour"
      },
      "aggs": {
        "colour_sum": {
          "sum": {
            "field": "numberOfItems"
          }
        }
      }
    },
    "brands": {
      "terms": {
        "field": "brand"
      },
      "aggs": {
        "brand_sum": {
          "sum": {
            "field": "numberOfItems"
          }
        }
      }
    }
  }
}
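The single-pass idea can be sketched in Python on hypothetical sample documents. There is one bucket map per dimension, mirroring three sibling terms aggregations that each hold a sum sub-aggregation.

```python
from collections import defaultdict

def sums_per_dimension(docs, dimensions=("country", "colour", "brand")):
    """One pass over the docs, one bucket map per dimension, each bucket summing numberOfItems."""
    result = {dim: defaultdict(int) for dim in dimensions}
    for doc in docs:
        for dim in dimensions:
            result[dim][doc[dim]] += doc["numberOfItems"]
    return {dim: dict(buckets) for dim, buckets in result.items()}

docs = [
    {"country": "India", "colour": "white", "brand": "sony", "numberOfItems": 3},
    {"country": "India", "colour": "black", "brand": "lg", "numberOfItems": 2},
    {"country": "Japan", "colour": "white", "brand": "sony", "numberOfItems": 5},
]
print(sums_per_dimension(docs))
# {'country': {'India': 5, 'Japan': 5}, 'colour': {'white': 8, 'black': 2}, 'brand': {'sony': 8, 'lg': 2}}
```

Unlike the dict here, each terms aggregation returns only its top 10 buckets by default, so set "size" on it if there are more countries, colours or brands than that.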

Using aggregation with filters in elastic search

I have Elasticsearch running with documents like this one:
{
  "id": 1,
  "price": 620000,
  "propertyType": "HO",
  "location": {
    "lat": 51.41999,
    "lon": -0.14426
  },
  "active": true,
  "rentOrSale": "S"
}
I'm trying to use aggregations to get statistics about a certain area, and the query I'm using is the following:
{
  "sort": [
    {
      "id": "desc"
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "rentOrSale": "s"
          }
        },
        {
          "term": {
            "active": true
          }
        }
      ]
    },
    "filtered": {
      "filter": {
        "and": [
          {
            "geo_distance": {
              "distance": "15.0mi",
              "location": {
                "lat": 51.50735,
                "lon": -0.12776
              }
            }
          }
        ]
      }
    }
  },
  "aggs": {
    "propertytype_agg": {
      "terms": {
        "field": "propertyType"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    },
    "bed_agg": {
      "terms": {
        "field": "numberOfBedrooms"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
But I can't see the aggregations in the result. As soon as I remove either the bool or the filtered part of the query, the aggregations appear. I can't figure out why this is happening, nor how to get the aggregations for these filters. I've tried using the answer to this question, but I haven't been able to solve it. Any ideas?
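I think your query needs to be slightly rearranged. A query object should contain exactly one query type, so "bool" and "filtered" cannot sit next to each other. Move "filtered" up to the top and nest the bool query under its own "query" key:
"query": {
  "filtered": {
    "query": {
      "bool": {
        ...
      }
    },
    "filter": {
      ...
    }
  }
}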
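The filtered query was removed in Elasticsearch 5.0. On 5.0 and later, the same rearrangement is written as a single bool query whose filter clause holds the geo_distance condition. Below is a sketch that builds that body as a Python dict. The helper name is hypothetical; the field names and values come from the question.

```python
import json

def area_stats_body(lat, lon, distance="15.0mi"):
    """Hypothetical helper: the question's query as one bool query (Elasticsearch 5.0+ syntax)."""
    avg_price = {"avg_price": {"avg": {"field": "price"}}}
    return {
        "sort": [{"id": "desc"}],
        "query": {
            "bool": {
                # scored clauses, as in the original bool query
                "must": [
                    {"term": {"rentOrSale": "s"}},
                    {"term": {"active": True}},
                ],
                # non-scoring clause that takes over the old "filtered" filter
                "filter": [
                    {"geo_distance": {"distance": distance,
                                      "location": {"lat": lat, "lon": lon}}}
                ],
            }
        },
        "aggs": {
            "propertytype_agg": {"terms": {"field": "propertyType"}, "aggs": avg_price},
            "bed_agg": {"terms": {"field": "numberOfBedrooms"}, "aggs": avg_price},
        },
    }

print(json.dumps(area_stats_body(51.50735, -0.12776), indent=2))
```

Keeping exactly one key ("bool") under "query" avoids the sibling-query problem from the question. The aggregations then run over whatever that bool query matches.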

Multiple filters and an aggregate in elasticsearch

How can I use a filter in connection with an aggregate in elasticsearch?
The official documentation gives only trivial examples for filters and for aggregations, and no formal description of the query DSL (compare it, for example, with the PostgreSQL documentation).
Through trial and error I found the following query, which Elasticsearch accepts (no parsing errors) but which ignores the given filters:
{
  "filter": {
    "and": [
      {
        "term": {
          "_type": "logs"
        }
      },
      {
        "term": {
          "dc": "eu-west-12"
        }
      },
      {
        "term": {
          "status": "204"
        }
      },
      {
        "range": {
          "@timestamp": {
            "from": 1398169707,
            "to": 1400761707
          }
        }
      }
    ]
  },
  "size": 0,
  "aggs": {
    "time_histo": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h"
      },
      "aggs": {
        "name": {
          "percentiles": {
            "field": "upstream_response_time",
            "percents": [
              98.0
            ]
          }
        }
      }
    }
  }
}
Some people suggest using query instead of filter. But the official documentation generally recommends the opposite for filtering on exact values. Another issue with query: while filters offer an and, query does not.
Can somebody point me to documentation, a blog or a book that describes writing non-trivial queries, with at least an aggregation plus multiple filters?
I ended up using a filter aggregation, not a filtered query, so now I have three nested aggs elements.
I also use a bool filter instead of and, as recommended by @alex-brasetvik, because of http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
My final implementation:
{
  "aggs": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "_type": "logs"
              }
            },
            {
              "term": {
                "dc": "eu-west-12"
              }
            },
            {
              "term": {
                "status": "204"
              }
            },
            {
              "range": {
                "@timestamp": {
                  "from": 1398176502000,
                  "to": 1400768502000
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "time_histo": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "1h"
          },
          "aggs": {
            "name": {
              "percentiles": {
                "field": "upstream_response_time",
                "percents": [
                  98.0
                ]
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
Put your filter in a filtered query.
The top-level filter is for filtering search hits only, and not facets/aggregations. It was renamed to post_filter in 1.0 due to this quite common confusion.
Also, you might want to look into this post on why you often want to use bool and not and/or: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
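The difference between a top-level filter (post_filter) and a filter that is part of the query can be shown with a tiny Python model of one search request. This is a simplified sketch on made-up log documents, not Elasticsearch's actual code path. Aggregations are computed on everything the query matches, and post_filter is applied afterwards to the hits only.

```python
from collections import Counter

def search(docs, query=None, post_filter=None, agg_field="status"):
    """Simplified model of one search request: query -> aggs, then post_filter -> hits."""
    matched = [d for d in docs if query is None or query(d)]
    aggs = dict(Counter(d[agg_field] for d in matched))  # aggregations see every query match
    hits = [d for d in matched if post_filter is None or post_filter(d)]  # post_filter trims hits only
    return hits, aggs

def in_dc(doc):
    return doc["dc"] == "eu-west-12"

docs = [
    {"dc": "eu-west-12", "status": "204"},
    {"dc": "eu-west-12", "status": "500"},
    {"dc": "us-east-1", "status": "204"},
]

hits, aggs = search(docs, post_filter=in_dc)
print(len(hits), aggs)  # 2 {'204': 2, '500': 1}  (aggs ignore the filter)
hits, aggs = search(docs, query=in_dc)
print(len(hits), aggs)  # 2 {'204': 1, '500': 1}
```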
More on @geekQ's answer: to filter on strings that contain spaces, with several conditions, use match_phrase clauses inside the bool filter:
{
  "aggs": {
    "aggresults": {
      "filter": {
        "bool": {
          "must": [
            {
              "match_phrase": {
                "term_1": "some text with space 1"
              }
            },
            {
              "match_phrase": {
                "term_2": "some text with also space 2"
              }
            }
          ]
        }
      },
      "aggs": {
        "all_term_3s": {
          "terms": {
            "field": "term_3.keyword",
            "size": 10000,
            "order": {
              "_term": "asc"
            }
          }
        }
      }
    }
  },
  "size": 0
}
Just for reference, with version 7.2 I used something like the following to apply multiple filters to an aggregation:
- a filter aggregation to restrict the documents the aggregation sees
- a bool query to build the compound condition
POST movies/_search?size=0
{
  "size": 0,
  "aggs": {
    "test": {
      "filter": {
        "bool": {
          "must": {
            "term": {
              "genre": "action"
            }
          },
          "filter": {
            "range": {
              "year": {
                "gte": 1800,
                "lte": 3000
              }
            }
          }
        }
      },
      "aggs": {
        "year_hist": {
          "histogram": {
            "field": "year",
            "interval": 50
          }
        }
      }
    }
  }
}
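The histogram in that request puts each year into a bucket keyed by floor(year / interval) * interval (with the default offset of 0). Because a histogram's default min_doc_count is 0, empty buckets between the first and last non-empty bucket are returned too. The short Python sketch below runs on made-up movie records and shows the filter plus histogram combination.

```python
import math
from collections import Counter

def filtered_year_histogram(movies, genre, lo, hi, interval=50):
    """Filter agg (genre term + year range) feeding a histogram on year."""
    selected = [m["year"] for m in movies
                if m["genre"] == genre and lo <= m["year"] <= hi]
    counts = Counter(math.floor(y / interval) * interval for y in selected)
    if not counts:
        return {}
    # default min_doc_count is 0: gaps between the first and last bucket appear with 0
    first, last = min(counts), max(counts)
    return {key: counts.get(key, 0) for key in range(first, last + interval, interval)}

movies = [
    {"title": "A", "genre": "action", "year": 1899},
    {"title": "B", "genre": "action", "year": 1955},
    {"title": "C", "genre": "action", "year": 1999},
    {"title": "D", "genre": "action", "year": 2001},
    {"title": "E", "genre": "drama", "year": 2010},
]
print(filtered_year_histogram(movies, "action", 1800, 3000))
# {1850: 1, 1900: 0, 1950: 2, 2000: 1}
```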
