Elasticsearch aggregations: how to get bucket with 'other' results of terms aggregation? - elasticsearch

I use aggregation to collect data from nested field and stuck a little
Example of document:
{
...
rectangle: {
attributes: [
{_id: 'some_id', ...}
]
}
ES allows group data by rectangle.attributes._id, but is there any way to get some 'other' bucket to put there documents that were not added to any of groups? Or maybe there is a way to create query to create bucket for documents by {"rectangle.attributes._id": {$ne: "{currentDoc}.rectangle.attributes._id"}}
I think bucket would be perfect because i need to do further aggregations with 'other' docs.
Or maybe there's some cool workaround
I use query like this for aggregation
"aggs": {
"attributes": {
"nested": {
"path": "rectangle.attributes"
},
"aggs": {
"attributesCount": {
"cardinality": {
"field": "rectangle.attributes._id.keyword"
}
},
"entries": {
"terms": {
"field": "rectangle.attributes._id.keyword"
}
}
}
}
}
And get this result
"buckets" : [
{
"key" : "some_parent_id",
"doc_count" : 27616,
"attributes" : {
"doc_count" : 45,
"entries" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "some_id",
"doc_count" : 45,
"attributeOptionsCount" : {
"value" : 2
}
}
]
}
}
}
]
result like this would be perfect:
"buckets" : [
{
"key" : "some_parent_id",
"doc_count" : 1000,
"attributes" : {
"doc_count" : 145,
"entries" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "some_id",
"doc_count" : 45
},
{
"key" : "other",
"doc_count" : 100
}
]
}
}
}
]

You can make use of missing value parameter. Update aggregation as below:
"aggs": {
"attributes": {
"nested": {
"path": "rectangle.attributes"
},
"aggs": {
"attributesCount": {
"cardinality": {
"field": "rectangle.attributes._id.keyword"
}
},
"entries": {
"terms": {
"field": "rectangle.attributes._id.keyword",
"missing": "other"
}
}
}
}
}

Related

Elasticsearch How to count total docs by date

As my theme, I wanna count docs the day and before by date, it's sample to understand that the chart.
{"index":{"_index":"login-2015.12.23","_type":"logs"}}
{"uid":"1","register_time":"2015-12-23T12:00:00Z","login_time":"2015-12-23T12:00:00Z"}
{"index":{"_index":"login-2015.12.23","_type":"logs"}}
{"uid":"2","register_time":"2015-12-23T12:00:00Z","login_time":"2015-12-23T12:00:00Z"}
{"index":{"_index":"login-2015.12.24","_type":"logs"}}
{"uid":"1","register_time":"2015-12-23T12:00:00Z","login_time":"2015-12-24T12:00:00Z"}
{"index":{"_index":"login-2015.12.25","_type":"logs"}}
{"uid":"1","register_time":"2015-12-23T12:00:00Z","login_time":"2015-12-25T12:00:00Z"}
As you see, index login-2015.12.23 has two docs, index login-2015.12.24 has one doc, index login-2015.12.23 has one doc.
And now I wanna get the result
{
"hits" : {
"total" : 6282,
"max_score" : 1.0,
"hits" : []
},
"aggregations" : {
"group_by_date" : {
"buckets" : [
{
"key_as_string" : "2015-12-23T12:00:00Z",
"key" : 1662163200000,
"doc_count" : 2,
},
{
"key_as_string" : "2015-12-24T12:00:00Z",
"key" : 1662163200000,
"doc_count" : 3,
},
{
"key_as_string" : "2015-12-25T12:00:00Z",
"key" : 1662163200000,
"doc_count" : 4,
}
]
}
If I count the date 2015-12-24T12:00:00Z and it means I must count day 2015-12-23T12:00:00Z and 2015-12-24T12:00:00Z at the same time.
In my project I have many indices like that, and I searching many ways to make this goal come true but not, this is my demo:
{
"query": {"match_all": {}},
"size": 0,
"aggs": {
"group_by_date": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
},
"aggs": {
"intersect": {
"scripted_metric": {
"init_script": "state.inner=[]",
"map_script": "state.inner.add(params.param1 == 3 ? params.param2 * params.param1 : params.param1 * params.param2)",
"combine_script": "return state.inner",
"reduce_script": "return states",
"params": {
"param1": 3,
"param2": 5
}
}
}
}
}
}
}
I wanna group by date, and use scripted_metric to iter the date list, not the second iteration just can in its bucket and not for all the document, so do anyone has better idea to solve this problem?
You can simply use the cumulative sum pipeline aggregation
{
"query": {"match_all": {}},
"size": 0,
"aggs": {
"group_by_date": {
"date_histogram": {
"field": "login_time",
"interval": "day"
},
"aggs": {
"cumulative_docs": {
"cumulative_sum": {
"buckets_path": "_count"
}
}
}
}
}
}
And the results will look like this:
"aggregations" : {
"group_by_date" : {
"buckets" : [
{
"key_as_string" : "2015-12-23T00:00:00.000Z",
"key" : 1450828800000,
"doc_count" : 2,
"cumulative_docs" : {
"value" : 2.0
}
},
{
"key_as_string" : "2015-12-24T00:00:00.000Z",
"key" : 1450915200000,
"doc_count" : 1,
"cumulative_docs" : {
"value" : 3.0
}
},
{
"key_as_string" : "2015-12-25T00:00:00.000Z",
"key" : 1451001600000,
"doc_count" : 1,
"cumulative_docs" : {
"value" : 4.0
}
}
]
}
}

Nested Aggregation for AND Query Not Working

Please can someone help with the below Question.
https://discuss.elastic.co/t/nested-aggregation-with-and-always-return-0-match/315722?u=chattes
I have used following aggregations
1. Terms aggregation
2. Bucket selector
3. Nested aggregation
First I have grouped by user id using terms aggregation. Then further grouped by skill Id. Using bucket selector I have filtered users which have documents under two skills.
Query
GET index5/_search
{
"size": 0,
"aggs": {
"users": {
"terms": {
"field": "id",
"size": 10
},
"aggs": {
"skills": {
"nested": {
"path": "skills"
},
"aggs": {
"filter_skill": {
"terms": {
"field": "skills.id",
"size": 10,
"include": [
553,
426
]
}
}
}
},
"bucket_count": {
"bucket_selector": {
"buckets_path": {
"skill_count": "skills>filter_skill._bucket_count"
},
"script": "params.skill_count ==2"
}
}
}
}
}
}
Results
"aggregations" : {
"users" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 1,
"skills" : {
"doc_count" : 3,
"filter_skill" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "426",
"doc_count" : 1
},
{
"key" : "553",
"doc_count" : 1
}
]
}
}
},
{
"key" : 2,
"doc_count" : 1,
"skills" : {
"doc_count" : 2,
"filter_skill" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "426",
"doc_count" : 1
},
{
"key" : "553",
"doc_count" : 1
}
]
}
}
}
]
}

ELASTICSEARCH - Total doc_count aggregations

I am looking for a way to sum up the total of an aggregation that I have defined in the query.
For example:
{
"name" : false,
"surname" : false
},
{
"name" : false,
"surname" : false
}
Query:
GET index/_search?size=0
{"query": {
"bool": {
"must": [
{"term": {"name": false}},
{"term": {"surname": false}}
]
}
},
"aggs": {
"name": {
"terms": {
"field": "name"
}
},
"surname": {
"terms": {
"field": "surname"
}
}
}
}
The query returns the value for each field "name" and "surname" with value "false".
"aggregations" : {
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 0,
"key_as_string" : "false",
"doc_count" : 2 <---------
}
]
},
"surname" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 0,
"key_as_string" : "false",
"doc_count" : 2 <---------
}
]
}
}
}
Is it possible to return the total sum of doc_count, so that in this situation it would be "doc_count" : 2 + "doc_count" : 2 == 4?
I've been trying to do it with script but since they are boolean values it doesn't work.
The functionality that most closely resembles the solution I am looking for is sum_bucket.
GET index/_search?filter_path=aggregations
{
"aggs": {
"surname_field": {
"terms": {
"field": "surname",
"size": 1
}
},
"sum": {
"sum_bucket" : {
"buckets_path":"surname_field>_count"
}
}
}
}
For this specific case where it is a simple JSON, the result of the query is the same as the hits.total.value (number of documents) with filtering to boolean field surname:false or name:false.
But for situations with Json with more fields we can specify the number of times we have a result in our database.
With this result I wanted to find the total number of hits and not the number of documents in the result.

Elasticsearch aggregation on different search in same query

I want to make a query to aggregate base only on match no matter what other parameters(terms , term , etc...) are used.
To be more specific I have an online shop where I use multiple filters (color ,size etc..) If I check a field for example color : red the other colors are no longer aggregated.
A solution that I am using is to make 2 separated queries (one for search where filters are applied and other for aggregation. Any idea how can I combine the 2 separated queries ?
You can take advantage of post_filter which will not apply to your aggregations but will only filter the to-be-returned hits. For example:
Create a shop
PUT online_shop
{
"mappings": {
"properties": {
"color": {
"type": "keyword"
},
"size": {
"type": "integer"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Populate it w/ a few products
POST online_shop/_doc
{"color":"red","size":35,"name":"Louboutin High heels abc"}
POST online_shop/_doc
{"color":"black","size":34,"name":"Louboutin Boots abc"}
POST online_shop/_doc
{"color":"yellow","size":36,"name":"XYZ abc"}
Apply a shared query to the hits as well as aggregations and use post_filter to ... post-filter the hits:
GET online_shop/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "abc"
}
}
]
}
},
"aggs": {
"by_color": {
"terms": {
"field": "color"
}
},
"by_size": {
"terms": {
"field": "size"
}
}
},
"post_filter": {
"bool": {
"must": [
{
"term": {
"color": {
"value": "red"
}
}
}
]
}
}
}
Expected result
{
...
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.11750763,
"hits" : [
{
"_index" : "online_shop",
"_type" : "_doc",
"_id" : "cehma3IBG_KW3EFn1QYa",
"_score" : 0.11750763,
"_source" : {
"color" : "red",
"size" : 35,
"name" : "Louboutin High heels abc"
}
}
]
},
"aggregations" : {
"by_color" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "black",
"doc_count" : 1
},
{
"key" : "red",
"doc_count" : 1
},
{
"key" : "yellow",
"doc_count" : 1
}
]
},
"by_size" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 34,
"doc_count" : 1
},
{
"key" : 35,
"doc_count" : 1
},
{
"key" : 36,
"doc_count" : 1
}
]
}
}
}

Can I get a clean data structure with aggregations

I'm trying to create an aggregation but the results are bloated with metadata and not fits my use case.
This is my aggregation definition;
"aggs": {
"attributes": {
"nested": {
"path": "attributes"
},
"aggs": {
"facet_name": {
"terms": {
"field": "attributes.name.keyword"
},
"aggs": {
"facet_value": {
"terms": {
"field": "attributes.value.keyword"
}
}
}
}
}
}
},
I try to get a data structure similar to this;
[{
"name": "Materiał",
"values": ["stal", "drewno"...]
},
{
"name": "Kolor",
"values": ["czarny", "kolorowy"...]
]
Instead of this results set below which is the current aggregation response;
"aggregations" : {
"attributes" : {
"doc_count" : 142307,
"facet_name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 38074,
"buckets" : [
{
"key" : "Materiał",
"doc_count" : 21811,
"facet_value" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 4977,
"buckets" : [
{
"key" : "stal",
"doc_count" : 3141
},
{
"key" : "drewno",
"doc_count" : 2944
},
{
"key" : "szkło",
"doc_count" : 2885
},
{
"key" : "tworzywo sztuczne",
"doc_count" : 1529
},
{
"key" : "metal",
"doc_count" : 1303
},
...
This is the closest result that I could get.
I couldn't find how to restructure the resulting object or remove the metadata from aggregations.
Unfortuanetly you can not change the structure of the response body to fulfill your desired result. This is just how the Elasticsearch REST API is implemented.
You would have to iterate over the buckets array and create your own structure/object by extracting the particular values.

Resources