Elasticsearch bucketing and add-to-list

How do I bucket on one field and then aggregate all the values of a different field into an array? Here's a sample list of documents.
{
"product": "xyz",
"action": "add",
"user": "bob"
},
{
"product": "xyz",
"action": "update",
"user": "bob"
},
{
"product": "xyz",
"action": "add",
"user": "alice"
},
{
"product": "xyz",
"action": "add",
"user": "eve"
},
{
"product": "xyz",
"action": "delete",
"user": "eve"
}
Expected output:
{
"buckets": [
{
"key": "add",
"doc_count": 3,
"user": ["bob", "alice", "eve"]
},
{
"key": "update",
"doc_count": 1,
"user": ["bob"]
},
{
"key": "delete",
"doc_count": 1,
"user": ["eve"]
}
]
}
How can I push the user values into an array in each bucket? Is there something similar to MongoDB's $push or $addToSet in Elasticsearch aggregations? I'd appreciate the help.
Here's the work-in-progress aggregation.
{
"size": 0,
"aggs": {
"product_filter": {
"filter": {
"term": {
"product": "xyz"
}
},
"aggs": {
"group_by_action": {
"terms": {
"field": "action",
"size":1000,
"order": {
"_count": "desc"
}
}
}
}
}
}
}

Would this do? I just chained one more terms aggregation under the existing one, as shown below:
Aggregation Query:
POST <your_index_name>/_search
{
"size": 0,
"aggs": {
"product_filter": {
"filter": {
"term": {
"product": "xyz"
}
},
"aggs": {
"group_by_action": {
"terms": {
"field": "action",
"size":1000,
"order": {
"_count": "desc"
}
},
"aggs": {
"myUsers": {
"terms": {
"field": "user",
"size": 10
}
}
}
}
}
}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"product_filter" : {
"doc_count" : 5,
"group_by_action" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "add",
"doc_count" : 3,
"myUsers" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "alice",
"doc_count" : 1
},
{
"key" : "bob",
"doc_count" : 1
},
{
"key" : "eve",
"doc_count" : 1
}
]
}
},
{
"key" : "delete",
"doc_count" : 1,
"myUsers" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "eve",
"doc_count" : 1
}
]
}
},
{
"key" : "update",
"doc_count" : 1,
"myUsers" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "bob",
"doc_count" : 1
}
]
}
}
]
}
}
}
}
I'm not sure if it is possible to have them in a single list as you've mentioned.
Hope this helps!
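If a single list per bucket is needed, one option is to reshape the response on the client. The following is a minimal Python sketch, not an Elasticsearch feature: the function name users_per_action is hypothetical, and sample_response is a trimmed copy of the response above. Note that the lists contain each user once per action (closer to a set than to MongoDB's $push), and that the inner terms aggregation returns at most "size" users per bucket.

```python
def users_per_action(response):
    """Collapse the nested "myUsers" buckets into a plain list per action bucket."""
    action_buckets = (
        response["aggregations"]["product_filter"]["group_by_action"]["buckets"]
    )
    return [
        {
            "key": bucket["key"],
            "doc_count": bucket["doc_count"],
            # Each sub-bucket key is one distinct user seen for this action.
            "user": [user_bucket["key"] for user_bucket in bucket["myUsers"]["buckets"]],
        }
        for bucket in action_buckets
    ]


# Trimmed copy of the aggregation response shown above (assumption: same shape).
sample_response = {
    "aggregations": {"product_filter": {"group_by_action": {"buckets": [
        {"key": "add", "doc_count": 3, "myUsers": {"buckets": [
            {"key": "alice", "doc_count": 1},
            {"key": "bob", "doc_count": 1},
            {"key": "eve", "doc_count": 1},
        ]}},
        {"key": "delete", "doc_count": 1, "myUsers": {"buckets": [
            {"key": "eve", "doc_count": 1},
        ]}},
        {"key": "update", "doc_count": 1, "myUsers": {"buckets": [
            {"key": "bob", "doc_count": 1},
        ]}},
    ]}}}
}

flattened = users_per_action(sample_response)
print(flattened)
```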

Related

Is this Elasticsearch aggregation possible?

I have a question about aggregations.
I want to aggregate on a field that holds an array of values.
The aggregation should not be per element; it should treat the whole array as a single value.
I have the following mapping and documents:
PUT value-list-index
{
"mappings": {
"properties": {
"server": {
"type": "keyword"
},
"users": {
"type": "keyword",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
PUT value-list-index/_doc/1
{
"server": "server1",
"users": ["user1"]
}
PUT value-list-index/_doc/2
{
"server": "server2",
"users": ["user1","user2"]
}
PUT value-list-index/_doc/3
{
"server": "server3",
"users": ["user2", "user3"]
}
PUT value-list-index/_doc/4
{
"server": "server4",
"users": ["user1","user2", "user3","user4"]
}
PUT value-list-index/_doc/5
{
"server": "server5",
"users": ["user2", "user3","user4"]
}
PUT value-list-index/_doc/6
{
"server": "server6",
"users": ["user3","user4"]
}
PUT value-list-index/_doc/7
{
"server": "server7",
"users": ["user1","user2", "user3","user4"]
}
PUT value-list-index/_doc/8
{
"server": "server8",
"users": ["user1","user2", "user3","user4"]
}
PUT value-list-index/_doc/9
{
"server": "server9",
"users": ["user1","user2", "user3","user4"]
}
GET value-list-index/_search
{
"size" : 0,
"aggs": {
"words": {
"terms": {
"field": "users"
},
"aggs": {
"total": {
"value_count": {
"field": "users"
}
}
}
}
}
}
I want the following:
"aggregations" : {
"words" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "user1",
"doc_count" : 1,
"total" : {
"value" : xx
}
},
{
"key" : "user1","user2",
"doc_count" : 1,
"total" : {
"value" : xx
}
},
{
"key" : "user1","user2","user3","user4",
"doc_count" : 4,
"total" : {
"value" : xx
}
}
]
}
}
But the query returns a per-element grouping like this:
"aggregations" : {
"words" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "user2",
"doc_count" : 7,
"total" : {
"value" : 23
}
},
{
"key" : "user3",
"doc_count" : 7,
"total" : {
"value" : 23
}
},
{
"key" : "user1",
"doc_count" : 6,
"total" : {
"value" : 19
}
},
{
"key" : "user4",
"doc_count" : 6,
"total" : {
"value" : 21
}
}
]
}
}
Is the aggregation I want possible?
Maybe this aggregation can help you: the frequent items aggregation.
But be careful with the performance.
Look at these results:
"aggregations": {
"words": {
"buckets": [
{
"key": {
"users": [
"user2"
]
},
"doc_count": 7,
"support": 0.7777777777777778
},
{
"key": {
"users": [
"user2",
"user3"
]
},
"doc_count": 6,
"support": 0.6666666666666666
},
{
"key": {
"users": [
"user3",
"user4"
]
},
"doc_count": 6,
"support": 0.6666666666666666
},
{
"key": {
"users": [
"user1"
]
},
"doc_count": 6,
"support": 0.6666666666666666
},
{
"key": {
"users": [
"user2",
"user3",
"user4"
]
},
"doc_count": 5,
"support": 0.5555555555555556
},
{
"key": {
"users": [
"user2",
"user1"
]
},
"doc_count": 5,
"support": 0.5555555555555556
},
{
"key": {
"users": [
"user2",
"user3",
"user4",
"user1"
]
},
"doc_count": 4,
"support": 0.4444444444444444
}
]
}
}
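The frequent items output above counts every subset, not only the exact arrays. If exact whole-array grouping is what is needed and the data set is small enough to fetch, it can be done on the client. This is a minimal Python sketch under that assumption; count_user_sets is a hypothetical helper, and docs mirrors the nine documents indexed in the question.

```python
from collections import Counter

# The nine documents from the question (assumption: already fetched from the index).
docs = [
    {"server": "server1", "users": ["user1"]},
    {"server": "server2", "users": ["user1", "user2"]},
    {"server": "server3", "users": ["user2", "user3"]},
    {"server": "server4", "users": ["user1", "user2", "user3", "user4"]},
    {"server": "server5", "users": ["user2", "user3", "user4"]},
    {"server": "server6", "users": ["user3", "user4"]},
    {"server": "server7", "users": ["user1", "user2", "user3", "user4"]},
    {"server": "server8", "users": ["user1", "user2", "user3", "user4"]},
    {"server": "server9", "users": ["user1", "user2", "user3", "user4"]},
]


def count_user_sets(documents):
    """Count documents per whole "users" value, ignoring order and duplicates."""
    return Counter(tuple(sorted(set(doc["users"]))) for doc in documents)


user_set_counts = count_user_sets(docs)
for users, doc_count in user_set_counts.most_common():
    print(users, doc_count)
```

Sorting and de-duplicating each array before counting makes ["user1", "user2"] and ["user2", "user1"] fall into the same group; drop that step if order matters.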

Multiple field aggregation on documents with multiple elements gives unexpected result

I have documents with the following structure (very much simplified for the example):
"documents": [
{
"name": "Document 1",
"collections" : [
{
"id": 30,
"title" : "Research"
},
{
"id": 45,
"title" : "Events"
},
{
"id" : 52,
"title" : "International"
}
]
},
{
"name": "Document 2",
"collections" : [
{
"id": 45,
"title" : "Events"
},
{
"id" : 63,
"title" : "Development"
}
]
}
]
I want an aggregation of the collections. It works fine when I do it like this:
"aggs": {
"collections": {
"terms": {
"field": "collections.title",
"size": 30
}
}
}
I get a nice result as expected:
"aggregations" : {
"collections" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Research",
"doc_count" : 18
},
{
"key" : "Events",
"doc_count" : 14
},
{
"key" : "International",
"doc_count" : 13
},
{
"key" : "Development",
"doc_count" : 8
}
]
}
}
However, I want the id included as well. So I tried this:
"aggs": {
"collections": {
"terms": {
"field": "collections.title",
"size": 30
},
"aggs": {
"id": {
"terms": {
"field": "collections.id",
"size": 1
}
}
}
}
}
This is the result:
"aggregations" : {
"collections" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Research",
"doc_count" : 18,
"id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "30",
"doc_count" : 1
}
]
}
},
{
"key" : "Events",
"doc_count" : 14,
"id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "45",
"doc_count" : 1
}
]
}
},
{
"key" : "International",
"doc_count" : 13,
"id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "52",
"doc_count" : 1
}
]
}
},
{
"key" : "Development",
"doc_count" : 8,
"id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "45",
"doc_count" : 1
}
]
}
}
]
}
}
At a glance it looks good. But take a closer look at the last element, Development (scroll down). The id should be 63, but it is 45.
I have a vague idea why this happens, but I cannot find a solution for it. I also tried multi_terms, but it gives a similar result. I think the issue has to do with the fact that there are multiple collections within one document.
Does anyone know the correct way to solve this issue?
The reason is that in an object type mapping there is no relation between "title" and "id". Elasticsearch flattens everything under the hood, so:
"collections" : [
{
"id": 30,
"title" : "Research"
},
{
"id": 45,
"title" : "Events"
},
{
"id" : 52,
"title" : "International"
}
]
Becomes:
"collections.id": [30, 45, 52],
"collections.title": ["Research", "Events", "International"]
Elasticsearch doesn't know id 30 belongs to Research, or id 45 to Events.
You must use the "nested" type to keep the relation between the properties of each object.
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
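The flattening described above can be mimicked in a few lines of Python. This is only an illustration of the idea, not how Elasticsearch is implemented; flatten_object_array is a hypothetical helper, and document_2 holds the collections of "Document 2" from the question.

```python
def flatten_object_array(objects):
    """Mimic an object field: one flat value list per sub-field, pairing lost."""
    flattened = {}
    for obj in objects:
        for field, value in obj.items():
            flattened.setdefault(field, []).append(value)
    return flattened


# Collections of "Document 2" from the question.
document_2 = [
    {"id": 45, "title": "Events"},
    {"id": 63, "title": "Development"},
]

flat = flatten_object_array(document_2)

# Object mapping: the document matches "Development", so a sub-aggregation on id
# sees every id in the document, including 45.
ids_seen_for_development = flat["id"] if "Development" in flat["title"] else []

# Nested mapping: each object is kept as its own unit, so the pairing survives.
ids_for_development_nested = [
    obj["id"] for obj in document_2 if obj["title"] == "Development"
]

print(ids_seen_for_development)
print(ids_for_development_nested)
```

With "size": 1 on the id sub-aggregation, only one of the ids seen for the bucket is returned, which is how 45 can show up under Development.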
Solution: Use nested field type
Mappings
PUT test_nestedaggs
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"collections": {
"type": "nested",
"properties": {
"title": {
"type": "keyword"
},
"id": {
"type": "keyword"
}
}
}
}
}
}
Documents
POST test_nestedaggs/_doc
{
"name": "Document 1",
"collections": [
{
"id": 30,
"title": "Research"
},
{
"id": 45,
"title": "Events"
},
{
"id": 52,
"title": "International"
}
]
}
POST test_nestedaggs/_doc
{
"name": "Document 2",
"collections": [
{
"id": 45,
"title": "Events"
},
{
"id": 63,
"title": "Development"
}
]
}
Query
POST test_nestedaggs/_search?size=0
{
"aggs": {
"nested_collections": {
"nested": {
"path": "collections"
},
"aggs": {
"collections": {
"terms": {
"field": "collections.title"
},
"aggs": {
"ids": {
"terms": {
"field": "collections.id"
}
}
}
}
}
}
}
}
Results
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"nested_collections": {
"doc_count": 5,
"collections": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Events",
"doc_count": 2,
"ids": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "45",
"doc_count": 2
}
]
}
},
{
"key": "Development",
"doc_count": 1,
"ids": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "63",
"doc_count": 1
}
]
}
},
{
"key": "International",
"doc_count": 1,
"ids": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "52",
"doc_count": 1
}
]
}
},
{
"key": "Research",
"doc_count": 1,
"ids": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "30",
"doc_count": 1
}
]
}
}
]
}
}
}
}
You can read an article I wrote about that for details:
https://opster.com/guides/elasticsearch/data-architecture/elasticsearch-nested-field-object-field/
NOTE: If the number of child documents is large and you update them often, consider changing the data model. Each child document is stored as a separate document in the index, and an update to any child document reindexes the whole structure, which can hurt performance. There are also limits on the maximum number of nested documents you can add. If the number is small, as in the example, it's fine.

Concatenating fields in OpenSearch / ElasticSearch aggregate

I have an OpenSearch index with the following mapping (simplified):
PUT /house
{
"mappings": {
"properties": {
"house": { "type": "keyword" },
"people": {
"type": "nested",
"properties": {
"forename": { "type": "keyword" },
"surname": { "type": "keyword" }
}
}
}
}
}
I'd like to retrieve an aggregate where the bucket key is "[forename] [surname]".
Toy data:
PUT /house/_doc/1
{
"house": "house1",
"people": [
{ "forename": "Dave", "surname": "Daveson" },
{ "forename": "Jeff", "surname": "Jeffson" }
]
}
PUT /house/_doc/2
{
"house": "house1",
"people": [
{ "forename": "Dave", "surname": "Daveson" },
{ "forename": "Jeffs", "surname": "Jeffsons" }
]
}
The following doesn't return what I'd expect, and I can't figure out what object paths to put in the script to get it to work:
GET house/_search
{
"aggs": {
"people": {
"nested": {
"path": "people"
},
"aggs": {
"people.name": {
"terms": {
"script": "[params._source['forename'], params._source['surname']].join(' ')"
}
}
}
}
},
"size": 0
}
Returns:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"people" : {
"doc_count" : 4,
"people.name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "null null",
"doc_count" : 4
}
]
}
}
}
}
Without a script I can aggregate correctly on forename, surname, or both, but when using both I can't reliably "join" the results, since the buckets can only be sorted by doc_count or key:
GET house/_search
{
"aggs": {
"people": {
"nested": {
"path": "people"
},
"aggs": {
"people.forename": {
"terms": { "field": "people.forename" }
},
"people.surname": {
"terms": { "field": "people.surname" }
}
}
}
},
"size": 0
}
Returns:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"people" : {
"doc_count" : 4,
"people.surname" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Daveson",
"doc_count" : 2
},
{
"key" : "Jeffson",
"doc_count" : 1
},
{
"key" : "Jeffsons",
"doc_count" : 1
}
]
},
"people.forename" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Dave",
"doc_count" : 2
},
{
"key" : "Jeff",
"doc_count" : 1
},
{
"key" : "Jeffs",
"doc_count" : 1
}
]
}
}
}
}
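In the script, read the fields through doc values using their full nested path instead of params._source. This query gives the results you want: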
GET house/_search
{
"aggs": {
"people": {
"nested": {
"path": "people"
},
"aggs": {
"people.name": {
"terms": {
"script": "doc['people.forename'].value + ' ' + doc['people.surname'].value"
}
}
}
}
},
"size": 0
}
Results:
"aggregations" : {
"people" : {
"doc_count" : 4,
"people.name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Dave Daveson",
"doc_count" : 2
},
{
"key" : "Jeff Jeffson",
"doc_count" : 1
},
{
"key" : "Jeffs Jeffsons",
"doc_count" : 1
}
]
}
}
}

Elasticsearch bucket_script returns no output

I created an index like this:
{
"mappings":{
"properties":{
"#timestamp":{
"type":"date",
"doc_values":true
},
"event.category":{
"type":"keyword",
"index":true
},
"action":{
"type":"keyword",
"index":true
},
"success":{
"type":"boolean",
"index":true
},
"raw":{
"type":"text",
"index":false
}
}
}
}
Then I tried to use the bucket_script pipeline aggregation to calculate the success rate per action, with a search like this:
{
"size": 0,
"_source": false,
"query": {
"bool": {
"filter": [{
"term": {
"action": "login"
}
}
]
}
},
"aggs": {
"action_bucket": {
"terms": {
"field": "action",
"show_term_doc_count_error": true
},
"aggs": {
"total": {
"terms": {
"field": "action"
}
},
"action": {
"filter": {
"term": {
"success": true
}
},
"aggs": {
"success": {
"terms": {
"field": "action"
}
}
}
},
"action_success_rate": {
"bucket_script": {
"buckets_path": {
"no_total": "total.doc_count",
"no_success": "action>success.doc_count"
},
"script": "100 * params.no_success / params.no_total"
}
}
}
}
}
}
But the response does not contain action_success_rate:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 15,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"action_bucket": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "login",
"doc_count": 15,
"doc_count_error_upper_bound": 0,
"total": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "login",
"doc_count": 15
}
]
},
"action": {
"doc_count": 9,
"success": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "login",
"doc_count": 9
}
]
}
}
}
]
}
}
}
How could I fix my search request body to obtain success rate?
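bucket_script needs every buckets_path entry to resolve to a single numeric value per bucket. In your request, total and success are terms aggregations, which are multi-bucket, so the paths do not resolve to a number and no value is produced. Use a single-value metric such as value_count instead.
Mapping: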
{
"mappings":{
"properties":{
"#timestamp":{
"type":"date",
"doc_values":true
},
"event.category":{
"type":"keyword",
"index":true
},
"action":{
"type":"keyword",
"index":true
},
"success":{
"type":"boolean",
"index":true
},
"raw":{
"type":"text",
"index":false
}
}
}
}
Sample Data:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "idx0",
"_type" : "_doc",
"_id" : "E802dnkBHcKLV4rtwh3V",
"_score" : 1.0,
"_source" : {
"action" : "login",
"success" : true
}
},
{
"_index" : "idx0",
"_type" : "_doc",
"_id" : "FM02dnkBHcKLV4rtyx0r",
"_score" : 1.0,
"_source" : {
"action" : "login",
"success" : true
}
},
{
"_index" : "idx0",
"_type" : "_doc",
"_id" : "Fc02dnkBHcKLV4rt1x0t",
"_score" : 1.0,
"_source" : {
"action" : "login",
"success" : false
}
},
{
"_index" : "idx0",
"_type" : "_doc",
"_id" : "Fs04dnkBHcKLV4rtKR3P",
"_score" : 1.0,
"_source" : {
"action" : "logout",
"success" : false
}
},
{
"_index" : "idx0",
"_type" : "_doc",
"_id" : "F804dnkBHcKLV4rtNR3J",
"_score" : 1.0,
"_source" : {
"action" : "logout",
"success" : true
}
}
]
}
}
Query:
{
"size": 0,
"aggs": {
"actions": {
"terms": {
"field": "action",
"size": 10
},
"aggs": {
"total": {
"value_count": {
"field": "action"
}
},
"success":{
"filter": {
"term": {
"success":true
}
},
"aggs": {
"success_cnt": {
"value_count": {
"field": "action"
}
}
}
},
"success_rate":{
"bucket_script": {
"buckets_path": {
"no_total":"total.value",
"no_success":"success>success_cnt.value"
},
"script": "(params.no_success/params.no_total)*100"
}
}
}
}
}
}
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"actions" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "login",
"doc_count" : 3,
"total" : {
"value" : 3
},
"success" : {
"doc_count" : 2,
"success_cnt" : {
"value" : 2
}
},
"success_rate" : {
"value" : 66.66666666666666
}
},
{
"key" : "logout",
"doc_count" : 2,
"total" : {
"value" : 2
},
"success" : {
"doc_count" : 1,
"success_cnt" : {
"value" : 1
}
},
"success_rate" : {
"value" : 50.0
}
}
]
}
}
}
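As a sanity check on the numbers in that response, the same rate can be recomputed by hand from the five sample documents. This is a small Python sketch for verification only; success_rates is a hypothetical helper and events copies the _source values from the sample data above.

```python
from collections import defaultdict

# The _source of the five sample documents shown above.
events = [
    {"action": "login", "success": True},
    {"action": "login", "success": True},
    {"action": "login", "success": False},
    {"action": "logout", "success": False},
    {"action": "logout", "success": True},
]


def success_rates(docs):
    """Return the percentage of successful documents per action."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for doc in docs:
        totals[doc["action"]] += 1
        if doc["success"]:
            successes[doc["action"]] += 1
    # Same formula as the bucket_script: successes / total * 100.
    return {action: 100 * successes[action] / totals[action] for action in totals}


rates = success_rates(events)
print(rates)
```

The login rate comes out at roughly 66.67 and the logout rate at 50.0, in line with the success_rate values in the response.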

Distinct for several fields in ElasticSearch

I need to get the distinct values of several fields from an Elasticsearch index, and they have to be distinct as a set, just like in this MySQL query:
SELECT DISTINCT name, type FROM some_table;
So far I have tried a few ways to do this, but all of them failed for me:
1. Aggregation
GET test_index/_search
{
"size": 0,
"track_total_hits": false,
"aggs" : {
"features": {
"terms": {
"field" : "feature.name",
"size" : 10,
"order": {
"_key": "asc"
}
}
}
}
}
2. Script
The script below returns every possible combination of the two fields, not only the pairs that actually exist.
GET bm_upgraded_visitors/_search
{
"size": 0,
"aggs": {
"t": {
"terms": {
"script": "doc['feature.name'].values + ' | ' + doc['feature.type'].values"
}
}
}
}
Sample code:
PUT test_index
{
"mappings" : {
"_doc" : {
"dynamic" : "false",
"properties" : {
"features" : {
"type": "nested",
"include_in_root": true,
"properties" : {
"name" : {
"type" : "keyword"
},
"value" : {
"type" : "text"
},
"type": {
"type" : "keyword"
}
}
}
}
}
}
}
Sample doc:
PUT test_index/_doc/1
{
"features": [
{
"name": "Feature 1",
"value": "Value 1",
"type": "Type 1"
},
{
"name": "Feature 2",
"value": "Value 1",
"type": "Type 2"
}
]
}
Result required:
"buckets" : [
{
"key" : "Feature 1",
"doc_count" : 1,
"types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Type 1",
"doc_count" : 1
}
]
}
},
{
"key" : "Feature 2",
"doc_count" : 1,
"types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Type 2",
"doc_count" : 1
}
]
}
}
]
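Then you need another terms sub-aggregation. Try this (the field names follow the queries in the question; under the sample mapping they would be features.name and features.type):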
GET test_index/_search
{
"size": 0,
"track_total_hits": false,
"aggs": {
"features": {
"terms": {
"field": "feature.name",
"size": 10,
"min_doc_count": 1,
"order": {
"_key": "asc"
}
},
"aggs": {
"types": {
"terms": {
"field": "feature.type",
"size": 10,
"min_doc_count": 1,
"order": {
"_key": "asc"
}
}
}
}
}
}
}
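To get rows comparable to SELECT DISTINCT name, type, the two-level bucket structure can be unrolled on the client. A minimal Python sketch follows, assuming the response has the shape shown under "Result required"; distinct_pairs is a hypothetical helper, and required_buckets is a trimmed copy of those buckets.

```python
def distinct_pairs(feature_buckets, sub_agg="types"):
    """Unroll name buckets and their type sub-buckets into (name, type) rows."""
    return [
        (feature["key"], type_bucket["key"])
        for feature in feature_buckets
        for type_bucket in feature[sub_agg]["buckets"]
    ]


# Trimmed copy of the "Result required" buckets from the question.
required_buckets = [
    {"key": "Feature 1", "doc_count": 1,
     "types": {"buckets": [{"key": "Type 1", "doc_count": 1}]}},
    {"key": "Feature 2", "doc_count": 1,
     "types": {"buckets": [{"key": "Type 2", "doc_count": 1}]}},
]

pairs = distinct_pairs(required_buckets)
print(pairs)
```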
