How to use copy_to to do multi-field aggregation? - elasticsearch

I put some data into ES, then map two fields into one group field using the copy_to feature. The reason for doing this is to run a multi-field aggregation. Below are my steps.
Create index
curl -XPOST "localhost:9200/test?pretty" -d '{
"mappings" : {
"type9k" : {
"properties" : {
"SRC" : { "type" : "string", "index" : "not_analyzed" ,"copy_to": "SRC_AND_DST"},
"DST" : { "type" : "string", "index" : "not_analyzed" ,"copy_to": "SRC_AND_DST"},
"BITS" : { "type" : "long", "index" : "not_analyzed" },
"TIME" : { "type" : "long", "index" : "not_analyzed" }
}
}
}
}'
Put data into ES
curl -X POST "http://localhost:9200/test/type9k/_bulk?pretty" -d '
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"tcp","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"BJ","DST":"SH","PROTOCOL":"tcp","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"BJ","DST":"SH","PROTOCOL":"tcp","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":30,"TIME":1453360600}
'
The question
I want to aggregate on SRC and DST with a sum aggregator, then return the top 3 results. Translated to SQL, my requirement looks like:
SELECT sum(BITS) FROM table GROUP BY src,dst ORDER BY sum(BITS) DESC LIMIT 3.
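For reference, the SQL semantics over the 12 sample rows can be sketched in plain Python (a toy check, independent of Elasticsearch; ties here are broken alphabetically):

```python
from collections import defaultdict

# The (SRC, DST, BITS) triples from the bulk request above.
rows = [
    ("BJ", "DL", 10), ("BJ", "DL", 10), ("DL", "SH", 10), ("SH", "BJ", 10),
    ("BJ", "DL", 20), ("BJ", "SH", 20), ("DL", "SH", 20), ("SH", "BJ", 20),
    ("BJ", "DL", 30), ("BJ", "SH", 30), ("DL", "SH", 30), ("SH", "BJ", 30),
]

def top_pairs(rows, limit=3):
    # GROUP BY src, dst; SUM(bits); ORDER BY sum DESC; LIMIT n
    sums = defaultdict(int)
    for src, dst, bits in rows:
        sums[(src, dst)] += bits
    ranked = sorted(sums.items(), key=lambda kv: (-kv[1], kv[0]))
    return [("-".join(pair), total) for pair, total in ranked[:limit]]

print(top_pairs(rows))
# → [('BJ-DL', 70), ('DL-SH', 60), ('SH-BJ', 60)]
```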
I know that I can do this using the script feature, like below:
curl -XPOST "localhost:9200/_all/_search?pretty" -d '
{
"_source": [ "SRC", "DST","BITS"],
"size":0,
"query": { "match_all": {} },
"aggs":
{
"SRC_DST":
{
"terms": {"script": "[doc.SRC.value, doc.DST.value].join(\"-\")","size": 2,"shard_size":0, "order": {"sum_bits": "desc"}},
"aggs": { "sum_bits": { "sum": {"field": "BITS"} } }
}
}
}
'
The result I get with the script looks like below:
"aggregations" : {
"SRC_DST" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 10,
"buckets" : [ {
"key" : "BJ-DL",
"doc_count" : 8,
"sum_bits" : {
"value" : 140.0
}
}, {
"key" : "DL-SH",
"doc_count" : 6,
"sum_bits" : {
"value" : 120.0
}
} ]
}
}
But I'm expecting to do it with the copy_to feature, because I think scripting may cost too much time.

I am not sure, but I guess you do not need the copy_to functionality. Going by your SQL query, what you are asking for can be done with a terms aggregation and a sum aggregation, like this:
{
"size": 0,
"aggs": {
"unique_src": {
"terms": {
"field": "SRC",
"size": 10
},
"aggs": {
"unique_dst": {
"terms": {
"field": "DST",
"size": 3,
"order": {
"bits_sum": "desc"
}
},
"aggs": {
"bits_sum": {
"sum": {
"field": "BITS"
}
}
}
}
}
}
}
}
The above query gives me output like this:
"aggregations": {
"unique_src": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "BJ",
"doc_count": 6,
"unique_dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "DL",
"doc_count": 4,
"bits_sum": {
"value": 70
}
},
{
"key": "SH",
"doc_count": 2,
"bits_sum": {
"value": 50
}
}
]
}
},
{
"key": "DL",
"doc_count": 3,
"unique_dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "SH",
"doc_count": 3,
"bits_sum": {
"value": 60
}
}
]
}
},
{
"key": "SH",
"doc_count": 3,
"unique_dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "BJ",
"doc_count": 3,
"bits_sum": {
"value": 60
}
}
]
}
}
]
}
}
Hope this helps!
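For completeness, here is what aggregating on the copy_to field from the question's mapping would look like (a sketch I have not run against this index; request body for POST localhost:9200/test/_search). The catch: copy_to copies each source value into SRC_AND_DST as a separate term, it does not concatenate SRC and DST into one key, so the buckets are individual endpoints (BJ, DL, SH), not pairs, and the script or nested-terms approach is still needed for pair keys:

```json
{
  "size": 0,
  "aggs": {
    "src_and_dst": {
      "terms": {
        "field": "SRC_AND_DST",
        "size": 3,
        "order": { "sum_bits": "desc" }
      },
      "aggs": {
        "sum_bits": { "sum": { "field": "BITS" } }
      }
    }
  }
}
```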

Related

Count number of inner elements of array property (Including repeated values)

Given I have the following records.
[
{
"profile": "123",
"inner": [
{
"name": "John"
}
]
},
{
"profile": "456",
"inner": [
{
"name": "John"
},
{
"name": "John"
},
{
"name": "James"
}
]
}
]
I want to get something like:
"aggregations": {
"name": {
"buckets": [
{
"key": "John",
"doc_count": 3
},
{
"key": "James",
"doc_count": 1
}
]
}
}
I'm a beginner with Elasticsearch, and this seems like a pretty simple operation, but I can't find how to achieve it.
If I try a simple aggs using terms, it returns 2 for John instead of 3.
Example request I'm trying:
{
"size": 0,
"aggs": {
"name": {
"terms": {
"field": "inner.name"
}
}
}
}
How can I possibly achieve this?
Additional Info: It will be used on Kibana later.
I can change mapping to whatever I want, but AFAIK Kibana doesn't like the "Nested" type. :(
You need a value_count aggregation. By default, terms only reports a doc_count, but the value_count aggregation counts the number of times a given field occurs.
So, for your purposes:
{
"size": 0,
"aggs": {
"name": {
"terms": {
"field": "inner.name"
},
"aggs": {
"total": {
"value_count": {
"field": "inner.name"
}
}
}
}
}
}
Which returns:
"aggregations" : {
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "John",
"doc_count" : 2,
"total" : {
"value" : 3
}
},
{
"key" : "James",
"doc_count" : 1,
"total" : {
"value" : 2
}
}
]
}
}
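The numbers above can be reproduced with a toy Python model (an assumption-based sketch; my reading is that keyword doc_values keep a deduplicated set of values per document). Note that value_count counts all inner.name values in the bucket's documents, not just occurrences of the bucket key, which is why James shows a total of 2:

```python
# Two sample documents from the question.
docs = [
    {"profile": "123", "inner": [{"name": "John"}]},
    {"profile": "456", "inner": [{"name": "John"}, {"name": "John"}, {"name": "James"}]},
]

def bucket(docs, key):
    doc_count = 0
    value_count = 0
    for d in docs:
        names = {i["name"] for i in d["inner"]}  # per-document dedupe, as in doc_values
        if key in names:
            doc_count += 1              # terms doc_count: docs containing the key
            value_count += len(names)   # value_count: all inner.name values in those docs
    return doc_count, value_count

print(bucket(docs, "John"))   # → (2, 3)
print(bucket(docs, "James"))  # → (1, 2)
```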

ElasticSearch Max Agg on lowest value inside a list property of the document

I'm looking to do a max aggregation on a value of a property in my document; the property is a list of complex objects (key and value). Here's my data:
[{
"id" : "1",
"listItems" :
[
{
"key" : "li1",
"value" : 100
},
{
"key" : "li2",
"value" : 5000
}
]
},
{
"id" : "2",
"listItems" :
[
{
"key" : "li3",
"value" : 200
},
{
"key" : "li2",
"value" : 2000
}
]
}]
When I do a nested max aggregation on "listItems.value", I expect the max value returned to be 200 (and not 5000): I want the logic to first find the MIN value under listItems for each document, and then do the max aggregation over those. Is it possible to do something like this?
Thanks.
The search query performs the following aggregations:
Terms aggregation on the id field.
Min aggregation on listItems.value.
Max bucket aggregation, a sibling pipeline aggregation that identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of those bucket(s).
Please refer to nested aggregation, to get a detailed explanation on it.
Adding a working example with index data, index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"listItems": {
"type": "nested"
},
"id":{
"type":"text",
"fielddata":"true"
}
}
}
}
Index Data:
{
"id" : "1",
"listItems" :
[
{
"key" : "li1",
"value" : 100
},
{
"key" : "li2",
"value" : 5000
}
]
}
{
"id" : "2",
"listItems" :
[
{
"key" : "li3",
"value" : 200
},
{
"key" : "li2",
"value" : 2000
}
]
}
Search Query:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id"
},
"aggs": {
"nested_entries": {
"nested": {
"path": "listItems"
},
"aggs": {
"min_position": {
"min": {
"field": "listItems.value"
}
}
}
}
}
},
"maxValue": {
"max_bucket": {
"buckets_path": "id_terms>nested_entries>min_position"
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": "2",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 200.0
}
}
}
]
},
"maxValue": {
"value": 200.0,
"keys": [
"2"
]
}
}
The initial post mentioned a nested aggregation, so I was sure the question was about nested documents. Since I came to a solution before seeing the other answer, I'm keeping the whole thing for history, but it differs only in adding the nested aggregation.
The whole process can be explained like this:
Bucket each document into a single bucket.
Use a nested aggregation to be able to aggregate on nested documents.
Use a min aggregation to find the minimum value across each document's nested documents, and thereby for the document itself.
Finally, use another aggregation to calculate the maximum value among the results of the previous aggregation.
Given this setup:
// PUT /index
{
"mappings": {
"properties": {
"children": {
"type": "nested",
"properties": {
"value": {
"type": "integer"
}
}
}
}
}
}
// POST /index/_doc
{
"children": [
{ "value": 12 },
{ "value": 45 }
]
}
// POST /index/_doc
{
"children": [
{ "value": 7 },
{ "value": 35 }
]
}
I can use these aggregations in a request to get the required value:
{
"size": 0,
"aggs": {
"document": {
"terms": {"field": "_id"},
"aggs": {
"children": {
"nested": {
"path": "children"
},
"aggs": {
"minimum": {
"min": {
"field": "children.value"
}
}
}
}
}
},
"result": {
"max_bucket": {
"buckets_path": "document>children>minimum"
}
}
}
}
The response:
{
"aggregations": {
"document": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "O4QxyHQBK5VO9CW5xJGl",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 7.0
}
}
},
{
"key": "OoQxyHQBK5VO9CW5kpEc",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 12.0
}
}
}
]
},
"result": {
"value": 12.0,
"keys": [
"OoQxyHQBK5VO9CW5kpEc"
]
}
}
}
There should also be a workaround using a script to calculate the max: all you would need to do is find and return the smallest value in each document from that script.

Return just buckets size of aggregation query - Elasticsearch

I'm using an aggregation query on elasticsearch 2.1, here is my query:
"aggs": {
"atendimentos": {
"terms": {
"field": "_parent",
"size" : 0
}
}
}
The return is like that:
"aggregations": {
"atendimentos": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1a92d5c0-d542-4f69-aeb0-42a467f6a703",
"doc_count": 12
},
{
"key": "4e30bf6d-730d-4217-a6ef-e7b2450a012f",
"doc_count": 12
}.......
It returns 40,000 buckets, so I have a lot of buckets in this aggregation. I just want to return the number of buckets, something like:
buckets_size: 40000
How can I return just the bucket count?
Thank you all.
Try this query:
POST index/_search
{
"size": 0,
"aggs": {
"atendimentos": {
"terms": {
"field": "_parent"
}
},
"count":{
"cardinality": {
"field": "_parent"
}
}
}
}
It may return something like this:
"aggregations": {
"aads": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "aa",
"doc_count": 1
},
{
"key": "bb",
"doc_count": 1
}
]
},
"count": {
"value": 2
}
}
EDIT: More info here - https://www.elastic.co/guide/en/elasticsearch/reference/2.1/search-aggregations-metrics-cardinality-aggregation.html
{
"aggs" : {
"type_count" : {
"cardinality" : {
"field" : "type"
}
}
}
}
Read more about Cardinality Aggregation
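One caveat worth adding: the cardinality aggregation is approximate (it is HyperLogLog++ based), so with around 40,000 distinct values the reported count may be slightly off. The precision_threshold option keeps counts effectively exact below that many distinct values, at the cost of more memory (the value here is illustrative; 40000 is the documented maximum):

```json
{
  "aggs": {
    "count": {
      "cardinality": {
        "field": "_parent",
        "precision_threshold": 40000
      }
    }
  }
}
```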

Incorrect unique values from field in elasticsearch

I am trying to get unique values from a field in Elasticsearch. To do that, I first did the following:
PUT tv-programs/_mapping/text?update_all_types
{
"properties": {
"channelName": {
"type": "text",
"fielddata": true
}
}
}
After that I executed this:
GET _search
{
"size": 0,
"aggs" : {
"channels" : {
"terms" : { "field" : "channelName" ,
"size": 1000
}
}
}}
And I saw the following response:
...
"buckets": [
{
"key": "tv",
"doc_count": 4582
},
{
"key": "baby",
"doc_count": 2424
},
{
"key": "24",
"doc_count": 1547
},
{
"key": "channel",
"doc_count": 1192
},..
The problem is that the original entries do not contain 4 different values. The correct output should be:
"buckets": [
{
"key": "baby tv",
"doc_count": 4582
}
{
"key": "channel 24",
"doc_count": 1547
},..
Why is that happening? How can I get the correct output?
I've found the solution: I just added .keyword after the field name. (The channelName field is analyzed text, so the terms aggregation buckets individual tokens; the keyword sub-field holds the unanalyzed value.)
GET _search
{
"size": 0,
"aggs" : {
"channels" : {
"terms" : { "field" : "channelName.keyword" ,
"size": 1000
}
}
}}
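For context, the .keyword sub-field exists because the default dynamic mapping for a string field (ES 5+) looks roughly like this (a sketch of the defaults, not taken from this index):

```json
{
  "channelName": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
```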

how to enable elasticsearch return aggregations including the name in the response

I'm performing some aggregations (by_shop and by_category) over a data set using Elasticsearch. The thing is, the response doesn't specify the name of each aggregation, which makes it difficult to parse.
Query
"aggregations" : {
"byShop" : {
"terms" : {
"field" : "shopName",
"size" : 0
}
},
"byCategory" : {
"terms" : {
"field" : "category",
"size" : 0
}
}
}
Response
"aggs": [
[
{
"name": "bucket",
"count": 5075,
"key": "shop1"
},
{
"name": "bucket",
"count": 1,
"key": "shop2"
}
],
[
{
"name": "bucket",
"count": 11,
"key": "Jewelry & Watches"
},
{
"name": "bucket",
"count": 1,
"key": "Home & Garden/Home Décor"
}
]
Ideally, I would like to see the following:
"aggregations": {
"byShop": {
"buckets": [
{
"count": 5075,
"key": "shop1"
},
{
"count": 1,
"key": "shop2"
}
]
},
"byCategory": {
"buckets": [
{
"count": 11,
"key": "Jewelry & Watches"
},
{
"count": 11,
"key": "Home & Garden/Home Décor"
}
]
}
}
EDIT
productResponse.getAggs().add(searchResult.getAggregations().getTermsAggregation("ByCategory").getBuckets());
productResponse.getAggs().add(searchResult.getAggregations().getTermsAggregation("ByShopname").getBuckets());
where searchResult holds the response from Elasticsearch. It seems that getBuckets() trims the names of the aggs, right?
