Elasticsearch aggregation by full array - elasticsearch

How can I get an aggregation over the whole array inside each document, not over each individual value of the array? For example, I have several documents like this:
{"some_field": [1,2]}
{"some_field": [1]}
{"some_field": [1]}
{"some_field": [7,2]}
Now, with a simple terms aggregation like this:
{
  "aggs": {
    "agg_name": {
      "terms": {
        "field": "some_field"
      }
    }
  },
  "size": 0
}
I get a result like this:
"buckets": [
{
"key": "1",
"doc_count": 3
},
{
"key": "2",
"doc_count": 2
},
...
]
but I want buckets for the full array, like this:
"buckets": [
{
"key": [1],
"doc_count": 2
},
{
"key": [1,2],
"doc_count": 1
},
{
"key": [7,2],
"doc_count": 1
},
]

I was looking for the same aggregation; it still doesn't exist out of the box, so I worked around it with a Painless script:
POST some_index/_search
{
  "size": 0,
  "aggs": {
    "myaggs": {
      "terms": {
        "size": 100,
        "script": {
          "lang": "painless",
          "source": """
            // build one string key out of all values of the array field
            def myString = "";
            for (int i = 0; i < doc['some_field.keyword'].length; ++i) {
              myString += doc['some_field.keyword'][i] + ", ";
            }
            return myString;
          """
        }
      }
    }
  }
}
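A couple of caveats with this workaround: the bucket key is a single string rather than a real array, the values come back in doc-values (sorted) order rather than the order they were indexed in, and each key keeps the trailing ", " separator. Assuming a some_field.keyword sub-field exists, the sample documents above would produce keys roughly like "1, " (2 docs), "1, 2, " (1 doc) and "2, 7, " (1 doc).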

Related

Distinct query in ElasticSearch

I have an index where one field (category) is a list field. I want to fetch all the distinct category lists in the index. Here is an example.
Doc1 -
{
"category": [1,2,3,4]
}
Doc2 -
{
"category": [5,6]
}
Doc3 -
{
"category": [1,2,3,4]
}
Doc4 -
{
"category": [1,2,7]
}
My output should be
[1,2,3,4]
[5,6]
[1,2,7]
I am using the below query:
GET /products/_search
{
  "size": 0,
  "aggs": {
    "category": {
      "terms": { "field": "category", "size": 1500 }
    }
  }
}
This returns [1], [2], [3], [4], [5], [6], [7]. I don't want the individual unique items of the list field; I'm looking for the complete unique lists instead.
What am I missing in the above query? I'm using Elasticsearch v7.10.
You can use a terms aggregation with a script:
{
  "size": 0,
  "aggs": {
    "category": {
      "terms": {
        "script": {
          "source": """
            def cat = "";
            for (int i = 0; i < doc['category'].length; i++) {
              cat += doc['category'][i];
            }
            return cat;
          """
        }
      }
    }
  }
}
The above query will return a result like the one below:
"aggregations": {
"category": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1234",
"doc_count": 2
},
{
"key": "127",
"doc_count": 1
},
{
"key": "56",
"doc_count": 1
}
]
}
}
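One caveat worth noting: because the script concatenates values with no separator, different lists can collide on the same key (for example, [1,23,4] would also produce "1234"). A minimal variation, assuming the same category field, joins the values with a comma instead:
GET /products/_search
{
  "size": 0,
  "aggs": {
    "category": {
      "terms": {
        "script": {
          "source": """
            // join values with a comma so [1,2,3,4] becomes "1,2,3,4"
            def cat = "";
            for (int i = 0; i < doc['category'].length; i++) {
              if (i > 0) { cat += ","; }
              cat += doc['category'][i];
            }
            return cat;
          """
        }
      }
    }
  }
}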

Elasticsearch - Merge date histogram aggregation

Query
GET /_search
{
  "size": 0,
  "query": {
    "ids": {
      "types": [ ],
      "values": [ "someId1", "someId2", "someId3" ... ]
    }
  },
  "aggregations": {
    "how_to_merge": {
      "date_histogram": {
        "script": {
          "inline": "doc['date'].values"
        },
        "interval": "1y",
        "format": "yyyy"
      }
    }
  }
}
Result
{
  ...
  "aggregations": {
    "how_to_merge": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key_as_string": "2006",
          "key": 1136073600000,
          "doc_count": 1
        },
        {
          "key_as_string": "2007",
          "key": 1167609600000,
          "doc_count": 2
        },
        {
          "key_as_string": "2008",
          "key": 1199145600000,
          "doc_count": 3
        }
      ]
    }
  }
}
I want to merge "2006" and "2007" and change the key name to "TMP", so the result should look like this:
{
  ...
  "aggregations": {
    "how_to_merge": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key_as_string": "TMP",
          "key": 1136073600000,
          "doc_count": 3 <----------- 2006(1) + 2007(2)
        },
        {
          "key_as_string": "2008",
          "key": 1199145600000,
          "doc_count": 3
        }
      ]
    }
  }
}
For terms aggregations, a script like the one below works as expected, but the date histogram aggregation does not change the key value. The script below, used in a terms aggregation, maps the key values SOMETHING1 and SOMETHING2 to TMP so that their buckets are merged (a sketch of how it plugs into the aggregation follows the script).
def param = new groovy.json.JsonSlurper().parseText(
  '{"SOMETHING1": "TMP", "SOMETHING2": "TMP"}'
);
def data = doc['my_field'].values;
def list = [];
if (!doc['my_field'].empty) {
  for (x in data) {
    if (param[x] != null) {
      list.add(param[x]);
    } else {
      list.add(x);
    }
  }
};
return list;
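For context, this is roughly how such a script is wired into the terms aggregation (a sketch only: the merged_terms aggregation name is made up, my_field comes from the snippet above, and the mapping is inlined as a Groovy map literal instead of JsonSlurper to keep the inline string short):
GET /_search
{
  "size": 0,
  "aggregations": {
    "merged_terms": {
      "terms": {
        "script": {
          "lang": "groovy",
          "inline": "def param = ['SOMETHING1': 'TMP', 'SOMETHING2': 'TMP']; def list = []; if (!doc['my_field'].empty) { for (x in doc['my_field'].values) { list.add(param[x] != null ? param[x] : x) } }; return list;"
        }
      }
    }
  }
}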
How can I write a script that changes the key value to the one I want and combines the doc_count values in the date histogram aggregation?
Thanks for your help and comments!

ElasticSearch Nested NumericRangeQuery using min value from list for comparison

I have the following data:
[{
  "id": "1",
  "listItems": [
    {
      "key": "li1",
      "value": 100
    },
    {
      "key": "li2",
      "value": 5000
    }
  ]
},
{
  "id": "2",
  "listItems": [
    {
      "key": "li3",
      "value": 200
    },
    {
      "key": "li2",
      "value": 2000
    }
  ]
}]
I'm trying to apply a NumericRangeQuery-style filter so that the MIN value in each document's listItems falls within a range; for example, my range is 150 to 15000.
The only way I know how to write this is with a script query, but it doesn't appear to work: the code still seems to match any value under listItems against the range instead of the MIN like I told it to. Here's my query:
{
  "track_total_hits": true,
  "from": 0,
  "min_score": 0.0,
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "listItems",
            "query": {
              "script": {
                "script": "double minVal = 0; minVal = doc['listItems.value'][0]; for (wp in doc['listItems.value']) { if (wp < minVal) { minVal = wp; } } return minVal >= 150 && minVal <= 15000"
              }
            }
          }
        }
      ]
    }
  }
}
Anybody seeing something I don't?
Your script query doesn't behave as intended because a nested query runs the script once per nested document, so doc['listItems.value'] only ever holds that single entry's value; the computed minimum is therefore just each entry's own value, and the document matches if any entry falls in the range. Instead, compute the minimum with aggregations. The search query below performs the following aggregations:
Terms aggregation on the id field
Min aggregation on listItems.value
Bucket selector aggregation, a parent pipeline aggregation that executes a script to determine whether the current bucket is retained in the parent multi-bucket aggregation
Adding a working example with index mapping, index data, search query, and search result
Index Mapping:
{
  "mappings": {
    "properties": {
      "listItems": {
        "type": "nested"
      },
      "id": {
        "type": "text",
        "fielddata": "true"
      }
    }
  }
}
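(fielddata is enabled on the text field id here only so that the terms aggregation can run on it; mapping id as a keyword field would work as well.)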
Index Data:
{
  "id": "1",
  "listItems": [
    {
      "key": "li1",
      "value": 100
    },
    {
      "key": "li2",
      "value": 5000
    }
  ]
}
{
  "id": "2",
  "listItems": [
    {
      "key": "li3",
      "value": 200
    },
    {
      "key": "li2",
      "value": 2000
    }
  ]
}
Search Query:
{
  "size": 0,
  "aggs": {
    "id_terms": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "nested_entries": {
          "nested": {
            "path": "listItems"
          },
          "aggs": {
            "min_position": {
              "min": {
                "field": "listItems.value"
              }
            }
          }
        },
        "value_range": {
          "bucket_selector": {
            "buckets_path": {
              "totalValues": "nested_entries>min_position"
            },
            "script": "params.totalValues >= 150 && params.totalValues < 15000"
          }
        }
      }
    }
  }
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "2",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 200.0
}
}
}
]
}
}
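With the sample data above, only the document with id 2 is kept: its minimum listItems.value is 200, which falls inside the 150-15000 range, while document 1's minimum of 100 is filtered out by the bucket selector.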

Return just buckets size of aggregation query - Elasticsearch

I'm using an aggregation query on Elasticsearch 2.1; here is my query:
"aggs": {
"atendimentos": {
"terms": {
"field": "_parent",
"size" : 0
}
}
}
The response looks like this:
"aggregations": {
"atendimentos": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1a92d5c0-d542-4f69-aeb0-42a467f6a703",
"doc_count": 12
},
{
"key": "4e30bf6d-730d-4217-a6ef-e7b2450a012f",
"doc_count": 12
}.......
It returns 40000 buckets, so I have a lot of buckets in this aggregation. I just want the number of buckets, something like this:
buckets_size: 40000
How can I return just the bucket count?
Well, thank you all.
Try this query:
POST index/_search
{
  "size": 0,
  "aggs": {
    "atendimentos": {
      "terms": {
        "field": "_parent"
      }
    },
    "count": {
      "cardinality": {
        "field": "_parent"
      }
    }
  }
}
It may return something like this:
"aggregations": {
"aads": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "aa",
"doc_count": 1
},
{
"key": "bb",
"doc_count": 1
}
]
},
"count": {
"value": 2
}
}
EDIT: More info here - https://www.elastic.co/guide/en/elasticsearch/reference/2.1/search-aggregations-metrics-cardinality-aggregation.html
{
  "aggs": {
    "type_count": {
      "cardinality": {
        "field": "type"
      }
    }
  }
}
Read more about Cardinality Aggregation
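One thing to keep in mind: the cardinality aggregation is approximate (it is based on HyperLogLog++), so with around 40000 distinct parents the returned value may be slightly off. If a near-exact count matters at that scale, you can raise precision_threshold, sketched below for the same _parent field (40000 is the maximum accepted value):
POST index/_search
{
  "size": 0,
  "aggs": {
    "count": {
      "cardinality": {
        "field": "_parent",
        "precision_threshold": 40000
      }
    }
  }
}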

Elasticsearch term aggregation using script with key as integer

Is it possible to make the key of the resulting aggregation be the int value returned by the script instead of a string?
See this example, but using dayOfMonth or hourOfDay instead of dayOfWeek, so there are more than 10 values and the result ends up ordered "1", "10", "11", ... instead of 1, 2, 3, ....
Here's an example of the full call:
POST /sales/_search?size=0
{
  "aggs": {
    "dayOfMonth": {
      "terms": {
        "script": {
          "lang": "painless",
          "source": "doc['date'].value.dayOfMonth"
        }
      }
    }
  }
}
And an example response:
{
  ...
  "aggregations": {
    "dayOfMonth": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 4
        },
        {
          "key": "10",
          "doc_count": 3
        },
        {
          "key": "2",
          "doc_count": 2
        }
      ]
    }
  }
}
Setting the value_type parameter can resolve the issue by coercing the unmapped field into the correct type.
{
  "aggs": {
    "dayOfMonth": {
      "terms": {
        "script": "doc['date'].value.dayOfMonth",
        "value_type": "long"
      }
    }
  }
}
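With "value_type": "long" the bucket keys are returned as numbers (1, 2, 10) instead of strings ("1", "10", "2"), so ordering the aggregation by key sorts them numerically.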