I am wondering whether, for an aggregation query in ES, it is possible to reuse the returned bucket values for your own purposes. For example, if I have a response like this:
{
"key": "test",
"doc_count": 2000,
"child": {
"value": 1000
}
}
And I want to get the ratio of doc_count to value, so I am looking for a way to generate another field/aggregation that does the math on those two fields, like this:
{
"key": "test",
"doc_count": 2000,
"child": {
"value": 1000
},
"ratio" : 2
}
or
{
"key": "test",
"doc_count": 1997,
"child": {
"value": 817
},
"buckets": [
{
"key": "ratio",
"value": 2
}
]
}
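Not from the original question, but newer Elasticsearch releases (2.0+) added pipeline aggregations, and a bucket_script sketch along these lines can compute such a ratio server side. The parent terms aggregation, the child metric, and the field names below are hypothetical stand-ins; the script syntax shown is Painless (ES 5+):
{
"size": 0,
"aggs": {
"parent": {
"terms": {
"field": "some_field"
},
"aggs": {
"child": {
"value_count": {
"field": "other_field"
}
},
"ratio": {
"bucket_script": {
"buckets_path": {
"total": "_count",
"child_value": "child.value"
},
"script": "params.total / params.child_value"
}
}
}
}
}
}
Here "_count" is the special buckets_path that refers to the bucket's own doc_count, so "ratio" is added to each parent bucket alongside "child".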
I'm using Elasticsearch to build a search filter, and I need to find all distinct values stored in the "cambio" field.
The values are saved as strings such as "Manual de 5 marchas" or "Manual de 6 marchas".
I created this query to return all saved values:
GET /crawler10/crawler-vehicles10/_search
{
"size": 0,
"aggregations": {
"my_agg": {
"terms": {
"field": "cambio"
}
}
}
}
But when I run it, the returned values look like this:
"aggregations": {
"my_agg": {
"doc_count_error_upper_bound": 2,
"sum_other_doc_count": 2613,
"buckets": [
{
"key": "de",
"doc_count": 2755
},
{
"key": "marchas",
"doc_count": 2714
},
{
"key": "manual",
"doc_count": 2222
},
{
"key": "modo",
"doc_count": 1097
},
{
"key": "5",
"doc_count": 1071
},
{
"key": "d",
"doc_count": 1002
},
{
"key": "n",
"doc_count": 1002
},
{
"key": "automática",
"doc_count": 935
},
{
"key": "com",
"doc_count": 919
},
{
"key": "6",
"doc_count": 698
}
]
}
}
Aggregations are based on the mapping type of the stored field. The field type for cambio seems to be analyzed (the default for string fields), so the terms aggregation operates on the individual tokens rather than the whole values. Please create an index where the cambio field is mapped as not_analyzed.
You can create the index with a PUT request as below (if your ES version is less than 5), and then you will need to re-index your data into the crawler10 index.
PUT crawler10
{
"mappings": {
"crawler-vehicles10": {
"properties": {
"cambio": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
For ES v5 or greater
PUT crawler10
{
"mappings": {
"crawler-vehicles10": {
"properties": {
"cambio": {
"type": "keyword"
}
}
}
}
}
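As a side note: if the index was created on ES 5+ with dynamic mapping, string fields are by default mapped as text with a keyword sub-field, so you may be able to aggregate on the existing sub-field without re-indexing (assuming the default dynamic mapping was used):
GET /crawler10/crawler-vehicles10/_search
{
"size": 0,
"aggregations": {
"my_agg": {
"terms": {
"field": "cambio.keyword"
}
}
}
}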
I am trying to perform an aggregation that groups documents by the first two letters of a specific field value.
I successfully aggregated my documents by a specific field name, but I don't know how to work with the values.
For example, for the docs:
[
{
"name": "John"
},
{
"name": "Jog"
},
{
"name": "James"
},
{
"name": "Robert"
},
{
"name": "Jessica"
}
]
I would like to get the following response:
[
{
"key": "Jo",
"doc_count": 2
},
{
"key": "Ja",
"doc_count": 1
},
{
"key": "Ro",
"doc_count": 1
},
{
"key": "Je",
"doc_count": 1
}
]
Is there an aggregation query able to do that?
You could use a terms aggregation with a script instead of a field, like this:
{
"size": 0,
"aggs": {
"first_two": {
"terms": {
"script": "doc.name.value?.size() >=2 ? doc.name.value?.substring(0, 2) : doc.name.value"
}
}
}
}
Note that if your name fields all have at least two characters, the script could simply be doc.name.value?.substring(0, 2). My script above accounts for single character names.
Also make sure to enable dynamic scripting in order for this to work.
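The ?. operator above is Groovy, which applies to ES 1.x/2.x. On ES 5+ the default scripting language is Painless, which has no ?. operator; a roughly equivalent sketch (assuming the name field has a keyword sub-field, and using the "source" script key of more recent versions) would be:
{
"size": 0,
"aggs": {
"first_two": {
"terms": {
"script": {
"lang": "painless",
"source": "String n = doc['name.keyword'].value; n.length() < 2 ? n : n.substring(0, 2)"
}
}
}
}
}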
I have an elasticsearch index with the following document:
{
dates: ["2014-01-31","2014-02-01"]
}
I want to count all the instances of all the days in my index separated by year and month. I hoped to do this using a date histogram aggregation (which is successful for counting non-array properties):
{
"from": 0,
"size": 0,
"aggregations": {
"year": {
"date_histogram": {
"field": "dates",
"interval": "1y",
"format": "yyyy"
},
"aggregations": {
"month": {
"date_histogram": {
"field": "dates",
"interval": "1M",
"format": "M"
},
"aggregations": {
"day": {
"date_histogram": {
"field": "dates",
"interval": "1d",
"format": "d"
}
}
}
}
}
}
}
}
However, I get the following aggregation results:
"aggregations": {
"year": {
"buckets": [
{
"key_as_string": "2014",
"key": 1388534400000,
"doc_count": 1,
"month": {
"buckets": [
{
"key_as_string": "1",
"key": 1388534400000,
"doc_count": 1,
"day": {
"buckets": [
{
"key_as_string": "31",
"key": 1391126400000,
"doc_count": 1
},
{
"key_as_string": "1",
"key": 1391212800000,
"doc_count": 1
}
]
}
},
{
"key_as_string": "2",
"key": 1391212800000,
"doc_count": 1,
"day": {
"buckets": [
{
"key_as_string": "31",
"key": 1391126400000,
"doc_count": 1
},
{
"key_as_string": "1",
"key": 1391212800000,
"doc_count": 1
}
]
}
}
]
}
}
]
}
}
The "day" aggregation ignores the bucket of its parent "month" aggregation, so it processes both elements of the array in each bucket, counting each date twice. The results indicate that two dates appear in each month (and four total), which is obviously incorrect.
I've tried reducing my aggregation to a single date histogram (and bucketing the results in java based on the key) but the doc_count returns as one instead of the number of elements in the array (two in my example). Adding a value_count brings me back to my original issue in which documents that overlap multiple buckets have their dates double-counted.
Is there a way to add a filter to the date histogram aggregations or otherwise modify them in order to count the elements in my date arrays correctly? Alternatively, does Elasticsearch have an option to unwind arrays like in MongoDB? I want to avoid using scripting due to security concerns.
Thanks,
Thomas
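Not from the original thread, but one scripting-free workaround is to do the unwinding client side: pull each hit's dates array from _source and bucket the individual elements yourself, so a document contributes once per date rather than once per histogram bucket. A minimal Python sketch (the docs list is a stand-in for fetched search hits):

```python
from collections import Counter
from datetime import date

# Stand-in for the `_source` of each search hit
docs = [{"dates": ["2014-01-31", "2014-02-01"]}]

# Unwind every array element and count by (year, month, day),
# so each date string is counted exactly once
counts = Counter()
for doc in docs:
    for d in (date.fromisoformat(s) for s in doc["dates"]):
        counts[(d.year, d.month, d.day)] += 1

print(counts[(2014, 1, 31)])  # 1
print(counts[(2014, 2, 1)])   # 1
```

This avoids the double-counting because each array element is visited exactly once, at the cost of paging the matching documents back to the client.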
I have the following simple aggregation:
GET index1/type1/_search
{
"size": 0,
"aggs": {
"incidentID": {
"terms": {
"field": "incidentID",
"size": 5
}
}
}
}
Results are:
"aggregations": {
"incidentID": {
"buckets": [
{
"key": "0A631EB1-01EF-DC28-9503-FC28FE695C6D",
"doc_count": 233
},
{
"key": "DF107D2B-CA1E-85C9-E01A-C966DC6F7051",
"doc_count": 226
},
{
"key": "60B8955F-38FD-8DFE-D374-4387668C8368",
"doc_count": 220
},
{
"key": "B787868A-F72E-63DC-D837-B3A864D9FFC6",
"doc_count": 174
},
{
"key": "C597EC5F-C60F-F3BA-61CB-4990F12C1893",
"doc_count": 174
}
]
}
}
What I want to do is get the "statistics" of the "doc_count" returned. I want:
Min Value
Max Value
Average
Standard Deviation
No, this is not currently possible; here is the issue tracking support for it:
https://github.com/elasticsearch/elasticsearch/issues/8110
Obviously, it is possible to do this client side if you are able to pull the full list of all buckets into memory.
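For what it's worth, later Elasticsearch releases (2.0+) resolved this via pipeline aggregations: an extended_stats_bucket sibling aggregation over the terms buckets returns min, max, avg, and standard deviation of the per-bucket doc counts. A sketch against the query above:
GET index1/type1/_search
{
"size": 0,
"aggs": {
"incidentID": {
"terms": {
"field": "incidentID",
"size": 5
}
},
"doc_count_stats": {
"extended_stats_bucket": {
"buckets_path": "incidentID._count"
}
}
}
}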
Is it possible to define a custom aggregation function in Elasticsearch?
E.g. for data:
author weekday status
me monday ok
me tuesday ok
me monday bad
I want to get an aggregation based on author and weekday, and as the value I want the concatenation of the status field:
agg1 agg2 value
me monday ok,bad
me tuesday ok
I know you can do count, but is it possible to define another function to use for aggregation?
EDIT/ANSWER: Looks like there is no multi-row aggregation support in ES, so we had to use sub-aggregations on the last field (see Akshay's example). If you need a more complex aggregation function, aggregate by id (note that you won't be able to use _id, so you'll have to duplicate it in another field); that way you can do advanced aggregation on the individual items in each bucket.
You can get roughly what you want by using sub-aggregations, available since 1.0. Assuming the documents are structured with author, weekday and status fields, you could use the aggregation below:
{
"size": 0,
"aggs": {
"author": {
"terms": {
"field": "author"
},
"aggs": {
"days": {
"terms": {
"field": "weekday"
},
"aggs": {
"status": {
"terms": {
"field": "status"
}
}
}
}
}
}
}
}
Which gives you the following result:
{
...
"aggregations": {
"author": {
"buckets": [
{
"key": "me",
"doc_count": 3,
"days": {
"buckets": [
{
"key": "monday",
"doc_count": 2,
"status": {
"buckets": [
{
"key": "bad",
"doc_count": 1
},
{
"key": "ok",
"doc_count": 1
}
]
}
},
{
"key": "tuesday",
"doc_count": 1,
"status": {
"buckets": [
{
"key": "ok",
"doc_count": 1
}
]
}
}
]
}
}
]
}
}
}
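The response doesn't concatenate the statuses for you, but flattening the nested buckets into the requested author/weekday/value rows is a small client-side step. A Python sketch over the response shape shown above (trimmed to the relevant fields):

```python
# Nested buckets as returned by the author -> days -> status aggregation above
aggregations = {
    "author": {"buckets": [
        {"key": "me", "doc_count": 3, "days": {"buckets": [
            {"key": "monday", "doc_count": 2,
             "status": {"buckets": [{"key": "bad", "doc_count": 1},
                                    {"key": "ok", "doc_count": 1}]}},
            {"key": "tuesday", "doc_count": 1,
             "status": {"buckets": [{"key": "ok", "doc_count": 1}]}},
        ]}},
    ]}
}

# Flatten into (author, weekday, comma-joined statuses) rows
rows = [
    (author["key"], day["key"],
     ",".join(s["key"] for s in day["status"]["buckets"]))
    for author in aggregations["author"]["buckets"]
    for day in author["days"]["buckets"]
]

print(rows)  # [('me', 'monday', 'bad,ok'), ('me', 'tuesday', 'ok')]
```

Note the statuses come back in the order of the terms buckets (count descending, then alphabetical), so "bad,ok" rather than "ok,bad"; sort them yourself if the order matters.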