I have 4 days' experience with Elasticsearch 1.7.2.
I have a collection of documents, each of which is a User. A User has a number of Answers, linked through UserAnswers, which gives a document reference of user_answers.answer[], where answer is an array of objects.
The user_answers.answer[].correct is a boolean field which tells me if the answer given by the user is correct or not.
I would like to list the users and also display the total number of correct and incorrect answers they have.
So far I have tried a number of different approaches and the one I'll include here is as close as I've got so far in 1.5 days of trying.
Use a terms aggregation to create a bucket for each User by username.
Filter each bucket to leave only correct or incorrect answers.
Count the number of filtered answers.
{
  "size": 0,
  "filter": {
    "bool": {
      "must_not": {
        // Remove users who already have this award
        "term": {"awards_users.award_id": 2}
      }
    }
  },
  "aggs": {
    "users": {
      "terms": {"field": "username"},
      "aggs": {
        "correct": {
          "filter": {
            "term": {"user_answers.answer.correct": true}
          },
          "aggs": {
            "count": {
              "value_count": {
                "field": "user_answers.answer.id"
              }
            }
          }
        }
        // Same for incorrect, but inverted correct value
      }
    }
  }
}
Sample response
"key": "neon1024",
"doc_count": 1,
"correct": {
"doc_count": 1,
"count": {
"value": 7 // Expected 1 correct & 6 incorrect
This is the record which I am testing against, and I am expecting that 1 is returned instead of 7. There are 7 answers in total, 6 incorrect and 1 correct. This I have verified in my document index.
The problem
For some reason the filter seems to be ignored, leaving all of the related answers in the bucket. Hence the aggregation sees them all, rather than showing the expected value.
How can I use an aggregation to split my counts based on the values of the related answers?
Thanks for reading my long question!

As suggested, you probably have your answers mapped as object, when you should be using the nested type.
With the nested type, Elasticsearch stores your answers as individual documents linked to the root one, which lets you run the expected aggregations on them. You'll have to use a nested aggregation in your query to achieve that.
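To make the difference concrete, here is a rough Python simulation of the two mappings. It is only an illustration of the idea, not Elasticsearch code: the helper names are made up, and the sample answers are invented to match the 1 correct / 6 incorrect case from the question.

```python
def flatten_object_array(answers):
    """Mimic the object mapping: each sub-field becomes one flat list of values."""
    flat = {}
    for answer in answers:
        for field, value in answer.items():
            flat.setdefault(field, []).append(value)
    return flat


def count_ids_object(answers, correct):
    """Term filter + value_count against the flattened (object) layout."""
    flat = flatten_object_array(answers)
    # The term filter matches the whole user document if ANY value matches...
    if correct in flat["correct"]:
        # ...and value_count then sees every id stored on that document.
        return len(flat["id"])
    return 0


def count_ids_nested(answers, correct):
    """With the nested mapping, each answer is filtered on its own."""
    return sum(1 for answer in answers if answer["correct"] == correct)


# Invented sample: 7 answers, only the first one is correct.
answers = [{"id": i, "correct": i == 1} for i in range(1, 8)]

print(count_ids_object(answers, True))   # 7
print(count_ids_nested(answers, True))   # 1
print(count_ids_nested(answers, False))  # 6
```

The first number is the 7 seen in the question: the link between each id and its own correct flag is lost once the array is flattened.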
So I'd say it would be best to map your document like this:
PUT /test
{
  "mappings": {
    "your_type": {
      "properties": {
        "username": {
          "type": "string",
          "index": "not_analyzed"
        },
        "user_answers": {
          "type": "nested",
          "properties": {
            "id": { "type": "integer" },
            "answer": { "type": "string" },
            "correct": { "type": "boolean" }
          }
        }
      }
    }
  }
}
Test document:
PUT /test/your_type/1
{
  "username": "neon1024",
  "user_answers": [
    { "id": 1, "answer": "answer1", "correct": true },
    { "id": 2, "answer": "answer2", "correct": true },
    { "id": 3, "answer": "answer3", "correct": false }
  ]
}
POST /test/_search?search_type=count
{
  "aggs": {
    "users": {
      "terms": {
        "field": "username"
      },
      "aggs": {
        "DiveIn": {
          "nested": {
            "path": "user_answers"
          },
          "aggs": {
            "CorrectVsIncorrect": {
              "terms": {
                "field": "user_answers.correct",
                "size": 2
              }
            }
          }
        }
      }
    }
  }
}
And Final result:
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
"hits": {
"total": 1,
"max_score": 0,
"hits": []
"aggregations": {
"users": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
"key": "neon1024",
"doc_count": 1,
"DiveIn": {
"doc_count": 3,
"CorrectVsIncorrect": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
"key": "T",
"doc_count": 2
"key": "F",
"doc_count": 1
Here "key": "T" is the bucket for correct answers, and its "doc_count": 2 is the number of them. Likewise, "key": "F" is the bucket for incorrect answers.
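If you want the totals per user on the client side, something along these lines should do. It is a minimal sketch that assumes the response shape shown above, including the "T"/"F" bucket keys (other Elasticsearch versions may encode boolean keys differently); `counts_per_user` is a name I made up.

```python
def counts_per_user(response):
    """Turn the nested terms buckets into {username: {"correct": n, "incorrect": m}}."""
    result = {}
    for user in response["aggregations"]["users"]["buckets"]:
        counts = {"correct": 0, "incorrect": 0}
        for bucket in user["DiveIn"]["CorrectVsIncorrect"]["buckets"]:
            # Assumption: "T" marks true and anything else marks false.
            name = "correct" if bucket["key"] == "T" else "incorrect"
            counts[name] = bucket["doc_count"]
        result[user["key"]] = counts
    return result


# Trimmed copy of the result above.
response = {
    "aggregations": {
        "users": {
            "buckets": [
                {
                    "key": "neon1024",
                    "doc_count": 1,
                    "DiveIn": {
                        "doc_count": 3,
                        "CorrectVsIncorrect": {
                            "buckets": [
                                {"key": "T", "doc_count": 2},
                                {"key": "F", "doc_count": 1},
                            ]
                        },
                    },
                }
            ]
        }
    }
}

print(counts_per_user(response))
# {'neon1024': {'correct': 2, 'incorrect': 1}}
```

Starting both counters at 0 means a user with no correct (or no incorrect) answers still gets both keys.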


ElasticSearch Max Agg on lowest value inside a list property of the document

I'm looking to do a Max aggregation on a value inside a property of my document. The property is a list of complex objects (key and value). Here's my data:
"id" : "1",
"listItems" :
"key" : "li1",
"value" : 100
"key" : "li2",
"value" : 5000
"id" : "2",
"listItems" :
"key" : "li3",
"value" : 200
"key" : "li2",
"value" : 2000
When I do the Nested Max Aggregation on "listItems.value", I'm expecting the max value returned to be 200 (and not 5000), because I want the logic to first find the MIN value under listItems for each document and then do the Max Aggregation on those minimums. Is it possible to do something like this?
The search query performs the following aggregation :
Terms aggregation on the id field
Min aggregation on listItems.value
Max bucket aggregation that is a sibling pipeline aggregation which identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of the bucket(s).
Please refer to nested aggregation, to get a detailed explanation on it.
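To check the expected numbers, the same three steps can be emulated in plain Python on the sample data from the question. This only mirrors the logic and is not how Elasticsearch executes it; `max_of_mins` is a hypothetical helper name.

```python
docs = [
    {"id": "1", "listItems": [{"key": "li1", "value": 100},
                              {"key": "li2", "value": 5000}]},
    {"id": "2", "listItems": [{"key": "li3", "value": 200},
                              {"key": "li2", "value": 2000}]},
]


def max_of_mins(docs):
    # Steps 1 and 2: one bucket per id, holding the minimum of its listItems values.
    mins = {doc["id"]: min(item["value"] for item in doc["listItems"])
            for doc in docs}
    # Step 3: the largest of those minimums, plus the bucket key(s) that hold it.
    best = max(mins.values())
    keys = [doc_id for doc_id, value in mins.items() if value == best]
    return mins, best, keys


mins, best, keys = max_of_mins(docs)
print(mins)  # {'1': 100, '2': 200}
print(best)  # 200
print(keys)  # ['2']
```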
Adding a working example with index data, index mapping, search query, and search result.
Index Mapping:
"mappings": {
"properties": {
"listItems": {
"type": "nested"
Index Data:
"id" : "1",
"listItems" :
"key" : "li1",
"value" : 100
"key" : "li2",
"value" : 5000
"id" : "2",
"listItems" :
"key" : "li3",
"value" : 200
"key" : "li2",
"value" : 2000
Search Query:
{
  "size": 0,
  "aggs": {
    "id_terms": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "nested_entries": {
          "nested": {
            "path": "listItems"
          },
          "aggs": {
            "min_position": {
              "min": {
                "field": "listItems.value"
              }
            }
          }
        }
      }
    },
    "maxValue": {
      "max_bucket": {
        "buckets_path": "id_terms>nested_entries>min_position"
      }
    }
  }
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
"key": "1",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
"key": "2",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 200.0
"maxValue": {
"value": 200.0,
"keys": [
The initial post mentioned a nested aggregation, so I was sure the question was about nested documents. Since I came to a solution before seeing the other answer, I'm keeping the whole thing for history, but it actually differs only in adding the nested aggregation.
The whole process can be explained like this:
Bucket each document into its own bucket.
Use a nested aggregation to be able to aggregate on nested documents.
Use a min aggregation to find the minimum value among all of a document's nested documents, and by that, for the document itself.
Finally, use another aggregation to calculate the maximum value among the results of the previous aggregation.
Given this setup:
// PUT /index
"mappings": {
"properties": {
"children": {
"type": "nested",
"properties": {
"value": {
"type": "integer"
// POST /index/_doc
"children": [
{ "value": 12 },
{ "value": 45 }
// POST /index/_doc
"children": [
{ "value": 7 },
{ "value": 35 }
I can use these aggregations in the request to get the required value:
{
  "size": 0,
  "aggs": {
    "document": {
      "terms": {"field": "_id"},
      "aggs": {
        "children": {
          "nested": {
            "path": "children"
          },
          "aggs": {
            "minimum": {
              "min": {
                "field": "children.value"
              }
            }
          }
        }
      }
    },
    "result": {
      "max_bucket": {
        "buckets_path": "document>children>minimum"
      }
    }
  }
}
"aggregations": {
"document": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
"key": "O4QxyHQBK5VO9CW5xJGl",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 7.0
"key": "OoQxyHQBK5VO9CW5kpEc",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 12.0
"result": {
"value": 12.0,
"keys": [
There should also be a workaround that uses a script when calculating the max: all the script needs to do is find and return the smallest value in the document.

Elasticsearch Histogram of visits

I'm quite new to Elasticsearch and I have failed to build a histogram based on ranges of visits. I am not even sure that it's possible to create this kind of chart with a single query in Elasticsearch, but I have the feeling that it could be possible with a pipeline aggregation or maybe a scripted aggregation.
Here is a test dataset with which I'm working:
PUT /test_histo
{ "settings": { "number_of_shards": 1 }}
PUT /test_histo/_mapping/visit
"properties": {
"user": {"type": "string" },
"datevisit": {"type": "date"},
"page": {"type": "string"}
POST test_histo/visit/_bulk
If we consider the ranges [1,2[, [2,3[, [3, inf.[
The expected result should be :
[1,2[ = 2
[2,3[ = 1
[3, inf.[ = 1
All my efforts to find the histogram showing a customer visit frequency remained to date unsuccessful. I would be pleased to have a few tips, tricks or ideas to get a response to my problem.
There are two ways you can do it.
The first is to do it in Elasticsearch, which requires a Scripted Metric Aggregation. You can read more about it in the Elasticsearch documentation.
Your query would look like this
{
  "size": 0,
  "aggs": {
    "visitors_over_time": {
      "date_histogram": {
        "field": "datevisit",
        "interval": "week"
      },
      "aggs": {
        "no_of_visits": {
          "scripted_metric": {
            "init_script": "_agg['values'] = new java.util.HashMap();",
            "map_script": "if (_agg.values[doc['user'].value]==null) {_agg.values[doc['user'].value]=1} else {_agg.values[doc['user'].value]+=1;}",
            "combine_script": "someHashMap = new java.util.HashMap();for(x in _agg.values.keySet()) {value=_agg.values[x];if(value<3){key='[' + value +',' + (value + 1) + '[';}else{key='[3,inf[';}; if(someHashMap[key]==null){someHashMap[key] = 1}else{someHashMap[key] += 1}}; return someHashMap;"
          }
        }
      }
    }
  }
}
where you can change the period of time by setting the interval field of the date_histogram object to values like day, week or month.
Your response would look like this
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
"hits": {
"total": 7,
"max_score": 0,
"hits": []
"aggregations": {
"visitors_over_time": {
"buckets": [
"key_as_string": "2015-11-23T00:00:00.000Z",
"key": 1448236800000,
"doc_count": 7,
"no_of_visits": {
"value": [
"[2,3[": 1,
"[3,inf[": 1,
"[1,2[": 2
The second method is to do the work of the scripted_metric on the client side, using the result of a Terms Aggregation. You can read more about it in the Elasticsearch documentation.
Your query will look like this
GET test_histo/visit/_search
{
  "size": 0,
  "aggs": {
    "visitors_over_time": {
      "date_histogram": {
        "field": "datevisit",
        "interval": "week"
      },
      "aggs": {
        "no_of_visits": {
          "terms": {
            "field": "user",
            "size": 10
          }
        }
      }
    }
  }
}
and the response will be
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
"hits": {
"total": 7,
"max_score": 0,
"hits": []
"aggregations": {
"visitors_over_time": {
"buckets": [
"key_as_string": "2015-11-23T00:00:00.000Z",
"key": 1448236800000,
"doc_count": 7,
"no_of_visits": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
"key": "john",
"doc_count": 3
"key": "mary",
"doc_count": 2
"key": "jean",
"doc_count": 1
"key": "robert",
"doc_count": 1
where, on the response, you can count how many users fall into each doc_count range for each period.
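That client-side counting step could look roughly like this in Python. It is a sketch under the assumption that you have already pulled the user buckets for one period out of the response; `visit_range` and `frequency_histogram` are made-up names, and the ranges are the ones from the question.

```python
from collections import Counter


def visit_range(count):
    """Map a visit count to its range label: [1,2[, [2,3[ or [3,inf[."""
    if count >= 3:
        return "[3,inf["
    return "[%d,%d[" % (count, count + 1)


def frequency_histogram(user_buckets):
    """Count how many users fall into each visit-count range."""
    return dict(Counter(visit_range(b["doc_count"]) for b in user_buckets))


# The no_of_visits buckets of the single weekly period in the response above.
buckets = [
    {"key": "john", "doc_count": 3},
    {"key": "mary", "doc_count": 2},
    {"key": "jean", "doc_count": 1},
    {"key": "robert", "doc_count": 1},
]

print(frequency_histogram(buckets))
# {'[3,inf[': 1, '[2,3[': 1, '[1,2[': 2}
```

Keep in mind that the terms aggregation above has "size": 10, so only the top 10 users per period come back. For a complete histogram you would need a size large enough to cover all users.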
Have a look at:
If you want to show it in a ready-made UI, use Kibana.
A query like this:
GET _search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "visits": {
      "date_histogram": {
        "field": "datevisit",
        "interval": "month"
      }
    }
  }
}
This should give you a histogram. I don't have Elasticsearch here at the moment, so there might be some fat-fingered typos.
Then you could add query terms to only show the histogram for a specific page, or you could have an outer aggregation bucket which aggregates per page or per user.
Something like this:
GET _search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "users": {
      "terms": {
        "field": "user"
      },
      "aggs": {
        "visits": {
          "date_histogram": {
            "field": "datevisit",
            "interval": "month"
          }
        }
      }
    }
  }
}
Have a look at this solution:
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "periods": {
      "filters": {
        "filters": {
          "1-2": {
            "range": {
              "datevisit": { "gte": "2015-11-25", "lt": "2015-11-26" }
            }
          },
          "2-3": {
            "range": {
              "datevisit": { "gte": "2015-11-26", "lt": "2015-11-27" }
            }
          },
          "3-": {
            "range": {
              "datevisit": { "gte": "2015-11-27" }
            }
          }
        }
      },
      "aggs": {
        "users": {
          "terms": {"field": "user"}
        }
      }
    }
  }
}
Step by step:
Filters aggregation: defines the ranges for the next aggregation. In this case we define 3 periods using date range filters.
Nested users aggregation: runs once per filter, so it returns as many sets of results as you have defined filters. In this case, you'll get 3 sets of user buckets, one per date range.
You'll get a result like this:
"aggregations" : {
"periods" : {
"buckets" : {
"1-2" : {
"users" : {
"buckets" : [
{"key" : XXX,"doc_count" : NNN},
{"key" : YYY,"doc_count" : NNN},
"2-3" : {
"users" : {
"buckets" : [
{"key" : XXX1,"doc_count" : NNN1},
{"key" : YYY1,"doc_count" : NNN1},
"3-" : {
"users" : {
"buckets" : [
{"key" : XXX2,"doc_count" : NNN2},
{"key" : YYY2,"doc_count" : NNN2},
Try it, and tell if it works

Limit aggregations to list of values

Can I limit aggregations to return only specific list of values? I have something like this:
{ "aggs" : {
"province" : {
"terms" : {
"field" : "province"
"query": {
"bool": {
//my query..
But let's say I know the list of provinces I want counts for ({'province1', 'province2', 'province3'}). Is it possible to restrict the returned list of provinces without affecting my query results?
I want to get:
//list of hits..
"aggregations": {
"province": {
"buckets": [
"key": "province1",
"doc_count": 200
"key": "province2",
"doc_count": 162
"key": "province3",
"doc_count": 162
// even if there is more possible provinces
// I don't want to see them
Sure, just use term filters.
Here's an example. Let's say I have visit stats for a bunch of different IP addresses, but I only want to get document counts for two of them. I could do this:
POST /test_index/_search?search_type=count
"aggregations": {
"ip": {
"terms": {
"field": "ip",
"size": 10,
"include": [
and get back something like:
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
"hits": {
"total": 7,
"max_score": 0,
"hits": []
"aggregations": {
"ip": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
"key": "",
"doc_count": 3
"key": "",
"doc_count": 3
Here is some code I used to play around with it:
So your example might look like:
"aggs": {
"province": {
"terms": {
"field": "province",
"include": [

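If you build the request from code, the same idea can be sketched in Python like this. It only assembles the request body and sends nothing; `terms_with_include` is a made-up helper, and the province names are the placeholders from the question.

```python
import json


def terms_with_include(field, values, size=10):
    """Terms aggregation that only returns buckets for the listed values."""
    return {"terms": {"field": field, "size": size, "include": list(values)}}


wanted = ["province1", "province2", "province3"]
body = {
    # "query": {...}  # your existing query goes here, unchanged
    "aggs": {"province": terms_with_include("province", wanted)}
}

print(json.dumps(body, indent=2))
```

Because the include list lives inside the aggregation, it only limits which buckets are returned. The hits produced by your query stay the same.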
How to get an Elasticsearch aggregation with multiple fields

I'm attempting to find related tags to the one currently being viewed. Every document in our index is tagged. Each tag is formed of two parts - an ID and text name:
meta: {
tags: [
id: 123,
name: 'Biscuits'
id: 456,
name: 'Cakes'
id: 789,
name: 'Breads'
To fetch the related tags I am simply querying the documents and getting an aggregate of their tags:
"query": {
"bool": {
"must": [
"match": {
"item.meta.tags.id": "123"
"aggs": {
"baked_goods": {
"terms": {
"field": "item.meta.tags.id",
"min_doc_count": 2
This works perfectly, I am getting the results I want. However, I require both the tag ID and name to do anything useful. I have explored how to accomplish this, the solutions seem to be:
Combine the fields when indexing
A script to munge together the fields
A nested aggregation
Options one and two are not available to me, so I have been going with option 3, but it's not responding as expected. Given the following query (still searching for documents also tagged with 'Biscuits'):
"aggs": {
"baked_goods": {
"terms": {
"field": "item.meta.tags.id",
"min_doc_count": 2
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.name"
I will get this result:
"aggregations": {
"baked_goods": {
"buckets": [
"key": "456",
"doc_count": 11,
"name": {
"buckets": [
"key": "Biscuits",
"doc_count": 11
"key": "Cakes",
"doc_count": 11
The nested aggregation includes both the search term and the tag I'm after (returned in alphabetical order).
I have tried to mitigate this by adding an exclude to the nested aggregation but this slowed the query down far too much (around 100 times for 500000 docs). So far the fastest solution is to de-dupe the result manually.
What is the best way to get an aggregation of tags with both the tag ID and tag name in the response?
Thanks for making it this far!
By the looks of it, your tags is not nested.
For this aggregation to work, you need it nested so that there is an association between an id and a name. Without nested the list of ids is just an array and the list of names is another array:
"item": {
"properties": {
"meta": {
"properties": {
"tags": {
"type": "nested", <-- nested field
"include_in_parent": true, <-- to, also, keep the flat array-like structure
"properties": {
"id": {
"type": "integer"
"name": {
"type": "string"
Also, note that I've added to the mapping this line "include_in_parent": true which means that your nested tags will, also, behave like a "flat" array-like structure.
So, everything you had so far in your queries will still work without any changes to the queries.
But, for this particular query of yours, the aggregation needs to change to something like this:
{
  "aggs": {
    "baked_goods": {
      "nested": {
        "path": "item.meta.tags"
      },
      "aggs": {
        "name": {
          "terms": {
            "field": "item.meta.tags.id"
          },
          "aggs": {
            "name": {
              "terms": {
                "field": "item.meta.tags.name"
              }
            }
          }
        }
      }
    }
  }
}
And the result is like this:
"aggregations": {
"baked_goods": {
"doc_count": 9,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
"key": 123,
"doc_count": 3,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
"key": "biscuits",
"doc_count": 3
"key": 456,
"doc_count": 2,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
"key": "cakes",
"doc_count": 2

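To get from that result to a plain list of id/name pairs, a small client-side helper is enough. This is a sketch that assumes the response shape shown above (with both levels named "name", as in the query); `tag_pairs` is a hypothetical name.

```python
def tag_pairs(response):
    """Flatten the nested buckets into (tag id, tag name, doc_count) tuples."""
    pairs = []
    id_buckets = response["aggregations"]["baked_goods"]["name"]["buckets"]
    for id_bucket in id_buckets:
        name_buckets = id_bucket["name"]["buckets"]
        if name_buckets:
            # With nested tags, each id bucket should hold exactly one name.
            pairs.append((id_bucket["key"],
                          name_buckets[0]["key"],
                          id_bucket["doc_count"]))
    return pairs


# Trimmed copy of the result above.
response = {
    "aggregations": {
        "baked_goods": {
            "doc_count": 9,
            "name": {
                "buckets": [
                    {"key": 123, "doc_count": 3,
                     "name": {"buckets": [{"key": "biscuits", "doc_count": 3}]}},
                    {"key": 456, "doc_count": 2,
                     "name": {"buckets": [{"key": "cakes", "doc_count": 2}]}},
                ]
            },
        }
    }
}

print(tag_pairs(response))
# [(123, 'biscuits', 3), (456, 'cakes', 2)]
```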
Elasticsearch Terms or Cardinality Aggregation - Order by number of distinct values

I am doing some analysis to find unique pairs from 100s of millions of documents. The mock example is as shown below:
doc field1 field2
90% of the documents contain a unique pair, like docs 3, 4, 5, 6 and 7 above, which I am not interested in seeing in my aggregation result. I am interested in aggregating docs 1 and 2.
Terms Aggregation Query:
"aggs": {
"f1": {
"terms": {
"field": "FIELD1",
"min_doc_count": 2
"aggs": {
"f2": {
"terms": {
"field": "FIELD2"
Term Aggregation Result
"aggregations": {
"f1": {
"buckets": [
"key": "PPP",
"doc_count": 2,
"f2": {
"buckets": [
"key": "QQQ",
"doc_count": 2
"key": "XXX",
"doc_count": 2,
"f2": {
"buckets": [
"key": "YYY",
"doc_count": 2
"key": "AAA",
"doc_count": 2,
"f2": {
"buckets": [
"key": "BBB",
"doc_count": 1
"key": "CCC",
"doc_count": 1
I am only interested in key AAA being in the aggregation result. What is the best way to filter the aggregation result down to the keys that have distinct pairs?
I tried a cardinality aggregation, which returns the unique value count. However, I am not able to filter out what I am not interested in from the aggregation results.
Cardinality Aggregation Query
"aggs": {
"f1": {
"terms": {
"field": "FIELD1",
"min_doc_count": 2
"aggs": {
"f2": {
"cardinality": {
"field": "FIELD2"
Cardinality Aggregation Result
"aggregations": {
"f1": {
"buckets": [
"key": "PPP",
"doc_count": 2,
"f2": {
"value" : 1
"key": "XXX",
"doc_count": 2,
"f2": {
"value" : 1
"key": "AAA",
"doc_count": 2,
"f2": {
"value" : 2
At least if I could sort by the cardinality value, that would help me find some workarounds. Please help me in this regard.
P.S: Writing a spark/mapreduce program to post process/filter the aggregation result is not expected solution for this issue.
I suggest using a filter query along with aggregations, since you are only interested in field1=AAA.
I have a similar example here.
For example, I have an index of all patients in my hospital. I store their drug use in a nested object DRUG. Each patient could take different drugs, and each could take a single drug multiple times.
Now if I wanted to find the number of patients who took aspirin at least once, the query could be:
"size": 0,
"_source": false,
"query": {
"filtered": {
"query": {
"match_all": {}
"filter": {
"nested": {
"path": "DRUG",
"filter": {
"bool": {
"must": [{ "term": { "DRUG.NAME": "aspirin" } }]
"aggs": {
"nested": {
"path": "DRUG"
"aggs": {
"terms": { "field": "DRUG.NAME", "size": 0 },
"aggs": {
"DISTINCT": { "cardinality": { "field": "DRUG.PATIENT" } }
Sample result:
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
"hits": {
"total": 6,
"max_score": 0,
"hits": []
"aggregations": {
"doc_count": 11,
"buckets": [
"key": "aspirin",
"doc_count": 6,
"value": 6
"key": "vitamin-b",
"doc_count": 3,
"value": 2
"key": "vitamin-c",
"doc_count": 2,
"value": 2
The first one in the buckets would be aspirin. But you can see that 2 other patients had also taken vitamin-b when they took aspirin.
If you change the field value of DRUG.NAME to another drug name, for example "vitamin-b", I suppose you would get vitamin-b in the first position of the buckets.
Hopefully this is helpful to your question.
A bit late, hope it would help for others.
A simple approach is to filter only 'AAA' records in top aggregation:
"size": 0,
"aggregations": {
"filterAAA": {
"filter": {
"term": {
"aggregations": {
"f1": {
"terms": {
"field": "FIELD1",
"min_doc_count": 2
"aggregations": {
"f2": {
"terms": {
"field": "FIELD2"
