How to aggregate on a particular field in Elasticsearch - elasticsearch

I have the documents below:
{
'id': 0,
'Title': 'Wolf',
'Major Genre': 'Action',
'IMDB': "7"
},
{
'id': 1,
'Title': 'The Land Girls',
'Major Genre': 'Drama',
'IMDB': "7"
},
{
'id': 2,
'Title': 'Beauty',
'Major Genre': 'Comedy',
'IMDB': "5"
}
I need an aggregation on the Major Genre field.
I also need to filter the output to documents where Major Genre == Comedy and IMDB > 6.
I tried the query below and got an error:
{
"size": 100,
"aggregations": {
"terms": {
"Major Genre": "Comedy"
}
}
}

Edit: split the queries
Filtering documents by Comedy genre
POST test_nons/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"IMDB": {
"gte": 4
}
}
},
{
"term": {
"Major Genre.keyword": "Comedy"
}
}
]
}
}
}
Getting all possible genres
POST test_nons/_search
{
"size": 0,
"aggs": {
"major_genres": {
"terms": {
"field": "Major Genre.keyword",
"size": 10
}
}
}
}
Ingest data
POST test_nons/_doc
{
"id": 0,
"Title": "Wolf",
"Major Genre": "Action",
"IMDB": "7"
}
POST test_nons/_doc
{
"id": 1,
"Title": "The Land Girls",
"Major Genre": "Drama",
"IMDB": "7"
}
POST test_nons/_doc
{
"id": 2,
"Title": "Beauty",
"Major Genre": "Comedy",
"IMDB": "5"
}
Request
POST test_nons/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"IMDB": {
"gte": 6
}
}
},
{
"term": {
"Major Genre.keyword": "Comedy"
}
}
]
}
},
"aggs": {
"major_genres": {
"terms": {
"field": "Major Genre.keyword",
"size": 10
}
}
}
}
Response
There are no docs with the Comedy genre and IMDB > 6, so the response would be empty.
For example purposes I will filter by IMDB > 4 instead of 6 so there is some data in the response.
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "test_nons",
"_type" : "_doc",
"_id" : "Rcd06ncB50NMsuQPeVRj",
"_score" : 0.0,
"_source" : {
"id" : 2,
"Title" : "Beauty",
"Major Genre" : "Comedy",
"IMDB" : "5"
}
}
]
},
"aggregations" : {
"major_genres" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Comedy",
"doc_count" : 1
}
]
}
}
}
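A side note on the example data: IMDB is ingested as a JSON string ("7", "5"), so dynamic mapping types it as text with a keyword sub-field, and the range filter then compares terms lexicographically rather than numerically (a rating of "10" would sort before "6", for instance). Below is a minimal sketch of an explicit mapping that avoids this; it is an assumption, not part of the original answer, and it requires creating the index before ingesting. With it, the queries would reference Major Genre directly instead of Major Genre.keyword.
# sketch only: explicit mapping for test_nons, created before ingesting the documents
PUT test_nons
{
  "mappings": {
    "properties": {
      "IMDB": { "type": "float" },
      "Major Genre": { "type": "keyword" }
    }
  }
}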

Related

Query filter for searching rollup index works with epoch time fails with date math

How do we query (filter) a rollup index?
For example, based on the query here
Request:
{
"size": 0,
"aggregations": {
"timeline": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "7d"
},
"aggs": {
"nodes": {
"terms": {
"field": "node"
},
"aggs": {
"max_temperature": {
"max": {
"field": "temperature"
}
},
"avg_voltage": {
"avg": {
"field": "voltage"
}
}
}
}
}
}
}
}
Response:
{
"took" : 93,
"timed_out" : false,
"terminated_early" : false,
"_shards" : ... ,
"hits" : {
"total" : {
"value": 0,
"relation": "eq"
},
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"timeline" : {
"buckets" : [
{
"key_as_string" : "2018-01-18T00:00:00.000Z",
"key" : 1516233600000,
"doc_count" : 6,
"nodes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "a",
"doc_count" : 2,
"max_temperature" : {
"value" : 202.0
},
"avg_voltage" : {
"value" : 5.1499998569488525
}
},
{
"key" : "b",
"doc_count" : 2,
"max_temperature" : {
"value" : 201.0
},
"avg_voltage" : {
"value" : 5.700000047683716
}
},
{
"key" : "c",
"doc_count" : 2,
"max_temperature" : {
"value" : 202.0
},
"avg_voltage" : {
"value" : 4.099999904632568
}
}
]
}
}
]
}
}
}
How do I filter, say, the last 3 days? Is it possible?
For a test case I used a fixed_interval of 1m (one minute, and also 60 minutes), and when I tried the following the error was that all query shards failed. Is it possible to add a query filter to rollup aggregations?
Test Query for searching rollup index
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "now-3d/d",
"lt": "now/d"
}
}
},
"aggregations": {
"timeline": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "7d"
},
"aggs": {
"nodes": {
"terms": {
"field": "node"
},
"aggs": {
"max_temperature": {
"max": {
"field": "temperature"
}
},
"avg_voltage": {
"avg": {
"field": "voltage"
}
}
}
}
}
}
}
}

Aggregation Elasticsearch query with sum

This is my current data. I want an aggregation query that returns, per variantId, the sum of quantity broken down by type (in/out).
hits: {
total: {
value: 5,
relation: "eq",
},
max_score: 1,
hits: [
{
_index: "transactions",
_type: "_doc",
_id: "out2391",
_score: 1,
_source: {
date: "2021-03-08",
transactionId: 2391,
brandId: 1112,
outletId: 121222,
variantId: 1321,
qty: 1,
closing: 10,
type: "out",
}
}
],
},
I want a result that returns the sum of quantity for each type (in/out) per variant:
[{
variantId: 1321,
in: sum(qty),
out: sum(qty)
},
{
variantId: 13211,
in: sum(qty),
out: sum(qty)
}
]
Ingest test documents
POST test_shaheer/_doc
{
"date": "2021-03-08",
"transactionId": 2391,
"brandId": 1112,
"outletId": 121222,
"variantId": 1321,
"qty": 1,
"closing": 10,
"type": "out"
}
POST test_shaheer/_doc
{
"date": "2021-03-08",
"transactionId": 2391,
"brandId": 1112,
"outletId": 121222,
"variantId": 1321,
"qty": 1,
"closing": 10,
"type": "out"
}
POST test_shaheer/_doc
{
"date": "2021-03-08",
"transactionId": 2391,
"brandId": 1112,
"outletId": 121222,
"variantId": 1321,
"qty": 5,
"closing": 10,
"type": "in"
}
POST test_shaheer/_doc
{
"date": "2021-03-08",
"transactionId": 2391,
"brandId": 1112,
"outletId": 121222,
"variantId": 1321,
"qty": 2,
"closing": 10,
"type": "in"
}
To achieve what you need you have to nest aggregations: first you group by variantId, then each variantId by type, and finally you sum the qty field inside each type.
Query
POST test_shaheer/_search
{
"size": 0,
"aggs": {
"variant_ids": {
"terms": {
"field": "variantId",
"size": 10
},
"aggs": {
"types": {
"terms": {
"field": "type.keyword",
"size": 10
},
"aggs": {
"qty_sum": {
"sum": {
"field": "qty"
}
}
}
}
}
}
}
}
Note "size": 0 so that no hits are returned, only the aggregation results.
Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"variant_ids" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1321,
"doc_count" : 4,
"types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "in",
"doc_count" : 2,
"qty_sum" : {
"value" : 7.0
}
},
{
"key" : "out",
"doc_count" : 2,
"qty_sum" : {
"value" : 2.0
}
}
]
}
}
]
}
}
}

ELASTICSEARCH - Get a count of values from the most recent document

I can't get a count of a value's occurrences within a filtered document.
I have this JSON:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "net",
"_type" : "_doc",
"_id" : "RTHRTH",
"_score" : 1.0,
"_source" : {
"created_at" : "2020-05-31 19:01:01",
"data" : [...]
{
"_index" : "net",
"_type" : "_doc",
"_id" : "LLLoIJBHHM",
"_score" : 1.0,
"_source" : {
"created_at" : "2020-06-23 15:11:59",
"data" : [...]
}
}
]
}
}
Inside the "data" field there are more fields nested within other fields.
I want to filter the most recent document, and then count a certain value in the most recent document.
This is my query:
{
"query": {
"match": {
"name.keyword": "net"
}
},
"sort": [
{
"created_at.keyword": {
"order": "desc"
}
}
],
"size": 1,
"aggs": {
"CountValue": {
"terms": {
"field": "data.add.serv.desc.keyword",
"include": "nginx"
}
}
}
}
And the output is:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"CountValue" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "nginx",
"doc_count" : 2
}
]
}
}
}
I suspect that doc_count is the number of documents the value appears in, not the number of times the value is repeated within the filtered document.
I would be very grateful for any advice!
Unless any of the fields under the path data.add.serv are of the nested type, the terms agg will produce per-whole-doc results, not per-field.
For example:
POST example/_doc
{
"serv": [
{
"desc": "nginx"
},
{
"desc": "nginx"
},
{
"desc": "nginx"
}
]
}
then
GET example/_search
{
"size": 0,
"aggs": {
"NAME": {
"terms": {
"field": "serv.desc.keyword"
}
}
}
}
produces doc_count==1.
When, however, specified as nested:
DELETE example
PUT example
{
"mappings": {
"properties": {
"serv": {
"type": "nested"
}
}
}
}
POST example/_doc
{"serv":[{"desc":"nginx"},{"desc":"nginx"},{"desc":"nginx"}]}
then
GET example/_search
{
"size": 0,
"aggs": {
"NAME": {
"nested": {
"path": "serv"
},
"aggs": {
"NAME": {
"terms": {
"field": "serv.desc.keyword"
}
}
}
}
}
}
we end up with doc_count==3.
This has to do with the way non-nested array fields are flattened and de-duplicated. In the end, you may need to reindex your documents into an index that has the nested mapping applied.
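As a rough sketch of that last step, assuming the original index is net (as in the response above) and net_nested is a hypothetical new index that was already created with the nested mapping:
# net_nested is a made-up destination index; create it with the nested mapping before reindexing
POST _reindex
{
  "source": { "index": "net" },
  "dest": { "index": "net_nested" }
}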
EDIT
In order to only take the latest doc, you could do the following:
PUT example
{
"mappings": {
"properties": {
"serv": {
"type": "nested"
},
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
}
then
POST example/_doc
{
"created_at" : "2020-05-31 19:01:01",
"serv": [
{
"desc": "nginx"
},
{
"desc": "nginx"
},
{
"desc": "nginx"
}
]
}
POST example/_doc
{
"created_at" : "2020-06-23 15:11:59",
"serv": [
{
"desc": "nginx"
},
{
"desc": "nginx"
}
]
}
then use a terms agg of size 1, sorted by timestamp desc:
GET example/_search
{
"size": 0,
"aggs": {
"NAME": {
"terms": {
"field": "created_at",
"order": {
"_term": "desc"
},
"size": 1
},
"aggs": {
"NAME2": {
"nested": {
"path": "serv"
},
"aggs": {
"NAME": {
"terms": {
"field": "serv.desc.keyword"
}
}
}
}
}
}
}
}

Issue with nested aggregations ElasticSearch : doing a sum after a max

I know sub-aggregations aren't possible under metric aggregations and that Elasticsearch supports sub-aggregations under bucket aggregations, but I am a bit lost on how to do this.
I want to do a sum after nested aggregations and after having aggregated by max timestamp.
Something like the code below gives me the error "Aggregator [max_date_aggs] of type [max] cannot accept sub-aggregations", which is expected. Is there a way to make it work?
{
"aggs": {
"sender_comp_aggs": {
"terms": {
"field": "senderComponent"
},
"aggs": {
"activity_mnemo_aggs": {
"terms": {
"field": "activityMnemo"
},
"aggs": {
"activity_instance_id_aggs": {
"terms": {
"field": "activityInstanceId"
},
"aggs": {
"business_date_aggs": {
"terms": {
"field": "correlationIdSet.businessDate"
},
"aggs": {
"context_set_id_closing_aggs": {
"terms": {
"field": "contextSetId.closing"
},
"aggs": {
"max_date_aggs": {
"max": {
"field": "timestamp"
},
"aggs" : {
"sum_done": {
"sum": {
"field": "itemNumberDone"
}
}
}
}
}
}
}
}
}
}
}
}
}
}
Thank you
I am not 100% sure what you would like to achieve; it would have helped if you had also shared the mapping.
A bucket aggregation is about defining the buckets/groups. As you do in your example, you can wrap/nest bucket aggregations to further break down your buckets into sub-buckets and so on.
By default Elasticsearch always calculates the count metric, but you can have other metrics calculated as well. A metric is calculated per bucket, not for another metric; this is why you cannot nest a metric aggregation under another metric aggregation. It simply does not make sense.
Depending on what your data looks like, the only change you may need to make is moving the sum_done aggregation out of the inner aggs clause, to the same level as your max_date_aggs aggregation.
Code Snippet
"aggs": {
"max_date_aggs": { "max": {"field": "timestamp"} },
"sum_done": { "sum": { "field": "itemNumberDone"} }
}
After you refined your question and provided sample documents, I managed to come up with a solution requiring one single request. As previously mentioned, the sum metric aggregation needs to operate on a bucket and not on a metric. The solution is pretty straightforward: rather than calculating the max date, re-formulate that aggregation as a terms aggregation on the timestamp, sorted descending by key and asking for exactly one bucket.
Solution
GET gos_element/_search
{
"size": 0,
"aggs": {
"sender_comp_aggs": {
"terms": {"field": "senderComponent.keyword"},
"aggs": {
"activity_mnemo_aggs": {
"terms": {"field": "activityMnemo.keyword"},
"aggs": {
"activity_instance_id_aggs": {
"terms": {"field": "activityInstanceId.keyword"},
"aggs": {
"business_date_aggs": {
"terms": {"field": "correlationIdSet.businessDate"},
"aggs": {
"context_set_id_closing_aggs": {
"terms": {"field": "contextSetId.closing.keyword"},
"aggs": {
"max_date_bucket_aggs": {
"terms": {
"field": "timestamp",
"size": 1,
"order": {"_key": "desc"}
},
"aggs": {
"sum_done": {
"sum": {"field": "itemNumberDone"}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
As I relied on the default Elasticsearch mapping, I had to refer to the .keyword-version of the fields. If your fields are directly mapped to a field of type keyword, you don't need to do that.
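For reference, a minimal sketch of such an explicit mapping (an assumption about your data, using the field names from the documents below); note that with this mapping the aggregations above would reference the fields without the .keyword suffix:
# hypothetical explicit mapping; it would need to be created before indexing the documents below
PUT gos_element
{
  "mappings": {
    "properties": {
      "senderComponent": { "type": "keyword" },
      "activityMnemo": { "type": "keyword" },
      "activityInstanceId": { "type": "keyword" },
      "contextSetId": {
        "properties": {
          "closing": { "type": "keyword" }
        }
      }
    }
  }
}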
You can test the request above right away after indexing the documents you provided with the following two commands:
PUT gos_element/_doc/AW_yu3dIa2R_HwqpSz
{
"senderComponent": "PS",
"timestamp": "2020-01-28T02:31:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "Progress",
"activityStatusNumber": 300,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 9,
"itemNumberInError": 0,
"itemNumberNotStarted": 1,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
PUT gos_element/_doc/AW_yu3dIa2R_HwqpSz8z
{
"senderComponent": "PS",
"timestamp": "2020-01-28T03:01:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "End",
"activityStatusNumber": 200,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 10,
"itemNumberInError": 0,
"itemNumberNotStarted": 0,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"errorMessages": "",
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
As a result you get back the following response (with value 10 as expected):
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"sender_comp_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "PS",
"doc_count" : 2,
"activity_mnemo_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "PScommand",
"doc_count" : 2,
"activity_instance_id_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "123466",
"doc_count" : 2,
"business_date_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1580083200000,
"key_as_string" : "2020-01-27T00:00:00.000Z",
"doc_count" : 2,
"context_set_id_closing_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "PARIS",
"doc_count" : 2,
"max_date_bucket_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 1,
"buckets" : [
{
"key" : 1580180460000,
"key_as_string" : "2020-01-28T03:01:00.000Z",
"doc_count" : 1,
"sum_done" : {
"value" : 10.0
}
}
]
}
}
]
}
}
]
}
}
]
}
}
]
}
}
]
}
}
}
Here are two documents:
{
"_type": "gos_element",
"_id": "AW_yu3dIa2R_HwqpSz-o",
"_score": 5.785128,
"_source": {
"senderComponent": "PS",
"timestamp": "2020-01-28T02:31:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "Progress",
"activityStatusNumber": 300,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 9,
"itemNumberInError": 0,
"itemNumberNotStarted": 1,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
},
{
"_type": "gos_element",
"_id": "AW_yu3dIa2R_HwqpSz8z",
"_score": 4.8696175,
"_source": {
"senderComponent": "PS",
"timestamp": "2020-01-28T03:01:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "End",
"activityStatusNumber": 200,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 10,
"itemNumberInError": 0,
"itemNumberNotStarted": 0,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"errorMessages": "",
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
}
I would like to aggregate on a few terms (senderComponent, activityMnemo, activityInstanceId, correlationIdSet.businessDate and contextSetId.closing) and also aggregate on the max timestamp within each of these aggregations. Once this is done, I would like to sum itemNumberDone.
If we take only these two documents and run the aggregations, I would like to get an itemNumberDone of 10.
Is it possible with only one query and using buckets?

Elasticsearch missing bucket aggregation

Updated
I have the following Elasticsearch query, which gives me the results below, with an aggregation.
I tried following Andrey Borisko's example but for the life of me I cannot get it working.
The main query, with a filter on companyId, finds all the full names matching 'brenda'.
The companyId aggregation returns the best-matching companyId buckets for the 'brenda' full names, based on the main filter.
My exact query
GET employee-index/_search
{
"aggs": {
"companyId": {
"terms": {
"field": "companyId"
},
"aggs": {
"filtered": {
"filter": {
"multi_match": {
"fields": [
"fullName.edgengram",
"number"
],
"query": "brenda"
}
}
}
}
}
},
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": [
"fullName.edgengram",
"number"
],
"query": "brenda"
}
}
],
"filter": [
{
"terms": {
"companyId": [
3849,
3867,
3884,
3944,
3260,
4187,
3844,
2367,
158,
3176,
3165,
3836,
4050,
3280,
2298,
3755,
3854,
7161,
3375,
7596,
836,
4616
]
}
}
]
}
}
}
My exact result
{
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 8.262566,
"hits" : [
{
"_index" : "employee-index",
"_type" : "_doc",
"_id" : "67207",
"_score" : 8.262566,
"_source" : {
"companyGroupId" : 1595,
"companyId" : 158,
"fullName" : "Brenda Grey",
"companyTradingName" : "Sky Blue",
}
},
{
"_index" : "employee-index",
"_type" : "_doc",
"_id" : "7061",
"_score" : 7.868355,
"_source" : {
"companyGroupId" : 1595,
"companyId" : 158,
"fullName" : "Brenda Eaton",
"companyTradingName" : "Sky Blue",
}
},
{
"_index" : "employee-index",
"_type" : "_doc",
"_id" : "107223",
"_score" : 7.5100465,
"_source" : {
"companyGroupId" : 1595,
"companyId" : 3260,
"fullName" : "Brenda Bently",
"companyTradingName" : "Green Ice",
}
}
]
},
"aggregations" : {
"companyId" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "158",
"doc_count" : 2,
"filtered" : {
"doc_count" : 2
}
},
{
"key" : "3260",
"doc_count" : 1,
"filtered" : {
"doc_count" : 1
}
}
]
}
}
}
This is how I want the filtered companies results to look:
"aggregations": {
"companyId": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "158",
"doc_count": 2,
"filtered": {
"doc_count": 2 (<- 2 records found of brenda)
}
},
{
"key": "3260",
"doc_count": 1,
"filtered": {
"doc_count": 1 (<- 1 records found of brenda)
}
},
{
"key": "4616",
"doc_count": 0,
"filtered": {
"doc_count": 0 (<- 0 records found of brenda)
}
},
... and so on. Basically, I want all the other companies in the filtered list to show up as well, with a doc_count of 0.
]
}
If I understood you correctly, you want to run an aggregation, or part of an aggregation, independently of the query. In this case you should use a global aggregation.
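For illustration, a minimal sketch of that idea, reusing the index and field names from your query (the global aggregation ignores the query, so the companyId buckets are computed over all documents while the hits still match 'brenda'):
# sketch only: companyId buckets over all documents, independent of the brenda query
GET employee-index/_search
{
  "query": {
    "multi_match": {
      "fields": ["fullName.edgengram", "number"],
      "query": "brenda"
    }
  },
  "aggs": {
    "all_documents": {
      "global": {},
      "aggs": {
        "companyId": {
          "terms": { "field": "companyId" }
        }
      }
    }
  }
}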
UPDATE after your comment
In this case you need to use a filter aggregation. So, for example, this kind of query (a simplified version of your example) that you currently have:
GET indexName/_search
{
"size": 0,
"query": {
"match": {
"firstName": "John"
}
},
"aggs": {
"by_phone": {
"terms": {
"field": "cellPhoneNumber"
}
}
}
}
becomes this:
GET indexName/_search
{
"size": 0,
"aggs": {
"by_phone": {
"terms": {
"field": "cellPhoneNumber"
},
"aggs": {
"filtered": {
"filter": {
"match": {
"firstName": "John"
}
}
}
}
}
}
}
the output will look slightly different though:
...
"aggregations" : {
"by_phone" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 260072,
"buckets" : [
{
"key" : "+9649400",
"doc_count" : 270,
"filtered" : {
"doc_count" : 0 <-- not John
}
},
{
"key" : "+8003000",
"doc_count" : 184,
"filtered" : {
"doc_count" : 3 <-- this is John
}
},
{
"key" : "+41025026",
"doc_count" : 168,
"filtered" : {
"doc_count" : 0 <-- not John
}
}
...
And now, if you need the results of the query as well, you have to wrap it in a global aggregation like so:
GET indexName/_search
{
"size": 20,
"from": 0,
"query": {
"match": {
"firstName": "John"
}
},
"aggs": {
"all": {
"global": {},
"aggs": {
"by_phone": {
"terms": {
"field": "cellPhoneNumber"
},
"aggs": {
"filtered": {
"filter": {
"match": {
"firstName": "John"
}
}
}
}
}
}
}
}
}
Reviewed version based on your query:
GET employee-index/_search
{
"size": 0,
"aggs": {
"filtered": {
"filter": {
"bool": {
"filter": [
{
"terms": {
"companyId": [
3849,
3867,
3884,
3944,
3260,
4187,
3844,
2367,
158,
3176,
3165,
3836,
4050,
3280,
2298,
3755,
3854,
7161,
3375,
7596,
836,
4616
]
}
}
]
}
},
"aggs": {
"by_companyId": {
"terms": {
"field": "companyId",
"size": 1000
},
"aggs": {
"testing": {
"filter": {
"multi_match": {
"fields": [
"fullName"
],
"query": "brenda"
}
}
}
}
}
}
}
}
}
}
