Elasticsearch: Is min_doc_count supported for Metric Aggregations

I am new to Elasticsearch and am trying to write a query with a metric aggregation over my docs. But when I add the field min_doc_count=1 to my sum metric aggregation, I get an error:
`
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "[sum] unknown field [min_doc_count], parser not found"
}
],
"type": "illegal_argument_exception",
"reason": "[sum] unknown field [min_doc_count], parser not found"
},
"status": 400
}
`
What am I missing here?
`
{
"aggregations" : {
"myKey" : {
"sum" : {
"field" : "field1",
"min_doc_count": 1
}
}
}
}
`

min_doc_count is not a parameter of the sum aggregation, which is why the parser rejects it; I'm not sure why you have it under the sum keyword.
The idea of min_doc_count is to make sure that the buckets returned by a bucket aggregation (such as terms) contain at least N documents. The example below would only return subject buckets for subjects that appear in 10 or more documents.
GET _search
{
"aggs" : {
"docs_per_subject" : {
"terms" : {
"field" : "subject",
"min_doc_count": 10
}
}
}
}
So with that in mind, yours would change to the following. Note that 1 is already the default min_doc_count for a terms aggregation, so it's not really necessary to keep the parameter at all.
GET _search
{
"aggs" : {
"docs_per_subject" : {
"terms" : {
"field" : "field1",
"min_doc_count": 1
}
}
}
}
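To make the bucket/metric distinction concrete, here is a plain-Python emulation. It is not Elasticsearch code: the sample documents and the helper names terms_agg and sum_agg are invented for illustration. A terms-style aggregation produces buckets that each carry a doc_count, so a minimum can be applied to them. A sum produces a single number, so there is nothing for min_doc_count to filter.

```python
from collections import Counter

# Hypothetical sample documents, invented for this illustration.
docs = [
    {"subject": "math", "field1": 3},
    {"subject": "math", "field1": 4},
    {"subject": "art", "field1": 5},
]


def terms_agg(docs, field, min_doc_count=1):
    """Emulate a terms aggregation: one bucket per distinct value."""
    counts = Counter(d[field] for d in docs if field in d)
    return [
        {"key": key, "doc_count": n}
        for key, n in counts.most_common()
        if n >= min_doc_count  # buckets below the threshold are dropped
    ]


def sum_agg(docs, field):
    """Emulate a sum aggregation: a single value, no buckets to filter."""
    return {"value": sum(d[field] for d in docs if field in d)}


print(terms_agg(docs, "subject", min_doc_count=2))
print(sum_agg(docs, "field1"))
```

With min_doc_count=2 only the "math" bucket survives, while the sum is computed over every document that has the field.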

If you wish to sum only the values of field1 that are greater than zero, you can filter out the rest in the query section:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"field1": {
"gt": 0
}
}
}
]
}
},
"aggregations": {
"myKey": {
"sum": {
"field": "field1"
}
}
}
}
See the Bool query and Range query documentation.
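If the query itself should keep matching every document (for example because other aggregations need them), the same restriction can probably be moved into a filter aggregation wrapped around the sum. Below is a minimal sketch of such a request body, written as a Python dict that could be sent with any HTTP client. The aggregation name positive_only is made up; field1 and myKey come from the question.

```python
import json

# Request body only; nothing here contacts a cluster.
body = {
    "size": 0,
    "aggregations": {
        # "positive_only" is a hypothetical name chosen for this sketch.
        "positive_only": {
            # Only documents with field1 > 0 reach the sub-aggregation.
            "filter": {"range": {"field1": {"gt": 0}}},
            "aggregations": {
                "myKey": {"sum": {"field": "field1"}}
            },
        }
    },
}

print(json.dumps(body, indent=2))
```

The sum would then be found under aggregations.positive_only.myKey in the response.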

Related

How to perform sub-aggregation in elasticsearch?

I have a set of article documents in elasticsearch with fields content and publish_datetime.
I am trying to retrieve most frequent words from articles with publish year == 2021.
GET articles/_search
{
"query": {
"match_all": {}
},
"aggs": {
"word_counts": {
"terms": {
"field": "content"
}
},
"publish_datetime": {
"terms": {
"field": "publish_datetime"
}
},
"aggs": {
"word_counts_2021": {
"bucket_selector": {
"buckets_path": {
"word_counts": "word_counts",
"pd": "publish_datetime"
},
"script": "LocalDateTime.parse(params.pd).getYear() == 2021"
}
}
}
}
}
This fails with:
{
"error" : {
"root_cause" : [
{
"type" : "parsing_exception",
"reason" : "Unknown aggregation type [word_counts_2021]",
"line" : 17,
"col" : 25
}
],
"type" : "parsing_exception",
"reason" : "Unknown aggregation type [word_counts_2021]",
"line" : 17,
"col" : 25,
"caused_by" : {
"type" : "named_object_not_found_exception",
"reason" : "[17:25] unknown field [word_counts_2021]"
}
},
"status" : 400
}
which does not make sense to me, because word_counts_2021 is the name of the aggregation according to the docs. It's not an aggregation type. I am the one who picks the name, so I thought it could have basically any value.
Does anyone have any idea what's going on here? So far, it seems like a pretty unintuitive service to me.
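The error occurs because the inner "aggs" key sits next to word_counts and publish_datetime, so Elasticsearch parses "aggs" as an aggregation name and word_counts_2021 as its type. The agg as you have it written seems to be filtering publish_datetime buckets so that you only include those in the year 2021. To do that, you must nest the sub-agg under that particular terms aggregation.
Like so: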
GET articles/_search
{
"query": {
"match_all": {}
},
"aggs": {
"word_counts": {
"terms": {
"field": "content"
}
},
"publish_datetime": {
"terms": {
"field": "publish_datetime"
},
"aggs": {
"word_counts_2021": {
"bucket_selector": {
"buckets_path": {
"pd": "publish_datetime"
},
"script": "LocalDateTime.parse(params.pd).getYear() == 2021"
}
}
}
}
}
}
But, if that field has a date time type, I would suggest simply filtering with a range query and then aggregating your documents.
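A minimal sketch of that suggestion, written as a Python dict for the request body. It assumes publish_datetime is mapped as a date with the default format and that content can be aggregated on as in the original query; the helper name words_for_year is made up for illustration.

```python
import json


def words_for_year(year, text_field="content", date_field="publish_datetime"):
    """Build a search body: filter to one calendar year, then aggregate."""
    return {
        "size": 0,  # only the aggregation result is needed
        "query": {
            "range": {
                date_field: {
                    "gte": f"{year}-01-01",     # start of the year, inclusive
                    "lt": f"{year + 1}-01-01",  # start of next year, exclusive
                }
            }
        },
        "aggs": {"word_counts": {"terms": {"field": text_field}}},
    }


body = words_for_year(2021)
print(json.dumps(body, indent=2))
```

Because the documents are filtered before aggregation, no bucket_selector or script is needed.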

Perform query and field collapse

When I run a multi-condition query and apply field collapsing on one of the fields in the mentioned index, I get the following error:
no mapping found for `search_type.keyword` in order to collapse on
Query used:
GET /_search
{
"query": {
"bool" : {
"must" : [
{
"match" :
{
"id" : "123456"
}
},
{
"terms": {
"_index": ["history"]
}
}
]
}
},
"collapse" : {
"field" : "search_type.keyword",
"inner_hits": {
"name": "terms",
"size": 10
}
}
}
Error Trace:
{
"shard" : 0,
"index" : "test",
"node" : "UOA44HkATh61krg6ht3paA",
"reason" : {
"type" : "illegal_argument_exception",
"reason" : "no mapping found for `search_type.keyword` in order to collapse on"
}
}
Currently I am applying the query only to the history index, but the result throws exceptions for indices that I haven't mentioned. Please help me narrow field collapsing down to a particular index.
It appears to be a bug, but if you look at your result carefully, you should find the response you are looking for at the very end, after all of these errors.
But then again, why not put the index name in the request path and modify your query as below:
POST history/_search <---- Add index name here
{
"query": {
"bool": {
"must": [
{
"match": {
"id": "123456"
}
}
]
}
},
"collapse" : {
"field" : "search_type.keyword",
"inner_hits": {
"name": "terms",
"size": 10
}
}
}

Add condition to filter aggregation in elastic search

I want the count of each value of a variable, based on some filter applied in Elasticsearch. For example, I want all the age groups, but only for students who are from California.
The age group is a text field and contains an array like this:
"age_group": ["5-6-years", "6-7-years"]
I kinda want a query like this, but it isn't working. It throws an error saying:
unable to parse BaseAggregationBuilder with name [count]: parser not found
"student_aggregation": {
"nested": {
"path": "students"
},
"aggs": {
"available": {
"filter": {
"term": { "students.place_of_birth": "California" }
},
"aggs" : {
"age_group" : { "count" : { "field" : "students.age_group" } }
}
}
}
}
Request help from you troops.
That's because there's no metric aggregation called count but value_count instead:
"student_aggregation": {
"nested": {
"path": "students"
},
"aggs": {
"available": {
"filter": {
"term": { "students.gender": "boys" }
},
"aggs" : {
"age_group" : { "value_count" : { "field" : "students.age_group" } }
}
}
}
}
UPDATE:
After some discussion, it turned out that the terms aggregation was more appropriate than value_count. After fixing the mapping (the field was text instead of keyword), the query worked correctly.
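The final query is not shown in the thread, so the following is only a sketch of what the terms variant might look like, written as a Python dict for the request body. It assumes students is a nested field and that students.age_group and students.place_of_birth are mapped as keyword.

```python
import json

# Assumed shape of the working query; not copied from the thread.
student_aggregation = {
    "nested": {"path": "students"},
    "aggs": {
        "available": {
            # Keep only students born in California.
            "filter": {"term": {"students.place_of_birth": "California"}},
            "aggs": {
                # One bucket per distinct age group, with its doc_count.
                "age_group": {"terms": {"field": "students.age_group"}}
            },
        }
    },
}

body = {"size": 0, "aggs": {"student_aggregation": student_aggregation}}
print(json.dumps(body, indent=2))
```

Unlike value_count, which returns one total, terms returns a count per age group, which matches what the question asks for.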

Elasticsearch sort inside top_hits aggregation

I have an index of messages where I also store a messageHash for each message, along with many other fields. The index contains many duplicate message values, e.g. "Hello". I want to retrieve unique messages.
Here is the query I wrote to search for unique messages and sort them by date. That is, among all duplicates, the message with the latest date is the one I want to be returned.
{
"query": {
"bool": {
"must": {
"match_phrase": {
"message": "Hello"
}
}
}
},
"sort": [
{
"date": {
"order": "desc"
}
}
],
"aggs": {
"top_messages": {
"terms": {
"field": "messageHash"
},
"aggs": {
"top_messages_hits": {
"top_hits": {
"sort": [
{
"date": {
"order": "desc"
}
},
"_score"
],
"size": 1
}
}
}
}
}
}
The problem is that it's not sorted by date. It's sorted by doc_count. I just get the sort values in the response, not the real sorted results. What's wrong? I'm now wondering if it is even possible to do it.
EDIT:
I tried substituting "terms" : { "field" : "messageHash", "order" : { "mydate" : "desc" } } , "aggs" : { "mydate" : { "max" : { "field" : "date" } } } for "terms": { "field": "messageHash" } but I get:
{
"error" : {
"root_cause" : [
{
"type" : "parsing_exception",
"reason" : "Found two sub aggregation definitions under [top_messages]",
"line" : 1,
"col" : 412
}
],
"type" : "parsing_exception",
"reason" : "Found two sub aggregation definitions under [top_messages]",
"line" : 1,
"col" : 412
},
"status" : 400
}
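That parsing error is what Elasticsearch reports when one aggregation contains two separate "aggs" objects, which is presumably what the substitution in the EDIT produced. A sketch of a possible fix, assuming that is the cause: keep a single "aggs" object under top_messages and put both mydate and top_messages_hits inside it. The body is written as a Python dict; field and aggregation names are taken from the question.

```python
import json

top_messages = {
    # Order the messageHash buckets by the newest date found in each one.
    "terms": {"field": "messageHash", "order": {"mydate": "desc"}},
    # A single "aggs" object that holds BOTH sub-aggregations.
    "aggs": {
        "mydate": {"max": {"field": "date"}},
        "top_messages_hits": {
            "top_hits": {
                "sort": [{"date": {"order": "desc"}}],
                "size": 1,  # only the latest message per hash
            }
        },
    },
}

body = {
    "size": 0,
    "query": {"bool": {"must": {"match_phrase": {"message": "Hello"}}}},
    "aggs": {"top_messages": top_messages},
}
print(json.dumps(body, indent=2))
```

The top-level sort only affects the hits list, not the bucket order, so the ordering has to be expressed on the terms aggregation itself.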

Count how many documents have an attribute or are missing that attribute in Elasticsearch

How can I write a single Elasticsearch query that will count how many documents either have a value for a field or are missing that field?
This query successfully counts the docs missing the field:
POST localhost:9200//<index_name_here>/_search
{
"size": 0,
"aggs" : {
"Missing_Field" : {
"missing": { "field": "group_doc_groupset_id" }
}
}
}
This query does the opposite, counting documents NOT missing the field:
POST localhost:9200//<index_name_here>/_search
{
"size": 0,
"aggs" : {
"Not_Missing_Field" : {
"exists": { "field": "group_doc_groupset_id" }
}
}
}
How can I write one that combines both? For example, this yields a syntax error:
POST localhost:9200//<index_name_here>/_search
{
"size": 0,
"aggs" : {
"Missing_Field_Or_Not" : {
"missing": { "field": "group_doc_groupset_id" },
"exists": { "field": "group_doc_groupset_id" }
}
}
}
You can use two sibling aggregations, one for each case:
GET indexname/_search?size=0
{
"aggs": {
"a1": {
"missing": {
"field": "status"
}
},
"a2": {
"filter": {
"exists": {
"field": "status"
}
}
}
}
}
As per the newer Elasticsearch recommendation in the docs:
GET {your_index_name}/_search #or _count, to see just the value
{
"query": {
"bool": {
"must_not": { # here can be also "must"
"exists": {
"field": "{field_to_be_searched}"
}
}
}
}
}
Edit: _count returns the exact number of matching documents. With _search, if there are more than 10k matches, the total is capped by default and shown as:
"hits" : {
"total" : {
"value" : 10000, # capped at 10k
"relation" : "gte" # greater than or equal to
}
}
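A small sketch of how a client might read that total, plus a request body that asks _search for an exact count through the track_total_hits option. The helper name describe_total and the sample responses are made up for illustration, and "status" is only a placeholder field name.

```python
def describe_total(response):
    """Summarise hits.total from a search response (object form)."""
    total = response["hits"]["total"]
    if total["relation"] == "eq":
        return f"exactly {total['value']} documents"
    # "gte" means the real count is at least this value.
    return f"at least {total['value']} documents"


# Request body asking for an exact total; "status" is a placeholder field.
exact_count_body = {
    "size": 0,
    "track_total_hits": True,
    "query": {"bool": {"must_not": {"exists": {"field": "status"}}}},
}

# Hypothetical responses, shaped like the snippet in the answer.
capped = {"hits": {"total": {"value": 10000, "relation": "gte"}}}
exact = {"hits": {"total": {"value": 12345, "relation": "eq"}}}

print(describe_total(capped))
print(describe_total(exact))
```

Checking the relation before trusting the value avoids reporting a capped 10000 as if it were the real count.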
