"Invalid aggregation name. Aggregation names must be alpha-numeric and can only contain '_' and '-'" - elasticsearch

I'm trying to use a bucket selector over an aggregation whose name contains numeric characters, and I get the error Invalid aggregation name [secondagg_sum_[filters_equals_100]]. Aggregation names must be alpha-numeric and can only contain '_' and '-'. I know that aggregation names with numeric characters were fixed at some point, but I think there is still a problem with bucket selection. I asked on the Elasticsearch forum but nobody has replied. Does anybody know how to solve this problem?
PS: I'm generating my aggregation names dynamically, so I need to use numeric characters.
Error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "Invalid aggregation name [secondagg_sum_[filters_equals_100]]. Aggregation names must be alpha-numeric and can only contain '_' and '-'",
        "line" : 1,
        "col" : 318
      }
    ],
    "type" : "parsing_exception",
    "reason" : "Invalid aggregation name [secondagg_sum_[filters_equals_100]]. Aggregation names must be alpha-numeric and can only contain '_' and '-'",
    "line" : 1,
    "col" : 318
  },
  "status" : 400
}
My query:
{
  "size": 0,
  "query": {
    "bool": {
      "disable_coord": false,
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "aggregations": {
    "first_agg": {
      "terms": {
        "field": "firstproperty",
        "size": 2147483647,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": {
          "_term": "asc"
        }
      },
      "aggregations": {
        "secondagg_sum_[filters_equals_100]": {
          "sum": {
            "field": "secondproperty"
          }
        },
        "agg_values": {
          "bucket_selector": {
            "buckets_path": {
              "total": "secondagg_sum_[filters_equals_100]"
            },
            "script": {
              "inline": "params.total > 100",
              "lang": "painless"
            },
            "gap_policy": "skip"
          }
        }
      }
    }
  }
}
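Note that the characters actually rejected here are the `[` and `]` brackets, which are outside the allowed set (letters, digits, `_`, `-`); the digits themselves are fine. Since the names are generated dynamically, one workaround is to sanitize them before building the request. A minimal sketch (the `make_agg_name` helper is hypothetical, not an Elasticsearch API):

```python
import re

def make_agg_name(raw: str) -> str:
    """Replace every character Elasticsearch rejects in aggregation
    names (anything outside letters, digits, '_' and '-') with '_'."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", raw)

# The bracketed name from the failing request becomes a legal one:
print(make_agg_name("secondagg_sum_[filters_equals_100]"))
# secondagg_sum__filters_equals_100_
```

The same sanitized name must then be used consistently in both the aggregation definition and the `buckets_path` that references it.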

Related

Elasticsearch sort within sampler

We are using Elasticsearch 7.*, and I'm trying to take a sample. It returns far more than 10,000 results, which is the maximum number of hits a query can return. In order to paginate with search_after, I need to sort the items by #timestamp (_id sorting will be deprecated soon).
Here's my current query:
GET /my-index-pattern/_search
{
  "query": {
    "range": {
      "#timestamp": {
        "gte": "now-1M",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 40000
      },
      "aggs": {
        "group_by_my_grouping_field": {
          "terms": {
            "field": "my_grouping_field.keyword",
            "size": 10000
          }
        }
      }
    }
  },
  "sort": [
    "#timestamp"
  ]
}
Returning:
"_shards" : {
  "total" : 55,
  "successful" : 55,
  "skipped" : 43,
  "failed" : 0
},
However, this takes a long time. I think it's sorting before doing the sample, which also affects my methodology. It also seems to be skipping shards?
Is there a way to sort within the sample?
I tried:
...
"sample": {
  "sampler": {
    "shard_size": 40000
  },
  "aggs": {
    "group_by_my_grouping_field": {
      "terms": {
        "field": "my_grouping_field.keyword",
        "size": 10000
      }
    },
    "search_after_sort": {
      "bucket_sort": {
        "sort": ["#timestamp"]
      }
    }
  }
}
...
But this just gives:
"error" : {
  "root_cause" : [
    {
      "type" : "action_request_validation_exception",
      "reason" : "Validation Failed: 1: No aggregation found for path [#timestamp];"
    }
  ],
  "type" : "action_request_validation_exception",
  "reason" : "Validation Failed: 1: No aggregation found for path [#timestamp];"
},
"status" : 400
This happens for all fields, like message and _id, not just on #timestamp.
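That error is expected: `bucket_sort` sorts buckets by paths into sibling sub-aggregations, not by raw document fields, which is why every field name fails the same way. One way to sort the sampled groups by their latest timestamp is to add a `max` metric per bucket and point `bucket_sort` at it. A sketch of the request body built as a Python dict (field names are taken from the question; the `latest_ts` and `ts_sort` aggregation names are my own assumptions):

```python
# bucket_sort can only reference sibling aggregations via buckets_path,
# so we add a max metric ("latest_ts") per terms bucket and sort the
# buckets on that metric instead of on a document field.
body = {
    "size": 0,
    "aggs": {
        "sample": {
            "sampler": {"shard_size": 40000},
            "aggs": {
                "group_by_my_grouping_field": {
                    "terms": {
                        "field": "my_grouping_field.keyword",
                        "size": 10000,
                    },
                    "aggs": {
                        "latest_ts": {"max": {"field": "#timestamp"}},
                        "ts_sort": {
                            "bucket_sort": {
                                "sort": [{"latest_ts": {"order": "desc"}}]
                            }
                        },
                    },
                }
            },
        }
    },
}
```

Note this orders the buckets, not the documents inside them; per-bucket documents would still need a `top_hits` sub-aggregation with its own sort.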

How to perform sub-aggregation in elasticsearch?

I have a set of article documents in Elasticsearch with the fields content and publish_datetime.
I am trying to retrieve the most frequent words from articles with publish year == 2021.
GET articles/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "word_counts": {
      "terms": {
        "field": "content"
      }
    },
    "publish_datetime": {
      "terms": {
        "field": "publish_datetime"
      }
    },
    "aggs": {
      "word_counts_2021": {
        "bucket_selector": {
          "buckets_path": {
            "word_counts": "word_counts",
            "pd": "publish_datetime"
          },
          "script": "LocalDateTime.parse(params.pd).getYear() == 2021"
        }
      }
    }
  }
}
This fails with:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "Unknown aggregation type [word_counts_2021]",
        "line" : 17,
        "col" : 25
      }
    ],
    "type" : "parsing_exception",
    "reason" : "Unknown aggregation type [word_counts_2021]",
    "line" : 17,
    "col" : 25,
    "caused_by" : {
      "type" : "named_object_not_found_exception",
      "reason" : "[17:25] unknown field [word_counts_2021]"
    }
  },
  "status" : 400
}
which does not make sense, because word_counts_2021 is the name of the aggregation according to the docs, not an aggregation type. I am the one who picks the name, so I thought it could have basically any value.
Does anyone have any idea what's going on there? So far, this seems like a pretty unintuitive service to me.
The agg as you have written it seems to be filtering the publish_datetime buckets so that you only include those in the year 2021. To do that, you must nest the sub-agg under that particular terms aggregation.
Like so:
GET articles/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "word_counts": {
      "terms": {
        "field": "content"
      }
    },
    "publish_datetime": {
      "terms": {
        "field": "publish_datetime"
      },
      "aggs": {
        "word_counts_2021": {
          "bucket_selector": {
            "buckets_path": {
              "pd": "publish_datetime"
            },
            "script": "LocalDateTime.parse(params.pd).getYear() == 2021"
          }
        }
      }
    }
  }
}
But if that field has a datetime type, I would suggest simply filtering with a range query and then aggregating your documents.
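The range-plus-terms approach can be sketched as a request body built in Python (field names from the question; the exact date bounds and the `word_counts_2021` name are my assumptions):

```python
# Filter to articles published in 2021 with a range query first,
# then run the terms aggregation only over the matching documents.
body = {
    "size": 0,
    "query": {
        "range": {
            "publish_datetime": {
                "gte": "2021-01-01",
                "lt": "2022-01-01",
            }
        }
    },
    "aggs": {
        "word_counts_2021": {"terms": {"field": "content"}},
    },
}
```

This avoids scripting entirely and lets Elasticsearch use the index to skip non-2021 documents.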

Can I use a count query when getting distinct values of a field in Elasticsearch?

How can I get the count of elements when getting distinct values of a field in Elasticsearch? I want to get the total number of elements in the index, distinct on one of the fields. I can use this for search:
POST myIndex/_search
{
  "size": 0,
  "aggs": {
    "myField": {
      "terms": {
        "field": "name’s of my field",
        "size": 10000
      }
    }
  }
  ...
}
But I want a query similar to:
GET myIndex/_count
{
  "size": 0,
  "aggs": {
    "myField": {
      "terms": {
        "field": "name’s of my field",
        "size": 10000
      }
    }
  }
  ...
}
But it returns an error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "request does not support [size]",
        "line" : 2,
        "col" : 3
      }
    ],
    "type" : "parsing_exception",
    "reason" : "request does not support [size]",
    "line" : 2,
    "col" : 3
  },
  "status" : 400
}
So I'm interested in a solution to this problem.
Elasticsearch only supports approximate distinct counts, using the cardinality aggregation:
{
  "aggs": {
    "distinct_count": {
      "cardinality": {
        "field": "field-name"
      }
    }
  }
}
The values are approximate, though you can increase precision using precision_threshold.
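For example, as a request body built in Python ("field-name" is a placeholder; to my understanding 40000 is the maximum precision_threshold Elasticsearch accepts, with counts below the threshold being close to exact):

```python
# Cardinality aggregation with a raised precision threshold;
# "size": 0 suppresses the hits so only the count comes back.
body = {
    "size": 0,
    "aggs": {
        "distinct_count": {
            "cardinality": {
                "field": "field-name",
                "precision_threshold": 40000,
            }
        }
    },
}
```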

ElasticSearch: Is min_doc_count supported for Metric Aggregations

I am new to Elasticsearch and am trying to make a query with a metric aggregation for my docs. But when I add the field min_doc_count: 1 to my sum metric aggregation, I get an error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "[sum] unknown field [min_doc_count], parser not found"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "[sum] unknown field [min_doc_count], parser not found"
  },
  "status": 400
}
What am I missing here?
{
  "aggregations" : {
    "myKey" : {
      "sum" : {
        "field" : "field1",
        "min_doc_count": 1
      }
    }
  }
}
I'm not sure why/where you have the sum keyword?
The idea of min_doc_count is to make sure the buckets returned by a given aggs query contain at least N documents; the example below would only return subject buckets for subjects that appear in 10 or more documents.
GET _search
{
  "aggs" : {
    "docs_per_subject" : {
      "terms" : {
        "field" : "subject",
        "min_doc_count": 10
      }
    }
  }
}
So with that in mind, yours would refactor to the following... Although when setting min_doc_count to 1, it's not really necessary to keep the parameter at all.
GET _search
{
  "aggs" : {
    "docs_per_subject" : {
      "terms" : {
        "field" : "field1",
        "min_doc_count": 1
      }
    }
  }
}
If you wish to sum only non-zero values of a field, you can filter the zero values out in the query section:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "field": {
              "gt": 0
            }
          }
        }
      ]
    }
  },
  "aggregations": {
    "myKey": {
      "sum": {
        "field": "field1"
      }
    }
  }
}
See Bool Query and Range Query.

Elasticsearch sort inside top_hits aggregation

I have an index of messages where I also store a messageHash for each message, along with many other fields. There are multiple duplicate message fields in the index, e.g. "Hello". I want to retrieve unique messages.
Here is the query I wrote to search unique messages and sort them by date; I want the message with the latest date among all duplicates to be returned.
{
  "query": {
    "bool": {
      "must": {
        "match_phrase": {
          "message": "Hello"
        }
      }
    }
  },
  "sort": [
    {
      "date": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "top_messages": {
      "terms": {
        "field": "messageHash"
      },
      "aggs": {
        "top_messages_hits": {
          "top_hits": {
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              },
              "_score"
            ],
            "size": 1
          }
        }
      }
    }
  }
}
The problem is that it's not sorted by date. It's sorted by doc_count. I just get the sort values in the response, not the real sorted results. What's wrong? I'm now wondering if it is even possible to do it.
EDIT:
I tried substituting "terms" : { "field" : "messageHash", "order" : { "mydate" : "desc" } }, "aggs" : { "mydate" : { "max" : { "field" : "date" } } } for "terms": { "field": "messageHash" }, but I get:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "Found two sub aggregation definitions under [top_messages]",
        "line" : 1,
        "col" : 412
      }
    ],
    "type" : "parsing_exception",
    "reason" : "Found two sub aggregation definitions under [top_messages]",
    "line" : 1,
    "col" : 412
  },
  "status" : 400
}
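That parsing error usually means the substitution left two separate aggs objects under top_messages; the max metric and the top_hits both need to live in a single aggs block, with the terms ordering pointing at the metric. A sketch of the merged body as a Python dict (names taken from the question's attempt; this is my reading of the fix, not a verified answer):

```python
# One "aggs" block holds both sub-aggregations; the terms buckets
# are ordered by the "mydate" max metric, and each bucket keeps
# only its newest document via top_hits.
body = {
    "aggs": {
        "top_messages": {
            "terms": {
                "field": "messageHash",
                "order": {"mydate": "desc"},
            },
            "aggs": {
                "mydate": {"max": {"field": "date"}},
                "top_messages_hits": {
                    "top_hits": {
                        "sort": [{"date": {"order": "desc"}}, "_score"],
                        "size": 1,
                    }
                },
            },
        }
    }
}
```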
