elasticsearch_dsl response multiple bucket aggregations - elasticsearch

Found this thread on how to frame nested aggregations using elasticsearch_dsl: Generate multiple buckets in aggregation.
Can someone show how to iterate through the response to get the second-level bucket results?
for i in s.aggregations.clients.buckets.num_servers.buckets:
does not work. How else can I get to the content in num_servers or server_list?

You need two loops if you want to loop through a second-level aggregation. Here is an example assuming 'label' and 'number' fields in your index:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, A

client = Elasticsearch()

# Build a two-level aggregation: terms on 'label', sub-terms on 'number'
my_agg = A('terms', field='label')
my_agg.bucket('number', A('terms', field='number'))

# Build and submit the query
s = Search(using=client, index="stackoverflow")
s.aggs.bucket('label', my_agg)
response = s.execute()

# Loop through the first level of the aggregation
for label_bucket in response.aggregations.label.buckets:
    print("Label: {}, {}".format(label_bucket.key, label_bucket.doc_count))
    # Loop through the second level of the aggregation
    for number_bucket in label_bucket.number.buckets:
        print("  Number: {}, {}".format(number_bucket.key, number_bucket.doc_count))
Which would print something like this:
Label: A, 3
  Number: 2, 2
  Number: 1, 1
Label: B, 3
  Number: 3, 2
  Number: 1, 1
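Applied to the aggregation names from the question, the same two-loop pattern would look like this (a sketch assuming the aggregations are named 'clients' and 'num_servers' as in the snippet above):

for client_bucket in response.aggregations.clients.buckets:
    print(client_bucket.key, client_bucket.doc_count)
    # Each first-level bucket carries its own sub-aggregation
    for server_bucket in client_bucket.num_servers.buckets:
        print(server_bucket.key, server_bucket.doc_count)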

Related

ElasticSearch aggregation in case of minimal document count

(This is a similar question to Elasticsearch filter aggregations on minimal doc count)
Consider I have the following list:
A
A
A
B
B
I can easily perform an aggregation by title, which results in:
A (3x)
B (2x)
The question above deals with removing results whose count is less than min_doc_count. In this case, for a value of 2, this would end up like:
A (3x)
B
In my case min_doc_count does not help, because I still want to keep these results, just ungrouped. E.g. for a min count of 2, I would like to have the following result:
A (3x)
B
B
Is this achievable with Elastic aggregations?
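For reference, the min_doc_count behaviour described above comes from a terms aggregation like this (a minimal elasticsearch_dsl sketch; the index name and a keyword-mapped 'title' field are assumptions):

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, A

client = Elasticsearch()
s = Search(using=client, index="my-index")
# Buckets with fewer than 2 documents are dropped entirely,
# which is exactly the behaviour this question wants to avoid
s.aggs.bucket('titles', A('terms', field='title', min_doc_count=2))
response = s.execute()
for bucket in response.aggregations.titles.buckets:
    print(bucket.key, bucket.doc_count)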

Counting records in elasticsearch by avoiding duplicates

For the search results of /_search, I would like to get the total record count, with the condition that multiple records having the same value in fieldxyz are counted as one record. For example, here are the full results:
Doc 1 {field_one: 'value one', fieldxyz: 'value four'}
Doc 2 {field_one: 'value two', fieldxyz: 'value five'}
Doc 3 {field_one: 'value three', fieldxyz: 'value four'}
Because 'value four' occurs twice, I would like to count those two records as one, and the final count should be 2.
How can I do that?
You can use the Elasticsearch cardinality aggregation to get the count of distinct values for a field:
{
  "aggs": {
    "counting": {
      "cardinality": {
        "field": "fieldxyz"
      }
    }
  }
}
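The same aggregation via elasticsearch_dsl would look like this (a sketch; the index name is an assumption, and note that cardinality returns an approximate distinct count):

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, A

client = Elasticsearch()
s = Search(using=client, index="my-index")
# cardinality is an approximate distinct count (HyperLogLog++)
s.aggs.metric('counting', A('cardinality', field='fieldxyz'))
response = s.execute()
print(response.aggregations.counting.value)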

Grafana select max from multiple fields in Elasticsearch

For context, my data in Elasticsearch looks like the following:
"data": {
"errors": {
"file/name/somewhere.kt": 7,
"file/name/somewhereElse.kt": 1,
"file/name/some.kt": 2,
"file/name/where.kt": 4
}
Now as far as I can see, in Grafana I can only filter on the exact name of the field (data.errors.file/name/somewhere.kt).
What I would like to do is a wildcard search that returns the field name and value of the entry with the highest number. Something like:
data.errors.*
Which would then return:
"file/name/somewhere.kt": 7
As that's the one with the highest number of errors.
Unfortunately such a wildcard does not seem to work: it returns 0 results.
Is this possible, and if so, how? Possibly with a Lucene query?

Elasticsearch DSL: Bucket not working

Running the code,
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q, A

client = Elasticsearch(timeout=100)
s = Search(using=client, index="cms*")
s.aggs.bucket('ExitCode', 'terms', field='ExitCode') \
    .metric('avgCpuEff', 'avg', field='CpuEff')
for hit in s[0:20].execute():
    print(hit['ExitCode'])
yields several hits with ExitCode = 0. I thought a terms bucket was supposed to group all the results that have the same exit code in this case. What is actually going on?
You're iterating over the hits; you need to iterate over the aggregated buckets instead:
response = s.execute()
for code in response.aggregations.ExitCode.buckets:
    print(code.key, code.avgCpuEff.value)
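If you only need the buckets, you can also skip fetching hits entirely (a small sketch; Search.extra passes raw parameters into the request body):

# size=0 asks Elasticsearch for aggregations only, no hits
response = s.extra(size=0).execute()
for code in response.aggregations.ExitCode.buckets:
    print(code.key, code.avgCpuEff.value)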

Elasticsearch Aggregation Huge Bucket

I have a category tree on each document, like:
[{
  name: "RootCat",
  parentId: 104319,
  id: 104319
},
{
  name: "FirstLevel",
  parentId: 104319,
  id: 104328
},
{
  name: "n Level",
  parentId: 104328,
  id: 107929
}]
The problem is when I want to have a tree for each search, like:
Root Cat
- First Level
--Second Level-1
--Second Level-2
Aggregations come back with all buckets, and I have about 40,000 categories, so it creates huge network traffic. How can I get only the categories that I want to show? My filter aggregation is below:
.Filter(SearchConstants.Aggregation.Category,
    y => y.Aggregations(r => r.Filter("filteredAggs",
        cc => cc.Filter(GetPostFilters(searchQuery))
            .Aggregations(ra => ra.Nested("cat",
                ty => ty.Path(rtw => rtw.CategoryList)
                    .Aggregations(abc => abc.Terms("categoryId",
                        t => t.Field(q => q.CategoryList.First().Id).Size(0))))))))
I want to get one level of children together with all parents, like:
Root Cat
First Level
Second Level
(but the buckets contain children from all levels)
Not sure what you mean; by default the search results already affect the aggregations that are returned. Another option is to use a filter in the aggregation itself, but that depends on your requirements. Check the docs for implementing a filter in the aggregation:
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-aggregations-bucket-filter-aggregation.html
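As an illustration of that technique, a filter aggregation restricts which documents get bucketed before the terms aggregation runs (a sketch in elasticsearch_dsl rather than NEST; the index name, the 'categoryList.id' field path, and the ID list are all assumptions):

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, A, Q

client = Elasticsearch()
s = Search(using=client, index="my-index")
# Only documents in the selected branch of the category tree enter the buckets
branch = A('filter', filter=Q('terms', **{'categoryList.id': [104319, 104328]}))
branch.bucket('categories', A('terms', field='categoryList.id', size=50))
s.aggs.bucket('filteredAggs', branch)
response = s.execute()
for bucket in response.aggregations.filteredAggs.categories.buckets:
    print(bucket.key, bucket.doc_count)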
