Elasticsearch distinct count on nested fields - elasticsearch

According to docs, distinct count can be achieved approximately by using cardinality.
https://www.elastic.co/guide/en/elasticsearch/guide/current/cardinality.html
I have a large data store with documents like this:
[
{
"foo": {
"bar": "a1"
}
},
{
"foo": {
"bar": "a2"
}
}
]
and I want to do a distinct count of "foo.bar" values.
My DSL query:
{
"size": 0,
"aggs": {
"number_of_bars": {
"cardinality": {
"field": "bar"
}
}
}
}
returns "number_of_bars": 0. I also tried "field": "foo.bar", which results in an error.
Can you tell me what I am doing wrong?

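Use the full path to the field. If bar is a string that was mapped dynamically (a text field with a keyword sub-field), aggregate on the keyword sub-field: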
{
"size": 0,
"aggs": {
"number_of_bars": {
"cardinality": {
"field": "foo.bar.keyword"
}
}
}
}
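To make the reasoning explicit: the aggregation needs the full dotted path, and the .keyword suffix is only right if bar got the default dynamic mapping, which is an assumption here because the question does not show the mapping. The Python sketch below builds the request body and, for comparison, computes the exact distinct count over the two sample documents in memory. The helper names are hypothetical and nothing here talks to a cluster.

```python
def cardinality_query(field_path, agg_name="number_of_bars"):
    """Build a search body that counts distinct values of field_path."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {agg_name: {"cardinality": {"field": field_path}}},
    }


def exact_distinct(docs, dotted_path):
    """Exact distinct count of a dotted path over in-memory documents."""
    seen = set()
    for doc in docs:
        value = doc
        for key in dotted_path.split("."):
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break  # path does not exist in this document
        if value is not None:
            seen.add(value)
    return len(seen)


# The two sample documents from the question.
docs = [{"foo": {"bar": "a1"}}, {"foo": {"bar": "a2"}}]

body = cardinality_query("foo.bar.keyword")  # assumes the default dynamic mapping
distinct = exact_distinct(docs, "foo.bar")
missing = exact_distinct(docs, "bar")  # top-level "bar" does not exist
print(body)
print(distinct, missing)
```

Looking up the bare name "bar" finds nothing because the value only exists under "foo", which matches the 0 the original query returned.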

Related

Elasticsearch aggregations significant_text without query block returns zero buckets

I want to learn elasticsearch and I am following this guide:
https://github.com/LisaHJung/Part-2-Understanding-the-relevance-of-your-search-with-Elasticsearch-and-Kibana-
This command worked correctly as described in the guide, it will return buckets with significant_texts:
GET news_headlines/_search
{
"query": {
"match": {
"category": "ENTERTAINMENT"
}
},
"aggregations": {
"popular_in_entertainment": {
"significant_text": {
"field": "headline"
}
}
}
}
I thought I'd explore by trying to find significant_text against ALL documents in my index. But both of these attempts gave me zero buckets:
GET news_headlines/_search
{
"aggregations": {
"popular_in_entertainment": {
"significant_text": {
"field": "headline"
}
}
}
}
GET news_headlines/_search
{
"query": {
"match_all": { }
},
"aggregations": {
"popular_in_entertainment": {
"significant_text": {
"field": "headline"
}
}
}
}
What did I do wrong? Or is there something about aggregations that I don't understand?

Sorting a reverse nested back to parent aggregation

I'm currently aggregating a collection by a multi-level nested field and calculating some sub-aggregation metrics from this collection. That's working, using Elasticsearch's reverse nested feature as described in "Sub-aggregate a multi-level nested composite aggregation".
My current struggle is to find a way to sort the aggregations by one of the calculated metrics. For example, considering the following document and my current search call I would like to sort all the aggregations by their clicks sums.
I've tried using bucket_sort inside the inner aggs at the back_to_parent level but got the following Java exception:
class org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested cannot be cast to class org.elasticsearch.search.aggregations.InternalMultiBucketAggregation
(org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested and org.elasticsearch.search.aggregations.InternalMultiBucketAggregation are in unnamed module of loader 'app')
{
id: '32ead132eq13w21',
statistics: {
clicks: 123,
views: 456
},
categories: [{ //nested type
name: 'color',
tags: [{ //nested type
slug: 'blue'
},{
slug: 'red'
}]
}]
}
GET /acounts-123321/_search
{
size: 0,
aggs: {
categories_parent: {
nested: {
path: 'categories.tags'
},
aggs: {
filtered: {
filter: {
term: { 'categories.tags.category': 'color' }
},
aggs: {
by_slug: {
terms: {
field: 'categories.tags.slug',
size: perPage
},
aggs: {
back_to_parent: {
reverse_nested: {},
aggs: {
clicks: {
sum: {
field: 'statistics.clicks'
}
},
custom_metric: {
scripted_metric: {
init_script: 'state.accounts = []',
map_script: 'state.accounts.add(new HashMap(params["_source"]))',
combine_script: 'double result = 0;
for (acc in state.accounts) {
result += ( acc.statistics.clicks + acc.statistics.impressions);
}
return result;',
reduce_script: 'double sum = 0;
for (state in states) {
sum += state;
}
return sum;'
}
},
by_tag_sort: {
bucket_sort: {
sort: [{ 'clicks.value': { order: 'desc' } }]
}
}
}
}
}
}
}
}
}
}
}
}
Update:
It would also be nice to understand how it would be possible to sort the buckets by a custom metric calculated through a painless scripted_metric. I have updated the search call above adding a sample custom_metric that I wish to allow sorting through it.
I see that bucket_sort does not work here with the standard sort array we use for concrete fields, so the following does not seem to sort anything. A sort script will not work either, since [bucket_sort] only supports field-based sorting.
by_tag_sort: {
bucket_sort: {
sort: [{ 'custom_metric.value': { order: 'desc' } }]
}
}
bucket_sort expects to be run within a multi-bucket context but your reverse_nested aggregation is single-bucket (irrespective of the fact that it's a child of a multi-bucket terms aggregation).
The trick is to use an empty-ish filters aggregation to generate a multi-bucket context and then run the bucket sort:
{
"size": 0,
"aggs": {
"categories_parent": {
"nested": {
"path": "categories.tags"
},
"aggs": {
"filtered": {
"filter": {
"term": {
"categories.tags.category": "color"
}
},
"aggs": {
"by_slug": {
"terms": {
"field": "categories.tags.slug",
"size": 10
},
"aggs": {
"back_to_parent": {
"reverse_nested": {},
"aggs": {
"multi_bucket_emulator": {
"filters": {
"filters": {
"placeholder_match_all_query": {
"match_all": {}
}
}
},
"aggs": {
"clicks": {
"sum": {
"field": "statistics.clicks"
}
},
"by_tag_sort": {
"bucket_sort": {
"sort": [
{
"clicks.value": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
}
}
}
}
}
}
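If you build the request body in code, the wrapping step can be factored into a small helper. This is only a sketch in Python: the function name is hypothetical, and it just assembles the same dictionary structure as the JSON above, giving bucket_sort a multi-bucket parent (the filters aggregation) instead of the single-bucket reverse_nested.

```python
def wrap_in_single_filters_bucket(sub_aggs, bucket_name="placeholder_match_all_query"):
    """Wrap sub-aggregations in a `filters` aggregation with one match_all bucket."""
    return {
        "filters": {"filters": {bucket_name: {"match_all": {}}}},
        "aggs": sub_aggs,
    }


# Metric plus the bucket_sort that needs a multi-bucket parent.
sub_aggs = {
    "clicks": {"sum": {"field": "statistics.clicks"}},
    "by_tag_sort": {
        "bucket_sort": {"sort": [{"clicks.value": {"order": "desc"}}]}
    },
}

# reverse_nested stays single-bucket; the filters wrapper sits inside it.
back_to_parent = {
    "reverse_nested": {},
    "aggs": {"multi_bucket_emulator": wrap_in_single_filters_bucket(sub_aggs)},
}
print(back_to_parent)
```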
Update: sorting by the result of a custom scripted metric value
{
"size": 0,
"aggs": {
"categories_parent": {
"nested": {
"path": "categories.tags"
},
"aggs": {
"filtered": {
"filter": {
"term": {
"categories.tags.category": "color"
}
},
"aggs": {
"by_slug": {
"terms": {
"field": "categories.tags.slug",
"size": 10
},
"aggs": {
"back_to_parent": {
"reverse_nested": {},
"aggs": {
"multi_bucket_emulator": {
"filters": {
"filters": {
"placeholder_match_all_query": {
"match_all": {}
}
}
},
"aggs": {
"clicks": {
"sum": {
"field": "statistics.clicks"
}
},
"custom_metric": {
"scripted_metric": {
"init_script": "state.accounts = []",
"map_script": """state.accounts.add(params["_source"])""",
"combine_script": """
double result = 0;
for (def acc : state.accounts) {
result += ( acc.statistics.clicks + acc.statistics.impressions);
}
return result;
""",
"reduce_script": """
double sum = 0;
for (def state : states) {
sum += state;
}
return sum;
"""
}
},
"by_tag_sort": {
"bucket_sort": {
"sort": [
{
"custom_metric.value": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
}
}
}
}
}
}
Joe - Elasticsearch Handbook - I have an equivalent query to yours (one that sorts by the result of a custom scripted metric) and I expect the response to your query looks something like the below.
I have noticed that sorting specified by the bucket_sort does not get applied to the uppermost buckets (i.e. by_slug.buckets), which are still sorted by the default doc_count ordering. This can also be verified by changing the custom_metric.value ordering from desc to asc, which has no effect on the order of the results.
My understanding of bucket_sort suggests that sorting based on the custom_metric is applied to the aggregation one level up, which in this case would be multi_bucket_emulator.buckets (but because this is an emulator it has no actual buckets to sort).
Is it possible to sort the by_slug.buckets based on the custom_metric values?
I am using Elasticsearch v7.10.
Thanks very much.
(Sorry for posting this question as an answer; it was too long to be a comment.)
Response (approximation):
{
"aggregations": {
"categories_parent": {
"filtered": {
"by_slug": {
"buckets": [
{
"key": "xxxxxx",
"back_to_parent": {
"multi_bucket_emulator": {
"buckets": {
"placeholder_match_all_query": {
"clicks": {
"buckets": [
{
"key": 5.0,
"doc_count": 1
},
…
]
},
"custom_metric": {
"value": 20.0
}
}
}
}
}
},
…
]
}
}
}
}
}
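Whether the server can order by_slug.buckets this way is the open question above. One fallback that does not depend on it is to reorder the buckets client-side after the response arrives. The Python sketch below assumes the response shape shown in the approximation; the bucket keys ("blue", "red") and the metric values are made up for illustration, and the function names are hypothetical.

```python
def sort_slug_buckets(response, descending=True):
    """Return the by_slug buckets ordered by their nested custom_metric value."""
    buckets = response["aggregations"]["categories_parent"]["filtered"]["by_slug"]["buckets"]

    def metric(bucket):
        emulator = bucket["back_to_parent"]["multi_bucket_emulator"]
        return emulator["buckets"]["placeholder_match_all_query"]["custom_metric"]["value"]

    return sorted(buckets, key=metric, reverse=descending)


def make_bucket(key, value):
    """Build one by_slug bucket in the shape of the approximate response (illustrative data)."""
    return {
        "key": key,
        "back_to_parent": {
            "multi_bucket_emulator": {
                "buckets": {
                    "placeholder_match_all_query": {"custom_metric": {"value": value}}
                }
            }
        },
    }


response = {
    "aggregations": {
        "categories_parent": {
            "filtered": {
                "by_slug": {"buckets": [make_bucket("blue", 20.0), make_bucket("red", 35.0)]}
            }
        }
    }
}

ordered_desc = sort_slug_buckets(response)
ordered_asc = sort_slug_buckets(response, descending=False)
print([b["key"] for b in ordered_desc])
```

This only reorders the buckets that were returned, so the terms size still decides which slugs come back in the first place.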

Elasticsearch - Query field against aggregation

I am exploring the ease of querying and aggregating data using Elasticsearch, but I am not able to pivot and aggregate the data in a single query.
Given my data, is there a way to write a query that pivots and aggregates the values as below?
Required result:
[
{
"A": "a1",
"B": "b1",
"Value": 3
},
{
"A": "a1",
"B": "b2",
"Value": 3
},
{
"A": "a2",
"B": "b2",
"Value": 4
},
{
"A": "a1",
"B": "b3",
"Value": 11
}
]
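Yes, you can nest two terms aggregations for A and B, with a sum sub-aggregation on the value field, like this. The sums come back as nested buckets (a list of B buckets inside each A bucket) rather than as a flat list: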
{
"size": 0,
"aggs": {
"A": {
"terms": {
"field": "A"
},
"aggs": {
"B": {
"terms": {
"field": "B"
},
"aggs": {
"value_sum": {
"sum": {
"field": "Value1"
}
}
}
}
}
}
}
}
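Since the response nests the B buckets inside the A buckets, getting the flat rows from the question takes one small post-processing step. A minimal Python sketch follows; the function name is hypothetical, and the sample response is hand-written to match the required result in the question rather than taken from a real cluster.

```python
def flatten_pivot(response):
    """Turn nested A > B > value_sum buckets into flat {A, B, Value} rows."""
    rows = []
    for a_bucket in response["aggregations"]["A"]["buckets"]:
        for b_bucket in a_bucket["B"]["buckets"]:
            rows.append(
                {
                    "A": a_bucket["key"],
                    "B": b_bucket["key"],
                    "Value": b_bucket["value_sum"]["value"],
                }
            )
    return rows


# Hand-written response in the usual aggregation shape (illustrative data).
response = {
    "aggregations": {
        "A": {
            "buckets": [
                {
                    "key": "a1",
                    "B": {
                        "buckets": [
                            {"key": "b1", "value_sum": {"value": 3.0}},
                            {"key": "b2", "value_sum": {"value": 3.0}},
                            {"key": "b3", "value_sum": {"value": 11.0}},
                        ]
                    },
                },
                {
                    "key": "a2",
                    "B": {"buckets": [{"key": "b2", "value_sum": {"value": 4.0}}]},
                },
            ]
        }
    }
}

rows = flatten_pivot(response)
for row in rows:
    print(row)
```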

Aggregations for categories, sorted by category sequence

I have an Elasticsearch index in which each document contains the following:
"category": {
"id": 4,
"name": "Green",
"seq": 2
}
I can use aggregations to get me the doc count for each of the categories:
{
"size": 0,
"aggs": {
"category": {
"terms": {
"field": "category.name"
}
}
}
}
This is fine, but the aggs are sorted by the doc count. What I'd like is to have the buckets sorted by the seq value, something that's easy in SQL.
Any suggestions?
Thanks!
Take a look at ordering terms aggregations.
Something like this could work, but only if each "name" maps to exactly one "seq" value (a one-to-one relationship); otherwise the buckets are ordered by the largest "seq" found in each bucket:
POST /test_index/_search
{
"size": 0,
"aggs": {
"category": {
"terms": {
"field": "category.name",
"order" : { "seq_num" : "asc" }
},
"aggs": {
"seq_num": {
"max": {
"field": "category.seq"
}
}
}
}
}
}
Here is some code I used for testing:
http://sense.qbox.io/gist/4e551b2faec81eb0343e0e6d0cc9b10f20d7d4c1
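To see what that ordering does without a cluster, here is a small in-memory simulation in Python: group documents by category name, take the max of seq per group, then sort ascending on that value. Only the Green category (seq 2) comes from the question; the other categories, the document counts, and the function name are made up for illustration.

```python
from collections import defaultdict


def buckets_by_max_seq(docs):
    """Group by category.name and order the buckets by max(category.seq), ascending."""
    stats = defaultdict(lambda: {"doc_count": 0, "seq_num": float("-inf")})
    for doc in docs:
        category = doc["category"]
        bucket = stats[category["name"]]
        bucket["doc_count"] += 1
        bucket["seq_num"] = max(bucket["seq_num"], category["seq"])
    return sorted(
        ({"key": name, **bucket} for name, bucket in stats.items()),
        key=lambda bucket: bucket["seq_num"],
    )


# Illustrative documents; only "Green" with seq 2 appears in the question.
docs = (
    [{"category": {"id": 4, "name": "Green", "seq": 2}}] * 3
    + [{"category": {"id": 7, "name": "Blue", "seq": 3}}] * 2
    + [{"category": {"id": 1, "name": "Red", "seq": 1}}]
)

buckets = buckets_by_max_seq(docs)
for bucket in buckets:
    print(bucket)
```

The buckets come out in seq order even though Green has the most documents, which is the behaviour the question asks for.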

Elasticsearch Ordering terms aggregation buckets after field in top hits sub aggregation

I would like to order the buckets from a terms aggregation based on a property possessed by the first element in a top hits aggregation.
My best effort query looks like this (with syntax errors):
{
"aggregations": {
"toBeOrdered": {
"terms": {
"field": "parent_uuid",
"size": 1000000,
"order": {
"topAnswer._source.id": "asc"
}
},
"aggregations": {
"topAnswer": {
"top_hits": {
"size": 1
}
}
}
}
}
}
Does anyone know how to accomplish this?
Example:
{
"a":1,
"b":2,
"id":4
}
{
"a":1,
"b":3,
"id":1
}
{
"a":2,
"b":4,
"id":3
}
Grouping by "a" and ordering the buckets by "id" (desc) and sorting the top hits on "b" (desc) would give:
{2:{
"a":2,
"b":4,
"id":3
},1:{
"a":1,
"b":3,
"id":1
}}
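You can do it with the following query. The idea is to show, for each parent_uuid bucket, the single top hit when sorting by b in descending order, and to order the parent_uuid buckets by their largest id value (descending) using a max sub-aggregation. Note that the buckets are ordered by the largest id in each bucket, which is not necessarily the id of the top hit that is shown.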
{
"aggregations": {
"toBeOrdered": {
"terms": {
"field": "parent_uuid",
"size": 1000000,
"order": {
"topSort": "desc"
}
},
"aggregations": {
"topAnswer": {
"top_hits": {
"size": 1,
"sort": {
"b": "desc"
}
}
},
"topSort": {
"max": {
"field": "id"
}
}
}
}
}
}
Try it out and report if this works out for you.
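Before running it, it may help to check the semantics on the three sample documents from the question. The Python sketch below simulates the query in memory, grouping by "a" in place of parent_uuid; the function name is hypothetical. It picks the top hit by b (descending) and orders the buckets by the max id per bucket (descending), which is what the query above expresses.

```python
from collections import defaultdict

# The three example documents from the question.
docs = [
    {"a": 1, "b": 2, "id": 4},
    {"a": 1, "b": 3, "id": 1},
    {"a": 2, "b": 4, "id": 3},
]


def simulate(docs):
    """Mimic terms on "a" with a top_hits (b desc) and a max(id) used for ordering."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["a"]].append(doc)

    buckets = []
    for key, members in groups.items():
        top_answer = max(members, key=lambda d: d["b"])  # top_hits, size 1, sort b desc
        top_sort = max(d["id"] for d in members)  # max sub-aggregation on id
        buckets.append({"key": key, "topAnswer": top_answer, "topSort": top_sort})

    # terms "order": {"topSort": "desc"}
    return sorted(buckets, key=lambda b: b["topSort"], reverse=True)


result = simulate(docs)
for bucket in result:
    print(bucket["key"], bucket["topSort"], bucket["topAnswer"])
```

On this data, bucket 1 is ordered by id 4 while its displayed top hit has id 1, so the bucket order follows the largest id in the bucket, not the id of the top hit. If the order must follow the top hit's own id, this query is only an approximation.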
