Elasticsearch - Query field against aggregation

I am exploring how easy it is to query and aggregate data with Elasticsearch, but I am not able to pivot and aggregate the data in a single query.
Considering the data:
Is there a way to write a query that pivots and aggregates the values as shown below?
Required Result:
[
{
"A": "a1",
"B": "b1",
"Value": 3
},
{
"A": "a1",
"B": "b2",
"Value": 3
},
{
"A": "a2",
"B": "b2",
"Value": 4
},
{
"A": "a1",
"B": "b3",
"Value": 11
}
]

Yes, you can nest two terms aggregations for A and B, like this, and you'll get exactly the results you expect:
{
"size": 0,
"aggs": {
"A": {
"terms": {
"field": "A"
},
"aggs": {
"B": {
"terms": {
"field": "B"
},
"aggs": {
"value_sum": {
"sum": {
"field": "Value1"
}
}
}
}
}
}
}
}
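Note that the response to this query is nested rather than flat: each bucket of A carries its own list of B buckets. A minimal Python sketch of flattening it into the rows from the question follows, assuming the usual terms-aggregation response shape. The function name flatten_buckets and the sample response are hypothetical and only mirror the required result.

```python
def flatten_buckets(response):
    """Turn nested A > B > value_sum buckets into flat rows."""
    rows = []
    for a_bucket in response["aggregations"]["A"]["buckets"]:
        for b_bucket in a_bucket["B"]["buckets"]:
            rows.append({
                "A": a_bucket["key"],
                "B": b_bucket["key"],
                "Value": b_bucket["value_sum"]["value"],
            })
    return rows


# Hypothetical response, trimmed to the fields the function reads.
sample_response = {
    "aggregations": {
        "A": {
            "buckets": [
                {"key": "a1", "doc_count": 3, "B": {"buckets": [
                    {"key": "b1", "doc_count": 1, "value_sum": {"value": 3.0}},
                    {"key": "b2", "doc_count": 1, "value_sum": {"value": 3.0}},
                    {"key": "b3", "doc_count": 1, "value_sum": {"value": 11.0}},
                ]}},
                {"key": "a2", "doc_count": 1, "B": {"buckets": [
                    {"key": "b2", "doc_count": 1, "value_sum": {"value": 4.0}},
                ]}},
            ]
        }
    }
}

rows = flatten_buckets(sample_response)
```

Each row then has the A, B and Value keys of the required result, in bucket order.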

Related

How to do proportions in Elastic search query

I have a field in my data that takes four unique values across all the records. I need to aggregate the records by each unique value and find the proportion of each value in the data, that is, (number of records with each unique value) / (total number of records). Is there a way to do this with Elasticsearch dashboards? I used a terms aggregation to group the field and applied a value_count metric aggregation to get the doc_count value, but I am not able to use a bucket script to do the division. I am getting the error "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [StringTerms] at aggregation [latest_version]".
Below is my code:
{
"size": 0,
"aggs": {
"BAR": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day"
},
"aggs": {
"latest_version": {
"filter": {
"match_phrase": {
"log": "main_filter"
}
},
"aggs": {
"latest_version_count": {
"terms": {
"field": "field_name"
},
"aggs": {
"version_count": {
"value_count": {
"field": "field_name"
}
}
}
},
"sum_buckets": {
"sum_bucket": {
"buckets_path": "latest_version_count>_count"
}
}
}
},
"BAR-percentage": {
"bucket_script": {
"buckets_path": {
"eachVersionCount": "latest_version>latest_version_count",
"totalVersionCount": "latest_version>sum_buckets"
},
"script": "params.eachVersionCount/params.totalVersionCount"
}
}
}
}
}
}
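The error message points at the cause: the path "latest_version>latest_version_count" ends at a terms aggregation, which produces many buckets rather than a single number. One way around it, sketched here under the assumption that doing the division on the client is acceptable, is to read the doc_count of each terms bucket and divide by their total. The function name bucket_proportions and the sample buckets are hypothetical.

```python
def bucket_proportions(buckets):
    """Map each terms bucket key to doc_count / total doc_count."""
    total = sum(bucket["doc_count"] for bucket in buckets)
    if total == 0:
        # No documents at all: there is nothing to divide.
        return {}
    return {bucket["key"]: bucket["doc_count"] / total for bucket in buckets}


# Hypothetical buckets of the "latest_version_count" terms aggregation
# for a single day of the date_histogram.
sample_buckets = [
    {"key": "v1", "doc_count": 50},
    {"key": "v2", "doc_count": 30},
    {"key": "v3", "doc_count": 15},
    {"key": "v4", "doc_count": 5},
]

proportions = bucket_proportions(sample_buckets)
```

The total here covers only the buckets that the terms aggregation returned, so with more unique values than the terms size the proportions would be relative to the returned buckets only.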

Elasticsearch - add normal field filter to nested field aggregation

I have a document structure like the one below in ES:
{
customer_id: 1,
is_member: true,
purchases: [
{
pur_id: 1,
pur_channel_id: 1,
pur_amount: 100.00,
pur_date: '2021-08-01'
},
{
pur_id: 2,
pur_channel_id: 2,
pur_amount: 100.00,
pur_date: '2021-08-02'
}
]
},
{
customer_id: 2,
is_member: false,
purchases: [
{
pur_id: 3,
pur_channel_id: 1,
pur_amount: 200.00,
pur_date: '2021-07-01'
},
{
pur_id: 4,
pur_channel_id: 3,
pur_amount: 300.00,
pur_date: '2021-07-02'
}
]
}
I want to sum the purchase amounts by purchases.pur_channel_id, and within each channel bucket I also want a sub-aggregation that sums only the documents with "is_member=false". I therefore composed the following query:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"purchases": {
"nested": {
"path": "purchases"
},
"aggs": {
"pur_channel_id": {
"terms": {
"field": "purchases.pur_channel_id",
"size": 10
},
"aggs": {
"none_member": {
"filter": {
"term": {
"is_member": false
}
},
"aggs": {
"none_member_amount": {
"sum": {
"field": "purchases.pur_amount"
}
}
}
},
"pur_channel_amount": {
"sum": {
"field": "purchases.pur_amount"
}
}
}
}
}
}
}
}
The query runs successfully, but I get 0 for every "none_member_amount". I wonder whether a normal (non-nested) field cannot be used inside a nested aggregation.
Please help! Thanks.
A nested aggregation runs at the level of the nested documents, so your filter is looking for the is_member field inside the nested purchases documents, where it does not exist. To join back to the parent document, use a reverse_nested aggregation, or move the is_member check in front of the nested aggregation with a filter aggregation.
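The second option can be sketched as follows: a Python snippet that builds the request body with the is_member filter applied on the parent document before the nested scope is entered. This is a sketch under the assumption that the mapping matches the documents in the question. The helper channel_sum and the layout with two sibling aggregation trees are my own choices for illustration, not something taken from the original query.

```python
import json


def channel_sum(sum_name):
    """Nested aggregation: sum purchases.pur_amount per purchases.pur_channel_id."""
    return {
        "nested": {"path": "purchases"},
        "aggs": {
            "pur_channel_id": {
                "terms": {"field": "purchases.pur_channel_id", "size": 10},
                "aggs": {sum_name: {"sum": {"field": "purchases.pur_amount"}}},
            }
        },
    }


request_body = {
    "size": 0,
    "aggs": {
        # Totals per channel over every customer.
        "purchases": channel_sum("pur_channel_amount"),
        # is_member is checked on the parent document, before the nested scope.
        "none_member": {
            "filter": {"term": {"is_member": False}},
            "aggs": {"purchases": channel_sum("none_member_amount")},
        },
    },
}

print(json.dumps(request_body, indent=2))
```

With this layout the non-member sums come back under none_member > purchases > pur_channel_id instead of inside each channel bucket of the first tree, so the two trees have to be matched by channel key on the client.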

Filter out terms aggregation buckets in elasticsearch after applying aggregation

Below is a snapshot of the dataset:
recordNo  employeeId  employeeStatus  employeeAddr
1         employeeA   Permanent
2         employeeA                   ABC
3         employeeB   Contract
4         employeeB                   CDE
I want to get the list of employees along with employeeStatus and employeeAddr.
So I am using terms aggregation on employeeId and then using sub-aggregations of employeeStatus and employeeAddr to get these details.
The query below returns the results correctly.
{
"aggregations": {
"Employee": {
"terms": {
"field": "employeeID"
},
"aggregations": {
"employeeStatus": {
"terms": {"field": "employeeStatus"}
},
"employeeAddr": {
"terms": {"field": "employeeAddr"}
}
}
}
}
}
Now I want only the employees that are in Permanent status, so I am applying a filter aggregation.
{
"aggregations": {
"filter_Employee_employeeID": {
"filter": {
"bool": {
"must": [
{
"match": {
"employeeStatus": {"query": "Permanent"}
}
}
]
}
},
"aggregations": {
"Employee": {
"terms": {
"field": "employeeID"
},
"aggregations": {
"employeeStatus": {
"terms": {"field": "employeeStatus"}
},
"employeeAddr": {
"terms": {"field": "employeeAddr"}
}
}
}
}
}
}
}
Now the problem is that the employeeAddr aggregation returns no buckets for employeeA because record 2 gets filtered out before the aggregation is done.
Assuming that I cannot modify the data set and I want to achieve the result with a single elastic query, how can I do it?
I checked the Bucket Selector pipeline aggregation but it only works for metric aggregations.
Is there a way to filter out term buckets after the aggregation is applied?
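If I understood correctly, you want to preserve the aggregations even when a filter is applied. To achieve that, try the post_filter clause.
See the post_filter section of the Elasticsearch search documentation.
The clause is applied "outside" the aggregations: it narrows the returned hits after the aggregations have been calculated, so the buckets are still computed on the unfiltered documents. Using your example, it should look like this: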
{
"aggregations": {
"Employee": {
"terms": {
"field": "employeeID"
},
"aggregations": {
"employeeStatus": {
"terms": {
"field": "employeeStatus"
}
},
"employeeAddr": {
"terms": {
"field": "employeeAddr"
}
}
}
}
},
"post_filter": {
"bool": {
"must": [
{
"match": {
"employeeStatus": {
"query": "Permanent"
}
}
}
]
}
}
}
I tested a combination of the include parameter of the terms aggregation and a bucket_selector on the bucket count, which gives the desired result.
The include parameter is described under filtering values in the terms aggregation documentation.
The bucket count path is described under the buckets_path syntax in the pipeline aggregation documentation.
The subtlety here is that, yes, buckets_path needs a numeric value, but it can also reference the special paths that Elasticsearch provides, such as _count and _bucket_count.
{
"aggregations": {
"Employee": {
"terms": {
"field": "employeeId.keyword"
},
"aggregations": {
"employeeStatus": {
"terms": {"field": "employeeStatus", "include": "Permanent"}
},
"employeeAddr": {
"terms": {"field": "employeeAddr"}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"count": "employeeStatus._bucket_count"
},
"script": {
"source": "params.count != 0"
}
}
}
}
}
}
}
I tested this on 7.10 and it worked, returning only employeeA, with the address included.
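To make the logic of that query concrete, here is a small pure-Python emulation of what it does on the four records from the question. It is only a sketch of the selection logic, not of how Elasticsearch executes it. The function name permanent_employees is hypothetical, and reading the empty table cells as missing values is an assumption.

```python
from collections import defaultdict

# The four records from the question; empty cells are represented as None.
records = [
    {"employeeId": "employeeA", "employeeStatus": "Permanent", "employeeAddr": None},
    {"employeeId": "employeeA", "employeeStatus": None, "employeeAddr": "ABC"},
    {"employeeId": "employeeB", "employeeStatus": "Contract", "employeeAddr": None},
    {"employeeId": "employeeB", "employeeStatus": None, "employeeAddr": "CDE"},
]


def permanent_employees(rows):
    """Group by employeeId and keep only groups with a Permanent status."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["employeeId"]].append(row)

    result = {}
    for employee_id, group in groups.items():
        # Mirrors "include": "Permanent": only that status value forms a bucket.
        statuses = {r["employeeStatus"] for r in group
                    if r["employeeStatus"] == "Permanent"}
        # Mirrors the bucket_selector: drop the employee when no status bucket is left.
        if not statuses:
            continue
        # The address buckets are built from every record of the employee.
        addresses = sorted({r["employeeAddr"] for r in group
                            if r["employeeAddr"] is not None})
        result[employee_id] = {
            "employeeStatus": sorted(statuses),
            "employeeAddr": addresses,
        }
    return result


selected = permanent_employees(records)
```

Because the address values are collected from all records of an employee, record 2 still contributes ABC even though it carries no status.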

Can I sort grouped search result by formula?

I am trying to implement a query that sorts aggregated results by a formula.
For example, we have the following entities:
{
"price":"1000",
"zip":"77777",
"field1":"1",
"field2":"5"
},
{
"price":"2222",
"zip":"77777",
"field1":"2",
"field2":"5"
},
{
"price":"1111",
"zip":"77777",
"field1":"1",
"field2":"5"
}
My query without sorting currently looks like this:
POST /entities/_search
{
"size": 0,
"query": {
"term": {
"zip": {
"value": "77777"
}
}
},
"aggs": {
"my composite": {
"composite": {
"size": 500,
"sources": [{
"field1_term": {
"terms": {
"field": "field1"
}
}
},
{
"field2_term": {
"terms": {
"field": "field2"
}
}
}
]
},
"aggs": {
"avg_price_per_group": {
"avg": {
"field": "price"
}
},
"results_per_group": {
"top_hits": {
"size": 100,
"_source": {
"include": ["entity_id", "price"]
}
}
}
}
}
}
}
First, I need to group the results by field1 and field2 and calculate the average price for each group.
Then I need to divide the price of each document by the average price of its group and sort the documents by that ratio.
Is it possible to do this somehow?
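For reference, the arithmetic being asked for can be sketched in plain Python on the three sample entities. This only illustrates the formula on the client side and is not an Elasticsearch query. The function name sort_by_price_ratio and the ratio key are hypothetical, and the sketch assumes every group has a non-zero average price.

```python
from collections import defaultdict

# The three entities from the question (prices are stored as strings there).
entities = [
    {"price": "1000", "zip": "77777", "field1": "1", "field2": "5"},
    {"price": "2222", "zip": "77777", "field1": "2", "field2": "5"},
    {"price": "1111", "zip": "77777", "field1": "1", "field2": "5"},
]


def sort_by_price_ratio(docs):
    """Sort docs by price divided by the average price of their (field1, field2) group."""
    prices_by_group = defaultdict(list)
    for doc in docs:
        prices_by_group[(doc["field1"], doc["field2"])].append(float(doc["price"]))

    averages = {key: sum(prices) / len(prices)
                for key, prices in prices_by_group.items()}

    scored = []
    for doc in docs:
        average = averages[(doc["field1"], doc["field2"])]
        scored.append({**doc, "ratio": float(doc["price"]) / average})

    # Ascending order: documents priced below their group average come first.
    return sorted(scored, key=lambda doc: doc["ratio"])


ranked = sort_by_price_ratio(entities)
```

On the sample data the group (1, 5) averages 1055.5 and the group (2, 5) averages 2222, so the documents come out in the price order 1000, 2222, 1111.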

Elasticsearch distinct count on nested fields

According to the docs, an approximate distinct count can be obtained with the cardinality aggregation.
https://www.elastic.co/guide/en/elasticsearch/guide/current/cardinality.html
I have a large store of documents like this:
{
{
"foo": {
"bar": "a1"
}
},
{
"foo": {
"bar": "a2"
}
}
}
and I want to do a distinct count of "foo.bar" values.
My DSL query:
{
"size": 0,
"aggs": {
"number_of_bars": {
"cardinality": {
"field": "bar"
}
}
}
}
This returns "number_of_bars": 0. I also tried "field": "foo.bar", which results in an error.
Can you tell me what I am doing wrong?
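Use the full path together with the keyword sub-field. This assumes the default dynamic mapping, where a string is indexed as a text field with a .keyword sub-field. Aggregations such as cardinality need the keyword sub-field, because a text field is not aggregatable by default: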
{
"size": 0,
"aggs": {
"number_of_bars": {
"cardinality": {
"field": "foo.bar.keyword"
}
}
}
}
