Aggregating values inside an array of arrays in Elasticsearch

I have a JSON structure like this:
[
  {
    "id": 1,
    "result": [
      {
        "score": 0.0,
        "result_rules": [
          {"rule_id": "sr-1"},
          {"rule_id": "sr-2"}
        ]
      }
    ]
  },
  {
    "id": 2,
    "result": [
      {
        "score": 0.0,
        "result_rules": [
          {"rule_id": "sr-1"},
          {"rule_id": "sr-4"}
        ]
      }
    ]
  }
]
I want to count the occurrences of each rule_id, so the result would be:
[
  {"rule_id": "sr-1", "doc_count": 2},
  {"rule_id": "sr-2", "doc_count": 1},
  {"rule_id": "sr-4", "doc_count": 1}
]
I've tried something like this, but it returns an empty aggregation:
{
"aggs":{
"group_by_rule_id":{
"terms":{
"field": "result.result_rules.rule_id.keyword"
}
}
}
}

To aggregate on a nested structure, you have to use a nested aggregation.
See the example in the Elasticsearch documentation.
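A minimal sketch of such a request body, written as a Python dict. It assumes that `result.result_rules` is mapped with type `nested` (the mapping is not shown in the question) and reuses the `rule_id.keyword` field from the attempt above. The aggregation name `rules` and the helper `build_rule_count_body` are made up for illustration.

```python
import json


def build_rule_count_body(path="result.result_rules"):
    """Build a search body that counts rule_id values inside a nested path.

    Assumption: `path` is mapped as `nested` in the index.
    """
    return {
        "size": 0,  # only the aggregation is needed, no hits
        "aggs": {
            "rules": {
                # Step into the nested objects first ...
                "nested": {"path": path},
                "aggs": {
                    # ... then run the terms aggregation on them.
                    "group_by_rule_id": {
                        "terms": {"field": f"{path}.rule_id.keyword"}
                    }
                },
            }
        },
    }


body = build_rule_count_body()
print(json.dumps(body, indent=2))
```

With a body like this, the buckets should appear under `aggregations.rules.group_by_rule_id.buckets`. Note that `doc_count` there counts nested objects rather than top-level documents.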

Related

How to add suggestion inside term query in DSL

My documents are below:
[
{'id':1, 'name': 'sachin messi', 'description': 'football#football.com', 'type': 'football', 'var':'sports'},
{'id':2, 'name': 'lionel messi', 'description': 'messi#fifa.com','type': 'soccer','var':'sports'},
{'id':3, 'name': 'sachin', 'description': 'was', 'type': 'cricket', 'var':'sports'}
]
I need a suggestion to be returned only after the terms filter has matched.
My DSL query is below:
quer = {
"query": {
"bool": {
"must": [
{
"terms": {
"var.keyword": [
"notsports"
]
}
},
{
"query_string": {
"query": "schin",
"fields": [
"name^128",
"description^64",
]
}
}
]
}
},
"suggest": {
"my-suggestion": {
"text": "schin",
"term": {
"field": "name",
"prefix_length": 0,
"min_word_length": 3,
"string_distance": "ngram"
}
}
}
}
The var.keyword filter is set to notsports, but I still get a suggestion:
'suggest': {'my-suggestion': [{'text':'schin','offset':0,'length':5,'options': [{'text':'sachin', 'score': 0.75, 'freq': 1}]}]}
When I tried to put suggest inside the terms list, I got RequestError: RequestError(400, 'x_content_parse_exception', 'unknown query [suggest]')
I need to get the suggestion only if var.keyword matches sports.
I have also asked this question on the Elastic forum: https://discuss.elastic.co/t/how-to-add-suggestion-inside-term-query-in-dsl/309893

Merge / flatten sub aggs into main agg

Is there a way in Elasticsearch to get the results back in a flattened form (merging multiple child/sub aggs into the parent)?
For instance, I am currently trying to get back all product types and their status (online / offline).
This is what I end up with:
aggs
[
{ key: SuperProduct, doc_count:3, subagg:[
{status:online, doc_count:1},
{status:offline, doc_count:2}
]
},
{ key: SuperProduct2, doc_count:10, subagg:[
{status:online, doc_count:7},
{status:offline, doc_count:3}
]
}
]
Charting libraries tend to like it flattened, so I was wondering if Elasticsearch could provide it in this form:
[
{ products_key: 'SuperProduct', status_key:'online', doc_count:1},
{ products_key: 'SuperProduct', status_key:'offline', doc_count:2},
{ products_key: 'SuperProduct2', status_key:'online', doc_count:7},
{ products_key: 'SuperProduct2', status_key:'offline', doc_count:3}
]
Thanks
It is possible with a composite aggregation, which you can use to link two terms aggregations:
// POST /i/_search
{
"size": 0,
"aggregations": {
"distribution": {
"composite": {
"sources": [
{"product": {"terms": {"field": "product.keyword"}}},
{"status": {"terms": {"field": "status.keyword"}}}
]
}
}
}
}
This results in the following structure:
{
"aggregations": {
"distribution": {
"after_key": {
"product": "B",
"status": "online"
},
"buckets": [
{
"key": {
"product": "A",
"status": "offline"
},
"doc_count": 3
},
{
"key": {
"product": "A",
"status": "online"
},
"doc_count": 2
},
{
"key": {
"product": "B",
"status": "offline"
},
"doc_count": 1
},
{
"key": {
"product": "B",
"status": "online"
},
"doc_count": 4
}
]
}
}
}
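For a charting library, the composite buckets above can be turned into the flat rows from the question on the client side. The following Python sketch assumes the response has already been parsed into a dict; the helper name `flatten_composite_buckets` is hypothetical, and the sample data is a trimmed copy of the response shown above.

```python
def flatten_composite_buckets(response, agg_name="distribution"):
    """Turn composite buckets into flat rows: the key fields plus doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [{**bucket["key"], "doc_count": bucket["doc_count"]} for bucket in buckets]


# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "distribution": {
            "after_key": {"product": "B", "status": "online"},
            "buckets": [
                {"key": {"product": "A", "status": "offline"}, "doc_count": 3},
                {"key": {"product": "A", "status": "online"}, "doc_count": 2},
            ],
        }
    }
}

rows = flatten_composite_buckets(response)
for row in rows:
    print(row)
```

A composite aggregation returns its buckets in pages, so for larger result sets the `after_key` value has to be passed back as `after` in the next request to fetch the remaining rows.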
If for any reason the composite aggregation doesn't fulfill your needs, you can create (via copy_to or by concatenation) or simulate (via scripted fields) a field that uniquely identifies the bucket. In our project we went with concatenation (partly because we also needed to collapse on this field), e.g. {"bucket": "SuperProductA:online"}. This results in dirtier output (you'll have to decode that field back or use top hits to get the original values), but it still does the job.

Spring Mongo - An aggregation to order by objects in an array

I have the following data:
{
"_id": ObjectID("5e2fa881c3a1a70006c5743c"),
"name": "Some name",
"policies": [
{
"cId": "dasefa-2738-4cf0-90e0d568",
"weight": 12
},
{
"cId": "c640ad67dasd0-92f981583568",
"weight": 50
}
]
}
I'm able to query this with Spring Mongo fine; however, I want to be able to order the policies by weight.
At the moment I get my results with:
return mongoTemplate.find(query, CArea::class.java)
However, say I define the following aggregation stages:
val unwind = Aggregation.unwind("policies")
val sort = Aggregation.sort(Sort.Direction.DESC,"policies.weight")
How can I actually apply those to the results returned above? I was hoping that dot notation in my query would do the job, but it didn't do anything, e.g. Query().with(Sort.by(options.sortDirection, "policies.weight"))
Any help appreciated.
Thanks.
I am not familiar with Spring Mongo, but I guess you can convert the following aggregation to Spring code.
db.collection.aggregate([
{
$unwind: "$policies"
},
{
$sort: {
"policies.weight": -1
}
},
{
$group: {
_id: "$_id",
"policies": {
"$push": "$policies"
},
parentFields: {
$first: "$$ROOT"
}
}
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: [
"$parentFields",
{
policies: "$policies"
}
]
}
}
}
])
This will result in:
[
{
"_id": "5e2fa881c3a1a70006c5743c",
"name": "Some name",
"policies": [
{
"cId": "c640ad67dasd0-92f981583568",
"weight": 50
},
{
"cId": "dasefa-2738-4cf0-90e0d568",
"weight": 12
}
]
}
]
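To sanity-check what the pipeline should return, the same reordering can be mimicked in plain Python on an in-memory copy of the document. This is only an illustration of the expected result, not Spring or MongoDB code, and `sort_policies_by_weight` is a hypothetical helper.

```python
def sort_policies_by_weight(doc, descending=True):
    """Return a copy of doc whose policies are ordered by weight."""
    ordered = sorted(doc["policies"], key=lambda p: p["weight"], reverse=descending)
    # Build a new dict so the original document is left untouched.
    return {**doc, "policies": ordered}


doc = {
    "_id": "5e2fa881c3a1a70006c5743c",
    "name": "Some name",
    "policies": [
        {"cId": "dasefa-2738-4cf0-90e0d568", "weight": 12},
        {"cId": "c640ad67dasd0-92f981583568", "weight": 50},
    ],
}

result = sort_policies_by_weight(doc)
print([p["weight"] for p in result["policies"]])
```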

How to sort data sources in Terraform based on arguments

I use the following Terraform code to get a list of available DB resources:
data "alicloud_db_instance_classes" "resources" {
instance_charge_type = "PostPaid"
engine = "PostgreSQL"
engine_version = "10.0"
category = "HighAvailability"
zone_id = "${data.alicloud_zones.rds_zones.ids.0}"
multi_zone = true
output_file = "./classes.txt"
}
And the output file looks like this:
[
{
"instance_class": "pg.x4.large.2",
"storage_range": {
"max": "500",
"min": "250",
"step": "250"
},
"zone_ids": [
{
"id": "cn-shanghai-MAZ1(b,c)",
"sub_zone_ids": [
"cn-shanghai-b",
"cn-shanghai-c"
]
}
]
},
{
"instance_class": "pg.x8.medium.2",
"storage_range": {
"max": "250",
"min": "250",
"step": "0"
},
"zone_ids": [
{
"id": "cn-shanghai-MAZ1(b,c)",
"sub_zone_ids": [
"cn-shanghai-b",
"cn-shanghai-c"
]
}
]
},
{
"instance_class": "rds.pg.c1.xlarge",
"storage_range": {
"max": "2000",
"min": "5",
"step": "5"
},
"zone_ids": [
{
"id": "cn-shanghai-MAZ1(b,c)",
"sub_zone_ids": [
"cn-shanghai-b",
"cn-shanghai-c"
]
}
]
},
{
"instance_class": "rds.pg.s1.small",
"storage_range": {
"max": "2000",
"min": "5",
"step": "5"
},
"zone_ids": [
{
"id": "cn-shanghai-MAZ1(b,c)",
"sub_zone_ids": [
"cn-shanghai-b",
"cn-shanghai-c"
]
}
]
}
]
And I want to get the cheapest one.
One way to do so is to sort by storage_range.min, but how do I sort this list based on 'storage_range.min'?
Alternatively, I could filter by 'instance_class', but "alicloud_db_instance_classes" doesn't seem to support filter, as it says: Error: data.alicloud_db_instance_classes.resources: : invalid or unknown key: filter
Any ideas?
The sort() function orders lexicographically, and you have no simple key here.
You can filter with code like this (v0.12):
locals {
  best_db_instance_class_key = "rds.pg.s1.small"
  # Look up the position of the wanted class, then pick that element.
  best_db_instance_class = element(
    data.alicloud_db_instance_classes.resources.instance_classes,
    index(data.alicloud_db_instance_classes.resources.instance_classes.*.instance_class, local.best_db_instance_class_key)
  )
}
(Untested code)
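The lexicographic point can be seen on the `storage_range.min` strings from the output file. The Python sketch below is an illustration outside Terraform (for example, for post-processing `classes.txt`), with the sample data trimmed to the relevant fields. It only shows why the string values need a numeric key before they can be ordered as intended.

```python
# Sample entries from the output file, trimmed to the relevant fields.
classes = [
    {"instance_class": "pg.x4.large.2", "storage_range": {"min": "250"}},
    {"instance_class": "pg.x8.medium.2", "storage_range": {"min": "250"}},
    {"instance_class": "rds.pg.c1.xlarge", "storage_range": {"min": "5"}},
    {"instance_class": "rds.pg.s1.small", "storage_range": {"min": "5"}},
]

mins = {c["storage_range"]["min"] for c in classes}

# String comparison looks at characters, so "250" sorts before "5".
lexicographic = sorted(mins)

# Converting to int first gives the intended numeric order.
numeric = sorted(mins, key=int)

# Entry with the smallest numeric minimum (the first one wins on ties).
smallest = min(classes, key=lambda c: int(c["storage_range"]["min"]))

print(lexicographic, numeric, smallest["instance_class"])
```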

How do I sort buckets by Term Aggregation's nested doc_count?

I have an index, invoices, that I need to aggregate into yearly buckets then sort.
I have succeeded in using Bucket Sort to sort my buckets by simple sum values (revenue and tax). However, I am struggling to sort by more deeply nested doc_count values (status).
I want to order my buckets not only by revenue, but also by the number of docs with a status field equal to 1, 2, 3 etc...
The documents in my index look like this:
"_source": {
"created_at": "2018-07-07T03:11:34.327Z",
"status": 3,
"revenue": 68.474,
"tax": 6.85,
}
I request my aggregations like this:
const params = {
index: 'invoices',
size: 0,
body: {
aggs: {
sales: {
date_histogram: {
field: 'created_at',
interval: 'year',
},
aggs: {
total_revenue: { sum: { field: 'revenue' } },
total_tax: { sum: { field: 'tax' } },
statuses: {
terms: {
field: 'status',
},
},
sales_bucket_sort: {
bucket_sort: {
sort: [{ total_revenue: { order: 'desc' } }],
},
},
},
},
},
},
}
The response (truncated) looks like this:
"aggregations": {
"sales": {
"buckets": [
{
"key_as_string": "2016-01-01T00:00:00.000Z",
"key": 1451606400000,
"doc_count": 254,
"total_tax": {
"value": 735.53
},
"statuses": {
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2,
"doc_count": 59
},
{
"key": 1,
"doc_count": 58
},
{
"key": 5,
"doc_count": 57
},
{
"key": 3,
"doc_count": 40
},
{
"key": 4,
"doc_count": 40
}
]
},
"total_revenue": {
"value": 7355.376005351543
}
},
]
}
}
For example, I want to sort by key: 1, that is, order the buckets according to which one has the greatest number of docs with a status value of 1. I tried to order my terms aggregation, then specify the desired key like this:
statuses: {
terms: {
field: 'status',
order: { _key: 'asc' },
},
},
sales_bucket_sort: {
bucket_sort: {
sort: [{ 'statuses.buckets[0]._doc_count': { order: 'desc' } }],
},
},
However this did not work. It didn't error, it just doesn't seem to have any effect.
I noticed someone else on SO had a similar question many years ago, but I was hoping a better answer had emerged since then: Elasticsearch aggregation. Order by nested bucket doc_count
Thanks!
Never mind, I figured it out. I added a separate filter aggregation like this:
aggs: {
total_revamnt: { sum: { field: 'revamnt' } },
total_purchamnt: { sum: { field: 'purchamnt' } },
approved_invoices: {
filter: {
term: {
status: 1,
},
},
},
Then I was able to bucket sort that value like this:
sales_bucket_sort: {
bucket_sort: {
sort: [{ 'approved_invoices>_count': { order: 'asc' } }],
},
},
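Put together, the filter-then-sort approach above might look like the following request body, sketched here as a Python dict. The names `build_sales_body` and `status_count` are made up for illustration, and `calendar_interval` assumes Elasticsearch 7.2 or later (the question uses the older `interval` key).

```python
def build_sales_body(status=1, order="desc"):
    """Yearly buckets sorted by how many invoices have the given status."""
    return {
        "size": 0,
        "aggs": {
            "sales": {
                # Assumption: 7.2+ syntax; older versions use "interval".
                "date_histogram": {"field": "created_at", "calendar_interval": "year"},
                "aggs": {
                    "total_revenue": {"sum": {"field": "revenue"}},
                    # Single-bucket filter whose doc count is the sort key.
                    "status_count": {"filter": {"term": {"status": status}}},
                    "sales_bucket_sort": {
                        "bucket_sort": {
                            "sort": [{"status_count>_count": {"order": order}}]
                        }
                    },
                },
            }
        },
    }


body = build_sales_body(status=1)
sort_spec = body["aggs"]["sales"]["aggs"]["sales_bucket_sort"]["bucket_sort"]["sort"]
print(sort_spec)
```

Because a filter aggregation produces a single bucket, its document count can be addressed with the `>_count` path, which is not possible for the multi-bucket terms aggregation tried earlier.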
In case anyone runs into this issue again: with Elasticsearch 7.10, the following also worked for me:
sales_bucket_sort: {
bucket_sort: {
sort: [{ '_count': { order: 'asc' } }],
},
}
With only _count specified, it will automatically take the doc_count and sort accordingly.
I believe this answer will just sort by the doc_count of the date_histogram aggregation, not the nested sort.
JP's answer works: create a filter with the target field: value then sort by it.
