Is there a way to pull back the field name after aggregation? - elasticsearch

I'm using Elasticsearch 1.5.2 and was wondering whether there is a way to include the field name in the aggregation response, separately from the name of the aggregation itself, so that the aggregation can stay named age rather than Customer.Age.
{
"aggregations": {
"age": {
"terms": {
"field": "Customer.Age",
"size": 10
}
}
}
}
I want the return to look something like this
aggregations: {
age: {
doc_count_error_upper_bound: 0
sum_other_doc_count: 0
field: Customer.Age
buckets: [6]
0: {
key: "unknown"
doc_count: 607103
}
}
And what I currently get does not include field.

This is not possible at the moment (before 2.0 is released). However, since the name of the aggregation can be essentially anything, you can encode both aggregation name and the field in the aggregation name and then parse it on the client side:
{
"aggregations": {
"age/Customer.Age": {
"terms": {
"field": "Customer.Age",
"size": 10
}
}
}
}
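The client-side parsing step could look roughly like the sketch below. It assumes the "label/field" naming convention from the request above; the helper names split_agg_name and attach_fields are hypothetical, not part of any Elasticsearch client.

```python
def split_agg_name(name, sep="/"):
    """Split an encoded aggregation name like 'age/Customer.Age' into (label, field)."""
    label, _, field = name.partition(sep)
    # If the separator is absent, there is no encoded field name.
    return label, (field or None)


def attach_fields(aggregations, sep="/"):
    """Return a copy of the aggregations keyed by label, with a 'field' entry added."""
    result = {}
    for name, body in aggregations.items():
        label, field = split_agg_name(name, sep)
        result[label] = dict(body, field=field)
    return result


# Sample response fragment shaped like the one in the question.
response_aggs = {
    "age/Customer.Age": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [{"key": "unknown", "doc_count": 607103}],
    }
}
print(attach_fields(response_aggs)["age"]["field"])
```

The separator only has to be a character that never appears in the label part of the name; "/" is simply the one used in the example request.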
In v2.0 it will be possible to attach arbitrary metadata to an aggregation, which is returned to the user as part of the response:
{
"aggregations": {
"age": {
"terms": {
"field": "Customer.Age",
"size": 10
},
"meta": {
"field": "Customer.Age"
}
}
}
}

Related

How to group docs by latest and run aggregation on the group in elastic search?

I need to get all clients under a partnerId. Since there will be many duplicate clients, I need to group them by clientId, keep only the document with the latest reportDate in each group, and run some aggregations on the result.
Index data is like below -
[
{ partnerId: "PID1234", clientId: "c1234", reportDate: "2022-02-01" }, // dup
{ partnerId: "PID1234", clientId: "c1234", reportDate: "2030-02-01" }, // dup, agg should take this one only since this is the latest.
{ partnerId: "PID1111", clientId: "c1222", reportDate: "2010-02-01" },
{ partnerId: "PID2222", clientId: "c1444", reportDate: "2013-02-01" },
]
I need to do something like the below query, the problem is top hits don't accept sub aggregations -
{
"query": {
"bool": {
"must": [
{
"term": {
"partnerId": "PID1234"
}
}
]
}
},
"aggs": {
"groupp": {
"top_hits": {
"sort": [
{
"clientId": {
"order": "desc"
}
}
],
"size": 1,
"aggs": {
"total_engagement_count": {
"sum": { "field": "recommendations.totalEngagement" }
}
}
}
}
}
}
You may need to change your approach.
As the comment linked below explains, metric aggregations such as top_hits do not accept sub-aggregations.
https://github.com/elastic/elasticsearch/issues/16537#issuecomment-181965367
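One possible change of approach, sketched below, is to bucket by clientId with a terms aggregation, keep only the latest document per bucket with top_hits, and do the final sum on the client. This is an assumption about what would fit the data, not something the linked comment prescribes. It assumes clientId is an aggregatable (keyword) field and that reportDate is a date field; the function names and the by_client/latest aggregation names are made up for illustration.

```python
def latest_per_client_body(partner_id, max_clients=1000):
    """Build a search body: one bucket per clientId, holding only its latest document."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"term": {"partnerId": partner_id}}]}},
        "aggs": {
            "by_client": {
                "terms": {"field": "clientId", "size": max_clients},
                "aggs": {
                    "latest": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"reportDate": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


def total_engagement(response):
    """Sum recommendations.totalEngagement over the latest document of each client."""
    total = 0
    for bucket in response["aggregations"]["by_client"]["buckets"]:
        hits = bucket["latest"]["hits"]["hits"]
        if hits:
            source = hits[0]["_source"]
            total += source.get("recommendations", {}).get("totalEngagement", 0)
    return total
```

The body would be sent as an ordinary search request; total_engagement then walks the returned buckets. For a very large number of clients this moves work to the client, so it is worth checking whether that trade-off is acceptable.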

How to write an Elasticsearch query that aggregates by a field, and for each one of another field?

Given an index with documents of the following format
{
"userA": "user1",
"relation": 10,
"userB": "user2"
}
How can I write an aggregation query that returns, for each user in a given list, the sum of the 'relation' values between that user and each of the other users?
For example, given userX and userY,
the result would be:
{
user4: {user1: 100, user2: 300, user3: 350},
...
userX: {user4: 123, user5: 456}
}
I tried to do it using 2 separate queries like that (the second one with userB instead in the aggs field)
GET myindex*/_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"userA": [<input user ids>]
}
},
{
"terms": {
"userB": [<input user ids>]
}
}
]
}
},
"aggs": {
"connections": {
"terms": {
"field": "userA" //// Second query with `userB`
},
"aggs": {
"privateConversationCount": {
"avg": {
"field": "privateConversationCount"
}
}
}
}
}
}
But this is not correct; I think it requires a nested aggregation.
How can I write a query that meets this need?
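A nested terms aggregation of the kind the question asks about might be sketched as follows: an outer terms bucket on userA, an inner terms bucket on userB, and a sum of relation inside it. This covers only the userA-to-userB direction and assumes both user fields are aggregatable (keyword) fields; the aggregation names and helper functions are hypothetical.

```python
def relation_sums_body(user_ids):
    """Build a search body that sums 'relation' per (userA, userB) pair."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"terms": {"userA": user_ids}}]}},
        "aggs": {
            "by_user_a": {
                "terms": {"field": "userA", "size": max(len(user_ids), 1)},
                "aggs": {
                    "by_user_b": {
                        "terms": {"field": "userB"},
                        "aggs": {"relation_sum": {"sum": {"field": "relation"}}},
                    }
                },
            }
        },
    }


def to_matrix(response):
    """Reshape the nested buckets into {userA: {userB: summed relation}}."""
    matrix = {}
    for outer in response["aggregations"]["by_user_a"]["buckets"]:
        matrix[outer["key"]] = {
            inner["key"]: inner["relation_sum"]["value"]
            for inner in outer["by_user_b"]["buckets"]
        }
    return matrix
```

If relations should be counted in both directions, the same body can be issued a second time with userA and userB swapped and the two matrices merged on the client.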

How to filter response in multi search in elasticsearch?

I am using the Python client for Elasticsearch 6.5 with multi search, since I have to fetch data from multiple indexes with different queries and aggregations.
GET _msearch/
{
"index": QUESTION_INDEX
}
{
"aggs": {
"order_info":{
"terms": {
"field": "order_ids",
"size": 9999
},
"aggs": {
"total_value": {
"sum": { "field": "selling_price" }
}
}
},
"median_price": {
"percentiles_bucket": {
"buckets_path": "order_info>total_value",
"percents": [50]
}
}
}
}
Now in my response I am getting the order_info buckets, but I only need the percentile value. Is there any way to filter these buckets out of the Elasticsearch response?
Edit 1: I want to reduce the size of the response that comes back over the network from Elasticsearch.
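One option that may help here is Elasticsearch's response filtering via the filter_path parameter, which the Python client accepts as a keyword argument on its API methods. The sketch below is untested against a 6.5 cluster; the path "responses.aggregations.median_price" is an assumption about where the value sits in a multi search response, and the function names are hypothetical. The client is passed in as an argument so the snippet itself has no dependency on the elasticsearch package.

```python
FILTER_PATH = "responses.aggregations.median_price"


def build_msearch_body(index_name):
    """Return the header/body pair for one search in a multi search request."""
    header = {"index": index_name}
    search = {
        "size": 0,  # no hits are needed, only the aggregation result
        "aggs": {
            "order_info": {
                "terms": {"field": "order_ids", "size": 9999},
                "aggs": {"total_value": {"sum": {"field": "selling_price"}}},
            },
            "median_price": {
                "percentiles_bucket": {
                    "buckets_path": "order_info>total_value",
                    "percents": [50],
                }
            },
        },
    }
    return [header, search]


def fetch_median_price(es, index_name):
    """Run the multi search, asking the server to return only median_price.

    'es' is assumed to be an elasticsearch.Elasticsearch client instance.
    """
    return es.msearch(body=build_msearch_body(index_name), filter_path=FILTER_PATH)
```

With the filter applied, the order_info buckets are still computed on the server but should be left out of the response body, which is what reduces the payload sent over the network.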

elasticsearch Need average per week of some value

I have simple data with two fields:
sales, date_of_sales
What I need is the average per week, i.e. sum(sales) / number of weeks.
Please help.
What I have so far is:
{
"size": 0,
"aggs": {
"WeekAggergation": {
"date_histogram": {
"field": "date_of_sales",
"interval": "week"
}
},
"TotalSales": {
"sum": {
"field": "sales"
}
},
"myValue": {
"bucket_script": {
"buckets_path": {
"myGP": "TotalSales",
"myCount": "WeekAggergation._bucket_count"
},
"script": "params.myGP/params.myCount"
}
}
}
}
I get the error
Invalid pipeline aggregation named [myValue] of type [bucket_script].
Only sibling pipeline aggregations are allowed at the top level.
I think this may help:
{
"size": 0,
"aggs": {
"WeekAggergation": {
"date_histogram": {
"field": "date_of_sale",
"interval": "week",
"format": "yyyy-MM-dd"
},
"aggs": {
"TotalSales": {
"sum": {
"field": "sales"
}
},
"AvgSales": {
"avg": {
"field": "sales"
}
}
}
},
"avg_all_weekly_sales": {
"avg_bucket": {
"buckets_path": "WeekAggergation>TotalSales"
}
}
}
}
Note the TotalSales aggregation is now a nested aggregation under the weekly histogram aggregation (I believe there was a typo in the code provided - the simple schema provided indicated the field name of date_of_sale and the aggregation provided uses the plural form date_of_sales). This provides you a total of all sales in the weekly bucket.
Additionally, AvgSales provides a similar nested aggregation under the weekly histogram aggregation so you can see the average of all sales specific to that week.
Finally, the pipeline aggregation avg_all_weekly_sales gives the average of the weekly sales, computed from the TotalSales value of each bucket divided by the number of non-empty buckets. If you want empty buckets to count as well, add the gap_policy parameter like so:
...
"avg_all_weekly_sales": {
"avg_bucket": {
"buckets_path": "WeekAggergation>TotalSales",
"gap_policy": "insert_zeros"
}
}
...
(see: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-avg-bucket-aggregation.html).
This pipeline aggregation may or may not be what you're actually looking for, so please check the math to make sure the result is what you expect, but it should produce the output that the original script was aiming for.
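To check the math by hand, the arithmetic described above can be mimicked in a few lines. This is only a model of the behaviour described in this answer, not Elasticsearch code: empty weeks are represented as None, and the function name avg_bucket is borrowed from the aggregation purely for readability.

```python
def avg_bucket(weekly_totals, gap_policy="skip"):
    """Average weekly totals; None marks a week that contains no documents."""
    if gap_policy == "insert_zeros":
        # Empty weeks count as zero and stay in the denominator.
        values = [0.0 if total is None else total for total in weekly_totals]
    elif gap_policy == "skip":
        # Empty weeks are dropped from both the sum and the denominator.
        values = [total for total in weekly_totals if total is not None]
    else:
        raise ValueError("unsupported gap_policy: %r" % gap_policy)
    return sum(values) / len(values) if values else None


totals = [100.0, None, 200.0]  # three weeks, the middle one empty
print(avg_bucket(totals))                  # 150.0
print(avg_bucket(totals, "insert_zeros"))  # 100.0
```

With three weeks and one of them empty, the default gives 300 / 2 while insert_zeros gives 300 / 3, which is the sum(sales) / number of weeks figure the question asks for.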

Return parent data with child document from Elasticsearch

Is it possible to return parent data with a search for child documents within an Elasticsearch query?
I have two document types, e.g. Book and Chapter, that are related as Parent/Child (not nested).
I want to run a search on the child document and return the child document, with some of the fields from the parent document. I'm trying to avoid executing a separate query on the parent.
Update
The only way possible I can find is to use the has_child query and then a series of aggregations to drill back to the children and apply the query/filter again. However, this seems overly complicated and inefficient.
GET index/_search
{
"size": 10,
"query": {
"has_child": {
"type": "chapter",
"query": {
"term": {
"field": "value"
}
}
}
},
"aggs": {
"name1": {
"terms": {
"size": 50,
"field": "id"
},
"aggs": {
"name2": {
"top_hits": {
"size": 50
}
},
"name3": {
"children": {
"type": "type2"
},
"aggs": {
"docFilter": {
"filter": {
"query": {
"match": {
"_all": "value"
}
}
},
"aggs": {
"docs": {
"top_hits": {
"size": 50
}
}
}
}
}
}
}
}
}
}
It is possible to do a has_child query to return the parent docs, with a top hits aggregation to return the child docs, but it is a bit cumbersome.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
The Inner Hits feature that is due to be released in 1.5.0 will do what you want.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/search-request-inner-hits.html
You could build the source from master and try it out.
This can now be done with Elasticsearch. Just use 'has_parent' with 'inner_hits' in the search query:
'has_parent': {
'parent_type': 'book',
'query': {
'match_all': {}
},
'inner_hits': {}
}
The results will appear in the inner_hits of the response.
As Dan Tuffery says in his comment, this can currently be achieved with Inner Hits. In Java, the following snippet may make it easier to understand:
SearchResponse searchResponse = this.transportClient.prepareSearch("your_index")
.setTypes("your_type")
.setQuery(QueryBuilders.filteredQuery(
null,
FilterBuilders.hasParentFilter(
"parent_type_name",
FilterBuilders.termFilter("foo", "foo"))
.innerHit(new QueryInnerHitBuilder()))
)
.execute().actionGet();
List<YourObject> list = new ArrayList<>();
for (SearchHit searchHit : searchResponse.getHits().getHits()) {
YourObject yourObject = this.objectMapper.readValue(searchHit.getSourceAsString(), YourObject.class);
yourObject.setYourParentObject(this.objectMapper.readValue(searchHit.getInnerHits().get("parent_type_name").getAt(0).getSourceAsString(), YourParentObject.class));
list.add(yourObject);
}
