How to write an Elasticsearch query that aggregates by one field and then, for each value, by another field?

Given an index with documents of the following format:
{
"userA": "user1",
"relation": 10,
"userB": "user2"
}
How can I create an aggregation query that shows, for each user in a given list, the sum of the 'relation' values between that user and each other user?
For example, given userX and userY,
the result would be:
{
user4: {user1: 100, user2: 300, user3: 350},
...
userX: {user4: 123, user5: 456}
}
I tried to do it with two separate queries like this one (the second query uses userB instead of userA as the aggregation field):
GET myindex*/_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"userA": [<input user ids>]
}
},
{
"terms": {
"userB": [<input user ids>]
}
}
]
}
},
"aggs": {
"connections": {
"terms": {
"field": "userA" // the second query uses "userB" here
},
"aggs": {
"privateConversationCount": {
"avg": {
"field": "privateConversationCount"
}
}
}
}
}
}
But this is not correct; I think it requires a nested aggregation.
How can I write a query that does this?
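One way to get both levels in a single request is to nest a terms aggregation on userB inside a terms aggregation on userA, with a sum of relation in the innermost bucket. The sketch below only builds the request body and flattens the response; it is not a confirmed answer from the thread. The function names, the aggregation names (by_userA, by_userB, relation_sum) and the bucket size of 1000 are assumptions, as is the mapping of userA and userB as keyword fields. It covers the userA to userB direction only.

```python
from typing import Any, Dict, List


def build_pair_sum_query(user_ids: List[str], bucket_size: int = 1000) -> Dict[str, Any]:
    """Build a search body shaped as terms(userA) > terms(userB) > sum(relation)."""
    return {
        "size": 0,  # only the aggregation results are needed
        "query": {
            "bool": {
                # Same restriction as the original attempt: both users in the list.
                "filter": [
                    {"terms": {"userA": user_ids}},
                    {"terms": {"userB": user_ids}},
                ]
            }
        },
        "aggs": {
            "by_userA": {
                "terms": {"field": "userA", "size": bucket_size},
                "aggs": {
                    "by_userB": {
                        "terms": {"field": "userB", "size": bucket_size},
                        "aggs": {"relation_sum": {"sum": {"field": "relation"}}},
                    }
                },
            }
        },
    }


def flatten_pair_sums(response: Dict[str, Any]) -> Dict[str, Dict[str, float]]:
    """Turn the nested buckets into {userA: {userB: summed relation}}."""
    result: Dict[str, Dict[str, float]] = {}
    for a_bucket in response["aggregations"]["by_userA"]["buckets"]:
        result[a_bucket["key"]] = {
            b_bucket["key"]: b_bucket["relation_sum"]["value"]
            for b_bucket in a_bucket["by_userB"]["buckets"]
        }
    return result
```

The body returned by build_pair_sum_query can be sent to GET myindex*/_search, and flatten_pair_sums produces a structure close to the one requested in the question.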

Related

How to group docs by latest and run aggregation on the group in Elasticsearch?

I need to get all clients under a partnerId. Since there will be many duplicate clients, I need to group them by clientId and reportDate and run some aggregations on each group.
The index data looks like this:
[
{ partnerId: "PID1234", clientId: "c1234", reportDate: "2022-02-01" }, // dup
{ partnerId: "PID1234", clientId: "c1234", reportDate: "2030-02-01" }, // dup, agg should take this one only since this is the latest.
{ partnerId: "PID1111", clientId: "c1222", reportDate: "2010-02-01" },
{ partnerId: "PID2222", clientId: "c1444", reportDate: "2013-02-01" },
]
I need to do something like the query below; the problem is that top_hits does not accept sub-aggregations:
{
"query": {
"bool": {
"must": [
{
"term": {
"partnerId": "PID1234"
}
}
]
}
},
"aggs": {
"groupp": {
"top_hits": {
"sort": [
{
"clientId": {
"order": "desc"
}
}
],
"size": 1,
"aggs": {
"total_engagement_count": {
"sum": { "field": "recommendations.totalEngagement" }
}
}
}
}
}
}
You may need to change your approach.
As the comment linked below explains, metric aggregations such as top_hits do not allow sub-aggregations.
https://github.com/elastic/elasticsearch/issues/16537#issuecomment-181965367
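One possible change of approach, sketched here under assumptions rather than taken from the linked comment: bucket by clientId with a terms aggregation, keep only the latest document per client with a top_hits sub-aggregation sorted by reportDate, and do the sum on the client side. The function names, the aggregation names (by_client, latest) and the limit of 1000 clients are hypothetical. The sketch also assumes that clientId is a keyword field, reportDate is a date field, and recommendations.totalEngagement is a single numeric value per document.

```python
from typing import Any, Dict


def build_latest_per_client_query(partner_id: str, max_clients: int = 1000) -> Dict[str, Any]:
    """Build a search body shaped as terms(clientId) > top_hits(latest reportDate)."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"term": {"partnerId": partner_id}}]}},
        "aggs": {
            "by_client": {
                "terms": {"field": "clientId", "size": max_clients},
                "aggs": {
                    "latest": {
                        "top_hits": {
                            "size": 1,  # keep only the newest report per client
                            "sort": [{"reportDate": {"order": "desc"}}],
                            "_source": ["recommendations.totalEngagement"],
                        }
                    }
                },
            }
        },
    }


def sum_latest_engagement(response: Dict[str, Any]) -> float:
    """Sum totalEngagement over the latest document of each client bucket."""
    total = 0.0
    for bucket in response["aggregations"]["by_client"]["buckets"]:
        hits = bucket["latest"]["hits"]["hits"]
        if hits:
            total += hits[0]["_source"]["recommendations"]["totalEngagement"]
    return total
```

This moves the final sum out of Elasticsearch, which is the price of keeping top_hits as a leaf aggregation.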

Query to return the difference between two indices in Elasticsearch

How would the following query look:
Scenario:
I have two bases (base 1 and base 2) with one column each. I would like to see the difference between them, that is, what exists in base 1 but does not exist in base 2. Assume the column in both is called hostname.
Example:
Does the selected value of Base1.Hostname exist in Base2.Hostname?
YES → DO NOT RETURN
NO → RETURN
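I have this in Python as the following function:
def diff(first, second):
    # Build a set for O(1) membership checks
    second = set(second)
    # Keep the items of first that are absent from second, preserving order
    return [item for item in first if item not in second]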
Example of an exact-match query:
GET /base1/_search
{
"query": {
"multi_match": {
"query": "webserver",
"fields": [
"hostname"
],
"type": "phrase"
}
}
}
I would like to migrate this architecture to Elasticsearch so that, in the future, I can generate forecasts from how often these search results change between the bases.
This could be done with an aggregation:
Collect all the hostnames from the base1 and base2 indices
For each hostname, count its occurrences in base2
Keep only the buckets whose base2 count is 0
GET base*/_search
{
"size": 0,
"aggs": {
"all": {
"composite": {
"size": 10,
"sources": [
{
"host": {
"terms": {
"field": "hostname"
}
}
}
]
},
"aggs": {
"base2": {
"filter": {
"match": {
"_index": "base2"
}
}
},
"index_count_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"base2_count": "base2._count"
},
"script": "params.base2_count == 0"
}
}
}
}
}
}
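By the way, don't forget to paginate to get the rest of the results: the composite aggregation returns an after_key, which you pass back as the "after" parameter of the next request.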
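The pagination loop can be sketched as follows. This is a minimal illustration, not part of the original answer: iter_composite_buckets is a hypothetical helper, and search stands for any callable that sends a request body to Elasticsearch and returns the parsed JSON response. It also assumes that after_key is still returned for a page in which bucket_selector removed every bucket.

```python
import copy
from typing import Any, Callable, Dict, Iterator


def iter_composite_buckets(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    body: Dict[str, Any],
    agg_name: str = "all",
) -> Iterator[Dict[str, Any]]:
    """Yield every bucket of a composite aggregation, page by page."""
    body = copy.deepcopy(body)  # do not mutate the caller's request body
    while True:
        response = search(body)
        agg = response["aggregations"][agg_name]
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        if after_key is None:
            break  # no further page
        # Resume the next request right after the last key of this page.
        body["aggs"][agg_name]["composite"]["after"] = after_key
```

Each yielded bucket has a key of the form {"host": ...}, so the hostnames missing from base2 can be collected with a simple list comprehension over the iterator.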
References:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
https://discuss.elastic.co/t/data-set-difference-between-fields-on-different-indexes/160015/4

Elasticsearch: need the average per week of some value

I have simple data with the fields:
sales, date_of_sales
What I need is the average per week, i.e. sum(sales) / number of weeks.
Please help.
What I have so far is:
{
"size": 0,
"aggs": {
"WeekAggergation": {
"date_histogram": {
"field": "date_of_sales",
"interval": "week"
}
},
"TotalSales": {
"sum": {
"field": "sales"
}
},
"myValue": {
"bucket_script": {
"buckets_path": {
"myGP": "TotalSales",
"myCount": "WeekAggergation._bucket_count"
},
"script": "params.myGP/params.myCount"
}
}
}
}
I get the error:
Invalid pipeline aggregation named [myValue] of type [bucket_script].
Only sibling pipeline aggregations are allowed at the top level.
I think this may help:
{
"size": 0,
"aggs": {
"WeekAggergation": {
"date_histogram": {
"field": "date_of_sale",
"interval": "week",
"format": "yyyy-MM-dd"
},
"aggs": {
"TotalSales": {
"sum": {
"field": "sales"
}
},
"AvgSales": {
"avg": {
"field": "sales"
}
}
}
},
"avg_all_weekly_sales": {
"avg_bucket": {
"buckets_path": "WeekAggergation>TotalSales"
}
}
}
}
Note that the TotalSales aggregation is now nested under the weekly histogram aggregation, which gives you the total of all sales in each weekly bucket. (I believe there was a typo in the code provided: the simple schema indicated the field name date_of_sale, while the aggregation used the plural form date_of_sales.)
Additionally, AvgSales is a similar nested aggregation under the weekly histogram, so you can see the average of all sales within each week.
Finally, the pipeline aggregation avg_all_weekly_sales gives the average of weekly sales, based on the TotalSales value of each bucket and the number of non-empty buckets. If you want to include empty buckets, add the gap_policy parameter like so:
...
"avg_all_weekly_sales": {
"avg_bucket": {
"buckets_path": "WeekAggergation>TotalSales",
"gap_policy": "insert_zeros"
}
}
...
(see: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-avg-bucket-aggregation.html).
This pipeline aggregation may or may not be what you're actually looking for, so please check the math to make sure the result is what you expect, but it should produce the correct output based on the original script.
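To check the math by hand, the same arithmetic can be reproduced on a small sample. The sketch below is only a client-side model of what the answer describes, not Elasticsearch code: average_weekly_sales and its fill_empty_weeks flag are hypothetical names, and weeks are assumed to start on Monday. With fill_empty_weeks=False, empty weeks are left out of the divisor, as described for the default behaviour; with True, they count as zero, as described for "insert_zeros".

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def average_weekly_sales(
    rows: Iterable[Tuple[date, float]], fill_empty_weeks: bool = False
) -> float:
    """Average of per-week sales totals, i.e. sum(sales) / number of weeks."""
    weekly_totals: Dict[date, float] = defaultdict(float)
    for day, amount in rows:
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        weekly_totals[week_start] += amount
    if not weekly_totals:
        raise ValueError("no sales rows given")
    if fill_empty_weeks:
        # Count every week between the first and the last one, empty or not.
        span_days = (max(weekly_totals) - min(weekly_totals)).days
        week_count = span_days // 7 + 1
    else:
        week_count = len(weekly_totals)
    return sum(weekly_totals.values()) / week_count


sample = [
    (date(2024, 1, 1), 10.0),   # week of Jan 1
    (date(2024, 1, 3), 20.0),   # week of Jan 1
    (date(2024, 1, 8), 30.0),   # week of Jan 8
    (date(2024, 1, 22), 60.0),  # week of Jan 22 (week of Jan 15 is empty)
]
print(average_weekly_sales(sample))                         # 40.0
print(average_weekly_sales(sample, fill_empty_weeks=True))  # 30.0
```

The weekly totals are 30, 30 and 60, so the average is 120 / 3 = 40 when the empty week is skipped and 120 / 4 = 30 when it is counted as zero.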

Is there a way to pull back the field name after aggregation?

I'm using Elasticsearch 1.5.2 and was wondering whether there is a way to add the field name to the result of an aggregation, separately from the name of the aggregation itself, so that the aggregation can stay named age rather than Customer.Age.
{
"aggregations": {
"age": {
"terms": {
"field": "Customer.Age",
"size": 10
}
}
}
}
I want the response to look something like this:
aggregations: {
age: {
doc_count_error_upper_bound: 0
sum_other_doc_count: 0
field: Customer.Age
buckets: [6]
0: {
key: "unknown"
doc_count: 607103
}
}
And what I currently get does not include field.
This is not possible at the moment (before 2.0 is released). However, since the name of the aggregation can be essentially anything, you can encode both aggregation name and the field in the aggregation name and then parse it on the client side:
{
"aggregations": {
"age/Customer.Age": {
"terms": {
"field": "Customer.Age",
"size": 10
}
}
}
}
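The client-side parsing mentioned above could look like the following sketch. The helper names split_agg_name and attach_fields are hypothetical, and the "/" separator simply follows the "age/Customer.Age" example.

```python
from typing import Any, Dict, Optional, Tuple


def split_agg_name(encoded: str, separator: str = "/") -> Tuple[str, Optional[str]]:
    """Split 'age/Customer.Age' into ('age', 'Customer.Age')."""
    name, _, field = encoded.partition(separator)
    return name, field or None


def attach_fields(aggregations: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Rename each aggregation to its plain name and add a 'field' entry."""
    decoded: Dict[str, Dict[str, Any]] = {}
    for encoded_name, body in aggregations.items():
        name, field = split_agg_name(encoded_name)
        entry = dict(body)  # shallow copy, the response is left untouched
        if field is not None:
            entry["field"] = field
        decoded[name] = entry
    return decoded
```

Calling attach_fields on the "aggregations" part of the response yields an age entry that carries field: Customer.Age next to its buckets.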
In v2.0 it will be possible to specify arbitrary metadata that will be returned back to the user:
{
"aggregations": {
"age": {
"terms": {
"field": "Customer.Age",
"size": 10
},
"meta": {
"field": "Customer.Age"
}
}
}
}

Return parent data with child document from Elasticsearch

Is it possible to return parent data with a search for child documents within an Elasticsearch query?
I have two document types, e.g. Book and Chapter, that are related as Parent/Child (not nested).
I want to run a search on the child document and return the child document, with some of the fields from the parent document. I'm trying to avoid executing a separate query on the parent.
Update
The only way possible I can find is to use the has_child query and then a series of aggregations to drill back to the children and apply the query/filter again. However, this seems overly complicated and inefficient.
GET index/_search
{
"size": 10,
"query": {
"has_child": {
"type": "chapter",
"query": {
"term": {
"field": "value"
}
}
}
},
"aggs": {
"name1": {
"terms": {
"size": 50,
"field": "id"
},
"aggs": {
"name2": {
"top_hits": {
"size": 50
}
},
"name3": {
"children": {
"type": "type2"
},
"aggs": {
"docFilter": {
"filter": {
"query": {
"match": {
"_all": "value"
}
}
},
"aggs": {
"docs": {
"top_hits": {
"size": 50
}
}
}
}
}
}
}
}
}
}
It is possible to do a has_child query to return the parent docs, with a top_hits aggregation to return the child docs, but it is a bit cumbersome.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
The Inner Hits feature that is due to be released in 1.5.0 will do what you want.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/search-request-inner-hits.html
You could build the source from master and try it out.
This can now be done with Elasticsearch. Just use 'has_parent' with 'inner_hits' in the search query:
'has_parent': {
'parent_type': 'book',
'query': {
'match_all': {}
},
'inner_hits': {}
}
The results will appear in the inner_hits of the response.
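Reading the parent back out of the response can be sketched like this. The helper name children_with_parent is hypothetical, and the sketch assumes that the inner hits are keyed by the parent type name ("book" here) because no explicit name was set in the inner_hits definition.

```python
from typing import Any, Dict, List, Optional, Tuple


def children_with_parent(
    response: Dict[str, Any], inner_hits_name: str = "book"
) -> List[Tuple[Dict[str, Any], Optional[Dict[str, Any]]]]:
    """Pair each child hit's _source with the _source of its parent inner hit."""
    pairs = []
    for hit in response["hits"]["hits"]:
        parent_hits = (
            hit.get("inner_hits", {})
            .get(inner_hits_name, {})
            .get("hits", {})
            .get("hits", [])
        )
        parent = parent_hits[0]["_source"] if parent_hits else None
        pairs.append((hit["_source"], parent))
    return pairs
```

Each returned pair holds the chapter document and the matching book document, or None when a hit carries no inner hits.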
As Dan Tuffery says in his comment, this can now be achieved with inner hits. In Java, the following snippet may make it easier to understand:
// Search the child type, keep children whose parent matches, and ask for the parent as an inner hit
SearchResponse searchResponse = this.transportClient.prepareSearch("your_index")
        .setTypes("your_type")
        .setQuery(QueryBuilders.filteredQuery(
                null,
                FilterBuilders.hasParentFilter(
                        "parent_type_name",
                        FilterBuilders.termFilter("foo", "foo"))
                        .innerHit(new QueryInnerHitBuilder()))
        )
        .execute().actionGet();
List<YourObject> list = new ArrayList<>();
// Iterate over the hits of the response (the original snippet used an undefined searchHits variable)
for (SearchHit searchHit : searchResponse.getHits().getHits()) {
    YourObject yourObject = this.objectMapper.readValue(searchHit.getSourceAsString(), YourObject.class);
    // The parent document is the first inner hit stored under the parent type name
    yourObject.setYourParentObject(this.objectMapper.readValue(searchHit.getInnerHits().get("parent_type_name").getAt(0).getSourceAsString(), YourParentObject.class));
    list.add(yourObject);
}
