How to aggregate over a group-by result in Elasticsearch - elasticsearch

There is an index: person
"_source" : {
"id" : 304028598,
"name" : "aaa"
},
I want to get the following information:
1. the average number of documents per name
2. the maximum number of documents a single name can have
In SQL I could get this with the query below:
select max(count), avg(count), min(count) from (
  select name, count(*) count from t group by name
);
but how can I implement it in Elasticsearch?

The answer to this question relies on pipeline aggregations -- these aggregations operate on the output of other aggregations.
For example, suppose we have many documents, each with a hostVersion field; we can use the following to find the max, min and average number of documents per host version:
"aggs": {
"per_hostver": {
"terms": {
"field": "hostVersion"
}
},
"avg_docs_per_version": {
"avg_bucket": {
"buckets_path": "per_hostver>_count"
}
},
"max_docs_per_version": {
"max_bucket": {
"buckets_path": "per_hostver>_count"
}
},
"min_docs_per_version": {
"min_bucket": {
"buckets_path": "per_hostver>_count"
}
}
}
The syntax per_hostver>_count refers to the _count field generated by each bucket of the per_hostver aggregation. _count is how you refer to the special document-count field produced by every Elasticsearch bucket aggregation.
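Applied to the person index from the original question, the whole request body might look like the sketch below (assuming name is indexed as a keyword field, e.g. name.keyword; adjust the field name to your mapping):
POST person/_search
{
  "size": 0,
  "aggs": {
    "per_name": {
      "terms": {
        "field": "name.keyword",
        "size": 1000
      }
    },
    "avg_docs_per_name": {
      "avg_bucket": { "buckets_path": "per_name>_count" }
    },
    "max_docs_per_name": {
      "max_bucket": { "buckets_path": "per_name>_count" }
    },
    "min_docs_per_name": {
      "min_bucket": { "buckets_path": "per_name>_count" }
    }
  }
}
Note that the pipeline aggregations only see the buckets the terms aggregation returns, so the terms size must be large enough to cover all distinct names.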

Related

How to limit search results from each index in a multi index search query?

I am using Elasticsearch version 6.3 and I want to make queries across multiple indices. Elasticsearch has support for this: I can give multiple indices as comma-separated values in the URL with one query in the request body, and I can use the size parameter to limit the number of search results returned. However, this limits the size of the overall result set and might lead to no results from some indices, so instead I want to fetch the first n results from each index.
I tried using the multi search API (_msearch), but with that it seems I have to give the same query and size for all indices. That works, but I am not able to get a single aggregation over the entire result. Is there any way to address both issues?
Solution 1:
You're on the right path with the _msearch query. What I would do is issue one query per index (no aggregations!) with the size you want for that index, plus one more query just for the aggregations, like this:
{ "index": "index1" }
{ "size": 5, "query": { ... }}
{ "index": "index2" }
{ "size": 5, "query": { ... }}
{ "index": "index3" }
{ "size": 5, "query": { ... }}
{ "index": "index1,index2,index3" }
{ "size": 0, "query": { ... }, "aggs": { ... } }
So the first three queries will return document hits from each of the three indexes and the last query will return the aggregation computed on all indexes, but no documents.
Solution 2:
Another way to tackle this if you have a small size, is to have a single query in the query part and then aggregate on the index name and retrieve hits from each index using top_hits, like this:
POST index1,index2,index3/_search
{
  "size": 0,
  "query": { ... },
  "aggs": {
    "indexes": {
      "terms": {
        "field": "_index",
        "size": 50
      },
      "aggs": {
        "hits": {
          "top_hits": {
            "size": 5
          }
        }
      }
    }
  }
}

Finding unique documents in an index in elastic search

I have duplicate entries in my index and I want to find only the unique documents in the index. A top_hits aggregation solves this problem, but my other requirement is to support sorting on the results (across buckets), hence I can't use top_hits.
Other options I can think of are writing a plugin or using a painless script.
I need help solving this. It would be great if you could point me to some examples.
The top_hits aggregation finds values from the complete result set, while cardinality gives you only the filtered result set.
You can use the cardinality aggregation like below:
{
  "aggs" : {
    "UNIQUE_COUNT" : {
      "cardinality" : {
        "field" : "your_field"
      }
    }
  }
}
This aggregation comes with some caveats (the count it returns is approximate); the Elasticsearch documentation below explains this in more detail.
Link: Cardinality Aggregation
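For example, you can trade memory for accuracy with the precision_threshold setting; a sketch (your_field is a placeholder, as above):
{
  "aggs" : {
    "UNIQUE_COUNT" : {
      "cardinality" : {
        "field" : "your_field",
        "precision_threshold" : 40000
      }
    }
  }
}
Counts below the threshold are close to exact; above it they become increasingly approximate.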
For sorting, you can refer to the example below, where you pass your aggregation name in the order clause of the terms aggregation that creates the buckets:
{
  "aggs": {
    "AGG_NAME": {
      "terms": {
        "field": "your_field",
        "size": 10,
        "order": {
          "UNIQUE_COUNT": "asc"
        },
        "min_doc_count": 1
      },
      "aggs": {
        "UNIQUE_COUNT": {
          "cardinality": {
            "field": "your_field"
          }
        }
      }
    }
  }
}

Elasticsearch, group by field and calculate average value for another field

Mapping:
player_id: int
stat_date: date
some_param: int
I need to calculate the average value of "some_param" per player_id, using the row with the max "stat_date" when there are several rows with the same player_id.
So I need the average value over the last date for all players.
This snippet is not working because of "Aggregator [average_val] of type [avg] cannot accept sub-aggregations":
GET test/test/_search
{
  "size": 0,
  "aggs": {
    "average_val": {
      "avg": {
        "field": "some_param"
      },
      "aggs": {
        "by_player": {
          "terms": { "field": "player_id" },
          "aggs": {
            "by_date": {
              "max": { "field": "stat_date" }
            }
          }
        }
      }
    }
  }
}
The simplest way is to use a plain avg aggregation:
GET test/test/_search
{
  "size": 0,
  "aggs": {
    "averages": {
      "avg": {
        "field": "some_param"
      }
    }
  }
}
But I need to calculate the average of "some_param" per player only for the last stat dates.
I think you just need to reverse the order of your aggregations: put the avg aggregation at the deepest level and it should work fine.
There are two major types of aggregation. avg is a metrics aggregation and it outputs metrics (numbers). You need to put bucket aggregations (like the terms aggregation) on the outside and run metrics aggregations on their output.
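As a sketch of the reversed structure for the mapping above (this averages some_param per player and also reports each player's latest stat_date, but it does not by itself restrict the average to the latest date):
GET test/test/_search
{
  "size": 0,
  "aggs": {
    "by_player": {
      "terms": { "field": "player_id" },
      "aggs": {
        "average_val": {
          "avg": { "field": "some_param" }
        },
        "by_date": {
          "max": { "field": "stat_date" }
        }
      }
    }
  }
}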

Elasticsearch filter multiple terms with only matching results and not any of them

How can I get only the results that match all of the terms in a multi-term search? I have this sample data, where titleid is mapped as an int field and personid as a keyword:
titleid:1,personid:a
titleid:3,personid:a
titleid:1,personid:b
titleid:2,personid:b
titleid:1,personid:c
titleid:5,personid:c
The expected result is:
titleid:1
With a sample query like this one:
{
  "query": {
    "bool": {
      "filter": {
        "must": [
          { "terms": { "personid": ["a", "b", "c"] } }
        ]
      }
    }
  }
}
I have the following results:
titleid: 1,2,3,5
Maybe this will help: I wrote the query in SQL and got the expected result. What I did was ask the query to give me the count of titleid rows that matches the number of searched parameters. This is only to make the intent clearer; the idea is still to do it in Elasticsearch.
select titleid
from (
  select count(titleid) as title_count, titleid
  from table1
  where personid in ('a','b','c')
  group by titleid
) as vw
where title_count = 3
If you only want records with titleid == 1 AND personid == 'a', you can filter on both fields. Only the boolean query uses must, should, and must_not; with a filter, since it is filtering (i.e. removing) by definition, it acts as a must.
"query": {
"bool": {
"filter": [
{
"term": {
"titleId": { "value": 1 }
}
},
{
"term": {
"personid": { "value": "a" }
}
}
]
}
}
UPDATE:
Now it looks like you want to filter your results, aggregate them, and then aggregate on those buckets. There are a few metrics and bucket aggregations that can do this.
Using a bucket_selector aggregation (this isn't tested but should be very close, if not correct):
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "terms": { "personid": ["a", "b", "c"] } }
      ]
    }
  },
  "aggs": {
    "title_id": {
      "terms": { "field": "titleid" },
      "aggs": {
        "count_filter": {
          "bucket_selector": {
            "buckets_path": {
              "the_doc_count": "_count"
            },
            "script": "params.the_doc_count == 3"
          }
        }
      }
    }
  }
}
However, be aware that pipeline aggregations work on the output produced by other aggregations, so the overall amount of work needed to compute the initial doc_counts is the same. Since the script needs to be executed for each input bucket, the operation can potentially be slow for high-cardinality fields with many thousands of terms.

Finding the max date in elastic search query

Can you please help me convert this SQL query to an Elasticsearch query?
SELECT group,MAX(date) as max_date
FROM table
WHERE checks>0
GROUP BY group
You could have your query as below, assuming that you're doing an HTTP POST. You can simply use the max aggregation of ES in order to get the max value, and a terms aggregation within aggs in order to get the GROUP BY behaviour.
Request:
yourhost:9200/your_index/_search
Request Body:
{
  "query": {
    "query_string": {
      "query": "checks:>0" <-- check whether this works, if not use the range query below
    }
  },
  "aggs": {
    "groupby_group": {
      "terms": {
        "field": "group"
      },
      "aggs": {
        "maximum": {
          "max": {
            "script": "doc['date'].value"
          }
        }
      }
    }
  }
}
For checks > 0, you could also use a range query within the query, which would look like:
"range" : {
  "checks" : {
    "gt" : 0
  }
}
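Putting it together, the whole request body with the range filter in place of the query_string might look like the sketch below (field names are taken from the SQL in the question; here the max aggregation reads the date field directly instead of using a script, so it does not need scripting enabled):
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "checks": { "gt": 0 } } }
      ]
    }
  },
  "aggs": {
    "groupby_group": {
      "terms": { "field": "group" },
      "aggs": {
        "max_date": {
          "max": { "field": "date" }
        }
      }
    }
  }
}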
This should help you with the aggregations. But if you use the scripted max ("script": "doc['date'].value") from the first example, please make sure you've enabled scripting in your elasticsearch.yml before you try querying:
script.inline: on
Hope this helps!
