elastic search get bucket count - elasticsearch

I have the following query:
GET images/_search
{
"query":{
"bool":{
"must":[
{
"term":{
"appID.raw":"myApp"
}
}
]
}
},
"size":0,
"aggs":{
"perDeviceAggregation":{
"terms":{
"field":"deviceID",
"min_doc_count":50000
}
}
}
}
This query returns a "buckets" array, but I would like to return only the length of the array, without the array itself.
Explanation: the purpose of this query is to count how many devices belonging to app "myApp" have over 50,000 images. I don't need the query to return these devices, just to know how many there are.

The terms aggregation returns buckets -- 1 bucket for each unique term of the field -- where each bucket contains the count of documents that contain the term.
It sounds like you want to know the number of unique terms instead of the document count per term. This concept is called cardinality.
There is a different aggregation to determine cardinality. Your query would look like this:
GET images/_search
{
"query":{
"bool":{
"must":[
{
"term":{
"appID.raw":"myApp"
}
}
]
}
},
"size":0,
"aggs":{
"deviceIdCardinality":{
"cardinality":{
"field":"deviceID"
}
}
}
}
NOTE: cardinality counts are approximate. You can configure the accuracy with the precision_threshold parameter to the aggregation. See the documentation for specifics.
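Keep in mind that the cardinality aggregation counts every distinct deviceID and has no min_doc_count threshold, so it does not by itself restrict the count to devices with 50,000 or more images. One simple alternative, sketched below in Python, is to keep the original terms aggregation and count the buckets on the client. The helper name count_buckets and the sample response are hypothetical and only illustrate the response shape. A terms aggregation returns 10 buckets by default, so its size would need to be raised for the count to be complete.

```python
def count_buckets(response: dict, agg_name: str) -> int:
    """Return how many buckets the named terms aggregation produced."""
    return len(response["aggregations"][agg_name]["buckets"])


# Hypothetical, trimmed search response used only for illustration.
sample_response = {
    "aggregations": {
        "perDeviceAggregation": {
            "buckets": [
                {"key": "device-a", "doc_count": 61234},
                {"key": "device-b", "doc_count": 50010},
            ]
        }
    }
}

print(count_buckets(sample_response, "perDeviceAggregation"))
```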

Related

Elasticsearch: Query to filter out specific documents based on field value and return count

I'm trying to compose a query in Elasticsearch that filters out documents with a specific field value, and also returns the number of documents that have been filtered out as an aggregation.
What I have so far is below. However, with my solution the documents seem to be filtered out first and the count is performed only afterwards, so the count is always 0.
{
"query":{
"bool":{
"must_not":[
{
"terms":{
"gender":[
"male"
]
}
}
]
}
},
"aggs":{
"removed_docs_count":{
"filter":{
"term":{
"gender":"male"
}
}
}
}
}
You don't need a query block. The aggs block alone will give you the expected count.
{
"aggs":{
"removed_docs_count":{
"filter":{
"term":{
"gender":"male"
}
}
}
}
}
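Note that with the query block removed, the returned hits are no longer filtered. If the hits should still exclude the matching documents while the aggregation counts them, one option is post_filter, which is applied to the hits after aggregations are computed. The sketch below builds such a request body in Python. The function name build_exclude_and_count_body is hypothetical, and this is one possible shape rather than the answerer's method.

```python
import json


def build_exclude_and_count_body(field: str, value: str) -> dict:
    """Build a search body that counts matching docs but hides them from hits."""
    term = {"term": {field: value}}
    return {
        # The aggregation sees all documents, so the count is not zero.
        "aggs": {"removed_docs_count": {"filter": term}},
        # post_filter only affects the returned hits, not the aggregation.
        "post_filter": {"bool": {"must_not": [term]}},
    }


body = build_exclude_and_count_body("gender", "male")
print(json.dumps(body, indent=2))
```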

Elasticsearch sort exact matches and fuzzy matches in different sets

This is my first ever question here so I apologize if I make any mistakes.
I'm trying to make a fuzzy search (a match query with the fuzziness parameter) on my index that returns the results in alphabetical order. But I need the exact matches to come first (alphabetically ordered among themselves) and the fuzzy matches after them.
I tried this to give exact matches higher scores, but the results are then just sorted by their scores:
{
"query":{
"bool":{
"must":[
{
"match":{
"myPropertyName":{
"query":"myWord",
"fuzziness":"AUTO"
}
}
}
],
"should":[
{
"match":{
"myPropertyName":{
"query":"myWord",
"boost":20
}
}
}
]
}
},
"sort":[
"_score",
{
"myProperty.keyword":{
"order":"asc"
}
}
],
"track_scores":true
}
Then I tried, with several methods, to give all exact matches one identical score and all fuzzy matches another. I can do this for the fuzzy matches by using filter or constant_score, but I couldn't figure out a way to assign a custom score to the results of the should clause in my search.
How can I achieve this?
I managed to achieve this by using a function_score query with "boost_mode": "replace" and a custom value for the weight parameter, for example a weight of 10.
{
"query":{
"function_score":{
"query":{
"bool":{
"filter":[
{
"match":{
"myPropertyName":{
"query":"myWord",
"fuzziness":"AUTO"
}
}
}
]
}
},
"boost_mode":"replace",
"functions":[
{
"filter":{
"match":{
"myPropertyName":{
"query":"myWord"
}
}
},
"weight":10
}
]
}
},
"sort":[
"_score",
{
"myProperty.keyword":{
"order":"asc"
}
}
],
"track_scores":true
}
This way, documents that match the match query are returned with a score of 0, since it runs as a filter. Among these documents, the ones that also match the function's filter are returned with a score of 10, because of "boost_mode": "replace" and the weight of 10.
When it comes to sorting, Elasticsearch first sorts the results by score, since _score comes first in the "sort" array. Documents with the same score are then sorted alphabetically among themselves.
This worked perfectly for me.
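The two-level ordering described above can be illustrated outside Elasticsearch with a small Python sketch. The hit names and scores are made up for illustration. The only assumption is that every exact match shares one score and every fuzzy-only match shares a lower one.

```python
def sort_hits(hits: list[dict]) -> list[dict]:
    """Order hits by score (descending), then by name (ascending)."""
    return sorted(hits, key=lambda hit: (-hit["score"], hit["name"]))


# Hypothetical hits: exact matches share score 10, fuzzy matches share 0.
hits = [
    {"name": "myWorm", "score": 0.0},
    {"name": "myWord b", "score": 10.0},
    {"name": "myWard", "score": 0.0},
    {"name": "myWord a", "score": 10.0},
]

ordered_names = [hit["name"] for hit in sort_hits(hits)]
print(ordered_names)
```

Because the scores within each group are identical, the second sort key alone decides the order inside the group.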

Incorrect aggregation when using sorting

I use this query to get search hits and the count of hits across multiple indices:
/index1,index2/_search
{
"query":{
"query_string":{
"query":"*"
}
},
"aggs":{
"group_by_index":{
"terms":{
"field":"_index",
"min_doc_count":0
}
}
},
"post_filter":{
"terms":{
"_index":["index1"]
}
},
"sort":{
"my_field":"asc"
}
}
The problem is that if I sort on a field (my_field) that only exists in index1, the aggregation only gives me the hit count for index1, not index2.
I thought the aggregation would work regardless of what sorting I have specified?
Using Elasticsearch 6.4
I solved it by adding unmapped_type to the sort on my_field.
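A minimal sketch of what such a sort clause might look like, built in Python. The thread does not state the type of my_field, so "keyword" below is an assumption, and build_sort is a hypothetical helper name. unmapped_type tells Elasticsearch which type to assume for the sort field in indices that have no mapping for it.

```python
import json


def build_sort(field: str, unmapped_type: str, order: str = "asc") -> list[dict]:
    """Build a sort clause that tolerates indices where the field is unmapped."""
    return [{field: {"order": order, "unmapped_type": unmapped_type}}]


# "keyword" is an assumed type; use the field's real type from index1's mapping.
sort_clause = build_sort("my_field", "keyword")
print(json.dumps({"sort": sort_clause}, indent=2))
```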

bool query with filter does not return any documents

The simple query
"query": {
"simple_query_string": { "query":"great guide" }
},
returns my document as expected, containing
"groups": [
"Local Business"
],
But if I use a filter, it returns no documents:
"query": {
"bool":{
"must":[
{"simple_query_string": { "query":"great guide" }}
],
"filter":{
"terms":{
"groups":["Local Business"]
}
}
}
},
If I remove the "filter" key and values, then the document is retrieved.
Why isn't the filter matching the document ?
If the groups field is of type keyword, then the query you've mentioned works as expected.
However, it wouldn't work if the groups field is of type text. In that case, the query below would fit what you are looking for.
Query for group - Type text
POST <your_index_name>/_search
{
"query":{
"bool":{
"must":[
{
"simple_query_string":{
"query":"great guide"
}
}
],
"filter":{
"match":{
"groups":"Local Business"
}
}
}
}
}
The query you posted doesn't work for a field of type text because such a field goes through the analysis phase at index time. The Standard Analyzer, used by default, converts Local Business to lowercase and then saves local and business as two individual terms in the inverted index.
Elasticsearch only returns results if the terms you query match what is in the index. A terms query is not analyzed, so it looks for the exact term Local Business, which is not there.
A keyword field, by contrast, saves Local Business as is in the inverted index.
Note: If the mapping was created dynamically rather than defined explicitly, you can try your original query with groups replaced by groups.keyword.
Hope this helps!
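The mismatch between an unanalyzed term and analyzed text can be illustrated with a rough Python approximation. The function rough_standard_analyze below is not the real Standard Analyzer; it only lowercases and splits on non-word characters, which is enough to show the effect for this value.

```python
import re


def rough_standard_analyze(text: str) -> list[str]:
    """Very rough stand-in for the Standard Analyzer: lowercase, split into words."""
    return re.findall(r"\w+", text.lower())


def term_matches(query_term: str, indexed_terms: set[str]) -> bool:
    """A term-level lookup compares the query term to indexed terms verbatim."""
    return query_term in indexed_terms


# What a text field would roughly store for the value "Local Business".
text_field_terms = set(rough_standard_analyze("Local Business"))
# What a keyword field stores: the original value, unchanged.
keyword_field_terms = {"Local Business"}

print(term_matches("Local Business", text_field_terms))
print(term_matches("Local Business", keyword_field_terms))
```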

Elasticsearch - Aggregation and Bucket size

I have data in ES which looks like this:
'{"Emp_ID":"12212","Emp_Name":"Jim","Emp_Sal":300,"Dep_Id":22,"Dep_Name":"IT","Dep_Cnt":40}'
'{"Emp_ID":"6874590","Emp_Name":"Joe","Emp_Sal":140,"Dep_Id":66,"Dep_Name":"Admin","Dep_Cnt":20}'
'{"Emp_ID":"32135","Emp_Name":"Jill","Emp_Sal":170,"Dep_Id":66,"Dep_Name":"Admin","Dep_Cnt":20}'
'{"Emp_ID":"43312","Emp_Name":"Andy","Emp_Sal":450,"Dep_Id":22,"Dep_Name":"IT","Dep_Cnt":40}'
'{"Emp_ID":"315609","Emp_Name":"Cody","Emp_Sal":150,"Dep_Id":22,"Dep_Name":"IT","Dep_Cnt":40}'
'{"Emp_ID":"87346","Emp_Name":"Dave","Emp_Sal":500,"Dep_Id":55,"Dep_Name":"hr","Dep_Cnt":10}'
I want to get all the unique departments ordered by Dep_Cnt, for which I wrote the following query
{
"size":0,
"aggs":{
"by_Dep_Cnt":{
"terms":{
"field":"Dep_Cnt",
"order":{
"_term":"asc"
}
},
"aggs":{
"by_unique_dep_id":{
"terms":{
"field":"Dep_Id"
},
"aggs":{
"tops":{
"top_hits":{
"size":1
}
}
}
}
}
}
}
}
And got expected output of 3 unique departments ordered by Dep_Cnt.
But now my requirement is to get only the top two departments.
How do I modify the query to get only 2 buckets?
What you are looking for is the size parameter of the terms aggregation.
If Dep_Cnt is the number of employees in a department, your documents are one per employee, and all employees are in your index (judging from your sample data, this may be the case), you can simply do:
{
"size":0,
"aggs":{
"by_Dep_Id":{
"terms":{
"field":"Dep_Id",
"size": 2
}
}
}
}
This works because, by default, the terms aggregation sorts buckets by the number of documents with the corresponding value, i.e. the number of documents with this Dep_Id, which here is the number of employees in the department.
If you are not in this situation:
Your current request does not behave the same way when two departments have the same size (you will get two Dep_Ids in the same Dep_Cnt bucket).
You can group documents by Dep_Id, compute Dep_Cnt with the metric you want (min, max, avg, ...), and sort on this metric:
{
"size":0,
"aggs":{
"by_Dep_Id":{
"terms":{
"field":"Dep_Id",
"size": 2,
"order":{
"avg_Dep_Cnt":"asc"
}
},
"aggs":{
"avg_Dep_Cnt":{
"avg":{
"field":"Dep_Cnt"
}
}
}
}
}
}
NB: I removed the top_hits aggregation since, from what you explained, you do not need it. If you have extra requirements, just add it back under the terms aggregation.
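To make the grouping and ordering concrete, here is a small Python sketch that mimics the second request on the sample documents from the question: group by Dep_Id, average Dep_Cnt per group, sort ascending, and keep two groups. It runs on plain dictionaries, not against Elasticsearch, and top_departments is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Sample documents from the question, reduced to the two relevant fields.
employees = [
    {"Dep_Id": 22, "Dep_Cnt": 40},
    {"Dep_Id": 66, "Dep_Cnt": 20},
    {"Dep_Id": 66, "Dep_Cnt": 20},
    {"Dep_Id": 22, "Dep_Cnt": 40},
    {"Dep_Id": 22, "Dep_Cnt": 40},
    {"Dep_Id": 55, "Dep_Cnt": 10},
]


def top_departments(docs: list[dict], size: int) -> list[tuple]:
    """Group by Dep_Id, average Dep_Cnt, sort ascending, keep `size` groups."""
    counts_by_dep = defaultdict(list)
    for doc in docs:
        counts_by_dep[doc["Dep_Id"]].append(doc["Dep_Cnt"])
    averages = [(dep_id, mean(values)) for dep_id, values in counts_by_dep.items()]
    averages.sort(key=lambda pair: pair[1])
    return averages[:size]


print(top_departments(employees, 2))
```

With "asc" ordering this keeps the two smallest departments. Sorting in descending order instead would keep the two largest.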
