Elasticsearch count by a facet that exists only in some documents

I have a facet that exists only in some of the documents. I want to know how many documents have each possible value of the facet, and how many don't have this facet at all.
The facet is color. My current query returns the count for each color, but doesn't return the count for documents without a color:
"facets": {
"_Properties": {
"terms": {
"field": "Color",
"size": 100
}
}
}
Thanks!

Facets were deprecated in Elasticsearch 1.x and removed in 2.0. You can use a combination of a terms aggregation and a missing aggregation instead. The following query covers your requirement:
"aggs": {
"_Properties": {
"terms": {
"field": "Color",
"size": 100
}
},
"_MissingColor": {
"missing": {
"field": "Color"
}
}
}
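The two aggregations come back as separate entries in the response, so the client has to merge them. Below is a minimal Python sketch of that merge. It assumes the response has already been parsed into a dict (for example the JSON body returned by the search API). The function name `color_counts` and the use of `None` as the key for documents without a color are illustrative choices, not part of any client library.

```python
from typing import Any, Dict, Optional


def color_counts(response: Dict[str, Any]) -> Dict[Optional[str], int]:
    """Merge the terms and missing aggregations into one {color: count} map.

    Documents without a Color field are reported under the key None.
    """
    aggs = response["aggregations"]
    # Each terms bucket carries the term under "key" and its count under "doc_count".
    counts: Dict[Optional[str], int] = {
        bucket["key"]: bucket["doc_count"] for bucket in aggs["_Properties"]["buckets"]
    }
    # The missing aggregation is a single bucket: only a doc_count, no key.
    counts[None] = aggs["_MissingColor"]["doc_count"]
    return counts


# Usage with a hand-written response of the shape Elasticsearch returns.
sample = {
    "aggregations": {
        "_Properties": {"buckets": [{"key": "red", "doc_count": 5},
                                    {"key": "blue", "doc_count": 3}]},
        "_MissingColor": {"doc_count": 2},
    }
}
print(color_counts(sample))  # {'red': 5, 'blue': 3, None: 2}
```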

Related

Search documents in one index whose _ids do not exist in another index

Is there any way to search for documents in one index whose _ids do not exist in another index? Something like NOT EXISTS in MySQL.
There is no convenient way to do this within Elasticsearch; the only option is a workaround using aggregations:
GET index-a,index-b/_search
{
"size": 0,
"aggs": {
"group_by_id": {
"terms": {
"field": "_id",
"size": 1000
},
"aggs": {
"contained_in_indices_count": {
"cardinality": {
"field": "_index"
}
},
"filter_only_differences": {
"bucket_selector": {
"buckets_path": {
"count": "contained_in_indices_count"
},
"script": "params.count < 2"
}
}
}
}
}
}
Then you'll need to iterate over all buckets in the group_by_id aggregation. Consider using a larger size, as it's 10 by default (1000 in my example). If there are more differences between your indices than that, you need to use bucket partitioning as described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions
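The buckets that survive the bucket_selector are IDs found in only one of the two indices, but the cardinality alone does not say which one. A Python sketch of the iteration step, assuming one extra sub-aggregation that is not in the query above: a hypothetical `which_index` terms aggregation on `_index`, placed inside `group_by_id`, so each bucket also reports the index that holds the ID. With it you can keep only the IDs present in `index-a` but missing from `index-b`. The response is assumed to be an already-parsed dict, and the function name is illustrative.

```python
from typing import Any, Dict, List

# Hypothetical extra sub-aggregation to add next to the cardinality aggregation
# inside "group_by_id"; it reports which index each surviving _id came from.
WHICH_INDEX_AGG = {"which_index": {"terms": {"field": "_index", "size": 2}}}


def ids_only_in(response: Dict[str, Any], index_name: str) -> List[str]:
    """Return the _id values whose only containing index is `index_name`."""
    result = []
    for bucket in response["aggregations"]["group_by_id"]["buckets"]:
        indices = [b["key"] for b in bucket["which_index"]["buckets"]]
        # After the bucket_selector, each remaining bucket lists exactly one index.
        if indices == [index_name]:
            result.append(bucket["key"])
    return result


sample = {
    "aggregations": {
        "group_by_id": {
            "buckets": [
                {"key": "1", "doc_count": 1,
                 "which_index": {"buckets": [{"key": "index-a", "doc_count": 1}]}},
                {"key": "7", "doc_count": 1,
                 "which_index": {"buckets": [{"key": "index-b", "doc_count": 1}]}},
            ]
        }
    }
}
print(ids_only_in(sample, "index-a"))  # ['1']
```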

How do I filter after an aggregation?

I am trying to filter after a top_hits aggregation to check whether the first occurrence of an error falls within a given range, but I can't find a way.
I have seen something about bucket_selector, but I can't get it to work.
POST log-*/_search/
{
"size": 100,
"aggs": {
"group":{
"terms": {
"field": "errorID.keyword",
"size": 100
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1,
"sort": [
{
"#timestamp": {
"order": "asc"
}
}
]
}
}
}
}
}
}
With this top_hits aggregation I get the first occurrence of each errorID (I have many documents with the same errorID), but what I want to find out is whether that first occurrence falls within a given date range.
I think a valid solution would be to filter the results of the aggregation to check whether they fall in the range, but I don't know how to do that.
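One way to do this, sketched below as a Python request-body builder, swaps the top_hits lookup for a `min` sub-aggregation on the timestamp (the earliest value per errorID is its first occurrence) and then drops buckets with a `bucket_selector` whose script compares that minimum against the range. Assumptions: the timestamp is a `date` field, so `min` yields epoch milliseconds; the field name `#timestamp` is kept from the question; and the script's own `params` are visible alongside the `buckets_path` variable, as in current versions. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def to_epoch_millis(dt: datetime) -> int:
    """Convert an aware datetime to epoch milliseconds, the unit a min
    aggregation on a date field returns."""
    return int(dt.timestamp() * 1000)


def first_seen_in_range_body(start: datetime, end: datetime,
                             ts_field: str = "#timestamp",
                             group_field: str = "errorID.keyword") -> Dict[str, Any]:
    """Build a search body keeping only errorIDs first seen in [start, end)."""
    return {
        "size": 0,  # only the aggregation matters, not the hits
        "aggs": {
            "group": {
                "terms": {"field": group_field, "size": 100},
                "aggs": {
                    # Earliest timestamp per errorID = its first occurrence.
                    "first_seen": {"min": {"field": ts_field}},
                    # Pipeline aggregation: drop buckets whose first occurrence
                    # falls outside the range.
                    "first_seen_in_range": {
                        "bucket_selector": {
                            "buckets_path": {"first": "first_seen"},
                            "script": {
                                "source": "params.first >= params.start && params.first < params.end",
                                "params": {"start": to_epoch_millis(start),
                                           "end": to_epoch_millis(end)},
                            },
                        }
                    },
                },
            }
        },
    }


body = first_seen_in_range_body(datetime(2024, 1, 1, tzinfo=timezone.utc),
                                datetime(2024, 2, 1, tzinfo=timezone.utc))
```

Send `body` with whichever client you use; the buckets that survive are the errorIDs whose first occurrence lies in the range. The selector only sees the buckets the terms aggregation returns, so its `size` must be large enough to cover every errorID.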

How to get specific _source fields in aggregation

I am exploring Elasticsearch for use in an application that will handle large volumes of data and generate statistical results over them. My requirement is to retrieve certain statistics for a particular field. For example, for a given field, I would like to retrieve its unique values and the document frequency of each value, along with the length of the value. The value lengths are indexed along with each document.
So far, I have experimented with Terms Aggregation, with the following query:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100
}
}
}
}
The query returns all the values in the field val with the number of documents in which each value occurs. I would like the field val_len to be returned as well. Is it possible to achieve this using ElasticSearch? In other words, is it possible to include specific _source fields in buckets? I have looked through the documentation available online, but I haven't found a solution yet.
Hoping somebody could point me in the right direction. Thanks in advance!
I tried to include _source in the following ways:
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100
},
"_source":["val_len"]
}
}
and
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100,
"_source":["val_len"]
}
}
}
But I guess this isn't the right way, because both gave me parsing errors.
You need to use another sub-aggregation called top_hits, like this:
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100
},
"aggs": {
"hits": {
"top_hits": {
"_source":["val_len"],
"size": 1
}
}
}
}
}
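Because the sub-aggregation is named `hits`, and every top_hits result also nests its documents under `hits.hits`, the path to `val_len` in each bucket is easy to get wrong. A small Python sketch of reading it, assuming an already-parsed response dict; the function name is hypothetical. With `size` set to 1 it takes `val_len` from a single document per bucket, which is only meaningful if all documents with the same `val` share the same length.

```python
from typing import Any, Dict, List, Tuple


def values_with_length(response: Dict[str, Any]) -> List[Tuple[str, int, Any]]:
    """Return (value, doc_count, val_len) for each terms bucket."""
    rows = []
    for bucket in response["aggregations"]["type_count"]["buckets"]:
        # Sub-aggregation "hits" -> top_hits result "hits" -> list of documents "hits".
        top = bucket["hits"]["hits"]["hits"]
        val_len = top[0]["_source"].get("val_len") if top else None
        rows.append((bucket["key"], bucket["doc_count"], val_len))
    return rows


sample = {
    "aggregations": {
        "type_count": {
            "buckets": [{
                "key": "abc",
                "doc_count": 4,
                "hits": {"hits": {"total": {"value": 4, "relation": "eq"},
                                  "max_score": 1.0,
                                  "hits": [{"_index": "my-index", "_id": "1",
                                            "_score": 1.0, "_source": {"val_len": 3}}]}},
            }]
        }
    }
}
print(values_with_length(sample))  # [('abc', 4, 3)]
```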
Another way is to use an avg sub-aggregation, which also lets you sort the buckets on it:
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100,
"order": {
"length": "desc"
}
},
"aggs": {
"length": {
"avg": {
"field": "val_len"
}
}
}
}
}
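With the avg variant, each bucket carries the length under `length.value` (a number, or null for a bucket with no `val_len` values), and the buckets already arrive sorted by it. A brief Python sketch of reading them, assuming a parsed response dict; the function name is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def values_by_avg_length(response: Dict[str, Any]) -> List[Tuple[str, int, Optional[float]]]:
    """Return (value, doc_count, average val_len) in the order Elasticsearch sent them."""
    return [
        (b["key"], b["doc_count"], b["length"]["value"])
        for b in response["aggregations"]["type_count"]["buckets"]
    ]


sample = {
    "aggregations": {
        "type_count": {
            "buckets": [
                {"key": "longer", "doc_count": 2, "length": {"value": 6.0}},
                {"key": "abc", "doc_count": 4, "length": {"value": 3.0}},
            ]
        }
    }
}
print(values_by_avg_length(sample))  # [('longer', 2, 6.0), ('abc', 4, 3.0)]
```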

How to order by doc count for a terms aggregation within a composite aggregation?

I was trying the composite aggregation in Elasticsearch, but found it odd that something I can normally do in a terms aggregation isn't supported for a terms source inside a composite aggregation!
See the query below:
GET _search
{
"size": 0,
"query": {
"match_all": {}
},"aggs": {
"compo": {
"composite": {
"sources": [
{
"terms_inside": {
"terms": {
"field": "result_type",
"order": {
"_count": "asc" // not supported here!
}
}
}
}
]
}
},
"just_terms" :{
"terms": {
"field": "result_type",
"order": {
"_count": "asc" // supported here
}
}
}
}
}
Is this just the way it is, or is there a way to get buckets sorted by doc count with a nested terms aggregation? I want both paging and sorting on the terms aggregation.
It cannot be done: a composite aggregation paginates its results, so it is designed not to compute counts for all buckets up front, only for those in the current page.
https://discuss.elastic.co/t/composite-aggregation-order-by/139563/5
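If the number of distinct `result_type` values is manageable, one workaround (not something the composite aggregation does itself) is to page through every composite bucket using `after_key` and sort by `doc_count` on the client. A Python sketch, assuming a hypothetical `fetch_page(after)` callable that runs the composite query with `"after": after` (omitted when `after` is None) and returns the parsed `compo` aggregation. This loads every bucket into memory, so it gives up the benefit of pagination when there are many distinct values.

```python
from typing import Any, Callable, Dict, List, Optional

Page = Dict[str, Any]


def all_buckets_by_count(fetch_page: Callable[[Optional[Dict[str, Any]]], Page],
                         descending: bool = False) -> List[Dict[str, Any]]:
    """Drain every page of a composite aggregation, then sort by doc_count."""
    buckets: List[Dict[str, Any]] = []
    after: Optional[Dict[str, Any]] = None
    while True:
        page = fetch_page(after)
        page_buckets = page.get("buckets", [])
        if not page_buckets:
            break  # an empty page means there is nothing left
        buckets.extend(page_buckets)
        after = page.get("after_key")
        if after is None:
            break
    # Sorting happens here, on the client, because composite orders by key only.
    return sorted(buckets, key=lambda b: b["doc_count"], reverse=descending)


# Fake two-page result standing in for real search calls.
PAGES = {
    None: {"buckets": [{"key": {"terms_inside": "a"}, "doc_count": 9}],
           "after_key": {"terms_inside": "a"}},
    "a": {"buckets": [{"key": {"terms_inside": "b"}, "doc_count": 2}],
          "after_key": {"terms_inside": "b"}},
    "b": {"buckets": []},
}


def fake_fetch(after):
    return PAGES[None if after is None else after["terms_inside"]]


print([b["key"]["terms_inside"] for b in all_buckets_by_count(fake_fetch)])  # ['b', 'a']
```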
Before Elasticsearch 7.12 you cannot aggregate on multiple terms and order by doc_count. From 7.12 onward, you can use a multi_terms aggregation.
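For comparison, a Python sketch of a 7.12+ request body that uses `multi_terms` and orders by `_count`, the ordering the composite source above rejects. The second field, `other_field`, is a hypothetical placeholder. Note that `multi_terms` returns a single ordered list of top buckets and has no `after` cursor like composite, so it gives sorting rather than paging.

```python
from typing import Any, Dict, List


def multi_terms_by_count(fields: List[str], size: int = 10, order: str = "asc") -> Dict[str, Any]:
    """Build a search body that groups on several fields and orders buckets by doc count."""
    return {
        "size": 0,
        "aggs": {
            "by_fields": {
                "multi_terms": {
                    # One entry per field; each bucket key is the combination of values.
                    "terms": [{"field": f} for f in fields],
                    "size": size,
                    "order": {"_count": order},
                }
            }
        },
    }


body = multi_terms_by_count(["result_type", "other_field"])  # other_field is hypothetical
```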

Sub-listing aggregation results in Elasticsearch

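Hi, I wanted to know whether, after applying an aggregation, I can select only a range of values to return in the response. Suppose the aggregation has 100 docs: can I select, say, documents 10 to 30, or 0 to 20, etc.? Any help would be appreciated, thanks.
The terms aggregation supports filtering its values with partitioning. Note that this is not positional paging: terms are assigned to partitions by hashing their values, so a partition is a deterministic subset of the terms rather than, say, buckets 10 to 30 of the ordered list.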
GET /_search
{
"size": 0,
"aggs": {
"expired_sessions": {
"terms": {
"field": "account_id",
"include": {
"partition": 0,
"num_partitions": 20
},
"size": 10000,
"order": {
"last_access": "asc"
}
},
"aggs": {
"last_access": {
"max": {
"field": "access_date"
}
}
}
}
}
}
See "Filtering Values with partitions" in the terms aggregation documentation.
Be aware that partitioning may add a performance hit, depending on the aggregation.
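Each request returns only one partition, so covering every term means repeating the request with `partition` running from 0 to `num_partitions - 1`. A Python sketch of that loop, assuming a hypothetical `search(body)` callable that sends the body and returns the parsed response; the body mirrors the query above. The `order` applies within each partition, so the concatenated list is not globally sorted by `last_access`.

```python
import copy
from typing import Any, Callable, Dict, List

BASE_BODY: Dict[str, Any] = {
    "size": 0,
    "aggs": {
        "expired_sessions": {
            "terms": {
                "field": "account_id",
                "include": {"partition": 0, "num_partitions": 20},
                "size": 10000,
                "order": {"last_access": "asc"},
            },
            "aggs": {"last_access": {"max": {"field": "access_date"}}},
        }
    },
}


def all_partitions(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                   num_partitions: int = 20) -> List[Dict[str, Any]]:
    """Run the terms aggregation once per partition and concatenate the buckets."""
    buckets: List[Dict[str, Any]] = []
    for p in range(num_partitions):
        body = copy.deepcopy(BASE_BODY)  # leave the template untouched
        include = body["aggs"]["expired_sessions"]["terms"]["include"]
        include["partition"] = p
        include["num_partitions"] = num_partitions
        resp = search(body)
        buckets.extend(resp["aggregations"]["expired_sessions"]["buckets"])
    return buckets


def fake_search(body):
    # Stand-in for a real search call: one bucket per requested partition.
    p = body["aggs"]["expired_sessions"]["terms"]["include"]["partition"]
    return {"aggregations": {"expired_sessions": {"buckets": [{"key": f"acct-{p}", "doc_count": 1}]}}}


print(len(all_partitions(fake_search, num_partitions=3)))  # 3
```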
