ElasticSearch Filtering aggregations from array field

I am trying to run an aggregation on the values in an array field and also filter the returned buckets by a prefix. I am not sure whether this is possible or whether I am misusing the filter aggregation.
Three documents:
{ "colors":["red","black","blue"] }
{ "colors":["red","black"] }
{ "colors":["red"] }
The goal is to get a count of documents that have a color starting with the letter B:
{
"size":0,
"aggs" : {
"colors" : {
"filter" : { "prefix" : { "colors" : "b" } },
"aggs" : {
"top-colors" : { "terms" : { "field":"colors" } }
}
}
}
}
Unfortunately, the results that come back include red. That is because the filter matches whole documents: the documents containing red still match, since they also contain blue and/or black.
"aggregations": {
"colors": {
"doc_count": 2,
"top-colors": {
"buckets": [
{
"key": "black",
"doc_count": 2
},
{
"key": "red",
"doc_count": 2
},
{
"key": "blue",
"doc_count": 1
}
]
}
}
}
Is there a way to filter just the bucket results?

Try this. The include parameter restricts which values the buckets themselves are created for:
{
"size": 0,
"aggs": {
"colors": {
"filter": {
"prefix": {
"colors": "b"
}
},
"aggs": {
"top-colors": {
"terms": {
"field": "colors",
"include": {
"pattern": "b.*"
}
}
}
}
}
}
}
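To make the difference concrete, here is a minimal local sketch in plain Python (no Elasticsearch client involved). It mimics the two stages on the three sample documents: the prefix filter selects whole documents, while the include pattern selects which values get a bucket. The helper name terms_counts is made up for this illustration. As an aside, and as an assumption worth checking against the documentation for your version, recent Elasticsearch releases appear to expect include as a plain regular-expression string ("include": "b.*") rather than an object with a pattern key.

```python
import re
from collections import Counter

# The three sample documents from the question.
docs = [
    {"colors": ["red", "black", "blue"]},
    {"colors": ["red", "black"]},
    {"colors": ["red"]},
]


def terms_counts(documents, field, include=None):
    """Count documents per distinct value of an array field.

    Hypothetical helper that mimics a terms aggregation: each document
    contributes at most once per distinct value. `include` is an optional
    regex that a value must fully match to get a bucket.
    """
    counts = Counter()
    for doc in documents:
        for value in set(doc[field]):
            if include is None or re.fullmatch(include, value):
                counts[value] += 1
    return dict(counts)


# Stage 1: the prefix filter keeps every document that has ANY value starting with "b".
matching = [d for d in docs if any(c.startswith("b") for c in d["colors"])]

# Stage 2: bucket the values of the surviving documents, without and with include.
without_include = terms_counts(matching, "colors")
with_include = terms_counts(matching, "colors", include=r"b.*")

print(without_include)  # "red" is still present here
print(with_include)     # only values matching b.* remain
```

The filter narrows the set of documents; include narrows the set of bucket keys. Both are needed only if you also want the outer doc_count to reflect the filtered documents.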

Related

Stats Aggregation with Min Mode in ElasticSearch

I have the below mapping in ElasticSearch
{
"properties":{
"Costs":{
"type":"nested",
"properties":{
"price":{
"type":"integer"
}
}
}
}
}
So every document has an array field Costs, which contains many elements, and each element has a price. I want to find the min and max price, with the condition that only the element with the minimum price from each array is considered. In other words, I want the min/max among the per-array minimums.
Let's say I have 2 documents with the Costs field as
"Costs": [
{
"price": 100
},
{
"price": 200
}
]
and
"Costs": [
{
"price": 300
},
{
"price": 400
}
]
So I need to find the stats over these per-array minimums.
This is the query I am currently using:
{
"costs_stats":{
"nested":{
"path":"Costs"
},
"aggs":{
"price_stats_new":{
"stats":{
"field":"Costs.price"
}
}
}
}
}
And it gives me this:
"min" : 100,
"max" : 400
But I need the stats computed after taking only the minimum element of each array.
So this is what I need:
"min" : 100,
"max" : 300
Sort has a "mode" option; is there something similar for the stats aggregation, or any other way of achieving this, perhaps with a script? Please suggest. I am really stuck here.
Let me know if anything else is required.
Update 1:
Query for finding min/max among minimums
{
"_source":false,
"timeout":"5s",
"from":0,
"size":0,
"aggs":{
"price_1":{
"terms":{
"field":"id"
},
"aggs":{
"price_2":{
"nested":{
"path":"Costs"
},
"aggs":{
"filtered":{
"aggs":{
"price_3":{
"min":{
"field":"Costs.price"
}
}
},
"filter":{
"bool":{
"filter":{
"range":{
"Costs.price":{
"gte":100
}
}
}
}
}
}
}
}
}
},
"minValue":{
"min_bucket":{
"buckets_path":"price_1>price_2>filtered>price_3"
}
}
}
}
Only a few buckets come back, so the min/max is computed over those buckets alone, which is not correct. Is there a size limit?
One way to achieve your use case is to add one more field, id, to each document. A terms aggregation on the id field then builds the buckets dynamically, one per unique value.
Inside each bucket we can then apply a min aggregation, which returns the minimum of the numeric values extracted from the aggregated documents.
Here is a working example with index mapping, index data, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"Costs": {
"type": "nested"
}
}
}
}
Index Data:
{
"id":1,
"Costs": [
{
"price": 100
},
{
"price": 200
}
]
}
{
"id":2,
"Costs": [
{
"price": 300
},
{
"price": 400
}
]
}
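Search Query (note the size on the terms aggregation: it returns only the top 10 buckets by default, so raise size until it covers every id):
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15
},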
"aggs": {
"nested_entries": {
"nested": {
"path": "Costs"
},
"aggs": {
"min_position": {
"min": {
"field": "Costs.price"
}
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": 2,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 300.0
}
}
}
]
}
The same can be achieved with a stats aggregation (again assuming an id field that uniquely identifies each document, and a terms size large enough to cover every id):
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15
},
"aggs": {
"costs_stats": {
"nested": {
"path": "Costs"
},
"aggs": {
"price_stats_new": {
"stats": {
"field": "Costs.price"
}
}
}
}
}
}
}
}
Update 1:
To find the maximum value among those minimums (as computed by the query above), you can use a max bucket aggregation. As before, the terms size must be large enough to cover every id:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15
},
"aggs": {
"nested_entries": {
"nested": {
"path": "Costs"
},
"aggs": {
"min_position": {
"min": {
"field": "Costs.price"
}
}
}
}
}
},
"maxValue": {
"max_bucket": {
"buckets_path": "id_terms>nested_entries>min_position"
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": 2,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 300.0
}
}
}
]
},
"maxValue": {
"value": 300.0,
"keys": [
"2"
]
}
}
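The logic of this answer, and the reason the size setting matters, can be sketched locally in plain Python. This is only an illustration under stated assumptions: the function name stats_of_minimums and the twelve generated documents are made up, and cutting the id list to the first size entries is a simplified stand-in for how a terms aggregation truncates its bucket list.

```python
# Sample documents from the answer above.
docs = [
    {"id": 1, "Costs": [{"price": 100}, {"price": 200}]},
    {"id": 2, "Costs": [{"price": 300}, {"price": 400}]},
]


def stats_of_minimums(documents, size=None):
    """Return min/max over the per-document minimum price.

    `size` mimics the bucket limit of the terms aggregation: when set,
    only the first `size` ids (ascending) contribute to the result.
    """
    # One "bucket" per id, holding the cheapest nested price of that document.
    minimums = {d["id"]: min(c["price"] for c in d["Costs"]) for d in documents}
    ids = sorted(minimums)
    if size is not None:
        ids = ids[:size]
    values = [minimums[i] for i in ids]
    return {"min": min(values), "max": max(values)}


result = stats_of_minimums(docs)
print(result)

# Twelve hypothetical documents whose cheapest price grows with the id.
many = [
    {"id": i, "Costs": [{"price": i * 100}, {"price": i * 100 + 50}]}
    for i in range(1, 13)
]
truncated = stats_of_minimums(many, size=10)  # only ten buckets considered
complete = stats_of_minimums(many, size=12)   # every id has a bucket
print(truncated["max"], complete["max"])
```

With only ten of the twelve buckets considered, the reported maximum is too low, which matches the symptom described in the question's update.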

ElasticSearch Max Agg on lowest value inside a list property of the document

I'm looking to do a max aggregation on a value inside a property of my document. The property is a list of complex objects (key and value). Here's my data:
[{
"id" : "1",
"listItems" :
[
{
"key" : "li1",
"value" : 100
},
{
"key" : "li2",
"value" : 5000
}
]
},
{
"id" : "2",
"listItems" :
[
{
"key" : "li3",
"value" : 200
},
{
"key" : "li2",
"value" : 2000
}
]
}]
When I run a nested max aggregation on "listItems.value", I want the max value returned to be 200 (and not 5000). The reason is that I want the logic to first find the MIN value under listItems for each document, and then run the max aggregation over those minimums. Is it possible to do something like this?
Thanks.
The search query performs the following aggregations:
A terms aggregation on the id field
A min aggregation on listItems.value
A max bucket aggregation, which is a sibling pipeline aggregation that identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of those bucket(s).
See the nested aggregation documentation for a detailed explanation of that part.
Here is a working example with index mapping, index data, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"listItems": {
"type": "nested"
},
"id":{
"type":"text",
"fielddata":"true"
}
}
}
}
Index Data:
{
"id" : "1",
"listItems" :
[
{
"key" : "li1",
"value" : 100
},
{
"key" : "li2",
"value" : 5000
}
]
}
{
"id" : "2",
"listItems" :
[
{
"key" : "li3",
"value" : 200
},
{
"key" : "li2",
"value" : 2000
}
]
}
Search Query:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id"
},
"aggs": {
"nested_entries": {
"nested": {
"path": "listItems"
},
"aggs": {
"min_position": {
"min": {
"field": "listItems.value"
}
}
}
}
}
},
"maxValue": {
"max_bucket": {
"buckets_path": "id_terms>nested_entries>min_position"
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": "2",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 200.0
}
}
}
]
},
"maxValue": {
"value": 200.0,
"keys": [
"2"
]
}
}
The initial post mentioned the nested aggregation, so I assumed the question was about nested documents. Since I arrived at this solution before seeing the other answer, I'm keeping the whole thing for history, but it really differs only in adding the nested aggregation.
The whole process can be explained like this:
Bucket each document into its own bucket.
Use a nested aggregation to be able to aggregate on the nested documents.
Use a min aggregation to find the minimum value among all nested documents of a document, and thereby for the document itself.
Finally, use another aggregation to calculate the maximum value among the results of the previous aggregation.
Given this setup:
// PUT /index
{
"mappings": {
"properties": {
"children": {
"type": "nested",
"properties": {
"value": {
"type": "integer"
}
}
}
}
}
}
// POST /index/_doc
{
"children": [
{ "value": 12 },
{ "value": 45 }
]
}
// POST /index/_doc
{
"children": [
{ "value": 7 },
{ "value": 35 }
]
}
I can use these aggregations in a request to get the required value:
{
"size": 0,
"aggs": {
"document": {
"terms": {"field": "_id"},
"aggs": {
"children": {
"nested": {
"path": "children"
},
"aggs": {
"minimum": {
"min": {
"field": "children.value"
}
}
}
}
}
},
"result": {
"max_bucket": {
"buckets_path": "document>children>minimum"
}
}
}
}
{
"aggregations": {
"document": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "O4QxyHQBK5VO9CW5xJGl",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 7.0
}
}
},
{
"key": "OoQxyHQBK5VO9CW5kpEc",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 12.0
}
}
}
]
},
"result": {
"value": 12.0,
"keys": [
"OoQxyHQBK5VO9CW5kpEc"
]
}
}
}
There should also be a workaround that uses a script to calculate the max: all the script would need to do is find and return the smallest value in each document.
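The four steps described earlier in this answer map directly onto the request body, so they can be wrapped in a small helper that builds the same structure for other field names. This is a sketch only: max_of_min_request and its parameters are hypothetical names, it produces a plain dictionary, and sending it to a cluster is left to whatever client you use.

```python
import json


def max_of_min_request(id_field, nested_path, value_field, size=10):
    """Build a request body for "max over the per-document minimum".

    Hypothetical helper; the aggregation names mirror the request above.
    """
    return {
        "size": 0,
        "aggs": {
            # Step 1: one bucket per document.
            "document": {
                "terms": {"field": id_field, "size": size},
                "aggs": {
                    # Step 2: step into the nested documents.
                    "children": {
                        "nested": {"path": nested_path},
                        "aggs": {
                            # Step 3: minimum value within one document.
                            "minimum": {
                                "min": {"field": f"{nested_path}.{value_field}"}
                            }
                        },
                    }
                },
            },
            # Step 4: maximum across the per-document minimums.
            "result": {
                "max_bucket": {"buckets_path": "document>children>minimum"}
            },
        },
    }


body = max_of_min_request("_id", "children", "value", size=100)
print(json.dumps(body, indent=2))
```

The size argument is exposed because the terms aggregation only returns a limited number of buckets, and the pipeline aggregation can only see the buckets that were returned.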

Elasticsearch: aggregation and select docs only having max value of field

I am using Elasticsearch 6.5.
Basically, my query can return multiple documents from my index, and I need only those documents that have the max value for a particular field.
E.g.
{
"query": {
"bool": {
"must": [
{
"match": { "header.date" : "2019-07-02" }
},
{
"match": { "header.field" : "ABC" }
},
{
"bool": {
"should": [
{
"regexp": { "body.meta.field": "myregex1" }
},
{
"regexp": { "body.meta.field": "myregex2" }
}
]
}
}
]
}
},
"size" : 10000
}
The above query will return lots of documents/messages as per the query. The sample data returned is:
"header" : {
"id" : "Text_20190702101200123_111",
"date" : "2019-07-02",
"field": "ABC"
},
"body" : {
"meta" : {
"field" : "myregex1",
"timestamp": "2019-07-02T10:12:00.123Z"
}
}
-----------------
"header" : {
"id" : "Text_20190702151200123_121",
"date" : "2019-07-02",
"field": "ABC"
},
"body" : {
"meta" : {
"field" : "myregex2",
"timestamp": "2019-07-02T15:12:00.123Z"
}
}
-----------------
"header" : {
"id" : "Text_20190702081200133_124",
"date" : "2019-07-02",
"field": "ABC"
},
"body" : {
"meta" : {
"field" : "myregex1",
"timestamp": "2019-07-02T08:12:00.133Z"
}
}
So based on the above 3 documents, I only want the one with the max timestamp to be shown, i.e. "timestamp": "2019-07-02T15:12:00.123Z".
I only want one document in the above example.
I tried doing it as below:
{
"query": {
"bool": {
"must": [
{
"match": { "header.date" : "2019-07-02" }
},
{
"match": { "header.field" : "ABC" }
},
{
"bool": {
"should": [
{
"regexp": { "body.meta.field": "myregex1" }
},
{
"regexp": { "body.meta.field": "myregex2" }
}
]
}
}
]
}
},
"aggs": {
"group": {
"terms": {
"field": "header.id",
"order": { "group_docs" : "desc" }
},
"aggs" : {
"group_docs": { "max" : { "field": "body.meta.timestamp" } }
}
}
},
"size": "10000"
}
Executing the above, I still get all 3 documents instead of only one.
I do get the buckets, but I need only one of them and not all the buckets.
The output, in addition to all the records:
"aggregations": {
"group": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Text_20190702151200123_121",
"doc_count": 29,
"group_docs": {
"value": 1564551683867,
"value_as_string": "2019-07-02T15:12:00.123Z"
}
},
{
"key": "Text_20190702101200123_111",
"doc_count": 29,
"group_docs": {
"value": 1564551633912,
"value_as_string": "2019-07-02T10:12:00.123Z"
}
},
{
"key": "Text_20190702081200133_124",
"doc_count": 29,
"group_docs": {
"value": 1564510566971,
"value_as_string": "2019-07-02T08:12:00.133Z"
}
}
]
}
}
What am I missing here?
Please note that I can have more than one message for the same timestamp, and I want them all, i.e. all the messages/documents belonging to the max timestamp.
In the above example there are 29 messages for the same timestamp (it can be any number). So 29 * 3 messages are being retrieved by my query after using the above aggregation.
Basically I am able to group correctly; I am looking for something like HAVING in SQL.
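One possible approach, sketched here under stated assumptions rather than taken from an answer in this thread, is to split the work into two requests: first ask for the max of body.meta.timestamp over the matching documents, then fetch every document whose timestamp equals that value. The two builder functions and the aggregation name latest are hypothetical; they only produce request bodies as dictionaries. The latest_documents function shows the same selection on local sample data (the id of the fourth document is invented).

```python
def latest_timestamp_request(query):
    """Step 1: body that returns only the max timestamp for `query`."""
    return {
        "size": 0,
        "query": query,
        "aggs": {"latest": {"max": {"field": "body.meta.timestamp"}}},
    }


def documents_at_request(query, timestamp, size=10000):
    """Step 2: body that returns every match stamped exactly `timestamp`."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [query],
                "filter": [
                    {"range": {"body.meta.timestamp": {"gte": timestamp, "lte": timestamp}}}
                ],
            }
        },
    }


def latest_documents(documents):
    """Local equivalent: keep all documents sharing the newest timestamp.

    ISO-8601 strings in one fixed format and zone compare correctly as text.
    """
    if not documents:
        return []
    newest = max(d["body"]["meta"]["timestamp"] for d in documents)
    return [d for d in documents if d["body"]["meta"]["timestamp"] == newest]


docs = [
    {"header": {"id": "Text_20190702101200123_111"}, "body": {"meta": {"timestamp": "2019-07-02T10:12:00.123Z"}}},
    {"header": {"id": "Text_20190702151200123_121"}, "body": {"meta": {"timestamp": "2019-07-02T15:12:00.123Z"}}},
    {"header": {"id": "Text_20190702081200133_124"}, "body": {"meta": {"timestamp": "2019-07-02T08:12:00.133Z"}}},
    # Invented second message carrying the same (latest) timestamp.
    {"header": {"id": "Text_20190702151200123_122"}, "body": {"meta": {"timestamp": "2019-07-02T15:12:00.123Z"}}},
]

latest_ids = [d["header"]["id"] for d in latest_documents(docs)]
print(latest_ids)

step_two = documents_at_request({"match": {"header.field": "ABC"}}, "2019-07-02T15:12:00.123Z")
```

In a real run, the timestamp passed to the second request would come from the value_as_string of the first response, so that it matches the date format of the field.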

Elasticsearch query array field

I use Elasticsearch in my project and have stored some values in it. I want to query an array field and find out how many times each array of values occurs. For example, in the data below, the combination of images and price occurs two times.
{
"missing_fields_arr": ["images", "price"]
},
{
"missing_fields_arr": ["price"]
},
{
"missing_fields_arr": ["images"]
},
{
"missing_fields_arr": ["images", "price"]
}
and the expected output is
"aggregations": {
"missing_fields": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "images, price",
"doc_count": 2
},
{
"key": "price",
"doc_count": 1
},
{
"key": "images",
"doc_count": 1
}
]
}
}
Here is my query:
{
"query":{
"bool":{
"must":[
{
"range": {
"@timestamp":{
"gte": "2017-07-20T00:00:00.000Z",
"lte": "2017-07-28T23:59:59.999Z"
}
}
},
{
"term": {
"tracker_name": true
}
}
]
}
},
"from": 0,
"size": 0,
"aggregations" : {
"missing_fields": {"terms": {"field": "missing_fields_arr.raw", "size": 0} }
}
}
You can use the count API, which is more efficient than a search when only a number is needed,
combined with a little bit of regex.
For example:
curl -XGET 'localhost:9200/product/item/_count?pretty' -H 'Content-Type: application/json' -d '
{ "query" : { "regexp" : { "missing_fields_arr.raw" : "images|price" } } }'
GET /product/item/_count
{
"query" : {
"regexp" : { "missing_fields_arr.raw" : "images|price" }
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-count.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-valuecount-aggregation.html
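It may help to see why a plain terms aggregation on the array field does not produce the expected output: it buckets each value separately, whereas the question asks for one bucket per whole combination. The sketch below shows both counts locally in plain Python on the four sample documents. Building a combined key by sorting and joining the values is an assumption of this sketch; something similar could be done at index time by storing the joined string in an extra keyword field, which is not part of the original question.

```python
from collections import Counter

# The four sample documents from the question.
docs = [
    {"missing_fields_arr": ["images", "price"]},
    {"missing_fields_arr": ["price"]},
    {"missing_fields_arr": ["images"]},
    {"missing_fields_arr": ["images", "price"]},
]

# What a terms aggregation on the array yields: one bucket per single value.
per_value = Counter(
    value for doc in docs for value in set(doc["missing_fields_arr"])
)

# What the question asks for: one bucket per whole combination.
# Sorting makes ["price", "images"] and ["images", "price"] the same key.
per_combination = Counter(
    ", ".join(sorted(doc["missing_fields_arr"])) for doc in docs
)

print(dict(per_value))
print(dict(per_combination))
```

The second counter reproduces the buckets listed in the expected output of the question.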

ElasticSearch: min_doc_count on lower/lowest level nested aggregation

I have this query with some nested aggregations
{
"aggs": {
"by_date": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
},
"aggs": {
"new_users": {
"filter": {
"query": {
"match": {
"action": "USER_ADD"
}
}
},
"aggs": {
"unique_users": {
"cardinality": {
"field": "user"
}
}
}
}
}
}
},
"size": 0
}
It yields results that look like this
"aggregations": {
"by_date": {
"buckets": [
{
"key_as_string": "1970-01-07T00:00:00.000Z",
"key": 518400000,
"doc_count": 210,
"new_users": {
"doc_count": 0,
"unique_users": {
"value": 0
}
}
},
{
"key_as_string": "1970-01-09T00:00:00.000Z",
"key": 691200000,
"doc_count": 6,
"new_users": {
"doc_count": 0,
"unique_users": {
"value": 0
}
}
},
......
What I want is to apply min_doc_count to the innermost sub-aggregation so that zero values for "unique_users" (in this case) are not returned.
The issue is that min_doc_count can't be applied anywhere in my query other than the date_histogram at the top level.
Does the ES query language support something like this? Any known workarounds?
Thanks,
George
As per the Elasticsearch documentation, min_doc_count is not specific to date_histogram; other bucket aggregations such as terms and histogram accept it too.
For example:
{
"aggs" : {
"tags" : {
"terms" : {
"field" : "tag"
}
}
}
}
The query above is not a date_histogram, yet min_doc_count can still be applied to it:
{
"aggs" : {
"tags" : {
"terms" : {
"field" : "tag",
"min_doc_count" : 1
}
}
}
}
So min_doc_count can be set on other bucket aggregations as well, not only on date_histogram.
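For the query in the question, the zero values sit inside a filter aggregation and a cardinality metric, so a different tool may be needed. One option worth checking against the documentation for your version is the bucket_selector pipeline aggregation, which drops parent buckets based on a script over a metric. The sketch below is an assumption-laden illustration: with_nonzero_selector and the aggregation name nonzero_users are hypothetical, the script uses the params-style syntax of newer releases, and drop_zero_buckets only imitates the effect on a response locally.

```python
import copy


def with_nonzero_selector(request, parent="by_date", path="new_users>unique_users"):
    """Return a copy of `request` with a bucket_selector added under `parent`.

    The selector keeps only the parent buckets whose metric at `path` is > 0.
    """
    body = copy.deepcopy(request)
    body["aggs"][parent]["aggs"]["nonzero_users"] = {
        "bucket_selector": {
            "buckets_path": {"users": path},
            "script": "params.users > 0",
        }
    }
    return body


def drop_zero_buckets(buckets):
    """Local imitation of the selector on an already returned bucket list."""
    return [b for b in buckets if b["new_users"]["unique_users"]["value"] > 0]


# Skeleton of the request from the question (inner definitions shortened).
request = {
    "size": 0,
    "aggs": {
        "by_date": {
            "date_histogram": {"field": "timestamp", "interval": "day"},
            "aggs": {"new_users": {"aggs": {"unique_users": {"cardinality": {"field": "user"}}}}},
        }
    },
}
updated = with_nonzero_selector(request)

# Invented response buckets: one empty day and one day with three new users.
buckets = [
    {"key": 518400000, "doc_count": 210, "new_users": {"doc_count": 0, "unique_users": {"value": 0}}},
    {"key": 604800000, "doc_count": 12, "new_users": {"doc_count": 3, "unique_users": {"value": 3}}},
]
kept = drop_zero_buckets(buckets)
print([b["key"] for b in kept])
```

The helper copies the request instead of mutating it, so the original body can still be sent unchanged for comparison.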

Resources