Elasticsearch null values aggregation: sum being 0 and avg being null? - elasticsearch

I am using Elasticsearch for analysis and found that, when aggregating, if every element in a bucket has a null value, the sum result is 0 but the avg result is null.
{
"size" : 0,
"query" : {
"bool" : {
"must" : {
"bool" : {
"must" : {
"bool" : {
"should" : [ {
"term" : {
"2219" : "AAA"
}
}, {
"term" : {
"2219" : "BBB"
}
}, {
"term" : {
"2219" : "CCC"
}
}, {
"term" : {
"2219" : "DDD"
}
} ]
}
}
}
}
}
},
"explain" : false,
"aggregations" : {
"2224" : {
"terms" : {
"field" : "2224",
"missing" : "null",
"size" : 2000
},
"aggregations" : {
"2219" : {
"terms" : {
"field" : "2219",
"missing" : "null",
"size" : 2000
},
"aggregations" : {
"a" : {
"avg" : {
"field" : "2255"
}
},
"count" : {
"value_count" : {
"field" : "1982"
}
}
}
}
}
}
}
}
The result will be
...
{
"2219": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "DDD",
"doc_count": 1,
"a": {
"value": null
}
}
]
},
"key": "rock",
"doc_count": 1
}
...
The result for "a" is null.
But if I change the aggregation to sum, the result of "a" is 0.
This difference in behavior seems odd.

There's a similar issue on the Elasticsearch GitHub: https://github.com/elastic/elasticsearch/issues/9745
null is considered the correct value for the avg aggregation when Elasticsearch has found 0 values to average.
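The difference follows from how the two metrics are defined over an empty set of values: a sum over nothing is 0, while an average is the sum divided by the count and is undefined when the count is 0. A minimal Python sketch of that behavior (the function names are illustrative only and are not Elasticsearch internals):

```python
from typing import Optional, Sequence


def es_like_sum(values: Sequence[Optional[float]]) -> float:
    """Sum the non-null values; an input with no values yields 0.0."""
    return float(sum(v for v in values if v is not None))


def es_like_avg(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average the non-null values; return None when there are none."""
    present = [v for v in values if v is not None]
    if not present:
        # 0 / 0 is undefined, so the result is reported as null (None).
        return None
    return sum(present) / len(present)
```

With a bucket whose only document has a null value, the first function returns 0.0 and the second returns None, which matches what the question observes.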

Try adding this script to the aggregation to replace nulls with 0:
"avg" : {
"field" : "2255",
"script":{
"lang":"painless",
"source":"if (_value == null) {return 0} else {return _value}"
}
}
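Alternatively, if the goal is just to report 0 instead of null, the response can be normalised on the client side. A minimal sketch, assuming the response has already been parsed into Python dicts and that metric results have the {"value": ...} shape shown in the question (replace_null_metrics is a hypothetical helper name):

```python
from typing import Any


def replace_null_metrics(node: Any, default: float = 0.0) -> Any:
    """Return a copy of an aggregation response where every null
    "value" entry is replaced by `default`. The input is not modified."""
    if isinstance(node, dict):
        out = {k: replace_null_metrics(v, default) for k, v in node.items()}
        if "value" in out and out["value"] is None:
            out["value"] = default
        return out
    if isinstance(node, list):
        return [replace_null_metrics(item, default) for item in node]
    return node
```

Note that this hides the distinction between "no values found" and "the average really is 0", so it is only appropriate when that distinction does not matter.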

Related

Elastic how to use the aggregation buckets to update the documents

I'm new to Elastic/Painless and need some assistance.
Given this query:
GET index1/_search/
{
"size": 0,
"aggs": {
"attrs_root": {
"nested": {
"path": "business_index_jd_list_agg"
},
"aggs": {
"attrs": {
"terms": {
"field": "jdl_id"
},
"aggs": {
"sumOfQuantity" : {
"sum" : {
"field" : "value"
}
}
}
}
}
}
}
}
and these results from that query:
[...]
"aggregations" : {
"attrs_root" : {
"doc_count" : 5,
"attrs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : -666,
"doc_count" : 1,
"sumOfQuantity" : {
"value" : 55.0
}
},
{
"key" : 93,
"doc_count" : 1,
"sumOfQuantity" : {
"value" : 25.0
},
[...]
]
}
}
}
}
How can I use that query and iterate over those results with a Painless script to update each document in the index with the aggregated information? Something like this:
{
"jdl_id" : -666,
"value" : 55.0
},
{
"jdl_id" : 93,
"value" : 25.0
},
[...]
Thank you.
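One possible approach is to read the buckets on the client and issue the updates from there. A minimal Python sketch of the first half, assuming the search response has been parsed into a dict with the shape shown above (buckets_to_records is a hypothetical helper; the update step itself is not shown):

```python
from typing import Any, Dict, List


def buckets_to_records(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the nested terms buckets into {jdl_id, value} records."""
    buckets = response["aggregations"]["attrs_root"]["attrs"]["buckets"]
    return [
        {"jdl_id": bucket["key"], "value": bucket["sumOfQuantity"]["value"]}
        for bucket in buckets
    ]
```

Each record could then be turned into an update request for the matching documents.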

Display a text label on top of an Elasticsearch aggregation keyword

I have this Elasticsearch aggregation, but I would like to display the text of activity.label alongside each activity.kw key. I understand this is more than an aggregation, but how could I do it?
Thank you.
GET /my-index/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"group_by_state" : {
"terms" : {
"field" : "activity.kw",
"size" : 3000
}
}
}
}
Today I get something like:
"aggregations" : {
"group_by_state" : {
"doc_count_error_upper_bound" : "0",
"sum_other_doc_count" : "0",
"buckets" : [
{
"key" : "0009",
"doc_count" : "285396"
},
{
"key" : "9090",
"doc_count" : "1"
}
]
}
}
--------- edit 1
and I would like something like:
{
"key" : "0009",
"label" : "something",
"doc_count" : "285396"
},
{
"key" : "9090",
"label" : "something22",
"doc_count" : "1"
}
If you want to fetch documents within each terms bucket, you can use the top_hits aggregation.
Query
{
"aggs": {
"group_by_state" : {
"terms" : {
"field" : "activity.kw",
"size" : 3000
},
"aggs": {
"docs": {
"top_hits": {
"_source": {
"includes": [ "activity.label" ]
},
"size": 1
}
}
}
}
}
}
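To get from that response to the key/label/doc_count shape asked for, the label can be read from the single top hit of each bucket. A minimal Python sketch, assuming the response is parsed into a dict and that _source comes back nested as {"activity": {"label": ...}} (that nesting is an assumption about the mapping; label_buckets is a hypothetical helper):

```python
from typing import Any, Dict, List


def label_buckets(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Attach the label of each bucket's first top hit to the bucket."""
    labelled = []
    for bucket in response["aggregations"]["group_by_state"]["buckets"]:
        hits = bucket["docs"]["hits"]["hits"]
        label = None
        if hits:
            # Missing keys fall back to None instead of raising.
            label = hits[0].get("_source", {}).get("activity", {}).get("label")
        labelled.append(
            {"key": bucket["key"], "label": label, "doc_count": bucket["doc_count"]}
        )
    return labelled
```

This takes the label from one arbitrary document per bucket, so it only makes sense if every document with the same activity.kw carries the same activity.label.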

elasticsearch returns null on stats aggregation

I have a small dataset of 1200 entries in Elasticsearch that is automatically indexed into the mapped fields of the document types: floats go into float fields and doubles into double fields.
When running a 'stats' aggregation on the data like this:
GET /statsd-2015.09.28/timer_data/_search
{
"query" : {
"filtered" : {
"query" : { "match_all" : {}},
"filter" : {
"range" : { "ns" : { "lte" : "gunicorn" }}
}
}
},
"aggs" : {
"value_val" : { "stats" : { "field" : "u'count_90'" } }
}
}
I get null values in return, like this:
...
"aggregations": {
"value_val": {
"count": 0,
"min": null,
"max": null,
"avg": null,
"sum": null
}
}
...
Here is my mapping of fields:
{"statsd-2015.09.28":{"mappings":{"timer":{"properties":{"#timestamp":{"type":"string"},"act":{"type":"string"},"grp":{"type":"string"},"ns":{"type":"string"},"tgt":{"type":"string"},"val":{"type":"float"}}},"gauge":{"properties":{"#timestamp":{"type":"string"},"act":{"type":"string"},"grp":{"type":"string"},"ns":{"type":"string"},"tgt":{"type":"string"},"val":{"type":"float"}}},"counter":{"properties":{"#timestamp":{"type":"string"},"act":{"type":"string"},"grp":{"type":"string"},"ns":{"type":"string"},"tgt":{"type":"string"},"val":{"type":"float"}}},"timer_data":{"properties":{"#timestamp":{"type":"double"},"act":{"type":"string"},"count":{"type":"float"},"count_90":{"type":"float"},"count_ps":{"type":"float"},"grp":{"type":"string"},"lower":{"type":"float"},"mean":{"type":"float"},"mean_90":{"type":"float"},"median":{"type":"float"},"ns":{"type":"string"},"std":{"type":"float"},"sum":{"type":"float"},"sum_90":{"type":"float"},"sum_squares":{"type":"float"},"sum_squares_90":{"type":"float"},"tgt":{"type":"string"},"upper":{"type":"float"},"upper_90":{"type":"float"}}}}}}
My question is: why am I not getting the expected output, and how can I get it?
GET /statsd-2015.09.28/timer_data/_search
{
"query" : {
"filtered" : {
"query" : { "match_all" : {}},
"filter" : {
"range" : { "ns" : { "lte" : "gunicorn" }}
}
}
},
"aggs" : {
"value_val" : { "stats" : { "field" : "count_90" } }
}
}
I am new to this, but I realized that the field name I was using was wrong: "u'count_90'" instead of "count_90". After fixing that, everything became clear.
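A mistake like this can be caught before running the aggregation by comparing the field names against the mapping. A minimal Python sketch, assuming the mapping has been fetched and parsed into a dict with the structure shown in the question (unknown_fields is a hypothetical helper name):

```python
from typing import Any, Dict, List


def unknown_fields(
    mapping: Dict[str, Any], index: str, doc_type: str, fields: List[str]
) -> List[str]:
    """Return the requested field names that are absent from the mapping."""
    properties = mapping[index]["mappings"][doc_type]["properties"]
    return [name for name in fields if name not in properties]
```

For the mapping above, checking "u'count_90'" against the timer_data type would report it as unknown, while "count_90" would pass.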

elasticsearch mix "and filter" with "bool filter"

I work with Elasticsearch and am trying to combine two working queries, the first using an "and filter" and the second a "bool filter", but I can't get it to work.
My queries are generated dynamically from a user interface.
The "and filter":
I need the "and filter" to query data where, for example, a field has to be equal to "africa" or "asia" or be empty. This is an example of a working query:
curl -XGET 'http://localhost:9200/botanique/specimens/_search?pretty' -d '
{
"fields" : ["D_TYPESTATUS", "O_HASMEDIA"],
"aggs" : {
"D_TYPESTATUS_MISSING" : {
"missing" : {
"field" : "D_TYPESTATUS"
}
},
"D_TYPESTATUS" : {
"terms" : {
"field" : "D_TYPESTATUS",
"size" : 10
}
}
},
"query" : {
"filtered" : {
"filter" : {
"and" : [
{ "or" : [{
"term" : {
"O_HASMEDIA" : "true"
}
}
]
}, {
"or" : [{
"term" : {
"T_GENUS" : "flemingia"
}
}
]
}, {
"or" : [{
"term" : {
"L_CONTINENT" : "africa"
}
}, {
"term" : {
"L_CONTINENT" : "asia"
}
}, {
"missing" : {
"field" : "L_CONTINENT"
}
}
]
}, {
"or" : [{
"term" : {
"I_INSTITUTIONCODE" : "mnhn"
}
}
]
}
]
}
}
}
}'
This query works fine; this is the result:
"hits" : {
"total" : 1006,
"max_score" : 1.0,
"hits" : [ {
"_index" : "botanique",
"_type" : "specimens",
"_id" : "9459AB31EC354F1FAE270BDB6C22CDF7",
"_score" : 1.0,
"fields" : {
"O_HASMEDIA" : [ true ],
"D_TYPESTATUS" : "syntype"
}
},
....
},
"aggregations" : {
"D_TYPESTATUS" : {
"buckets" : [ {
"key" : "syntype",
"doc_count" : 6
}, {
"key" : "type",
"doc_count" : 5
}, {
"key" : "isotype",
"doc_count" : 2
} ]
},
"D_TYPESTATUS_MISSING" : {
"doc_count" : 993
}
}
}
The second query:
Now I need to restrict the results by the field "D_TYPESTATUS", which must be different from the value "type" and must not be null.
This query does that:
curl -XGET 'http://localhost:9200/botanique/specimens/_search?size=10&pretty' -d ' {
"fields" : ["D_TYPESTATUS", "O_HASMEDIA"],
"aggs" : {
"D_TYPESTATUS_MISSING" : {
"missing" : {"field" : "D_TYPESTATUS"}
},
"D_TYPESTATUS" : {
"terms" : {"field" : "D_TYPESTATUS","size" : 20}
}
},
"query" : {
"filtered" : {
"query" : {
"query_string" : { "query" : "liliaceae" }
},
"filter" : {
"bool" : {
"must_not" : [{
"term" : {
"D_TYPESTATUS" : "type"
}
}
],
"must":{
"exists" : {
"field" : "D_TYPESTATUS"
}
}
}
}
}
}
}'
And the result:
{[ {
"_index" : "botanique_tmp2",
"_type" : "specimens",
"_id" : "0C388B4A3186410CBA46826BA296ECBC",
"_score" : 0.9641713,
"fields" : {
"D_TYPESTATUS" : [ "isotype" ],
"O_HASMEDIA" : [ true ]
}
} , ... ]},
"aggregations" : {
"D_TYPESTATUS" : {
"buckets" : [ {
"key" : "isotype",
"doc_count" : 40
}, {
"key" : "syntype",
"doc_count" : 37
}, {
"key" : "holotype",
"doc_count" : 6
}, {
"key" : "paratype",
"doc_count" : 3
}, {
"key" : "isonéotype",
"doc_count" : 2
} ]
},
"D_TYPESTATUS_MISSING" : {
"doc_count" : 0
}
}
How can I integrate the "bool filter" into the "and filter"?
Thanks a lot.
Unless I'm missing something, this is straightforward: add the bool filter as one more element of the "and" array:
{
"query": {
"filtered": {
"filter": {
"and": [
{
"or": [
{
"term": {
"O_HASMEDIA": "true"
}
}
]
},
{
"or": [
{
"term": {
"T_GENUS": "flemingia"
}
}
]
},
{
"or": [
{
"term": {
"L_CONTINENT": "africa"
}
},
{
"term": {
"L_CONTINENT": "asia"
}
},
{
"missing": {
"field": "L_CONTINENT"
}
}
]
},
{
"or": [
{
"term": {
"I_INSTITUTIONCODE": "mnhn"
}
}
]
},
{
"bool": {
"must_not": [
{
"term": {
"D_TYPESTATUS": "type"
}
}
],
"must": {
"exists": {
"field": "D_TYPESTATUS"
}
}
}
}
]
}
}
}
}
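Since the question says the queries are generated dynamically, the same merge can be done programmatically by appending the bool filter to the "and" list. A minimal Python sketch, assuming the request body is held as a dict with the query.filtered.filter.and path used above (add_to_and_filter is a hypothetical helper name):

```python
import copy
from typing import Any, Dict


def add_to_and_filter(body: Dict[str, Any], extra: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `body` with `extra` appended to its "and" filter."""
    merged = copy.deepcopy(body)  # leave the caller's query untouched
    merged["query"]["filtered"]["filter"]["and"].append(extra)
    return merged


# The bool filter from the second query in the question.
TYPESTATUS_FILTER = {
    "bool": {
        "must_not": [{"term": {"D_TYPESTATUS": "type"}}],
        "must": {"exists": {"field": "D_TYPESTATUS"}},
    }
}
```

Calling add_to_and_filter(first_query, TYPESTATUS_FILTER) yields a body equivalent to the one written out above.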

How to exclude a filter from a facet?

I come from a Solr background and am trying to find the equivalent of "tagging" and "excluding" in Elasticsearch.
In the following example, how can I exclude the price filter from the calculation of the prices facet? In other words, the prices facet should take into account all of the filters except for price.
{
"query" : {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"and" : [
{
"term" : {
"colour" : "Red"
}
},
{
"term" : {
"feature" : "Square"
}
},
{
"term" : {
"feature" : "Shiny"
}
},
{
"range" : {
"price" : {
"from" : "10",
"to" : "20"
}
}
}
]
}
}
},
"facets" : {
"colours" : {
"terms" : {
"field" : "colour"
}
},
"features" : {
"terms" : {
"field" : "feature"
}
},
"prices" : {
"statistical" : {
"field" : "price"
}
}
}
}
You can apply the price filter as a top-level filter on your request and add it as a facet_filter to every facet except prices:
{
"query" : {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"and" : [
{
"term" : {
"colour" : "Red"
}
},
{
"term" : {
"feature" : "Square"
}
},
{
"term" : {
"feature" : "Shiny"
}
}
]
}
}
},
"facets" : {
"colours" : {
"terms" : {
"field" : "colour"
},
"facet_filter" : {
"range" : { "price" : { "from" : "10", "to" : "20" } }
}
},
"features" : {
"terms" : {
"field" : "feature"
},
"facet_filter" : {
"range" : { "price" : { "from" : "10", "to" : "20" } }
}
},
"prices" : {
"statistical" : {
"field" : "price"
}
}
},
"filter": {
"range" : { "price" : { "from" : "10", "to" : "20" } }
}
}
By the way, there has been an important change since ES 1.0.0: the top-level filter was renamed to post_filter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_search_requests.html#_search_requests). Using filtered queries is still preferred, as described here: http://elasticsearch-users.115913.n3.nabble.com/Filters-vs-Queries-td3219558.html
There is also a global option for facets to avoid filtering by the query filter (elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets.html#_scope).
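When the request is built in code, the repetition of the price filter across facets can be generated rather than written by hand. A minimal Python sketch of the pattern from this answer, assuming facets are held as a dict of name to facet body (apply_facet_filter is a hypothetical helper, not part of any Elasticsearch client):

```python
from typing import Any, Dict, Iterable


def apply_facet_filter(
    facets: Dict[str, Dict[str, Any]],
    facet_filter: Dict[str, Any],
    exclude: Iterable[str],
) -> Dict[str, Dict[str, Any]]:
    """Add `facet_filter` to every facet whose name is not in `exclude`."""
    excluded = set(exclude)
    result = {}
    for name, body in facets.items():
        facet = dict(body)  # shallow copy so the input facets stay unchanged
        if name not in excluded:
            facet["facet_filter"] = facet_filter
        result[name] = facet
    return result
```

Passing the price range filter with exclude=["prices"] reproduces the facets section above: colours and features are restricted by price, while prices is computed without it.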

Resources