How to order by many values in Elasticsearch terms aggregations

How do you order ES terms aggregations by multiple values?
At the moment I do:
aggs : {
aggName : {
terms : {
field : "foo",
order : { "subAgg.avg" : "desc" }
},
aggs : {
subAgg : {
stats : {
field : "bar"
}
}
}
}
}
The API documentation says you can do:
order : [ { "subAgg.avg" : "desc" }, { "subAgg.count" : "desc" } ]
But this does not work; ES throws an error:
Unknown key for a START_ARRAY in [aggName]: [order].
I found something like this in other posts:
order : { "subAgg.avg" : "desc", "subAgg.count" : "desc" }
This gives no error, but the results are not sorted correctly.
My question is: how do I correctly sort by multiple values?
I have ES 1.4.4 installed.
Thanks.
EDITED:
Mapping
{
"mappings" : {
"mymapping" : {
"properties" : {
"foo" : {
"type" : "short"
}
}
}
}
}
Query:
{
query : {
match_all : {}
},
aggs : {
aggName : {
terms : {
field : "foo",
order : [ { "subAgg.avg" : "desc" }, { "subAgg.count" : "desc" } ]
},
aggs : {
subAgg : {
stats : {
field : "foo"
}
}
}
}
}
}

You can try this:
"order" : [ { "rock>playback_stats.avg" : "desc" }, { "_count" : "desc" } ]
From the terms aggregation documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html):
{
"aggs" : {
"countries" : {
"terms" : {
"field" : "artist.country",
"order" : [ { "rock>playback_stats.avg" : "desc" }, { "_count" : "desc" } ]
},
"aggs" : {
"rock" : {
"filter" : { "term" : { "genre" : "rock" }},
"aggs" : {
"playback_stats" : { "stats" : { "field" : "play_count" }}
}
}
}
}
}
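Applied to the aggregation names from the question, the same idea could be sketched in Python as below. This is only a sketch: the helper name terms_agg_with_order is made up for illustration, and it assumes the cluster runs a version whose terms aggregation accepts an array for order. The thread itself shows ES 1.4.4 rejecting the array form, so an upgrade may be needed; that is an assumption, not something confirmed here.

```python
import json


def terms_agg_with_order(field, stats_field, order):
    """Build a terms aggregation with a stats sub-aggregation.

    `order` is a list of single-key dicts; the first entry is the primary
    sort criterion and later entries are intended as tie-breakers.
    (Hypothetical helper, not part of any Elasticsearch client.)
    """
    return {
        "aggs": {
            "aggName": {
                "terms": {"field": field, "order": order},
                # The sub-aggregation sits inside aggName, next to "terms".
                "aggs": {"subAgg": {"stats": {"field": stats_field}}},
            }
        }
    }


body = terms_agg_with_order(
    "foo",
    "bar",
    [{"subAgg.avg": "desc"}, {"subAgg.count": "desc"}],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search request body. Keeping the criteria in a list rather than a single object makes the priority of each criterion explicit.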

Related

Elastic GeoHash Query - Aggregation Filter

I am trying to query an elastic index where the result of the query is a list of the geohashes with only one matching document.
I can get a simple list of all geo hashes and their document counts using the following:
{
"size" : 0,
"aggregations" : {
"boundingbox" : {
"filter" : {
"geo_bounding_box" : {
"location" : {
"top_left" : "34.5, -118.9",
"bottom_right" : "33.3, -116."
}
}
},
"aggregations":{
"grid" : {
"geohash_grid" : {
"field": "location",
"precision": 4
}
}
}
}
}
}
However, I can't work out the correct syntax to filter the buckets. My two closest attempts are below.
This fails with 503 org.elasticsearch.search.aggregations.bucket.filter.InternalFilter cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation:
"aggregations":{
"grid" : {
"geohash_grid" : {
"field": "location",
"precision": 4
}
},
"grid_bucket_filter" : {
"bucket_selector" : {
"buckets_path" :{
"docCount" : "grid" //Also tried `"docCount" : "doc_count"`
},
"script" : "params.docCount == 1"
}
}
}
This fails with 400 No aggregation found for path [doc_count]:
"aggregations":{
"grid" : {
"geohash_grid" : {
"field": "location",
"precision": 4
}
},
"grid_bucket_filter" : {
"bucket_selector" : {
"buckets_path" :{
"docCount" : "doc_count"
},
"script" : "params.docCount > 1"
}
}
}
How can I filter based on the doc_count in a geohash grid?
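You need to specify the bucket_selector pipeline as a sub-aggregation of the geohash_grid aggregation, not as its sibling. You also need to use _count instead of doc_count in buckets_path. The script below keeps buckets with more than one document; use params.docCount == 1 to keep only buckets with exactly one: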
{
"aggregations": {
"grid": {
"geohash_grid": {
"field": "location",
"precision": 4
},
"aggs": {
"grid_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"docCount": "_count"
},
"script": "params.docCount > 1"
}
}
}
}
}
}
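If the request is built in code, the nesting can be made hard to get wrong with a small helper. The following Python sketch is only illustrative: the function name geohash_grid_with_count_filter is hypothetical, and the script string is passed through unchanged, so choosing between "== 1" and "> 1" is left to the caller.

```python
import json


def geohash_grid_with_count_filter(field, precision, script):
    """Build a geohash_grid aggregation whose buckets are filtered by count.

    The bucket_selector is nested under the grid's own "aggs", and
    buckets_path refers to the special "_count" path.
    (Hypothetical helper for illustration.)
    """
    return {
        "size": 0,
        "aggregations": {
            "grid": {
                "geohash_grid": {"field": field, "precision": precision},
                "aggs": {
                    "grid_bucket_filter": {
                        "bucket_selector": {
                            "buckets_path": {"docCount": "_count"},
                            "script": script,
                        }
                    }
                },
            }
        },
    }


# Keep only geohash cells that contain exactly one document.
body = geohash_grid_with_count_filter("location", 4, "params.docCount == 1")
print(json.dumps(body, indent=2))
```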

Query multiple fields by date and IP in Elasticsearch

The data in Elasticsearch is loaded from JSON documents like the one below.
I want to get the max value of cpu0 and in_eth1 for every IP, sorted by date. Can someone help me with the following query?
{
"ip":"10.235.13.172",
"date":"2015-11-09",
"time":"18:30:00",
"cpu0":7,
"cpu13":2,
"cpu14":1,
"diskio(%)":0,
"memuse(MB)":824,
"in_eth1(Mbps)":34
}
"aggs": {
"events_by_date": {
"date_histogram": {
"field": "date",
"interval": "day"
},
"aggs" : {
"genders" : {
"terms" : {
"field" : "ip",
"size": 100000,
"order" : { "_count" : "asc" }
},
"aggs" : {
"maxcpu" : { "max" : { "field" : "cpu(%)" } },
"maxin" : { "max" : { "field" : "in_eth1(Mbps)" } }
}
}
}
}
}

elasticsearch returns null on stats aggregation

I have a small data set of 1200 entries in Elasticsearch, indexed automatically into the mapped fields of their document types: floats go into float fields and doubles into double fields.
When I run a stats aggregation on the data like this:
GET /statsd-2015.09.28/timer_data/_search
{
"query" : {
"filtered" : {
"query" : { "match_all" : {}},
"filter" : {
"range" : { "ns" : { "lte" : "gunicorn" }}
}
}
},
"aggs" : {
"value_val" : { "stats" : { "field" : "u'count_90'" } }
}
}
I get null values in the response, like this:
...
"aggregations": {
"value_val": {
"count": 0,
"min": null,
"max": null,
"avg": null,
"sum": null
}
}
...
Here is my mapping of fields:
{"statsd-2015.09.28":{"mappings":{"timer":{"properties":{"#timestamp":{"type":"string"},"act":{"type":"string"},"grp":{"type":"string"},"ns":{"type":"string"},"tgt":{"type":"string"},"val":{"type":"float"}}},"gauge":{"properties":{"#timestamp":{"type":"string"},"act":{"type":"string"},"grp":{"type":"string"},"ns":{"type":"string"},"tgt":{"type":"string"},"val":{"type":"float"}}},"counter":{"properties":{"#timestamp":{"type":"string"},"act":{"type":"string"},"grp":{"type":"string"},"ns":{"type":"string"},"tgt":{"type":"string"},"val":{"type":"float"}}},"timer_data":{"properties":{"#timestamp":{"type":"double"},"act":{"type":"string"},"count":{"type":"float"},"count_90":{"type":"float"},"count_ps":{"type":"float"},"grp":{"type":"string"},"lower":{"type":"float"},"mean":{"type":"float"},"mean_90":{"type":"float"},"median":{"type":"float"},"ns":{"type":"string"},"std":{"type":"float"},"sum":{"type":"float"},"sum_90":{"type":"float"},"sum_squares":{"type":"float"},"sum_squares_90":{"type":"float"},"tgt":{"type":"string"},"upper":{"type":"float"},"upper_90":{"type":"float"}}}}}}
Why is the output not what I expect, and how can I get the correct statistics?
GET /statsd-2015.09.28/timer_data/_search
{
"query" : {
"filtered" : {
"query" : { "match_all" : {}},
"filter" : {
"range" : { "ns" : { "lte" : "gunicorn" }}
}
}
},
"aggs" : {
"value_val" : { "stats" : { "field" : "count_90" } }
}
}
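I am new to this, but I realized that the field name I was using was wrong: my original query asked for u'count_90', while the mapped field is count_90. With the corrected field name in the query above, everything became clear.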
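A mistake like this can be caught before running the aggregation by comparing the field names against the mapping. The Python sketch below is a minimal illustration: unmapped_fields is a hypothetical helper, the mapping dict is a cut-down copy of the one posted in the question, and only top-level properties are checked (nested or dotted field names are not handled).

```python
def unmapped_fields(mapping, index, doc_type, fields):
    """Return the field names that are absent from the type's mapping.

    `mapping` has the shape returned by the get-mapping API in the
    question: {index: {"mappings": {doc_type: {"properties": {...}}}}}.
    """
    props = mapping[index]["mappings"][doc_type]["properties"]
    return [name for name in fields if name not in props]


# Cut-down version of the mapping from the question.
mapping = {
    "statsd-2015.09.28": {
        "mappings": {
            "timer_data": {
                "properties": {
                    "count": {"type": "float"},
                    "count_90": {"type": "float"},
                }
            }
        }
    }
}

missing = unmapped_fields(
    mapping, "statsd-2015.09.28", "timer_data", ["u'count_90'", "count_90"]
)
print(missing)
```

Any name reported here has no entry in the mapping, which matches the symptom in the question: a stats aggregation on that name sees no values and returns a count of 0 with null statistics.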

Replacing OR/AND/NOT filters with bool filter creates a hard-to-understand query with too many levels?

I have the following filter in a filtered query. As seen, it has many OR/AND/NOT filters at different levels. I was advised to replace them with bool filters for performance reasons, and I am going to do that.
"filter" : {
"or" : [
{
"and" : [
{ "range" : { "start" : { "lte": 201407292300 } } },
{ "range" : { "end" : { "gte": 201407292300 } } },
{ "term" : { "condtion1" : false } },
{
"or" : [
{
"and" : [
{ "term" : { "condtion2" : false } },
{
"or": [
{
"and" : [
{ "missing" : { "field" : "condtion6" } },
{ "missing" : { "field" : "condtion7" } }
]
},
{ "term" : { "condtion6" : "nop" } },
{ "term" : { "condtion7" : "rst" } }
]
}
]
},
{
"and" : [
{ "term" : { "condtion2" : true } },
{
"or": [
{
"and" : [
{ "missing" : { "field" : "condtion3" } },
{ "missing" : { "field" : "condtion4" } },
{ "missing" : { "field" : "condtion5" } },
{ "missing" : { "field" : "condtion6" } },
{ "missing" : { "field" : "condtion7" } }
]
},
{ "term" : { "condtion3" : "abc" } },
{ "term" : { "condtion4" : "def" } },
{ "term" : { "condtion5" : "ghj" } },
{ "term" : { "condtion6" : "nop" } },
{ "term" : { "condtion7" : "rst" } }
]
}
]
}
]
}
]
},
{
"and" : [
{
"term": { "condtion8" : "TIME_POINT_1" }
},
{ "range" : { "start" : { "lte": 201407302300 } } },
{
"or": [
{ "term" : { "condtion9" : "GROUP_B" } },
{
"and" : [
{ "term" : { "condtion9" : "GROUP_A" } },
{ "ids" : { "values": [100, 10] } }
]
}
]
}
]
},
{
"and" : [
{
"term": { "condtion8" : "TIME_POINT_2" }
},
{ "ids" : { "values": [100, 10] } }
]
},
{
"and" : [
{
"term": { "condtion8" : "TIME_POINT_3" }
},
{
"or": [
{ "term" : { "condtion1" : true } },
{ "range" : { "end" : { "lt": 201407302300 } } }
]
},
{
"or": [
{ "term" : { "condtion9" : "GROUP_B" } },
{
"and" : [
{ "term" : { "condtion9" : "GROUP_A" } },
{ "ids" : { "values": [100, 10] } }
]
}
]
}
]
}
]
}
However, I feel that replacing these OR/AND/NOT filters would create a query that has too many levels and is hard to understand. For example, to replace
"or": [
....
]
I have to write:
"bool" : {
"should": [
]
}
Am I right that replacing OR/AND/NOT with bool filters in my case comes at the expense of understandability?
A related question:
If I have to replace OR/AND/NOT filters for performance, should I replace all of them, or just some of them, such as the one at the top?
Thanks and regards.
You should replace all of them except geo, script, and range filters. That said, understanding the likely impact of each filter also helps. For example, if one filter eliminates around 90% of the documents, you may want to put it first in an and filter: and/or filters are executed sequentially, so the remaining filters then have fewer documents to process. With bool filters, by contrast, all the filters are combined in a single bitset operation. You may have already read about this.
I don't think you will sacrifice understandability by replacing OR/AND/NOT with bool filters. In your example, converting a single or filter to a bool filter with should adds one level of nesting, but across the whole query the structure stays almost the same.
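For a filter tree this deep, the rewrite can be done mechanically rather than by hand. The Python sketch below is one possible way, under some assumptions: to_bool is a hypothetical helper, it only handles the list form of and/or and the plain or {"filter": ...} form of not, and it converts every and/or/not it meets, so any geo, script, or range combinations you want to keep as and/or would need to be excluded separately.

```python
def to_bool(f):
    """Recursively rewrite and/or/not filters as bool filters.

    and -> bool.must, or -> bool.should, not -> bool.must_not.
    Leaf filters (term, range, missing, ids, ...) are returned unchanged.
    """
    if not isinstance(f, dict) or len(f) != 1:
        return f
    ((name, body),) = f.items()
    if name == "and" and isinstance(body, list):
        return {"bool": {"must": [to_bool(child) for child in body]}}
    if name == "or" and isinstance(body, list):
        return {"bool": {"should": [to_bool(child) for child in body]}}
    if name == "not" and isinstance(body, dict):
        # Accept both {"not": {...}} and {"not": {"filter": {...}}}.
        inner = body.get("filter", body)
        return {"bool": {"must_not": [to_bool(inner)]}}
    return f


# A small piece of the filter from the question.
original = {
    "or": [
        {"term": {"condtion9": "GROUP_B"}},
        {
            "and": [
                {"term": {"condtion9": "GROUP_A"}},
                {"ids": {"values": [100, 10]}},
            ]
        },
    ]
}
converted = to_bool(original)
print(converted)
```

Each and/or level becomes a bool level with one extra key (must or should), which is the extra nesting discussed above.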

How to exclude a filter from a facet?

I have come from a Solr background and am trying to find the equivalent of "tagging" and "excluding" in Elasticsearch.
In the following example, how can I exclude the price filter from the calculation of the prices facet? In other words, the prices facet should take into account all of the filters except for price.
{
query : {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"and" : [
{
"term" : {
"colour" : "Red"
}
},
{
"term" : {
"feature" : "Square"
}
},
{
"term" : {
"feature" : "Shiny"
}
},
{
"range" : {
"price" : {
"from" : "10",
"to" : "20"
}
}
}
]
}
}
},
"facets" : {
"colours" : {
"terms" : {
"field" : "colour"
}
},
"features" : {
"terms" : {
"field" : "feature"
}
},
"prices" : {
"statistical" : {
"field" : "price"
}
}
}
}
You can apply the price filter as a top-level filter on your request and add it to all facets except prices as a facet_filter:
{
query : {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"and" : [
{
"term" : {
"colour" : "Red"
}
},
{
"term" : {
"feature" : "Square"
}
},
{
"term" : {
"feature" : "Shiny"
}
}
]
}
}
},
"facets" : {
"colours" : {
"terms" : {
"field" : "colour"
},
"facet_filter" : {
"range" : { "price" : { "from" : "10", "to" : "20" } }
}
},
"features" : {
"terms" : {
"field" : "feature"
},
"facet_filter" : {
"range" : { "price" : { "from" : "10", "to" : "20" } }
}
},
"prices" : {
"statistical" : {
"field" : "price"
}
}
},
"filter": {
"range" : { "price" : { "from" : "10", "to" : "20" } }
}
}
Note an important change in ES 1.0.0: the top-level filter was renamed to post_filter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_search_requests.html#_search_requests). Using filtered queries is still preferred, as described here: http://elasticsearch-users.115913.n3.nabble.com/Filters-vs-Queries-td3219558.html
There is also a global option for facets that avoids filtering by the query filter (elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets.html#_scope).
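The pattern in the answer generalizes to a rough equivalent of Solr's tag/exclude: each "tagged" filter is attached as a facet_filter to every facet except the one it is tagged with, and is also applied as the top-level filter so the hits are still restricted. The Python sketch below illustrates that idea only; build_faceted_request, combine, and the top_level_key parameter are hypothetical names, and the function assumes at least one base filter.

```python
def combine(filters):
    """Return a single filter, wrapping several in an `and` filter."""
    return filters[0] if len(filters) == 1 else {"and": filters}


def build_faceted_request(base_filters, tagged_filters, facets, top_level_key="filter"):
    """Build a facet request where each tagged filter skips its own facet.

    base_filters: filters applied to everything (inside the filtered query).
    tagged_filters: {facet_name: filter} pairs excluded from that facet only.
    top_level_key: "filter" before ES 1.0.0, "post_filter" afterwards.
    """
    body = {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": combine(list(base_filters)),
            }
        },
        "facets": {},
    }
    for name, facet in facets.items():
        others = [f for tag, f in tagged_filters.items() if tag != name]
        entry = dict(facet)
        if others:
            entry["facet_filter"] = combine(others)
        body["facets"][name] = entry
    if tagged_filters:
        body[top_level_key] = combine(list(tagged_filters.values()))
    return body


price = {"range": {"price": {"from": "10", "to": "20"}}}
body = build_faceted_request(
    base_filters=[
        {"term": {"colour": "Red"}},
        {"term": {"feature": "Square"}},
        {"term": {"feature": "Shiny"}},
    ],
    tagged_filters={"prices": price},
    facets={
        "colours": {"terms": {"field": "colour"}},
        "features": {"terms": {"field": "feature"}},
        "prices": {"statistical": {"field": "price"}},
    },
)
print(sorted(body["facets"]))
```

With these inputs the result has the same structure as the request in the answer: colours and features carry the price range as facet_filter, while prices does not.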
