Elastic GeoHash Query - Aggregation Filter - elasticsearch

I am trying to query an Elasticsearch index so that the result is a list of the geohashes that contain exactly one matching document.
I can get a simple list of all geohashes and their document counts using the following:
{
"size" : 0,
"aggregations" : {
"boundingbox" : {
"filter" : {
"geo_bounding_box" : {
"location" : {
"top_left" : "34.5, -118.9",
"bottom_right" : "33.3, -116."
}
}
},
"aggregations":{
"grid" : {
"geohash_grid" : {
"field": "location",
"precision": 4
}
}
}
}
}
}
However, I can't work out the correct syntax to filter the buckets. The closest attempts I have are below.
This fails with 503: org.elasticsearch.search.aggregations.bucket.filter.InternalFilter cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation
"aggregations":{
"grid" : {
"geohash_grid" : {
"field": "location",
"precision": 4
}
},
"grid_bucket_filter" : {
"bucket_selector" : {
"buckets_path" :{
"docCount" : "grid" //Also tried `"docCount" : "doc_count"`
},
"script" : "params.docCount == 1"
}
}
}
This fails with 400: No aggregation found for path [doc_count]
"aggregations":{
"grid" : {
"geohash_grid" : {
"field": "location",
"precision": 4
}
},
"grid_bucket_filter" : {
"bucket_selector" : {
"buckets_path" :{
"docCount" : "doc_count"
},
"script" : "params.docCount > 1"
}
}
}
How can I filter based on the doc_count in a geohash grid?

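The bucket_selector pipeline must be specified as a sub-aggregation of the geohash_grid aggregation, not as its sibling. You also need to use _count instead of doc_count in buckets_path, since _count is the special path that refers to a bucket's document count. Adjust the script condition to what you need (for example, params.docCount == 1 keeps only cells with exactly one document):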
{
"aggregations": {
"grid": {
"geohash_grid": {
"field": "location",
"precision": 4
},
"aggs": {
"grid_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"docCount": "_count"
},
"script": "params.docCount > 1"
}
}
}
}
}
}
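For reference, the bounding-box filter from the question and the bucket selector can be combined in a single request. The Python sketch below only builds the request body as a dictionary and prints it; the function name build_single_doc_geohash_query is made up for illustration, and the script uses == 1 on the assumption that the goal is the one stated in the question (cells with exactly one document).

```python
import json


def build_single_doc_geohash_query(top_left, bottom_right, precision=4):
    """Build a search body that keeps only geohash cells with one document.

    Hypothetical helper: it only assembles the JSON body, it does not
    talk to a cluster.
    """
    return {
        "size": 0,
        "aggregations": {
            "boundingbox": {
                # Restrict the aggregation to documents inside the box.
                "filter": {
                    "geo_bounding_box": {
                        "location": {
                            "top_left": top_left,
                            "bottom_right": bottom_right,
                        }
                    }
                },
                "aggregations": {
                    "grid": {
                        "geohash_grid": {
                            "field": "location",
                            "precision": precision,
                        },
                        # The selector is nested inside the grid, so it
                        # runs once per geohash bucket.
                        "aggregations": {
                            "grid_bucket_filter": {
                                "bucket_selector": {
                                    "buckets_path": {"docCount": "_count"},
                                    "script": "params.docCount == 1",
                                }
                            }
                        },
                    }
                },
            }
        },
    }


body = build_single_doc_geohash_query("34.5, -118.9", "33.3, -116.")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a _search request against the index.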

Related

aggregating properties in elastic search

I have an indexed entry that has optional properties. So, for example, I have entries like this
{
"id":1,
"field1":"XYZ"
},
{
"id":2,
"field2":"XYZ"
},
{
"id":3,
"field1":"XYZ"
}
I would like to make an aggregation that will tell me how many entries I have with field1 and field2 populated.
The expected result should be:
{
"field1":2,
"field2":1
}
Is this even possible with Elasticsearch?
Yes, you can do it with a filters aggregation that has one exists filter per field:
POST myindex/_search
{
"size": 0,
"aggs": {
"field_exists": {
"filters": {
"filters": {
"field1": {
"exists": {
"field": "field1"
}
},
"field2": {
"exists": {
"field": "field2"
}
}
}
}
}
}
}
You'll get an answer like this one:
"aggregations" : {
"field_exists" : {
"buckets" : {
"field1" : {
"doc_count" : 2
},
"field2" : {
"doc_count" : 1
}
}
}
}
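If you need exactly the flat shape from the question, the response can be reduced on the client side. A minimal Python sketch, assuming the response has already been parsed into a dictionary (the helper name counts_from_filters_response is hypothetical):

```python
def counts_from_filters_response(response, agg_name="field_exists"):
    """Flatten a named-filters aggregation into {filter_name: doc_count}."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {name: bucket["doc_count"] for name, bucket in buckets.items()}


# The aggregation part of the response shown above.
response = {
    "aggregations": {
        "field_exists": {
            "buckets": {
                "field1": {"doc_count": 2},
                "field2": {"doc_count": 1},
            }
        }
    }
}

counts = counts_from_filters_response(response)
print(counts)  # {'field1': 2, 'field2': 1}
```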

How can I filter doc_count value which is a result of a nested aggregation

How can I filter the doc_count value which is a result of a nested aggregation?
Here is my query:
"aggs": {
"CDIDs": {
"terms": {
"field": "CDID.keyword",
"size": 1000
},
"aggs": {
"my_filter": {
"filter": {
"range": {
"transactionDate": {
"gte": "now-1M/M"
}
}
}
},
"in_active": {
"bucket_selector": {
"buckets_path": {
"doc_count": "_count"
},
"script": "params.doc_count > 4"
}
}
}
}
}
The result of the query looks like:
{
"aggregations" : {
"CDIDs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 2386,
"buckets" : [
{
"key" : "1234567",
"doc_count" : 5,
"my_filter" : {
"doc_count" : 4
}
},
{
"key" : "12345",
"doc_count" : 5,
"my_filter" : {
"doc_count" : 5
}
}
]
}
}
}
I'm trying to filter on the second doc_count value (the one inside my_filter). Say I want only buckets where that value is greater than 4; the result should then contain a single bucket, the one whose my_filter doc_count is 5. How can I write this filter? Please let me know if any additional information is required.
Take a close look at the bucket_selector aggregation. You simply need to reference the sub-aggregation by name in the buckets_path section, i.e. "doc_count":"my_filter>_count"
Pipeline aggregations have their own buckets_path syntax, in which > acts as the separator between aggregation names. See the buckets_path documentation for more information.
Aggregation Query
POST <your_index_name>/_search
{
"size":0,
"aggs":{
"CDIDs":{
"terms":{
"field":"CDID.keyword",
"size":1000
},
"aggs":{
"my_filter":{
"filter":{
"range":{
"transactionDate":{
"gte":"now-1M/M"
}
}
}
},
"in_active":{
"bucket_selector":{
"buckets_path":{
"doc_count":"my_filter>_count"
},
"script":"params.doc_count > 4"
}
}
}
}
}
}
Hope it helps!
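To see why the path matters, the selection can be emulated on the client side against the result shown in the question. This Python sketch is only an illustration of how a path ending in _count is resolved; it is not how Elasticsearch implements it, and the helper names are made up.

```python
def resolve_count_path(bucket, path):
    """Follow a buckets_path such as 'my_filter>_count' inside one bucket.

    Only paths that end in the special '_count' element are supported here.
    """
    node = bucket
    for part in path.split(">"):
        if part == "_count":
            return node["doc_count"]
        node = node[part]  # descend into the named sub-aggregation
    raise ValueError("path must end with _count: " + path)


def select_buckets(buckets, path, minimum):
    """Keep the keys of buckets whose resolved count is greater than minimum."""
    return [b["key"] for b in buckets if resolve_count_path(b, path) > minimum]


# Buckets copied from the result in the question.
buckets = [
    {"key": "1234567", "doc_count": 5, "my_filter": {"doc_count": 4}},
    {"key": "12345", "doc_count": 5, "my_filter": {"doc_count": 5}},
]

# "_count" looks at the terms bucket itself, so both buckets pass.
print(select_buckets(buckets, "_count", 4))            # ['1234567', '12345']
# "my_filter>_count" looks at the filtered count, so only one passes.
print(select_buckets(buckets, "my_filter>_count", 4))  # ['12345']
```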

Count the percentage of character fields

I want to compute the percentage of each value of a specified field.
This is my REST API request:
GET _search
{
"_source": {
"includes": [ "FIRST_SWITCHED","LAST_SWITCHED","IPV4_DST_ADDR","L4_DST_PORT","IPV4_SRC_ADDR","L7_PROTO_NAME","IN_BYTES","IN_PKTS","OUT_BYTES","OUT_PKTS"]
},
"from" : 0, "size" : 10000,
"query": {
"bool": {
"must": [
{
"match" : { "_index" : "logstash-2017.12.22" }
},
{
"match_phrase":{"IPV4_SRC_ADDR":"192.168.0.159"}
},
{
"range" : {
"LAST_SWITCHED" : {
"gte" : 1513683600
}
}
}
]
}
},
"aggs": {
"IN_PKTS": {
"sum": {
"field": "IN_PKTS"
}
},
"IN_BYTES": {
"sum": {
"field": "IN_BYTES"
}
},
"OUT_BYTES": {
"sum": {
"field": "OUT_BYTES"
}
},
"OUT_PKTS": {
"sum": {
"field": "OUT_PKTS"
}
},
"percent":{
"significant_terms" : {
"field" : "L7_PROTO_NAME",
"percentage":{}
}},
"protocol" : {
"terms" : {
"field" : "PROTOCOL",
"include" : ["17", "6"]
}
},
"Using_port_count" : {
"cardinality" : {
"field" : "L4_SRC_PORT"
}
}
}
}
But the request returns an error. This is the error message:
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [L7_PROTO_NAME] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
Thank you in advance!
OK, I found the answer: just append .keyword to the field name and the query runs.
"field" : "L7_PROTO_NAME.keyword"
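If the goal is the share of documents per protocol, one option is a plain terms aggregation on L7_PROTO_NAME.keyword with the percentages computed on the client. The Python sketch below assumes such a terms aggregation named "protocols" was requested; that name, the helper name, and the sample counts are invented for illustration.

```python
def percentages_from_terms(response, agg_name):
    """Turn terms buckets into {key: percent of all aggregated documents}."""
    agg = response["aggregations"][agg_name]
    buckets = agg["buckets"]
    # Documents in terms that were not returned are reported separately.
    total = sum(b["doc_count"] for b in buckets) + agg.get("sum_other_doc_count", 0)
    if total == 0:
        return {}
    return {b["key"]: 100 * b["doc_count"] / total for b in buckets}


# Made-up response for a terms aggregation on L7_PROTO_NAME.keyword.
response = {
    "aggregations": {
        "protocols": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "HTTP", "doc_count": 6},
                {"key": "DNS", "doc_count": 3},
                {"key": "SSH", "doc_count": 1},
            ],
        }
    }
}

shares = percentages_from_terms(response, "protocols")
print(shares)  # {'HTTP': 60.0, 'DNS': 30.0, 'SSH': 10.0}
```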

How to fetch records with aggregation in elasticsearch?

I am using the filter aggregation below (with a range query) in Elasticsearch, and I want the matching records returned along with the doc count. Can this be achieved?
Below is the query:
{
"aggs" : {
"Age" : {
"filter" : { "range" : { "AGE" : { "gt" : 33 } } }
}
}
}
and here is the output:
{
"aggregations" : {
"Age" : {
"doc_count" : 2
}
}
}
Is there any way to fetch the records as well?
Thanks.
Yes, you can use the top_hits aggregation as a sub-aggregation of the filter.
The documents in that bucket will then be returned.
The query below should work:
{
"aggs": {
"Age": {
"filter": {
"range": {
"AGE": {
"gt": 33
}
}
},
"aggs": {
"results": {
"top_hits": {}
}
}
}
}
}
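The matching documents come back under the top_hits sub-aggregation, inside hits.hits. A small Python sketch of reading them from a parsed response; the helper name and the two sample documents are invented for illustration.

```python
def documents_from_top_hits(response, filter_agg="Age", hits_agg="results"):
    """Return (doc_count, list of _source documents) for a filter bucket."""
    bucket = response["aggregations"][filter_agg]
    hits = bucket[hits_agg]["hits"]["hits"]
    return bucket["doc_count"], [hit["_source"] for hit in hits]


# Made-up response for a filter aggregation with a top_hits sub-aggregation.
response = {
    "aggregations": {
        "Age": {
            "doc_count": 2,
            "results": {
                "hits": {
                    "total": 2,
                    "hits": [
                        {"_id": "1", "_source": {"AGE": 40}},
                        {"_id": "2", "_source": {"AGE": 35}},
                    ],
                }
            },
        }
    }
}

count, docs = documents_from_top_hits(response)
print(count, docs)  # 2 [{'AGE': 40}, {'AGE': 35}]
```

Note that top_hits returns only three documents per bucket by default; set "size" inside top_hits if you need more.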

Post filter on subaggregation in elasticsearch

I am trying to run a post filter on the aggregated data, but it is not working as I expected. Can someone review my query and tell me whether I am doing anything wrong?
"query" : {
"bool" : {
"must" : {
"range" : {
"versionDate" : {
"from" : null,
"to" : "2016-04-22T23:13:50.000Z",
"include_lower" : false,
"include_upper" : true
}
}
}
}
},
"aggregations" : {
"associations" : {
"terms" : {
"field" : "association.id",
"size" : 0,
"order" : {
"_term" : "asc"
}
},
"aggregations" : {
"top" : {
"top_hits" : {
"from" : 0,
"size" : 1,
"_source" : {
"includes" : [ ],
"excludes" : [ ]
},
"sort" : [ {
"versionDate" : {
"order" : "desc"
}
} ]
}
},
"disabledDate" : {
"filter" : {
"missing" : {
"field" : "disabledDate"
}
}
}
}
}
}
}
Steps in the query:
1. Filter by indexDate less than or equal to a given date.
2. Aggregate on formId, forming one bucket per formId.
3. Sort in descending order and return the top hit per bucket.
4. Run a sub-aggregation filter after the sort sub-aggregation and remove from the buckets all documents whose disabled date is not null (this is the step that does not work).
The whole purpose of post_filter is to run after aggregations have been computed. As such, post_filter has no effect whatsoever on aggregation results.
What you can do in your case is to apply a top-level filter aggregation so that documents with no disabledDate are not taken into account in aggregations, i.e. consider only documents with disabledDate.
{
"query": {
"bool": {
"must": {
"range": {
"versionDate": {
"from": null,
"to": "2016-04-22T23:13:50.000Z",
"include_lower": true,
"include_upper": true
}
}
}
}
},
"aggregations": {
"with_disabled": {
"filter": {
"exists": {
"field": "disabledDate"
}
},
"aggs": {
"form.id": {
"terms": {
"field": "form.id",
"size": 0
},
"aggregations": {
"top": {
"top_hits": {
"size": 1,
"_source": {
"includes": [],
"excludes": []
},
"sort": [
{
"versionDate": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}

Resources