ElasticSearch: aggregations for ip_range type

I have a field which is defined in mappings as:
"route": {
  "type": "ip_range"
}
It works well, and I see the results when I query the ES:
"_source": {
  "ip": "65.151.40.164",
  "route": "65.151.40.0/22",
  ...
}
Now I want to do some aggregations of this field, and pretty much everything I try ends up being this error:
"caused_by": {
  "type": "illegal_argument_exception",
  "reason": "Fielddata is not supported on field [route] of type [ip_range]",
  "caused_by": {
    "type": "illegal_argument_exception",
    "reason": "Fielddata is not supported on field [route] of type [ip_range]"
  }
}
I hope this doesn't mean that ES doesn't support aggregations on ip_range? And if it does support them, how can it be done?
UPDATE
As I said, so far any aggregations that work on other types (including ip type) don't work on ip_range.
Some examples:
{
  "size": 0,
  "aggs": {
    "routes": {
      "range": {
        "field": "route",
        "ranges": [
          { "to": "10.0.0.0/32" }
        ]
      }
    }
  }
}
{
  "size": 0,
  "aggs": {
    "routes": {
      "terms": {
        "field": "route",
        "size": 50
      }
    }
  }
}
If anyone can point me to an aggregation that does work on ip_range that would be helpful!

There's a specific ip_range aggregation for the ip_range field type, i.e. do not use the range aggregation (numeric types only) or the terms aggregation (numeric and keyword types only):
GET /ip_addresses/_search
{
  "size": 10,
  "aggs": {
    "routes": {
      "ip_range": {
        "field": "route",
        "ranges": [
          { "to": "10.0.0.0/32" }
        ]
      }
    }
  }
}
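The ip_range aggregation also accepts CIDR masks directly in its ranges, which can be more natural when bucketing routes. A hedged sketch, reusing the index and field names from the question (the specific masks are illustrative only):

```json
GET /ip_addresses/_search
{
  "size": 0,
  "aggs": {
    "routes_by_mask": {
      "ip_range": {
        "field": "route",
        "ranges": [
          { "mask": "10.0.0.0/8" },
          { "mask": "65.151.40.0/22" }
        ]
      }
    }
  }
}
```

Each mask produces one bucket covering exactly that CIDR block.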

Related

How to perform sub-aggregation in elasticsearch?

I have a set of article documents in elasticsearch with fields content and publish_datetime.
I am trying to retrieve the most frequent words from articles with publish year == 2021.
GET articles/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "word_counts": {
      "terms": {
        "field": "content"
      }
    },
    "publish_datetime": {
      "terms": {
        "field": "publish_datetime"
      }
    },
    "aggs": {
      "word_counts_2021": {
        "bucket_selector": {
          "buckets_path": {
            "word_counts": "word_counts",
            "pd": "publish_datetime"
          },
          "script": "LocalDateTime.parse(params.pd).getYear() == 2021"
        }
      }
    }
  }
}
This fails on
{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "Unknown aggregation type [word_counts_2021]",
        "line" : 17,
        "col" : 25
      }
    ],
    "type" : "parsing_exception",
    "reason" : "Unknown aggregation type [word_counts_2021]",
    "line" : 17,
    "col" : 25,
    "caused_by" : {
      "type" : "named_object_not_found_exception",
      "reason" : "[17:25] unknown field [word_counts_2021]"
    }
  },
  "status" : 400
}
which does not make sense, because word_counts_2021 is the name of the aggregation according to the docs. It's not an aggregation type; I am the one who picks the name, so I thought it could have had basically any value.
Does anyone have any idea what's going on there? So far, this seems like a pretty unintuitive service to me.
The agg as you have it written seems to be filtering the publish_datetime buckets so that you only include those in the year 2021. To do that, you must nest the sub-agg under that particular terms aggregation.
Like so:
GET articles/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "word_counts": {
      "terms": {
        "field": "content"
      }
    },
    "publish_datetime": {
      "terms": {
        "field": "publish_datetime"
      },
      "aggs": {
        "word_counts_2021": {
          "bucket_selector": {
            "buckets_path": {
              "pd": "publish_datetime"
            },
            "script": "LocalDateTime.parse(params.pd).getYear() == 2021"
          }
        }
      }
    }
  }
}
But if that field has a date type, I would suggest simply filtering with a range query and then aggregating your documents.
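A hedged sketch of that range-query approach, using the field names from the question (it assumes publish_datetime is mapped as a date, and that content is aggregatable, e.g. a keyword field):

```json
GET articles/_search
{
  "size": 0,
  "query": {
    "range": {
      "publish_datetime": {
        "gte": "2021-01-01",
        "lt": "2022-01-01"
      }
    }
  },
  "aggs": {
    "word_counts_2021": {
      "terms": {
        "field": "content"
      }
    }
  }
}
```

The range query narrows the document set to 2021 before any buckets are built, so no bucket_selector script is needed.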

Using Elasticsearch, how do I apply function scores to documents which conditionally have a property

I have a handful of indexes, some of which have a particular date property indicating when it was published (date_publish), and others do not. I am trying to apply a gauss function to decay the score of documents which were published a long time ago. The relevant indexes are correctly configured to recognise the date_publish property as a date.
I have set up my query as follows, specifically filtering documents which do not have the property:
{
  "index": "index_contains_prop,index_does_not_contains_prop",
  "body": {
    "query": {
      "function_score": {
        "score_mode": "avg",
        "query": {
          "match_all": {}
        },
        "functions": [
          {
            "script_score": {
              "script": {
                "source": "0"
              }
            }
          },
          {
            "filter": {
              "exists": {
                "field": "date_publish"
              }
            },
            "gauss": {
              "date_publish": {
                "origin": "now",
                "scale": "728d",
                "offset": "7d",
                "decay": 0.5
              }
            }
          }
        ]
      }
    },
    "from": 0,
    "size": 1000
  }
}
However, the query errors with the following:
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "unknown field [date_publish]",
        "line": 1,
        "col": 0
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "index_does_not_contains_prop",
        "node": "1hfXZK4TT3-K288nIr0UWA",
        "reason": {
          "type": "parsing_exception",
          "reason": "unknown field [date_publish]",
          "line": 1,
          "col": 0
        }
      }
    ]
  },
  "status": 400
}
I have RTFM'd many times and I can't see any discrepancy. I have also tried wrapping the exists condition in a bool/must object, to no avail.
Have I misunderstood the purpose of the filter argument?
The exists query will only work on fields that are part of the index mapping. It returns only documents that have a value for the field, but the field itself still needs to be defined in the mapping. This is why you're getting an error: index_does_not_contains_prop does not have date_publish mapped. You can use the put mapping API to add this field to the indexes that don't have it (it won't change any documents), and then your query should work.
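A hedged sketch of that put mapping call, using the index and field names from the question (this is the typeless mapping API as in Elasticsearch 7+; older versions need the type name in the path):

```json
PUT /index_does_not_contains_prop/_mapping
{
  "properties": {
    "date_publish": {
      "type": "date"
    }
  }
}
```

After this, existing documents remain untouched; the index simply knows about the field, so the exists filter parses on every shard.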

ElasticSearch: Is min_doc_count supported for Metric Aggregations

I am new to Elastic Search and am trying to make a query with Metric aggregation for my docs. But when I add the field: min_doc_count=1 for my sum metric aggregation, I get an error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "[sum] unknown field [min_doc_count], parser not found"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "[sum] unknown field [min_doc_count], parser not found"
  },
  "status": 400
}
What am I missing here?
{
  "aggregations": {
    "myKey": {
      "sum": {
        "field": "field1",
        "min_doc_count": 1
      }
    }
  }
}
I'm not sure why/where you have the sum keyword?
The idea of min_doc_count is to make sure the buckets returned by a given aggs query contain at least N documents; the example below would only return subject buckets for subjects that appear in 10 or more documents.
GET _search
{
  "aggs": {
    "docs_per_subject": {
      "terms": {
        "field": "subject",
        "min_doc_count": 10
      }
    }
  }
}
So with that in mind, yours would refactor to the following. Although, when setting min_doc_count to 1, it's not really necessary to keep the parameter at all.
GET _search
{
  "aggs": {
    "docs_per_subject": {
      "terms": {
        "field": "field1",
        "min_doc_count": 1
      }
    }
  }
}
If you wish to sum only the non-zero values of the field, you can filter the zero values out in the query section:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "field1": {
              "gt": 0
            }
          }
        }
      ]
    }
  },
  "aggregations": {
    "myKey": {
      "sum": {
        "field": "field1"
      }
    }
  }
}
See Bool Query and Range Query.

ELK query to return one record for each product with the max timestamp

On Kibana, I can view logs for various products (product.name) along with timestamp and other information. Here is one of the log:
{
  "_index": "xxx-2017.08.30",
  "_type": "logs",
  "_id": "xxxx",
  "_version": 1,
  "_score": null,
  "_source": {
    "v": "1.0",
    "level": "INFO",
    "timestamp": "2017-01-30T18:31:50.761Z",
    "product": {
      "name": "zzz",
      "version": "2.1.0-111"
    },
    "context": {
      ...
      ...
    }
  },
  "fields": {
    "timestamp": [
      1504117910761
    ]
  },
  "sort": [
    1504117910761
  ]
}
There are several other logs for the same product, and also logs for different products.
I want to write a query that returns a single record for a given product.name (the one with the maximum timestamp value), and does the same for every other product. That is, one log per product, and for each product it should be the one with the maximum timestamp.
How do I achieve this?
I tried to follow the approach listed in:
How to get latest values for each group with an Elasticsearch query?
And created a query:
{
  "aggs": {
    "group": {
      "terms": {
        "field": "product.name"
      },
      "aggs": {
        "group_docs": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
But, I got an error that said:
"error" : {
  "root_cause" : [
    {
      "type" : "illegal_argument_exception",
      "reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [product.name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
    }
  ],
Do I absolutely need to set fielddata=true for this field in this case? If not, what should I do? If yes, I am not sure how to set it. I tried doing it this way:
curl -XGET 'localhost:9200/xxx*/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "properties": {
    "product.name": {
      "type": "text",
      "fielddata": true
    }
  },
  "aggs": {
    "group": {
      "terms": {
        "field": "product.name"
      },
      "aggs": {
        "group_docs": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}'
But I think there is something wrong with it (syntactically?) and I get this error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "Unknown key for a START_OBJECT in [properties].",
        "line" : 3,
        "col" : 19
      }
    ],
The reason you got the error is that you tried to run an aggregation on a text field (product.name); you can't do that in Elasticsearch 5.
You don't need to set fielddata to true. What you need to do is define product.name in the mapping as a multi-field: one field product.name (text) and a second field product.name.keyword (keyword).
Like this:
{
  "product.name": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
Then you need to do the aggregation on product.name.keyword
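A hedged sketch of the original top_hits query rewritten against the keyword subfield (index pattern and field names taken from the question):

```json
GET xxx*/_search
{
  "size": 0,
  "aggs": {
    "group": {
      "terms": {
        "field": "product.name.keyword"
      },
      "aggs": {
        "group_docs": {
          "top_hits": {
            "size": 1,
            "sort": [
              { "timestamp": { "order": "desc" } }
            ]
          }
        }
      }
    }
  }
}
```

The terms aggregation buckets on the exact keyword value, and the nested top_hits returns the single most recent document in each bucket.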

Scripting Elasticsearch 2.1| No such property: doc

I am facing an issue while trying to execute a script within an ES JSON request
The request:
POST _search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "aggs": {
    "bucket_histogram": {
      "histogram": {
        "field": "dayTime",
        "interval": 10
      },
      "aggs": {
        "get_average": {
          "avg": {
            "field": "value"
          }
        },
        "check-threshold": {
          "bucket_script": {
            "buckets_path": {
              "averageValue": "get_average"
            },
            "script": "averageValue - doc[\"thresholdValue\"].value"
          }
        }
      }
    }
  }
}
But I get this error instead of returning values
{
  "error": {
    "root_cause": [],
    "type": "reduce_search_phase_exception",
    "reason": "[reduce] ",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "groovy_script_execution_exception",
      "reason": "failed to run inline script [averageValue - doc[\"thresholdValue\"].value] using lang [groovy]",
      "caused_by": {
        "type": "missing_property_exception",
        "reason": "No such property: doc for class: 7dcca7d142ac809a7192625d43d95bde9883c434"
      }
    }
  },
  "status": 503
}
Yet if I remove doc[\"thresholdValue\"] and enter a literal number instead, everything works fine.
You are using a bucket_script, which is a part of the pipeline aggregations released with Elasticsearch 2.0. Pipeline aggregations work against other aggregations and not documents, which is why the doc context is not supplied to the aggregation.
If you want to process aggregations against specific documents, then perhaps you want the scripted metric aggregation instead.
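One way to keep the bucket_script approach, assuming thresholdValue is a numeric field: compute its per-bucket average as a second metric aggregation and reference it through buckets_path instead of doc. A hedged sketch (the get_threshold name is mine, not from the original question):

```json
POST _search
{
  "size": 0,
  "aggs": {
    "bucket_histogram": {
      "histogram": {
        "field": "dayTime",
        "interval": 10
      },
      "aggs": {
        "get_average": {
          "avg": { "field": "value" }
        },
        "get_threshold": {
          "avg": { "field": "thresholdValue" }
        },
        "check-threshold": {
          "bucket_script": {
            "buckets_path": {
              "averageValue": "get_average",
              "threshold": "get_threshold"
            },
            "script": "averageValue - threshold"
          }
        }
      }
    }
  }
}
```

Since both inputs now come from sibling aggregations, the pipeline script only ever sees bucket values and never needs the per-document doc context.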