ELK query to return one record for each product with the max timestamp - elasticsearch

On Kibana, I can view logs for various products (product.name) along with timestamp and other information. Here is one of the logs:
{
"_index": "xxx-2017.08.30",
"_type": "logs",
"_id": "xxxx",
"_version": 1,
"_score": null,
"_source": {
"v": "1.0",
"level": "INFO",
"timestamp": "2017-01-30T18:31:50.761Z",
"product": {
"name": "zzz",
"version": "2.1.0-111"
},
"context": {
...
...
}
},
"fields": {
"timestamp": [
1504117910761
]
},
"sort": [
1504117910761
]
}
There are several other logs for the same product and also several logs for different products.
However, I want to write a query that returns a single record for a given product.name (the one with the maximum timestamp value), and does the same for every other product. That is, the query should return one log per product, and for each product it should be the one with the maximum timestamp.
How do I achieve this?
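In plain Python terms (a sketch with hypothetical, simplified log dicts), the result I'm after is:

```python
# Hypothetical, simplified logs -- only the two fields that matter here.
logs = [
    {"product": {"name": "zzz"}, "timestamp": "2017-01-30T18:31:50.761Z"},
    {"product": {"name": "zzz"}, "timestamp": "2017-02-01T09:00:00.000Z"},
    {"product": {"name": "yyy"}, "timestamp": "2017-01-15T12:00:00.000Z"},
]

# Keep, per product.name, the log with the maximum timestamp.
# (ISO-8601 UTC strings of equal length compare correctly as plain strings.)
latest = {}
for log in logs:
    name = log["product"]["name"]
    if name not in latest or log["timestamp"] > latest[name]["timestamp"]:
        latest[name] = log
```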
I tried to follow the approach listed in:
How to get latest values for each group with an Elasticsearch query?
And created a query:
{
"aggs": {
"group": {
"terms": {
"field": "product.name"
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1,
"sort": [
{
"timestamp": {
"order": "desc"
}
}
]
}
}
}
}
}
}
But, I got an error that said:
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [product.name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
Do I absolutely need to set fielddata=true for this field in this case? If not, what should I do? If yes, I am not sure how to set it. I tried doing it this way:
curl -XGET 'localhost:9200/xxx*/_search?pretty' -H 'Content-Type: application/json' -d'
{
"properties": {
"product.name": {
"type": "text",
"fielddata": true
}
},
"aggs": {
"group": {
"terms": {
"field": "product.name"
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1,
"sort": [
{
"timestamp": {
"order": "desc"
}
}
]
}
}
}
}
}
}'
But I think there is something wrong with it (syntactically?) and I get this error:
{
"error" : {
"root_cause" : [
{
"type" : "parsing_exception",
"reason" : "Unknown key for a START_OBJECT in [properties].",
"line" : 3,
"col" : 19
}
],

The reason you got the error is that you are trying to run an aggregation on a text field (product.name); you can't do that in Elasticsearch 5.
You don't need to set fielddata to true. What you need to do is map product.name as a multi-field: a text field product.name plus a keyword sub-field product.name.keyword.
Like this:
{
"product.name":
{
"type" "text",
"fields":
{
"keyword":
{
"type": "keyword",
"ignore_above": 256
}
}
}
}
Then you need to run the aggregation on product.name.keyword.
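Putting it together, the terms/top_hits request from the question then only needs the field name switched to the keyword sub-field. A sketch:

```json
{
  "size": 0,
  "aggs": {
    "group": {
      "terms": { "field": "product.name.keyword" },
      "aggs": {
        "group_docs": {
          "top_hits": {
            "size": 1,
            "sort": [ { "timestamp": { "order": "desc" } } ]
          }
        }
      }
    }
  }
}
```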

How to understand this description of 'collapse' in the Elasticsearch document?

ES version: 6.4.3
First, please imagine that I have an index like this:
create a new index "test_1",
store some data,
#### 1.create a new index "test_1"
DELETE test_1
PUT /test_1/
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"_doc": {
"properties": {
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
GET /test_1/_mapping
GET /test_1/_refresh
GET /test_1/_search
#### 2.put some doc
POST _bulk
{ "index" : { "_index" : "test_1", "_id" : "100" } }
{ "title" : ["100","101"] }
{ "index" : { "_index" : "test_1", "_id" : "101" } }
{ "title" : "100" }
test agg
#### 3.test agg
GET /test_1/_search
{
"size": 0,
"aggs": {
"title": {
"terms": {
"field": "title.keyword",
"size": 100
}
}
}
}
It works as expected, and the results are as follows:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"title": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "100",
"doc_count": 2
},
{
"key": "101",
"doc_count": 1
}
]
}
}
}
test collapse
#### 4. test collapse
GET /test_1/_search
{
"_source": false,
"from":0,
"size": 10,
"query": {
"match_all": {
}
},
"collapse": {
"field": "title.keyword",
"inner_hits": {
"name": "latest",
"size": 1
}
}
}
The result is an error:
{
"error": {
"root_cause": [
{
"type": "illegal_state_exception",
"reason": "failed to collapse 0, the collapse field must be single valued"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "test_1",
"node": "1TlabepgQSi-5WvjVm6MuQ",
"reason": {
"type": "illegal_state_exception",
"reason": "failed to collapse 0, the collapse field must be single valued"
}
}
],
"caused_by": {
"type": "illegal_state_exception",
"reason": "failed to collapse 0, the collapse field must be single valued",
"caused_by": {
"type": "illegal_state_exception",
"reason": "failed to collapse 0, the collapse field must be single valued"
}
}
},
"status": 500
}
So my question is why this error is reported. Is it related to this description of collapse in the Elasticsearch documentation:
The field used for collapsing must be a single valued keyword or numeric field with doc_values activated.
If the two are related, why is the error reason "failed to collapse 0"? Where does this 0 come from? I sincerely appreciate any answer.
First of all, thanks for providing a reproducible example, that helps a lot!!
Then, regarding collapse, indeed, it is only working on single valued fields. In your first document, title is an array, and hence, is multi-valued, which is not ok for collapsing.
Simply put, the 0 you see in the error message is the internal document ID, i.e. it's an incremental number that each document gets whenever it is indexed. In your case, 0 stands for the first document that has been indexed. If you invert the documents in your bulk call, you'll see 1 instead.
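As a sketch of a fix (assuming the test_1 index from the question): once every document's title holds a single value, the same collapse request succeeds. For example, reindexing doc 100 with just one title:

```
POST _bulk
{ "index" : { "_index" : "test_1", "_id" : "100" } }
{ "title" : "101" }
```

After a refresh, the collapse query from step 4 should return one hit per distinct title.keyword instead of the illegal_state_exception.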

Strange behavior of range query in Elasticsearch

My question is pretty simple. I have an ES index which contains a field updated that is a UNIX timestamp. I only have testing records (documents) in my index, which were all created today.
I have the following query, which works well and (rightfully) doesn't return any results when executed:
GET /test_index/_search
{
"size": 1,
"query": {
"bool": {
"must": [
{
"range": {
"updated": {
"lt": "159525360"
}
}
}
]
}
},
"sort": [
{
"updated": {
"order": "desc",
"mode": "avg"
}
}
]
}
So this is all OK. However, when I change the timestamp in my query to a lower number, I am getting multiple results! And these results all contain much larger values in the updated field than 5000! Even more bafflingly, I only get results when the bound is in the range of 1971 to 9999; numbers like 1500 or 10000 behave correctly and I see no results. The query behaving strangely is below.
GET /test_index/_search
{
"size": 100,
"query": {
"bool": {
"must": [
{
"range": {
"updated": {
"lt": "5000"
}
}
}
]
}
},
"sort": [
{
"updated": {
"order": "desc",
"mode": "avg"
}
}
]
}
Btw, this is what a typical document stored in this index looks like:
{
"_index" : "test_index",
"_type" : "_doc",
"_id" : "V6LDyHMBAUKhWZ7lxRtb",
"_score" : null,
"_source" : {
"councilId" : 111,
"chargerId" : "15",
"unitId" : "a",
"connectorId" : "2",
"status" : 10,
"latitude" : 77.7,
"longitude" : 77.7,
"lastStatusChange" : 1596718920,
"updated" : 1596720720,
"dataType" : "recorded"
},
"sort" : [
1596720720
]
}
Here is a mapping of this index:
PUT /test_index/_mapping
{
"properties": {
"chargerId": { "type": "text"},
"unitId": { "type": "text"},
"connectorId": { "type": "text"},
"councilId": { "type": "integer"},
"status": {"type": "integer"},
"longitude" : {"type": "double"},
"latitude" : {"type": "double"},
"lastStatusChange" : {"type": "date"},
"updated": {"type": "date"}
}
}
Is there any explanation for this?
The default format for a date field in ES is
strict_date_optional_time||epoch_millis. Since you haven't specified epoch_second, your dates were incorrectly parsed (treated as millis since epoch). It's verifiable by running this script:
GET test_index/_search
{
"script_fields": {
"updated_pretty": {
"script": {
"lang": "painless",
"source": """
LocalDateTime.ofInstant(
Instant.ofEpochMilli(doc['updated'].value.millis),
ZoneId.of('Europe/Vienna')
).format(DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm"))
"""
}
}
}
}
Quick fix: update your mapping as follows:
{
...
"updated":{
"type":"date",
"format":"epoch_second"
}
}
and reindex.
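The effect of the wrong unit is easy to see with the sample document's updated value; a standalone Python sketch (independent of Elasticsearch):

```python
from datetime import datetime, timezone

ts = 1596720720  # the "updated" value from the sample document

# Interpreted as epoch_millis (the default), the value lands in January 1970:
as_millis = datetime.fromtimestamp(ts / 1000, tz=timezone.utc)

# Interpreted as epoch_second (the intended unit), it is an August 2020 date:
as_seconds = datetime.fromtimestamp(ts, tz=timezone.utc)
```

That likely also explains the 1971-9999 window: those bounds parse as years after 1970, so they exceed the mis-parsed stored dates, while 1500 (year 1500) and 10000 (parsed as millis) fall before them.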

ElasticSearch: aggregations for ip_range type

I have a field which is defined in mappings as:
"route": {
"type": "ip_range"
}
It works well, and I see the results when I query the ES:
"_source": {
"ip": "65.151.40.164",
"route": "65.151.40.0/22",
...
}
Now I want to do some aggregations on this field, and pretty much everything I try ends up with this error:
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is not supported on field [route] of type [ip_range]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is not supported on field [route] of type [ip_range]"
}
}
I hope that it doesn't mean that ES doesn't support aggregations for ip_range? Or if it does, how can it be done?
UPDATE
As I said, so far any aggregations that work on other types (including ip type) don't work on ip_range.
Some examples:
{
"size": 0,
"aggs": {
"routes": {
"range": {
"field": "route",
"ranges": [
{"to": "10.0.0.0/32"}
]
}
}
}
}
{
"size": 0,
"aggs": {
"routes": {
"terms": {
"field": "route",
"size": 50
}
}
}
}
If anyone can point me to an aggregation that does work on ip_range that would be helpful!
There's a specific ip_range aggregation for the ip_range field type, i.e. do not use the range aggregation (only for numeric types) and terms (only for numeric and keyword types):
GET /ip_addresses/_search
{
"size": 10,
"aggs" : {
"routes" : {
"ip_range" : {
"field" : "route",
"ranges" : [
{"to": "10.0.0.0/32"}
]
}
}
}
}
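The CIDR arithmetic behind these bucket bounds can be sanity-checked outside Elasticsearch with Python's stdlib ipaddress module (a standalone sketch using the sample document's values):

```python
import ipaddress

# Values from the sample document: an ip and its enclosing /22 route.
ip = ipaddress.ip_address("65.151.40.164")
route = ipaddress.ip_network("65.151.40.0/22")
covered = ip in route  # the stored route covers the stored ip

# The bucket bound "10.0.0.0/32" is a single-address network that lies
# entirely below the sample route, so the two never overlap.
bound = ipaddress.ip_network("10.0.0.0/32")
disjoint = not route.overlaps(bound)
```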

Fielddata is disabled on text fields by default in elasticsearch

I have a problem: I updated from Elasticsearch 2.x to 5.1. However, some of my data does not work in the newer Elasticsearch because of "Fielddata is disabled on text fields by default" (https://www.elastic.co/guide/en/elasticsearch/reference/5.1/fielddata.html); before, in 2.x, it was enabled, it seems.
Is there a way to enable fielddata automatically on text fields?
I tried code like this
curl -XPUT http://localhost:9200/_template/template_1 -d '
{
"template": "*",
"mappings": {
"_default_": {
"properties": {
"fielddata-*": {
"type": "text",
"fielddata": true
}
}
}
}
}'
but it looks like Elasticsearch does not understand the wildcard in the field name there. My temporary solution is a Python script that runs every 30 minutes, scans all indices, and adds fielddata=true to fields which are new.
The problem is that I have string data like "this is cool" in elasticsearch.
curl -XPUT 'http://localhost:9200/example/exampleworking/1' -d '
{
"myfield": "this is cool"
}'
when trying to aggregate that:
curl 'http://localhost:9200/example/_search?pretty=true' -d '
{
"aggs": {
"foobar": {
"terms": {
"field": "myfield"
}
}
}
}'
"Fielddata is disabled on text fields by default. Set fielddata=true on [myfield]"
The Elasticsearch documentation suggests using .keyword instead of enabling fielddata. However, that does not return the data I want.
curl 'http://localhost:9200/example/_search?pretty=true' -d '
{
"aggs": {
"foobar": {
"terms": {
"field": "myfield.keyword"
}
}
}
}'
returns:
"buckets" : [
{
"key" : "this is cool",
"doc_count" : 1
}
]
which is not what I want. Then I add fielddata=true and everything works:
curl -XPUT 'http://localhost:9200/example/_mapping/exampleworking' -d '
{
"properties": {
"myfield": {
"type": "text",
"fielddata": true
}
}
}'
and then aggregate
curl 'http://localhost:9200/example/_search?pretty=true' -d '
{
"aggs": {
"foobar": {
"terms": {
"field": "myfield"
}
}
}
}'
and it returns the correct result:
"buckets" : [
{
"key" : "cool",
"doc_count" : 1
},
{
"key" : "is",
"doc_count" : 1
},
{
"key" : "this",
"doc_count" : 1
}
]
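The difference between the two results comes down to analysis; roughly, in Python terms (a rough model, ignoring real analyzer details):

```python
doc = "this is cool"

# text field with fielddata=true: one aggregation bucket per analyzed token
text_buckets = sorted(doc.lower().split())

# keyword sub-field: a single bucket for the whole, unanalyzed string
keyword_buckets = [doc]
```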
How can I add fielddata=true automatically to all text fields in all indices? Is that even possible? In Elasticsearch 2.x this works out of the box.
I will answer myself:
curl -XPUT http://localhost:9200/_template/template_1 -d '
{
"template": "*",
"mappings": {
"_default_": {
"dynamic_templates": [
{
"strings2": {
"match_mapping_type": "string",
"mapping": {
"type": "text",
"fielddata": true
}
}
}
]
}
}
}'
This does what I want. Now all indices get fielddata=true on text fields by default.
Adding "fielddata": true allows the text field to be aggregated, but this has performance problems at scale. A better solution is to use a multi-field mapping.
Unfortunately, this is hidden a bit deep in Elasticsearch's documentation, in a warning under the fielddata mapping parameter: https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html#before-enabling-fielddata
Here's a complete example of how this helps with a terms aggregation, tested on Elasticsearch 7.12 as of 2021-04-24:
Mapping (in ES7, under the mappings property of the body of a "put index template" request etc):
{
"properties": {
"bio": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
Four documents indexed:
{
"bio": "Dogs are the best pet."
}
{
"bio": "Cats are cute."
}
{
"bio": "Cats are cute."
}
{
"bio": "Cats are the greatest."
}
Aggregation query:
{
"size": 0,
"aggs": {
"bios_with_cats": {
"filter": {
"match": {
"bio": "cats"
}
},
"aggs": {
"bios": {
"terms": {
"field": "bio.keyword"
}
}
}
}
}
}
Aggregation query results:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"bios_with_cats": {
"doc_count": 3,
"bios": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Cats are cute.",
"doc_count": 2
},
{
"key": "Cats are the greatest.",
"doc_count": 1
}
]
}
}
}
}
Basically, this aggregation says "Of the documents whose bios are like 'cats', how many of each distinct bio are there?" The one document without "cats" in its bio property is excluded, and then the remaining documents are grouped into buckets, one of which has one document and the other has two documents.
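The same filter-then-bucket logic can be sketched in plain Python (a rough model; real analysis is more involved than lower().split()):

```python
from collections import Counter

docs = [
    {"bio": "Dogs are the best pet."},
    {"bio": "Cats are cute."},
    {"bio": "Cats are cute."},
    {"bio": "Cats are the greatest."},
]

# filter sub-aggregation: keep documents whose analyzed bio contains "cats"
matching = [d for d in docs if "cats" in d["bio"].lower().split()]

# terms on bio.keyword: count each distinct, unanalyzed bio string
buckets = Counter(d["bio"] for d in matching)
```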

Raw nested aggregation

I would like to create a raw nested aggregation in Elasticsearch, but I'm unable to get it working.
My documents look like this:
{
"_index": "items",
"_type": "frame_spec",
"_id": "19770602001",
"_score": 1,
"_source": {
"item_type_name": "frame_spec",
"status": "published",
"creation_date": "2016-02-18T11:19:15Z",
"last_change_date": "2016-02-18T11:19:15Z",
"publishing_date": "2016-02-18T11:19:15Z",
"attributes": [
{
"brand": "Sun"
},
{
"model": "Sunglasses1"
},
{
"eyesize": "56"
},
{
"opc": "19770602001"
},
{
"madein": "UNITED KINGDOM"
}
]
}
}
What I want to do is to aggregate on one of the attributes. I can't do a normal aggregation on "attributes.model" (for example) because some of the values contain spaces. So I've tried using the "raw" property, but it appears that ES considers it a normal property and does not return any results.
This is what I've tried :
{
"size": 0,
"aggs": {
"brand": {
"terms": {
"field": "attributes.brand.raw"
}
}
}
}
But I have no result.
Do you have any solution I could use for this problem?
You should use a dynamic_template in your mapping that will catch all attributes.* string fields and create a raw sub-field for all of them. For other types than string, you don't really need raw fields. You need to delete your current index and then recreate it with this:
DELETE items
PUT items
{
"mappings": {
"frame_spec": {
"dynamic_templates": [
{
"strings": {
"match_mapping_type": "string",
"path_match": "attributes.*",
"mapping": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
]
}
}
}
After that, you need to re-populate your index and then you'll be able to run this:
POST /items/_search
{
"size": 0,
"aggs": {
"brand": {
"terms": {
"field": "attributes.brand.raw"
}
}
}
}
