Can an Elasticsearch rollup job dynamically create indexes like Logstash does?

I am currently testing the new rollup APIs in Elasticsearch 6.3 and am wondering whether a rollup job can be configured to dynamically create an index based on the timestamp, the way Logstash does when ingesting data. The use case is rolling up large amounts of time-series network performance reporting data. I'm worried that even an hourly rollup will create a huge index to manage, so I'm looking to split it so that each day's hourly rollup goes into its own index.
Current rollup job config:
{
  "index_pattern": "dxs-raw-*",
  "rollup_index": "dxs-hourly-%{+YYYY.MM.dd}",
  "cron": "* */15 * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "#timestamp",
      "interval": "1h",
      "delay": "12h"
    },
    "terms": {
      "fields": ["ci_id.keyword", "client_id.keyword", "element_name.keyword", "measurement.keyword", "source_management_platform.keyword", "unit.keyword"]
    }
  },
  "metrics": [
    {
      "field": "value",
      "metrics": ["min", "max", "avg"]
    }
  ]
}
Error seen when PUTting job via Kibana DevTools console:
{
"error": {
"root_cause": [
{
"type": "invalid_index_name_exception",
"reason": "Invalid index name [dxs-hourly-%{+YYYY.MM.dd}], must be lowercase",
"index_uuid": "_na_",
"index": "dxs-hourly-%{+YYYY.MM.dd}"
}
],
"type": "runtime_exception",
"reason": "runtime_exception: Could not create index for rollup job [dxs-hourly]",
"caused_by": {
"type": "invalid_index_name_exception",
"reason": "Invalid index name [dxs-hourly-%{+YYYY.MM.dd}], must be lowercase",
"index_uuid": "_na_",
"index": "dxs-hourly-%{+YYYY.MM.dd}"
}
},
"status": 500
}

As of version 6.4 this is not possible, but an enhancement request has been raised here.
When the final solution is released I will update this answer with the implementation we use.
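In the meantime, the immediate invalid_index_name_exception can be avoided by pointing the job at a single static, lowercase rollup index; the Logstash-style %{+YYYY.MM.dd} pattern is not interpolated by the rollup API. Below is a minimal sketch against the 6.x endpoint (_xpack/rollup/job); the job name and rollup index name are examples, not values from the original setup:

PUT _xpack/rollup/job/dxs-hourly
{
  "index_pattern": "dxs-raw-*",
  "rollup_index": "dxs-hourly",
  "cron": "* */15 * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "#timestamp",
      "interval": "1h",
      "delay": "12h"
    },
    "terms": {
      "fields": ["ci_id.keyword", "client_id.keyword", "element_name.keyword", "measurement.keyword", "source_management_platform.keyword", "unit.keyword"]
    }
  },
  "metrics": [
    {
      "field": "value",
      "metrics": ["min", "max", "avg"]
    }
  ]
}

Since rollup documents summarise many raw documents, a single rollup index is often far smaller than the raw indices it covers, so splitting it by day mainly matters for retention management rather than size.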

Related

How to calculate the lag between the time a log message was generated at the application end and the time it was ingested into Elasticsearch?

Elasticsearch experts, I need your help to achieve the goal below.
Goal:
Find a way to calculate the lag between the time a log message was generated at the application end (#timestamp field) and the time it was ingested into Elasticsearch (ingest_time field).
Current Setup:
I am using Fluentd to capture the logs and send them to Kafka. Then I use Kafka Connect (Elasticsearch connector) to forward the logs to Elasticsearch. Since I have a layer of Kafka between Fluentd and Elasticsearch, I want to calculate the lag between the log message generation time and the ingestion time.
The log message generation time is stored in the #timestamp field of the log and is set when the application generates the log. Below is how a log message looks at the Kafka topic end.
{
  "message": "ServiceResponse - Throwing non 2xx response",
  "log_level": "ERROR",
  "thread_id": "http-nio-9033-exec-21",
  "trace_id": "86d39fbc237ef7f8",
  "user_id": "85355139",
  "tag": "feedaggregator-secondary",
  "#timestamp": "2022-06-18T23:30:06+0530"
}
I have created an ingest pipeline to add the ingest_time field to every doc inserted to the Elasticsearch index.
PUT _ingest/pipeline/ingest_time
{
  "description": "Add an ingest timestamp",
  "processors": [
    {
      "set": {
        "field": "_source.ingest_time",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
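If the Kafka Connect sink does not specify a pipeline per request, one way to make sure the pipeline runs on every document is to attach it as the index's default pipeline. This is only a sketch: the index name is taken from the example later in this question, and index.default_pipeline is available from Elasticsearch 6.5 onwards.

PUT feedaggregator-secondary-2022-06-18/_settings
{
  "index.default_pipeline": "ingest_time"
}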
Once a document gets inserted into the index from Kafka using Kafka Connect (ES sink connector), this is how the message looks in Kibana in JSON format.
{
  "_index": "feedaggregator-secondary-2022-06-18",
  "_type": "_doc",
  "_id": "feedaggregator-secondary-2022-06-18+2+7521337",
  "_version": 1,
  "_score": null,
  "_source": {
    "thread_id": "http-nio-9033-exec-21",
    "trace_id": "86d39fbc237ef7f8",
    "#timestamp": "2022-06-18T23:30:06+0530",
    "ingest_time": "2022-06-18T18:00:09.038032Z",
    "user_id": "85355139",
    "log_level": "ERROR",
    "tag": "feedaggregator-secondary",
    "message": "ServiceResponse - Throwing non 2xx response"
  },
  "fields": {
    "#timestamp": [
      "2022-06-18T18:00:06.000Z"
    ]
  },
  "sort": [
    1655574126000
  ]
}
Now, I wanted to calculate the difference between the #timestamp and ingest_time fields. For this I added a script to the ingest pipeline, which adds a lag_in_seconds field and sets its value to the difference between the ingest_time and #timestamp fields.
PUT _ingest/pipeline/calculate_lag
{
  "description": "Add an ingest timestamp and calculate ingest lag",
  "processors": [
    {
      "set": {
        "field": "_source.ingest_time",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": """
          if (ctx.containsKey("ingest_time") && ctx.containsKey("#timestamp")) {
            ctx['lag_in_seconds'] = ChronoUnit.MILLIS.between(ZonedDateTime.parse(ctx['#timestamp']), ZonedDateTime.parse(ctx['ingest_time'])) / 1000;
          }
        """
      }
    }
  ]
}
Error:
But since my ingest_time and #timestamp fields are in different formats, it gave a DateTimeParseException.
{
"error": {
"root_cause": [
{
"type": "exception",
"reason": "java.lang.IllegalArgumentException: ScriptException[runtime error]; nested: DateTimeParseException[Text '2022-06-18T23:30:06+0530' could not be parsed, unparsed text found at index 22];",
"header": {
"processor_type": "script"
}
}
],
"type": "exception",
"reason": "java.lang.IllegalArgumentException: ScriptException[runtime error]; nested: DateTimeParseException[Text '2022-06-18T23:30:06+0530' could not be parsed, unparsed text found at index 22];",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "ScriptException[runtime error]; nested: DateTimeParseException[Text '2022-06-18T23:30:06+0530' could not be parsed, unparsed text found at index 22];",
"caused_by": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2049)",
"java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1948)",
"java.base/java.time.ZonedDateTime.parse(ZonedDateTime.java:598)",
"java.base/java.time.ZonedDateTime.parse(ZonedDateTime.java:583)",
"ctx['lag_in_seconds'] = ChronoUnit.MILLIS.between(ZonedDateTime.parse(ctx['#timestamp']), ZonedDateTime.parse(ctx['ingest_time']))/1000;\n }",
" ^---- HERE"
],
"script": " if(ctx.containsKey(\"ingest_time\") && ctx.containsKey(\"#timestamp\")) {\n ctx['lag_in_seconds'] = ChronoUnit.MILLIS.between(ZonedDateTime.parse(ctx['#timestamp']), ZonedDateTime.parse(ctx['ingest_time']))/1000;\n }",
"lang": "painless",
"caused_by": {
"type": "date_time_parse_exception",
"reason": "Text '2022-06-18T23:30:06+0530' could not be parsed, unparsed text found at index 22"
}
}
},
"header": {
"processor_type": "script"
}
},
"status": 500
}
So, I need your help to calculate lag_in_seconds between the #timestamp and ingest_time fields.
I am using managed Elasticsearch by AWS (OpenSearch), Elasticsearch version 7.1.
I can see a Java date parsing problem for the #timestamp field. ctx['#timestamp'] will return the value "2022-06-18T23:30:06+0530", which is an ISO offset date-time. You would need to parse it using OffsetDateTime.parse(ctx['#timestamp']). Alternatively, you could try to access the #timestamp from the fields block. You can read up on date parsing in Java at https://howtodoinjava.com/java/date-time/zoneddatetime-parse/.
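As a sketch of one way to make the parse succeed: the offset in "2022-06-18T23:30:06+0530" has no colon, which is what trips the default ISO parser at index 22, so an explicit DateTimeFormatter pattern that accepts the +0530 form can be used instead of relying on the default format. The pipeline body below reuses the names from the question; the pattern string is an assumption and would need adjusting if the application ever emits fractional seconds or a colon in the offset.

PUT _ingest/pipeline/calculate_lag
{
  "description": "Add an ingest timestamp and calculate ingest lag",
  "processors": [
    {
      "set": {
        "field": "_source.ingest_time",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": """
          if (ctx.containsKey("ingest_time") && ctx.containsKey("#timestamp")) {
            // #timestamp looks like 2022-06-18T23:30:06+0530 (offset without a colon)
            DateTimeFormatter appFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssZ");
            ZonedDateTime generated = ZonedDateTime.parse(ctx['#timestamp'], appFormat);
            // _ingest.timestamp is ISO-8601 with a trailing Z, so the default parser handles it
            ZonedDateTime ingested = ZonedDateTime.parse(ctx['ingest_time']);
            ctx['lag_in_seconds'] = ChronoUnit.MILLIS.between(generated, ingested) / 1000;
          }
        """
      }
    }
  ]
}

The pipeline can be dry-run against a sample document with POST _ingest/pipeline/calculate_lag/_simulate before wiring it to the index.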

_update_by_query + script does not work correctly; error: Trying to create too many scroll contexts

Elasticsearch version: 7.6.2
JVM: 13.0.2
OS version: CentOS 7
This is my code:
POST recommend_index/_update_by_query
{
  "script": {
    "source": "ctx._source.rec_doctor_id = 1"
  },
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "id": ["22222"]
          }
        }
      ]
    }
  }
}
This code does not return the result correctly. The error message is:
{
"error": {
"root_cause": [
{
"type": "exception",
"reason": "Trying to create too many scroll contexts. Must be less than or equal to: [5000]. This limit can be set by changing the [search.max_open_scroll_context] setting."
}
],
"type": "search_phase_execution_exception",
"reason": "Partial shards failure",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 1,
"index": "recommend_index",
"node": "XXX",
"reason": {
"type": "exception",
"reason": "Trying to create too many scroll contexts. Must be less than or equal to: [5000]. This limit can be set by changing the [search.max_open_scroll_context] setting."
}
}
]
},
"status": 500
}
I'm sure the current number of open scroll contexts is 0.
When I replace _update_by_query with _update, it updates normally.
No change has been made to ES since last Friday, and suddenly this error is reported.
No configuration changes have been made to the ES server.
Follow-up:
I set the search.max_open_scroll_context parameter to 5000 and found it made no difference.
I looked into the 7.6.2 release and found that others were having the same problem; see
#71354 #56202
I guess this is due to scrolling triggering the 7.6.2 bug. I restarted the cluster nodes without upgrading and found that it worked!
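For anyone debugging the same symptom, two requests help narrow it down: checking how many scroll contexts the nodes actually hold, and raising the limit as a dynamic cluster setting. This is only a sketch; the limit value is illustrative, not a recommendation for this cluster.

# How many search/scroll contexts are currently open on each node
GET _nodes/stats/indices/search

# Raise the scroll context limit dynamically (transient setting)
PUT _cluster/settings
{
  "transient": {
    "search.max_open_scroll_context": 10000
  }
}

If the reported count stays near zero while the error persists, that points at the leak discussed in the linked issues rather than genuine scroll pressure.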

Adding docs to Elasticsearch via API, discovering them via Kibana. How?

I typically insert data (logs) into Elasticsearch via the Logstash plugin. Then I can search them from Kibana.
However, if I try to insert data into Elasticsearch programmatically (in order to skip Filebeat and Logstash), I cannot find the data in Kibana.
This is what I tested:
from datetime import datetime

from elasticsearch import Elasticsearch

es = Elasticsearch(["XXX"], ...)
doc = {
    "#version": 1,
    "#timestamp": datetime.now(),
    "timestamp": datetime.now(),  # Just in case this is needed too
    "message": "test message"
}
res = es.index(
    index="foobar-2019.05.13", doc_type='whatever', id=3, body=doc,
    refresh=True
)
# Doc is indexed by the above code, as proved by
# es.search(
#     index="foobar-*", body={"query": {"match_all": {}}}
# )
I added the index pattern `foobar-*` to Kibana in "Index Pattern -> Create index pattern". Then I can use the "Discover" page to search for documents in that index. But no documents are found by Kibana, even though they exist in Elasticsearch.
What am I missing? Are there any mappings that should be configured for the index?
(note: using 6.x versions)
UPDATE: example of doc indexed, and mapping of index
# Example of doc indexed
{'_index': 'foobar-2019.05.13', '_type': 'doc', '_id': '41', '_score': 1.0,
 '_source': {'author': 'foobar', 'message': 'karsa big and crazy. icarium crazy. mappo big.',
             'timestamp': '2019-05-13T15:52:19.857898',
             '#version': 1, '#timestamp': '2019-05-13T15:52:19.857900'}}

# Mapping of foobar-2019.05.13
{
  "mapping": {
    "doc": {
      "properties": {
        "#timestamp": {
          "type": "date"
        },
        "#version": {
          "type": "long"
        },
        "author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "message": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "timestamp": {
          "type": "date"
        }
      }
    }
  }
}
I found the issue... there was a 2-hour timezone difference between the host where the Python code is running and the Elasticsearch/Kibana hosts.
So, since I was using datetime.now(), I was inserting documents with a timestamp "hours in the future", and I was searching for them "anywhere in the past".
If I look for them in the future (or if I wait for 2 hours without updating them), they are found.
Embarrassing mistake on my side.
The fix for me was to use datetime.now(timezone.utc).
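A minimal sketch of the corrected snippet, using a timezone-aware UTC timestamp so Kibana's time filter lines up regardless of host timezone; the host URL, index name, doc type and id are placeholders, not values from the original setup:

from datetime import datetime, timezone

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

doc = {
    "#version": 1,
    # Timezone-aware UTC timestamp; serialised with an explicit offset,
    # so Elasticsearch and Kibana interpret it consistently.
    "#timestamp": datetime.now(timezone.utc),
    "message": "test message",
}

res = es.index(index="foobar-2019.05.13", doc_type="doc", id=3, body=doc, refresh=True)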

How to create a multi-type index in Elasticsearch?

Several pages in the Elasticsearch documentation mention how to query a multi-type index.
But I failed to create one in the first place.
Here is my minimal example (on an Elasticsearch 6.x server):
PUT /myindex
{
  "settings": {
    "number_of_shards": 1
  }
}

PUT /myindex/people/123
{
  "first name": "John",
  "last name": "Doe"
}

PUT /myindex/dog/456
{
  "name": "Rex"
}
Index creation and the first insert went well, but the dog type insert attempt fails with:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Rejecting mapping update to [myindex] as the final mapping would have more than 1 type: [people, dog]"
}
],
"type": "illegal_argument_exception",
"reason": "Rejecting mapping update to [myindex] as the final mapping would have more than 1 type: [people, dog]"
},
"status": 400
}
But this is exactly what I'm trying to do, buddy! Having "more than 1 type" in my index.
Do you know what I have to change in my calls to achieve this?
Many thanks.
Multiple mapping types are not supported in indices created in Elasticsearch 6.0.0 or later. See the breaking changes for details.
You can still effectively use multiple types by implementing your own custom type field.
For example:
{
  "mappings": {
    "doc": {
      "properties": {
        "type": {
          "type": "keyword"
        },
        "first_name": {
          "type": "text"
        },
        "last_name": {
          "type": "text"
        }
      }
    }
  }
}
This is described in removal of types.
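A sketch of how the custom type field is then used when indexing and querying, assuming the index was created with the mapping above; the index name, the single mapping type name (doc) and the document values are illustrative:

PUT /myindex/doc/123
{
  "type": "people",
  "first_name": "John",
  "last_name": "Doe"
}

PUT /myindex/doc/456
{
  "type": "dog",
  "first_name": "Rex"
}

GET /myindex/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "dog" } }
      ]
    }
  }
}

The term filter on the keyword type field plays the role the mapping type used to play, without hitting the single-type restriction.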

Date_histogram Elasticsearch facet can't find field

I am using the date_histogram facet to find results based on an Epoch timestamp. The results are displayed on a histogram, with the date on the x-axis and the count of events on the y-axis. Here is the code that I have that doesn't work:
angular.module('controllers', [])
  .controller('FacetsController', function($scope, $http) {
    var payload = {
      query: {
        match: {
          run_id: '9'
        }
      },
      facets: {
        date: {
          date_histogram: {
            field: 'event_timestamp',
            factor: '1000',
            interval: 'second'
          }
        }
      }
    }
It works if I use
field: '#timestamp'
which is in ISO 8601 format; however, I need it to work with Epoch timestamps.
Here is an example of what's in my Elasticsearch, maybe this can lead to some answers:
{"#version":"1",
"#timestamp":"2014-07-04T13:13:35.372Z","type":"automatic",
"installer_version":"0.3.0",
"log_type":"access.log","user_id":"1",
"event_timestamp":"1404479613","run_id":"9"}
},
When I run this, I receive this error:
POST 400 (Bad Request)
Any ideas as to what could be wrong here? I don't understand why the two fields behave so differently, as the only difference between them is the format. I researched as best I could and discovered I should be using 'factor', but that didn't seem to solve my problem. I am probably making a silly beginner mistake!
You need to set up the mapping initially. Elasticsearch is good at defaults, but it cannot determine on its own whether a provided value is a timestamp, an integer or a string, so it's your job to tell it.
Let me explain with an example. Let's say the following document is what you are trying to index:
{
  "#version": "1",
  "#timestamp": "2014-07-04T13:13:35.372Z",
  "type": "automatic",
  "installer_version": "0.3.0",
  "log_type": "access.log",
  "user_id": "1",
  "event_timestamp": "1404474613",
  "run_id": "9"
}
So initially you don't have an index and you index your document by making an HTTP request like so:
POST /test/date_experiments
{
  "#version": "1",
  "#timestamp": "2014-07-04T13:13:35.372Z",
  "type": "automatic",
  "installer_version": "0.3.0",
  "log_type": "access.log",
  "user_id": "1",
  "event_timestamp": "1404474613",
  "run_id": "9"
}
This creates a new index called test and a new doc type in index test called date_experiments.
You can check the mapping of this doc type date_experiments by doing so:
GET /test/date_experiments/_mapping
And what you get in the result is the mapping that Elasticsearch auto-generated:
{
  "test": {
    "date_experiments": {
      "properties": {
        "#timestamp": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "#version": {
          "type": "string"
        },
        "event_timestamp": {
          "type": "string"
        },
        "installer_version": {
          "type": "string"
        },
        "log_type": {
          "type": "string"
        },
        "run_id": {
          "type": "string"
        },
        "type": {
          "type": "string"
        },
        "user_id": {
          "type": "string"
        }
      }
    }
  }
}
Notice that the type of the event_timestamp field is set to string, which is why your date_histogram is not working. Also notice that the type of the #timestamp field is already date, because you pushed the date in a standard format, which made it easy for Elasticsearch to recognize that your intention was to push a date in that field.
Drop this mapping by sending a DELETE request to /test/date_experiments and let's start from the beginning.
This time, instead of pushing the document first, we will create the mapping according to our requirements so that our event_timestamp field is treated as a date.
Make the following HTTP request:
PUT /test/date_experiments/_mapping
{
  "date_experiments": {
    "properties": {
      "#timestamp": {
        "type": "date"
      },
      "#version": {
        "type": "string"
      },
      "event_timestamp": {
        "type": "date"
      },
      "installer_version": {
        "type": "string"
      },
      "log_type": {
        "type": "string"
      },
      "run_id": {
        "type": "string"
      },
      "type": {
        "type": "string"
      },
      "user_id": {
        "type": "string"
      }
    }
  }
}
Notice that I have changed the type of the event_timestamp field to date. I have not specified a format because Elasticsearch understands a few standard formats out of the box, as in the case of the #timestamp field where you pushed a date. In this case, Elasticsearch will understand that you are trying to push a UNIX timestamp, convert it internally, treat it as a date and allow all date operations on it. You can specify a date format in the mapping in case the dates you are pushing are not in any standard format.
Now you can start indexing your documents and start running your date queries and facets the same way as you were doing earlier.
You should read more about mapping and date format.
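If the values in event_timestamp are epoch seconds (as in the sample document, where 1404474613 is seconds rather than milliseconds), one way to make that explicit is a date format on the field. This is only a sketch: the epoch_second format is available in newer Elasticsearch versions, so check the date format documentation for the version you are running. The fragment below would replace the event_timestamp entry in the mapping above:

"event_timestamp": {
  "type": "date",
  "format": "epoch_second"
}

With the value indexed as a real date, the factor workaround in the facet request should no longer be needed.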
