Elasticsearch rollover index with ingest pipeline - elasticsearch

I have a data stream built out in Elasticsearch through Kibana. I have all the right mappings, index patterns and settings. I created the index that matched the correct index pattern. All good so far.
I have an ingest pipeline that I created to ensure that any documents that come to ES get a @timestamp field before being ingested into the index.
PUT _ingest/pipeline/my_timestamp_pipeline
{
  "description": "Adds a field to a document with the time of ingestion",
  "processors": [
    {
      "set": {
        "field": "@timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
I apply the above pipeline to the index as follows:
PUT /<<index name>>/_settings
{
  "settings": {
    "default_pipeline": "my_timestamp_pipeline"
  }
}
Every time I do a manual rollover, the ingest pipeline setting gets dropped on the new write index and my documents fail to get indexed due to a missing @timestamp field, which is required as part of a data stream.
Do manual rollovers NOT support ingest pipelines, so that I have to manually re-apply the pipeline every time I do a manual rollover?
I checked that you can pass settings during a manual rollover of an index, but not during a rollover of a data stream. Am I missing anything obvious here?
Any help is appreciated
Thanks
Nick
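For reference, here is a minimal sketch of carrying the pipeline in the composable index template that the data stream matches (the template name my-timestamp-template and the index pattern are placeholders), since new backing indices created by a rollover take their settings from the template rather than from the previous write index:
PUT _index_template/my-timestamp-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.default_pipeline": "my_timestamp_pipeline"
    }
  }
}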

Related

How to add a custom index using an ingest node pipeline?

Is it possible to create conditional indexing using ingest node pipelines? I feel this could be done with the script processor, but can someone tell me if this is possible?
I am in a scenario where I need to decide on the best way to do custom indexing. I can specify conditions in the metricbeat.yml/filebeat.yml files to get this done, but is this the best way to do custom indexing? There is no Logstash in my Elastic Stack.
output.elasticsearch:
  indices:
    - index: "metricbeat-dev-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.equals:
        kubernetes.namespace: "dev"
This is how I have implemented custom indexing in Metricbeat/Filebeat right now. I have 20+ namespaces in my Kubernetes cluster. Please help me figure out whether this could be done with an ingest node pipeline or not.
Yes, you can achieve this with the ingest pipeline set processor. Ingest pipelines support access to metadata fields, so you can read and update the index name through the _index field.
Below is a sample ingest pipeline that updates the index name when the namespace is dev:
[
  {
    "set": {
      "field": "_index",
      "value": "metricbeat-dev",
      "if": "ctx.kubernetes?.namespace == 'dev'"
    }
  }
]
Update 1: to append the agent version to the index name. I have assumed the agent version field name is agent.version:
[
  {
    "set": {
      "field": "_index",
      "value": "metricbeat-dev-{{agent.version}}",
      "if": "ctx.kubernetes?.namespace == 'dev'"
    }
  }
]
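If it helps, the routing can be checked with the simulate API before wiring the pipeline into Beats. A test sketch (the agent version 7.17.0 and the sample document are made up):
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "field": "_index",
          "value": "metricbeat-dev-{{agent.version}}",
          "if": "ctx.kubernetes?.namespace == 'dev'"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "metricbeat-7.17.0-2022.01.01",
      "_source": {
        "agent": { "version": "7.17.0" },
        "kubernetes": { "namespace": "dev" }
      }
    }
  ]
}
The response should show the document's _index rewritten to metricbeat-dev-7.17.0.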

How to extract and visualize values from a log entry in OpenShift EFK stack

I have an OKD cluster set up with an EFK stack for logging, as described here. I have never worked with any of the components before.
One deployment logs requests that contain a specific value that I'm interested in. I would like to extract just this value and visualize it with an area map in Kibana that shows the amount of requests and where they come from.
The content of the message field basically looks like this:
[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}
This plz is a German zip code, which I would like to visualize as described.
My problem here is that I have no idea how to extract this value.
A nice first success would be if I could find it with a regexp, but Kibana doesn't seem to work the way I think it does. Following its docs, I expect this /\"plz\":\"[0-9]{5}\"/ to deliver the result, but I get 0 hits (the time interval is set correctly). Even if this regexp matched, I would only find the log entry that contains it and not just the specific value. How do I go on from here?
I guess I also need an external geocoding service, but at which point would I include it? Or does Kibana itself know how to map zip codes to geometries?
A beginner-friendly step-by-step guide would be perfect, but I could settle for some hints that guide me there.
It would be possible to parse the message field as the document gets indexed into ES, using an ingest pipeline with a grok processor.
First, create the ingest pipeline like this:
PUT _ingest/pipeline/parse-plz
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{POSINT:plz}"
        ]
      }
    }
  ]
}
Then, when you index your data, you simply reference that pipeline:
PUT plz/_doc/1?pipeline=parse-plz
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}"""
}
And you will end up with a document like the one below, which now has a field called plz with the 12345 value in it:
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
  "plz": "12345"
}
When indexing your document from Fluentd, you can specify a pipeline to be used in the configuration. If you can't or don't want to modify your Fluentd configuration, you can also define a default pipeline for your index that will kick in every time a new document is indexed. Simply run this on your index and you won't need to specify ?pipeline=parse-plz when indexing documents:
PUT index/_settings
{
  "index.default_pipeline": "parse-plz"
}
If you have several indexes, a better approach might be to define an index template instead, so that whenever a new index called project.foo-something is created, the settings are going to be applied:
PUT _template/project-indexes
{
  "index_patterns": ["project.foo*"],
  "settings": {
    "index.default_pipeline": "parse-plz"
  }
}
Now, in order to map that PLZ on a map, you'll first need to find a data set that provides you with geolocations for each PLZ.
You can then add a second processor in your pipeline in order to do the PLZ/ZIP to lat,lon mapping:
PUT _ingest/pipeline/parse-plz
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{POSINT:plz}"
        ]
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx.location = params[ctx.plz];",
        "params": {
          "12345": {"lat": 42.36, "lon": 7.33}
        }
      }
    }
  ]
}
Ultimately, your document will look like this and you'll be able to leverage the location field in a Kibana visualization:
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
  "plz": "12345",
  "location": {
    "lat": 42.36,
    "lon": 7.33
  }
}
So to sum it all up, it all boils down to only two things:
Create an ingest pipeline to parse documents as they get indexed
Create an index template for all project* indexes whose settings include the pipeline created in step 1
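One more detail worth noting (not covered above, so treat it as a hedged addition): to plot the location field on a Kibana map it generally needs to be mapped as geo_point, which could be added to the same index template, for example:
PUT _template/project-indexes
{
  "index_patterns": ["project.foo*"],
  "settings": {
    "index.default_pipeline": "parse-plz"
  },
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" }
    }
  }
}
On older Elasticsearch versions the mappings section would need to be nested under a type name.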

What is offline and online indexing in Elasticsearch, and when do we need to reindex?

What is offline and online indexing in Elasticsearch? I did my research but I couldn't find enough resources explaining what these terms mean. Any ideas? Also, when do we need to reindex? Any examples would be great.
The terms offline and online indexing are used here.
https://spark-summit.org/2014/wp-content/uploads/2014/07/Streamlining-Search-Indexing-using-Elastic-Search-and-Spark-Holden-Karau.pdf
Reindexing
The most basic form of reindexing just copies one index to another.
I have used this form of reindexing to change a mapping.
Elasticsearch doesn't allow you to change a mapping, so if you want to change a mapping you have to create a new index (index2) with a new mapping and then reindex. The reindex will fill that new mapping with the data of the old index.
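For example (a sketch; the field name and type are purely illustrative, and on older versions the mapping needs to be nested under a type name), the new index is created with its updated mapping first:
PUT /index2
{
  "mappings": {
    "properties": {
      "title": { "type": "keyword" }
    }
  }
}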
The command below will move everything from index to index2.
curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "index"
  },
  "dest": {
    "index": "index2"
  }
}'
You can also use reindexing to fill a new index with a part of the old one. You can do so by using a couple of parameters. The example below will copy the newest 1000 documents.
POST /_reindex
{
  "size": 1000,
  "source": {
    "index": "index",
    "sort": { "date": "desc" }
  },
  "dest": {
    "index": "index2"
  }
}
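Another way to copy only part of an index (a sketch; the date field and range are illustrative) is to filter the source with a query:
POST /_reindex
{
  "source": {
    "index": "index",
    "query": {
      "range": {
        "date": { "gte": "2020-01-01" }
      }
    }
  },
  "dest": {
    "index": "index2"
  }
}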
For more examples about reindexing please have a look at the official documentation.
Offline vs online indexing
In ONLINE mode the new index is built while the old index remains accessible for reads and writes; any update on the old index also gets applied to the new index.
In OFFLINE mode the table is locked up front for any read or write, and the new index is then built from the old index. No read or write operation is permitted on the table while the index is being rebuilt. Only when the operation is done is the lock on the table released and reads and writes are allowed again.

Timestamps and documents are not detected in Kibana

I have been using Elasticsearch for a while and wanted to visualize the data with Kibana. Since I have time-series data, I created a (from my point of view) suitable timestamp field in the corresponding index. The relevant part of the index mapping is as follows:
[..]
"properties": {
  "@timestamp": {
    "enabled": true,
    "type": "date",
    "format": "date_hour_minute_second_millis",
    "store": true,
    "path": "@timestamp"
  },
[..]
I have played around with the "format" field-value, because I want to visualize data having millisecond resolution. Ideally, I would just like to use the raw timestamp from my application (i.e. Unix epoch time, fractional in seconds), but I couldn't get Kibana to detect that format. Currently, I am posting data as follows:
{
  "@timestamp": "2015-03-10T14:37:42.644",
  "name": "some counter",
  "value": 91.76
}
Kibana detects the @timestamp field as a timestamp, but then tells me that it cannot find any documents having that field (which is not true):
This field is present in your elasticsearch mapping but not in any documents in the search results. You may still be able to visualize or search on it.
I should note that previously, I used "dateOptionalTime" as format for the timestamp, and everything was working fine in Kibana using "simple" timestamps. I need, however, to switch to milliseconds now.
Cheers!
I was struggling with this highly ambiguous problem as well. No matter how I changed the mapping, Kibana simply would not display certain fields.
Then I found out it has something to do with JVM Heap size and Kibana being unable to display every field 'in-memory'.
Setting "doc_values" to true for those fields in the mapping fixed the issue.
In the Java API, I build the following mapping, which includes the doc_values option:
XContentFactory.jsonBuilder()
    .startObject()
        .startObject("#memory-available")
            .field("type", "integer")
            .field("index", "not_analyzed")
            .field("doc_values", true)
        .endObject()
    .endObject();
Read more here: Doc Values in ES
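For comparison, a rough REST equivalent of that mapping (the index and type names are placeholders, and the syntax follows the same legacy not_analyzed style as the Java snippet):
PUT /metrics/_mapping/doc
{
  "properties": {
    "#memory-available": {
      "type": "integer",
      "index": "not_analyzed",
      "doc_values": true
    }
  }
}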

Do changes to elasticsearch mapping apply to already indexed documents?

If I change the mapping so certain properties have new/different boost values, does that work even if the documents have already been indexed? Or do the boost values get applied when the document is indexed?
You cannot change field-level boost factors after indexing data. It is not even possible to have a new boost applied to newly indexed data once the same fields have already been indexed for previous data.
The only way to change the boost factor is to reindex your data. The pattern for doing this without changing your application code is to use aliases. An alias points to a specific index. When you want to change the index, you create a new index, reindex the data from the old index into the new one, and finally switch the alias to point to the new index. Reindexing is either supported by the Elasticsearch client library or can be achieved with a scan/scroll.
First version of the mapping:
Index: items_v1
Alias: items -> items_v1
When a change is necessary, create a second version of the index with the new field-level boost values:
Create new index: items_v2
Reindex data: items_v1 => items_v2
Change alias: items -> items_v2
This might be useful in other situations where you want to change your mapping.
Field level boosts are, however, not recommended. The better approach is to use boosting at query time.
Alias commands are:
Adding an alias
POST /_aliases
{
  "actions": [
    { "add": {
        "alias": "items",
        "index": "items_v1"
    }}
  ]
}
Removing an alias
POST /_aliases
{
  "actions": [
    { "remove": {
        "alias": "items",
        "index": "items_v1"
    }}
  ]
}
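The alias switch in the last step can also be done atomically in a single request by combining a remove and an add action:
POST /_aliases
{
  "actions": [
    { "remove": { "alias": "items", "index": "items_v1" }},
    { "add": { "alias": "items", "index": "items_v2" }}
  ]
}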
They do not.
Index time boosting is generally not recommended. Instead, you should do your boosting when you search.
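For example (a sketch; the index and field names are made up), a boost can be applied per clause at query time:
GET /items/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": { "query": "phone", "boost": 3 } } },
        { "match": { "description": { "query": "phone" } } }
      ]
    }
  }
}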
