Conditional indexing in metricbeat using Ingest node pipeline creates a datastream - elasticsearch

I am trying to achieve conditional indexing per namespace in Elasticsearch using ingest node pipelines. I used the pipeline below, but when I add it in metricbeat.yml the index that gets created is a data stream.
PUT _ingest/pipeline/sample-pipeline
{
"processors": [
{
"set": {
"field": "_index",
"copy_from": "metricbeat-dev",
"if": "ctx.kubernetes?.namespace==\"dev\"",
"ignore_failure": true
}
}
]
}
The expected index name is metricbeat-dev, but the value I get in _index is .ds-metricbeat-dev.
This works fine when I test with one document, but when I configure it in the yml file the index name starts with .ds-. Why is this happening?
update for the template :
{
"metricbeat" : {
"order" : 1,
"index_patterns" : [
"metricbeat-*"
],
"settings" : {
"index" : {
"lifecycle" : {
"name" : "metricbeat",
"rollover_alias" : "metricbeat-metrics"
},

If data streams are enabled in an index template, writing to a matching name can create a data stream. Whether that happens depends on how the template priority is configured. In my experience, if no priority is set you get a legacy index, but if the index template sets a priority higher than 100 you get a data stream (the legacy index has priority 100, so use a value above 100 if you want the index to be a data stream).
If a data stream is created and that is not what you expect, check whether a template with data streams enabled matches the index you are writing to. That was the cause in my case.
I have been working with this for a few months and this is what I have observed.
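The .ds- prefix from the question is how the hidden backing indices of a data stream are named, so seeing it in _index suggests that a data-stream-enabled template matched. Below is a minimal Python sketch for spotting such names in a list of indices. It assumes the documented scheme .ds-<data-stream>-<yyyy.MM.dd>-<generation> (older 7.x releases omit the date part), and the helper name parse_backing_index is made up for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed naming scheme: .ds-<data-stream>-<yyyy.MM.dd>-<generation>.
# The date part is optional because older 7.x releases did not include it.
_BACKING_INDEX = re.compile(
    r"^\.ds-(?P<stream>.+?)(?:-(?P<date>\d{4}\.\d{2}\.\d{2}))?-(?P<generation>\d{6})$"
)


def parse_backing_index(index_name: str) -> Optional[Tuple[str, int]]:
    """Return (data_stream, generation), or None for a regular index name."""
    match = _BACKING_INDEX.match(index_name)
    if match is None:
        return None
    return match.group("stream"), int(match.group("generation"))


if __name__ == "__main__":
    for name in (".ds-metricbeat-dev-2021.05.01-000001", "metricbeat-dev"):
        print(name, "->", parse_backing_index(name))
```

If the helper returns a data stream name for an index you expected to be a plain index, the next thing to look at is which template matches that name.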

Related

Remove ECS data from metricbeat for smaller documents

I use the graphite beat to get graphite protocol metrics into es.
The metric document is much bigger than the metric data itself (timestamp, value, metric name).
I also get all the ECS data inserted and I think it will make my queries much slower (and my documents much bigger) and I don't need this data.
Can I remove the ECS data somehow in the metricbeat configuration?
You might be able to use Metricbeat's drop_fields processor, but it might not be able to remove all the fields you specify as some are added after the processor chain.
So, acting on the ES side will guarantee you that you can change the event source the way you like. Also if you have many Beats deployed, you only need to configure this in a single place.
One way to achieve this is to create an index template for Metricbeat events and attach an ingest pipeline to it.
PUT _index_template/my-template
{
"index_patterns" : [
"metricbeat-*"
],
"template" : {
"settings" : {
"index" : {
"lifecycle" : {
"name" : "metric-lifecycle"
},
"codec" : "best_compression",
"default_pipeline" : "metric-pipeline"
}
},
...
Then the metric-pipeline would simply look like this and remove all the fields listed in the field array:
PUT _ingest/pipeline/metric-pipeline
{
"processors": [
{
"remove": {
"field": ["agent", "host", "..."]
}
}
]
}
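To see what such a pipeline does to an event, here is a small Python sketch that mimics the remove processor on a dictionary. It is only a local simulation and makes some assumptions: it handles top-level fields only, the sample event and its field values are invented, and the ignore_missing flag mirrors the processor option of the same name (without it, a missing field is treated as an error).

```python
from typing import Any, Dict, Iterable


def remove_fields(
    doc: Dict[str, Any], fields: Iterable[str], ignore_missing: bool = False
) -> Dict[str, Any]:
    """Return a copy of doc without the given top-level fields."""
    slimmed = dict(doc)
    for field in fields:
        if field in slimmed:
            del slimmed[field]
        elif not ignore_missing:
            # Mirrors the processor failing when a listed field is absent.
            raise KeyError(f"field [{field}] not present in document")
    return slimmed


# Hypothetical event: only the timestamp and the metric are wanted.
event = {
    "@timestamp": "2021-01-01T00:00:00Z",
    "metric": {"name": "cpu.load", "value": 0.42},
    "agent": {"type": "metricbeat"},
    "host": {"name": "node-1"},
}

if __name__ == "__main__":
    print(remove_fields(event, ["agent", "host"]))
```

Events are not guaranteed to carry every listed field, so setting ignore_missing on the real processor is probably the safer choice.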

How to auto apply index policy to newly created indexes in AWS Elasticsearch

We push Nginx logs to AWS Elasticsearch using Filebeat and Logstash. We have created index patterns named nginx-error-logs* and nginx-access-logs*. We can see in Kibana that new indices are created daily based on the nginx log file date pattern. We created an index policy and applied it to the existing indices, but we would like to auto-apply the same ISM policy to all newly created indices in Elasticsearch. How can we achieve this?
Is this the correct format to apply in Devtools console?
PUT _template/testindex_template
{
"index_patterns": ["*"],
"settings": {
"opendistro.index_state_management.policy_id": "index_lifecycle_management_policy"
}
}
Or should that be applied on the filebeat or Logstash config?
opendistro.index_state_management.policy_id is deprecated
You have to add your index patterns to the ism_template field of the policy. Below is an example.
PUT _opendistro/_ism/policies/policy_name
{
"policy": {
"description": "Policy to manage indices",
"default_state": "hot",
"states" : [
{
"name" : "hot",
"actions" : [
{
"rollover" : {
"min_size" : "20gb",
"min_index_age" : "2d"
}
}
]
}
],
"ism_template": {
"index_patterns": [
"nginx-error-logs*", // sample index pattern
"nginx-access-logs*"
],
"priority": 100
}
}
}
Whenever a new index is created, its name is matched against the ism_template index patterns and the matching policy is applied.
If the same pattern appears in multiple policies, the policy with the highest priority is attached.
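The matching rule described above can be sketched in Python as follows. This is a local illustration, not the ISM implementation. It assumes a missing priority counts as 0, it uses fnmatch for wildcards (which also accepts ? and [] that index patterns may not), and on a priority tie it keeps the first policy seen, which is an arbitrary choice here. The policy ids are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional


def pick_policy(index_name: str, ism_templates: Dict[str, dict]) -> Optional[str]:
    """Return the id of the matching policy with the highest priority."""
    best_id: Optional[str] = None
    best_priority: Optional[int] = None
    for policy_id, template in ism_templates.items():
        patterns = template["index_patterns"]
        if not any(fnmatchcase(index_name, pattern) for pattern in patterns):
            continue
        priority = template.get("priority", 0)  # assumed default
        if best_priority is None or priority > best_priority:
            best_id, best_priority = policy_id, priority
    return best_id


# Hypothetical policies keyed by policy id, each holding its ism_template.
ism_templates = {
    "catch_all_policy": {"index_patterns": ["*"], "priority": 1},
    "nginx_policy": {
        "index_patterns": ["nginx-error-logs*", "nginx-access-logs*"],
        "priority": 100,
    },
}

if __name__ == "__main__":
    print(pick_policy("nginx-error-logs-2021.01.01", ism_templates))
    print(pick_policy("app-logs-2021.01.01", ism_templates))
```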

Issue setting up ElasticSearch Index Lifecycle policy with pipeline date index name

I'm new to setting up a proper lifecycle policy, so I'm hoping someone can please give me a hand with this. I have an existing index getting created on a weekly basis. This is a third-party integration (they provided me with the pipeline and index template for the incoming logs). Logs are being created weekly in the pattern "name-YYYY-MM-DD". I'm attempting to set up a lifecycle policy for these indexes so they transition from hot->warm->delete. So far, I have done the following:
Updated the index template to add the policy and set an alias:
{
"index": {
"lifecycle": {
"name": "Cloudflare",
"rollover_alias": "cloudflare"
},
"mapping": {
"ignore_malformed": "true"
},
"number_of_shards": "1",
"number_of_replicas": "1"
On the existing indexes, set the alias and which one is the "write" index:
POST /_aliases
{
"actions" : [
{
"add" : {
"index" : "cloudflare-2020-07-13",
"alias" : "cloudflare",
"is_write_index" : true
}
}
]
}
POST /_aliases
{
"actions" : [
{
"add" : {
"index" : "cloudflare-2020-07-06",
"alias" : "cloudflare",
"is_write_index" : false
}
}
]
}
Once I did that, I started seeing the following 2 errors (1 on each index):
ILM error #1
ILM error #2
I'm not sure why the "is not the write index" error is showing up on the older index. Perhaps this is because it is still "hot" and trying to move it to another phase without it being the write index?
For the second error, is this because the name of the index is wrong for rollover?
I'm also not clear if this is a good scenario for rollover. These indexes are being created weekly, which I assume is ok. I would think normally you would create a single index and let the policy split off the older ones based upon your criteria (size, age, etc). Should I change this or can I make this policy work with existing weekly files? In case you need it, here is part of the pipeline that I imported into ElasticSearch that I believe is responsible for the index naming:
{
"date_index_name" : {
"field" : "EdgeStartTimestamp",
"index_name_prefix" : "cloudflare-",
"date_rounding" : "w",
"timezone" : "UTC",
"date_formats" : [
"uuuu-MM-dd'T'HH:mm:ssX",
"uuuu-MM-dd'T'HH:mm:ss.SSSX",
"yyyy-MM-dd'T'HH:mm:ssZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSZ"
]
}
},
So, for me at the moment the more important error is the "number_format_exception". I'm thinking it is due to this setting I'm seeing in the index (provided_name):
{
"settings": {
"index": {
"lifecycle": {
"name": "Cloudflare",
"rollover_alias": "cloudflare"
},
"mapping": {
"ignore_malformed": "true"
},
"number_of_shards": "1",
"provided_name": "<cloudflare-{2020-07-20||/w{yyyy-MM-dd|UTC}}>",
"creation_date": "1595203589799",
"priority": "100",
"number_of_replicas": "1",
I believe this "provided_name" is getting established from the pipeline's "date_index_name" I provided above. If this is the issue, is there a way to create a fixed index name via the ingest pipeline without it changing based upon the date? I would rather just create a fixed index and let the lifecycle policy handle the split offs (i.e. 0001, 0002, etc).
I've been looking for a way to create a fixed index name without the "date_index_name" processor, but I haven't found a way to do this yet. Or, if I can create an index name with a date and add a suffix that would allow the LifeCycle policy manager (ILM) to add the incremental number at the end, that might work as well. Any help here would be greatly appreciated!
The main issue is that the existing indexes do not end with a sequence number (i.e. 0001, 0002, etc), hence the ILM doesn't really know how to proceed.
The name of this index must match the template’s index pattern and end with a number
You'd be better off letting ILM manage the index creation and rollover, since that's exactly what it's supposed to do. All you need to do is to keep writing to the same cloudflare alias and that's it. No need for a date_index_name ingest processor.
So your index template is correct as it is.
Next, you need to bootstrap the initial index:
PUT cloudflare-2020-08-11-000001
{
"aliases": {
"cloudflare": {
"is_write_index": true
}
}
}
You can then either reindex your old indices into ILM-managed indices or apply lifecycle policies to your old indices.
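To make the "end with a number" requirement concrete, here is a small Python sketch of how a rollover target name can be derived from the current write index. It assumes the behaviour described in the rollover documentation, where the trailing number is incremented and written as six zero-padded digits. The function name next_rollover_name is made up, and date math names such as the provided_name from the question are not handled.

```python
import re

# A rollover-friendly name ends with "-" followed by digits.
_ENDS_WITH_NUMBER = re.compile(r"^(?P<prefix>.*-)(?P<counter>\d+)$")


def next_rollover_name(index_name: str) -> str:
    """Increment the trailing counter, zero-padded to six digits."""
    match = _ENDS_WITH_NUMBER.match(index_name)
    if match is None:
        raise ValueError(f"index name [{index_name}] does not end with a number")
    counter = int(match.group("counter")) + 1
    return f"{match.group('prefix')}{counter:06d}"


if __name__ == "__main__":
    print(next_rollover_name("cloudflare-2020-08-11-000001"))
```

With the bootstrapped index above, each rollover moves the cloudflare write alias to the next name in the sequence, so clients keep writing to the alias only.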

How to dynamically change destination index in continuous Elasticsearch transforms?

I am trying to extract some high level metrics from the log data we store in Elasticsearch. To achieve this I am running a number of continuous transforms to generate more meaningful high level logs.
I have included a dest block in my transform definition JSON, as follows:
"dest": {
"index": "transform_index" + date
}
But the aforementioned code is evaluated only once on transform creation time, and is not updated in future sync cycles.
I am looking for a solution to change the transform index on a monthly basis and I think it is doable using a pipeline. However, I am not sure how.
Any pointers are appreciated.
I've read through the documentation and found my answer. I managed to achieve what I needed using pipelines. I created a pipeline as follows:
PUT /_ingest/pipeline/add_timestamp_pipeline
{
"processors" : [
{
//copy timestamp field from transform source
"set" : {
"field" : "@timestamp",
"value" : "{{@timestamp}}"
}
},
{
//create indices based on @timestamp rounded to month
"date_index_name" : {
"field" : "@timestamp",
"index_name_prefix" : "hourly-activity-index-",
"date_rounding" : "M",
"date_formats" : ["UNIX_MS"]
}
}
]
}
Then you use the created pipeline in your transform:
PUT /_transform/hourly_transform
{
"dest" : {
"index" : "hourly_activity_index",
"pipeline" : "add_timestamp_pipeline"
},
//rest of the transform definition
}
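To check which destination index a given document would end up in, the month rounding can be reproduced locally. The Python sketch below assumes the date_index_name defaults (UTC timezone and the yyyy-MM-dd index name format) and a timestamp in epoch milliseconds, as in the UNIX_MS format above. The function name monthly_index_name is hypothetical.

```python
from datetime import datetime, timezone


def monthly_index_name(
    epoch_millis: int, prefix: str = "hourly-activity-index-"
) -> str:
    """Round a UNIX_MS timestamp down to the start of its month (UTC)."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    month_start = moment.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # Assumed default index_name_format: yyyy-MM-dd
    return prefix + month_start.strftime("%Y-%m-%d")


if __name__ == "__main__":
    sample = int(datetime(2021, 3, 15, 12, tzinfo=timezone.utc).timestamp() * 1000)
    print(monthly_index_name(sample))
```

Every document from the same calendar month maps to the same name, so the transform effectively switches to a new destination index once per month.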

Is it possible to organize data between elasticsearch shards based on stored data?

I want to build a data store with three nodes. The first one should keep all data, the second one data of the last month, the third data of the last week. Is it possible to automatically configure elasticsearch shards to relocate themselves between nodes so that this functionality is given?
If you want to move existing data (shards) from one node to another, you can use _cluster/reroute.
But using this together with automatic allocation can be dangerous, because right after an index is moved to the target node the cluster will try to rebalance itself evenly.
Alternatively, you can disable automatic allocation. In that case only custom allocations take effect, which can be really risky to manage for a large data set.
POST /_cluster/reroute
{
"commands" : [
{
"move" : {
"index" : "test", "shard" : 0,
"from_node" : "node1", "to_node" : "node2"
}
},
{
"allocate_replica" : {
"index" : "test", "shard" : 1,
"node" : "node3"
}
}
]
}
Source: Elasticsearch rerouting
Also, you should read this: Customize document routing
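If several shards have to be moved at once, the request body for _cluster/reroute can be generated instead of written by hand. A minimal Python sketch, assuming only the move command shown above. The helper name build_move_commands is made up, and the index and node names are the placeholders from the example.

```python
import json
from typing import Dict, List


def build_move_commands(
    index: str, shards: List[int], from_node: str, to_node: str
) -> Dict[str, list]:
    """Build a reroute body with one move command per shard."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
            for shard in shards
        ]
    }


if __name__ == "__main__":
    body = build_move_commands("test", [0, 1], "node1", "node2")
    print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of POST /_cluster/reroute. The rebalancing caveat above still applies.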

Resources