How to auto-apply an index policy to newly created indices in AWS Elasticsearch

We push Nginx logs to AWS Elasticsearch using Filebeat and Logstash. We have created index patterns named nginx-error-logs* and nginx-access-logs*, and we can see in Kibana that new indices are created daily based on the Nginx log file date pattern. We created an index policy and applied it to the existing indices, but we would like the same ISM policy to be applied automatically to all newly created indices in Elasticsearch. How can we achieve this?
Is this the correct format to apply in the Dev Tools console?
PUT _template/testindex_template
{
  "index_patterns": ["*"],
  "settings": {
    "opendistro.index_state_management.policy_id": "index_lifecycle_management_policy"
  }
}
Or should that be applied in the Filebeat or Logstash config?

opendistro.index_state_management.policy_id is deprecated.
You have to add your index patterns to the ism_template section of the policy. Below is an example using your sample index patterns.
PUT _opendistro/_ism/policies/policy_name
{
  "policy": {
    "description": "Policy to manage indices",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_size": "20gb",
              "min_index_age": "2d"
            }
          }
        ]
      }
    ],
    "ism_template": {
      "index_patterns": [
        "nginx-error-logs*",
        "nginx-access-logs*"
      ],
      "priority": 100
    }
  }
}
Whenever a new index is created, its name is matched against the ism_template index patterns and the corresponding policy is applied automatically.
If the same pattern appears in multiple policies, the policy with the highest priority is attached.
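To check that a policy was attached to a newly created index, you can call the ISM explain API; the index name below is only an illustration of the daily naming pattern:
GET _opendistro/_ism/explain/nginx-access-logs-2021.01.01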

Related

Remove ECS data from metricbeat for smaller documents

I use the Graphite beat to get Graphite protocol metrics into Elasticsearch.
The metric document is much bigger than the metric data itself (timestamp, value, metric name).
All the ECS data also gets inserted; I think it will make my queries slower (and my documents much bigger), and I don't need it.
Can I remove the ECS data somehow in the Metricbeat configuration?
You might be able to use Metricbeat's drop_fields processor, but it may not remove all the fields you specify, as some are added after the processor chain runs.
Acting on the Elasticsearch side therefore guarantees that you can change the event source the way you like. Also, if you have many Beats deployed, you only need to configure this in a single place.
One way to achieve this is to create an index template for Metricbeat events and attach an ingest pipeline to it.
PUT _index_template/my-template
{
  "index_patterns": [
    "metricbeat-*"
  ],
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "metric-lifecycle"
        },
        "codec": "best_compression",
        "default_pipeline": "metric-pipeline"
      }
    },
    ...
Then the metric-pipeline would simply look like this and remove all the fields listed in the field array:
PUT _ingest/pipeline/metric-pipeline
{
  "processors": [
    {
      "remove": {
        "field": ["agent", "host", "..."]
      }
    }
  ]
}
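If you want to test the pipeline before wiring it into the template, you can dry-run it with the simulate API. Note that the "..." placeholder above must be replaced with real field names first, otherwise the remove processor will fail on the missing field; the sample document below is made up for illustration:
POST _ingest/pipeline/metric-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "agent": { "type": "metricbeat" },
        "host": { "name": "web-01" },
        "graphite_metric": "cpu.load",
        "value": 0.42
      }
    }
  ]
}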

Conditional indexing in metricbeat using Ingest node pipeline creates a datastream

I am trying to achieve conditional indexing for namespaces in Elasticsearch using ingest node pipelines. I used the pipeline below, but the index that gets created when I add the pipeline to metricbeat.yml is in the form of a data stream.
PUT _ingest/pipeline/sample-pipeline
{
  "processors": [
    {
      "set": {
        "field": "_index",
        "value": "metricbeat-dev",
        "if": "ctx.kubernetes?.namespace == \"dev\"",
        "ignore_failure": true
      }
    }
  ]
}
The expected index name is metricbeat-dev, but the value I get in _index is .ds-metricbeat-dev.
This works fine when I test with one document, but when I implement it in the yml file I get an index name starting with .ds-. Why is this happening?
Update: here is the template:
{
  "metricbeat": {
    "order": 1,
    "index_patterns": [
      "metricbeat-*"
    ],
    "settings": {
      "index": {
        "lifecycle": {
          "name": "metricbeat",
          "rollover_alias": "metricbeat-metrics"
        },
        ...
If an index template matching the index name has data streams enabled, it can create a data stream instead of a regular index. Which outcome you get depends on how the template priority is configured: in my observation, if no priority is mentioned you get a legacy index, but a matching template with data streams enabled and a priority above 100 takes precedence, and writes then create a data stream (legacy behavior sits at priority 100, so use a higher value if you do want a data stream).
If a data stream is created and that is not expected, check whether there is a template with data streams enabled pointing at the index you are writing to; this was the reason in my case.
I have been working with this for a few months, and this is what I have observed.
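To find out which template would be applied to a given index name, and whether it enables data streams, you can use the simulate index API (available in recent Elasticsearch versions); the response includes the winning template and, if present, its data_stream section:
POST _index_template/_simulate_index/metricbeat-dev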

elasticsearch template doesn't change index ILM

In my Elasticsearch I receive daily indices with a format like dstack-prod_dcbs-<date> (e.g. dstack-prod_dcbs-20200821). I want to add ILM to them immediately after they are received, but I don't know why the ILM policy is not being added to the indices. Below you can find my command (I have already defined the "dstack-prod_dcbs-policy" ILM policy).
PUT _template/dstack-prod_dcbs
{
  "index_patterns": ["dstack-prod_dcbs-*"],
  "settings": {
    "index.lifecycle.name": "dstack-prod_dcbs-policy"
  }
}
but when I run
GET dstack-prod_dcbs/_ilm/explain
the following result is returned:
{
  "indices": {
    "dstack-prod_dcbs-20200821": {
      "index": "dstack-prod_dcbs-20200821",
      "managed": false
    },
    "dstack-prod_dcbs-2020-09-22": {
      "index": "dstack-prod_dcbs-2020-09-22",
      "managed": false
    }
  }
}
I believe ILM is an alternative to using daily indices: indices are rolled over when a condition in the policy is met, not when a new day starts.
For ILM you need to define a rollover alias in the template:
PUT _template/dstack-prod_dcbs
{
  "index_patterns": ["dstack-prod_dcbs-*"],
  "settings": {
    "index.lifecycle.name": "dstack-prod_dcbs-policy",
    "index.lifecycle.rollover_alias": "dstack-prod_dcbs"
  }
}
Then you need to create the first index manually and assign it as the write index for the alias
PUT dstack-prod_dcbs-000001
{
  "aliases": {
    "dstack-prod_dcbs": {
      "is_write_index": true
    }
  }
}
After that, everything will be handled automatically: a new index will be created on rollover and then assigned as the write index for the alias.
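To confirm, you can re-run the explain request from the question; after bootstrapping, the indices matched by the policy should report "managed": true:
GET dstack-prod_dcbs-*/_ilm/explain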

Issue setting up ElasticSearch Index Lifecycle policy with pipeline date index name

I'm new to setting up a proper lifecycle policy, so I'm hoping someone can give me a hand with this. I have an existing index getting created on a weekly basis. This is a third-party integration (they provided me with the pipeline and index template for the incoming logs). Logs are being created weekly in the pattern "name-YYYY-MM-DD". I'm attempting to set up a lifecycle policy for these indexes so they transition from hot -> warm -> delete. So far, I have done the following:
Updated the index template to add the policy and set an alias:
{
  "index": {
    "lifecycle": {
      "name": "Cloudflare",
      "rollover_alias": "cloudflare"
    },
    "mapping": {
      "ignore_malformed": "true"
    },
    "number_of_shards": "1",
    "number_of_replicas": "1"
    ...
On the existing indexes, set the alias and which one is the "write" index:
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "cloudflare-2020-07-13",
        "alias": "cloudflare",
        "is_write_index": true
      }
    }
  ]
}
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "cloudflare-2020-07-06",
        "alias": "cloudflare",
        "is_write_index": false
      }
    }
  ]
}
Once I did that, I started seeing the following two errors (one on each index):
[screenshot: ILM error #1]
[screenshot: ILM error #2]
I'm not sure why the "is not the write index" error shows up on the older index. Perhaps this is because it is still "hot" and the policy is trying to move it to another phase while it is not the write index?
For the second error, is this because the name of the index is wrong for rollover?
I'm also not clear whether this is a good scenario for rollover. These indexes are created weekly, which I assume is OK. I would think you would normally create a single index and let the policy split off the older ones based on your criteria (size, age, etc.). Should I change this, or can I make the policy work with the existing weekly indices? In case you need it, here is the part of the pipeline that I imported into Elasticsearch that I believe is responsible for the index naming:
{
  "date_index_name": {
    "field": "EdgeStartTimestamp",
    "index_name_prefix": "cloudflare-",
    "date_rounding": "w",
    "timezone": "UTC",
    "date_formats": [
      "uuuu-MM-dd'T'HH:mm:ssX",
      "uuuu-MM-dd'T'HH:mm:ss.SSSX",
      "yyyy-MM-dd'T'HH:mm:ssZ",
      "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
    ]
  }
},
So, for me the more important error at the moment is the number_format_exception. I think it is caused by this setting I see in the index (provided_name):
{
  "settings": {
    "index": {
      "lifecycle": {
        "name": "Cloudflare",
        "rollover_alias": "cloudflare"
      },
      "mapping": {
        "ignore_malformed": "true"
      },
      "number_of_shards": "1",
      "provided_name": "<cloudflare-{2020-07-20||/w{yyyy-MM-dd|UTC}}>",
      "creation_date": "1595203589799",
      "priority": "100",
      "number_of_replicas": "1",
      ...
I've been looking for a way to create a fixed index name without the date_index_name processor, but I haven't found one yet. Alternatively, if I could create an index name with a date plus a suffix that lets the lifecycle policy manager (ILM) add the incremental number at the end, that might work as well. Any help here would be greatly appreciated!
The main issue is that the existing indexes do not end with a sequence number (i.e. 0001, 0002, etc.), so ILM doesn't really know how to proceed. As the documentation states:
The name of this index must match the template's index pattern and end with a number.
You'd be better off letting ILM manage the index creation and rollover, since that's exactly what it's supposed to do. All you need to do is to keep writing to the same cloudflare alias and that's it. No need for a date_index_name ingest processor.
So your index template is correct as it is.
Next you need to bootstrap the initial index
PUT cloudflare-2020-08-11-000001
{
  "aliases": {
    "cloudflare": {
      "is_write_index": true
    }
  }
}
You can then either reindex your old indices into ILM-managed indices or apply lifecycle policies to your old indices.
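If you go the reindex route, a minimal sketch could look like the following; the source index is one of the weekly indices mentioned above, and "op_type": "create" only copies documents that don't already exist in the destination:
POST _reindex
{
  "source": {
    "index": "cloudflare-2020-07-06"
  },
  "dest": {
    "index": "cloudflare",
    "op_type": "create"
  }
}
Since cloudflare is the alias with a designated write index, the reindexed documents land in the current ILM-managed index.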

Get elasticsearch indices before specific date

My Logstash service sends the logs to Elasticsearch as daily indices.
elasticsearch {
  hosts => [ "127.0.0.1:9200" ]
  index => "%{type}-%{+YYYY.MM.dd}"
}
Does Elasticsearch provide an API to look up indices created before a specific date?
For example, how could I get the indices created before 2015-12-15?
The only time I really care about when indices were created is when I want to close/delete them using Curator. Curator has "age"-type filters built in, if that's also your use case.
I think you are looking for the indices query; have a look here.
Here is an example:
GET /_search
{
  "query": {
    "indices": {
      "query": {
        "term": { "description": "*" }
      },
      "indices": ["2015-01-*", "2015-12-*"],
      "no_match_query": "none"
    }
  }
}
Each index has a creation_date setting.
Since the number of indices is expected to be quite small, there is no such feature as "searching for indices"; you just get their metadata and filter them inside your app. The creation_date is also available via the _cat API.
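For example, the _cat/indices API can list each index with its creation date and sort by it (column names as documented for the _cat API); you can then filter the output by date in your app or script:
GET _cat/indices?h=index,creation.date.string&s=creation.date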
