Issue setting up ElasticSearch Index Lifecycle policy with pipeline date index name - elasticsearch

I'm new to setting up a proper lifecycle policy, so I'm hoping someone can give me a hand with this. I have an existing index getting created on a weekly basis by a third-party integration (they provided me with the pipeline and index template for the incoming logs). Logs are being written to weekly indexes in the pattern "name-YYYY-MM-DD". I'm attempting to set up a lifecycle policy for these indexes so they transition from hot->warm->delete. So far, I have done the following:
Updated the index template to add the policy and set an alias:
{
  "index": {
    "lifecycle": {
      "name": "Cloudflare",
      "rollover_alias": "cloudflare"
    },
    "mapping": {
      "ignore_malformed": "true"
    },
    "number_of_shards": "1",
    "number_of_replicas": "1"
  }
}
On the existing indexes, set the alias and which one is the "write" index:
POST /_aliases
{
  "actions" : [
    {
      "add" : {
        "index" : "cloudflare-2020-07-13",
        "alias" : "cloudflare",
        "is_write_index" : true
      }
    }
  ]
}
POST /_aliases
{
  "actions" : [
    {
      "add" : {
        "index" : "cloudflare-2020-07-06",
        "alias" : "cloudflare",
        "is_write_index" : false
      }
    }
  ]
}
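Both updates can also be combined into one atomic _aliases call, which avoids any window where the alias has no write index; a minimal equivalent sketch:
POST /_aliases
{
  "actions" : [
    { "add" : { "index" : "cloudflare-2020-07-13", "alias" : "cloudflare", "is_write_index" : true } },
    { "add" : { "index" : "cloudflare-2020-07-06", "alias" : "cloudflare", "is_write_index" : false } }
  ]
}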
Once I did that, I started seeing the following 2 errors (1 on each index):
ILM error #1
ILM error #2
I'm not sure why the "is not the write index" error is showing up on the older index. Perhaps this is because it is still "hot" and ILM is trying to move it to another phase even though it is not the write index?
For the second error, is this because the name of the index is wrong for rollover?
I'm also not clear on whether this is a good scenario for rollover. These indexes are being created weekly, which I assume is OK. I would think that normally you would create a single index and let the policy split off the older ones based on your criteria (size, age, etc.). Should I change this, or can I make this policy work with the existing weekly indexes? In case you need it, here is the part of the pipeline that I imported into ElasticSearch that I believe is responsible for the index naming:
{
  "date_index_name" : {
    "field" : "EdgeStartTimestamp",
    "index_name_prefix" : "cloudflare-",
    "date_rounding" : "w",
    "timezone" : "UTC",
    "date_formats" : [
      "uuuu-MM-dd'T'HH:mm:ssX",
      "uuuu-MM-dd'T'HH:mm:ss.SSSX",
      "yyyy-MM-dd'T'HH:mm:ssZ",
      "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
    ]
  }
},
So, for me at the moment the more important error is the "number_format_exception". I'm thinking it is due to this setting I'm seeing in the index (provided_name):
{
  "settings": {
    "index": {
      "lifecycle": {
        "name": "Cloudflare",
        "rollover_alias": "cloudflare"
      },
      "mapping": {
        "ignore_malformed": "true"
      },
      "number_of_shards": "1",
      "provided_name": "<cloudflare-{2020-07-20||/w{yyyy-MM-dd|UTC}}>",
      "creation_date": "1595203589799",
      "priority": "100",
      "number_of_replicas": "1",
      ...
I believe this "provided_name" is getting established from the pipeline's "date_index_name" I provided above. If this is the issue, is there a way to create a fixed index name via the ingest pipeline without it changing based upon the date? I would rather just create a fixed index and let the lifecycle policy handle the split offs (i.e. 0001, 0002, etc).
I've been looking for a way to create a fixed index name without the "date_index_name" processor, but I haven't found a way to do this yet. Or, if I can create an index name with a date and add a suffix that would allow the LifeCycle policy manager (ILM) to add the incremental number at the end, that might work as well. Any help here would be greatly appreciated!

The main issue is that the existing indexes do not end with a sequence number (i.e. -000001, -000002, etc.), hence ILM doesn't really know how to proceed:
The name of this index must match the template’s index pattern and end with a number
You'd be better off letting ILM manage the index creation and rollover, since that's exactly what it's supposed to do. All you need to do is to keep writing to the same cloudflare alias and that's it. No need for a date_index_name ingest processor.
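If you do want to keep an ingest pipeline in the picture, one way to drop the date-based naming is to replace the date_index_name processor with a set processor that writes the rollover alias into the _index metadata field, so every document is routed to the current ILM write index. A sketch only; the pipeline id cloudflare-pipeline is an assumption, not the integration's actual id:
PUT _ingest/pipeline/cloudflare-pipeline
{
  "description" : "Route all documents to the 'cloudflare' rollover alias instead of a dated index",
  "processors" : [
    {
      "set" : {
        "field" : "_index",
        "value" : "cloudflare"
      }
    }
  ]
}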
So your index template is correct as it is.
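For context, the question doesn't show the Cloudflare policy itself; a minimal hot->warm->delete policy with rollover might look like the sketch below (all ages and sizes are illustrative assumptions, not values from the question):
PUT _ilm/policy/Cloudflare
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "7d", "max_size": "50gb" }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "set_priority": { "priority": 50 }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}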
Next, you need to bootstrap the initial index:
PUT cloudflare-2020-08-11-000001
{
  "aliases": {
    "cloudflare": {
      "is_write_index": true
    }
  }
}
You can then either reindex your old indices into ILM-managed indices or apply lifecycle policies to your old indices.
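For the second option, a sketch of attaching a policy to the existing weekly indices via their settings; Cloudflare-existing here is a hypothetical second policy without a rollover action, which avoids the "is not the write index" error on indices that are not the write index:
PUT cloudflare-2020-07-*/_settings
{
  "index": {
    "lifecycle": {
      "name": "Cloudflare-existing"
    }
  }
}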

Related

What is the best way to update cache in elasticsearch

I'm using an elasticsearch index as a cache table.
My document structure is the following:
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "query_str": {"type": "text"},
      "search_results": {
        "type": "object",
        "enabled": false
      },
      "query_embedding": {
        "type": "dense_vector",
        "dims": 768
      }
    }
  }
}
The cache search is performed via embedding vector similarity. So if the embedding of the new query is close enough to a cached one, it is considered a cache hit, and the search_results field is returned to the user.
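For context, a lookup along these lines can be done with a script_score query over the dense_vector field. A minimal sketch only: the index name cache_index is hypothetical, the 768-dim query vector is abbreviated to three values for readability (as written it would be rejected against a dims: 768 mapping), and the min_score of 1.95 corresponds to a cosine similarity of roughly 0.95 since the script shifts scores by +1:
GET cache_index/_search
{
  "size": 1,
  "min_score": 1.95,
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'query_embedding') + 1.0",
        "params": { "query_vector": [0.12, -0.03, 0.27] }
      }
    }
  }
}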
The problem is that I need to update the cached results about once an hour, and I don't want my service to lose the ability to use the cache efficiently while the update is running, so I'm not sure which of these solutions is best:
Sequentially update documents one by one, so the index won't be destroyed. The drawback I'm afraid of is that every update causes index rebuilding, so cache requests will become slow.
Create an entirely new index with the new results and then somehow swap the current cache index for the new one. The drawbacks I see are:
a) I've found no elegant way to swap indexes
b) Users will get their cached results later than in solution (1)
I would go with #2, as every time you update a document the cache is flushed.
There is an elegant way to swap indices:
You have an alias that points to your current index, you fill a new index with the fresh records, and then you point this alias to the new index.
Something like this:
Current index name is items-2022-11-26-001
Create alias items pointing to items-2022-11-26-001
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "items-2022-11-26-001",
        "alias": "items"
      }
    }
  ]
}
Create new index with fresh data items-2022-11-26-002
When it finishes, now point the items alias to items-2022-11-26-002
POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "items-2022-11-26-001",
        "alias": "items"
      }
    },
    {
      "add": {
        "index": "items-2022-11-26-002",
        "alias": "items"
      }
    }
  ]
}
Delete items-2022-11-26-001
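For completeness, the final cleanup step in console form:
DELETE items-2022-11-26-001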
You run all your queries against the "items" alias, which will act as an index.
References:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

How to update data type of a field in elasticsearch

I am publishing data to elasticsearch using fluentd. It has a field Data.CPU which is currently set to string. The index name is health_gateway.
I have made some changes in the python code which generates the data, so this field Data.CPU has now become an integer. But elasticsearch is still showing it as string. How can I update its data type?
I tried running below commands in kibana dev tools:
PUT health_gateway/doc/_mapping
{
  "doc" : {
    "properties" : {
      "Data.CPU" : {"type" : "integer"}
    }
  }
}
But it gave me below error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
  },
  "status" : 400
}
There is also this document which says we can convert the data type using mutate, but I am not able to understand it properly.
I do not want to delete the index and recreate it, as I have created a visualization based on this index, and after deleting, that will be deleted too. Can anyone please help with this?
The short answer is that you can't change the mapping of a field that already exists in a given index, as explained in the official docs.
The specific error you got is because you included /doc/ in your request path (you probably wanted /<index>/_mapping), but fixing this alone won't be sufficient.
Finally, I'm not sure you really have a dot in the field name there. Last I heard it wasn't possible to use dots in field names.
Nevertheless, there are several ways forward in your situation... here are a couple of them:
Use a scripted field
You can add a scripted field to the Kibana index-pattern. It's quick to implement, but has major performance implications. You can read more about them on the Elastic blog here (especially under the heading "Match a number and return that match").
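For illustration only, a scripted field along those lines could parse the string value with Painless. This sketch assumes the text field has a keyword sub-field; the name Data.CPU.keyword is an assumption, not something shown in the question:
// Kibana scripted field (Painless); Data.CPU.keyword is an assumed keyword sub-field
if (doc['Data.CPU.keyword'].size() == 0) return null;
return Integer.parseInt(doc['Data.CPU.keyword'].value);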
Add a new multi-field
You could add a new multifield. The example below assumes that CPU is a nested field under Data, rather than really being called Data.CPU with a literal .:
PUT health_gateway/_mapping
{
  "properties": {
    "Data": {
      "properties": {
        "CPU": {
          "type": "keyword",
          "fields": {
            "int": {
              "type": "short"
            }
          }
        }
      }
    }
  }
}
Reindex your data within ES
Use the Reindex API. Be sure to set the correct mapping on the target index.
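A minimal sketch, assuming a new index (hypothetically named health_gateway_v2) has already been created with Data.CPU mapped as an integer:
POST _reindex
{
  "source": { "index": "health_gateway" },
  "dest": { "index": "health_gateway_v2" }
}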
Delete and reindex everything from source
If you are able to regenerate the data from source in a timely manner, without disrupting users, you can simply delete the index and reingest all your data with an updated mapping.
You can update the mapping by indexing the same field in multiple ways, i.e. by using multi-fields.
Using the below mapping, Data.CPU.raw will be of integer type:
{
  "mappings": {
    "properties": {
      "Data": {
        "properties": {
          "CPU": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "integer"
              }
            }
          }
        }
      }
    }
  }
}
OR you can create a new index with the correct mapping and reindex the data into it using the reindex API.

Update configuration for actively used index without data loss

Sometimes I need to update mappings or settings on an actively used index, or bind a default pipeline to it.
For the time being, I am using a method that loses data, as follows:
update the index template with the proper mapping (or bind the default pipeline via index.default_pipeline);
create a_new_index (matching the template's index_patterns);
reindex the index_to_fix into a_new_index to migrate the data already indexed;
use an alias to redirect incoming indexing requests to a_new_index (the alias has the same name as index_to_fix, so indexing is undisturbed) and delete index_to_fix.
But between step 3 and step 4 there is a time gap, during which newly indexed documents still land in the original index_to_fix and are lost when it is deleted.
Is there a way to update the configuration of an actively used index without any data loss?
Thanks to the help of @LeBigCat and some discussion, I think this problem can be solved in three steps.
Use Alias for CRUD
First things first: try not to use an index directly; use an alias whenever possible. Since you can't create an alias with the same name as an existing index, you can't transparently replace an index that clients write to directly, even if it's broken (badly designed). The easiest way is to use a template and include the index name directly in the alias.
PUT _template/test
{
  ...
  "aliases" : {
    "{index}-alias" : {}
  }
}
Redirect the Indexing
Since index_to_fix is being actively used, after updating the template and creating a new index a_new_index, we can use the alias to redirect indexing to a_new_index.
POST /_aliases
{
  "actions" : [
    { "add":    { "index": "a_new_index",  "alias": "index_to_fix-alias" } },
    { "remove": { "index": "index_to_fix", "alias": "index_to_fix-alias" } }
  ]
}
Migrating the Data
Simply use _reindex to migrate all the data from index_to_fix to a_new_index.
POST _reindex
{
  "source": {
    "index": "index_to_fix"
  },
  "dest": {
    "index": "index_to_fix-alias"
  }
}

ElasticSearch - what is the difference between an index template and an index pattern

I have read an explanation to my question here:
https://discuss.elastic.co/t/whats-the-differece-between-index-pattern-and-index-template/54948
However, I still don't understand the difference. When defining an index PATTERN, does it not affect index creation at all? Also, what happens if I create an index but it doesn't have a corresponding index pattern? How can I see the mapping used for an index pattern so I can know how to use the Mapping API to update it?
And on a side note, the docs say you manage the index patterns by clicking the "Settings" and then "Indices" tab. I'm looking at Kibana and I don't see any settings tab. I can view the index patterns through the management tab, but I don't see any settings tab there
An index template is an ES feature that kicks in whenever a new index is created whose name matches a given pattern: the template's settings and mappings are applied to that new index. For instance, let's say we create the following index template:
PUT _template/template_1
{
  "index_patterns": ["foo*"],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    ...
  }
}
As you can see, as soon as we want to index a document inside an index named (e.g.) foo-44 and that index doesn't exist, then that template (settings + mappings) will be used by ES in order to create the foo-44 index automatically.
You can update an index template at any time by simply PUTting a new settings/mappings definition like above.
An index pattern (not to be confused with the index_patterns property you saw above; those are two totally different things) is a Kibana feature for telling Kibana what makes up an index (all the fields, their types, etc.). Nothing can happen in Kibana without creating index patterns, which you can do in Management > Index Patterns.
Creating an index in ES will not create any index pattern in Kibana. Similarly, creating an index pattern in Kibana will not create any index in ES.
The reason Kibana needs an index pattern is that it needs to store a different kind of information than what is available in an index mapping. For instance, let's say you create an index with the following mapping:
PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "timestamp": {
          "type": "date"
        },
        "name": {
          "type": "text"
        }
      }
    }
  }
}
Then the corresponding index pattern that you will create in Kibana will have the following content:
GET .kibana/doc/index-pattern:16a98050-a53f-11e8-82ab-af0d48c6ddd8
{
  "type": "index-pattern",
  "updated_at": "2018-08-21T12:38:22.509Z",
  "index-pattern": {
    "title": "my_index*",
    "timeFieldName": "timestamp",
    "fields": """[{"name":"_id","type":"string","count":0,"scripted":false,"searchable":true,"aggregatable":true,"readFromDocValues":false},{"name":"_index","type":"string","count":0,"scripted":false,"searchable":true,"aggregatable":true,"readFromDocValues":false},{"name":"_score","type":"number","count":0,"scripted":false,"searchable":false,"aggregatable":false,"readFromDocValues":false},{"name":"_source","type":"_source","count":0,"scripted":false,"searchable":false,"aggregatable":false,"readFromDocValues":false},{"name":"_type","type":"string","count":0,"scripted":false,"searchable":true,"aggregatable":true,"readFromDocValues":false},{"name":"name","type":"string","count":0,"scripted":false,"searchable":true,"aggregatable":false,"readFromDocValues":false},{"name":"timestamp","type":"date","count":0,"scripted":false,"searchable":true,"aggregatable":true,"readFromDocValues":true}]"""
  }
}
As you can see, Kibana also stores the timestamp field and the title of the index pattern (which can span several indexes). It also stores various properties for each field you have defined; for instance, for the name field, the index-pattern contains the following information that Kibana needs to know:
{
  "name": "name",
  "type": "string",
  "count": 0,
  "scripted": false,
  "searchable": true,
  "aggregatable": false,
  "readFromDocValues": false
}

Modify default number of Elasticsearch shards

If I have a 15 node cluster, do I have to change the
index.number_of_shards
value on all 15 nodes, and restart them, before the new value comes into effect for new indexes?
That is right: changing the index.number_of_shards default in the config file would involve changing the setting on all nodes and then restarting the instances, ideally following the guidelines for rolling restarts.
However, if that is not an option, and if explicitly specifying number_of_shards in the settings while creating each new index is not ideal, then the workaround is to use index templates.
Example:
One can create an index_defaults template as below:
PUT /_template/index_defaults
{
  "template": "*",
  "settings": {
    "number_of_shards": 4
  }
}
This applies the setting specified in index_defaults template to all new indexes.
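You can confirm the template was stored as expected with:
GET /_template/index_defaults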
Once you set the number of shards for an index in ElasticSearch, you cannot change it. You will need to create a new index with the desired number of shards, and depending on your use case, you may then want to transfer the data to the new index.
I say depending on the use case because, for instance, if you are storing time based data such as log events, it is perfectly reasonable to close one index and open a new one with a different number of shards, and index all data going forward to that new index, keeping the old one for searches.
However, if your use case is, for instance, storing blog documents, and your indices are by topic, then you will need to (a) create new indices as stated above with a different number of shards and (b) reindex your data. For (b) I recommend using the Scroll and Scan API to get the data out of the old index.
You need to create a template for new indices that will be created:
PUT /_template/index_defaults
{
  "index_patterns": "*",
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1
    }
  }
}
For old indices you need to reindex.
Example: from my_old_index to my_new_index
Create the new index with appropriate mapping and settings:
PUT my_new_index
{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1
    }
  }
}
Reindex from old index to new one, specify type only if you desire:
POST /_reindex?slices=5
{
  "size": 100000,
  "source": { "index": "my_old_index" },
  "dest": { "index": "my_new_index", "type": "my_type" }
}
Updated syntax to avoid some deprecation warnings in Elasticsearch 6+, per https://www.elastic.co/guide/en/elasticsearch/reference/6.0/indices-templates.html:
PUT /_template/index_defaults
{
  "index_patterns": ["*"],
  "order" : 0,
  "settings": {
    "number_of_shards": 2
  }
}
Please remember that specifying the number of shards is a static operation and should be done when creating an index. Any change after the index is created will require a complete reindex, which will take time.
To set the number of shards when creating an index, use this command:
curl -XPUT 'localhost:9200/my_sample_index?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}'
You don't have to run this on all the nodes; run it on any one node. All the nodes communicate with each other about the change to the elastic index.
