Update configuration for actively used index without data loss - elasticsearch

Sometimes, I need to update mappings, settings, or bind default pipelines to an actively used index.
For the time being, I am using a method that loses data, as follows:
1. update the index template with the proper mapping (or bind the default pipeline via index.default_pipeline);
2. create a_new_index (matching the template's index_patterns);
3. reindex index_to_fix to a_new_index to migrate the data already indexed;
4. use an alias to redirect incoming indexing requests to a_new_index (the alias will have the same name as index_to_fix so that indexing is undisturbed), and delete index_to_fix.
But between step 3 and step 4 there is a time gap, during which newly indexed data land in the original index_to_fix and are lost when it is deleted.
Is there a way to update the configuration of an actively used index without any data loss?

Thanks to the help of @LeBigCat and some discussion, I think this problem can be solved in three steps.
Use Alias for CRUD
First things first: try not to use an index directly; use an alias if possible. Since an alias cannot have the same name as an existing index, you cannot transparently replace an index you address directly, even if it turns out to be broken (badly designed). The easiest way is to use a template and derive the alias name from the index name.
PUT _template/test
{
  ...
  "aliases" : {
    "{index}-alias" : {}
  }
}
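For reference, a minimal complete version of such a template might look like the sketch below; the index_patterns values are assumptions chosen to match the example index names used in this answer. Every index matching the patterns then automatically gets its own {index}-alias:

PUT _template/test
{
  "index_patterns": ["index_to_fix*", "a_new_index*"],
  "aliases": {
    "{index}-alias": {}
  }
}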
Redirect the Indexing
Since index_to_fix is being actively used, after updating the template and creating a new index a_new_index, we can use the alias to redirect indexing to a_new_index.
POST /_aliases
{
  "actions" : [
    { "add": { "index": "a_new_index", "alias": "index_to_fix-alias" } },
    { "remove": { "index": "index_to_fix", "alias": "index_to_fix-alias" } }
  ]
}
Migrating the Data
Simply use _reindex to migrate all the data from index_to_fix to a_new_index.
POST _reindex
{
  "source": {
    "index": "index_to_fix"
  },
  "dest": {
    "index": "index_to_fix-alias"
  }
}
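Once the reindex completes, a quick sanity check before dropping the old index is to compare document counts; a_new_index should hold at least as many documents as index_to_fix, since it also receives the new writes through the alias. A minimal sketch (# marks a comment in Kibana Dev Tools):

GET index_to_fix/_count
GET a_new_index/_count

# once satisfied, drop the old index
DELETE index_to_fix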

Related

How to update data type of a field in elasticsearch

I am publishing data to Elasticsearch using fluentd. It has a field Data.CPU which is currently set to string. The index name is health_gateway.
I have made some changes in the Python code which generates the data, so this field Data.CPU has now become an integer. But Elasticsearch is still showing it as a string. How can I update its data type?
I tried running the below commands in Kibana Dev Tools:
PUT health_gateway/doc/_mapping
{
  "doc" : {
    "properties" : {
      "Data.CPU" : {"type" : "integer"}
    }
  }
}
But it gave me the below error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
  },
  "status" : 400
}
There is also this document, which says the data type can be converted using mutate, but I am not able to understand it properly.
I do not want to delete and recreate the index, as I have built a visualization based on it that would be deleted along with the index. Can anyone please help with this?
The short answer is that you can't change the mapping of a field that already exists in a given index, as explained in the official docs.
The specific error you got is because you included /doc/ in your request path (you probably wanted /<index>/_mapping), but fixing this alone won't be sufficient.
Finally, I'm not sure you really have a literal dot in the field name there. In recent versions, dots in field names are expanded into object paths, so Data.CPU is most likely a CPU field nested under a Data object.
Nevertheless, there are several ways forward in your situation... here are a couple of them:
Use a scripted field
You can add a scripted field to the Kibana index-pattern. It's quick to implement, but has major performance implications. You can read more about them on the Elastic blog here (especially under the heading "Match a number and return that match").
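You can get a feel for what such a script would do by testing the same logic with script_fields in a search request. This is only a sketch, and it assumes the text field was dynamically mapped with a .keyword sub-field (the usual default for dynamically mapped strings):

GET health_gateway/_search
{
  "script_fields": {
    "cpu_as_int": {
      "script": {
        "lang": "painless",
        "source": "Integer.parseInt(doc['Data.CPU.keyword'].value)"
      }
    }
  }
}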
Add a new multi-field
You could add a new multi-field. The example below assumes that CPU is a nested field under Data, rather than really being called Data.CPU with a literal .:
PUT health_gateway/_mapping
{
  "properties": {
    "Data": {
      "properties": {
        "CPU": {
          "type": "keyword",
          "fields": {
            "int": {
              "type": "short"
            }
          }
        }
      }
    }
  }
}
Reindex your data within ES
Use the Reindex API. Be sure to set the correct mapping on the target index.
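A sketch of that approach, where health_gateway-v2 is a hypothetical name for the new index (this assumes the existing string values are parseable as integers, so they can be coerced during reindexing):

PUT health_gateway-v2
{
  "mappings": {
    "properties": {
      "Data": {
        "properties": {
          "CPU": { "type": "integer" }
        }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "health_gateway" },
  "dest": { "index": "health_gateway-v2" }
}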
Delete and reindex everything from source
If you are able to regenerate the data from source in a timely manner, without disrupting users, you can simply delete the index and reingest all your data with an updated mapping.
You can update the mapping by indexing the same field in multiple ways, i.e. by using multi-fields.
Using the below mapping, Data.CPU.raw will be of integer type:
{
  "mappings": {
    "properties": {
      "Data": {
        "properties": {
          "CPU": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "integer"
              }
            }
          }
        }
      }
    }
  }
}
Or you can create a new index with the correct mapping and reindex the data into it using the Reindex API.

Can you set a field to not_analyzed in an auto created index in Elasticsearch?

As part of our AWS infrastructure, I am using an Elasticsearch (7.4) index. We use Terraform to create the domain in AWS Elasticsearch, but we don't create the index explicitly. Instead, when the first document is posted, the index is auto-created. This worked well, but now I have been given the requirement to have a non-analyzed field (user id).
Is there a way to make a field not_analyzed when putting the first document?
If there is not, what are my options to set the field to not_analyzed? Should I do some sort of init/bootstrapping? Maybe there is a way to do it from Terraform. The application is built using Chalice and runs in Lambda; I am not sure how to do initialization in Lambda in that case. Ideally I would fire this call a single time:
PUT /my_index
{
  "mappings" : {
    "properties" : {
      "user_id" : {
        "type" : "string",
        "index" : "not_analyzed"
      }
    }
  }
}
When restarting the application, this call would be sent again, but I guess it's immutable (PUT).
This might be overkill, but I would consider using the index template feature.
It may look like this:
PUT _index_template/template_1
{
  "index_patterns": [
    "my_template*"
  ],
  "template": {
    "mappings": {
      "properties": {
        "user_id" : {
          "type" : "keyword"
        }
      }
    }
  },
  "priority": 1
}
It can be Terraformed using a dedicated provider, which also integrates directly with AWS Elasticsearch using IAM keys.
Then the first document created that way will also build an index using the given template (provided, of course, that the index name matches the pattern).
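As a quick check, indexing a first document and then inspecting the resulting mapping should show user_id mapped as keyword. The index name below is just a hypothetical one matching the pattern:

POST my_template-2021/_doc
{
  "user_id": "u-12345"
}

GET my_template-2021/_mapping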
This doesn't directly answer your question, but for your problem I would suggest a solution outside Elasticsearch:
1. Provision a second Lambda function in Terraform that is permitted to run PUT operations against Elasticsearch and has the sole purpose of creating your index.
2. In Terraform, invoke this Lambda function once the domain is created.
In other words, perform the bootstrapping mentioned in your question, but move it to a separate lambda function instead of it being mixed into your application lambda.
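The call this bootstrap Lambda fires could simply be the 7.x equivalent of the request in the question, with string/not_analyzed replaced by keyword (a sketch, untested against your setup). Since index creation is not idempotent, a second run will fail with resource_already_exists_exception, which the function can safely treat as success:

PUT /my_index
{
  "mappings": {
    "properties": {
      "user_id": {
        "type": "keyword"
      }
    }
  }
}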

Issue setting up ElasticSearch Index Lifecycle policy with pipeline date index name

I'm new to setting up a proper lifecycle policy, so I'm hoping someone can please give me a hand with this. I have an existing index getting created on a weekly basis. This is a third-party integration (they provided me with the pipeline and index template for the incoming logs). Logs are being created weekly in the pattern "name-YYYY-MM-DD". I'm attempting to set up a lifecycle policy for these indexes so they transition from hot->warm->delete. So far, I have done the following:
Updated the index template to add the policy and set an alias:
{
  "index": {
    "lifecycle": {
      "name": "Cloudflare",
      "rollover_alias": "cloudflare"
    },
    "mapping": {
      "ignore_malformed": "true"
    },
    "number_of_shards": "1",
    "number_of_replicas": "1"
  }
}
On the existing indexes, set the alias and which one is the "write" index:
POST /_aliases
{
  "actions" : [
    {
      "add" : {
        "index" : "cloudflare-2020-07-13",
        "alias" : "cloudflare",
        "is_write_index" : true
      }
    }
  ]
}
POST /_aliases
{
  "actions" : [
    {
      "add" : {
        "index" : "cloudflare-2020-07-06",
        "alias" : "cloudflare",
        "is_write_index" : false
      }
    }
  ]
}
Once I did that, I started seeing the following 2 errors (1 on each index):
[Screenshot: ILM error #1 — the "is not the write index" error on the older index]
[Screenshot: ILM error #2 — a "number_format_exception"]
I'm not sure why the "is not the write index" error is showing up on the older index. Perhaps this is because it is still "hot" and trying to move it to another phase without it being the write index?
For the second error, is this because the name of the index is wrong for rollover?
I'm also not clear if this is a good scenario for rollover. These indexes are being created weekly, which I assume is ok. I would think normally you would create a single index and let the policy split off the older ones based upon your criteria (size, age, etc). Should I change this or can I make this policy work with existing weekly files? In case you need it, here is part of the pipeline that I imported into ElasticSearch that I believe is responsible for the index naming:
{
  "date_index_name" : {
    "field" : "EdgeStartTimestamp",
    "index_name_prefix" : "cloudflare-",
    "date_rounding" : "w",
    "timezone" : "UTC",
    "date_formats" : [
      "uuuu-MM-dd'T'HH:mm:ssX",
      "uuuu-MM-dd'T'HH:mm:ss.SSSX",
      "yyyy-MM-dd'T'HH:mm:ssZ",
      "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
    ]
  }
},
So, for me at the moment the more important error is the "number_format_exception". I'm thinking it is due to this setting I'm seeing in the index (provided_name):
{
  "settings": {
    "index": {
      "lifecycle": {
        "name": "Cloudflare",
        "rollover_alias": "cloudflare"
      },
      "mapping": {
        "ignore_malformed": "true"
      },
      "number_of_shards": "1",
      "provided_name": "<cloudflare-{2020-07-20||/w{yyyy-MM-dd|UTC}}>",
      "creation_date": "1595203589799",
      "priority": "100",
      "number_of_replicas": "1",
I believe this "provided_name" is getting established from the pipeline's "date_index_name" I provided above. If this is the issue, is there a way to create a fixed index name via the ingest pipeline without it changing based upon the date? I would rather just create a fixed index and let the lifecycle policy handle the split offs (i.e. 0001, 0002, etc).
I've been looking for a way to create a fixed index name without the "date_index_name" processor, but I haven't found a way to do this yet. Or, if I can create an index name with a date and add a suffix that would allow the LifeCycle policy manager (ILM) to add the incremental number at the end, that might work as well. Any help here would be greatly appreciated!
The main issue is that the existing indexes do not end with a sequence number (i.e. 0001, 0002, etc.), hence ILM doesn't really know how to proceed: the name of the rollover index must match the template's index pattern and end with a number.
You'd be better off letting ILM manage the index creation and rollover, since that's exactly what it's supposed to do. All you need to do is to keep writing to the same cloudflare alias and that's it. No need for a date_index_name ingest processor.
So your index template is correct as it is.
Next you need to bootstrap the initial index
PUT cloudflare-2020-08-11-000001
{
  "aliases": {
    "cloudflare": {
      "is_write_index": true
    }
  }
}
You can then either reindex your old indices into ILM-managed indices or apply lifecycle policies to your old indices.
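For the reindex route, a minimal sketch: writing through the cloudflare alias routes documents to the current write index, after which the old weekly index can be removed (the index name is taken from the question; adapt as needed):

POST _reindex
{
  "source": { "index": "cloudflare-2020-07-06" },
  "dest": { "index": "cloudflare" }
}

DELETE cloudflare-2020-07-06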

Specifying Field Types Indexing from Logstash to Elasticsearch

I have successfully ingested data using the XML filter plugin from Logstash to Elasticsearch; however, all the fields are of type "text".
Is there a way to manually or automatically specify the correct type?
I found the following technique good for my use:
Logstash can filter the data and convert a field from the default (text) to whatever type you want. The documentation can be found here. The example given in the documentation is:
filter {
  mutate {
    convert => { "fieldname" => "integer" }
  }
}
You add this in the body part of your /etc/logstash/conf.d/02-... file. The downside of this practice, as I understand it, is that altering data on its way into Elasticsearch is generally discouraged.
After you do this you will probably run into this problem: old and new data conflict (for example, a field that was text until now is suddenly received as a date). If your DB is a test DB whose old data you can erase, just DELETE the index so that there is no conflict. If you can't simply erase the old data, then read the answer in the link above.
What you want to do is specify a mapping template.
PUT _template/template_1
{
  "index_patterns": ["te*", "bar*"],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z YYYY"
        }
      }
    }
  }
}
Change the settings to match your needs, such as listing the properties and the types you want them mapped to.
Setting index_patterns is especially important because it tells Elasticsearch how to apply this template. You can set an array of index patterns and use * for wildcards as appropriate. E.g. Logstash's default is to rotate indices by date; they will look like logstash-2018.04.23, so your pattern could be logstash-* and any index that matches the pattern will receive the template.
If you want to match based on some pattern, then you can use dynamic templates.
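A sketch of such a dynamic template in typeless (7.x) syntax; the template_2 name and the *_count pattern are illustrative assumptions, mapping any string field whose name ends in _count to an integer:

PUT _template/template_2
{
  "index_patterns": ["logstash-*"],
  "mappings": {
    "dynamic_templates": [
      {
        "counts_as_integers": {
          "match": "*_count",
          "match_mapping_type": "string",
          "mapping": {
            "type": "integer"
          }
        }
      }
    ]
  }
}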
Edit: Adding a little update here, if you want logstash to apply the template for you, here is a link to the settings you'll want to be aware of.

How to set existing elastic search mapping from index: no to index: analyzed

I am new to Elasticsearch and I want to update the existing mapping under my index. My existing mapping looks like:
"load":{
"mappings": {
"load": {
"properties":{
"customerReferenceNumbers": {
"type": "string",
"index": "no"
}
}
}
}
}
I would like to update this field in my mapping to be analyzed, so that my 'customerReferenceNumbers' field will be available for search.
I am trying to run the following query in the Sense plugin to do so:
PUT /load/load/_mapping
{
  "load": {
    "properties": {
      "customerReferenceNumbers": {
        "type": "string",
        "index": "analyzed"
      }
    }
  }
}
but I am getting the following error with this command:
MergeMappingException[Merge failed with failures {[mapper customerReferenceNumbers] has different index values]
Though there exists data associated with these mappings, I am unable to understand why Elasticsearch is not allowing me to update the mapping from non-indexed to indexed.
Thanks in advance!!
ElasticSearch doesn't allow this kind of change.
And even if it were possible, since you would have to reindex your data anyway for the new mapping to take effect, it is faster to create a new index with the new mapping and reindex your data into it.
If you can't afford any downtime, take a look at the alias feature which is designed for these use cases.
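For example, if the data has been reindexed into a new index (load_v2 here is a hypothetical name) and clients query through an alias rather than the index itself, the switch is atomic and involves no downtime:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "load_v1", "alias": "load_alias" } },
    { "add": { "index": "load_v2", "alias": "load_alias" } }
  ]
}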
This is by design. You cannot change the mapping of an existing field in this way. Read more about this at https://www.elastic.co/blog/changing-mapping-with-zero-downtime and https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html.
