Modify default number of Elasticsearch shards

If I have a 15 node cluster, do I have to change the index.number_of_shards value on all 15 nodes, and restart them, before the new value comes into effect for new indexes?

That is right: changing the index.number_of_shards default in the config file would involve changing the setting on all nodes and then restarting each instance, ideally following the guidelines for rolling restarts.
However, if that is not an option, and if explicitly specifying number_of_shards in the settings while creating each new index is not ideal either, then the workaround is to use index templates.
Example:
One can create an index_defaults template as below:
PUT /_template/index_defaults
{
  "template": "*",
  "settings": {
    "number_of_shards": 4
  }
}
This applies the settings specified in the index_defaults template to all new indexes.
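To check that the template is picked up, one can create a throwaway index and inspect its settings (the index name here is illustrative):
PUT /some_new_index

GET /some_new_index/_settings
The response should report "number_of_shards": "4" for the new index.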

Once you set the number of shards for an index in Elasticsearch, you cannot change it. You will need to create a new index with the desired number of shards, and depending on your use case, you may then want to transfer the data to the new index.
I say depending on the use case because, for instance, if you are storing time-based data such as log events, it is perfectly reasonable to close one index and open a new one with a different number of shards, indexing all data going forward into the new index and keeping the old one for searches.
However, if your use case is, for instance, storing blog documents, with indices organized by topic, then you will need to (a) create new indices as stated above with a different number of shards and (b) reindex your data. For (b) I recommend using the Scroll and Scan API to get the data out of the old index.
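As a rough sketch of that approach (the index name is illustrative, and on recent versions plain scroll replaces the old scan search type): open a scroll, then page through it until no hits remain, bulk-indexing each page into the new index.
POST /my_blog_index/_search?scroll=1m
{
  "size": 1000,
  "query": { "match_all": {} }
}

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}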

You need to create a template that will apply to newly created indices:
PUT /_template/index_defaults
{
  "index_patterns": "*",
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}
For old indices you need to reindex.
Example: from my_old_index to my_new_index
Create the new index with appropriate mapping and settings:
PUT my_new_index
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}
Reindex from the old index to the new one, specifying the type only if you need it:
POST /_reindex?slices=5
{
  "size": 100000,
  "source": { "index": "my_old_index" },
  "dest": { "index": "my_new_index", "type": "my_type" }
}
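For a large index it can also help to run the reindex asynchronously and poll its progress; a minimal sketch using the tasks API (the task id is returned by the first call):
POST /_reindex?slices=5&wait_for_completion=false
{
  "source": { "index": "my_old_index" },
  "dest": { "index": "my_new_index" }
}

GET /_tasks/<task id from the previous response>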

Updated syntax to avoid some deprecation warnings in Elasticsearch 6+, per https://www.elastic.co/guide/en/elasticsearch/reference/6.0/indices-templates.html:
PUT /_template/index_defaults
{
  "index_patterns": ["*"],
  "order": 0,
  "settings": {
    "number_of_shards": 2
  }
}
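When several templates match the same index, the order value decides precedence: higher-order templates are applied after, and override, lower-order ones. As a minimal sketch, a second, purely illustrative template could override the shard count for indices matching logs-*:
PUT /_template/logs_overrides
{
  "index_patterns": ["logs-*"],
  "order": 1,
  "settings": {
    "number_of_shards": 6
  }
}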

Please remember that specifying the number of shards is a static operation and should be done when creating an index. Any change after the index is created will require a complete reindex, which takes time.
To set the number of shards when creating an index, use this command:
curl -XPUT 'localhost:9200/my_sample_index?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}'
You don't have to run this on all the nodes. Run it on any one node; the nodes communicate any change to the index across the whole cluster.
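To confirm the setting took effect, you can query the index settings from any node:
curl -XGET 'localhost:9200/my_sample_index/_settings?pretty'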

Related

Issue setting up ElasticSearch Index Lifecycle policy with pipeline date index name

I'm new to setting up a proper lifecycle policy, so I'm hoping someone can give me a hand with this. I have an existing index getting created on a weekly basis. This is a third-party integration (they provided me with the pipeline and index template for the incoming logs). Logs are being created weekly in the pattern "name-YYYY-MM-DD". I'm attempting to set up a lifecycle policy for these indexes so they transition from hot -> warm -> delete. So far, I have done the following:
Updated the index template to add the policy and set an alias:
{
  "index": {
    "lifecycle": {
      "name": "Cloudflare",
      "rollover_alias": "cloudflare"
    },
    "mapping": {
      "ignore_malformed": "true"
    },
    "number_of_shards": "1",
    "number_of_replicas": "1"
  }
}
On the existing indexes, set the alias and mark which one is the "write" index:
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "cloudflare-2020-07-13",
        "alias": "cloudflare",
        "is_write_index": true
      }
    }
  ]
}

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "cloudflare-2020-07-06",
        "alias": "cloudflare",
        "is_write_index": false
      }
    }
  ]
}
Once I did that, I started seeing the following 2 errors (1 on each index):
[screenshot: ILM error #1]
[screenshot: ILM error #2]
I'm not sure why the "is not the write index" error is showing up on the older index. Perhaps this is because it is still "hot" and ILM is trying to move it to another phase while it is not the write index?
For the second error, is this because the name of the index is wrong for rollover?
I'm also not clear on whether this is a good scenario for rollover. These indexes are being created weekly, which I assume is OK. I would think you would normally create a single index and let the policy split off the older ones based on your criteria (size, age, etc.). Should I change this, or can I make this policy work with the existing weekly indexes? In case you need it, here is the part of the pipeline that I imported into Elasticsearch that I believe is responsible for the index naming:
{
  "date_index_name": {
    "field": "EdgeStartTimestamp",
    "index_name_prefix": "cloudflare-",
    "date_rounding": "w",
    "timezone": "UTC",
    "date_formats": [
      "uuuu-MM-dd'T'HH:mm:ssX",
      "uuuu-MM-dd'T'HH:mm:ss.SSSX",
      "yyyy-MM-dd'T'HH:mm:ssZ",
      "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
    ]
  }
}
So for me, the more important error at the moment is the "number_format_exception". I'm thinking it is due to this setting I'm seeing in the index (provided_name):
{
  "settings": {
    "index": {
      "lifecycle": {
        "name": "Cloudflare",
        "rollover_alias": "cloudflare"
      },
      "mapping": {
        "ignore_malformed": "true"
      },
      "number_of_shards": "1",
      "provided_name": "<cloudflare-{2020-07-20||/w{yyyy-MM-dd|UTC}}>",
      "creation_date": "1595203589799",
      "priority": "100",
      "number_of_replicas": "1"
    }
  }
}
I believe this provided_name is being established by the pipeline's date_index_name processor shown above. If that is the issue, is there a way to create a fixed index name via the ingest pipeline without it changing based upon the date? I would rather just create a fixed index and let the lifecycle policy handle the split-offs (i.e. 0001, 0002, etc.).
I've been looking for a way to create a fixed index name without the date_index_name processor, but I haven't found one yet. Alternatively, if I could create an index name with a date plus a suffix that allows the lifecycle policy manager (ILM) to add the incremental number at the end, that might work as well. Any help here would be greatly appreciated!
The main issue is that the existing indexes do not end with a sequence number (i.e. 0001, 0002, etc.), hence ILM doesn't really know how to proceed:
The name of this index must match the template's index pattern and end with a number
You'd be better off letting ILM manage the index creation and rollover, since that's exactly what it's supposed to do. All you need to do is keep writing to the same cloudflare alias, and that's it. There is no need for a date_index_name ingest processor.
So your index template is correct as it is.
Next, you need to bootstrap the initial index:
PUT cloudflare-2020-08-11-000001
{
  "aliases": {
    "cloudflare": {
      "is_write_index": true
    }
  }
}
You can then either reindex your old indices into ILM-managed indices or apply lifecycle policies to your old indices.
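Attaching the policy to an old index is just a settings update; a minimal sketch, reusing the Cloudflare policy name from the question. Note that the rollover action will still fail for indices whose names don't end in a number, so this mainly helps with the warm and delete phases:
PUT /cloudflare-2020-07-06/_settings
{
  "index.lifecycle.name": "Cloudflare"
}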

Update configuration for actively used index without data loss

Sometimes I need to update mappings, settings, or bind default pipelines on an actively used index.
For the time being, I am using a method that loses data, as follows:
1. Update the index template with the proper mapping (or bind the default pipeline via index.default_pipeline);
2. Create a_new_index (matching the template's index_patterns);
3. Reindex index_to_fix into a_new_index to migrate the data already indexed;
4. Use an alias to redirect incoming indexing requests to a_new_index (the alias gets the same name as index_to_fix so that indexing is undisturbed) and delete index_to_fix.
But between step 3 and step 4 there is a time gap, during which newly indexed data goes into the original index_to_fix and is lost when that index is deleted.
Is there a way to update configurations for an actively used index without any data loss?
Thanks to the help of @LeBigCat and after some discussions, I think this problem can be solved in three steps.
Use Alias for CRUD
First things first: try not to use an index directly; use an alias if possible. Since you can't create an alias with the same name as an existing index, you can't transparently replace an index once it's in use, even if it's broken (badly designed). The easiest way is to use a template and include the index name directly in the alias.
PUT _template/test
{
  ...
  "aliases": {
    "{index}-alias": {}
  }
}
Redirect the Indexing
Since index_to_fix is being actively used, after updating the template and creating a new index a_new_index, we can use the alias to redirect indexing to a_new_index.
POST /_aliases
{
  "actions": [
    { "add": { "index": "a_new_index", "alias": "index_to_fix-alias" } },
    { "remove": { "index": "index_to_fix", "alias": "index_to_fix-alias" } }
  ]
}
Migrating the Data
Simply use _reindex to migrate all the data from index_to_fix to a_new_index.
POST _reindex
{
  "source": {
    "index": "index_to_fix"
  },
  "dest": {
    "index": "index_to_fix-alias"
  }
}
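One refinement worth considering (my addition, not part of the original discussion): since new writes already flow into a_new_index through the alias while the migration runs, setting op_type to create makes _reindex skip documents that already exist in the destination instead of overwriting them:
POST _reindex
{
  "conflicts": "proceed",
  "source": { "index": "index_to_fix" },
  "dest": {
    "index": "index_to_fix-alias",
    "op_type": "create"
  }
}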

Elasticsearch - Reindex whole cluster using pattern for new index name

I have a cluster with thousands of indices, with 5 shards per index.
I would like to reindex them with only 1 shard per index.
Is there a built-in solution in Elasticsearch to reindex, for instance, all the indices while adding "-reindexed" to each index name?
Looks like you want to dynamically change the index names while reindexing.
Let's understand this with an example:
1) Add some indices:
POST sample/_doc/1
{
  "test": "sample"
}

POST sample1/_doc/1
{
  "test": "sample"
}

POST sample2/_doc/1
{
  "test": "sample"
}
2) Use Reindex API to dynamically change the index names while reindexing multiple indices:
POST _reindex
{
  "source": {
    "index": "sample*"
  },
  "dest": {
    "index": ""
  },
  "script": {
    "inline": "ctx._index = ctx._index + '-reindexed'"
  }
}
The above request will reindex all the indices starting with sample and add -reindexed to their names. That means sample, sample1 and sample2 will be reindexed as sample-reindexed, sample1-reindexed and sample2-reindexed, all with this one request.
In order to set up the destination indices with one shard, you need to create those indices before reindexing.
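A convenient way to do that in bulk (the template name here is illustrative) is an index template matching the -reindexed suffix, so every destination index is created with one shard automatically:
PUT /_template/reindexed_defaults
{
  "index_patterns": ["*-reindexed"],
  "settings": {
    "number_of_shards": 1
  }
}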
Hope that helps.
You could do a simple reindex but I'd also recommend you take a look at the Shrink Index API:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/indices-shrink-index.html
The documentation above links to v7.0, but this has been around for many iterations.
In your example, you would do something similar to the following:
First, reallocate copies of all primary or replica shards to a single node and prevent any future write-access while the shrink operations are being performed.
PUT /my_source_index/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "shrink_node_name",
    "index.blocks.write": true
  }
}
Initiate the shrink operation, clear the index settings set in the previous command, and update your primary and replica settings on the target index:
POST my_source_index/_shrink/my_target_index-reindexed
{
  "settings": {
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null,
    "index.number_of_replicas": 1,
    "index.number_of_shards": 1,
    "index.codec": "best_compression"
  }
}
Note that the above also allocates a replica shard; if you don't want one, set index.number_of_replicas to 0.
You would want to set up a script of some sort to iterate through the list of source indices one by one.
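A rough sketch of such a script, assuming a bash shell, a node named shrink_node_name, and the -reindexed suffix from above; this is illustrative only and skips error handling:
#!/bin/bash
# Iterate over every index in the cluster and shrink each one to a single shard.
for idx in $(curl -s 'localhost:9200/_cat/indices?h=index'); do
  # Require a copy of every shard on one node and block writes.
  curl -XPUT "localhost:9200/$idx/_settings" -H 'Content-Type: application/json' -d'
  {
    "settings": {
      "index.routing.allocation.require._name": "shrink_node_name",
      "index.blocks.write": true
    }
  }'
  # Wait for the shard relocation to finish before shrinking.
  curl -s 'localhost:9200/_cluster/health?wait_for_no_relocating_shards=true&timeout=10m' > /dev/null
  # Shrink into a single-shard copy and clear the temporary settings.
  curl -XPOST "localhost:9200/$idx/_shrink/$idx-reindexed" -H 'Content-Type: application/json' -d'
  {
    "settings": {
      "index.routing.allocation.require._name": null,
      "index.blocks.write": null,
      "index.number_of_shards": 1,
      "index.number_of_replicas": 1
    }
  }'
done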

What is offline and online indexing in Elasticsearch? And when do we need to reindex?

What are offline and online indexing in Elasticsearch? I did my research, but I couldn't find enough resources explaining what these terms mean. Any idea? Also, when do we need to reindex? Any examples would be great.
The terms offline and online indexing are used here: https://spark-summit.org/2014/wp-content/uploads/2014/07/Streamlining-Search-Indexing-using-Elastic-Search-and-Spark-Holden-Karau.pdf
Reindexing
The most basic form of reindexing just copies one index to another.
I have used this form of reindexing to change a mapping.
Elasticsearch doesn't allow you to change a mapping, so if you want to change a mapping you have to create a new index (index2) with the new mapping and then reindex. The reindex will fill the new index, and thus the new mapping, with the data from the old index.
The command below will move everything from index to index2.
curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "index"
  },
  "dest": {
    "index": "index2"
  }
}'
You can also use reindexing to fill a new index with a part of the old one. You can do so by using a couple of parameters. The example below will copy the newest 1000 documents.
POST /_reindex
{
  "size": 1000,
  "source": {
    "index": "index",
    "sort": { "date": "desc" }
  },
  "dest": {
    "index": "index2"
  }
}
For more examples about reindexing please have a look at the official documentation.
Offline vs. online indexing
In ONLINE mode the new index is built while the old index remains accessible for reads and writes; any update on the old index also gets applied to the new index.
In OFFLINE mode the index is locked up front against any read or write, and then the new index is built from the old one. No read or write operation is permitted while the index is being rebuilt; only when the operation is done is the lock released and reads and writes are allowed again.
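Elasticsearch has no built-in locking rebuild, but the online flavour is commonly approximated with an alias: reindex into index2 while index keeps serving traffic, then switch the alias atomically (the alias name here is illustrative and assumes the application reads through it):
POST /_aliases
{
  "actions": [
    { "remove": { "index": "index", "alias": "app_alias" } },
    { "add": { "index": "index2", "alias": "app_alias" } }
  ]
}
Writes that arrive during the reindex still have to be replayed, so this only approximates a true online rebuild.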

How to create a common mapping template for indices?

For the app I created, the indices are generated once a week, and the type and nature of the data do not vary. That implies I need the same mapping type for all these indices. Is it possible in Elasticsearch to apply the same mapping to indices as they are created? This would save me the overhead of defining the mapping each time an index is created.
Definitely, you can use what is called an index template. Since your mapping type is stable, that's the perfect condition for using index templates.
It's as easy as creating an index. See below: whenever you want to index a document into an index whose name matches my_*, ES will select that template and create the index for you using the given mappings, settings and aliases:
curl -XPUT localhost:9200/_template/template_1 -d '{
  "template": "my_*",
  "settings": {
    "number_of_shards": 1
  },
  "aliases": {
    "my_alias": {}
  },
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": { "type": "string" }
      }
    }
  }
}'
It's basically the technique used by Logstash when it needs to index new logs for each new day in a new daily index.
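To see the template in action (the index name is illustrative; the single-type syntax matches the older version used above), index a document into a not-yet-existing index whose name matches my_*, then inspect what ES created:
curl -XPOST 'localhost:9200/my_test/my_type/1' -d '{ "my_field": "hello" }'
curl -XGET 'localhost:9200/my_test?pretty'
The second call should show the index with one shard and the my_alias alias attached.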
You can employ an index template to address your problem. The official documentation can be found here.
A use case showing how to apply this, with examples, can be found in this blog.
