Elasticsearch - Reindex whole cluster using pattern for new index name - elasticsearch

I have a cluster with thousands of indices, with 5 shards per index.
I would like to reindex them with only 1 shard per index.
Is there a built-in solution in Elastic to reindex, for instance, all the indices by adding "-reindexed" to each index name?

Looks like you want to dynamically change the index names while reindexing.
Let's understand this with an example:
1) Add some indices:
POST sample/_doc/1
{
"test" : "sample"
}
POST sample1/_doc/1
{
"test" : "sample"
}
POST sample2/_doc/1
{
"test" : "sample"
}
2) Use the Reindex API to dynamically change the index names while reindexing multiple indices:
POST _reindex
{
"source": {
"index": "sample*"
},
"dest": {
"index": ""
},
"script": {
"lang": "painless",
"source": "ctx._index = ctx._index + '-reindexed'"
}
}
The above request will reindex all the indices starting with sample and append -reindexed to their index names. That means sample, sample1 and sample2 will be reindexed as sample-reindexed, sample1-reindexed and sample2-reindexed, all with this one request.
In order to set up the destination indices with one shard, you need to create those indices before reindexing.
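The pre-creation step could be scripted roughly as follows. This is only a sketch: it builds the request paths and bodies without sending anything, and the function name, the "-reindexed" default and the replica count are assumptions made for illustration.

```python
import json


def create_index_requests(source_indices, suffix="-reindexed", shards=1, replicas=1):
    """Return one (method, path, body) tuple per destination index.

    The suffix and the replica count are illustrative defaults, not values
    taken from the question.
    """
    requests = []
    for name in source_indices:
        body = {
            "settings": {
                "index": {
                    "number_of_shards": shards,
                    "number_of_replicas": replicas,
                }
            }
        }
        requests.append(("PUT", "/" + name + suffix, body))
    return requests


if __name__ == "__main__":
    # Print the requests so they can be pasted into the console or sent by a client.
    for method, path, body in create_index_requests(["sample", "sample1", "sample2"]):
        print(method, path)
        print(json.dumps(body, indent=2))
```

Each destination index has to exist before the _reindex request runs, otherwise it would be created automatically with the default shard count.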
Hope that helps.

You could do a simple reindex but I'd also recommend you take a look at the Shrink Index API:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/indices-shrink-index.html
The documentation above links to v7.0, but this has been around for many iterations.
In your example, you would do something similar to the following:
First, reallocate copies of all primary or replica shards to a single node and prevent any future write-access while the shrink operations are being performed.
PUT /my_source_index/_settings
{
"settings": {
"index.routing.allocation.require._name": "shrink_node_name",
"index.blocks.write": true
}
}
Initiate the shrink operation, clear the index settings set in the previous command, and update your primary and replica settings on the target index:
POST my_source_index/_shrink/my_target_index-reindexed
{
"settings": {
"index.routing.allocation.require._name": null,
"index.blocks.write": null,
"index.number_of_replicas": 1,
"index.number_of_shards": 1,
"index.codec": "best_compression"
}
}
Note the above is also allocating a replica shard - if you don't want this, ensure you set this to 0.
You would want to set up a script of some sort to iterate through the list of source indices one by one.
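A minimal sketch of such a script is shown here. It only generates the two requests per index (the settings update and the shrink call) in the order described above; the function name and the "-reindexed" suffix are assumptions, and nothing is sent to the cluster.

```python
import json


def shrink_plan(source_indices, node_name, suffix="-reindexed", replicas=1):
    """Return the ordered (method, path, body) requests to shrink each index."""
    plan = []
    for name in source_indices:
        # Step 1: move a copy of every shard to one node and block writes.
        prepare = {
            "settings": {
                "index.routing.allocation.require._name": node_name,
                "index.blocks.write": True,
            }
        }
        # Step 2: shrink to a single primary shard and clear the temporary
        # settings on the target (None is serialized as JSON null).
        shrink = {
            "settings": {
                "index.routing.allocation.require._name": None,
                "index.blocks.write": None,
                "index.number_of_replicas": replicas,
                "index.number_of_shards": 1,
            }
        }
        plan.append(("PUT", f"/{name}/_settings", prepare))
        plan.append(("POST", f"/{name}/_shrink/{name}{suffix}", shrink))
    return plan


if __name__ == "__main__":
    for method, path, body in shrink_plan(["my_source_index"], "shrink_node_name"):
        print(method, path)
        print(json.dumps(body, indent=2))
```

In practice the script would probably also need to wait for shard relocation to finish before sending each shrink request; that polling is left out of this sketch.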

Related

Elasticsearch reindex doesn't use index name specified in script

I tried reindexing daily indices from a remote cluster, following the reindex-daily-indices example:
POST _reindex
{
"source": {
"remote": {
"host": "http://remote_es:9200"
},
"index": "telemetry-*"
},
"dest": {
"index": "dummy"
},
"script": {
"lang": "painless",
"source": """
ctx._index = 'telemetry-' + (ctx._index.substring('telemetry-'.length(), ctx._index.length()));
"""
}
}
It looks like if the new ctx._index is exactly the same as the original ctx._index, it will use dest.index instead. It reindexes all the records into the "dummy" index.
Is this a bug or intended behaviour? I could not find any explanation to this behaviour.
Is there a way to reindex (multiple indices) from remote and still preserve the original name?
It's because, according to your logic, the destination index name is the same as the source index name. In the documentation you linked to, they append '-1' to the end of the index name.
In your case, the following logic just sets the destination index name to the same value as the source index name, and reindex doesn't allow that, so it uses the destination index name specified in dest.index:
ctx._index = 'telemetry-' + (ctx._index.substring('telemetry-'.length(), ctx._index.length()));
Also worth noting that this case has been reported here and here.

How to point elasticsearch alias to current index and removing the alias from old index from index template?

In our application, we create an Elasticsearch index on a daily basis, and the index name pattern is index- followed by the date (e.g. index-17-09-2019). The application accesses the index through an alias that points to the current index. Attaching and removing the alias is currently done by a cron job. Is it possible to do this through an index template instead, since we would like to avoid the cron job?
We can attach the alias to the index through an index template, but I am not sure whether we can also detach the alias from the old index and add it to the new one that way.
That can be done with the built-in index lifecycle management (ILM). Your application will send data to the index alias and ILM will take care of the rest.
Here is the description of how it can be done, but basically you need to:
1. Create an ILM policy
PUT /_ilm/policy/my_policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_age": "1d"
}
}
}
}
}
}
2. Create an index template with the ILM policy attached
PUT _template/my_template
{
"index_patterns": ["test-*"],
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"index.lifecycle.name": "my_policy",
"index.lifecycle.rollover_alias": "test-alias"
}
}
3. Start the process by creating the initial index
PUT test-000001
{
"aliases": {
"test-alias":{
"is_write_index": true
}
}
}
That will handle the creation of a new index every day without an external cron job. You can also extend your policy later on, e.g. to delete old indices 7 days after rollover.
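To make the naming concrete: when the rollover alias points at an index whose name ends in a hyphen and a number, such as test-000001, the next index keeps the prefix and increments the counter, zero-padded to six digits. The helper below is a small sketch that mirrors that convention locally; the function name is hypothetical and Elasticsearch does the real work itself.

```python
import re


def next_rollover_name(index_name):
    """Mimic rollover naming: bump the trailing counter, zero-padded to 6 digits."""
    match = re.fullmatch(r"(.*-)(\d+)", index_name)
    if match is None:
        raise ValueError("index name must end with '-' followed by a number")
    prefix, counter = match.groups()
    return f"{prefix}{int(counter) + 1:06d}"


if __name__ == "__main__":
    name = "test-000001"
    for _ in range(3):
        name = next_rollover_name(name)
        print(name)
```

So with max_age set to 1d, the alias test-alias would move from test-000001 to test-000002 after a day, and so on, rather than to date-named indices.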
Hope that helps.

ElasticSearch - what is the difference between an index template and an index pattern

I have read an explanation to my question here:
https://discuss.elastic.co/t/whats-the-differece-between-index-pattern-and-index-template/54948
However, I still don't understand the difference. When defining an index PATTERN, does it not affect index creation at all? Also, what happens if I create an index but it doesn't have a corresponding index pattern? How can I see the mapping used for an index pattern so I can know how to use the Mapping API to update it?
And on a side note, the docs say you manage the index patterns by clicking the "Settings" and then "Indices" tab. I'm looking at Kibana and I don't see any settings tab. I can view the index patterns through the management tab, but I don't see any settings tab there
An index template is an ES feature that defines the settings and mappings to apply whenever a new index whose name matches a pattern is created. For instance, let's say we create the following index template:
PUT _template/template_1
{
"index_patterns": ["foo*"],
"settings": {
"number_of_shards": 1
},
"mappings": {
...
}
}
As you can see, as soon as we want to index a document inside an index named (e.g.) foo-44 and that index doesn't exist, then that template (settings + mappings) will be used by ES in order to create the foo-44 index automatically.
You can update an index template at any time by simply PUTting a new settings/mappings definition like above.
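The matching step can be illustrated with a short sketch. It assumes that * is the only wildcard in a pattern and uses a plain dictionary as a stand-in for the templates stored in the cluster; both function names are hypothetical.

```python
import re


def matches_pattern(index_name, pattern):
    """Return True if index_name matches a pattern where '*' means any characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, index_name) is not None


def matching_templates(index_name, templates):
    """Return the names of the templates whose index_patterns match index_name."""
    return [
        name
        for name, template in templates.items()
        if any(matches_pattern(index_name, p) for p in template["index_patterns"])
    ]


if __name__ == "__main__":
    templates = {"template_1": {"index_patterns": ["foo*"]}}
    print(matching_templates("foo-44", templates))
    print(matching_templates("bar-1", templates))
```

With the template above, a new index named foo-44 would pick up template_1, while bar-1 would be created with defaults.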
An index pattern (not to be confused with the index_patterns property you saw above; those are two totally different things) is a Kibana feature for telling Kibana what makes up an index (all the fields, their types, etc). Nothing can happen in Kibana without creating index patterns, which you can do in Management > Index Patterns.
Creating an index in ES will not create any index pattern in Kibana. Similarly, creating an index pattern in Kibana will not create any index in ES.
The reason why Kibana needs an index pattern is that it needs to store different kinds of information than what is available in an index mapping. For instance, let's say you create an index with the following mapping:
PUT my_index
{
"mappings": {
"doc": {
"properties": {
"timestamp": {
"type": "date"
},
"name": {
"type": "text"
}
}
}
}
}
Then the corresponding index pattern that you will create in Kibana will have the following content:
GET .kibana/doc/index-pattern:16a98050-a53f-11e8-82ab-af0d48c6ddd8
{
"type": "index-pattern",
"updated_at": "2018-08-21T12:38:22.509Z",
"index-pattern": {
"title": "my_index*",
"timeFieldName": "timestamp",
"fields": """[{"name":"_id","type":"string","count":0,"scripted":false,"searchable":true,"aggregatable":true,"readFromDocValues":false},{"name":"_index","type":"string","count":0,"scripted":false,"searchable":true,"aggregatable":true,"readFromDocValues":false},{"name":"_score","type":"number","count":0,"scripted":false,"searchable":false,"aggregatable":false,"readFromDocValues":false},{"name":"_source","type":"_source","count":0,"scripted":false,"searchable":false,"aggregatable":false,"readFromDocValues":false},{"name":"_type","type":"string","count":0,"scripted":false,"searchable":true,"aggregatable":true,"readFromDocValues":false},{"name":"name","type":"string","count":0,"scripted":false,"searchable":true,"aggregatable":false,"readFromDocValues":false},{"name":"timestamp","type":"date","count":0,"scripted":false,"searchable":true,"aggregatable":true,"readFromDocValues":true}]"""
}
}
As you can see, Kibana also stores the timestamp field, the name of the index pattern (which can span several indexes). Also it stores various properties for each field you have defined, for instance, for the name field, the index-pattern contains the following information that Kibana needs to know:
{
"name": "name",
"type": "string",
"count": 0,
"scripted": false,
"searchable": true,
"aggregatable": false,
"readFromDocValues": false
}

Best way to reindex multiple indices in ElasticSearch

I am using Elasticsearch 5.1.1 and have 500 + indices created with default mapping provided by ES.
Now we have decided to use dynamic templates.
In order to apply this template/mapping to old indices I need to reindex all indices.
What is the best way to do it? Can we use Kibana for this? I couldn't find sufficient documentation to do so.
Example: Reindex from a daily index to a monthly index (August)
POST _reindex?slices=10&refresh
{
"source": {
"index": "myindex-2019.08.*"
},
"dest": {
"index": "myindex-2019.08"
}
}
Monitor the reindex task (wait until it is finished)
GET _tasks?detailed=true&actions=*reindex
Check that the new index was created
GET _cat/indices/myindex-2019.08*?v&s=index
Then you can delete the old indices
DELETE myindex-2019.08.*
Source:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
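With several months of daily indices, the same request has to be repeated once per month. The sketch below groups daily index names into monthly targets and builds one _reindex body per month. It assumes daily names end in .dd after a yyyy.MM part, as in the example; the function name is hypothetical and no request is sent.

```python
import re
from collections import defaultdict

# Daily index names are assumed to look like "<prefix>-yyyy.MM.dd".
DAILY = re.compile(r"(?P<month>.+-\d{4}\.\d{2})\.\d{2}")


def monthly_reindex_bodies(daily_indices):
    """Build one _reindex request body per month found in daily_indices."""
    months = defaultdict(list)
    for name in daily_indices:
        match = DAILY.fullmatch(name)
        if match:  # names that are not daily indices are skipped
            months[match.group("month")].append(name)
    return [
        {"source": {"index": f"{month}.*"}, "dest": {"index": month}}
        for month in sorted(months)
    ]


if __name__ == "__main__":
    names = ["myindex-2019.08.01", "myindex-2019.08.02", "myindex-2019.09.01"]
    for body in monthly_reindex_bodies(names):
        print(body)
```

Each body can then be posted to _reindex, followed by the same monitoring and cleanup steps as above.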
You can use the _reindex API which can also reindex multiple indices. It was specifically built for this.
Bash script to re-index all indices matching a pattern: https://gist.github.com/hartfordfive/e507bc47e17f4e03a89055918900e44d
If you want to reindex only the documents that match a filter on some field, you can use this:
POST _reindex
{
"source": {
"index": "auditbeat",
"query": {
"match": {
"agent.version": "7.6.0"
}
}
},
"dest": {
"index":"auditbeat-7.6.0"
}
}

Modify default number of Elasticsearch shards

If I have a 15 node cluster, do I have to change the
index.number_of_shards
value on all 15 nodes, and restart them, before the new value comes into effect for new indexes?
That is right: changing the index.number_of_shards default in the config file would involve changing the setting on all nodes and then restarting each instance, ideally following the guidelines for rolling restarts.
However, if that is not an option, and if explicitly specifying number_of_shards in the settings while creating each new index is not ideal, then the workaround would be to use index templates.
Example:
One can create an index_defaults template as below:
PUT /_template/index_defaults
{
"template": "*",
"settings": {
"number_of_shards": 4
}
}
This applies the setting specified in index_defaults template to all new indexes.
Once you set the number of shards for an index in Elasticsearch, you cannot change it. You will need to create a new index with the desired number of shards, and depending on your use case, you may then want to transfer the data to the new index.
I say depending on the use case because, for instance, if you are storing time based data such as log events, it is perfectly reasonable to close one index and open a new one with a different number of shards, and index all data going forward to that new index, keeping the old one for searches.
However, if your use case is, for instance, storing blog documents, and your indices are by topic, then you will need to (a) create new indices as stated above with a different number of shards and (b) reindex your data. For (b) I recommend using the Scroll and Scan API to get the data out of the old index.
You need to create a template for new indices that will be created:
PUT /_template/index_defaults
{
"index_patterns": "*",
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
}
}
}
For old indices you need to reindex.
Example: from my_old_index to my_new_index
Create the new index with appropriate mapping and settings:
PUT my_new_index
{
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
}
}
}
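Reindex from the old index to the new one; specify type only if you need it. Note that the top-level size limits the total number of documents copied rather than the batch size, so drop it if the old index holds more than 100000 documents: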
POST /_reindex?slices=5
{
"size": 100000,
"source": { "index": "my_old_index" },
"dest": { "index": "my_new_index", "type": "my_type" }
}
Updated syntax to avoid some deprecation warnings in Elasticsearch 6+
per
https://www.elastic.co/guide/en/elasticsearch/reference/6.0/indices-templates.html
PUT /_template/index_defaults
{
"index_patterns": ["*"],
"order" : 0,
"settings": {
"number_of_shards": 2
}
}
Please remember that the number of shards is a static setting and should be chosen when creating an index. Any change after the index is created requires a complete reindex, which takes time.
To set the number of shards when creating an index, use this command:
curl -XPUT 'localhost:9200/my_sample_index?pretty' -H 'Content-Type: application/json' -d'
{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 0
}
}
'
You don't have to run this on all the nodes; run it on any one node. The nodes communicate the change to the index with each other.
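If a script is easier to maintain than a shell command, the same request can be assembled with Python's standard library. This is a sketch only: the function name is hypothetical, the base URL assumes a node listening on localhost:9200 as in the curl example, and the request is built but not sent.

```python
import json
import urllib.request


def create_index_request(base_url, index_name, shards, replicas):
    """Build (but do not send) the PUT request that creates an index."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index_name}?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = create_index_request("http://localhost:9200", "my_sample_index", 2, 0)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it to whichever single node the URL points at.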
