Long index recovery time after installing a plugin into an existing Elasticsearch cluster - elasticsearch

I installed the langdetect plugin into an Elasticsearch beta cluster that has just 1 node and around 455 indices. When I restart the server, it takes around 5-10 minutes to reach yellow status.
I suspect that if this plugin were installed in production, which has many nodes and thousands of indices, recovery could take a very long time.
Has anyone run into a situation like this? How did you deal with it? Can I restart ES with zero downtime?
P.S. I'm using ES version 5.3.2.

Related

How to migrate data from Elasticsearch 5.6 to Elasticsearch 8.3

I have an Elasticsearch cluster running 5.6. I plan to upgrade it by running an ES 8.3 cluster in parallel and then moving the data over to it.
The preferred way, I think, is snapshot and restore: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html
But I am confused about what exactly "snapshot version compatibility" means.
Does the above mean that if I take a snapshot in Elasticsearch 5.6, I cannot restore it directly in version 8.3? (Which would mean I have to first move to 6.x, then to 7.x, and finally to 8.x?)
The index compatibility matrix below, however, says that a version in 5.x will work in 8.x?
Am I missing something, or can someone help me elaborate on this?
So, the underlying problem is that data written in Lucene version N can only be read by version N+1. For Elasticsearch 5 through 8 the Lucene version was always 1 greater than the ES version (so Lucene 6 through 9).
That means, both for an upgrade and for a restored snapshot: if your data was written with 5.x, you can only read / restore it with 6.x. For 7.x or 8.x you'll need to reindex the data. I would do a remote reindex straight from 5.x to 8.latest if possible: https://www.elastic.co/guide/en/elasticsearch/reference/current/reindex-upgrade-remote.html
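A minimal sketch of such a remote reindex, run against the new 8.x cluster (the host names and the index name are placeholders; the old cluster must first be allowed via reindex.remote.whitelist in the new cluster's elasticsearch.yml):

    # On the 8.x nodes, in elasticsearch.yml:
    #   reindex.remote.whitelist: "old-es:9200"
    curl -X POST "http://new-es:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
    {
      "source": {
        "remote": { "host": "http://old-es:9200" },
        "index": "my-index"
      },
      "dest": { "index": "my-index" }
    }'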
There are some small caveats, but they will probably not apply to you:
This doesn't apply to source-only snapshots, but those always need a reindex anyway, so that's not going to add any benefit for you.
8.3 added a feature to read snapshots from 5.0 onwards, but it is slower, doesn't support all features, and is a commercial feature (platinum license, if I'm not mistaken).
Depending on what kind of data it is: if it's aging out (like logs or metrics), maybe you don't have to migrate it to the new cluster at all?

Designing Elasticsearch Migration from 6.8 to 7.16 along with App Deployment

I have a Spring Boot application that uses Elasticsearch 6.8, and I would like to migrate it to Elasticsearch 7.16 with the least downtime. I can do a rolling update, but the problem is that when I migrate my ES cluster from version 6 to 7, some features in my application fail because of breaking changes (for example, the total hits response change).
I also upgraded my Elasticsearch client to version 7 in a separate branch and I can deploy that as well, but that client doesn't work with ES version 6. So I cannot first release the application and then do the ES migration. I thought about doing the application deployment and the ES migration at the same time with a few hours of downtime, but in case something goes wrong, a rollback may take too much time (we have >10 TB of data in PROD).
I still couldn't find a good solution to this problem. I'm thinking of migrating only the ES data nodes to 7.16 and keeping the master nodes on 6.8, then doing the application deployment and migrating the Elasticsearch master nodes together with a small downtime. Has anyone tried doing this? Would running the data and master nodes of my Elasticsearch cluster on different versions (6.8 and 7.16) cause problems?
Any help / suggestions much appreciated.
The breaking change you mention can be alleviated by using the query string parameter rest_total_hits_as_int=true in your client code in order to keep getting the total hit count as in version 6 (mentioned in the same link you shared).
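For illustration, a search against a hypothetical index on a 7.x cluster (host and index name are placeholders); with the parameter set, hits.total comes back as a plain integer, 6.x style, instead of the 7.x {"value": ..., "relation": ...} object:

    curl "http://localhost:9200/my-index/_search?rest_total_hits_as_int=true" \
      -H 'Content-Type: application/json' -d'{ "query": { "match_all": {} } }'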
Running master and data nodes with different versions is not supported, and I would not venture into it. If you have a staging environment where you can test this upgrade procedure, all the better.
Since 6.8 clients are compatible with 7.16 clusters, you can add that small bit to your 6.8 client code, and then you should be able to upgrade your cluster to 7.16.
Once your ES server is upgraded, you can upgrade your application code to use the 7.16 client and you'll be good.
As usual with upgrades, since you cannot revert them once started, test this in a test environment first.

How to define datacenter.group in conf/elasticsearch.yml in order to run Elassandra across multiple datacenters?

I have 2 DCs:
DC1
x.x.x.1 running Elassandra 6.2.3 (seed)
x.x.x.2 running Elassandra 5.5.0 (seed)
DC2
x.x.x.3 running Elassandra 6.2.3 (seed)
Actually, I didn't want to create a multi-datacenter setup. At first I had only the two nodes in DC1, but they were unable to connect to each other, because the minimum version that allows connectivity between Elassandra nodes is 5.6.
What stops me from reinstalling Elassandra 5.5 as 6.2 is that I have important data on that node, so I went with the multi-datacenter solution.
The solution I previously got from the Strapdata folks is:
1. Create a new Cassandra datacenter DC2 running version 6.2.3, with a dedicated datacenter group (see https://elassandra.readthedocs.io/en/latest/configuration.html#multi-datacenter-configuration).
2. Re-create your indices in DC2. There are a few differences in the Elasticsearch mapping between versions 5.5 and 6.2, so you have to deal with that manually.
If you have a lot of data to re-index, you can stop the single-threaded index build with nodetool stop -id <compaction_id> and restart it multi-threaded (see the sketch after this list and https://elassandra.readthedocs.io/en/latest/operations.html?highlight=--num-threads#create-delete-and-rebuild-index).
3. Test your application on DC2 (warning: there are breaking changes in the Elasticsearch API when upgrading).
4. Remove the old DC running version 5.5 when everything is OK on DC2.
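A rough sketch of that rebuild step, with hypothetical keyspace, table, and index names (the compaction id comes from nodetool compactionstats, and --num-threads is the Elassandra extension described in the operations doc linked above):

    # Stop the automatically started single-threaded index build:
    nodetool stop -id <compaction_id>
    # Restart the Elasticsearch index rebuild with 4 threads instead:
    nodetool rebuild_index --num-threads 4 my_keyspace my_table elastic_my_table_idx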
I've searched all over the internet, and there's no mention of datacenter.group in elasticsearch.yml (http://doc.elassandra.io/en/latest/configuration.html#multi-datacenter-configuration).
Now I have no idea what to do with the datacenter.group part.
Please help. Thanks!
For anyone who comes across this issue: after a couple of hours, I figured out how to define the datacenter.group.
Simply add datacenter.group: <your desired name> at the bottom of the elasticsearch.yml file.
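In other words, with a hypothetical group name and install path (adjust both to wherever your Elassandra config actually lives):

    echo 'datacenter.group: my_dc2_group' >> /opt/elassandra/conf/elasticsearch.yml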
Then restart the Cassandra service:
systemctl restart cassandra
You're good to go! All the data will automatically transfer to the new node.

How can I run Tarantool 2.3.1 with a snapshot from Tarantool 1.10.3?

Circumstances:
For 3 years there has been an application that uses Tarantool (currently 1.10.3), and at some point we decided to move it to Kubernetes and replace the old and ugly Dockerfile, based on Jessie, with the official image tarantool/tarantool:2.3.1. I don't know whether the data will survive this intact.
Given this, I have two questions, and I will be really happy to read the correct answers from you:
Does Tarantool 2.3.1-2-g92750c828 support recovering from snapshots made by Tarantool 1.10.3?
How can I reliably load the snapshot data with the new version? It would be great to do it without restarting Tarantool, because I have a PVC like "emptyDir" that can't retain any data when the pod restarts.
Yes, Tarantool 2.3 is compatible with Tarantool 1.10 in terms of binary protocol and snapshot format. If you can't simply run Tarantool 2.3 from a snapshot of 1.10, please file an issue: https://github.com/tarantool/tarantool/issues
I want to note one thing: after the upgrade to 2.3, it may be impossible to run the data back on 1.10 (some incompatible system spaces/records will be created).
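For context, the step that makes the on-disk data 2.x-only is the system schema upgrade, which you would typically run from the admin console once the new binary has started from the old snapshot (the instance name my_app is hypothetical):

    $ tarantoolctl enter my_app
    tarantool> box.schema.upgrade()   -- migrate system spaces to the new format
    tarantool> box.snapshot()         -- persist a snapshot in the 2.x format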
Sorry, I can't answer the Kubernetes part of this question. But note that you can't simply stop and start a single instance again without downtime; upgrading without downtime is only available for a replication cluster: https://www.tarantool.io/en/doc/2.4/book/admin/upgrades/#upgrading-tarantool-in-a-replication-cluster

Elasticsearch cluster data migration to new cluster

We have an Elasticsearch cluster running Elasticsearch 1.4 and Logstash 1.4, with 1 master and 4 data nodes. Now I want to upgrade Elasticsearch to 1.7 and Logstash to 1.5 without losing any data. My plan is to create a new cluster with new nodes and restore a snapshot of the current cluster onto it. Is this the best way, or should I upgrade the versions on the current cluster? I am a bit nervous because it's a production logging stack that works smoothly, and I don't want to mess around with the production cluster by experimenting on it.
First of all, read the documentation. As you said, you'd like to upgrade from 1.4 to 1.7, which means there's no significant version jump.
The documentation states that to upgrade from one 1.x version to another 1.x version you do a rolling upgrade. What's that? Quoting the documentation:
A rolling upgrade allows the ES cluster to be upgraded one node at a time, with no observable downtime for end users.
Which means you can shut the nodes down one by one, upgrade each one's binaries, and turn it back on. One node at a time!
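A rough sketch of the per-node loop, using the 1.x cluster settings API (the host is a placeholder); disabling allocation keeps the cluster from rebalancing shards while each node is briefly down:

    # 1. Disable shard allocation:
    curl -X PUT "http://localhost:9200/_cluster/settings" -d '{
      "transient": { "cluster.routing.allocation.enable": "none" }
    }'
    # 2. Stop the node, upgrade its Elasticsearch binaries, start it again.
    # 3. Re-enable allocation and wait for the cluster to recover:
    curl -X PUT "http://localhost:9200/_cluster/settings" -d '{
      "transient": { "cluster.routing.allocation.enable": "all" }
    }'
    curl "http://localhost:9200/_cluster/health?wait_for_status=green"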
Of course, always do a backup in case **** happens.
