How to gracefully upgrade production rethinkdb server - rethinkdb

We host a RethinkDB cluster with millions of docs in production.
Every time we try to upgrade RethinkDB, we have to rebuild the indexes, which makes the table almost unavailable (reads are very slow, and it is not writable) for hours or even days during the upgrade.
So how do you upgrade RethinkDB in production?
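For context, the rebuild we end up waiting on looks roughly like this from the official Python driver (older import style; the host, database, and table names are placeholders):

    import rethinkdb as r

    # Connect to one node of the cluster (host/port/db are placeholders).
    conn = r.connect(host="rethinkdb-node-1", port=28015, db="app")

    # Show which secondary indexes are still being rebuilt after the upgrade.
    for status in r.table("documents").index_status().run(conn):
        print(status["index"], "ready" if status["ready"] else "rebuilding")

    # Block until every index on the table is usable again.
    r.table("documents").index_wait().run(conn)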

Related

Designing ElasticSearch Migration from 6.8 to 7.16 along with App Deployment

I have a Spring Boot application that uses ElasticSearch 6.8 and I would like to migrate it to Elasticsearch 7.16 with the least possible downtime. I can do a rolling update, but the problem with the migration is that when I migrate my ES cluster from version 6 to 7, some features in my application fail because of breaking changes (for example, the total hits response change).
I also upgraded my ElasticSearch client to version 7 in a separate branch and I can deploy it as well, but that client doesn't work with ES version 6. So I cannot first release the application and then do the ES migration. I thought about doing the application deployment and the ES migration at the same time with a few hours of downtime, but in case something goes wrong a rollback may take too much time (we have >10 TB of data in PROD).
I still couldn't find a good solution to this problem. I'm thinking of migrating only the ES data nodes to 7.16 and keeping the master nodes on 6.8, then doing the application deployment and migrating the ElasticSearch master nodes together with a small downtime. Has anyone tried doing this? Would running the data and master nodes of my ElasticSearch cluster on different versions (6.8 and 7.16) cause problems?
Any help / suggestion is much appreciated
The breaking change you mention can be alleviated by using the query string parameter rest_total_hits_as_int=true in your client code in order to keep getting total hit count as in version 6 (mentioned in the same link you shared).
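As a rough illustration of what that parameter changes (a plain REST call here; the host and index name are placeholders, and the same query string parameter can be set from your application's client):

    import requests

    # Ask a 7.x cluster to report the total hit count as a plain integer,
    # as 6.x did, instead of the new {"value": ..., "relation": ...} object.
    resp = requests.post(
        "http://localhost:9200/my-index/_search",
        params={"rest_total_hits_as_int": "true"},
        json={"query": {"match_all": {}}},
    )
    print(resp.json()["hits"]["total"])  # an integer again, not an object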
Running master and data nodes with different versions is not supported and I would not venture into it. If you have a staging environment where you can test this upgrade procedure, so much the better.
Since 6.8 clients are compatible with 7.16 clusters, you can add that small bit to your 6.8 client code, then you should be able to upgrade your cluster to 7.16.
When your ES server is upgraded, you can upgrade your application code to use the 7.16 client and you'll be good.
As usual with upgrades, since you cannot revert them once started, you should test this on a test environment first.

Upgrade from RDS Aurora 1 (MySQL 5.6) to Aurora 2 (MySQL 5.7) without downtime on a very active database

Is there a way to upgrade from Aurora 1 (MySQL 5.6) to Aurora 2 (MySQL 5.7) without downtime on an active database? This seems like a simple task given we should be able to simply do major version upgrades from either the CLI or the Console, but that is not the case.
We tried:
Creating a snapshot of the database
Creating a new cluster using Aurora 2 (MySQL 5.7) from the snapshot
Configuring replication to the new cluster from the primary cluster
However, because you can't run commands that require SUPER user privileges in Aurora you're not able to stop transactions long enough to get a good binlog pointer from the master, which results in a ton of SQL errors that are impossible to skip on an active database.
Also, because Aurora does not use binlog replication to its read replicas, I can't simply stop replication on a read replica and get the pointer.
I have seen this semi-related question, but it certainly requires downtime: How to upgrade AWS RDS Aurora MySQL 5.6 to 5.7
UPDATE: AWS just announced in-place upgrade option available for 5.6 > 5.7:
https://aws.amazon.com/about-aws/whats-new/2021/01/amazon-aurora-supports-in-place-upgrades-mysql-5-6-to-5-7/
As simple as Modify and choosing a 2.x version. :)
I tested this Aurora MySQL 5.6 > 5.7 upgrade on a 25 GB db that was many minor versions behind, and it took 10 minutes, with 8 minutes of downtime. Not zero downtime, but a very easy option, and it can be scheduled in AWS to happen automatically during off-peak times (maintenance window).
Additionally, consider RDS Proxy to reduce downtime. During short windows when the db is unavailable (e.g. a reboot for minor updates), the proxy holds connections open instead of refusing them outright, so the interruption appears only as a brief delay/latency.
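If you'd rather script the in-place upgrade than click Modify in the console, a boto3 sketch along these lines should do it (the cluster identifier and target engine version are placeholders; check which 2.x engine versions are available in your region first):

    import boto3

    rds = boto3.client("rds")

    # In-place major version upgrade of an Aurora MySQL 5.6 cluster to 2.x (5.7).
    rds.modify_db_cluster(
        DBClusterIdentifier="my-aurora-cluster",       # placeholder
        EngineVersion="5.7.mysql_aurora.2.10.2",       # pick an available 2.x version
        AllowMajorVersionUpgrade=True,
        ApplyImmediately=True,                         # or let it wait for the maintenance window
    )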
The need was to upgrade AWS RDS Aurora MySQL from 5.6 to 5.7 without causing any downtime in production. Being a SaaS solution, we could not afford any downtime.
Background
We have a distributed architecture based on microservices running in AWS Fargate and AWS Lambda. AWS RDS Aurora MySQL is used for data persistence. While other services are in use, they are not relevant to this use case.
Approach
After deliberating on an in-place upgrade with a declared downtime and maintenance window, we realized that a zero-downtime upgrade was necessary, as otherwise we would have created a processing backlog for ourselves.
High level approach was:
Create an AWS RDS Cluster with the required version and copy the data from the existing RDS Cluster to this new Cluster
Set up AWS DMS (Data Migration Service) between these two clusters
Once the replication is complete and ongoing, switch the application to point to the new DB. In our case, the microservices running in AWS Fargate had to be updated with the new endpoint, and the deployment took care of draining the old tasks and using the new ones.
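A minimal boto3 sketch of the DMS replication task for step 2, assuming the source/target endpoints and the replication instance already exist (all ARNs and identifiers below are placeholders):

    import json
    import boto3

    dms = boto3.client("dms")

    # Full load of the existing data, then ongoing change data capture (CDC)
    # from the old 5.6 cluster into the new 5.7 cluster.
    dms.create_replication_task(
        ReplicationTaskIdentifier="aurora-56-to-57",                         # placeholder
        SourceEndpointArn="arn:aws:dms:region:account:endpoint:source-ph",   # placeholder
        TargetEndpointArn="arn:aws:dms:region:account:endpoint:target-ph",   # placeholder
        ReplicationInstanceArn="arn:aws:dms:region:account:rep:instance-ph", # placeholder
        MigrationType="full-load-and-cdc",
        TableMappings=json.dumps({
            "rules": [{
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "all-tables",
                "object-locator": {"schema-name": "%", "table-name": "%"},
                "rule-action": "include",
            }]
        }),
    )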
For the complete post, please check out
https://bharatnainani1997.medium.com/aws-rds-major-version-upgrade-with-zero-downtime-5-6-to-5-7-b0aff1ea1f4
When you create a new Aurora cluster from a snapshot, you get a binlog pointer in the error log from the point at which the snapshot was taken. You can use that to set up replication from the old cluster to the new cluster.
I've followed a similar process to what you've described in your question (multiple times in fact) and was able to keep the actual downtime in the low seconds range.
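For completeness, this is roughly what wiring up that replication looks like once you have the binlog file and position from the new cluster's error log, run against the new cluster (hosts, credentials, and the binlog coordinates below are placeholders):

    import pymysql

    # Connect to the NEW cluster's writer endpoint (placeholders throughout).
    conn = pymysql.connect(host="new-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",
                           user="admin", password="secret")
    with conn.cursor() as cur:
        # Point the new cluster at the old one, starting from the binlog
        # coordinates recorded when the snapshot was taken.
        cur.execute(
            "CALL mysql.rds_set_external_master("
            "'old-cluster.cluster-yyyy.us-east-1.rds.amazonaws.com', 3306, "
            "'repl_user', 'repl_password', 'mysql-bin-changelog.000123', 456, 0)"
        )
        cur.execute("CALL mysql.rds_start_replication")
    conn.commit()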

elasticsearch temporary crash when heavily adding/updating

I am running Elasticsearch on a dedicated server for a SaaS platform. The problem is that when cron jobs execute and massively update/insert new values into Elasticsearch, the front office (the site) fails when it tries to connect to Elasticsearch (the connection returns false).
Does anyone know what the problem could be and how it can be fixed? We are running the latest stable Elasticsearch version.
This happens on and off: when I refresh the page in the front office, sometimes it cannot connect to Elasticsearch, after another refresh it works again, and so on, until the heavy load passes.
We have NVMe drives, and Elasticsearch runs only on that server as a single node, not a multi-node cluster.
When I say heavily, I mean 1000-2000 updates per second.
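For illustration, batching those writes through the Python client's bulk helper (the index name, document shape, and action generator here are hypothetical) is usually far gentler on a single node than sending thousands of individual update requests:

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["http://localhost:9200"])  # placeholder host

    def update_actions(docs):
        # Turn the cron job's output into bulk "update" actions
        # (index name and document shape are hypothetical).
        for doc_id, fields in docs:
            yield {
                "_op_type": "update",
                "_index": "products",
                "_id": doc_id,
                "doc": fields,
            }

    docs = [("42", {"price": 9.99})]  # example payload produced by a cron job

    # Send the updates in chunks and back off on 429 (rejected) responses
    # instead of hammering the node with one request per document.
    helpers.bulk(es, update_actions(docs), chunk_size=500,
                 max_retries=5, initial_backoff=2)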

Long time index recovery when install plugin into existing Elasticsearch

I installed the langdetect plugin into an Elasticsearch beta cluster that has just 1 node and around 455 indices. When the server is restarted, it takes around 5-10 minutes to reach yellow status.
I think that if this plugin is installed in production, which has many nodes and thousands of indices, recovery will perhaps take a very long time.
Has anyone run into a situation like this? How did you deal with it? Could I restart ES with zero downtime?
P.S. We use ES version 5.3.2.
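With a single node there is no way to avoid the restart gap entirely, but on a multi-node cluster the usual rolling-restart sequence (disable shard allocation, synced flush, restart one node, re-enable allocation) keeps the cluster serving throughout. A sketch with the Python client against a 5.x cluster (the host is a placeholder):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # placeholder host

    # 1. Stop shard re-allocation so the cluster doesn't start copying shards
    #    around the moment the node goes down.
    es.cluster.put_settings(body={
        "transient": {"cluster.routing.allocation.enable": "none"}
    })

    # 2. Synced flush so shards can recover quickly from their local copies (5.x/6.x).
    es.indices.flush_synced()

    # ... restart the node / install the plugin here ...

    # 3. Re-enable allocation and wait for the cluster to settle.
    es.cluster.put_settings(body={
        "transient": {"cluster.routing.allocation.enable": "all"}
    })
    es.cluster.health(wait_for_status="yellow")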

Elasticsearch cluster data migration to new cluster

We have an Elasticsearch cluster running Elasticsearch 1.4 and Logstash 1.4 with 1 master and 4 data nodes, and now I want to upgrade Elasticsearch to 1.7 and Logstash to 1.5 without losing any data. My plan is to create a new cluster with new nodes and restore a snapshot of the current cluster onto it. My question is: is this the best way, or should I upgrade the versions on the current cluster? I am a bit nervous because it is a production logging stack that is working smoothly, and I don't want to mess around with the production cluster for testing.
First of all, read the documentation. As you said, you'd like to upgrade from 1.4 to 1.7, which means there's no major version jump.
The documentation states that when upgrading from one 1.x version to another 1.x version you do a rolling upgrade. What's that? Quoting the documentation:
A rolling upgrade allows the ES cluster to be upgraded one node at a
time, with no observable downtime for end users.
Which means you can shut the nodes down one by one, upgrade their binaries and turn them back on. One node at a time!
Of course, always do a backup in case **** happens.
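Since that backup is the safety net for the rolling upgrade, here is a quick sketch of a pre-upgrade snapshot with the Python client (the host, repository name, and path are placeholders; on recent versions the location must also be whitelisted in path.repo on every node):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # placeholder host

    # Register a shared-filesystem snapshot repository (placeholder location).
    es.snapshot.create_repository(repository="upgrade_backup", body={
        "type": "fs",
        "settings": {"location": "/mnt/es_backups/upgrade_backup"},
    })

    # Snapshot all indices before touching the first node.
    es.snapshot.create(repository="upgrade_backup", snapshot="pre_1_7_upgrade",
                       body={"indices": "_all"}, wait_for_completion=True)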
