Kafka replicas not in sync once a host is replaced - amazon-ec2

Hello Kafka/Zookeeper users,
My team has a Kafka cluster which works in conjunction with Apache ZooKeeper. Kafka is hosted on EC2. For any number of reasons, an EC2 host can go down and be replaced by a new host. The new host has a different broker id than the previous one (the id is generated by AWS, not by us).
At this point, ZooKeeper still has the old state in which the previous host was a replica of some partitions.
Although leader re-election happened successfully, the new replacement host was not utilized in any way, either as leader or as replica.
The Kafka documentation talks about the broker 'coming up again' after some time, but in the EC2 world the host is permanently replaced.
In distributed systems terminology we only attempt to handle a "fail/recover" model of failures where nodes suddenly cease working and then later recover (perhaps without knowing that they have died).
I understand the reason for that. ZooKeeper contains the state of each partition, and that state still lists the old dead host as leader and/or follower. When the new host comes up, this state is not updated to include it until we manually run a command to set the replicas.
Is there a way Kafka can automatically use the new broker as a leader or in-sync replica (ISR)?
This is putting a lot of operational burden on our team, since we have to manually assign the new broker as a replica and trigger a 'preferred leader election'.

Preferred leader election can be triggered automatically by turning on the auto.leader.rebalance.enable config and tuning leader.imbalance.per.broker.percentage.
However, the problem you are facing is that new servers will not automatically be assigned any existing data partitions, so unless partitions are moved to them they won't be doing any work until new topics are created.
It seems you have to come up with a scheme that automatically executes the kafka-reassign-partitions.sh script whenever a replacement occurs; no purely automatic mechanism is offered out of the box.
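As a rough sketch of such a scheme (the broker ids, topic name, and ZooKeeper address below are made up for illustration), the rebalance settings go on every broker and a reassignment is run whenever a new broker id appears:

# server.properties on every broker: let Kafka move leadership back automatically
auto.leader.rebalance.enable=true
leader.imbalance.per.broker.percentage=10

# reassign.json: put the new broker (id 1004 here) into the replica list of the
# partitions the dead broker used to host
{"version":1,"partitions":[
  {"topic":"my-topic","partition":0,"replicas":[1001,1004]},
  {"topic":"my-topic","partition":1,"replicas":[1002,1004]}
]}

# run the reassignment against ZooKeeper, then check progress
kafka-reassign-partitions.sh --zookeeper zk1:2181 --reassignment-json-file reassign.json --execute
kafka-reassign-partitions.sh --zookeeper zk1:2181 --reassignment-json-file reassign.json --verify

A small cron job or an EC2 lifecycle hook could generate the JSON from the current broker list and fire the script, which is essentially the manual step you are doing today, just automated.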

Related

HDFS migrate datanodes servers to new servers

I want to migrate our Hadoop server with all the data and components to new servers (a newer version of Red Hat).
I saw a post on the Cloudera site about how to move the namenode,
but I don't know how to move all the datanodes without data loss.
We have a replication factor of 2.
If I shut down one datanode at a time, will HDFS generate new replicas?
Is there a way to migrate all the datanodes at once? What is the correct way to transfer all the datanodes (about 20 servers) to a new cluster?
Also, I wanted to know if HBase will have the same problem, or if I can just delete and add the roles on the new servers.
Update for clarification:
My Hadoop cluster already contains two sets of servers (they are in the same Hadoop cluster, I just split it logically for the example):
The first set is the older version of Linux servers.
The second set is the newer version of Linux servers.
Both sets already share data and components (the namenode is in the old set of servers).
I want to remove the old set of servers so that only the new set remains in the Hadoop cluster.
Should the procedure be:
shut down one datanode (from the old server set)
run the balancer and wait for it to finish
do the same for the next datanode
If so, the balancer run takes a lot of time and the whole operation will take a lot of time.
The same problem applies to HBase:
right now the HBase regionservers and master are only on the old set of servers, and I want to remove them and install them on the new set of servers without data loss.
Thanks
New Datanodes can be freely added without touching the namenode. But you definitely shouldn't shut down more than one at a time.
As an example, if you pick two servers to shut down at once and both hold replicas of the same block, that block has no chance of being re-replicated somewhere else. Therefore, upgrade one at a time if you're reusing the same hardware.
In an ideal scenario, your OS disk is separated from the HDFS disks. In which case, you can unmount them, upgrade the OS, reinstall HDFS service, remount the disks, and everything will work as previously. If that isn't how you have the server set up, you should do that before your next upgrade.
In order to get replicas added to any new datanodes, you'll need to either 1) Increase the replication factor or 2) run the HDFS rebalancer to ensure that the replicas are shuffled across the cluster
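Purely as an illustration of those two options with the stock HDFS CLI (the replication factor and threshold values below are arbitrary examples):

# option 1: raise the replication factor so extra copies land on the new nodes
hdfs dfs -setrep -w 3 /

# option 2: run the balancer until no datanode deviates more than 10% from the cluster average
hdfs balancer -threshold 10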
I'm not too familiar with Hbase, but I know you'll need to flush the regionservers before you install and migrate that service to other servers. But if you flush the majority of them without rebalancing the regions, you'll have one server that holds all the data. I'm sure the master server has similar caveats, although hbase backup seems to be a command worth trying.
#guylot - After adding the new nodes and running the balancer process, take the old nodes out of the cluster by going through the decommissioning process. The decommissioning process will move the data to other nodes in your cluster. As a matter of precaution, only run it against one node at a time. This will limit the potential for a data-loss incident.
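In case it helps, decommissioning is driven by the exclude file referenced from hdfs-site.xml; a rough outline (the file path and hostname below are examples, use whatever your distribution expects) looks like:

# hdfs-site.xml must point dfs.hosts.exclude at an exclude file, e.g.
#   <property><name>dfs.hosts.exclude</name><value>/etc/hadoop/conf/dfs.exclude</value></property>

# add the datanode you want to retire, then tell the namenode to re-read the file
echo old-datanode-01.example.com >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes

# watch until the node reports "Decommissioned" before touching the next one
hdfs dfsadmin -report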

Cache a redis cluster locally

I have a scenario where we want to use redis, but I am not sure how to go about setting it up. Here is what we want to achieve eventually:
A redundant central Redis cluster, with servers in two AWS regions, where all the writes will occur.
Local redis caches on servers which will hold a replica of the complete central cluster.
The reason for this is that we have many servers which need read access only, and we want them to be independent even in case of an outage (where the server cannot reach the main cluster).
I know there might be a "stale data" issue within the caches, but we can tolerate that as long as we get eventual consistency.
What is the correct way to achieve something like that using redis?
Thanks!
You need the Redis Replication (Master-Slave) Architecture.
Redis Replication :
Redis replication is a very simple to use and configure master-slave replication that allows slave Redis servers to be exact copies of master servers. The following are some very important facts about Redis replication:
Redis uses asynchronous replication. Starting with Redis 2.8, however, slaves periodically acknowledge the amount of data processed from the replication stream.
A master can have multiple slaves.
Slaves are able to accept connections from other slaves. Aside from connecting a number of slaves to the same master, slaves can also be connected to other slaves in a cascading-like structure.
Redis replication is non-blocking on the master side. This means that the master will continue to handle queries when one or more slaves perform the initial synchronization.
Replication is also non-blocking on the slave side. While the slave is performing the initial synchronization, it can handle queries using the old version of the dataset, assuming you configured Redis to do so in redis.conf. Otherwise, you can configure Redis slaves to return an error to clients if the replication stream is down. However, after the initial sync, the old dataset must be deleted and the new one must be loaded. The slave will block incoming connections during this brief window (that can be as long as many seconds for very large datasets).
Replication can be used both for scalability, in order to have multiple slaves for read-only queries (for example, slow O(N) operations can be offloaded to slaves), or simply for data redundancy.
It is possible to use replication to avoid the cost of having the master write the full dataset to disk: a typical technique involves configuring your master's redis.conf to avoid persisting to disk at all, then connecting a slave configured to save from time to time, or with AOF enabled. However, this setup must be handled with care, since a restarting master will start with an empty dataset: if the slave tries to synchronize with it, the slave will be emptied as well.
Go through the steps in How to Configure Redis Replication.
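As a minimal sketch of that setup (the master address below is made up), each read-only cache server only needs a few lines in its redis.conf:

# redis.conf on each local cache server (Redis 4.x and earlier use the slaveof/slave-* names;
# newer releases also accept replicaof/replica-*)
slaveof central-master.example.com 6379
slave-read-only yes
# keep serving the old dataset if the link to the central cluster goes down
slave-serve-stale-data yes

slave-serve-stale-data yes is what gives you the "keep working during an outage, accept eventual consistency" behaviour described in the question.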
So I decided to go with redis-sentinel.
Using a redis-sentinel I can set the slave-priority on the cache servers to 0, which will prevent them from becoming masters.
I will have one master set up, and a few "backup masters" which will actually be slaves with slave-priority set to a value which is not 0, which will allow them to take over once the master goes down.
The sentinel will monitor the master, and once the master goes down it will promote one of the "backup masters" to be the new master.
More info can be found here
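A minimal sketch of that layout (addresses, the pod name mymaster, and the quorum of 2 are illustrative): the cache boxes get slave-priority 0 so Sentinel never promotes them, while the "backup masters" keep a non-zero priority.

# redis.conf on the cache-only servers: replicate, but never be eligible for promotion
slaveof 203.0.113.10 6379
slave-priority 0

# redis.conf on the "backup master" servers
slaveof 203.0.113.10 6379
slave-priority 100

# sentinel.conf on the sentinel hosts (older Sentinel versions expect an IP here, not a hostname)
sentinel monitor mymaster 203.0.113.10 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000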

How to disable sentinel auto-slaveof when the previously dead redis-master is online again

I have a question about Redis Sentinel when there is a network partition.
I started a Redis server on server01 as master and a server on server02 as slave, with a Redis Sentinel on another server, and I set up a script to point the clients at the new master on failover.
Then a partition occurred isolating the master on server01, so the sentinel started a failover on server02, and the slave on server02 became the new master. All the clients are using the new master now, which is okay.
When the partition recovered, however, sentinel sent slaveof to the old master. The old master then removed all its data and synced with the new master, even though there was little difference between the new master and the old one. When there is more than one master-slave group, the sync takes up all the bandwidth in my production environment.
So how can I disable the automatic slaveof? Or is there a better idea?
Do you want to remove the old master altogether? If so, then before it comes back up issue a sentinel reset <podname> and it (the old master) will be removed from Sentinel. Of course, then you won't have a slave for the new master.
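For what it's worth, that reset is issued through redis-cli against the Sentinel port (the host name, the default port 26379, and the pod name mymaster below are just examples):

redis-cli -h sentinel-host.example.com -p 26379 SENTINEL RESET mymaster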
The way Redis currently works, the old master will always have to do a full re-sync with the new master to become a slave to it. So until replication changes in Redis itself as long as you want replication you'll have to accept the sync aspect.
That said, I'm not sure what you mean by "When there is more than one master-slave group...". Could you elaborate?

Novell eDirectory: Error while adding replica on new server

I want to add a replica of our whole eDirectory tree to a new server (OES11.2 SLES11.3).
So I wanted to do so via iManager. (Partitions and Replicas / Replica View / Add Replica)
Everything looks normal. I see our other servers with added replicas and of course the server with the master image.
For additional information: I have done this many times without problems until now.
When I want to add a replica to the new server, I get the following error: (Error -636) The server is unreachable.
I checked the /etc/hosts file and the network settings on both servers.
Ndsrepair looks normal too. All servers are in sync and there are no connection errors. The replica depth of the new server is -1. I get that, because there is no replica on it yet.
But if I can connect from one server to another and there are no error messages, why does adding a replica not work?
I also tried to make a LAN trace, but didn't get any information that would help me out here. In the trace the communication seems normal!
Am I forgetting something here?
Every server in our environment runs OES11.2 except the master server which runs OES11.1
Thanks for your help!
Daniel
Nothing is wrong.
Error -636 means that the replica is not yet available on the new server. Once synchronization finishes, the replica will be ready and available. Depending on the size of the tree and the communication channel, this can take up to several hours.

EC2 database server failover strategy

I am planning to deploy my web app to EC2. I have several webserver instances. I have 1 primary database instance. I have 1 failover database instance. I need a strategy to redirect the webservers to the failover database instance IP when the primary database instance fails.
I was hoping I could use an Elastic IP in my connection strings. But, the webservers are not able to access/ping the Elastic IP. I have several brute force ideas to solve the problem. However, I am trying to find the most elegant solution possible.
I am using all .Net and SQL Server. My connection strings are encrypted.
Does anybody have a strategy for failing over a database instance in EC2 using some form of automation or DNS configuration?
Please let me know.
http://alestic.com/2009/06/ec2-elastic-ip-internal tells you how to use the Elastic IP's public DNS name.
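In other words (the hostname below is hypothetical), you point the connection string at the Elastic IP's public DNS name; from inside EC2 that name resolves to the instance's internal address, so re-associating the Elastic IP with the standby instance repoints the webservers without changing the connection string:

Data Source=ec2-203-0-113-10.compute-1.amazonaws.com;Initial Catalog=myDataBase;Integrated Security=True;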
Haven't used EC2 but surely you need to either:
(a) put your front-end into some custom maintenance mode, that you define, while you switch the IP over; and have the front-end perform required steps to manage potential data integrity and data loss issues related to the previous server going down and the new server coming up when it enters and leaves your custom maintenance mode
OR, for a zero down-time system:
(b) design the system at the object/relational and transaction levels from the ground up to support zero-downtime failover. It's not something you can bolt on quickly to just any application.
(c) use some database support for automatic failover. I am unaware whether SQL Server support for failover suitable for your application exists or is appropriate here. I suggest adding a "sql-server" tag to the question to start a search for the right audience.
If Elastic IPs don't work (which sounds odd to say the least - shouldn't you talk to EC2 about that?), you may have to be able to instruct your front-end which new database IP to use at the same time as telling it to go from maintenance mode to normal mode.
If you're willing to shell out a bit of extra money, take a look at Rightscale's tools; they've built custom server images and supporting tools that handle database failover (among many other things). This link explains how to do it with MySQL, so will hopefully show you some principles even though it doesn't use SQL Server.
I always thought there was this possibility in the connection string.
This is taken (but not yet tested) from How to add Failover Partner to a connection string in VB.NET :
If you connect with ADO.NET or the SQL Native Client to a database that is being mirrored, your application can take advantage of the driver's ability to automatically redirect connections when a database mirroring failover occurs. You must specify the initial principal server and database in the connection string, as well as the failover partner server.
Data Source=myServerAddress;Failover Partner=myMirrorServerAddress;
Initial Catalog=myDataBase;Integrated Security=True;
There are of course many other ways to write the connection string using database mirroring; this is just one example pointing out the failover functionality. You can combine this with the other connection string options available.
To broaden gareth's answer, cloud management software usually solves this type of problem. RightScale is one of them, but you can try enStratus or Scalr (disclaimer: I work at Scalr). These tools provide failover solutions like:
Backups: you can schedule automated snapshots of the EBS volume containing the data
Fault-tolerant database: in the event of failure, a slave is promoted to master, and the mounted storage is switched over if the failed master and the new master are in the same AZ, or a snapshot of the volume is taken otherwise
If you want to build your own solution, you could replicate the process detailed below that we use at Scalr:
Is there a slave in the same AZ? If so, promote it, switch the EBS volumes (which are limited to a single AZ), switch any Elastic IP you might have, and reconfigure replication on the remaining slaves.
If not, is there a slave fully replicated in another AZ? If so, promote it, then do the above.
If there is no slave in the same AZ and no slave fully replicated in another AZ, create a snapshot from the master's volume, use that snapshot to create a new volume in an AZ where a slave is running, then do the above.
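As a very rough sketch of the EC2 side of those steps with the AWS CLI (all resource IDs and the address below are made up; promoting the slave itself is database-specific and not shown):

# snapshot the old master's data volume as a safety net
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "pre-failover snapshot"

# move the EBS data volume to the slave in the same AZ
aws ec2 detach-volume --volume-id vol-0123456789abcdef0 --force
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0fedcba9876543210 --device /dev/sdf

# re-point the Elastic IP at the promoted instance so clients keep using the same address
# (use --allocation-id instead of --public-ip for a VPC Elastic IP)
aws ec2 associate-address --instance-id i-0fedcba9876543210 --public-ip 203.0.113.10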
