Cassandra: 6 node cluster, RF=2: What to do when 2 nodes crash? - cassandra-2.0

Good Day
We have a 6-node Cassandra cluster with a replication factor of 3 on our keyspaces. Our applications use QUORUM, so we can survive the loss of a single node without it affecting the application.
Let's assume I lose 2 nodes at the same time. If my application were using a consistency level of ONE, it would have been fine and would have run without any issues, but we would like to keep the level at QUORUM.
My question is: if 2 nodes crash at the same time and I run nodetool removenode for each of the crashed nodes, will the cluster rebalance the data over the remaining 4 nodes (getting it back to 3 replicas), and once that is done, should my application be able to work again using QUORUM?

In the title you write RF=2, in the text RF=3. You did not specify the Cassandra version or whether you are using single tokens or vnodes. With RF=3, QUORUM means that 2 replicas must acknowledge a read or write before it returns. You may face minimal or no issues even if 2 nodes die; it depends on how many token ranges (partitions) the two nodes share.
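To make the RF=2 versus RF=3 difference concrete, here is a minimal sketch of the quorum arithmetic. The function names are made up for illustration; the formula is the usual "more than half of the replicas".

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_replica_losses(replication_factor: int) -> int:
    """Replicas of one partition that may be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


for rf in (2, 3):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerated losses={tolerated_replica_losses(rf)}")
```

With RF=3 one replica of a partition can be down; with RF=2 none can, because the quorum is still 2.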
Take a look at this distribution example, which is exactly like the one you describe: RF=3, 6 nodes.
Using single tokens:
If you lose pairs like (1,4), (2,5) or (3,6), your cluster should allow all writes and reads with no issues. A good client will recognize that nodes are down and won't use them as coordinators anymore. Other situations, for example the loss of nodes (1,6), might lead to any read/write on the F and E tokens failing (assuming an equal distribution, about 33% of read/write operations will fail).
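The single-token case can be checked with a small simulation. This is only a sketch under stated assumptions: six nodes numbered 1 to 6 in ring order, one token each, and each range replicated on its owner plus the next RF-1 nodes clockwise (simple ring placement). It does not reproduce the A-F token labels from the example, only the node pairs.

```python
from itertools import combinations

NODES = [1, 2, 3, 4, 5, 6]   # ring order (assumed)
RF = 3
QUORUM = RF // 2 + 1


def replica_sets(nodes, rf):
    """For each range: its owner plus the next rf-1 nodes clockwise."""
    n = len(nodes)
    return [[nodes[(i + k) % n] for k in range(rf)] for i in range(n)]


def failing_fraction(down):
    """Fraction of ranges that cannot reach QUORUM with these nodes down."""
    sets = replica_sets(NODES, RF)
    failing = [s for s in sets
               if sum(node not in down for node in s) < QUORUM]
    return len(failing) / len(sets)


for pair in combinations(NODES, 2):
    print(pair, f"{failing_fraction(set(pair)):.0%} of ranges fail at QUORUM")
```

Under these assumptions, opposite pairs such as (1,4) lose nothing, while adjacent pairs such as (1,6) share two of the six ranges, which is where the roughly 33% figure comes from.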
Using vnodes:
Here the situation is slightly different and also depends on which pair you lose. If you repeat the worst-case scenario above and lose a pair of nodes like (1,6), only the B tokens will be affected in read/write operations, since that is the only token shared between them.
That said, and with the possible scenarios clarified, here is your answer. nodetool removenode should be used as explained in this document. Use removenode IF AND ONLY IF you want to reduce the cluster size (see here for what to do if you want to replace a dead node). Once you have done that, your application will start working again with QUORUM, since other nodes will become responsible for the partitions previously assigned to the dead nodes.
If you are using the official DataStax Java Driver, you might want to let the driver temporarily fight your monsters by specifying a DowngradingConsistencyRetryPolicy.
HTH,
Carlo

Related

number of nodes in elasticsearch cluster

In our university we have an Elasticsearch cluster with 1 node. Now we have money to install more powerful servers. We produce 7-10 million access logs per day.
Which is the better way to build the cluster:
a. 3 powerful servers, each with 64 GB and 16 CPUs + SSD.
b. 14 less powerful servers, each with 32 GB and 8 CPUs + SSD.
PS: a and b have the same price.
c. Maybe some other recommendation?
Thank you in advance
It depends on the scenario. For the logging case you are describing, option b seems more flexible to me. Let me explain my opinion:
As you are in a logging scenario, implement the hot/warm architecture. You'll mainly write and read recent indices. In a few cases you'll want to access older data, and you'll probably want to shrink old indices and close even older ones.
Set up at least 3 master-eligible nodes to prevent split-brain problems. Configure the same nodes as coordinating nodes as well (11 nodes left).
Install 2 ingest nodes to move the ingestion workload to dedicated nodes (9 nodes left).
Install 3 hot data nodes for storing the most recent indices (6 nodes left).
Install 6 warm data nodes for holding older, shrunk and closed indices (0 nodes left).
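The node budget from the list above can be written down as a tiny script, which makes it easy to try other role splits for option b. The role labels are just dictionary keys chosen for this sketch, not Elasticsearch settings.

```python
TOTAL_NODES = 14

# Role counts taken from the list above; the keys are only labels.
roles = {
    "master-eligible + coordinating": 3,
    "ingest": 2,
    "hot data": 3,
    "warm data": 6,
}

remaining = TOTAL_NODES
for role, count in roles.items():
    remaining -= count
    print(f"{role:32} {count:2d} nodes ({remaining} left)")

# The plan is only valid if it uses exactly the servers that were bought.
assert remaining == 0, "role counts must add up to the 14 available servers"
```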
The previous setup is just an example. The node counts/roles should be changed in these cases:
If you need more resiliency, add more master nodes and increase the replica count of the indices. This will also reduce the total capacity.
The more old data you need to keep searchable or hold in already closed indices, the more warm nodes you'll need. Rebalance the hot/warm node count according to your needs. If you can drop your old data early, increase the hot node count.
If you have X-Pack licensed, consider installing ML/alerting nodes. Add these roles to the master nodes or reduce the data node count in favor of ML/alerting.
Do you need Kibana/Logstash? Depending on the workload, reserve one or two nodes exclusively for them.
Assuming both options use the same mainboards, you have more potential to quickly scale up the 14 boxes just by adding more RAM/CPU/storage. With 3 nodes already maxed out on specs, you would need to set up new boxes and join them to the cluster in order to scale. But that may also give you more recent hardware in your rack over time.
Please also have a look at this: https://www.elastic.co/pdf/architecture-best-practices.pdf
If you need some background on sharding configuration, please see ElasticSearch - How does sharding affect indexing performance?
BTW: thomas is right with his comment about the heap size. Please have a look at this if you want to know the background: https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html

Multiple datacenter replication and local quorum?

I created a cluster of 6 nodes:
3 nodes in EU west1 and 3 nodes in EU west2.
I set the locality for every group of nodes like this: --locality=region=europe,datacenter=west1
I also set the replica count to 6 to have all ranges and all data on every node.
What will happen if the connection between the datacenters is lost? Does the whole cluster go down?
I tried to kill the 3 nodes in one of the datacenters, and the cluster is not operational because the majority of the nodes are down and the live nodes are fewer than the quorum of 4.
Is it possible to make the 2 datacenters work with their local quorum of 2/3?
I also played a bit with the replication settings. Sometimes the cluster is healthy if I kill 3 nodes out of 6 and I am able to write to it; sometimes I can only read from it. The cluster is working with a replica count of 5 and 3 nodes killed out of 6. I am still playing with this, but if someone can give me more information it would be very helpful.
Being able to replicate across datacenters is a very cool feature, but losing the whole cluster when one of the datacenters is down ruins the whole idea, at least for me.
CockroachDB requires a majority of replicas to be fully operational, which means > half, not >= half. In order to survive the loss of a full datacenter or region, you must have three DCs/regions, not two. Try running two nodes in each of three regions instead of three nodes in two regions.
Is it possible to make the 2 datacenters work with their local quorum of 2/3?
Not for a single table (because it would be impossible to guarantee consistency if each datacenter were able to act in isolation from the other). You've configured the data to be replicated across all six replicas, which means four replicas are required to make a quorum. If you want each datacenter to be able to operate independently of the other, you would need two separate tables, with each one configured to be located within one of the datacenters.
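The majority rule can be checked for both layouts discussed here. This is a rough sketch, assuming one replica of every range on every node (as in the question's "replica count equals node count" setup); the function names are hypothetical and not part of any CockroachDB API.

```python
from typing import Sequence


def majority(replicas: int) -> int:
    """Smallest number of replicas that is strictly more than half."""
    return replicas // 2 + 1


def survives_datacenter_loss(nodes_per_dc: Sequence[int]) -> bool:
    """True if losing the largest datacenter still leaves a majority.

    Assumes every node holds one replica of every range.
    """
    total = sum(nodes_per_dc)
    remaining = total - max(nodes_per_dc)
    return remaining >= majority(total)


print(survives_datacenter_loss([3, 3]))     # two datacenters, three nodes each
print(survives_datacenter_loss([2, 2, 2]))  # three regions, two nodes each
```

With two datacenters of three nodes, 3 survivors are fewer than the 4 required; with three regions of two nodes, the 4 survivors are exactly the required majority.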
Thanks for the answer. Just to clear up a few things, though it looks like you got my point and what I want to accomplish.
As far as I understand, if I have 2x3 nodes in 2 different DCs and one DC goes down, I have 3 live nodes, but for the quorum I need at least 4 (N/2 + 1).
So if I have 3x3, I can lose one DC, because with 2 DCs live I will still have a quorum.
And one last question: if I don't set replication to 9 and I lose 3 nodes in one DC, some ranges will not be available, right?

Datastax Cassandra - Spanning Cluster node across amazon region

I am planning to launch three EC2 instances across Amazon hosting regions, say Region-A, Region-B and Region-C.
Based on the above plan, each region acts as a cluster (or datacenter) and has one node. (Correct me if I am wrong.)
Using this infrastructure, can I attain the configuration below?
Replication Factor: 2
Write and Read Level: QUORUM.
My basic intention is to achieve this: if two regions go down, I can survive with the remaining region.
Please help me with your inputs.
Note: I am very new to Cassandra, so any input you give will be useful to me.
Thanks
If you have a replication factor of 2 and use a CL of QUORUM, you will not tolerate any failure: if a node goes down you only get 1 ack, and that's not a majority of responses.
If you deploy across multiple regions, each region is, as you mention, a DC in your cluster. Each individual DC is a complete replica of all your data, i.e. it will hold all the data for your keyspace. If you read/write at a LOCAL_* consistency level (e.g. LOCAL_ONE, LOCAL_QUORUM) within each region, then you can tolerate the loss of the other regions.
The number of replicas in each DC/region and the consistency level you use to read/write in that DC determine how much failure you can tolerate. QUORUM is a cross-DC consistency level: it requires a majority of acks from ALL replicas in your cluster, across all DCs. If you lose 2 regions, it's unlikely that you will get a quorum of responses.
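The difference between QUORUM and the LOCAL_* levels can be sketched as follows. This is a simplified model, not driver code: it assumes a whole DC is either fully up or fully down, uses a replication factor of 3 per region purely as an example, and the function and region names are illustrative.

```python
def quorum(replicas: int) -> int:
    """More than half of the given number of replicas."""
    return replicas // 2 + 1


def can_serve(rf_per_dc, down_dcs, local_dc, level):
    """Whether a request coordinated in local_dc can be satisfied.

    Simplification: every replica in a down DC is unreachable and
    every replica in a live DC is reachable.
    """
    live = {dc: rf for dc, rf in rf_per_dc.items() if dc not in down_dcs}
    if local_dc not in live:
        return False
    if level == "QUORUM":          # majority over ALL replicas in all DCs
        return sum(live.values()) >= quorum(sum(rf_per_dc.values()))
    if level == "LOCAL_QUORUM":    # majority of the local DC's replicas
        return live[local_dc] >= quorum(rf_per_dc[local_dc])
    if level == "LOCAL_ONE":       # any single local replica
        return live[local_dc] >= 1
    raise ValueError(f"unsupported consistency level: {level}")


rf = {"Region-A": 3, "Region-B": 3, "Region-C": 3}
down = {"Region-B", "Region-C"}
for level in ("QUORUM", "LOCAL_QUORUM", "LOCAL_ONE"):
    print(level, can_serve(rf, down, "Region-A", level))
```

With two of three regions down, QUORUM needs 5 of 9 replicas and fails, while LOCAL_QUORUM and LOCAL_ONE only look at the 3 replicas in Region-A and succeed.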
Also, it's worth remembering that Cassandra can be made aware of the AZs it is deployed on in the region and can do its best to ensure replicas of your data are placed in multiple AZs. This will give you even better tolerance to failure.
If this were me and I didn't need a strong cross-DC consistency level (like QUORUM), I would have 4 nodes in each region, deployed across the AZs, and a replication factor of 3 in each region. I would then read/write at LOCAL_QUORUM or LOCAL_ONE (preferably). If you go with LOCAL_ONE then you could have fewer replicas in each DC, e.g. a replication factor of 2 with LOCAL_ONE means you could tolerate the loss of 1 replica.
However, this would be more expensive than what you're initially suggesting, but (for me) that would be the minimum setup I would need if I wanted to be in multiple regions and tolerate the loss of 2. You could go with 3 nodes in each region if you really wanted to save costs.

How to avoid or minimize a split brain with 3 nodes in Elasticsearch

I have tried to read the different recommendations here and there, but I still don't see how 3 nodes solve the problem compared to two nodes.
If I have nodes A<->B<->C<->A and the A <-> B network connection is down, how do I avoid the possibility of forming 2 clusters, A <-> C and B <-> C, running in parallel?
A split brain scenario occurs when some of your cluster nodes can't connect to other cluster nodes.
With a large estate - especially geographically distributed - the intermediate link might be lost. If both parts of the cluster are quorate (have enough nodes available as defined in the config) they carry on 'working' and you end up with data going out of sync between the two.
This cannot happen in your scenario, because node C knows that both A and B are online - thus you don't get two separate clusters forming, even if the comms between A and B are offline.
If A was lost entirely, B+C are more than 50% of the nodes - therefore your cluster knows it's 'quorate' and can continue operating normally.
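Why three nodes help where two do not can be sketched with a few lines of counting. The model is deliberately simple: it only looks at clean two-way partitions (not the partial A<->B failure from the question), and it treats the election threshold as a plain number, the one that older Zen-discovery-based releases expose as discovery.zen.minimum_master_nodes. The function names are made up for this example.

```python
def quorum(master_eligible: int) -> int:
    """More than half of the master-eligible nodes."""
    return master_eligible // 2 + 1


def split_brain_possible(node_count: int, minimum_master_nodes: int) -> bool:
    """True if some clean two-way partition lets BOTH sides elect a master."""
    return any(
        size >= minimum_master_nodes
        and node_count - size >= minimum_master_nodes
        for size in range(1, node_count)
    )


def survives_one_node_loss(node_count: int, minimum_master_nodes: int) -> bool:
    """True if the cluster can still elect a master after losing one node."""
    return node_count - 1 >= minimum_master_nodes


for nodes, threshold in ((2, 1), (2, 2), (3, quorum(3))):
    print(f"{nodes} nodes, threshold {threshold}: "
          f"split brain possible={split_brain_possible(nodes, threshold)}, "
          f"survives one loss={survives_one_node_loss(nodes, threshold)}")
```

With two nodes you have to choose between a possible split brain (threshold 1) and no fault tolerance (threshold 2); with three nodes and a threshold of 2 you get neither problem.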
Enhancements to Zen discovery fixing partial isolation

RavenDb topology with load balancing and redundancy

We're trying to come up with an appropriate RavenDb topology that would allow us to balance load as well as be fault tolerant.
It seems that the better approach for load balancing would be native sharding. We might shift to it, but due to domain peculiarities it is not trivial at this point.
In order to have redundancy, we just set up 2 RavenDb nodes per group with master/master replication between them, so if one fails, the RavenDb client will automatically switch to the other one.
We have an indexing "component" which is the only one that writes to the database, so it will be writing to one node, and we expect these changes to be distributed eventually. We're going to set up master/master replication between the two groups of RavenDb nodes, so if the indexing component eventually falls back to group 1, changes should be replicated to the second group.
The schema
So, it seems there's a low risk of conflicts, since we have only one player writing to the database (in batches, once a minute). Several questions regarding this setup:
Is it typical for RavenDb to have so many master/master replications?
Can this problem be solved in an easier way?
How do we configure the client's fallback conventions so each web node fails over to another node in its group first, before failing over to another group's RavenDb node?
Do you see any potential issues if we embed simple round-robin logic for reads on each of the web nodes (web node 1 will read from both RavenDb1_01 and RavenDb1_02)? Will it make the standard RavenDb fallback behavior go crazy?
1) It is pretty common to have many nodes in such a cluster, yes. Note that you need to set up replication so that changes are replicated throughout such a topology.
2) It is generally easier to have a fully connected topology, rather than the layers you have here.
3) The failover is always based on the client's primary node's order of destinations. In other words, suppose node 2 has destinations (node 1, node 3) and node 1 has destinations (node 3, node 2). A client that was originally connected to node 2 would go to node 1 and then node 3 on failover, and a client that was originally connected to node 1 would go to node 3 and then node 2 on failover.
4) Round robin and failover behave separately.
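The ordering rule from point 3 can be illustrated with a toy model. This is not RavenDb client code; the dictionary and function name are invented here to show how the primary node's destination list determines the failover sequence.

```python
def failover_order(primary, destinations):
    """Nodes a client tries: its primary first, then that primary's
    replication destinations in their configured order."""
    return [primary, *destinations[primary]]


# Destination lists from the example in point 3.
destinations = {
    "node 2": ["node 1", "node 3"],
    "node 1": ["node 3", "node 2"],
}

print(failover_order("node 2", destinations))  # ['node 2', 'node 1', 'node 3']
print(failover_order("node 1", destinations))  # ['node 1', 'node 3', 'node 2']
```

To make a web node fail over within its own group first, the same-group node would have to come first in its primary's destination list.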
