cassandra: strategy for single datacenter deployment - amazon-ec2

We are planning to use Apache Shiro and Cassandra for distributed session management, very similar to this sample: https://github.com/lhazlewood/shiro-cassandra-sample
We need advice on deploying Cassandra in Amazon EC2.
In EC2, we have the setup below:
Single region, 2 Availability Zones (AZs), 4 nodes
Accordingly, Cassandra is configured as:
Single datacenter: DC1
Two racks: Rack1, Rack2
4 nodes: Rack1_Node1, Rack1_Node2, Rack2_Node1, Rack2_Node2
The data replication strategy used is NetworkTopologyStrategy.
Since Cassandra is used as the session datastore, we need high consistency and availability.
My Questions:
How many replicas shall I keep in a cluster?
Thinking of 2 replicas, 1 per rack.
What should the consistency level (CL) be for read and write operations?
Thinking of QUORUM for both read and write, considering 2 replicas in a cluster.
In case 1 rack is down, would Cassandra write & read succeed with the above configuration?
I know it can use hinted handoff for a temporarily down node, but does that work for both read and write operations?
Any other suggestions for my requirements?

Generally, going for an even number of nodes is not the best idea, and neither is an even number of availability zones. In this case, if one of the racks fails, you lose half of your replicas and QUORUM reads and writes will fail. I'd recommend going with 3 racks with 1 or 2 nodes per rack, 3 replicas, and QUORUM for both reads and writes. The cluster would then only fail if two nodes/AZs fail.
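As a rough sketch of that setup with the DataStax Python driver (the keyspace/table names, contact points, and schema below are placeholders, not part of the original question), a keyspace with 3 replicas in DC1 and QUORUM reads/writes would look something like this:

    # Hypothetical keyspace/table names and contact points; adjust to your cluster.
    from cassandra.cluster import Cluster
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    cluster = Cluster(['10.0.0.1', '10.0.0.2'])
    session = cluster.connect()

    # 3 replicas in the single datacenter DC1; the snitch spreads them across racks/AZs.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS shiro_sessions
        WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3}
    """)
    session.execute("""
        CREATE TABLE IF NOT EXISTS shiro_sessions.sessions (id text PRIMARY KEY, data text)
    """)

    # QUORUM (2 of 3 replicas) for both reads and writes survives the loss of one rack/AZ.
    write = SimpleStatement(
        "INSERT INTO shiro_sessions.sessions (id, data) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM)
    session.execute(write, ('abc123', 'serialized-session'))

    read = SimpleStatement(
        "SELECT data FROM shiro_sessions.sessions WHERE id = %s",
        consistency_level=ConsistencyLevel.QUORUM)
    row = session.execute(read, ('abc123',)).one()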

You have probably heard of the CAP theorem in database theory. If not, you can read the details on Wikipedia: https://en.wikipedia.org/wiki/CAP_theorem, or just google it. It says that a distributed database with multiple nodes can only achieve two of the following three goals: consistency, availability, and partition tolerance.
Cassandra is designed for high availability and partition tolerance (AP), and sacrifices consistency to achieve that. However, you could set the consistency level to ALL in Cassandra to shift it toward CA, which seems to be your goal. Your setting of QUORUM with 2 replicas is essentially the same as ALL, because a quorum of 2 replicas is 2. But with this setting, if a single node holding the data is down, the client will get an error on reads and writes (not partition-tolerant).
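To make the arithmetic explicit (just an illustration, not part of the original answer): Cassandra's quorum is floor(RF / 2) + 1, so with RF = 2 a QUORUM operation needs both replicas, which is why it behaves like ALL, while RF = 3 tolerates one replica being down.

    def quorum(replication_factor):
        # Cassandra quorum = floor(RF / 2) + 1
        return replication_factor // 2 + 1

    print(quorum(2))  # 2 -> every replica must respond (same as ALL)
    print(quorum(3))  # 2 -> one replica may be down and QUORUM still succeeds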
You may take a look at a video here to learn more (it requires a DataStax account): https://academy.datastax.com/courses/ds201-cassandra-core-concepts/introduction-big-data

Related

Is it safe and efficient to store session information in a Redis Cluster

This question sounds very much like this one, but I believe it is not the same. While that question is very specific, I believe it doesn't provide enough to cover the doubts I have.
I am trying to set up a Redis Cluster for an application deployment I have. I use Redis to store various information like session info, scheduled-job meta-info, etc. I have been using a single-node instance so far. However, I am thinking of moving to a Redis Cluster for HA. I know that Redis is single-threaded and only provides best-effort consistency, not strong consistency. So as long as I am on a single node, I have no issues with consistency (except in terms of fault tolerance). However, when I move to a cluster setup this is no longer true (at least as far as I understand).
My questions are as follows:
If I move to a Redis Cluster setup, do I compromise on consistency to gain HA? The Redis website itself says the cluster setup does not provide strong consistency guarantees, given its asynchronous replication method. In that case, what is the argument for people using/suggesting Redis as a viable solution for storing sessions, as in the post above? Is it only true for a single-node setup? Or is it that losing a session every now and then is acceptable?
For Redis to be truly fault-tolerant, must we use the persistence feature, and if not, is it unable to regenerate state? (I believe this also comes with a slight compromise in performance.)
Am I correct in my understanding that Redis Cluster only provides HA in the sense that data is sharded and distributed, and does not provide automatic failover, for which Redis Sentinel must be used?
What other solutions do people use for fast-access data with strong consistency requirements?
I may not answer all of the questions in depth. Before going into the details of your questions:
The relation between availability and consistency is not only Redis-related but one of the core principles of distributed systems. It can be explained with the CAP theorem. Yes, you will compromise consistency for high availability, because you can't sacrifice partition tolerance in a distributed system. Some distributed database technologies (such as Cassandra) provide configuration for "strong" consistency via quorum, trading off availability.
If you want HA, then Redis Cluster may not be what you are looking for. Redis Cluster is a good solution when you need to shard your data (distribute the load) across multiple nodes. It is a must when you reach the memory limits of your instance. What you may need is Redis Sentinel.
Redis Sentinel provides high availability for Redis. In practical terms this means that using Sentinel you can create a Redis deployment that resists without human intervention certain kinds of failures.
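As a minimal sketch of how an application talks to a Sentinel-managed deployment with redis-py (the Sentinel addresses and the master name "mymaster" are placeholders):

    from redis.sentinel import Sentinel

    # Ask the Sentinels where the current master and replicas are.
    sentinel = Sentinel([('sentinel-1', 26379),
                         ('sentinel-2', 26379),
                         ('sentinel-3', 26379)],
                        socket_timeout=0.5)

    master = sentinel.master_for('mymaster', socket_timeout=0.5)   # for writes
    replica = sentinel.slave_for('mymaster', socket_timeout=0.5)   # for reads

    master.set('session:abc123', 'serialized-session', ex=1800)
    print(replica.get('session:abc123'))

After a failover, the same client objects rediscover the newly promoted master through the Sentinels, so the application code does not change.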
The post you shared is almost 8 years old; it may not cover or answer all of today's requirements. That post also does not ask about scenarios or solutions covering distributed Redis.
Redis is still a great solution for sessions (a perfect key/value use case). You may scale vertically and stay on one node to achieve strong consistency for sessions.
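For example, a simple single-node session store with an expiry (key names, fields, and TTL below are illustrative) can be as small as:

    import redis

    r = redis.Redis(host='localhost', port=6379, decode_responses=True)
    r.hset('session:abc123', mapping={'user_id': '42', 'last_seen': '2024-01-01T00:00:00Z'})
    r.expire('session:abc123', 1800)   # sessions expire after 30 minutes
    print(r.hgetall('session:abc123'))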
You may switch to some other database with configurable consistency (data accuracy), such as Cassandra, and set your quorum according to the business needs. It will not be a silver bullet; there is always a tradeoff.
You may look for a third-party tool for quorum, or implement one, to have strong consistency in Redis. Redis's quorum is different from Cassandra's:
The quorum is only used to detect the failure. In order to actually perform a failover, one of the Sentinels need to be elected leader for the failover and be authorized to proceed. This only happens with the vote of the majority of the Sentinel processes.
Redis Sentinel could be an answer here too. The official documentation covers a lot of the details.
If a master is not working as expected, Sentinel can start a failover process where a replica is promoted to master, the other additional replicas are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting.
Redis Cluster's specification and use cases are different from Sentinel's. One of Redis Sentinel's most important strengths comes from leader election during failover. AFAIK, Cluster doesn't have this (I haven't tried it, but I saw some details in the documentation).
I indirectly answered and gave examples for this one. Vertical (instead of horizontal) scaling could be an option; you may add more resources (RAM etc.) to your instance. Another option could be to consider Cassandra and tune it for immediate consistency. The tradeoff is again availability: if your node(s) go down, then both your reads and writes fail.
For fast-access data with strong consistency requirements, go with Cassandra. Its inherent quorum mechanism helps ensure consistency, and the peer-to-peer architecture provides scalability with minimal configuration overhead.

Why is an RDBMS said to be not partition tolerant?

Partition Tolerance - The system continues to operate as a whole even if individual servers fail or can't be reached.
A better definition, from this link:
Even if the connections between nodes are down, the other two promises (A & C) are kept.
Now consider that we have a master-slave model in both an RDBMS (Oracle) and MongoDB. I am not able to understand why an RDBMS is said to be not partition tolerant but Mongo is partition tolerant.
Consider I have 1 master and 2 slaves. In case the master goes down in Mongo, a re-election is done to select one of the slaves as master so that the system continues to operate.
Doesn't the same happen in an RDBMS system like Oracle/MySQL?
See this article about the CAP theorem and MySQL.
Replication in MySQL Cluster is synchronous, meaning a transaction is not committed before replication happens. In this case your data should be consistent; however, the cluster may not be available for some clients in some cases after a partition occurs. It depends on the number of nodes and the arbitration process. So MySQL Cluster can be made partition tolerant.
Partition handling in one cluster:
If there are not enough live nodes to serve all of the data stored - shutdown
Serving a subset of user data (and risking data consistency) is not an option
If there are not enough failed or unreachable nodes to serve all of the data stored - continue and provide service
No other subset of nodes can be isolated from us and serving clients
If there are enough failed or unreachable nodes to serve all of the data stored - arbitrate.
There could be another subset of nodes regrouped into a viable cluster out there.
Replication between 2 clusters is asynchronous.
Edit: MySQL can also be configured as a cluster; in this case it is CP, otherwise it is CA, and partition tolerance can be broken by having 2 masters.

How to deal with split brain in a cluster that has two nodes?

I am learning some basic concepts of cluster computing and I have some questions to ask.
According to this article:
If a cluster splits into two (or more) groups of nodes that can no longer communicate with each other (aka. partitions), quorum is used to prevent resources from starting on more nodes than desired, which would risk data corruption.
A cluster has quorum when more than half of all known nodes are online in the same partition, or for the mathematically inclined, whenever the following equation is true:
total_nodes < 2 * active_nodes
For example, if a 5-node cluster split into 3- and 2-node partitions, the 3-node partition would have quorum and could continue serving resources. If a 6-node cluster split into two 3-node partitions, neither partition would have quorum; Pacemaker's default behavior in such cases is to stop all resources, in order to prevent data corruption.
Two-node clusters are a special case.
By the above definition, a two-node cluster would only have quorum when both nodes are running. This would make the creation of a two-node cluster pointless.
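To make the quoted rule concrete, here is a tiny check of the same inequality, using the examples above (an illustration only):

    def has_quorum(total_nodes, active_nodes):
        # A partition has quorum when total_nodes < 2 * active_nodes,
        # i.e. it contains more than half of all known nodes.
        return total_nodes < 2 * active_nodes

    print(has_quorum(5, 3))  # True:  the 3-node side of a 5-node cluster keeps quorum
    print(has_quorum(6, 3))  # False: a 3/3 split of a 6-node cluster, neither side has quorum
    print(has_quorum(2, 1))  # False: a split two-node cluster never has quorum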
Questions:
From the above, I am a bit confused: why can we not just stop all cluster resources, as in the 6-node cluster case? What is special about the two-node cluster?
You are correct that a two-node cluster can only have quorum when both nodes are in communication. Thus, if the cluster were to split, using the default behavior, the resources would stop.
The solution is to not use the default behavior. Simply set Pacemaker to no-quorum-policy=ignore. This will instruct Pacemaker to continue to run resources even when quorum is lost.
...But wait, what happens if the cluster communication is broken but both nodes are still operational? Will they not consider their peer dead and both become active nodes? Now I have two primaries, potentially diverging data, or conflicts on my network, right? This issue is addressed via STONITH. Properly configured STONITH will ensure that only one node is ever active at a given time and essentially prevents split-brain from even occurring.
An excellent article further explaining STONITH and its importance was written by LMB back in 2010, here: http://advogato.org/person/lmb/diary/105.html

Datastax Cassandra - Spanning cluster nodes across Amazon regions

I am planning to launch three EC2 instances across Amazon regions, say Region-A, Region-B, and Region-C.
Based on this plan, each region acts as a cluster (or datacenter) and has one node (correct me if I am wrong).
Using this infrastructure, can I attain the configuration below?
Replication factor: 2
Write and read consistency level: QUORUM
My basic intention is to achieve: "If two regions go down, I can survive with the remaining one region."
Please help me with your inputs.
Note: I am very new to Cassandra, hence whatever inputs you give will be useful to me.
Thanks
If you have a replication factor of 2 and use a CL of QUORUM, you will not tolerate failure, i.e. if a node goes down and you only get 1 ack, that's not a majority of responses.
If you deploy across multiple regions, each region is, as you mention, a DC in your cluster. Each individual DC is a complete replica of all your data, i.e. it will hold all the data for your keyspace. If you read/write at a LOCAL_* consistency level (e.g. LOCAL_ONE, LOCAL_QUORUM) within each region, then you can tolerate the loss of the other regions.
The number of replicas in each DC/region and the consistency level you are using to read/write in that DC will determine how much failure you can tolerate. QUORUM is a cross-DC consistency level: it requires a majority of acks from all replicas in your cluster, across all DCs. If you lose 2 regions, it's unlikely that you will be getting a quorum of responses.
Also, it's worth remembering that Cassandra can be made aware of the AZs it is deployed in within a region and will do its best to place replicas of your data in multiple AZs. This will give you even better tolerance to failure.
If this were me and I didn't need a strong cross-DC consistency level (like QUORUM), I would have 4 nodes in each region, deployed across the AZs, and a replication factor of 3 in each region. I would then be reading/writing at LOCAL_QUORUM or LOCAL_ONE (preferably). If you go with LOCAL_ONE, you could have fewer replicas in each DC; e.g. a replication factor of 2 with LOCAL_ONE means you could tolerate the loss of 1 replica.
However, this would be more expensive than what you're initially suggesting, but (for me) that would be the minimum setup I would need if I wanted to be in multiple regions and tolerate the loss of 2 of them. You could go with 3 nodes in each region if you really wanted to save costs.
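A rough sketch of that per-region setup with the DataStax Python driver (datacenter names, contact points, and the table are placeholders; the actual DC names come from your snitch, e.g. Ec2Snitch reports the AWS region as the DC name):

    from cassandra.cluster import Cluster
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    session = Cluster(['10.0.1.10']).connect()   # a node in the local region

    # Three replicas per region, so each region can serve LOCAL_QUORUM on its own.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS app
        WITH replication = {'class': 'NetworkTopologyStrategy',
                            'Region-A': 3, 'Region-B': 3, 'Region-C': 3}
    """)
    session.execute("CREATE TABLE IF NOT EXISTS app.kv (id text PRIMARY KEY, val text)")

    # LOCAL_QUORUM only needs 2 of the 3 replicas in the local DC, so reads and
    # writes keep working here even if the other two regions are unreachable.
    write = SimpleStatement("INSERT INTO app.kv (id, val) VALUES (%s, %s)",
                            consistency_level=ConsistencyLevel.LOCAL_QUORUM)
    session.execute(write, ('k1', 'v1'))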

Datastax Hadoop nodes basics

I'm trying to set up some Hadoop nodes along with some Cassandra nodes in my DataStax Enterprise cluster. Two things are not clear to me at this point. One, how many Hadoop nodes do I need? Is it the same number as the Cassandra nodes? Does the data still live on the Cassandra nodes? Second, the tutorials mention that I should have vnodes disabled on the Hadoop nodes. Can I still use vnodes on the Cassandra nodes in that cluster? Thank you.
In DataStax Enterprise you run Hadoop on nodes that are also running Cassandra. The most common deployment is to make two datacenters (logical groupings of nodes). One datacenter is devoted to analytics and contains the machines that run Hadoop and C* at the same time; the other datacenter is C*-only and serves the OLTP function of your cluster. The C* processes on the Analytics nodes are connected to the rest of your cluster (like any other C* node) and receive updates when mutations are written, so they are eventually consistent with the rest of your database. The data lives both on these nodes and on the other nodes in your cluster. Most folks end up with a replication pattern using NetworkTopologyStrategy that specifies several replicas in their C*-only DC and a single replica in their Analytics DC, but your use case may differ. The number of nodes does not have to be equal in the two datacenters.
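As an illustration of that replication pattern (the keyspace name is a placeholder, and the datacenter names "Cassandra" and "Analytics" are the DSE defaults, which may differ in your cluster):

    from cassandra.cluster import Cluster

    session = Cluster(['10.0.0.1']).connect()

    # Several replicas in the OLTP (C*-only) datacenter, one in the Analytics datacenter.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS app
        WITH replication = {'class': 'NetworkTopologyStrategy',
                            'Cassandra': 3, 'Analytics': 1}
    """)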
For your second question: yes, you can have vnodes enabled in the C*-only datacenter. In addition, if your batch jobs are sufficiently large, you could also run vnodes in your Analytics datacenter with only a slight performance hit. Again, this is completely based on your use case. If you want many faster, shorter analytics jobs, you do NOT want vnodes enabled in your Analytics datacenter.
