How does Elasticsearch's model translate into these High-Availability patterns?

I've been studying Elasticsearch's model of availability, where you create a cluster with master nodes and data nodes [1]: master nodes control the cluster and data nodes hold the data. You can also set, for each index, a number of shards and replicas that are distributed across these data nodes.
I have also seen [2] that High-Availability patterns are usually some model of Fail-Over (Active-Passive or Active-Active) and/or Replication (Master-Slave or Master-Master). But I couldn't fit these pieces of information together. How can I classify Elasticsearch's model within these patterns?
There are also [3] other NoSQL databases, like MongoDB, with a similar HA model that are deployed as a cluster using StatefulSets in Kubernetes. I want to understand more about how this works. Any hints on that?
References:
[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
[2] https://www.slideshare.net/jboner/scalability-availability-stability-patterns/33-What_do_we_mean_withAvailability
[3] https://kubernetes.io/blog/2017/01/running-mongodb-on-kubernetes-with-statefulsets/

StatefulSet in Kubernetes
In a distributed system, it is much easier to handle a stateless workload: since it carries no state, it is trivial to replicate the service to any number of replicas. In Kubernetes, stateless workloads are managed by a ReplicaSet (deployed from a Deployment).
Most services require some kind of state. A StatefulSet manages stateful workloads on Kubernetes; it differs from a ReplicaSet in that pods managed by a StatefulSet have a unique identity comprising an ordinal, a stable network identity, and stable storage.
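To make the stable network identity concrete, here is a minimal sketch (the StatefulSet name "es", headless Service name "es-headless", and namespace "default" are hypothetical): each pod is named <statefulset>-<ordinal> and, with a headless Service, gets a predictable DNS name, which is what lets stateful peers discover each other.

    # With a headless Service, each StatefulSet pod gets a stable DNS name:
    # <statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local
    replicas = 3
    peers = [
        f"es-{i}.es-headless.default.svc.cluster.local"
        for i in range(replicas)
    ]
    print(peers)
    # ['es-0.es-headless.default.svc.cluster.local',
    #  'es-1.es-headless.default.svc.cluster.local',
    #  'es-2.es-headless.default.svc.cluster.local']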
Failover and Replication
I have also seen that High-Availability patterns are usually some model of Fail-Over (Active-Passive or Active-Active) and/or Replication (Master-Slave or Master-Master). But I couldn't fit these pieces of information together. How can I classify this model within these patterns?
These are pretty outdated patterns. Nowadays, consensus algorithms are the norm for High-Availability and Fail-Over, since both problems come down to replication and leader election. Raft (2013) is one of the most popular consensus algorithms, and I can recommend the book Designing Data-Intensive Applications if you want to learn more about the problems of High-Availability, Fail-Over, Replication, and Consensus.
Elasticsearch seems to use a consensus algorithm for its clustering. Any of the master-eligible nodes may be elected as master, and it is recommended to have at least three of them (for high availability).
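A small sketch of why at least three master-eligible nodes are recommended: electing a leader requires a strict majority of the voters, so a three-node voting configuration tolerates one failure, while a two-node configuration tolerates none.

    def majority(n: int) -> int:
        """Votes needed for a strict majority of n voters."""
        return n // 2 + 1

    for n in (2, 3, 5):
        print(f"{n} master-eligible nodes: need {majority(n)} votes, "
              f"tolerate {n - majority(n)} failure(s)")
    # 2 master-eligible nodes: need 2 votes, tolerate 0 failure(s)
    # 3 master-eligible nodes: need 2 votes, tolerate 1 failure(s)
    # 5 master-eligible nodes: need 3 votes, tolerate 2 failure(s)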
Role of nodes
a cluster with master nodes and data nodes, where master nodes control the cluster and data nodes hold data. You can also set, for each index, a number of shards and replicas that are distributed across these data nodes.
Nodes can have several roles in an Elasticsearch cluster. When you have a small cluster, each node can take on several roles, e.g. both master-eligible and data, but as your cluster grows it is recommended to have dedicated nodes for each role.
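As an illustration, here is a sketch of that growth path, modelled as Python dicts rather than the actual per-node elasticsearch.yml files (the node names are hypothetical, and the role names assume an Elasticsearch version that supports the node.roles setting):

    # A small cluster where every node combines roles...
    small_cluster = {
        "node-1": ["master", "data"],
        "node-2": ["master", "data"],
        "node-3": ["master", "data"],
    }
    # ...versus a larger cluster with dedicated nodes per role.
    larger_cluster = {
        "master-1": ["master"],
        "master-2": ["master"],
        "master-3": ["master"],
        "data-1": ["data"],
        "data-2": ["data"],
        "data-3": ["data"],
    }
    master_eligible = [n for n, roles in larger_cluster.items() if "master" in roles]
    print(master_eligible)  # ['master-1', 'master-2', 'master-3']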

Related

How to deal with split brain in a cluster with only two nodes?

I am learning some basic concepts of cluster computing and I have some questions to ask.
According to this article:
If a cluster splits into two (or more) groups of nodes that can no longer communicate with each other (a.k.a. partitions), quorum is used to prevent resources from starting on more nodes than desired, which would risk data corruption.
A cluster has quorum when more than half of all known nodes are online in the same partition, or for the mathematically inclined, whenever the following equation is true:
total_nodes < 2 * active_nodes
For example, if a 5-node cluster split into 3- and 2-node partitions, the 3-node partition would have quorum and could continue serving resources. If a 6-node cluster split into two 3-node partitions, neither partition would have quorum; Pacemaker's default behavior in such cases is to stop all resources, in order to prevent data corruption.
Two-node clusters are a special case.
By the above definition, a two-node cluster would only have quorum when both nodes are running. This would make the creation of a two-node cluster pointless.
Questions:
From the above I am confused: why can we not just stop all cluster resources, as in the 6-node case? What is special about the two-node cluster?
You are correct that a two-node cluster can only have quorum when both nodes are in communication. Thus, if the cluster were to split, then under the default behavior the resources would stop.
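To make the quoted quorum rule (total_nodes < 2 * active_nodes) concrete, a quick sketch:

    def has_quorum(total_nodes: int, active_nodes: int) -> bool:
        """A partition has quorum when more than half of all nodes are in it."""
        return total_nodes < 2 * active_nodes

    print(has_quorum(5, 3))  # True:  the 3-node side of a 3/2 split keeps quorum
    print(has_quorum(6, 3))  # False: a 3/3 split leaves no quorum on either side
    print(has_quorum(2, 1))  # False: a split two-node cluster loses quorum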
The solution is to not use the default behavior. Simply set Pacemaker to no-quorum-policy=ignore. This will instruct Pacemaker to continue to run resources even when quorum is lost.
...But wait: what happens if cluster communication breaks but both nodes are still operational? Will they not each consider their peer dead and both become active nodes? Then I would have two primaries, and potentially diverging data or conflicts on my network, right? This issue is addressed via STONITH. Properly configured STONITH will ensure that only one node is ever active at a given time, essentially preventing split brain from occurring at all.
An excellent article further explaining STONITH and its importance was written by LMB back in 2010: http://advogato.org/person/lmb/diary/105.html

About Elasticsearch clusters

I need to provide many Elasticsearch instances for different clients, but hosted in my infrastructure.
For the moment these are only some small instances.
I am wondering whether it would be better to build one big Elasticsearch cluster of 3-5 servers to handle all instances, where each client gets a different index in this cluster and each index is distributed over the servers.
Or maybe another idea?
And another question about quorum: what is the quorum for ES, please?
thanks,
You don't have to assign each client to a different index; the Elasticsearch cluster will automatically spread the load among all nodes that hold shards.
If you are not sure how many nodes are needed, start with a small cluster, then keep monitoring the health status of the cluster. Add more nodes if server load is high; remove nodes if server load is low.
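For example, a sketch of such monitoring with the official Python client (the host URL is illustrative, and the exact client API may vary between versions):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    health = es.cluster.health()
    print(health["status"])            # "green", "yellow", or "red"
    print(health["number_of_nodes"])   # grow or shrink based on observed load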
As the cluster keeps growing, you may need to assign a dedicated role to each node. This gives you more control over the cluster and makes it easier to diagnose problems and plan resources: for example, adding more master nodes to stabilize the cluster, more data nodes to increase search and indexing performance, or more coordinating nodes to handle client requests.
A quorum is defined as a majority of the master-eligible nodes in the cluster, as follows:
(master_eligible_nodes / 2) + 1
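As a side note, in Elasticsearch versions before 7.0 this quorum had to be configured by hand via the discovery.zen.minimum_master_nodes setting (it became automatic afterwards). A small sketch of the arithmetic, using integer division:

    def es_quorum(master_eligible_nodes: int) -> int:
        return master_eligible_nodes // 2 + 1

    # With 3 master-eligible nodes the quorum is 2, so two partitioned halves
    # can never both elect a master -- this is what prevents split brain.
    print(es_quorum(3))  # 2
    print(es_quorum(5))  # 3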

Elasticsearch one big cluster VS tribe node?

Problem descriptions:
- Multiple machines producing logs.
- On each machine we have Logstash, which filters the log files and sends them to a local Elasticsearch instance
- We would like to keep the machines as separate as possible and avoid intercommunication
- But we would also like to be able to visualize all of these logs with a single Kibana instance
Approaches:
1. Make each machine a single-node ES cluster, and have one of the machines act as a tribe node with Kibana installed on it (while of course avoiding index-name conflicts)
2. Make all machines (nodes) part of a single cluster, with each node writing to a unique one-shard index, statically mapping each shard to its node, and of course having one Kibana instance for the cluster
Question:
Which approach is more appropriate for the described scenario in terms of limiting inter-machine communication, cluster management, and maybe other aspects that I haven't thought of?
The tribe node exists because of exactly these requirements, so my advice is to use the tribe node setup.
With the second option:
There will be a cluster, but you will not use its benefits (replica shards, shard relocation, query performance, etc.)
The benefits mentioned above will instead become pain points that generate configuration complexity and a troubleshooting hell.
Besides shard allocation and node communication, there will be other things to configure that nodes only need when they are part of a cluster.
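For reference, statically pinning each one-shard index to its node (option 2) would rely on shard allocation filtering; a hedged sketch using the official Python client (the index and node names are hypothetical, and the exact create-index API varies across client versions):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    # One single-shard, zero-replica index per machine, pinned to that
    # machine's node via shard allocation filtering.
    es.indices.create(
        index="logs-machine-1",
        body={
            "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 0,
                "index.routing.allocation.require._name": "node-1",
            }
        },
    )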

Datastax hadoop nodes basics

I'm trying to set up some Hadoop nodes along with some Cassandra nodes in my DataStax Enterprise cluster. Two things are not clear to me at this point. First, how many Hadoop nodes do I need? Is it the same number as Cassandra nodes? Does the data still live on the Cassandra nodes? Second, the tutorials mention that I should have vnodes disabled on the Hadoop nodes. Can I still use vnodes on the Cassandra nodes in that cluster? Thank you.
In DataStax Enterprise you run Hadoop on nodes that are also running Cassandra. The most common deployment is to make two datacenters (logical groupings of nodes). One datacenter is devoted to analytics and contains the machines which run Hadoop and C* at the same time; the other datacenter is C*-only and serves the OLTP function of your cluster. The C* processes on the Analytics nodes are connected to the rest of your cluster (like any other C* node) and receive updates when mutations are written, so they are eventually consistent with the rest of your database. The data lives both on these nodes and on the other nodes in your cluster. Most folks end up with a replication pattern using NetworkTopologyStrategy which specifies several replicas in their C*-only DC and a single replica in their Analytics DC, but your use case may differ. The number of nodes does not have to be equal in the two datacenters.
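A hedged sketch of that replication pattern using the DataStax Python driver (the keyspace and datacenter names are hypothetical; actual DC names depend on your snitch configuration):

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()
    # Several replicas in the OLTP datacenter, one in the Analytics datacenter.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS app
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'OLTP': 3,
            'Analytics': 1
        }
    """)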
For your second question: yes, you can have vnodes enabled in the C*-only datacenter. In addition, if your batch jobs are of a significantly large size, you could also run vnodes in your Analytics datacenter with only a slight performance hit. Again, this is completely based on your use case. If you want many fast, short analytics jobs, you do NOT want vnodes enabled in your Analytics datacenter.

cassandra: strategy for single datacenter deployment

We are planning to use Apache Shiro & Cassandra for distributed session management, very similar to what is shown at https://github.com/lhazlewood/shiro-cassandra-sample
Need advice on deploying Cassandra in Amazon EC2:
In EC2, we have below setup:
Single region, 2 Availability Zones(AZ), 4 Nodes
Accordingly, Cassandra is configured with:
Single DataCenter: DC1
Two racks: Rack1, Rack2
4 Nodes: Rack1_Node1, Rack1_Node2, Rack2_Node1, Rack2_Node2
Data Replication Strategy used is NetworkTopologyStrategy
Since Cassandra is used as the session datastore, we need high consistency and availability.
My Questions:
How many replicas shall I keep in a cluster?
Thinking of 2 replicas, 1 per rack.
What shall be the consistency level(CL) for read and write operations?
Thinking of QUORUM for both read and write, considering 2 replicas in a cluster.
In case 1 rack is down, would Cassandra write & read succeed with the above configuration?
I know it can use hinted handoff for a temporarily down node, but does it work for both read and write operations?
Any other suggestion for my requirements?
Generally, going for an even number of nodes is not the best idea, and neither is an even number of availability zones. In this case, if one of the two racks fails, QUORUM reads and writes will fail cluster-wide. I'd recommend going for 3 racks with 1 or 2 nodes per rack, 3 replicas, and QUORUM for both read and write. Then the cluster only fails if two nodes/AZs fail.
You have probably heard of the CAP theorem in database theory. If not, you can learn the details on Wikipedia (https://en.wikipedia.org/wiki/CAP_theorem) or just google it. It says that a distributed database with multiple nodes can only achieve two of the following three goals: consistency, availability, and partition tolerance.
Cassandra is designed to achieve high availability and partition tolerance (AP), sacrificing consistency to get there. However, you could set the consistency level to ALL in Cassandra to shift it towards CA, which seems to be your goal. Your setting of QUORUM with 2 replicas is essentially the same as ALL, since a quorum of 2 replicas is 2. But with this setting, if a single node containing the data is down, the client will get an error for reads and writes (not partition-tolerant).
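A small sketch of the arithmetic behind that observation: reads see the latest write when the read and write replica counts overlap (R + W > RF), and with RF=2 a QUORUM is already all replicas.

    def quorum(rf: int) -> int:
        return rf // 2 + 1

    for rf in (2, 3):
        q = quorum(rf)
        print(f"RF={rf}: QUORUM={q}, overlap={q + q > rf}, "
              f"tolerates {rf - q} replica(s) down")
    # RF=2: QUORUM=2 (== ALL), tolerates 0 down -> one lost rack blocks ops
    # RF=3: QUORUM=2, tolerates 1 down -> survives a single rack failure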
You may take a look at a video here to learn more (it requires a DataStax account): https://academy.datastax.com/courses/ds201-cassandra-core-concepts/introduction-big-data
