How many racks should I set in my ElasticSearch cluster - elasticsearch

My ES cluster has 20 machines running 50 nodes (ES instances), and I'm not sure how many racks I should configure. Are two racks enough, or would 3 or 4 be better?
As far as I know, setting rack_id in the ES configuration provides the following:
1. It controls where data is located or relocated (to make sure replicas end up in different racks).
2. rack_id can be used for document routing.
Are there any reasons to set more racks? I think even the default of a single rack is fine.

Two machines are most likely to fail together if they share hardware (for example, as VMs on one host), less likely if they share a rack but not hardware, and less likely again if they share only a building. So it makes sense to use more than a single rack.
Whether you need more than 2 racks depends on your number of replicas. The default number of replicas is 1. If you require a higher value, then strictly speaking you degrade the availability of your cluster a bit by using only 2 racks: with 3 or more copies of each shard, at least two copies must share a rack, so the extra copies are not effective at the rack level.
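To make the rack arithmetic concrete, here is a minimal Python sketch. It assumes the copies of a shard are spread as evenly as possible across racks; the function names are hypothetical and not part of any Elasticsearch API.

```python
import math


def copies_per_shard(replicas: int) -> int:
    """One primary plus its replicas."""
    return replicas + 1


def worst_case_copies_lost(replicas: int, racks: int) -> int:
    """Copies of one shard lost when the fullest rack fails (even spread assumed)."""
    if replicas < 0 or racks < 1:
        raise ValueError("replicas must be >= 0 and racks must be >= 1")
    return math.ceil(copies_per_shard(replicas) / racks)


for replicas, racks in [(1, 2), (2, 2), (2, 3)]:
    lost = worst_case_copies_lost(replicas, racks)
    kept = copies_per_shard(replicas) - lost
    print(f"replicas={replicas} racks={racks}: lose {lost}, keep {kept}")
```

Under these assumptions, 2 replicas on 2 racks leave the same single surviving copy after a rack failure as 1 replica does; only a third rack lets the extra copy survive.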

I think that in your case it's simpler and easier to just set cluster.routing.allocation.same_shard.host to true (see https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html). This prevents copies of the same shard from being placed on the same host (a host is identified by its address and host name). Please test this before going into production with this approach.
Also, keep in mind that you need to set the processors setting (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html#processors) accordingly. Each ES node detects the number of cores available on the machine and is not aware of other nodes running on it. With multiple nodes on the same machine, each node may assume it has dedicated access to all cores, which is a problem because the default thread pool sizes are derived from the core count. So you will want to explicitly specify the number of cores available to each node via the processors setting so that the thread pools are not over-allocated.
I also recommend using dedicated master nodes. To ensure cluster stability, each dedicated master node instance should be on its own machine (this can certainly be a much smaller machine, e.g. 4 GB of RAM to start with).
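As a rough illustration of choosing a value for the processors setting, the sketch below splits a host's cores evenly among the nodes that share it. The even split and the function name are assumptions made for this example; you may prefer to weight nodes differently.

```python
def processors_per_node(host_cores: int, nodes_on_host: int) -> int:
    """Evenly divide a host's cores among co-located nodes, at least 1 each."""
    if host_cores < 1 or nodes_on_host < 1:
        raise ValueError("host_cores and nodes_on_host must both be >= 1")
    return max(1, host_cores // nodes_on_host)


# Hypothetical host: 16 cores shared by 3 ES nodes.
print(processors_per_node(16, 3))
```

The resulting number is what each node on that host would be given as its processors value, instead of letting every node assume it owns all 16 cores.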

Related

How to set the number of replicas properly when I try to start a TDengine cluster?

I was wondering how to set the replica parameter properly when starting a TDengine cluster, to balance storage against high availability. According to the TDengine documentation, the default value of replica is 1, which means there are no extra copies of each vnode (so the vGroup size is 1 as well), and replica can be changed dynamically to maintain high availability of the cluster. However, the extra vnode copies have to be physically created when multi-replica is enabled. So the question is: how should a real company choose the value of replica to increase availability without taking on too much overhead (storage and performance) when using a TDengine cluster?
Replication means keeping a copy of the same data on multiple machines that are connected via a network. There are several reasons why you might want to replicate data:
To keep data geographically close to your users (and thus reduce latency)
To allow the system to continue working even if some of its parts have failed (and thus increase availability)
To scale out the number of machines that can serve read queries (and thus increase read throughput)
(Taken from DDIA, Designing Data-Intensive Applications.)

Datastax Cassandra - Spanning Cluster node across amazon region

I am planning to launch three EC2 instances across Amazon hosting regions, say Region-A, Region-B and Region-C.
Based on this plan, each region acts as a cluster (or datacenter) and has one node. (Correct me if I am wrong.)
Using this infrastructure, can I achieve the configuration below?
Replication factor: 2
Write and read consistency level: QUORUM
My basic intention is to achieve this: if two regions go down, I can survive with the remaining region.
Please help me with your inputs.
Note: I am very new to Cassandra, so whatever input you give will be useful to me.
Thanks
If you have a replication factor of 2 and use a consistency level of QUORUM, you cannot tolerate any failure: a quorum of 2 replicas is 2, so if a node goes down and you get only 1 ack, that's not a majority of responses.
If you deploy across multiple regions, each region is, as you mention, a DC in your Cluster. Each individual DC is a complete replica of all your data i.e. it will hold all the data for your keyspace. If you read/write at a LOCAL_* consistency (eg. LOCAL_ONE, LOCAL_QUORUM) level within each region, then you can tolerate the loss of the other regions.
The number of replicas in each DC/region and the consistency level you use to read/write in that DC determine how much failure you can tolerate. QUORUM is a cross-DC consistency level: it requires a majority of acks from ALL replicas in your cluster, across all DCs. If you lose 2 regions, it's unlikely that you will get a quorum of responses.
Also, it's worth remembering that Cassandra can be made aware of the AZs it is deployed on within the region and can do its best to place replicas of your data in multiple AZs. This gives you even better tolerance to failure.
If this were me and I didn't need a strong cross-DC consistency level (like QUORUM), I would have 4 nodes in each region, spread across the AZs, and a replication factor of 3 in each region. I would then read/write at LOCAL_QUORUM or (preferably) LOCAL_ONE. If you go with LOCAL_ONE, then you could have fewer replicas in each DC, e.g. a replication factor of 2 with LOCAL_ONE means you could tolerate the loss of 1 replica.
However, this would be more expensive than what you're initially suggesting, but (for me) it would be the minimum setup I would need if I wanted to be in multiple regions and tolerate the loss of 2. You could go with 3 nodes in each region if you really wanted to save costs.
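The quorum arithmetic behind this answer can be sketched in a few lines of Python. This is only an illustration: the region names are placeholders, and the helper is not part of any Cassandra driver. It uses the usual definition of a quorum as a majority of replicas, floor(replicas / 2) + 1.

```python
def quorum(replicas: int) -> int:
    """Majority of the given number of replicas."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return replicas // 2 + 1


# The asker's plan: replication factor 2 with QUORUM.
rf = 2
print("RF 2 tolerates", rf - quorum(rf), "replica failures at QUORUM")

# The suggested layout: replication factor 3 in each of three regions.
rf_per_region = {"region-a": 3, "region-b": 3, "region-c": 3}
total_replicas = sum(rf_per_region.values())
cross_dc_quorum = quorum(total_replicas)

# Suppose only region-a is still reachable.
live = rf_per_region["region-a"]
print("QUORUM needs", cross_dc_quorum, "acks, live replicas:", live)
print("QUORUM possible:", live >= cross_dc_quorum)
print("LOCAL_QUORUM possible:", live >= quorum(rf_per_region["region-a"]))
```

With two regions gone, the cross-DC QUORUM cannot be met, while LOCAL_QUORUM in the surviving region still can, which is why the answer leans on the LOCAL_* levels.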

Actual need of Zookeepers

I am new to HBase and still learning it. How many ZooKeepers do we actually need? Is it one per RegionServer or one per cluster? Thanks.
ZooKeeper is per cluster, not per RegionServer.
From the HBase definitive guide:
How many ZooKeepers should I run? You can run a ZooKeeper ensemble
that comprises 1 node only but in production it is recommended that
you run a ZooKeeper ensemble of 3, 5 or 7 machines; the more members
an ensemble has, the more tolerant the ensemble is of host failures.
Also, run an odd number of machines. In ZooKeeper, an even number of
peers is supported, but it is normally not used because an even sized
ensemble requires, proportionally, more peers to form a quorum than an
odd sized ensemble requires. For example, an ensemble with 4 peers
requires 3 to form a quorum, while an ensemble with 5 also requires 3
to form a quorum. Thus, an ensemble of 5 allows 2 peers to fail, and
thus is more fault tolerant than the ensemble of 4, which allows only
1 down peer.
Give each ZooKeeper server around 1GB of RAM, and if possible, its own
dedicated disk (A dedicated disk is the best thing you can do to
ensure a performant ZooKeeper ensemble). For very heavily loaded
clusters, run ZooKeeper servers on separate machines from
RegionServers (DataNodes and TaskTrackers).
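The quorum figures in the quoted passage follow from a simple majority rule, sketched here in Python (the function names are made up for this example):

```python
def zk_quorum(ensemble_size: int) -> int:
    """Smallest number of peers that forms a majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be >= 1")
    return ensemble_size // 2 + 1


def zk_tolerated_failures(ensemble_size: int) -> int:
    """Peers that can be down while a quorum is still available."""
    return ensemble_size - zk_quorum(ensemble_size)


for size in range(1, 8):
    print(f"ensemble={size} quorum={zk_quorum(size)} "
          f"tolerated_failures={zk_tolerated_failures(size)}")
```

Running it shows that going from 3 to 4 peers, or from 5 to 6, adds a machine without adding any tolerance, which is the reason for preferring odd sizes.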

Running multiple Elasticsearch production nodes in one machine

Is it a recommended practice to run multiple Elasticsearch nodes in one physical (virtual) machine? I'm speaking about production environment.
I currently have three virtual machines that discover each other via unicast. Setup:
node.name:"VM1"
master:true
data:true
node.name:"VM2"
master:true
data:true
node.name:"VM3"
master:false
data:true
There's a request to add a dedicated master node on the first virtual machine (next to VM1). I'm trying to avoid that and am looking for strong arguments for why I shouldn't do this.
Please advise.
To me, having a dedicated master makes sense in a larger environment. If your nodes are not that busy, having a data node also act as a master would not be the end of the world. I would be more comfortable having 3 data nodes for high availability.

How to setup ElasticSearch cluster with auto-scaling on Amazon EC2?

There is a great tutorial, elasticsearch on ec2, about configuring ES on Amazon EC2. I studied it and applied all its recommendations.
Now I have an AMI and can run any number of nodes in the cluster from it. Auto-discovery is configured and the nodes join the cluster as they should.
The question is: how do I configure the cluster so that I can automatically launch/terminate nodes depending on cluster load?
For example, I want only 1 node running when there is no load and 12 nodes running at peak load. But wait, if I terminate 11 nodes in the cluster, what happens to the shards and replicas? How do I make sure I don't lose any data if I terminate 11 of the 12 nodes?
I might want to configure the S3 gateway for this, but all the gateways except local are deprecated.
There is an article in the manual about shard allocation. Maybe I'm missing something very basic, but I must admit I failed to figure out whether it is possible to configure one node to always hold a copy of every shard. My goal is to make sure that if this were the only node running in the cluster, we still wouldn't lose any data.
The only solution I can imagine now is to configure the index to have 12 shards and 12 replicas. Then, when up to 12 nodes are launched, every node would have a copy of every shard. But I don't like this solution because I would have to reconfigure the cluster if I wanted more than 12 nodes at peak load.
Auto scaling doesn't make a lot of sense with ElasticSearch.
Shard moving and re-allocation is not a light process, especially if you have a lot of data. It stresses IO and the network, and can badly degrade the performance of ElasticSearch. (If you want to limit the effect, you should throttle cluster recovery using settings like cluster.routing.allocation.cluster_concurrent_rebalance, indices.recovery.concurrent_streams and indices.recovery.max_size_per_sec. This will limit the impact but will also slow down rebalancing and recovery.)
Also, if you care about your data, you don't ever want to have only 1 node. You need your data to be replicated, so you will need at least 2 nodes (or more if you feel safer with a higher replication level).
Another thing to remember is that while you can change the number of replicas, you can't change the number of shards. This is configured when you create your index and cannot be changed (if you want more shards, you need to create another index and reindex all your data). So your number of shards should take into account the data size and the cluster size, considering the highest number of nodes you want but also your minimal setup (can fewer nodes hold all the shards and serve the estimated traffic?).
So theoretically, if you want 2 nodes at quiet times and 12 nodes at peak, you can set your index to have 6 shards with 1 replica. At quiet times you have 2 nodes that hold 6 shards each, and at peak you have 12 nodes that hold 1 shard each.
But again, I strongly suggest rethinking this and testing the impact of shard moving on your cluster performance.
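The 6-shard example above can be checked with a small Python sketch. It assumes shard copies are balanced evenly across nodes, and it treats a cluster with fewer nodes than copies per shard as an error, since a replica is never allocated to the same node as its primary. The function names are hypothetical.

```python
import math


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Primaries plus all of their replica copies."""
    return primaries * (1 + replicas)


def copies_per_node(primaries: int, replicas: int, nodes: int) -> int:
    """Most shard copies any one node holds, assuming an even balance."""
    if nodes < replicas + 1:
        # Too few nodes: some replicas would remain unassigned.
        raise ValueError("need at least replicas + 1 nodes")
    return math.ceil(total_shard_copies(primaries, replicas) / nodes)


for nodes in (2, 6, 12):
    print(f"{nodes} nodes -> {copies_per_node(6, 1, nodes)} shard copies per node")
```

The same helper makes it easy to try other shard counts against both your minimal and your peak cluster size before creating the index.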
In cases where the elasticity of your application is driven by a variable query load, you could set up ES nodes configured not to store any data (node.data = false, http.enabled = true) and then put them in for auto scaling. These nodes could offload all the HTTP and result conflation processing from your main data nodes (freeing them up for more indexing and searching).
Since these nodes wouldn't have shards allocated to them, bringing them up and down dynamically shouldn't be a problem, and auto-discovery should allow them to join the cluster.
I think this is a general concern when employing an auto-scalable architecture to meet temporary demand while data still needs to be saved. I think there is a solution that leverages EBS:
Map shards to specific EBS volumes. Let's say we need 15 shards; we will need 15 EBS volumes.
Amazon allows you to mount multiple volumes, so we can start with a few instances that each have multiple volumes attached.
As load increases, we can spin up additional instances, up to 15.
The above solution is only advised if you know your maximum capacity requirements.
I can give you an alternative approach using the AWS Elasticsearch Service (it will cost a little more than running Elasticsearch on plain EC2). Write a simple script that continuously monitors the load on the service (through the API/CLI) and, if the load goes beyond a threshold, programmatically increases the number of nodes in your AWS Elasticsearch Service cluster. The advantage here is that AWS takes care of the scaling (according to the documentation, they take a snapshot and launch a completely new cluster). This works for scaling down as well.
Regarding the auto-scaling approach, there are some challenges: shard movement has an impact on the existing cluster, and we also need to be more vigilant while scaling down. You can find a good article on scaling down here, which I have tested. If you can intelligently automate the steps in the above link through scripting (Python, shell) or through automation tools like Ansible, then scaling in/out is achievable. But again, you need to start scaling up well before the normal limits are reached, since scale-up activities can have an impact on the existing cluster.
Question: is it possible to configure one node to always hold all the shard copies?
Answer: Yes, it's possible with explicit shard routing. More details here
I would be tempted to suggest solving this a different way in AWS. I don't know what ES data this is or how it's updated, etc. Making a lot of assumptions, I would put the ES instance behind an ALB (Application Load Balancer). I would have a scheduled process that creates updated AMIs regularly (if you do it often, it will be quick to do). Then, based on the load of your single server, I would trigger more instances to be created from the latest AMI you have available, and add the new instances to the ALB to share some of the load. As things quiet down, I would trigger the termination of the temporary instances. If you go this route, here are a few more things to consider:
Use spot instances, since they are cheaper, if that fits your use case.
The "T" instances don't fit well here since they need time to build up credits.
Use Lambdas for the task of turning things on and off. If you want to be fancy, you can trigger them from a webhook to the AWS gateway.
Making more assumptions about your use case, consider putting a Varnish server in front of your ES machine so that you can provide scale more cheaply through a cache strategy (lots of assumptions here). Based on the stress, you can dial in the right TTL for cache eviction. Check out the soft-purge feature; for our ES stuff we have gotten a lot of good value from it.
If you do any of what I suggest here, make sure your spawned ES instances report their logs back to a central, addressable place on the persistent ES machine so you don't lose logs when the machines die.

Resources