Elasticsearch cluster without replicas

I want to create an Elasticsearch cluster without replicas, because the cluster runs on VMInfra, the VMInfra is fully redundant, and the disks are configured in RAID.
Is that fine?
Also, what would be the advantage of creating a cluster without replicas?

Replicas increase availability. So let's say you have a data node that's overloaded or down, if you have another node with a replica of the same shard on it, your data will still be available. I'm not familiar with VMInfra so I'm not sure whether "fully redundant" means you can get the same level of availability in those cases.
For most production uses you'll want to have at least one replica for each shard - meaning, the primary is on one node, and the replica is on another, and therefore at least two separate data nodes.
Having more replicas might make sense as load increases. Replicas have costs in memory and disk space - you're essentially storing the data several times.
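As a minimal sketch of the "one replica per shard" baseline described above, this is the settings body you would send when creating an index; the index name and shard counts are illustrative assumptions, not a recommendation for any particular workload.

```python
import json

# Settings body for an index with one primary shard and one replica, so the
# primary and its copy can land on two separate data nodes.
index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,  # 0 would mean no in-cluster redundancy at all
    }
}

# This would be the body of: PUT /logs-000001
print(json.dumps(index_settings))
```

Setting `number_of_replicas` to 0, as the question proposes, is the one-line change that trades Elasticsearch-level availability for the disk and memory savings discussed above.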

Having no replica shards at all is a bad trade-off, even in the optimistic view. Replicas provide redundant copies of your data to protect against hardware failure. Beyond that, they increase query throughput, because queries are executed on all replicas in parallel without interrupting each other, so they also improve search performance. On the other hand, having too many shards adds overhead to your system.
If your virtual server environment is well configured, I still recommend always keeping at least one replica shard, on a different hard drive. You can find more on the pros and cons here, and an old discussion of how replica shards work in this Stack Overflow chat.
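One way to act on the "replicas on different drives" advice is shard allocation awareness: if each node is tagged with the drive or rack it sits on, Elasticsearch keeps a primary and its replica on nodes with different values of that attribute. A hedged sketch follows; the attribute name `rack_id` is an illustrative assumption (each node would set `node.attr.rack_id` in its config).

```python
import json

# Cluster setting that makes shard allocation aware of a custom node
# attribute, so a primary and its replica are placed on nodes whose
# "rack_id" values differ.
awareness_settings = {
    "persistent": {
        "cluster.routing.allocation.awareness.attributes": "rack_id"
    }
}

# This would be the body of: PUT /_cluster/settings
print(json.dumps(awareness_settings))
```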

Related

Why not allow replica shards in the same node that holds primary shard to increase performance/throughput?

I understand that replica shards are used for two main purposes in Elasticsearch:
Providing high availability (i.e. backup)
Improving throughput by enabling search queries to run in parallel on multi-core CPUs
Elasticsearch does not allow having replica shards on the same node that holds the primary shard, the rationale is that replicas are used for backup which would be meaningless if they're stored on the same node as the primary shard. I get that.
But, in my case, I have a cluster with a single node and would like to add a replica to the node to improve the throughput and I don't mind the fact that I still have a single point of failure (I have the original data stored somewhere else). I only have a single machine to work with. Why can't I add replica shards for performance reasons only while disregarding the backup aspects?
Elasticsearch can execute concurrent requests on a shard. See this thread:
the processing of a query is single threaded against each shard. Multiple queries can however be run concurrently against the same shard, so assuming you have more than one concurrent query, you can still use multiple cores.
So adding a replica on the same node would just consume disk space. The throughput gain from replicas comes from the data being distributed across multiple nodes, allowing the CPUs of all those nodes to be used to process your query.
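For a single-node cluster like the one in the question, the usual move is therefore the opposite: drop the replica count to 0 so the cluster doesn't sit permanently yellow with unassignable replicas. A minimal sketch of that settings update (the index name is an illustrative assumption):

```python
import json

# On a single-node cluster a replica can never be allocated (it would share
# the node with its primary), so the cluster stays yellow. This settings
# update removes the replicas.
update_body = {"index": {"number_of_replicas": 0}}

# This would be the body of: PUT /my-index/_settings
print(json.dumps(update_body))
```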

How does elasticsearch prevent cascading failure after node outage due to disk pressure?

We operate an elasticsearch stack which uses 3 nodes to store log data. Our current config is to have indices with 3 primaries and 1 replica. (We have just eyeballed this config and are happy with the performance, so we decided not to spend time on optimization yet.)
After a node outage (let's assume a full disk), I have observed that elasticsearch automatically redistributes its shards to the remaining instances - as advertised.
However this increases disk usage on the remaining two instances, making it a candidate for cascading failure.
Durability of the log data is not paramount. I am therefore thinking about reconfiguring elasticsearch to not create a new replica after a node outage. Instead, it could just run on the primaries only. This means that after a single node outage, we would run without redundancy. But that seems better than a cascading failure. (This is a one time cost)
An alternative would be to just increase disk size. (This is an ongoing cost)
My question
(How) can I configure elasticsearch to not create new replicas after the first node has failed? Or is this considered a bad idea and the canonical way is to just increase disk capacity?
Rebalancing is expensive
When a node leaves the cluster, some additional load is generated on the remaining nodes:
Promoting a replica shard to primary to replace any primaries that were on the node.
Allocating replica shards to replace the missing replicas (assuming there are enough nodes).
Rebalancing shards evenly across the remaining nodes.
This can lead to quite some data being moved around.
Sometimes a node is only missing for a short period of time, and a full rebalance is not justified in that case. To account for this, when a node goes down, Elasticsearch immediately promotes a replica shard to primary for each primary that was on the missing node, but then waits for one minute before creating new replicas, to avoid unnecessary copying.
Only rebalance when required
The duration of this delay is a tradeoff and can therefore be configured. Waiting longer means less chance of useless copying but also more chance for downtime due to reduced redundancy.
Increasing the delay to a few hours gives me what I am looking for: it gives our engineers some time to react before the additional rebalancing load can trigger a cascading failure.
I learned that from the official elasticsearch documentation.
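Concretely, the delay described above is controlled by the `index.unassigned.node_left.delayed_timeout` setting from the Elasticsearch docs. A hedged sketch of raising it across all indices (the two-hour value is an illustrative assumption, not a recommendation):

```python
import json

# Wait two hours before re-creating replicas for shards that lived on a
# node that left the cluster, instead of the one-minute default.
delayed_allocation = {
    "settings": {
        "index.unassigned.node_left.delayed_timeout": "2h"
    }
}

# This would be the body of: PUT /_all/_settings
print(json.dumps(delayed_allocation))
```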

Elasticsearch and CAP Theorem

Elasticsearch is a distributed system. As per the CAP theorem, it can satisfy any 2 out of 3 properties. Which one is compromised in Elasticsearch?
The answer is not so straightforward. It depends upon how the system is configured and how you want to use it. I will try to go into the details.
Partitioning in Elasticsearch
Each index is partitioned into shards, meaning the data in each shard is mutually exclusive of the other shards. Each shard is itself backed by a Lucene index, the internals of which are out of scope for this answer.
Each shard can have a replica (most setups do), and in the event of a failure the replica can be promoted to primary. Let's call a shard whose primary is working and reachable from the ES node our application server is hitting an active shard. A shard with no reachable primary copy is a failed shard. (E.g. an error saying "all shards failed" means no primaries in that index are available.)
ES can end up with multiple primaries for the same shard (divergent shards). This is not a good situation, as we lose both read and write consistency.
In an event of a network partition, what will happen to:
Reads:
By default, reads will continue on the shards that are active, so data from failed shards will be excluded from search results. In this context we can consider the system AP. However, the situation is temporary and does not require manual effort to synchronize shards once the cluster is connected again.
By setting the search option allow_partial_search_results [1] to false, we can force the system to return an error when some of the shards have failed, guaranteeing consistent results. In this context we can consider the system CP.
If no primaries are reachable from the node(s) our application server connects to, the system fails completely. Even if we say that partition tolerance has failed, availability has also taken a hit. This situation could be called just C, or CP.
There can be cases where the team has to bring the shards up no matter what, and the out-of-sync replica(s) are reachable, so they promote one to primary manually. Note that there can be unsynced data, resulting in divergent shards. This is an AP situation, and consistency will be hard to restore when the situation normalizes (shards must be synced manually).
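The allow_partial_search_results option mentioned for reads can be set per request or as a cluster-wide default. A hedged sketch of both forms (host and index names are illustrative assumptions):

```python
# Per request, as a query-string parameter on _search:
search_url = (
    "http://localhost:9200/my-index/_search"
    "?allow_partial_search_results=false"
)

# Cluster-wide default, as the body of: PUT /_cluster/settings
cluster_default = {
    "persistent": {"search.default_allow_partial_results": False}
}
print(search_url)
```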
Writes
Writes stop working only if all shards fail; as long as even one shard is active, writes work and are consistent (by default). This is CP.
However, we can set the option index-wait-for-active-shards [2] to all to ensure writes only happen when all shards in the index are active. I see only a small advantage to this flag, which would be keeping all shards balanced at any cost. This is still CP (but with less availability than the previous case).
As in the last read scenario, if we manually promote unsynced replicas to primary, there can be some data loss and divergent shards. That situation is AP, and consistency will be hard to restore when the situation normalizes (shards must be synced manually).
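The wait-for-active-shards option referenced in [2] can be sketched as follows; the index name is an illustrative assumption:

```python
import json

# Require all copies of a shard to be active before a write is
# acknowledged, trading availability for stricter write guarantees.
write_settings = {"index.write.wait_for_active_shards": "all"}

# This would be the body of: PUT /my-index/_settings
# Or per request: PUT /my-index/_doc/1?wait_for_active_shards=all
print(json.dumps(write_settings))
```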
Based on the above, you can make a more informed decision and tweak ElasticSearch according to your requirements.
References:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-wait-for-active-shards
I strongly disagree with Harshit. Elasticsearch compromises on availability; as he himself mentioned, some requests return errors due to unavailable shards.
ES guarantees consistency, as data reads/writes are always consistent. ES guarantees partition tolerance: if a node that was partitioned rejoins the cluster after some time, it is able to recover the missed data up to the current state.
Moreover, no distributed system gives up on partition tolerance, because without a guarantee of PT a distributed system can't exist.
CAP theorem states that a distributed system can have at most two of the following:
Consistency.
Availability.
Partition Tolerance.
Elasticsearch gives up on "Partition Tolerance".
Reason: if the creation of a node fails, the cluster health turns red and it will not proceed to operate on the newly created index.
It does not give up on "Availability", because every Elasticsearch query returns a response from the cluster: either results or an error.
It does not give up on "Consistency" either; if it gave up on consistency, there would be no document versioning and no index recovery.
You read more here: https://discuss.elastic.co/t/elasticsearch-and-the-cap-theorem/15102/8

Configuring Elastic Search cluster with machines of different capacity(CPU, RAM) for rolling upgrades

Due to cost restrictions, I only have the following types of machines at disposal for setting up an ES cluster.
Node A: Lean(w.r.t. CPU, RAM) Instance
Node B: Beefy(w.r.t. CPU,RAM) Instance
Node M: "Leaner than A"(w.r.t. CPU, RAM) Instance
Disk-wise, both A and B have the same size.
My plan is to set up Node A and Node B acting as Master Eligible, Data node and Node M as Master-Eligible Only node(no data storing).
Because the two data nodes are NOT identical, what would be the implications?
I am going to make it a cluster of 3 machines only for the possibility of rolling upgrades (the current volume of data and its expected growth for a few years can be managed with vertical scaling, and leaving the default number of shards and replicas would let me scale horizontally if the need arises).
There is absolutely no need for your machines to have the same specs. You will need 3 master-eligible nodes not just for rolling-upgrades, but for high availability in general.
If you want to scale horizontally, you can do so either by creating more indices to hold your data, or by configuring your index to have multiple primary and/or replica shards. Since version 7 the default for new indices is 1 primary and 1 replica shard; a single index like this does not really allow you to scale horizontally.
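As a hedged sketch of the second option, giving a new index more than the default single primary lets that one index spread over several data nodes; note the primary count is fixed at creation time (short of _split/_shrink or reindexing). Names and counts here are illustrative assumptions:

```python
import json

# Index created with three primaries (each with one replica) so a single
# index can be distributed across up to three data nodes.
scaled_index = {
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1,
    }
}

# This would be the body of: PUT /events
print(json.dumps(scaled_index))
```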
Update:
With respect to load and shard allocation (where to put data), Elasticsearch by default simply considers the amount of storage available. When an Elasticsearch instance starts up, it introspects the hardware and configures its threadpools (number of threads and queue size) for the various tasks accordingly, so the number of threads available to process tasks can vary. If I'm not mistaken, the coordinating node (the node receiving the external request) distributes indexing/write requests in a round-robin fashion, not taking load into consideration. Depending on your version of Elasticsearch, this is different for search/read requests, where the coordinating node uses adaptive replica selection, taking into account the load/response time of the various replicas when distributing requests.
Besides this, sizing and scaling is a too complex topic to be answered comprehensively in a simple response. It typically also involves testing to figure out the limits/boundaries of a single node.
BTW: the default number of primary shards was changed in v7.x of Elasticsearch, as oversharding was one of the most common issues Elasticsearch users were facing. A "reasonable" shard size is in the tens of gigabytes.
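That rule of thumb gives a quick back-of-the-envelope shard count; the data volume and target shard size below are illustrative numbers, not a sizing recommendation:

```python
# Estimate how many primary shards a data set needs if each shard should
# hold roughly "tens of gigabytes".
total_data_gb = 600
target_shard_gb = 30
shards_needed = -(-total_data_gb // target_shard_gb)  # ceiling division
print(shards_needed)  # → 20
```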

MongoDB capacity planning

I have an Oracle Database with around 7 millions of records/day and I want to switch to MongoDB. (~300Gb)
To set up a POC, I'd like to know how many nodes I need. I think 2 shards, each with a 3-node replica set, will be enough, but I'd like your thoughts on it :)
I'd like to have an HA setup :)
Thanks in advance!
For MongoDB to work efficiently, you need to know your working set size. You need to know how much data 7 million records/day amounts to; this is the active data that will need to stay in RAM for high performance.
Also, be very sure WHY you are migrating to Mongo. I'm guessing that in your case it is scalability, but know your data well before doing so.
For your POC, keeping two shards means roughly 150 GB on each. If you have that much disk available, no problem.
Give some consideration to your shard keys: which fields does it make sense to shard your data set on? This will affect the decision of how many shards to deploy versus the capacity of each shard. You might go with relatively few shards, maybe two or three big, deep shards if your data can easily be segmented into halves or thirds, or several lighter, thinner shards if you can shard on a more diverse key.
It is relatively straightforward to upgrade from a MongoDB replica set configuration to a sharded cluster (each shard is actually a replica set). Rather than predetermining that sharding is the right solution to start with, I would think about what your reasons for sharding are (eg. will your application requirements outgrow the resources of a single machine; how much of your data set will be active working set for queries, etc).
It would be worth starting with replica sets and benchmarking this as part of planning your architecture and POC.
Some notes to get you started:
MongoDB's journaling, which is enabled by default as of 1.9.2, provides crash recovery and durability in the storage engine.
Replica sets are the building block for high availability, automatic failover, and data redundancy. Each replica set needs a minimum of three nodes (for example, three data nodes or two data nodes and an arbiter) to enable failover to a new primary via an election.
Sharding is useful for horizontal scaling once your data or writes exceed the resources of a single server.
Other considerations include planning your documents based on your application usage. For example, if your documents will be updated frequently and grow in size over time, you may want to consider manual padding to prevent excessive document moves.
If this is your first MongoDB project you should definitely read the FAQs on Replica Sets and Sharding with MongoDB, as well as for Application Developers.
Note that choosing a good shard key for your use case is an important consideration. A poor choice of shard key can lead to "hot spots" for data writes, or unbalanced shards if you plan to delete large amounts of data.
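To make the shard-key discussion concrete, here is a hedged sketch of the admin commands you would send to a mongos router to shard a collection, expressed as command documents; the database, collection, and key names are illustrative assumptions, not from the question:

```python
# Command documents for enabling sharding on a database and sharding a
# collection on a chosen key.
enable_sharding = {"enableSharding": "logs"}
shard_collection = {
    "shardCollection": "logs.events",
    # A hashed shard key spreads inserts evenly and helps avoid the write
    # "hot spots" mentioned above.
    "key": {"customer_id": "hashed"},
}

# With pymongo these would be run against the admin database, e.g.:
#   client.admin.command(enable_sharding)
print(shard_collection["key"])
```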
