Elasticsearch configuration and best practices in production - elasticsearch

I'm new on working with the ELK stack and I'm working on 10 TB stocked on physical servers, so if there is recommendation on how many data nodes, Master nodes .. should I need to use , the best practice to configure our cluster to work smoothly in production and if there is other tools or technologies used with Elasticsearch for to improve performance

#ameur you can refer to these pages :
https://www.elastic.co/guide/en/elasticsearch/reference/current/general-recommendations.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html
Regarding master nodes, you should have minimum 3 nodes(Go for 5 nodes if possible).
For data nodes , there are multiple factors involved -
for ex:
resources like RAM,CPU, disk
throughput like qpa,wps etc.
so there is no straightforward answer to that, you will need to do some performance test to get the right number.
don't forget to read about sharding strategy https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html

Related

How much clusters must I use in elastic stack?

I am confused which approach would be better having single cluster with 12 nodes or having 3 cluster with 4 nodes each in elastic stack. What are the advantages and disadvantages of single cluster? Does elastic charge me for 3 cluster as far as I know they charge for nodes but can someone clarify which would be better approach and which would be cost effective solution?
I am planning to use these nodes in my cluster :
master
data_content
data_hot
ingest
ml
remote_cluster_client
What the optimal cluster size is depends on various requirements / tradeoffs:
Do you have multiple users / systems that you might want to isolate against each other (so that one running wild won't overload the cluster for everyone)? Then you might be better off with multiple clusters.
On the other hand a single larger cluster would be able to absorb extra load from one user / system better.
Smaller clusters are quicker to upgrade and you don't have one "big bang" upgrade. Or you might just upgrade some part but not everything at once.
Every cluster should have 3 master eligible nodes.
Most features in the Elastic Stack are free, but some are paid. Besides the cloud service where it's resource based, there are 2 modes for pricing:
The classic node based pricing. Every Elasticsearch process would need a license. So larger nodes (within the technical limits) would cost you less than many smaller ones, but the cluster size itself doesn't matter.
The newer pricing model for ECE / ECK is resource based where you buy chunks of memory and you can slice that into as many nodes or clusters as you want.

Elasticsearch cluster setup

I'm curerntly running a single node ES-Instance. As there are some limitations with a single server setup in ES, and the queries are becoming pretty slow sometimes, I want to upgrade to a full cluster.
The ES-Instance currently only stores data, and is not doing any fancy stuff (Transformations, Ingest Pipelines, ...). All I currently need is a place to store my data at, and to retrieve it (Search + Aggregations). There are more reads than writes.
In a lot of forums and blog posts I read about the "Split-Brain" issue. To circumvent this, the minimum node count should be 3.
The idea is to keep the amount of machines low, because this is a private project and I do not want to also manage a lot of OS in my spare time..
The structure I thought about was:
- 1 Coordinator + Voting-only Node
- 2 Master-eligible + Data Nodes
minimum_master_nodes: 2 to circumvent Split-Brains
Send all ES-Queries to the Coordinator, which will then issue the requests on the data nodes and reduce the final results.
My question is: Does this make sense? Or is it better to use 3 master-eligible + Data nodes?
Online I found no guidance for ES-Newbies to get an idea of the structure of a simple cluster.
You are in right direction and I can see most of your thinking is also right so don't consider yourself as ES newbie :).
Anyway as you are going to have 3 nodes in your cluster, why note make all three nodes as master eligible nodes and why you are making a dedicated co-ordinating node when by default every ES node works as a co-ordinating node and in your small project you won't need a dedicated co-ordinating node. this way you will have a simple configuration, just don't assign any explicit role to any node as by default all ES nodes are master, data and co-ordinating node.
Also, you should invest some time to identify the slow logs and its cause to make it more performant rather than adding more resources that too in personal project, please refer to my short tips on improving the search performance

ELK Deployment; CPU,Memory,Disk

I would like to deploy Elasticsearch, logstash and kibana in 3 different flavors, the characteristics of each one:
o 32vCPU, 384Go RAM et 900Go HDD
I would like to supervise 100 servers so approximately 33 servers in each flavors.
Do you think it's a good idea to use this configuration? and it's not a problem to use this huge capacity of memory?
Another question how many nodes should I use?
without details its hard to give you global advice but Elasticsearch recommend to never cross 31Gb for RAM. Here are the reasons why
You should read all the page, they explain why it is generally far better to have a lot of small/medium hosts instead of a few big ones.
I also recommend you to read this post, it will give you some insight on how to design an Elastic Cluster especially the distinction between roles in a cluster and the difference in hardware needed.
For your question :
Another question how many nodes should I use?
There is no good answer without knowing the volume of data, read/write etc etc...
And last, I hardly doubt that using the same configuration for kibana / logstash / elastic hosts is a good idea. They just don't do the same sort of processing. You should start with small configuration and update it incrementally when you will have real data.

3 Node Cluster for Elastic, Kafka and Cassandra - On 3 Machines

We are creating a 3 node elastic cluster, but want to use each of our 3 elastic nodes for other things, like Kafka and Cassandra. We need high availability, so we want to have 3 nodes for everything, but we don't want to have 9 machines, we just want to put them on one bigger computer. Is this a typical scenario?
I would say no.
One sandbox machine running a PoC with all the components local, sure, why not. But Production with HA requirements, you are just asking for trouble putting everything in one place. They're going to compete for resource, one blowing the box up kills the others, touching the machine to change one risks the others, etc, etc.
IMO keep your architecture clean and deploy on separate nodes for each component.

How to setup ElasticSearch cluster with auto-scaling on Amazon EC2?

There is a great tutorial elasticsearch on ec2 about configuring ES on Amazon EC2. I studied it and applied all recommendations.
Now I have AMI and can run any number of nodes in the cluster from this AMI. Auto-discovery is configured and the nodes join the cluster as they really should.
The question is How to configure cluster in way that I can automatically launch/terminate nodes depending on cluster load?
For example I want to have only 1 node running when we don't have any load and 12 nodes running on peak load. But wait, if I terminate 11 nodes in cluster what would happen with shards and replicas? How to make sure I don't lose any data in cluster if I terminate 11 nodes out of 12 nodes?
I might want to configure S3 Gateway for this. But all the gateways except for local are deprecated.
There is an article in the manual about shards allocation. May be I'm missing something very basic but I should admit I failed to figure out if it is possible to configure one node to always hold all the shards copies. My goal is to make sure that if this would be the only node running in the cluster we still don't lose any data.
The only solution I can imagine now is to configure index to have 12 shards and 12 replicas. Then when up to 12 nodes are launched every node would have copy of every shard. But I don't like this solution cause I would have to reconfigure cluster if I might want to have more then 12 nodes on peak load.
Auto scaling doesn't make a lot of sense with ElasticSearch.
Shard moving and re-allocation is not a light process, especially if you have a lot of data. It stresses IO and network, and can degrade the performance of ElasticSearch badly. (If you want to limit the effect you should throttle cluster recovery using settings like cluster.routing.allocation.cluster_concurrent_rebalance, indices.recovery.concurrent_streams, indices.recovery.max_size_per_sec . This will limit the impact but will also slow the re-balancing and recovery).
Also, if you care about your data you don't want to have only 1 node ever. You need your data to be replicated, so you will need at least 2 nodes (or more if you feel safer with a higher replication level).
Another thing to remember is that while you can change the number of replicas, you can't change the number of shards. This is configured when you create your index and cannot be changed (if you want more shards you need to create another index and reindex all your data). So your number of shards should take into account the data size and the cluster size, considering the higher number of nodes you want but also your minimal setup (can fewer nodes hold all the shards and serve the estimated traffic?).
So theoretically, if you want to have 2 nodes at low time and 12 nodes on peak, you can set your index to have 6 shards with 1 replica. So on low times you have 2 nodes that hold 6 shards each, and on peak you have 12 nodes that hold 1 shard each.
But again, I strongly suggest rethinking this and testing the impact of shard moving on your cluster performance.
In cases where the elasticity of your application is driven by a variable query load you could setup ES nodes configured to not store any data (node.data = false, http.enabled = true) and then put them in for auto scaling. These nodes could offload all the HTTP and result conflation processing from your main data nodes (freeing them up for more indexing and searching).
Since these nodes wouldn't have shards allocated to them bringing them up and down dynamically shouldn't be a problem and the auto-discovery should allow them to join the cluster.
I think this is a concern in general when it comes to employing auto-scalable architecture to meet temporary demands, but data still needs to be saved. I think there is a solution that leverages EBS
map shards to specific EBS volumes. Lets say we need 15 shards. We will need 15 EBS Volumes
amazon allows you to mount multiple volumes, so when we start we can start with few instances that have multiple volumes attached to them
as load increase, we can spin up additional instance - upto 15.
The above solution is only advised if you know your max capacity requirements.
I can give you an alternative approach using aws elastic search service(it will cost little bit more than normal ec2 elasticsearch).Write a simple script which continuously monitor the load (through api/cli)on the service and if the load goes beyond the threshold, programatically increase the nodes of your aws elasticsearch-service cluster.Here the advantage is aws will take care of the scaling(As per the documentation they are taking a snaphost and launching a completely new cluster).This will work for scale down also.
Regarding Auto-scaling approach there is some challenges like shard movement has an impact on the existing cluster, also we need to more vigilant while scaling down.You can find a good article on scaling down here which I have tested.If you can do some kind of intelligent automation of the steps in the above link through some scripting(python, shell) or through automation tools like Ansible, then the scaling in/out is achievable.But again you need to start the scaling up well before the normal limits since the scale up activities can have an impact on existing cluster.
Question: is possible to configure one node to always hold all the shards copies?
Answer: Yes,its possible by explicit shard routing.More details here
I would be tempted to suggest solving this a different way in AWS. I dont know what ES data this is or how its updated etc... Making a lot of assumptions I would put the ES instance behind a ALB (app load balancer) I would have a scheduled process that creates updated AMI's regularly (if you do it often then it will be quick to do), then based on load of your single server I would trigger more instances to be created from the latest instance you have available. Add the new instances to the ALB to share some of the load. As this quiet down I would trigger the termination of the temp instances. If you go this route here are a couple more things to consider
Use spot instances since they are cheaper and if it fits your use case
The "T" instances dont fit well here since they need time to build up credits
Use lambdas for the task of turning things on and off, if you want to be fancy you can trigger it based on a webhook to the aws gateway
Making more assumptions about your use case, consider putting a Varnish server in front of your ES machine so that you can more cheaply provide scale based on a cache strategy (lots of assumptions here) based on the stress you can dial in the right TTL for cache eviction. Check out the soft-purge feature for our ES stuff we have gotten a lot of good value from this.
if you do any of what i suggest here make sure to make your spawned ES instances report any logs back to a central addressable place on the persistent ES machine so you don't lose logs when the machines die

Resources