how to sync up two ElasticSearch cluster - elasticsearch

I need to setup a replicated ES clusterII in data centerII, the ES clusterII just need to sync up with ES clusterI which in data centerI. So far my idea is that store snapshot in custerII and restore the snapshot in order to sync up clusterI. But this way kind of having some delay. Is there any better way please.

The ability to cluster is a concept baked into ElasticSearch. However it was not designed to be scaled across datacenters because this involves network latency, but it can do it.
The idea behind ElasticSearch is to have a highly-available cluster that replicates shards within itself (i.e. a replica level of 2 in a cluster means that you have 2 copies of the data across your cluster). This means one cluster alone is its own backup.
First, if you don't have it configured as a cluster, do so by adding the following to your /etc/elasticsearch/elasticsearch.yml (or wherever you put your config):
/etc/elasticsearch/elasticsearch.yml:
cluster.name: thisismycluster
node.name: ${HOSTNAME}
Alternatively, you can make node.name whatever you want, but it's best to put in your hostname.
You also want to make sure you have the ElasticSearch service bound to a particular address and/or interface, where the interface is probably your best bet because you need a point-to-point link across those datacenters:
/etc/elasticsearch/elasticsearch.yml:
network.host: [_tun1_]
You will need to make sure you set a list of discovery hosts, which means that on every host in the cluster, if their cluster.name parameter name matches, they will be discovered and assigned to that cluster. ElasticSearch takes care of the rest, it's magical!
You may add the host by name (only if defined in your /etc/hosts or DNS across your datacenters can resolve it) or IP:
/etc/elasticsearch/elasticsearch.yml:
discovery.zen.ping.unicast.hosts: ["ip1", "ip2", "..."]
Save the config and restart ElasticSearch:
sudo systemctl restart elasticsearch
OR
sudo service elasticsearch restart
If you aren't using systemd (depending on your OS), I would highly suggest using it.
I will tell you though that doing snapshots with ElasticSearch is a terrible idea, and to avoid it at all costs because ElasticSearch built the mentality of high-availability into the application already - this is why this application is so powerful and is being heavily adopted by the community and companies alike.

Related

Elastic search cluster on Kubernetes Cluster vs VM

I want to setup elastic stack (elastic search, logstash, beats and kibana) for monitoring my kubernetes cluster which is running on on-prem bare metals. I need some recommendations on the following 2 approaches, like which one would be more robust,fault-tolerant and of production grade. Let's say I have a K8 cluster named as K8-abc.
Approach 1- Will be it be good to setup the elastic stack outside the kubernetes cluster?
In this approach, all the logs from pods running in kube-system namespace and user-defined namespaces would be fetched by beats(running on K8-abc) and put into into the ES Cluster which is configured on Linux Bare Metals via Logstash (which is also running on VMs). And for fetching the kubernetes node logs, the beats running on respective VMs (which are participating in forming the K8-abc) would fetch the logs and put it into the ES Cluster which is configured on VMs. The thing to note here is the VMs used for forming the ES Cluster are not the part of the K8-abc.
Approach 2- Will be it be good to setup the elastic stack on the kubernetes cluster k8-abc itself?
In this approach, all the logs from pods running in kube-system namespace and user-defined namespaces would be send to Elastic search cluster configured on the K8-abc via logstash and beats (both running on K8-abc). For fetching the K8-abc node logs, the beats running on VMs (which are participating in forming the K8-abc) would put the logs into ES running on K8-abc via logstash which is running on k8-abc.
Can some one help me in evaluating the pros and cons of the before mentioned two approaches? It will be helpful even if the relevant links to blogs and case studies is provided.
I would be more inclined to the second solution. It has many advantages over the first one however it may seem more complex as it comes to the initial setup. You can actually ask similar question when it comes to migrate any other type of workload to Kubernetes. It has many advantages over VM. To name just a few:
self-healing cluster,
service discovery and integrated load balancing,
Such solution is much easier to scale (HPA) in comparison with VMs,
Storage orchestration. Kubernetes allows you to automatically mount a storage system of your choice, such as local storage, public cloud providers, and many more including Dynamic Volume Provisioning mechanism.
All the above points could be easily applied to any other workload and may bee seen as Kubernetes advantages in general so let's look why to use it for implementing Elastic Stack:
It looks like Elastic is actively promoting use of Kubernetes on their website. See also this article.
They also provide an official elasticsearch helm chart so it is already quite well supported by Elastic.
Probably there are many other reasons in favour of Kubernetes solution I didn't mention here. Here you can find a hands-on article about setting up Highly Available and Scalable Elasticsearch on Kubernetes.

Adding a secondary node on another computer?

The running instance of elasticsearch on a server is running with all defaults, no changes.
How can I scale horizontally to another server on another network?
Where do you specify this?
I only see one elasticsearch.yml file in the config directory, do I have to make a new config file for each cluster/node etc I would like to enable? The config file appears to be for one instance only. How do I tell it to use it as a master and the secondary server outside of the network as a secondary instance?
On the other node, you install ES as usual and, depending on the network characteristics and your preference, you change or not things in elasticsearch.yml of both ES instances.
ES uses by default multicasting on the network to discover nodes in the same cluster. A cluster is defined by "cluster.name" property you can find in elasticsearch.yml file. Nodes with the same "cluster.name" will join the same cluster. If using multicasting, you need to make sure, first the multicasting is available in your network configuration, and then that you don't have firewalls or any other things that could block communication between the nodes (like port 54328).
You can also use unicasting for nodes discovery, where the address of each node is specified in elasticsearch.yml. For more details about this, check elasticsearch.yml file as it has some good description of these settings. For example, disable multicasting:
discovery.zen.ping.multicast.enabled: false
and configure unicasting:
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]

Elasticsearch 0.90.0: Putting existing server (with some data) into cluster

I am using elasticsearch 0.90.0 in my production environment. Currently, I am using only a single server (along with jetty plugin to restrict write access).
Now I want a cluster of two servers consisting of one old server (with data) and one new server. Now, I need my data on both of my servers, so that in case if anyone of them fails, data can be fetched from another. What should I do? Will the normal configuration work? Can I copy data folder from one server to another and expect it to work properly when placed in a cluster? Or should I clone elasticsearch folder itself on my second machine?
By normal configuration, I mean this:-
cluster.name: elasticsearch
node.name: "John"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: []
Please help!
You should not have to copy data manually.
If you kept the default settings, adding one more node to the cluster should trigger :
creation of one replica by existing shard on your new node
rebalancing of the primary shards between your two nodes
You will have to add the adresses of your nodes as hosts as you have enabled unicast communication.
I recommend you to make some tests with ElasticSearch instances on your personal computer in order to check this behavior.

how to start elastic search in non cluster mode

I have two different machines running elastic search server instances. They automatically create a cluster and changes made on one instance reflect on other instance on different machine. I have changed the cluster.name property in elasticsearch.yml file in config folder and the issue is resolved. I wanted to know if i can start elastic search server instance in non-cluster mode ?
You can't start the es server in non-cluster mode.
But if you want the two servers to run independently (in its own cluster), there are 2 options that I can think of:
Disable multicast and don't set the hosts for them in unicast
Change the cluster.name to make them have different names
The easiest is to set node.local: true
This prevents elasticsearch from trying to connect to other nodes.
Using a custom name is also a good idea in any case just to prevent unintended exchange of data. Use something else for production, testing, and development.

Sharing elasticsearch between Logstash/graylog2 and my own application

Would it be safe to share an elasticsearch cluster (or single-node elasticsearch cluster) between Logstash or graylog2 and my own application? what configuration changes/additions should be made for accomodating that? what kind of name-spacing would the application require for storing its own data in separation from graylog/Logstash?
I'd rather avoid maintaining separate clusters, especially on dev boxes but also in general - if the architecture allows.
It is technically possible but not recommended. You will experience load on the logging cluster that you want to decouple from the other applications using ES.
Graylog2 supports defining an index prefix for having multiple setups running in one ES cluster.
We have both(Kibana and Graylog) running with shared Elasticsearch. It's just that the indexing pattern is something different we have to add Circuit breakers in Elasticsearch so that Kibana search query for logs would not expand beyond a certain size.

Resources