I am using elasticsearch 0.90.0 in my production environment. Currently, I am using only a single server (along with the jetty plugin to restrict write access).
Now I want a cluster of two servers: the old server (with data) and a new one. I need my data on both servers, so that if either of them fails, the data can still be fetched from the other. What should I do? Will the normal configuration work? Can I copy the data folder from one server to the other and expect it to work properly when placed in a cluster? Or should I clone the elasticsearch folder itself onto my second machine?
By normal configuration, I mean this:
cluster.name: elasticsearch
node.name: "John"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: []
Please help!
You should not have to copy data manually.
If you kept the default settings, adding one more node to the cluster should trigger:
creation of one replica for each existing shard on your new node
rebalancing of the primary shards between your two nodes
You will have to add the addresses of your nodes as unicast hosts, since you have enabled unicast discovery.
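As a minimal sketch (the addresses 192.168.1.10 and 192.168.1.11 for your two servers are hypothetical), both nodes would share these lines in elasticsearch.yml, with only node.name differing:
cluster.name: elasticsearch
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.1.10", "192.168.1.11"]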
I recommend running a few tests with Elasticsearch instances on your personal computer to verify this behavior.
Related
I need to set up a replicated ES clusterII in datacenterII; clusterII just needs to stay in sync with ES clusterI, which is in datacenterI. So far my idea is to take a snapshot of clusterI and restore that snapshot in clusterII in order to sync them up. But this approach involves some delay. Is there a better way, please?
The ability to cluster is a concept baked into Elasticsearch. However, it was not designed to be scaled across datacenters, because that introduces network latency, but it can do it.
The idea behind Elasticsearch is to have a highly-available cluster that replicates shards within itself (e.g. a replica count of 2 means each shard has 2 replica copies in addition to the primary). This means one cluster alone is its own backup.
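As a hedged example (the index name myindex is hypothetical), you can change the replica count of an existing index through the index settings API:
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{"index": {"number_of_replicas": 2}}'
This asks the cluster to keep two replica copies of each shard in addition to the primary.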
First, if you don't have it configured as a cluster, do so by adding the following to your /etc/elasticsearch/elasticsearch.yml (or wherever you put your config):
/etc/elasticsearch/elasticsearch.yml:
cluster.name: thisismycluster
node.name: ${HOSTNAME}
Alternatively, you can make node.name whatever you want, but it's best to use your hostname.
You also want to make sure the Elasticsearch service is bound to a particular address and/or interface; the interface is probably your best bet, because you need a point-to-point link across those datacenters:
/etc/elasticsearch/elasticsearch.yml:
network.host: [_tun1_]
You will need to make sure you set a list of discovery hosts; every host in the cluster whose cluster.name matches will be discovered and assigned to that cluster. Elasticsearch takes care of the rest, it's magical!
You may add the hosts by name (only if they are defined in your /etc/hosts or DNS across your datacenters can resolve them) or by IP:
/etc/elasticsearch/elasticsearch.yml:
discovery.zen.ping.unicast.hosts: ["ip1", "ip2", "..."]
Save the config and restart ElasticSearch:
sudo systemctl restart elasticsearch
OR
sudo service elasticsearch restart
If you aren't using systemd (depending on your OS), I would highly suggest using it.
I will tell you, though, that using snapshots for this with Elasticsearch is a terrible idea, and to avoid it at all costs, because Elasticsearch builds the mentality of high availability into the application already - this is why it is so powerful and is being heavily adopted by the community and companies alike.
I have an ES cluster with three nodes, and the master is chosen through an election. I want to know how to make Kibana listen to the cluster rather than to only one node, as with elasticsearch.url: "http://localhost:8200". Thanks
This is not possible right now; there's an open issue about it: #214 (open for 2+ years)
Kibana 4 can currently only reverse-proxy a single ES host. You don't necessarily need to reference a master node. You could create a small client node (with node.master: false and node.data: false), add it to your cluster, and configure Kibana to point at it. That way you get load balancing for free and don't need to care if the master changes or goes down.
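As a sketch (the host name es-client is hypothetical): such a client node carries node.master: false and node.data: false in its elasticsearch.yml, and Kibana points at it in kibana.yml:
elasticsearch.url: "http://es-client:9200"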
UPDATE
As of Kibana 6.6, it is possible to provide multiple ES hosts.
Kibana suggests to "run an Elasticsearch Coordinating only node on the same machine as Kibana". Load Balancing Across Multiple Elasticsearch Nodes describes the details of how to configure the Elasticsearch coordinating node for Kibana.
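For example, a minimal sketch in kibana.yml (the node addresses are hypothetical):
elasticsearch.hosts: ["http://node1:9200", "http://node2:9200", "http://node3:9200"]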
You can use a reverse proxy/load balancer, but it gets messy. A simpler solution is to install Elasticsearch with a coordinating-only node configuration on the same machine/VM/instance as Kibana, with the following configuration:
1. Add these lines to elasticsearch.yml for the coordinating-only node:
node.master: false
node.data: false
node.ingest: false
2. Update discovery.zen.ping.unicast.hosts on the other nodes to add the new (coordinating-only) node. In your case, there will be 4 hosts.
3. Point Kibana at localhost in the Kibana configuration file, as sketched below.
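Putting it together, a minimal sketch of the Kibana side in kibana.yml, assuming the coordinating-only node listens on the default port 9200:
elasticsearch.hosts: ["http://localhost:9200"]
(On Kibana versions before 6.6, use elasticsearch.url: "http://localhost:9200" instead.)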
Hope this helps.
The instance of elasticsearch on my server is running with all defaults, no changes.
How can I scale horizontally to another server on another network?
Where do you specify this?
I only see one elasticsearch.yml file in the config directory. Do I have to make a new config file for each cluster/node I would like to enable? The config file appears to be for one instance only. How do I tell it to act as the master, with the secondary server outside the network acting as a secondary instance?
On the other node, you install ES as usual and then, depending on the network characteristics and your preference, you may or may not need to change things in the elasticsearch.yml of both ES instances.
By default, ES uses multicast on the network to discover nodes in the same cluster. A cluster is defined by the cluster.name property you can find in the elasticsearch.yml file; nodes with the same cluster.name will join the same cluster. If using multicast, you need to make sure, first, that multicast is available in your network configuration, and second, that you don't have firewalls or anything else that could block communication between the nodes (such as on port 54328).
You can also use unicast for node discovery, where the address of each node is specified in elasticsearch.yml. For more details about this, check the elasticsearch.yml file, as it has a good description of these settings. For example, disable multicast:
discovery.zen.ping.multicast.enabled: false
and configure unicasting:
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
I have two different machines running Elasticsearch server instances. They automatically create a cluster, and changes made on one instance are reflected on the instance on the other machine. I have changed the cluster.name property in the elasticsearch.yml file in the config folder and the issue is resolved. I wanted to know if I can start an Elasticsearch server instance in non-cluster mode.
You can't start the es server in non-cluster mode.
But if you want the two servers to run independently (each in its own cluster), there are 2 options I can think of:
Disable multicast and don't set any unicast hosts for them
Change the cluster.name so the two have different names
The easiest is to set node.local: true
This prevents elasticsearch from trying to connect to other nodes.
Using a custom cluster name is also a good idea in any case, just to prevent unintended exchange of data. Use a different name for each of production, testing, and development.
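A minimal sketch of elasticsearch.yml combining both ideas (the cluster name is hypothetical):
node.local: true
cluster.name: my-dev-cluster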
I want to run logstash -> elasticsearch with high availability and cannot find an easy way to achieve it. Please review how I see it and correct me:
Goal:
5 machines each running elasticsearch united into a single cluster.
5 machines each running logstash server and streaming data into elasticsearch cluster.
N machines under monitoring each running lumberjack and streaming data into logstash servers.
Constraint:
It is supposed to run on a PaaS (CoreOS/Docker), so multicast discovery does not work.
Solution:
Lumberjack allows you to specify a list of logstash servers to forward data to. Lumberjack will randomly select a target server and switch to another one if that server goes down. It works.
I can use the zookeeper discovery plugin to construct the elasticsearch cluster. It works.
With multicast, each logstash server discovers and joins the elasticsearch cluster. Without multicast, it only allows me to specify a single elasticsearch host. But that is not high availability. I want to output to the cluster, not to a single host that can go down.
Question:
Is it realistic to add a zookeeper discovery plugin to logstash's embedded elasticsearch? How?
Is there an easier (natural) solution for this problem?
Thanks!
You could potentially run a separate (non-embedded) Elasticsearch instance within the Logstash container, but configure Elasticsearch not to store data, and maybe set these as the master nodes:
node.data: false
node.master: true
You could then add your Zookeeper plugin to all Elasticsearch instances so they form the cluster.
Logstash then logs over HTTP to the local Elasticsearch instance, which works out where among the 5 data-storing nodes to actually index the data.
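A sketch of the Logstash side under that assumption, using the HTTP protocol of the elasticsearch output in Logstash 1.x:
output {
  elasticsearch {
    host => "localhost"
    protocol => "http"
  }
}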
Alternatively, this question explains how to get plugins working with the embedded version of Elasticsearch: Logstash output to Elasticsearch on AWS EC2