Adding a secondary node on another computer? - elasticsearch

The instance of elasticsearch on the server is running with all defaults; no changes have been made.
How can I scale horizontally to another server on another network?
Where do you specify this?
I only see one elasticsearch.yml file in the config directory. Do I have to create a new config file for each cluster/node I want to enable? The config file appears to be for one instance only. How do I tell it to act as the master, with the secondary server outside the network acting as a secondary instance?

On the other node, you install ES as usual; depending on the network characteristics and your preferences, you may or may not need to change settings in the elasticsearch.yml of both ES instances.
By default, ES uses multicast on the network to discover nodes in the same cluster. A cluster is defined by the cluster.name property in elasticsearch.yml; nodes with the same cluster.name will join the same cluster. If you use multicast, you need to make sure, first, that multicast is available in your network configuration, and then that you don't have firewalls or anything else blocking communication between the nodes (e.g. on port 54328).
You can also use unicast for node discovery, where the address of each node is listed in elasticsearch.yml. For more details, check the elasticsearch.yml file itself, as it has good descriptions of these settings. For example, disable multicast:
discovery.zen.ping.multicast.enabled: false
and configure unicasting:
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
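Putting both settings together, a minimal elasticsearch.yml sketch for two nodes that should form one cluster might look like this (the cluster name, node names, and host names are placeholders; 9300 is the default transport port):

cluster.name: my-cluster
node.name: node-1                                # use node-2 on the second server
network.host: 0.0.0.0                            # bind so the other server can reach this node
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["server1:9300", "server2:9300"]

With the same cluster.name on both servers, and each server listed in unicast.hosts, the two instances will discover each other and form a single cluster.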

Related

Changing FQDN of nodes in hadoop cluster

I would like to change the FQDNs of the nodes in my hadoop cluster.
For example, the FQDN of a node in my cluster is hadoop1.dev.com and I would like to change it to hadoop1.abc.xyz.
Could someone suggest a process to change it without affecting my cluster data?
Update your /etc/hosts file as below, then restart the network service for the change to take effect:
x.x.x.x hadoop1.dev.com hadoop1.abc.xyz
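Assuming a RHEL/CentOS-style system (the service name and command vary by distribution), the restart might look like:

sudo systemctl restart network    # or: sudo service network restart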

How to sync up two ElasticSearch clusters

I need to set up a replicated ES cluster (clusterII) in data center II; clusterII just needs to stay in sync with clusterI, which is in data center I. So far my idea is to take a snapshot and restore it in clusterII in order to sync it up with clusterI. But this approach involves some delay. Is there a better way, please?
The ability to cluster is a concept baked into ElasticSearch. However, it was not designed to be scaled across data centers, because of the network latency involved, though it can do it.
The idea behind ElasticSearch is to have a highly-available cluster that replicates shards within itself (i.e. a replica level of 1 in a cluster means that you have 2 copies of the data across your cluster). This means one cluster alone is its own backup.
First, if you don't have it configured as a cluster, do so by adding the following to your /etc/elasticsearch/elasticsearch.yml (or wherever you put your config):
/etc/elasticsearch/elasticsearch.yml:
cluster.name: thisismycluster
node.name: ${HOSTNAME}
Alternatively, you can set node.name to whatever you want, but it's best to use your hostname.
You also want to make sure the ElasticSearch service is bound to a particular address and/or interface; the interface is probably your best bet, because you need a point-to-point link across those data centers:
/etc/elasticsearch/elasticsearch.yml:
network.host: [_tun1_]
You will need to set a list of discovery hosts; every host in that list whose cluster.name matches will be discovered and assigned to that cluster. ElasticSearch takes care of the rest, it's magical!
You may add each host by name (only if it is defined in your /etc/hosts or DNS across your data centers can resolve it) or by IP:
/etc/elasticsearch/elasticsearch.yml:
discovery.zen.ping.unicast.hosts: ["ip1", "ip2", "..."]
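Taken together, the file on each node might look something like this sketch (the cluster name, the _tun1_ interface, and the IPs are placeholders for your own values):

/etc/elasticsearch/elasticsearch.yml:
cluster.name: thisismycluster
node.name: ${HOSTNAME}
network.host: [_tun1_]
discovery.zen.ping.unicast.hosts: ["ip1", "ip2"]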
Save the config and restart ElasticSearch:
sudo systemctl restart elasticsearch
OR
sudo service elasticsearch restart
If you aren't using systemd (depending on your OS), I would highly suggest using it.
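Once both nodes are back up, you can check that they actually joined the same cluster with the standard cluster health API (assuming the default HTTP port 9200):

curl http://localhost:9200/_cluster/health?pretty

The number_of_nodes field in the response should match the number of nodes you expect.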
I will tell you, though, that syncing clusters with snapshots in ElasticSearch is a terrible idea, to be avoided at all costs, because high availability is already built into the application - this is why it is so powerful and is being heavily adopted by the community and companies alike.

How to configure kibana for multiple nodes in the same cluster?

I have an ES cluster with three nodes, and the master is chosen through election. So I want to know how to make Kibana listen to the cluster rather than to only one node, as with elasticsearch.url: "http://localhost:8200". Thanks
This is not possible right now, there's an open issue about it: #214 (opened for 2+ years)
Kibana 4 can currently only reverse-proxy a single ES host. You don't necessarily need to reference a master node: you could create a small client node (with node.master: false and node.data: false), add it to your cluster, and configure Kibana to point at it. That way you get load balancing for free and don't need to care if the master changes or goes down.
UPDATE
As of Kibana 6.6, it is possible to provide multiple ES hosts.
Kibana suggests that you "run an Elasticsearch Coordinating only node on the same machine as Kibana". Load Balancing Across Multiple Elasticsearch Nodes describes the details of how to configure the ElasticSearch coordinating node for Kibana.
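For example, the hosts can be listed in kibana.yml like this (elasticsearch.hosts replaced elasticsearch.url in 6.6; the node addresses are placeholders):

elasticsearch.hosts: ["http://node1:9200", "http://node2:9200", "http://node3:9200"]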
You can use a reverse proxy/load balancer, but it gets messy. A simpler solution is to install Elasticsearch with a coordinating-only node configuration on the same machine/VM/instance as Kibana, with the following configuration:
1. Add these lines to elasticsearch.yml for the coordinating-only node:
node.master: false
node.data: false
node.ingest: false
2. Update discovery.zen.ping.unicast.hosts to add the new (coordinating-only) node. In your case, there will be 4 hosts.
3. Point Kibana at localhost (the coordinating-only node) in its configuration file, as in the sketch below.
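Assuming the coordinating-only node listens on the default port 9200 next to Kibana, the kibana.yml entry would be something like:

elasticsearch.url: "http://localhost:9200"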
Hope this helps.

Elasticsearch 0.90.0: Putting existing server (with some data) into cluster

I am using elasticsearch 0.90.0 in my production environment. Currently, I am using only a single server (along with jetty plugin to restrict write access).
Now I want a cluster of two servers consisting of the old server (with data) and a new server. I need my data on both servers, so that if either of them fails, the data can be fetched from the other. What should I do? Will the normal configuration work? Can I copy the data folder from one server to the other and expect it to work properly when placed in a cluster? Or should I clone the elasticsearch folder itself onto my second machine?
By normal configuration, I mean this:-
cluster.name: elasticsearch
node.name: "John"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: []
Please help!
You should not have to copy data manually.
If you kept the default settings, adding one more node to the cluster should trigger:
creation of one replica for each existing shard on your new node
rebalancing of the primary shards between your two nodes
You will have to add the addresses of your nodes as unicast hosts, since you have enabled unicast discovery; see the example below.
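A sketch of that setting for both machines (the host names and the default transport port 9300 are placeholders for your two servers):

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["oldserver:9300", "newserver:9300"]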
I recommend running some tests with ElasticSearch instances on your personal computer in order to check this behavior.

How to start elastic search in non-cluster mode

I have two different machines running elastic search server instances. They automatically create a cluster, and changes made on one instance are reflected on the other instance on the other machine. I changed the cluster.name property in the elasticsearch.yml file in the config folder, and the issue was resolved. I wanted to know if I can start an elastic search server instance in non-cluster mode?
You can't start the es server in non-cluster mode.
But if you want the two servers to run independently (each in its own cluster), there are 2 options that I can think of:
Disable multicast and don't list the other host in the unicast hosts
Change the cluster.name so that each has a different name
The easiest is to set node.local: true
This prevents elasticsearch from trying to connect to other nodes.
Using a custom cluster name is also a good idea in any case, just to prevent unintended exchange of data. Use a different name for each of production, testing, and development.
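A minimal sketch of elasticsearch.yml for a fully standalone instance (the cluster name is a placeholder; node.local applies to the pre-5.x versions discussed here):

cluster.name: my-standalone-dev    # distinct name per environment
node.local: true                   # local JVM-level discovery only; never joins other nodes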
