Elasticsearch: Adding nodes to a cluster on the fly

I want to set up an Elasticsearch cluster. As it is a distributed system, I should be able to add more nodes on the fly (meaning: add new nodes after the cluster has been deployed). How is this done, and how does Elasticsearch manage it?

Elasticsearch handles this using Zen Discovery:
Zen discovery is the built-in discovery module for Elasticsearch and the default. It provides unicast discovery, but can be extended to support cloud environments and other forms of discovery.
This is configured through the elasticsearch.yml configuration file. You have two options, multicast and unicast:
Multicast lets your new node connect to your cluster without specifying IPs; however, it's not recommended.
Unicast: you specify a list of nodes in your cluster (their IPs).
Either way, the newly started node will try to ping the other nodes, and if the cluster names match, it will join the cluster.
Configuration example:
cluster.name: elasticsearch_media
node.name: "media-dev"
node.master: true
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["153.32.228.250[9300-9400]", "10.122.234.19[9300-9400]"]

All you have to do is edit the main configuration file on your new node and set the cluster name to that of the cluster you are currently running. Of course, the new node must be discoverable; whether it is depends on your network settings.
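Once the new node has started, you can verify that it joined from any machine that can reach the cluster; for example, assuming the default HTTP port 9200 and the existing node 10.122.234.19 from the configuration above:
curl 'http://10.122.234.19:9200/_cat/nodes?v'
The new node should appear in the list alongside the existing ones.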

You could also write a script that accepts command-line arguments for the cluster name, IP addresses, authentication, and so on, and then opens and modifies the elasticsearch.yml file on the remote server, as sketched below.
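A minimal sketch of such a script, assuming SSH access as root, a package install with the config at /etc/elasticsearch/elasticsearch.yml, and systemd; all of these are assumptions, so adapt the paths, user, and authentication handling to your environment:

#!/usr/bin/env bash
# join-node.sh: point a fresh node at an existing cluster (hypothetical helper)
# Usage: ./join-node.sh <cluster-name> <new-node-host> '<unicast-hosts>'
set -euo pipefail

CLUSTER_NAME="$1"
NODE_HOST="$2"
UNICAST_HOSTS="$3"    # e.g. '["10.122.234.19:9300","153.32.228.250:9300"]'
CONFIG=/etc/elasticsearch/elasticsearch.yml    # assumed install location

ssh "root@${NODE_HOST}" "
  sed -i 's/^#\\{0,1\\}cluster\\.name:.*/cluster.name: ${CLUSTER_NAME}/' ${CONFIG}
  echo 'discovery.zen.ping.multicast.enabled: false' >> ${CONFIG}
  echo 'discovery.zen.ping.unicast.hosts: ${UNICAST_HOSTS}' >> ${CONFIG}
  systemctl restart elasticsearch
"

The restart at the end is needed so the node re-reads elasticsearch.yml and attempts to join the cluster.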

Related

How do I connect to an Elasticsearch server from a remote computer?

Every guide or post about this topic says to just set network.host: 0 in the elasticsearch.yml file. However, I tried that, along with other troubleshooting methods, and nothing seems to work. I'm starting to think maybe the configuration is right, but I am not connecting to it the right way?
This is what my yml file looks like:
discovery.seed_hosts: []
network.publish_host: xx.xxx.xxx.51
network.host: 0.0.0.0
The Elasticsearch server is hosted on an Azure virtual machine. When I try to connect to it via curl from my local machine, I get a failed-to-connect timeout error.
curl http://xx.xxx.xxx.51:9200
The issue was with the network settings, which were blocking all incoming traffic; once incoming traffic was allowed on port 9200 (the default Elasticsearch port), the issue was resolved.
For reference, you only need the network.host: 0.0.0.0 setting to make sure Elasticsearch isn't binding to the loopback address. This by default triggers the production bootstrap checks, which can be avoided if you are just running a single node by setting discovery.type: single-node; this helps when troubleshooting such issues.
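Putting that together, a minimal sketch of an elasticsearch.yml for a single node that should be reachable remotely (this also assumes port 9200 has been opened in the VM's Azure network security group):
network.host: 0.0.0.0
discovery.type: single-node
Then, from the local machine:
curl http://xx.xxx.xxx.51:9200
should return the usual JSON banner with the node and cluster details.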

How can I connect to my Elasticsearch cluster from another machine?

I want to connect to my Elasticsearch cluster from another machine. I went through some documentation which mentioned that I had to change network.bind_host: 0, but I didn't find network.bind_host in my elasticsearch.yml; I only have network.host. I tried setting network.host: 0, but I still can't connect from another machine. I also tried removing the ## before network.host: 0, which throws an error when starting the Elasticsearch cluster.
When connecting from another machine I have to use http://clustermachingip:9200, right?
Can anyone please help with this problem?
Thanks.
When you want to connect to an Elasticsearch instance on another machine, yes, the address is http://clustermachingip:9200. Can you try setting network.bind_host: clustermachingip?
If this doesn't work, you might want to check connectivity to the machine you are trying to reach, using something like the ping command.
ping clustermachingip
EDIT:
You can just start Elasticsearch on one machine and try one of the following curl commands from the other machine.
curl 'clustermachingip:9200/_cat/nodes?v'
curl 'clustermachingip:9200/_cat/health?v'
EDIT2: Clearing up the confusion between network.host and network.bind_host
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/modules-network.html#advanced-network-settings
The network.host setting explained in Commonly used network settings is a shortcut which sets the bind host and the publish host at the same time. In advanced use cases, such as when running behind a proxy server, you may need to set these settings to different values:

network.bind_host
This specifies which network interface(s) a node should bind to in order to listen for incoming requests. A node can bind to multiple interfaces, e.g. two network cards, or a site-local address and a local address. Defaults to network.host.

network.publish_host
The publish host is the single interface that the node advertises to other nodes in the cluster, so that those nodes can connect to it. Currently an elasticsearch node may be bound to multiple addresses, but only publishes one. If not specified, this defaults to the "best" address from network.host, sorted by IPv4/IPv6 stack preference, then by reachability.
Set network.host in your elasticsearch.yml to 0.0.0.0, i.e. the node will listen on all available network interfaces.
network.host: 0.0.0.0
Check your connectivity to the host machine on the port (if you haven't changed it, it will be 9200).
If you still cannot connect to the host machine, I suggest checking your iptables rules and allowing connections to port 9200, for example as shown below.
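A minimal sketch of the iptables rules involved, assuming the default ports; in practice you should restrict the source addresses to your own network rather than accepting connections from everywhere:
sudo iptables -A INPUT -p tcp --dport 9200 -j ACCEPT   # HTTP API (clients, curl)
sudo iptables -A INPUT -p tcp --dport 9300 -j ACCEPT   # transport port (node-to-node traffic)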

Why do we need to set a publish address [network.host] value?

It looks like Elasticsearch is not discoverable without setting the box's IP address in the network.host property.
Why can't it just bind to the box's IP address (as application servers for REST apps do)?
Why is there even a provision to bind to a particular IP address?
The key property that matters is network.publish_host, which you configure indirectly via network.host. The publish host is the address that a node advertises to other nodes as the address on which it can be reached when it joins the cluster. So it needs to be something that is reachable from the other nodes: 127.0.0.1 would not work for this, and likewise a load-balanced address won't work either.
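To illustrate the difference, a sketch of an elasticsearch.yml that sets the two separately (the addresses here are made up; substitute your own interfaces):
network.bind_host: 0.0.0.0           # accept requests on all interfaces
network.publish_host: 192.168.1.10   # the single address advertised to other nodes
With only network.host set, both values are derived from it.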
Also see documentation for these properties
Many servers have multiple network interfaces, and a common problem before this change was Elasticsearch picking the wrong one for the publish host and then failing to form a cluster because the nodes ended up advertising the wrong address to each other. Since Elasticsearch cannot know the right interface, you have to tell it.
This change has been introduced in 2.0 as explained in the breaking changes > network changes documentation:
This change prevents Elasticsearch from trying to connect to other nodes on your network unless you specifically tell it to do so. When moving to production you should configure the network.host parameter.
The ES folks also released a blog article back then to explain the underlying reasons for this change, i.e. mainly to prevent your node from accidentally binding to another cluster available on the network.
To run a single node on a local network, I added these to my elasticsearch.yml (un-commenting or commenting lines as needed):

http.port: 9201
http.bind_host: 192.168.1.172 #works

or

http.port: 9201
http.publish_host: 192.168.1.172 #by itself does not work
http.host: 192.168.1.172 #works alone

Autocreate cluster in Elasticsearch

What parameter needs to be set so that Elasticsearch does not automatically form a cluster?
And what happens if a cluster is created automatically?
Elasticsearch creates the cluster based on the nodes it discovers. For node discovery we can use multicast or unicast.
In a development environment we generally set multicast discovery to true, so that nodes are discoverable automatically. In production you should use unicast, specify the hosts as shown below, and disable multicast:
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
The cluster name is configured via the cluster.name setting, which is commented out by default:
#cluster.name: elasticsearch
All these settings are in the config folder of Elasticsearch, in the elasticsearch.yml file.
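To check which cluster a node actually joined, you can query the standard cluster health endpoint on the node itself:
curl 'http://localhost:9200/_cluster/health?pretty'
The response includes cluster_name and number_of_nodes, which makes an accidentally auto-formed cluster easy to spot.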

Elastic Search Clustering in the Cloud

I have two Linux VMs (both in the same datacenter of my cloud provider): Elastic1 and Elastic2, where Elastic2 is a clone of Elastic1. Both have the same CentOS version, the same cluster name, and the same ES version; again, Elastic2 is a clone.
I use the service wrapper to start them both automatically at boot, and added each other's IP to their respective iptables files, so now I can successfully ping between the nodes.
I thought this would be enough to allow ES to form a cluster, but to no avail.
Elastic1 and Elastic2 have one index each, named e1 and e2 respectively. Each index has one shard with no replicas.
I can use the head and paramedic plugins on each server successfully, and use curl -XGET 'http://localhost:9200/_cluster/nodes?pretty=true' to validate that the cluster name is the same and that each server lists only one node.
Is there anything glaring that explains why these nodes aren't talking? I've restarted the ES service and rebooted both servers to no avail. Could cloning be the problem?
In your elasticsearch.yml:
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ['host1:9300', 'host2:9300']
So, just list your node IPs with the transport port (the default is 9300) under the unicast hosts. Multicast is enabled by default, but is generally impossible in cloud environments without external plugins.
Also, make sure to check your IP rules / security groups! That's easy to forget.
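Once both VMs have the unicast settings above and have been restarted, you can confirm that the cluster has formed from either node (assuming the default HTTP port):
curl 'http://localhost:9200/_cat/nodes?v'
Both Elastic1 and Elastic2 should now appear in the output.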
