Is it safe to allow every node to join an Elasticsearch cluster? - elasticsearch

Elasticsearch opens port 9300 for node-to-node communication, and every machine on the same network with the same cluster.name can automatically join the cluster.
Is it safe to allow every node to join?
If not, do I need to set network.host to a fixed IP address? Or is there a better way?

It really depends on the networking stack of your nodes and how you interact with your cluster. If they are all running on a local network that is inaccessible from the outside, then letting other nodes join freely is generally OK, since any node trying to join must already be inside your network.
However, if your nodes have a public IP address, it's a good idea to change the default ports used, disable Zen multicast discovery, and give each node a list of the other nodes that are allowed to communicate with it.
Straight from the elasticsearch.yml file:
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
# to perform discovery when new nodes (master or data) are started:
#
discovery.zen.ping.unicast.hosts: ["enter_ip_here","enter_other_ip:port","etc..."]
Note that these settings need to be the same on all nodes (except for the list of hosts, obviously), and each node must be restarted for them to take effect.
Also, you can indeed set network.host to a fixed IP. This IP should be the one that appears for this node in the other nodes' discovery.zen.ping.unicast.hosts lists.
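As a rough illustration of "same settings everywhere, except for the list of hosts", here is a minimal Python sketch that derives the discovery settings for each node from one shared list of IPs. The function names, the example IPs, and the choice to leave a node out of its own list are assumptions made for this example; they are not part of Elasticsearch.

```python
def discovery_settings(node_ips, this_ip, transport_port=9300):
    """Build the discovery-related elasticsearch.yml settings for one node.

    node_ips is the full list of cluster IPs (hypothetical values),
    this_ip is the node the settings are generated for.
    """
    if this_ip not in node_ips:
        raise ValueError(f"{this_ip} is not in the cluster's node list")
    # Each node is given the addresses of the other nodes it may contact.
    peers = [f"{ip}:{transport_port}" for ip in node_ips if ip != this_ip]
    return {
        "network.host": this_ip,
        "discovery.zen.ping.multicast.enabled": False,
        "discovery.zen.ping.unicast.hosts": peers,
    }


def to_yaml_lines(settings):
    """Render the flat settings dict as elasticsearch.yml lines."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, list):
            rendered = "[" + ", ".join(f'"{v}"' for v in value) + "]"
        else:
            rendered = str(value)
        lines.append(f"{key}: {rendered}")
    return lines


if __name__ == "__main__":
    ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # example addresses only
    for line in to_yaml_lines(discovery_settings(ips, "10.0.0.2")):
        print(line)
```

Generating the file from one list keeps the per-node differences limited to the two lines that are supposed to differ.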

Related

How do you connect to multiple ElasticSearch hosts using Elastisch?

Currently I connect to an ElasticSearch cluster as follows:
(esr/connect "localhost:9200")
But I am concerned about availability so plan to run an ElasticSearch cluster.
How do I modify my Elastisch code to connect to a cluster (so that if a node is unavailable I can fall back to another node)? Does it do this by default? The ElasticSearch Java REST client seems to offer this functionality; does Elastisch?
You can set up a cluster with multiple hosts. This is configured in the elasticsearch.yml configuration file, like:
.....
.....
discovery.zen.ping.unicast.hosts: ['192.168.10.1:9300', '192.168.10.2:9300']
You can also control which nodes are eligible to become master and which nodes store data:
# Allow this node to be eligible as a master node (enabled by default):
#
node.master: true
#
# Allow this node to store data (enabled by default):
#
node.data: true
You can read more about this at the links below:
Zen discovery in a clustered environment
Important configuration for Elasticsearch
One of the benefits of using a service like elasticsearch is that it takes care of the availability part of the equation, in that ES itself will handle nodes going down. You do have to configure it intelligently, which is outside the scope of this question/answer.
The connect function here does not actually connect; it basically just creates a URI and options, and when you call a function like clojurewerkz.elastisch.rest.document/search, you give it the connection data, which is then used in an actual network operation.
So, you can call esr/connect as often as you like on as many URLs as you like, but you don't need to. I recommend reading elasticsearch's documentation to get familiar with the architecture, about nodes, clusters, indexes, shards, etc. -- and configure your elasticsearch cluster properly. But as far as the code itself goes, you are insulated from the architecture and need not worry about these details. This is true of elasticsearch's REST API, and thus the elastisch wrapper also provides this.
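If you do want client-side fallback, the idea can be sketched independently of Elastisch. The following is a minimal Python sketch, not Elastisch's API: get_with_fallback, http_get, and the node URLs are hypothetical names chosen for this example. It tries each node's REST endpoint in order and returns the first response it gets.

```python
import urllib.request


def http_get(url, timeout=2.0):
    """Fetch a URL and return the body as text (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def get_with_fallback(base_urls, path, fetch=http_get):
    """Try each node in order and return the first successful response.

    base_urls: node addresses such as "http://node1:9200" (hypothetical).
    fetch: injectable so the logic can be exercised without a network.
    """
    last_error = None
    for base in base_urls:
        try:
            return fetch(base.rstrip("/") + path)
        except OSError as err:
            # urllib's URLError and socket timeouts are OSError subclasses,
            # so an unreachable node lands here and the next one is tried.
            last_error = err
    raise ConnectionError(f"no node reachable: {last_error}")


if __name__ == "__main__":
    nodes = ["http://localhost:9200", "http://localhost:9201"]
    try:
        print(get_with_fallback(nodes, "/_cluster/health"))
    except ConnectionError as err:
        print(err)
```

Note that HTTP error statuses also surface as OSError subclasses in urllib, so this sketch moves on to the next node for those as well; a real client would probably want to distinguish "node down" from "request rejected".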

ElasticSearch Clusters Setting

Does anyone know how to tell Elasticsearch to stop node-to-node communication and then restart it? In my system I would like to tell it to stop until a certain condition is met, and then restart the communication (synchronize data).
By node to node communications, do you mean data synchronization and shard relocations?
If yes, you can do it by setting cluster.routing.allocation.enable to none using the cluster settings API.
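A minimal Python sketch of what that request could look like is below. It only builds the method, path, and JSON body for the cluster settings API; the helper name allocation_request and the choice of a transient setting by default are assumptions for this example, and sending the request is left to whatever HTTP client you use.

```python
import json

# Values accepted by cluster.routing.allocation.enable.
VALID_MODES = ("all", "primaries", "new_primaries", "none")


def allocation_request(mode, persistent=False):
    """Return (method, path, body) for a cluster settings update.

    mode="none" pauses shard allocation; mode="all" turns it back on.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown allocation mode: {mode!r}")
    scope = "persistent" if persistent else "transient"
    body = {scope: {"cluster.routing.allocation.enable": mode}}
    return "PUT", "/_cluster/settings", json.dumps(body)


if __name__ == "__main__":
    # Pause allocation, then (once your condition is met) resume it.
    print(allocation_request("none"))
    print(allocation_request("all"))
```

This pauses shard allocation only; the nodes still talk to each other for everything else.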
If you don't mean data synchronization, you can achieve this by blocking port 9300 (or whichever port ES is using for internal communication).
Please note that any node leaving the cluster will cause Elasticsearch to rebalance the shards and replicas. The overall cluster load increases whenever a node is lost, since the cluster has to satisfy the shard and replica settings by copying existing data to the remaining nodes. Therefore, if this happens often, considerable extra space will be consumed by the additional shard and replica copies.
If you fully understand the impact, you can try shard allocation filtering. For example, to exclude the host with IP 10.0.0.1 from the cluster:
PUT _cluster/settings
{
"transient" : {
"cluster.routing.allocation.exclude._ip" : "10.0.0.1"
}
}
Besides the IP, you can also use the node name or host name to exclude a node.
You can find the full documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-filtering.html

How to setup Elasticsearch client nodes?

I have a couple of Elasticsearch questions regarding client nodes:
1. Can I say that any node with an open HTTP port can be treated as a "client" node, because we can search/index through it?
2. As I understand it, a node is a client node when master=false and data=false. If I set up 10 client nodes, do I need to do the routing on my client side? That is, if I specify clientOne:9200 in my code as the entry point to ES, will clientOne forward HTTP requests to the other client nodes? Otherwise clientOne would be under very high pressure. In other words, do client nodes communicate with each other?
3. When I specify client nodes in the ES cluster, should I close the other nodes' HTTP ports, since we would only query the client nodes?
4. Do you think it's necessary to set up both a data node and a client node on the same machine, or should I just let the data node act as a client node as well, since it's the same machine anyway?
5. If the ES cluster is indexed heavily and frequently but searched rarely, then I don't need to set up a client node, because client nodes are mainly good for gathering data, right?
6. For general search/index purposes, should I use the HTTP port or the TCP port? What's the difference from the client's perspective?
1. Yes, you can send queries via HTTP to any node that has port 9200 open.
2. With node.data: false and node.master: false, you get a "client node". These are useful for offloading indexing and search traffic from your data nodes. If you have 10 of them, you would want to put a load balancer in front of them.
3. Closing the data nodes' HTTP port (http.enabled: false) would keep them from serving client requests (probably good), though it would also prevent you from curl'ing them directly for stats, etc.
4. Client nodes are useful (see #2), so I wouldn't route traffic directly to your data nodes. Whether you run both a client and a data node on the same piece of hardware depends on the config of that machine (do you have sufficient RAM, etc.).
5. Client nodes are also useful for indexing, because they know which data node should receive the data for storage. If you sent an indexing request to a random data node instead, the odds would be high that it would have to redirect that request to another node. That's a waste of time and resources if you can create client nodes.
6. Having your clients join the cluster might give them access to more information about the cluster, but using HTTP gives them a more generic "black box" interface. With HTTP, you also don't have to keep your clients at the same version as your ES nodes.
Hope that helps.
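If a dedicated load balancer in front of the client nodes is not an option, the same effect can be approximated in application code. Below is a minimal Python sketch of round-robin selection over client-node URLs; the RoundRobin class and the node names are made up for this example, and it does no health checking.

```python
import itertools
import threading


class RoundRobin:
    """Hand out client-node URLs in rotation (thread-safe)."""

    def __init__(self, urls):
        urls = list(urls)
        if not urls:
            raise ValueError("at least one client node URL is required")
        self._cycle = itertools.cycle(urls)
        self._lock = threading.Lock()

    def next_url(self):
        # The lock keeps concurrent callers from advancing the
        # iterator at the same time.
        with self._lock:
            return next(self._cycle)


if __name__ == "__main__":
    balancer = RoundRobin([
        "http://clientOne:9200",   # hypothetical client nodes
        "http://clientTwo:9200",
        "http://clientThree:9200",
    ])
    for _ in range(4):
        print(balancer.next_url())
```

Each request then goes to balancer.next_url() instead of a fixed clientOne:9200, which spreads the HTTP load across the client nodes.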

How to handle url change when a node dies?

I am new to Elasticsearch. I have a cluster with 3 nodes on the same machine. Each node has a separate URL because the port differs (localhost:9200, localhost:9201, localhost:9202).
My question is this: suppose node 1 (i.e. the master node) dies. Elasticsearch handles the situation very well and makes node 2 the master, but how does my application know that a node died and that it now needs to hit node 2 on port 9201?
Is there a way to always hit a single URL and have it figure out internally which node to use?
Thanks,
Pratz
The client finds nodes through the discovery module. The cluster name in your client's configuration is important to get this working.
With a correct configuration (on client and cluster) you can bring a single node down without any (negative) effect on your client.
See the following links:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html

transport_address not matching unicast host values in elasticsearch

Hi Elastic Search users,
We are seeing a rather strange issue. I have reviewed the email archives and I do not see this issue addressed already. We have discovery.zen.ping.multicast.enabled: false in our elasticsearch.yml. The cluster comes up and the state is green, and the nodes are aware of each other.
The strangeness is that the transport_address value returned in the cluster state query does not match the values in discovery.zen.ping.unicast.hosts - it is using a different interface on the machines.
Does anyone have any insight?
Values in discovery.zen.ping.unicast.hosts are used only for initial discovery. In other words, a node uses them to find other nodes in the cluster. The interface that a particular node binds to, or publishes for other nodes to use, doesn't depend on discovery.zen.ping.unicast.hosts; it is controlled by the network.host, network.bind_host and network.publish_host settings. See the network section of the Elasticsearch guide for more information.
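To make the distinction concrete, here is a minimal Python sketch that picks the network settings for a node and checks whether the published address is one of the entries in the unicast host list. The helper names and addresses are hypothetical, and the host/port split assumes IPv4 addresses or hostnames (not IPv6 literals).

```python
def network_settings(bind_host, publish_host=None):
    """Choose the network.* settings for one node.

    network.host sets both the bind and publish address; the two
    separate settings are only needed when they differ.
    """
    if publish_host is None or publish_host == bind_host:
        return {"network.host": bind_host}
    return {
        "network.bind_host": bind_host,
        "network.publish_host": publish_host,
    }


def host_part(entry):
    """Strip an optional ':port' suffix (IPv4/hostname entries only)."""
    return entry.rsplit(":", 1)[0] if ":" in entry else entry


def is_listed(publish_host, unicast_hosts):
    """True if the published address appears in the unicast host list."""
    return publish_host in {host_part(h) for h in unicast_hosts}


if __name__ == "__main__":
    unicast = ["10.0.0.4:9300", "10.0.0.5:9300"]  # example values
    print(network_settings("0.0.0.0", "10.0.0.5"))
    print(is_listed("10.0.0.5", unicast))
    print(is_listed("192.168.1.5", unicast))
```

A check like is_listed returning False for a node's published address would match the symptom described in the question: the node was found through one interface but advertises another.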

Resources