How do you connect to multiple ElasticSearch hosts using Elastisch?

Currently I connect to an ElasticSearch cluster as follows:
(esr/connect "localhost:9200")
But I am concerned about availability so plan to run an ElasticSearch cluster.
How do I modify my Elastisch code to connect to a cluster (so that if a node is unavailable I can fall back to another node)? Does it do this by default? The ElasticSearch Java REST client seems to offer this functionality; does Elastisch?

You can set up a cluster with multiple hosts; this can be configured in the elasticsearch.yml configuration file like:
.....
.....
discovery.zen.ping.unicast.hosts: ['192.168.10.1:9300', '192.168.10.2:9300']
You can also control whether a node is master-eligible and/or holds data, as shown in the sketch after these settings:
# Allow this node to be eligible as a master node (enabled by default):
#
node.master: true
#
# Allow this node to store data (enabled by default):
#
node.data: true
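For instance, a hedged sketch of splitting the roles across two nodes, using the same settings as above (one dedicated master-eligible node and one data-only node):
# elasticsearch.yml on the dedicated master-eligible node
node.master: true
node.data: false
#
# elasticsearch.yml on the data-only node
node.master: false
node.data: true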
You can explore more about this via the links below:
Zen discovery in a clustered environment
Important configuration for Elasticsearch

One of the benefits of using a service like elasticsearch is that it takes care of the availability part of the equation, in that ES itself will handle nodes going down. You do have to configure it intelligently, which is outside the scope of this question/answer.
The connect function here does not actually connect; it basically just creates a URI and options, and when you call a function like clojurewerkz.elastisch.rest.document/search, you give it the connection data, which is then used in an actual network operation.
So, you can call esr/connect as often as you like on as many URLs as you like, but you don't need to. I recommend reading elasticsearch's documentation to get familiar with the architecture, about nodes, clusters, indexes, shards, etc. -- and configure your elasticsearch cluster properly. But as far as the code itself goes, you are insulated from the architecture and need not worry about these details. This is true of elasticsearch's REST API, and thus the elastisch wrapper also provides this.
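For illustration, a minimal sketch using Elastisch's REST API (the index name logs and mapping type event are hypothetical):
(require '[clojurewerkz.elastisch.rest :as esr]
         '[clojurewerkz.elastisch.rest.document :as doc])

;; connect does not open a socket; it just builds connection data (a URI plus options)
(def conn (esr/connect "http://localhost:9200"))

;; each operation takes the connection data and performs the actual HTTP request
(doc/search conn "logs" "event" :query {:match_all {}})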

Related

Elasticsearch - two out of three nodes instant shutdown case

We have a small Elasticsearch cluster of 3 nodes: two in one datacenter and one in another for disaster recovery reasons. However, if the first two nodes fail simultaneously, the third one won't work either - it will just throw "master not discovered or elected yet".
I understand that this is intended - this is how an Elasticsearch cluster should work. But is there some additional configuration that I don't know about that would keep the third node working on its own, even if only in read-only mode?
Nope, there's not. As you mentioned, it's designed that way.
You're probably not doing yourselves a lot of favours by running things across datacentres like that; network issues are not kind to Elasticsearch due to its distributed nature.
Elasticsearch runs in distributed mode by default. Nodes assume that they are, or will be, part of a cluster, and during setup nodes try to automatically join the cluster.
If you want your Elasticsearch to be available as a single node, without the need to communicate with other Elasticsearch nodes, it can work like a standalone server. To do this, tell Elasticsearch to work in local-only mode (disable the network):
open your elasticsearch/config/elasticsearch.yml and set:
node.local: true
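Note that node.local comes from older Elasticsearch releases (1.x/2.x); on recent versions (7.x and later) the equivalent standalone setup is, as a sketch:
# elasticsearch.yml on Elasticsearch 7.x and later
discovery.type: single-node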

How can I set up multiple elastic search instances on one server with different data?

We use Elasticsearch and Kibana at my company. I want to create a second Elasticsearch instance running on the same server but in a different JVM - let's call them A and B. I would like A to have an index called other_logs and B to have an index called batch. I want to be able to search both of them via a single Kibana instance and set up dashboards that can read either index on either JVM. Data written to A should not be written to B and vice versa.
The reason is we have some batch jobs which depend on ES, and ES has been a bit unstable, causing batch job failures. The batch reads/writes very little data to ES, but the rest of the app writes a ton of logs and is causing the instability. If we can't read logs it's a minor issue, but if the batch fails it's a major issue. Hence, as a short-term fix while we look at the ES instability, I would like to move the batch dependencies to a new JVM (ES instance B), which should be small and more stable.
I assume I need the second ES instance to run with a different cluster name, otherwise the data will get replicated. When testing this I am seeing a few exceptions, so I'm not sure if I'm going in the right direction. I'm looking at "cross-cluster search", which looks like it might allow me to keep one Kibana and search both clusters, but I have zero experience with ES or Kibana and not much time to research this.
Any suggestions on how I can accomplish the configuration? Am I on the right path?
I think I proved everything out, at least on my own local test machine. Essentially what I did was create a second cluster which can run on the same machine and has independent configuration files. By changing the folder I am also able to set independent jvm.options, since I want less memory for the new cluster. Once that was working, I configured the single Kibana instance to know about the new cluster and then created an index pattern so I could search it. Cross-cluster searching is discussed here, and you can refer to the new 'remote' cluster directly in searches:
https://www.elastic.co/guide/en/elasticsearch/reference/6.6/modules-remote-clusters.html
Port 9300 is the default port all the nodes in a cluster use to talk to each other, so I changed the new cluster to use 9301. With the default it was scanning 9300 first, throwing an exception, and then scanning 9301. So it was working without hardcoding 9301, but I don't like seeing the exceptions in the logs and I wanted to control which port is used.
For posterity sake here are the details:
1). Copied the config folder under Elasticsearch to configB and edited elasticsearch.yml:
cluster.name: ClusterB
path.data: dataB
path.logs: logsB
http.port: 9201
transport.port: 9301
2). Since I was testing on Windows, I copied elasticsearch.bat to elasticsearchB.bat and added this at the top (on Linux the config directory is passed differently, e.g. by exporting the ES_PATH_CONF environment variable before launching bin/elasticsearch). This allows the new batch file to use its own config directory while all other folders for ES remain the same (so upgrading ES will upgrade both instances):
SET ES_PATH_CONF=..\configB
3). Started both instances of Elasticsearch with elasticsearch.bat and elasticsearchB.bat
4). Started a single instance of Kibana, which points at 9200 by default
5). In Kibana, modified the cluster settings by running this in the dev tools:
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "ClusterB": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        }
      }
    }
  }
}
6). Created data in the new ES cluster (PUT /batch/_doc/1 {...} )
7). In Kibana, created a new index pattern that refers to the remote cluster and index like this: ClusterB:batch
8). Created a dashboard using the new remote index pattern
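To sanity-check the remote connection, a cross-cluster search can be run from the same dev tools console by prefixing the index with the remote cluster name (a sketch, reusing the batch index created above):
GET ClusterB:batch/_search
{
  "query": {
    "match_all": {}
  }
}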

How can I access several Elasticsearch instances on different machines from Kibana?

I would like to have two Elasticsearch instances on different machines accessible from the same Kibana instance.
Do you know how I could do it?
My first idea is to create a cluster with two nodes. How could I create a cluster with nodes on different machines?
Which parameters should I change in the Elasticsearch config file?
ElasticSearch contains a discovery module:
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html
By default, multicast discovery is used. This means ES will search your network for any other ES instances (in common terms). You can read more about the supported discovery types in the article above.
You can also manually specify the hosts that should be in the cluster:
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ "host1:9300", "host2:9300" ... ]
You have to define:
discovery.zen.ping.unicast.hosts: ["192.168.45.21", "192.168.45.22"]
An example is described here
Detailed configuration info should be here
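Putting it together, a minimal sketch of the relevant settings on each machine, using the older zen-discovery setting names from the answers above (IPs are placeholders; set network.host to each machine's own address):
# elasticsearch.yml on both machines
cluster.name: my-cluster                 # must be identical on every node
network.host: 192.168.45.21              # this node's own IP (192.168.45.22 on the other machine)
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.45.21", "192.168.45.22"]
Kibana then only needs to point at one of the nodes; the cluster routes searches to wherever the data lives.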

Is it safe to allow every node to join an ElasticSearch cluster?

ElasticSearch opens port 9300 for node-to-node communication, and every machine on the same network with the same cluster.name can auto-join the cluster.
I doubt whether it is safe to allow every node to join.
If not, do I need to set network.host to a fixed IP address? Or is there a better way?
It really depends on the networking stack of your nodes and how you interact with your cluster. If they are all running on a local network, inaccessible from the outside, then in general, allowing other nodes to join freely is OK, since it means someone from inside your network is trying to join.
However, if your nodes have a public IP address, it's a good idea to change the default ports used, disable Zen multicast discovery, and give each node a list of the other nodes that are allowed to communicate with it.
Straight from the elasticsearch.yml file:
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
# to perform discovery when new nodes (master or data) are started:
#
discovery.zen.ping.unicast.hosts: ["enter_ip_here","enter_other_ip:port","etc..."]
Note that these settings need to be the same on all nodes (except for the list of hosts, obviously) and a restart of the node is required for them to be taken into account.
Also, you can indeed set the network.host to a fixed IP. This IP should be the one appearing in the list of discovery.zen.ping.unicast.hosts.
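As an illustration, a hedged sketch of locking a node down to a fixed address and a non-default port (the addresses and port are placeholders, using the same pre-5.x setting names as above):
# elasticsearch.yml
network.host: 192.168.1.10                # fixed IP this node binds to and publishes
transport.tcp.port: 9301                  # non-default node-to-node port
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.1.10:9301", "192.168.1.11:9301"]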

transport_address not matching unicast host values in elasticsearch

Hi Elastic Search users,
We are seeing a rather strange issue. I have reviewed the email archives and I do not see this issue addressed already. We have discovery.zen.ping.multicast.enabled: false in our elasticsearch.yml. The cluster comes up and the state is green, and the nodes are aware of each other.
The strangeness is that the transport_address value returned in the cluster state query does not match the values in discovery.zen.ping.unicast.hosts - it is using a different interface on the machines.
Does anyone have any insight?
Values in discovery.zen.ping.unicast.hosts are used only for initial discovery; in other words, they are used by a node to find other nodes in the cluster. The interface that a particular node binds to, or publishes for other nodes to use, doesn't depend on discovery.zen.ping.unicast.hosts; instead it is controlled by the network.host, network.bind_host and network.publish_host settings. See the network section of the Elasticsearch guide for more information.
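For example, to make the advertised transport_address match a specific interface, the node can bind broadly but publish one address (a sketch; the addresses are placeholders):
# elasticsearch.yml
network.bind_host: 0.0.0.0            # interface(s) the node listens on
network.publish_host: 192.168.10.5    # address other nodes (and the cluster state) will see
With network.publish_host set to the same interface listed in discovery.zen.ping.unicast.hosts, the transport_address reported in the cluster state should line up with the unicast host list.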
