Which Elasticsearch node should I query from my application - elasticsearch

If I set up my Elastic cluster with 3 master nodes and 5 to 10 data nodes, which node IP address should my application actually use to query Elasticsearch? I am following the hot-warm architecture, and from what I have understood, the master node should always be responsible for handling an incoming request, coordinating that request to the other nodes in the cluster, and assembling the final response.
So should I only use the master nodes' IP addresses in my application to talk to the cluster?

First of all, you shouldn't connect to a cluster through an individual node IP, because that node becomes a single point of failure: if it goes down, your application loses access to the cluster. You should instead have a load-balanced URL that points to the data nodes or coordinating-only nodes that serve your searches.
Also, it looks like you have dedicated master nodes. For larger clusters it's typically not recommended to use the masters as search coordinators; they should ideally stay in a master-eligible-only role to keep the cluster stable. That leaves you with either the data nodes or coordinating-only nodes to accept your search requests.
If you are using a client such as Jest or NEST rather than calling the HTTP _search endpoint directly, you also have the option of providing a list of IPs/hostnames to form a connection pool.
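To make the connection-pool idea concrete, here is a minimal standard-library Java sketch of what such a pool does conceptually: rotate requests across the configured nodes (spreading the coordinating work) and skip nodes that have been marked as dead. The class NodePool and the es-data-* hostnames are hypothetical; the official clients implement this internally, with extra logic such as retrying dead nodes after a timeout, so prefer them in real code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical round-robin pool over data/coordinating node URLs.
// Sketch only: not thread-safe, no dead-node retry timeout.
class NodePool {
    private final List<String> nodes;
    private final Set<String> dead = new HashSet<>();
    private int counter = 0;

    NodePool(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node URL is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    // Returns the next node in round-robin order, skipping nodes marked dead.
    String nextNode() {
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            int idx = Math.floorMod(counter++, nodes.size()); // floorMod survives int overflow
            String node = nodes.get(idx);
            if (!dead.contains(node)) {
                return node;
            }
        }
        throw new IllegalStateException("all nodes are marked dead");
    }

    // Called by the application when a request to `node` fails (e.g. connection refused).
    void markDead(String node) { dead.add(node); }

    // Called when a node is known to be reachable again.
    void markAlive(String node) { dead.remove(node); }

    public static void main(String[] args) {
        NodePool pool = new NodePool(List.of(
                "http://es-data-1:9200", "http://es-data-2:9200", "http://es-data-3:9200"));
        pool.markDead("http://es-data-2:9200");
        for (int i = 0; i < 4; i++) {
            System.out.println(pool.nextNode()); // es-data-2 is skipped
        }
    }
}
```

Rotating through all nodes, instead of always hitting the first one, is what keeps a single node from becoming the permanent coordinator.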

As @askids mentioned, always connect to Elasticsearch using the standard clients; Elastic itself provides them:
https://www.elastic.co/guide/en/elasticsearch/client/index.html
You have not mentioned which client you are going to use. If your client is Java-based, use Elasticsearch's Low Level or High Level REST Client. These clients are wrappers around the Apache HTTP client and give you all the boilerplate logic for handling connections, among other features.
You can also add Sniffer support to them, so that the client discovers the nodes of the running cluster and updates its node list automatically:
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-low.html
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/sniffer.html
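A sniffer periodically asks the cluster which nodes it currently has and refreshes the client's host list. The official Sniffer reads the Nodes Info API (_nodes/http); the standard-library sketch below uses the simpler plain-text output of GET _cat/nodes?h=http_address instead, assuming each line is a plain host:port and the cluster is reached over HTTP without TLS. The class CatNodesSniff and the seed URL are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified "sniff": ask one known node for the HTTP addresses
// of all current cluster members and turn them into base URLs.
class CatNodesSniff {

    // Parses `_cat/nodes?h=http_address` output; assumes one "host:port" per line.
    static List<String> parseHttpAddresses(String catOutput) {
        List<String> urls = new ArrayList<>();
        for (String line : catOutput.split("\\R")) {
            String address = line.trim();
            if (!address.isEmpty()) {
                urls.add("http://" + address); // assumption: plain HTTP, no TLS
            }
        }
        return urls;
    }

    // Fetches the current node list through any reachable seed node.
    static List<String> sniff(HttpClient client, String seedUrl)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create(seedUrl + "/_cat/nodes?h=http_address")).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("unexpected status " + response.statusCode());
        }
        return parseHttpAddresses(response.body());
    }
}
```

In practice you would run sniff on a schedule and swap the result into the client's node list; the official Sniffer does this for you (its interval is configurable through setSniffIntervalMillis on the Sniffer builder).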

Related

Curl call for PUT and GET query on multi-node es cluster

I would like to know: on a multi-node Elasticsearch cluster (3 nodes), to which node can we send a curl call to fetch some results (by running a query)?
If we can use any node's IP, what is the best practice? For example, if I am using node 1's URL out of node 1, node 2 and node 3, and node 1 goes down, I have to manually update the query URL to node 2 or node 3. Is there a way to have one centralized URL that handles this by itself?
Do I have to do it manually using Nginx or a load balancer, or is there something in Elasticsearch itself?
In ES, if you send a request to any node that is part of a valid cluster, that node routes the request internally and returns the result.
But you shouldn't use a node IP directly to communicate with Elasticsearch, for obvious reasons, one of which you already mentioned. You can put a load balancer, Nginx, or DNS in front of your Elasticsearch cluster.
If you are accessing it programmatically, you don't need that either: when creating the Elasticsearch client, you can specify the IPs of all the nodes, so that even when some nodes are down your requests will not fail, as long as at least one listed node is reachable.
RestClientBuilder restClientBuilder = RestClient.builder(
        // List every node so the client can fail over when one of them goes down
        new HttpHost(esConfig.getHost(), esConfig.getPort()),
        new HttpHost(esConfig.getHost2(), esConfig.getPort2()));
As you can see, I created my Elasticsearch client (this works with Elasticsearch 8.5) with two Elasticsearch hosts.
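Listing several hosts helps because of client-side failover: if a request to one node fails, the client retries it on another. The official RestClient does this internally; the standard-library sketch below only illustrates the idea. Failover.firstSuccessful is a hypothetical helper, and the call function stands in for whatever actually sends the HTTP request.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of client-side failover across the configured nodes.
class Failover {

    // Tries `call` against each node in order and returns the first result that
    // does not throw. If every node fails, throws with the last error as the cause.
    static <T> T firstSuccessful(List<String> nodes, Function<String, T> call) {
        RuntimeException lastError = null;
        for (String node : nodes) {
            try {
                return call.apply(node);
            } catch (RuntimeException e) {
                lastError = e; // node unreachable or erroring: try the next one
            }
        }
        throw new IllegalStateException(
                "request failed on all " + nodes.size() + " nodes", lastError);
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("http://node-1:9200", "http://node-2:9200");
        // Simulate node-1 being down; a real `call` would send e.g. GET {node}/_search.
        String result = firstSuccessful(nodes, node -> {
            if (node.contains("node-1")) {
                throw new RuntimeException("connection refused");
            }
            return "answered by " + node;
        });
        System.out.println(result); // prints "answered by http://node-2:9200"
    }
}
```

Retrying on another node is harmless for searches; for writes, keep in mind that a request may already have reached a node before the connection failed.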

How to know total nodes in an elasticsearch cluster?

I have a 3-node Elasticsearch cluster. If one or more nodes go down, I can easily check them manually, but if the number of nodes in the cluster grows, checking them manually becomes difficult. So how can I get all the nodes (specifically, the node names) of the cluster, even the ones that are down?
To get the live/healthy nodes, I hit this API endpoint:
curl -X GET "hostname/ip:port/_cat/nodes?v&pretty"
Is there an endpoint I can use to get the total number of nodes and the unhealthy/down nodes in an Elasticsearch cluster?
I was trying to list all the nodes using discovery.seed_hosts from the elasticsearch.yml config file, but I don't know how to do that or whether it's the right approach.
I don't think there is any API that reports offline nodes. Whether the entire cluster or a single node is down, Elastic doesn't provide a way to check the health of nodes that aren't running. You need to rely on an external script, code, or monitoring tool that pings all your nodes and reports their status.
You can write a custom script that calls the API below, which returns all the nodes currently available in the cluster. Once you receive the response, extract the IP or hostname of each node; any expected node that does not appear in the response can be considered down.
GET _cat/nodes?format=json&filter_path=ip,name
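That script boils down to a set difference between the node names you expect and the names the cluster returns. In this Java sketch the expected names come from an inventory you maintain yourself (an assumption, e.g. a config file), and the live names come from the plain-text form GET _cat/nodes?h=name (one name per line), used here instead of JSON to avoid needing a JSON library. DownNodes and the node names are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper: which expected nodes are missing from the live node list?
class DownNodes {

    // `catNodesOutput` is the body of `GET _cat/nodes?h=name`, one node name per line.
    static Set<String> findDown(Set<String> expectedNames, String catNodesOutput) {
        Set<String> live = Arrays.stream(catNodesOutput.split("\\R"))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toSet());
        Set<String> down = new TreeSet<>(expectedNames); // sorted for readable reports
        down.removeAll(live);
        return down;
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("node-1", "node-2", "node-3");
        String catOutput = "node-1\nnode-3\n";
        System.out.println(findDown(expected, catOutput)); // prints [node-2]
    }
}
```

If the _cat/nodes call itself fails because no node is reachable, the script should treat every expected node as down, since the cluster cannot report on itself in that case.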
Another option is to enable cluster monitoring, which gives you the status of the entire cluster, but again it only shows information about running nodes.
Please check this answer for how Kibana shows offline nodes in Cluster Monitoring.

migrate indexes from old version of elasticsearch to elasticsearch 7.9

We want to upgrade Elasticsearch in our project from version 5.6 to 7.9.
I have to migrate our indexes and documents to the new version, but I can't use reindex, so I use the REST High Level Client to connect to Elasticsearch 7 and plain HTTP requests for Elasticsearch 5.
For the migration, I fetch the docs in batches from the old version using a match_all query with scroll, and index them into the new Elasticsearch with bulk requests.
Our old Elasticsearch cluster has 3 nodes. My question is: do I have to send requests to each node separately and process their docs, or, if I send the match_all search to one node, will Elasticsearch handle the rest? (I read something about coordinating nodes handling requests, and that every node is implicitly a coordinating node.) Or do I have to send the request to a data node?
Adding more details to @saeednasehi's answer: it looks like you are confused about how Elasticsearch and its queries work internally; please refer to my answer on how search queries work in Elasticsearch.
Apart from this, while it's true that you can get data by connecting to any node, you should list all the node IPs in your ES client (JHLRC or HTTP), so that the request-coordinating load is distributed among all the data nodes. If you give just one node IP, then that node always acts as the coordinating node, since there is no dedicated coordinating node by default.
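Because any node can coordinate a search, the scroll-based export from the question can be sent to whichever old-cluster node the client picked; that node fans the query out to all shards and merges the results. A minimal sketch that only builds the requests with java.net.http, assuming the 5.6 cluster is reachable over plain HTTP; the node URL, index name, and page size are placeholders, and extracting the scroll_id from each response (JSON parsing) is left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical request builders for exporting an index with the scroll API.
class ScrollRequests {

    // First page: match_all sorted by _doc (the cheapest order for scrolling),
    // keeping the scroll context alive for 1 minute between pages.
    static HttpRequest firstPage(String nodeUrl, String index, int pageSize) {
        String body = "{\"size\":" + pageSize
                + ",\"query\":{\"match_all\":{}},\"sort\":[\"_doc\"]}";
        return HttpRequest.newBuilder(URI.create(nodeUrl + "/" + index + "/_search?scroll=1m"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Following pages: pass the scroll_id returned by the previous response.
    static HttpRequest nextPage(String nodeUrl, String scrollId) {
        String body = "{\"scroll\":\"1m\",\"scroll_id\":\"" + scrollId + "\"}";
        return HttpRequest.newBuilder(URI.create(nodeUrl + "/_search/scroll"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

These requests can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); nodeUrl can come from whichever node the client's pool hands out.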
When you start an Elasticsearch cluster, you can treat the whole cluster as a single database: you can fetch from and insert into the whole cluster by sending your request to any one of its nodes.
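On the writing side of the migration in the question, each scroll page can be turned into a single _bulk request for the 7.9 cluster. The sketch below builds the newline-delimited (NDJSON) body by hand: one action line plus one source line per document, with a trailing newline, which the bulk API requires. It assumes each _source is already valid JSON and that the index name needs no escaping; BulkBody is a hypothetical helper, and no _type is written because mapping types are deprecated in 7.x.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: build a _bulk body (NDJSON) from one page of scroll hits.
class BulkBody {

    // `docs` maps document id -> raw _source JSON taken from the old cluster.
    static String build(String index, Map<String, String> docs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_id\":\"").append(escape(doc.getKey())).append("\"}}\n");
            sb.append(doc.getValue()).append('\n'); // every line, including the last, ends with \n
        }
        return sb.toString();
    }

    // Minimal JSON string escaping (backslash and double quote only).
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, String> page = new LinkedHashMap<>();
        page.put("1", "{\"title\":\"first\"}");
        page.put("2", "{\"title\":\"second\"}");
        System.out.print(build("products-v7", page));
    }
}
```

The body would be sent with POST {node}/_bulk and Content-Type: application/x-ndjson; check the errors flag in the response, since a bulk request can partially fail.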

Adding cluster to existing elastic search in elk

Currently I have an existing:
1. Elasticsearch
2. Logstash
3. Kibana
I have existing data in them.
Now I have set up an ELK cluster with 3 master nodes, 5 data nodes and 3 client nodes.
But I am not sure how I can get the existing data into it.
Is it possible to make the existing ES node a data node and attach it to the cluster? Will its data then get replicated to the other data nodes, so that I can take that node offline?
Option 1
How about just trying it with fewer nodes? It is not hard to test whether this is supported: set up one node, feed it some data, then add one more node, configure them as a cluster, and see whether the data gets synchronized.
Option 2
Another option is to use an Elasticsearch migration tool like https://github.com/taskrabbit/elasticsearch-dump: you could set up a clean cluster and migrate all the data from your old node into it.

Clustering in Elasticsearch

I have implemented clustering using Elasticsearch, and the ElasticHead UI displays the detected nodes.
However, I am not sure how it works. Could anyone provide a link or some direction on how clustering works in Elasticsearch?
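Older versions of Elasticsearch use multicast discovery to find other nodes with the same cluster name on the network.
If such a node exists, the new node connects to it and joins the cluster.
Data is then shared across the nodes according to the shard configuration.
http://www.elasticsearch.org/guide/reference/modules/discovery/zen.html
Read the above to get the full picture. Note that multicast discovery was removed in later versions; current versions discover nodes through unicast seed hosts configured with discovery.seed_hosts.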
