How to know total nodes in an elasticsearch cluster? - elasticsearch

I have a 3-node Elasticsearch cluster. If more than one node goes down, I can easily check them manually. But if the number of nodes in the cluster increases, it will become difficult to check them by hand. So, how can I get all the nodes (specifically the names of the nodes) of the cluster, even the ones that are down?
To get live/healthy nodes I hit the api endpoint:
curl -X GET "hostname/ip:port/_cat/nodes?v&pretty"
Is there any endpoint by using which I can get total nodes and unhealthy/down nodes in elasticsearch cluster?
I was trying to list all the nodes using the discovery.seed_hosts setting in the elasticsearch.yml config file, but I don't know how to do that, or whether it is even the right approach.

I don't think there is any API to know about offline nodes. If your entire cluster is down, or a single node is down, Elasticsearch doesn't provide any way to check that node's health. You need to depend on an external script, code, or monitoring tool that pings all your nodes and reports their status.
You can write a custom script that calls the API below; it returns all the nodes currently part of the cluster. Once you have the response, you can extract the IP or hostname of each node, and whichever expected nodes are missing from the response can be considered down.
GET _cat/nodes?format=json&filter_path=ip,name
Another option is to enable cluster monitoring, which will give you the status of the entire cluster, but again it only shows information about running nodes.
Please check this answer for how Kibana shows offline nodes in Cluster Monitoring.
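The approach above (diff the expected node list against the _cat/nodes response) can be sketched in a few lines of Python. The hostnames are hypothetical; in practice you would take the expected set from your own inventory or config, and fetch the response over HTTP:

```python
import json

# Names of all nodes that *should* be in the cluster. Hypothetical values;
# in practice, take these from your inventory or elasticsearch.yml.
EXPECTED_NODES = {"es-node-1", "es-node-2", "es-node-3"}

def find_down_nodes(cat_nodes_json: str, expected: set) -> set:
    """Return the expected node names missing from a _cat/nodes response.

    `cat_nodes_json` is the body returned by
    GET _cat/nodes?format=json&filter_path=ip,name
    which is a JSON array like [{"ip": "10.0.0.1", "name": "es-node-1"}, ...].
    """
    alive = {entry["name"] for entry in json.loads(cat_nodes_json)}
    return expected - alive

# Example: es-node-3 is missing from the response, so it is reported as down.
response = (
    '[{"ip": "10.0.0.1", "name": "es-node-1"},'
    ' {"ip": "10.0.0.2", "name": "es-node-2"}]'
)
print(find_down_nodes(response, EXPECTED_NODES))  # {'es-node-3'}
```

Run this from cron or your monitoring tool and alert on a non-empty result.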

Related

Curl call for PUT and GET query on multi-node es cluster

I would like to know, on a multi-node Elasticsearch cluster (3 nodes), which node we should send curl calls to in order to fetch results (by running queries)?
If we can use any node's IP, what is the best practice? For example, if I am using node 1's URL out of nodes 1, 2, and 3, and node 1 goes down, I have to manually change the query URL to node 2 or node 3. Is there a way to have one centralized URL that handles this by itself?
Do I have to do it manually using Nginx or a load balancer, or is there something in Elasticsearch itself?
In ES, if you send a request to any node that is part of a valid ES cluster, it will route the request internally and return you the result.
But you shouldn't use a node's IP directly to communicate with Elasticsearch, for obvious reasons, one of which you already mentioned. You can use a load balancer, Nginx, or DNS in front of your Elasticsearch cluster.
If you are accessing it programmatically, you don't even need that: when creating the Elasticsearch client, you can specify all the node IPs, so that even when some nodes are down your requests will not fail.
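As a sketch, a minimal Nginx reverse-proxy configuration for this could look like the following (the hostnames and listen port are placeholders for your own setup):

```nginx
upstream elasticsearch {
    # All three ES nodes; Nginx skips a node that stops responding.
    server es-node-1:9200;
    server es-node-2:9200;
    server es-node-3:9200;
}

server {
    listen 8080;
    location / {
        proxy_pass http://elasticsearch;
    }
}
```

Your application then always talks to the one proxy address instead of individual node IPs.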
// Build the low-level REST client with two hosts; requests fail over between them.
RestClientBuilder restClientBuilder = RestClient.builder(
        new HttpHost(esConfig.getHost(), esConfig.getPort()),
        new HttpHost(esConfig.getHost2(), esConfig.getPort2()));
As you can see, I created my Elasticsearch client (this works with Elasticsearch 8.5) with two Elasticsearch hosts.

Which Elasticsearch node should I query from my application?

If I were to set up my cluster on Elastic with 3 master nodes and 5 to 10 data nodes, which node IP addresses should I actually use in my application to query Elasticsearch? I am following the hot-warm architecture, but what I have understood is that the master node should always be responsible for handling an incoming request, coordinating that request to further nodes in the cluster, and assembling the final response.
So should I only use the master nodes' IP addresses in my application to talk to the cluster?
First of all, you shouldn't be using an individual IP to connect to the cluster, as that node can become a single point of failure if it goes down. You should have a load-balancing URL that connects to data nodes or coordinating nodes to serve your searches.
Also, it looks like you have dedicated master nodes. For larger clusters it is typically not recommended to use the masters as search coordinators; they should ideally have the master-eligible role only, to ensure cluster stability. That leaves you with the option of using either data nodes or coordinating-only nodes to accept your search requests.
If you are using clients like Jest or NEST rather than hitting the _search HTTP endpoint directly, you also have the option to provide a list of IPs/hostnames to form a connection pool.
Like @askids mentioned, always connect to Elasticsearch using the standard clients. Elastic itself provides them.
https://www.elastic.co/guide/en/elasticsearch/client/index.html
You have not mentioned which client you are going to use. If your client is Java-based, use Elasticsearch's Low-Level or High-Level REST Client. These clients are wrappers around the Apache HTTP client and provide you all the boilerplate logic for handling connections and other features.
You can also add Sniffer support to it, which discovers cluster nodes at runtime.
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-low.html
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/sniffer.html

Adding a cluster to existing Elasticsearch in ELK

Currently I have an existing:
1. Elasticsearch
2. Logstash
3. Kibana
I have existing data on them.
Now I have set up an ELK cluster with 3 master nodes, 5 data nodes, and 3 client nodes.
But I am not sure how I can get the existing data into them.
Is it possible to make the existing ES node a data node and attach it to the cluster? Will that data then get replicated to the other data nodes as well, so that I can take that node offline afterwards?
Option 1
How about just trying with fewer nodes? It is not hard to test whether this is supported: set up one node, feed in some data, add one more node, configure them as a cluster, and see if the data gets synchronized.
Option 2
Another option is to use an Elasticsearch migration tool like https://github.com/taskrabbit/elasticsearch-dump. Basically, you set up a clean cluster and migrate all the data from your old node into it.
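As a sketch, an elasticsearch-dump migration of one index could look like this (the host and index names are placeholders; copy the mapping before the documents):

```
# Hypothetical hosts and index name -- adjust to your environment.
elasticdump --input=http://old-node:9200/my_index \
            --output=http://new-cluster:9200/my_index --type=mapping
elasticdump --input=http://old-node:9200/my_index \
            --output=http://new-cluster:9200/my_index --type=data
```

Repeat per index, or script a loop over the output of _cat/indices.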

Remove of data folder is not synced in Elasticsearch upon index delete

We have an ES cluster with 2 nodes. When we delete an index, not all of its folders in the cluster (on the filesystem) are deleted, which causes problems when restarting one server.
Our deleted indices then get redistributed in some weird state, and the cluster health is no longer green.
Example: we delete the index named someIndex, and when we check the file system after deletion, we see this:
Node1
ElasticSearch\data\clustername\nodes\0\indices\
ElasticSearch\data\clustername\nodes\1\indices\
Node2
ElasticSearch\data\clustername\nodes\0\indices\
ElasticSearch\data\clustername\nodes\1\indices\someIndex (<-- still present)
Anyone know what's causing this?
ES version: 0.90.5
There are two node directories for each machine on your filesystem (these are nodes\0 and nodes\1).
When you start Elasticsearch, you start up a node (in ES lingo). Your machine can host multiple nodes, which happens if you start Elasticsearch multiple times. The default setting for the HTTP port is the range 9200-9300; that means ES looks for a free port in that range and binds the node to it (the same is true for the transport module with 9300-9400).
So, if you start an ES process while another is still running, i.e. still bound to a port, you start a second node, and ES creates a new data directory for it. Maybe this happened when you issued a restart and ES couldn't shut down in time before the new node started up.
But now you have a third node in your cluster, and ES will assign shards to it. Then you do a cluster restart or something similar, and you start one node on each of your machines. ES cannot find the shards that were assigned to the third node, because it isn't running, and it will show a red or yellow state, depending on which shards live on that third node. If you delete your index, you won't delete the data on this missing node.
If you don't care about the data, you can just shut down ES and delete these stray directories, or start two ES nodes on each of your machines and then delete the index again.
Then you could change the port setting to one specific port; that would prevent a second process from starting up, since it wouldn't be able to bind to a free port.
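For example, pinning both ports to a single value in elasticsearch.yml (0.90-era setting names) makes a second accidental process fail to start instead of silently becoming a new node:

```
# elasticsearch.yml -- bind to exactly one port instead of a range,
# so a second ES process on the same machine cannot start.
http.port: 9200
transport.tcp.port: 9300
```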

Clustering in Elasticsearch

I have implemented clustering using Elasticsearch. The ElasticHead UI displays the detected nodes.
However, I am not sure how it works. Could anyone please provide a link/direction that explains how clustering works with Elasticsearch?
Elasticsearch uses multicasting to see if there are other nodes with the same cluster name present on the network.
If there is such a node, it connects to it.
It then shares its data with it, depending on the shard configuration.
http://www.elasticsearch.org/guide/reference/modules/discovery/zen.html
Read the above to get the full idea.
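As a sketch, the relevant 0.90-era elasticsearch.yml settings look like this (multicast is on by default; cluster.name is what nodes match on):

```
# Nodes discover each other by multicasting and comparing cluster names.
cluster.name: my-cluster
discovery.zen.ping.multicast.enabled: true
# Alternative: disable multicast and list the other nodes explicitly (unicast).
# discovery.zen.ping.multicast.enabled: false
# discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2"]
```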