I am new to Elasticsearch. I have a cluster with 3 nodes on the same machine. Each node has its own URL because the port differs (localhost:9200, localhost:9201, localhost:9202).
My question: suppose node 1 (the master node) dies. Elasticsearch handles the situation well and makes node 2 the master, but how does my application know that a node died and that it now needs to hit node 2 on port 9201?
Is there a way to always hit a single URL and have it figure out internally which node to use?
Thanks,
Pratz
The client finds nodes through the discovery module. The cluster name in your client's configuration is important for getting this to work.
With a correct configuration (on client and cluster) you can bring a single node down without any (negative) effect on your client.
See the following links:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html
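To illustrate the "single entry point" idea from the application's side, here is a minimal Python sketch that tries each node URL in turn and uses the first one that answers. It uses only the standard library, not an official Elasticsearch client. The names `http_ok` and `first_live_node` are made up for this example, and the URLs are the three from the question. Client libraries typically accept a list of hosts and do something similar for you.

```python
import http.client
import urllib.error
import urllib.request

# Node URLs from the question; adjust for your own cluster.
NODES = [
    "http://localhost:9200",
    "http://localhost:9201",
    "http://localhost:9202",
]


def http_ok(url, timeout=2.0):
    """Return True if an HTTP GET to `url` succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, bad response, etc.
        return False


def first_live_node(nodes, probe=http_ok):
    """Return the first node for which `probe` succeeds, or None if none do."""
    for node in nodes:
        if probe(node):
            return node
    return None


# Usage (hypothetical): base_url = first_live_node(NODES)
```

The `probe` parameter is there only so the selection logic can be exercised without a running cluster.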
We have a small Elasticsearch cluster of 3 nodes: two in one datacenter and one in another for disaster recovery reasons. However, if the first two nodes fail simultaneously, the third one won't work either. It will just throw "master not discovered or elected yet".
I understand that this is intended and that this is how an Elasticsearch cluster should work. But is there some additional special configuration that I don't know about to keep the third single node working, even if only in read-only mode?
Nope, there's not. As you mentioned, it's designed that way.
You're probably not doing yourselves a lot of favours by running things across datacentres like that. Network issues are not kind to Elasticsearch due to its distributed nature.
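The reason a lone node refuses to act as master is majority voting: a master is only elected when more than half of the master-eligible nodes can agree. Here is a small sketch of that arithmetic, assuming the cluster is set up to require a majority (the usual recommendation). The function names are illustrative, not Elasticsearch APIs.

```python
def quorum(master_eligible):
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible // 2 + 1


def can_elect_master(master_eligible, reachable):
    """True if the reachable nodes are enough to elect a master."""
    return reachable >= quorum(master_eligible)


# Three master-eligible nodes, as in the question.
print(quorum(3))               # 2
print(can_elect_master(3, 2))  # True: losing any one node is fine
print(can_elect_master(3, 1))  # False: the lone DR node cannot elect a master
```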
Elasticsearch runs in distributed mode by default. Nodes assume that they are or will be part of a cluster, and during setup they try to join the cluster automatically.
If you want Elasticsearch to run as a single node with no need to communicate with other Elasticsearch nodes, it can work much like a standalone server. To do this, tell Elasticsearch to work in local-only mode (network disabled).
Open your elasticsearch/config/elasticsearch.yml and set:
node.local: true
Hi, I am using NiFi DistributedMapCacheServer to keep track of processed files in my flow. The issue is that we are working in a cluster, and to take advantage of it we use load balancing on the queues, so FlowFiles are not all on the same node. When they arrive at Put/GetDistributedMapCache, which uses a DistributedMapCacheClient configured with the fixed hostname of one of the nodes, it only works when the arriving FlowFile is on the same node as the one specified in the DistributedMapCacheClient. For the others we get:
FetchDistributedMapCache[id=d4713096-5ae5-1cb4-b777-202948e39e50] Unable to communicate with cache when processing StandardFlowFileRecord[uuid=5b1e8092-5bc5-4213-97a3-fa023691973f,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1587393798960-14, container=default, section=14], offset=983015, length=5996],offset=0,name=bf15d684-4100-4aa5-9fb5-fa0ddb21b140,size=5996] due to No route to host: java.net.NoRouteToHostException: No route to host
Is there any way to set up the DMC server/client to work in such a case, or can I somehow route all FlowFiles to an explicitly given node?
This means the hostname/ip-address that you specified in the DistributedMapCacheClient for the location of the server is unreachable by the other nodes in your cluster. Your nodes must be able to communicate since you have a cluster, so you just need to set this to the correct value.
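One way to confirm this is to run a quick TCP check from each node against the address configured in the DistributedMapCacheClient. The sketch below uses only Python's standard library. The hostname in the usage comment is hypothetical, and port 4557 is assumed to be the port your DistributedMapCacheServer listens on, so check your controller service settings.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", connection refused, timeouts and DNS errors.
        return False


# Usage (hypothetical host, assumed port), run on every NiFi node:
# print(can_reach("nifi-node-1.example.com", 4557))
```

If this returns False on some nodes, the problem is in name resolution, routing or firewalling between the nodes, not in the cache itself.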
I have a couple of Elasticsearch questions regarding client nodes:
1. Can I say that any node with its HTTP port open can be treated as a "client" node, because we can search/index through it?
2. Actually, we treat a node as a client node when cluster=false and data=false. If I set up 10 client nodes, do I need to do the routing on my client side? I mean, if I specify clientOne:9200 in my code as the ES entry point, would clientOne forward HTTP requests to the other client nodes? Otherwise clientOne would be under very high pressure. In other words, do client nodes communicate with each other?
3. When I specify client nodes in the ES cluster, should I close the other nodes' HTTP ports, since we would only query the client nodes?
4. Do you think it's necessary to set up both a data node and a client node on the same machine, or should I just let the data node act as a client node as well, since it's the same machine anyway?
5. If the ES cluster is heavily/frequently indexed but rarely searched, then I don't have to set up a client node, because client nodes are good for gathering data, right?
6. For general search/index purposes, should I use the HTTP port or the TCP port? What's the difference from the client's perspective?
1. Yes, you can send queries via HTTP to any node that has port 9200 open.
2. With node.data: false and node.master: false, you get a "client node". These are useful for offloading indexing and search traffic from your data nodes. If you have 10 of them, you would want to put a load balancer in front of them.
3. Closing the data nodes' HTTP port (http.enabled: false) would keep them from serving client requests (probably good), though it would also prevent you from curl'ing them directly for stats, etc.
4. Client nodes are useful (see #2), so I wouldn't route traffic directly to your data nodes. Whether you run both a client and a data node on the same piece of hardware depends on the config of that machine (do you have sufficient RAM, etc.).
5. Client nodes are also useful for indexing, because they know which data node should receive the data for storage. If you sent an indexing request to a random data node instead, the odds would be high that it would have to redirect that request to another node. That's a waste of time and resources if you can create client nodes.
6. Having your clients join the cluster might give them access to more information about the cluster, but using HTTP gives them a more generic "black box" interface. With HTTP, you also don't have to keep your clients at the same version as your ES nodes.
Hope that helps.
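If a dedicated load balancer in front of the client nodes is not available, a simple client-side rotation gets close to the same effect. This is only a sketch: the `RoundRobin` class is made up for this example and the hostnames are hypothetical, modelled on the `clientOne` name from the question.

```python
import itertools


class RoundRobin:
    """Hand out node URLs in rotation so no single node takes every request."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        self._cycle = itertools.cycle(nodes)

    def next_node(self):
        """Return the next node URL in the rotation."""
        return next(self._cycle)


# Hypothetical client-node addresses.
balancer = RoundRobin([
    "http://clientOne:9200",
    "http://clientTwo:9200",
    "http://clientThree:9200",
])
print([balancer.next_node() for _ in range(4)])
```

The fourth call wraps around to `clientOne` again. A real load balancer would also skip nodes that are down, which this sketch does not attempt.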
It seems that on every node in the cluster, you can curl port 9200 to call most of the REST API. I wonder, is there any operation (or an extensive list of such operations) that can only be executed by accessing the master node?
There is no operation that can only be sent to the master node, because when you ask any node for something, it routes the request to the correct node. So if you send a non-master node a request that must be handled by the master, the non-master node will route it to the master. This feature allows Elasticsearch to elect a new master node without breaking any code!
Let's say I have 3 nodes. 1 of which is the master.
I have an API (running on another machine) which hits the master and gets my search result. This is through a subdomain, say s1.mydomain.com:9200 (assume the others are pointed to by s2.mydomain.com and s3.mydomain.com).
Now my master fails for whatever reason. How would my API recover from such a situation? Can I hit either S2 or S3 instead? How can I figure out what the new master is? Is there a predictable way to know which one would be picked as the new master should the master go down?
I've googled this and it's given me enough information about how when a master goes down, a failover is picked as the new master but I haven't seen anything clarify how I would need to handle this from the outside looking in.
The master in Elasticsearch is really only for internal coordination. There are no actions required when a node goes down, other than trying to get it back up to get your full cluster performance back.
You can read/write to any of the remaining nodes and the data replication will keep going. When the old master node comes back up, it will re-join the cluster once it has received the updated data. In fact, you never need to worry if the node you are writing on is the master node.
There are some advanced configurations to alter these behaviors, but Elasticsearch comes with suitable defaults.
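If you are still curious which node currently holds the master role, any surviving node can tell you through the `_cat/master` endpoint. Below is a minimal standard-library sketch. It assumes the default plain-text output with four whitespace-separated columns (id, host, ip, node name), and the `fetch_text` and `current_master` helpers are made up for this example.

```python
import urllib.request


def fetch_text(url, timeout=2.0):
    """GET `url` and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def current_master(node_url, fetch=fetch_text):
    """Ask one node who the elected master is, via the _cat/master endpoint.

    Assumes one line of output with four columns: id, host, ip, node name.
    The node name is taken as everything after the third column, so names
    containing spaces are preserved.
    """
    line = fetch(node_url.rstrip("/") + "/_cat/master").strip()
    node_id, host, ip, name = line.split(None, 3)
    return {"id": node_id, "host": host, "ip": ip, "name": name}


# Usage, with the hostname from the question:
# print(current_master("http://s2.mydomain.com:9200"))
```

Your API does not need this to keep working, since it can send requests to S2 or S3 directly.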