I remember the question asked on this forum about multiple shards per module how-does-multiple-shards-per-module-support-works-in-odl-nitrogen
The answer was that MD-SAL really uses only the first shard to start transactions for the module.
Can it be used for splitting a module among different cluster nodes? If on the first node module default is configured to have two shards default-1 and default-2, but on the second node it is configured to have only default-2 shard, it looks like we may have two leaders for the same namespace (on node 1 it will be default-1 and on node 2 it will be default-2). It will be very desirable but is it possible?
Is it possible to configure a module differently on different nodes?
It may be possible to configure it in that manner but not sure why it would be desirable. Also only 1 shard per module is supported so no point in defining default-1 and default-2 on node 1. If the purpose is for each node to maintain its own local copy of the data in the default space, then that can be achieved by configuring only the local node as a replica.
Related
I was exploring NiFi documentation. I must agree that it is one of the well documented open-source projects out there.
My understanding is that the processor runs on all nodes of the cluster.
However, I was wondering about how the content is distributed among cluster nodes when we use content pulling processors like FetchS3Object, FetchHDFS etc. In processor like FetchHDFS or FetchSFTP, will all nodes make connection to the source? Does it split the content and fetch from multiple nodes or One node fetched the content and load balance it in the downstream queues?
I think this document has an answer to your question:
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
For other file stores the idea is the same.
will all nodes make connection to the source?
Yes. If you did not limit your processor to work only on primary node - it runs on all nodes.
The answer by #dagget has traditionally been the approach to handle this situation, often referred to as the "list + fetch" pattern. List processor runs on Primary Node only, listings sent to RPG to re-distribute across the cluster, input port receives listings and connect to a fetch processor running on all nodes fetching in parallel.
In 1.8.0 there are now load balanced connections which remove the need for the RPG. You would still run the List processor on Primary Node only, but then connect it directly to the Fetch processors, and configure the queue in between to load balance.
I have been reading up on ES Cluster design and have started to design the cluster we need. Please can someone clarify some of the things that are still not clear to me?
So we want to start off with 3 servers.
At the beginning we will have all three as Master, Data and Ingest with minimum two master. This basically means, we are sticking to defaults.
Question 1 is - What are data nodes exactly? Is full index replicated across other data nodes? So if one goes down, in our case the third one should be promoted to master server and the cluster should function.
Found this link Shards and replicas in Elasticsearch and it explains what data nodes are. So basically if our index has 12 shards, it might be that ES will store 4 primary shards on each data node and 8 replicas. Is this correct?
Question 2: With this as starting point, can we add more servers to function as data nodes, ingest nodes etc.
Question 3: We have setup a load balancer in front of the ES nodes, is this the recommended way of accessing ES Clusters over 9200. When ingesting, should this address be used and it will randomly be routed to an ingest node. When querying it should route to a random ES node that can handle searches.
What are data nodes exactly?
Disks for the shards.
Is full index replicated across other data nodes?
Yes, replica means availability as well, getting the concept of shards is key to understand this and don't get confused.
in our case the third one should be promoted to master server and the cluster should function.
Yes, read about the green, yellow and red statuses, in this case, it will turn from green to yellow, it means is still functioning but actions required, but read about "master eligibility" and also, avoid split brain, very important. https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#master-node
With this as starting point, can we add more servers to function as data nodes, ingest nodes etc.
as many as you want, what is the app requirement? high read low write? vice-versa? equals? define how do you want to grow the cluster depending on the use case.
Question 3: We have setup a load balancer in front of the ES nodes, is this the recommended way of accessing ES Clusters over 9200. When ingesting, should this address be used and it will randomly be routed to an ingest node. When querying it should route to a random ES node that can handle searches.
If it is, for instance, a nginx, it works because I have done it, have a clear understanding on the concept of the nodes roles, for example, the "coordinating node" would handle some process flow that some requests might require and nginx is not aware of.
IMO now that you have the instances, it is a great opportunity for you to learn-by-doing and experiment with them, so move the configs, try to reproduce the problems your app might have and see what happens, aha!moments will happen and full grasp is gotten here.
I currently have single node for elasticsearch in a windows server. Can you please explain how to add one extra node for failover in different machine? I also wonder how two nodes can be kept identical using NEST.
Usually, you don't run a failover node, but run a cluster of nodes to provide High Availability.
A minimum topology of 3 master eligible nodes with minimum_master_nodes set to 2 and a sharding strategy that distributes primary and replica shards over nodes to provide data redundancy is the minimum viable topology I'd consider running in production.
I am new to Elastic Search and I am trying to learn as much as I can about Elasticsearch.
I have a cluster having a single node. Is it possible for me to create multiple instances of Elasticsearch on the single node present in my Cluster?
Due to some reason, i cannot add another node to my cluster, so is it possible to install another instance of Elasticsearch on the same node and treat it as a separate node to create replicas on it?
Basically what I am asking is can I install multiple instances of Elasticsearch on a single node and treat those instances as a separate node to install replica on it?
Yes, that's definitely possible.
However, you need to make sure to configure both nodes properly (i.e. have separate data folders, different http/tcp ports, etc) and equally share the available CPU/RAM/HDD resources among both nodes and still leave some RAM for the OS.
Also note that it is strongly discouraged to run your whole cluster on a single node. If physical node was to crash for some reason, you'd end up with no ES cluster at all. But for learning purposes it is perfectly ok to do it in order to experiment shard allocation, etc.
to achieve this you have to configure unique http.port & transport.tcp.port in elastic instances
Problem descriptions:
- Multiple machines producing logs.
- On each machine we have logstash which filters the log files and sends them to a local elasticsearch
- We would like to keep the machines as separate as possible and avoid intercommunication
- But we would also like to be able to visualize all of these logs with a single Kibana instance
Approaches:
Make each machine a single node ES cluster, and have one of the machines as a tribe node with Kibana installed on this machine (of course with avoiding indices conflict)
Make all machines (nodes) part of a single cluster with each node writing to unique index of one shard and statically map each shard to its node, and finally of course having one instance of kibana for the cluster
Question:
Which approach is more appropriate for the described scenario in terms of: limiting inter machine communications, cluster management, and maybe other aspects that I haven't think about ?
Tribe node is there because of this requirements. So my advice to use the Tribe node setup.
With the second option;
There will be a cluster but you will not use its benefits (replica shards, shard relocation, query performance, etc)
Benefits mentioned above will be pain points that will generate configuration complexity and troubleshooting hell.
Besides the shard allocation and node communication there will be other things to configure that nodes will have when they are in a cluster.