Elasticsearch node.ROLE configuration

I am getting started with ELK, and I already have a question about configuring an Elasticsearch node. The ingest node in particular is not clear to me. According to the docs, all three "roles" (master, data and ingest) are set to true by default. I understand this is meant for creating a single-node cluster, with all of Elasticsearch on one machine.
So if I want to have, say, a 3-node cluster (1 master + 2 workers), should I just set the values I don't want to false? Something like this:
MASTER
node.name: master
node.data: false
node.ingest: ?
WORKERS
node.name: data-x
node.master: false
node.ingest: ?
In this situation, where I don't have a dedicated ingest node, where should it go? It makes more sense to me to put them on data nodes, but I am not sure if this is the right assumption.

You may not need a dedicated ingest node. If there is considerable data transformation/enrichment going on (not just analyzing and indexing JSON documents), a dedicated ingest node might help. This blog will give you more information.
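As a rough illustration of the layout the question describes, here is a sketch using the same boolean role settings (node.master, node.data, node.ingest). It assumes an Elasticsearch version that still uses these settings (7.9 and later replace them with a single node.roles list), and it puts the ingest role on the data nodes, which is one reasonable choice when there is no dedicated ingest node, not the only one. The node name data-1 is a placeholder.

```yaml
# elasticsearch.yml on the dedicated master-eligible node
node.name: master
node.master: true
node.data: false
node.ingest: false
---
# elasticsearch.yml on each data node (data-1 is a placeholder name)
node.name: data-1
node.master: false
node.data: true
node.ingest: true   # assumption: ingest pipelines run on the data nodes
```

Each document separated by `---` stands for a separate elasticsearch.yml file on a separate machine.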

Related

Elasticsearch cluster in AWS ECS

I'm trying to create an Elasticsearch cluster in AWS ECS, but I'm getting this warning: "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes". My elasticsearch.yml and task definition are the same for all nodes. How can I differentiate between the master and the other nodes? Should I have a separate elasticsearch.yml/task definition for the master node?
My elasticsearch.yml:
cluster.name: "xxxxxxxxxxx"
bootstrap.memory_lock: false
network.host: 0.0.0.0
network.publish_host: _ec2:privateIp_
transport.publish_host: _ec2:privateIp_
discovery.seed_providers: ec2
discovery.ec2.tag.project: xxxxxxx-elasticsearch
discovery.ec2.endpoint: ec2.${REGION}.amazonaws.com
s3.client.default.endpoint: s3.${REGION}.amazonaws.com
cloud.node.auto_attributes: true
cluster.routing.allocation.awareness.attributes: aws_availability_zone
xpack.security.enabled: false
I faced a similar problem. First, you need to bootstrap an initial cluster so that it is ready for other nodes to join. You can start with an initial-node configuration in elasticsearch.yml. The solution I am using is to run one Elasticsearch Docker container per ECS instance (as Elasticsearch requires a good amount of memory):
cluster.initial_master_nodes: '<<INITIAL_NODE_IPADDRESS>>'
This configuration bootstraps the cluster, which means Elasticsearch is ready for other nodes to join. In the next step, add the following configuration:
cluster.initial_master_nodes: [<<MASTER_NODE_IPADDRESS>>,<<INITIAL_NODE_IPADDRESS>>]
discovery.seed_hosts: [<<MASTER_NODE_IPADDRESS>>,<<INITIAL_NODE_IPADDRESS>>]
Then you can add as many data nodes as you want, depending on how much data you have.
Note: The IP addresses come from different nodes, so use AWS SSM Parameter Store to store the IPs securely, and use entrypoint.sh to fetch them and update the elasticsearch.yml file dynamically when you are building the Docker images.
I hope this solves the problem.

What do discovery.seed_hosts and cluster.initial_master_nodes mean in ES

I am using ES 7.10.1, and I am reading https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#unicast.hosts.
I have 5 master nodes (node.master: true, node.data: false) and 20 data nodes (node.master: false, node.data: true).
I have the following four questions:
1. Should both discovery.seed_hosts and cluster.initial_master_nodes be specified with the master nodes? I mean, could I specify data nodes for these two settings?
2. Since I have 5 master nodes in my case, how many nodes should I specify for these two settings? I think I don't have to list all 5 of these nodes?
3. It looks to me that discovery.seed_hosts is like the older Elasticsearch versions' discovery.zen.ping.unicast.hosts?
4. It looks to me that cluster.initial_master_nodes is like the older Elasticsearch versions' discovery.zen.minimum_master_nodes?
Thanks!
1. It is recommended to add the setting on at least 3 master nodes for fault tolerance.
2. I would list all 5 to avoid confusion.
3. Yes, it is, but it will still work for legacy.
4. Yes, it is, but it will still work for legacy.
Remember that cluster.initial_master_nodes is used only once, when the cluster is first bootstrapped. After that it is ignored, and it is recommended to remove it from the configuration.
Cluster bootstrapping
Cluster Settings
In short, discovery.seed_hosts is the list of master nodes a new node uses to join the cluster, and cluster.initial_master_nodes is the initial list used to bootstrap a cluster.
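For the 5-master, 20-data-node cluster in the question, the two settings might look like the sketch below. The host names and node names (master-1 to master-5) are hypothetical. The sketch assumes discovery.seed_hosts is set on every node, and that the entries in cluster.initial_master_nodes match the node.name values of the master-eligible nodes.

```yaml
# On every node (master-eligible and data): where to look for the cluster.
# Host names are hypothetical; a port may be appended as host:port.
discovery.seed_hosts:
  - master-1.example.internal
  - master-2.example.internal
  - master-3.example.internal
  - master-4.example.internal
  - master-5.example.internal

# On the master-eligible nodes only, and only for the very first start of a
# brand-new cluster. Remove it once the cluster has formed.
cluster.initial_master_nodes:
  - master-1
  - master-2
  - master-3
  - master-4
  - master-5
```

The data nodes would carry only the discovery.seed_hosts part, since they never take part in bootstrapping.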

How to check/make sure of Elasticsearch load balancer?

Using NEST, ASP.NET Core 3.1, and Elasticsearch, I have created a 3-node cluster with default roles.
How can I check that search queries are balanced between my local machines?
I monitored the metrics of each server/node while indexing a large amount of data, and I saw that only the nodes holding the relevant primary and replica shards were engaged during indexing.
But I need to make sure that requests are divided between my nodes in a round-robin manner, and I do not know how to check that. Is there any way or tool to verify that, for example, the first search query engages node-1 and the second engages node-3?
Any hint, keyword, or help is appreciated.
The .yml configuration on each of my 3 nodes:
cluster.name: my-cluster
node.name: node-1
network.host: 192.168.254.137
http.port: 9200
discovery.seed_hosts: ["192.168.254.137", "192.168.254.135", "192.168.254.136"]
cluster.initial_master_nodes: ["192.168.254.137", "192.168.254.135", "192.168.254.136"]
My index is distributed as below:
index           shard prirep state   docs store   ip              node
suggestionindex 0     p      STARTED 2000 170.5kb 192.168.254.136 node-3
suggestionindex 0     r      STARTED 2000 90.5kb  192.168.254.137 node-1
My appsettings.json :
"ElasticsearchSettings": {
// IP of one of the 3 master eligible nodes
"uri": "http://192.168.254.137:9200/",
"basicAuthUser": "",
"basicAuthPassword": ""
},
Are all search queries always sent to the primary shard (node-1), or are they balanced between node-1 and node-3 in my case?
If they are balanced, how can I check it?
What balances them between nodes: NEST or my master node?
Elasticsearch internally load-balances queries across all the data nodes, so you don't have to do anything on your side. If you are on Elasticsearch 7.x, it uses a smart load-balancing technique called adaptive replica selection; before that, the default was a round-robin technique.
The Elastic blog post I mentioned has all the details of how it works.
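To show where this behavior is controlled: adaptive replica selection is governed by the cluster setting cluster.routing.use_adaptive_replica_selection, which defaults to true on 7.x. The fragment below is only a sketch of setting it explicitly in elasticsearch.yml. It is normally left at its default, and it can also be changed at runtime through the cluster settings API.

```yaml
# elasticsearch.yml (sketch): written out explicitly only for illustration.
# true  -> adaptive replica selection chooses which shard copy serves a search
# false -> search requests are spread over the shard copies in round-robin fashion
cluster.routing.use_adaptive_replica_selection: true
```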

Elasticsearch Unicast Weird Behavior in Clustering

I have two nodes each of which forms a cluster (with one empty node).
0.0.0.0:9200 (elasticsearch)
0.0.0.0:9201 (test-1)
The node at 9200 is in the cluster elasticsearch (probably the default cluster.name). The node at 9201 is in the cluster test-1. (Additionally, important or not, I bind network.host of both nodes to 0.0.0.0.)
I want to join a new node to test-1. When I leave the discovery.zen.ping.unicast.hosts setting commented out, the new node joins test-1 successfully. However, when I set it to something else, e.g. ["0.0.0.0"] or ["127.0.1"], it fails to join.
Joining a new node to elasticsearch is no problem: ["0.0.0.0"], ["127.0.1"], and ["IP"] all worked well. (But ["0.0.0.0", "ANOTHER-IP"] failed. Please address this as well if possible.)
What causes this joining issue? Has anybody experienced problems like this?
The discovery.zen.ping.unicast.hosts should have the IPs of all the nodes joining the cluster. Do this for all the nodes in the cluster and use IPs not 0.0.0.0 or 127.0.0.1.
As your new node is trying to join the test-1 cluster you can try to change the port of the new node to 9201 and see if it joins.
The minimum required to form a cluster:
The same cluster.name on every node.
A different node.name on each node.
discovery.zen.ping.unicast.hosts - the IPs of all the nodes in the cluster.
gateway.recover_after_nodes and discovery.zen.minimum_master_nodes - comment these lines out on all the nodes of the cluster, if they are not commented out already.
Lastly check your firewall settings and disable the firewall if necessary. Check if the nodes can talk to each other.
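Put together, the checklist above might translate into something like the following for a legacy (pre-7.x) node using zen discovery. The IP addresses and the node name are placeholders, and the sketch assumes each node listens on the default transport port. Note that the unicast host list refers to transport addresses, not to the HTTP ports 9200/9201.

```yaml
# elasticsearch.yml for a node joining the test-1 cluster (legacy zen discovery)
cluster.name: test-1            # must be identical on every node in the cluster
node.name: test-1-node-2        # placeholder; must differ on every node
network.host: 192.0.2.11        # placeholder: this node's IP address
discovery.zen.ping.unicast.hosts:
  - "192.0.2.10:9300"           # placeholder addresses of the cluster's nodes,
  - "192.0.2.11:9300"           # written as host:transport_port
# gateway.recover_after_nodes and discovery.zen.minimum_master_nodes stay unset
```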

Relocating shards in elasticsearch

Step 1) I create a node named "NODE1".
Step 2) I create a new index on NODE1 named "application", with the type "testing".
Step 3) The index is created with 5 shards and no replicas.
Step 4) I insert 5 documents into the index; they are split among the 5 shards.
Step 5) I start a new node called "NODE2" in NODE1's cluster.
Step 6) As I understand it, the shards are shared between the nodes, so 2 of my shards move to the new node.
Question 1) Now I request, at NODE1, a document that lives in one of the relocated shards (the shards that moved from NODE1 to NODE2).
Question 2) Will my search return the requested document or not?
Question 3) How do the two nodes communicate with each other?
Question 4) Can I read and write on NODE2? If yes, can I search from NODE1 for data written through NODE2?
Thanks in Advance..!
All the answers are yes :)
The nodes communicate with each other through the transport port, by default port 9300 (or the first free one in the 9300-9400 range). They use a custom binary protocol based on serialization of objects (though in most cases not standard Java serialization).
Any node in a cluster is cluster-aware and knows where the shards are, because all nodes share the so-called cluster state. You can send requests (read and write) to any node, and they will be rerouted to the relevant nodes and executed properly depending on the type of request.
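Client requests and node-to-node traffic use separate ports, both of which can be set explicitly. The fragment below is a sketch that simply writes out the usual defaults. It assumes an older release where the transport setting is named transport.tcp.port (newer releases call it transport.port).

```yaml
# elasticsearch.yml (sketch): the usual default port ranges, written out
http.port: 9200-9300           # REST requests from clients (reads and writes)
transport.tcp.port: 9300-9400  # node-to-node traffic inside the cluster
```

With a range, a node binds to the first free port in it, which is why a second node on the same machine typically ends up on 9201 and 9301.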

Resources