Error in configuring elasticsearch cluster - elasticsearch

I'm configuring a three-node Elasticsearch cluster. I'm getting the following error when I try to start the first node.
Startup command:
[cloud_user@mishai3c elasticsearch-6.2.4]$ ./bin/elasticsearch -d -p pid
Error message:
[2019-11-11T04:50:39,634][INFO ][o.e.b.BootstrapChecks ] [master] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-11-11T04:50:39,636][ERROR][o.e.b.Bootstrap ] [master] node validation exception
[1] bootstrap checks failed
[1]: max number of threads [3581] for user [cloud_user] is too low, increase to at least [4096]
[2019-11-11T04:50:39,666][INFO ][o.e.n.Node ] [master] stopping ...
I have tried to raise the limit in the /etc/security/limits.conf file by adding the following line:
#cloud_user hard nproc 4096
It would be highly appreciated if anyone could help.

After changing the limits.conf file, I checked the max thread limit by running the ulimit -u command in the terminal; it still showed the previous value.
Then I logged out and back into the server, ran ulimit -u again, and it showed 4096.
Then I tried to start Elasticsearch and it worked.
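A minimal sketch of the change that ended up working, assuming the same cloud_user account (limits.conf is only re-read at login, which is why a fresh session was needed):
# /etc/security/limits.conf -- raise the max user processes/threads for cloud_user
cloud_user  soft  nproc  4096
cloud_user  hard  nproc  4096
# log out and back in, then verify before starting Elasticsearch
ulimit -u                          # should now print 4096
./bin/elasticsearch -d -p pid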

Related

Elasticsearch uses more memory than JVM heap settings allow

The link here from the official Elasticsearch documentation mentions that to limit Elasticsearch memory use, you have to set Xms and Xmx to appropriate values.
Current setup is:
-Xms1g
-Xmx1g
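(For reference, these flags live in Elasticsearch's jvm.options file; on a CentOS 8 package install that would presumably be /etc/elasticsearch/jvm.options. A quick check of what is configured there:)
grep -E "^-Xm[sx]" /etc/elasticsearch/jvm.options   # shows the heap flags currently set; use $ES_HOME/config/jvm.options for a tarball install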
On my server, which runs CentOS 8, Elasticsearch is using more memory than the JVM heap settings allow, causing the server to crash.
The following errors observed at the same time:
[2021-09-06T13:11:08,810][WARN ][o.e.m.f.FsHealthService ] [dev.localdomain] health check of [/var/lib/elasticsearch/nodes/0] took [8274ms] which is above the warn threshold of [5s]
[2021-09-06T13:11:20,579][WARN ][o.e.c.InternalClusterInfoService] [dev.localdomain] Failed to update shard information for ClusterInfoUpdateJob within 15s timeout
[2021-09-06T13:12:14,585][WARN ][o.e.g.DanglingIndicesState] [dev.localdomain] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
At the same time, the following errors were reported in /var/log/messages:
Sep 6 13:11:08 dev kernel: out_of_memory+0x1ba/0x490
Sep 6 13:11:08 dev kernel: Out of memory: Killed process 277068 (elasticsearch) total-vm:4145008kB, anon-rss:3300504kB, file-rss:0kB, shmem-rss:86876kB, UID:1001
Am I missing some settings to limit elasticsearch memory usage?

elasticsearch file descriptors and vm warnings become errors when not binding to localhost?

For those landing on this post trying to resolve these errors yourselves (whereas I'm only asking what they mean), I have provided my steps for getting rid of the errors at the bottom of this post.
I installed Elasticsearch and can access it from localhost, but I'm having timeout issues when trying to connect from a remote machine. Trying to fix that throws errors that, before the fix, were only warnings.
This other post (Installed elastic search on server but cannot connect to it if from another machine) may be outdated, since inspecting elasticsearch.yml shows no network.bind_host variable and the post seems to refer to an older version. But following it as much as possible, running
[me@mapr07 config]$ netstat -ntlp | awk '/[j]ava/'
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp6 0 0 127.0.0.1:9200 :::* LISTEN 21109/java
tcp6 0 0 ::1:9200 :::* LISTEN 21109/java
shows that we are binding to localhost and "if you are binding to localhost (i.e. 127.0.0.1), you can only accept connections from the localhost, not over the network."(https://stackoverflow.com/a/24057311/8236733) (I don't know much networking stuff).
From the elasticsearch docs (https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html#advanced-network-settings), I tried setting the network.bind_host variable in $ES_HOME/config/elasticsearch.yml to 0.0.0.0 (like in the original post). However, after restarting Elasticsearch, we see the output:
[2018-05-15T14:49:54,395][INFO ][o.e.n.Node ] [TjtCCG8] starting ...
[2018-05-15T14:49:54,603][INFO ][o.e.t.TransportService ] [TjtCCG8] publish_address {127.0.0.1:9300}, bound_addresses {[::]:9300}
[2018-05-15T14:49:54,620][INFO ][o.e.b.BootstrapChecks ] [TjtCCG8] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [2] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2018-05-15T14:49:54,641][INFO ][o.e.n.Node ] [TjtCCG8] stopping ...
[2018-05-15T14:49:54,670][INFO ][o.e.n.Node ] [TjtCCG8] stopped
[2018-05-15T14:49:54,670][INFO ][o.e.n.Node ] [TjtCCG8] closing ...
[2018-05-15T14:49:54,701][INFO ][o.e.n.Node ] [TjtCCG8] closed
I don't get how this happens, and I also don't get why it is even a real problem, since these were previously thrown only as warnings when bound to localhost and the node would run anyway, e.g.:
[2018-05-15T15:01:32,017][INFO ][o.e.n.Node ] [TjtCCG8] starting ...
[2018-05-15T15:01:32,283][INFO ][o.e.t.TransportService ] [TjtCCG8] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2018-05-15T15:01:32,303][WARN ][o.e.b.BootstrapChecks ] [TjtCCG8] max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2018-05-15T15:01:32,304][WARN ][o.e.b.BootstrapChecks ] [TjtCCG8] max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2018-05-15T15:01:35,372][INFO ][o.e.c.s.MasterService ] [TjtCCG8] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {TjtCCG8}{TjtCCG8LQOWOE2HNFdcLxA}{ik_q1XYnTk-BBJXcBMNK_A}{127.0.0.1}{127.0.0.1:9300}
My question is: why does this happen? In both cases the same number of file descriptors and the same amount of vm space appear to be in use, so why is it that when binding to 0.0.0.0 those amounts are no longer just warnings but errors?
For those landing on this post trying to resolve these errors yourselves (whereas I'm only asking what they mean), here are my steps for getting rid of the errors.
From some quick googling, the docs advise running the following commands
sudo su
ulimit -n 65536
su elasticsearch    # don't exit back to a previous elasticsearch session, since the new limit only applies to the current session (which ends if you exit)
to address the max file descriptors [4096] for elasticsearch process is too low error, since
Elasticsearch uses a lot of file descriptors or file handles. Running
out of file descriptors can be disastrous and will most probably lead
to data loss. Make sure to increase the limit on the number of open
files descriptors for the user running Elasticsearch to 65,536 or
higher.
(https://www.elastic.co/guide/en/elasticsearch/reference/current/file-descriptors.html#file-descriptors). Secondly, to address the max virtual memory error, run
(sudo) sysctl -w vm.max_map_count=262144
on linux systems, since
The default operating system limits on mmap counts is likely to be
too low, which may result in out of memory exceptions
(https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html#vm-max-map-count).
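Both changes above only last until a reboot or a new login session. A rough sketch of making them persistent (the elasticsearch user name and file paths here are assumptions; adjust them to your install):
# /etc/security/limits.conf -- raise the open-file limit for the user that runs Elasticsearch
elasticsearch  -  nofile  65536
# /etc/sysctl.d/99-elasticsearch.conf -- raise the mmap count permanently
vm.max_map_count = 262144
# reload sysctl settings without a reboot
sudo sysctl --system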
When Elasticsearch bootstraps, it checks whether the bind address is a loopback address (127.0.0.1); if it is, bootstrap check failures are not enforced as errors and are only logged as warnings.
Reference:
BootstrapChecks.enforceLimits

elasticsearch: max file descriptors [1024] for elasticsearch process is too low, increase to at least [65536]

When I tried to run the logging aggregation, I found the following error generated by Elasticsearch:
[2018-02-04T13:44:04,259][INFO ][o.e.b.BootstrapChecks ] [elasticsearch-logging-0] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: max file descriptors [1024] for elasticsearch process is too low, increase to at least [65536]
[2018-02-04T13:44:04,268][INFO ][o.e.n.Node ] [elasticsearch-logging-0] stopping ...
[2018-02-04T13:44:04,486][INFO ][o.e.n.Node ] [elasticsearch-logging-0] stopped
[2018-02-04T13:44:04,486][INFO ][o.e.n.Node ] [elasticsearch-logging-0] closing ...
[2018-02-04T13:44:04,561][INFO ][o.e.n.Node ] [elasticsearch-logging-0] closed
[2018-02-04T13:44:04,564][INFO ][o.e.x.m.j.p.NativeController] Native controller process has stopped - no new native processes can be started
BTW, I am running a Kubernetes cluster with v1.8.0 on the minions and v1.9.0 on the masters, using cri-containerd on Ubuntu 16.04 machines.
Any help will be appreciated.
This happens mostly when you are running Elasticsearch as a single node.
Add the following to elasticsearch.yml to get rid of this:
discovery.type: single-node
and comment out the following line if your configuration has it:
#cluster.initial_master_nodes: ["node-1", "node-2"]
Hope this helps.
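Separately from the discovery settings, a quick way to check the file-descriptor limit the Elasticsearch process actually received (run on the node, or inside the container, where it is running; the pgrep pattern is just one way to find the PID):
ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch | head -n 1)
grep "Max open files" /proc/$ES_PID/limits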

Percona Xtradb Cluster nodes won't start

I set up percona_xtradb_cluster-56 with three nodes in the cluster. To bootstrap the first node, I use the following command and it starts just fine:
#/etc/init.d/mysql bootstrap-pxc
The other two nodes, however, fail to start when I start them normally using the command:
#/etc/init.d/mysql start
The error I am getting is "The server quit without updating the PID file". The error log contains this message:
Error in my_thread_global_end(): 1 threads didn't exit 150605 22:10:29
mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended.
The cluster nodes are all running Ubuntu 14.04. When I use percona-xtradb-cluster 5.5, the cluster and all the nodes run just fine as expected. But I need to use version 5.6 because I am also using GTID, which is only available in version 5.6 and not supported in earlier versions.
I was following these two Percona documentation pages to set up the cluster:
https://www.percona.com/doc/percona-xtradb-cluster/5.6/installation.html#installation
https://www.percona.com/doc/percona-xtradb-cluster/5.6/howtos/ubuntu_howto.html
Any insight or suggestions on how to resolve this issue would be highly appreciated.
The problem is related to memory, as "The Georgia" writes. There should be at least 500 MB available for the default setup and bootstrapping. See here: http://sysadm.pp.ua/linux/px-cluster.html
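Since low memory is the suspected cause, a quick sanity check on each node before bootstrapping (just a sketch; the ~500 MB figure comes from the note above):
free -m                            # check how much memory is actually free on the node
dmesg | grep -i "killed process"   # see whether the kernel OOM killer has been terminating mysqld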

Cloudera installation dfs.datanode.max.locked.memory issue on LXC

I have created a VirtualBox Ubuntu 14.04 LTS environment on my Mac machine.
Inside the Ubuntu VM, I've created a cluster of three LXC containers: one for the master and the other two as slave nodes.
On the master, I have started the installation of CDH5 using the following link: http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
I have also made the necessary changes in /etc/hosts, including FQDNs and hostnames, and created a passwordless user named "ubuntu".
While setting up CDH5, I'm constantly facing the following error on the datanodes during installation: the max locked memory size dfs.datanode.max.locked.memory of 922746880 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 65536 bytes.
Exception in secureMain: java.lang.RuntimeException: Cannot start datanode because the configured max locked memory size (dfs.datanode.max.locked.memory) of 922746880 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 65536 bytes.
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1050)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:411)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2297)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2184)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2231)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2407)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2431)
Krunal,
This solution will probably be too late for you, but maybe it can help somebody else, so here it is. Make sure your ulimit is set correctly. But in case it's a config issue:
Go to:
/run/cloudera-scm-agent/process/
Find the latest config dir, in this case:
1016-hdfs-DATANODE
Search for the parameter in this dir:
grep -rnw . -e "dfs.datanode.max.locked.memory"
./hdfs-site.xml:163: <name>dfs.datanode.max.locked.memory</name>
and edit the value to the one it expects, in your case 65536.
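For reference, the edited property in that hdfs-site.xml would end up looking roughly like this (65536 simply matches the RLIMIT_MEMLOCK value reported in the error; raising the memlock ulimit for the datanode user is the alternative fix):
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>65536</value>
</property>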
I solved it by opening a separate tab in Cloudera and setting the value from there.
