Set Kubernetes readiness for Elasticsearch client

I upgraded the Elasticsearch Helm chart in Kubernetes from version 6.6 to 7.10.2. The data and master pods are running and ready, but the client pods aren't ready (2 clients, 2 data, 3 master).
This is their status:
NAME                                    READY   STATUS    RESTARTS   AGE
elasticsearch-client-685c875bb5-5mcxg   0/1     Running   0          2m23s
elasticsearch-client-685c875bb5-cs9lq   0/1     Running   0          24m
When I run kubectl describe, I see this warning:
Warning Unhealthy 10s (x10 over 100s) kubelet, Readiness probe failed: Get http://_cluster/health: net/http: request canceled (Client. Timeout exceeded while awaiting headers)
and in kubectl logs, I get this
{"type": "server", "timestamp": "2021-01-24T13:43:41,318Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "elasticsearch", "node.name": "elasticsearch-client-685c875bb5-5mcxg", "message": "path: /_cluster/health, params: {}",
"stacktrace": ["org.elasticsearch.discovery.MasterNotDiscoveredException: null",
"at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:230) [elasticsearch-7.10.2.jar:7.10.2]",
"at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:335) [elasticsearch-7.10.2.jar:7.10.2]",
"at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [elasticsearch-7.10.2.jar:7.10.2]",
"at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:601) [elasticsearch-7.10.2.jar:7.10.2]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) [elasticsearch-7.10.2.jar:7.10.2]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
"at java.lang.Thread.run(Thread.java:832) [?:?]"] }
{"type": "server", "timestamp": "2021-01-24T13:43:51,319Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "elasticsearch", "node.name": "elasticsearch-client-685c875bb5-5mcxg", "message": "path: /_cluster/health, params: {}",
"stacktrace": ["org.elasticsearch.discovery.MasterNotDiscoveredException: null",
"at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:230) [elasticsearch-7.10.2.jar:7.10.2]",
"at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:335) [elasticsearch-7.10.2.jar:7.10.2]",
"at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [elasticsearch-7.10.2.jar:7.10.2]",
"at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:601) [elasticsearch-7.10.2.jar:7.10.2]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) [elasticsearch-7.10.2.jar:7.10.2]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
"at java.lang.Thread.run(Thread.java:832) [?:?]"] }
I have already set the readiness and liveness initialDelaySeconds to 90.
What might be the problem here?

Since you have upgraded from 6.x to 7.x, make sure that you have set cluster.initial_master_nodes in your env or in the elasticsearch.yml config file.

You must have an odd number of master-eligible nodes, e.g. 1, 3, 5 and so on; usually 3 masters is optimal. Otherwise your cluster won't be able to elect a master due to lack of quorum.
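As a minimal sketch of what that could look like in elasticsearch.yml (or the chart's esConfig value), assuming the default master StatefulSet naming of the official chart — the node names below must be replaced with your actual master node.name values:

cluster.initial_master_nodes:
  - elasticsearch-master-0
  - elasticsearch-master-1
  - elasticsearch-master-2
discovery.seed_hosts:
  - elasticsearch-master-headless

Note that cluster.initial_master_nodes is only consulted the first time a brand-new v7 cluster bootstraps itself, and it should list the master node names, while discovery.seed_hosts should point at the masters' headless service so the clients can find them.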

Related

ElasticSearch MasterNotDiscoveredException

I've set up an Elasticsearch cluster in Kubernetes, but I'm getting the error "MasterNotDiscoveredException". I'm not really sure where to begin debugging this error, as there does not appear to be anything useful in the logs of any of the nodes:
│ elasticsearch {"type": "server", "timestamp": "2022-06-15T00:44:17,226Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "logging-ek", "node.name": "logging-ek-es-master-0", "message": "path: /_bulk, params: {}", │
│ elasticsearch "stacktrace": ["org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized, SERVICE_UNAVAILABLE/2/no master];", │
│ elasticsearch "at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:179) ~[elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.handleBlockExceptions(TransportBulkAction.java:635) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.doRun(TransportBulkAction.java:481) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$2.onTimeout(TransportBulkAction.java:669) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:345) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:263) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:660) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]", │
│ elasticsearch "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]", │
│ elasticsearch "at java.lang.Thread.run(Thread.java:833) [?:?]", │
│ elasticsearch "Suppressed: org.elasticsearch.discovery.MasterNotDiscoveredException", │
│ elasticsearch "\tat org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:297) ~[elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "\tat org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:345) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "\tat org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:263) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "\tat org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:660) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]", │
│ elasticsearch "\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]", │
│ elasticsearch "\tat java.lang.Thread.run(Thread.java:833) [?:?]"] }
That is pretty much the only log output I've ever seen.
It does appear that the cluster sees all of my master nodes:
elasticsearch {"type": "server", "timestamp": "2022-06-15T00:45:41,915Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "logging-ek", "node.name": "logging-ek-es-master-0", "message": "master not discovered yet, this node has not previously joine │
│ d a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{logging-ek-es-master-0}{fHLQrvLsTJ6UvR_clSaxfg}{iLoGrnWSTpiZxq59z7I5zA}{10.42.64.4}{10.42.64.4:9300}{mr}, {logging-ek-es-master-1}{EwF8WLIgSF6Q1Q46_51VlA}{wz5rg74iThicJdtzXZg29g}{ │
│ 10.42.240.8}{10.42.240.8:9300}{mr}, {logging-ek-es-master-2}{jtrThk_USA2jUcJYoIHQdg}{HMvZ_dUfTM-Ar4ROeIOJlw}{10.42.0.5}{10.42.0.5:9300}{mr}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]: │
│ 9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305, 10.42.0.5:9300, 10.42.240.8:9300] from hosts providers and [{logging-ek-es-master-0}{fHLQrvLsTJ6UvR_clSaxfg}{iLoGrnWSTpiZxq59z7I5zA}{10.42.64.4}{10.42.64.4:9300}{mr}] from last-known cluster state; node term 0, last-accepted version │
│ 0 in term 0" }
and I've verified that they can in fact reach each other over the network. Is there anything else, or anywhere else, I need to look for errors? I installed Elasticsearch via the Elastic operator.
Start by curling port 9300 between the nodes and make sure you get a response in both directions.
Also make sure your MTU is set correctly; a wrong MTU can cause this as well. Most of the time it's a network problem.
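As a rough sketch of that check (the pod name, namespace and IP below are only placeholders taken from the log above; substitute your own values):

# exec into one master pod and probe another master's transport port
kubectl exec -n logging logging-ek-es-master-0 -- \
  curl -sv --max-time 5 http://10.42.240.8:9300
# 9300 is the binary transport port, not HTTP, so don't expect a normal
# response; with the Elastic operator the transport layer is usually TLS too.
# Any TCP-level answer (a TLS/handshake error, or "This is not an HTTP port")
# means connectivity is fine, while a hang or timeout points to a network or
# MTU problem between the pods.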

How do I solve an index alias error if I can't run Elasticsearch?

The Elasticsearch instance I'm working with crashed, and attempting to restart it leads to the run command immediately being killed (runES.sh: line 1: 92654 Killed ./bin/Elasticsearch).
The log shows this error:
[2022-03-02T10:40:54,802][INFO ][o.e.x.i.IndexLifecycleRunner] [smartmedia] policy [logstash-policy] for index [logstash-testindex] on an error step due to a transitive error, moving back to the failed step [check-rollover-ready] for execution. retry attempt [24812]
[2022-03-02T10:50:54,779][ERROR][o.e.x.i.IndexLifecycleRunner] [smartmedia] policy [logstash-policy] for index [logstash-testindex] failed on step [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}]. Moving to ERROR step
java.lang.IllegalArgumentException: index.lifecycle.rollover_alias [logstash] does not point to index [logstash-testindex]
    at org.elasticsearch.xpack.core.ilm.WaitForRolloverReadyStep.evaluateCondition(WaitForRolloverReadyStep.java:104) [x-pack-core-7.8.1.jar:7.8.1]
    at org.elasticsearch.xpack.ilm.IndexLifecycleRunner.runPeriodicStep(IndexLifecycleRunner.java:173) [x-pack-ilm-7.8.1.jar:7.8.1]
    at org.elasticsearch.xpack.ilm.IndexLifecycleService.triggerPolicies(IndexLifecycleService.java:329) [x-pack-ilm-7.8.1.jar:7.8.1]
    at org.elasticsearch.xpack.ilm.IndexLifecycleService.triggered(IndexLifecycleService.java:267) [x-pack-ilm-7.8.1.jar:7.8.1]
    at org.elasticsearch.xpack.core.scheduler.SchedulerEngine.notifyListeners(SchedulerEngine.java:183) [x-pack-core-7.8.1.jar:7.8.1]
    at org.elasticsearch.xpack.core.scheduler.SchedulerEngine$ActiveSchedule.run(SchedulerEngine.java:211) [x-pack-core-7.8.1.jar:7.8.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
    at java.lang.Thread.run(Thread.java:832) [?:?]
If I understand correctly, the problem is this part: [logstash] does not point to index [logstash-testindex]. However, I do not know how to change an alias for an index if I can't even start the instance. This request
curl -X PUT "localhost:9200/logstash?pretty" -H 'Content-Type: application/json' -d'
{
  "aliases": {
    "logstash-testindex": {
      "is_write_index": true
    }
  }
}
'
results in an error because Elasticsearch is not actually running. Is there a config file I need to change instead?
This error occurs for the reason mentioned in this document:
Either the index is using the wrong alias or the alias does not exist.
If you are using an ES version below 7.8, you can disable ILM with the following setting in the elasticsearch.yml file:
xpack.ilm.enabled: false
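Disabling ILM only stops the retries. Once the node is up again, the underlying complaint can be addressed by pointing the rollover alias at the index. A sketch using the names from the log (logstash as the alias, logstash-testindex as the index; adjust to your own names):

curl -X POST "localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "add": { "index": "logstash-testindex", "alias": "logstash", "is_write_index": true } }
  ]
}
'

After that, the failed ILM step can be retried with POST logstash-testindex/_ilm/retry.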

How to apply config for Elasticsearch deployed on K8s?

I'm having trouble changing the Elasticsearch config for a cluster deployed on K8s.
I want to apply this config to my Elasticsearch nodes:
# Force all memory to be locked, forcing the JVM to never swap
bootstrap.mlockall: true
## Threadpool Settings ##
# Search pool
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100
# Bulk pool
threadpool.bulk.type: fixed
threadpool.bulk.size: 60
threadpool.bulk.queue_size: 300
# Index pool
threadpool.index.type: fixed
threadpool.index.size: 20
threadpool.index.queue_size: 100
# Indices settings
indices.memory.index_buffer_size: 30%
indices.memory.min_shard_index_buffer_size: 12mb
indices.memory.min_index_buffer_size: 96mb
# Cache Sizes
indices.fielddata.cache.size: 15%
indices.fielddata.cache.expire: 6h
indices.cache.filter.size: 15%
indices.cache.filter.expire: 6h
# Indexing Settings for Writes
index.refresh_interval: 30s
index.translog.flush_threshold_ops: 50000
To deploy Elasticsearch in K8s, I follow two steps:
Step 1: Create a .yml file like this:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: esc-lamtest
  namespace: lamtest
spec:
  version: 7.9.0
  nodeSets:
  - name: basic-1
    count: 3
    config:
      node.master: true
      node.data: true
      node.ingest: true
      # Search pool
      thread_pool.search.queue_size: 50
      thread_pool.search.size: 20
      thread_pool.search.min_queue_size: 10
      thread_pool.search.max_queue_size: 100
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 150Gi
        storageClassName: vnptit-nfs
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms4g -Xmx4g
          resources:
            requests:
              memory: 0Gi
              cpu: 0Gi
            limits:
              memory: 0Gi
              cpu: 0Gi
  http:
    tls:
      selfSignedCertificate:
        disabled: true
Step 2: I run the command:
kubectl apply -f lamtest.yaml
But I can only apply the config for "Search pool".
When I apply the config for # Bulk pool, # Index pool, # Indices settings, # Cache Sizes and # Indexing Settings for Writes, my Elasticsearch fails, and here are my logs:
"Suppressed: java.lang.IllegalArgumentException: unknown setting [thread_pool.bulk.queue_size] did you mean any of [thread_pool.get.queue_size, thread_pool.write.queue_size, thread_pool.analyze.queue_size, thread_pool.search.queue_size, thread_pool.listener.queue_size]?",
"\tat org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:544) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:489) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:460) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:431) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.common.settings.SettingsModule.<init>(SettingsModule.java:149) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.node.Node.<init>(Node.java:385) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.node.Node.<init>(Node.java:277) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:227) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:227) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:393) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127) ~[elasticsearch-cli-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.9.0.jar:7.9.0]",
"Suppressed: java.lang.IllegalArgumentException: unknown setting [thread_pool.bulk.size] did you mean any of [thread_pool.get.size, thread_pool.write.size, thread_pool.analyze.size, thread_pool.search.size]?",
"\tat org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:544) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:489) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:460) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:431) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.common.settings.SettingsModule.<init>(SettingsModule.java:149) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.node.Node.<init>(Node.java:385) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.node.Node.<init>(Node.java:277) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:227) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:227) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:393) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127) ~[elasticsearch-cli-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126) ~[elasticsearch-7.9.0.jar:7.9.0]",
"\tat org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.9.0.jar:7.9.0]"] }
uncaught exception in thread [main]
java.lang.IllegalArgumentException: unknown setting [thread_pool.bulk.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings
at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:544)
at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:489)
at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:460)
at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:431)
at org.elasticsearch.common.settings.SettingsModule.<init>(SettingsModule.java:149)
at org.elasticsearch.node.Node.<init>(Node.java:385)
at org.elasticsearch.node.Node.<init>(Node.java:277)
at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:227)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:227)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:393)
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170)
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161)
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127)
at org.elasticsearch.cli.Command.main(Command.java:90)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)
For complete error details, refer to the log at /usr/share/elasticsearch/logs/esc-lamtest.log
The error message is clear: you have not used the correct names for the other thread pools. Notice the first line of the error message:
"Suppressed: java.lang.IllegalArgumentException: unknown setting [thread_pool.bulk.queue_size] did you mean any of [thread_pool.get.queue_size, thread_pool.write.queue_size, thread_pool.analyze.queue_size, thread_pool.search.queue_size, thread_pool.listener.queue_size]?"
You have defined the bulk queue size as thread_pool.bulk.queue_size, which is not correct. As you can read in the thread pools documentation, there is no longer a bulk thread pool; bulk requests are handled by the write thread pool. Changing this setting to thread_pool.write.queue_size will therefore work. From the same doc:
write: For single-document index/delete/update and bulk requests. Thread pool type is fixed with a size of # of allocated processors, queue_size of 10000. The maximum size for this pool is 1 + # of allocated processors.
To fix the bulk, index and other settings, verify that you are using the current names for their configs. The thread pools can be listed from any running ES instance, and from that output you can easily construct the corresponding config.
http://localhost:9200/_nodes/stats?pretty returns all the thread pools; some are listed below.
"thread_pool": {
"analyze": {
"threads": 1,
"queue": 0,
"active": 0,
"rejected": 0,
"largest": 1,
"completed": 1
},
"ccr": {
"threads": 0,
"queue": 0,
"active": 0,
"rejected": 0,
"largest": 0,
"completed": 0
},
}
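Under that assumption, a sketch of how the legacy settings from the question might map onto 7.x names inside the nodeSet config block (only settings that still exist in 7.9 are shown; the values are illustrative, not recommendations):

config:
  node.master: true
  node.data: true
  node.ingest: true
  # search pool (names unchanged in 7.x)
  thread_pool.search.size: 20
  thread_pool.search.queue_size: 100
  # bulk and index requests both go through the "write" pool in 7.x;
  # note that thread_pool.write.size is capped at 1 + allocated processors,
  # so very large fixed sizes like 60 will be rejected
  thread_pool.write.queue_size: 300
  # node-level indices settings that still exist in 7.x
  indices.memory.index_buffer_size: 30%
  indices.fielddata.cache.size: 15%
  # index.refresh_interval and the translog settings are per-index settings
  # in 7.x and belong in an index template, not in elasticsearch.yml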

Data node can't find master node, but master can find data node in Elasticsearch

Here is the log info:
[2016-07-29 00:05:43,764][INFO ][cluster.service ] [data_node_144] detected_master {master_node_200}{KPQpU4cdRyiqCT488ZFglg}{192.168.201.200}{192.168.201.200:9300}{data=false, master=true}, added {{master_node_200}{KPQpU4cdRyiqCT488ZFglg}{192.168.201.200}{192.168.201.200:9300}{data=false, master=true},}, reason: zen-disco-receive(from master [{master_node_200}{KPQpU4cdRyiqCT488ZFglg}{192.168.201.200}{192.168.201.200:9300}{data=false, master=true}])
[2016-07-29 00:05:43,915][INFO ][discovery.zen ] [data_node_144] master_left [{master_node_200}{KPQpU4cdRyiqCT488ZFglg}{192.168.201.200}{192.168.201.200:9300}{data=false, master=true}], reason [transport disconnected]
[2016-07-29 00:05:43,974][WARN ][discovery.zen ] [data_node_144] master left (reason = transport disconnected), current nodes: {{data_node_144}{LkMlU7XeSqC-j0K_o05iUw}{192.168.201.144}{192.168.201.144:9300}{master=false},{data_node_148}{FKdVDjWjQ4eykqkMNRNxUQ}{192.168.201.148}{192.168.201.148:9300}{master=false},{master_node_102}{zQR04axjRxGFhmVxw3N4Pg}{192.168.201.102}{192.168.201.102:9300}{data=false, master=true},{master_node_103}{v83JEFKaQa6gXrG8o5xfBw}{192.168.201.103}{192.168.201.103:9300}{data=false, master=true},{data_node_145}{dF-4DvGlT22v2vI68PrIkQ}{192.168.201.145}{192.168.201.145:9300}{master=false},{data_node_146}{TDPHZWaRRm-lTfM2EM3bPQ}{192.168.201.146}{192.168.201.146:9300}{master=false},{master_node_101}{eqxEaFh8TeqR4VmhKPdT_g}{192.168.201.101}{192.168.201.101:9300}{data=false, master=true},{data_node_147}{-evmqt_nSV-RSeXTZ2w15w}{192.168.201.147}{192.168.201.147:9300}{master=false},}
[2016-07-29 00:05:43,974][INFO ][cluster.service ] [data_node_144] removed {{master_node_200}{KPQpU4cdRyiqCT488ZFglg}{192.168.201.200}{192.168.201.200:9300}{data=false, master=true},}, reason: zen-disco-master_failed ({master_node_200}{KPQpU4cdRyiqCT488ZFglg}{192.168.201.200}{192.168.201.200:9300}{data=false, master=true})
[2016-07-29 00:05:43,974][WARN ][cluster.service ] [data_node_144] failed to notify ClusterStateListener
java.lang.IllegalStateException: master not available when registering auto-generated license
at org.elasticsearch.license.plugin.core.LicensesService.requestTrialLicense(LicensesService.java:749)
at org.elasticsearch.license.plugin.core.LicensesService.clusterChanged(LicensesService.java:483)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
and the config of the data node:
cluster.name: es_cluster
node.name: data_node_144
node.master: false
node.data: true
path.data: /data/es/data
path.logs: /data/es/logs
network.host: 0.0.0.0
network.publish_host: 0.0.0.0
gateway.recover_after_nodes: 2
discovery.zen.ping_timeout: 60s
discovery.zen.ping.unicast.hosts: ["192.168.201.200","192.168.201.101","192.168.201.102","192.168.201.1
03","192.168.201.144","192.168.201.145","192.168.201.146","192.168.201.147","192.168.201.148" ]
discovery.zen.minimum_master_nodes: 3
and the config of the master node:
cluster.name: es_cluster
node.name: master_node_200
node.master: true
node.data: false
path.data: /data/es/data
path.logs: /data/es/logs
network.host: 0.0.0.0
network.publish_host: 0.0.0.0
gateway.recover_after_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping_timeout: 60s
discovery.zen.ping.unicast.hosts: ["192.168.201.200","192.168.201.101","192.168.201.102","192.168.201.103" ]
discovery.zen.minimum_master_nodes: 3
bootstrap.mlockall: true
It's weird that the master node can find the data node.
PS. The master nodes are "192.168.201.200", "192.168.201.101", "192.168.201.102", "192.168.201.103"; the others are data nodes. ES version: 2.3.2. OpenJDK version: 1.8.0_101.
The license plugin needs to be installed on all the nodes in the cluster, including the master-eligible nodes.
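For ES 2.x, a minimal sketch of installing it on each node (run on every node and then restart the node; the path assumes a standard package installation layout):

cd /usr/share/elasticsearch
bin/plugin install license
# if the cluster also runs Shield/Watcher, those plugins must be present
# on every node as well, e.g.:
bin/plugin install shield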
If the license expires, the cluster can still be used, but with some limitations:
watcher
PUT / GET watch APIs are disabled; the DELETE watch API continues to work
Watches execute and write to the history
The actions of the watches don't execute
shield
Cluster health, cluster stats and indices stats operations are blocked
All data operations (read and write) continue to work
There are also the limitations listed here.
But, mainly, the ES cluster will keep working.

Elasticsearch Debugging

Our Elasticsearch cluster is a mess. The cluster health is always red, and I've decided to look into it and salvage it if possible. But I have no idea where to begin. Here is some info regarding our cluster:
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 91,
  "active_shards" : 91,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 201,
  "number_of_pending_tasks" : 0
}
The 6 nodes:
host ip heap.percent ram.percent load node.role master name
es04e.p.comp.net 10.0.22.63 30 22 0.00 d m es04e-es
es06e.p.comp.net 10.0.21.98 20 15 0.37 d m es06e-es
es08e.p.comp.net 10.0.23.198 9 44 0.07 d * es08e-es
es09e.p.comp.net 10.0.32.233 62 45 0.00 d m es09e-es
es05e.p.comp.net 10.0.65.140 18 14 0.00 d m es05e-es
es07e.p.comp.net 10.0.11.69 52 45 0.13 d m es07e-es
Straight away you can see I have a very large number of unassigned shards (201). I came across this answer, tried it, and got 'acknowledged:true', but there was no change in either of the above sets of info.
Next I logged into one of the nodes, es04, and went through the log files. The first log file has a few lines that caught my attention:
[2015-05-21 19:44:51,561][WARN ][transport.netty ] [es04e-es] exception caught on transport layer [[id: 0xbceea4eb]], closing connection
and
[2015-05-26 15:14:43,157][INFO ][cluster.service ] [es04e-es] removed {[es03e-es][R8sz5RWNSoiJ2zm7oZV_xg][es03e.p.sojern.net][inet[/10.0.2.16:9300]],}, reason: zen-disco-receive(from master [[es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]]])
[2015-05-26 15:22:28,721][INFO ][cluster.service ] [es04e-es] removed {[es02e-es][XZ5TErowQfqP40PbR-qTDg][es02e.p.sojern.net][inet[/10.0.2.229:9300]],}, reason: zen-disco-receive(from master [[es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]]])
[2015-05-26 15:32:00,448][INFO ][discovery.ec2 ] [es04e-es] master_left [[es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]]], reason [shut_down]
[2015-05-26 15:32:00,449][WARN ][discovery.ec2 ] [es04e-es] master left (reason = shut_down), current nodes: {[es07e-es][etJN3eOySAydsIi15sqkSQ][es07e.p.sojern.net][inet[/10.0.2.69:9300]],[es04e-es][3KFMUFvzR_CzWRddIMdpBg][es04e.p.sojern.net][inet[/10.0.1.63:9300]],[es05e-es][ZoLnYvAdTcGIhbcFRI3H_A][es05e.p.sojern.net][inet[/10.0.1.140:9300]],[es08e-es][FPa4q07qRg-YA7hAztUj2w][es08e.p.sojern.net][inet[/10.0.2.198:9300]],[es09e-es][4q6eACbOQv-TgEG0-Bye6w][es09e.p.sojern.net][inet[/10.0.2.233:9300]],[es06e-es][zJ17K040Rmiyjf2F8kjIiQ][es06e.p.sojern.net][inet[/10.0.1.98:9300]],}
[2015-05-26 15:32:00,450][INFO ][cluster.service ] [es04e-es] removed {[es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]],}, reason: zen-disco-master_failed ([es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]])
[2015-05-26 15:32:36,741][INFO ][cluster.service ] [es04e-es] new_master [es04e-es][3KFMUFvzR_CzWRddIMdpBg][es04e.p.sojern.net][inet[/10.0.1.63:9300]], reason: zen-disco-join (elected_as_master)
In this section I realized there were a few nodes, es01, es02 and es03, which had been removed.
After this, all log files (around 30 of them) have only one line:
[2015-05-26 15:43:49,971][DEBUG][action.bulk ] [es04e-es] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
I have checked all the nodes and they have the same version of ES and Logstash. I realize this is a big, complicated issue, but if anyone can spot the problem and nudge me in the right direction it will be a HUGE help.
I believe this might be because at some point you had a split-brain issue, and two versions of the same shard existed in two clusters. One or both might have received different sets of data, so two divergent copies of the shard came into existence. At some point you might have restarted the whole system, and some shards went into the red state.
First, see if there is data loss; if there is, the case described above could be the reason. Next, make sure you set minimum master nodes to N/2+1 (N is the number of master-eligible nodes), so that this issue won't surface again.
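A minimal sketch of that setting for this cluster (all six nodes in the node listing above are master-eligible, so 6/2+1 = 4); it goes in elasticsearch.yml on every node and only applies to pre-7.x versions, where quorum is not yet managed automatically:

# elasticsearch.yml on every node
discovery.zen.minimum_master_nodes: 4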
You can use the shard reroute API on the red shards and see if they move out of the red state. You might lose the shard data here, but that is the only way I have seen to bring the cluster state back to green.
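A sketch of that reroute call for this ES generation (the index name and shard number are placeholders; allow_primary accepts losing whatever data was in that shard copy):

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "allocate": {
        "index": "my-index",
        "shard": 0,
        "node": "es04e-es",
        "allow_primary": true
      }
    }
  ]
}'

The unassigned shards to feed into this can be listed with curl 'localhost:9200/_cat/shards' and filtering for UNASSIGNED.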
Please try installing the elasticsearch-head plugin to check the shard status; you will be able to see which shards are corrupted.
Try the flush or optimize option.
Also, restarting Elasticsearch sometimes works.
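For this ES generation, a sketch of installing the head site plugin on one node and opening its UI (the node address is a placeholder; on 1.x the flag form bin/plugin -install is used instead):

bin/plugin install mobz/elasticsearch-head
# then browse to http://<node>:9200/_plugin/head/ to see per-shard state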

Resources