Istio + Elasticsearch

I'm experimenting with Istio and Elasticsearch; I have VirtualBox machines on my laptop.
I installed Elasticsearch from this link: Kubernetes Elasticsearch Cluster.
I have one ES master and one ES data node; if I deploy them without Istio, they run normally.
If I inject them with the Istio sidecar, the data node cannot communicate with the master (it does not find it).
root@node1:/home/arkan# k get all
NAME                             READY   STATUS    RESTARTS   AGE
pod/es-data-6fdbcf956f-fdnc7     1/2     Running   1          1m
pod/es-master-6b6d5fd59b-86qpb   2/2     Running   0          1m

NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/elasticsearch             NodePort    10.108.28.225   <none>        9200:32721/TCP   1m
service/elasticsearch-discovery   ClusterIP   None            <none>        9300/TCP         1m
service/kubernetes                ClusterIP   10.96.0.1       <none>        443/TCP          22h

NAME                        DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/es-data     1         1         1            0           1m
deployment.apps/es-master   1         1         1            1           1m

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/es-data-6fdbcf956f     1         1         0       1m
replicaset.apps/es-master-6b6d5fd59b   1         1         1       1m
root@node1:/home/arkan# k describe pod/es-data-6fdbcf956f-fdnc7
Name: es-data-6fdbcf956f-fdnc7
Namespace: default
Node: node2/192.168.0.214
Start Time: Wed, 18 Jul 2018 21:42:50 +0300
Labels: component=elasticsearch
pod-template-hash=2986795129
role=data
Annotations: sidecar.istio.io/status={"version":"55c9e544b52e1d4e45d18a58d0b34ba4b72531e45fb6d1572c77191422556ffc","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs...
Status: Running
IP: 172.16.104.22
Controlled By: ReplicaSet/es-data-6fdbcf956f
Init Containers:
init-sysctl:
Container ID: docker://c510035d1e823d134ad287116ef43332255758cce60cc1216ed20282b0b55e76
Image: busybox:1.27.2
Image ID: docker-pullable://busybox@sha256:bbc3a03235220b170ba48a157dd097dd1379299370e1ed99ce976df0355d24f0
Port: <none>
Host Port: <none>
Command:
sysctl
-w
vm.max_map_count=262144
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 18 Jul 2018 21:42:52 +0300
Finished: Wed, 18 Jul 2018 21:42:52 +0300
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-mkdlq (ro)
istio-init:
Container ID: docker://42a8f5da07834533dcd4c26155fb344fa41edecb744e6a4c14c54c40610a450b
Image: docker.io/istio/proxy_init:0.8.0
Image ID: docker-pullable://istio/proxy_init@sha256:b0b288ee8270e054442abdd413da9395e2af39fed1792b85ec157700ef2c192f
Port: <none>
Host Port: <none>
Args:
-p
15001
-u
1337
-m
REDIRECT
-i
*
-x
-b
9200, 9300,
-d
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 18 Jul 2018 21:42:53 +0300
Finished: Wed, 18 Jul 2018 21:42:53 +0300
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-mkdlq (ro)
Containers:
es-data:
Container ID: docker://505269e9be09e83d672b91a582afd7569b3afd794bbfab764d50d75e7a3f7309
Image: quay.io/pires/docker-elasticsearch-kubernetes:6.3.0
Image ID: docker-pullable://quay.io/pires/docker-elasticsearch-kubernetes@sha256:dcd3e9db3d2c6b9a448d135aebcacac30a4cca655d42efaa115aa57405cd22f3
Ports: 9200/TCP, 9300/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Wed, 18 Jul 2018 21:46:08 +0300
Last State: Terminated
Reason: Error
Exit Code: 143
Started: Wed, 18 Jul 2018 21:45:18 +0300
Finished: Wed, 18 Jul 2018 21:46:07 +0300
Ready: False
Restart Count: 4
Limits:
cpu: 1
Requests:
cpu: 250m
Liveness: tcp-socket :transport delay=20s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/_cluster/health delay=20s timeout=5s period=10s #success=1 #failure=3
Environment:
NAMESPACE: default (v1:metadata.namespace)
NODE_NAME: es-data-6fdbcf956f-fdnc7 (v1:metadata.name)
CLUSTER_NAME: myesdb
NODE_MASTER: false
NODE_INGEST: false
HTTP_ENABLE: true
ES_JAVA_OPTS: -Xms256m -Xmx256m
PROCESSORS: 1 (limits.cpu)
Mounts:
/data from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-mkdlq (ro)
istio-proxy:
Container ID: docker://1bf8e167ece0ac6282c336a6630c292013e36721ba027e6e8b5bb71a4bf65a25
Image: docker.io/istio/proxyv2:0.8.0
Image ID: docker-pullable://istio/proxyv2@sha256:1930f0603321b1917b2249c576ecb4141aaceeaae5fcc0760b6a88dc88daea3e
Port: <none>
Host Port: <none>
Args:
proxy
sidecar
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
istio-proxy
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system:15007
--discoveryRefreshDelay
10s
--zipkinAddress
zipkin.istio-system:9411
--connectTimeout
10s
--statsdUdpAddress
istio-statsd-prom-bridge.istio-system:9125
--proxyAdminPort
15000
--controlPlaneAuthPolicy
NONE
State: Running
Started: Wed, 18 Jul 2018 21:46:37 +0300
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 18 Jul 2018 21:42:55 +0300
Finished: Wed, 18 Jul 2018 21:46:36 +0300
Ready: True
Restart Count: 1
Requests:
cpu: 100m
memory: 128Mi
Environment:
POD_NAME: es-data-6fdbcf956f-fdnc7 (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
ISTIO_META_POD_NAME: es-data-6fdbcf956f-fdnc7 (v1:metadata.name)
ISTIO_META_INTERCEPTION_MODE: REDIRECT
Mounts:
/etc/certs/ from istio-certs (ro)
/etc/istio/proxy from istio-envoy (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-mkdlq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
storage:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
istio-certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio.default
Optional: true
default-token-mkdlq:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-mkdlq
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m default-scheduler Successfully assigned default/es-data-6fdbcf956f-fdnc7 to node2
Normal Pulled 5m kubelet, node2 Container image "busybox:1.27.2" already present on machine
Normal Created 5m kubelet, node2 Created container
Normal Started 5m kubelet, node2 Started container
Normal Started 5m kubelet, node2 Started container
Normal Pulled 5m kubelet, node2 Container image "docker.io/istio/proxy_init:0.8.0" already present on machine
Normal Created 5m kubelet, node2 Created container
Normal Pulled 5m kubelet, node2 Container image "docker.io/istio/proxyv2:0.8.0" already present on machine
Normal Created 5m kubelet, node2 Created container
Normal Started 5m kubelet, node2 Started container
Warning Unhealthy 5m kubelet, node2 Liveness probe failed: dial tcp 172.16.104.22:9300: connect: invalid argument
Warning Unhealthy 5m kubelet, node2 Readiness probe failed: Get http://172.16.104.22:9200/_cluster/health: dial tcp 172.16.104.22:9200: connect: invalid argument
Normal Pulled 5m (x2 over 5m) kubelet, node2 Container image "quay.io/pires/docker-elasticsearch-kubernetes:6.3.0" already present on machine
Normal Created 5m (x2 over 5m) kubelet, node2 Created container
Normal Killing 5m kubelet, node2 Killing container with id docker://es-data:Container failed liveness probe.. Container will be killed and recreated.
Normal Started 5m (x2 over 5m) kubelet, node2 Started container
Warning Unhealthy 4m (x3 over 5m) kubelet, node2 Readiness probe failed: Get http://172.16.104.22:9200/_cluster/health: dial tcp 172.16.104.22:9200: connect: connection refused
Warning Unhealthy 4m (x4 over 5m) kubelet, node2 Liveness probe failed: dial tcp 172.16.104.22:9300: connect: connection refused
Warning Unhealthy 51s (x3 over 1m) kubelet, node2 (combined from similar events): Readiness probe failed: Get http://172.16.104.22:9200/_cluster/health: EOF
root@node1:/home/arkan# k logs pod/es-data-6fdbcf956f-fdnc7 -c es-data
[2018-07-18T18:46:13,037][INFO ][o.e.n.Node ] [es-data-6fdbcf956f-fdnc7] initializing ...
[2018-07-18T18:46:13,267][INFO ][o.e.e.NodeEnvironment ] [es-data-6fdbcf956f-fdnc7] using [1] data paths, mounts [[/data (/dev/mapper/node1--vg-root)]], net usable_space [13.8gb], net total_space [27.9gb], types [ext4]
[2018-07-18T18:46:13,270][INFO ][o.e.e.NodeEnvironment ] [es-data-6fdbcf956f-fdnc7] heap size [247.5mb], compressed ordinary object pointers [true]
[2018-07-18T18:46:13,272][INFO ][o.e.n.Node ] [es-data-6fdbcf956f-fdnc7] node name [es-data-6fdbcf956f-fdnc7], node ID [ymKMhUIxRq-hbrmrqzayCQ]
[2018-07-18T18:46:13,272][INFO ][o.e.n.Node ] [es-data-6fdbcf956f-fdnc7] version[6.3.0], pid[1], build[default/tar/424e937/2018-06-11T23:38:03.357887Z], OS[Linux/4.4.0-128-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_151/25.151-b12]
[2018-07-18T18:46:13,273][INFO ][o.e.n.Node ] [es-data-6fdbcf956f-fdnc7] JVM arguments [-XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+DisableExplicitGC, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Xms256m, -Xmx256m, -Des.path.home=/elasticsearch, -Des.path.conf=/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=tar]
[2018-07-18T18:46:15,832][WARN ][o.e.d.c.s.Settings ] [http.enabled] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version.
[2018-07-18T18:46:18,327][INFO ][o.e.p.PluginsService ] [es-data-6fdbcf956f-fdnc7] loaded module [aggs-matrix-stats]
[2018-07-18T18:46:18,331][INFO ][o.e.p.PluginsService ] [es-data-6fdbcf956f-fdnc7] loaded module [analysis-common]
[2018-07-18T18:46:18,331][INFO ][o.e.p.PluginsService ] [es-data-6fdbcf956f-fdnc7] loaded module [ingest-common]
[2018-07-18T18:46:18,332][INFO ][o.e.p.PluginsService ] [es-data-6fdbcf956f-fdnc7] loaded module [lang-expression]
[2018-07-18T18:46:18,332][INFO ][o.e.p.PluginsService ] [es-data-6fdbcf956f-fdnc7] loaded module [lang-mustache]
[2018-07-18T18:46:18,337][INFO ][o.e.p.PluginsService ] [es-data-6fdbcf956f-fdnc7] no plugins loaded
[2018-07-18T18:46:26,419][INFO ][o.e.x.s.a.s.FileRolesStore] [es-data-6fdbcf956f-fdnc7] parsed [0] roles from file [/elasticsearch/config/roles.yml]
[2018-07-18T18:46:28,422][INFO ][o.e.d.DiscoveryModule ] [es-data-6fdbcf956f-fdnc7] using discovery type [zen]
[2018-07-18T18:46:30,218][INFO ][o.e.n.Node ] [es-data-6fdbcf956f-fdnc7] initialized
[2018-07-18T18:46:30,218][INFO ][o.e.n.Node ] [es-data-6fdbcf956f-fdnc7] starting ...
[2018-07-18T18:46:30,630][INFO ][o.e.t.TransportService ] [es-data-6fdbcf956f-fdnc7] publish_address {172.16.104.22:9300}, bound_addresses {172.16.104.22:9300}
[2018-07-18T18:46:30,701][INFO ][o.e.b.BootstrapChecks ] [es-data-6fdbcf956f-fdnc7] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2018-07-18T18:46:33,802][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:46:36,803][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:46:39,805][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:46:43,830][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59678, remoteAddress=null}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:46:44,818][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59684, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:46:45,810][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:46:45,902][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59690, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:46:48,812][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:46:48,900][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59714, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:46:49,899][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59720, remoteAddress=null}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:46:51,815][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:46:51,901][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59730, remoteAddress=null}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:46:52,900][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59734, remoteAddress=null}]
[es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:46:59,897][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59782, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:47:00,781][WARN ][o.e.n.Node ] [es-data-6fdbcf956f-fdnc7] timed out while waiting for initial discovery state - timeout: 30s
[2018-07-18T18:47:00,801][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [es-data-6fdbcf956f-fdnc7] publish_address {172.16.104.22:9200}, bound_addresses {172.16.104.22:9200}
[2018-07-18T18:47:00,803][INFO ][o.e.n.Node ] [es-data-6fdbcf956f-fdnc7] started
[2018-07-18T18:47:00,821][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:47:02,896][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59796, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:47:03,822][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:47:06,823][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:47:15,897][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59880, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:47:18,832][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:47:21,835][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:47:24,837][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:47:24,841][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59936, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:47:27,838][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:47:30,840][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:47:30,897][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59976, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:47:33,844][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:47:33,898][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:59992, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:47:36,847][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:47:51,854][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:47:51,897][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:60110, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:47:54,857][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:47:57,858][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:48:06,897][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:60202, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:48:09,868][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:48:39,886][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:48:39,897][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:60418, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:49:15,909][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:49:15,997][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [es-data-6fdbcf956f-fdnc7] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:60640, remoteAddress=elasticsearch-discovery/172.16.166.167:9300}]
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-07-18T18:49:18,910][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:49:21,912][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:49:24,913][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2018-07-18T18:49:27,915][WARN ][o.e.d.z.ZenDiscovery ] [es-data-6fdbcf956f-fdnc7] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again

Elasticsearch expects to be able to talk to pod IPs directly, so consider disabling the sidecar for the Elasticsearch pods.
https://istio.io/docs/setup/kubernetes/additional-setup/sidecar-injection/#policy documents how to use the sidecar.istio.io/inject: "false" annotation.
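For example, a minimal sketch of that annotation on the es-data Deployment's pod template (everything except the annotation itself is ordinary Deployment boilerplate; adapt to your actual manifests):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: es-data
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"  # skip Envoy sidecar injection for these pods
      labels:
        component: elasticsearch
        role: data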

You can also add a namespace (-n logging) to your deploy script. For example:
kubectl create -f es-discovery-svc.yaml -n logging
...
Istio sidecar auto-injection only works on the default namespace by default.
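To confirm which namespaces have injection enabled, you can list the istio-injection label (assuming the standard label used by Istio's automatic injection webhook):
kubectl get namespace -L istio-injection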

Related

Elasticsearch Received plaintext traffic on an encrypted channel, closing connection Netty4TcpChannel

I have just downloaded Elasticsearch and run elasticsearch.bat.
I didn't modify anything, but when I try to access localhost:9200 or 9300, it is not working.
According to the logs, it started OK.
[2022-03-14T16:42:47,633][INFO ][o.e.i.r.RecoverySettings ] [DESKTOP-3DPA0JQ] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]
[2022-03-14T16:42:47,664][INFO ][o.e.d.DiscoveryModule ] [DESKTOP-3DPA0JQ] using discovery type [multi-node] and seed hosts providers [settings]
[2022-03-14T16:42:48,507][INFO ][o.e.n.Node ] [DESKTOP-3DPA0JQ] initialized
[2022-03-14T16:42:48,508][INFO ][o.e.n.Node ] [DESKTOP-3DPA0JQ] starting ...
[2022-03-14T16:42:48,564][INFO ][o.e.x.s.c.f.PersistentCache] [DESKTOP-3DPA0JQ] persistent cache index loaded
[2022-03-14T16:42:48,565][INFO ][o.e.x.d.l.DeprecationIndexingComponent] [DESKTOP-3DPA0JQ] deprecation component started
[2022-03-14T16:42:48,692][INFO ][o.e.t.TransportService ] [DESKTOP-3DPA0JQ] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
[2022-03-14T16:42:49,065][INFO ][o.e.c.c.Coordinator ] [DESKTOP-3DPA0JQ] cluster UUID [M7j_3np8QtCiMDZ8hLGu6w]
[2022-03-14T16:42:49,157][INFO ][o.e.c.s.MasterService ] [DESKTOP-3DPA0JQ] elected-as-master ([1] nodes joined)[{DESKTOP-3DPA0JQ}{n3yQhC4cQveWn_x7QrQPYQ}{QSgY7a2zQDWZClJOW_2yEg}{127.0.0.1}{127.0.0.1:9300}{cdfhilmrstw} completing election, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 10, version: 142, delta: master node changed {previous [], current [{DESKTOP-3DPA0JQ}{n3yQhC4cQveWn_x7QrQPYQ}{QSgY7a2zQDWZClJOW_2yEg}{127.0.0.1}{127.0.0.1:9300}{cdfhilmrstw}]}
[2022-03-14T16:42:49,269][INFO ][o.e.c.s.ClusterApplierService] [DESKTOP-3DPA0JQ] master node changed {previous [], current [{DESKTOP-3DPA0JQ}{n3yQhC4cQveWn_x7QrQPYQ}{QSgY7a2zQDWZClJOW_2yEg}{127.0.0.1}{127.0.0.1:9300}{cdfhilmrstw}]}, term: 10, version: 142, reason: Publication{term=10, version=142}
[2022-03-14T16:42:49,326][INFO ][o.e.h.AbstractHttpServerTransport] [DESKTOP-3DPA0JQ] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}, {[::1]:9200}
[2022-03-14T16:42:49,327][INFO ][o.e.n.Node ] [DESKTOP-3DPA0JQ] started
[2022-03-14T16:42:49,379][INFO ][o.e.l.LicenseService ] [DESKTOP-3DPA0JQ] license [f997c03d-7240-4ecf-be38-65f043eea771] mode [basic] - valid
[2022-03-14T16:42:49,380][INFO ][o.e.x.s.a.Realms ] [DESKTOP-3DPA0JQ] license mode is [basic], currently licensed security realms are [reserved/reserved,file/default_file,native/default_native]
[2022-03-14T16:42:49,386][INFO ][o.e.g.GatewayService ] [DESKTOP-3DPA0JQ] recovered [2] indices into cluster_state
[2022-03-14T16:42:49,880][INFO ][o.e.c.r.a.AllocationService] [DESKTOP-3DPA0JQ] current.health="GREEN" message="Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.geoip_databases][0], [.security-7][0]]])." previous.health="RED" reason="shards started [[.geoip_databases][0], [.security-7][0]]"
[2022-03-14T16:42:50,142][INFO ][o.e.i.g.DatabaseNodeService] [DESKTOP-3DPA0JQ] successfully loaded geoip database file [GeoLite2-Country.mmdb]
[2022-03-14T16:42:50,155][INFO ][o.e.i.g.DatabaseNodeService] [DESKTOP-3DPA0JQ] successfully loaded geoip database file [GeoLite2-ASN.mmdb]
[2022-03-14T16:42:51,002][INFO ][o.e.i.g.DatabaseNodeService] [DESKTOP-3DPA0JQ] successfully loaded geoip database file [GeoLite2-City.mmdb]
[2022-03-14T16:42:54,067][WARN ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [DESKTOP-3DPA0JQ] received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/[0:0:0:0:0:0:0:1]:9200, remoteAddress=/[0:0:0:0:0:0:0:1]:64318}
[2022-03-14T16:42:54,067][WARN ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [DESKTOP-3DPA0JQ] received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/[0:0:0:0:0:0:0:1]:9200, remoteAddress=/[0:0:0:0:0:0:0:1]:64319}
[2022-03-14T16:42:54,068][WARN ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [DESKTOP-3DPA0JQ] received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/[0:0:0:0:0:0:0:1]:9200, remoteAddress=/[0:0:0:0:0:0:0:1]:64320}
[2022-03-14T16:42:55,104][WARN ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [DESKTOP-3DPA0JQ] received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/[0:0:0:0:0:0:0:1]:9200, remoteAddress=/[0:0:0:0:0:0:0:1]:64321}
In the latest version (ES 8), security (i.e. SSL/TLS) is on by default.
If you're accessing it from the browser, just use https instead of http:
https://localhost:9200
Alternatively, edit elasticsearch\config\elasticsearch.yml to disable security:
xpack.security.enabled: false
xpack.security.enrollment.enabled: false
xpack.security.http.ssl:
  enabled: false
  keystore.path: certs/http.p12
xpack.security.transport.ssl:
  enabled: false
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
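As a quick check over HTTPS (a sketch: -k skips verification of the self-signed certificate, and the elastic user's password is whatever was generated at install time):
curl -k -u elastic https://localhost:9200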

URL to access cluster environment for ElasticSearch2.4.3

We have an Elasticsearch 2.4.3 cluster environment of two nodes. What URL should I provide to access the environment so that it works with high availability?
We have two master-eligible nodes, Node1 and Node2. The host name for Node1 is node1.elastic.com and for Node2 it is node2.elastic.com. Both nodes are master-eligible per the (n/2 + 1) quorum formula.
We have enabled clustering by adding discovery.zen.ping.unicast.hosts for the two nodes in the elasticsearch.yml file.
From our Java application we connect to node1.elastic.com. It works fine while both nodes are up: data is populated on both ES servers and everything is good. But as soon as Node1 goes down, the entire Elasticsearch cluster becomes unreachable, and requests do not automatically switch to Node2.
I feel the URL I am giving is not right, and it has to be something else to provide an automatic switch.
Logs from Node1
[2020-02-10 12:15:45,639][INFO ][node ] [Wildpride] initialized
[2020-02-10 12:15:45,639][INFO ][node ] [Wildpride] starting ...
[2020-02-10 12:15:45,769][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:15:45,783][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:15:45,784][INFO ][transport ] [Wildpride] publish_address {000.00.00.204:9300}, bound_addresses {[fe80::9af2:b3ff:fee9:90ca]:9300}, {000.00.00.204:9300}
[2020-02-10 12:15:45,788][INFO ][discovery ] [Wildpride] XXXX/Hg_5eGZIS0e249KUTQqPPg
[2020-02-10 12:16:15,790][WARN ][discovery ] [Wildpride] waited for 30s and no initial state was set by the discovery
[2020-02-10 12:16:15,799][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:16:15,802][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:16:15,803][INFO ][http ] [Wildpride] publish_address {000.00.00.204:9200}, bound_addresses {[fe80::9af2:b3ff:fee9:90ca]:9200}, {000.00.00.204:9200}
[2020-02-10 12:16:15,803][INFO ][node ] [Wildpride] started
[2020-02-10 12:16:35,552][INFO ][node ] [Wildpride] stopping ...
[2020-02-10 12:16:35,619][WARN ][discovery.zen.ping.unicast] [Wildpride] [17] failed send ping to {#zen_unicast_1#}{000.00.00.206}{000.00.00.206:9300}
java.lang.IllegalStateException: can't add nodes to a stopped transport
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:916)
at org.elasticsearch.transport.netty.NettyTransport.connectToNodeLight(NettyTransport.java:906)
at org.elasticsearch.transport.TransportService.connectToNodeLight(TransportService.java:267)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:395)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2020-02-10 12:16:35,620][WARN ][discovery.zen.ping.unicast] [Wildpride] failed to send ping to [{Wildpride}{Hg_5eGZIS0e249KUTQqPPg}{000.00.00.204}{000.00.00.204:9300}]
SendRequestTransportException[[Wildpride][000.00.00.204:9300][internal:discovery/zen/unicast]]; nested: TransportException[TransportService is closed stopped can't send request];
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:340)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPingRequestToNode(UnicastZenPing.java:440)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:426)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$2.doRun(UnicastZenPing.java:249)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: TransportException[TransportService is closed stopped can't send request]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:320)
... 7 more
[2020-02-10 12:16:35,623][INFO ][node ] [Wildpride] stopped
[2020-02-10 12:16:35,623][INFO ][node ] [Wildpride] closing ...
[2020-02-10 12:16:35,642][INFO ][node ] [Wildpride] closed
TL;DR: There is no automatic switch in Elasticsearch, and you'll need some kind of load balancer in front of the Elasticsearch cluster.
For an HA setup you need at least 3 master-eligible nodes, and in front of the cluster there has to be a load balancer (itself HA) to distribute the requests across the cluster. Alternatively, the client needs to be cluster-aware and, in a failure scenario, fail over to whichever nodes are left.
If you go with 2 master nodes, the cluster can get into a "split brain" state. If your network somehow gets fragmented and the nodes become invisible to each other, each will think it is the last one working and keep serving read/write requests independently. In that way they drift apart, and even once the fragmentation is gone it is a lot of trouble to fix. With 3 nodes, in a fragmentation scenario the cluster will only continue to serve requests if at least 2 nodes are visible to each other.
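A minimal elasticsearch.yml sketch of the recommended layout, repeated on each of three master-eligible nodes (node3.elastic.com is the extra node this answer suggests adding; with zen discovery, minimum_master_nodes should be 3/2 + 1 = 2):
node.master: true
discovery.zen.ping.unicast.hosts: ["node1.elastic.com", "node2.elastic.com", "node3.elastic.com"]
discovery.zen.minimum_master_nodes: 2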

Elasticsearch: 'timed out waiting for all nodes to process published state' and cluster unavailability

I am setting up a 3-node Elasticsearch cluster with Docker. This is my docker-compose file:
version: '2.0'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.3.0
    environment:
      - cluster.name=test-cluster
      - node.name=elastic_1
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=2
      - discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - test_es_cluster_data:/usr/share/elasticsearch/data
    networks:
      - esnet
  elasticsearch2:
    extends:
      file: ./docker-compose.yml
      service: elasticsearch
    environment:
      - node.name=elastic_2
    volumes:
      - test_es_cluster2_data:/usr/share/elasticsearch/data
  elasticsearch3:
    extends:
      file: ./docker-compose.yml
      service: elasticsearch
    environment:
      - node.name=elastic_3
    volumes:
      - test_es_cluster3_data:/usr/share/elasticsearch/data
volumes:
  test_es_cluster_data:
  test_es_cluster2_data:
  test_es_cluster3_data:
networks:
  esnet:
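(For reference, a sketch of bringing this cluster up and tailing its logs with standard docker-compose commands:)
docker-compose up -d
docker-compose logs -f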
Once the cluster is up, I kill the master (elastic_1) to test failover. I expect a new master to be elected, while the cluster keeps responding to read requests the whole time.
Well, a master is elected, but the cluster does not respond for a pretty long time (~45 s).
Below are the logs from elastic_2 and elastic_3 after the master is stopped (docker stop escluster_elasticsearch_1):
elastic_2:
...
[2018-07-04T14:47:04,495][INFO ][o.e.d.z.ZenDiscovery ] [elastic_2] master_left [{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}], reason [shut_down]
...
[2018-07-04T14:47:04,509][WARN ][o.e.c.NodeConnectionsService] [elastic_2] failed to connect to node {elastic_1}{...}{172.24.0.3}{172.24.0.3:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [elastic_1][172.24.0.3:9300] connect_exception
...
[2018-07-04T14:47:07,565][INFO ][o.e.c.s.ClusterApplierService] [elastic_2] detected_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4]])
[2018-07-04T14:47:35,301][WARN ][r.suppressed ] path: /_cat/health, params: {}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
...
[2018-07-04T14:47:53,933][WARN ][o.e.c.s.ClusterApplierService] [elastic_2] cluster state applier task [apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4]])] took [46.3s] above the warn threshold of 30s
[2018-07-04T14:47:53,934][INFO ][o.e.c.s.ClusterApplierService] [elastic_2] removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [5]])
[2018-07-04T14:47:56,931][WARN ][o.e.t.TransportService ] [elastic_2] Received response for a request that has timed out, sent [48367ms] ago, timed out [18366ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}], id [1035]
elastic_3:
[2018-07-04T14:47:04,494][INFO ][o.e.d.z.ZenDiscovery ] [elastic_3] master_left [{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}], reason [shut_down]
...
[2018-07-04T14:47:04,519][WARN ][o.e.c.NodeConnectionsService] [elastic_3] failed to connect to node {elastic_1}{...}{172.24.0.3}{172.24.0.3:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [elastic_1][172.24.0.3:9300] connect_exception
...
[2018-07-04T14:47:07,550][INFO ][o.e.c.s.MasterService ] [elastic_3] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}
[2018-07-04T14:47:35,026][WARN ][r.suppressed ] path: /_cat/nodes, params: {v=}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
...
[2018-07-04T14:47:37,560][WARN ][o.e.d.z.PublishClusterStateAction] [elastic_3] timed out waiting for all nodes to process published state [4] (timeout [30s], pending nodes: [{elastic_2}{...}{172.24.0.2}{172.24.0.2:9300}])
[2018-07-04T14:47:37,561][INFO ][o.e.c.s.ClusterApplierService] [elastic_3] new_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4] source [zen-disco-elected-as-master ([1] nodes joined)[, ]]])
[2018-07-04T14:47:41,021][WARN ][o.e.c.s.MasterService ] [elastic_3] cluster state update task [zen-disco-elected-as-master ([1] nodes joined)[, ]] took [33.4s] above the warn threshold of 30s
[2018-07-04T14:47:41,022][INFO ][o.e.c.s.MasterService ] [elastic_3] zen-disco-node-failed({elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}), reason(transport disconnected), reason: removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}
[2018-07-04T14:47:56,929][INFO ][o.e.c.s.ClusterApplierService] [elastic_3] removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [5] source [zen-disco-node-failed({elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}), reason(transport disconnected)]])
Why does it take so long for the cluster to stabilize and respond to requests?
It is puzzling that:
a) a new master is elected (elastic_3):
[2018-07-04T14:47:07,550][INFO ] ... [elastic_3] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {elastic_3}...
b) then it is detected by elastic_2:
[2018-07-04T14:47:07,565][INFO ] ... [elastic_2] detected_master {elastic_3}...
c) then the master times out waiting for nodes to process the published state:
[2018-07-04T14:47:37,560][WARN ] ... [elastic_3] timed out waiting for all nodes to process published state [4] (timeout [30s], pending nodes: [{elastic_2}...])
d) elastic_2 applies the cluster state with a warning:
[2018-07-04T14:47:53,933][WARN ] ... [elastic_2] cluster state applier task [apply cluster state (from master [master {elastic_3}...])] took [46.3s] above the warn threshold of 30s
What can cause the timeout in (c)? All of this runs on a local machine (no network issues). Am I missing any configuration?
Meanwhile, requests to both elastic_2 and elastic_3 end up with MasterNotDiscoveredException, although according to the documentation the cluster is expected to respond (https://www.elastic.co/guide/en/elasticsearch/reference/6.3/modules-discovery-zen.html#no-master-block).
Has anyone experienced this? I would appreciate any advice on this issue.
Using docker restart instead of docker stop solves the problem. See: https://discuss.elastic.co/t/timed-out-waiting-for-all-nodes-to-process-published-state-and-cluster-unavailability/138590
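For example (a sketch using the container name from the question):
docker restart escluster_elasticsearch_1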

Elasticsearch Master not discovered exception

I'm running a 5-node Elasticsearch cluster (2 data nodes, 2 master nodes, 1 Kibana).
I'm getting the following error when I use the command
curl -X GET "192.168.107.75:9200/_cat/master?v"
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}
I'm using the following command to run Elasticsearch:
sudo systemctl start elasticsearch.service
This is the message I see in the logs:
[2018-05-28T21:02:22,074][WARN ][o.e.d.z.ZenDiscovery ] [node-master-1] not enough master nodes discovered during pinging (found [[Candidate{node={node-master-1}{kJKYkpdbTKmdIeq-RVnCAQ}{JGbXMxOXR0SyjCu746Zlwg}{192.168.107.75}{192.168.107.75:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2018-05-28T21:02:25,076][WARN ][o.e.d.z.ZenDiscovery ] [node-master-1] not enough master nodes discovered during pinging (found [[Candidate{node={node-master-1}{kJKYkpdbTKmdIeq-RVnCAQ}{JGbXMxOXR0SyjCu746Zlwg}{192.168.107.75}{192.168.107.75:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2018-05-28T21:02:28,077][WARN ][o.e.d.z.ZenDiscovery ] [node-master-1] not enough master nodes discovered during pinging (found [[Candidate{node={node-master-1}{kJKYkpdbTKmdIeq-RVnCAQ}{JGbXMxOXR0SyjCu746Zlwg}{192.168.107.75}{192.168.107.75:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2018-05-28T21:02:31,079][WARN ][o.e.d.z.ZenDiscovery ] [node-master-1] not enough master nodes discovered during pinging (found [[Candidate{node={node-master-1}{kJKYkpdbTKmdIeq-RVnCAQ}{JGbXMxOXR0SyjCu746Zlwg}{192.168.107.75}{192.168.107.75:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2018-05-28T21:02:34,081][WARN ][o.e.d.z.ZenDiscovery ] [node-master-1] not enough master nodes discovered during pinging (found [[Candidate{node={node-master-1}{kJKYkpdbTKmdIeq-RVnCAQ}{JGbXMxOXR0SyjCu746Zlwg}{192.168.107.75}{192.168.107.75:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2018-05-28T21:02:37,084][WARN ][o.e.d.z.ZenDiscovery ] [node-master-1] not enough master nodes discovered during pinging (found [[Candidate{node={node-master-1}{kJKYkpdbTKmdIeq-RVnCAQ}{JGbXMxOXR0SyjCu746Zlwg}{192.168.107.75}{192.168.107.75:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2018-05-28T21:02:40,090][WARN ][o.e.d.z.ZenDiscovery ] [node-master-1] failed to connect to master [{node-master-2}{_M4BTrFbQguT3PbY5d2_JA}{1rzJcDPSQ5OH2OZ_CnhR-g}{192.168.107.76}{192.168.107.76:9300}], retrying...
org.elasticsearch.transport.ConnectTransportException: [node-master-2][192.168.107.76:9300] connect_exception
at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:165) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:616) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:513) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:331) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:515) [elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:483) [elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.discovery.zen.ZenDiscovery.access$2500(ZenDiscovery.java:90) [elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1253) [elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) [elasticsearch-6.2.4.jar:6.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_172]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_172]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: 192.168.107.76/192.168.107.76:9300
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
... 1 more
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
... 1 more
In the elasticsearch.yml file, apart from the config for assigning different roles to the nodes, I'm using the following configuration:
cluster.name: test_cluster
network.host: 192.168.107.71
discovery.zen.ping.unicast.hosts: ["192.168.107.73", "192.168.107.74", "192.168.107.75", "192.168.107.76"]
# the above two configuration IPs change per node
discovery.zen.minimum_master_nodes: 2
The hosts are pingable and have access to each other.
Any help would be much appreciated.
I think the problem is quite clear: [node-master-2][192.168.107.76] either is not accessible from this host, or the Elasticsearch process on node-master-2 is down.
You can check whether curl -XGET "192.168.107.76:9200" from this host returns a valid answer.
Also, the Elasticsearch documentation explicitly says:
It is recommended to avoid having only two master eligible nodes,
since a quorum of two is two. Therefore, a loss of either master
eligible node will result in an inoperable cluster.
This Elasticsearch install guide provides guidance on how to fix master_not_discovered_exception errors. Basically, you can get this error for several reasons:
a) a firewall rule is blocking communication
b) master/data host names cannot be resolved (not your case, as you are using IP addresses)
c) incorrect elasticsearch.yml configuration (e.g. a master node is not configured as a master node, or is running on a different port / IP address)
The first and second items can easily be checked with telnet (from the master, telnet to the data node, and the other way around), as sketched below.
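For example (a sketch using the IPs from the question; substitute your own):
telnet 192.168.107.76 9300
curl -XGET "192.168.107.76:9200"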
