Enterprise Search timeout for Elasticsearch create index

I am using ECK to deploy an Elasticsearch cluster on Kubernetes.
Elasticsearch itself is working fine and the cluster reports green health. But when Enterprise Search starts and begins creating its indices in Elasticsearch, it creates a few of them and then fails with a timeout error.
pv.yaml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elasticsearch-master
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/nfs/kubernetes/elasticsearch/master/
...
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elasticsearch-data
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/nfs/kubernetes/elasticsearch/data/
...
multi_node.yaml
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: bselastic
spec:
  version: 8.1.2
  nodeSets:
  - name: masters
    count: 1
    config:
      node.roles: ["master",
        # "data",
      ]
      xpack.ml.enabled: true
    # A volumeClaimTemplate is needed here; without it the operator reported a
    # missing volume claim and the pod did not start.
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: standard
  - name: data-node
    count: 1
    config:
      node.roles: ["data", "ingest"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: standard
...
---
apiVersion: enterprisesearch.k8s.elastic.co/v1
kind: EnterpriseSearch
metadata:
  name: enterprise-search-bselastic
spec:
  version: 8.1.3
  count: 1
  elasticsearchRef:
    name: bselastic
  podTemplate:
    spec:
      containers:
      - name: enterprise-search
        env:
        - name: JAVA_OPTS
          value: -Xms2g -Xmx2g
        - name: "elasticsearch.startup_retry.interval"
          value: "30"
        - name: allow_es_settings_modification
          value: "true"
...
Apply these manifests with the following command:
kubectl apply -f multi_node.yaml -n deleteme -f pv.yaml
Check the Elasticsearch cluster status:
# kubectl get es -n deleteme
NAME        HEALTH    NODES   VERSION   PHASE             AGE
bselastic   unknown           8.1.2     ApplyingChanges   47s
Check all pods:
# kubectl get pod -n deleteme
NAME                                               READY   STATUS    RESTARTS   AGE
bselastic-es-data-node-0                           0/1     Running   0          87s
bselastic-es-masters-0                             1/1     Running   0          87s
enterprise-search-bselastic-ent-54675f95f8-9sskf   0/1     Running   0          86s
The Elasticsearch cluster becomes green after 7+ minutes:
[root@1175014-kubemaster01 nilesh]# kubectl get es -n deleteme
NAME        HEALTH   NODES   VERSION   PHASE   AGE
bselastic   green    2       8.1.2     Ready   7m30s
Enterprise Search log:
# kubectl -n deleteme logs -f enterprise-search-bselastic-ent-549bbcb9-rnhmc
Custom Enterprise Search configuration file detected, not overwriting it (any settings passed via environment will be ignored)
Found java executable in PATH
Java version detected: 11.0.14.1 (major version: 11)
Enterprise Search is starting...
[2022-04-25T16:34:22.282+00:00][7][2000][app-server][INFO]: Elastic Enterprise Search version=8.1.3, JRuby version=9.2.16.0, Ruby version=2.5.7, Rails version=5.2.6
[2022-04-25T16:34:23.862+00:00][7][2000][app-server][INFO]: Performing pre-flight checks for Elasticsearch running on https://bselastic-es-http.deleteme.svc:9200...
[2022-04-25T16:34:25.308+00:00][7][2000][app-server][WARN]: [pre-flight] Failed to connect to Elasticsearch backend. Make sure it is running and healthy.
[2022-04-25T16:34:25.310+00:00][7][2000][app-server][INFO]: [pre-flight] Error: /usr/share/enterprise-search/lib/war/shared_togo/lib/shared_togo/elasticsearch_checks.class:187: Connection refused (Connection refused) (Faraday::ConnectionFailed)
[2022-04-25T16:34:31.353+00:00][7][2000][app-server][WARN]: [pre-flight] Failed to connect to Elasticsearch backend. Make sure it is running and healthy.
[2022-04-25T16:34:31.355+00:00][7][2000][app-server][INFO]: [pre-flight] Error: /usr/share/enterprise-search/lib/war/shared_togo/lib/shared_togo/elasticsearch_checks.class:187: Connection refused (Connection refused) (Faraday::ConnectionFailed)
[2022-04-25T16:34:37.370+00:00][7][2000][app-server][WARN]: [pre-flight] Failed to connect to Elasticsearch backend. Make sure it is running and healthy.
[2022-04-25T16:34:37.372+00:00][7][2000][app-server][INFO]: [pre-flight] Error: /usr/share/enterprise-search/lib/war/shared_togo/lib/shared_togo/elasticsearch_checks.class:187: Connection refused (Connection refused) (Faraday::ConnectionFailed)
[2022-04-25T16:34:43.384+00:00][7][2000][app-server][WARN]: [pre-flight] Failed to connect to Elasticsearch backend. Make sure it is running and healthy.
[2022-04-25T16:34:43.386+00:00][7][2000][app-server][INFO]: [pre-flight] Error: /usr/share/enterprise-search/lib/war/shared_togo/lib/shared_togo/elasticsearch_checks.class:187: Connection refused (Connection refused) (Faraday::ConnectionFailed)
[2022-04-25T16:34:49.400+00:00][7][2000][app-server][WARN]: [pre-flight] Failed to connect to Elasticsearch backend. Make sure it is running and healthy.
[2022-04-25T16:34:49.401+00:00][7][2000][app-server][INFO]: [pre-flight] Error: /usr/share/enterprise-search/lib/war/shared_togo/lib/shared_togo/elasticsearch_checks.class:187: Connection refused (Connection refused) (Faraday::ConnectionFailed)
[2022-04-25T16:37:56.290+00:00][7][2000][app-server][INFO]: [pre-flight] Elasticsearch cluster is ready
[2022-04-25T16:37:56.292+00:00][7][2000][app-server][INFO]: [pre-flight] Successfully connected to Elasticsearch
[2022-04-25T16:37:56.367+00:00][7][2000][app-server][INFO]: [pre-flight] Successfully loaded Elasticsearch plugin information for all nodes
[2022-04-25T16:37:56.381+00:00][7][2000][app-server][INFO]: [pre-flight] Elasticsearch running with an active basic license
[2022-04-25T16:37:56.423+00:00][7][2000][app-server][INFO]: [pre-flight] Elasticsearch API key service is enabled
[2022-04-25T16:37:56.446+00:00][7][2000][app-server][INFO]: [pre-flight] Elasticsearch will be used for authentication
[2022-04-25T16:37:56.447+00:00][7][2000][app-server][INFO]: Elasticsearch looks healthy and configured correctly to run Enterprise Search
[2022-04-25T16:37:56.452+00:00][7][2000][app-server][INFO]: Performing pre-flight checks for Kibana running on http://localhost:5601...
[2022-04-25T16:37:56.482+00:00][7][2000][app-server][WARN]: [pre-flight] Failed to connect to Kibana backend. Make sure it is running and healthy.
[2022-04-25T16:37:56.486+00:00][7][2000][app-server][ERROR]: Could not connect to Kibana backend after 0 seconds.
[2022-04-25T16:37:56.488+00:00][7][2000][app-server][WARN]: Enterprise Search is unable to connect to Kibana. Ensure it is running at http://localhost:5601 for user deleteme-enterprise-search-bselastic-ent-user.
[2022-04-25T16:37:59.344+00:00][7][2000][app-server][INFO]: Elastic APM agent is disabled
{"timestamp": "2022-04-25T16:38:05+00:00", "message": "readiness probe failed", "curl_rc": "7"}
{"timestamp": "2022-04-25T16:38:06+00:00", "message": "readiness probe failed", "curl_rc": "7"}
{"timestamp": "2022-04-25T16:38:16+00:00", "message": "readiness probe failed", "curl_rc": "7"}
{"timestamp": "2022-04-25T16:38:26+00:00", "message": "readiness probe failed", "curl_rc": "7"}
{"timestamp": "2022-04-25T16:38:36+00:00", "message": "readiness probe failed", "curl_rc": "7"}
[2022-04-25T16:38:43.880+00:00][7][2000][app-server][INFO]: [db_lock] [installation] Status: [Starting] Ensuring migrations tracking index exists
{"timestamp": "2022-04-25T16:38:45+00:00", "message": "readiness probe failed", "curl_rc": "7"}
{"timestamp": "2022-04-25T16:38:56+00:00", "message": "readiness probe failed", "curl_rc": "7"}
[2022-04-25T16:39:05.283+00:00][7][2000][app-server][INFO]: [db_lock] [installation] Status: [Finished] Ensuring migrations tracking index exists
[2022-04-25T16:39:05.782+00:00][7][2000][app-server][INFO]: [db_lock] [installation] Status: [Starting] Creating indices for 38 models
[2022-05-02T16:21:47.303+00:00][8][2000][es][DEBUG]: {
"request": {
"url": "https://bselastic-es-http.deleteme.svc:9200/.ent-search-actastic-oauth_applications_v2",
"method": "put",
"headers": {
"Authorization": "[FILTERED]",
"Content-Type": "application/json",
"x-elastic-product-origin": "enterprise-search",
"User-Agent": "Faraday v1.8.0"
},
"params": null,
"body": "{\"settings\":{\"index\":{\"hidden\":true,\"refresh_interval\":-1},\"number_of_shards\":1,\"auto_expand_replicas\":\"0-3\",\"priority\":250},\"mappings\":{\"dynamic\":\"strict\",\"properties\":{\"id\":{\"type\":\"keyword\"},\"created_at\":{\"type\":\"date\"},\"updated_at\":{\"type\":\"date\"},\"name\":{\"type\":\"keyword\"},\"uid\":{\"type\":\"keyword\"},\"secret\":{\"type\":\"keyword\"},\"redirect_uri\":{\"type\":\"keyword\"},\"scopes\":{\"type\":\"keyword\"},\"confidential\":{\"type\":\"boolean\"},\"app_type\":{\"type\":\"keyword\"}}},\"aliases\":{}}"
},
"exception": "/usr/share/enterprise-search/lib/war/lib/swiftype/es/client.class:28: Read timed out (Faraday::TimeoutError)\n",
"duration": 30042.3,
"stack": [
"lib/actastic/schema.class:172:in `create_index!'",
"lib/actastic/schema.class:195:in `create_index_and_mapping!'",
"shared_togo/lib/shared_togo.class:894:in `block in apply_actastic_migrations'",
"shared_togo/lib/shared_togo.class:892:in `block in each'",
"shared_togo/lib/shared_togo.class:892:in `block in apply_actastic_migrations'",
"lib/db_lock.class:182:in `with_status'",
"shared_togo/lib/shared_togo.class:891:in `apply_actastic_migrations'",
"shared_togo/lib/shared_togo.class:406:in `block in install!'",
"lib/db_lock.class:171:in `with_lock'",
"shared_togo/lib/shared_togo.class:399:in `install!'",
"config/application.class:102:in `block in Application'",
"config/environment.class:9:in `<main>'",
"config/environment.rb:1:in `<main>'",
"shared_togo/lib/shared_togo/cli/command.class:37:in `initialize'",
"shared_togo/lib/shared_togo/cli/command.class:10:in `run_and_exit'",
"shared_togo/lib/shared_togo/cli.class:143:in `run_supported_command'",
"shared_togo/lib/shared_togo/cli.class:125:in `run_command'",
"shared_togo/lib/shared_togo/cli.class:112:in `run!'",
"bin/enterprise-search-internal:15:in `<main>'"
]
}
[2022-04-25T16:55:21.340+00:00][7][2000][app-server][INFO]: [db_lock] [installation] Status: [Failed] Creating indices for 38 models: Error = Faraday::TimeoutError: Read timed out
Unexpected exception while running Enterprise Search:
Error: Read timed out at
Master node logs
# kubectl -n deleteme logs -f bselastic-es-masters-0
Skipping security auto configuration because the configuration file [/usr/share/elasticsearch/config/elasticsearch.yml] is missing or is not a regular file
{"#timestamp":"2022-04-25T16:55:11.051Z", "log.level": "INFO", "current.health":"GREEN","message":"Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.ent-search-actastic-search_relevance_suggestions-document_position_id-unique-constraint][0]]]).","previous.health":"YELLOW","reason":"shards started [[.ent-search-actastic-search_relevance_suggestions-document_position_id-unique-constraint][0]]" , "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[bselastic-es-masters-0][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.routing.allocation.AllocationService","elasticsearch.cluster.uuid":"rnaZmz4kQwOBNbWau43wYA","elasticsearch.node.id":"YMyOM1umSL22ro86II6Ymw","elasticsearch.node.name":"bselastic-es-masters-0","elasticsearch.cluster.name":"bselastic"}
{"#timestamp":"2022-04-25T16:55:21.447Z", "log.level": "WARN", "message":"writing cluster state took [10525ms] which is above the warn threshold of [10s]; [skipped writing] global metadata, wrote metadata for [0] new indices and [1] existing indices, removed metadata for [0] indices and skipped [48] unchanged indices", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[bselastic-es-masters-0][generic][T#5]","log.logger":"org.elasticsearch.gateway.PersistedClusterStateService","elasticsearch.cluster.uuid":"rnaZmz4kQwOBNbWau43wYA","elasticsearch.node.id":"YMyOM1umSL22ro86II6Ymw","elasticsearch.node.name":"bselastic-es-masters-0","elasticsearch.cluster.name":"bselastic"}
{"#timestamp":"2022-04-25T16:55:21.448Z", "log.level": "INFO", "message":"after [10.3s] publication of cluster state version [226] is still waiting for {bselastic-es-masters-0}{YMyOM1umSL22ro86II6Ymw}{ljGkLdk-RAukc9NEJtQCVw}{192.168.88.213}{192.168.88.213:9300}{m}{k8s_node_name=1175027-kubeworker15.sb.rackspace.com, xpack.installed=true} [SENT_APPLY_COMMIT], {bselastic-es-data-node-0}{K88khDyfRwaGCBZwMKEaHA}{g9mXrT4WTumoj09W1OylYA}{192.168.88.214}{192.168.88.214:9300}{di}{k8s_node_name=1175027-kubeworker15.sb.rackspace.com, xpack.installed=true} [SENT_PUBLISH_REQUEST]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[bselastic-es-masters-0][generic][T#1]","log.logger":"org.elasticsearch.cluster.coordination.Coordinator.CoordinatorPublication","elasticsearch.cluster.uuid":"rnaZmz4kQwOBNbWau43wYA","elasticsearch.node.id":"YMyOM1umSL22ro86II6Ymw","elasticsearch.node.name":"bselastic-es-masters-0","elasticsearch.cluster.name":"bselastic"}
Which attribute do we have to set in Enterprise Search to increase the timeout? Or is there a way to get debug logs from Enterprise Search?

You can try to increase the default timeout globally, for example with the Python Elasticsearch client:
es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)
This gives the cluster more time to respond.
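Note that the startup log above says "Custom Enterprise Search configuration file detected, not overwriting it (any settings passed via environment will be ignored)", so settings passed as container environment variables will not take effect. With ECK such settings would normally go under spec.config of the EnterpriseSearch resource instead. A minimal sketch, reusing the keys from the manifest above (the setting names are taken from the question, not verified; check the Enterprise Search configuration docs for your version before relying on them):

apiVersion: enterprisesearch.k8s.elastic.co/v1
kind: EnterpriseSearch
metadata:
  name: enterprise-search-bselastic
spec:
  version: 8.1.3
  count: 1
  elasticsearchRef:
    name: bselastic
  config:
    # keys copied from the question's env section; treat them as assumptions
    # and confirm the exact names/values against the Enterprise Search docs
    elasticsearch.startup_retry.interval: 30
    allow_es_settings_modification: true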

Related

GKE cluster node ends up with CrashLoopBackOff

I have a 3-node setup in GKE, and one of my pods is in CrashLoopBackOff state and is not recovering. The log shows the java.lang.IllegalArgumentException below, but the other 2 pods have no such issue; they are up and running. I'm completely unsure of the cause, can someone help me?
Is the issue a by-product of the install-plugins initContainer in the YAML file?
If yes, why didn't the same problem occur with the other pods?
Exception:
"type": "server", "timestamp": "2022-08-29T19:52:29,743Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "dev", "node.name": "dev-es-data-hot-1", "message": "uncaught exception in thread [main]",
"stacktrace": ["org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: unknown secure setting [dev-es-snapshot-backup-feeb83405c27.json] please check that any required plugins are installed, or check the breaking changes documentation for removed settings",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:157) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112) ~[elasticsearch-cli-7.16.3.jar:7.16.3]",
"at org.elasticsearch.cli.Command.main(Command.java:77) ~[elasticsearch-cli-7.16.3.jar:7.16.3]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:122) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80) ~[elasticsearch-7.16.3.jar:7.16.3]",
"Caused by: java.lang.IllegalArgumentException: unknown secure setting [dev-es-snapshot-backup-feeb83405c27.json] please check that any required plugins are installed, or check the breaking changes documentation for removed settings",
"at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:561) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:507) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:477) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:447) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.common.settings.SettingsModule.<init>(SettingsModule.java:137) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.node.Node.<init>(Node.java:500) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.node.Node.<init>(Node.java:309) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:234) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:234) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:434) ~[elasticsearch-7.16.3.jar:7.16.3]",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:166) ~[elasticsearch-7.16.3.jar:7.16.3]",
"... 6 more"] }
Here is my YAML config:
- name: data-hot-ingest
  count: 3
  config:
    node.roles: ["data_hot", "ingest", "data_content"]
    node.attr.data: hot
    node.store.allow_mmap: false
    xpack.security.authc:
      anonymous:
        username: anon
        roles: monitoring_user
  podTemplate:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: type
                operator: In
                values:
                - hot
      initContainers:
      - name: install-plugins
        command:
        - sh
        - -c
        - |
          bin/elasticsearch-plugin install --batch repository-gcs
      - name: set-virtual-mem
        command:
        - sysctl
        - -w
        - vm.max_map_count=262144
      containers:
      - name: elasticsearch
        resources:
          requests:
            memory: "64Gi"
            cpu: "30000m"
          limits:
            memory: "65Gi"
            cpu: "30000m"
        env:
        - name: ES_JAVA_OPTS
          value: -Xms32g -Xmx32g
        readinessProbe:
          httpGet:
            scheme: HTTPS
            port: 8080
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 350Gi
      storageClassName: gold
EDIT:
We have this secure setting configured, which is linked to a secret in our operator yaml:
secureSettings:
  - secretName: credentials
[ANSWERING MY OWN QUESTION]
Trying to resolve the below exception:
java.lang.IllegalArgumentException: unknown secure setting [dev-es-snapshot-backup-feeb83405c27.json]
I tried comparing the YAML config of the pods and found that the pods running successfully do not have a secure setting, but the pod that was crash looping had the secure setting under elastic-internal-secure-settings:
- name: elastic-internal-secure-settings
  secret:
    defaultMode: 420
    optional: false
    secretName: dev-es-secure-settings
And in the operator yaml, I found this:
secureSettings:
  - secretName: credentials
Just to confirm the behaviour, I scaled up the StatefulSet and found the new pod also crash looping with the same error. So someone had tried the secure setting last month, it crash-looped the pod, and it was never reset back to normal. Once I removed the secure setting from the operator yaml, the pods started running without any issue.
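If you need to confirm which pods carry the injected secure settings, one way (a sketch; the pod name comes from the logs above and the namespace will differ per cluster) is to inspect the rendered pod spec directly:

# look for the elastic-internal-secure-settings volume in the crash-looping pod
kubectl get pod dev-es-data-hot-1 -o yaml | grep -A4 'elastic-internal-secure-settings'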

Filebeat initialize failed with 10.96.0.1:443 i/o timeout error

In my k8s cluster, the Filebeat connection fails after a node restart. The other k8s nodes work normally.
logs from filebeat pod:
2020-08-30T03:18:58.770Z ERROR kubernetes/util.go:90 kubernetes: Querying for pod failed with error: performing request: Get https://10.96.0.1:443/api/v1/namespaces/monitoring/pods/filebeat-gfg5l: dial tcp 10.96.0.1:443: i/o timeout
2020-08-30T03:18:58.770Z INFO kubernetes/watcher.go:180 kubernetes: Performing a resource sync for *v1.PodList
2020-08-30T03:19:28.771Z ERROR kubernetes/watcher.go:183 kubernetes: Performing a resource sync err performing request: Get https://10.96.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dlocalhost&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout for *v1.PodList
2020-08-30T03:19:28.771Z INFO instance/beat.go:357 filebeat stopped.
2020-08-30T03:19:28.771Z ERROR instance/beat.go:800 Exiting: error initializing publisher: error initializing processors: performing request: Get https://10.96.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dlocalhost&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
Exiting: error initializing publisher: error initializing processors: performing request: Get https://10.96.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dlocalhost&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
The error occurs and the pod keeps restarting. I also restarted the node, but it didn't help.
The Filebeat version is 6.5.2, deployed as a DaemonSet. Are there any known issues like this?
All pods on that node except Filebeat are running without problems.
update:
apiVersion: v1
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: docker
      multiline.pattern: '^[[:space:]]+'
      multiline.negate: false
      multiline.match: after
      symlinks: true
      cri.parse_flags: true
      containers:
        ids: [""]
        path: "/var/log/containers"
    processors:
    - decode_json_fields:
        fields: ["message"]
        process_array: false
        max_depth: 1
        target: message_json
        overwrite_keys: false
        when:
          contains:
            source: "/var/log/containers/app"
    - add_kubernetes_metadata:
        in_cluster: true
        default_matchers.enabled: false
        matchers:
        - logs_path:
            logs_path: /var/log/containers/
    output:
      logstash:
        hosts:
        - logstash:5044
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-06T09:31:31Z"
  labels:
    k8s-app: filebeat
  name: filebeat-config
  namespace: monitoring
  resourceVersion: "6797684985"
  selfLink: /api/v1/namespaces/monitoring/configmaps/filebeat-config
  uid: 52d86bbb-3067-11ea-89c6-246e96da5c9c
The add_kubernetes_metadata processor failed querying https://10.96.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dlocalhost&resourceVersion=0. As it turned out in the discussion above, this was fixed by restarting the Beat once the temporary network interface problem on the node had been resolved.
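For a Beat deployed as a DaemonSet, restarting it simply means deleting the affected pod and letting the DaemonSet recreate it. A sketch using the pod and namespace from the logs above:

# the DaemonSet controller will immediately schedule a replacement pod
kubectl -n monitoring delete pod filebeat-gfg5l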

Impossible to connect to Elasticsearch in Kubernetes (bare metal)

I've set up Elasticsearch + Kibana + Metricbeat in a local cluster, but Metricbeat can't connect to Elasticsearch:
ERROR pipeline/output.go:100 Failed to connect to
backoff(elasticsearch(http://elasticsearch:9200)): Get http://elasticsearch:9200: lookup
elasticsearch on 10.96.0.10:53: no such host
2019-10-15T14:14:32.553Z INFO pipeline/output.go:93 Attempting to reconnect to
backoff(elasticsearch(http://elasticsearch:9200)) with 10 reconnect attempt(s)
2019-10-15T14:14:32.553Z INFO [publisher] pipeline/retry.go:189 retryer: send unwait-signal to consumer
2019-10-15T14:14:32.553Z INFO [publisher] pipeline/retry.go:191 done
2019-10-15T14:14:32.553Z INFO [publisher] pipeline/retry.go:166 retryer: send wait signal to consumer
2019-10-15T14:14:32.553Z INFO [publisher] pipeline/retry.go:168 done
2019-10-15T14:14:32.592Z WARN transport/tcp.go:53 DNS lookup failure "elasticsearch": lookup elasticsearch on 10.96.0.10:53: no such host
In my cluster I use MetalLB and an ingress. I've set up ingress rules but it didn't help.
I've also noticed that the ELK resources and Metricbeat use different namespaces in the docs. I tried making the namespaces the same everywhere, but that was unsuccessful.
Below I've attached my YAMLs. I didn't attach the files for elastic/kibana and Metricbeat because they have a lot of lines; I only included references to them:
elastic/kibana -
https://download.elastic.co/downloads/eck/1.0.0-beta1/all-in-one.yaml
metricbeat - https://raw.githubusercontent.com/elastic/beats/7.4/deploy/kubernetes/metricbeat-kubernetes.yaml
Does anybody know why this happens?
elastic config -
apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.4.0
  nodeSets:
  - name: default
    count: 1
    config:
      node.master: true
      node.data: true
      node.ingest: true
      node.store.allow_mmap: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # note: elasticsearch-data must be the name of the Elasticsearch volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        storageClassName: standard
  http:
    service:
      spec:
        type: LoadBalancer
kibana config -
apiVersion: kibana.k8s.elastic.co/v1beta1
kind: Kibana
metadata:
  name: quickstart
spec:
  version: 7.4.0
  count: 1
  elasticsearchRef:
    name: quickstart
  http:
    service:
      spec:
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        disabled: true
ingress rules -
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress
  annotations:
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: undemo-service
          servicePort: 80
      - path: /
        backend:
          serviceName: quickstart-kb-http
          servicePort: 80
      - path: /
        backend:
          serviceName: quickstart-es-http
          servicePort: 80
Just to be aware: Filebeat, Metricbeat, etc. run in the kube-system namespace.
If you run Elasticsearch in the default namespace you should use elasticsearch.default as the host so that your service name resolves properly.
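For example, with the quickstart resources above the Metricbeat output should point at the namespace-qualified ECK service name. A minimal sketch, assuming the stock metricbeat-kubernetes.yaml manifest where the Elasticsearch host comes from environment variables:

# in the Metricbeat DaemonSet (kube-system namespace)
env:
- name: ELASTICSEARCH_HOST
  value: quickstart-es-http.default.svc   # ECK HTTP service in the default namespace
- name: ELASTICSEARCH_PORT
  value: "9200"

Keep in mind that an ECK-managed cluster also has TLS and authentication enabled by default, so credentials and certificate settings are still needed on top of the hostname fix.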

microk8s.enable dns gets stuck in ContainerCreating

I have installed the microk8s snap on Ubuntu 19 in VirtualBox. When I run microk8s.enable dns, the pod for the deployment does not get past the ContainerCreating state.
This used to work before. I have also re-installed microk8s, which helped in the past, but not anymore.
The output from microk8s.kubectl get all --all-namespaces shows that something is wrong with the volume for the secrets. I don't know how to investigate further, so any help is appreciated.
Cheers
NAMESPACE     NAME                          READY   STATUS              RESTARTS   AGE
kube-system   pod/coredns-9b8997588-z88lz   0/1     ContainerCreating   0          16m

NAMESPACE     NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   10.152.183.1    <none>        443/TCP                  20m
kube-system   service/kube-dns     ClusterIP   10.152.183.10   <none>        53/UDP,53/TCP,9153/TCP   16m

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   0/1     1            0           16m

NAMESPACE     NAME                                DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-9b8997588   1         1         0       16m
Output from microk8s.kubectl describe pod/coredns-9b8997588-z88lz -n kube-system
Name: coredns-9b8997588-z88lz
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: peza-ubuntu-19/10.0.2.15
Start Time: Sun, 29 Sep 2019 15:49:27 +0200
Labels: k8s-app=kube-dns
pod-template-hash=9b8997588
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-9b8997588
Containers:
coredns:
Container ID:
Image: coredns/coredns:1.5.0
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-h6qlm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-h6qlm:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-h6qlm
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned kube-system/coredns-9b8997588-z88lz to peza-ubuntu-19
Warning FailedMount 5m59s kubelet, peza-ubuntu-19 Unable to attach or mount volumes: unmounted volumes=[coredns-token-h6qlm config-volume], unattached volumes=[coredns-token-h6qlm config-volume]: timed out waiting for the condition
Warning FailedMount 3m56s (x11 over 10m) kubelet, peza-ubuntu-19 MountVolume.SetUp failed for volume "coredns-token-h6qlm" : failed to sync secret cache: timed out waiting for the condition
Warning FailedMount 3m44s (x2 over 8m16s) kubelet, peza-ubuntu-19 Unable to attach or mount volumes: unmounted volumes=[config-volume coredns-token-h6qlm], unattached volumes=[config-volume coredns-token-h6qlm]: timed out waiting for the condition
Warning FailedMount 113s (x12 over 10m) kubelet, peza-ubuntu-19 MountVolume.SetUp failed for volume "config-volume" : failed to sync configmap cache: timed out waiting for the condition
I spent my morning fighting with this on Ubuntu 19.04. None of the microk8s add-ons worked; their containers got stuck in "ContainerCreating" status with something like "MountVolume.SetUp failed for volume "kubernetes-dashboard-token-764ml" : failed to sync secret cache: timed out waiting for the condition" in their descriptions.
I tried to start/stop/reset/reinstall microk8s a few times. Nothing worked. Once I downgraded it to the previous version the problem went away:
sudo snap install microk8s --classic --channel=1.15/stable

Output: mount.nfs: requested NFS version or transport protocol is not supported

I am trying out the Kubernetes NFS volume claim from the replication controller example [1].
I have set up the NFS server, PV and PVC, and my replication controller looks like this:
apiVersion: v1
kind: ReplicationController
metadata:
  name: node-manager
  labels:
    name: node-manager
spec:
  replicas: 1
  selector:
    name: node-manager
  template:
    metadata:
      labels:
        name: node-manager
    spec:
      containers:
      - name: node-manager
        image: org/node-manager-1.0.0:1.0.0
        ports:
        - containerPort: 9763
          protocol: "TCP"
        - containerPort: 9443
          protocol: "TCP"
        volumeMounts:
        - name: nfs
          mountPath: "/mnt/data"
      volumes:
      - name: nfs
        persistentVolumeClaim:
          claimName: nfs
When I try to deploy the replication controller, the container stays in ContainerCreating status and I can see the following error in the journal of the minion:
Feb 26 11:39:41 node-01 kubelet[1529]: Mounting arguments: 172.17.8.102:/ /var/lib/kubelet/pods/0e66affa-dc79-11e5-89b3-080027f84891/volumes/kubernetes.io~nfs/nfs nfs []
Feb 26 11:39:41 node-01 kubelet[1529]: Output: mount.nfs: requested NFS version or transport protocol is not supported
Feb 26 11:39:41 node-01 kubelet[1529]: E0226 11:39:41.908756 1529 kubelet.go:1383] Unable to mount volumes for pod "node-manager-eemi2_default": exit status 32; skipping pod
Feb 26 11:39:41 node-01 kubelet[1529]: E0226 11:39:41.923297 1529 pod_workers.go:112] Error syncing pod 0e66affa-dc79-11e5-89b3-080027f84891, skipping: exit status 32
Feb 26 11:39:51 node-01 kubelet[1529]: E0226 11:39:51.904931 1529 mount_linux.go:103] Mount failed: exit status 32
I used [2] (kubernetes-vagrant-coreos-cluster) to set up my Kubernetes cluster.
My minion details:
core#node-01 ~ $ cat /etc/lsb-release
DISTRIB_ID=CoreOS
DISTRIB_RELEASE=969.0.0
DISTRIB_CODENAME="Coeur Rouge"
DISTRIB_DESCRIPTION="CoreOS 969.0.0 (Coeur Rouge)"
[1] - https://github.com/kubernetes/kubernetes/tree/master/examples/nfs
[2] - https://github.com/pires/kubernetes-vagrant-coreos-cluster
I had the same problem, then realized that nfs-server.service was disabled. After enabling it, the problem was solved.
You can also resolve the NFS mount version issue by adding a Defaultvers=4 entry to /etc/nfsmount.conf on the NFS server.
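A short sketch of both steps on the NFS server (assumes a systemd-based distribution; the section header follows the standard nfsmount.conf layout):

# make sure the NFS server service is enabled and running
sudo systemctl enable --now nfs-server

# /etc/nfsmount.conf -- force NFSv4 as the default mount version
[ NFSMount_Global_Options ]
Defaultvers=4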
