WebSockets on GKE with Istio give 'no healthy upstream' and 'CrashLoopBackOff' - socket.io
I am on GKE using Istio version 1.0.3. I am trying to get my Express.js backend with socket.io (and the uws engine) working over WebSockets. This backend previously ran with WebSockets on a non-Kubernetes server without problems.
When I simply enter the external_gke_ip as the URL I get my backend HTML page, so HTTP works. But when my client app makes socket.io authentication calls, I get 503 errors in the browser console:
WebSocket connection to 'ws://external_gke_ip/socket.io/?EIO=3&transport=websocket' failed: Error during WebSocket handshake: Unexpected response code: 503
And when I enter the external_gke_ip as the URL while socket calls are being made, I get 'no healthy upstream' in the browser, and the pod goes into CrashLoopBackOff.
I found somewhere: 'in node.js land, socket.io typically does a few non-websocket handshakes to the server before eventually upgrading to websockets. If you don't have sticky sessions, the upgrade never works.' So maybe I need sticky sessions? Or not, since I only have one replica of my app? For a plain Service this seems to be done by setting sessionAffinity: ClientIP, but with Istio I do not know how to do this; in the GUI I can edit some values of the load balancers, but session affinity shows 'none' and I cannot edit it.
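If sticky sessions do turn out to be needed: as far as I understand, with Istio this would not be done on the Service at all but on a DestinationRule, since Envoy (sidecar/ingress gateway) does the load balancing. A minimal sketch of what I think this could look like for my service, hashing on the client source IP (untested; the resource name myapp-sticky is just my placeholder):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myapp-sticky
spec:
  host: myapp              # the Kubernetes Service defined below
  trafficPolicy:
    loadBalancer:
      consistentHash:
        useSourceIp: true  # route a given client IP to the same pod

With only one replica this should not change anything, but it seems to be the Istio equivalent of sessionAffinity: ClientIP.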
Other settings that might be relevant, and that I do not know how to set with Istio, are listed below (a sketch of where I think they would go follows the list):
externalTrafficPolicy=Local
Ttl
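My assumption is that both of these belong on the istio-ingressgateway Service in istio-system (the Service that the GCP load balancer points at), not on my app's Service. Something like the following fragment is what I have in mind; only the relevant fields are shown and I have not verified this on my cluster:

apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # keep traffic on the node that received it, preserving the client source IP
  sessionAffinity: ClientIP      # client-IP based affinity at the kube-proxy level
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800      # the 'Ttl' mentioned above (affinity timeout in seconds)

The existing ports and selector of that Service would stay as they are.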
My manifest config file:
apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  selector:
    app: myapp
  ports:
  - port: 8089
    targetPort: 8089
    protocol: TCP
    name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: gcr.io/myproject/firstapp:v1
        imagePullPolicy: Always
        ports:
        - containerPort: 8089
        env:
        - name: POSTGRES_DB_HOST
          value: 127.0.0.1:5432
        - name: POSTGRES_DB_USER
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: username
        - name: POSTGRES_DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: password
        readinessProbe:
          httpGet:
            path: /healthz
            scheme: HTTP
            port: 8089
          initialDelaySeconds: 10
          timeoutSeconds: 5
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.11
        command: ["/cloud_sql_proxy",
                  "-instances=myproject:europe-west4:osm=tcp:5432",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        securityContext:
          runAsUser: 2
          allowPrivilegeEscalation: false
        volumeMounts:
        - name: cloudsql-instance-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
      volumes:
      - name: cloudsql-instance-credentials
        secret:
          secretName: cloudsql-instance-credentials
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: myapp-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - "*"
  gateways:
  - myapp-gateway
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: myapp
      weight: 100
    websocketUpgrade: true
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: google-apis
spec:
  hosts:
  - "*.googleapis.com"
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  location: MESH_EXTERNAL
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: cloud-sql-instance
spec:
  hosts:
  - 35.204.XXX.XX # ip of cloudsql database
  ports:
  - name: tcp
    number: 3307
    protocol: TCP
  location: MESH_EXTERNAL
Various output (while making socket calls; when I stop them the deployment restarts and READY returns to 3/3):
kubectl get pods
NAME READY STATUS RESTARTS AGE
myapp-8888 2/3 CrashLoopBackOff 11 1h
$ kubectl describe pod/myapp-8888 gives:
Name: myapp-8888
Namespace: default
Node: gke-standard-cluster-1-default-pool-888888-9vtk/10.164.0.36
Start Time: Sat, 19 Jan 2019 14:33:11 +0100
Labels: app=myapp
pod-template-hash=207157
Annotations:
kubernetes.io/limit-ranger:
LimitRanger plugin set: cpu request for container app; cpu request for container cloudsql-proxy
sidecar.istio.io/status:
{"version":"3c9617ff82c9962a58890e4fa987c69ca62487fda71c23f3a2aad1d7bb46c748","initContainers":["istio-init"],"containers":["istio-proxy"]...
Status: Running
IP: 10.44.0.5
Controlled By: ReplicaSet/myapp-64c59c94dc
Init Containers:
istio-init:
Container ID: docker://a417695f99509707d0f4bfa45d7d491501228031996b603c22aaf398551d1e45
Image: gcr.io/gke-release/istio/proxy_init:1.0.2-gke.0
Image ID: docker-pullable://gcr.io/gke-release/istio/proxy_init@sha256:e30d47d2f269347a973523d0c5d7540dbf7f87d24aca2737ebc09dbe5be53134
Port: <none>
Host Port: <none>
Args:
-p
15001
-u
1337
-m
REDIRECT
-i
*
-x
-b
8089,
-d
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 19 Jan 2019 14:33:19 +0100
Finished: Sat, 19 Jan 2019 14:33:19 +0100
Ready: True
Restart Count: 0
Environment: <none>
Mounts: <none>
Containers:
app:
Container ID: docker://888888888888888888888888
Image: gcr.io/myproject/firstapp:v1
Image ID: docker-pullable://gcr.io/myproject/firstapp@sha256:8888888888888888888888888
Port: 8089/TCP
Host Port: 0/TCP
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 19 Jan 2019 14:40:14 +0100
Finished: Sat, 19 Jan 2019 14:40:37 +0100
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 19 Jan 2019 14:39:28 +0100
Finished: Sat, 19 Jan 2019 14:39:46 +0100
Ready: False
Restart Count: 3
Requests:
cpu: 100m
Readiness: http-get http://:8089/healthz delay=10s timeout=5s period=10s #success=1 #failure=3
Environment:
POSTGRES_DB_HOST: 127.0.0.1:5432
POSTGRES_DB_USER: <set to the key 'username' in secret 'mysecret'> Optional: false
POSTGRES_DB_PASSWORD: <set to the key 'password' in secret 'mysecret'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-rclsf (ro)
cloudsql-proxy:
Container ID: docker://788888888888888888888888888
Image: gcr.io/cloudsql-docker/gce-proxy:1.11
Image ID: docker-pullable://gcr.io/cloudsql-docker/gce-proxy@sha256:5c690349ad8041e8b21eaa63cb078cf13188568e0bfac3b5a914da3483079e2b
Port: <none>
Host Port: <none>
Command:
/cloud_sql_proxy
-instances=myproject:europe-west4:osm=tcp:5432
-credential_file=/secrets/cloudsql/credentials.json
State: Running
Started: Sat, 19 Jan 2019 14:33:40 +0100
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/secrets/cloudsql from cloudsql-instance-credentials (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-rclsf (ro)
istio-proxy:
Container ID: docker://f3873d0f69afde23e85d6d6f85b1f
Image: gcr.io/gke-release/istio/proxyv2:1.0.2-gke.0
Image ID: docker-pullable://gcr.io/gke-release/istio/proxyv2@sha256:826ef4469e4f1d4cabd0dc846
Port: <none>
Host Port: <none>
Args:
proxy
sidecar
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
myapp
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system:15007
--discoveryRefreshDelay
1s
--zipkinAddress
zipkin.istio-system:9411
--connectTimeout
10s
--statsdUdpAddress
istio-statsd-prom-bridge.istio-system:9125
--proxyAdminPort
15000
--controlPlaneAuthPolicy
NONE
State: Running
Started: Sat, 19 Jan 2019 14:33:54 +0100
Ready: True
Restart Count: 0
Requests:
cpu: 10m
Environment:
POD_NAME: myapp-64c59c94dc-8888 (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
ISTIO_META_POD_NAME: myapp-64c59c94dc-8888 (v1:metadata.name)
ISTIO_META_INTERCEPTION_MODE: REDIRECT
Mounts:
/etc/certs/ from istio-certs (ro)
/etc/istio/proxy from istio-envoy (rw)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
cloudsql-instance-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloudsql-instance-credentials
Optional: false
default-token-rclsf:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-rclsf
Optional: false
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
istio-certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio.default
Optional: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m31s default-scheduler Successfully assigned myapp-64c59c94dc-tdb9c to gke-standard-cluster-1-default-pool-65b9e650-9vtk
Normal SuccessfulMountVolume 7m31s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk MountVolume.SetUp succeeded for volume "istio-envoy"
Normal SuccessfulMountVolume 7m31s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk MountVolume.SetUp succeeded for volume "cloudsql-instance-credentials"
Normal SuccessfulMountVolume 7m31s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk MountVolume.SetUp succeeded for volume "default-token-rclsf"
Normal SuccessfulMountVolume 7m31s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk MountVolume.SetUp succeeded for volume "istio-certs"
Normal Pulling 7m30s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk pulling image "gcr.io/gke-release/istio/proxy_init:1.0.2-gke.0"
Normal Pulled 7m25s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Successfully pulled image "gcr.io/gke-release/istio/proxy_init:1.0.2-gke.0"
Normal Created 7m24s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Created container
Normal Started 7m23s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Started container
Normal Pulling 7m4s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk pulling image "gcr.io/cloudsql-docker/gce-proxy:1.11"
Normal Pulled 7m3s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Successfully pulled image "gcr.io/cloudsql-docker/gce-proxy:1.11"
Normal Started 7m2s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Started container
Normal Pulling 7m2s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk pulling image "gcr.io/gke-release/istio/proxyv2:1.0.2-gke.0"
Normal Created 7m2s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Created container
Normal Pulled 6m54s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Successfully pulled image "gcr.io/gke-release/istio/proxyv2:1.0.2-gke.0"
Normal Created 6m51s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Created container
Normal Started 6m48s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Started container
Normal Pulling 111s (x2 over 7m22s) kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk pulling image "gcr.io/myproject/firstapp:v3"
Normal Created 110s (x2 over 7m4s) kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Created container
Normal Started 110s (x2 over 7m4s) kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Started container
Normal Pulled 110s (x2 over 7m7s) kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Successfully pulled image "gcr.io/myproject/firstapp:v3"
Warning Unhealthy 99s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Readiness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 85s kubelet, gke-standard-cluster-1-default-pool-65b9e650-9vtk Back-off restarting failed container
And:
$ kubectl logs myapp-8888 myapp
> api_server@0.0.0 start /usr/src/app
> node src/
info: Feathers application started on http://localhost:8089
And the database logs (which look OK, as some 'startup script entries' from the app can be retrieved using psql):
$ kubectl logs myapp-8888 cloudsql-proxy
2019/01/19 13:33:40 using credential file for authentication; email=proxy-user@myproject.iam.gserviceaccount.com
2019/01/19 13:33:40 Listening on 127.0.0.1:5432 for myproject:europe-west4:osm
2019/01/19 13:33:40 Ready for new connections
2019/01/19 13:33:54 New connection for "myproject:europe-west4:osm"
2019/01/19 13:33:55 couldn't connect to "myproject:europe-west4:osm": Post https://www.googleapis.com/sql/v1beta4/projects/myproject/instances/osm/createEphemeral?alt=json: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp 74.125.143.95:443: getsockopt: connection refused
2019/01/19 13:39:06 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:06 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:06 Client closed local connection on 127.0.0.1:5432
2019/01/19 13:39:13 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:14 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:14 New connection for "myproject:europe-west4:osm"
2019/01/19 13:39:14 New connection for "myproject:europe-west4:osm"
EDIT:
Here is the server-side log of the 503 for WebSocket calls to my app:
{
insertId: "465nu9g3xcn5hf"
jsonPayload: {
apiClaims: ""
apiKey: ""
clientTraceId: ""
connection_security_policy: "unknown"
destinationApp: "myapp"
destinationIp: "10.44.XX.XX"
destinationName: "myapp-888888-88888"
destinationNamespace: "default"
destinationOwner: "kubernetes://apis/extensions/v1beta1/namespaces/default/deployments/myapp"
destinationPrincipal: ""
destinationServiceHost: "myapp.default.svc.cluster.local"
destinationWorkload: "myapp"
httpAuthority: "35.204.XXX.XXX"
instance: "accesslog.logentry.istio-system"
latency: "1.508885ms"
level: "info"
method: "GET"
protocol: "http"
receivedBytes: 787
referer: ""
reporter: "source"
requestId: "bb31d922-8f5d-946b-95c9-83e4c022d955"
requestSize: 0
requestedServerName: ""
responseCode: 503
responseSize: 57
responseTimestamp: "2019-01-18T20:53:03.966513Z"
sentBytes: 164
sourceApp: "istio-ingressgateway"
sourceIp: "10.44.X.X"
sourceName: "istio-ingressgateway-8888888-88888"
sourceNamespace: "istio-system"
sourceOwner: "kubernetes://apis/extensions/v1beta1/namespaces/istio-system/deployments/istio-ingressgateway"
sourcePrincipal: ""
sourceWorkload: "istio-ingressgateway"
url: "/socket.io/?EIO=3&transport=websocket"
userAgent: "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1"
xForwardedFor: "10.44.X.X"
}
logName: "projects/myproject/logs/stdout"
metadata: {
systemLabels: {
container_image: "gcr.io/gke-release/istio/mixer:1.0.2-gke.0"
container_image_id: "docker-pullable://gcr.io/gke-release/istio/mixer@sha256:888888888888888888888888888888"
name: "mixer"
node_name: "gke-standard-cluster-1-default-pool-88888888888-8887"
provider_instance_id: "888888888888"
provider_resource_type: "gce_instance"
provider_zone: "europe-west4-a"
service_name: [
0: "istio-telemetry"
]
top_level_controller_name: "istio-telemetry"
top_level_controller_type: "Deployment"
}
userLabels: {
app: "telemetry"
istio: "mixer"
istio-mixer-type: "telemetry"
pod-template-hash: "88888888888"
}
}
receiveTimestamp: "2019-01-18T20:53:08.135805255Z"
resource: {
labels: {
cluster_name: "standard-cluster-1"
container_name: "mixer"
location: "europe-west4-a"
namespace_name: "istio-system"
pod_name: "istio-telemetry-8888888-8888888"
project_id: "myproject"
}
type: "k8s_container"
}
severity: "INFO"
timestamp: "2019-01-18T20:53:03.965100Z"
}
In the browser it at first seems to switch protocols properly, but then a repeated 503 response follows, and the resulting health issues cause repeated restarts. The protocol-switch WebSocket call:
General:
Request URL: ws://localhost:8080/sockjs-node/842/s4888/websocket
Request Method: GET
Status Code: 101 Switching Protocols [GREEN]
Response headers:
Connection: Upgrade
Sec-WebSocket-Accept: NS8888888888888888888
Upgrade: websocket
Request headers:
Accept-Encoding: gzip, deflate, br
Accept-Language: nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7
Cache-Control: no-cache
Connection: Upgrade
Cookie: _ga=GA1.1.1118102238.18888888; hblid=nSNQ2mS8888888888888; olfsk=ol8888888888
Host: localhost:8080
Origin: http://localhost:8080
Pragma: no-cache
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Sec-WebSocket-Key: b8zkVaXlEySHasCkD4aUiw==
Sec-WebSocket-Version: 13
Upgrade: websocket
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1
Its frames:
Following the above I get multiple of these:
Chrome output regarding websocket call:
general:
Request URL: ws://35.204.210.134/socket.io/?EIO=3&transport=websocket
Request Method: GET
Status Code: 503 Service Unavailable
response headers:
connection: close
content-length: 19
content-type: text/plain
date: Sat, 19 Jan 2019 14:06:39 GMT
server: envoy
request headers:
Accept-Encoding: gzip, deflate
Accept-Language: nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7
Cache-Control: no-cache
Connection: Upgrade
Host: 35.204.210.134
Origin: http://localhost:8080
Pragma: no-cache
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Sec-WebSocket-Key: VtKS5xKF+GZ4u3uGih2fig==
Sec-WebSocket-Version: 13
Upgrade: websocket
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1
The frames:
Data: (Opcode -1)
Length: 63
Time: 15:06:44.412
Using uws (uWebSockets) as the WebSocket engine causes these errors. When I swap this code in my backend app:
app.configure(socketio({
  wsEngine: 'uws',
  timeout: 120000,
  reconnect: true
}))
for this:
app.configure(socketio())
Everything works as expected.
EDIT: Now it also works with uws. I was using an Alpine Docker image based on Node 10, which does not work with uws. After switching to an image based on Node 8 it works.