cam-bpd-ui pod doesn't start successfully after CAM fresh install - ibm-cloud-private

After a fresh install of CAM 2.1.0.2 on ICP, I ran the following command:
kubectl -n services get pods
and noticed that the "cam-bpd-ui" pod didn't start. As a result, I'm not able to log in to the Process Designer UI and I'm getting the error: "Readiness probe failed: HTTP probe failed with statuscode: 404".
According to the ICP overview pane the pod is running and available; however, I see this in the logs:
"[Warning] Failed to load slave replication state from table mysql.gtid_slave_pos: 1146: Table 'mysql.gtid_slave_pos' doesn't exist
Version: '10.1.16-MariaDB-1~jessie' socket: '/var/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
2018-04-24 16:15:52 140411194034112 [Note] mysqld: ready for connections."
When checking the events for the cam-bpd-ui pod, we see the following:
kubectl describe pod cam-bpd-ui-687764b5fc-qxjnp -n services
Name: cam-bpd-ui-687764b5fc-qxjnp
Namespace: services
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 27m default-scheduler Successfully assigned cam-bpd-ui-687764b5fc-qxjnp to 10.190.155.237
Normal SuccessfulMountVolume 27m kubelet, 10.190.155.237 MountVolume.SetUp succeeded for volume "default-token-c8nq4"
Normal SuccessfulMountVolume 27m kubelet, 10.190.155.237 MountVolume.SetUp succeeded for volume "cam-logs-pv"
Normal SuccessfulMountVolume 27m kubelet, 10.190.155.237 MountVolume.SetUp succeeded for volume "cam-bpd-appdata-pv"
Normal Pulled 27m kubelet, 10.190.155.237 Container image "icp-dev.watsonplatform.net:8500/services/icam-busybox:2.1.0.2-x86_64" already present on machine
Normal Created 27m kubelet, 10.190.155.237 Created container
Normal Pulled 27m kubelet, 10.190.155.237 Container image "icp-dev.watsonplatform.net:8500/services/icam-bpd-ui:2.1.0.2-x86_64" already present on machine
Normal Started 27m kubelet, 10.190.155.237 Started container
Normal Created 27m kubelet, 10.190.155.237 Created container
Normal Started 27m kubelet, 10.190.155.237 Started container
Warning Unhealthy 26m (x2 over 26m) kubelet, 10.190.155.237 Readiness probe failed: Get http://10.1.45.36:8080/landscaper/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning BackOff 12m (x3 over 12m) kubelet, 10.190.155.237 Back-off restarting failed container
Warning Unhealthy 2m (x129 over 26m) kubelet, 10.190.155.237 Readiness probe failed: HTTP probe failed with statuscode: 404

The Process Designer (BPD) needs to connect to MariaDB so that it can populate its database.
You need to be 100% sure that the database is functional; if it is not, BPD will not return the login page for you.
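To check, you can look at the CAM database pod directly. A minimal sketch, assuming the MariaDB pod lives in the services namespace (the exact pod name, shown here as a placeholder, will differ in your environment):
kubectl -n services get pods | grep -i mariadb
kubectl -n services logs <cam-mariadb-pod-name>
kubectl -n services exec -it <cam-mariadb-pod-name> -- mysql -uroot -p -e "SHOW DATABASES;"
If the database is healthy, the log should end with "mysqld: ready for connections." (as in the output above) and the BPD database should appear in the list; its exact name depends on your install.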
Some hints:
1) If you are using NFS, ensure that /etc/exports contains /export *(rw,insecure,no_subtree_check,async,no_root_squash); see the commands below for applying and verifying the change.
For more details about no_root_squash, see: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nfs-mount-on-ubuntu-16-04
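For example, after editing /etc/exports on the NFS server, you can re-export and verify with the standard NFS utilities (adjust the export path to your setup):
sudo exportfs -ra          # re-read /etc/exports and apply the new options
sudo exportfs -v           # confirm /export is exported with no_root_squash
showmount -e localhost     # list the exports as clients will see them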
As a workaround, you can set up your own database and configure BPD to use it.
Details can be found here:
https://www.ibm.com/support/knowledgecenter/SS4GSP_6.2.7/com.ibm.edt.doc/topics/install_database_mysql_bds.html

Related

AWS Linux 2 AMI Failed to get D-Bus connection: No such file or directory

I have an AWS Linux 2 AMI EC2 instance.
When running systemctl --user status I get the message:
Failed to get D-Bus connection: No such file or directory
I then ran systemctl start dbus.socket, which gave me this message:
Failed to start dbus.socket: The name org.freedesktop.PolicyKit1 was not provided by any .service files. See system logs and 'systemctl status dbus.socket' for details.
I then ran systemctl status dbus.socket -l which returned this:
dbus.socket - D-Bus System Message Bus Socket
Loaded: loaded (/usr/lib/systemd/system/dbus.socket; static; vendor preset: disabled)
Active: active (running) since Thu 2022-03-31 21:26:42 UTC; 14h ago
Listen: /run/dbus/system_bus_socket (Stream)
Mar 31 21:26:42 ip-10-0-0-193.ec2.internal systemd[1]: Listening on D-Bus System Message Bus Socket.
Mar 31 21:26:42 ip-10-0-0-193.ec2.internal systemd[1]: Starting D-Bus System Message Bus Socket.
Running sudo systemctl --user status gives a different error:
Failed to get D-Bus connection: Connection refused
I'm unsure of what to investigate next or what steps to take to resolve the issue.
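One thing worth checking is whether a per-user systemd/D-Bus session exists at all for your login. This is a diagnostic sketch, not a confirmed fix; it assumes the root cause is a missing user session, which is a common limitation on RHEL 7-era distributions such as Amazon Linux 2:
echo $XDG_RUNTIME_DIR                     # should point at /run/user/<your-uid>
ls -l /run/user/$(id -u)/bus              # the user session bus socket, if one exists
systemctl status user@$(id -u).service    # the per-user systemd instance for this uid
If the bus socket or the user@<uid> service is missing, systemctl --user has no session bus to talk to, which matches the "Failed to get D-Bus connection" errors above.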

Consul UI does not show

I'm running single-node Consul (v1.8.4) on Ubuntu 18.04. The consul service is up, and I have the UI set to true (the default).
But when I try to access http://192.168.37.128:8500/ui, I get:
This site can’t be reached 192.168.37.128 took too long to respond.
ui.json
{
"addresses": {
"http": "0.0.0.0"
}
}
consul.service file:
[Unit]
Description=Consul
Documentation=https://www.consul.io/
[Service]
ExecStart=/usr/bin/consul agent -server -ui -data-dir=/temp/consul -bootstrap-expect=1 -node=vault -bind=-config-dir=/etc/consul.d/
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
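For reference, a cleaned-up ExecStart might look like the line below. This is a sketch only: the value that originally followed -bind= appears to have been lost in the paste, so the bind address here is inferred from the node IP visible in the status output further down and should be treated as an assumption:
ExecStart=/usr/bin/consul agent -server -ui -data-dir=/temp/consul -bootstrap-expect=1 -node=vault -bind=192.168.37.128 -config-dir=/etc/consul.d/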
systemctl status consul
● consul.service - Consul
Loaded: loaded (/etc/systemd/system/consul.service; disabled; vendor preset: enabled)
Active: active (running) since Sun 2020-10-04 19:19:08 CDT; 50min ago
Docs: https://www.consul.io/
Main PID: 9477 (consul)
Tasks: 9 (limit: 4980)
CGroup: /system.slice/consul.service
└─9477 /opt/consul/bin/consul agent -server -ui -data-dir=/temp/consul -bootstrap-expect=1 -node=vault -bind=1
agent.server.raft: heartbeat timeout reached, starting election: last-leader=
agent.server.raft: entering candidate state: node="Node at 192.168.37.128:8300 [Candid
agent.server.raft: election won: tally=1
agent.server.raft: entering leader state: leader="Node at 192.168.37.128:8300 [Leader]
agent.server: cluster leadership acquired
agent.server: New leader elected: payload=vault
agent.leader: started routine: routine="federation state anti-entropy"
agent.leader: started routine: routine="federation state pruning"
agent.leader: started routine: routine="CA root pruning"
agent: Synced node info
The status shows Consul bound at 192.168.37.128:8300.
This issue was the firewall; I had to open port 8500:
sudo ufw allow 8500/tcp
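To confirm the fix, you can check that Consul is listening on the UI port and that it now answers over HTTP (standard tools; the IP is the one from the question):
sudo ss -tlnp | grep 8500                  # Consul's HTTP/UI port should be listening on 0.0.0.0
curl -I http://192.168.37.128:8500/ui/     # should return an HTTP response once the firewall allows the port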

microk8s.enable dns gets stuck in ContainerCreating

I have installed microk8s snap on Ubuntu 19 in a VBox. When I run microk8s.enable dns, the pod for the deployment does not get past ContainerCreating state.
It used to work before. I have also re-installed microk8s, which helped in the past, but not anymore.
Output from microk8s.kubectl get all --all-namespaces shows that something is wrong with the volume for the secrets. I don't know how I can investigate further, so any help is appreciated.
Cheers
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-9b8997588-z88lz 0/1 ContainerCreating 0 16m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 20m
kube-system service/kube-dns ClusterIP 10.152.183.10 <none> 53/UDP,53/TCP,9153/TCP 16m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 0/1 1 0 16m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-9b8997588 1 1 0 16m
Output from microk8s.kubectl describe pod/coredns-9b8997588-z88lz -n kube-system
Name: coredns-9b8997588-z88lz
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: peza-ubuntu-19/10.0.2.15
Start Time: Sun, 29 Sep 2019 15:49:27 +0200
Labels: k8s-app=kube-dns
pod-template-hash=9b8997588
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-9b8997588
Containers:
coredns:
Container ID:
Image: coredns/coredns:1.5.0
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-h6qlm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-h6qlm:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-h6qlm
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned kube-system/coredns-9b8997588-z88lz to peza-ubuntu-19
Warning FailedMount 5m59s kubelet, peza-ubuntu-19 Unable to attach or mount volumes: unmounted volumes=[coredns-token-h6qlm config-volume], unattached volumes=[coredns-token-h6qlm config-volume]: timed out waiting for the condition
Warning FailedMount 3m56s (x11 over 10m) kubelet, peza-ubuntu-19 MountVolume.SetUp failed for volume "coredns-token-h6qlm" : failed to sync secret cache: timed out waiting for the condition
Warning FailedMount 3m44s (x2 over 8m16s) kubelet, peza-ubuntu-19 Unable to attach or mount volumes: unmounted volumes=[config-volume coredns-token-h6qlm], unattached volumes=[config-volume coredns-token-h6qlm]: timed out waiting for the condition
Warning FailedMount 113s (x12 over 10m) kubelet, peza-ubuntu-19 MountVolume.SetUp failed for volume "config-volume" : failed to sync configmap cache: timed out waiting for the condition
I spent my morning fighting with this on ubuntu 19.04. None of the microk8s add-ons worked. Their containers got stuck in "ContainerCreating" status having something like "MountVolume.SetUp failed for volume "kubernetes-dashboard-token-764ml" : failed to sync secret cache: timed out waiting for the condition" in their descriptions.
I tried to start/stop/reset/reinstall microk8s a few times. Nothing worked. Once I downgraded it to the previous version, the problem went away.
sudo snap install microk8s --classic --channel=1.15/stable
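For completeness, the full sequence when moving back to the 1.15 channel might look like this. A sketch based on the commands already in the question; remove the newer snap first if it is still installed:
sudo snap remove microk8s
sudo snap install microk8s --classic --channel=1.15/stable
microk8s.enable dns
microk8s.kubectl get pods -n kube-system -w     # coredns should reach Running instead of sticking in ContainerCreating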

“Kibana server is not ready yet” error when deploying ELK in k8s addons file

I am new to the ELK stack. I want to deploy ELK in my k8s cluster, and I'm using minikube for a trial.
The YAML files are all from the Kubernetes repo:
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch
I just changed kibana-service.yaml by adding one more line: type: NodePort
The commands used:
kubectl create -f fluentd-elasticsearch/
kubectl get pods -n kube-system
// omit some info
elasticsearch-logging-0 1/1 Running
elasticsearch-logging-1 1/1 Running
fluentd-es-v2.5.1-cz6zp 1/1 Running
kibana-logging-5c895c4cd-qjrkz 1/1 Running
kube-addon-manager-minikube 1/1 Running
kube-dns-7cd4f8cd9f-gzbxb 3/3 Running
kubernetes-dashboard-7b7c7bd496-m748h 1/1 Running
kubectl get svc -n kube-system
elasticsearch-logging ClusterIP 10.96.18.172 <none> 9200/TCP 74m
kibana-logging NodePort 10.102.218.78 <none> 5601:30345/TCP 74m
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 42d
kubernetes-dashboard NodePort 10.102.61.203 <none> 80:30000/TCP 42d
kubectl describe pods elasticsearch-logging-0 -n kube-system
Name: elasticsearch-logging-0
Namespace: kube-system
Node: minikube/192.168.99.100
Start Time: Mon, 29 Apr 2019 21:42:25 +0800
Labels: controller-revision-hash=elasticsearch-logging-76ccc76cd9
k8s-app=elasticsearch-logging
kubernetes.io/cluster-service=true
statefulset.kubernetes.io/pod-name=elasticsearch-logging-0
version=v6.6.1
Annotations: <none>
Status: Running
IP: 172.17.0.20
Controlled By: StatefulSet/elasticsearch-logging
Init Containers:
elasticsearch-logging-init:
Container ID: docker://ff75d166b9df3ee444efb19e2498907d0cfec53d35b14d124bbb6756eb4418ed
Image: alpine:3.6
Image ID: docker-pullable://alpine@sha256:ee0c0e7b6b20b175f5ffb1bbd48b41d94891b0b1074f2721acb008aafdf25417
Port: <none>
Host Port: <none>
Command:
/sbin/sysctl
-w
vm.max_map_count=262144
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 29 Apr 2019 21:42:25 +0800
Finished: Mon, 29 Apr 2019 21:42:25 +0800
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-logging-token-g2rcx (ro)
Containers:
elasticsearch-logging:
Container ID: docker://8f52602890334bdbbfd5a2042ac6e99426308230db0f338ea80a3cd2bef3bda3
Image: gcr.io/fluentd-elasticsearch/elasticsearch:v6.6.1
Image ID: docker-pullable://gcr.io/fluentd-elasticsearch/elasticsearch@sha256:89cdf74301f36f911e0fc832b21766114adbd591241278cf97664b7cb76b2e67
Ports: 9200/TCP, 9300/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Mon, 29 Apr 2019 21:59:20 +0800
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Mon, 29 Apr 2019 21:58:02 +0800
Finished: Mon, 29 Apr 2019 21:58:35 +0800
Ready: True
Restart Count: 4
Limits:
cpu: 1
Requests:
cpu: 100m
Environment:
NAMESPACE: kube-system (v1:metadata.namespace)
Mounts:
/data from elasticsearch-logging (rw)
/var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-logging-token-g2rcx (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
elasticsearch-logging:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elasticsearch-logging-token-g2rcx:
Type: Secret (a volume populated by a Secret)
SecretName: elasticsearch-logging-token-g2rcx
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19m default-scheduler Successfully assigned elasticsearch-logging-0 to minikube
Normal SuccessfulMountVolume 19m kubelet, minikube MountVolume.SetUp succeeded for volume "elasticsearch-logging"
Normal SuccessfulMountVolume 19m kubelet, minikube MountVolume.SetUp succeeded for volume "elasticsearch-logging-token-g2rcx"
Normal Pulled 19m kubelet, minikube Container image "alpine:3.6" already present on machine
Normal Created 19m kubelet, minikube Created container
Normal Started 19m kubelet, minikube Started container
Warning BackOff 3m3s (x6 over 10m) kubelet, minikube Back-off restarting failed container
Normal Pulled 2m49s (x5 over 19m) kubelet, minikube Container image "gcr.io/fluentd-elasticsearch/elasticsearch:v6.6.1" already present on machine
Normal Created 2m49s (x5 over 19m) kubelet, minikube Created container
Normal Started 2m49s (x5 over 19m) kubelet, minikube Started container
When I visit minikube-ip:30345, I get "Kibana server is not ready yet".
When I ssh into minikube, curl 10.96.18.172:9200 doesn't work, so I suspect the problem lies in Elasticsearch...
Can anyone help me? Thanks in advance!
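A few checks that might narrow this down. A diagnostic sketch using the pod names from the output above; the restart count and exit code 137 suggest the Elasticsearch container is being killed, often because minikube's default memory is too small for two Elasticsearch replicas (an assumption worth verifying):
kubectl logs elasticsearch-logging-0 -n kube-system --previous     # why the last container instance exited
kubectl logs kibana-logging-5c895c4cd-qjrkz -n kube-system         # Kibana typically reports whether it can reach Elasticsearch
kubectl get events -n kube-system --sort-by=.lastTimestamp
minikube delete && minikube start --memory=4096                    # if memory is the cause, recreating minikube with more RAM is one common fix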

kubernetes windows worker node with calico can not deploy pods

I tried to use kubeadm.exe join to join a Windows worker node, but it's not working.
Then I followed this document: nwoodmsft/SDN/CalicoFelix.md. After this, the node status looks like this:
# node status
root#ysicing:~# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
win-o35a06j767t Ready <none> 1h v1.10.10 <none> Windows Server Standard 10.0.17134.1 docker://18.9.0
ysicing Ready master 4h v1.10.10 <none> Debian GNU/Linux 9 (stretch) 4.9.0-8-amd64 docker://17.3.3
Pod status:
root#ysicing:~# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
default demo-deployment-c96d5d97b-99h9s 0/1 ContainerCreating 0 5m <none> win-o35a06j767t
default demo-deployment-c96d5d97b-lq2jm 0/1 ContainerCreating 0 5m <none> win-o35a06j767t
default demo-deployment-c96d5d97b-zrc2k 1/1 Running 0 5m 192.168.0.3 ysicing
default iis-7f7dc9fbbb-xhccv 0/1 ContainerCreating 0 1h <none> win-o35a06j767t
kube-system calico-node-nr5mt 0/2 ContainerCreating 0 1h 192.168.1.2 win-o35a06j767t
kube-system calico-node-w6mls 2/2 Running 0 5h 172.16.0.169 ysicing
kube-system etcd-ysicing 1/1 Running 0 6h 172.16.0.169 ysicing
kube-system kube-apiserver-ysicing 1/1 Running 0 6h 172.16.0.169 ysicing
kube-system kube-controller-manager-ysicing 1/1 Running 0 6h 172.16.0.169 ysicing
kube-system kube-dns-86f4d74b45-dbcmb 3/3 Running 0 6h 192.168.0.2 ysicing
kube-system kube-proxy-wt6dn 1/1 Running 0 6h 172.16.0.169 ysicing
kube-system kube-proxy-z5jx8 0/1 ContainerCreating 0 1h 192.168.1.2 win-o35a06j767t
kube-system kube-scheduler-ysicing 1/1 Running 0 6h 172.16.0.169 ysicing
kube-proxy and Calico should not run as containers on the Windows node; on Windows, kube-proxy runs natively as kube-proxy.exe.
Calico pod error info:
Warning FailedCreatePodSandBox 2m (x1329 over 32m) kubelet, win-o35a06j767t Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "calico-node-nr5mt": Error response from daemon: network host not found
demo.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: iis
spec:
replicas: 1
template:
metadata:
labels:
app: iis
spec:
nodeSelector:
beta.kubernetes.io/os: windows
containers:
- name: iis
image: microsoft/iis
resources:
limits:
memory: "128Mi"
cpu: 2
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
labels:
app: iis
name: iis
namespace: default
spec:
ports:
- port: 80
protocol: TCP
targetPort: 80
selector:
app: iis
type: NodePort
Demo pod error logs:
(extra info:
{"SystemType":"Container","Name":"082e861a8720a84223111b3959a1e2cd26e4be3d0ffcb9eda35b2a09955d4081","Owner":"docker","VolumePath":"\\\\?\\Volume{e8dcfa1d-fbbe-4ef9-b849-5f02b1799a3f}","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\\windowsfilter\\082e861a8720a84223111b3959a1e2cd26e4be3d0ffcb9eda35b2a09955d4081","Layers":[{"ID":"8c940e59-c455-597f-b4b2-ff055e33bc2a","Path":"C:\\ProgramData\\docker\\windowsfilter\\7f1a079916723fd228aa878db3bb1e37b50e508422f20be476871597fa53852d"},{"ID":"f72db42e-18f4-54da-98f1-0877e17a069f","Path":"C:\\ProgramData\\docker\\windowsfilter\\449dc4ee662760c0102fe0f388235a111bb709d30b6d9b6787fb26d1ee76c990"},{"ID":"40282350-4b8f-57a2-94e9-31bebb7ec0a9","Path":"C:\\ProgramData\\docker\\windowsfilter\\6ba0fa65b66c3b3134bba338e1f305d030e859133b03e2c80550c32348ba16c5"},{"ID":"f5a96576-2382-5cba-a12f-82ad7616de0f","Path":"C:\\ProgramData\\docker\\windowsfilter\\3b68fac2830f2110aa9eb1c057cf881ee96ce973a378b37e20b74e32c3d41ee0"}],"ProcessorWeight":2,"HostName":"iis-7f7dc9fbbb-xhccv","HvPartition":false})
Warning FailedCreatePodSandBox 14m (x680 over 29m) kubelet, win-o35a06j767t (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "iis-7f7dc9fbbb-xhccv": Error response from daemon: CreateComputeSystem 0b9ab5f3dd4a69464f756aeb0bd780763b38712e32e8c1318fdd17e531437b0f: The operating system of the container does not match the operating system of the host.
(extra info:{"SystemType":"Container","Name":"0b9ab5f3dd4a69464f756aeb0bd780763b38712e32e8c1318fdd17e531437b0f","Owner":"docker","VolumePath":"\\\\?\\Volume{e8dcfa1d-fbbe-4ef9-b849-5f02b1799a3f}","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\\windowsfilter\\0b9ab5f3dd4a69464f756aeb0bd780763b38712e32e8c1318fdd17e531437b0f","Layers":[{"ID":"8c940e59-c455-597f-b4b2-ff055e33bc2a","Path":"C:\\ProgramData\\docker\\windowsfilter\\7f1a079916723fd228aa878db3bb1e37b50e508422f20be476871597fa53852d"},{"ID":"f72db42e-18f4-54da-98f1-0877e17a069f","Path":"C:\\ProgramData\\docker\\windowsfilter\\449dc4ee662760c0102fe0f388235a111bb709d30b6d9b6787fb26d1ee76c990"},{"ID":"40282350-4b8f-57a2-94e9-31bebb7ec0a9","Path":"C:\\ProgramData\\docker\\windowsfilter\\6ba0fa65b66c3b3134bba338e1f305d030e859133b03e2c80550c32348ba16c5"},{"ID":"f5a96576-2382-5cba-a12f-82ad7616de0f","Path":"C:\\ProgramData\\docker\\windowsfilter\\3b68fac2830f2110aa9eb1c057cf881ee96ce973a378b37e20b74e32c3d41ee0"}],"ProcessorWeight":2,"HostName":"iis-7f7dc9fbbb-xhccv","HvPartition":false})
Normal SandboxChanged 4m (x1083 over 29m) kubelet, win-o35a06j767t Pod sandbox changed, it will be killed and re-created.
config: "c:\k\"
The cni directory is empty by default. Then add calico-felix.exe and config fileL2Brige.conf
i try to google it, need cni, but not found calico cni.
What should I do in this situation, build Windows calico cni?
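One check that maps directly to the "operating system of the container does not match the operating system of the host" error above: confirm the Windows node's OS build and make sure every Windows image you deploy is built for that same release, because process-isolated Windows Server containers must match the host build (10.0.17134 corresponds to Windows Server, version 1803). A sketch, run from the Linux master; the exact image tag to pin to is an assumption:
kubectl get node win-o35a06j767t -o jsonpath='{.status.nodeInfo.osImage}{" "}{.status.nodeInfo.kernelVersion}{"\n"}'
# then pin the Windows images in demo.yaml (and any containerized calico/kube-proxy images) to a tag built for that release, e.g. an IIS tag for 1803 rather than the floating microsoft/iis tag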
