Setting up Helm CLI fails with fatal 'Error: remote error: tls: bad certificate' - ibm-cloud-private

I am following https://github.com/rpsene/icp-scripts/blob/master/icp-310-single-node.sh to install the CE version of ICP using Docker, but it results in the error below:
TASK [tiller : Deploying Tiller] ***********************************************
changed: [localhost]
TASK [tiller : Waiting for Tiller to start] ************************************
changed: [localhost]
TASK [helm-config : Setting up Helm cli] ***************************************
FAILED - RETRYING: Setting up Helm cli (10 retries left).
FAILED - RETRYING: Setting up Helm cli (9 retries left).
FAILED - RETRYING: Setting up Helm cli (8 retries left).
FAILED - RETRYING: Setting up Helm cli (7 retries left).
FAILED - RETRYING: Setting up Helm cli (6 retries left).
FAILED - RETRYING: Setting up Helm cli (5 retries left).
FAILED - RETRYING: Setting up Helm cli (4 retries left).
FAILED - RETRYING: Setting up Helm cli (3 retries left).
FAILED - RETRYING: Setting up Helm cli (2 retries left).
FAILED - RETRYING: Setting up Helm cli (1 retries left).
fatal: [localhost]: FAILED! => changed=true
attempts: 10
cmd: |-
helm init --client-only --skip-refresh
export HELM_HOME=~/.helm
cp /installer/cluster/cfc-certs/helm/admin.crt $HELM_HOME/cert.pem
cp /installer/cluster/cfc-certs/helm/admin.key $HELM_HOME/key.pem
kubectl -n kube-system get pods -l app=helm,name=tiller
helm list --tls
delta: '0:00:02.447326'
end: '2019-01-31 19:36:02.072940'
msg: non-zero return code
rc: 1
start: '2019-01-31 19:35:59.625614'
stderr: 'Error: remote error: tls: bad certificate'
stderr_lines: <omitted>
stdout: |-
$HELM_HOME has been configured at /root/.helm.
Not installing Tiller due to 'client-only' flag having been set
Happy Helming!
NAME READY STATUS RESTARTS AGE
tiller-deploy-546cd68bcb-b8wkw 1/1 Running 1 5h
stdout_lines: <omitted>
PLAY RECAP *********************************************************************
192.168.17.131 : ok=159 changed=87 unreachable=0 failed=0
localhost : ok=75 changed=40 unreachable=0 failed=1
Playbook run took 0 days, 0 hours, 10 minutes, 10 seconds

You may need to upgrade tiller-deploy by reinitializing it.
# use the following command to check whether the tiller-deploy pod is running or not
$ kubectl get pod -n kube-system
# delete the tiller-deploy deployment
$ kubectl delete deployment -n kube-system tiller-deploy
# use the same command to confirm that tiller-deploy is deleted
$ kubectl get pod -n kube-system
# use the command below to deploy tiller-deploy again
$ helm init

Thank you Richard for your answer. It gave me a starting point for further research: it turned out the certificate builder was outdated. After updating it, the installation completed without an error.
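For anyone hitting the same 'bad certificate' error, a quick way to check whether the Helm client certificate is the culprit is to inspect the files the failing task copies into $HELM_HOME (a sketch; the paths come from the task output above):
# show who issued the client certificate and whether it has expired
openssl x509 -in ~/.helm/cert.pem -noout -subject -issuer -dates
# confirm the private key actually matches the certificate (assuming an RSA key; the two digests must be identical)
openssl x509 -in ~/.helm/cert.pem -noout -modulus | openssl md5
openssl rsa -in ~/.helm/key.pem -noout -modulus | openssl md5
If the certificate has expired or was signed by a different CA than the one Tiller trusts, regenerating the certificates and re-running the installer (as in the resolution above) is the usual way out.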

Got this error after reinstalling the Kubernetes integration on GitLab.
The error on the Kubernetes integration page was: "Something went wrong while installing GitLab Runner. Operation failed. Check pod logs for install-runner for more details."
It turned out that GitLab does not correctly remove the deployments/pods on Google Cloud after the Kubernetes integration is deleted.
To get pod logs:
kubectl -n gitlab-managed-apps get pods
kubectl -n gitlab-managed-apps logs [pod-name]
To solve the problem:
First remove your Kubernetes integration in GitLab, then delete the gitlab-managed-apps namespace:
gcloud config set project [project-id]
kubectl delete namespace gitlab-managed-apps
Finally, re-add the Kubernetes integration.
Have fun.

How can I diagnose why a k8s pod keeps restarting?

I deployed Elasticsearch to minikube with the configuration file below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
spec:
  replicas: 1
  selector:
    matchLabels:
      name: elasticsearch
  template:
    metadata:
      labels:
        name: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: elasticsearch:7.10.1
          ports:
            - containerPort: 9200
            - containerPort: 9300
I run the command kubectl apply -f es.yml to deploy the elasticsearch cluster.
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
elasticsearch-fb9b44948-bchh2 1/1 Running 5 6m23s
The Elasticsearch pod keeps restarting every few minutes. When I run the kubectl describe pod command, I can see these events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m11s default-scheduler Successfully assigned default/elasticsearch-fb9b44948-bchh2 to minikube
Normal Pulled 3m18s (x5 over 7m11s) kubelet Container image "elasticsearch:7.10.1" already present on machine
Normal Created 3m18s (x5 over 7m11s) kubelet Created container elasticsearch
Normal Started 3m18s (x5 over 7m10s) kubelet Started container elasticsearch
Warning BackOff 103s (x11 over 5m56s) kubelet Back-off restarting failed container
The last event is Back-off restarting failed container, but I don't know why the pod is being restarted. Is there any way I can check why it keeps restarting?
The first step (kubectl describe pod) you've already done. As a next step I suggest checking the container logs: kubectl logs <pod_name>. 99% of the time you'll get the reason from the logs in this case (my bet is on a bootstrap check failure).
When neither describe pod nor the logs say anything about the error, I get into the container with exec: kubectl exec -it <pod_name> -c <container_name> sh. This gives you a shell inside the container (provided there IS a shell binary in it), so you can investigate the problem manually. Note that to keep a failing container alive you may need to change command and args to something like this:
command:
  - /bin/sh
  - -c
args:
  - cat /dev/stdout
Be sure to disable probes when doing this. A container may also be restarted because its liveness probe fails; you will see that in kubectl describe pod if it happens. Since your snippet doesn't have any probes specified, you can skip this.
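A minimal sketch of what such a probe looks like in a container spec, in case you do need to hunt for one (the path and port here are hypothetical, not taken from the question; comment the block out while debugging):
          livenessProbe:
            httpGet:
              path: /healthz    # hypothetical endpoint
              port: 9200
            initialDelaySeconds: 30
            periodSeconds: 10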
Checking the logs of the pod using kubectl logs <podname> gives a clue about what could be going wrong.
ERROR: [2] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/docker-cluster.log
Check this post for a solution
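For reference, these two bootstrap checks are typically addressed on a single-node dev cluster by raising vm.max_map_count on the node and running Elasticsearch in single-node discovery mode. A sketch under those assumptions (the discovery.type environment variable is honored by the official elasticsearch image):
# raise the mmap limit on the minikube node (not persisted across node restarts)
minikube ssh -- sudo sysctl -w vm.max_map_count=262144
and in the container spec of the Deployment above:
          env:
            - name: discovery.type
              value: single-node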

Helm Fail In Cloud Build

I'm using alpine/helm:3.0.0 in the following Google Cloud Build step
- id: 'update helm app'
  name: 'alpine/helm:3.0.0'
  args: ['upgrade', 'staging', './iprocure-chart/']
  env:
    - CLOUDSDK_COMPUTE_ZONE=us-central1-a
    - CLOUDSDK_CONTAINER_CLUSTER=iprocure-cluster
The problem is that when I run this using cloud-build-local I get the following error and the pipeline fails:
Starting Step #4 - "update helm app"
Step #4 - "update helm app": Already have image (with digest): alpine/helm:3.0.0
Step #4 - "update helm app": Error: UPGRADE FAILED: query: failed to query with labels: Get http://localhost:8080/api/v1/namespaces/default/secrets?labelSelector=name%3Dstaging%2Cowner%3Dhelm%2Cstatus%3Ddeployed: dial tcp 127.0.0.1:8080: connect: connection refused
This is because the kubectl configuration has not been set or passed to the Helm step.
To configure it, check out https://cloud.google.com/cloud-build/docs/build-debug-locally#before_you_begin
and in your build step add an env entry like this:
- id: 'update helm app'
  name: 'alpine/helm:3.0.0'
  args: ['upgrade', 'staging', './iprocure-chart/']
  env:
    - CLOUDSDK_COMPUTE_ZONE=us-central1-a
    - CLOUDSDK_CONTAINER_CLUSTER=iprocure-cluster
    - KUBECONFIG=/workspace/.kube/config
If this doesn't work, try passing the config with the --kubeconfig flag in your helm command, like this:
--kubeconfig=/workspace/.kube/config
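One common way to make that kubeconfig exist in the first place is a step before the Helm step that writes GKE credentials into the shared /workspace volume; a hedged sketch (the step id and image choice are assumptions, the zone and cluster names are taken from the step above):
# sketch: fetch cluster credentials into /workspace so later steps can reuse them
- id: 'get kube credentials'
  name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bash'
  args: ['-c', 'gcloud container clusters get-credentials iprocure-cluster --zone us-central1-a --project $PROJECT_ID']
  env:
    - KUBECONFIG=/workspace/.kube/config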

MiniKube coredns is in CrashLoopBackOff status in AWS EC2

I am new to Minikube. We deployed minikube version: v1.0.1 on AWS EC2. Coredns is showing as CrashLoopBackOff.
kubectl get -n kube-system pods
NAME READY STATUS RESTARTS AGE
coredns-6765558d84-6bpff 0/1 CrashLoopBackOff 531 44h
coredns-6765558d84-9mqz6 0/1 CrashLoopBackOff 531 44h
The logs of these pods are showing:
2019-05-22T06:29:40.959Z [FATAL] plugin/loop: Loop (127.0.0.1:53726 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 1771143215983809104.2668792180170228628."
I did try to remove the word loop from a config file, as per another Stack Overflow ticket. CoreDNS started working, but the proxy stopped!
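For context, the 'config file' mentioned here is the Corefile kept in the coredns ConfigMap in kube-system; a sketch of how it is usually inspected and edited (removing the loop plugin only hides the symptom, the real fix is to break the resolv.conf forwarding loop described in the linked troubleshooting page):
# inspect the Corefile
kubectl -n kube-system get configmap coredns -o yaml
# edit it, e.g. point "forward . /etc/resolv.conf" at an upstream resolver instead
kubectl -n kube-system edit configmap coredns
# restart the CoreDNS pods so they pick up the change
kubectl -n kube-system delete pod -l k8s-app=kube-dns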

After a successful install of CAM, the "cam-mongo" pod went down

After a successful deployment of CAM (it was up and running for a couple of days), the "cam-mongo" microservice suddenly went down. Checking the pod with the two commands below gives you an Error syncing pod:
1) kubectl describe pods -n services
Warning BackOff 3s (x3 over 18s) kubelet, 9.109.191.126 Back-off restarting failed container
Warning FailedSync 3s (x3 over 18s) kubelet, 9.109.191.126 Error syncing pod
With only this information you don't know what went wrong or how to fix it.
2) kubectl -n services logs cam-mongo-5c89fcccbd-r2hv4 -p (with the -p option you can grab the logs from the previously running container)
The above command shows the following:
exception in initAndListen: 98 Unable to lock file: /data/db/mongod.lock Resource temporarily unavailable. Is a mongod instance already running?, terminating
Conclusion:
While starting, the container inside the "cam-mongo" pod was unable to use the existing /data/db/mongod.lock file; hence the pod will not come up and you cannot access CAM.
After further analysis I resolved the issue as follows:
1) Spin up a container and mount the cam-mongo volume within it.
To do this I used the pod creation YAML below, which mounts the PV where /data/db lives.
kind: Pod
apiVersion: v1
metadata:
  name: mongo-troubleshoot-pod
spec:
  volumes:
    - name: cam-mongo-pv
      persistentVolumeClaim:
        claimName: cam-mongo-pv
  containers:
    - name: mongo-troubleshoot
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/data/db"
          name: cam-mongo-pv
Run: kubectl -n services create -f ./mongo-troubleshoot-pod.yaml
2) Use "docker exec -it /bin/bash " (look for it from "kubectl -n services describe po/mongo-troubleshoot-pod-xxxxx" info)
cd /data/db
rm mongod.lock
rm WiredTiger.lock
3) Kill the pod you created for troubleshooting.
4) Kill the corrupted cam-mongo pod using the command below; its deployment will recreate it:
kubectl -n services delete pod <cam-mongo pod name>
It fixed the issue.
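A quick check (a sketch; the exact pod name will differ on your cluster) that the recreated cam-mongo pod came back cleanly:
kubectl -n services get pods | grep cam-mongo
kubectl -n services logs <new cam-mongo pod name>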

ICP 2.1 Beta 2 can't create Cloudant DB fast enough

When running the following command from the installation instructions:
sudo docker run --net=host -t -e LICENSE=accept -v $(pwd):/installer/cluster \
ibmcom/icp-inception:2.1.0-beta-2-ee install
The task to create the Cloudant DB times out:
TASK [master : Ensuring that the Cloudant Database is ready] ***********************************************************************************************************************************************
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (20 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (19 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (18 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (17 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (16 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (15 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (14 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (13 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (12 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (11 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (10 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (9 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (8 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (7 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (6 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (5 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (4 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (3 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (2 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (1 retries left).
fatal: [10.20.30.29] => Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>
PLAY RECAP *************************************************************************************************************************************************************************************************
10.20.30.29 : ok=149 changed=57 unreachable=0 failed=1
The container is still using CPU in /usr/bin/python /usr/bin/ansible-playbook -e @cluster/config.yaml playbook/site.yaml, so I think it's still installing. How do I increase the number of retries?
Run the command
docker info | grep -i cgroup
and you should see
Cgroup Driver: systemd
...so I matched that in ICP by adding the line
kubelet_extra_args: ["--cgroup-driver=systemd"]
to ICP's config.yaml
This usually means the Cloudant DB takes too long to become ready, and by that time the 20-retry limit has been reached. Uninstalling ICP 2.1 beta2 and reinstalling usually gets past this issue. ICP 2.1 beta3 will have an increased timeout.
If this does not work, you can check the following:
docker exec -it <cloudant container> bash
now inside the container, to check the database status:
cast cluster status -p $ADMIN_PASSWORD
Also can check:
docker logs <long cloudant container name>
Before installing ICP beta2 again, check with
docker volume ls
whether the old volumes were cleaned up.
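If leftover volumes are still there, a sketch of cleaning them up before the reinstall (docker volume prune removes all unused local volumes, so make sure nothing else on the host still needs them):
docker volume ls
docker volume prune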
In some cases, we find that the ICP datastore image, which is based on Cloudant, is large and takes a while to download. Have you considered pre-pulling the image onto your host?
docker pull ibmcom/icp-datastore:{version}
You would need to do this on each master (but not on the worker nodes).
