ICP 2.1 Beta 2 can't create Cloudant DB fast enough - ibm-cloud-private

When running the following command from the installation instructions:
sudo docker run --net=host -t -e LICENSE=accept -v $(pwd):/installer/cluster \
ibmcom/icp-inception:2.1.0-beta-2-ee install
The task to create the Cloudant DB times out:
TASK [master : Ensuring that the Cloudant Database is ready] ***********************************************************************************************************************************************
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (20 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (19 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (18 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (17 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (16 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (15 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (14 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (13 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (12 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (11 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (10 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (9 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (8 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (7 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (6 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (5 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (4 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (3 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (2 retries left).
FAILED - RETRYING: TASK: master : Ensuring that the Cloudant Database is ready (1 retries left).
fatal: [10.20.30.29] => Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>
PLAY RECAP *************************************************************************************************************************************************************************************************
10.20.30.29 : ok=149 changed=57 unreachable=0 failed=1
The container is still consuming CPU in /usr/bin/python /usr/bin/ansible-playbook -e @cluster/config.yaml playbook/site.yaml, so I think it's still installing. How do I increase the number of retries?

Run the command
docker info | grep -i cgroup
and you should see
Cgroup Driver: systemd
...so I matched that in ICP by adding the line
kubelet_extra_args: ["--cgroup-driver=systemd"]
to ICP's config.yaml.
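For context, a minimal sketch of where that line sits in cluster/config.yaml; the network_type key is only an illustrative neighbor from a default config (an assumption, not required for this fix):
# cluster/config.yaml
network_type: calico
kubelet_extra_args: ["--cgroup-driver=systemd"]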

This usually means the Cloudant DB takes too long to become ready; by the time it is, the 20-retry limit has already been exhausted. Uninstalling ICP 2.1 Beta 2 and reinstalling usually gets past this issue (see the command sketch below). ICP 2.1 Beta 3 will ship with an increased timeout.
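For reference, the uninstall can use the same inception image as the install command from the question, with the uninstall action (same image tag assumed):
sudo docker run --net=host -t -e LICENSE=accept -v $(pwd):/installer/cluster \
ibmcom/icp-inception:2.1.0-beta-2-ee uninstall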
If this does not work, you can check the following:
docker exec -it <cloudant container> bash
Now, inside the container, check the database status:
cast cluster status -p $ADMIN_PASSWORD
You can also check:
docker logs <long cloudant container name>
Before installing ICP Beta 2 again, run
docker volume ls
and check that the old volumes were cleaned up.
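If stale volumes remain, a minimal cleanup sketch; note that docker volume prune removes all unused volumes on the host, not just ICP's, so use it with care:
docker volume ls
docker volume prune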

In some cases, we find that the ICP datastore image, which is based on Cloudant, is large and takes a while to download. Have you considered pre-pulling the image onto your host?
docker pull ibmcom/icp-datastore:{version}
You would need to do this on each master (but not on the worker nodes).
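A minimal sketch for pre-pulling across masters over SSH; the hostnames are placeholders, and {version} is the same placeholder as above:
for host in master1 master2 master3; do
  ssh "$host" docker pull "ibmcom/icp-datastore:{version}"
done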

Related

How to connect a Kubernetes cluster with an image registry

Hi, I have deployed a 3-node Kubernetes cluster (one master, two worker nodes), as shown below:
kubectl get nodes
NAME                 STATUS   ROLES           AGE    VERSION
master.domain.com    Ready    control-plane   161m   v1.24.4
worker1.domain.com   Ready    <none>          154m   v1.24.4
worker2.domain.com   Ready    <none>          153m   v1.24.4
I am using the CRI-O container runtime. I tried creating a few pods, but they are failing with the events below:
Events:
Type     Reason     Age                From               Message
----     ------     ----               ----               -------
Normal   Scheduled  40s                default-scheduler  Successfully assigned default/nginx to worker2.domain.com
Normal   BackOff    26s                kubelet            Back-off pulling image "nginx"
Warning  Failed     26s                kubelet            Error: ImagePullBackOff
Normal   Pulling    11s (x2 over 32s)  kubelet            Pulling image "nginx"
Warning  Failed     2s (x2 over 27s)   kubelet            Failed to pull image "nginx": rpc error: code = Unknown desc = Error reading manifest latest in registry.hub.docker.com/nginx: unauthorized: authentication required
Warning  Failed     2s (x2 over 27s)   kubelet            Error: ErrImagePull
The pod definition file is below:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: frontend
spec:
  containers:
  - name: nginx
    image: nginx
Similarly, I tried mysql instead of nginx. I'm getting the events below for the mysql pod; it looks like it is able to pull the image but not able to run the pod:
Type     Reason     Age                   From               Message
----     ------     ----                  ----               -------
Normal   Scheduled  23m                   default-scheduler  Successfully assigned default/mysql to worker1.domain.com
Normal   Pulled     22m                   kubelet            Successfully pulled image "mysql" in 54.067277637s
Normal   Pulled     22m                   kubelet            Successfully pulled image "mysql" in 18.227802182s
Normal   Pulled     21m                   kubelet            Successfully pulled image "mysql" in 13.511077504s
Normal   Created    20m (x4 over 22m)     kubelet            Created container mysql
Normal   Started    20m (x4 over 22m)     kubelet            Started container mysql
Normal   Pulled     20m                   kubelet            Successfully pulled image "mysql" in 11.998942705s
Normal   Pulling    20m (x5 over 23m)     kubelet            Pulling image "mysql"
Normal   Pulled     20m                   kubelet            Successfully pulled image "mysql" in 13.68976309s
Normal   Pulled     18m                   kubelet            Successfully pulled image "mysql" in 16.584670292s
Warning  BackOff    3m12s (x80 over 22m)  kubelet            Back-off restarting failed container
Below is the pod status:
NAME    READY   STATUS             RESTARTS        AGE
mysql   0/1     CrashLoopBackOff   8 (4m51s ago)   23m
nginx   0/1     ImagePullBackOff   0               3m26s
You do not really need any extra configuration to pull images from a public image registry.
The containers/image library is used for pulling images from registries. Currently, it supports Docker schema 2/version 1 as well as schema 2/version 2. It also passes all Docker and Kubernetes tests.
cri-container-images
So just reference the image with a fully qualified URI and it should work (see the sketch below).
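As a minimal sketch, here is the nginx pod from the question with a fully qualified image reference; docker.io/library/nginx:latest is the canonical location of the official image and sidesteps ambiguous short-name resolution under CRI-O:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: frontend
spec:
  containers:
  - name: nginx
    image: docker.io/library/nginx:latest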

state_replicaset/state_replicaset.go:98 error making http request: Get kube-state-metrics:8080/metrics: lookup kube-state-metrics on IP:53: no such host

We are trying to start Metricbeat on a Typhoon Kubernetes cluster, but after startup it's not able to get some pod-specific events, like restarts, because of the following error.
The corresponding metricbeat.yaml snippet:
# State metrics from kube-state-metrics service:
- module: kubernetes
  enabled: true
  metricsets:
    - state_node
    - state_deployment
    - state_replicaset
    - state_statefulset
    - state_pod
    - state_container
    - state_cronjob
    - state_resourcequota
    - state_service
    - state_persistentvolume
    - state_persistentvolumeclaim
    - state_storageclass
    # Uncomment this to get k8s events:
    #- event
  period: 10s
  hosts: ["kube-state-metrics:8080"]
The error we are facing:
2020-07-01T10:31:02.486Z ERROR [kubernetes.state_statefulset] state_statefulset/state_statefulset.go:97 error making http request: Get http://kube-state-metrics:8080/metrics: lookup kube-state-metrics on *.*.*.*:53: no such host
2020-07-01T10:31:02.611Z WARN [transport] transport/tcp.go:52 DNS lookup failure "kube-state-metrics": lookup kube-state-metrics on *.*.*.*:53: no such host
2020-07-01T10:31:02.611Z INFO module/wrapper.go:259 Error fetching data for metricset kubernetes.state_node: error doing HTTP request to fetch 'state_node' Metricset data: error making http request: Get http://kube-state-metrics:8080/metrics: lookup kube-state-metrics on *.*.*.*:53: no such host
2020-07-01T10:31:03.313Z ERROR process_summary/process_summary.go:102 Unknown or unexpected state <P> for process with pid 19
2020-07-01T10:31:03.313Z ERROR process_summary/process_summary.go:102 Unknown or unexpected state <P> for process with pid 20
I can add any other info that is required.
Make sure you have kube-state-metrics deployed in your cluster, in the kube-system namespace, for this to work. Metricbeat does not come with it by default.
Please refer to this for detailed deployment instructions.
If your kube-state-metrics is deployed to another namespace, Kubernetes cannot resolve the bare name. For example, we have kube-state-metrics deployed to the monitoring namespace:
$ kubectl get pods -A | grep kube-state-metrics
monitoring kube-state-metrics-765c7c7f95-v7mmp 3/3 Running 17 10d
You could set the hosts option to the full service name, including the namespace, like this:
- module: kubernetes
  enabled: true
  metricsets:
    - state_node
    - state_deployment
    - state_replicaset
    - state_statefulset
    - state_pod
    - state_container
    - state_cronjob
    - state_resourcequota
    - state_service
    - state_persistentvolume
    - state_persistentvolumeclaim
    - state_storageclass
  hosts: ["kube-state-metrics.<your_namespace>:8080"]

Helm Fail In Cloud Build

I'm using alpine/helm:3.0.0 in the following Google Cloud Build step
- id: 'update helm app'
  name: 'alpine/helm:3.0.0'
  args: ['upgrade', 'staging', './iprocure-chart/']
  env:
    - CLOUDSDK_COMPUTE_ZONE=us-central1-a
    - CLOUDSDK_CONTAINER_CLUSTER=iprocure-cluster
The problem is that when I run this using cloud-build-local, I get the following error and the pipeline fails:
Starting Step #4 - "update helm app"
Step #4 - "update helm app": Already have image (with digest): alpine/helm:3.0.0
Step #4 - "update helm app": Error: UPGRADE FAILED: query: failed to query with labels: Get http://localhost:8080/api/v1/namespaces/default/secrets?labelSelector=name%3Dstaging%2Cowner%3Dhelm%2Cstatus%3Ddeployed: dial tcp 127.0.0.1:8080: connect: connection refused
This is because the Kubernetes configuration has not been set or passed to the build step.
To configure it, see https://cloud.google.com/cloud-build/docs/build-debug-locally#before_you_begin
and in your build step add an env entry like this:
- id: 'update helm app'
  name: 'alpine/helm:3.0.0'
  args: ['upgrade', 'staging', './iprocure-chart/']
  env:
    - CLOUDSDK_COMPUTE_ZONE=us-central1-a
    - CLOUDSDK_CONTAINER_CLUSTER=iprocure-cluster
    - KUBECONFIG=/workspace/.kube/config
If this doesn't work, try passing the config with the --kubeconfig flag in your helm command, like this:
--kubeconfig=/workspace/.kube/config
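A minimal sketch of the same build step with the flag passed through args instead (release name and chart path taken from the question):
- id: 'update helm app'
  name: 'alpine/helm:3.0.0'
  args: ['upgrade', 'staging', './iprocure-chart/', '--kubeconfig=/workspace/.kube/config']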

setting up Helm cli resulting in fatal 'Error: remote error: tls: bad certificate'

I am following https://github.com/rpsene/icp-scripts/blob/master/icp-310-single-node.sh to install the CE version of ICP using Docker, but it results in the error below:
TASK [tiller : Deploying Tiller] ***********************************************
changed: [localhost]
TASK [tiller : Waiting for Tiller to start] ************************************
changed: [localhost]
TASK [helm-config : Setting up Helm cli] ***************************************
FAILED - RETRYING: Setting up Helm cli (10 retries left).
FAILED - RETRYING: Setting up Helm cli (9 retries left).
FAILED - RETRYING: Setting up Helm cli (8 retries left).
FAILED - RETRYING: Setting up Helm cli (7 retries left).
FAILED - RETRYING: Setting up Helm cli (6 retries left).
FAILED - RETRYING: Setting up Helm cli (5 retries left).
FAILED - RETRYING: Setting up Helm cli (4 retries left).
FAILED - RETRYING: Setting up Helm cli (3 retries left).
FAILED - RETRYING: Setting up Helm cli (2 retries left).
FAILED - RETRYING: Setting up Helm cli (1 retries left).
fatal: [localhost]: FAILED! => changed=true
attempts: 10
cmd: |-
  helm init --client-only --skip-refresh
  export HELM_HOME=~/.helm
  cp /installer/cluster/cfc-certs/helm/admin.crt $HELM_HOME/cert.pem
  cp /installer/cluster/cfc-certs/helm/admin.key $HELM_HOME/key.pem
  kubectl -n kube-system get pods -l app=helm,name=tiller
  helm list --tls
delta: '0:00:02.447326'
end: '2019-01-31 19:36:02.072940'
msg: non-zero return code
rc: 1
start: '2019-01-31 19:35:59.625614'
stderr: 'Error: remote error: tls: bad certificate'
stderr_lines: <omitted>
stdout: |-
  $HELM_HOME has been configured at /root/.helm.
  Not installing Tiller due to 'client-only' flag having been set
  Happy Helming!
  NAME                             READY   STATUS    RESTARTS   AGE
  tiller-deploy-546cd68bcb-b8wkw   1/1     Running   1          5h
stdout_lines: <omitted>
PLAY RECAP *********************************************************************
192.168.17.131 : ok=159 changed=87 unreachable=0 failed=0
localhost : ok=75 changed=40 unreachable=0 failed=1
Playbook run took 0 days, 0 hours, 10 minutes, 10 seconds
You may need to upgrade the tiller-deploy deployment by re-initializing it.
# Use the following command to check whether the tiller-deploy pod is running:
$ kubectl get pod -n kube-system
# Delete the tiller-deploy deployment:
$ kubectl delete deployment -n kube-system tiller-deploy
# Use the same command to confirm that tiller-deploy is deleted:
$ kubectl get pod -n kube-system
# Use the command below to deploy tiller-deploy again:
$ helm init
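After re-initializing, a quick way to confirm Tiller is back and that the TLS handshake works (a hedged check, assuming the standard deployment name):
$ kubectl -n kube-system rollout status deployment/tiller-deploy
$ helm version --tls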
Thank you, Richard, for your answer. It gave me the lead I needed, and I researched it further. I found out that the certificate builder was outdated; I updated it and, voila, it installed without an error.
I got this error after reinstalling the Kubernetes integration on GitLab.
The error on the Kubernetes integration page was: "Something went wrong while installing GitLab Runner. Operation failed. Check pod logs for install-runner for more details."
It turned out that GitLab does not correctly remove its deployments/pods in the Google Cloud console after the Kubernetes integration is deleted.
To get pod logs:
kubectl -n gitlab-managed-apps get pods
kubectl -n gitlab-managed-apps logs [pod-name]
To solve the problem:
First remove your Kubernetes integration on GitLab, then delete the gitlab-managed-apps namespace:
gcloud config set project [project-id]
kubectl delete namespace gitlab-managed-apps
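Before re-adding the integration, you can verify the namespace is actually gone (a quick check, not part of the original steps); once deletion completes, this should return a NotFound error:
kubectl get namespace gitlab-managed-apps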
At the end, re-add the Kubernetes integration.
Have fun.

Hyperledger-fabric : chaincode deploy connection error

I'm trying to test the Fabric chaincode example02 with Docker. I'm a newbie :)
This is my docker-compose.yml:
membersrvc:
  image: hyperledger/fabric-membersrvc
  command: membersrvc
vp0:
  image: hyperledger/fabric-peer
  environment:
    - CORE_PER_ID=vp0
    - CORE_PEER_ADDRESSAUTODETECT=true
    - CORE_VM_ENDPOINT=http://0.0.0.0:2375
    - CORE_LOGGING_LEVEL=DEBUG
  command: sh -c "sleep 5; peer node start --peer-chaincodedev"
vp1:
  extends:
    service: vp0
  environment:
    - CORE_PEER_ID=vp1
    - CORE_PEER_DISCOVERY_ROOTNODE=vp0:7051
  links:
    - vp0
vp2:
  extends:
    service: vp0
  environment:
    - CORE_PEER_ID=vp2
    - CORE_PEER_DISCOVERY_ROOTNODE=vp0:7051
  links:
    - vp0
and I run the following (I referred to the Fabric chaincode setup page):
Terminal 1 :
$ docker-compose up
Terminal 2 :
$ cd /hyperledger/examples/chaincode/go/chaincode_example02
$ CORE_CHAINCODE_ID_NAME=mycc CORE_PEER_ADDRESS=0.0.0.0:7051 ./chaincode_example02
Terminal 3 :
$ peer chaincode deploy -n mycc -c '{"Args": ["init", "a","100", "b", "200"]}'
Terminals 1 and 2 work well, but terminal 3 shows a connection error:
2016/10/21 04:39:15 grpc: addrConn.resetTransport failed to create client
transport: connection error: desc = "transport: dial tcp 0.0.0.0:7051:
getsockopt: connection refused"; Reconnecting to {"0.0.0.0:7051" <nil>}
Error: Error building chaincode: Error trying to connect to local peer:
grpc: timed out when dialing
What's the problem?
It seems you are missing the Compose statements that map the required ports from the Docker container to the host machine (where you are running the peer command). So it's possible that the peer process is listening on port 7051 inside your peer container, but that port is not reachable by the peer command run outside the container in terminal 3.
You can map ports using the 'ports' tag, e.g.:
membersrvc:
  image: hyperledger/fabric-membersrvc
  ports:
    - "7054:7054"
  command: membersrvc
vp0:
  image: hyperledger/fabric-peer
  ports:
    - "7050:7050"
    - "7051:7051"
    - "7053:7053"
  environment:
    - CORE_PER_ID=vp0
    - CORE_PEER_ADDRESSAUTODETECT=true
    - CORE_VM_ENDPOINT=http://0.0.0.0:2375
    - CORE_LOGGING_LEVEL=DEBUG
  command: sh -c "sleep 5; peer node start --peer-chaincodedev"
Before you run peer chaincode deploy ... in terminal 3, you can check whether the peer process is listening on port 7051 using:
netstat -lnptu | grep 7051
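On hosts without netstat, an equivalent check with ss (a common alternative, not from the original answer):
ss -lnt | grep 7051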
