How to recover an HA cluster of stacked control plane and etcd nodes - high-availability

I used kubeadm to set up an HA cluster (3 masters) with stacked control plane and etcd nodes. But after I used kubeadm reset to destroy one master, I can't join a new master to the HA cluster anymore:
step1:
docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:3.2.24 etcdctl --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --ca-file /etc/kubernetes/pki/etcd/ca.crt --endpoints https://xxx.xxx.xxx.xxx:2379 member remove xxxxxxx
to remove the bad etcd member.
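(To get the member ID for "member remove", I first listed the members with the same flags; the endpoint is masked as above:)
docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:3.2.24 etcdctl --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --ca-file /etc/kubernetes/pki/etcd/ca.crt --endpoints https://xxx.xxx.xxx.xxx:2379 member list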
step2:
docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:3.2.24 etcdctl --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --ca-file /etc/kubernetes/pki/etcd/ca.crt --endpoints https://xxx.xxx.xxx.xxx:2379 cluster-health
……
……
cluster is healthy
step3:
kubectl get cs
……
……
etcd-0 Healthy {"health":"true"}
step4:
kubeadm join the new master to the HA cluster, but it fails with:
etcd cluster is not healthy: context deadline exceeded
Can anyone help me solve this problem?

$ kubectl -n kube-system edit cm kubeadm-config
Then remove the bad node's information under apiEndpoints,
e.g. remove the three lines below:
master1-k8s:
  advertiseAddress: 172.16.12.216
  bindPort: 6443
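For context, with three healthy masters the apiEndpoints section of the ClusterStatus in that ConfigMap looks roughly like this (the other hostnames and addresses below are just placeholders):
apiEndpoints:
  master1-k8s:
    advertiseAddress: 172.16.12.216
    bindPort: 6443
  master2-k8s:
    advertiseAddress: 172.16.12.217
    bindPort: 6443
  master3-k8s:
    advertiseAddress: 172.16.12.218
    bindPort: 6443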
Finally, you can use "kubeadm join" to join the new control-plane node to the HA cluster successfully.

Related

Consul Leader not found while writing data to consul

I am new to Consul. In my case I have three servers, and all of them are running.
When I check the leader information using the following URL, "http://localhost:8500/v1/status/leader", I get the correct information:
"192.168.10.7:8300"
Consul\data\raft has the following information.
I saw some answers on Stack Overflow, but they didn't help me.
I also tried the following option:
-bootstrap-expect=3
It shows the error given below:
Error Log
Consul request failed with status [500]: No cluster leader
I am totally stuck. How can I fix this issue?
Use docker run -d -p 8400:8400 -p 8500:8500 -p 8600:53/udp --name node1 -h node1 progrium/consul -server -bootstrap-expect 3
Since we have given -bootstrap-expect 3, Consul waits for three server peers to connect first and then bootstraps the cluster.
1. docker run -d -p 8400:8400 -p 8500:8500 -p 8600:53/udp --name node1 -h node1 progrium/consul -server -bootstrap-expect 3
docker inspect -f '{{.NetworkSettings.IPAddress}}' node1
Use the inspected IP to join with in the next three commands.
2. docker run -d --name node2 -h node2 progrium/consul -server -join 172.17.0.2
3. docker run -d --name node3 -h node3 progrium/consul -server -join 172.17.0.2
4. docker run -d --name node4 -h node4 progrium/consul -server -join 172.17.0.2
Now you can start your service; it will connect to Consul.
Explanation:
As the docs say, "Before a Consul cluster can begin to service requests, a server node must be elected leader." This is the reason for the exception on startup of your Spring Boot service: the leader has not been elected yet.
Why has the leader not been elected? The servers involved in the cluster need to be bootstrapped, and they can be bootstrapped using the -bootstrap-expect configuration option (recommended).
Note: just for testing/learning purposes you can go ahead and create a single server, but keep in mind that "a single server deployment is highly discouraged as data loss is inevitable in a failure scenario."
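Once all three servers have joined, you can confirm that a leader has been elected before starting your service, for example (output will vary):
# leader and peer list via the HTTP API on node1 (the only node with 8500 published)
curl http://localhost:8500/v1/status/leader
curl http://localhost:8500/v1/status/peers
# membership as seen from inside node1
docker exec -t node1 consul members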

Worker nodes not available

I have set up and installed IBM Cloud Private CE with two Ubuntu images in VirtualBox. I can ssh into both images and from one into the other. The ICP dashboard shows only one active node; I was expecting two.
I explicitly ran this command (as the root user on the master node):
docker run -e LICENSE=accept --net=host \
-v "$(pwd)":/installer/cluster \
ibmcom/cfc-installer install -l \
192.168.27.101
The result of this command seemed to be a successful addition of the worker node:
PLAY RECAP *********************************************************************
192.168.27.101 : ok=45 changed=11 unreachable=0 failed=0
But still the worker node isn't showing in the dashboard.
What should I be checking to ensure the worker node will work for the master node?
If you're using Vagrant to configure IBM Cloud Private, I'd highly recommend trying https://github.com/IBM/deploy-ibm-cloud-private
The project will use a Vagrantfile to configure a master/proxy and then provision 2 workers within the image using LXD. You'll get better density and performance on your laptop with this configuration than with running two full VirtualBox images (1 for master/proxy, 1 for the worker).
You can check on your worker node with the following steps:
Check the cluster node status:
Run kubectl get nodes to check the status of the newly added worker node.
If it's NotReady, check the kubelet log for error messages about why kubelet is not running properly:
ICp 2.1
systemctl status kubelet
ICp 1.2
docker ps -a | grep kubelet to get the kubelet_containerid, then
docker logs kubelet_containerid
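For example, to dig into why a node is NotReady, something along these lines usually helps (the node name is a placeholder):
kubectl get nodes
kubectl describe node <worker-node-name>   # check the Conditions section for the reason
systemctl status kubelet                   # ICp 2.1
journalctl -u kubelet --no-pager -n 100    # recent kubelet log entries on systemd hosts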
Run this to get kubectl working:
ln -sf /opt/kubernetes/hyperkube /usr/local/bin/kubectl
Run the commands below on the master node to identify any failed pods in the setup.
Run this to get details of the pods running in the environment:
kubectl -n kube-system get pods -o wide
To restart any failed ICP pods:
txt="0/";ns="kube-system";type="pods"; kubectl -n $ns get $type | grep "$txt" | awk '{ print $1 }' | xargs kubectl -n $ns delete $type
Now run kubectl cluster-info and
kubectl get nodes
Then check the output of the kubectl cluster-info command.
Check whether kubectl version points you at https://localhost:8080 or at https://masternodeip:8001, then run:
kubectl cluster-info
Do you get output?
If not,
log in to https://masternodeip:8443 using the admin login,
copy the configure client CLI settings by clicking on admin in the panel,
and paste them into your master node.
Then run:
kubectl cluster-info
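The pasted settings are typically a handful of kubectl config commands along these lines (the cluster name, address and token below are placeholders, not the exact values your dashboard gives you):
kubectl config set-cluster mycluster --server=https://masternodeip:8001 --insecure-skip-tls-verify=true
kubectl config set-credentials admin --token=<token-from-dashboard>
kubectl config set-context mycluster-context --cluster=mycluster --user=admin --namespace=default
kubectl config use-context mycluster-context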

What are the port mapping complexities solved by Flannel?

Suppose I have 3 containers running on a single host and we are building a Hadoop cluster:
1 is the master (the NameNode) and the other 2 are slaves (DataNodes).
And we need to map ports:
docker run -itd -p 50070:50070 --name master centos:bigdata
docker run -itd -p 50075:50075 -p 50010:50010 --name slave1 centos:bigdata
Now ports 50075, 50010 and 50070 are busy on the host, so we cannot map them again for slave2.
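(i.e. trying to publish the same host ports again for slave2 fails:)
docker run -itd -p 50075:50075 -p 50010:50010 --name slave2 centos:bigdata
# fails, because 50075 and 50010 are already published for slave1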
And if we do some arbitrary mapping like
docker run -p 123:50075 -p 234:50010 --name slave2 centos:bigdata
then the containers won't be able to communicate and it won't work.
So, can Flannel solve this problem?

Accessing consul UI running in docker on OSX

I have a problem similar to "How to access externally to consul UI", but I can't get the combinations of network options to work right.
I'm on OSX using Docker for Mac, not the old docker-machine stuff, and the official consul docker image, not the progrium/consul image.
I can start up a 3-node server cluster fine using
docker run -d --name node1 -h node1 consul agent -server -bootstrap-expect 3
JOIN_IP="$(docker inspect -f '{{.NetworkSettings.IPAddress}}' node1)"
docker run -d --name node2 -h node2 consul agent -server -join $JOIN_IP
docker run -d --name node3 -h node3 consul agent -server -join $JOIN_IP
So far so good, they're connected to each other and working fine. Now I want to start an agent, and view the UI via it.
I tried a bunch of combinations of -client and -bind, which seem to be the key to all of this. Using
docker run -d -p 8500:8500 --name node4 -h node4 consul agent -join $JOIN_IP -ui -client=0.0.0.0 -bind=127.0.0.1
I can get the UI via http://localhost:8500/ui/, and consul members shows all the nodes:
docker exec -t node4 consul members
Node Address Status Type Build Protocol DC
node1 172.17.0.2:8301 alive server 0.7.1 2 dc1
node2 172.17.0.3:8301 alive server 0.7.1 2 dc1
node3 172.17.0.4:8301 alive server 0.7.1 2 dc1
node4 127.0.0.1:8301 alive client 0.7.1 2 dc1
But all is not well; in the UI it tells me node4 is "Agent not live or unreachable" and in its logs there's a whole bunch of
2016/12/19 18:18:13 [ERR] memberlist: Failed to send ping: write udp 127.0.0.1:8301->172.17.0.4:8301: sendto: invalid argument
I've tried a bunch of other combinations - --net=host just borks things up on OSX.
If I try -bind=my box's external IP it won't start,
Error starting agent: Failed to start Consul client: Failed to start lan serf: Failed to create memberlist: Failed to start TCP listener. Err: listen tcp 192.168.1.5:8301: bind: cannot assign requested address
I also tried mapping all the other ports including the udp ports (-p 8500:8500 -p 8600:8600 -p 8400:8400 -p 8300-8302:8300-8302 -p 8600:8600/udp -p 8301-8302:8301-8302/udp) but that didn't change anything.
How can I join a node up to this cluster and view the UI?
Try using the 0.7.2 release of Consul and start the agent using the following (beta as of 0.7.2, final by 0.8.0) syntax:
$ docker run -d -p 8500:8500 --name node4 -h node4 consul agent -join $JOIN_IP -ui -client=0.0.0.0 -bind='{{ GetPrivateIP }}'
The change is the argument to -bind, where Consul will now render out a private IP address. The other template parameters are documented in hashicorp/go-sockaddr.
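After recreating node4 with that bind template, consul members should show node4 advertising a container-network address instead of 127.0.0.1 (the address below is illustrative), and the UI at http://localhost:8500/ui/ should report the agent as alive:
docker exec -t node4 consul members
# node4  172.17.0.5:8301  alive  client  0.7.2  2  dc1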

Docker for Windows: cannot assign requested address

How can I set up a multi-host environment running Docker 1.12 on Hyper-V?
I can easily assign a 127.x.x.x IP, but I would like to assign e.g. 10.240.0.x.
This is my docker-compose.yaml:
version: '2'
services:
  nginx:
    image: nginx:latest
    ports:
      - "127.0.0.100:80:80"
If I try to assign 10.240.0.100 I get this error:
Error starting userland proxy: listen tcp 10.240.0.100:80: bind: cannot assign requested address
What am I missing? Do I have to configure Windows to support these addresses?
Is this using Docker for Windows? With that, you're limited to binding stuff to localhost on the host.
If you want to test a multi-node swarm on your machine, you need to set up a separate set of VMs:
> docker-machine create -d hyperv --hyperv-virtual-switch "Better New Virtual Switch" master
> docker-machine create -d hyperv --hyperv-virtual-switch "Better New Virtual Switch" worker1
> docker-machine create -d hyperv --hyperv-virtual-switch "Better New Virtual Switch" worker2
Init swarm:
> docker-machine inspect --format '{{ json .Driver.IPAddress }}' master
"192.168.202.112"
> docker-machine ssh master docker swarm init --advertise-addr 192.168.202.112
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-4k5ljcmxs1d9q14lth4tfbg868lf8eqi5alxtvgo7s1ptyrhlu-3ihz3bfmx5622vei1smzetudf \
192.168.202.112:2377
Add the workers:
> docker-machine ssh worker1 docker swarm join --token SWMTKN-1-4k5ljcmxs1d9q14lth4tfbg868lf8eqi5alxtvgo7s1ptyrhlu-3ihz3bfmx5622vei1smzetudf 192.168.202.112:2377
> docker-machine ssh worker2 docker swarm join --token SWMTKN-1-4k5ljcmxs1d9q14lth4tfbg868lf8eqi5alxtvgo7s1ptyrhlu-3ihz3bfmx5622vei1smzetudf 192.168.202.112:2377
SSH into the master and go to town (or use it from the host):
> docker-machine ssh master
> docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
aojoo2h0uuj5hv1c9xajo67o2 worker1 Ready Active
eqt1yd8x52gph3axjkz8lxl1z * master Ready Active Leader
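From there you can deploy a service across the swarm and reach it on any node's IP, for example (the service name and image are arbitrary):
> docker service create --name web --publish 80:80 nginx:latest
> curl http://192.168.202.112/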
Details here: https://github.com/docker/for-mac/issues/67#issuecomment-242239997
