Kubernetes service unreachable from master node on EC2

I created a k8s cluster on AWS, using kubeadm, with 1 master and 1 worker following the guide available here.
Then, I started 1 ElasticSearch container:
kubectl run elastic --image=elasticsearch:2 --replicas=1
It was deployed successfully on the worker node. Then I tried to expose it as a service on the cluster:
kubectl expose deploy/elastic --port 9200
And it was exposed successfully:
NAMESPACE NAME READY STATUS RESTARTS AGE
default elastic-664569cb68-flrrz 1/1 Running 0 16m
kube-system etcd-ip-172-31-140-179.ec2.internal 1/1 Running 0 16m
kube-system kube-apiserver-ip-172-31-140-179.ec2.internal 1/1 Running 0 16m
kube-system kube-controller-manager-ip-172-31-140-179.ec2.internal 1/1 Running 0 16m
kube-system kube-dns-86f4d74b45-mc24s 3/3 Running 0 17m
kube-system kube-flannel-ds-fjkkc 1/1 Running 0 16m
kube-system kube-flannel-ds-zw4pq 1/1 Running 0 17m
kube-system kube-proxy-4c8lh 1/1 Running 0 17m
kube-system kube-proxy-zkfwn 1/1 Running 0 16m
kube-system kube-scheduler-ip-172-31-140-179.ec2.internal 1/1 Running 0 16m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default elastic ClusterIP 10.96.141.188 <none> 9200/TCP 16m
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 17m
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 17m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system kube-flannel-ds 2 2 2 2 2 beta.kubernetes.io/arch=amd64 17m
kube-system kube-proxy 2 2 2 2 2 <none> 17m
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
default elastic 1 1 1 1 16m
kube-system kube-dns 1 1 1 1 17m
NAMESPACE NAME DESIRED CURRENT READY AGE
default elastic-664569cb68 1 1 1 16m
kube-system kube-dns-86f4d74b45 1 1 1 17m
But when I try to curl http://10.96.141.188:9200 from the master node, I get a timeout; everything indicates that the generated cluster IP is not reachable from the master node. It works only on the worker node.
I tried everything I could find:
Add a bunch of rules to iptables
iptables -P FORWARD ACCEPT
iptables -I FORWARD 1 -i cni0 -j ACCEPT -m comment --comment "flannel subnet"
iptables -I FORWARD 1 -o cni0 -j ACCEPT -m comment --comment "flannel subnet"
iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE
Disable firewalld
Enable all ports on ec2 security policy (from everywhere)
Use different docker versions (1.13.1, 17.03, 17.06, 17.12)
Different k8s versions (1.9.0 ~1.9.6)
Different CNIs (flannel and weave)
Add some parameters to kubeadm init command (--node-name with FQDN and --apiserver-advertise-address with public master IP)
But none of this worked. It appears to be an AWS-specific issue, since the same tutorial works fine on a Linux Academy Cloud Server.
Is there anything else I could try?
Note:
Currently, I'm using Docker 1.13 and k8s 1.9.6 (with flannel 0.9.1) on CentOS 7.

I finally found the problem. According to this page, Flannel needs UDP ports 8285 and 8472 open on both the master and worker nodes. Interestingly, this is not mentioned in the official kubeadm documentation.
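For reference, on EC2 this means adding inbound UDP rules to the security group(s) attached to the nodes. A minimal sketch with the AWS CLI, assuming both nodes share a single, hypothetical security group (sg-0123456789abcdef0) and allowing traffic from the group to itself:
# hypothetical security group ID; use the group attached to your master and worker instances
SG_ID=sg-0123456789abcdef0
# flannel udp backend
aws ec2 authorize-security-group-ingress --group-id $SG_ID --protocol udp --port 8285 --source-group $SG_ID
# flannel vxlan backend
aws ec2 authorize-security-group-ingress --group-id $SG_ID --protocol udp --port 8472 --source-group $SG_ID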

kubectl run elastic --image=elasticsearch:2 --replicas=1
As best I can tell, you did not inform kubernetes that the elasticsearch:2 image listens on any port(s), which it will not infer by itself. You would have experienced the same problem if you had just run that image under docker without similarly specifying the --publish or --publish-all options.
Thus, when the ClusterIP attempts to forward traffic from port 9200 to the Pods matching its selector, those packets fall into /dev/null because the container is not listening for them.
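A minimal sketch of what this answer is suggesting, reusing the (since-deprecated) kubectl run generator from the question but declaring the container port up front:
# tell Kubernetes which port the container listens on, then expose it
kubectl run elastic --image=elasticsearch:2 --replicas=1 --port=9200
kubectl expose deploy/elastic --port 9200 --target-port 9200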
Add a bunch of rules to iptables
Definitely don't do that; as you may have observed, there are already a ton of iptables rules managed by kube-proxy: in fact, its primary job in life is to own the iptables rules on the Node upon which it is running. Your rules only serve to confuse both kube-proxy and any person who follows along behind you, trying to work out where those random rules came from. If you haven't already made them permanent, either undo them or just reboot the machine to flush those tables. Leaving your ad-hoc rules in place will 100% not make your troubleshooting process any easier.
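If you want to see what kube-proxy is already managing before touching anything by hand, a couple of read-only checks on the node (a sketch; the ClusterIP is the one from the question):
# count the chains/rules kube-proxy owns
sudo iptables-save | grep -c KUBE-
# look for the NAT rule that matches the elastic ClusterIP
sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.96.141.188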

Related

How to get OpenEBS Mayastor clustered storage to work on microk8s?

I have tried all sorts of things to get OpenEBS Mayastor clustered storage to work on microk8s without much success. So rather than give up completely I thought I would detail one of my failed attempts and see if anyone could figure out what I am doing wrong. Thanks in advance for any help you can give me :-)
Failed Attempt
Here are the results of following the steps posted at https://microk8s.io/docs/addon-mayastor.
VM Setup:
3 VMs running Ubuntu 22.04 with 16 GB RAM each on a vSphere hypervisor. I have used these same VMs to create a 3-node microk8s cluster with good success in the past.
Microk8s removal:
removed microk8s on all 3 nodes.
microk8s stop
sudo snap remove microk8s --purge
sudo reboot
Microk8s fresh install:
https://microk8s.io/docs/setting-snap-channel
snap info microk8s
latest/stable: v1.26.0 2022-12-17 (4390) 176MB classic
On all 3 nodes:
sudo snap install microk8s --classic --channel=1.26/stable
sudo usermod -a -G microk8s $USER
sudo chown -f -R $USER ~/.kube
newgrp microk8s
sudo reboot
verify everything is ok
microk8s status
microk8s inspect
Do what inspect tells you to do:
WARNING: IPtables FORWARD policy is DROP. Consider enabling traffic forwarding with: sudo iptables -P FORWARD ACCEPT
The change can be made persistent with: sudo apt-get install iptables-persistent
sudo iptables -S
sudo iptables-legacy -S
sudo iptables -P FORWARD ACCEPT
sudo apt-get install iptables-persistent
sudo systemctl is-enabled netfilter-persistent.service
sudo reboot
microk8s inspect
Still get the iptables FORWARD warning on 2 of the 3 nodes; hopefully it is not that important.
Pinged all the IP addresses in the cluster from every node.
Followed the directions at https://microk8s.io/docs/addon-mayastor
step 1:
sudo sysctl vm.nr_hugepages=1024
echo 'vm.nr_hugepages=1024' | sudo tee -a /etc/sysctl.conf
sudo nvim /etc/sysctl.conf
step 2:
sudo apt install linux-modules-extra-$(uname -r)
sudo modprobe nvme_tcp
echo 'nvme-tcp' | sudo tee -a /etc/modules-load.d/microk8s-mayastor.conf
sudo nvim /etc/modules-load.d/microk8s-mayastor.conf
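Not part of the Mayastor docs, but a few read-only checks (a sketch) to confirm that steps 1 and 2 actually took effect on each node:
# hugepages should report 1024
cat /proc/sys/vm/nr_hugepages
grep HugePages /proc/meminfo
# the nvme_tcp module should be loaded
lsmod | grep nvme_tcp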
step 3:
microk8s enable dns
microk8s enable helm3
Thought we might need RBAC, so I enabled that also:
microk8s enable rbac
Created 3 node cluster.
from main node.
sudo microk8s add-node
go to 2nd node.
microk8s join 10.1.0.116:25000/0c902af525c13fbfb5e7c37cff29b29a/acf13be17a96
from main node.
sudo microk8s add-node
go to 3rd node.
microk8s join 10.1.0.116:25000/36134181872079c649bed48d969a006d/acf13be17a96
microk8s status
enable the mayastor add-on:
from main node.
sudo microk8s enable core/mayastor --default-pool-size 20G
go to 2nd node.
sudo microk8s enable core/mayastor --default-pool-size 20G
Addon core/mayastor is already enabled
go to 3rd node.
sudo microk8s enable core/mayastor --default-pool-size 20G
Addon core/mayastor is already enabled
Wait for the mayastor control plane and data plane pods to come up:
sudo microk8s.kubectl get pod -n mayastor
NAME READY STATUS RESTARTS AGE
mayastor-csi-962jf 0/2 ContainerCreating 0 2m6s
mayastor-csi-l4zxx 0/2 ContainerCreating 0 2m5s
mayastor-8pcc4 0/1 Init:0/3 0 2m6s
msp-operator-74ff9cf5d5-jvxqb 0/1 Init:0/2 0 2m5s
mayastor-lt8qq 0/1 Init:0/3 0 2m5s
etcd-operator-mayastor-65f9967f5-mpkrw 0/1 ContainerCreating 0 2m5s
mayastor-csi-6wb7x 0/2 ContainerCreating 0 2m5s
core-agents-55d76bb877-8nffd 0/1 Init:0/1 0 2m5s
csi-controller-54ccfcfbcc-m94b7 0/3 Init:0/1 0 2m5s
mayastor-9q4gl 0/1 Init:0/3 0 2m5s
rest-77d69fb479-qsvng 0/1 Init:0/2 0 2m5s
# Still waiting
sudo microk8s.kubectl get pod -n mayastor
NAME READY STATUS RESTARTS AGE
mayastor-8pcc4 0/1 Init:0/3 0 32m
msp-operator-74ff9cf5d5-jvxqb 0/1 Init:0/2 0 32m
mayastor-lt8qq 0/1 Init:0/3 0 32m
core-agents-55d76bb877-8nffd 0/1 Init:0/1 0 32m
csi-controller-54ccfcfbcc-m94b7 0/3 Init:0/1 0 32m
mayastor-9q4gl 0/1 Init:0/3 0 32m
rest-77d69fb479-qsvng 0/1 Init:0/2 0 32m
mayastor-csi-962jf 2/2 Running 0 32m
mayastor-csi-l4zxx 2/2 Running 0 32m
etcd-operator-mayastor-65f9967f5-mpkrw 1/1 Running 1 32m
mayastor-csi-6wb7x 2/2 Running 0 32m
etcd-6tjf7zb9dh 0/1 Init:0/1 0 30m
Went to the troubleshooting section at https://microk8s.io/docs/addon-mayastor:
microk8s.kubectl logs -n mayastor daemonset/mayastor
output was:
Found 3 pods, using pod/mayastor-8pcc4
Defaulted container "mayastor" out of: mayastor, registration-probe (init), etcd-probe (init), initialize-pool (init)
Error from server (BadRequest): container "mayastor" in pod "mayastor-8pcc4" is waiting to start: PodInitializing
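Since every stuck mayastor pod is waiting on its init containers, those init containers are the next place to look; a hedged next step, using the pod and init-container names from the output above:
# show which init container is blocking and the associated events
microk8s.kubectl describe pod -n mayastor mayastor-8pcc4
# logs of the individual init containers listed in the "Defaulted container" line
microk8s.kubectl logs -n mayastor mayastor-8pcc4 -c registration-probe
microk8s.kubectl logs -n mayastor mayastor-8pcc4 -c etcd-probe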

Seldon-core deployment in GKE private cluster with Anthos Service Mesh

I'm trying to use a GKE private cluster with standard config and the Anthos Service Mesh managed profile. However, when I try to deploy the "Iris" model as a test, the deployment gets stuck calling "storage.googleapis.com":
$ kubectl get all -n test
NAME READY STATUS RESTARTS AGE
pod/iris-model-default-0-classifier-dfb586df4-ltt29 0/3 Init:1/2 0 30s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/iris-model-default ClusterIP xxx.xxx.65.194 <none> 8000/TCP,5001/TCP 30s
service/iris-model-default-classifier ClusterIP xxx.xxx.79.206 <none> 9000/TCP,9500/TCP 30s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/iris-model-default-0-classifier 0/1 1 0 31s
NAME DESIRED CURRENT READY AGE
replicaset.apps/iris-model-default-0-classifier-dfb586df4 1 1 0 31s
$ kubectl logs -f -n test pod/iris-model-default-0-classifier-dfb586df4-ltt29 -c classifier-model-initializer
2022/11/19 20:59:34 NOTICE: Config file "/.rclone.conf" not found - using defaults
2022/11/19 20:59:57 ERROR : GCS bucket seldon-models path v1.15.0-dev/sklearn/iris: error reading source root directory: Get "https://storage.googleapis.com/storage/v1/b/seldon-models/o?alt=json&delimiter=%2F&maxResults=1000&prefix=v1.15.0-dev%2Fsklearn%2Firis%2F&prettyPrint=false": dial tcp 199.36.153.8:443: connect: connection refused
2022/11/19 20:59:57 ERROR : Attempt 1/3 failed with 1 errors and: Get "https://storage.googleapis.com/storage/v1/b/seldon-models/o?alt=json&delimiter=%2F&maxResults=1000&prefix=v1.15.0-dev%2Fsklearn%2Firis%2F&prettyPrint=false": dial tcp 199.36.153.8:443: connect: connection refused
2022/11/19 21:00:17 ERROR : GCS bucket seldon-models path v1.15.0-dev/sklearn/iris: error reading source root directory: Get "https://storage.googleapis.com/storage/v1/b/seldon-models/o?alt=json&delimiter=%2F&maxResults=1000&prefix=v1.15.0-dev%2Fsklearn%2Firis%2F&prettyPrint=false": dial tcp 199.36.153.8:443: connect: connection refused
2022/11/19 21:00:17 ERROR : Attempt 2/3 failed with 1 errors and: Get "https://storage.googleapis.com/storage/v1/b/seldon-models/o?alt=json&delimiter=%2F&maxResults=1000&prefix=v1.15.0-dev%2Fsklearn%2Firis%2F&prettyPrint=false": dial tcp 199.36.153.8:443: connect: connection refused
I used "sidecar injection" with the namespace labeling:
kubectl create namespace test
kubectl label namespace test istio-injection- istio.io/rev=asm-managed --overwrite
kubectl annotate --overwrite namespace test mesh.cloud.google.com/proxy='{"managed":"true"}'
When I don't use sidecar injection, the deployment succeeds. But in that case I need to inject the proxy manually to get access to the model API. I wonder if this is the intended behavior or not.
Istio sidecars block connectivity for other init containers; unfortunately this is a known issue with Istio sidecars. A potential workaround is to ask Istio not to "filter" traffic going to storage.googleapis.com (i.e., not to route that traffic through Istio's egress), which can be done via Istio's excludeIPRanges setting.
In the longer term, due to these shortcomings, Istio seems to be moving away from sidecars into their new "Ambient mesh".
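A minimal sketch of the per-pod form of that workaround, assuming the traffic.sidecar.istio.io/excludeOutboundIPRanges annotation on the pod template inside the SeldonDeployment; 199.36.153.8/30 is taken from the address in the error logs, so adjust it to whatever range your private Google APIs endpoint resolves to:
# excerpt of the pod template metadata (not a complete manifest)
metadata:
  annotations:
    traffic.sidecar.istio.io/excludeOutboundIPRanges: "199.36.153.8/30"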

Cannot deploy basic OpenWhisk action onto Kubernetes running with Minikube

I am trying to set up a simple POC of the Apache OpenWhisk serverless framework running on Kubernetes. I am using macOS with Minikube. Here are the specs:
Kubernetes: v1.20.2
Minikube: v1.17.0
Docker: 20.10.0-rc1, 4.26GB allocated
Here are the setup steps for Minikube:
$ minikube start --cpus 2 --memory 4096 --kubernetes-version=v1.20.2
$ minikube ssh -- sudo ip link set docker0 promisc on
$ kubectl create namespace openwhisk
$ kubectl label nodes --all openwhisk-role=invoker
Install OpenWhisk using Helm:
$ helm install owdev ./helm/openwhisk -n openwhisk --create-namespace -f mycluster.yaml
Configure Whisk CLI:
$ wsk property set --apihost 192.168.49.2:31001
$ wsk property set --auth 23bc46b1-71f6-4ed5-8c54-816aa4f8c502:123zO3xZCLrMN6v2BKK1dXYFpXlPkccOFqm12CdAsMgRU4VrNZ9lyGVCGuMDGIwP
The 192.168.49.2 IP address of Minikube was confirmed by typing:
$ minikube ip
Here is my mycluster.yaml file:
whisk:
  ingress:
    type: NodePort
    apiHostName: 192.168.49.2
    apiHostPort: 31001
nginx:
  httpsNodePort: 31001
I checked the health of my OpenWhisk setup:
$ kubectl get pods -n openwhisk
NAME READY STATUS RESTARTS AGE
owdev-alarmprovider-5b86cb64ff-q86nj 1/1 Running 0 137m
owdev-apigateway-bccbbcd67-7q2r8 1/1 Running 0 137m
owdev-controller-0 1/1 Running 13 137m
owdev-couchdb-584676b956-7pxtc 1/1 Running 0 137m
owdev-gen-certs-7227t 0/1 Completed 0 137m
owdev-init-couchdb-g6vhb 0/1 Completed 0 137m
owdev-install-packages-sg2f4 1/1 Running 0 137m
owdev-invoker-0 1/1 Running 1 137m
owdev-kafka-0 1/1 Running 0 137m
owdev-kafkaprovider-5574d4bf5f-vvdb9 1/1 Running 0 137m
owdev-nginx-86749d59cb-mxxrt 1/1 Running 0 137m
owdev-redis-d65649c5b-vd8d4 1/1 Running 0 137m
owdev-wskadmin 1/1 Running 0 137m
owdev-zookeeper-0 1/1 Running 0 137m
wskowdev-invoker-00-13-prewarm-nodejs10 1/1 Running 0 116m
wskowdev-invoker-00-14-prewarm-nodejs10 1/1 Running 0 116m
wskowdev-invoker-00-15-whisksystem-invokerhealthtestaction0 1/1 Running 0 112m
Finally, I created a simple hello world action following these instructions taken directly from the OpenWhisk documentation. When I try to test the action, I get a network timeout:
$ wsk action create helloJS hello.js
error: Unable to create action 'helloJS': Put "https://192.168.49.2:31001/api/v1/namespaces/_/actions/helloJS?overwrite=false": dial tcp 192.168.49.2:31001: i/o timeout
I tried turning on debug mode with the -d switch, but could not make much of the feedback I am seeing.
My feeling is that there is either a bug at work here, or perhaps Minikube on Mac was never intended to be fully supported by OpenWhisk.
Can anyone suggest what I might try to get this setup and action working?
We stopped maintaining OpenWhisk for Minikube a while ago. With a full-fledged Kubernetes cluster built into Docker Desktop on macOS and Windows, and kind (https://kind.sigs.k8s.io) available on all of our platforms, supporting Minikube was more work than it was worth.
Wait until the pod whose name starts with owdev-install-packages- completes.
This may take some time, after that it should work.
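A simple way to watch for that from the command line (a sketch; the pod name prefix comes from the kubectl output above):
# wait for the install-packages pod to reach Completed
kubectl get pods -n openwhisk -w | grep owdev-install-packages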

kubernetes nodeport external ip not accessible

I have been trying to deploy a Spring Boot application on a Kubernetes cluster, but somehow I cannot access the REST endpoint from outside the cluster.
Here are the steps I performed:
Set up the Kubernetes cluster using Kubespray, following the guide - Kubernetes Cluster setup using Kubespray
Pushed the spring boot docker image to docker hub
Created kubernetes deployment
vagrant#node1:~/spring-boot$ kubectl create deployment demo --image=rahulwagh17/kubernetes:jhooq-k8s-springboot
deployment.apps/demo created
Exposed the deployment with external IP = 1.1.1.1
kubectl expose deployment demo --type=LoadBalancer --name=demo-service --external-ip=1.1.1.1 --port=8080
service/demo-service exposed
This is how my deployment is looking
vagrant#node1:~/spring-boot$ kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
demo 1/1 1 1 24s
This is how my services are looking
vagrant#node1:~/spring-boot$ kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
demo-service LoadBalancer 10.233.31.159 1.1.1.1 8080:30099/TCP 13s
kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 23h
I can curl the REST endpoint from within the cluster without a problem:
vagrant#node1:~/spring-boot$ curl 10.233.31.159:8080/hello
Hello - Jhooq-k8s
The problem I am facing: when I try to curl the REST endpoint from outside the cluster, I cannot:
$ curl http://1.1.1.1:30099/hello
curl: (7) Failed to connect to 1.1.1.1 port 30099: Operation timed out
I am a little new to Kubernetes, so any leads or suggestions are highly appreciated.
Please try the approach below.
Via NodePort: this means NodeIP:NodePort. In this case, get any node's IP and then run:
curl http://$NODE_IP:30099/hello
and you should be able to access your service.
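If you are unsure which address to use for $NODE_IP, a quick sketch with plain kubectl:
# the INTERNAL-IP / EXTERNAL-IP columns list the node addresses
kubectl get nodes -o wide
# or pull just the first node's InternalIP and curl the NodePort directly
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
curl http://$NODE_IP:30099/hello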

IBM Cloud Private monitoring gets 502 bad gateway

The following containers are not starting after installing IBM Cloud Private. I had previously installed ICP without a Management node; this is a new install after having done an 'uninstall' and restarting the Docker service on all nodes.
Installed a second time with a Management node defined, Master/Proxy on a single node, and two Worker nodes.
Selecting the menu option Platform / Monitoring returns a 502 Bad Gateway.
Event messages from deployed containers
Deployment - monitoring-prometheus
TYPE SOURCE COUNT REASON MESSAGE
Warning default-scheduler 2113 FailedScheduling
No nodes are available that match all of the following predicates:: MatchNodeSelector (3), NoVolumeNodeConflict (4).
Deployment - monitoring-grafana
TYPE SOURCE COUNT REASON MESSAGE
Warning default-scheduler 2097 FailedScheduling
No nodes are available that match all of the following predicates:: MatchNodeSelector (3), NoVolumeNodeConflict (4).
Deployment - rootkit-annotator
TYPE SOURCE COUNT REASON MESSAGE
Normal kubelet 169.53.226.142 125 Pulled
Container image "ibmcom/rootkit-annotator:20171011" already present on machine
Normal kubelet 169.53.226.142 125 Created
Created container
Normal kubelet 169.53.226.142 125 Started
Started container
Warning kubelet 169.53.226.142 2770 BackOff
Back-off restarting failed container
Warning kubelet 169.53.226.142 2770 FailedSync
Error syncing pod
The management console sometimes displays a 502 Bad Gateway Error after installation or rebooting the master node. If you recently installed IBM Cloud Private, wait a few minutes and reload the page.
If you rebooted the master node, take the following steps:
Configure the kubectl command line interface. See Accessing your IBM Cloud Private cluster by using the kubectl CLI.
Obtain the IP addresses of the icp-ds pods. Run the following command:
kubectl get pods -o wide -n kube-system | grep "icp-ds"
The output resembles the following text:
icp-ds-0 1/1 Running 0 1d 10.1.231.171 10.10.25.134
In this example, 10.1.231.171 is the IP address of the pod.
In high availability (HA) environments, an icp-ds pod exists for each master node.
From the master node, ping the icp-ds pods. Check the IP address for each icp-ds pod by running the following command for each IP address:
ping 10.1.231.171
If the output resembles the following text, you must delete the pod:
connect: Invalid argument
Delete each pod that you cannot reach:
kubectl delete pods icp-ds-0 -n kube-system
In this example, icp-ds-0 is the name of the unresponsive pod.
In HA installations, you might have to delete the pod for each master node.
Obtain the IP address of the replacement pod or pods. Run the following command:
kubectl get pods -o wide -n kube-system | grep "icp-ds"
The output resembles the following text:
icp-ds-0 1/1 Running 0 1d 10.1.231.172 10.10.2
Ping the pods again. Check the IP address for each icp-ds pod by running the following command for each IP address:
ping 10.1.231.172
If you can reach all icp-ds pods, you can access the IBM Cloud Private management console when that pod enters the available state.
