kubectl port-forward does not return when connection lost - bash

The help for kubectl port-forward says the forwarding session ends when the selected pod terminates, and a rerun of the command is needed to resume forwarding.
Indeed it does not auto-reconnect when the pod terminates, but the command does not return either; it just hangs and keeps printing errors:
E0929 11:57:50.187945 62466 portforward.go:400] an error occurred forwarding 8000 -> 8080: error forwarding port 8080 to pod a1fe1d167955e1c345e0f8026c4efa70a84b9d46029037ebc5b69d9da5d30249, uid : network namespace for sandbox "a1fe1d167955e1c345e0f8026c4efa70a84b9d46029037ebc5b69d9da5d30249" is closed
Handling connection for 8000
E0929 12:02:44.505938 62466 portforward.go:400] an error occurred forwarding 8000 -> 8080: error forwarding port 8080 to pod a1fe1d167955e1c345e0f8026c4efa70a84b9d46029037ebc5b69d9da5d30249, uid : failed to find sandbox "a1fe1d167955e1c345e0f8026c4efa70a84b9d46029037ebc5b69d9da5d30249" in store: not found
I would like it to return so that I can handle the error and write a script that reruns the command.
Is there any way or workaround to do that?

Based on the information described on the Kubernetes issues page on GitHub, I would say this is expected behavior in your case: a port-forward connection cannot be canceled on pod deletion, since there is no connection management inside the REST connectors on the server side.
The connection is maintained from kubectl all the way through to the kubelet and hangs open even if the pod no longer exists: kubectl proxies a websocket connection kubectl -> kube-apiserver -> kubelet for port-forward.

One workaround is a recursive function that reruns the command whenever it exits:
kpf(){ kubectl port-forward "$type/$object" "$LOCAL:$REMOTE" $ns || kpf; }
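If unbounded recursion is a concern, the same idea can be written as a loop. A minimal sketch, assuming a placeholder deployment name and namespace and the 8000 -> 8080 mapping from the question (adjust to your setup):

#!/usr/bin/env bash
# Re-establish the port-forward whenever kubectl exits.
TARGET="deployment/my-app"   # placeholder
NAMESPACE="default"          # placeholder
LOCAL=8000
REMOTE=8080

while true; do
  kubectl port-forward "$TARGET" "$LOCAL:$REMOTE" -n "$NAMESPACE"
  echo "port-forward exited, retrying in 2 seconds..." >&2
  sleep 2
done

Note that this (like the recursive function) only helps once the kubectl process actually exits; it cannot force a hung session to return.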

Related

Seldon-core deployment in GKE private cluster with Anthos Service Mesh

I'm trying to use a GKE private cluster with the standard config, with the Anthos Service Mesh managed profile. However, when I try to deploy the "Iris" model as a test, the deployment gets stuck calling "storage.googleapis.com":
$ kubectl get all -n test
NAME READY STATUS RESTARTS AGE
pod/iris-model-default-0-classifier-dfb586df4-ltt29 0/3 Init:1/2 0 30s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/iris-model-default ClusterIP xxx.xxx.65.194 <none> 8000/TCP,5001/TCP 30s
service/iris-model-default-classifier ClusterIP xxx.xxx.79.206 <none> 9000/TCP,9500/TCP 30s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/iris-model-default-0-classifier 0/1 1 0 31s
NAME DESIRED CURRENT READY AGE
replicaset.apps/iris-model-default-0-classifier-dfb586df4 1 1 0 31s
$ kubectl logs -f -n test pod/iris-model-default-0-classifier-dfb586df4-ltt29 -c classifier-model-initializer
2022/11/19 20:59:34 NOTICE: Config file "/.rclone.conf" not found - using defaults
2022/11/19 20:59:57 ERROR : GCS bucket seldon-models path v1.15.0-dev/sklearn/iris: error reading source root directory: Get "https://storage.googleapis.com/storage/v1/b/seldon-models/o?alt=json&delimiter=%2F&maxResults=1000&prefix=v1.15.0-dev%2Fsklearn%2Firis%2F&prettyPrint=false": dial tcp 199.36.153.8:443: connect: connection refused
2022/11/19 20:59:57 ERROR : Attempt 1/3 failed with 1 errors and: Get "https://storage.googleapis.com/storage/v1/b/seldon-models/o?alt=json&delimiter=%2F&maxResults=1000&prefix=v1.15.0-dev%2Fsklearn%2Firis%2F&prettyPrint=false": dial tcp 199.36.153.8:443: connect: connection refused
2022/11/19 21:00:17 ERROR : GCS bucket seldon-models path v1.15.0-dev/sklearn/iris: error reading source root directory: Get "https://storage.googleapis.com/storage/v1/b/seldon-models/o?alt=json&delimiter=%2F&maxResults=1000&prefix=v1.15.0-dev%2Fsklearn%2Firis%2F&prettyPrint=false": dial tcp 199.36.153.8:443: connect: connection refused
2022/11/19 21:00:17 ERROR : Attempt 2/3 failed with 1 errors and: Get "https://storage.googleapis.com/storage/v1/b/seldon-models/o?alt=json&delimiter=%2F&maxResults=1000&prefix=v1.15.0-dev%2Fsklearn%2Firis%2F&prettyPrint=false": dial tcp 199.36.153.8:443: connect: connection refused
I used "sidecar injection" with the namespace labeling:
kubectl create namespace test
kubectl label namespace test istio-injection- istio.io/rev=asm-managed --overwrite
kubectl annotate --overwrite namespace test mesh.cloud.google.com/proxy='{"managed":"true"}'
When I don't use sidecar injection, the deployment succeeds. But in that case I need to inject the proxy manually to get access to the model API. I wonder whether this is the intended behavior or not.
Istio sidecars will block connectivity for other init containers; this is unfortunately a known issue with Istio sidecars. A potential workaround is to ask Istio not to "filter" traffic going to storage.googleapis.com (i.e. not to route that traffic through Istio's egress), which can be done through Istio's excludeIPRanges flag.
In the longer term, due to these shortcomings, Istio seems to be moving away from sidecars towards its new "ambient mesh" mode.
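One way to apply this is Istio's per-pod annotation traffic.sidecar.istio.io/excludeOutboundIPRanges. A minimal sketch, assuming the 199.36.153.8/30 range seen in the failed requests above is the one your private Google APIs endpoint resolves to (verify for your VPC); note the Seldon operator may revert a direct patch, in which case the annotation belongs in the SeldonDeployment's pod template instead:

# hypothetical patch; the deployment name is taken from the kubectl output above
kubectl -n test patch deployment iris-model-default-0-classifier --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"traffic.sidecar.istio.io/excludeOutboundIPRanges":"199.36.153.8/30"}}}}}'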

How to use microk8s kubectl after host reboot (Hyper-V)

I have a fully functional MicroK8s running in my Hyper-V. After my host rebooted, I can't use microk8s kubectl anymore. I always get the following error:
microk8s kubectl get node -o wide
Unable to connect to the server: dial tcp 172.31.119.125:16443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
As far as I can tell, the master node IP has changed. If I update the KUBECONFIG locally, I can connect to the cluster without a problem.
microk8s config > ~/.kube/config
But when I use microk8s kubectl get node -o wide to get the node status, I can't get it working; I'm still unable to connect to the server.
I tried to clear every possible cache by removing all the .kube/cache folders, but it still doesn't work.
sudo rm -rf /.kube/cache /root/.kube/cache /home/ubuntu/.kube/cache /var/snap/microk8s/3582/.kube/cache
I stopped and started MicroK8s again. I'm still unable to connect to the server.
microk8s stop
microk8s start
After MicroK8s restarted, I also tried to find all files that contain the 172.31.119.125 IP address.
grep '172.31.119.125' -r /
Nothing useful was found; only /var contains some logs mentioning 172.31.119.125. That's weird. Is there anything else I can try? How do I connect to MicroK8s using microk8s kubectl?
After an hour-long deep dive, I finally realized there is a $env:LOCALAPPDATA\MicroK8s\config file used by MicroK8s that the docs never mention.
All you need to do is update that config file in one of the following ways:
PowerShell
microk8s config > $env:LOCALAPPDATA\MicroK8s\config
Command Prompt
microk8s config > %LOCALAPPDATA%\MicroK8s\config

how to resolve microk8s port forwarding error on vagrant VMs?

I have a 2-node MicroK8s cluster running on 2 Vagrant VMs (Ubuntu 20.04). I'm trying to port-forward 443 so I can connect to the dashboard from the host PC over the private VM network.
sudo microk8s kubectl port-forward -n kube-system service/kubernetes-dashboard 10443:443
I receive the following error:
error: error upgrading connection: error dialing backend: dial tcp: lookup node-1: Temporary failure in name resolution
I also noticed that the internal IPs for the nodes are not correct:
the master node is provisioned with an IP of 10.0.1.5 and the worker node with 10.0.1.10, yet in the kubectl listing both nodes have the same IP of 10.0.2.15.
I'm not sure how to resolve this issue.
Note that I am able to access the dashboard login screen over HTTP on port 8001 by connecting to 10.0.1.5, but submitting the token does not do anything, as per the K8s security design:
Logging in is only available when accessing Dashboard over HTTPS or when domain is either localhost
or 127.0.0.1. It's done this way for security reasons.
I was able to get past this issue by adding the nodes to the /etc/hosts file on each node:
10.1.0.10 node-1
10.1.0.5 k8s-master
Then I was able to restart and issue the port-forward command:
sudo microk8s kubectl port-forward -n kube-system service/kubernetes-dashboard 10443:443 --address 0.0.0.0
Forwarding from 0.0.0.0:10443 -> 8443
Then I was able to access the K8s dashboard via the token auth method.

How can I troubleshoot/fix an issue interacting with a running Kubernetes pod (timeout error)?

I have two EC2 instances, one running a Kubernetes master node and the other running a worker node. I can successfully create a pod from a deployment file that pulls a docker image, and it starts with a status of "Running". However, when I try to interact with it, I get a timeout error.
Ex: kubectl logs <pod-name> -v6
Output:
Config loaded from file /home/ec2-user/.kube/config
GET https://<master-node-ip>:6443/api/v1/namespaces/default/pods/<pod-name> 200 OK in 11 milliseconds
GET https://<master-node-ip>:6443/api/v1/namespaces/default/pods/<pod-name>/log 500 Internal Server Error in 30002 milliseconds
Server response object: [{"status": "Failure", "message": "Get https://<worker-node-ip>:10250/containerLogs/default/<pod-name>/<container-name>: dial tcp <worker-node-ip>:10250: i/o timeout", "code": 500 }]
I can get information about the pod by running kubectl describe pod <pod-name> and confirm the status as Running. Any ideas on how to identify exactly what is causing this error and/or how to fix it?
Probably you didn't install a network add-on in your Kubernetes cluster. It's not included in a kubeadm installation, but it's required for pods scheduled on different nodes to communicate. The most popular add-ons are Calico and Flannel. As you already have a cluster, you may want to choose the network add-on that uses the same subnet you specified with kubeadm init --pod-network-cidr=xx.xx.xx.xx/xx during cluster initialization.
192.168.0.0/16 is the default for the Calico network add-on
10.244.0.0/16 is the default for the Flannel network add-on
You can change it by downloading the corresponding YAML file and replacing the default subnet with the subnet you want, then applying it with kubectl apply -f filename.yaml
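For Flannel, for example, that could look roughly like this (a sketch; the manifest URL changes between releases, so check the Flannel docs for the current one, and only edit the Network field if your --pod-network-cidr differs from 10.244.0.0/16):

# download the manifest so the pod CIDR can be edited before applying
curl -LO https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
# adjust the "Network" field under net-conf.json if needed, then apply
kubectl apply -f kube-flannel.yml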

Kubernetes proxy connection

I am trying to play around with Kubernetes and specifically its REST API. The steps to connect to the cluster API are listed here. However, I'm stuck on the first step, i.e. running kubectl proxy.
I try running this:
kubectl --context='vagrant' proxy --port=8080 &
which returns error: couldn't read version from server: Get https://172.17.4.99:443/api: dial tcp 172.17.4.99:443: i/o timeout
What does this mean? How do I overcome it and connect to the API?
Check that your docker, proxy, kube-apiserver, and kube-controller-manager services are running without errors. Check their status using systemctl status your-service-name. If a service is loaded but not running, restart it using systemctl restart your-service-name.
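For example (a sketch; the exact unit names depend on how the cluster was installed - on a kubeadm cluster, kube-apiserver and kube-controller-manager run as static pods managed by the kubelet rather than as systemd units):

# check the container runtime and kubelet on the node
systemctl status docker kubelet
# restart a unit that is loaded but not running
sudo systemctl restart kubelet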
