How to check the restart reason of a StatefulSet in a Kubernetes cluster (Elasticsearch)

I deployed an Elasticsearch cluster in Kubernetes, and the cluster pods have now restarted many times. To keep the Elasticsearch cluster stable, I want to find out why it restarts. When I check the logs of the restarted pods, they only show the output after the restart, and that output contains no errors. I tried to stop the pods from restarting automatically (by setting restartPolicy to Never) so that I could see the error output on failure, but it shows:
StatefulSet.apps "es-cluster" is invalid: spec.template.spec.restartPolicy: Unsupported value: "Never": supported values: "Always"
So what is the best way to find out why the pods restart?

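To get a detailed view of the log snippet or the immediate reasons for a pod's startup failures, run the following and check each container's Last State (Reason, Exit Code) and the Events section at the end of the output: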
kubectl describe pod <pod> -n <namespace>
Ideally, run it as soon as a pod restarts (or force a restart by deleting the pod). Do this in addition to tailing the logs, as suggested in another answer, up to the moment the pods fail to start. Be mindful that if a pod has multiple containers, you may also need to name the container:
kubectl logs -f <pod> -n <namespace> -c <container-name> --previous

To get the logs of the previous (crashed) container instance of a restarted pod, run
kubectl logs -f <pod> -n <namespace> --previous
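If you want the reason without reading through the whole describe output, the same information is stored in the pod's status under status.containerStatuses[].lastState.terminated (reason, exitCode, finishedAt). Below is a minimal Python sketch, assuming kubectl is on your PATH and pointed at the right cluster; the helper names fetch_pod and last_termination_reasons are my own, not part of any library. A reason such as OOMKilled, or exit code 137 (killed by SIGKILL), usually means the container was killed from outside rather than crashing with an error, which would be consistent with the restarted log showing nothing wrong.

```python
import json
import subprocess
from typing import Any


def last_termination_reasons(pod: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect the last termination details of each container in a parsed pod object."""
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated is None:
            continue  # this container has not been terminated since it last started
        results.append({
            "container": status.get("name"),
            "restartCount": status.get("restartCount", 0),
            "reason": terminated.get("reason"),
            "exitCode": terminated.get("exitCode"),
            "finishedAt": terminated.get("finishedAt"),
        })
    return results


def fetch_pod(name: str, namespace: str) -> dict[str, Any]:
    """Run `kubectl get pod <name> -n <namespace> -o json` and parse the result."""
    out = subprocess.run(
        ["kubectl", "get", "pod", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

For example, last_termination_reasons(fetch_pod("es-cluster-0", "default")) would list the last termination of each container in the first StatefulSet pod (the namespace here is an assumption; use your own).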

Related

Kubernetes logs not found in default locations?

In my k8s environment, where Spring Boot applications run, I checked the log locations /var/log and /var/lib, but both were empty. Then I found the logs in /tmp/spring.log. It seems this is the default log location. My questions are:
How does kubectl logs know it should read logs from /tmp? I do get log output from the kubectl logs command.
I have Fluent Bit configured with the following input:
[INPUT]
    Name tail
    Tag kube.dev.*
    Path /var/log/containers/*dev*.log
    DB /var/log/flb_kube_dev.db
This suggests it should read logs from /var/log/containers/, but that directory has no logs. However, I am getting fluent-bit logs successfully. What am I missing here?
Docker logs only contain what is written to STDOUT and STDERR by your container's main process (PID 1, i.e. the container's entrypoint or cmd process).
If you want to see the logs via kubectl logs or docker logs, redirect your application logs to STDOUT instead of the file /tmp/spring.log. Here's an excellent example of how this can be achieved with minimal effort.
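For the Spring Boot app in the question, this change belongs in its logging configuration. To show the same idea in a language-neutral way, here is a minimal Python sketch (an analogue, not the Spring Boot setup) that sends log records to STDOUT instead of a file; the function name configure_stdout_logging and the format string are illustrative assumptions.

```python
import logging
import sys
from typing import TextIO


def configure_stdout_logging(name: str = "app", stream: TextIO = sys.stdout) -> logging.Logger:
    """Send log records to a stream (STDOUT by default) instead of a file such as /tmp/spring.log."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger.handlers = [handler]  # replace any existing (e.g. file) handlers
    logger.propagate = False     # avoid printing each record twice via the root logger
    return logger
```

Once the records reach STDOUT, the container runtime captures them and they show up in kubectl logs without any extra configuration.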
Alternatively, you can mount a hostPath volume at the log directory. This way, you can access the log file directly from the path on the host.
Warning when using a hostPath volume mount:
If the pod is rescheduled to another host for any reason, your logs will not move with it. A new log file will be created on the new host at the same path.
If you are searching for the actual location of the logs outside the containers (on the host nodes of the cluster), this depends on a couple of things. I assume you are using Docker to run your containers under Kubernetes, which is the most common setup.
On each node of your Kubernetes cluster, you can use the following command to check which logging driver is currently in use:
docker info | grep -i logging
The default value should be json-file, which means that container logs are written as JSON to files on your host nodes.
If you find another driver, for example journald, Docker is sending logs directly to the systemd journal. There are many logging drivers, so as a first check, make sure that all your Kubernetes nodes are configured to log to JSON files (or in whatever way you need to harvest them).
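Checking every node by hand gets tedious. A minimal Python sketch, assuming the docker CLI is installed on the node and you have permission to talk to the daemon, that pulls the "Logging Driver:" line out of docker info output; the helper names are hypothetical.

```python
import subprocess
from typing import Optional


def parse_logging_driver(docker_info_output: str) -> Optional[str]:
    """Return the value of the 'Logging Driver:' line in `docker info` output, if present."""
    for line in docker_info_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip().lower() == "logging driver":
            return value.strip()
    return None


def current_logging_driver() -> Optional[str]:
    """Run `docker info` on this node (needs the docker CLI and access to the daemon)."""
    out = subprocess.run(["docker", "info"], check=True, capture_output=True, text=True).stdout
    return parse_logging_driver(out)
```

Separating the parsing from the command keeps the parser testable without a Docker daemon.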
Once this is done, you can start checking where your containers write their logs. Choose a pod to analyze, then:
Identify which Kubernetes node it is running on:
kubectl get pod pod-name -owide
Grab the container ID with something like the following:
kubectl get pod pod-name -ojsonpath='{.status.containerStatuses[0].containerID}'
The ID should look something like docker://f834508490bd2b248a2bbc1efc4c395d0b8086aac4b6ff03b3cc8fd16d10ce2c
Remove the docker:// prefix, SSH into the Kubernetes node on which this container is running, and run:
docker inspect container-id | grep -i logpath
This should give you the log location for that particular container. You can run tail on the file to check whether the logs are really there.
In my case, the container I tried this procedure on was logging to:
/var/lib/docker/containers/289271086d977dc4e2e0b80cc28a7a6aca32c888b7ea5e1b5f24b28f7601ff63/289271086d977dc4e2e0b80cc28a7a6aca32c888b7ea5e1b5f24b28f7601ff63-json.log
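The lookup above can be scripted. A minimal Python sketch, assuming you feed container_ids the parsed output of kubectl get pod pod-name -o json, and run docker_log_path on the node that hosts the container (with access to the Docker daemon); all function names are hypothetical. Each line of a json-file log is a JSON object with log, stream and time keys, so read_json_log turns it back into plain text.

```python
import json
import subprocess
from typing import Any, Iterable, Iterator


def container_ids(pod: dict[str, Any]) -> dict[str, str]:
    """Map container name -> bare container ID, stripping a runtime prefix such as docker://."""
    ids = {}
    for status in pod.get("status", {}).get("containerStatuses", []):
        raw = status.get("containerID", "")
        if raw:  # containers that have not started yet have no ID
            ids[status["name"]] = raw.split("://", 1)[-1]
    return ids


def docker_log_path(container_id: str) -> str:
    """Return the LogPath reported by `docker inspect` (run this on the pod's node)."""
    out = subprocess.run(["docker", "inspect", container_id],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)[0]["LogPath"]  # docker inspect prints a JSON array


def read_json_log(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (time, stream, message) tuples from json-file log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        yield entry["time"], entry["stream"], entry["log"].rstrip("\n")
```

For example, open the path returned by docker_log_path(cid) (usually as root) and iterate over read_json_log on the file object to print each message.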

Access k8s pod logs generated from ssh exec

I have Filebeat configured to send my k8s cluster logs to Elasticsearch.
When I connect to the pod directly (kubectl exec -it <pod> -- sh -c bash), the output generated in that shell isn't sent to the destination.
Digging through the k8s docs, I couldn't find how k8s handles STDOUT from a running shell.
How can I configure k8s to send live shell logs?
Kubernetes has (mostly) nothing to do with this, as logging is handled by the container runtime that Kubernetes uses, which is usually Docker.
Depending on the Docker version and configuration, container logs can be written to json-file, journald or other drivers, with json-file being the default. Run docker info | grep -i logging to check which logging driver Docker uses. If the result is json-file, logs are written to a file in JSON format. If there's another value, logs are handled in another way (there are various logging drivers, so I suggest checking their documentation).
If the logs are written to a file, chances are that running docker inspect container-id | grep -i logpath will show you its path on the node.
Filebeat simply harvests the logs from those files; it is Docker, through its logging driver, that redirects the application's STDOUT inside the container to one of those files.
As for the output of exec commands not appearing in the logs, this is an open proposal ( https://github.com/moby/moby/issues/8662 ): not everything is redirected, only the output of the processes started by the entrypoint itself.
There's a suggested workaround ( https://github.com/moby/moby/issues/8662#issuecomment-277396232 ):
In the meantime you can try this little hack:
echo hello > /proc/1/fd/1
Redirect your output into PID 1's (the docker container) file descriptor for STDOUT
This works just fine, but it has the drawback of requiring a manual redirect.
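The same hack can be used from a script when your tooling inside the container is Python rather than a shell. A minimal sketch, assuming it runs inside the container (for example in a kubectl exec session) as a user allowed to write to PID 1's file descriptors; write_to_container_stdout is a hypothetical name, and target is only a parameter so it can be pointed at an ordinary file for testing.

```python
def write_to_container_stdout(message: str, target: str = "/proc/1/fd/1") -> None:
    """Append one line to PID 1's STDOUT so the container's logging driver captures it.

    Roughly equivalent to `echo message > /proc/1/fd/1` run inside the container.
    """
    with open(target, "a") as out:  # append mode: never truncates a regular file
        out.write(message + "\n")
```

It still has the same drawback as the shell version: every command whose output you want collected must be redirected explicitly.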
Use the following process:
Make changes in your application to push logs to STDOUT. You may configure this in your logging configuration file.
Configure Filebeat to read those STDOUT logs (which eventually end up in a Docker log file location such as /var/log).
Run Filebeat as a DaemonSet, so that logs from new pods and nodes are automatically pushed to ES.
For better readability, make sure you push logs in JSON format.

Microclimate Pod CrashLoopBackOff in IBM Cloud Private

I'm trying to deploy IBM Microclimate to IBM Cloud Private CE 2.1.0.3, as described in the documentation (https://github.com/IBM/charts/blob/master/stable/ibm-microclimate/README.md), but the Microclimate pod status shows CrashLoopBackOff and the Portal is not accessible (it shows a 503 Service Unavailable error in the browser). I tried looking at the logs for the pod, but that is not possible either. Has anyone faced an issue like this one before? Any hints on how to troubleshoot or solve the issue? Thanks!
That's not a lot of information to go on. If you'd like more interactive help, please ask in our Slack channel, as described at https://microclimate-dev2ops.github.io/community. If you want to debug it here, please post the results of: kubectl get pods, kubectl get ing, kubectl describe pods, helm list --tls, kubectl get deployments -o yaml. If you installed to a non-default namespace, add --namespace [your-mc-ns] to each command.
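If you want to gather all of that in one go, here is a minimal Python sketch, assuming kubectl and Helm 2 (the version where helm list --tls applies) are on your PATH; diagnostic_commands and collect_diagnostics are hypothetical names.

```python
import subprocess
from typing import Optional


def diagnostic_commands(namespace: Optional[str] = None) -> list[list[str]]:
    """The commands requested in the answer, optionally scoped to a namespace."""
    commands = [
        ["kubectl", "get", "pods"],
        ["kubectl", "get", "ing"],
        ["kubectl", "describe", "pods"],
        ["helm", "list", "--tls"],
        ["kubectl", "get", "deployments", "-o", "yaml"],
    ]
    if namespace:
        commands = [cmd + ["--namespace", namespace] for cmd in commands]
    return commands


def collect_diagnostics(namespace: Optional[str] = None) -> str:
    """Run each command and concatenate its output; failures are recorded, not raised."""
    sections = []
    for cmd in diagnostic_commands(namespace):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
            output = result.stdout + result.stderr
        except FileNotFoundError:
            output = f"{cmd[0]}: not found on PATH\n"
        sections.append(f"$ {' '.join(cmd)}\n{output}")
    return "\n".join(sections)
```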
Adding the command "mount --make-rshared /run" to the Vagrantfile for the ICP CE image solves this issue, and Microclimate then installs successfully. Reference: https://github.com/IBM/deploy-ibm-cloud-private/issues/139

IBM Cloud Private - Stopped container restarts automatically

In IBM Cloud Private, when I stop a Docker container, it automatically restarts. How can I stop it?
Here's a bit more information:
When you work with containers on IBM Cloud Private, you're actually deploying individual Pods or, more likely, Deployments.
When a Pod is managed by a ReplicaSet, DaemonSet, or StatefulSet, the controller reschedules the pod if it fails unexpectedly. Deleting a pod isn't distinguished from other failures within a pod (application crashes or worker node failure), so the controller simply creates a replacement.
You should use kubectl to work with pods. You can configure kubectl from User > Configure Client in the top right corner of the web UI. Copy and paste the commands for your environment into your console. Verify that the IP or network address is resolvable from your client machine (this value is set with cluster_access_ip in cluster/config.yaml at install time).
Example kubectl configuration steps (copy them from User > Configure Client in the web UI):
kubectl config set-cluster mycluster.icp --server=https://[NETWORK_ADDRESS]:8001 --insecure-skip-tls-verify=true
kubectl config set-context mycluster.icp-context --cluster=mycluster.icp
kubectl config set-credentials mycluster.icp-user --token=[TOKEN]
kubectl config set-context mycluster.icp-context --user=mycluster.icp-user --namespace=default
kubectl config use-context mycluster.icp-context
Then view running pods:
kubectl get pods [--namespace default]
These pods represent the basic unit of deployment: containers + volumes + labels + links to ConfigMaps and Secrets.
These pods are generally deployed from other management "sets":
kubectl get deployments [--namespace default]
kubectl get daemonsets [--namespace default]
kubectl get statefulsets [--namespace default]
These controllers combine a policy with pods; the recovery behavior is built into each construct.
You probably have a Deployment, so to remove the container:
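To see which of these controllers will bring a deleted pod back, look at the pod's metadata.ownerReferences. A minimal Python sketch, assuming you pass in the parsed output of kubectl get pod <pod> -o json; owning_controller is a hypothetical helper. Note that a pod created by a Deployment is owned by a ReplicaSet, which is in turn owned by the Deployment, so you have to follow the chain one more step.

```python
from typing import Any, Optional


def owning_controller(pod: dict[str, Any]) -> Optional[tuple[str, str]]:
    """Return (kind, name) of the controller that will recreate this pod, if any."""
    for ref in pod.get("metadata", {}).get("ownerReferences", []):
        if ref.get("controller"):
            return ref["kind"], ref["name"]
    return None  # a bare pod: deleting it does not bring it back
```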
kubectl get deployments -o wide [--namespace default]
Find the deployment of interest, and delete it:
kubectl delete deployments my-deployment [--namespace default]
Now the deployment will be removed, along with all associated pods.
You need to stop the kubelet first; otherwise, it will automatically start the exited containers again. You can run "systemctl stop kubelet".
Kubernetes restarts failed containers (pods), so you should scale the deployment to 0 replicas or delete it. Both can be done with kubectl (kubectl scale --replicas=0 ...) or from the ICP console.
You should change the number of replicas to zero.

`docker-compose up` times out with UnixHTTPConnectionPool

On our Jenkins agents, we run several (around 20) tests whose setup involves running docker-compose up for a "big" number of services/containers (around 14).
From time to time, I'll get the following error:
ERROR: for testdb-data UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
I haven't been able to reproduce this consistently, and I'm still trying to figure out whether it correlates with our agents' resources being fully used.
docker -v is 1.10.1 and docker-compose -v is 1.13.1.
Any ideas about what this may be related to?
Restarting the Docker service:
sudo systemctl restart docker
and setting the DOCKER_CLIENT_TIMEOUT and COMPOSE_HTTP_TIMEOUT environment variables:
export DOCKER_CLIENT_TIMEOUT=120
export COMPOSE_HTTP_TIMEOUT=120
are two workarounds for now, but the issues are still open in the docker/compose GitHub repository:
https://github.com/docker/compose/issues/3927
https://github.com/docker/compose/issues/4486
https://github.com/docker/compose/issues/3834
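If the tests are launched from a Python harness on the Jenkins agents, the same workaround can be applied per invocation instead of exporting the variables globally. A minimal sketch; compose_env and compose_up are hypothetical names, the 120-second value is the one from the workaround, and it assumes the docker-compose binary is on PATH.

```python
import os
import subprocess
from typing import Mapping, Optional


def compose_env(timeout_s: int = 120, base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the environment and raise both client timeouts (values are in seconds)."""
    env = dict(os.environ if base is None else base)
    env["DOCKER_CLIENT_TIMEOUT"] = str(timeout_s)
    env["COMPOSE_HTTP_TIMEOUT"] = str(timeout_s)
    return env


def compose_up(project_dir: str, timeout_s: int = 120) -> None:
    """Run `docker-compose up -d` in project_dir with the raised timeouts."""
    subprocess.run(["docker-compose", "up", "-d"], cwd=project_dir,
                   check=True, env=compose_env(timeout_s))
```

Copying the environment rather than mutating os.environ keeps the raised timeouts scoped to the compose call.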
I had the same problem. It was solved after changing the max-file value from a number to a string.
Wrong config:
logging:
  options:
    max-file: 10
    max-size: 10m
Correct config:
logging:
  options:
    max-file: "10"
    max-size: 10m
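If you generate compose files programmatically, you can guard against this by coercing the logging options to strings before writing the YAML. A minimal Python sketch that works on an already-parsed service dictionary (so no YAML library is needed); stringify_log_options is a hypothetical helper, and booleans are lowercased on the assumption that the logging driver expects true/false rather than Python's True/False.

```python
import copy
from typing import Any


def stringify_log_options(service: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a compose service whose logging.options values are all strings."""
    fixed = copy.deepcopy(service)  # leave the caller's dict untouched
    options = fixed.get("logging", {}).get("options")
    if options:
        fixed["logging"]["options"] = {
            key: str(value).lower() if isinstance(value, bool) else str(value)
            for key, value in options.items()
        }
    return fixed
```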
docker-compose down
Running docker-compose down and then docker-compose up --build may work. I am working in VS Code, and when I encountered a similar problem while building with Docker, this solution worked for me.
Before running the command above, make sure you understand what docker-compose down does: it stops and removes the containers and networks created by docker-compose up.
docker-compose restart
It fixed my issue. This restarts all stopped and running services.
docker-compose down
export DOCKER_CLIENT_TIMEOUT=120
export COMPOSE_HTTP_TIMEOUT=120
docker-compose up -d
These steps worked for me.
Sometimes it can be caused by Docker's internal cache. For me, the following steps worked:
1: docker-compose down -v --rmi all (removes the containers together with their volumes and all images used by the services)
2: docker-compose up --build
