I have a pipeline with the following structure:
Step 1: kubernetes-deployment
Step 2: kubectl rollout restart deploy -n NAMESPACE
Step 3: HTTP calls to deployments A and B
The same namespace contains a database pod, and the pods of deployments A and B are connected to this database.
The problem
The problem is caused by rolling updates: when a deployment is updated, Kubernetes starts new pods, but the old pod is not terminated until its corresponding new pod has started.
Because kubectl rollout restart deploy is a non-blocking call, it does not wait for the update to finish, and as far as I know, kubectl has no built-in way to force that behavior.
Because I send HTTP requests right after this call, sometimes, when the update is not fast enough, the requests are received and answered by the old pods of deployments A and B. Shortly afterwards, the old pods are terminated because the new ones are up and running.
As a result, the effects of those HTTP requests are no longer visible: the old pods saved the data in the "old" database pod, and since the database pod is restarted as well, that data is lost.
Note that I am not using a persistent volume here, because this is a nightly build scenario: I want to restart these deployments every day, and the database should always contain only the data from the current day's build.
Do you see any way of solving this?
Including a simple wait step would probably work, but I am curious if there is a fancier solution.
Thanks in Advance!
Running kubectl rollout status deployment <deploymentname> after the restart, together with a startupProbe and a livenessProbe, solved my problem.
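A minimal sketch of that fix as a pipeline helper, assuming the pipeline can run Python and has kubectl on its PATH. It blocks on `kubectl rollout status` for each deployment (with a `--timeout`) before the HTTP calls are made; the deployment and namespace names in the usage line are hypothetical placeholders.

```python
import subprocess
from typing import Callable, Sequence


def rollout_status_cmd(deployment: str, namespace: str, timeout: str = "300s") -> list[str]:
    """Build a `kubectl rollout status` command that blocks until the rollout completes."""
    return [
        "kubectl", "rollout", "status",
        f"deployment/{deployment}",
        "-n", namespace,
        f"--timeout={timeout}",
    ]


def wait_for_rollouts(
    deployments: Sequence[str],
    namespace: str,
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Wait until every deployment has finished rolling out.

    check=True turns a non-zero kubectl exit code (failed rollout or timeout)
    into subprocess.CalledProcessError, so the pipeline stops instead of
    sending HTTP calls to pods that are about to be replaced.
    """
    for name in deployments:
        run(rollout_status_cmd(name, namespace), check=True)
```

Usage, placed between step 2 and step 3: `wait_for_rollouts(["deployment-a", "deployment-b"], "my-namespace")`. The probes still matter: rollout status only knows a new pod is ready once its probes say so.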
I build the project into a binary, deploy it to the server, and start it with nohup. But whenever I update my code and rebuild the program, I have to kill the process first, replace the file, and start it again.
My problem is:
The app is down for at least a few seconds.
I have to update the file manually (log in to the server, kill the process, replace the file, and start it again).
Is there any way to hot-update the program, as with PHP? Ideally I would just push my code to the server via git (or svn, or some other way), and the server would rebuild the app and restart it gracefully.
Usually you run more than one instance of your web application behind a reverse proxy, e.g. nginx, or some other load balancer. If a few seconds of downtime is an issue for you, then you need a high-availability (HA) setup anyway. In such a setup you can do a rolling update, replacing instances one by one.
A quick search will turn up instructions for this kind of deployment, e.g.: https://www.digitalocean.com/community/tutorials/how-to-deploy-a-go-web-application-using-nginx-on-ubuntu-18-04
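The rolling update described in this answer can be sketched as follows. This is a simplified model, not nginx configuration: `LoadBalancer` is a hypothetical stand-in for the proxy's upstream pool, and `replace`/`is_healthy` are hypothetical hooks for "stop old binary, start new one" and "health-check the instance". The point is the ordering: each instance leaves rotation, is replaced, and rejoins before the next one is touched, so the others keep serving.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoadBalancer:
    """Hypothetical model of a reverse proxy's pool of active backends."""
    active: set[str] = field(default_factory=set)
    min_active_seen: Optional[int] = None  # smallest pool size observed during the update

    def _record(self) -> None:
        n = len(self.active)
        if self.min_active_seen is None or n < self.min_active_seen:
            self.min_active_seen = n

    def remove(self, instance: str) -> None:
        self.active.discard(instance)
        self._record()

    def add(self, instance: str) -> None:
        self.active.add(instance)
        self._record()


def rolling_update(
    lb: LoadBalancer,
    instances: list[str],
    replace: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> None:
    """Replace instances one at a time while the rest keep handling traffic."""
    for inst in instances:
        lb.remove(inst)      # drain: stop routing new requests to this instance
        replace(inst)        # stop the old process, deploy and start the new binary
        if not is_healthy(inst):
            # Leave it out of rotation and stop before touching healthy instances.
            raise RuntimeError(f"{inst} failed its health check; aborting rollout")
        lb.add(inst)         # back into rotation before the next instance
```

With three instances, at most one is out of rotation at any time, so capacity drops to two but never to zero.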
We've been experiencing issues with DockerImageFunction. All deployments fail with the following (and very cryptic to me) error:
Lambda function XXX failed to stabilize since it is in InProgress state
Other functions deploy without any problems. Deployments were working fine until Wednesday 07.04.2021. Since then, they fail every time. We haven't changed anything in our CDK code, in that function Dockerfile or its code.
We deploy using the CDK in TypeScript. I tested with versions 1.93 and 1.97 (the latest version at the time of writing).
Any clue?
This looks to be a bug that Amazon will need to resolve. However...
In the event someone needs a temporary workaround to deploy their CDK stack which includes a DockerImageFunction and they do not want to delete the whole stack first (perhaps because some resources are S3 buckets with important data), here are some steps that worked for me. This assumes your stack is in the state described above, i.e. an update has failed, the system attempted a rollback, and then the update rollback also failed.
From the CloudFormation console select "Continue update rollback"
Select "advanced options" and choose to skip the function or functions that use the containerized deployment (i.e. DockerImageFunctions)
The rollback should now complete successfully
If you try to deploy again now, the stack will return to the UPDATE_ROLLBACK_FAILED state, so don't bother. Instead, comment out all the code that instantiates and references the DockerImageFunctions in your CDK stack class. Then perform the deployment, which should remove those functions and their various roles and permissions from the CloudFormation stack.
Once this is complete, you can uncomment all the stack code you just commented out and perform a final deploy. This one should succeed. It did for me, at least: the latest version of my application is now fully deployed.
It seems likely that if I perform another deploy after this, the same error will occur and I will have to go through these five steps again. I haven't tried it yet. But at least this is a workaround, however clumsy.
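If you would rather script the first two steps than click through the console, here is a minimal sketch using the boto3 CloudFormation client (`list_stack_resources`, `continue_update_rollback` with `ResourcesToSkip`). The client is passed in so the helpers stay testable. Assumption: the functions that block the rollback show up as `AWS::Lambda::Function` resources in `UPDATE_FAILED` state; the helper only lists candidates, so check that they are really your DockerImageFunctions before skipping them.

```python
from typing import Any


def failed_lambda_functions(cfn: Any, stack_name: str) -> list[str]:
    """Logical IDs of Lambda functions in UPDATE_FAILED state (follows pagination)."""
    ids: list[str] = []
    kwargs = {"StackName": stack_name}
    while True:
        page = cfn.list_stack_resources(**kwargs)
        for res in page["StackResourceSummaries"]:
            if (res["ResourceType"] == "AWS::Lambda::Function"
                    and res["ResourceStatus"] == "UPDATE_FAILED"):
                ids.append(res["LogicalResourceId"])
        token = page.get("NextToken")
        if not token:
            return ids
        kwargs["NextToken"] = token


def continue_rollback_skipping(cfn: Any, stack_name: str, logical_ids: list[str]) -> None:
    """Scripted equivalent of "Continue update rollback" with resources to skip."""
    cfn.continue_update_rollback(StackName=stack_name, ResourcesToSkip=logical_ids)
```

Usage, with a hypothetical stack name: `cfn = boto3.client("cloudformation")`, then `ids = failed_lambda_functions(cfn, "MyStack")`, review `ids`, and call `continue_rollback_skipping(cfn, "MyStack", ids)`. The remaining steps (commenting out the functions and redeploying twice) still happen in the CDK code.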
FYI: this issue should now be resolved. Confirmed with AWS support and on our stacks.
I'm having an issue where some of my pods are in CrashLoopBackOff when I try to deploy CAM from the Catalog. I also followed the instructions in the IBM documentation to clear the data from the PVs (by running rm -Rf /export/CAM_db/*) and purged the previous installations of CAM.
Here are the pods that are in CrashLoopBackOff:
[Screenshot: CAM pods]
Here's the specific error when I describe the pod:
[Screenshot: MongoDB pod]
If the cam-mongo pod does not come up properly, the cause is almost always that the PV cannot mount, read, or access the actual disk location, or a problem with the data itself on the PV.
Since your pod events indicate that the container image already exists and is scoped to the store, it seems you have already tried to install CAM before and are using the CE version from the Docker Store, correct?
If a prior deployment did not go well, clean up the disk locations as described in the documentation:
https://www.ibm.com/support/knowledgecenter/SS2L37_3.1.0.0/cam_uninstalling.html
As you showed, you already cleaned CAM_db; do the same for the CAM_logs, CAM_bpd, and CAM_terraform locations.
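A minimal sketch of that cleanup for all four locations, assuming they live directly under /export as in the question's `rm -Rf /export/CAM_db/*`. It defaults to a dry run that only lists what would be deleted. Unlike the shell glob, `iterdir()` also picks up hidden entries; the directories themselves are kept so the PVs still point at existing paths.

```python
import shutil
from pathlib import Path

# Assumed layout: one directory per CAM PV under the NFS base directory.
CAM_DIRS = ("CAM_db", "CAM_logs", "CAM_bpd", "CAM_terraform")


def clear_cam_volumes(base: Path = Path("/export"), dry_run: bool = True) -> list[Path]:
    """Delete the contents of each CAM PV directory, keeping the directories.

    Returns the entries that were (or, with dry_run=True, would be) removed.
    """
    removed: list[Path] = []
    for name in CAM_DIRS:
        d = base / name
        if not d.is_dir():
            continue  # location not present on this host
        for entry in sorted(d.iterdir()):  # materialize the listing before deleting
            removed.append(entry)
            if dry_run:
                continue
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    return removed
```

Run it once with the default `dry_run=True`, check the list, then call `clear_cam_volumes(dry_run=False)` after the previous CAM installation has been purged.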
Also take note of our installation troubleshooting section, which describes a few scenarios that can affect cam-mongo:
https://www.ibm.com/support/knowledgecenter/SS2L37_3.1.0.0/ts_cam_install.html
At the bottom of the PV creation topic, we provide guidance on the NFS mount options that work best; please review it:
https://www.ibm.com/support/knowledgecenter/SS2L37_3.1.0.0/cam_create_pv.html
Hope this helps you make some forward progress!
You can effectively ignore the postStart error: it most likely means the mongo container failed to start, so its post-start script was killed.
This issue is usually caused by an NFS configuration problem.
I recommend trying the troubleshooting steps in the section about the cam-mongo pod being in CrashLoopBackOff:
https://www.ibm.com/support/knowledgecenter/SS2L37_3.1.0.0/ts_cam_install.html
If it is NFS, the typical causes are:
- no_root_squash is missing on the base directory
- fsid=0 needs to be removed from the base directory for that setup
- folder permissions
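The first two causes can be checked directly against the export definition. A minimal sketch, assuming the base directory is exported with a standard /etc/exports line of the form `path client(options) ...`; it does not handle quoted paths or default options given with `-`, and it does not check folder permissions. `exportfs -v` on the NFS server shows the options actually in effect.

```python
import re

# One "client(opt1,opt2,...)" group; an empty client name means "everyone".
CLIENT_RE = re.compile(r"(\S*?)\(([^)]*)\)")


def parse_exports_line(line: str) -> tuple[str, dict[str, set[str]]]:
    """Split an /etc/exports line into (path, {client: {options}})."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    parts = line.split(None, 1)
    path = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    clients: dict[str, set[str]] = {}
    for client, opts in CLIENT_RE.findall(rest):
        clients[client or "*"] = {o.strip() for o in opts.split(",") if o.strip()}
    return path, clients


def check_cam_export(line: str) -> list[str]:
    """Report the two NFS option problems listed above for the base directory."""
    path, clients = parse_exports_line(line)
    problems: list[str] = []
    for client, opts in clients.items():
        if "no_root_squash" not in opts:
            problems.append(f"{path} {client}: no_root_squash is missing")
        if "fsid=0" in opts:
            problems.append(f"{path} {client}: fsid=0 is set")
    return problems
```

For example, `check_cam_export("/export *(rw,fsid=0)")` reports both problems, while `/export 10.0.0.0/24(rw,sync,no_root_squash)` passes.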
Note: I have seen another customer with this issue where the problem was caused by NFS: there were already .snapshot files in the directory, and they had to remove them first.
Using Hyperledger Composer 0.19.1, I can't find a way to undeploy my business network. I don't necessarily want to upgrade to a newer version each time, but would rather replace the deployed one with, for instance, a fix in the JS code. Is there any replacement for the undeploy command that existed before?
There is no replacement for the old undeploy command, and in fact it did not really undeploy anything; it merely hid the old network.
Be aware that every time you upgrade a network, it creates a new Docker image and container, so you may want to tidy these up periodically. (You could also try to delete the BNA from the peer servers, but these are very small compared to the Docker images.)
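A minimal sketch of that periodic tidy-up: pick out containers and images that belong to older versions of one business network. Assumption: the names follow Fabric's chaincode convention `dev-<peer>-<network>-<version>` (images may carry an extra `-<hash>` suffix); check your own `docker ps -a --format "{{.Names}}"` and `docker images --format "{{.Repository}}"` output before relying on it. The network name and versions in the example are hypothetical.

```python
def is_stale(name: str, network: str, current_version: str) -> bool:
    """True if `name` looks like a chaincode container/image for an older network version."""
    if not name.startswith("dev-"):
        return False  # not a chaincode container/image
    marker = f"-{network}-"
    idx = name.find(marker)
    if idx == -1:
        return False  # belongs to a different network
    tail = name[idx + len(marker):]  # "<version>" or "<version>-<hash>"
    # Compare whole version tokens so that 0.0.1 does not match 0.0.10.
    return not (tail == current_version or tail.startswith(current_version + "-"))


def stale_names(names: list[str], network: str, current_version: str) -> list[str]:
    """Filter a list of container or image names down to the stale ones."""
    return [n for n in names if is_stale(n, network, current_version)]
```

The resulting names can then be passed to `docker rm -f` (containers) or `docker rmi` (images), removing containers before their images.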
It might not help in your situation, but if you are rapidly developing and iterating, you could try this in the online Playground or in a local Playground with the Web profile; this is fast and does not create any new images or containers.
I have tried all the basics of Kubernetes, and if you want to update your application, you can use kubectl rolling-update to update the pods one by one without downtime. Now I have read the Kubernetes documentation again and found a new feature called Deployment in API version v1beta1. I am confused, since there is this line in the Deployment docs:
Next time we want to update pods, we can just update the deployment again.
Isn't that what rolling-update is for? Any input would be very useful.
A Deployment is an object that lets you define a deployment declaratively.
It encapsulates:
- a DeploymentStatus object, which records the current number of replicas and their state;
- a DeploymentSpec object, which holds the desired number of replicas, the pod template spec, selectors, and other data that controls deployment behaviour, such as the update strategy.
You can get a glimpse of the actual code here:
https://github.com/kubernetes/kubernetes/blob/5516b8684f69bbe9f4688b892194864c6b6d7c08/pkg/apis/extensions/v1beta1/types.go#L223-L253
You will mostly use Deployments to deploy services/applications, in a declarative manner.
If you want to modify your deployment, update the YAML/JSON you used, without changing the metadata (so it still refers to the same Deployment), and apply it again.
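To make "update the YAML/JSON without changing the metadata" concrete, here is a minimal sketch that edits a Deployment manifest held as a Python dict. It changes one container image in the pod template (`spec.template.spec.containers`) and leaves `metadata` alone. The container name and image tags in the example are hypothetical.

```python
import copy
from typing import Any


def with_new_image(manifest: dict[str, Any], container: str, image: str) -> dict[str, Any]:
    """Return a copy of a Deployment manifest with one container's image changed.

    metadata (name, namespace, labels) is untouched, so applying the result
    updates the existing Deployment rather than creating a new one.
    """
    updated = copy.deepcopy(manifest)  # never mutate the caller's manifest
    for c in updated["spec"]["template"]["spec"]["containers"]:
        if c["name"] == container:
            c["image"] = image
            return updated
    raise KeyError(f"no container named {container!r} in the pod template")
```

Serializing the result with `json.dumps` and passing it to `kubectl apply -f` hands the change to the Deployment controller, which then performs the rolling update itself.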
In contrast, kubectl rolling-update is not declarative: no YAML/JSON is involved, and it needs an existing replication controller.
I have been testing rolling updates of a service using both replication controllers and declarative Deployment objects. I found that with a replication controller there appears to be no downtime from the client's perspective. But when the Deployment does a rolling update, the client gets some errors for a while until the update stabilizes.
This is with Kubernetes 1.2.1.
The main difference is that "kubectl rolling-update" is a client-driven rolling update, whereas the Deployment object gives you a server-side rolling update.
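The difference can be illustrated with a simplified model of the client-driven loop. This is a toy sketch, not kubectl's actual implementation (the real command also handles surge/unavailability limits and renames controllers). `ReplicationController` here is a hypothetical two-field model, and `wait_ready` stands in for waiting until new pods are ready. Every step runs in the client, so if the client stops, the update stalls halfway. With a Deployment, the equivalent loop runs in the controller on the server.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReplicationController:
    """Toy model: a replication controller is just a name and a replica count."""
    name: str
    replicas: int


def client_driven_rolling_update(
    old: ReplicationController,
    new: ReplicationController,
    wait_ready: Callable[[ReplicationController], None],
) -> list[tuple[int, int]]:
    """Scale `new` up one replica at a time and `old` down one at a time.

    Returns the (old, new) replica counts after each step. If wait_ready raises
    (e.g. the client loses its connection), both controllers keep their
    intermediate sizes, because nothing on the server continues the update.
    """
    target = old.replicas
    history: list[tuple[int, int]] = []
    while new.replicas < target or old.replicas > 0:
        if new.replicas < target:
            new.replicas += 1
            wait_ready(new)  # block until the new replica is ready
        if old.replicas > 0:
            old.replicas -= 1
        history.append((old.replicas, new.replicas))
    return history
```

With three replicas the counts go (2, 1), (1, 2), (0, 3). If the client dies during the second step, the service is left with two old and two new replicas.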