Yesterday there were some restarts of the container for one pod in my OpenShift Origin environment. Today, while investigating the cause of those restarts, I went to see the events for that particular pod. All I see is an empty table. Why is that?
oc v3.6.0-alpha.1+46942ad
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server <server>
openshift v1.4.1
kubernetes v1.4.0+776c994
Events have a time-to-live and are expired after a few hours (two by default). This prevents events from filling up the etcd storage space. You can alter the default configuration to use a longer interval, but that is only recommended for small clusters.
From the OpenShift 3.1 Release Notes:
In master-config.yaml, add the following stanza to set the event TTL. If you are experiencing high event volume, set the TTL to a lower value such as 15 minutes (15m). The default is two hours (2h):
kubernetesMasterConfig:
  apiServerArguments:
    event-ttl:
    - "15m"
Related
I have created a Kubernetes application (say deployment D1, using docker image I1) that will run on client clusters.
Requirement 1 :
Now, I want to roll out updates whenever I update my docker image I1, without any effort from the client side
(somehow, the client cluster should automatically pull the latest docker image).
Requirement 2:
Whenever I update a particular ConfigMap, the client cluster should automatically start using the new ConfigMap.
How should I achieve this?
Using Kubernetes CronJobs?
Kubernetes Operators?
Or something else?
I have heard that a Kubernetes Operator can be useful here.
Starting with Requirement 2:
Whenever I update a particular ConfigMap, the client cluster should
automatically start using the new ConfigMap
If the ConfigMap is mounted into the deployment as a volume, it is updated automatically. However, if it is injected as environment variables, a restart is the only option, unless you use a sidecar solution that reloads the configuration or restarts the process. A sketch of both mount styles follows.
For reference: Update configmap without restarting POD
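As a rough illustration of the difference (all names and the image are hypothetical), a ConfigMap consumed both ways might look like this; the volume-mounted files refresh automatically, while the environment variables stay frozen until the pod restarts:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config                  # hypothetical ConfigMap name
data:
  app.properties: |
    feature.flag=true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: d1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: d1
  template:
    metadata:
      labels:
        app: d1
    spec:
      containers:
      - name: app
        image: registry.example.com/i1:latest   # hypothetical image
        volumeMounts:
        - name: config
          mountPath: /etc/config    # files here are refreshed automatically (after a short sync delay)
        envFrom:
        - configMapRef:
            name: app-config        # values here are frozen at startup; a restart is needed
      volumes:
      - name: config
        configMap:
          name: app-config

One caveat: volumes mounted with subPath are the exception and do not receive ConfigMap updates.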
How should I achieve this?
Setting imagePullPolicy alone is not a good option as I see it: manual intervention is still required to restart the deployment so that it pulls the latest image on the client side, and it won't happen in a controlled manner. (A reference snippet follows.)
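For reference, this is the field in question (container name and image are hypothetical); even with Always, a fresh pull only happens when a pod is (re)created, not when the registry receives a new push:

containers:
- name: app
  image: registry.example.com/i1:latest   # hypothetical floating tag
  imagePullPolicy: Always                 # pull on every pod start, not on every registry push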
Using Kubernetes CronJobs?
On which side would you run the CronJobs? If client-side, that is also a workable way to do it; a sketch follows this paragraph.
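A minimal sketch of that client-side approach, assuming a hypothetical deployment d1, a service account with RBAC permission to patch deployments, and an image that ships kubectl; combined with imagePullPolicy: Always, each restart re-pulls the floating tag:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: refresh-d1                           # hypothetical name
spec:
  schedule: "0 * * * *"                      # hourly; tune to taste
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: deploy-refresher   # assumed SA allowed to patch deployments
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest        # assumed kubectl image
            # Restart the deployment so its pods re-pull the latest image.
            command: ["kubectl", "rollout", "restart", "deployment/d1"]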
Alternatively, you can keep a deployment with an exposed API that runs a Job to update the deployment to the latest tag whenever an image gets pushed to your docker registry.
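The Job that such an API endpoint could launch might look like the sketch below (deployment name, container name, tag, and the service account are all assumptions):

apiVersion: batch/v1
kind: Job
metadata:
  name: update-d1-image                      # hypothetical name
spec:
  template:
    spec:
      serviceAccountName: deploy-refresher   # assumed SA allowed to patch deployments
      restartPolicy: Never
      containers:
      - name: kubectl
        image: bitnami/kubectl:latest        # assumed kubectl image
        # Point the deployment at the newly pushed tag; this changes .spec.template
        # and therefore triggers a rolling update.
        command: ["kubectl", "set", "image", "deployment/d1", "app=registry.example.com/i1:v2"]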
Kubernetes Operators?
An Operator is a good Kubernetes-native option. You can write one in Go, Python, or your preferred language, with or without the Operator Framework or client libraries.
Or something else?
If you are just looking to update the deployment, go with the API exposed by the deployment or with a Job you can schedule in a controlled manner. There is no issue with an Operator either; it would be the more native approach, if you can create, manage, and deploy one.
If in the future you need to manage all resources (deployments, services, firewalls, networks) across multiple client clusters from a single source of truth, you can explore Anthos, whose Config Management syncs configuration from a Git repository.
You can build a Kubernetes operator to watch your particular ConfigMap and trigger a rolling restart of the workloads that consume it. As for the rolling updates, you can configure the deployment according to your requirements. A Deployment's rollout is triggered if and only if the Deployment's Pod template (that is, .spec.template) is changed, for example, if the labels or container images of the template are updated. Add the rolling update specification under your Kubernetes Deployment's .spec.strategy section:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 3        # the maximum number of pods that can be created beyond the desired count during the update
    maxUnavailable: 1  # the maximum number of pods that can be unavailable during the update
(Note: timeoutSeconds, intervalSeconds, and updatePeriodSeconds are parameters of the OpenShift DeploymentConfig rolling strategy, not of a vanilla Kubernetes Deployment, so they are omitted here.)
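For context, a minimal sketch of where this sits in a full manifest (names and image are hypothetical); note that it is the change to .spec.template, here the image tag, that actually triggers the rollout:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: d1                                   # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: d1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 3
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: d1
    spec:
      containers:
      - name: app
        image: registry.example.com/i1:v2    # changing this tag triggers a new rollout
        ports:
        - containerPort: 8080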
Our horizontal scaling is currently suffering because of Liquibase.
We would want our deployments to always deploy one pod which runs Liquibase (-Dspring.liquibase.enabled=true), and then all subsequent pods to not run it (-Dspring.liquibase.enabled=false).
Is there anything that Kubernetes offers which could do this out of the box?
I'm unfamiliar with Liquibase and I'm unclear how the non-first Pods use it, but you may be able to use a lock to control access. A Pod that acquires the lock sets the property to true; if it is unable to acquire the lock, the property is false.
One challenge will be ensuring that the lock is released if the first Pod terminates, and understanding the consequence for the other Pods: is an existing Pod promoted?
Even though Kubernetes leverages etcd for its own distributed locking purposes, users are encouraged to run separate etcd instances if they need locks. Since you have to choose, you may as well choose what you prefer, e.g. Redis or ZooKeeper.
You could use an init container or sidecar for the locking mechanism and a shared volume to record its state; a sketch of the init container variant follows.
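As a rough sketch of that idea (the images, the lock ConfigMap, and the RBAC the init container would need are all assumptions, not a tested recipe): an init container tries to create a ConfigMap as a create-once lock, records the outcome on a shared volume, and the application container reads that file to set the Liquibase flag:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                              # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      volumes:
      - name: shared
        emptyDir: {}
      initContainers:
      - name: acquire-lock
        image: bitnami/kubectl:latest      # assumed; needs RBAC to create ConfigMaps
        command: ["/bin/sh", "-c"]
        args:
        - |
          # Creating the ConfigMap succeeds for exactly one Pod; all others fail.
          if kubectl create configmap liquibase-lock >/dev/null 2>&1; then
            echo "true" > /shared/liquibase-enabled
          else
            echo "false" > /shared/liquibase-enabled
          fi
        volumeMounts:
        - name: shared
          mountPath: /shared
      containers:
      - name: app
        image: registry.example.com/myapp:latest   # hypothetical application image
        command: ["/bin/sh", "-c"]
        args:
        - exec java -Dspring.liquibase.enabled=$(cat /shared/liquibase-enabled) -jar /app.jar
        volumeMounts:
        - name: shared
          mountPath: /shared

Note that nothing here releases the lock when the winning Pod terminates, which is exactly the challenge mentioned above; the ConfigMap would need to be cleaned up, or the scheme replaced with proper leader election, for another Pod to take over.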
It feels as though Liquibase should be a distinct Deployment exposed as a Service that all Pods access.
Have you contacted Liquibase to see what it recommends for Kubernetes deployments?
My Spring Boot application is scheduled to run at 1 UTC each day to do some data collection and put the results in the database. We are using Kubernetes, and we have two pods accessing the same database. The database is at some other location, for which we have a connection string that is the same in both pods.
The problem is that both of my pods wake up at 1 UTC and add duplicate entries to the database. How can I ensure that only one pod talks to the database? Is this application not ideal for a k8s deployment?
I know this is old, but for anybody else, look into ShedLock. It handles locking across distributed nodes and is pretty easy to implement.
Is there a way to upgrade from Aurora 1 (MySQL 5.6) to Aurora 2 (MySQL 5.7) without downtime on an active database? This seems like a simple task, given that we should be able to simply do major version upgrades from either the CLI or the Console, but that is not the case.
We tried:
Creating a snapshot of the database
Creating a new cluster using Aurora 2 (MySQL 5.7) from the snapshot
Configuring replication to the new cluster from the primary cluster
However, because you can't run commands that require SUPER user privileges in Aurora, you're not able to stop transactions long enough to get a good binlog pointer from the master, which results in a ton of SQL errors that are impossible to skip on an active database.
Also, because Aurora does not use binlog replication to its read replicas, I can't simply stop replication to a read replica and get the pointer there.
I have seen this semi-related question, but it certainly requires downtime: How to upgrade AWS RDS Aurora MySQL 5.6 to 5.7
UPDATE: AWS just announced an in-place upgrade option for 5.6 > 5.7:
https://aws.amazon.com/about-aws/whats-new/2021/01/amazon-aurora-supports-in-place-upgrades-mysql-5-6-to-5-7/
It's as simple as choosing Modify and selecting a 2.x version. :)
I tested this (Aurora MySQL 5.6 > 5.7) on a 25 GB database that was many minor versions behind, and it took 10 minutes, with 8 minutes of downtime. Not zero downtime, but a very easy option, and it can be scheduled in AWS to happen automatically during off-peak times (the maintenance window).
Additionally, consider RDS Proxy to reduce downtime. During short windows of database unavailability (e.g. a reboot for minor updates), the proxy holds connections open instead of being completely unavailable, so the interruption appears only as a brief delay/latency.
The need was to upgrade AWS RDS Aurora MySQL from 5.6 to 5.7 without causing any downtime in production. Being a SaaS solution, we could not afford any downtime.
Background
We have a distributed architecture based on microservices running in AWS Fargate and AWS Lambda. For data persistence, AWS RDS Aurora MySQL is used. While there are other services in use, they are not relevant to this use case.
Approach
After deliberating over an in-place upgrade with a declared downtime and maintenance window, we realized that a zero-downtime upgrade was required, as without it we would have built up a processing backlog.
High level approach was:
Create an AWS RDS Cluster with the required version and copy the data from the existing RDS Cluster to this new Cluster
Set up AWS DMS (Data Migration Service) between the two clusters
Once the initial replication is done and ongoing replication is in place, switch the application to point to the new DB. In our case, the microservices running in AWS Fargate had to be updated with the new endpoint, which took care of draining the old connections and using the new ones.
For the complete post, please check out:
https://bharatnainani1997.medium.com/aws-rds-major-version-upgrade-with-zero-downtime-5-6-to-5-7-b0aff1ea1f4
When you create a new Aurora cluster from a snapshot, you get a binlog pointer in the error log from the point at which the snapshot was taken. You can use that to set up replication from the old cluster to the new cluster.
I've followed a similar process to what you've described in your question (multiple times in fact) and was able to keep the actual downtime in the low seconds range.
I have a few queries regarding setting up Elasticsearch on Kubernetes, as we are currently facing issues while enabling basic security in our 3-master, 2-data, 2-client node configuration.
1. Is the helm chart stable/elasticsearch supported for version 6.8.4 or 7.4.2?
2. How can we leverage the user authentication available as part of the basic license by setting the xpack.enabled: true option? Currently, using the default settings (3 master, 2 data, and 2 client nodes), we are not able to get the pods into a ready state. On disabling the xpack.enabled option, everything works as normal.
3. Are there any other settings, apart from xpack.enabled: true, needed to allow the StatefulSets to function as expected?
4. Is there a provision to supply the built-in user credentials using Kubernetes secrets? How can we ensure that the values of these secrets are known to the ES administrators?
5. In case I want to replace the existing chart with Elastic's helm chart (as mentioned in the pre-deprecation notice), how can we make sure the data nodes get deployed? Currently I am not able to see a StatefulSet for the data nodes or any related configuration.