Leader election - Pod is not elected as leader - Go

I have implemented leader election using kubernetes/client-go leader election. I have 2 replicas. The first time, both pods are elected as leader, but after that the same pod is not elected as leader again, and the leader election stops after some time. I tried deleting one pod; the newly created pod was then elected as leader. But again, once that pod stopped leading, no pod acted as leader. I am using a ConfigMap for the resource lock. Please help me solve this issue.
func NewElectorWithCallbacks(namespace, configMapName, identity string, ttl time.Duration, client cli.CoreV1Interface, callbacks *leaderelection.LeaderCallbacks) (*leaderelection.LeaderElector, error) {
    hostname, err := os.Hostname()
    if err != nil {
        return nil, err
    }

    broadcaster := record.NewBroadcaster()
    broadcaster.StartLogging(log.Printf)
    broadcaster.StartRecordingToSink(&cli.EventSinkImpl{Interface: client.Events(namespace)})
    recorder := broadcaster.NewRecorder(scheme.Scheme, api.EventSource{Component: identity, Host: hostname})

    cmLock := &resourcelock.ConfigMapLock{
        Client: client,
        ConfigMapMeta: meta.ObjectMeta{
            Namespace: namespace,
            Name:      configMapName,
        },
        LockConfig: resourcelock.ResourceLockConfig{
            Identity:      identity,
            EventRecorder: recorder,
        },
    }

    if callbacks == nil {
        callbacks = NewDefaultCallbacks()
    }

    config := leaderelection.LeaderElectionConfig{
        Lock:          cmLock,
        LeaseDuration: ttl,
        RenewDeadline: ttl / 2,
        RetryPeriod:   ttl / 4,
        Callbacks:     *callbacks,
    }

    return leaderelection.NewLeaderElector(config)
}
config, err := rest.InClusterConfig()
if err != nil {
    panic(err)
}
v1Client, err := v1.NewForConfig(config)
if err != nil {
    panic(err)
}

callbacks := &leaderelection.LeaderCallbacks{
    OnStartedLeading: func(context.Context) {
        // do the work
        fmt.Println("selected as leader")
        // Wait forever
        select {}
    },
    OnStoppedLeading: func() {
        fmt.Println("Pod stopped leading")
    },
}

elector, err := election.NewElectorWithCallbacks(namespace, electionName, hostname, ttl, v1Client, callbacks)
if err != nil {
    panic(err)
}

elector.Run(context.TODO())

You can deploy the pods as a StatefulSet with a headless Service. Please refer to the docs.
Why?
Pods are created sequentially. You can define the first pod that is launched as the master and the rest as slaves.
Pods in a StatefulSet have a unique ordinal index and a stable network identity. For example:
kubectl get pods -l app=nginx
NAME      READY     STATUS    RESTARTS   AGE
web-0     1/1       Running   0          1m
web-1     1/1       Running   0          1m
Even if the pod web-0 restarts, the name and FQDN of the pod never change:
web-0.nginx.default.svc.cluster.local
<pod_name>.<service_name>.<namespace>.svc.cluster.local
I have only highlighted a few points, please go through the docs completely.
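To make the idea concrete, here is a minimal sketch (not from the original answer) of how the stable StatefulSet identity can be used to decide which replica does the master's work: the pod whose hostname ends in the ordinal -0 takes the role.

package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    // In a StatefulSet the hostname equals the pod name, which is always
    // <statefulset-name>-<ordinal>, so ordinal 0 can be treated as the master.
    hostname, err := os.Hostname()
    if err != nil {
        panic(err)
    }

    if strings.HasSuffix(hostname, "-0") {
        fmt.Println("ordinal 0: acting as master")
        // do the leader's work here
    } else {
        fmt.Println("acting as slave, waiting for work")
    }
}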

Related

How to add condition to test http request using httptest in golang

I am a beginner in Golang and started working on a backend RBAC application to manage access to a Kubernetes cluster. We have a monitoring stack behind a proxy that serves the Prometheus, Thanos and Grafana URLs. I am not able to add conditions to check the HTTP status using httptest. I have to add a condition: if the pods are up and running, continue, else print the error.
rq := httptest.NewRequest("GET", "/", nil)
rw := httptest.NewRecorder()
proxy.ServeHTTP(rw, rq)

if rw.Code != 200 && monitoringArgs.selector == "PROMETHEUS" {
    fmt.Printf("Target pods are in error state, please check with 'oc get pods -n %s -l %s'", monitoringArgs.namespace, monitoringArgs.selector)
}
How can I add a condition for all three: Prometheus, Grafana and Thanos?
You can also use the restart count of the pod in your logic, something like:
pods, err := clientset.CoreV1().Pods(namespace).List(context.TODO(), metav1.ListOptions{
    LabelSelector: "app=myapp",
})
if err != nil {
    panic(err.Error())
}

// Check the status of each pod to see the probe status.
for _, pod := range pods.Items {
    _ = pod.Status.Conditions // use your custom logic here
    for _, container := range pod.Status.ContainerStatuses {
        _ = container.RestartCount // use this number in your logic
    }
}
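To cover all three targets from the question, one option is to table-drive the same recorder-based probe over the selectors. The sketch below is self-contained but uses stand-ins: the handler replaces the real reverse proxy and the namespace is assumed; in the real application you would reuse proxy and monitoringArgs.namespace and vary the request path per target.

package main

import (
    "fmt"
    "net/http"
    "net/http/httptest"
)

func main() {
    // Stand-in for the real reverse proxy handler from the question.
    proxy := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })

    namespace := "monitoring" // assumed namespace

    // Run the same recorder-based check once per monitoring target.
    for _, selector := range []string{"PROMETHEUS", "THANOS", "GRAFANA"} {
        rq := httptest.NewRequest("GET", "/", nil)
        rw := httptest.NewRecorder()
        proxy.ServeHTTP(rw, rq)
        if rw.Code != http.StatusOK {
            fmt.Printf("Target pods are in error state, please check with 'oc get pods -n %s -l %s'\n", namespace, selector)
        }
    }
}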

No data returned from in-cluster config with kubernetes/go-client

I made a demo with kubernetes/go-client where I tried to list the pods in my cluster.
config, err := rest.InClusterConfig()
if err != nil {
    panic(err.Error())
}
clientset, err := kubernetes.NewForConfig(config)
if err != nil {
    panic(err.Error())
}

pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
fmt.Fprintf(w, "There are %d pods in the cluster\n", len(pods.Items))
I created a ServiceAccount token and assigned it to the pod this code is running in.
But when the code is executed, pods.Items has no pods.
I deployed this pod inside minikube. When I run kubectl commands for listing pods I can see the resources, so it is not a permissions problem.
I wonder what is happening and how I can fix it.
Repository https://github.com/srpepperoni/inframanager.git
Image is pushed into: https://hub.docker.com/r/jaimeyh/inframanager
The endpoint I have problems with is this one:
mux.HandleFunc("/getPods", GetPodsFromNamespace)
You need to check if the err on the last line is non-nil.
pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
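For example, a sketch of the handler from the question (GetPodsFromNamespace, assuming clientset is the package-level client and the imports are as in the question's snippet) that checks and surfaces that error:

func GetPodsFromNamespace(w http.ResponseWriter, r *http.Request) {
    pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        // Surface the API error instead of silently rendering zero pods.
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    fmt.Fprintf(w, "There are %d pods in the cluster\n", len(pods.Items))
}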
OK, there is the problem. pods is forbidden: User "system:serviceaccount:mis-pruebas:sa-prueba-go" cannot list resource "pods" in API group "" at the cluster scope
As the error message indicates, the ServiceAccount does not have permission to list pods at the cluster scope. You need to create a Role (or, for cluster-scoped listing, a ClusterRole) and bind it to the ServiceAccount.
The article Using RBAC Authorization even has an example of how to create such a role.
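Normally the role and binding are created by a cluster admin with kubectl or YAML, as in the linked article. Purely as an illustration in Go, a hedged sketch of creating an equivalent ClusterRole and ClusterRoleBinding with client-go could look like the following; the names pod-reader and pod-reader-binding are made up, while the subject matches the service account from the error message above.

package main

import (
    "context"

    rbacv1 "k8s.io/api/rbac/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    config, err := rest.InClusterConfig()
    if err != nil {
        panic(err.Error())
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err.Error())
    }

    // ClusterRole that allows listing pods cluster-wide.
    role := &rbacv1.ClusterRole{
        ObjectMeta: metav1.ObjectMeta{Name: "pod-reader"}, // hypothetical name
        Rules: []rbacv1.PolicyRule{{
            APIGroups: []string{""},
            Resources: []string{"pods"},
            Verbs:     []string{"get", "list", "watch"},
        }},
    }
    if _, err := clientset.RbacV1().ClusterRoles().Create(context.TODO(), role, metav1.CreateOptions{}); err != nil {
        panic(err.Error())
    }

    // Bind it to the ServiceAccount named in the error message.
    binding := &rbacv1.ClusterRoleBinding{
        ObjectMeta: metav1.ObjectMeta{Name: "pod-reader-binding"}, // hypothetical name
        Subjects: []rbacv1.Subject{{
            Kind:      "ServiceAccount",
            Name:      "sa-prueba-go",
            Namespace: "mis-pruebas",
        }},
        RoleRef: rbacv1.RoleRef{
            APIGroup: "rbac.authorization.k8s.io",
            Kind:     "ClusterRole",
            Name:     "pod-reader",
        },
    }
    if _, err := clientset.RbacV1().ClusterRoleBindings().Create(context.TODO(), binding, metav1.CreateOptions{}); err != nil {
        panic(err.Error())
    }
}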

DeletionGracePeriodSeconds is not respected when creating a Kubernetes POD with Golang

I am creating a Kubernetes pod with Golang. I am trying to set DeletionGracePeriodSeconds, but after creating the pod it has 30 in this field while I am setting 25.
The name of the pod is OK, so after creating the pod it has the name that I assigned in the code.
func setupPod(client *Client, ns string, name string, labels map[string]string) (*v1.Pod, error) {
    seconds := func(i int64) *int64 { return &i }(25)

    pod := &v1.Pod{}
    pod.Name = name
    pod.Namespace = ns
    pod.SetDeletionGracePeriodSeconds(seconds) // it is 25 seconds under the debugger
    pod.DeletionGracePeriodSeconds = seconds
    pod.Spec.Containers = []v1.Container{
        {Name: "ubuntu", Image: "ubuntu", Command: []string{"sleep", "30"}},
    }
    pod.Spec.NodeName = "node1"
    if labels != nil {
        pod.Labels = labels
    }

    _, err := client.client.CoreV1().Pods(ns).Create(client.context, pod, metav1.CreateOptions{})
    return pod, err
}
DeletionGracePeriodSeconds is read-only, hence you cannot change it. You should instead set TerminationGracePeriodSeconds, and Kubernetes will set DeletionGracePeriodSeconds accordingly once the pod is being deleted. You can verify that by getting the value and printing it.
From the API docs
Number of seconds allowed for this object to gracefully terminate
before it will be removed from the system. Only set when
deletionTimestamp is also set. May only be shortened. Read-only.
podSpec := &v1.Pod{
    Spec: v1.PodSpec{
        TerminationGracePeriodSeconds: <Your-Grace-Period>,
    },
}

_, err = clientset.CoreV1().Pods("namespacename").Create(context.TODO(), podSpec, metav1.CreateOptions{})
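Since TerminationGracePeriodSeconds is a *int64, the placeholder above has to be filled with a pointer. A small sketch that adapts the question's setupPod helper (reusing its assumed Client type and stored context) might look like:

// Variant of the question's setupPod that sets the termination grace period.
func setupPodWithGrace(client *Client, ns, name string) (*v1.Pod, error) {
    grace := int64(25)

    pod := &v1.Pod{}
    pod.Name = name
    pod.Namespace = ns
    pod.Spec.TerminationGracePeriodSeconds = &grace // *int64: pass a pointer, not the value
    pod.Spec.Containers = []v1.Container{
        {Name: "ubuntu", Image: "ubuntu", Command: []string{"sleep", "30"}},
    }

    _, err := client.client.CoreV1().Pods(ns).Create(client.context, pod, metav1.CreateOptions{})
    return pod, err
}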

Confluent Kafka Golang Client Producer "Broker: Not enough in-sync replicas"

I am attempting to test a producer writing messages to a topic on a Kafka cluster using the Golang client. This works fine when writing to a topic on a local cluster; I just copied and pasted the example code from their GitHub repo.
package main

import (
    "fmt"

    "gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
    p, err := kafka.NewProducer(&kafka.ConfigMap{"bootstrap.servers": "localhost"})
    if err != nil {
        panic(err)
    }
    defer p.Close()

    // Delivery report handler for produced messages
    go func() {
        for e := range p.Events() {
            switch ev := e.(type) {
            case *kafka.Message:
                if ev.TopicPartition.Error != nil {
                    fmt.Printf("Delivery failed: %v\n", ev.TopicPartition)
                } else {
                    fmt.Printf("Delivered message to %v\n", ev.TopicPartition)
                }
            }
        }
    }()

    // Produce messages to topic (asynchronously)
    topic := "test"
    for _, word := range []string{"test message"} {
        p.Produce(&kafka.Message{
            TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
            Value:          []byte(word),
        }, nil)
    }

    // Wait for message deliveries before shutting down
    p.Flush(15 * 1000)
}
I receive the message on my console consumer with no issues.
I then try to do the same thing, just using my remote kafka cluster topic (note I also tried without the ports in the strings):
p, err := kafka.NewProducer(&kafka.ConfigMap{"bootstrap.servers":"HOSTNAME.amazonaws.com:9092,HOSTNAME2.amazonaws.com:9092,HOSTNAME3.amazonaws.com:9092"})
It prints the following error:
Delivery failed: test[0]#end(Broker: Not enough in-sync replicas)
The console producer has no issues though:
./bin/kafka-console-producer.sh --broker-list HOSTNAME.amazonaws.com:9092,HOSTNAME2.amazonaws.com:9092,HOSTNAME3.amazonaws.com:9092 --topic test
>proving that this works
The console-consumer receives it:
bin/kafka-console-consumer.sh --bootstrap-server HOSTNAME.amazonaws.com:9092,HOSTNAME2.amazonaws.com:9092,HOSTNAME3.amazonaws.com:9092 --topic test --from-beginning
proving that this works
Last thing I did was check to see how many In-Sync replicas there were for that topic. If I am reading this correctly, the min should be 2 and there are 3.
./bin/kafka-topics.sh --describe --bootstrap-server HOSTNAME1.amazonaws.com:9092,HOSTNAME2.amazonaws.com:9092,HOSTNAME3.amazonaws.com:9092 --topic test
Topic:test PartitionCount:1 ReplicationFactor:1 Configs:min.insync.replicas=2,flush.ms=10000,segment.bytes=1073741824,retention.ms=86400000,flush.messages=9223372036854775807,max.message.bytes=1000012,min.cleanable.dirty.ratio=0.5,unclean.leader.election.enable=true,retention.bytes=-1,delete.retention.ms=86400000,segment.ms=604800000
Topic: test Partition: 0 Leader: 3 Replicas: 3 Isr: 3
Any ideas of what else I could look into?
You have min.insync.replicas=2, but the topic only has one replica.
If you have request.required.acks=all (which is the default), then the produce request will fail because the leader broker cannot replicate what you've produced to the minimum required number of in-sync replicas.
https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md#topic-configuration-properties
I believe the console producer sets that property to just 1.
"there are 3"
There's actually only one replica; the 3 is the broker ID. You would see three separate broker IDs listed under Isr if there were actually three replicas.
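Relating the answer above back to the client: a hedged sketch of the producer-side workaround, which lowers acks to match the console producer's behaviour. The broker list is the one from the question; the durable fix is still to re-create the topic with a replication factor of at least min.insync.replicas.

p, err := kafka.NewProducer(&kafka.ConfigMap{
    "bootstrap.servers": "HOSTNAME.amazonaws.com:9092,HOSTNAME2.amazonaws.com:9092,HOSTNAME3.amazonaws.com:9092",
    // "acks" (request.required.acks) set to 1 waits only for the partition leader,
    // so min.insync.replicas no longer blocks the write. This trades durability
    // for availability; prefer fixing the topic's replication factor.
    "acks": "1",
})
if err != nil {
    panic(err)
}
defer p.Close()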
Or if you're using AWS's MSK, this could arise when the EBS storage per broker is completely used for one of the broker and the possible way to overcome is to increase it's storage.

Kafka: client has run out of available brokers

UPDATE: It turned out I had an issue with my ports in Docker. Not sure why that fixed this phenomenon.
I believe I have come across a strange error. I am using the Sarama library and am able to create a consumer successfully.
func main() {
    config := sarama.NewConfig()
    config.ClientID = "go-kafka-consumer"
    config.Consumer.Return.Errors = true

    // Create new consumer
    master, err := sarama.NewConsumer([]string{"localhost:9092"}, config)
    if err != nil {
        panic(err)
    }
    defer func() {
        if err := master.Close(); err != nil {
            panic(err)
        }
    }()

    partitionConsumer, err := master.ConsumePartition("myTopic", 0, sarama.OffsetOldest)
    if err != nil {
        panic(err)
    }
    defer partitionConsumer.Close() // consume from partitionConsumer.Messages() here
}
As soon as I break this code up and move it outside the main routine, I run into the error:
kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
I have split my code up as follows: the previous main() method is now a consumer package with a function called NewConsumer(), and my new main() calls NewConsumer() like so:
c := consumer.NewConsumer()
The panic statement is triggered at the line with sarama.NewConsumer and prints out: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
Why would breaking up my code this way cause Sarama to fail to create the consumer? Does Sarama need to be run directly from main?
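For reference, a hypothetical reconstruction of the refactor described above (the package layout and return type are assumptions, not the asker's actual code):

// consumer/consumer.go
package consumer

import "github.com/Shopify/sarama"

// NewConsumer mirrors the original main(): same config, same broker address.
func NewConsumer() sarama.PartitionConsumer {
    config := sarama.NewConfig()
    config.ClientID = "go-kafka-consumer"
    config.Consumer.Return.Errors = true

    master, err := sarama.NewConsumer([]string{"localhost:9092"}, config)
    if err != nil {
        panic(err)
    }
    // Note: master is intentionally not closed here, since closing it would
    // also tear down the partition consumer returned to the caller.

    partitionConsumer, err := master.ConsumePartition("myTopic", 0, sarama.OffsetOldest)
    if err != nil {
        panic(err)
    }
    return partitionConsumer
}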
I think that this way you create 2 or more consumers that get grouped into a single consumer group (probably go-kafka-consumer). Your broker has a topic with 1 partition, so one member of the group gets the partition assigned and the other one produces this error message. If you raised the number of partitions of that topic to 2, the error would go away.
But I think your problem is that you have somehow instantiated more consumers than before.
From Kafka in a Nutshell:
Consumers can also be organized into consumer groups for a given topic — each consumer within the group reads from a unique partition and the group as a whole consumes all messages from the entire topic. If you have more consumers than partitions then some consumers will be idle because they have no partitions to read from. If you have more partitions than consumers then consumers will receive messages from multiple partitions. If you have equal numbers of consumers and partitions, each consumer reads messages in order from exactly one partition.
Idle consumers would not exactly produce an error like this, though, so that part would be an issue with Sarama.
