How to handle long-running tasks in a reconciliation loop - Go

I am writing a Kubernetes operator and dealing with a peculiar situation: handling long-running tasks from the reconcile loop. I have the following situation:
func (r *ProvMyAppReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
    // _ = context.Background()
    _ = r.Log.WithValues("Instance.Namespace", req.NamespacedName)
    // your logic here
    var i int32
    var yappMyAppSpex myAppingv1alpha1.MyAppSpec
    var result *ctrl.Result
    var err error
    var requeueFlag bool
    // <<<<< Instance prov logic >>>>>
    // =============================== Check Deletion TimeStamp ===============
    // Check if the MyApp instance is marked to be deleted, which is
    // indicated by the deletion timestamp being set.
    // <<<<< Deletion and finalizer logic is here >>>>>
    // ================================ MyApp Setup ============================
    if len(instance.Spec.MyApp) > 0 {
        for i = 0; i < int32(len(instance.Spec.MyApp)); i++ {
            yappMyAppSpex = instance.Spec.MyApp[i]
            if !yappMyAppSpex.setRemove {
                result, err = r.provisionStatefulSet(instance, prepareStatefulSetForMyApp(instance, yappMyAppSpex), "TYPE=TELEAPP")
                if err != nil {
                    return *result, err
                }
                // This call takes a lot of time, and while the operator is busy
                // running it in the foreground, any operation I do on the CR is
                // not picked up by K8s.
                _, err = r.addMyApp(yappMyAppSpex.Name)
                if err != nil {
                    requeueFlag = true
                }
            }
        }
    }
    if requeueFlag {
        return ctrl.Result{Requeue: true, RequeueAfter: 30 * time.Second}, nil
    }
    return ctrl.Result{}, nil
}
I am trying to understand the best way to handle the above situation. Do I need to use channels and run the long-running tasks in the background? The main issue is that I have to run some configuration that takes a lot of time, which causes the operator to miss other updates made to the CR.

The first thing I would recommend is making good use of the resource's status. It's important that consumers of the operator know that the operator has acknowledged the changes and is acting on them.
Then, I would recommend revisiting the API definitions: is this the right API to be using? That is, can you split this up into more than one controller (or API)?
A simple example: in a restaurant it's not a good idea if the cook is also the waiter, because you want customers to know they are being taken care of, but cooking could take a long time. It's better for customers to have a waiter that takes the order, marks the order status accordingly, and then hands it to the cook(s) to execute on (this could be another API used only between cook and waiter).
In your example above, you could add an API for a single app instance. The main controller would only be responsible for applying that API for each of the instances declared in the top-level API (which doesn't take long). The worker controller would respond to the creation of that single-app-instance API and execute on it. This would still take time for the worker, but visibility and UX would improve, as in the sketch below.
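To make the waiter/cook split concrete, here is a minimal sketch of the parent ("waiter") controller, assuming controller-runtime with the reconciler embedding client.Client; the MyAppInstance child API, the buildChildForSpec helper, and the Status.Phase field are hypothetical names, not part of the question's code:

import (
    "context"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// Parent controller: applies one small child CR per declared app and reports
// progress in status. The slow provisioning moves to a MyAppInstance controller.
func (r *ProvMyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var instance myappv1alpha1.ProvMyApp
    if err := r.Get(ctx, req.NamespacedName, &instance); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    for _, spec := range instance.Spec.MyApp {
        child := buildChildForSpec(&instance, spec) // hypothetical helper building one MyAppInstance
        if err := controllerutil.SetControllerReference(&instance, child, r.Scheme); err != nil {
            return ctrl.Result{}, err
        }
        // Creating a small child object is fast, so this loop returns quickly
        // and the operator stays responsive to CR updates.
        if err := r.Create(ctx, child); err != nil && !apierrors.IsAlreadyExists(err) {
            return ctrl.Result{}, err
        }
    }

    // Tell consumers of the CR that work is in progress instead of blocking here.
    instance.Status.Phase = "Provisioning" // hypothetical status field
    if err := r.Status().Update(ctx, &instance); err != nil {
        return ctrl.Result{}, err
    }
    return ctrl.Result{}, nil
}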

Related

sync.Map possibly leading to an increase in RAM and goroutines

Hi, here is the code for a util I wrote, called Collector:
import (
    "context"
    "errors"
    "sync"
    "time"
)

var errRequestTimeout = errors.New("request timed out")

type Collector struct {
    keyValMap *sync.Map
}

func (c *Collector) LoadOrWait(key any) (retValue any, availability int, err error) {
    value, status := c.getStatusAndValue(key)
    switch status {
    case 0: // key not present: the caller is expected to load the value and Store it
        return nil, 0, nil
    case 1: // value is available
        return value, 1, nil
    case 2: // another goroutine reserved the key: poll until the value appears or we time out
        ctxWithTimeout, _ := context.WithTimeout(context.Background(), 5*time.Second)
        for {
            select {
            case <-ctxWithTimeout.Done():
                return nil, 0, errRequestTimeout
            default:
                value, resourceStatus := c.getStatusAndValue(key)
                if resourceStatus == 1 {
                    return value, 1, nil
                }
                time.Sleep(50 * time.Millisecond)
            }
        }
    }
    return nil, 0, errRequestTimeout
}

// Store ...
func (c *Collector) Store(key any, value any) {
    c.keyValMap.Store(key, value)
}

func (c *Collector) getStatusAndValue(key any) (retValue any, availability int) {
    var empty any
    result, loaded := c.keyValMap.LoadOrStore(key, empty)
    if loaded && result != empty {
        return result, 1
    }
    if loaded && result == empty {
        return empty, 2
    }
    return nil, 0
}
The purpose of this util is to act as a cache where a value is loaded only once but read many times. However, when a Collector object is shared by multiple goroutines, I see an increase in goroutine count and RAM usage whenever several goroutines use the cache at once. Could someone explain whether this usage of sync.Map is correct? If it is, what might cause the high number of goroutines / high RAM usage?
For sure, you're facing possible memory leaks due to not calling the cancel func of the newly created ctxWithTimeout context. To fix this, change that line to:
ctxWithTimeout, cancelFunc := context.WithTimeout(context.Background(), 5*time.Second)
defer cancelFunc()
Thanks to this, you're always sure to clean up all the resources allocated once the context expires. This should address the issue of the leaks.
The usage of sync.Map seems fine to me.
Let me know if this solves your issue or if there is something else to address, thanks!
You show the code on the reader side of things, but not the code which performs the request (and calls .Store(key, value)).
With the code you display:
the first goroutine which tries to access a given key will store your empty value in the map (when executing c.keyValMap.LoadOrStore(key, empty)),
so all goroutines that come afterwards querying for the same key will enter the "query with timeout" loop, even if the action that actually runs the request and stores its result in the cache is never executed. The requester side would have to look like the sketch below.
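For illustration, a minimal sketch of that missing requester side (fetchThroughCache and expensiveLoad are hypothetical names):

// The first goroutine to ask for a key sees availability == 0 (it just
// reserved the key via LoadOrStore), so it must perform the expensive fetch
// and Store the result; otherwise every later goroutine spins in the timeout
// loop and eventually errors out.
func fetchThroughCache(c *Collector, key string) (any, error) {
    value, availability, err := c.LoadOrWait(key)
    if err != nil {
        return nil, err // timed out waiting for another goroutine's load
    }
    if availability == 1 {
        return value, nil // cache hit
    }
    // availability == 0: we won the reservation, so do the real work exactly once.
    result := expensiveLoad(key) // hypothetical slow operation
    c.Store(key, result)
    return result, nil
}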
[after your update]
The code for your collector alone seems to be OK regarding resource consumption: I don't see deadlocks or a multiplication of goroutines in that code alone.
You should probably look at other places in your code.
Also, if this structure only grows and never shrinks, it is bound to consume more and more memory. Do audit your program to evaluate how many different keys can live in your cache at the same time, and how much memory the cached values can occupy.

Refactoring Go: avoiding manually updating fields between similar structs

I'm using GraphQL and go-pg.
I have many entities like these:
type Player struct {
    ID        int
    CreatedAt time.Time `pg:"default:now(),notnull"`
    TeamID    int       `pg:",notnull"`
    Team      *Team
    Type      int
    Score     int64 `pg:",notnull"`
    Note      *string
    // and others...
}

type PlayerInput struct {
    TeamID int
    Type   int
    Score  int64
    Note   *string
    // and others...
}
And many functions like these:
func (db *postgres) Update(context context.Context, id int, input types.PlayerInput) (*types.Player, error) {
    var actualPlayer types.Player
    newPlayer := graphqlToDB(&input)
    tx, err := db.Begin()
    // handle err
    err = tx.Model(&actualPlayer).Where("id = ?", id).For("UPDATE").Select()
    // handle err and rollback
    actualPlayer.TeamID = newPlayer.TeamID
    actualPlayer.Type = newPlayer.Type
    actualPlayer.Score = newPlayer.Score
    actualPlayer.Note = newPlayer.Note
    // and others...
    _, err = tx.Model(&actualPlayer).WherePK().Update()
    // handle err and rollback
    err = tx.Commit()
    // handle err
    return &actualPlayer, nil
}

func graphqlToDB(input *types.PlayerInput) *types.Player {
    var output = &types.Player{
        TeamID: input.TeamID,
        Type:   input.Type,
        Score:  input.Score,
        Note:   input.Note,
        // and others...
    }
    if input.Type == 1 { // e.g. a special player type (Type is an int)
        output.Score = 10000000
    }
    return output
}
I have code like this for each entity in my project and I would like to limit/avoid the redundant code, especially:
the transformation from the GraphQL input type every time
newPlayer := graphqlToDB(&input)
the manual updating of these (and other) fields every time
actualPlayer.TeamID = newPlayer.TeamID
actualPlayer.Type = newPlayer.Type
actualPlayer.Score = newPlayer.Score
actualPlayer.Note = newPlayer.Note
the opening and closing of a DB transaction every time
tx, err := db.Begin()
Am I asking for the moon?
I don't think there's an abnormal amount of redundancy in this code.
the transformation from the GraphQL input type every time
Transforming structs from external to internal models is a common pattern and helps with separation of concerns. Furthermore, you already have the graphqlToDB function, which lets you reuse the ten lines of code in its body. That's probably as good as it can get.
the manual updating of these (and other) fields every time
In the specific piece of code you showed, actualPlayer is of type types.Player and the graphqlToDB function returns a *types.Player.
So you could simply write actualPlayer := graphqlToDB(&input) and then pass the pointer around, e.g. tx.Model(actualPlayer).
This saves remapping newPlayer onto actualPlayer; see the sketch below.
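One way to read that suggestion: since every listed field gets overwritten anyway, skip the SELECT ... FOR UPDATE and the field-by-field copy, set the primary key on the converted input, and issue the UPDATE directly. A minimal sketch in the question's go-pg style (this assumes the row-level lock is no longer needed once there is only a single UPDATE statement, and that every column in the UPDATE should come from the input):

func (db *postgres) Update(context context.Context, id int, input types.PlayerInput) (*types.Player, error) {
    actualPlayer := graphqlToDB(&input)
    actualPlayer.ID = id // WherePK uses the primary key on the model
    tx, err := db.Begin()
    // handle err
    _, err = tx.Model(actualPlayer).WherePK().Update()
    // handle err and rollback
    err = tx.Commit()
    // handle err
    return actualPlayer, nil
}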
opening and closing DB transaction every time
If you need to hit the DB transactionally every time, then you need to open a transaction every time (and then commit or roll back). There's no redundancy in this. Refactoring might just result in a loss of readability.

How to synchronize constant writing and periodically reading and updating

Defining the problem:
We have IoT devices which each send us logs about car locations. We want to compute the distance each car travels online, so whenever a log arrives (after putting it in a queue, etc.) we do this:
type Delta struct {
    DeviceId string
    time     int64
    Distance float64
}

var LastLogs = make(map[string]FullLog)
var Distances = make(map[string]Delta)

func addLastLog(l FullLog) {
    LastLogs[l.DeviceID] = l
}

func AddToLogPerDay(l FullLog) {
    //mutex.Lock()
    if val, ok := LastLogs[l.DeviceID]; ok {
        if distance, exist := Distances[l.DeviceID]; exist {
            x := computingDistance(val, l)
            Distances[l.DeviceID] = Delta{
                DeviceId: l.DeviceID,
                time:     distance.time + 1,
                Distance: distance.Distance + x,
            }
        } else {
            Distances[l.DeviceID] = Delta{
                DeviceId: l.DeviceID,
                time:     1,
                Distance: 0,
            }
        }
    }
    addLastLog(l)
}
This basically calculates the distance using a utility function, so in Distances each device ID is mapped to some distance traveled. Now here is where the problem starts: while these distances are being added to the Distances map, I want a goroutine to put this data in the database. But since there are many devices and many logs, doing this query for every log is not a good idea. So I need to do it every 5 seconds, meaning every 5 seconds try to empty the map of all the distances added to it so far. I wrote this function:
func UpdateLogPerDayTable() {
    for {
        for _, distance := range Distances {
            logs := model.HourPerDay{}
            result := services.CarDBProvider.DB.Table(model.HourPerDay{}.TableName()).
                Where("created_at > ? AND device_id = ?", getCurrentData(), distance.DeviceId).
                Find(&logs)
            if result.Error != nil && !result.RecordNotFound() {
                log.Infof("Something went wrong while checking the log: %v", result.Error)
            } else {
                if !result.RecordNotFound() {
                    logs.CountDistance = distance.Distance
                    logs.CountSecond = distance.time
                    err := services.CarDBProvider.DB.Model(&logs).
                        Update(map[string]interface{}{
                            "count_second":   logs.CountSecond,
                            "count_distance": logs.CountDistance,
                        })
                    if err.Error != nil {
                        log.Infof("Something went wrong while updating the log: %v", err.Error)
                    }
                } else if result.RecordNotFound() {
                    dayLog := model.HourPerDay{
                        Model:         gorm.Model{},
                        DeviceId:      distance.DeviceId,
                        CountSecond:   int64(distance.time),
                        CountDistance: distance.Distance,
                    }
                    err := services.CarDBProvider.DB.Create(&dayLog)
                    if err.Error != nil {
                        log.Infof("Something went wrong while adding the log: %v", err.Error)
                    }
                }
            }
        }
        time.Sleep(time.Second * 5)
    }
}
It is started with go utils.UpdateLogPerDayTable() on another goroutine. However, there are several problems here:
I don't know how to make Distances safe, so that when I add to it in one goroutine and read it somewhere else, everything is OK. (I want to use Go channels for this but have no idea how.)
How can I schedule tasks in Go for this problem?
I will probably add Redis to store all the devices that are online, so I can do the select query faster and just update the actual database, and also set an expiration time in Redis so that if a device doesn't send data for a while, it vanishes. Where should I put this code?
Sorry if my explanations weren't enough, but I really need some help, specifically with the code implementation.
Go has a really cool pattern: using for/select over multiple channels. This allows you to batch distance writes using both a timeout and a max record size. Using this pattern requires channels.
First, model your distances as a channel:
distances := make(chan Delta)
Then you can keep track of the current batch:
var deltas []Delta
Then:
ticker := time.NewTicker(time.Second * 5)
var deltas []Delta
for {
    select {
    case <-ticker.C:
        // 5 seconds are up: flush to the DB,
        // then reset deltas
    case d := <-distances:
        deltas = append(deltas, d)
        if len(deltas) >= maxDeltasPerFlush {
            // flush to the DB,
            // then reset deltas
        }
    }
}
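Putting the pieces together, a self-contained sketch (flushToDB is a hypothetical stand-in for the gorm update/create logic above, and maxDeltasPerFlush is an arbitrary cap):

package main

import (
    "fmt"
    "time"
)

type Delta struct {
    DeviceId string
    Time     int64
    Distance float64
}

const maxDeltasPerFlush = 100

// flushToDB stands in for the gorm upsert logic shown in the question.
func flushToDB(deltas []Delta) {
    fmt.Printf("flushing %d deltas\n", len(deltas))
}

// batchWriter is the single goroutine that owns the batch, so no mutex is needed.
func batchWriter(distances <-chan Delta) {
    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()

    var deltas []Delta
    for {
        select {
        case <-ticker.C: // time-based flush
            if len(deltas) > 0 {
                flushToDB(deltas)
                deltas = nil
            }
        case d := <-distances: // size-based flush
            deltas = append(deltas, d)
            if len(deltas) >= maxDeltasPerFlush {
                flushToDB(deltas)
                deltas = nil
            }
        }
    }
}

func main() {
    distances := make(chan Delta)
    go batchWriter(distances)
    // Producers (the log handlers) just send; no shared map, no locks.
    distances <- Delta{DeviceId: "dev-1", Time: 1, Distance: 12.5}
    time.Sleep(6 * time.Second) // let the ticker flush once in this demo
}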
I don't know how to make Distances safe, so that when I add to it in one goroutine and read it somewhere else, everything is OK. (I want to use Go channels for this but have no idea how.)
If you intend to keep a map and share memory, you need to protect it using mutual exclusion (a mutex) to synchronize access between goroutines. Using a channel instead lets you send a copy to the channel, removing the need to synchronize access to the Delta object. Depending on your architecture, you could also create a pipeline of goroutines connected by channels, so that only a single goroutine (a monitor goroutine) ever accesses the Deltas, which also removes the need for synchronization.
How can I schedule tasks in Go for this problem?
Use a channel as the primitive for how you pass Deltas between goroutines :)
I will probably add Redis to store all the devices that are online, so I can do the select query faster and just update the actual database, and also set an expiration time in Redis so that if a device doesn't send data for a while, it vanishes. Where should I put this code?
This depends on your final architecture. You could write a decorator for the select operation which checks Redis first and then goes to the DB; the caller of that function wouldn't have to know about it. Write operations can be done the same way: write to the persistent store, then write back to Redis with the cached value and the expiration. With decorators, the client never needs to know about any of this; it just performs reads and writes, and the cache logic lives inside the decorators. There are many ways to do this, and it largely depends on where your implementation settles; see the sketch below.
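As an illustration of the decorator idea, a minimal sketch (the Reader interface and cachedReader are hypothetical names; model.HourPerDay is the type from the question, and the Redis-backed implementation is left abstract):

// Reader is whatever can look up a day-log; the gorm-backed implementation
// and a hypothetical Redis-backed one both satisfy it.
type Reader interface {
    GetHourPerDay(deviceID string) (*model.HourPerDay, error)
}

// cachedReader decorates a DB reader with a cache lookup; callers only see Reader.
type cachedReader struct {
    cache Reader // e.g. backed by Redis, with a per-device expiration
    db    Reader // the database implementation from above
}

func (c *cachedReader) GetHourPerDay(deviceID string) (*model.HourPerDay, error) {
    if logs, err := c.cache.GetHourPerDay(deviceID); err == nil {
        return logs, nil // cache hit, no DB round trip
    }
    // Cache miss (or expired device): fall through to the database.
    return c.db.GetHourPerDay(deviceID)
}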

Should there be a new datastore.Client per HTTP request?

The official Go documentation on the datastore package (the client library for the GCP Datastore service) has the following code snippet for demonstration:
import (
    "context"
    "fmt"

    "cloud.google.com/go/datastore"
)

type Entity struct {
    Value string
}

func main() {
    ctx := context.Background()

    // Create a datastore client. In a typical application, you would create
    // a single client which is reused for every datastore operation.
    dsClient, err := datastore.NewClient(ctx, "my-project")
    if err != nil {
        // Handle error.
    }

    k := datastore.NameKey("Entity", "stringID", nil)
    e := new(Entity)
    if err := dsClient.Get(ctx, k, e); err != nil {
        // Handle error.
    }

    old := e.Value
    e.Value = "Hello World!"

    if _, err := dsClient.Put(ctx, k, e); err != nil {
        // Handle error.
    }

    fmt.Printf("Updated value from %q to %q\n", old, e.Value)
}
As one can see, it states that the datastore.Client should ideally be instantiated only once in an application. Now, given that the datastore.NewClient function requires a context.Context object, does that mean the client should be instantiated once per HTTP request, or can it safely be instantiated once globally with a context.Background() object?
Each operation requires a context.Context object again (e.g. dsClient.Get(ctx, k, e)), so is that the point where the HTTP request's context should be used?
I'm new to Go and can't really find online resources which explain something like this well, with real-world examples and actual best-practice patterns.
You may use any context.Context for the datastore client creation; context.Background() is completely fine. Client creation may be lengthy: it may require connecting to a remote server, authenticating, fetching configuration, etc. If your use case has limited time, you may pass a context with a timeout to abort the operation. Also, if creation takes longer than the time you have, you may use a context with cancel and abort the mission at your will. These are just options which you may or may not use, but the "tools" are given via context.Context.
Later, when you use the datastore.Client while serving (HTTP) client requests, using the request's context is reasonable: if a request gets cancelled, then so is its context, and so is the datastore operation you issue, rightfully, because if the client cannot see the result there's no point completing the query. By terminating the query early you avoid consuming certain resources (e.g. datastore reads), and you may lower the server's load (by aborting jobs whose results will never be sent back to the client). A sketch combining both points follows.
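A minimal sketch putting this together (reusing the Entity type from the snippet above; the startup timeout value is arbitrary):

package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "time"

    "cloud.google.com/go/datastore"
)

type Entity struct {
    Value string
}

var dsClient *datastore.Client // created once, shared by all handlers

func main() {
    // Client creation gets its own context; a timeout here only bounds
    // how long startup may take.
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    var err error
    dsClient, err = datastore.NewClient(ctx, "my-project")
    if err != nil {
        log.Fatal(err)
    }

    http.HandleFunc("/entity", func(w http.ResponseWriter, r *http.Request) {
        // Per-operation context: tied to the HTTP request, so the Get is
        // cancelled if the caller goes away.
        k := datastore.NameKey("Entity", "stringID", nil)
        e := new(Entity)
        if err := dsClient.Get(r.Context(), k, e); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        fmt.Fprintln(w, e.Value)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}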

In sync.Map is it necessary to use Load followed by LoadOrStore for complex values

In code where a global map with an expensive-to-generate value structure may be modified by multiple concurrent goroutines, which pattern is correct?
// equivalent to map[string]*activity where activity is a
// fairly heavyweight structure
var ipActivity sync.Map

// version 1: not safe with multiple threads, I think
func incrementIP(ip string) {
    val, ok := ipActivity.Load(ip)
    if !ok {
        val = buildComplexActivityObject()
        ipActivity.Store(ip, val)
    }
    updateTheActivity(val.(*activity), ip)
}

// version 2: inefficient, I think, because a complex object is built
// every time even though it's only needed the first time
func incrementIP(ip string) {
    tmp := buildComplexActivityObject()
    val, _ := ipActivity.LoadOrStore(ip, tmp)
    updateTheActivity(val.(*activity), ip)
}

// version 3: more complex, but technically correct?
func incrementIP(ip string) {
    val, found := ipActivity.Load(ip)
    if !found {
        tmp := buildComplexActivityObject()
        // use LoadOrStore in case the mapping was already made by
        // another goroutine's Store
        val, _ = ipActivity.LoadOrStore(ip, tmp)
    }
    updateTheActivity(val.(*activity), ip)
}
Is version three the correct pattern given Go's concurrency model?
Option 1 can obviously be called by multiple goroutines with a new ip concurrently, and only the last Store in the if block would win. The likelihood of this increases the longer buildComplexActivityObject takes, as there is more time in the critical section.
Option 2 works, but calls buildComplexActivityObject every time, which you state is not what you want.
Given that you want to call buildComplexActivityObject as infrequently as possible, the third option is the only one that makes sense.
The sync.Map, however, cannot protect the actual activity values referenced by the stored pointers. You also need synchronization there when updating the activity value, as in the sketch below.
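For that last point, a minimal sketch of per-value locking (the count field is assumed for illustration):

type activity struct {
    mu    sync.Mutex
    count int64
    // ... whatever else the heavyweight structure holds
}

func updateTheActivity(a *activity, ip string) {
    // The sync.Map only guards the map entry, not the value it points to,
    // so the value carries its own lock.
    a.mu.Lock()
    defer a.mu.Unlock()
    a.count++
}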
