How to synchronize constant writing and periodic reading and updating - go

Defining the problem:
We have IoT devices which each send us logs about car locations. We want to compute the distance each car travels online, so whenever a log comes in (after putting it in a queue etc.) we do this:
type Delta struct {
    DeviceId string
    time     int64
    Distance float64
}

var LastLogs = make(map[string]FullLog)
var Distances = make(map[string]Delta)

func addLastLog(l FullLog) {
    LastLogs[l.DeviceID] = l
}

func AddToLogPerDay(l FullLog) {
    //mutex.Lock()
    if val, ok := LastLogs[l.DeviceID]; ok {
        if distance, exist := Distances[l.DeviceID]; exist {
            x := computingDistance(val, l)
            Distances[l.DeviceID] = Delta{
                DeviceId: l.DeviceID,
                time:     distance.time + 1,
                Distance: distance.Distance + x,
            }
        } else {
            Distances[l.DeviceID] = Delta{
                DeviceId: l.DeviceID,
                time:     1,
                Distance: 0,
            }
        }
    }
    addLastLog(l)
}
which basically calculates the distance using a utility function, so in Distances each device ID is mapped to the distance traveled so far. Now here is where the problem starts: while these distances are being added to the Distances map, I want a goroutine to put this data in the database. But since there are many devices and many logs, doing a query for every log is not a good idea, so I need to do it every 5 seconds, meaning every 5 seconds try to flush all the distances that were added to the map. I wrote this function:
func UpdateLogPerDayTable() {
    for {
        for _, distance := range Distances {
            logs := model.HourPerDay{}
            result := services.CarDBProvider.DB.Table(model.HourPerDay{}.TableName()).
                Where("created_at > ? AND device_id = ?", getCurrentData(), distance.DeviceId).
                Find(&logs)
            if result.Error != nil && !result.RecordNotFound() {
                log.Infof("Something went wrong while checking the log: %v", result.Error)
            } else {
                if !result.RecordNotFound() {
                    logs.CountDistance = distance.Distance
                    logs.CountSecond = distance.time
                    err := services.CarDBProvider.DB.Model(&logs).
                        Update(map[string]interface{}{
                            "count_second":   logs.CountSecond,
                            "count_distance": logs.CountDistance,
                        })
                    if err.Error != nil {
                        log.Infof("Something went wrong while updating the log: %v", err.Error)
                    }
                } else if result.RecordNotFound() {
                    dayLog := model.HourPerDay{
                        Model:         gorm.Model{},
                        DeviceId:      distance.DeviceId,
                        CountSecond:   int64(distance.time),
                        CountDistance: distance.Distance,
                    }
                    err := services.CarDBProvider.DB.Create(&dayLog)
                    if err.Error != nil {
                        log.Infof("Something went wrong while adding the log: %v", err.Error)
                    }
                }
            }
        }
        time.Sleep(time.Second * 5)
    }
}
It is called as go utlis.UpdateLogPerDayTable() in another goroutine. However there are several problems here:
I don't know how to make Distances safe, so that when I add to it in one goroutine and read it in another, everything is OK. (I want to use Go channels for this but have no idea how to do it.)
How can I schedule tasks in Go for this problem?
I will probably add Redis to store all the devices that are online, so I can do the select query faster and only update the actual database, and give the Redis entries an expiry time so that if a device stops sending data for a while, it vanishes. Where should I put this code?
Sorry if my explanation wasn't enough, but I really need some help, specifically with the code implementation.

Go has a really cool pattern using for / select over multiple channels. This allows you to batch distance writes using both a timeout and a max record size. Using this pattern requires using channels.
First thing is to model your distances as a channel:
distances := make(chan Delta)
Then you can keep track of the current batch:
var deltas []Delta
Then select over both the ticker and the distances channel:
ticker := time.NewTicker(time.Second * 5)
var deltas []Delta
for {
    select {
    case <-ticker.C:
        // 5 seconds are up: flush to db
        // reset deltas
    case d := <-distances:
        deltas = append(deltas, d)
        if len(deltas) >= maxDeltasPerFlush {
            // flush
            // reset deltas
        }
    }
}
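Putting the pieces together, here is a minimal runnable sketch of that batching loop; flushToDB is a hypothetical stand-in for the gorm update/create logic from the question:
package main

import (
    "fmt"
    "time"
)

type Delta struct {
    DeviceId string
    time     int64
    Distance float64
}

const maxDeltasPerFlush = 100

// flushToDB is a hypothetical stand-in for the gorm update/create logic.
func flushToDB(deltas []Delta) {
    fmt.Printf("flushing %d deltas\n", len(deltas))
}

// batchWriter drains the distances channel and flushes either every
// 5 seconds or as soon as the batch reaches maxDeltasPerFlush.
func batchWriter(distances <-chan Delta) {
    ticker := time.NewTicker(time.Second * 5)
    defer ticker.Stop()
    var deltas []Delta
    flush := func() {
        if len(deltas) == 0 {
            return // nothing to write this round
        }
        flushToDB(deltas)
        deltas = nil // reset the batch
    }
    for {
        select {
        case <-ticker.C: // 5 seconds are up
            flush()
        case d := <-distances:
            deltas = append(deltas, d)
            if len(deltas) >= maxDeltasPerFlush {
                flush()
            }
        }
    }
}

func main() {
    distances := make(chan Delta)
    go batchWriter(distances)
    // AddToLogPerDay would send here instead of writing to a shared map:
    distances <- Delta{DeviceId: "dev-1", time: 1, Distance: 12.5}
    time.Sleep(6 * time.Second) // give the ticker a chance to flush once
}
Because only batchWriter ever touches deltas, no mutex is needed; AddToLogPerDay would simply send its computed Delta on the channel instead of writing to a shared map.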
I don't know how to make Distances safe, so that when I add to it in
one goroutine and read it in another, everything is OK. (I want to use
Go channels for this but have no idea how to do it.)
If you intend to keep a map and share memory, you need to protect it with a mutex to synchronize access between goroutines. Using a channel instead lets you send a copy of each Delta, removing the need to synchronize access to the Delta object. Depending on your architecture you could also create a pipeline of goroutines connected by channels, so that only a single (monitor) goroutine accesses the Deltas, which also removes the need for synchronization.
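For instance, a minimal sketch of the mutex route applied to the question's maps (writeToDB is a hypothetical stand-in for the gorm logic; the channel approach above avoids this locking entirely):
var mu sync.Mutex

func AddToLogPerDay(l FullLog) {
    mu.Lock()
    defer mu.Unlock()
    // ... the same map logic as in the question ...
}

func UpdateLogPerDayTable() {
    for range time.Tick(time.Second * 5) {
        mu.Lock()
        snapshot := Distances
        Distances = make(map[string]Delta) // start a fresh batch
        mu.Unlock()
        writeToDB(snapshot) // do the slow DB work outside the lock
    }
}
Swapping the map out under the lock keeps the critical section tiny, so log ingestion is never blocked on the database.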
How can I schedule tasks in Go for this problem?
By using a channel as the primitive for how you pass Deltas between goroutines :)
I will probably add Redis to store all the devices that are online, so
I can do the select query faster and only update the actual database,
and give the Redis entries an expiry time so that if a device stops
sending data for a while, it vanishes. Where should I put this code?
This depends on your finished architecture. You could write a decorator for the select operation which checks Redis first and then falls back to the DB; the caller of this function wouldn't have to know about it. Write operations could be done the same way: write to the persistent store, then write back to Redis with the cached value and the expiration. With decorators the client wouldn't need to know about any of this; it would just perform the reads and writes, and the cache logic would live inside the decorators. There are many ways to do this, and it largely depends on where your implementation settles.
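As an illustration, a hedged sketch of the decorator idea; Store and expiringCache are hypothetical interfaces, not part of any library used above:
// Store is the persistent-store interface the rest of the code depends on.
type Store interface {
    Get(deviceID string) (Delta, bool)
    Put(d Delta)
}

// expiringCache is where a Redis client (with TTL support) would sit.
type expiringCache interface {
    Get(key string) (Delta, bool)
    SetWithTTL(key string, d Delta, ttl time.Duration)
}

// cachedStore decorates a persistent Store with a cache.
type cachedStore struct {
    cache expiringCache
    db    Store
    ttl   time.Duration
}

func (c *cachedStore) Get(deviceID string) (Delta, bool) {
    if d, ok := c.cache.Get(deviceID); ok {
        return d, true // cache hit: skip the DB entirely
    }
    d, ok := c.db.Get(deviceID)
    if ok {
        c.cache.SetWithTTL(deviceID, d, c.ttl) // warm the cache for next time
    }
    return d, ok
}

func (c *cachedStore) Put(d Delta) {
    c.db.Put(d)                              // write through to the persistent store
    c.cache.SetWithTTL(d.DeviceId, d, c.ttl) // refresh the cached copy with its expiration
}
The Redis client and its expiry handling live behind expiringCache, so callers never have to know whether a read was served from the cache or the database.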

Related

On which step can a goroutine be interrupted

I am writing some asynchronous code in Go which basically implements in-memory caching. I have a not very fast source which I query every minute (using a ticker) and save the result into a cache struct field. This field can be queried from different goroutines asynchronously.
In order to avoid using mutexes when updating values from the source, I do not write to the same struct field which is being queried by other goroutines, but create another variable, fill it and then assign it to the queried field. This works fine since the assignment operation is atomic and no race occurs.
The code looks like the following:
// this fires up when cache is created
func (f *FeaturesCache) goStartUpdaterDaemon(ctx context.Context) {
    go func() {
        defer kiterrors.RecoverFunc(ctx, f.logger(ctx))
        ticker := time.NewTicker(updateFeaturesPeriod) // every minute
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                f.refill(ctx)
            case <-ctx.Done():
                return
            }
        }
    }()
}

func (f *FeaturesCache) refill(ctx context.Context) {
    var newSources map[string]FeatureData
    // some querying and processing logic
    // save new data for future queries
    f.features = newSources
}
Now I need to add another view of my data so I can also get it from the cache. Basically that means adding one more struct field which will be queried and filled the same way the previous one (features) was.
I need these 2 views of my data to be in sync, so it is undesirable to have, for example, new data in view 2 and old data in view 1, or the other way round.
So the only thing I need to change in refill is to add the new field. At first I did it this way:
func (f *FeaturesCache) refill(ctx context.Context) {
    var newSources map[string]FeatureData
    var anotherView map[string]DataView2
    // some querying and processing logic
    // save new data for future queries
    f.features = newSources     // line A
    f.anotherView = anotherView // line B
}
However, I'm wondering whether this code satisfies my consistency requirements. I am worried that if the scheduler decides to interrupt the goroutine which runs refill between lines A and B (check the code above), then I might get inconsistency between the data views.
So I researched the problem. Many sources on the Internet say that the scheduler switches goroutines on syscalls and function calls. However, according to this answer https://stackoverflow.com/a/64113553/12702274, since Go 1.14 there is an asynchronous preemption mechanism in the Go scheduler which switches goroutines based on their running time, in addition to the previously checked signals. That makes me think it is actually possible for the refill goroutine to be interrupted between lines A and B.
Then I thought about surrounding those 2 assignments with a mutex: lock before line A, unlock after line B. However, it seems to me that this doesn't change much. The goroutine may still be interrupted between lines A and B and the data gets inconsistent. The only thing the mutex achieves here is that 2 simultaneous refills do not conflict with each other, which is actually impossible anyway, because I run them in the same thread as the timer. Thus it is useless here.
So, is there any way I can ensure atomicity for two consecutive assignments?
If I understand your concern correctly, you don't want to lock the existing cached data while updating it (because updating takes time, and you want to allow use of the existing cached data while it is being updated in another goroutine, right?).
Also you want to make the f.features and f.anotherView updates atomic.
What about keeping your data in a map[int8]map[string]FeatureData and a map[int8]map[string]DataView2? Put the data at a new key each time and serve queries from that key (newSearchIndex).
I just tried to explain it roughly in code (think of the below as pseudocode):
type FeaturesCache struct {
    mu             sync.RWMutex
    features       map[int8]map[string]FeatureData
    anotherView    map[int8]map[string]DataView2
    oldSearchIndex int8
    newSearchIndex int8
}

func (f *FeaturesCache) CreateNewIndex() int8 {
    f.mu.Lock()
    defer f.mu.Unlock()
    return (f.newSearchIndex + 1) % 16 // mod 16 could be changed per your refill rate
}

func (f *FeaturesCache) SetNewIndex(newIndex int8) {
    f.mu.Lock()
    defer f.mu.Unlock()
    f.oldSearchIndex = f.newSearchIndex
    f.newSearchIndex = newIndex
}

func (f *FeaturesCache) refill(ctx context.Context) {
    var newSources map[string]FeatureData
    var anotherView map[string]DataView2
    // some querying and processing logic

    // save new data for future queries under a fresh, not-yet-visible index;
    // the map writes still need the lock so they don't race with readers
    newSearchIndex := f.CreateNewIndex()
    f.mu.Lock()
    f.features[newSearchIndex] = newSources
    f.anotherView[newSearchIndex] = anotherView
    f.mu.Unlock()

    // let queries see the new cached data by flipping the search index
    f.SetNewIndex(newSearchIndex)

    // drop the previous generation
    f.mu.Lock()
    f.features[f.oldSearchIndex] = nil
    f.anotherView[f.oldSearchIndex] = nil
    f.mu.Unlock()
}
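For completeness, a sketch of the matching read path; the method name and signature are my assumptions, not part of the question:
// Get resolves the current search index once and reads both views
// under a single read lock, so a caller can never see view 1 and
// view 2 from different generations.
func (f *FeaturesCache) Get(key string) (FeatureData, DataView2, bool) {
    f.mu.RLock()
    defer f.mu.RUnlock()
    idx := f.newSearchIndex
    d, ok := f.features[idx][key]
    v := f.anotherView[idx][key]
    return d, v, ok
}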

How to handle long running tasks in reconciliation loop

I am writing a Kubernetes operator and dealing with a peculiar situation of handling long-running tasks from the reconcile loop.
I have the following situation:
func (r *ProvMyAppReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
    // _ = context.Background()
    _ = r.Log.WithValues("Instance.Namespace", req.NamespacedName)
    // your logic here
    var i int32
    var j int32
    var yappMyAppSpex myAppingv1alpha1.MyAppSpec
    var result *ctrl.Result
    var msg string
    var requeueFlag bool
    runFlag = false

    // <<<<< Instance prov logic >>>>>

    // =============================== Check Deletion TimeStamp ========
    // Check if the MyAppOMyApp instance is marked to be deleted, which is
    // indicated by the deletion timestamp being set.
    // <<<<<<< Deletion and finalizer logic is here >>>>>>>

    // ================================ MyApp Setup ===================
    if len(instance.Spec.MyApp) > 0 {
        for i = 0; i < int32(len(instance.Spec.MyApp)); i++ {
            yappMyAppSpex = instance.Spec.MyApp[i]
            if !yappMyAppSpex.setRemove {
                result, err = r.provisionStatefulSet(instance, prepareStatefulSetForMyApp(instance, yappMyAppSpex), "TYPE=TELEAPP")
                if err != nil {
                    return *result, err
                } else {
                    // This takes a lot of time, and if I do any operation on a CR
                    // resource it is not captured by K8s, because the operator is
                    // busy running this job in the foreground.
                    _, err = r.addMyApp(yappMyAppSpex.Name)
                    if err != nil {
                        requeueFlag = true
                    }
                }
            }
        }
    }
    if runFlag {
        return ctrl.Result{Requeue: true, RequeueAfter: 30 * time.Second}, nil
    }
    return ctrl.Result{}, nil
}
I am trying to understand what is the best way to handle the above situation. Do I need to use channels and run the work in the background? The main issue here is that I have to run some configuration that takes a lot of time, which causes the K8s operator not to handle other updates that are made to the CR.
The first thing I would recommend is making good use of the resource status. It's important that consumers of the operator know that the operator has acknowledged the changes and is acting on it.
Then, I would recommend revisiting the API definitions - is this the right API to be using? i.e. can you split this up into more than one controller (or API)?
A simple example: in a restaurant it's not a good idea if the cook is also the waiter because you want customers to know they are being taken care of but cooking could take a long time. It's better for customers to have a waiter that takes the order, marks the order status accordingly, and then hands it to the cook(s) to execute on (this could be another API only used between cook and waiter).
In your example above you could add an API for a single app instance. The main controller would only be responsible for applying that API for each of the instances declared in the top level API (which doesn't take long). The worker controller would respond to the creation of that single-app-instance API and execute on it. This still would take time for the worker but visibility and UX would improve.
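As a hedged sketch of the "waiter" half, assuming a Status.Phase field on your CRD (the field name is an assumption; r.Status().Update is the controller-runtime status writer):
// Acknowledge the work in the status and return quickly instead of
// blocking the reconcile loop on the long-running job.
instance.Status.Phase = "Provisioning" // hypothetical status field
if err := r.Status().Update(context.TODO(), instance); err != nil {
    return ctrl.Result{}, err
}
// Hand the slow work to the worker controller (or a background worker)
// and check on it again later.
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil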

Golang: remove structs older than 1h from slice

So I'm building a small utility that listens on a socket and stores incoming messages as structs in a slice:
var points []Point

type Point struct {
    time time.Time
    x    float64
    y    float64
}

func main() {
    points = make([]Point, 0)
    l, err := net.Listen("tcp4", ":8900")
    // (...)
}

func processIncomingData(data string) {
    // Parse incoming data that comes as: xValue,yValue
    inData := strings.Split(data, ",")
    x, err := strconv.ParseFloat(inData[0], 64)
    if err != nil {
        fmt.Println(err)
    }
    y, err := strconv.ParseFloat(inData[1], 64)
    if err != nil {
        fmt.Println(err)
    }
    // Store the new Point
    points = append(points, Point{
        time: time.Now(),
        x:    x,
        y:    y,
    })
    // Remove points older than 1h ?
}
Now, as you might imagine, this will quickly fill my RAM. What's the best (fastest-executing) way to remove points older than 1h after appending each new one? I'll be getting new points 10-15 times per second.
Thank you.
An approach I've used several times is to start a goroutine early in the project that looks something like this:
go cleanup()
...

func cleanup() {
    for {
        time.Sleep(...)
        // do cleanup
    }
}
Then what you could do is iterate over points using time.Since(point.time) to figure out how old each piece of data is. If it's too old, there's a slice trick to remove an item from a slice given its position:
points = append(points[:i], points[i+1:]...)
(where i is the index to remove)
Because the points are in the slice in order of the time they were added, you could speed things up by simply finding the first index that isn't an hour old and doing points = points[i:] to chop off the old points from the beginning of the slice.
You may run into problems if a request accesses the slice while you're cleaning it up. Adding a sync.Mutex can help with that: lock the mutex before the cleanup and also lock it anywhere else you write to the slice. This may be premature optimization, though; I'd experiment without the mutex before adding it, as this would effectively make interacting with points a serial operation and slow the service down.
The time.Sleep(...) in the loop is to prevent from cleaning too often. You might be tempted to set it to an hour since you want to delete points older than that but you might end up with a situation where a point is added immediately after a cleanup. On the next cleanup it'll be 59 mins old and you don't delete it, on the NEXT cleanup it's nearly 2 hours old. My rule of thumb is that I attempt to clean up every 1/10 the amount of time I want to allow an object to stay in memory but that's rather arbitrary. This approach means an object could be at most 1h 5m 59s old when it's deleted.
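A minimal sketch combining these ideas, reusing the points slice and Point struct from the question:
var (
    mu     sync.Mutex
    points []Point
)

func cleanup() {
    for {
        // ~1/10 of the 1h retention window, per the rule of thumb above
        time.Sleep(6 * time.Minute)
        cutoff := time.Now().Add(-time.Hour)
        mu.Lock()
        // points are appended in time order, so find the first point
        // that is still young enough and chop off everything before it
        i := 0
        for i < len(points) && points[i].time.Before(cutoff) {
            i++
        }
        points = points[i:]
        mu.Unlock()
    }
}
processIncomingData would lock the same mutex around its append. Note that points = points[i:] keeps the old backing array alive; copying the survivors into a fresh slice instead lets the garbage collector reclaim it.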

How to optimize a large recursive task concurrently

I have a cron task to perform in the best way possible in Go.
I need to store big data from a web service, in JSON, in a sellers table.
After saving these sellers in a database, I need to browse another large JSON web service, with a sellersID parameter, to save the results to another table named customers.
Each customer has an initial state; if this state has changed compared to the data from web service n°2, I need to store the difference in another table, changes, to keep a history of changes.
Finally, if the change matches our conditions, I perform another task.
My current operation
var wg sync.WaitGroup
action.FetchSellers() // fetch large JSON and store in sellers table ~2min
sellers := action.ListSellers()
for _, s := range sellers {
    wg.Add(1)
    go action.FetchCustomers(&wg, s) // fetch multiple large JSONs, store in customers table, store notify... ~20sec
}
wg.Wait()
The first difficulty with this code is that I do not control the number of calls to the web service.
The second is that the action.FetchCustomers function does a lot of work that I think could be done concurrently.
The third difficulty is that I cannot resume where an error occurred, in case of errors.
I need to run this code every hour, so it needs to be well built; currently it works, but not in the best way.
I am considering using worker pools in Go, as in this example Go by Example: Worker Pools, but I have trouble conceiving it.
Not to be a jerk! But I would use a queue for this kind of thing. I have already created a library for this and I use it: github.com/AnikHasibul/queue
// Limit the max
maximumJobLimit := 50
// Open a new queue with the limit
q := queue.New(maximumJobLimit)
defer q.Close()

// simulate a large amount of jobs
for i := 0; i != 1000; i++ {
    // Add a job to the queue
    q.Add()
    // Run your long long long job here in a goroutine
    go func(c int) {
        // Must call Done() after finishing the job
        defer q.Done()
        time.Sleep(time.Second)
        fmt.Println(c)
    }(i)
}
// wait for the end of all the jobs
q.Wait()
// Done!
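If you would rather stay with the standard library, here is a hedged worker-pool sketch in the spirit of the Go by Example link; the Seller type is assumed, and FetchCustomers is assumed to take the seller directly and return an error instead of taking the WaitGroup:
sellers := action.ListSellers()
jobs := make(chan Seller)
var wg sync.WaitGroup

const numWorkers = 10 // caps concurrent calls to the web service
for w := 0; w < numWorkers; w++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for s := range jobs {
            if err := action.FetchCustomers(s); err != nil {
                log.Printf("fetch customers for seller %v: %v", s, err) // record it so the run can resume later
            }
        }
    }()
}
for _, s := range sellers {
    jobs <- s
}
close(jobs) // no more work: workers drain the channel and exit
wg.Wait()
The fixed number of workers is what bounds the concurrent web service calls, which addresses the first difficulty; logging (or collecting) per-seller errors addresses the third.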

In sync.Map is it necessary to use Load followed by LoadOrStore for complex values

In code where a global map with an expensive-to-generate value structure may be modified by multiple concurrent threads, which pattern is correct?
// equivalent to map[string]*activity where activity is a
// fairly heavyweight structure
var ipActivity sync.Map

// version 1: not safe with multiple threads, I think
func incrementIP(ip string) {
    val, ok := ipActivity.Load(ip)
    if !ok {
        val = buildComplexActivityObject()
        ipActivity.Store(ip, val)
    }
    updateTheActivityObject(val.(*activity), ip)
}

// version 2: inefficient, I think, because a complex object is built
// every time even though it's only needed the first time
func incrementIP(ip string) {
    tmp := buildComplexActivityObject()
    val, _ := ipActivity.LoadOrStore(ip, tmp)
    updateTheActivity(val.(*activity), ip)
}

// version 3: more complex but technically correct?
func incrementIP(ip string) {
    val, found := ipActivity.Load(ip)
    if !found {
        tmp := buildComplexActivityObject()
        // using LoadOrStore in case the mapping was already made
        // by another goroutine's Store
        val, _ = ipActivity.LoadOrStore(ip, tmp)
    }
    updateTheActivity(val.(*activity), ip)
}
Is version three the correct pattern given Go's concurrency model?
Option 1 obviously can be called by multiple goroutines with a new ip concurrently, and only the last one in the if block would get stored. This possibility is greatly increased the longer buildComplexActivityObject takes, as there is more time in the critical section.
Option 2 works, but calls buildComplexActivityObject every time, which you state is not what you want.
Given that you want to call buildComplexActivityObject as infrequently as possible, the third option is the only one that makes sense.
The sync.Map however cannot protect the actual activity values referenced by the stored pointers. You also need synchronization there when updating the activity value.
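For example, a minimal sketch of that last point, assuming a hypothetical counter field inside activity:
// Each activity carries its own mutex, so concurrent updateTheActivity
// calls on the same value are safe, independently of the sync.Map that
// stores the pointers.
type activity struct {
    mu    sync.Mutex
    count int64
    // ... the rest of the heavyweight structure
}

func updateTheActivity(a *activity, ip string) {
    a.mu.Lock()
    defer a.mu.Unlock()
    a.count++ // stands in for whatever the real update does
    _ = ip
}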
