Datastore transaction - hitting entity write limit - go

The Problem
Using the Go cloud.google.com/go/datastore package, I create a transaction, perform a series of GetMulti and PutMulti calls, and on committing the transaction I'm confronted with an entity write limit error.
2021/12/22 09:07:18 err: rpc error: code = InvalidArgument desc = cannot write more than 500 entities in a single call
The Question
My question is how do you create a transaction with more than 500 writes?
While I want my operation to remain atomic, I can't seem to get around this write limit for a transaction, and the same set of writes runs just fine when I test against the emulator, writing in batches of 500.
What I've Tried
Please excuse the pseudocode, but I'm trying to convey the gist of what I've done.
All in one
transaction, err := datastoreClient.NewTransaction(ctx)
_, err = transaction.PutMulti(allKeys, allEntities)
_, err = transaction.Commit()
// err: too many entities written in a single call
Batched in an attempt to avoid the write limit
transaction, err := datastoreClient.NewTransaction(ctx)
_, err = transaction.PutMulti(first500Keys, first500Entities)
_, err = transaction.PutMulti(second500Keys, second500Entities)
_, err = transaction.Commit()
// err: too many entities written in a single call
A plain, non-transactional PutMulti of everything at once also fails:
datastoreClient.PutMulti(ctx, allKeys, allEntities)
// err: too many entities written in a single call
What Works
Non-atomic write to the datastore
datastoreClient.PutMulti(ctx, first500Keys, first500Entities)
datastoreClient.PutMulti(ctx, second500Keys, second500Entities)
Here's the real code that I used for the writes, either as a batched transaction or as regular PutMulti calls:
for i := 0; i*500 < len(allKeys); i++ {
    max := (i + 1) * 500
    if len(allKeys) < max {
        max = len(allKeys)
    }
    _, err = svc.dsClient.PutMulti(ctx, allKeys[i*500:max], allEntities[i*500:max])
    if err != nil {
        return
    }
}
Where I'm Lost
So, in an effort to keep my work atomic, is there any way to commit a transaction that writes more than 500 entities?

There is nothing you can do about it. The limit is enforced by the platform to ensure scalability and to prevent performance degradation: you cannot write more than 500 entities in a single transaction.
The limit could conceivably be changed on Google's side, but there is nothing you can do on yours.
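If atomicity has to be given up, the batched writes from the question can be wrapped in a small helper. This is only a minimal sketch: Entity stands in for whatever element type allEntities holds, and putAllInChunks is a hypothetical name; the chunking mirrors the loop shown in the question.
import (
    "context"

    "cloud.google.com/go/datastore"
)

// putAllInChunks writes keys/entities in chunks of at most 500 entities,
// the per-call limit. Note that this is NOT atomic: chunks that were already
// written stay written if a later chunk fails.
func putAllInChunks(ctx context.Context, client *datastore.Client, keys []*datastore.Key, entities []Entity) error {
    const chunkSize = 500
    for start := 0; start < len(keys); start += chunkSize {
        end := start + chunkSize
        if end > len(keys) {
            end = len(keys)
        }
        if _, err := client.PutMulti(ctx, keys[start:end], entities[start:end]); err != nil {
            return err
        }
    }
    return nil
}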

Related

Go Gorm Atomic Update to Increment Counter

I have what is essentially a counter that users can increment.
However, I want to avoid the race condition of two users incrementing the counter at once.
Is there a way to atomically increment a counter using Gorm as opposed to fetching the value from the database, incrementing, and finally updating the database?
If you want to use the basic ORM features, you can use FOR UPDATE as a query option when retrieving the record; the database will then lock the record for that specific connection until that connection issues an UPDATE query to change it.
Both the SELECT and UPDATE statements must happen on the same connection, which means you need to wrap them in a transaction (otherwise Go may send the second query over a different connection).
Please note that this will make every other connection that wants to SELECT the same record wait until you've done the UPDATE. That is not an issue for most applications, but if you either have very high concurrency or the time between SELECT ... FOR UPDATE and the UPDATE after that is long, this may not be for you.
In addition to FOR UPDATE, the FOR SHARE option sounds like it could also work for you, with less lock contention (but I don't know it well enough to say this for sure).
Note: This assumes you use an RDBMS that supports SELECT ... FOR UPDATE; if it doesn't, please update the question to tell us which RDBMS you are using.
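For illustration, here is a minimal sketch of that approach using GORM v2's locking clause; the Counter model and the increment function are assumptions made for the example.
import (
    "gorm.io/gorm"
    "gorm.io/gorm/clause"
)

type Counter struct {
    ID      uint
    Counter int64
}

func increment(db *gorm.DB, id uint) error {
    return db.Transaction(func(tx *gorm.DB) error {
        var c Counter
        // SELECT ... FOR UPDATE: the row stays locked until the transaction ends.
        if err := tx.Clauses(clause.Locking{Strength: "UPDATE"}).First(&c, id).Error; err != nil {
            return err
        }
        // No other connection can update the row until we commit, so this is safe.
        return tx.Model(&c).Update("counter", c.Counter+1).Error
    })
}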
Another option is to just go around the ORM and do db.Exec("UPDATE counter_table SET counter = counter + 1 WHERE id = ?", 42) (though see https://stackoverflow.com/a/29945125/1073170 for some pitfalls).
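That raw UPDATE can also be expressed through the ORM itself; a sketch assuming the same hypothetical Counter model mapped to counter_table:
// The increment happens inside the database, so no prior SELECT is needed.
err := db.Model(&Counter{}).
    Where("id = ?", 42).
    UpdateColumn("counter", gorm.Expr("counter + ?", 1)).Error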
A possible solution is to use GORM transactions (https://gorm.io/docs/transactions.html).
err := db.Transaction(func(tx *gorm.DB) error {
    // Get the model if it exists
    var feature models.Feature
    if err := tx.Where("id = ?", c.Param("id")).First(&feature).Error; err != nil {
        return err
    }
    // Increment the counter
    if err := tx.Model(&feature).Update("Counter", feature.Counter+1).Error; err != nil {
        return err
    }
    return nil
})
if err != nil {
    c.Status(http.StatusInternalServerError)
    return
}
c.Status(http.StatusOK)

How to synchronize constant writing and periodically reading and updating

Defining the problem:
We have IoT devices which each send us logs about car locations. We want to compute the distance each car travels online, so whenever a log comes in (after putting it in a queue, etc.) we do this:
type Delta struct {
    DeviceId string
    time     int64
    Distance float64
}

var LastLogs = make(map[string]FullLog)
var Distances = make(map[string]Delta)

func addLastLog(l FullLog) {
    LastLogs[l.DeviceID] = l
}

func AddToLogPerDay(l FullLog) {
    //mutex.Lock()
    if val, ok := LastLogs[l.DeviceID]; ok {
        if distance, exist := Distances[l.DeviceID]; exist {
            x := computingDistance(val, l)
            Distances[l.DeviceID] = Delta{
                DeviceId: l.DeviceID,
                time:     distance.time + 1,
                Distance: distance.Distance + x,
            }
        } else {
            Distances[l.DeviceID] = Delta{
                DeviceId: l.DeviceID,
                time:     1,
                Distance: 0,
            }
        }
    }
    addLastLog(l)
}
This basically calculates the distance using a utility function, so in Distances each device ID is mapped to some distance travelled. Now here is where the problem starts: while these distances are being added to the Distances map, I want a goroutine to put this data in the database. Since there are many devices and many logs, running a query for every log is not a good idea, so I need to do this every 5 seconds, i.e. every 5 seconds flush all the distances that were added to the map. I wrote this function:
func UpdateLogPerDayTable() {
    for {
        for _, distance := range Distances {
            logs := model.HourPerDay{}
            result := services.CarDBProvider.DB.Table(model.HourPerDay{}.TableName()).
                Where("created_at > ? AND device_id = ?", getCurrentData(), distance.DeviceId).
                Find(&logs)
            if result.Error != nil && !result.RecordNotFound() {
                log.Infof("Something went wrong while checking the log: %v", result.Error)
            } else {
                if !result.RecordNotFound() {
                    logs.CountDistance = distance.Distance
                    logs.CountSecond = distance.time
                    err := services.CarDBProvider.DB.Model(&logs).
                        Update(map[string]interface{}{
                            "count_second":   logs.CountSecond,
                            "count_distance": logs.CountDistance,
                        })
                    if err.Error != nil {
                        log.Infof("Something went wrong while updating the log: %v", err.Error)
                    }
                } else if result.RecordNotFound() {
                    dayLog := model.HourPerDay{
                        Model:         gorm.Model{},
                        DeviceId:      distance.DeviceId,
                        CountSecond:   int64(distance.time),
                        CountDistance: distance.Distance,
                    }
                    err := services.CarDBProvider.DB.Create(&dayLog)
                    if err.Error != nil {
                        log.Infof("Something went wrong while adding the log: %v", err.Error)
                    }
                }
            }
        }
        time.Sleep(time.Second * 5)
    }
}
It is called with go utlis.UpdateLogPerDayTable() on another goroutine. However, there are several problems here:
I don't know how to make access to Distances safe, so that adding to it in one goroutine and reading it in another is OK. (I want to use Go channels for this but have no idea how.)
How can I schedule tasks in Go for this problem?
I will probably add Redis to store all the devices that are online, so I can do the select query faster and only update the actual database, with an expire time in Redis so that a device vanishes if it hasn't sent data for some time. Where should I put this code?
Sorry if my explanations weren't enough, but I really need some help, specifically with the code implementation.
Go has a really cool pattern using for / select over multiple channels. This allows you to batch distance writes using both a timeout and a max record size. Using this pattern requires using channels.
First thing is to model your distances as a channel:
distances := make(chan Delta)
Then you can keep track of the current batch:
var deltas []Delta
Then
ticker := time.NewTicker(time.Second * 5)
var deltas []Delta
for {
    select {
    case <-ticker.C:
        // 5 seconds up, flush to db
        // reset deltas
    case d := <-distances:
        deltas = append(deltas, d)
        if len(deltas) >= maxDeltasPerFlush {
            // flush
            // reset deltas
        }
    }
}
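For concreteness, here is one way that skeleton might be filled in. This is only a sketch: runFlusher, flushDeltas, and maxDeltasPerFlush are hypothetical names, and Delta is the type from the question.
import "time"

const maxDeltasPerFlush = 100 // assumed batch size

func runFlusher(distances <-chan Delta, flushDeltas func([]Delta)) {
    ticker := time.NewTicker(time.Second * 5)
    defer ticker.Stop()

    var deltas []Delta
    flush := func() {
        if len(deltas) == 0 {
            return
        }
        flushDeltas(deltas) // write the batch to the database
        deltas = deltas[:0] // reset the batch, reusing the backing array
    }

    for {
        select {
        case <-ticker.C:
            flush() // 5 seconds are up
        case d := <-distances:
            deltas = append(deltas, d)
            if len(deltas) >= maxDeltasPerFlush {
                flush() // the batch is full
            }
        }
    }
}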
I don't know how to make access to Distances safe, so that adding to it in one goroutine and reading it in another is OK. (I want to use Go channels for this but have no idea how.)
If you intend to keep a map and share memory, you need to protect it using mutual exclusion (a mutex) to synchronize access between goroutines. Using a channel instead lets you send a copy to the channel, removing the need to synchronize access to the Delta value. Depending on your architecture you could also create a pipeline of goroutines connected by channels, so that only a single goroutine (a monitor goroutine) accesses the Delta, which also removes the need for synchronization.
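As a sketch of the producer side (names taken from the question, error handling omitted), AddToLogPerDay would then just compute one increment and send it on the distances channel, so only the monitor goroutine running the for/select loop above ever touches the batching state; that goroutine can aggregate increments per device before flushing.
var distances = make(chan Delta, 1024) // buffered so log handlers rarely block

// Assumes AddToLogPerDay is only ever called from the single log-consuming
// goroutine, so LastLogs needs no locking.
func AddToLogPerDay(l FullLog) {
    if prev, ok := LastLogs[l.DeviceID]; ok {
        distances <- Delta{
            DeviceId: l.DeviceID,
            time:     1,
            Distance: computingDistance(prev, l),
        }
    }
    LastLogs[l.DeviceID] = l
}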
How can I schedule tasks in Go for this problem?
Use a channel as the primitive for how you pass Deltas to the different goroutines :)
I will probably add Redis to store all the devices that are online, so I can do the select query faster and only update the actual database, with an expire time in Redis so that a device vanishes if it hasn't sent data for some time. Where should I put this code?
This depends on your final architecture. You could write a decorator for the select operation which checks Redis first and then falls back to the DB; the caller of that function wouldn't have to know about it. Write operations can be handled the same way: write to the persistent store, then write the value back to Redis with the expiration. With decorators the caller doesn't need to know about any of this; it just performs reads and writes, and the cache logic is implemented inside the decorators. There are many ways to do this, and it largely depends on where your implementation settles.

How to optimize a large recursive task concurrently

I have a cron task that I want to perform in the best way possible in Go.
I need to store big JSON data from a web service in a sellers table.
After saving these sellers in the database, I need to go through another large JSON web service, with a sellersID parameter, and save the results to another table named customers.
Each customer has an initial state; if this state has changed according to the data from web service no. 2, I need to store the difference in another table, changes, to keep a history of changes.
Finally, if the change matches our conditions, I perform another task.
My current operation
var wg sync.WaitGroup

action.FetchSellers() // fetch large JSON and store in sellers table, ~2min

sellers := action.ListSellers()
for _, s := range sellers {
    wg.Add(1)
    go action.FetchCustomers(&wg, s) // fetch multiple large JSONs, store in customers table, store notify... ~20sec
}
wg.Wait()
The first difficulty with this code is that I do not control the number of calls to the webservice.
The second is that the action.FetchCustomers function does a lot of work that I think can be done in a concurrency way.
The third difficulty is that I cannot resume from where an error occurred in case of errors.
I need to run this code every hour so it needs to be well built, currently it works but not in the best way.
I am considering the use of worker pools in Go, as in this example: Go by Example: Worker Pools. But I have trouble conceiving it.
Not to be a jerk, but I would use a queue for this kind of thing. I have already created a library for this and use it: github.com/AnikHasibul/queue
// Limit the max
maximumJobLimit := 50
// Open a new queue with the limit
q := queue.New(maximumJobLimit)
defer q.Close()

// simulate a large amount of jobs
for i := 0; i != 1000; i++ {
    // Add a job to queue
    q.Add()
    // Run your long long long job here in a goroutine
    go func(c int) {
        // Must call Done() after finishing the job
        defer q.Done()
        time.Sleep(time.Second)
        fmt.Println(c)
    }(i)
}

// wait for the end of all the jobs
q.Wait()
// Done!
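Alternatively, the worker-pool pattern from Go by Example that the question links to could look roughly like this for the customer fetching. This is a sketch under assumptions: Seller is the element type returned by action.ListSellers(), numWorkers is an arbitrary cap, and action.FetchCustomers is assumed to be callable without the *sync.WaitGroup parameter (or via a small wrapper).
import "sync"

func fetchAllCustomers(sellers []Seller) {
    const numWorkers = 10 // caps concurrent calls to the web service

    jobs := make(chan Seller)
    var wg sync.WaitGroup

    // Start a fixed number of workers that drain the jobs channel.
    for w := 0; w < numWorkers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for s := range jobs {
                action.FetchCustomers(s) // assumed signature without the WaitGroup
            }
        }()
    }

    // Feed all sellers to the workers, then signal that no more work is coming.
    for _, s := range sellers {
        jobs <- s
    }
    close(jobs)
    wg.Wait()
}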

Reload tensorflow model in Golang app server

I have a Golang app server wherein I keep reloading a saved TensorFlow model every 15 minutes. Every API call that uses the TensorFlow model takes a read mutex lock, and whenever I reload the model, I take a write lock. Functionality-wise this works fine, but during the model load my API response time increases because request threads keep waiting for the write lock to be released. Could you please suggest a better approach to keeping the loaded model up to date?
Edit, Code updated
Model Load Code:
tags := []string{"serve"}

// load from updated saved model
var m *tensorflow.SavedModel
var err error
m, err = tensorflow.LoadSavedModel("/path/to/model", tags, nil)
if err != nil {
    log.Errorf("Exception caught while reloading saved model %v", err)
    destroyTFModel(m)
}

if err == nil {
    ModelLoadMutex.Lock()
    defer ModelLoadMutex.Unlock()
    // destroy existing model
    destroyTFModel(TensorModel)
    TensorModel = m
}
Model Use Code (part of the API request):
config.ModelLoadMutex.RLock()
defer config.ModelLoadMutex.RUnlock()
scoreTensorList, err = TensorModel.Session.Run(
    map[tensorflow.Output]*tensorflow.Tensor{
        UserOp.Output(0): uT,
        DataOp.Output(0): nT,
    },
    []tensorflow.Output{config.SumOp.Output(0)},
    nil,
)
Presumably destroyTFModel takes a long time. You could try this:
old := TensorModel
ModelLoadMutex.Lock()
TensorModel = m // m is the newly loaded model
ModelLoadMutex.Unlock()
// Destroy the old model outside the lock, off the request path.
go destroyTFModel(old)
So destroy after assigning, and/or try destroying on another goroutine if it needs to clean up resources and somehow takes a long time blocking this response. I'd look into what you're doing in destroyTFModel and why it is slow, though: does it make network requests to the DB or involve the file system? Are you sure there isn't another lock external to your app that you're not aware of (for example, if it had to open a file and locked it for reads while destroying this model)?
Instead of using if err == nil { around it, consider returning on error.
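Putting the question's loading code and these suggestions together, the reload might look roughly like this; this is only a sketch, reusing the names from the question.
func reloadModel() {
    tags := []string{"serve"}
    m, err := tensorflow.LoadSavedModel("/path/to/model", tags, nil)
    if err != nil {
        log.Errorf("Exception caught while reloading saved model %v", err)
        return // return on error instead of wrapping the rest in if err == nil
    }

    // Swap the model under the write lock; keep the critical section small.
    ModelLoadMutex.Lock()
    old := TensorModel
    TensorModel = m
    ModelLoadMutex.Unlock()

    // Destroy the old model outside the lock so readers are not blocked.
    go destroyTFModel(old)
}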

In CockroachDB, how do batches and transactions interact?

When should I use batches and when should I use transactions? Can I embed a transaction in a batch? A batch in a transaction?
A batch is a collection of operations that are sent to the server as a single unit for efficiency. It is equivalent to sending the same operations as individual requests from different threads. Requests in a batch may be executed out of order, and it's possible for some operations in a batch to succeed while others fail.
In Go, batches are created with the batcher object DB.B, and must be passed to DB.Run(). For example:
err := db.Run(db.B.Put("a", "1").Put("b", "2"))
is equivalent to:
_, err1 := db.Put("a", "1")
_, err2 := db.Put("b", "2")
A transaction defines a consistent and atomic sequence of operations. Transactions guarantee consistency with respect to all other operations in the system: the results of a transaction cannot be seen unless and until the transaction is committed. Since transactions may need to be retried, transactions are defined by function objects (typically closures) which may be called multiple times.
In Go, transactions are created with the DB.Tx method. The *client.Tx parameter to the closure implements a similar interface to DB; inside the transaction you must perform all your operations on this object instead of the original DB. If your function returns an error, the transaction will be aborted; otherwise it will commit. Here is a transactional version of the previous example (but see below for a more efficient version):
err := db.Tx(func(tx *client.Tx) error {
    err := tx.Put("a", "1")
    if err != nil {
        return err
    }
    return tx.Put("b", "2")
})
The previous example waits for the "a" write to complete before starting the "b" write, and then waits for the "b" write to complete before committing the transaction. It is possible to make this more efficient by using batches inside the transaction. Tx.B is a batcher object, just like DB.B. In a transaction, you can run batches with either Tx.Run or Tx.Commit. Tx.Commit will commit the transaction if and only if all other operations in the batch succeed, and is more efficient than letting the transaction commit automatically when the closure returns. It is a good practice to always make the last operation in a transaction a batch executed by Tx.Commit:
err := db.Tx(func(tx *client.Tx) error {
    return tx.Commit(tx.B.Put("a", "1").Put("b", "2"))
})

Resources