In order to sync mailboxes, my application follows the sync recommendations by finding the history ID of the latest message in the user's mailbox. We then use this for partial syncs going forward.
Recently we noticed behavior that suggested an issue with these syncs. One explanation was that we were receiving a much older message and history ID. I've tested our functionality and it appears to work correctly. Still, in an attempt to rule out a potential root cause, I added some checks to detect whether the users.messages.list API returns results out of descending order. These checks ended up being hit, suggesting that this is an issue.
Here is my function, in Go, for finding the latest history ID. This includes the additional checks I added to validate the ordering -- essentially, instead of using messages.get only for the first entry in the list, it also gets the last entry in the list and then compares dates/history IDs: the first entry in the list should have the greatest history ID and date.
func getLatestHistoryID(ctx context.Context, gmailService *gmail.Service) (uint64, time.Time, error) {
    messagesResponse, err := gmailService.Users.Messages.List("me").IncludeSpamTrash(true).Context(ctx).Do()
    if err != nil {
        return 0, time.Time{}, err
    }
    messagesList := messagesResponse.Messages
    if len(messagesList) == 0 {
        return 0, time.Time{}, nil
    }
    latestMessage, err := gmailService.Users.Messages.Get("me", messagesList[0].Id).Context(ctx).Do()
    if err != nil {
        return 0, time.Time{}, err
    } else if latestMessage == nil {
        return 0, time.Time{}, nil
    }
    earliestMessage, err := gmailService.Users.Messages.Get("me", messagesList[len(messagesList)-1].Id).Context(ctx).Do()
    if err != nil {
        log.Errorf("error doing optional check to validate ordering of message list. %v", err)
    } else if earliestMessage == nil {
        log.Errorf("unexpected earliest message not retrieved")
    } else {
        if latestMessage.HistoryId < earliestMessage.HistoryId {
            return 0, time.Time{}, fmt.Errorf("message list was not in the expected order by history id! first in list %d (%s), last %d (%s)",
                latestMessage.HistoryId, latestMessage.Id,
                earliestMessage.HistoryId, earliestMessage.Id)
        }
        // This could probably fail in rare clock skew cases, but right now we're observing this being a several hour difference between dates.
        if latestMessage.InternalDate < earliestMessage.InternalDate {
            return 0, time.Time{}, fmt.Errorf("message list was not in the expected order by date! first in list %s (%s), last %s (%s)",
                time.UnixMilli(latestMessage.InternalDate).String(), latestMessage.Id,
                time.UnixMilli(earliestMessage.InternalDate).String(), earliestMessage.Id)
        }
    }
    return latestMessage.HistoryId, time.UnixMilli(latestMessage.InternalDate), nil
}
I've found several resources that confirm that users.messages.list is expected to be sorted in descending order by date/history ID:
Gmail API - Getting different results with users.threads.list vs users.messages.list
In what order does the Gmail API return messages when calling "Users.messages: list"
https://developers.google.com/gmail/api/guides/sync#full_synchronization #3
Edited: originally linked to https://developers.google.com/gmail/api/guides/sync#limitations
When I test the function above locally it works as expected, and the return statement on the last line is hit. Yet I've observed the out-of-order detection errors hundreds of times. Of the failures, roughly 9 out of 10 hit the HistoryId check. I believe this is largely failing on a small set of mailboxes, but I am currently not sure in what proportion of usages this occurs (I'm working on gathering this).
Is there any reason the API may return results out of order? Is there anything wrong with the assumptions made by my checks?
API return results out of descending order.
If you check the documentation for users.messages.list, you will find that there is no order-by parameter, which means there is no way for you to guarantee the order the data arrives in.
It could arrive sometimes in descending order and other times not. There is no way to guarantee it; if there were, the docs would state the order.
#limitations does not mention anything about order; it only mentions that history records may or may not be available:
History records are typically available for at least one week and often longer.
You should always sort the results locally.
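As a rough sketch of what that local sort could look like, assuming you have already fetched the full messages with Users.Messages.Get (the list call alone only returns Id and ThreadId, so InternalDate would be zero), you could pick the "latest" entry yourself instead of trusting the order of the list response:

import (
    "sort"

    gmail "google.golang.org/api/gmail/v1"
)

// latestByInternalDate returns the message with the greatest InternalDate,
// rather than assuming the first entry of the list response is the newest.
func latestByInternalDate(msgs []*gmail.Message) *gmail.Message {
    if len(msgs) == 0 {
        return nil
    }
    sort.Slice(msgs, func(i, j int) bool {
        return msgs[i].InternalDate > msgs[j].InternalDate
    })
    return msgs[0]
}

That keeps the rest of the sync logic unchanged; only the "which message is newest" decision stops depending on the API's ordering.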
Related
What is the read cost at points A, B, and C? Is it always 1 read no matter what, or are there circumstances under which no read is incurred?
dsnap, err := docRef.Get(ctx)
if status.Code(err) == codes.NotFound {
    return nil, ErrNotFound // Point A
}
if err != nil {
    return nil, err // Point B
}
// Point C
According to the documentation on pricing:
Minimum charge for queries
There is a minimum charge of one document read for each query that you
perform, even if the query returns no results.
This suggests that every time you call Get, it will cost 1 read if the request hits the server. This is essentially the cost of using the massively scalable Firestore indexes.
I have what is essentially a counter that users can increment.
However, I want to avoid the race condition of two users incrementing the counter at once.
Is there a way to atomically increment a counter using Gorm as opposed to fetching the value from the database, incrementing, and finally updating the database?
If you want to use the basic ORM features, you can use FOR UPDATE as a query option when retrieving the record; the database will lock the record for that specific connection until that connection issues an UPDATE query to change that record.
Both the SELECT and UPDATE statements must happen on the same connection, which means you need to wrap them in a transaction (otherwise Go may send the second query over a different connection).
Please note that this will make every other connection that wants to SELECT the same record wait until you've done the UPDATE. That is not an issue for most applications, but if you either have very high concurrency or the time between SELECT ... FOR UPDATE and the UPDATE after that is long, this may not be for you.
In addition to FOR UPDATE, the FOR SHARE option sounds like it could also work for you, with less locking contention (but I don't know it well enough to say this for sure).
Note: This assumes you use an RDBMS that supports SELECT ... FOR UPDATE; if it doesn't, please update the question to tell us which RDBMS you are using.
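For example, with GORM v2 you can attach the locking clause to the SELECT inside a transaction; a rough sketch, assuming a hypothetical Counter model and table:

import (
    "gorm.io/gorm"
    "gorm.io/gorm/clause"
)

// Counter is a hypothetical model used only for illustration.
type Counter struct {
    ID    uint
    Count int64
}

// incrementWithLock runs SELECT ... FOR UPDATE and the following UPDATE on the
// same connection by wrapping both in a transaction.
func incrementWithLock(db *gorm.DB, id uint) error {
    return db.Transaction(func(tx *gorm.DB) error {
        var c Counter
        // SELECT * FROM counters WHERE id = ? FOR UPDATE
        if err := tx.Clauses(clause.Locking{Strength: "UPDATE"}).First(&c, id).Error; err != nil {
            return err
        }
        // UPDATE counters SET count = ? WHERE id = ?
        return tx.Model(&c).Update("count", c.Count+1).Error
    })
}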
Another option is to just go around the ORM and do db.Exec("UPDATE counter_table SET counter = counter + 1 WHERE id = ?", 42) (though see https://stackoverflow.com/a/29945125/1073170 for some pitfalls).
A possible solution is to use GORM transactions (https://gorm.io/docs/transactions.html).
err := db.Transaction(func(tx *gorm.DB) error {
    // Get model if it exists
    var feature models.Feature
    if err := tx.Where("id = ?", c.Param("id")).First(&feature).Error; err != nil {
        return err
    }
    // Increment counter
    if err := tx.Model(&feature).Update("Counter", feature.Counter+1).Error; err != nil {
        return err
    }
    return nil
})
if err != nil {
    c.Status(http.StatusInternalServerError)
    return
}
c.Status(http.StatusOK)
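If you also want to avoid the read-then-write gap (the SELECT above takes no lock, so two concurrent transactions could read the same Counter value), a single UPDATE with an SQL expression keeps the increment atomic in one statement, similar in spirit to the raw db.Exec approach mentioned earlier. A sketch, staying with the same Feature model and assuming its column is named counter:

// Issues: UPDATE features SET counter = counter + 1 WHERE id = ?
err := db.Model(&models.Feature{}).
    Where("id = ?", c.Param("id")).
    Update("counter", gorm.Expr("counter + ?", 1)).Error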
I have an array of Ids of type int64, and this is my NSQ message that I am trying to publish.
nsqMsg := st{
    Action: "insert",
    Ids:    Ids,
    GID:    Gids,
}
msg, err := json.Marshal(nsqMsg)
if err != nil {
    log.Println(err)
    return err
}
err = nsqProducer.Publish(TOPIC_NAME, msg)
if err != nil {
    log.Println(err)
    return err
}
In my consumer I take each Id one by one and fetch info based on that Id from my datastore.
While fetching, my CreateObject method can return an error; I handle that case by requeueing the msg (the one giving the error) so it can be retried.
for i := 0; i < len(data.Ids); i++ {
    Object, err := X.CreateObject(data.Ids[i])
    if err != nil {
        requeueMsgData = append(requeueMsgData, data.Ids[i])
        continue
    }
    DataList = append(DataList, Object)
}
if len(requeueMsgData) > 0 {
    msg, err := json.Marshal(requeueMsgData)
    if err != nil {
        log.Println(err)
        return err
    }
    message.Body = msg
    message.Requeue(60 * time.Second)
    log.Println("error while creating Object", err)
    return nil
}
So, is this the right way of doing this?
Is there any drawback to this approach?
Is it better to publish it again?
Some queues (like Kafka) support acknowledgement where items that are dequeued are not removed from the queue until the consumer has actually acknowledged successful receipt of the item.
The advantage of this model is that if the consumer dies after consumption but before acknowledgement, the item will be automatically re-queued. The downside of your model is that the item might be lost in that case.
The risk of an acknowledgement model is that items could now be double-consumed: a consumer may attempt consumption that has side effects (like incrementing a counter or mutating a database) but fail to acknowledge, so retries might not produce the desired result. (Note that, reading through the nsq docs, retries are not guaranteed to happen even if you don't re-enqueue the data, so your code will likely have to be defensive against this anyway.)
You should look into the topic of "Exactly Once" vs. "At Most Once" processing if you want to understand this deeper.
Reading through the nsq docs, it doesn't look like acknowledgement is supported so this might be the best option you have if you are obligated to use nsq.
Along the lines of what dolan was saying, there are a couple of cases that you could encounter:
The main message heartbeat/lease times out and you receive ALL ids again (from the original message); NSQ provides "at least once" semantics.
The requeue of any single message times out and is never completed (falling back to the main ids).
Because nsq can (and most def will :p) deliver messages more than once, CreateObject could/should be idempotent in order to handle this case.
Additionally, the redelivery is an important safety mechanism: the original message shouldn't be FIN'd until all individual ids are confirmed created or successfully requeued, which ensures that no data is lost.
IMO the way you are handling it looks perfectly good, but the most important consideration is handling correctness/data integrity in an environment where duplicate messages will be received.
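As a sketch of that idempotency idea (all names here are hypothetical, standing in for your datastore and CreateObject call), the create step can be written so that an id delivered twice is a no-op instead of producing a duplicate:

// Object stands in for whatever X.CreateObject returns; fields omitted.
type Object struct{}

// Store is a hypothetical datastore abstraction used only for illustration.
type Store interface {
    GetObject(id int64) (*Object, error)    // returns (nil, nil) when the id is absent
    CreateObject(id int64) (*Object, error) // the existing creation call
}

// createIfAbsent makes creation idempotent: an id delivered twice returns the
// object created on the first delivery instead of creating it again.
func createIfAbsent(s Store, id int64) (*Object, error) {
    existing, err := s.GetObject(id)
    if err != nil {
        return nil, err
    }
    if existing != nil {
        return existing, nil // already handled on a previous delivery
    }
    return s.CreateObject(id)
}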
Another option could be to batch the Requeue so that it attempts to produce a single output message of failed ids, which could cut back on the # of messages in the queue at any given time:
Consider a message with 3 ids:
message ids: [id1, id2, id3]
id1 succeeds creation and id2 and id3 fail:
the program could attempt all operations and emit a single requeue message, with id2, id3.
But there are trade-offs with this too.
I use this driver to communicate with psql from Go. Now when I issue an update query, I have no way of knowing whether it actually updated anything (it can update 0 rows if such an id is not present).
_, err := Db.Query("UPDATE tags SET name=$1 WHERE id=1", name)
I tried to inspect the err variable (in the way the docs suggest for an Insert statement):
if err == sql.ErrNoRows {
...
}
But even with a non-existent id, err is still nil.
I also tried to use QueryRow with returning clause:
id := 0
err := Db.QueryRow("UPDATE tags SET name=$1 WHERE id=1 RETURNING id", name).Scan(&id)
But this one fails to scan &id when id=1 is not present in the database.
So what is the canonical way to check whether my update updated anything?
Try using db.Exec() instead of db.Query() for queries that do not return results. Instead of returning a sql.Rows object (which doesn't have a way to check how many rows were affected), it returns a sql.Result object, which has a method RowsAffected() (int64, error). This returns the number of rows affected (inserted, deleted, updated) by any write operations in the query fed to the Exec() call.
res, err := db.Exec(query, args...)
if err != nil {
    return err
}
n, err := res.RowsAffected()
if err != nil {
    return err
}
// do something with n
Note that if your query doesn't affect any rows directly, but only does so via a subquery, the rows affected by the subquery will not be counted as rows affected for that method call.
Also, as the method comment notes, this doesn't work for all database types, but I know for a fact it works with pq, as we're using that driver ourselves (and using the RowsAffected() method).
Reference links:
https://golang.org/pkg/database/sql/#DB.Exec
https://golang.org/pkg/database/sql/#Result
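Applied to the update from the question, a sketch could look like the following; if n comes back as 0, no row with id=1 was present, so nothing was updated:

res, err := Db.Exec("UPDATE tags SET name=$1 WHERE id=1", name)
if err != nil {
    return err
}
n, err := res.RowsAffected()
if err != nil {
    return err
}
if n == 0 {
    // nothing matched id=1, so nothing was updated
}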
I have a small Heroku app in which I print out the name and age from each row after query execution.
I want to avoid looping with rows.Next() and Scan(), and just show what the database returned after query execution, which may be some data or an error.
Can we directly dump data to a string for printing?
rows, err := db.Query("SELECT name FROM users WHERE age = $1", age)
if err != nil {
    log.Fatal(err)
}
for rows.Next() {
    var name string
    if err := rows.Scan(&name); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s is %d\n", name, age)
}
if err := rows.Err(); err != nil {
    log.Fatal(err)
}
Pretty much: No.
The Query method is going to return a pointer to a Rows struct:
func (db *DB) Query(query string, args ...interface{}) (*Rows, error)
If you print that (fmt.Printf("%#v\n", rows)) you'll see something such as:
&sql.Rows{dc:(*sql.driverConn)(0xc8201225a0), releaseConn:(func(error))(0x4802c0), rowsi:(*pq.rows)(0xc820166700), closed:false, lastcols:[]driver.Value(nil), lasterr:error(nil), closeStmt:driver.Stmt(nil)}
...probably not what you want.
Those correspond to the Rows struct from the sql package (you'll notice the fields are not exported):
type Rows struct {
    dc          *driverConn // owned; must call releaseConn when closed to release
    releaseConn func(error)
    rowsi       driver.Rows
    closed      bool
    lastcols    []driver.Value
    lasterr     error       // non-nil only if closed is true
    closeStmt   driver.Stmt // if non-nil, statement to Close on close
}
You'll see []driver.Value (an interface from the driver package); that looks like where we can expect to find some useful, maybe even human-readable data. But when directly printed it doesn't appear useful; it's even empty... So you have to somehow get at the underlying information. The sql package gives us the Next method to start with:
Next prepares the next result row for reading with the Scan method.
It returns true on success, or false if there is no next
result row or an error happened while preparing it. Err
should be consulted to distinguish between the two cases.
Every call to Scan, even the first one, must be preceded by a call to Next.
Next is going to make a []driver.Value the same size as the number of columns in the result (which is accessible, within the sql package, through driver.Rows, the rowsi field) and populate it with values from the query.
After calling rows.Next(), if you did the same fmt.Printf("%#v\n", rows) you should now see that []driver.Value is no longer empty, but it's still not going to be anything that you can read, more likely something resembling: []driver.Value{[]uint8{0x47, 0x65...
And since the field isn't exported you can't even try and convert it to something more meaningful. But the sql package gives us a means to do something with the data, which is Scan.
The Scan method is pretty concise, with lengthy comments that I won't paste here, but the really important bit is that it ranges over the columns in the current row you get from the Next method and calls convertAssign(dest[i], sv), which you can see here:
https://golang.org/src/database/sql/convert.go
It's pretty long but actually relatively simple; it essentially switches on the types of the source and destination, converts where it can, and copies from source to destination. The function comments tell us:
convertAssign copies to dest the value in src, converting it if possible. An error is returned if the copy would result in loss of information. dest should be a pointer type.
So now you have a method (Scan) which you can call directly and which hands you back converted values. Your code sample above is fine (except maybe the call to Fatal() on a Scan error).
It's important to realize that the sql package has to work with a specific driver, which is in turn implemented for specific database software, so there is quite some work going on behind the scenes.
I think your best bet if you want to hide/generalize the whole Query() ---> Next() ---> Scan() idiom is to drop it into another function which does it behind the scenes... write a package in which you abstract away that higher level implementation, as the sql package abstracts away some of the driver-specific details, the converting and copying, populating the Rows, etc.
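A rough sketch of that kind of wrapper (the helper name is made up) for the single-column case from the question:

import "database/sql"

// queryStrings hides the Query/Next/Scan idiom for queries that return a
// single text column, handing the values back as a plain []string.
func queryStrings(db *sql.DB, query string, args ...interface{}) ([]string, error) {
    rows, err := db.Query(query, args...)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var out []string
    for rows.Next() {
        var s string
        if err := rows.Scan(&s); err != nil {
            return nil, err
        }
        out = append(out, s)
    }
    return out, rows.Err()
}

With that in place, the calling code collapses to something like names, err := queryStrings(db, "SELECT name FROM users WHERE age = $1", age), which is about as close to "just show me what the database returned" as the sql package lets you get.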