I am trying to send and receive protobuf-encoded messages in Go over TCP, where the sender can cancel a write() halfway through the operation and the receiver can correctly handle partial messages.
Note that I use a single TCP connection to send messages of different user-defined types, indefinitely (this is not a message-per-connection case).
To explain my question concretely, first I will present how I implement the send/receive without partial writes.
In my program, there are multiple types of messages, defined in a .proto file. I will explain the mechanism for one such message type.
message MessageType {
int64 sender = 1;
int64 receiver = 2;
int64 operation = 3;
string message = 4;
}
Then I use the Go protobuf plugin to generate the stubs.
Then in the sender side, the following is how I send.
func send(w *bufio.Writer, code uint8, oriMsg *MessageType) error {
	if err := w.WriteByte(code); err != nil {
		return err
	}
	data, err := proto.Marshal(oriMsg)
	if err != nil {
		return err
	}
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], uint64(len(data)))
	if _, err := w.Write(b[:]); err != nil {
		return err
	}
	if _, err := w.Write(data); err != nil {
		return err
	}
	return w.Flush()
}
Then in the receiver side, the following is how I receive.
var reader *bufio.Reader // wraps the TCP connection
for {
	msgType, err := reader.ReadByte()
	if err != nil {
		panic(err)
	}
	if msgType == 1 || msgType == 2 {
		var b [8]byte
		if _, err := io.ReadFull(reader, b[:]); err != nil {
			panic(err)
		}
		numBytes := binary.LittleEndian.Uint64(b[:])
		data := make([]byte, numBytes)
		length, err := io.ReadFull(reader, data)
		if err != nil {
			panic(err)
		}
		msg := new(MessageType) // an empty message
		err = proto.Unmarshal(data[:length], msg)
		// do something with the message
	} else {
		// unknown message type handler
	}
}
Now my question is: what if the sender aborts its writes in the middle? More concretely,
Case 1: what if the sender writes the message type byte and then aborts? In this case the receiver reads the message type byte and waits for an 8-byte message length, but the sender never sends it.
Case 2: This is an extended version of case 1, where the sender first sends only the message type byte, then aborts sending the message length and marshaled message, and then sends the next message: the type byte, the length and the encoded message. Now on the receiver side everything goes wrong, because the order of fields (type, length and encoded message) is violated.
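To make Case 2 concrete, here is a small self-contained demonstration of how a single orphaned type byte desynchronizes the length parsing (the framing values are made up; the receiver logic mirrors the read-byte-then-read-length steps above):

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// buildStream simulates Case 2: one orphaned type byte from an aborted send,
// followed by a complete frame (type=1, len=5, payload "hello").
func buildStream() []byte {
	var stream bytes.Buffer
	stream.WriteByte(1) // aborted send: type byte only, nothing follows
	stream.WriteByte(1) // the next, complete frame begins here
	var lenb [8]byte
	binary.LittleEndian.PutUint64(lenb[:], 5)
	stream.Write(lenb[:])
	stream.WriteString("hello")
	return stream.Bytes()
}

// readLength runs the receiver's first two steps: one type byte, then an
// 8-byte length. With the orphaned byte in front, the "length" it decodes
// is actually the next frame's type byte plus 7 of its real length bytes.
func readLength(stream []byte) uint64 {
	r := bufio.NewReader(bytes.NewReader(stream))
	r.ReadByte() // consumes the orphaned type byte as the message type
	var b [8]byte
	io.ReadFull(r, b[:])
	return binary.LittleEndian.Uint64(b[:])
}

func main() {
	fmt.Println(readLength(buildStream())) // a garbage length, not 5
}
```

From this point on, every subsequent read is off by one byte, which is exactly the failure the question describes.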
So my question is, how can I modify the receiver such that it can continue to operate despite the sender violating the pre-agreed order of type:length:encoded-message?
Thanks
Why would the sender abort a message, but then send another message? Do you mean a fully byzantine sender, or are you preparing for fuzz testing?
If your API contract says that the sender always needs to send a correct message, then the receiver can simply ignore wrong messages, or even close the connection if it sees a violation of the API contract.
If you really need it, here some ideas of how you could make it work:
Start with a unique preamble. But then you must make sure this preamble never appears in the data.
Add a checksum to the message before handing it to the decoder. The full packet would then be: [msg_type : msg_len : msg : chksum]. This allows the receiver to check whether a message is well-formed or mangled.
Also, as the code currently stands, it is easy to crash the receiver by sending a length field with the maximum 64-bit value. So you should also check that the size is in a useful range; I would limit it to 32 bits.
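A minimal sketch of the checksum idea, using CRC-32 and a 32-bit length with a sanity cap (the frame layout and limit are illustrative, not a fixed protocol):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

const maxMsgLen = 1 << 20 // reject absurd lengths (the 32-bit limit mentioned above)

// frame builds [msg_type : msg_len(4 bytes) : msg : crc32] as suggested above.
func frame(msgType byte, payload []byte) []byte {
	var buf bytes.Buffer
	buf.WriteByte(msgType)
	var lenb [4]byte
	binary.LittleEndian.PutUint32(lenb[:], uint32(len(payload)))
	buf.Write(lenb[:])
	buf.Write(payload)
	var sum [4]byte
	binary.LittleEndian.PutUint32(sum[:], crc32.ChecksumIEEE(buf.Bytes()))
	buf.Write(sum[:])
	return buf.Bytes()
}

// verify re-computes the checksum; a mismatch means a torn or corrupted frame,
// which the receiver can then discard instead of trusting a bogus length.
func verify(f []byte) bool {
	if len(f) < 9 { // 1 type + 4 length + 4 checksum is the minimum
		return false
	}
	body, sum := f[:len(f)-4], f[len(f)-4:]
	n := binary.LittleEndian.Uint32(f[1:5])
	if n > maxMsgLen || int(n) != len(body)-5 {
		return false
	}
	return crc32.ChecksumIEEE(body) == binary.LittleEndian.Uint32(sum)
}

func main() {
	f := frame(1, []byte("hello"))
	fmt.Println(verify(f)) // true
	f[3] ^= 0xFF           // corrupt the length field
	fmt.Println(verify(f)) // false
}
```

A real receiver would still need a resynchronization strategy after a bad frame (for example, dropping the connection), since a byte stream offers no message boundaries on its own.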
I have a GRPC service defined like:
message SendEventRequest {
string producer = 1;
google.protobuf.Any event = 2;
}
message SendEventResponse {
string event_name = 1;
string status = 2;
}
service EventService {
rpc Send(SendEventRequest) returns (SendEventResponse);
}
I also have defined a custom message option:
extend google.protobuf.MessageOptions {
// event_name is the unique name of the event sent by the clients
string event_name = 50000;
}
What I want to achieve is have clients create custom proto messages that set the event_name option to a "constant". For instance:
message SomeCustomEvent {
option (mypackage.event_name) = "some_custom_event";
string data = 1;
...
}
That way the service can keep track of what events are being sent. When I do something like this I'm able to get the value of the option from a specific proto.Message:
_, md := descriptor.MessageDescriptorProto(&SomeCustomEvent{})
mOpts := md.GetOptions()
eventName := proto.GetExtension(mOpts, mypackage.E_EventName)
However, when the message is of type github.com/golang/protobuf/ptypes/any.Any the options are nil. How can I retrieve the event_name from the message? I've come across the protoregistry.MessageTypeResolver, which looks like it might help, but I would need to figure out a way to dynamically update the proto definitions of the events when clients integrate.
In order to obtain the options of an Any type, you need its specific protoreflect.MessageType so that you can unmarshal it into a specific message. In order to get the message type, you need a MessageTypeResolver.
Any contains a type_url field, which can be used for that purpose. In order to unmarshal the Any object into a message of an existing message type:
// GlobalTypes contains information about the proto message types
var res protoregistry.MessageTypeResolver = protoregistry.GlobalTypes
typeUrl := anyObject.GetTypeUrl()
msgType, _ := res.FindMessageByURL(typeUrl)
msg := msgType.New().Interface()
unmarshalOptions := proto.UnmarshalOptions{Resolver: res}
unmarshalOptions.Unmarshal(anyObject.GetValue(), msg)
After having the specific message, you can simply get the option you need:
msgOpts := msg.ProtoReflect().Descriptor().Options()
eventName := proto.GetExtension(msgOpts, mypackage.E_EventName)
Note that proto.GetExtension will panic if the message doesn't have the event_name extension, so the panic needs to be recovered. This block can be added at the beginning of the function:
defer func() {
if r := recover(); r != nil {
// err is a named return parameter of the outer function
err = fmt.Errorf("recovering from panic while extracting event_name from proto message: %s", r)
}
}()
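A tiny runnable illustration of this recover pattern (the function name and panic message are illustrative; the panic stands in for proto.GetExtension panicking):

```go
package main

import "fmt"

// getEventName sketches the recover pattern above: err is a named return
// parameter, so the deferred function can set it when a panic occurs.
func getEventName() (name string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovering from panic while extracting event_name from proto message: %s", r)
		}
	}()
	panic("extension not found") // stands in for proto.GetExtension panicking
}

func main() {
	_, err := getEventName()
	fmt.Println(err)
}
```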
EDIT: Note that the application has to import the package containing the proto definitions in order for protoregistry.GlobalTypes to recognize the type. You could do something like this in your code:
var _ mypackage.SomeEvent
The any.Any type in Go contains the TypeUrl field, which holds the type of the message that was sent. You can then use that with UnmarshalAny to get the correct generated Go type.
Link to a complete guide on how to work with any.Any.
A little addition to Boyan's answer: I found that the variable assigned to proto.UnmarshalOptions.Resolver needs to be declared as a protoregistry.ExtensionTypeResolver, which looks like:
var extRes protoregistry.ExtensionTypeResolver = protoregistry.GlobalTypes
unmarshalOptions := proto.UnmarshalOptions{Resolver: extRes}
I'm learning Go and I'd like to create an app that achieves the following:
receive data from a remote log
analyze it for errors or warnings
periodically send an HTTP request to a URL reporting that everything is OK, or send the warnings and errors.
I've been reading about concurrency, parallelism and channels, but I'm not sure how I should pass data from my logging goroutine to another goroutine with a timer that makes the request. Should I declare a slice in another routine to receive all the messages and iterate over it when the timer fires?
currently, my code looks like:
package main
import (
"fmt"
"log"
"strings"
"gopkg.in/mcuadros/go-syslog.v2"
)
func strigAnalyze(str string){
/*analyse the contents of the log message and do something*/
}
func main() {
channel := make(syslog.LogPartsChannel)
handler := syslog.NewChannelHandler(channel)
server := syslog.NewServer()
server.SetFormat(syslog.RFC3164)
server.SetHandler(handler)
server.ListenUDP("0.0.0.0:8888")
server.ListenTCP("0.0.0.0:8888")
server.Boot()
go func(channel syslog.LogPartsChannel) {
for logParts := range channel {
content := logParts["content"]
fmt.Println("logparts", logParts)
str := fmt.Sprintf("%v", content)
strigAnalyze(str)
}
}(channel)
server.Wait()
}
Should I declare a slice in another routine to receive all the
messages and iterate over it when the timer fires?
This is one very common pattern in Go. The example you're describing is sometimes called a "monitor goroutine". It guards the buffer of logs, and because it "owns" them you know they are safe from concurrent access.
The data is shared through the channel: the producer of the log data is completely decoupled from how the receiver uses it; all it needs to do is send on a channel. If the channel is unbuffered, the producer will block until the receiver can process the send. If you need to keep the producer high-throughput, you can buffer the channel or shed sends, which would look like:
select {
case logChan <- log:
...
default:
// chan is full shedding data.
}
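As a runnable illustration of the shedding idea (the channel name, capacity and log strings are all made up):

```go
package main

import "fmt"

// shedSend tries a non-blocking send; it reports whether the log entry
// was queued (true) or shed because the channel was full (false).
func shedSend(logChan chan string, log string) bool {
	select {
	case logChan <- log:
		return true
	default:
		// channel is full: drop the entry instead of blocking the producer
		return false
	}
}

func main() {
	logChan := make(chan string, 1) // tiny buffer to force shedding
	for _, log := range []string{"first", "second", "third"} {
		if shedSend(logChan, log) {
			fmt.Println("queued:", log)
		} else {
			fmt.Println("shed:", log)
		}
	}
}
```

With no consumer draining the channel, only the first entry fits; the rest are shed, which is exactly the trade-off this pattern makes to protect producer throughput.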
This pattern also lends itself really well to a "receive" loop that for/selects over the input channel, the timer, and some sort of done channel or context. The following is not a complete working example (it is missing cancellation and the flush logic), but it shows how you can for/select over multiple channels, one of which is your timer/heartbeat:
logChan := make(chan string)
go func() {
	var logBuf []string
	t := time.NewTimer(time.Second * 5)
	for {
		select {
		case log, ok := <-logChan:
			if !ok {
				return
			}
			logBuf = append(logBuf, log)
		case <-t.C:
			// timer up: flush logs, reset the slice, restart the timer
			logBuf = nil
			t.Reset(time.Second * 5)
		}
	}
}()
Also depending on how you are using the data, it might make more sense to use an actual buffer here instead of a slice.
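For instance, if the flush just writes text somewhere, a bytes.Buffer avoids holding a slice of separate strings; a sketch (the log lines are made up):

```go
package main

import (
	"bytes"
	"fmt"
)

// batch accumulates log lines in a bytes.Buffer instead of a []string,
// so a flush hands off one contiguous chunk of text.
func batch(lines []string) string {
	var buf bytes.Buffer
	for _, line := range lines {
		buf.WriteString(line)
		buf.WriteByte('\n')
	}
	return buf.String()
}

func main() {
	fmt.Print(batch([]string{"error: disk full", "warn: retrying"}))
}
```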
I have an array of Ids of type int64, and this is the NSQ message that I am trying to publish:
nsqMsg := st{
	Action: "insert",
	Ids:    Ids,
	GID:    Gids,
}
msg, err := json.Marshal(nsqMsg)
if err != nil {
log.Println(err)
return err
}
err = nsqProducer.Publish(topicName, msg) // topicName: your NSQ topic
if err != nil {
log.Println(err)
return err
}
In my consumer I take each Id one by one and fetch some info based on that Id from my datastore.
While fetching, there can be a case where my CreateObject method returns an error; I handle that case by requeueing the msg (the one giving the error) so it can be retried.
for i := 0; i < len(data.Ids); i++ {
Object, err := X.CreateObject(data.Ids[i])
if err != nil {
requeueMsgData = append(requeueMsgData, data.Ids[i])
continue
}
DataList = append(DataList, Object)
}
if len(requeueMsgData) > 0 {
msg, err := json.Marshal(requeueMsgData)
if err != nil {
log.Println(err)
return err
}
message.Body = msg
message.Requeue(60 * time.Second)
log.Println("error while creating Object", err)
return nil
}
So, is this the right way of doing this?
Are there any drawbacks to this approach?
Would it be better to publish it again instead?
Some queues (like Kafka) support acknowledgement where items that are dequeued are not removed from the queue until the consumer has actually acknowledged successful receipt of the item.
The advantage of this model is that if the consumer dies after consumption but before acknowledgement, the item will be automatically re-queued. The downside of your model is that the item might be lost in that case.
The risk of an acknowledgement model is that items can now be consumed twice: if a consumer performs a consumption with side effects (like incrementing a counter or mutating a database) but fails to acknowledge, the retry might not produce the desired result. (Note that reading through the nsq docs, duplicate deliveries can happen even if you don't re-enqueue the data, so your code will likely have to be defensive against this anyway.)
You should look into the topic of "exactly once" vs. "at least once" processing if you want to understand this more deeply.
Reading through the nsq docs, it doesn't look like acknowledgement is supported so this might be the best option you have if you are obligated to use nsq.
Along the lines with what dolan was saying there are a couple of cases that you could encounter:
the main message's heartbeat/lease times out and you receive ALL the ids again (from the original message); NSQ provides "at least once" semantics.
the requeued message itself times out and is never completed (falling back to the original message with all the IDs).
Because nsq can (and most definitely will :p) deliver messages more than once, CreateObject could/should be idempotent in order to handle this case.
Additionally, redelivery is an important safety mechanism: the original message shouldn't be FIN'd until all individual ids are confirmed created or successfully requeued, which ensures that no data is lost.
The way you are handling it looks perfectly good, but IMO the most important consideration is handling correctness/data integrity in an environment where duplicate messages will be received.
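A minimal sketch of what idempotent creation could look like (the in-memory map stands in for the real datastore; all names here are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// Store stands in for the real datastore.
type Store struct {
	mu      sync.Mutex
	objects map[int64]string
}

// CreateObject is idempotent: creating the same id twice is a no-op,
// so a redelivered NSQ message cannot corrupt or duplicate the data.
func (s *Store) CreateObject(id int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.objects[id]; exists {
		return nil // already created by an earlier delivery
	}
	s.objects[id] = fmt.Sprintf("object-%d", id)
	return nil
}

func main() {
	s := &Store{objects: make(map[int64]string)}
	for _, id := range []int64{1, 2, 1} { // id 1 delivered twice
		if err := s.CreateObject(id); err != nil {
			fmt.Println("would requeue id", id)
		}
	}
	fmt.Println(len(s.objects)) // 2
}
```

With a real database, the same effect is usually achieved with a unique key plus an "insert if not exists" style operation rather than an in-process mutex.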
Another option could be to batch the Requeue so that it attempts to produce a single output message of failed ids, which could cut back on the # of messages in the queue at any given time:
Consider a message with 3 ids:
message ids: [id1, id2, id3]
id1 succeeds creation and id2 and id3 fail:
the program could attempt all operations and emit a single requeue message containing id2 and id3.
But there are trade-offs with this approach too.
I am currently building a simple chat server that supports posting messages through a REST API.
example:

```
curl -X POST -H "Content-Type: application/json" --data '{"user":"alex", "text":"this is a message"}' http://localhost:8081/message
{
  "ok": true
}
```
Right now, I'm just storing the messages in a slice. I'm pretty sure this is an inefficient way to do it. Is there a simple, better way to store and retrieve the messages using goroutines and channels that makes it thread-safe?
Here is what I currently have:
type Message struct {
Text string
User string
Timestamp time.Time
}
var Messages = []Message{}
func messagePost(c http.ResponseWriter, req *http.Request){
decoder := json.NewDecoder(req.Body)
var m Message
err := decoder.Decode(&m)
if err != nil {
panic(err)
}
if m.Timestamp == (time.Time{}) {
m.Timestamp = time.Now()
}
addUser(m.User)
Messages = append(Messages, m)
}
Thanks!
It could be made thread-safe using a mutex, as #ThunderCat suggested, but I think that does not add concurrency: if two or more requests arrive simultaneously, one has to wait for the other to complete first, slowing the server down.
Adding concurrency: You can make it faster and handle more concurrent requests by using a queue (which is a Go channel) and a worker that listens on that channel; it's a simple implementation. Every time a message comes in through a POST request, you add it to the queue (this is instantaneous, and the HTTP response can be sent immediately). In another goroutine, you detect that a message has been added to the queue, take it out, and append it to your Messages slice. While you're appending to Messages, the HTTP requests don't have to wait.
Note: You can make it even better by having multiple goroutines listen on the queue, but we can leave that for later.
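A sketch of that multiple-listener variant (the worker count and message strings are arbitrary; note that with more than one worker the shared slice needs a mutex again, since the workers append concurrently):

```go
package main

import (
	"fmt"
	"sync"
)

// collect starts n workers that drain the same channel; the mutex guards
// the shared slice that stands in for the Messages store.
func collect(queue <-chan string, workers int) []string {
	var mu sync.Mutex
	var messages []string
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for m := range queue {
				mu.Lock()
				messages = append(messages, m)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return messages
}

func main() {
	queue := make(chan string, 8)
	for i := 0; i < 5; i++ {
		queue <- fmt.Sprintf("message %d", i)
	}
	close(queue) // close so the workers' range loops terminate
	fmt.Println(len(collect(queue, 3)))
}
```

Because the workers race on the channel, the order of messages in the slice is no longer the arrival order; that's one reason a single monitor goroutine is often preferable for an append-only log.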
This is how the code will somewhat look like:
type Message struct {
Text string
User string
Timestamp time.Time
}
var Messages = []Message{}
// messageQueue is the queue that holds new messages until they are processed
var messageQueue chan Message
func init() { // need the init function to initialize the channel, and the listeners
// initialize the queue, choosing the buffer size as 8 (number of messages the channel can hold at once)
messageQueue = make(chan Message, 8)
// start a goroutine that listens on the queue/channel for new messages
go listenForMessages()
}
func listenForMessages() {
// whenever we detect a message in the queue, append it to Messages
for m := range messageQueue {
Messages = append(Messages, m)
}
}
func messagePost(c http.ResponseWriter, req *http.Request){
decoder := json.NewDecoder(req.Body)
var m Message
err := decoder.Decode(&m)
if err != nil {
panic(err)
}
if m.Timestamp == (time.Time{}) {
m.Timestamp = time.Now()
}
addUser(m.User)
// add the message to the channel, it'll only wait if the channel is full
messageQueue <- m
}
Storing Messages: As other users have suggested, storing messages in memory may not be the right choice since the messages won't persist if the application is restarted. If you're working on a small, proof-of-concept type project and don't want to figure out the DB, you could save the Messages variable as a flat file on the server and then read from it every time the application starts (*Note: this should not be done on a production system, of course, for that you should set up a Database). But yeah, database should be the way to go.
Use a mutex to make the program thread-safe:
var Messages = []Message{}
var messageMu sync.Mutex
...
messageMu.Lock()
Messages = append(Messages, m)
messageMu.Unlock()
There's no need to use channels and goroutines to make the program thread-safe.
A database is probably a better choice for storing messages than the in-memory slice used in the question, but asking how to use a database to implement a chat program is too broad a question for SO.
After using the protogen tool, I have a message type for sending messages:
type File struct {
Info string `protobuf:"bytes,1,opt,name=info,json=info" json:"info,omitempty"`
BytesValues []byte `protobuf:"bytes,2,opt,name=bytes_values,json=bytesValues,proto3" json:"bytes_values,omitempty"`
}
I am trying to send some binary data using the BytesValues field like so:
filePath := filepath.Join("test", "myfile.bin")
f, _ := ioutil.ReadFile(filePath) // error return value ignored for brevity
msg := &File{BytesValues: f}
body, _ := proto.Marshal(msg) // encode
The server seems to have problems decoding the message I am sending to it. Is this the correct way to send binary data using a []byte field with protocol buffers?
In my case, the problem was actually the server not reading the raw bytes from the correct field.
The correct way to send raw bytes is to just set the bytes to the field. There is no need to encode the bytes in any way because protocol buffers is a binary format.
filePath := filepath.Join("test", "myfile.bin")
f, _ := ioutil.ReadFile(filePath) // error return value ignored for brevity
msg := &File{BytesValues: f}
body, _ := proto.Marshal(msg) // encode