How to use self-describing messages for protobuf - Go

One of the use cases I'm working on with protocol buffers is deserializing the Protocol Buffers Kafka messages that I receive at the consumer end (using the sarama library and Go).
The way I'm currently doing it: I defined a sample pixel.proto file as shown below.
syntax = "proto3";

package saramaprotobuf;

message Pixel {
  // Session identifier stuff
  string session_id = 2;
}
I'm sending the message through a sarama.SyncProducer (by marshalling it) and receiving it with a sarama.Consumer (unmarshalling the message by referencing the compiled pixel.pb.go). The code is as below.
package main

import (
    "log"
    "os"
    "os/signal"
    "protobuftest/example"
    "syscall"
    "time"

    "github.com/Shopify/sarama"
    "github.com/golang/protobuf/proto"
)

func main() {
    topic := "test_topic"
    brokerList := []string{"localhost:9092"}

    producer, err := newSyncProducer(brokerList)
    if err != nil {
        log.Fatalln("Failed to start Sarama producer:", err)
    }

    go func() {
        ticker := time.NewTicker(time.Second)
        for {
            select {
            case t := <-ticker.C:
                elliot := &example.Pixel{
                    SessionId: t.String(),
                }
                pixelToSend := elliot
                pixelToSendBytes, err := proto.Marshal(pixelToSend)
                if err != nil {
                    log.Fatalln("Failed to marshal example:", err)
                }

                msg := &sarama.ProducerMessage{
                    Topic: topic,
                    Value: sarama.ByteEncoder(pixelToSendBytes),
                }

                if _, _, err := producer.SendMessage(msg); err != nil {
                    log.Println("Failed to send message:", err)
                }
                log.Printf("Pixel sent: %s", pixelToSend)
            }
        }
    }()

    signals := make(chan os.Signal, 1)
    signal.Notify(signals, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM)

    partitionConsumer, err := newPartitionConsumer(brokerList, topic)
    if err != nil {
        log.Fatalln("Failed to create Sarama partition consumer:", err)
    }

    log.Println("Waiting for messages...")

    for {
        select {
        case msg := <-partitionConsumer.Messages():
            receivedPixel := &example.Pixel{}
            err := proto.Unmarshal(msg.Value, receivedPixel)
            if err != nil {
                log.Fatalln("Failed to unmarshal example:", err)
            }
            log.Printf("Pixel received: %s", receivedPixel)
        case <-signals:
            log.Print("Received termination signal. Exiting.")
            return
        }
    }
}

func newSyncProducer(brokerList []string) (sarama.SyncProducer, error) {
    config := sarama.NewConfig()
    config.Producer.RequiredAcks = sarama.WaitForAll
    config.Producer.Retry.Max = 5
    config.Producer.Return.Successes = true
    // TODO configure producer

    producer, err := sarama.NewSyncProducer(brokerList, config)
    if err != nil {
        return nil, err
    }
    return producer, nil
}

func newPartitionConsumer(brokerList []string, topic string) (sarama.PartitionConsumer, error) {
    conf := sarama.NewConfig()
    // TODO configure consumer

    consumer, err := sarama.NewConsumer(brokerList, conf)
    if err != nil {
        return nil, err
    }

    partitionConsumer, err := consumer.ConsumePartition(topic, 0, sarama.OffsetOldest)
    if err != nil {
        return nil, err
    }
    return partitionConsumer, nil
}
In the code, as you can see, I have imported the generated .proto code and referenced it in the main function in order to send and receive the message. The problem here is that the solution is not generic: I will receive messages of different .proto types at the consumer end.
How can I make it generic? I know there is something called a self-describing message (dynamic message) as part of protobuf. I referred to this link https://developers.google.com/protocol-buffers/docs/techniques?csw=1#self-description . But it doesn't have any explanation of how to embed this as part of pixel.proto (the example I have used) so that at the consumer end I can directly deserialize it to the required type.

You would define a generic container message type that includes a DescriptorSet field and an Any field.
When sending, you build an instance of that generic message type, setting the Any field with an instance of your Pixel message and setting the DescriptorSet field with the DescriptorSet of the Pixel type.
That allows the receiver of such a message to parse the Any contents using the DescriptorSet you are attaching. In practical terms, this is sending a piece of the proto definition together with the message, so receivers don't need pre-shared proto definitions or generated code.
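For concreteness, here is a minimal sketch of such a container; the message and field names are my own, not an official protobuf type:

syntax = "proto3";

package saramaprotobuf;

import "google/protobuf/any.proto";
import "google/protobuf/descriptor.proto";

message SelfDescribing {
  // Descriptors of the .proto file(s) that define the payload type.
  google.protobuf.FileDescriptorSet descriptor_set = 1;
  // The actual payload, e.g. a Pixel, packed together with its type URL.
  google.protobuf.Any payload = 2;
}

On the consumer side, a receiver without generated code for Pixel could then decode the payload dynamically. This is a hedged sketch using the newer google.golang.org/protobuf modules (proto, protodesc, protoreflect, dynamicpb), with SelfDescribing being the hypothetical container above:

received := &saramaprotobuf.SelfDescribing{}
if err := proto.Unmarshal(msg.Value, received); err != nil {
    log.Fatalln(err)
}
// Build a descriptor registry from the descriptors shipped in the message.
files, err := protodesc.NewFiles(received.GetDescriptorSet())
if err != nil {
    log.Fatalln(err)
}
// Resolve the payload's type by the Any's type name...
desc, err := files.FindDescriptorByName(received.GetPayload().MessageName())
if err != nil {
    log.Fatalln(err)
}
// ...and unmarshal into a dynamic message; no generated code needed.
dyn := dynamicpb.NewMessage(desc.(protoreflect.MessageDescriptor))
if err := proto.Unmarshal(received.GetPayload().GetValue(), dyn); err != nil {
    log.Fatalln(err)
}
log.Println(dyn)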
Having said that, I'm not sure this is what you really want, because if you are planning to share proto definitions or generated code with clients anyway, then simply using a oneof field in a container type would be much simpler to use, as sketched below.
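For comparison, the oneof container is just this (again a sketch; every possible payload type has to be listed up front):

message Envelope {
  oneof payload {
    Pixel pixel = 1;
    // add one field per message type you expect to receive
  }
}

The consumer then unmarshals the Envelope with its generated code and switches on which oneof field is set, so no descriptors travel with the message.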

Related

Google PubSub and Go: create client outside or inside publish-function?

I'm new when it comes to Google PubSub (and pubsub applications in general). I'm also relatively new when it comes to Go.
I'm working on a pretty heavy backend service application that already has too many responsibilities. The service needs to fire off one message for each incoming request to a Google PubSub topic. It only needs to "fire and forget"; if something goes wrong with the publishing, nothing needs to happen. The messages are not crucial (only used for analytics), but there will be many of them. We estimate between 50 and 100 messages per second for most of the day.
Now to the code:
func (p *publisher) Publish(message Message, log zerolog.Logger) error {
    ctx := context.Background()

    client, err := pubsub.NewClient(ctx, p.project)
    if err != nil {
        log.Error().Msgf("Error creating client: %v", err)
        return err
    }
    defer client.Close()

    marshalled, _ := json.Marshal(message)

    topic := client.Topic(p.topic)
    result := topic.Publish(ctx, &pubsub.Message{
        Data: marshalled,
    })

    _, err = result.Get(ctx)
    if err != nil {
        log.Error().Msgf("Failed to publish message: %v", err)
        return err
    }
    return nil
}
Disclaimer: p *publisher only contains configuration.
I wonder if this is the best way? Will this lead to the service creating and closing a client 100 times per second? If so, then I guess I should create the client once and pass it as an argument to the Publish()-function instead?
This is how the Publish()-function gets called:
defer func(publisher publish.Publisher, message Message, log zerolog.Logger) {
    err := publisher.Publish(message, log)
    if err != nil {
        log.Error().Msgf("Failed to publish message: %v", err)
    }
}(publisher, message, logger)
Maybe the way to go is to hold the pubsubClient and pubsubTopic inside a struct?
type myStruct struct {
    pubsubClient *pubsub.Client
    pubsubTopic  *pubsub.Topic
    logger       *yourLogger.Logger
}

func newMyStruct(projectID string) (*myStruct, error) {
    ctx := context.Background()
    pubsubClient, err := pubsub.NewClient(ctx, projectID)
    if err != nil {
        return nil, err
    }
    pubsubTopic := pubsubClient.Topic(topicName)
    return &myStruct{
        pubsubClient: pubsubClient,
        pubsubTopic:  pubsubTopic,
        logger:       logger,
        // and whatever else you want :D
    }, nil
}
And then create a method on that struct which takes responsibility for marshalling the msg and sending it to Pub/Sub:
func (s *myStruct) request(ctx context.Context, data yourData) error {
    marshalled, err := json.Marshal(data)
    if err != nil {
        return err
    }
    res := s.pubsubTopic.Publish(ctx, &pubsub.Message{
        Data: marshalled,
    })
    if _, err := res.Get(ctx); err != nil {
        return err
    }
    return nil
}

Golang bufio from websocket breaking after first read

I am trying to stream JSON text from a websocket. However after an initial read I noticed that the stream seems to break/disconnect. This is from a Pleroma server (think: Mastodon). I am using the default Golang websocket library.
package main

import (
    "bufio"
    "fmt"
    "log"

    "golang.org/x/net/websocket"
)

func main() {
    origin := "https://poa.st/"
    url := "wss://poa.st/api/v1/streaming/?stream=public"
    ws, err := websocket.Dial(url, "", origin)
    if err != nil {
        log.Fatal(err)
    }
    s := bufio.NewScanner(ws)
    for s.Scan() {
        line := s.Text()
        fmt.Println(line)
    }
}
After the initial JSON text response, the for-loop breaks. I would expect it to send a new message every few seconds.
What might be causing this? I am willing to switch to the Gorilla websocket library if I can use it with bufio.
Thanks!
Although the x/net/websocket connection has a Read method with the same signature as the Read method in io.Reader, the connection does not work like an io.Reader, and it will not work as you expect when wrapped with a bufio.Scanner.
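If you want to stay on x/net/websocket, its Message codec reads one complete websocket message per call, which sidesteps the Scanner problem. A hedged sketch, untested against this particular endpoint:

for {
    var p string
    // websocket.Message is the text/binary codec shipped with
    // golang.org/x/net/websocket; each Receive returns one whole message.
    if err := websocket.Message.Receive(ws, &p); err != nil {
        log.Fatal(err)
    }
    fmt.Println(p)
}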
The poa.st endpoint sends a stream of messages where each message is a JSON document. Use the following code to read the messages using the Gorilla package:
url := "wss://poa.st/api/v1/streaming/?stream=public"
ws, _, err := websocket.DefaultDialer.Dial(url, nil)
if err != nil {
    log.Fatal(err)
}
defer ws.Close()

for {
    _, p, err := ws.ReadMessage()
    if err != nil {
        log.Fatal(err)
    }
    // p is a []byte containing the JSON document.
    fmt.Printf("%s\n", p)
}
The Gorilla package has a helper method for decoding JSON messages. Here's an example of how to use that method.
url := "wss://poa.st/api/v1/streaming/?stream=public"
ws, _, err := websocket.DefaultDialer.Dial(url, nil)
if err != nil {
    log.Fatal(err)
}
defer ws.Close()

for {
    // The JSON documents are objects containing two fields,
    // the event type and the payload. The payload is a JSON
    // document itself.
    var e struct {
        Event   string
        Payload string
    }
    err := ws.ReadJSON(&e)
    if err != nil {
        log.Fatal(err)
    }
    // TODO: decode e.Payload based on e.Event
}

Connect Kafka in Go (sarama): the consumer cannot get messages from the topic

I just want to follow a demo to try using Kafka in Go. I can successfully produce messages with sarama, but when I want to consume them, I cannot get any.
package main

import (
    "fmt"

    "github.com/Shopify/sarama"
)

// kafka consumer
func main() {
    consumer, err := sarama.NewConsumer([]string{"127.0.0.1:9092"}, nil)
    if err != nil {
        fmt.Printf("fail to start consumer, err:%v\n", err)
        return
    }
    partitionList, err := consumer.Partitions("test")
    if err != nil {
        fmt.Printf("fail to get list of partition:err%v\n", err)
        return
    }
    fmt.Println(partitionList)
    for partition := range partitionList {
        pc, err := consumer.ConsumePartition("test", int32(partition), sarama.OffsetNewest)
        if err != nil {
            fmt.Printf("failed to start consumer for partition %d,err:%v\n", partition, err)
            return
        }
        defer pc.AsyncClose()
        go func(sarama.PartitionConsumer) {
            for msg := range pc.Messages() {
                fmt.Printf("Partition:%d Offset:%d Key:%v Value:%v", msg.Partition, msg.Offset, msg.Key, msg.Value)
            }
        }(pc)
    }
}
The output of the code is
[0]
-1
But I actually can get the messages through kafka-console-consumer.
I believe you are not waiting for the messages to come...
Here is the list of issues in your code:
defer pc.AsyncClose() will trigger on function exit, not scope exit.
The goroutine is launched into nowhere... nothing blocks or waits for the results to come.
go func(sarama.PartitionConsumer) {
    for msg := range pc.Messages() {
        fmt.Printf("Partition:%d Offset:%d Key:%v Value:%v", msg.Partition, msg.Offset, msg.Key, msg.Value)
    }
}(pc)
You are not passing the argument to the goroutine: go func(sarama.PartitionConsumer) { declares only a type; it should be go func(pc sarama.PartitionConsumer) {.
Remove the goroutine and just read from the consumer channel if you want a hello-world example, as in the sketch below.
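A minimal sketch of that suggestion (assuming the same local broker 127.0.0.1:9092 and topic "test"; reading from the oldest offset so messages produced earlier show up):

package main

import (
    "fmt"
    "log"

    "github.com/Shopify/sarama"
)

func main() {
    consumer, err := sarama.NewConsumer([]string{"127.0.0.1:9092"}, nil)
    if err != nil {
        log.Fatalf("fail to start consumer, err:%v", err)
    }
    defer consumer.Close()

    // Consume partition 0 directly; OffsetOldest also picks up messages
    // produced before this consumer started (OffsetNewest would not).
    pc, err := consumer.ConsumePartition("test", 0, sarama.OffsetOldest)
    if err != nil {
        log.Fatalf("fail to start partition consumer, err:%v", err)
    }
    defer pc.AsyncClose()

    // Block in main instead of launching a goroutine nothing waits for.
    for msg := range pc.Messages() {
        fmt.Printf("Partition:%d Offset:%d Key:%s Value:%s\n", msg.Partition, msg.Offset, msg.Key, msg.Value)
    }
}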

Google Pub/Sub message ordering not working (or increasing latency to over 10 seconds)?

I'm trying to make a simplified example demonstrating the use of Google Pub/Sub's message ordering feature (https://cloud.google.com/pubsub/docs/ordering). From those docs, after message ordering is enabled for a subscription,
After the message ordering property is set, the Pub/Sub service delivers messages with the same ordering key in the order that the Pub/Sub service receives the messages. For example, if a publisher sends two messages with the same ordering key, the Pub/Sub service delivers the oldest message first.
I've used this to write the following example:
package main

import (
    "context"
    "log"
    "time"

    "cloud.google.com/go/pubsub"
    uuid "github.com/satori/go.uuid"
)

func main() {
    client, err := pubsub.NewClient(context.Background(), "my-project")
    if err != nil {
        log.Fatalf("NewClient: %v", err)
    }

    topicID := "test-topic-" + uuid.NewV4().String()
    topic, err := client.CreateTopic(context.Background(), topicID)
    if err != nil {
        log.Fatalf("CreateTopic: %v", err)
    }
    defer topic.Delete(context.Background())

    subID := "test-subscription-" + uuid.NewV4().String()
    sub, err := client.CreateSubscription(context.Background(), subID, pubsub.SubscriptionConfig{
        Topic:                 topic,
        EnableMessageOrdering: true,
    })
    if err != nil {
        log.Fatalf("CreateSubscription: %v", err)
    }
    defer sub.Delete(context.Background())

    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    messageReceived := make(chan struct{})
    go sub.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
        log.Printf("Received message with ordering key %s: %s", msg.OrderingKey, msg.Data)
        msg.Ack()
        messageReceived <- struct{}{}
    })

    topic.Publish(context.Background(), &pubsub.Message{Data: []byte("Dang1!"), OrderingKey: "foobar"})
    topic.Publish(context.Background(), &pubsub.Message{Data: []byte("Dang2!"), OrderingKey: "foobar"})

    for i := 0; i < 2; i++ {
        select {
        case <-messageReceived:
        case <-time.After(10 * time.Second):
            log.Fatal("Expected to receive a message, but timed out after 10 seconds.")
        }
    }
}
First, I tried the program without specifying OrderingKey: "foobar" in the topic.Publish() calls. This resulted in the following output:
> go run main.go
2020/08/10 21:40:34 Received message with ordering key : Dang2!
2020/08/10 21:40:34 Received message with ordering key : Dang1!
In other words, messages are not received in the same order as they were published, which in my use case is undesirable and which I'd like to prevent by specifying an OrderingKey.
However, as soon as I added the OrderingKeys in the publish calls, the program times out after 10 seconds of waiting to receive Pub/Sub messages:
> go run main.go
2020/08/10 21:44:36 Expected to receive a message, but timed out after 10 seconds.
exit status 1
What I would expect is to now first receive the message Dang1! followed by Dang2!, but instead I'm not receiving any messages. Any idea why this is not happening?
The publishes are failing with the following error: Failed to publish: Topic.EnableMessageOrdering=false, but an OrderingKey was set in Message. Please remove the OrderingKey or turn on Topic.EnableMessageOrdering.
You can see this if you change your publish calls to check the error:
res1 := topic.Publish(context.Background(), &pubsub.Message{Data: []byte("Dang1!"), OrderingKey: "foobar"})
res2 := topic.Publish(context.Background(), &pubsub.Message{Data: []byte("Dang2!"), OrderingKey: "foobar"})
_, err = res1.Get(ctx)
if err != nil {
    fmt.Printf("Failed to publish: %v", err)
    return
}
_, err = res2.Get(ctx)
if err != nil {
    fmt.Printf("Failed to publish: %v", err)
    return
}
To fix it, add a line to enable message ordering on your topic. Your topic creation would be as follows:
topic, err := client.CreateTopic(context.Background(), topicID)
if err != nil {
    log.Fatalf("CreateTopic: %v", err)
}
topic.EnableMessageOrdering = true
defer topic.Delete(context.Background())
I independently came up with the same solution as Kamal, just wanted to share the full revised implementation:
package main

import (
    "context"
    "flag"
    "log"
    "time"

    "cloud.google.com/go/pubsub"
    uuid "github.com/satori/go.uuid"
)

var enableMessageOrdering bool

func main() {
    flag.BoolVar(&enableMessageOrdering, "enableMessageOrdering", false, "Enable and use Pub/Sub message ordering")
    flag.Parse()

    client, err := pubsub.NewClient(context.Background(), "fleetsmith-dev")
    if err != nil {
        log.Fatalf("NewClient: %v", err)
    }

    topicID := "test-topic-" + uuid.NewV4().String()
    topic, err := client.CreateTopic(context.Background(), topicID)
    if err != nil {
        log.Fatalf("CreateTopic: %v", err)
    }
    topic.EnableMessageOrdering = enableMessageOrdering
    defer topic.Delete(context.Background())

    subID := "test-subscription-" + uuid.NewV4().String()
    sub, err := client.CreateSubscription(context.Background(), subID, pubsub.SubscriptionConfig{
        Topic:                 topic,
        EnableMessageOrdering: enableMessageOrdering,
    })
    if err != nil {
        log.Fatalf("CreateSubscription: %v", err)
    }
    defer sub.Delete(context.Background())

    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    messageReceived := make(chan struct{})
    go sub.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
        log.Printf("Received message with ordering key %s: %s", msg.OrderingKey, msg.Data)
        msg.Ack()
        messageReceived <- struct{}{}
    })

    msg1, msg2 := &pubsub.Message{Data: []byte("Dang1!")}, &pubsub.Message{Data: []byte("Dang2!")}
    if enableMessageOrdering {
        msg1.OrderingKey, msg2.OrderingKey = "foobar", "foobar"
    }
    publishMessage(topic, msg1)
    publishMessage(topic, msg2)

    for i := 0; i < 2; i++ {
        select {
        case <-messageReceived:
        case <-time.After(10 * time.Second):
            log.Fatal("Expected to receive a message, but timed out after 10 seconds.")
        }
    }
}

func publishMessage(topic *pubsub.Topic, msg *pubsub.Message) {
    publishResult := topic.Publish(context.Background(), msg)
    messageID, err := publishResult.Get(context.Background())
    if err != nil {
        log.Fatalf("Get: %v", err)
    }
    log.Printf("Published message with ID %s", messageID)
}
When called with the enableMessageOrdering flag set to true, I receive Dang1! first, followed by Dang2!:
> go run main.go --enableMessageOrdering
2020/08/11 05:38:07 Published message with ID 1420685949616723
2020/08/11 05:38:08 Published message with ID 1420726763302425
2020/08/11 05:38:09 Received message with ordering key foobar: Dang1!
2020/08/11 05:38:11 Received message with ordering key foobar: Dang2!
whereas without it, I receive them in reverse order as before:
> go run main.go
2020/08/11 05:38:47 Published message with ID 1420687395091051
2020/08/11 05:38:47 Published message with ID 1420693737065665
2020/08/11 05:38:48 Received message with ordering key : Dang2!
2020/08/11 05:38:48 Received message with ordering key : Dang1!

message does not start with magic byte

I am trying to produce Avro-encoded data into a Kafka topic using the linkedin/goavro package in Go. The goal is to be able to consume the topic using different clients.
First I register the schema as follows:
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"name\":\"test_topic2\",\"type\":\"record\", \"fields\":[{\"name\":\"user\",\"type\":\"string\"},{\"name\":\"password\",\"size\":10,\"type\":\"string\"}]}"}' http://localhost:8081/subjects/test_topic2-value/versions
Then I create Avro data and produce and consume it with Go.
package main

import (
    "fmt"

    "github.com/Shopify/sarama"
    "github.com/linkedin/goavro"
)

var brokers = []string{"localhost:9092"}

const topic = "test_topic2"

const loginEventAvroSchema = `{"name":"test_topic2","type":"record", "fields":[{"name":"user","type":"string"},{"name":"password","size":10,"type":"string"}]}`

func main() {
    // Create Message
    codec, err := goavro.NewCodec(loginEventAvroSchema)
    if err != nil {
        panic(err)
    }
    m := map[string]interface{}{
        "user": "pikachu", "password": "231231",
    }
    single, err := codec.SingleFromNative(nil, m)
    if err != nil {
        panic(err)
    }

    // Producer
    config := sarama.NewConfig()
    config.Consumer.Return.Errors = true
    config.Producer.Return.Successes = true
    config.Version = sarama.V2_4_0_0

    // get broker
    cluster, err := sarama.NewSyncProducer(brokers, config)
    if err != nil {
        panic(err)
    }
    defer func() {
        if err := cluster.Close(); err != nil {
            panic(err)
        }
    }()

    msg := &sarama.ProducerMessage{
        Topic: topic,
        Value: sarama.StringEncoder(single),
    }
    cluster.SendMessage(msg)

    // Consumer
    clusterConsumer, err := sarama.NewConsumer(brokers, config)
    if err != nil {
        panic(err)
    }
    defer func() {
        if err := clusterConsumer.Close(); err != nil {
            panic(err)
        }
    }()

    msgK, _ := clusterConsumer.ConsumePartition(topic, 0, sarama.OffsetOldest)
    for {
        q := <-msgK.Messages()
        native, _, err := codec.NativeFromSingle([]byte(q.Value))
        if err != nil {
            fmt.Println(err)
        }
        fmt.Println(native)
    }
}
This code works fine and I can successfully produce and consume messages through the Kafka topic.
Now I try to consume the topic from a Python Avro consumer:
from confluent_kafka import KafkaError
from confluent_kafka.avro import AvroConsumer
from confluent_kafka.avro.serializer import SerializerError

c = AvroConsumer({
    'bootstrap.servers': 'localhost',
    'group.id': 'groupid',
    'schema.registry.url': 'http://localhost:8081',
    'auto.offset.reset': 'earliest'})

c.subscribe(['test_topic2'])

while True:
    try:
        msg = c.poll(10)
    except SerializerError as e:
        print("Message deserialization failed for {}: {}".format(msg, e))
        break
    if msg is None:
        continue
    if msg.error():
        print("AvroConsumer error: {}".format(msg.error()))
        continue
    print(msg.value(), msg.key())

c.close()
But I get the following error:
confluent_kafka.avro.serializer.SerializerError: Message deserialization failed for message at test_topic2 [0] offset 1: message does not start with magic byte
I think I have missed something in the Go producer part; I would much appreciate it if someone could share their experience on how to fix this issue.
goavro doesn't use the Schema Registry.
Plus, you're using StringEncoder, which I assume outputs only a string and not Avro bytes:
StringEncoder implements the Encoder interface for Go strings so that they can be used as the Key or Value in a ProducerMessage.
FWIW, I would suggest testing the consumer with kafka-avro-console-consumer, if you have it. A sketch of the framing Confluent clients expect follows.
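For background, Confluent consumers expect each value in the registry wire format: one zero magic byte, a 4-byte big-endian schema ID, then the plain Avro binary body. goavro's SingleFromNative instead produces Avro's single-object encoding, which is why the consumer complains about the magic byte. A hedged sketch of producing the Confluent framing in Go (schemaID is assumed to be the ID the registry returned when you registered the schema; needs "encoding/binary"):

// confluentAvroValue frames plain Avro binary data in the Confluent
// wire format: magic byte 0x00 + 4-byte big-endian schema ID + body.
func confluentAvroValue(codec *goavro.Codec, schemaID uint32, native interface{}) ([]byte, error) {
    // BinaryFromNative, not SingleFromNative: Confluent's framing
    // replaces the single-object encoding, it doesn't wrap it.
    body, err := codec.BinaryFromNative(nil, native)
    if err != nil {
        return nil, err
    }
    buf := make([]byte, 5, 5+len(body))
    buf[0] = 0x00 // magic byte
    binary.BigEndian.PutUint32(buf[1:5], schemaID)
    return append(buf, body...), nil
}

The producer would then send the result with sarama.ByteEncoder(value) instead of StringEncoder, and a symmetric consumer strips the first 5 bytes before calling codec.NativeFromBinary.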
