Send data in cloudevents format to Kafka topic - go

Right now I have this code and it works fine. (It sends some json format data to Kafka topic)
j, err := json.Marshal(data)
if err != nil {
log.Fatal(err)
}
msg := &sarama.ProducerMessage{
Topic: tName,
Value: sarama.StringEncoder(j),
}
_, _, err = producer.SendMessage(msg)
but somebody wishes ho have this data in cloudevents format. -> https://github.com/cloudevents/sdk-go
so what should I do because this Event structure can not be directly casted to string.
type Event struct {
Context EventContext
DataEncoded []byte
// DataBase64 indicates if the event, when serialized, represents
// the data field using the base64 encoding.
// In v0.3, this field is superseded by DataContentEncoding
DataBase64 bool
FieldErrors map[string]error
}
so this code won't even compile.
j, err := json.Marshal(data)
if err != nil {
log.Fatal(err)
}
//...
event := cloudevents.NewEvent()
event.SetSource("example/uri")
event.SetType("example.type")
event.SetData(cloudevents.ApplicationJSON, j)
producerMsg := &sarama.ProducerMessage{
Topic: s.outputTopic,
Value: sarama.StringEncoder(event),
}
_, _, err = s.producer.SendMessage(producerMsg)
What should I do to send this Event to Kafka? Try to cast event.DataEncoded to string or something like that?
btw.
Programming language is golang.

Did you see the section of the docs that serialized the event?
https://github.com/cloudevents/sdk-go#serializedeserialize-a-cloudevent
event := cloudevents.NewEvent()
event.SetSource("example/uri")
event.SetType("example.type")
// data here is a map[string] interface{}, or some other Struct type representing the "example.type" schema type above
event.SetData(cloudevents.ApplicationJSON, data)
bytes, err := json.Marshal(event)
if err != nil {
log.Fatal(err)
}
producerMsg := &sarama.ProducerMessage{
Topic: s.outputTopic,
Value: bytes, // you've already encoded the event
}
Otherwise, be sure to look at the sample code provided that uses the CloudEvent client https://github.com/cloudevents/sdk-go/blob/main/samples/kafka/sender/main.go
sender, err := kafka_sarama.NewSender([]string{"127.0.0.1:9092"}, saramaConfig, "test-topic")
if err != nil {
log.Fatalf("failed to create protocol: %s", err.Error())
}
defer sender.Close(context.Background())
c, err := cloudevents.NewClient(sender, cloudevents.WithTimeNow(), cloudevents.WithUUIDs())
if err != nil {
log.Fatalf("failed to create client, %v", err)
}
event := cloudevents.NewEvent()
event.Set...
c.Send(..., event)
...

Related

XACK is not deleting the message, even if it is processed successfully?

I am trying to implement redis stream where we have a producer.
package producer
import (
"RedisStream/models"
"encoding/json"
"fmt"
"github.com/garyburd/redigo/redis"
)
type Producer struct {
streamName string
}
func NewProducer(streamName string) *Producer {
return &Producer{streamName: streamName}
}
func (p *Producer) WriteEvents(conn redis.Conn, key string) {
// Create a new struct
employee := models.Employee{
Name: "ashutosh",
Employer: "self-employee",
}
// Convert struct to JSON
e, _ := json.Marshal(employee)
// Send key and value to Redis stream
_, err := conn.Do("XADD", p.streamName, "*", key, e)
if err != nil {
fmt.Println(err)
}
fmt.Println("Successfully sent data to Redis stream")
}
then I have implemented a consumer
func (c *Consumer) ReadEventsCons1() {
// Connect to Redis
conn, err := redis.Dial("tcp", ":6379")
if err != nil {
fmt.Println(err)
return
}
defer conn.Close()
for {
// Read key and value from Redis stream
reply, err := conn.Do("XREADGROUP", "GROUP", c.groupName[0], "ashu", "COUNT", "1", "STREAMS", c.streamName, ">")
vs, err := redis.Values(reply, err)
if err != nil {
if errors.Is(err, redis.ErrNil) {
continue
}
fmt.Printf("Error: %+v", err)
}
// Get the first and only value in the array since we're only
// reading from one stream "some-stream-name" here.
vs, err = redis.Values(vs[0], nil)
if err != nil {
fmt.Printf("Error: %+v", err)
}
// Ignore the stream name as the first value as we already have
// that in hand! Just get the second value which is guaranteed to
// exist per the docs, and parse it as some stream entries.
res, err := entries(vs[1], nil)
if err != nil {
fmt.Errorf("error parsing entries: %w", err)
}
for _, val := range res {
for k, v := range val.Fields {
empl := &models.Employee{}
_ = json.Unmarshal(v, empl)
fmt.Printf("From Consumer Ashu: Key: %s and val: %+v \n", k, empl)
}
reply, err := redis.Int(conn.Do("XACK", c.streamName, c.groupName[0], val.ID))
if reply != 1 {
fmt.Printf("failed to ack: err: %+v", err)
}
}
}
}
Once a consumer from a consumergroup successfully processed a message, I sent acknowledgement to redis.But messages still resides in redis stream. because post running
XLEN streamName
I can see length is growing. This may create memory challenge, since messages are residing in perpetuity. Is there any intelligent way to handle this issue?

Google PubSub and Go: create client outside or inside publish-function?

I'm new when it comes to Google PubSub(and pubsub applications in general). I'm also relatively new when it comes to Go.
I'm working on a pretty heavy backend service application that already has too many responsibilities. The service needs to fire off one message for each incoming request to a Google PubSub topic. It only needs to "fire and forget". If something goes wrong with the publishing, nothing will happen. The messages are not crucial(only used for analytics), but there will be many of them. We estimate between 50 and 100 messages per second for most of the day.
Now to the code:
func(p *publisher) Publish(message Message, log zerolog.Logger) error {
ctx := context.Background()
client, err := pubsub.NewClient(ctx, p.project)
defer client.Close()
if err != nil {
log.Error().Msgf("Error creating client: %v", err)
return err
}
marshalled, _ := json.Marshal(message)
topic := client.Topic(p.topic)
result := topic.Publish(ctx, &pubsub.Message{
Data: marshalled,
})
_, err = result.Get(ctx)
if err != nil {
log.Error().Msgf("Failed to publish message: %v", err)
return err
}
return nil
}
Disclaimer: p *publisher only contains configuration.
I wonder if this is the best way? Will this lead to the service creating and closing a client 100 times per second? If so, then I guess I should create the client once and pass it as an argument to the Publish()-function instead?
This is how the Publish()-function gets called:
defer func(publisher publish.Publisher, message Message, log zerolog.Logger) {
err := publisher.Publish(log, Message)
if err != nil {
log.Error().Msgf("Failed to publish message: %v", err)
}
}(publisher, message, logger,)
Maybe the way to go is to hold pubsubClient & pubsubTopic inside struct?
type myStruct struct {
pubsubClient *pubsub.Client
pubsubTopic *pubsub.Topic
logger *yourLogger.Logger
}
func newMyStruct(projectID string) (*myStruct, error) {
ctx := context.Background()
pubsubClient, err := pubusb.NewClient(ctx, projectID)
if err != nil {...}
pubsubTopic := pubsubClient.Topic(topicName)
return &myStruct{
pubsubClient: pubsubClient,
pubsubTopic: pubsubTopic,
logger: Logger,
// and whetever you want :D
}
}
And then for that struct create a method, which will take responsibility of marshalling the msg and sends it to Pub/sub
func (s *myStruct) request(ctx context.Context data yorData) {
marshalled, err := json.Marshal(message)
if err != nil {..}
res := s.pubsubTopic.Publish(ctx, &pubsub.Message{
Data: marshalled,
})
if _, err := res.Get(ctx); err !=nil {..}
return nil
}

kafka retry many times when i download large file

I am newbie in kafka, i try build a service send mail with attach files.
Execution flow:
Kafka will receive a message to send mail
function get file will download file from url , scale image, and save file
when send mail i will get files from folder and attach to form
Issues:
when i send mail with large files many times , kafka retry many times, i will receive many mail
kafka error: "kafka server: The provided member is not known in the current generation"
I listened MaxProcessingTime , but i try to test a mail with large file, it still work fine
Kafka info : 1 broker , 3 consumer
func (s *customerMailService) SendPODMail() error { filePaths, err := DownloadFiles(podURLs, orderInfo.OrderCode)
if err != nil{
countRetry := 0
for countRetry <= NUM_OF_RETRY{
filePaths, err = DownloadFiles(podURLs, orderInfo.OrderCode)
if err == nil{
break
}
countRetry++
}
}
err = s.sendMailService.Send(ctx, orderInfo.CustomerEmail, tmsPod, content,filePaths)}
function download file :
func DownloadFiles(files []string, orderCode string) ([]string, error) {
var filePaths []string
err := os.Mkdir(tempDir, 0750)
if err != nil && !os.IsExist(err) {
return nil, err
}
tempDirPath := tempDir + "/" + orderCode
err = os.Mkdir(tempDirPath, 0750)
if err != nil && !os.IsExist(err) {
return nil, err
}
for _, fileUrl := range files {
fileUrlParsed, err := url.ParseRequestURI(fileUrl)
if err != nil {
logrus.WithError(err).Infof("Pod url is invalid %s", orderCode)
return nil, err
}
extFile := filepath.Ext(fileUrlParsed.Path)
dir, err := os.MkdirTemp(tempDirPath, "tempDir")
if err != nil {
return nil, err
}
f, err := os.CreateTemp(dir, "tmpfile-*"+extFile)
if err != nil {
return nil, err
}
defer f.Close()
response, err := http.Get(fileUrl)
if err != nil {
return nil, err
}
defer response.Body.Close()
contentTypes := response.Header["Content-Type"]
isTypeAllow := false
for _, contentType := range contentTypes {
if contentType == "image/png" || contentType == "image/jpeg" {
isTypeAllow = true
}
}
if !isTypeAllow {
logrus.WithError(err).Infof("Pod image type is invalid %s", orderCode)
return nil, errors.New("Pod image type is invalid")
}
decodedImg, err := imaging.Decode(response.Body)
if err != nil {
return nil, err
}
resizedImg := imaging.Resize(decodedImg, 1024, 0, imaging.Lanczos)
imaging.Save(resizedImg, f.Name())
filePaths = append(filePaths, f.Name())
}
return filePaths, nil}
function send mail
func (s *tikiMailService) SendFile(ctx context.Context, receiver string, templateCode string, data interface{}, filePaths []string) error {
path := "/v1/emails"
fullPath := fmt.Sprintf("%s%s", s.host, path)
formValue := &bytes.Buffer{}
writer := multipart.NewWriter(formValue)
_ = writer.WriteField("template", templateCode)
_ = writer.WriteField("to", receiver)
if data != nil {
b, err := json.Marshal(data)
if err != nil {
return errors.Wrapf(err, "Cannot marshal mail data to json with object %+v", data)
}
_ = writer.WriteField("params", string(b))
}
for _, filePath := range filePaths {
part, err := writer.CreateFormFile(filePath, filepath.Base(filePath))
if err != nil {
return err
}
pipeReader, pipeWriter := io.Pipe()
go func() {
defer pipeWriter.Close()
file, err := os.Open(filePath)
if err != nil {
return
}
defer file.Close()
io.Copy(pipeWriter, file)
}()
io.Copy(part, pipeReader)
}
err := writer.Close()
if err != nil {
return err
}
request, err := http.NewRequest("POST", fullPath, formValue)
if err != nil {
return err
}
request.Header.Set("Content-Type", writer.FormDataContentType())
resp, err := s.doer.Do(request)
if err != nil {
return errors.Wrap(err, "Cannot send request to send email")
}
defer resp.Body.Close()
b, err := ioutil.ReadAll(resp.Body)
if err != nil {
return err
}
if resp.StatusCode != http.StatusOK {
return errors.New(fmt.Sprintf("Send email with code %s error: status code %d, response %s",
templateCode, resp.StatusCode, string(b)))
} else {
logrus.Infof("Send email with attachment ,code %s success with response %s , box-code", templateCode, string(b),filePaths)
}
return nil
}
Thank
My team found my problem when I redeploy k8s pods, which lead to conflict leader partition causing rebalance. It will try to process the remaining messages in buffer of pods again.
Solution: I don't fetch many messages saved in buffer , I just get a message and process it by config :
ChannelBufferSize = 0
Example conflict leader parition:
consumer A and B startup in the same time
consumer A registers itself as leader, and owns the topic with all partitions
consumer B registers itself as leader, and then begins to rebalance and owns all partitions
consumer A rebalance and obtains all partitions, but can not consume because the memberId is old and need a new one
consumer B rebalance again and owns the topic with all partitions, but it's already obtained by consumer A
My two cents: in case of very big attachments, the consumer takes quite a lot of time to read the file and to send it as an attachment.
This increases the amount of time between two poll() calls. If that time is greater than max.poll.interval.ms, the consumer is thought to be failed and the partition offset is not committed. As a result, the message is processed again and eventually, if by chance the execution time stays below the poll interval, the offset is committed. The effect is a multiple email send.
Try increasing the max.poll.interval.ms on the consumer side.

Golang bufio from websocket breaking after first read

I am trying to stream JSON text from a websocket. However after an initial read I noticed that the stream seems to break/disconnect. This is from a Pleroma server (think: Mastodon). I am using the default Golang websocket library.
package main
import (
"bufio"
"fmt"
"log"
"golang.org/x/net/websocket"
)
func main() {
origin := "https://poa.st/"
url := "wss://poa.st/api/v1/streaming/?stream=public"
ws, err := websocket.Dial(url, "", origin)
if err != nil {
log.Fatal(err)
}
s := bufio.NewScanner(ws)
for s.Scan() {
line := s.Text()
fmt.Println(line)
}
}
After the initial JSON text response, the for-loop breaks. I would expect it to send a new message every few seconds.
What might be causing this? I am willing to switch to the Gorilla websocket library if I can use it with bufio.
Thanks!
Although x/net/websocket connection has a Read method with the same signature as the Read method in io.Reader, the connection does not work like an io.Reader. The connection will not work as you expect when wrapped with a bufio.Scanner.
The poa.st endpoint sends a stream of messages where each message is a JSON document. Use the following code to read the messages using the Gorilla package:
url := "wss://poa.st/api/v1/streaming/?stream=public"
ws, _, err := websocket.DefaultDialer.Dial(url, nil)
if err != nil {
log.Fatal(err)
}
defer ws.Close()
for {
_, p, err := ws.ReadMessage()
if err != nil {
log.Fatal(err)
}
// p is a []byte containing the JSON document.
fmt.Printf("%s\n", p)
}
The Gorilla package has a helper method for decoding JSON messages. Here's an example of how to use that method.
url := "wss://poa.st/api/v1/streaming/?stream=public"
ws, _, err := websocket.DefaultDialer.Dial(url, nil)
if err != nil {
log.Fatal(err)
}
defer ws.Close()
for {
// The JSON documents are objects containing two fields,
// the event type and the payload. The payload is a JSON
// document itself.
var e struct {
Event string
Payload string
}
err := ws.ReadJSON(&e)
if err != nil {
log.Fatal(err)
}
// TODO: decode e.Payload based on e.Event
}

How to use self-describing message for protbuf

One of the use cases I'm working on while using protocol buffers is to deserialize the Protocol Buffers Kafka messages which I receive at the consumer end (using sarama library and Go).
The way how i'm doing currently is i defined the sample pixel.proto file as show below.
syntax = "proto3";
package saramaprotobuf;
message Pixel {
// Session identifier stuff
string session_id = 2;
}
i'm sending the message through sarama.Producer(by marshalling it) receiving it sarama.Consumer (unmarshalling message it by referencing with complied pixel.proto.pb). Code is as below.
import (
"github.com/Shopify/sarama"
"github.com/golang/protobuf/proto"
"log"
"os"
"os/signal"
"protobuftest/example"
"syscall"
"time"
)
func main() {
topic := "test_topic"
brokerList := []string{"localhost:9092"}
producer, err := newSyncProducer(brokerList)
if err != nil {
log.Fatalln("Failed to start Sarama producer:", err)
}
go func() {
ticker := time.NewTicker(time.Second)
for {
select {
case t := <-ticker.C:
elliot := &example.Pixel{
SessionId: t.String(),
}
pixelToSend := elliot
pixelToSendBytes, err := proto.Marshal(pixelToSend)
if err != nil {
log.Fatalln("Failed to marshal example:", err)
}
msg := &sarama.ProducerMessage{
Topic: topic,
Value: sarama.ByteEncoder(pixelToSendBytes),
}
producer.SendMessage(msg)
log.Printf("Pixel sent: %s", pixelToSend)
}
}
}()
signals := make(chan os.Signal, 1)
signal.Notify(signals, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM)
partitionConsumer, err := newPartitionConsumer(brokerList, topic)
if err != nil {
log.Fatalln("Failed to create Sarama partition consumer:", err)
}
log.Println("Waiting for messages...")
for {
select {
case msg := <-partitionConsumer.Messages():
receivedPixel := &example.Pixel{}
err := proto.Unmarshal(msg.Value, receivedPixel)
if err != nil {
log.Fatalln("Failed to unmarshal example:", err)
}
log.Printf("Pixel received: %s", receivedPixel)
case <-signals:
log.Print("Received termination signal. Exiting.")
return
}
}
}
func newSyncProducer(brokerList []string) (sarama.SyncProducer, error) {
config := sarama.NewConfig()
config.Producer.RequiredAcks = sarama.WaitForAll
config.Producer.Retry.Max = 5
config.Producer.Return.Successes = true
// TODO configure producer
producer, err := sarama.NewSyncProducer(brokerList, config)
if err != nil {
return nil, err
}
return producer, nil
}
func newPartitionConsumer(brokerList []string, topic string) (sarama.PartitionConsumer, error) {
conf := sarama.NewConfig()
// TODO configure consumer
consumer, err := sarama.NewConsumer(brokerList, conf)
if err != nil {
return nil, err
}
partitionConsumer, err := consumer.ConsumePartition(topic, 0, sarama.OffsetOldest)
if err != nil {
return nil, err
}
return partitionConsumer, err
}
In the code as you can see I have imported the .proto file and referencing it in the main function inorder to send and receive the message. The problem here is, the solution is not generic. I will receive the message of different .proto type at the consumer end.
How can I make it generic? I know there is something called as self-describing message(dynamic message) as the part of protobuf. I referred this link https://developers.google.com/protocol-buffers/docs/techniques?csw=1#self-description . But it doesn't has any explaination on how to embed this as the part of pixel.proto(example which i have used) so that at the consumer end i came directly deserialize it to required type.
You would define a generic container message type that would include a DescriptorSet and an Any fields.
When sending, you build an instance of that generic message type, setting the field of type Any with an instance of your Pixel message and setting the DescriptorSet field with the DescriptorSet of the Pixel type.
That would allow the receiver of such message to parse the Any contents using the DescriptorSet you are attaching. In practical terms, this is sending a piece of proto definition together with the message. So receivers wouldn't need pre-shared proto definitions or generated code.
Having said that, I'm not sure this is what you really want because if you are planning to share proto definitions or generated code with clients then I'd suggest simply using a oneof field in a container type would be much simpler to use.

Resources