Kafka Error Handling using Shopify Sarama - go

So I am trying to use Kafka for my application, which has a producer logging actions into the Kafka MQ and a consumer which reads them off the MQ. Since my application is in Go, I am using the Shopify Sarama library to make this possible.
Right now, I'm able to read off the MQ and print the message contents using a
fmt.Printf
However, I would really like the error handling to be better than console printing, and I am willing to go the extra mile.
Code right now for consumer connection:
mqCfg := sarama.NewConfig()
master, err := sarama.NewConsumer([]string{brokerConnect}, mqCfg)
if err != nil {
panic(err) // Don't want to panic when error occurs, instead handle it
}
And the processing of messages:
go func() {
defer wg.Done()
for message := range consumer.Messages() {
var msgContent Message
_ = json.Unmarshal(message.Value, &msgContent)
fmt.Printf("Reading message of type %s with id : %d\n", msgContent.Type, msgContent.ContentId) //Don't want to print it
}
}()
My questions (I am new to testing Kafka and to Kafka in general):
Where could errors occur in the above program, so that I can handle them? Any sample code will be great for me to start with. The error conditions I could think of are when the msgContent doesn't really contain any Type or ContentId fields in the JSON.
In Kafka, are there situations when the consumer is trying to read at the current offset but for some reason is not able to (even when the JSON is well formed)? Is it possible for my consumer to backtrack to, say, x steps above the failed offset read and re-process the offsets? Or is there a better way to do this? Again, what could these situations be?
I'm open to reading and trying things.

Regarding 1) Check where I log error messages below. This is more or less what I would do.
Regarding 2) I don't know about trying to walk backwards in a topic. It's certainly possible by just creating a consumer over and over, with its starting offset decremented each time, but I wouldn't advise it, as most likely you'll end up replaying the same message over and over. I do advise saving your offset often so you can recover if things go south.
Below is code that I believe addresses most of your questions. I haven't tried compiling this, and the Sarama API has been changing lately, so it may currently differ a bit.
func StartKafkaReader(wg *sync.WaitGroup, lastgoodoff int64, out chan<- *Message) error {
    wg.Add(1)
    go func() {
        defer wg.Done()
        // Track the last known good offset we processed, which is
        // updated after each successfully processed event.
        saveprogress := func(off int64) {
            // Save the offset somewhere...a file...
            // I've also used Kafka to store progress,
            // using a special topic as a WAL.
        }
        // Deferred via a closure so the final value of lastgoodoff is saved on exit.
        defer func() { saveprogress(lastgoodoff) }()
        client, err := sarama.NewClient("clientId", brokers, sarama.NewClientConfig())
        if err != nil {
            log.Error(err)
            return
        }
        defer client.Close()
        consumerConfig := sarama.NewConsumerConfig()
        consumerConfig.OffsetMethod = sarama.OffsetMethodManual
        consumerConfig.OffsetValue = lastgoodoff
        consumer, err := sarama.NewConsumer(client, topic, partition, "consumerId", consumerConfig)
        if err != nil {
            log.Error(err)
            return
        }
        defer consumer.Close()
        for {
            select {
            case event := <-consumer.Events():
                if event.Err != nil {
                    log.Error(event.Err)
                    return
                }
                msgContent := &Message{}
                err = json.Unmarshal(event.Value, msgContent)
                if err != nil {
                    log.Error(err)
                    continue // continue to skip this message, or return to stop without updating the offset
                }
                // Send the message on to be processed.
                out <- msgContent
                lastgoodoff = event.Offset
            }
        }
    }()
    return nil
}
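One way the saveprogress stub above could be filled in, sketched under the assumption that a local state file is acceptable (the file name is just an example, and reading it back at startup with strconv.ParseInt is left out):
saveprogress := func(off int64) {
    // Persist the last good offset to a small state file so a restart
    // can resume from it.
    if err := os.WriteFile("consumer.offset", []byte(strconv.FormatInt(off, 10)), 0644); err != nil {
        log.Error(err)
    }
}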

Related

golang confluent kafka consumer doesn't receive messages

I was trying to create a simple producer/consumer Kafka pair. The producer works successfully, following the examples on Confluent's GitHub page, but I had trouble implementing the consumer. I use a cloud Kafka broker, Cloudkarafka. The consumer.go code is below:
func main() {
config := &kafka.ConfigMap{
"metadata.broker.list": "XXXXXXX", // 3 hosts Cloudkarafka provides to me
"security.protocol": "SASL_SSL",
"sasl.mechanisms": "SCRAM-SHA-256",
"sasl.username": "XXXXXXXX", // My username provided by Cloudkarafka
"sasl.password": "XXXXXXXX", // My password provided by
"group.id": "cloudkarafka-example",
"go.events.channel.enable": true,
"go.application.rebalance.enable": true,
"default.topic.config": kafka.ConfigMap{"auto.offset.reset": "earliest"},
//"debug": "generic,broker,security",
}
topic := "XXXXXX" + "A" // username + "A"
consumer, err := kafka.NewConsumer(config)
if err != nil {
panic(fmt.Sprintf("Failed to create consumer: %s", err))
}
topics := []string{topic}
//consumer.SubscribeTopics(topics, nil)
err = consumer.SubscribeTopics(topics, nil)
run := true
for run == true {
ev := consumer.Poll(0)
switch e := ev.(type) {
case *kafka.Message:
fmt.Printf("%% Message on %s:\n%s\n",
e.TopicPartition, string(e.Value))
case kafka.PartitionEOF:
fmt.Printf("%% Reached %v\n", e)
case kafka.Error:
fmt.Fprintf(os.Stderr, "%% Error: %v\n", e)
run = false
default:
fmt.Printf("Ignored %v\n", e)
}
}
consumer.Close()
}
The problem I get is that even though I produce messages to the same topic, the consumer always stays in the default case and constantly prints "Ignored <nil>". Since I am a beginner with these topics, any help & suggestions would be appreciated.
PS: I use Windows 11; the details say "confluent-kafka-go is not supported on Windows", but the code runs and just stays in the default case, and the producer part works fine.
producer.go:
config := &kafka.ConfigMap{
"metadata.broker.list": "XXXXXXXXXX",
"security.protocol": "SASL_SSL",
"sasl.mechanisms": "SCRAM-SHA-256",
"sasl.username": "XXXXXXXXX",
"sasl.password": "XXXXXXXXX",
"group.id": "cloudkarafka-example",
"default.topic.config": kafka.ConfigMap{"auto.offset.reset": "earliest"},
//"debug": "generic,broker,security",
}
topic := "XXXXX-" + "A"
p, err := kafka.NewProducer(config)
if err != nil {
fmt.Printf("Failed to create producer: %s\n", err)
os.Exit(1)
}
fmt.Printf("Created Producer %v\n", p)
deliveryChan := make(chan kafka.Event)
for i := 0; i < 10; i++ {
value := fmt.Sprintf("[%d] Hello Go!", i+1)
err = p.Produce(&kafka.Message{TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny}, Value: []byte(value)}, deliveryChan)
e := <-deliveryChan
m := e.(*kafka.Message)
if m.TopicPartition.Error != nil {
fmt.Printf("Delivery failed: %v\n", m.TopicPartition.Error)
} else {
fmt.Printf("Delivered message to topic %s [%d] at offset %v\n",
*m.TopicPartition.Topic, m.TopicPartition.Partition, m.TopicPartition.Offset)
}
}
close(deliveryChan)
Poll() will return nil on timeout. Since you are specifying a timeout of 0ms, I suspect that what you are seeing is the behaviour of a consumer with no messages to consume.
i.e. you are asking it to wait 0ms for new messages; there are never any new messages, so the Poll() call immediately returns nil every time, all the time. Without a specific nil case, these are handled by your default case.
Are you SURE you are producing messages to the same topic your consumer is subscribed to? As Andrey pointed out in his comment, either your topic IDs are different or you have obfuscated them differently in your consumer vs producer example code. It may be more helpful to first reproduce your problem with a configuration that does not require obfuscation, to avoid uncertainty on such points.
Are you getting any error from Subscribe() (why aren't you checking this)?
How long have you waited to see messages consumed? It can take a few seconds for a broker to accept a new consumer into a group; with a 0ms timeout, you may see lots of "no message" events before you eventually start receiving any waiting messages.
For a minimal, working example, I'd suggest keeping things as simple as possible:
You don't need to configure go.events.channel.enable if you are using Poll() to read messages
You don't need to configure go.application.rebalance.enable if you aren't interested in, and don't need to modify, initial offsets.
If you aren't interested in events such as PartitionEOF (and you likely aren't), then you might want to consider using the higher-level consumer.ReadMessage() function rather than Poll(); ReadMessage returns only messages or errors and ignores all other events (see the sketch after this list).
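To illustrate, here is a minimal sketch of such a ReadMessage-based loop. It is an assumption-laden example rather than anything Cloudkarafka-specific: the broker list and topic are placeholders, the SASL settings from your config are omitted for brevity, and the import path may differ depending on your confluent-kafka-go version (v2 uses github.com/confluentinc/confluent-kafka-go/v2/kafka):
package main

import (
    "fmt"
    "time"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
    consumer, err := kafka.NewConsumer(&kafka.ConfigMap{
        "bootstrap.servers": "host1:9094,host2:9094", // placeholder brokers
        "group.id":          "cloudkarafka-example",
        "auto.offset.reset": "earliest",
    })
    if err != nil {
        panic(err)
    }
    defer consumer.Close()

    // Always check the error from SubscribeTopics.
    if err := consumer.SubscribeTopics([]string{"your-topic"}, nil); err != nil {
        panic(err)
    }

    for {
        // Wait up to a second for a message instead of spinning with Poll(0).
        msg, err := consumer.ReadMessage(1 * time.Second)
        if err != nil {
            // A timeout just means nothing arrived yet; keep polling.
            if kerr, ok := err.(kafka.Error); ok && kerr.Code() == kafka.ErrTimedOut {
                continue
            }
            fmt.Printf("consumer error: %v\n", err)
            continue
        }
        fmt.Printf("message on %s: %s\n", msg.TopicPartition, string(msg.Value))
    }
}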

Streaming values from websocket, determining if I am lagging behind processing the data

I am connecting to a websocket that streams live stock trades.
I have to read the prices, perform calculations on the fly and based on these calculations make another API call e.g. buy or sell.
I want to ensure my calculations/processing doesn't slow down my ability to stream in all the live data.
What is a good design pattern to follow for this type of problem?
Is there a way to log/warn in my system to know if I am falling behind?
Falling behind means: the websocket is sending price data, and I am not able to process that data as it comes in and it is lagging behind.
While doing the c.ReadJSON and then passing the message to my channel, there might be a delay from deserializing the JSON.
Once a message is pulled off the channel and processed (calculating formulas and sending another API request to buy/sell), this adds further delay.
How can I prevent lags/delays, and also monitor whether there is in fact a delay?
func main() {
c, _, err := websocket.DefaultDialer.Dial("wss://socket.example.com/stocks", nil)
if err != nil {
panic(err)
}
defer c.Close()
// Buffered channel to account for bursts or spikes in data:
chanMessages := make(chan interface{}, 10000)
// Read messages off the buffered queue:
go func() {
for msgBytes := range chanMessages {
logrus.Info("Message Bytes: ", msgBytes)
}
}()
// As little logic as possible in the reader loop:
for {
var msg interface{}
err := c.ReadJSON(&msg)
if err != nil {
panic(err)
}
chanMessages <- msg
}
}
You can read bytes, pass them to the channel, and use other goroutines to do conversion.
I worked on a similar crypto market bot. Instead of creating a large buffered channel, I created a buffered channel with a capacity of 1 and used a select statement for sending socket data to the channel.
Here is the example
var wg sync.WaitGroup
msg := make(chan []byte, 1)
wg.Add(1)
go func() {
defer wg.Done()
for data := range msg {
// decode and process data
}
}()
for {
_, data, err := c.ReadMessage()
if err != nil {
log.Println("read error: ", err)
return
}
select {
case msg <- data: // in case channel is free
default: // if not, next time will try again with latest data
}
}
This will ensure that you get the latest data when you are ready to process it.
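If you also want to log when you are falling behind (per the original question), one option is to count the messages the non-blocking send has to discard. A minimal sketch, reusing the connection c and the 1-slot channel msg from the example above, and assuming dropping stale ticks is acceptable for your use case:
// readLoop only adds drop counting to the loop shown above.
func readLoop(c *websocket.Conn, msg chan<- []byte) {
    var dropped int64
    for {
        _, data, err := c.ReadMessage()
        if err != nil {
            log.Println("read error: ", err)
            return
        }
        select {
        case msg <- data: // consumer kept up; nothing to report
        default:
            // Consumer was still busy: we are lagging behind the feed.
            dropped++
            if dropped%100 == 1 { // throttle the warning
                log.Printf("lagging: dropped %d messages so far", dropped)
            }
        }
    }
}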

Goroutine safe channel close doesn't actually close websocket

This one is a tricky issue that bugs me quite a bit.
Essentially, I wrote an integration microservice that provides data streams from the Binance crypto exchange using the Go client. A client sends a start message, which starts a data stream for a symbol, and at some point sends a close message to stop the stream. My implementation looks basically like this:
func (c BinanceClient) StartDataStream(clientType bn.ClientType, symbol, interval string) error {
switch clientType {
case bn.SPOT_LIVE:
wsKlineHandler := c.handlers.klineHandler.SpotKlineHandler
wsErrHandler := c.handlers.klineHandler.ErrHandler
_, stopC, err := binance.WsKlineServe(symbol, interval, wsKlineHandler, wsErrHandler)
if err != nil {
fmt.Println(err)
return err
} else {
c.state.clientSymChanMap[clientType][symbol] = stopC
return nil
}
...
}
The clientSymChanMap stores the stopChannel in a nested hashmap so that I can retrieve the stop channel later to stop the data feed. The stop function has been implemented accordingly:
func (c BinanceClient) StopDataStream(clientType bn.ClientType, symbol string) {
//mtd := "StopDataStream: "
stopC := c.state.clientSymChanMap[clientType][symbol]
if isClosed(stopC) {
DbgPrint(" Channel is already closed. Do nothing for: " + symbol)
} else {
close(stopC)
}
// Delete channel from the map otherwise the next StopAll throws a NPE due to closing a dead channel
delete(c.state.clientSymChanMap[clientType], symbol)
return
}
To prevent panics from already closed channels, I use a check function that returns true in case the channel is already closed.
func isClosed(ch <-chan struct{}) bool {
select {
case <-ch:
return true
default:
}
return false
}
Looks nice, but there's a catch. When I run the code with a data stream for just one symbol, it starts and closes the data feed exactly as expected.
However, when starting multiple data feeds, the above code somehow never closes the websocket and just keeps streaming data forever. Without the isClosed check, I get panics from trying to close a closed channel, but with the check in place, well, nothing gets closed.
When looking at the implementation of the above binance.WsKlineServe function, it's quite obvious that it just wraps a new websocket with each invocation and then returns the done & stop channel.
The documentation gives the following usage example:
wsKlineHandler := func(event *binance.WsKlineEvent) {
fmt.Println(event)
}
errHandler := func(err error) {
fmt.Println(err)
}
doneC, stopC, err := binance.WsKlineServe("LTCBTC", "1m", wsKlineHandler, errHandler)
if err != nil {
fmt.Println(err)
return
}
<-doneC
Because the doneC channel actually blocks, I removed it and thought that storing the stopC channel and then using it later to stop the data feed would work. However, it only does so for a single instance. When multiple streams are open, this doesn't work anymore.
Any idea why that's the case and how to fix it?
Firstly, this is dangerous:
if isClosed(stopC) {
DbgPrint(" Channel is already closed. Do nothing for: " + symbol)
} else {
close(stopC) // <- can't be sure channel is still open
}
There is no guarantee that after your polling check of the channel state, the channel will still be in that same state on the next line of code. So this code could in theory panic if it's called concurrently.
If you want an asynchronous action to occur on the channel close - it's best to do this explicitly from its own goroutine. So you could try this:
go func() {
stopC := c.state.clientSymChanMap[clientType][symbol]
<-stopC
// stopC definitely closed now
delete(c.state.clientSymChanMap[clientType], symbol)
}()
P.S. You do need some sort of mutex on your map, since the delete is asynchronous - you need to ensure any adds to the map don't data race with this.
P.P.S. Channels are reclaimed by the GC when they go out of scope. If you are no longer reading from a channel, it does not need to be explicitly closed to be reclaimed by the GC.
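A minimal sketch of such locking, assuming the map and mutex live on the client's state struct (the field and method names here are illustrative, not from the original code):
type state struct {
    mu               sync.Mutex
    clientSymChanMap map[bn.ClientType]map[string]chan struct{}
}

// addStream registers a stop channel under the mutex.
func (s *state) addStream(ct bn.ClientType, symbol string, stopC chan struct{}) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.clientSymChanMap[ct][symbol] = stopC
}

// removeStream deletes the entry; safe to call from the goroutine
// that waits on <-stopC.
func (s *state) removeStream(ct bn.ClientType, symbol string) {
    s.mu.Lock()
    defer s.mu.Unlock()
    delete(s.clientSymChanMap[ct], symbol)
}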
Using channels for stopping a goroutine or closing something is very tricky. There are lots of things you can do wrong or forget to do.
context.WithCancel abstracts that complexity away, making the code more readable and maintainable.
Some code snippets:
ctx, cancel := context.WithCancel(context.TODO())
TheThingToCancel(ctx, ...)
// Whenever you want to stop TheThingToCancel. Can be called multiple times.
cancel()
Then in a for loop you'd often have a select like this:
for {
select {
case <-ctx.Done():
return
default:
}
// do stuff
}
Here some code that is closer to your specific case of an open connection:
func TheThingToCancel(ctx context.Context) (context.CancelFunc, error) {
ctx, cancel := context.WithCancel(ctx)
conn, err := net.Dial("tcp", ":12345")
if err != nil {
cancel()
return nil, err
}
go func() {
<-ctx.Done()
_ = conn.Close()
}()
go func() {
defer func() {
_ = conn.Close()
// make sure context is always cancelled to avoid goroutine leak
cancel()
}()
var bts = make([]byte, 1024)
for {
n, err := conn.Read(bts)
if err != nil {
return
}
fmt.Println(bts[:n])
}
}()
return cancel, nil
}
It returns the cancel function so the connection can be closed from the outside.
Cancelling a context can be done many times over without the panic that would occur if a channel were closed multiple times; that is one advantage. You can also derive contexts from other contexts, and thereby stop many routines at once by cancelling a parent context. Carefully designed, this is very powerful for shutting down groups of related routines that also need to be stoppable individually (see the sketch below).
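For example, a minimal sketch of one parent context for the whole service with per-symbol child contexts (startStreams and the work callback are hypothetical names, and the standard context package is assumed to be imported):
// startStreams starts one goroutine per symbol, each watching its own child
// context derived from a single parent. It returns per-symbol cancels plus
// a cancel that stops everything at once.
func startStreams(symbols []string, work func(ctx context.Context, symbol string)) (map[string]context.CancelFunc, context.CancelFunc) {
    parent, stopAll := context.WithCancel(context.Background())
    stopOne := make(map[string]context.CancelFunc, len(symbols))
    for _, s := range symbols {
        ctx, cancel := context.WithCancel(parent) // child of the shared parent
        stopOne[s] = cancel
        go work(ctx, s)
    }
    return stopOne, stopAll
}
Calling stopOne["LTCBTC"]() stops a single stream, while stopAll() cancels the parent and with it every derived child.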

Golang ending (Binance) web service stream using goroutine

I'm integrating the Binance API into an existing system and, while most parts are straightforward, the data streaming API hits my limited understanding of goroutines. I don't believe there is anything special in the golang SDK for Binance, but essentially I only need two functions: one that starts the data stream and processes events with the event handler given as a parameter, and a second one that ends the data stream without actually shutting down the client, as that would close all other connections. On a previous project there were two message types for this, but the Binance SDK uses an implementation that returns two Go channels, one that signals when the stream is done and another one, I guess from the name, for stopping the data stream.
The code I wrote for starting the data stream looks like this:
func startDataStream(symbol, interval string, wsKlineHandler futures.WsKlineHandler, errHandler futures.ErrHandler) (err error){
_, _, err := futures.WsKlineServe(symbol, interval, wsKlineHandler, errHandler) // doneC and stopC are discarded here for now
if err != nil {
fmt.Println(err)
return err
}
return nil
}
This works as expected and streams data. A simple test verifies it:
func runWSDataTest() {
symbol := "BTCUSDT"
interval := "15m"
errHandler := func(err error) {fmt.Println(err)}
wsKlineHandler := func(event *futures.WsKlineEvent) {fmt.Println(event)}
_ = startDataStream(symbol, interval, wsKlineHandler, errHandler)
}
The thing that is not so clear to me, mainly due to incomplete understanding, is how I stop the stream. I think the returned stopC channel can be used to somehow issue an end signal, similar to, say, a SIGTERM at the system level, and then the stream should end.
Say, I have a stopDataStream function that takes a symbol as an argument
func stopDataStream(symbol string) {
}
Let's suppose I start five data streams for five symbols and now I want to stop just one of the streams. That raises the questions:
How do I track all those stopC channels?
Can I use a collection keyed with the symbol, pull the stopC channel, and then just issue a signal to end just that data stream?
How do I actually write into the stopC channel from the stop function?
Again, I don't think this is particularly hard; it's just that I could not figure it out from the docs yet, so any help would be appreciated.
Thank you
(Answer originally written by #Marvin.Hansen)
It turned out that just saving & closing the channel solved it all. I was really surprised how easy this is, but here is the code of the updated functions:
func startDataStream(symbol, interval string, wsKlineHandler futures.WsKlineHandler, errHandler futures.ErrHandler) (err error) {
_, stopC, err := futures.WsKlineServe(symbol, interval, wsKlineHandler, errHandler)
if err != nil {
fmt.Println(err)
return err
}
// just save the stop channel
chanMap[symbol] = stopC
return nil
}
And then, the stop function really becomes embarrassingly trivial:
func stopDataStream(symbol string) {
stopC := chanMap[symbol] // load the stop channel for the symbol
close(stopC) // just close it.
}
Finally, testing it all out:
var (
chanMap map[string]chan struct{}
)
func runWSDataTest() {
chanMap = make(map[string]chan struct{})
symbol := "BTCUSDT"
interval := "15m"
errHandler := func(err error) { fmt.Println(err) }
wsKlineHandler := getKLineHandler()
println("Start stream")
_ = startDataStream(symbol, interval, wsKlineHandler, errHandler)
time.Sleep(3 * time.Second)
println("Stop stream")
stopDataStream(symbol)
time.Sleep(1 * time.Second)
}
This is it.
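One caveat worth adding to this answer: if stopDataStream is called for a symbol that was never started (or that has already been stopped), chanMap[symbol] yields a nil channel and close(nil) panics. A guarded variant, offered only as a sketch on top of the code above:
func stopDataStream(symbol string) {
    stopC, ok := chanMap[symbol]
    if !ok {
        return // unknown or already-stopped symbol; nothing to do
    }
    close(stopC)
    delete(chanMap, symbol) // forget the channel so a repeated call is a no-op
}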

Can't close confluent golang kafka consumer

I'm having an issue with disposing of the Kafka consumer at the end of program execution. Here is the code responsible for closing the consumer:
func (kc *KafkaConsumer) Dispose() {
    Sugar.Info("Disposing of consumer")
    kc.mu.Lock()
    kc.Consumer.Close()
    Sugar.Info("Disposed of consumer")
    kc.mu.Unlock()
}
As you might have already noticed, I'm making use of a sync.Mutex, since the consumer is accessed by multiple goroutines. Below is another snippet, responsible for reading messages from Kafka:
func (kc *KafkaConsumer) Consume(signalChan chan os.Signal, ctx context.Context) {
for{
select{
case sig := <-signalChan:
Sugar.Info("Caught signal %v", sig)
break
case <-ctx.Done():
Sugar.Info("Got context done message. Closing consumer...")
kc.Dispose()
break
default:
for{
message, err := kc.Consumer.ReadMessage(-1); if err != nil{
Log.Error(err.Error())
return
}
Sugar.Infof("Got a new message %v",message)
resp := make(chan *KafkaResponseEntity)
go router.UseMessage(*message, resp, ctx)
//Potential deadlock
response := <-resp
/*
Explicit commit of an offset in order to ensure
that request has been successfully processed
*/
kc.Consumer.Commit()
Sugar.Info("Successfully commited an offset")
Sugar.Infof("Just got a response %v", response)
go producer.KP.Produce(response.PaymentId, response.Bytes, "some_random_topic")
}
}
}
}
The problem is that when closing the consumer, program execution simply stalls.
Are there any issues? Should I use a cond along with the mutex? I'd be very glad if you could provide a thorough explanation of what might have gone wrong in my code.
Thanks in advance.
I suspect this is hanging because of:
kc.Consumer.ReadMessage(-1)
which the documentation states will block indefinitely, which is why the consumer never gets closed. The simplest approach would be to make that value a positive time duration (e.g. 1 * time.Second), but then you may get timeout errors if messages are not consumed within the timeout. The timeout error is generally innocuous, but is something to account for; from the linked documentation:
Timeout is returned as (nil, err) where err is
`err.(kafka.Error).Code() == kafka.ErrTimedOut`
I'm not yet sure of a good way to utilize the indefinite blocking and still allow it to be interrupted; a sketch of the timeout-based approach is below. If anyone does know a better way, please post the findings!
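A minimal sketch of that timeout-based loop, assuming the same KafkaConsumer type, Sugar/Log loggers, and Dispose method as in the question (the 100ms value is arbitrary):
func (kc *KafkaConsumer) Consume(ctx context.Context) {
    for {
        // Check for shutdown between reads.
        select {
        case <-ctx.Done():
            Sugar.Info("Got context done message. Closing consumer...")
            kc.Dispose()
            return
        default:
        }
        // A short timeout lets the loop come back around and check ctx regularly.
        message, err := kc.Consumer.ReadMessage(100 * time.Millisecond)
        if err != nil {
            if kerr, ok := err.(kafka.Error); ok && kerr.Code() == kafka.ErrTimedOut {
                continue // nothing arrived within the timeout; poll again
            }
            Log.Error(err.Error())
            return
        }
        Sugar.Infof("Got a new message %v", message)
        // ...hand the message off for processing and commit, as in the question...
    }
}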
