I use https://github.com/Shopify/sarama to interact with Kafka. I have a topic with, for example, 100 partitions, and an application deployed on a single host, so I want to consume from this topic in multiple goroutines.
I found this example, https://github.com/Shopify/sarama/blob/master/examples/consumergroup/main.go, which shows how to create a consumer in a specific consumer group.
So my question is: should I create multiple such consumers, or is there a setting in Sarama that controls the number of consumer goroutines?
P.S. I saw this question, https://github.com/Shopify/sarama/issues/140, but it does not explain how to create a MultiConsumer.
This example shows a fully working console application that consumes all partitions of a topic, creating one goroutine per partition.
It is linked at the end of the thread you posted in your question.
It basically creates one consumer:
c, err := sarama.NewConsumer(strings.Split(*brokerList, ","), config)
Then gets all the partitions for the desired topic:
func getPartitions(c sarama.Consumer) ([]int32, error) {
if *partitions == "all" {
return c.Partitions(*topic)
Then for each partition it creates a PartitionConsumer and consumes from each partition in a different goroutine:
for _, partition := range partitionList {
pc, err := c.ConsumePartition(*topic, partition, initialOffset)
go func(pc sarama.PartitionConsumer) {
defer wg.Done()
for message := range pc.Messages() {
messages <- message
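To make the shape of that loop easier to see in isolation, here is a minimal sketch of the same fan-in pattern using only the standard library. The names `message`, `fakePartition`, and `fanIn` are made up for illustration; `fakePartition` stands in for the channel you would get from `pc.Messages()`, so nothing here talks to a real broker.

```go
package main

import (
	"fmt"
	"sync"
)

// message is a hypothetical stand-in for *sarama.ConsumerMessage.
type message struct {
	Partition int32
	Value     string
}

// fanIn starts one goroutine per partition channel and forwards everything
// into a single output channel. The output is closed once all inputs are drained.
func fanIn(partitions []<-chan message) <-chan message {
	out := make(chan message)
	var wg sync.WaitGroup
	for _, p := range partitions {
		wg.Add(1)
		go func(p <-chan message) {
			defer wg.Done()
			for m := range p {
				out <- m
			}
		}(p)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// fakePartition simulates a partition consumer that yields n messages and then closes.
func fakePartition(id int32, n int) <-chan message {
	ch := make(chan message)
	go func() {
		defer close(ch)
		for i := 0; i < n; i++ {
			ch <- message{Partition: id, Value: fmt.Sprintf("msg-%d", i)}
		}
	}()
	return ch
}

func main() {
	var parts []<-chan message
	for id := int32(0); id < 4; id++ {
		parts = append(parts, fakePartition(id, 3))
	}
	count := 0
	for range fanIn(parts) {
		count++
	}
	fmt.Println("received", count, "messages") // 4 partitions x 3 messages = 12
}
```

Ordering is only preserved within each partition channel; messages from different partitions interleave arbitrarily on the merged channel.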
I am trying to test a producer that writes messages to a topic on a Kafka cluster using the Golang client. This works fine when writing to a topic on a local cluster; I just copied and pasted the example code from their GitHub repo.
package main

import (
	"fmt"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	p, err := kafka.NewProducer(&kafka.ConfigMap{"bootstrap.servers": "localhost"})
	if err != nil {
		panic(err)
	}
	defer p.Close()

	// Delivery report handler for produced messages
	go func() {
		for e := range p.Events() {
			switch ev := e.(type) {
			case *kafka.Message:
				if ev.TopicPartition.Error != nil {
					fmt.Printf("Delivery failed: %v\n", ev.TopicPartition)
				} else {
					fmt.Printf("Delivered message to %v\n", ev.TopicPartition)
				}
			}
		}
	}()

	// Produce messages to topic (asynchronously)
	topic := "test"
	for _, word := range []string{"test message"} {
		p.Produce(&kafka.Message{
			TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
			Value:          []byte(word),
		}, nil)
	}

	// Wait for message deliveries before shutting down
	p.Flush(15 * 1000)
}
I receive the message on my console consumer with no issues.
I then try to do the same thing, just using my remote kafka cluster topic (note I also tried without the ports in the strings):
p, err := kafka.NewProducer(&kafka.ConfigMap{"bootstrap.servers":"HOSTNAME.amazonaws.com:9092,HOSTNAME2.amazonaws.com:9092,HOSTNAME3.amazonaws.com:9092"})
It prints the following error:
Delivery failed: test[0]#end(Broker: Not enough in-sync replicas)
The console producer has no issues though:
./bin/kafka-console-producer.sh --broker-list HOSTNAME.amazonaws.com:9092,HOSTNAME2.amazonaws.com:9092,HOSTNAME3.amazonaws.com:9092 --topic test
>proving that this works
The console-consumer receives it:
bin/kafka-console-consumer.sh --bootstrap-server HOSTNAME.amazonaws.com:9092,HOSTNAME2.amazonaws.com:9092,HOSTNAME3.amazonaws.com:9092 --topic test --from-beginning
proving that this works
The last thing I did was check how many in-sync replicas there were for that topic. If I am reading this correctly, the minimum should be 2 and there are 3.
./bin/kafka-topics.sh --describe --bootstrap-server HOSTNAME1.amazonaws.com:9092,HOSTNAME2.amazonaws.com:9092,HOSTNAME3.amazonaws.com:9092 --topic test
Topic:test PartitionCount:1 ReplicationFactor:1 Configs:min.insync.replicas=2,flush.ms=10000,segment.bytes=1073741824,retention.ms=86400000,flush.messages=9223372036854775807,max.message.bytes=1000012,min.cleanable.dirty.ratio=0.5,unclean.leader.election.enable=true,retention.bytes=-1,delete.retention.ms=86400000,segment.ms=604800000
Topic: test Partition: 0 Leader: 3 Replicas: 3 Isr: 3
Any ideas of what else I could look into?
You have min.insync.replicas=2, but the topic only has one replica.
If you have request.required.acks=all (which is the default), the producer will fail because what you produce to the leader broker cannot be replicated to the minimum number of required in-sync replicas.
I believe the console producer sets that property to just 1.
there are 3
There's actually only one, and that's broker ID 3. You would see three separate broker IDs listed under Isr if there were actually three in-sync replicas.
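As a rough way to sanity-check this yourself, here is a small standard-library sketch. The function names `isrCount` and `writeAccepted` are made up for illustration, and `writeAccepted` is only a simplified model of the broker's rule (a write with acks=all, i.e. -1, is rejected when the in-sync replica count is below min.insync.replicas; other acks values are not subject to that check).

```go
package main

import (
	"fmt"
	"strings"
)

// isrCount parses one partition line from `kafka-topics.sh --describe`
// and returns how many broker IDs are listed after "Isr:".
func isrCount(line string) int {
	idx := strings.Index(line, "Isr:")
	if idx < 0 {
		return 0
	}
	fields := strings.Fields(line[idx+len("Isr:"):])
	if len(fields) == 0 {
		return 0
	}
	return len(strings.Split(fields[0], ","))
}

// writeAccepted is a simplified model: only acks=all (-1) is checked
// against min.insync.replicas.
func writeAccepted(acks, isr, minISR int) bool {
	if acks != -1 {
		return true
	}
	return isr >= minISR
}

func main() {
	line := "Topic: test Partition: 0 Leader: 3 Replicas: 3 Isr: 3"
	isr := isrCount(line)
	fmt.Println("in-sync replicas:", isr)                    // 1
	fmt.Println("acks=all accepted:", writeAccepted(-1, isr, 2)) // false
	fmt.Println("acks=1 accepted:", writeAccepted(1, isr, 2))    // true
}
```

With the describe output from the question, this reports a single in-sync replica, which is why the acks=all producer fails while a producer using acks=1 gets through.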
Or, if you're using AWS MSK, this can happen when the EBS storage of one of the brokers is completely full; the way to overcome it is to increase that broker's storage.
UPDATE: It turned out I had an issue with my ports in Docker. Not sure why that fixed this phenomenon.
I believe I have come across a strange error. I am using the Sarama library and am able to create a consumer successfully.
func main() {
config = sarama.NewConfig()
config.ClientID = "go-kafka-consumer"
config.Consumer.Return.Errors = true
// Create new consumer
master, err := sarama.NewConsumer([]string{"localhost:9092"}, config)
if err != nil {
defer func() {
if err := master.Close(); err != nil {
partitionConsumer, err := master.ConsumePartition("myTopic",0,
if err != nil {
As soon as I break this code up and move outside the main routine, I run into the error:
kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
I have split my code up as follows: the previous main() method I have now converted into a consumer package with a method called NewConsumer() and my new main() calls NewConsumer() like so:
c := consumer.NewConsumer()
The panic statement is getting triggered in the line with sarama.NewConsumer and prints out kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
Why would breaking up my code this way trigger Sarama to fail to make the consumer? Does Sarama need to be run directly from main?
I think that this way you end up creating two or more consumers that are grouped into a single group (probably go-kafka-consumer). Your broker has a topic with one partition, so one member of the group gets it assigned and the other one produces this error message. If you raised the number of partitions of that topic to 2, the error would go away.
But I think your real problem is that you have somehow instantiated more consumers than before.
From Kafka in a Nutshell:
Consumers can also be organized into consumer groups for a given topic — each consumer within the group reads from a unique partition and the group as a whole consumes all messages from the entire topic. If you have more consumers than partitions then some consumers will be idle because they have no partitions to read from. If you have more partitions than consumers then consumers will receive messages from multiple partitions. If you have equal numbers of consumers and partitions, each consumer reads messages in order from exactly one partition.
Idle consumers would not exactly produce an error, so that part would be an issue with Sarama.
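To illustrate the quoted rule, here is a toy model of spreading partitions over the consumers of a group. It is a plain round-robin written for this answer (the function name `assign` is made up), not the assignment strategy Kafka or Sarama actually runs, but it shows why a second consumer ends up idle on a one-partition topic.

```go
package main

import "fmt"

// assign spreads partition numbers 0..partitions-1 over the given number of
// consumers in round-robin order. Consumers that receive nothing stay idle.
func assign(partitions, consumers int) [][]int {
	if consumers <= 0 {
		return nil
	}
	out := make([][]int, consumers)
	for p := 0; p < partitions; p++ {
		c := p % consumers
		out[c] = append(out[c], p)
	}
	return out
}

func main() {
	// One partition, two consumers: the second consumer has nothing to read.
	for i, parts := range assign(1, 2) {
		fmt.Printf("consumer %d: %v\n", i, parts)
	}
	// Five partitions, two consumers: each consumer reads several partitions.
	for i, parts := range assign(5, 2) {
		fmt.Printf("consumer %d: %v\n", i, parts)
	}
}
```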
I've never used kafka before. I have two test Go programs accessing a local kafka instance: a reader and a writer. I'm trying to tweak my producer, consumer, and kafka server settings to get a particular behavior.
My writer:
package main
import (
func main() {
topics := []string{
progress := make(map[string]int)
for _, t := range topics {
progress[t] = 0
producer, err := kafka.NewProducer(&kafka.ConfigMap{
"bootstrap.servers": "localhost",
"group.id": "0",
if err != nil {
defer producer.Close()
fmt.Println("producing messages...")
for i := 0; i < 30; i++ {
index := rand.Intn(len(topics))
topic := topics[index]
num := progress[topic]
fmt.Printf("%s => %d\n", topic, num)
msg := &kafka.Message{
Value: []byte(strconv.Itoa(num)),
TopicPartition: kafka.TopicPartition{
Topic: &topic,
err = producer.Produce(msg, nil)
if err != nil {
progress[topic] = num
time.Sleep(time.Millisecond * 100)
There are three topics that exist on my local kafka: policymanager-100, policymanager-200, policymanager-300. They each only have 1 partition to ensure all messages are sorted by the time kafka receives them. My writer will randomly pick one of those topics and issue a message consisting of a number that increments solely for that topic. When it's done running, I expect the queues to look something like this (topic names shortened for legibility):
100: 1 2 3 4 5 6 7 8 9 10 11
200: 1 2 3 4 5 6 7
300: 1 2 3 4 5 6 7 8 9 10 11 12
So far so good. I'm trying to configure things so that any number of consumers can be spun up and consume these messages in order. By "in order" I mean that no consumer should get message 2 for topic 100 until message 1 is COMPLETED (not just started). If message 1 for topic 100 is being worked on, consumers are free to consume from other topics that currently don't have a message being processed. If a message of a topic has been sent to a consumer, that entire topic should become "locked" until either a timeout assumes that the consumer failed or the consumer commits the message; then the topic is "unlocked" and its next message is made available to be consumed.
My reader:
package main
import (
func main() {
count := 2
for i := 0; i < count; i++ {
go consumer(i + 1)
// hold this thread open indefinitely
select {}
func consumer(id int) {
c, err := kafka.NewConsumer(&kafka.ConfigMap{
"bootstrap.servers": "localhost",
"group.id": "0", // strconv.Itoa(id),
"enable.auto.commit": "false",
if err != nil {
c.SubscribeTopics([]string{`^policymanager-.+$`}, nil)
for {
msg, err := c.ReadMessage(-1)
if err != nil {
fmt.Printf("%d) Message on %s: %s\n", id, msg.TopicPartition, string(msg.Value))
_, err = c.CommitMessage(msg)
if err != nil {
fmt.Printf("ERROR committing: %+v\n", err)
From my current understanding, the way I'm likely to achieve this is by setting up my consumer properly. I've tried many different variations of this program. I've tried having all my goroutines share the same consumer. I've tried using a different group.id for each goroutine. None of these was the right configuration to get the behavior I'm after.
What the posted code does is empty out one topic at a time. Despite having multiple goroutines, the process will read all of 100 then move to 200 then 300 and only one goroutine will actually do all the reading. When I let each goroutine have a different group.id then messages get read by multiple goroutines which I would like to prevent.
My example consumer is simply breaking things up with goroutines but when I begin working this project into my use case at work, I'll need this to work across multiple kubernetes instances that won't be talking to each other so using anything that interacts between goroutines won't work as soon as there are 2 instances on 2 kubes. That's why I'm hoping to make kafka do the gatekeeping I want.
Generally speaking, you cannot. Even if you had a single consumer that consumed all the partitions for the topic, the partitions would be consumed in a non-deterministic order and your total ordering across all partitions would not be guaranteed.
Try keyed messages; I think you may find them useful for your use case.
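The idea behind keyed messages is that every message with the same key lands on the same partition, and a partition is read in order by a single consumer of the group. The sketch below only illustrates the key-to-partition mapping, using an FNV-1a hash from the standard library. The function name `partitionFor` is made up, and real clients use their own hash functions (the Java client uses murmur2, for example), so the actual partition numbers will differ.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a key to a partition by hashing it.
// It assumes numPartitions > 0. The hash used here is illustrative only.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	keys := []string{"policymanager-100", "policymanager-200", "policymanager-300"}
	for _, k := range keys {
		// The same key always yields the same partition, so per-key order is kept.
		fmt.Printf("%s -> partition %d\n", k, partitionFor(k, 3))
	}
}
```

Note that this gives per-key ordering of delivery; it does not by itself lock a key until the previous message has been committed.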
I'm trying to find a good method to consume asynchronously from an input queue, process the content using several workers and then publish to an output queue. So far I've tried a number of examples, most recently using the code from here and here as inspiration.
My current code doesn't appear to be doing what it should, however: increasing the number of workers doesn't increase performance (msg/s consumed or published), and the number of goroutines remains fairly static while running.
func main() {
maxWorkers := 10
// channel for jobs
in := make(chan []byte)
out := make(chan []byte)
// start workers
wg := &sync.WaitGroup{}
for i := 1; i <= maxWorkers; i++ {
defer wg.Done()
go processor(in, out)
// add jobs
go collector(in)
go sender(out)
// wait for workers to complete
The collector is basically the example from the RabbitMQ site with a goroutine that collects messages from the queue and places them on the 'in' channel:
forever := make(chan bool)
go func() {
for d := range msgs {
in <- d.Body
log.Printf("[*] Waiting for messages. To exit press CTRL+C")
The processor receives an 'in' and 'out' channel, unmarshals JSON, performs a series of regexes and then places the output into the 'out' channel:
func processor(in chan []byte, out chan []byte) {
var (
// list of regexes declared here
for {
body := <-in
jsonIn := &Data{}
err := json.Unmarshal(body, jsonIn)
if err != nil {
log.Fatalln("Failed to decode:", err)
content := jsonIn.Content
//process regexes using:
//jsonIn.a = r1.FindAllString(content, -1)
jsonOut, _ := json.Marshal(jsonIn)
out <- jsonOut
And finally the sender is simply the code from the RabbitMQ site, setting up a connection, reading from the 'out' channel and then publishing to a RMQ queue:
for {
jsonOut := <-out
err = ch.Publish(
"", // exchange
q.Name, // routing key
false, // mandatory
DeliveryMode: amqp.Persistent,
ContentType: "text/json",
Body: []byte(jsonOut),
failOnError(err, "Failed to publish a message")
This is a pattern that I'll be using quite a lot, so I'm spending a lot of time trying to find something that works correctly (and well) - any advice or help would be appreciated (and in case it isn't obvious, I'm new to Go).
There are a couple of things that jump out:
Done within main function
for i := 1; i <= maxWorkers; i++ {
defer wg.Done()
go processor(in, out)
The defer here is executed when main returns so it's not actually indicating when processing is complete. I don't think this'll have an effect on the performance profile of your program though.
To address this you could pass in wg *sync.WaitGroup to your processor so your processor can indicate when it's done.
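A minimal sketch of what that could look like, using only the standard library. The `run` helper and the upper-casing step are placeholders made up for this example (the upper-casing stands in for your JSON and regex work); the point is where `wg.Add`, `wg.Done`, and the channel closes go.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// processor drains in, transforms each body, and signals wg once in is closed.
func processor(in <-chan []byte, out chan<- []byte, wg *sync.WaitGroup) {
	defer wg.Done()
	for body := range in {
		out <- []byte(strings.ToUpper(string(body))) // placeholder for the real work
	}
}

// run starts maxWorkers processors, feeds them the jobs, and collects all results.
func run(jobs [][]byte, maxWorkers int) [][]byte {
	in := make(chan []byte)
	out := make(chan []byte)
	var wg sync.WaitGroup
	for i := 0; i < maxWorkers; i++ {
		wg.Add(1) // count the worker before it starts
		go processor(in, out, &wg)
	}
	go func() {
		for _, j := range jobs {
			in <- j
		}
		close(in) // no more jobs: lets the workers' range loops end
	}()
	go func() {
		wg.Wait() // all workers have returned
		close(out)
	}()
	var results [][]byte
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	jobs := [][]byte{[]byte("a"), []byte("b"), []byte("c")}
	fmt.Println("processed", len(run(jobs, 10)), "jobs")
}
```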
CPU Bound Processing
Parsing messages and running regexes is a CPU-intensive workload. How many cores does your machine have? How is throughput affected if you run your program on two separate machines; does it double? What if you double the number of cores? What about running your program with 1 processor worker vs 2; does that double throughput? Are you maxing out your local RabbitMQ instance? Is it the bottleneck?
Setting up benchmarking and load-testing harnesses should let you run experiments to see where your bottlenecks are :)
For queue-based services it's pretty easy to set up a test harness that fills RabbitMQ with a fixed backlog and benchmarks how fast you can process those messages, or to set up a load generator that sends x messages/second to RabbitMQ and observe whether you can keep up.
Does RabbitMQ have good visibility into message processing throughput? If not, I frequently add a counter to the Go code and then log the overall average throughput on an interval to get a rough idea of performance:
start := time.Now()
updateInterval := time.Tick(1 * time.Second)
numIn := 0
for {
	select {
	case <-updateInterval:
		log.Infof("IN - Count: %d", numIn)
		log.Infof("IN - Throughput: %.0f events/second",
			float64(numIn)/time.Since(start).Seconds())
	case d := <-msgs:
		numIn++
		in <- d.Body
	}
}
We have a process whereby users request files that we need to get from our source. This source isn't the most reliable so we implemented a queue using Amazon SQS. We put the download URL into the queue and then we poll it with a small app that we wrote in Go. This app simply retrieves the messages, downloads the file and then pushes it to S3 where we store it. Once all of this is complete it calls back a service which will email the user to let them know that the file is ready.
Originally I wrote this to create n channels and then attached 1 go-routine to each and had the go-routine in an infinite loop. This way I could ensure that I was only ever processing a fixed number of downloads at a time.
I realised that this isn't the way that channels are supposed to be used and, if I'm understanding correctly now, there should actually be one channel with n go-routines receiving on that channel. Each go-routine is in an infinite loop, waiting on a message and when it receives it will process the data, do everything that it's supposed to and when it's done it will wait on the next message. This allows me to ensure that I'm only ever processing n files at a time. I think this is the right way to do it. I believe this is fan-out, right?
What I don't need to do, is to merge these processes back together. Once the download is done it is calling back a remote service so that handles the remainder of the process. There is nothing else that the app needs to do.
OK, so some code:
func main() {
queue, err := ConnectToQueue() // This works fine...
if err != nil {
log.Fatalf("Could not connect to queue: %s\n", err)
msgChannel := make(chan sqs.Message, 10)
for i := 0; i < MAX_CONCURRENT_ROUTINES; i++ {
go processMessage(msgChannel, queue)
for {
response, _ := queue.ReceiveMessage(MAX_SQS_MESSAGES)
for _, m := range response.Messages {
msgChannel <- m
func processMessage(ch <-chan sqs.Message, queue *sqs.Queue) {
for {
m := <-ch
// Do something with message m
// Delete message from queue when we're done
Am I anywhere close here? I have n running go-routines (where MAX_CONCURRENT_ROUTINES = n) and in the loop we will keep passing messages in to the single channel. Is this the right way to do it? Do I need to close anything or can I just leave this running indefinitely?
One thing I'm noticing is that SQS is returning messages, but once 10 messages have been passed into processMessage() (10 being the size of the channel buffer), no further messages are actually processed.
Thanks all
That looks fine. A few notes:
You can limit the work parallelism by means other than limiting the number of worker routines you spawn. For example you can create a goroutine for every message received, and then have the spawned goroutine wait for a semaphore that limits the parallelism. Of course there are tradeoffs, but you aren't limited to just the way you've described.
sem := make(chan struct{}, n)
work := func(m sqs.Message) {
	sem <- struct{}{} // When there's room we can proceed
	// do the work
	<-sem // Free room in the channel
}
for {
	response, _ := queue.ReceiveMessage(MAX_SQS_MESSAGES)
	for _, m := range response.Messages {
		go work(m)
	}
}
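If you want to convince yourself that the semaphore really caps the parallelism, here is a self-contained sketch using only the standard library. The helper `runLimited` is made up for this example, and the sleep is a placeholder for the download; it starts one goroutine per job and reports the highest number that were ever working at once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited starts one goroutine per job but lets at most limit of them
// work at the same time. It returns the highest concurrency observed.
func runLimited(jobs, limit int) int32 {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var running, peak int32
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire: blocks while limit goroutines are working
			defer func() { <-sem }() // release the slot when this job is done
			n := atomic.AddInt32(&running, 1)
			for { // record a new peak if this is the highest count seen so far
				old := atomic.LoadInt32(&peak)
				if n <= old || atomic.CompareAndSwapInt32(&peak, old, n) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond) // placeholder for the real work
			atomic.AddInt32(&running, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	// 20 jobs, but never more than 3 in flight at once.
	fmt.Println("peak concurrency:", runLimited(20, 3))
}
```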
The limit of only 10 messages being processed is being caused elsewhere in your stack. Possibly you're seeing a race where the first 10 fill the channel, and then the work isn't completing, or perhaps you're accidentally returning from the worker routines. If your workers are persistent per the model you've described, you'll want to be certain that they don't return.
It's not clear if you want the process to return after you've processed some number of messages. If you do want this process to exit, you'll need to wait for all the workers to finish their current tasks, and probably signal them to return afterwards. Take a look at sync.WaitGroup for synchronizing their completion, and either use another channel to signal that there's no more work or close msgChannel and handle that in your workers. (Take a look at the two-value form of the channel receive expression.)
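For the shutdown part, a minimal sketch of the close-the-channel approach might look like this. It uses plain strings instead of sqs.Message so that it runs on its own, and the names `worker` and `drain` are made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// worker processes messages until ch is closed and drained.
func worker(ch <-chan string, wg *sync.WaitGroup, processed *int64) {
	defer wg.Done()
	for {
		m, ok := <-ch // ok is false once ch is closed and empty
		if !ok {
			return
		}
		_ = m // do something with the message here
		atomic.AddInt64(processed, 1)
	}
}

// drain sends msgs to a fixed pool of workers, closes the channel,
// and waits for every worker to finish. It returns the number processed.
func drain(msgs []string, workers int) int64 {
	ch := make(chan string, 10)
	var wg sync.WaitGroup
	var processed int64
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go worker(ch, &wg, &processed)
	}
	for _, m := range msgs {
		ch <- m
	}
	close(ch) // signals "no more work"
	wg.Wait() // all workers have returned
	return atomic.LoadInt64(&processed)
}

func main() {
	fmt.Println("processed:", drain([]string{"a", "b", "c", "d"}, 2))
}
```

A `for m := range ch` loop in the worker does the same thing more compactly; the explicit two-value receive is shown here because it is the form mentioned above.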