Goroutines with a Kafka consumer channel and context

I have a simple Kafka consumer for which I have created a handle, and I am trying to read from it using a goroutine:
func process(ctx context.Context){
consumer := queueHandle.Consume(topic_ops_req, consumerHandler)
// Get signal for finish
doneCh := make(chan struct{})
go func(consumer chan *sarama.ConsumerMessage, ctx context.Context) {
for {
select {
case msg, ok := <-consumer:
if !ok {
logger.Info("Channel has been closed")
doneCh <- struct{}{}
return
}
var request queue.Request
err := json.Unmarshal(msg.Value, &request)
if err != nil {
logger.Error("consumer unmarshal err", err)
panic(err)
}
res, err := new_process(ctx, request, service) // call another func
if err != nil {
//TODO
}
result = res
doneCh <- struct{}{}
case <-ctx.Done():
logger.Info(fmt.Sprintf("Context ended with err : %s", ctx.Err()))
doneCh <- struct{}{}
}
}
}(consumer, ctx)
<-doneCh
}
The issue I am seeing is that once I introduce the case <-ctx.Done(), the goroutine never enters the case msg, ok := <-consumer branch and always reports that the context ended. How do I make my goroutine work with both the consumer channel and ctx.Done()?

Related

go routines and channel to send response

I have the following code:
I have a list to go through, doing something with each value from that list, so I thought of using goroutines. However, I need to cap the number of goroutines. Each goroutine makes a call that returns a response and an err: when err is non-nil I need to terminate all the goroutines and return an HTTP response, and if there is no err I also need to terminate the goroutines and return an HTTP response.
When I have few values it works OK, but when I have many values I have a problem: when I call cancel, I still have goroutines trying to send to the response channel that is already closed, and I keep getting errors like:
goroutine 36 [chan send]:
type response struct {
value string
}
func Testing() []response {
fakeValues := getFakeValues()
maxParallel := 25
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
if len(fakeValues) < maxParallel {
maxParallel = len(fakeValues)
}
type responseChannel struct {
Response response
Err error
}
reqChan := make(chan string) //make this an unbuffered channel
resChan := make(chan responseChannel)
wg := &sync.WaitGroup{}
wg.Add(maxParallel)
for i := 0; i < maxParallel; i++ {
go func(ctx context.Context, ch chan string, resChan chan responseChannel) {
for {
select {
case val := <-ch:
resp, err := getFakeResult(val)
resChan <- responseChannel{
Response: resp,
Err: err,
}
case <-ctx.Done():
wg.Done()
return
}
}
}(ctx, reqChan, resChan)
}
go func() {
for _, body := range fakeValues {
reqChan <- body
}
close(reqChan)
cancel()
}()
go func() {
wg.Wait()
close(resChan)
}()
var hasErr error
response := make([]response, 0, len(fakeValues))
for res := range resChan {
if res.Err != nil {
hasErr = res.Err
cancel()
break
}
response = append(response, res.Response)
}
if hasErr != nil {
// return responses.ErrorResponse(hasErr) // returns http response
}
// return responses.Accepted(response, nil) // returns http response
return nil
}
func getFakeValues() []string {
return []string{"a"}
}
func getFakeResult(val string) (response, error) {
if val == "" {
return response{}, fmt.Errorf("ooh noh:%s", val)
}
return response{
value: val,
}, nil
}
The workers end up blocked on sending to resChan because it's not buffered, and after an error, nothing reads from it.
You can either make resChan buffered, with a capacity at least as large as maxParallel, or check whether the context was canceled; for example, change the resChan <- send to:
select {
case resChan <- responseChannel{
Response: resp,
Err: err,
}:
case <-ctx.Done():
}
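The select-based send can be seen in a small, self-contained sketch. The names worker and firstError and the "empty value" failure are hypothetical; the point is that every blocking channel operation (the worker's receive, the worker's send, and the feeder's send) also selects on ctx.Done(), so every goroutine returns after the reader stops at the first error, even though out is unbuffered.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

type result struct {
	Val string
	Err error
}

// worker processes values from in and sends results on out. The send is wrapped
// in a select so that, once ctx is canceled, the worker returns instead of
// blocking forever on an out channel that nobody reads any more.
func worker(ctx context.Context, in <-chan string, out chan<- result, wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		select {
		case v, ok := <-in:
			if !ok {
				return // in was closed: no more work
			}
			var r result
			if v == "" {
				r.Err = errors.New("empty value") // hypothetical failure condition
			} else {
				r.Val = v
			}
			select {
			case out <- r:
			case <-ctx.Done():
				return
			}
		case <-ctx.Done():
			return
		}
	}
}

// firstError starts n workers, stops reading at the first error, cancels the
// context and then waits until every worker has returned.
func firstError(values []string, n int) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	in := make(chan string)
	out := make(chan result) // unbuffered, as in the question
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go worker(ctx, in, out, &wg)
	}
	go func() { // feeder: stops early if ctx is canceled
		defer close(in)
		for _, v := range values {
			select {
			case in <- v:
			case <-ctx.Done():
				return
			}
		}
	}()
	go func() { wg.Wait(); close(out) }()

	var firstErr error
	for r := range out {
		if r.Err != nil {
			firstErr = r.Err
			cancel() // workers blocked on out <- r take the ctx.Done() branch
			break
		}
	}
	wg.Wait() // returns only because every send also selects on ctx.Done()
	return firstErr
}

func main() {
	fmt.Println(firstError([]string{"a", "b", "", "c"}, 3)) // prints "empty value"
}
```

The buffered-channel alternative also works, but the select keeps the guarantee independent of how many values a worker manages to pick up after cancellation.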
There are two main problems with your solution:
First, if your fakeValues slice has more items than maxParallel+1, your program will block on this part:
for _, body := range fakeValues {
reqChan <- body
}
How does this happen? As you start putting values in reqChan, each started goroutine will read one value from the reqChan and try to write the response to resChan. But, since resChan is still not reading responses, each goroutine will block there (writing to resChan). Eventually, once each goroutine is blocked, reading from the reqChan is blocked as well and you cannot put any more values in it (apart from one buffered value).
Second, you are passing the context to your goroutines, but you are not doing anything with it. You can use ctx.Done() channel to get a signal to exit the goroutine. Something like this:
go func(ctx context.Context, ch chan string, resChan chan responseChannel) {
for {
select {
case val := <-ch:
resp, err := getFakeResult(val)
resChan <- responseChannel{
Response: resp,
Err: err,
}
case <- ctx.Done():
return
}
}
}(ctx, reqChan, resChan)
Now, to tie everything together so that there are no deadlocks, no race conditions, and no situations where values are not processed, a few other changes need to be made. I've posted the entire code below.
func Testing() []response {
fakeValues := getFakeValues()
maxParallel := 25
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
if len(fakeValues) < maxParallel {
maxParallel = len(fakeValues)
}
type responseChannel struct {
Response response
Err error
}
reqChan := make(chan string) //make this an unbuffered channel
resChan := make(chan responseChannel)
wg := &sync.WaitGroup{}
wg.Add(maxParallel)
for i := 0; i < maxParallel; i++ {
go func(ctx context.Context, ch chan string, resChan chan responseChannel) {
//runs on every return path, so resChan is eventually closed
defer wg.Done()
for {
select {
case val, ok := <-ch:
if !ok {
//reqChan was closed: there is no more data to process
return
}
resp, err := getFakeResult(val)
select {
case resChan <- responseChannel{
Response: resp,
Err: err,
}:
case <-ctx.Done():
//the reader stopped after an error, so nobody will receive this value
return
}
case <-ctx.Done():
return
}
}
}(ctx, reqChan, resChan)
}
go func() {
//closing reqChan tells the workers that all values have been handed out
defer close(reqChan)
for _, body := range fakeValues {
select {
case reqChan <- body:
case <-ctx.Done():
//an error occurred: stop feeding the workers
return
}
}
}()
go func() {
wg.Wait()
close(resChan)
}()
var hasErr error
response := make([]response, 0, len(fakeValues))
for res := range resChan {
if res.Err != nil {
hasErr = res.Err
//stops the workers and the goroutine that feeds reqChan
cancel()
break
}
response = append(response, res.Response)
}
if hasErr != nil {
// return responses.ErrorResponse(hasErr) // returns http response
return nil
}
// return responses.Accepted(response, nil) // returns http response
return response
}
In short, the changes are:
reqChan is an unbuffered channel, so each value is handed directly to a worker that is ready for it; no value is left sitting in a buffer when the workers exit.
Worker goroutines have been changed to handle both exiting when an error happens (the context is canceled) and exiting when reqChan has been closed and there is no more data to process. wg.Done() is deferred so that it runs on every exit path, which ensures that resChan is eventually closed, and the send on resChan also selects on ctx.Done() so that a worker never blocks forever after the reader has stopped.
A separate goroutine puts the data in reqChan without blocking the main loop, stops early if the context is canceled, and closes reqChan afterward.

golang, goroutines race condition in test

I need to subscribe to a topic (a topic is a channel) before publishing to it. When a topic is created, I run goroutines that keep listening on its channels to process messages (for example, a publish or a new subscription).
The test works, but not every time: sometimes it ends up publishing a message on the topic channel before I'm listening to it.
I have this test:
func Test_useCase_publish(t *testing.T) {
for _, tt := range tests {
tt := tt
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
tt.fields.storage = &RepositoryMock{
GetTopicFunc: func(ctx context.Context, topicName vos.TopicName) (entities.Topic, error) {
return tt.fields.topic, nil
},
}
useCase := New(tt.fields.storage)
subscribed := make(chan struct{})
go func() {
tt.fields.topic.Activate()
ch, _, err := useCase.Subscribe(tt.args.ctx, tt.args.message.TopicName)
require.NoError(t, err)
close(subscribed)
msg, ok := <-ch
if ok {
fmt.Println("msg", msg)
assert.Equal(t, tt.want, msg)
}
}()
<-subscribed
err := useCase.Publish(tt.args.ctx, tt.args.message)
assert.ErrorIs(t, err, tt.wantErr)
})
}
}
Topic:
func (t Topic) Activate() {
go t.listenForSubscriptions()
go t.listenForMessages()
go t.listenForKills()
}
func (t *Topic) listenForSubscriptions() {
for newSubCh := range t.newSubCh {
t.Subscribers.Store(newSubCh.GetID(), newSubCh)
}
}
func (t *Topic) listenForKills() {
for subscriberID := range t.killSubCh {
t.Subscribers.Delete(subscriberID)
}
}
func (t Topic) Dispatch(message vos.Message) {
t.newMessageCh <- message
}
func (t *Topic) listenForMessages() {
for msg := range t.newMessageCh {
m := msg
t.Subscribers.Range(func(key, value interface{}) bool {
if key == nil || value == nil {
return false
}
if subscriber, ok := value.(Subscriber); ok {
subscriber.ReceiveMessage(m)
}
return true
})
}
}
Subscribe:
func (u useCase) Subscribe(ctx context.Context, topicName vos.TopicName) (chan vos.Message, vos.SubscriberID, error) {
if err := topicName.Validate(); err != nil {
return nil, "", err
}
topic, err := u.storage.GetTopic(ctx, topicName)
if err != nil {
if !errors.Is(err, entities.ErrTopicNotFound) {
return nil, "", err
}
topic, err = u.createTopic(ctx, topicName)
if err != nil {
return nil, "", err
}
subscriber := entities.NewSubscriber(topic)
subscriptionCh, id := subscriber.Subscribe()
return subscriptionCh, id, nil
}
subscriber := entities.NewSubscriber(topic)
subscriptionCh, id := subscriber.Subscribe()
return subscriptionCh, id, nil
}
func (s Subscriber) Subscribe() (chan vos.Message, vos.SubscriberID) {
s.topic.addSubscriber(s)
return s.subscriptionCh, s.GetID()
}
func (s Subscriber) ReceiveMessage(msg vos.Message) {
s.subscriptionCh <- msg
}
Publisher:
func (u useCase) Publish(ctx context.Context, message vos.Message) error {
if err := message.Validate(); err != nil {
return err
}
topic, err := u.storage.GetTopic(ctx, message.TopicName)
if err != nil {
return err
}
topic.Dispatch(message)
return nil
}
When I call Subscribe, I send the new subscriber on the topic's subscription channel, and the topic's goroutine adds it to its subscribers. When I publish a message to a topic, I send the message on the topic's message channel.
Some points are missing from the code you show, such as the code for .Subscribe() and .Publish(), or how the channels are instantiated (are they buffered or unbuffered?).
One point can be made, though:
From the looks of (t *Topic) listenForSubscriptions(), this subscribing method does not send any signal back to the subscriber that it has been registered.
So my guess is: your useCase.Subscribe(...) call knows that the created channel has been written to newSubCh, but it does not know whether t.Subscribers.Store(...) has completed.
So, depending on how the goroutines are scheduled, the message sent in your test function can arrive before the channel has actually been registered.
To fix this, add something that sends a signal back to the caller. One possible way:
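type subscribeReq struct {
sub Subscriber
done chan struct{}
}
// turn Topic.newSubCh into a chan *subscribeReq; the caller creates done,
// sends the request, then waits with <-req.done before returning from Subscribe
func (t *Topic) listenForSubscriptions() {
for req := range t.newSubCh {
t.Subscribers.Store(req.sub.GetID(), req.sub)
close(req.done) // signal the caller that the subscriber is now registered
}
}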
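A runnable, stripped-down sketch of this acknowledgement pattern. The topic type, string IDs and string messages are hypothetical simplifications of the question's Topic, Subscriber and vos.Message; the point is that Subscribe returns only after close(req.done), so by the time Publish runs, the subscriber is already stored and listenForMessages will see it.

```go
package main

import (
	"fmt"
	"sync"
)

// subscribeReq pairs the subscriber with a done channel that the topic
// goroutine closes once the subscriber has been stored.
type subscribeReq struct {
	id   string
	ch   chan string
	done chan struct{}
}

// topic is a hypothetical, minimal version of the question's Topic.
type topic struct {
	subs  sync.Map // id -> chan string
	subCh chan subscribeReq
	msgCh chan string
}

func newTopic() *topic {
	t := &topic{subCh: make(chan subscribeReq), msgCh: make(chan string)}
	go t.listenForSubscriptions()
	go t.listenForMessages()
	return t
}

func (t *topic) listenForSubscriptions() {
	for req := range t.subCh {
		t.subs.Store(req.id, req.ch)
		close(req.done) // acknowledge: the subscriber is now visible to listenForMessages
	}
}

func (t *topic) listenForMessages() {
	for msg := range t.msgCh {
		t.subs.Range(func(_, v any) bool {
			v.(chan string) <- msg
			return true
		})
	}
}

// Subscribe blocks until the topic goroutine has registered the subscriber.
func (t *topic) Subscribe(id string) chan string {
	req := subscribeReq{id: id, ch: make(chan string, 1), done: make(chan struct{})}
	t.subCh <- req
	<-req.done
	return req.ch
}

func (t *topic) Publish(msg string) { t.msgCh <- msg }

func main() {
	t := newTopic()
	ch := t.Subscribe("sub-1")
	t.Publish("hello") // safe: the subscription is already stored
	fmt.Println(<-ch)
}
```

The store happens before close(req.done), which happens before Subscribe returns, so the ordering no longer depends on goroutine scheduling.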
Another point: your test function does not check whether the goroutine spun up by your go func(){ ... }() call completes at all, so your test may also finish before the goroutine has had the chance to execute fmt.Println(msg).
A common way to check this is to use a sync.WaitGroup:
t.Run(tt.name, func(t *testing.T) {
...
useCase := New(tt.fields.storage)
subscribed := make(chan struct{})
wg := &sync.WaitGroup{} // create a *sync.WaitGroup
wg.Add(1) // increment by 1 (you start only 1 goroutine)
go func() {
defer wg.Done() // have the goroutine call wg.Done() when returning
...
}()
// send message, check that no error occurs
wg.Wait() // block here until the goroutine has completed
})
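For a self-contained illustration of why the wg.Wait() matters, here is a hypothetical helper, receiveOne, that starts a goroutine to receive one message and blocks until that goroutine has finished. Without the Wait, the caller could read got before the goroutine assigns it (a data race), in the same way that the test could end before the goroutine's assertions run.

```go
package main

import (
	"fmt"
	"sync"
)

// receiveOne starts a goroutine that receives one message from ch, calls send
// (which is expected to deliver that message), and waits for the goroutine.
func receiveOne(ch <-chan string, send func()) string {
	var (
		wg  sync.WaitGroup
		got string
	)
	wg.Add(1) // one goroutine is started below
	go func() {
		defer wg.Done() // signal completion however the goroutine returns
		got = <-ch
	}()
	send()
	wg.Wait() // got is safe to read only after the goroutine has finished
	return got
}

func main() {
	ch := make(chan string) // unbuffered: the send completes when the goroutine receives
	fmt.Println(receiveOne(ch, func() { ch <- "hello" }))
}
```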

Concurrently process a lot of files and upload to S3 in Go

I'm migrating a lot of files that are currently stored in a relational database to Amazon S3. I'm using Go because I had heard about its concurrency, but I'm getting very low throughput. I'm new to Go, so I'm probably not doing it in the best way possible.
This is what I have at the moment:
type Attachment struct {
BinaryData []byte `db:"BinaryData"`
CreatedAt time.Time `db:"CreatedDT"`
Id int `db:"Id"`
}
func main() {
connString := os.Getenv("CONNECTION_STRING")
log.SetFlags(log.Ltime)
db, err := sqlx.Connect("sqlserver", connString)
if err != nil {
panic(err)
}
log.Print("Connected to database")
sql := "SELECT TOP 1000 Id,CreatedDT, BinaryData FROM Attachment"
attachmentsDb := []Attachment{}
err = db.Select(&attachmentsDb, sql)
if err != nil {
log.Fatal(err)
}
session, err := session.NewSession(&aws.Config{
Region: aws.String("eu-west-1"),
})
if err != nil {
log.Fatal(err)
return
}
svc := s3.New(session)
wg := &sync.WaitGroup{}
for _, att := range attachmentsDb {
done := make(chan error)
go func(wg *sync.WaitGroup, att Attachment, out chan error) {
wg.Add(1)
err := <-saveAttachment(&att, svc)
if err == nil {
log.Printf("CV Written %d", att.Id)
}
wg.Done()
out<-err
}(wg, att, done)
<-done
}
wg.Wait()
//close(in)
defer db.Close()
}
func saveAttachment(att *Attachment, svc *s3.S3 )<-chan error {
out := make(chan error)
bucket := os.Getenv("BUCKET")
go func() {
defer close(out)
key := getKey(att)
input := &s3.PutObjectInput{Bucket: &bucket,
Key: &key,
Body: bytes.NewReader(att.BinaryData),
}
_, err := svc.PutObject(input)
if err != nil {
//log.Fatal(err)
log.Printf("Error uploading CV %d error %v", att.Id, err)
}
out <- err
}()
return out
}
func getKey(att *Attachment) string {
return fmt.Sprintf("%s/%d", os.Getenv("KEY"), att.Id)
}
This loop executes sequentially: in every iteration it waits for the result from the done channel, so there is no benefit from running multiple goroutines. There is also no need to create a new goroutine in func saveAttachment(), because you already create one in the loop.
func main() {
//....
svc := s3.New(session)
wg := &sync.WaitGroup{}
for _, att := range attachmentsDb {
done := make(chan error)
//New goroutine
go func(wg *sync.WaitGroup, att Attachment, out chan error) {
wg.Add(1)
//Already in a goroutine now, but in func saveAttachment() will create a new goroutine?
err := <-saveAttachment(&att, svc) //There is a goroutine created in this func
if err == nil {
log.Printf("CV Written %d", att.Id)
}
wg.Done()
out<-err
}(wg, att, done)
<-done //This will block until it receives the result; only then does the loop continue
}
}
func saveAttachment(att *Attachment, svc *s3.S3 )<-chan error {
out := make(chan error)
bucket := os.Getenv("BUCKET")
//Why new goroutine?
go func() {
defer close(out)
key := getKey(att)
input := &s3.PutObjectInput{Bucket: &bucket,
Key: &key,
Body: bytes.NewReader(att.BinaryData),
}
_, err := svc.PutObject(input)
if err != nil {
//log.Fatal(err)
log.Printf("Error uploading CV %d error %v", att.Id, err)
}
out <- err
}()
return out
}
If you want to upload in parallel, don't do that. You can quickly fix it like this:
func main() {
//....
svc := s3.New(session)
wg := &sync.WaitGroup{}
//Number of goroutines = number of attachments
for _, att := range attachmentsDb {
wg.Add(1)
//One goroutine to uploads for each Attachment
go func(wg *sync.WaitGroup, att Attachment) {
err := saveAtt(&att, svc)
if err == nil {
log.Printf("CV Written %d", att.Id)
}
wg.Done()
}(wg, att)
//No blocking after creating a goroutine; the loop continues to create new goroutines
}
wg.Wait()
fmt.Println("done")
}
//This func will be executed in goroutine, so no need to create a goroutine inside it
func saveAtt(att *Attachment, svc *s3.S3) error {
bucket := os.Getenv("BUCKET")
key := getKey(att)
input := &s3.PutObjectInput{Bucket: &bucket,
Key: &key,
Body: bytes.NewReader(att.BinaryData),
}
_, err := svc.PutObject(input)
if err != nil {
log.Printf("Error uploading CV %d error %v", att.Id, err)
}
return err
}
But this approach isn't good when there are many attachments, because the number of goroutines equals the number of attachments. In that case you will need a goroutine pool so you can limit the number of goroutines running at once.
Warning! This is just an example to show the goroutine pool logic; you need to adapt it to your own code.
//....
//Create an attachment queue
queue := make(chan *Attachment) //Or use a buffered channel: queue := make(chan *Attachment, bufferedSize)
//Send all attachments to the queue, then close it so the workers' range loops end
go func() {
for i := range attachmentsDb {
//Take the address of the slice element, not of a loop variable
queue <- &attachmentsDb[i]
}
close(queue)
}()
//....
//Create a goroutine pool
svc := s3.New(session)
wg := &sync.WaitGroup{}
//Use this as a const
workerCount := 5
//Number of goroutines = workerCount
for i := 1; i <= workerCount; i++ {
wg.Add(1)
//New goroutine
go func() {
defer wg.Done()
//Get an attachment from the queue and upload it. When the queue is empty, this loop blocks; when the queue is closed, it ends
for att := range queue {
err := saveAtt(att, svc)
if err == nil {
log.Printf("CV Written %d", att.Id)
}
}
}()
}
//Wait until every worker has finished, i.e. until all attachments have been processed
wg.Wait()
//....

how to return values in a goroutine

I have the code:
go s.addItemSync(ch, cs.ResponseQueue, user)
This calls the func:
func (s *Services) addItemSync(ch types.ChannelInsertion, statusQueueName, user string) {
//func body here
}
I would however like to do this:
if ok, err := go s.addItemSync(ch, cs.ResponseQueue, user); !ok {
if err != nil {
log.Log.Error("Error adding channel", zap.Error(err))
return
}
}
Which would change the other func to this:
func (s *Services) addItemSync(ch types.ChannelInsertion, statusQueueName, user string) (bool, error) {
}
As in, I would like to assign the return values of a go statement, but this fails to compile every time. Is there a way to declare variables from the result while still launching the function with go, as in the if ok, err := go s.addItemSync(ch, cs.ResponseQueue, user); !ok { line?
If you want to wait until a go-routine has completed, you need to return results in a channel. The basic pattern, without complicating with wait groups, etc. is:
func myFunc() {
// make a channel to receive errors
errChan := make(chan error)
// launch a go routine
go doSomething(myVar, errChan)
// block until something received on the error channel
if err := <- errChan; err != nil {
// something bad happened
}
}
// your async function
func doSomething(myVar interface{}, errChan chan error) {
// Do stuff
if _, err := someOtherFunc(myVar); err != nil {
errChan <- err
return
}
// all good - send nil to the error channel
errChan <- nil
}
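When the function returns more than an error (here a bool and an error), one option is to bundle the values into a struct and send that over a single channel. A minimal sketch, where addItem is a hypothetical stand-in for s.addItemSync; the channel has capacity 1 so the goroutine can always complete its send and exit, even if the caller stops waiting.

```go
package main

import (
	"errors"
	"fmt"
)

// result bundles the function's return values so they travel over one channel.
type result struct {
	ok  bool
	err error
}

// addItem is a hypothetical stand-in for s.addItemSync.
func addItem(name string) (bool, error) {
	if name == "" {
		return false, errors.New("empty name")
	}
	return true, nil
}

// addItemAsync runs addItem in a goroutine and returns a channel that will
// receive exactly one result.
func addItemAsync(name string) <-chan result {
	out := make(chan result, 1) // buffer of 1: the send never blocks
	go func() {
		ok, err := addItem(name)
		out <- result{ok: ok, err: err}
	}()
	return out
}

func main() {
	pending := addItemAsync("chan-1")
	// ... do other work here ...
	r := <-pending
	fmt.Println(r.ok, r.err) // true <nil>
}
```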
In your case if you just want to fire off a go-routine and log if an error happens, you can use an anonymous function:
go func() {
if ok, err := s.addItemSync(ch, cs.ResponseQueue, user); !ok {
if err != nil {
log.Log.Error("Error adding channel", zap.Error(err))
}
}
}()
Or if you want to wait for the result:
errChan := make(chan error)
go func() {
if ok, err := s.addItemSync(ch, cs.ResponseQueue, user); !ok {
if err != nil {
errChan <- err
return
}
}
errChan <- nil
}()
// do some other stuff while we wait...
// block until go routine returns
if err := <- errChan; err != nil {
log.Log.Error("Error adding channel", zap.Error(err))
}
Note:
Your code as written may have unexpected results if it is possible for a response where ok == false not to return an error. If this is a concern, I would suggest creating and returning a new error for cases where !ok && err == nil.

Go routine leak fix

I am working on a small service at the moment. From my testing, the code I've written has the possibility of leaking go routines under certain circumstances pertaining to the context. Is there a good and/or idiomatic way to remedy this? I'm providing some sample code below.
func Handle(ctx context.Context, r *Req) (*Response, error) {
ctx, cancel := context.WithTimeout(ctx, time.Second * 5)
defer cancel()
resChan := make(chan Response)
errChan := make(chan error)
go process(r, resChan, errChan)
select {
case <-ctx.Done():
return nil, ctx.Err()
case res := <-resChan:
return &res, nil
case err := <-errChan:
return nil, err
}
}
func process(r *Req, resChan chan<- Response, errChan chan<- error) {
defer close(errChan)
defer close(resChan)
err := doSomeWork()
if err != nil {
errChan <- err
return
}
err = doSomeMoreWork()
if err != nil {
errChan <- err
return
}
res := Response{}
resChan <- res
}
Hypothetically, if the client cancelled the context or the timeout occurred before the process func had a chance to send on one of the unbuffered channels (resChan, errChan), there would be no channel readers left from Handle and sending on the channels would block indefinitely with no readers. Since process would not return in this case, the channels would also not be closed.
I came up with process2 as a solution, but I can't help thinking I'm doing something wrong, or that there's a better way to handle this.
func process2(ctx context.Context, r *Req, resChan chan<- Response, errChan chan<- error) {
defer close(errChan)
defer close(resChan)
err := doSomeWork()
select {
case <-ctx.Done():
return
default:
if err != nil {
errChan <- err
return
}
}
err = doSomeMoreWork()
select {
case <-ctx.Done():
return
default:
if err != nil {
errChan <- err
return
}
}
res := Response{}
select{
case <-ctx.Done():
return
default:
resChan <- res
}
}
This approach makes sure that each time a channel send is attempted, first the context is checked for having been completed or cancelled. If it was, then it does not attempt the send and returns. I'm pretty sure this fixes any go routine leaking happening in the first process func.
Is there a better way? Maybe I have this all wrong.
