I've searched a lot but could not find an answer to my problem.
I need to make multiple concurrent calls to an external API, each with different parameters.
For each call I need to initialize a struct for its dataset and process the data I receive from the API call. Bear in mind that I read the incoming response line by line and immediately send each line to the channel.
The first problem I ran into, which was not obvious at the beginning because of the large quantity of data I'm receiving, is that each goroutine does not receive all the data that goes through the channel (which I learned from my research). So what I need is a way of requeuing/redirecting that data to the correct goroutine.
This is the function that sends the streamed response for a single dataset.
(I've cut the parts of the code that are irrelevant here.)
func (api *API) RequestData(ctx context.Context, c chan DWeatherResponse, dataset string, wg *sync.WaitGroup) error {
for {
line, err := reader.ReadBytes('\n')
s := string(line)
if err != nil {
log.Printf("End of %s", dataset)
return err
}
data, err := extractDataFromStreamLine(s, dataset)
if err != nil {
continue
}
c <- *data
}
}
The function that will process the incoming data
func (s *StrikeStruct) Process(ch, requeue chan dweather.DWeatherResponse) {
for {
data, more := <-ch
if !more {
break
}
// data contains {dataset string, value float64, date time.Time}
// The s.Parameter needs to match the dataset
// IMPORTANT PART, checks if the received data is part of this struct dataset
// If not I want to send it to another goroutine until it gets to the correct
// one. There will be a max of 4 datasets but still this could not be the best approach to have
if !api.GetDataset(s.Parameter, data.Dataset) {
requeue <- data
continue
}
// Do stuff with the data from this point
}
}
Now on my own API endpoint I have the following:
ch := make(chan dweather.DWeatherResponse, 2)
requeue := make(chan dweather.DWeatherResponse)
final := make(chan strike.StrikePerYearResponse)
var wg sync.WaitGroup
for _, s := range args.Parameters.Strikes {
strike := strike.StrikePerYear{
Parameter: strike.Parameter(s.Dataset),
StrikeValue: s.Value,
}
// I receive and process the data in here
go strike.ProcessStrikePerYear(ch, requeue, final, string(s.Dataset))
}
go func() {
for {
data, _ := <-requeue
ch <- data
}
}()
// Creates a goroutine for each dataset
for _, dataset := range api.Params.Dataset {
wg.Add(1)
go api.RequestData(ctx, ch, dataset, &wg)
}
wg.Wait()
close(ch)
//Once the data is all processed it is all appended
var strikes []strike.StrikePerYearResponse
for range args.Fetch.Datasets {
strikes = append(strikes, <-final)
}
return strikes
The issue with this code is that as soon as I start receiving data from more than one endpoint, the requeue blocks and nothing more happens. If I remove the requeue logic, data is lost whenever it does not land on the correct goroutine.
My two questions are:
Why is the requeue blocking if it has a goroutine always ready to receive?
Should I take a different approach to processing the incoming data?
This is not a good way to solve your problem; you should change your approach. I suggest an implementation like the one below:
import (
"fmt"
"sync"
)
// answer for https://stackoverflow.com/questions/68454226/how-to-handle-multiple-goroutines-that-share-the-same-channel
var (
finalResult = make(chan string)
)
// IData use for message dispatcher that all struct must implement its method
type IData interface {
IsThisForMe() bool
Process(*sync.WaitGroup)
}
//MainData can be your main struct like StrikePerYear
type MainData struct {
// add any props
Id int
Name string
}
type DataTyp1 struct {
MainData *MainData
}
func (d DataTyp1) IsThisForMe() bool {
// you can check your condition here to checking incoming data
if d.MainData.Id == 2 {
return true
}
return false
}
func (d DataTyp1) Process(wg *sync.WaitGroup) {
d.MainData.Name = "processed by DataTyp1"
// send result to final channel, you can change it as you want
finalResult <- d.MainData.Name
wg.Done()
}
type DataTyp2 struct {
MainData *MainData
}
func (d DataTyp2) IsThisForMe() bool {
// you can check your condition here to checking incoming data
if d.MainData.Id == 3 {
return true
}
return false
}
func (d DataTyp2) Process(wg *sync.WaitGroup) {
d.MainData.Name = "processed by DataTyp2"
// send result to final channel, you can change it as you want
finalResult <- d.MainData.Name
wg.Done()
}
//dispatcher will run new go routine for each request.
//you can implement a worker pool to preventing running too many go routines.
func dispatcher(incomingData *MainData, wg *sync.WaitGroup) {
// based on your requirements you can remove this go routing or not
go func() {
var p IData
p = DataTyp1{incomingData}
if p.IsThisForMe() {
go p.Process(wg)
return
}
p = DataTyp2{incomingData}
if p.IsThisForMe() {
go p.Process(wg)
return
}
}()
}
func main() {
dummyDataArray := []MainData{
MainData{Id: 2, Name: "this data #2"},
MainData{Id: 3, Name: "this data #3"},
}
wg := sync.WaitGroup{}
for i := range dummyDataArray {
wg.Add(1)
dispatcher(&dummyDataArray[i], &wg)
}
result := make([]string, 0)
done := make(chan struct{})
// data collector
go func() {
loop:for {
select {
case <-done:
break loop
case r := <-finalResult:
result = append(result, r)
}
}
}()
wg.Wait()
done<- struct{}{}
for _, s := range result {
fmt.Println(s)
}
}
Note: this is only meant to point you toward a better solution; it is certainly not production-ready code.
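As for why the requeue blocks: a likely explanation is a cycle. The requeue goroutine receives one item and then tries to send it on ch; if ch's buffer is full and every worker is itself blocked sending on the unbuffered requeue channel, nobody is left to receive and everything stalls. One way to avoid requeuing altogether is to give each dataset its own channel and route by dataset name. The following is only a minimal sketch under assumptions: Reading, route and sumPerDataset are hypothetical names (Reading stands in for dweather.DWeatherResponse), and summing values stands in for the real processing.

```go
package main

import (
	"fmt"
	"sync"
)

// Reading is a hypothetical stand-in for dweather.DWeatherResponse.
type Reading struct {
	Dataset string
	Value   float64
}

// route forwards every reading to the channel registered for its dataset,
// then closes all worker channels once the input is exhausted.
func route(in <-chan Reading, outs map[string]chan Reading) {
	for r := range in {
		if out, ok := outs[r.Dataset]; ok {
			out <- r
		}
		// Readings for unknown datasets are dropped in this sketch.
	}
	for _, out := range outs {
		close(out)
	}
}

// sumPerDataset starts one worker per dataset and returns the sum of the
// values each worker received. Summing is a placeholder for real processing.
func sumPerDataset(readings []Reading, datasets []string) map[string]float64 {
	in := make(chan Reading)
	outs := make(map[string]chan Reading, len(datasets))
	results := make(map[string]float64, len(datasets))

	var mu sync.Mutex // guards results
	var wg sync.WaitGroup
	for _, d := range datasets {
		ch := make(chan Reading)
		outs[d] = ch
		wg.Add(1)
		go func(name string, ch <-chan Reading) {
			defer wg.Done()
			total := 0.0
			for r := range ch {
				total += r.Value
			}
			mu.Lock()
			results[name] = total
			mu.Unlock()
		}(d, ch)
	}

	go route(in, outs)
	for _, r := range readings {
		in <- r
	}
	close(in)
	wg.Wait()
	return results
}

func main() {
	readings := []Reading{{"temp", 1}, {"rain", 2}, {"temp", 3}}
	fmt.Println(sumPerDataset(readings, []string{"temp", "rain"}))
}
```

Because each worker only ever sees its own dataset, no worker has to hand data back, so the send/receive cycle cannot form.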
I am building a daemon with two services that send data to and from each other. Service A produces the data, and Service B is a data buffer service, similar to a queue. In main.go, Service B is instantiated and started. Its Start() method runs the buffer() function as a goroutine, because this function waits for data to be passed on a channel and I don't want the main process to halt waiting for buffer to complete. Then Service A is instantiated and started. It is then also "registered" with Service B.
I created a method called RegisterWithBufferService for Service A that creates two new channels. It stores those channels as its own attributes and also provides them to Service B.
func (s *ServiceA) RegisterWithBufferService(bufService *data.DataBuffer) error {
newIncomingChan := make(chan *data.DataFrame, 1)
newOutgoingChan := make(chan []byte, 1)
s.IncomingBuffChan = newIncomingChan
s.OutgoingDataChannels = append(s.OutgoingDataChannels, newOutgoingChan)
bufService.DataProviders[s.ServiceName()] = data.DataProviderInfo{
IncomingChan: newOutgoingChan, //our outGoing channel is their incoming
OutgoingChan: newIncomingChan, // our incoming channel is their outgoing
}
s.DataBufferService = bufService
bufService.NewProvider <- s.ServiceName() //The DataBuffer service listens for new services and creates a new goroutine for buffering
s.Logger.Info().Msg("Registeration completed.")
return nil
}
Buffer essentially listens for incoming data from Service A, decodes it using Decode() and then adds it to a slice called buf. If the slice is greater in length than bufferPeriod then it will send the first item in the slice in the Outgoing channel back to Service A.
func (b* DataBuffer) buffer(bufferPeriod int) {
for {
select {
case newProvider := <- b.NewProvider:
b.wg.Add(1)
/*
newProvider is a string
DataProviders is a map the value it returns is a struct containing the Incoming and
Outgoing channels for this service
*/
p := b.DataProviders[newProvider]
go func(prov string, in chan []byte, out chan *DataFrame) {
defer b.wg.Done()
var buf []*DataFrame
for {
select {
case rawData := <-in:
tmp := Decode(rawData) //custom decoding function. Returns a *DataFrame
buf = append(buf, tmp)
if len(buf) > bufferPeriod {
b.Logger.Info().Msg("Sending decoded data out.")
out <- buf[0]
buf = buf[1:] //pop
}
case <- b.Quit:
return
}
}
}(newProvider, p.IncomingChan, p.OutgoingChan)
case <- b.Quit:
return
}
}
}
Now Service A has a method called record that periodically pushes data to all the channels in its OutgoingDataChannels attribute.
func (s *ServiceA) record() error {
...
if atomic.LoadInt32(&s.Listeners) != 0 {
s.Logger.Info().Msg("Sending raw data to data buffer")
for _, outChan := range s.OutgoingDataChannels {
outChan <- dataBytes // the receiver (Service B) is already listening and this doesn't hang
}
s.Logger.Info().Msg("Raw data sent and received") // The logger will output this so I know it's not hanging
}
return nil
}
The problem is that Service A seems to push the data successfully using record, but Service B never enters the case rawData := <-in: branch in the buffer sub-goroutine. Is this because I have nested goroutines? In case it's not clear: when Service B is started, it calls buffer, and because that call would otherwise block, I run buffer as a goroutine. Then, when Service A calls RegisterWithBufferService, the buffer goroutine creates another goroutine to listen for new data from Service A and push it back to Service A once the buffer is filled. I hope I explained it clearly.
EDIT 1
I've made a minimal, reproducible example.
package main
import (
"fmt"
"sync"
"sync/atomic"
"time"
)
var (
defaultBufferingPeriod int = 3
DefaultPollingInterval int64 = 10
)
type DataObject struct{
Data string
}
type DataProvider interface {
RegisterWithBufferService(*DataBuffer) error
ServiceName() string
}
type DataProviderInfo struct{
IncomingChan chan *DataObject
OutgoingChan chan *DataObject
}
type DataBuffer struct{
Running int32 //used atomically
DataProviders map[string]DataProviderInfo
Quit chan struct{}
NewProvider chan string
wg sync.WaitGroup
}
func NewDataBuffer() *DataBuffer{
var (
wg sync.WaitGroup
)
return &DataBuffer{
DataProviders: make(map[string]DataProviderInfo),
Quit: make(chan struct{}),
NewProvider: make(chan string),
wg: wg,
}
}
func (b *DataBuffer) Start() error {
if ok := atomic.CompareAndSwapInt32(&b.Running, 0, 1); !ok {
return fmt.Errorf("Could not start Data Buffer Service.")
}
go b.buffer(defaultBufferingPeriod)
return nil
}
func (b *DataBuffer) Stop() error {
if ok := atomic.CompareAndSwapInt32(&b.Running, 1, 0); !ok {
return fmt.Errorf("Could not stop Data Buffer Service.")
}
for _, p := range b.DataProviders {
close(p.IncomingChan)
close(p.OutgoingChan)
}
close(b.Quit)
b.wg.Wait()
return nil
}
// buffer creates goroutines for each incoming, outgoing data pair and decodes the incoming bytes into outgoing DataFrames
func (b *DataBuffer) buffer(bufferPeriod int) {
for {
select {
case newProvider := <- b.NewProvider:
fmt.Println("Received new Data provider.")
if _, ok := b.DataProviders[newProvider]; ok {
b.wg.Add(1)
p := b.DataProviders[newProvider]
go func(prov string, in chan *DataObject, out chan *DataObject) {
defer b.wg.Done()
var (
buf []*DataObject
)
fmt.Printf("Waiting for data from: %s\n", prov)
for {
select {
case inData := <-in:
fmt.Printf("Received data from: %s\n", prov)
buf = append(buf, inData)
if len(buf) > bufferPeriod {
fmt.Printf("Queue is filled, sending data back to %s\n", prov)
out <- buf[0]
fmt.Println("Data Sent")
buf = buf[1:] //pop
}
case <- b.Quit:
return
}
}
}(newProvider, p.IncomingChan, p.OutgoingChan)
}
case <- b.Quit:
return
}
}
}
type ServiceA struct{
Active int32 // atomic
Stopping int32 // atomic
Recording int32 // atomic
Listeners int32 // atomic
name string
QuitChan chan struct{}
IncomingBuffChan chan *DataObject
OutgoingBuffChans []chan *DataObject
DataBufferService *DataBuffer
}
// A compile time check to ensure ServiceA fully implements the DataProvider interface
var _ DataProvider = (*ServiceA)(nil)
func NewServiceA() (*ServiceA, error) {
var newSliceOutChans []chan *DataObject
return &ServiceA{
QuitChan: make(chan struct{}),
OutgoingBuffChans: newSliceOutChans,
name: "SERVICEA",
}, nil
}
// Start starts the service. Returns an error if any issues occur
func (s *ServiceA) Start() error {
atomic.StoreInt32(&s.Active, 1)
return nil
}
// Stop stops the service. Returns an error if any issues occur
func (s *ServiceA) Stop() error {
atomic.StoreInt32(&s.Stopping, 1)
close(s.QuitChan)
return nil
}
func (s *ServiceA) StartRecording(pol_int int64) error {
if ok := atomic.CompareAndSwapInt32(&s.Recording, 0, 1); !ok {
return fmt.Errorf("Could not start recording. Data recording already started")
}
ticker := time.NewTicker(time.Duration(pol_int) * time.Second)
go func() {
for {
select {
case <-ticker.C:
fmt.Println("Time to record...")
err := s.record()
if err != nil {
return
}
case <-s.QuitChan:
ticker.Stop()
return
}
}
}()
return nil
}
func (s *ServiceA) record() error {
current_time := time.Now()
ct := fmt.Sprintf("%02d-%02d-%d", current_time.Day(), current_time.Month(), current_time.Year())
dataObject := &DataObject{
Data: ct,
}
if atomic.LoadInt32(&s.Listeners) != 0 {
fmt.Println("Sending data to Data buffer...")
for _, outChan := range s.OutgoingBuffChans {
outChan <- dataObject // the receivers should already be listening
}
fmt.Println("Data sent.")
}
return nil
}
// RegisterWithBufferService satisfies the DataProvider interface. It provides the bufService with new incoming and outgoing channels along with a polling interval
func (s ServiceA) RegisterWithBufferService(bufService *DataBuffer) error {
if _, ok := bufService.DataProviders[s.ServiceName()]; ok {
return fmt.Errorf("%v data provider already registered with Data Buffer.", s.ServiceName())
}
newIncomingChan := make(chan *DataObject, 1)
newOutgoingChan := make(chan *DataObject, 1)
s.IncomingBuffChan = newIncomingChan
s.OutgoingBuffChans = append(s.OutgoingBuffChans, newOutgoingChan)
bufService.DataProviders[s.ServiceName()] = DataProviderInfo{
IncomingChan: newOutgoingChan, //our outGoing channel is their incoming
OutgoingChan: newIncomingChan, // our incoming channel is their outgoing
}
s.DataBufferService = bufService
bufService.NewProvider <- s.ServiceName() //The DataBuffer service listens for new services and creates a new goroutine for buffering
return nil
}
// ServiceName satisfies the DataProvider interface. It returns the name of the service.
func (s ServiceA) ServiceName() string {
return s.name
}
func main() {
var BufferedServices []DataProvider
fmt.Println("Instantiating and Starting Data Buffer Service...")
bufService := NewDataBuffer()
err := bufService.Start()
if err != nil {
panic(fmt.Sprintf("%v", err))
}
defer bufService.Stop()
fmt.Println("Data Buffer Service successfully started.")
fmt.Println("Instantiating and Starting Service A...")
serviceA, err := NewServiceA()
if err != nil {
panic(fmt.Sprintf("%v", err))
}
BufferedServices = append(BufferedServices, *serviceA)
err = serviceA.Start()
if err != nil {
panic(fmt.Sprintf("%v", err))
}
defer serviceA.Stop()
fmt.Println("Service A successfully started.")
fmt.Println("Registering services with Data Buffer...")
for _, s := range BufferedServices {
_ = s.RegisterWithBufferService(bufService) // ignoring error msgs for base case
}
fmt.Println("Registration complete.")
fmt.Println("Beginning recording...")
_ = atomic.AddInt32(&serviceA.Listeners, 1)
err = serviceA.StartRecording(DefaultPollingInterval)
if err != nil {
panic(fmt.Sprintf("%v", err))
}
for {
select {
case RTD := <-serviceA.IncomingBuffChan:
fmt.Println(RTD)
case <-serviceA.QuitChan:
atomic.StoreInt32(&serviceA.Listeners, 0)
bufService.Quit<-struct{}{}
}
}
}
Running on Go 1.17. When running the example, it should print the following every 10 seconds:
Time to record...
Sending data to Data buffer...
Data sent.
But then Data buffer never goes into the inData := <-in case.
To diagnose this I changed fmt.Println("Sending data to Data buffer...") to fmt.Println("Sending data to Data buffer...", s.OutgoingBuffChans) and the output was:
Time to record...
Sending data to Data buffer... []
So you are not actually sending the data to any channels. The reason for this is:
func (s ServiceA) RegisterWithBufferService(bufService *DataBuffer) error {
As the receiver is not a pointer when you do the s.OutgoingBuffChans = append(s.OutgoingBuffChans, newOutgoingChan) you are changing s.OutgoingBuffChans in a copy of the ServiceA which is discarded when the function exits. To fix this change:
func (s ServiceA) RegisterWithBufferService(bufService *DataBuffer) error {
to
func (s *ServiceA) RegisterWithBufferService(bufService *DataBuffer) error {
and
BufferedServices = append(BufferedServices, *serviceA)
to
BufferedServices = append(BufferedServices, serviceA)
The amended version outputs:
Time to record...
Sending data to Data buffer... [0xc0000d8060]
Data sent.
Received data from: SERVICEA
Time to record...
Sending data to Data buffer... [0xc0000d8060]
Data sent.
Received data from: SERVICEA
So this resolves the reported issue (I would not be surprised if there are other issues, but hopefully this points you in the right direction). I did notice that the code you originally posted does use a pointer receiver, so it might have suffered from a different issue (but it's difficult to comment on code fragments in a case like this).
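The difference between the two receiver kinds can be reproduced in isolation. This is a minimal sketch; Service, registerByValue and registerByPointer are hypothetical names that only mimic the shape of ServiceA.

```go
package main

import "fmt"

// Service is a hypothetical, stripped-down stand-in for ServiceA.
type Service struct {
	outgoing []chan int
}

// registerByValue operates on a copy of the Service, so the append is lost
// when the method returns.
func (s Service) registerByValue() {
	s.outgoing = append(s.outgoing, make(chan int, 1))
}

// registerByPointer modifies the caller's Service.
func (s *Service) registerByPointer() {
	s.outgoing = append(s.outgoing, make(chan int, 1))
}

func main() {
	svc := &Service{}

	svc.registerByValue()
	fmt.Println("after value receiver:", len(svc.outgoing)) // prints "after value receiver: 0"

	svc.registerByPointer()
	fmt.Println("after pointer receiver:", len(svc.outgoing)) // prints "after pointer receiver: 1"
}
```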
I have the following models:
type ChromeBasedDirections struct {
CurrencyFromName string
CurrencyToName string
URL string
ParseResponse ParseResponse
}
type ParseResponse struct {
CurrentPrice float64
Change24H float64
Err error
}
type ParseResponseChan struct {
URL string
CurrentPrice float64
Change24H float64
Err error
}
The function looks like this:
func ParseBNCByURL(u string, chanResponse chan models.ParseResponseChan) {
var parseResponse models.ParseResponseChan
//..
// code that fill up parseResponse
//..
if err != nil {
parseResponse.Err = err
chanResponse <- parseResponse
return
}
parseResponse.CurrentPrice = price
parseResponse.Change24H = change24H
chanResponse <- parseResponse
return
}
And this is how the function is called:
func initParserMultiTread() {
var urls = []string{"url_0","url_1","url_2","url_n"} // on production it will be taken from the json file
var chromeBasedDirections []models.ChromeBasedDirections
for _, url := range urls {
chromeBasedDirections = append(chromeBasedDirections, models.ChromeBasedDirections{
CurrencyFromName: "",
CurrencyToName: "",
URL: url,
ParseResponse: models.ParseResponse{},
})
}
var parseResponseChan = make(chan models.ParseResponseChan) // how can I do this without hardcoding?
for _, dir := range chromeBasedDirections {
go controllers.ParseBNCByURL(dir.URL, parseResponseChan)
}
}
If initParserMultiTread is executed, it returns before ParseBNCByURL has finished, because the channel is never read. If hardcoding were acceptable, I could create a separate chan models.ParseResponseChan for each url_n, then compare ChromeBasedDirections.URL with ParseResponseChan.URL in a loop and, on a match, fill in ChromeBasedDirections.ParseResponse.
But I need to avoid hardcoding and do everything in a loop. This is far from my first attempt at running ParseBNCByURL concurrently.
That is, I have ChromeBasedDirections values with the CurrencyFromName, CurrencyToName and URL fields filled in, and I need to run ParseBNCByURL concurrently to fill in ParseResponse.
I tried again as follows, but the calls do not run concurrently:
var urls = []string{"url_0","url_1","url_2","url_n"}
var chromeResults []models.ChromeBasedDirections
for _, url := range urls {
var chromeResult = make(chan models.ChromeBasedDirections)
go controllers.ParseBNCByURL(url, chromeResult)
chromeResults = append(chromeResults, <-chromeResult)
}
The problem with your code is that you are not reading the parseResponseChan anywhere. All your goroutines get blocked because of that, and that is why the initParserMultiTread function finishes before your goroutines.
You need to add code for reading the parseResponseChan channel, as well as code for closing the parseResponseChan channel once all goroutines are done with execution.
One solution is synchronization with sync.WaitGroup.
First a change in your ParseBNCByURL function. This function needs to send a signal when it's done.
func ParseBNCByURL(wg *sync.WaitGroup, u string, chanResponse chan models.ParseResponseChan) {
defer wg.Done() // this signals when the goroutine is done
// the rest of the function's body
}
In the initParserMultiTread function, you need a way to wait for signals from all goroutines to close the parseResponseChan channel and to read the response from that channel.
func initParserMultiTread() {
var urls = []string{"url_0","url_1","url_2","url_n"} // on production it will be taken from the json file
var chromeBasedDirections []models.ChromeBasedDirections
var wg sync.WaitGroup
wg.Add(len(urls)) // add how many goroutines will be initialized
for _, url := range urls {
chromeBasedDirections = append(chromeBasedDirections, models.ChromeBasedDirections{
CurrencyFromName: "",
CurrencyToName: "",
URL: url,
ParseResponse: models.ParseResponse{},
})
}
var parseResponseChan = make(chan models.ParseResponseChan)
//init a goroutine for closing the parseResponseChan once all goroutines are done
go func(wg *sync.WaitGroup, ch chan models.ParseResponseChan) {
wg.Wait() //wait for all goroutines to send the signal
close(ch) //close the channel
}(&wg, parseResponseChan)
for _, dir := range chromeBasedDirections {
go controllers.ParseBNCByURL(&wg, dir.URL, parseResponseChan)
}
//read the responses from the channel
for response := range parseResponseChan {
//some code to handle the responses
}
}
Once all the goroutines are initialized, you start reading from the parseResponseChan channel. This will block the initParserMultiTread function from returning until all goroutines are done with execution and the parseResponseChan channel closes.
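To make the pattern easy to try out without the scraping code, here is a self-contained sketch of the same idea. Result, collect and fakeParse are hypothetical names (Result stands in for models.ParseResponseChan), and keying the results by URL is just one possible way to match each response back to its ChromeBasedDirections entry.

```go
package main

import (
	"fmt"
	"sync"
)

// Result is a hypothetical stand-in for models.ParseResponseChan.
type Result struct {
	URL   string
	Price float64
	Err   error
}

// collect runs parse concurrently for every URL and returns the results
// keyed by URL, so they can be matched back to their source entries.
func collect(urls []string, parse func(string) (float64, error)) map[string]Result {
	results := make(chan Result)

	var wg sync.WaitGroup
	wg.Add(len(urls))
	for _, u := range urls {
		go func(u string) {
			defer wg.Done()
			price, err := parse(u)
			results <- Result{URL: u, Price: price, Err: err}
		}(u)
	}

	// Close the channel once every worker has sent its result, so the
	// range loop below terminates.
	go func() {
		wg.Wait()
		close(results)
	}()

	byURL := make(map[string]Result, len(urls))
	for r := range results {
		byURL[r.URL] = r
	}
	return byURL
}

func main() {
	// fakeParse is a placeholder for the real parsing logic.
	fakeParse := func(u string) (float64, error) {
		return float64(len(u)), nil
	}
	for url, r := range collect([]string{"url_0", "url_10"}, fakeParse) {
		fmt.Println(url, r.Price, r.Err)
	}
}
```

Only the goroutine that ranges over the channel touches the map, so the map itself needs no lock.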
How can we set up something like a listener on a Go channel, so that we are notified when someone reads from the channel?
Imagine we keep a sequence number for channel entries and want to decrement it whenever someone outside our package reads a value from the channel.
Unbuffered channels hand off data synchronously, so you already know when the data is read: the send returns only once a receiver has taken the value. Buffered channels behave similarly when the buffer is full, but otherwise a send returns as soon as the value is stored in the buffer, so this approach wouldn't tell you quite the same thing. Depending on what your needs really are, consider also using tools like sync.WaitGroup.
ch = make(chan Data)
⋮
for {
⋮
// make data available
ch <- data
// now you know it was read
sequenceNumber--
⋮
}
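A runnable version of that idea might look like the sketch below. sendAll is a hypothetical helper, and the consumer goroutine stands in for code outside your package.

```go
package main

import (
	"fmt"
	"sync"
)

// sendAll sends every item on an unbuffered channel and decrements the
// pending counter each time a send returns, i.e. each time a receiver has
// taken the value. It returns the final counter and how many items the
// consumer saw.
func sendAll(items []int) (pending int, received int) {
	ch := make(chan int) // unbuffered: a send completes only when it is received
	pending = len(items)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for range ch {
			received++
		}
	}()

	for _, it := range items {
		ch <- it
		pending-- // the value has been handed to a receiver at this point
	}
	close(ch)
	wg.Wait() // after this, reading received from this goroutine is safe
	return pending, received
}

func main() {
	pending, received := sendAll([]int{10, 20, 30})
	fmt.Println("pending:", pending, "received:", received)
}
```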
You could create a channel relay mechanism, to capture read events in realtime.
So for example:
func relayer(in <-chan MyStruct) <-chan MyStruct {
out := make(chan MyStruct) // non-buffered chan (see below)
go func() {
defer close(out)
readCountLimit := 10
for item := range in {
out <- item
// ^^^^ so this will block until some worker has read from 'out'
readCountLimit--
}
}()
return out
}
Usage:
type MyStruct struct {
// put your data fields here
}
ch := make(chan MyStruct) // <- original channel - used by producer to write to
rch := relayer(ch) // <- relay channel - used to read from
// consumers
go worker("worker 1", rch)
go worker("worker 2", rch)
// producer
for { ch <- MyStruct{} }
You can do it manually by implementing some sort of ACK marker on the message.
Something like this:
type Msg struct {
Data int
ack bool
}
func (m *Msg) Ack() {
m.ack = true
}
func (m *Msg) Acked() bool {
return m.ack
}
func main() {
ch := make(chan *Msg)
msg := &Msg{Data: 1}
go func() {
for {
if msg.Acked() {
// do smth
}
time.Sleep(10 * time.Second)
}
}()
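// Send from a separate goroutine: on an unbuffered channel, sending from main
// before any receiver is running would block forever.
go func() {
ch <- msg
close(ch) // lets the range loop below finish
}()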
for msg := range ch {
msg.Ack()
}
}
Code not tested.
You can also pass additional information to the Ack() method, for example metadata about the package and function from which Ack() was called. This answer may be related: https://stackoverflow.com/a/35213181/3782382
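A variation on the ACK idea that avoids polling a shared bool is to let each message carry its own acknowledgement channel, which the reader closes. This is only a sketch under assumptions: NewMsg, the Acked channel accessor and sendAndWait are hypothetical names, not part of the code above.

```go
package main

import "fmt"

// Msg carries its payload plus a channel that the reader closes to
// acknowledge the message.
type Msg struct {
	Data int
	ack  chan struct{}
}

// NewMsg creates a message with a fresh acknowledgement channel.
func NewMsg(data int) *Msg {
	return &Msg{Data: data, ack: make(chan struct{})}
}

// Ack signals that the message was read. It must be called at most once,
// because closing a closed channel panics.
func (m *Msg) Ack() { close(m.ack) }

// Acked returns a channel that is closed once Ack has been called.
func (m *Msg) Acked() <-chan struct{} { return m.ack }

// sendAndWait sends each value to a reader goroutine and waits for its
// acknowledgement before sending the next one. It returns the number of
// acknowledged messages.
func sendAndWait(values []int) int {
	ch := make(chan *Msg)
	go func() {
		for m := range ch {
			m.Ack() // the reader acknowledges each message it receives
		}
	}()

	acked := 0
	for _, v := range values {
		m := NewMsg(v)
		ch <- m
		<-m.Acked() // blocks until the reader has called Ack
		acked++
	}
	close(ch)
	return acked
}

func main() {
	fmt.Println("acknowledged:", sendAndWait([]int{1, 2, 3}))
}
```

Waiting on a closed channel gives the sender a synchronization point, so no sleep loop or extra locking is needed.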
I have a channel of thousands of IDs that need to be processed in parallel inside goroutines. How could I implement a lock so that goroutines cannot process the same id at the same time, should it be repeated in the channel?
package main
import (
"fmt"
"sync"
"strconv"
"time"
)
var wg sync.WaitGroup
func main() {
var data []string
for d := 0; d < 30; d++ {
data = append(data, "id1")
data = append(data, "id2")
data = append(data, "id3")
}
chanData := createChan(data)
for i := 0; i < 10; i++ {
wg.Add(1)
process(chanData, i)
}
wg.Wait()
}
func createChan(data []string) <-chan string {
var out = make(chan string)
go func() {
for _, val := range data {
out <- val
}
close(out)
}()
return out
}
func process(ids <-chan string, i int) {
go func() {
defer wg.Done()
for id := range ids {
fmt.Println(id + " (goroutine " + strconv.Itoa(i) + ")")
time.Sleep(1 * time.Second)
}
}()
}
--edit:
All values need to be processed in any order, but "id1", "id2" & "id3" need to block so they cannot be processed by more than one goroutine at the same time.
The simplest solution here is to not send the duplicate values at all, and then no synchronization is required.
func createChan(data []string) <-chan string {
seen := make(map[string]bool)
var out = make(chan string)
go func() {
for _, val := range data {
if seen[val] {
continue
}
seen[val] = true
out <- val
}
close(out)
}()
return out
}
I've found a solution. Someone has written a package (github.com/EagleChen/mapmutex) to do exactly what I needed:
package main
import (
"fmt"
"github.com/EagleChen/mapmutex"
"strconv"
"sync"
"time"
)
var wg sync.WaitGroup
var mutex *mapmutex.Mutex
func main() {
mutex = mapmutex.NewMapMutex()
var data []string
for d := 0; d < 30; d++ {
data = append(data, "id1")
data = append(data, "id2")
data = append(data, "id3")
}
chanData := createChan(data)
for i := 0; i < 10; i++ {
wg.Add(1)
process(chanData, i)
}
wg.Wait()
}
func createChan(data []string) <-chan string {
var out = make(chan string)
go func() {
for _, val := range data {
out <- val
}
close(out)
}()
return out
}
func process(ids <-chan string, i int) {
go func() {
defer wg.Done()
for id := range ids {
if mutex.TryLock(id) {
fmt.Println(id + " (goroutine " + strconv.Itoa(i) + ")")
time.Sleep(1 * time.Second)
mutex.Unlock(id)
}
}
}()
}
Your problem as stated is difficult by definition and my first choice would be to re-architect the application to avoid it, but if that's not an option:
First, I assume that if a given ID is repeated you still want it processed twice, but not in parallel (if that's not the case and the 2nd instance must be ignored, it becomes even more difficult, because you have to remember every ID you have processed forever, so you don't run the task over it twice).
To achieve your goal, you must keep track of every ID that is currently being processed in a goroutine. A Go map is your best option here (note that its size will grow up to the number of goroutines you run in parallel). The map itself must be protected by a lock, as it is modified from multiple goroutines.
Another simplification I'd make is to assume it is OK for an ID removed from the channel to be added back to it if it is found to be currently processed by another goroutine. Then we need a map[string]bool as the tracking device, plus a sync.Mutex to guard it. For simplicity, I assume the map, mutex and the channel are global variables; that may not be convenient for you, so arrange access to those as you see fit (arguments to the goroutine, closure, etc.).
import "sync"
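var idmap = make(map[string]bool) // must be initialized: writing to a nil map panics
var mtx sync.Mutex
var queue chan string // assumed to be created with make(chan string) before use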
func process_one_id(id string) {
busy := false
mtx.Lock()
if idmap[id] {
busy = true
} else {
idmap[id] = true
}
mtx.Unlock()
if busy { // put the ID back on the queue and exit
queue <- id
return
}
// ensure the 'busy' mark is cleared at the end:
defer func() { mtx.Lock(); delete(idmap, id); mtx.Unlock() }()
// do your processing here
// ....
}
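For comparison, here is a sketch of a different technique: instead of putting a busy ID back on the queue, each worker blocks on a per-ID mutex until the ID is free. keyedMutex and processAll are hypothetical names, the bookkeeping in processAll exists only to check that no ID is handled by two goroutines at once, and note that in this sketch the lock map grows with the number of distinct IDs.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedMutex hands out one mutex per key.
type keyedMutex struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newKeyedMutex() *keyedMutex {
	return &keyedMutex{locks: make(map[string]*sync.Mutex)}
}

// get returns the mutex for key, creating it on first use.
func (k *keyedMutex) get(key string) *sync.Mutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	m, ok := k.locks[key]
	if !ok {
		m = &sync.Mutex{}
		k.locks[key] = m
	}
	return m
}

// processAll runs the given number of workers over ids. It returns how many
// IDs were processed and the highest number of goroutines that were ever
// working on the same ID at the same time.
func processAll(ids []string, workers int) (processed, maxPerID int) {
	queue := make(chan string)
	km := newKeyedMutex()

	var statsMu sync.Mutex // guards active, processed and maxPerID
	active := make(map[string]int)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range queue {
				l := km.get(id)
				l.Lock() // blocks while another goroutine handles this ID

				statsMu.Lock()
				active[id]++
				if active[id] > maxPerID {
					maxPerID = active[id]
				}
				statsMu.Unlock()

				// Real per-ID work would go here.

				statsMu.Lock()
				active[id]--
				processed++
				statsMu.Unlock()

				l.Unlock()
			}
		}()
	}

	for _, id := range ids {
		queue <- id
	}
	close(queue)
	wg.Wait()
	return processed, maxPerID
}

func main() {
	var ids []string
	for i := 0; i < 30; i++ {
		ids = append(ids, "id1", "id2", "id3")
	}
	processed, maxPerID := processAll(ids, 10)
	fmt.Println("processed:", processed, "max concurrent per ID:", maxPerID)
}
```

The trade-off is that a worker waiting for a busy ID is idle, whereas the requeue approach keeps workers busy at the cost of reordering the queue.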
I'm pretty new to Go, so sorry if the title is wrong, but I hope you understand my question. I want to dispatch events to different goroutines via a channel. Here is some sample code:
type Event struct {
Host string
Command string
Output string
}
var (
incoming = make(chan Event)
)
func processEmail(ticker* time.Ticker) {
for {
select {
case t := <-ticker.C:
fmt.Println("Email Tick at", t)
case e := <-incoming:
fmt.Println("EMAIL GOT AN EVENT!")
fmt.Println(e)
}
}
}
func processPagerDuty(ticker* time.Ticker) {
for {
select {
case t := <-ticker.C:
fmt.Println("Pagerduty Tick at", t)
case e := <-incoming:
fmt.Println("PAGERDUTY GOT AN EVENT!")
fmt.Println(e)
}
}
}
func main() {
err := gcfg.ReadFileInto(&cfg, "dispatch-api.cfg")
if err != nil {
fmt.Printf("Error loading the config")
}
ticker := time.NewTicker(time.Second * 10)
go processEmail(ticker)
ticker = time.NewTicker(time.Second * 1) // plain assignment: ticker is already declared above
go processPagerDuty(ticker)
}
func eventAdd(r render.Render, params martini.Params, req *http.Request) {
// create an event now
e := Event{Host: "web01-east.domain.com", Command: "foo", Output: "bar"}
incoming <- e
}
So the ticker events work just fine. But when I issue an API call to create an event, I only get output from the processEmail function. It seems that whichever goroutine was started first gets the event from the channel.
Is there a way for both functions to get that event?
You can use fan-in and fan-out (from Rob Pike's talk):
package main
func main() {
// feeders - feeder1, feeder2 and feeder3 are used to fan in
// data into one channel
go func() {
for {
select {
case v1 := <-feeder1:
mainChannel <- v1
case v2 := <-feeder2:
mainChannel <- v2
case v3 := <-feeder3:
mainChannel <- v3
}
}
}()
// dispatchers - not actually fan out rather dispatching data
go func() {
for {
v := <-mainChannel
// use this to prevent leaking goroutines
// (i.e. when one consumer got stuck)
done := make(chan bool)
go func() {
consumer1 <- v
done <- true
}()
go func() {
consumer2 <- v
done <- true
}()
go func() {
consumer3 <- v
done <- true
}()
<-done
<-done
<-done
}
}()
// or fan out (when processing the data by just one consumer is enough)
go func() {
for {
v := <-mainChannel
select {
case consumer1 <- v:
case consumer2 <- v:
case consumer3 <- v:
}
}
}()
// consumers(your logic)
go func() { <-consumer1 /* using the value */ }()
go func() { <-consumer2 /* using the value */ }()
go func() { <-consumer3 /* using the value */ }()
}
type payload int
var (
feeder1 = make(chan payload)
feeder2 = make(chan payload)
feeder3 = make(chan payload)
mainChannel = make(chan payload)
consumer1 = make(chan payload)
consumer2 = make(chan payload)
consumer3 = make(chan payload)
)
Channels are a point to point communication method, not a broadcast communication method, so no, you can't get both functions to get the event without doing something special.
You could have separate channels for both goroutines and send the message into each. This is probably the simplest solution.
Or alternatively you could get one goroutine to signal the next one.
Go has two mechanisms for doing broadcast signalling as far as I know. One is closing a channel. This only works a single time though.
The other is to use a sync.Cond lock. These are moderately tricky to use, but will allow you to have multiple goroutines woken up by a single event.
If I were you, I'd go for the first option and send the event to two different channels. That seems to map the problem quite well.
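Sending the event to a separate channel per consumer could be sketched as follows. broadcast and deliver are hypothetical helpers, and counting events stands in for the email and PagerDuty handlers from the question.

```go
package main

import (
	"fmt"
	"sync"
)

// Event mirrors the struct from the question.
type Event struct {
	Host    string
	Command string
	Output  string
}

// broadcast copies every event from in to each subscriber channel and
// closes the subscriber channels once in is closed.
func broadcast(in <-chan Event, subs ...chan Event) {
	for e := range in {
		for _, s := range subs {
			s <- e // blocks until this subscriber has taken the event
		}
	}
	for _, s := range subs {
		close(s)
	}
}

// deliver pushes events through broadcast to n subscribers and returns how
// many events each subscriber received.
func deliver(events []Event, n int) []int {
	in := make(chan Event)
	subs := make([]chan Event, n)
	counts := make([]int, n)

	var wg sync.WaitGroup
	for i := range subs {
		subs[i] = make(chan Event)
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for range subs[i] {
				counts[i]++ // each goroutine writes only its own slot
			}
		}(i)
	}

	go broadcast(in, subs...)
	for _, e := range events {
		in <- e
	}
	close(in)
	wg.Wait()
	return counts
}

func main() {
	events := []Event{
		{Host: "web01-east.domain.com", Command: "foo", Output: "bar"},
		{Host: "web02-east.domain.com", Command: "foo", Output: "baz"},
	}
	fmt.Println(deliver(events, 2))
}
```

One slow subscriber delays all the others in this sketch; buffered subscriber channels or a per-subscriber goroutine would loosen that coupling.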