stacking data from go routines - go

I'm learning go lang and I'd like to create a go app to achieve the following:
receive data from a remote log
analyze some sort of error of warning
periodically send an HTTP request to a URL informing that everything is ok or send warn and error.
I've been reading about concurrency, parallelism and channels but I'm not sure how I should pass data from my logging goroutine to another routine with a timer to make the request. Should I declare a slice in another routine to receive all the messages and at the end fo timer iterate over it?
currently, my code looks like:
package main
import (
"fmt"
"log"
"strings"
"gopkg.in/mcuadros/go-syslog.v2"
)
func strigAnalyze(str string){
/*analyse the contents of the log message and do something*/
}
func main() {
channel := make(syslog.LogPartsChannel)
handler := syslog.NewChannelHandler(channel)
server := syslog.NewServer()
server.SetFormat(syslog.RFC3164)
server.SetHandler(handler)
server.ListenUDP("0.0.0.0:8888")
server.ListenTCP("0.0.0.0:8888")
server.Boot()
go func(channel syslog.LogPartsChannel) {
for logParts := range channel {
content := logParts["content"]
fmt.Println("logparts", logParts)
string := fmt.Sprintf("%v", content)
strigAnalyze(str)
}
}(channel)
server.Wait()
}

Should I declare a slice in another routine to receive all the
messages and at the end fo timer iterate over it?
This is one very common pattern in go. The example youre describe is sometimes called a "monitor routine". It guards the buffer of logs and because it "owns" them you know that they are safe from concurrent access.
The data is shared through the channel, the producer of the log data will be completely decoupled from how the sender is using it, all it needs to do is send on a channel. If the channel is unbuffered then your producer will block until the receiver can process. If you need to keep the producer high throughput you could buffer the channel or shed sends, which would look like:
select {
case logChan <- log:
...
default:
// chan is full shedding data.
}
This pattern also lends really well to a "receive" loop that for...selects over the input channel, the timer, and some sort of done/context. The following is not a working example and it is missing cancellation and logic but it shows how you can for...select over multiple channels (one of which is your timer/heartbeat):
logChan := make(chan string)
go func() {
var logBuf []string
t := time.NewTimer(time.Second * 5)
for {
select {
log, ok := <-logChan:
if !ok { return }
logBuf = append(logBuf, log)
<-t.C:
// timer up
// flush logs
// reset slice
}
}
}()
Also depending on how you are using the data, it might make more sense to use an actual buffer here instead of a slice.

Related

Go Concurrency Circular Logic

I'm just getting into concurrency in Go and trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel. If a message comes into my dispatch function via the dispatchchan channel and my other go routines are busy, the message is appended onto the stack slice in the dispatcher and the dispatcher will try to send again later when a worker becomes available, and/or no more messages are received on the dispatchchan. This is because the dispatchchan and the jobchan are unbuffered, and the go routine the workers are running will append other messages to the dispatcher up to a certain point and I don't want the workers blocked waiting on the dispatcher and creating a deadlock. Here's the dispatcher code I've come up with so far:
func dispatch() {
var stack []string
acount := 0
for {
select {
case d := <-dispatchchan:
stack = append(stack, d)
case c := <-mw:
acount = acount + c
case jobchan <-stack[0]:
if len(stack) > 1 {
stack[0] = stack[len(stack)-1]
stack = stack[:len(stack)-1]
} else {
stack = nil
}
default:
if acount == 0 && len(stack) == 0 {
close(jobchan)
close(dispatchchan)
close(mw)
wg.Done()
return
}
}
}
Complete example at https://play.golang.wiki/p/X6kXVNUn5N7
The mw channel is a buffered channel the same length as the number of worker go routines. It acts as a semaphore for the worker pool. If the worker routine is doing [m]eaningful [w]ork it throws int 1 on the mw channel and when it finishes its work and goes back into the for loop listening to the jobchan it throws int -1 on the mw. This way the dispatcher knows if there's any work being done by the worker pool, or if the pool is idle. If the pool is idle and there are no more messages on the stack, then the dispatcher closes the channels and return control to the main func.
This is all good but the issue I have is that the stack itself could be zero length so the case where I attempt to send stack[0] to the jobchan, if the stack is empty, I get an out of bounds error. What I'm trying to figure out is how to ensure that when I hit that case, either stack[0] has a value in it or not. I don't want that case to send an empty string to the jobchan.
Any help is greatly appreciated. If there's a more idomatic concurrency pattern I should consider, I'd love to hear about it. I'm not 100% sold on this solution but this is the farthest I've gotten so far.
This is all good but the issue I have is that the stack itself could be zero length so the case where I attempt to send stack[0] to the jobchan, if the stack is empty, I get an out of bounds error.
I can't reproduce it with your playground link, but it's believable, because at lest one gofunc worker might have been ready to receive on that channel.
My output has been Msgcnt: 0, which is also easily explained, because gofunc might not have been ready to receive on jobschan when dispatch() runs its select. The order of these operations is not defined.
trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel
A channel needs no dispatcher. A channel is the dispatcher.
If a message comes into my dispatch function via the dispatchchan channel and my other go routines are busy, the message is [...] will [...] send again later when a worker becomes available, [...] or no more messages are received on the dispatchchan.
With a few creative edits, it was easy to turn that into something close to the definition of a buffered channel. It can be read from immediately, or it can take up to some "limit" of messages that can't be immediately dispatched. You do define limit, though it's not used elsewhere within your code.
In any function, defining a variable you don't read will result in a compile time error like limit declared but not used. This stricture improves code quality and helps identify typeos. But at package scope, you've gotten away with defining the unused limit as a "global" and thus avoided a useful error - you haven't limited anything.
Don't use globals. Use passed parameters to define scope, because the definition of scope is tantamount to functional concurrency as expressed with the go keyword. Pass the relevant channels defined in local scope to functions defined at package scope so that you can easily track their relationships. And use directional channels to enforce the producer/consumer relationship between your functions. More on this later.
Going back to "limit", it makes sense to limit the quantity of jobs you're queueing because all resources are limited, and accepting more messages than you have any expectation of processing requires more durable storage than process memory provides. If you don't feel obligated to fulfill those requests no matter what, don't accept "too many" of them in the first place.
So then, what function has dispatchchan and dispatch()? To store a limited number of pending requests, if any, before they can be processed, and then to send them to the next available worker? That's exactly what a buffered channel is for.
Circular Logic
Who "knows" when your program is done? main() provides the initial input, but you close all 3 channels in `dispatch():
close(jobchan)
close(dispatchchan)
close(mw)
Your workers write to their own job queue so only when the workers are done writing to it can the incoming job queue be closed. However, individual workers also don't know when to close the jobs queue because other workers are writing to it. Nobody knows when your algorithm is done. There's your circular logic.
The mw channel is a buffered channel the same length as the number of worker go routines. It acts as a semaphore for the worker pool.
There's a race condition here. Consider the case where all n workers have just received the last n jobs. They've each read from jobschan and they're checking the value of ok. disptatcher proceeds to run its select. Nobody is writing to dispatchchan or reading from jobschan right now so the default case is immediately matched. len(stack) is 0 and there's no current job so dispatcher closes all channels including mw. At some point thereafter, a worker tries to write to a closed channel and panics.
So finally I'm ready to provide some code, but I have one more problem: I don't have a clear problem statement to write code around.
I'm just getting into concurrency in Go and trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel.
Channels between goroutines are like the teeth that synchronize gears. But to what end do the gears turn? You're not trying to keep time, nor construct a wind-up toy. Your gears could be made to turn, but what would success look like? Their turning?
Let's try to define a more specific use case for channels: given an arbitrarily long set of durations coming in as strings on standard input*, sleep that many seconds in one of n workers. So that we actually have a result to return, we'll say each worker will return the start and end time the duration was run for.
So that it can run in the playground, I'll simulate standard input with a hard-coded byte buffer.
package main
import (
"bufio"
"bytes"
"fmt"
"os"
"strings"
"sync"
"time"
)
type SleepResult struct {
worker_id int
duration time.Duration
start time.Time
end time.Time
}
func main() {
var num_workers = 2
workchan := make(chan time.Duration)
resultschan := make(chan SleepResult)
var wg sync.WaitGroup
var resultswg sync.WaitGroup
resultswg.Add(1)
go results(&resultswg, resultschan)
for i := 0; i < num_workers; i++ {
wg.Add(1)
go worker(i, &wg, workchan, resultschan)
}
// playground doesn't have stdin
var input = bytes.NewBufferString(
strings.Join([]string{
"3ms",
"1 seconds",
"3600ms",
"300 ms",
"5s",
"0.05min"}, "\n") + "\n")
var scanner = bufio.NewScanner(input)
for scanner.Scan() {
text := scanner.Text()
if dur, err := time.ParseDuration(text); err != nil {
fmt.Fprintln(os.Stderr, "Invalid duration", text)
} else {
workchan <- dur
}
}
close(workchan) // we know when our inputs are done
wg.Wait() // and when our jobs are done
close(resultschan)
resultswg.Wait()
}
func results(wg *sync.WaitGroup, resultschan <-chan SleepResult) {
for res := range resultschan {
fmt.Printf("Worker %d: %s : %s => %s\n",
res.worker_id, res.duration,
res.start.Format(time.RFC3339Nano), res.end.Format(time.RFC3339Nano))
}
wg.Done()
}
func worker(id int, wg *sync.WaitGroup, jobchan <-chan time.Duration, resultschan chan<- SleepResult) {
var res = SleepResult{worker_id: id}
for dur := range jobchan {
res.duration = dur
res.start = time.Now()
time.Sleep(res.duration)
res.end = time.Now()
resultschan <- res
}
wg.Done()
}
Here I use 2 wait groups, one for the workers, one for the results. This makes sure Im done writing all the results before main() ends. I keep my functions simple by having each function do exactly one thing at a time: main reads inputs, parses durations from them, and sends them off to the next worker. The results function collects results and prints them to standard output. The worker does the sleeping, reading from jobchan and writing to resultschan.
workchan can be buffered (or not, as in this case); it doesn't matter because the input will be read at the rate it can be processed. We can buffer as much input as we want, but we can't buffer an infinite amount. I've set channel sizes as big as 1e6 - but a million is a lot less than infinite. For my use case, I don't need to do any buffering at all.
main knows when the input is done and can close the jobschan. main also knows when jobs are done (wg.Wait()) and can close the results channel. Closing these channels is an important signal to the worker and results goroutines - they can distinguish between a channel that is empty and a channel that is guaranteed not to have any new additions.
for job := range jobchan {...} is shorthand for your more verbose:
for {
job, ok := <- jobchan
if !ok {
wg.Done()
return
}
...
}
Note that this code creates 2 workers, but it could create 20 or 2000, or even 1. The program functions regardless of how many workers are in the pool. It can handle any volume of input (though interminable input of course leads to an interminable program). It does not create a cyclic loop of output to input. If your use case requires jobs to create more jobs, that's a more challenging scenario that can typically be avoided with careful planning.
I hope this gives you some good ideas about how you can better use concurrency in your Go applications.
https://play.golang.wiki/p/cZuI9YXypxI

How to create channels in loop?

I am learning concurrency in go and how it works.
What I am trying to do ?
Loop through slice of data
Create struct for required/needed data
Create channel for that struct
Call worker func using go rutine and pass that channel to that rutine
Using data from channel do some processing
Set the processed output back into channel
Wait in main thread to get output from all the channels which we kicked off
Code Which I tried
package main
import (
"fmt"
"github.com/pkg/errors"
"time"
)
type subject struct {
Name string
Class string
StartDate time.Time
EndDate time.Time
}
type workerData struct {
Subject string
Class string
Result string
Error error
}
func main () {
// Creating test data
var subjects []subject
st,_ := time.Parse("01/02/2016","01/01/2015")
et,_ := time.Parse("01/02/2016","01/01/2016")
s1 := subject{Name:"Math", Class:"3", StartDate:st,EndDate:et }
s2 := subject{Name:"Geo", Class:"3", StartDate:st,EndDate:et }
s3 := subject{Name:"Bio", Class:"3", StartDate:st,EndDate:et }
s4 := subject{Name:"Phy", Class:"3", StartDate:st,EndDate:et }
s5 := subject{Name:"Art", Class:"3", StartDate:st,EndDate:et }
subjects = append(subjects, s1)
subjects = append(subjects, s2)
subjects = append(subjects, s3)
subjects = append(subjects, s4)
subjects = append(subjects, s5)
c := make(chan workerData) // I am sure this is not how I should be creating channel
for i := 0 ; i< len(subjects) ; i++ {
go worker(c)
}
for _, v := range subjects {
// Setting required data in channel
data := workerData{Subject:v.Name, Class:v.Class}
// set the data and start the routine
c <- data // I think this will update data for all the routines ? SO how should create separate channel for each routine
}
// I want to wait till all the routines set the data in channel and return the data from workers.
for {
select {
case data := <- c :
fmt.Println(data)
}
}
}
func worker (c chan workerData) {
data := <- c
// This can be any processing
time.Sleep(100 * time.Millisecond)
if data.Subject != "Math" {
data.Result = "Pass"
} else {
data.Error = errors.New("Subject not found")
}
fmt.Println(data.Subject)
// returning processed data and error to channel
c <- data
// Rightfully this closes channel and here after I get error send on Closed channel.
close(c)
}
Playgorund Link - https://play.golang.org/p/hs1-B1UR98r
Issue I am Facing
I am not sure how to create different channel for each data item. The way I am currently doing will update the channel data for all routines. I want to know is there way to create diffrent channel for each data item in loop and pass that to the go rutine. And then wait in main rutine to get the result back from rutines from all channels.
Any pointers/ help would be great ? If any confusion feel free to comment.
"// I think this will update data for all the routines ?"
A channel (to simplify) is not a data structure to store data.
It is a structure to send and receive data over different goroutines.
As such, notice that your worker function is doing send and receive on the same channel within each goroutine instances. If you were having only one instance of such worker, this would deadlock (https://golang.org/doc/articles/race_detector.html).
In the version of the code you posted, for a beginner this might seem to work because you have many workers exchanging works to each other. But it is wrong for a correct program.
As a consequence, if a worker can not read and write the same channel, then it must consume a specific writable channel to send its results to some other routines.
// I want to wait till all the routines set the data in channel and
return the data from workers.
This is part of the synchronization mechanisms required to ensure that a pusher waits until all its workers has finished their job before proceeding further. (this blog post talks about it https://medium.com/golangspec/synchronized-goroutines-part-i-4fbcdd64a4ec)
// Rightfully this closes channel and here after I get error send on
Closed channel.
Take care that you have n routines of workers executing in parallel. The first of this worker to reach the end of its function will close the channel, making it unwritable to other workers, and false signaling its end to main.
Normally one use the close statement on the writer side to indicate that there is no more data into the channel. To indicate it has ended. This signal is consumed by readers to quit their read-wait operation of the channel.
As an example, lets review this loop
for {
select {
case data := <- c :
fmt.Println(data)
}
}
it is bad, really bad.
It is an infinite loop with no exit statement
The select is superfluous and does not contain exit statement, remember that a read on a channel is a blocking operation.
It is a bad rewrite of a standard pattern provided by the language, the range loop over a channel
The range loop over a channel is very simply written
for data := range c {
fmt.Println(data)
}
This pattern has one great advantage, it automatically detect a closed channel to exit the loop! letting you loop over only the relevant data to process. It is also much more succint.
Also, your worker is a awkward in that it read and write only one element before quitting.
Spawning go routines is cheap, but not free. You should always evaluate the trade-off between the costs of async processing and its actual workload.
Overall, your code should be closer to what is demonstrated here
https://gobyexample.com/worker-pools

How do I properly post a message to a simple Go Chatserver using REST API

I am currently building a simple chat server that supports posting messages through a REST API.
example:
========
```
curl -X POST -H "Content-Type: application/json" --data '{"user":"alex", "text":"this is a message"}' http://localhost:8081/message
{
"ok": true
}
Right now, I'm just currently storing the messages in an array of messages. I'm pretty sure this is an inefficient way. So is there a simple, better way to get and store the messages using goroutines and channels that will make it thread-safe.
Here is what I currently have:
type Message struct {
Text string
User string
Timestamp time.Time
}
var Messages = []Message{}
func messagePost(c http.ResponseWriter, req *http.Request){
decoder := json.NewDecoder(req.Body)
var m Message
err := decoder.Decode(&m)
if err != nil {
panic(err)
}
if m.Timestamp == (time.Time{}) {
m.Timestamp = time.Now()
}
addUser(m.User)
Messages = append(Messages, m)
}
Thanks!
It could be made thread safe using mutex, as #ThunderCat suggested but I think this does not add concurrency. If two or more requests are made simultaneously, one will have to wait for the other to complete first, slowing the server down.
Adding Concurrency: You make it faster and handle more concurrent request by using a queue (which is a Go channel) and a worker that listens on that channel - it'll be a simple implementation. Every time a message comes in through a Post request, you add to the queue (this is instantaneous and the HTTP response can be sent immediately). In another goroutine, you detect that a message has been added to the queue, you take it out append it to your Messages slice. While you're appending to Messages, the HTTP requests don't have to wait.
Note: You can make it even better by having multiple goroutines listen on the queue, but we can leave that for later.
This is how the code will somewhat look like:
type Message struct {
Text string
User string
Timestamp time.Time
}
var Messages = []Message{}
// messageQueue is the queue that holds new messages until they are processed
var messageQueue chan Message
func init() { // need the init function to initialize the channel, and the listeners
// initialize the queue, choosing the buffer size as 8 (number of messages the channel can hold at once)
messageQueue = make(chan Message, 8)
// start a goroutine that listens on the queue/channel for new messages
go listenForMessages()
}
func listenForMessages() {
// whenever we detect a message in the queue, append it to Messages
for m := range messageQueue {
Messages = append(Messages, m)
}
}
func messagePost(c http.ResponseWriter, req *http.Request){
decoder := json.NewDecoder(req.Body)
var m Message
err := decoder.Decode(&m)
if err != nil {
panic(err)
}
if m.Timestamp == (time.Time{}) {
m.Timestamp = time.Now()
}
addUser(m.User)
// add the message to the channel, it'll only wait if the channel is full
messageQueue <- m
}
Storing Messages: As other users have suggested, storing messages in memory may not be the right choice since the messages won't persist if the application is restarted. If you're working on a small, proof-of-concept type project and don't want to figure out the DB, you could save the Messages variable as a flat file on the server and then read from it every time the application starts (*Note: this should not be done on a production system, of course, for that you should set up a Database). But yeah, database should be the way to go.
Use a mutex to make the program threadsafe.
var Messages = []Message{}
var messageMu sync.Mutex
...
messageMu.Lock()
Messages = append(Messages, m)
messageMu.Unlock()
There's no need to use channels and goroutines to make the program threadsafe.
A database is probably a better choice for storing messages than the in memory slice used in the question. Asking how to use a database to implement a chat program is too broad a question for SO.

How to block all goroutines except the one running

I have two (but later I'll be three) go routines that are handling incoming messages from a remote server (from a ampq channel). But because they are handling on the same data/state, I want to block all other go routines, except the one running.
I come up with a solution to use chan bool where each go routine blocks and then release it, the code is like:
package main
func a(deliveries <-chan amqp, handleDone chan bool) {
for d := range deliveries {
<-handleDone // Data comes always, wait for other channels
handleDone <- false // Block other channels
// Do stuff with data...
handleDone <- true // I'm done, other channels are free to do anything
}
}
func b(deliveries <-chan amqp, handleDone chan bool) {
for d := range deliveries {
<-handleDone
handleDone <- false
// Do stuff with data...
handleDone <- true
}
}
func main() {
handleDone := make(chan bool, 1)
go a(arg1, handleDone)
go b(arg2, handleDone)
// go c(arg3, handleDone) , later
handleDone <- true // kickstart
}
But for the first time each of the function will get handleDone <- true, which they will be executed. And later if I add another third function, things will get more complicated. How can block all other go routines except the running? Any other better solutions?
You want to look at the sync package.
http://golang.org/pkg/sync/
You would do this with a mutex.
If you have an incoming stream of messages and you have three goroutines listening on that stream and processing and you want to ensure that only one goroutine is running at a time, the solution is quite simple: kill off two of the goroutines.
You're spinning up concurrency and adding complexity and then trying to prevent them from running concurrently. The end result is the same as a single stream reader, but with lots of things that can go wrong.
I'm puzzled why you want this - why can't each message on deliveries be handled independently? and why are there two different functions handling those message? If each is responsible for a particular type of message, it seems like you want one deliveries receiver that dispatches to appropriate logic for the type.
But to answer your question, I don't think it's true that each function will get a true from handleDone on start. One (let's say it's a) is receiving the true sent from main; the other (b then) is getting the false sent from the first. Because you're discarding the value received, you can't tell this. And then both are running, and you're using a buffered channel (you probably want make(chan bool) instead for an unbuffered one), so confusion ensues, particularly when you add that third goroutine.
The handleDone <- false doesn't actually accomplish anything. Just treat any value on handleDone as the baton in a relay race. Once a goroutine receives this value, it can do its thing; when it's done, it should send it to the channel to hand it to the next goroutine.

More idiomatic way of adding channel result to queue on completion

So, right now, I just pass a pointer to a Queue object (implementation doesn't really matter) and call queue.add(result) at the end of goroutines that should add things to the queue.
I need that same sort of functionality—and of course doing a loop checking completion with the comma ok syntax is unacceptable in terms of performance versus the simple queue add function call.
Is there a way to do this better, or not?
There are actually two parts to your question: how does one queue data in Go, and how does one use a channel without blocking.
For the first part, it sounds like what you need to do is instead of using the channel to add things to the queue, use the channel as a queue. For example:
var (
ch = make(chan int) // You can add an int parameter to this make call to create a buffered channel
// Do not buffer these channels!
gFinished = make(chan bool)
processFinished = make(chan bool)
)
func f() {
go g()
for {
// send values over ch here...
}
<-gFinished
close(ch)
}
func g() {
// create more expensive objects...
gFinished <- true
}
func processObjects() {
for val := range ch {
// Process each val here
}
processFinished <- true
}
func main() {
go processObjects()
f()
<-processFinished
}
As for how you can make this more asynchronous, you can (as cthom06 pointed out) pass a second integer to the make call in the second line which will make send operations asynchronous until the channel's buffer is full.
EDIT: However (as cthom06 also pointed out), because you have two goroutines writing to the channel, one of them has to be responsible for closing the channel. Also, my previous revision would exit before processObjects could complete. The way I chose to synchronize the goroutines is by creating a couple more channels that pass around dummy values to ensure that the cleanup gets finished properly. Those channels are specifically unbuffered so that the sends happen in lock-step.

Resources