Limit the number of processed messages from channel - go

I receive around 200,000 messages per second over a channel to my worker, and I need to limit the number of messages I send to the client to only 20 per second.
That works out to 1 message per 50 milliseconds.
The worker stays alive for the whole program lifetime with the help of the loop (rather than opening a channel for each message).
My goal:
- Since the order of the messages is important, I want to skip all the messages that arrive during the blocked 50ms and save only the latest one
- If the latest one arrives during the blocked 50ms, I want the saved message to be processed inside the loop when the block time is over and no new message is coming! <-- This is my problem
My strategy
- Keep sending the latest message that is not yet processed to the same channel
But the problem with that: what if the saved message is sent after a new message that comes in (from the application)?
The code below is more an algorithm than working code; I just want a tip on how to do it.
func example (new_message_from_channel <-chan *message) {
default = message
time = now_milliseconds
diff_accepted = 50milli
for this_message := range new_message_from_channel {
if now_millisecond - time >= diff_accepted {
send_it_to_the_client
time = now_milliseconds
} else {
//save the latest message
default = this_message
//My problem is how to process this latest message when the blocked 50ms is over and no new message coming ?!
//My strategy - keep sending it to the same channel
theChannel <- default
}
}
}
If you have an elegant way to do it, you are welcome to share it with me :)

Using a rate limiter, you can create a throttle function that takes a rate and a channel as input and returns two channels: one that relays all of the original channel's items, and another that relays items only at a fixed rate:
func throttle(r time.Duration, in <-chan event) (C, tC <-chan event) {
// "writeable" channels
var (
wC = make(chan event)
wtC = make(chan event)
)
// read-only channels - returned to caller
C = wC
tC = wtC
go func() {
defer close(wC)
defer close(wtC)
rl := rate.NewLimiter(
rate.Every(r),
1,
)
// relays input channel's items to two channels:
// (1) gets all writes from original channel
// (2) only writes at a fixed frequency
for ev := range in {
wC <- ev
if rl.Allow() {
wtC <- ev
}
}
}()
return
}
Working example: https://play.golang.org/p/upei0TiyzNr
EDIT:
To avoid using a rate-limiter and instead use a simple time.Ticker:
tick := time.NewTicker(r)
for ev := range in {
select {
case wC <- ev: // write to main
case <-tick.C:
wC <- ev // write to main ...
wtC <- ev // ... plus throttle channel
}
}
Working example: https://play.golang.org/p/UTRXh72BvRl

Related

Go Concurrency Circular Logic

I'm just getting into concurrency in Go and trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel. If a message comes into my dispatch function via the dispatchchan channel and my other go routines are busy, the message is appended onto the stack slice in the dispatcher and the dispatcher will try to send again later when a worker becomes available, and/or no more messages are received on the dispatchchan. This is because the dispatchchan and the jobchan are unbuffered, and the go routine the workers are running will append other messages to the dispatcher up to a certain point and I don't want the workers blocked waiting on the dispatcher and creating a deadlock. Here's the dispatcher code I've come up with so far:
func dispatch() {
var stack []string
acount := 0
for {
select {
case d := <-dispatchchan:
stack = append(stack, d)
case c := <-mw:
acount = acount + c
case jobchan <-stack[0]:
if len(stack) > 1 {
stack[0] = stack[len(stack)-1]
stack = stack[:len(stack)-1]
} else {
stack = nil
}
default:
if acount == 0 && len(stack) == 0 {
close(jobchan)
close(dispatchchan)
close(mw)
wg.Done()
return
}
}
}
}
Complete example at https://play.golang.wiki/p/X6kXVNUn5N7
The mw channel is a buffered channel the same length as the number of worker goroutines. It acts as a semaphore for the worker pool. If a worker routine is doing [m]eaningful [w]ork, it throws int 1 on the mw channel, and when it finishes its work and goes back into the for loop listening to jobchan, it throws int -1 on mw. This way the dispatcher knows whether any work is being done by the worker pool or the pool is idle. If the pool is idle and there are no more messages on the stack, the dispatcher closes the channels and returns control to the main func.
This is all good but the issue I have is that the stack itself could be zero length so the case where I attempt to send stack[0] to the jobchan, if the stack is empty, I get an out of bounds error. What I'm trying to figure out is how to ensure that when I hit that case, either stack[0] has a value in it or not. I don't want that case to send an empty string to the jobchan.
Any help is greatly appreciated. If there's a more idiomatic concurrency pattern I should consider, I'd love to hear about it. I'm not 100% sold on this solution, but this is the farthest I've gotten so far.
"This is all good but the issue I have is that the stack itself could be zero length so the case where I attempt to send stack[0] to the jobchan, if the stack is empty, I get an out of bounds error."
I can't reproduce it with your playground link, but it's believable, because at least one gofunc worker might have been ready to receive on that channel.
My output has been Msgcnt: 0, which is also easily explained, because gofunc might not have been ready to receive on jobchan when dispatch() runs its select. The order of these operations is not defined.
"trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel"
A channel needs no dispatcher. A channel is the dispatcher.
"If a message comes into my dispatch function via the dispatchchan channel and my other go routines are busy, the message is [...] will [...] send again later when a worker becomes available, [...] or no more messages are received on the dispatchchan."
With a few creative edits, it was easy to turn that into something close to the definition of a buffered channel. It can be read from immediately, or it can take up to some "limit" of messages that can't be immediately dispatched. You do define limit, though it's not used elsewhere within your code.
In any function, defining a variable you don't read will result in a compile-time error like limit declared but not used. This stricture improves code quality and helps identify typos. But at package scope, you've gotten away with defining the unused limit as a "global" and thus avoided a useful error: you haven't limited anything.
Don't use globals. Use passed parameters to define scope, because the definition of scope is tantamount to functional concurrency as expressed with the go keyword. Pass the relevant channels defined in local scope to functions defined at package scope so that you can easily track their relationships. And use directional channels to enforce the producer/consumer relationship between your functions. More on this later.
Going back to "limit", it makes sense to limit the quantity of jobs you're queueing because all resources are limited, and accepting more messages than you have any expectation of processing requires more durable storage than process memory provides. If you don't feel obligated to fulfill those requests no matter what, don't accept "too many" of them in the first place.
So then, what function do dispatchchan and dispatch() serve? To store a limited number of pending requests, if any, before they can be processed, and then to send them to the next available worker? That's exactly what a buffered channel is for.
Circular Logic
Who "knows" when your program is done? main() provides the initial input, but you close all 3 channels in dispatch():
close(jobchan)
close(dispatchchan)
close(mw)
Your workers write to their own job queue so only when the workers are done writing to it can the incoming job queue be closed. However, individual workers also don't know when to close the jobs queue because other workers are writing to it. Nobody knows when your algorithm is done. There's your circular logic.
"The mw channel is a buffered channel the same length as the number of worker go routines. It acts as a semaphore for the worker pool."
There's a race condition here. Consider the case where all n workers have just received the last n jobs. They've each read from jobchan and they're checking the value of ok. The dispatcher proceeds to run its select. Nobody is writing to dispatchchan or reading from jobchan right now, so the default case is immediately matched. len(stack) is 0 and there's no current job, so the dispatcher closes all channels, including mw. At some point thereafter, a worker tries to write to a closed channel and panics.
So finally I'm ready to provide some code, but I have one more problem: I don't have a clear problem statement to write code around.
"I'm just getting into concurrency in Go and trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel."
Channels between goroutines are like the teeth that synchronize gears. But to what end do the gears turn? You're not trying to keep time, nor construct a wind-up toy. Your gears could be made to turn, but what would success look like? Their turning?
Let's try to define a more specific use case for channels: given an arbitrarily long set of durations coming in as strings on standard input, sleep that many seconds in one of n workers. So that we actually have a result to return, we'll say each worker will return the start and end time the duration was run for.
So that it can run in the playground, I'll simulate standard input with a hard-coded byte buffer.
package main
import (
"bufio"
"bytes"
"fmt"
"os"
"strings"
"sync"
"time"
)
type SleepResult struct {
worker_id int
duration time.Duration
start time.Time
end time.Time
}
func main() {
var num_workers = 2
workchan := make(chan time.Duration)
resultschan := make(chan SleepResult)
var wg sync.WaitGroup
var resultswg sync.WaitGroup
resultswg.Add(1)
go results(&resultswg, resultschan)
for i := 0; i < num_workers; i++ {
wg.Add(1)
go worker(i, &wg, workchan, resultschan)
}
// playground doesn't have stdin
var input = bytes.NewBufferString(
strings.Join([]string{
"3ms",
"1 seconds",
"3600ms",
"300 ms",
"5s",
"0.05min"}, "\n") + "\n")
var scanner = bufio.NewScanner(input)
for scanner.Scan() {
text := scanner.Text()
if dur, err := time.ParseDuration(text); err != nil {
fmt.Fprintln(os.Stderr, "Invalid duration", text)
} else {
workchan <- dur
}
}
close(workchan) // we know when our inputs are done
wg.Wait() // and when our jobs are done
close(resultschan)
resultswg.Wait()
}
func results(wg *sync.WaitGroup, resultschan <-chan SleepResult) {
for res := range resultschan {
fmt.Printf("Worker %d: %s : %s => %s\n",
res.worker_id, res.duration,
res.start.Format(time.RFC3339Nano), res.end.Format(time.RFC3339Nano))
}
wg.Done()
}
func worker(id int, wg *sync.WaitGroup, jobchan <-chan time.Duration, resultschan chan<- SleepResult) {
var res = SleepResult{worker_id: id}
for dur := range jobchan {
res.duration = dur
res.start = time.Now()
time.Sleep(res.duration)
res.end = time.Now()
resultschan <- res
}
wg.Done()
}
Here I use 2 wait groups, one for the workers and one for the results. This makes sure I'm done writing all the results before main() ends. I keep my functions simple by having each function do exactly one thing at a time: main reads inputs, parses durations from them, and sends them off to the next worker. The results function collects results and prints them to standard output. The worker does the sleeping, reading from jobchan and writing to resultschan.
workchan can be buffered (or not, as in this case); it doesn't matter because the input will be read at the rate it can be processed. We can buffer as much input as we want, but we can't buffer an infinite amount. I've set channel sizes as big as 1e6 - but a million is a lot less than infinite. For my use case, I don't need to do any buffering at all.
main knows when the input is done and can close workchan. main also knows when jobs are done (wg.Wait()) and can close the results channel. Closing these channels is an important signal to the worker and results goroutines: they can distinguish between a channel that is empty and a channel that is guaranteed not to have any new additions.
for job := range jobchan {...} is shorthand for your more verbose:
for {
job, ok := <- jobchan
if !ok {
wg.Done()
return
}
...
}
Note that this code creates 2 workers, but it could create 20 or 2000, or even 1. The program functions regardless of how many workers are in the pool. It can handle any volume of input (though interminable input of course leads to an interminable program). It does not create a cyclic loop of output to input. If your use case requires jobs to create more jobs, that's a more challenging scenario that can typically be avoided with careful planning.
I hope this gives you some good ideas about how you can better use concurrency in your Go applications.
https://play.golang.wiki/p/cZuI9YXypxI

stacking data from go routines

I'm learning Go and I'd like to create a Go app to achieve the following:
- receive data from a remote log
- analyze it for some sort of error or warning
- periodically send an HTTP request to a URL reporting that everything is ok, or send the warnings and errors.
I've been reading about concurrency, parallelism and channels, but I'm not sure how I should pass data from my logging goroutine to another routine with a timer to make the request. Should I declare a slice in another routine to receive all the messages and, at the end of the timer, iterate over it?
currently, my code looks like:
package main
import (
"fmt"
"log"
"strings"
"gopkg.in/mcuadros/go-syslog.v2"
)
func strigAnalyze(str string){
/*analyse the contents of the log message and do something*/
}
func main() {
channel := make(syslog.LogPartsChannel)
handler := syslog.NewChannelHandler(channel)
server := syslog.NewServer()
server.SetFormat(syslog.RFC3164)
server.SetHandler(handler)
server.ListenUDP("0.0.0.0:8888")
server.ListenTCP("0.0.0.0:8888")
server.Boot()
go func(channel syslog.LogPartsChannel) {
for logParts := range channel {
content := logParts["content"]
fmt.Println("logparts", logParts)
str := fmt.Sprintf("%v", content)
strigAnalyze(str)
}
}(channel)
server.Wait()
}
"Should I declare a slice in another routine to receive all the messages and at the end of timer iterate over it?"
This is a very common pattern in Go. What you describe is sometimes called a "monitor routine". It guards the buffer of logs, and because it "owns" them, you know that they are safe from concurrent access.
The data is shared through the channel, so the producer of the log data is completely decoupled from how the sender uses it; all it needs to do is send on a channel. If the channel is unbuffered, your producer will block until the receiver can process the message. If you need to keep the producer's throughput high, you could buffer the channel or shed sends, which would look like:
select {
case logChan <- log:
...
default:
// chan is full shedding data.
}
This pattern also lends itself really well to a "receive" loop that for...selects over the input channel, the timer, and some sort of done/context. The following is not a working example, and it is missing cancellation and logic, but it shows how you can for...select over multiple channels (one of which is your timer/heartbeat):
logChan := make(chan string)
go func() {
var logBuf []string
t := time.NewTimer(time.Second * 5)
for {
select {
case log, ok := <-logChan:
if !ok { return }
logBuf = append(logBuf, log)
case <-t.C:
// timer up
// flush logs
// reset slice
t.Reset(time.Second * 5) // a Timer fires only once, so re-arm it
}
}
}()
Also depending on how you are using the data, it might make more sense to use an actual buffer here instead of a slice.

How to create channels in loop?

I am learning concurrency in go and how it works.
What I am trying to do:
- Loop through a slice of data
- Create a struct for the required data
- Create a channel for that struct
- Call a worker func using a goroutine and pass that channel to it
- Use the data from the channel to do some processing
- Set the processed output back into the channel
- Wait in the main thread to get output from all the channels that we kicked off
Code Which I tried
package main
import (
"fmt"
"github.com/pkg/errors"
"time"
)
type subject struct {
Name string
Class string
StartDate time.Time
EndDate time.Time
}
type workerData struct {
Subject string
Class string
Result string
Error error
}
func main () {
// Creating test data
var subjects []subject
st,_ := time.Parse("01/02/2006","01/01/2015") // the layout must use Go's reference date
et,_ := time.Parse("01/02/2006","01/01/2016")
s1 := subject{Name:"Math", Class:"3", StartDate:st,EndDate:et }
s2 := subject{Name:"Geo", Class:"3", StartDate:st,EndDate:et }
s3 := subject{Name:"Bio", Class:"3", StartDate:st,EndDate:et }
s4 := subject{Name:"Phy", Class:"3", StartDate:st,EndDate:et }
s5 := subject{Name:"Art", Class:"3", StartDate:st,EndDate:et }
subjects = append(subjects, s1)
subjects = append(subjects, s2)
subjects = append(subjects, s3)
subjects = append(subjects, s4)
subjects = append(subjects, s5)
c := make(chan workerData) // I am sure this is not how I should be creating channel
for i := 0 ; i< len(subjects) ; i++ {
go worker(c)
}
for _, v := range subjects {
// Setting required data in channel
data := workerData{Subject:v.Name, Class:v.Class}
// set the data and start the routine
c <- data // I think this will update data for all the routines ? SO how should create separate channel for each routine
}
// I want to wait till all the routines set the data in channel and return the data from workers.
for {
select {
case data := <- c :
fmt.Println(data)
}
}
}
func worker (c chan workerData) {
data := <- c
// This can be any processing
time.Sleep(100 * time.Millisecond)
if data.Subject != "Math" {
data.Result = "Pass"
} else {
data.Error = errors.New("Subject not found")
}
fmt.Println(data.Subject)
// returning processed data and error to channel
c <- data
// Rightfully this closes channel and here after I get error send on Closed channel.
close(c)
}
Playground link - https://play.golang.org/p/hs1-B1UR98r
Issue I am facing
I am not sure how to create a different channel for each data item. The way I am currently doing it will update the channel data for all routines. I want to know whether there is a way to create a different channel for each data item in the loop and pass that to the goroutine, and then wait in the main routine to get the results back from all the channels.
Any pointers or help would be great. If anything is confusing, feel free to comment.
"// I think this will update data for all the routines ?"
A channel (to simplify) is not a data structure for storing data.
It is a structure for sending and receiving data between goroutines.
As such, notice that your worker function both sends and receives on the same channel within each goroutine instance. If you had only one instance of such a worker, this would deadlock (https://golang.org/doc/articles/race_detector.html).
In the version of the code you posted, this might seem to a beginner to work, because you have many workers exchanging work with each other. But it is not a correct program.
As a consequence, since a worker cannot read and write the same channel, it must be given a separate writable channel on which to send its results to some other routine.
"// I want to wait till all the routines set the data in channel and return the data from workers."
This is part of the synchronization mechanism required to ensure that a pusher waits until all its workers have finished their jobs before proceeding further (this blog post talks about it: https://medium.com/golangspec/synchronized-goroutines-part-i-4fbcdd64a4ec).
"// Rightfully this closes channel and here after I get error send on Closed channel."
Take care: you have n worker routines executing in parallel. The first worker to reach the end of its function will close the channel, making it unwritable for the other workers and falsely signaling the end to main.
Normally one uses close on the writer side to indicate that no more data will be sent on the channel. This signal is consumed by readers to stop waiting on the channel.
As an example, let's review this loop:
for {
select {
case data := <- c :
fmt.Println(data)
}
}
It is bad, really bad:
- It is an infinite loop with no exit statement.
- The select is superfluous and does not contain an exit statement; remember that a read on a channel is a blocking operation.
- It is a bad rewrite of a standard pattern provided by the language: the range loop over a channel.
The range loop over a channel is very simply written:
for data := range c {
fmt.Println(data)
}
This pattern has one great advantage: it automatically detects a closed channel and exits the loop, letting you loop over only the relevant data to process. It is also much more succinct.
Also, your worker is awkward in that it reads and writes only one element before quitting.
Spawning goroutines is cheap, but not free. You should always evaluate the trade-off between the costs of async processing and its actual workload.
Overall, your code should be closer to what is demonstrated here:
https://gobyexample.com/worker-pools

How do I properly post a message to a simple Go Chatserver using REST API

I am currently building a simple chat server that supports posting messages through a REST API.
Example:
curl -X POST -H "Content-Type: application/json" --data '{"user":"alex", "text":"this is a message"}' http://localhost:8081/message
{
"ok": true
}
Right now, I'm just storing the messages in an array of messages. I'm pretty sure this is an inefficient way. Is there a simple, better way to get and store the messages using goroutines and channels that will make it thread-safe?
Here is what I currently have:
type Message struct {
Text string
User string
Timestamp time.Time
}
var Messages = []Message{}
func messagePost(c http.ResponseWriter, req *http.Request){
decoder := json.NewDecoder(req.Body)
var m Message
err := decoder.Decode(&m)
if err != nil {
panic(err)
}
if m.Timestamp == (time.Time{}) {
m.Timestamp = time.Now()
}
addUser(m.User)
Messages = append(Messages, m)
}
Thanks!
It could be made thread-safe using a mutex, as @ThunderCat suggested, but I think this does not add concurrency. If two or more requests are made simultaneously, one will have to wait for the other to complete first, slowing the server down.
Adding Concurrency: You can make it faster and handle more concurrent requests by using a queue (which is a Go channel) and a worker that listens on that channel; it'll be a simple implementation. Every time a message comes in through a POST request, you add it to the queue (this returns immediately as long as the queue has room, so the HTTP response can be sent right away). In another goroutine, you detect that a message has been added to the queue, take it out, and append it to your Messages slice. While you're appending to Messages, the HTTP requests don't have to wait.
Note: You can make it even better by having multiple goroutines listen on the queue, but we can leave that for later.
This is roughly how the code will look:
type Message struct {
Text string
User string
Timestamp time.Time
}
var Messages = []Message{}
// messageQueue is the queue that holds new messages until they are processed
var messageQueue chan Message
func init() { // need the init function to initialize the channel, and the listeners
// initialize the queue, choosing the buffer size as 8 (number of messages the channel can hold at once)
messageQueue = make(chan Message, 8)
// start a goroutine that listens on the queue/channel for new messages
go listenForMessages()
}
func listenForMessages() {
// whenever we detect a message in the queue, append it to Messages
for m := range messageQueue {
Messages = append(Messages, m)
}
}
func messagePost(c http.ResponseWriter, req *http.Request){
decoder := json.NewDecoder(req.Body)
var m Message
err := decoder.Decode(&m)
if err != nil {
panic(err)
}
if m.Timestamp == (time.Time{}) {
m.Timestamp = time.Now()
}
addUser(m.User)
// add the message to the channel, it'll only wait if the channel is full
messageQueue <- m
}
Storing Messages: As other users have suggested, storing messages in memory may not be the right choice, since the messages won't persist if the application is restarted. If you're working on a small, proof-of-concept project and don't want to figure out the DB, you could save the Messages variable as a flat file on the server and read from it every time the application starts (note: this should not be done on a production system, of course; for that you should set up a database). But yes, a database is the way to go.
Use a mutex to make the program threadsafe.
var Messages = []Message{}
var messageMu sync.Mutex
...
messageMu.Lock()
Messages = append(Messages, m)
messageMu.Unlock()
There's no need to use channels and goroutines to make the program threadsafe.
A database is probably a better choice for storing messages than the in memory slice used in the question. Asking how to use a database to implement a chat program is too broad a question for SO.

Go, How do I pull X messages from a channel at a time

I have a channel with incoming messages and a goroutine that waits on it.
I process these messages and send them to a different server.
I would like to either process 100 messages at a time if they are ready,
or, after say 5 seconds, process whatever is in there and go wait again.
How do I do that in Go?
The routine you use to read from the message channel should define a cache in which incoming messages are stored. These cached messages are then sent to the remote server in bulk either when the cache reaches 100 messages, or 5 seconds have passed. You use a timer channel and Go's select statement to determine which one occurs first.
The following example can be run on the Go playground
package main
import (
"fmt"
"math/rand"
"time"
)
type Message int
const (
CacheLimit = 100
CacheTimeout = 5 * time.Second
)
func main() {
input := make(chan Message, CacheLimit)
go poll(input)
generate(input)
}
// poll checks for incoming messages and caches them internally
// until either a maximum amount is reached, or a timeout occurs.
func poll(input <-chan Message) {
cache := make([]Message, 0, CacheLimit)
tick := time.NewTicker(CacheTimeout)
for {
select {
// Check if a new message is available.
// If so, store it and check if the cache
// has exceeded its size limit.
case m := <-input:
cache = append(cache, m)
if len(cache) < CacheLimit {
break
}
// Reset the timeout ticker.
// Otherwise we will get too many sends.
tick.Stop()
// Send the cached messages and reset the cache.
send(cache)
cache = cache[:0]
// Recreate the ticker, so the timeout trigger
// remains consistent.
tick = time.NewTicker(CacheTimeout)
// If the timeout is reached, send the
// current message cache, regardless of
// its size.
case <-tick.C:
send(cache)
cache = cache[:0]
}
}
}
// send sends cached messages to a remote server.
func send(cache []Message) {
if len(cache) == 0 {
return // Nothing to do here.
}
fmt.Printf("%d message(s) pending\n", len(cache))
}
// generate creates some random messages and pushes them into the given channel.
//
// Not part of the solution. This just simulates whatever you use to create
// the messages by creating a new message at random time intervals.
func generate(input chan<- Message) {
for {
select {
case <-time.After(time.Duration(rand.Intn(100)) * time.Millisecond):
input <- Message(rand.Int())
}
}
}

Resources