Scheduled polling task in Go

I have written some code that will concurrently poll URLs every 30 minutes:
func (obj *MyObj) Poll() {
	for {
		for _, url := range obj.UrlList {
			// Download the current contents of the URL and do something with it
		}
		time.Sleep(30 * time.Minute)
	}
}

// Start the routine in another function
go obj.Poll()
How would I then add to obj.UrlList elsewhere in the code and ensure that the next time the URLs are polled, the UrlList in the Poll goroutine has also been updated, so that the new URL is polled too?
I understand that in Go you share memory by communicating rather than communicate by sharing memory, and I've investigated channels, but I'm not sure how to implement them in this example.

Here's an untested, but safe model for periodically fetching some URLs with the ability to dynamically add new URLs to the list of URLs safely. It should be obvious to the reader what would be required if you wanted to remove a URL as well.
type harvester struct {
	ticker *time.Ticker // periodic ticker
	add    chan string  // new URL channel
	urls   []string     // current URLs
}

func newHarvester() *harvester {
	rv := &harvester{
		ticker: time.NewTicker(time.Minute * 30),
		add:    make(chan string),
	}
	go rv.run()
	return rv
}

func (h *harvester) run() {
	for {
		select {
		case <-h.ticker.C:
			// When the ticker fires, it's time to harvest
			for _, u := range h.urls {
				harvest(u)
			}
		case u := <-h.add:
			// At any time (other than when we're harvesting),
			// we can process a request to add a new URL
			h.urls = append(h.urls, u)
		}
	}
}

func (h *harvester) AddURL(u string) {
	// Adding a new URL is as simple as tossing it onto a channel.
	h.add <- u
}

If you need to poll at regular periodic intervals, you should not use time.Sleep but a time.Ticker instead (or a relative such as time.After). The reason is that a sleep is just a sleep and takes no account of drift due to the real work you did in your loop. Conversely, a Ticker has a separate goroutine and a channel, which together are able to send you regular events and thereby cause something useful to happen.
Here's an example that's similar to yours. I put in a random jitter to illustrate the benefit of using a Ticker.
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func Poll() {
	r := rand.New(rand.NewSource(99))
	c := time.Tick(10 * time.Second)
	for range c {
		// Download the current contents of the URL and do something with it
		fmt.Printf("Grab at %s\n", time.Now())
		// add a bit of jitter
		jitter := time.Duration(r.Int31n(5000)) * time.Millisecond
		time.Sleep(jitter)
	}
}

func main() {
	//go obj.Poll()
	Poll()
}
When I ran this, I found that it kept to a strict 10-second cycle in spite of the jitter.
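One caveat: time.Tick gives you no handle to stop the underlying ticker, so it suits loops that run for the life of the program. If Poll can ever return, a variant with time.NewTicker is the safer sketch:

func Poll() {
	t := time.NewTicker(10 * time.Second)
	defer t.Stop() // release the ticker when Poll returns
	for range t.C {
		fmt.Printf("Grab at %s\n", time.Now())
	}
}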

// Type with queue through a channel (import "net/url").
type MyType struct {
	queue chan []*url.URL
}

func (t *MyType) Poll() {
	for urls := range t.queue {
		// ... download and process urls ...
		time.Sleep(30 * time.Minute)
	}
}

// Create instance with buffered queue.
t := MyType{make(chan []*url.URL, 25)}
go t.Poll()
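Feeding the queue from elsewhere in the program is then a plain channel send (a sketch; the URLs are placeholders):

u1, _ := url.Parse("https://example.com/a")
u2, _ := url.Parse("https://example.com/b")
t.queue <- []*url.URL{u1, u2} // picked up on the loop's next iteration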

Related

How to handle multiple goroutines that share the same channel

I've been searching a lot but have not found an answer to my problem yet.
I need to make multiple calls to an external API, but with different parameters, concurrently.
And then for each call I need to init a struct for each dataset and process the data I receive from the API call. Bear in mind that I read each line of the incoming response and immediately start sending it to the channel.
The first problem I encountered was not obvious at the beginning, due to the large quantity of data I'm receiving: each goroutine does not receive all the data that goes through the channel (which I learned through research). So what I need is a way of requeuing/redirecting that data to the correct goroutine.
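That behaviour is inherent to channels: when several goroutines receive from the same channel, each sent value is delivered to exactly one of them, not broadcast to all. A minimal illustration (not from the question's code):

ch := make(chan int)
var wg sync.WaitGroup
for w := 0; w < 3; w++ {
	wg.Add(1)
	go func(id int) {
		defer wg.Done()
		for v := range ch {
			fmt.Println("worker", id, "got", v) // each value appears exactly once
		}
	}(w)
}
for i := 0; i < 6; i++ {
	ch <- i
}
close(ch)
wg.Wait()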
The function that sends the streamed response from a single dataset.
(I've cut useless parts of code that are out of context)
func (api *API) RequestData(ctx context.Context, c chan DWeatherResponse, dataset string, wg *sync.WaitGroup) error {
	for {
		line, err := reader.ReadBytes('\n')
		s := string(line)
		if err != nil {
			log.Printf("End of %s", dataset)
			return err
		}
		data, err := extractDataFromStreamLine(s, dataset)
		if err != nil {
			continue
		}
		c <- *data
	}
}
The function that will process the incoming data
func (s *StrikeStruct) Process(ch, requeue chan dweather.DWeatherResponse) {
	for {
		data, more := <-ch
		if !more {
			break
		}
		// data contains {dataset string, value float64, date time.Time}
		// The s.Parameter needs to match the dataset
		// IMPORTANT PART: checks if the received data is part of this struct's dataset.
		// If not, I want to send it to another goroutine until it gets to the correct
		// one. There will be a max of 4 datasets, but still this might not be the best approach.
		if !api.GetDataset(s.Parameter, data.Dataset) {
			requeue <- data
			continue
		}
		// Do stuff with the data from this point
	}
}
Now on my own API endpoint I have the following:
ch := make(chan dweather.DWeatherResponse, 2)
requeue := make(chan dweather.DWeatherResponse)
final := make(chan strike.StrikePerYearResponse)
var wg sync.WaitGroup
for _, s := range args.Parameters.Strikes {
	strike := strike.StrikePerYear{
		Parameter:   strike.Parameter(s.Dataset),
		StrikeValue: s.Value,
	}
	// I receive and process the data in here
	go strike.ProcessStrikePerYear(ch, requeue, final, string(s.Dataset))
}
go func() {
	for {
		data := <-requeue
		ch <- data
	}
}()
// Creates a goroutine for each dataset
for _, dataset := range api.Params.Dataset {
	wg.Add(1)
	go api.RequestData(ctx, ch, dataset, &wg)
}
wg.Wait()
close(ch)
// Once the data is all processed it is all appended
var strikes []strike.StrikePerYearResponse
for range args.Fetch.Datasets {
	strikes = append(strikes, <-final)
}
return strikes
The issue with this code is that as soon as I start receiving data from more than one endpoint, the requeue will block and nothing more happens. If I remove that requeue logic, data will be lost if it does not land on the correct goroutine.
My two questions are:
Why is the requeue blocking if it has a goroutine always ready to receive?
Should I take a different approach on how I'm processing the incoming data?
This is not a good way to solve your problem; you should change your approach. I suggest an implementation like the one below:
package main

import (
	"fmt"
	"sync"
)

// answer for https://stackoverflow.com/questions/68454226/how-to-handle-multiple-goroutines-that-share-the-same-channel

var (
	finalResult = make(chan string)
)

// IData is used by the message dispatcher; every struct must implement its methods
type IData interface {
	IsThisForMe() bool
	Process(*sync.WaitGroup)
}

// MainData can be your main struct, like StrikePerYear
type MainData struct {
	// add any props
	Id   int
	Name string
}

type DataTyp1 struct {
	MainData *MainData
}

func (d DataTyp1) IsThisForMe() bool {
	// you can check your condition on the incoming data here
	if d.MainData.Id == 2 {
		return true
	}
	return false
}

func (d DataTyp1) Process(wg *sync.WaitGroup) {
	d.MainData.Name = "processed by DataTyp1"
	// send result to final channel, you can change it as you want
	finalResult <- d.MainData.Name
	wg.Done()
}

type DataTyp2 struct {
	MainData *MainData
}

func (d DataTyp2) IsThisForMe() bool {
	// you can check your condition on the incoming data here
	if d.MainData.Id == 3 {
		return true
	}
	return false
}

func (d DataTyp2) Process(wg *sync.WaitGroup) {
	d.MainData.Name = "processed by DataTyp2"
	// send result to final channel, you can change it as you want
	finalResult <- d.MainData.Name
	wg.Done()
}

// dispatcher will run a new goroutine for each request.
// you can implement a worker pool to prevent running too many goroutines.
func dispatcher(incomingData *MainData, wg *sync.WaitGroup) {
	// based on your requirements you can remove this goroutine or not
	go func() {
		var p IData
		p = DataTyp1{incomingData}
		if p.IsThisForMe() {
			go p.Process(wg)
			return
		}
		p = DataTyp2{incomingData}
		if p.IsThisForMe() {
			go p.Process(wg)
			return
		}
	}()
}

func main() {
	dummyDataArray := []MainData{
		{Id: 2, Name: "this data #2"},
		{Id: 3, Name: "this data #3"},
	}
	wg := sync.WaitGroup{}
	for i := range dummyDataArray {
		wg.Add(1)
		dispatcher(&dummyDataArray[i], &wg)
	}
	result := make([]string, 0)
	done := make(chan struct{})
	// data collector
	go func() {
	loop:
		for {
			select {
			case <-done:
				break loop
			case r := <-finalResult:
				result = append(result, r)
			}
		}
	}()
	wg.Wait()
	done <- struct{}{}
	for _, s := range result {
		fmt.Println(s)
	}
}
Note: this is just to open your mind to finding a better solution; for sure this is not production-ready code.
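As the dispatcher comment suggests, one way to prevent running too many goroutines is a fixed worker pool. A minimal sketch along those lines (dummyDataArray as in main above; the routing body is a placeholder):

jobs := make(chan *MainData)
var poolWg sync.WaitGroup
for w := 0; w < 4; w++ { // four workers instead of one goroutine per item
	poolWg.Add(1)
	go func() {
		defer poolWg.Done()
		for d := range jobs {
			// route d through IsThisForMe/Process as in dispatcher above
			fmt.Println("routing", d.Id)
		}
	}()
}
for i := range dummyDataArray {
	jobs <- &dummyDataArray[i]
}
close(jobs)
poolWg.Wait()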

How to check how many times a function has been called in a test?

I'm working on a web server with an API which calls a function. This function does a heavy job and caches the result. I've designed it so that if there is no cache and multiple users call this API with the same parameters at the same time, the server calls that function only once for the first request, and all other requests wait for the job to finish and return the response from the cache.
I've written a test for it in this way:
func TestConcurentRequests(t *testing.T) {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			// do the request
			wg.Done()
		}()
	}
	wg.Wait()
}
I can check that my code works correctly by printing a value inside that heavy function and checking the console to see if it appears only once, but I'm searching for a way to fail the test if this function has been called more than once. Something like this:
if n := NumberOfCalls(MyHeavyFunction); n > 1 {
	t.Fatalf("Expected heavy function to run only once but called %d times", n)
}
I suggest breaking your heavy function into a cache part and an actual processing part, where the cache part acts as a facade for the concurrent clients. The system under test here is actually the cache layer, which decides whether to forward the request, hold it, or return from the cache. So in order to test that, you need to mock the actual function that does the processing, and the mock should record how many requests it served.
Here is the demonstration with certain assumptions:
package main

import "fmt"

func CreateCachedProcessor(processor Processing) func(string) string {
	// cache here
	return func(req string) string {
		// Assume you have some logic to call the actual processor based on your conditions
		return processor.ProcessRequest(req)
	}
}

type Processing interface {
	ProcessRequest(string) string
}

type HeavyProcessing struct{}

func (p *HeavyProcessing) ProcessRequest(request string) string {
	return "response"
}

type HeavyProcessingRecorder struct {
	Count int
}

func (p *HeavyProcessingRecorder) ProcessRequest(request string) string {
	p.Count += 1
	return "response recorded"
}

func main() {
	// normal
	hp := &HeavyProcessing{}
	c := CreateCachedProcessor(hp)
	fmt.Println(c("req1"))
	// Test with recorder
	hpr := &HeavyProcessingRecorder{}
	cr := CreateCachedProcessor(hpr)
	cr("req1")
	cr("req1")
	cr("req1")
	fmt.Println("No. of times Called ", hpr.Count)
}
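One caveat if the recorder is used from the concurrent test above: incrementing a plain int from many goroutines is a data race, and go test -race will flag it. A sketch of a race-safe recorder using sync/atomic:

import "sync/atomic"

type HeavyProcessingRecorder struct {
	count int64
}

func (p *HeavyProcessingRecorder) ProcessRequest(request string) string {
	atomic.AddInt64(&p.count, 1) // safe under concurrent callers
	return "response recorded"
}

func (p *HeavyProcessingRecorder) Calls() int64 {
	return atomic.LoadInt64(&p.count)
}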

Quit channel on a worker pool implementation

What I eventually want to accomplish is to dynamically scale my workers up OR down, depending on the workload.
The code below successfully parses data when a Task comes through w.Channel
func (s *Storage) StartWorker(w *app.Worker) {
	go func() {
		for {
			w.Pool <- w.Channel // register current worker to the worker pool
			select {
			case task := <-w.Channel: // received a work request, do some work
				time.Sleep(task.Delay)
				fmt.Println(w.WorkerID, "processing task:", task.TaskName)
				w.Results <- s.ProcessTask(w, &task)
			case <-w.Quit:
				fmt.Println("Closing channel for", w.WorkerID)
				return
			}
		}
	}()
}
The blocking point here is the line below.
w.Pool <- w.Channel
In that sense, if I try to stop a worker(s) in any part of my program with:
w.Quit <- true
the case <-w.Quit: is blocked and never receives until there's another incoming Task on w.Channel (and I gather select chooses randomly when multiple cases are ready).
So how can I stop a channel(worker) independently?
See the sample code below; it declares a fanout function that is responsible for sizing the workers up or down.
It works by using timeouts to detect that workers have ended or that new ones are required.
There is an inner loop to ensure that each item is processed before moving on, blocking the source when needed.
package main

import (
	"time"
)

func main() {
	input := make(chan string)
	fanout(input)
}

func fanout(input chan string) {
	workers := 0
	distribute := make(chan string)
	workerEnd := make(chan bool)
	for i := range input {
		done := false
		for !done {
			select {
			case distribute <- i:
				done = true
			case <-workerEnd:
				workers--
			default:
				// no worker was ready: spawn one if below the cap, otherwise retry
				if workers < 10 {
					workers++
					go func() {
						work(distribute)
						workerEnd <- true
					}()
				}
			}
		}
	}
}

func work(input chan string) {
	for {
		select {
		case <-input:
			// simulate doing the work
			<-time.After(time.Millisecond)
		case <-time.After(time.Second):
			// no work for a second: this worker retires
			return
		}
	}
}
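Note that main above blocks forever because nothing is ever sent on input. A hypothetical driver that feeds the fanout and then shuts it down (closing input ends the range loop, and idle workers exit via their one-second timeout; this version also needs "fmt" imported) could look like:

func main() {
	input := make(chan string)
	go func() {
		for i := 0; i < 100; i++ {
			input <- fmt.Sprintf("task-%d", i)
		}
		close(input)
	}()
	fanout(input)
}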

Go: Passing functions through channels

I'm trying to rate limit the functions that I call by placing them in a queue to be accessed later. Below I have a slice of requests that I have created, and the requestHandler function processes each request at a certain rate.
I want it to accept all kinds of functions with different types of parameters, hence the interface{} type.
How would I be able to pass the functions through a channel and successfully call them?
type request struct {
	function interface{}
	channel  chan interface{}
}

var requestQueue []request

func pushQueue(f interface{}, ch chan interface{}) {
	req := request{
		f,
		ch,
	}
	// push
	requestQueue = append(requestQueue, req)
}

func requestHandler() {
	for {
		if len(requestQueue) > 0 {
			// pop
			req := requestQueue[len(requestQueue)-1]
			requestQueue = requestQueue[:len(requestQueue)-1]
			req.channel <- req.function
		}
		<-time.After(1200 * time.Millisecond)
	}
}
Here is an example of what I'm trying to achieve (GetLeagueEntries(string, string) and GetSummonerName(int, int) are functions):
ch := make(chan interface{})
pushQueue(l.GetLeagueEntries, ch)
pushQueue(l.GetSummonerName, ch)
leagues, _ := <-ch(string1, string2)
summoners, _ := <-ch(int1, int2)
Alright, here is the codez: https://play.golang.org/p/XZvb_4BaJF
Notice that it's not perfect. You have a queue that is executed every second. If the queue is empty and a new item is added, the new item can wait for almost a second before being executed.
But this should get you very close to what you need anyway :)
This code can be split into 3 section:
The rate limited queue executor, which I call the server (I'm horrible at naming things) - The server doesn't know anything about the functions. All it does is start a never-ending goroutine that pops the oldest function in the queue, once every second, and calls it. The issue that I talked about above is in this section of the code BTW and I could help you fix it if you want.
The Button Click functionality - This shows you how each button click could call 3 diff functions (you could obviously make more/fewer function calls) using the server and make sure that they are each 1 second apart. You can even add a timeout to any of the functions (to fake latency) and they would still get called 1 second apart. This is the only place you need channels, because you want to make all the function calls as fast as possible (if the first function takes 5 seconds, you only want to wait 1 second to call the second function), and then wait for them to finish, so you need to know when they are all done.
The Button Click simulation (the main func) - this just shows that 3 button clicks would work as expected. You can also put them in a goroutine to simulate 3 users clicking the button at the same time and it would still work.
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	requestFreq = time.Second
)

type (
	// A single request
	request func()

	// The server that will hold a queue of requests and make them once per requestFreq
	server struct {
		// This will tick once per requestFreq
		ticker   *time.Ticker
		requests []request
		// Mutex for working with the request slice
		sync.RWMutex
	}
)

var (
	createServerOnce sync.Once
	s                *server
)

func main() {
	// Multiple button clicks:
	ButtonClick()
	ButtonClick()
	ButtonClick()
	fmt.Println("Done!")
}

// BUTTON LOGIC:
// Calls 3 functions and returns 3 diff values.
// Each function is called at least 1 second apart.
func ButtonClick() (val1 int, val2 string, val3 bool) {
	iCh := make(chan int)
	sCh := make(chan string)
	bCh := make(chan bool)
	go func() {
		Server().AppendRequest(func() {
			t := time.Now()
			fmt.Println("Calling func1 (time: " + t.Format("15:04:05") + ")")
			// do some stuff
			iCh <- 1
		})
	}()
	go func() {
		Server().AppendRequest(func() {
			t := time.Now()
			fmt.Println("Calling func2 (time: " + t.Format("15:04:05") + ")")
			// do some stuff
			sCh <- "Yo"
		})
	}()
	go func() {
		Server().AppendRequest(func() {
			t := time.Now()
			fmt.Println("Calling func3 (time: " + t.Format("15:04:05") + ")")
			// do some stuff
			bCh <- true
		})
	}()
	// Wait for all 3 calls to come back
	for count := 0; count < 3; count++ {
		select {
		case val1 = <-iCh:
		case val2 = <-sCh:
		case val3 = <-bCh:
		}
	}
	return
}

// SERVER LOGIC

// Factory function that will only create a single server
func Server() *server {
	// Only one server for the entire application
	createServerOnce.Do(func() {
		s = &server{ticker: time.NewTicker(requestFreq), requests: []request{}}
		// Start a thread to make requests.
		go s.makeRequests()
	})
	return s
}

func (s *server) makeRequests() {
	if s == nil || s.ticker == nil {
		return
	}
	// This will keep going once per each requestFreq
	for range s.ticker.C {
		var r request
		// You can't just access s.requests because you are in a goroutine
		// here while someone could be adding new requests outside of the
		// goroutine so you have to use locks.
		s.Lock()
		if len(s.requests) > 0 {
			// We have a lock here, which blocks all other operations
			// so just shift the first request out, save it and give
			// the lock back before doing any work.
			r = s.requests[0]
			s.requests = s.requests[1:]
		}
		s.Unlock()
		if r != nil {
			// make the request!
			r()
		}
	}
}

func (s *server) AppendRequest(r request) {
	if s == nil {
		return
	}
	s.Lock()
	s.requests = append(s.requests, r)
	s.Unlock()
}
First, I would write it as:
leagues := server.GetLeagueEntries()
summoners := server.GetSummoners()
And put the rate limiting into the server, with one of the rate-limiting libraries.
However, it is possible to use an interface to unify the requests, and use a func type to allow closures (as in http.HandleFunc):

type Command interface {
	Execute(server *Server)
}

type CommandFunc func(server *Server)

func (fn CommandFunc) Execute(server *Server) { fn(server) }

type GetLeagueEntries struct{ Leagues []League }

func (entries *GetLeagueEntries) Execute(server *Server) {
	// ...
}

func GetSummonerName(id int, result *string) CommandFunc {
	return CommandFunc(func(server *Server) {
		*result = "hello"
	})
}

get := GetLeagueEntries{}
requests <- &get
requests <- CommandFunc(func(server *Server) {
	// ... handle stuff here
})

Of course, this needs some synchronization.
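The requests channel is assumed above; a minimal sketch of the consuming side under that assumption (srv being the shared *Server, and the 1200 ms period borrowed from the question) could be:

requests := make(chan Command)
go func() {
	ticker := time.NewTicker(1200 * time.Millisecond)
	defer ticker.Stop()
	for cmd := range requests {
		<-ticker.C // at most one command per tick
		cmd.Execute(srv)
	}
}()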
I would have thought it easier to use some sort of semaphore or worker pool. That way you have a limited number of workers who can do anything. It would be possible to have multiple worker pools too.
Do you need any of these calls to be concurrent/asynchronous? If not, and they can be called in order, you could have a configurable sleep (a nasty hack, mind).
Try out a worker pool or semaphore rather than a chan of functions.
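For reference, a semaphore in Go is commonly just a buffered channel; a minimal sketch (tasks and doWork are hypothetical stand-ins):

sem := make(chan struct{}, 3) // at most 3 tasks in flight
var wg sync.WaitGroup
for _, task := range tasks {
	wg.Add(1)
	go func(t string) {
		defer wg.Done()
		sem <- struct{}{}        // acquire a slot
		defer func() { <-sem }() // release it
		doWork(t)
	}(task)
}
wg.Wait()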

async reply in registry pattern

I'm learning Go, and I would like to explore some patterns.
I would like to build a Registry component which maintains a map of some stuff, and I want to provide serialized access to it.
Currently I ended up with something like this:
type JobRegistry struct {
	submission chan JobRegistrySubmitRequest
	listing    chan JobRegistryListRequest
}

type JobRegistrySubmitRequest struct {
	request  JobSubmissionRequest
	response chan Job
}

type JobRegistryListRequest struct {
	response chan []Job
}

func NewJobRegistry() (this *JobRegistry) {
	this = &JobRegistry{make(chan JobRegistrySubmitRequest, 10), make(chan JobRegistryListRequest, 10)}
	go func() {
		jobMap := make(map[string]Job)
		for {
			select {
			case sub := <-this.submission:
				job := MakeJob(sub.request) // ....
				jobMap[job.Id] = job
				sub.response <- job // the response channel carries a Job
			case list := <-this.listing:
				res := make([]Job, 0, 100)
				for _, v := range jobMap {
					res = append(res, v)
				}
				list.response <- res
			}
			/// case somechannel....
		}
	}()
	return
}
Basically, I encapsulate each operation inside a struct, which carries
the parameters and a response channel.
Then I created helper methods for end users:
func (this *JobRegistry) List() ([]Job, error) {
	res := make(chan []Job, 1)
	req := JobRegistryListRequest{res}
	this.listing <- req
	return <-res, nil // todo: handle errors like timeouts
}
I decided to use a channel for each type of request in order to be type safe.
The problem I see with this approach are:
A lot of boilerplate code and a lot of places to modify when some param/return type changes
Have to do weird things like create yet another wrapper struct in order to return errors from within the handler goroutine. (If I understood correctly, there are no tuples and no way to send multiple values on a channel, like multi-valued returns.)
So, I'm wondering whether all this makes sense, or rather just get back to good old locks.
I'm sure that somebody will find some clever way out using channels.
I'm not entirely sure I understand you, but I'll try answering nevertheless.
You want a generic service that executes jobs sent to it. You also might want the jobs to be serializable.
What we need is an interface that would define a generic job.
type Job interface {
	Run()
	Serialize(io.Writer)
}

func ReadJob(r io.Reader) {...}

type JobManager struct {
	jobs   map[int]Job
	jobs_c chan Job
}

func NewJobManager() (mgr *JobManager) {
	mgr = &JobManager{make(map[int]Job), make(chan Job, JOB_QUEUE_SIZE)}
	go func() {
		for j := range mgr.jobs_c {
			go j.Run()
		}
	}()
	return
}

type IntJob struct{ ... }

func (job *IntJob) GetOutChan() chan int {...}

func (job *IntJob) Run() {...}
func (job *IntJob) Serialize(o io.Writer) {...}
Much less code, and roughly as useful.
About signaling errors with an auxiliary stream, you can always use a helper function.
type IntChanWithErr struct {
	c    chan int
	errc chan error
}

func (ch *IntChanWithErr) Next() (v int, err error) {
	select {
	case v = <-ch.c: // not handling closed channel
	case err = <-ch.errc:
	}
	return
}
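A hypothetical caller would then loop on Next and stop at the first error:

var ch IntChanWithErr // assume c and errc are initialized and fed elsewhere
for {
	v, err := ch.Next()
	if err != nil {
		break // or handle/log the error
	}
	fmt.Println(v)
}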
