Go: channel many slow API queries into a single SQL transaction

I wonder what would be the idiomatic way to do the following.
I have N slow API queries and one database connection. I want a buffered channel where the responses will come in, and a single database transaction which I will use to write the data.
The only thing I could come up with is a semaphore-style approach, as in this made-up example:
func myFunc() {
    // 10 concurrent API calls
    sem := make(chan bool, 10)
    // A concurrency-safe map as a buffer
    var myMap MyConcurrentMap
    for i := 0; i < N; i++ {
        sem <- true
        go func(i int) {
            defer func() { <-sem }()
            resp := slowAPICall(fmt.Sprintf("http://slow-api.me?%d", i))
            myMap.Put(resp)
        }(i)
    }
    // Fill the semaphore to wait for the remaining goroutines
    for j := 0; j < cap(sem); j++ {
        sem <- true
    }
    tx, _ := db.Begin()
    for data := range myMap {
        tx.Exec("Insert data into database")
    }
    tx.Commit()
}
I am nearly sure there is a simpler, cleaner and more proper solution, but I am having trouble coming up with it.
EDIT:
Well, I came up with the following solution. This way I do not need the buffer map: as soon as data arrives on the resp channel it is printed, or could instead be inserted into a database. It works, and I am still not sure everything is OK, but at least there are no races.
package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

// Global waitGroup
var wg sync.WaitGroup

func init() {
    // just for fun's sake, seed rand
    rand.Seed(time.Now().UnixNano())
}

// Emulate a slow API call
func verySlowAPI(id int) int {
    n := rand.Intn(5)
    time.Sleep(time.Duration(n) * time.Second)
    return n
}

func main() {
    // Amount of tasks
    N := 100
    // Concurrency level
    concur := 10
    // Channel for tasks
    tasks := make(chan int, N)
    // Channel for responses
    resp := make(chan int, 10)
    // 10 concurrent goroutines
    wg.Add(concur)
    for i := 1; i <= concur; i++ {
        go worker(tasks, resp)
    }
    // Add tasks
    for i := 0; i < N; i++ {
        tasks <- i
    }
    // Collect data from goroutines
    for i := 0; i < N; i++ {
        fmt.Printf("%d\n", <-resp)
    }
    // Close the tasks channel so the workers return
    close(tasks)
    // Wait till finished
    wg.Wait()
}

func worker(tasks chan int, resp chan<- int) {
    defer wg.Done()
    for {
        t, ok := <-tasks
        if !ok {
            return
        }
        n := verySlowAPI(t)
        resp <- n
    }
}

There's no need to use a channel as a semaphore; sync.WaitGroup was made for waiting for a set of goroutines to complete.
If you're using the channel to limit throughput, you're better off with a worker pool, and using the channel to pass jobs to the workers:
type job struct {
    i int
}

func myFunc(N int) {
    // Adjust buffer sizes as needed for the total number of tasks
    work := make(chan job, 10)
    // res being whatever type slowAPICall returns
    results := make(chan res, 10)
    resBuff := make([]res, 0, N)
    wg := new(sync.WaitGroup)

    // 10 concurrent API calls
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            for j := range work {
                resp := slowAPICall(fmt.Sprintf("http://slow-api.me?%d", j.i))
                results <- resp
            }
            wg.Done()
        }()
    }

    // Collect the results into a buffer
    go func() {
        for r := range results {
            resBuff = append(resBuff, r)
        }
    }()

    for i := 0; i < N; i++ {
        work <- job{i}
    }
    close(work)
    wg.Wait()
    close(results)
}
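To round this out for the question's original goal (writing everything in one SQL transaction), here is a hedged sketch of what could replace the collector goroutine and follow the wg.Wait()/close(results) above. It is my own addition: it assumes db is an open *sql.DB, the INSERT statement is a placeholder, and a done channel is added so resBuff is known to be complete before it is read:
done := make(chan struct{})
go func() {
    for r := range results {
        resBuff = append(resBuff, r)
    }
    close(done) // signal that every result has been buffered
}()

// ... queue the jobs, close(work), wg.Wait(), close(results) as above ...

<-done // resBuff is now complete and safe to read

tx, err := db.Begin()
if err != nil {
    log.Fatal(err)
}
for _, data := range resBuff {
    // Placeholder statement; adapt the SQL and arguments to your schema.
    if _, err := tx.Exec("INSERT INTO results (value) VALUES (?)", data); err != nil {
        tx.Rollback()
        log.Fatal(err)
    }
}
if err := tx.Commit(); err != nil {
    log.Fatal(err)
}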

Maybe this will work for you. Now you can get rid of your concurrent map. Here is a code snippet:
func myFunc() {
    // 10 concurrent API calls
    sem := make(chan bool, 10)
    respCh := make(chan YOUR_RESP_TYPE, 10)
    var responses []YOUR_RESP_TYPE
    for i := 0; i < N; i++ {
        sem <- true
        go func(i int) {
            defer func() {
                <-sem
            }()
            resp := slowAPICall(fmt.Sprintf("http://slow-api.me?%d", i))
            respCh <- resp
        }(i)
    }
    respCollected := make(chan struct{})
    go func() {
        for i := 0; i < N; i++ {
            responses = append(responses, <-respCh)
        }
        close(respCollected)
    }()
    <-respCollected
    tx, _ := db.Begin()
    for _, data := range responses {
        tx.Exec("Insert data into database")
    }
    tx.Commit()
}
Then we need one more goroutine that collects all the responses from the response channel into a slice or map.

Related

Deadlocks with buffered channels in Go

For the code below I am encountering: fatal error: all goroutines are asleep - deadlock!
Am I right in using a buffered channel? I would appreciate it if you could give me some pointers. I am unfortunately at my wits' end.
func main() {
    valueChannel := make(chan int, 2)
    defer close(valueChannel)
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go doNothing(&wg, valueChannel)
    }
    for {
        v, ok := <-valueChannel
        if !ok {
            break
        }
        fmt.Println(v)
    }
    wg.Wait()
}

func doNothing(wg *sync.WaitGroup, numChan chan int) {
    defer wg.Done()
    time.Sleep(time.Duration(rand.Intn(1000)) * time.Millisecond)
    numChan <- 12
}
The main goroutine blocks on <- valueChannel after receiving all values. Close the channel to unblock the main goroutine.
func main() {
    valueChannel := make(chan int, 2)
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go doNothing(&wg, valueChannel)
    }
    // Close channel after goroutines complete.
    go func() {
        wg.Wait()
        close(valueChannel)
    }()
    // Receive values until channel is closed.
    // The for / range loop here does the same
    // thing as the for loop in the question.
    for v := range valueChannel {
        fmt.Println(v)
    }
}
Run the example on the playground.
The code above works independent of the number of values sent by the goroutines.
If the main() function can determine the number of values sent by the goroutines, then receive that number of values from main():
func main() {
    const n = 10
    valueChannel := make(chan int, 2)
    for i := 0; i < n; i++ {
        go doNothing(valueChannel)
    }
    // Each call to doNothing sends one value. Receive
    // one value for each call to doNothing.
    for i := 0; i < n; i++ {
        fmt.Println(<-valueChannel)
    }
}

func doNothing(numChan chan int) {
    time.Sleep(time.Duration(rand.Intn(1000)) * time.Millisecond)
    numChan <- 12
}
Run the example on the playground.
The main problem is in the for loop that receives from the channel.
The comma-ok idiom works slightly differently on channels: ok indicates whether the received value was actually sent on the channel (true) or is a zero value returned because the channel is closed and empty (false).
In this case the receive keeps waiting for more data to be sent, but the goroutines have already finished sending their ten values: deadlock.
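To illustrate the comma-ok behaviour on a closed channel, here is a small self-contained example I've added (not part of the original answer); receiving from a closed channel never blocks, and ok reports whether the value was real:
package main

import "fmt"

func main() {
    ch := make(chan int, 2)
    ch <- 1
    ch <- 2
    close(ch)

    for {
        v, ok := <-ch
        fmt.Println(v, ok) // prints 1 true, 2 true, then 0 false
        if !ok {
            break
        }
    }
}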
So, leaving the overall design of the code aside, if I just make the smallest possible change, here it is:
func main() {
    valueChannel := make(chan int, 2)
    defer close(valueChannel)
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go doNothing(&wg, valueChannel)
    }
    for i := 0; i < 10; i++ {
        v := <-valueChannel
        fmt.Println(v)
    }
    wg.Wait()
}

func doNothing(wg *sync.WaitGroup, numChan chan int) {
    defer wg.Done()
    time.Sleep(time.Duration(rand.Intn(1000)) * time.Millisecond)
    numChan <- 12
}

X number of goroutines to update the same variable

I want to have X goroutines update countValue in parallel (numRoutines is the number of CPUs you have).
Solution 1:
func count(numRoutines int) (countValue int) {
    var mu sync.Mutex
    k := func(i int) {
        mu.Lock()
        defer mu.Unlock()
        countValue += 5
    }
    for i := 0; i < numRoutines; i++ {
        go k(i)
    }
    return
}
This is a data race, and the returned countValue is 0, because count returns before the goroutines have run.
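For reference, a hedged sketch (my own addition, not from the question) of how Solution 1 could be made correct while keeping the mutex: wait for the goroutines before returning.
func count(numRoutines int) (countValue int) {
    var mu sync.Mutex
    var wg sync.WaitGroup
    k := func(i int) {
        defer wg.Done()
        mu.Lock()
        defer mu.Unlock()
        countValue += 5
    }
    wg.Add(numRoutines)
    for i := 0; i < numRoutines; i++ {
        go k(i)
    }
    wg.Wait() // don't return until every goroutine has added its 5
    return
}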
Solution 2:
func count(numRoutines int) (countValue int) {
    k := func(i int, c chan int) {
        c <- 5
    }
    c := make(chan int)
    for i := 0; i < numRoutines; i++ {
        go k(i, c)
    }
    for i := 0; i < numRoutines; i++ {
        countValue += <-c
    }
    return
}
I ran a benchmark on it, and a plain sequential addition is faster than using goroutines. I think it's because I have two for loops here: when I put countValue += <-c inside the first for loop, the code runs faster.
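For context, a hedged sketch of what such a benchmark might look like (the function name and the use of runtime.NumCPU() are my assumptions; it assumes the count function is in the same package, imports "runtime" and "testing", and is run with go test -bench=.):
func BenchmarkCount(b *testing.B) {
    numRoutines := runtime.NumCPU()
    for i := 0; i < b.N; i++ {
        count(numRoutines)
    }
}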
Solution 3:
func count(numRoutines int) (countValue int) {
    var wg sync.WaitGroup
    c := make(chan int)
    k := func(i int) {
        defer wg.Done()
        c <- 5
    }
    for i := 0; i < numRoutines; i++ {
        wg.Add(1)
        go k(i)
    }
    go func() {
        for i := range c {
            countValue += i
        }
    }()
    wg.Wait()
    return
}
Still a data race :/
Is there a better way to do this?
There definitely is a better way to safely increment a variable: using sync/atomic:
import "sync/atomic"
var words int64
k := func() {
_ = atomic.AddInt64(&words, 5) // increment atomically
}
Using a channel basically eliminates the need for a mutex, as it takes away the risk of concurrent access to the variable itself, and a waitgroup here is a bit overkill.
Channels:
words := 0
done := make(chan struct{})       // or use context
ch := make(chan int, numRoutines) // buffer so each routine can write

go func() {
    read := 0
    for i := range ch {
        words += 5 // or use i or something
        read++
        if read == numRoutines {
            break // we've received data from all routines
        }
    }
    close(done) // indicate this routine has terminated
}()

for i := 0; i < numRoutines; i++ {
    ch <- i // write whatever value needs to be used in the counting routine on the channel
}

<-done    // wait for our routine that increments words to return
close(ch) // this channel is no longer needed
fmt.Printf("Counted %d\n", words)
As you can tell, numRoutines is no longer the number of routines, but rather the number of writes on the channel. You can still move those writes into individual routines:
for i := 0; i < numRoutines; i++ {
    go func(ch chan<- int, i int) {
        // do stuff here
        ch <- 5 * i // for example
    }(ch, i)
}
WaitGroup:
Instead of using a context that you can cancel, or a channel, you can use a waitgroup + atomic to get the same result. The easiest way IMO to do so is to create a type:
type counter struct {
    words int64
}

func (c *counter) doStuff(wg *sync.WaitGroup, i int) {
    defer wg.Done()
    _ = atomic.AddInt64(&c.words, int64(i*5)) // whatever value you need to add
}

func main() {
    cnt := counter{}
    wg := sync.WaitGroup{}
    wg.Add(numRoutines) // create the waitgroup
    for i := 0; i < numRoutines; i++ {
        go cnt.doStuff(&wg, i)
    }
    wg.Wait() // wait for all routines to finish
    fmt.Printf("Counted %d\n", cnt.words)
}
Fix for your third solution
As I mentioned in the comment: your third solution is still causing a race condition because the channel c is never closed, meaning the routine:
go func() {
    for i := range c {
        countValue += i
    }
}()
Never returns. The waitgroup also only ensures that all values have been sent on the channel, not that countValue has reached its final value. One fix is to close the channel after wg.Wait() returns so the collecting routine can finish, add a done channel that you close when that routine returns, and receive from done before returning:
func count(numRoutines int) (countValue int) {
    var wg sync.WaitGroup
    c := make(chan int)
    k := func(i int) {
        defer wg.Done()
        c <- 5
    }
    for i := 0; i < numRoutines; i++ {
        wg.Add(1)
        go k(i)
    }
    done := make(chan struct{})
    go func() {
        for i := range c {
            countValue += i
        }
        close(done)
    }()
    wg.Wait()
    close(c)
    <-done
    return
}
This adds some clutter, though, and IMO is a bit messy. It might be easier to just move the wg.Wait() call into a goroutine:
func count(numRoutines int) (countValue int) {
    var wg sync.WaitGroup
    c := make(chan int)
    // add wg as argument, makes it easier to move this function outside of this scope
    k := func(wg *sync.WaitGroup, i int) {
        defer wg.Done()
        c <- 5
    }
    wg.Add(numRoutines) // increment the waitgroup once
    for i := 0; i < numRoutines; i++ {
        go k(&wg, i)
    }
    go func() {
        wg.Wait()
        close(c) // this ends the loop over the channel
    }()
    // just iterate over the channel until it is closed
    for i := range c {
        countValue += i
    }
    // we've added all values to countValue
    return
}

Stop a go routine before starting a new one

I've got a server that handles requests from clients. Each request spins up a goroutine in a for loop; how do I stop the previous goroutine before a new one starts?
for {
    select {
    case discoveryRequest := <-discoveryRequestChannel:
        for _, resourceNames := range discoveryRequest.ResourceNames {
            accumulatedResourceNames = append(accumulatedResourceNames, resourceNames) // resources requested are accumulated
        }
        log.Infof("request resources: %v", accumulatedResourceNames)
        secret, err := s.MakeSecretResponse(accumulatedResourceNames, nonce)
        if err != nil {
            log.Error(err) // Handling all the errors from the above layers. No need for context as they are provided in previous layers.
            continue
        }
        if err := stream.Send(secret); err != nil {
            log.Errorf("Error when sending stream to envoy %v ", err)
        }
        version = secret.VersionInfo
        secretsRenewEndChannel <- true
        close(secretsRenewEndChannel)
        log.Info("kill switch")
        go s.SecretsRenewer(version, accumulatedResourceNames, secretsRenewChannel, secretsRenewEndChannel)
        isSecretRenewRunning = true
        log.Info("set to true")
        secretsRenewEndChannel = make(chan bool)
Your code isn't clear. Why do you need to select like this in a loop? What are you selecting amongst? You can just receive from the channel instead.
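For instance, a minimal sketch of receiving directly (my own illustration, reusing the channel name from your snippet):
for {
    discoveryRequest := <-discoveryRequestChannel
    // handle the request here, exactly as in the case body above
    _ = discoveryRequest
}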
You may be looking for a worker pool here. Here's an example from gobyexample:
package main

import (
    "fmt"
    "time"
)

func worker(id int, jobs <-chan int, results chan<- int) {
    for j := range jobs {
        fmt.Println("worker", id, "started job", j)
        time.Sleep(time.Second)
        fmt.Println("worker", id, "finished job", j)
        results <- j * 2
    }
}

func main() {
    const numJobs = 5
    jobs := make(chan int, numJobs)
    results := make(chan int, numJobs)
    for w := 1; w <= 3; w++ {
        go worker(w, jobs, results)
    }
    for j := 1; j <= numJobs; j++ {
        jobs <- j
    }
    close(jobs)
    for a := 1; a <= numJobs; a++ {
        <-results
    }
}
To adjust it to your problem, the worker would handle requests and the main goroutine would feed requests into a channel. You can easily control how many goroutines run concurrently with this approach.
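As a hedged sketch of that adaptation (the type name and handler body are placeholders of my own; only the channel-of-requests idea comes from the example above):
// discoveryRequest stands in for whatever type the server's request channel carries.
func handleRequests(requests <-chan discoveryRequest, workers int) {
    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for req := range requests {
                // build and send the response for req here
                _ = req
            }
        }(w)
    }
    wg.Wait() // returns once the requests channel is closed and drained
}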
Another pattern that may be useful in your case is rate limiting.
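Here is a small, self-contained sketch of rate limiting with a time.Ticker (my own example in the spirit of gobyexample, not code from the question):
package main

import (
    "fmt"
    "time"
)

func main() {
    requests := make(chan int, 5)
    for i := 1; i <= 5; i++ {
        requests <- i
    }
    close(requests)

    // Allow at most one request every 200ms.
    limiter := time.NewTicker(200 * time.Millisecond)
    defer limiter.Stop()

    for req := range requests {
        <-limiter.C // wait for the next tick before handling the request
        fmt.Println("handling request", req, "at", time.Now())
    }
}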

Get the fastest goroutine

How do I get only the result of the fastest goroutine? For example, let's say we have this function:
func Add(num int) int {
    return num + 5
}
And we have this function.
func compute() {
    for i := 0; i < 5; i++ {
        go Add(i)
    }
}
I want to get the result of the goroutine that finishes first.
Use a buffered channel to get the first result from many goroutine executions:
func Test(t *testing.T) {
    ch := make(chan int, 5) // buffer for all results, so the slower goroutines can still send and exit
    go func() {
        for i := 0; i < 5; i++ {
            go func(c chan<- int, i int) {
                res := Add(i)
                c <- res
            }(ch, i)
        }
    }()
    res := <-ch // blocks here until the first result arrives
    // close(ch) - sending on a closed channel would panic
    fmt.Println(res)
}
Leaving a channel open and never closing it does not leak memory: once the channel is no longer referenced by any goroutine, it will be garbage collected.

Always getting deadlock with channels

I'm learning to work with Go channels, and I keep getting deadlocks. What might be wrong with this code? The printers randomly stop working when the array sizes are unequal; I guess it would help to somehow notify the printers that the receiver has stopped. Any ideas how to fix it? My code is pasted below.
package main

import (
    "fmt"
    "sync"
)

var wg = sync.WaitGroup{}
var wgs = sync.WaitGroup{}

var sg = make(chan int, 50)
var gp1 = make(chan int, 50)
var gp2 = make(chan int, 50)

func main() {
    wgs.Add(2)
    go Sender(0)
    go Sender(11)
    wg.Add(3)
    go Receiver()
    go Printer()
    go Printer2()
    wg.Wait()
}

func Sender(start int) {
    defer wgs.Done()
    for i := start; i < 20; i++ {
        sg <- i
    }
}

func Receiver() {
    defer wg.Done()
    for i := 0; i < 20; i++ {
        nr := <-sg
        if nr%2 == 0 {
            gp1 <- nr
        } else {
            gp2 <- nr
        }
    }
}

func Printer() {
    defer wg.Done()
    var m [10]int
    for i := 0; i < 10; i++ {
        m[i] = <-gp1
    }
    wgs.Wait()
    fmt.Println(m)
}

func Printer2() {
    defer wg.Done()
    var m [10]int
    for i := 0; i < 10; i++ {
        m[i] = <-gp2
    }
    wgs.Wait()
    fmt.Println(m)
}

// Better to use this one:
// func Receiver(senderChannel <-chan int, printerChannel1 chan<- int, printerChannel2 chan<- int, wg *sync.WaitGroup) {
The two Senders generate 29 messages between them (20 from Sender(0) and 9 from Sender(11)). The Receiver forwards only the first 20 of these to gp1 or gp2, and Printer and Printer2 then unload them.
The trouble is that the way Receiver splits the messages depends on whether the number received is odd or even, and you aren't controlling for this: among the first 20 values received there is no guarantee of exactly 10 evens and 10 odds. If one of the Printers gets fewer than 10 items in its queue, it will hang.
That's one potential problem.
Your core problem is that everything in this program works by "dead reckoning": each goroutine expects to see a fixed number of messages, but that doesn't necessarily match up with reality. You should set up the channels so that they get closed once all of their data has been produced.
This probably means setting up an intermediate function to manage the sending:
func Sender(from, to int, c chan<- int) {
    for i := from; i < to; i++ {
        c <- i
    }
}

func SendEverything(c chan<- int) {
    var wg sync.WaitGroup
    wg.Add(2)
    go func() {
        defer wg.Done()
        Sender(0, 20, c)
    }()
    go func() {
        defer wg.Done()
        Sender(11, 20, c)
    }()
    wg.Wait()
    close(c)
}
Make the dispatcher function work on everything in the channel:
func Receive(c <-chan int, odds, evens chan<- int) {
    for n := range c {
        if n%2 == 0 {
            evens <- n
        } else {
            odds <- n
        }
    }
    close(odds)
    close(evens)
}
And then you can share a single print function:
func Printer(prefix string, c <-chan int) {
    for n := range c {
        fmt.Printf("%s: %d\n", prefix, n)
    }
}
Finally, you have a main function that stitches it all together:
func main() {
    var wg sync.WaitGroup

    inputs := make(chan int)
    odds := make(chan int)
    evens := make(chan int)

    wg.Add(4)
    go func() {
        defer wg.Done()
        SendEverything(inputs)
    }()
    go func() {
        defer wg.Done()
        Receive(inputs, odds, evens)
    }()
    go func() {
        defer wg.Done()
        Printer("odd number", odds)
    }()
    go func() {
        defer wg.Done()
        Printer("even number", evens)
    }()
    wg.Wait()
}
The complete example is at https://play.golang.org/p/qTUqlt-uaWH.
Note that I've completely refrained from using any global variables, and either things have a hopefully self-explanatory very short name (i and n are simple integers, c is a channel) or are complete words (odds, evens). I've tended to keep sync.WaitGroup objects local to where they're created. Since everything is passed as parameters, I don't need two copies of the same function to act on different global variables, and if I chose to write test code for this, I could create my own local channels.
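For example, a hedged sketch of what such a test might look like (the test name, values, and buffer sizes are my own; it assumes the Receive function above and an import of "testing"):
func TestReceive(t *testing.T) {
    in := make(chan int)
    odds := make(chan int, 4) // buffered so Receive never blocks in this small test
    evens := make(chan int, 4)

    go func() {
        for _, n := range []int{1, 2, 3, 4} {
            in <- n
        }
        close(in)
    }()

    Receive(in, odds, evens) // returns once in is closed; closes odds and evens

    var gotOdds, gotEvens []int
    for n := range odds {
        gotOdds = append(gotOdds, n)
    }
    for n := range evens {
        gotEvens = append(gotEvens, n)
    }
    if len(gotOdds) != 2 || len(gotEvens) != 2 {
        t.Errorf("got %v odds and %v evens, want 2 of each", gotOdds, gotEvens)
    }
}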
