I receive lines of URLs on stdin, like this:
$ echo -e 'https://golang.org\nhttps://godoc.org\nhttps://golang.org' | go run 1.go .
The task is to count the occurrences of the word "Go" on each web page. I'm not allowed to start more than 5 goroutines, and I can use only the standard library.
Here is my code:
package main
import (
"fmt"
"net/http"
"bufio"
"os"
"regexp"
"io/ioutil"
"time"
)
func worker(id int, jobs<-chan string, results chan<-int) {
t0 := time.Now()
for url := range jobs {
resp, err := http.Get(url)
if err != nil {
fmt.Println("problem while opening url", url)
results<-0
//continue
}
defer resp.Body.Close()
html, err := ioutil.ReadAll(resp.Body)
if err != nil {
continue
}
regExp:= regexp.MustCompile("Go")
matches := regExp.FindAllStringIndex(string(html), -1)
t1 := time.Now()
fmt.Println("Count for", url, ":", len(matches), "Elapsed time:",
t1.Sub(t0), "works id", id)
results<-len(matches)
}
}
func main(){
scanner := bufio.NewScanner(os.Stdin)
jobs := make(chan string, 100)
results := make(chan int, 100)
t0 := time.Now()
for w:= 0; w<5; w++{
go worker(w, jobs, results)
}
var tasks int = 0
res := 0
for scanner.Scan() {
jobs <- scanner.Text()
tasks ++
}
close(jobs)
for a := 1; a <= tasks; a++ {
res+=<-results
}
close(results)
t2 := time.Now()
fmt.Println("Total:",res, "Elapsed total time:", t2.Sub(t0) );
}
I thought it worked until I passed more than 5 URLs (one of them invalid) to stdin. The output was:
goroutine 9 [running]:
panic ...
Obviously, extra goroutines have been started. How can I fix this? Is there a more convenient way to limit the number of goroutines?
goroutine 9 [running]:
Some goroutines are started by the runtime, and others by the HTTP fetches.
Looking at your code, you only start 5 goroutines yourself.
If you really want to know how many goroutines are running, use runtime.NumGoroutine.
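Separately from the goroutine count, the worker as posted keeps going after a failed http.Get because the continue is commented out, so resp is nil when resp.Body is touched. That is one plausible cause of a panic with an invalid URL. Below is a minimal sketch of a 5-worker pool that sends exactly one result per job. The names fetchFunc, httpFetch and countAll are illustrative and not from the original post, and strings.Count stands in for the regexp since the pattern is a plain literal.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// fetchFunc abstracts the download so the pool can be tried without a network.
// (Hypothetical name, introduced for this sketch only.)
type fetchFunc func(url string) (string, error)

// httpFetch downloads url and returns its body as a string.
func httpFetch(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err // resp is nil here, so it must not be touched
	}
	defer resp.Body.Close() // runs once per call, not once per worker lifetime
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// worker sends exactly one result per job, even when the fetch fails,
// so the collector never waits for a result that will not arrive.
func worker(jobs <-chan string, results chan<- int, fetch fetchFunc) {
	for url := range jobs {
		body, err := fetch(url)
		if err != nil {
			fmt.Fprintln(os.Stderr, "problem while opening url", url)
			results <- 0
			continue
		}
		results <- strings.Count(body, "Go")
	}
}

// countAll starts maxWorkers workers and sums the per-URL counts.
func countAll(urls []string, maxWorkers int, fetch fetchFunc) int {
	jobs := make(chan string, len(urls))
	results := make(chan int, len(urls))
	for w := 0; w < maxWorkers; w++ {
		go worker(jobs, results, fetch)
	}
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	total := 0
	for range urls {
		total += <-results
	}
	return total
}

func main() {
	var urls []string
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		urls = append(urls, scanner.Text())
	}
	fmt.Println("Total:", countAll(urls, 5, httpFetch))
}
```

Only the five worker goroutines are started by this code. Any others that show up in a stack trace come from the runtime or from net/http.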
I'm working on a piece of code that uses goroutines to scan for an open port across a subnet.
Here is the code:
package main
import (
"fmt"
"log"
"net"
"time"
)
func is_445_open(ip string, ch chan string){
connexion, err := net.DialTimeout("tcp", ip + ":445", 3*time.Second )
if err != nil {
return
}
defer connexion.Close()
ch <- ip
}
func ip_suivante(ip net.IP) {
for j := len(ip) - 1; j >= 0; j-- {
ip[j]++
if ip[j] > 0 {
break
}
}
}
func main() {
ch := make(chan string)
ipv4Addr, ipv4Net, err := net.ParseCIDR("200.31.0.0/24")
if err != nil {
log.Fatal(err)
}
for ipv4Addr := ipv4Addr.Mask(ipv4Net.Mask); ipv4Net.Contains(ipv4Addr); ip_suivante(ipv4Addr) {
//fmt.Println(ipv4Addr.String())
go is_445_open(ipv4Addr.String(), ch)
}
for v := range ch{
fmt.Println(v)
}
fmt.Println("Done")
}
I use ParseCIDR to loop over every IP in the network range, then I connect to port 445 to test whether it is open.
To speed up the process, I want to use goroutines and write each IP where port 445 is open to a channel,
then loop over the channel to print them to the console.
When I use the loop to print every IP, the program never ends.
It prints the IPs correctly but never prints "Done" at the end.
I have tried closing the channel before looping, but then no data is printed and the program only shows "Done".
Do you have any tips?
EDIT:
I finally figured it out. You MUST put the code that manages the WaitGroup and closes the channel into another goroutine to make it work.
I cannot understand why.
Here is the code, modified accordingly:
package main
import (
"fmt"
"log"
"net"
"time"
"sync"
)
func is_445_open(ip string, ch chan string, wg *sync.WaitGroup){
connexion, err := net.DialTimeout("tcp", ip + ":445", 3*time.Second )
if err != nil {
wg.Done()
return
}
defer connexion.Close()
ch <- ip
wg.Done()
}
func ip_suivante(ip net.IP) {
for j := len(ip) - 1; j >= 0; j-- {
ip[j]++
if ip[j] > 0 {
break
}
}
}
func main() {
wg := &sync.WaitGroup{}
ch := make(chan string)
ipv4Addr, ipv4Net, err := net.ParseCIDR("192.168.1.0/24")
if err != nil {
log.Fatal(err)
}
for ipv4Addr := ipv4Addr.Mask(ipv4Net.Mask); ipv4Net.Contains(ipv4Addr); ip_suivante(ipv4Addr) {
//fmt.Println(ipv4Addr.String())
wg.Add(1)
go is_445_open(ipv4Addr.String(), ch, wg)
}
go func(cha chan string, wg *sync.WaitGroup){
wg.Wait()
close(cha)
}(ch,wg)
for v := range ch{
fmt.Println(v)
}
fmt.Println("Done")
}
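As for why the separate goroutine is needed: ch is unbuffered, so every ch <- ip blocks until main receives. If main called wg.Wait() itself before the range loop, it would not be receiving yet, the senders could never reach wg.Done(), and everything would block. Moving the wait-then-close into its own goroutine lets main drain the channel while the senders finish. A stripped-down sketch of the same shape, with integers in place of IPs and the hypothetical name collect:

```go
package main

import (
	"fmt"
	"sync"
)

// collect starts n senders on an unbuffered channel and returns every value received.
func collect(n int) []int {
	ch := make(chan int) // unbuffered: a send blocks until someone receives
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			ch <- v
		}(i)
	}
	// Calling wg.Wait() right here would deadlock: the senders are stuck on
	// ch <- v until the loop below runs, so the counter never reaches zero.
	go func() {
		wg.Wait() // all senders have delivered their value
		close(ch) // ends the range loop below
	}()
	var got []int
	for v := range ch {
		got = append(got, v)
	}
	return got
}

func main() {
	fmt.Println(len(collect(10))) // prints 10
}
```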
I'm trying to run a function concurrently. It makes a call to my DB that may take 2-10 seconds. I would like a routine to move on to the next job as soon as it has finished, even if the other one is still processing, but I only ever want a maximum of 2 to be processing at a time. I want this to happen indefinitely. I feel like I'm almost there, but the WaitGroup forces both routines to wait until completion before continuing to another iteration.
const ROUTINES = 2;
for {
var wg sync.WaitGroup
_, err:= db.Exec(`Random DB Call`)
if err != nil {
panic(err)
}
ch := createRoutines(db, &wg)
wg.Add(ROUTINES)
for i := 1; i <= ROUTINES; i++ {
ch <- i
time.Sleep(2 * time.Second)
}
close(ch)
wg.Wait()
}
func createRoutines(db *sqlx.DB, wg *sync.WaitGroup) chan int {
var ch = make(chan int, 5)
for i := 0; i < ROUTINES ; i++ {
go func(db *sqlx.DB) {
defer wg.Done()
for {
_, ok := <-ch
if !ok {
return
}
doStuff(db)
}
}(db)
}
return ch
}
If you need at most n goroutines running at the same time, you can use a buffered channel of size n to block the creation of new goroutines when there is no space left, something like this:
package main
import (
"fmt"
"math/rand"
"time"
)
func main() {
const ROUTINES = 2
rand.Seed(time.Now().UnixNano())
stopper := make(chan struct{}, ROUTINES)
var counter int
for {
counter++
stopper <- struct{}{}
go func(c int) {
fmt.Println("+ Starting goroutine", c)
time.Sleep(time.Duration(rand.Intn(3)) * time.Second)
fmt.Println("- Stopping goroutine", c)
<-stopper
}(counter)
}
}
In this example at most ROUTINES goroutines exist at any time, each living 0, 1 or 2 seconds. In the output you can also see that every time one goroutine ends, another one starts.
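To check the limit rather than read it off the output, here is a finite variant of the same buffered-channel idea that records the highest number of tasks seen running at once. The helper name runLimited and the peak tracking are additions for illustration only.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs total tasks with at most limit in flight and returns the
// highest number of tasks observed running simultaneously.
func runLimited(total, limit int, task func(int)) int32 {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var running, peak int32
	for i := 0; i < total; i++ {
		sem <- struct{}{} // blocks while limit tasks are already running
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this task ends
			cur := atomic.AddInt32(&running, 1)
			// Raise peak to cur unless another goroutine already recorded more.
			for {
				old := atomic.LoadInt32(&peak)
				if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
					break
				}
			}
			task(n)
			atomic.AddInt32(&running, -1)
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	peak := runLimited(6, 2, func(int) { time.Sleep(10 * time.Millisecond) })
	fmt.Println("peak concurrency:", peak)
}
```

For the indefinite case in the question, the outer loop would simply never end and the WaitGroup could be dropped.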
This adds an external dependency, but consider this implementation:
package main
import (
"context"
"database/sql"
"log"
"github.com/MicahParks/ctxerrpool"
)
func main() {
// Create a pool of 2 workers for database queries. Log any errors.
databasePool := ctxerrpool.New(2, func(_ ctxerrpool.Pool, err error) {
log.Printf("Failed to execute database query.\nError: %s", err.Error())
})
// Get a list of queries to execute.
queries := []string{
"SELECT first_name, last_name FROM customers",
"SELECT price FROM inventory WHERE sku='1234'",
"other queries...",
}
// TODO Make a database connection.
var db *sql.DB
for _, query := range queries {
// Intentionally shadow the looped variable for scope.
query := query
// Perform the query on a worker. If no worker is ready, it will block until one is.
databasePool.AddWorkItem(context.TODO(), func(workCtx context.Context) (err error) {
_, err = db.ExecContext(workCtx, query)
return err
})
}
// Wait for all workers to finish.
databasePool.Wait()
}
I want to have a limited number of goroutines that perform some computation (func worker() performs the computation and places the result in a channel). I also have another channel that holds "jobs" for my workers. As a result, I can see that all jobs are computed correctly, but after the computation the execution gets stuck.
package main
import (
"bufio"
"fmt"
"os"
"net/http"
"io/ioutil"
"strings"
"time"
)
func worker(id int, urls <- chan string, results chan<- int) {
var data string
for url := range urls {
fmt.Println("worker", id, "started job", url)
if (strings.HasPrefix(url, "http") || strings.HasPrefix(url, "https")) {
resp, err := http.Get(url)
if err != nil {
fmt.Println(err)
}
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
fmt.Println(err)
}
data = string(body)
} else {
body, err := ioutil.ReadFile(url)
if err != nil {
fmt.Println(err)
}
data = string(body)
}
number := strings.Count(data, "Go")
fmt.Println("worker", id, "finished job", url, "Number of Go is", number)
results <- number
}
return
}
func main() {
final_result := 0
maxNbConcurrentGoroutines := 5
numJobs := 0
urls := make(chan string)
results := make(chan int)
scanner := bufio.NewScanner(os.Stdin)
start := time.Now()
for w := 1; w <= maxNbConcurrentGoroutines; w++ {
go worker(w, urls, results)
}
for scanner.Scan() {
url := (scanner.Text())
urls <- url
numJobs += 1
}
close(urls)
for num := range results {
final_result += num
}
t := time.Now()
elapsed := t.Sub(start)
for i := 1; i <= numJobs; i++ {
one_result := <- results
final_result += one_result
}
fmt.Println("Number = ", final_result)
fmt.Println("Time = ", elapsed)
if err := scanner.Err(); err != nil {
fmt.Fprintln(os.Stderr, "error:", err)
os.Exit(1)
}
}
I tried to follow https://gobyexample.com/worker-pools to extract all the values from the results channel, but did not succeed. What should I do to get it unstuck so that it continues? Here is an example of how to run it:
echo -e 'https://golang.org\n/etc/passwd\nhttps://golang.org\nhttps://golang.org' | go run 1.go
Your program doesn't return because the range loop waits for the results channel to be closed, and nothing ever closes it.
In https://gobyexample.com/worker-pools the loop that collects the results is different:
for a := 1; a <= numJobs; a++ {
    <-results
}
If you want to use for num := range results, you need to call close(results) and work out when it is safe to call it.
You can see another example, using a WaitGroup, at https://gobyexample.com/waitgroups
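One way to decide when to close results is to close it once every worker has returned, from a goroutine of its own. The sketch below also feeds the jobs from a separate goroutine, because with unbuffered channels main cannot hand out more jobs than there are workers before it starts reading results. The function name sumCounts is hypothetical, and in-memory strings stand in for the downloaded pages.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// sumCounts counts "Go" in each input using at most maxWorkers goroutines.
func sumCounts(inputs []string, maxWorkers int) int {
	jobs := make(chan string)
	results := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < maxWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range jobs {
				results <- strings.Count(s, "Go")
			}
		}()
	}
	// Feed jobs in the background so main is free to drain results.
	go func() {
		for _, s := range inputs {
			jobs <- s
		}
		close(jobs) // lets the workers' range loops end
	}()
	// Close results only after every worker has returned.
	go func() {
		wg.Wait()
		close(results)
	}()
	total := 0
	for n := range results { // terminates because results gets closed
		total += n
	}
	return total
}

func main() {
	fmt.Println("Number =", sumCounts([]string{"Go", "Gopher", "none"}, 5))
}
```

With this shape the second counting loop over numJobs in the posted main is no longer needed; the range loop alone collects every result.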
I have written the following code, which should run until someone exits the program manually.
What it does:
----- checks every 1 second whether the file exists
----- if it is available, reads the file and prints its content line by line
For this, I first call a function from main,
and then I create a WaitGroup and call another function from there to do the aforementioned tasks.
Please check whether I have written the source code correctly, as I'm a newbie to Go.
Also, this only runs once and then stops... I want it to stay alive and keep checking whether the file exists.
Please help me.
package main
import (
"encoding/csv"
"fmt"
"io"
"log"
"os"
"sync"
"time"
)
func main() {
mainfunction()
}
//------------------------------------------------------------------
func mainfunction() {
var wg sync.WaitGroup
wg.Add(1)
go filecheck(&wg)
wg.Wait()
fmt.Printf("Program finished \n")
}
func filecheck(wg *sync.WaitGroup) {
for range time.Tick(time.Second * 1) {
fmt.Println("Foo")
var wgi sync.WaitGroup
wgi.Add(1)
oldName := "test.csv"
newName := "testi.csv"
if _, err := os.Stat(oldName); os.IsNotExist(err) {
fmt.Printf("Path does not exsist \n")
} else {
os.Rename(oldName, newName)
if err != nil {
log.Fatal(err)
}
looping(newName, &wgi)
}
fmt.Printf("Test complete \n")
wgi.Wait()
wg.Done()
time.Sleep(time.Second * 5)
}
}
func looping(newName string, wgi *sync.WaitGroup) {
file, _ := os.Open(newName)
r := csv.NewReader(file)
for {
record, err := r.Read()
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
var Date = record[0]
var Agent = record[1]
var Srcip = record[2]
var Level = record[3]
fmt.Printf("Data: %s Agent: %s Srcip: %s Level: %s\n", Date, Agent, Srcip, Level)
}
fmt.Printf("Test complete 2 \n")
wgi.Done()
fmt.Printf("for ended")
}
The short answer is that you have this in the loop:
wg.Done()
which lets the main goroutine proceed to exit as soon as the first iteration of the loop completes.
The longer answer is that you're not using wait groups correctly here, IMHO. For example, there's absolutely no point in passing a WaitGroup into looping.
It's not clear what your code is trying to accomplish. You certainly don't need any goroutines to perform the task you've specified; it can all be done with no concurrency, and thus with simpler code.
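A sketch of what the goroutine-free version could look like, keeping the file names from the question. The helpers processIfPresent and watch are hypothetical names, and the rounds parameter exists only so the demo terminates; pass 0 to poll until the process is killed.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"os"
	"time"
)

// processIfPresent renames oldName to newName and prints its CSV records.
// It reports false, with no error, when oldName does not exist.
func processIfPresent(oldName, newName string) (bool, error) {
	if _, err := os.Stat(oldName); os.IsNotExist(err) {
		return false, nil
	} else if err != nil {
		return false, err
	}
	if err := os.Rename(oldName, newName); err != nil {
		return false, err
	}
	file, err := os.Open(newName)
	if err != nil {
		return false, err
	}
	defer file.Close()
	r := csv.NewReader(file)
	for {
		record, err := r.Read()
		if err == io.EOF {
			return true, nil
		}
		if err != nil {
			return false, err
		}
		if len(record) < 4 {
			continue // skip rows that lack the four expected columns
		}
		fmt.Printf("Date: %s Agent: %s Srcip: %s Level: %s\n",
			record[0], record[1], record[2], record[3])
	}
}

// watch polls once per interval; rounds <= 0 means poll forever.
func watch(oldName, newName string, interval time.Duration, rounds int) {
	for i := 0; rounds <= 0 || i < rounds; i++ {
		found, err := processIfPresent(oldName, newName)
		switch {
		case err != nil:
			log.Println("error:", err) // log and keep polling instead of exiting
		case !found:
			fmt.Println("Path does not exist")
		}
		time.Sleep(interval)
	}
}

func main() {
	watch("test.csv", "testi.csv", time.Second, 3)
}
```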
This is my first time with Go, and I'm trying to get goroutines and WaitGroups working.
I have a CSV file with 100 rows of data. (101 including header)
I have the following simple code:
package main
import (
"bufio"
"fmt"
"io"
"os"
"sync"
"time"
)
func main() {
start := time.Now()
numRows := 0
waitGroup := sync.WaitGroup{}
file, _ := os.Open("./data.csv")
scanner := bufio.NewScanner(file)
scanner.Scan() // to read the header
for scanner.Scan() {
err := scanner.Err()
if err != nil && err != io.EOF {
panic(err)
}
waitGroup.Add(1)
go (func() {
numRows++
waitGroup.Done()
})()
}
waitGroup.Wait()
file.Close()
fmt.Println("Finished parsing ", numRows)
fmt.Println("Elapsed time in seconds: ", time.Now().Sub(start))
}
When I run this, the numRows output fluctuates between 94 and 100 each time. I'm expecting it to be 100 each time. If I run the same code on a CSV with 10 rows of data, it outputs 10 each and every time.
It seems to me that the final few goroutines aren't finishing in time.
I've tried the following, all of which failed:
using a CsvReader instead of a Scanner
moving waitGroup.Add(1) to underneath the anonymous func
moving the anonymous func out into a package-level func (and passing things around using pointers)
What am I missing?
It's not safe to modify a single variable simultaneously in different goroutines. Some of your updates to numRows will be lost, and occasionally your program may crash.
Either protect your numRows variable with a mutex, or use one of the atomic functions to do your addition atomically:
var numRows int32
// ...
go (func() {
    atomic.AddInt32(&numRows, 1)
    waitGroup.Done()
})()
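For comparison, the mutex option mentioned above could look like the sketch below. The function countRows and its loop over n are stand-ins for the scanner loop in the question.

```go
package main

import (
	"fmt"
	"sync"
)

// countRows increments a shared counter from n goroutines under a mutex.
func countRows(n int) int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		numRows int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock() // only one goroutine at a time may touch numRows
			numRows++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return numRows
}

func main() {
	fmt.Println("Finished parsing", countRows(100)) // prints: Finished parsing 100
}
```

Running the original program with go run -race should flag the unsynchronized numRows++ as a data race.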
Consider what this code actually does:
for scanner.Scan() {
    err := scanner.Err()
    if err != nil && err != io.EOF {
        panic(err)
    }
    waitGroup.Add(1)
    go (func() {
        numRows++
        waitGroup.Done()
    })()
}
In reality, all the work is done in the main goroutine, and only the numRows increment runs in separate goroutines. I think this could be simplified to a plain increment:
for scanner.Scan() {
    err := scanner.Err()
    if err != nil && err != io.EOF {
        panic(err)
    }
    numRows++
}
If you want to simulate parallel parsing and pipelining, you can use channels. Make a single goroutine responsible for incrementing the counter; whenever another goroutine wants to increment the counter, it sends a message on that channel.
https://play.golang.org/p/W60twJjY8P
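A minimal sketch of that single-owner idea follows. It is written independently of the linked playground snippet, and the names countViaChannel, inc and done are illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// countViaChannel lets exactly one goroutine own the counter; the n other
// goroutines request an increment by sending on inc.
func countViaChannel(n int) int {
	inc := make(chan struct{})
	done := make(chan int)
	go func() {
		count := 0
		for range inc { // the only place the counter is modified
			count++
		}
		done <- count // inc was closed: report the final value
	}()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			inc <- struct{}{}
		}()
	}
	wg.Wait()  // every increment request has been received
	close(inc) // tell the owner that no more requests will come
	return <-done
}

func main() {
	fmt.Println("Finished parsing", countViaChannel(100))
}
```

Because only the owning goroutine reads or writes count, no mutex or atomic operation is needed.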