bounded parallelism to hash strings [duplicate] - go

This question already has answers here:
How would you define a pool of goroutines to be executed at once?
(5 answers)
Closed 4 years ago.
Trying to solve the problem of reading phone numbers from a file (line by line) and hash them concurrently (md5) in go with a maximum of 100 concurrent process maximum, creating a fixed number of goroutines. tried using go pipeline and bounded parallelism but did not work. any guidance, please?
this is what I tried, https://play.golang.org/p/vp7s512l8w4

package main
import (
"bufio"
"fmt"
"os"
"sync"
)
func hashPhoneNumers(phoneNO string, ch chan struct{}, group *sync.WaitGroup) {
do_hash(phoneNO)
<-ch
group.Done()
}
func do_hash(s string) {
fmt.Println("do hash: ", s)
}
func main() {
c := make(chan struct{}, 100)
file, err := os.Open("file.txt")
if err != nil {
fmt.Println(err)
os.Exit(-1)
}
defer file.Close()
line := bufio.NewScanner(file)
wg := sync.WaitGroup{}
for line.Scan() {
// with a maximum of 100 concurrent process maximum
c <- struct{}{}
wg.Add(1)
go hashPhoneNumers(line.Text(), c, &wg)
}
// wait all goroutines done
wg.Wait()
close(c)
}
this works fine

Related

Goroutine is blocked in execution [duplicate]

This question already has answers here:
Deadlock after attempting to print values of channel using 'range'
(3 answers)
Deadlock in the program when using for range on buffered channel
(1 answer)
Waiting for a WaitGroup and ranging over a channel results in a "fatal error: all goroutines are asleep - deadlock!"
(2 answers)
go routine for range over channels
(5 answers)
golang concurrency sync issue
(1 answer)
Closed 26 days ago.
I try to work on a piece of code with goroutine to scan open port on a subnet.
Here is the code :
package main
import (
"fmt"
"log"
"net"
"time"
)
func is_445_open(ip string, ch chan string){
connexion, err := net.DialTimeout("tcp", ip + ":445", 3*time.Second )
if err != nil {
return
}
defer connexion.Close()
ch <- ip
}
func ip_suivante(ip net.IP) {
for j := len(ip) - 1; j >= 0; j-- {
ip[j]++
if ip[j] > 0 {
break
}
}
}
func main() {
ch := make(chan string)
ipv4Addr, ipv4Net, err := net.ParseCIDR("200.31.0.0/24")
if err != nil {
log.Fatal(err)
}
for ipv4Addr := ipv4Addr.Mask(ipv4Net.Mask); ipv4Net.Contains(ipv4Addr); ip_suivante(ipv4Addr) {
//fmt.Println(ipv4Addr.String())
go is_445_open(ipv4Addr.String(), ch)
}
for v := range ch{
fmt.Println(v)
}
fmt.Println("Done")
}
I use ParseCIDR to loop over every IP of the network range then i connect to the port 445 to test if it is open.
To speed up the process, i want to use goroutine and write the ip where 445 is open to a channel.
Then loop over it to print them on the console.
When i use the loop to print every IP on my console the program never end.
It print the ip correctly but never the "done" at the end.
I have tried to close the channel before looping but when i do, no more data is printed and the program only show "done".
Do you have any tips ?
EDIT :
Finally figured out. You MUST put the code which manage the waitgroup and the channel closing into another goroutine to make it work.
Cannot undestand why.
But here is the code modified accordingly :
package main
import (
"fmt"
"log"
"net"
"time"
"sync"
)
func is_445_open(ip string, ch chan string, wg *sync.WaitGroup){
connexion, err := net.DialTimeout("tcp", ip + ":445", 3*time.Second )
if err != nil {
wg.Done()
return
}
defer connexion.Close()
ch <- ip
wg.Done()
}
func ip_suivante(ip net.IP) {
for j := len(ip) - 1; j >= 0; j-- {
ip[j]++
if ip[j] > 0 {
break
}
}
}
func main() {
wg := &sync.WaitGroup{}
ch := make(chan string)
ipv4Addr, ipv4Net, err := net.ParseCIDR("192.168.1.0/24")
if err != nil {
log.Fatal(err)
}
for ipv4Addr := ipv4Addr.Mask(ipv4Net.Mask); ipv4Net.Contains(ipv4Addr); ip_suivante(ipv4Addr) {
//fmt.Println(ipv4Addr.String())
wg.Add(1)
go is_445_open(ipv4Addr.String(), ch, wg)
}
go func(cha chan string, wg *sync.WaitGroup){
wg.Wait()
close(cha)
}(ch,wg)
for v := range ch{
fmt.Println(v)
}
fmt.Println("Done")
}

How to return multiple values from goroutine?

I found some example with "Catch values from Goroutines"-> link
There is show how to fetch value but if I want to return value from several goroutines, it wont work.So, does anybody know, how to do it?
package main
import (
"fmt"
"io/ioutil"
"log"
"net/http"
"sync"
)
// WaitGroup is used to wait for the program to finish goroutines.
var wg sync.WaitGroup
func responseSize(url string, nums chan int) {
// Schedule the call to WaitGroup's Done to tell goroutine is completed.
defer wg.Done()
response, err := http.Get(url)
if err != nil {
log.Fatal(err)
}
defer response.Body.Close()
body, err := ioutil.ReadAll(response.Body)
if err != nil {
log.Fatal(err)
}
// Send value to the unbuffered channel
nums <- len(body)
}
func main() {
nums := make(chan int) // Declare a unbuffered channel
wg.Add(1)
go responseSize("https://www.golangprograms.com", nums)
go responseSize("https://gobyexample.com/worker-pools", nums)
go responseSize("https://stackoverflow.com/questions/ask", nums)
fmt.Println(<-nums) // Read the value from unbuffered channel
wg.Wait()
close(nums) // Closes the channel
// >> loading forever
Also, this example, worker pools
Is it possible to get value from result: fmt.Println(<-results) <- will be error.
Yes, just read from the channel multiple times:
answerOne := <-nums
answerTwo := <-nums
answerThree := <-nums
Channels function like thread-safe queues, allowing you to queue up values and read them out one by one
P.S. You should either add 3 to the wait group or not have one at all. The <-nums will block until a value is available on nums so it is not necessary

Listen to array of go channels simultaneously [duplicate]

This question already has answers here:
how to listen to N channels? (dynamic select statement)
(7 answers)
Closed 3 years ago.
Suppose I have a slice of go receiving channels. Is there a way I can listen to all of them at once? For example:
channels := make([]<-chan int, 0, N)
// fill the slice with channels
for _, channel := range channels {
<-channel
}
Is the closest I can get to doing that. However, this implementation is dependent on the order of the elements of the slice.
For clarity, I don't need to know the values of the go channel. I just need to know they all finished.
You can use sync package to create a waitgroup.
Start a goroutine for each channel after adding to waitGroup.
Finally when channel is done it decrements waitGroup. The caller just waits on waitgroup. Here is the playground link.
package main
import (
"fmt"
"sync"
"time"
)
func main() {
wg := sync.WaitGroup{}
numChans := 10
chans := make([]chan interface{}, numChans)
for i:=0;i<numChans;i++ {
chans[i] = make(chan interface{})
wg.Add(1)
go func(c chan interface{}) {
defer wg.Done()
<-c
}(chans[i])
}
//simulate closing channels
go func(){
time.Sleep(time.Second)
fmt.Println("closing")
for _, ch := range chans{
close(ch)
}
}()
fmt.Println("waiting")
wg.Wait()
}

I want to split a file into equally sized "chunks", or slices and use goroutines to process them simultaneously

Using Go, I have large log files. Currently I open them, create a new scanner bufio.NewScanner, and then for scanner.Scan() to loop through the lines. Each line is sent through a processing function, which matches it to regular expressions and extracts data. I would like to process this file in chunks simultaneously using goroutines. I believe this may be quicker than looping through the whole file sequentially.
It can take a few seconds per file, and I'm wondering if I can process a single file in, say, 10 pieces at a time. I believe I can sacrifice the memory if needed. I have ~3gb, and the biggest log file is maybe 75mb.
I see that a scanner has a .Split() method, where you can provide a custom split function, but I wasn't able to find a good solution using this method.
I've also tried creating a slice of slices, looping through the scanner with scanner.Scan() and appending scanner.Text() to each slice.
eg:
// pseudocode because I couldn't get this to work either
scanner := bufio.NewScanner(logInfo)
threads := [[], [], [], [], []]
i := 0
for scanner.Scan() {
i = i + 1
if i > 5 {
i = 0
}
threads[i] = append(threads[i], scanner.Text())
}
fmt.Println(threads)
I'm new to Go and concerned about efficiency and performance. I want to learn how to write good Go code! Any help or advice is really appreciated.
Peter gives a good starting point, if you wanted to do something like a fan-out, fan-in pattern you could do something like:
package main
import (
"bufio"
"fmt"
"log"
"os"
"sync"
)
func main() {
file, err := os.Open("/path/to/file.txt")
if err != nil {
log.Fatal(err)
}
defer file.Close()
lines := make(chan string)
// start four workers to do the heavy lifting
wc1 := startWorker(lines)
wc2 := startWorker(lines)
wc3 := startWorker(lines)
wc4 := startWorker(lines)
scanner := bufio.NewScanner(file)
go func() {
defer close(lines)
for scanner.Scan() {
lines <- scanner.Text()
}
if err := scanner.Err(); err != nil {
log.Fatal(err)
}
}()
merged := merge(wc1, wc2, wc3, wc4)
for line := range merged {
fmt.Println(line)
}
}
func startWorker(lines <-chan string) <-chan string {
finished := make(chan string)
go func() {
defer close(finished)
for line := range lines {
// Do your heavy work here
finished <- line
}
}()
return finished
}
func merge(cs ...<-chan string) <-chan string {
var wg sync.WaitGroup
out := make(chan string)
// Start an output goroutine for each input channel in cs. output
// copies values from c to out until c is closed, then calls wg.Done.
output := func(c <-chan string) {
for n := range c {
out <- n
}
wg.Done()
}
wg.Add(len(cs))
for _, c := range cs {
go output(c)
}
// Start a goroutine to close out once all the output goroutines are
// done. This must start after the wg.Add call.
go func() {
wg.Wait()
close(out)
}()
return out
}
If it is acceptable that line N+1 is processed before line N, you can use a simple fan-out pattern to get started. The Go blog explains this and more advanced patterns, such as cancelation and fan-in.
Note that this is just a starting point to keep it simple and on point. You would almost certainly want to wait for the process functions to return before exiting, for instance. This is explained in the mentioned blog post.
package main
import "bufio"
func main() {
var sc *bufio.Scanner
lines := make(chan string)
go process(lines)
go process(lines)
go process(lines)
go process(lines)
for sc.Scan() {
lines <- sc.Text()
}
close(lines)
}
func process(lines <-chan string) {
for line := range lines {
// implement processing here
}
}

Scanf in multiple goroutines giving unexpected results

I was simply experimenting in golang. I came across an interesting result. This is my code.
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup
var str1, str2 string
wg.Add(2)
go func() {
fmt.Scanf("%s", &str1)
wg.Done()
}()
go func() {
fmt.Scanf("%s", &str2)
wg.Done()
}()
wg.Wait()
fmt.Printf("%s %s\n", str1, str2)
}
I gave the following input.
beat
it
I was expecting the result to be either
it beat
or
beat it
But I got.
eat bit
Can any one please help me figure out why it is so?
fmt.Scanf isn't an atomic operation. Here's the implementation : http://golang.org/src/pkg/fmt/scan.go#L1115
There's no semaphor, nothing preventing two parallel executions. So what happens is simply that the executions are really parallel, and as there's no buffering, any byte reading is an IO operation and thus a perfect time for the go scheduler to change goroutine.
The problem is that you are sharing a single resource (the stdin byte stream) across multiple goroutines.
Each goroutine could be spawn at different non-deterministic times. i.e:
first goroutine 1 read all stdin, then start goroutine 2
first goroutine 2 read all stdin, then start goroutine 1
first goroutine 1 block on read, then start goroutine 2 read one char and then restart goroutine 1
... and so on and on ...
In most cases is enough to use only one goroutine to access a linear resource as a byte stream and attach a channel to it and then spawn multiple consumers that listen to that channel.
For example:
package main
import (
"fmt"
"io"
"sync"
)
func main() {
var wg sync.WaitGroup
words := make(chan string, 10)
wg.Add(1)
go func() {
for {
var buff string
_, err := fmt.Scanf("%s", &buff)
if err != nil {
if err != io.EOF {
fmt.Println("Error: ", err)
}
break
}
words <- buff
}
close(words)
wg.Done()
}()
// Multiple consumers
for i := 0; i < 5; i += 1 {
go func() {
for word := range words {
fmt.Printf("%s\n", word)
}
}()
}
wg.Wait()
}

Resources