Reading a file line-by-line with concurrency - go

What I Want To Do
In GetLine, I am trying to parse a file line-by-line using bufio.Scanner and a naive attempt at concurrency.
After fetching the text of each line, I send it to the caller (the main function) over a channel of string. Along with the values, I also send errors and a completion flag (via the done channel). This way, the goroutine can fetch the next line while the caller processes the current one.
What I Have Actually Done
var READCOMPLETE = errors.New("Completed Reading")

func main() {
    filename := flag.String("filename", "", "The file to parse")
    flag.Parse()
    if *filename == "" {
        log.Fatal("Provide a file to parse")
    }
    fmt.Println("Getting file")
    names := make(chan string)
    readerr := make(chan error)
    done := make(chan bool)
    go GetLine(*filename, names, readerr, done)
    for {
        select {
        case name := <-names:
            // Process each line
            fmt.Println(name)
        case err := <-readerr:
            log.Fatal(err)
        case <-done:
            // close(names)
            // close(readerr)
            break
        }
    }
    fmt.Println("Processing Complete")
}

func GetLine(filename string, names chan string, readerr chan error, done chan bool) {
    file, err := os.Open(filename)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        names <- scanner.Text()
        //fmt.Println(scanner.Text())
    }
    if err := scanner.Err(); err != nil {
        readerr <- err
    }
    done <- true
}
What I Get on Running
Runtime Error: fatal error: all goroutines are asleep - deadlock!
What Have I Tried to Fix?
After reading this answer about the error message, I tried closing the channels names and readerr in the last clause of the select statement, as shown in the comments. However, the program still crashes with a log message. I am unable to fix it further and would appreciate any help.
Resources for learning are welcome.
P.S.: I am relatively new to Go and still learning how to work with the CSP model of concurrency. In fact, this is my first attempt at writing a concurrent program.

The break statement in a select breaks out of the select. The application must break out of the for loop when done. Use a label to break out of the for loop:
loop:
for {
    select {
    case name := <-names:
        // Process each line
        fmt.Println(name)
    case err := <-readerr:
        log.Fatal(err)
    case <-done:
        // close(names)
        // close(readerr)
        break loop
    }
}
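(A plain return would also exit the loop, but it would skip the final fmt.Println("Processing Complete"); the labeled break leaves the loop while keeping the code after it reachable.)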
The code can be simplified by eliminating the done channel.
func main() {
    filename := flag.String("filename", "", "The file to parse")
    flag.Parse()
    if *filename == "" {
        log.Fatal("Provide a file to parse")
    }
    fmt.Println("Getting file")
    names := make(chan string)
    readerr := make(chan error)
    go GetLine(*filename, names, readerr)
loop:
    for {
        select {
        case name := <-names:
            // Process each line
            fmt.Println(name)
        case err := <-readerr:
            if err != nil {
                log.Fatal(err)
            }
            break loop
        }
    }
    fmt.Println("Processing Complete")
}
func GetLine(filename string, names chan string, readerr chan error) {
    file, err := os.Open(filename)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        names <- scanner.Text()
    }
    readerr <- scanner.Err()
}
In this specific example, the code can be restructured to separate receiving names from receiving the error.
func main() {
    filename := flag.String("filename", "", "The file to parse")
    flag.Parse()
    if *filename == "" {
        log.Fatal("Provide a file to parse")
    }
    fmt.Println("Getting file")
    names := make(chan string)
    readerr := make(chan error)
    go GetLine(*filename, names, readerr)
    for name := range names {
        fmt.Println(name)
    }
    if err := <-readerr; err != nil {
        log.Fatal(err)
    }
    fmt.Println("Processing Complete")
}

func GetLine(filename string, names chan string, readerr chan error) {
    file, err := os.Open(filename)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        names <- scanner.Text()
    }
    close(names) // close causes range on channel to break out of loop
    readerr <- scanner.Err()
}

Related

fastest way to extract tar files inside a tar file using Go

I have a tar file that contains multiple tar files in it. I'm currently extracting these tars recursively using the tar Reader, moving over the files manually. This process is very heavy and slow, especially with large tar files that contain thousands of files and directories.
I didn't find any good package that can do this recursive extraction fast. I also tried the command tar -xf file.tar --same-owner for the inner tars, but hit a permissions issue (which happens only on macOS).
My question is:
Is there a way to parallelize the manual extraction process so that the inner tars are extracted in parallel?
I have a method for the extraction task which I'm trying to make parallel:
var wg sync.WaitGroup
wg.Add(len(tarFiles))
for {
    header, err := tarBallReader.Next()
    if err != nil {
        break
    }
    go extractFileAsync(parentFolder, header, tarBallReader, depth, &wg)
}
wg.Wait()
After adding the goroutines, the files get corrupted and the process gets stuck in an endless loop.
Example of the main tar's content:
1d2755f3375860aaaf2b5f0474692df2e0d4329569c1e8187595bf4b3bf3f3b9/
1d2755f3375860aaaf2b5f0474692df2e0d4329569c1e8187595bf4b3bf3f3b9/VERSION
1d2755f3375860aaaf2b5f0474692df2e0d4329569c1e8187595bf4b3bf3f3b9/json
1d2755f3375860aaaf2b5f0474692df2e0d4329569c1e8187595bf4b3bf3f3b9/layer.tar
348188998f2a69b4ac0ca96b42990292eef67c0abfa05412e2fb7857645f4280/
348188998f2a69b4ac0ca96b42990292eef67c0abfa05412e2fb7857645f4280/VERSION
348188998f2a69b4ac0ca96b42990292eef67c0abfa05412e2fb7857645f4280/json
348188998f2a69b4ac0ca96b42990292eef67c0abfa05412e2fb7857645f4280/layer.tar
54c027bf04447fdb035ddc13a6ae5493a3f997bdd3577607b0980954522efb9e.json
9dd3c29af50daaf86744a8ade86ecf12f6a5a6ffc27a5a7398628e4a21770ee3/
9dd3c29af50daaf86744a8ade86ecf12f6a5a6ffc27a5a7398628e4a21770ee3/VERSION
9dd3c29af50daaf86744a8ade86ecf12f6a5a6ffc27a5a7398628e4a21770ee3/json
9dd3c29af50daaf86744a8ade86ecf12f6a5a6ffc27a5a7398628e4a21770ee3/layer.tar
b6c49400b643245cdbe17b7a7eb14f0f7def5a93326b99560241715c1e95502e/
b6c49400b643245cdbe17b7a7eb14f0f7def5a93326b99560241715c1e95502e/VERSION
b6c49400b643245cdbe17b7a7eb14f0f7def5a93326b99560241715c1e95502e/json
b6c49400b643245cdbe17b7a7eb14f0f7def5a93326b99560241715c1e95502e/layer.tar
c662ec0dc487910e7b76b2a4d67ab1a9ca63ce1784f636c2637b41d6c7ac5a1e/
c662ec0dc487910e7b76b2a4d67ab1a9ca63ce1784f636c2637b41d6c7ac5a1e/VERSION
c662ec0dc487910e7b76b2a4d67ab1a9ca63ce1784f636c2637b41d6c7ac5a1e/json
c662ec0dc487910e7b76b2a4d67ab1a9ca63ce1784f636c2637b41d6c7ac5a1e/layer.tar
da87454b77f6ac7fab1f465c10a07a1eb4b46df8058d98892794618cac8eacdc/
da87454b77f6ac7fab1f465c10a07a1eb4b46df8058d98892794618cac8eacdc/VERSION
da87454b77f6ac7fab1f465c10a07a1eb4b46df8058d98892794618cac8eacdc/json
da87454b77f6ac7fab1f465c10a07a1eb4b46df8058d98892794618cac8eacdc/layer.tar
ea1c2adfdc777d8746e50ad3e679789893a991606739c9bc7e01f273fa0b6e12/
ea1c2adfdc777d8746e50ad3e679789893a991606739c9bc7e01f273fa0b6e12/VERSION
ea1c2adfdc777d8746e50ad3e679789893a991606739c9bc7e01f273fa0b6e12/json
ea1c2adfdc777d8746e50ad3e679789893a991606739c9bc7e01f273fa0b6e12/layer.tar
f3b6608e814053048d79e519be79f654a2e9364dfdc8fb87b71e2fc57bbff115/
f3b6608e814053048d79e519be79f654a2e9364dfdc8fb87b71e2fc57bbff115/VERSION
f3b6608e814053048d79e519be79f654a2e9364dfdc8fb87b71e2fc57bbff115/json
f3b6608e814053048d79e519be79f654a2e9364dfdc8fb87b71e2fc57bbff115/layer.tar
manifest.json
repositories
Or simply run docker save <image>:<tag> -o image.tar and check the content of the tar.
Your code probably hangs on wg.Wait() because the number of wg.Done() calls during execution is not equal to len(tarFiles).
This should work:
var wg sync.WaitGroup
// wg.Add(len(tarFiles))
for {
    header, err := tarBallReader.Next()
    if err != nil {
        break
    }
    wg.Add(1)
    go extractFileAsync(parentFolder, header, tarBallReader, depth, &wg)
}
wg.Wait()

func extractFileAsync(..., wg *sync.WaitGroup) {
    defer wg.Done()
    // some code
}
UPD: corrected a possible race condition: wg.Add(1) must happen before the goroutine is started, otherwise wg.Wait() can observe the counter before it is incremented. Thanks @craigb.
Here is my solution to a similar problem (simplified):
package main

import (
    "archive/tar"
    "fmt"
    "io"
    "os"
    "path/filepath"
    "strings"
    "sync"
)

type Semaphore struct {
    Wg sync.WaitGroup
    Ch chan int
}

// Limit on the number of simultaneously running goroutines.
// Depends on the number of processor cores, storage performance, amount of RAM, etc.
const grMax = 10

const tarFileName = "docker_image.tar"
const dstDir = "output/docker"

func extractTar(tarFileName string, dstDir string) error {
    f, err := os.Open(tarFileName)
    if err != nil {
        return err
    }
    defer f.Close()
    sem := Semaphore{}
    sem.Ch = make(chan int, grMax)
    if err := Untar(dstDir, f, &sem, true); err != nil {
        return err
    }
    fmt.Println("extractTar: wait for complete")
    sem.Wg.Wait()
    return nil
}

func Untar(dst string, r io.Reader, sem *Semaphore, godeep bool) error {
    tr := tar.NewReader(r)
    for {
        header, err := tr.Next()
        switch {
        case err == io.EOF:
            return nil
        case err != nil:
            return err
        }
        // the target location where the dir/file should be created
        target := filepath.Join(dst, header.Name)
        switch header.Typeflag {
        // if it's a dir and it doesn't exist, create it
        case tar.TypeDir:
            if _, err := os.Stat(target); err != nil {
                if err := os.MkdirAll(target, 0755); err != nil {
                    return err
                }
            }
        // if it's a file, create it
        case tar.TypeReg:
            if err := saveFile(tr, target, os.FileMode(header.Mode)); err != nil {
                return err
            }
            ext := filepath.Ext(target)
            // if it's a tar file and we are on the top level, extract it
            if ext == ".tar" && godeep {
                sem.Wg.Add(1)
                // A buffered channel is used to limit the number of simultaneously running goroutines
                sem.Ch <- 1
                // the file is unpacked to a directory with the file name (without extension)
                newDir := filepath.Join(dst, strings.TrimSuffix(header.Name, ".tar"))
                if err := os.Mkdir(newDir, 0755); err != nil {
                    return err
                }
                go func(target string, newDir string, sem *Semaphore) {
                    fmt.Println("start goroutine, chan length:", len(sem.Ch))
                    fmt.Println("START:", target)
                    defer sem.Wg.Done()
                    defer func() { <-sem.Ch }()
                    // open the inner tar file
                    ft, err := os.Open(target)
                    if err != nil {
                        fmt.Println(err)
                        return
                    }
                    defer ft.Close()
                    // godeep is false here to avoid unpacking archives inside the current archive
                    if err := Untar(newDir, ft, sem, false); err != nil {
                        fmt.Println(err)
                        return
                    }
                    fmt.Println("DONE:", target)
                }(target, newDir, sem)
            }
        }
    }
}

func saveFile(r io.Reader, target string, mode os.FileMode) error {
    f, err := os.OpenFile(target, os.O_CREATE|os.O_RDWR, mode)
    if err != nil {
        return err
    }
    defer f.Close()
    if _, err := io.Copy(f, r); err != nil {
        return err
    }
    return nil
}

func main() {
    err := extractTar(tarFileName, dstDir)
    if err != nil {
        fmt.Println(err)
    }
}
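The heart of that answer is the buffered-channel semaphore inside the Semaphore struct. A stripped-down, runnable sketch of just that pattern (the jobs slice and the sleep are stand-ins for real work):

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    const grMax = 3 // at most 3 goroutines run at once
    jobs := []string{"a", "b", "c", "d", "e", "f"}

    sem := make(chan struct{}, grMax)
    var wg sync.WaitGroup
    for _, job := range jobs {
        wg.Add(1)
        sem <- struct{}{} // acquire a slot; blocks while grMax workers are busy
        go func(j string) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot
            time.Sleep(100 * time.Millisecond) // stand-in for extraction work
            fmt.Println("done:", j)
        }(job)
    }
    wg.Wait()
}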

Best approach to getting results out of goroutines

I have two functions that I cannot change (see first() and second() below). They return some data and errors (the actual output types differ, but in the examples below I use (string, error) for simplicity).
I would like to run them in separate goroutines - my approach:
package main

import (
    "fmt"
    "os"
)

func first(name string) (string, error) {
    if name == "" {
        return "", fmt.Errorf("empty name is not allowed")
    }
    fmt.Println("processing first")
    return fmt.Sprintf("First hello %s", name), nil
}

func second(name string) (string, error) {
    if name == "" {
        return "", fmt.Errorf("empty name is not allowed")
    }
    fmt.Println("processing second")
    return fmt.Sprintf("Second hello %s", name), nil
}

func main() {
    firstCh := make(chan string)
    secondCh := make(chan string)
    go func() {
        defer close(firstCh)
        res, err := first("one")
        if err != nil {
            fmt.Printf("Failed to run first: %v\n", err)
        }
        firstCh <- res
    }()
    go func() {
        defer close(secondCh)
        res, err := second("two")
        if err != nil {
            fmt.Printf("Failed to run second: %v\n", err)
        }
        secondCh <- res
    }()
    resultsOne := <-firstCh
    resultsTwo := <-secondCh
    // It's important for my app to do error checking and stop if errors exist.
    if resultsOne == "" || resultsTwo == "" {
        fmt.Println("There was an ERROR")
        os.Exit(1)
    }
    fmt.Println("ONE:", resultsOne)
    fmt.Println("TWO:", resultsTwo)
}
I believe one caveat is that resultsOne := <-firstCh blocks until the first goroutine finishes, but I don't care too much about that.
Can you please confirm that my approach is good? What other approaches would be better in my situation?
The example looks mostly good. A couple improvements are:
declaring your channels as buffered
firstCh := make(chan string, 1)
secondCh := make(chan string, 1)
With unbuffered channels, send operations block (until someone receives). If your goroutine #2 is much faster than the first, it will have to wait until the first finishes as well, since you receive in sequence:
resultsOne := <-firstCh // waiting on this one first
resultsTwo := <-secondCh // sender blocked because the main thread hasn't reached this point
use "golang.org/x/sync/errgroup".Group. The program will feel "less native" but it dispenses you from managing channels by hand — which trades, in a non-contrived setting, for sync'ing writes on the results:
func main() {
    var (
        resultsOne string
        resultsTwo string
    )
    g := errgroup.Group{}
    g.Go(func() error {
        res, err := first("one")
        if err != nil {
            return err
        }
        resultsOne = res
        return nil
    })
    g.Go(func() error {
        res, err := second("two")
        if err != nil {
            return err
        }
        resultsTwo = res
        return nil
    })
    err := g.Wait()
    if err != nil { // handle err: stop on the first failure, as in the original
        fmt.Println("There was an ERROR:", err)
        os.Exit(1)
    }
    fmt.Println("ONE:", resultsOne)
    fmt.Println("TWO:", resultsTwo)
}
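g.Wait() blocks until both closures have returned and yields the first non-nil error, so a single error check replaces the empty-string sentinel from the channel version. If a failure in one function should also cancel the other, errgroup.WithContext is the usual extension.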

Optimize writing to CSV in Go

The following snippet validates a phone number and writes the details to CSV.
func Parse(phone Input, output *PhoneNumber) error {
    var n PhoneNumber
    num, _ := phonenumbers.Parse(phone.Number, phone.Prefix)
    n.PhoneNumber = phonenumbers.Format(num, phonenumbers.E164)
    n.CountryCode = num.GetCountryCode()
    n.PhoneType = phonenumbers.GetNumberType(num)
    n.NetworkName, _ = phonenumbers.GetCarrierForNumber(num, "EN")
    n.Region = phonenumbers.GetRegionCodeForNumber(num)
    *output = n
    return nil
}

func createFile(path string) {
    // detect if file exists
    var _, err = os.Stat(path)
    // create file if not exists
    if os.IsNotExist(err) {
        var file, err = os.Create(path)
        if err != nil {
            return
        }
        defer file.Close()
    }
}
func worker(ctx context.Context, dst chan string, src chan []string) {
    for {
        select {
        case dataArray, ok := <-src: // you must check for readable state of the channel.
            if !ok {
                return
            }
            go processNumber(dataArray[0])
        case <-ctx.Done(): // if the context is cancelled, quit.
            return
        }
    }
}

func processNumber(number string) {
    num, e := phonenumbers.Parse(number, "")
    if e != nil {
        return
    }
    region := phonenumbers.GetRegionCodeForNumber(num)
    carrier, _ := phonenumbers.GetCarrierForNumber(num, "EN")
    path := "sample_all.csv"
    createFile(path)
    var csvFile, _ = os.OpenFile(path, os.O_APPEND|os.O_WRONLY, os.ModeAppend)
    csvwriter := csv.NewWriter(csvFile)
    _ = csvwriter.Write([]string{phonenumbers.Format(num, phonenumbers.E164), fmt.Sprintf("%v", num.GetCountryCode()), fmt.Sprintf("%v", phonenumbers.GetNumberType(num)), carrier, region})
    defer csvFile.Close()
    csvwriter.Flush()
}
func ParseFile(phone Input, output *PhoneNumber) error {
    // create a context
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    // that cancels at ctrl+C
    go onSignal(os.Interrupt, cancel)
    numberOfWorkers := 2
    start := time.Now()
    csvfile, err := os.Open(phone.File)
    if err != nil {
        log.Fatal(err)
    }
    defer csvfile.Close()
    reader := csv.NewReader(csvfile)
    // create the pair of input/output channels for the controller=>workers communication.
    src := make(chan []string)
    out := make(chan string)
    // use a waitgroup to manage synchronization
    var wg sync.WaitGroup
    // declare the workers
    for i := 0; i < numberOfWorkers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            worker(ctx, out, src)
        }()
    }
    // read the csv and write it to src
    go func() {
        for {
            record, err := reader.Read()
            if err == io.EOF {
                break
            } else if err != nil {
                log.Fatal(err)
            }
            src <- record // you might select on ctx.Done().
        }
        close(src) // close src to signal workers that no more jobs are incoming.
    }()
    // wait for worker group to finish and close out
    go func() {
        wg.Wait()  // wait for workers to quit.
        close(out) // when you close(out) it breaks the below loop.
    }()
    // drain the output
    for res := range out {
        fmt.Println(res)
    }
    fmt.Printf("\n%2fs", time.Since(start).Seconds())
    return nil
}
In the processNumber function, if I skip writing to CSV, verifying the numbers completes in 6 seconds, but writing one record at a time to the CSV stretches the run to 15s.
How can I optimize the code?
Can I chunk the records and write them in chunks instead of writing one row at a time?
Do the work directly in the worker goroutine instead of firing off a goroutine per task.
Open the output file once. Flush the output file once.
func worker(ctx context.Context, dst chan []string, src chan []string) {
    for {
        select {
        case dataArray, ok := <-src: // you must check for readable state of the channel.
            if !ok {
                return
            }
            if record := processNumber(dataArray[0]); record != nil {
                dst <- record
            }
        case <-ctx.Done(): // if the context is cancelled, quit.
            return
        }
    }
}

func processNumber(number string) []string {
    num, e := phonenumbers.Parse(number, "")
    if e != nil {
        return nil // skip invalid numbers
    }
    region := phonenumbers.GetRegionCodeForNumber(num)
    carrier, _ := phonenumbers.GetCarrierForNumber(num, "EN")
    return []string{phonenumbers.Format(num, phonenumbers.E164), fmt.Sprintf("%v", num.GetCountryCode()), fmt.Sprintf("%v", phonenumbers.GetNumberType(num)), carrier, region}
}
func ParseFile(phone Input, output *PhoneNumber) error {
    // create a context
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    // that cancels at ctrl+C
    go onSignal(os.Interrupt, cancel)
    numberOfWorkers := 2
    start := time.Now()
    csvfile, err := os.Open(phone.File)
    if err != nil {
        log.Fatal(err)
    }
    defer csvfile.Close()
    reader := csv.NewReader(csvfile)
    // create the pair of input/output channels for the controller=>workers communication.
    src := make(chan []string)
    out := make(chan []string)
    // use a waitgroup to manage synchronization
    var wg sync.WaitGroup
    // declare the workers
    for i := 0; i < numberOfWorkers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            worker(ctx, out, src)
        }()
    }
    // read the csv and write it to src
    go func() {
        for {
            record, err := reader.Read()
            if err == io.EOF {
                break
            } else if err != nil {
                log.Fatal(err)
            }
            src <- record // you might select on ctx.Done().
        }
        close(src) // close src to signal workers that no more jobs are incoming.
    }()
    // wait for worker group to finish and close out
    go func() {
        wg.Wait()  // wait for workers to quit.
        close(out) // when you close(out) it breaks the below loop.
    }()
    // open the output file once, write all records, flush once
    path := "sample_all.csv"
    file, err := os.Create(path)
    if err != nil {
        return err
    }
    defer file.Close()
    csvwriter := csv.NewWriter(file)
    // drain the output
    for res := range out {
        csvwriter.Write(res)
    }
    csvwriter.Flush()
    fmt.Printf("\n%.2fs", time.Since(start).Seconds())
    return nil
}
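On the chunking question: csv.Writer already buffers internally and only hits the file when Flush is called (or its buffer fills), so the single Flush at the end effectively batches the writes. If profiling still shows write overhead, you can interpose your own larger bufio.Writer; a minimal sketch, with an illustrative file name, records, and buffer size:

package main

import (
    "bufio"
    "encoding/csv"
    "log"
    "os"
)

func main() {
    file, err := os.Create("sample_all.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    bw := bufio.NewWriterSize(file, 1<<20) // 1 MiB buffer between csv.Writer and the file
    w := csv.NewWriter(bw)

    records := [][]string{
        {"+14155552671", "1", "MOBILE"},
        {"+442071838750", "44", "FIXED_LINE"},
    }
    for _, rec := range records {
        if err := w.Write(rec); err != nil {
            log.Fatal(err)
        }
    }
    w.Flush() // flush csv.Writer into bw
    if err := w.Error(); err != nil {
        log.Fatal(err)
    }
    if err := bw.Flush(); err != nil { // then bw into the file
        log.Fatal(err)
    }
}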

Goroutine deadlock while walking folders

I have this code based on the pipelines example. walkFiles takes one or more folders (given in the folders variable) and "visits" the files in all of them. It also takes a done channel to allow for cancellation, but I don't think that matters for this problem.
The code works as expected when passed a single folder. But when given two, it fails with the infamous fatal error: all goroutines are asleep - deadlock! It even looks like it's doing the right thing by processing the files of both folders, but it doesn't end well. What (probably obvious) error am I making in the concurrency of this function?
Here's the code:
type result struct {
    path     string
    checksum []byte
    err      error
}

type FileData struct {
    Hash []byte
}

// walkFiles starts a goroutine to walk the directory tree at root and send the
// path of each regular file on the string channel. It sends the result of the
// walk on the error channel. If done is closed, walkFiles abandons its work.
func (p Processor) walkFiles(done <-chan struct{}, folders []string) (<-chan string, <-chan error) {
    paths := make(chan string)
    errc := make(chan error, 1)
    visit := func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        if !info.Mode().IsRegular() {
            return nil
        }
        select {
        case paths <- path:
        case <-done:
            return errors.New("walk canceled")
        }
        return nil
    }
    var wg sync.WaitGroup
    for i, folder := range folders {
        wg.Add(1)
        go func(f string, i int) {
            defer wg.Done()
            // No select needed for this send, since errc is buffered.
            errc <- filepath.Walk(f, visit)
        }(folder, i)
    }
    go func() {
        wg.Wait()
        close(paths)
    }()
    return paths, errc
}

func closeFile(f *os.File) {
    err := f.Close()
    if err != nil {
        fmt.Fprintf(os.Stderr, "error: %v\n", err)
        os.Exit(1)
    }
}

// process reads path names from paths and sends digests of the corresponding
// files on c until either paths or done is closed.
func (p Processor) process(done <-chan struct{}, files <-chan string, c chan<- result, loc *locator.Locator) {
    for f := range files {
        func() {
            file, err := os.Open(f)
            if err != nil {
                fmt.Println(err)
                return
            }
            defer closeFile(file)
            // Hashing file, producing `checksum` variable, and an `err`
            select {
            case c <- result{f, checksum, err}:
            case <-done:
                return
            }
        }()
    }
}

// MD5All reads all the files in the file tree rooted at root and returns a map
// from file path to the MD5 sum of the file's contents. If the directory walk
// fails or any read operation fails, MD5All returns an error. In that case,
// MD5All does not wait for inflight read operations to complete.
func (p Processor) MD5All(folders []string) (map[string]FileData, error) {
    // MD5All closes the done channel when it returns; it may do so before
    // receiving all the values from c and errc.
    done := make(chan struct{})
    defer close(done)
    paths, errc := p.walkFiles(done, folders)
    c := make(chan result)
    var wg sync.WaitGroup
    wg.Add(NUM_DIGESTERS)
    for i := 0; i < NUM_DIGESTERS; i++ {
        go func() {
            p.process(done, paths, c, loc)
            wg.Done()
        }()
    }
    go func() {
        wg.Wait()
        close(c)
    }()
    // End of pipeline. OMIT
    m := make(map[string]FileData)
    for r := range c {
        if r.err != nil {
            return nil, r.err
        }
        m[r.path] = FileData{r.checksum}
    }
    if err := <-errc; err != nil {
        return nil, err
    }
    return m, nil
}

func (p Processor) Start() map[string]FileData {
    m, err := p.MD5All(p.folders)
    if err != nil {
        log.Fatal(err)
    }
    return m
}
The problem is here:
if err := <-errc; err != nil {
    return nil, err
}
You're reading from errc only once, but every walker goroutine writes to it, and errc is buffered with capacity one. With two folders, the second walker blocks on its send, so wg.Wait() never returns, paths is never closed, and the whole pipeline deadlocks. Give errc one buffer slot per folder and read from it in a for loop, once per walker.
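A minimal sketch of that fix, keeping the rest of the pipeline as above; only these two spots change:

// In walkFiles: one buffer slot per folder, so no walker blocks on its send.
errc := make(chan error, len(folders))

// In MD5All, after the range over c: one receive per walker.
for range folders {
    if err := <-errc; err != nil {
        return nil, err
    }
}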

How to repeatedly shut down and re-establish a goroutine?

Everyone, I am new to Go. I want to get the data from a log file generated by my application. Because of its rollover mechanism, I ran into a problem: my target log file is chats.log, which gets renamed to chats.log.2018xxx while a new chats.log is created, so the goroutine that reads the log file stops working.
So I need to detect the change, shut down the previous goroutine, and then start a new one.
I looked for modules that could help me, and I found fsnotify:
func ExampleNewWatcher(fn string, createnoti chan string, wg sync.WaitGroup) {
    wg.Add(1)
    defer wg.Done()
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        log.Fatal(err)
    }
    defer watcher.Close()
    done := make(chan bool)
    go func() {
        for {
            select {
            case event := <-watcher.Events:
                if event.Op == fsnotify.Create && event.Name == fn {
                    createnoti <- "has been created"
                }
            case err := <-watcher.Errors:
                log.Println("error:", err)
            }
        }
    }()
    err = watcher.Add("./")
    if err != nil {
        log.Fatal(err)
    }
    <-done
}
I use fsnotify to detect the change, make sure the event is for my log file, and then send a message on a channel.
This is my worker goroutine:
func tailer(fn string, isfollow bool, outchan chan string, done <-chan interface{}, wg sync.WaitGroup) error {
    wg.Add(1)
    defer wg.Done()
    _, err := os.Stat(fn)
    if err != nil {
        panic(err)
    }
    t, err := tail.TailFile(fn, tail.Config{Follow: isfollow})
    if err != nil {
        panic(err)
    }
    defer t.Stop()
    for line := range t.Lines {
        select {
        case outchan <- line.Text:
        case <-done:
            return nil
        }
    }
    return nil
}
I use the tail module to read the log file, and I added a done channel to shut down the loop (I don't know whether I put it in the right place).
I send every log line to a channel for consumption.
So here is the question: how should I put it all together?
P.S.: Actually, I could use some existing tool to do this job, like apache-flume, but all of those tools need dependencies.
Thank you a lot!
Here is a complete example that reloads and rereads the file as it changes or gets deleted and recreated:
package main

import (
    "github.com/fsnotify/fsnotify"
    "io/ioutil"
    "log"
)

const filename = "myfile.txt"

func ReadFile(filename string) string {
    data, err := ioutil.ReadFile(filename)
    if err != nil {
        log.Println(err)
    }
    return string(data)
}

func main() {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        log.Fatal(err)
    }
    defer watcher.Close()
    err = watcher.Add("./")
    if err != nil {
        log.Fatal(err)
    }
    for {
        select {
        case event := <-watcher.Events:
            if event.Op == fsnotify.Create && event.Name == filename {
                log.Println(ReadFile(filename))
            }
        case err := <-watcher.Errors:
            log.Println("error:", err)
        }
    }
}
Note this doesn't require goroutines, channels or a WaitGroup. Better to keep things simple and reserve those for when they're actually needed.
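If you do need tail-style following rather than rereading the whole file, the same fsnotify event can drive a restart loop around the tail reader. A hedged sketch under assumptions not in the answer above: chats.log as the file name and the hpcloud/tail package from the question (note that tail.Config also has a ReOpen option with tail -F semantics, which may handle rotation on its own):

package main

import (
    "fmt"
    "log"

    "github.com/fsnotify/fsnotify"
    "github.com/hpcloud/tail"
)

const logfile = "chats.log"

// follow tails the file until stop is closed, then returns.
func follow(stop <-chan struct{}) {
    t, err := tail.TailFile(logfile, tail.Config{Follow: true})
    if err != nil {
        log.Println(err)
        return
    }
    defer t.Stop()
    for {
        select {
        case line, ok := <-t.Lines:
            if !ok {
                return
            }
            fmt.Println(line.Text)
        case <-stop:
            return
        }
    }
}

func main() {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        log.Fatal(err)
    }
    defer watcher.Close()
    if err := watcher.Add("./"); err != nil {
        log.Fatal(err)
    }

    stop := make(chan struct{})
    go follow(stop)
    for event := range watcher.Events {
        if event.Op == fsnotify.Create && event.Name == logfile {
            close(stop) // shut down the previous tailer...
            stop = make(chan struct{})
            go follow(stop) // ...and start a new one on the fresh file
        }
    }
}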
