I'm trying to parse some log files in Go as they're being written, but I'm not sure how to accomplish this without rereading the file over and over while checking for changes.
I'd like to be able to read to EOF, wait until the next line is written, read to EOF again, and so on. In other words, I want the behavior of tail -f.
I have written a Go package -- github.com/hpcloud/tail -- to do exactly this.
t, err := tail.TailFile("/var/log/nginx.log", tail.Config{Follow: true})
for line := range t.Lines {
    fmt.Println(line.Text)
}
...
Quoting kostix's answer:
in real life files might be truncated, replaced or renamed (because that's what tools like logrotate are supposed to do).
If a file gets truncated, it will automatically be re-opened. To support re-opening renamed files (due to logrotate, etc.), you can set Config.ReOpen, viz.:
t, err := tail.TailFile("/var/log/nginx.log", tail.Config{
    Follow: true,
    ReOpen: true})
for line := range t.Lines {
    fmt.Println(line.Text)
}
Config.ReOpen is analogous to tail -F (capital F):
-F   The -F option implies the -f option, but tail will also check to see if the file being followed has been renamed or rotated. The file is closed and reopened when tail detects that the filename being read from has a new inode number. The -F option is ignored if reading from standard input rather than a file.
You have to either watch the file for changes (using an OS-specific subsystem to accomplish this) or poll it periodically to see whether its modification time (and size) changed. In either case, after reading another chunk of data you remember the file offset and restore it before reading another chunk after detecting the change.
But note that this seems to be easy only on paper: in real life files might be truncated, replaced or renamed (because that's what tools like logrotate are supposed to do).
See this question for more discussion of this problem.
A simple example:
package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
    "time"
)

func tail(filename string, out io.Writer) {
    f, err := os.Open(filename)
    if err != nil {
        panic(err)
    }
    defer f.Close()
    r := bufio.NewReader(f)
    info, err := f.Stat()
    if err != nil {
        panic(err)
    }
    oldSize := info.Size()
    for {
        // Read everything that is currently available, line by line.
        for line, prefix, err := r.ReadLine(); err != io.EOF; line, prefix, err = r.ReadLine() {
            if prefix {
                fmt.Fprint(out, string(line))
            } else {
                fmt.Fprintln(out, string(line))
            }
        }
        // Remember where we stopped reading.
        pos, err := f.Seek(0, io.SeekCurrent)
        if err != nil {
            panic(err)
        }
        // Poll the file size until it changes.
        for {
            time.Sleep(time.Second)
            newinfo, err := f.Stat()
            if err != nil {
                panic(err)
            }
            newSize := newinfo.Size()
            if newSize != oldSize {
                if newSize < oldSize {
                    // The file was truncated; start over from the beginning.
                    f.Seek(0, io.SeekStart)
                } else {
                    // The file grew; continue from where we left off.
                    f.Seek(pos, io.SeekStart)
                }
                r = bufio.NewReader(f)
                oldSize = newSize
                break
            }
        }
    }
}

func main() {
    tail("x.txt", os.Stdout)
}
I'm also interested in doing this, but haven't (yet) had the time to tackle it. One approach that occurred to me is to let tail do the heavy lifting. It would likely make your tool platform-specific, but that may be OK. The basic idea would be to use Cmd from the os/exec package to follow the file. You could fork a process that is the equivalent of tail --retry --follow=name prog.log, and then listen to its Stdout using the Stdout reader on the Cmd object.
Sorry, I know it's just a sketch, but maybe it's helpful.
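For what it's worth, a minimal sketch of that idea, assuming a GNU tail binary is on PATH and using prog.log as a placeholder file name:
package main

import (
    "bufio"
    "fmt"
    "log"
    "os/exec"
)

func main() {
    // Let tail handle rotation and truncation; --follow=name with --retry
    // keeps following the name even if the file is replaced.
    cmd := exec.Command("tail", "--retry", "--follow=name", "prog.log")
    stdout, err := cmd.StdoutPipe()
    if err != nil {
        log.Fatal(err)
    }
    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }
    // Each line tail emits arrives here as soon as it is written.
    scanner := bufio.NewScanner(stdout)
    for scanner.Scan() {
        fmt.Println(scanner.Text())
    }
    if err := cmd.Wait(); err != nil {
        log.Fatal(err)
    }
}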
There are many ways to do this. On Linux one can use the kernel's inotify interface; other operating systems offer similar facilities (kqueue on the BSDs and macOS, ReadDirectoryChangesW on Windows).
The https://github.com/fsnotify/fsnotify package wraps these OS-specific subsystems behind a single API.
Sample code:
watcher, err := fsnotify.NewWatcher()
if err != nil {
    log.Fatal(err)
}
defer watcher.Close()

err = watcher.Add(fileName)
if err != nil {
    log.Fatal(err)
}

for {
    select {
    case event := <-watcher.Events:
        if event.Op&fsnotify.Write == fsnotify.Write {
            log.Println("modified file:", event.Name)
        }
    case err := <-watcher.Errors:
        log.Println("watch error:", err)
    }
}
Hope this helps!
Related
I need my program to sit in the middle of a connection and relay the data correctly in both directions. I wrote this code, but it does not work properly:
package main

import (
    "fmt"
    "net"
)

func main() {
    listener, err := net.Listen("tcp", ":8120")
    if err != nil {
        fmt.Println(err)
        return
    }
    defer listener.Close()
    fmt.Println("Server is listening...")
    for {
        var conn1, conn2 net.Conn
        var err error
        conn1, err = listener.Accept()
        if err != nil {
            fmt.Println(err)
            conn1.Close()
            continue
        }
        conn2, err = net.Dial("tcp", "185.151.245.51:80")
        if err != nil {
            fmt.Println(err)
            conn2.Close()
            continue
        }
        go handleConnection(conn1, conn2)
        go handleConnection(conn2, conn1)
    }
}

func handleConnection(conn1, conn2 net.Conn) {
    defer conn1.Close()
    for {
        input := make([]byte, 1024)
        n, err := conn1.Read(input)
        if n == 0 || err != nil {
            break
        }
        conn2.Write([]byte(input))
    }
}
The problem is that the data gets corrupted. For example, comparing the original file with the one that comes out the other side: the end of the received file is unreadable, but the beginning is fine.
I tried changing the size of the input slice. If the size is > 0 and < 8, everything is fine, but slow. If I set the input size very large, the data corruption gets worse.
What am I doing wrong?
In handleConnection, you always write all 1024 bytes of input, no matter how many bytes conn1.Read actually returned.
You want to write only the data that was read, like this:
conn2.Write(input[:n])
You should also check your top-level for loop. Are you sure you're not accepting multiple connections and smushing them all together? I'd sprinkle in some log statements so you can see when connections are made and closed.
Another (probably inconsequential) mistake is that you treat n == 0 as a termination condition. The documentation of io.Reader recommends ignoring n == 0 with err == nil. Without checking the code I can't be sure, but I expect that conn.Read never returns n == 0, err == nil, so it's unlikely that this is causing you trouble.
Although it doesn't affect correctness, you could also lift the definition of input out of the loop so that it's reused on each iteration; it's likely to reduce the amount of work the garbage collector has to do.
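Putting those fixes together, a corrected handleConnection might look like the sketch below. In practice, io.Copy(conn2, conn1) implements exactly this loop for you:
func handleConnection(conn1, conn2 net.Conn) {
    defer conn1.Close()
    // Allocate the buffer once, outside the loop.
    input := make([]byte, 1024)
    for {
        n, err := conn1.Read(input)
        if n > 0 {
            // Forward only the n bytes actually read.
            if _, werr := conn2.Write(input[:n]); werr != nil {
                return
            }
        }
        if err != nil { // io.EOF or a real error: stop relaying
            return
        }
    }
}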
I am trying to make a program that checks for duplicate files based on their MD5 checksums.
I'm not really sure whether I am missing something or not, but this function, when reading the Xcode installer app (about 8 GB), uses 16 GB of RAM:
func search() {
    unique := make(map[string]string)
    files, err := ioutil.ReadDir(".")
    if err != nil {
        log.Println(err)
    }
    for _, file := range files {
        fileName := file.Name()
        fmt.Println("CHECKING:", fileName)
        fi, err := os.Stat(fileName)
        if err != nil {
            fmt.Println(err)
            continue
        }
        if fi.Mode().IsRegular() {
            data, err := ioutil.ReadFile(fileName)
            if err != nil {
                fmt.Println(err)
                continue
            }
            sum := md5.Sum(data)
            hexDigest := hex.EncodeToString(sum[:])
            if _, ok := unique[hexDigest]; ok == false {
                unique[hexDigest] = fileName
            } else {
                fmt.Println("DUPLICATE:", fileName)
            }
        }
    }
}
As far as I can tell from debugging, the issue is with the file reading.
Is there a better approach to do that?
Thanks.
There is an example in the Go documentation which covers your case:
package main

import (
    "crypto/md5"
    "fmt"
    "io"
    "log"
    "os"
)

func main() {
    f, err := os.Open("file.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    h := md5.New()
    if _, err := io.Copy(h, f); err != nil {
        log.Fatal(err)
    }

    fmt.Printf("%x", h.Sum(nil))
}
For your case, just make sure to close the files in the loop and not defer them. Or put the logic into a function.
Sounds like the 16 GB of RAM is your problem, not speed per se.
Don't read the entire file into a variable with ReadFile; io.Copy from the Reader that Open gives you to the Writer that hash/md5 provides (md5.New returns a hash.Hash, which embeds an io.Writer). That copies only a small buffer at a time instead of pulling the whole file into RAM.
This is a trick that's useful in a lot of places in Go; packages like text/template, compress/gzip, net/http, etc. work in terms of Readers and Writers. With them, you don't usually need to create huge []bytes or strings; you can hook I/O interfaces up to each other and let them pass pieces of content around for you. In a garbage-collected language, saving memory tends to save CPU work as well.
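As a concrete sketch, the loop body of search above could call a small helper like this (hashFile is my name for it, not from the original code):
// hashFile streams a file through MD5 instead of loading it into memory,
// so peak memory use stays at the size of io.Copy's internal buffer.
func hashFile(name string) (string, error) {
    f, err := os.Open(name)
    if err != nil {
        return "", err
    }
    defer f.Close() // closed when the helper returns, not at the end of search

    h := md5.New()
    if _, err := io.Copy(h, f); err != nil {
        return "", err
    }
    return hex.EncodeToString(h.Sum(nil)), nil
}
Inside the loop, sum, err := hashFile(fileName) then replaces the ioutil.ReadFile / md5.Sum pair.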
I've modified the official documentation example for the zlib package to use an opened file rather than a set of hardcoded bytes (code below).
The code reads in the contents of a source text file and compresses it with the zlib package. I then try to read back the compressed file and print its decompressed contents into stdout.
The code doesn't error, but it also doesn't do what I expect it to do; which is to display the decompressed file contents into stdout.
Also: is there another way of displaying this information, rather than using io.Copy?
package main

import (
    "compress/zlib"
    "io"
    "log"
    "os"
)

func main() {
    var err error

    // This defends against an error preventing `defer` from being called,
    // as log.Fatal otherwise calls `os.Exit`.
    defer func() {
        if err != nil {
            log.Fatalln("\nDeferred log: \n", err)
        }
    }()

    src, err := os.Open("source.txt")
    if err != nil {
        return
    }
    defer src.Close()

    dest, err := os.Create("new.txt")
    if err != nil {
        return
    }
    defer dest.Close()

    zdest := zlib.NewWriter(dest)
    defer zdest.Close()
    if _, err := io.Copy(zdest, src); err != nil {
        return
    }

    n, err := os.Open("new.txt")
    if err != nil {
        return
    }
    r, err := zlib.NewReader(n)
    if err != nil {
        return
    }
    defer r.Close()
    io.Copy(os.Stdout, r)

    err = os.Remove("new.txt")
    if err != nil {
        return
    }
}
Your deferred func doesn't do anything here, because the if _, err := io.Copy(zdest, src) statement declares a new err that shadows the outer variable your defer checks. If you want a defer to run, return from a separate function, and call log.Fatal after that function returns.
As for why you're not seeing any output: you're deferring all the Close calls. The zlib.Writer isn't flushed until after the function exits, and neither is the destination file, so new.txt is still empty when you read it back. Call Close() explicitly where you need it:
zdest := zlib.NewWriter(dest)
if _, err := io.Copy(zdest, src); err != nil {
    log.Fatal(err)
}
zdest.Close()
dest.Close()
I think you've tangled up the code logic with all this defer stuff and your err-checking "trick".
A file's contents are only reliably on disk once the file is flushed or closed. You copy into new.txt but never close it before opening it again to read it back.
Deferring the Close of a file is neat inside a function that has multiple exits: it makes sure the file is closed once the function is left. But your main requires new.txt to be closed after the copy, before reopening it. So don't defer the Close here.
BTW: your defense against log.Fatal terminating the program without calling your defers is, well, at least strange. The OS puts the files into a proper state on exit anyway; there is absolutely no need to complicate the code like this.
Check the error from the second Copy:
2015/12/22 19:00:33
Deferred log:
unexpected EOF
exit status 1
The thing is, you need to close zdest as soon as you've finished writing to it. Close it right after the first Copy and it works.
I would have suggested using io.MultiWriter.
That way you read from src only once. It's not much of a gain for small files, but it is faster for bigger ones.
w := io.MultiWriter(zdest, os.Stdout)
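Spelled out, that wiring might look like the following sketch. It assumes the intent is to tee the uncompressed bytes to stdout while they are compressed into the file, so new.txt never has to be read back:
zdest := zlib.NewWriter(dest)
w := io.MultiWriter(zdest, os.Stdout) // plain bytes go to the compressor and to stdout
if _, err := io.Copy(w, src); err != nil {
    log.Fatal(err)
}
zdest.Close() // flush the compressed stream...
dest.Close()  // ...then close the underlying file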
I have this code
subProcess := exec.Cmd{
    Path: execAble,
    Args: []string{
        fmt.Sprintf("-config=%s", *configPath),
        fmt.Sprintf("-serverType=%s", *serverType),
        fmt.Sprintf("-reload=%t", *reload),
        fmt.Sprintf("-listenFD=%d", fd),
    },
    Dir: here,
}
subProcess.Stdout = os.Stdout
subProcess.Stderr = os.Stderr

logger.Info("starting subProcess:%s ", subProcess.Args)

if err := subProcess.Run(); err != nil {
    logger.Fatal(err)
}
and then I do os.Exit(1) to stop the main process.
I can get output from the subprocess, but I also want to write to its stdin.
I tried
subProcess.Stdin = os.Stdin
but it does not work.
I made a simple program (for testing). It reads a number and writes the given number out.
package main

import (
    "fmt"
)

func main() {
    fmt.Println("Hello, What's your favorite number?")
    var i int
    fmt.Scanf("%d\n", &i)
    fmt.Println("Ah I like ", i, " too.")
}
And here is the modified code
package main

import (
    "fmt"
    "io"
    "os"
    "os/exec"
)

func main() {
    subProcess := exec.Command("go", "run", "./helper/main.go") // Just for testing, replace with your subProcess

    stdin, err := subProcess.StdinPipe()
    if err != nil {
        fmt.Println(err) // replace with logger, or anything you want
    }
    defer stdin.Close() // the doc says subProcess.Wait will close it, but I'm not sure, so I kept this line

    subProcess.Stdout = os.Stdout
    subProcess.Stderr = os.Stderr

    fmt.Println("START") // for debug

    if err = subProcess.Start(); err != nil { // Use Start, not Run
        fmt.Println("An error occurred: ", err) // replace with logger, or anything you want
    }

    io.WriteString(stdin, "4\n")
    subProcess.Wait()
    fmt.Println("END") // for debug
}
The lines you're interested in are these:
stdin, err := subProcess.StdinPipe()
if err != nil {
    fmt.Println(err)
}
defer stdin.Close()
//...
io.WriteString(stdin, "4\n")
//...
subProcess.Wait()
subProcess.Wait()
Explanation of the above lines:
We obtain the subprocess's stdin; now we can write to it.
We use our new power and write a number.
We wait for the subprocess to complete.
Output
START
Hello, What's your favorite number?
Ah I like 4 too.
END
For better understanding
There's now an updated example available in the Go docs: https://golang.org/pkg/os/exec/#Cmd.StdinPipe
If the subprocess won't proceed until its stdin is closed, the io.WriteString() call needs to be wrapped inside an anonymous function run as a goroutine:
package main

import (
    "fmt"
    "io"
    "log"
    "os/exec"
)

func main() {
    cmd := exec.Command("cat")
    stdin, err := cmd.StdinPipe()
    if err != nil {
        log.Fatal(err)
    }

    // Write in a goroutine and close stdin when done, so cat sees EOF
    // and CombinedOutput can finish without deadlocking.
    go func() {
        defer stdin.Close()
        io.WriteString(stdin, "values written to stdin are passed to cmd's standard input")
    }()

    out, err := cmd.CombinedOutput()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s\n", out)
}
Though this question is a little old, here is my answer:
This question is of course very platform-specific, since how standard I/O is handled depends on the OS implementation, not on the Go language. However, as a general rule of thumb (because a few OS designs are prevalent), what you ask is not possible.
On most modern operating systems you can pipe standard streams (as in @mraron's answer) and you can detach them (this is how daemons work), but you cannot reassign or delegate them to another process.
I think this limitation exists mostly for security reasons. Bugs that allow remote code execution are still discovered from time to time; if the OS allowed reassigning or delegating STDIN/STDOUT, the consequences of such vulnerabilities would be even more disastrous.
While you cannot do this directly, as @AlexKey wrote earlier, you can still work around it. If the OS prevents you from piping your own standard streams, who cares: all you need is two channels and two goroutines.
var stdinChan chan []byte
var stdoutChan chan []byte

// when something happens on the stdout of your code, just pipe it to stdoutChan
stdoutChan <- somehowGotDataFromStdOut
Then you need two funcs, as mentioned before:
func Pipein() {
    for {
        stdinFromProg.Write(<-stdinChan)
    }
}
The same idea applies to stdout.
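A self-contained version of that channel-pumping idea might look like this sketch; the child command (cat) and every name below are mine for illustration, not from the answer:
package main

import (
    "fmt"
    "log"
    "os/exec"
)

func main() {
    cmd := exec.Command("cat")
    stdin, err := cmd.StdinPipe()
    if err != nil {
        log.Fatal(err)
    }
    stdout, err := cmd.StdoutPipe()
    if err != nil {
        log.Fatal(err)
    }
    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }

    stdinChan := make(chan []byte)
    stdoutChan := make(chan []byte)

    // pipe-in goroutine: whatever arrives on stdinChan goes to the child.
    go func() {
        for data := range stdinChan {
            stdin.Write(data)
        }
        stdin.Close() // closing stdinChan ends up closing the child's stdin
    }()

    // pipe-out goroutine: whatever the child prints goes to stdoutChan.
    go func() {
        buf := make([]byte, 4096)
        for {
            n, err := stdout.Read(buf)
            if n > 0 {
                out := make([]byte, n)
                copy(out, buf[:n])
                stdoutChan <- out
            }
            if err != nil {
                close(stdoutChan)
                return
            }
        }
    }()

    stdinChan <- []byte("hello through channels\n")
    close(stdinChan)
    for data := range stdoutChan {
        fmt.Print(string(data))
    }
    cmd.Wait()
}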
I want to be able to fully communicate with programs after spawning them from a Go program. What I already have is spawning a process and talking to it through pipes, acting on the last line read from stdout:
package main

import (
    "io"
    "log"
    "os/exec"
    "strings"
)

var stdinPipe io.WriteCloser
var stdoutPipe io.ReadCloser
var err error

func main() {
    cmd := &exec.Cmd{
        Path: "/Users/seba/Projects/go/src/bootstrap/in",
        Args: []string{"program"},
    }

    stdinPipe, err = cmd.StdinPipe()
    if err != nil {
        log.Fatal(err)
    }
    stdoutPipe, err = cmd.StdoutPipe()
    if err != nil {
        log.Fatal(err)
    }

    err = cmd.Start()
    if err != nil {
        log.Fatal(err)
    }

    var stdoutLines []string
    // stdoutController is the caller's line-handling callback (defined elsewhere).
    go stdoutManage(stdoutLines, stdoutController)

    cmd.Wait()
}

// TODO: improve as in io.Copy
func stdoutManage(lines []string, manager func(string)) {
    buf := make([]byte, 32*1024)
    for {
        nr, err := stdoutPipe.Read(buf)
        if nr > 0 {
            // use only the nr bytes that were actually read
            thelines := strings.Split(string(buf[:nr]), "\n")
            for _, l := range thelines {
                manager(l)
                lines = append(lines, l)
            }
        }
        if err != nil {
            break
        }
    }
}
However, this approach has problems with programs that clear the terminal output, and with programs that buffer their stdin somehow or don't use stdin at all (I don't know if that's even possible).
So the question: is there a portable way of talking to programs like this (it can be a non-Go solution)?
Problems like this are usually down to the C library, which changes its default buffering mode depending on exactly what stdin/stdout/stderr are attached to.
If stdout is a terminal, buffering is automatically set to line-buffered; otherwise it is fully buffered.
This is relevant to you because when you run programs through a pipe they aren't connected to a terminal, so their output is buffered in a way that breaks this sort of use.
To fix it, you need to use a pseudo-tty, which pretends to be a terminal but acts just like a pipe. Here is a library implementing the pty interface which I haven't actually tried, but it looks like it does the right thing!
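The answer doesn't name the library here; one widely used Go pty package is github.com/creack/pty (formerly github.com/kr/pty), which may or may not be the one intended. A sketch of the idea with it, using a bash child chosen purely for illustration:
package main

import (
    "io"
    "log"
    "os"
    "os/exec"

    "github.com/creack/pty"
)

func main() {
    // The child reads a line and answers; under a pty its stdio is
    // line-buffered as if attached to a real terminal.
    cmd := exec.Command("bash", "-c", `read line; echo "you said: $line"`)

    // pty.Start wires the child's stdin/stdout/stderr to the pty's slave
    // side and returns the master side as an *os.File.
    f, err := pty.Start(cmd)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // Writes reach the child's stdin; reads see its output
    // (including the pty's echo of what we typed).
    f.Write([]byte("hello\n"))
    io.Copy(os.Stdout, f) // returns once the child exits and the pty closes
    cmd.Wait()
}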