Reading a file as it is written - go

Being fairly new to Go, I wrote a simple Go program that continually reads a file as it is written - for example to watch a log file for specific output, which I want to capture as soon as it has been written.
My (naive) code:
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "io"
    "os"
)

func main() {
    fileHandle, err := os.Open("test.log")
    if err != nil {
        fmt.Println(err)
    }
    fileReader := bufio.NewReader(fileHandle)
    line := bytes.NewBufferString("")
    for true {
        content, err := fileReader.ReadBytes('\n')
        line.Write(content)
        if err == io.EOF {
            continue
        } else {
            if line.String() == "done\r\n" {
                break
            }
            fmt.Println(line.String())
            line.Reset()
        }
    }
    err = fileHandle.Close()
    if err != nil {
        fmt.Println(err)
    }
}
This works just fine: it prints every line of the log file as it is written, and never prints anything until it has a complete line (which is the desired behaviour).
My question: what is the idiomatic way in Go to change this code so that I can run it as a routine that blocks until there is more to read?
I know how to take the code and turn it into a goroutine, but it would still be constantly trying to read the file, dealing with the io.EOF and continuing. Is there a more idiomatic way of writing this as a goroutine that would avoid needless checking?
Is it a matter of constantly checking .Buffered() instead? Even that seems wasteful and intensive. I can slow it down with time.Sleep() whenever it encounters io.EOF (as in the sketch below), but I want to minimise the delay. In Go, what is the way to wait for something to be written to the file, with a minimum of delay and a minimum of computational and I/O overhead?
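For concreteness, a minimal sketch of that sleep-on-io.EOF approach; the 100ms interval is an arbitrary illustrative value, trading latency against wasted wake-ups:

package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
    "time"
)

// pollLines prints complete lines from the file as they appear,
// sleeping briefly on io.EOF instead of spinning.
func pollLines(name string) error {
    f, err := os.Open(name)
    if err != nil {
        return err
    }
    defer f.Close()

    r := bufio.NewReader(f)
    var partial []byte
    for {
        chunk, err := r.ReadBytes('\n')
        partial = append(partial, chunk...)
        if err == io.EOF {
            time.Sleep(100 * time.Millisecond) // back off until more is written
            continue
        }
        if err != nil {
            return err
        }
        fmt.Print(string(partial))
        partial = partial[:0]
    }
}

func main() {
    if err := pollLines("test.log"); err != nil {
        fmt.Println(err)
    }
}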
(Note: I'm not asking for a rewrite of the code above, but really just an answer to the question itself, perhaps pointing at an existing example, or the appropriate functions in the documentation)

Related

Why does writer.Write(msg) fail to write to os.Stdout?

In the code below, why does the Write() operation not work?
package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    fmt.Println("Hello, playground")
    writer := bufio.NewWriter(os.Stdout)
    //var msg []byte
    msg := []byte{104, 101, 108, 108, 111, 10} // "hello\n"
    _, err := writer.Write(msg)
    if err != nil {
        fmt.Println("some error")
    }
}
The output is:
Hello, playground
But it should be:
Hello, playground
hello
Also, I don't want to use fmt.Println(); to be more specific, I get the data as a []byte.
As Cerise Limón noted in a comment, a writer.Write() call merely queues up more data to be written (depending on the buffer size and the amount of data). The actual write may happen later, or never. In your case, since you never tell the writer to finish any delayed write, no write ever happens.
You'll need to invoke writer.Flush() (not writer.WriterFlush(), that's a typo of sorts). Flush can return an error if the write fails, so ideally you should check for that rather than just deferring the call and hoping. However, there's not much you can do about the failure if there is one.
You can do an explicit, in-line check for the error, as I did here for instance, or you can just defer the flush call and throw away any error.
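For illustration, a minimal sketch of the explicit, in-line variant, using only the standard library:

package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    fmt.Println("Hello, playground")
    writer := bufio.NewWriter(os.Stdout)
    msg := []byte{104, 101, 108, 108, 111, 10} // "hello\n"
    if _, err := writer.Write(msg); err != nil {
        fmt.Println("write error:", err)
    }
    // Flush forces any buffered data out to os.Stdout; without it,
    // a short message may sit in the buffer and never appear.
    if err := writer.Flush(); err != nil {
        fmt.Println("flush error:", err)
    }
}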

Can't get golang and the package bigquery to work for loading to big query

I am trying to figure out how to get a simple bq load command to work with https://godoc.org/cloud.google.com/go/bigquery#Table.LoaderFrom
Running it manually looks like this:
bq load --source_format=AVRO --ignore_unknown_values --replace=true mydataset.mytable gs://mybucket/table/*
And running it successfully from my Go program with exec.Command() looks like this:
exec.Command("bq", "load", "--source_format=AVRO", "--ignore_unknown_values",
"--replace=true", "mydataset.mytable",
"gs://mybucket/table/*")
However, I cannot get the program below to run without a segmentation fault when trying to run the load and wait for the job: it crashes with a segmentation violation at the job.Wait line.
package main

import (
    "context"
    "log"

    "cloud.google.com/go/bigquery"
)

func main() {
    ctx := context.Background()
    client, err := bigquery.NewClient(ctx, "my-project-id")
    if err != nil {
        // TODO: Handle error.
    }
    gcsRef := bigquery.NewGCSReference("gs://mybucket/table/*")
    gcsRef.SourceFormat = "AVRO"
    gcsRef.IgnoreUnknownValues = true
    // TODO: set other options on the GCSReference.
    ds := client.Dataset("mydataset")
    loader := ds.Table("mytable").LoaderFrom(gcsRef)
    // TODO: set other options on the Loader.
    job, err := loader.Run(ctx)
    if err != nil {
        // TODO: Handle error.
    }
    status, err := job.Wait(ctx) // seg faults right here
    if err != nil {
        // TODO: Handle error.
    }
    if status.Err() != nil {
        // TODO: Handle error.
    }
}
The panic is probably coming from a nil pointer dereference of the job variable.
I would suggest calling log.Fatal(err) in all of your err != nil blocks.
This will help get you closer to why job is not being assigned correctly.
When you're writing one-off scripts like this one in Go, log.Fatal is a great way to exit the program and print exactly what the issue is.
With Go you're always trying to bubble errors up the stack to determine whether the code should continue to execute, whether things can be recovered, or whether it's a fatal problem and you should end the program.
For more info on the logging package checkout here: https://golang.org/pkg/log/
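For instance, here is the relevant excerpt of the program above with the empty TODO blocks filled in this way (same identifiers as in the question):

job, err := loader.Run(ctx)
if err != nil {
    log.Fatal(err) // print the real error and exit instead of continuing with a nil job
}
status, err := job.Wait(ctx)
if err != nil {
    log.Fatal(err)
}
if status.Err() != nil {
    log.Fatal(status.Err())
}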
If you're just starting out learning Go, here are some awesome resources that can give you ideas on how different types of programs can be designed.
https://github.com/dashpradeep99/https-github.com-miguellgt-books/tree/master/go
Best,
Christopher

How to detect deleted file?

Writing to a non-existent file does not produce an error in Go.
For example, here's a sample program writing to a file in a loop:
package main

import (
    "log"
    "os"
    "time"
)

func main() {
    f, err := os.OpenFile("mytest.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        log.Fatal(err)
    }
    for {
        n, err := f.WriteString("blah\n")
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("wrote %d bytes\n", n)
        time.Sleep(2 * time.Second)
    }
}
While this is running, I issue rm mytest.log from the command line and observe that the program does not produce an error on the next call to WriteString(). (I tested on Linux; it may be different on other operating systems.)
Is there a way to detect if the file was deleted (other than doing a stat on the file before every write)? And presumably the bytes written are simply discarded by the operating system?
While this is running, I issue rm mytest.log from the command line and observe that the program does not produce an error on the next call to WriteString()
Yes, that's exactly the behavior that's specified. Also, the file hasn't actually been removed. The only thing rm removes is that particular path entry in the filesystem. A single file can have multiple paths, also called hard links.
The actual file is deleted only when the last reference to it, whether a filesystem entry (link) or a file descriptor (the file held open in a program), has been closed.
This particular behavior of the Unix file model was long used to implement "unnamed" shared memory, by creating and opening a file in /dev/shm and then removing its filesystem entry. Because that way of doing things introduces a race condition, new syscalls were later introduced for security-sensitive applications that allow creating anonymous memory maps, and very recently Linux even gained a way to create a file in a filesystem without creating a path entry (open with the O_TMPFILE flag).
On more recent versions of Linux you can even re-create a filesystem entry for a file whose last entry has already been removed, using the linkat syscall.
Update
The question is: do you really want to error out when the last filesystem entry vanishes? It's not a bad condition, after all; you can still safely write and read without problems. Just be aware that once you close the last file descriptor to the file, it is lost.
It is perfectly possible to detect whether the last filesystem entry has been removed and abort file operations if so. However, be aware that such code may introduce its own share of problems, for example if the program expects to create a new filesystem entry once everything has been written to the file properly, using linkat.
Anyway, what you can do is fstat the file (file.Stat() in Go) and look at the number of hard links the file has. If that number drops to zero, all filesystem entries are gone. Actually getting at that number is a little tricky in Go; it's described in Counting hard links to a file in Go.
package main

import (
    "fmt"
    "log"
    "os"
    "syscall"
    "time"
)

func main() {
    fmt.Println("Test Operation")
    f, err := os.OpenFile("test.txt", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
    if err != nil {
        log.Fatal(err)
    }
    for {
        n, err := f.WriteString("blah\n")
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("wrote %d bytes\n", n)
        time.Sleep(2 * time.Second)

        stat, err := f.Stat()
        if err != nil {
            log.Fatal(err)
        }
        if sys := stat.Sys(); sys != nil {
            if stat, ok := sys.(*syscall.Stat_t); ok {
                nlink := uint64(stat.Nlink)
                if nlink == 0 {
                    log.Printf("All filesystem entries to original file removed, exiting")
                    break
                }
            }
        }
    }
}

golang - open file in an empty bash window

Let's say I have this code:
package main

import (
    "fmt"
    "io/ioutil"
)

func check(err error) {
    if err != nil {
        panic(err)
    }
}

func main() {
    file, err := ioutil.ReadFile("test.txt")
    check(err)
    fmt.Print(string(file))
}
When running it with go run, I want the output to be written in a cleared bash window. Is it possible to do so without using any additional open-source repositories?
Thanks in advance.
If clearing the terminal is truly part of your program's responsibility, then check out the answers to this question: How can I clear the terminal screen in Go?
However, if you just want to clear the screen as part of your development process, I would keep it simple and do something like this:
clear && go run *.go

Getting IP addresses from big nfcapd binary files

I need to get information about source and destination IPs from an nfcapd binary file. The problem is the file's size. I know that it is not desirable to open and read very large files (more than 1 GB) with the io or os packages.
Here is my rough draft of a start:
package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "os"
    "time"

    "github.com/tehmaze/netflow/netflow5"
)

type Message interface{}

func main() {
    startTime := time.Now()
    getFile := os.Args[1]
    processFile(getFile)
    endTime := time.Since(startTime)
    log.Printf("Program executes in %s", endTime)
}

func processFile(fileName string) {
    file, err := os.Open(fileName)
    // Check if file is not empty. If it is, then exit from program
    if err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
    // Useful to close file after getting information about it
    defer file.Close()
    Read(file)
}

func Read(r io.Reader) (Message, error) {
    data := [2]byte{}
    if _, err := r.Read(data[:]); err != nil {
        return nil, err
    }
    buffer := bytes.NewBuffer(data[:])
    mr := io.MultiReader(buffer, r)
    return netflow5.Read(mr)
}
I want to split the file into chunks of 24 flows and process them concurrently after reading them with the netflow package. But I cannot see how to do that without losing any data during the division.
Please correct me if I missed something in the code or the description. I have spent a lot of time searching the web for a solution and thinking about other possible implementations.
Any help and/or advice will be highly appreciated.
The file has the following properties (command file -I <file_name> in a terminal):
file_name: application/octet-stream; charset=binary
The output of nfdump -r <file_name> has this structure:
Date first seen Duration Proto Src IP Addr:Port Dst IP Addr:Port Packets Bytes Flows
Every property is in its own column.
UPDATE 1:
Unfortunately, it is impossible to parse the file with the netflow package, due to a difference in the binary file structure after nfcapd saves it to disk. This answer was given by one of the nfdump contributors.
The only way, for now, is to run nfdump from the terminal inside the Go program, as pynfdump does.
Another possible future solution is to use gopacket.
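For illustration, a minimal sketch of that run-nfdump approach, assuming nfdump is on the PATH; the capture file name is a hypothetical placeholder:

package main

import (
    "bufio"
    "fmt"
    "log"
    "os/exec"
)

func main() {
    // Run nfdump against the capture file and stream its stdout line by line.
    cmd := exec.Command("nfdump", "-r", "nfcapd.current") // hypothetical file name
    out, err := cmd.StdoutPipe()
    if err != nil {
        log.Fatal(err)
    }
    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }
    scanner := bufio.NewScanner(out)
    for scanner.Scan() {
        // Each line is one flow record; the source and destination IPs are
        // in the "Src IP Addr:Port" and "Dst IP Addr:Port" columns.
        fmt.Println(scanner.Text())
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
    if err := cmd.Wait(); err != nil {
        log.Fatal(err)
    }
}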
IO is almost always going to be the limiting factor when parsing a file, and unless there is heavy computation involved, reading a single file serially is going to be the fastest way to process it.
Wrap the file in a bufio.Reader and give it to the Read function:
file, err := os.Open(fileName)
if err != nil {
    log.Fatal(err)
}
defer file.Close()

packet, err := netflow5.Read(bufio.NewReader(file))
Once it's parsed, you can then split up the records if you need to handle the chunks separately.
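If the per-record work is heavy enough to justify concurrency, here is a generic worker-pool sketch; the record type is hypothetical (not the netflow package's actual type), and the chunk size of 24 is taken from the question:

package main

import (
    "fmt"
    "sync"
)

// record stands in for whatever per-flow type your parser yields;
// it is a placeholder, not a type from the netflow package.
type record struct {
    srcIP, dstIP string
}

func main() {
    chunks := make(chan []record)
    var wg sync.WaitGroup

    // A few workers consume chunks concurrently; the parsing itself stays serial.
    for w := 0; w < 4; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for chunk := range chunks {
                for _, r := range chunk {
                    fmt.Println(r.srcIP, "->", r.dstIP)
                }
            }
        }()
    }

    // Stand-in for records produced by the serial parse, sent out in
    // chunks of 24 as the question suggests.
    parsed := make([]record, 100)
    for i := 0; i < len(parsed); i += 24 {
        end := i + 24
        if end > len(parsed) {
            end = len(parsed)
        }
        chunks <- parsed[i:end]
    }
    close(chunks)
    wg.Wait()
}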
