How to detect deleted file? - go

Writing to a non-existent file does not produce an error in Go.
For example, here's a sample program writing to a file in a loop:
package main

import (
	"log"
	"os"
	"time"
)

func main() {
	f, err := os.OpenFile("mytest.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatal(err)
	}
	for {
		n, err := f.WriteString("blah\n")
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("wrote %d bytes\n", n)
		time.Sleep(2 * time.Second)
	}
}
While this is running, I issue rm mytest.log from the command line and observe that the program does not produce an error on the next call to WriteString(). (I tested on Linux; it may be different on other OSes.)
Is there a way to detect if the file was deleted (other than doing a stat on the file before every write)? And presumably the bytes written are simply discarded by the operating system?

While this is running, I issue rm mytest.log from the command line and observe that the program does not produce an error on the next call to WriteString()
Yes, that's exactly the behavior that's specified. Also, the file hasn't actually been removed. The only thing rm removes is that particular path entry in the filesystem. A single file can have multiple paths, also called hardlinks.
The actual file is deleted only when the last reference to it, either a filesystem entry (link) or a file descriptor (the file held open in a program), has been closed.
This particular behavior of the Unix file model was long used to implement "unnamed" shared memory, by creating and opening a file in /dev/shm and then removing its filesystem entry. Because this way of doing things introduces a race condition, new syscalls were introduced for security-sensitive applications that allow creating anonymous memory maps, and more recently Linux even gained the ability to create a file in a filesystem without creating a path entry (open with the O_TMPFILE flag).
On recent versions of Linux you can even re-create a filesystem entry for a file whose last entry has already been removed, using the linkat syscall.
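Below is a minimal Linux-only sketch of that combination (O_TMPFILE plus linkat) using the golang.org/x/sys/unix package; the directory and the target path are purely illustrative:

package main

import (
	"fmt"
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	// Create an unnamed file on /tmp's filesystem (Linux-only).
	fd, err := unix.Open("/tmp", unix.O_TMPFILE|unix.O_RDWR, 0600)
	if err != nil {
		log.Fatal(err)
	}
	defer unix.Close(fd)

	if _, err := unix.Write(fd, []byte("hello\n")); err != nil {
		log.Fatal(err)
	}

	// Give the file a path entry after the fact via linkat(2),
	// using the /proc/self/fd trick.
	err = unix.Linkat(unix.AT_FDCWD, fmt.Sprintf("/proc/self/fd/%d", fd),
		unix.AT_FDCWD, "/tmp/now-visible.txt", unix.AT_SYMLINK_FOLLOW)
	if err != nil {
		log.Fatal(err)
	}
}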
Update
The question is, do you really want to error out when the last filesystem entry vanishes? It's not a bad condition after all: you can safely write and read without problems, just be aware that once you close the last file descriptor to the file, it is lost.
It is perfectly possible to detect whether the last filesystem entry has been removed and abort file operations if so. However, be aware that such code might introduce its very own share of problems, for example if the program expects to create a new filesystem entry once everything has been written to the file properly, using linkat.
Anyway, what you can do is fstat the file (File.Stat in Go) and look at the number of hardlinks the file has. If that number drops to zero, all filesystem entries are gone. Actually getting at that number is a little bit tricky in Go; it's described in Counting hard links to a file in Go.
package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
	"time"
)

func main() {
	fmt.Println("Test Operation")
	f, err := os.OpenFile("test.txt", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatal(err)
	}
	for {
		n, err := f.WriteString("blah\n")
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("wrote %d bytes\n", n)
		time.Sleep(2 * time.Second)

		// fstat the open file and inspect its hardlink count.
		stat, err := f.Stat()
		if err != nil {
			log.Fatal(err)
		}
		if sys := stat.Sys(); sys != nil {
			if st, ok := sys.(*syscall.Stat_t); ok {
				nlink := uint64(st.Nlink)
				if nlink == 0 {
					log.Printf("All filesystem entries to original file removed, exiting")
					break
				}
			}
		}
	}
}

Related

Reading a file as it is written

Being fairly new to Go, I wrote a simple Go program that continually reads a file as it is written - for example to watch a log file for specific output, which I want to capture as soon as it has been written.
My (naive) code:
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"os"
)

func main() {
	fileHandle, err := os.Open("test.log")
	if err != nil {
		fmt.Println(err)
	}
	fileReader := bufio.NewReader(fileHandle)
	line := bytes.NewBufferString("")
	for {
		content, err := fileReader.ReadBytes('\n')
		line.Write(content)
		if err == io.EOF {
			continue
		}
		if line.String() == "done\r\n" {
			break
		}
		fmt.Println(line.String())
		line.Reset()
	}
	err = fileHandle.Close()
	if err != nil {
		fmt.Println(err)
	}
}
This works just fine: it prints every line of the log file as it is being written, never printing anything until it has a complete line (which is the desired behaviour).
My question: what is the idiomatic way in Go to change this code so that I can run it as a routine that blocks until there is more to read?
I know how to take the code and turn it into a goroutine, but it would still be constantly trying to read the file, hitting io.EOF and continuing. Is there a more idiomatic way of writing this as a goroutine that would avoid needless checking?
Is it a matter of constantly checking .Buffered() instead? Even that seems wasteful and intensive. I can slow it down with time.Sleep() whenever it encounters io.EOF, but I want to minimise the amount of delay. In Go, what is the way to wait for something to be written to the file, with a minimum of delay and the minimum computational and I/O overhead?
(Note: I'm not asking for a rewrite of the code above, but really just an answer to the question itself, perhaps pointing at an existing example, or the appropriate functions in the documentation)
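One common pattern is to block on a filesystem notification instead of spinning on io.EOF. The sketch below uses the third-party github.com/fsnotify/fsnotify package (its Watcher API is assumed here; check the package documentation) and keeps the partial-line buffering from the code above:

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"log"
	"os"

	"github.com/fsnotify/fsnotify"
)

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()
	if err := watcher.Add("test.log"); err != nil {
		log.Fatal(err)
	}

	f, err := os.Open("test.log")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	reader := bufio.NewReader(f)
	line := bytes.NewBufferString("")
	for {
		content, err := reader.ReadBytes('\n')
		line.Write(content)
		if err == io.EOF {
			// Block until the watched file changes, then try again.
			<-watcher.Events
			continue
		}
		if err != nil {
			log.Fatal(err)
		}
		if line.String() == "done\r\n" {
			break
		}
		fmt.Print(line.String())
		line.Reset()
	}
}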

Can't access filesystem in Go Lambda

I have used Lambda functions before, and if I remember correctly I'm supposed to have ~500 MB of (ephemeral) space in /tmp.
Nevertheless, my Go lambda function doesn't seem to interact with the fs properly:
exec.Command("ls -la /").Output() returns empty
exec.Command("rm -rf /tmp/xxx").Run() returns fork/exec : no such file or directory
exec.Command("mkdir -p /tmp/xxx").Run() returns fork/exec : no such file or directory
It's really weird.
It's using the go1.x environment (thus, I guess amazonlinux:2)
UPDATE
I CAN access the fs using Go os functions:
os.RemoveAll("/tmp/xxx")
if _, err := os.Stat("/tmp/xxx"); os.IsNotExist(err) {
	if err := os.Mkdir("/tmp/xxx", os.ModePerm); err != nil {
		return err
	}
}
BUT I really need exec to run afterwards (a binary command) and write a file in that tmp folder. The error in that case is the same (no such file or directory), even though I've just created the folder with the commands above.
You are close. The way you use exec.Command() is not yet 100% correct. Try the following:
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	o, err := exec.Command("ls", "-la", "/tmp").Output()
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Printf("%s\n", o)
}
The first argument to Command() is the program you want to run, and all the following arguments are the program's arguments.
See https://play.golang.org/p/WaVOU0IESmZ
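Applied to the commands from the question, each shell word becomes a separate argument (exec does not invoke a shell, so "rm -rf /tmp/xxx" as a single string is treated as one non-existent program name, hence the fork/exec error):

if err := exec.Command("rm", "-rf", "/tmp/xxx").Run(); err != nil {
	fmt.Println(err)
}
if err := exec.Command("mkdir", "-p", "/tmp/xxx").Run(); err != nil {
	fmt.Println(err)
}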

How to view over thousands of files in a directory through the command line?

I have downloaded over 500 GB of data into a single directory from AWS.
Whenever I try to access that directory, the command line hangs and doesn't show me anything.
I'm trying to run some code that will interact with the files by printing out the path of each file but the command line hangs and then exits the program.
The program definitely starts execution because "Printing file path's" gets displayed to the console.
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

func main() {
	source := os.Args[1] // directory to walk
	fmt.Println("Printing file path's")
	err := filepath.Walk(source, func(fpath string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if !info.IsDir() && filepath.Ext(fpath) == ".txt" {
			fmt.Println(fpath)
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
How should I handle the situation of being able to view all the files in the command line and why is this program not working?
UPDATE:
By using
files, err := dir.Readdir(10)
if err == io.EOF {
	break
}
I was able to snap up the first 10 folders/files in the directory.
Using a loop I could keep doing this until I hit the end of the directory.
This doesn't rely on ordering the files/folders, as the walk function does, and so it's more efficient.
The possible performance issue with filepath.Walk is clearly documented:
The files are walked in lexical order, which makes the output deterministic but means that for very large directories Walk can be inefficient.
Use os.File.Readdir to iterate over files in directory order:
Readdir reads the contents of the directory associated with file and returns a slice of up to n FileInfo values, as would be returned by Lstat, in directory order. Subsequent calls on the same file will yield further FileInfos.
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"time"
)

func main() {
	dir, err := os.Open("/tmp")
	if err != nil {
		log.Fatal(err)
	}
	for {
		files, err := dir.Readdir(10)
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		for _, fi := range files {
			classifier := ""
			if fi.IsDir() {
				classifier = "/"
			}
			fmt.Printf("%v %12d %s%s\n",
				fi.ModTime().UTC().Truncate(time.Second),
				fi.Size(),
				fi.Name(), classifier,
			)
		}
	}
}

Is it safe to write files in mode os.O_APPEND|os.O_WRONLY?

I have a Go function that appends a line to a file:
func AppendLine(p string, s string) error {
	f, err := os.OpenFile(p, os.O_APPEND|os.O_WRONLY, 0600)
	if err != nil {
		return errors.WithStack(err)
	}
	// Defer the close only after the error check: if OpenFile fails,
	// f is nil and there is nothing to close.
	defer f.Close()
	_, err = f.WriteString(s + "\n")
	return errors.WithStack(err)
}
I'm wondering if the flags os.O_APPEND|os.O_WRONLY make this a safe operation. Is there a guarantee that no matter what happens (even if the process gets shut off in the middle of writing) the existing file contents cannot be deleted?
The os package is a thin wrapper around system calls, so you get the guarantees provided by the operating system. In this case Linux guarantees that a write on a file opened with the O_APPEND flag is performed atomically: the file offset is moved to the current end of file and the data is written with no intervening modification, so concurrent appenders cannot overwrite each other (http://man7.org/linux/man-pages/man2/open.2.html). Since O_TRUNC is not passed, existing contents are never truncated either; the remaining risk if the process dies mid-write is a partial final line, not lost earlier data.
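To illustrate the guarantee, here is a minimal sketch (the file name append_demo.log is made up for the example) in which two goroutines append through separate descriptors; on a local Linux filesystem their lines interleave but never clobber one another:

package main

import (
	"fmt"
	"log"
	"os"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Each writer gets its own descriptor; O_APPEND makes the
			// kernel position every write at the current end of file.
			f, err := os.OpenFile("append_demo.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
			if err != nil {
				log.Fatal(err)
			}
			defer f.Close()
			for j := 0; j < 100; j++ {
				fmt.Fprintf(f, "writer %d line %d\n", id, j)
			}
		}(i)
	}
	wg.Wait()
}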

Getting IP addresses from big nfcapd binary files

I need to get information about source and destination IPs from an nfcapd binary file. The problem is the file's size. I know that it is not desirable to open and read very large files (more than 1 GB) with the io or os package.
Here is my rough draft of a start:
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
	"time"

	"github.com/tehmaze/netflow/netflow5"
)

type Message interface{}

func main() {
	startTime := time.Now()
	getFile := os.Args[1]
	processFile(getFile)
	endTime := time.Since(startTime)
	log.Printf("Program executes in %s", endTime)
}

func processFile(fileName string) {
	file, err := os.Open(fileName)
	// Exit if the file could not be opened.
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	// Close the file once processing is done.
	defer file.Close()
	if _, err := Read(file); err != nil {
		log.Fatal(err)
	}
}

func Read(r io.Reader) (Message, error) {
	// Peek at the first two bytes, then stitch them back in front of
	// the remaining stream for the netflow5 parser.
	data := [2]byte{}
	if _, err := io.ReadFull(r, data[:]); err != nil {
		return nil, err
	}
	buffer := bytes.NewBuffer(data[:])
	mr := io.MultiReader(buffer, r)
	return netflow5.Read(mr)
}
I want to split the file into chunks of 24 flows and process them concurrently after reading them with the netflow package. But I cannot see how to do that without losing any data during the division.
Please correct me if I missed something in the code or description. I have spent a lot of time searching the web for a solution and thinking about other possible implementations.
Any help and/or advice will be highly appreciated.
The file has the following properties (from the command file -I <file_name> in the terminal):
file_name: application/octet-stream; charset=binary
The output of file after command nfdump -r <file_name> has this structure:
Date first seen Duration Proto Src IP Addr:Port Dst IP Addr:Port Packets Bytes Flows
Every property is in its own column.
UPDATE 1:
Unfortunately, it is impossible to parse the file with the netflow package because the binary file structure differs once nfcapd saves it to disk. This answer was given by one of the nfdump contributors.
The only way for now is to run nfdump from within the Go program, as pynfdump does.
Another possible solution in the future is to use gopacket.
IO is almost always going to be the limiting factor when parsing a file, and unless there is heavy computation involved, reading a single file serially is going to be the fastest way to process it.
Wrap the file in a bufio.Reader and give it to the Read function:
file, err := os.Open(fileName)
if err != nil {
	log.Fatal(err)
}
defer file.Close()

packet, err := netflow5.Read(bufio.NewReader(file))
Once it's parsed, you can then split up the records if you need to handle the chunks separately.
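If the per-record work is heavy enough to warrant concurrency, a worker pool over the parsed records is one option. The sketch below uses placeholder string records, since the exact record type returned by the netflow5 package is not shown here:

package main

import (
	"fmt"
	"sync"
)

// process stands in for whatever per-record work is needed,
// e.g. extracting source and destination IPs.
func process(rec string) {
	fmt.Println(rec)
}

func main() {
	records := []string{"flow1", "flow2", "flow3"} // placeholder records

	const workers = 4
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for rec := range jobs {
				process(rec)
			}
		}()
	}
	for _, r := range records {
		jobs <- r
	}
	close(jobs)
	wg.Wait()
}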
