I've modified the official documentation example for the zlib package to use an opened file rather than a set of hardcoded bytes (code below).
The code reads the contents of a source text file and compresses it with the zlib package. I then try to read back the compressed file and print its decompressed contents to stdout.
The code doesn't error, but it also doesn't do what I expect it to do, which is to display the decompressed file contents on stdout.
Also: is there another way of displaying this information, rather than using io.Copy?
package main
import (
"compress/zlib"
"io"
"log"
"os"
)
func main() {
var err error
// This defends against an error preventing `defer` from being called
// As log.Fatal otherwise calls `os.Exit`
defer func() {
if err != nil {
log.Fatalln("\nDeferred log: \n", err)
}
}()
src, err := os.Open("source.txt")
if err != nil {
return
}
defer src.Close()
dest, err := os.Create("new.txt")
if err != nil {
return
}
defer dest.Close()
zdest := zlib.NewWriter(dest)
defer zdest.Close()
if _, err := io.Copy(zdest, src); err != nil {
return
}
n, err := os.Open("new.txt")
if err != nil {
return
}
r, err := zlib.NewReader(n)
if err != nil {
return
}
defer r.Close()
io.Copy(os.Stdout, r)
err = os.Remove("new.txt")
if err != nil {
return
}
}
Your deferred func doesn't reliably do its job, because err is shadowed in the if statements that use := (for example if _, err := io.Copy(zdest, src); err != nil), so the outer err checked by the defer never sees those errors. If you want a deferred function to run before the program exits, do the work in a separate function that returns the error, and call log.Fatal on the returned error.
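A minimal sketch of that pattern (the run function and file name here are placeholders, not code from the question):

func run() error {
    src, err := os.Open("source.txt")
    if err != nil {
        return err
    }
    defer src.Close() // runs when run() returns, before main decides to exit
    // ... do the compression / decompression work here ...
    return nil
}

func main() {
    if err := run(); err != nil {
        log.Fatal(err) // os.Exit happens only after run's deferred calls have executed
    }
}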
As for why you're not seeing any output, it's because you're deferring all the Close calls. The zlib.Writer isn't flushed until after the function exits, and neither is the destination file. Call Close() explicitly where you need it.
zdest := zlib.NewWriter(dest)
if _, err := io.Copy(zdest, src); err != nil {
log.Fatal(err)
}
zdest.Close()
dest.Close()
I think you messed up the code logic with all this defer stuff and your "trick" err checking.
Data is only guaranteed to be written to a file once it is flushed or the file is closed. You just copy into new.txt without closing it before opening it to read it back.
Deferring the closing of a file is neat inside a function which has multiple exits: it makes sure the file is closed once the function is left. But your main requires new.txt to be closed after the copy, before re-opening it. So don't defer the close here.
BTW: Your defense against log.Fatal terminating the program without calling your defers is, well, at least strange. The files are all put into a proper state by the OS on exit; there is absolutely no need to complicate things like this.
Check the error from the second Copy:
2015/12/22 19:00:33
Deferred log:
unexpected EOF
exit status 1
The thing is, you need to close zdest immediately after you're done writing. Close it after the first Copy and it works.
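A condensed sketch of that ordering, reusing the variable names from the question (error handling trimmed down to log.Fatal):

zdest := zlib.NewWriter(dest)
if _, err := io.Copy(zdest, src); err != nil {
    log.Fatal(err)
}
zdest.Close() // flushes the zlib stream and writes its footer
dest.Close()  // new.txt is now complete on disk

n, err := os.Open("new.txt")
if err != nil {
    log.Fatal(err)
}
r, err := zlib.NewReader(n)
if err != nil {
    log.Fatal(err)
}
io.Copy(os.Stdout, r) // the decompressed contents now show up on stdout
r.Close()
n.Close()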
I would have suggested using io.MultiWriter. That way you read from src only once. Not much gain for small files, but it is faster for bigger files.
w := io.MultiWriter(dest, os.Stdout)
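A sketch of how that could look in the question's code, assuming the goal is to print the plain text while still writing the compressed stream to new.txt (note it tees into the zlib writer rather than directly into dest):

zdest := zlib.NewWriter(dest)
w := io.MultiWriter(zdest, os.Stdout) // plain bytes go to stdout, compressed bytes to dest
if _, err := io.Copy(w, src); err != nil {
    log.Fatal(err)
}
zdest.Close()
dest.Close()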
I have around 220'000 image files (.png) to create. I run into this error message when trying to create the 1'081st file:
panic: open /media/Snaps/pics/image1081_0.png: too many open files
I've added the defer w.Close() line but it did not change the error.
i := 1
for i <= 223129 {
(some other code to prepare the data and create the chart)
img := vgimg.New(450, 600)
dc := draw.New(img)
canvases := table.Align(plots, dc)
plots[0][0].Draw(canvases[0][0])
plots[1][0].Draw(canvases[1][0])
plots[2][0].Draw(canvases[2][0])
testFile := "/media/Snaps/pics/image"+strconv.Itoa(i+60)+"_"+gain_loss+".png"
w, err := os.Create(testFile)
if err != nil {
panic(err)
}
defer w.Close()
png := vgimg.PngCanvas{Canvas: img}
if _, err := png.WriteTo(w); err != nil {
panic(err)
}
//move to next image
i = i + 1
}
Surely this limit can be worked around? Maybe I'm not closing the files properly?
The Go Programming Language Specification
Defer statements
A "defer" statement invokes a function whose execution is deferred to
the moment the surrounding function returns, either because the
surrounding function executed a return statement, reached the end of
its function body, or because the corresponding goroutine is
panicking.
DeferStmt = "defer" Expression .
The expression must be a function or method call; it cannot be
parenthesized. Calls of built-in functions are restricted as for
expression statements.
Each time a "defer" statement executes, the function value and
parameters to the call are evaluated as usual and saved anew but the
actual function is not invoked. Instead, deferred functions are
invoked immediately before the surrounding function returns, in the
reverse order they were deferred. If a deferred function value
evaluates to nil, execution panics when the function is invoked, not
when the "defer" statement is executed.
In other words, if you are processing files in a loop, put the processing for a single file in a separate function to pair the Open with the defer Close(). This avoids the "too many open files" error.
For example, use a file processing structure like this to guarantee each file is closed immediately after use.
package main
import (
"fmt"
"io/ioutil"
"os"
)
// process single file
func processFile(name string) error {
f, err := os.Open(name)
if err != nil {
return err
}
defer f.Close()
fi, err := f.Stat()
if err != nil {
return err
}
fmt.Println(fi.Name(), fi.Size())
return nil
}
func main() {
wd, err := os.Getwd()
if err != nil {
fmt.Fprintln(os.Stderr, err)
return
}
fis, err := ioutil.ReadDir(wd)
if err != nil {
fmt.Fprintln(os.Stderr, err)
return
}
// process all files
for _, fi := range fis {
if err := processFile(fi.Name()); err != nil {
fmt.Fprintln(os.Stderr, err)
}
}
}
Playground: https://play.golang.org/p/FrBWqlMOzaS
Output:
dev 1644
etc 1644
tmp 548
usr 822
Deferred statements are not executed until the surrounding function returns, that is why your files stay open until after the for-loop.
To fix this you can simply insert an anonymous function call inside the loop:
for ... {
func() {
w, err := os.Create(testFile)
if err != nil {
panic(err)
}
defer w.Close()
...
}()
}
That way, after each iteration of the loop, the current file is closed.
OK, I got it: I changed defer w.Close() to w.Close() and moved it after the png.WriteTo call:
png := vgimg.PngCanvas{Canvas: img}
if _, err := png.WriteTo(w); err != nil {
    panic(err)
}
w.Close()
I'm now above 10'000 images and running...
Consider the following code snippet:
func a(fd int) {
    file := os.NewFile(uintptr(fd), "")
    defer func() {
        if err := file.Close(); err != nil {
            fmt.Printf("%v", err)
        }
    }()
}
This piece of code is legit and will work OK: the file will be closed upon returning from a().
However, the following will not work correctly:
func a(fd int) {
    file := os.NewFile(uintptr(fd), "")
    defer func() {
        if err := syscall.Close(int(file.Fd())); err != nil {
            fmt.Printf("%v", err)
        }
    }()
}
The error that will occasionally be received is bad file descriptor, due to the fact that NewFile sets a finalizer which, during garbage collection, will close the file itself.
What's unclear to me is that the deferred function still has a reference to the file, so theoretically it shouldn't be garbage collected yet.
So why does the Go runtime behave that way?
The problem with the code is that after file.Fd() returns, file is unreachable, so it may be closed by the finalizer (i.e. garbage collected).
According to runtime.SetFinalizer:
For example, if p points to a struct that contains a file descriptor d, and p has a finalizer that closes that file descriptor, and if the last use of p in a function is a call to syscall.Write(p.d, buf, size), then p may be unreachable as soon as the program enters syscall.Write. The finalizer may run at that moment, closing p.d, causing syscall.Write to fail because it is writing to a closed file descriptor (or, worse, to an entirely different file descriptor opened by a different goroutine). To avoid this problem, call runtime.KeepAlive(p) after the call to syscall.Write.
runtime.KeepAlive usage:
KeepAlive marks its argument as currently reachable. This ensures that the object is not freed, and its finalizer is not run, before the point in the program where KeepAlive is called.
func a(fd int) {
    file := os.NewFile(uintptr(fd), "")
    defer func() {
        if err := syscall.Close(int(file.Fd())); err != nil {
            fmt.Printf("%v", err)
        }
        runtime.KeepAlive(file)
    }()
}
I am using os.Pipe() in my program, but for some reason it gives a bad file descriptor error each time I try to write or read data from it.
Is there something I am doing wrong?
Below is the code
package main
import (
"fmt"
"os"
)
func main() {
writer, reader, err := os.Pipe()
if err != nil {
fmt.Println(err)
}
_,err= writer.Write([]byte("hello"))
if err != nil {
fmt.Println(err)
}
var data []byte
_, err = reader.Read(data)
if err != nil {
fmt.Println(err)
}
fmt.Println(string(data))
}
Output:
write |0: Invalid argument
read |1: Invalid argument
You are using an os.Pipe, which returns a pair of FIFO-connected files from the OS. This is different from an io.Pipe, which is implemented in Go.
The invalid argument errors are because you are reading and writing to the wrong files. The signature of os.Pipe is
func Pipe() (r *File, w *File, err error)
which shows that the return values are in the order "reader, writer, error".
and io.Pipe:
func Pipe() (*PipeReader, *PipeWriter)
Also returning in the order "reader, writer"
When you check the error from the os.Pipe function, you are only printing the value. If there was an error, the files are invalid. You need to return or exit on that error.
Pipes are also blocking (though an os.Pipe has a small, hard coded buffer), so you need to read and write asynchronously. If you swapped this for an io.Pipe it would deadlock immediately. Dispatch the Read method inside a goroutine and wait for it to complete.
Finally, you are reading into a nil slice, which will read nothing. You need to allocate space to read into, and you need to record the number of bytes read to know how much of the buffer is used.
A more correct version of your example would look like:
reader, writer, err := os.Pipe()
if err != nil {
log.Fatal(err)
}
var wg sync.WaitGroup
wg.Add(1)
go func() {
defer wg.Done()
data := make([]byte, 1024)
n, err := reader.Read(data)
if n > 0 {
fmt.Println(string(data[:n]))
}
if err != nil && err != io.EOF {
fmt.Println(err)
}
}()
_, err = writer.Write([]byte("hello"))
if err != nil {
fmt.Println(err)
}
wg.Wait()
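For comparison, a rough sketch of the same exchange using an io.Pipe, which is implemented in Go and has no buffer at all, so the write must happen in its own goroutine or it will block forever (io.ReadAll is ioutil.ReadAll on older Go versions):

pr, pw := io.Pipe()

go func() {
    // Writes to an io.Pipe block until a reader consumes them.
    pw.Write([]byte("hello"))
    pw.Close() // signals EOF to the reader
}()

data, err := io.ReadAll(pr)
if err != nil {
    log.Fatal(err)
}
fmt.Println(string(data))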
I am trying to build a zip archive from a large number of small-to-medium sized files. I want to be able to do this concurrently, since compression is CPU intensive, and I'm running on a multi-core server. Also, I don't want to have the whole archive in memory, since it might turn out to be large.
My question is: do I have to compress every file and then manually combine everything together with the zip header, checksum, etc.?
Any help would be greatly appreciated.
I don't think you can combine the zip headers.
What you could do is, run the zip.Writer sequentially, in a separate goroutine, and then spawn a new goroutine for each file that you want to read, and pipe those to the goroutine that is zipping them.
This should reduce the IO overhead that you get by reading the files sequentially, although it probably won't leverage multiple cores for the archiving itself.
Here's a working example. Note that, to keep things simple, it does not handle errors nicely (it just panics if something goes wrong), and it does not use the defer statement too much, to demonstrate the order in which things should happen. Since defer is LIFO, it can sometimes be confusing when you stack a lot of them together.
package main
import (
"archive/zip"
"io"
"os"
"sync"
)
func ZipWriter(files chan *os.File) *sync.WaitGroup {
f, err := os.Create("out.zip")
if err != nil {
panic(err)
}
var wg sync.WaitGroup
wg.Add(1)
zw := zip.NewWriter(f)
go func() {
// Note the order (LIFO):
defer wg.Done() // 2. signal that we're done
defer f.Close() // 1. close the file
var err error
var fw io.Writer
for f := range files {
// Loop until channel is closed.
if fw, err = zw.Create(f.Name()); err != nil {
panic(err)
}
if _, err = io.Copy(fw, f); err != nil {
panic(err)
}
if err = f.Close(); err != nil {
panic(err)
}
}
// The zip writer must be closed *before* f.Close() is called!
if err = zw.Close(); err != nil {
panic(err)
}
}()
return &wg
}
func main() {
files := make(chan *os.File)
wait := ZipWriter(files)
// Send all files to the zip writer.
var wg sync.WaitGroup
wg.Add(len(os.Args)-1)
for i, name := range os.Args {
if i == 0 {
continue
}
// Read each file in parallel:
go func(name string) {
defer wg.Done()
f, err := os.Open(name)
if err != nil {
panic(err)
}
files <- f
}(name)
}
wg.Wait()
// Once we're done sending the files, we can close the channel.
close(files)
// This will cause ZipWriter to break out of the loop, close the file,
// and mark the WaitGroup it returned as done (unblocking wait.Wait below):
wait.Wait()
}
Usage: go run example.go /path/to/*.log.
This is the order in which things should be happening:
1. Open the output file for writing.
2. Create a zip.Writer with that file.
3. Kick off a goroutine listening for files on a channel.
4. Go through each file; this can be done in one goroutine per file.
5. Send each file to the goroutine created in step 3.
6. After processing each file in said goroutine, close the file to free up resources.
7. Once each file has been sent to said goroutine, close the channel.
8. Wait until the zipping has been done (which is done sequentially).
9. Once zipping is done (channel exhausted), the zip writer should be closed.
10. Only when the zip writer is closed should the output file be closed.
11. Finally, everything is closed, so mark the sync.WaitGroup as done to tell the calling function that we're good to go. (A channel could also be used here, but sync.WaitGroup seems more elegant.)
When you get the signal from the zip writer that everything is properly closed, you can exit from main and terminate nicely.
This might not answer your question, but I used similar code to generate zip archives on-the-fly for a web service some time ago. It performed quite well, even though the actual zipping was done in a single goroutine. Overcoming the IO bottleneck can already be an improvement.
From the look of it, you won't be able to parallelise the compression using the standard library archive/zip package because:
Compression is performed by the io.Writer returned by zip.Writer.Create or CreateHeader.
Calling Create/CreateHeader implicitly closes the writer returned by the previous call.
So passing the writers returned by Create to multiple goroutines and writing to them in parallel will not work.
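A minimal sketch of that constraint (the entry names are made up; the point is that each writer returned by Create is only usable until the next Create, CreateHeader, or Close call):

package main

import (
    "archive/zip"
    "bytes"
    "log"
)

func main() {
    var buf bytes.Buffer
    zw := zip.NewWriter(&buf)

    w1, err := zw.Create("a.txt")
    if err != nil {
        log.Fatal(err)
    }
    w1.Write([]byte("first entry")) // must be written before the next Create call

    // This implicitly finishes the "a.txt" entry; w1 must not be written to any more.
    w2, err := zw.Create("b.txt")
    if err != nil {
        log.Fatal(err)
    }
    w2.Write([]byte("second entry"))

    if err := zw.Close(); err != nil {
        log.Fatal(err)
    }
}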
If you wanted to write your own parallel zip writer, you'd probably want to structure it something like this:
1. Have multiple goroutines compress files using the compress/flate package, and keep track of the CRC32 value and length of the uncompressed data. The output should be directed to temporary files. Note the compressed size of the data. (A sketch of this step follows below.)
2. Once everything has been compressed, start writing the Zip file starting with the header.
3. For each compressed file, write out the file header followed by the contents of the corresponding temporary file.
4. Write out the central directory record and end record at the end of the file. All the required information should be available at this point.
For added parallelism, step 1 could be performed in parallel with the remaining steps by using a channel to indicate when compression of each file completes.
Due to the file format, you won't be able to perform parallel compression without either storing compressed data in memory or in temporary files.
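As a rough sketch of step 1 only (the compressed struct, the temp-file naming, and the example main are made up for illustration; writing the file headers and central directory from steps 2-4 is not shown):

package main

import (
    "compress/flate"
    "hash/crc32"
    "io"
    "log"
    "os"
)

// compressed records what the later zip-writing steps need to know about one file.
type compressed struct {
    Name             string // name to store in the zip entry
    TempPath         string // temporary file holding the deflated data
    CRC32            uint32 // CRC-32 of the uncompressed data
    UncompressedSize int64
    CompressedSize   int64
}

// compressToTemp deflates src into a temporary file and records the CRC-32
// plus the uncompressed and compressed sizes.
func compressToTemp(name, src string) (*compressed, error) {
    in, err := os.Open(src)
    if err != nil {
        return nil, err
    }
    defer in.Close()

    tmp, err := os.CreateTemp("", "zip-part-*")
    if err != nil {
        return nil, err
    }
    defer tmp.Close()

    fw, err := flate.NewWriter(tmp, flate.DefaultCompression)
    if err != nil {
        return nil, err
    }
    crc := crc32.NewIEEE()

    // A single read of the source feeds both the deflater and the CRC.
    n, err := io.Copy(io.MultiWriter(fw, crc), in)
    if err != nil {
        return nil, err
    }
    if err := fw.Close(); err != nil { // flush the deflate stream
        return nil, err
    }

    csize, err := tmp.Seek(0, io.SeekCurrent) // how many deflated bytes were written
    if err != nil {
        return nil, err
    }

    return &compressed{
        Name:             name,
        TempPath:         tmp.Name(),
        CRC32:            crc.Sum32(),
        UncompressedSize: n,
        CompressedSize:   csize,
    }, nil
}

func main() {
    if len(os.Args) < 2 {
        log.Fatal("usage: go run example.go <file>")
    }
    c, err := compressToTemp("example.txt", os.Args[1])
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("%+v", c)
}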
With Go 1.17, parallel compression and merging of zip files are possible using the archive/zip package.
An example is below. In it, I create zip workers that each build an individual zip file, and an entry provider worker which sends the entries to be added via a channel to the zip workers. Actual files could be provided to the zip workers, but I skipped that part.
package main
import (
"archive/zip"
"context"
"fmt"
"io"
"log"
"os"
"strings"
"golang.org/x/sync/errgroup"
)
const numOfZipWorkers = 10
type entry struct {
name string
rc io.ReadCloser
}
func main() {
log.SetFlags(log.LstdFlags | log.Lshortfile)
entCh := make(chan entry, numOfZipWorkers)
zpathCh := make(chan string, numOfZipWorkers)
group, ctx := errgroup.WithContext(context.Background())
for i := 0; i < numOfZipWorkers; i++ {
group.Go(func() error {
return zipWorker(ctx, entCh, zpathCh)
})
}
group.Go(func() error {
defer close(entCh) // Signal workers to stop.
return entryProvider(ctx, entCh)
})
err := group.Wait()
if err != nil {
log.Fatal(err)
}
f, err := os.OpenFile("output.zip", os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0644)
if err != nil {
log.Fatal(err)
}
zw := zip.NewWriter(f)
close(zpathCh)
for path := range zpathCh {
zrd, err := zip.OpenReader(path)
if err != nil {
log.Fatal(err)
}
for _, zf := range zrd.File {
err := zw.Copy(zf)
if err != nil {
log.Fatal(err)
}
}
_ = zrd.Close()
_ = os.Remove(path)
}
err = zw.Close()
if err != nil {
log.Fatal(err)
}
err = f.Close()
if err != nil {
log.Fatal(err)
}
}
func entryProvider(ctx context.Context, entCh chan<- entry) error {
for i := 0; i < 2*numOfZipWorkers; i++ {
select {
case <-ctx.Done():
return ctx.Err()
case entCh <- entry{
name: fmt.Sprintf("file_%d", i+1),
rc: io.NopCloser(strings.NewReader(fmt.Sprintf("content %d", i+1))),
}:
}
}
return nil
}
func zipWorker(ctx context.Context, entCh <-chan entry, zpathch chan<- string) error {
f, err := os.CreateTemp(".", "tmp-part-*")
if err != nil {
return err
}
zw := zip.NewWriter(f)
Loop:
for {
var (
ent entry
ok bool
)
select {
case <-ctx.Done():
err = ctx.Err()
break Loop
case ent, ok = <-entCh:
if !ok {
break Loop
}
}
hdr := &zip.FileHeader{
Name: ent.name,
Method: zip.Deflate, // zip.Store can also be used.
}
hdr.SetMode(0644)
w, e := zw.CreateHeader(hdr)
if e != nil {
_ = ent.rc.Close()
err = e
break
}
_, e = io.Copy(w, ent.rc)
_ = ent.rc.Close()
if e != nil {
err = e
break
}
}
if e := zw.Close(); e != nil && err == nil {
err = e
}
if e := f.Close(); e != nil && err == nil {
err = e
}
if err == nil {
select {
case <-ctx.Done():
err = ctx.Err()
case zpathch <- f.Name():
}
}
return err
}
I'm trying to parse some log files as they're being written in Go but I'm not sure how I would accomplish this without rereading the file again and again while checking for changes.
I'd like to be able to read to EOF, wait until the next line is written, and read to EOF again, etc. It feels a bit like what tail -f does.
I have written a Go package -- github.com/hpcloud/tail -- to do exactly this.
t, err := tail.TailFile("/var/log/nginx.log", tail.Config{Follow: true})
for line := range t.Lines {
fmt.Println(line.Text)
}
...
Quoting kostix's answer:
in real life files might be truncated, replaced or renamed (because that's what tools like logrotate are supposed to do).
If a file gets truncated, it will automatically be re-opened. To support re-opening renamed files (due to logrotate, etc.), you can set Config.ReOpen, viz.:
t, err := tail.TailFile("/var/log/nginx.log", tail.Config{
Follow: true,
ReOpen: true})
for line := range t.Lines {
fmt.Println(line.Text)
}
Config.ReOpen is analogous to tail -F (capital F):
-F The -F option implies the -f option, but tail will also check to see if the file being followed has been
renamed or rotated. The file is closed and reopened when tail detects that the filename being read from
has a new inode number. The -F option is ignored if reading from standard input rather than a file.
You have to either watch the file for changes (using an OS-specific subsystem to accomplish this) or poll it periodically to see whether its modification time (and size) changed. In either case, after reading another chunk of data you remember the file offset and restore it before reading another chunk after detecting the change.
But note that this seems to be easy only on paper: in real life files might be truncated, replaced or renamed (because that's what tools like logrotate are supposed to do).
See this question for more discussion of this problem.
A simple example:
package main
import (
"bufio"
"fmt"
"io"
"os"
"time"
)
func tail(filename string, out io.Writer) {
f, err := os.Open(filename)
if err != nil {
panic(err)
}
defer f.Close()
r := bufio.NewReader(f)
info, err := f.Stat()
if err != nil {
panic(err)
}
oldSize := info.Size()
for {
for line, prefix, err := r.ReadLine(); err != io.EOF; line, prefix, err = r.ReadLine() {
if prefix {
fmt.Fprint(out, string(line))
} else {
fmt.Fprintln(out, string(line))
}
}
pos, err := f.Seek(0, io.SeekCurrent)
if err != nil {
panic(err)
}
for {
time.Sleep(time.Second)
newinfo, err := f.Stat()
if err != nil {
panic(err)
}
newSize := newinfo.Size()
if newSize != oldSize {
if newSize < oldSize {
f.Seek(0, 0)
} else {
f.Seek(pos, io.SeekStart)
}
r = bufio.NewReader(f)
oldSize = newSize
break
}
}
}
}
func main() {
tail("x.txt", os.Stdout)
}
I'm also interested in doing this, but haven't (yet) had the time to tackle it. One approach that occurred to me is to let "tail" do the heavy lifting. It would likely make your tool platform-specific, but that may be ok. The basic idea would be to use Cmd from the "os/exec" package to follow the file. You could fork a process that was the equivalent of "tail --retry --follow=name prog.log", and then listen to its Stdout using the Stdout reader on the Cmd object.
Sorry I know it's just a sketch, but maybe it's helpful.
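A rough sketch of that idea (it assumes a GNU tail binary on PATH that supports --retry and --follow=name; the log file name is made up):

package main

import (
    "bufio"
    "fmt"
    "log"
    "os/exec"
)

func main() {
    // Let the external tail binary deal with rotation and truncation.
    cmd := exec.Command("tail", "--retry", "--follow=name", "prog.log")
    out, err := cmd.StdoutPipe()
    if err != nil {
        log.Fatal(err)
    }
    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }

    sc := bufio.NewScanner(out)
    for sc.Scan() {
        fmt.Println("got line:", sc.Text())
    }
    if err := sc.Err(); err != nil {
        log.Fatal(err)
    }
    if err := cmd.Wait(); err != nil {
        log.Fatal(err)
    }
}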
There are many ways to do this. On Linux, one can use the inotify interface to do this; other operating systems offer similar facilities.
One can use this package: https://github.com/fsnotify/fsnotify
Sample code:
watcher, err := fsnotify.NewWatcher()
if err != nil {
    log.Fatal(err)
}
defer watcher.Close()

err = watcher.Add(fileName)
if err != nil {
    log.Fatal(err)
}

for {
    select {
    case event := <-watcher.Events:
        if event.Op&fsnotify.Write == fsnotify.Write {
            log.Println("modified file:", event.Name)
        }
    }
}
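To actually read the newly appended data rather than just logging that the file changed, one approach is to keep the file open and, instead of just logging in the loop above, copy from the file's current read position whenever a Write event arrives. A rough sketch, reusing watcher and fileName from the sample above and assuming the file is only ever appended to (imports: io, log, os, plus fsnotify):

f, err := os.Open(fileName)
if err != nil {
    log.Fatal(err)
}
defer f.Close()
// Start from the current end of the file, like tail -f does.
if _, err := f.Seek(0, io.SeekEnd); err != nil {
    log.Fatal(err)
}

for event := range watcher.Events {
    if event.Op&fsnotify.Write == fsnotify.Write {
        // The file's read position is wherever the previous copy stopped,
        // so this picks up only the newly appended bytes.
        if _, err := io.Copy(os.Stdout, f); err != nil {
            log.Fatal(err)
        }
    }
}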
Hope this helps!