"too many open files" with os.Create - go

I have around 220,000 image files (.png) to create. I run into this error message when trying to create the 1,081st file:
panic: open /media/Snaps/pics/image1081_0.png: too many open files
I've added the defer w.Close() line but it did not change the error.
i := 1
for i <= 223129 {
(some other code to prepare the data and create the chart)
img := vgimg.New(450, 600)
dc := draw.New(img)
canvases := table.Align(plots, dc)
plots[0][0].Draw(canvases[0][0])
plots[1][0].Draw(canvases[1][0])
plots[2][0].Draw(canvases[2][0])
testFile := "/media/Snaps/pics/image"+strconv.Itoa(i+60)+"_"+gain_loss+".png"
w, err := os.Create(testFile)
if err != nil {
panic(err)
}
defer w.Close()
png := vgimg.PngCanvas{Canvas: img}
if _, err := png.WriteTo(w); err != nil {
panic(err)
}
//move to next image
i = i + 1
}
Surely this limit can be worked around? Maybe I'm not closing the files properly?

The Go Programming Language Specification
Defer statements
A "defer" statement invokes a function whose execution is deferred to
the moment the surrounding function returns, either because the
surrounding function executed a return statement, reached the end of
its function body, or because the corresponding goroutine is
panicking.
DeferStmt = "defer" Expression .
The expression must be a function or method call; it cannot be
parenthesized. Calls of built-in functions are restricted as for
expression statements.
Each time a "defer" statement executes, the function value and
parameters to the call are evaluated as usual and saved anew but the
actual function is not invoked. Instead, deferred functions are
invoked immediately before the surrounding function returns, in the
reverse order they were deferred. If a deferred function value
evaluates to nil, execution panics when the function is invoked, not
when the "defer" statement is executed.
In other words, if you are processing files in a loop, put the processing for a single file in a separate function to pair the Open with the defer Close(). This avoids the "too many open files" error.
For example, use a file processing structure like this to guarantee each file is closed immediately after use.
package main
import (
"fmt"
"io/ioutil"
"os"
)
// process single file
func processFile(name string) error {
f, err := os.Open(name)
if err != nil {
return err
}
defer f.Close()
fi, err := f.Stat()
if err != nil {
return err
}
fmt.Println(fi.Name(), fi.Size())
return nil
}
func main() {
wd, err := os.Getwd()
if err != nil {
fmt.Fprintln(os.Stderr, err)
return
}
fis, err := ioutil.ReadDir(wd)
if err != nil {
fmt.Fprintln(os.Stderr, err)
return
}
// process all files
for _, fi := range fis {
if err := processFile(fi.Name()); err != nil {
fmt.Fprintln(os.Stderr, err)
}
}
}
Playground: https://play.golang.org/p/FrBWqlMOzaS
Output:
dev 1644
etc 1644
tmp 548
usr 822

Deferred statements are not executed until the surrounding function returns; that is why your files stay open until after the for loop.
To fix this you can simply insert an anonymous function call inside the loop:
for ... {
func() {
w, err := os.Create(testFile)
if err != nil {
panic(err)
}
defer w.Close()
...
}()
}
That way, after each iteration of the loop, the current file is closed.

OK, I got it: I changed defer w.Close() to w.Close() and moved it after
png := vgimg.PngCanvas{Canvas: img}
if _, err := png.WriteTo(w); err != nil {
panic(err)
}
I'm now above 10'000 images and running...
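For reference, a minimal sketch of that structure as a helper function - saveChart is a made-up name here, and the vgimg calls just mirror the loop body above - so that Create and Close are paired per image and the Close error actually gets checked:
// saveChart writes one chart image to path and closes the file before returning.
func saveChart(img *vgimg.Canvas, path string) error {
	w, err := os.Create(path)
	if err != nil {
		return err
	}
	png := vgimg.PngCanvas{Canvas: img}
	if _, err := png.WriteTo(w); err != nil {
		w.Close() // best effort; the write error is the one worth reporting
		return err
	}
	return w.Close() // Close errors matter when writing, so return them
}
Inside the loop it then becomes if err := saveChart(img, testFile); err != nil { panic(err) }, and at most one file is open at any time.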

Related

Using io.Pipe() for sending and receiving messages

I am using os.Pipe() in my program, but for some reason it gives a bad file descriptor error each time I try to write or read data from it.
Is there something I am doing wrong?
Below is the code
package main
import (
"fmt"
"os"
)
func main() {
writer, reader, err := os.Pipe()
if err != nil {
fmt.Println(err)
}
_,err= writer.Write([]byte("hello"))
if err != nil {
fmt.Println(err)
}
var data []byte
_, err = reader.Read(data)
if err != nil {
fmt.Println(err)
}
fmt.Println(string(data))
}
output :
write |0: Invalid argument
read |1: Invalid argument
You are using an os.Pipe, which returns a pair of FIFO-connected files from the OS. This is different from an io.Pipe, which is implemented in Go.
The invalid argument errors are because you are reading and writing to the wrong files. The signature of os.Pipe is
func Pipe() (r *File, w *File, err error)
which shows that the return values are in the order "reader, writer, error".
and io.Pipe:
func Pipe() (*PipeReader, *PipeWriter)
which also returns in the order "reader, writer".
When you check the error from the os.Pipe function, you are only printing the value. If there was an error, the files are invalid. You need to return or exit on that error.
Pipes are also blocking (though an os.Pipe has a small, hard coded buffer), so you need to read and write asynchronously. If you swapped this for an io.Pipe it would deadlock immediately. Dispatch the Read method inside a goroutine and wait for it to complete.
Finally, you are reading into a nil slice, which will read nothing. You need to allocate space to read into, and you need to record the number of bytes read to know how much of the buffer is used.
A more correct version of your example would look like:
reader, writer, err := os.Pipe()
if err != nil {
log.Fatal(err)
}
var wg sync.WaitGroup
wg.Add(1)
go func() {
defer wg.Done()
data := make([]byte, 1024)
n, err := reader.Read(data)
if n > 0 {
fmt.Println(string(data[:n]))
}
if err != nil && err != io.EOF {
fmt.Println(err)
}
}()
_, err = writer.Write([]byte("hello"))
if err != nil {
fmt.Println(err)
}
wg.Wait()
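For comparison, a minimal sketch of the same exchange using an io.Pipe, which has no buffer at all, so the write has to run in its own goroutine (otherwise it blocks forever, which is the deadlock mentioned above):
package main

import (
	"fmt"
	"io"
)

func main() {
	r, w := io.Pipe() // note the order: reader first, then writer

	go func() {
		defer w.Close() // closing the writer gives the reader io.EOF
		if _, err := w.Write([]byte("hello")); err != nil {
			fmt.Println(err)
		}
	}()

	data := make([]byte, 1024)
	n, err := r.Read(data)
	if err != nil && err != io.EOF {
		fmt.Println(err)
	}
	fmt.Println(string(data[:n]))
}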

Golang - why is string slice element not included in exec cat unless I sort it

I have a slightly funky issue in Go. Essentially I have a slice of strings which represent file paths. I then run cat against those file paths to combine the files before sorting, deduping, etc.
Here is the section of code (where 'applicableReductions' is the string slice):
applicableReductions := []string{}
for _, fqFromListName := range fqFromListNames {
filePath := GetFilePath()
//BROKE CODE GOES HERE
applicableReductions = append(applicableReductions, filePath)
}
fileOut, err := os.Create(toListWriteTmpFilePath)
if err != nil {
return err
}
cat := exec.Command("cat", applicableReductions...)
catStdOut, err := cat.StdoutPipe()
if err != nil {
return err
}
go func(cat *exec.Cmd) error {
if err := cat.Start(); err != nil {
return fmt.Errorf("File reduction error (cat) : %s", err)
}
return nil
}(cat)
// Init Writer & write file
writer := bufio.NewWriter(fileOut)
defer writer.Flush()
_, err = io.Copy(writer, catStdOut)
if err != nil {
return err
}
if err = cat.Wait(); err != nil {
return err
}
fDiff.StandardiseData(fileOut, toListUpdateFolderPath, list.Name)
The above works fine. The problem comes when I try to append a new element to the slice. I have a separate function which creates a new file from DB content, whose path is then added to the applicableReductions slice.
func RetrieveDomainsFromDB(collection *Collection, listName, outputPath string) error {
domains, err := domainReviews.GetDomainsForList(listName)
if err != nil {
return err
}
if len(domains) < 1 {
return ErrNoDomainReviewsForList
}
fh, err := os.OpenFile(outputPath, os.O_RDWR, 0774)
if err != nil {
fh, err = os.Create(outputPath)
if err != nil {
return err
}
}
defer fh.Close()
_, err = fh.WriteString(strings.Join(domains, "\n"))
if err != nil {
return err
}
return nil
}
If I call the above function and append the filePath to the applicableReductions slice, it is in there, but the file doesn't get picked up by cat.
To clarify, when I put the following where it says BROKE CODE GOES HERE:
if dbSource {
err = r.RetrieveDomainsFromDB(collection, ToListName, filePath)
if err != nil {
return err
continue
}
}
The file path can be seen when doing fmt.Println(applicableReductions), but that file's contents are not seen in the cat output file.
I thought there was perhaps a delay in the file being written, so I tried adding a wait; this didn't help. However, the solution I found was to sort the slice, e.g. this code above the call to exec cat solves the problem, but I don't know why:
sort.Strings(applicableReductions)
I have confirmed all files are present on both successful and unsuccessful runs; the only difference is that without the sort, the content of the final appended file is missing.
An explanation from a Go pro out there would be very much appreciated. Let me know if you need more info or debug output - happy to oblige.
UPDATE
It has been suggested that this is the same issue as here: Golang append an item to a slice. I think I understand the issue there and I'm not saying this isn't the same, but I cannot see the same thing happening - the slice in question is not touched from outside the main function (e.g. no editing of the slice in the RetrieveDomainsFromDB function). I create the slice before a loop, append to it within a loop and then use it after the loop - I've added an example at the top to show how the slice is built. Please could someone clarify where this slice is being copied, if that is the case.
UPDATE AND CLOSE
Please close this question - the issue was unrelated to the use of a string slice. It turns out that I was reading from the final output file before the bufio.Writer had been flushed (the deferred Flush only kicks in at the end of the function, on return).
I think the sorting was just rearranging the problem so I didn't notice it persisted, or possibly giving the buffer some time to flush. Either way, it's sorted now with a manual call to Flush, roughly like the sketch below.
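For anyone else hitting this, a rough sketch of the fix in the function above (fileOut, writer and fDiff.StandardiseData are from the earlier snippet; the Seek is an assumption about StandardiseData reading fileOut from its current offset):
if err = cat.Wait(); err != nil {
	return err
}
// Flush explicitly: the deferred writer.Flush() only runs when the function
// returns, which is after StandardiseData has already read the file.
if err = writer.Flush(); err != nil {
	return err
}
// Assumption: rewind so StandardiseData reads what was just written.
if _, err = fileOut.Seek(0, io.SeekStart); err != nil {
	return err
}
fDiff.StandardiseData(fileOut, toListUpdateFolderPath, list.Name)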
Thanks for all help provided

golang zlib reader output not being copied over to stdout

I've modified the official documentation example for the zlib package to use an opened file rather than a set of hardcoded bytes (code below).
The code reads in the contents of a source text file and compresses it with the zlib package. I then try to read back the compressed file and print its decompressed contents into stdout.
The code doesn't error, but it also doesn't do what I expect it to, which is to display the decompressed file contents on stdout.
Also: is there another way of displaying this information, rather than using io.Copy?
package main
import (
"compress/zlib"
"io"
"log"
"os"
)
func main() {
var err error
// This defends against an error preventing `defer` from being called
// As log.Fatal otherwise calls `os.Exit`
defer func() {
if err != nil {
log.Fatalln("\nDeferred log: \n", err)
}
}()
src, err := os.Open("source.txt")
if err != nil {
return
}
defer src.Close()
dest, err := os.Create("new.txt")
if err != nil {
return
}
defer dest.Close()
zdest := zlib.NewWriter(dest)
defer zdest.Close()
if _, err := io.Copy(zdest, src); err != nil {
return
}
n, err := os.Open("new.txt")
if err != nil {
return
}
r, err := zlib.NewReader(n)
if err != nil {
return
}
defer r.Close()
io.Copy(os.Stdout, r)
err = os.Remove("new.txt")
if err != nil {
return
}
}
Your defer func doesn't do anything, because you're shadowing the err variable on every new assignment. If you want a defer to run, return from a separate function, and call log.Fatal after the return statement.
As for why you're not seeing any output, it's because you're deferring all the Close calls. The zlib.Writer isn't flushed until after the function exits, and neither is the destination file. Call Close() explicitly where you need it.
zdest := zlib.NewWriter(dest)
if _, err := io.Copy(zdest, src); err != nil {
log.Fatal(err)
}
zdest.Close()
dest.Close()
I think you messed up the code logic with all this defer stuff and your "trick" err checking.
A file's contents are only definitively written once it has been flushed or closed. You just copy into new.txt without closing it before opening it to read it back.
Deferring the closing of the file is neat inside a function which has multiple exits: it makes sure the file is closed once the function is left. But your main requires new.txt to be closed after the copy, before re-opening it. So don't defer the close here.
BTW: your defense against log.Fatal terminating the code without calling your defers is, well, at least strange. The files are all put into a proper state by the OS; there is absolutely no need to complicate things like this.
Check the error from the second Copy:
2015/12/22 19:00:33
Deferred log:
unexpected EOF
exit status 1
The thing is, you need to close zdest immediately after you've done writing. Close it after the first Copy and it works.
I would suggest using io.MultiWriter. That way you read from src only once. Not much gain for small files, but it is faster for bigger files.
w := io.MultiWriter(dest, os.Stdout)
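One way that suggestion might fit into the original program - my reading of it, assuming what you want on stdout is the uncompressed content (the compressed bytes go only to the file), so src is read exactly once and new.txt never needs to be reopened:
zdest := zlib.NewWriter(dest)
// Tee the bytes of src to stdout while they are being compressed into dest.
w := io.MultiWriter(zdest, os.Stdout)
if _, err := io.Copy(w, src); err != nil {
	log.Fatal(err)
}
// Close explicitly (not deferred) so the compressed file is complete here.
if err := zdest.Close(); err != nil {
	log.Fatal(err)
}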

Parallel zip compression in Go

I am trying to build a zip archive from a large number of small to medium sized files. I want to be able to do this concurrently, since compression is CPU-intensive and I'm running on a multi-core server. Also, I don't want to have the whole archive in memory, since it might turn out to be large.
My question is: do I have to compress every file and then manually combine everything together with zip headers, checksums, etc.?
Any help would be greatly appreciated.
I don't think you can combine the zip headers.
What you could do is run the zip.Writer sequentially, in a separate goroutine, and then spawn a new goroutine for each file that you want to read, and pipe those to the goroutine that is zipping them.
This should reduce the IO overhead that you get by reading the files sequentially, although it probably won't leverage multiple cores for the archiving itself.
Here's a working example. Note that, to keep things simple,
it does not handle errors nicely, just panics if something goes wrong,
and it does not use the defer statement too much, to demonstrate the order in which things should happen.
Since defer is LIFO, it can sometimes be confusing when you stack a lot of them together.
package main
import (
"archive/zip"
"io"
"os"
"sync"
)
func ZipWriter(files chan *os.File) *sync.WaitGroup {
f, err := os.Create("out.zip")
if err != nil {
panic(err)
}
var wg sync.WaitGroup
wg.Add(1)
zw := zip.NewWriter(f)
go func() {
// Note the order (LIFO):
defer wg.Done() // 2. signal that we're done
defer f.Close() // 1. close the file
var err error
var fw io.Writer
for f := range files {
// Loop until channel is closed.
if fw, err = zw.Create(f.Name()); err != nil {
panic(err)
}
io.Copy(fw, f)
if err = f.Close(); err != nil {
panic(err)
}
}
// The zip writer must be closed *before* f.Close() is called!
if err = zw.Close(); err != nil {
panic(err)
}
}()
return &wg
}
func main() {
files := make(chan *os.File)
wait := ZipWriter(files)
// Send all files to the zip writer.
var wg sync.WaitGroup
wg.Add(len(os.Args)-1)
for i, name := range os.Args {
if i == 0 {
continue
}
// Read each file in parallel:
go func(name string) {
defer wg.Done()
f, err := os.Open(name)
if err != nil {
panic(err)
}
files <- f
}(name)
}
wg.Wait()
// Once we're done sending the files, we can close the channel.
close(files)
// This will cause ZipWriter to break out of the loop, close the file,
// and unblock the next mutex:
wait.Wait()
}
Usage: go run example.go /path/to/*.log.
This is the order in which things should be happening:
Open output file for writing.
Create a zip.Writer with that file.
Kick off a goroutine listening for files on a channel.
Go through each file, this can be done in one goroutine per file.
Send each file to the goroutine created in step 3.
After processing each file in said goroutine, close the file to free up resources.
Once each file has been sent to said goroutine, close the channel.
Wait until the zipping has been done (which is done sequentially).
Once zipping is done (channel exhausted), the zip writer should be closed.
Only when the zip writer is closed, should the output file be closed.
Finally everything is closed, so signal the sync.WaitGroup to tell the calling function that we're good to go. (A channel could also be used here, but sync.WaitGroup seems more elegant.)
When you get the signal from the zip writer that everything is properly closed, you can exit from main and terminate nicely.
This might not answer your question, but I've been using similar code to generate zip archives on-the-fly for a web service some time ago. It performed quite well, even though the actual zipping was done in a single goroutine. Overcoming the IO bottleneck can already be an improvement.
From the look of it, you won't be able to parallelise the compression using the standard library archive/zip package because:
Compression is performed by the io.Writer returned by zip.Writer.Create or CreateHeader.
Calling Create/CreateHeader implicitly closes the writer returned by the previous call.
So passing the writers returned by Create to multiple goroutines and writing to them in parallel will not work.
If you wanted to write your own parallel zip writer, you'd probably want to structure it something like this:
Have multiple goroutines compress files using the compress/flate package, and keep track of the CRC32 value and length of the uncompressed data. The output should be directed to temporary files. Note the compressed size of the data. (A sketch of this step follows this answer.)
Once everything has been compressed, start writing the Zip file starting with the header.
Write out the file header followed by the contents of the corresponding temporary file for each compressed file.
Write out the central directory record and end record at the end of the file. All the required information should be available at this point.
For added parallelism, step 1 could be performed in parallel with the remaining steps by using a channel to indicate when compression of each file completes.
Due to the file format, you won't be able to perform parallel compression without either storing compressed data in memory or in temporary files.
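To make step 1 concrete, here is a minimal sketch - part and compressToTemp are made-up names, it only covers the parallel compression and bookkeeping, and it leaves the temp files in place for the assembly steps, which are not shown:
package main

import (
	"compress/flate"
	"hash/crc32"
	"io"
	"log"
	"os"
	"sync"
)

// part records what the zip-assembly steps need for one input file.
type part struct {
	name    string // file name for the local header
	crc     uint32 // CRC-32 of the uncompressed data
	size    int64  // uncompressed size
	csize   int64  // compressed size
	tmpPath string // temp file holding the raw deflate stream
}

// compressToTemp deflates one file into a temp file and records the metadata.
func compressToTemp(name string) (part, error) {
	in, err := os.Open(name)
	if err != nil {
		return part{}, err
	}
	defer in.Close()

	tmp, err := os.CreateTemp("", "zippart-*")
	if err != nil {
		return part{}, err
	}
	defer tmp.Close()

	crc := crc32.NewIEEE()
	fw, err := flate.NewWriter(tmp, flate.DefaultCompression)
	if err != nil {
		return part{}, err
	}
	// One pass over the input: feed it through both the CRC and the compressor.
	n, err := io.Copy(io.MultiWriter(fw, crc), in)
	if err != nil {
		return part{}, err
	}
	if err := fw.Close(); err != nil { // flush the deflate stream
		return part{}, err
	}
	csize, err := tmp.Seek(0, io.SeekCurrent) // bytes written to the temp file
	if err != nil {
		return part{}, err
	}
	return part{name: name, crc: crc.Sum32(), size: n, csize: csize, tmpPath: tmp.Name()}, nil
}

func main() {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		parts []part
	)
	for _, name := range os.Args[1:] {
		wg.Add(1)
		go func(name string) { // one goroutine per file; the CPU-bound work runs in parallel
			defer wg.Done()
			p, err := compressToTemp(name)
			if err != nil {
				log.Println(name, err)
				return
			}
			mu.Lock()
			parts = append(parts, p)
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	// parts now holds everything steps 2-4 need (names, sizes, CRCs and the temp
	// files with the compressed data) to write the final zip file sequentially.
	log.Printf("compressed %d files", len(parts))
}
In practice you would bound the number of goroutines and remove the temp files once they have been copied into the archive, but the bookkeeping above is the part the zip format actually requires.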
With Go 1.17, parallel compression and merging of zip files are possible using the archive/zip package.
An example is below. In it, I create zip workers that each build an individual zip file, plus an entry provider worker which feeds entries to the zip workers via a channel. Actual files could be provided to the zip workers, but I skipped that part.
package main
import (
"archive/zip"
"context"
"fmt"
"io"
"log"
"os"
"strings"
"golang.org/x/sync/errgroup"
)
const numOfZipWorkers = 10
type entry struct {
name string
rc io.ReadCloser
}
func main() {
log.SetFlags(log.LstdFlags | log.Lshortfile)
entCh := make(chan entry, numOfZipWorkers)
zpathCh := make(chan string, numOfZipWorkers)
group, ctx := errgroup.WithContext(context.Background())
for i := 0; i < numOfZipWorkers; i++ {
group.Go(func() error {
return zipWorker(ctx, entCh, zpathCh)
})
}
group.Go(func() error {
defer close(entCh) // Signal workers to stop.
return entryProvider(ctx, entCh)
})
err := group.Wait()
if err != nil {
log.Fatal(err)
}
f, err := os.OpenFile("output.zip", os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0644)
if err != nil {
log.Fatal(err)
}
zw := zip.NewWriter(f)
close(zpathCh)
for path := range zpathCh {
zrd, err := zip.OpenReader(path)
if err != nil {
log.Fatal(err)
}
for _, zf := range zrd.File {
err := zw.Copy(zf)
if err != nil {
log.Fatal(err)
}
}
_ = zrd.Close()
_ = os.Remove(path)
}
err = zw.Close()
if err != nil {
log.Fatal(err)
}
err = f.Close()
if err != nil {
log.Fatal(err)
}
}
func entryProvider(ctx context.Context, entCh chan<- entry) error {
for i := 0; i < 2*numOfZipWorkers; i++ {
select {
case <-ctx.Done():
return ctx.Err()
case entCh <- entry{
name: fmt.Sprintf("file_%d", i+1),
rc: io.NopCloser(strings.NewReader(fmt.Sprintf("content %d", i+1))),
}:
}
}
return nil
}
func zipWorker(ctx context.Context, entCh <-chan entry, zpathch chan<- string) error {
f, err := os.CreateTemp(".", "tmp-part-*")
if err != nil {
return err
}
zw := zip.NewWriter(f)
Loop:
for {
var (
ent entry
ok bool
)
select {
case <-ctx.Done():
err = ctx.Err()
break Loop
case ent, ok = <-entCh:
if !ok {
break Loop
}
}
hdr := &zip.FileHeader{
Name: ent.name,
Method: zip.Deflate, // zip.Store can also be used.
}
hdr.SetMode(0644)
w, e := zw.CreateHeader(hdr)
if e != nil {
_ = ent.rc.Close()
err = e
break
}
_, e = io.Copy(w, ent.rc)
_ = ent.rc.Close()
if e != nil {
err = e
break
}
}
if e := zw.Close(); e != nil && err == nil {
err = e
}
if e := f.Close(); e != nil && err == nil {
err = e
}
if err == nil {
select {
case <-ctx.Done():
err = ctx.Err()
case zpathch <- f.Name():
}
}
return err
}

ReadLine from io.ReadCloser

I need to find a way to read a line from an io.ReadCloser object OR find a way to split a byte array on an "end of line" symbol. However, I don't know the end of line symbol and I can't find it.
My application execs a PHP script and needs to get the live output from the script and do "something" with it when it gets it.
Here's a small piece of my code:
cmd := exec.Command(prog, args)
/* cmd := exec.Command("ls")*/
out, err := cmd.StdoutPipe()
if err != nil {
fmt.Println(err)
}
err = cmd.Start()
if err != nil {
fmt.Println(err)
}
After this I monitor the out buffer in a goroutine. I've tried 2 ways:
1) nr, er := out.Read(buf), where buf is a byte array. The problem here is that I need to break the array up at each new line.
2) My second option is to create a new bufio.Reader:
r := bufio.NewReader(out)
line,_,e := r.ReadLine()
This runs fine if I exec a command like ls - I get the output line by line - but if I exec a PHP script it immediately gets an End Of File error and exits (I'm guessing that's because of the delayed output from PHP).
EDIT: My problem was that I was creating the bufio.Reader inside the goroutine; if I create it right after the StdoutPipe() call, as minikomi suggested, it works fine.
You can create a reader using bufio, and then read until the next line break character (Note, single quotes to denote character!):
stdout, err := cmd.StdoutPipe()
rd := bufio.NewReader(stdout)
if err := cmd.Start(); err != nil {
log.Fatal("Buffer Error:", err)
}
for {
str, err := rd.ReadString('\n')
if err == io.EOF {
break // the command has finished
}
if err != nil {
log.Fatal("Read Error:", err)
}
fmt.Println(str)
}
If you're trying to read from the reader in a goroutine with nothing to stop the script, it will exit.
Another option is bufio.NewScanner:
package main
import (
"bufio"
"os/exec"
)
func main() {
cmd := exec.Command("go", "env")
out, err := cmd.StdoutPipe()
if err != nil {
panic(err)
}
buf := bufio.NewScanner(out)
if err := cmd.Start(); err != nil {
panic(err)
}
defer cmd.Wait()
for buf.Scan() {
println(buf.Text())
}
if err := buf.Err(); err != nil { // surface any error from reading the pipe
panic(err)
}
}
https://golang.org/pkg/bufio#NewScanner
