Handling nested zip files with archive/zip

Handling nested zip files with archive/zip - go

I'm struggling to handle nested zip files in Go (where a zip file contains another zip file). I'm trying to recurse a zip file and list all of the files it contains.
archive/zip gives you two methods for handling a zip file:
zip.NewReader
zip.OpenReader
OpenReader opens a file on disk. NewReader accepts an io.ReaderAt and a file size. As you iterate through the zipped files with either of these, you get out a zip.File for each file inside the zip. To get the file contents of file f, you call f.Open which gives you a zip.ReadCloser. To open a nested zip file, I'd need to use NewReader, but zip.File and zip.ReadCloser do not satisfy the io.ReaderAt interface.
zip.File has a private field zipr which is an io.ReaderAt and zip.ReadCloser has a private field f which is an os.File which should satisfy the requirements for NewReader.
My question: is there any way to open a nested zip file without first writing the contents to a file on disk, or reading the whole thing into memory.
It looks like everything that is needed is available in zip.File, but isn't exported. I'm hoping I missed something.

How about an io.ReaderAt from an io.Reader that reinitializes if you decided to go backwards: (this code is largely untested, but hopefully you get the idea)
package main
import (
"io"
"io/ioutil"
"os"
"strings"
)
type inefficientReaderAt struct {
rdr io.ReadCloser
cur int64
initer func() (io.ReadCloser, error)
}
func newInefficentReaderAt(initer func() (io.ReadCloser, error)) *inefficientReaderAt {
return &inefficientReaderAt{
initer: initer,
}
}
func (r *inefficientReaderAt) Read(p []byte) (n int, err error) {
n, err = r.rdr.Read(p)
r.cur += int64(n)
return n, err
}
func (r *inefficientReaderAt) ReadAt(p []byte, off int64) (n int, err error) {
// reset on rewind
if off < r.cur || r.rdr == nil {
r.cur = 0
r.rdr, err = r.initer()
if err != nil {
return 0, err
}
}
if off > r.cur {
sz, err := io.CopyN(ioutil.Discard, r.rdr, off-r.cur)
n = int(sz)
if err != nil {
return n, err
}
}
return r.Read(p)
}
func main() {
r := newInefficentReaderAt(func() (io.ReadCloser, error) {
return ioutil.NopCloser(strings.NewReader("ABCDEFG")), nil
})
io.Copy(os.Stdout, io.NewSectionReader(r, 0, 3))
io.Copy(os.Stdout, io.NewSectionReader(r, 1, 3))
}
If you mostly move forwards this probably works ok. Especially if you use a buffered reader.
I should note that this violates the io.ReaderAt guarantees: https://godoc.org/io#ReaderFrom , namely it doesn't allow parallel calls to ReadAt, and doesn't block on full reads, so this may not even work properly

I ran into the exact same need and came up with the following approach, not sure if its any help to you:
// NewZipFromReader ...
func NewZipFromReader(file io.ReadCloser, size int64) (*zip.Reader, error) {
in := file.(io.Reader)
if _, ok := in.(io.ReaderAt); ok != true {
buffer, err := ioutil.ReadAll(in)
if err != nil {
return nil, err
}
in = bytes.NewReader(buffer)
size = int64(len(buffer))
}
reader, err := zip.NewReader(in.(io.ReaderAt), size)
if err != nil {
return nil, err
}
return reader, nil
}
So if file doesn't implement io.ReaderAt it reads the whole contents into a buffer.
It's probably not safe to handle ZIP bombs, and will defenitely fail with OOM for files larger than RAM.

Related

How to get the number of files in a directory in beego [duplicate]

I've been trying to figure out how to simply list the files and folders in a single directory in Go.
I've found filepath.Walk, but it goes into sub-directories automatically, which I don't want. All of my other searches haven't turned anything better up.
I'm sure that this functionality exists, but it's been really hard to find. Let me know if anyone knows where I should look. Thanks.

You can try using the ReadDir function in the os package. Per the docs:
ReadDir reads the named directory, returning all its directory entries sorted by filename.
The resulting slice contains os.DirEntry types, which provide the methods listed here. Here is a basic example that lists the name of everything in the current directory (folders are included but not specially marked - you can check if an item is a folder by using the IsDir() method):
package main
import (
"fmt"
"os"
"log"
)
func main() {
entries, err := os.ReadDir("./")
if err != nil {
log.Fatal(err)
}
for _, e := range entries {
fmt.Println(e.Name())
}
}

We can get a list of files inside a folder on the file system using various golang standard library functions.
filepath.Walk
ioutil.ReadDir
os.File.Readdir
package main
import (
"fmt"
"io/ioutil"
"log"
"os"
"path/filepath"
)
func main() {
var (
root string
files []string
err error
)
root := "/home/manigandan/golang/samples"
// filepath.Walk
files, err = FilePathWalkDir(root)
if err != nil {
panic(err)
}
// ioutil.ReadDir
files, err = IOReadDir(root)
if err != nil {
panic(err)
}
//os.File.Readdir
files, err = OSReadDir(root)
if err != nil {
panic(err)
}
for _, file := range files {
fmt.Println(file)
}
}
Using filepath.Walk
The path/filepath package provides a handy way to scan all the files
in a directory, it will automatically scan each sub-directories in the
directory.
func FilePathWalkDir(root string) ([]string, error) {
var files []string
err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
if !info.IsDir() {
files = append(files, path)
}
return nil
})
return files, err
}
Using ioutil.ReadDir
ioutil.ReadDir reads the directory named by dirname and returns a
list of directory entries sorted by filename.
func IOReadDir(root string) ([]string, error) {
var files []string
fileInfo, err := ioutil.ReadDir(root)
if err != nil {
return files, err
}
for _, file := range fileInfo {
files = append(files, file.Name())
}
return files, nil
}
Using os.File.Readdir
Readdir reads the contents of the directory associated with file and
returns a slice of up to n FileInfo values, as would be returned by
Lstat, in directory order. Subsequent calls on the same file will
yield further FileInfos.
func OSReadDir(root string) ([]string, error) {
var files []string
f, err := os.Open(root)
if err != nil {
return files, err
}
fileInfo, err := f.Readdir(-1)
f.Close()
if err != nil {
return files, err
}
for _, file := range fileInfo {
files = append(files, file.Name())
}
return files, nil
}
Benchmark results.
Get more details on this Blog Post

Even simpler, use path/filepath:
package main
import (
"fmt"
"log"
"path/filepath"
)
func main() {
files, err := filepath.Glob("*")
if err != nil {
log.Fatal(err)
}
fmt.Println(files) // contains a list of all files in the current directory
}

Starting with Go 1.16, you can use the os.ReadDir function.
func ReadDir(name string) ([]DirEntry, error)
It reads a given directory and returns a DirEntry slice that contains the directory entries sorted by filename.
It's an optimistic function, so that, when an error occurs while reading the directory entries, it tries to return you a slice with the filenames up to the point before the error.
package main
import (
"fmt"
"log"
"os"
)
func main() {
files, err := os.ReadDir(".")
if err != nil {
log.Fatal(err)
}
for _, file := range files {
fmt.Println(file.Name())
}
}
Of interest: Go 1.17 (Q3 2021) includes fs.FileInfoToDirEntry():
func FileInfoToDirEntry(info FileInfo) DirEntry
FileInfoToDirEntry returns a DirEntry that returns information from info.
If info is nil, FileInfoToDirEntry returns nil.
Background
Go 1.16 (Q1 2021) will propose, with CL 243908 and CL 243914 , the ReadDir function, based on the FS interface:
// An FS provides access to a hierarchical file system.
//
// The FS interface is the minimum implementation required of the file system.
// A file system may implement additional interfaces,
// such as fsutil.ReadFileFS, to provide additional or optimized functionality.
// See io/fsutil for details.
type FS interface {
// Open opens the named file.
//
// When Open returns an error, it should be of type *PathError
// with the Op field set to "open", the Path field set to name,
// and the Err field describing the problem.
//
// Open should reject attempts to open names that do not satisfy
// ValidPath(name), returning a *PathError with Err set to
// ErrInvalid or ErrNotExist.
Open(name string) (File, error)
}
That allows for "os: add ReadDir method for lightweight directory reading":
See commit a4ede9f:
// ReadDir reads the contents of the directory associated with the file f
// and returns a slice of DirEntry values in directory order.
// Subsequent calls on the same file will yield later DirEntry records in the directory.
//
// If n > 0, ReadDir returns at most n DirEntry records.
// In this case, if ReadDir returns an empty slice, it will return an error explaining why.
// At the end of a directory, the error is io.EOF.
//
// If n <= 0, ReadDir returns all the DirEntry records remaining in the directory.
// When it succeeds, it returns a nil error (not io.EOF).
func (f *File) ReadDir(n int) ([]DirEntry, error)
// A DirEntry is an entry read from a directory (using the ReadDir method).
type DirEntry interface {
// Name returns the name of the file (or subdirectory) described by the entry.
// This name is only the final element of the path, not the entire path.
// For example, Name would return "hello.go" not "/home/gopher/hello.go".
Name() string
// IsDir reports whether the entry describes a subdirectory.
IsDir() bool
// Type returns the type bits for the entry.
// The type bits are a subset of the usual FileMode bits, those returned by the FileMode.Type method.
Type() os.FileMode
// Info returns the FileInfo for the file or subdirectory described by the entry.
// The returned FileInfo may be from the time of the original directory read
// or from the time of the call to Info. If the file has been removed or renamed
// since the directory read, Info may return an error satisfying errors.Is(err, ErrNotExist).
// If the entry denotes a symbolic link, Info reports the information about the link itself,
// not the link's target.
Info() (FileInfo, error)
}
src/os/os_test.go#testReadDir() illustrates its usage:
file, err := Open(dir)
if err != nil {
t.Fatalf("open %q failed: %v", dir, err)
}
defer file.Close()
s, err2 := file.ReadDir(-1)
if err2 != nil {
t.Fatalf("ReadDir %q failed: %v", dir, err2)
}
Ben Hoyt points out in the comments to Go 1.16 os.ReadDir:
os.ReadDir(path string) ([]os.DirEntry, error), which you'll be able to call directly without the Open dance.
So you can probably shorten this to just os.ReadDir, as that's the concrete function most people will call.
See commit 3d913a9 (Dec. 2020):
os: add ReadFile, WriteFile, CreateTemp (was TempFile), MkdirTemp (was TempDir) from io/ioutil
io/ioutil was a poorly defined collection of helpers.
Proposal #40025 moved out the generic I/O helpers to io.
This CL for proposal #42026 moves the OS-specific helpers to os,
making the entire io/ioutil package deprecated.
os.ReadDir returns []DirEntry, in contrast to ioutil.ReadDir's []FileInfo.
(Providing a helper that returns []DirEntry is one of the primary motivations for this change.)

ioutil.ReadDir is a good find, but if you click and look at the source you see that it calls the method Readdir of os.File. If you are okay with the directory order and don't need the list sorted, then this Readdir method is all you need.

From your description, what you probably want is os.Readdirnames.
func (f *File) Readdirnames(n int) (names []string, err error)
Readdirnames reads the contents of the directory associated with file and returns a slice of up to n names of files in the directory, in directory order. Subsequent calls on the same file will yield further names.
...
If n <= 0, Readdirnames returns all the names from the directory in a single slice.
Snippet:
file, err := os.Open(path)
if err != nil {
return err
}
defer file.Close()
names, err := file.Readdirnames(0)
if err != nil {
return err
}
fmt.Println(names)
Credit to SquattingSlavInTracksuit's comment; I'd have suggested promoting their comment to an answer if I could.

A complete example of printing all the files in a directory recursively using Readdirnames
package main
import (
"fmt"
"os"
)
func main() {
path := "/path/to/your/directory"
err := readDir(path)
if err != nil {
panic(err)
}
}
func readDir(path string) error {
file, err := os.Open(path)
if err != nil {
return err
}
defer file.Close()
names, _ := file.Readdirnames(0)
for _, name := range names {
filePath := fmt.Sprintf("%v/%v", path, name)
file, err := os.Open(filePath)
if err != nil {
return err
}
defer file.Close()
fileInfo, err := file.Stat()
if err != nil {
return err
}
fmt.Println(filePath)
if fileInfo.IsDir() {
readDir(filePath)
}
}
return nil
}

Go: zlib uncompressing a slice of bytes

I am trying to parse a file that annoying consists of many separately zipped segments. I have parsed these segments one at a time into a slice of bytes and I want to uncompress them as I go.
Here is my current code that does the decompressing, which doesn't work. from and to are just set at the top as an example, in reality they are set by the code. data is the byte array containing the entire file. I don't want to seek it while it's on disk because its location on another server, so it's only realistic for me to load the entire file to []byte first and then parse it.
from, to := 0, 1000;
b := bytes.NewReader(data[from:from+to])
z, err := zlib.NewReader(b)
CheckErr(err)
defer z.Close()
p := make([]byte,0,1024)
z.Read(p)
fmt.Println(string(p))
So how is it so massively difficult just to unzip a slice of bytes? Anyway...
The problem appears to with how I am reading it out. Where it says z.Read, that doesn't seem to do anything.
How can I read the entire thing in one go into a slice of bytes?

Here's an outline for you. Note: In Go, CHECK FOR ERRORS!
package main
import (
"bytes"
"compress/zlib"
"fmt"
"io/ioutil"
)
func readSegment(data []byte, from, to int) ([]byte, error) {
b := bytes.NewReader(data[from : from+to])
z, err := zlib.NewReader(b)
if err != nil {
return nil, err
}
defer z.Close()
p, err := ioutil.ReadAll(z)
if err != nil {
return nil, err
}
return p, nil
}
func main() {
from, to := 0, 1000
data := make([]byte, from+to)
// ** parse input segments into data **
p, err := readSegment(data, from, to)
if err != nil {
fmt.Println(err)
return
}
fmt.Println(string(p))
}

Use ReadAll(r io.Reader) ([]byte, error) from the io/ioutil package.
p, err := ioutil.ReadAll(b)
fmt.Println(string(p))
Read only reads up to the length of the given slice (1024 bytes in your case).
To read in chunks of 1024 bytes:
p := make([]byte,1024)
for {
numBytes, err := l.Read(p)
if err == io.EOF {
// you are done, numBytes might be less than len(p)
break
}
// do what you want with p
}
If you are getting the data from a webserver, you might even do
import (
"net/http"
"io/ioutil"
)
...
resp, errGet := http.Get("http://example.com/somefile")
// do error handling
z, errZ := zlib.NewReader(resp.Body)
// do error handling
resp.Body.Close()
p, err := ioutil.ReadAll(b)
// do error handling
since resp.Body happens to be an io.Reader as most io related types.

Getting count of files in directory using Go

How might I get the count of items returned by io/ioutil.ReadDir()?
I have this code, which works, but I have to think isn't the RightWay(tm) in Go.
package main
import "io/ioutil"
import "fmt"
func main() {
files,_ := ioutil.ReadDir("/Users/dgolliher/Dropbox/INBOX")
var count int
for _, f := range files {
fmt.Println(f.Name())
count++
}
fmt.Println(count)
}
Lines 8-12 seem like way too much to go through to just count the results of ReadDir, but I can't find the correct syntax to get the count without iterating over the range. Help?

Found the answer in http://blog.golang.org/go-slices-usage-and-internals
package main
import "io/ioutil"
import "fmt"
func main() {
files,_ := ioutil.ReadDir("/Users/dgolliher/Dropbox/INBOX")
fmt.Println(len(files))
}

ReadDir returns a list of directory entries sorted by filename, so it is not just files. Here is a little function for those wanting to get a count of files only (and not dirs):
func fileCount(path string) (int, error){
i := 0
files, err := ioutil.ReadDir(path)
if err != nil {
return 0, err
}
for _, file := range files {
if !file.IsDir() {
i++
}
}
return i, nil
}

Starting with Go 1.16 (Feb 2021), a better option is os.ReadDir:
package main
import "os"
func main() {
d, e := os.ReadDir(".")
if e != nil {
panic(e)
}
println(len(d))
}
os.ReadDir returns fs.DirEntry instead of fs.FileInfo, which means that
Size and ModTime methods are omitted, making the process more efficient if
you just need an entry count.
https://golang.org/pkg/os#ReadDir

If you wanna get all files (not recursive) you can use len(files). If you need to just get the files without folders and hidden files just loop over them and increase a counter. And please don’t ignore errors

By looking at the code of ioutil.ReadDir
func ReadDir(dirname string) ([]fs.FileInfo, error) {
f, err := os.Open(dirname)
if err != nil {
return nil, err
}
list, err := f.Readdir(-1)
f.Close()
if err != nil {
return nil, err
}
sort.Slice(list, func(i, j int) bool { return list[i].Name() < list[j].Name() })
return list, nil
}
you would realize that it calls os.File.Readdir() then sorts the files.
In case of counting it, you don't need to sort, so you are better off calling os.File.Readdir() directly.
You can simply copy and paste this function then remove the sort.
But I did find out that f.Readdirnames(-1) is much faster than f.Readdir(-1).
Running time is almost half for /usr/bin/ with 2808 items (16ms vs 35ms).
So to summerize it in an example:
package main
import (
"fmt"
"os"
)
func main() {
f, err := os.Open(os.Args[1])
if err != nil {
panic(err)
}
list, err := f.Readdirnames(-1)
f.Close()
if err != nil {
panic(err)
}
fmt.Println(len(list))
}

Parallel zip compression in Go

I am trying build a zip archive from a large number of small-medium sized files. I want to be able to do this concurrently, since compression is CPU intensive, and I'm running on a multi core server. Also I don't want to have the whole archive in memory, since its might turn out to be large.
My question is that do I have to compress every file and then combine manually combine everything together with zip header, checksum etc?
Any help would be greatly appreciated.

I don't think you can combine the zip headers.
What you could do is, run the zip.Writer sequentially, in a separate goroutine, and then spawn a new goroutine for each file that you want to read, and pipe those to the goroutine that is zipping them.
This should reduce the IO overhead that you get by reading the files sequentially, although it probably won't leverage multiple cores for the archiving itself.
Here's a working example. Note that, to keep things simple,
it does not handle errors nicely, just panics if something goes wrong,
and it does not use the defer statement too much, to demonstrate the order in which things should happen.
Since defer is LIFO, it can sometimes be confusing when you stack a lot of them together.
package main
import (
"archive/zip"
"io"
"os"
"sync"
)
func ZipWriter(files chan *os.File) *sync.WaitGroup {
f, err := os.Create("out.zip")
if err != nil {
panic(err)
}
var wg sync.WaitGroup
wg.Add(1)
zw := zip.NewWriter(f)
go func() {
// Note the order (LIFO):
defer wg.Done() // 2. signal that we're done
defer f.Close() // 1. close the file
var err error
var fw io.Writer
for f := range files {
// Loop until channel is closed.
if fw, err = zw.Create(f.Name()); err != nil {
panic(err)
}
io.Copy(fw, f)
if err = f.Close(); err != nil {
panic(err)
}
}
// The zip writer must be closed *before* f.Close() is called!
if err = zw.Close(); err != nil {
panic(err)
}
}()
return &wg
}
func main() {
files := make(chan *os.File)
wait := ZipWriter(files)
// Send all files to the zip writer.
var wg sync.WaitGroup
wg.Add(len(os.Args)-1)
for i, name := range os.Args {
if i == 0 {
continue
}
// Read each file in parallel:
go func(name string) {
defer wg.Done()
f, err := os.Open(name)
if err != nil {
panic(err)
}
files <- f
}(name)
}
wg.Wait()
// Once we're done sending the files, we can close the channel.
close(files)
// This will cause ZipWriter to break out of the loop, close the file,
// and unblock the next mutex:
wait.Wait()
}
Usage: go run example.go /path/to/*.log.
This is the order in which things should be happening:
Open output file for writing.
Create a zip.Writer with that file.
Kick off a goroutine listening for files on a channel.
Go through each file, this can be done in one goroutine per file.
Send each file to the goroutine created in step 3.
After processing each file in said goroutine, close the file to free up resources.
Once each file has been sent to said goroutine, close the channel.
Wait until the zipping has been done (which is done sequentially).
Once zipping is done (channel exhausted), the zip writer should be closed.
Only when the zip writer is closed, should the output file be closed.
Finally everything is closed, so close the sync.WaitGroup to tell the calling function that we're good to go. (A channel could also be used here, but sync.WaitGroup seems more elegant.)
When you get the signal from the zip writer that everything is properly closed, you can exit from main and terminate nicely.
This might not answer your question, but I've been using similar code to generate zip archives on-the-fly for a web service some time ago. It performed quite well, even though the actual zipping was done in a single goroutine. Overcoming the IO bottleneck can already be an improvement.

From the look of it, you won't be able to parallelise the compression using the standard library archive/zip package because:
Compression is performed by the io.Writer returned by zip.Writer.Create or CreateHeader.
Calling Create/CreateHeader implicitly closes the writer returned by the previous call.
So passing the writers returned by Create to multiple goroutines and writing to them in parallel will not work.
If you wanted to write your own parallel zip writer, you'd probably want to structure it something like this:
Have multiple goroutines compress files using the compress/flate module, and keep track of the CRC32 value and length of the uncompressed data. The output should be directed to temporary files. Note the compressed size of the data.
Once everything has been compressed, start writing the Zip file starting with the header.
Write out the file header followed by the contents of the corresponding temporary file for each compressed file.
Write out the central directory record and end record at the end of the file. All the required information should be available at this point.
For added parallelism, step 1 could be performed in parallel with the remaining steps by using a channel to indicate when compression of each file completes.
Due to the file format, you won't be able to perform parallel compression without either storing compressed data in memory or in temporary files.

With Go1.17, parallel compression and merging of zip files are possible using the archive/zip package.
An example is below. In the example, I create zip workers to create individual zip files and an entry provider worker which provides entries to be added to a zip file via a channel to zip workers. Actual files can be provided to the zip workers but I skipped that part.
package main
import (
"archive/zip"
"context"
"fmt"
"io"
"log"
"os"
"strings"
"golang.org/x/sync/errgroup"
)
const numOfZipWorkers = 10
type entry struct {
name string
rc io.ReadCloser
}
func main() {
log.SetFlags(log.LstdFlags | log.Lshortfile)
entCh := make(chan entry, numOfZipWorkers)
zpathCh := make(chan string, numOfZipWorkers)
group, ctx := errgroup.WithContext(context.Background())
for i := 0; i < numOfZipWorkers; i++ {
group.Go(func() error {
return zipWorker(ctx, entCh, zpathCh)
})
}
group.Go(func() error {
defer close(entCh) // Signal workers to stop.
return entryProvider(ctx, entCh)
})
err := group.Wait()
if err != nil {
log.Fatal(err)
}
f, err := os.OpenFile("output.zip", os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0644)
if err != nil {
log.Fatal(err)
}
zw := zip.NewWriter(f)
close(zpathCh)
for path := range zpathCh {
zrd, err := zip.OpenReader(path)
if err != nil {
log.Fatal(err)
}
for _, zf := range zrd.File {
err := zw.Copy(zf)
if err != nil {
log.Fatal(err)
}
}
_ = zrd.Close()
_ = os.Remove(path)
}
err = zw.Close()
if err != nil {
log.Fatal(err)
}
err = f.Close()
if err != nil {
log.Fatal(err)
}
}
func entryProvider(ctx context.Context, entCh chan<- entry) error {
for i := 0; i < 2*numOfZipWorkers; i++ {
select {
case <-ctx.Done():
return ctx.Err()
case entCh <- entry{
name: fmt.Sprintf("file_%d", i+1),
rc: io.NopCloser(strings.NewReader(fmt.Sprintf("content %d", i+1))),
}:
}
}
return nil
}
func zipWorker(ctx context.Context, entCh <-chan entry, zpathch chan<- string) error {
f, err := os.CreateTemp(".", "tmp-part-*")
if err != nil {
return err
}
zw := zip.NewWriter(f)
Loop:
for {
var (
ent entry
ok bool
)
select {
case <-ctx.Done():
err = ctx.Err()
break Loop
case ent, ok = <-entCh:
if !ok {
break Loop
}
}
hdr := &zip.FileHeader{
Name: ent.name,
Method: zip.Deflate, // zip.Store can also be used.
}
hdr.SetMode(0644)
w, e := zw.CreateHeader(hdr)
if e != nil {
_ = ent.rc.Close()
err = e
break
}
_, e = io.Copy(w, ent.rc)
_ = ent.rc.Close()
if e != nil {
err = e
break
}
}
if e := zw.Close(); e != nil && err == nil {
err = e
}
if e := f.Close(); e != nil && err == nil {
err = e
}
if err == nil {
select {
case <-ctx.Done():
err = ctx.Err()
case zpathch <- f.Name():
}
}
return err
}

List directory in Go

I've been trying to figure out how to simply list the files and folders in a single directory in Go.
I've found filepath.Walk, but it goes into sub-directories automatically, which I don't want. All of my other searches haven't turned anything better up.
I'm sure that this functionality exists, but it's been really hard to find. Let me know if anyone knows where I should look. Thanks.

You can try using the ReadDir function in the os package. Per the docs:
ReadDir reads the named directory, returning all its directory entries sorted by filename.
The resulting slice contains os.DirEntry types, which provide the methods listed here. Here is a basic example that lists the name of everything in the current directory (folders are included but not specially marked - you can check if an item is a folder by using the IsDir() method):
package main
import (
"fmt"
"os"
"log"
)
func main() {
entries, err := os.ReadDir("./")
if err != nil {
log.Fatal(err)
}
for _, e := range entries {
fmt.Println(e.Name())
}
}

We can get a list of files inside a folder on the file system using various golang standard library functions.
filepath.Walk
ioutil.ReadDir
os.File.Readdir
package main
import (
"fmt"
"io/ioutil"
"log"
"os"
"path/filepath"
)
func main() {
var (
root string
files []string
err error
)
root := "/home/manigandan/golang/samples"
// filepath.Walk
files, err = FilePathWalkDir(root)
if err != nil {
panic(err)
}
// ioutil.ReadDir
files, err = IOReadDir(root)
if err != nil {
panic(err)
}
//os.File.Readdir
files, err = OSReadDir(root)
if err != nil {
panic(err)
}
for _, file := range files {
fmt.Println(file)
}
}
Using filepath.Walk
The path/filepath package provides a handy way to scan all the files
in a directory, it will automatically scan each sub-directories in the
directory.
func FilePathWalkDir(root string) ([]string, error) {
var files []string
err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
if !info.IsDir() {
files = append(files, path)
}
return nil
})
return files, err
}
Using ioutil.ReadDir
ioutil.ReadDir reads the directory named by dirname and returns a
list of directory entries sorted by filename.
func IOReadDir(root string) ([]string, error) {
var files []string
fileInfo, err := ioutil.ReadDir(root)
if err != nil {
return files, err
}
for _, file := range fileInfo {
files = append(files, file.Name())
}
return files, nil
}
Using os.File.Readdir
Readdir reads the contents of the directory associated with file and
returns a slice of up to n FileInfo values, as would be returned by
Lstat, in directory order. Subsequent calls on the same file will
yield further FileInfos.
func OSReadDir(root string) ([]string, error) {
var files []string
f, err := os.Open(root)
if err != nil {
return files, err
}
fileInfo, err := f.Readdir(-1)
f.Close()
if err != nil {
return files, err
}
for _, file := range fileInfo {
files = append(files, file.Name())
}
return files, nil
}
Benchmark results.
Get more details on this Blog Post

Even simpler, use path/filepath:
package main
import (
"fmt"
"log"
"path/filepath"
)
func main() {
files, err := filepath.Glob("*")
if err != nil {
log.Fatal(err)
}
fmt.Println(files) // contains a list of all files in the current directory
}

Starting with Go 1.16, you can use the os.ReadDir function.
func ReadDir(name string) ([]DirEntry, error)
It reads a given directory and returns a DirEntry slice that contains the directory entries sorted by filename.
It's an optimistic function, so that, when an error occurs while reading the directory entries, it tries to return you a slice with the filenames up to the point before the error.
package main
import (
"fmt"
"log"
"os"
)
func main() {
files, err := os.ReadDir(".")
if err != nil {
log.Fatal(err)
}
for _, file := range files {
fmt.Println(file.Name())
}
}
Of interest: Go 1.17 (Q3 2021) includes fs.FileInfoToDirEntry():
func FileInfoToDirEntry(info FileInfo) DirEntry
FileInfoToDirEntry returns a DirEntry that returns information from info.
If info is nil, FileInfoToDirEntry returns nil.
Background
Go 1.16 (Q1 2021) will propose, with CL 243908 and CL 243914 , the ReadDir function, based on the FS interface:
// An FS provides access to a hierarchical file system.
//
// The FS interface is the minimum implementation required of the file system.
// A file system may implement additional interfaces,
// such as fsutil.ReadFileFS, to provide additional or optimized functionality.
// See io/fsutil for details.
type FS interface {
// Open opens the named file.
//
// When Open returns an error, it should be of type *PathError
// with the Op field set to "open", the Path field set to name,
// and the Err field describing the problem.
//
// Open should reject attempts to open names that do not satisfy
// ValidPath(name), returning a *PathError with Err set to
// ErrInvalid or ErrNotExist.
Open(name string) (File, error)
}
That allows for "os: add ReadDir method for lightweight directory reading":
See commit a4ede9f:
// ReadDir reads the contents of the directory associated with the file f
// and returns a slice of DirEntry values in directory order.
// Subsequent calls on the same file will yield later DirEntry records in the directory.
//
// If n > 0, ReadDir returns at most n DirEntry records.
// In this case, if ReadDir returns an empty slice, it will return an error explaining why.
// At the end of a directory, the error is io.EOF.
//
// If n <= 0, ReadDir returns all the DirEntry records remaining in the directory.
// When it succeeds, it returns a nil error (not io.EOF).
func (f *File) ReadDir(n int) ([]DirEntry, error)
// A DirEntry is an entry read from a directory (using the ReadDir method).
type DirEntry interface {
// Name returns the name of the file (or subdirectory) described by the entry.
// This name is only the final element of the path, not the entire path.
// For example, Name would return "hello.go" not "/home/gopher/hello.go".
Name() string
// IsDir reports whether the entry describes a subdirectory.
IsDir() bool
// Type returns the type bits for the entry.
// The type bits are a subset of the usual FileMode bits, those returned by the FileMode.Type method.
Type() os.FileMode
// Info returns the FileInfo for the file or subdirectory described by the entry.
// The returned FileInfo may be from the time of the original directory read
// or from the time of the call to Info. If the file has been removed or renamed
// since the directory read, Info may return an error satisfying errors.Is(err, ErrNotExist).
// If the entry denotes a symbolic link, Info reports the information about the link itself,
// not the link's target.
Info() (FileInfo, error)
}
src/os/os_test.go#testReadDir() illustrates its usage:
file, err := Open(dir)
if err != nil {
t.Fatalf("open %q failed: %v", dir, err)
}
defer file.Close()
s, err2 := file.ReadDir(-1)
if err2 != nil {
t.Fatalf("ReadDir %q failed: %v", dir, err2)
}
Ben Hoyt points out in the comments to Go 1.16 os.ReadDir:
os.ReadDir(path string) ([]os.DirEntry, error), which you'll be able to call directly without the Open dance.
So you can probably shorten this to just os.ReadDir, as that's the concrete function most people will call.
See commit 3d913a9 (Dec. 2020):
os: add ReadFile, WriteFile, CreateTemp (was TempFile), MkdirTemp (was TempDir) from io/ioutil
io/ioutil was a poorly defined collection of helpers.
Proposal #40025 moved out the generic I/O helpers to io.
This CL for proposal #42026 moves the OS-specific helpers to os,
making the entire io/ioutil package deprecated.
os.ReadDir returns []DirEntry, in contrast to ioutil.ReadDir's []FileInfo.
(Providing a helper that returns []DirEntry is one of the primary motivations for this change.)

ioutil.ReadDir is a good find, but if you click and look at the source you see that it calls the method Readdir of os.File. If you are okay with the directory order and don't need the list sorted, then this Readdir method is all you need.

From your description, what you probably want is os.Readdirnames.
func (f *File) Readdirnames(n int) (names []string, err error)
Readdirnames reads the contents of the directory associated with file and returns a slice of up to n names of files in the directory, in directory order. Subsequent calls on the same file will yield further names.
...
If n <= 0, Readdirnames returns all the names from the directory in a single slice.
Snippet:
file, err := os.Open(path)
if err != nil {
return err
}
defer file.Close()
names, err := file.Readdirnames(0)
if err != nil {
return err
}
fmt.Println(names)
Credit to SquattingSlavInTracksuit's comment; I'd have suggested promoting their comment to an answer if I could.

A complete example of printing all the files in a directory recursively using Readdirnames
package main
import (
"fmt"
"os"
)
func main() {
path := "/path/to/your/directory"
err := readDir(path)
if err != nil {
panic(err)
}
}
func readDir(path string) error {
file, err := os.Open(path)
if err != nil {
return err
}
defer file.Close()
names, _ := file.Readdirnames(0)
for _, name := range names {
filePath := fmt.Sprintf("%v/%v", path, name)
file, err := os.Open(filePath)
if err != nil {
return err
}
defer file.Close()
fileInfo, err := file.Stat()
if err != nil {
return err
}
fmt.Println(filePath)
if fileInfo.IsDir() {
readDir(filePath)
}
}
return nil
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Handling nested zip files with archive/zip - go

Related

How to get the number of files in a directory in beego [duplicate]

Go: zlib uncompressing a slice of bytes

Getting count of files in directory using Go

Parallel zip compression in Go

List directory in Go

Categories

Resources