I've found lots of examples of how to extract all files from a .zip, but I can't figure out how to extract a single file without iterating over every file in the archive.
Is it possible in Go to extract a single file from a .zip archive without iterating over all the files in it?
For example, if a zip file contained:
folder1/file1.txt
folder1/file2.txt
folder1/file3.txt
folder2/file1.txt
How would I extract only folder2/file1.txt?
zip.Reader gives you the content of the archive as a slice of *zip.File. There is no helper method to get a file by name, so you have to iterate over the files with a loop. You don't need to open or extract each file along the way, but to find a file by name you do have to loop.
For example:
r, err := zip.OpenReader("testdata/readme.zip")
if err != nil {
    log.Fatal(err)
}
defer r.Close()

for _, f := range r.File {
    if f.Name != "folder2/file1.txt" {
        continue
    }
    // Found it, print its content to the terminal:
    rc, err := f.Open()
    if err != nil {
        log.Fatal(err)
    }
    _, err = io.Copy(os.Stdout, rc)
    if err != nil {
        log.Fatal(err)
    }
    rc.Close()
    fmt.Println()
    break
}
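Note that if you are on Go 1.16 or newer, zip.Reader also implements fs.FS, so the standard library can do the lookup by name for you; a minimal sketch:

r, err := zip.OpenReader("testdata/readme.zip")
if err != nil {
    log.Fatal(err)
}
defer r.Close()

// Open implements fs.FS (Go 1.16+); names use forward slashes
// and no leading slash.
f, err := r.Open("folder2/file1.txt")
if err != nil {
    log.Fatal(err)
}
defer f.Close()

if _, err := io.Copy(os.Stdout, f); err != nil {
    log.Fatal(err)
}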
Below is a snippet of my code that collects some gzip-compressed PDF files.
I want to add the PDFs to a tar.gz file, but before adding them they need to be uncompressed; I don't want to end up with a tar.gz full of pdf.gz files. I also need to decompress them without reading the entire file into memory.
The problem: the PDF files inside the tar.gz come out clipped and corrupted. When I compare them with the original PDF files they look equal, except that the last part of each extracted file is missing.
// Create new gzip writer with compression level 1
gzw, _ := gzip.NewWriterLevel(w, 1)
defer gzw.Close()

// Create new tar writer
tw := tar.NewWriter(gzw)
defer tw.Close()

file_path := "path-to-file.pdf.gz"
file_name := "filename-shown-in-tar.pdf"

// Open file to add to tar
fp, err := os.Open(file_path)
if err != nil {
    log.Printf("Error: %v", err)
}
defer fp.Close()

info, err := fp.Stat()
if err != nil {
    log.Printf("Error: %v", err)
}

header, err := tar.FileInfoHeader(info, file_name)
if err != nil {
    log.Printf("Error: %v", err)
}
header.Name = file_name
tw.WriteHeader(header)

// This part writes the *.pdf.gz files directly to the tar.gz file.
// It works: it's possible to open the tar.gz file and afterwards
// open the individual pdf.gz files.
//io.Copy(tw, fp)

// This part decodes the gz before adding, but it clips the PDF files
// inside the tar.gz file:
gzr, err := gzip.NewReader(fp)
if err != nil {
    log.Printf("Error: %v", err)
}
defer gzr.Close()
io.Copy(tw, gzr)
Update
I got a suggestion from a comment, but now the PDF files inside the tar can't be opened. The tar.gz file is created and can be opened, but the PDF files inside are corrupted.
I have compared output files from the tar.gz with the original PDFs. The corrupted file is missing the last part of the original: in one example the original has 498 lines and the corrupted one only 425, but those 425 lines are identical to the original. The tail is simply clipped.
The issue appears to be that you're building the file info header from the original file, which is compressed. In particular, it is the Size that causes problems: if you attempt to write more than the Size indicated in the header, archive/tar's Writer.Write() returns ErrWriteTooLong - see https://github.com/golang/go/blob/d5efd0dd63a8beb5cc57ae7d25f9c60d5dea5c65/src/archive/tar/writer.go#L428-L429
Something like the following should work, whereby the file is uncompressed and read so an accurate size can be established:
// Open file to add to tar
fp, err := os.Open(file_path)
if err != nil {
    log.Printf("Error: %v", err)
}
defer fp.Close()

gzr, err := gzip.NewReader(fp)
if err != nil {
    log.Printf("Error: %v", err)
}
defer gzr.Close()

data, err := io.ReadAll(gzr)
if err != nil {
    log.Printf("Error: %v", err)
}

// Create tar header for file; Size is now the uncompressed size
header := &tar.Header{
    Name: file_name,
    Mode: 0600,
    Size: int64(len(data)),
}

// Write header to the tar
if err = tw.WriteHeader(header); err != nil {
    log.Printf("Error: %v", err)
}

// Write the file content to the tar
if _, err = tw.Write(data); err != nil {
    log.Printf("Error: %v", err)
}
I have a 7z archive of a number of .txt files. I am trying to list all the files in the archive and upload them to an S3 bucket, but I'm having trouble extracting .7z archives in Go. To do this, I found the package github.com/gen2brain/go-unarr (imported as extractor), and this is what I have so far:
content, err := ioutil.ReadFile("sample_archive.7z")
if err != nil {
    fmt.Printf("err: %+v", err)
}

a, err := extractor.NewArchiveFromMemory(content)
if err != nil {
    fmt.Printf("err: %+v", err)
}

lst, _ := a.List()
fmt.Printf("lst: %+v", lst)
This prints a list of all the files in the archive. But it has two issues.
It reads the file from local disk using ioutil, and the input of NewArchiveFromMemory must be of type []byte. I can't read from local disk and will have to use a file from memory of type os.File, so I will either have to find a different method or convert the os.File to []byte. There's another method, NewArchiveFromReader(r io.Reader), but this returns an error saying Bad File Descriptor.
file, err := os.OpenFile(
    path,
    os.O_WRONLY|os.O_TRUNC|os.O_CREATE,
    0666,
)
a, err := extractor.NewArchiveFromReader(file)
if err != nil {
    fmt.Printf("ERROR: %+v", err)
}
lst, _ := a.List()
fmt.Printf("files: %+v\n", lst)
I am able to get the list of the files in the archive, and using Extract(destination_path string) I can also extract them to a local directory. But I want the extracted files in os.File form as well (i.e. a list of os.File, since there will be multiple files).
How can I change my current code to achieve both the above targets? Is there any other library to do this?
os.File implements the io.Reader interface (because it has a Read([]byte) (int, error) method defined), so you can pass it to NewArchiveFromReader(file) without any conversion. You can read up on Go interfaces for more background on why that works. The Bad File Descriptor error most likely comes from opening the file with os.O_WRONLY|os.O_TRUNC, which both truncates the archive and returns a handle you cannot read from; open it with os.Open instead.
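For example (a sketch based on the snippet from the question; note the file is opened read-only with os.Open):

// Open read-only; O_WRONLY|O_TRUNC would truncate the archive and
// make every read fail with "bad file descriptor".
file, err := os.Open(path)
if err != nil {
    fmt.Printf("err: %+v", err)
}
defer file.Close()

a, err := extractor.NewArchiveFromReader(file)
if err != nil {
    fmt.Printf("err: %+v", err)
}

lst, _ := a.List()
fmt.Printf("files: %+v\n", lst)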
If you're okay with extracting to a local directory, you can do that and then read the files back in (warning, may contain typos):
func extractAndOpenAll(a *extractor.Archive) ([]*os.File, error) {
    dir := "/tmp/path" // consider using ioutil.TempDir()
    err := a.Extract(dir)
    if err != nil {
        return nil, err
    }
    filestats, err := ioutil.ReadDir(dir)
    if err != nil {
        return nil, err
    }
    // Warning: all these file handles must be closed by the caller,
    // which is why even the error case here returns the list of files.
    // If you forget, your process might leak file handles.
    files := make([]*os.File, 0)
    for _, fs := range filestats {
        // fs.Name() is relative, so join it with the extraction dir
        file, err := os.Open(filepath.Join(dir, fs.Name()))
        if err != nil {
            return files, err
        }
        files = append(files, file)
    }
    return files, nil
}
It is possible to use the archived files without writing back to disk (https://github.com/gen2brain/go-unarr#read-all-entries-from-archive), but whether or not you should do that instead depends on what your next step is.
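Per the linked README, reading every entry into memory looks roughly like this (a sketch; Entry, Name, and ReadAll are the go-unarr API as documented there):

a, err := extractor.NewArchive("sample_archive.7z")
if err != nil {
    fmt.Printf("err: %+v", err)
}
defer a.Close()

for {
    err := a.Entry() // advance to the next entry
    if err == io.EOF {
        break // no more entries
    }
    if err != nil {
        fmt.Printf("err: %+v", err)
        break
    }
    data, err := a.ReadAll() // full contents of the current entry
    if err != nil {
        fmt.Printf("err: %+v", err)
        break
    }
    fmt.Printf("%s: %d bytes\n", a.Name(), len(data))
    // e.g. hand data to your S3 uploader here instead of writing to disk
}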
I need to implement an sftp client that connects to a host, reads all available files in a specified folder, then checks whether a particular file matches a pattern and copies it to the corresponding local directory. Problem is that I can't find a way to do this.
I tried to use client.Walk but cannot figure out how to tell whether the current entry is a directory and skip it:
walker := client.Walk(startDir)
for walker.Step() {
    if err := walker.Err(); err != nil {
        fmt.Fprintln(os.Stderr, err)
        continue
    }
    filePath := walker.Path()
}
How can I determine whether the current iteration is a directory?
You may use Walker.Stat() to obtain info about the most recent file or directory visited by a call to Walker.Step(). It returns a value of type os.FileInfo, which has an IsDir() method.
For example:
for walker.Step() {
    if err := walker.Err(); err != nil {
        fmt.Fprintln(os.Stderr, err)
        continue
    }
    if fi := walker.Stat(); fi.IsDir() {
        continue // Skip dir
    }
    // ...
}
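To finish the flow from the question, you could match the name and download inside the same loop; a sketch assuming the github.com/pkg/sftp client, with the pattern and localDir as hypothetical placeholders:

if fi := walker.Stat(); fi.IsDir() {
    continue // Skip dir
}
remotePath := walker.Path()
// "*.csv" and localDir are hypothetical placeholders.
if ok, _ := path.Match("*.csv", path.Base(remotePath)); !ok {
    continue
}
src, err := client.Open(remotePath)
if err != nil {
    fmt.Fprintln(os.Stderr, err)
    continue
}
dst, err := os.Create(filepath.Join(localDir, path.Base(remotePath)))
if err != nil {
    src.Close()
    fmt.Fprintln(os.Stderr, err)
    continue
}
io.Copy(dst, src)
src.Close()
dst.Close()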
I need to take tmp1.zip and append its tmp1.signed file to the end of it, creating a new tmp1.zip.signed file, using Go.
It's essentially the same as cat | sc.
I could call the command line from Go, but that seems super inefficient (and cheesy).
So far
Googling the words "go combine files" et al. yields minimal help.
But I have come across a couple of options that I have tried, such as:
f, err := os.OpenFile("tmp1.txt", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
if err != nil {
log.Fatal(err)
}
if _, err := f.Write([]byte("appended some data\n")); err != nil {
log.Fatal(err)
}
if err := f.Close(); err != nil {
log.Fatal(err)
}
But that just appends a string to the end of the file; it doesn't really merge the two files or append the signature to the original file.
Question
Assuming I am asking the right questions to get one file appended to another, is there a better example of how exactly to merge two files into one using Go?
Based on your question, you want to create a new file with the content of both files.
You can use io.Copy to achieve that.
Here is a simple command-line tool implementing it.
package main

import (
    "io"
    "log"
    "os"
)

func main() {
    if len(os.Args) != 4 {
        log.Fatalf("Usage: %s <zip> <signed> <output>\n", os.Args[0])
    }
    zipName, signedName, output := os.Args[1], os.Args[2], os.Args[3]

    zipIn, err := os.Open(zipName)
    if err != nil {
        log.Fatalln("failed to open zip for reading:", err)
    }
    defer zipIn.Close()

    signedIn, err := os.Open(signedName)
    if err != nil {
        log.Fatalln("failed to open signed for reading:", err)
    }
    defer signedIn.Close()

    // O_TRUNC so an existing output file is fully overwritten
    out, err := os.OpenFile(output, os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0644)
    if err != nil {
        log.Fatalln("failed to open output file:", err)
    }
    defer out.Close()

    n, err := io.Copy(out, zipIn)
    if err != nil {
        log.Fatalln("failed to append zip file to output:", err)
    }
    log.Printf("wrote %d bytes of %s to %s\n", n, zipName, output)

    n, err = io.Copy(out, signedIn)
    if err != nil {
        log.Fatalln("failed to append signed file to output:", err)
    }
    log.Printf("wrote %d bytes of %s to %s\n", n, signedName, output)
}
Basically, it opens both files you want to merge, creates a new one, and copies the content of each file into the new file.
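With the file names from the question, usage would look like this (the binary name is arbitrary):

go build -o combine .
./combine tmp1.zip tmp1.signed tmp1.zip.signed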
We can create a new zip file and add files to it using Go.
But how do we add a new file to an existing zip file using Go?
If we can use the Create function, how do we get the zip.Writer reference?
A bit confused.
After more analysis, I found that it is not possible to add files to an existing zip file.
But I was able to add files to a tar file by following the hack given in this URL.
You can:
copy the old zip items into a new zip file;
add new files into the new zip file.
// NOTE: error handling elided for brevity.
zipReader, _ := zip.OpenReader(zipPath)
defer zipReader.Close()

targetFile, _ := os.Create(targetFilePath)
defer targetFile.Close()

targetZipWriter := zip.NewWriter(targetFile)

for _, zipItem := range zipReader.File {
    zipItemReader, _ := zipItem.Open()
    header, _ := zip.FileInfoHeader(zipItem.FileInfo())
    header.Name = zipItem.Name
    targetItem, _ := targetZipWriter.CreateHeader(header)
    io.Copy(targetItem, zipItemReader)
    zipItemReader.Close()
}

addNewFiles(targetZipWriter) // IMPLEMENT YOUR LOGIC

targetZipWriter.Close() // writes the central directory
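addNewFiles is left to the reader above; a minimal hypothetical version that adds one in-memory file could look like this:

// Hypothetical: add a single in-memory file to the still-open zip writer.
func addNewFiles(zw *zip.Writer) {
    w, err := zw.Create("notes/readme.txt")
    if err != nil {
        log.Fatal(err)
    }
    if _, err := w.Write([]byte("added to the copied archive\n")); err != nil {
        log.Fatal(err)
    }
}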
Although I have not yet attempted this with a zip file that already exists, I believe you should be able to add files to it.
This is code I wrote to create a conglomerate zip file containing multiple files, in order to expedite uploading the data to another location. I hope it helps!
type fileData struct {
    Filename string
    Body     []byte
}

func main() {
    outputFilename := "path/to/file.zip"

    // whatever you want as filenames and bodies
    fileDatas := createFileDatas()

    // create zip file
    conglomerateZip, err := os.Create(outputFilename)
    if err != nil {
        log.Fatal(err) // main cannot return an error
    }
    defer conglomerateZip.Close()

    zipWriter := zip.NewWriter(conglomerateZip)
    defer zipWriter.Close()

    // populate zip file with multiple files
    err = populateZipfile(zipWriter, fileDatas)
    if err != nil {
        log.Fatal(err)
    }
}

func populateZipfile(w *zip.Writer, fileDatas []*fileData) error {
    for _, fd := range fileDatas {
        f, err := w.Create(fd.Filename)
        if err != nil {
            return err
        }
        _, err = f.Write(fd.Body)
        if err != nil {
            return err
        }
        err = w.Flush()
        if err != nil {
            return err
        }
    }
    return nil
}
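createFileDatas is a placeholder above; a hypothetical stand-in for testing might be:

// Hypothetical stand-in for createFileDatas used in main above.
func createFileDatas() []*fileData {
    return []*fileData{
        {Filename: "a.txt", Body: []byte("first file\n")},
        {Filename: "b.txt", Body: []byte("second file\n")},
    }
}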
This is a bit old and already has an answer, but if performance isn't a key concern for you (making the zip file isn't on a hot path, for example), you can do this with the archive/zip library by creating a new writer, copying the existing files into it, and then adding your new content. Something like this:
zw := // new zip writer from buffer or temp file
newFileName := // file name to add

reader, _ := zip.NewReader(bytes.NewReader(existingFile), int64(len(existingFile)))
for _, file := range reader.File {
    if file.Name == newFileName {
        continue // don't copy the old file over to avoid duplicates
    }
    fw, _ := zw.Create(file.Name)
    fr, _ := file.Open()
    io.Copy(fw, fr)
    fr.Close()
}
Then you would return the new writer and append files as needed. If you aren't sure which files might overlap you can turn that if check into a function with a list of file names you will eventually add. You can also use this logic to remove a file from an existing archive.
Now in 2021, there is still no support for appending files to an existing archive.
But at least it is now possible to add already-compressed files, i.e. we no longer have to decompress and re-compress files when copying them from the old archive to the new one.
(NOTE: this only applies to Go 1.17+)
So, based on the examples by @wongoo and @Michael, here is how I would implement appending files now with minimal performance overhead (you'll want to add error handling, though):
// NOTE: errors ignored for brevity; add error handling (see note above).
zr, _ := zip.OpenReader(zipPath)
defer zr.Close()

zwf, _ := os.Create(targetFilePath)
defer zwf.Close()

zw := zip.NewWriter(zwf)
defer zw.Close() // writes the central directory when it runs

for _, zipItem := range zr.File {
    if isOneOfNamesWeWillAdd(zipItem.Name) {
        continue // avoid duplicate files!
    }
    zipItemReader, _ := zipItem.OpenRaw()
    header := zipItem.FileHeader           // clone header data
    targetItem, _ := zw.CreateRaw(&header) // use cloned data
    io.Copy(targetItem, zipItemReader)
}

addNewFiles(zw) // IMPLEMENT YOUR LOGIC