Determine Length Of Golang Gzip File Without Reading It? - go

I have gzipped files on disk that I wish to stream to an HTTP client uncompressed. To do this I need to send a length header, then stream the uncompressed file to the client. I know the gzip protocol stores the original length of the uncompressed data, but as far as I can tell golang's "compress/gzip" package does not appear to have a way to grab this length. I've resorted to reading the file into a variable then taking the string length from that, but this is grossly inefficient and wasteful of memory especially on larger files.
Below is the code I've ended up using:
func DownloadHandler(w http.ResponseWriter, r *http.Request) {
	path := "/path/to/thefile.gz"
	openfile, err := os.Open(path)
	if err != nil {
		w.WriteHeader(http.StatusNotFound)
		fmt.Fprint(w, "404")
		return
	}
	defer openfile.Close()
	fz, err := gzip.NewReader(openfile)
	if err != nil {
		w.WriteHeader(http.StatusNotFound)
		fmt.Fprint(w, "404")
		return
	}
	defer fz.Close()
	// Wastefully read data into a string so I can get the length.
	s, err := ioutil.ReadAll(fz)
	if err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		return
	}
	reader := strings.NewReader(string(s))
	// Send the headers.
	w.Header().Set("Content-Disposition", "attachment; filename=test")
	w.Header().Set("Content-Length", strconv.Itoa(len(s))) // Send length to client.
	w.Header().Set("Content-Type", "text/csv")
	io.Copy(w, reader) // 'Copy' the file to the client.
}
What I would expect to be able to do instead is something like this:
func DownloadHandler(w http.ResponseWriter, r *http.Request) {
	path := "/path/to/thefile.gz"
	openfile, err := os.Open(path)
	if err != nil {
		w.WriteHeader(http.StatusNotFound)
		fmt.Fprint(w, "404")
		return
	}
	defer openfile.Close()
	fz, err := gzip.NewReader(openfile)
	if err != nil {
		w.WriteHeader(http.StatusNotFound)
		fmt.Fprint(w, "404")
		return
	}
	defer fz.Close()
	// Send the headers.
	w.Header().Set("Content-Disposition", "attachment; filename=test")
	w.Header().Set("Content-Length", strconv.Itoa(fz.Length())) // Send length to client.
	w.Header().Set("Content-Type", "text/csv")
	io.Copy(w, fz) // 'Copy' the file to the client.
}
Does anyone know how to get the uncompressed length for a gzipped file in golang?

The gzip format might appear to provide the uncompressed length (each member's trailer carries an ISIZE field), but it cannot be trusted: ISIZE only stores the length modulo 2^32, and a .gz file may contain several concatenated members, each with its own trailer. Unfortunately, the only reliable way to get the uncompressed length is to decompress the gzip stream. (You can just count the bytes as you decompress, without saving the uncompressed data anywhere.)
See this answer for why.
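A minimal sketch of that "count, then stream" approach, adapted to the handler above (note it decompresses the file twice, trading CPU for memory):
func DownloadHandler(w http.ResponseWriter, r *http.Request) {
	path := "/path/to/thefile.gz"
	openfile, err := os.Open(path)
	if err != nil {
		w.WriteHeader(http.StatusNotFound)
		fmt.Fprint(w, "404")
		return
	}
	defer openfile.Close()

	// First pass: count the uncompressed bytes without keeping them.
	fz, err := gzip.NewReader(openfile)
	if err != nil {
		w.WriteHeader(http.StatusNotFound)
		fmt.Fprint(w, "404")
		return
	}
	length, err := io.Copy(ioutil.Discard, fz)
	if err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		return
	}
	fz.Close()

	// Second pass: rewind the compressed file and stream the data to the client.
	if _, err := openfile.Seek(0, io.SeekStart); err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		return
	}
	fz, err = gzip.NewReader(openfile)
	if err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		return
	}
	defer fz.Close()

	w.Header().Set("Content-Disposition", "attachment; filename=test")
	w.Header().Set("Content-Length", strconv.FormatInt(length, 10))
	w.Header().Set("Content-Type", "text/csv")
	io.Copy(w, fz) // stream the decompressed data to the client
}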

Related

Editing zip file in memory and returning it via http response results in a corrupt file

Hey guys, I'm new to Go (exactly 23 hours and 10 minutes new), so obviously I'm having issues with some stuff. I have a zip file in memory, and I would like to make a copy of it, add some files to the copy, and return the copy via HTTP. It works, but when I open the downloaded file it seems to be corrupted.
outFile, err := os.OpenFile("./template.zip", os.O_RDWR, 0666)
if err != nil {
	log.Fatalf("Failed to open zip for writing: %s", err)
}
defer outFile.Close()

zipw := zip.NewWriter(outFile)
fmt.Println(reflect.TypeOf(zipw))
for _, appCode := range appPageCodeText {
	f, err := zipw.Create(appCode.Name + ".jsx")
	if err != nil {
		log.Fatal(err)
	}
	_, err = f.Write([]byte(appCode.Content)) // casting it to a byte slice and writing it to the zip entry
}

// Clean up
err = zipw.Close()
if err != nil {
	log.Fatal(err)
}
defer outFile.Close()

// Get the Content-Type of the file.
// Create a buffer to store the header of the file in.
FileHeader := make([]byte, 512)
// Copy the headers into the FileHeader buffer.
outFile.Read(FileHeader)
// Get content type of file.
fmt.Println(reflect.TypeOf(outFile))

// Get the file size.
FileStat, _ := outFile.Stat()                      // Get info from file.
FileSize := strconv.FormatInt(FileStat.Size(), 10) // Get file size as a string.
buffer := make([]byte, FileStat.Size())
outFile.Read(buffer)

// Send the headers.
w.Header().Set("Content-Disposition", "attachment; filename="+"template.zip")
w.Header().Set("Content-Type", "application/zip")
w.Header().Set("Content-Length", FileSize)
outFile.Seek(0, 0)
// io.Copy(w, buffer) //'Copy' the file to the client
w.Write(buffer)
(The primary problem): you Read the first 512 bytes of outFile into FileHeader, which means that they're not read into buffer, which means the first 512 bytes of the file aren't sent to the client. You do a Seek, but too late for it to be useful — the contents of buffer are already set at that point. You need to move the Seek earlier, or write both buffers, or just remove the unnecessary FileHeader read.
Your comment claims that you do so to get the content-type of the file, but FileHeader is actually never used. And why would it be? You know what the type of the file is, you just wrote it. So the separate read of the first 512 bytes is unneeded.
Actually, it's all unneeded. Instead of making a file on disk, using a zip.Writer to write to the file, re-opening the file from disk, reading it into a byte array, and then writing that byte array to the HTTP client, you can simply have the zip.Writer write directly to the HTTP client (if you don't care about setting Content-Length), or have it write to a bytes.Buffer and then copy that buffer out to the HTTP client (if an accurate Content-Length is important to you).
The first version looks like:
w.Header().Set("Content-Disposition", "attachment; filename=template.zip")
w.Header().Set("Content-Type", "application/zip")
zipw := zip.NewWriter(w)
// Your for loop to add items to the zip goes here.
//
zipw.Close() // plus error handling
And the second version looks like:
buffer := &bytes.Buffer{}
zipw := zip.NewWriter(buffer)
// Your for loop to add items to the zip goes here.
//
zipw.Close() // plus error handling
w.Header().Set("Content-Disposition", "attachment; filename=template.zip")
w.Header().Set("Content-Type", "application/zip")
w.Header().Set("Content-Length", strconv.Itoa(buffer.Len()))
io.Copy(w, buffer) // plus error handling
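The trade-off between the two: the first version starts streaming immediately but the client never learns the total size up front, while the second holds the entire zip in memory so that its exact length is known before any bytes are sent.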

Transfer contents of directory over net's TCP connection

I am currently learning Go and I am trying to send the contents of a directory to another machine over a plain tcp connection using Go's net package.
It works fine with individual files and small folders, but I run into issues if the folder contains many subfolders and larger files. I am using the filepath.Walk function to traverse all files in the given directory. For each file or directory I send, I also send a header that gives the receiver the file name, file size, and an isDir flag, so the receiver knows how many bytes of content to read. The issue I am having is that after a while, when reading a header, I am actually reading file content of the previous file, even though I have already read that file from the connection.
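(The Header and Session types are not shown in the question; purely for orientation, a hypothetical shape consistent with how the snippets below use them could be:)
// Hypothetical definitions, inferred from the usage below; the question
// does not include the real ones.
type Header struct {
	Name  string // base name of the file or directory
	Size  int64  // number of content bytes that follow this header
	Path  string // path to recreate on the receiving side
	IsDir bool   // set via SetDirBit() for directories
}

func (h *Header) SetDirBit() { h.IsDir = true }

// Session presumably wraps the TCP connection: WriteHeader serialises a Header,
// while Write and Flush move raw content bytes through a buffered writer.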
Here is the writer side. I simply traverse over the directory.
func transferDir(session *Session, dir string) error {
	return filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		header := Header{Name: info.Name(), Size: info.Size(), Path: path}
		if info.IsDir() {
			header.SetDirBit()
			session.WriteHeader(header)
			return nil // nothing more to write
		}
		// content is a file. write the file now byte by byte
		file, err := os.Open(path)
		inf, err := file.Stat()
		header.Size = inf.Size() // get the true size of the file
		session.WriteHeader(header)
		defer file.Close()
		if err != nil {
			return err
		}
		buf := make([]byte, BUF_SIZE)
		for {
			n, err := file.Read(buf)
			if err != nil {
				if err == io.EOF {
					session.Write(buf[:n])
					session.Flush()
					break
				} else {
					log.Println(err)
					return err
				}
			}
			session.Write(buf[:n])
			session.Flush()
		}
		return nil
	})
}
And here is the reader part
func (c *Clone) readFile(h Header) error {
	file, err := os.Create(h.Path)
	defer file.Close()
	if err != nil {
		return err
	}
	var receivedByts int64
	fmt.Printf("Reading File: %s Size: %d\n", h.Name, h.Size)
	for {
		if (h.Size - receivedByts) < BUF_SIZE {
			n, err := io.CopyN(file, c.sesh, (h.Size - receivedByts))
			fmt.Println("Written: %d err: %s\n", n, err)
			break
		}
		n, err := io.CopyN(file, c.sesh, BUF_SIZE)
		fmt.Println("Written: %d err: %s\n", n, err)
		receivedByts += BUF_SIZE
		fmt.Println("Bytes Read: ", receivedByts)
	}
	return nil
}
Now the weird part is that when I am looking at the print statements I see something like:
Reading File: test.txt Size: 14024
Written 1024 nil
Bytes Read 1024
... This continues all the way to the break statement
And the total of the Bytes read equals the actual file size. Yet, the subsequent read for the header will return content from the test.txt file. Almost like there is still stuff in the buffer, but I think I read it already....

Best pattern to create sha256 from file and store file

I am writing a webserver that receives a file uploaded as multipart/form-data. I am generating the file's sha256 from the request, but due to the nature of the Reader interface, I can't reuse the data to also upload the file to a filer. These files can be a few hundred MB. What is the best way to store the content? I could duplicate the contents, but I am worried that could be wasteful of memory.
EDIT
func uploadFile(w http.ResponseWriter, r *http.Request) {
	f, err := r.MultipartForm.File["capture"][0].Open()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer f.Close()
	hash, err := createSha(f)
	if err != nil {
		fmt.Println(err.Error())
		return
	}
}
func createSha(image multipart.File) (hash.Hash, error) {
	sha := sha256.New()
	// This consumes the contents of image, so they are no longer available to be read again and stored on the filer.
	if _, err := io.Copy(sha, image); err != nil {
		return nil, err
	}
	return sha, nil
}
You might use io.MultiWriter(...) to send the data to multiple destinations in a single pass, such as a hash and some remote writer.
For example (roughly):
sha := sha256.New()
filer := filer.New(...) // Some Writer that stores the bytes for you?
_, err := io.Copy(io.MultiWriter(sha, filer), r)
// TODO: handle error
// Now sha.Sum(nil) has the file digest and "filer" got sent all the bytes.
Note that io.MultiWriter can take as many writers as you want, so you could compute additional hashes at the same time (e.g. md5, sha1, etc.) or even send the file to multiple locations, e.g.:
md5, sha1, sha256, sha512 := md5.New(), sha1.New(), sha256.New(), sha512.New()
s3Writer, gcsWriter := filer.NewS3Writer(), filer.NewGCSWriter()
mw := io.MultiWriter(s3Writer, gcsWriter, md5, sha1, sha256, sha512)
_, err := io.Copy(mw, r)
// TODO: handle error
// Now you've got all the hashes for the file and it's stored in the cloud.
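For instance, here is a rough sketch of the question's upload handler using this pattern, with a plain local file standing in for the real filer (the filer/S3/GCS writers above are placeholders, not a real API):
func uploadFile(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(32 << 20); err != nil { // buffer up to 32 MB in memory, the rest on disk
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	f, _, err := r.FormFile("capture")
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	defer f.Close()

	dst, err := os.Create("/tmp/capture.bin") // stand-in for the real filer
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer dst.Close()

	sha := sha256.New()
	// One pass over the upload: bytes go to disk and into the hash at the same time.
	if _, err := io.Copy(io.MultiWriter(dst, sha), f); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(w, "sha256: %x\n", sha.Sum(nil))
}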

Download a zip file using io.Pipe() read/write golang

I am trying to stream out the bytes of a zip file using Go's io.Pipe() function. I use the pipe reader to read the bytes of each file in the zip and stream those out, and the pipe writer to write the bytes into the response object.
func main() {
	r, w := io.Pipe()
	// goroutine to make the write/read non-blocking
	go func() {
		defer w.Close()
		bytes, err := ReadBytesforEachFileFromTheZip()
		err := json.NewEncoder(w).Encode(bytes)
		handleErr(err)
	}()
}
This is not a working implementation, but the structure of what I am trying to achieve. I don't want to use ioutil.ReadAll since the file is going to be very large, and Pipe() will help me avoid bringing all the data into memory. Can someone help with a working implementation using io.Pipe()?
I made it work using golang's io.Pipe(). The pipe writer writes bytes to the pipe in chunks and the pipe reader reads them from the other end. The reason for using a goroutine is to have a non-blocking write operation while simultaneous reads happen from the pipe.
Note: It's important to close the pipe writer (w.Close()) so that EOF is sent on the stream; otherwise the reader will never see the end of the stream.
func DownloadZip() ([]byte, error) {
	r, w := io.Pipe()
	defer r.Close()
	defer w.Close()
	zip, err := os.Stat("temp.zip")
	if err != nil {
		return nil, err
	}
	go func() {
		f, err := os.Open(zip.Name())
		if err != nil {
			return
		}
		buf := make([]byte, 1024)
		for {
			chunk, err := f.Read(buf)
			if err != nil && err != io.EOF {
				panic(err)
			}
			if chunk == 0 {
				break
			}
			if _, err := w.Write(buf[:chunk]); err != nil {
				return
			}
		}
		w.Close()
	}()
	body, err := ioutil.ReadAll(r)
	if err != nil {
		return nil, err
	}
	return body, nil
}
Please let me know if someone has another way of doing it.
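One alternative sketch, not from the original answer: since the stated goal is to avoid holding the file in memory, the pipe reader can be copied straight into the http.ResponseWriter instead of going through ioutil.ReadAll. For a single file already on disk the pipe is strictly unnecessary (io.Copy(w, f) would do), but the shape becomes useful once a zip.Writer or similar writer-oriented API has to produce the bytes.
func DownloadZipHandler(w http.ResponseWriter, r *http.Request) {
	pr, pw := io.Pipe()
	go func() {
		f, err := os.Open("temp.zip")
		if err != nil {
			pw.CloseWithError(err) // propagate the error to the reading side
			return
		}
		defer f.Close()
		_, err = io.Copy(pw, f) // stream the file into the pipe in chunks
		pw.CloseWithError(err)  // a nil error closes the pipe with a normal EOF
	}()

	w.Header().Set("Content-Type", "application/zip")
	w.Header().Set("Content-Disposition", "attachment; filename=temp.zip")
	// Copy from the pipe to the client as bytes become available;
	// nothing is buffered beyond io.Copy's internal chunk.
	if _, err := io.Copy(w, pr); err != nil {
		log.Println(err)
	}
}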

File reading and encoding performance

I'm writing a web service which is supposed to receive an XML file from the user, read it, and save the data to a database.
The file is gzipped and encoded in UTF-16, so I have to gunzip it and save the XML to a file (for future purposes). Next I have to read the file into a string, decode it to UTF-8, and do something like xml.Unmarshal([]byte(xmlString), &report).
Currently I'm not saving anything to the database yet.
On my local machine I've found that processing one request takes about 30% of my CPU and about 300 ms. For a single request that looks okay, but I made a script which fires 100 requests simultaneously (via curl) and saw CPU usage go up to 100% and the time per request increase to 2 seconds.
What I wanted to ask is: should I worry about this, or will things be fine on a real web server? Or maybe I'm doing something wrong.
Here is the code:
func Parse(filename string) Report {
	xmlString := getXml(filename)
	report := Report{}
	xml.Unmarshal([]byte(xmlString), &report)
	return report
}

func getXml(filename string) string {
	b, err := ioutil.ReadFile(filename)
	if err != nil {
		fmt.Println("Error opening file:", err)
	}
	s, err := decodeUTF16(b)
	if err != nil {
		panic(err)
	}
	pattern := `<?xml version="1.0" encoding="UTF-16"?>`
	res := strings.Replace(s, pattern, "", 1)
	return res
}

func decodeUTF16(b []byte) (string, error) {
	if len(b)%2 != 0 {
		return "", fmt.Errorf("Must have even length byte slice")
	}
	u16s := make([]uint16, 1)
	ret := &bytes.Buffer{}
	b8buf := make([]byte, 4)
	lb := len(b)
	for i := 0; i < lb; i += 2 {
		u16s[0] = uint16(b[i]) + (uint16(b[i+1]) << 8)
		r := utf16.Decode(u16s)
		n := utf8.EncodeRune(b8buf, r[0])
		ret.Write(b8buf[:n])
	}
	return ret.String(), nil
}
Please ask if I forgot something important.
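Not part of the original question, but as a hedged sketch of how the same work could be done as one stream, avoiding the large intermediate []byte and string (this assumes golang.org/x/text/encoding/unicode is available; its decoder converts UTF-16 to UTF-8 on the fly):
// parseStream is a hypothetical streaming variant of Parse: it decodes the
// saved UTF-16 XML file to UTF-8 and unmarshals it in a single pass.
func parseStream(filename string) (Report, error) {
	var report Report
	f, err := os.Open(filename)
	if err != nil {
		return report, err
	}
	defer f.Close()

	// Decode UTF-16 on the fly (honour a BOM if present, default to little-endian).
	dec16 := unicode.UTF16(unicode.LittleEndian, unicode.UseBOM).NewDecoder()
	utf8r := dec16.Reader(f)

	d := xml.NewDecoder(utf8r)
	// The prolog still declares encoding="UTF-16", but the bytes reaching the
	// decoder are already UTF-8, so the charset reader just passes them through.
	d.CharsetReader = func(charset string, input io.Reader) (io.Reader, error) {
		return input, nil
	}
	err = d.Decode(&report)
	return report, err
}
Whether this helps with the CPU load is a separate question, but it at least avoids allocating the whole document several times over.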
