Best pattern to create sha256 from file and store file - go

I am writing a webserver that receives a file as an upload as multipart/form-data. I am generating the file's SHA-256 from the request, but due to the nature of the Reader interface, I can't reuse the data to also upload the file to a filer. These files can be a few hundred MBs. What is the best way to store the content? I could duplicate the contents, but I am worried that could be wasteful of memory.
EDIT
func uploadFile(w http.ResponseWriter, r *http.Request) {
    // Assumes the multipart form has already been parsed,
    // e.g. via r.ParseMultipartForm.
    f, err := r.MultipartForm.File["capture"][0].Open()
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer f.Close()

    hash, err := createSha(f)
    if err != nil {
        fmt.Println(err.Error())
        return
    }
    fmt.Printf("%x\n", hash.Sum(nil))
}

func createSha(image multipart.File) (hash.Hash, error) {
    sha := sha256.New()
    // This consumes image, so its contents are no longer available
    // to be read again and stored on the filer.
    if _, err := io.Copy(sha, image); err != nil {
        return nil, err
    }
    return sha, nil
}

You might use io.MultiWriter(...) to send the data to multiple output streams in a single pass over the input, such as a hash and some remote writer.
For example (roughly):
sha := sha256.New()
filer := filer.New(...) // some io.Writer that stores the bytes for you?
err := io.Copy(io.MultiWriter(sha, filer), f) // f is the multipart file from your question
// TODO: handle error
// Now sha.Sum(nil) has the file digest and "filer" got sent all the bytes.
Note that io.MultiWriter can take as many writers as you want, so you could compute additional hashes at the same time (e.g. md5, sha1, etc.) or even send the file to multiple locations, e.g.:
md5, sha1, sha256, sha512 := md5.New(), sha1.New(), sha256.New(), sha512.New()
s3Writer, gcsWriter := filer.NewS3Writer(), filer.NewGCSWriter()
mw := io.MultiWriter(s3Writer, gcsWriter, md5, sha1, sha256, sha512)
err := io.Copy(mw, f)
// TODO: handle error
// Now you've got all the hashes for the file and it's stored in the cloud.
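Putting the whole thing together with the handler from the question (a sketch: a temp file stands in for the filer, since I don't know what your filer client looks like, and r.FormFile is used so the form gets parsed automatically):

package main

import (
    "crypto/sha256"
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "net/http"
)

func uploadFile(w http.ResponseWriter, r *http.Request) {
    // FormFile parses the multipart form if that hasn't happened yet.
    f, _, err := r.FormFile("capture")
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    defer f.Close()

    // Stand-in for the filer: any io.Writer can take its place.
    tmp, err := ioutil.TempFile("", "capture-")
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer tmp.Close()

    sha := sha256.New()
    // One pass over the upload: every byte goes to both the hash and the file.
    if _, err := io.Copy(io.MultiWriter(sha, tmp), f); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    fmt.Fprintf(w, "sha256: %x, stored at %s\n", sha.Sum(nil), tmp.Name())
}

func main() {
    http.HandleFunc("/upload", uploadFile)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

Since the upload never needs to be held in memory, this keeps the footprint constant regardless of file size.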

Related

How to extract .7z files in Go

I have a 7z archive of a number of .txt files. I am trying to list all the files in the archive and upload them to an S3 bucket. But I'm having trouble extracting .7z archives in Go. To do this, I found the package github.com/gen2brain/go-unarr (imported as extractor), and this is what I have so far:
content, err := ioutil.ReadFile("sample_archive.7z")
if err != nil {
    fmt.Printf("err: %+v", err)
}
a, err := extractor.NewArchiveFromMemory(content)
if err != nil {
    fmt.Printf("err: %+v", err)
}
lst, _ := a.List()
fmt.Printf("lst: %+v", lst)
This prints a list of all the files in the archive. But this has two issues.
It reads the file from local disk using ioutil, and the input of NewArchiveFromMemory must be of type []byte. But I can't read from local disk and will have to use a file from memory of type os.File. So I will either have to find a different method or convert the os.File to []byte. There's another method, NewArchiveFromReader(r io.Reader), but this is returning an error saying Bad File Descriptor.
file, err := os.OpenFile(
    path,
    // NOTE: opened write-only; reading from this handle is what
    // produces the "bad file descriptor" error below.
    os.O_WRONLY|os.O_TRUNC|os.O_CREATE,
    0666,
)
a, err := extractor.NewArchiveFromReader(file)
if err != nil {
    fmt.Printf("ERROR: %+v", err)
}
lst, _ := a.List()
fmt.Printf("files: %+v\n", lst)
I am able to get the list of the files in the archive. And using Extract(destination_path string), I can also extract it to a local directory. But I want the extracted files in os.File form as well (i.e. a list of os.File handles, since there will be multiple files).
How can I change my current code to achieve both the above targets? Is there any other library to do this?
os.File implements the io.Reader interface (because it has a Read([]byte) (int, error) method defined), so you can use NewArchiveFromReader(file) without any conversions needed. You can read up on Go interfaces for more background on why that works. The Bad File Descriptor error in your snippet is unrelated to the types: the file is opened with os.O_WRONLY, so reads from it fail. Open it for reading (e.g. with os.Open) before handing it to NewArchiveFromReader.
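For instance (a sketch, reusing the path variable from your snippet):

file, err := os.Open(path) // read-only, unlike os.OpenFile with os.O_WRONLY
if err != nil {
    return err
}
defer file.Close()

a, err := extractor.NewArchiveFromReader(file)
if err != nil {
    return err
}
lst, err := a.List()
if err != nil {
    return err
}
fmt.Printf("files: %+v\n", lst)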
If you're okay with extracting to a local directory, you can do that and then read the files back in (warning, may contain typos):
func extractAndOpenAll(a *extractor.Archive) ([]*os.File, error) {
    err := a.Extract("/tmp/path") // consider using ioutil.TempDir()
    if err != nil {
        return nil, err
    }
    filestats, err := ioutil.ReadDir("/tmp/path")
    if err != nil {
        return nil, err
    }
    // Warning: all these file handles must be closed by the caller,
    // which is why even the error case below returns the list of files.
    // If you forget, your process might leak file handles.
    files := make([]*os.File, 0)
    for _, fs := range filestats {
        file, err := os.Open(filepath.Join("/tmp/path", fs.Name()))
        if err != nil {
            return files, err
        }
        files = append(files, file)
    }
    return files, nil
}
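On the caller's side that means closing whatever came back, whether or not there was an error, e.g.:

files, err := extractAndOpenAll(a)
defer func() {
    // Close everything that was opened, even when err != nil.
    for _, f := range files {
        f.Close()
    }
}()
if err != nil {
    return err
}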
It is possible to use the archived files without writing back to disk (https://github.com/gen2brain/go-unarr#read-all-entries-from-archive), but whether or not you should do that instead depends on what your next step is.
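Roughly, going by that README section (a sketch from memory of go-unarr's API; double-check the method names against the package docs):

for {
    if err := a.Entry(); err != nil { // advance to the next entry
        if err == io.EOF {
            break // no more entries
        }
        return err
    }
    data, err := a.ReadAll() // contents of the current entry, in memory
    if err != nil {
        return err
    }
    fmt.Printf("%s: %d bytes\n", a.Name(), len(data))
}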

G110: Potential DoS vulnerability via decompression bomb (gosec)

I'm getting the following golintci message:
testdrive/utils.go:92:16: G110: Potential DoS vulnerability via decompression bomb (gosec)
if _, err := io.Copy(targetFile, fileReader); err != nil {
^
I read the corresponding CWE, but I'm not clear on how this is expected to be corrected. Please offer pointers.
func unzip(archive, target string) error {
    reader, err := zip.OpenReader(archive)
    if err != nil {
        return err
    }
    for _, file := range reader.File {
        path := filepath.Join(target, file.Name) // nolint: gosec
        if file.FileInfo().IsDir() {
            if err := os.MkdirAll(path, file.Mode()); err != nil {
                return err
            }
            continue
        }
        fileReader, err := file.Open()
        if err != nil {
            return err
        }
        defer fileReader.Close() // nolint: errcheck
        targetFile, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, file.Mode())
        if err != nil {
            return err
        }
        defer targetFile.Close() // nolint: errcheck
        if _, err := io.Copy(targetFile, fileReader); err != nil {
            return err
        }
    }
    return nil
}
The warning you get comes from a rule provided in gosec.
The rule specifically detects usage of io.Copy on file decompression.
This is a potential issue because io.Copy:
copies from src to dst until either EOF is reached on src or an error occurs.
So a malicious payload might cause your program to decompress an unexpectedly large amount of data and run out of memory or disk, causing the denial of service mentioned in the warning message.
In particular, gosec will check (source) the AST of your program and warn you about usage of io.Copy or io.CopyBuffer together with any one of the following:
"compress/gzip".NewReader
"compress/zlib".NewReader or NewReaderDict
"compress/bzip2".NewReader
"compress/flate".NewReader or NewReaderDict
"compress/lzw".NewReader
"archive/tar".NewReader
"archive/zip".NewReader
"*archive/zip".File.Open
Using io.CopyN removes the warning because (quoting its docs) it "copies n bytes (or until an error) from src to dst", thus giving you, the program writer, control of how many bytes to copy. So you could pass an arbitrarily large n that you set based on the available resources of your application, or copy in chunks.
Based on the various pointers provided, I replaced
if _, err := io.Copy(targetFile, fileReader); err != nil {
    return err
}
with
for {
    _, err := io.CopyN(targetFile, fileReader, 1024)
    if err != nil {
        if err == io.EOF {
            break
        }
        return err
    }
}
PS: while this helps the memory footprint, it wouldn't stop a DoS attack based on a very long or infinite stream, since the loop still copies everything ...
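To actually enforce a bound rather than just silence the linter, one option (a sketch; maxSize is an assumed limit you pick for your application) is to copy at most maxSize+1 bytes and reject any entry that exceeds the cap:

const maxSize = 1 << 30 // assumed cap: 1 GiB per extracted file

written, err := io.CopyN(targetFile, fileReader, maxSize+1)
if err != nil && err != io.EOF {
    return err
}
if written > maxSize {
    return fmt.Errorf("entry exceeds %d bytes; possible decompression bomb", maxSize)
}

Copying maxSize+1 bytes is the trick: if the copy manages to write more than maxSize, you know the entry was too big without ever decompressing the rest of it.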
Assuming that you're working on compressed data, you need to use io.CopyN.
You can try a workaround with the --nocompress flag, but this will cause the data to be included uncompressed.
See the following go-bindata PR and related issue: https://github.com/go-bindata/go-bindata/pull/50

Determine Length Of Golang Gzip File Without Reading It?

I have gzipped files on disk that I wish to stream to an HTTP client uncompressed. To do this I need to send a length header, then stream the uncompressed file to the client. I know the gzip format stores the original length of the uncompressed data, but as far as I can tell golang's "compress/gzip" package does not appear to have a way to grab this length. I've resorted to reading the file into a variable and then taking the length from that, but this is grossly inefficient and wasteful of memory, especially on larger files.
Below I've posted the code I've ended up using:
func DownloadHandler(w http.ResponseWriter, r *http.Request) {
    path := "/path/to/thefile.gz"
    openfile, err := os.Open(path)
    if err != nil {
        w.WriteHeader(http.StatusNotFound)
        fmt.Fprint(w, "404")
        return
    }
    defer openfile.Close()

    fz, err := gzip.NewReader(openfile)
    if err != nil {
        w.WriteHeader(http.StatusNotFound)
        fmt.Fprint(w, "404")
        return
    }
    defer fz.Close()

    // Wastefully read the data into memory just to get the length.
    s, err := ioutil.ReadAll(fz)
    if err != nil {
        w.WriteHeader(http.StatusInternalServerError)
        return
    }
    body := strings.NewReader(string(s)) // renamed from r to avoid shadowing the request

    // Send the headers.
    w.Header().Set("Content-Disposition", "attachment; filename=test")
    w.Header().Set("Content-Length", strconv.Itoa(len(s))) // send length to client
    w.Header().Set("Content-Type", "text/csv")
    io.Copy(w, body) // "copy" the file to the client
}
What I would expect to be able to do instead is something like this:
func DownloadHandler(w http.ResponseWriter, r *http.Request) {
    path := "/path/to/thefile.gz"
    openfile, err := os.Open(path)
    if err != nil {
        w.WriteHeader(http.StatusNotFound)
        fmt.Fprint(w, "404")
        return
    }
    defer openfile.Close()

    fz, err := gzip.NewReader(openfile)
    if err != nil {
        w.WriteHeader(http.StatusNotFound)
        fmt.Fprint(w, "404")
        return
    }
    defer fz.Close()

    // Send the headers.
    w.Header().Set("Content-Disposition", "attachment; filename=test")
    w.Header().Set("Content-Length", strconv.Itoa(fz.Length())) // hypothetical: gzip.Reader has no such method
    w.Header().Set("Content-Type", "text/csv")
    io.Copy(w, fz) // "copy" the file to the client
}
Does anyone know how to get the uncompressed length for a gzipped file in golang?
The gzip format might appear to provide the uncompressed length (the ISIZE field in the trailer), but actually it does not: ISIZE holds the length only modulo 2^32, a stream can contain several concatenated members, and nothing prevents the field from simply lying. Unfortunately, the only reliable way to get the uncompressed length is to decompress the gzip stream. (You can just count the bytes, not saving the uncompressed data anywhere.)
See this answer for why.
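A sketch of that approach, assuming you can afford to decompress twice: one pass to ioutil.Discard just to count the bytes, then a rewind and a second pass to stream the response.

// Assumed imports: compress/gzip, io, io/ioutil, net/http, os, strconv.
func DownloadHandler(w http.ResponseWriter, r *http.Request) {
    openfile, err := os.Open("/path/to/thefile.gz")
    if err != nil {
        http.Error(w, "404", http.StatusNotFound)
        return
    }
    defer openfile.Close()

    fz, err := gzip.NewReader(openfile)
    if err != nil {
        http.Error(w, "404", http.StatusNotFound)
        return
    }
    defer fz.Close()

    // First pass: count the uncompressed bytes without storing them.
    n, err := io.Copy(ioutil.Discard, fz)
    if err != nil {
        http.Error(w, "500", http.StatusInternalServerError)
        return
    }

    // Rewind the file and reset the gzip reader for the real pass.
    if _, err := openfile.Seek(0, io.SeekStart); err != nil {
        http.Error(w, "500", http.StatusInternalServerError)
        return
    }
    if err := fz.Reset(openfile); err != nil {
        http.Error(w, "500", http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Disposition", "attachment; filename=test")
    w.Header().Set("Content-Length", strconv.FormatInt(n, 10))
    w.Header().Set("Content-Type", "text/csv")
    io.Copy(w, fz)
}

This trades CPU (decompressing twice) for memory (never holding the file), which is usually the right trade for large files.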

Convert os.Stdin to []byte

I'm trying to implement a small chat server in golang with end-to-end encryption. Starting from the server example https://github.com/adonovan/gopl.io/tree/master/ch8/chat and the client https://github.com/adonovan/gopl.io/blob/master/ch8/netcat3/netcat.go, I stumbled upon https://www.thepolyglotdeveloper.com/2018/02/encrypt-decrypt-data-golang-application-crypto-packages/ for encrypting and decrypting in Go.
The function to encrypt:
func encrypt(data []byte, passphrase string) []byte {
    // createHash (from the linked article) derives a 32-byte key
    // from the passphrase.
    block, _ := aes.NewCipher([]byte(createHash(passphrase)))
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        panic(err.Error())
    }
    nonce := make([]byte, gcm.NonceSize())
    if _, err = io.ReadFull(rand.Reader, nonce); err != nil {
        panic(err.Error())
    }
    ciphertext := gcm.Seal(nonce, nonce, data, nil)
    return ciphertext
}
in func main():
ciphertext := encrypt([]byte(os.Stdin), "password") // does not compile: os.Stdin cannot be converted to []byte
mustCopy(conn, ciphertext)
conn.Close()
os.Stdin is an *os.File, while it is needed as []byte. The solution should be io.Reader or via a buffer, but I can't find a working solution.
I tried
bytes.NewBuffer([]byte(os.Stdin))
and
reader := bytes.NewReader(os.Stdin)
Any input is more than welcome. Sorry if I'm not seeing the obvious problem/solution here, as I'm fairly new.
os.Stdin is an io.Reader. You can't convert it to a []byte, but you can read from it, and what you read can be stored in a []byte.
Since in many terminals reading from os.Stdin delivers data line by line, you should read a complete line from it; the read may block until a full line is available.
For that you have many possibilities, one is to use bufio.Scanner.
This is how you can do it:
scanner := bufio.NewScanner(os.Stdin)
if !scanner.Scan() {
    log.Printf("Failed to read: %v", scanner.Err())
    return
}
line := scanner.Bytes() // line is of type []byte, exactly what you need
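Tying it back to your main (a sketch; encrypt and conn are from your snippets, and conn.Write replaces mustCopy since you now have bytes rather than a reader):

scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
    // Encrypt each line the user types and send it over the connection.
    ciphertext := encrypt(scanner.Bytes(), "password")
    if _, err := conn.Write(ciphertext); err != nil {
        log.Fatal(err)
    }
}
conn.Close()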

Calculating hash of encrypted data with a multiwriter

I was hoping to be able to calculate the hash of the encrypted data in parallel, but with a MultiWriter used as below, the hash is calculated over the plaintext bytes.
Does anyone know how I can use a single copy to both encrypt the data and hash it?
writer := &cipher.StreamWriter{S: cipher.NewCTR(block, iv), W: writeFile}
writeFile.Write(iv)
if _, err := io.Copy(io.MultiWriter(writer, hash), readFile); err != nil {
    fmt.Println("error during crypto: " + err.Error())
    return "", err
}
You need to move your io.MultiWriter so that it is the writer of the cipher.StreamWriter. A StreamWriter encrypts data before writing it to W, so anything attached to W sees ciphertext, while anything placed next to the StreamWriter in a MultiWriter sees plaintext. This will calculate the hash of the ciphertext rather than the plaintext:
writer := &cipher.StreamWriter{
    S: cipher.NewCTR(block, iv),
    W: io.MultiWriter(writeFile, hash),
}
writeFile.Write(iv)
if _, err := io.Copy(writer, readFile); err != nil {
    fmt.Println("error during crypto: " + err.Error())
    return "", err
}
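One detail to watch (my observation, not part of the original answer): writeFile.Write(iv) bypasses the MultiWriter, so the hash covers only the ciphertext, not the IV. If the digest should cover everything written to the file, send the IV through the same combined writer:

// Sketch: include the IV in the hash along with the ciphertext.
fileAndHash := io.MultiWriter(writeFile, hash)
if _, err := fileAndHash.Write(iv); err != nil {
    return "", err
}
writer := &cipher.StreamWriter{
    S: cipher.NewCTR(block, iv),
    W: fileAndHash,
}
if _, err := io.Copy(writer, readFile); err != nil {
    return "", err
}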
