I was hoping to calculate the hash of the encrypted data in parallel, but it seems that with an io.MultiWriter as below, the hash is being calculated over the plaintext bytes.
Does anyone know how I can use a single copy to both encrypt the data and hash it?
writer := &cipher.StreamWriter{S: cipher.NewCTR(block, iv), W: writeFile}
writeFile.Write(iv)
if _, err := io.Copy(io.MultiWriter(writer, hash), readFile); err != nil {
    fmt.Println("error during crypto: " + err.Error())
    return "", err
}
You need to move your io.MultiWriter to be the writer of the cipher.StreamWriter. This calculates the hash over the ciphertext rather than the plaintext:
writer := &cipher.StreamWriter{
    S: cipher.NewCTR(block, iv),
    W: io.MultiWriter(writeFile, hash),
}
writeFile.Write(iv)
if _, err := io.Copy(writer, readFile); err != nil {
    fmt.Println("error during crypto: " + err.Error())
    return "", err
}
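For completeness, here is a minimal end-to-end sketch of this pattern (the file paths, key handling, and choice of SHA-256 are assumptions for illustration; note the IV is written directly to the file before the StreamWriter is involved, so it is not included in the hash):

package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "crypto/sha256"
    "fmt"
    "io"
    "os"
)

// encryptAndHash encrypts inPath to outPath with AES-CTR and returns
// the hex-encoded SHA-256 of the ciphertext, computed in the same pass.
func encryptAndHash(key []byte, inPath, outPath string) (string, error) {
    readFile, err := os.Open(inPath)
    if err != nil {
        return "", err
    }
    defer readFile.Close()

    writeFile, err := os.Create(outPath)
    if err != nil {
        return "", err
    }
    defer writeFile.Close()

    block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
    if err != nil {
        return "", err
    }

    // A random IV, stored in the clear at the start of the output file.
    iv := make([]byte, aes.BlockSize)
    if _, err := io.ReadFull(rand.Reader, iv); err != nil {
        return "", err
    }
    if _, err := writeFile.Write(iv); err != nil {
        return "", err
    }

    hash := sha256.New()
    writer := &cipher.StreamWriter{
        S: cipher.NewCTR(block, iv),
        W: io.MultiWriter(writeFile, hash), // hash sees ciphertext only
    }
    if _, err := io.Copy(writer, readFile); err != nil {
        return "", err
    }
    return fmt.Sprintf("%x", hash.Sum(nil)), nil
}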
I've been working on serializing a radix tree (used for indexing) to a file in golang. The radix tree nodes store 64-bit roaring bitmaps (see https://github.com/RoaringBitmap/roaring). The following code is what I am using, along with the output I get when trying to load it back into memory:
serializedTree := i.index.ToMap()
encodeFile, err := os.Create(fmt.Sprintf("./serialized/%s/%s", appindex.name, i.field))
if err != nil {
    panic(err)
}
e := gob.NewEncoder(encodeFile)
err = e.Encode(serializedTree)
encodeFile.Close()

// Turn it back for testing
decodeFile, err := os.Open(fmt.Sprintf("./serialized/%s/%s", appindex.name, i.field))
defer decodeFile.Close()
d := gob.NewDecoder(decodeFile)
decoded := make(map[string]interface{})
err = d.Decode(&decoded)
fmt.Println("before decode", serializedTree)
fmt.Println("after decode", decoded)
if err != nil {
    fmt.Println("!!! Error serializing", err)
    panic(err)
}
Output:
before decode map[dan:{1822509180252590512} dan1:{6238704462486574203} goodman:{1822509180252590512,6238704462486574203}]
after decode map[]
!!! Error serializing EOF
panic: EOF
goroutine 1 [running]:
main.(*appIndexes).SerializeIndex(0xc000098240)
(I understand the decode is empty because the gob package doesn't modify on EOF error)
I've noticed that when trying with bytes directly, only 15 bytes are stored on disk (far too few). Trying with the encoding/json package's json.Marshal() and json.Unmarshal(), I see 33 bytes stored, but they load back empty (the roaring bitmaps are gone):
post encode map[dan:map[] dan1:map[] goodman:map[]]
I feel like this has something to do with the fact that I am trying to serialize a map[string]interface{} rather than something like a map[string]int, but I am still fairly green with golang.
See https://repl.it/#danthegoodman/SelfishMoralCharactermapping#main.go for an example and my testing.
I believe I fixed it by converting the map[string]interface{} into a map[string]*roaring64.Bitmap before writing it to disk, decoding it back into a map[string]*roaring64.Bitmap, and then converting it back to a map[string]interface{}:
m2 := make(map[string]*roaring64.Bitmap)

// Convert m1 to m2
for key, value := range m1 {
    m2[key] = value.(*roaring64.Bitmap)
}
fmt.Println("m1", m1)
fmt.Println("m2", m2)

encodeFile, err := os.Create("./test")
if err != nil {
    panic(err)
}
e := gob.NewEncoder(encodeFile)
err = e.Encode(m2)
encodeFile.Close()

// Turn it back for testing
decodeFile, err := os.Open("./test")
defer decodeFile.Close()
d := gob.NewDecoder(decodeFile)
decoded := make(map[string]*roaring64.Bitmap)
err = d.Decode(&decoded)
fmt.Println("before decode", m2)
fmt.Println("after decode", decoded)
if err != nil {
    fmt.Println("!!! Error serializing", err)
    panic(err)
}

m3 := make(map[string]interface{})

// Convert m2 to m3
for key, value := range m2 {
    m3[key] = value
}
afterDecTree := radix.NewFromMap(m3)
See https://repl.it/#danthegoodman/VictoriousUtterMention#main.go for a working example
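As an aside, and not something verified against the bitmap type here: gob can also encode interface-typed values directly if you register the concrete type up front with gob.Register, which avoids the map conversion entirely. A minimal sketch with a stand-in concrete type:

package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

// Payload stands in for whatever concrete type the interface
// values actually hold (e.g. a bitmap implementation).
type Payload struct {
    IDs []uint64
}

func main() {
    // Register the concrete type so gob can encode/decode it
    // when it appears behind an interface{}.
    gob.Register(&Payload{})

    m := map[string]interface{}{
        "dan": &Payload{IDs: []uint64{1, 2, 3}},
    }

    var buf bytes.Buffer
    if err := gob.NewEncoder(&buf).Encode(m); err != nil {
        panic(err)
    }

    decoded := make(map[string]interface{})
    if err := gob.NewDecoder(&buf).Decode(&decoded); err != nil {
        panic(err)
    }
    fmt.Println("after decode", decoded["dan"].(*Payload).IDs)
}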
I'm trying to download and decrypt HLS streams by using io.ReadFull to process the data in chunks to conserve memory:
Irrelevant parts of the code have been left out for simplicity.
func main() {
    f, _ := os.Create("out.ts")
    for _, v := range mediaPlaylist {
        resp, _ := http.Get(v.URI)
        for {
            r, err := decryptHLS(key, iv, resp.Body)
            if err != nil && err == io.EOF {
                break
            } else if err != nil && err != io.ErrUnexpectedEOF {
                panic(err)
            }
            io.Copy(f, r)
        }
    }
}
func decryptHLS(key []byte, iv []byte, r io.Reader) (io.Reader, error) {
    block, _ := aes.NewCipher(key)
    buf := make([]byte, 8192)
    mode := cipher.NewCBCDecrypter(block, iv)
    n, err := io.ReadFull(r, buf)
    if err != nil && err != io.ErrUnexpectedEOF {
        return nil, err
    }
    mode.CryptBlocks(buf, buf)
    return bytes.NewReader(buf[:n]), err
}
At first this seems to work: the file size is correct and there are no errors during download. But the video is corrupted. Not completely, as the file is still recognized as a video, but both image and sound are distorted.
If I change the code to use ioutil.ReadAll instead, the final video files will no longer be corrupted:
func main() {
    f, _ := os.Create("out.ts")
    for _, v := range mediaPlaylist {
        resp, _ := http.Get(v.URI)
        segment, _ := ioutil.ReadAll(resp.Body)
        r := decryptHLS(key, iv, &segment)
        io.Copy(f, r)
    }
}
func decryptHLS(key []byte, iv []byte, s *[]byte) io.Reader {
    block, _ := aes.NewCipher(key)
    mode := cipher.NewCBCDecrypter(block, iv)
    mode.CryptBlocks(*s, *s)
    return bytes.NewReader(*s)
}
Any ideas why it works correctly when reading the entire segment into memory, and not when using io.ReadFull and processing it in chunks?
Internally, NewCBCDecrypter makes a copy of your IV, so each call to decryptHLS starts again from the initial IV rather than from the chaining state left by the previous chunk. In CBC mode the IV only feeds into the first block, which is why the output is mostly intact but periodically corrupted: the first 16-byte block of every chunk after the first decrypts incorrectly.
Create the decrypter once, and you can keep re-using it to decrypt chunk by chunk (as long as each chunk you pass to CryptBlocks is a multiple of the cipher's block size).
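For illustration, a rough sketch of that fix (it assumes, as is normal for HLS AES-128 segments, that the encrypted length is a multiple of the AES block size; stripping the trailing PKCS#7 padding is left out):

package hls

import (
    "crypto/aes"
    "crypto/cipher"
    "io"
)

// decryptSegment decrypts one CBC-encrypted segment from r into w,
// creating the decrypter once so its chaining state carries across chunks.
func decryptSegment(key, iv []byte, r io.Reader, w io.Writer) error {
    block, err := aes.NewCipher(key)
    if err != nil {
        return err
    }
    mode := cipher.NewCBCDecrypter(block, iv)

    // 8192 is a multiple of aes.BlockSize (16), and the segment's total
    // length is assumed block-aligned, so every slice handed to
    // CryptBlocks is a whole number of blocks.
    buf := make([]byte, 8192)
    for {
        n, err := io.ReadFull(r, buf)
        if n > 0 {
            mode.CryptBlocks(buf[:n], buf[:n])
            if _, werr := w.Write(buf[:n]); werr != nil {
                return werr
            }
        }
        if err == io.EOF || err == io.ErrUnexpectedEOF {
            return nil // end of segment
        }
        if err != nil {
            return err
        }
    }
}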
I have a go file server that can receive requests for files up to 10GB in size. To keep memory usage low I read the multipart form data into a tmp file. I know behind the scenes FormFile does the same, but I still need to transfer it to a regular file for some post-upload processing.
f, header, err := r.FormFile("file")
if err != nil {
    return nil, fmt.Errorf("could not get file from request %w", err)
}
tmpFile, err := ioutil.TempFile("", "oriio-")
if err != nil {
    return nil, err
}
if _, err := io.Copy(tmpFile, f); err != nil {
    return nil, fmt.Errorf("could not copy request body to file %w", err)
}
After this I need to grab the first 261 bytes of the file to determine its MIME type.
head := make([]byte, 261)
if _, err := tmpFile.Read(head); err != nil {
    return nil, err
}
The issue I'm running into is that if I try to read directly from tmpFile, the byte array comes back as 261 zeros when I print fmt.Printf("%x", head), i.e. invalid data. To verify the data is valid I saved it to a regular file and opened it on my system, and the file (in this case an image) was perfectly intact, so it is not a corrupt-file issue. To get around the problem I now close the tmp file and then reopen it, and that seems to fix everything.
tmpFile, err := ioutil.TempFile("", "oriio-")
if err != nil {
    return nil, err
}
if _, err := io.Copy(tmpFile, f); err != nil {
    return nil, fmt.Errorf("could not copy request body to file %w", err)
}
tmpFile.Close()
tmpFile, err = os.Open(tmpFile.Name())
if err != nil {
    panic(err)
}
head := make([]byte, 261)
if _, err := tmpFile.Read(head); err != nil {
    return nil, err
}
Now when I print out the head byte array the proper content is printed. Why is this? Is there some sort of Sync or Flush I have to do with the original tmp file to make it work?
Reading or writing a file changes the current position in the file. After the copy, tmpFile is positioned at its end, so reading from it will read 0 bytes. You have to seek back first if you want to read from the beginning of the file:
io.Copy(tmpFile, f)
tmpFile.Seek(0, io.SeekStart)
tmpFile.Read(head)
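For reference, the same flow with the error handling kept, using io.SeekStart and io.ReadFull (a bare Read is not guaranteed to fill the buffer):

if _, err := io.Copy(tmpFile, f); err != nil {
    return nil, fmt.Errorf("could not copy request body to file %w", err)
}
// Rewind instead of closing and reopening the file.
if _, err := tmpFile.Seek(0, io.SeekStart); err != nil {
    return nil, err
}
head := make([]byte, 261)
if _, err := io.ReadFull(tmpFile, head); err != nil {
    return nil, err
}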
I am writing a webserver that receives a file as an upload as multipart/form-data. I am generating the file's sha256 from the request, but due to the nature of the Reader interface I can't reuse the data to also upload the file to a filer. These files can be a few hundred MBs. What is the best way to store the content? I can duplicate the contents, but I am worried that could be wasteful of memory.
EDIT
func uploadFile(w http.ResponseWriter, r *http.Request) {
    f, err := r.MultipartForm.File["capture"][0].Open()
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer f.Close()
    hash, err := createSha(f)
    if err != nil {
        fmt.Println(err.Error())
        return
    }
}
func createSha(image multipart.File) (hash.Hash, error) {
    sha := sha256.New()
    // This causes the contents of image to no longer be available to be read again, so it cannot be stored on the filer afterwards
    if _, err := io.Copy(sha, image); err != nil {
        return nil, err
    }
    return sha, nil
}
You might use io.MultiWriter(...) to duplicate the data to multiple output streams in a single pass, such as a hash and some remote writer.
For example (roughly):
sha := sha256.New()
filer := filer.New(...) // Some Writer that stores the bytes for you?
err := io.Copy(io.MultiWriter(sha, filer), r)
// TODO: handle error
// Now sha.Sum(nil) has the file digest and "filer" got sent all the bytes.
Note that io.MultiWriter can take as many writers as you want, so you could compute additional hashes at the same time (e.g. MD5, SHA-1, etc.) or even send the file to multiple locations, e.g.:
md5h, sha1h, sha256h, sha512h := md5.New(), sha1.New(), sha256.New(), sha512.New()
s3Writer, gcsWriter := filer.NewS3Writer(), filer.NewGCSWriter()
mw := io.MultiWriter(s3Writer, gcsWriter, md5h, sha1h, sha256h, sha512h)
err := io.Copy(mw, r)
// TODO: handle error
// Now you've got all the hashes for the file and it's stored in the cloud.
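One caveat with this approach: io.MultiWriter writes to each writer sequentially, not in parallel, and it aborts on the first error, so a failure in any one destination stops the copy for all of them.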
I've written a little server which receives a blob of data in the form of an io.Reader, adds a header and streams the result back to the caller.
My implementation isn't particularly efficient as I'm buffering the blob's data in-memory so that I can calculate the blob's length, which needs to form part of the header.
I've seen some examples of io.Pipe() with io.TeeReader, but they're more for splitting an io.Reader in two and consuming both streams in parallel.
The blobs I'm dealing with are around 100KB, so not huge but if my server gets busy, memory's going to quickly become an issue...
Any ideas?
func addHeader(in io.Reader) (out io.Reader, err error) {
    buf := new(bytes.Buffer)
    if _, err = io.Copy(buf, in); err != nil {
        return
    }
    header := bytes.NewReader([]byte(fmt.Sprintf("header:%d", buf.Len())))
    return io.MultiReader(header, buf), nil
}
I appreciate it's not a good idea to return interfaces from functions but this code isn't destined to become an API, so I'm not too concerned with that bit.
In general, the only way to determine the length of data in an io.Reader is to read until EOF. There are ways to determine the length of the data for specific types.
func addHeader(in io.Reader) (out io.Reader, err error) {
    n := 0
    switch v := in.(type) {
    case *bytes.Buffer:
        n = v.Len()
    case *bytes.Reader:
        n = v.Len()
    case *strings.Reader:
        n = v.Len()
    case io.Seeker:
        cur, err := v.Seek(0, io.SeekCurrent)
        if err != nil {
            return nil, err
        }
        end, err := v.Seek(0, io.SeekEnd)
        if err != nil {
            return nil, err
        }
        _, err = v.Seek(cur, io.SeekStart)
        if err != nil {
            return nil, err
        }
        n = int(end - cur)
    default:
        var buf bytes.Buffer
        if _, err := buf.ReadFrom(in); err != nil {
            return nil, err
        }
        n = buf.Len()
        in = &buf
    }
    header := strings.NewReader(fmt.Sprintf("header:%d", n))
    return io.MultiReader(header, in), nil
}
This is similar to how the net/http package determines the content length of the request body.
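A quick usage sketch (assuming the addHeader above is in the same package; the input string is made up):

package main

import (
    "fmt"
    "io/ioutil"
    "strings"
)

func main() {
    // *strings.Reader hits the type switch, so no extra buffering occurs.
    r, err := addHeader(strings.NewReader("hello world"))
    if err != nil {
        panic(err)
    }
    out, _ := ioutil.ReadAll(r)
    fmt.Println(string(out)) // header:11hello world
}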