How do you encrypt large files / byte streams in Go? - go

I have some large files I would like to AES encrypt before sending over the wire or saving to disk. While it seems possible to encrypt streams, there seems to be warnings against doing this and instead people recommend splitting the files into chunks and using GCM or crypto/nacl/secretbox.
Processing streams of data is more difficult due to the authenticity requirement. We can’t encrypt-then-MAC: by it’s nature, we usually don’t know the size of a stream. We can’t send the MAC after the stream is complete, as that usually is indicated by the stream being closed. We can’t decrypt a stream on the fly, because we have to see the entire ciphertext in order to check the MAC. Attempting to secure a stream adds enormous complexity to the problem, with no good answers. The solution is to break the stream into discrete chunks, and treat them as messages.
https://leanpub.com/gocrypto/read
Files are segmented into 4KiB blocks. Each block gets a fresh random 128 bit IV each time it is modified. A 128-bit authentication tag (GHASH) protects each block from modifications.
https://nuetzlich.net/gocryptfs/forward_mode_crypto/
If a large amount of data is decrypted it is not always possible to buffer all decrypted data until the authentication tag is verified. Splitting the data into small chunks fixes the problem of deferred authentication checks but introduces a new one. The chunks can be reordered... ...because every chunk is encrypted separately. Therefore the order of the chunks must be encoded somehow into the chunks itself to be able to detect rearranging any number of chunks.
https://github.com/minio/sio
Can anyone with actual cryptography experience point me in the right direction?
Update
I realized after asking this question that there is a difference between simply not being able to fit the whole byte stream into memory (encrypting a 10GB file) and the byte stream also being an unknown length that could continue long past the need for the stream's start to be decoded (an 24-hour live video stream).
I am mostly interested in large blobs where the end of the stream can be reached before the beginning needs to be decoded. In other words, encryption that does not require the whole plaintext/ciphertext to be loaded into memory at the same time.

As you've already discovered from your research, there isn't much of an elegant solution for authenticated encryption of large files.
There are traditionally two ways to approach this problem:
Split the file into chunks, encrypt each chunk individually and let each chunk have its own authentication tag. AES-GCM would be the best mode to use for this. This method causes file size bloating proportionate to the size of the file. You'll also need a unique nonce for each chunk. You also need a way to indicate where chunks begin/end.
Encrypt using AES-CTR with a buffer, call Hash.Write on an HMAC for each buffer of encrypted data. The benefit of this is that encrypting can be done in one pass. The downside is that decryption requires one pass to validate the HMAC and then another pass to actually decrypt. The upside here is that the file size remains the same, plus roughly ~48 or so bytes for the IV and HMAC result.
Neither is ideal, but for very large files (~2GB or more), the second option is probably preferred.
I have included an example of encryption in Go using the second method below. In this scenario, the last 48 bytes are the IV (16 bytes) and the result of the HMAC (32 bytes). Note the HMACing of the IV also.
const BUFFER_SIZE int = 4096
const IV_SIZE int = 16
func encrypt(filePathIn, filePathOut string, keyAes, keyHmac []byte) error {
inFile, err := os.Open(filePathIn)
if err != nil { return err }
defer inFile.Close()
outFile, err := os.Create(filePathOut)
if err != nil { return err }
defer outFile.Close()
iv := make([]byte, IV_SIZE)
_, err = rand.Read(iv)
if err != nil { return err }
aes, err := aes.NewCipher(keyAes)
if err != nil { return err }
ctr := cipher.NewCTR(aes, iv)
hmac := hmac.New(sha256.New, keyHmac)
buf := make([]byte, BUFFER_SIZE)
for {
n, err := inFile.Read(buf)
if err != nil && err != io.EOF { return err }
outBuf := make([]byte, n)
ctr.XORKeyStream(outBuf, buf[:n])
hmac.Write(outBuf)
outFile.Write(outBuf)
if err == io.EOF { break }
}
outFile.Write(iv)
hmac.Write(iv)
outFile.Write(hmac.Sum(nil))
return nil
}

Using HMAC after encryption is a valid method. However, HMAC can be pretty slow, especially if SHA-2 is used. You could actually do the same with GMAC, the underlying MAC of GCM. It may be tricky to find an implementation but GMAC is over the ciphertext, so you can simply perform it separately if you really want. There are other methods as well such as Poly1305 with AES as used for TLS 1.2 and 1.3.
For GCM (or CCM or EAX or any other authenticated cipher) you need to authenticate the order of the chunks. You could do this by creating a separate file encryption key and then using the nonce input (the 12 byte IV) to indicate the number of the chunk. This will solve the storage of the IV and make sure that the chunks are in order. You can generate the file encryption key using a KDF (if you have a unique way to indicate the file) or by wrapping a random key with a master key.

Related

How do I encrypt special characters or numbers using AES-256?

I would like to encrypt a string in Go using AES-256, without any GCM processing, to compare against MQL4. I encounter issues when I try to encrypt special characters or numbers. Should I be pre-processing my plaintext somehow? I am new to Go so any help would be appreciated; my code is below this explanation.
If I encrypt the plaintext "This is a secret" and then decrypt the ciphertext (encoded to hex), I get the same result (i.e. "This is a secret"). pt is the variable name of the plaintext in the code below.
If I try to encrypt "This is a secret; 1234", the ciphertext has a group of zeroes at the end, and when I decrypt I only get "This is a secret". Similar ciphertext in MQL4 does not have zeroes at the end and decrypts correctly.
If I try to encrypt only "1234", I get build errors, stemming from "crypto/aes.(*aesCipherAsm).Encrypt(0xc0000c43c0, 0xc0000ac058, 0x4, 0x4, 0xc0000ac070, 0x4, 0x8)
C:/Program Files/Go/src/crypto/aes/cipher_asm.go:60 +0x125"
Here is my code:
package main
import (
"crypto/aes"
"encoding/hex"
"fmt"
)
func main() {
// cipher key
key := "thisis32bitlongpassphraseimusing"
// plaintext
pt := "This is a secret"
// pt := "This is a secret; 1234" // zeroes in ciphertext
// pt := "1234" // doesn't build
c := EncryptAES([]byte(key), pt)
// plaintext
fmt.Println(pt)
// ciphertext
fmt.Println(c)
// decrypt
DecryptAES([]byte(key), c)
}
func EncryptAES(key []byte, plaintext string) string {
c, err := aes.NewCipher(key)
CheckError(err)
out := make([]byte, len(plaintext))
c.Encrypt(out, []byte(plaintext))
return hex.EncodeToString(out)
}
func DecryptAES(key []byte, ct string) {
ciphertext, _ := hex.DecodeString(ct)
c, err := aes.NewCipher(key)
CheckError(err)
pt := make([]byte, len(ciphertext))
c.Decrypt(pt, ciphertext)
s := string(pt[:])
fmt.Println("DECRYPTED:", s)
}
func CheckError(err error) {
if err != nil {
panic(err)
}
}
You're creating a raw AES encryptor here. AES can only encrypt precisely 16 bytes of plaintext, producing exactly 16 bytes of cipher text. Your first example "This is a secret" is exactly 16 bytes long, so it works as expected. Your second example is too long. Only the first 16 bytes are being encrypted. The third example is too short and you're likely running into uninitialized memory.
The specific characters in your text are irrelevant. Encryption is performed on raw bytes, not letters.
In order to encrypt larger (or smaller) blocks of text, you need to use a block cipher mode on top of AES. Common modes are GCM, CBC, and CTR, but there are many others. In most cases, when someone says "AES" without any qualifier, they mean AES-CBC. (GCM is becoming much more popular, and it's a great mode, but it's not so popular that it's assumed quite yet.)
I don't know anything about MQL4, but I assume you're trying to reimplement CryptEncode? I don't see any documentation on how they do the encryption. You need to know what mode they use, how they derive their key, how they generate (and possibly encode) their IV, whether they include an HMAC or other auth, and more. You need to know exactly how they implement whatever they mean by "CRYPT_AES256." There is no one, standard answer to this.
MQL4 only supports a very specific implementation of AES encryption and unless you use the correct settings in your other code you will not achieve compatibility between the two platforms.
Specifically you need to ensure the following are implemented:
Padding Mode: Zeros
Cipher Mode: ECB (so no IV)
KeySize: 256
BlockSize: 128
You also need to remember in MQL4 that encryption/decryption is a two stage process (to AES256 then to BASE64).
You can try the online AES encryption/decryption tool to verify your results available here: The online toolbox

How Do I Encrypt AES256 CTR with a Plain Text Password?

I'm working with this example to try to encrypt data using AES-256. However, when I use the password WeakPasswordForTesting as input, I get an error: crypto/aes: invalid key size. I get the impression that it has to do with my plain password length. How can I use this example code to encrypt the data using a short plain-text password?
func Encrypt(password []byte, plainSource []byte) ([]byte, error) {
key, _ := hex.DecodeString(string(password))
block, err := aes.NewCipher(key)
if err != nil {
panic(err)
}
// The IV needs to be unique, but not secure. Therefore it's common to
// include it at the beginning of the ciphertext.
ciphertext := make([]byte, aes.BlockSize+len(plainSource))
iv := ciphertext[:aes.BlockSize]
if _, err := io.ReadFull(rand.Reader, iv); err != nil {
panic(err)
}
stream := cipher.NewCTR(block, iv)
stream.XORKeyStream(ciphertext[aes.BlockSize:], plainSource)
return ciphertext, nil
}
The Advanced Encryption Standard (AES) is a block cipher which expects a key of a fixed size. The key length is in the name of the cipher: AES-256 means a key length of 256 bits.
As your password is not a key which conforms to this spec, you need to implement a process to make it so. A common mechanism is a key derivation function, which takes an input passphrase of entropy significantly lower than the desired key length, and pads it with various data to produce a key of the required length. The PBKDF family (password-based key derivation function) is an example of this, as is scrypt.
You should not implement a naïve mechanism to pad your input data in this way, as the security of a good cryptosystem is entirely dependent on the quality of the key, and it is highly unlikely any homegrown solution will be resistant to a variety of attacks.
This question on crypto.SE may be interesting regarding a more thorough assessment of implementation recommendations using PBKDF2.

Most efficient way to read Zlib compressed file in Golang?

I'm reading in and at the same time parsing (decoding) a file in a custom format, which is compressed with zlib. My question is how can I efficiently uncompress and then parse the uncompressed content without growing the slice? I would like to parse it whilst reading it into a reusable buffer.
This is for a speed-sensitive application and so I'd like to read it in as efficiently as possible. Normally I would just ioutil.ReadAll and then loop again through the data to parse it. This time I'd like to parse it as it's read, without having to grow the buffer into which it is read, for maximum efficiency.
Basically I'm thinking that if I can find a buffer of the perfect size then I can read into this, parse it, and then write over the buffer again, then parse that, etc. The issue here is that the zlib reader appears to read an arbitrary number of bytes each time Read(b) is called; it does not fill the slice. Because of this I don't know what the perfect buffer size would be. I'm concerned that it might break up some of the data that I wrote into two chunks, making it difficult to parse because one say uint64 could be split from into two reads and therefore not occur in the same buffer read - or perhaps that can never happen and it's always read out in chunks of the same size as were originally written?
What is the optimal buffer size, or is there a way to calculate this?
If I have written data into the zlib writer with f.Write(b []byte) is it possible that this same data could be split into two reads when reading back the compressed data (meaning I will have to have a history during parsing), or will it always come back in the same read?
You can wrap your zlib reader in a bufio reader, then implement a specialized reader on top that will rebuild your chunks of data by reading from the bufio reader until a full chunk is read. Be aware that bufio.Read calls Read at most once on the underlying Reader, so you need to call ReadByte in a loop. bufio will however take care of the unpredictable size of data returned by the zlib reader for you.
If you do not want to implement a specialized reader, you can just go with a bufio reader and read as many bytes as needed with ReadByte() to fill a given data type. The optimal buffer size is at least the size of your largest data structure, up to whatever you can shove into memory.
If you read directly from the zlib reader, there is no guarantee that your data won't be split between two reads.
Another, maybe cleaner, solution is to implement a writer for your data, then use io.Copy(your_writer, zlib_reader).
OK, so I figured this out in the end using my own implementation of a reader.
Basically the struct looks like this:
type reader struct {
at int
n int
f io.ReadCloser
buf []byte
}
This can be attached to the zlib reader:
// Open file for reading
fi, err := os.Open(filename)
if err != nil {
return nil, err
}
defer fi.Close()
// Attach zlib reader
r := new(reader)
r.buf = make([]byte, 2048)
r.f, err = zlib.NewReader(fi)
if err != nil {
return nil, err
}
defer r.f.Close()
Then x number of bytes can be read straight out of the zlib reader using a function like this:
mydata := r.readx(10)
func (r *reader) readx(x int) []byte {
for r.n < x {
copy(r.buf, r.buf[r.at:r.at+r.n])
r.at = 0
m, err := r.f.Read(r.buf[r.n:])
if err != nil {
panic(err)
}
r.n += m
}
tmp := make([]byte, x)
copy(tmp, r.buf[r.at:r.at+x]) // must be copied to avoid memory leak
r.at += x
r.n -= x
return tmp
}
Note that I have no need to check for EOF because I my parser should stop itself at the right place.

Go AES CFB compatibility

I am developing a client-side app in Go that relies on AES CFB. The server-side is written in C. My problem is that Go's AES CFB implementation appears to differ from many others (including OpenSSL). I wrote this to test my theory:-
package main
import (
"fmt"
"encoding/hex"
"crypto/cipher"
"crypto/aes"
)
func encrypt_aes_cfb(plain, key, iv []byte) (encrypted []byte) {
block, err := aes.NewCipher(key)
if err != nil {
panic(err)
}
encrypted = make([]byte, len(plain))
stream := cipher.NewCFBEncrypter(block, iv)
stream.XORKeyStream(encrypted, plain)
return
}
func decrypt_aes_cfb(encrypted, key, iv []byte) (plain []byte) {
block, err := aes.NewCipher(key)
if err != nil {
panic(err)
}
plain = make([]byte, len(encrypted))
stream := cipher.NewCFBDecrypter(block, iv)
stream.XORKeyStream(plain, encrypted)
return
}
func main() {
plain := []byte("Hello world.....")
key := []byte("01234567890123456789012345678901")
iv := []byte("0123456789012345")
enc := encrypt_aes_cfb(plain, key, iv)
dec := decrypt_aes_cfb(enc, key, iv)
fmt.Println("Key: ", hex.EncodeToString(key))
fmt.Println("IV: ", hex.EncodeToString(iv))
fmt.Println("Enc: ", hex.EncodeToString(enc))
fmt.Println("In: ", hex.EncodeToString(plain))
fmt.Println("Out: ", hex.EncodeToString(dec))
}
When this is run, it appears to work perfectly, however, if the encrypted bytes are pasted into another AES implementation and decrypted using the same key and IV, the plaintext is corrupted (except for the first Byte). http://aes.online-domain-tools.com/ provides a simple means to test this. Any suggestions why this might be happening and how I can resolve it?
Thanks
Steve
(Firstly, an obligatory warning: CFB mode is a sign of homegrown crypto. Unless you're implementing OpenPGP you should be using an AE mode like AES-GCM or NaCl's secretbox. If you're forced to use CFB mode, I hope that you're authenticating ciphertexts with an HMAC at least.)
With that aside, CFB mode in Go is there for OpenPGP support. (OpenPGP uses both a tweaked CFB mode called OCFB, and standard CFB mode in different places.) The Go OpenPGP code appears to interoperate with other implementations at least.
Nick is correct that test vectors are missing in the Go crypto package. The testing was coming from the OpenPGP code, but packages should stand alone and so I'll add tests to crypto/cipher with the test vectors from section F.3.13 of [1].
My best guess for the source of any differences is that CFB is parameterised by a chunk size. This is generally a power of two number of bits up to the block size of the underlying cipher. If the chunk size isn't specified then it's generally taken to be the cipher block size, which is what the Go code does. See [1], section 6.3. A more friendly explanation is given by [2].
Small chunk sizes were used in the dark ages (the late 90s) when people worried about things like cipher resync in the face of ciphertext loss. If another implementation is using CFB1 or CFB8 then it'll be very different from Go's CFB mode and many others. (Go's code does not support smaller chunk sizes.)
[1] http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
[2] http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Cipher_feedback_.28CFB.29
I investigated this with the following inputs because I was unsure of the bit/byte order for both inputs and outputs :
Key: 00000000000000000000000000000000
IV: 00000000000000000000000000000000
Enc: 66
In: 00
Out: 00
http://play.golang.org/p/wl2y1EE6lK
Which matches the tool you provided, and then this :
Key: 00000000000000000000000000000000
IV: 00000000000000000000000000000000
Enc: 66e94b
In: 000000
Out: 000000
http://play.golang.org/p/DNC42m2oU5
Which doesn't match the tool :
6616f9
http://aes.online-domain-tools.com/link/63687gDNzymApefh/
The first byte matches, which indicates there may be a feedback issue.
So I checked the Go package's code and I think there is a bug here :
func (x *cfb) XORKeyStream(dst, src []byte) {
for len(src) > 0 {
if x.outUsed == len(x.out) {
x.b.Encrypt(x.out, x.next)
x.outUsed = 0
}
if x.decrypt {
// We can precompute a larger segment of the
// keystream on decryption. This will allow
// larger batches for xor, and we should be
// able to match CTR/OFB performance.
copy(x.next[x.outUsed:], src)
}
n := xorBytes(dst, src, x.out[x.outUsed:])
if !x.decrypt {
copy(x.next[x.outUsed:], dst) // BUG? `dst` should be `src`
}
dst = dst[n:]
src = src[n:]
x.outUsed += n
}
}
EDIT
After a second look at CFB mode it seems that Go's code is fine, so yeah it may be the other implementations are wrong.

CryptoJS.AES and Golang

i already managed to share a random symetric key via rsa.
However i fail to make aes-encryption work with it.
Problem seems to be the salt and initialization vector, that cryptoJS uses.
First it's output is along the lines of:
U2FsdGVkX19wbzVqqOr6U5dnuG34WyH+n1A4PX9Z+xBhY3bALGS7DOa/aphgnubc
Googling for the reoccuring "U2FsdGVkX" or "cryptoJS.AES output" sadly is of no use.
On the other hand, golang's aes requires only a 32bit key and input of 32bit length each.
Which means i have to somehow split the above into the corresponding blocks and figure out, how to create the 32bit key out of the secret key and the data above (which proably includes salt + init vector).
Sadly neither http://code.google.com/p/crypto-js nor any google search provide me with a solution.
By the way - my encryption right now:
var arr = new Array(32);
symetricKey = "";
var symHex = "";
rng.nextBytes(arr);
for(var i = 0; i < arr.length; i++){
symetricKey += String.fromCharCode(arr[i]);
//symHex= arr[i].toString(16), added a 0 if needed (so length always increases by 2)
}
//submit symetric via rsa. This is working, the server gets that key
var enc = CryptoJS.AES.encrypt(unencrypted, symetricKey)
//submit enc, stuck now - what to do with it on the server?
Edit: After the Base64 response:
Thanks for the base64 input.
However i still don't manage to bring it to work.
Especially since the encoded string starts with "SALTED",
i believe, that there might be a problem.
Way i try to encode now:
Encoded on Client by:
var unencrypted = "{mail:test,password:test}"
var enc = CryptoJS.AES.encrypt(unencrypted, symKey)
On Server, the variables enc and symKey are the same as on Client:
baseReader := base64.NewDecoder(base64.StdEncoding, strings.NewReader(enc))
encData, err := ioutil.ReadAll(baseReader)
//if err != nil { ....}, doesn't happen here
cipher, err := aes.NewCipher(symKey)
//if err != nil { ....}, doesn't happen here
byteData := make([]byte, len(encData))
cipher.Decrypt(byteData, encData)
fmt.Println("Dec: ", string(byteData))
//Outputs unrepresentable characters
Any idea?
The output of CryptoJS.AES.encrypt is a CipherParams object containing the key, IV, optional salt, and ciphertext. The string you were referencing was a OpenSSL-compatible formatted string.
var encrypted = CryptoJS.AES.encrypt("Message", "Secret Passphrase");
alert(encrypted.key); // 74eb593087a982e2a6f5dded54ecd96d1fd0f3d44a58728cdcd40c55227522223
alert(encrypted.iv); // 7781157e2629b094f0e3dd48c4d786115
alert(encrypted.salt); // 7a25f9132ec6a8b34
alert(encrypted.ciphertext); // 73e54154a15d1beeb509d9e12f1e462a0
alert(encrypted); // U2FsdGVkX1+iX5Ey7GqLND5UFUoV0b7rUJ2eEvHkYqA= (OpenSSL-compatible format strategy)
CryptoJS' default encryption mode is CBC. You should pass the IV along with your symmetric key during your RSA-encrypted exchange. With the symmetric key, IV, and cipher text byte arrays on the server, you can decrypt it in Go similar to this:
c, err := aes.NewCipher(key)
cfbdec := cipher.NewCBCDecrypter(c, iv)
plaintext := make([]byte, len(ciphertext))
cfbdec.CryptBlock(plaintext, ciphertext)
U2FsdGVkX19wbzVqqOr6U5dnuG34WyH+n1A4PX9Z+xBhY3bALGS7DOa/aphgnubc
That data is base64 encoded, raw it looks something like this:
Salted__po5jSgm[!P8=Yacv,dj`
(note that there are unrepresentable characters in that string)
And this is exactly 32 bytes long.
Although, it don't directly but maybe you will find it useful to have a look at https://github.com/dgryski/dkeyczar.
Its Go implementation of KeyCzar, open source cryptographic toolkit originally developed by members of the Google Security Team
I hope you can learn something from this project
There are probably a few issues we'll need to work through.
First, you need to make sure you're supplying the correct type of arguments to encrypt(). It looks like you may have realized in a comment that you need to do CryptoJS.enc.Hex.parse(symetricKeyHex), though your original post doesn't reflect this yet.
Second, you need to decide how you're getting or creating the IV on both ends. CryptoJS can create one if you use a passphrase, but you're trying to use an actual key rather than a passphrase. That means you'll have to set the IV like you set the key.
Finally, you'll have to decide how you're going to transmit both the IV and the ciphertext. A typical and simple solution is to prepend the IV to the ciphertext, but really anything that's easily serializable/unserializable will work fine. Just make sure that whatever format you decide on is consistent on both ends.

Resources