How to convert aws.WriteAtBuffer to an io.Reader? - go

I need to download a file from S3, and then upload the same file into a different S3 bucket. So far I have:
sess := session.Must(session.NewSession())
downloader := s3manager.NewDownloader(sess)
buffer := aws.NewWriteAtBuffer([]byte{})
n, err := downloader.Download(buffer, &s3.GetObjectInput{
    Bucket: aws.String(sourceS3Bucket),
    Key:    aws.String(documentKey),
})
uploader := s3manager.NewUploader(sess)
result, err := uploader.Upload(&s3manager.UploadInput{
    Bucket: aws.String(targetS3Bucket),
    Key:    aws.String(documentKey),
    Body:   buffer,
})
I have used an aws.WriteAtBuffer, as per the answer here: https://stackoverflow.com/a/48254996/504055
However, I am currently stuck on how to treat this buffer as something that implements the io.Reader interface, which is what the uploader's Upload method requires.

Use bytes.NewReader to create an io.Reader on the bytes in the buffer:
result, err := uploader.Upload(&s3manager.UploadInput{
    Bucket: aws.String(targetS3Bucket),
    Key:    aws.String(documentKey),
    Body:   bytes.NewReader(buffer.Bytes()),
})
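As an aside (my addition, not part of the answer above): since Upload only needs an io.Reader, you can also skip the WriteAtBuffer entirely and stream the source object's body straight into the uploader. A minimal sketch, reusing the sess, uploader, bucket and key variables from the question (needs the log package for the error handling):
// Stream directly from the source object to the target bucket.
// GetObjectOutput.Body is an io.ReadCloser, which satisfies io.Reader.
svc := s3.New(sess)
obj, err := svc.GetObject(&s3.GetObjectInput{
    Bucket: aws.String(sourceS3Bucket),
    Key:    aws.String(documentKey),
})
if err != nil {
    log.Fatal(err)
}
defer obj.Body.Close()

result, err := uploader.Upload(&s3manager.UploadInput{
    Bucket: aws.String(targetS3Bucket),
    Key:    aws.String(documentKey),
    Body:   obj.Body, // plain io.Reader; the uploader reads it in parts
})
This avoids holding the whole object in memory, at the cost of keeping the GET connection open while the upload runs.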

Related

Gzip writer not writing gzip data to S3 in Golang

I've got a (hopefully) simple problem. I'm trying to write the results of an HTTP request to a gzip file in S3. However, when downloading the resultant file from S3, it's just plain text and not compressed. Below is a snippet of the code (sans bootstrapping). The code builds, lints and runs without error, so I'm not sure where I'm going wrong... any pointers would be greatly appreciated!
r, w := io.Pipe()
gw := gzip.NewWriter(w)
go func() {
    defer w.Close()
    defer gw.Close()
    _, err := gw.Write(httpResponse)
    if err != nil {
        fmt.Println("error")
    }
}()

cfg, _ := config.LoadDefaultConfig(context.TODO())
s3Client := s3.NewFromConfig(cfg)
ul := manager.NewUploader(s3Client)
_, err := ul.Upload(context.TODO(), &s3.PutObjectInput{
    Bucket:          aws.String(bucket),
    ContentEncoding: aws.String("gzip"),
    Key:             aws.String(fileName),
    Body:            r,
})
if err != nil {
    fmt.Println("error")
}
This is a side effect of downloading via the browser: because the object is stored with Content-Encoding: gzip, the browser decodes it on the fly but leaves the .gz extension (which is frankly confusing). If you use the AWS CLI or the API to download the file, it will remain gzip-compressed.
See: https://superuser.com/questions/940605/chromium-prevent-unpacking-tar-gz?noredirect=1&lq=1
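To confirm that the stored object really is compressed, one option (my sketch, reusing the question's s3Client, bucket and fileName; needs compress/gzip, io, fmt and log) is to fetch it with GetObject and gunzip it in Go:
// Fetch the raw object and decompress it ourselves; if this succeeds,
// the bytes stored in S3 are genuinely gzip-compressed.
out, err := s3Client.GetObject(context.TODO(), &s3.GetObjectInput{
    Bucket: aws.String(bucket),
    Key:    aws.String(fileName),
})
if err != nil {
    log.Fatal(err)
}
defer out.Body.Close()

zr, err := gzip.NewReader(out.Body)
if err != nil {
    log.Fatal(err) // not gzip data
}
body, err := io.ReadAll(zr)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("decompressed %d bytes\n", len(body))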

Getting `panic: os: invalid use of WriteAt on file opened with O_APPEND`

I am a newbie to Go and was starting to write my first program, in which I have to download a bunch of CSVs from AWS. I don't understand why it gives me the error below when I use the O_APPEND mode. If I remove os.O_APPEND, I only get the last file's data, which is not the objective.
The objective is to download all CSV files into one file locally. I'd like to understand what I'm doing incorrectly.
package main

import (
    "fmt"
    "os"
    "path/filepath"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

const (
    AccessKeyId     = "xxxxxxxxx"
    SecretAccessKey = "xxxxxxxxxxxxxxxxxxxx"
    Region          = "eu-central-1"
    Bucket          = "dexter-reports"
    bucketKey       = "Jenkins/pluginVersions/"
)

func main() {
    // Load the shared AWS configuration
    os.Setenv("AWS_ACCESS_KEY_ID", AccessKeyId)
    os.Setenv("AWS_SECRET_ACCESS_KEY", SecretAccessKey)

    filename := "JenkinsPluginDetais.txt"
    cred := credentials.NewStaticCredentials(AccessKeyId, SecretAccessKey, "")
    config := aws.Config{Credentials: cred, Region: aws.String(Region), Endpoint: aws.String("s3.amazonaws.com")}

    file, err := os.OpenFile(filename, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0666)
    if err != nil {
        panic(err)
    }
    defer file.Close()

    sess, err := session.NewSession(&config)
    if err != nil {
        fmt.Println(err)
    }

    // list the bucket objects
    ObjectList := listBucketObjects(sess)

    // loop over the object list; first initialize the s3 downloader via s3manager
    downloader := s3manager.NewDownloader(sess)
    for _, item := range ObjectList.Contents {
        csvFile := filepath.Base(*item.Key)
        if csvFile != "pluginVersions" {
            downloadBucketObjects(downloader, file, csvFile)
        }
    }
}

func listBucketObjects(sess *session.Session) *s3.ListObjectsV2Output {
    // create a new s3 client
    svc := s3.New(sess)
    resp, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{
        Bucket: aws.String(Bucket),
        Prefix: aws.String(bucketKey),
    })
    if err != nil {
        panic(err)
    }
    return resp
}

func downloadBucketObjects(downloader *s3manager.Downloader, file *os.File, keyobj string) {
    fileToDownload := bucketKey + keyobj
    numBytes, err := downloader.Download(file,
        &s3.GetObjectInput{
            Bucket: aws.String(Bucket),
            Key:    aws.String(fileToDownload),
        })
    if err != nil {
        panic(err)
    }
    fmt.Println("Downloaded", file.Name(), numBytes, "bytes")
}
Firstly, I don't understand why you even need the os.O_APPEND flag in the first place. As far as I can tell, you can omit os.O_APPEND.
Now, let's come to the actual problem of why it's happening:
Doc for O_APPEND (Ref: https://man7.org/linux/man-pages/man2/open.2.html):
O_APPEND
The file is opened in append mode. Before each write(2),
the file offset is positioned at the end of the file, as
if with lseek(2). The modification of the file offset and
the write operation are performed as a single atomic step.
So for every call to write, the file offset is positioned at the end of the file.
But (*s3manager.Downloader).Download is presumably using the WriteAt method, i.e.,
Doc for WriteAt:
$ go doc os WriteAt
package os // import "os"
func (f *File) WriteAt(b []byte, off int64) (n int, err error)
WriteAt writes len(b) bytes to the File starting at byte offset off. It
returns the number of bytes written and an error, if any. WriteAt returns a
non-nil error when n != len(b).
If file was opened with the O_APPEND flag, WriteAt returns an error.
Notice the last line: if the file is opened with the O_APPEND flag, WriteAt returns an error. That is reasonable, because WriteAt's second argument is an explicit offset, and mixing O_APPEND's append-only behaviour with WriteAt's offset seeking could produce unexpected results, so it errors out instead.
Consider the definition of s3manager.Downloader:
func (d Downloader) Download(w io.WriterAt, input *s3.GetObjectInput, options ...func(*Downloader)) (n int64, err error)
The first argument is an io.WriterAt; this interface is:
type WriterAt interface {
WriteAt(p []byte, off int64) (n int, err error)
}
This means that the Download function is going to call the WriteAt method on the File you are passing it. As per the documentation for File.WriteAt:
If file was opened with the O_APPEND flag, WriteAt returns an error.
So this explains why you are getting the error but raises the question "why is Download using WriteAt and not accepting an io.Writer (and calling Write)?"; the answer can be found in the documentation:
The w io.WriterAt can be satisfied by an os.File to do multipart concurrent downloads, or in memory []byte wrapper using aws.WriteAtBuffer
So, to increase performance, the Downloader might make multiple simultaneous requests for parts of the file and then write these out as they are received (meaning it may not write the data in order). This also explains why calling the function multiple times with the same File results in overwritten data (when the Downloader retrieves each chunk of the file, it writes it out at the appropriate position in the output file; this overwrites any data already there).
The above quote from the documentation also points to a possible solution; use an aws.WriteAtBuffer and, once the download is finished, write the data to your file (which could then be opened with O_APPEND) - something like this:
buf := aws.NewWriteAtBuffer([]byte{})

numBytes, err := downloader.Download(buf,
    &s3.GetObjectInput{
        Bucket: aws.String(Bucket),
        Key:    aws.String(fileToDownload),
    })
if err != nil {
    panic(err)
}

_, err = file.Write(buf.Bytes())
if err != nil {
    panic(err)
}
An alternative would be to download into a temporary file and then append that to your output file (you may need to do this if the files are large).
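For completeness, a rough sketch of that temporary-file alternative (my addition; it reuses downloader, file, Bucket and fileToDownload from the code above and needs the io package): download each object into a temp file, then append the temp file's contents to the output file.
// Download into a temp file (safe for concurrent WriteAt calls), then
// append its contents to the O_APPEND output file.
tmp, err := os.CreateTemp("", "s3download-*.csv")
if err != nil {
    panic(err)
}
defer os.Remove(tmp.Name())
defer tmp.Close()

_, err = downloader.Download(tmp, &s3.GetObjectInput{
    Bucket: aws.String(Bucket),
    Key:    aws.String(fileToDownload),
})
if err != nil {
    panic(err)
}

// Rewind the temp file and copy it onto the end of the output file.
if _, err = tmp.Seek(0, io.SeekStart); err != nil {
    panic(err)
}
if _, err = io.Copy(file, tmp); err != nil {
    panic(err)
}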

How to Achieve Performance of AWS CLI Sync Command in AWS SDK

The aws s3 sync command in the CLI can download a large collection of files very quickly, and I cannot achieve the same performance with the AWS Go SDK. I have millions of files in the bucket, so this is critical to me. I also need to use the list-pages call so that I can add a prefix, which the sync CLI command does not support well.
I have tried using multiple goroutines (10 up to 1000) to make requests to the server, but the time is just so much slower compared to the CLI. It takes about 100 ms per file to run the Go GetObject function, which is unacceptable for the number of files that I have. I know that the AWS CLI also uses the Python SDK in the backend, so how does it have so much better performance? (I tried my script in boto as well as Go.)
I am using ListObjectsV2Pages and GetObject. My client runs in the same region as the S3 bucket.
logMtx := &sync.Mutex{}
logBuf := bytes.NewBuffer(make([]byte, 0, 100000000))

err = s3c.ListObjectsV2Pages(
    &s3.ListObjectsV2Input{
        Bucket:  bucket,
        Prefix:  aws.String("2019-07-21-01"),
        MaxKeys: aws.Int64(1000),
    },
    func(page *s3.ListObjectsV2Output, lastPage bool) bool {
        fmt.Println("Received", len(page.Contents), "objects in page")
        worker := make(chan bool, 10)
        for i := 0; i < cap(worker); i++ {
            worker <- true
        }

        wg := &sync.WaitGroup{}
        wg.Add(len(page.Contents))

        objIdx := 0
        objIdxMtx := sync.Mutex{}
        for {
            <-worker
            objIdxMtx.Lock()
            if objIdx == len(page.Contents) {
                break
            }
            go func(idx int, obj *s3.Object) {
                gs := time.Now()
                resp, err := s3c.GetObject(&s3.GetObjectInput{
                    Bucket: bucket,
                    Key:    obj.Key,
                })
                check(err)
                fmt.Println("Get: ", time.Since(gs))

                rs := time.Now()
                logMtx.Lock()
                _, err = logBuf.ReadFrom(resp.Body)
                check(err)
                logMtx.Unlock()
                fmt.Println("Read: ", time.Since(rs))

                err = resp.Body.Close()
                check(err)

                worker <- true
                wg.Done()
            }(objIdx, page.Contents[objIdx])
            objIdx += 1
            objIdxMtx.Unlock()
        }
        fmt.Println("ok")
        wg.Wait()
        return true
    },
)
check(err)
Many results look like:
Get: 153.380727ms
Read: 51.562µs
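As an aside (my addition, not from the original post): the per-page worker bookkeeping above can be simplified to a semaphore channel plus a sync.WaitGroup, which removes the index mutex. A rough sketch that reuses the question's page, s3c, bucket, logMtx, logBuf and check:
// Bounded-concurrency fetch of every object in the page: the buffered
// channel limits in-flight GetObject calls, the WaitGroup waits for all.
sem := make(chan struct{}, 20)
var wg sync.WaitGroup

for _, obj := range page.Contents {
    wg.Add(1)
    sem <- struct{}{}
    go func(obj *s3.Object) {
        defer wg.Done()
        defer func() { <-sem }()

        resp, err := s3c.GetObject(&s3.GetObjectInput{
            Bucket: bucket,
            Key:    obj.Key,
        })
        check(err)
        defer resp.Body.Close()

        logMtx.Lock()
        _, err = logBuf.ReadFrom(resp.Body)
        logMtx.Unlock()
        check(err)
    }(obj)
}
wg.Wait()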
Have you tried using https://docs.aws.amazon.com/sdk-for-go/api/service/s3/s3manager/?
iter := new(s3manager.DownloadObjectsIterator)
var files []*os.File
defer func() {
    for _, f := range files {
        f.Close()
    }
}()

err := client.ListObjectsV2PagesWithContext(ctx, &s3.ListObjectsV2Input{
    Bucket: aws.String(bucket),
    Prefix: aws.String(prefix),
}, func(output *s3.ListObjectsV2Output, last bool) bool {
    for _, object := range output.Contents {
        nm := filepath.Join(dstdir, *object.Key)

        err := os.MkdirAll(filepath.Dir(nm), 0755)
        if err != nil {
            panic(err)
        }

        f, err := os.Create(nm)
        if err != nil {
            panic(err)
        }

        log.Println("downloading", *object.Key, "to", nm)

        iter.Objects = append(iter.Objects, s3manager.BatchDownloadObject{
            Object: &s3.GetObjectInput{
                Bucket: aws.String(bucket),
                Key:    object.Key,
            },
            Writer: f,
        })

        files = append(files, f)
    }
    return true
})
if err != nil {
    panic(err)
}

downloader := s3manager.NewDownloader(s) // s is an existing *session.Session
err = downloader.DownloadWithIterator(ctx, iter)
if err != nil {
    panic(err)
}
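A side note (my addition, hedged): the s3manager Downloader also takes option functions, so the per-object part concurrency and part size are tunable, e.g. by constructing the downloader as below; whether that helps for lots of small objects is something you would have to measure.
// NewDownloader accepts option functions; Concurrency controls how many
// parts of a single object are fetched in parallel, PartSize their size.
downloader := s3manager.NewDownloader(s, func(d *s3manager.Downloader) {
    d.Concurrency = 10
    d.PartSize = 8 * 1024 * 1024 // 8 MiB
})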
I ended up settling for the script in my initial post. I tried 20 goroutines and that seemed to work pretty well. On my laptop (i7 with 8 threads, 16 GB RAM, NVMe), the script is definitely slower than the CLI. However, on the EC2 instance the difference was small enough that it was not worth my time to optimize it further. I used a c5.xlarge instance in the same region as the S3 bucket.

Convert os.Stdin to []byte

I'm trying to implement a small chat server in Go with end-to-end encryption. Starting from the example server https://github.com/adonovan/gopl.io/tree/master/ch8/chat and client https://github.com/adonovan/gopl.io/blob/master/ch8/netcat3/netcat.go, I stumbled upon https://www.thepolyglotdeveloper.com/2018/02/encrypt-decrypt-data-golang-application-crypto-packages/ for encrypting and decrypting in Go.
The function to encrypt:
func encrypt(data []byte, passphrase string) []byte {
    block, _ := aes.NewCipher([]byte(createHash(passphrase)))
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        panic(err.Error())
    }
    nonce := make([]byte, gcm.NonceSize())
    if _, err = io.ReadFull(rand.Reader, nonce); err != nil {
        panic(err.Error())
    }
    ciphertext := gcm.Seal(nonce, nonce, data, nil)
    return ciphertext
}
in func main():
ciphertext := encrypt([]byte(os.Stdin), "password")
mustCopy(conn, ciphertext)
conn.Close()
os.Stdin is an *os.File, while encrypt needs a []byte. The solution should be an io.Reader or go via a buffer, but I can't find a working one.
I tried
bytes.NewBuffer([]byte(os.Stdin))
and
reader := bytes.NewReader(os.Stdin)
Any input is more than welcome. Sorry if I'm not seeing the obvious problem/solution here, as I'm fairly new.
os.Stdin is an io.Reader. You can't convert it to a []byte, but you can read from it, and the data you read can be stored in a []byte.
Since in many terminals reading from os.Stdin delivers data line by line, you should read a complete line from it; the read may block until a complete line is available.
For that you have many possibilities; one is to use bufio.Scanner.
This is how you can do it:
scanner := bufio.NewScanner(os.Stdin)
if !scanner.Scan() {
    log.Printf("Failed to read: %v", scanner.Err())
    return
}
line := scanner.Bytes() // line is of type []byte, exactly what you need
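If the goal is to encrypt everything arriving on stdin (for example piped input) rather than a single line, another option (my sketch, not part of the original answer) is to read until EOF:
// Read all of stdin until EOF (Ctrl-D in a terminal) into a []byte.
data, err := io.ReadAll(os.Stdin)
if err != nil {
    log.Fatalf("failed to read stdin: %v", err)
}
ciphertext := encrypt(data, "password")
_ = ciphertext // write it to the connection as in the original main()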

Multipart upload to s3 while writing to reader

I've found a few questions that are similar to mine, but nothing that answers my specific question.
I want to upload CSV data to s3. My basic code is along the lines of (I've simplified getting the data for brevity, normally it's reading from a database):
reader, writer := io.Pipe()

go func() {
    cWriter := csv.NewWriter(writer)
    for _, line := range lines {
        cWriter.Write(line)
    }
    cWriter.Flush()
    writer.Close()
}()

sess := session.New(//...)
uploader := s3manager.NewUploader(sess)
result, err := uploader.Upload(&s3manager.UploadInput{
    Body: reader,
    //...
})
The way I understand it, the code will wait for writing to finish and then will upload the contents to s3, so I end up with the full contents of the file in memory. Is it possible to chunk the upload (possibly using the s3 multipart upload?) so that for larger uploads, I'm only storing part of the data in memory at any one time?
The uploader supports multipart upload, if I have read its source code correctly: https://github.com/aws/aws-sdk-go/blob/master/service/s3/s3manager/upload.go
The minimum size of an uploaded part is 5 MB.
// MaxUploadParts is the maximum allowed number of parts in a multi-part upload
// on Amazon S3.
const MaxUploadParts = 10000

// MinUploadPartSize is the minimum allowed part size when uploading a part to
// Amazon S3.
const MinUploadPartSize int64 = 1024 * 1024 * 5

// DefaultUploadPartSize is the default part size to buffer chunks of a
// payload into.
const DefaultUploadPartSize = MinUploadPartSize

u := &Uploader{
    PartSize:       DefaultUploadPartSize,
    MaxUploadParts: MaxUploadParts,
    .......
}

func (u Uploader) UploadWithContext(ctx aws.Context, input *UploadInput, opts ...func(*Uploader)) (*UploadOutput, error) {
    i := uploader{in: input, cfg: u, ctx: ctx}
    .......

func (u *uploader) nextReader() (io.ReadSeeker, int, error) {
    .............
    switch r := u.in.Body.(type) {
    .........
    default:
        part := make([]byte, u.cfg.PartSize)
        n, err := readFillBuf(r, part)
        u.readerPos += int64(n)

        return bytes.NewReader(part[0:n]), n, err
    }
}
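In practice that means that when Body is a non-seekable reader (such as the read end of an io.Pipe), the uploader pulls it in PartSize chunks and uploads those parts as it goes, so memory use should stay around PartSize times Concurrency rather than the whole file. A short sketch of tuning those knobs (my addition; the numbers are arbitrary, and reader is the pipe from the question):
// Part size and upload parallelism are both configurable; for a streaming
// Body, buffered memory is roughly PartSize * Concurrency.
uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
    u.PartSize = 10 * 1024 * 1024 // 10 MiB parts (minimum is 5 MiB)
    u.Concurrency = 3             // up to 3 parts in flight at once
})

result, err := uploader.Upload(&s3manager.UploadInput{
    Body: reader,
    //...
})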
