I've found a few questions that are similar to mine, but nothing that answers my specific question.
I want to upload CSV data to s3. My basic code is along the lines of (I've simplified getting the data for brevity, normally it's reading from a database):
reader, writer := io.Pipe()

go func() {
    cWriter := csv.NewWriter(writer)
    for _, line := range lines {
        cWriter.Write(line)
    }
    cWriter.Flush()
    writer.Close()
}()

sess := session.New(//...)
uploader := s3manager.NewUploader(sess)
result, err := uploader.Upload(&s3manager.UploadInput{
    Body: reader,
    //...
})
The way I understand it, the code will wait for writing to finish and then will upload the contents to s3, so I end up with the full contents of the file in memory. Is it possible to chunk the upload (possibly using the s3 multipart upload?) so that for larger uploads, I'm only storing part of the data in memory at any one time?
The uploader does support multipart upload, if I've read its source code correctly: https://github.com/aws/aws-sdk-go/blob/master/service/s3/s3manager/upload.go
The minimum size of an uploaded part is 5 MiB.
// MaxUploadParts is the maximum allowed number of parts in a multi-part upload
// on Amazon S3.
const MaxUploadParts = 10000
// MinUploadPartSize is the minimum allowed part size when uploading a part to
// Amazon S3.
const MinUploadPartSize int64 = 1024 * 1024 * 5
// DefaultUploadPartSize is the default part size to buffer chunks of a
// payload into.
const DefaultUploadPartSize = MinUploadPartSize
u := &Uploader{
    PartSize:       DefaultUploadPartSize,
    MaxUploadParts: MaxUploadParts,
    .......
}

func (u Uploader) UploadWithContext(ctx aws.Context, input *UploadInput, opts ...func(*Uploader)) (*UploadOutput, error) {
    i := uploader{in: input, cfg: u, ctx: ctx}
    .......

func (u *uploader) nextReader() (io.ReadSeeker, int, error) {
    .............
    switch r := u.in.Body.(type) {
    .........
    default:
        part := make([]byte, u.cfg.PartSize)
        n, err := readFillBuf(r, part)
        u.readerPos += int64(n)
        return bytes.NewReader(part[0:n]), n, err
    }
}
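So when the Body is a plain io.Reader (like the read end of the io.Pipe above), the uploader should only need to buffer roughly PartSize bytes per in-flight part, with up to Concurrency parts in flight at once. A rough sketch of tuning that (the bucket and key names are placeholders, not from the question):

uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
    u.PartSize = s3manager.MinUploadPartSize // buffer 5 MiB per part
    u.Concurrency = 2                        // at most 2 parts buffered/uploaded at once
})

result, err := uploader.Upload(&s3manager.UploadInput{
    Bucket: aws.String("my-bucket"), // placeholder
    Key:    aws.String("data.csv"),  // placeholder
    Body:   reader,                  // the io.Pipe read end from the question
})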
I'm writing a simple web server to serve static files. Any HTML file being served needs to be modified "on the go" to include some HTML just before its closing </body> tag.
I achieved it with the code below and it works; however, is there perhaps a more efficient way of doing it? I'm a beginner in Go and this code needs to be super performant.
// error handling etc omitted for brevity
dir := http.Dir("my/path")
content, _ := dir.Open("my_file")

var bodyBuf strings.Builder
var contentBuf *bytes.Buffer
io.Copy(&bodyBuf, content)
defer content.Close()

if strings.HasSuffix("some/web/uri", ".html") {
    new_html_content := "<whatever></body>"
    bodyRpld := strings.Replace(bodyBuf.String(), "</body>", new_html_content, 1)
    contentBuf = bytes.NewBuffer([]byte(bodyRpld))
} else {
    contentBuf = bytes.NewBuffer([]byte(bodyBuf.String()))
}

d, _ := content.Stat()
http.ServeContent(w, r, "my/path", d.ModTime(), bytes.NewReader(contentBuf.Bytes()))
Thanks!
To avoid creating large buffers for files that do not match your file-match pattern (*.html), I would suggest using an io.Reader mechanism to pass through files that you want to serve untouched. This avoids loading potentially large assets into memory (e.g. 100 MB non-HTML video files).
For files that do match your HTML check, your string replace is probably fine, as .html files are typically small.
So try something like this:
dir := http.Dir("my/path")
content, err := dir.Open("my_file")
if err != nil {
    // handle error
}
defer content.Close()

var rs io.ReadSeeker // what http.ServeContent needs
if !strings.HasSuffix("some/web/uri", ".html") {
    rs = content // pass the file content through untouched (avoids memory allocs)
} else {
    // similar to what you had before
    b := new(bytes.Buffer)
    if _, err := b.ReadFrom(content); err != nil {
        // handle error
    }
    new_html_content := "<whatever></body>"
    newContent := strings.Replace(b.String(), "</body>", new_html_content, 1)
    rs = bytes.NewReader([]byte(newContent))
}

d, _ := content.Stat()
http.ServeContent(w, r, "my/path", d.ModTime(), rs)
I ran the gin example for single-file upload from this repo: https://github.com/gin-gonic/examples/tree/5898505356e9064c49abb075eae89596a3c5cd67/upload-file/single.
When I change the limit to
router.MaxMultipartMemory = 1 // 8 MiB
it still does not reject a big file upload. Does anyone know why?
package main

import (
    "fmt"
    "net/http"

    "github.com/gin-gonic/gin"
)

func main() {
    router := gin.Default()
    // Set a lower memory limit for multipart forms (default is 32 MiB)
    router.MaxMultipartMemory = 1 // 8 MiB
    fmt.Println(router.MaxMultipartMemory)
    router.Static("/", "./public")
    router.POST("/upload", func(c *gin.Context) {
        ct := c.Request.Header.Get("Content-Type")
        fmt.Println(ct)

        name := c.PostForm("name")
        email := c.PostForm("email")

        // Source
        file, err := c.FormFile("file")
        if err != nil {
            c.String(http.StatusBadRequest, fmt.Sprintf("get form err: %s", err.Error()))
            return
        }

        if err := c.SaveUploadedFile(file, file.Filename); err != nil {
            c.String(http.StatusBadRequest, fmt.Sprintf("upload file err: %s", err.Error()))
            return
        }

        c.String(http.StatusOK, fmt.Sprintf("File %s uploaded successfully with fields name=%s and email=%s.", file.Filename, name, email))
    })
    router.Run(":8080")
}
I expected that when the file size is bigger than the limit, there would be an error.
Update
I misunderstood: MaxMultipartMemory only limits the memory used while parsing the form, not the uploaded file size. Even if the file is bigger than this limit, it is written to a temporary file.
Maybe MaxBytesReader can help?
The following should force the incoming request to be limited to 30MB
c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, int64(30<<20))
As noted in the comments, the line
router.MaxMultipartMemory = 1 // 8 MiB
only limits how much memory the program may use while parsing the upload; it does not limit the uploaded file size.
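For completeness, a rough sketch of wiring MaxBytesReader into the handler from the question (untested; the 30 MB limit and the response wording are just examples):

router.POST("/upload", func(c *gin.Context) {
    // Wrap the body before anything parses the multipart form.
    c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, 30<<20)

    file, err := c.FormFile("file")
    if err != nil {
        // Parsing fails once the body exceeds the MaxBytesReader limit.
        c.String(http.StatusRequestEntityTooLarge, "get form err: %s", err.Error())
        return
    }

    if err := c.SaveUploadedFile(file, file.Filename); err != nil {
        c.String(http.StatusBadRequest, "upload file err: %s", err.Error())
        return
    }

    c.String(http.StatusOK, "File %s uploaded successfully.", file.Filename)
})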
I'm currently learning Go and have started to re-write a test-data-generation program I originally wrote in Java. I've been intrigued by Go's channels / threading possibilities, as many of the programs I've written have been focused around load testing a system / recording various metrics.
Here, I am creating some data to be written out to a CSV file. I started out by generating all of the data, then passing that off to be written to a file. I then thought I'd try and implement a channel, so data could be written while it's still being generated.
It worked - it almost eliminated the overhead of generating the data first and then writing it. However, I found that this only worked if I had a channel with a buffer big enough to cope with all of the test data being generated: c := make(chan string, count), where count is the same size as the number of test data lines I am generating.
So, to my question: I'm regularly generating millions of records of test data (for load testing applications) - should I be using a channel with a buffer that large? I can't find much about restrictions on the size of the buffer.
Running the below with a 10m count completes in ~59.5s; generating the data up front and writing it all to a file takes ~62s; using a buffer length of 1 - 100 takes ~80s.
const externalRefPrefix = "Ref"
const fileName = "citizens.csv"

var counter int32 = 0

func WriteCitizensForApplication(applicationId string, count int) {
    file, err := os.Create(fileName)
    if err != nil {
        panic(err)
    }
    defer file.Close()

    c := make(chan string, count)
    go generateCitizens(applicationId, count, c)

    for line := range c {
        file.WriteString(line)
    }
}

func generateCitizens(applicationId string, count int, c chan string) {
    for i := 0; i < count; i++ {
        c <- fmt.Sprintf("%v%v\n", applicationId, generateExternalRef())
    }
    close(c)
}

func generateExternalRef() string {
    n := atomic.AddInt32(&counter, 1) // use the returned value to avoid a race on counter
    return fmt.Sprintf("%v%08d", externalRefPrefix, n)
}
func SimpleUploader(r *http.Request, w http.ResponseWriter) {
    // temp folder path
    chunkDirPath := "./creatives/.uploads/" + userUUID
    // create folder
    err = os.MkdirAll(chunkDirPath, 02750)

    // Get file handle from multipart request
    var file io.Reader
    mr, err := r.MultipartReader()
    var fileName string

    // Read multipart body until the "file" part
    for {
        part, err := mr.NextPart()
        if err == io.EOF {
            break
        }
        if part.FormName() == "file" {
            file = part
            fileName = part.FileName()
            fmt.Println(fileName)
            break
        }
    }

    // Create files
    tempFile := chunkDirPath + "/" + fileName
    dst, err := os.Create(tempFile)
    defer dst.Close()

    buf := make([]byte, 1024*1024)
    file.Read(buf)
    // write/save buffer to disk
    ioutil.WriteFile(tempFile, buf, os.ModeAppend)

    if http.DetectContentType(buf) != "video/mp4" {
        response, _ := json.Marshal(&Response{"File upload cancelled"})
        settings.WriteResponse(w, http.StatusInternalServerError, response)
        return
    }

    // joinedFile := io.MultiReader(bytes.NewReader(buf), file)
    _, err = io.Copy(dst, file)
    if err != nil {
        settings.LogError(err, methodName, "Error copying file")
    }

    response, _ := json.Marshal(&Response{"File uploaded successfully"})
    settings.WriteResponse(w, http.StatusInternalServerError, response)
}
I am uploading a video file.
Before uploading the entire file I want to do some checks, so I save the first 1 MB to a file:
buf := make([]byte, 1024*1024)
file.Read(buf)
// write/save buffer to disk
ioutil.WriteFile(tempFile, buf, os.ModeAppend)
Then, if the checks pass, I want to upload the rest of the file. dst is the same file used to save the first 1 MB, so basically I am trying to append to the file:
_, err = io.Copy(dst, file)
The uploaded file size is correct, but the file is corrupted (the video can't be played).
What else have I tried? Joining both readers and saving to a new file. But with this approach the file size increases by 1 MB and the file is still corrupted.
joinedFile := io.MultiReader(bytes.NewReader(buf), file)
_, err = io.Copy(newDst, joinedFile)
Kindly help.
You've basically opened the file twice, by doing os.Create and ioutil.WriteFile.
The issue is that os.Create's return value (dst) is like a pointer to the beginning of that file, and WriteFile doesn't move where dst points to.
You are basically doing WriteFile, then io.Copy on top of the first set of bytes WriteFile wrote.
Try doing WriteFile first (it creates the file for you), and then os.OpenFile (instead of os.Create) on that same file with the append flag, to append the remaining bytes to the end.
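A minimal sketch of that approach, reusing the buf, file and tempFile names from the question (untested, error handling abbreviated):

// Read the first 1 MB for the content-type check.
n, err := io.ReadFull(file, buf)
if err != nil && err != io.ErrUnexpectedEOF {
    // handle read error
}
if http.DetectContentType(buf[:n]) != "video/mp4" {
    // reject the upload
    return
}

// WriteFile creates (or truncates) tempFile and writes the first chunk.
if err := ioutil.WriteFile(tempFile, buf[:n], 0644); err != nil {
    // handle error
}

// Re-open the same file in append mode and copy the remaining bytes to the end.
dst, err := os.OpenFile(tempFile, os.O_WRONLY|os.O_APPEND, 0644)
if err != nil {
    // handle error
}
defer dst.Close()

if _, err := io.Copy(dst, file); err != nil {
    // handle copy error
}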
Also, it's extremely risky to allow a client to give you the filename as it could be ../../.bashrc (for example), to which you'd overwrite your shell init with whatever the user decided to upload.
It would be much safer if you computed a filename yourself, and if you need to remember the user's selected filename, store that in your database or even a metadata.json type file that you load later.
I'm attempting to access an array of files and values posted to an API written in Gin (golang). I've got a function which takes a file, height and width. It then calls functions to resize the file, and then upload it to S3. However, I'm attempting to also upload multiple files.
func (rc *ResizeController) Resize(c *gin.Context) {
    file, header, err := c.Request.FormFile("file")
    if err != nil {
        log.Fatal(err)
    }
    filename := header.Filename

    height := c.PostForm("height")
    width := c.PostForm("width")

    finalFile := rc.Crop(height, width, file)

    go rc.Upload(filename, finalFile, "image/jpeg", s3.BucketOwnerFull)

    c.JSON(200, gin.H{"filename": filename})
}
I couldn't see anywhere in the docs how to access data in the following format:
item[0]file
item[0]width
item[0]height
item[1]file
item[1]width
item[1]height
etc.
I figured something along the lines of:
for index, element := range c.Request.PostForm("item") {
    fmt.Println(element.Height)
}
But that threw "c.Request.Values undefined (type *http.Request has no field or method Values)"
You can access the File slice directly instead of using the FormFile method on Request, assuming you have form arrays for width and height that correspond to the order in which the files were uploaded.
if err := ctx.Request.ParseMultipartForm(32 << 20); err != nil {
    // handle error
}

for i, fh := range ctx.Request.MultipartForm.File["item"] {
    // access the file header using fh
    w := ctx.Request.MultipartForm.Value["width"][i]
    h := ctx.Request.MultipartForm.Value["height"][i]
}
The FormFile method on Request is just a wrapper around MultipartForm.File that returns the first file at that key.
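For illustration only, a sketch of how that loop might feed into the question's own Crop/Upload methods (assuming Crop accepts any multipart.File and the form field names match):

for i, fh := range ctx.Request.MultipartForm.File["item"] {
    f, err := fh.Open() // *multipart.FileHeader -> multipart.File
    if err != nil {
        // handle error
        continue
    }

    width := ctx.Request.MultipartForm.Value["width"][i]
    height := ctx.Request.MultipartForm.Value["height"][i]

    finalFile := rc.Crop(height, width, f) // Crop/Upload are the question's methods
    go rc.Upload(fh.Filename, finalFile, "image/jpeg", s3.BucketOwnerFull)

    f.Close()
}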