Uploading file to Google Storage without saving it to memory - go

I want to upload files from the frontend directly through the backend into a Google Storage bucket, without saving it entirely in memory on the server first. I've added an endpoint similar to the example from the Google docs and it works. However, I'm not sure if this will save the entire file to memory first, since this could lead to issues when uploading larger files.
If it does save the file to memory first, how could I change the code so that it streams the upload directly to Google Storage? The answers to similar questions didn't clear this up for me.
Thank you
func Upload(c *gin.Context) {
	file, _, _ := c.Request.FormFile("image")
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		fmt.Printf("Failed to create client with error: %v", err)
		return
	}
	bucket := client.Bucket("test-bucket")
	w := bucket.Object("testfile").NewWriter(ctx)
	w.ContentType = "image/jpeg"
	io.Copy(w, file)
	w.Close()
}

As noted in a comment on the question and in the answer by Peter, use the multipart reader directly to read the request body.
func Upload(c *gin.Context) {
	mr, err := c.Request.MultipartReader()
	if err != nil {
		// handle error
		return
	}
	var foundImage bool
	for {
		p, err := mr.NextPart()
		if err == io.EOF {
			break
		}
		if err != nil {
			// handle error
			return
		}
		if p.FormName() == "image" {
			foundImage = true
			ctx := context.Background()
			client, err := storage.NewClient(ctx)
			if err != nil {
				// handle error
				return
			}
			bucket := client.Bucket("test-bucket")
			w := bucket.Object("testfile").NewWriter(ctx)
			w.ContentType = "image/jpeg"
			if _, err := io.Copy(w, p); err != nil {
				// handle error
				return
			}
			if err := w.Close(); err != nil {
				// handle error
				return
			}
		}
	}
	if !foundImage {
		// handle error
	}
}
Replace the // handle error comments with code that responds to the client with an appropriate error status. It may be useful to log some of the errors as well.

FormFile returns the first file for the provided form key. FormFile calls ParseMultipartForm and ParseForm if necessary.
https://golang.org/pkg/net/http/#Request.FormFile
ParseMultipartForm parses a request body as multipart/form-data. The whole request body is parsed and up to a total of maxMemory bytes of its file parts are stored in memory, with the remainder stored on disk in temporary files.
https://golang.org/pkg/net/http/#Request.ParseMultipartForm
At the time of writing this, FormFile passes 32 MB as the maxMemory argument.
So with this code you will need up to 32 MB of memory per request, plus googleapi.DefaultUploadChunkSize, which is currently 8 MB, as well as some amount of disk space for everything that doesn't fit in memory.
So uploading will not start until the whole file has been read, but not all of it is kept in memory. If that's not what you want, use Request.MultipartReader instead of ParseMultipartForm:
MultipartReader returns a MIME multipart reader if this is a multipart/form-data or a multipart/mixed POST request, else returns nil and an error. Use this function instead of ParseMultipartForm to process the request body as a stream.
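If the remaining 8 MB writer buffer matters to you, the storage client also lets you tune it: Writer.ChunkSize controls how much data is buffered per upload request. A minimal sketch, reusing the bucket and object names from the question (smaller chunks mean less memory but more round trips; the client rounds the value up to a multiple of 256 KiB):
func uploadStreaming(ctx context.Context, client *storage.Client, r io.Reader) error {
	// NewWriter streams to GCS; it only buffers one chunk at a time in memory.
	w := client.Bucket("test-bucket").Object("testfile").NewWriter(ctx)
	w.ContentType = "image/jpeg"
	// Shrink the per-request buffer from the default mentioned above to roughly 1 MiB.
	w.ChunkSize = 1 << 20
	if _, err := io.Copy(w, r); err != nil {
		w.Close()
		return err
	}
	return w.Close()
}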

Related

Go - correct usage of multipart Part.Read

I've been trying to use multipart.Part to help read very large file uploads (>20GB) from HTTP - so I've written the below code which seems to work nicely:
func ReceiveMultipartRoute(w http.ResponseWriter, r *http.Request) {
	mediatype, p, err := mime.ParseMediaType(r.Header.Get("Content-Type"))
	if err != nil {
		//...
	}
	if mediatype != "multipart/form-data" {
		//...
	}
	boundary := p["boundary"]
	reader := multipart.NewReader(r.Body, boundary)
	buffer := make([]byte, 8192)
	for {
		part, err := reader.NextPart()
		if err != nil {
			// ...
		}
		f, err := os.CreateTemp("", part.FileName())
		if err != nil {
			// ...
		}
		for {
			numBytesRead, err := part.Read(buffer)
			// People say not to read if there's an err, but then I miss the last chunk?
			f.Write(buffer[:numBytesRead])
			if err != nil {
				if err == io.EOF {
					break
				} else {
					// error, abort ...
					return
				}
			}
		}
	}
}
However, in the innermost for loop, I found that I have to consume the bytes returned by part.Read before checking for EOF; otherwise I miss the last chunk when I break early. Yet in many other articles and posts people check for errors/EOF first and break without using the last read. Am I using multipart.Part.Read() incorrectly, or is this safe?
You are using multipart.Part correctly.
multipart.Part is a particular implementation of io.Reader. Accordingly, you should follow the conventions documented for io.Reader. Quoting from the documentation:
Callers should always process the n > 0 bytes returned before considering the error err. Doing so correctly handles I/O errors that happen after reading some bytes and also both of the allowed EOF behaviors.
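In other words, the loop in the question has the canonical shape for any io.Reader. A rough sketch of that pattern as a standalone helper (dst and r are placeholders for your destination file and the multipart.Part; only the "io" import is needed):
func copyAll(dst io.Writer, r io.Reader) error {
	buf := make([]byte, 8192)
	for {
		n, err := r.Read(buf)
		if n > 0 {
			// Process the n bytes first, regardless of err.
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			return nil // the final chunk (if any) was already written above
		}
		if err != nil {
			return err
		}
	}
}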
Also note that in the example you are copying data from io.Reader to os.File. os.File implements io.ReaderFrom interface, so you can use File.ReadFrom() method to copy the data.
_, err := file.ReadFrom(part)
// non io.EOF
if err != nil {
	return fmt.Errorf("copy data: %w", err)
}
If you need to use a buffer, you can use the io.CopyBuffer() function. But note that you need to hide the io.ReaderFrom implementation of the destination, otherwise the buffer will not be used to perform the copy. See examples: 1, 2, 3.
_, err := io.CopyBuffer(writeFunc(file.Write), part, buffer)
// non io.EOF
if err != nil {
	return fmt.Errorf("copy data: %w", err)
}

type writeFunc func([]byte) (int, error)

func (write writeFunc) Write(data []byte) (int, error) {
	return write(data)
}

upload large file - Error VirtualAlloc of x bytes failed with errno=1455 fatal error: out of memory

I have a 10 GB file that I'm trying to upload with multipart/form-data in Go via Postman. Since I don't know much about how file upload works in Go, I followed a tutorial on YouTube.
File upload works fine with smaller files, but with larger files it always crashes with the message: "runtime: VirtualAlloc of 9193373696 bytes failed with errno=1455
fatal error: out of memory". Here's the code I'm trying to make work:
err := r.ParseMultipartForm(500 << 20)
if err != nil {
	fmt.Fprintln(w, err)
}
file, handler, err := r.FormFile("file")
if err != nil {
	fmt.Fprintln(w, err)
}
fmt.Fprintln(w, handler.Filename)
fmt.Fprintln(w, handler.Size)
fmt.Fprintln(w, handler.Header.Get("Content-type"))
defer file.Close()
saveLocation := "C:\\Users\\Pc\\go\\src\\github.com\\test\\uptest"
tempFile, err := ioutil.TempFile(saveLocation, "upload")
if err != nil {
	fmt.Fprintln(w, err)
}
defer tempFile.Close()
fileBytes, err := ioutil.ReadAll(file)
if err != nil {
	fmt.Fprintln(w, err)
}
tempFile.Write(fileBytes)
Using ParseMultipartForm() requires you to specify the maximum amount of memory to use for temporarily storing the uploaded file. If your file is big (bigger than the memory you allocated), that's bad news for your memory resources.
From the doc: ParseMultipartForm parses a request body as multipart/form-data. The whole request body is parsed and up to a total of maxMemory bytes of its file parts are stored in memory, with the remainder stored on disk in temporary files. ParseMultipartForm calls ParseForm if necessary. After one call to ParseMultipartForm, subsequent calls have no effect.
Based on your error message, the root cause of your issue is that the uploaded file is larger than the memory you have available: you allocate 500 << 20 bytes to ParseMultipartForm, and ioutil.ReadAll then loads the entire file into memory on top of that.
For handling big file uploads, I suggest taking a look at MultipartReader() instead.
From the doc: MultipartReader returns a MIME multipart reader if this is a multipart/form-data or a multipart/mixed POST request, else returns nil and an error. Use this function instead of ParseMultipartForm to process the request body as a stream.
It's a much faster approach and won't consume nearly as much memory, because we can copy the body (which is a stream) directly into the destination file with io.Copy(), instead of writing it to temporary storage first.
A simple usage example of MultipartReader():
reader, err := r.MultipartReader()
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
for {
part, err := reader.NextPart()
if err == io.EOF {
break
}
fmt.Println(part.FileName()) // prints file name
fmt.Println(part.FormName()) // prints form key, in yor case it's "file"
saveLocation := "C:\\Users\\Pc\\go\\src\\github.com\\test\\uptest"
dst, err := os.Create(saveLocation)
if dst != nil {
defer dst.Close()
}
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
if _, err := io.Copy(dst, part); err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
}
Reference: https://pkg.go.dev/net/http#Request.ParseMultipartForm

How to handle buffered Read-Write Stream(s) to peers in golang using libp2p?

I am following this tutorial:
https://github.com/libp2p/go-libp2p-examples/tree/master/chat-with-mdns
In short, it:
configures a p2p host
sets a default handler function for incoming connections
(3. not necessary)
and opens a stream to the connecting peers:
stream, err := host.NewStream(ctx, peer.ID, protocol.ID(cfg.ProtocolID))
Afterwards, there is a buffer stream/read-write variable created:
rw := bufio.NewReadWriter(bufio.NewReader(stream), bufio.NewWriter(stream))
Now this stream is used to send and receive data between the peers. This is done using two goroutine functions that have rw as an input:
go writeData(rw)
go readData(rw)
My problems are:
I want to send data to my peers and need feedback from them:
e.g. in rw there is a question and they need to answer yes/no. How can I transfer back this answer and process it (enable some interaction)?
The data I want to send over rw is not always the same. Sometimes it's a string containing only a name, sometimes it's a string containing a whole block, etc. How can I distinguish between them?
I thought about these solutions, but I am new to Go, so maybe you have a better one:
do I need a new stream for each type of content:
stream, err := host.NewStream(ctx, peer.ID, protocol.ID(cfg.ProtocolID))
do I need to open more buffered rw variables for each type of content:
rw := bufio.NewReadWriter(bufio.NewReader(stream), bufio.NewWriter(stream))
are there any other solutions?
Thank you for any help to solve this!!
This is what readData from your tutorial does:
func readData(rw *bufio.ReadWriter) {
	for {
		str, err := rw.ReadString('\n')
		if err != nil {
			fmt.Println("Error reading from buffer")
			panic(err)
		}
		if str == "" {
			return
		}
		if str != "\n" {
			// Green console colour: \x1b[32m
			// Reset console colour: \x1b[0m
			fmt.Printf("\x1b[32m%s\x1b[0m> ", str)
		}
	}
}
It basically reads the stream until it finds a \n (newline) character and prints what it read to stdout.
The writeData:
func writeData(rw *bufio.ReadWriter) {
	stdReader := bufio.NewReader(os.Stdin)
	for {
		fmt.Print("> ")
		sendData, err := stdReader.ReadString('\n')
		if err != nil {
			fmt.Println("Error reading from stdin")
			panic(err)
		}
		_, err = rw.WriteString(fmt.Sprintf("%s\n", sendData))
		if err != nil {
			fmt.Println("Error writing to buffer")
			panic(err)
		}
		err = rw.Flush()
		if err != nil {
			fmt.Println("Error flushing buffer")
			panic(err)
		}
	}
}
It reads data from stdin, so you can type messages, writes them to rw, and flushes it. This enables a sort of tty chat.
If it works correctly you should be able to start at least two peers and communicate through stdin.
You shouldn't create a new rw for new content. You can reuse the existing one until you close it. In the tutorial's code, a new rw is created for each new peer.
Now, a TCP stream does not work like an HTTP exchange, where each response corresponds to a request. So if you want to send something and get the response to that specific question, you could send a message of this format:
[8 bytes unique ID][content of the message]\n
And when you receive it, you parse it, prepare the response and send it with the same format, so that you can match messages, creating a sort of request/response communication.
You can do something like that:
func sendMsg(rw *bufio.ReadWriter, id int64, content []byte) error {
	// allocate our slice of bytes with the correct size: 8 bytes for the ID + size of the message + 1 for the newline
	msg := make([]byte, 8+len(content)+1)
	// write id into the first 8 bytes
	binary.LittleEndian.PutUint64(msg, uint64(id))
	// add content to msg after the id
	copy(msg[8:], content)
	// add new line at the end
	msg[len(msg)-1] = '\n'
	// write msg to stream
	_, err := rw.Write(msg)
	if err != nil {
		fmt.Println("Error writing to buffer")
		return err
	}
	err = rw.Flush()
	if err != nil {
		fmt.Println("Error flushing buffer")
		return err
	}
	return nil
}
func readMsg(rw *bufio.ReadWriter) {
	for {
		// read bytes until new line
		msg, err := rw.ReadBytes('\n')
		if err != nil {
			fmt.Println("Error reading from buffer")
			continue
		}
		// get the id from the first 8 bytes
		id := int64(binary.LittleEndian.Uint64(msg[0:8]))
		// get the content, last index is len(msg)-1 to remove the new line char
		content := string(msg[8 : len(msg)-1])
		if content != "" {
			// we print [message ID] content
			fmt.Printf("[%d] %s", id, content)
		}
		// here you could parse your message
		// and prepare a response
		response, err := prepareResponse(content)
		if err != nil {
			fmt.Println("Err while preparing response: ", err)
			continue
		}
		if err := sendMsg(rw, id, response); err != nil {
			fmt.Println("Err while sending response: ", err)
			continue
		}
	}
}
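To tie this back to your question, here is a hedged sketch of using these two helpers in place of readData/writeData once you have an rw for a peer. The startPeer name, the ID counter, and the question text are illustrative assumptions (it also assumes the "sync/atomic" import):
var nextID int64 // incremented for every outgoing question

func startPeer(rw *bufio.ReadWriter) {
	// one goroutine reads incoming messages and answers them
	go readMsg(rw)
	// ask a yes/no question and tag it with a fresh ID so the reply can be matched
	id := atomic.AddInt64(&nextID, 1)
	if err := sendMsg(rw, id, []byte("accept block?")); err != nil {
		fmt.Println("Err while sending question: ", err)
	}
}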
Hope this helps.

Golang reading from a file - is it safe from locking?

I have a function that will be called on every single HTTP GET request. The function reads a file, does some stuff to the contents of that file, and returns a slice of bytes of those contents. That slice of bytes is then written as the response body to the HTTP response writer.
Do I need to use a mutex for any of the steps in this function to prevent locking in the event of multiple HTTP requests trying to read the same file? And if so, would a simple RWMutex locking the reading of the file suffice, since I am not actually writing to it but am creating a copy of its contents?
Here is the function:
// prepareIndex will grab index.html and add a nonce to the script tags for the CSP header compliance.
func prepareIndex(nonce string) []byte {
	// Load index.html.
	file, err := os.Open("./client/dist/index.html")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	// Convert to goquery document.
	doc, err := goquery.NewDocumentFromReader(file)
	if err != nil {
		fmt.Println(err)
	}
	// Find all script tags and set nonce.
	doc.Find("body > script").SetAttr("nonce", nonce)
	// Grab the HTML string.
	html, err := doc.Html()
	if err != nil {
		fmt.Println(err)
	}
	return []byte(html)
}
I also thought about just loading the file once when main starts, but I was having a problem where only the first request could see the data and the subsequent requests saw nothing. Probably an error in the way I was reading the file. But I actually prefer my current approach because if there are any changes to index.html, I want them to be propagated to the user immediately without having to restart the executable.
Using an RWMutex won't protect you from the file being modified by another program. The best option here would be to load your file into a []byte at startup and create a bytes.Buffer from it whenever you call goquery.NewDocumentFromReader. For the changes to be propagated to the user, you can use fsnotify [1] to detect changes to your file and update your cached file ([]byte) when necessary (you will need the RWMutex for that operation).
For example:
type CachedFile struct {
	sync.RWMutex
	FileName string
	Content  []byte
	watcher  *fsnotify.Watcher
}

func (c *CachedFile) Buffer() *bytes.Buffer {
	c.RLock()
	defer c.RUnlock()
	return bytes.NewBuffer(c.Content)
}

func (c *CachedFile) Load() error {
	c.Lock()
	defer c.Unlock()
	content, err := ioutil.ReadFile(c.FileName)
	if err != nil {
		return err
	}
	c.Content = content
	return nil
}

func (c *CachedFile) Watch() error {
	var err error
	c.watcher, err = fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	go func() {
		for ev := range c.watcher.Events {
			if ev.Op&fsnotify.Write == 0 {
				continue
			}
			if err := c.Load(); err != nil {
				log.Printf("loading %q: %s", c.FileName, err)
			}
		}
	}()
	err = c.watcher.Add(c.FileName)
	if err != nil {
		c.watcher.Close()
		return err
	}
	return nil
}

func (c *CachedFile) Close() error {
	return c.watcher.Close()
}
[1] https://godoc.org/github.com/fsnotify/fsnotify
If you're modifying the file, you need a mutex. RWMutex should work fine. It looks like you're just reading it, and in that case you should not see any locking behavior or corruption.
The reason you didn't get any data the second time you read from the same file handle is that you're already at the end of the file when you start reading from it the second time. You need to seek back to offset 0 if you want to read the contents again.
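If you do go with the load-once approach, the fix is to rewind before reading again. A minimal sketch, assuming file is the *os.File you kept open from the first read (needs the "io" import for io.SeekStart):
// Rewind to the beginning; otherwise the next read starts at EOF and returns nothing.
if _, err := file.Seek(0, io.SeekStart); err != nil {
	log.Fatal(err)
}
contents, err := ioutil.ReadAll(file)
if err != nil {
	log.Fatal(err)
}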

How to efficiently store html response to a file in golang

I'm trying to build a crawler in Go. I'm using the net/http library to download the HTML file from a URL, and I'm trying to save the http.Response body and the http.Header into a file.
How do I convert these two from their respective formats into strings so that they can be written to a text file?
I also saw a question asked earlier about parsing a stored HTML response file: Parse HTTP requests and responses from text file in Go. Is there any way to save the URL response in this format?
Go has an httputil package with a response dump.
https://golang.org/pkg/net/http/httputil/#DumpResponse.
The second argument of DumpResponse is a bool controlling whether or not to include the body. So if you want to save just the header to a file, set it to false.
An example function that would dump the response to a file could be:
import (
	"io/ioutil"
	"net/http"
	"net/http/httputil"
)

func dumpResponse(resp *http.Response, filename string) error {
	dump, err := httputil.DumpResponse(resp, true)
	if err != nil {
		return err
	}
	return ioutil.WriteFile(filename, dump, 0644)
}
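If you only need the header, the same helper works with the body flag set to false; a hypothetical dumpHeader variant (not part of the original answer) would look like this:
func dumpHeader(resp *http.Response, filename string) error {
	// false: dump only the status line and headers, not the body.
	dump, err := httputil.DumpResponse(resp, false)
	if err != nil {
		return err
	}
	return ioutil.WriteFile(filename, dump, 0644)
}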
Edit: Thanks to @JimB for pointing to the http.Response.Write method, which makes this a lot easier than what I proposed in the beginning:
resp, err := http.Get("http://google.com/")
if err != nil {
	log.Panic(err)
}
defer resp.Body.Close()
f, err := os.Create("output.txt")
if err != nil {
	log.Panic(err)
}
defer f.Close()
resp.Write(f)
This was my first answer:
You could do something like this:
resp, err := http.Get("http://google.com/")
if err != nil {
	panic(err)
}
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
	panic(err)
}
// write the whole body
err = ioutil.WriteFile("body.txt", body, 0644)
if err != nil {
	panic(err)
}
This was the edit to my first answer:
Thanks to @Hector Correa, who added the header part. Here is a more comprehensive snippet targeting your whole question. It writes the headers followed by the body of the response to output.txt:
//get the response
resp, err := http.Get("http://google.com/")
//body
body, err := ioutil.ReadAll(resp.Body)
//header
var header string
for h, v := range resp.Header {
	for _, v := range v {
		header += fmt.Sprintf("%s %s \n", h, v)
	}
}
//append all to one slice
var write []byte
write = append(write, []byte(header)...)
write = append(write, body...)
//write it to a file
err = ioutil.WriteFile("output.txt", write, 0644)
if err != nil {
	panic(err)
}
Following on from the answer by @Riscie, you could also pick up the headers from the response with something like this:
for header, values := range resp.Header {
	for _, value := range values {
		log.Printf("\t\t %s %s", header, value)
	}
}
