Golang reading from a file - is it safe from locking?

I have a function that will be called on every single HTTP GET request. The function reads a file, does some stuff to the contents of that file, and returns a slice of bytes of those contents. That slice of bytes is then written as the response body to the HTTP response writer.
Do I need to use a mutex for any of the steps in this function to prevent locking in the event of multiple HTTP requests trying to read the same file? And if so, would a simple RWMutex locking the reading of the file suffice, since I am not actually writing to it but am creating a copy of its contents?
Here is the function:
// prepareIndex will grab index.html and add a nonce to the script tags for CSP header compliance.
func prepareIndex(nonce string) []byte {
	// Load index.html.
	file, err := os.Open("./client/dist/index.html")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	// Convert to a goquery document.
	doc, err := goquery.NewDocumentFromReader(file)
	if err != nil {
		fmt.Println(err)
	}
	// Find all script tags and set the nonce.
	doc.Find("body > script").SetAttr("nonce", nonce)
	// Grab the HTML string.
	html, err := doc.Html()
	if err != nil {
		fmt.Println(err)
	}
	return []byte(html)
}
I also thought about just loading the file once when main starts, but I was having a problem where only the first request could see the data and the subsequent requests saw nothing. Probably an error in the way I was reading the file. But I actually prefer my current approach because if there are any changes to index.html, I want them to be visible to users immediately without having to restart the executable.

Using RWMutex won't protect you from the file being modified by another program. The best option here would be to load your file into a []byte at startup, and wrap it in a bytes.Buffer whenever you call goquery.NewDocumentFromReader. In order for changes to be propagated to the user, you can use fsnotify [1] to detect changes to your file and update your cached copy (the []byte) when necessary (you will need the RWMutex for that operation).
For example:
type CachedFile struct {
	sync.RWMutex
	FileName string
	Content  []byte
	watcher  *fsnotify.Watcher
}

func (c *CachedFile) Buffer() *bytes.Buffer {
	c.RLock()
	defer c.RUnlock()
	return bytes.NewBuffer(c.Content)
}

func (c *CachedFile) Load() error {
	c.Lock()
	defer c.Unlock()
	content, err := ioutil.ReadFile(c.FileName)
	if err != nil {
		return err
	}
	c.Content = content
	return nil
}

func (c *CachedFile) Watch() error {
	var err error
	c.watcher, err = fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	go func() {
		for ev := range c.watcher.Events {
			// Op is a bitmask; only reload on writes.
			if ev.Op&fsnotify.Write == 0 {
				continue
			}
			if err := c.Load(); err != nil {
				log.Printf("loading %q: %s", c.FileName, err)
			}
		}
	}()
	err = c.watcher.Add(c.FileName)
	if err != nil {
		c.watcher.Close()
		return err
	}
	return nil
}

func (c *CachedFile) Close() error {
	return c.watcher.Close()
}
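For completeness, here is a minimal usage sketch of the above (hypothetical wiring, reusing the index.html path from the question; error handling kept short):
// At startup: cache index.html and watch it for changes.
cached := &CachedFile{FileName: "./client/dist/index.html"}
if err := cached.Load(); err != nil {
	log.Fatal(err)
}
if err := cached.Watch(); err != nil {
	log.Fatal(err)
}
defer cached.Close()

// In prepareIndex, parse from the cached copy instead of opening the file each time:
doc, err := goquery.NewDocumentFromReader(cached.Buffer())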
[1] https://godoc.org/github.com/fsnotify/fsnotify

If you're modifying the file, you need a mutex. RWMutex should work fine. It looks like you're just reading it, and in that case you should not see any locking behavior or corruption.
The reason you didn't get any data the second time you read from the same file handle is that you're already at the end of the file when you start reading from it the second time. You need to seek back to offset 0 if you want to read the contents again.
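For example, if you do keep a single *os.File open and read it on every request, a minimal sketch of rewinding it first (assuming the file handle from the question) would be:
// Seek back to offset 0 before reading the same handle again.
if _, err := file.Seek(0, io.SeekStart); err != nil {
	log.Println(err)
}
doc, err := goquery.NewDocumentFromReader(file)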

Related

Go - correct usage of multipart Part.Read

I've been trying to use multipart.Part to help read very large file uploads (>20GB) from HTTP - so I've written the below code which seems to work nicely:
func ReceiveMultipartRoute(w http.ResponseWriter, r *http.Request) {
	mediatype, p, err := mime.ParseMediaType(r.Header.Get("Content-Type"))
	if err != nil {
		//...
	}
	if mediatype != "multipart/form-data" {
		//...
	}
	boundary := p["boundary"]
	reader := multipart.NewReader(r.Body, boundary)
	buffer := make([]byte, 8192)
	for {
		part, err := reader.NextPart()
		if err != nil {
			// ...
		}
		f, err := os.CreateTemp("", part.FileName())
		if err != nil {
			// ...
		}
		for {
			numBytesRead, err := part.Read(buffer)
			// People say not to read if there's an err, but then I miss the last chunk?
			f.Write(buffer[:numBytesRead])
			if err != nil {
				if err == io.EOF {
					break
				} else {
					// error, abort ...
					return
				}
			}
		}
	}
}
However, in the innermost for loop, I found out that I have to process the bytes from part.Read before even checking for EOF, since I miss the last chunk if I check the error first and break. However, I notice many other articles/posts where people check for errors/EOF and break without using the last read. Am I using multipart.Part.Read() correctly and safely?
You are using multipart.Part properly.
multipart.Part is a particular implementation of io.Reader, so you should be guided by the conventions and recommendations for io.Reader. Quoting from the documentation:
Callers should always process the n > 0 bytes returned before considering the error err. Doing so correctly handles I/O errors that happen after reading some bytes and also both of the allowed EOF behaviors.
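In code, that convention applied to the inner loop from the question looks roughly like this (a sketch reusing the part, buffer, and f variables from above):
for {
	n, err := part.Read(buffer)
	// Process the n > 0 bytes first, then look at the error.
	if n > 0 {
		if _, werr := f.Write(buffer[:n]); werr != nil {
			// error, abort ...
			return
		}
	}
	if err == io.EOF {
		break
	}
	if err != nil {
		// error, abort ...
		return
	}
}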
Also note that in the example you are copying data from an io.Reader to an os.File. os.File implements the io.ReaderFrom interface, so you can use the File.ReadFrom() method to copy the data.
_, err := file.ReadFrom(part)
// err will not be io.EOF here; ReadFrom reads until EOF and reports it as a nil error.
if err != nil {
	return fmt.Errorf("copy data: %w", err)
}
If you need to use a buffer, you can use the io.CopyBuffer() function. But note that you need to hide the destination's io.ReaderFrom implementation, otherwise the buffer will not be used to perform the copy; that is what the writeFunc wrapper below is for. See examples: 1, 2, 3.
_, err := io.CopyBuffer(writeFunc(file.Write), part, buffer)
// err will not be io.EOF here either.
if err != nil {
	return fmt.Errorf("copy data: %w", err)
}

type writeFunc func([]byte) (int, error)

func (write writeFunc) Write(data []byte) (int, error) {
	return write(data)
}

Golang return file on the fly

I'm using the Echo framework and have a handler that returns a file on request, and it's working well:
func IniExport(c echo.Context) error {
	cfg := ini.Empty()
	if _, err := cfg.NewSection("test_section"); err != nil {
		return c.JSON(http.StatusInternalServerError, "Problems with generation of export file.")
	}
	if _, err := cfg.Section("test_section").NewKey("name", "value"); err != nil {
		return c.JSON(http.StatusInternalServerError, "Problems with generation of export file.")
	}
	cfg.SaveTo("export.ini")
	defer os.Remove("export.ini")
	return c.Attachment("export.ini", "export.ini")
}
But my question is: is it possible to avoid creating the physical export.ini file and then having to remove it? Is it possible to return the content on the fly somehow?
Thanks
I think you need Send Blob.
Send Blob: Context#Blob(code int, contentType string, b []byte) can be used to send an arbitrary data response with the provided content type and status code.
Example
func(c echo.Context) (err error) {
	data := []byte(`0306703,0035866,NO_ACTION,06/19/2006
0086003,"0005866",UPDATED,06/19/2006`)
	return c.Blob(http.StatusOK, "text/csv", data)
}
You can use the WriteTo function to write the cfg content to an io.Writer first, and then use those contents instead of data in the previous code example (also make sure to change the content type to text/plain).
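Putting the two parts together, a minimal sketch of the handler without a temporary file could look like this (assuming the same go-ini and Echo APIs as in the question; error responses abbreviated):
func IniExport(c echo.Context) error {
	cfg := ini.Empty()
	if _, err := cfg.NewSection("test_section"); err != nil {
		return c.JSON(http.StatusInternalServerError, "Problems with generation of export file.")
	}
	if _, err := cfg.Section("test_section").NewKey("name", "value"); err != nil {
		return c.JSON(http.StatusInternalServerError, "Problems with generation of export file.")
	}
	// Write the INI content into an in-memory buffer instead of a file on disk.
	var buf bytes.Buffer
	if _, err := cfg.WriteTo(&buf); err != nil {
		return c.JSON(http.StatusInternalServerError, "Problems with generation of export file.")
	}
	return c.Blob(http.StatusOK, "text/plain", buf.Bytes())
}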

Uploading file to Google Storage without saving it to memory

I want to upload files from the frontend directly through the backend into a Google Storage bucket, without saving them entirely in memory on the server first. I've added an endpoint similar to the example from the Google docs and it works. However, I'm not sure whether this saves the entire file to memory first, since that could lead to issues when uploading larger files.
If it does save the file to memory first, how could I change the code so that it streams the upload directly to Google Storage? The answers to similar questions didn't clarify this for me.
Thank you
func Upload(c *gin.Context) {
	file, _, _ := c.Request.FormFile("image")

	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		fmt.Printf("Failed to create client with error: %v", err)
		return
	}
	bucket := client.Bucket("test-bucket")

	w := bucket.Object("testfile").NewWriter(ctx)
	w.ContentType = "image/jpeg"
	io.Copy(w, file)
	w.Close()
}
As noted in a comment on the question and in the answer by Peter, use the multipart reader directly to read the request body.
func Upload(c *gin.Context) {
	mr, err := c.Request.MultipartReader()
	if err != nil {
		// handle error
		return
	}
	var foundImage bool
	for {
		p, err := mr.NextPart()
		if err == io.EOF {
			break
		}
		if err != nil {
			// handle error
			return
		}
		if p.FormName() == "image" {
			foundImage = true
			ctx := context.Background()
			client, err := storage.NewClient(ctx)
			if err != nil {
				// handle error
				return
			}
			bucket := client.Bucket("test-bucket")
			w := bucket.Object("testfile").NewWriter(ctx)
			w.ContentType = "image/jpeg"
			if _, err := io.Copy(w, p); err != nil {
				// handle error
				return
			}
			if err := w.Close(); err != nil {
				// handle error
				return
			}
		}
	}
	if !foundImage {
		// handle error
	}
}
Replace the // handle error comments with code that responds to the client with an appropriate error status. It may be useful to log some of the errors as well.
FormFile returns the first file for the provided form key. FormFile calls ParseMultipartForm and ParseForm if necessary.
https://golang.org/pkg/net/http/#Request.FormFile
ParseMultipartForm parses a request body as multipart/form-data. The whole request body is parsed and up to a total of maxMemory bytes of its file parts are stored in memory, with the remainder stored on disk in temporary files.
https://golang.org/pkg/net/http/#Request.ParseMultipartForm
At the time of writing this, FormFile passes 32 MB as the maxMemory argument.
So with this code you will need up to 32 MB of memory per request, plus googleapi.DefaultUploadChunkSize (currently 8 MB), as well as some amount of disk space for everything that doesn't fit in memory.
So uploading will not start until the whole file has been read, but not all of it is kept in memory. If that's not what you want, use Request.MultipartReader instead of ParseMultipartForm:
MultipartReader returns a MIME multipart reader if this is a multipart/form-data or a multipart/mixed POST request, else returns nil and an error. Use this function instead of ParseMultipartForm to process the request body as a stream.

goroutine deadlock: In an app that reads from a blockchain and writes to rethinkdb, have

Okay, so
My situation is this: It's been three weeks and some-odd hours since I've become entranced by golang. I'm working on a blockchain dump tool for steem, and I'm probably going to give a touch of gjson to github.com/go-steem/rpc, the library I currently rely on. Now, with this said, this question is about the goroutines for my current blockchain reader. Here it is (sorry a tad on the beefy side, but you'll see the part that I want to pull back into the library, too):
	// Keep processing incoming blocks forever.
	fmt.Println("---> Entering the block processing loop")
	for {
		// Get current properties.
		props, err := Client.Database.GetDynamicGlobalProperties()
		if err != nil {
			fmt.Println(err)
		}
		// Process blocks.
		for I := uint32(1); I <= props.LastIrreversibleBlockNum; I++ {
			go getblock(I, Client, Rsession)
		}
		if err != nil {
			fmt.Println(err)
		}
	}
}

func getblock(I uint32, Client *rpc.Client, Rsession *r.Session) {
	block, err := Client.Database.GetBlock(I)
	fmt.Println(I)
	writeBlock(block, Rsession)
	if err != nil {
		fmt.Println(err)
	}
}

func writeBlock(block *d.Block, Rsession *r.Session) {
	// rethinkdb writes
	r.Table("transactions").
		Insert(block.Transactions).
		Exec(Rsession)
	r.Table("blocks").
		Insert(block).
		Exec(Rsession)
}
I just made a third edit to this, which was to call the function writeBlock from the goroutine getblock instead of the way I was doing things before.
Okay, so that is now resolved, but this is going to spawn another question, unfortunately.
I've got the application working with the goroutine, however it hasn't increased performance any.
The way that I got it to work was by not spawning a goroutine from a goroutine, and instead calling a plain function, writeBlock, from the goroutine getblock:
fmt.Println("---> Entering the block processing loop")
for {
// Get current properties.
props, err := Client.Database.GetDynamicGlobalProperties()
if err != nil {
fmt.Println(err)
}
// Process blocks.
for I := uint32(1); I <= props.LastIrreversibleBlockNum; I++ {
go getblock(I, Client, Rsession)
}
if err != nil {
fmt.Println(err)
}
}
}
func getblock(I uint32, Client *rpc.Client, Rsession *r.Session) {
block, err := Client.Database.GetBlock(I)
fmt.Println(I)
writeBlock(block, Rsession)
if err != nil {
fmt.Println(err)
}
}
func writeBlock(block *d.Block, Rsession *r.Session) {
//rethinkdb writes
r.Table("transactions").
Insert(block.Transactions).
Exec(Rsession)
r.Table("blocks").
Insert(block).
Exec(Rsession)
}

Using defer with pointers

Let us say that I have the following code:
func getConnection(fileName string) *os.File {
	file, err := os.Open(fileName)
	// Check for error
	return file
}
I use this function to open a file and the function is called from another function that does some other activity.
My question is: now that I have opened the file, how do I close it? If I were to add defer file.Close() inside getConnection(), wouldn't it close the file before returning? Does it make sense to use defer inside the calling function?
If the purpose of your function is to return a file, why would you want to close it in the function that returns it?
In this case it is the responsibility of the caller to properly close the file, preferably with defer:
func usingGetConnection() {
	f := getConnection("file.txt")
	defer f.Close()
	// Use f here
}
Note that your getConnection() function swallows errors; you should use multiple return values to report problems like this:
func getConnection(fileName string) (*os.File, error) {
	file, err := os.Open(fileName)
	// Check for error
	if err != nil {
		return nil, err
	}
	return file, nil
}
And using it:
func usingGetConnection() {
	f, err := getConnection("file.txt")
	if err != nil {
		panic(err) // Handle err somehow
	}
	defer f.Close()
	// Use f here
}
