Reading more than 4096 bytes per chunk with part.Read - go

I'm trying to process a multipart file upload in small chunks to avoid storing the entire file in memory. The following function seems to solve this; however, when passing a []byte as the destination for the part.Read() method, it reads the part in chunks of 4096 bytes instead of chunks of the destination size (len(buf)).
When opening a local file and Read()'ing it into a []byte of the same size, it uses the entire space available as expected. So I think it's something specific to the part reader, but I'm unable to find anything documenting a default or maximum read size for it.
For reference, the function is as follows:
func ReceiveFile(w http.ResponseWriter, r *http.Request) {
    reader, err := r.MultipartReader()
    if err != nil {
        panic(err)
    }
    if reader == nil {
        panic("Wrong media type")
    }
    buf := make([]byte, 16384)
    fmt.Println(len(buf))
    for {
        part, err := reader.NextPart()
        if err == io.EOF {
            break
        }
        if err != nil {
            panic(err)
        }
        var n int
        for {
            n, err = part.Read(buf)
            if err == io.EOF {
                break
            }
            if err != nil {
                panic(err)
            }
            fmt.Printf("Read %d bytes into buf\n", n)
            fmt.Println(len(buf))
        }
        n, err = part.Read(buf)
        fmt.Printf("Finally read %d bytes into buf\n", n)
        fmt.Println(len(buf))
    }
}

The part reader does not attempt to fill the caller's buffer, which is allowed by the io.Reader contract (internally the multipart reader buffers the request body with a bufio.Reader that uses a 4096-byte peek buffer, which is where the chunk size you are seeing comes from).
The best way to handle this depends on the requirements of the application.
If you want to slurp the part into memory, then use ioutil.ReadAll:
for {
    part, err := reader.NextPart()
    if err == io.EOF {
        break
    }
    if err != nil {
        // handle error
    }
    p, err := ioutil.ReadAll(part)
    if err != nil {
        // handle error
    }
    // p is []byte with the contents of the part
}
If you want to copy the part to the io.Writer w, then use io.Copy:
for {
    part, err := reader.NextPart()
    if err == io.EOF {
        break
    }
    if err != nil {
        // handle error
    }
    w := // open a writer for this part
    _, err = io.Copy(w, part)
    if err != nil {
        // handle error
    }
}
If you want to process fixed size chunks, then use io.ReadFull:
buf := make([]byte, chunkSize)
for {
    part, err := reader.NextPart()
    if err == io.EOF {
        break
    }
    if err != nil {
        // handle error
    }
    _, err = io.ReadFull(part, buf)
    if err != nil {
        // handle error
        // Note that ReadFull returns an error if it cannot fill buf
    }
    // process the next chunk in buf
    // (repeat the ReadFull in a loop to consume the rest of the part)
}
If the application data is structured in some way other than fixed-size chunks, then bufio.Scanner might be of help.
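For instance, here is a minimal sketch of scanning a part record by record (assuming the part contains newline-delimited records and that bufio is imported; the processing is just a placeholder):
scanner := bufio.NewScanner(part)
for scanner.Scan() {
    // scanner.Bytes() is one record, without the trailing newline
    fmt.Println(len(scanner.Bytes()))
}
if err := scanner.Err(); err != nil {
    // handle error
}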

Instead of changing the chunk size, why not use io.ReadFull?
https://golang.org/pkg/io/#ReadFull
It manages the looping logic for you, and if it cannot fill the buffer it simply returns an error.
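A minimal sketch of what that looks like, reusing part and the 16384-byte buf from the question; io.ReadFull reports a short final chunk via io.ErrUnexpectedEOF, with n still telling you how many bytes were read:
for {
    n, err := io.ReadFull(part, buf)
    if n > 0 {
        // process buf[:n]
        fmt.Printf("Read %d bytes into buf\n", n)
    }
    if err == io.EOF || err == io.ErrUnexpectedEOF {
        break // the part has been fully consumed
    }
    if err != nil {
        // handle error
    }
}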

Related

Go - correct usage of multipart Part.Read

I've been trying to use multipart.Part to help read very large file uploads (>20GB) from HTTP - so I've written the below code which seems to work nicely:
func ReceiveMultipartRoute(w http.ResponseWriter, r *http.Request) {
    mediatype, p, err := mime.ParseMediaType(r.Header.Get("Content-Type"))
    if err != nil {
        //...
    }
    if mediatype != "multipart/form-data" {
        //...
    }
    boundary := p["boundary"]
    reader := multipart.NewReader(r.Body, boundary)
    buffer := make([]byte, 8192)
    for {
        part, err := reader.NextPart()
        if err != nil {
            // ...
        }
        f, err := os.CreateTemp("", part.FileName())
        if err != nil {
            // ...
        }
        for {
            numBytesRead, err := part.Read(buffer)
            // People say not to read if there's an err, but then I miss the last chunk?
            f.Write(buffer[:numBytesRead])
            if err != nil {
                if err == io.EOF {
                    break
                } else {
                    // error, abort ...
                    return
                }
            }
        }
    }
}
However, in the innermost for loop, I found that I have to consume the bytes returned by part.Read before checking for EOF, because I would miss the last chunk if I checked first and broke out. Yet I see many other articles/posts where people check for errors/EOF and break without using the last read. Am I using multipart.Part.Read() wrongly/unsafely?
You are using multipart.Part properly.
multipart.Part is a particular implementation of io.Reader. Accordingly, you should follow the conventions and recommendations for io.Reader. Quoting from the documentation:
Callers should always process the n > 0 bytes returned before considering the error err. Doing so correctly handles I/O errors that happen after reading some bytes and also both of the allowed EOF behaviors.
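So the shape of your inner loop is the documented one. A minimal sketch of that convention, using the buffer and f from your code (process n > 0 first, then look at err):
for {
    n, err := part.Read(buffer)
    if n > 0 {
        // always consume the n bytes first ...
        if _, werr := f.Write(buffer[:n]); werr != nil {
            // handle the write error, abort
            return
        }
    }
    // ... and only then look at err
    if err == io.EOF {
        break // this part is done, move on to the next one
    }
    if err != nil {
        // handle the read error, abort
        return
    }
}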
Also note that in your example you are copying data from an io.Reader to an os.File. os.File implements the io.ReaderFrom interface, so you can use the File.ReadFrom() method to copy the data:
_, err := file.ReadFrom(part)
// non io.EOF
if err != nil {
    return fmt.Errorf("copy data: %w", err)
}
If you need to use a buffer, you can use the io.CopyBuffer() function. But note that you need to hide the destination's io.ReaderFrom implementation, otherwise the buffer will not be used to perform the copy. See examples: 1, 2, 3.
_, err := io.CopyBuffer(writeFunc(file.Write), part, buffer)
// non io.EOF
if err != nil {
    return fmt.Errorf("copy data: %w", err)
}

type writeFunc func([]byte) (int, error)

func (write writeFunc) Write(data []byte) (int, error) {
    return write(data)
}

Copy file from remote to byte[]

I'm trying to figure out how to copy a file from a remote host over SCP and get its contents back as a []byte.
I have succeeded in implementing the upload side by referring to this guide: https://chuacw.ath.cx/development/b/chuacw/archive/2019/02/04/how-the-scp-protocol-works.aspx
Inside the go func is the implementation of the SCP upload process, but I have no idea how to change it for downloading.
Any advice?
func download(con *ssh.Client, buf bytes.Buffer, path string) ([]byte, error) {
    // https://chuacw.ath.cx/development/b/chuacw/archive/2019/02/04/how-the-scp-protocol-works.aspx
    session, err := con.NewSession()
    if err != nil {
        return nil, err
    }
    buf.WriteString("sudo scp -f " + path + "\n")
    stdin, err := session.StdinPipe()
    if err != nil {
        return nil, err
    }
    go func() {
        defer stdin.Close()
        fmt.Fprint(stdin, "C0660 "+strconv.Itoa(len(content))+" file\n")
        stdin.Write(content)
        fmt.Fprint(stdin, "\x00")
    }()
    output, err := session.CombinedOutput("sudo scp -f " + path)
    buf.Write(output)
    if err != nil {
        return nil, &DeployError{
            Err:    err,
            Output: buf.String(),
        }
    }
    session.Close()
    session, err = con.NewSession()
    if err != nil {
        return nil, err
    }
    defer session.Close()
    return output, nil
}
The sink side is significantly more difficult than the source side. I made an example which should get you close to what you want. Note that I have not tested this code, that the error handling is suboptimal, and that it only supports a quarter of the protocol messages SCP may use, so you will still need to do some work to get it perfect.
With all that said, this is what I came up with:
func download(con *ssh.Client, path string) ([]byte, error) {
    // https://chuacw.ath.cx/development/b/chuacw/archive/2019/02/04/how-the-scp-protocol-works.aspx
    session, err := con.NewSession()
    if err != nil {
        return nil, err
    }
    defer session.Close()

    // Local -> remote
    stdin, err := session.StdinPipe()
    if err != nil {
        return nil, err
    }
    defer stdin.Close()

    // Request a file, note that directories will require different handling
    _, err = stdin.Write([]byte("sudo scp -f " + path + "\n"))
    if err != nil {
        return nil, err
    }

    // Remote -> local
    stdout, err := session.StdoutPipe()
    if err != nil {
        return nil, err
    }

    // Make a buffer for the protocol messages
    const megabyte = 1 << 20
    b := make([]byte, megabyte)

    // Offset into the buffer
    off := 0
    var filesize int64

    // SCP may send multiple protocol messages, so keep reading
    for {
        n, err := stdout.Read(b[off:])
        if err != nil {
            return nil, err
        }
        nl := bytes.Index(b[:off+n], []byte("\n"))
        // If there is no newline in the buffer, we need to read more
        if nl == -1 {
            off = off + n
            continue
        }
        // We read a full message, reset the offset
        off = 0
        // If we did get a newline, we have the full protocol message
        msg := string(b[:nl])
        // Send back a single zero byte, which means OK; the SCP source will not send the next message otherwise
        _, err = stdin.Write([]byte{0})
        if err != nil {
            return nil, err
        }
        // First char is the mode (C=file, D=dir, E=end of dir, T=time metadata)
        mode := msg[0]
        if mode != 'C' {
            // Ignore other messages for now.
            continue
        }
        // File message = Cmmmm <length> <filename>
        msgParts := strings.Split(msg, " ")
        if len(msgParts) > 1 {
            // Parse the second part <length> as a base-10 integer
            filesize, err = strconv.ParseInt(msgParts[1], 10, 64)
            if err != nil {
                return nil, err
            }
        }
        // The file message will be followed with binary data containing the file
        break
    }

    // Wrap the stdout reader in a limit reader so we will not read more than the filesize
    fileReader := io.LimitReader(stdout, filesize)

    // Reuse the existing byte slice (length 0, capacity 1 MB) as the buffer's backing array,
    // which saves an additional allocation if the file is <= 1 MB
    buf := bytes.NewBuffer(b[:0])

    // Copy the file into the bytes buffer
    _, err = io.Copy(buf, fileReader)
    return buf.Bytes(), err
}
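For completeness, a hypothetical call site; the host, credentials, and remote path are placeholders and not part of the answer, and it assumes the golang.org/x/crypto/ssh, log, and fmt imports:
config := &ssh.ClientConfig{
    User:            "deploy",
    Auth:            []ssh.AuthMethod{ssh.Password("secret")},
    HostKeyCallback: ssh.InsecureIgnoreHostKey(), // never do this in production
}
con, err := ssh.Dial("tcp", "example.com:22", config)
if err != nil {
    log.Fatal(err)
}
defer con.Close()

data, err := download(con, "/var/log/syslog")
if err != nil {
    log.Fatal(err)
}
fmt.Printf("downloaded %d bytes\n", len(data))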

Saving html page content (buffer) to .log file

I am trying to write a buffer into my .log file to log what the buffer contains.
When I try a string in my logger, it works fine.
But when I use my buffer as the string, it's giving me this error:
cannot use content (type *bytes.Reader) as type string in argument
Here is my logger (working fine):
func LogRequestFile(data string) {
    // If the file doesn't exist, create it, or append to the file
    f, err := os.OpenFile("loggies.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        log.Fatal(err)
    }
    if _, err := f.Write([]byte(data)); err != nil {
        f.Close() // ignore error; Write error takes precedence
        log.Fatal(err)
    }
    if err := f.Close(); err != nil {
        log.Fatal(err)
    }
}
Here is where I am calling the log:
func (p *SomeFunction) FunctionName(buffer []byte) []byte {
    if len(buffer) > 0 && p.Payload != "" {
        buffer = bytes.Replace(buffer, []byte("</body>"), []byte("<jamming>"+p.Payload), 1)
    }
    var content = bytes.NewReader(buffer)
    LogRequestFile(content)
    return buffer
}
The buffer creation is shown in a screenshot in the original post.
Once again, I am wanting to get the content of the page and save it inside a .log file.
As you see:
buffer = bytes.Replace(buffer, []byte("</body>"), []byte("<jamming>"+p.Payload), 1)
The above code works to replace a section of the html page.
I am struggling to convert/write the whole page content (buffer) into my .log file.
Okay, so it appears it was my eyes being stupid.
I changed it to this and now it works:
func (p *SomeFunction) FunctionName(buffer []byte) []byte {
    if len(buffer) > 0 && p.Payload != "" {
        log.Debugf(" -- Injecting JS [%s] \n", p.Payload)
        buffer = bytes.Replace(buffer, []byte("</body>"), []byte("<script src='https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js'></script><script>"+p.Payload+"</script></body>"), 1)
        buffer = bytes.Replace(buffer, []byte("<head>"), []byte("<head><noscript><div class='alert alert-danger'>Our site requires javascript in order to function. Please enabled it and refresh the page.</div></noscript>"), 1)
    }
    LogRequestFile(buffer)
    return buffer
}

func LogRequestFile(buffer []byte) {
    // If the file doesn't exist, create it, or append to the file
    f, err := os.OpenFile("loggies.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        log.Fatal(err)
    }
    if _, err := f.Write(buffer); err != nil {
        f.Close() // ignore error; Write error takes precedence
        log.Fatal(err)
    }
    if err := f.Close(); err != nil {
        log.Fatal(err)
    }
}

Trouble getting content type of file in Go

I have a function in which I take in a base64 string and get the content of it (PDF or JPEG).
I read in the base64 content, convert it to bytes and decode it into the file that it is.
I then create a file where I will output the decoded file (JPEG or PDF).
Then I write the bytes to it.
Then I call my GetFileContentType on it and it returns to me an empty string.
If I run the functions separately, as in I first run the first function to create the decoded file and let it return, and then call the second function to get the content type, it works and returns JPEG or PDF.
What am I doing wrong here?
And is there a better way to do this?
func ConvertToJPEGBase64(
    src string,
    dst string,
) error {
    b, err := ioutil.ReadFile(src)
    if err != nil {
        return err
    }
    str := string(b)
    byteArray, err := base64.StdEncoding.DecodeString(str)
    if err != nil {
        return err
    }
    f, err := os.Create(dst)
    if err != nil {
        return err
    }
    if _, err := f.Write(byteArray); err != nil {
        return err
    }
    f.Sync()
    filetype, err := client.GetFileContentType(f)
    if err != nil {
        return err
    }
    if strings.Contains(filetype, "jpeg") {
        // do something
    } else {
        // do something else
    }
    return nil
}

// GetFileContentType tells us the type of file
func GetFileContentType(out *os.File) (string, error) {
    // Only the first 512 bytes are used to sniff the content type.
    buffer := make([]byte, 512)
    _, err := out.Read(buffer)
    if err != nil {
        return "", err
    }
    contentType := http.DetectContentType(buffer)
    return contentType, nil
}
The problem is that GetFileContentType reads at the current file offset, which is at the end of the file after the write. Fix this by seeking back to the beginning of the file before calling GetFileContentType:
if _, err := f.Seek(0, io.SeekStart); err != nil {
    return err
}
A better fix is to use the file data that's already in memory. This simplifies the code to the point where there's no need for the GetFileContentType function.
func ConvertToJPEGBase64(
    src string,
    dst string,
) error {
    b, err := ioutil.ReadFile(src)
    if err != nil {
        return err
    }
    str := string(b)
    byteArray, err := base64.StdEncoding.DecodeString(str)
    if err != nil {
        return err
    }
    f, err := os.Create(dst)
    if err != nil {
        return err
    }
    defer f.Close() // <-- Close the file on return.
    if _, err := f.Write(byteArray); err != nil {
        return err
    }
    fileType := http.DetectContentType(byteArray) // <-- use data in memory
    if strings.Contains(fileType, "jpeg") {
        // do something
    } else {
        // do something else
    }
    return nil
}
More code can be eliminated by using ioutil.WriteFile:
func ConvertToJPEGBase64(src, dst string) error {
    b, err := ioutil.ReadFile(src)
    if err != nil {
        return err
    }
    byteArray, err := base64.StdEncoding.DecodeString(string(b))
    if err != nil {
        return err
    }
    if err := ioutil.WriteFile(dst, byteArray, 0666); err != nil {
        return err
    }
    fileType := http.DetectContentType(byteArray)
    if strings.Contains(fileType, "jpeg") {
        // do something
    } else {
        // do something else
    }
    return nil
}

Read exactly n bytes unless EOF?

I'm using a function that returns an io.Reader to download a file from the Internet.
I want to process the file in chunks of exactly 2048 bytes until that is no longer possible because of EOF.
The io.ReadFull function is almost what I want:
buf := make([]byte, 2048)
for {
    if _, err := io.ReadFull(reader, buf); err == io.EOF {
        return io.ErrUnexpectedEOF
    } else if err != nil {
        return err
    }
    // Do processing on buf
}
The problem with this is that not all files are a multiple of 2048 bytes in size, so the last chunk may be only e.g. 500 bytes; io.ReadFull will therefore return ErrUnexpectedEOF and the last chunk is discarded.
A function name summarizing what I want could be io.ReadFullUnlessLastChunk: ErrUnexpectedEOF should not be returned if the reason buf cannot be filled with 2048 bytes is that the file hits EOF after e.g. 500 bytes, but in any other case ErrUnexpectedEOF should be returned because a problem has occurred.
What could I do to accomplish this?
Another problem is that reading only 2048 bytes at a time directly from the network seems to involve a lot of overhead; if I could read 256 KB from the network into a buffer and then take the 2048-byte chunks I need from that buffer instead, that would be better.
For example,
package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
)

func readChunks(r io.Reader) error {
    if _, ok := r.(*bufio.Reader); !ok {
        r = bufio.NewReader(r)
    }
    buf := make([]byte, 0, 2048)
    for {
        n, err := io.ReadFull(r, buf[:cap(buf)])
        buf = buf[:n]
        if err != nil {
            if err == io.EOF {
                break
            }
            if err != io.ErrUnexpectedEOF {
                return err
            }
        }
        // Process buf
        fmt.Println(len(buf))
    }
    return nil
}

func main() {
    fName := `test.file`
    f, err := os.Open(fName)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer f.Close()
    err = readChunks(f)
    if err != nil {
        fmt.Println(err)
        return
    }
}
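One note on the 256 KB concern from the question: readChunks above falls back to bufio.NewReader, which uses the default 4096-byte buffer. If you want larger reads from the network, bufio.NewReaderSize lets you pick the buffer size (the 256 KB figure below is taken from the question); it returns the reader unchanged if it is already a bufio.Reader with a buffer at least that large:
const bufSize = 256 << 10 // 256 KB
r = bufio.NewReaderSize(r, bufSize)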
