Measure upload speed when using http.ResponseBody - go

Is there a way to measure a client's download speed when uploading a large quantity of data using an http.ResponseWriter?
Update for context: I'm writing a streaming download endpoint for blob storage which stores blobs in chunks. The files are very large, so loading and buffering whole blobs is not feasible. Being able to monitor the buffer state, bytes written or similar would allow better scheduling of the chunk downloads.
E.g. when Write()ing to the response, is there a way to check how much data is already queued?
An example of the context, but not using a file object.
func downloadHandler(w http.ResponseWriter, req *http.Request, ps httprouter.Params) {
// Open some file.
f := os.Open("somefile.txt")
// Adjust the iteration speed of this loop to the client's download speed.
for
{
data := make([]byte, 1000)
count, err := f.Read(data)
if err != nil {
log.Fatal(err)
}
if count == 0 {
break
}
// Upload data chunk to client.
w.Write(data[:count])
}
}

You could implement a custom http.ResponseWriter that measures bytes sent, and calculates throughput.
There are likely packages to do similar things already. Google found this one (which I haven't used).

Related

Extending Golang's http.Resp.Body to handle large files

I have a client application which reads in the full body of a http response into a buffer and performs some processing on it:
body, _ = ioutil.ReadAll(containerObject.Resp.Body)
The problem is that this application runs on an embedded device, so responses that are too large fill up the device RAM, causing Ubuntu to kill the process.
To avoid this, I check the content-length header and bypass processing if the document is too large. However, some servers (I'm looking at you, Microsoft) send very large html responses without setting content-length and crash the device.
The only way I can see of getting around this is to read the response body up to a certain length. If it reaches this limit, then a new reader could be created which first streams the in-memory buffer, then continues reading from the original Resp.Body. Ideally, I would assign this new reader to the containerObject.Resp.Body so that callers would not know the difference.
I'm new to GoLang and am not sure how to go about coding this. Any suggestions or alternative solutions would be greatly appreciated.
Edit 1: The caller expects a Resp.Body object, so the solution needs to be compatible with that interface.
Edit 2: I cannot parse small chunks of the document. Either the entire document is processed or it is passed unchanged to the caller, without loading it into memory.
If you need to read part of the response body, then reconstruct it in place for other callers, you can use a combination of an io.MultiReader and ioutil.NopCloser
resp, err := http.Get("http://google.com")
if err != nil {
return err
}
defer resp.Body.Close()
part, err := ioutil.ReadAll(io.LimitReader(resp.Body, maxReadSize))
if err != nil {
return err
}
// do something with part
// recombine the buffered part of the body with the rest of the stream
resp.Body = ioutil.NopCloser(io.MultiReader(bytes.NewReader(part), resp.Body))
// do something with the full Response.Body as an io.Reader
If you can't defer resp.Body.Close() because you intend to return the response before it's read in its entirety, you will need to augment the replacement body so that the Close() method applies to the original body. Rather than using the ioutil.NopCloser as the io.ReadCloser, create your own that refers to the correct method calls.
type readCloser struct {
io.Closer
io.Reader
}
resp.Body = readCloser{
Closer: resp.Body,
Reader: io.MultiReader(bytes.NewReader(part), resp.Body),
}

How to save data streams in S3? aws-sdk-go example not working?

I am trying to persist a given stream of data to an S3 compatible storage.
The size is not known before the stream ends and can vary from 5MB to ~500GB.
I tried different possibilities but did not find a better solution than to implement sharding myself. My best guess is to make a buffer of a fixed size fill it with my stream and write it to the S3.
Is there a better solution? Maybe a way where this is transparent to me, without writing the whole stream to memory?
The aws-sdk-go readme has an example programm that takes data from stdin and writes it to S3: https://github.com/aws/aws-sdk-go#using-the-go-sdk
When I try to pipe data in with a pipe | I get the following error:
failed to upload object, SerializationError: failed to compute request body size
caused by: seek /dev/stdin: illegal seek
Am I doing something wrong or is the example not working as I expect it to?
I although tried minio-go, with PutObject() or client.PutObjectStreaming().
This is functional but consumes as much memory as the data to store.
Is there a better solution?
Is there a small example program that can pipe arbitrary data into S3?
You can use the sdk's Uploader to handle uploads of unknown size but you'll need to make the os.Stdin "unseekable" by wrapping it into an io.Reader. This is because the Uploader, while it requires only an io.Reader as the input body, under the hood it does a check to see whether the input body is also a Seeker and if it is, it does call Seek on it. And since os.Stdin is just an *os.File which implements the Seeker interface, by default, you would get the same error you got from PutObjectWithContext.
The Uploader also allows you to upload the data in chunks whose size you can configure and you can also configure how many of those chunks should be uploaded concurrently.
Here's a modified version of the linked example, stripped off of code that can remain unchanged.
package main
import (
// ...
"io"
"github.com/aws/aws-sdk-go/service/s3/s3manager"
)
type reader struct {
r io.Reader
}
func (r *reader) Read(p []byte) (int, error) {
return r.r.Read(p)
}
func main() {
// ... parse flags
sess := session.Must(session.NewSession())
uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
u.PartSize = 20 << 20 // 20MB
// ... more configuration
})
// ... context stuff
_, err := uploader.UploadWithContext(ctx, &s3manager.UploadInput{
Bucket: aws.String(bucket),
Key: aws.String(key),
Body: &reader{os.Stdin},
})
// ... handle error
}
As to whether this is a better solution than minio-go I do not know, you'll have to test that yourself.

Golang high cpu usage on simple webserver unable to understand why?

So I have a simple net/http webserver. All it does is is deliver 100MB of random bytes, which I intend to use for network speed testing. My handler for the 100mb endpoint is really simple (pasted below). The code works fine and I get my random byte file, the problem is when I run this and someone downloads these 100megabytes, the CPU for this program shoots up to 150% and stays there until this handler finishes running. Am I doing something very wrong here? What could I do to improve this handler's performance?
func downloadHandler(w http.ResponseWriter, r *http.Request) {
str := RandStringBytes(8192); //generates 8192 bytes of randomness
sz := 1000*1000*100; //100Megabytes
iter := sz/len(str)+1;
w.Header().Set("Content-Type", "application/octet-stream")
w.Header().Set("Content-Length", strconv.Itoa( sz ))
for i := 0; i < iter ; i++ {
fmt.Fprintf(w, str )
}
}
The problem is that fmt.Fprintf() expects a format string:
func Fprintf(w io.Writer, format string, a ...interface{}) (n int, err error)
And you pass it a big, 8 KB format string. The fmt package has to analyze the format string, it is not something that gets to the output as is. Most definately this is what is eating your CPU.
If the random string contains the special % sign, that even makes your case worse, as then fmt.Fprintf() might expect further arguments which you don't "deliver", so the fmt package also has to (will) include error messages in the output, such as:
fmt.Fprintf(os.Stdout, "aaa%bbb%d")
Output:
aaa%!b(MISSING)bb%!d(MISSING)
Use fmt.Fprint() instead which does not expect a format string:
fmt.Fprint(w, str)
Or even better, convert your random string to a byte slice once, and just keep writing that:
data := []byte(str)
for i := 0; i < iter; i++ {
if _, err := w.Write(data); err != nil {
// Handle error, e.g. return
}
}
Delivering large amount of data – you won't get a faster solution than writing a prepared byte slice in a loop (maybe slightly if you vary the size of the slice). If your solution is still "slow", that might be due to your RandStringBytes() function which we don't know anything about, or your output might be compressed (gzipped) if you use other handlers or some framework (which does use relatively high CPU). Also if the client that receives the response is also on your computer (e.g. a browser), it –or a firewall / antivirus software– may check / analyze the response for malicious code (which may also be resource intensive).

Most efficient way to read Zlib compressed file in Golang?

I'm reading in and at the same time parsing (decoding) a file in a custom format, which is compressed with zlib. My question is how can I efficiently uncompress and then parse the uncompressed content without growing the slice? I would like to parse it whilst reading it into a reusable buffer.
This is for a speed-sensitive application and so I'd like to read it in as efficiently as possible. Normally I would just ioutil.ReadAll and then loop again through the data to parse it. This time I'd like to parse it as it's read, without having to grow the buffer into which it is read, for maximum efficiency.
Basically I'm thinking that if I can find a buffer of the perfect size then I can read into this, parse it, and then write over the buffer again, then parse that, etc. The issue here is that the zlib reader appears to read an arbitrary number of bytes each time Read(b) is called; it does not fill the slice. Because of this I don't know what the perfect buffer size would be. I'm concerned that it might break up some of the data that I wrote into two chunks, making it difficult to parse because one say uint64 could be split from into two reads and therefore not occur in the same buffer read - or perhaps that can never happen and it's always read out in chunks of the same size as were originally written?
What is the optimal buffer size, or is there a way to calculate this?
If I have written data into the zlib writer with f.Write(b []byte) is it possible that this same data could be split into two reads when reading back the compressed data (meaning I will have to have a history during parsing), or will it always come back in the same read?
You can wrap your zlib reader in a bufio reader, then implement a specialized reader on top that will rebuild your chunks of data by reading from the bufio reader until a full chunk is read. Be aware that bufio.Read calls Read at most once on the underlying Reader, so you need to call ReadByte in a loop. bufio will however take care of the unpredictable size of data returned by the zlib reader for you.
If you do not want to implement a specialized reader, you can just go with a bufio reader and read as many bytes as needed with ReadByte() to fill a given data type. The optimal buffer size is at least the size of your largest data structure, up to whatever you can shove into memory.
If you read directly from the zlib reader, there is no guarantee that your data won't be split between two reads.
Another, maybe cleaner, solution is to implement a writer for your data, then use io.Copy(your_writer, zlib_reader).
OK, so I figured this out in the end using my own implementation of a reader.
Basically the struct looks like this:
type reader struct {
at int
n int
f io.ReadCloser
buf []byte
}
This can be attached to the zlib reader:
// Open file for reading
fi, err := os.Open(filename)
if err != nil {
return nil, err
}
defer fi.Close()
// Attach zlib reader
r := new(reader)
r.buf = make([]byte, 2048)
r.f, err = zlib.NewReader(fi)
if err != nil {
return nil, err
}
defer r.f.Close()
Then x number of bytes can be read straight out of the zlib reader using a function like this:
mydata := r.readx(10)
func (r *reader) readx(x int) []byte {
for r.n < x {
copy(r.buf, r.buf[r.at:r.at+r.n])
r.at = 0
m, err := r.f.Read(r.buf[r.n:])
if err != nil {
panic(err)
}
r.n += m
}
tmp := make([]byte, x)
copy(tmp, r.buf[r.at:r.at+x]) // must be copied to avoid memory leak
r.at += x
r.n -= x
return tmp
}
Note that I have no need to check for EOF because I my parser should stop itself at the right place.

What is go lang http.Request Body in term of computer science?

in http.Request type Body is closed when request is send by client. Why it need to be closed, why it can not be string, which you can read over and over?
This is called a stream. It's useful because it lets you handle data without having the whole set of data available in memory. It also lets you give the results of the operations you may do faster : you don't wait for the whole set to be computed.
As soon as you want to handle big data or worry about performances, you need streams.
It's also a convenient abstraction that lets you handle data one by one even when the whole set is available without having to handle an offset to iterate over the whole.
You can store the request stream as a string using the bytes and the io package:
func handler(w http.ResponseWriter, r *http.Request) {
var bodyAsString string
b := new(bytes.Buffer)
_, err := io.Copy(b, r)
if err == io.EOF {
bodyAsString = b.String()
}
}

Resources