Multipart form uploads + memory leaks in golang? - memory-management

The following server code:
package main
import (
	"fmt"
	"net/http"
)
func handler(w http.ResponseWriter, r *http.Request) {
	file, _, err := r.FormFile("file")
	if err != nil {
		fmt.Fprintln(w, err)
		return
	}
	defer file.Close()
	return
}
func main() {
	http.ListenAndServe(":8081", http.HandlerFunc(handler))
}
being run and then calling it with:
curl -i -F "file=@./large-file" --form hello=world http://localhost:8081/
Here large-file is about 80 MB. This seems to cause some form of memory leak in Go 1.4.2 on darwin/amd64 and linux/amd64.
When I hook up pprof, I see that bytes.makeSlice uses 96 MB of memory after calling the service a few times (it is eventually called by r.FormFile in my code above).
If I keep calling curl, the memory usage of the process grows slowly over time, eventually seeming to settle around 300 MB on my machine.
Thoughts? I assume this isn't expected and that I'm doing something wrong?

If the memory usage levels off at a "maximum", I wouldn't really call that a memory leak. I would rather say that the GC is being lazy rather than eager, or that it simply doesn't want to physically free memory that is frequently reallocated. If it were really a memory leak, the used memory wouldn't stop at 300 MB.
r.FormFile("file") will result in a call to Request.ParseMultipartForm(), and 32 MB will be used as the value of the maxMemory parameter (the value of the defaultMaxMemory constant defined in request.go). Since you upload a larger file (80 MB), a buffer of at least 32 MB will eventually be created (this is implemented in multipart.Reader.ReadForm()). Since a bytes.Buffer is used to read the content, the reading process will start with a small or empty buffer and reallocate whenever a bigger one is needed.
The strategy of buffer reallocations and the buffer sizes are implementation dependent (and also depend on the size of the chunks being read/decoded from the request), but just to have a rough picture, imagine it like this: 0 bytes, 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB, 16 MB, 64 MB. Again, this is just theoretical, but it illustrates that the sum can even grow beyond 100 MB just to read the first 32 MB of the file into memory, at which point it will be decided that the content will be moved to and stored in a file. See the implementation of multipart.Reader.ReadForm() for details. This reasonably explains the 96 MB allocation.
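The growth pattern can be observed directly. The following is a minimal sketch, not the code path used by net/http: it fills a bytes.Buffer in fixed-size chunks and adds up every capacity the buffer ever had. The helper name growthStats and the chunk size are assumptions made for illustration, and the exact numbers depend on the Go version.

```go
package main

import (
	"bytes"
	"fmt"
)

// growthStats writes total bytes into a bytes.Buffer in chunk-sized pieces.
// It returns how often the capacity changed, the sum of all capacities the
// buffer had along the way, and the final capacity.
func growthStats(total, chunk int) (reallocs, sumCaps, finalCap int) {
	var buf bytes.Buffer
	piece := make([]byte, chunk)
	lastCap := buf.Cap()
	for written := 0; written < total; written += chunk {
		buf.Write(piece)
		if c := buf.Cap(); c != lastCap {
			// A changed capacity means a new backing array was allocated.
			reallocs++
			sumCaps += c
			lastCap = c
		}
	}
	return reallocs, sumCaps, buf.Cap()
}

func main() {
	reallocs, sumCaps, finalCap := growthStats(32<<20, 4<<10)
	fmt.Printf("reallocations:            %d\n", reallocs)
	fmt.Printf("final capacity:           %d bytes\n", finalCap)
	fmt.Printf("sum of all capacities:    %d bytes\n", sumCaps)
}
```

The sum of all capacities is larger than the final capacity, because every earlier backing array stays garbage until the collector runs.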
Do this a couple of times, and without the GC releasing the allocated buffers immediately, you can easily end up with 300 MB. And if there is enough free memory, there is no pressure on the GC to hurry with releasing memory. The reason you see it growing relatively big is that large buffers are used in the background. If you did the same with a 1 MB file, you would probably not experience this.
If it is important to you, you can also call Request.ParseMultipartForm() manually with a smaller maxMemory value, e.g.
r.ParseMultipartForm(2 << 20) // 2 MB
file, _, err := r.FormFile("file")
// ... rest of your handler
Doing so, much smaller (and fewer) buffers will be allocated in the background.
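A fuller sketch of such a handler might look like the following. The constant maxFormMemory and the in-process helper upload are assumptions added for illustration. The handler also checks the error from ParseMultipartForm and removes any temporary files the parser created.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// maxFormMemory is an illustrative limit; tune it for your workload.
const maxFormMemory = 2 << 20 // 2 MB

// handler parses the form with a small in-memory limit. File data beyond
// that limit is stored in temporary files by the standard library.
func handler(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(maxFormMemory); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// Remove temporary files created while parsing, if any.
	defer r.MultipartForm.RemoveAll()

	file, _, err := r.FormFile("file")
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	defer file.Close()

	// Stream the upload instead of holding it in memory; here it is only counted.
	n, err := io.Copy(io.Discard, file)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(w, "%d", n)
}

// upload posts size zero bytes as the "file" field to handler in-process
// and returns the response body (the number of bytes the handler read).
func upload(size int) (string, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	part, err := mw.CreateFormFile("file", "large-file")
	if err != nil {
		return "", err
	}
	if _, err := part.Write(make([]byte, size)); err != nil {
		return "", err
	}
	if err := mw.Close(); err != nil {
		return "", err
	}
	req := httptest.NewRequest(http.MethodPost, "/", &body)
	req.Header.Set("Content-Type", mw.FormDataContentType())
	rec := httptest.NewRecorder()
	handler(rec, req)
	return rec.Body.String(), nil
}

func main() {
	// 3 MB is larger than maxFormMemory, so the file part spills to disk.
	got, err := upload(3 << 20)
	fmt.Println(got, err)
}
```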

Related

How to release memory allocated by a slice? [duplicate]

This question already has answers here:
How do you clear a slice in Go?
(3 answers)
Cannot free memory once occupied by bytes.Buffer
(2 answers)
Closed 5 years ago.
package main
import (
	"fmt"
	"time"
)
func main() {
	storage := []string{}
	for i := 0; i < 50000000; i++ {
		storage = append(storage, "string string string string string string string string string string string string")
	}
	fmt.Println("done allocating, emptying")
	storage = storage[:0]
	storage = nil
	for {
		time.Sleep(1 * time.Second)
	}
}
The code above will allocate about 30 MB of memory and then won't release it. Why is that? How can I force Go to release the memory used by this slice? I resliced the slice to zero length and then set it to nil.
The program I'm debugging is a simple HTTP input buffer: it appends all requests into large chunks and sends these chunks over a channel to a goroutine for processing. But the problem is as illustrated above: I can't get storage to release the memory, and the program eventually runs out of memory.
Edit: some people have pointed to a similar question. No: first, that approach doesn't work; second, it isn't what I'm asking about. The slice gets emptied, but the memory does not get released.
There are several things going on here.
The first thing to absorb is that Go is
a garbage-collected language; the actual algorithm of its GC
is mostly irrelevant but one aspect of it is crucial to understand:
it does not use reference counting, and hence there's no way to
somehow make the GC immediately reclaim the memory of any given
value whose storage is allocated on the heap.
To put it in simpler words, it's futile to do
s := make([]string, 10*100*100)
s = nil
as the second statement will indeed remove the sole reference
to the slice's underlying memory but won't make the GC go
and "mark" that memory as available for reuse.
This means two things:
You should know how the GC works.
This explains how it works
since v1.5 and up until now (v1.10 these days).
You should structure those of your algorithms which are
memory-intensive in a way that reduces memory pressure.
The latter can be done in several ways:
Preallocate, when you have a sensible idea about how much you will need.
In your example, you start with a slice of length 0,
and then append to it a lot. Now, almost all library code which deals
with growing memory buffers (the Go runtime included) deals with these
allocations by 1) allocating twice the memory requested, hoping to
prevent several future allocations, and 2) copying the "old" contents
over when it has to reallocate. This one is important: when reallocation
happens, it means there are two memory regions now: the old one and the new
one.
If you can estimate that you may need to hold N elements on
average, preallocate for them using make([]T, 0, N)—
more info here
and here.
If you'll need to hold less than N elements, the tail of that buffer
will be unused, and if you'll need to hold more than N, you'll need
to reallocate, but on average, you won't need any reallocations.
Re-use your slice(s). Say, in your case, you could "reset" the slice
by reslicing it to the zero length and then use it again for the next
request. This is called "pooling", and in the case of mass-parallel access
to such a pool, you could use sync.Pool to hold your buffers.
Limit the load on your system to make the GC be able to cope with
the sustained load. A good overview of the two approaches to such
limiting is this.
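The preallocation and reuse advice above can be sketched as follows. This is only an illustration under assumptions: expectedItems, chunkPool and processBatch are hypothetical names, and the "processing" merely counts elements.

```go
package main

import (
	"fmt"
	"sync"
)

// expectedItems is a hypothetical estimate of how many elements a batch holds.
const expectedItems = 1024

// chunkPool hands out preallocated slices so that concurrent callers can
// reuse backing arrays instead of allocating new ones.
var chunkPool = sync.Pool{
	New: func() interface{} {
		s := make([]string, 0, expectedItems)
		return &s
	},
}

// processBatch copies items into a pooled slice, "processes" it (here it
// only counts the elements), and returns the slice to the pool.
func processBatch(items []string) int {
	p := chunkPool.Get().(*[]string)
	chunk := append((*p)[:0], items...)
	n := len(chunk)
	// Drop the string references so the GC can reclaim the strings;
	// only the backing array is kept for reuse.
	for i := range chunk {
		chunk[i] = ""
	}
	*p = chunk[:0]
	chunkPool.Put(p)
	return n
}

func main() {
	storage := make([]string, 0, expectedItems) // preallocate once
	storage = append(storage, "a", "b", "c")
	storage = storage[:0] // "reset": length 0, capacity unchanged
	fmt.Println(len(storage), cap(storage)) // prints "0 1024"
	fmt.Println(processBatch([]string{"x", "y"})) // prints "2"
}
```

Note that reslicing to zero length keeps the old elements referenced by the backing array until they are overwritten, which is why the sketch clears them before putting the slice back.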
In the program you wrote, there is no point in releasing the memory, because no part of the code requests memory any more.
To make a valid test case, you have to request new memory and release it inside the loop. Then you will observe that the memory consumption stabilizes at some point.
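A minimal sketch of such a test case, assuming a hypothetical helper churn and an 8 MB chunk size: it repeatedly allocates a chunk and drops the previous one, then reports how much heap memory the runtime has obtained from the OS. The exact figures depend on the Go version and GC settings.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the most recent chunk reachable so the allocation is not
// optimized away; older chunks become garbage once it is overwritten.
var sink []byte

// churn allocates rounds chunks of size bytes, one after another, and
// returns the heap memory obtained from the OS (HeapSys) afterwards.
func churn(rounds, size int) uint64 {
	for i := 0; i < rounds; i++ {
		sink = make([]byte, size)
		sink[0] = byte(i)
	}
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapSys
}

func main() {
	const size = 8 << 20 // 8 MB per chunk
	for _, rounds := range []int{10, 100, 1000} {
		fmt.Printf("after %4d more rounds: HeapSys = %d MB\n",
			rounds, churn(rounds, size)>>20)
	}
}
```

If the collector reuses freed chunks, HeapSys should stay far below the total number of bytes ever allocated.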

Should the stats reported by Go's runtime.ReadMemStats approximately equal the resident memory set reported by ps aux?

In Go, should the "Sys" stat, or any other stat or combination of stats reported by runtime.ReadMemStats, approximately equal the resident set size (RSS) reported by ps aux?
Alternatively, assuming some memory may be swapped out, should the Sys stat be approximately greater than or equal to the RSS?
We have a long-running web service that deals with a high frequency of requests and we are finding that the RSS quickly climbs up to consume virtually all of the 64GB memory on our servers. When it hits ~85% we begin to experience considerable degradation in our response times and in how many concurrent requests we can handle. The run I've listed below is after about 20 hours of execution, and is already at 51% memory usage.
I'm trying to determine if the likely cause is a memory leak (we make some calls to CGO). The data seems to indicate that it is, but before I go down that rabbit hole I want to rule out a fundamental misunderstanding of the statistics I'm using to make that call.
This is an amd64 build targeting linux and executing on CentOS.
Reported by runtime.ReadMemStats:
Alloc: 1294777080 bytes (1234.80MB) // bytes allocated and not yet freed
Sys: 3686471104 bytes (3515.69MB) // bytes obtained from system (sum of XxxSys below)
HeapAlloc: 1294777080 bytes (1234.80MB) // bytes allocated and not yet freed (same as Alloc above)
HeapSys: 3104931840 bytes (2961.09MB) // bytes obtained from system
HeapIdle: 1672339456 bytes (1594.87MB) // bytes in idle spans
HeapInuse: 1432592384 bytes (1366.23MB) // bytes in non-idle span
Reported by ps aux:
%CPU %MEM VSZ RSS
1362 51.3 306936436 33742120
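For reference, figures like the ones above can be collected with a few lines of Go. This is only a sketch; the helpers mb and snapshot are illustrative names. One caveat that matters for the comparison: these counters describe memory managed by the Go runtime, so memory allocated by C code through cgo does not show up in them, although it does count toward the RSS.

```go
package main

import (
	"fmt"
	"runtime"
)

// mb converts a byte count to mebibytes for printing.
func mb(b uint64) float64 { return float64(b) / (1 << 20) }

// snapshot returns the current Go runtime memory statistics.
func snapshot() runtime.MemStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m
}

func main() {
	m := snapshot()
	fmt.Printf("Alloc:        %.2f MB\n", mb(m.Alloc))
	fmt.Printf("Sys:          %.2f MB\n", mb(m.Sys))
	fmt.Printf("HeapAlloc:    %.2f MB\n", mb(m.HeapAlloc))
	fmt.Printf("HeapSys:      %.2f MB\n", mb(m.HeapSys))
	fmt.Printf("HeapIdle:     %.2f MB\n", mb(m.HeapIdle))
	fmt.Printf("HeapInuse:    %.2f MB\n", mb(m.HeapInuse))
	// HeapReleased is the part of the idle heap already returned to the OS.
	fmt.Printf("HeapReleased: %.2f MB\n", mb(m.HeapReleased))
}
```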

How does memory allocation work for buffered channels

If I have a buffered channel like this:
ch := make(chan int, 1000000)
is 8MB of memory allocated off the bat, or does the memory allocation grow/shrink depending on the amount of data?
The full size of the buffer (plus I believe two words for the channel itself) will be allocated up front and retained until it is garbage collected.
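One way to check this empirically is to compare runtime.MemStats.TotalAlloc before and after creating the channel. The sketch below uses chan int64 so that each element is 8 bytes regardless of platform; allocDelta is an illustrative helper name.

```go
package main

import (
	"fmt"
	"runtime"
)

// allocDelta creates a buffered channel with room for n int64 values and
// returns how many heap bytes were allocated while doing so. The channel
// is returned as well so that it stays in use.
func allocDelta(n int) (uint64, chan int64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	ch := make(chan int64, n)
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, ch
}

func main() {
	delta, ch := allocDelta(1000000)
	fmt.Printf("allocated %d bytes for a channel with len=%d cap=%d\n",
		delta, len(ch), cap(ch))
}
```

The reported delta covers the whole buffer even though the channel is still empty, which matches the up-front allocation described above.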

Write operation cost

I have a Go program which writes strings into a file. I have a loop which iterates 20000 times, and in each iteration I write around 20-30 strings into the file. I would like to know the best way to write them.
Approach 1: Keep the file open from the start of the code and
write each string as it is produced. That makes 20000*30 write operations.
Approach 2: Use a bytes.Buffer, store everything in the buffer, and
write it at the end. Also, in this case, should the file be
opened at the beginning of the code or at the end? Does
it matter?
I am assuming approach 2 should work better. Can someone confirm this and give a reason? Why would writing everything at once be better than writing periodically, given that the file will be open anyway?
I am using f.WriteString(<string>) and buffer.WriteString(<some string>), where buffer is of type bytes.Buffer and f is the open file.
The bufio package was created exactly for this kind of task. Instead of making a syscall for each Write call, bufio.Writer buffers up to a fixed number of bytes in internal memory before making a syscall. After a syscall, the internal buffer is reused for the next portion of data.
Compared to your second approach, bufio.Writer
makes more syscalls (N/S instead of 1)
uses less memory (S bytes instead of N bytes)
where S is the buffer size (it can be specified via bufio.NewWriterSize) and N is the total size of the data that needs to be written.
Example usage (https://play.golang.org/p/AvBE1d6wpT):
f, err := os.Create("file.txt")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
w := bufio.NewWriter(f)
fmt.Fprint(w, "Hello, ")
fmt.Fprint(w, "world!")
err = w.Flush() // Don't forget to flush!
if err != nil {
	log.Fatal(err)
}
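To see the effect on the number of underlying writes without touching the disk, the file can be replaced by a writer that only counts calls. This is a sketch under that assumption: countingWriter stands in for the file, each of its Write calls stands in for one write syscall, and the names writeAll and compare are illustrative.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter counts Write calls; each call stands in for one write
// syscall on a real file.
type countingWriter struct {
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// writeAll writes s to w n times.
func writeAll(w io.Writer, s string, n int) error {
	for i := 0; i < n; i++ {
		if _, err := io.WriteString(w, s); err != nil {
			return err
		}
	}
	return nil
}

// compare returns the number of underlying Write calls made when writing
// s n times directly and through a bufio.Writer with the default size.
func compare(s string, n int) (direct, buffered int, err error) {
	var d countingWriter
	if err = writeAll(&d, s, n); err != nil {
		return 0, 0, err
	}
	var b countingWriter
	bw := bufio.NewWriter(&b)
	if err = writeAll(bw, s, n); err != nil {
		return 0, 0, err
	}
	if err = bw.Flush(); err != nil { // flush the remaining bytes
		return 0, 0, err
	}
	return d.calls, b.calls, nil
}

func main() {
	direct, buffered, err := compare("some string\n", 20000*30)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("direct writes: %d, buffered writes: %d\n", direct, buffered)
}
```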
The operations that take time when writing to files are the syscalls and the disk I/O. The fact that the file is open doesn't cost you anything. So naively, we could say that the second method is best.
Now, as you may know, your OS doesn't write directly into files; it uses an internal in-memory cache for written files and does the real I/O later. I don't know the exact details of that, and generally speaking I don't need to.
What I would advise is a middle-ground solution: fill a buffer in every loop iteration and write it out once per iteration, N times in total. That way you cut a big part of the syscalls and (potentially) disk writes, but without consuming too much memory for the buffer (depending on the size of your strings, that may be a point to take into account).
I would suggest benchmarking to find the best solution, but due to the caching done by the system, benchmarking disk I/O is a real nightmare.
Syscalls are not cheap, so the second approach is better.
You can use the lat_syscall tool from lmbench to measure how long a single write call takes:
$ ./lat_syscall write
Simple write: 0.1522 microseconds
So, on my system it will take approximately 20000 * 0.15μs = 3ms extra time just to call write for every string.

Go memory consumption with many goroutines

I was trying to check how Go performs with 100,000 goroutines. I wrote a simple program that spawns that many goroutines, which do nothing but print some announcements. I restricted the max stack size to just 512 bytes, but I noticed that the program's memory use doesn't decrease with that. It was consuming around 460 MB of memory, and hence around 4 KB per goroutine. My question is: can we set the max stack size lower than the "minimum" stack size (which may be 4 KB) for goroutines? How can we set the minimum stack size that a goroutine starts with?
Below is sample code I used for the test:
package main
import "fmt"
import "time"
import "runtime/debug"
func main() {
	fmt.Printf("%v\n", debug.SetMaxStack(512))
	var i int
	for i = 0; i < 100000; i++ {
		go func(x int) {
			for {
				time.Sleep(10 * time.Millisecond)
				//fmt.Printf("I am %v\n", x)
			}
		}(i)
	}
	fmt.Println("Done")
	time.Sleep(999999999999)
}
There's currently no way to set the minimum stack size for goroutines.
Go 1.2 increased the minimum size from 4KB to 8KB
The docs say:
"In Go 1.2, the minimum size of the stack when a goroutine is created has been lifted from 4KB to 8KB. Many programs were suffering performance problems with the old size, which had a tendency to introduce expensive stack-segment switching in performance-critical sections. The new number was determined by empirical testing."
But they go on to say:
"Updating: The increased minimum stack size may cause programs with many goroutines to use more memory. There is no workaround, but plans for future releases include new stack management technology that should address the problem better."
So you may have more luck in the future.
See http://golang.org/doc/go1.2#stack_size for more info.
The runtime/debug.SetMaxStack function only determines at what point Go considers a program infinitely recursive and terminates it. http://golang.org/pkg/runtime/debug/#SetMaxStack
Setting it absurdly low does nothing to the minimum size of stacks, and only limits the maximum size by virtue of your program crashing when any stack's in-use size exceeds the limit.
Technically the crash only happens when the stack must be grown, so your program will die when a stack needs more than 8KB (or 4KB prior to go 1.2).
The reason why your program uses a minimum of 4KB * nGoroutines is because stacks are page-aligned, so there can never be more than one stack on a VM page. Therefore your program will use at least nGoroutines worth of pages, and OSes usually only measure and allocate memory in page-sized increments.
The only way to change the starting (minimum) size of a stack is to modify and recompile the go runtime (and possibly the compiler too).
Go 1.3 will include contiguous stacks, which are generally faster than the split stacks in Go 1.2 and earlier, and which may also lead to smaller initial stacks in the future.
Just a note: in Go 1.4, the minimum size of the goroutine stack decreased from 8 KB to 2 KB.
And as of Go 1.13, it is still the same (https://github.com/golang/go/blob/bbd25d26c0a86660fb3968137f16e74837b7a9c6/src/runtime/stack.go#L72):
// The minimum size of stack used by Go code
_StackMin = 2048
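A rough way to observe the per-goroutine stack cost on a given Go version is to compare runtime.MemStats.StackInuse before and after parking a number of goroutines. This is only a sketch: stackPerGoroutine is an illustrative helper, and the result depends on the Go version, the platform and what each goroutine does.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackPerGoroutine starts n goroutines that block on a channel and returns
// the increase in stack memory in use, divided by n.
func stackPerGoroutine(n int) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	stop := make(chan struct{})
	var started sync.WaitGroup
	started.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			started.Done()
			<-stop // park until the measurement is done
		}()
	}
	started.Wait()

	runtime.ReadMemStats(&after)
	close(stop)

	if after.StackInuse <= before.StackInuse {
		return 0
	}
	return (after.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
	fmt.Printf("about %d bytes of stack per goroutine\n", stackPerGoroutine(100000))
}
```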

Resources