How to analyze golang memory?

I wrote a Go program that uses 1.2 GB of memory at runtime.
Calling go tool pprof http://10.10.58.118:8601/debug/pprof/heap results in a dump with only 323.4 MB of heap usage.
What about the rest of the memory usage?
Is there any better tool to explain Go runtime memory?
Using gcvis I get this:
... and this heap profile:
Here is my code: https://github.com/sharewind/push-server/blob/v3/broker

The heap profile shows active memory, memory the runtime believes is in use by the go program (ie: hasn't been collected by the garbage collector). When the GC does collect memory the profile shrinks, but no memory is returned to the system. Your future allocations will try to use memory from the pool of previously collected objects before asking the system for more.
From the outside, this means that your program's memory use will either be increasing or staying level. What the outside system presents as the "Resident Size" of your program is the number of bytes of RAM assigned to your program, whether it's holding in-use Go values or collected ones.
The reasons these two numbers are often quite different are:
The GC collecting memory has no effect on the outside view of the program
Memory fragmentation
The GC only runs when the memory in use doubles the memory in use after the previous GC (by default; see http://golang.org/pkg/runtime/#pkg-overview)
If you want an accurate breakdown of how Go sees the memory, you can use the runtime.ReadMemStats call: http://golang.org/pkg/runtime/#ReadMemStats
Alternatively, since you are using web-based profiling, you can access the profiling data through your browser at http://10.10.58.118:8601/debug/pprof/ ; clicking the heap link will show you the debugging view of the heap profile, which has a printout of a runtime.MemStats structure at the bottom.
The runtime.MemStats documentation (http://golang.org/pkg/runtime/#MemStats) has the explanation of all the fields, but the interesting ones for this discussion are:
HeapAlloc: essentially what the profiler is giving you (active heap memory)
Alloc: similar to HeapAlloc, but covering all Go-managed memory
Sys: the total amount of memory (address space) requested from the OS
There will still be discrepancies between Sys and what the OS reports, because what Go asks of the system and what the OS gives it are not always the same. Also, CGO / syscall (e.g. malloc / mmap) memory is not tracked by Go.
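For illustration, a minimal sketch that prints the fields discussed above using only the standard library (the MiB conversion is just for readability):
package main

import (
    "fmt"
    "runtime"
)

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    // HeapAlloc: bytes of allocated heap objects (roughly what the heap profile shows)
    // Sys:       total bytes of memory obtained from the OS
    fmt.Printf("HeapAlloc    = %d MiB\n", m.HeapAlloc/1024/1024)
    fmt.Printf("Alloc        = %d MiB\n", m.Alloc/1024/1024)
    fmt.Printf("Sys          = %d MiB\n", m.Sys/1024/1024)
    fmt.Printf("HeapIdle     = %d MiB\n", m.HeapIdle/1024/1024)
    fmt.Printf("HeapReleased = %d MiB\n", m.HeapReleased/1024/1024)
}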

As an addition to Cookie of Nine's answer, in short: you can try the --alloc_space option.
go tool pprof uses --inuse_space by default. It samples memory usage, so the result is a subset of the real one.
With --alloc_space, pprof returns all memory allocated since the program started.
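For example, against the endpoint from the question that would be something like:
go tool pprof --alloc_space http://10.10.58.118:8601/debug/pprof/heap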

UPD (2022)
For those who know Russian, I made a presentation
and wrote a couple of articles on this topic:
RAM consumption in Golang: problems and solutions (Потребление оперативной памяти в языке Go: проблемы и пути решения)
Preventing Memory Leaks in Go, Part 1. Business Logic Errors (Предотвращаем утечки памяти в Go, ч. 1. Ошибки бизнес-логики)
Preventing memory leaks in Go, part 2. Runtime features (Предотвращаем утечки памяти в Go, ч. 2. Особенности рантайма)
Original answer (2017)
I was always confused by the growing resident memory of my Go applications, and finally I had to learn the profiling tools present in the Go ecosystem. The runtime provides many metrics within the runtime.MemStats structure, but it may be hard to understand which of them can help you find the reasons for memory growth, so some additional tools are needed.
Profiling environment
Use https://github.com/tevjef/go-runtime-metrics in your application. For instance, you can put this in your main function:
import (
    "time"

    metrics "github.com/tevjef/go-runtime-metrics"
)

func main() {
    // ...
    metrics.DefaultConfig.CollectionInterval = time.Second
    if err := metrics.RunCollector(metrics.DefaultConfig); err != nil {
        // handle error
    }
}
Run InfluxDB and Grafana within Docker containers:
docker run --name influxdb -d -p 8086:8086 influxdb
docker run -d -p 9090:3000/tcp --link influxdb --name=grafana grafana/grafana:4.1.0
Set up the connection between Grafana and InfluxDB (Grafana main page -> Top left corner -> Datasources -> Add new datasource):
Import dashboard #3242 from https://grafana.com (Grafana main page -> Top left corner -> Dashboard -> Import):
Finally, launch your application: it will transmit runtime metrics to the containerized InfluxDB. Put your application under a reasonable load (in my case it was quite small: 5 RPS for several hours).
Memory consumption analysis
The Sys curve (roughly the counterpart of RSS) is quite similar to the HeapSys curve. It turns out that dynamic memory allocation was the main factor in overall memory growth, so the small amount of memory consumed by stack variables seems to be constant and can be ignored;
The constant number of goroutines guarantees the absence of goroutine leaks and stack variable leaks;
The total number of allocated objects remains the same (there is no point in taking the fluctuations into account) during the lifetime of the process;
The most surprising fact: HeapIdle grows at the same rate as Sys, while HeapReleased is always zero. Obviously the runtime doesn't return memory to the OS at all, at least under the conditions of this test:
HeapIdle minus HeapReleased estimates the amount of memory
that could be returned to the OS, but is being retained by
the runtime so it can grow the heap without requesting more
memory from the OS.
For those who are trying to investigate a memory consumption problem, I would recommend following the described steps in order to rule out some trivial errors (like a goroutine leak).
Freeing memory explicitly
It's interesting that one can significantly decrease memory consumption with explicit calls to debug.FreeOSMemory():
// in the top-level package
import (
    "runtime/debug"
    "time"
)

func init() {
    go func() {
        t := time.Tick(time.Second)
        for {
            <-t
            debug.FreeOSMemory()
        }
    }()
}
In fact, this approach saved about 35% of memory compared with the default behaviour.

You can also use StackImpact, which automatically records and reports regular and anomaly-triggered memory allocation profiles to its dashboard, where they are available in historical and comparable form. See this blog post for more details: Memory Leak Detection in Production Go Applications.
Disclaimer: I work for StackImpact

Attempting to answer the following original question
Is there any better tool to explain golang runtime memory?
I find the following tools useful:
Statsview
https://github.com/go-echarts/statsview
Statsview integrates with the standard net/http/pprof.
Statsviz
https://github.com/arl/statsviz

This article should be quite helpful for your problem:
https://medium.com/safetycultureengineering/analyzing-and-improving-memory-usage-in-go-46be8c3be0a8
I ran a pprof analysis. pprof is a tool that’s baked into the Go language that allows for analysis and visualisation of profiling data collected from a running application. It’s a very helpful tool that collects data from a running Go application and is a great starting point for performance analysis. I’d recommend running pprof in production so you get a realistic sample of what your customers are doing.
When you run pprof you’ll get some files that focus on goroutines, CPU, memory usage and some other things according to your configuration. We’re going to focus on the heap file to dig into memory and GC stats. I like to view pprof in the browser because I find it easier to find actionable data points. You can do that with the below command.
go tool pprof -http=:8080 profile_name-heap.pb.gz
pprof has a CLI tool as well, but I prefer the browser option because I find it easier to navigate. My personal recommendation is to use the flame graph. I find that it’s the easiest visualiser to make sense of, so I use that view most of the time. The flame graph is a visual version of a function’s stack trace. The function at the top is the called function, and everything underneath it is called during the execution of that function. You can click on individual function calls to zoom in on them which changes the view. This lets you dig deeper into the execution of a specific function, which is really helpful. Note that the flame graph shows the functions that consume the most resources so some functions won’t be there. This makes it easier to figure out where the biggest bottlenecks are.
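To collect a heap profile like that from a long-running service in the first place, the application has to expose the standard net/http/pprof endpoints; a minimal sketch (the port is arbitrary):
package main

import (
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/ handlers on the default mux
)

func main() {
    // Serve the profiling endpoints in the background; a heap profile can then
    // be fetched with, for example:
    //   curl -o profile_name-heap.pb.gz http://localhost:6060/debug/pprof/heap
    go func() {
        http.ListenAndServe("localhost:6060", nil)
    }()

    // ... your application logic ...
}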
Is this helpful?

Try the Go plugin for Tracy. Tracy is "A real time, nanosecond resolution, remote telemetry" profiler (...).
GoTracy (the name of the plugin) is an agent that connects to Tracy and sends the necessary information to better understand your app's behaviour. After importing the plugin you can add telemetry code like in the example below:
func exampleFunction() {
    gotracy.TracyInit()
    gotracy.TracySetThreadName("exampleFunction")
    for i := 0.0; i < math.Pi; i += 0.1 {
        zoneid := gotracy.TracyZoneBegin("Calculating Sin(x) Zone", 0xF0F0F0)
        gotracy.TracyFrameMarkStart("Calculating sin(x)")
        sin := math.Sin(i)
        gotracy.TracyFrameMarkEnd("Calculating sin(x)")
        gotracy.TracyMessageLC("Sin(x) = "+strconv.FormatFloat(sin, 'E', -1, 64), 0xFF0F0F)
        gotracy.TracyPlotDouble("sin(x)", sin)
        gotracy.TracyZoneEnd(zoneid)
        gotracy.TracyFrameMark()
    }
}
The result is similar to this:
The plugin is available at:
https://github.com/grzesl/gotracy
Tracy itself is available at:
https://github.com/wolfpld/tracy

Related

relationship between container_memory_working_set_bytes and process_resident_memory_bytes and total_rss

I'm looking to understand the relationship between
container_memory_working_set_bytes vs process_resident_memory_bytes vs total_rss (container_memory_rss) + file_mapped, so as to be better equipped to alert on possible OOM situations.
It goes against my understanding (which is puzzling me right now), given that the container/pod is running a single process executing a compiled program written in Go.
Why is container_memory_working_set_bytes so much bigger (nearly 10 times) than process_resident_memory_bytes?
Also, the relationship between container_memory_working_set_bytes and container_memory_rss + file_mapped is strange here, something I did not expect after reading here:
The total amount of anonymous and swap cache memory (it includes transparent hugepages), and it equals to the value of total_rss from memory.status file. This should not be confused with the true resident set size or the amount of physical memory used by the cgroup. rss + file_mapped will give you the resident set size of cgroup. It does not include memory that is swapped out. It does include memory from shared libraries as long as the pages from those libraries are actually in memory. It does include all stack and heap memory.
So the cgroup's total resident set size is rss + file_mapped; how can this value be less than container_working_set_bytes for a container running in the given cgroup?
This makes me feel that I'm not understanding these stats correctly.
The following are the PromQL queries used to build the above graph:
process_resident_memory_bytes{container="sftp-downloader"}
container_memory_working_set_bytes{container="sftp-downloader"}
go_memstats_heap_alloc_bytes{container="sftp-downloader"}
container_memory_mapped_file{container="sftp-downloader"} + container_memory_rss{container="sftp-downloader"}
So the relationship seems to be like this:
container_working_set_in_bytes = container_memory_usage_bytes - total_inactive_file
container_memory_usage_bytes, as its name implies, is the total memory used by the container; since it also includes the file cache (inactive_file), which the OS can release under memory pressure, subtracting inactive_file gives container_working_set_in_bytes.
The relationship between container_memory_rss and container_working_set_bytes can be summed up with the following expression:
container_memory_usage_bytes = container_memory_cache + container_memory_rss
cache reflects data stored on disk that is currently cached in memory; it contains active_file + inactive_file (mentioned above).
This explains why container_working_set was higher.
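To make that concrete with purely hypothetical numbers: suppose container_memory_usage_bytes = 1000 MiB, container_memory_rss = 200 MiB, container_memory_mapped_file = 50 MiB, and cache = 800 MiB, of which inactive_file = 300 MiB (so active_file = 500 MiB). Then:
container_memory_working_set_bytes = 1000 MiB - 300 MiB = 700 MiB
container_memory_rss + file_mapped = 200 MiB + 50 MiB = 250 MiB
The working set comes out far above rss + file_mapped because the active part of the page cache is counted in the working set but not in rss.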
Ref #1
Ref #2
Does this help to make sense of the chart?

Not really an answer, but still two assorted points.
Here at my $dayjob we have faced various issues with how tools external to the Go runtime count and display the memory usage of a process executing a program written in Go.
Coupled with the fact that Go's GC on Linux does not actually release freed memory pages to the kernel but merely madvise(2)s it that such pages are MADV_FREE, a GC cycle which has freed quite a hefty amount of memory does not produce any noticeable change in the readings of the process's RSS taken by the external tooling (usually cgroups stats).
Hence we're exporting our own metrics, obtained by periodically calling runtime.ReadMemStats (and runtime/debug.ReadGCStats), in any major service written in Go, with the help of a simple package written specifically for that. These readings reflect the true idea the Go runtime has about the memory under its control.
By the way, the NextGC field of the memory stats is super useful to watch if you have memory limits set for your containers, because once that reading reaches or surpasses your memory limit, the process in the container is surely doomed to eventually be shot down by the oom_killer.
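That package isn't shown here, but the idea can be sketched with nothing beyond the standard expvar package (the package and metric names below are made up for the example):
package memstatsexport // hypothetical package name

import (
    "expvar"
    "runtime"
    "time"
)

// Export publishes a few runtime.MemStats readings under /debug/vars,
// refreshed every interval, so they can be scraped alongside other metrics.
func Export(interval time.Duration) {
    heapAlloc := expvar.NewInt("mem_heap_alloc_bytes")
    heapSys := expvar.NewInt("mem_heap_sys_bytes")
    nextGC := expvar.NewInt("mem_next_gc_bytes")

    go func() {
        var m runtime.MemStats
        for range time.Tick(interval) {
            runtime.ReadMemStats(&m)
            heapAlloc.Set(int64(m.HeapAlloc))
            heapSys.Set(int64(m.HeapSys))
            nextGC.Set(int64(m.NextGC))
        }
    }()
}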

Memory is not being released

This is a simple program to decompress a string. I'm just running a loop to show that memory usage increases and the memory used never gets released.
Memory is not released even after 8 hours.
Package for decompressing string: https://github.com/Albinzr/lzGo - (simple lz string algorithm)
I'm adding a gist link since the string used for decompressing is large
Source Code:
Code
Activity Monitor
I'm completely new to Go. Can anyone tell me how I can solve the memory issue?
UPDATE Jul 15 '20
The app still crashes when the memory limit is reached. Since it only uses 12 MB - 15 MB, this should not happen!
There is a lot going on here.
First, using Go version 1.14.2 your program works fine for me. It does not appear to be leaking memory.
Second, even when I purposely created a memory leak by increasing the loop size to 100 and saving the results in an array, I only used about 100 MB of memory.
Which gets us to the third point: you should not be using Activity Monitor or any other operating-system-level tools to check for memory leaks in a Go program. Operating system memory management is a painfully complex topic, and the OS tools are designed to help you determine how a program is affecting the whole system, not what is going on within the program.
Specifically, macOS "Real Memory" (analogous to RSS, Resident Set Size) includes memory the program is no longer using but the OS has not taken back yet. When the garbage collector frees up memory and tells the OS it does not need that memory anymore, the OS does not immediately take it back. (Why it works that way is way beyond the scope of this answer.)
Also, if the OS is under memory pressure, it can take back not only memory the program has freed, but it can also take back (temporarily) memory the program is still using but has not accessed "recently", so that another program that urgently needs memory can use it. In this case, "Real Memory" will be reduced even if the process is not actually using less memory. There is no statistic reported by the operating system that will help you here.
You need to use native Go settings like GODEBUG=gctrace=1 or tools like expvar and expvarmon to see what the garbage collector is doing.
As for why your program ran out of memory when you limited it: keep in mind that by default Go builds a dynamically linked executable, and just reading in all the shared libraries can take up a lot of memory. Try building your application with static linking using CGO_ENABLED=0 and see if that helps. Also see how much memory it uses when you only run one iteration of the loop.
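For reference, the two suggestions above translate roughly into commands like these (the binary name is a placeholder):
GODEBUG=gctrace=1 ./yourapp   (prints a line of GC statistics after every collection)
CGO_ENABLED=0 go build -o yourapp .   (builds without cgo, giving a statically linked binary)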

How could I make a Go program use more memory? Is that recommended?

I'm looking for an option similar to -Xmx in Java, that is, to assign the maximum runtime memory that my Go application can utilise. I was checking the runtime package, but I'm not entirely sure that is the way to go.
I tried setting something like this with func SetMaxStack() (likely very stupid):
debug.SetMaxStack(5000000000) // bytes
model.ExcelCreator()
The reason why I am looking to do this is that currently there is an ample amount of RAM available but the application won't consume more than 4-6%. I might be wrong here, but it could be forcing GC to happen much sooner than needed, leading to performance issues.
What I'm doing
Getting a large dataset from an RDBMS and processing it to write out to Excel.
Another reason why I am looking for such an option is to limit the maximum usage of RAM on the server where it will ultimately be deployed.
Any hints on this would be greatly appreciated.
The current stable Go (1.10) has only a single knob which may be used to trade memory for lower CPU usage by the garbage collection the Go runtime performs.
This knob is called GOGC, and its description reads
The GOGC variable sets the initial garbage collection target percentage. A collection is triggered when the ratio of freshly allocated data to live data remaining after the previous collection reaches this percentage. The default is GOGC=100. Setting GOGC=off disables the garbage collector entirely. The runtime/debug package's SetGCPercent function allows changing this percentage at run time. See https://golang.org/pkg/runtime/debug/#SetGCPercent.
So basically setting it to 200 would supposedly double the amount of memory the Go runtime of your running process may use.
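For instance, you could either start the process with GOGC=200 set in its environment, or change the value programmatically; a small sketch (the value 200 is only illustrative):
import "runtime/debug"

func init() {
    // Trigger a collection only once freshly allocated data reaches
    // 200% of the live data left after the previous GC (the default is 100).
    debug.SetGCPercent(200)
}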
Having said that, I'd note that the Go runtime actually tries to adjust the behaviour of its garbage collector to the workload of your running program and the CPU processing power at hand.
I mean that normally there's nothing wrong with your program not consuming lots of RAM: if the collector happens to sweep the garbage fast enough without hampering performance in a significant way, I see no reason to worry. Go's GC is one of the most intensely fine-tuned parts of the runtime, and it works very well in fact.
Hence you may try to take another route:
1. Profile memory allocations of your program.
2. Analyze the profile and try to figure out where the hot spots are, and whether (and how) they can be optimized. You might start here and continue with the gazillion other intros to this stuff.
3. Optimize. Typically this amounts to making certain buffers reusable across different calls to the same function(s) consuming them, preallocating slices instead of growing them gradually, using sync.Pool where deemed useful, etc. (see the sketch after this list).
Such measures may actually increase the memory truly used (that is, by live objects, as opposed to garbage) but they may lower the pressure on the GC.
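As a small illustration of step 3, a reusable buffer backed by sync.Pool might look like this (a generic sketch, not tied to any particular program):
import (
    "bytes"
    "sync"
)

// bufPool hands out reusable buffers so hot code paths don't have to
// allocate a fresh one on every call.
var bufPool = sync.Pool{
    New: func() interface{} { return new(bytes.Buffer) },
}

func process(data []byte) {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer bufPool.Put(buf)

    buf.Write(data)
    // ... work with buf ...
}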

Go garbage collector overhead with minimal allocations?

It's widely accepted that one of the main things holding Go back from C++ level performance is the garbage collector. I'd like to get some intuition to help reason about the overhead of Go's GC in different circumstances. For example, is there nontrivial GC overhead if a program never touches the heap, or just allocates a large block at setup to use as an object pool with self-management? Is a call made to the GC every x seconds, or on every allocation?
As a related question: is my initial assumption correct that Go's GC is the main impediment to C++ level performance, or are there some things that Go just does slower?
The pause time (stop-the-world) for garbage collection in Go is on the order of a few milliseconds, or in more recent Go versions even less (see
https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md).
C++ does not have a garbage collector, so these kinds of pauses do not happen. However, C++ is not magic, and memory management must occur if the memory for storing objects is to be managed. Memory management is still happening somewhere in your program, regardless of the language.
Using a static block of memory in C++ and not dealing with any memory management issues is one approach. But Go can do this too. For an outline of how this is done in a real, high-performance Go program, see this video:
https://www.youtube.com/watch?time_continue=7&v=ZuQcbqYK0BY
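The question's idea of allocating a large block at setup and managing it yourself can be sketched in Go as a preallocated free list; the steady-state path then allocates nothing, so the GC has very little new garbage to deal with (a simplified illustration, not taken from the video):
// item stands in for whatever fixed-size object the program churns through.
type item struct {
    buf [4096]byte
}

// pool is allocated once at startup: the GC sees one long-lived slice
// instead of a stream of short-lived objects.
type pool struct {
    items []item
    free  []*item
}

func newPool(n int) *pool {
    p := &pool{items: make([]item, n), free: make([]*item, 0, n)}
    for i := range p.items {
        p.free = append(p.free, &p.items[i])
    }
    return p
}

func (p *pool) get() *item {
    if len(p.free) == 0 {
        return nil // pool exhausted; the caller decides what to do
    }
    it := p.free[len(p.free)-1]
    p.free = p.free[:len(p.free)-1]
    return it
}

func (p *pool) put(it *item) { p.free = append(p.free, it) }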

Is it possible to deallocate memory manually in go?

After a discussion with a colleague, I wonder if it would be possible (even if it makes no sense at all) to deallocate memory manually in Go (i.e. by using the unsafe package). Is it?
Here is a thread that may interest you: Add runtime.Free() for GOGC=off
Interesting part:
The Go GC does not have the ability to manually deallocate blocks anymore. And besides, runtime.Free is unsafe (people might free still-in-use pointers or double free) and then all sorts of C memory problems that Go tries hard to get rid of will come back. The other reason is that the runtime sometimes allocates behind your back and there is no way for the program to explicitly free memory.
If you really want to manually manage memory with Go, implement your own memory allocator based on syscall.Mmap or cgo malloc/free.
Disabling GC for an extended period of time is generally a bad solution for a concurrent language like Go. And Go's GC will only get better down the road.
TL;DR: Yes, but don't do it
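To make the syscall.Mmap route from the quote concrete, here is a minimal Unix-only sketch of taking and returning memory completely outside the Go heap (error handling trimmed; this is exactly the kind of code the warning above is about):
import "syscall"

// alloc asks the kernel for length bytes of anonymous memory;
// the Go GC neither tracks nor frees this mapping.
func alloc(length int) ([]byte, error) {
    return syscall.Mmap(-1, 0, length,
        syscall.PROT_READ|syscall.PROT_WRITE,
        syscall.MAP_ANON|syscall.MAP_PRIVATE)
}

// free returns the mapping to the kernel; touching the slice afterwards
// crashes the program, which is the class of bug Go normally rules out.
func free(b []byte) error {
    return syscall.Munmap(b)
}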
I am a bit late, but this question ranks high on Google, so here is an article by the creator of the Dgraph database which explains an alternative to malloc/calloc, namely jemalloc; it's worth a look:
https://dgraph.io/blog/post/manual-memory-management-golang-jemalloc/
With these techniques, we get the best of both worlds: We can do manual memory allocation in critical, memory-bound code paths. At the same time, we can get the benefits of automatic garbage collection in non-critical code paths. Even if you are not comfortable using Cgo or jemalloc, you could apply these techniques on bigger chunks of Go memory, with similar impact.
And I haven't tested it yet, but there is a GitHub library called jemalloc-go:
https://github.com/spinlock/jemalloc-go
Go 1.20 introduces an experimental concept of arenas for memory management, per the proposal "arena: new package providing memory arenas". We could manage memory manually through arenas.
We propose the addition of a new arena package to the Go standard library. The arena package will allow the allocation of any number of arenas. Objects of arbitrary type can be allocated from the memory of the arena, and an arena automatically grows in size as needed. When all objects in an arena are no longer in use, the arena can be explicitly freed to reclaim its memory efficiently without general garbage collection. We require that the implementation provide safety checks, such that, if an arena free operation is unsafe, the program will be terminated before any incorrect behavior happens.
Sample code (from the proposal):
a := arena.New()
var ptrT *T
a.New(&ptrT)
ptrT.val = 1
var sliceT []T
a.NewSlice(&sliceT, 100)
sliceT[99].val = 4
a.Free()
Example: see Go 1.20 Experiment: Memory Arenas vs Traditional Memory Management from Pyroscope.
Arenas are a powerful tool for optimizing Go programs, particularly in scenarios where your programs spend a significant amount of time parsing large protobuf or JSON blobs.
Some recommendations:
Only use arenas in critical code paths. Do not use them everywhere
Profile your code before and after using arenas to make sure you're adding arenas in areas where they can provide the most benefit
Pay close attention to the lifecycle of the objects created on the arena. Make sure you don't leak them to other components of your program where objects may outlive the arena
Use defer a.Free() to make sure that you don't forget to free memory
Use arena.Clone() to clone objects back to the heap if you want to use them after an arena was freed
Note: This proposal is on hold indefinitely due to serious API concerns. The GOEXPERIMENT=arena code may be changed incompatibly or removed at any time, and we do not recommend its use in production.
