pprof usage and interpretation - go

We believe our Go app has a memory leak.
To find out what is going on, we are trying pprof.
We are having a hard time, though, understanding the readings.
When connecting with go tool pprof http://localhost:6060/debug/pprof/heap?debug=1, a sample output is:
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) text
Showing nodes accounting for 17608.45kB, 100% of 17608.45kB total
Showing top 10 nodes out of 67
flat flat% sum% cum cum%
12292.12kB 69.81% 69.81% 12292.12kB 69.81% github.com/acct/repo/vendor/github.com/.../funcA /../github.com/acct/repo/vendor/github.com/../fileA.go
1543.14kB 8.76% 78.57% 1543.14kB 8.76% github.com/acct/repo/../funcB /../github.com/acct/repo/fileB.go
1064.52kB 6.05% 84.62% 1064.52kB 6.05% github.com/acct/repo/vendor/github.com/../funcC /../github.com/acct/repo/vendor/github.com/fileC.go
858.34kB 4.87% 89.49% 858.34kB 4.87% github.com/acct/repo/vendor/golang.org/x/tools/imports.init /../github.com/acct/repo/vendor/golang.org/x/tools/imports/zstdlib.go
809.97kB 4.60% 94.09% 809.97kB 4.60% bytes.makeSlice /usr/lib/go/src/bytes/buffer.go
528.17kB 3.00% 97.09% 528.17kB 3.00% regexp.(*bitState).reset /usr/lib/go/src/regexp/backtrack.go
(Please forgive the clumsy obfuscation)
We interpret this as funcA consuming nearly 70% of memory, but that 70% amounts to only about 12 MB.
However, top shows:
top - 18:09:44 up 2:02, 1 user, load average: 0,75, 0,56, 0,38
Tasks: 166 total, 1 running, 165 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3,7 us, 1,6 sy, 0,0 ni, 94,3 id, 0,0 wa, 0,0 hi, 0,3 si, 0,0 st
KiB Mem : 16318684 total, 14116728 free, 1004804 used, 1197152 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 14451260 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4902 me 20 0 1,371g 0,096g 0,016g S 12,9 0,6 1:58.14 mybin
which suggests 1.371 GB of memory used... where has it gone?
Also, the pprof docs are quite sparse, and we are having difficulties even understanding how the tool should be used. Our binary is a daemon. For example:
If we start a reading with go tool pprof http://localhost:6060/debug/pprof/heap, is this a one-time reading at that particular moment, or an aggregate over time?
Sometimes hitting text again later in interactive mode seems to report the same values. Are we actually looking at the same values? Do we need to restart go tool pprof... to get fresh values?
Is it a reading of the complete heap, or of some specific goroutine, or of a specific point in the stack?
Finally, is this interpretation correct (from http://localhost:6060/debug/pprof/):
/debug/pprof/
profiles:
0 block
64 goroutine
45 heap
0 mutex
13 threadcreate
Does this mean the binary has 64 open goroutines and a total of 45 MB of heap memory?
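For reference, a Go daemon typically exposes these endpoints by blank-importing net/http/pprof and serving HTTP; a minimal sketch of such a setup (the listen address below just mirrors the URL above, and the trivial main is illustrative only, not our actual daemon code):
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
)

func main() {
	// Serve the profiling endpoints in the background.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	// ... the daemon's real work would go here ...
	select {} // block forever so the endpoints stay reachable
}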

Related

Golang Alloc and HeapAlloc vs pprof large discrepancies

I have a Go program that calculates large correlation matrices in memory. To do this I've set up a pipeline of 3 goroutines where the first reads in files, the second calculates the correlation matrix and the last stores the result to disk.
The problem is that when I run the program, the Go runtime allocates ~17 GB of memory, while a matrix only takes up ~2-3 GB. runtime.ReadMemStats shows that the program is using ~17 GB (verified with htop), but pprof only reports about ~2.3 GB.
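To fix ideas, a minimal sketch of that kind of three-stage pipeline (the stage bodies and file names below are placeholders, not the real program's code):
package main

func main() {
	paths := make(chan string)
	results := make(chan []float64)

	// Stage 1: read in files and feed their paths downstream.
	go func() {
		defer close(paths)
		for _, p := range []string{"a.csv", "b.csv"} { // placeholder input
			paths <- p
		}
	}()

	// Stage 2: calculate a correlation matrix per file (placeholder work).
	go func() {
		defer close(results)
		for range paths {
			results <- make([]float64, 1000) // stands in for the real matrix
		}
	}()

	// Stage 3: store each result to disk (placeholder: just drain the channel).
	for range results {
	}
}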
If I look at the mem stats after running one file through the pipeline:
var mem runtime.MemStats
runtime.ReadMemStats(&mem)
fmt.Printf("Total alloc: %d GB\n", mem.Alloc/1000/1000/1000)
This shows the total allocation of the program:
Total alloc: 17 GB
However, if I run go tool pprof mem.prof I get the following results:
(pprof) top5
Showing nodes accounting for 2.21GB, 100% of 2.21GB total
Showing top 5 nodes out of 9
flat flat% sum% cum cum%
1.20GB 54.07% 54.07% 1.20GB 54.07% dataset.(*Dataset).CalcCorrelationMatrix
1.02GB 45.93% 100% 1.02GB 45.93% bytes.makeSlice
0 0% 100% 1.02GB 45.93% bytes.(*Buffer).WriteByte
0 0% 100% 1.02GB 45.93% bytes.(*Buffer).grow
0 0% 100% 1.02GB 45.93% encoding/json.Indent
So I am wondering how I can find out why the program allocates 17 GB when the peak memory usage reported by pprof seems to be only ~2.5 GB.
Is there a way to trace the memory usage throughout the program using pprof?
EDIT
I ran the program again with GODEBUG=gctrace=1 and got the following trace:
gc 1 @0.017s 0%: 0.005+0.55+0.003 ms clock, 0.022+0/0.47/0.11+0.012 ms cpu, 1227->1227->1226 MB, 1228 MB goal, 4 P
gc 2 @14.849s 0%: 0.003+1.7+0.004 ms clock, 0.015+0/1.6/0.11+0.018 ms cpu, 1227->1227->1227 MB, 2452 MB goal, 4 P
gc 3 @16.850s 0%: 0.006+60+0.003 ms clock, 0.027+0/0.46/59+0.015 ms cpu, 1876->1876->1712 MB, 2455 MB goal, 4 P
gc 4 @22.861s 0%: 0.005+238+0.003 ms clock, 0.021+0/0.46/237+0.015 ms cpu, 3657->3657->3171 MB, 3658 MB goal, 4 P
gc 5 @30.716s 0%: 0.005+476+0.004 ms clock, 0.022+0/0.44/476+0.017 ms cpu, 5764->5764->5116 MB, 6342 MB goal, 4 P
gc 6 @46.023s 0%: 0.005+949+0.004 ms clock, 0.020+0/0.47/949+0.017 ms cpu, 10302->10302->9005 MB, 10303 MB goal, 4 P
gc 7 @64.878s 0%: 0.006+382+0.004 ms clock, 0.024+0/0.46/382+0.019 ms cpu, 16548->16548->7728 MB, 18011 MB goal, 4 P
gc 8 @89.774s 0%: 0.86+2805+0.006 ms clock, 3.4+0/24/2784+0.025 ms cpu, 20208->20208->17088 MB, 20209 MB goal, 4 P
So it is quite obvious that the heap grows steadily throughout the program, but I am not able to pinpoint where. I've profiled memory usage using pprof.WriteHeapProfile after calling the memory-intensive functions:
func memoryProfile(profpath string) {
	if _, err := os.Stat(profpath); os.IsNotExist(err) {
		os.Mkdir(profpath, os.ModePerm)
	}
	f, err := os.Create(path.Join(profpath, "mem.mprof"))
	fmt.Printf("Creating memory profile in %s\n", "data/profile/mem.mprof")
	if err != nil {
		panic(err)
	}
	if err := pprof.WriteHeapProfile(f); err != nil {
		panic(err)
	}
	f.Close()
}
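To the earlier question about tracing memory use over the program's lifetime, one option would be to call a helper like the one above repeatedly and keep numbered snapshots to compare afterwards; a rough sketch (the interval, file naming and error handling are arbitrary choices, not what the program above does):
import (
	"fmt"
	"log"
	"os"
	"path"
	"runtime/pprof"
	"time"
)

// profilePeriodically writes a numbered heap profile into dir every interval,
// e.g. go profilePeriodically("data/profile", time.Minute).
func profilePeriodically(dir string, interval time.Duration) {
	for i := 0; ; i++ {
		time.Sleep(interval)
		f, err := os.Create(path.Join(dir, fmt.Sprintf("mem-%03d.mprof", i)))
		if err != nil {
			log.Println(err)
			continue
		}
		if err := pprof.WriteHeapProfile(f); err != nil {
			log.Println(err)
		}
		f.Close()
	}
}
Successive snapshots can then be compared, for example with pprof's -base option, to see where the heap grew between two points in time.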
As mentioned in the comments by JimB, the Go profiler is a sampling profiler and samples memory usage at certain intervals. In my case the sampling was not frequent enough to catch a function (JSON marshalling) that was using extensive amounts of memory.
Increasing the sampling rate of the profiler by setting the environment variable
$ export GODEBUG=memprofilerate=1
will update runtime.MemProfileRate, and the profile will then include every allocated block.
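Presumably the same effect can be had from inside the program by setting runtime.MemProfileRate directly, which the runtime documentation says should be done as early as possible, before the allocations of interest happen; a minimal sketch (the output file name and the surrounding main are illustrative only):
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// Record every allocated block instead of sampling; this must be set
	// before the memory-intensive work starts.
	runtime.MemProfileRate = 1

	// ... run the memory-intensive work here ...

	f, err := os.Create("mem.mprof")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	pprof.WriteHeapProfile(f)
}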
Another possible cause (as it was in my case) is that the binary was compiled with -race, which enables checking for race conditions.
The overhead of this is huge and will look like a massive memory leak when checking with htop or something similar, but it won't show up in any pprof output.

High CPU Utilization issue on Moqui instance

We are facing a problem of random high CPU utilization on the production server, which makes the application unresponsive and forces us to restart it. We have done initial-level diagnostics but could not reach a conclusion.
We are using the following configuration for the production server:
Amazon EC2 8gb RAM(m4.large) ubuntu 14.04 LTS
Amazon RDS 2gb RAM(t2.small) Mysql database
Java heap size -Xms2048M -Xmx4096
Database Connection Pool size Minimum: 20 and Maximum: 150
MaxThreads 100
Below are two results from the top command:
1) At 6:52:50 PM
KiB Mem : 8173968 total, 2100304 free, 4116436 used, 1957228 buff/cache
KiB Swap: 1048572 total, 1047676 free, 896 used. 3628092 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20698 root 20 0 6967736 3.827g 21808 S 3.0 49.1 6:52.50 java
2) At 6:53:36 PM
KiB Mem : 8173968 total, 2099000 free, 4116964 used, 1958004 buff/cache
KiB Swap: 1048572 total, 1047676 free, 896 used. 3627512 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20698 root 20 0 6967736 3.828g 21808 S 200.0 49.1 6:53.36 java
Note:
Number of Concurrent users - 5 or 6 (at this time)
Number of requests between 6:52:50 PM and 6:53:36 PM - 4
The results show that CPU utilization increased drastically.
Any suggestion or direction that could lead to a solution?
Additionally, the following is the CPU utilization graph for the last week.
Thanks!
Without seeing a stack trace, I'd guess that the problem is likely Jetty, as there have been recently documented bugs in Jetty causing the behaviour you describe on EC2 (a Google search will turn them up). I would recommend taking a couple of stack trace dumps while the CPU is at 100% to confirm it is Jetty; if it is, the Jetty documentation on that bug will hopefully show that you simply need to update Jetty.

Should the stats reported by Go's runtime.ReadMemStats approximately equal the resident memory set reported by ps aux?

In Go, should the "Sys" stat, or any other stat or combination of stats reported by runtime.ReadMemStats, approximately equal the resident set size reported by ps aux?
Alternatively, assuming some memory may be swapped out, should the Sys stat be approximately greater than or equal to the RSS?
We have a long-running web service that deals with a high frequency of requests and we are finding that the RSS quickly climbs up to consume virtually all of the 64GB memory on our servers. When it hits ~85% we begin to experience considerable degradation in our response times and in how many concurrent requests we can handle. The run I've listed below is after about 20 hours of execution, and is already at 51% memory usage.
I'm trying to determine if the likely cause is a memory leak (we make some calls to CGO). The data seems to indicate that it is, but before I go down that rabbit hole I want to rule out a fundamental misunderstanding of the statistics I'm using to make that call.
This is an amd64 build targeting linux and executing on CentOS.
Reported by runtime.ReadMemStats:
Alloc: 1294777080 bytes (1234.80MB) // bytes allocated and not yet freed
Sys: 3686471104 bytes (3515.69MB) // bytes obtained from system (sum of XxxSys below)
HeapAlloc: 1294777080 bytes (1234.80MB) // bytes allocated and not yet freed (same as Alloc above)
HeapSys: 3104931840 bytes (2961.09MB) // bytes obtained from system
HeapIdle: 1672339456 bytes (1594.87MB) // bytes in idle spans
HeapInuse: 1432592384 bytes (1366.23MB) // bytes in non-idle span
Reported by ps aux:
%CPU %MEM VSZ RSS
1362 51.3 306936436 33742120
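For what it's worth, printing a few more of the MemStats fields side by side can make the comparison with RSS easier; HeapReleased in particular counts physical memory the runtime has already returned to the OS, which Sys still includes. A small sketch (the choice of fields and the MB formatting are mine):
package main

import (
	"fmt"
	"runtime"
)

func printMemStats() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	const mb = 1 << 20
	fmt.Printf("Alloc=%dMB HeapInuse=%dMB HeapIdle=%dMB HeapReleased=%dMB Sys=%dMB NumGC=%d\n",
		m.Alloc/mb, m.HeapInuse/mb, m.HeapIdle/mb, m.HeapReleased/mb, m.Sys/mb, m.NumGC)
}

func main() {
	printMemStats()
}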

Elasticsearch High CPU When Idle

I'm fairly new to Elasticsearch and I've bumped into an issue that I'm having difficulty even troubleshooting. My Elasticsearch (1.1.1) instance is currently spiking the CPU even though no searching or indexing is going on. CPU usage isn't always at 100%, but it jumps up there quite a bit and the load is very high.
Previously, the indices on this node ran perfectly fine for months without any issue. This just started today and I have no idea what's causing it.
The problem persists even after I restart ES and I even restarted the server in pure desperation. No effect on the issue.
Here are some stats to help frame the issue, but I'd imagine there's more information that's needed. I'm just not sure what to provide.
Elasticsearch 1.1.1
Gentoo Linux 3.12.13
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.7) (Gentoo build 1.6.0_27-b27)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
One node, 5 shards, 0 replicas
32GB RAM on system, 16GB Dedicated to Elasticsearch
RAM does not appear to be the issue here.
Any tips on troubleshooting the issue would be appreciated.
Edit: Info from top if it's helpful at all.
top - 19:56:56 up 3:22, 2 users, load average: 10.62, 11.15, 9.37
Tasks: 123 total, 1 running, 122 sleeping, 0 stopped, 0 zombie
%Cpu(s): 98.5 us, 0.6 sy, 0.0 ni, 0.7 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 32881532 total, 31714120 used, 1167412 free, 187744 buffers
KiB Swap: 4194300 total, 0 used, 4194300 free, 12615280 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2531 elastic+ 20 0 0.385t 0.020t 3.388g S 791.9 64.9 706:00.21 java
As Andy Pryor mentioned, background merging might have been what was causing the issue. Our index rollover had been paused and two of our current indices were over 200 GB. Rolling them over appears to have resolved the issue, and we've been humming along just fine since.
Edit:
The high load while seemingly idle was determined to have been caused by merges on several very large indices that were not being rolled over weekly; an internal process that should have rolled them over had failed. After addressing this oversight, the merge times were short and the high load subsided.

Phabricator extremely slow

I am using Phabricator for code reviews, and after tinkering with it, I have gotten it set up more or less exactly as I want.
I just have one problem that I can't really find a solution to.
Navigating the Phabricator app is smooth and has no delays. But when I write a comment (or choose any other action) in the Leap Into Action and press Clowncopterize, it takes forever before it is done. The gears (busy indicator) in the lower right corner keep spinning for up to 60 seconds.
I can't figure out what the cause of this is. I have tried running top and I don't see anything severe:
top - 11:40:36 up 9 min, 1 user, load average: 0.04, 0.10, 0.07
Tasks: 112 total, 1 running, 111 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem: 2044148 total, 526580 used, 1517568 free, 36384 buffers
KiB Swap: 2093052 total, 0 used, 2093052 free, 257344 cached
There are no spikes when I press Clowncopterize either. I have made sure DNS is set up correctly; it wasn't to begin with, but it is now. Even a reboot didn't fix the performance problems.
The trouble was that sendmail was incorrectly set up, so it was waiting to time out on sending mail.
