Go: memory issues

I need your wisdom.
I have a huge daemon written in Go. Some time ago a user reported a possible memory leak somewhere in the code.
I started investigating. When an initial code inspection didn't yield any clues about the nature of the leak, I focused on how the process behaves at runtime.
My idea was simple: if I were failing to drop references to certain objects, the heap should grow constantly. I wrote the following procedure to monitor the heap:
package main

import (
	"fmt"
	"runtime"
	"time"
)

// PrintHeap reports basic heap statistics every 5 seconds.
func PrintHeap() {
	ticker := time.NewTicker(time.Second * 5)
	defer ticker.Stop()
	for range ticker.C {
		st := &runtime.MemStats{}
		runtime.ReadMemStats(st)
		// From the Go docs: HeapObjects increases as objects are allocated
		// and decreases as the heap is swept and unreachable objects are
		// freed.
		fmt.Println("Heap allocs:", st.Mallocs, "Heap frees:",
			st.Frees, "Heap objects:", st.HeapObjects)
	}
}
This procedure prints heap statistics every 5 seconds, including the number of objects currently allocated.
Now a few words about what the daemon does. It processes lines from a UDP input. Each line carries information about a single HTTP request and is parsed into a typical Go struct. The struct has some numeric and string fields, including one for the request path. Lots of things then happen to this struct, but they are irrelevant here.
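For illustration, such a record might look roughly like this (the type and field names are hypothetical, not the daemon's actual ones):

// requestRecord is a hypothetical stand-in for the parsed line.
type requestRecord struct {
	Timestamp int64  // when the request was seen
	Status    int    // HTTP status code
	BytesSent int64  // response size in bytes
	Path      string // request path, e.g. "/"
}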
Now, I set the input rate to 1500 lines per second, each line being rather short (read: with the standard request path, /).
After running the application, I could see the heap size stabilize after a while:
Heap allocs: 180301314 Heap frees: 175991675 Heap objects: 4309639
Heap allocs: 180417372 Heap frees: 176071946 Heap objects: 4345426
Heap allocs: 180526254 Heap frees: 176216276 Heap objects: 4309978
Heap allocs: 182406470 Heap frees: 177496675 Heap objects: 4909795
Heap allocs: 183190214 Heap frees: 178248365 Heap objects: 4941849
Heap allocs: 183302680 Heap frees: 178958823 Heap objects: 4343857
Heap allocs: 183412388 Heap frees: 179101276 Heap objects: 4311112
Heap allocs: 183528654 Heap frees: 179181897 Heap objects: 4346757
Heap allocs: 183638282 Heap frees: 179327221 Heap objects: 4311061
Heap allocs: 185609758 Heap frees: 181330408 Heap objects: 4279350
When this state was reached, memory consumption stopped growing.
Now, I changed the input so that each line became more than 2k characters long (with a huge /AAAAA... request path), and that's where weird things started to happen.
Heap size grew drastically, but still became sort of stable after some time:
Heap allocs: 18353000513 Heap frees: 18335783660 Heap objects: 17216853
Heap allocs: 18353108590 Heap frees: 18335797883 Heap objects: 17310707
Heap allocs: 18355134995 Heap frees: 18336081878 Heap objects: 19053117
Heap allocs: 18356826170 Heap frees: 18336182205 Heap objects: 20643965
Heap allocs: 18366029630 Heap frees: 18336925394 Heap objects: 29104236
Heap allocs: 18366122614 Heap frees: 18336937295 Heap objects: 29185319
Heap allocs: 18367840866 Heap frees: 18337205638 Heap objects: 30635228
Heap allocs: 18368909002 Heap frees: 18337309215 Heap objects: 31599787
Heap allocs: 18369628204 Heap frees: 18337362196 Heap objects: 32266008
Heap allocs: 18373482440 Heap frees: 18358282964 Heap objects: 15199476
Heap allocs: 18374488754 Heap frees: 18358330954 Heap objects: 16157800
But memory consumption grew linearly and never stopped. My question is: any ideas about what's going on?
I thought about memory fragmentation due to the large number of huge objects, but honestly I don't know what to think.

You could try the Go memory profiling tools.
First you need to alter your program so that it provides a memory profile. There are several ways to do this.
You can use package net/http/pprof (see https://golang.org/pkg/net/http/pprof/) if it is OK for you to expose that endpoint.
Or you can use package runtime/pprof and make your program dump a memory profile to a known location in response to a specific event, such as receiving a signal.
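A minimal sketch of the runtime/pprof approach, assuming a Unix platform; the choice of SIGUSR1 and the output path are arbitrary, not prescribed by the package:

package main

import (
	"log"
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

// dumpHeapOnSignal writes a heap profile to /tmp/heap.pprof
// each time the process receives SIGUSR1.
func dumpHeapOnSignal() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1)
	go func() {
		for range sigs {
			f, err := os.Create("/tmp/heap.pprof")
			if err != nil {
				log.Println("create heap profile:", err)
				continue
			}
			if err := pprof.WriteHeapProfile(f); err != nil {
				log.Println("write heap profile:", err)
			}
			f.Close()
		}
	}()
}

func main() {
	dumpHeapOnSignal()
	select {} // stand-in for the daemon's real work
}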
After that you can analyse the memory profile using go tool pprof. Invoke it either as go tool pprof <path/to/executable> <file> if you chose to dump the profile to a file, or as go tool pprof <path/to/executable> http://<host>:<port>/debug/pprof/heap if you used net/http/pprof. Use top5 to get the five functions that allocated the most memory, and then use the list command on specific functions to see which lines allocated how much.
Starting from that, you should be able to reason about the increase in memory you are observing.
You can also read about this at https://blog.golang.org/profiling-go-programs, which additionally describes how to profile CPU usage. Search for the word memprofile to jump to the relevant parts.
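For example, a session might look like this (the binary path, profile path, and function name are placeholders for illustration):

$ go tool pprof ./mydaemon /tmp/heap.pprof
(pprof) top5
(pprof) list ParseLine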

Related

CMS class unloading took much time

Under heavy load we noticed a large GC pause (400 ms) in our application. During the investigation it turned out that the pause happens during the CMS Final Remark, and that the class unloading phase takes far longer than the other phases (10x-100x):
(CMS Final Remark)
[YG occupancy: 142247 K (294912 K)]
2019-03-13T07:38:30.656-0700: 24252.576:
[Rescan (parallel) , 0.0216770 secs]
2019-03-13T07:38:30.677-0700: 24252.598:
[weak refs processing, 0.0028353 secs]
2019-03-13T07:38:30.680-0700: 24252.601:
[class unloading, 0.3232543 secs]
2019-03-13T07:38:31.004-0700: 24252.924:
[scrub symbol table, 0.0371301 secs]
2019-03-13T07:38:31.041-0700: 24252.961:
[scrub string table, 0.0126352 secs]
[1 CMS-remark: 2062947K(4792320K)] 2205195K(5087232K), 0.3986822 secs]
[Times: user=0.63 sys=0.01, real=0.40 secs]
Total time for which application threads were stopped: 0.4156259 seconds,
Stopping threads took: 0.0014133 seconds
This pause always happens in the first second of the performance test, and its duration varies from 300 ms to 400+ ms.
Unfortunately, I have no access to the server (it's under maintenance) and only have logs from several test runs. But I want to be prepared for further investigation when the server becomes available, and I have no idea what causes this behavior.
My first thought was Linux huge pages, but we don't use them.
After spending more time in the logs, I found the following:
Heap after GC invocations=7969 (full 511):
par new generation total 294912K, used 23686K [0x0000000687800000, 0x000000069b800000, 0x000000069b800000)
eden space 262144K, 0% used [0x0000000687800000, 0x0000000687800000, 0x0000000697800000)
from space 32768K, 72% used [0x0000000699800000, 0x000000069af219b8, 0x000000069b800000)
to space 32768K, 0% used [0x0000000697800000, 0x0000000697800000, 0x0000000699800000)
concurrent mark-sweep generation total 4792320K, used 2062947K [0x000000069b800000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 282286K, capacity 297017K, committed 309256K, reserved 1320960K
class space used 33038K, capacity 36852K, committed 38872K, reserved 1048576K
}
Heap after GC invocations=7970 (full 511):
par new generation total 294912K, used 27099K [0x0000000687800000, 0x000000069b800000, 0x000000069b800000)
eden space 262144K, 0% used [0x0000000687800000, 0x0000000687800000, 0x0000000697800000)
from space 32768K, 82% used [0x0000000697800000, 0x0000000699276df0, 0x0000000699800000)
to space 32768K, 0% used [0x0000000699800000, 0x0000000699800000, 0x000000069b800000)
concurrent mark-sweep generation total 4792320K, used 2066069K [0x000000069b800000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 282303K, capacity 297017K, committed 309256K, reserved 1320960K
class space used 33038K, capacity 36852K, committed 38872K, reserved 1048576K
}
The GC pause under investigation happens between GC invocations 7969 and 7970, and the amount of used metaspace is almost the same (it actually increased slightly).
So it looks like the problem is not stale classes that are no longer used (since no space was freed), and it is not a safepoint-reaching issue, since stopping the threads took very little time (0.0014133 s).
How should I investigate such a case, and what diagnostic information should I collect in order to be properly prepared?
Technical details
CentOS 5 + JDK 8 + CMS GC with args:
-XX:+CMSClassUnloadingEnabled
-XX:CMSIncrementalDutyCycleMin=10
-XX:+CMSIncrementalPacing
-XX:CMSInitiatingOccupancyFraction=50
-XX:+CMSParallelRemarkEnabled
-XX:+DisableExplicitGC
-XX:InitialHeapSize=5242880000
-XX:MaxHeapSize=5242880000
-XX:MaxNewSize=335544320
-XX:MaxTenuringThreshold=6
-XX:NewSize=335544320
-XX:OldPLABSize=16
-XX:+UseCompressedClassPointers
-XX:+UseCompressedOops
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
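For the next test run, assuming HotSpot JDK 8, standard tracing flags like the following could capture more detail about the remark pause (a suggestion for the investigation, not a known fix):

-XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
-XX:+TraceClassUnloading

In addition, jmap -clstats <pid> (available in JDK 8) prints per-class-loader statistics, which would show whether the number of class loaders or loaded classes grows between runs.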

Windows program has big native heap, much larger than all allocations

We are running a mixed mode process (managed + unmanaged) on Win 7 64 bit.
Our process is using up too much memory (especially VM). Based on our analysis, the majority of the memory is used by a big native heap. Our theory is that the LFH is saving too many free blocks in committed memory, for future allocations. They sum to about 1.2 GB while our actual allocated native memory is only at most 0.6 GB.
These numbers are from a test run of the process. In production it sometimes exceeded 10 GB of VM - with maybe 6 GB unaccounted for by known allocations.
We'd like to know if this theory of excessive committed-but-free-for-allocation segments is true, and how this waste can be reduced.
Here are the details of our analysis.
First we needed to figure out what's allocated and rule out memory leaks. We ran the excellent Heap Inspector by Jelle van der Beek, ruled out a leak, and established that the known allocations total at most 0.6 GB.
We took a full memory dump and opened in WinDbg.
Ran !heap -stat
It reports a big native heap with 1.83 GB of committed memory - much more than the sum of our allocations!
_HEAP 000000001b480000
Segments 00000078
Reserved bytes 0000000072980000
Committed bytes 000000006d597000
VirtAllocBlocks 0000001e
VirtAlloc bytes 0000000eb7a60118
Then we ran !heap -stat -h 0000001b480000
heap # 000000001b480000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
c0000 12 - d80000 (10.54)
b0000 d - 8f0000 (6.98)
e0000 a - 8c0000 (6.83)
...
If we add up all 20 reported items, they total 85 MB - much less than the 1.79 GB we're looking for.
We ran !heap -h 1b480000
...
Flags: 00001002
ForceFlags: 00000000
Granularity: 16 bytes
Segment Reserve: 72a70000
Segment Commit: 00002000
DeCommit Block Thres: 00000400
DeCommit Total Thres: 00001000
Total Free Size: 013b60f1
Max. Allocation Size: 000007fffffdefff
Lock Variable at: 000000001b480208
Next TagIndex: 0000
Maximum TagIndex: 0000
Tag Entries: 00000000
PsuedoTag Entries: 00000000
Virtual Alloc List: 1b480118
Unable to read nt!_HEAP_VIRTUAL_ALLOC_ENTRY structure at 000000002acf0000
Uncommitted ranges: 1b4800f8
FreeList[ 00 ] at 000000001b480158: 00000000be940080 . 0000000085828c60 (9451 blocks)
...
When adding up all the segment sizes in the report, we get:
Total Size = 1.83 GB
Segments Marked Busy Size = 1.50 GB
Segments Marked Busy and Internal Size = 1.37 GB
So all the committed bytes in this report do add up to the total commit size. We grouped by block size, and the heaviest allocations come from blocks of size 0x3fff0. These don't correspond to any allocations we know of. There were also mystery blocks of other sizes.
We ran !heap -p -all. This reports the LFH internal segments, but we don't understand it fully. The 0x3fff0-sized blocks from the previous report appear in the LFH report with an asterisk mark, sometimes Busy and sometimes Free. Inside them we see many smaller free blocks.
We guess these free blocks are legitimate: they are committed VM that the LFH reserves for future allocations. But why is their total size so much greater than the sum of our actual allocations, and can this be reduced?
Well, I can sort of answer my own question.
We had been doing lots and lots of tiny allocations and deallocations in our program. There was no leak, but it seems this created fragmentation of some sort. After consolidating and eliminating most of these allocations, our software runs much better and uses less peak memory. It is still a mystery why the peak committed memory was so much higher than the peak actually-used memory.

Where is the heap?

I understand that in Linux the mm_struct describes the memory layout of a process. I also understand that the start_brk and brk mark the start and end of the heap section of a process respectively.
Now, this is my problem: I have a process, for which I wrote the source code, that allocates 5.25 GB of heap memory using malloc. However, when I examine the process's mm_struct using a kernel module, I find that brk - start_brk is equal to 135168. This differs from what I expected: brk - start_brk should be slightly above 5.25 GB.
So, what is going on here?
Thanks.
I notice the following in the manpage for malloc(3):
Normally, malloc() allocates memory from the heap, and adjusts the size of the heap as required, using sbrk(2). When allocating blocks of memory larger than MMAP_THRESHOLD bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2). MMAP_THRESHOLD is 128 kB by default, but is adjustable using mallopt(3). Allocations performed using mmap(2) are unaffected by the RLIMIT_DATA resource limit (see getrlimit(2)).
So it sounds like mmap is used instead of the heap for an allocation this large, which is why brk - start_brk stays small.
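You can confirm this by tracing the relevant system calls (the program name below is a placeholder):

strace -e trace=brk,mmap ./yourprogram

A multi-gigabyte malloc should show up as anonymous mmap calls, while brk barely moves, matching the small brk - start_brk value you observed.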

What are reasons for "Cannot allocate memory" other than address-space exhaustion and memory fragmentation?

The problem is that in a 32-bit application on Mac OS X I receive this error:
malloc: *** mmap(size=49721344) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
For reference, the error code is defined in sys/errno.h:
#define ENOMEM 12 /* Cannot allocate memory */
The memory allocation pattern is like this:
1. Nearly 250 MB of memory is allocated.
2. 6 blocks of 32 MB are allocated.
3. Then 27 images are processed, each handled like this:
3.1. Allocate 16 MB (the image bitmap is loaded).
3.2. Allocate 32 MB, process it, free these 32 MB.
3.3. Allocate another 32 MB, process it, free these 32 MB.
3.4. Free the 16 MB allocated in step 3.1.
4. Free 4 of the blocks allocated in step 2 (2 blocks are still in use).
5. Free the 250 MB block allocated in step 1.
6. Allocate blocks of various sizes, total size not exceeding 250 MB. Here I receive the memory allocation error mentioned above.
I've checked that none of these memory blocks leak, so I guess the memory in use at any given time stays below 1 GB, which should be available to a 32-bit process.
My second guess was memory fragmentation. But I've checked that all blocks in step 3 reuse the same addresses, so I touch less than 1 GB of memory in total; fragmentation should not be an issue.
Now I am completely lost as to what could prevent the allocation. Also, everything works fine when I process fewer than 27 images. Here is part of the heap command output before step 6, for 26 images:
Process 1230: 4 zones
Zone DefaultMallocZone_0x273000: Overall size: 175627KB; 29620 nodes malloced for 68559KB (39% of capacity); largest unused: [0x6f800000-8191KB]
Zone DispatchContinuations_0x292000: Overall size: 4096KB; 1 nodes malloced for 1KB (0% of capacity); largest unused: [0x2600000-1023KB]
Zone QuartzCore_0x884400: Overall size: 232KB; 7039 nodes malloced for 132KB (56% of capacity); largest unused: [0x3778ca0-8KB]
Zone DefaultPurgeableMallocZone_0x27f2000: Overall size: 4KB; 0 nodes malloced for 0KB (0% of capacity); largest unused: [0x3723000-4KB]
All zones: 36660 nodes malloced - 68691KB
And for 27 images:
Process 1212: 4 zones
Zone DefaultMallocZone_0x273000: Overall size: 167435KB; 30301 nodes malloced for 68681KB (41% of capacity); largest unused: [0x6ea51000-32372KB]
Zone DispatchContinuations_0x292000: Overall size: 4096KB; 1 nodes malloced for 1KB (0% of capacity); largest unused: [0x500000-1023KB]
Zone QuartzCore_0x106b000: Overall size: 192KB; 5331 nodes malloced for 101KB (52% of capacity); largest unused: [0x37f2f98-8KB]
Zone DefaultPurgeableMallocZone_0x30f8000: Overall size: 4KB; 0 nodes malloced for 0KB (0% of capacity); largest unused: [0x368f000-4KB]
All zones: 35633 nodes malloced - 68782KB
So what other reasons for "Cannot allocate memory" are there, and how can I diagnose them? Or, if I made a mistake in ruling out the reasons above, how can I re-check them?
It turned out I had made a mistake when verifying that the address space was not exhausted. Instead of the heap command I should have used vmmap. vmmap revealed that most of the memory was used by images mapped into memory.
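As a side note on the two tools (the pid below is a placeholder): heap inspects only the malloc zones, while vmmap lists every virtual memory region of the process, including file-backed mappings, which is why the mapped images show up there:

vmmap <pid>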

managed heap fragmentation

I am trying to understand how heap fragmentation works. What does the following output tell me?
Is this heap overly fragmented?
I have 243010 "free objects" with a total of 53304764 bytes. Are those "free objects" gaps in the heap that once contained objects which have since been garbage collected?
How can I force a fragmented heap to clean up?
!dumpheap -type Free -stat
total 243233 objects
Statistics:
MT Count TotalSize Class Name
0017d8b0 243010 53304764 Free
It depends on how your heap is organized. You should look at how much memory is allocated in Gen 0, 1, and 2, and how much free memory you have there compared to the total used memory.
If you have a 500 MB managed heap in use and 50 MB of it is free, you are doing pretty well. If you do memory-intensive operations, like creating many WPF controls and releasing them, you need a lot more memory for a short time, but .NET does not give memory back to the OS once it has been allocated. The GC tries to recognize your allocation patterns and tends to keep the memory footprint high, even though the current heap size is way too big, until your machine runs low on physical memory.
I found it much easier to use psscor2 for .NET 3.5, which has some nice commands like ListNearObj that let you find out which objects are around your memory holes (pinned objects?). With the psscor2 commands you have a much better chance of finding out what is really going on in your heaps. Most of these commands are also available in SOS.dll for .NET 4.
To answer your original question: yes, free objects are gaps on the managed heap. A free object can simply be the free memory block after the last allocated object on a GC segment. Or, if you run !DumpHeap with the start address of a GC segment, you see the objects allocated in that segment along with the free objects, which are objects that have already been garbage collected.
These memory holes normally occur in Gen 2. The addresses of the objects before and after a free object tell you which potentially pinned objects surround the hole. From this you should be able to determine your allocation history and optimize it if you need to.
You can find the addresses of the GC Heaps with
0:021> !EEHeap -gc
Number of GC Heaps: 1
generation 0 starts at 0x101da9cc
generation 1 starts at 0x10061000
generation 2 starts at 0x02aa1000
ephemeral segment allocation context: none
segment begin allocated size
02aa0000 02aa1000** 03836a30 0xd95a30(14244400)
10060000 10061000** 103b8ff4 0x357ff4(3506164)
Large object heap starts at 0x03aa1000
segment begin allocated size
03aa0000 03aa1000 03b096f8 0x686f8(427768)
Total Size: Size: 0x115611c (18178332) bytes.
------------------------------
GC Heap Size: Size: 0x115611c (18178332) bytes.
There you see that you have heaps at 02aa1000 and 10061000.
With !DumpHeap 02aa1000 03836a30 you can dump the GC Heap segment.
!DumpHeap 02aa1000 03836a30
Address MT Size
...
037b7b88 5b408350 56
037b7bc0 60876d60 32
037b7be0 5b40838c 20
037b7bf4 5b408350 56
037b7c2c 5b408728 20
037b7c40 5fe4506c 16
037b7c50 60876d60 32
037b7c70 5b408728 20
037b7c84 5fe4506c 16
037b7c94 00135de8 519112 Free
0383685c 5b408728 20
03836870 5fe4506c 16
03836880 608c55b4 96
....
There you find your free memory blocks, each of which was an object that has already been GCed. You can dump the surrounding objects (the output is sorted by address) to find out whether they are pinned or have other unusual properties.
You have 50 MB of the managed heap as free space. This is not good.
Given that .NET allocates heap segments of 16 MB from the process, this does indeed point to a fragmentation issue.
There are plenty of reasons for fragmentation to occur in .NET.
Have a look here and here.
In your case it is possibly pinning: 53304764 / 243010 works out to 219.35 bytes per free block, much smaller than LOH objects (which are at least 85,000 bytes).
