Is it possible to reclaim Private_Clean pages? - memory-management

I have a process that reads a bunch of stuff from an mmap()ed file and then does memory-intensive processing of some of this data (discarding the mmap()ed data as it gets processed). In my case, the mmap()ed file is from LMDB. After I mmap() the file, the /proc/<pid>/smaps entry looks something like this:
7fc32f29b000-7fc50c000000 rw-s 00000000 fc:02 75628978 /tmp/.tmp5SYf4y/data.mdb
Size: 7812500 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 285516 kB
Pss: 285516 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 285516 kB
Private_Dirty: 0 kB
Referenced: 285516 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 285516 kB
Let's say the process has only 300000 kB of physical RAM, limited by cgroups (and may or may not have swap available).
I understand that since all memory pages are Locked (pinned), they cannot be swapped out. After mmap()ing the file (i.e. reading from LMDB), the process then starts allocating more memory, beyond (physical RAM - Private_Clean) in the output above. Can these Private_Clean pages be evicted and reclaimed under the memory pressure of the new allocations from the same process?

Answering my own question: yes, clean mmap()ed pages are reclaimed when there is memory pressure from within the same process.
To demonstrate this, I added a loop that allocates heap memory after the LMDB file has been mmap()ed, and ran the process under a cgroup limiting its RSS to the mmap()ed file size. /proc/<pid>/smaps then shows that the Private_Clean RSS of the mmap()ed file starts dropping once the program enters the heap allocation loop, while the heap RSS and page counts grow correspondingly.
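For reference, here is a minimal C sketch of that experiment (not the original program; the file path argument, the 64 MB chunk size, the pause and the sleep are arbitrary choices for illustration). Run it inside a cgroup whose memory limit is smaller than the file size plus the heap it allocates, and watch /proc/<pid>/smaps: the Private_Clean RSS of the mapping shrinks as the dirty anonymous heap grows.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "data.mdb";   /* test file, e.g. an LMDB data file */
    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Same kind of mapping as the smaps entry above: shared, read/write. */
    unsigned char *map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    /* Fault every page in by reading it, so the mapping shows up as clean RSS. */
    volatile unsigned long sum = 0;
    for (off_t off = 0; off < st.st_size; off += 4096)
        sum += map[off];
    printf("mapping resident (checksum %lu); press Enter to start the heap loop\n", sum);
    getchar();

    /* Heap allocation loop: each chunk is written, so it becomes dirty anonymous
     * memory that cannot simply be dropped. Under the cgroup limit the kernel
     * reclaims the clean mapped pages instead. */
    for (;;) {
        unsigned char *chunk = malloc(64 << 20);
        if (!chunk) break;
        memset(chunk, 0xab, 64 << 20);
        sleep(1);   /* slow enough to watch smaps change */
    }
    return 0;
}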

Related

Possible eager mmap page eviction MacOS

I have a program which accesses a large memory block allocated with mmap. It accesses it unevenly, mostly accessing the first ~1 GB of memory, sometimes the next ~2 GB of memory, and rarely the last ~4 GB of memory. The memory is a shared memory mapping with PROT_READ and PROT_WRITE, backed by an unlinked file.
Compared to the Linux version, I've found the MacOS version is exceedingly slow. Yet, the memory pressure is low. (6.42 Used, 9.51 Cached.)
The following usage statistics come from Activity Monitor:
"Memory": 1.17 GB
Real memory Size: 3.71 GB
Virtual Memory Size: 51.15 GB
Shared Memory Size: 440 KB
Private Memory Size: 3.74 GB
Why is this? Is there any way to improve caching behavior?
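No answer is recorded here, but one thing commonly tried in this situation (a sketch only; base and the region boundaries stand in for the questioner's mapping, and I have not verified that it helps on MacOS) is to describe the skewed access pattern to the kernel with madvise(2), and optionally wire the hot region with mlock(2):

#include <stddef.h>
#include <sys/mman.h>

static void hint_access_pattern(unsigned char *base, size_t total_size)
{
    const size_t GB = 1UL << 30;

    /* Hot first ~1 GB: ask for readahead and keep it resident. */
    madvise(base, GB, MADV_WILLNEED);
    /* Optionally wire it so the pager leaves it alone (needs enough free RAM,
     * and possibly a raised RLIMIT_MEMLOCK). */
    mlock(base, GB);

    /* Rarely touched tail beyond ~3 GB: random access, skip readahead. */
    if (total_size > 3 * GB)
        madvise(base + 3 * GB, total_size - 3 * GB, MADV_RANDOM);
}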

Impala - out of memory exception. Slow queries

Can someone help me? I'm running a cluster of 5 Impala nodes for my API. Now I get a lot of 'out of memory' exceptions when I run queries.
Failed to get minimum memory reservation of 3.94 MB on daemon r5c3s4.colo.vm:22000 for query 924d155863398f6b:c4a3470300000000 because it would exceed an applicable memory limit. Memory is likely oversubscribed. Reducing query concurrency or configuring admission control may help avoid this error. Memory usage:
, node=[r4c3s2]
Process: Limit=55.00 GB Total=49.79 GB Peak=49.92 GB, node=[r4c3s2]
Buffer Pool: Free Buffers: Total=0, node=[r4c3s2]
Buffer Pool: Clean Pages: Total=4.21 GB, node=[r4c3s2]
Buffer Pool: Unused Reservation: Total=-4.21 GB, node=[r4c3s2]
Free Disk IO Buffers: Total=1.19 GB Peak=1.52 GB, node=[r4c3s2]
However, it says only 23.83 GB of the 150.00 GB limit is used. The queries have also become really slow. This problem appeared out of nowhere. Does anyone have an explanation for that?
Here is all the memory information I got from the "/memz?detailed=true" page of one node:
Memory Usage
Memory consumption / limit: 23.83 GB / 150.00 GB
Breakdown
Process: Limit=150.00 GB Total=23.83 GB Peak=58.75 GB
Buffer Pool: Free Buffers: Total=72.69 MB
Buffer Pool: Clean Pages: Total=0
Buffer Pool: Unused Reservation: Total=-71.94 MB
Free Disk IO Buffers: Total=1.61 GB Peak=1.67 GB
RequestPool=root.default: Total=20.77 GB Peak=59.92 GB
Query(2647a4f63d37fdaa:690ad3b500000000): Reservation=20.67 GB ReservationLimit=120.00 GB OtherMemory=101.21 MB Total=20.77 GB Peak=20.77 GB
Unclaimed reservations: Reservation=71.94 MB OtherMemory=0 Total=71.94 MB Peak=139.94 MB
Fragment 2647a4f63d37fdaa:690ad3b50000001c: Reservation=0 OtherMemory=114.48 KB Total=114.48 KB Peak=855.48 KB
AGGREGATION_NODE (id=9): Total=102.12 KB Peak=102.12 KB
Exprs: Total=102.12 KB Peak=102.12 KB
EXCHANGE_NODE (id=8): Total=0 Peak=0
DataStreamRecvr: Total=0 Peak=0
DataStreamSender (dst_id=10): Total=872.00 B Peak=872.00 B
CodeGen: Total=3.50 KB Peak=744.50 KB
Fragment 2647a4f63d37fdaa:690ad3b500000014: Reservation=0 OtherMemory=243.31 KB Total=243.31 KB Peak=1.57 MB
AGGREGATION_NODE (id=3): Total=102.12 KB Peak=102.12 KB
Exprs: Total=102.12 KB Peak=102.12 KB
AGGREGATION_NODE (id=7): Total=119.12 KB Peak=119.12 KB
Exprs: Total=119.12 KB Peak=119.12 KB
EXCHANGE_NODE (id=6): Total=0 Peak=0
DataStreamRecvr: Total=0 Peak=0
DataStreamSender (dst_id=8): Total=6.81 KB Peak=6.81 KB
CodeGen: Total=7.25 KB Peak=1.34 MB
Fragment 2647a4f63d37fdaa:690ad3b50000000c: Reservation=2.32 GB OtherMemory=349.48 KB Total=2.32 GB Peak=2.32 GB
AGGREGATION_NODE (id=2): Total=119.12 KB Peak=119.12 KB
Exprs: Total=119.12 KB Peak=119.12 KB
AGGREGATION_NODE (id=5): Reservation=2.32 GB OtherMemory=199.74 KB Total=2.32 GB Peak=2.32 GB
Exprs: Total=120.12 KB Peak=120.12 KB
EXCHANGE_NODE (id=4): Total=0 Peak=0
DataStreamRecvr: Total=336.00 B Peak=549.14 KB
DataStreamSender (dst_id=6): Total=6.44 KB Peak=6.44 KB
CodeGen: Total=15.85 KB Peak=3.10 MB
Fragment 2647a4f63d37fdaa:690ad3b500000004: Reservation=18.29 GB OtherMemory=100.52 MB Total=18.38 GB Peak=18.38 GB
AGGREGATION_NODE (id=1): Reservation=18.29 GB OtherMemory=334.12 KB Total=18.29 GB Peak=18.29 GB
Exprs: Total=148.12 KB Peak=148.12 KB
HDFS_SCAN_NODE (id=0): Total=100.17 MB Peak=178.15 MB
Exprs: Total=4.00 KB Peak=4.00 KB
DataStreamSender (dst_id=4): Total=6.75 KB Peak=6.75 KB
CodeGen: Total=9.72 KB Peak=2.92 MB
RequestPool=fe-eval-exprs: Total=0 Peak=12.00 KB
Untracked Memory: Total=1.44 GB
tcmalloc
------------------------------------------------
MALLOC: 24646559936 (23504.8 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 725840992 ( 692.2 MiB) Bytes in central cache freelist
MALLOC: + 4726720 ( 4.5 MiB) Bytes in transfer cache freelist
MALLOC: + 208077600 ( 198.4 MiB) Bytes in thread cache freelists
MALLOC: + 105918656 ( 101.0 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 25691123904 (24501.0 MiB) Actual memory used (physical + swap)
MALLOC: + 53904392192 (51407.2 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 79595516096 (75908.2 MiB) Virtual address space used
MALLOC:
MALLOC: 133041 Spans in use
MALLOC: 842 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
System
Physical Memory: 252.41 GB
Transparent Huge Pages Config:
enabled: always [madvise] never
defrag: [always] madvise never
khugepaged defrag: 1
Process and system memory metrics
Name Value Description
memory.anon-huge-page-bytes 19.01 GB Total bytes of anonymous (a.k.a. transparent) huge pages used by this process.
memory.mapped-bytes 113.09 GB Total bytes of memory mappings in this process (the virtual memory size).
memory.num-maps 18092 Total number of memory mappings in this process.
memory.rss 24.51 GB Resident set size (RSS) of this process, including TCMalloc, buffer pool and Jvm.
memory.thp.defrag [always] madvise never The system-wide 'defrag' setting for Transparent Huge Pages.
memory.thp.enabled always [madvise] never The system-wide 'enabled' setting for Transparent Huge Pages.
memory.thp.khugepaged-defrag 1 The system-wide 'defrag' setting for khugepaged.
memory.total-used 23.83 GB Total memory currently used by TCMalloc and buffer pool.
Buffer pool memory metrics
Name Value Description
buffer-pool.clean-page-bytes 0 Total bytes of clean page memory cached in the buffer pool.
buffer-pool.clean-pages 0 Total number of clean pages cached in the buffer pool.
buffer-pool.clean-pages-limit 12.00 GB Limit on number of clean pages cached in the buffer pool.
buffer-pool.free-buffer-bytes 72.69 MB Total bytes of free buffer memory cached in the buffer pool.
buffer-pool.free-buffers 177 Total number of free buffers cached in the buffer pool.
buffer-pool.limit 120.00 GB Maximum allowed bytes allocated by the buffer pool.
buffer-pool.reserved 20.67 GB Total bytes of buffers reserved by Impala subsystems
buffer-pool.system-allocated 20.67 GB Total buffer memory currently allocated by the buffer pool.
buffer-pool.unused-reservation-bytes 71.94 MB Total bytes of buffer reservations by Impala subsystems that are currently unused
JVM aggregate memory metrics
Name Value Description
jvm.total.committed-usage-bytes 1.45 GB Jvm total Committed Usage Bytes
jvm.total.current-usage-bytes 903.10 MB Jvm total Current Usage Bytes
jvm.total.init-usage-bytes 1.92 GB Jvm total Init Usage Bytes
jvm.total.max-usage-bytes 31.23 GB Jvm total Max Usage Bytes
jvm.total.peak-committed-usage-bytes 2.09 GB Jvm total Peak Committed Usage Bytes
jvm.total.peak-current-usage-bytes 1.48 GB Jvm total Peak Current Usage Bytes
jvm.total.peak-init-usage-bytes 1.92 GB Jvm total Peak Init Usage Bytes
jvm.total.peak-max-usage-bytes 31.41 GB Jvm total Peak Max Usage Bytes
JVM heap memory metrics
Name Value Description
jvm.heap.committed-usage-bytes 1.37 GB Jvm heap Committed Usage Bytes
jvm.heap.current-usage-bytes 827.25 MB Jvm heap Current Usage Bytes
jvm.heap.init-usage-bytes 2.00 GB Jvm heap Init Usage Bytes
jvm.heap.max-usage-bytes 26.67 GB Jvm heap Max Usage Bytes
jvm.heap.peak-committed-usage-bytes 0 Jvm heap Peak Committed Usage Bytes
jvm.heap.peak-current-usage-bytes 0 Jvm heap Peak Current Usage Bytes
jvm.heap.peak-init-usage-bytes 0 Jvm heap Peak Init Usage Bytes
jvm.heap.peak-max-usage-bytes 0 Jvm heap Peak Max Usage Bytes
JVM non-heap memory metrics
Name Value Description
jvm.non-heap.committed-usage-bytes 76.90 MB Jvm non-heap Committed Usage Bytes
jvm.non-heap.current-usage-bytes 75.68 MB Jvm non-heap Current Usage Bytes
jvm.non-heap.init-usage-bytes 2.44 MB Jvm non-heap Init Usage Bytes
jvm.non-heap.max-usage-bytes -1.00 B Jvm non-heap Max Usage Bytes
jvm.non-heap.peak-committed-usage-bytes 0 Jvm non-heap Peak Committed Usage Bytes
jvm.non-heap.peak-current-usage-bytes 0 Jvm non-heap Peak Current Usage Bytes
jvm.non-heap.peak-init-usage-bytes 0 Jvm non-heap Peak Init Usage Bytes
jvm.non-heap.peak-max-usage-bytes 0 Jvm non-heap Peak Max Usage Bytes
Process: Limit=150.00 GB Total=23.83 GB Peak=58.75 GB
This is caused by a memory limit being exceeded ("Memory limit exceeded").
Change the memory.soft_limit_in_bytes, memory.limit_in_bytes, mem_limit, and default_pool_mem_limit settings to 0 or -1.
-1 or 0 represents unlimited.

Should the stats reported by Go's runtime.ReadMemStats approximately equal the resident memory set reported by ps aux?

In Go, should the "Sys" stat, or any other stat/combination reported by runtime.ReadMemStats, approximately equal the resident set size reported by ps aux?
Alternatively, assuming some memory may be swapped out, should the Sys stat be approximately greater than or equal to the RSS?
We have a long-running web service that deals with a high frequency of requests and we are finding that the RSS quickly climbs up to consume virtually all of the 64GB memory on our servers. When it hits ~85% we begin to experience considerable degradation in our response times and in how many concurrent requests we can handle. The run I've listed below is after about 20 hours of execution, and is already at 51% memory usage.
I'm trying to determine if the likely cause is a memory leak (we make some calls to CGO). The data seems to indicate that it is, but before I go down that rabbit hole I want to rule out a fundamental misunderstanding of the statistics I'm using to make that call.
This is an amd64 build targeting linux and executing on CentOS.
Reported by runtime.ReadMemStats:
Alloc: 1294777080 bytes (1234.80MB) // bytes allocated and not yet freed
Sys: 3686471104 bytes (3515.69MB) // bytes obtained from system (sum of XxxSys below)
HeapAlloc: 1294777080 bytes (1234.80MB) // bytes allocated and not yet freed (same as Alloc above)
HeapSys: 3104931840 bytes (2961.09MB) // bytes obtained from system
HeapIdle: 1672339456 bytes (1594.87MB) // bytes in idle spans
HeapInuse: 1432592384 bytes (1366.23MB) // bytes in non-idle span
Reported by ps aux:
%CPU %MEM VSZ RSS
1362 51.3 306936436 33742120

Why doesn't this Ruby program return off heap memory to the operating system?

I am trying to understand when memory allocated off the Ruby heap gets returned to the operating system. I understand that Ruby never returns memory allocated to its heap, but I am still not sure about the behaviour of off-heap memory, i.e. those objects that don't fit into a 40-byte RVALUE.
Consider the following program that allocates some large strings and then forces a major GC.
require 'objspace'
STRING_SIZE = 250
def print_stats(msg)
  puts '-------------------'
  puts msg
  puts '-------------------'
  puts "RSS: #{`ps -eo rss,pid | grep #{Process.pid} | grep -v grep | awk '{ print $1,"KB";}'`}"
  puts "HEAP SIZE: #{(GC.stat[:heap_sorted_length] * 408 * 40)/1024} KB"
  puts "SIZE OF ALL OBJECTS: #{ObjectSpace.memsize_of_all/1024} KB"
end
def run
  print_stats('START WORK')
  @data = []
  600_000.times do
    @data << " " * STRING_SIZE
  end
  print_stats('END WORK')
  @data = nil
end
run
GC.start
print_stats('AFTER FORCED MAJOR GC')
Running this program with Ruby 2.2.3 on MRI produces the following output. After a forced major GC, the heap size is as expected, but RSS has not decreased significantly.
-------------------
START WORK
-------------------
RSS: 7036 KB
HEAP SIZE: 1195 KB
SIZE OF ALL OBJECTS: 3172 KB
-------------------
END WORK
-------------------
RSS: 205660 KB
HEAP SIZE: 35046 KB
SIZE OF ALL OBJECTS: 178423 KB
-------------------
AFTER FORCED MAJOR GC
-------------------
RSS: 164492 KB
HEAP SIZE: 35046 KB
SIZE OF ALL OBJECTS: 2484 KB
Compare these results to what happens when we allocate one large object instead of many smaller objects.
def run
  print_stats('START WORK')
  @data = " " * STRING_SIZE * 600_000
  print_stats('END WORK')
  @data = nil
end
-------------------
START WORK
-------------------
RSS: 7072 KB
HEAP SIZE: 1195 KB
SIZE OF ALL OBJECTS: 3170 KB
-------------------
END WORK
-------------------
RSS: 153584 KB
HEAP SIZE: 1195 KB
SIZE OF ALL OBJECTS: 149064 KB
-------------------
AFTER FORCED MAJOR GC
-------------------
RSS: 7096 KB
HEAP SIZE: 1195 KB
SIZE OF ALL OBJECTS: 2483 KB
Note the final RSS value. We seem to have freed all the memory we allocated for the big string.
I am not sure why the second example releases the memory but the first example doesn't, as they are both allocating memory off the Ruby heap. This is one reference that could provide an explanation, but I would be interested in explanations from others.
Releasing memory back to the kernel also has a cost. User space memory
allocators may hold onto that memory (privately) in the hope it can be
reused within the same process and not give it back to the kernel for
use in other processes.
@joanbm has a very good point here. His referenced article explains this pretty well:
Ruby's GC releases memory gradually, so when you run GC on one big chunk of memory pointed to by one reference, it releases it all at once; but when there are a lot of references, the GC releases memory in smaller chunks.
Several calls to GC.start will release more and more memory in the 1st example.
Here are two other articles to dig deeper:
http://thorstenball.com/blog/2014/03/12/watching-understanding-ruby-2.1-garbage-collector/
https://samsaffron.com/archive/2013/11/22/demystifying-the-ruby-gc
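The quoted point about user-space allocators is not specific to MRI. As a hedged illustration in C (glibc-specific, and only analogous to the Ruby example above, using 600,000 blocks of 250 bytes), freeing many small blocks usually leaves RSS high until the allocator is explicitly asked to give pages back:

#include <malloc.h>    /* malloc_trim() is a glibc extension */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define COUNT 600000
#define SIZE  250

int main(void)
{
    static char *blocks[COUNT];

    for (size_t i = 0; i < COUNT; i++) {     /* ~150 MB in 250-byte pieces */
        blocks[i] = malloc(SIZE);
        if (!blocks[i]) { perror("malloc"); return 1; }
        memset(blocks[i], ' ', SIZE);
    }
    puts("allocated - check RSS with: ps -o rss= -p <pid>"); getchar();

    for (size_t i = 0; i < COUNT; i++)
        free(blocks[i]);
    puts("freed - RSS is usually still high"); getchar();

    malloc_trim(0);                          /* ask glibc to return free pages to the kernel */
    puts("after malloc_trim - RSS should drop"); getchar();
    return 0;
}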

Windows program has big native heap, much larger than all allocations

We are running a mixed mode process (managed + unmanaged) on Win 7 64 bit.
Our process is using up too much memory (especially VM). Based on our analysis, the majority of the memory is used by a big native heap. Our theory is that the LFH is keeping too many free blocks in committed memory for future allocations. These sum to about 1.2 GB, while our actual allocated native memory is at most 0.6 GB.
These numbers are from a test run of the process. In production it sometimes exceeded 10 GB of VM - with maybe 6 GB unaccounted for by known allocations.
We'd like to know if this theory of excessive committed-but-free-for-allocation segments is true, and how this waste can be reduced.
Here's the details of our analysis.
First we needed to figure out what's allocated and rule out memory leaks. We ran the excellent Heap Inspector by Jelle van der Beek, ruled out a leak, and established that the known allocations total at most 0.6 GB.
We took a full memory dump and opened in WinDbg.
Ran !heap -stat
It reports a big native heap with 1.83 GB of committed memory - much more than the sum of our allocations!
_HEAP 000000001b480000
Segments 00000078
Reserved bytes 0000000072980000
Committed bytes 000000006d597000
VirtAllocBlocks 0000001e
VirtAlloc bytes 0000000eb7a60118
Then we ran !heap -stat -h 0000001b480000
heap # 000000001b480000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
c0000 12 - d80000 (10.54)
b0000 d - 8f0000 (6.98)
e0000 a - 8c0000 (6.83)
...
If we add up all 20 reported items, they total about 85 MB - much less than the 1.79 GB we're looking for.
We ran !heap -h 1b480000
...
Flags: 00001002
ForceFlags: 00000000
Granularity: 16 bytes
Segment Reserve: 72a70000
Segment Commit: 00002000
DeCommit Block Thres: 00000400
DeCommit Total Thres: 00001000
Total Free Size: 013b60f1
Max. Allocation Size: 000007fffffdefff
Lock Variable at: 000000001b480208
Next TagIndex: 0000
Maximum TagIndex: 0000
Tag Entries: 00000000
PsuedoTag Entries: 00000000
Virtual Alloc List: 1b480118
Unable to read nt!_HEAP_VIRTUAL_ALLOC_ENTRY structure at 000000002acf0000
Uncommitted ranges: 1b4800f8
FreeList[ 00 ] at 000000001b480158: 00000000be940080 . 0000000085828c60 (9451 blocks)
...
When adding up all the segment sizes in the report, we get:
Total Size = 1.83 GB
Segments Marked Busy Size = 1.50 GB
Segments Marked Busy and Internal Size = 1.37 GB
So all the committed bytes in this report do add up to the total commit size. We grouped on block size and the heaviest allocations come from blocks of size 0x3fff0. These don't correspond to allocations that we know of. There were also mystery blocks of other sizes.
We ran !heap -p -all. This reports the LFH internal segments, but we don't understand it fully. Those 0x3fff0-sized blocks from the previous report appear in the LFH report with an asterisk mark and are sometimes Busy and sometimes Free. Inside them we see many smaller free blocks.
We guess these free blocks are legitimate. They are committed VM that the LFH reserves for future allocations. But why is their total size so much greater than the sum of our memory allocations, and can this be reduced?
Well, I can sort of answer my own question.
We had been doing lots and lots of tiny allocations and deallocations in our program. There was no leak, but it seems this somehow created fragmentation of some sort. After consolidating and eliminating most of these allocations, our software is running much better and using less peak memory. It is still a mystery why the peak committed memory was so much higher than the peak actually-used memory.
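As a footnote on "can this be reduced": one thing that can be tried (a sketch only, not verified against this particular process) is to walk the process heaps and ask each NT heap to coalesce its free lists and decommit what it can via HeapCompact. In practice the LFH tends to keep its subsegments committed, so the gain may be small; the real fix here, as described above, was consolidating the tiny allocations.

#include <windows.h>
#include <stdio.h>

static void compact_all_heaps(void)
{
    HANDLE heaps[256];
    DWORD count = GetProcessHeaps(sizeof(heaps) / sizeof(heaps[0]), heaps);
    if (count == 0 || count > sizeof(heaps) / sizeof(heaps[0]))
        return;   /* error, or more heaps than this buffer holds */

    for (DWORD i = 0; i < count; i++) {
        /* Coalesces adjacent free blocks and decommits large free regions where possible. */
        SIZE_T largest_free = HeapCompact(heaps[i], 0);
        printf("heap %p: largest committed free block now %zu bytes\n",
               (void *)heaps[i], (size_t)largest_free);
    }
}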
