How to measure the performance of the Erlang Garbage Collector?

I have started programming in Erlang recently and there are a few things I want to understand regarding garbage collection (GC). As far as I understand, there is a generational GC for the private heap of each process and a reference counting GC for the global shared heap.
What I would like to know is whether there is any way to get:
1. The number of collection cycles?
2. The number of bytes allocated and deallocated, at a global or per-process level?
3. The sizes of the private heaps and the shared heap? And can these be set as GC parameters?
4. How long garbage collection takes? The percentage of time spent in it?
5. Is there a way to run a program without GC?
Is there a way to get this kind of information, either with code or using some commands when I run an Erlang program?
Thanks.

To get information for a single process, you can call erlang:process_info(Pid). This will yield (as of Erlang 18.0) the following fields:
> erlang:process_info(self()).
[{current_function,{erl_eval,do_apply,6}},
{initial_call,{erlang,apply,2}},
{status,running},
{message_queue_len,0},
{messages,[]},
{links,[<0.27.0>]},
{dictionary,[]},
{trap_exit,false},
{error_handler,error_handler},
{priority,normal},
{group_leader,<0.26.0>},
{total_heap_size,4184},
{heap_size,2586},
{stack_size,24},
{reductions,3707},
{garbage_collection,[{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,65535},
{minor_gcs,7}]},
{suspending,[]}]
The number of collection cycles for the process is available in the field minor_gcs under the section garbage_collection.
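For example, to read just that counter for one process, here is a small sketch using erlang:process_info/2 and proplists:get_value/2:
%% Fetch only the garbage_collection item and pick out the minor GC count.
{garbage_collection, GcInfo} = erlang:process_info(self(), garbage_collection),
proplists:get_value(minor_gcs, GcInfo).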
Per Process
The current heap size for the process is available in the field heap_size from the results above (in words, where a word is 4 bytes on a 32-bit VM and 8 bytes on a 64-bit VM). The total memory consumption of the process can be obtained by calling erlang:process_info(Pid, memory), which returns, for example, {memory,34312} for the above process. This includes the call stack, heap and internal structures.
Deallocations (and allocations) can be traced using erlang:trace/3. With the trace flag garbage_collection you will receive messages of the form {trace, Pid, gc_start, Info} and {trace, Pid, gc_end, Info}. The Info field of the gc_start message contains such things as heap_size and old_heap_size.
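For example, a rough sketch that converts the heap size into bytes using the VM word size:
%% heap_size is reported in words; system_info(wordsize) gives bytes per word.
{heap_size, Words} = erlang:process_info(self(), heap_size),
Words * erlang:system_info(wordsize).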
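A minimal sketch of such a trace session, assuming Pid is bound to the process you want to observe (the trace messages arrive in the mailbox of the tracing process):
%% Enable GC tracing for Pid and wait for the next collection to start.
erlang:trace(Pid, true, [garbage_collection]),
receive
    {trace, Pid, gc_start, Info} ->
        %% Info is a list of tuples including heap_size and old_heap_size.
        proplists:get_value(heap_size, Info)
end.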
Per System
Top level statistics of the system can be obtained by erlang:memory/0:
> erlang:memory().
[{total,15023008},
{processes,4215272},
{processes_used,4215048},
{system,10807736},
{atom,202481},
{atom_used,187597},
{binary,325816},
{code,4575293},
{ets,234816}]
Garbage collection statistics can be obtained via erlang:statistics(garbage_collection) which yields:
> statistics(garbage_collection).
{85,23961,0}
Where (as of Erlang 18.0) the first element is the total number of garbage collections performed by the VM and the second element is the total number of words reclaimed.
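For example, a small sketch that samples the cumulative counters twice to see the GC activity over roughly one second:
%% Sample the cumulative counters twice; the result is the number of
%% collections and the number of words reclaimed during roughly one second.
{NumGCs0, Reclaimed0, _} = statistics(garbage_collection),
timer:sleep(1000),
{NumGCs1, Reclaimed1, _} = statistics(garbage_collection),
{NumGCs1 - NumGCs0, Reclaimed1 - Reclaimed0}.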
The heap sizes for a process are available under the fields total_heap_size (all heap fragments and stack) and heap_size (the size of the youngest heap generation) from the process info above.
They can be controlled via spawn options, specifically min_heap_size which sets the initial heap size for a process.
To set it for all processes, erlang:system_flag(min_heap_size, MinHeapSize) can be called.
You can also control global VM memory allocation via the +M... options to the Erlang VM; the flags are described in the erts_alloc documentation. However, this requires extensive knowledge about the internals of the Erlang VM and its allocators, and using them should not be taken lightly.
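A small sketch of both approaches (the fun and the 10000-word figure are placeholders, not recommendations):
%% Per process, at spawn time:
Pid = spawn_opt(fun() -> receive stop -> ok end end, [{min_heap_size, 10000}]),
%% System-wide default for processes spawned afterwards:
erlang:system_flag(min_heap_size, 10000).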
This can be obtained via the tracing described in answer 2. If you use the option timestamp when tracing, you will receive a timestamp with each trace message that can be used to calculate the total GC time.
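A rough sketch of pairing the timestamps, assuming Pid is the traced process; with the timestamp flag the messages become trace_ts tuples whose last element can be fed to timer:now_diff/2:
erlang:trace(Pid, true, [garbage_collection, timestamp]),
receive
    {trace_ts, Pid, gc_start, _InfoStart, T0} ->
        receive
            {trace_ts, Pid, gc_end, _InfoEnd, T1} ->
                %% Time spent in this particular collection, in microseconds.
                timer:now_diff(T1, T0)
        end
end.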
Short answer: no.
Long answer: Maybe. You can control the initial heap size (via min_heap_size) which will affect when garbage collection will occur the first time. You can also control when a full sweep will be performed with the fullsweep_after option.
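Building on the spawn_opt sketch above, both knobs can be set when the process is started; the values here are purely illustrative:
%% A large initial heap postpones the first minor collection; a high
%% fullsweep_after value postpones major (fullsweep) collections.
Pid = spawn_opt(fun() -> receive stop -> ok end end,
                [{min_heap_size, 100000}, {fullsweep_after, 65535}]).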
More information can be found in the Academic and Historical Questions and Processes section of the Efficiency Guide.
The most practical way of introspecting Erlang memory usage at runtime is via the Recon library, as Steve Vinoski mentioned.

Related

relationship between container_memory_working_set_bytes and process_resident_memory_bytes and total_rss

I'm looking to understand the relationship between
container_memory_working_set_bytes, process_resident_memory_bytes and total_rss (container_memory_rss) + file_mapped, so as to be better equipped for alerting on possible OOM situations.
The numbers go against my understanding (which is puzzling me right now), given that the container/pod runs a single process executing a compiled program written in Go.
Why is container_memory_working_set_bytes so much bigger (nearly 10 times more) than process_resident_memory_bytes?
Also, the relationship between container_memory_working_set_bytes and container_memory_rss + file_mapped is not what I expected after reading here:
The total amount of anonymous and swap cache memory (it includes transparent hugepages), and it equals to the value of total_rss from memory.status file. This should not be confused with the true resident set size or the amount of physical memory used by the cgroup. rss + file_mapped will give you the resident set size of cgroup. It does not include memory that is swapped out. It does include memory from shared libraries as long as the pages from those libraries are actually in memory. It does include all stack and heap memory.
So if the cgroup's total resident set size is rss + file_mapped, how can that value be less than container_memory_working_set_bytes for a container running in that cgroup?
This makes me feel there is something about these stats that I am not getting right.
The following PromQL queries were used to build the graph referred to above:
process_resident_memory_bytes{container="sftp-downloader"}
container_memory_working_set_bytes{container="sftp-downloader"}
go_memstats_heap_alloc_bytes{container="sftp-downloader"}
container_memory_mapped_file{container="sftp-downloader"} + container_memory_rss{container="sftp-downloader"}
So the relationship seems to be like this:
container_working_set_in_bytes = container_memory_usage_bytes - total_inactive_file
container_memory_usage_bytes, as its name implies, is the total memory used by the container; since it also includes file cache (i.e. inactive_file, which the OS can release under memory pressure), subtracting inactive_file gives container_working_set_in_bytes.
The relationship between container_memory_rss and container_working_set can be summed up with the following expression:
container_memory_usage_bytes = container_memory_cache + container_memory_rss
cache reflects data stored on disk that is currently cached in memory; it contains active + inactive file (mentioned above).
This explains why the container_working_set was higher.
Not really an answer, but still two assorted points.
Does this help to make sense of the chart?
Here at my $dayjob, we have faced various issues with how tools external to the Go runtime count and display the memory usage of a process executing a program written in Go.
Coupled with the fact that Go's GC on Linux does not actually release freed memory pages to the kernel but merely madvise(2)s that such pages are MADV_FREE, a GC cycle which has freed quite a hefty amount of memory does not result in any noticeable change in the "process RSS" readings taken by the external tooling (usually cgroups stats).
Hence we're exporting our own metrics obtained by periodically calling runtime.ReadMemStats (and runtime/debug.ReadGCStats) in any major service written in Go, with the help of a simple package written specifically for that. These readings reflect the true idea the Go runtime has about the memory under its control.
By the way, the NextGC field of the memory stats is super useful to watch if you have memory limits set for your containers because once that reading reaches or surpasses your memory limit, the process in the container is surely doomed to be eventually shot down by the oom_killer.

What is Peak Working Set in windows task manager

I'm confused about the Windows Task Manager memory overview.
In the general memory overview it shows "In use" 7.9 GB (in my sample; screenshot not reproduced here).
I've used Process Explorer to sum up the used memory, and it shows a figure close to that (screenshot not reproduced here). Since this is the nearest number to the 7.9 GB in Task Manager, I guess this is the value shown there.
Now my question:
What is the Peak working set?
If I hover over the column in Task Manager it shows a tooltip, and the Microsoft help says: "Maximum amount of working set memory used by the process."
Is it the memory effectively used by all processes right now, or the maximum amount of memory that was ever used by all processes?
The number you refer to is "Memory used by processes, drivers and the operating system" [source].
This is an easy but somewhat vague description. A similar description would be: the total amount of memory that is not free, not part of the buffer cache, and not part of the standby list.
It is not the maximum memory used at some time ("peak"), it's a coincidence that you have roughly the same number there. It is the presently used amount (used by "everyone", that is all programs and the OS).
The peak working set is a different thing. The working set is the amount of memory in a process (or, if you consider several processes, in all these processes) that is currently in physical memory. The peak working set is, consequently, the maximum value so far seen.
A process may allocate more memory than it actually ever commits ("uses"), and most processes will commit more memory than they have in their working set at one time. This is perfectly normal. Pages are moved in and out of working sets (and into the standby list) to assure that the computer, which has only a finite amount of memory, always has enough reserves to satisfy any memory needs.
The memory figures in question aren't actually a reliable indicator of how much memory a process is using.
A brief explanation of each of the memory relationships:
Private Bytes are what the process has allocated, including pagefile usage.
Working Set is the non-paged Private Bytes plus memory-mapped files.
Virtual Bytes are the Working Set plus paged Private Bytes and the standby list.
In answer to your question the peak working set is the maximum amount of physical RAM that was assigned to the process in question.
~ Update ~
Available memory is defined as the sum of the standby list plus free memory. There is far more to total memory usage than the sum all process working sets. Because of this and due to memory sharing this value is not generally very useful.
The virtual size of a process is the portion of the process's virtual address space that has been allocated for use. There is no relationship between this and physical memory usage.
Private bytes is the portion of a process's virtual address space that has been allocated for private use. It does not include shared memory or memory used for code. There is no relationship between this value and physical memory usage either.
Working set is the amount of physical memory in use by a process. Due to memory sharing there will be some double counting in this value.
The terms mentioned above aren't really going to mean very much until you understand the basic concepts in Windows memory management. Have a look HERE for some further reading.

Gradle heap error on uploadArchives

I am trying to upload an archive that's 600MB in size.
I get this error:
Execution failed for task ':uploadArchives'.
> Java heap space
...
...
Caused by: java.lang.OutOfMemoryError: Java heap space
I have tried to set GRADLE_OPTS, JVM_OPTS and MAVEN_OPTS variables, for setting the max. heap size, like for example:
export GRADLE_OPTS=-Xmx1024m
gradle uploadArchives
But I am still getting the same error.
What am I missing here?
Ultimately you always have a finite maximum of heap to use, no matter what platform you are running on. On 32-bit Windows this is around 2 GB (not specifically heap, but the total amount of memory per process). It just happens that Java makes the default smaller (presumably so that the programmer can't create programs with runaway memory allocation without running into this problem and having to examine exactly what they are doing).
Given this, there are several approaches you could take to either determine what amount of memory you need or to reduce the amount of memory you are using. One common mistake with garbage-collected languages such as Java or C# is to keep references to objects that you are no longer using, or to allocate many objects when you could reuse them instead. As long as objects have a reference to them they will continue to use heap space, as the garbage collector will not delete them.
In this case you can use a Java memory profiler to determine what methods in your program are allocating large numbers of objects and then determine if there is a way to make sure they are no longer referenced, or to not allocate them in the first place. One option which I have used in the past is "JMP" http://www.khelekore.org/jmp/.
If you determine that you are allocating these objects for a reason and you need to keep around references (depending on what you are doing this might be the case), you will just need to increase the max heap size when you start the program. However, once you do the memory profiling and understand how your objects are getting allocated you should have a better idea about how much memory you need.
In general if you can't guarantee that your program will run in some finite amount of memory (perhaps depending on input size) you will always run into this problem. Only after exhausting all of this will you need to look into caching objects out to disk etc. At this point you should have a very good reason to say "I need Xgb of memory" for something and you can't work around it by improving your algorithms or memory allocation patterns. Generally this will only usually be the case for algorithms operating on large datasets (like a database or some scientific analysis program) and then techniques like caching and memory mapped IO become useful.
Run Java with the command-line option -Xmx, which sets the maximum size of the heap.
http://docs.oracle.com/javase/7/docs/technotes/tools/windows/java.html#nonstandard
The problem was that the actual size of the package was a lot higher because Gradle was not able to handle symlinks.
When I handled the symlinks manually, the problem went away.
The JVM heap size can be set in the gradle.properties file, in the root directory of your gradle project. Like this:
org.gradle.jvmargs=-Xms256m -Xmx1024m

How to see hadoop's heap use?

I am doing a school assignment to analyze the use of heap in Hadoop. It involves running two versions of a MapReduce program to calculate the median of the length of forum comments: the first one is 'memory-unconscious' and the reducer handles, in memory, a list with the length of every comment; the second one is 'memory-conscious' and the reducer uses a very memory-efficient data structure to handle the data.
The purpose is to use both programs to process data of different sizes and watch how the memory usage goes up faster in the first one (until it eventually runs out of memory).
My question is: how can I obtain the heap usage of Hadoop or of the reduce tasks?
I thought the counter "Total committed heap usage (bytes)" would contain this data, but I have found that both versions of the program return almost the same values.
Regarding the correctness of the programs, the 'memory-unconscious' one runs out of memory with a large input and fails, while the other one does not and is able to finish.
Thanks in advance
I don't know which memory-conscious data structure you are using (telling us which one might help), but most such in-memory data structures make use of virtual memory: when the structure grows beyond a certain size, extra data elements are, according to some policy, dumped out to virtual memory, so you do not end up with an out-of-memory error. The memory-unconscious version doesn't do that. In both cases the committed heap size stays about the same, which is why you are getting almost identical values. To get the real memory usage inside the reducer you can do the following:
A feature added in Java 1.5 is the Instrumentation interface, through which you can get an object's memory usage (getObjectSize). A nice article about it: LINK
// Returns the amount of free memory in the Java Virtual Machine.
// Calling the gc method may result in increasing the value returned by freeMemory.
long freeMemory = Runtime.getRuntime().freeMemory();

// Returns the maximum amount of memory that the Java virtual machine will attempt to use.
// If there is no inherent limit then the value Long.MAX_VALUE will be returned.
long maximumMemory = Runtime.getRuntime().maxMemory();

// Returns the total amount of memory in the Java virtual machine. The value returned by this
// method may vary over time, depending on the host environment. Note that the amount of memory
// required to hold an object of any given type may be implementation-dependent.
long totalMemory = Runtime.getRuntime().totalMemory();

Garbage collection and memory management in Erlang

I want to know the technical details of garbage collection (GC) and memory management in Erlang/OTP.
But I cannot find them on erlang.org or in its documentation.
I have found some articles online which talk about GC in a very general manner, such as which garbage collection algorithm is used.
To classify things, let's define the memory layout and then talk about how GC works.
Memory Layout
In Erlang, each thread of execution is called a process. Each process has its own memory and that memory layout consists of three parts: Process Control Block, Stack and Heap.
PCB: Process Control Block holds information like process identifier (PID), current status (running, waiting), its registered name, and other such info.
Stack: It is a downward growing memory area which holds incoming and outgoing parameters, return addresses, local variables and temporary spaces for evaluating expressions.
Heap: It is an upward growing memory area which holds process mailbox messages and compound terms. Binary terms which are larger than 64 bytes are NOT stored in the process's private heap; they are stored in a large shared heap which is accessible by all processes.
Garbage Collection
Currently Erlang uses generational garbage collection that runs inside each Erlang process's private heap independently, and reference-counting garbage collection for the global shared heap.
Private Heap GC: It is generational, so it divides the heap into two segments: the young and old generations. There are also two strategies for collecting: Generational (Minor) and Fullsweep (Major). The generational GC collects just the young heap, while the fullsweep GC collects both the young and old heaps.
Shared Heap GC: It is reference counting. Each object in the shared heap (Refc) has a counter of references to it held by other objects (ProcBin), which are stored inside the private heaps of Erlang processes. If an object's reference counter reaches zero, the object has become inaccessible and will be destroyed.
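Collections normally run automatically, but you can also force one on a process and watch the effect on its heap size, for example with this small sketch using standard BIFs:
%% Inspect the heap size (in words), force a collection, then inspect it again.
HeapBefore = erlang:process_info(self(), heap_size),
erlang:garbage_collect(self()),
HeapAfter = erlang:process_info(self(), heap_size),
{HeapBefore, HeapAfter}.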
To get more details and performance hints, just look at my article which is the source of the answer: Erlang Garbage Collection Details and Why It Matters
A reference paper for the algorithm: One Pass Real-Time Generational Mark-Sweep Garbage Collection by Joe Armstrong and Robert Virding, 1995 (at CiteSeerX).
Abstract:
Traditional mark-sweep garbage collection algorithms do not allow reclamation of data until the mark phase of the algorithm has terminated. For the class of languages in which destructive operations are not allowed we can arrange that all pointers in the heap always point backwards towards "older" data. In this paper we present a simple scheme for reclaiming data for such language classes with a single pass mark-sweep collector. We also show how the simple scheme can be modified so that the collection can be done in an incremental manner (making it suitable for real-time collection). Following this we show how the collector can be modified for generational garbage collection, and finally how the scheme can be used for a language with concurrent processes.
Erlang has a few properties that make GC actually pretty easy.
1 - Every variable is immutable, so a variable can never point to a value that was created after it.
2 - Values are copied between Erlang processes, so the memory referenced in a process is almost always completely isolated.
Both of these (especially the latter) significantly limit the amount of the heap that the GC has to scan during a collection.
Erlang uses a copying GC. During a GC, the process is stopped then the live pointers are copied from the from-space to the to-space. I forget the exact percentages, but the heap will be increased if something like only 25% of the heap can be collected during a collection, and it will be decreased if 75% of the process heap can be collected. A collection is triggered when a process's heap becomes full.
The only exception is when it comes to large values that are sent to another process. These will be copied into a shared space and are reference counted. When a reference to a shared object is collected the count is decreased, when that count is 0 the object is freed. No attempts are made to handle fragmentation in the shared heap.
One interesting consequence of this is that, for a shared object, the size of the shared object does not contribute to the calculated size of a process's heap; only the size of the reference does. That means that if you have a lot of large shared objects, your VM could run out of memory before a GC is triggered.
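If you suspect this is happening, two standard calls give a quick picture (Pid is a placeholder for the process you are interested in):
%% Total bytes currently used by refc binaries in the shared area:
erlang:memory(binary).
%% Refc binaries that one particular process still references:
erlang:process_info(Pid, binary).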
Most of this is taken from the talk Jesper Wilhelmsson gave at EUC2012.
I don't know your background, but apart from the paper already pointed out by jj1bdx you can also give Jesper Wilhelmsson's thesis a chance.
BTW, if you want to monitor memory usage in Erlang to compare it to e.g. C++ you can check out:
Erlang Instrument Module
Erlang OS_MON Application
Hope this helps!
