systemd MemoryLimit not enforced - systemd

I am running systemd version 219.
root#EVOvPTX1_RE0-re0:/var/log# systemctl --version
systemd 219
+PAM -AUDIT -SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP -LIBCRYPTSETUP -GCRYPT +GNUTLS +ACL +XZ -LZ4 -SECCOMP +BLKID -ELFUTILS +KMOD -IDN
I have a service, let's call it foo.service which has the following.
[Service]
MemoryLimit=1G
I have deliberately added code to allocate 1M memory 4096 times which causes
4G memory alloc when a certain event is received. The idea is that after
the process consumes 1G of address space, memory alloc would start failing.
However, this does not seem to be the case. I am able to alloc 4G memory
without any issues. This tells me that the memory limit specified in the
service file is not enforced.
Can anyone let me know what am I missing ?
I looked at the proc file system - file named limits. This shows that the
Max address space is Unlimited, which also confirms that the memory limit
is not getting enforced.

This distinction is that you have allocated memory, but you haven't actually used it. In the output of top, this is the difference between the "VIRT" memory column (allocated) and the "RES" column (actually used).
Try modifying your experiment to assign values to elements of a large array instead of just allocating memory and see if you hit the memory limit that way.
Reference: Resident and Virtual memory on Linux: A short example

Related

Difference between aliased (ALI) and shared (SHM) memory on MacOS

I am using vmmap on MacOS. For one region it shows that sharing mode = aliased (ALI):
REGION TYPE START - END [ VSIZE RSDNT DIRTY SWAP] PRT/MAX SHRMOD PURGE REGION DETAIL
mapped file 1008dc000-1008e0000 [ 16K 16K 16K 0K] rw-/rwx SM=ALI /Users/USER/*/data
I wasn't able to find any information what does that mean. This page states that
Aliased (ALI) and shared (SHM) memory are shared between processes.
There is no further information about the difference between ALI and SHM. Can you help me understand what the difference is?
When the memory is shared (SHM) both processes can access is simultaneously.
However, when the memory is aliased (ALI) only one process at the time has the virtual address mapped to the physical memory. When second process tries accessing memory, these steps happen:
Process 2 gets page fault.
Kernel unmaps memory from the Process 1.
Kernel maps the memory to the Process 2.
Now, the process 2 can write/read from the memory.
This is different to how the memory works on linux where there is no aliased (ALI) mode, only shared.
Source.

Memory mapping in Virtual Address Space(VAS)

This [wiki article] about Virtual memory says:
The process then starts executing bytes in the exe file. However, the
only way the process can use or set '-' values in its VAS is to ask
the OS to map them to bytes from a file. A common way to use VAS
memory in this way is to map it to the page file.
A diagram follows :
0 4GB
VAS |---vvvvvvv----vvvvvv---vvvv----vv---v----vvv--|
mapping ||||||| |||||| |||| || | |||
file bytes app.exe kernel user system_page_file
I didn't understand the part values in its VAS is to ask the OS to map them to bytes from a file.
What is the system page file here?
First off, I can't imagine such a badly written article to exist in Wikipedia. One has to be an expert already familiar with the topic before being able to understand what was described.
Assuming you understand the rest of the article, the '-' part represents unallocated virtual address within the 4GB address space available to a process. So the sentence "the only way the process can use or set '-' values in its VAS is to ask the OS to map them to bytes from a file" means to allocate virtual memory address e.g. in a Windows native program calling VirtualAlloc(), or a C program calling malloc() to allocate some memory to store program data while those memory were not already existing in the current process's virtual address space.
When Windows allocates memory to a process address space, it normally associate those memory with the paging file in the hard disk. The c:\pagefile.sys is this paging file which is the system_page_file mentioned in the article. Memory page is swapped out to that file when there is not enough physical page to accommodate the demand.
Hope that clarifies

ext4 commit= mount option and dirty_writeback_centisecs

I'm tring to understand the way bytes go from write() to the phisical disk plate to tune my picture server performance.
Thing I don't understand is what is the difference between these two: commit= mount option and dirty_writeback_centisecs. Looks like they are about the same procces of writing changes to the storage device, but still different.
I do not get it clear which one fires first on the way to the disk for my bytes.
Yeah, I just ran into this investigating mount options for an SDCard Ubuntu install on an ARM Chromebook. Here's what I can tell you...
Here's how to see the dirty and writeback amounts:
user#chrubuntu:~$ cat /proc/meminfo | grep "Dirty" -A1
Dirty: 14232 kB
Writeback: 4608 kB
(edit: This dirty and writeback is rather high, I had a compile running when I ran this.)
So data to be written out is dirty. Dirty data can still be eliminated (if say, a temporary file is created, used, and deleted before it goes to writeback, it'll never have to be written out). As dirty data is moved into writeback, the kernel tries to combine smaller requests that may be into dirty into single larger I/O requests, this is one reason why dirty_expire_centisecs is usually not set too low. Dirty data is usually put into writeback when a) Enough data is cached to get up to vm.dirty_background_ratio. b) As data gets to be vm.dirty_writeback_centisecs centiseconds old (3000 default is 30 seconds) it is put into writeback. vm.dirty_writeback_centisecs, a writeback daemon is run by default every 500 centiseconds (5 seconds) to actually flush out anything in writeback.
fsync will flush out an individual file (force it from dirty into writeback and wait until it's flushed out of writeback), and sync does that with everything. As far as I know, it does this ASAP, bypassing any attempt to try to balance disk reads and writes, it stalls the device doing 100% writes until the sync completes.
commit=5 default ext4 mount option actually forces a sync() every 5 seconds on that filesystem. This is intended to ensure that writes are not unduly delayed if there's heavy read activity (ideally losing a maximum of 5 seconds of data if power is cut or whatever.) What I found with an Ubuntu install on SDCard (in a Chromebook) is that this actually just leads to massive filesystem stalls like every 5 seconds if you're writing much to the card, ChromeOS uses commit=600 and I applied that Ubuntu-side to good effect.
The dirty_writeback_centisecs, configures the daemons of the kernel Linux related to the virtual memory (that's why the vm). Which are in charge of making a write back from the RAM memory to all the storage devices, so if you configure the dirty_writeback_centisecs and you have 25 different storage devices mounted on your system it will have the same amount of time of writeback for all the 25 storage systems.
While the commit is done per storage device (actually is per filesystem) and is related to the sync process instead of the daemons from the virtual memory.
So you can see it as:
dirty_writeback_centisecs
writing from RAM to all filesystems
commit
each filesystem fetches from RAM

redis bgsave failed because fork Cannot allocate memory

all:
here is my server memory info with 'free -m'
total used free shared buffers cached
Mem: 64433 49259 15174 0 3 31
-/+ buffers/cache: 49224 15209
Swap: 8197 184 8012
my redis-server has used 46G memory, there is almost 15G memory left free
As my knowledge,fork is copy on write, it should not failed when there has 15G free memory,which is enough to malloc necessary kernel structures .
besides, when redis-server used 42G memory, bgsave is ok and fork is ok too.
Is there any vm parameter I can tune to make fork return success ?
More specifically, from the Redis FAQ
Redis background saving schema relies on the copy-on-write semantic of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent being a copy, but actually thanks to the copy-on-write semantic implemented by most modern operating systems the parent and child process will share the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the overcommit_memory setting is set to zero fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages, with the result that if you have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.
Setting overcommit_memory to 1 says Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.
Redis doesn't need as much memory as the OS thinks it does to write to disk, so may pre-emptively fail the fork.
Modify /etc/sysctl.conf and add:
vm.overcommit_memory=1
Then restart sysctl with:
On FreeBSD:
sudo /etc/rc.d/sysctl reload
On Linux:
sudo sysctl -p /etc/sysctl.conf
From proc(5) man pages:
/proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode. Values are:
0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit
In mode 0, calls of mmap(2) with MAP_NORESERVE set are not checked, and the default check is very weak, leading to the risk of getting a process "OOM-killed". Under Linux 2.4
any non-zero value implies mode 1. In mode 2 (available since Linux 2.6), the total virtual address space on the system is limited to (SS + RAM*(r/100)), where SS is the size
of the swap space, and RAM is the size of the physical memory, and r is the contents of the file /proc/sys/vm/overcommit_ratio.
Redis's fork-based snapshotting method can effectively double physical memory usage and easily OOM in cases like yours. Reliance on linux virtual memory for doing snapshotting is problematic, because Linux has no visibility into Redis data structures.
Recently a new redis-compatible project Dragonfly has been released. Among other things, it solves the OOM problem entirely. (disclosure - I am the author of this project).

Memory Usage in R

After creating large objects and running out of RAM, I will try and delete the objects in my current environment using
rm(list=ls())
When I check my RAM usage, nothing has changed. Even after calling gc() nothing has changed. I can only replenish my RAM by quitting R.
Anybody have advice for dealing with memory-intensive objects within R?
Memory for deleted objects is not released immediately. R uses a technique called "garbage collection" to reclaim memory for deleted objects. Periodically, it cycles through the list of accessible objects (basically, those that have names and have not been deleted and can therefore be accessed by the user), and "tags" them for retention. The memory for any untagged objects is returned to the operating system after the garbage-collection sweep.
Garbage collection happens automatically, and you don't have any direct control over this process. But you can force a sweep by calling the command gc() from the command line.
Even then, on some operating systems garbage collection might not reclaim memory (as reported by the OS). Older versions of Windows, for example, could increase but not decrease the memory footprint of R. Garbage collection would only make space for new objects in the future, but would not reduce the memory use of R.
On Windows, the technique you describe works for me. Try the following example.
Open the Windows Task Manager (CTRL+SHIFT+ESC).
Start RGui. RGui.exe mem usage is 27 460K.
Type
gcinfo(TRUE)
x <- rnorm(1e8)
RGui.exe mem usage is now 811 100K.
Type rm("x"). RGui.exe mem usage is still 811 100K.
Type gc(). RGui.exe mem usage is now 28 332K.
Note that gc shoud be called automatically if you have removed objects from your workspace, and then you try to allocate more memory to new variables.
My impression is that multiple forms of gc() are tried before R reports failed memory allocation. I'm not aware of a solution for this at present, other than restarting R as you suggest. It appears that R does not defragment memory.
An old question, I realize, but I've found that (on OS Mojave), invoking pryr::mem_used() in the R session causes the activity monitor to immediately update the reported memory usage to reflect only the objects retained in the R environment.

Resources