Can cgroup really guarantee process do not interfere each other? - linux-kernel

I'm a fresh man of cgroup and I'm trying to use it control two C++ processes on my Linux server.
I set mem_limit of each process to 1G, which means it can consume at most 1GB memory, right?
But I think cgroup does not guarantee real isolation like VM, for example, one process can still read (or write) the memory of another process.
There's also competition between the two processes to grap free memory block as cgroup does not allocate anything to them.
Am I right?
What about the case in the cpu_set?
What's the difference between cgroup vs VM considering the isolation?
I Googled it but only got a lot of "docker vs vm", which is really not what I want.
Any tips from implementation of cgroups is really helpful.

First of all, you misunderstood what cgroups is. It is not an isolation tool, it is resource limiting tool that could limit memory, CPU, I/O consumption like mem_limit.
However, each process has its own, unique address space, so when process 1 is running on CPU, process 2 page tables are not used, so process 1 cannot get process 2 variable by simply dereferencing pointer. Virtual Memory is already an isolation technique.
There are some ways (used usually by debuggers) to access other's process memory in Linux:
/proc/PID/mem. If you check permissions on that file, you will see that only same user or root may access it.
process_vm_{readv,writev} system calls. They check if user has capability CAP_SYS_PTRACE.
So there are several options to forbid other processes to access others memory:
Run processes from different users that do not have CAP_SYS_PTRACE. Android does that.
Use Kernel Namespaces - process will not know if other exists - protection is performed on pid level. LXC uses this and Docker probably too.
Bare-Metal Virtualization: Xen, KVM, etc. Not only process page tables are isolated, but also a kernel too.
IMHO (1) is quite enough and (3) is for paranoics ;)

Related

How can I programmatically create a process snapshot on windows/unix?

I am creating a Golang program that creates a process and then should be able to suspend it.
To make it more memory efficient, I would need my program to be able to dump the memory of the process to disk and reload it only when needed.
I cannot find any info here on Stack Overflow and also GitHub is not helping.
Any solution?
Attempting to answer this with the limited info..
To make it more memory efficient, I would need my program to be able to dump the memory of the process to disk and reload it only when needed.
This is generally something handled by your operating system (scheduler, memory management) controlling what processes are currently running / suspended / etc. and what memory needs to be paged in / out. Trying to implement the equivalent is quite complex, error prone, and likely to be less performant. Why do you believe you need to implement this yourself?
If you are building a program and want to have explicit control about whether it should be considered runnable or not, you could create a process which forks (creating two total processes), and have the parent process suspend and resume the child process using signals:
https://man7.org/linux/man-pages/man7/signal.7.html

Can many (similar) processes use a common RAM cache?

As I understand the creation of processes, every process has it's own space in RAM for it's heap, data, etc, which is allocated upon its creation. Many processes can share their data and storage space in some ways. But since terminating a process would erase its allocated memory(so also its caches), I was wondering if it is possible that many (similar) processes share a cache in memory that is not allocated to any specific process, so that it can be used even when these processes are terminated and other ones are created.
This is a theoretical question from a student perspective, so I am merely interested in the general sence of an operating system, without adding more functionality to them to achieve it.
For example I think of a webserver that uses only single-threaded processes (maybe due to lack of multi-threading support), so that most of the processes created do similar jobs, like retrieving a certain page.
There are a least four ways what you describe can occur.
First, the system address space is shared by all processes. The Operating system can save data there that survives the death of a process.
Second, processes can map logical pages to the same physical page frame. The termination of one process does not cause the page frame to be deallocated to the other processes.
Third, some operating systems have support for writable shared libraries.
Fourth, memory mapped files.
There are probably others as well.
I think so, when a process is terminated the RAM clears it. However your right as things such as webpages will be stored in the Cache for when there re-called. For example -
You open Google and then go to another tab and close the open Google page, when you next go to Google it loads faster.
However, what I think your saying is if the Entire program E.G - Google Chrome or Safari - is closed, does the webpage you just had open stay in the cache? No, when the program is closed all its relative data is also terminated in order to fully close the program.
I guess this page has some info on it -
https://www.wikipedia.org/wiki/Shared_memory

free mem as function of command 'purge'

one of my app needs the function that free inactive/used/wired memory just like command 'purge'.
Check and google a lot, but can not get any hit
Welcome any comment
Purge doesn't do what you seem to think it does. It doesn't "free inactive/used/wired memory". As the manpage says:
It does not affect anonymous memory that has been allocated through malloc, vm_allocate, etc.
All it does is purge the disk cache. This is only useful if you're running performance tests and want to simulate the effects of "first run after cold boot" without actually cold booting. Again, from the manpage:
Purge can be used to approximate initial boot conditions with a cold disk buffer cache for performance analysis.
There is no public API for this, although a quick scan of the symbols shows that it seems to call a function CPOSXPurgeAllDiskBuffers from the CoreProfile private framework. I believe the underlying kernel and userland disk cache code is all or mostly available on http://www.opensource.apple.com, so you could do probably implement the same thing yourself, if you really want.
As iMysak says, you can just exec (or NSTask, etc.) the tool if you want to.
As a side note, it you could free used/wired memory, presumably that memory is used by something—even if you don't have pointers into it in your own data structures, malloc probably does. Are you trying to segfault your code?
Freeing inactive memory is a different story. Just freeing something up to malloc doesn't necessarily make malloc return it to the OS. And there's no way you can force it to. If you think about the way traditional UNIX works, it makes sense: When you ask it to allocate more memory, it uses sbrk to expand your data segment; if you free up memory at the top, it can sbrk back down, but if you free up memory in the middle, there's no way it can do that. Of course modern UNIX systems don't work that way, but the POSIX and C APIs are all designed to be compatible with systems that do. So, if you want to make sure memory gets freed, you have to handle memory allocation directly.
The simplest and most portable way to do this is to create and mmap a temporary backing file, or just MAP_ANON, and explicitly unmap pages when you're done with them. (This works on all POSIX systems—and, with a pretty simple wrapper, even Windows.) If you need even more control (e.g., to manually handle flushing pages to disk, etc.), you can use the mach/mach_vm.h APIs.
You can directly run it from OS // with exec() function

Windows Memory Workings - page tables, and data

I was trying to understand following:
I know that page tables are built for translation between virtual memory and physical memory by virtual memory manager at some point. Since there are many processes running on a system, even though only process active at a time, I was wondering whether page tables for inactive process are moved to page file at any point of time? Given the fact that lower 2 GB area is reserved for windows, it would make sense that windows would keep page tables for all processes on the system. Although it would make sense as well that they are moved to page file if the current process is switched?
Same goes for the writable (data) pages. Will windows keep all the data pages for all the process in memory or move them to page file at some point. On my machine, task manager says 1.5 GB RAM is being utilized out of 3 GB and 1.5 is system cache in performance tab so my understanding is data stays in physical memory for all applications. But would there be a time when it needs to moved to paging file?
I was wondering whether page tables for inactive process are moved to page file at any point of time?
Yes, page tables are pageable.
Will windows keep all the data pages for all the process in memory or move them to page file at some point.
As far as the Windows paging policy is concerned, there's two kinds of memory: pageable and non-pageable. It doesn't really matter which process it belongs to or even if it belongs to the O/S itself, if it's pageable then it's subject to being paged out. So, yes, Windows will page out process data pages if necessary.
I suggest reading the memory management chapter in the Windows Internals book, it should cover all of this.
-scott
You are actually asking two questions here.
What's the paging policy regarding the page tables.
What's the paging policy for "writable data" pages (i.e. virtual memory with R/W permissions).
First I'll correct you a little.
Given the fact that lower 2 GB area is reserved for windows, it would
make sense that windows would keep page tables for all processes on
the system
To be exact it's the upper 2GB that are reserved to windows, more correctly - may be accessed in the kernel mode only by Windows kernel and drivers.
Now, this may surprise you, but the kernel memory may be pagable too! So technically it's not important at all which portion of the 32-bit address space is visible in the user/kernel mode. It's not related to paging.
Another correction: virtual memory may be in physical memory and saved to the page file. There's a common belief that the OS frees physical storage by on-demand saving the pages to the page file. Wrong.
Actually Windows saves memory pages to the page file before they need to be freed. In fact it dumps all the memory pages to the page file (besides of those that are related to other files, such as mapped sections) in background. There are two reasons for this:
During high load the OS will free memory pages quicker (since they're already saved)
In the kernel mode paging is not always possible. Drivers that run on high IRQL (i.e. serve the most time-critical events) may not access physical storage drivers, hence paging is not possible.
So, the answers to your questions are:
Don't know for sure, but it depends on the OS implementation details. I see no reasons why per-process page table may not be paged-out. It's needed during the context switch and modifying process virtual memory. Both situations don't belong to the time-critical events.
Definitely "writable data" memory pages are saved to the page file. Are they removed from the physical memory? On-demand only, during the system load, in the least-recent-used order.

Set Windows process (or user) memory limit

Is there any way to set a system wide memory limit a process can use in Windows XP? I have a couple of unstable apps which do work ok for most of the time but can hit a bug which results in eating whole memory in a matter of seconds (or at least I suppose that's it). This results in a hard reset as Windows becomes totally unresponsive and I lose my work.
I would like to be able to do something like the /etc/limits on Linux - setting M90, for instance (to set 90% max memory for a single user to allocate). So the system gets the remaining 10% no matter what.
Use Windows Job Objects. Jobs are like process groups and can limit memory usage and process priority.
Use the Application Verifier (AppVerifier) tool from Microsoft.
In my case I need to simulate memory no longer being available so I did the following in the tool:
Added my application
Unchecked Basic
Checked Low Resource Simulation
Changed TimeOut to 120000 - my application will run normally for 2 minutes before anything goes into effect.
Changed HeapAlloc to 100 - 100% chance of heap allocation error
Set Stacks to true - the stack will not be able to grow any larger
Save
Start my application
After 2 minutes my program could no longer allocate new memory and I was able to see how everything was handled.
Depending on your applications, it might be easier to limit the memory the language interpreter uses. For example with Java you can set the amount of RAM the JVM will be allocated.
Otherwise it is possible to set it once for each process with the windows API
SetProcessWorkingSetSize Function
No way to do this that I know of, although I'm very curious to read if anyone has a good answer. I have been thinking about adding something like this to one of the apps my company builds, but have found no good way to do it.
The one thing I can think of (although not directly on point) is that I believe you can limit the total memory usage for a COM+ application in Windows. It would require the app to be written to run in COM+, of course, but it's the closest way I know of.
The working set stuff is good (Job Objects also control working sets), but that's not total memory usage, only real memory usage (paged in) at any one time. It may work for what you want, but afaik it doesn't limit total allocated memory.
Per process limits
From an end-user perspective, there are some helpful answers (and comments) at the superuser question “Is it possible to limit the memory usage of a particular process on Windows”, including discussions of how to set recursive quota limits on any or all of:
CPU assignment (quantity, affinity, NUMA groups),
CPU usage,
RAM usage (both ‘committed’ and ‘working set’), and
network usage,
… mostly via the built-in Windows ‘Job Objects’ system (as mentioned in #Adam Mitz’s answer and #Stephen Martin’s comment above), using:
the registry (for persistence, when desired) or
free tools, such as the open-source Process Governor.
(Note: nested Job Objects ~may~ not have been available under all earlier versions of Windows, but the un-nested version appears to date back to Windows XP)
Per-user limits
As far as overall per-user quotas:
??
It is possible that each user session is automatically assigned to a job group itself; if true, per-user limits should be able to be applied to that job group. Update: nope; Job Objects can only be nested at the time they are created or associated with a specific process, and in some cases a child Job Object is allowed to ‘break free’ from its parent and become independent, so they can’t facilitate ‘per-user’ resource limits.
(NTFS does support per-user file system ~storage~ quotas, though)
Per-system limits
Besides simple BIOS or ‘energy profile’ restrictions:
VM hypervisor or Kubernetes-style container resource limit controls may be the most straightforward (in terms of end-user understandability, at least) option.
Footnotes, regarding per-process and other resource quotas / QoS for non-Windows systems:
‘Classic’ Mac OS (including ‘classic’ applications running on 2000s-era versions of Mac OS X): per-application memory limits can be easily set within the ‘Memory’ section of the Finder ‘Get Info’ window for the target program; as a system using a cooperative multitasking concurrency model, per-process CPU limits were impossible.
BSD: ? (probably has some overlap with linux and non-proprietary macOS methods?)
macOS (aka ‘Mac OS X’): no user-facing interface; system support includes, depending on version, the ‘Multiprocessing Services API’, Grand Central Dispatch, POSIX threads / pthread, ‘operation objects’, and possibly others.
Linux: ‘Resource Manager’/limits.conf, control groups/‘cgroups’, process priority/‘niceness’/renice, others?
IBM z/OS and other mainframe-style systems: resource controls / allocation was built-in from nearly the beginning

Resources