Consume 'free' memory on a Linux machine at the expense of cache

When using the tool 'free' on Linux, we can see several memory-related values:
[root@coconut-stateless-clients-5 ~] 2021-08-03 17:28:07 $ free
              total        used        free      shared  buff/cache   available
Mem:       62907052      382180    61985152        4788      539720    61933812
Swap:             0           0           0
I need to lower the 'free' memory value and keep the 'available' value unchanged (as much as I can).
How can I 'fill' up the cache memory at the expense of 'free' on a Linux machine?

Cache memory is filled by the kernel in various situations. Usually the operating system keeps there binaries or other files it is currently working with; for example, data that is displayed or sent to other machines is kept in the cache memory.
That mechanism can be used to deliberately load files or data into the cache, thereby achieving both goals: reducing 'free' memory and filling 'cache'.
For that we can use the 'head' utility, which reads lines or bytes from a file.
If the data is merely read into memory and discarded, it only passes through the cache momentarily. If it is instead written to a file, the data stays cached in memory until that memory is required for another purpose (and no other space is left).
This article covers the details in more depth, but the following examples suffice if you just want to achieve the goals above.
Fill 'cache' with x GiB/MiB of data and reduce 'free' by the same amount:
2GiB example:
# head -c 2G /dev/urandom > dummy.file
250MiB example:
# head -c 250M /dev/urandom > dummy.file
In order to free up ALL the cached space, run this command:
# echo 3 > /proc/sys/vm/drop_caches
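As a minimal sketch tying these steps together (run as root; the /tmp/dummy.file path and the 1 GiB size are arbitrary choices, and the sync makes the freshly written pages clean so that drop_caches can actually release them afterwards):
free -h                                      # note the 'free' and 'buff/cache' columns before
head -c 1G /dev/urandom > /tmp/dummy.file    # the written pages land in the page cache
sync                                         # flush them to disk so they become clean, droppable cache
free -h                                      # 'free' should drop and 'buff/cache' grow by roughly 1 GiB
echo 3 > /proc/sys/vm/drop_caches            # undo: drop the cached pages again
rm -f /tmp/dummy.file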

Related

Need bash script that constantly uses high memory but low cpu?

I am running a few experiments to see changes in system behavior under different memory and CPU loads. I was wondering: is there a bash script that constantly uses high memory but low CPU?
For the purpose of simulating CPU/memory/IO load, most *NIX systems (Linux included) provide a handy tool called stress.
The tool varies from OS to OS. On Linux, to take up 512MB of RAM with low CPU load:
stress --vm 1 --vm-bytes 512M --vm-hang 100
(The invocation means: start one memory worker (--vm 1), have it allocate and free 512MB of memory (--vm-bytes 512M), and sleep for 100 seconds before each free (--vm-hang 100).)
This is silly, and can't be reasonably expected to provide data which will be useful in any real-world scenario. However, to generate at least the amount of memory consumption associated with a given power-of-two number of bytes:
build_string() {
    local pow=$1     # power of two: target length is 2**pow bytes
    local dest=$2    # name of the variable to store the result in
    local s=' '      # start from a single byte
    local i
    for (( i=0; i<pow; i++ )); do
        s+="$s"      # double the string on each pass
    done
    printf -v "$dest" %s "$s"   # assign the result to the named variable
}
build_string 10 kilobyte # build a string of length 1024
echo "Kilobyte string consumes ${#kilobyte} bytes"
build_string 20 megabyte # build a string of length 1048576
echo "Megabyte string consumes ${#megabyte} bytes"
Note that transiently, during construction, at least 2x the requested space will be required (the local copy plus the destination variable); a version that didn't have this behavior would either use namevars (requiring bash 4.3) or eval (requiring the author's willingness to do evil).
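For completeness, here is a minimal sketch of the namevar variant mentioned above (assumes bash 4.3+; build_string_nameref is just an illustrative name, not part of the original answer). It writes into the caller's variable directly instead of building a local copy and assigning it at the end:
build_string_nameref() {
    local pow=$1
    local -n ref=$2      # nameref to the caller's destination variable
    ref=' '              # start from a single byte
    local i
    for (( i=0; i<pow; i++ )); do
        ref+="$ref"      # double in place
    done
}
build_string_nameref 20 megabyte2   # build a string of length 1048576
echo "Megabyte string consumes ${#megabyte2} bytes"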

Why can't we mark an already existing memory region as WriteCombined by using `cudaHostRegister()`?

In the CUDA SDK, the function cudaHostAlloc() for allocating a new memory region can take the flags:
cudaHostAllocDefault (the default, 0; causes cudaHostAlloc() to emulate cudaMallocHost())
cudaHostAllocPortable
cudaHostAllocMapped
cudaHostAllocWriteCombined
To mark a memory region that is already allocated, we can use cudaHostRegister() with the flags:
0 (default)
cudaHostRegisterPortable
cudaHostRegisterMapped
Why can we mark memory as WriteCombined when allocating it with cudaHostAlloc() and the cudaHostAllocWriteCombined flag, but cannot mark an already existing memory region as WriteCombined by using cudaHostRegister()?
Must already allocated memory be marked as WriteCombined only through the Linux kernel function set_memory_wc()?
I did not know of any APIs that could change the cacheability of an existing VA range until you referenced set_memory_wc(). Such an operation would be extremely expensive due to all the cache flushes and TLB shootdowns that would be required; and the memory would basically be unreadable until you found some way to unmark it as WC.
Why are you trying to use WC memory? On pre-i7 (Nehalem) CPUs, WC had slightly higher transfer performance (IIRC) because it inhibited snooping of PCI Express traffic to and from the memory. But on Nehalem and later CPUs, I don't know of any application that has concretely demonstrated a benefit from WC memory.

Linux memory overcommit details

I am developing software for embedded Linux and I am suffering system hangs because the OOM killer appears from time to time. Before going further I would like to clear up some confusion about how the Linux kernel allocates dynamic memory, assuming /proc/sys/vm/overcommit_memory is 0, /proc/sys/vm/min_free_kbytes is 712, and there is no swap.
Suppose the embedded Linux system currently has 5MB of physical memory available (5MB of free memory and no usable cached or buffered memory). If I write this piece of code:
.....
#define MEGABYTE 1024*1024
.....
.....
void *ptr = NULL;
ptr = (void *) malloc(6*MEGABYTE); //Preserving 6MB
if (!ptr)
    exit(1);
memset(ptr, 1, MEGABYTE);
.....
I would like to know whether, when the memset call runs, the kernel will try to allocate ~6MB or ~1MB (or a multiple of min_free_kbytes) of physical memory.
Right now there is about 9MB free on my embedded device, which has 32MB of RAM. I check it by doing:
# echo 3 > /proc/sys/vm/drop_caches
# free
             total       used       free     shared    buffers
Mem:         23732      14184       9548          0        220
Swap:            0          0          0
Total:       23732      14184       9548
Setting aside that last piece of C code, I would like to know whether it is possible for the OOM killer to appear when, for instance, free memory is above ~6MB.
I want to know whether the system is really out of memory when the OOM killer appears, so I think I have two options:
Check the VmRSS entries in /proc/<pid>/status of the suspicious processes.
Set /proc/sys/vm/overcommit_memory = 2 and /proc/sys/vm/overcommit_ratio = 75 and see if there is any process requiring more than the physical memory available.
I think you can read this document. It provides three small C programs that you can use to understand what happens with the different possible values of /proc/sys/vm/overcommit_memory.
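As a quick aside (not from the linked document): the kernel's own commit accounting can be read from the shell, which helps tell whether the OOM killer fired because the commit limit was hit rather than because free memory reached zero. A sketch:
cat /proc/sys/vm/overcommit_memory                  # 0 = heuristic, 1 = always, 2 = strict accounting
cat /proc/sys/vm/overcommit_ratio                   # percentage of RAM used for CommitLimit in mode 2
grep -E 'CommitLimit|Committed_AS' /proc/meminfo    # the limit vs. memory already promised to processes
cat /proc/sys/vm/min_free_kbytes                    # minimum free memory the kernel tries to keep (your 712)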

Fortran array memory management

I am working to optimize a fluid flow and heat transfer analysis program written in Fortran. As I try to run larger and larger mesh simulations, I'm running into memory limitation problems. The mesh, though, is not all that big: only 500,000 cells, small peanuts for a typical CFD code. Even when I request 80 GB of memory for my problem, it's crashing due to insufficient virtual memory.
I have a few guesses at which arrays are hogging up all that memory. One in particular is being allocated as (28801,345600). Correct me if I'm wrong in my calculations, but a double precision value is 8 bytes. So the size of this array would be 28801*345600*8 ≈ 79.6 GB?
Now, I think that most of this array ends up being zeros throughout the calculation so we don't need to store them. I think I can change the solution algorithm to only store the non-zero values to work on in a much smaller array. However, I want to be sure that I'm looking at the right arrays to reduce in size. So first, did I correctly calculate the array size above? And second, is there a way I can have Fortran show array sizes in MB or GB during runtime? In addition to printing out the most memory intensive arrays, I'd be interested in seeing how the memory requirements of the code are changing during runtime.
Memory usage is a quite vaguely defined concept on systems with virtual memory. You can have large amounts of memory allocated (large virtual memory size) but only a small part of it actually being actively used (small resident set size - RSS).
Unix systems provide the getrusage(2) system call, which returns information about the amount of system resources used by the calling thread/process/process children. In particular it provides the maximum RSS ever reached since the process was started. You can write a simple Fortran-callable C helper function that calls getrusage(2) and returns the value of the ru_maxrss field of the rusage structure.
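If you only need the peak RSS after the run and don't want to touch the Fortran code yet, GNU time reports the same getrusage figure. A sketch (assuming /usr/bin/time is the standalone GNU tool, not the shell built-in, and ./solver stands in for your program):
/usr/bin/time -v ./solver 2>&1 | grep 'Maximum resident set size'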
If you are running on Linux and don't care about portability, then you may just open and read from /proc/self/status. It is a simple text pseudofile that among other things contains several lines with statistics about the process virtual memory usage:
...
VmPeak: 9136 kB
VmSize: 7896 kB
VmLck: 0 kB
VmHWM: 7572 kB
VmRSS: 6316 kB
VmData: 5224 kB
VmStk: 88 kB
VmExe: 572 kB
VmLib: 1708 kB
VmPTE: 20 kB
...
Explanation of the various fields - here. You are mostly interested in VmData, VmRSS, VmHWM and VmSize. You can open /proc/self/status as a regular file with OPEN() and process it entirely in your Fortran code.
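To watch those numbers change while the solver is running, without modifying the code at all, the same pseudofile can be polled from another shell. A sketch, assuming $PID holds the process ID of the running job:
while kill -0 "$PID" 2>/dev/null; do            # loop while the process still exists
    grep -E 'VmSize|VmRSS|VmHWM|VmData' "/proc/$PID/status"
    echo '---'
    sleep 5
done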
See also what memory limitations are set with ulimit -a and ulimit -aH. You may be exceeding the hard virtual memory size limit. If you are submitting jobs through a distributed resource manager (e.g. SGE/OGE, Torque/PBS, LSF, etc.) check that you request enough memory for the job.

How to keep cache a second-class citizen

OK, in a comment to this question:
How to clean caches used by the Linux kernel
ypnos claims that:
"Applications will always be first citizens for memory and don't have to fight with cache for it."
Well, I think my cache is rebellious and does not want to accept its social class. I ran the experiment here:
http://www.linuxatemyram.com/play.html
step 1:
$ free -m
             total       used       free     shared    buffers     cached
Mem:          3015       2901        113          0         15       2282
-/+ buffers/cache:         603       2411
Swap:         2406       2406          0
So 2282MB is used by cache and 113MB is free.
Now:
$ ./munch
Allocated 1 MB
Allocated 2 MB
Allocated 3 MB
Allocated 4 MB
.
.
.
Allocated 265 MB
Allocated 266 MB
Allocated 267 MB
Allocated 268 MB
Allocated 269 MB
Killed
OK, Linux generously gave me another 156MB and that's it! So, how can I tell Linux that my programs are more important than that 2282MB of cache?
Extra info: my /home is encrypted.
More people with the same problem (these reports make the encryption hypothesis not very plausible):
https://serverfault.com/questions/171164/can-you-set-a-minimum-linux-disk-buffer-size
and
https://askubuntu.com/questions/41778/computer-freezing-on-almost-full-ram-possibly-disk-cache-problem
The thing to know about caching in the kernel is that it's designed to be as efficient as possible. This often means that things put into cache are left there as long as nothing else is asking for the memory.
This is the kernel preparing to be lucky in case the cached data is asked for again. If no one else needs the memory, there's little benefit in freeing it up.
I am not sure about Linux specific stuff, but a good OS will keep track of how many times a memory page was accessed, and how long ago. If it wasn't accessed much lately, it can swap it out, and use the RAM for caching.
Also, allocated but unused memory can be sent to swap as well, because sometimes programs allocate more than they actually need, so many memory pages would just sit there filling your RAM.
I found out that if I turn off swap with
# swapoff -a
the problem goes away. With swap enabled, when I ask for more memory, Linux tries to move the cache to swap; the swap then fills up, and Linux halts the whole operation instead of dropping the cache, which results in "out of memory". But without swap, Linux knows it has no choice but to drop the cache in the first place.
I think it's a bug in the Linux kernel.
One of the links added to the question suggests that:
sysctl -w vm.min_free_kbytes=65536
helps. For me, with a 64MB margin I can still easily get into trouble. I'm working with a 128MB margin, and when the greedy cache reaches it the machine becomes very slow, but unlike before it doesn't freeze. I'll check with a 256MB margin and see whether there is an improvement.
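For reference, a sketch of the same experiment with the larger margins mentioned above (131072 kB = 128MB, 262144 kB = 256MB; the /etc/sysctl.conf line assumes a distribution that reads that file at boot):
cat /proc/sys/vm/min_free_kbytes                          # current reserve, in kB
sysctl -w vm.min_free_kbytes=131072                       # 128MB margin for the current boot
sysctl -w vm.min_free_kbytes=262144                       # or a 256MB margin
echo 'vm.min_free_kbytes = 262144' >> /etc/sysctl.conf    # make the chosen value persistent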
