Memory Usage in R

After creating large objects and running out of RAM, I try to delete the objects in my current environment using
rm(list=ls())
When I check my RAM usage, nothing has changed. Even after calling gc(), nothing has changed. I can only reclaim my RAM by quitting R.
Does anybody have advice for dealing with memory-intensive objects within R?

Memory for deleted objects is not released immediately. R uses a technique called "garbage collection" to reclaim memory for deleted objects. Periodically, it cycles through the list of accessible objects (basically, those that have names and have not been deleted and can therefore be accessed by the user), and "tags" them for retention. The memory for any untagged objects is returned to the operating system after the garbage-collection sweep.
Garbage collection happens automatically, and you don't have any direct control over this process. But you can force a sweep by calling the command gc() from the command line.
Even then, on some operating systems garbage collection might not reclaim memory (as reported by the OS). Older versions of Windows, for example, could increase but not decrease the memory footprint of R. Garbage collection would only make space for new objects in the future, but would not reduce the memory use of R.

On Windows, the technique you describe works for me. Try the following example.
Open the Windows Task Manager (CTRL+SHIFT+ESC).
Start RGui. RGui.exe mem usage is 27 460K.
Type
gcinfo(TRUE)
x <- rnorm(1e8)
RGui.exe mem usage is now 811 100K.
Type rm("x"). RGui.exe mem usage is still 811 100K.
Type gc(). RGui.exe mem usage is now 28 332K.
Note that gc() should be called automatically if you have removed objects from your workspace and then try to allocate more memory for new variables.

My impression is that multiple forms of gc() are tried before R reports failed memory allocation. I'm not aware of a solution for this at present, other than restarting R as you suggest. It appears that R does not defragment memory.

An old question, I realize, but I've found that (on macOS Mojave) invoking pryr::mem_used() in the R session causes the Activity Monitor to immediately update the reported memory usage to reflect only the objects retained in the R environment.

Related

How to free all GPU memory from pytorch.load?

This code fills some GPU memory and doesn't let it go:
import torch

def checkpoint_mem(model_name):
    checkpoint = torch.load(model_name)
    del checkpoint
    torch.cuda.empty_cache()
Printing memory with the following code:
print(torch.cuda.memory_reserved(0))
print(torch.cuda.memory_allocated(0))
shows BEFORE running checkpoint_mem:
0
0
and AFTER:
121634816
97332224
This is with torch.__version__ 1.11.0+cu113 on Google Colab.
Does torch.load leak memory? How can I get the GPU memory completely cleared?
It probably doesn't, though it depends on what you call a memory leak. In this case, all memory should be freed after the program ends. Python has a garbage collector, so the freeing might not happen immediately (on your del, or after leaving the scope) the way it does in C++ or similar languages with RAII.
del
del is handled by Python and only removes the reference (the same as when the object goes out of scope in your function).
torch.nn.Module does not implement __del__, hence its reference is simply removed.
All of the elements within torch.nn.Module have their references removed recursively (so for each CUDA torch.Tensor instance, its __del__ is called).
del on each tensor is a call to release its memory back to the allocator (see the sketch below).
More about __del__
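As a quick illustration of that chain, here is a minimal sketch (it assumes a CUDA-capable machine with torch installed, and allocates a plain tensor instead of a loaded checkpoint): after del, memory_allocated() drops because the block went back to the caching allocator, while memory_reserved() stays unchanged until empty_cache() is called.
import torch

x = torch.zeros(1024, 1024, 64, device="cuda")    # ~256 MiB of float32 on the GPU
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())

del x                                              # last reference gone; the block returns to the caching allocator
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())

torch.cuda.empty_cache()                           # hand cached blocks back to the CUDA driver
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())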
Caching allocator
Another thing: the caching allocator holds on to part of the memory so that it doesn't have to compete with other applications for CUDA memory the next time you need it.
Also, I assume the CUDA context is initialized lazily, hence you get 0 MB used at the very beginning, but AFAIK PyTorch itself reserves some CUDA memory during startup.
The short story is given here, longer one here in case you didn’t see it already.
Possible experiments
You may try to run time.sleep(5) after your function and measure afterwards.
You can get a snapshot of the allocator state via torch.cuda.memory_snapshot() to get more info about the allocator's reserved memory and its inner workings.
You might set the environment variable PYTORCH_NO_CUDA_MEMORY_CACHING=1 and see whether anything changes (a sketch combining these experiments follows below).
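Here is a rough sketch combining those experiments ("model.pt" is a placeholder path and a CUDA device is assumed; the environment variable, if used, must be set before CUDA is initialized, e.g. PYTORCH_NO_CUDA_MEMORY_CACHING=1 python script.py):
import time
import torch

checkpoint = torch.load("model.pt", map_location="cuda")   # placeholder checkpoint file
del checkpoint
torch.cuda.empty_cache()

time.sleep(5)                                               # give any deferred frees a chance before measuring
print(torch.cuda.memory_allocated(0), torch.cuda.memory_reserved(0))

# Inspect the allocator's segments for anything still held:
for segment in torch.cuda.memory_snapshot():
    print(segment["device"], segment["total_size"], segment["active_size"])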
Disclaimer
Not a CUDA expert by any means, so someone with more insight could probably expand (and/or correct) my current understanding as I am sure way more things happen under the hood.
It is not possible; see here for the same question and the response from a PyTorch developer:
https://github.com/pytorch/pytorch/issues/37664

Julia 1.1 with JLD HDF5 package and memory release in Windows

I'm using Julia 1.1 with JLD and HDF5 to save a file to disk, and I ran into a couple of questions about memory usage.
Issue 1:
First, I defined a 4 GB matrix A.
A = zeros(ComplexF64,(243,243,4000));
When I type the following command and watch the Windows Task Manager:
A = nothing
it took several minutes for Julia to release that memory back to me. Most of the time, Julia (according to Task Manager) doesn't release the memory at all, even though varinfo() instantly reports that A occupies 0 bytes:
varinfo()
name size summary
–––––––––––––––– ––––––––––– –––––––
A 0 bytes Nothing
Base Module
Core Module
InteractiveUtils 162.930 KiB Module
Main Module
ans 0 bytes Nothing
Issue 2:
Further, I tried to use JLD and HDF5 to save the matrix to disk. This time, Task Manager showed that the save("test.jld", "A", A) command used an extra 4 GB of memory.
using JLD,HDF5
A = zeros(ComplexF64,(243,243,4000));
save("test.jld", "A", A)
Further, after I typed
A=nothing
Julia wouldn't release the 8 GB of memory back to me.
Finding 3:
An interesting thing I found is that if I re-run the command
A = zeros(ComplexF64,(243,243,4000));
Task Manager shows the cached memory being released, and the total memory usage is again only 4 GB.
Question 1:
What's going on with memory management in Julia? Is it just Windows misreporting, or is it something in Julia? How can I check Julia's memory usage instantly?
Question 2:
How can I tell Julia to release the memory immediately?
Question 3:
Is there a way to tell the JLD package not to use that extra 4 GB of memory?
(Better still, could someone tell me how to create A directly on disk without ever creating it in memory? I know there is memory-mapped I/O in the JLD package. I have tried it, but it seemed to require creating matrix A in memory and saving it to disk first, before I could reopen it as a memory-mapped array.)
This is a long question, so thanks in advance!
Julia uses a garbage collector to deallocate memory. Usually the garbage collector does not run after every line of code, but only when needed.
Try to force garbage collection by running the command:
GC.gc()
This releases memory space for unreferenced Julia objects. In this way you can check whether the memory actually has been released.
Side note: JLD used to be somewhat unreliable (I do not know its current status). Hence your first consideration for non-cross-platform object persistence should always be the serialize function from the built-in Serialization package; see the documentation at https://docs.julialang.org/en/v1/stdlib/Serialization/index.html#Serialization.serialize

x86 - kernel - programs, cleaning and memory overwrite

I am not sure about something. Take Linux, for example: when a program exits, the kernel is responsible for cleaning up after the process.
How can one be sure that physical memory used by process A is never handed to process B while still in use (two different virtual mappings (page table entries) pointing to the same physical page)?
How is this prevented?
Linux assigns pages to and frees pages from processes using the facilities described here. (Search the kernel sources for more detailed information.)
That means the kernel records which pages are in use in some data structure (a bitmap, for example), and only the unused ones are exposed as usable to new processes.
That prevents pages that are still in use from being mistakenly assigned to a new process. Any behavior beyond that would be a bug and a magnificent security hole.
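To make that bookkeeping concrete, here is a toy sketch in Python (purely illustrative, nothing like the real kernel code): a bitmap of physical frames from which only frames currently marked free can ever be handed out.
class FrameBitmap:
    def __init__(self, nframes):
        self.used = [False] * nframes           # one flag per physical page frame

    def alloc(self):
        for i, in_use in enumerate(self.used):
            if not in_use:                      # only frames marked free are eligible
                self.used[i] = True
                return i
        raise MemoryError("out of physical frames")

    def free(self, frame):
        assert self.used[frame], "double free"
        self.used[frame] = False                # frame goes back to the free pool

pages = FrameBitmap(8)
a = pages.alloc()   # "process A" gets frame 0
b = pages.alloc()   # "process B" gets frame 1; it can never get frame 0 while A holds it
pages.free(a)       # when A exits, its frames are returned and become reusable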

Why 'Total MB' in golang heap profile is less than 'RES' in top?

I have a service written in Go that takes 6-7 GB of memory at runtime (RES in top). So I used the pprof tool to try to figure out where the problem is.
go tool pprof --pdf http://<service>/debug/pprof/heap > heap_prof.pdf
But the result shows only about 1-2 GB ('Total MB' in the PDF). Where is the rest?
I've also tried profiling my service with GOGC=off; then 'Total MB' is exactly the same as 'RES' in top. It seems that memory which has been GCed but not yet returned to the kernel is not included in the profile.
Any idea?
P.S. I've tested with both Go 1.0.3 and 1.1rc3.
This is because Go currently does not give the memory of GC-ed objects back to the operating system; to be precise, this applies only to objects smaller than a predefined limit (32 KB). Instead, the memory is cached to speed up future allocations (Go:malloc). Also, it seems that this is going to be fixed in the future (TODO).
Edit:
New GC behavior: if the memory is not used for a while (about 5 minutes), the runtime will advise the kernel to remove the physical mappings from the unused virtual ranges. This process can be forced by calling runtime.FreeOSMemory().

redis bgsave failed because fork Cannot allocate memory

all:
here is my server memory info with 'free -m'
total used free shared buffers cached
Mem: 64433 49259 15174 0 3 31
-/+ buffers/cache: 49224 15209
Swap: 8197 184 8012
My redis-server has used 46 GB of memory; there is almost 15 GB of memory left free.
As far as I know, fork is copy-on-write, so it should not fail when there is 15 GB of free memory, which is enough to allocate the necessary kernel structures.
Besides, when redis-server used 42 GB of memory, bgsave was OK and fork was OK too.
Is there any VM parameter I can tune to make fork succeed?
More specifically, from the Redis FAQ
Redis background saving schema relies on the copy-on-write semantic of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent being a copy, but actually thanks to the copy-on-write semantic implemented by most modern operating systems the parent and child process will share the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the overcommit_memory setting is set to zero fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages, with the result that if you have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.
Setting overcommit_memory to 1 tells Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.
Redis doesn't need as much memory as the OS thinks it does in order to write to disk, so the OS may pre-emptively fail the fork.
Modify /etc/sysctl.conf and add:
vm.overcommit_memory=1
Then apply the new setting with:
On FreeBSD:
sudo /etc/rc.d/sysctl reload
On Linux:
sudo sysctl -p /etc/sysctl.conf
From proc(5) man pages:
/proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode. Values are:
0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit
In mode 0, calls of mmap(2) with MAP_NORESERVE set are not checked, and the default check is very weak, leading to the risk of getting a process "OOM-killed". Under Linux 2.4, any non-zero value implies mode 1. In mode 2 (available since Linux 2.6), the total virtual address space on the system is limited to (SS + RAM*(r/100)), where SS is the size of the swap space, RAM is the size of the physical memory, and r is the contents of the file /proc/sys/vm/overcommit_ratio.
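As a small companion sketch (Linux-only, reads /proc directly; purely illustrative), the following prints the current overcommit mode and, for mode 2, recomputes the limit described above from SwapTotal, MemTotal and overcommit_ratio:
def read_int(path):
    with open(path) as f:
        return int(f.read().strip())

def meminfo_kib(key):
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(key + ":"):
                return int(line.split()[1])          # values are reported in KiB
    raise KeyError(key)

mode = read_int("/proc/sys/vm/overcommit_memory")    # 0, 1 or 2
ratio = read_int("/proc/sys/vm/overcommit_ratio")
print("overcommit_memory =", mode)

if mode == 2:
    limit = meminfo_kib("SwapTotal") + meminfo_kib("MemTotal") * ratio // 100
    print("computed limit   :", limit, "KiB")
    print("kernel CommitLimit:", meminfo_kib("CommitLimit"), "KiB")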
Redis's fork-based snapshotting method can effectively double physical memory usage and easily cause an OOM in cases like yours. Reliance on Linux virtual memory for snapshotting is problematic, because Linux has no visibility into Redis data structures.
Recently a new Redis-compatible project, Dragonfly, has been released. Among other things, it solves this OOM problem entirely. (Disclosure: I am the author of the project.)
