It seems that MRI duplicates the memory allocation for every new thread.
I use Ubuntu x64 with ruby-2.2.4 (rvm), and this is what I get:
Right after starting irb:
pmap -d 1656 reports 59760K of memory usage (allocated memory, or '[ stack ]' for the program stack, per man pmap(1)).
After creating a thread:
pmap -d 1656 reports 127352K of memory usage.
So the memory allocation roughly doubles, from 59760K to 127352K.
This is similar to the result of a fork() call, which creates a new process by copying the calling process's data into it (copy-on-write aside).
But a Thread is created in the same process and shares its data, so this looks strange.
In practice, it means a Ruby Thread has a memory restriction similar to a Process: creating a new thread fails when the allocated memory gets close to the physical memory size.
I am curious: why?
UPDATE
It's not duplicated memory but an additional allocation of ~50K for each thread.
Thanks @tadman for suggesting that it's overhead and not memory copying in the way fork() does it.
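To see this kind of per-thread overhead from inside a process, here is a minimal sketch in Python rather than Ruby (an assumption for illustration; MRI's exact numbers will differ). It reads VmSize from /proc/self/status, so it is Linux-only, and compares the process's virtual size before and while one extra thread is alive. The growth comes from address space reserved for the new thread (for example its stack), not from copying the parent's data as fork() would.

```python
import threading


def vm_size_kb():
    """Virtual memory size of this process in kB (Linux only: VmSize in /proc/self/status)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmSize:"):
                return int(line.split()[1])
    raise RuntimeError("VmSize not found in /proc/self/status")


def thread_overhead_kb():
    """Growth in VmSize while one extra, idle thread is alive."""
    started = threading.Event()
    finish = threading.Event()

    def worker():
        started.set()  # signal that the thread is running
        finish.wait()  # stay alive until the measurement is taken

    before = vm_size_kb()
    t = threading.Thread(target=worker)
    t.start()
    started.wait()
    during = vm_size_kb()
    finish.set()
    t.join()
    return during - before


if __name__ == "__main__":
    print("extra virtual memory for one thread:", thread_overhead_kb(), "kB")
```

Because the number is virtual size, much of it may never be backed by physical pages unless the thread actually touches them.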
I'm using Julia 1.1 with JLD and HDF5 to save a file to disk, and I have run into a couple of questions about memory usage.
Issue 1:
First, I defined a 4 GB matrix A.
A = zeros(ComplexF64,(243,243,4000));
When I type the following command and watch the Windows Task Manager:
A=nothing
It takes several minutes for Julia to release that memory. Most of the time, Julia does not release the memory at all (according to Task Manager), even though varinfo() immediately reports that A occupies 0 bytes:
varinfo()
name                    size summary
–––––––––––––––– ––––––––––– –––––––
A                    0 bytes Nothing
Base                         Module
Core                         Module
InteractiveUtils 162.930 KiB Module
Main                         Module
ans                  0 bytes Nothing
Issue 2:
Next, I used JLD and HDF5 to save the matrix to disk. This time, Task Manager showed that the save("test.jld", "A", A) command used an extra 4 GB of memory.
using JLD,HDF5
A = zeros(ComplexF64,(243,243,4000));
save("test.jld", "A", A)
Then, after I typed
A=nothing
Julia does not release the 8 GB of memory.
Finding 3:
Interestingly, if I retype the command
A = zeros(ComplexF64,(243,243,4000));
Task Manager shows that the cached memory is released, and total memory usage drops back to 4 GB.
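As a quick check on the "4 GB" figure: a ComplexF64 element is 16 bytes (two Float64 values), so the array's size is the element count times 16. The Python sketch below just does that arithmetic: 243 * 243 * 4000 * 16 = 3,779,136,000 bytes, about 3.78 GB (3.52 GiB). A second temporary copy of that size during save would be consistent with the 8 GB reported afterwards, though that is an inference, not something measured here.

```python
from math import prod


def array_bytes(shape, itemsize):
    """Bytes needed for a dense array: number of elements times bytes per element."""
    return prod(shape) * itemsize


# ComplexF64 stores two Float64 values, i.e. 16 bytes per element.
size = array_bytes((243, 243, 4000), 16)
print(size, "bytes =", round(size / 1e9, 2), "GB")
```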
Question 1:
What's going on with memory management in Julia? Is it just a Windows reporting issue, or something in Julia? How can I check Julia's memory usage immediately?
Question 2:
How can I tell Julia to release the memory immediately?
Question 3:
Is there a way to tell the JLD package not to use that extra 4 GB of memory?
(Better still, could someone tell me how to create A directly on disk without creating it in memory at all? I know the JLD package supports memory-mapped I/O. I tried it, but it seemed to require creating matrix A in memory and saving it to disk first, before I could load the memory-mapped A again.)
This is a long question, so thanks in advance!
Julia uses a garbage collector to deallocate memory. A garbage collector usually does not run after every line of code, only when needed.
Try forcing a garbage collection by running:
GC.gc()
This frees the memory held by unreferenced Julia objects, so you can check whether the memory has actually been released.
Side note: JLD used to be somewhat unreliable (I do not know its current status). Hence your first choice for non-cross-platform object persistence should be the serialize function from the built-in Serialization package; see the documentation at https://docs.julialang.org/en/v1/stdlib/Serialization/index.html#Serialization.serialize
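The "collect only when needed" behaviour can be demonstrated with Python's cycle collector. This is an analogy, not Julia code: in CPython, objects with no references are usually freed at once by reference counting, but objects in a reference cycle stay allocated until the collector runs, much like the array behind A waits for Julia's GC. The sketch pauses automatic collection, drops the last names, and uses a weak reference to see when the object actually disappears; the Node class is a hypothetical helper for the demo.

```python
import gc
import weakref


class Node:
    """A small object that can take part in a reference cycle (hypothetical demo class)."""

    def __init__(self):
        self.other = None


def cycle_freed_only_after_collect():
    """Return (alive_after_del, alive_after_collect) for a cyclic pair of objects."""
    was_enabled = gc.isenabled()
    gc.disable()  # pause automatic cycle collection so the result is deterministic
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # cycle: reference counts never drop to zero on their own
        probe = weakref.ref(a)  # a weak reference does not keep `a` alive
        del a, b  # like `A = nothing`: no names refer to the objects any more
        alive_after_del = probe() is not None
        gc.collect()  # like Julia's GC.gc(): force a full collection
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()
```

Calling cycle_freed_only_after_collect() gives (True, False): the objects survive the del and are reclaimed only by the explicit collection.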
I am porting Contiki-OS to the LPC1347, and I have a question about how exactly memory is handled in Contiki. Protothreads are stackless and no "real threads" are used, so everything runs on the same stack and memory allocation is essentially static. I understand how protothreads work, but when a new process is initialized, how is memory allocated for it? And when an event carries data, how is the memory for that event data managed?
All required memory is allocated statically at compile/link time. This is done by the PROCESS macro [1], which allocates a structure containing the necessary information [2]. As for events, they also need their own memory, which is statically allocated as well [3].
It is therefore not possible to run the same thread* or schedule the same event twice.
* Actually it is, but not using the PROCESS macro.
[1] https://github.com/contiki-os/contiki/blob/5bede26b/core/sys/process.h#L301-311
[2] https://github.com/contiki-os/contiki/blob/5bede26b/core/sys/process.h#L315-326
[3] https://github.com/contiki-os/contiki/blob/5bede26b/core/sys/process.c#L62-66
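To make the event side concrete, here is a small Python model of a statically sized event queue in the style of Contiki's process.c: a fixed array of slots allocated once and used as a ring buffer. The class and method names and the default capacity of 32 are assumptions for illustration (Contiki sizes its queue with PROCESS_CONF_NUMEVENTS; check your platform's configuration). A slot stores only a reference to the event data, so the poster must keep that data alive, typically in a static variable, until the receiving process has handled the event.

```python
class EventQueue:
    """Hypothetical model of a fixed-capacity event queue like Contiki's static events[] array.

    Not Contiki's API; it only mirrors the allocation strategy.
    """

    def __init__(self, capacity=32):  # assumed default, mirroring PROCESS_CONF_NUMEVENTS
        self._slots = [None] * capacity  # all slots allocated up front, never grown
        self._first = 0  # index of the oldest pending event
        self._count = 0  # number of pending events

    def post(self, process, ev, data):
        """Queue an event; returns False when full (Contiki's process_post returns PROCESS_ERR_FULL)."""
        if self._count == len(self._slots):
            return False
        idx = (self._first + self._count) % len(self._slots)
        self._slots[idx] = (process, ev, data)  # stores a reference to data, not a copy
        self._count += 1
        return True

    def next_event(self):
        """Remove and return the oldest (process, ev, data) tuple, or None if empty."""
        if self._count == 0:
            return None
        item = self._slots[self._first]
        self._slots[self._first] = None
        self._first = (self._first + 1) % len(self._slots)
        self._count -= 1
        return item
```

Since only a reference is queued, if the poster modifies or reuses the data buffer before the receiver runs, the receiver sees the modified contents.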
Hi all,
Here is my server's memory info from 'free -m':
             total       used       free     shared    buffers     cached
Mem:         64433      49259      15174          0          3         31
-/+ buffers/cache:      49224      15209
Swap:         8197        184       8012
My redis-server is using 46 GB of memory, and almost 15 GB is still free.
As far as I know, fork uses copy-on-write, so it should not fail when 15 GB is free, which is more than enough to allocate the necessary kernel structures.
Besides, when redis-server was using 42 GB of memory, bgsave and fork both worked.
Is there any VM parameter I can tune to make fork succeed?
More specifically, from the Redis FAQ
Redis background saving schema relies on the copy-on-write semantic of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent being a copy, but actually thanks to the copy-on-write semantic implemented by most modern operating systems the parent and child process will share the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the overcommit_memory setting is set to zero fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages, with the result that if you have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.
Setting overcommit_memory to 1 says Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.
Redis doesn't need as much memory to write to disk as the OS thinks it does, so the OS may pre-emptively fail the fork.
Modify /etc/sysctl.conf and add:
vm.overcommit_memory=1
Then reload the sysctl settings with:
On FreeBSD:
sudo /etc/rc.d/sysctl reload
On Linux:
sudo sysctl -p /etc/sysctl.conf
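After reloading, you can confirm the active mode by reading /proc/sys/vm/overcommit_memory (Linux only). A minimal Python sketch; the mode descriptions are taken from the proc(5) excerpt in this answer, and the function names are hypothetical helpers.

```python
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (this is the default)",
    1: "always overcommit, never check",
    2: "always check, never overcommit",
}


def parse_overcommit(text):
    """Parse the contents of /proc/sys/vm/overcommit_memory into (mode, description)."""
    mode = int(text.strip())
    return mode, OVERCOMMIT_MODES[mode]


def current_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Read the live setting (Linux only)."""
    with open(path) as f:
        return parse_overcommit(f.read())


if __name__ == "__main__":
    mode, desc = current_overcommit()
    print(f"vm.overcommit_memory = {mode}: {desc}")
```

For the Redis case you want this to report mode 1.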
From the proc(5) man page:
/proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode. Values are:
0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit
In mode 0, calls of mmap(2) with MAP_NORESERVE set are not checked, and the default check is very weak, leading to the risk of getting a process "OOM-killed". Under Linux 2.4 any non-zero value implies mode 1. In mode 2 (available since Linux 2.6), the total virtual address space on the system is limited to (SS + RAM*(r/100)), where SS is the size of the swap space, and RAM is the size of the physical memory, and r is the contents of the file /proc/sys/vm/overcommit_ratio.
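To put numbers on mode 2, the formula above can be evaluated with the free -m figures from the question. This sketch assumes the kernel's default overcommit_ratio of 50 and uses only the simplified proc(5) formula (real kernels also account for things such as huge pages): 8197 + 64433 * 50/100 = 40413.5 MB, which is already less than the 46 GB Redis is using. So mode 2 with default settings would not help here; mode 1 is the setting that relaxes the check.

```python
def commit_limit_mb(swap_mb, ram_mb, overcommit_ratio=50):
    """Mode-2 commit limit from proc(5): SS + RAM * (r / 100), all in MB."""
    return swap_mb + ram_mb * overcommit_ratio / 100


# Figures from the free -m output in the question; ratio 50 is assumed (kernel default).
limit = commit_limit_mb(swap_mb=8197, ram_mb=64433)
print(limit)
```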
Redis's fork-based snapshotting method can effectively double physical memory usage and easily run out of memory in cases like yours. Relying on Linux virtual memory for snapshotting is problematic, because Linux has no visibility into Redis data structures.
Recently, a new Redis-compatible project, Dragonfly, was released. Among other things, it solves the OOM problem entirely (disclosure: I am the author of this project).
After creating large objects and running out of RAM, I try to delete the objects in my current environment using
rm(list=ls())
When I check my RAM usage, nothing has changed. Even after calling gc() nothing has changed. I can only replenish my RAM by quitting R.
Does anybody have advice for dealing with memory-intensive objects within R?
Memory for deleted objects is not released immediately. R uses a technique called "garbage collection" to reclaim memory for deleted objects. Periodically, it cycles through the list of accessible objects (basically, those that have names and have not been deleted and can therefore be accessed by the user), and "tags" them for retention. The memory for any untagged objects is returned to the operating system after the garbage-collection sweep.
Garbage collection happens automatically, and you don't have any direct control over this process. But you can force a sweep by calling the command gc() from the command line.
Even then, on some operating systems garbage collection might not reclaim memory (as reported by the OS). Older versions of Windows, for example, could increase but not decrease the memory footprint of R. Garbage collection would only make space for new objects in the future, but would not reduce the memory use of R.
On Windows, the technique you describe works for me. Try the following example.
Open the Windows Task Manager (CTRL+SHIFT+ESC).
Start RGui. RGui.exe mem usage is 27 460K.
Type
gcinfo(TRUE)
x <- rnorm(1e8)
RGui.exe mem usage is now 811 100K.
Type rm("x"). RGui.exe mem usage is still 811 100K.
Type gc(). RGui.exe mem usage is now 28 332K.
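The same experiment can be reproduced programmatically on Linux. This is a Python sketch, not R: it reads the resident set size from /proc/self/statm, allocates a large buffer, drops it, and reads the RSS again. Whether the last number falls back depends on the allocator returning memory to the OS, which is exactly the OS-dependent behaviour described above. The 64 MiB buffer size and the function names are arbitrary choices for illustration.

```python
import os

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def rss_kb():
    """Current resident set size in kB (Linux only: 2nd field of /proc/self/statm, in pages)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * PAGE_SIZE // 1024


def alloc_and_free(n_bytes=64 * 1024 * 1024):
    """Return RSS (kB) before allocating, while the buffer exists, and after dropping it."""
    before = rss_kb()
    buf = bytearray(n_bytes)  # in CPython this is zero-filled, so its pages are touched
    during = rss_kb()
    del buf  # like rm("x"): drop the only reference
    after = rss_kb()
    return before, during, after


if __name__ == "__main__":
    print("RSS kB (before, during, after):", alloc_and_free())
```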
Note that gc should be called automatically if you have removed objects from your workspace and then try to allocate more memory for new variables.
My impression is that multiple forms of gc() are tried before R reports failed memory allocation. I'm not aware of a solution for this at present, other than restarting R as you suggest. It appears that R does not defragment memory.
An old question, I realize, but I've found that (on macOS Mojave) invoking pryr::mem_used() in the R session causes Activity Monitor to immediately update the reported memory usage to reflect only the objects retained in the R environment.
The main motivation: to use the movntdqa instruction to avoid cache pollution. This instruction only acts as a non-temporal (streaming) load on write-combining memory (also called WC or USWC).
Pass PAGE_WRITECOMBINE to VirtualAllocEx(). Sequential writes to that page will be write-combined by the CPU. Reads or non-sequential writes will incur a severe performance penalty.
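For reference, a minimal Python ctypes sketch of the allocation side, assuming Windows. It calls VirtualAlloc, the same-process counterpart of VirtualAllocEx, with PAGE_READWRITE | PAGE_WRITECOMBINE; the numeric constants are the values from the Windows SDK header winnt.h, and the helper names are hypothetical. Python cannot issue movntdqa itself, so this only shows how write-combined pages are requested and released.

```python
import ctypes
import sys

# Values from the Windows SDK headers (winnt.h).
MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
MEM_RELEASE = 0x8000
PAGE_READWRITE = 0x04
PAGE_WRITECOMBINE = 0x400  # a modifier: combine it with an access flag such as PAGE_READWRITE


def wc_protect_flags():
    """Protection flags for read/write, write-combined pages."""
    return PAGE_READWRITE | PAGE_WRITECOMBINE


def _kernel32():
    if sys.platform != "win32":
        raise OSError("VirtualAlloc is only available on Windows")
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.VirtualAlloc.restype = ctypes.c_void_p
    k32.VirtualAlloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_ulong, ctypes.c_ulong]
    k32.VirtualFree.restype = ctypes.c_int
    k32.VirtualFree.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_ulong]
    return k32


def alloc_write_combined(size):
    """Reserve and commit `size` bytes of write-combined memory; returns the base address."""
    k32 = _kernel32()
    addr = k32.VirtualAlloc(None, size, MEM_COMMIT | MEM_RESERVE, wc_protect_flags())
    if not addr:
        raise ctypes.WinError(ctypes.get_last_error())
    return addr


def free_write_combined(addr):
    """Release memory from alloc_write_combined (size must be 0 with MEM_RELEASE)."""
    k32 = _kernel32()
    if not k32.VirtualFree(addr, 0, MEM_RELEASE):
        raise ctypes.WinError(ctypes.get_last_error())
```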