How much RAM is actually available for applications in Linux? - linux-kernel

I’m working on embedded Linux targets (32-bit ARM) and need to determine how much RAM is available for applications once the kernel and core software are launched. The available memory reported by free and /proc/meminfo doesn’t seem to align with what testing shows is actually usable by applications. Is there a way to correctly calculate how much RAM is truly available without running, e.g., stress on each system?
The target system used in my tests below has 256 MB of RAM and does not use swap (CONFIG_SWAP is not set). I used the 3.14.79-rt85 kernel in these tests, but have also tried 4.9.39 and seen similar results. During boot, the following is reported:
Memory: 183172K/262144K available (5901K kernel code, 377K rwdata, 1876K rodata, 909K init, 453K bss, 78972K reserved)
Once system initialization is complete and the base software is running (e.g., dhcp client, ssh server, etc.), I get the following reported values:
[root@host ~]# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 210016 320 7880 0 0 0 0 186 568 0 2 97 0 0
[root@host ~]# free -k
total used free shared buff/cache available
Mem: 249616 31484 209828 68 8304 172996
Swap: 0 0 0
[root@host ~]# cat /proc/meminfo
MemTotal: 249616 kB
MemFree: 209020 kB
MemAvailable: 172568 kB
Buffers: 712 kB
Cached: 4112 kB
SwapCached: 0 kB
Active: 4684 kB
Inactive: 2252 kB
Active(anon): 2120 kB
Inactive(anon): 68 kB
Active(file): 2564 kB
Inactive(file): 2184 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 2120 kB
Mapped: 3256 kB
Shmem: 68 kB
Slab: 13236 kB
SReclaimable: 4260 kB
SUnreclaim: 8976 kB
KernelStack: 864 kB
PageTables: 296 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 124808 kB
Committed_AS: 47944 kB
VmallocTotal: 1810432 kB
VmallocUsed: 3668 kB
VmallocChunk: 1803712 kB
[root@host ~]# sysctl -a | grep '^vm'
vm.admin_reserve_kbytes = 7119
vm.block_dump = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.drop_caches = 3
vm.extfrag_threshold = 500
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 32
vm.max_map_count = 65530
vm.min_free_kbytes = 32768
vm.mmap_min_addr = 4096
vm.nr_pdflush_threads = 0
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.scan_unevictable_pages = 0
vm.stat_interval = 1
vm.swappiness = 60
vm.user_reserve_kbytes = 7119
vm.vfs_cache_pressure = 100
Based on the numbers above, I expected to have ~160 MiB available for future applications. By tweaking sysctl vm.min_free_kbytes, I can boost this to nearly 200 MiB, since /proc/meminfo appears to take this reserve into account, but for testing I left it set as shown above.
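For reference, MemAvailable is only an estimate: the kernel's heuristic is roughly MemFree minus the zone low watermarks, plus the portions of the page cache and reclaimable slab it expects to evict cheaply. A rough recomputation from /proc/meminfo and /proc/zoneinfo (a sketch of that heuristic, assuming 4 KiB pages; not an exact reimplementation of the kernel code):
# Sum the per-zone "low" watermarks (in pages) and convert to kB.
low=$(awk '$1 == "low" { n += $2 } END { print n * 4 }' /proc/zoneinfo)

# available ~= MemFree - low
#            + pagecache    - min(pagecache/2,    low)
#            + SReclaimable - min(SReclaimable/2, low)
awk -v low="$low" '
    /^MemFree:/         { memfree = $2 }
    /^Active\(file\)/   { pagecache += $2 }
    /^Inactive\(file\)/ { pagecache += $2 }
    /^SReclaimable:/    { sreclaim = $2 }
    END {
        avail = memfree - low
        avail += pagecache - (pagecache / 2 < low ? pagecache / 2 : low)
        avail += sreclaim - (sreclaim / 2 < low ? sreclaim / 2 : low)
        printf "estimated MemAvailable: %d kB\n", avail
    }' /proc/meminfo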
To test how much RAM was actually available, I used the stress tool as follows:
stress --vm 11 --vm-bytes 10M --vm-keep --timeout 5s
At 110 MiB, the system remains responsive and both free and vmstat reflect the increased RAM usage. The lowest reported free/available values are below:
[root@host ~]# free -k
total used free shared buff/cache available
Mem: 249616 146580 93196 68 9840 57124
Swap: 0 0 0
[root@host ~]# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
11 0 0 93204 1792 8048 0 0 0 0 240 679 50 0 50 0 0
Here is where things start to break down. After increasing stress’ memory usage to 120 MiB - still well shy of the 168 MiB reported as available - the system freezes for the full 5 seconds that stress runs. Running vmstat continuously during the test (or as continuously as possible given the freeze) shows:
[root@host ~]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 209664 724 6336 0 0 0 0 237 666 0 1 99 0 0
3 0 0 121916 1024 6724 0 0 289 0 1088 22437 0 45 54 0 0
1 0 0 208120 1328 7128 0 0 1652 0 4431 43519 28 22 50 0 0
Given the significant increase in interrupts and IO, I’m guessing the kernel is evicting pages containing executable code and then promptly needing to read them back in from flash. My questions are: a) is this a correct assessment? and b) why would the kernel be doing this while RAM is still available?
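One way to test assessment a) directly (a sketch; pgpgin and pgmajfault are standard counters in /proc/vmstat) is to snapshot the page-in and major-fault counters around the stress run. A large jump in pgmajfault during the freeze would confirm that mapped (e.g., executable) pages are being evicted and faulted back in from flash:
grep -E '^(pgpgin|pgmajfault) ' /proc/vmstat
stress --vm 12 --vm-bytes 10M --vm-keep --timeout 5s
grep -E '^(pgpgin|pgmajfault) ' /proc/vmstat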
Note that if I try to use a single stress worker to claim 160 MiB of memory, the OOM killer activates and kills the test. The OOM killer does not trigger in the scenarios described above.

Bash sed awk, format CPU/Mem info from /proc/cpuinfo and /proc/meminfo [closed]

The problem that I'm trying to solve is to produce portable output, displayed at login on all of the CentOS / Red Hat servers in our environment, showing basic system info from generically available sources. I would like to pluck the info from /proc/cpuinfo and /proc/meminfo (or free -m -h); "why not just 'yum install some-great-tool'?" is not ideal, as all of this information is freely available to us right in /proc. I suspect this sort of thing is often a very simple trick for sed/awk experts, but I don't know how to approach it with my limited sed/awk knowledge.
I would like to extract something like the following on a single line:
<model name>, <cpu MHz> MHz, <cpu cores> cores, <detect "vmx" (Intel-VT) or "svm" (AMD-V support)>
e.g. with the below output, this would look like the following (with "1300.000" rounded to "1300"):
"AMD Athlon(tm) II Neo N36L Dual-Core Processor, 1300 MHz, 2 cores, SVM-Virtualization" (or "VMX-Virtualization" or "No Virtualization")
I would like to also combine this info with that of /proc/meminfo or free -mh, so:
"AMD Athlon(tm) II Neo N36L Dual-Core Processor, 1300 MHz, 2 cores, 4.7 GB Memory (1.8 GB Free), SVM-Virtualization"
I have spent some time searching for methods, but without luck. Maybe this is an interesting generic problem, too, since it involves extracting fields from the kind of tabular format that a lot of system info is held in, so a solution has some general application.
$ free -m -h
total used free shared buff/cache available
Mem: 4.5Gi 1.2Gi 1.8Gi 77Mi 1.6Gi 3.0Gi
Swap: 4.8Gi 0B 4.8Gi
$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 6
model name : AMD Athlon(tm) II Neo N36L Dual-Core Processor
stepping : 3
microcode : 0x10000c8
cpu MHz : 1300.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate vmmcall npt lbrv svm_lock nrip_save
bugs : tlb_mmatch apic_c1e fxsave_leak sysret_ss_attrs null_seg amd_e400 spectre_v1 spectre_v2
bogomips : 2595.59
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
$ cat /proc/meminfo
MemTotal: 4771304 kB
MemFree: 1862372 kB
MemAvailable: 3195768 kB
Buffers: 2628 kB
Cached: 1542788 kB
SwapCached: 0 kB
Active: 1534572 kB
Inactive: 909316 kB
Active(anon): 917792 kB
Inactive(anon): 62468 kB
Active(file): 616780 kB
Inactive(file): 846848 kB
Unevictable: 8384 kB
Mlocked: 0 kB
SwapTotal: 5070844 kB
SwapFree: 5070844 kB
Dirty: 20 kB
Writeback: 0 kB
AnonPages: 881304 kB
Mapped: 395420 kB
Shmem: 79776 kB
KReclaimable: 152892 kB
Slab: 295508 kB
SReclaimable: 152892 kB
SUnreclaim: 142616 kB
KernelStack: 9328 kB
PageTables: 45156 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 7456496 kB
Committed_AS: 5260708 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Percpu: 2864 kB
HardwareCorrupted: 0 kB
AnonHugePages: 417792 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 314944 kB
DirectMap2M: 4796416 kB
DirectMap1G: 0 kB
Using /proc/cpuinfo and free -mh along with awk: search for the required strings using : as the field delimiter and set variables accordingly, splitting the relevant line of free -mh further into an array called arr using " " as the delimiter. At the end, print the data in the required format using the variables created.
When searching for lines beginning with flags, we look for the strings svm or vmx using awk's match function. A match is signified by the RSTART variable being non-zero, so we check this to find the type of virtualisation being utilised. As virt is set to No Virtualisation at the beginning, that is what gets printed when neither string matches.
awk -F: '/^model name/ {              # CPU model string
mod=$2
}
/^cpu MHz/ {
mhz=$2
}
/^cpu core/ {
core=$2
}
/^flags/ {                            # detect virtualisation support
virt="No Virtualisation";
match($0,"svm");
if (RSTART!=0)
{
virt="SVM-Virtualisation"
};
match($0,"vmx");
if (RSTART!=0) {
virt="VMX-Virtualisation"
}
}
/^Mem:/ {                             # "Mem:" line from free -mh
split($2,arr," ");
tot=arr[1];                           # column 1: total
free=arr[3]                           # column 3: free (column 2 is "used")
}
END {
printf "%s %dMHz %s core(s) %s %sB Memory (%sB Free)\n",mod,mhz,core,virt,tot,free
}' /proc/cpuinfo <(free -mh)
One-liner:
awk -F: '/^model name/ { mod=$2 } /^cpu MHz/ { mhz=$2 } /^cpu core/ {core=$2} /^flags/ { virt="No Virtualisation";match($0,"svm");if (RSTART!=0) { virt="SVM-Virtualisation" };match($0,"vmx");if (RSTART!=0) { virt="VMX-Virtualisation" } } /^Mem:/ {split($2,arr," ");tot=arr[1];free=arr[3]} END { printf "%s %dMHz %s core(s) %s %sB Memory (%sB Free)\n",mod,mhz,core,virt,tot,free }' /proc/cpuinfo <(free -mh)
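With the sample /proc/cpuinfo and free -mh output above, this should print something like the following (modulo whitespace carried over from the colon-delimited fields; SVM, because the sample flags contain svm, not vmx):
AMD Athlon(tm) II Neo N36L Dual-Core Processor 1300MHz 2 core(s) SVM-Virtualisation 4.5GiB Memory (1.8GiB Free)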

Entry in /proc/meminfo

I am currently studying Linux.
cat /proc/meminfo produces the following.
Please tell me the meaning of the entries Active(file) and Inactive(file); I can't find an explanation of them anywhere.
Thanks.
MemTotal: 7736104 kB
MemFree: 166580 kB
Buffers: 604636 kB
Cached: 5965376 kB
SwapCached: 0 kB
Active: 4294464 kB
Inactive: 2319240 kB
Active(anon): 13688 kB
Inactive(anon): 33828 kB
Active(file): 4280776 kB
Inactive(file): 2285412 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 16777208 kB
SwapFree: 16777208 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 43772 kB
Mapped: 11056 kB
Shmem: 3792 kB
Slab: 861004 kB
SReclaimable: 818040 kB
SUnreclaim: 42964 kB
KernelStack: 1624 kB
PageTables: 5460 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 20645260 kB
Committed_AS: 124392 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 450644 kB
VmallocChunk: 34359282660 kB
HardwareCorrupted: 0 kB
AnonHugePages: 2048 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 16384 kB
DirectMap2M: 3880960 kB
DirectMap1G: 4194304 kB
According to the output, the result of Active(file) + Inactive(file) + Shmem doesn't exactly equal that of Cached + Buffers + SwapCached.
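Working through the numbers above: Active(file) + Inactive(file) + Shmem = 4280776 + 2285412 + 3792 = 6569980 kB, while Cached + Buffers + SwapCached = 5965376 + 604636 + 0 = 6570012 kB. The two sides differ by only 32 kB, presumably because the counters are sampled at slightly different instants, so the identity holds to within measurement noise rather than exactly.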
Active — The total amount of buffer or page cache memory, in kilobytes, that is in active use. This is memory that has been recently used and is usually not reclaimed for other purposes.
Inactive — The total amount of buffer or page cache memory, in kilobytes, that is free and available. This is memory that has not been recently used and can be reclaimed for other purposes.
Ref : https://www.centos.org/docs/5/html/5.1/Deployment_Guide/s2-proc-meminfo.html
And FYI.
Active = Active(anon) + Active(file)
Inactive = Inactive(anon) + Inactive(file)
Active(file) and Inactive(file) are file-backed: the original data lives in a file on disk, but it has been loaded into RAM so it can be accessed faster.
Active(file) + Inactive(file) + Shmem ≈ Cached + Buffers + SwapCached
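To check the first two identities on a live system, a quick sketch (the patterns match the field names exactly as they appear in /proc/meminfo above):
awk -F'[: ]+' '
    /^Active:/          { act = $2 }
    /^Active\(anon\)/   { act_anon = $2 }
    /^Active\(file\)/   { act_file = $2 }
    /^Inactive:/        { inact = $2 }
    /^Inactive\(anon\)/ { inact_anon = $2 }
    /^Inactive\(file\)/ { inact_file = $2 }
    END {
        printf "Active:   %d = %d + %d (anon + file)\n", act, act_anon, act_file
        printf "Inactive: %d = %d + %d (anon + file)\n", inact, inact_anon, inact_file
    }' /proc/meminfo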

Go routine performance maximizing

I am writing a data mover in Go, taking data located in one data center and moving it to another data center. I figured Go would be perfect for this, given goroutines.
I notice that if I have one program running 1800 goroutines, the amount of data being transmitted is really low.
Here's the dstat printout, averaged over 30 seconds:
---load-avg--- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
1m 5m 15m |usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0.70 3.58 4.42| 10 1 89 0 0 0| 0 156k|7306k 6667k| 0 0 | 11k 6287
0.61 3.28 4.29| 12 2 85 0 0 1| 0 6963B|8822k 8523k| 0 0 | 14k 7531
0.65 3.03 4.18| 12 2 86 0 0 1| 0 1775B|8660k 8514k| 0 0 | 13k 7464
0.67 2.81 4.07| 12 2 86 0 0 1| 0 1638B|8908k 8735k| 0 0 | 13k 7435
0.67 2.60 3.96| 12 2 86 0 0 1| 0 819B|8752k 8385k| 0 0 | 13k 7445
0.47 2.37 3.84| 11 2 86 0 0 1| 0 2185B|8740k 8491k| 0 0 | 13k 7548
0.61 2.22 3.74| 10 2 88 0 0 0| 0 1229B|7122k 6765k| 0 0 | 11k 6228
0.52 2.04 3.63| 3 1 97 0 0 0| 0 546B|1999k 1365k| 0 0 |3117 2033
If I run 9 instances of the program with 200 goroutines each, I see much better performance:
---load-avg--- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
1m 5m 15m |usr sys idl wai hiq siq| read writ| recv send| in out | int csw
8.34 9.56 8.78| 53 8 36 0 0 3| 0 410B| 38M 32M| 0 0 | 41k 26k
8.01 9.37 8.74| 74 10 12 0 0 4| 0 137B| 51M 51M| 0 0 | 59k 39k
8.36 9.31 8.74| 75 9 12 0 0 4| 0 1092B| 51M 51M| 0 0 | 59k 39k
6.93 8.89 8.62| 74 10 12 0 0 4| 0 5188B| 50M 49M| 0 0 | 59k 38k
7.09 8.73 8.58| 75 9 12 0 0 4| 0 410B| 51M 50M| 0 0 | 60k 39k
7.40 8.62 8.54| 75 9 12 0 0 4| 0 137B| 52M 49M| 0 0 | 61k 40k
7.96 8.63 8.55| 75 9 12 0 0 4| 0 956B| 51M 51M| 0 0 | 59k 39k
7.46 8.44 8.49| 75 9 12 0 0 4| 0 273B| 51M 50M| 0 0 | 58k 38k
8.08 8.51 8.51| 75 9 12 0 0 4| 0 410B| 51M 51M| 0 0 | 59k 39k
Load average is a little high, but I'll worry about that later. The network traffic, though, is almost hitting the network's potential.
I'm on Ubuntu 12.04,
8 Gigs Ram,
2.3 GHz processors (says EC2 :P)
Also, I've increased my file descriptors from 1024 to 10240
I thought Go was designed for this kind of thing, or am I expecting too much of Go for this application?
Is there something trivial that I'm missing? Do I need to configure my system to maximize Go's potential?
EDIT
I guess my question wasn't clear enough. Sorry. I'm not asking for magic from Go; I know computers have limitations to what they can handle.
So I'll rephrase. Why is 1 instance with 1800 goroutines != 9 instances with 200 goroutines each? The total number of goroutines is the same, yet there is significantly less performance from 1 instance compared to 9 instances.
Please note that goroutines are limited to your local machine and that channels are not natively network-enabled, i.e. your particular case is probably not playing to Go's strengths.
Also: what did you expect from throwing (supposedly) every transfer into its own goroutine? IO operations tend to have their bottleneck where the bits hit the metal, i.e. the physical transfer of the data to the medium. Think of it like this: no matter how many threads (or goroutines, in this case) try to write to the network card, you still only have one network card. Most likely, hitting it with too many concurrent write calls will only slow things down, since the overhead involved increases.
If you think this is not the problem, or you want to audit your code for optimized performance, Go has neat built-in features to do so: profiling Go programs (official Go blog).
Still, the actual bottleneck might well be outside your Go program AND/OR in the way it interacts with the OS.
Addressing your actual problem without code is pointless guessing. Post some, and everyone will try their best to help you.
You will probably have to post your source code to get any real input, but just to be sure: have you increased the number of CPUs to use?
import "runtime"
func main() {
runtime.GOMAXPROCS(runtime.NumCPU())
}

Calculating CPU usage from /proc/stat

When reading /proc/stat, I get these return values:
cpu 20582190 643 1606363 658948861 509691 24 112555 0 0 0
cpu0 3408982 106 264219 81480207 19354 0 35 0 0 0
cpu1 3395441 116 265930 81509149 11129 0 30 0 0 0
cpu2 3411003 197 214515 81133228 418090 0 1911 0 0 0
cpu3 3478358 168 257604 81417703 30421 0 29 0 0 0
cpu4 1840706 20 155376 83328751 1564 0 7 0 0 0
cpu5 1416488 15 171101 83410586 1645 13 108729 0 0 0
cpu6 1773002 7 133686 83346305 25666 10 1803 0 0 0
cpu7 1858207 10 143928 83322929 1819 0 8 0 0 0
Some sources state to read only the first four values to calculate CPU usage, while others say to read all of them.
Do I read only the first four values (user, nice, system, and idle) to calculate CPU utilization? Or do I need all the values? Or not all, but more than four? Would I need iowait, irq, or softirq?
cpu 20582190 643 1606363
Versus the entire line.
cpu 20582190 643 1606363 658948861 509691 24 112555 0 0 0
Edit: some sources also state that iowait is added into idle.
When calculating a specific process's CPU usage, does the method differ?
The man page states that it varies with architecture, and also gives a couple of examples describing how they are different:
In Linux 2.6 this line includes three additional columns: ...
Since Linux 2.6.11, there is an eighth column, ...
Since Linux 2.6.24, there is a ninth column, ...
When "some people said to only use..." they were probably not taking these into account.
Regarding whether the calculation differs across CPUs: You will find lines related to "cpu", "cpu0", "cpu1", ... in /proc/stat. The "cpu" fields are all aggregates (not averages) of corresponding fields for the individual CPUs. You can check that for yourself with a simple awk one-liner.
cpu 84282 747 20805 1615949 44349 0 308 0 0 0
cpu0 26754 343 9611 375347 27092 0 301 0 0 0
cpu1 12707 56 2581 422198 5036 0 1 0 0 0
cpu2 33356 173 6160 394561 7508 0 4 0 0 0
cpu3 11464 174 2452 423841 4712 0 1 0 0 0
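For instance (a sketch that relies only on the line format shown above; small discrepancies are possible because the counters keep ticking while the file is read):
awk '
    /^cpu /     { n = NF; for (i = 2; i <= NF; i++) agg[i] = $i }   # aggregate line
    /^cpu[0-9]/ { for (i = 2; i <= NF; i++) sum[i] += $i }          # per-CPU lines
    END {
        for (i = 2; i <= n; i++)
            printf "column %d: cpu = %d, sum over cpuN = %d\n", i - 1, agg[i], sum[i]
    }' /proc/stat
For utilization itself, the usual approach is to sample the cpu line twice, a fixed interval apart, and compute usage = 1 - delta(idle + iowait) / delta(sum of all columns), which settles the question of whether iowait counts as idle by deciding it explicitly.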

Oracle 11gr2 failed check of kernel parameters on hp-ux

I'm installing Oracle 11gR2 on a 64-bit Itanium HP-UX (v11.31) system (for HP Operations Manager 9).
In accordance with the installation requirements, I've changed the kernel parameters, but when I start the installation process it doesn't recognize them.
Below are the parameters that I've set:
Parameter         Required (manual)                   On server
----------------------------------------------------------------
fs_async          0                                   0
ksi_alloc_max     (nproc*8)                           10240*8 = 81920
executable_stack  0                                   0
max_thread_proc   1024                                3003
maxdsiz           0x40000000 (1073741824)             2063835136
maxdsiz_64bit     0x80000000 (2147483648)             2147483648
maxfiles          256 (a)                             4096
maxssiz           0x8000000 (134217728)               134217728
maxssiz_64bit     0x40000000 (1073741824)             1073741824
maxuprc           ((nproc*9)/10)                      9216
msgmni            (nproc)                             10240
msgtql            (nproc)                             32000
ncsize            35840                               95120
nflocks           (nproc)                             10240
ninode            (8*nproc+2048)                      83968
nkthread          (((nproc*7)/4)+16)                  17936
nproc             4096                                10240
semmni            (nproc)                             10240
semmns            (semmni*2)                          20480
semmnu            (nproc-4)                           10236
semvmx            32767                               65535
shmmax            size of memory or 0x40000000        1073741824
                  (whichever is higher)
shmmni            4096                                4096
shmseg            512                                 1024
vps_ceiling       64                                  64
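To rule out a mismatch between what was set and what the running kernel reports, the live values can be queried directly. A sketch using kctune (the tunable-management command on HP-UX 11i v3; substitute any of the parameter names from the table above):
for t in nproc nkthread max_thread_proc maxdsiz_64bit semmni shmmax; do
    kctune "$t"
done
If the values kctune reports match the table, the problem is more likely in how the Oracle installer reads them than in the kernel configuration itself.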
In case this helps:
[root@HUG30055 /] # swapinfo
Kb Kb Kb PCT START/ Kb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 4194304 0 4194304 0% 0 - 1 /dev/vg00/lvol2
dev 8388608 0 8388608 0% 0 - 1 /dev/vg00/lvol10
reserve - 742156 -742156
memory 7972944 3011808 4961136 38%
[root@HUG30055 /] # bdf /tmp
Filesystem kbytes used avail %used Mounted on
/dev/vg00/lvol6 4194304 1773864 2401576 42% /tmp
