Why is docker consuming so much VIRT memory? - macos

I am running Docker on Mac OS X (2.5 GHz Intel Core i7, 16 GB 1600 MHz DDR3).
The point is that it seems to be consuming TOO much VIRT memory, if I am reading the htop output correctly.
Is this normal? Or is there any problem behind it? My laptop is very slow.

This is illustrated by moby/moby issue 31594.
That issue actually asks to run contrib/check-config.sh as a way to know more about the docker configuration being used.
The same issue has been reported since 2015, in #15020:
It appears that Docker somehow does not respect MALLOC_ARENA_MAX and will regardless allow the amount of virtual memory to grow to a number correlating to the number of CPUs being allocated to it.
(host is running macOS 10.13.2)
As commented:
docker itself does nothing with that environment variable (or memory management of the processes inside the container); it sets up namespaces and cgroups for the process, which is all part of the kernel.
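To see why the virtual size can track the CPU count, here is a rough back-of-the-envelope sketch. It assumes the documented 64-bit glibc default of 8 malloc arenas per online core, and it further assumes (not stated in the thread) that each arena reserves one 64 MiB heap of address space. The function names are hypothetical. The result is reserved address space (VIRT), not resident memory.

```python
import os

# Assumptions, not taken from the thread:
# - 64-bit glibc allows up to 8 arenas per online core unless MALLOC_ARENA_MAX is set
# - each arena reserves one 64 MiB heap of address space (arenas can grow further)
ARENAS_PER_CORE_64BIT = 8
HEAP_RESERVE_MIB = 64


def arena_limit(cores, malloc_arena_max=None):
    """Return the arena cap: MALLOC_ARENA_MAX if given, else 8 per core."""
    if malloc_arena_max:
        return malloc_arena_max
    return ARENAS_PER_CORE_64BIT * cores


def reserved_virt_mib(cores, malloc_arena_max=None):
    """Estimate the address space reserved by malloc arenas, in MiB."""
    return arena_limit(cores, malloc_arena_max) * HEAP_RESERVE_MIB


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print(cores, "cores:", reserved_virt_mib(cores), "MiB of reserved address space")
```

Under these assumptions a process that sees 4 CPUs could reserve about 2 GiB of address space for arenas alone, while touching only a small part of it.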

Related

How to check IRQ latency in Linux (X86_64) for performance tuning?

Is there a way to check the interrupt processing latency in Linux kernel?
Or is there a way to check why CPU usage is only 40% in a specific configuration of Linux 4.19.138?
Background:
I have run into a problem. I have an x86 server running either a third-party Linux 4.19.138 kernel (whose configuration file is about 6000 lines long) or Ubuntu 20.04 x86_64 (whose configuration file is about 9500 lines long).
When running a netperf test on this server, I found that with the third-party Linux 4.19.138 kernel the I/O latency of netperf is worse than with Ubuntu 20.04. The CPU usage is below 40% when running the third-party kernel, while it is about 100% when running Ubuntu 20.04.
Both use the same kernel command line and the same performance profile at runtime.
It seems that the interrupt handling or the netserver process on the server is throttled in Linux 4.19.138.
I then rebuilt the Ubuntu 20.04 kernel using the short configuration file (6000 lines long) and got similarly bad results.
So I concluded that the kernel configuration makes the difference.
Before comparing the 2 configurations (6000 lines vs 9500 lines), I would like to narrow it down: is there a way to check why CPU usage is only 40% with that configuration of 4.19.138? Or is there a way to check the interrupt processing latency in the Linux kernel?
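If it does come to comparing the two configuration files, a small script can reduce thousands of lines to just the options that differ. This is a minimal sketch. The function names and the sample option values are invented for illustration, and an option missing from one file is treated as not set, which is an assumption.

```python
def parse_kconfig(text):
    """Map option name -> value from the text of a kernel .config file."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" means the option is disabled
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kconfig(a_text, b_text):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    a, b = parse_kconfig(a_text), parse_kconfig(b_text)
    result = {}
    for name in sorted(set(a) | set(b)):
        left, right = a.get(name, "n"), b.get(name, "n")  # missing == not set
        if left != right:
            result[name] = (left, right)
    return result


if __name__ == "__main__":
    # Invented sample content, not the real configurations from the question.
    short_cfg = "CONFIG_NET=y\nCONFIG_HZ=250\n# CONFIG_NO_HZ_FULL is not set\n"
    long_cfg = "CONFIG_NET=y\nCONFIG_HZ=1000\nCONFIG_NO_HZ_FULL=y\n"
    for name, (left, right) in diff_kconfig(short_cfg, long_cfg).items():
        print(name, left, "->", right)
```

In practice the two texts would be read from the two .config files. The kernel source tree also ships scripts/diffconfig for the same purpose.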
I finally found the reason: net.core.busy_read and net.core.busy_poll are both set to 0.
That means the socket polling is disabled, which impacts the netperf latency.
But the question now becomes:
In this case, the lower CPU usage is a sign that something is different in Linux. What kind of tool can we use to figure out what causes the CPU usage difference between the 2 kernels?
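The two sysctls named above can be checked on each kernel before running netperf. The sketch below reads them from /proc/sys. The function names and the proc_sys parameter are hypothetical, and a missing file is reported as None, which may happen on kernels built without busy-poll support.

```python
from pathlib import Path


def read_busy_poll(proc_sys="/proc/sys"):
    """Read net.core.busy_read and net.core.busy_poll (microseconds)."""
    core = Path(proc_sys) / "net" / "core"
    values = {}
    for name in ("busy_read", "busy_poll"):
        path = core / name
        # None means the sysctl file is not present on this system
        values[name] = int(path.read_text().strip()) if path.exists() else None
    return values


def polling_enabled(values):
    """True if at least one of the two sysctls is set to a non-zero value."""
    return any(v for v in values.values())


if __name__ == "__main__":
    current = read_busy_poll()
    print(current, "polling enabled:", polling_enabled(current))
```

Running this on both kernels and comparing the output shows whether socket busy polling differs between them.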

Any way to run Docker For Mac in only a couple GB of RAM?

Docker For Mac is demanding 4GB of available RAM.
That is a much larger overhead than I have seen before for VMs.
Is there any way to run Docker on Mac without so much RAM?
Have you tried these settings - https://docs.docker.com/docker-for-mac/#advanced
I have been running Docker Desktop on Mac with these settings for a long time without any issues, at least until you run heavy workloads on it.

Why is Docker limiting my container's CPU usage to 100% on Raspberry

I'm running the latest docker version on Raspbian on my RaspberryPi 3.
I have a program that takes pictures with the camera, compresses them and sends them over the network.
When I run the program outside of Docker, I can see using top that it constantly consumes around 130% CPU (out of 4 cores x 100% on the Raspberry Pi). The constant compression is the CPU-intensive part of the program, and it manages to compress at around 32 fps.
When I run the exact same program in a Docker container, I can see in top that it only uses 100% CPU (still distributed among the cores). Here the program is only able to compress at around 23 fps.
I tried passing the --cpus flag but it returned an error:
docker: Error response from daemon: NanoCPUs can not be set, as your kernel does not support CPU cfs period/quota or the cgroup is not mounted.
Note: I have done many tests and networking is not the issue.
I think I figured out the problem.
When the image was built from the Dockerfile, it downloaded different versions of the libraries I used in the code. So technically it was running different code from what was running on the host, and it was not a Docker issue.
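As background for the NanoCPUs error quoted in the question: the --cpus flag is expressed internally in billionths of a CPU and enforced through the kernel's CFS quota and period, which is why it fails on a kernel without that support. The sketch below shows the arithmetic only. The function name is hypothetical, and the 100000 microsecond period is Docker's documented default, assumed here.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period, in microseconds


def cpus_to_cfs(cpus, period_us=CFS_PERIOD_US):
    """Translate a --cpus value into NanoCPUs and a CFS quota/period pair."""
    return {
        "nano_cpus": round(cpus * 1_000_000_000),  # billionths of a CPU
        "cpu_period": period_us,
        "cpu_quota": round(cpus * period_us),      # CPU time allowed per period
    }


if __name__ == "__main__":
    # 1.5 CPUs corresponds to --cpu-period=100000 --cpu-quota=150000
    print(cpus_to_cfs(1.5))
```

A value of 4 would let the container use all four cores of the Raspberry Pi, the same as setting no limit at all.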

CoreOS VM crash: swap trace printed

I'm using CoreOS 773.1.0 with Kubernetes. Recently it crashed and printed this trace log:
The VM is still running, but I cannot ssh to it, and the Kubernetes master node declares it NotReady. I had to turn it off (not shut it down) and start it again.
I'm using Hyper-V as the hypervisor; the VM is assigned 12GB RAM, 4GB swap, and 4 CPU cores. Notably, I got this error after I moved the disk (.vhd file) to a new partition.
This is a known issue in CoreOS 717.3.0 with swap: https://github.com/coreos/bugs/issues/429
Based on the stack trace, it looks like the kernel was trying to free up memory. So, probably the node was under severe memory pressure. Kernel bugs tend to crop up under memory pressure.
It also looks like swap was turned on. Kubernetes developers don't recommend turning on swap.
It looks like the kubelet process is stuck. Do you have the kubelet log, and do you know at which operation the kubelet gets stuck?

qemu vs qemu-kvm: some performance measurements

I conducted the following benchmark in qemu and qemu-kvm, with the following configuration:
CPU: AMD 4400 dual-core processor with SVM enabled, 2G RAM
Host OS: OpenSUSE 11.3 with the latest patches, running KDE4
Guest OS: FreeDos
Emulated Memory: 256M
Network: Nil
Language: Turbo C 2.0
Benchmark Program: Count from 0000000 to 9999999. Display the counter on the screen by directly accessing the screen memory (i.e. 0xb800:xxxx)
It only takes 6 sec when running in qemu.
But it takes 89 sec when running in qemu-kvm.
I ran the benchmark one by one, not in parallel.
I scratched my head the whole night, but I still have no idea why this happens. Could somebody give me some hints?
KVM uses QEMU as its device emulator: every device operation is emulated by the user-space QEMU program. When you write to 0xB8000, the graphics display is accessed, which makes the guest CPU perform a `vmexit' out of guest mode and return to the KVM module, which in turn sends a device emulation request to the user-space QEMU backend.
In contrast, QEMU without KVM does all the work in a single process, apart from the usual system calls, so there are fewer CPU context switches. Meanwhile, your benchmark code is a simple loop whose code block needs to be translated only once. That costs almost nothing compared to the vmexit and kernel-user communication on every iteration in the KVM case.
This should be the most probable cause.
Your benchmark is an I/O-intensive benchmark, and all the I/O devices are actually the same for qemu and qemu-kvm. In qemu's source code they can be found in hw/*.
This explains why qemu-kvm need not be much faster than qemu. However, I have no definite answer for the slowdown. I have the following explanation, and I think it's correct to a large extent.
"The qemu-kvm module uses the kvm kernel module in the Linux kernel. This runs the guest in x86 guest mode, which causes a trap on every privileged instruction. By contrast, qemu uses a very efficient TCG, which translates instructions the first time it sees them. I think that the high cost of the traps is showing up in your benchmarks." This isn't true for all I/O devices, though. An Apache benchmark would run better on qemu-kvm because the library does the buffering and uses the fewest privileged instructions to do the I/O.
The reason is that too many VMEXITs take place.
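The figures in the question give a feel for the cost per loop iteration. This is plain arithmetic on the reported timings (6 s and 89 s for counting from 0000000 to 9999999, i.e. 10,000,000 iterations). The function name is hypothetical, and attributing the whole difference to vmexits is an assumption based on the answers above.

```python
ITERATIONS = 10_000_000  # counting from 0000000 to 9999999


def per_iteration_us(total_seconds, iterations=ITERATIONS):
    """Average time per loop iteration, in microseconds."""
    return total_seconds * 1_000_000 / iterations


if __name__ == "__main__":
    qemu_us = per_iteration_us(6)    # plain qemu (TCG)
    kvm_us = per_iteration_us(89)    # qemu-kvm
    print("qemu:     %.1f us per iteration" % qemu_us)
    print("qemu-kvm: %.1f us per iteration" % kvm_us)
    print("extra cost under KVM: %.1f us per iteration" % (kvm_us - qemu_us))
```

That works out to roughly 0.6 microseconds per iteration under plain qemu against roughly 8.9 microseconds under qemu-kvm, about 15 times slower, which is plausible if each screen update triggers at least one round trip through the KVM module to user-space QEMU.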
