I'm wondering about haproxy performance from within a container. To keep things simple: if I have a VM running haproxy with this CPU config, I know what to expect:
nbproc 1
nbthread 8
cpu-map auto:1/1-8 0-7
If I want to port the (whole) config to Docker for testing purposes, without any fancy Swarm magic or setup, just plain Docker so that I can understand how things map, I'd imagine that the CPU config gets simpler and that the haproxy instance is meant to scale. I guess I have two questions:
Would you even bother configuring cpu from within an haproxy docker container or would you scale the container from behind a service? Maybe you need both.
Can a single container utilise the above config as though it were running on the system as a daemon? Would docker / containerd even care about this config?
I know that having 4 containers, each with its own config and the CPUs evenly mapped like so, wouldn't scale or make any sense:
# container 1
nbproc 1
nbthread 2
cpu-map auto:1/1-2 0-1
# container 2
nbproc 1
nbthread 2
cpu-map auto:1/3-4 2-3
# container 3
nbproc 1
nbthread 2
cpu-map auto:1/5-6 4-5
# container 4
nbproc 1
nbthread 2
cpu-map auto:1/7-8 6-7
But it's this sort of saturation that I'm wondering about. Just how does haproxy / docker handle this sort of cpu nuance?
I've confirmed that there's little to no perceivable impact on service when running haproxy under containerd vs. running it under systemd, using the image provided by haproxy. Running a single detached container (-d) with --network host and no CPU or memory limits, at worst I've seen a 2-3% impact on external web latency, with live traffic peaking at about 50-60 MB/s; this itself depends on throughput and the type of requests. On an 8-core VM with 4 GB of memory (the host CPU is a Xeon Gold 6130) and a gigabit interface, memory utilisation is almost identical. CPU performance also remains stable, with a possible 3-5% increase in utilisation. These tests are private and unpublished.
As far as CPU configuration goes:
nbproc 1
nbthread 8
cpu-map auto:1/1-8 0-7
master-worker
This config maps 1:1 between containerd and systemd and yields the results already mentioned. The process and threads start up under containerd and function as you'd expect. At peak this takes about 80-90% CPU out of a total of 800% (8 cores at 100% each), which is less than one fully loaded core. So, in theory, this container could be scaled with this configuration a further 8 times, or 5 or 6 times to leave some headroom.
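The headroom estimate above is simple arithmetic, sketched here in Go. It is a back-of-the-envelope helper under stated assumptions: each instance peaks at about 90% of one core (the upper end of the reported 80-90%), instances scale linearly and don't contend with each other, and `instancesThatFit` is a hypothetical name.

```go
package main

import "fmt"

// instancesThatFit estimates how many haproxy instances, each peaking at
// peakPct percent of one core, fit on a host with totalCores cores while
// keeping headroomFrac (0..1) of the total CPU capacity unused.
// Assumption: instances scale linearly and do not contend with each other.
func instancesThatFit(peakPct float64, totalCores int, headroomFrac float64) int {
	capacityPct := float64(totalCores) * 100 * (1 - headroomFrac)
	return int(capacityPct / peakPct)
}

func main() {
	fmt.Println(instancesThatFit(90, 8, 0))   // no headroom
	fmt.Println(instancesThatFit(90, 8, 0.3)) // keep ~30% of the CPU free
	fmt.Println(instancesThatFit(90, 8, 0.4)) // keep ~40% of the CPU free
}
```

With no headroom this gives roughly 8 instances in total on 8 cores; keeping 30-40% of the CPU free brings it down to 5 or 6, in line with the estimate above.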
Also note that any fluctuations in these performance data are likely due to my environment: these tests were taken from a real environment across multiple sites, not a test bed where I controlled every aspect. Depending on your host CPU and load, your results may vary wildly.
As per the Docker documentation, we can get the CPU usage of a Docker container with the docker stats command.
The CPU % column gives the percentage of the host’s CPU the container is using.
Let's say I limit the container to 50% of a single host CPU. I can specify a 50% single-core limit with the --cpus=0.5 option, as per https://docs.docker.com/config/containers/resource_constraints/
How can we get a container's CPU % usage out of its allowed CPU with a docker command?
E.g. out of the 50% of a single CPU core, 99% is used.
Is there any way to get it with cadvisor or prometheus?
How can we get a container's CPU % usage out of its allowed CPU with a docker command? E.g. out of the 50% of a single CPU core, 99% is used.
Docker has the docker stats command, which shows CPU/memory usage and a few other stats:
CONTAINER ID   NAME                                   CPU %   MEM USAGE / LIMIT   MEM %    NET I/O       BLOCK I/O         PIDS
c43f085dea8c   foo_test.1.l5haec5oyr36qdjkv82w9q32r   0.00%   11.15MiB / 100MiB   11.15%   7.45kB / 0B   3.29MB / 8.19kB   9
Though it does show memory usage relative to the limit out of the box, there is no such feature for CPU yet. It is possible to solve that with a script that calculates the value on the fly, but I'd rather choose the second option (cAdvisor/Prometheus).
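For the script route, the calculation is a single division. A minimal Go sketch, assuming the Linux Docker CLI convention where 100% in the CPU % column means one fully used core (so a container limited with --cpus=0.5 tops out near 50%). `cpuPercentOfLimit` is a hypothetical helper; its input string could come from something like `docker stats --no-stream --format '{{.CPUPerc}}'`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuPercentOfLimit converts a docker stats CPU % value (e.g. "49.50%")
// into a percentage of the container's --cpus limit.
// Assumption: 100% in docker stats equals one fully used core.
func cpuPercentOfLimit(statsCPU string, cpusLimit float64) (float64, error) {
	if cpusLimit <= 0 {
		return 0, fmt.Errorf("cpus limit must be positive, got %v", cpusLimit)
	}
	v, err := strconv.ParseFloat(strings.TrimSuffix(strings.TrimSpace(statsCPU), "%"), 64)
	if err != nil {
		return 0, err
	}
	// Cores used = v/100 and the limit is cpusLimit cores, so the share of
	// the limit in percent is (v/100)/cpusLimit*100 = v/cpusLimit.
	return v / cpusLimit, nil
}

func main() {
	pct, err := cpuPercentOfLimit("49.50%", 0.5)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.1f%% of the limit\n", pct)
}
```

With the question's example, 49.50% reported against a 0.5-core limit comes out as 99% of the limit.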
Is there any way to get it with cadvisor or prometheus?
Yes, there is:
irate(container_cpu_usage_seconds_total{cpu="total"}[1m])
/ ignoring(cpu)
(container_spec_cpu_quota/container_spec_cpu_period)
The first line is a typical irate function that calculates how many CPU seconds per second (i.e. how many cores' worth of CPU) a container is using. It comes with a label cpu="total", which the second part does not have, and that's why there is ignoring(cpu).
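To make the first line concrete: Prometheus's irate looks only at the last two samples in the range window, divides their difference by the time between them, and treats a drop in the counter as a reset. The Go sketch below is a simplified illustration of that calculation on a CPU-seconds counter; the `sample` type and `irate` function are hypothetical, not Prometheus's own code.

```go
package main

import (
	"errors"
	"fmt"
)

// sample is one scrape of a counter such as container_cpu_usage_seconds_total.
type sample struct {
	t float64 // timestamp in seconds
	v float64 // counter value (CPU seconds)
}

// irate mimics Prometheus's irate(): a per-second rate from the last two
// samples. If the counter decreased (e.g. the container restarted), the last
// value is taken as the increase since the reset.
func irate(samples []sample) (float64, error) {
	if len(samples) < 2 {
		return 0, errors.New("need at least two samples")
	}
	prev, last := samples[len(samples)-2], samples[len(samples)-1]
	dt := last.t - prev.t
	if dt <= 0 {
		return 0, errors.New("samples must have increasing timestamps")
	}
	delta := last.v - prev.v
	if delta < 0 { // counter reset
		delta = last.v
	}
	return delta / dt, nil
}

func main() {
	// 7.5 CPU-seconds consumed over the last 15 s, i.e. half a core in use.
	r, err := irate([]sample{{0, 100}, {15, 107.5}, {30, 115}})
	if err != nil {
		panic(err)
	}
	fmt.Println(r)
}
```

The result is in cores, which is why it can be divided directly by the number of cores allowed by the quota.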
The bottom line of the query calculates how many CPU cores a container is allowed to use. There are two metrics:
container_spec_cpu_quota - the actual quota value. It is the fraction of CPU cores you've set as the limit, multiplied by container_spec_cpu_period.
container_spec_cpu_period - comes from the CFS scheduler; it is the enforcement period, in microseconds, and acts like a unit for the quota value.
I know it may be hard to grasp at first, so let me explain with an example:
Consider that container_spec_cpu_period is set to the default value, which is 100,000 microseconds, and the container CPU limit is set to half a core (0.5). In this case:
container_spec_cpu_period 100,000
container_spec_cpu_quota 50,000 # =container_spec_cpu_period*0.5
With CPU limit set to two cores you will have this:
container_spec_cpu_quota 200,000
And so, by dividing the quota by the period, we get the fraction of CPU cores back, which is then used in another division to calculate how much of the limit is used.
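The two divisions can be sketched in Go as follows. This assumes you already have the cores in use (what the irate line yields) and the quota and period in microseconds; the function names are hypothetical, and treating a non-positive quota as "no limit" is an assumption of this sketch rather than documented cAdvisor behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// limitCores turns a CFS quota/period pair (both in microseconds) into the
// number of cores a container may use, e.g. 50,000/100,000 = 0.5.
func limitCores(quotaUs, periodUs float64) (float64, error) {
	if quotaUs <= 0 || periodUs <= 0 {
		return 0, errors.New("no CPU limit set (non-positive quota or period)")
	}
	return quotaUs / periodUs, nil
}

// fractionOfLimit divides the cores actually used by the cores allowed.
func fractionOfLimit(usedCores, quotaUs, periodUs float64) (float64, error) {
	limit, err := limitCores(quotaUs, periodUs)
	if err != nil {
		return 0, err
	}
	return usedCores / limit, nil
}

func main() {
	for _, c := range []struct{ used, quota, period float64 }{
		{0.495, 50000, 100000}, // half-core limit from the example
		{0.5, 200000, 100000},  // two-core limit from the example
	} {
		f, err := fractionOfLimit(c.used, c.quota, c.period)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%.0f%% of the limit used\n", f*100)
	}
}
```

This mirrors the PromQL expression: the numerator is the irate result and the denominator is container_spec_cpu_quota/container_spec_cpu_period.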
$ man top
CPU Percentage of processor usage, broken into user, system, and idle components. The time period for which
these percentages are calculated depends on the event counting mode.
Disks Number and total size of disk reads and writes.
LoadAvg Load average over 1, 5, and 15 minutes. The load average is the average number of jobs in the run
queue.
MemRegions Number and total size of memory regions, and total size of memory regions broken into private (broken
into non-library and library) and shared components.
Networks Number and total size of input and output network packets.
PhysMem Physical memory usage, broken into wired, active, inactive, used, and free components.
Procs Total number of processes and number of processes in each process state.
SharedLibs Resident sizes of code and data segments, and link editor memory usage.
Threads Number of threads.
Time Time, in H:MM:SS format. When running in logging mode, Time is in YYYY/MM/DD HH:MM:SS format by
default, but may be overridden with accumulative mode. When running in accumulative event counting
mode, the Time is in HH:MM:SS since the beginning of the top process.
VirtMem Total virtual memory, virtual memory consumed by shared libraries, and number of pageins and pageouts.
Swap Swap usage: total size of swap areas, amount of swap space in use and amount of swap space available.
Purgeable Number of pages purged and number of pages currently purgeable.
Below the global state fields, a list of processes is displayed. The fields that are displayed depend on the
options that are set. The pid field displays the following for the architecture:
+ for 64-bit native architecture, or - for 32-bit native architecture, or * for a non-native architecture.
I see the following output of top on my Mac. I don't quite understand it, as the manual is not very detailed.
For example, I only have 8GB of memory. Why does it show 15G PhysMem? What are the wired, active, inactive, used, and free components?
For Disks, are the numbers '21281572/769G read' the amount of data read since the machine started?
For Networks, are the numbers counted since the machine started?
For VM, what are vsize, framework vsize, swapins, swapouts?
$ top -l 1 | head
Processes: 797 total, 4 running, 1 stuck, 792 sleeping, 1603 threads
2019/05/08 09:48:40
Load Avg: 54.32, 41.08, 34.69
CPU usage: 62.2% user, 36.89% sys, 1.8% idle
SharedLibs: 258M resident, 65M data, 86M linkedit.
MemRegions: 78888 total, 6239M resident, 226M private, 2045M shared.
PhysMem: 15G used (2220M wired), 785M unused.
VM: 3392G vsize, 1299M framework vsize, 0(0) swapins, 0(0) swapouts.
Networks: packets: 24484543/16G in, 24962180/7514M out.
Disks: 21281572/769G read, 20527776/242G written.
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/disk1s1 466G 444G 19G 97% /
/dev/disk1s4 466G 3.1G 19G 14% /private/var/vm
/dev/disk2s1 932G 546G 387G 59% /Volumes/usbhd
com.apple.TimeMachine.2019-05-06-225547#/dev/disk1s1 466G 441G 19G 96% /Volumes/com.apple.TimeMachine.localsnapshots/Backups.backupdb/py’s MacBook Air/2019-05-06-225547/Macintosh HD
com.apple.TimeMachine.2019-05-02-082105#/dev/disk1s1 466G 440G 19G 96% /Volumes/com.apple.TimeMachine.localsnapshots/Backups.backupdb/py’s MacBook Air/2019-05-02-082105/Macintosh HD
I only have 8GB of memory. Why does it show 15G PhysMem?
top reports 15G used, which is above the 8GB in the machine; this is due to the use of Virtual Memory, where the OS can choose to swap (copy in / out) pages in memory to other storage (hard disk / SSD).
What are wired, active, inactive, used, and free components?
These are the states of physical pages of memory as used by Virtual Memory. So we have:
wired - Memory pages that are in use and can't be swapped (paged) out to disk (e.g. the OS itself)
active - Memory pages used for virtual memory that are in use that have been referenced recently. They are not likely to be paged out, unless no other pages are available
inactive - Memory pages that are used for virtual memory, but have not been referenced recently. They are likely to be swapped out if the need arises
used - Sometimes known as "speculative": physical memory that the OS has mapped speculatively, guessing it may be needed, but that is not yet active
free - Physical memory pages that are not being used for virtual memory and are instantly available
For Disks, are the numbers '21281572/769G read' the amount of data read since the machine started?
For Networks, are the numbers since the machine starts?
Yes, I believe that these are since rebooting the OS.
For VM, what are vsize, framework vsize, swapins, swapouts?
I expect these are:
vsize - the total virtual address space of all processes; this is not disk usage (3392G is far larger than the 466G disk)
framework vsize - No idea about this one!
swapins - the number of memory pages read in from the swap files on disk to physical memory
swapouts - the number of memory pages written out from physical memory to the swap files on disk
I was testing Mesos cgroups isolation to see what kind of error gets thrown.
I ran the shell program below with Marathon, assigning it 1 MB of memory and 1 CPU.
#!/bin/sh
temp=a
while :
do
temp=$temp$temp
echo ${#temp}
sleep 1
done
A single character takes 1 byte of space, so the program above should fail once the length of the temp string reaches about 1 MB. But the tasks seem to get killed at random points: sometimes at length 1048576, sometimes at 2097152 or 4194304.
Ideally, since 1 MB is the limit, it should have stopped when the length was 524288.
Additional info -
Slave is run with --isolation='cgroups/cpu,cgroups/mem'
Mesos version - 0.25
The variance you are seeing can be explained with the following:
The amount of memory taken up by your script is not entirely deterministic, as it depends on the implementation of the shell interpreter as well as the size of your system's shared libraries (i.e. the parts of those libraries loaded into your program's resident set).
A 1 MB task in Mesos is accompanied by 32 MB for the executor. Because the executor requires slightly less than 32 MB, you will have slightly more than 1 MB for your task.
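Because the script doubles the string on every iteration, even a small variation in how much memory is left over moves the kill point by a whole doubling step. The Go sketch below illustrates that effect only. It is a deliberate simplification that counts just the string itself (ignoring the shell's own footprint and any temporary copies), and the headroom values are hypothetical, not measured.

```go
package main

import "fmt"

const mib = 1 << 20

// lastFittingLength returns the largest length the doubling loop reaches
// (1, 2, 4, ...) that still fits in headroom bytes, i.e. roughly the last
// value the script would echo before the next doubling pushes it over.
// Simplification: only the string itself is counted.
func lastFittingLength(headroom int) int {
	length := 1
	for length*2 <= headroom {
		length *= 2
	}
	return length
}

func main() {
	// Hypothetical amounts of memory left for the string once the executor,
	// the shell and its shared libraries have taken their share.
	for _, h := range []int{3 * mib / 2, 3 * mib, 6 * mib} {
		fmt.Printf("headroom %.1f MiB -> last length %d\n", float64(h)/mib, lastFittingLength(h))
	}
}
```

Headroom of 1.5, 3 and 6 MiB maps to last lengths of 1048576, 2097152 and 4194304 respectively, so the spread seen in the question is consistent with a few megabytes of nondeterministic overhead.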
My program loads a lot of data at startup and then calls debug.FreeOSMemory() so that any extra space is given back immediately.
loadDataIntoMem()
debug.FreeOSMemory()
After loading into memory, htop shows me the following for the process:
VIRT RES SHR
11.6G 7629M 8000
But a call to runtime.ReadMemStats shows me the following:
Alloc 5593336608 5.3G
BuckHashSys 1574016 1.6M
HeapAlloc 5593336610 5.3G
HeapIdle 2607980544 2.5G
HeapInuse 7062446080 6.6G
HeapReleased 2607980544 2.5G
HeapSys 9670426624 9.1G
MCacheInuse 9600 9.4K
MCacheSys 16384 16K
MSpanInuse 106776176 102M
MSpanSys 115785728 111M
OtherSys 25638523 25M
StackInuse 589824 576K
StackSys 589824 576K
Sys 10426738360 9.8G
TotalAlloc 50754542056 48G
Alloc is the amount obtained from the system and not yet freed (this is resident memory, right?). But there is a big difference between the two.
I rely on HeapIdle to kill my program, i.e. if HeapIdle is more than 2 GB, restart. In this case it is 2.5 GB, and it isn't going down even after a while. Go should reuse idle heap memory when allocating more in the future, thus reducing HeapIdle, right?
If my assumption that Alloc is the resident memory is wrong, which stat can accurately tell me what the RES value in htop is?
What can I do to reduce the value of HeapIdle ?
This was tried on go 1.4.2, 1.5.2 and 1.6.beta1
The effective memory consumption of your program will be Sys-HeapReleased. This still won't be exactly what the OS reports, because the OS can choose to allocate memory how it sees fit based on the requests of the program.
If your program runs for any appreciable amount of time, the excess memory will be offered back to the OS, so there's no need to call debug.FreeOSMemory(). It's also not the job of the garbage collector to keep memory as low as possible; the goal is to use memory as efficiently as possible. This requires some overhead, and room for future allocations.
If you're having trouble with memory usage, it would be a lot more productive to profile your program and see why you're allocating more than expected, instead of killing your process based on incorrect assumptions about memory.