Runtime constraints on CPU and memory with docker containers - memory-management

How can we change the memory and cpu limit for docker containers at runtime? I mean while the container is running I would like to change the the memory limit for example
Thanks in advance

You cannot change that inside the running container, you would have to do that on your host.
How you do that on the host depends on your host-os, on Linux I suggest to take a look a cgroups, thats how docker internally restricts containers.
On ubuntu you could use the cgroup manager cgm (tried it on ubuntu 15.04).
create a new cgroup for cpu, move the process (e.g. 28433) to it and set a value
> # cgm create cpu dudecpu
> # cgm movepid cpu dudecpu 28433
> # cgm setvalue cpu dudecpu cpu.shares 512
create a new cgroup for the memory move the process (e.g. 28433) to it and set a value
> cgm create memory dudemem
> cgm movepid memory dudemem 28433
> cgm setvalue memory dudemem memory.limit_in_bytes 1000000000
check where your new cgroups are and look into these directories, you'll find all the properties of the cgroup there.
> find /sys/fs/cgroup/ -name "dude*"
> /sys/fs/cgroup/memory/user.slice/user-1000.slice/session-c3.scope/dudemem
> /sys/fs/cgroup/cpu,cpuacct/user.slice/user-1000.slice/session-c3.scope/dudecpu

Related

Spark - Container is running beyond physical memory limits

I have a cluster of two worker nodes.
Worker_Node_1 - 64GB RAM
Worker_Node_2 - 32GB RAM
Background Summery :
I am trying to execute spark-submit on yarn-cluster to run Pregel on a Graph to calculate the shortest path distances from one source vertex to all other vertices and print the values on console.
Experment :
For Small graph with 15 vertices execution completes application final status : SUCCEEDED
My code works perfectly and prints shortest distance for 241 vertices graph for single vertex as source vertex but there is a problem.
Problem :
When I dig into the Log file the task gets complete successfully in 4 mins and 26 Secs but still on the terminal it keeps on showing application status as Running and after approx 12 more minutes task execution terminates saying -
Application application_1447669815913_0002 failed 2 times due to AM Container for appattempt_1447669815913_0002_000002 exited with exitCode: -104 For more detailed output, check application tracking page:http://myserver.com:8088/proxy/application_1447669815913_0002/
Then, click on links to logs of each attempt.
Diagnostics: Container [pid=47384,containerID=container_1447669815913_0002_02_000001] is running beyond physical memory limits. Current usage: 17.9 GB of 17.5 GB physical memory used; 18.7 GB of 36.8 GB virtual memory used. Killing container.
Dump of the process-tree for container_1447669815913_0002_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 47387 47384 47384 47384 (java) 100525 13746 20105633792 4682973 /usr/lib/jvm/java-7-oracle-cloudera/bin/java -server -Xmx16384m -Djava.io.tmpdir=/yarn/nm/usercache/cloudera/appcache/application_1447669815913_0002/container_1447669815913_0002_02_000001/tmp -Dspark.eventLog.enabled=true -Dspark.eventLog.dir=hdfs://myserver.com:8020/user/spark/applicationHistory -Dspark.executor.memory=14g -Dspark.shuffle.service.enabled=false -Dspark.yarn.executor.memoryOverhead=2048 -Dspark.yarn.historyServer.address=http://myserver.com:18088 -Dspark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native -Dspark.shuffle.service.port=7337 -Dspark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/spark/lib/spark-assembly.jar -Dspark.serializer=org.apache.spark.serializer.KryoSerializer -Dspark.authenticate=false -Dspark.app.name=com.path.PathFinder -Dspark.master=yarn-cluster -Dspark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native -Dspark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1447669815913_0002/container_1447669815913_0002_02_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class com.path.PathFinder --jar file:/home/cloudera/Documents/Longest_Path_Data_1/Jars/ShortestPath_Loop-1.0.jar --arg /home/cloudera/workspace/Spark-Integration/LongestWorstPath/configFile --executor-memory 14336m --executor-cores 32 --num-executors 2
|- 47384 47382 47384 47384 (bash) 2 0 17379328 853 /bin/bash -c LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native::/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native /usr/lib/jvm/java-7-oracle-cloudera/bin/java -server -Xmx16384m -Djava.io.tmpdir=/yarn/nm/usercache/cloudera/appcache/application_1447669815913_0002/container_1447669815913_0002_02_000001/tmp '-Dspark.eventLog.enabled=true' '-Dspark.eventLog.dir=hdfs://myserver.com:8020/user/spark/applicationHistory' '-Dspark.executor.memory=14g' '-Dspark.shuffle.service.enabled=false' '-Dspark.yarn.executor.memoryOverhead=2048' '-Dspark.yarn.historyServer.address=http://myserver.com:18088' '-Dspark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native' '-Dspark.shuffle.service.port=7337' '-Dspark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/spark/lib/spark-assembly.jar' '-Dspark.serializer=org.apache.spark.serializer.KryoSerializer' '-Dspark.authenticate=false' '-Dspark.app.name=com.path.PathFinder' '-Dspark.master=yarn-cluster' '-Dspark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native' '-Dspark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/native' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1447669815913_0002/container_1447669815913_0002_02_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.path.PathFinder' --jar file:/home/cloudera/Documents/Longest_Path_Data_1/Jars/ShortestPath_Loop-1.0.jar --arg '/home/cloudera/workspace/Spark-Integration/LongestWorstPath/configFile' --executor-memory 14336m --executor-cores 32 --num-executors 2 1> /var/log/hadoop-yarn/container/application_1447669815913_0002/container_1447669815913_0002_02_000001/stdout 2> /var/log/hadoop-yarn/container/application_1447669815913_0002/container_1447669815913_0002_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
Things I tried :
yarn.schedular.maximum-allocation-mb – 32GB
mapreduce.map.memory.mb = 2048 (Previously it was 1024)
Tried varying --driver-memory upto 24g
Could you please put more color on to how I can configure the Resource Manager so that Large Size Graphs ( > 300K vertices) can also be processed? Thanks.
Just increase default conf of spark.driver.memory from 512m to 2g solve this error in my case.
You may set the memory to higher if it keeps hitting the same error. Then, you can keep reducing it until it hits the same error so that you know the optimum driver memory to use for your job.
The more data you are processing, the more memory is needed by each Spark task. And if your executor is running too many tasks then it can run out of memory. When I had problems processing large amounts of data, it usually was a result of not properly balancing the number of cores per executor. Try to either reduce the number of cores or increase the executor memory.
One easy way to tell that you are having memory issues is to check the Executor tab on the Spark UI. If you see a lot of red bars indicating high garbage collection time, you are probably running out of memory in your executors.
I slove the error in my case to increase conf of spark.yarn.executor.memoryOverhead Which stand for off-heap memory
When you increase the amount of driver-memory and executor-memory, do not forget this config item
I have similar problem :
Key error info:
exitCode: -104
'PHYSICAL' memory limit
Application application_1577148289818_10686 failed 2 times due to AM Container for appattempt_1577148289818_10686_000002 exited with **exitCode: -104**
Failing this attempt.Diagnostics: [2019-12-26 09:13:54.392]Container [pid=18968,containerID=container_e96_1577148289818_10686_02_000001] is running 132722688B beyond the **'PHYSICAL' memory limit**. Current usage: 1.6 GB of 1.5 GB physical memory used; 4.6 GB of 3.1 GB virtual memory used. Killing container.
Increase both spark.executor.memory and spark.executor.memoryOverhead didn't take effect .
Then I increase spark.driver.memory solved it.
Spark jobs ask for resources from resource manager in a different way from MapReduce jobs. Try to tune the number of executors and mem/vcore allocated to each executor. Follow http://spark.apache.org/docs/latest/submitting-applications.html

Huge size of /lib/modules/3.19.3 while building custom kernel

I was building Linux 3.19.3 for my Linux Mint 64-bit laptop, and ran out of space because of a huge /lib/modules/3.19.3 folder.
I used the config file for 3.16.0-31-generic that was supplied by my Mint updates, reverting to default for any new parameters.
Size of /lib/modules for various kernels:
184M /lib/modules/3.13.0-24-generic
193M /lib/modules/3.15.0-031500-generic
191M /lib/modules/3.16.0-29-generic
192M /lib/modules/3.16.0-31-generic
4.0K /lib/modules/3.16.1-031601-generic
2.3G /lib/modules/3.19.3
608K /lib/modules/3.8.0-31-generic
And this size is despite make modules_install being interrupted due to lack of disk space.
Even some of the older kernels listed here were custom compiled. Is there any parameter I'm missing?
Link to .config - http://paste.ubuntu.com/10726888/

elasticsearch JDBC -RIVER java.lang.OutOfMemoryError: unable to create new native thread

I am using elasticsearch "1.4.2" with river plugin on an aws instance with 8GB ram.Everything was working fine for a week but after a week the river plugin[plugin=org.xbib.elasticsearch.plugin.jdbc.river.JDBCRiverPlugin
version=1.4.0.4] stopped working also I was not able to do a ssh login to the server.After server restart ssh login worked fine ,when I checked the logs of elastic search I could find this error.
[2015-01-29 09:00:59,001][WARN ][river.jdbc.SimpleRiverFlow] no river mouth
[2015-01-29 09:00:59,001][ERROR][river.jdbc.RiverThread ] java.lang.OutOfMemoryError: unable to create new native thread
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: unable to create new native thread
After restarting the service everything works normal .But after certain interval the same thing happen.Can anyone tell what could be the reason and solution .If any other details are required please let me know.
When I checked the number of file descriptor using
sudo ls /proc/1503/fd/ | wc -l
I could see it is increasing after every time . It was 320 and it now reached 360 (keeps increasing) . and
sudo grep -E "^Max open files" /proc/1503/limits
this shows 65535
processor info
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2670 v2 # 2.50GHz
stepping : 4
microcode : 0x415
cpu MHz : 2500.096
cache size : 25600 KB
siblings : 8
cpu cores : 4
memory
MemTotal: 62916320 kB
MemFree: 57404812 kB
Buffers: 102952 kB
Cached: 3067564 kB
SwapCached: 0 kB
Active: 2472032 kB
Inactive: 2479576 kB
Active(anon): 1781216 kB
Inactive(anon): 528 kB
Active(file): 690816 kB
Inactive(file): 2479048 kB
Do the following
Run the following two commands as root:
ulimit -l unlimited
ulimit -n 64000
In /etc/elasticsearch/elasticsearch.yml make sure you uncomment or add a line that says:
bootstrap.mlockall: true
In /etc/default/elasticsearch uncomment the line (or add a line) that says MAX_LOCKED_MEMORY=unlimited and also set the ES_HEAP_SIZE line to a reasonable number. Make sure it's a high enough amount of memory that you don't starve elasticsearch, but it should not be higher than half the memory on your system generally and definitely not higher than ~30GB. I have it set to 8g on my data nodes.
In one way or another the process is obviously being starved of resources. Give your system plenty of memory and give elasticsearch a good part of that.
I think you need to analysis your server log. Maybe In: /var/log/message

Container is running beyond physical memory. Hadoop Streaming python MR

I am running a Python Script which needs a file (genome.fa) as a dependency(reference) to execute. When I run this command :
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/had oop-streaming-2.5.1.jar -file ./methratio.py -file '../Test_BSMAP/genome.fa' - mapper './methratio.py -r -g ' -input /TextLab/sravisha_test/SamFiles/test_sam -output ./outfile
I am getting this Error:
15/01/30 10:48:38 INFO mapreduce.Job: map 0% reduce 0%
15/01/30 10:52:01 INFO mapreduce.Job: Task Idattempt_1422600586708_0001_m_000 009_0, Status : FAILED
Container [pid=22533,containerID=container_1422600586708_0001_01_000017] is running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.
I am using Cloudera Manager (Free Edition) .These are my config :
yarn.app.mapreduce.am.resource.cpu-vcores = 1
ApplicationMaster Java Maximum Heap Size = 825955249 B
mapreduce.map.memory.mb = 1GB
mapreduce.reduce.memory.mb = 1 GB
mapreduce.map.java.opts = -Djava.net.preferIPv4Stack=true
mapreduce.map.java.opts.max.heap = 825955249 B
yarn.app.mapreduce.am.resource.mb = 1GB
Java Heap Size of JobHistory Server in Bytes = 397 MB
Can Someone tell me why I am getting this error ??
I think your python script is consuming a lot of memory during the reading of your large input file (clue: genome.fa).
Here is my reason (Ref: http://courses.coreservlets.com/Course-Materials/pdf/hadoop/04-MapRed-6-JobExecutionOnYarn.pdf, Container is running beyond memory limits, http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/)
Container’s Memory Usage = JVM Heap Size + JVM Perm Gen + Native Libraries + Memory used by spawned processes
The last variable 'Memory used by spawned processes' (the Python code) might be the culprit.
Try increasing the mem size of these 2 parameters: mapreduce.map.java.opts
and mapreduce.reduce.java.opts.
Try increasing the maps spawning at the time of execution ... you can increase no. of mappers by decreasing the split size... mapred.max.split.size ...
It will have overheads but will mitigate the problem ....

Rootfs on SD card

I've a device on which I've a 3.10 linux kernel booting up to a busybox shell (initramfs)
When I extracted the busybox filesystem image on the SD card and when modified the root from root=/dev/ram to /dev/mmcblck0p1, it still boots up to the shell
So the busybox works fine but if I try to use any other FS the kernel would crash...
While I try to generate a rootfs using debootstrap (https://help.ubuntu.com/community/DebootstrapChroot) and have the new rootfs extracted on the SD card. I get an error saying "Failed to execute /sbin/init"
I did check if the file is present and also checked the permissions and it looks good to me.
What could be the problem?
W.R.T rootfs I'm particularly new. I was assuming that any FS on the SD card could be mounted but looks like its not the case. I'm guessing that whatever the /sbin/init will be doing is device dependent?
What I am trying to do? --->
I need to make a rootfs with a few packages and libraries (gcc python etc..) What would a normal approach? I've even tried buildroot but I couldn't get gcc on target. Is it not possible to have gcc in /bin/ within buildroot?
-- UPDATE --
I'm formatting the SD card to ext4 format and following is the output of fdisk
Disk /dev/sdb1: 7945 MB, 7945588224 bytes
255 heads, 63 sectors/track, 965 cylinders, total 15518727 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xc2aa4908
Device Boot Start End Blocks Id System
And following are the kernel logs while I have a filesystem on the SD card. The memory card driver works fine I've verified that. If I have a busybox filesystem on the SD card, everything works fine. When I'm using any other file systems I get the following...
6EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
6VFS: Mounted root (ext4 filesystem) on device 179:1.
6Freeing unused kernel memory: 84K (c0f00000 - c0f15000)
3request_module: runaway loop modprobe binfmt-464c
4kworker/u2:4 (145) used greatest stack depth: 6132 bytes left
3Failed to execute /sbin/init. Attempting defaults...
3request_module: runaway loop modprobe binfmt-464c
3request_module: runaway loop modprobe binfmt-464c
0Kernel panic - not syncing: No init found. Try passing init= option to kernel. See Linux Documentation/init.txt for guidance.
When checked, there is /sbin/init with the appropriate permissions that too!
Consider this error: "request_module: runaway loop modprobe binfmt-464c"
In all probability you're trying to use 64b binaries (/sbin/init and the rest) with 32b only kernel. Either recompile your kernel to support 64b or install a 32b user space onto your sd card.
Other things to check:
Confirm that elf support is indeed enabled in your kernel (it normally is, but it is possible to disable it).
Google that error and see what sort of problems people were having with it.

Resources