I am running a micro instance in EC2 with 592 MB of available RAM.
Jenkins was crashing with Out Of Memory build errors while running an UPDATE on a big SQL table in the backend.
Disk utilisation is at 83%, with about 6 GB of the 8 GB EBS volume used:
sudo du -hsx * | sort -rh | head -10
/
2.7G opt
1.5G var
1.2G usr
I found that only 6 MB was free (using "free -m") with these services running:
(i) LAMPP
(ii) Jenkins
(iii) MySQL 5.6
I stopped LAMPP and that freed up 70 MB.
Then I closed Jenkins, which brought it to 320 MB free.
Closing MySQL 5.6 brought it up to 390 MB free.
So about 200 MB of RAM is still being used with none of my services running.
Is 200 MB of RAM the minimum required for an Ubuntu micro instance running on Amazon EC2?
No, I believe it can run until the RAM is 100% used.
If a task requires more memory than what is available, the task is killed.
To free up more space, you can run this from your terminal:
sudo apt-get autoremove
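If you want to see what is holding on to the remaining memory, a quick check with standard tools (nothing EC2-specific, so this is just a generic sketch) is:
# Overall memory, including buffers/cache that the kernel releases under pressure
free -m
# The ten processes using the most resident memory
ps aux --sort=-rss | head -n 11
Typically much of that leftover ~200 MB is the kernel, page cache and small system daemons (sshd, cron, etc.) rather than a hard minimum.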
I am using EMR and I have task nodes with 32 GB of memory. However, when I log in to my YARN UI, it says there is only 12 GB of memory available.
Yes, I understand some memory will be used by the OS and other services running. However, 20 GB is too much.
On the host machine:
free -h
total used free shared buffers cached
Mem: 30G 18G 12G 88K 128M 12G
-/+ buffers/cache: 5.6G 25G
Swap: 0B 0B 0B
On the other machine:
free -h
total used free shared buffers cached
Mem: 30G 11G 18G 88K 75M 8.5G
-/+ buffers/cache: 3.4G 27G
Swap: 0B 0B 0B
So even with 18 GB free, why does YARN show only 12 GB available?
After some searching on Google I found that the available memory was being restricted by a YARN setting. The same thing was suggested by pradeep in his comment.
I changed that setting on the cluster as follows.
Find out the instance group IDs for your task and core nodes. Use the following commands for that:
aws emr describe-cluster --cluster-id j-xxxxxx | jq -r '.Cluster.InstanceGroups[] | select(.InstanceGroupType == "CORE").Id'
aws emr describe-cluster --cluster-id j-xxxxxx | jq -r '.Cluster.InstanceGroups[] | select(.InstanceGroupType == "TASK").Id'
Create a JSON configuration to update your EMR cluster. You will have to create two configuration files, one for the TASK group and another for the CORE group. Or you can simply replace the value of InstanceGroupId after each update.
[
  {
    "InstanceGroupId": "<output_of_above_command>",
    "Configurations": [
      {
        "Classification": "yarn-site",
        "Properties": {
          "yarn.nodemanager.resource.memory-mb": "32768"
        },
        "Configurations": []
      }
    ]
  }
]
Finally, run the command to update the instance group:
aws emr modify-instance-groups --cluster-id j-******* --instance-groups file://instanceGroup.json
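To confirm the NodeManagers picked up the new value, one way (a sketch; the hostname is a placeholder for your EMR master node) is to ask the ResourceManager's REST API for cluster metrics and check the totalMB/availableMB fields:
# Cluster-wide memory as seen by YARN; run from a machine that can reach the master on port 8088
curl -s http://<your-emr-master-dns>:8088/ws/v1/cluster/metrics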
I have a 6-node cluster - 5 DNs and 1 NN. All have 32 GB RAM. All slaves have 8.7 TB HDDs; the NN has a 1.1 TB HDD. Here is the link to my core-site.xml, hdfs-site.xml, yarn-site.xml.
After running an MR job, I checked my RAM usage, which is shown below:
Namenode
free -g
total used free shared buff/cache available
Mem: 31 7 15 0 8 22
Swap: 31 0 31
Datanodes:
Slave1:
free -g
total used free shared buff/cache available
Mem: 31 6 6 0 18 24
Swap: 31 3 28
Slave2:
total used free shared buff/cache available
Mem: 31 2 4 0 24 28
Swap: 31 1 30
Likewise, the other slaves have similar RAM usage. Even when a single job is running, any other submitted jobs enter the ACCEPTED state and wait for the first job to finish before they start.
Here is the output of the ps command for the JAR that I submitted to execute the MR job:
/opt/jdk1.8.0_77//bin/java -Dproc_jar -Xmx1000m
-Dhadoop.log.dir=/home/hduser/hadoop/logs -Dyarn.log.dir=/home/hduser/hadoop/logs
-Dhadoop.log.file=yarn.log -Dyarn.log.file=yarn.log
-Dyarn.home.dir= -Dyarn.id.str= -Dhadoop.root.logger=INFO,console
-Dyarn.root.logger=INFO,console -Dyarn.policy.file=hadoop-policy.xml
-Dhadoop.log.dir=/home/hduser/hadoop/logs -Dyarn.log.dir=/home/hduser/hadoop/logs
-Dhadoop.log.file=yarn.log -Dyarn.log.file=yarn.log
-Dyarn.home.dir=/home/hduser/hadoop -Dhadoop.home.dir=/home/hduser/hadoop
-Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console
-classpath --classpath of jars
org.apache.hadoop.util.RunJar abc.jar abc.mydriver2 /raw_data /mr_output/02
Are there any settings that I can change/add to allow multiple jobs to run simultaneously and speed up the current data processing? I am using Hadoop 2.5.2. The cluster is in a PROD environment and I cannot take it down to update the Hadoop version.
EDIT 1: I started a new MR job with 362 GB of data and still the RAM usage is around 8 GB, with 22 GB of RAM free. Here is my job submission command:
nohup yarn jar abc.jar def.mydriver1 /raw_data /mr_output/01 &
Here is some more information:
18/11/22 14:09:07 INFO input.FileInputFormat: Total input paths to process : 130363
18/11/22 14:09:10 INFO mapreduce.JobSubmitter: number of splits:130372
Are there any additional memory parameters that we can use when submitting the job to get more efficient memory usage?
I believe you can edit mapred-default.xml.
The params you are looking for are:
mapreduce.job.running.map.limit
mapreduce.job.running.reduce.limit
0 (probably what it is set to at the moment) means UNLIMITED.
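For illustration, setting those two properties to concrete values would look something like this (the limits 20 and 10 are made-up numbers, chosen only to show the idea of leaving room for other jobs):
<!-- Illustrative only: cap how many map/reduce tasks a single job may run at once -->
<property>
  <name>mapreduce.job.running.map.limit</name>
  <value>20</value>
</property>
<property>
  <name>mapreduce.job.running.reduce.limit</name>
  <value>10</value>
</property>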
Looking at your memory, 32 GB/machine seems too small.
What CPUs/cores do you have? I would expect quad CPU / 16 cores minimum, per machine.
Based on your yarn-site.xml, your yarn.scheduler.minimum-allocation-mb setting of 10240 is too high. This effectively means you only have at best 18 vcores available. This might be the right setting for a cluster where you have tons of memory, but for 32 GB it's way too large. Drop it to 1 or 2 GB.
Remember, an HDFS block is what each mapper typically consumes, so 1-2 GB of memory for 128 MB of data sounds more reasonable. The added benefit is that you could have up to 180 vcores available, which will process jobs 10x faster than 18 vcores.
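For a 32 GB node, the change suggested above would look roughly like this in yarn-site.xml (2048 is just an example in the 1-2 GB range, not a tuned value):
<!-- Smallest container YARN will allocate: 2 GB instead of 10 GB -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>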
To give you an idea of how a 4-node, 32-core, 128 GB RAM per node cluster is set up:
For Tez: divide RAM by cores to get the max Tez container size. So in my case: 128/32 = 4 GB, and the TEZ and YARN memory settings follow from that container size.
On my Windows 10 machine I'm trying to start Elasticsearch 5.2.0, which fails with the following error:
D:\Tools\elasticsearch-5.2.0\bin>elasticsearch.bat
Error occurred during initialization of VM
Could not reserve enough space for 2097152KB object heap
Right now I have 20 GB of free RAM.
How do I resolve this issue?
Change the JVM options of Elasticsearch before launching it.
Basically, go to your config/jvm.options and change the values of
-Xms2g ---> to some megabytes (200 MB)
-Xmx2g ---> to some megabytes (500 MB)
Here 2g refers to 2 GB, so to change it to 200 MB it should be 200m.
For example, change them to the values below:
-Xms200m
-Xmx500m
It worked for me.
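If you would rather not edit the file, the same limits can, as far as I know, also be passed through the ES_JAVA_OPTS environment variable before starting Elasticsearch, for example:
rem Override the heap for this session only (same example values as above)
set ES_JAVA_OPTS=-Xms200m -Xmx500m
elasticsearch.bat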
I updated to the latest JDK version, 1.8.0_121 64-bit (I had 1.8.0_90), and the issue is gone.
If a YARN container grows beyond its heap size setting, the map or reduce task will fail, with an error similar to the one below:
2015-02-06 11:58:15,461 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=10305,containerID=container_1423215865404_0002_01_000007] is running beyond physical memory limits.
Current usage: 42.1 GB of 42 GB physical memory used; 42.9 GB of 168 GB virtual memory used. Killing container.
Dump of the process-tree for container_1423215865404_0002_01_000007 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 10310 10305 10305 10305 (java) 1265097 48324 46100516864 11028122 /usr/java/default/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms40960m -Xmx40960m -XX:MaxPermSize=128m -Dspark.sql.shuffle.partitions=20 -Djava.io.tmpdir=/data/yarn/datanode/nm-local-dir/usercache/admin/appcache/application_1423215865404_0002/container_1423215865404_0002_01_000007/tmp org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sparkDriver#marx-61:56138/user/CoarseGrainedScheduler 6 marx-62 5
|- 10305 28687 10305 10305 (bash) 0 0 9428992 318 /bin/bash -c /usr/java/default/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms40960m -Xmx40960m -XX:MaxPermSize=128m -Dspark.sql.shuffle.partitions=20 -Djava.io.tmpdir=/data/yarn/datanode/nm-local-dir/usercache/admin/appcache/application_1423215865404_0002/container_1423215865404_0002_01_000007/tmp org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sparkDriver#marx-61:56138/user/CoarseGrainedScheduler 6 marx-62 5 1> /opt/hadoop/logs/userlogs/application_1423215865404_0002/container_1423215865404_0002_01_000007/stdout 2> /opt/hadoop/logs/userlogs/application_1423215865404_0002/container_1423215865404_0002_01_000007/stderr
It is interesting to note that all stages complete; it fails only when the save-as-sequence-file step is called. The executor is not using up the heap space, so I wonder what else is eating it up?
The Spark executor gets killed all the time and Spark keeps retrying the failed stage. For Spark on YARN, the NodeManager will kill a Spark executor if it uses more memory than the configured size of "spark.executor.memory" + "spark.yarn.executor.memoryOverhead". Increase "spark.yarn.executor.memoryOverhead" to make sure it covers the executor's off-heap memory usage.
Some issues:
https://issues.apache.org/jira/browse/SPARK-2398
https://issues.apache.org/jira/browse/SPARK-2468
You are actually running the container out of physical memory in this case:
Current usage: 42.1 GB of 42 GB physical memory used
The virtual memory isn't the bounding factor. You'll have to increase the heap size of the container, or increase spark.yarn.executor.memoryOverhead to give some more space to the YARN container without necessarily increasing the executor heap size.
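As a rough sketch of what that looks like at submit time (the class name, jar name and sizes are placeholders; in Spark 1.x the overhead value is in megabytes):
# 40g of executor heap plus an extra 4 GB of off-heap headroom inside the YARN container
spark-submit --master yarn-cluster \
  --executor-memory 40g \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  --class your.main.Class your-app.jar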
I faced the exact same problem as the OP: all stages succeeded, and only at the time of saving and writing the results would the container get killed.
If the Java heap memory is exceeded you see OutOfMemory exceptions, but a container being killed is related to everything except the Java heap memory, which usually means either the memoryOverhead or the application master memory.
In my case, increasing spark.yarn.executor.memoryOverhead or spark.yarn.driver.memoryOverhead didn't help, probably because it was my application master (AM) running out of memory. In yarn-client mode, the configuration to increase the AM memory is spark.yarn.am.memory. For yarn-cluster mode, it is the driver memory. This is how it worked for me.
Here's a reference to the error I got:
Application application_1471843888557_0604 failed 2 times due to AM Container for appattempt_1471843888557_0604_000002 exited with exitCode: -104
For more detailed output, check application tracking page:http://master01.prod2.everstring.com:8088/cluster/app/application_1471843888557_0604Then, click on links to logs of each attempt.
Diagnostics: Container [pid=89920,containerID=container_e59_1471843888557_0604_02_000001] is running beyond physical memory limits.
Current usage: 14.0 GB of 14 GB physical memory used; 16.0 GB of 29.4 GB virtual memory used. Killing container.
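As a sketch of the two cases described above (class name, jar name and sizes are placeholders):
# yarn-client mode: the AM runs in its own small container, sized by spark.yarn.am.memory
spark-submit --master yarn-client --conf spark.yarn.am.memory=4g --class your.main.Class your-app.jar
# yarn-cluster mode: the AM hosts the driver, so raise the driver memory instead
spark-submit --master yarn-cluster --driver-memory 4g --class your.main.Class your-app.jar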
Right, an absolute Spark noob is talking here.
This is the command I'm running, expecting 3 workers:
./spark-ec2 --worker-instances=3 --key-pair=my.key --identity-file=mykey.pem --region=us-east-1 --zone=us-east-1a launch my-spark-cluster-G
However, in the AWS console only two servers are created (master and slave).
On the other hand, at:
http://myMasterSparkURL:8080/
I get the following info, which just does not add up:
Workers: 3
Cores: 3 Total, 3 Used
Memory: 18.8 GB Total, 18.0 GB Used
Applications: 1 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
and under workers it shows:
worker1 (port 8081) worker1IP:43595 ALIVE 1 (1 Used) 6.3 GB (6.0 GB Used)
worker1 (port 8082) worker1IP:53195 ALIVE 1 (1 Used) 6.3 GB (6.0 GB Used)
worker1 (port 8083) worker1IP:41683 ALIVE 1 (1 Used) 6.3 GB (6.0 GB Used)
Now if I click on the first one (the worker on port 8081), it redirects me to the worker page. However, if I click on the other two (the workers on ports 8082 and 8083), it basically says page not found.
With high probability I am assuming this is a bug in spark-ec2, but I'm not quite sure since I'm a noob here.
I've searched all over the place to find someone with a similar issue, so I appreciate any suggestions that could give me some idea of why this is happening and how to fix it. Thanks.
The Spark version is spark-1.3.0.
You might want to change that invocation a little. As far as I know, --worker-instances controls how many worker processes run on each slave machine, while -s (--slaves) controls how many slave machines get launched, which would explain one slave machine running three workers. This is how I have been creating clusters so far:
./spark-ec2 -k MyKey \
  -i MyKey.pem \
  -s 3 \
  --instance-type=m3.medium \
  --region=eu-west-1 \
  --spark-version=1.2.0 \
  launch MyCluster