Spark object runtime error - hadoop

While running the program on my local system, I get the error below. My RAM size is 3 GB; I need a solution.
Exception in thread "main" java.lang.IllegalArgumentException: System memory 259522560 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:216)
at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:198)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:330)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:174)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:432)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at SparkCore.cartesianTransformation$.main(cartesianTransformation.scala:11)
at SparkCore.cartesianTransformation.main(cartesianTransformation.scala)

It seems your Spark driver is running with too little memory; try increasing the driver memory size.
You can pass --driver-memory 4g to set the memory size for the driver.
Hope this helps!
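A minimal sketch of the two usual ways to apply this; the 4g figure and yourApp.jar are placeholders, and with only 3 GB of RAM a smaller value such as 1g may be more realistic:
spark-submit --driver-memory 4g --class SparkCore.cartesianTransformation yourApp.jar
If you launch the job from an IDE rather than spark-submit, the driver JVM is already running before the Spark configuration is read, so set the heap directly as a JVM option in the run configuration (e.g. -Xmx2g) instead.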

Related

Maven Project getting some Heap Size Error, Maven or Java?

When I run my program, load a 10 MB Excel file, and calculate something, I get this error:
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.HashMap.resize(HashMap.java:705)
at java.base/java.util.HashMap.putVal(HashMap.java:630)
at java.base/java.util.HashMap.put(HashMap.java:613)
at java.base/java.util.HashSet.add(HashSet.java:221)
at java.base/java.util.Collections.addAll(Collections.java:5593)
at org.logicng.formulas.FormulaFactory.or(FormulaFactory.java:532)
at org.logicng.formulas.FormulaFactory.naryOperator(FormulaFactory.java:372)
at org.logicng.formulas.FormulaFactory.naryOperator(FormulaFactory.java:359)
at org.logicng.formulas.NAryOperator.restrict(NAryOperator.java:130)
at org.logicng.formulas.NAryOperator.restrict(NAryOperator.java:129)
at org.logicng.formulas.NAryOperator.restrict(NAryOperator.java:129)
at org.logicng.transformations.qe.ExistentialQuantifierElimination.apply(ExistentialQuantifierElimination.java:74)
at ToPue.calculatePueForPos(ToPue.java:59)
at PosvHandler.calculatePosv(PosvHandler.java:21)
at PueChecker$EqualBtnClicked.actionPerformed(PueChecker.java:192)
at java.desktop/javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1967)
at java.desktop/javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2308)
at java.desktop/javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:405)
at java.desktop/javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:262)
at java.desktop/javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:279)
at java.desktop/java.awt.Component.processMouseEvent(Component.java:6636)
at java.desktop/javax.swing.JComponent.processMouseEvent(JComponent.java:3342)
at java.desktop/java.awt.Component.processEvent(Component.java:6401)
at java.desktop/java.awt.Container.processEvent(Container.java:2263)
at java.desktop/java.awt.Component.dispatchEventImpl(Component.java:5012)
at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2321)
at java.desktop/java.awt.Component.dispatchEvent(Component.java:4844)
at java.desktop/java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4919)
at java.desktop/java.awt.LightweightDispatcher.processMouseEvent(Container.java:4548)
at java.desktop/java.awt.LightweightDispatcher.dispatchEvent(Container.java:4489)
at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2307)
at java.desktop/java.awt.Window.dispatchEventImpl(Window.java:2764)
I searched everywhere and tried setting the Maven memory higher.
I set the Java heap space to 12 GB.
The maximum usage of the heap space is 201 MB. Why isn't it using the whole memory?
Can somebody help?
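For what it's worth, a hedged sketch of where the heap setting has to go; MAVEN_OPTS only sizes the JVM that runs the Maven build itself, so if the application is started as its own java process, the -Xmx flag has to go on that command instead (app.jar is a placeholder):
# sizes only the JVM that runs the Maven build, not the application
export MAVEN_OPTS="-Xmx12g"
# sizes the JVM that actually runs the program
java -Xmx12g -jar app.jar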

memory usage grows until VM crashes while running Wildfly 9 with Java 8

We are having an issue with virtual servers (VMs) running out of native memory. These VMs are running:
Linux 7.2 (Maipo)
Wildfly 9.0.1
Java 1.8.0_151 (different JVMs have different heap sizes, ranging from 0.5 GB to 2 GB)
The JVM args are:
-XX:+UseG1GC
-XX:SurvivorRatio=1
-XX:NewRatio=2
-XX:MaxTenuringThreshold=15
-XX:-UseAdaptiveSizePolicy
-XX:G1HeapRegionSize=16m
-XX:MaxMetaspaceSize=256m
-XX:CompressedClassSpaceSize=64m
-javaagent:/<path to new relic.jar>
After about a month, sometimes longer, the VMs start to use all of their swap space and then eventually the OOM-Killer notices that java is using too much memory and kills one of our JVMs.
The amount of memory used by the java process is larger than heap + metaspace + compressed class space, as revealed by -XX:NativeMemoryTracking=detail.
Are there tools that could tell me what is in this native memory (like a heap dump, but not for the heap)?
Are there any tools other than jemalloc that can map Java heap usage to native memory usage outside the heap? I have used jemalloc to try to achieve this, but the resulting graph contains only hex values rather than human-readable class names, so I can't really get anything out of it. Maybe I'm doing something wrong, or perhaps I need another tool.
Any suggestions would be greatly appreciated.
You can use jcmd.
Start the application with -XX:NativeMemoryTracking=summary or -XX:NativeMemoryTracking=detail.
Then use jcmd to monitor the NMT (Native Memory Tracking) data:
jcmd <pid> VM.native_memory baseline    # take the baseline
jcmd <pid> VM.native_memory detail.diff # show how native memory has changed from the baseline

Spark-submit job performance

I am currently running spark-submit on the following environment:
Single node (RAM: 40GB, VCores: 8, Spark Version: 2.0.2, Python: 3.5)
My pyspark program basically reads one 450 MB unstructured file from HDFS, loops through each line, grabs the necessary data, and places it into a list. Finally it uses createDataFrame and saves the data frame into a Hive table.
My pyspark program code snippet:
sparkSession = (SparkSession
    .builder
    .master("yarn")
    .appName("FileProcessing")
    .enableHiveSupport()
    .config("hive.exec.dynamic.partition", "true")
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .getOrCreate())
lines = sparkSession.read.text('/user/test/testfiles').collect()
for line in lines:
    # perform some data extraction and place it into rowList and colList using normal Python operations
df = sparkSession.createDataFrame(rowList, colList)
df.registerTempTable("tempTable")
sparkSession.sql("create table test as select * from tempTable")
My spark-submit command is as the following:
spark-submit --master yarn --deploy-mode cluster --num-executors 2 --driver-memory 4g --executor-memory 8g --executor-cores 3 --files /usr/lib/spark-2.0.2-bin-hadoop2.7/conf/hive-site.xml FileProcessing.py
It took around 5 minutes to complete the processing. Is that performance considered good? How can I tune the executor memory and executor cores so that the process completes within 1-2 minutes? Is that possible?
I appreciate your response. Thanks.
To tune your application you need to know a few things:
1) You need to monitor your application: is your cluster under-utilized, and how many resources does your application actually use? Monitoring can be done using various tools, e.g. Ganglia. From Ganglia you can find CPU, memory, and network usage.
2) Based on the observed CPU and memory usage, you can get a better idea of what kind of tuning your application needs.
From the Spark point of view, in spark-defaults.conf you can specify what kind of serialization is needed and how much driver memory and executor memory your application needs; you can even change the garbage collection algorithm.
Below are a few examples; tune these parameters based on your requirements:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 5g
spark.executor.memory 3g
spark.executor.extraJavaOptions -XX:MaxPermSize=2G -XX:+UseG1GC
spark.driver.extraJavaOptions -XX:MaxPermSize=6G -XX:+UseG1GC
For more details, refer to http://spark.apache.org/docs/latest/tuning.html
Hope this helps!
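The same settings can also be passed per job on the spark-submit command line; a hedged sketch with placeholder values (note that on Java 8, -XX:MaxPermSize is ignored and its metaspace successor is -XX:MaxMetaspaceSize):
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
  --driver-memory 5g --executor-memory 3g \
  FileProcessing.py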

correct way to increase hdfs java heap memory

I'm getting the following errors in my hadoop namenode log:
2015-12-20 06:15:40,717 WARN [IPC Server handler 21 on 9000] ipc.Server
(Server.java:run(2029)) - IPC Server handler 21 on 9000, call
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport
from 172.31.21.110:46999 Call#163559 Retry#0:
error: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
2015-12-20 06:15:42,710 WARN [IPC Server handler 22 on 9000] ipc.Server
(Server.java:run(2029)) - IPC Server handler 22 on 9000, call
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from
172.31.24.250:45624 Call#164898 Retry#0:
error: java.lang.OutOfMemoryError: Java heap space
which results in all the nodes being listed as dead.
I have checked other stackoverflow questions and the most useful suggestion seems to be that I need to set the mapred.child.java.opts option in conf/mapred-site.xml to something higher than 2048MB,
but I'm concerned that might not be enough.
I'm launching my cluster using Spark with the --hadoop-major-version=yarn option, so if I understand correctly all MapReduce jobs run through YARN, including jobs created by HDFS.
My question is: what other settings, if any, do I need to modify (and how do I determine their amounts, given that I want to use say 4GB for the mapreduce.child.java.opts setting) to increase the memory available to HDFS's MapReduce jobs?
Hadoop daemons control their JVM arguments, including heap size settings, through the use of environment variables that have names suffixed with _OPTS. These environment variables are defined in various *-env.sh files in the configuration directory.
Using the NameNode as an example, you can set a line like this in your hadoop-env.sh file.
export HADOOP_NAMENODE_OPTS="-Xms4G -Xmx4G $HADOOP_NAMENODE_OPTS"
This sets a minimum/maximum heap size of 4 GB for the NameNode and also preserves any other arguments that were placed into HADOOP_NAMENODE_OPTS earlier in the script.
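If the DataNodes are under similar pressure, the same pattern applies through their own variable; a hedged sketch, with the 2 GB figure as a placeholder to size against your hardware:
export HADOOP_DATANODE_OPTS="-Xms2G -Xmx2G $HADOOP_DATANODE_OPTS"
Note that these _OPTS variables size the Hadoop daemons themselves; the mapred.child.java.opts setting mentioned above instead sizes the per-task JVMs that the MapReduce framework launches.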

JMeter issues when running large number of threads

I'm testing using Apache's JMeter. I'm simply accessing one page of my company's website and turning up the number of users until it reaches a threshold. The problem is that when I get to around 3000 threads, JMeter doesn't run all of them; looking at the Aggregate Graph, it only runs about 2,536 (this number varies but is always around there).
The partial run comes with the following exception in the logs:
01:16 ERROR - jmeter.JMeter: Uncaught exception:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
at org.apache.jmeter.threads.ThreadGroup.start(ThreadGroup.java:293)
at org.apache.jmeter.engine.StandardJMeterEngine.startThreadGroup(StandardJMeterEngine.java:476)
at org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:395)
at java.lang.Thread.run(Unknown Source)
This behavior is consistent. In addition, one of the times JMeter crashed in the middle, outputting a file that said:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 32756 bytes for ChunkPool::allocate
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (allocation.cpp:211), pid=10748, tid=11652
#
# JRE version: 6.0_31-b05
# Java VM: Java HotSpot(TM) Client VM (20.6-b01 mixed mode, sharing windows-x86 )
Any ideas?
I tried changing the heap size in jmeter.bat, but that didn't seem to help at all.
The JVM is simply not capable of running so many threads. And even if it were, JMeter would consume a lot of CPU resources purely on context switching. In other words, above some point you are not benchmarking your web application but the client computer hosting JMeter.
You have a few choices:
experiment with JVM options, e.g. decrease the default -Xss512K to something smaller
run JMeter in a cluster
use tools taking a radically different approach, like Gatling
I had a similar issue and increased the heap size in jmeter.bat to 1024M and that fixed the issue.
set HEAP=-Xms1024m -Xmx1024m
For the JVM, if you read the crash log (the hs_err file quoted above), it gives you some solutions, among which are:
switch to a 64-bit JVM (> 6u25)
with this you will be able to allocate a larger heap (-Xmx); ensure you have the RAM for it
reduce Xss with:
-Xss256k
Then for JMeter, follow best-practices:
http://jmeter.apache.org/usermanual/best-practices.html
http://www.ubik-ingenierie.com/blog/jmeter_performance_tuning_tips/
Finally, ensure you use the latest JMeter version.
Use a Linux OS preferably.
Tune the TCP stack and system limits.
Success will depend on your machine's power (CPU and memory) and your test plan.
If this is not enough (for 3000 threads it should be OK), you may need to use distributed testing, as sketched below.
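A hedged sketch of starting such a distributed run in non-GUI mode, assuming worker hosts host1 and host2 (placeholders) are already running jmeter-server:
jmeter -n -t test_plan.jmx -R host1,host2 -l results.jtl
Here -n runs without the GUI, -t names the test plan, -R lists the remote injectors, and -l records the results; each remote engine runs the plan in its own JVM, so the per-plan thread count can be lowered while keeping the same total load.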
Increasing the heap size in jmeter.bat works fine
set HEAP=-Xms1024m -Xmx1024m
OR
you can do something like below if you are using jmeter.sh:
JVM_ARGS="-Xms512m -Xmx1024m" jmeter.sh etc.
I ran into this same problem and the only solution that helped me is: https://stackoverflow.com/a/26190804/5796780
proper 100k threads on linux:
ulimit -s 256
ulimit -i 120000
echo 120000 > /proc/sys/kernel/threads-max
echo 600000 > /proc/sys/vm/max_map_count
echo 200000 > /proc/sys/kernel/pid_max
If you don't have root access:
echo 200000 | sudo dd of=/proc/sys/kernel/pid_max
After increasing the Xms and Xmx heap sizes, I had to make my Java run in 64-bit mode. In jmeter.bat:
set JM_LAUNCH=java.exe -d64
Obviously, you need to run a 64-bit OS and have 64-bit Java installed (see https://www.java.com/en/download/manual.jsp).
