I've tried to execute an Oozie workflow with a Spark program as its single step.
I used a jar that runs successfully with spark-submit or spark-shell (the same code):
spark-submit --packages com.databricks:spark-csv_2.10:1.5.0 --master yarn-client --class "SimpleApp" /tmp/simple-project_2.10-1.1.jar
The application shouldn't demand a lot of resources: it loads a single CSV file (<10 MB) into Hive using Spark.
Spark version: 1.6.0
Oozie version: 4.1.0
The workflow was created with Hue's Oozie Workflow Editor:
<workflow-app name="Spark_test" xmlns="uri:oozie:workflow:0.5">
<start to="spark-589f"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="spark-589f">
<spark xmlns="uri:oozie:spark-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapreduce.map.java.opts</name>
<value>-XX:MaxPermSize=2g</value>
</property>
</configuration>
<master>yarn</master>
<mode>client</mode>
<name>MySpark</name>
<jar>simple-project_2.10-1.1.jar</jar>
<spark-opts>--packages com.databricks:spark-csv_2.10:1.5.0</spark-opts>
<file>/user/spark/oozie/jobs/simple-project_2.10-1.1.jar#simple-project_2.10-1.1.jar</file>
</spark>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
I got the following logs after running the workflow:
stdout:
Invoking Spark class now >>>
Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exception invoking main(), PermGen space
stderr:
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Yarn application state monitor"
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exception invoking main(), PermGen space
syslog:
2017-03-14 12:31:19,939 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: PermGen space
Please suggest which configuration parameters should be increased.
You have at least 2 options here:
1) Increase the PermGen size for the launcher MR job by adding this to workflow.xml:
<property>
<name>oozie.launcher.mapreduce.map.java.opts</name>
<value>-XX:PermSize=512m -XX:MaxPermSize=512m</value>
</property>
see details here: http://www.openkb.info/2016/07/memory-allocation-for-oozie-launcher-job.html
2) The preferred way is to use Java 8 instead of the outdated Java 7.
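For reference, a sketch of how the question's spark action could carry that override; the values are the ones suggested in option 1 and the rest of the action is unchanged:
<action name="spark-589f">
    <spark xmlns="uri:oozie:spark-action:0.2">
        ...
        <configuration>
            <!-- applies to the Oozie launcher map task, not the Spark driver or executors -->
            <property>
                <name>oozie.launcher.mapreduce.map.java.opts</name>
                <value>-XX:PermSize=512m -XX:MaxPermSize=512m</value>
            </property>
        </configuration>
        ...
    </spark>
</action>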
PermGen is a non-heap memory region used to store class metadata and string constants. It does not usually grow drastically unless there is runtime class loading via Class.forName() or other third-party JARs.
If you get this error message as soon as you launch your application, it means that the allocated permanent generation space is smaller than what all the class files in your application actually require.
"-XX:MaxPermSize=2g"
You have already set 2 GB for PermGen. You can increase this value gradually, find the value that no longer throws an OutOfMemoryError, and keep it. You can also use a profiler to monitor the permanent generation usage and set the right value.
If the error is triggered at run time instead, it might be due to runtime class loading or excessive creation of string constants in the permanent generation. That requires profiling your application to fix the issue and to set the right value for the -XX:MaxPermSize parameter.
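As a minimal illustration of the profiling suggestion, on Java 7 the permanent generation of a running JVM can be watched with jstat (the process id is a placeholder):
# PC = PermGen capacity (KB), PU = PermGen used (KB); sample every 5 seconds
jstat -gc <pid> 5s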
I have a workflow with a distcp action, and it's running fairly well. However, I'm now trying to change the copy strategy and am unable to do that through the action arguments. The documentation is fairly slim on this topic, and looking at the source code of the distcp action executor did not help.
When running DistCp from the command line, I can use the argument
-strategy {uniformsize|dynamic} to set the copy strategy.
Using that logic, I tried to do this in the Oozie action:
<action name="distcp-run" retry-max="3" retry-interval="1">
<distcp xmlns="uri:oozie:distcp-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapreduce.job.queuename</name>
<value>${poolName}</value>
</property>
</configuration>
<arg>-Dmapreduce.job.queuename=${poolName}</arg>
<arg>-Dmapreduce.job.name=distcp-s3-${wf:id()}</arg>
<arg>-update</arg>
<arg>-strategy dynamic</arg>
<arg>${region}/d=${day2HoursAgo}/h=${hour2HoursAgo}</arg>
<arg>${region2}/d=${day2HoursAgo}/h=${hour2HoursAgo}</arg>
<arg>${region3}/d=${day2HoursAgo}/h=${hour2HoursAgo}</arg>
<arg>${nameNode}${rawPath}/${partitionDate}</arg>
</distcp>
<ok to="join-distcp-steps"/>
<error to="error-report"/>
</action>
However, the action fails when I execute it.
From stdout:
...>>> Invoking Main class now >>>
Fetching child yarn jobs
tag id : oozie-1d1fa70383587ae625b6495e30a315f7
Child yarn jobs are found -
Main class : org.apache.hadoop.tools.DistCp
Arguments :
-Dmapreduce.job.queuename=merged
-Dmapreduce.job.name=distcp-s3-0000019-160622133128476-oozie-oozi-W
-update
-strategy dynamic
s3a://myfirstregion/d=21/h=17,s3a://mysecondregion/d=21/h=17,s3a://ttv-logs-eu/tsv/clickstream-clean/y=2016/m=06/d=21/h=17,s3a://mythirdregion/d=21/h=17
hdfs://myurl:8020/data/raw/2016062117
found Distcp v2 Constructor
public org.apache.hadoop.tools.DistCp(org.apache.hadoop.conf.Configuration,org.apache.hadoop.tools.DistCpOptions) throws java.lang.Exception
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.DistcpMain], main() threw exception, Returned value from distcp is non-zero (-1)
java.lang.RuntimeException: Returned value from distcp is non-zero (-1)
at org.apache.oozie.action.hadoop.DistcpMain.run(DistcpMain.java:66)...
Looking at the syslog, it seems that it grabbed -strategy dynamic and tried to put it into the array of source paths:
2016-06-22 14:11:18,617 INFO [uber-SubtaskRunner] org.apache.hadoop.tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[-strategy dynamic, s3a://myfirstregion/d=21/h=17,s3a:/mysecondregion/d=21/h=17,s3a:/ttv-logs-eu/tsv/clickstream-clean/y=2016/m=06/d=21/h=17,s3a:/mythirdregion/d=21/h=17], targetPath=hdfs://myurl:8020/data/raw/2016062117, targetPathExists=true, preserveRawXattrs=false, filtersFile='null'}
2016-06-22 14:11:18,624 INFO [uber-SubtaskRunner] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at sandbox/10.191.5.128:8032
2016-06-22 14:11:18,655 ERROR [uber-SubtaskRunner] org.apache.hadoop.tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException: -strategy dynamic doesn't exist
So DistCpOptions does have a copyStrategy, but it is set to the default uniformsize value.
I've tried moving the argument to the first position, but then both -Dmapreduce arguments end up in the source paths (though -update does not).
How can I, through Oozie workflow configuration, set the copy strategy to dynamic?
Thanks.
Looking at the code, it doesn't seem possible to set the strategy via configuration. Instead of using the distcp action you could use a map-reduce action; that way you can configure it however you want.
The Oozie MapReduce Cookbook has examples.
Looking at the DistCp code, the relevant part is around line 237, in createJob():
Job job = Job.getInstance(getConf());
job.setJobName(jobName);
// this is where the copy strategy is applied: it selects the job's InputFormat class
job.setInputFormatClass(DistCpUtils.getStrategy(getConf(), inputOptions));
job.setJarByClass(CopyMapper.class);
configureOutputFormat(job);

// DistCp is a map-only job
job.setMapperClass(CopyMapper.class);
job.setNumReduceTasks(0);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputFormatClass(CopyOutputFormat.class);
job.getConfiguration().set(JobContext.MAP_SPECULATIVE, "false");
job.getConfiguration().set(JobContext.NUM_MAPS, String.valueOf(inputOptions.getMaxMaps()));
The code above isn't everything you will need; you'll have to look at the DistCp source to work out the rest.
So you would need to configure all of the properties yourself in a map-reduce action. That way you can set the input format class, which is where the strategy setting is used.
You can see the available properties for the input format class in the distcp properties file here.
The one you need is org.apache.hadoop.tools.mapred.lib.DynamicInputFormat.
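As a rough skeleton of that approach (property and class names are the standard Hadoop 2 ones; as noted above, DistCp also needs a number of its own distcp.* properties, such as the source listing and target path, which you would have to work out from its source):
<action name="distcp-as-mr">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- use the new MapReduce API so the mapreduce.job.* class properties apply -->
            <property>
                <name>mapred.mapper.new-api</name>
                <value>true</value>
            </property>
            <!-- the strategy lives here: DynamicInputFormat instead of the uniform-size listing -->
            <property>
                <name>mapreduce.job.inputformat.class</name>
                <value>org.apache.hadoop.tools.mapred.lib.DynamicInputFormat</value>
            </property>
            <property>
                <name>mapreduce.job.map.class</name>
                <value>org.apache.hadoop.tools.mapred.CopyMapper</value>
            </property>
            <property>
                <name>mapreduce.job.outputformat.class</name>
                <value>org.apache.hadoop.tools.mapred.CopyOutputFormat</value>
            </property>
            <!-- DistCp is map-only -->
            <property>
                <name>mapreduce.job.reduces</name>
                <value>0</value>
            </property>
            <!-- plus the distcp.* properties (source listing, target path, ...) from the DistCp source -->
        </configuration>
    </map-reduce>
    <ok to="join-distcp-steps"/>
    <error to="error-report"/>
</action>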
I am using an Oozie workflow to run a PySpark script, and I'm running into an error I can't figure out.
When running the workflow (either locally or on YARN), a MapReduce job is run before the Spark job starts. After a few minutes the task fails (before the Spark action), and digging through the logs shows the following error:
java.io.IOException: Mkdirs failed to create file:/home/oozie/oozie-oozi/0000011-160222043656138-oozie-oozi-W/bulk-load-node--spark/output/_temporary/1/_temporary/attempt_1456129482428_0003_m_000000_2 (exists=false, cwd=file:/hadoop/yarn/local/usercache/root/appcache/application_1456129482428_0003/container_e68_1456129482428_0003_01_000004)
(Apologies for the length)
There are no other evident errors. I do not create this folder directly (given the name, I assume it is used for temporary storage by MapReduce jobs). I can create the folder from the command line using mkdir -p /home/oozie/blah.... It doesn't appear to be a permissions issue, as setting the folder to 777 made no difference. I have also added default ACLs for the oozie, yarn and mapred users on that folder, so I've pretty much ruled out permission issues. It's also worth noting that the working directory listed in the error does not exist after the job fails.
After some Googling I saw that a similar problem is common on Mac systems, but I'm running on CentOS. I am running the HDP 2.3 VM Sandbox, which is a single node 'cluster'.
My workflow.xml is as follows:
<workflow-app xmlns='uri:oozie:workflow:0.4' name='SparkBulkLoad'>
<start to = 'bulk-load-node'/>
<action name = 'bulk-load-node'>
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<master>yarn</master>
<mode>client</mode>
<name>BulkLoader</name>
<jar>file:///test/BulkLoader.py</jar>
<spark-opts>
--num-executors 3 --executor-cores 1 --executor-memory 512m --driver-memory 512m\
</spark-opts>
</spark>
<ok to = 'end'/>
<error to = 'fail'/>
</action>
<kill name = 'fail'>
<message>
Error occurred while bulk loading files
</message>
</kill>
<end name = 'end'/>
</workflow-app>
and job.properties is as follows:
nameNode=hdfs://192.168.26.130:8020
jobTracker=http://192.168.26.130:8050
queueName=spark
oozie.use.system.libpath=true
oozie.wf.application.path=file:///test/workflow.xml
If necessary I can post any other parts of the stack trace. I appreciate any help.
Update 1
After having checked my Spark History Server, I can confirm that the actual Spark action is not starting - no new Spark apps are being submitted.
I am getting a "table not found" exception while running a Hive query in Spark, using Oozie version 4.1.0.3, as a java action.
I copied hive-site.xml and hive-default.xml from the HDFS path.
The workflow.xml used:
<start to="scala_java"/>
<action name="scala_java">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-site.xml</job-xml>
<configuration>
<property>
<name>oozie.hive.defaults</name>
<value>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-default.xml</value>
</property>
<property>
<name>pool.name</name>
<value>${etlPoolName}</value>
</property>
<property>
<name>mapreduce.job.queuename</name>
<value>${QUEUE_NAME}</value>
</property>
</configuration>
<main-class>org.apache.spark.deploy.SparkSubmit</main-class>
<arg>--master</arg>
<arg>yarn-cluster</arg>
<arg>--class</arg>
<arg>HiveFromSparkExample</arg>
<arg>--deploy-mode</arg>
<arg>cluster</arg>
<arg>--queue</arg>
<arg>testq</arg>
<arg>--num-executors</arg>
<arg>64</arg>
<arg>--executor-cores</arg>
<arg>5</arg>
<arg>--jars</arg>
<arg>datanucleus-api-jdo-3.2.6.jar,datanucleus-core-3.2.10.jar,datanucleus-rdbms-3.2.9.jar</arg>
<arg>TEST-0.0.2-SNAPSHOT.jar</arg>
<file>TEST-0.0.2-SNAPSHOT.jar</file>
</java>
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Table not found test_hive_spark_t1)
Exception in thread "Driver" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found test_hive_spark_t1
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:79)
at org.apache.spark.sql.hive.HiveContext$$anon$1.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:255)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:137)
at org.apache.spark.sql.hive.HiveContext$$anon$1.lookupRelation(HiveContext.scala:255)
A. The X-default config files are just for user information; they are created at install time, from the hard-coded defaults in the JARs.
It's the X-site config files that contain the useful information, e.g. how to connect to the Metastore (the default for that is "just start an embedded Derby DB with no data inside"), which might explain the "table not found" message!
B. Hadoop components search for the X-site config files on the CLASSPATH; if they don't find them there, they silently fall back to the defaults.
So you must tell Oozie to download them to the local CWD via <file> instructions.
(Except for an explicit Hive action, which uses another, explicit convention for its specific hive-site.xml, but that's not the case here.)
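A minimal sketch of what that could look like in the question's java action, reusing the HDFS path already present in the workflow (the fragment after # is the local name the file gets in the CWD):
<file>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-site.xml#hive-site.xml</file>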
hive-default.xml is not needed.
Create a custom hive-site.xml that contains only the hive.metastore.uris property.
Pass the custom hive-site.xml via --files hive-site.xml in the Spark arguments.
Remove the job-xml element and the oozie.hive.defaults property.
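A sketch of both pieces, assuming a Thrift metastore (the host and port below are placeholders, not values from the question):
<!-- custom hive-site.xml: metastore URI only -->
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://metastore-host:9083</value>
    </property>
</configuration>
<!-- extra SparkSubmit arguments in the java action, before the application jar -->
<arg>--files</arg>
<arg>hive-site.xml</arg>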
I am facing an issue while running a coordinator for Hive using Oozie. This is how my job.properties file looks:
oozie.use.system.libpath=true
workflowRoot=hdfs://bigi-3000-beta-bm-20140511-2204-3467-master.imdemocloud.com:9000/user/nehpraka/price_comp1
start=2015-05-20T22:00Z
end=2015-06-22T23:00Z
# HDFS path of the coordinator app
oozie.coord.application.path=hdfs://bigi-3000-beta-bm-20140511-2204-3467-master.imdemocloud.com:9000/user/nehpraka/price_comp1
and this is the coordinator.xml:
<coordinator-app name="my_coord_app" frequency="${coord:hours(1)}" start="${coordStart}" end="${coordEnd}" timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
<action>
<workflow>
<app-path>${workflowRoot}</app-path>
</workflow>
</action>
</coordinator-app>
But I am getting the following error when I run the job:
$OOZIE_HOME/bin/oozie job -run -config price_comp1/job.properties
Error: E1004 : Internal Server Error
My workflow runs fine on its own; the problem occurred when I tried to add the coordinator.
:) It was a mismatch of the variable names: job.properties defines start and end, while coordinator.xml references coordStart and coordEnd.
Still, I have one query: what do these start and end times signify?
Do they mean the start and end time of the Oozie job?
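For completeness, a sketch of the renamed properties that would match the coordinator's EL variables (the dates are the ones from the question):
# names must match ${coordStart} / ${coordEnd} used in coordinator.xml
coordStart=2015-05-20T22:00Z
coordEnd=2015-06-22T23:00Z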
I'm trying to use Oozie's Hive action in Hue. My Hive script is very simple:
create table test.test_2 as
select * from test.test
This Oozie workflow has only 3 steps:
start
hive_query
end
My job.properties:
jobTracker worker-1:8032
mapreduce.job.user.name hue
nameNode hdfs://batchlayer
oozie.use.system.libpath true
oozie.wf.application.path hdfs://batchlayer/user/hue/oozie/workspaces/_hue_-oozie-4-1425575226.04
user.name hue
I add hive-site.xml twice: as a file and as job.xml. The Oozie action starts and stops at the second step. The job is 'accepted', but in the Hue console I get an error:
variable [user] cannot be resolved
I'm using Apache Oozie 4.2, Apache Hive 0.14 and Hue 3.7 (from Github).
UPDATE:
This is my workflow.xml:
bash-4.1$ bin/hdfs dfs -cat /user/hue/oozie/workspaces/*.04/work*
<workflow-app name="ccc" xmlns="uri:oozie:workflow:0.4">
<start to="ccc"/>
<action name="ccc">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>/user/hue/hive-site.xml</job-xml>
<script>/user/hue/hive_test.hql</script>
<file>/user/hue/hive-site.xml#hive-site.xml</file>
</hive>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
I tried running a sample Hive action in Oozie following steps similar to yours, and was able to resolve the error you faced with the following steps:
1) Remove the hive-site.xml <file> addition.
2) Add the following line to your job.properties: oozie.libpath=${nameNode}/user/oozie/share/lib
3) Increase the visibility of your hive-site.xml file in HDFS; maybe you have very restrictive permissions on it (in my case 500).
With this, both the "[user] variable cannot be resolved" error and the subsequent errors got resolved.
Hope it helps.
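As an illustration of step 3, loosening the permissions on the file from the question could look like this (the exact mode is an assumption; anything readable by the job's user works):
hdfs dfs -chmod 644 /user/hue/hive-site.xml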
This message can be really misleading. You should check the YARN logs and diagnostics.
In my case it was a configuration issue with the reduce task and container memory settings. Due to a misconfiguration, the container memory limit was lower than the memory limit of a single reduce task. After looking into the YARN application logs I saw the true cause in the 'diagnostics' section, which was:
REDUCE capability required is more than the supported max container capability in the cluster. Killing the Job. reduceResourceRequest: <memory:8192, vCores:1> maxContainerCapability:<memory:5413, vCores:4>
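For reference, the two settings involved in my case were along these lines (the values below are illustrative; the point is that the reduce request has to fit inside the cluster's maximum container size):
<!-- yarn-site.xml: the largest container YARN will grant -->
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>5413</value>
</property>
<!-- mapred-site.xml: must not exceed the value above -->
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
</property>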
Regards