i am trying to run spark-submit from a shell wrapper. while the job runs fine from command line but failed when scheduling through oozie.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments.handle(SparkSubmitArguments.scala:394)
at org.apache.spark.launcher.SparkSubmitOptionParser.parse(SparkSubmitOptionParser.java:163)
at org.apache.spark.deploy.SparkSubmitArguments.(SparkSubmitArguments.scala:97)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
here is my workflow:
<workflow-app name="OozieTest1" xmlns="uri:oozie:workflow:0.5">
<start to="CopyTest"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="CopyTest">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>lib/copy.sh</exec>
<argument>hdfs://xxxxxx/user/xxxxxx/oozie-test/file-list/xxx_xxx_201610.lst</argument>
<argument>hdfs://xxxxxx/user/xxxxxx/oozie-test/sample</argument>
<argument>hdfs://xxxxxx/user/xxxxxx/oozie-test/output</argument>
<argument>IMMUN</argument>
<argument>N</argument>
<argument>hdfs://xxxxxx/user/xxxxxx/oozie-test/resources/script-constants.properties</argument>
<file>hdfs://xxxxxx/user/xxxxxx/oozie-test/lib/copy.sh#copy.sh</file>
<file>hdfs://xxxxxx/user/xxxxxx/oozie-test/lib/xxxx_Integration.jar#xxxx_Integration.jar</file>
<capture-output/>
</shell>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
It depends what version of spark, hadoop and oozie you use. But most probably you have some dependency issues. (jar is missing) I would really recommend to check your dependencies. Here you can find the full working example here:
In this example the hadoop and spark versions are following:
<hadoop.version>2.6.0-cdh5.4.7</hadoop.version>
<spark.version>1.3.0-cdh5.4.7</spark.version>
Related
I am trying to run oozie workflow, but I am getting the below error:
E0701: XML schema error, cvc-pattern-valid: Value 'mockup and mapping table update' is not facet-valid with respect to pattern '([a-zA-Z_]([\-_a-zA-Z0-9])*){1,39}' for type 'IDENTIFIER'.
I am using below regex in my query. Is something wrong with that? Everything works fine when I run through Hive or CLI.
regexp_replace(id_col, '^0|[a-zA-Z]+$', '')
Below is my workflow.xml
<workflow-app name="proj_map" xmlns="uri:oozie:workflow:0.4">
<start to="sources_creation"/>
<action name="sources_creation">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<script>/user/sin/oozie/sources_creation.hql</script>
</hive>
<ok to="mockup and mapping table update"/>
<error to="kill"/>
</action>
<action name="mockup and mapping table update">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<script>/user/sin/oozie/project_mapping.hql</script>
</hive>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
The action name can not have white space. <action name="mockup and mapping table update"\>. Remove the white spaces from the action name. It should work afterwards.
I've written an Oozie workflow that runs a BASH shell script to do some hive queries and perform some actions on the results. The script runs but throws a permission error when accessing some of the HDFS data. The user that submitted the Oozie workflow has permission but the script is running as the yarn user.
Is it possible to make Oozie execute the script as the user who submitted the workflow? Hive and Java actions both execute as the submitted user, just shell is behaving differently.
Here's the rough outline of my Oozie action
<action name="start_action"
retry-max="12"
retry-interval="600">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${WorkflowRoot}/hive-site.xml</job-xml>
<exec>script.sh</exec>
<file>${WorkflowRoot}/script.sh</file>
<capture-output />
</shell>
<ok to="next_action"/>
<error to="send_email"/>
</action>
I'm running Oozie 4.1.0 and HDP 2.1.
This issue will occur in all cluster that are configured using Simple Security. You've an option to override the default configuration. Include the below statement at the starting of the shell script will fix this issue.
export HADOOP_USER_NAME=<Name of submitted user>;
you can make run with help of env-var
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<workflow-app xmlns="uri:oozie:workflow:0.3" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>test.sh</exec>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>/user/root/test.sh</file>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
I would like to make an oozie workflow where the final step of success would be to "Archive" the results.
The command in the shell to do it is
hadoop archive -archiveName=XXX.har -p /some/random/parent directorToArhive pathToArchiveDestination
I have tried the following
<workflow-app name="HARD_CODED_ARCHIVE_TEST" xmlns="uri:oozie:workflow:0.4">
<start to="archive"/>
<action name="archive">
<archive archiveName="xxx.har" src="/root/src/dir" dest="/path/to/desired/archive/location"/>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
The Error I get is something like the following:
WARNING: Exception in Runloop of thread: main with message: E0701: XML schema error, cvc-complex-type.2.4.a: Invalid content was found starting with element 'archive'. One of '{"uri:oozie:workflow:0.4":map-reduce, "uri:oozie:workflow:0.4":pig, "uri:oozie:workflow:0.4":sub-workflow, "uri:oozie:workflow:0.4":fs, "uri:oozie:workflow:0.4":java, WC[##other:"uri:oozie:workflow:0.4"]}' is expected.
So it is very clear that I can't do this. because the oozie workflow schema does not support the "archive" action.
I really don't want to run this via a cron as I would like to archive immediately after a workflow completes successfully how do I do this.
Try this:
<action name="archive">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>org.apache.hadoop.tools.HadoopArchives</main-class>
<arg>-archiveName</arg>
<arg>${YourArchiveName}.har</arg>
<arg>-p</arg>
<arg>${FilesParentDirectory}</arg>
<arg>${SrcDirectory}</arg>
<arg>${DestDirectory}</arg>
</java>
<ok to="end"/>
<error to="error"/>
</action>
All you need is the hadoop-archives.jar file in your workflow. Also don't forget to put the jar in your workflow directory and you should be good to go. Hope that helps!
I have an Oozie workflow that contains a shell action that invokes a Python script that is failing with the following error.
Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
The Python script (hello.py) is simple.
print("Hello, World!")
Here is my Oozie workflow.
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hello">
<start to="shell-check-hour"/>
<action name="shell-check-hour">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>hello.py</exec>
<file>hdfs://localhost:8020/user/test/hello.py</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Can someone see anything wrong with what I am doing? If I replace the Python script with a shell script, the shell script executes fine (both files are in the same directory). This leads me to believe that the problem is that for whatever reason, Python isn't being recognised by Oozie.
Add Hash-Bang to your script
For example, my script started with
#!/usr/bin/env python
i'm unable to import oozie workflow in hue editor, hue version 2.5.0
Error : Could not import workflow, Node kill has not been defined
<workflow-app name="mapDeply" xmlns="uri:oozie:workflow:0.4">
<start to="TestPOC"/>
<action name="TestPOC">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/data/temp"/>
</prepare>
<main-class>WordCount</main-class>
<arg>/data/input</arg>
<arg>/data/temp</arg>
</java>
<ok to="end"/>
<error to="killemail"/>
</action>
<action name="killemail">
<email xmlns="uri:oozie:email-action:0.1">
<to>test#test.com</to>
<subject>Test</subject>
<body>TEST</body>
</email>
<ok to="kill"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
if i change java action error to kill it's working. is this excepted behavior or is there any work around to resolve it
This is currently not supported. You indeed need to have each action error node point to the kill node, then import the workflow, then modify it in the editor.
This will be improved in the future and this use case can be in part replaced by the Oozie SLA, supported up in Hue 3.6.