The following is my workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.3" name="import-job">
    <start to="createtimelinetable" />
    <action name="createtimelinetable">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <command>import --connect jdbc:mysql://10.65.220.75:3306/automation --table ABC --username root</command>
        </sqoop>
        <ok to="end"/>
        <error to="end"/>
    </action>
    <end name="end"/>
</workflow-app>
Getting the following error on trying to submit the job:
Error: E0701 : E0701: XML schema error, cvc-elt.1.a: Cannot find the declaration of element 'action'.
However, oozie validate workflow.xml returns:
Valid workflow-app
Anyone who faced and resolved a similar issue in the past?
Confirm that you have copied your workflow.xml to HDFS. You need not copy job.properties to HDFS, but you do have to copy all the other files and libraries there.
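For reference, a typical sequence might look like the following (the HDFS application path and Oozie URL are examples; substitute your own):

```shell
# Upload the workflow definition and its lib/ directory to the application path on HDFS
hdfs dfs -mkdir -p /user/me/import-job
hdfs dfs -put workflow.xml /user/me/import-job/
hdfs dfs -put lib /user/me/import-job/

# job.properties stays on the local filesystem; its oozie.wf.application.path
# must point at the HDFS directory that holds workflow.xml
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
```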
For those who reached here by googling the error message, below is the general way to resolve Oozie schema issues:
Once your workflow.xml is complete, it's best practice to validate it against the Oozie XSD schema file rather than submitting the Oozie job and facing the issue later.
A note on what an XSD schema is:
An XSD schema is a validation file that describes,
a. the sequence of tags
b. whether a tag is mandatory or optional
c. which sub-tags are valid inside a tag, etc.
How to validate a workflow XML against the XSD?
a. Find the specific XSD; it is named in the xmlns (XML namespace) attribute:
<workflow-app name='FooBarWorkFlow' xmlns="uri:oozie:workflow:0.4">
In this case it is uri:oozie:workflow:0.4. Get the XSD file for uri:oozie:workflow:0.4 from the appendix of the official Oozie site (it can also be found easily by googling).
b. There are numerous XML validation sites (for example https://www.liquid-technologies.com/online-xsd-validator); provide your workflow XML file and the XSD file, and validate.
Errors in the workflow XML file will be listed with line and column info. Rectify these, then use the valid workflow XML to avoid schema validation errors in Oozie.
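If you prefer validating locally instead of on a website, xmllint (from libxml2) can check a workflow against a downloaded XSD; the schema file name here is an assumption based on the namespace version:

```shell
# Download oozie-workflow-0.4.xsd first (e.g. from the appendix of the Oozie docs),
# then validate; a clean run prints "workflow.xml validates"
xmllint --noout --schema oozie-workflow-0.4.xsd workflow.xml
```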
oozie validate some_workflow.xml
Tells you line numbers and is much easier to understand than logging output.
I'm not able to execute a sample Oozie job that uses a Sqoop command to import data into Hive. I've placed hive-site.xml in the HDFS path, but I think it's not being picked up, and I'm getting a ClassNotFoundException. How do I fix this?
workflow.xml
<!-- This is a comment -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="oozie-wf">
    <start to="sqoop-node1"/>
    <!-- Step 1 -->
    <action name="sqoop-node1">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker></job-tracker>
            <name-node></name-node>
            <command> import command </command>
        </sqoop>
        <ok to="end"/>
        <error to="kill_job"/>
    </action>
    <kill name="kill_job">
        <message>Job failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
nameNode=ip
jobTracker=ip
queueName=default
user.name=oozie
oozie.use.system.libpath=true
oozie.libpath=/user/hdfs/share/share/lib/sqoop
oozie.wf.application.path=workflow path
outputDir=/tmp/oozie.txt
java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
I guess your Sqoop action requires the HCatalog library to interact with the Hive Metastore, and Oozie does not add that library by default; you have to request it explicitly.
Note that there is some literature about using HCatalog from Pig, but very little from Sqoop. Anyway the trick is the same...
From Oozie documentation:
Oozie share libraries are organized per action type... Oozie
provides a mechanism to override the action share library JARs
... More than one share library directory name can be specified
for an action ... For example: When using HCatLoader and HCatStorer in
pig, oozie.action.sharelib.for.pig can be set to pig,hcatalog to
include both pig and hcatalog jars.
In your case, you need to override a specific <property> in your Sqoop action, named oozie.action.sharelib.for.sqoop, with value sqoop,hcatalog -- then Oozie will provide the required JARs at run-time.
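Concretely, that means adding something like this to the <configuration> block of the Sqoop action (a sketch; it can equivalently be set in job.properties):

```xml
<property>
    <name>oozie.action.sharelib.for.sqoop</name>
    <value>sqoop,hcatalog</value>
</property>
```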
I'm new to hadoop and now I'm testing simple workflow with just single sqoop action. It works if I use plain values - not global properties.
My objective was however, to define some global properties in file referenced in job-xml tag in global section.
After long fight and reading many articles I still cannot make it work.
I suspect some simple thing is wrong, since I found articles suggesting that this feature works fine.
Hopefully, you can give me a hint.
In short:
I have properties, dbserver, dbuser and dbpassword defined in /user/dm/conf/environment.xml
These properties are referenced in my /user/dm/jobs/sqoop-test/workflow.xml
At runtime, I receive an EL_ERROR saying that dbserver variable cannot be resolved
Here are details:
I'm using Cloudera 5.7.1 distribution installed on single node.
environment.xml file was uploaded into hdfs into /user/dm/conf folder.
Here is the content:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>
workflow.xml file was uploaded into /user/dm/jobs/sqoop-test-job. Here is the content:
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <global>
        <job-xml>/user/dm/conf/env.xml</job-xml>
    </global>
    <start to="get-data"/>
    <action name="get-data">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputRootPath}"/>
            </prepare>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:sqlserver://${dbserver};user=${dbuser};password=${dbpassword}</arg>
            <arg>--query</arg>
            <arg>select col1 from table where $CONDITIONS</arg>
            <arg>--split-by</arg>
            <arg>main_id</arg>
            <arg>--target-dir</arg>
            <arg>${outputRootPath}/table</arg>
            <arg>-m</arg>
            <arg>1</arg>
        </sqoop>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Sqoop-test failed, error message[${wf:errorMessage()}]</message>
    </kill>
    <end name='end'/>
</workflow-app>
Now, I execute oozie workflow from command line:
sudo -u dm oozie job --oozie http://host:11000/oozie -config job-config.xml -run
Where my job-config.xml is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>nameNode</name>
        <value>namenode:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker:8021</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>/user/dm/jobs/sqoop-test-job/workflow.xml</value>
    </property>
    <property>
        <name>outputRootPath</name>
        <value>/user/dm/data/sqoop-test</value>
    </property>
</configuration>
OK, you are making two big mistakes.
1. Let's start with a quick exegesis of some parts of the Oozie documentation (V4.2)
Workflow Functional Specification
has a section 19 about Global Configuration
has sections 3.2.x about core Action types i.e. MapReduce, Pig, Java, etc.
the XML schema specification clearly shows the <global> element
Sqoop action Extension
does not make any mention of Global parameters
has its own XML schema specification, which evolves at its own pace, and is not up-to-date with the Workflow schema
In other words: the Sqoop action is a plug-in as far as the Oozie server is concerned. It does not support 100% of the "newer" functionalities, including the <global> thing that was introduced in Workflow schema V0.4
2. You don't understand the distinction between properties and parameters -- and I don't blame you, the Oozie docs are confused and confusing.
Parameters are used by Oozie to run text substitutions in properties, in commands, etc. You define their values as literals, either at submission time with the -config argument, or in the <parameters> element at Workflow level. And by "literal" I mean that you cannot make reference to a parameter in another parameter. The value is just immutable text, used as-is.
Properties are Java properties passed to the jobs that Oozie starts. You can set them either at submission time with the -config argument -- yes, it's a mess, the Oozie parser has to sort out which params have a well-known property name and which ones are just params -- or in the <global> Workflow element -- but they will not be propagated in all "extensions", as you have discovered the hard way -- or in the <property> Action element or inside an XML file defined with <job-xml> element, either at global Workflow level or at local Action level.
Two things to note:
when properties are defined multiple times with multiple (conflicting) values, there has to be a precedence rule but I'm not too sure
properties defined explicitly inside Oozie may have their value defined dynamically, using parameters and EL functions; but properties defined inside <job-xml> files must be literals because Oozie does not have access to them (it just passes the file content to the Hadoop Configuration constructor at run-time)
What does it mean for you? Well, your script tells Oozie to pass "hidden" properties to the JVM running the Sqoop job, at run-time, through a <job-xml>.
But you were expecting Oozie to parse a list of parameters and use them, at compile time, to define some properties. That won't happen.
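Given that constraint, a sketch of a workaround is to pass the values as submission-time properties (so Oozie can substitute ${dbserver} etc. before the action runs) instead of expecting the <job-xml> file to feed EL. Using the names from the question, job-config.xml would gain:

```xml
<property>
    <name>dbserver</name>
    <value>someserver</value>
</property>
<property>
    <name>dbuser</name>
    <value>someuser</value>
</property>
<property>
    <name>dbpassword</name>
    <value>somepassword</value>
</property>
```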
I am running an Oozie job with a workflow.xml file which begins:
<!-- <workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1"> -->
<start to="imagesCreateSequenceFile"/>
<action name="imagesCreateSequenceFile">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
This gives me the following error:
Error: E0701 : E0701: XML schema error, cvc-elt.1.a: Cannot find the declaration of element 'start'.
I am pretty sure my job is pointing to the correct workflow.xml file on hdfs (so this does not fix it).
Any help is appreciated in working out why my job cannot see 'start'.
TIA!
PS I have tried a different workflow.xml and it gives the same error.
I needed to remove the comment marks around the first line, i.e. make it:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
I am trying to successfully run a sqoop-action in Oozie using a Hadoop Cluster.
Whenever I check on the jobs status, Oozie returns with the following status update:
Actions
ID                                                    Status  Ext ID                  Ext Status     Err Code
0000037-140930230740727-oozie-oozi-W#:start:          OK      -                       OK             -
0000037-140930230740727-oozie-oozi-W#sqoop-load       ERROR   job_1412278758569_0002  FAILED/KILLED  JA018
0000037-140930230740727-oozie-oozi-W#sqoop-load-fail  OK      -                       OK             E0729
This leads me to believe that there is nothing wrong with my workflow itself, and that I am instead missing some permission.
My jobs.properties config:
nameNode=hdfs://mynamenode.demo.com:8020
jobTracker=mysnamenode.demo.com:8050
queueName=default
workingRoot=working_dir
jobOutput=/user/test/out
oozie.use.system.libpath=true
oozie.libpath=/user/oozie/share/lib
oozie.wf.application.path=${nameNode}/user/test/${workingRoot}
MyWorkFlow.xml :
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns='uri:oozie:workflow:0.4' name='sqoop-workflow'>
    <start to='sqoop-load' />
    <action name="sqoop-load">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/test/${workingRoot}/out-data/sqoop" />
                <mkdir path="${nameNode}/user/test/${workingRoot}/out-data"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@10.100.50.102:1521/db --username myID --password myPass --table SomeTable --target-dir /user/test/${workingRoot}/out-data/sqoop</command>
        </sqoop>
        <ok to="end"/>
        <error to="sqoop-load-fail"/>
    </action>
    <kill name="sqoop-load-fail">
        <message>Sqoop export failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
Steps I have taken:
Looking up the error... I didn't find much beyond what I mentioned previously
Checking that the required ojdbc.jar file was present in the /user/oozie/share/lib/sqoop directory and accessible on HDFS
Checking whether I have any pre-existing directories that might be causing a problem
I have been searching the internet and my log files for an answer....any help provided would be much appreciated....
Update:
Ok... so I added ALL of the jars within /usr/lib/sqoop/lib to /user/oozie/share/lib/sqoop. I am still getting the same errors. Checking the job log, there is something I did not post previously:
2014-10-03 11:16:35,586 WARN CoordActionUpdateXCommand:542 - USER[ambari-qa] GROUP[-] TOKEN[] APP[sqoop-workflow] JOB[0000015-141002171510902-oozie-oozi-W] ACTION[-] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
As you can see, I am running the job as "Super User"... and the error is exactly the same, so it cannot be a permission issue. I am thinking there is a jar required beyond those in the /user/oozie/share/lib/sqoop directory... perhaps I need to copy the jars for mapreduce into /user/oozie/share/lib/mapreduce?
Ok...problem solved.
Apparently EVERY component of the Oozie workflow/job must have its corresponding *.jar dependencies uploaded to the Oozie sharelib (/user/oozie/share/lib/) directories corresponding to those components.
I copied ALL the *.jars in /usr/lib/sqoop/lib into -> /user/oozie/share/lib
I copied ALL the *.jars in the /usr/lib/oozie/lib into -> /user/oozie/share/lib/oozie
After running the job again, the workflow stalled, and the error given was different from the last one: this time the workflow was trying to create a directory on HDFS that already existed. I removed that directory and ran the job again...
SUCCESS!
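The copy steps described above would look roughly like this (source and sharelib paths are the ones from this cluster; adjust for yours):

```shell
# Copy Sqoop's jars into the Oozie sharelib, then the Oozie jars
hdfs dfs -put /usr/lib/sqoop/lib/*.jar /user/oozie/share/lib/
hdfs dfs -put /usr/lib/oozie/lib/*.jar /user/oozie/share/lib/oozie/
```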
Side note: people really need to write better exception messages. If this was just an issue a few people were having, then fine, but that is simply not the case. This particular error is giving more than a few people fits, if the requests for help online are any indication.
I faced the same problem. Just adding a
<archive>path/in/hdfs/ojdbc6.jar#ojdbc6.jar</archive>
to my workflow.xml within the <sqoop> </sqoop> tags worked for me. Got the reference here.
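For placement, the <archive> element goes inside the action after the <command>; a sketch (the JDBC URL and jar path are placeholders):

```xml
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <command>import --connect jdbc:oracle:thin:@host:1521/db ...</command>
    <archive>path/in/hdfs/ojdbc6.jar#ojdbc6.jar</archive>
</sqoop>
```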
I'm trying to run a hive action through Oozie. My workflow.xml is as follows:
<workflow-app name='edu-apollogrp-dfe' xmlns="uri:oozie:workflow:0.1">
    <start to="HiveEvent"/>
    <action name="HiveEvent">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.hive.defaults</name>
                    <value>${hiveConfigDefaultXml}</value>
                </property>
            </configuration>
            <script>${hiveQuery}</script>
            <param>OUTPUT=${StagingDir}</param>
        </hive>
        <ok to="end"/>
        <error to="end"/>
    </action>
    <kill name='kill'>
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end'/>
</workflow-app>
And here is my job.properties file:
oozie.wf.application.path=${nameNode}/user/${user.name}/hiveQuery
oozie.libpath=${nameNode}/user/${user.name}/hiveQuery/lib
queueName=interactive
#QA
nameNode=hdfs://hdfs.bravo.hadoop.apollogrp.edu
jobTracker=mapred.bravo.hadoop.apollogrp.edu:8021
# Hive
hiveConfigDefaultXml=/etc/hive/conf/hive-default.xml
hiveQuery=hiveQuery.hql
StagingDir=${nameNode}/user/${user.name}/hiveQuery/Output
When I run this workflow, I end up with this error:
ACTION[0126944-130726213131121-oozie-oozi-W#HiveEvent] Launcher exception: org/apache/hadoop/hive/cli/CliDriver
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/cli/CliDriver
Error Code: JA018
Error Message: org/apache/hadoop/hive/cli/CliDriver
I'm not sure what this error means. Where am I going wrong?
EDIT
This link says error code JA018 is an "output directory exists" error in a workflow map-reduce action. But in my case the output directory does not exist, which makes it all the more confusing.
I figured out what was going wrong!
The class org/apache/hadoop/hive/cli/CliDriver is required for the execution of a Hive action; this much is obvious from the error message. The class lives in the jar file hive-cli-0.7.1-cdh3u5.jar (cdh3u5 being the Cloudera version in my case).
Oozie checks for this jar in the sharelib directory. The location of this directory is usually configured in oozie-site.xml, with the property name oozie.service.WorkflowAppService.system.libpath, so Oozie should find the jar easily.
But in my case that property was not set, so Oozie didn't know where to look for this jar, hence the java.lang.NoClassDefFoundError.
To resolve this, I had to include a parameter in my job.properties file to point oozie to the location of the ShareLib directory, as follows:
oozie.libpath=${nameNode}/user/oozie/share/lib. (depends on where SharedLib directory is configured on your cluster).
This got rid of the error!
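On more recent Oozie versions you can also confirm what the server's sharelib actually contains before resorting to a job.properties override (the admin subcommand may not exist on very old releases like CDH3):

```shell
# Ask the Oozie server which sharelib directories it knows about
oozie admin -oozie http://oozie-host:11000/oozie -shareliblist

# Or inspect the HDFS directory directly
hdfs dfs -ls /user/oozie/share/lib
```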