Oozie XML workflow schema validation error - hadoop

When I run Oozie to schedule an HBase load through a Sqoop incremental-append job, I get a schema validation error. Here is my workflow:
<action name="sqoop-import">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/sqoop"/>
<mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<job-xml>/user/root/hbase-site.xml</job-xml>
<command>import --connect "jdbc:sqlserver://localhost:1433;database=test" --table test_plan_package --username sa --password pass
--incremental append --check-column testid --hbase-table test_plan --column-family testid</command>
<file>/user/root/sqljdbc4.jar#sqljdbc4.jar</file>
<file>/user/root/hbase/hbase-client.jar#hbase-client.jar</file>
<file>/user/root/hbase/hbase-common.jar#hbase-common.jar</file>
<file>/user/root/hbase/hbase-protocol.jar#hbase-protocol.jar</file>
<file>/user/root/hbase/htrace-core3.1.0-incubating.jar#htrace-core3.1.0-incubating.jar</file>
<file>/user/root/hbase/hbase-server.jar#hbase-server.jar</file>
<file>/user/root/hbase/hbase-hadoop-compat.jar#hbase-hadoop-compat.jar</file>
<file>/user/root/hbase/high-scale-lib-1.1.1.jar#high-scale-lib-1.1.1.jar</file>
</sqoop>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
I tried various portals and came to know that the problem is with XML schema version 0.2 and that it needs to be upgraded to 0.4 in workflow.xml.
Could anyone provide the steps to upgrade the schema version to 0.4 in Oozie?

Move your <job-xml> element above <configuration>. There is no need to upgrade the schema from 0.2 to 0.4: oozie-site.xml already registers the XSD files for both versions. The error you are getting is because <job-xml> must be placed before <configuration> in the sqoop action.
Also check that the jars match your HBase version and modify workflow.xml accordingly.
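For reference, a minimal sketch of the corrected action with <job-xml> moved above <configuration> (paths and command as in the question, remaining <file> entries unchanged):
<action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/sqoop"/>
            <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
        </prepare>
        <!-- sqoop-action-0.2 requires job-xml to come before configuration -->
        <job-xml>/user/root/hbase-site.xml</job-xml>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <command>import --connect "jdbc:sqlserver://localhost:1433;database=test" --table test_plan_package --username sa --password pass --incremental append --check-column testid --hbase-table test_plan --column-family testid</command>
        <file>/user/root/sqljdbc4.jar#sqljdbc4.jar</file>
        <!-- ...remaining <file> entries as in the question... -->
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
</action>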

Related

Unable to run "sqoop job --exec" in oozie

I need some advice. I'm trying to run a Sqoop job in Oozie, but it gets killed, and there's this warning in oozie-error.log:
2018-01-21 17:30:12,473 WARN SqoopActionExecutor:523 - SERVER[edge01.domain.com] USER[linknet] GROUP[-] TOKEN[] APP[sqoop-wf] JOB[0000006-180121122345026-oozie-link-W] ACTION[0000006-180121122345026-oozie-link-W#sqoop-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
job.properties
nameNode=hdfs://hadoop01.domain.com:8020
jobTracker=hadoop01.domain.com:18032
queueName=default
oozie.use.system.libpath=true
examplesRoot=examples
oozie.libpath=${nameNode}/share/lib/oozie
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/sqoop
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="sqoop-wf">
<start to="sqoop-node"/>
<action name="sqoop-node">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/sqoop"/>
<mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<command>job --exec ingest_cpm_alarm</command>
</sqoop>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
And this is how I created the Sqoop job ingest_cpm_alarm:
$ sqoop job --create ingest_cpm_alarm -- import --connect jdbc:postgresql://xxx.xxx.xxx.xxx:5432/snapshot --username "extractor" -P \
--incremental append \
--check-column snapshot_date \
--table cpm_snr_history \
--as-avrodatafile \
--target-dir /tmp/trash/cpm_alarm
I can run this Sqoop job successfully on its own, but not from the Oozie scheduler.
Also, the jar file postgresql-42.1.4.jar and everything under $SQOOP_HOME/lib have been copied into the libpath directory (/share/lib/oozie).
Oozie and Sqoop reside on the same server. In my sqoop-site.xml, I only set these parameters:
sqoop.metastore.client.enable.autoconnect=true
sqoop.metastore.client.record.password=true
Did I miss something here?
It was resolved: I had missed sqoop-site.xml, which should be available in the same workflow directory in HDFS.
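For anyone hitting the same thing, a minimal sketch of the fix, assuming the application path from the job.properties above and the user linknet from the log:
$ hdfs dfs -put $SQOOP_HOME/conf/sqoop-site.xml /user/linknet/examples/apps/sqoop/
$ hdfs dfs -ls /user/linknet/examples/apps/sqoop/
With the file staged next to workflow.xml, the launcher can find the metastore settings (sqoop.metastore.client.*) at run time.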
This post has a similar issue: sqoop exec job in oozie is not working.
Thanks.

Oozie over Hive to fetch the data from a table

I am trying to automate Hive through Oozie. I wrote a simple Hive script that creates a table and runs select queries on it. When I submitted the script, it went into RUNNING mode but never finished executing. I checked yarn application -list; the job hung at 95%. The Hive table had been created successfully, but I was not able to fetch data from the table. Please let me know how to resolve this problem.
Thanks in advance.
workflow.xml
<action name="hive2-node">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive2"/>
<mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<jdbc-url>${jdbcURL}</jdbc-url>
<script>script.q</script>
<param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/table</param>
<param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/hive2</param>
</hive2>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
script.q
job.properties
nameNode=hdfs://...:8020
jobTracker=...:8050
queueName=default
jdbcURL=jdbc:hive2://..*.:10000/default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/hive2

Scheduling a sqoop job in oozie through Shell script using Hue

I am able to run a Sqoop command in Oozie using Hue. But when I try to run the same Sqoop command by placing it in a shell script, I get an error like the one below:
Stdoutput 2016-05-20 10:52:13,241 ERROR [main] sqoop.Sqoop (Sqoop.java:runSqoop(181)) - Got exception running Sqoop:
java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
I have included the JDBC jar file just as I did when running the Sqoop command directly. I don't understand why it is not working from the shell script.
Here is the workflow generated by Hue:
<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
<start to="shell-ca31"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="shell-ca31">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
<property>
<name>oozie.use.system.libpath</name>
<value>true</value>
</property>
<property>
<name>oozie.libpath</name>
<value>/user/oozie/libext</value>
</property>
</configuration>
<exec>sqoopoozie.sh</exec>
<file>/user/yxr6907/sqoopoozie.sh#sqoopoozie.sh</file>
<archive>/user/oozie/libext/ojdbc7.jar#ojdbc7.jar</archive>
<capture-output/>
</shell>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
When you use a shell action, the Sqoop jars are not imported into the classpath. I was able to solve it by adding the jar to the classpath and exporting HADOOP_CLASSPATH, after which Sqoop works.
Use the following:
1. Put the jar ojdbc7.jar in the action's files.
2. Use the following command inside the shell script: export HADOOP_CLASSPATH=${PWD}/ojdbc7.jar
Instead of step 1, you can use the following properties to load the jar into the classpath:
oozie.use.system.libpath=true
oozie.libpath=/path/to/jars
Exporting HADOOP_CLASSPATH is required in both cases; a sketch of the full script follows.
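For illustration, a minimal sketch of what sqoopoozie.sh could look like under this approach; the connect string, password file, table, and target directory are hypothetical placeholders:
#!/bin/bash
# Make the Oracle JDBC driver shipped with the action visible to Sqoop (step 2 above).
export HADOOP_CLASSPATH=${PWD}/ojdbc7.jar
# Hypothetical import; replace the connection details with your own.
sqoop import \
    --connect "jdbc:oracle:thin:@//dbhost:1521/ORCL" \
    --username scott \
    --password-file /user/yxr6907/.oracle_password \
    --table EMPLOYEES \
    --target-dir /user/yxr6907/employees \
    -m 1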

Propagating oozie job last run date to last-value

I have an Oozie workflow that runs a Sqoop command to incrementally load data from a table based on its lastupdatedate column.
How do I set --last-value so that we get the records added between the last run of the job and now?
In case you are importing the data into a Hive table, you could query the last-updated value from the Hive table and pass it to the Sqoop import query:
A Hive action for the select query, with the logic to retrieve the last-updated value and emit it as a key=value pair (called last_value below).
A Sqoop action for the incremental load, reading the captured output of the previous Hive action.
Below is a pseudo-workflow:
<workflow-app name="sqoop-to-hive" xmlns="uri:oozie:workflow:0.4">
<start to="hiveact"/>
<action name="hiveact">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<script>script.sql</script>
<capture-output/>
</hive>
<ok to="sqoopact"/>
<error to="kill"/>
<action name="sqoopact">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --incremental append --check-column lastupdatedate --last-value ${wf:actionData('hiveact')['last_value']}</command>
</sqoop>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed</message>
</kill>
<end name="end"/>
Hope this helps.

Executing Sqoops using Oozie

I have 2 Sqoops that loads data from HDFS to MySQL. I want to execute them using Oozie. I have seen that Oozie is an XML file. How can I configure it so I can execute those Sqoop? Demonstration with steps will be appreciated?
Two Sqoops are:
1.
sqoop export --connect jdbc:mysql://localhost/hduser --table foo1 -m 1 --export-dir /user/cloudera/bar1
2.
sqoop export --connect jdbc:mysql://localhost/hduser --table foo2 -m 1 --export-dir /user/cloudera/bar2
Thanks.
You don't have to execute it via a shell action; there is a separate Sqoop action in Oozie. Here is what you have to put in your workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="oozie-wf">
<start to="sqoop-wf1"/>
<action name="sqoop-wf1">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>export --connect jdbc:mysql://localhost/hduser --table foo1 -m 1 --export-dir /user/cloudera/bar1</command>
</sqoop>
<ok to="sqoop-wf2"/>
<error to="fail"/>
</action>
<action name="sqoop-wf2">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>export --connect jdbc:mysql://localhost/hduser --table foo2 -m 1 --export-dir /user/cloudera/bar2</command>
</sqoop>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Failed, Error Message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Hope this helps.
You can use an Oozie shell action for this. Basically, you need to create a shell action and provide the commands that you posted in your question as the commands to be executed within the action.
Sample Oozie action:
<action name="SqoopAction">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>[JOB-TRACKER]</job-tracker>
<name-node>[NAME-NODE]</name-node>
<prepare>
<delete path="[PATH]"/>
...
<mkdir path="[PATH]"/>
...
</prepare>
<job-xml>[SHELL SETTINGS FILE]</job-xml>
<configuration>
<property>
<name>[PROPERTY-NAME]</name>
<value>[PROPERTY-VALUE]</value>
</property>
...
</configuration>
<exec>[SHELL-COMMAND]</exec>
<argument>[ARG-VALUE]</argument>
...
<argument>[ARG-VALUE]</argument>
<env-var>[VAR1=VALUE1]</env-var>
...
<env-var>[VARN=VALUEN]</env-var>
<file>[FILE-PATH]</file>
...
<archive>[FILE-PATH]</archive>
...
<capture-output/>
</shell>
<ok to="[NODE-NAME]"/>
<error to="[NODE-NAME]"/>
</action>
In your case, you would set <exec>sqoop</exec> and pass the rest of the command (export --connect jdbc:mysql://localhost/hduser --table foo1 -m 1 --export-dir /user/cloudera/bar1) as <argument> elements, one per token, since the shell action treats <exec> as the executable name rather than a full command line.
Also, you could put all your Sqoop commands in a shell script and execute that script instead. This is better if you have a lot of commands to execute; a sketch follows.
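For example, a minimal sketch of such a script using the two exports from the question (connect string and paths as given there; the script name is a placeholder):
#!/bin/bash
# run_exports.sh: run both Sqoop exports; stop at the first failure so the
# Oozie action is marked failed instead of silently continuing.
set -e
sqoop export --connect jdbc:mysql://localhost/hduser --table foo1 -m 1 --export-dir /user/cloudera/bar1
sqoop export --connect jdbc:mysql://localhost/hduser --table foo2 -m 1 --export-dir /user/cloudera/bar2
Ship the script with a <file> element and point <exec> at its name, as in the template above.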
