ERROR: Oozie workflow with sqoop action

I have a problem with a workflow I created in Oozie with a sqoop action. It fails with the error message:
Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
My workflow.xml is:
<workflow-app xmlns='uri:oozie:workflow:0.5' name='sqoop-workflow'>
<start to='sqoop-load' />
<action name="sqoop-load">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<command>import --hbase-create-table --hbase-table <table_name> --column-family <family_name> --hbase-row-key <key_row> --connect "jdbc:sqlserver://<server>:<port>;database=<database>;username=<user>;password=<pass>" --query "SELECT * FROM <table_name> WHERE \$CONDITIONS AND <column_name> != ''" -m 1</command>
</sqoop>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Sqoop export failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
and my job.properties:
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.libpath=/user/oozie/share/lib
oozie.wf.application.path=${nameNode}/user/${user.name}/
My oozie.libpath contains the mssql-jdbc-6.4.0.jre7.jar for the SQL Server connection, and oozie.wf.application.path contains my workflow.xml.
I run the workflow with the command:
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
Note: I run everything in the Cloudera VM.
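One thing worth checking, offered as a hint rather than a confirmed fix: the <command> element is split on whitespace and does not preserve quoting, so a quoted --connect string or a free-form --query like the one above tends to get mangled before it reaches Sqoop, which can surface exactly as SqoopMain exit code [1]. A minimal sketch of the same action using <arg> elements instead, with hypothetical workflow parameters (${targetTable}, ${columnFamily}, ${rowKey}, ${connectString}) standing in for the placeholders in the question:
<!-- sketch only: ${targetTable}, ${columnFamily}, ${rowKey} and ${connectString} are hypothetical parameters -->
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<arg>import</arg>
<arg>--hbase-create-table</arg>
<arg>--hbase-table</arg>
<arg>${targetTable}</arg>
<arg>--column-family</arg>
<arg>${columnFamily}</arg>
<arg>--hbase-row-key</arg>
<arg>${rowKey}</arg>
<arg>--connect</arg>
<arg>${connectString}</arg>
<arg>--query</arg>
<arg>SELECT * FROM ${targetTable} WHERE $CONDITIONS</arg>
<arg>-m</arg>
<arg>1</arg>
</sqoop>
Each flag and its value get their own <arg>, so no shell-style quoting or \$ escaping is needed.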

Related

How to read config properties in sub-workflow (separate xml file)?

I am getting the error message below while reading config properties in a separate sub-workflow file. I am posting the sample code. I'd appreciate your help in resolving this issue. Thank you!
2019-01-17 08:44:52,885 WARN ActionStartXCommand:523 - SERVER[localhost] USER[user1] GROUP[-] TOKEN[] APP[subWorkflow] JOB[0338958-190114130857167-oozie-oozi-W] ACTION[0338958-190114130857167-oozie-oozi-W#subWorkflowAction1] ELException in ActionStartXCommand
javax.servlet.jsp.el.ELException: variable [jobtracker] cannot be resolved
Coordinator job trigger command
oozie job --oozie http://localhost:11000/oozie --config /home/user/oozie-scripts/props/job.properties -run
job.properties
namenode=hdfs://localhost
workflowpath=${namenode}/user/user1/oozie-workflow/parentWorkflow.xml
frequency=25
starttime=2018-08-06T13\:29Z
endtime=2108-08-06T13\:29Z
timezone=UTC
oozie.coord.application.path=${namenode}/user/user1/oozie-workflow/coordinator.xml
jobtracker=http://localhost:8088
scriptpath=/user/user1/oozie-workflow
Coordinator
<coordinator-app name="sampleCoord" frequency="${frequency}" start="${starttime}" end="${endtime}" timezone="${timezone}" xmlns="uri:oozie:coordinator:0.4">
<action>
<workflow>
<app-path>${workflowpath}</app-path>
</workflow>
</action>
</coordinator-app>
Parent Workflow
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "Parent-Workflow">
<start to = "workflowAction1" />
<action name = "workflowAction1">
<sub-workflow>
<app-path>/user/user1/oozie-workflow/subWorkflow1.xml</app-path>
</sub-workflow>
<ok to = "end" />
<error to = "end" />
</action>
Sub-Workflow
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "subWorkflow">
<start to = "subWorkflowAction1" />
<action name = "subWorkflowAction1">
<hive xmlns = "uri:oozie:hive-action:0.4">
<job-tracker>${jobtracker}</job-tracker>
<script>${scriptpath}/dropTempTable.hive</script>
<param>Temp_TableVar=${concat(concat("HBASE_",replaceAll(wf:id(),"- ","_")),"_TEMP")}</param>
</hive>
<ok to = "end" />
<error to = "kill_job" />
</action>
<kill name = "kill_job">
<message>Job failed</message>
</kill>
<end name = "end" />
</workflow-app>
Adding the <propagate-configuration/> element to the sub-workflow action in the parent workflow XML resolved the issue.
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "Parent-Workflow">
<start to = "workflowAction1" />
<action name = "workflowAction1">
<sub-workflow>
<app-path>/user/user1/oozie-workflow/subWorkflow1.xml</app-path>
<propagate-configuration />
</sub-workflow>
<ok to = "end" />
<error to = "end" />

Copying files from an HDFS directory to another with the Oozie distcp-action

My actions
start_fair_usage ends with status OK, but test_copy returns
Main class [org.apache.oozie.action.hadoop.DistcpMain], main() threw exception, null
In /user/comverse/data/${1}_B I have a lot of different files, some of which I want to copy to ${NAME_NODE}/user/evkuzmin/output. To do that, I try to pass paths from copy_files.sh, which holds an array of paths to the files I need.
<action name="start_fair_usage">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${JOB_TRACKER}</job-tracker>
<name-node>${NAME_NODE}</name-node>
<exec>${copy_file}</exec>
<argument>${today_without_dash}</argument>
<argument>${mta}</argument>
<!-- <file>${path}#${start_fair_usage}</file> -->
<file>${path}${copy_file}#${copy_file}</file>
<capture-output/>
</shell>
<ok to="test_copy"/>
<error to="KILL"/>
</action>
<action name="test_copy">
<distcp xmlns="uri:oozie:distcp-action:0.2">
<job-tracker>${JOB_TRACKER}</job-tracker>
<name-node>${NAME_NODE}</name-node>
<arg>${wf:actionData('start_fair_usage')['paths']}</arg>
<!-- <arg>${NAME_NODE}/user/evkuzmin/input/*</arg> -->
<arg>${NAME_NODE}/user/evkuzmin/output</arg>
</distcp>
<ok to="END"/>
<error to="KILL"/>
</action>
start_fair_usage starts copy_file.sh
#!/bin/bash
echo ${1}
echo ${2}
dirs=(
/user/comverse/data/${1}_B
)
args=()
# the array must be expanded with [@], not [#]
for i in $(hadoop fs -ls "${dirs[@]}" | egrep ${2}.gz | awk -F " " '{print $8}')
do
args+=("$i")
echo "copy file - "${i}
done
# join the collected paths and emit them as key=value so <capture-output/> can pick them up
paths="${args[*]}"
echo "paths=${paths}"
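For context: <capture-output/> makes Oozie parse the shell action's stdout as key=value pairs, and ${wf:actionData('start_fair_usage')['paths']} then returns whatever the script echoed for the key paths. A minimal sketch of that contract, with a hypothetical value:
# what the script must print for the distcp action to receive the paths (example value)
echo "paths=/user/comverse/data/20170404_B/a_mta.gz /user/comverse/data/20170404_B/b_mta.gz"
If the key is never emitted, the EL expression resolves to an empty string, which can leave the distcp action without a source argument and produce a null exception like the one above.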
Here is what I did in the end.
<start to="start_copy"/>
<fork name="start_copy">
<path start="copy_mta"/>
<path start="copy_rcr"/>
<path start="copy_sub"/>
</fork>
<action name="copy_mta">
<distcp xmlns="uri:oozie:distcp-action:0.2">
<prepare>
<delete path="${NAME_NODE}${dstFolder}mta/*"/>
</prepare>
<arg>${NAME_NODE}${srcFolder}/*mta.gz</arg>
<arg>${NAME_NODE}${dstFolder}mta/</arg>
</distcp>
<ok to="end_copy"/>
<error to="KILL"/>
</action>
<action name="copy_rcr">
<distcp xmlns="uri:oozie:distcp-action:0.2">
<prepare>
<delete path="${NAME_NODE}${dstFolder}rcr/*"/>
</prepare>
<arg>${NAME_NODE}${srcFolder}/*rcr.gz</arg>
<arg>${NAME_NODE}${dstFolder}rcr/</arg>
</distcp>
<ok to="end_copy"/>
<error to="KILL"/>
</action>
<action name="copy_sub">
<distcp xmlns="uri:oozie:distcp-action:0.2">
<prepare>
<delete path="${NAME_NODE}${dstFolder}sub/*"/>
</prepare>
<arg>${NAME_NODE}${srcFolder}/*sub.gz</arg>
<arg>${NAME_NODE}${dstFolder}sub/</arg>
</distcp>
<ok to="end_copy"/>
<error to="KILL"/>
</action>
<join name="end_copy" to="END"/>
<kill name="KILL">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="END"/>
It turned out that wildcards work in distcp, so I didn't need bash at all.
Also, some people advised me to write it in Scala.
import java.io.File

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path, FileUtil}

val conf = new Configuration()
val fs = FileSystem.get(conf)

val listOfFileTypes = List("mta", "rcr", "sub")
val listOfPlatforms = List("B", "C", "H", "M", "Y")

for (fileType <- listOfFileTypes) {
  // clear the destination directory for this file type
  FileUtil.fullyDeleteContents(new File("/apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin/file_" + fileType))
  for (platform <- listOfPlatforms) {
    // expand the wildcard into the matching source files
    var srcPaths = fs.globStatus(new Path("/user/comverse/data/" + "20170404" + "_" + platform + "/*" + fileType + ".gz"))
    var dstPath = new Path("/apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin/file_" + fileType)
    for (srcPath <- srcPaths) {
      println("copying " + srcPath.getPath.toString)
      FileUtil.copy(fs, srcPath.getPath, fs, dstPath, false, conf)
    }
  }
}
Both approaches work, though I haven't tried to run the Scala script from Oozie.

How to get a file name dynamically in a decision node in Oozie?

I want to check whether a file exists in an HDFS location using an Oozie batch job.
In my HDFS location I get a file like "test_08_01_2016.csv" or "test_08_02_2016.csv" every day at 11 PM.
So after 11:15 PM I want to check whether the file has arrived; I can check for its existence using a decision node, with the workflow below.
<workflow-app name="HIVECoWorkflow" xmlns="uri:oozie:workflow:0.5">
<start to="CheckFile"/>
<decision name="CheckFile">
<switch>
<case to="nextOozieTask">
${fs:exists("/user/cloudera/file/input/test_08_01_2016.csv")}
</case>
<default to="MailActionFileMissing" />
</switch>
</decision>
<action name="MailActionFileMissing" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://quickstart.cloudera:10000/default</jdbc-url>
<script>/user/cloudera/email/select.hql</script>
<file>/user/cloudera/hive-site.xml</file>
</hive2>
<ok to="End"/>
<error to="Kill"/>
</action>
<action name="nextOozieTask" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://quickstart.cloudera:10000/default</jdbc-url>
<script>/user/cloudera/email/select1.hql</script>
<file>/user/cloudera/hive-site.xml</file>
</hive2>
<ok to="End"/>
<error to="Kill"/>
</action>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="End"/>
But I want to build the file name dynamically, for example
"filename_todaysdate", i.e. test_08_01_2016.csv.
Please help me with how to get the file name dynamically.
Thanks in advance.
The solution to the above question is to get the date value from the coordinator job, with code like the following inside the coordinator definition.
<property>
<name>today</name>
<value>${coord:formatTime(coord:dateTzOffset(coord:nominalTime(), "America/Los_Angeles"), 'yyyyMMdd')}</value>
</property>
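For completeness, a sketch of where this property lives: it goes inside the coordinator action's <workflow><configuration> block, so the workflow receives it as ${today}. The workflow below also references ${yesterday}, which can be defined the same way; the coord:dateOffset call, the frequency and the app-path here are assumptions, not part of the original answer:
<!-- sketch: app-path, frequency and the yesterday expression are assumed, not taken from the original answer -->
<coordinator-app name="sampleCoord" frequency="${coord:days(1)}" start="${jobStart}" end="${jobEnd}" timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
<action>
<workflow>
<app-path>${nameNode}/user/cloudera/email1/workflow.xml</app-path>
<configuration>
<property>
<name>today</name>
<value>${coord:formatTime(coord:dateTzOffset(coord:nominalTime(), "America/Los_Angeles"), 'yyyyMMdd')}</value>
</property>
<property>
<name>yesterday</name>
<value>${coord:formatTime(coord:dateTzOffset(coord:dateOffset(coord:nominalTime(), -1, 'DAY'), "America/Los_Angeles"), 'yyyyMMdd')}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>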
We can check whether the file exists in the given HDFS location with fs:exists, i.e.
${fs:exists(concat(concat(nameNode, path),today))}
And in the workflow we reference the coordinator job's date value "today" as a parameter, as in the code below.
<workflow-app name="HIVECoWorkflow" xmlns="uri:oozie:workflow:0.5">
<start to="CheckFile"/>
<decision name="CheckFile">
<switch>
<case to="nextOozieTask">
${fs:exists(concat(concat(nameNode, path),today))}
</case>
<case to="nextOozieTask1">
${fs:exists(concat(concat(nameNode, path),yesterday))}
</case>
<default to="MailActionFileMissing" />
</switch> </decision>
<action name="MailActionFileMissing" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://quickstart.cloudera:10000/default</jdbc-url>
<script>/user/cloudera/email/select.hql</script>
<file>/user/cloudera/hive-site.xml</file>
</hive2>
<ok to="End"/>
<error to="Kill"/>
</action>
<action name="nextOozieTask" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://quickstart.cloudera:10000/default</jdbc-url>
<script>/user/cloudera/email/select1.hql</script>
<file>/user/cloudera/hive-site.xml</file>
</hive2>
<ok to="End"/>
<error to="Kill"/>
</action>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="End"/>
In job.properties we can declare all the static values, as below.
jobStart=2016-08-23T09:50Z
jobEnd=2016-08-23T10:26Z
tzOffset=-8
initialDataset=2016-08-23T09:50Z
oozie.use.system.libpath=True
security_enabled=False
dryrun=True
jobTracker=localhost:8032
nameNode=hdfs://quickstart.cloudera:8020
test=${nameNode}/user/cloudera/email1
oozie.coord.application.path=${nameNode}/user/cloudera/email1/add-partition-coord-app.xml
path=/user/cloudera/file/input/ravi_
Maybe you can write a shell script that does the HDFS file-existence check, returning 0 on success and 1 otherwise. Based on that, route the Oozie workflow's success and error nodes...
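A minimal sketch of such a script, assuming the file-name pattern from the question and building today's date inside the script (hdfs dfs -test -e exits 0 when the path exists):
#!/bin/bash
# check_file.sh -- hypothetical helper: exits 0 if today's file exists in HDFS, 1 otherwise
TODAY=$(date +%m_%d_%Y)
FILE="/user/cloudera/file/input/test_${TODAY}.csv"
if hdfs dfs -test -e "${FILE}"; then
  echo "exists=true"
  exit 0
else
  echo "exists=false"
  exit 1
fi
In a shell action, a non-zero exit code fails the action and takes the error transition, so the ok/error arcs can play the role of the decision node.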

Imported Failed: Cannot convert SQL type 2005 while importing CLOB data from an Oracle database

I am trying to import an Oracle table that has a CLOB column using Sqoop, and it is failing with the error "Imported Failed: Cannot convert SQL type 2005". I am running Sqoop version 1.4.5-cdh5.4.7.
Please help me with how to import the CLOB data type.
I am using the Oozie workflow below to import the data:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="EBIH_Dly_tldb_dly_load_wf">
<credentials>
<credential name="hive2_cred" type="hive2">
<property>
<name>hive2.jdbc.url</name>
<value>${hive2_jdbc_uri}</value>
</property>
<property>
<name>hive2.server.principal</name>
<value>${hive2_server_principal}</value>
</property>
</credential>
</credentials>
<start to="sqp_imp_tldb_table1"/>
<action name="sqp_imp_tldb_table1">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<arg>import</arg>
<arg>-Dmapreduce.output.fileoutputformat.compress=false</arg>
<arg>--connect</arg>
<arg>${connect_string}</arg>
<arg>--username</arg>
<arg>${username}</arg>
<arg>--password</arg>
<arg>${password}</arg>
<arg>--num-mappers</arg>
<arg>8</arg>
<arg>--as-textfile</arg>
<arg>--append</arg>
<arg>--fields-terminated-by</arg>
<arg>|</arg>
<arg>--split-by</arg>
<arg>created_dt</arg>
<arg>--target-dir</arg>
<arg>${sqp_table1_dir}</arg>
<arg>--map-column-hive</arg>
<arg>ID=bigint,XML1=string,XML2=string,APP_PAYLOAD=string,created_dt=date,created_day=bigint</arg>
<arg>--query</arg>
<arg>"select * from schema1.table1 where $CONDITIONS AND trunc(created_dt) BETWEEN to_date('${load_start_date}','yyyy-mm-dd') AND to_date('${load_end_date}','yyyy-mm-dd')"</arg>
</sqoop>
<ok to="dly_load_wf_complete"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="dly_load_wf_complete"/>
</workflow-app>
Finally it worked for me with the additional option -Doraoop.disabled=true in the Sqoop import arguments.
The action below worked:
<action name="sqp_imp_tldb_table1">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<arg>import</arg>
<arg>-Dmapreduce.output.fileoutputformat.compress=false</arg>
<arg>-Doraoop.disabled=true</arg>
<arg>--connect</arg>
<arg>${connect_string}</arg>
<arg>--username</arg>
<arg>${username}</arg>
<arg>--password</arg>
<arg>${password}</arg>
<arg>--num-mappers</arg>
<arg>8</arg>
<arg>--as-textfile</arg>
<arg>--append</arg>
<arg>--fields-terminated-by</arg>
<arg>\t</arg>
<arg>--split-by</arg>
<arg>created_dt</arg>
<arg>--target-dir</arg>
<arg>${sqp_table1_dir}</arg>
<arg>--map-column-hive</arg>
<arg>ID=bigint,XML1=string,XML2=string,APP_PAYLOAD=string,created_dt=date,created_day=bigint</arg>
<arg>--query</arg>
<arg>"select * from schema1.table1 where $CONDITIONS AND trunc(created_dt) BETWEEN to_date('${load_start_date}','yyyy-mm-dd') AND to_date('${load_end_date}','yyyy-mm-dd')"</arg>
</sqoop>
<ok to="dly_load_wf_complete"/>
<error to="fail"/>
</action>
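As a side note, not part of the fix above: another workaround commonly suggested for "Cannot convert SQL type 2005" is to map the CLOB columns to Java Strings explicitly with --map-column-java, which in arg form would look roughly like this (column names reused from the --map-column-hive list above):
<!-- sketch: map the CLOB columns to Java String on import -->
<arg>--map-column-java</arg>
<arg>XML1=String,XML2=String,APP_PAYLOAD=String</arg>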

MapR oozie sqoop error; Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]

I'm repeatedly getting this error when I submit a Sqoop job using Oozie on MapR. Details below. I even copied the MySQL JAR file to the share/lib/sqoop directory, with no result. Could you please help?
Command:
/opt/mapr/oozie/oozie-4.0.1/bin/oozie job -oozie=http://OOZIE_URL:11000/oozie -config job.properties -run
Error
2015-06-18 01:54:05,818 WARN SqoopActionExecutor:542 - SERVER[data-sci1] USER[mapr] GROUP[-] TOKEN[] APP[sqoop-orders-wf] JOB[0000024-150616000730465-oozie-mapr-W] ACTION[0000024-150616000730465-oozie-mapr-W#sqoop-orders-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
MaprFS:
/oozie/share/lib/sqoop/mysql-connector-java-5.1.25-bin.jar
job.properties:
nameNode=maprfs:///
jobTracker=YARN_RESOURCE_MANAGER:8032
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=maprfs:/oozie/data/sqoop/orders
mapreduce.framework.name=yarn
workflow.xml:
<start to="sqoop-orders-node"/>
<action name="sqoop-orders-node">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<arg>import</arg>
<arg>--hbase-create-table</arg>
<arg>--hbase-table</arg>
<arg>orders</arg>
<arg>--column-family</arg>
<arg>d</arg>
<arg>--username</arg>
<arg>USERNAME</arg>
<arg>--password</arg>
<arg>PASSWORD</arg>
<arg>--connect</arg>
<arg>"jdbc:mysql://HOST?zeroDateTimeBehavior=round"</arg>
<arg>--query</arg>
<arg>--split-by</arg>
<arg>o.OrderId</arg>
<arg>--hbase-row-key</arg>
<arg>rowkey</arg>
<arg>-m</arg>
<arg>8</arg>
<arg>--verbose</arg>
<arg>--query</arg>
<arg>select o.OrderId as rowkey, o.OrderId as orderId from orders WHERE \$CONDITIONS</arg>
</sqoop>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Sqoop free form failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
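Two things stand out in the posted action, offered only as observations rather than a confirmed fix: the first --query arg has no value (so the following --split-by is consumed as the query string), and since <arg> values are not passed through a shell, the connect string needs no surrounding quotes and $CONDITIONS needs no backslash escape. A sketch of the tail of the arg list with those points applied (plus the table alias o that the column references assume):
<!-- sketch only: duplicate --query removed, quotes and backslash escape dropped, alias o added -->
<arg>--connect</arg>
<arg>jdbc:mysql://HOST?zeroDateTimeBehavior=round</arg>
<arg>--split-by</arg>
<arg>o.OrderId</arg>
<arg>--hbase-row-key</arg>
<arg>rowkey</arg>
<arg>-m</arg>
<arg>8</arg>
<arg>--verbose</arg>
<arg>--query</arg>
<arg>select o.OrderId as rowkey, o.OrderId as orderId from orders o WHERE $CONDITIONS</arg>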
