Errors when using Sqoop action in Oozie editor (Hue)

I am trying to use the Sqoop action in the Oozie editor in Hue, but I can't get it to work.
Here's what I have tried so far.
I put everything in arguments instead of the command, as suggested here: http://alvincjin.blogspot.com.au/2014/06/create-sqoop-action-in-oozie-using-hue.html
Further, since I am connecting to Teradata, I've placed the JDBC jars in HDFS and added them under Files.
This is what the current workflow looks like in the editor:
(screenshot: Sqoop action in the Hue workflow editor)
The workflow definition is:
<workflow-app name="Sqoop_test" xmlns="uri:oozie:workflow:0.5">
<start to="sqoop-b20d"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="sqoop-b20d">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>development</value>
</property>
<property>
<name>mapred.job.name</name>
<value>test_sqoop</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>0</value>
</property>
</configuration>
<arg>import</arg>
<arg>--connect</arg>
<arg>jdbc:teradata://XXXXX</arg>
<arg>--query</arg>
<arg>select count(*) from XXXXX</arg>
<arg>--fetch-size</arg>
<arg>10000</arg>
<arg>--num-mappers</arg>
<arg>1</arg>
<arg>--hive-table-name</arg>
<arg>XXXXX.tmp_sqoop_test</arg>
<arg>--hive-import</arg>
<arg>--hive-overwrite</arg>
<arg>--target-dir</arg>
<arg>/user/dXXXXX/digital/test/tmp_sqoop_test</arg>
<arg>--username</arg>
<arg>XXXXX</arg>
<arg>--password</arg>
<arg>XXXXX</arg>
<file>/user/hue/oozie/workspaces/digital/lib/terajdbc4.jar#terajdbc4.jar</file>
<file>/user/hue/oozie/workspaces/digital/lib/teradata-connector-1.3.4-hadoop220.jar#teradata-connector-1.3.4-hadoop220.jar</file>
</sqoop>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
However, I get this error:
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(296)) - Error parsing arguments for import:
2786 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --hive-table-name
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: --hive-table-name
2786 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: XXXXX.tmp_sqoop_test
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: tdcprdr_app_digital.tmp_sqoop_test
2786 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --hive-import
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: --hive-import
2786 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --hive-overwrite
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: --hive-overwrite
2787 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --target-dir
2016-01-06 14:13:52,115 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: --target-dir
...
I was under the impression that this error could be resolved by placing everything in arguments, since Oozie splits the <command> string on whitespace.
The same code works when run through a shell script. I've tried placing the import command and connection string in the command section, but that doesn't even run. I've also tried creating a minimal Sqoop action, with just the query and connect statement, as follows:
<workflow-app name="Sqoop_minimal" xmlns="uri:oozie:workflow:0.5">
<start to="sqoop-eeeb"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="sqoop-eeeb">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<arg>import</arg>
<arg>--connect</arg>
<arg>jdbc:teradata://tdXXXXX</arg>
<arg>--query</arg>
<arg>select count(*) from XXXXX</arg>
<arg>--target-dir</arg>
<arg>/user/dXXXXX/digital/test/tmp_sqoop_test</arg>
<arg>--username</arg>
<arg>XXXXX</arg>
<arg>--password</arg>
<arg>XXXXX</arg>
<file>/user/hue/oozie/workspaces/digital/lib/teradata-connector-1.3.4-hadoop220.jar#teradata-connector-1.3.4-hadoop220.jar</file>
<file>/user/hue/oozie/workspaces/digital/lib/terajdbc4.jar#terajdbc4.jar</file>
</sqoop>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
With this workflow, I get a very vague error as follows:
>>> Invoking Sqoop command line now >>>
2287 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2016-01-06 14:57:48,381 WARN [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(175)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2324 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.5.3.0.0.0-249
2016-01-06 14:57:48,418 INFO [main] sqoop.Sqoop (Sqoop.java:<init>(92)) - Running Sqoop version: 1.4.5.3.0.0.0-249
2339 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
2016-01-06 14:57:48,433 WARN [main] tool.BaseSqoopTool (BaseSqoopTool.java:applyCredentialsOptions(1014)) - Setting your password on the command-line is insecure. Consider using -P instead.
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
The Oozie version is 4.1.0.3.0.0.0-249.
I've tried searching for a solution online, but no luck.
Any help would be appreciated. Thank you!
I have already seen and tried these links:
https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Sqoop-fails-with-quot-Error-parsing-arguments-for-import-quot/td-p/31930
http://stackoverflow.com/questions/25770698/sqoop-free-form-query-causing-unrecognized-arguments-in-hue-oozie

There is no such Sqoop argument as
--hive-table-name
Use
--hive-table instead. It should not show "Unrecognized argument" anymore.
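In the workflow above, that means replacing the two offending lines with (the table name is kept as the question's placeholder):
<arg>--hive-table</arg>
<arg>XXXXX.tmp_sqoop_test</arg>
As a side note beyond this answer: per the Sqoop documentation, a free-form --query import must also contain the literal token $CONDITIONS, so the query argument would likely need to become something like select count(*) from XXXXX WHERE $CONDITIONS.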

Related

Oozie job status always shows 50% when I run the sample Oozie workflows in my Cloudera QuickStart VM. Why?

I am trying to run the simple demo Oozie jobs that ship with the Cloudera QuickStart VM (usr/share/doc/oozie-4.1.0+cdh5.10.0+389/oozie-examples.tar.gz). The workflow submits and creates a job, but when I check the job status it always shows 50%, and I don't know why. Can anyone help with this, please?
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="java-main-wf">
<start to="java-node"/>
<action name="java-node">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<main-class>org.apache.oozie.example.DemoJavaMain</main-class>
<arg>Hello</arg>
<arg>Oozie!</arg>
</java>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
============================================
job.properties
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=localhost:8021
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/demo/${examplesRoot}/apps/java-main
================================================
CLI run:
[cloudera@quickstart ~]$ oozie job -oozie http://quickstart.cloudera:11000/oozie -config /home/cloudera/Desktop/job.properties -run
job: 0000001-170507215637609-oozie-oozi-W
[cloudera@quickstart ~]$ oozie job -oozie http://quickstart.cloudera:11000/oozie -info 0000000-170507215637609-oozie-oozi-W
Job ID : 0000000-170507215637609-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : java-main-wf
App Path : hdfs://quickstart.cloudera:8020/user/cloudera/demo/examples/apps/java-main
Status : RUNNING
Run : 0
User : cloudera
Group : -
Created : 2017-05-08 05:23 GMT
Started : 2017-05-08 05:23 GMT
Last Modified : 2017-05-08 05:23 GMT
Ended : -
CoordAction ID: -
Actions
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000000-170507215637609-oozie-oozi-W#java-node PREP - - -
------------------------------------------------------------------------------------------------------------------------------------
0000000-170507215637609-oozie-oozi-W#:start: OK - OK -
------------------------------------------------------------------------------------------------------------------------------------
Please help. I have tried lots of the sample workflows; they all run successfully and create a job ID, but the job status always stays at 50%.

Oozie Sqoop action fails to import

I am facing an issue while executing an Oozie Sqoop action. In the logs I can see that Sqoop imports the data to a temporary directory and then generates the Hive scripts to load it, but the job fails while importing the data into Hive.
Below is the Sqoop action I am using.
<action name="import" retry-max="2" retry-interval="5">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${jobQueue}</value>
</property>
</configuration>
<arg>import</arg>
<arg>-D</arg>
<arg>sqoop.mapred.auto.progress.max=300000</arg>
<arg>-D</arg>
<arg>map.retry.exponentialBackOff=TRUE</arg>
<arg>-D</arg>
<arg>map.retry.numRetries=3</arg>
<arg>--options-file</arg>
<arg>${odsparamFileName}</arg>
<arg>--table</arg>
<arg>${odsTableName}</arg>
<arg>--where</arg>
<arg>${ods_data_pull_column} BETWEEN TO_DATE(${wf:actionData('getDates')['prevMonthBegin']},'YYYY-MM-DD hh24:mi:ss') AND TO_DATE(${wf:actionData('prevMonthEnd')['endDate']},'YYYY-MM-DD hh24:mi:ss')</arg>
<arg>--hive-import</arg>
<arg>--hive-overwrite</arg>
<arg>--hive-table</arg>
<arg>${stgTable}</arg>
<arg>--hive-drop-import-delims</arg>
<arg>--warehouse-dir</arg>
<arg>${sqoopStgDir}</arg>
<arg>--delete-target-dir</arg>
<arg>--null-string</arg>
<arg>\\N</arg>
<arg>--null-non-string</arg>
<arg>\\N</arg>
<arg>--compress</arg>
<arg>--compression-codec</arg>
<arg>gzip</arg>
<arg>--num-mappers</arg>
<arg>1</arg>
<arg>--verbose</arg>
<file>${odsSqoopConnectionParamsFileLocation}</file>
</sqoop>
<ok to="rev"/>
<error to="fail"/>
</action>
Below is the error I am getting in the MapReduce logs:
20078 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '1=1' and upper bound '1=1'
Heart beat
Heart beat
Heart beat
Heart beat
151160 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 0 bytes in 135.345 seconds (0 bytes/sec)
151164 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 0 records.
151164 [main] ERROR org.apache.sqoop.tool.ImportTool - Error during import: Import job failed!
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Please suggest a solution.
You can import the table to an HDFS path using --target-dir and set the location of your Hive table to point at that path. I fixed it using this approach; hope it helps you as well.
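A minimal sketch of that workaround (all paths, table names, and columns below are hypothetical, not taken from the job above): drop --hive-import, import to a plain HDFS directory, then declare the Hive table as external over that location.
<!-- In the Sqoop action: import to a plain HDFS path -->
<arg>--target-dir</arg>
<arg>/staging/my_table</arg>
-- In Hive: point an external table at that path; the comma delimiter
-- matches Sqoop's default text output format
CREATE EXTERNAL TABLE stg_my_table (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/staging/my_table';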

Accessing a Vertica database through Oozie Sqoop

I have written an Oozie workflow to access an HP Vertica database through Sqoop. This is on a Cloudera VM. I am getting the following error in the YARN logs after running it:
ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: dbDriver
java.lang.RuntimeException: Could not load db driver class: dbDriver
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:848)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.tool.EvalSqlTool.run(EvalSqlTool.java:64)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
This is a snippet from the jobprops file:
"dbDriver=com.vertica.jdbc.Driver
dbHost=host***.assist.***
dbName=vertica247
dbPassword=*****
dbPort=5433
dbSchema=simod_chat
dbStagingSchema=simodstg_chat
dbUser=vertica"
What should I specify for --connection-manager? When I run the same workflow outside the VM, it runs without the --connection-manager argument.
As the error states:
Could not load db driver class: dbDriver
There are likely two problems:
The JDBC URL is probably incorrect
The JDBC Jar needs to be included in the workflow
For the JDBC URL, make sure it looks like this:
jdbc:vertica://VerticaHost:portNumber/databaseName
For the JDBC jar, it needs to be included with the workflow. Check out this article for a brief example of how to do this with HBase. TL;DR: when you run Sqoop through Oozie, you have to include the driver jar in the workflow:
<workflow-app name="sqoop-import" xmlns="uri:oozie:workflow:0.4">
<start to="sqoop-import"/>
<action name="sqoop-import">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>import --connect jdbc:vertica://VerticaHost:portNumber/databaseName --username test --password test --table test</command>
<file>/user/admin/vertica-jdbc.jar#vertica-jdbc.jar</file>
</sqoop>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Note the line:
<file>/user/admin/vertica-jdbc.jar#vertica-jdbc.jar</file>
It will automatically be included in your Sqoop job.
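As for the --connection-manager question itself: Sqoop ships no Vertica-specific manager, so a common fallback (an assumption on my part, not something the answer above states) is to name the driver class explicitly, which makes Sqoop fall back to its generic JDBC manager:
<command>import --connect jdbc:vertica://VerticaHost:portNumber/databaseName --driver com.vertica.jdbc.Driver --username test --password test --table test</command>
Passing --driver is usually enough; Sqoop then uses org.apache.sqoop.manager.GenericJdbcManager, which is also the value you could pass to --connection-manager explicitly.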

Sqoop - Hive import using Oozie failed

I am trying to execute a Sqoop import from Oracle to Hive, but the job fails with this error:
WARN [main] conf.HiveConf (HiveConf.java:initialize(2472)) - HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
I have all the jar files in place, and hive-site.xml is also in place with the Hive metastore configuration:
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://sv2lxgsed01.xxxx.com:9083</value>
</property>
I am able to run a Sqoop import to HDFS (using Oozie) successfully, and I can also execute a Hive script (using Oozie) successfully. I can even run the Sqoop Hive import from the command line, but the same command fails when I execute it through Oozie.
My workflow.xml is below:
<workflow-app name="WorkflowWithSqoopAction" xmlns="uri:oozie:workflow:0.1">
<start to="sqoopAction"/>
<action name="sqoopAction">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>import --connect
jdbc:oracle:thin:#//sv2axcrmdbdi301.xxx.com:1521/DI3CRM --username xxxxxxx --password xxxxxx--table SIEBEL.S_ORG_EXT --hive-table eg.EQX_EG_CRM_S_ORG_EXT --hive-import -m1</command>
<file>/user/oozie/oozieProject/workflowSqoopAction/hive-site.xml</file>
</sqoop>
<ok to="end"/>
<error to="killJob"/>
</action>
<kill name="killJob">
<message>"Killed job due to error: ${wf:errorMessage(wf:lastErrorNode())}"</message>
</kill>
<end name="end" />
</workflow-app>
I can also find the data being loaded in HDFS.
You need to do two things:
1) Copy hive-site.xml into the Oozie workflow directory on HDFS.
2) In your action, tell Oozie to use that hive-site.xml.
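A sketch of how that looks in the Sqoop action above (the sqoop-action:0.2 schema accepts an optional <job-xml> element right after <name-node>; the command is abbreviated from the question):
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <job-xml>hive-site.xml</job-xml>
    <command>import --connect jdbc:oracle:thin:@//... --hive-import -m1</command>
</sqoop>
with hive-site.xml copied into the workflow application directory on HDFS so that the relative path resolves.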

Oozie Hive action with Kerberos on HDP-1.3.3

I'm trying to execute a Hive script from an Oozie Hive action in a Kerberos-enabled environment.
Here is my workflow.xml:
<action name="hive-to-hdfs">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<script>script.q</script>
<param>HIVE_EXPORT_TIME=${hiveExportTime}</param>
</hive>
<ok to="pass"/>
<error to="fail"/>
I'm facing an issue when trying to connect to the Hive metastore.
6870 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://10.0.0.242:9083
Heart beat
Heart beat
67016 [main] WARN hive.metastore - set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
67018 [main] INFO hive.metastore - Waiting 1 seconds before next connection attempt.
68018 [main] INFO hive.metastore - Connected to metastore.
Heart beat
Heart beat
128338 [main] WARN org.apache.hadoop.hive.metastore.RetryingMetaStoreClient - MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
129339 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://10.0.0.242:9083
Heart beat
Heart beat
189390 [main] WARN hive.metastore - set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
189391 [main] INFO hive.metastore - Waiting 1 seconds before next connection attempt.
190391 [main] INFO hive.metastore - Connected to metastore.
Heart beat
Heart beat
250449 [main] ERROR org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table SESSION_MASTER
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:953)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:887)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1083)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059)
When I disable Kerberos security, the workflow works fine.
To enable your Oozie Hive action to function on a secured cluster, you need to add a <credentials> section with a credential of type 'hcat' to your workflow.
Your workflow would then look something like this:
<workflow-app name='workflow' xmlns='uri:oozie:workflow:0.1'>
    <credentials>
        <credential name='hcat' type='hcat'>
            <property>
                <name>hcat.metastore.uri</name>
                <value>HCAT_URI</value>
            </property>
            <property>
                <name>hcat.metastore.principal</name>
                <value>HCAT_PRINCIPAL</value>
            </property>
        </credential>
    </credentials>
    <action name="hive-to-hdfs" cred="hcat">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>script.q</script>
            <param>HIVE_EXPORT_TIME=${hiveExportTime}</param>
        </hive>
        <ok to="pass"/>
        <error to="fail"/>
    </action>
</workflow-app>
There is also Oozie documentation about this feature.
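For the placeholders above, the values typically come straight from the cluster's hive-site.xml: HCAT_URI from hive.metastore.uris and HCAT_PRINCIPAL from hive.metastore.kerberos.principal. Filled in against the metastore URI seen in the log above (the principal's realm here is a placeholder):
<credential name='hcat' type='hcat'>
    <property>
        <name>hcat.metastore.uri</name>
        <value>thrift://10.0.0.242:9083</value>
    </property>
    <property>
        <name>hcat.metastore.principal</name>
        <value>hive/_HOST@EXAMPLE.COM</value>
    </property>
</credential>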
