How to maintain failure recovery during sqoop import Job - hadoop

We are planning to create an Oozie job that runs a Sqoop command to import data from SQL Server to HDFS on an hourly basis. We are facing two challenges: how to get an alert if the job fails partway through, and how Sqoop can tell which data was imported successfully and which is still pending. Is there a way to maintain transactions and a retry mechanism during a Sqoop import, and to be alerted on failure?

You can configure the Oozie workflow to send an email on failure.
To do this, point the <error> transition of any action to an email action.
An example email action might look like the following.
<action name="send-email">
    <email xmlns="uri:oozie:email-action:0.1">
        <to>${emailToAddress}</to>
        <subject>Failed to import table.</subject>
        <body>The following import has failed.
failed the workflow that was trying to perform job --exec import-${tableName}-${environment}-${format}-${db} --verbose
ID= ${wf:id()}
NAME= ${wf:name()}
APP PATH= ${wf:appPath()}
USER= ${wf:user()}
GROUP= ${wf:group()}
NAMENODE= ${nameNode}
JOBTRACKER = ${jobTracker}
QUEUE = ${queueName}
START DATE = ${start}
error message[${wf:errorMessage(wf:lastErrorNode())}]</body>
    </email>
    <ok to="fail-job"/>
    <error to="fail-email"/>
</action>
Note that the <to> element accepts multiple comma-separated email addresses.
For the email to be sent, you also need to configure the Oozie email client in the custom oozie-site configuration. The parameters that you might need to set are the following:
Custom oozie-site
oozie.email.smtp.password
oozie.email.from.address
oozie.email.smtp.auth
oozie.email.smtp.host
oozie.email.smtp.port
oozie.email.smtp.username
oozie.service.ProxyUserService.proxyuser.falcon.groups
oozie.service.ProxyUserService.proxyuser.falcon.hosts
As for retries: from Oozie 3.1 onward you can configure the maximum number of retries and the retry interval (in minutes) on every action. To do this, set the following attributes on the action tag:
<action name="a" retry-max="2" retry-interval="1">
....
</action>
More information is available in Oozie's documentation.
The generic defaults for the retry count and retry interval are listed in oozie-default.xml, where you can look them up; to change them, override the corresponding properties in oozie-site.xml.
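Putting the two pieces together, a failing import action can be retried first and only then routed to the email action. The following is a minimal sketch, not a tested workflow: the node names "sqoop-import" and "end" are hypothetical, the saved-job command is copied from the email body above, and "send-email" and "fail-job" refer to the nodes named in the email example. Whether a particular failure is retried at all depends on the error codes your Oozie installation is configured to retry.

```xml
<!-- Sketch only: "sqoop-import" and "end" are illustrative node names. -->
<action name="sqoop-import" retry-max="2" retry-interval="1">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Runs a saved Sqoop job; the job name pattern is taken from the email body above -->
        <command>job --exec import-${tableName}-${environment}-${format}-${db} --verbose</command>
    </sqoop>
    <ok to="end"/>
    <!-- Once the action finally fails, hand over to the email action -->
    <error to="send-email"/>
</action>

<!-- Target of the email action's <ok to="fail-job"/>: ends the workflow as KILLED -->
<kill name="fail-job">
    <message>Import failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
```

Routing the email action's own transitions to a kill node keeps the workflow status as failed, so the hourly run is not reported as successful just because the notification was sent.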

Related

Oozie Hive Action Using -i init script

How can I run an Oozie Hive or Hive2 Action with init scripts?
In the CLI this can usually be done via the -i init.hive argument; however, when using this in an Oozie Action via <argument>-i init.hive</argument>, the workflow stops with an error.
I linked the init.hive file with the <file>init.hive#init.hive</file> property and it is available in the local appcache directory.
$ ll appcache/application_1480609892100_0274/container_e55_1480609892100_0274_01_000001/ | grep init
> lrwxrwxrwx 1 root root 42 Jan 12 12:24 init.hive -> /hadoop/yarn/local/filecache/519/init.hive
The error (in the local appcache) is the following
Connecting to jdbc:hive2://localhost:10000/
Connected to: Apache Hive (version 1.2.1000.2.4.0.0-169)
Driver: Hive JDBC (version 1.2.1000.2.4.0.0-169)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Running init script init.hive
init.hive (No such file or directory)
The hive2 action looks like this (the complete workflow can be found on Github https://github.com/chaosmail/oozie-bugs/tree/master/simple-hive-init/simple-hive-init-wf)
<action name="test-action">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
        <jdbc-url>${jdbcURL}</jdbc-url>
        <script>query.hive</script>
        <argument>-i init.hive</argument>
        <file>init.hive#init.hive</file>
    </hive2>
    <ok to="end"/>
    <error to="fail"/>
</action>
Edit 1: added workflow action
[Recap of the comments thread above, plus some extra stuff in retrospect]
The Oozie documentation states that you may have multiple <argument> elements in your Action, which hints that the arguments must be provided separately.
In retrospect, it makes sense -- on a command line, it's the shell that would parse the list of arguments into an args[] array for the Java executable, but Oozie is not a shell interpreter...
And experience shows that Beeline accepts two syntax variants for its command-line args...
-xValue (one arg) means option -x with associated Value
-x followed by Value (two args) means the same thing
So you have two correct ways to pass command-line arguments to Beeline via Oozie:
<argument>-xValue</argument>
<argument>-x</argument> <argument>Value</argument>
On the other hand, <argument>-x Value</argument> would fail, because in single-arg syntax, Beeline considers that the separator space should be part of the value...!
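Applied to the hive2 action from the question, the two-argument variant might look like the sketch below. Everything except the split <argument> elements is copied from the question; it has not been run against the cluster described there.

```xml
<action name="test-action">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
        <jdbc-url>${jdbcURL}</jdbc-url>
        <script>query.hive</script>
        <!-- Option and value passed as two separate arguments, so no space ends up in the file name -->
        <argument>-i</argument>
        <argument>init.hive</argument>
        <file>init.hive#init.hive</file>
    </hive2>
    <ok to="end"/>
    <error to="fail"/>
</action>
```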

How to retrieve previous action name from the current action in apache oozie?

Is there an EL function to get the previous action's name from the current action in an Oozie workflow?
If this is not possible with an EL function, how else can it be done?
If you need this to retrieve the error message, use wf:lastErrorNode():
<kill name="kill">
    <message>
        Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
    </message>
</kill>
You can also use the <capture-output/> element to pass node names along the workflow, but this is limited to Java and shell actions.
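One possible reading of the capture-output suggestion is sketched below: a shell action prints its own name as a key=value pair, and a later node reads it back with wf:actionData(). The node names "record-step", "next-step" and "fail", and the key "previousAction", are made up for this illustration.

```xml
<!-- Sketch only: node names and the "previousAction" key are hypothetical. -->
<action name="record-step">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Captured output must be in key=value (Java properties) form -->
        <exec>echo</exec>
        <argument>previousAction=record-step</argument>
        <capture-output/>
    </shell>
    <ok to="next-step"/>
    <error to="fail"/>
</action>

<!-- A later node can then read the value with:
     ${wf:actionData('record-step')['previousAction']} -->
```

Note that the reading node still has to name "record-step" explicitly, so this passes data between nodes rather than discovering the previous node automatically.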

Informatica error 1417 :: Task not yet registered with this service process

I am getting following error while running a workflow in informatica.
Session task instance [worklet.session] : [TM_6775 The master DTM process was unable to connect to the master service process to update the session status with the following message: error message [ERROR: The session run for [Session task instance [worklet.session]] and [ folder id = 206, workflow id = 16042, workflow run id = 65095209, worklet run id = 65095337, task instance id = 13272 ] is not yet registered with this service process.] and error code [1417].]
This error occurs randomly for many other sessions when they are run as part of the whole workflow. However, if I use "start task" on the failed task afterwards, it runs successfully.
Any help is much appreciated.
Just an idea to try if you use versioning: check that everything is checked in correctly. If the mapping, workflow, or worklet is checked out, then you and Informatica will run different versions, which may cause the behaviour to differ when you start it manually.
Informatica will always use the checked-in version, and you will always use the checked-out version.

Oozie coordinator issue

I have oozie installation as part of the cloudera installation.
I'm trying to execute the coordinator workflow from the example with the following configuration in the coordinator.xml.
<coordinator-app name="cron-coord" frequency="${coord:minutes(60)}" start="${start}" end="${end}" timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
With this configuration I expected the workflow to be executed every hour, but it seems that the workflow has been executed every 5 minutes. Does anyone have an answer for this issue?
Are you setting the start time prior to the current time? If so, Oozie will work in the catch up mode until all delayed actions have been scheduled. The "frequency" setting does not apply to the catch-up mode.
You can also give the coordinator frequency in hours instead of minutes:
<coordinator-app name="cron-coord" frequency="${coord:hours(1)}" start="${start}" end="${end}" timezone="UTC" xmlns="uri:oozie:coordinator:0.2">

Using Oozie workflow and coordinator - E0302: Invalid parameter error

I'm trying to run a workflow using a coordinator, but when I try to set the workflow and coordinator XML file paths together, I get an error.
This is what my job.properties file looks like:
nameNode=hdfs://10.74.6.155:9000
jobTracker=10.74.6.155:9010
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/examples/apps/test/
oozie.coord.application.path=${nameNode}/user/${user.name}/examples/apps/test/
When I run my workflow with the command line:
bin\oozie job -oozie http://localhost:11000/oozie -config examples\apps\test\job.properties -run
I get the following error:
Error: E0302 : E0302: Invalid parameter [{0}]
What am I doing wrong?
Thanks!
The workflow and coordinator application paths cannot both be set in job.properties at the same time. You can run a job either as a workflow or as a coordinator.
Use only the coordinator path in your properties file, and reference the workflow path from the coordinator.xml file.
oozie.use.system.libpath=true
workflowpath=${nameNode}/user/${user.name}/examples/apps/test/
oozie.coord.application.path=${nameNode}/user/${user.name}/examples/apps/test/
In your coordinator.xml file, add this line:
<app-path>${workflowpath}</app-path>
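For context, the app-path element sits inside the coordinator's action/workflow block. A minimal coordinator.xml sketch could look like this; the name "cron-coord" and the hourly frequency are borrowed from the coordinator example earlier on this page, and ${start} and ${end} are assumed to be defined in job.properties alongside workflowpath.

```xml
<!-- Sketch only: name, frequency, start and end are assumptions, not taken from the asker's files. -->
<coordinator-app name="cron-coord" frequency="${coord:hours(1)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <!-- Directory in HDFS that contains workflow.xml -->
            <app-path>${workflowpath}</app-path>
        </workflow>
    </action>
</coordinator-app>
```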
