It's the first time I am running a MapReduce program from Oozie.
Here is my job.properties file
nameNode=file:/usr/local/hadoop_store/hdfs/namenode
jobTracker=localhost:8088
queueName=default
oozie.wf.applications.path=${nameNode}/Config
Here is my hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>
Here is my core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
</configuration>
But when I run the Oozie command to run my MapReduce program, it gives an error that the lib folder is not found: Error: E0405 : E0405: Submission request doesn't have any application or lib path
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
I've created a Config folder in HDFS and created a lib folder inside it. In the lib folder I placed my MapReduce jar file, and in the Config folder I placed my workflow.xml file. (It's all in HDFS.)
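Roughly how I created that layout, for reference (the jar file name here is a placeholder for my actual MapReduce jar):
hadoop fs -mkdir -p /Config/lib
hadoop fs -put workflow.xml /Config/
hadoop fs -put my-mapreduce-job.jar /Config/lib/
hadoop fs -ls -R /Config   # check that workflow.xml and lib/ are where Oozie expects them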
I think I've given the wrong HDFS path (nameNode) in the job.properties file, and that's why it's not able to find ${nameNode}/Config. What should the HDFS path be?
Thanks
Update - 1: job.properties
nameNode=hdfs://localhost:8020
jobTracker=localhost:8088
queueName=default
oozie.wf.applications.path=${nameNode}/Config
Still getting the same error:
Error: E0405 : E0405: Submission request doesn't have any application or lib path
Update - 2: workflow.xml in the Config folder in HDFS.
<workflow-app xmlns="uri:oozie:workflow:0.4" name="simple-Workflow">
<start to="RunMapreduceJob" />
<action name="RunMapreduceJob">
<map-reduce>
<job-tracker>localhost:8088</job-tracker>
<name-node>file:/usr/local/hadoop_store/hdfs/namenode</name-node>
<prepare>
<delete path="file:/usr/local/hadoop_store/hdfs/namenode"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
<property>
<name>mapred.mapper.class</name>
<value>DataDividerByUser.DataDividerMapper</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>DataDividerByUser.DataDividerReducer</value>
</property>
<property>
<name>mapred.output.key.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
<property>
<name>mapred.output.value.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>/data</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/dataoutput</value>
</property>
</configuration>
</map-reduce>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Mapreduce program Failed</message>
</kill>
<end name="end" />
</workflow-app>
The <name-node> value should not be a file: path. It should point to the NameNode of the underlying Hadoop cluster where Oozie has to run the MapReduce job. Your nameNode should be the value of fs.default.name from your core-site.xml:
nameNode=hdfs://localhost:9000
Also, change the property name oozie.wf.applications.path to oozie.wf.application.path (without the s).
Add the property oozie.use.system.libpath=true to your properties file.
Source: Apache Oozie by Mohammad Kamrul Islam & Aravind Srinivasan
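Putting those fixes together, job.properties would look roughly like this (a sketch; it assumes the application still lives under /Config in HDFS and keeps the jobTracker value from the question):
nameNode=hdfs://localhost:9000
# jobTracker is kept as posted; on YARN it normally points at the ResourceManager RPC address (default port 8032), while 8088 is the RM web UI port
jobTracker=localhost:8088
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/Config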
Related
I am trying to set up Hadoop on my local machine. However, running the WordCount MapReduce example fails (I did hdfs namenode -format earlier).
This is maybe hard to read, but I end up with:
"Job failed with state FAILED due to: Application failed 2 times due to AM Container exited with exitCode: -1000 Failing this attempt. Diagnostics: No space available in any of the local directories."
I don't understand why I get this error. This is what my applications & attempts look like:
I followed several tutorials, ending up with these parameters:
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.disk-health-checker.enable</name>
<value>false</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>
%HADOOP_HOME%\etc\hadoop,
%HADOOP_HOME%\share\hadoop\common\*,
%HADOOP_HOME%\share\hadoop\common\lib\*,
%HADOOP_HOME%\share\hadoop\hdfs\*,
%HADOOP_HOME%\share\hadoop\hdfs\lib\*,
%HADOOP_HOME%\share\hadoop\mapreduce\*,
%HADOOP_HOME%\share\hadoop\mapreduce\lib\*,
%HADOOP_HOME%\share\hadoop\yarn\*,
%HADOOP_HOME%\share\hadoop\yarn\lib\*
</value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoop-3.3.0/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///C:/hadoop-3.3.0/data/datanode</value>
</property>
</configuration>
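One thing I notice, since the diagnostic complains about YARN's local directories: I never set yarn.nodemanager.local-dirs, so it presumably falls back to a directory under hadoop.tmp.dir (which I also left at its default). A sketch of pinning it to a drive with free space in yarn-site.xml (the path is just an example based on my layout):
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>C:/hadoop-3.3.0/data/nm-local-dir</value>
</property>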
Can you help me with this, please? I already tried what is mentioned in these questions:
Hadoop errorcode -1000, No space available in any of the local directories, except for the namenode emptying cache part.
Hadoop Windows setup. Error while running WordCountJob: "No space available in any of the local directories"
What do you think?
Thank you!
I have a property, HDFS_file_path, that needs to be passed from workflow-1 to common_subworkflow.
I also have workflow-2, which doesn't define that property or HDFS_file_path, but workflow-2 also calls common_subworkflow.
In common_subworkflow I am fetching the property value with ${HDFS_file_path}.
It works fine when workflow-1 calls common_subworkflow but fails when workflow-2 calls common_subworkflow since HDFS_file_path doesn't exist in workflow-2.
Is there any way to
read the dynamic property if present, or
set some default value (null or empty) if the variable is not present?
<workflow-app name='hello-wf' xmlns="uri:oozie:workflow:0.4">
<parameters>
<property>
<name>inputDir</name>
</property>
<property>
<name>outputDir</name>
<value>out-dir</value>
</property>
</parameters>
...
<action name='firstjob'>
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.mapper.class</name>
<value>com.foo.FirstMapper</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>com.foo.FirstReducer</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>${inputDir}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>${outputDir}</value>
</property>
</configuration>
</map-reduce>
<ok to='secondjob'/>
<error to='killcleanup'/>
</action>
...
</workflow-app>
In the above example, if inputDir is not specified, Oozie will print an error message instead of submitting the job. If outputDir is not specified, Oozie will use the default value, out-dir.
Taken from https://oozie.apache.org/docs/3.3.1/WorkflowFunctionalSpec.html#a4.1_Workflow_Job_Properties_or_Parameters
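For the sub-workflow case specifically, here is a rough sketch of how workflow-2 could invoke common_subworkflow once the <parameters> default is in place (the app-path below is a placeholder; <propagate-configuration/> simply forwards workflow-2's own configuration, and the parameter default should kick in for anything workflow-2 does not define):
<action name="call-common">
<sub-workflow>
<app-path>${nameNode}/path/to/common_subworkflow</app-path>
<propagate-configuration/>
</sub-workflow>
<ok to="end"/>
<error to="fail"/>
</action>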
I will paste all my configuration below. I have a cluster of 3 computers. Configuration of namenode 1 (impc2361)
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>viewfs://ClusterA</value>
</property>
<property>
<name>fs.viewfs.mounttable.ClusterA.link./home</name>
<value>hdfs://impc2361:8021/home</value>
</property>
<property>
<name>fs.viewfs.mounttable.ClusterA.link./home1</name>
<value>hdfs://impc2359:8020/home1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/Downloads/hadoop2/tmpfold</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>home,home1</value>
</property>
<property>
<name>dfs.namenode.rpc-address.home</name>
<value>impc2361:8021</value>
</property>
<property>
<name>dfs.namenode.http-address.home</name>
<value>impc2361:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.home1</name>
<value>impc2359:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.home1</name>
<value>impc2359:50070</value>
</property>
</configuration>
I have copied the same configuration to the other nodes as well, that is, namenode2 (impc2359) and the datanode (impc2391).
Problems
I don't get the web page of namenode1 (impc2361) when I type impc2361.htcitmr:50070 in the web URL.
It throws an error:
HTTP ERROR 404
Problem accessing /dfshealth.jsp.
Reason: NOT_FOUND
I get a web page for namenode2 (impc2359) when I type impc2359.htcitmr:50070, but I don't find the folder /home1 which was set in core-site.xml.
I am not able to do any operations on the cluster through my terminal, as it throws an error that it is read-only:
hadoop fs -mkdir /a
mkdir: InternalDir of ViewFileSystem is readonly; operation=mkdirsPath=/a
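If I understand ViewFS correctly, the root of a ViewFileSystem is an internal read-only directory and only paths under the configured mount points are writable, so presumably directories have to be created under a mount point instead (assuming the target directory exists on the underlying NameNode), e.g.:
hadoop fs -mkdir /home/a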
Please kindly help
Recently, I have been investigating Oozie and trying very hard to build a local Oozie system. After reading the official web page again and again, I finally made it. But when I tried to run an example from the examples/ directory, I always got an error:
2016-07-21 21:55:17,936 WARN ActionStartXCommand:523 - SERVER[namenode] USER[ubuntu] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-160720220441466-oozie-ubun-W] ACTION[0000001-160720220441466-oozie-ubun-W#mr-node] Error starting action [mr-node]. ErrorType [ERROR], ErrorCode [IllegalArgumentException], Message [IllegalArgumentException: Wrong FS: hdfs://166.111.81.254:9000/user/ubuntu/share/lib, expected: file:///]
org.apache.oozie.action.ActionExecutorException: IllegalArgumentException: Wrong FS: hdfs://166.111.81.254:9000/user/ubuntu/share/lib, expected: file:///
at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:445)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1132)
at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1286)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:250)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:64)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:321)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:250)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://166.111.81.254:9000/user/ubuntu/share/lib, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:372)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:570)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
at org.apache.oozie.service.ShareLibService.getLatestLibPath(ShareLibService.java:687)
at org.apache.oozie.service.ShareLibService.updateShareLib(ShareLibService.java:551)
at org.apache.oozie.service.ShareLibService.getShareLibJars(ShareLibService.java:346)
at org.apache.oozie.service.ShareLibService.getSystemLibJars(ShareLibService.java:412)
at org.apache.oozie.action.hadoop.JavaActionExecutor.addSystemShareLibForAction(JavaActionExecutor.java:721)
at org.apache.oozie.action.hadoop.JavaActionExecutor.addAllShareLibs(JavaActionExecutor.java:818)
at org.apache.oozie.action.hadoop.JavaActionExecutor.setLibFilesArchives(JavaActionExecutor.java:809)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1037)
... 10 more
2016-07-21 21:55:17,939 WARN ActionStartXCommand:523 - SERVER[namenode] USER[ubuntu] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-160720220441466-oozie-ubun-W] ACTION[0000001-160720220441466-oozie-ubun-W#mr-node] Setting Action Status to [DONE]
The error has bothered me for a few days and I've been trying to solve it. If you have any suggestions, please let me know. Thanks very much!
My configuration is as follows:
hadoop
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://166.111.81.254:9000</value>
</property>
<property>
<name>hadoop.proxyuser.ubuntu.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.ubuntu.groups</name>
<value>*</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>166.111.81.254</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>166.111.81.254:54311</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop-2.6.4/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
oozie
oozie-site.xml
<configuration>
<property>
<name>oozie.service.HadoopAccessorService.root.configurations</name>
<value>*=/usr/local/hadoop-2.6.4/etc/hadoop/</value>
<description>
Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of
the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is
used when there is no exact match for an authority. The HADOOP_CONF_DIR contains
the relevant Hadoop *-site.xml files. If the path is relative is looked within
the Oozie configuration directory; though the path can be absolute (i.e. to point
to Hadoop client conf/ directories in the local filesystem.
</description>
</property>
<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>hdfs://166.111.81.254:9000/user/${user.name}/share/lib</value>
<description>
System library path to use for workflow applications.
This path is added to workflow application if their job properties sets
the property 'oozie.use.system.libpath' to true.
</description>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.ubuntu.hosts</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.ubuntu.groups</name>
<value>*</value>
</property>
</configuration>
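For reference, this is roughly how I created the sharelib and how I check that Oozie sees it (the HDFS URI matches fs.defaultFS above; the Oozie URL is the default one, adjust if yours differs):
bin/oozie-setup.sh sharelib create -fs hdfs://166.111.81.254:9000
bin/oozie admin -oozie http://localhost:11000/oozie -shareliblist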
I am trying to run a Hive action using an Oozie workflow. Below is the Hive script it runs:
create table abc (a INT);
I can locate the internal table in HDFS (the directory abc gets created under /user/hive/warehouse), but when I run SHOW TABLES from the hive> prompt, I am not able to see the table.
This is the workflow.xml file:
<workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
<start to="hiveac"/>
<action name="hiveac">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<!-- <prepare> <delete path="${nameNode}/user/${wf:user()}/case1/out"/> </prepare> -->
<!-- <job-xml>hive-default.xml</job-xml>-->
<configuration>
<property>
<name>oozie.hive.defaults</name>
<value>hive-default.xml</value>
</property>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<script>script.q</script>
<!-- <param>INPUT=/user/${wf:user()}/case1/sales_history_temp4</param>
<param>OUTPUT=/user/${wf:user()}/case1/out</param> -->
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Pig Script failed!!!</message>
</kill>
<end name="end"/>
</workflow-app>
This is the hive-default.xml file:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>hive.stats.autogather</name>
<value>false</value>
</property>
</configuration>
This is the job.properties file:
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
oozie.libpath=/user/oozie/shared/lib
#oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/my/jobhive
The logs did not give any errors as such:
stderr logs
Logging initialized using configuration in jar:file:/var/lib/hadoop-hdfs/cache/mapred/mapred/local/taskTracker/distcache/3179985539753819871_-620577179_884768063/localhost/user/oozie/shared/lib/hive-common-0.9.0-cdh4.1.1.jar!/hive-log4j.properties
Hive history file=/tmp/mapred/hive_job_log_mapred_201603060735_17840386.txt
OK
Time taken: 9.322 seconds
Log file: /var/lib/hadoop-hdfs/cache/mapred/mapred/local/taskTracker/training/jobcache/job_201603060455_0012/attempt_201603060455_0012_m_000000_0/work/hive-oozie-job_201603060455_0012.log not present. Therefore no Hadoop jobids found
I came across a similar thread: Tables created by oozie hive action cannot be found from hive client but can find them in HDFS
But this did not resolve my issue. Please let me know how to resolve it.
I haven't used Oozie for a couple of months (and did not keep archives, for legal reasons), and anyway it was V4.x, so it's a bit of guesswork...
upload your valid hive-site.xml to HDFS somewhere
tell Oozie to inject all these properties in the Launcher Configuration before running the Hive class, so that it inherits them all, with
<job-xml>/some/hdfs/path/hive-site.xml</job-xml>
remove any reference to oozie.hive.defaults
Warning: all that assumes that your sandbox cluster has a persistent Metastore -- i.e. your hive-site.xml does not point to a Derby embedded DB that gets erased every time!
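Putting those steps together, the Hive action would look roughly like this (a sketch; /some/hdfs/path/hive-site.xml stands for wherever you uploaded the file, and the rest reuses the names from the question):
<action name="hiveac">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>/some/hdfs/path/hive-site.xml</job-xml>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<!-- no oozie.hive.defaults property any more -->
</configuration>
<script>script.q</script>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>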