Running Oozie using Cloudera VM issue - hadoop

I am using the Cloudera QuickStart VM in VMware to run the sample Oozie jobs that come with Cloudera.
I am following this link: http://archive.cloudera.com/cdh/3/oozie/DG_Examples.html
I untarred 'oozie-examples.tar.gz' and got the examples directory.
When I run the Oozie job, I get an error message:
[cloudera#localhost oozie-3.3.2+92]$ oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
Output:
Error: E0901 : E0901: Namenode [localhost:8020] not allowed, not in Oozies whitelist
My oozie-site.xml (/etc/oozie/conf.dist/oozie-site.xml) looks like this:
<property>
<name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
<value>
localhost:8021
</value>
<description>
Whitelisted job tracker for Oozie service.
</description>
</property>
<property>
<name>oozie.service.HadoopAccessorService.nameNode.whitelist</name>
<value>
hdfs://localhost:8020
</value>
<description>
Whitelisted NameNode for Oozie service.
</description>
</property>
job.properties looks like:
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
What am I doing wrong?
Thank you!

In the job.properties file, I replaced localhost with localhost.localdomain, and it fixed the problem.
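For reference, the changed lines in job.properties would then look roughly like this (a sketch, assuming both the nameNode and jobTracker entries are updated):
nameNode=hdfs://localhost.localdomain:8020
jobTracker=localhost.localdomain:8021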

You should have something like the following in your /etc/hosts:
127.0.0.1 localhost.localdomain localhost
Visit this link for details.
https://issues.apache.org/jira/browse/OOZIE-1516
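To double-check that the name actually resolves on the VM, a quick sanity check with standard shell tools is enough:
cat /etc/hosts
getent hosts localhost.localdomain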

Related

oozie java.io.IOException: No FileSystem for scheme: hdfs

I have set up Oozie 4.3.1 with Hadoop 2.7.3.
Oozie has been set up and is running successfully; I can see the web console at http://localhost:11000/oozie/
and also confirm it using the oozie status command.
Issue 1:
While running the Oozie examples after updating job.properties with the relevant details, I get the following error:
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/examples/apps/map-reduce/job.properties -run
Error: E0902 : E0902: Exception occured: [No FileSystem for scheme: hdfs]
Issue 2: oozie admin -sharelibupdate
[ShareLib update status]
host = http://f091403isdpbato05:11000/oozie
status = java.io.IOException: No FileSystem for scheme: hdfs
The HDFS path and other Oozie-related .xml files have also been updated with the proper configuration.
Please let me know of any solution to move ahead.
You can try adding the following to your core-site.xml:
<property>
<name>fs.file.impl</name>
<value>org.apache.hadoop.fs.LocalFileSystem</value>
<description>The FileSystem for file: uris.</description>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
<description>The FileSystem for hdfs: uris.</description>
</property>
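After updating core-site.xml, it may help to confirm that the property is actually picked up and then retry the sharelib update; a rough sanity check, assuming the standard Hadoop and Oozie CLIs are on the PATH:
hdfs getconf -confKey fs.hdfs.impl
oozie admin -oozie http://localhost:11000/oozie -sharelibupdate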

Redirecting to log server for container when view logs of a completed spark jobs run on yarn

I'm running Spark on YARN.
My Spark version is 2.1.1, and my Hadoop version is Apache Hadoop 2.7.3.
When a Spark job is running on YARN in cluster mode, I can view the executor's logs via the stdout/stderr links like
http://hadoop-slave1:8042/node/containerlogs/container_1500432603585_0148_01_000001/hadoop/stderr?start=-4096
but when the job has completed, viewing the executor's logs via the stdout/stderr links gives an error page like
Redirecting to log server for container_1500432603585_0148_01_000001
java.lang.Exception: Unknown container. Container either has not
started or has already completed or doesn't belong to this node at
all.
And then it automatically redirects to
http://hadoop-slave1:8042/node/hadoop-master:19888/jobhistory/logs/hadoop-slave1:36207/container_1500432603585_0148_01_000001/container_1500432603585_0148_01_000001/hadoop
and I get another error page like
Sorry, got error 404
Please consult RFC 2616 for meanings of the error code.
Error Details
org.apache.hadoop.yarn.webapp.WebAppException: /hadoop-master:19888/jobhistory/logs/hadoop-slave1:50284/container_1500432603585_0145_01_000002/container_1500432603585_0145_01_000002/oryx: controller for hadoop-master:19888 not found
at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
Actually I can view the executor's logs using this URL once the Spark job has completed:
http://hadoop-master:19888/jobhistory/logs/hadoop-slave1:36207/container_1500432603585_0148_01_000001/container_1500432603585_0148_01_000001/hadoop
It is slightly different from the previous URL; it drops the leading "hadoop-slave1:8042/node/".
Does anyone know a better method to view the Spark logs after the job has completed?
I have configured yarn-site.xml:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
<description>The hostname of the RM.</description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>${yarn.resourcemanager.hostname}:19888/jobhistory/logs</value>
</property>
and mapred-site.xml
<property>
<name>mapreduce.jobhistory.address</name>
<value>${yarn.resourcemanager.hostname}:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.admin.address </name>
<value>${yarn.resourcemanager.hostname}:10033</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:19888</value>
</property>
I have encountered this situation: viewing the completed Spark Streaming job logs through the YARN UI History tab, I get the error below:
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.
The solution is to configure yarn-site.xml and add the key yarn.log.server.url:
<property>
<name>yarn.log.server.url</name>
<value>http://<LOG_SERVER_HOSTNAME>:19888/jobhistory/logs</value>
</property>
Then restart the YARN cluster to reload yarn-site.xml (this step is important!).
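With log aggregation enabled as above, the container logs can also be pulled directly from the command line instead of going through the web UI; a minimal sketch using the application id from the question:
yarn logs -applicationId application_1500432603585_0148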

Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server

I am new to Oozie and was following this for my first Oozie Hive job.
As given in the tutorial, I created the following files in a directory:
hive-default.xml
hive_job1.hql
job.properties
workflow.xml
But when I run this command:
oozie job -oozie http://localhost:11000/ -config /home/ec2-user/ankit/oozie_job1/job.properties -submit
I get the following error:
Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 1. Exception = Could not authenticate, Authentication failed, status: 404, message: Not Found
I tried finding a solution for this on the internet, but nothing solved the problem (I might have missed something).
Please let me know where I am going wrong and what additional information is needed from my side to understand the problem.
The error is because of an incorrect value for the -oozie parameter. You forgot to add /oozie at the end. It should be -oozie http://localhost:11000/oozie:
oozie job -oozie http://localhost:11000/oozie -config /home/ec2-user/ankit/oozie_job1/job.properties -submit
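As a convenience, the Oozie CLI also reads the server URL from the OOZIE_URL environment variable, so the -oozie flag can be dropped; a small sketch:
export OOZIE_URL=http://localhost:11000/oozie
oozie job -config /home/ec2-user/ankit/oozie_job1/job.properties -submit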
Please try setting the following properties in core-site.xml:
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
where * allows all hosts and all groups, respectively.
Restart the Hadoop cluster after making the above changes.
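If a full cluster restart is not convenient, the proxy-user settings can usually be refreshed in place; a sketch, assuming the NameNode and ResourceManager are running:
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration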

Oozie on YARN - oozie is not allowed to impersonate hadoop

I'm trying to use Oozie from Java to start a job on a Hadoop cluster. I have very limited experience with Oozie on Hadoop 1, and now I'm struggling to do the same thing on YARN.
I'm given a machine that doesn't belong to the cluster, so when I try to start my job I get the following exception:
E0501 : E0501: Could not perform authorization operation, User: oozie is not allowed to impersonate hadoop
Why is that and what to do?
I read a bit about the core-site.xml properties that need to be set:
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>users</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>master</value>
</property>
Does it seem that this is the problem? Should I contact the people responsible for the cluster to fix it?
Could there be problems because I'm using the same code for YARN as I did for Hadoop 1? Should something be changed? For example, I'm setting nameNode and jobTracker in workflow.xml; should jobTracker still exist, now that there is a ResourceManager? I have set the address of the ResourceManager but left the property name as jobTracker; could that be the error?
Maybe I should also mention that Ambari is used...
Please update core-site.xml as follows:
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
The jobTracker address is now the ResourceManager address, so that will not be the problem. Once you update the core-site.xml file, it will work.
Reason:
The cause of this type of error is that you are running the Oozie server as the hadoop user, but you have defined oozie as the proxy user in core-site.xml.
Solution:
Change the ownership of the Oozie installation directory to the oozie user and run the Oozie server as the oozie user, and the problem will be solved.
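To check which user the Oozie server is currently running as, a quick look with standard tools is enough:
ps -ef | grep -i oozie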

Error running mapreduce sample in hadoop 0.23.6

I deployed Hadoop 0.23.6 on Ubuntu 12.04 LTS. I am able to copy files across and do file manipulation. I am using YARN for MapReduce.
I am getting the following error when I try to run any MapReduce application using hadoop-mapreduce-examples-0.23.6.jar.
Command used:
bin/hadoop jar hadoop-mapreduce-examples-0.23.6.jar randomwriter -Dmapreduce.randomwriter.mapsperhost=1 -Dmapreduce.job.user.name=$USER -Dmapreduce.randomwriter.bytespermap=10000 -Ddfs.blocksize=536870912 -Ddfs.block.size=536870912 -libjars hadoop-mapreduce-client-app-0.23.6.jar output
Hadoop version: 0.23.6
Container launch failed for container_1364342550899_0001_01_000002 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1364342550899_0001_m_000000_0
Verify your yarn-site.xml configuration. You need to have the properties below configured:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
For more details, have a look at this JIRA:
https://issues.apache.org/jira/browse/MAPREDUCE-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
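Note that after changing yarn-site.xml, the NodeManagers need to be restarted for the new aux-service to be loaded; a rough sketch (the script may live under bin/ or sbin/ depending on the release):
sbin/yarn-daemon.sh stop nodemanager
sbin/yarn-daemon.sh start nodemanager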
