Running an Oozie job - hadoop

I'm trying to configure Oozie to work on my hadoop-2.7.1 cluster. Everything seems to work fine, YARN, Hue, MapReduce and Spark. Jobs send by yarn jar... command finish correctly, but sending some job with oozie, either by CLI oozie job ... -run or by Hue, the job is stuck at 33% and node logs show this:
2015-11-06 06:08:56,121 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:18030
2015-11-06 06:08:57,165 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:18030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
I don't use 18030 port anywhere in my configuration, probably I should change its hostname from localhost to the network hostname. But where do I configure it? I've tried to change yarn.resourcemanager.scheduler.address, but that wasn't it.
EDIT:
I run oozie job -config examples/apps/shell/job.properties -run with job.properties containing:
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
oozie.libpath=/data/shared/hadoop-2.7.1/etc/hadoop
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/shell

The error is occurring while trying to contact the Resource Manager.
The above mentioned log line is being printed in RMProxy.java:
LOG.info("Connecting to ResourceManager at " + rmAddress);
When you are using Oozie with MRv1, in "job.properties" file, the value of jobTracker is set to the Job Tracker's address:
jobTracker={JobTracker Host}:{JobTracker Port}
But, when you migrate your Oozie job to MRv2, you need to change "job.properties", to make jobTracker value to point to Resource Manager address:
jobTracker={RM Host}:{RM Port}
Please refer to the link here: https://support.pivotal.io/hc/en-us/articles/203355837-How-to-run-a-MapReduce-jar-using-Oozie-workflow
jobTracker = Variable to define the resource manager address in case of Yarn implementation. Format: <resourcemanager_hostname>:<port>
EDIT:
I went through the Hadoop source code. The only place where port "18030" is being used is in "SLS" (Yarn Scheduler Load Simulator).
SLS has a yarn-site.xml file (present at location: \hadoop-tools\hadoop-sls\src\main\sample-conf\yarn-site.xml), which has following configuration:
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:18030</value>
</property>
From your description, it seems the yarn-site.xml that is being used, is similar to the one used by SLS.

Related

Not able to run dump in pig

I am trying to dump a relation but getting following error.
I have tried start-all.sh and tried formatting namenode using hadoop namenode -format.
But I am not getting what is wrong.
Error:-
Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Start the JobHistoryServer
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
Pig when ran in mapreduce mode expects the JobHistoryServer to be available.
To configure JobHistoryServer, add these properties to mapred-site.xml replacing hostname with actual name of the host where the process is started
<property>
<name>mapreduce.jobhistory.address</name>
<value>hostname:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hostname:19888</value>
</property>
I would first ensure I'm able to connect to namenode from hdfs client on a edge node. If not some problem/inconsistency with your namenode configs in core-site.xml file either with ports or hostname.
Once you are able to run below with out any issues and ensure namenode is not in safe mode on url http://namenode_host:50070 (which prevents any writes)
hadoop fs -ls /
Then I would proceed with pig. Looks like based on your error hdfs client is unable to reach namenode for some reason which could be firewall or config issue.

Why am I unable to connect to yarn?

I'm trying to connect to yarn by doing yarn application -list. But I cannot because it says:
<date> <time> INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
<date> <time> INFO ipc.Client: Retrying connecting to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s): retyr policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime= 1000 MILLISECONDS)
<date> <time> INFO ipc.Client: Retrying connecting to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s): retry policy is RetryUpToMaximumCount
<date> <time> INFO ipc.Client: Retrying connecting to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s): retry policy is RetryUpToMaximumCount
I have a file under /etc/hadoop/conf.empty/yarn-site.xml, which I assume is related to this in some way. I have a file at /etc/hadoop/conf.empty/ called yarn-env.sh. I tried running this file, but it didn't change anything.
Am I doing something wrong? Or maybe something is not correctly configured? How do I start yarn?
yarn-site.xml is for configuring YARN daemons ResourceManager, NodeManager and ApplicationMaster. The properties relating to these services go in here. And the environment settings for YARN can be modified with yarn-env.sh.
Start YARN services, (From the path of yarn-site.xml file posted, the installation does not appear to be done using tarballs. So the startup scripts might not be available)
On ResourceManager host
sudo service hadoop-yarn-resourcemanager start
And on each NodeManager host
sudo service hadoop-yarn-nodemanager start
Note: Make sure the preliminary configuration properties are set for both HDFS and YARN and the HDFS daemons Namenode and Datanode are started and running.
Additionally, Configure the mapreduce to use yarn in mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
You need to start the hadoop service, at least you need to start:
start-dfs.sh
start-yarn.sh
these shell script are located in the hadoop bin folder.
Depending on the installation maybe you need even to start history server.
If it is the first time you start hadoop, you need to format the namenode, otherwise the dfs service would not start.

HBase fails to start in single node cluster mode on Mac OSX

I am trying to get a personal HBase development environment set up. I have hdfs and yarn running, but cannot get HBase to start.
I have started up hadoop 2.7.1, by running start-dfs.sh and start-yarn.sh. I have verified these are running by testing hdfs dfs -mkdir /test and running a sample MR job bundled in the examples, I have browsed HDFS at port 50070.
I have started zookeeper 3.4.6 on port 2181 and set its dataDir. My zoo.cfg has:
dataDir=/Users/.../tools/hd/zookeeper_data
clientPort=2181
I observe its zookeeper_server.PID file in the dataDir I chose, and when I run jps I see the below:
51074 NodeManager
50743 DataNode
50983 ResourceManager
50856 SecondaryNameNode
57848 QuorumPeerMain
58731 Jps
50653 NameNode
QuorumPeerMain above matches the PID in zookeeper_server.PID, as I would expect. Is this expectation correct? From what I have done so far, should it be expected that any more processes should be showing here?
I installed hbase-1.1.2. I configure hbase-site.xml. I set the hbase.rootDir to be hdfs://localhost:8200/hbase, my hdfs is running at localhost:8200. I set hbase.zookeeper.property.dataDir to my zookeeper's dataDir, with the expectation that it will use this property to find the PID of a running zookeeper. Is this expectation correct or have I misunderstood? The config in hbase-site.xml is:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>Users/.../tools/hd/zookeeper_data</value>
</property>
When I run start-hbase.sh my server fails to start. I see this log message:
2015-09-26 19:32:43,617 ERROR [main] master.HMasterCommandLine: Master exiting
To investigate I ran hbase master start and get more detail:
2015-09-26 19:41:26,403 INFO [Thread-1] server.NIOServerCnxn: Stat command output
2015-09-26 19:41:26,405 INFO [Thread-1] server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:63334 (no session established for client)
2015-09-26 19:41:26,406 INFO [main] zookeeper.MiniZooKeeperCluster: Started MiniZooKeeperCluster and ran successful 'stat' on client port=2182
Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
2015-09-26 19:41:26,406 ERROR [main] master.HMasterCommandLine: Master exiting
java.io.IOException: Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:214)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2304)
So I have a few questions:
Should I be trying to set up a zookeeper before running HBase?
Why when I have started a zookeeper and told HBase where its dataDir is, does HBase try to start its own zookeeper?
Anything obviously stupid/misguided in the above?
The script you are using to start hbase start-hbase.sh will try to start the following components, in order:
zookeeper
hbase master
hbase regionserver
hbase master-backup
So, you could either stop the zookeeper which is started by you (or) you could start the daemons individually yourself:
# start hbase master
bin/hbase-daemon.sh --config ${HBASE_CONF_DIR} start master
# start region server
bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} --hosts ${HBASE_CONF_DIR}/regionservers start regionserver
HBase stand alone starts it's own zookeeper (if you run start-hbase.sh), but it if fails to start or keep running, the other need hbase daemons won't work.
Make sure you explicitly set the properties for your interface lo0 in the hbase-site.xml file:
<property>
<name>hbase.zookeeper.dns.interface</name>
<value>lo0</value>
</property>
<property>
<name>hbase.regionserver.dns.interface</name>
<value>lo0</value>
</property>
<property>
<name>hbase.master.dns.interface</name>
<value>lo0</value>
</property>
I found that when my wifi was on, if these entries were missing, zookeeper filed to start.

Pig keeps trying to connect to job history server (and fails)

I'm running a Pig job that fails to connect to the Hadoop job history server.
The task (usually any task with GROUP BY) runs for a while and then it starts with a message like:
2015-04-21 19:05:22,825 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-04-21 19:05:26,721 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-04-21 19:05:29,721 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
It then continues for a while retrying the connection. Sometimes it precedes further with the job. Othertimes it throws this exception:
2015-04-21 19:05:55,822 [main] WARN org.apache.pig.tools.pigstats.mapreduce.MRJobStats - Unable to get job counters
java.io.IOException: java.io.IOException: java.net.NoRouteToHostException: No Route to Host from cluster-01/10.10.10.11 to 0.0.0.0:10020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getCounters(HadoopShims.java:132)
at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addCounters(MRJobStats.java:284)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:235)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:360)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)
I found this question here but in my case the job history server is started. If I run netstat, I find:
tcp 0 0 0.0.0.0:10020 0.0.0.0:* LISTEN 12073/java off (0.00/0/0)
Where 12073 is ...
12073 pts/4 Sl 0:07 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Dproc_historyserver -Xmx1000m -Djava.library.path=/data/hadoop/hadoop/lib -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/data/hadoop/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop/logs -Dhadoop.log.file=mapred-hadoop-historyserver-cluster-01.log -Dhadoop.root.logger=INFO,RFA -Dmapred.jobsummary.logger=INFO,JSA -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer
I tried opening the port 10200 in case it was a firewall issue:
ACCEPT tcp -- anywhere anywhere tcp dpt:10020
... but no luck.
After a few minutes, some of the tasks just arbitrarily continue to the next part.
I'm using Hadoop 2.3 and Pig 0.14.
My question is:
1) What are the possible reasons why Pig cannot connect to the job history server (JHS) given that the JHS is running on the same port that Pig looks for it?
... or failing that ...
2) Is there any way to just tell Pig to stop trying to connect to the JHS and continue with the task?
It seems that most Hadoop installation/configuration guides neglect to mention configuring the Job History Server. It seems that Pig, in particular, relies on this server. It also seems like the default (local) settings for the JHS won't work in a multi-node cluster.
The solution was to add the hostname of the server into the configuration in mapred-site.xml to make sure it could be accesses from the other machines. (In my version of the file, the lines had to be added as "new" ... there were no previous settings.)
<property>
<name>mapreduce.jobhistory.address</name>
<value>cm:10020</value>
<description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
</property>
Then restart the job history server:
mr-jobhistory-daemon.sh stop historyserver
mr-jobhistory-daemon.sh start historyserver
If you get a bind exception (port in use), it means the stop didn't work. Either
Use ps ax | grep -e JobHistory to get the process and kill it manually with kill -9 [pid]. Then call the start command above again. Or
Use a different port in the configuration
Pig should pick up the new settings automatically. Run a Pig script and hope for the best.
start history server in hadoop bin using the below command
bin$ ./mr-jobhistory-daemon.sh start historyserver
run pig using the below command
$pig
Config mapreduce.jobhistory.address in hadoop/etc/hadoop/mapred-site.xml,
then:
mapred --daemon start
The solution was the History server was not running:
[user#vm9 sbin]$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/user/hadoop-2.7.7/logs/mapred-user-historyserver-vm9.out
[user#vm9 sbin]$ jps
5683 NameNode
6309 NodeManager
5974 SecondaryNameNode
8075 RunJar
6204 ResourceManager
8509 JobHistoryServer
5821 DataNode
8542 Jps
[user#vm9 sbin]$
Now pig can run properly and it will connect to the job history server and the dump command is working fine.

Getting OOZIE error E0900: Jobtracker [localhost:8021] not allowed, not in Oozies whitelist]

I am trying to run the Oozie examples on the CDH virtual machine. I have Cloudera Manager running and I execute the following command:
oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
when I check the status I get the HadoopAccessorException.
I checked the oozie log and I see the following stack trace:
2013-07-22 14:25:56,179 WARN org.apache.oozie.command.wf.ActionStartXCommand:
USER[cloudera] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000001-130722142323751-oozie
oozi-W] ACTION[0000001-130722142323751-oozie-oozi-W#mr-node] Error starting action
[mr-node]. ErrorType [ERROR], ErrorCode [HadoopAccessorException], Message
[HadoopAccessorException: E0900: Jobtracker not allowed, not in
Oozies whitelist] org.apache.oozie.action.ActionExecutorException:
HadoopAccessorException: E0900: Jobtracker not allowed, not in Oozies
Whitelist
The oozie-site.xml and the oozie-default.xml have the oozie.service.HadoopAccessorService.jobTracker.whitelist and oozie.service.HadoopAccessorService.nameNode.whitelist set.
Any help would be appreciated.
Thanks.
Dave
I believe Cloudera Manager doesn't read your oozie-site.xml file and rather maintains its own config somewhere.
You should be able to go in the UI into Oozie Server Role, Processes, Configuration Files/Environment and click on Show and this is where you can define the whitelists for your Oozie server, as opposed to just doing it in the files.
Once this is changed, restart Oozie and you should be able to execute your command.
source
I know i am very late on this but someone looking for answers might find this helpful. I got similar error i went into the the location on Cloudera manager UI into Oozie Server Role, Processes, Configuration Files/Environment
And clicked on oozie-site.xml link and looked at the below property
<property>
<name>oozie.service.HadoopAccessorService.nameNode.whitelist</name>
<value>server1:8020,server2:8020,**<name>**</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
<value>server1:8032,server2:8032,**yarnRM**</value>
</property>
I used yarnRM as my value on the jobtracker in the workflow.xml file and it went past the error while running the workflow.

Resources