Hadoop MapReduce tasktrackers keep ignoring HADOOP_CLASSPATH; ZooKeeper trying to connect to localhost rather than the cluster address

I have a Hadoop cluster (Cloudera CDH4.2) with 5 datanodes. I'm trying to run a MapReduce job which creates an HBaseConfiguration object. The task attempts fail because they try to connect to localhost:2181 rather than the address of the actual ZooKeeper installation.
I'm aware that this is because the tasktrackers are not being supplied with a classpath containing the HBase configuration. However, if I run the job like so:
HADOOP_CLASSPATH=`/usr/bin/hbase classpath` hadoop jar myjar.jar
The documentation indicates this should solve the problem. The first entry in `hbase classpath` is /usr/lib/hbase/conf, which is a symlink to /etc/hbase/conf, so in theory this should add the HBase configuration to the HADOOP_CLASSPATH variable.
However, the logs from the tasktracker show this:
2013-08-14 12:47:24,308 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.class.path=<output of `hadoop classpath`>
....
2013-08-14 12:47:24,309 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
.....
2013-08-14 12:47:24,328 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
So, for some reason, the tasktrackers are completely ignoring my effort to set HADOOP_CLASSPATH to the output of `hbase classpath`. The documentation (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath) states this should just work. What's wrong?
I'm aware I could work around this by explicitly specifying the ZooKeeper quorum address in the jar's code, but I need this jar to be portable and to pick up the local configuration without recompiling, so I don't see hard-coding the address as a viable option.

If you are setting the configuration in Java code:
conf.set("hbase.zookeeper.quorum", "server1,server2,server3");
conf.set("hbase.zookeeper.property.clientPort", "2181");
If you are running from the command line, add -Dhbase.zookeeper.quorum, for example:
sudo hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hbase/hbase.jar rowcounter -Dhbase.zookeeper.quorum=server1,server2,server3 hly_temp
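For a -D override to take effect in your own jar, the driver has to go through ToolRunner/GenericOptionsParser (the bundled rowcounter does, which is presumably why the flag works there). Below is a minimal sketch, not your exact job, of a portable driver along those lines; the class names MyHBaseDriver/MyMapper, the table name my_table, and the job name are hypothetical. It builds the job configuration with HBaseConfiguration.create(), so whatever hbase-site.xml is on the client classpath travels with the job to the task JVMs:
// Sketch only: a Tool-based HBase MapReduce driver. Class, mapper, and table names are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyHBaseDriver extends Configured implements Tool {

    // Hypothetical mapper that just counts the rows it sees.
    public static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context) {
            context.getCounter("myjob", "rows").increment(1);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries any -D overrides parsed by ToolRunner;
        // HBaseConfiguration.create(...) layers in any hbase-site.xml found on
        // the client classpath, so the quorum ends up in the job configuration.
        Configuration conf = HBaseConfiguration.create(getConf());
        Job job = Job.getInstance(conf, "my-hbase-job");
        job.setJarByClass(MyHBaseDriver.class);
        // Wires up TableInputFormat against the job configuration; the task
        // JVMs then read the quorum from the job, not from their own classpath.
        TableMapReduceUtil.initTableMapperJob("my_table", new Scan(),
                MyMapper.class, ImmutableBytesWritable.class, Result.class, job);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(HBaseConfiguration.create(), new MyHBaseDriver(), args));
    }
}
With a driver like this, HADOOP_CLASSPATH=`hbase classpath` hadoop jar myjar.jar MyHBaseDriver picks up /etc/hbase/conf on the client side, and -Dhbase.zookeeper.quorum=server1,server2,server3 still works as an override without recompiling.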

Related

HBase Master turning off after start. Setup for HBase on Hadoop for a single-cluster DB on my local machine

I have installed Hadoop (2.9.1) and HBase (2.1) on my Linux machine with the appropriate configurations.
1) I start all Hadoop components. Using jps, I am able to see all the components that are running. This step is working fine.
2) When I start HBase, all the HBase components start up as well. Using the jps command, I am able to see that the required components are running. However, within 10 seconds, HMaster turns off.
These are the contents of the log file for the HBase master:
The errors outlined below are pretty much the same for both the master and regionserver log files.
2018-08-17 17:13:14,255 WARN [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
I understand that there is some port connection problem, but I don't quite know what to change, or where.
Thank you in advance for your guidance.

Running an Oozie job

I'm trying to configure Oozie to work on my hadoop-2.7.1 cluster. Everything seems to work fine: YARN, Hue, MapReduce and Spark. Jobs sent with the yarn jar... command finish correctly, but when I send a job with Oozie, either via the CLI (oozie job ... -run) or via Hue, the job gets stuck at 33% and the node logs show this:
2015-11-06 06:08:56,121 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:18030
2015-11-06 06:08:57,165 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:18030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
I don't use port 18030 anywhere in my configuration; probably I should change its hostname from localhost to the network hostname, but where do I configure it? I've tried changing yarn.resourcemanager.scheduler.address, but that wasn't it.
EDIT:
I run oozie job -config examples/apps/shell/job.properties -run with job.properties containing:
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
examplesRoot=examples
oozie.libpath=/data/shared/hadoop-2.7.1/etc/hadoop
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/shell
The error occurs while trying to contact the ResourceManager.
The above-mentioned log line is printed in RMProxy.java:
LOG.info("Connecting to ResourceManager at " + rmAddress);
When you use Oozie with MRv1, the jobTracker value in the "job.properties" file is set to the JobTracker's address:
jobTracker={JobTracker Host}:{JobTracker Port}
But when you migrate your Oozie job to MRv2 (YARN), you need to change "job.properties" so that the jobTracker value points to the ResourceManager address:
jobTracker={RM Host}:{RM Port}
Please refer to the link here: https://support.pivotal.io/hc/en-us/articles/203355837-How-to-run-a-MapReduce-jar-using-Oozie-workflow
jobTracker = Variable to define the resource manager address in case of Yarn implementation. Format: <resourcemanager_hostname>:<port>
EDIT:
I went through the Hadoop source code. The only place where port "18030" is used is in "SLS" (the YARN Scheduler Load Simulator).
SLS has a yarn-site.xml file (at hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml), which has the following configuration:
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:18030</value>
</property>
From your description, it seems the yarn-site.xml that is being used is similar to the one used by SLS. If so, fix yarn.resourcemanager.scheduler.address in the yarn-site.xml that your jobs actually load so that it points at the real ResourceManager host.
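If you want to confirm which yarn-site.xml is actually being loaded and what scheduler address it resolves to, a small standalone check like the hypothetical snippet below (run with the same HADOOP_CONF_DIR/classpath your jobs use) prints both. The stock default for this property ends in :8030, so a value of localhost:18030 has to come from a config file such as the SLS sample one:
// Hypothetical diagnostic: print where yarn-site.xml was loaded from and the
// scheduler address RMProxy would log as "Connecting to ResourceManager at ...".
import java.net.URL;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PrintSchedulerAddress {
    public static void main(String[] args) {
        // Loads yarn-default.xml and whatever yarn-site.xml is on the classpath.
        YarnConfiguration conf = new YarnConfiguration();
        URL source = conf.getResource("yarn-site.xml");
        System.out.println("yarn-site.xml loaded from: " + source);
        System.out.println(YarnConfiguration.RM_SCHEDULER_ADDRESS + " = "
                + conf.get(YarnConfiguration.RM_SCHEDULER_ADDRESS,
                           YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS));
    }
}
If that prints localhost:18030, change the value in that file to your real ResourceManager host (something like master:8030, matching the hostname already used in your job.properties) and restart the affected services.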

HBase fails to start in single node cluster mode on Mac OSX

I am trying to get a personal HBase development environment set up. I have hdfs and yarn running, but cannot get HBase to start.
I have started up hadoop 2.7.1 by running start-dfs.sh and start-yarn.sh. I have verified these are running by testing hdfs dfs -mkdir /test and running a sample MR job bundled in the examples; I have also browsed HDFS at port 50070.
I have started zookeeper 3.4.6 on port 2181 and set its dataDir. My zoo.cfg has:
dataDir=/Users/.../tools/hd/zookeeper_data
clientPort=2181
I observe its zookeeper_server.PID file in the dataDir I chose, and when I run jps I see the below:
51074 NodeManager
50743 DataNode
50983 ResourceManager
50856 SecondaryNameNode
57848 QuorumPeerMain
58731 Jps
50653 NameNode
QuorumPeerMain above matches the PID in zookeeper_server.PID, as I would expect. Is this expectation correct? From what I have done so far, should any more processes be showing here?
I installed hbase-1.1.2 and configured hbase-site.xml. I set hbase.rootdir to hdfs://localhost:8200/hbase; my hdfs is running at localhost:8200. I set hbase.zookeeper.property.dataDir to my zookeeper's dataDir, with the expectation that HBase will use this property to find the PID of a running zookeeper. Is this expectation correct, or have I misunderstood? The config in hbase-site.xml is:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>Users/.../tools/hd/zookeeper_data</value>
</property>
When I run start-hbase.sh my server fails to start. I see this log message:
2015-09-26 19:32:43,617 ERROR [main] master.HMasterCommandLine: Master exiting
To investigate, I ran hbase master start and got more detail:
2015-09-26 19:41:26,403 INFO [Thread-1] server.NIOServerCnxn: Stat command output
2015-09-26 19:41:26,405 INFO [Thread-1] server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:63334 (no session established for client)
2015-09-26 19:41:26,406 INFO [main] zookeeper.MiniZooKeeperCluster: Started MiniZooKeeperCluster and ran successful 'stat' on client port=2182
Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
2015-09-26 19:41:26,406 ERROR [main] master.HMasterCommandLine: Master exiting
java.io.IOException: Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:214)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2304)
So I have a few questions:
Should I be trying to set up a zookeeper before running HBase?
Why, when I have started a zookeeper and told HBase where its dataDir is, does HBase try to start its own zookeeper?
Anything obviously stupid/misguided in the above?
The script you are using to start HBase, start-hbase.sh, will try to start the following components, in order:
zookeeper
hbase master
hbase regionserver
hbase master-backup
So, you could either stop the zookeeper that you started yourself, or start the daemons individually:
# start hbase master
bin/hbase-daemon.sh --config ${HBASE_CONF_DIR} start master
# start region server
bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} --hosts ${HBASE_CONF_DIR}/regionservers start regionserver
Standalone HBase starts its own zookeeper (if you run start-hbase.sh), but if it fails to start or keep running, the other HBase daemons you need won't work.
Make sure you explicitly set the properties for your interface lo0 in the hbase-site.xml file:
<property>
<name>hbase.zookeeper.dns.interface</name>
<value>lo0</value>
</property>
<property>
<name>hbase.regionserver.dns.interface</name>
<value>lo0</value>
</property>
<property>
<name>hbase.master.dns.interface</name>
<value>lo0</value>
</property>
I found that when my Wi-Fi was on and these entries were missing, zookeeper failed to start.

Getting java.net.SocketTimeoutException when trying to run a Hadoop MapReduce job on a fresh install of Hortonworks

I have a fresh install of Hortonworks version 2.3_1 for Oracle VirtualBox, and I get a java.net.SocketTimeoutException whenever I try to run a MapReduce job. I changed nothing other than the memory and the cores available to the VM.
Full text of the run:
WARNING: Use "yarn jar" to launch YARN applications.
15/09/01 01:15:17 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
15/09/01 01:15:20 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
15/09/01 01:16:19 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/01 01:18:09 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-601678901-10.0.2.15-1439987491556:blk_1073742292_1499
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.2.15:52924 remote=/10.0.2.15:50010]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:749)
15/09/01 01:18:11 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1441069639378_0001
Exception in thread "main" java.io.IOException: All datanodes DatanodeInfoWithStorage[10.0.2.15:50010,DS-56099a5f-3cb3-426e-8e1a-ff3b53df9bf2,DISK] are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
Full name of the .ova file I am using: Sandbox_HDP_2.3_1_virtualbox.ova
My host is a Windows 7 Home Premium machine with eight lines of execution (four hyperthreaded cores, I think).
The problem was exactly what it seemed: a timeout error. It was fixed by going to the Hadoop config folder and raising all the timeouts as well as the number of retries (although, from the log, that didn't come into play), and by stopping unnecessary services on both the host and guest operating systems.
Thanks, sunrise76; one of those issues pointed me to the config folder.

HBase master not able to start

I am trying to start the HBase master but am getting the below error:
Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
13/07/14 06:33:23 ERROR master.HMasterCommandLine: Failed to start master
java.io.IOException: Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:134)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:103)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1684)
13/07/14 06:33:23 INFO server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:46283 (no session established for client)
hbase-site.xml
<configuration>
<!-- Changing the default port for REST since it conflicts with yarn nodemanager -->
<property>
<name>hbase.rest.port</name>
<value>8070</value>
<description>The port for the HBase REST server.</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
</configuration>
It seems like something else is already using port 2181, or perhaps you had started another ZK instance earlier on this port. Either stop that process or change its port. If that is not possible, then set hbase.zookeeper.property.clientPort to 2182 in hbase-site.xml.
Please note that HBase needs ZK's services, even in standalone mode, so you should make sure that it's running OK.
HTH
Surprisingly, it can be an issue with privileges to connect to port 2181, but not to 2182. Instead of ./start-hbase.sh, try:
sudo ./start-hbase.sh
In my case it helped.
When starting HBase in standalone mode, a single JVM hosts the HBase Master, an HBase RegionServer, and a ZooKeeper quorum peer, so you don't need to start a ZK instance separately.
In your case, HBase is not able to start its ZK because another instance is probably already running on port 2181. So, just close that ZooKeeper instance and restart HBase. Also, ensure proper permissions for the HBase rootdir.
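If you want to verify whether anything is already answering on 2181 before start-hbase.sh runs, a plain socket probe is enough. This is just a hypothetical check, not an HBase or ZooKeeper API:
// Hypothetical check: see whether any process is already listening on the
// ZooKeeper client port before HBase tries to start its own quorum peer.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class CheckZkPort {
    public static void main(String[] args) {
        int port = 2181;
        try (Socket socket = new Socket()) {
            // A successful connect means some process already owns the port.
            socket.connect(new InetSocketAddress("localhost", port), 1000);
            System.out.println("Port " + port + " is already in use; "
                    + "stop that process or point HBase at another clientPort.");
        } catch (IOException e) {
            System.out.println("Nothing is listening on port " + port
                    + "; HBase's managed ZooKeeper should be able to bind it.");
        }
    }
}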
