My testing environment
I'm trying to deploy a Hadoop cluster based on 3 nodes into my testing environment:
1 namenode (master: 172.30.10.64)
2 datanodes (slave1: 172.30.10.72 and slave2: 172.30.10.62)
I configured the files with the master's properties on my namenode and with the slaves' properties on my datanodes.
Master's files
Hosts:
127.0.0.1 localhost
172.30.10.64 master
172.30.10.62 slave2
172.30.10.72 slave1
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
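Side note: fs.default.name is the deprecated key for this setting; on Hadoop 2.x the same value is normally set through fs.defaultFS. An equivalent sketch with the value used above:
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>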
yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8050</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
</configuration>
And I have the slaves file:
slave1
slave2
masters file:
master
Slaves' files:
I've included only the files that differ from the master's.
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>
</configuration>
My issue
From /usr/local/hadoop/sbin I launched:
./start-dfs.sh && ./start-yarn.sh
This is what I get:
hduser@master:/usr/local/hadoop/sbin$ ./start-dfs.sh && ./start-yarn.sh
18/03/14 10:45:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
hduser@master's password:
master: starting namenode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-hduser-namenode-master.out
hduser@slave2's password: hduser@slave1's password:
slave2: starting datanode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-hduser-datanode-slave2.out
So I opened the log file on my slave2:
2018-03-14 10:46:05,494 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.30.10.64:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECOND$
2018-03-14 10:46:06,495 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.30.10.64:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECOND$
2018-03-14 10:46:07,496 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.30.10.64:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECOND$
What I've done
I tried several things, but none has had any effect so far (see also the connectivity checks sketched right after this list):
ping from master to slaves and between slaves works fine
ssh from master to slaves and between slaves works fine
hdfs namenode -format on my master node
Recreated the namenode and datanode folders
Opened port 9000 on my master VM
The firewall is disabled: sudo ufw status --> disabled
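Connectivity checks worth sketching here (assuming netcat is available on the slaves; telnet master 9000 does the same job): confirm the NameNode is really listening on port 9000 and bound to a reachable interface, then test the port from a slave.
# On master: port 9000 should be LISTEN and bound to 172.30.10.64 or 0.0.0.0, not 127.0.0.1
$ netstat -tnl | grep 9000
# From slave1 or slave2: the port should be reachable through the master hostname
$ nc -zv master 9000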
I'm a bit lost because everything seems to be OK and I don't understand why I can't get my Hadoop cluster to start.
I may have found the answer:
I regenerated the SSH key on the master node and then copied it to the slave nodes. It seems to work now.
# Generate an SSH key for hduser
$ ssh-keygen -t rsa -P ""
# Authorize the key to enable passwordless SSH
$ cat /home/hduser/.ssh/id_rsa.pub >> /home/hduser/.ssh/authorized_keys
$ chmod 600 /home/hduser/.ssh/authorized_keys
# Copy this key to slave1 and slave2 to enable passwordless SSH to them as well
$ ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub slave2
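To confirm the fix before rerunning start-dfs.sh, a quick check (a sketch, assuming the same hduser account on all three machines) is that SSH to each slave no longer prompts for a password:
# Should print the slave hostnames without asking for a password
$ ssh slave1 hostname
$ ssh slave2 hostname
# If a prompt still appears, check the permissions on the slaves
$ ssh slave1 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'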
Related
I am a noob in Hadoop/Spark. I have set up a Hadoop/Spark cluster (1 namenode, 2 datanodes). Now I am trying to import data from a DB (MySQL) into HDFS using Sqoop, but it always fails:
16/07/27 16:50:04 INFO mapreduce.Job: Running job: job_1469629483256_0004
16/07/27 16:50:11 INFO mapreduce.Job: Job job_1469629483256_0004 running in uber mode : false
16/07/27 16:50:11 INFO mapreduce.Job: map 0% reduce 0%
16/07/27 16:50:13 INFO ipc.Client: Retrying connect to server: datanode1_hostname/172.31.58.123:59676. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/07/27 16:50:14 INFO ipc.Client: Retrying connect to server: datanode1_hostname/172.31.58.123:59676. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/07/27 16:50:15 INFO ipc.Client: Retrying connect to server: datanode1_hostname/172.31.58.123:59676. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/07/27 16:50:18 INFO mapreduce.Job: Job job_1469629483256_0004 failed with state FAILED due to: Application application_1469629483256_0004 failed 2 times due to AM Container for appattempt_1469629483256_0004_000002 exited with exitCode: 255
For more detailed output, check application tracking page:http://ip-172-31-55-182.ec2.internal:8088/cluster/app/application_1469629483256_0004Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1469629483256_0004_02_000001
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 255
Failing this attempt. Failing the application.
16/07/27 16:50:18 INFO mapreduce.Job: Counters: 0
16/07/27 16:50:18 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
16/07/27 16:50:18 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 16.2369 seconds (0 bytes/sec)
16/07/27 16:50:18 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
16/07/27 16:50:18 INFO mapreduce.ImportJobBase: Retrieved 0 records.
16/07/27 16:50:18 ERROR tool.ImportTool: Error during import: Import job failed!
I am able to manually write in HDFS:
hdfs dfs -put <local file path> <hdfs path>
But it fails when I run the Sqoop import command:
sqoop import --connect jdbc:mysql://<host>/<db_name> --username <USERNAME> --password <PASSWORD> --table <TABLE_NAME> --enclosed-by '\"' --fields-terminated-by , --escaped-by \\ -m 1 --target-dir <hdfs location>
Can anyone please tell me what I am doing wrong?
Here is the list of things that I have already tried:
Shutting down cluster, formatting HDFS, then restarting cluster (didn't help)
Made sure that HDFS is not in SAFE MODE
All the nodes have this in their /etc/hosts:
127.0.0.1 localhost
172.31.55.182 namenode_hostname
172.31.58.123 datanode1_hostname
172.31.58.122 datanode2_hostname
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
Configuration Files:
All Nodes: $HADOOP_CONF_DIR/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ip-172-31-55-182.ec2.internal:9000</value>
</property>
</configuration>
All Nodes: $HADOOP_CONF_DIR/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ip-172-31-55-182.ec2.internal</value>
</property>
</configuration>
All Nodes: $HADOOP_CONF_DIR/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>ip-172-31-55-182.ec2.internal:54311</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
NameNode Specific Configurations
$HADOOP_CONF_DIR/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///mnt/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:50010</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50475</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:50020</value>
</property>
</configuration>
$HADOOP_CONF_DIR/masters:
ip-172-31-55-182.ec2.internal
$HADOOP_CONF_DIR/slaves:
ip-172-31-58-123.ec2.internal
ip-172-31-58-122.ec2.internal
DataNode Specific Configurations
$HADOOP_CONF_DIR/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///mnt/hadoop_data/hdfs/datanode</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:50010</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50475</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:50020</value>
</property>
</configuration>
From where are you trying to import the data? I mean, from which machine are you trying to connect? Check the masters and slaves files on both the namenode and the datanodes.
Try to ping the IP address from a different server and check whether it shows as up.
Make these changes, restart your cluster, and try again:
Edit the parts marked with a comment (#) below, then remove the comments.
/etc/hosts file on client node:
127.0.0.1 localhost yourcomputername # get the computer name with the "hostname -f" command and put it here
172.31.55.182 namenode_hostname ip-172-31-55-182.ec2.internal
172.31.58.123 datanode1_hostname ip-172-31-58-123.ec2.internal
172.31.58.122 datanode2_hostname ip-172-31-58-122.ec2.internal
/etc/hosts file on cluster nodes:
198.22.23.212 yourcomputername # change to the public IP of the client node; use the same computer name as on the client node
172.31.55.182 namenode_hostname ip-172-31-55-182.ec2.internal
172.31.58.123 datanode1_hostname ip-172-31-58-123.ec2.internal
172.31.58.122 datanode2_hostname ip-172-31-58-122.ec2.internal
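After editing the hosts files, a small verification sketch (assuming the hostnames above) is to check name resolution on each node before restarting the cluster:
# Each node should report its own fully qualified name
$ hostname -f
# The cluster names should resolve from /etc/hosts
$ getent hosts namenode_hostname datanode1_hostname datanode2_hostname
$ ping -c 1 namenode_hostname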
I am terminating this cluster and starting from scratch.
EDIT: I have looked at YARN Resourcemanager not connecting to nodemanager and the solution does not work for me. I have attached the section of the node-manager log where a connection to the resource manager is made:
[main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at /0.0.0.0:8031
2016-06-17 19:01:04,697 INFO [main] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:getNMContainerStatuses(429)) - Sending out 0 NM container statuses: []
2016-06-17 19:01:04,701 INFO [main] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(268)) - Registering with RM using containers :[]
2016-06-17 19:01:05,815 INFO [main] ipc.Client (Client.java:handleConnectionFailure(867)) - Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-06-17 19:01:06,816 INFO [main] ipc.Client (Client.java:handleConnectionFailure(867)) - Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
For some reason it says it is connecting to 0.0.0.0. When I ssh into one of the data nodes and ping resource-manager I get a response, so it is able to resolve the hostname.
This leads me to believe that an option is incorrect in my yarn-site.xml, as my nodes are trying to connect to 0.0.0.0:8031 instead of resource-manager:8031.
I am running a Cloudera Hadoop cluster on Docker and am having issues with the YARN resource manager being able to see the other nodes. The way it is set up is as follows:
Node 1 - Namenode (hadoop-hdfs-namenode)
Node 2 - Secondary Namenode (hadoop-hdfs-secondarynamenode)
Node 3 - Yarn Resource-Manager (hadoop-yarn-resourcemanager)
Node 4 - datanode and node manager (hadoop-hdfs-datanode, hadoop-yarn-nodemanager)
Node 5 - datanode and node manager (hadoop-hdfs-datanode, hadoop-yarn-nodemanager)
When I go to namenode:50070 I am able to see both nodes. However, when I go to resource-manager:8088 it shows that I have zero nodes. My yarn-site.xml file, which is on every node, is as follows:
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>resource-manager:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>resource-manager:8030</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///data/1/yarn/local,file:///data/2/yarn/local,file:///data/3/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///data/1/yarn/logs,file:///data/2/yarn/logs,file:///data/3/yarn/logs</value>
</property>
<property>
<name>yarn.log.aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>Where to aggregate logs</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://namenode:8020/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>resource-manager:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>resource-manager:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>resource-manager:8033</value>
</property>
<property>
<description>
Number of seconds after an application finishes before the nodemanager's
DeletionService will delete the application's localized file directory
and log directory.
To diagnose Yarn application problems, set this property's value large
enough (for example, to 600 = 10 minutes) to permit examination of these
directories. After changing the property's value, you must restart the
nodemanager in order for it to have an effect.
The roots of Yarn applications' work directories is configurable with
the yarn.nodemanager.local-dirs property (see below), and the roots
of the Yarn applications' log directories is configurable with the
yarn.nodemanager.log-dirs property (see also below).
</description>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>600</value>
</property>
</configuration>
Does anyone have any ideas as to why this is the case?
Thanks for reading.
Specify:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master-1</value>
</property>
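For this question's setup that would look like the following (a minimal sketch, assuming the resource-manager hostname already used in the yarn-site.xml above):
<property>
<name>yarn.resourcemanager.hostname</name>
<value>resource-manager</value>
</property>
With the hostname set, the other ResourceManager endpoints default to that host on the standard ports (8032 client, 8030 scheduler, 8031 resource-tracker, 8033 admin, 8088 web UI), so the NodeManagers stop falling back to 0.0.0.0:8031.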
As indicated in the edit, it appeared as if the yarn-site.xml was not being picked up and only the defaults were taking effect. I solved this by copying the yarn-site.xml file into every directory on the machine as the root user. I then ran the node-manager so that it would throw an error reading the file, since it does not run as root. The log directed me to where it expected the file, which was a YARN-specific directory instead of the general Hadoop directory.
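A quick way to confirm that the NodeManagers now register (a sketch, run on the ResourceManager host) is to list the YARN nodes and compare with the web UI:
# Nodes should appear with state RUNNING once yarn-site.xml is picked up
$ yarn node -list
# The same information is shown at http://resource-manager:8088/cluster/nodes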
I have been trying to set up and run Hadoop in pseudo-distributed mode. But when I type
bin/hadoop fs -mkdir input
I get
mkdir: Call From h1/192.168.1.13 to h1:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Here are the details.
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/grid/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://h1:9000</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>h1:9001</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>20</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>4</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>h1:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>h1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>h1:19888</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.http.address</name>
<value>h1:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address</name>
<value>h1:9001</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>h1:50090</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/grid/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
/etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.13 h1
192.168.1.14 h2
192.168.1.15 h3
After hadoop namenode -format and start-all.sh
1702 ResourceManager
1374 DataNode
1802 NodeManager
2331 Jps
1276 NameNode
1558 SecondaryNameNode
the problem occurs
[grid@h1 hadoop-2.6.0]$ bin/hadoop fs -mkdir input
15/05/13 16:37:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mkdir: Call From h1/192.168.1.13 to h1:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Where is the problem?
hadoop-grid-datanode-h1.log
2015-05-12 11:26:20,329 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = h1/192.168.1.13
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.6.0
hadoop-grid-namenode-h1.log
2015-05-08 16:06:32,561 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = h1/192.168.1.13
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.6.0
Why does port 9000 not work?
[grid@h1 ~]$ netstat -tnl |grep 9000
[grid@h1 ~]$ netstat -tnl |grep 9001
tcp 0 0 192.168.1.13:9001 0.0.0.0:* LISTEN
Please start dfs and yarn.
[hadoop@hadooplab sbin]$ ./start-dfs.sh
[hadoop@hadooplab sbin]$ ./start-yarn.sh
Now try using "bin/hadoop fs -mkdir input"
The issue usually comes up when you install Hadoop in a VM and then shut the VM down. When you shut down the VM, DFS and YARN also stop, so you need to start them each time you restart the VM.
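Once both scripts have run, a short sanity check (a sketch, assuming the pseudo-distributed setup from the question) is to confirm that the daemons are up and that the NameNode RPC port from core-site.xml is listening:
# Should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager
$ jps
# The fs.defaultFS port should show up as LISTEN
$ netstat -tnl | grep 9000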
First, try the command:
bin/hadoop dfs -mkdir input
If you have followed Michael Noll's post properly, then you should not have any issue. I suspect that passwordless SSH is not working in your configuration; recheck it.
The following procedure resolved the issue for me:
Stop all the services.
Delete namenode and datanode directories as specified in hdfs-site.xml.
Create new namenode and datanode directories and modify hdfs-site.xml accordingly.
In core-site.xml, make the following changes or add the following properties:
<property>
<name>fs.defaultFS</name>
<value>hdfs://172.20.12.168/</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://172.20.12.168:8020</value>
</property>
Make the following changes in hadoop-2.6.4/etc/hadoop/hadoop-env.sh file:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home
Restart dfs, yarn and mr as follows:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
This command worked for me:
hadoop namenode -format
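A word of caution with that command: formatting re-initializes the NameNode metadata, so any existing HDFS data becomes unreachable. A hedged sketch of the usual sequence (Hadoop 2.x prefers the hdfs front end over the deprecated hadoop namenode form):
$ stop-dfs.sh
# WARNING: this wipes the NameNode metadata; only do it on a fresh or disposable cluster
$ hdfs namenode -format
$ start-dfs.sh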
Environment: Ubuntu 14.04, hadoop-2.2.0, hbase-0.98.7
When I start Hadoop and HBase (single-node mode), both start successfully (I also checked the web UIs: port 8088 for Hadoop and 60010 for HBase).
jps
4507 SecondaryNameNode
5350 HRegionServer
4197 NameNode
4795 NodeManager
3948 QuorumPeerMain
5209 HMaster
4678 ResourceManager
5831 Jps
4310 DataNode
But when I check hbase-hadoop-master-localhost.log, I find the following information:
2014-10-23 14:16:11,392 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-10-23 14:16:11,426 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
I have googled a lot of websites for that "unknown error" problem, but I can't solve it...
Following is my Hadoop and HBase configuration.
Hadoop:
slaves content: localhost
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:9001</value>
<description>host is the hostname of the resource manager and
port is the port on which the NodeManagers contact the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:9002</value>
<description>host is the hostname of the resourcemanager and port is the port
on which the Applications in the cluster talk to the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
<description>In case you do not want to use the default scheduler</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:9003</value>
<description>the host is the hostname of the ResourceManager and the port is the port on
which the clients can talk to the Resource Manager. </description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value></value>
<description>the local directories used by the nodemanager</description>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>localhost:9004</value>
<description>the nodemanagers bind to this port</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>10240</value>
<description>the amount of memory on the NodeManager in GB</description>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/app-logs</value>
<description>directory on hdfs where the application logs are moved to </description>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value></value>
<description>the directories used by Nodemanagers as log directories</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run </description>
</property>
</configuration>
HBase:
hbase-env.sh:
..
export JAVA_HOME="/usr/lib/jvm/java-7-oracle"
..
export HBASE_MANAGES_ZK=true
..
hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
</configuration>
regionservers content: localhost
my /etc/hosts content:
127.0.0.1 localhost
#127.0.1.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
I have tried lots of methods to solve it, but they all fail. Please help me solve it; I really need to know how.
Originally, I ran a MapReduce program, and when it reached map 67% reduce 0% it printed some INFO messages, part of which follows:
14/10/23 15:50:41 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher@ce1472
14/10/23 15:50:41 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
14/10/23 15:50:41 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
14/10/23 15:50:41 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1493be510380007, negotiated timeout = 40000
14/10/23 15:50:43 INFO mapred.LocalJobRunner: map > sort
14/10/23 15:50:46 INFO mapred.LocalJobRunner: map > sort
Then it crashes. I think the program may be in a deadlock, and that is why I want to solve the ZooKeeper problem above.
If you want any other configuration file I set for Hadoop, HBase or anything else, just tell me and I'll post it.
thanks!
I don't think ZooKeeper is your problem. You should look at your other logs for more information about your map/reduce job status. Check the datanode and namenode logs for errors, along with the YARN log messages through the YARN job tracker UI.
ZooKeeper Messages
Those messages are from ZooKeeper trying to connect with the ZooKeeper SASL client. If SASL is not configured, the client will still be able to connect, but the connection won't be authenticated.
The error message comes from this file:
ZooKeeperSaslClient.java
// The user did not override the default context. It might be that they just don't intend to use SASL,
// so log at INFO, not WARN, since they don't expect any SASL-related information.
String msg = "Will not attempt to authenticate using SASL ";
if (runtimeException != null) {
    msg += "(" + runtimeException + ")";
} else {
    msg += "(unknown error)";
}
this.configStatus = msg;
this.isSASLConfigured = false;
}
If you want to get rid of the error, you will have to configure ZooKeeper to use SASL. Sorry, I don't have any experience with configuring SASL for ZooKeeper.
ZooKeeper SASL Configuration
Add the following properties to the hbase-site.xml file:
<property>
<name>hbase.zookeeper.quorum</name>
<value>192.168.56.101</value> <!-- this is my server IP -->
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
Restart HBase: ./start-hbase.sh
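To verify that the quorum address from the snippet above is actually reachable (a sketch, assuming netcat is installed and the 2181 client port from the configuration), ZooKeeper's four-letter commands can be used:
# "imok" means the ZooKeeper server at that address is up and answering
$ echo ruok | nc 192.168.56.101 2181
# "stat" prints the version, connected clients and mode (standalone/leader/follower)
$ echo stat | nc 192.168.56.101 2181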
This is how I solved it, after trying to include hbase-site.xml in the classpath and to pass the ZooKeeper quorum value as -Dhbase.zookeeper.quorum, neither of which worked.
I copied hbase-site.xml to the same folder as my jar and then did:
jar uf myjar.jar hbase-site.xml
And then I ran hadoop jar myjar.jar Blah
This fixed the problem.
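To double-check that the update actually landed, the jar's table of contents can be listed (a small sketch reusing the myjar.jar name from above):
# hbase-site.xml should appear at the root of the archive
$ jar tf myjar.jar | grep hbase-site.xml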
logs
2014-05-12 16:41:26,773 INFO org.apache.hadoop.ipc.RPC: Server at namenode/192.168.12.196:10001 not available yet, Zzzzz...
2014-05-12 16:41:28,777 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/192.168.12.196:10001. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://user@namenode:10001</value>
</property>
</configuration>
I put this in /etc/hosts:
192.168.12.196 namenode
In masters:
user@namenode
In slaves:
localhost
And my namenode is on user@192.168.12.196.
If I do jps on every node, it shows the datanode, namenode and job/tasktracker working fine.
You need to change localhost to namenode in the slaves and masters files and restart Hadoop once; then it will work fine.
Thanks for your comment.
If I put the hostname in the namenode's slaves file, it runs the datanode and namenode on the same node.
The configuration of my masters and slaves files is as follows.
On the namenode, masters:
user@namenode
On the namenode, slaves:
hdname1@data1 (data1 corresponds to the node's IP and hdname1 is the user)
hdname2@data2
On the datanodes, masters:
user@namenode
On the datanodes, slaves:
hdname1@data1