I set up a 20-node Hadoop cluster on AWS, which to my knowledge is working. When I try to start up YARN with:
$HADOOP_HOME/sbin/start-yarn.sh
I get these errors:
resourcemanager running as process (process #). Stop it first
and
nodemanager running as process (process #). Stop it first
for each of the worker nodes.
My yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ec2-52-207-188-72.compute-1.amazonaws.com</value>
</property>
</configuration>
Is there a solution for this?
First:
call stop-all.sh
to stop everything; you can make sure it is stopped by using the jps command.
Then start it again:
call start-all.sh
Type jps again (if the NameNode doesn't appear, run "hadoop namenode" in the foreground and check the error it prints).
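As a minimal sketch of that cycle, assuming the Hadoop sbin scripts are reachable via $HADOOP_HOME (stop-all.sh/start-all.sh are deprecated but still work; stop-dfs.sh and stop-yarn.sh are the per-service equivalents):
# stop all HDFS and YARN daemons
$HADOOP_HOME/sbin/stop-all.sh
# verify nothing is left running; only "Jps" should be listed
jps
# if a stale pid file keeps triggering "Stop it first", remove it
# (the path below is an assumption; check HADOOP_PID_DIR on your install)
# rm /tmp/yarn-*-resourcemanager.pid /tmp/yarn-*-nodemanager.pid
# start everything again and re-check
$HADOOP_HOME/sbin/start-all.sh
jps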
Related
I have Hadoop 3.2.1 installed on Ubuntu 16.04 LTS, and my cluster has 18 datanodes and 1 master.
After running:
$ start-dfs.sh
$ start-yarn.sh
$ jps
On master I get the following:
ResourceManager
NameNode
SecondaryNameNode
Jps
And on datanodes:
DataNode
Jps
All the nodes seem to be live:
NameNode Overview Web Page
But when I open the Cluster Overview, none of my datanodes appears to be active:
Cluster Overview
My configurations files:
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-3.2.1/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hadoop-3.2.1/data/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hadoop-3.2.1/data/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
The namenode and datanode directories exists on every host (master and datanodes)
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services </name>
<value> mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
</configuration>
I have also configured hadoop-env.sh with the JAVA_HOME path, and all the other variables are in the .bashrc file (also on every host).
I have modified the /etc/hosts file to include all the hosts with their IPs and hostnames, and finally I have also modified the workers file to include all the IPs of the datanodes.
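For illustration only (the real IPs and hostnames are not shown here, so these values are made up), the two files would look something like this:
# /etc/hosts, identical on every host
10.0.0.10   hadoop-master
10.0.0.11   hadoop-worker1
10.0.0.12   hadoop-worker2
# $HADOOP_HOME/etc/hadoop/workers on the master, one datanode IP per line
10.0.0.11
10.0.0.12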
The first time I formatted the NameNode, the directories in hdfs-site.xml were wrong (I had the datanode dir twice), so HDFS made its own directories under /tmp/hdfs/ (if I remember correctly). But I fixed this by formatting the NameNode again with the correct directories.
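For reference, a minimal re-format sequence under these configs (paths taken from the hdfs-site.xml above; note that it destroys all existing HDFS data) looks roughly like:
# stop HDFS first
stop-dfs.sh
# clear the old metadata/data directories on the master and on every datanode
rm -rf /home/hadoop/hadoop-3.2.1/data/namenode/* /home/hadoop/hadoop-3.2.1/data/datanode/*
# re-format the NameNode and bring HDFS back up
hdfs namenode -format
start-dfs.sh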
I've integrated Hadoop 2 and HBase 0.98 with Phoenix. When I type the command sqlline.py localhost, the Phoenix shell starts, but when I try to run the Apache Phoenix example with this command: psql.py /usr/local/phoenix/examples/WEB_STAT.sql /usr/local/phoenix/examples/WEB_STAT.csv /usr/local/phoenix/examples/WEB_STAT_QUERIES.sql I get this error: ERROR client.HConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
I use Hadoop 2.6 in single-node mode and HBase 0.98 in pseudo-distributed mode. In addition, I didn't explicitly install ZooKeeper; is it required to install ZooKeeper explicitly?
My HBASE_HOME/conf/hbase-site.xml file contains:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:54310/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hduser/hbase/zookeeper</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
</property>
<property>
<name>hbase.master</name>
<value>hadoop-master:60000</value>
</property>
</configuration>
and my running Java processes are:
7415 DataNode
7262 NameNode
9119 Jps
7605 SecondaryNameNode
7893 NodeManager
8704 HRegionServer
8544 HMaster
8475 HQuorumPeer
7763 ResourceManager
Simply add the address of your server, here localhost, to your command. Notice that in the command you already ran, sqlline.py localhost, you did give the server address.
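A sketch of the corrected command under that assumption (psql.py, like sqlline.py, takes the ZooKeeper quorum as its first argument):
psql.py localhost /usr/local/phoenix/examples/WEB_STAT.sql /usr/local/phoenix/examples/WEB_STAT.csv /usr/local/phoenix/examples/WEB_STAT_QUERIES.sql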
Are you using the HDP distribution? IIRC it uses /hbase-unsecure for un-Kerberized clusters. I don't remember how that interacts with your config setting of /hbase.
Start the ZooKeeper CLI:
zkCli.sh (or perhaps some variant such as zookeeper-shell)
Then query the existing root nodes:
ls /
The HBase root node is probably named hbase-unsecure.
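As a rough sketch of what that session might show (the znode names below are assumptions; check your own output):
$ zkCli.sh -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper, hbase-unsecure]
If so, point zookeeper.znode.parent in hbase-site.xml (and in the Phoenix client's configuration) at /hbase-unsecure instead of /hbase and restart.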
I am trying to create a cluster for using Hadoop. I am trying to start my NameNode, but it is not starting. After restarting the system it starts for a moment and then goes down again. I am running the command as the root user and have given the NameNode root user rights. I am facing the same problem with the JobTracker and the DataNode.
To start the namenode I am using the command hadoop-daemon.sh start namenode
What is the problem here?
[hadoop@localhost ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-localhost.localdomain.out
Warning: $HADOOP_HOME is deprecated.
[hadoop@localhost ~]$ jps
6500 Jps
[hadoop@localhost ~]$ jps
The core-site.xml file contains
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://lab1:8020</value>
</property>
</configuration>
The hdfs-site.xml contains
<configuration>
<property>
<name>dfs.replication.dir</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
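Since the .out file shown above rarely contains the real failure, a first diagnostic step (the .log path is an assumption derived from the .out path in the output above) would be:
tail -n 100 /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-localhost.localdomain.log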
I am new to Hadoop. When I run the wordcount test project, everything works fine. But I can't access the JobTracker at http://localhost:50030. In fact, when I look at my secondary node log file, I get this exception message:
java.io.IOException: Bad edit log manifest (expected txid = 3: [[21,22], [23,24]
[8683,8684], [8685,8686], [8687,8688], [8689,8690], [8691,8692], [8693,8694], [8695,8696], [8697,8698], [8699,8700]]...
....
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:438)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:540)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:395)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357)
at java.lang.Thread.run(Thread.java:745)
BTW, when I run jps, I get 53745 JobHistoryServer and 77259 Jps.
UPDATE: here's my config
in core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
in hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9010</value>
</property>
</configuration>
and nothing is set in my yarn-site.xml
If you are using a recent version of Hadoop, the JobTracker is no longer available. The JobTracker has been replaced by the ResourceManager and the JobHistoryServer.
If you want to access past job details, go to http://hostname:19888. This is the web UI address of the JobHistoryServer.
Please refer to the Hadoop Cluster Setup documentation for further details.
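As a quick sketch (the exact start command differs between Hadoop 2 and 3; both standard forms are shown), the JobHistoryServer must be started explicitly, and its web UI address comes from mapreduce.jobhistory.webapp.address, which defaults to 0.0.0.0:19888:
# Hadoop 2.x
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
# Hadoop 3.x
mapred --daemon start historyserver
# then browse to http://<hostname>:19888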
Is it normal that in the ResourceManager UI (nodemanager:8088/cluster/nodes) I can see only one node?
In my test environment I set up a two-node cluster, and the command bin/hdfs dfsadmin -report shows me two nodes.
Sorry, but I found the solution myself.
You need to add the following properties to your conf/yarn-site.xml file on all nodes:
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>resourcemanager_address:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>resourcemanager_address:8032</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>resourcemanager_address:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>resourcemanager_address:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>resourcemanager_address:8033</value>
</property>
That will overwrite the default settings for the ResourceManager addresses (the default host is 0.0.0.0).
Hope this helps someone.
You can also simply set
<property>
<name>yarn.resourcemanager.hostname</name>
<value>resourcemanager_address</value>
</property>
... and the rest of the properties will be set correctly automatically.
To point out the obvious, make sure you start/restart the nodemanager as well.
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
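To confirm that the NodeManagers actually registered after the restart, a quick check with the standard YARN CLI is:
# list all nodes known to the ResourceManager, including their state
yarn node -list -all
# or open http://<resourcemanager_address>:8088/cluster/nodes in a browser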