Hadoop 2.6 installation on Mac, failed to start ResourceManager - macos

I'm trying to install Hadoop on Mac, following this post.
Everything looks fine, but in the end, I cannot start the ResourceManager.
i.e. after running $jps.
It is supposed to show something like below.
2507 ResourceManager ---------this is missing (not started)
1712 SecondaryNameNode
1412 NameNode
1540 DataNode
2045 NodeManager
2858 Jps
Here is the error message I got:
starting resourcemanager, logging to /usr/local/Cellar/hadoop/2.6.0/libexec/logs/yarn-myUserName-resourcemanager-Mac.out
nohup: can't detach from console: No such file or directory
FYI, I've also changed the ownership of this hadoop folder. (as follows)
$mkdir -p /usr/local/Cellar/hadoop/2.6.0/hadoop_data/hdfs/namenode
$mkdir -p /usr/local/Cellar/hadoop/2.6.0/hadoop_data/hdfs/datanode
$sudo chown -R Tiger /usr/local/Cellar/hadoop
Any suggestions?

that's wired, I just retried to start the resourcemanager, it works !!!! thought the nodemanager stoped. sorry for bothering, Omigo

Related

Preferred way to start a hadoop datanode

Functionally, I was wondering whether there is a difference between starting a hadoop namenode with the command:
$HADOOP_HOME/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
and:
hdfs datanode
The first command gives me an error stating that the datanode cannot ssh into itself(this is running on a docker container), while the second command seems to run without that issue. The official hadoop documentation for this version, (2.9.1) doesn't mention "hdfs datanode" as a way to start a datanode.

Cannot start running on browser the namenode for Hadoop

It is my first time in installing Hadoop on my Linux (Fedora distro) running on VM (using Parallel on my Mac). And I followed every step on this video and including the textual version of it.And then when I run it on localhost (or the equivalent value from hostname) in port 50070, I got the following message.
...can't establish a connection to the server at localhost:50070
When I run the jps by the way command I don't have the datanode and namenode unlike at the end of the textual version tutorial which has the following:
While mine has only the following processes running:
6021 NodeManager
3947 SecondaryNameNode
5788 ResourceManager
8941 Jps
When I run the hadoop namenode command I have some of the following [redacted] error:
Cannot access storage directory /usr/local/hadoop_store/hdfs/namenode
16/10/11 21:52:45 WARN namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop_store/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
I tried to access by the way the above mentioned directories and it existed.
Any hint for this newbie? ;-)
You would need to give read and write permission to user with which you are running the services on directory /usr/local/hadoop_store/hdfs/namenode.
Once done, you should run format command using hadoop namenode -format
Then try to start your services.
delete files /app/hadoop/tmp/*
and try again formatting the namenode and then start-dfs.sh & start-yarn.sh

How to check if hdfs is running?

I would like to see if the hdfs file system for Hadoop is working properly. I know that jps lists the daemons that are running, but I don't actually know which daemons to look for.
I ran the following commands:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
$HADOOP_PREFIX/sbin/yarn-daemon.sh start resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemon.sh start nodemanager
Only namenode, resourcemanager, and nodemanager appeared when I entered jps.
Which daemons are supposed to be running in order for hdfs/Hadoop to function? Also, what could you do to fix hdfs if it is not running?
Use any of the following approaches for to check your deamons status
JPS command would list all active deamons
the below is the most appropriate
hadoop dfsadmin -report
This would list down details of datanodes which is basically in a sense your HDFS
cat any file available in hdfs path.
So, I spent two weeks validating my setup (it was fine) , finally found this command:
sudo -u hdfs jps
Initially my simple JPS command was showing only one process, but Hadoop 2.6 under Ubuntu LTS 14.04 was up. I was using 'Sudo' to run the startup scripts.
Here is the startup that work with JPS listing multiple processes:
sudo su hduser
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh

Hadoop cannot start NodeManager

I have installed the Hadoop Cluster which is the hadoop 0.23.9 version. I install the HDFS-1943.patch and now I can start all the namenode and datanode. (start-dfs.sh is working for me)
However, when I want to start the yarn daemons (running start-yarn.sh) , it shows the following error as the same as the previous happening:
[root#dbnode1 sbin]# ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hchen/hadoop-0.23.9/logs/yarn-root- resourcemanager-dbnode1.out
datanode: starting nodemanager, logging to /home/hchen/hadoop-0.23.9/logs/yarn-root-nodemanager-dbnode2.out
datanode: Unrecognized option: -jvm
datanode: Error: Could not create the Java Virtual Machine.
datanode: Error: A fatal exception has occurred. Program will exit.
I have installed the patch already and start-dfs.sh is working for me. Why start-yarn.sh does not work??
Run HDFS as a non-root user with the appropriate permissions. Here is a JIRA with more details.

Datanode process not running in Hadoop

I set up and configured a multi-node Hadoop cluster using this tutorial.
When I type in the start-all.sh command, it shows all the processes initializing properly as follows:
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-namenode-jawwadtest1.out
jawwadtest1: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest1.out
jawwadtest2: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest2.out
jawwadtest1: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-secondarynamenode-jawwadtest1.out
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-jobtracker-jawwadtest1.out
jawwadtest1: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest1.out
jawwadtest2: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest2.out
However, when I type the jps command, I get the following output:
31057 NameNode
4001 RunJar
6182 RunJar
31328 SecondaryNameNode
31411 JobTracker
32119 Jps
31560 TaskTracker
As you can see, there's no datanode process running. I tried configuring a single-node cluster but got the same problem. Would anyone have any idea what could be going wrong here? Are there any configuration files that are not mentioned in the tutorial or I may have looked over? I am new to Hadoop and am kinda lost and any help would be greatly appreciated.
EDIT:
hadoop-root-datanode-jawwadtest1.log:
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/$
************************************************************/
2012-08-09 23:07:30,717 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loa$
2012-08-09 23:07:30,734 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:30,735 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:30,736 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:31,018 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:31,024 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:32,366 INFO org.apache.hadoop.ipc.Client: Retrying connect to $
2012-08-09 23:07:37,949 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: $
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(Data$
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransition$
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNo$
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java$
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNod$
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode($
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataN$
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.$
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1$
2012-08-09 23:07:37,951 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: S$
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at jawwadtest1/198.101.220.90
************************************************************/
You need to do something like this:
bin/stop-all.sh (or stop-dfs.sh and stop-yarn.sh in the 2.x serie)
rm -Rf /app/tmp/hadoop-your-username/*
bin/hadoop namenode -format (or hdfs in the 2.x serie)
the solution was taken from:
http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-troubleshooting/. Basically it consists in restarting from scratch, so make sure you won't loose data by formating the hdfs.
I ran into the same issue. I have created a hdfs folder '/home/username/hdfs' with sub-directories name, data, and tmp which were referenced in config xml files of hadoop/conf.
When I started hadoop and did jps, I couldn't find datanode so I tried to manually start datanode using bin/hadoop datanode. Then I realized from error message that it has permissions issue accessing the dfs.data.dir=/home/username/hdfs/data/ which was referenced in one of the hadoop config files. All I had to do was stop hadoop, delete the contents of /home/username/hdfs/tmp/* directory and then try this command - chmod -R 755 /home/username/hdfs/ and then start hadoop. I could find the datanode!
I faced similar issue while running the datanode. The following steps were useful.
In [hadoop_directory]/sbin directory use ./stop-all.sh to stop all the running services.
Remove the tmp dir using rm -r [hadoop_directory]/tmp (The path configured in [hadoop_directory]/etc/hadoop/core-site.xml)
sudo mkdir [hadoop_directory]/tmp (Make a new tmp directory)
Go to */hadoop_store/hdfs directory where you have created namenode and datanode as sub-directories. (The paths configured in [hadoop_directory]/etc/hadoop/hdfs-site.xml). Use
rm -r namenode
rm -r datanode
In */hadoop_store/hdfs directory use
sudo mkdir namenode
sudo mkdir datanode
In case of permission issue, use
chmod -R 755 namenode
chmod -R 755 datanode
In [hadoop_directory]/bin use
hadoop namenode -format (To format your namenode)
In [hadoop_directory]/sbin directory use ./start-all.sh or ./start-dfs.sh to start the services.
Use jps to check the services running.
Delete the datanode under your hadoop folder then rerun start-all.sh
I was having the same problem running a single-node pseudo-distributed instance. Couldn't figure out how to solve it, but a quick workaround is to manually start a DataNode with
hadoop-x.x.x/bin/hadoop datanode
Follow these steps and your datanode will start again.
Stop dfs.
Open hdfs-site.xml
Remove the data.dir and name.dir properties from hdfs-site.xml and -format namenode again.
Then remove the hadoopdata directory and add the data.dir and name.dir in hdfs-site.xml and again format namenode.
Then start dfs again.
Need to follow 3 steps.
(1) Need to go to the logs and check the most recent log (In hadoop-
2.6.0/logs/hadoop-user-datanode-ubuntu.log)
If the error is as
java.io.IOException: Incompatible clusterIDs in /home/kutty/work/hadoop2data/dfs/data: namenode clusterID = CID-c41df580-e197-4db6-a02a-a62b71463089; datanode clusterID = CID-a5f4ba24-3a56-4125-9137-fa77c5bb07b1
i.e. namenode cluster id and datanode cluster id's are not identical.
(2) Now copy the namenode clusterID which is CID-c41df580-e197-4db6-a02a-a62b71463089 in above error
(3) Replace the Datanode cluster ID with Namenode cluster ID in hadoopdata/dfs/data/current/version
clusterID=CID-c41df580-e197-4db6-a02a-a62b71463089
Restart Hadoop. Will run DataNode
Stop all the services - ./stop-all.sh
Format all the hdfs tmp directory from all the master and slave. Don't forget to format from slave.
Format the namenode.(hadoop namenode -format)
Now start the services on namenode.
./bin/start-all.sh
This made a difference for me to start the datanode service.
Stop the dfs and yarn first.
Remove the datanode and namenode directories as specified in the core-site.xml file.
Re-create the directories.
Then re-start the dfs and the yarn as follows.
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
Hope this works fine.
Delete the files under $hadoop_User/dfsdata and $hadoop_User/tmpdata
then run:
hdfs namenode -format
finally run:
start-all.sh
Then your problem gets solved.
Please control if the the tmp directory property is pointing to a valid directory in core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/data/tmp</value>
</property>
If the directory is misconfigured, the datanode process will not start properly.
Run Below Commands in Line:-
stop-all.sh (Run Stop All to Stop all the hadoop process)
rm -r /usr/local/hadoop/tmp/ (Your Hadoop tmp directory which you configured in hadoop/conf/core-site.xml)
sudo mkdir /usr/local/hadoop/tmp (Make the same directory again)
hadoop namenode -format (Format your namenode)
start-all.sh (Run Start All to start all the hadoop process)
JPS (It will show the running processes)
Step 1:- Stop-all.sh
Step 2:- got to this path
cd /usr/local/hadoop/bin
Step 3:- Run that command
hadoop datanode
Now DataNode work
Check whether the hadoop.tmp.dir property in the core-site.xml is correctly set.
If you set it, navigate to this directory, and remove or empty this directory.
If you didn't set it, you navigate to its default folder /tmp/hadoop-${user.name}, likewise remove or empty this directory.
In case of Mac os(Pseudo-distributed mode):
Open terminal
Stop dfs. 'sbin/stop-all.sh'.
cd /tmp
rm -rf hadoop*
Navigate to hadoop directory. Format the hdfs. bin/hdfs namenode -format
sbin/start-dfs.sh
Error in datanode.log file
$ more /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.log
Shows:
java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop_tmp/hdfs/datanode: namenode clusterID = CID-e4c3fed0-c2ce-4d8b-8bf3-c6388689eb82; datanode clusterID = CID-2fcfefc7-c931-4cda-8f89-1a67346a9b7c
Solution: Stop your cluster and issue the below command & then start your cluster again.
sudo rm -rf /usr/local/hadoop_tmp/hdfs/datanode/*
I have got details of the issue in the log file like below :
"Invalid directory in dfs.data.dir: Incorrect permission for /home/hdfs/dnman1, expected: rwxr-xr-x, while actual: rwxrwxr-x"
and from there I identified that the datanote file permission was 777 for my folder. I corrected to 755 and it started working.
Instead of deleting everything under the "hadoop tmp dir", you can set another one. For example, if your core-site.xml has this property:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/data/tmp</value>
</property>
You can change this to:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/data/tmp2</value>
</property>
and then scp core-site.xml to each node, and then "hadoop namenode -format", and then restart hadoop.
This is for newer version of Hadoop (I am running 2.4.0)
In this case stop the cluster sbin/stop-all.sh
Then go to /etc/hadoop for config files.
In the file: hdfs-site.xml
Look out for directory paths corresponding to
dfs.namenode.name.dir
dfs.namenode.data.dir
Delete both the directories recursively (rm -r).
Now format the namenode via bin/hadoop namenode -format
And finally sbin/start-all.sh
Hope this helps.
You need to check :
/app/hadoop/tmp/dfs/data/current/VERSION and /app/hadoop/tmp/dfs/name/current/VERSION ---
in those two files and that to Namespace ID of name node and datanode.
If and only if data node's NamespaceID is same as name node's NamespaceID then your datanode will run.
If those are different copy the namenode NamespaceID to your Datanode's NamespaceID using vi editor or gedit and save and re run the deamons it will work perfectly.
if formatting the tmp directory is not working then try this:
first stop all the entities like namenode, datanode etc. (you will
be having some script or command to do that)
Format tmp directory
Go to /var/cache/hadoop-hdfs/hdfs/dfs/ and delete all the contents
in the directory manually
Now format your namenode again
start all the entities then use jps command to confirm that the
datanode has been started
Now run whichever application you have
Hope this helps.
I configured hadoop.tmp.dir in conf/core-site.xml
I configured dfs.data.dir in conf/hdfs-site.xml
I configured dfs.name.dir in conf/hdfs-site.xml
Deleted everything under "/tmp/hadoop-/" directory
Changed file permissions from 777 to 755 for directory listed under dfs.data.dir
And the data node started working.
Even after removing the remaking the directories, the datanode wasn't starting.
So, I started it manually using bin/hadoop datanode
It did not reach any conclusion. I opened another terminal from the same username and did jps and it showed me the running datanode process.
It's working, but I just have to keep the unfinished terminal open by the side.
Follow these steps and your datanode will start again.
1)Stop dfs.
2)Open hdfs-site.xml
3)Remove the data.dir and name.dir properties from hdfs-site.xml and -format namenode again.
4)Then start dfs again.
Got the same error. Tried to start and stop dfs several times, cleared all directories that are mentioned in previous answers, but nothing helped.
The issue was resolved only after rebooting OS and configuring Hadoop from the scratch. (configuring Hadoop from the scratch without rebooting didn't work)
Once I was not able to find data node using jps in hadoop, then I deleted the
current folder in the hadoop installed directory (/opt/hadoop-2.7.0/hadoop_data/dfs/data) and restarted hadoop using start-all.sh and jps.
This time I could find the data node and current folder was created again.
Try this
stop-all.sh
vi hdfs-site.xml
change the value given for property dfs.data.dir
format namenode
start-all.sh
I Have applied some mixed configuration, and its worked for me.
First >>
Stop Hadoop all Services using
${HADOOP_HOME}/sbin/stop-all.sh
Second >>
Check mapred-site.xml which is located at your ${HADOOP_HOME}/etc/hadoop/mapred-site.xml and change the localhost to master.
Third >>
Remove the temporary folder created by hadoop
rm -rf //path//to//your//hadoop//temp//folder
Fourth >>
Add the recursive permission on temp.
sudo chmod -R 777 //path//to//your//hadoop//temp//folder
Fifth >>
Now Start all the services again. And First check that all service including datanode is running.
enter image description here
mv /usr/local/hadoop_store/hdfs/datanode /usr/local/hadoop_store/hdfs/datanode.backup
mkdir /usr/local/hadoop_store/hdfs/datanode
hadoop datanode OR start-all.sh
jps

Resources