I'm struggling to install Hadoop 2.2.0 on Mac OS X 10.9.3. I essentially followed this tutorial:
http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide
When I run $HADOOP_PREFIX/bin/hdfs namenode -format to format the namenode, I get the message:
SHUTDOWN_MSG: Shutting down NameNode at Macintosh.local/192.168.0.103
I believe this is preventing me from successfully running the test:
$HADOOP_PREFIX/bin/hadoop jar \
    $HADOOP_PREFIX/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar \
    org.apache.hadoop.yarn.applications.distributedshell.Client \
    --jar $HADOOP_PREFIX/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar \
    --shell_command date --num_containers 2 --master_memory 1024
Does anyone know how to correctly format namenode?
(Regarding the test command above: someone mentioned to me that the failure could have something to do with the HDFS file system not functioning properly, in case that is relevant.)
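For what it is worth, the SHUTDOWN_MSG line on its own is not necessarily an error; hdfs namenode -format prints it at the end of every run. Here is a minimal sketch of how one might verify the format and the file system, assuming the same $HADOOP_PREFIX layout as above:
# Re-run the format; on success the output should include a "successfully formatted" line for the storage directory
$HADOOP_PREFIX/bin/hdfs namenode -format
# Bring up HDFS and confirm it answers a simple listing before running the YARN test
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
$HADOOP_PREFIX/bin/hdfs dfs -ls /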
Related
I had Hadoop running on my machine, but I was running into some compiler issues, so I deleted it and started fresh.
I was following this setup: https://www.guru99.com/how-to-install-hadoop.html
When I run $HADOOP_HOME/bin/hdfs namenode -format
the terminal doesn't return anything.
Thanks in advance.
I had the same issue and fixed it by doing the following:
Change into the etc directory and adjust the files hadoop-env, core-site, mapred, and yarn.
Change into sbin to format and launch the HDFS services.
Running with sudo will get you a result, for example:
sudo hdfs namenode -format
Make sure that you have already adjusted the files hadoop-env, core-site, mapred, and yarn before proceeding to the namenode format.
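To make that ordering concrete, here is a rough sketch of the whole sequence, assuming $HADOOP_HOME points at the installation and that fs.defaultFS is set to something like hdfs://localhost:9000 in core-site.xml (adjust paths and values to your own setup):
cd $HADOOP_HOME/etc/hadoop
# Edit hadoop-env.sh (JAVA_HOME), core-site.xml (fs.defaultFS), mapred-site.xml and yarn-site.xml first
cd $HADOOP_HOME
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh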
This is my first time installing Hadoop on Linux (a Fedora distro) running in a VM (using Parallels on my Mac). I followed every step in this video, including the textual version of it. Then, when I open localhost (or the equivalent value from hostname) on port 50070, I get the following message:
...can't establish a connection to the server at localhost:50070
By the way, when I run the jps command I don't see the datanode and namenode, unlike at the end of the textual version of the tutorial. Mine has only the following processes running:
6021 NodeManager
3947 SecondaryNameNode
5788 ResourceManager
8941 Jps
When I run the hadoop namenode command, I get some of the following [redacted] errors:
Cannot access storage directory /usr/local/hadoop_store/hdfs/namenode
16/10/11 21:52:45 WARN namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop_store/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
By the way, I tried to access the above-mentioned directory, and it does exist.
Any hint for this newbie? ;-)
You would need to give read and write permissions on the directory /usr/local/hadoop_store/hdfs/namenode to the user with which you are running the services.
Once that is done, you should run the format command: hadoop namenode -format
Then try to start your services.
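As a sketch, assuming the daemons run as a user called hduser in a group called hadoop (substitute whatever user and group you actually use):
# Hand the storage directory to the user that runs the namenode/datanode
sudo chown -R hduser:hadoop /usr/local/hadoop_store/hdfs
sudo chmod -R 750 /usr/local/hadoop_store/hdfs
# Then reformat and restart the services
hadoop namenode -format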
Delete the files under /app/hadoop/tmp/*,
then try formatting the namenode again and run start-dfs.sh and start-yarn.sh.
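A minimal sketch of that sequence, assuming /app/hadoop/tmp is the hadoop.tmp.dir configured in core-site.xml (note that this wipes any existing HDFS data):
rm -rf /app/hadoop/tmp/*
hdfs namenode -format
start-dfs.sh
start-yarn.sh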
I have successfully installed Hadoop in my ubuntu system.
But when I run the command start-all.sh, all the daemons start except the namenode.
Please help me. I have attached an image showing the problem.
Try formatting the NameNode; it might work for you:
bin/hadoop namenode -format
and then try
start-all.sh
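If the namenode still does not come up after the format, its log file usually contains the actual exception. A sketch, assuming the default log location under $HADOOP_HOME/logs and the usual hadoop-<user>-namenode-<host>.log naming:
# Show the tail of the most recent namenode log to see why it is exiting
tail -n 50 $HADOOP_HOME/logs/hadoop-*-namenode-*.log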
I would like to see if the hdfs file system for Hadoop is working properly. I know that jps lists the daemons that are running, but I don't actually know which daemons to look for.
I ran the following commands:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
$HADOOP_PREFIX/sbin/yarn-daemon.sh start resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemon.sh start nodemanager
Only namenode, resourcemanager, and nodemanager appeared when I entered jps.
Which daemons are supposed to be running in order for hdfs/Hadoop to function? Also, what could you do to fix hdfs if it is not running?
Use any of the following approaches to check the status of your daemons:
The jps command lists all active daemons.
The command below is the most appropriate:
hadoop dfsadmin -report
This lists the details of the datanodes, which is essentially your HDFS.
You can also cat any file available on an HDFS path.
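As a concrete sketch of those two checks (the /user/test.txt path is only a placeholder for a file you know exists in HDFS):
# Report live/dead datanodes, capacity and remaining space
hadoop dfsadmin -report
# Read a file straight out of HDFS; if this prints its contents, HDFS is serving data
hadoop fs -cat /user/test.txt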
So, I spent two weeks validating my setup (it was fine) and finally found this command:
sudo -u hdfs jps
Initially, a plain jps command was showing only one process, but Hadoop 2.6 under Ubuntu 14.04 LTS was actually up. I had been using sudo to run the startup scripts.
Here is the startup sequence that works, with jps listing multiple processes:
sudo su hduser
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh
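The underlying point is that jps only lists JVMs owned by the current user, so daemons started under a different account are invisible to a plain jps. A quick, setup-agnostic sketch for checking who actually owns the Hadoop processes:
# List any running namenode process together with the user that owns it
ps -eo user,pid,args | grep -i [n]amenode
# Then run jps as that user
sudo -u hduser jps   # substitute whichever user actually started the daemons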
I am a student, interested in Hadoop and started to explore it recently.
I tried adding an additional DataNode in the pseudo-distributed mode but failed.
I am following the Yahoo developer tutorial, so the version of Hadoop I am using is hadoop-0.18.0.
I tried to start up using 2 methods I found online:
Method 1 (link)
I have a problem with this line
bin/hadoop-daemon.sh --script bin/hdfs $1 datanode $DN_CONF_OPTS
--script bin/hdfs doesn't seem to be valid in the version I am using. I changed it to --config $HADOOP_HOME/conf2 with all the configuration files in that directory, but when the script is run, it gives the error:
Usage: Java DataNode [-rollback]
Any idea what the error means? The log files are created, but the DataNode did not start.
Method 2 (link)
Basically, I duplicated the conf folder to a conf2 folder, making the necessary changes documented on the website to hadoop-site.xml and hadoop-env.sh. Then I ran the command:
./hadoop-daemon.sh --config ..../conf2 start datanode
it gives the error:
datanode running as process 4190. stop it first.
So I guess this is the 1st DataNode that was started, and the command failed to start another DataNode.
Is there anything I can do to start additional DataNode in the Yahoo VM Hadoop environment? Any help/advice would be greatly appreciated.
Hadoop start/stop scripts use /tmp as the default directory for storing the PIDs of already started daemons. In your situation, when you start the second datanode, the startup script finds the /tmp/hadoop-someuser-datanode.pid file left by the first datanode and assumes that the datanode daemon is already running.
The plain solution is to set the HADOOP_PID_DIR environment variable to something else (but not /tmp). Also, do not forget to update all the network port numbers in conf2.
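A rough sketch of that approach, assuming conf2 sits under $HADOOP_HOME (the PID directory name is arbitrary, and conf2/hadoop-site.xml must already point at its own dfs.data.dir and its own datanode ports):
export HADOOP_PID_DIR=$HOME/hadoop-pids-dn2   # any writable directory outside /tmp
mkdir -p $HADOOP_PID_DIR
$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_HOME/conf2 start datanode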
The smarter solution is to start a second VM with a Hadoop environment and join the two into a single cluster; that is the way Hadoop is intended to be used.