Hadoop localhost:9870 doesn't work until I format the HDFS NameNode

I have installed Hadoop.
When I start DFS and YARN, only the YARN web UI on localhost works. To get the DFS web UI on localhost working, I have to run "bin/hdfs namenode -format" every time I start my laptop and then start DFS; after that it works.
How can I fix this?
Sorry for my bad English.

You always have to format the NameNode before the first start.
If you need to do it more than once, look at the logs to find out why HDFS is not starting. More than likely, you're shutting down your computer without stopping the HDFS processes, and the file blocks are becoming corrupt.
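A minimal sketch of the clean shutdown and log check described above, assuming a Hadoop 3.x tarball install with HADOOP_HOME set and the stock sbin/stop-yarn.sh and sbin/stop-dfs.sh scripts. The helper names are hypothetical, and the log file pattern is an assumption based on default log naming under $HADOOP_HOME/logs; adjust it if your logs live elsewhere.

```shell
# Sketch only: stop_hadoop and show_namenode_log are hypothetical helper names.
# Assumes HADOOP_HOME points at the Hadoop install directory.

# Stop YARN first, then HDFS, so the NameNode shuts down cleanly.
stop_hadoop() {
  "$HADOOP_HOME/sbin/stop-yarn.sh" && "$HADOOP_HOME/sbin/stop-dfs.sh"
}

# Print the last lines of the NameNode log(s) to see why startup failed.
# The file name pattern is an assumption based on default log naming.
show_namenode_log() {
  local lines="${1:-50}"
  local f
  for f in "$HADOOP_HOME"/logs/hadoop-*-namenode-*.log; do
    [ -e "$f" ] || continue
    tail -n "$lines" "$f"
  done
}

# Usage, before shutting the machine down:
#   stop_hadoop
# After a failed start:
#   show_namenode_log 100
```

Running stop_hadoop before every shutdown and reading the log after a failed start should show whether the NameNode is failing because of an unclean stop or for some other reason.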

Related

Hadoop HDFS startup fails, requires formatting

I have a multi-node standalone Hadoop cluster for HDFS. I am able to load data into HDFS; however, every time I reboot my computer and start the cluster with start-dfs.sh, I don't see the dashboard until I run hdfs namenode -format, which erases all my data.
How do I start the Hadoop cluster without having to go through hdfs namenode -format?
You need to shut down HDFS and the NameNode cleanly (stop-dfs) before you shut down your computer. Otherwise, you can corrupt the NameNode, and you will then need to format it to get back to a clean state.

Hadoop : swap DataNode & NameNode without losing any HDFS data

I have a cluster of 5 machines:
1 big NameNode
4 standard DataNodes
I want to swap my current NameNode with a DataNode without losing the data stored in HDFS, so that my cluster becomes:
1 standard NameNode
3 standard DataNodes
1 big DataNode
Does anyone know a simple way to do that?
Thank you very much
Decommission the DataNode that the NameNode will be moved to.
Stop the cluster.
Create a tar archive of dfs.name.dir from the current NameNode.
Copy all Hadoop config files from the current NN to the target NN.
Point the configuration at the target NameNode by updating its name/IP in core-site.xml.
Restore the tarball of dfs.name.dir. Make sure that the full path is the same.
Now start the cluster with the new NameNode and one fewer DataNode.
Verify that everything is working perfectly.
Add the old NameNode as a DataNode by configuring it as a DataNode.
I would suggest uninstalling and then reinstalling Hadoop on both nodes so that the previous configuration does not cause any problems.
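The archive and restore steps above can be sketched roughly as follows. This is an illustration under assumptions, not a tested migration procedure: backup_name_dir and restore_name_dir are hypothetical helpers, and the paths and host name in the usage comments are placeholders for your own dfs.name.dir and target machine.

```shell
# Sketch only: backup_name_dir and restore_name_dir are hypothetical helpers.

# Archive the NameNode metadata directory with its absolute path preserved
# (minus the leading slash), so it can be restored to the same location.
backup_name_dir() {
  local name_dir="$1" tarball="$2"
  tar -czf "$tarball" -C / "${name_dir#/}"
}

# Unpack the archive under the given root ("/" on the target NameNode).
restore_name_dir() {
  local tarball="$1" root="${2:-/}"
  tar -xzf "$tarball" -C "$root"
}

# Usage, with the cluster stopped (paths and host name are placeholders):
#   backup_name_dir /data/dfs/name /tmp/namedir.tar.gz
#   scp /tmp/namedir.tar.gz new-namenode:/tmp/
#   # then, on new-namenode:
#   restore_name_dir /tmp/namedir.tar.gz /
```

Archiving relative to / is one way to satisfy the "full path is the same" requirement, since unpacking under / on the target recreates the directory at the identical absolute path.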

NameNode is not formatted

After rebooting the servers, HDFS refuses to start and keeps saying:
2016-01-09 17:39:21,117 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: NameNode is not formatted.
The data is still there! Everything seems to be okay, but it keeps failing.
I tried all the solutions out there, and none helped! I would like to avoid running a NameNode format, as I would lose all my data.
Any ideas besides rebuilding?
Thanks
Just format your NameNode using the command below:
$ bin/hdfs namenode -format
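Since formatting discards the existing metadata, it may be worth first checking whether the NameNode storage directory is still intact. A formatted storage directory contains a current/VERSION file, so a rough check could look like the sketch below. name_dir_is_formatted is a hypothetical helper, and /data/dfs/name is a placeholder for whatever dfs.namenode.name.dir resolves to on your cluster.

```shell
# Sketch only: name_dir_is_formatted is a hypothetical helper.
# A formatted NameNode storage directory holds a current/VERSION file.
name_dir_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Usage (the path below is a placeholder; use your dfs.namenode.name.dir):
#   hdfs getconf -confKey dfs.namenode.name.dir
#   if name_dir_is_formatted /data/dfs/name; then
#     echo "metadata present: check permissions and config before formatting"
#   else
#     echo "no current/VERSION: directory is missing, empty, or the wrong path"
#   fi
```

If the file is present but the NameNode still reports that it is not formatted, the configured path or the directory permissions are worth checking before anything is erased.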

Need help adding multiple DataNodes in pseudo-distributed mode (one machine), using Hadoop-0.18.0

I am a student interested in Hadoop, and I started exploring it recently.
I tried adding an additional DataNode in pseudo-distributed mode but failed.
I am following the Yahoo developer tutorial, so the version of Hadoop I am using is hadoop-0.18.0.
I tried to start it using 2 methods I found online:
Method 1 (link)
I have a problem with this line
bin/hadoop-daemon.sh --script bin/hdfs $1 datanode $DN_CONF_OPTS
--script bin/hdfs doesn't seem to be valid in the version I am using. I changed it to --config $HADOOP_HOME/conf2, with all the configuration files in that directory, but when the script is run it gives the error:
Usage: Java DataNode [-rollback]
Any idea what the error means? The log files are created, but the DataNode did not start.
Method 2 (link)
Basically, I duplicated the conf folder as conf2, making the necessary changes documented on the website to hadoop-site.xml and hadoop-env.sh. Then I ran the command
./hadoop-daemon.sh --config ..../conf2 start datanode
It gives the error:
datanode running as process 4190. stop it first.
So I guess this is the first DataNode that was started, and the command failed to start another DataNode.
Is there anything I can do to start an additional DataNode in the Yahoo VM Hadoop environment? Any help or advice would be greatly appreciated.
The Hadoop start/stop scripts use /tmp as the default directory for storing the PIDs of already started daemons. In your situation, when you start the second DataNode, the startup script finds the /tmp/hadoop-someuser-datanode.pid file from the first DataNode and assumes that the DataNode daemon is already running.
The plain solution is to set the HADOOP_PID_DIR environment variable to something other than /tmp. Also, do not forget to update all network port numbers in conf2.
The smart solution is to start a second VM with a Hadoop environment and join the two in a single cluster. That is how Hadoop is intended to be used.
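The plain solution might look roughly like this. It is a sketch under assumptions: start_extra_datanode is a hypothetical helper, HADOOP_HOME is assumed to be set, bin/hadoop-daemon.sh is assumed to honor HADOOP_PID_DIR from the environment (a value set in the conf directory's hadoop-env.sh may take precedence), and the directories in the usage comment are placeholders.

```shell
# Sketch only: start_extra_datanode is a hypothetical helper.
# Assumes HADOOP_HOME is set and conf_dir holds the second DataNode's config
# (with its own ports and data directory).
start_extra_datanode() {
  local conf_dir="$1" pid_dir="$2"
  mkdir -p "$pid_dir"
  # A separate PID directory keeps the start script from finding the
  # first DataNode's PID file and refusing to start.
  HADOOP_PID_DIR="$pid_dir" \
    "$HADOOP_HOME/bin/hadoop-daemon.sh" --config "$conf_dir" start datanode
}

# Usage (directories are placeholders):
#   start_extra_datanode "$HADOOP_HOME/conf2" "$HOME/hadoop-pids/dn2"
```

Setting the variable only for this one invocation leaves the first DataNode's PID file in /tmp untouched, so the two daemons can be stopped independently as long as the same HADOOP_PID_DIR is used when stopping the second one.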

Hadoop installation - Datanode running, but not showing in JPS

I have installed CDH3U5 on a 2-node cluster. Everything seems to run fine: all the services, the web UI, MR jobs, and HDFS shell commands. Interestingly, however, when I started the DataNode service, it gave me an OK message saying that the DataNode is running as process, say, X. But when I run jps, I do not see the label "DataNode" for the process. The output looks like this:
17153 TaskTracker
18908 Jps
16267
Process ID 16267 is the DataNode process. All other checks have passed, so this seems weird. The same thing happens on the other node in the cluster. Any insight into this behavior, and whether it is something that needs fixing, would be helpful.
Can you check the following and reply?
- the NameNode web interface: what does it show for live nodes?
- the DataNode log files, to see whether there are any exceptions
- whether the DataNode is reachable by ping/ssh from the NameNode and vice versa
If all of the above look OK, I'm not sure what the problem is, but to fix it you can
- stop all Hadoop daemons
- delete the temp directory pointed to in conf/core-site.xml for both the NN and DN
- format the NameNode
- start the daemons
