How can I troubshoot this Hadoop filesystem installation error? - hadoop

I'm trying to install Hadoop on a non-Cloudera Ubuntu test image. Everything seems to have been going well until I ran ./bin/start-all.sh. The name node never comes up so I can't even run a hadoop fs -ls to connect to the filesystem.
Here's the namenode log:
2011-03-24 11:38:00,256 INFO org.apache.hadoop.ipc.Server: Stopping server on 54310
2011-03-24 11:38:00,257 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:312)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:293)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:224)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:306)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1006)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1015)
2011-03-24 11:38:00,258 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Brash/192.168.1.5
************************************************************/
I've chmod -R 755 on the root directory and even gone so far as to make sure the directory exists by creating it with mkdir -p.
hadoop#Brash:/usr/lib/hadoop$ ls -la /usr/local/hadoop-datastore/hadoop-hadoop/dfs/
total 16
drwxr-xr-x 4 hadoop hadoop 4096 2011-03-24 11:41 .
drwxr-xr-x 4 hadoop hadoop 4096 2011-03-24 11:31 ..
drwxr-xr-x 2 hadoop hadoop 4096 2011-03-24 11:31 data
drwxr-xr-x 2 hadoop hadoop 4096 2011-03-24 11:41 name
Here's my /conf/hdfs-site.xml:
hadoop#Brash:/usr/lib/hadoop$ cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>

You should never have to create the directory yourself. It will create it on its own. Did you forget to format namenode? Delete the existing directory, then reformat the namenode (bin/hadoop namenode -format) and try again.

Related

Hadoop Edge HDFS points to local FS

I have done my Hadoop cluster including 1 NameNode and 2 DataNodes and everything works perfectly :)
Now, I want to add a Hadoop Edge (aka Hadoop Gateway), I followed instructions here and finally, I execute :
hadoop fs -ls /
But unfortunately, I expected to see my HDFS's content but I see my local FS :
Found 22 items
-rw-r--r-- 1 root root 0 2017-03-30 16:44 /autorelabel
dr-xr-xr-x - root root 20480 2017-03-30 16:49 /bin
...
drwxr-xr-x - root root 20480 2016-07-08 17:31 /home
I think my core-site.xml is configurated as needed with specific property :
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopnodemaster1:8020/</value>
</property>
hadoopmaster1 is my namenode and is reachable ..
I don't understand why I see my Local FS and not my HDFS .. Thank you :)

Incompatible clusterIDs in datanode and namenode

I checked solutions in this site.
I went to the (hadoop folder)/data/dfs/datanode to change ID.
but, there are not anything in datanode folder.
what can I do?
Thank for reading.
And If you help me, I will be appreciate you.
PS
2017-04-11 20:24:05,507 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/tmp/hadoop-knu/dfs/data/
java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-knu/dfs/data: namenode clusterID = CID-4491e2ea-b0dd-4e54-a37a-b18aaaf5383b; datanode clusterID = CID-13a3b8e1-2f8e-4dd2-bcf9-c602420c1d3d
2017-04-11 20:24:05,509 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to localhost/127.0.0.1:9010. Exiting.
java.io.IOException: All specified directories are failed to load.
2017-04-11 20:24:05,509 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool (Datanode Uuid unassigned) service to localhost/127.0.0.1:9010
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9010</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/knu/hadoop/hadoop-2.7.3/data/dfs/namenode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/home/knu/hadoop/hadoop-2.7.3/data/dfs/namesecondary</value>
</property>
<property>
<name>dfs.dataode.data.dir</name>
<value>/home/knu/hadoop/hadoop-2.7.3/data/dfs/datanode</value>
</property>
<property>
<name>dfs.http.address</name>
<value>localhost:50070</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>localhost:50090</value>
</property>
</configuration>
PS2
[knu#localhost ~]$ ls -l /home/knu/hadoop/hadoop-2.7.3/data/dfs/
drwxrwxr-x. 2 knu knu 6 4월 11 21:28 datanode
drwxrwxr-x. 3 knu knu 40 4월 11 22:15 namenode
drwxrwxr-x. 3 knu knu 40 4월 11 22:15 namesecondary
The problem is with the property name dfs.datanode.data.dir, it is misspelt as dfs.dataode.data.dir. This invalidates the property from being recognised and as a result, the default location of ${hadoop.tmp.dir}/hadoop-${USER}/dfs/data is used as data directory.
hadoop.tmp.dir is /tmp by default, on every reboot the contents of this directory will be deleted and forces datanode to recreate the folder on startup. And thus Incompatible clusterIDs.
Edit this property name in hdfs-site.xml before formatting the namenode and starting the services.
My solution :
rm -rf ./tmp/hadoop-${user}/dfs/data/*
./bin/hadoop namenode -format
./sbin/hadoop-daemon.sh start datanode
Try formatting the namenode and then restarting HDFS.
Copy the cluster under directory /hadoop/bin$:
./hdfs namenode -format -clusterId CID-a5a9348a-3890-4dce-94dc-0fec2ba999a9

Unable to start datanode and file permissions of datanode are changing when start-dfs.sh is started

I was facing issues in deploying local files to hdfs and found that I should have "drwx------" for datanode and namenode.
Initial permission status of datanode and namenode in hdfs.
drwx------ 3 hduser hadoop 4096 Mar 2 16:45 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 17:30 namenode
Permission of datanode is changed to 755
hduser#pradeep:~$ chmod -R 755 /usr/local/hadoop_store/hdfs/
hduser#pradeep:~$ ls -l /usr/local/hadoop_store/hdfs/
total 8
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 16:45 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 17:30 namenode
After initiating start-dfs.sh, datanode didn't start and permissions to datanode were restored to original state.
hduser#pradeep:~$ $HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop- hduser-namenode-pradeep.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-pradeep.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-pradeep.out
hduser#pradeep:~$ jps
4385 Jps
3903 NameNode
4255 SecondaryNameNode
hduser#pradeep:~$ ls -l /usr/local/hadoop_store/hdfs/
total 8
drwx------ 3 hduser hadoop 4096 Mar 2 22:34 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 22:34 namenode
As datanode is not running i am not able to deploy data to hdfs from local file system. I couldn't understand or find any reason why the file permissions are restored to previous state only for datanode folder.
it appears the name space id generated by the NameNode is different from your DataNode.
Solution:
if you goto the path where your hadoop files are stored on the local file system.
forexample /usr/local/hadoop. go down the path to /usr/local/hadoop/tmp/dfs/name/version. copy the namespaceid and take it to the path /usr/local/hadoop/tmp/dfs/data/version , replace the namespaceid.
i hope this helps.

HDFS Federation Unknown Namespace

Let's say that I have configured two name nodes to manage /marketing and /finance respectively. I am wondering what is going to be happened if I were to put a file in /accounting directory. Would HDFS accept the file? If so, which namespace manages the file?
The write will fail. Neither namespace will manage the file.
You will get an IOException with a No such file or directory error from the ViewFs client.
For example, given the following ViewFs config in core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>viewfs:///</value>
</property>
<property>
<name>fs.viewfs.mounttable.default.link./namenode-a</name>
<value>hdfs://namenode-a</value>
</property>
<property>
<name>fs.viewfs.mounttable.default.link./namenode-b</name>
<value>hdfs://namenode-b</value>
</property>
</configuration>
The following behavior is exhibited:
$ bin/hdfs dfs -ls /
-r--r--r-- - sirianni gopher 0 2013-10-22 15:58 /namenode-a
-r--r--r-- - sirianni gopher 0 2013-10-22 15:58 /namenode-b
$ bin/hdfs dfs -copyFromLocal /tmp/bar.txt /foo/bar.txt
copyFromLocal: `/foo/bar.txt': No such file or directory

Hadoop previous.checkpoint location

I tried hadoop-1.0.3 as well as 1.0.4. Both under pseudo cluster mode.
My understanding is that the previous.checkpoint directory should be created under secondary name node designated by "fs.checkpoint.dir"? On all occasions I am finding it under the namenode directory designated by "dfs.name.dir". It this something to do with Pseudo mode or my understanding is wrong? Can someone please help!
Below is my hdfs-site.xml file.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/lab/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/lab/hdfs/datanode</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/home/hadoop/lab/hdfs/secnamenode</value>
</property>
</configuration>
The below is high level directory structure for the daemons
hadoop#ubuntu:~/lab/hdfs$ ls -l
total 12
drwxr-xr-x 6 hadoop hadoop 4096 Mar 14 03:41 datanode
drwxrwxr-x 5 hadoop hadoop 4096 Mar 14 03:41 namenode
drwxrwxr-x 4 hadoop hadoop 4096 Mar 14 04:46 secnamenode
Below is the NameNode directory details
hadoop#ubuntu:~/lab/hdfs$ ls -l namenode
total 12
drwxrwxr-x 2 hadoop hadoop 4096 Mar 14 04:46 current
drwxrwxr-x 2 hadoop hadoop 4096 Mar 14 03:13 image
-rw-rw-r-- 1 hadoop hadoop 0 Mar 14 03:41 in_use.lock
drwxrwxr-x 2 hadoop hadoop 4096 Mar 14 03:34 previous.checkpoint
Below is the SNN directory details
hadoop#ubuntu:~/lab/hdfs$ ls -l secnamenode
total 8
drwxrwxr-x 2 hadoop hadoop 4096 Mar 14 04:46 current
drwxrwxr-x 2 hadoop hadoop 4096 Mar 14 03:46 image
-rw-rw-r-- 1 hadoop hadoop 0 Mar 14 03:41 in_use.lock
If you need any further details please let me know.
Thanks
Rags
I have done an extensive and desperate search about this. There seems to be a long pending Bug HDFS-1839 which removes previous.checkpoint directory from the SecondaryNameNode. The same bug might be responsible for having this directory created under the NameNode.
I have seen all the versions of hadoop so far and under all these the previous.checkpoint directory is consistently being created under the NameNode.
Hope soon this bug will be fixed or Apache Hadoop clarifies why directory is created under NameNode

Resources