HDFS Namenode High Availability - hadoop

I enabled NameNode High Availability using Ambari.
I want to verify the connection using dfs.nameservices (the nameservice ID) before I start coding.
Is there any command line or tool to verify it?

You can use the normal HDFS CLI.
hdfs dfs -ls hdfs://nameservice/user
which should work the same as
hdfs dfs -ls hdfs:///user
or as pointing directly at your active NameNode:
hdfs dfs -ls hdfs://namenode-1:port/user
If you point it at the standby NameNode instead, it will fail with "Operation category READ is not supported in state standby".
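If you just want to confirm the HA setup before writing any code, the nameservice and the NameNode states can also be read directly from the client configuration and the HA admin tool. A minimal sketch, assuming the NameNode IDs are nn1 and nn2 (Ambari's usual defaults; check dfs.ha.namenodes.<nameservice> for the actual IDs):
hdfs getconf -confKey dfs.nameservices    # prints the nameservice ID
hdfs getconf -namenodes                   # prints the NameNode hosts
hdfs haadmin -getServiceState nn1         # prints "active" or "standby"
hdfs haadmin -getServiceState nn2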

Related

hadoop fs relates to any generic filesystem supported by Hadoop, but hdfs dfs relates to HDFS only

hadoop fs relates to any generic filesystem supported by Hadoop, but hdfs dfs relates to HDFS only.
Then why is the following command allowed?
hdfs dfs -ls file:///
How can I access the local filesystem using hdfs dfs?
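For comparison, both shells dispatch on the URI scheme, so the same listing can be aimed at either filesystem explicitly. A minimal sketch (the paths are just placeholders):
hadoop fs -ls file:///tmp    # local filesystem, explicit scheme
hadoop fs -ls hdfs:///tmp    # HDFS, authority taken from fs.defaultFS
hdfs dfs -ls file:///tmp     # hdfs dfs accepts the file:// scheme too, as the question shows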

Adding a new Namenode to an existing HDFS cluster

In Hadoop HDFS Federation, the last step of adding a new NameNode to an existing HDFS cluster is:
==> Refresh the DataNodes to pick up the newly added NameNode by running the following command against all the DataNodes in the cluster:
[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNameNodes <datanode_host_name>:<datanode_rpc_port>
Which is the best place to execute the following command: the NameNode or a DataNode?
If I have 1000 DataNodes, is it logical to run it 1000 times?
On the NameNode, run this command once:
$HADOOP_PREFIX/sbin/slaves.sh hdfs dfsadmin -refreshNameNodes <datanode_host_name>:<datanode_rpc_port>
The slaves.sh script will run the command on all the slave hosts listed in the slaves file (typically placed in $HADOOP_CONF_DIR).
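If slaves.sh is not available, the same slaves file can be used to drive a plain loop from one host. A minimal sketch, assuming the Hadoop 2.x default DataNode IPC port 50020 (dfs.datanode.ipc.address) and a slaves file in $HADOOP_CONF_DIR:
for host in $(cat "$HADOOP_CONF_DIR/slaves"); do
  # ask each DataNode to re-read the NameNode list and register with the new one
  hdfs dfsadmin -refreshNameNodes "${host}:50020"
done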

How to remove removed datanode details from hadoop cluster

I used the following property to reduce the dead node timeout:
Property name: dfs.heartbeat.recheck.interval
Value: 1
But when I remove a DataNode from the cluster, its details are not removed from the cluster; it just stays listed as a dead node.
Please suggest a way to remove a removed DataNode's details from the Hadoop cluster.
You can view the live nodes or the dead nodes alone by using the HDFS commands below:
hdfs dfsadmin -report -live
hdfs dfsadmin -report -dead
You can get the live node names, or any other particular detail, using the HDFS command below:
hdfs dfsadmin -report -live | grep Name:
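To check whether the removed node is still being reported, the dead-node report can be filtered for its hostname in the same way (a minimal sketch; removed-host is just a placeholder for the decommissioned node's name):
hdfs dfsadmin -report -dead | grep removed-host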
Hope it helps.

Cloudera CDH4 - how come I can't browse the hdfs filesystem from the nodes?

I installed my test cluster using the free edition of Cloudera Manager.
I can only browse the filesystem from the main NameNode; running hadoop dfs -ls on the other nodes only shows the local folder.
jps shows Jps, TaskTracker, and DataNode running on the nodes.
MapReduce tasks/jobs run fine across all the nodes as a cluster.
With my custom-setup Hadoop cluster (without Cloudera), I can easily browse and manipulate the HDFS filesystem from any node (e.g. I can run hadoop dfs -mkdir test1 on all the nodes, but in CDH4 only on the NameNode).
Why is this?
Try using the command ./bin/hadoop fs -ls / for HDFS browsing.
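If an absolute path still lists the local filesystem, it is worth checking which default filesystem that node's client configuration points at, or bypassing it with a full URI. A minimal sketch (namenode-host and port are placeholders for your cluster's values):
hdfs getconf -confKey fs.defaultFS          # should print an hdfs:// URI, not file:///
hadoop fs -ls hdfs://namenode-host:port/    # explicit URI instead of relying on fs.defaultFS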

SafeModeException : Name node is in safe mode

I tried copying files from my local disk to HDFS. At first it gave a SafeModeException. While searching for a solution, I read that the problem does not appear if one executes the same command again. So I tried again, and it didn't give the exception.
hduser@saket:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg/ /user/hduser/gutenberg
copyFromLocal: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/hduser/gutenberg. Name node is in safe mode.
hduser@saket:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg/ /user/hduser/gutenberg
Why is this happening? Should I keep safemode off by using this command?
hadoop dfsadmin -safemode leave
The NameNode stays in safemode until the configured percentage of blocks has been reported as available by the DataNodes. The threshold is set by the parameter dfs.namenode.safemode.threshold-pct in hdfs-site.xml.
For small/development clusters, where you have very few blocks, it makes sense to set this parameter lower than its default value of 0.999f. Otherwise one missing block can leave the system hanging in safemode.
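Before changing anything, the current state and the configured threshold can both be read from the command line (a minimal sketch; both are standard hdfs subcommands):
hdfs dfsadmin -safemode get                                  # prints "Safe mode is ON" or "Safe mode is OFF"
hdfs getconf -confKey dfs.namenode.safemode.threshold-pct    # prints the configured threshold
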
Go to the Hadoop bin directory (on my system, /usr/local/hadoop/bin/):
cd /usr/local/hadoop/bin/
Check that there is a file named hadoop:
hadoopuser@arul-PC:/usr/local/hadoop/bin$ ls
The output will be:
hadoop hadoop-daemons.sh start-all.sh start-jobhistoryserver.sh stop-balancer.sh stop-mapred.sh
hadoop-config.sh rcc start-balancer.sh start-mapred.sh stop-dfs.sh task-controller
hadoop-daemon.sh slaves.sh start-dfs.sh stop-all.sh stop-jobhistoryserver.sh
Then turn off safe mode by using the command ./hadoop dfsadmin -safemode leave:
hadoopuser@arul-PC:/usr/local/hadoop/bin$ ./hadoop dfsadmin -safemode leave
You will get the response:
Safe mode is OFF
Note: I created the Hadoop user with the name hadoopuser.
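On current Hadoop releases the hadoop dfsadmin form is deprecated in favour of hdfs dfsadmin, so the equivalent (a minimal sketch, runnable from anywhere on the PATH) is:
hdfs dfsadmin -safemode leave    # force the NameNode out of safemode
hdfs dfsadmin -safemode get      # confirm: prints "Safe mode is OFF"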
