Cloudera CDH4 - how come I can't browse the hdfs filesystem from the nodes? - hadoop

I installed my test cluster using Cloudera Manager free.
I can only browse the filesystem from the main NameNode; running hadoop dfs -ls on the other nodes only shows the local folder.
jps shows Jps, TaskTracker, and DataNode running on the nodes.
MapReduce tasks/jobs run fine on all the nodes as a cluster.
With my custom-built Hadoop cluster (without Cloudera), I can easily browse and manipulate the HDFS filesystem (e.g. I can run hadoop dfs -mkdir test1 from any node, but in CDH4 it only works from the NameNode).
Why is this?

Try using the command ./bin/hadoop fs -ls / for HDFS browsing.
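The leading / asks explicitly for the HDFS root rather than your (possibly missing) home directory. A rough sketch of checks to run on one of the slave nodes, assuming an illustrative NameNode hostname and the default CDH NameNode port 8020:
hadoop fs -ls /
# If this still lists local files, the node's client configuration is probably missing the NameNode URI;
# check fs.default.name (or fs.defaultFS) in /etc/hadoop/conf/core-site.xml (CDH's usual config path) on that node.
hadoop fs -ls hdfs://namenode-host:8020/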

Related

HDFS Namenode High Availability

I enabled NameNode High Availability using Ambari.
I want to verify the connection using dfs.nameservices (the nameservice ID) before I start coding.
Is there any command line or tool to verify it?
You can use the normal HDFS CLI.
hdfs dfs -ls hdfs://nameservice/user
Which should also work the same as
hdfs dfs -ls hdfs:///user
Or, giving your active NameNode explicitly:
hdfs dfs -ls hdfs://namenode-1:port/user
If you point at the standby NameNode instead, it will report that operation category READ is not supported in state standby.
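You can also ask HDFS which NameNode is currently active. A quick sketch, assuming the NameNode IDs are nn1 and nn2 (they come from dfs.ha.namenodes.<nameservice> in hdfs-site.xml, so yours may differ):
hdfs haadmin -getServiceState nn1   # prints "active" or "standby"
hdfs haadmin -getServiceState nn2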

Adding a new Namenode to an existing HDFS cluster

In Hadoop HDFS Federation, the last step of adding a new NameNode to an existing HDFS cluster is:
==> Refresh the Datanodes to pickup the newly added Namenode by running the following command against all the Datanodes in the cluster:
[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNameNodes <datanode_host_name>:<datanode_rpc_port>
Which is the best place to execute this command: the NameNode or a DataNode?
If I have 1000 DataNodes, does it make sense to run it 1000 times?
On the NameNode, run this command once:
$HADOOP_PREFIX/sbin/slaves.sh hdfs dfsadmin -refreshNameNodes <datanode_host_name>:<datanode_rpc_port>
The slaves.sh script distributes the command to all the slave hosts listed in the slaves file (typically placed in $HADOOP_CONF_DIR).
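A rough sketch of how that looks, with three illustrative DataNode hostnames and the default DataNode IPC port 50020 (dfs.datanode.ipc.address); since each slave runs the command locally, localhost works as the target:
# $HADOOP_CONF_DIR/slaves -- one DataNode hostname per line (names are illustrative)
datanode1
datanode2
datanode3
# Run once from the NameNode; slaves.sh ssh'es into every host listed above,
# and each DataNode then refreshes itself over its local RPC port.
$HADOOP_PREFIX/sbin/slaves.sh $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNameNodes localhost:50020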

No folders in Hadoop 2.6 after installing

I am new to Hadoop. I successfully installed Hadoop 2.6 on my Ubuntu 12.04 machine by following the link below.
Hadoop 2.6 Installation
All services are running, but when I try to load a file from the local filesystem into HDFS, it does not show any folders in HDFS such as /user or /data.
hduse#vijee-Lenovo-IdeaPad-S510p:~$ jps
4163 SecondaryNameNode
4374 ResourceManager
3783 DataNode
3447 NameNode
5048 RunJar
18538 Jps
4717 NodeManager
hduse#vijee-Lenovo-IdeaPad-S510p:~$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar
hduse#vijee-Lenovo-IdeaPad-S510p:~$ hadoop fs -ls hdfs:/
No output
If I run the above command, hadoop fs -ls hdfs:/, it does not show any folders. I installed Pig as well and now want to load data into Pig in MapReduce mode. Most websites just mention a URI in place of the HDFS path. Please guide me on how to create folders and load data into HDFS.
If you are using plain vanilla Hadoop, you will not see any directories; you have to create them yourself.
You can start by running hadoop fs -mkdir /user
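A minimal sketch of bootstrapping the usual directories and copying a first file in (the user name hduse and the file input.txt are illustrative; adjust to your own):
hadoop fs -mkdir -p /user/hduse      # -p creates parent directories, like mkdir -p
hadoop fs -mkdir -p /data
hadoop fs -put input.txt /user/hduse/    # copy a local file into HDFS
hadoop fs -ls /                          # should now list /user and /data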

Not able to access HDFS from the Datanodes in the cluster

I have installed Cloudera CDH4 on a 3-node cluster and am facing a problem when trying to access data in HDFS from the slave nodes (DataNodes).
When I try to create a new folder in HDFS using
hadoop fs -mkdir Flume (where Flume is the folder name)
I am not able to put data or create the folder in the cluster's HDFS from either of the slaves, but it works from the master node. Flume, Hive, Pig and all the other processes are running on the slaves (DataNodes).
I have tried:
restarting the cluster
formatting the NameNode
Still not working!
Secondly, when I run
hadoop fs -ls /
the results are not from HDFS but from the current directory of the slave node where I run the command.
Also, how can I check whether HDFS is installed and working properly on the slave nodes (DataNodes), apart from creating a directory in HDFS?
Could anybody help?
Please verify the fs.default.name property in core-site.xml on every node:
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode:9000</value>
</property>
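The value must match on every node; hdfs://namenode:9000 is just an example, so use your own NameNode host and port. To see what a slave node actually resolves, something like the following should work (the explicit URI bypasses the default filesystem; hdfs getconf is available on recent releases):
hdfs getconf -confKey fs.default.name    # shows the value the client configuration picks up
hadoop fs -ls hdfs://namenode:9000/      # talks to the NameNode directly, regardless of the default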

How to remove a hadoop node from DFS but not from Mapred?

I am fairly new to Hadoop. For running some benchmarks, I need a variety of Hadoop configurations for comparison.
I want to know a way to remove a Hadoop slave from DFS (stop running the DataNode daemon) but not from MapReduce (keep the TaskTracker running), or vice versa.
AFAIK, there is a single slaves file for such Hadoop nodes, not separate slaves files for DFS and MapReduce.
Currently I start both DFS and MapReduce on the slave node and then kill the DataNode on it, but it takes a while for that node to show up under 'dead nodes' in the HDFS web UI. Is there any parameter that can be tuned to make this timeout shorter?
Thanks
Try using dfs.hosts and dfs.hosts.exclude in hdfs-site.xml, and mapred.hosts and mapred.hosts.exclude in mapred-site.xml. These control which hosts are allowed to connect to the NameNode and the JobTracker respectively.
Once the lists of nodes in the files have been updated appropriately, the NameNode and the JobTracker have to be refreshed using the hadoop dfsadmin -refreshNodes and hadoop mradmin -refreshNodes commands respectively.
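A minimal sketch of removing one slave from HDFS only (the exclude-file path and hostname are illustrative, and dfs.hosts.exclude in hdfs-site.xml has to point at that file first):
echo "slave-to-remove" >> /etc/hadoop/conf/dfs.exclude   # exclude this host from HDFS only
# leave the mapred exclude file empty so its TaskTracker stays in the MapReduce cluster
hadoop dfsadmin -refreshNodes    # NameNode starts decommissioning the DataNode
hadoop mradmin -refreshNodes     # JobTracker re-reads its (unchanged) host lists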
Instead of using the slaves file to start all processes on your cluster, you can start only the required daemons on each machine individually if you have only a few nodes.
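For example, to keep only the TaskTracker on a given slave (the script lives in bin/ or sbin/ depending on the release):
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker   # MapReduce only on this host
# and simply don't start the DataNode here:
# $HADOOP_HOME/bin/hadoop-daemon.sh start datanode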
