I am new to Hadoop. I successfully installed Hadoop 2.6 on Ubuntu 12.04 by following the link below.
Hadoop 2.6 Installation
All services are running, but when I try to load a file from the local filesystem into HDFS, it does not show any folders in HDFS such as /user or /data.
hduse#vijee-Lenovo-IdeaPad-S510p:~$ jps
4163 SecondaryNameNode
4374 ResourceManager
3783 DataNode
3447 NameNode
5048 RunJar
18538 Jps
4717 NodeManager
hduse#vijee-Lenovo-IdeaPad-S510p:~$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar
hduse#vijee-Lenovo-IdeaPad-S510p:~$ hadoop fs -ls hdfs:/
No output
If I run the above command, hadoop fs -ls hdfs:/, it does not show any folders. I have installed Pig as well, and now I want to load data into Pig in MapReduce mode. Most websites just mention a URI in place of the HDFS path. Please guide me on how to create folders and load data into HDFS.
If you are using plain vanilla Hadoop, you will not see any directories. You have to create them yourself.
You can start by running hadoop fs -mkdir /user
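For example, a minimal sequence to create a home directory and load a local file into it might look like this (the user name hduser and the file sample.txt below are only placeholders, not from the original setup):
hadoop fs -mkdir -p /user/hduser                       # create a home directory in HDFS
hadoop fs -put /home/hduser/sample.txt /user/hduser/   # copy a local file into it
hadoop fs -ls /user/hduser                             # verify the upload
After that, hadoop fs -ls hdfs:/ (or hadoop fs -ls /) should show the /user directory.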
Related
I would like to see if the HDFS file system for Hadoop is working properly. I know that jps lists the daemons that are running, but I don't actually know which daemons to look for.
I ran the following commands:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
$HADOOP_PREFIX/sbin/yarn-daemon.sh start resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemon.sh start nodemanager
Only namenode, resourcemanager, and nodemanager appeared when I entered jps.
Which daemons are supposed to be running in order for hdfs/Hadoop to function? Also, what could you do to fix hdfs if it is not running?
Use any of the following approaches to check the status of your daemons:
The jps command lists all active daemons.
The command below is the most appropriate check:
hadoop dfsadmin -report
This lists details of the DataNodes, which is essentially what makes up your HDFS.
You can also cat any file available in the HDFS path.
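A quick health-check session, as a sketch (the file path used with -cat is only an example), could be:
hdfs dfsadmin -report                    # live/dead DataNodes, configured and remaining capacity
hdfs fsck /                              # scans HDFS for missing or corrupt blocks
hadoop fs -cat /user/hduser/sample.txt   # read a file back through HDFS
If the report shows at least one live DataNode and the cat succeeds, HDFS is working.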
So, I spent two weeks validating my setup (it was fine), and finally found this command:
sudo -u hdfs jps
Initially my plain jps command was showing only one process, but Hadoop 2.6 under Ubuntu 14.04 LTS was up. I was using sudo to run the startup scripts.
Here is the startup sequence that works, with jps listing multiple processes:
sudo su hduser
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh
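If the startup succeeds, running jps as that same user should, in my understanding, show roughly the following daemons (process IDs will of course differ):
jps    # expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager and Jps itself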
I am very new to Hadoop. I am referring to the "Hadoop For Dummies" book.
I have set up a VM with the following specs:
Hadoop version 2.0.6-alpha
Bigtop
OS: CentOS
The problem is that while running any HDFS file system command I get the following error.
Example command: hadoop hdfs dfs -ls
Error: Could not find or load main class hdfs
Please advise.
Regards,
Try running:
hadoop fs -ls
or
hdfs dfs -ls
What do they return?
fs and dfs are the same command.
Difference between `hadoop dfs` and `hadoop fs`
Remove either hadoop or hdfs from your command and it should run.
I see the step below in the CDH4 MRv1 installation instructions at:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_2.html
Step 4: Create the MapReduce system directories:
sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
But I am not able to figure out where these directories are used. Are these well-known locations, or are the paths configured in some config files? If so, can I use any structure I want?
These directories will be used by MapReduce. They match the value specified in the default configuration. I would not change them unless you have a good reason. But in the end, sure, you can configure a different location once you're up and running.
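As a sketch of where that value comes from in MRv1 (the property name below is from the stock Hadoop 1.x/MRv1 defaults and may differ slightly in your CDH4 config files), the staging root can be set in mapred-site.xml:
<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>/var/lib/hadoop-hdfs/cache/mapred/mapred/staging</value>
</property>
If it is left unset it falls back to a path under hadoop.tmp.dir, which appears to be how the Cloudera packages arrive at that particular directory.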
I installed my test cluster using Cloudera Manager free.
I can only browse the filesystem from the main NameNode. Running hadoop dfs -ls on the other nodes only shows the local folder.
jps shows Jps, TaskTracker, and DataNode on the nodes.
MapReduce tasks/jobs run fine on all the nodes as a cluster.
With my custom-built Hadoop cluster (without Cloudera), I can easily browse and manipulate the HDFS filesystem (e.g. I can run hadoop dfs -mkdir test1 on all the nodes), but with CDH4 this only works on the NameNode.
Why is this?
Try using the command ./bin/hadoop fs -ls / for HDFS browsing.
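If that still only lists local files, a likely cause is that the client configuration on those nodes does not point at the NameNode. As a rough check (the getconf subcommand is available on Hadoop 2.x clients), you can ask a node which default filesystem it is using:
hdfs getconf -confKey fs.defaultFS     # prints file:/// if the node falls back to the local filesystem
A file:/// result means core-site.xml on that node needs the hdfs://<namenode-host>:<port> URI of your NameNode.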
I've set up a single-node Hadoop configuration running via Cygwin under Win7. After starting Hadoop with bin/start-all.sh, I run bin/hadoop dfs -ls, which returns a list of files in my Hadoop directory. Then I run bin/hadoop datanode -format and bin/hadoop namenode -format, but -ls still returns the contents of my Hadoop directory. As far as I understand, it should return nothing (an empty folder). What am I doing wrong?
Did you edit core-site.xml and mapred-site.xml under the conf folder?
It seems like your Hadoop cluster is in local mode.
I know this question is quite old, but the directory structure in Hadoop has changed a bit (as of version 2.5).
Jeroen's command in the current version would be:
hdfs dfs -ls hdfs://localhost:9000/users/smalldata
Also, just for information: the use of start-all.sh and stop-all.sh has been deprecated; instead one should use start-dfs.sh and start-yarn.sh.
I had the same problem and solved it by explicitly specifying the URL to the NameNode.
To list all directories in the root of your HDFS space, do the following:
./bin/hadoop dfs -ls hdfs://<ip-of-your-server>:9000/
The documentation says something about a default HDFS entry point in the configuration, but I cannot find it. If someone knows what they mean, please enlighten us.
This is where I got the info: http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html#Overview
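For what it's worth, I believe the "default" they refer to is the default filesystem URI in conf/core-site.xml; a minimal single-node entry (localhost:9000 is just the usual example value) looks like:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
With that in place, a bare hadoop dfs -ls goes against HDFS instead of the local directory.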
Or you could just do:
Run stop-all.sh.
Remove the dfs data and name directories.
Run hadoop namenode -format.
Run start-all.sh
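As a sketch, assuming the default data locations under /tmp (check dfs.name.dir and dfs.data.dir in your hdfs-site.xml before deleting anything), the whole reset looks like:
bin/stop-all.sh
rm -rf /tmp/hadoop-$USER/dfs/name /tmp/hadoop-$USER/dfs/data   # example default paths only
bin/hadoop namenode -format
bin/start-all.sh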