Need explanation on Hadoop file system - hadoop

For the following command,
hadoop fs -put foo.txt bar.txt
After the operation succeeds, where will bar.txt be located on my local hard drive, given
a single node setup?
a pseudo-distributed setup?
Will bar.txt still get replicated 3 times for backup?

bar.txt is stored in HDFS (as blocks under the DataNode's data directories, not as a plain file on your local drive). Its HDFS path will be under the current Hadoop user's home directory,
/user/<hadoop-user>, as per the following code:
@Override
public Path getHomeDirectory() {
  return makeQualified(new Path("/user/" + dfs.ugi.getShortUserName()));
}
Source here
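A quick way to confirm this, assuming simple authentication where your HDFS username matches your OS login (the $(whoami) substitution below is just illustrative):
# Put a local file into HDFS under a new name; with no absolute destination
# it lands in the current user's HDFS home directory.
hadoop fs -put foo.txt bar.txt
# List it back; /user/$(whoami) assumes the HDFS user equals the OS user.
hadoop fs -ls /user/$(whoami)/bar.txt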
If the cluster is a single node, the block is only stored once even if you set dfs.replication to 3, because Hadoop will not save the same block on the same node more than once.
Pseudo-distributed mode runs all the Hadoop daemons on the same machine; it is nothing but a single node cluster.
If you set dfs.replication to 3 there, Hadoop just gives you a warning.
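If you want to see or change the replication factor of a specific file, here is a minimal sketch (paths with <hadoop-user> are placeholders; the -D generic option is parsed before the sub-command):
# Upload with an explicit per-file replication factor; on a single node, 1 is the practical maximum.
hadoop fs -D dfs.replication=1 -put foo.txt bar.txt
# The second column of -ls output is the recorded replication factor.
hadoop fs -ls /user/<hadoop-user>/bar.txt
# fsck reports whether any blocks are actually under-replicated.
hdfs fsck /user/<hadoop-user>/bar.txt -files -blocks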
Hope it helps!

The above fs command tries to put the file foo.txt as bar.txt into HDFS. Because you are not providing an absolute path for the destination, the destination path is determined by the user performing the operation.
If /user is configured as the home directory root in HDFS, it will take the path /user/<current-user> and place the file there.
Also, if there is no folder in HDFS corresponding to the current user, the command fails stating that the file doesn't exist.
e.g. if the current user is "testusr1", the above command places the file under /user/testusr1.
You can verify this by executing hadoop fs -ls /user/
AFAIK this should be the same for a pseudo-distributed or single node setup.
[root@sandbox ~]# hadoop fs -ls /user
Found 11 items
drwx------ - root hdfs 0 2015-04-13 03:59 /user/root
...
drwxr-xr-x - root hdfs 0 2015-04-13 04:18 /user/testusr1
[root@sandbox ~]#
[root@sandbox ~]# su - testusr1
[testusr1@sandbox ~]$ whoami
testusr1
[testusr1@sandbox ~]$ pwd
/home/testusr1
[testusr1@sandbox ~]$ ll
total 7
-rw-rw-r-- 1 testusr1 testusr1 49 2015-04-13 04:17 foo-testusr2.txt
[testusr1@sandbox ~]$ hadoop fs -put foo-testusr2.txt bar-testusr2.txt
And for the replication factor, you can check it with the help of a basic hadoop fs -ls command.
[testusr1@sandbox ~]$ exit
logout
[root@sandbox ~]# hdfs dfs -ls /user/testusr1
Found 1 items
-rw-r--r-- 1 testusr1 hdfs 49 2015-04-13 04:18 /user/testusr1/bar-testusr2.txt
[root@sandbox ~]#
In the above sample output, you can see the number 1 right after the file permissions: that is the replication factor, and it shows 1 as per my HDFS configuration.
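If you prefer an explicit readout rather than eyeballing the -ls column, -stat can print just the replication factor (path reused from the listing above; the %r format is available on reasonably recent Hadoop 2.x shells):
# Print only the replication factor of the file we just uploaded.
hadoop fs -stat "%r" /user/testusr1/bar-testusr2.txt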

Related

How to navigate directories in Hadoop HDFS

I would like to navigate in HDFS
First I looked at the directories in the "root" of HDFS:
[cloudera@localhost ~]$ sudo -u hdfs hadoop fs -ls hdfs:/
Found 5 items
drwxr-xr-x - hbase hbase 0 2015-10-10 07:03 hdfs:///hbase
drwxr-xr-x - solr solr 0 2014-06-01 16:16 hdfs:///solr
drwxrwxrwx - hdfs supergroup 0 2015-10-08 11:45 hdfs:///tmp
drwxr-xr-x - hdfs supergroup 0 2015-04-13 08:26 hdfs:///user
drwxr-xr-x - hdfs supergroup 0 2014-06-01 16:15 hdfs:///var
Then I tried entering one of them:
[cloudera@localhost ~]$ sudo -u hdfs hadoop -cd hdfs:///hbase
Error: No command named `-cd' was found. Perhaps you meant `hadoop cd'
Trying 'hadoop cd' does not work either:
[cloudera@localhost ~]$ sudo -u hdfs hadoop cd hdfs:///hbase
Exception in thread "main" java.lang.NoClassDefFoundError: cd
Caused by: java.lang.ClassNotFoundException: cd
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: cd. Program will exit.
Please don't suggest using -ls -R (recursive) to show all files.
I want to be able to navigate using commands like cd.
There is no cd (change directory) command in the HDFS file system. You can only list directories and use those listings to reach the next directory.
You have to navigate manually by providing the complete path to the ls command:
hdfs dfs -ls /user/username/app1/subdir/
hadoop fs -ls /user/scott/
To see the contents of a path, you have to give the full path; beyond that, navigation is not possible.
You can also make use of the UI to navigate, http://<hostname of hdfs>:9870/explorer.html#/tmp, or you can log in to the CDH UI and click on the NameNode URL location.
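If you really want a cd-like workflow, one workaround is a small shell wrapper that remembers a "current" HDFS directory. This is not an HDFS feature, just a hypothetical sketch (the hcd/hpwd/hls names are made up here):
HCWD=/                                   # pretend "current directory" in HDFS
hcd()  { HCWD="$1"; }                    # remember a new path
hpwd() { echo "$HCWD"; }                 # print the remembered path
hls()  { hadoop fs -ls "$HCWD/$1"; }     # list relative to the remembered path

hcd /hbase
hls                                      # same as: hadoop fs -ls /hbase/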
Guideline for the Cloudera pseudo-distributed setup:
First use the
hadoop fs -ls
command.
Then look at the listed directories; let's suppose there is a folder named output.
Use this command to see inside the output folder:
hadoop fs -ls output

Confusion on HDFS 'pwd' equivalents

First, I have read this post: Is there an equivalent to `pwd` in hdfs?. It says there is no such 'pwd' in HDFS.
However, as I progressed with the instructions of Hadoop: Setting up a Single Node Cluster, I failed on this command:
$ bin/hdfs dfs -put etc/hadoop input
put: 'input': No such file or directory
It's weird that I succeeded with this command the first time I went through the instructions, but failed the second time. It's also weird that I succeed with this command on my friend's computer, which has the same system (Ubuntu 14.04) and Hadoop version (2.7.1) as mine.
Can anyone explain what happened here? Is there some 'pwd' in HDFS after all?
Firstly, you are trying to run the command $ bin/hdfs dfs -put etc/hadoop input as a user that doesn't have a home directory in HDFS on the VM.
Let me explain clearly with the following example in the HDP VM.
[root@sandbox hadoop-hdfs-client]# bin/hdfs dfs -put /etc/hadoop input
put: `input': No such file or directory
Here I executed the command as the root user, whose home directory doesn't exist in HDFS on the HDP VM. Use the following command to list the existing user directories:
[root@sandbox hadoop-hdfs-client]# hadoop fs -ls /user
Found 8 items
drwxrwx--- - ambari-qa hdfs 0 2015-08-20 08:33 /user/ambari-qa
drwxr-xr-x - guest guest 0 2015-08-20 08:47 /user/guest
drwxr-xr-x - hcat hdfs 0 2015-08-20 08:36 /user/hcat
drwx------ - hive hdfs 0 2015-09-04 09:52 /user/hive
drwxr-xr-x - hue hue 0 2015-08-20 09:05 /user/hue
drwxrwxr-x - oozie hdfs 0 2015-08-20 08:37 /user/oozie
drwxr-xr-x - solr hdfs 0 2015-08-20 08:41 /user/solr
drwxrwxr-x - spark hdfs 0 2015-08-20 08:34 /user/spark
In HDFS, if you copy a file without mentioning an absolute path for the destination argument, it will take the home directory of the logged-in user and place your file there. Here the root user's home directory was not found.
Now let's switch to the hive user and test:
[root@sandbox hadoop-hdfs-client]# su hive
[hive@sandbox hadoop-hdfs-client]$ bin/hdfs dfs -put /etc/hadoop input
[hive@sandbox hadoop-hdfs-client]$ hadoop fs -ls /user/hive
Found 1 items
drwxr-xr-x - hive hdfs 0 2015-09-04 10:07 /user/hive/input
Yay, successfully copied!
Hope it helps!
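Alternatively, instead of switching to a user that already has a home directory, you can create one for root. A sketch assuming the hdfs user is the HDFS superuser, as on the HDP sandbox:
# Create a home directory for root in HDFS and hand it over to root.
sudo -u hdfs hdfs dfs -mkdir -p /user/root
sudo -u hdfs hdfs dfs -chown root:hdfs /user/root
# Now the relative-destination put works for root as well.
bin/hdfs dfs -put /etc/hadoop input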
It means that you need to move the input files to an HDFS location.
Suppose you have an input file named input.txt that needs to be moved to HDFS; then follow the command below.
Command: hdfs dfs -put /input_location /hdfs_location
If no specific directory in HDFS is needed:
hdfs dfs -put /home/Desktop/input.txt /
If a specific directory in HDFS is needed (note: we need to create the directory first, as shown right after this command):
hdfs dfs -put /home/Desktop/input.txt /MR_input
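Creating that target directory first might look like this (directory name taken from the example above):
hdfs dfs -mkdir /MR_input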
After that you can run the examples
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output
Here the input and output arguments are paths that must be in HDFS.
Hope this helps.

Why does my hadoop command not work?

I have my hadoop cluster set up with one master and two slaves.
When I type
hadoop fs -ls
ls: Cannot access .: No such file or directory.
But when I type the following:
hadoop fs -ls /
Found 1 items
drwxr-xr-x - Mike supergroup 0 2014-06-24 00:24 /usr
I get the same output on both the master and the slaves. Why does hadoop fs -ls not work?
Thanks!
hadoop fs -ls
This tries to list the current user's home directory on HDFS. Since the /user/{username} directory doesn't exist in your case, you get the error.
hadoop fs -ls /
Here you are specifically telling it to list the root directory, which it does successfully because it exists.
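One way to make the bare hadoop fs -ls work is to create that home directory. A sketch assuming the HDFS superuser is hdfs and the username is Mike, as in the listing above (on many setups the superuser is simply whoever started the NameNode, in which case drop the sudo):
# Create /user/Mike and make Mike its owner (-p needs Hadoop 2.x; on 1.x a plain -mkdir already creates parents).
sudo -u hdfs hadoop fs -mkdir -p /user/Mike
sudo -u hdfs hadoop fs -chown Mike /user/Mike
# The bare listing now has something to show.
hadoop fs -ls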

How to open HDFS output file using gedit?

I have installed and executed a MapReduce program successfully on my system (Ubuntu 14.04).
I can see the output files as follows:
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hadoopuser/MapReduceSample-output
Found 3 items
-rw-r--r-- 1 hadoopuser supergroup 0 2014-07-09 16:10 /user/hadoopuser/MapReduceSample-output/_SUCCESS
drwxr-xr-x - hadoopuser supergroup 0 2014-07-09 16:10 /user/hadoopuser/MapReduceSample-output/_logs
-rw-r--r-- 1 hadoopuser supergroup 880838 2014-07-09 16:10 /user/hadoopuser/MapReduceSample-output/part-00000
And I can open it in the terminal using the following command:
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -cat /user/hadoopuser/MapReduceSample-output/part-00000
I can see the output file in the terminal, but I can't see the full result because my output has a large number of lines.
So I want to open it on gedit or nano.
Need Solution.
You can also use getmerge to copy an HDFS file to the local system.
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -getmerge /user/hadoopuser/MapReduceSample-output/part-00000 /home/arul/MROutput
hadoop dfs -getmerge /path/to/HDFS /path/to/save
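A small usage note: getmerge is most useful when pointed at the whole output directory, since it concatenates every file in it into one local file (the empty _SUCCESS marker adds nothing), so multi-reducer jobs come out as a single result:
# Concatenate the job's part files into one local file.
bin/hadoop dfs -getmerge /user/hadoopuser/MapReduceSample-output /home/arul/MROutput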
Instead of looking for a plugin, you can add the jar files from $HADOOP_INSTALL/bin in Eclipse and the compiler issues should be gone.
You can't access an HDFS file directly from the local machine (as a system user), so you can't open an HDFS file with gedit directly.
To open it in gedit you have to copy it to the local machine.
To do that, open a terminal (Ctrl+Alt+T) and use copyToLocal, a Hadoop shell command, to copy the output file onto the local machine.
Do the following:
hadoopuser@arul-PC:/usr/local/hadoop$ sudo bin/hadoop dfs -copyToLocal /user/hadoopuser/MapReduceSample-output/part-00000 /home/arul/Downloads/
Now you can open the output file using gedit as follows:
$ sudo gedit /home/arul/Downloads/part-00000
Note:
My HDFS username is hadoopuser.
You can also move a file within HDFS: the Hadoop shell command fs -mv moves files between HDFS locations (to copy a file to the local machine, use -get or -copyToLocal instead).
For more, see the Hadoop Shell Commands documentation.
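If all you want is to page through the result without copying the whole file first, you can also pipe the -cat output (same output path as above):
# Page through the output directly from HDFS...
bin/hadoop dfs -cat /user/hadoopuser/MapReduceSample-output/part-00000 | less
# ...or redirect it into a local file and open that in gedit.
bin/hadoop dfs -cat /user/hadoopuser/MapReduceSample-output/part-00000 > /tmp/part-00000
gedit /tmp/part-00000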
Update (another option to do the same, from Y-Prithvi's post):
You can also use getmerge to copy an HDFS file to the local system.
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -getmerge /user/hadoopuser/MapReduceSample-output/part-00000 /home/arul/MROutput
hadoop dfs -getmerge /path/to/HDFS /path/to/save
Eclipse Setup for Hadoop Development
This should help.

Hadoop HDFS copy with wildcards?

I want to copy a certain pattern of files from within hdfs to another location in the same hdfs cluster. The dfs shell does not seem to be able to handle this:
hadoop dfs -cp /tables/weblog/server=jeckle/webapp.log.1* /tables/tinylog/server=jeckle/
No error is returned, yet no files are copied either.
You need to use double quotes around a path that contains a wildcard, like this:
hdfs dfs -cp "/path/to/foo*" /path/to/bar/
First of all, HDFS copy with wildcards is supported. Secondly, use of hadoop dfs is deprecated; you'd better use hadoop fs or hdfs dfs instead. If you're sure the operation was not successful (although it seems to have succeeded), you can check the NameNode log files to see what went wrong.
Interesting. This is what I get in my local VM running Hadoop 0.18.0. What version are you using? I can try on 1.2.1 also
hadoop-user@hadoop-desk:~$ hadoop fs -ls /user/hadoop-user/testcopy
hadoop-user@hadoop-desk:~$ hadoop dfs -cp /user/hadoop-user/input/*.txt /user/hadoop-user/testcopy/
hadoop-user@hadoop-desk:~$ hadoop fs -ls /user/hadoop-user/testcopy
Found 2 items
-rw-r--r-- 1 hadoop-user supergroup 79 2014-01-06 04:35 /user/hadoop-user/testcopy/HelloWorld.txt
-rw-r--r-- 1 hadoop-user supergroup 140 2014-01-06 04:35 /user/hadoop-user/testcopy/SampleData.txt
These both worked for me:
~]$ hadoop fs -cp -f /user/cloudera/Dec_17_2017/cric* /user/cloudera/Dec_17_2017/Dec_18
~]$ hadoop fs -cp -f "/user/cloudera/Dec_17_2017/cric*" /user/cloudera/Dec_17_2017/Dec_18
I think the better way is to not use double/single quotes at all.
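For what it's worth, the quoting question comes down to who expands the glob: unquoted, your local shell tries first (and when nothing matches locally, most shells pass the pattern through untouched, which is why both forms above worked); quoted, the pattern always reaches the HDFS shell, which does its own globbing. Reusing the question's paths:
# Quoted: the wildcard is expanded by HDFS itself, never by the local shell.
hadoop fs -cp "/tables/weblog/server=jeckle/webapp.log.1*" /tables/tinylog/server=jeckle/
# Verify what actually arrived.
hadoop fs -ls /tables/tinylog/server=jeckle/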
In case anybody wants to copy all the files and folders from the current local directory they are in at the terminal:
hdfs dfs -put ./
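If you prefer to spell out the destination explicitly (the target path is illustrative; <username> is a placeholder and the directory must already exist in HDFS):
# The local shell expands ./*, so hidden files are skipped.
hdfs dfs -put ./* /user/<username>/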
