Hadoop copying directory with contents

I am new to Hadoop and am doing a project for university. I have a folder called 'docs' that I have several text files in. When I look at it locally, I can see the various text files are there. When I copy it to Hadoop, the directory is empty.
The screenshot below shows the files in the local directory.
I use copyFromLocal to copy the directory to HDFS. As far as I can tell it should be copying the contents too?
hadoop fs -copyFromLocal ./docs
This screenshot shows the directory as empty (or is it?)

All directories (lines starting with d) show a size of 0 in the output of the HDFS ls command, so a directory can look "empty" even when it isn't. If you do hadoop fs -ls docs you'll see all of the files and their sizes.
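For example (illustrative output; the user name, file names, sizes, and timestamps here are made up):
hadoop fs -ls
Found 1 items
drwxr-xr-x   - hadoopusr supergroup          0 2021-03-01 10:15 docs
hadoop fs -ls docs
Found 2 items
-rw-r--r--   1 hadoopusr supergroup       1245 2021-03-01 10:15 docs/doc1.txt
-rw-r--r--   1 hadoopusr supergroup       2351 2021-03-01 10:16 docs/doc2.txt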

Related

Hadoop/HDFS: put command fails - No such file or directory

I do not know why I cannot move a file from one directory to another. I can view the content of the file but I cannot move the same file into another directory.
WORKS FINE:
hadoop fs -cat /user/hadoopusr/project-data.txt
DOES NOT WORK:
hadoop fs -put /user/hadoopusr/project-data.txt /user/hadoopusr/Projects/MarketAnalysis
I got a No such file or directory error message. What is wrong? Please help. Thank you!
As we can read from here about the -put command:
This command is used to copy files from the local file system to the
HDFS filesystem. This command is similar to the -copyFromLocal command.
This command will not work if the file already exists unless the -f
flag is given to the command. This overwrites the destination if the
file already exists before the copy
Which makes it clear why it doesn't work and throws the No such file or directory message: -put expects a local source path, so it looks for a file named project-data.txt in the current directory of your local filesystem and can't find one.
Since you plan on moving a file between directories inside HDFS, instead of using the -put parameter you can simply use the -mv parameter, just as you would on your local filesystem!
Tested it out on my own HDFS as follows:
Create the source and destination directories in HDFS
hadoop fs -mkdir source_dir dest_dir
Create an empty (for the sake of the test) file under the source directory
hadoop fs -touch source_dir/test.txt
Move the empty file to the destination directory
hadoop fs -mv source_dir/test.txt dest_dir/test.txt
(Notice how the /user/username/ part of the path is not needed for either the file or the destination directory, because HDFS commands resolve relative paths against your home directory by default. Also note that you have to write the full destination path, with the name of the file included.)
You can verify with the HDFS browser that the empty text file has been moved to the destination directory.
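The same check works from the terminal too (illustrative output; the user name is a placeholder):
hadoop fs -ls dest_dir
Found 1 items
-rw-r--r--   1 username supergroup          0 2021-03-01 10:15 dest_dir/test.txt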

hadoop dfs -ls gives list of folders not present in the local file system

I have just installed a standalone cluster on my laptop. On running the hdfs dfs -ls command in a terminal, I see a list of folders. But when I search the local file system through the File Explorer window, I can't locate those folders anywhere.
rishirich@localhost:/$ hdfs dfs -ls
Found 1 items
drwxr-xr-x   - rishirich supergroup          0 2017-11-09 03:32 user
This folder named 'user' was nowhere to be seen on the local filesystem. Is it that the folder is hidden?
If so, then what terminal command should I use in order to find this folder?
If not, then how do I locate it?
You can't see the HDFS directory structure in your local file manager's graphical view; to view it you have to use the terminal.
hdfs dfs -ls /
and to see the local directory structure in the terminal, use
ls <path>
to list a directory, and
cd <path>
to change into it.
When you installed Hadoop, you set up a core-site.xml file to establish the fs.defaultFS property. Unless you set it to file://, the default filesystem will not be the local one.
If you set it to hdfs://, then the default locations for the namenode and datanode directories are in your local /tmp folder.
Note - those are HDFS blocks, not whole, readable files stored in HDFS.
If you want to list your local filesystem, you're welcome to use hadoop fs -ls file:///
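If you're not sure which filesystem fs.defaultFS points at, you can ask Hadoop directly; the hdfs://localhost:9000 shown below is just a common single-node value, not necessarily yours:
hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000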

hadoop distcp issue while copying a single file

(Note: I need to use distcp to get parallelism)
I have 2 files in the /user/bhavesh folder
I have 1 file in the /user/bhavesh1 folder
Copying 2 files from /user/bhavesh to the /user/uday folder (this works fine)
This creates the /user/uday folder
Copying 1 file from /user/bhavesh1 to the /user/uday1 folder creates a file instead of a folder
What I need: if there is one file /user/bhavesh1/emp1.csv, it should create /user/uday1/emp1.csv (uday1 should be created as a directory). Any suggestion or help is highly appreciated.
On Unix systems, when you copy a single file, a destination name ending in / (such as /user/uday1/) marks the destination as a directory; the hadoop fs -cp command, however, will fail if the destination directory is missing.
When it comes to HDFS distcp, a trailing / in the file/dir name is ignored when the source is a single file. One workaround is to create the destination directory before executing the distcp command; you may add the -p option to -mkdir to avoid a "directory already exists" error.
hadoop fs -mkdir -p /user/uday1 ; hadoop distcp /user/bhavesh1/emp*.csv /user/uday1/
This works for both a single file and multiple files in the source directory.
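As a sanity check after running the commands above (illustrative output; the size and timestamp are made up):
hadoop fs -ls /user/uday1
Found 1 items
-rw-r--r--   1 bhavesh supergroup       1024 2021-03-01 10:15 /user/uday1/emp1.csv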

Shell Script to copy directories from hdfs to local

I'm looking for a shell script that copies a directory (with the files under it) from HDFS to the local system.
I think it is pointless to write a whole script when you only need one command in the terminal.
With
hadoop fs -ls /myDir/path
you can verify the name and path of the directory you want to copy, and then write
hadoop fs -get /myDir/path
to get the directory into your local working directory. You can also specify a destination directory:
hadoop fs -get /myDir/path /myLocal/destDir
It copies the whole directory (with subdirectories) to your working directory or to the specified directory. You can also get file by file (dir by dir) with
hadoop fs -get /myDir/path/*
or specific dirs or files in one command
hadoop fs -get /myDir/path/dir1 /myDir/path/dir2 .
to your current directory. I tried it on my Hadoop VM and it works fine.
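That said, if you really do want it wrapped in a reusable script, here is a minimal sketch (the script name and argument handling are my own invention; adjust the paths to your setup):
#!/bin/bash
# copy-hdfs-dir.sh - copy an HDFS directory (with its contents) to a local destination
# usage: ./copy-hdfs-dir.sh /myDir/path [/myLocal/destDir]
set -e
hdfs_dir="$1"
local_dir="${2:-.}"                      # default: the current working directory
mkdir -p "$local_dir"                    # make sure the local target exists
hadoop fs -get "$hdfs_dir" "$local_dir"  # -get copies directories recursively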

Reading files from hdfs vs local directory

I am a beginner in Hadoop. I have two doubts:
1) How do I access files stored in HDFS? Is it the same as using a FileReader from java.io and giving the local path, or is it something else?
2) I have created a folder where I have copied the file to be stored in HDFS and the jar file of the MapReduce program. When I run the command in any directory
${HADOOP_HOME}/bin/hadoop dfs -ls
it just shows me all the files in the current dir. So does that mean all the files got added without me explicitly adding them?
Yes, it's pretty much the same. Read this post to read files from HDFS.
You should keep in mind that HDFS is different from your local file system. With hadoop dfs you access HDFS, not the local file system. So, hadoop dfs -ls /path/in/HDFS shows you the contents of the /path/in/HDFS directory in HDFS, not the local one. That's why the output is the same no matter where you run it from.
If you want to "upload" / "download" files to/from HDFS you should use the commands:
hadoop dfs -copyFromLocal /local/path /path/in/HDFS and
hadoop dfs -copyToLocal /path/in/HDFS /local/path, respectively.
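And to quickly inspect a file's contents from the terminal, without writing any Java code, there is also (the path here is just a placeholder):
hadoop dfs -cat /path/in/HDFS/file.txt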
