hadoop distcp issue while copying single file - hadoop

(Note: I need to use distcp to get parallelism)
I have 2 files in the /user/bhavesh folder.
I have 1 file in the /user/bhavesh1 folder.
Copying the 2 files from /user/bhavesh to the /user/uday folder works fine; it creates the /user/uday directory.
Copying the 1 file from /user/bhavesh1 to /user/uday1, however, creates a file named uday1 instead of a directory.
What I need: if there is a single file /user/bhavesh1/emp1.csv, the copy should create /user/uday1/emp1.csv (with uday1 created as a directory). Any suggestion or help is highly appreciated.

On Unix systems, when you copy a single file and give a destination directory name ending with a slash (e.g. /user/uday1/), the destination directory will be created; the hadoop fs -cp command, however, will fail if the destination directory is missing.
When it comes to HDFS distcp, a trailing / on the destination name is ignored when the source is a single file. One workaround is to create the destination directory before executing the distcp command; you can add the -p option to -mkdir to avoid a "directory already exists" error.
hadoop fs -mkdir -p /user/uday1 ; hadoop distcp /user/bhavesh1/emp*.csv /user/uday1/
This works for both a single file and multiple files in the source directory.
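To confirm the result afterwards, a quick check (reusing the paths from the question; emp1.csv is the file name mentioned there):
# uday1 should now be a directory containing emp1.csv rather than a plain file
hadoop fs -ls /user/uday1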

Related

Hadoop copying directory with contents

I am new to Hadoop and am doing a project for university. I have a folder called 'docs' that contains several text files. When I look at it locally, I can see the various text files are there. When I copy it to Hadoop, the directory appears to be empty.
The screenshot below shows the files in the local directory.
I use copyFromLocal to copy the directory to HDFS. As far as I can tell it should be copying the contents too?
hadoop fs -copyFromLocal ./docs
This screenshot shows the directory is empty (or is it?)
Directories (lines starting with d) always show a size of 0 in the output of the HDFS ls command. If you run hadoop fs -ls docs, you'll see all of the files and their sizes.
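A quick way to see this, assuming docs was copied into your HDFS home directory as above:
# the parent listing shows the docs directory itself, which always reports size 0
hadoop fs -ls
# listing the directory shows the files that were copied into it, with their real sizes
hadoop fs -ls docs
# -R lists everything recursively, handy if docs contains subdirectories
hadoop fs -ls -R docs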

Hadoop/HDFS: put command fails - No such file or directory

I do not know why I cannot move a file from one directory to another. I can view the content of the file but I cannot move the same file into another directory.
WORKS FINE:
hadoop fs -cat /user/hadoopusr/project-data.txt
DOES NOT WORK:
hadoop fs -put /user/hadoopusr/project-data.txt /user/hadoopusr/Projects/MarketAnalysis
I got a No such file or directory error message. What is wrong? Please help. Thank you!
As we can read from here about the -put command:
This command is used to copy files from the local file system to the HDFS filesystem. This command is similar to the -copyFromLocal command. This command will not work if the file already exists unless the -f flag is given to the command. This overwrites the destination if the file already exists before the copy.
This makes it clear why the command fails with the No such file or directory message: it can't find any file named project-data.txt in the current directory of your local filesystem.
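For comparison, -put expects a local source path; a sketch of what it would look like if the file actually lived in the current local directory (the local path here is hypothetical):
# copies a local file into an HDFS directory; adding -f would overwrite an existing destination
hadoop fs -put ./project-data.txt /user/hadoopusr/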
Since you plan on moving a file between directories inside HDFS, instead of using -put we can simply use -mv, just as we would on the local filesystem.
Tested it out on my own HDFS as follows:
Create the source and destination directories in HDFS
hadoop fs -mkdir source_dir dest_dir
Create an empty (for the sake of the test) file under the source directory
hadoop fs -touch source_dir/test.txt
Move the empty file to the destination directory
hadoop fs -mv source_dir/test.txt dest_dir/test.txt
(Notice that the /user/username/ part of the path is not needed for either the file or the destination directory, because relative paths in HDFS are resolved against your home directory. Also note that you have to write the full destination path with the file name included.)
With the HDFS browser you can then see that the empty text file has been moved to the destination directory.
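The same check can be done from the command line (a small sketch, following the directory names used above):
# the source directory should no longer contain test.txt
hadoop fs -ls source_dir
# the destination directory should now contain it
hadoop fs -ls dest_dir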

`No such file or directory` while copying from local filesystem to hadoop

I have installed a local single-node Hadoop on Windows 10 and it apparently works.
Unfortunately, when I try to copy files from the local filesystem to Hadoop, it fails:
λ hadoop fs -copyFromLocal ../my_models/*.model hdfs://localhost/tmp
copyFromLocal: `../my_models/aaa.model': No such file or directory
copyFromLocal: `../my_models/bbb.model': No such file or directory
copyFromLocal: `../my_models/ccc.model': No such file or directory
copyFromLocal: `../my_models/ddd.model': No such file or directory
As you can see, it lists all the model files in the local directory, which proves it sees them. Unfortunately, it doesn't copy them.
At the same time, I can create directories:
λ hadoop fs -mkdir -p hdfs://localhost/tmp/
λ hadoop fs -ls hdfs://localhost/
Found 1 items
drwxr-xr-x - dims supergroup 0 2018-04-22 22:16 hdfs://localhost/tmp
What can be the problem?
You're probably getting this error for one of two reasons:
You can't use an asterisk (*) to match the files you want to copy; you can only give the path to a file or directory. (In your case this is the most likely cause.)
The folder you're copying from the local filesystem is in the root directory, or in some other directory the HDFS user can't access.
Try cd-ing, as the HDFS user, into the folder where your files exist; if a permission denied error persists, copy the files to the /tmp folder first.
Why don't you use a for loop for this, something like below:
for file in aaa.model bbb.model ccc.model ddd.model; do hadoop fs -copyFromLocal ../my_models/$file hdfs://localhost/tmp; done
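On a Unix-like shell (Git Bash, WSL, and so on), where the glob is expanded by the shell before hadoop sees it, the same idea can be written without listing the files by hand; a sketch under that assumption:
# each expanded .model file is copied individually, avoiding the wildcard problem
for file in ../my_models/*.model; do
  hadoop fs -copyFromLocal "$file" hdfs://localhost/tmp
done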

How does getMerge work in Hadoop?

I would like to know how the getmerge command works at the OS/HDFS level. Will it copy each and every byte/block from one file to another file, or is it just a simple file descriptor change? How costly an operation is it?
getmerge
Usage: hadoop fs -getmerge <src> <localdst> [addnl]
Takes a source directory and a destination file as input and concatenates files in src into the destination local file. Optionally addnl can be set to enable adding a newline character at the end of each file.
So, to answer your question,
Will it copy each and every byte/block from one file to another file
Yes, and no. It will find every HDFS block containing the files in the given source directory and concatenate them together into a single file on your local filesystem.
a simple file descriptor change
Not sure what you mean by that. getmerge doesn't change any file descriptors; it is just reading data from HDFS to your local filesystem.
How costly an operation is it?
Expect it to be as costly as manually cat-ing all the files in an HDFS directory. The same operation for
hadoop fs -getmerge /tmp/ /home/user/myfile
Could be achieved by doing
hadoop fs -cat /tmp/* > /home/user/myfile
The costly part is looking up all of the files and blocks involved and transferring that data over the network to your local disk.
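For reference, the addnl behaviour from the usage above is exposed as the -nl flag in current Hadoop releases; a minimal example (the local path is just a placeholder):
# concatenate everything under /tmp into one local file, adding a newline after each source file
hadoop fs -getmerge -nl /tmp /home/user/myfile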

Shell Script to copy directories from hdfs to local

I'm looking for a shell script that copies a directory (with the files under it) from HDFS to the local system.
I think it is pointless to write a whole script when you only need to type one command into the terminal.
With
hadoop fs -ls /myDir/path
you can verify the name and path of the directory you want to copy, then run
hadoop fs -get /myDir/path
to copy it to your local filesystem. You can also specify a destination directory with
hadoop fs -get /myDir/path /myLocal/destDir
This copies the whole directory (with its subdirectories) to your working directory or to the specified directory. You can also get things file by file (or dir by dir) with
hadoop fs -get /myDir/path/*
or fetch specific dirs or files in one command
hadoop fs -get /myDir/path/dir1 /myDir/path/dir2 .
into your current directory. I tried it on my Hadoop VM and it works fine.
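That said, if you really do want a reusable script, here is a minimal sketch (the script name and argument handling are placeholders, not an established tool):
#!/bin/bash
# copy_from_hdfs.sh -- hypothetical helper: copy an HDFS directory to a local destination
# usage: ./copy_from_hdfs.sh /myDir/path /myLocal/destDir
set -e
hdfs_src="$1"        # HDFS directory to copy
local_dest="$2"      # local destination directory
mkdir -p "$local_dest"                      # make sure the local target exists
hadoop fs -get "$hdfs_src" "$local_dest"    # recursively copies the directory and its files
ls -R "$local_dest"                         # quick check of what arrived locally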
