Hadoop/HDFS: put command fails - No such file or directory

I do not know why I cannot move a file from one directory to another. I can view the content of the file but I cannot move the same file into another directory.
WORKS FINE:
hadoop fs -cat /user/hadoopusr/project-data.txt
DOES NOT WORK:
hadoop fs -put /user/hadoopusr/project-data.txt /user/hadoopusr/Projects/MarketAnalysis
I got a No such file or directory error message. What is wrong? Please help. Thank you!

The documentation describes the -put command as follows:
This command is used to copy files from the local file system to the
HDFS filesystem. This command is similar to -copyFromLocal command.
This command will not work if the file already exists unless the -f
flag is given to the command. This overwrites the destination if the
file already exists before the copy
This explains the No such file or directory message: -put looks for /user/hadoopusr/project-data.txt on your local filesystem, and no such file exists there.
Since you want to move a file between directories inside HDFS, use the -mv parameter instead of -put, just as you would use mv on a local filesystem.
I tested this on my own HDFS as follows:
Create the source and destination directories in HDFS
hadoop fs -mkdir source_dir dest_dir
Create an empty (for the sake of the test) file under the source directory
hadoop fs -touch source_dir/test.txt
Move the empty file to the destination directory
hadoop fs -mv source_dir/test.txt dest_dir/test.txt
(Notice that the /user/username/ part of the path is not needed for the file or the destination directory, because relative HDFS paths resolve against your HDFS home directory by default. Here I wrote the full destination path, including the file name; if the destination directory already exists, giving just the directory also works.)
Afterwards, the HDFS browser showed that the empty text file had been moved to the destination directory.
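Applied to the paths from the question, the same fix might look like the sketch below. The helper name hdfs_move and the HDFS_CMD variable are made up for illustration (HDFS_CMD is not a Hadoop setting; it only lets the command be swapped out for a dry run). The underlying calls are the ordinary hadoop fs -test -d and hadoop fs -mv.

```shell
#!/bin/sh
# Sketch: move a file that already lives in HDFS into another HDFS directory.
# HDFS_CMD is a hypothetical override, not a Hadoop setting.
HDFS_CMD="${HDFS_CMD:-hadoop fs}"

# hdfs_move SRC DEST_DIR  (hypothetical helper name)
hdfs_move() {
    src=$1
    dest_dir=$2
    # -test -d returns 0 only if the path exists in HDFS and is a directory.
    if ! $HDFS_CMD -test -d "$dest_dir"; then
        echo "destination directory $dest_dir does not exist in HDFS" >&2
        return 1
    fi
    # -mv works entirely inside HDFS; -put would look for SRC on the local disk.
    $HDFS_CMD -mv "$src" "$dest_dir/"
}

# Usage with the paths from the question:
# hdfs_move /user/hadoopusr/project-data.txt /user/hadoopusr/Projects/MarketAnalysis
```

Checking the destination first means a missing Projects/MarketAnalysis directory is reported explicitly instead of the file being renamed to that path.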

Related

HDFS put: no such file or directory even though the file is there

I am trying to upload a file in HDFS with:
sudo -u hdfs hdfs dfs -put /home/hive/warehouse/sample.csv hdfs://[ip_redacted]:9000/data
I can confirm that HDFS works, as I managed to create the /data directory just fine.
Even giving the full path to the .csv file gives the same error:
put: `/home/hive/warehouse/sample.csv': No such file or directory
Why is it giving this error?
I encountered this problem, too.
The user hdfs had no permission to access one of the file's ancestor directories on the local filesystem, so the command reported No such file or directory.
As crystyxn commented, using the environment variable HADOOP_USER_NAME instead of sudo -u hdfs worked.
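A rough sketch of that workaround follows. It assumes the cluster uses Hadoop's default simple authentication, where the client takes its HDFS identity from HADOOP_USER_NAME; this does not apply to Kerberos-secured clusters. The command then runs as your own OS user, who can read the local file, while acting as hdfs on the cluster. The helper name put_as and the HDFS_CMD override are hypothetical.

```shell
#!/bin/sh
# Sketch: upload a local file while acting as another HDFS user.
# HDFS_CMD is a hypothetical override, not a Hadoop setting.
HDFS_CMD="${HDFS_CMD:-hdfs dfs}"

# put_as HDFS_USER LOCAL_SRC HDFS_DEST  (hypothetical helper name)
put_as() {
    hdfs_user=$1
    local_src=$2
    hdfs_dest=$3
    # The local file is read by the current OS user, so check that first.
    if [ ! -r "$local_src" ]; then
        echo "cannot read local file $local_src as $(id -un)" >&2
        return 1
    fi
    # HADOOP_USER_NAME changes only the identity presented to HDFS.
    HADOOP_USER_NAME="$hdfs_user" $HDFS_CMD -put "$local_src" "$hdfs_dest"
}

# Usage with the paths from the question:
# put_as hdfs /home/hive/warehouse/sample.csv 'hdfs://[ip_redacted]:9000/data'
```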
Is the csv file on your local system or in HDFS? You can use the -put command (or the -copyFromLocal command) ONLY to copy a LOCAL file into the distributed file system.

hadoop distcp issue while copying a single file

(Note: I need to use distcp to get parallelism)
I have 2 files in /user/bhavesh folder
I have 1 file in /user/bhavesh1 folder
Copying 2 files from /user/bhavesh to the /user/uday folder works fine.
This creates the /user/uday folder.
Copying 1 file from /user/bhavesh1 to /user/uday1 creates a file named uday1 instead of a folder.
What I need is this: if there is one file /user/bhavesh1/emp1.csv, the copy should create /user/uday1/emp1.csv (uday1 should be created as a directory). Any suggestion or help is highly appreciated.
On Unix systems, a plain cp of a single file to a destination ending in a slash (such as /user/uday1/) fails when that directory does not exist, and hadoop fs -cp likewise fails if the destination directory is missing.
With hadoop distcp, a trailing / on the destination is ignored when the source is a single file, so the file is written under the destination name itself. One workaround is to create the destination directory before running distcp; adding the -p option to -mkdir avoids an error if the directory already exists.
hadoop fs -mkdir -p /user/uday1 ; hadoop distcp /user/bhavesh1/emp*.csv /user/uday1/
This works for both a single file and multiple files in the source directory.

Shell Script to copy directories from hdfs to local

I'm looking for a shell script that copies a directory (with the files under it) from HDFS to the local system.
I think it is pointless to write a whole script when you only need to type one command into a terminal.
With
hadoop fs -ls /myDir/path
you can verify the name and path of the directory you want to copy, and then run
hadoop fs -get /myDir/path
to copy it to the local filesystem. You can also specify a destination directory:
hadoop fs -get /myDir/path /myLocal/destDir
This copies the whole directory (with subdirectories) to your working directory or to the specified directory. You can also copy it file by file (dir by dir) with
hadoop fs -get /myDir/path/*
or fetch specific dirs or files in one command
hadoop fs -get /myDir/path/dir1 /myDir/path/dir2 .
into your current directory. I tried it on my Hadoop VM and it works fine.
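If a script is still wanted, for example to run the copy from cron, a minimal sketch could wrap the same -get call with a couple of checks. The function name hdfs_get_dir and the HDFS_CMD override are hypothetical; only hadoop fs -test -d and hadoop fs -get are real commands here.

```shell
#!/bin/sh
# Sketch: copy an HDFS directory (with everything under it) to the local disk.
# HDFS_CMD is a hypothetical override, not a Hadoop setting.
HDFS_CMD="${HDFS_CMD:-hadoop fs}"

# hdfs_get_dir HDFS_DIR LOCAL_DEST  (hypothetical helper name)
hdfs_get_dir() {
    hdfs_dir=$1
    local_dest=$2
    # Stop early if the source is not a directory in HDFS.
    if ! $HDFS_CMD -test -d "$hdfs_dir"; then
        echo "$hdfs_dir is not a directory in HDFS" >&2
        return 1
    fi
    # Create the local destination so the copied directory lands inside it.
    mkdir -p "$local_dest" || return 1
    $HDFS_CMD -get "$hdfs_dir" "$local_dest"
}

# Usage with the example paths from above:
# hdfs_get_dir /myDir/path /myLocal/destDir
```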

Can't put file from local directory to HDFS

I have created a file named "file.txt" in a local directory, and now I want to put it into HDFS using:
hadoop fs -put file.txt abcd
I am getting a response like
put: 'abcd': no such file or directory
I have never worked on Linux. Please help me out - How do I put the file "file.txt" into HDFS?
If you don't specify an absolute path in Hadoop (HDFS or whatever other file system is used), it prepends your user directory to create an absolute path.
In HDFS, your default folder is /user/<user name>.
So in your case you are trying to create the file /user/<user name>/abcd and put the content of your local file.txt inside it.
The user name is your operating system user on your local machine. You can get it with the whoami command.
The problem is that your user folder doesn't exist in HDFS, so you need to create it.
BTW, the Hadoop documentation uses hdfs dfs rather than hadoop fs for working with HDFS (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html), but for now both should work.
Then:
If you don't know your user name in your local operating system, open a terminal and run the whoami command.
Run the following command, replacing <user name> with your user name.
hdfs dfs -mkdir -p /user/<user name>
After that you should be able to run your put command.
NOTE: The -p parameter creates the /user folder as well if it doesn't exist.
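To make the path rule concrete, here is a small sketch that mimics how a relative HDFS path is expanded. It assumes the default home directory layout of /user/<user name> and does not handle full hdfs:// URIs. The function name resolve_hdfs_path is hypothetical and is not part of Hadoop.

```shell
#!/bin/sh
# Sketch: show where a path given to "hdfs dfs" would end up.
# resolve_hdfs_path USER PATH  (hypothetical helper, not a Hadoop command)
resolve_hdfs_path() {
    case $2 in
        # Absolute paths are used as they are.
        /*) printf '%s\n' "$2" ;;
        # Relative paths are placed under the user's HDFS home directory.
        *)  printf '/user/%s/%s\n' "$1" "$2" ;;
    esac
}

# Usage: print the HDFS path that "abcd" refers to for the current user.
# resolve_hdfs_path "$(whoami)" abcd
```

If the printed parent directory does not exist in HDFS, the put fails in the way described in the question.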

getmerge command in hadoop datacopy

My aim is to read all the files whose names start with "trans" in a directory, merge them into a single file, and load that single file into an HDFS location.
My source directory is /user/cloudera/inputfiles/
Assume that the directory contains a lot of files, but I only need the files that start with "trans".
My destination directory is /user/cloudera/transfiles/
So I tried the command below:
hadoop dfs -getmerge /user/cloudera/inputfiles/trans* /user/cloudera/transfiles/records.txt
but it does not work.
If I try the command below, it works:
hadoop dfs -getmerge /user/cloudera/inputfiles /user/cloudera/transfiles/records.txt
Any suggestions on how to merge some files from an HDFS location and store the merged file in another HDFS location?
Below is the usage of the getmerge command:
Usage: hdfs dfs -getmerge <src> <localdst> [addnl]
Takes a source directory and a destination file as input and
concatenates files in src into the destination local file.
Optionally addnl can be set to enable adding a newline character at the
end of each file.
It expects a directory as the first parameter, and the destination is a file on the local filesystem, not in HDFS.
You can use the cat command instead, like this:
hadoop dfs -cat /user/cloudera/inputfiles/trans* > /<local_fs_dir>/records.txt
hadoop dfs -copyFromLocal /<local_fs_dir>/records.txt /user/cloudera/transfiles/records.txt
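The two commands above can also be combined without the intermediate local file, since -put reads from standard input when the source is given as "-". This is only a sketch: the function name merge_in_hdfs and the HDFS_CMD override are hypothetical, and hadoop fs is used here in place of hadoop dfs.

```shell
#!/bin/sh
# Sketch: concatenate matching HDFS files straight into a new HDFS file.
# HDFS_CMD is a hypothetical override, not a Hadoop setting.
HDFS_CMD="${HDFS_CMD:-hadoop fs}"

# merge_in_hdfs SRC_PATTERN HDFS_DEST  (hypothetical helper name)
merge_in_hdfs() {
    # -cat streams the matching files; "-put -" writes stdin to HDFS_DEST.
    $HDFS_CMD -cat "$1" | $HDFS_CMD -put - "$2"
}

# Usage: quote the pattern so HDFS expands it rather than the local shell.
# merge_in_hdfs '/user/cloudera/inputfiles/trans*' /user/cloudera/transfiles/records.txt
```

In plain sh the pipeline's exit status comes from the -put side only, so a failed -cat can still leave an empty or partial destination file; check the result before relying on it.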

Resources