Shell Script to copy directories from hdfs to local - hadoop

i'm looking for a shell script which should copy directory (with files under) from HDFS to local system.

I think it is pointless to write a whole script when you only need to type one command into the terminal.
With
hadoop fs -ls /myDir/path
you can verify the name and path of the directory you want to copy, then run
hadoop fs -get /myDir/path
to copy it to your local system. You can also specify a destination directory with
hadoop fs -get /myDir/path /myLocal/destDir
This copies the whole directory (with subdirectories) to your working directory or to the specified directory. You can also get it file by file (dir by dir) with
hadoop fs -get /myDir/path/*
or specific dirs or files in one command
hadoop fs -get /myDir/path/dir1 /myDir/path/dir2 .
to your directory. I tried it on my Hadoop VM and it works fine.
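If you really do want a reusable script rather than a one-off command, the one-liner above can be wrapped as in the minimal sketch below. It assumes the hadoop client is on your PATH and a configured cluster is reachable; the DRY_RUN switch is an addition of mine that just prints the command, so the sketch can be exercised without a cluster.

```shell
#!/usr/bin/env bash
# Minimal sketch of a "copy HDFS dir to local" script.
# Assumes the `hadoop` client is on PATH; DRY_RUN=1 prints the
# command instead of running it (useful without a cluster).
set -euo pipefail

copy_from_hdfs() {
  local hdfs_dir="$1" local_dest="$2"
  mkdir -p "$local_dest"                       # make sure the target exists
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "hadoop fs -get $hdfs_dir $local_dest"
  else
    hadoop fs -get "$hdfs_dir" "$local_dest"   # -get is recursive for directories
  fi
}

# /myDir/path is the example path from the answer above
DRY_RUN=1 copy_from_hdfs /myDir/path /tmp/hdfs_copy
```

Run as-is it only prints `hadoop fs -get /myDir/path /tmp/hdfs_copy`; drop the DRY_RUN prefix to actually copy.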

Related

Hadoop/HDFS: put command fails - No such file or directory

I do not know why I cannot move a file from one directory to another. I can view the content of the file but I cannot move the same file into another directory.
WORKS FINE:
hadoop fs -cat /user/hadoopusr/project-data.txt
DOES NOT WORK:
hadoop fs -put /user/hadoopusr/project-data.txt /user/hadoopusr/Projects/MarketAnalysis
I got a No such file or directory error message. What is wrong? Please help. Thank you!
As we can read from here about the -put command:
This command is used to copy files from the local file system to the
HDFS filesystem. This command is similar to the -copyFromLocal command.
This command will not work if the file already exists, unless the -f
flag is given to the command. This overwrites the destination if the
file already exists before the copy.
This makes it clear why it doesn't work and throws the No such file or directory message: -put can't find any file named project-data.txt in the current directory of your local filesystem.
You plan on moving a file between directories inside HDFS, so instead of -put (which expects a local source) you can simply use the -mv command, just as you would in your local filesystem!
Tested it out on my own HDFS as follows:
Create the source and destination directories in HDFS
hadoop fs -mkdir source_dir dest_dir
Create an empty (for the sake of the test) file under the source directory
hadoop fs -touch source_dir/test.txt
Move the empty file to the destination directory
hadoop fs -mv source_dir/test.txt dest_dir/test.txt
(Notice how the /user/username/ part of the path is not needed for either the file or the destination directory, because HDFS defaults to that directory as your working directory. Also note that you have to write the full destination path, with the name of the file included.)
Checking with the HDFS browser confirms that the empty text file has been moved to the destination directory.

`No such file or directory` while copying from local filesystem to hadoop

I have installed a local single-node Hadoop on Windows 10 and it apparently works.
Unfortunately, when I try to copy files to Hadoop from the local filesystem, it complains:
λ hadoop fs -copyFromLocal ../my_models/*.model hdfs://localhost/tmp
copyFromLocal: `../my_models/aaa.model': No such file or directory
copyFromLocal: `../my_models/bbb.model': No such file or directory
copyFromLocal: `../my_models/ccc.model': No such file or directory
copyFromLocal: `../my_models/ddd.model': No such file or directory
As you see, it lists all the model files in the local directory, which proves it sees them. Unfortunately, it doesn't copy them.
At the same time, I can create directories:
λ hadoop fs -mkdir -p hdfs://localhost/tmp/
λ hadoop fs -ls hdfs://localhost/
Found 1 items
drwxr-xr-x - dims supergroup 0 2018-04-22 22:16 hdfs://localhost/tmp
What can be the problem?
You're probably getting this error because:
You can't use an asterisk (*) to specify the files you want to copy; you can only give the path to a file or directory. (In your case this is the likely cause.)
The folder you're copying from the local filesystem is in the root dir. or some other dir. that the HDFS user can't access.
Try using the cd command as the HDFS user to go to the folder where your files exist; if the permission denied error persists, you must copy the files to the /tmp folder.
Why don't you use a for loop for this, something like below:
for file in aaa.model bbb.model ccc.model; do hadoop fs -copyFromLocal ../my_models/$file hdfs://localhost/tmp; done

Can't put file from local directory to HDFS

I have created a file named "file.txt" in the local directory, and now I want to put it in HDFS using:
]$ hadoop fs -put file.txt abcd
I am getting a response like
put: 'abcd': no such file or directory
I have never worked on Linux. Please help me out - How do I put the file "file.txt" into HDFS?
If you don't specify an absolute path in hadoop (HDFS or whatever other file system is used), it will prepend your user directory to create an absolute path.
By default, your home folder in HDFS should be /user/<user name>.
Then in your case you are trying to create the file /user/<user name>/abcd and put inside it the content of your local file.txt.
The user name is your operating system user on your local machine. You can get it using the whoami command.
The problem is that your user folder doesn't exist in HDFS, and you need to create it.
BTW, according to the hadoop documentation, the correct command to work with HDFS is hdfs dfs instead of hadoop fs (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html). But for now both work.
Then:
If you don't know your user name in your local operating system, open a terminal and run the whoami command.
Execute the following command, replacing your user name.
hdfs dfs -mkdir -p /user/<user name>
And then you should be able to execute your PUT command.
NOTE: The -p parameter is to create the /user folder if it doesn't exist.
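Putting the steps together, a sketch of the whole fix might look like the following. file.txt and abcd are the names from the question; the commands are built as strings and echoed so the sketch can be read (or dry-run) on a machine without a cluster.

```shell
# Build the two commands from the steps above. file.txt and abcd are
# the names used in the question; `hdfs` is assumed to be on PATH.
user_name=$(whoami)                                # your local OS user
mkdir_cmd="hdfs dfs -mkdir -p /user/$user_name"    # create the HDFS home dir
put_cmd="hdfs dfs -put file.txt abcd"              # now the relative path resolves
echo "$mkdir_cmd"
echo "$put_cmd"
```

On a real cluster, drop the string-building and run the two hdfs commands directly in that order.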

How to copy file from HDFS to the local file system

How do I copy a file from HDFS to the local file system? There is no physical location of a file on the local machine, not even a directory. How can I move the files to my local system for further validation? I tried through WinSCP.
bin/hadoop fs -get /hdfs/source/path /localfs/destination/path
bin/hadoop fs -copyToLocal /hdfs/source/path /localfs/destination/path
Point your web browser to the HDFS Web UI (namenode_machine:50070), browse to the file you intend to copy, scroll down the page and click on download the file.
In Hadoop 2.0,
hdfs dfs -copyToLocal <hdfs_input_file_path> <output_path>
where,
hdfs_input_file_path may be obtained from http://<<name_node_ip>>:50070/explorer.html
output_path is the local path where the file is to be copied to.
You may also use get in place of copyToLocal.
In order to copy files from HDFS to the local file system the following command could be run:
hadoop dfs -copyToLocal <input> <output>
<input>: the HDFS directory path (e.g /mydata) that you want to copy
<output>: the destination directory path (e.g. ~/Documents)
Update: hadoop dfs is deprecated in Hadoop 3;
use hdfs dfs -copyToLocal <input> <output>
You can accomplish this in both of these ways:
1.hadoop fs -get <HDFS file path> <Local system directory path>
2.hadoop fs -copyToLocal <HDFS file path> <Local system directory path>
Ex:
My file is located at /sourcedata/mydata.txt
I want to copy the file to the local file system at this path: /user/ravi/mydata
hadoop fs -get /sourcedata/mydata.txt /user/ravi/mydata/
If your source "file" is split up among multiple files (maybe as the result of map-reduce) that live in the same directory tree, you can copy that to a local file with:
hadoop fs -getmerge /hdfs/source/dir_root/ local/destination
This worked for me on my VM instance of Ubuntu.
hdfs dfs -copyToLocal [hadoop directory] [local directory]
1.- Remember the name you gave to the file; instead of using hdfs dfs -put, use 'get'. See below.
$hdfs dfs -get /output-fileFolderName-In-hdfs
If you are using Docker, you have to do the following steps:
Copy the file from HDFS to the namenode (hadoop fs -get output/part-r-00000 /out_text).
"/out_text" will be stored on the namenode.
Copy the file from the namenode to the local disk with docker cp namenode:/out_text output.txt.
output.txt will then be in your current working directory.
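Assuming the namenode container is actually named `namenode` (as in the steps above) and the job output is output/part-r-00000, both steps can be driven from the host via docker exec. The commands are echoed here as a sketch rather than executed, so it can be read without a running cluster.

```shell
# Step 1 runs inside the namenode container, step 2 pulls the file out
# to the host's current directory. Container name and output path are
# the ones used in the answer above.
step1="docker exec namenode hadoop fs -get output/part-r-00000 /out_text"
step2="docker cp namenode:/out_text output.txt"
echo "$step1"
echo "$step2"
```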
For the opposite direction (local filesystem to HDFS):
bin/hadoop fs -put /localfs/source/path /hdfs/destination/path

Using multiple local folders as source in hadoop mapreduce job

I have data in multiple local folders, i.e. /usr/bigboss/data1, /usr/bigboss/data2, and many more folders. I want to use all of these folders as the input source for my MapReduce job and store the result in HDFS. I cannot find a working way to use the Hadoop Grep example to do it.
The data will need to reside in HDFS for you to process it with the grep example. You can upload the folders to HDFS using the -put FsShell command:
hadoop fs -mkdir bigboss
hadoop fs -put /usr/bigboss/data* bigboss
This will create a folder in the current user's HDFS directory and upload each of the data directories to it.
Now you should be able to run the grep example over the data.
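For the record, a run of the bundled Grep example over the uploaded data might look like the sketch below. The jar path under $HADOOP_HOME, the output directory name grep_output, and the regex are all assumptions for a typical install, not something from the question; the command is echoed rather than executed so the sketch stands alone.

```shell
# Hypothetical invocation of the Hadoop Grep example over the bigboss
# dir created above; adjust the jar path/version to your installation.
grep_cmd='hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep bigboss grep_output "data[0-9]+"'
echo "$grep_cmd"
```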
