Hadoop HDFS copy with wildcards? - hadoop

I want to copy a certain pattern of files from one location in HDFS to another location in the same HDFS cluster. The dfs shell does not seem to be able to handle this:
hadoop dfs -cp /tables/weblog/server=jeckle/webapp.log.1* /tables/tinylog/server=jeckle/
No error is returned, yet no files are copied.

You need to use double quotes around a path that contains a wildcard, like this:
hdfs dfs -cp "/path/to/foo*" /path/to/bar/

First of all, HDFS copy with wildcards is supported. Secondly, use of hadoop dfs is deprecated; you'd better use hadoop fs or hdfs dfs instead. If you're sure the operation was not successful (although it seems to have succeeded), you could check the NameNode log files to see what's wrong.
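For example, a minimal sketch of the recommended form, assuming the destination directory already exists (the paths are the ones from the question):
hdfs dfs -cp "/tables/weblog/server=jeckle/webapp.log.1*" /tables/tinylog/server=jeckle/
hdfs dfs -ls /tables/tinylog/server=jeckle/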

Interesting. This is what I get in my local VM running Hadoop 0.18.0. What version are you using? I can try on 1.2.1 as well.
hadoop-user@hadoop-desk:~$ hadoop fs -ls /user/hadoop-user/testcopy
hadoop-user@hadoop-desk:~$ hadoop dfs -cp /user/hadoop-user/input/*.txt /user/hadoop-user/testcopy/
hadoop-user@hadoop-desk:~$ hadoop fs -ls /user/hadoop-user/testcopy
Found 2 items
-rw-r--r-- 1 hadoop-user supergroup 79 2014-01-06 04:35 /user/hadoop-user/testcopy/HelloWorld.txt
-rw-r--r-- 1 hadoop-user supergroup 140 2014-01-06 04:35 /user/hadoop-user/testcopy/SampleData.txt

These both worked for me:
~]$ hadoop fs -cp -f /user/cloudera/Dec_17_2017/cric* /user/cloudera/Dec_17_2017/Dec_18
~]$ hadoop fs -cp -f "/user/cloudera/Dec_17_2017/cric*" /user/cloudera/Dec_17_2017/Dec_18
I think the better way is to not use double or single quotes at all.
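A hedged note on why both forms worked here: with default bash settings, a wildcard that matches nothing locally is passed through to Hadoop unchanged, and Hadoop then expands it against HDFS; the quotes only make a difference if the pattern happens to match files on the local filesystem (or if shell options like failglob are set). A quick local check, assuming default bash options:
echo /user/cloudera/Dec_17_2017/cric*
# if nothing local matches, the pattern is printed literally, so Hadoop receives the wildcard either way
hadoop fs -cp "/user/cloudera/Dec_17_2017/cric*" /user/cloudera/Dec_17_2017/Dec_18
# quoting always leaves the glob expansion to HDFS, so it is the safer habit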

In case anybody wants to copy the files and folders from the current local directory (where they are in the terminal) into HDFS:
hdfs dfs -put ./
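Note that -put normally takes an explicit HDFS destination as its last argument; a minimal sketch with an assumed target directory (/user/$(whoami)/upload is just an example):
hdfs dfs -mkdir -p /user/$(whoami)/upload
hdfs dfs -put ./* /user/$(whoami)/upload/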

Related

Hadoop: dfs is deprecated but hdfs class not found

I'm new to Hadoop, and am trying to check what data is available in HDFS. However, the dfs command returns a response that indicates the class is deprecated, and that hdfs should be used:
-bash-4.2$ hadoop dfs -ls
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
ls: `.': No such file or directory
When I try the hdfs command, though, I get what appears to be a Java class lookup error:
-bash-4.2$ hadoop hdfs -ls
Error: Could not find or load main class hdfs
Is there something wrong with my Hadoop setup, or have others encountered this catch-22?
The correct form is hadoop fs or hdfs dfs, followed by -ls.
You can run hdfs dfs -ls / to list the root of HDFS. The `.': No such file or directory error appears because your HDFS home directory (the path printed by echo "hdfs:///user/$(whoami)") does not exist yet; create it with hadoop fs -mkdir -p hdfs:///user/$(whoami).
That command must be repeated for every user account that needs to access its own HDFS home directory.
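A minimal sketch of that setup step, run as the HDFS superuser (here assumed to be the hdfs account); the user names alice and bob, and the user-named groups, are assumptions for illustration:
for u in alice bob; do
  sudo -u hdfs hdfs dfs -mkdir -p /user/$u
  sudo -u hdfs hdfs dfs -chown $u:$u /user/$u
done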

Confusion on HDFS 'pwd' equivalents

First, I have read this post: Is there an equivalent to `pwd` in hdfs?. It says there is no 'pwd' in HDFS.
However, as I progressed with the instructions of Hadoop: Setting up a Single Node Cluster, I failed on this command:
$ bin/hdfs dfs -put etc/hadoop input
put: 'input': No such file or directory
It's weird that I succeeded with this command the first time I went through the instructions, but failed the second time. It's also weird that I succeed with this command on my friend's computer, which has the same system (Ubuntu 14.04) and Hadoop version (2.7.1) as mine.
Can anyone explain what happened here? Is there some 'pwd' in HDFS after all?
Firstly, you are trying to run the command $ bin/hdfs dfs -put etc/hadoop input as a user that doesn't have a home directory in the VM's HDFS.
Let me explain clearly with the following example on the HDP VM.
[root@sandbox hadoop-hdfs-client]# bin/hdfs dfs -put /etc/hadoop input
put: `input': No such file or directory
Here I executed the command as the root user, but root has no home directory in HDFS on the HDP VM. Run the following command to list the existing user directories:
[root@sandbox hadoop-hdfs-client]# hadoop fs -ls /user
Found 8 items
drwxrwx--- - ambari-qa hdfs 0 2015-08-20 08:33 /user/ambari-qa
drwxr-xr-x - guest guest 0 2015-08-20 08:47 /user/guest
drwxr-xr-x - hcat hdfs 0 2015-08-20 08:36 /user/hcat
drwx------ - hive hdfs 0 2015-09-04 09:52 /user/hive
drwxr-xr-x - hue hue 0 2015-08-20 09:05 /user/hue
drwxrwxr-x - oozie hdfs 0 2015-08-20 08:37 /user/oozie
drwxr-xr-x - solr hdfs 0 2015-08-20 08:41 /user/solr
drwxrwxr-x - spark hdfs 0 2015-08-20 08:34 /user/spark
In HDFS, if you copy a file without giving an absolute path for the destination argument, the relative path is resolved against the home directory of the logged-in user, and your file is placed there. Here the root user's home directory was not found.
Now let's switch to the hive user and test:
[root@sandbox hadoop-hdfs-client]# su hive
[hive@sandbox hadoop-hdfs-client]$ bin/hdfs dfs -put /etc/hadoop input
[hive@sandbox hadoop-hdfs-client]$ hadoop fs -ls /user/hive
Found 1 items
drwxr-xr-x - hive hdfs 0 2015-09-04 10:07 /user/hive/input
Yay, successfully copied!
Hope it helps!
This means you need to move the input files to an HDFS location.
Suppose you have an input file named input.txt that needs to be moved to HDFS; then follow the commands below.
Command: hdfs dfs -put /input_location /hdfs_location
If you don't need a specific directory in HDFS:
hdfs dfs -put /home/Desktop/input.txt /
If you want a specific directory in HDFS (note: the directory must be created first):
hdfs dfs -put /home/Desktop/input.txt /MR_input
After that you can run the examples
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output
Here /input and /output are paths that must be in HDFS.
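Putting the steps together, a hedged end-to-end sketch; the local file path reuses the example above, /input and /output are the HDFS paths from the example command, and part-r-00000 is the usual reducer output name in Hadoop 2.x:
hdfs dfs -mkdir -p /input
hdfs dfs -put /home/Desktop/input.txt /input/
# /output must not exist yet, or the job will refuse to start
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000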
Hope this helps.

Why does my hadoop command not work?

I have my hadoop cluster set up with one master and two slaves.
when I type
hadoop fs -ls
ls: Cannot access .: No such file or directory.
But when I type the following:
hadoop fs -ls /
Found 1 items
drwxr-xr-x - Mike supergroup 0 2014-06-24 00:24 /usr
I get the same output on both the master and the slaves. Why does hadoop fs -ls not work?
Thanks!
hadoop fs -ls
This tries to list the current user's home directory on HDFS. Since the /user/{username} directory doesn't exist in your case, you get the error.
hadoop fs -ls /
Here you are explicitly telling it to list the root directory, which succeeds because that directory exists.
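If you want the bare hadoop fs -ls to work, one option is to create the missing home directory; a sketch assuming the username Mike from the listing above and that you can act as the HDFS superuser:
sudo -u hdfs hadoop fs -mkdir -p /user/Mike
sudo -u hdfs hadoop fs -chown Mike:supergroup /user/Mike
hadoop fs -ls   # now resolves to /user/Mike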

How to open HDFS output file using gedit?

I have installed Hadoop and executed a MapReduce program successfully on my system (Ubuntu 14.04).
I can see the output files:
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hadoopuser/MapReduceSample-output
Found 3 items
-rw-r--r-- 1 hadoopuser supergroup 0 2014-07-09 16:10 /user/hadoopuser/MapReduceSample-output/_SUCCESS
drwxr-xr-x - hadoopuser supergroup 0 2014-07-09 16:10 /user/hadoopuser/MapReduceSample-output/_logs
-rw-r--r-- 1 hadoopuser supergroup 880838 2014-07-09 16:10 /user/hadoopuser/MapReduceSample-output/part-00000
And I can view it in the terminal using the following command:
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -cat /user/hadoopuser/MapReduceSample-output/part-00000
I can see the output file in the terminal, but I can't see the full result because the output has a large number of lines.
So I want to open it on gedit or nano.
Need Solution.
You can also use getmerge to copy an HDFS file to the local filesystem.
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -getmerge /user/hadoopuser/MapReduceSample-output/part-00000 /home/arul/MROutput
hadoop dfs -getmerge /path/to/HDFS /path/to/save
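If you only want to page through a large result without copying it first, you can also pipe -cat into a pager, or peek at the end with -tail (standard shell usage, shown with the output path from this question):
hadoop fs -cat /user/hadoopuser/MapReduceSample-output/part-00000 | less
hadoop fs -tail /user/hadoopuser/MapReduceSample-output/part-00000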
Instead of looking for a plugin, you can add the jar files from $HADOOP_INSTALL/bin in Eclipse and the compiler issues should be gone.
You can't access an HDFS file directly from the local machine (as a system user), so you can't open an HDFS file with gedit directly.
To open it in gedit you first have to copy it to the local machine.
To do that, open a terminal (Ctrl+Alt+T) and use copyToLocal, a Hadoop shell command, to copy the output file to the local machine.
Do the following,
hadoopuser@arul-PC:/usr/local/hadoop$ sudo bin/hadoop dfs -copyToLocal /user/hadoopuser/MapReduceSample-output/part-00000 /home/arul/Downloads/
Now you can open the output file using gedit as follows,
$ sudo gedit /home/arul/Downloads/part-00000
Note:
My HDFS username is hadoopuser.
If you want to move (rather than copy) a file, the Hadoop shell command fs -mv moves a file from one HDFS location to another; it does not move it to the local machine.
For more Hadoop shell commands, see the Hadoop FileSystem Shell documentation.
Update (another option to do the same, from Y-Prithvi's post):
You can also use getmerge to copy an HDFS file to the local filesystem.
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -getmerge /user/hadoopuser/MapReduceSample-output/part-00000 /home/arul/MROutput
hadoop dfs -getmerge /path/to/HDFS /path/to/save
Eclipse Setup for Hadoop Development: this should help.

hadoop fs -put command

I have constructed a single-node Hadoop environment on CentOS using the Cloudera CDH repository. When I want to copy a local file to HDFS, I used the command:
sudo -u hdfs hadoop fs -put /root/MyHadoop/file1.txt /
But the result depressed me:
put: '/root/MyHadoop/file1.txt': No such file or directory
I'm sure this file does exist.
Please help me, thanks!
As the hdfs user, do you have access rights to /root/ (on your local disk)? Usually you don't.
You must copy file1.txt to a place where the local hdfs user has read rights before trying to copy it to HDFS.
Try:
cp /root/MyHadoop/file1.txt /tmp
chown hdfs:hdfs /tmp/file1.txt
# older versions of Hadoop
sudo -u hdfs hadoop fs -put /tmp/file1.txt /
# newer versions of Hadoop
sudo -u hdfs hdfs dfs -put /tmp/file1.txt /
Edit:
Take a look at roman-nikitchenko's cleaner answer below.
I had the same situation and here is my solution:
HADOOP_USER_NAME=hdfs hdfs dfs -put /root/MyHadoop/file1.txt /
Advantages:
You don't need sudo.
You don't actually need a local user 'hdfs' to exist at all.
You don't need to copy anything or change permissions because of previous points.
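The same environment variable works for other HDFS commands as well, as long as the cluster does not use Kerberos security; the paths and user name below are just examples:
HADOOP_USER_NAME=hdfs hdfs dfs -mkdir -p /data/incoming
HADOOP_USER_NAME=hdfs hdfs dfs -chown myuser /data/incoming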
Try to create a directory in HDFS using: $ hadoop fs -mkdir your_dir
and then put the file into it: $ hadoop fs -put /root/MyHadoop/file1.txt your_dir
Here is a command for writing a DataFrame (df) directly to the HDFS file system from a Python (PySpark) script:
df.write.save('path', format='parquet', mode='append')
mode can be append | overwrite
If you want to put the file into HDFS using the shell, use this command:
hdfs dfs -put /local_file_path_location /hadoop_file_path_location
You can then check the NameNode web UI at localhost:50070 for verification.
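You can also verify from the shell instead of the web UI, using the same placeholder path:
hdfs dfs -ls /hadoop_file_path_location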
