I have installed and executed a MapReduce program successfully on my system (Ubuntu 14.04).
I can see the output files as follows:
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hadoopuser/MapReduceSample-output
Found 3 items
-rw-r--r-- 1 hadoopuser supergroup 0 2014-07-09 16:10 /user/hadoopuser/MapReduceSample-output/_SUCCESS
drwxr-xr-x - hadoopuser supergroup 0 2014-07-09 16:10 /user/hadoopuser/MapReduceSample-output/_logs
-rw-r--r-- 1 hadoopuser supergroup 880838 2014-07-09 16:10 /user/hadoopuser/MapReduceSample-output/part-00000
And I can open it in the terminal using the following command:
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -cat /user/hadoopuser/MapReduceSample-output/part-00000
I can see the output file in the terminal, but I can't see the full result because my output has a large number of lines.
So I want to open it in gedit or nano.
Need a solution.
You can also use getmerge to copy an HDFS file to the local system.
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -getmerge /user/hadoopuser/MapReduceSample-output/part-00000 /home/arul/MROutput
hadoop dfs -getmerge /path/to/HDFS /path/to/save
Instead of looking for a plugin, you can add the jar files from $HADOOP_INSTALL/bin in Eclipse and the compiler issues should be gone.
You can't access an HDFS file from the local machine (as a system user), so you can't open an HDFS file directly in gedit.
To open it in gedit you have to copy it to the local machine.
To do that, open a terminal (Ctrl+Alt+T) and use copyToLocal, a Hadoop shell command, to copy the output file to the local machine.
Do the following:
hadoopuser@arul-PC:/usr/local/hadoop$ sudo bin/hadoop dfs -copyToLocal /user/hadoopuser/MapReduceSample-output/part-00000 /home/arul/Downloads/
Now you can open the output file in gedit as follows:
$ sudo gedit /home/arul/Downloads/part-00000
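If you only want to page through a large result without copying it first, you can also pipe the cat output into a pager; a minimal sketch, assuming the same output path as above:
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -cat /user/hadoopuser/MapReduceSample-output/part-00000 | less
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -tail /user/hadoopuser/MapReduceSample-output/part-00000
less lets you scroll through the whole file, while -tail prints only the last kilobyte of it.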
Note:
My HDFS username is hadoopuser.
You can also move files around in HDFS: the Hadoop shell command fs -mv moves a file from one HDFS location to another.
See the Hadoop shell commands documentation for more commands.
Update (another option to do the same, from Y-Prithvi's post):
You can also use getmerge to copy an HDFS file to the local system.
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -getmerge /user/hadoopuser/MapReduceSample-output/part-00000 /home/arul/MROutput
hadoop dfs -getmerge /path/to/HDFS /path/to/save
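If the job had produced several part files, getmerge can also be pointed at the whole output directory and it will concatenate all the parts into one local file; a sketch, assuming the same output directory as above:
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -getmerge /user/hadoopuser/MapReduceSample-output /home/arul/MROutput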
Eclipse Setup for Hadoop Development
This should help.
Related
First, I have read this post: Is there an equivalent to `pwd` in hdfs?. It says there is no such 'pwd' in HDFS.
However, as I progressed through the instructions of Hadoop: Setting up a Single Node Cluster, I failed on this command:
$ bin/hdfs dfs -put etc/hadoop input
put: 'input': No such file or directory
It's weird that I succeeded with this command the first time I went through the instructions, but failed the second time. It's also weird that I succeed with this command on my friend's computer, which has the same system (Ubuntu 14.04) and Hadoop version (2.7.1) as mine.
Can anyone explain what happened here? Is there some 'pwd' in HDFS after all?
Firstly, you are trying to run the command $ bin/hdfs dfs -put etc/hadoop input as a user that doesn't have a home directory in HDFS on the VM.
Let me explain clearly with the following example on the HDP VM:
[root@sandbox hadoop-hdfs-client]# bin/hdfs dfs -put /etc/hadoop input
put: `input': No such file or directory
Here I executed the command as the root user, whose home directory doesn't exist in HDFS on the HDP VM. Run the following command to list the users that do have one:
[root@sandbox hadoop-hdfs-client]# hadoop fs -ls /user
Found 8 items
drwxrwx--- - ambari-qa hdfs 0 2015-08-20 08:33 /user/ambari-qa
drwxr-xr-x - guest guest 0 2015-08-20 08:47 /user/guest
drwxr-xr-x - hcat hdfs 0 2015-08-20 08:36 /user/hcat
drwx------ - hive hdfs 0 2015-09-04 09:52 /user/hive
drwxr-xr-x - hue hue 0 2015-08-20 09:05 /user/hue
drwxrwxr-x - oozie hdfs 0 2015-08-20 08:37 /user/oozie
drwxr-xr-x - solr hdfs 0 2015-08-20 08:41 /user/solr
drwxrwxr-x - spark hdfs 0 2015-08-20 08:34 /user/spark
In HDFS, if you copy a file without giving an absolute path as the destination argument, the destination is resolved relative to the home directory of the logged-in user, and your file is placed there. Here no home directory was found for the root user.
Now let's switch to the hive user and test:
[root@sandbox hadoop-hdfs-client]# su hive
[hive@sandbox hadoop-hdfs-client]$ bin/hdfs dfs -put /etc/hadoop input
[hive@sandbox hadoop-hdfs-client]$ hadoop fs -ls /user/hive
Found 1 items
drwxr-xr-x - hive hdfs 0 2015-09-04 10:07 /user/hive/input
Yay.. successfully copied!
Hope it helps!
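As an alternative to switching users, you could create an HDFS home directory for root yourself; a sketch, assuming you can run commands as the hdfs superuser:
[root@sandbox hadoop-hdfs-client]# sudo -u hdfs hdfs dfs -mkdir -p /user/root
[root@sandbox hadoop-hdfs-client]# sudo -u hdfs hdfs dfs -chown root:hdfs /user/root
[root@sandbox hadoop-hdfs-client]# bin/hdfs dfs -put /etc/hadoop input
After that, the relative path input resolves to /user/root/input.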
This means we need to move the input files to an HDFS location.
Suppose you have an input file named input.txt that needs to be moved to HDFS; then follow the command below.
Command: hdfs dfs -put /input_location /hdfs_location
If you don't have a specific directory in HDFS:
hdfs dfs -put /home/Desktop/input.txt /
If you want a specific directory in HDFS (note: we need to create the directory before proceeding):
hdfs dfs -put /home/Desktop/input.txt /MR_input
After that you can run the examples
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output
Here the input and output arguments are paths that must be in HDFS.
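Putting the steps together, here is a minimal sketch of the whole flow; the /MR_input and /MR_output directory names are just examples:
hdfs dfs -mkdir /MR_input
hdfs dfs -put /home/Desktop/input.txt /MR_input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /MR_input /MR_output
hdfs dfs -cat /MR_output/part-r-00000
Note that the output directory (/MR_output here) must not exist before the job runs; the job creates it.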
Hope this helps.
I was trying to unzip a zip file stored in the Hadoop file system and store it back in the Hadoop file system. I tried the following commands, but none of them worked.
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop fs -put - /tmp/
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop fs -put - /tmp
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop put - /tmp/
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop put - /tmp
When I run those commands I get errors like gzip: stdin has more than one entry--rest ignored, cat: Unable to write to output stream., and Error: Could not find or load main class put on the terminal. Any help?
Edit 1: I don't have access to a UI, so only the command line is available. The unzip/gzip utilities are installed on my Hadoop machine. I'm using Hadoop 2.4.0.
To unzip a gzipped (or bzipped) file, I use the following
hdfs dfs -cat /data/<data.gz> | gzip -d | hdfs dfs -put - /data/
If the file sits on your local drive, then
zcat <infile> | hdfs dfs -put - /data/
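Since the file in the question is a .zip rather than a .gz, the same pipe idea can be used with funzip (from the Info-ZIP unzip package), which extracts the first entry of a zip archive from stdin; a sketch, assuming the archive contains a single file and using a made-up destination name:
hadoop fs -cat /tmp/test.zip | funzip | hadoop fs -put - /tmp/test_unzipped.txt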
Most of the time I use HDFS fuse mounts for this.
So you could just do:
$ cd /hdfs_mount/somewhere/
$ unzip file_in_hdfs.zip
http://www.cloudera.com/content/www/en-us/documentation/archive/cdh/4-x/4-7-1/CDH4-Installation-Guide/cdh4ig_topic_28.html
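For reference, mounting looks roughly like this with the hadoop-fuse-dfs package described in that link; the namenode hostname, port, and mount point are placeholders:
sudo mkdir -p /hdfs_mount
sudo hadoop-fuse-dfs dfs://namenode-host:8020 /hdfs_mount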
Edit 1/30/16: In case you use HDFS ACLs: in some cases fuse mounts don't adhere to HDFS ACLs, so you'll be able to do file operations that are permitted only by the basic Unix access permissions. See https://issues.apache.org/jira/browse/HDFS-6255 and the comments at the bottom, where I recently asked to reopen the issue.
To stream the data through a pipe to hadoop, you need to use the hdfs command.
cat mydatafile | hdfs dfs -put - /MY/HADOOP/FILE/PATH/FILENAME.EXTENSION
gzip uses -c to read data from stdin.
hadoop fs -put doesn't support reading the data from stdin.
I tried a lot of things and nothing would help. I can't find zip input support in Hadoop, so it left me no choice but to download the file from HDFS to the local fs, unzip it, and upload it to HDFS again.
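That workflow would look roughly like this; the paths are only examples:
hadoop fs -get /tmp/test.zip /tmp/test.zip
unzip /tmp/test.zip -d /tmp/test_unzipped
hadoop fs -put /tmp/test_unzipped /tmp/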
I'm trying to copy a file from the local file system to the Hadoop file system.
I'm using a single-node cluster.
hduser@jothinathan-VirtualBox:~$ hdfs dfs -mkdir -p /usr/hduser
hduser@jothinathan-VirtualBox:~$ hadoop fs -ls
Found 1 items
drwxr-xr-x - hduser supergroup 0 2015-03-10 18:33 sample
hduser@jothinathan-VirtualBox:~$ cd Documents
hduser@jothinathan-VirtualBox:~/Documents$ ls
file hadoopFIle.txt URICat URICat.java
hduser@jothinathan-VirtualBox:~/Documents$ cd
hduser@jothinathan-VirtualBox:~$ hadoop fs -copyFromLocal /Documents/file /usr/local/hadoop
copyFromLocal: `/usr/local/hadoop': No such file or directory
I am getting this error message; please help me with this problem.
First try this command:
hadoop fs -ls /
If it lists the local file system files (not HDFS), then try:
hadoop fs -ls hdfs://IP-ADDRESS-of-your-machine/
Now copy your file to HDFS with:
hadoop fs -copyFromLocal /Documents/file hdfs://Ip-addressofyourmachine/above result path
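You can also ask the cluster what it considers the default file system URI, which is what those hdfs:// paths must match; a small sketch using the standard Hadoop 2.x config key:
hdfs getconf -confKey fs.defaultFS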
I want to copy a certain pattern of files from within hdfs to another location in the same hdfs cluster. The dfs shell does not seem to be able to handle this:
hadoop dfs -cp /tables/weblog/server=jeckle/webapp.log.1* /tables/tinylog/server=jeckle/
No error is returned, yet no files are copied.
You need to use double quotes around a path that contains a wildcard, like this:
hdfs dfs -cp "/path/to/foo*" /path/to/bar/
First of all, HDFS copy with wildcards is supported. Secondly, use of hadoop dfs is deprecated; you'd better use hadoop fs or hdfs dfs instead. If you're sure the operation was not successful (although it seems to have succeeded), you can check the namenode log files to see what went wrong.
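One quick way to check whether the glob actually matches anything is to list it first (quoted so the local shell doesn't expand it); a sketch using the paths from the question:
hadoop fs -ls "/tables/weblog/server=jeckle/webapp.log.1*"
hadoop fs -cp "/tables/weblog/server=jeckle/webapp.log.1*" /tables/tinylog/server=jeckle/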
Interesting. This is what I get in my local VM running Hadoop 0.18.0. What version are you using? I can try on 1.2.1 as well.
hadoop-user@hadoop-desk:~$ hadoop fs -ls /user/hadoop-user/testcopy
hadoop-user@hadoop-desk:~$ hadoop dfs -cp /user/hadoop-user/input/*.txt /user/hadoop-user/testcopy/
hadoop-user@hadoop-desk:~$ hadoop fs -ls /user/hadoop-user/testcopy
Found 2 items
-rw-r--r-- 1 hadoop-user supergroup 79 2014-01-06 04:35 /user/hadoop-user/testcopy/HelloWorld.txt
-rw-r--r-- 1 hadoop-user supergroup 140 2014-01-06 04:35 /user/hadoop-user/testcopy/SampleData.txt
These both worked for me:
~]$ hadoop fs -cp -f /user/cloudera/Dec_17_2017/cric* /user/cloudera/Dec_17_2017/Dec_18
~]$ hadoop fs -cp -f "/user/cloudera/Dec_17_2017/cric*" /user/cloudera/Dec_17_2017/Dec_18
I think the better way is to not give double/single quotes.
In case anybody wants to copy the files and folders from the current directory the user is in at the terminal:
hdfs dfs -put ./
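Note that -put normally takes an HDFS destination as the last argument; a sketch with a made-up target directory:
hdfs dfs -put ./ /user/cloudera/from_local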
I have constructed a single-node Hadoop environment on CentOS using the Cloudera CDH repository. When I want to copy a local file to HDFS, I used the command:
sudo -u hdfs hadoop fs -put /root/MyHadoop/file1.txt /
But the result depressed me:
put: '/root/MyHadoop/file1.txt': No such file or directory
I'm sure this file does exist.
Please help me. Thanks!
As the user hdfs, do you have access rights to /root/ (on your local hard drive)? Usually you don't.
You must copy file1.txt to a place where the local hdfs user has read rights before trying to copy it to HDFS.
Try:
cp /root/MyHadoop/file1.txt /tmp
chown hdfs:hdfs /tmp/file1.txt
# older versions of Hadoop
sudo -u hdfs hadoop fs -put /tmp/file1.txt /
# newer versions of Hadoop
sudo -u hdfs hdfs dfs -put /tmp/file1.txt /
--- edit:
Take a look at roman-nikitchenko's cleaner answer below.
I had the same situation and here is my solution:
HADOOP_USER_NAME=hdfs hdfs dfs -put /root/MyHadoop/file1.txt /
Advantages:
You don't need sudo.
You don't actually need an appropriate local user 'hdfs' at all.
You don't need to copy anything or change permissions because of the previous points.
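The same trick works for any HDFS command you want to run as the superuser, for example to create and hand over a home directory; a sketch, assuming the cluster is not secured with Kerberos and using a made-up local username:
HADOOP_USER_NAME=hdfs hdfs dfs -mkdir -p /user/myuser
HADOOP_USER_NAME=hdfs hdfs dfs -chown myuser /user/myuser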
Try to create a directory in HDFS by using: $ hadoop fs -mkdir your_dir
and then put the file into it: $ hadoop fs -put /root/MyHadoop/file1.txt your_dir
Here is a command for writing a DataFrame df directly to the HDFS file system in a Python (PySpark) script:
df.write.save('path', format='parquet', mode='append')
mode can be append or overwrite.
If you want to put it in HDFS using the shell, use this command:
hdfs dfs -put /local_file_path_location /hadoop_file_path_location
You can then check the localhost:50070 UI for verification.
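You can also verify from the command line; a sketch, reusing the destination path above:
hdfs dfs -ls /hadoop_file_path_location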