How to write a customized output file format in MapReduce - Hadoop

Please suggest how to change the default output file (part-r-00000) to another file format, such as .csv or .txt, in MapReduce programs.

You could do this:
hdfs dfs -cat /path/in/hdfs/part* | hdfs dfs -put - /chosen/path/in/hdfs/name_of_file.txt
or
hdfs dfs -cat /path/in/hdfs/part* | hdfs dfs -put - /chosen/path/in/hdfs/name_of_file.csv
Another method is -getmerge, which copies the output to the local file system; you then need -copyFromLocal to move it back to HDFS, but it serves the purpose of renaming your file:
hdfs dfs -getmerge /path/in/hdfs/part* /path/in/local/file_name.format
hdfs dfs -copyFromLocal /path/in/local/file_name.format /path/in/hdfs/archive/

One way is to copy the part-r-00000 file to an xyz.txt file using the cp command of Hadoop, which copies within HDFS (note that -put copies from the local file system, not within HDFS):
hdfs dfs -cp /path/in/hdfs/part-r-00000 /path/in/hdfs/xyz.txt
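If you want the job itself to write .csv output instead of renaming files afterwards, you can plug in a custom output format. Below is a minimal sketch, assuming the new org.apache.hadoop.mapreduce API; the class name CsvOutputFormat and the base name "result" are illustrative, not part of Hadoop:

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Writes reducer output as result-r-00000.csv instead of part-r-00000
public class CsvOutputFormat<K, V> extends TextOutputFormat<K, V> {
    @Override
    public Path getDefaultWorkFile(TaskAttemptContext context, String extension) throws IOException {
        FileOutputCommitter committer = (FileOutputCommitter) getOutputCommitter(context);
        // getUniqueFile appends the task type and partition number, e.g. -r-00000
        return new Path(committer.getWorkPath(), getUniqueFile(context, "result", ".csv"));
    }
}

In the driver you would register it and, for comma-separated columns, change the key/value separator that TextOutputFormat uses (the property is mapreduce.output.textoutputformat.separator on Hadoop 2+; older releases use mapred.textoutputformat.separator):

job.setOutputFormatClass(CsvOutputFormat.class);
job.getConfiguration().set("mapreduce.output.textoutputformat.separator", ",");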

Related

Append HDFS file to local file?

I can use the hdfs dfs -appendToFile <localFile> ... <hdfsFile> command to append local files to an HDFS file, as mentioned in HDFS Command Line Append.
Are there any similar commands that allow me to append in the opposite direction, that is, append HDFS files to a certain local file?
For example, something like:
# append files to local
hdfs dfs -appendToLocal <hdfsFile> <localFile>
I found that hdfs dfs -getmerge solves my problem.
hdfs dfs -getmerge -nl <hdfsFile1> <hdfsFile2> ... <hdfsFileN> <localFile>
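If you need the same thing programmatically, here is a minimal sketch using the Java FileSystem API (the paths are placeholders); it opens the HDFS file and appends its bytes to a local file:

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class AppendHdfsToLocal {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Open the HDFS file for reading and the local file in append mode
        try (InputStream in = fs.open(new Path("/path/in/hdfs/file.txt"));
             OutputStream out = new FileOutputStream("/path/in/local/file.txt", true)) {
            IOUtils.copyBytes(in, out, 4096, false); // false: the try block closes the streams
        }
    }
}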

Read a file directly from HDFS

Is there a way to read a file of any format from HDFS directly, using its HDFS path, instead of having to pull the file from HDFS to the local file system and read it there?
You can use the cat command on HDFS to read regular text files:
hdfs dfs -cat /path/to/file.csv
To read compressed files like .gz, .bz2, etc., you can use:
hdfs dfs -text /path/to/file.gz
These are the two read methods that Hadoop supports natively through FsShell commands. For other, more complex file types, you will have to use something more involved, such as a Java program.
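For illustration, here is what such a Java program could look like: a minimal sketch that streams a text file straight from HDFS without copying it locally (the path is a placeholder):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadHdfsFile {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Read directly from HDFS, line by line
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/path/to/file.csv"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}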
hdfs dfs -cat /path or hadoop fs -cat /path
You have to pull the entire file either way. Whether you use the cat or the text command, the whole file is still streamed to your shell; there is just no remnant of it on disk when the command ends. So if you plan on inspecting the file a few times, it's better to -get it.
As an HDFS client, you must contact the NameNode to acquire all the block locations for a particular file.
You can try hdfs dfs -cat.
Usage: hdfs dfs -cat [-ignoreCrc] URI [URI ...]
hdfs dfs -cat /your/path

Hadoop HDFS unable to locate file

I'm trying to copy a file to HDFS using the command below. The filename is googlebooks-eng... etc.
When I try to list the file within HDFS, I don't see the filename being listed. What would be the actual filename?
hadoop-user@hadoop-desk:~/hadoop$ bin/hadoop dfs -put /home/hadoop-user/googlebooks-eng-all-1gram-20120701-0 /user/prema
hadoop-user@hadoop-desk:~/hadoop$ bin/hadoop dfs -ls /user/prema
Found 1 items
-rw-r--r-- 1 hadoop-user supergroup 192403080 2014-11-19 02:43 /user/prema
Almost all hadoop dfs utilities follow Unix conventions. The syntax of hadoop dfs -put is
hadoop dfs -put <source_file> <destination>, where the destination can be a directory or a file. In your case the /user directory exists but the directory prema doesn't, so when you copy the file from local to HDFS, prema is used as the name of the file: googlebooks-eng-all-1gram-20120701-0 and /user/prema are the same file.
If you want to preserve the file name, you need to delete the existing file and create a new directory /user/prema before copying:
bin/hadoop dfs -rm /user/prema;
bin/hadoop dfs -mkdir /user/prema;
bin/hadoop dfs -put /home/hadoop-user/googlebooks-eng-all-1gram-20120701-0 /user/prema
Now you should be able to see the file inside the HDFS directory /user/prema:
bin/hadoop dfs -ls /user/prema

Hadoop 1.2.1 - I need to remove a file from HDFS

Good Day,
I have added a file to HDFS via the command
hadoop fs -put query1.txt .
Now I would like to remove it, but I don't have the HDFS location of the file. Is there any way to remove it?
You can remove the file using this command:
hadoop fs -rmr query1.txt
By default it is stored under /user/<hadoop-user> in HDFS.
Use the commands below to see the HDFS location of the file:
hadoop fs -ls
hadoop fs -ls /
You will see the HDFS location of your file.
To remove the file, use the command below:
hadoop fs -rmr query1.txt

hadoop dfs -ls complains

Can anyone let me know what seems to be wrong here? The hadoop dfs command itself seems to be OK, but any options that follow are not recognized.
[hadoop-0.20]$bin/hadoop dfs -ls ~/wordcount/input/
ls: Cannot access /home/cloudera/wordcount/input/ : No such file or directory
hadoop fs -ls /some/path/here will list an HDFS location, not your local Linux location. Your shell expands ~ to /home/cloudera, a local path that does not exist in HDFS.
First try this command:
hadoop fs -ls /
then investigate the other folders step by step.
If you want to copy some files from a local directory to a users directory in HDFS, just use:
hadoop fs -mkdir /users
hadoop fs -put /some/local/file /users
For more HDFS commands, see: http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html
fs refers to a generic file system, which can point to any file system such as local, HDFS, or S3, whereas dfs is specific to HDFS. So when you use fs, it can perform operations from or to the local file system as well as the Hadoop distributed file system, while a dfs operation always relates to HDFS.
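To see the distinction in code: the generic FileSystem API picks its implementation from the URI scheme. A short sketch (the namenode address is a placeholder):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsVersusDfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs is generic: file:// resolves to the local file system
        FileSystem local = FileSystem.get(URI.create("file:///"), conf);
        // hdfs:// resolves to HDFS, which is all that dfs ever addresses
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
        System.out.println(local.getUri()); // file:///
        System.out.println(hdfs.getUri());  // hdfs://namenode:8020
    }
}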
