Hadoop hdfs unable to locate file - hadoop

I'm trying to copy a file to HDFS using the command below. The file name is googlebooks-eng.... etc....
When I try to list the file within HDFS I don't see the file name being listed. What would be the actual file name?
hadoop-user@hadoop-desk:~/hadoop$ bin/hadoop dfs -put /home/hadoop-user/googlebooks-eng-all-1gram-20120701-0 /user/prema
hadoop-user@hadoop-desk:~/hadoop$ bin/hadoop dfs -ls /user/prema
Found 1 items
-rw-r--r-- 1 hadoop-user supergroup 192403080 2014-11-19 02:43 /user/prema

Almost all hadoop dfs utilities follow Unix syntax. The syntax of hadoop dfs -put is
hadoop dfs -put <source_file> <destination>. Here the destination can be a directory or a file. In your case the /user directory exists, but the directory prema doesn't, so when you copy the file from local to HDFS, prema is used as the name of the file. googlebooks-eng-all-1gram-20120701-0 and /user/prema are the same file.
If you want to preserve the file name, you need to delete the existing file and create a new directory /user/prema before copying:
bin/hadoop dfs -rm /user/prema;
bin/hadoop dfs -mkdir /user/prema;
bin/hadoop dfs -put /home/hadoop-user/googlebooks-eng-all-1gram-20120701-0 /user/prema
Now you should be able to see the file inside the HDFS directory /user/prema:
bin/hadoop dfs -ls /user/prema
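The directory-vs-file destination rule above is the same one Unix cp follows, so it can be demonstrated locally without a cluster. A sketch; the temp directory and file names are made up:

```shell
# Destination rule shared by cp and `hadoop dfs -put`:
tmp=$(mktemp -d)
echo data > "$tmp/googlebooks-sample"

# Destination "prema" does not exist yet, so it becomes the FILE's new name:
cp "$tmp/googlebooks-sample" "$tmp/prema"

# Destination is an existing DIRECTORY, so the file keeps its own name:
mkdir "$tmp/premadir"
cp "$tmp/googlebooks-sample" "$tmp/premadir/"

ls "$tmp/prema" "$tmp/premadir/googlebooks-sample"
```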

Related

HDFS dfs full path

How to find full path for HDFS storage in my system?
e.g. I have /user/cloudera/ folder on hdfs storage, but what is path to the "/user/cloudera"? Are there any specific commands?
hdfs dfs -ls and hdfs dfs -ls -R return only the directory listing, but not the full path.
My question is unique, because in here you don't get the HDFS path in the end.
If you are an HDFS admin, you can run:
hdfs fsck /user/cloudera -files -blocks -locations
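To get from the fsck output to an actual block file on disk, you can search a datanode's data directory for the block ID that fsck reports. A sketch; the path /hadoop/dfs/data is an assumption, check dfs.datanode.data.dir in hdfs-site.xml for your cluster's value:

```shell
# -files lists each file, -blocks its block IDs, -locations the datanodes
hdfs fsck /user/cloudera -files -blocks -locations

# On one of the reported datanodes, a block ID such as blk_1073741825
# maps to a local file under the datanode data directory:
find /hadoop/dfs/data -name 'blk_1073741825*'
```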
References:
HDFS Commands Guide: fsck
hdfs file actual block paths

Copying files into HDFS Hadoop

I am currently working on a project for one of my lectures at the university. The task is to download a book from https://www.gutenberg.org/ and copy it into HDFS. I've tried using put <localSrc> <dest>, but it didn't work at all.
This is how my code looks in Terminal at the moment:
[cloudera#quickstart ~]$ put <pg16328.txt> <documents>
bash: syntax error near unexpected token `<'
Any help is appreciated. Thanks in advance.
UPDATE 30.05.2017: I have used the following link https://www.cloudera.com/downloads/quickstart_vms/5-10.html to install Hadoop and did not configure anything at all. The only thing I did was complete the Getting Started tutorial.
It should just be:
hdfs dfs -copyFromLocal pg16328.txt /HDFS/path
I'm not familiar with the put command, but have you tried it without the <>s?
If you have successfully extracted and configured Hadoop, then
you should be in the hadoop home directory (the location where you extracted and configured Hadoop).
Then apply the following command:
bin/hadoop dfs -put <local file location> <hdfs file location>
or
bin/hdfs dfs -put <local file location> <hdfs file location>
You can do the same with the -copyFromLocal command too. Just replace -put with -copyFromLocal in the above commands.
For example:
Let's say you have pg16328.txt in your Desktop directory; then the above command would be
bin/hadoop dfs -put /home/cloudera/Desktop/pg16328.txt /user/hadoop/
where /user/hadoop is a directory in hdfs
If the /user/hadoop directory doesn't exist, then you can create it by
bin/hadoop dfs -mkdir -p /user/hadoop
You can look at the uploaded file using the web UI (namenodeIP:50070) or by using the command line as
bin/hadoop dfs -ls /user/hadoop/

hadoop 2.7.2 HDFS: no such file or directory

I have this:
I had also tried to edit this:
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
as
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_INSTALL/lib"
in ~/.bashrc
But still I am getting a warning message and I'm not able to solve the problem.
Unable to create the directory
I'm using this code to create the directory for twitter analysis:
hadoop fs -mkdir hdfs://localhost:54310/home/vipal/hadoop_store/hdfs/namenode/twitter_data
Notice how hadoop fs -ls says .: No such file or directory?
First, you must create your home directory, which lives under /user in HDFS.
hdfs dfs -mkdir -p /user/$(whoami)
(You should also chown and chmod that directory)
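The chown/chmod step in the parenthetical above can be sketched as follows; the superuser name hdfs and the user vipal are assumptions for a typical installation:

```shell
# Create the home directory as the HDFS superuser, then hand it over:
sudo -u hdfs hdfs dfs -mkdir -p /user/vipal
sudo -u hdfs hdfs dfs -chown vipal:vipal /user/vipal
sudo -u hdfs hdfs dfs -chmod 755 /user/vipal
```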
Then, you can place files into a twitter_data directory.
hdfs dfs -mkdir twitter_data
hdfs dfs -put <local_files> twitter_data
(I removed hadoop_store/hdfs/namenode because that is the namenode's local storage path, which doesn't make sense as an HDFS path)

Bash unable to create directory

In docker, I want to copy a file README.md from an existing directory /opt/ibm/labfiles to a new one /input/tmp. I try this
hdfs dfs -put /opt/ibm/labfiles/README.md input/tmp
to no effect, because there seems to be no /input folder in the root. So I try to create it:
hdfs dfs -mkdir /input
mkdir: '/input': File exists
However, when I run ls, there is no input file or directory.
How can I create a folder and copy the file? Thank you!!
Please try hdfs dfs -ls / if you want to see whether there is an input folder that exists in HDFS at the root.
You cannot cd into an HDFS directory
It's also worth mentioning that the leading slash is important. In other words,
This will try to put the file in HDFS at /user/<name>/input/tmp
hdfs dfs -put /opt/ibm/labfiles/README.md input/tmp
While this puts the file at the root of HDFS
hdfs dfs -put /opt/ibm/labfiles/README.md /input/tmp
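The resolution rule can be demonstrated in pure shell, with no cluster needed: any HDFS path without a leading slash is taken relative to /user/<current user>. The function name and the example user cloudera are illustrative:

```shell
# Sketch of HDFS path resolution for relative vs. absolute paths:
resolve_hdfs_path() {
  local p="$1" user="$2"
  case "$p" in
    /*) printf '%s\n' "$p" ;;                  # absolute path: used as-is
    *)  printf '/user/%s/%s\n' "$user" "$p" ;; # relative: under the home dir
  esac
}

resolve_hdfs_path input/tmp cloudera    # prints /user/cloudera/input/tmp
resolve_hdfs_path /input/tmp cloudera   # prints /input/tmp
```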

how to write customizesd output file format in mapreduce

Please suggest to me how to change the default output file name (part-r-00000) to another file format, like a .csv or .txt file, in MapReduce programs.
You could do this:
hdfs dfs -cat /path/in/hdfs/part* |hdfs dfs -put - /chosen/path/in/hdfs/name_of_file.txt
OR
hdfs dfs -cat /path/in/hdfs/part* |hdfs dfs -put - chosen/path/in/hdfs/name_of_file.csv
Another method is -getmerge, which copies to the local filesystem, but then you need to -copyFromLocal back to HDFS; still, it serves the purpose of changing your file format:
hdfs dfs -getmerge /path/in/hdfs/part* /path/in/local/file_name.format
hdfs dfs -copyFromLocal /path/in/local/file_name.format /path/in/hdfs/archive/
One way is to copy the part-r-00000 file to an xyz.txt file by using Hadoop's -cp command (-put only copies from the local filesystem into HDFS),
like hdfs dfs -cp /path/in/hdfs/part-r-00000 /path/in/hdfs/xyz.txt
