Incorrect HDFS File URL - hadoop

I was trying to access a file present in the HDFS (location: /user/input/UsageSummary.txt). I am not sure what will be the URL for this file.
I tried this URL but it did not work:
hdfs://127.0.0.1:9000/user/input/UsageSummary.txt
I also tried these, but none of them worked:
hdfs://localhost:9000/user/input/UsageSummary.txt
hdfs://localhost/user/input/UsageSummary.txt
Please let me know how to find out the correct URL.
EDIT
This is the content of core-site.xml file:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
<!-- HTTPFS proxy user setting -->
<property>
<name>hadoop.proxyuser.tomcat.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.tomcat.groups</name>
<value>*</value>
</property>
</configuration>

Typically, the HDFS resource path is
hdfs://<NameNodeHost>:<port>/path/to/resource
If you just want to print a file's content, the following is sufficient:
hadoop fs -cat /user/input/UsageSummary.txt
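Given the core-site.xml you posted, fs.default.name points at hdfs://localhost:8020, so the fully qualified URL should presumably use port 8020 rather than 9000, for example:
hadoop fs -cat hdfs://localhost:8020/user/input/UsageSummary.txt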

What is the output or error you are getting, and what mode are you running Hadoop in - local, fully distributed, or pseudo-distributed?
What do you have set as fs.defaultFS in your core-site.xml? If it's set to hdfs://host:port/, you should be able to run something like
hdfs dfs -cat /user/input/UsageSummary.txt
or run
hdfs dfs -ls /
to list the root and ensure the directory structure does exist.
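If you are not sure which value is actually in effect, you can also ask the client directly (hdfs getconf is a standard Hadoop command, shown here as a quick sanity check):
hdfs getconf -confKey fs.defaultFS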

Related

Hadoop 'put' command: No such file or directory

I've tried to add a text file to the HDFS filesystem, but Hadoop refuses it with the error message "No such file or directory".
$ bin/hdfs dfs -put /home/NDelt/Datasets/SampleText.txt /home/NDelt/HadoopDir/hdata
put: `/home/NDelt/HadoopDir/hdata': No such file or directory: `hdfs://localhost:9000/home/NDelt/HadoopDir/hdata'
But the path of SampleText.txt and hdata directory is correct. What is the problem?
This is my hdfs-site.xml file:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/NDelt/HadoopDir/hdata/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/NDelt/HadoopDir/hdata/dfs/datanode</value>
</property>
</configuration>
As cricket_007 mentioned, there's no /home directory in HDFS.
How to test hdfs put
$ bin/hdfs dfs -put /home/NDelt/Datasets/SampleText.txt /tmp
Then test whether the file was added to HDFS with:
$ bin/hdfs dfs -get /tmp/SampleText.txt
If the file is small, you can also view its content:
$ bin/hdfs dfs -cat /tmp/SampleText.txt
On HDFS, there is no /home directory
Your user account in HDFS would be under /user
And you'd need to explicitly create the HDFS parent path you're putting files into first, with hdfs dfs -mkdir -p, as shown below.
There is also no requirement that your HDFS layout exactly match your local file system.
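For example, assuming your HDFS username matches your local one (the /user/NDelt prefix below is an assumption based on your local paths):
$ bin/hdfs dfs -mkdir -p /user/NDelt/HadoopDir
$ bin/hdfs dfs -put /home/NDelt/Datasets/SampleText.txt /user/NDelt/HadoopDir/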

Hadoop not using config files?

I'm deploying Hadoop at work and I've been troubleshooting for some days. Yesterday it was working perfectly, but today something strange is happening.
I have hadoop.tmp.dir set in core-site.xml as well as other directories for HDFS (datanode, namenode and secondarynamenode in hdfs-site.xml). But today, when I format the FS, it creates all the files in /tmp and not in /usr/local/hadoop/tmp, which is the one I have configured.
$ bin/hdfs namenode -format
[...]
INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
[...]
core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/usr/local/hadoop/hdfs/secondname</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hdfs/datanode</value>
</property>
Does anyone have any clue about what's happening?
Thanks!
Make sure this directory exists and has sufficient permissions.
Give the path as file:///usr/local/hadoop/tmp
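For example, assuming the daemons run as a hadoop user and group (that user/group name is an assumption on my part):
$ sudo mkdir -p /usr/local/hadoop/tmp
$ sudo chown -R hadoop:hadoop /usr/local/hadoop/tmp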
Found what was wrong, and it was really embarrassing. My hadoop user had bash as its default shell but wasn't loading the profile correctly until I explicitly ran "bash" on the command line.
I saw it with the printenv command.
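For anyone hitting something similar, a quick way to check whether the shell actually picked up the Hadoop environment is something like:
$ printenv | grep -i hadoop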

Hive 2.3.2 Local Mode Cannot Find Hadoop Installation

According to what I've been reading, you can run Hive without Hadoop or HDFS (like in cases of using Spark or Tez), i.e. in local mode by setting the fs.default.name and hive.metastore.warehouse.dir to local paths. However, when I do this, I get an error:
Starting Hive metastore service.
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
My hive-site.xml file:
<property>
<name>mapred.job.tracker</name>
<value>local</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>file:///tmp/hive/warehouse</value>
</property>
<property>
<name>fs.default.name</name>
<value>file:///tmp/hive</value>
</property>
Does this mean that I still need to have all of the hadoop binaries downloaded and have HADOOP_HOME set to that path? Or does local mode in hive allow me to run without needing all of that content?
Hive doesn't require HDFS or YARN to execute, but it still requires the Hadoop input/output format libraries, just as Spark does.
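In practice that means Hive still needs to find a Hadoop client install, so something roughly like this has to be in place before starting the metastore (the /opt/hadoop path is only an example):
$ export HADOOP_HOME=/opt/hadoop
$ export PATH=$PATH:$HADOOP_HOME/bin
$ hive --service metastore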

Hadoop fs -ls outputs current working directory's files rather than hdfs volume's files

Have set up a single pseudo-distributed node (localhost) with Hadoop 2.8.2 on OpenSuse Tumbleweed 20170703. Java version is 1.8.0_151. Generally, it seems to be set up correctly. I can format namenode with no errors etc.
However, when I try hadoop fs -ls, files/dirs from the current working directory are returned rather than the expected behaviour of returning the hdfs volume files (which should be nothing at the moment).
Was originally following this guide for CentOS (making changes as required) and the Apache Hadoop guide.
I'm assuming that it's a config issue, but I can't see why it would be. I've played around with core-site.xml and hdfs-site.xml as per below with no luck.
/opt/hadoop-hdfs-volume/ exists and is assigned to user hadoop in user group hadoop. As is the /opt/hadoop/ directory (for bin stuff).
EDIT:
/tmp/hadoop-hadoop/dfs/name is where the hdfs namenode -format command writes its files. /tmp/ also seems to hold directories for both my own user (/tmp/hadoop-dijksterhuis) and the hadoop user.
This seems odd to me considering the *-site.xml config files below.
Have tried restarting the dfs and yarn services with the .sh scripts in the hadoop/sbin/ directory. Have also rebooted. No luck!
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-hdfs-volume/${user.name}</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>${hadoop.tmp.dir}/dfs/data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>${hadoop.tmp.dir}/dfs/name</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Anyone have any ideas? I can provide more details if needed.
Managed to hack a fix via another SO answer:
Add export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop to the hadoop user's .bashrc.
This has the effect of overriding the value in etc/hadoop/hadoop-env.sh, which otherwise keeps pointing the namenode to the default /tmp/hadoop-${user.name} directory.
source ~/.bashrc et voilà! Problem fixed.
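For reference, the whole fix boils down to something like this (the /opt/hadoop/etc/hadoop path is the one from my install; adjust to yours):
$ echo 'export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop' >> ~/.bashrc
$ source ~/.bashrc
$ echo $HADOOP_CONF_DIR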

hadoop wordcount and upload file into hdfs

Hello everyone, I am very new to Hadoop and I installed Hadoop in pseudo-distributed mode.
The configuration files are here:
core-site.xml
<configuration>
<property>
<name>fs.default.name </name>
<value> hdfs://localhost:9000 </value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop_usr/hadoopinfra/hdfs/namenode </value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop_usr/hadoopinfra/hdfs/datanode </value>
</property>
</configuration>
I have successfully started the datanode and namenode.
Now I want to put my file into HDFS in the following way:
What is going wrong, and why do I get an error message? Please help me resolve this problem.
If I use the following way to put the file into HDFS, the command works fine. Here I append the HDFS URL:
Please help me understand why I get an error the first way.
I also get an error message when running my wordcount.jar with data.txt specified as the input file on which the operation should be performed.
Thanks in advance.
The reason the first put operation to data/data.txt is not working is most likely that you do not have a data folder in your HDFS yet.
You can just create it using hadoop fs -mkdir /data.
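For example, following the /data path from this answer (data.txt is the local file mentioned in the question, so adjust the paths to your actual layout):
hadoop fs -mkdir -p /data
hadoop fs -put data.txt /data/data.txt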
