I've tried to add a text file to the HDFS filesystem, but Hadoop rejects it with the error message "No such file or directory".
$ bin/hdfs dfs -put /home/NDelt/Datasets/SampleText.txt /home/NDelt/HadoopDir/hdata
put: `/home/NDelt/HadoopDir/hdata': No such file or directory: `hdfs://localhost:9000/home/NDelt/HadoopDir/hdata'
But the paths of SampleText.txt and the hdata directory are correct. What is the problem?
This is my hdfs-site.xml file:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/NDelt/HadoopDir/hdata/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/NDelt/HadoopDir/hdata/dfs/datanode</value>
</property>
</configuration>
As cricket_007 mentioned, there's no /home directory in HDFS.
How to test hdfs put
$ bin/hdfs dfs -put /home/NDelt/Datasets/SampleText.txt /tmp
Then test whether the file was added to HDFS with:
$ bin/hdfs dfs -get /tmp/SampleText.txt
If the file is small, you can also view its content with:
$ bin/hdfs dfs -cat /tmp/SampleText.txt
On HDFS, there is no /home directory.
Your user's home directory in HDFS would be under /user.
And you'd need to explicitly create the HDFS parent path you're putting files into first, with hdfs dfs -mkdir -p (see the sketch below).
There is also no requirement to mirror your local file system layout exactly in HDFS.
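For example, a minimal sketch if you want to keep a similar layout under your HDFS user directory (the target path here is only an illustration, not a required location):
$ bin/hdfs dfs -mkdir -p /user/NDelt/HadoopDir/hdata
$ bin/hdfs dfs -put /home/NDelt/Datasets/SampleText.txt /user/NDelt/HadoopDir/hdata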
I am new to Hadoop and I'm configuring a Hadoop cluster. The Hadoop version is 3.1.3. I want to run the NameNode, a DataNode, and a NodeManager on host hadoop102; a DataNode, the ResourceManager, and a NodeManager on host hadoop103; and the SecondaryNameNode, a DataNode, and a NodeManager on hadoop104.
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.3/data</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop102:9870</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop104:9868</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
workers
hadoop102
hadoop103
hadoop104
I upload the test file from host hadoop102 with the command
hadoop fs -put $HADOOP_HOME/wcinput/word.txt /input
Why is the file only available on hadoop102? I thought the file would be copied to hadoop103 and hadoop104 in their local file systems.
File Information
You need to know that HDFS is not an ordinary replicated file system: putting a file into HDFS does not mean it will appear on the data nodes as a plain file (under the local / filesystem, for example).
HDFS splits the file into blocks, and these blocks are replicated across your cluster according to the configured replication factor.
When you run -copyFromLocal or hdfs dfs -put, all that happens is that the file is split into blocks and those blocks are sent to the data nodes in a replicated fashion.
So if one node goes down, you can still retrieve your file.
But where's my file? The file will not be in your machines' local filesystem; it will be stored on the data nodes.
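If you want to see which data nodes actually hold the blocks of a particular file, fsck can report that. A quick sketch, using the /input/word.txt path from the question:
hdfs fsck /input/word.txt -files -blocks -locations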
How can you configure the number of replicas?
You can set dfs.replication to 3 in hdfs-site.xml,
or you can set the number of replicas for a single file:
hadoop fs -setrep -w 3 /my/file
You can also change the replication factor of all the files under a directory:
hadoop fs -setrep -R -w 3 /my/dir
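To verify the replication factor afterwards, one option is the -stat command, where the %r format prints the replication factor (same placeholder path as above):
hadoop fs -stat %r /my/file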
I'm deploying Hadoop at work and I've been troubleshooting for some days. Yesterday it was working perfectly, but today something strange is happening.
I have hadoop.tmp.dir set in core-site.xml, as well as the other HDFS directories (datanode, namenode and secondarynamenode in hdfs-site.xml). But today, when I format the FS, it creates all the files in /tmp and not in /usr/local/hadoop/tmp, which is the one I have configured.
$ bin/hdfs namenode -format
[...]
INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
[...]
core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/usr/local/hadoop/hdfs/secondname</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hdfs/datanode</value>
</property>
Anyone has any clue about what's happening?
Thanks!
Make sure this directory exists and has the right permissions,
and give the path as file:///usr/local/hadoop/tmp.
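For example, assuming the Hadoop processes run as a user and group both named hadoop (adjust to your setup):
$ sudo mkdir -p /usr/local/hadoop/tmp
$ sudo chown -R hadoop:hadoop /usr/local/hadoop/tmp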
Found what was wrong, and it was really embarrassing. My hadoop user had bash as its default shell, but it wasn't loading the profile correctly until I explicitly ran "bash" on the command line.
I saw it with the printenv command.
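For anyone checking the same thing, a quick way to see whether the Hadoop-related variables are actually set in the current environment:
$ printenv | grep -i hadoop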
Have set up a single pseudo-distributed node (localhost) with Hadoop 2.8.2 on OpenSuse Tumbleweed 20170703. Java version is 1.8.0_151. Generally, it seems to be set up correctly. I can format the namenode with no errors, etc.
However, when I try hadoop fs -ls, the files/dirs from the current local working directory are returned rather than the expected behaviour of listing the HDFS volume's files (which should be nothing at the moment).
Was originally following this guide for CentOS (making changes as required) and the Apache Hadoop guide.
I'm assuming that it's a config issue, but I can't see why it would be. I've played around with core-site.xml and hdfs-site.xml as per below with no luck.
/opt/hadoop-hdfs-volume/ exists and is assigned to user hadoop in user group hadoop. As is the /opt/hadoop/ directory (for bin stuff).
EDIT:
/tmp/hadoop-hadoop/dfs/name is where the hdfs namenode -format command writes its files. /tmp/ also seems to hold directories for my own user (/tmp/hadoop-dijksterhuis) and for the hadoop user.
This seems odd to me considering the *-site.xml config files below.
Have tried restarting the dfs and yarn services with the .sh scripts in the hadoop/sbin/ directory. Have also rebooted. No luck!
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-hdfs-volume/${user.name}</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>${hadoop.tmp.dir}/dfs/data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>${hadoop.tmp.dir}/dfs/name</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Anyone have any ideas? I can provide more details if needed.
Managed to hack a fix via another SO answer:
Add export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop to the hadoop user's .bashrc.
This has the effect of overriding the value in etc/hadoop/hadoop-env.sh, which kept pointing the namenode to the default /tmp/hadoop-${user.name} directory.
source .bashrc et voila! Problem fixed.
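In other words, a sketch of the two steps with the path from above:
$ echo 'export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop' >> ~/.bashrc
$ source ~/.bashrc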
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/abcd/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/abcd/data1</value>
</property>
</configuration>
bin/hadoop namenode -format
bin/hadoop-daemon.sh start namenode
Error msg -
logs: Directory /abcd/name is in an inconsistent state: storage directory does not exist or is not accessible.
The configuration properties used in your config file (dfs.name.dir, dfs.data.dir, etc.) are deprecated from Hadoop 2.0 onwards. The new names are (a corrected version of your file is sketched after this list):
Hadoop 1.x -> Hadoop 2.x
dfs.name.dir -> dfs.namenode.name.dir
dfs.data.dir -> dfs.datanode.data.dir
dfs.name.edits.dir -> dfs.namenode.edits.dir
fs.checkpoint.dir -> dfs.namenode.checkpoint.dir
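For example, your hdfs-site.xml with the same values but the non-deprecated property names would look like this:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/abcd/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/abcd/data1</value>
</property>
</configuration>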
There might be files inside your HDFS name, data, and checkpoint directories. Try to delete the contents of the data, name, and checkpoint directories before formatting the NameNode; HDFS formatting fails if there are HDFS-specific files in those directories.
dfs.namenode.checkpoint.dir - the default location is file://${hadoop.tmp.dir}/dfs/namesecondary
Check the permissions of namenode and datanode directories.
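A minimal sequence, assuming the /abcd/name and /abcd/data1 paths from your config exist and are writable by the user that runs Hadoop:
ls -ld /abcd/name /abcd/data1         # check that the directories exist and who owns them
rm -rf /abcd/name/* /abcd/data1/*     # clear any leftover HDFS files before re-formatting
bin/hadoop namenode -format
bin/hadoop-daemon.sh start namenode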
I was trying to access a file present in HDFS (location: /user/input/UsageSummary.txt). I am not sure what the URL for this file should be.
I tried this URL, but it did not work:
hdfs://127.0.0.1:9000/user/input/UsageSummary.txt
I even tried these, but none of them worked:
hdfs://localhost:9000/user/input/UsageSummary.txt
hdfs://localhost/user/input/UsageSummary.txt
Please let me know how to find out the correct URL.
EDIT
This is the content of core-site.xml file:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
<!-- HTTPFS proxy user setting -->
<property>
<name>hadoop.proxyuser.tomcat.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.tomcat.groups</name>
<value>*</value>
</property>
</configuration>
Typically, the HDFS resource path is
hdfs://<NameNodeHost>:<port>/path/to/resource
If you just want to print the file's content, the command below is sufficient.
hadoop fs -cat /user/input/UsageSummary.txt
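If you do need a full URI, note that your core-site.xml sets fs.default.name to hdfs://localhost:8020, not port 9000, so something like this should work:
hadoop fs -cat hdfs://localhost:8020/user/input/UsageSummary.txt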
What output or error are you getting, and what mode are you running Hadoop in: local, fully distributed, or pseudo-distributed?
What do you have set as fs.defaultFS in your core-site.xml? If it's set to hdfs://host:port/, you should be able to run something like
hdfs dfs -cat /user/input/UsageSummary.txt
or run
hdfs dfs -ls /
to list the root and make sure the directory structure does exist.