Can't "fs.default.name" in the Hadoop configuration be set directly in ip:port format? - hadoop

Hi all,
I have set up a Hadoop cluster in fully distributed mode. First, I set "fs.default.name" in core-site.xml and "mapred.job.tracker" in mapred-site.xml in hostname:port format, changed /etc/hosts correspondingly, and the cluster worked successfully.
Then I tried another way: I set "fs.default.name" in core-site.xml and "mapred.job.tracker" in mapred-site.xml in ip:port format. It doesn't work.
I find
ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name. Using 'localhost'...
in the namenode log file, and
ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name. Using 'localhost'...
java.net.UnknownHostException: slave01: slave01: Name or service not known
in the datanode log file.
In my opinion, an IP and a hostname are equivalent. Is there something wrong in my Hadoop configuration?

Maybe there is a wrongly configured hostname in /etc.
You should check the hostname in /etc/hosts and /etc/HOSTNAME (RHEL/Debian) or rc.conf (Arch Linux), etc.
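The hostname check suggested above can be approximated from each node. A minimal Python sketch, assuming Python 3.9+ is available on the node (the helper names are hypothetical): it looks up the machine's own hostname and tries to resolve it, which is roughly the lookup that fails in the logs with "Error getting localhost name" and UnknownHostException. Hadoop does this lookup in Java, so treat the result as a hint rather than proof.

```python
import socket
from typing import Optional, Tuple


def resolve(hostname: str) -> Optional[str]:
    """Return the IPv4 address that hostname resolves to, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror / socket.herror are subclasses of OSError
        return None


def check_local_hostname() -> Tuple[str, Optional[str]]:
    """Return this machine's hostname and the address it resolves to (None if unresolvable)."""
    name = socket.gethostname()
    return name, resolve(name)


if __name__ == "__main__":
    name, addr = check_local_hostname()
    if addr is None:
        print(f"{name!r} does not resolve; add it to /etc/hosts on every node")
    else:
        print(f"{name!r} resolves to {addr}")
```

Run it on every master and slave; a node that prints "does not resolve" is a likely source of the UnknownHostException.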

I got your point. This is probably because you wrote hdfs://ip:port in mapred-site.xml (a value starting with hdfs:// is wrong there), whereas when you wrote hostname:port you probably did not put hdfs:// at the beginning of the value, which is the correct way. Therefore the first attempt did not work, but the second one did.
Fatih haltas
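The distinction above can be checked mechanically. A minimal Python sketch of a hypothetical sanity check for the mapred.job.tracker value: it accepts a plain host:port authority (or Hadoop 1's default value "local") and rejects values that carry a scheme such as hdfs://. This is not Hadoop's own validation logic, and port 9001 in the usage lines is only an example.

```python
import re

# host (name or IPv4 address) followed by a numeric port, with no scheme or path
_AUTHORITY = re.compile(r"([A-Za-z0-9.-]+):(\d{1,5})")


def is_valid_job_tracker(value: str) -> bool:
    """Hypothetical sanity check for mapred.job.tracker: 'local' or a plain host:port."""
    value = value.strip()
    if value == "local":  # Hadoop 1 default: jobs run in-process, no JobTracker
        return True
    match = _AUTHORITY.fullmatch(value)
    return match is not None and 0 < int(match.group(2)) <= 65535


if __name__ == "__main__":
    for v in ["master:9001", "192.168.1.2:9001", "hdfs://192.168.1.2:9001"]:
        print(f"{v!r}: {is_valid_job_tracker(v)}")
```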

I found the answer here.
It seems that HDFS uses hostnames for all of its communication and display purposes, so we can NOT use an IP directly in core-site.xml and mapred-site.xml.

Related

Meaning of fs.defaultFS property in core-site.xml in hadoop

I am trying to set up Hadoop in fully distributed mode, and to some extent I have succeeded.
However, I have a doubt about one of the parameter settings in core-site.xml: fs.defaultFS.
In my set up, I have three nodes as described below:
Node1 -- 192.168.1.2 --> Configured to be Master (Running ResourceManager and NameNode daemons)
Node2 -- 192.168.1.3 --> Configured to be Slave (Running NodeManager and Datanode daemons)
Node3 -- 192.168.1.4 --> Configured to be Slave (Running NodeManager and Datanode daemons)
Now what does property fs.defaultFS mean? For example, if I set it like this:
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.2:9000/</value>
</property>
I am not able to understand the meaning of hdfs://192.168.1.2:9000. I can figure out that hdfs means we are using the HDFS file system, but what do the other parts mean?
Does this mean that the host with IP address 192.168.1.2 is running the Namenode at port 9000?
Can anyone help me understand this?
In this code:
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.2:9000/</value>
</property>
Include fs.defaultFS/fs.default.name in core-site.xml to allow dfs commands without providing the full site name in the command, e.g. running hdfs dfs -ls / instead of hdfs dfs -ls hdfs://hdfs/.
This property specifies the default file system. It defaults to your local file system, which is why it needs to be set to an HDFS address. It is important for client configuration as well, so your local configuration file should include this element.
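A rough Python model of what the default file system buys you: a path without a scheme is qualified against fs.defaultFS, so / turns into hdfs://host:port/. This is a simplified sketch (the function name is hypothetical); Hadoop's own path qualification also resolves relative paths against a working directory, which is ignored here.

```python
from urllib.parse import urlsplit, urlunsplit


def qualify(path: str, default_fs: str) -> str:
    """Qualify an absolute path against a default file system URI.

    Paths that already carry a scheme (hdfs://..., file://...) are returned unchanged.
    """
    if urlsplit(path).scheme:
        return path
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    fs = urlsplit(default_fs)
    # keep the scheme and host:port of the default FS, take the path from the argument
    return urlunsplit((fs.scheme, fs.netloc, path, "", ""))


if __name__ == "__main__":
    print(qualify("/", "hdfs://192.168.1.2:9000/"))
```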
As @Shashank explained above, very appropriately:
in hdfs://192.168.1.2:9000/, 9000 is the port on which the datanode sends heartbeats to the namenode, and the address part identifies the NameNode machine, which Hadoop resolves to a hostname.
<name>fs.default.name</name>
Here fs denotes the file system and default.name denotes the default file system, i.e. the NameNode's address. (fs.default.name is the older, deprecated name of fs.defaultFS.)
<value>hdfs://192.168.1.2:9000/</value>
Here 9000 is the NameNode's RPC port: clients connect to it, and by default datanodes send their heartbeats to it. 192.168.1.2 is the NameNode machine, whose address Hadoop resolves to a hostname.
Something important to note about the port: you can use any free port from 1024 upward, because binding to a port below 1024 requires root privileges.
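The pieces of the fs.default.name value discussed above can be pulled apart with Python's standard urllib.parse, which makes the scheme / host / port split explicit. A small sketch (the function name is hypothetical); the needs_root flag mirrors the note about ports below 1024 and assumes Linux-style privileged ports, and the sketch requires the URI to contain an explicit port.

```python
from urllib.parse import urlsplit


def describe_default_fs(uri: str) -> dict:
    """Split an fs.default.name / fs.defaultFS URI into its parts."""
    parts = urlsplit(uri)
    if parts.port is None:
        raise ValueError(f"no port in {uri!r}")
    return {
        "scheme": parts.scheme,             # 'hdfs' selects the HDFS file system
        "namenode_host": parts.hostname,    # machine running the NameNode
        "namenode_port": parts.port,        # NameNode RPC port
        "needs_root": parts.port < 1024,    # privileged port on Linux
    }


if __name__ == "__main__":
    print(describe_default_fs("hdfs://192.168.1.2:9000/"))
```

So, to answer the question directly: yes, hdfs://192.168.1.2:9000/ means the NameNode runs on the host 192.168.1.2 and listens on port 9000.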

How do i check hadoop server name?

When I try to import data into Excel from HDFS, it asks me to enter the name of the Hadoop server, so I am confused about what to type there. Kindly help me with this.
Give the name of your NameNode and try once.
You should enter the name of your NameNode here.
For example, in hdfs-site.xml there is a property dfs.namenode.http-address; check its value. You need to enter the name of the server given in this property.
For me, it is set to:
<property>
<name>dfs.namenode.http-address</name>
<value>mballur.myorg.com:50070</value>
<description>The address and the base port where the dfs namenode
web ui will listen on. If the port is 0 then the server will
start on a free port.</description>
<final>true</final>
</property>
So, in my example, you need to enter mballur.myorg.com.
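Reading that value can also be done programmatically. A minimal Python sketch, assuming you have the text of hdfs-site.xml at hand (the helper names are hypothetical): it looks up dfs.namenode.http-address and returns its host part, i.e. the server name to type in.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def get_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the named <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def namenode_server_name(hdfs_site_xml: str) -> Optional[str]:
    """Host part of dfs.namenode.http-address, e.g. 'mballur.myorg.com'."""
    address = get_property(hdfs_site_xml, "dfs.namenode.http-address")
    if address is None:
        return None
    host, _, _port = address.rpartition(":")
    return host or address  # no colon: the whole value is the host


if __name__ == "__main__":
    with open("hdfs-site.xml", encoding="utf-8") as f:  # path is an assumption
        print(namenode_server_name(f.read()))
```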
You can also get the name of the NameNode by running the hadoop fsck command.
For example, when I run the following command:
CMD PROMPT>hadoop fsck /tmp/
I get the following output:
Connecting to namenode via http://mballur.myorg.com:50070/fsck?ugi=mballur&path=%2Ftmp
FSCK started by mballur (auth:SIMPLE) from /192.168.56.1 for path /tmp at Wed Jan 06 18:29:57 IST 2016
In the first line, the host part of the URL is the name of the NameNode:
Connecting to namenode via http://mballur.myorg.com:50070/fsck?ugi=mballur&path=%2Ftmp
Also, check this tutorial in Youtube: https://www.youtube.com/watch?v=_eyE7Qcj0_A
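If you capture the fsck output, the NameNode host can be picked out of that first line. This Python sketch assumes the line looks exactly like the one shown above ("Connecting to namenode via <url>"); other Hadoop versions may word it differently, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

PREFIX = "Connecting to namenode via "


def namenode_from_fsck(output: str) -> Optional[str]:
    """Return the host of the URL in the 'Connecting to namenode via ...' line, if any."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            return urlsplit(line[len(PREFIX):]).hostname
    return None


if __name__ == "__main__":
    sample = "Connecting to namenode via http://mballur.myorg.com:50070/fsck?ugi=mballur&path=%2Ftmp"
    print(namenode_from_fsck(sample))
```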

Hadoop UI shows only one Datanode

I've started a Hadoop cluster composed of one master and 4 slave nodes.
The configuration seems OK:
hduser@ubuntu-amd64:/usr/local/hadoop$ ./bin/hdfs dfsadmin -report
When I open the NameNode UI (http://10.20.0.140:50070/), the Overview card seems OK; for example, the total capacity of all nodes sums up.
The problem is that in the Datanodes card I see only one datanode.
I came across the same problem and, fortunately, solved it. I guess it is caused by 'localhost'.
Configure a different hostname for each of these IPs in /etc/hosts.
Remember to restart all the machines; then things will go well.
It's because both datanodes have the same hostname.
In your case, both datanodes are registering with the namenode under the same hostname, i.e. 'localhost'. Try different hostnames; that will fix your problem.
The UI shows only one entry per hostname.
In the "hdfs dfsadmin -report" output you can see both.
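The symptom described above (the report lists every node, but the UI collapses them) can be spotted by grouping nodes by the hostname they report. A hypothetical Python sketch, assuming Python 3.9+: the input mapping of IP address to reported hostname is something you would assemble yourself, e.g. from the hdfs dfsadmin -report output, and the addresses used in the usage lines are illustrative.

```python
from collections import defaultdict


def duplicate_hostnames(nodes: dict[str, str]) -> dict[str, list[str]]:
    """Map each hostname claimed by more than one IP to the sorted list of those IPs.

    nodes maps a DataNode's IP address to the hostname it reports.
    """
    by_name: dict[str, list[str]] = defaultdict(list)
    for ip, hostname in nodes.items():
        by_name[hostname].append(ip)
    return {name: sorted(ips) for name, ips in by_name.items() if len(ips) > 1}


if __name__ == "__main__":
    nodes = {"10.20.0.141": "localhost", "10.20.0.142": "localhost", "10.20.0.143": "slave03"}
    print(duplicate_hostnames(nodes))
```

Any hostname that shows up in the result will appear as a single entry in the UI even though several DataNodes are registered under it.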
The following tips may help you
Check the core-site.xml and ensure that the namenode hostname is correct
Check the firewall rules in namenode and datanodes and ensure that the required ports are open
Check the logs of datanodes
Ensure that all the datanodes are up and running
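For the firewall tip, a quick way to test whether a port is reachable from another node is to attempt a TCP connection. A small Python sketch (the function name is hypothetical); which ports to test depends on your Hadoop version and configuration, e.g. the NameNode RPC port from fs.default.name and the NameNode web UI port 50070 mentioned earlier.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or hostname not resolvable
        return False


if __name__ == "__main__":
    # run from a datanode; address and port are examples taken from this thread
    print(port_open("10.20.0.140", 50070))
```

A False result from a datanode while the NameNode process is running usually points at a firewall rule or a wrong address rather than at HDFS itself.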
As @Rahul said, the problem is caused by the identical hostnames.
Change the hostname in the /etc/hostname file, giving each host a different hostname,
and map each hostname to its IP address in the /etc/hosts file.
Then restart your cluster, and you will see all datanodes in the Datanode Information tab in the browser.
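The /etc/hosts part of that fix can be generated from one table of nodes, which also guards against accidentally reusing a hostname. A minimal Python sketch, assuming Python 3.9+; the node names are hypothetical, and you would still copy the result to every machine and set each machine's own name in /etc/hostname.

```python
def hosts_entries(nodes: dict[str, str]) -> str:
    """Render /etc/hosts lines ('IP<TAB>hostname') for the cluster nodes.

    Raises ValueError if two IPs share a hostname, since each DataNode
    needs its own name to show up separately in the NameNode UI.
    """
    seen: dict[str, str] = {}
    for ip, hostname in nodes.items():
        if hostname in seen:
            raise ValueError(f"{hostname} used by both {seen[hostname]} and {ip}")
        seen[hostname] = ip
    return "\n".join(f"{ip}\t{hostname}" for ip, hostname in nodes.items()) + "\n"


if __name__ == "__main__":
    print(hosts_entries({"10.20.0.140": "master", "10.20.0.141": "slave01"}), end="")
```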
I had the same trouble because I used IPs instead of hostnames: [hdfs dfsadmin -report] was correct, but there was only one entry [localhost] in the UI. Finally, I solved it like this:
<property>
  <name>dfs.datanode.hostname</name>
  <value>the name you want to show</value>
</property>
You can hardly find it in any document...
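Because dfs.datanode.hostname has to differ on every DataNode, it can be convenient to generate the snippet per machine. A minimal Python sketch using the standard xml.etree.ElementTree module (the function name is hypothetical); the hostnames passed in are examples, and the output still has to be placed inside the <configuration> element of each node's hdfs-site.xml.

```python
import xml.etree.ElementTree as ET


def datanode_hostname_property(hostname: str) -> str:
    """Build the <property> element that sets dfs.datanode.hostname for one node."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.datanode.hostname"
    ET.SubElement(prop, "value").text = hostname  # special characters are escaped
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    for host in ["slave01", "slave02"]:
        print(datanode_hostname_property(host))
```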
Sorry, it's been a while, but I'd still like to share my answer:
the root cause is in hadoop/etc/hadoop/hdfs-site.xml.
That file has a property named dfs.datanode.data.dir. If you give all the datanodes the same name, Hadoop assumes the cluster has only one datanode. So the proper way is to give every datanode a unique name:
Regards,
YUN HANXUAN
Your admin report looks absolutely fine. Please run the command below to check the HDFS disk space details:
"hdfs dfs -df /"
If the size still looks right, it's just a UI glitch.
My problem: I have 1 master node and 3 slave nodes. When I started all nodes with start-all.sh and opened the dashboard of the master node, I could see only one datanode in the web UI.
My solution:
Try stopping the firewall temporarily with sudo systemctl stop firewalld. If you do not want to stop the firewalld service, then allow the datanode ports instead:
sudo firewall-cmd --permanent --add-port={PORT_Number/tcp,PORT_number2/tcp} ; sudo firewall-cmd --reload
If you are using a separate user for Hadoop (in my case, a hadoop user manages the Hadoop daemons), change the owner of your DataNode and NameNode directories: sudo chown -R hadoop:hadoop /opt/data
My hdfs-site.xml configuration was given as an image.
Check the daemons on the data node with the jps command; the DataNode process should be listed in its output.

A Hadoop DataNode error: host:port authority

When I try to run the Hadoop cluster, it fails. The main error is like this:
But the strange thing is that the NameNode, JobTracker, SecondaryNameNode and TaskTracker are OK; only the DataNode fails.
My other configurations are like these:
hdfs-site.xml
core-site.xml
mapred-site.xml
I am not sure if it would help, but check this page
To quote from there,
Even though I configured core-site.xml, mapred-site.xml and
hdfs-site.xml under the /usr/local/hadoop/conf/ folder, by default the
system is referring to /etc/hadoop/*.xml. Once I updated the
configuration files in the /etc/hadoop location, everything started
working.
Please make sure you are picking the correct set of configuration files. Looks like some classpath related issue since your setup is bypassing whatever you have configured in your core-site.xml. Make sure you don't have any classpath related issue. Do you have any other Hadoop setup on the same machine, which was done earlier, and then you forgot to edit the classpath for the current setup?
Also, http:// is not required in mapred-site.xml.
HTH
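One way to see which configuration set a node might pick up is to look at HADOOP_CONF_DIR and then at the usual locations, in order. The candidate list below is an assumption covering only the paths mentioned in this thread (/etc/hadoop and /usr/local/hadoop/conf); the real lookup is decided by the Hadoop launcher scripts and your classpath, not by this Python sketch.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping, Optional

# Assumed search order, taken from the paths mentioned in this thread; adjust to your install.
DEFAULT_CANDIDATES = ("/etc/hadoop", "/usr/local/hadoop/conf")


def find_conf_dirs(
    env: Optional[Mapping[str, str]] = None,
    candidates: Iterable[str] = DEFAULT_CANDIDATES,
) -> List[Path]:
    """List directories holding a core-site.xml, checking HADOOP_CONF_DIR first."""
    env = os.environ if env is None else env
    ordered = []
    if env.get("HADOOP_CONF_DIR"):
        ordered.append(env["HADOOP_CONF_DIR"])
    ordered.extend(candidates)
    found: List[Path] = []
    for directory in ordered:
        path = Path(directory)
        if (path / "core-site.xml").is_file() and path not in found:
            found.append(path)
    return found


if __name__ == "__main__":
    dirs = find_conf_dirs()
    if len(dirs) > 1:
        print("More than one configuration set found; make sure you edited the one in use:")
    for d in dirs:
        print(" ", d)
```

If more than one directory is listed, that is exactly the situation described in the quote: edits in one place are silently ignored because another set is on the classpath.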

hadoop hdfs points to file:/// not hdfs://

So I installed Hadoop via Cloudera Manager cdh3u5 on CentOS 5. When I run the command
hadoop fs -ls /
I expected to see the contents of hdfs://localhost.localdomain:8020/
However, it returned the contents of file:///
Needless to say, I can still access my hdfs:// through
hadoop fs -ls hdfs://localhost.localdomain:8020/
But when it came to installing other applications such as Accumulo, Accumulo would automatically detect the Hadoop file system as file:///
The question is: has anyone run into this issue, and how did you resolve it?
I had a look at "HDFS thrift server returns content of local FS, not HDFS", which was a similar issue, but it did not solve this one.
Also, I do not get this issue with Cloudera Manager cdh4.
By default, Hadoop is going to use local mode. You probably need to set fs.default.name to hdfs://localhost.localdomain:8020/ in $HADOOP_HOME/conf/core-site.xml.
To do this, you add this to core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost.localdomain:8020/</value>
</property>
Accumulo is confused because it uses the same default configuration to figure out where HDFS is, and that configuration defaults to file://
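The fallback behaviour explained above can be modelled in a few lines: if core-site.xml sets neither fs.defaultFS nor its deprecated predecessor fs.default.name, the default is file:///, so hadoop fs -ls / (and Accumulo) see the local file system. This Python sketch is a simplification with a hypothetical function name: Hadoop's real Configuration class also merges core-default.xml, honours <final>, and expands variables.

```python
import xml.etree.ElementTree as ET

# Hadoop's built-in default when no default file system is configured
LOCAL_FS = "file:///"


def default_fs(core_site_xml: str) -> str:
    """Return the default file system configured in core-site.xml text.

    fs.defaultFS is the current key; fs.default.name is its deprecated
    predecessor (used by older releases such as CDH3). Falls back to file:///.
    """
    values = {}
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        values[name] = (prop.findtext("value") or "").strip()
    return values.get("fs.defaultFS") or values.get("fs.default.name") or LOCAL_FS


if __name__ == "__main__":
    print(default_fs("<configuration/>"))
```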
We should specify the DataNode data directory and the NameNode metadata directory:
dfs.name.dir / dfs.namenode.name.dir,
dfs.data.dir / dfs.datanode.data.dir (these go in hdfs-site.xml),
fs.default.name (this goes in core-site.xml),
and then format the NameNode.
To format HDFS Name Node:
hadoop namenode -format
Enter 'Yes' to confirm formatting the NameNode. Restart the HDFS service and deploy the client configuration to access HDFS.
If you have already done the above steps, ensure that the client configuration is deployed correctly and points to the actual cluster endpoints.
