A Hadoop DataNode error: host:port authority

When I try to run the Hadoop cluster, it doesn't start properly. The main error is like this:
The strange thing is that the NameNode, JobTracker, SecondaryNameNode and TaskTracker are all fine; only the DataNode is not.
My other configuration files are as follows:
hdfs-site.xml
core-site.xml
mapred-site.xml

I am not sure if it would help, but check this page
To quote from there,
Even though I configured the core-site.xml, mapred-site.xml and
hdfs-site.xml under the /usr/local/hadoop/conf/ folder, by default the
system was referring to /etc/hadoop/*.xml. Once I updated the
configuration files in the /etc/hadoop location, everything started
working.

Please make sure you are picking up the correct set of configuration files. This looks like a classpath-related issue, since your setup is bypassing whatever you have configured in your core-site.xml. Do you have another Hadoop setup on the same machine from an earlier install, for which you forgot to adjust the classpath for the current setup?
Also, http:// is not required in mapred-site.xml.
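For instance, the mapred.job.tracker value is written as a bare host:port with no scheme in front of it; a minimal sketch, where the host name and port are placeholders rather than values from the question:
<property>
<name>mapred.job.tracker</name>
<!-- plain host:port, no http:// or hdfs:// prefix; host name and port are placeholders -->
<value>master:9001</value>
</property>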
HTH

Related

How to change java.io.tmpdir for spark job running on yarn

How can I change the java.io.tmpdir folder for my Hadoop 3 cluster running on YARN?
By default it gets something like /tmp/***, but my /tmp filesystem is too small for everything the YARN job will write there.
Is there a way to change it?
I have also set hadoop.tmp.dir in core-site.xml, but it looks like it is not really used.
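(For reference, such an entry in core-site.xml looks roughly like this; the path below is a placeholder, not the actual value from my cluster.)
<property>
<name>hadoop.tmp.dir</name>
<!-- placeholder path: any local directory with enough free space -->
<value>/data/hadoop-tmp</value>
</property>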
Perhaps it's a duplicate of What should be hadoop.tmp.dir?. Also, go through all the conf files in /etc/hadoop/conf and search for tmp to see if anything is hardcoded. Also specify:
Whether you see any files getting created at the location you specified as hadoop.tmp.dir.
What pattern of files is being created under /tmp/** after your changes are applied.
I have also noticed Hive creating files in /tmp, so you may also want to have a look at hive-site.xml. The same goes for any other ecosystem product you are using.
I configured the yarn.nodemanager.local-dirs property in yarn-site.xml and restarted the cluster. After that, Spark stopped using the /tmp filesystem and used the directories configured in yarn.nodemanager.local-dirs.
The java.io.tmpdir property for the Spark executors was also set to the directories defined in yarn.nodemanager.local-dirs.
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/somepath1,/anotherpath2</value>
</property>

What is the difference between the configuration files under /etc/hadoop/conf, /etc/hadoop/conf.cloudera.HDFS and /etc/hadoop/conf.cloudera.YARN?

I have Cloudera 5.7 and Cloudera Manager as well.
Under the directory /etc/hadoop, I see three sub-directories:
/etc/hadoop/conf
/etc/hadoop/conf.cloudera.HDFS/
/etc/hadoop/conf.cloudera.YARN/
The hadoop-env.sh in ../conf/ is different from the one in ../conf.cloudera.HDFS/.
The core-site.xml in ../conf/ is different from the one in ../conf.cloudera.HDFS/ as well.
The hadoop-env.sh in ../conf/ has settings for YARN, while the one under ../conf.cloudera.HDFS/ does not.
The one in ../conf.cloudera.HDFS/ has the settings for the NameNode, DataNodes, etc.
Since I have CM installed, I am wondering whether these configuration files are really in use.
If yes, and I need to change some environment variables, should I change all of these hadoop-env.sh files and copy them to the other nodes?
Thanks.
Cloudera Manager handles these settings for you. If you edit the configuration files manually, your changes will be erased by CM.
If you want to make a change, do it through CM.

Hadoop/MR temporary directory

I've been struggling with getting Hadoop and Map/Reduce to start using a separate temporary directory instead of the /tmp on my root directory.
I've added the following to my core-site.xml config file:
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp</value>
</property>
I've added the following to my mapreduce-site.xml config file:
<property>
<name>mapreduce.cluster.local.dir</name>
<value>${hadoop.tmp.dir}/mapred/local</value>
</property>
<property>
<name>mapreduce.jobtracker.system.dir</name>
<value>${hadoop.tmp.dir}/mapred/system</value>
</property>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>${hadoop.tmp.dir}/mapred/staging</value>
</property>
<property>
<name>mapreduce.cluster.temp.dir</name>
<value>${hadoop.tmp.dir}/mapred/temp</value>
</property>
No matter what job I run though, it's still doing all of the intermediate work out in the /tmp directory. I've been watching it do it via df -h and when I go in there, there are all of the temporary files it creates.
Am I missing something from the config?
This is on a 10 node Linux CentOS cluster running 2.1.0.2.0.6.0 of Hadoop/Yarn Mapreduce.
EDIT:
After some further research, the settings seem to be working on my management and namenode/secondary namenode boxes. It is only on the data nodes that this is not working, and only the MapReduce temporary output files are still going to /tmp on my root drive instead of the data mount I set in the configuration files.
If you are running Hadoop 2.0, then the proper name of the config file you need to change is mapred-site.xml, not mapreduce-site.xml.
An example can be found on the Apache site: http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
and it uses the mapreduce.cluster.local.dir property name, with a default value of ${hadoop.tmp.dir}/mapred/local
Try renaming your mapreduce-site.xml file to mapred-site.xml in your /etc/hadoop/conf/ directories and see if that fixes it.
If you are using Ambari, you should be able to just use the "Add Property" button in the MapReduce2 / Custom mapred-site.xml section, enter 'mapreduce.cluster.local.dir' for the property name, and give a comma-separated list of directories you want to use.
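For illustration, the resulting mapred-site.xml entry might look something like this; the directory paths are placeholders:
<property>
<name>mapreduce.cluster.local.dir</name>
<!-- placeholder paths: a comma-separated list of local directories -->
<value>/data/tmp/mapred/local,/data2/tmp/mapred/local</value>
</property>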
I think you need to specify this property in hdfs-site.xml rather than core-site.xml. Try setting it in hdfs-site.xml; I hope this will solve your problem.
The mapreduce properties should be in mapred-site.xml.
I was facing a similar issue where some nodes would not honor the hadoop.tmp.dir set in the config.
A reboot of the misbehaving nodes fixed it for me.

"hadoop namenode -format" formats wrong directory

I'm trying to install Hadoop 1.1.2.21 on CentOS 6.3
I've configured dfs.name.dir in /etc/hadoop/conf/hdfs-site.xml file
<property>
<name>dfs.name.dir</name>
<value>/mnt/ext/hadoop/hdfs/namenode</value>
</property>
But when I run "hadoop namenode -format" command, it formats /tmp/hadoop-hadoop/dfs/name instead.
What am I missing?
I ran into this problem and solved it, so I am updating this answer.
Make sure your environment variable HADOOP_CONF_DIR points to the directory where it can find all your XML configuration files. That solved it for me.
It might be taking the path /tmp/hadoop-hadoop/dfs/name from hdfs-default.xml. Not sure why the value from hdfs-site.xml is not taken. Is dfs.name.dir marked as final in hdfs-default.xml?
Check whether some Hadoop process is already running in the background. This can happen if you aborted a previous process and it was never killed and became a zombie process.
If that is the case, kill the process and then try to format the filesystem again.
Also check the permissions of the directory.
Try giving a different location for the directory and see whether the change is reflected.
Please don't set HADOOP_CONF_DIR. Check your .bashrc file and remove it from there.

Hadoop conf "fs.default.name" can't be set in ip:port format directly?

Hi all,
I have set up a Hadoop cluster in fully distributed mode. First, I set core-site.xml "fs.default.name" and mapred-site.xml "mapred.job.tracker" in hostname:port format and changed /etc/hosts accordingly, and the cluster works successfully.
Then I tried another way: I set core-site.xml "fs.default.name" and mapred-site.xml "mapred.job.tracker" in ip:port format. It doesn't work.
I find
ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name. Using 'localhost'...
in namenode log file and
ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error getting localhost name. Using 'localhost'...
java.net.UnknownHostException: slave01: slave01: Name or service not known
in datanode log file.
In my opinion, an IP and a hostname should be equivalent. Is there something wrong in my Hadoop conf?
Maybe there is a wrongly configured hostname in /etc;
you should check hostname, /etc/hosts, /etc/HOSTNAME (RHEL/Debian) or rc.conf (Arch Linux), etc.
I get your point. This is probably because in the ip:port case you wrote hdfs://ip:port in mapred-site.xml (starting the value with hdfs://, which is wrong), whereas in the hostname:port case you probably did not put hdfs:// at the beginning of the value, which is the correct way. That is why the one did not work and the other did.
Fatih haltas
I found the answer here.
It seems that HDFS uses the host name for all of its communication and display purposes, so we cannot use an IP directly in core-site.xml and mapred-site.xml.
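As an illustration, the working hostname-based form of the value in core-site.xml is something like this; the host name and port are placeholders, and the host name must resolve (e.g. via /etc/hosts):
<property>
<name>fs.default.name</name>
<!-- host name, not a raw IP; placeholder host name and port -->
<value>hdfs://master:9000</value>
</property>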
