Failed to start Hadoop namenode in cloudera CDH5 on debian OS

I am installing Cloudera on Debian.
This is my core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-test-1:8020</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>0</value>
</property>
<property>
<name>fs.trash.checkpoint.interval</name>
<value>0</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.hosts</name>
<value>*</value>
</property>
</configuration>
When I try to start the NameNode, I get the following errors.
# sudo service hadoop-hdfs-namenode start
This is the output of the command above:
FAIL - Failed to start Hadoop namenode. Return value: 1 ... failed!
And in the hadoop-hdfs-namenode-hadoop-test-1.log file:
Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.

Related

Hadoop HA ERROR: Exception in doCheckpoint (IOException) Exception during image upload doCheckpoint

I am using Hadoop 3.2.2 on a Windows 10 cluster, with HDFS high availability configured through the Quorum Journal Manager.
The system works fine and I can transition nodes between active and standby state without issues, but I often get the following error message:
java.io.IOException: Exception during image upload
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:315)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1300(StandbyCheckpointer.java:64)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:480)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$600(StandbyCheckpointer.java:383)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:403)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:502)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:399)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Error writing request body to server
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:295)
... 6 more
Caused by: java.io.IOException: Error writing request body to server
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3597)
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3580)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:377)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:321)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:295)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:230)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
My cluster setup is the following
A: Namenode, Zookeeper, ZKFC, Journal
B: Namenode, Zookeeper, ZKFC, Journal
C: Namenode, Zookeeper, ZKFC
D: Journal, Datanode
E,F,G....: Datanode
Here is my hdfs-site configuration
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<description>Logical name for this new nameservice</description>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>A,B,C</value>
<description>Unique identifiers for each NameNode in the
nameservice</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.A</name>
<value>A:8020</value>
<description>RPC address for NameNode 1; it is necessary to use the real host name of the machine instead of an alias</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.B</name>
<value>B:8020</value>
<description>RPC address for NameNode 2</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.C</name>
<value>C:8020</value>
<description>RPC address for NameNode 3</description>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.A</name>
<value>A:9870</value>
<description>HTTP address for NameNode 1</description>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.B</name>
<value>B:9870</value>
<description>HTTP address for NameNode 2</description>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.C</name>
<value>C:9870</value>
<description>HTTP address for NameNode 3</description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://A:8485;B:8485;D:8485/mycluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(C:/mylocation/stop-namenode.bat $target_host)</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>C:/hadoop-3.2.2/data/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>A:2181,B:2181,C:2181</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoop-3.2.2/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///C:/hadoop-3.2.2/data/dfs/datanode</value>
</property>
<property>
<name>dfs.namenode.safemode.threshold-pct</name>
<value>0.5f</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>true</value>
</property>
</configuration>
Has anyone run into the same issue? Am I missing something here?

I am not sure whether this issue has been resolved. It may be caused by the change in https://issues.apache.org/jira/browse/HADOOP-16886. A possible fix is to set hadoop.http.idle_timeout.ms to a suitable value in core-site.xml.
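
A minimal sketch of what that setting might look like, to be placed inside the existing <configuration> element of core-site.xml. The property name comes from the answer above; the value 60000 is only an illustrative assumption, not a figure from the thread, and would need tuning to how long the fsimage upload takes on a given cluster.

```xml
<!-- core-site.xml fragment; the value is an assumed example, not a recommendation -->
<property>
  <name>hadoop.http.idle_timeout.ms</name>
  <!-- idle timeout of the embedded HTTP server, in milliseconds -->
  <value>60000</value>
</property>
```

The NameNodes would have to be restarted for the new value to be picked up.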

hadoop's start-dfs not creating datanode on the slave

I am trying to set up a Hadoop cluster over two nodes. Running start-dfs.sh on my master node opens a window that closes shortly afterwards. When I execute start-dfs, the log shows that the namenode launched correctly, but the datanode did not, and it logs the following:
Problem binding to [slave-VM1:9005] java.net.BindException: Cannot assign requested address: bind; For more details see: http://wiki.apache.org/hadoop/BindException
I have set
ssh-keygen -t rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(I also set up the administrators_authorized_keys file with the right public key, and ssh user@remotemachine works and gives access to the slave.)
Here's my full Hadoop configuration set on both master and slave machines (Windows):
hdfs-site.xml :
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/C:/Hadoop/hadoop-3.2.2/data/namenode</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>slaveVM1:50475</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/C:/Hadoop/hadoop-3.2.2/data/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
core-site.xml :
<configuration>
<property>
<name>dfs.datanode.http.address</name>
<value>slaveVM1:9005</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://masterVM2:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/C:/Hadoop/hadoop-3.2.2/hadoopTmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://masterVM2:8020</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>masterVM2:9001</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>%HADOOP_HOME%/share/hadoop/mapreduce/*,%HADOOP_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_HOME%/share/hadoop/common/*,%HADOOP_HOME%/share/hadoop/common/lib/*,%HADOOP_HOME%/share/hadoop/yarn/*,%HADOOP_HOME%/share/hadoop/yarn/lib/*,%HADOOP_HOME%/share/hadoop/hdfs/*,%HADOOP_HOME%/share/hadoop/hdfs/lib/*</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
PS: I am an administrator on both machines, and I set HADOOP_CONF_DIR to C:\Hadoop\hadoop-3.2.2\etc\hadoop
I also added the slave IP to the slaves file in HADOOP_CONF_DIR.
PS: if I remove this block:
<property>
<name>dfs.datanode.https.address</name>
<value>slave:50475</value>
</property>
from hdfs-site.xml,
then both the datanode and the namenode launch on the master node.
hosts :
*.*.*.* slaveVM1
*.*.*.* masterVM2
*.*.*.* stands for the IP of the respective machine; all other entries are commented out.

This error:
BindException: Cannot assign requested address: bind;
usually happens when the port is in use. That may mean the application was already started, was started previously and did not shut down properly, or another application is using that port. Try rebooting (a heavy-handed but reasonably effective way of clearing ports).
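
A different reading of the log, which the thread does not confirm: "Cannot assign requested address" can also appear when the configured host name does not resolve to an address on the machine doing the bind. That would be the case if a DataNode started on the master tries to bind slaveVM1:9005. Under that assumption, one sketch is to bind the DataNode endpoints to the wildcard address so the same hdfs-site.xml can be used on every node. The port numbers are copied from the question; moving dfs.datanode.http.address from core-site.xml to hdfs-site.xml is part of the suggestion.

```xml
<!-- hdfs-site.xml fragment: bind the DataNode web endpoints on all local interfaces -->
<property>
  <name>dfs.datanode.http.address</name>
  <!-- 0.0.0.0 avoids naming a host that is not local to the node -->
  <value>0.0.0.0:9005</value>
</property>
<property>
  <name>dfs.datanode.https.address</name>
  <value>0.0.0.0:50475</value>
</property>
```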

Hadoop: datanode not starting on slave

I have two VMs set up with Ubuntu 12.04. I am trying to set up multi-node Hadoop, but after executing hadoop/sbin/start-dfs.sh I see the following processes on my master:
20612 DataNode
20404 NameNode
20889 SecondaryNameNode
21372 Jps
However, nothing is running on the slave. Also, when I run hdfs dfsadmin -report, I only see:
Live datanodes (1):
Name: 10.222.208.221:9866 (master)
Hostname: master
I checked logs, my start-dfs.sh does not even try to start datanode on my slave.
I am using following configuration:
#/etc/hosts
127.0.0.1 localhost
10.222.208.221 master
10.222.208.68 slave-1
I changed the hostname in /etc/hostname on the respective systems.
Also, I am able to ping slave-1 from master system and vice-versa using ping.
/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
#hadoop/etc/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///hadoop/data/namenode</value>
<description>NameNode directory</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///hadoop/data/datanode</value>
<description>DataNode directory</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
</configuration>
I have also added master and slave-1 to /hadoop/etc/master and /hadoop/etc/slaves on both my master and slave systems.
I have also tried cleaning data/* and then running hdfs namenode -format before start-dfs.sh, but the problem persists.
Also, the network adapter setting is marked as Bridged adapter.
Is there any possible reason for the datanode not starting on the slave?

I can't claim to have the answer, but I found this question: "start-all.sh" and "start-dfs.sh" from master node do not start the slave node services?
I renamed my slaves file to workers and everything clicked into place.

It seems you are using hadoop-2.x.x or above, so try this configuration. Note that the masters file (hadoop-2.x.x/etc/hadoop/masters) is not available by default from hadoop-2.x.x onwards.
hadoop-2.x.x/etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
~/etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///hadoop/data/namenode</value>
<description>NameNode directory</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///hadoop/data/datanode</value>
<description>DataNode directory</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
~/etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
~/etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<!-- the key must use the same service name as aux-services above -->
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
~/etc/hadoop/slaves
slave-1
Copy all of the configuration files above from the master to the slave, replacing the ones in hadoop-2.x.x/etc/hadoop/.

How to set up a federated cluster?

I will paste all my configuration below. I have a cluster of 3 computers. Configuration of namenode 1 (impc2361)
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>viewfs//ClusterA</value>
</property>
<property>
<name>fs.viewfs.mounttable.ClusterA.link./home</name>
<value>hdfs//impc2361:8021/home</value>
</property>
<property>
<name>fs.viewfs.mounttable.ClusterA.link./home1</name>
<value>hdfs//impc2359:8020/home1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/Downloads/hadoop2/tmpfold</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>home,home1</value>
</property>
<property>
<name>dfs.namenode.rpc-address.home</name>
<value>impc2361:8021</value>
</property>
<property>
<name>dfs.namenode.http-address.home</name>
<value>impc2361:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.home1</name>
<value>impc2359:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.home1</name>
<value>impc2359:50070</value>
</property>
</configuration>
I have copied the same configuration to the other nodes as well, namely namenode 2 (impc2359) and the datanode (impc2391).
Problems
I don't get the web page of namenode 1 (impc2361) when I open impc2361.htcitmr:50070 in a browser.
It throws an error:
HTTP ERROR 404
Problem accessing /dfshealth.jsp.
Reason: NOT_FOUND
I do get a web page for namenode 2 (impc2359) when I open impc2359.htcitmr:50070, but I don't find the folder /home1 that was set in core-site.xml.
I am not able to perform any operations on the cluster from my terminal, because it reports that the file system is read-only:
hadoop fs -mkdir /a
mkdir: InternalDir of ViewFileSystem is readonly; operation=mkdirsPath=/a
Please kindly help
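
One thing that stands out in the core-site.xml above is that the URIs are written without a colon after the scheme (viewfs//ClusterA, hdfs//impc2361:8021/home). Whether that is in the real files or only a transcription slip is not known, so the following is a sketch under that assumption: the same mount table, with host names and ports copied from the question and each value written as a full URI.

```xml
<!-- core-site.xml fragment: same mount table, with "://" after each scheme -->
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://ClusterA</value>
</property>
<property>
  <name>fs.viewfs.mounttable.ClusterA.link./home</name>
  <value>hdfs://impc2361:8021/home</value>
</property>
<property>
  <name>fs.viewfs.mounttable.ClusterA.link./home1</name>
  <value>hdfs://impc2359:8020/home1</value>
</property>
```

With a ViewFs mount table like this, the root directory only holds the mount points and is read-only, so hadoop fs -mkdir /a is expected to fail. A path under a mount point, such as /home/a, is routed to the namenode behind that link.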

nodemanager is not starting while upgrading to hadoop 2 from hadoop classic

I have a cluster with one master and one worker. I am upgrading from Hadoop classic to YARN. The resourcemanager and historyserver started successfully, but the nodemanager does not start and gives this error:
java.lang.NumberFormatException: For input string: "${nodemanager.resource.memory-mb}"
I have kept the same yarn-site.xml.template on both servers.
I have replaced ${nodemanager.resource.memory-mb} with 8192.
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>__RM_IP__</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>${nodemanager.resource.memory-mb}</value>
</property>
</configuration>
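
The exception text shows that the NodeManager read the placeholder string verbatim, which suggests that the yarn-site.xml it actually loads still contains the template text. As a sketch, assuming 8192 MB is the intended value as stated in the question, the resolved property in the file on the worker would look like this:

```xml
<!-- yarn-site.xml fragment as the NodeManager should see it after substitution -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- a literal integer in MB; an unresolved ${...} placeholder cannot be parsed as a number -->
  <value>8192</value>
</property>
```

Checking the generated yarn-site.xml on the worker, not only the template, would confirm whether the substitution was applied there.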
