NodeManager in my slave node stops after starting. I have 3 node 1 master and 2 slaves when I use command start-yarn.sh my resourcemanager and nodemanagers start correctly but when I query in hive my mapreduce dont start running and stops at kill command when I check my nodemanager logs in both slaves I just saw that my nodemanagers are shutdown and this is my error:
FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException:
Call From to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException:
Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I have searched for this problem and I did not find any clue. How to solve this?
Thanks
Add this property in yarn-site.xml file for all nodes,
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ip_or_fqdn_or_hostname_of_resourcemanager_host</value>
</property>
Please add these properties in these files :-
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ip_of_resource_manager</value>
</property>
Related
Hi all I am trying to install the multinode hadoop installation. Everything works fine but my nodemanager for yarn is not working. When I looked at the log file for Yarn nodemanager, I got following information
"org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Initialized nodemanager for null: physical-memory=-1 virtual-memory=-2
virtual-cores=-1"
I have no idea why its not showing the actual memory and virtual core. My VM has 8GB memory and 8Vcpus. Because of above values I am getting this error:
"org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved
SHUTDOWN signal from Resourcemanager ,Registration of NodeManager
failed, Message from ResourceManager: NodeManager from SFeUbuntuVM2
doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the
NodeManager"
Can someone help me out with this issue?
Check if you have
Selinux disabled
firewall disabled
Check your configuration files.
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>{your host name}</value>
</property>
After all do format your namenode, and start all services again.
I have just started learning hadoop from the book Hadoop: The definitive guide.
I followed the tutorial for Hadoop installation in Pseudodistribution mode. I enabled the passwordless login to ssh.
Formatted the hdfs filesystem before using it for the first time. It started successfully for the first time.
After that I copied a text file using copyFromLocal to HDFS and everything went fine. But if I restart the system and start the daemons again and look at the web UI , only YARN is started successfully.
When I issue the stop-dfs.sh commmand I get
Stopping namenodes on [localhost]
localhost: no namenode to stop
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
If I format the hdfs file system again and then try starting the daemons then they all start successfully.
Here are my configuration files.Exactly as what is told in hadoop definitive guide book.
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
This is the error in the namenode log file
WARN org.apache.hadoop.hdfs.server.common.Storage: Storage directory /tmp/hadoop/dfs/name does not exist
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
This is from mapred log
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 33 more
I visited apache hadoop : connection refused which says
Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
I found there is an entry in my /etc/hosts, but if I remove it my sudo breaks causing error sudo: unable to resolve host . What should I append in /etc/hosts if not remove my hostname mapped to 127.0.1.1
I cannot understand what is the root cause of this problem.
Well it says in your Namenode log file that default storage of your namenode directory is /tmp/hadoop. The /tmp directory is formatted in linux on reboot by some systems. So it must be the problem.
You need to change your default namenode and datanode directory by changing your hdfs-site.xml configuration file.
Add this in your hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/"your-user-name"/hadoop</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/"your-user-name"/datanode</value>
</property>
After this format your namenode by hdfs namenode -format command.
I think this will end your problem.
If configuration file is not a problem, please try following:
1.first delete all contents from temporary folder:
rm -Rf <tmp dir> (my was /usr/local/hadoop/tmp)
2.format the namenode:
bin/hadoop namenode -format
3.start all processes again:
bin/start-all.sh
I'm trying to run a hadoop cluster via Docker. I have one virtual machine as the namenode and another for the datanode, but the datanode gives me this error running start-dfs.sh:
namenode: namenode running as process 130. Stop it first.
The command jps on the datanode does not show the namenode running. Then I try to start it by hand, using:
hadoop namenode
And it fails with this error:
java.net.BindException: Problem binding to [namenode:9000] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException
So far it seems that namenode is not accesible or is not listening on port 9000. But the network setup is correct: if I execute on datanode:
telnet namenode 9000
It correctly connects to the namenode, and the command netstat -apn | grep 9000 from namenode shows the incoming connection. If I shut down dfs on namenode (stop-dfs.sh), the telnet command from datanode fails with "Connection closed by foreign host."
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value> <!-- I have tried with 1 and 2 too -->
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
</configuration>
Thanks!
I set up Hadoop(2.6.0) with multi machines mode : 1 namenode + 3 datanodes. When I used command : start-all.sh, they (namenode, datanode, resource manager, node manager) worked ok. I checked it with jps command and result on each node were bellow:
NameNode :
7300 ResourceManager
6942 NameNode
7154 SecondaryNameNode
DataNodes:
3840 DataNode
3924 NodeManager
And I also uploaded sample text file on HDFS at: /user/hadoop/data/sample.txt. Absolutely no error at that moment.
But when I tried to run a mapreduce with hadoop example's jar :
hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /user/hadoop/data/sample.txt /user/hadoop/output
I have this error:
15/04/08 03:31:26 INFO mapreduce.Job: Job job_1428478232474_0001 running in uber mode : false
15/04/08 03:31:26 INFO mapreduce.Job: map 0% reduce 0%
15/04/08 03:31:26 INFO mapreduce.Job: Job job_1428478232474_0001 failed with state FAILED due to: Application application_1428478232474_0001 failed 2 times due to Error launching appattempt_1428478232474_0001_000002. Got exception: java.net.ConnectException: Call From hadoop/127.0.0.1 to localhost:53245 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy31.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
... 9 more Failing the application.
15/04/08 03:31:26 INFO mapreduce.Job: Counters: 0
About the configuration, sure that namenode can ssh to datanodes and vice versa without prompt password.I also dissabled IP6 and modified /etc/hosts file :
127.0.0.1 localhost hadoop
192.168.56.102 hadoop-nn
192.168.56.103 hadoop-dn1
192.168.56.104 hadoop-dn2
192.168.56.105 hadoop-dn3
I dont know why mapreduced can't run althought namenode and datanodes worked alright. I'm almost stucked at here, can you help me find the reason??
Thank you
Edit :
Here config in hdfs-site.xml (namenode):
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/hadoop_stores/hdfs/namenode</value>
<description>NameNode directory for namespace and transaction logs storage.</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop-nn:50070</value>
<description>Your NameNode hostname for http access.</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-nn:50090</value>
<description>Your Secondary NameNode hostname for http access.</description>
</property>
In datanodes :
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/hadoop_stores/hdfs/data/datanode</value>
<description>DataNode directory</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop-nn:50070</value>
<description>Your NameNode hostname for http access.</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-nn:50090</value>
<description>Your Secondary NameNode hostname for http access.</description>
Here's result with command : hadoop fs -ls /user/hadoop/data
hadoop#hadoop:~/DATA$ hadoop fs -ls /user/hadoop/data 15/04/09 00:23:27
Found 2 items
-rw-r--r-- 3 hadoop supergroup 29 2015-04-09 00:22 >/user/hadoop/data/sample.txt
-rw-r--r-- 3 hadoop supergroup 27 2015-04-09 00:22 >/user/hadoop/data/sample1.txt
hadoop fs -ls /user/hadoop/output
ls: `/user/hadoop/output': No such file or directory
Found solution!! see this post- yarn shows data nodes id/name as localhost
Call From localhost.localdomain/127.0.0.1 to localhost.localdomain:56148 failed on connection exception: java.net.ConnectException: Connection refused;
Both master and slaves were having host names of localhost.localdomain in /etc/hostname.
I changed host names of slaves to slave1 and slave2. That worked.
Thank you everyone for your time.
#kate make sure etc/hostname in namenode and datanodes are not set to localhost. Just type ~# hostname in terminal to see. You can set a new hostname by the same command.
My master and workers or slaves' /etc/hosts looks like this-
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#127.0.1.1 localhost
192.168.111.72 master
192.168.111.65 worker1
192.168.111.66 worker2
hostname of worker1
hduser#worker1:/mnt/hdfs/datanode$ cat /etc/hostname
worker1
and worker2
hduser#worker2:/usr/local/hadoop/logs$ cat /etc/hostname
worker2
Also, probably you don't want to have "hadoop" hostname with loopback interface. i.e.
127.0.0.1 localhost hadoop
Check this point (1) in https://wiki.apache.org/hadoop/ConnectionRefused.
Thank you.
FIREWALL ISSUE:
java.net.ConnectException: Connection refused
This error might be due to firewall issues. Do this in terminal:
sudo apt-get install iptables-persistent
sudo iptables -L
sudo iptables-save > /usr/iptables-backup/iptables.v4.rules
Check whether the file is created before continuing (since this will be used to restore firewall if something goes wrong).
Now, flush iptable rules (i.e. stop firewall):
sudo iptables -F
Now try,
sudo iptables -L
This command should return no rules. Now, try to run your map/reduce job.
Note: If you want to restore iptables to previous condition, type this in terminal:
sudo iptables-restore < /usr/iptables-backup/iptables.v4.rules
I have a single node hadoop and have installed hbase also on my ubuntu 12.04. Now i want to install titan over hbase. I have setup hadoop-1.0.3 and hbase-0.94.18 and titan/hbase-0.4.2
I have added a user mnit.My /usr/local/ folder contains hadoop2 , hbase2, titan2 .First i start my hadoop using command bin/start-all.sh and then i start hbase using command bin/start-hbase.sh . after it when i do jps i found the following :
mnit#aman:/usr/local$ jps
9921 DataNode
11386 HRegionServer
11041 HQuorumPeer
11537 Jps
11115 HMaster
10153 SecondaryNameNode
10252 JobTracker
9691 NameNode
10483 TaskTracker
now i start gremlin.sh in titan2 using command bin/gremlin.sh .
i applied the following commands
mnit#aman:/usr/local/titan2$ bin/gremlin.sh
gremlin> conf = new BaseConfiguration();
==>org.apache.commons.configuration.BaseConfiguration#19288c2
gremlin> conf.setProperty("storage.backend","hbase");
==>null
gremlin> conf.setProperty("storage.hostname","127.0.0.1");
==>null
gremlin> g = TitanFactory.open(conf);
WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
When i searched over this problem i found that there is a file named pom.xml but the titan that i have downloaded does not contain pom.xml. please tell me if this is a problem due to pom.xml. or i am doing something wrong or there is some other issue.
Thanks in advance
zk is managed by hbase in my system. i have added the following line in bin/hbase-env.sh
export HBASE_MANAGES_ZK=true
the content of my hbase-site.xml is as follows :
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:54310/user/hbase</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.datadir</name>
<value>/app/hadoop/tmp/zookeeper</value>
</property>
</configuration>
Your Titan and HBase configurations appear to be inconsistent. Your hbase-site.xml overrides the default ZK port (2181) to 2222, but it seems you haven't told Titan to use this non-default ZK port by setting storage.port in your Titan config file. Naturally, they can't talk to each other in that state. This doesn't have anything to do with pom.xml.
By the way, please don't simultaneously crosspost to SO and the aureliusgraphs Google Group. They're both good venues with slightly different purposes, but you seem to have just copy-pasted between this SO question and your subjectless thread on the aureliusgraphs list.