Before running a job, all the daemons on the slave nodes are working fine. But while executing a process, the NodeManager gets killed in Hadoop 2.6.0:
2014-07-20 05:16:00,568 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater
java.net.ConnectException: Call From node06.nadcse.edu/172.16.6.129 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Caused by: java.net.ConnectException: Connection refused
0.0.0.0:8031
I would start looking at the problem from this part.
Address 0.0.0.0? That does not look right.
Assuming the RM and NM are on different nodes, the NodeManager obviously won't be able to connect to that address.
If this is a test cluster or something like that, try changing yarn.nodemanager.hostname to 127.0.0.1.
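As an illustration, a minimal yarn-site.xml sketch of that change. Note that the 8031 address in the log is the ResourceManager's resource-tracker port, which defaults to ${yarn.resourcemanager.hostname}:8031, so on a real multi-node cluster you would typically also make sure yarn.resourcemanager.hostname points at the actual ResourceManager host (rm-host.example.com below is only a placeholder):
<property>
<name>yarn.nodemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>rm-host.example.com</value>
</property>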
We have an issue with the ambari-metrics-collector service (we have an HDP cluster, version 2.6.4, with 8 nodes).
The Ambari Metrics Collector service either can't start at all, or starts for a few seconds and then fails.
Details about the installed metrics-collector packages:
rpm -qa | grep metrics
ambari-metrics-grafana-2.6.1.0-143.x86_64
ambari-metrics-monitor-2.6.1.0-143.x86_64
ambari-metrics-collector-2.5.0.3-7.x86_64
ambari-metrics-hadoop-sink-2.6.1.0-143.x86_64
All machines run RHEL 7.2.
We performed the following steps to try to resolve the problem:
1. Restart the metrics-collector service:
su - ams -c '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ stop'
su - ams -c '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ start'
or
ambari-metrics-collector stop
ambari-metrics-collector start
2. Restart ambari-metrics-monitor on all nodes:
ambari-metrics-monitor stop
ambari-metrics-monitor start
3. Clean the folder /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/:
mv /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0 /tmp/bck/zookeeper/
Then restart the metrics-collector service.
4. Tune the metrics-collector parameters according to https://docs.cloudera.com/HDPDocuments/Ambari-2.2.1.0/bk_ambari_reference_guide/content/_ams_general_guidelines.html
We updated the following parameters in Ambari:
metrics_collector_heap_size=1024
hbase_regionserver_heapsize=1024
hbase_master_heapsize=512
hbase_master_xmn_size=128
Status for now: steps 1-4 didn't help.
From the logs we can see the following:
log file - ambari-metrics-collector.log
2020-06-25 09:06:14,474 WARN org.apache.zookeeper.ClientCnxn: Session 0x172eab71f310002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2020-06-25 09:06:14,575 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=master02.sys671.com:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/meta-region-server
log file - hbase-ams-master-master02.sys671.com.log
2020-06-25 09:38:18,799 WARN [RS:0;master02:51842-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Session 0x172ead5d73a0004 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2020-06-25 09:38:20,437 INFO [main-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Opening socket connection to server master02.sys671.com/23.2.35.171:61181. Will not attempt to authenticate using SASL (unknown error)
2020-06-25 09:38:20,438 WARN [main-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Session 0x172ead5d73a0002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
We also see that the port configured by timeline.metrics.service.webapp.address is not listening:
netstat -tulpn | grep 6188
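The logs above also show connection refused against the embedded HBase ZooKeeper on port 61181 (quorum=master02.sys671.com:61181), so it may be worth checking that port the same way, assuming netstat is available on the collector host:
netstat -tulpn | grep 61181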
Any advice on how to continue from this point?
We would appreciate any help with this problem.
I am trying to set up a Hadoop cluster and I am getting the following error while connecting the datanode. The namenode is up and running fine; however, the datanode is causing problems.
The /etc/hosts file is present on both nodes.
iptables (the firewall) is stopped.
SSH works.
2015-05-20 20:54:05,008 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.cluster1.com/192.168.1.11:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-05-20 20:54:05,017 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to nn1.cluster1.com/192.168.1.11:9000 failed on local exception: java.net.NoRouteToHostException: No route to host
java.io.IOException: Call to nn1.cluster1.com/192.168.1.11:9000 failed on local exception: java.net.NoRouteToHostException: No route to host
This error occurs if you have a firewall running on the namenode system. To disable the firewall, type these commands in a terminal:
service iptables save
service iptables stop
chkconfig iptables off
Now, stop and start the Hadoop processes.
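To verify the fix, you can check from the datanode that the namenode's RPC port is reachable; a quick sketch, assuming telnet or nc is installed (the host and port come from the log above):
telnet nn1.cluster1.com 9000
nc -zv nn1.cluster1.com 9000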
I am trying to run my MR job on YARN. There is this error in one of the userlogs on node03:
2014-10-10 00:57:16,965 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2014-10-10 00:57:16,965 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1412895371072_0001, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier#69d5af30)
2014-10-10 00:57:17,330 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2014-10-10 00:57:18,547 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: node03/127.0.1.1:44874. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-10-10 00:57:19,548 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: node03/127.0.1.1:44874. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
2014-10-10 00:57:27,558 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: node03/127.0.1.1:44874. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-10-10 00:57:27,562 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From node03/127.0.1.1 to node03:44874 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1415)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
at com.sun.proxy.$Proxy9.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:137)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:606)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:700)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
at org.apache.hadoop.ipc.Client.call(Client.java:1382)
... 4 more
2014-10-10 00:57:27,564 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2014-10-10 00:57:27,566 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
I have the same configuration on all nodes. I can't find port 44874 specified anywhere. What is this error actually telling me?
If by "half-random" you mean completely random and by "can't really be configured" you mean undocumented and completely f'ed when used in a hardened environmment - you are correct.
The problem is that map-reduce jobs are using dynamic ports. Horton of course doesn't document why the random port is being created.
The answer thus far: disable firewall or allow high ranges (32768-65535) to on each data node. I am still looking for why this situation occurs.
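For illustration, a sketch of allowing that high port range with iptables (this assumes an iptables-based firewall on RHEL/CentOS; adjust for firewalld or your environment's policy):
iptables -I INPUT -p tcp --dport 32768:65535 -j ACCEPT
service iptables save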
Whenever I see a problem with Hadoop's ports, I google the port number and see if it is a default port for something. In your case it doesn't seem to be.
As far as I can tell, Hadoop uses these kinds of half-random ports internally for some things, and they can't really be configured. Whenever there has been a problem with this kind of port, for me it has always been an indication of some other (detectable) problem.
I suggest you go through all your logs again to find other problems. Also check the namenode status (web interface) and make sure all connections are working.
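A quick way to sanity-check the web interfaces from the command line; a sketch assuming the default Hadoop 2.x ports (50070 for the NameNode UI, 8088 for the ResourceManager UI) and placeholder hostnames you would replace with your own:
curl -s http://namenode-host:50070/ | head
curl -s http://resourcemanager-host:8088/cluster | head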
Hi, I am following the guide linked below for a VirtualBox Solaris Zones Hadoop installation:
Oracle Solaris Zones Hadoop Setup
I was able to follow it successfully up to step 10. Once I tried to check the report, I got this error:
hadoop@name-node:~$ hadoop dfsadmin -report
14/05/17 16:45:12 INFO ipc.Client: Retrying connect to server: name-node/192.168.1.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/17 16:45:13 INFO ipc.Client: Retrying connect to server: name-node/192.168.1.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
....
14/05/17 16:45:21 INFO ipc.Client: Retrying connect to server: name-node/192.168.1.1:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
report: Call to name-node/192.168.1.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
hadoop@name-node:~$
Can someone kindly suggest a resolution?
Also, netstat shows this:
name-node.8021 . 0 0 128000 0 LISTEN
*.50030 . 0 0 128000 0 LISTEN
How do I configure dfsadmin to use port 8021 instead?
A step-by-step guide to configuring a Hadoop cluster on Oracle Solaris 11.1 using zones: http://hashprompt.blogspot.com/2014/05/multi-node-hadoop-cluster-on-oracle.html
This is probably an old question and you might have already solved it, but just in case anyone is wondering:
In core-site.xml, make the following change:
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.1.1:8021/</value>
</property>
This will configure the namenode server port.
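After editing core-site.xml, the namenode has to be restarted for the change to take effect. A sketch assuming the standard Hadoop 1.x start/stop scripts used in that guide:
bin/stop-all.sh
bin/start-all.sh
hadoop dfsadmin -report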
I have set up Hadoop on my Mac, but there is an error message like this:
13/02/18 04:05:52 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
13/02/18 04:05:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
13/02/18 04:05:54 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
13/02/18 04:05:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
13/02/18 04:05:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
13/02/18 04:05:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
13/02/18 04:05:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
13/02/18 04:05:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
13/02/18 04:06:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
13/02/18 04:06:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
java.lang.RuntimeException: java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:546)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:318)
at org.apache.hadoop.examples.PiEstimator.estimate(PiEstimator.java:265)
at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:342)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:351)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:542)
... 17 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1206)
at org.apache.hadoop.ipc.Client.call(Client.java:1050)
... 31 more
Then I run jps to check the services; the result is:
20635 Jps
20466 TaskTracker
20189 DataNode
20291 SecondaryNameNode
I don't know how to deal with this error. Could someone give me an answer?
Thanks a lot!
Actually, not all of the processes that Hadoop should run are running, and that is because of an IP misconfiguration. I am not familiar with Mac OS, but on Linux and Windows we need to put Hadoop host entries into the hosts file (/etc/hosts), and I am quite sure the same applies to the Mac. The point is that you need to put your Hadoop entry in that file against the actual IP of your machine, not the loopback address. For example:
hadoop-machine 127.0.0.1 --> (placing the loopback IP here is wrong, because Hadoop will try to connect with this IP).
Remove the 127.0.0.1 and put the actual IP of your machine in front of this entry. You can find the IP of your Mac machine easily. Here are some questions which are not directly related to Hadoop, but I guess they will be helpful for you: Question 1, Question 2, Question 3.
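As an illustration of that advice, a hypothetical /etc/hosts line (both the hostname hadoop-machine and the address 192.168.1.5 are placeholders; use your machine's real hostname and IP):
192.168.1.5   hadoop-machine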
This might help a bit. But first you have to remove the earlier installation using the command:
rm -rf /usr/local/Cellar /usr/local/.git && brew cleanup
Then you may begin with a fresh installation of Hadoop on your Mac.
Step 1: Install Homebrew
$ ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
Step 2: Install Hadoop
$ brew install hadoop
Assume that brew installs Hadoop 1.2.1
Step 3: Configure Hadoop
$ cd /usr/local/Cellar/hadoop/1.2.1/libexec
Add the following line to conf/hadoop-env.sh:
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Add the following lines to conf/core-site.xml inside the configuration tags:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
Add the following lines to conf/hdfs-site.xml inside the configuration tags:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
Add the following lines to conf/mapred-site.xml inside the configuration tags:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
Step 4: Enable SSH to localhost
Go to System Preferences > Sharing.
Make sure “Remote Login” is checked.
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Step 5: Format Hadoop filesystem
$ bin/hadoop namenode -format
Step 6: Start Hadoop
$ bin/start-all.sh
Make sure that all Hadoop processes are running:
$ jps
Run a Hadoop example:
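For instance, the bundled pi estimator (a sketch; the examples jar name assumes the Hadoop 1.2.1 layout installed by brew above):
$ bin/hadoop jar hadoop-examples-1.2.1.jar pi 10 100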
One more thing: “brew update” will update the Hadoop binaries to the most recent version (1.2.1 at present).
I had this same error: ConnectException: Connection refused in all the secondarynamenode log files.
But I also found this in the namenode's log file:
2015-10-25 16:35:15,720 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /private/tmp/hadoop-admin/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
I therefore did a
hadoop namenode -format
and the problem went away. So the error message was just a symptom of the fact that the namenode had not started successfully.
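Since the inconsistent storage directory in this case lived under /private/tmp, which the OS may clean out, a possible follow-up is to point the namenode metadata at a persistent location in hdfs-site.xml and re-format once. This is a sketch assuming Hadoop 2.x property names (1.x uses dfs.name.dir), and the path is only an example:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop_data/namenode</value>
</property>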