Error starting HDFS daemons on a Hadoop multi-node cluster: DataNode not starting

I am trying to set up a Hadoop cluster and get the following error when the DataNode connects. The NameNode is up and running fine, but the DataNode fails to start.
The /etc/hosts file is present on both nodes.
iptables (the firewall) is stopped.
SSH between the nodes works.
2015-05-20 20:54:05,008 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: nn1.cluster1.com/192.168.1.11:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-05-20 20:54:05,017 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to nn1.cluster1.com/192.168.1.11:9000 failed on local exception: java.net.NoRouteToHostException: No route to host

java.io.IOException: Call to nn1.cluster1.com/192.168.1.11:9000 failed on local exception: java.net.NoRouteToHostException: No route to host
This error typically occurs when a firewall on the NameNode host blocks the connection. To disable the firewall, run these commands in a terminal on that host:
service iptables save
service iptables stop
chkconfig iptables off
Then stop and restart the Hadoop processes.
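Before switching the firewall off, it can help to confirm exactly which endpoint the daemon is failing to reach. The following is a minimal sketch that assumes the log format shown in the question; the function name failed_endpoints is hypothetical and not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop tool): list the distinct endpoints that a
# Hadoop daemon log reports as unreachable. Reads log text on stdin and relies
# on the "connect to server: HOST/IP:PORT." wording of the IPC client retries.
failed_endpoints() {
  sed -n 's/.*connect to server: \([^ ]*:[0-9][0-9]*\)\..*/\1/p' | sort -u
}
```

It would be used as failed_endpoints < path-to-datanode-log. Each endpoint it prints can then be probed from the DataNode host (for example with nc -z, where netcat is installed) to see whether the firewall is really what blocks the connection.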

Related

NodeManager getting killed in Hadoop 2.6.0

Before running a job, all the daemons on the slave nodes work fine. But while a job is executing, the NodeManager gets killed (Hadoop 2.6.0).
2014-07-20 05:16:00,568 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater
java.net.ConnectException: Call From node06.nadcse.edu/172.16.6.129 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Caused by: java.net.ConnectException: Connection refused
0.0.0.0:8031
I would start looking at the problem from this part.
The address 0.0.0.0 does not look right.
Assuming the ResourceManager and the NodeManager are on different nodes, the NodeManager obviously cannot connect to the ResourceManager at that address.
If this is a test cluster or something similar, try changing yarn.nodemanager.hostname to 127.0.0.1.
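A 0.0.0.0 target usually means an address property was left at its default. In Hadoop 2.x, port 8031 is the default port of yarn.resourcemanager.resource-tracker.address, which is derived from yarn.resourcemanager.hostname (default 0.0.0.0). The sketch below maps a few well-known default ports to the property that normally controls them. The function names are hypothetical, and the port table is a small, non-exhaustive assumption based on the Hadoop 2.x defaults.

```shell
#!/bin/sh
# Hypothetical helper: map a default Hadoop 2.x port to the property that
# normally controls it. Only a few well-known defaults are listed.
property_for_port() {
  case "$1" in
    8030)  echo "yarn.resourcemanager.scheduler.address" ;;
    8031)  echo "yarn.resourcemanager.resource-tracker.address" ;;
    8032)  echo "yarn.resourcemanager.address" ;;
    10020) echo "mapreduce.jobhistory.address" ;;
    *)     echo "unknown" ;;
  esac
}

# Hypothetical helper: read a daemon log on stdin and print one hint per
# distinct port that is being contacted on the wildcard address 0.0.0.0.
wildcard_hints() {
  sed -n 's|.*[/ ]0\.0\.0\.0:\([0-9][0-9]*\).*|\1|p' | sort -u |
  while read -r port; do
    echo "port $port -> check $(property_for_port "$port")"
  done
}
```

For the log in this question the hint points at the resource-tracker address, so setting yarn.resourcemanager.hostname to the ResourceManager's hostname in yarn-site.xml on every node is a likely fix.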

DataNode is not showing up when running the jps command

I am a newbie in Hadoop. I have set up a multi-node cluster, but when I run the jps command on the master node it shows only the NameNode, not a DataNode. When I open 'Master:50070' in a browser it shows no live nodes, so I am unable to copy data from my local system into HDFS. The copy throws this error:
hduser@oodles-Latitude-3540:~$ hadoop fs -copyFromLocal /home/oodles/input/test /tmp
15/06/28 16:27:56 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/test._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
After starting the Hadoop cluster with start-dfs.sh, my NameNode started successfully but the DataNode didn't. When I check the DataNode log, it shows this:
ToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-06-28 04:01:53,496 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Master/192.168.0.126:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-06-28 04:01:54,498 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Master/192.168.0.126:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-06-28 04:01:55,499 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Master/192.168.0.126:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-06-28 04:01:56,500 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Master/192.168.0.126:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
I googled but did not find a solution for this.
When I run the jps command on the slave node, it shows only the DataNode.
One more thing: when I open 'Master:50070' in a browser and click Browse the filesystem,
it shows me this error:
HTTP ERROR 500
Problem accessing /nn_browsedfscontent.jsp. Reason:
Can't browse the DFS since there are no live nodes available to redirect to.
Caused by:
java.io.IOException: Can't browse the DFS since there are no live nodes available to redirect to.
at org.apache.hadoop.hdfs.server.namenode.NamenodeJspHelper.redirectToRandomDataNode(NamenodeJspHelper.java:666)
at org.apache.hadoop.hdfs.server.namenode.nn_005fbrowsedfscontent_jsp._jspService(nn_005fbrowsedfscontent_jsp.java:70)
My Hadoop cluster configuration is like this:
1) /etc/hosts file on master
2) /etc/hosts file on slave
I have edited the masters and slaves files in the Hadoop configuration folder: in the masters file I added Master, and in the slaves file I added Slave1.
Can anybody help me solve these problems?
The DataNode logs are shown in two pictures.
Have you configured SSH? Try logging in to the other node over SSH to check the connection.

Pig keeps trying to connect to job history server (and fails)

I'm running a Pig job that fails to connect to the Hadoop job history server.
The task (usually any task with GROUP BY) runs for a while and then it starts with a message like:
2015-04-21 19:05:22,825 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-04-21 19:05:26,721 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-04-21 19:05:29,721 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
It then keeps retrying the connection for a while. Sometimes it proceeds with the job. Other times it throws this exception:
2015-04-21 19:05:55,822 [main] WARN org.apache.pig.tools.pigstats.mapreduce.MRJobStats - Unable to get job counters
java.io.IOException: java.io.IOException: java.net.NoRouteToHostException: No Route to Host from cluster-01/10.10.10.11 to 0.0.0.0:10020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getCounters(HadoopShims.java:132)
at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addCounters(MRJobStats.java:284)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:235)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:360)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)
I found this question here but in my case the job history server is started. If I run netstat, I find:
tcp 0 0 0.0.0.0:10020 0.0.0.0:* LISTEN 12073/java off (0.00/0/0)
Where 12073 is ...
12073 pts/4 Sl 0:07 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Dproc_historyserver -Xmx1000m -Djava.library.path=/data/hadoop/hadoop/lib -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/data/hadoop/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop/logs -Dhadoop.log.file=mapred-hadoop-historyserver-cluster-01.log -Dhadoop.root.logger=INFO,RFA -Dmapred.jobsummary.logger=INFO,JSA -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer
I tried opening port 10020 in case it was a firewall issue:
ACCEPT tcp -- anywhere anywhere tcp dpt:10020
... but no luck.
After a few minutes, some of the tasks just arbitrarily continue to the next part.
I'm using Hadoop 2.3 and Pig 0.14.
My question is:
1) What are the possible reasons why Pig cannot connect to the job history server (JHS) given that the JHS is running on the same port that Pig looks for it?
... or failing that ...
2) Is there any way to just tell Pig to stop trying to connect to the JHS and continue with the task?
It seems that most Hadoop installation/configuration guides neglect to mention configuring the Job History Server. It seems that Pig, in particular, relies on this server. It also seems like the default (local) settings for the JHS won't work in a multi-node cluster.
The solution was to add the hostname of the server to the configuration in mapred-site.xml so that the job history server could be accessed from the other machines. (In my version of the file, the lines had to be added as new entries; there were no previous settings.)
<property>
<name>mapreduce.jobhistory.address</name>
<value>cm:10020</value>
<description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
</property>
Then restart the job history server:
mr-jobhistory-daemon.sh stop historyserver
mr-jobhistory-daemon.sh start historyserver
If you get a bind exception (port in use), it means the stop didn't work. Either:
1) Use ps ax | grep -e JobHistory to find the process and kill it manually with kill -9 [pid], then run the start command above again; or
2) Use a different port in the configuration.
Pig should pick up the new settings automatically. Run a Pig script and hope for the best.
Start the history server from the Hadoop bin directory using the command below:
bin$ ./mr-jobhistory-daemon.sh start historyserver
Then run Pig using the command below:
$ pig
Configure mapreduce.jobhistory.address in hadoop/etc/hadoop/mapred-site.xml,
then start the history server:
mapred --daemon start historyserver
The problem was that the history server was not running. Starting it fixed the issue:
[user@vm9 sbin]$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/user/hadoop-2.7.7/logs/mapred-user-historyserver-vm9.out
[user@vm9 sbin]$ jps
5683 NameNode
6309 NodeManager
5974 SecondaryNameNode
8075 RunJar
6204 ResourceManager
8509 JobHistoryServer
5821 DataNode
8542 Jps
[user@vm9 sbin]$
Now Pig runs properly, connects to the job history server, and the dump command works fine.
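The jps check above can be scripted so that a missing JobHistoryServer is noticed before a Pig job is submitted. This is a minimal sketch that assumes the two-column jps output shown above (PID, then class name); the function name missing_daemons is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given `jps` output on stdin and a list of expected
# daemon names as arguments, print each expected daemon that is not listed.
missing_daemons() {
  # Second column of jps output is the short class name of each JVM.
  running=$(awk '{print $2}')
  for name in "$@"; do
    # -x matches whole lines only, so "NameNode" does not match
    # "SecondaryNameNode".
    echo "$running" | grep -qx "$name" || echo "$name"
  done
}
```

A typical call would be jps | missing_daemons NameNode DataNode JobHistoryServer. Empty output means every listed daemon appears in the jps listing on that node.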

Spark Script for Hadoop EC2 Installation: IPC client connection refused

I was trying to use distcp to copy between Hadoop and Amazon S3 on an EC2 cluster set up by the Spark scripts for EC2:
[root]# bin/hadoop distcp s3n://bucket/f1 hdfs:///user/root/
The error I got was
INFO ipc.Client: Retrying connect to server: .. Already tried n time(s).
Copy failed: java.net.ConnectException: Call to ..my_server failed on connection exception: java.net.ConnectException: Connection refused
Spark EC2 scripts, perhaps intentionally, do not start JobTracker and TaskTracker services.
So after running the Spark EC2 installation scripts, to start the services, I ran
{HADOOP_HOME}/bin/start-all.sh
Reference: Thanks Brock Noland at https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/H6wAfdeLIJo

Connection Error in Apache Pig

I am running Apache Pig 0.11.1 with Hadoop 2.0.5.
Most simple jobs that I run in Pig work perfectly fine.
However, whenever I try to use GROUP BY on a large dataset, or the LIMIT operator, I get these connection errors:
2013-07-29 13:24:08,591 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2013-07-29 11:57:29,421 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 11:57:30,421 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 11:57:31,422 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
2013-07-29 13:24:18,597 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 13:24:18,598 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:gpadmin (auth:SIMPLE) cause:java.io.IOException
The strange thing is that after these errors keep appearing for about 2 minutes, they stop, and the correct output shows up at the bottom.
So Hadoop is running fine and computing the proper output. The problem is just these connection errors that keep popping up.
The LIMIT operator always gets this error. It happens in both MapReduce mode and local mode. The GROUP BY operator works fine on small datasets.
One thing I have noticed is that whenever this error appears, the job had created and run multiple JAR files. However, after a few minutes of these messages popping up, the correct output finally appears.
Any suggestions on how to get rid of these messages?
Yes, the problem was that the job history server was not running.
All we had to do to fix this problem was enter this command into the command prompt:
mr-jobhistory-daemon.sh start historyserver
This command starts up the job history server. Now if we enter 'jps', we can see that the JobHistoryServer is running and my Pig jobs no longer waste time trying to connect to the server.
I think this problem is related to the Hadoop mapred-site.xml configuration. By default the history server address is local to the machine, so you need to set it to your configured host:
<property>
<name>mapreduce.jobhistory.address</name>
<value>host:port</value>
</property>
Then run this command:
mr-jobhistory-daemon.sh start historyserver
I am using Hadoop 2.6.0, so I had to do
$ mr-jobhistory-daemon.sh --config /usr/local/hadoop/etc start historyserver
where /usr/local/hadoop/etc is my HADOOP_CONF_DIR.
I am using Hadoop 2.2.0. This problem occurred because the history server was not running, so I had to start it. I used the following command to start the history server:
[root@localhost ~]$ /usr/lib/hadoop-2.2.0/sbin/mr-jobhistory-daemon.sh start historyserver

Resources