MapReduce jobs failing after being accepted by YARN - hadoop

Even a simple WordCount MapReduce job fails with the same error.
Hadoop 2.6.0
Below are the YARN logs.
It seems some sort of timeout happens during resource negotiation, but I am unable to verify exactly what causes the timeout.
2016-11-11 15:38:09,313 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
Error launching appattempt_1478856936677_0004_000002. Got exception:
java.io.IOException: Failed on local exception: java.io.IOException:
java.net.SocketTimeoutException: 60000 millis timeout while waiting
for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.0.37.145:49054
remote=platform-demo/10.0.37.145:60487]; Host Details : local host is:
"platform-demo/10.0.37.145"; destination host is:
"platform-demo":60487;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy79.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.net.SocketTimeoutException: 60000 millis
timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.0.37.145:49054
remote=platform-demo/10.0.37.145:60487]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
... 9 more
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.0.37.145:49054
remote=platform-demo/10.0.37.145:60487]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
... 12 more
2016-11-11 15:38:09,319 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1478856936677_0004_000002 with final state: FAILED, and exit status: -1000
2016-11-11 15:38:09,319 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1478856936677_0004_000002 State change from ALLOCATED to FINAL_SAVING
I tried to change the properties below:
yarn.nodemanager.resource.memory-mb = 2200 (amount of physical memory, in MB, that can be allocated for containers)
yarn.scheduler.minimum-allocation-mb = 500
dfs.datanode.socket.write.timeout = 3000000
dfs.socket.timeout = 3000000
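For what it is worth, a quick way to confirm which values a node's configuration actually resolves is the getconf tool (a minimal check, assuming the standard Hadoop 2.x client scripts are on the PATH; it only covers the HDFS-side keys and reads the *-site.xml files on the local classpath):
hdfs getconf -confKey dfs.socket.timeout
hdfs getconf -confKey dfs.datanode.socket.write.timeout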

Q1. MapReduce jobs failing after being accepted by YARN
Reason: around 130 connections were stuck on port 60487.
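To verify a buildup like that on the node itself, one option (a hedged diagnostic, assuming netstat is installed; ss works similarly) is:
netstat -ant | grep 60487 | wc -l                              # how many connections involve that port
netstat -ant | grep 60487 | awk '{print $6}' | sort | uniq -c  # break them down by TCP state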
Q2. MapReduce jobs failing after being accepted by YARN
The issue was the Hadoop tmp directory, /app/hadoop/tmp. After emptying this directory and re-running the MapReduce job, the job executed successfully.
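Roughly what that clean-up looks like as commands (a sketch, assuming /app/hadoop/tmp is the hadoop.tmp.dir from core-site.xml and its contents are safe to discard; note that if the namenode/datanode directories default to locations under hadoop.tmp.dir, this also wipes HDFS data):
stop-yarn.sh
stop-dfs.sh
rm -rf /app/hadoop/tmp/*
start-dfs.sh
start-yarn.sh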
Q3. Unhealthy node, local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir
Edit yarn-site.xml with the following property:
<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>98.5</value>
</property>
Refer to: Why does Hadoop report "Unhealthy Node local-dirs and log-dirs are bad"?
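The unhealthy-node check is driven by how full the disks behind yarn.nodemanager.local-dirs are, so before (or instead of) raising the threshold it is worth looking at the actual utilization of that mount (a minimal check using the path from the error message):
df -h /tmp/hadoop-hduser/nm-local-dir    # utilization of the filesystem holding the NodeManager local dir
du -sh /tmp/hadoop-hduser/nm-local-dir   # how much of it is NodeManager local data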

Related

Call From kv.local/172.20.12.168 to localhost:8020 failed on connection exception, when using tera gen

I am working with Hadoop teragen to benchmark Hadoop MapReduce together with terasort. But when I run the following command,
hadoop jar /Users/**/Documents/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar teragen -Dmapreduce.job.maps=100 1t random-data
I get the following exception:
17/06/01 15:09:21 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
17/06/01 15:09:22 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
17/06/01 15:09:23 INFO terasort.TeraSort: Generating -727379968 using 100
17/06/01 15:09:23 INFO mapreduce.JobSubmitter: number of splits:100
17/06/01 15:09:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1496303775726_0003
17/06/01 15:09:23 INFO impl.YarnClientImpl: Submitted application application_1496303775726_0003
17/06/01 15:09:23 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1496303775726_0003/
17/06/01 15:09:23 INFO mapreduce.Job: Running job: job_1496303775726_0003
17/06/01 15:09:27 INFO mapreduce.Job: Job job_1496303775726_0003 running in uber mode : false
17/06/01 15:09:27 INFO mapreduce.Job: map 0% reduce 0%
17/06/01 15:09:27 INFO mapreduce.Job: Job job_1496303775726_0003 failed with state FAILED due to: Application application_1496303775726_0003 failed 2 times due to AM Container for appattempt_1496303775726_0003_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://localhost:8088/proxy/application_1496303775726_0003/Then, click on links to logs of each attempt.
Diagnostics: Call From KV.local/172.20.12.168 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
java.net.ConnectException: Call From KV.local/172.20.12.168 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1473)
at org.apache.hadoop.ipc.Client.call(Client.java:1400)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy35.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1977)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:706)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:369)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1522)
at org.apache.hadoop.ipc.Client.call(Client.java:1439)
... 31 more
As the error shows, it is not able to connect to localhost:8020, but when I check the NameNode web UI, it shows that the NameNode is active.
I found many posts related to this, but none helped me out. I also checked the hosts file, which contains the following lines:
127.0.0.1 localhost
172.20.12.168 localhost
Can anybody help me sort out this problem?
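Before changing any configuration, it can help to confirm whether anything is actually listening on port 8020 and which address it is bound to (a hedged diagnostic, assuming netstat and netcat are installed):
netstat -ant | grep 8020     # is the NameNode RPC port bound, and to which address?
nc -zv localhost 8020        # reachable via localhost?
nc -zv 172.20.12.168 8020    # reachable via the LAN address from the hosts file?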
The following procedure helped me solve the issue:
Stop all the services.
Delete the namenode and datanode directories as specified in hdfs-site.xml.
Create new namenode and datanode directories and modify hdfs-site.xml accordingly.
In core-site.xml, make the following changes or add the following properties:
<property>
<name>fs.defaultFS</name>
<value>hdfs://172.20.12.168/</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://172.20.12.168:8020</value>
</property>
Make the following change in the hadoop-2.6.4/etc/hadoop/hadoop-env.sh file:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home
Restart dfs, yarn and mr as follows:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
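Condensed into commands, the procedure above looks roughly like this (a sketch only: the directory paths are placeholders for whatever your hdfs-site.xml points at, and a newly created name directory normally has to be formatted before the NameNode will start):
stop-dfs.sh
stop-yarn.sh
rm -rf /path/to/old/namenode-dir /path/to/old/datanode-dir   # placeholder paths from hdfs-site.xml
mkdir -p /path/to/new/namenode-dir /path/to/new/datanode-dir
hdfs namenode -format                                        # initialize the fresh name directory
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver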

Hadoop datanode cannot restart after its failure

I am running Map/Reduce tasks with Hadoop 1.2.1.
While running heavy MR tasks, I encountered a data node failure. The log messages follow:
2017-01-24 21:55:41,735 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.BindException: Problem binding to /0.0.0.0:50020 :
at org.apache.hadoop.ipc.Server.bind(Server.java:267)
at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:341)
at org.apache.hadoop.ipc.Server.<init>(Server.java:1539)
at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:569)
at org.apache.hadoop.ipc.RPC.getServer(RPC.java:530)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:554)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)
Caused by: java.net.BindException:
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.apache.hadoop.ipc.Server.bind(Server.java:265)
... 11 more
I guess that after the data node failure it tried to restart, but failed.
How can I make it restart normally, so that the whole MR task is not harmed?
I cannot increase the data replication factor in HDFS (it is currently set to 1) due to limited disk space.
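A BindException on 0.0.0.0:50020 usually means something, often a half-dead DataNode left over from the failure, is still holding the IPC port, so a hedged first step is to see what owns it before restarting (netstat -p needs root to show the owning process):
netstat -antp | grep 50020        # what, if anything, is bound to the DataNode IPC port?
jps                               # is a stale DataNode JVM still running?
kill <pid-of-stale-datanode>      # placeholder pid, only if a leftover DataNode shows up
hadoop-daemon.sh start datanode   # then restart the DataNode on this node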

What is causing "org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null"?

I have an Elastic MapReduce job which uses elasticsearch-hadoop via scalding-taps to transfer data from Amazon S3 to Amazon Elasticsearch Service. For a long time this job ran successfully. However, it has recently started failing with the following stack trace:
2016-03-02 07:28:34,003 FATAL [IPC Server handler 0 on 41019] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1456902751849_0012_m_000000_0 - exited : cascading.tuple.TupleException: unable to sink into output identifier: myindex/mytable
at cascading.tuple.TupleEntrySchemeCollector.collect(TupleEntrySchemeCollector.java:160)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:95)
at cascading.tuple.TupleEntrySchemeCollector.add(TupleEntrySchemeCollector.java:134)
at cascading.flow.stream.SinkStage.receive(SinkStage.java:90)
at cascading.flow.stream.SinkStage.receive(SinkStage.java:37)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:80)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133)
at com.twitter.scalding.MapFunction.operate(Operations.scala:59)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null
We have enabled the "es.nodes.wan.only" setting.
What could be causing this failure?
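EsHadoopInvalidRequest generally wraps a non-2xx HTTP response from the Elasticsearch endpoint, so one hedged first step is to query the Amazon Elasticsearch Service domain directly from an EMR node and see what the cluster itself reports (the endpoint below is a placeholder, and this assumes the domain's access policy allows unsigned requests from those nodes):
curl -s https://my-es-domain.us-east-1.es.amazonaws.com/                        # placeholder endpoint: is it reachable at all?
curl -s https://my-es-domain.us-east-1.es.amazonaws.com/_cluster/health?pretty  # red status, blocked indices, etc.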

Hadoop Balancer fails with - IOException: Couldn't set up IO streams (LeaseRenewer Warning)

I am stumbling across this error while running the Hadoop Balancer via the NameNode. Any tips on cracking this? The process is also blocking the current user and giving an Out of Memory error when any other command is issued.
14/05/09 11:30:05 WARN hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-77290934_1] for 936 seconds. Will retry shortly ...
java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "hadoop01.xx.xx.xx.xx.com/30.0.1.176"; destination host is: "hadoop01.xx.xx.xx.xx.com":8022;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:763)
at org.apache.hadoop.ipc.Client.call(Client.java:1242)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy10.renewLease(Unknown Source)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:458)
at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:649)
at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Couldn't set up IO streams
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:671)
at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:252)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1291)
at org.apache.hadoop.ipc.Client.call(Client.java:1209)
... 15 more
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:664)
... 18 more
Once the number of threads created by Hadoop RPC reaches the node's limit on the number of processes per user (ulimit -u), Java reports it as an out-of-memory error ("unable to create new native thread").
Try increasing the maximum number of processes allowed, i.e. your ulimit -u value.
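To check and raise that limit (a sketch; the user name and values are placeholders, and on most Linux distributions the permanent setting lives in /etc/security/limits.conf):
ulimit -u    # current max user processes for this shell/user
# permanent change in /etc/security/limits.conf, e.g. for the user running the balancer:
#   hdfs   soft   nproc   32768
#   hdfs   hard   nproc   65536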

After replacing mapred/hdfs/common jars built from Hadoop SVN: "no namenode to stop"

I checked out the source code from
http://svn.apache.org/repos/asf/hadoop/common
http://svn.apache.org/repos/asf/hadoop/hdfs
http://svn.apache.org/repos/asf/hadoop/mapreduce
and got
hadoop-mapred-0.23.0-SNAPSHOT.jar
hadoop-hdfs-0.23.0-SNAPSHOT.jar
hadoop-common-0.23.0-SNAPSHOT.jar
but I failed to run start-all.sh with these jars.
The JobTracker and TaskTracker started for just 5 seconds and then automatically shut down.
Could anyone help?
I checked the logs.
The TaskTracker said:
2011-03-01 00:43:06,242 ERROR org.apache.hadoop.io.nativeio.NativeIO: Unable to initialize NativeIO libraries
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO.initNative()V
at org.apache.hadoop.io.nativeio.NativeIO.initNative(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO.<clinit>(NativeIO.java:55)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:558)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:352)
at org.apache.hadoop.mapred.TaskController.setup(TaskController.java:90)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:698)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1391)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3619)
2011-03-01 00:43:12,983 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Call to localhost/127.0.0.1:9001 failed on local exception: java.io.IOException: Connection reset by peer
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1063)
at org.apache.hadoop.ipc.Client.call(Client.java:1031)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:197)
at org.apache.hadoop.mapred.$Proxy4.getProtocolSignature(Unknown Source)
at org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:238)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:422)
at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:278)
at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:232)
at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:194)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:176)
at org.apache.hadoop.mapred.TaskTracker$2.run(TaskTracker.java:710)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1142)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:706)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1391)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3619)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
at sun.nio.ch.IOUtil.read(IOUtil.java:224)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:59)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:159)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:132)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:368)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:760)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:698)
2011-03-01 00:43:12,984 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
/**********************************
SHUTDOWN_MSG: Shutting down TaskTracker at Vaio-sz65/127.0.1.1
**********************************/
Now I know how to deal with it:
upgrade the DFS first!
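In other words, after swapping in newer HDFS jars the on-disk layout has to be migrated before the daemons will come up cleanly; in that release line this is done by starting HDFS with the -upgrade flag and finalizing once everything looks healthy (a sketch, exact admin commands vary slightly between versions):
stop-all.sh
start-dfs.sh -upgrade            # let the new HDFS version migrate the existing storage layout
hdfs dfsadmin -finalizeUpgrade   # only after verifying the upgraded filesystem is healthy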
