Sqoop Workflow with Oozie Always Fails

While learning Sqoop, I execute a Sqoop command to list all MySQL databases in Cloudera's CDH quickstart VM, and it returns all available databases correctly. The problem is that if I run the same command as a job in an Oozie workflow, it always fails.
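For reference, a minimal sketch of the standalone command that works (reconstructed from the workflow's <command> element below; the question does not show the exact invocation):
sqoop list-databases --m 1 \
  --connect "jdbc:mysql://quickstart.cloudera:3306" \
  --username retail_dba \
  --password cloudera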
job.properties
nameNode=hdfs://quickstart.cloudera:8020
resourceManager=0.0.0.0:8032
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/oozie/pig_demo
workflow.xml
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.2">
    <start to="sqoop-36c5"/>
    <action name="sqoop-36c5">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${resourceManager}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>list-databases --m 1
                --connect "jdbc:mysql://quickstart.cloudera:3306"
                --username retail_dba
                --password cloudera</command>
        </sqoop>
        <ok to="finish"/>
        <error to="errorHalt"/>
    </action>
    <kill name="errorHalt">
        <message>Input unavailable,error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="finish"/>
</workflow-app>
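For context, the workflow was submitted with the standard Oozie CLI (a sketch; the Oozie endpoint is inferred from the callback URL in the logs below):
oozie job -oozie http://quickstart.cloudera:11000/oozie -config job.properties -run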
The following logs were generated:
2019-01-24 09:52:09,352 INFO [Thread-69] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://quickstart.cloudera:8020/tmp/hadoop-yarn/staging/history/done_intermediate/cloudera/job_1548311302916_0001-1548312666604-cloudera-oozie%3Alauncher%3AT%3Dsqoop%3AW%3Dfoo%2Dwf%3AA%3Dsqoop%2D36c5%3AID%3D00-1548312729060-1-0-SUCCEEDED-root.cloudera-1548312703464.jhist_tmp to hdfs://quickstart.cloudera:8020/tmp/hadoop-yarn/staging/history/done_intermediate/cloudera/job_1548311302916_0001-1548312666604-cloudera-oozie%3Alauncher%3AT%3Dsqoop%3AW%3Dfoo%2Dwf%3AA%3Dsqoop%2D36c5%3AID%3D00-1548312729060-1-0-SUCCEEDED-root.cloudera-1548312703464.jhist
2019-01-24 09:52:09,352 INFO [Thread-69] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2019-01-24 09:52:09,353 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1548311302916_0001_m_000000_0
2019-01-24 09:52:09,437 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1548311302916_0001_m_000000_0 TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCEEDED
2019-01-24 09:52:09,441 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job diagnostics to
2019-01-24 09:52:09,442 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: History url is http://quickstart.cloudera:19888/jobhistory/job/job_1548311302916_0001
2019-01-24 09:52:09,480 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Waiting for application to be successfully unregistered.
2019-01-24 09:52:10,483 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2019-01-24 09:52:10,488 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://quickstart.cloudera:8020 /tmp/hadoop-yarn/staging/cloudera/.staging/job_1548311302916_0001
2019-01-24 09:52:10,505 INFO [Thread-69] org.apache.hadoop.ipc.Server: Stopping server on 34049
2019-01-24 09:52:10,517 INFO [IPC Server listener on 34049] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 34049
2019-01-24 09:52:10,523 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2019-01-24 09:52:10,524 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted
2019-01-24 09:52:10,531 INFO [Ping Checker] org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: TaskAttemptFinishingMonitor thread interrupted
2019-01-24 09:52:10,556 INFO [Thread-69] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Job end notification started for jobID : job_1548311302916_0001
2019-01-24 09:52:10,560 INFO [Thread-69] org.mortbay.log: Job end notification attempts left 0
2019-01-24 09:52:10,560 INFO [Thread-69] org.mortbay.log: Job end notification trying http://quickstart.cloudera:11000/oozie/callback?id=0000000-190124093049946-oozie-oozi-W@sqoop-36c5&status=SUCCEEDED
2019-01-24 09:52:10,590 INFO [Thread-69] org.mortbay.log: Job end notification to http://quickstart.cloudera:11000/oozie/callback?id=0000000-190124093049946-oozie-oozi-W@sqoop-36c5&status=SUCCEEDED succeeded
2019-01-24 09:52:10,590 INFO [Thread-69] org.mortbay.log: Job end notification succeeded for job_1548311302916_0001
2019-01-24 09:52:15,605 INFO [Thread-69] org.apache.hadoop.ipc.Server: Stopping server on 44688
2019-01-24 09:52:15,613 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2019-01-24 09:52:15,617 INFO [IPC Server listener on 44688] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 44688
2019-01-24 09:52:15,637 INFO [Thread-69] org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:0
The Sqoop job succeeds and the TaskAttempt transitions from SUCCESS_FINISHING_CONTAINER to SUCCEEDED, yet the attempt is KILLED and the workflow fails. Why is that?
[cloudera@quickstart ~]$ yarn version
Hadoop 2.6.0-cdh5.13.0
Subversion http://github.com/cloudera/hadoop -r 42e8860b182e55321bd5f5605264da4adc8882be
Compiled by jenkins on 2017-10-04T18:08Z
Compiled with protoc 2.5.0
From source with checksum 5e84c185f8a22158e2b0e4b8f85311
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.13.0.jar

First of all, you are running a list-databases command; it is not clear why you have put that in the workflow.xml file.
In my experience, the Cloudera VM does not behave very consistently because it is generally run with limited memory: either a container is not allocated, or it is killed. Restarting the entire VM does not help either.
Getting a fresh instance of the Cloudera VM from the image and running the workflow there may solve your problem. It worked for us in the past.
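If you do keep the existing VM, these are the YARN memory knobs that typically decide whether a container can be allocated at all (a sketch for yarn-site.xml; the property names are standard YARN ones, but the values are illustrative and must fit your VM's RAM):
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
</property>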

Related

Hadoop MapReduce job map 100% reduce 0% - reduce is not running and node manager is shutting down

I am trying to run the hadoop wordcount example after installation:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output
I tried to run it with the default memory settings. The moment the map job finishes, the NodeManager shuts down and the reduce job cannot start. Find the logs below:
2022-03-08 16:18:22,557 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 14844 for container-id container_1646774131562_0001_01_000001: 320.0 MB of 2 GB physical memory used; 2.7 GB of 4.2 GB virtual memory used
2022-03-08 16:18:25,266 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Done waiting for Applications to be Finished. Still alive: [application_1646774131562_0001]
2022-03-08 16:18:25,266 INFO org.apache.hadoop.ipc.Server: Stopping server on 38731
2022-03-08 16:18:25,270 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2022-03-08 16:18:25,271 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 38731
2022-03-08 16:18:25,275 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2022-03-08 16:18:25,300 INFO org.apache.hadoop.ipc.Server: Stopping server on 8040
2022-03-08 16:18:25,302 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8040
2022-03-08 16:18:25,302 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2022-03-08 16:18:25,303 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
2022-03-08 16:18:25,304 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2022-03-08 16:18:25,306 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2022-03-08 16:18:25,306 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2022-03-08 16:18:25,307 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at ubuntu2110/127.0.1.1
************************************************************/

Hadoop distcp jobs SUCCEEDED but attempt_xxx killed by ApplicationMaster

Running a distcp job, I encounter the following problem:
Almost all map tasks are marked as successful, but with a note saying Container killed.
On the web interface, the log for the map jobs says:
Progress 100.00
State SUCCEEDED
but under Note it says for almost every attempt (~200)
Container killed by the ApplicationMaster.
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
In the log file associated with the attempt I can see a log saying Task 'attempt_xxxxxxxxx_0' done.
stderr output is empty for all jobs/attempts.
When looking at the application master log and following one of the successful (but killed) attempts I find the following logs:
2017-01-05 10:27:22,772 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1483370705805_4012_m_000000_0
2017-01-05 10:27:22,773 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1483370705805_4012_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2017-01-05 10:27:22,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2017-01-05 10:27:22,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1483370705805_4012Job Transitioned from RUNNING to COMMITTING
2017-01-05 10:27:22,776 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
2017-01-05 10:27:23,118 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-01-05 10:27:24,125 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e116_1483370705805_4012_01_000002
2017-01-05 10:27:24,126 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-01-05 10:27:24,126 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1483370705805_4012_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
I have set mapreduce.map.speculative=false!
All map tasks are SUCCEEDED (a distcp job has no reduce phase), but the MapReduce job keeps going for a long time (several hours); then it succeeds and the distcp job is done.
yarn version reports Hadoop 2.5.0-cdh5.3.1.
Should I be worried about this? And what causes the containers to be killed? Any suggestions would be greatly appreciated!
Those killed attempts might be due to speculative execution. In this case there is nothing to worry about.
To make sure it is the case, try running your distcp like this:
hadoop distcp -Dmapreduce.map.speculative=false ...
You should stop seeing those killed attempts.
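If you want speculative execution off for every job rather than per invocation, the same switches can be set cluster-wide (a sketch for mapred-site.xml, using the standard Hadoop 2 property names):
<property>
    <name>mapreduce.map.speculative</name>
    <value>false</value>
</property>
<property>
    <name>mapreduce.reduce.speculative</name>
    <value>false</value>
</property>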

MapReduce job is failing with the error "Failed to write data"

I'm trying to export data from Teradata to Hadoop, but my export query is failing with the error "Failed to write data". Please look at the MapReduce and ApplicationMaster logs below:
Log Type: syslog
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 4931
2016-03-08 22:47:07,414 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2016-03-08 22:47:07,499 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-03-08 22:47:07,499 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2016-03-08 22:47:07,509 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2016-03-08 22:47:07,510 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1457504560070_0004, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@175b9425)
2016-03-08 22:47:07,556 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 39.7.48.2:8032,39.7.48.3:8032, Ident: (owner=hive, renewer=oozie mr token, realUser=oozie, issueDate=1457506410968, maxDate=1458111210968, sequenceNumber=908, masterKeyId=280)
2016-03-08 22:47:07,599 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2016-03-08 22:47:07,848 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data1/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data2/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data3/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data4/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data5/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data6/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data7/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data8/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data9/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data10/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data12/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004
2016-03-08 22:47:08,132 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2016-03-08 22:47:08,623 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2016-03-08 22:47:08,840 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: com.teradata.dynaload.hcatalog.mapper.TDInputFormat$TeradataInputSplit@2ece4966
2016-03-08 22:47:08,844 INFO [main] com.teradata.dynaload.hcatalog.mapper.TDInputFormat$TeradataRecordReader: recordreader class com.teradata.dynaload.hcatalog.mapper.TDInputFormat$TeradataRecordReaderinitialize time is: 1457506028844
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 300417020(1201668080)
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 1146
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 841167680
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 1201668096
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 300417020; length = 75104256
2016-03-08 22:47:09,515 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2016-03-08 22:47:09,518 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2016-03-08 22:47:09,518 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2016-03-08 22:47:09,848 WARN [main] org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.metastore.local does not exist
2016-03-08 22:47:09,914 INFO [main] hive.metastore: Trying to connect to metastore with URI thrift://apus2.labs.teradata.com:9083
2016-03-08 22:47:09,951 INFO [main] hive.metastore: Connected to metastore.
2016-03-08 22:47:10,407 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2016-03-08 22:47:10,452 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2016-03-08 22:47:10,453 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
2016-03-08 22:47:10,453 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2016-03-08 22:47:10,457 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
ApplicationMaster logs:
Log Type: stderr
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 240
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Log Type: stdout
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 0
Log Type: syslog
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 66959
Showing 4096 bytes of 66959 total.
ILED
2016-03-08 22:59:19,325 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event JOB_FAILED
2016-03-08 22:59:19,456 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://C423A:8020/user/hive/.staging/job_1457504560070_0004/job_1457504560070_0004_1.jhist to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist_tmp
2016-03-08 22:59:19,550 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist_tmp
2016-03-08 22:59:19,562 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://C423A:8020/user/hive/.staging/job_1457504560070_0004/job_1457504560070_0004_1_conf.xml to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml_tmp
2016-03-08 22:59:19,614 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml_tmp
2016-03-08 22:59:19,645 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004.summary_tmp to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004.summary
2016-03-08 22:59:19,654 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml_tmp to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml
2016-03-08 22:59:19,666 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist_tmp to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist
2016-03-08 22:59:19,666 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2016-03-08 22:59:19,671 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to Task failed task_1457504560070_0004_m_000004
Job failed as tasks failed. failedMaps:1 failedReduces:0
2016-03-08 22:59:19,672 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: History url is http://apus2.labs.teradata.com:19888/jobhistory/job/job_1457504560070_0004
2016-03-08 22:59:19,680 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Waiting for application to be successfully unregistered.
2016-03-08 22:59:20,682 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:7 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:7 ContRel:0 HostLocal:6 RackLocal:1
2016-03-08 22:59:20,684 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://C423A /user/hive/.staging/job_1457504560070_0004
2016-03-08 22:59:20,711 INFO [Thread-89] org.apache.hadoop.ipc.Server: Stopping server on 46067
2016-03-08 22:59:20,712 INFO [IPC Server listener on 46067] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 46067
2016-03-08 22:59:20,712 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2016-03-08 22:59:20,714 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted.
Please help me resolve this issue.
You must be using Sqoop to bring the data into Hadoop. Please paste the command you are running. "Failed to write data" can have multiple causes: the destination parent directory is not available, there is no space left on the cluster, etc. Only the command can explain it.
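For example, a typical Sqoop export invocation has this shape (a sketch only; every name below is a placeholder, not taken from the question, and it assumes the Teradata connector is installed):
sqoop export \
  --connect jdbc:teradata://<td-host>/DATABASE=<db> \
  --username <user> --password <password> \
  --table <target_table> \
  --export-dir <hdfs_source_dir>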

Hadoop DataNode fails to start

I have a Hadoop cluster of 11 nodes: one node acting as the master and 10 slave nodes running DataNode and TaskTracker daemons.
The TaskTracker is started on all the slave nodes.
The DataNode starts on only 6 out of the 10 nodes.
Below is the log from /hadoop/logs/...Datanode....log file.
2014-12-03 17:55:05,057 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = trans9/192.168.0.16
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_65
************************************************************/
2014-12-03 17:55:05,371 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-12-03 17:55:05,384 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2014-12-03 17:55:05,385 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-12-03 17:55:05,385 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2014-12-03 17:55:05,776 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2014-12-03 17:55:05,789 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2014-12-03 17:55:08,850 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
2014-12-03 17:55:08,865 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened data transfer server at 50010
2014-12-03 17:55:08,867 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2014-12-03 17:55:08,876 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2014-12-03 17:55:08,962 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2014-12-03 17:55:09,055 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-12-03 17:55:09,068 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2014-12-03 17:55:09,068 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
2014-12-03 17:55:09,068 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
2014-12-03 17:55:09,068 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2014-12-03 17:55:09,068 INFO org.mortbay.log: jetty-6.1.26
2014-12-03 17:55:09,804 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
2014-12-03 17:55:09,812 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2014-12-03 17:55:09,813 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source DataNode registered.
2014-12-03 17:55:09,893 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort50020 registered.
2014-12-03 17:55:09,894 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort50020 registered.
2014-12-03 17:55:09,895 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2014-12-03 17:55:09,903 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(slave9:50010, storageID=DS-551911532-192.168.0.31-50010-1417617118848, infoPort=50075, ipcPort=50020)
2014-12-03 17:55:09,914 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished generating blocks being written report for 1 volumes in 0 seconds
2014-12-03 17:55:09,933 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 5ms
2014-12-03 17:55:09,933 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.16:50010, storageID=DS-551911532-192.168.0.31-50010-1417617118848, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/home/ubuntu/hadoop/hadoop-data/dfs/data/current'}
2014-12-03 17:55:09,945 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-12-03 17:55:09,946 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2014-12-03 17:55:09,946 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
2014-12-03 17:55:09,955 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
2014-12-03 17:55:09,955 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2014-12-03 17:55:09,959 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
2014-12-03 17:55:10,140 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 192.168.0.16:50010 is attempting to report storage ID DS-551911532-192.168.0.31-50010-1417617118848. Node 192.168.0.14:50010 is expected to serve this storage.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem.java:5049)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(FSNamesystem.java:3939)
at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(NameNode.java:1095)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy3.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:1084)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1588)
at java.lang.Thread.run(Thread.java:745)
2014-12-03 17:55:10,144 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50075
2014-12-03 17:55:10,147 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2014-12-03 17:55:10,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: exiting
2014-12-03 17:55:10,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: exiting
2014-12-03 17:55:10,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: exiting
2014-12-03 17:55:10,148 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 50020
2014-12-03 17:55:10,148 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2014-12-03 17:55:10,149 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2014-12-03 17:55:10,149 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.16:50010, storageID=DS-551911532-192.168.0.31-50010-1417617118848, infoPort=50075, ipcPort=50020):DataXceiveServer:java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:248)
at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:100)
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:132)
at java.lang.Thread.run(Thread.java:745)
2014-12-03 17:55:10,149 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting DataXceiveServer
2014-12-03 17:55:10,149 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
2014-12-03 17:55:10,150 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
2014-12-03 17:55:10,151 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down
2014-12-03 17:55:10,151 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.16:50010, storageID=DS-551911532-192.168.0.31-50010-1417617118848, infoPort=50075, ipcPort=50020):Finishing DataNode in: FSDataset{dirpath='/home/ubuntu/hadoop/hadoop-data/dfs/data/current'}
2014-12-03 17:55:10,152 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=DataNodeInfo
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=DataNodeInfo
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataNode.unRegisterMXBean(DataNode.java:586)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:855)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1601)
at java.lang.Thread.run(Thread.java:745)
2014-12-03 17:55:10,152 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2014-12-03 17:55:10,152 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2014-12-03 17:55:10,153 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
2014-12-03 17:55:10,153 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=FSDatasetState-DS-551911532-192.168.0.31-50010-1417617118848
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=FSDatasetState-DS-551911532-192.168.0.31-50010-1417617118848
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.shutdown(FSDataset.java:2093)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:917)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1601)
at java.lang.Thread.run(Thread.java:745)
2014-12-03 17:55:10,159 WARN org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: AsyncDiskService has already shut down.
2014-12-03 17:55:10,159 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2014-12-03 17:55:10,166 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at trans9/192.168.0.16
************************************************************/
Using one system for the NameNode and allocating all the remaining systems to DataNodes is not recommended. I recommend one system for the NameNode, another for the JobTracker, and another for the Secondary NameNode; running DataNodes on the remaining 8 systems is what I would highly recommend.
Coming to your question: format the NameNode to resolve this problem. Also reopen the terminal; sometimes the network is also one of the reasons.
This is due to an incompatible cluster ID problem, so format your DataNode directory and start it again.
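A minimal sketch of that procedure, assuming dfs.data.dir points at /home/ubuntu/hadoop/hadoop-data/dfs/data as the log above suggests (check hdfs-site.xml on the failing node first):
hadoop-daemon.sh stop datanode
rm -rf /home/ubuntu/hadoop/hadoop-data/dfs/data/*
hadoop-daemon.sh start datanode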
Since your cluster is only tens of nodes, it is fine to have a single node running both the NameNode and the JobTracker. The Secondary NameNode should be on another node, as it requires more memory to perform periodic checkpoints.
Coming to your question,
it could be that copying the configuration files is causing conflicts. The same problem is answered here.
You can also try copying a working DataNode's configuration, with the appropriate changes.
If there is no data on the cluster, you can format the NameNode; before doing this, stop all the daemons, and then start them again.
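A sketch of that sequence with the Hadoop 1.x control scripts (note: formatting the NameNode wipes all HDFS metadata, hence the no-data caveat):
stop-all.sh
hadoop namenode -format
start-all.sh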
Hope it helps!
The fix would be as follows. This will not cause any data loss, as you would have set the replication factor to more than 1.
First check which services are running on the box; in this case, the TaskTracker will be running there. Stop the TaskTracker by running
hadoop-daemon.sh stop tasktracker
Once this is done, go to the location you have configured under the dfs.data.dir property and delete all the files/folders there. Once this is done, run
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
This should bring up the datanode and the tasktracker. If you are successful in doing this, go to the namenode and run
hadoop dfsadmin -refreshNodes

Strange Hadoop DataNode behavior

I think I must have some misunderstanding about the DataNodes in a Hadoop cluster. I have a virtual Hadoop cluster composed of master, slave1, slave2, and slave3. Master and slave1 are on one physical machine, while slave2 and slave3 are on another physical machine. When I start the cluster, in the HDFS web UI I can only see three live DataNodes: slave1, master, and slave2. But sometimes the three live DataNodes are master, slave1, and slave3. That's strange. When I ssh to the unstarted DataNode, jps shows no DataNode process, yet I can still copy and delete files on HDFS from that node.
So I believe I must not understand DataNodes correctly. I have three questions here. 1. Is there one DataNode per node? 2. Why can a node that is not a DataNode still read and write on HDFS? 3. Can we decide the number of DataNodes?
Here is the log of the unstarted datanode:
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = slave11/192.168.111.31
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
2012-08-03 17:47:07,578 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-08-03 17:47:07,595 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-08-03 17:47:07,596 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-08-03 17:47:07,596 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-08-03 17:47:07,911 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-08-03 17:47:07,915 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-08-03 17:47:09,457 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 0 time(s).
2012-08-03 17:47:10,460 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 1 time(s).
2012-08-03 17:47:11,464 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.111.21:54310. Already tried 2 time(s).
2012-08-03 17:47:19,565 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
2012-08-03 17:47:19,601 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010
2012-08-03 17:47:19,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2012-08-03 17:47:24,721 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-08-03 17:47:24,854 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-08-03 17:47:24,952 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
2012-08-03 17:47:24,953 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2012-08-03 17:47:24,953 INFO org.mortbay.log: jetty-6.1.26
2012-08-03 17:47:25,665 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
2012-08-03 17:47:25,688 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2012-08-03 17:47:25,690 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source DataNode registered.
2012-08-03 17:47:30,717 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2012-08-03 17:47:30,718 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort50020 registered.
2012-08-03 17:47:30,718 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort50020 registered.
2012-08-03 17:47:30,721 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(slave11:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020)
2012-08-03 17:47:30,764 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting asynchronous block report scan
2012-08-03 17:47:30,766 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/app/hadoop/tmp/dfs/data/current'}
2012-08-03 17:47:30,774 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2012-08-03 17:47:30,778 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
2012-08-03 17:47:30,772 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
2012-08-03 17:47:30,773 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
2012-08-03 17:47:30,795 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner.
2012-08-03 17:47:30,816 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 52ms
2012-08-03 17:47:30,838 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Generated rough (lockless) block report in 32 ms
2012-08-03 17:47:30,840 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 2 ms
2012-08-03 17:47:31,158 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-6072482390929551157_78209
2012-08-03 17:47:33,775 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 1 ms
2012-08-03 17:47:33,793 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 192.168.111.31:50010 is attempting to report storage ID DS-1062340636-127.0.0.1-50010-1339803955209. Node 192.168.111.32:50010 is expected to serve this storage.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem.java:4608)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(FSNamesystem.java:3460)
at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(NameNode.java:1001)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy5.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:958)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,873 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50075
2012-08-03 17:47:33,980 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: exiting
2012-08-03 17:47:33,981 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-08-03 17:47:33,982 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020):DataXceiveServer:java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:170)
at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:102)
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,982 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 50020
2012-08-03 17:47:33,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting DataXceiveServer
2012-08-03 17:47:33,983 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2012-08-03 17:47:33,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 1
2012-08-03 17:47:33,984 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Exiting DataBlockScanner thread.
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down.
2012-08-03 17:47:33,985 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.111.31:50010, storageID=DS-1062340636-127.0.0.1-50010-1339803955209, infoPort=50075, ipcPort=50020):Finishing DataNode in: FSDataset{dirpath='/app/hadoop/tmp/dfs/data/current'}
2012-08-03 17:47:33,987 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=DataNodeInfo
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=DataNodeInfo
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1118)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:433)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:421)
at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:540)
at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataNode.unRegisterMXBean(DataNode.java:522)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:737)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1471)
at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,988 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2012-08-03 17:47:33,988 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-08-03 17:47:33,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
2012-08-03 17:47:33,988 WARN org.apache.hadoop.metrics2.util.MBeans: Hadoop:service=DataNode,name=FSDatasetState-DS-1062340636-127.0.0.1-50010-1339803955209
javax.management.InstanceNotFoundException: Hadoop:service=DataNode,name=FSDatasetState-DS-1062340636-127.0.0.1-50010-1339803955209
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1118)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:433)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:421)
at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:540)
at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.shutdown(FSDataset.java:2067)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:799)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1471)
at java.lang.Thread.run(Thread.java:636)
2012-08-03 17:47:33,988 WARN org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: AsyncDiskService has already shut down.
2012-08-03 17:47:33,989 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
There can be problems when several DataNodes share a single hostname. You say the cluster is virtual, so are they on different virtual machines? If so, this shouldn't be a problem.
I would check the DataNode logs for slave2 and slave3 and see why one isn't booting. The error message will be printed there; it may say something along the lines of the port already being taken.
You don't need to be on a DataNode to access HDFS. The HDFS client (such as hadoop fs -put) directly communicates with the NameNode and other DataNode processes without ever having to access the local one.
It is actually quite common on large clusters to have a separate "query node" that has access to HDFS and MapReduce, but isn't running any DataNode or TaskTracker services.
As long as you have the Hadoop packages installed and the configuration files point to the NameNode and JobTracker correctly, you can access your cluster "remotely".
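For example, such a client-only node needs little more than the NameNode address in core-site.xml (a sketch; the host and port are taken from the connection-retry lines in the log above):
<property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
</property>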
