map reduce job execution stop on map task at 67% - hadoop

I have created map reduce job in java on eclipse.I have create map reduce job for wordcount which reads data(approximate 7453215 record - 670MB) from sql server and store result back to sql server.I have created HDInsight cluster on azure which has 2 head node and 3 worker nodes.Each node has 4 cores and 14GB RAM.Map Reduce job running successfully on local but while i am submitting jar file of map reduce job to HDInsight cluster on azure then it stoppped on map task at 67%.
here is the log,
17/12/01 13:23:20 INFO client.AHSProxy: Connecting to Application
History server at headnodehost/10.0.0.20:10200 17/12/01 13:23:21 INFO
client.RequestHedgingRMFailoverProxyProvider: Looking for the active
RM in [rm1, rm2]... 17/12/01 13:23:21 INFO
client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
17/12/01 13:23:21 WARN mapreduce.JobResourceUploader: Hadoop
command-line option parsing not performed. Implement the Tool
interface and execute your application with ToolRunner to remedy this.
17/12/01 13:23:36 INFO mapreduce.JobSubmitter: number of splits:2
17/12/01 13:23:37 INFO mapreduce.JobSubmitter: Submitting tokens for
job: job_1512119994740_0011 17/12/01 13:23:37 INFO
impl.YarnClientImpl: Submitted application
application_1512119994740_0011 17/12/01 13:23:37 INFO mapreduce.Job:
The url to track the job:
http://hn1-hdpclu.53o3id15rwte5en44vyo02sv0h.dx.internal.cloudapp.net:8088/proxy/application_1512119994740_0011/
17/12/01 13:23:37 INFO mapreduce.Job: Running job:
job_1512119994740_0011 17/12/01 13:23:47 INFO mapreduce.Job: Job
job_1512119994740_0011 running in uber mode : false
17/12/01 13:23:47 INFO mapreduce.Job: map 0% reduce 0%
17/12/01 13:24:00 INFO mapreduce.Job: map 33% reduce 0%
17/12/01 13:24:06 INFO mapreduce.Job: map 67% reduce 0%
Error:
2017-12-02 07:09:17,697 INFO [CommitterEvent Processor #1]
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler:
Processing the event EventType: TASK_ABORT 2017-12-02 07:09:17,697
WARN [CommitterEvent Processor #1]
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Output
Path is null in abortTask() 2017-12-02 07:09:17,699 INFO
[AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1512191608534_0003_m_000000_0 TaskAttempt Transitioned from
FAIL_TASK_CLEANUP to FAILED 2017-12-02 07:09:17,708 INFO [Thread-56]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures
on node 10.0.0.11 2017-12-02 07:09:17,709 INFO [AsyncDispatcher event
handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1512191608534_0003_m_000000_1 TaskAttempt Transitioned from
NEW to UNASSIGNED 2017-12-02 07:09:17,709 INFO [Thread-56]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Added
attempt_1512191608534_0003_m_000000_1 to list of failed maps
2017-12-02 07:09:17,721 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1512191608534_0003_m_000001_0 TaskAttempt Transitioned from
FAIL_CONTAINER_CLEANUP to FAIL_TASK_CLEANUP 2017-12-02 07:09:17,728
INFO [CommitterEvent Processor #2]
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler:
Processing the event EventType: TASK_ABORT 2017-12-02 07:09:17,728
WARN [CommitterEvent Processor #2]
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Output
Path is null in abortTask() 2017-12-02 07:09:17,728 INFO
[AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1512191608534_0003_m_000001_0 TaskAttempt Transitioned from
FAIL_TASK_CLEANUP to FAILED 2017-12-02 07:09:17,729 INFO [Thread-56]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 2 failures
on node 10.0.0.11 2017-12-02 07:09:17,729 INFO [AsyncDispatcher event
handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1512191608534_0003_m_000001_1 TaskAttempt Transitioned from
NEW to UNASSIGNED 2017-12-02 07:09:17,729 INFO [Thread-56]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Added
attempt_1512191608534_0003_m_000001_1 to list of failed maps
2017-12-02 07:09:18,234 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before
Scheduling: PendingReds:1 ScheduledMaps:2 ScheduledReds:0
AssignedMaps:3 AssignedReds:0 CompletedMaps:0 CompletedReds:0
ContAlloc:3 ContRel:0 HostLocal:0 RackLocal:0 2017-12-02 07:09:18,240
INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor:
getResources() for application_1512191608534_0003: ask=1 release= 0
newContainers=0 finishedContainers=1 resourcelimit= knownNMs=1 2017-12-02 07:09:18,240 INFO [RMCommunicator
Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Received completed container container_1512191608534_0003_01_000002
2017-12-02 07:09:18,240 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Recalculating schedule, headroom= 2017-12-02
07:09:18,240 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce
slow start threshold not met. completedMapsForReduceSlowstart 1
2017-12-02 07:09:18,240 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
Diagnostics report from attempt_1512191608534_0003_m_000000_0:
Container killed by the ApplicationMaster. Container killed on
request. Exit code is 143 Container exited with a non-zero exit code
143

Related

Hadoop MapReduce job container throws java.io.FileNotFoundException for path /tmp/crunch-1324412807/p2/MAP

I am seeing the below error when I submit an Oozie job in an EMR Hadoop cluster. I could see that a certain container is not finding the temporary output file that is produced by another job (?).
I verified the name node address, and also verified there is enough memory on it (attached additional EBS volume). I am using Master instance type - m5.2xlarge and core instance type - r5a.2xlarge.
This is the error I see in the Application Master logs.
Application application_1632399471753_0051 failed 2 times due to AM Container for appattempt_1632399471753_0051_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-23-31-119.us-west-2.compute.internal:8020/tmp/crunch-1324412807/p2/MAP
For more detailed output, check the application tracking page: http://ip-10-23-31-119.us-west-2.compute.internal:8088/cluster/app/application_1632399471753_0051 Then click on links to logs of each attempt.
. Failing the application.
This is the error I see when I go into one of the task logs.
2021-09-23 13:19:43,280 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1632399471753_0051Job Transitioned from INITED to SETUP
2021-09-23 13:19:43,282 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2021-09-23 13:19:43,294 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1632399471753_0051Job Transitioned from SETUP to RUNNING
2021-09-23 13:19:43,309 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1632399471753_0051_m_000000 Task Transitioned from NEW to SCHEDULED
2021-09-23 13:19:43,311 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1632399471753_0051_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2021-09-23 13:19:43,311 INFO [Thread-85] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceRequest:<memory:7168, vCores:1>
2021-09-23 13:19:43,380 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1632399471753_0051, File: hdfs://ip-10-23-31-119.us-west-2.compute.internal:8020/tmp/hadoop-yarn/staging/hadoop/.staging/job_1632399471753_0051/job_1632399471753_0051_1.jhist
2021-09-23 13:19:44,270 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2021-09-23 13:19:44,298 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1632399471753_0051: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:14336, vCores:10> knownNMs=2
2021-09-23 13:19:45,306 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
2021-09-23 13:19:45,308 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1632399471753_0051_01_000002 to attempt_1632399471753_0051_m_000000_0
2021-09-23 13:19:45,309 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2021-09-23 13:19:45,362 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-jar file on the remote FS is hdfs://ip-10-23-31-119.us-west-2.compute.internal:8020/tmp/hadoop-yarn/staging/hadoop/.staging/job_1632399471753_0051/job.jar
2021-09-23 13:19:45,364 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-conf file on the remote FS is /tmp/hadoop-yarn/staging/hadoop/.staging/job_1632399471753_0051/job.xml
2021-09-23 13:19:45,367 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-23-31-119.us-west-2.compute.internal:8020/tmp/crunch-1324412807/p2/MAP
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createCommonContainerLaunchContext(TaskAttemptImpl.java:902)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createContainerLaunchContext(TaskAttemptImpl.java:947)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1714)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1691)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1210)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:147)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1459)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1451)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-23-31-119.us-west-2.compute.internal:8020/tmp/crunch-1324412807/p2/MAP
at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1444)
at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1437)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1452)
at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:774)
at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java:601)
at org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:491)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createCommonContainerLaunchContext(TaskAttemptImpl.java:821)
... 14 more
2021-09-23 13:19:45,369 INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
End of LogType:syslog
I am not sure what other things I need to check and troubleshoot this issue. I don not see any logs being printed from my application code.

Hadoop distcp jobs SUCCEEDED but attempt_xxx killed by ApplicationMaster

Running a distcp job I encounter the following problem:
Almost all map tasks are marked as successful but with note saying Container killed.
On the online interface the log for the map jobs says:
Progress 100.00
State SUCCEEDED
but under Note it says for almost every attempt (~200)
Container killed by the ApplicationMaster.
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
In the log file associated with the attempt I can see a log saying Task 'attempt_xxxxxxxxx_0' done.
stderr output is empty for all jobs/attempts.
When looking at the application master log and following one of the successful (but killed) attempts I find the following logs:
2017-01-05 10:27:22,772 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1483370705805_4012_m_000000_0
2017-01-05 10:27:22,773 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1483370705805_4012_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2017-01-05 10:27:22,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2017-01-05 10:27:22,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1483370705805_4012Job Transitioned from RUNNING to COMMITTING
2017-01-05 10:27:22,776 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
2017-01-05 10:27:23,118 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-01-05 10:27:24,125 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e116_1483370705805_4012_01_000002
2017-01-05 10:27:24,126 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-01-05 10:27:24,126 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1483370705805_4012_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
i have set "mapreduce.map.speculative=false"!
All MAP task are SUCCEEDED(distcp job has no REDUCE),but MAPREDUCE is going for a long time(several hours) , then it is succeeded and distcp job is done.
I am running 'yarn version'= Hadoop 2.5.0-cdh5.3.1
Should I be worried about this? And what causes the containers to be killed? Any suggestions would be greatly appreciated!
Those killed attempts might be due to speculative execution. In this case there is nothing to worry about.
To make sure it is the case, try running your distcp like this:
hadoop distcp -Dmapreduce.map.speculative=false ...
You should stop seeing those killed attempts.

MapReduce job is failing with an error failed to write data

I'm trying to export data from teradata to hadoop. but my export query is failing by giving an error "Failed to write data".Please look at the Mapreduce and application logs below:
Log Type: syslog
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 4931
2016-03-08 22:47:07,414 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2016-03-08 22:47:07,499 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-03-08 22:47:07,499 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2016-03-08 22:47:07,509 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2016-03-08 22:47:07,510 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1457504560070_0004, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier#175b9425)
2016-03-08 22:47:07,556 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 39.7.48.2:8032,39.7.48.3:8032, Ident: (owner=hive, renewer=oozie mr token, realUser=oozie, issueDate=1457506410968, maxDate=1458111210968, sequenceNumber=908, masterKeyId=280)
2016-03-08 22:47:07,599 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2016-03-08 22:47:07,848 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data1/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data2/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data3/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data4/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data5/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data6/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data7/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data8/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data9/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data10/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004,/data12/hadoop/yarn/local/usercache/hive/appcache/application_1457504560070_0004
2016-03-08 22:47:08,132 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2016-03-08 22:47:08,623 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2016-03-08 22:47:08,840 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: com.teradata.dynaload.hcatalog.mapper.TDInputFormat$TeradataInputSplit#2ece4966
2016-03-08 22:47:08,844 INFO [main] com.teradata.dynaload.hcatalog.mapper.TDInputFormat$TeradataRecordReader: recordreader class com.teradata.dynaload.hcatalog.mapper.TDInputFormat$TeradataRecordReaderinitialize time is: 1457506028844
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 300417020(1201668080)
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 1146
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 841167680
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 1201668096
2016-03-08 22:47:09,512 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 300417020; length = 75104256
2016-03-08 22:47:09,515 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2016-03-08 22:47:09,518 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2016-03-08 22:47:09,518 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2016-03-08 22:47:09,848 WARN [main] org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.metastore.local does not exist
2016-03-08 22:47:09,914 INFO [main] hive.metastore: Trying to connect to metastore with URI thrift://apus2.labs.teradata.com:9083
2016-03-08 22:47:09,951 INFO [main] hive.metastore: Connected to metastore.
2016-03-08 22:47:10,407 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2016-03-08 22:47:10,452 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2016-03-08 22:47:10,453 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
2016-03-08 22:47:10,453 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2016-03-08 22:47:10,457 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
APPLICATION Master LOGS:
Log Type: stderr
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 240
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Log Type: stdout
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 0
Log Type: syslog
Log Upload Time: Tue Mar 08 22:59:27 -0800 2016
Log Length: 66959
Showing 4096 bytes of 66959 total. Click here for the full log.
ILED
2016-03-08 22:59:19,325 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event JOB_FAILED
2016-03-08 22:59:19,456 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://C423A:8020/user/hive/.staging/job_1457504560070_0004/job_1457504560070_0004_1.jhist to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist_tmp
2016-03-08 22:59:19,550 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist_tmp
2016-03-08 22:59:19,562 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://C423A:8020/user/hive/.staging/job_1457504560070_0004/job_1457504560070_0004_1_conf.xml to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml_tmp
2016-03-08 22:59:19,614 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml_tmp
2016-03-08 22:59:19,645 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004.summary_tmp to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004.summary
2016-03-08 22:59:19,654 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml_tmp to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004_conf.xml
2016-03-08 22:59:19,666 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist_tmp to hdfs://C423A:8020/mr-history/tmp/hive/job_1457504560070_0004-1457506422934-hive-oozie%3Aaction%3AT%3Djava%3AW%3DTDExportMR%3AA%3Dexport%3AID%3D00001-1457506759193-0-0-FAILED-default-1457506429243.jhist
2016-03-08 22:59:19,666 INFO [Thread-89] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2016-03-08 22:59:19,671 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to Task failed task_1457504560070_0004_m_000004
Job failed as tasks failed. failedMaps:1 failedReduces:0
2016-03-08 22:59:19,672 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: History url is http://apus2.labs.teradata.com:19888/jobhistory/job/job_1457504560070_0004
2016-03-08 22:59:19,680 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Waiting for application to be successfully unregistered.
2016-03-08 22:59:20,682 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:7 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:7 ContRel:0 HostLocal:6 RackLocal:1
2016-03-08 22:59:20,684 INFO [Thread-89] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://C423A /user/hive/.staging/job_1457504560070_0004
2016-03-08 22:59:20,711 INFO [Thread-89] org.apache.hadoop.ipc.Server: Stopping server on 46067
2016-03-08 22:59:20,712 INFO [IPC Server listener on 46067] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 46067
2016-03-08 22:59:20,712 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2016-03-08 22:59:20,714 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted.
Plese help me in resolving the issue.
You must be using sqoop to bring data to hadoop. Please paste command you are running. For "Failed to write data" , there can be multiple issues. destination parent directory is not avialble, space is not there at cluster etc.Only command can give explanation.

Sqoop job to import data drom SQL server stuck at Map 0%

I have a pseudo distributed hadoop cluster running CDH5.0.2. I'm running a sqoop import command:
sudo -u sqoop sqoop import --connect "jdbc:sqlserver://x.x.x.x:1433;databaseName=yyyyy" --username x --password y --table table_name
I'm just importing a very small table that has 12 rows and 2 columns for testing. The job has been running for half hour. On My resource manager, the mapper tasks' status are listed as NEW and their state are listed as SCHEDULED. I don't think it ever runs!
When i list the job on yarn using:
yarn application -list
i'm getting:
14/07/01 15:55:06 INFO client.RMProxy: Connecting to ResourceManager at host/x.x.x.x:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1404252440376_0001 ActivityType.jar MAPREDUCE sqoop root.sqoop RUNNING UNDEFINED 5% http://host:42583
this is the Application Master log im seeing. How do i fix this?
2014-07-01 15:14:12,880 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-07-01 15:14:12,885 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-07-01 15:14:12,888 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at host/x.x.x.x:8030
2014-07-01 15:14:12,973 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: 1024
2014-07-01 15:14:12,973 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.sqoop
2014-07-01 15:14:12,977 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500
2014-07-01 15:14:12,979 INFO [main] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
2014-07-01 15:14:12,985 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1404252440376_0001Job Transitioned from INITED to SETUP
2014-07-01 15:14:12,987 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2014-07-01 15:14:12,997 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1404252440376_0001Job Transitioned from SETUP to RUNNING
2014-07-01 15:14:13,018 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1404252440376_0001_m_000000 Task Transitioned from NEW to SCHEDULED
2014-07-01 15:14:13,019 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1404252440376_0001_m_000001 Task Transitioned from NEW to SCHEDULED
2014-07-01 15:14:13,019 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1404252440376_0001_m_000002 Task Transitioned from NEW to SCHEDULED
2014-07-01 15:14:13,019 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1404252440376_0001_m_000003 Task Transitioned from NEW to SCHEDULED
2014-07-01 15:14:13,021 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1404252440376_0001_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-01 15:14:13,021 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1404252440376_0001_m_000001_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-01 15:14:13,021 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1404252440376_0001_m_000002_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-01 15:14:13,021 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1404252440376_0001_m_000003_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-01 15:14:13,022 INFO [Thread-51] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceReqt:1024
2014-07-01 15:14:13,066 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1404252440376_0001, File: hdfs://host:8020/user/sqoop/.staging/job_1404252440376_0001/job_1404252440376_0001_1.jhist
2014-07-01 15:14:13,976 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2014-07-01 15:14:14,054 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1404252440376_0001: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=1
The main problem I saw when this happened to me is that there were not enough resources for executing sqoop.
When I was executing Sqoop along with other YARN applications it usually didn't have enough resources so map tasks was always stuck at 0%. I went to the driver logs and the last lines of the logs had something like:
2017-02-01 14:54:48,638 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1484947659248_0500_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-02-01 14:54:48,638 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1484947659248_0500_m_000001_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-02-01 14:54:48,638 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1484947659248_0500_m_000002_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-02-01 14:54:48,638 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1484947659248_0500_m_000003_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-02-01 14:54:48,639 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1484947659248_0500_m_000004_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-02-01 14:54:48,639 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1484947659248_0500_m_000005_0 TaskAttempt Transitioned from NEW to UNASSIGNED
After this, nothing is logged and sqoop is still at 0%.
When there was nothing else running on YARN, Sqoop executed without any problem.
this is not the most helpful answer, but I ended up reinstalling the whole thing.

hadoop streaming error,mapreduce with python

I'm newbie to hadoop environment,Do you have any idea about how to solve this error,or what may be the reason behind this error?
hduser#intel-HP-Pavilion-g6-Notebook-PC:~/hduser/hadoop$ sudo ./bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar -file /home/hduser/map.py -mapper /home/hduser/map.py -file /home/hduser/red.py -reducer /home/hduser/red.py -input /home/hduser/tmp/cddb.txt -output /home/hduser/op1
packageJobJar: [/home/hduser/map.py, /home/hduser/red.py] [] /tmp/streamjob7455767556382290755.jar tmpDir=null
13/06/20 12:43:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/20 12:43:55 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/20 12:43:55 INFO mapred.FileInputFormat: Total input paths to process : 1
13/06/20 12:43:55 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir.
13/06/20 12:43:56 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-root/mapred/local]
13/06/20 12:43:56 INFO streaming.StreamJob: Running job: job_local_0001
13/06/20 12:43:56 INFO streaming.StreamJob: Job running in-process (local Hadoop)
13/06/20 12:43:56 INFO util.ProcessTree: setsid exited with exit code 0
13/06/20 12:43:56 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin#e2081
13/06/20 12:43:56 INFO mapred.MapTask: numReduceTasks: 1
13/06/20 12:43:56 INFO mapred.MapTask: io.sort.mb = 100
13/06/20 12:43:56 INFO mapred.MapTask: data buffer = 79691776/99614720
13/06/20 12:43:56 INFO mapred.MapTask: record buffer = 262144/327680
13/06/20 12:43:56 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/hduser/hadoop/./map.py]
13/06/20 12:43:56 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/06/20 12:43:57 INFO streaming.StreamJob: map 0% reduce 0%
13/06/20 12:44:02 INFO mapred.LocalJobRunner: file:/home/hduser/tmp/cddb.txt:0+1205
13/06/20 12:44:03 INFO streaming.StreamJob: map 100% reduce 0%
13/06/20 12:48:11 INFO streaming.PipeMapRed: Records R/W=9/1
13/06/20 12:48:11 INFO streaming.PipeMapRed: MRErrorThread done
13/06/20 12:48:11 INFO streaming.PipeMapRed: mapRedFinished
13/06/20 12:48:11 INFO mapred.MapTask: Starting flush of map output
13/06/20 12:48:11 INFO mapred.MapTask: Finished spill 0
13/06/20 12:48:11 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/06/20 12:48:11 INFO mapred.LocalJobRunner: Records R/W=9/1
13/06/20 12:48:11 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/06/20 12:48:11 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin#1c84be9
13/06/20 12:48:11 INFO mapred.LocalJobRunner:
13/06/20 12:48:11 INFO mapred.Merger: Merging 1 sorted segments
13/06/20 12:48:11 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1356 bytes
13/06/20 12:48:11 INFO mapred.LocalJobRunner:
13/06/20 12:48:11 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/hduser/hadoop/./red.py]
13/06/20 12:48:11 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/06/20 12:48:11 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
Traceback (most recent call last):
File "/home/hduser/hduser/hadoop/./red.py", line 30, in <module>
main()
File "/home/hduser/hduser/hadoop/./red.py", line 19, in main
for similarity, group in groupby(data, itemgetter(0), reverse=True):
TypeError: groupby() takes at most 2 arguments (3 given)
13/06/20 12:48:11 INFO streaming.PipeMapRed: MRErrorThread done
13/06/20 12:48:11 INFO streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
13/06/20 12:48:11 WARN mapred.LocalJobRunner: job_local_0001
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
13/06/20 12:48:12 INFO streaming.StreamJob: Job running in-process (local Hadoop)
13/06/20 12:48:12 ERROR streaming.StreamJob: Job not successful. Error: NA
13/06/20 12:48:12 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
I'm using hadoop 1.0.4,and wrote map reduce in python(hadoop streaming is used)
.
The error is obvious:
Traceback (most recent call last):
File "/home/hduser/hduser/hadoop/./red.py", line 30, in <module>
main()
File "/home/hduser/hduser/hadoop/./red.py", line 19, in main
for similarity, group in groupby(data, itemgetter(0), reverse=True):
TypeError: groupby() takes at most 2 arguments (3 given)
groupby only accepts 2 arguments. Here is the document of groupby.

Resources