Sqoop job failing due to the following reason - sqoop

java.lang.Exception: java.io.IOException: Mkdirs failed to create file:/user/City/_temporary/0/_temporary/attempt_local1259965155_0001_m_000000_0 (exists=false, cwd=file:/home/centos)
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
Caused by: java.io.IOException: Mkdirs failed to create file:/user/City/_temporary/0/_temporary/attempt_local1259965155_0001_m_000000_0 (exists=false, cwd=file:/home/centos)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:447)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:926)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804)
at org.apache.sqoop.mapreduce.RawKeyTextOutputFormat.getRecordWriter(RawKeyTextOutputFormat.java:98)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:653)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
18/12/05 13:14:05 INFO mapreduce.Job: Job job_local1259965155_0001 running in uber mode : false
18/12/05 13:14:05 INFO mapreduce.Job: map 0% reduce 0%
18/12/05 13:14:05 INFO mapreduce.Job: Job job_local1259965155_0001 failed with state FAILED due to: NA
18/12/05 13:14:05 INFO mapreduce.Job: Counters: 0
18/12/05 13:14:05 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
18/12/05 13:14:05 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 2.6049 seconds (0 bytes/sec)
18/12/05 13:14:05 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
18/12/05 13:14:05 INFO mapreduce.ImportJobBase: Retrieved 0 records.
18/12/05 13:14:05 ERROR tool.ImportAllTablesTool: Error during import: Import job failed!
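One detail worth noting in the trace above: the output path and the working directory are both file: URIs and the job runs through LocalJobRunner, i.e. against the local filesystem rather than HDFS. A quick client-side check (a sketch of a diagnostic, not a confirmed fix) is:

hdfs getconf -confKey fs.defaultFS
# If this prints file:/// instead of an hdfs:// URI, the Hadoop configuration
# directory that Sqoop picks up has no fs.defaultFS set, so /user/City gets
# resolved on the local disk of the machine running the import.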

Related

Executing a Hive query causes the YARN ResourceManager to throw a "file does not exist" exception

I'm configuring Hive 3.1.0 to work with Hadoop 3.0.0.
This error is thrown almost immediately when I submit a simple query on Beeline that causes a MapReduce job:
0: jdbc:hive2://> select count(*) from airlinedata;
18/10/11 10:24:45 [HiveServer2-Background-Pool: Thread-124]: WARN ql.Driver: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = UUT81HC_20181011102444_2df01ff5-ca05-403c-b0e1-15f8f7715dc7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
2018-10-11 10:24:45,510 INFO [HiveServer2-Background-Pool: Thread-124] client.RMProxy (RMProxy.java:newProxyInstance(133)) - Connecting to ResourceManager at /10.184.153.232:8032
2018-10-11 10:24:45,555 INFO [HiveServer2-Background-Pool: Thread-124] client.RMProxy (RMProxy.java:newProxyInstance(133)) - Connecting to ResourceManager at /10.184.153.232:8032
18/10/11 10:24:45 [HiveServer2-Background-Pool: Thread-124]: WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN : Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:73)
at org.apache.hadoop.mapreduce.TypeConverter.toYarn(TypeConverter.java:78)
at org.apache.hadoop.mapred.ClientServiceDelegate.<init>(ClientServiceDelegate.java:120)
at org.apache.hadoop.mapred.ClientCache.getClient(ClientCache.java:68)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:343)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:254)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:423)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:149)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:70)
... 40 more
Caused by: java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder.setAppId(Lorg/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto;)Lorg/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder; #36: invokevirtual
Reason:
Type 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' (current frame, stack[1]) is not assignable to 'com/google/protobuf/GeneratedMessage'
Current Frame:
bci: #36
flags: { }
locals: { 'org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
stack: { 'com/google/protobuf/SingleFieldBuilder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
Bytecode:
0x0000000: 2ab4 0011 c700 1b2b c700 0bbb 002f 59b7
0x0000010: 0030 bf2a 2bb5 000a 2ab6 0031 a700 0c2a
0x0000020: b400 112b b600 3257 2a59 b400 1304 80b5
0x0000030: 0013 2ab0
Stackmap Table:
same_frame(#19)
same_frame(#31)
same_frame(#40)
at org.apache.hadoop.mapreduce.v2.proto.MRProtos$JobIdProto.newBuilder(MRProtos.java:1017)
at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.JobIdPBImpl.<init>(JobIdPBImpl.java:37)
... 45 more
YARN ResourceManager stack trace:
2018-10-11 10:24:49,896 INFO rmapp.RMAppImpl: application_1539226955170_0002 State change from ACCEPTED to FINAL_SAVING on event = ATTEMPT_FAILED
2018-10-11 10:24:49,896 INFO recovery.RMStateStore: Updating info for app: application_1539226955170_0002
2018-10-11 10:24:49,897 INFO capacity.CapacityScheduler: Application Attempt appattempt_1539226955170_0002_000002 is done. finalState=FAILED
2018-10-11 10:24:49,897 INFO rmapp.RMAppImpl: Application application_1539226955170_0002 failed 2 times due to AM Container for appattempt_1539226955170_0002_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2018-10-11 10:24:49.876]File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
java.io.FileNotFoundException: File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1495)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1488)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1503)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:366)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:364)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:241)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:234)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:222)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://HC-UT40048C.apac.com:8088/cluster/app/application_1539226955170_0002 Then click on links to logs of each attempt.
. Failing the application.
2018-10-11 10:24:49,897 INFO scheduler.AppSchedulingInfo: Application application_1539226955170_0002 requests cleared
2018-10-11 10:24:49,897 INFO rmapp.RMAppImpl: application_1539226955170_0002 State change from FINAL_SAVING to FAILED on event = APP_UPDATE_SAVED
2018-10-11 10:24:49,898 INFO capacity.LeafQueue: Application removed - appId: application_1539226955170_0002 user: UUT81HC queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2018-10-11 10:24:49,898 WARN resourcemanager.RMAuditLogger: USER=UUT81HC OPERATION=Application Finished - Failed
TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1539226955170_0002 failed 2 times due to AM Container for appattempt_1539226955170_0002_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2018-10-11 10:24:49.876]File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
java.io.FileNotFoundException: File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1495)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1488)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1503)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:366)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:364)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:241)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:234)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:222)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://HC-UT40048C.apac.com:8088/cluster/app/application_1539226955170_0002 Then click on links to logs of each attempt.
. Failing the application. APPID=application_1539226955170_0002
2018-10-11 10:24:49,898 INFO capacity.ParentQueue: Application removed - appId: application_1539226955170_0002 user: UUT81HC leaf-queue of parent: root #applications: 0
2018-10-11 10:24:49,899 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1539226955170_0002,name=select count(*) from airlinedata (Stage-1),user=UUT81HC,queue=default,state=FAILED,trackingUrl=http://HC-UT40048C.apac.com:8088/cluster/app/application_1539226955170_0002,appMasterHost=N/A,submitTime=1539228287412,startTime=1539228287413,finishTime=1539228289896,finalStatus=FAILED,memorySeconds=1482,vcoreSeconds=0,preemptedMemorySeconds=0,preemptedVcoreSeconds=0,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=,applicationType=MAPREDUCE,resourceSeconds=1482 MB-seconds\, 0 vcore-seconds,preemptedResourceSeconds=0 MB-seconds\, 0 vcore-seconds
After examining how Hive executes the MapReduce job on YARN, I found that it first creates map.xml and reduce.xml in /tmp with permission drwx------ (only the owner can use them):
2018-10-11 10:24:45,133 INFO hdfs.StateChange: BLOCK* allocate blk_1073742318_1495, replicas=10.184.153.232:9866 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/map.xml
2018-10-11 10:24:45,225 INFO hdfs.StateChange: DIR* completeFile: /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/map.xml is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:45,248 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/map.xml
2018-10-11 10:24:45,294 INFO hdfs.StateChange: BLOCK* allocate blk_1073742319_1496, replicas=10.184.153.232:9866 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
2018-10-11 10:24:45,411 INFO hdfs.StateChange: DIR* completeFile: /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:45,437 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
2018-10-11 10:24:45,772 INFO hdfs.StateChange: BLOCK* allocate blk_1073742320_1497, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.jar
2018-10-11 10:24:46,438 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.jar is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:46,463 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.jar
2018-10-11 10:24:46,618 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.split
2018-10-11 10:24:46,639 INFO hdfs.StateChange: BLOCK* allocate blk_1073742321_1498, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.split
2018-10-11 10:24:46,706 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.split is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:46,791 INFO hdfs.StateChange: BLOCK* allocate blk_1073742322_1499, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.splitmetainfo
2018-10-11 10:24:46,870 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.splitmetainfo is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:46,971 INFO hdfs.StateChange: BLOCK* allocate blk_1073742323_1500, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.xml
2018-10-11 10:24:47,370 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.xml is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:32:15,741 INFO blockmanagement.BlockManager: StorageInfo TreeSet fill ratio DS-d4c2a5a0-435d-4b44-b408-3cd04587cd09 : 1.0
But somehow YARN can't read them when executing the job and throws "file does not exist". I did set permission 777 on /tmp, but these files are created by Hive itself during execution, so I can't do anything about them.
I suspect this problem is related to users or permissions when using Hive on Hadoop. What should I do about this?
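If you want to rule the permission theory in or out, one quick check (a sketch, assuming the default Hive property names) is to list the session's scratch directory on HDFS while the query is running and compare it with the scratch-dir settings in hive-site.xml:

hdfs dfs -ls -R /tmp/hive/UUT81HC
# Relevant hive-site.xml properties (defaults shown):
#   hive.exec.scratchdir = /tmp/hive
#   hive.scratch.dir.permission = 700   (mode applied to the per-user scratch directories)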

Hadoop Streaming job failing

I am trying to run my first MapReduce job, which aggregates some data from XML files. My job is failing, and as I am a newbie at Hadoop, I would appreciate it if someone could take a look at what is going wrong.
I have:
posts_mapper.py:
#!/usr/bin/env python
import sys
import xml.etree.ElementTree as ET

input_string = sys.stdin.read()

class User(object):
    def __init__(self, id):
        self.id = id
        self.post_type_1_count = 0
        self.post_type_2_count = 0
        self.aggregate_post_score = 0
        self.aggregate_post_size = 0
        self.tags_count = {}

users = {}
root = ET.fromstring(input_string)
for child in root.getchildren():
    user_id = int(child.get("OwnerUserId"))
    post_type = int(child.get("PostTypeId"))
    score = int(child.get("Score"))
    #view_count = int(child.get("ViewCount"))
    post_size = len(child.get("Body"))
    tags = child.get("Tags")
    if user_id not in users:
        users[user_id] = User(user_id)
    user = users[user_id]
    if post_type == 1:
        user.post_type_1_count += 1
    else:
        user.post_type_2_count += 1
    user.aggregate_post_score += score
    user.aggregate_post_size += post_size
    if tags != None:
        tags = tags.replace("<", " ").replace(">", " ").split()
        for tag in tags:
            if tag not in user.tags_count:
                user.tags_count[tag] = 0
            user.tags_count[tag] += 1

for i in users:
    user = users[i]
    out = "%d %d %d %d %d " % (user.id, user.post_type_1_count, user.post_type_2_count, user.aggregate_post_score, user.aggregate_post_size)
    for tag in user.tags_count:
        out += "%s %d " % (tag, user.tags_count[tag])
    print out
posts_reducer.py:
#!/usr/bin/env python
import sys

class User(object):
    def __init__(self, id):
        self.id = id
        self.post_type_1_count = 0
        self.post_type_2_count = 0
        self.aggregate_post_score = 0
        self.aggregate_post_size = 0
        self.tags_count = {}

users = {}
for line in sys.stdin:
    vals = line.split()
    user_id = int(vals[0])
    post_type_1 = int(vals[1])
    post_type_2 = int(vals[2])
    aggregate_post_score = int(vals[3])
    aggregate_post_size = int(vals[4])
    tags = {}
    if len(vals) > 5:
        #this means we got tags
        for i in range(5, len(vals), 2):
            tag = vals[i]
            count = int((vals[i+1]))
            tags[tag] = count
    if user_id not in users:
        users[user_id] = User(user_id)
    user = users[user_id]
    user.post_type_1_count += post_type_1
    user.post_type_2_count += post_type_2
    user.aggregate_post_score += aggregate_post_score
    user.aggregate_post_size += aggregate_post_size
    for tag in tags:
        if tag not in user.tags_count:
            user.tags_count[tag] = 0
        user.tags_count[tag] += tags[tag]

for i in users:
    user = users[i]
    out = "%d %d %d %d %d " % (user.id, user.post_type_1_count, user.post_type_2_count, user.aggregate_post_score, user.aggregate_post_size)
    for tag in user.tags_count:
        out += "%s %d " % (tag, user.tags_count[tag])
    print out
I run the command:
bin/hadoop jar hadoop-streaming-2.6.0.jar -input /stackexchange/beer/posts -output /stackexchange/beer/results -mapper posts_mapper.py -reducer posts_reducer.py -file ~/mapreduce/posts_mapper.py -file ~/mapreduce/posts_reducer.py
and get the output:
packageJobJar: [/home/hduser/mapreduce/posts_mapper.py, /home/hduser/mapreduce/posts_reducer.py, /tmp/hadoop-unjar6585010774815976682/] [] /tmp/streamjob8863638738687983603.jar tmpDir=null
15/03/20 10:18:55 INFO client.RMProxy: Connecting to ResourceManager at Master/10.1.1.22:8040
15/03/20 10:18:55 INFO client.RMProxy: Connecting to ResourceManager at Master/10.1.1.22:8040
15/03/20 10:18:57 INFO mapred.FileInputFormat: Total input paths to process : 10
15/03/20 10:18:57 INFO mapreduce.JobSubmitter: number of splits:10
15/03/20 10:18:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1426769192808_0004
15/03/20 10:18:58 INFO impl.YarnClientImpl: Submitted application application_1426769192808_0004
15/03/20 10:18:58 INFO mapreduce.Job: The url to track the job: http://i-644dd931:8088/proxy/application_1426769192808_0004/
15/03/20 10:18:58 INFO mapreduce.Job: Running job: job_1426769192808_0004
15/03/20 10:19:11 INFO mapreduce.Job: Job job_1426769192808_0004 running in uber mode : false
15/03/20 10:19:11 INFO mapreduce.Job: map 0% reduce 0%
15/03/20 10:19:41 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000006_0, Status : FAILED
15/03/20 10:19:48 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000007_0, Status : FAILED
15/03/20 10:19:50 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000008_0, Status : FAILED
15/03/20 10:19:50 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000009_0, Status : FAILED
15/03/20 10:20:00 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000006_1, Status : FAILED
15/03/20 10:20:08 INFO mapreduce.Job: map 7% reduce 0%
15/03/20 10:20:10 INFO mapreduce.Job: map 20% reduce 0%
15/03/20 10:20:10 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000007_1, Status : FAILED
15/03/20 10:20:11 INFO mapreduce.Job: map 10% reduce 0%
15/03/20 10:20:17 INFO mapreduce.Job: map 20% reduce 0%
15/03/20 10:20:17 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000008_1, Status : FAILED
15/03/20 10:20:19 INFO mapreduce.Job: map 10% reduce 0%
15/03/20 10:20:19 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000009_1, Status : FAILED
15/03/20 10:20:22 INFO mapreduce.Job: map 20% reduce 0%
15/03/20 10:20:22 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000006_2, Status : FAILED
15/03/20 10:20:25 INFO mapreduce.Job: map 40% reduce 0%
15/03/20 10:20:25 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000002_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
15/03/20 10:20:28 INFO mapreduce.Job: map 50% reduce 0%
15/03/20 10:20:28 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000007_2, Status : FAILED
15/03/20 10:20:42 INFO mapreduce.Job: map 50% reduce 17%
15/03/20 10:20:52 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000008_2, Status : FAILED
15/03/20 10:20:54 INFO mapreduce.Job: Task Id : attempt_1426769192808_0004_m_000009_2, Status : FAILED
15/03/20 10:20:56 INFO mapreduce.Job: map 90% reduce 0%
15/03/20 10:20:57 INFO mapreduce.Job: map 100% reduce 100%
15/03/20 10:20:58 INFO mapreduce.Job: Job job_1426769192808_0004 failed with state FAILED due to: Task failed task_1426769192808_0004_m_000006
Job failed as tasks failed. failedMaps:1 failedReduces:0
Unfortunately, Hadoop does not show stderr for your Python mapper/reducer, so this output does not give any clue.
I would recommend the following two troubleshooting steps:
Test your mapper/reducer locally:
cat {your_input_files} | ./posts_mapper.py | sort | ./posts_reducer.py
If you did not find any issue in step 1, run the MapReduce job and check the output logs:
yarn logs -applicationId application_1426769192808_0004
or
hdfs dfs -cat /var/log/hadoop-yarn/apps/{user}/logs/

From Hive to Elasticsearch:

I'm working with Cloudera CDH 5.3 with 1 NameNode (ip:...169) and 3 slaves.
I have Elasticsearch 1.4.4 installed on my master machine (ip:...169).
I have downloaded the ES-Hadoop jar and added it to the path.
With that being said, I now want to load data from Hive to ES.
1) First of all, I created a table via a CSV file under the table metastore (with Hue).
2) I defined an external table on top of ES in Hive, to write and load data into it later:
ADD JAR
/usr/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-hive-2.0.2.jar;
CREATE EXTERNAL TABLE es_cdr(
id bigint,
calling int,
called int,
duration int,
location string,
date string)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes'='10.44.162.169',
'es.resource' = 'indexOmar/typeOmar');
I've also manually added the SerDe snapshot jar via parameters => add file => jar.
Now, I want to load data from my table into the new ES table:
INSERT OVERWRITE TABLE es_cdr
select NULL, h.appelant, h.called_number,
h.call_duration, h.location_number, h.date_heure_appel from hive_cdr h;
But an error appears saying:
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
And this is what's written in the log:
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO parse.ParseDriver: Parsing command: INSERT OVERWRITE TABLE hive_es_cdr_10
SELECT NULL,h.appelant,h.called_number,h.call_dur,h.loc_number,h.h_appel FROM hive_cdr h limit 2
15/03/05 14:36:34 INFO parse.ParseDriver: Parse Completed
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=parse start=1425562594378 end=1425562594381 duration=3 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Get metadata for source tables
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Get metadata for subqueries
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Get metadata for destination tables
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
15/03/05 14:36:34 INFO common.FileUtils: Creating directory if it doesn't exist: hdfs://master:8020/user/hive/warehouse/hive_es_cdr_10/.hive-staging_hive_2015-03-05_14-36-34_378_4527939627221909415-1
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Set stats collection dir : hdfs://master:8020/user/hive/warehouse/hive_es_cdr_10/.hive-staging_hive_2015-03-05_14-36-34_378_4527939627221909415-1/-ext-10000
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for FS(109)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for SEL(108)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for LIM(107)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for EX(106)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for RS(105)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for LIM(104)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for SEL(103)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for TS(102)
15/03/05 14:36:34 INFO optimizer.ColumnPrunerProcFactory: RS 105 oldColExprMap: {_col5=Column[_col5], _col4=Column[_col4], _col3=Column[_col3], _col2=Column[_col2], _col1=Column[_col1], _col0=Column[_col0]}
15/03/05 14:36:34 INFO optimizer.ColumnPrunerProcFactory: RS 105 newColExprMap: {_col5=Column[_col5], _col4=Column[_col4], _col3=Column[_col3], _col2=Column[_col2], _col1=Column[_col1], _col0=Column[_col0]}
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=partition-retrieving start=1425562594461 end=1425562594461 duration=0 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/03/05 14:36:34 INFO physical.MetadataOnlyOptimizer: Looking for table scans where optimization is applicable
15/03/05 14:36:34 INFO physical.MetadataOnlyOptimizer: Found 0 metadata only table scans
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Completed plan generation
15/03/05 14:36:34 INFO ql.Driver: Semantic Analysis Completed
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1425562594381 end=1425562594463 duration=82 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:bigint, comment:null), FieldSchema(name:_col1, type:int, comment:null), FieldSchema(name:_col2, type:int, comment:null), FieldSchema(name:_col3, type:int, comment:null), FieldSchema(name:_col4, type:string, comment:null), FieldSchema(name:_col5, type:string, comment:null)], properties:null)
15/03/05 14:36:34 INFO ql.Driver: EXPLAIN output for queryid hive_20150305143636_528f97d4-b670-40e2-ba80-7d7a7bd441ff : ABSTRACT SYNTAX TREE:
TOK_QUERY
TOK_FROM
TOK_TABREF
TOK_TABNAME
hive_cdr
h
TOK_INSERT
TOK_DESTINATION
TOK_TAB
TOK_TABNAME
hive_es_cdr_10
TOK_SELECT
TOK_SELEXPR
TOK_NULL
TOK_SELEXPR
.
TOK_TABLE_OR_COL
h
appelant
TOK_SELEXPR
.
TOK_TABLE_OR_COL
h
called_number
TOK_SELEXPR
.
TOK_TABLE_OR_COL
h
call_dur
TOK_SELEXPR
.
TOK_TABLE_OR_COL
h
loc_number
TOK_SELEXPR
.
TOK_TABLE_OR_COL
h
h_appel
TOK_LIMIT
2
STAGE DEPENDENCIES:
Stage-0 is a root stage [MAPRED]
STAGE PLANS:
Stage: Stage-0
Map Reduce
Map Operator Tree:
TableScan
alias: h
GatherStats: false
Select Operator
expressions: null (type: string), appelant (type: int), called_number (type: int), call_dur (type: int), loc_number (type: string), h_appel (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
Limit
Number of rows: 2
Reduce Output Operator
sort order:
tag: -1
value expressions: _col0 (type: void), _col1 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: string), _col5 (type: string)
Path -> Alias:
hdfs://master:8020/user/hive/warehouse/hive_cdr [h]
Path -> Partition:
hdfs://master:8020/user/hive/warehouse/hive_cdr
Partition
base file name: hive_cdr
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE true
bucket_count -1
columns traffic_type_id,appelant,called_number,call_dur,loc_number,h_appel
columns.comments
columns.types int:int:int:int:string:string
field.delim ;
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location hdfs://master:8020/user/hive/warehouse/hive_cdr
name default.hive_cdr
numFiles 1
numRows 0
rawDataSize 0
serialization.ddl struct hive_cdr { i32 traffic_type_id, i32 appelant, i32 called_number, i32 call_dur, string loc_number, string h_appel}
serialization.format ;
serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 56373362
transient_lastDdlTime 1425459002
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE true
bucket_count -1
columns traffic_type_id,appelant,called_number,call_dur,loc_number,h_appel
columns.comments
columns.types int:int:int:int:string:string
field.delim ;
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location hdfs://master:8020/user/hive/warehouse/hive_cdr
name default.hive_cdr
numFiles 1
numRows 0
rawDataSize 0
serialization.ddl struct hive_cdr { i32 traffic_type_id, i32 appelant, i32 called_number, i32 call_dur, string loc_number, string h_appel}
serialization.format ;
serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 56373362
transient_lastDdlTime 1425459002
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.hive_cdr
name: default.hive_cdr
Truncated Path -> Alias:
/hive_cdr [h]
Needs Tagging: false
Reduce Operator Tree:
Extract
Limit
Number of rows: 2
Select Operator
expressions: UDFToLong(_col0) (type: bigint), _col1 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: string), _col5 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
File Output Operator
compressed: false
GlobalTableId: 1
directory: hdfs://master:8020/user/hive/warehouse/hive_es_cdr_10
NumFilesPerFileSink: 1
Stats Publishing Key Prefix: hdfs://master:8020/user/hive/warehouse/hive_es_cdr_10/
table:
input format: org.elasticsearch.hadoop.hive.EsHiveInputFormat
jobProperties:
EXTERNAL TRUE
bucket_count -1
columns id_traffic,caller,called,call_dur,caller_location,call_date
columns.comments
columns.types bigint:int:int:int:string:string
es.nodes 10.44.162.169
es.port 9200
es.resource myindex/mytype
file.inputformat org.apache.hadoop.mapred.SequenceFileInputFormat
file.outputformat org.apache.hadoop.mapred.SequenceFileOutputFormat
location hdfs://master:8020/user/hive/warehouse/hive_es_cdr_10
name default.hive_es_cdr_10
serialization.ddl struct hive_es_cdr_10 { i64 id_traffic, i32 caller, i32 called, i32 call_dur, string caller_location, string call_date}
serialization.format 1
serialization.lib org.elasticsearch.hadoop.hive.EsSerDe
storage_handler org.elasticsearch.hadoop.hive.EsStorageHandler
transient_lastDdlTime 1425561441
output format: org.elasticsearch.hadoop.hive.EsHiveOutputFormat
properties:
EXTERNAL TRUE
bucket_count -1
columns id_traffic,caller,called,call_dur,caller_location,call_date
columns.comments
columns.types bigint:int:int:int:string:string
es.nodes 10.44.162.169
es.port 9200
es.resource myindex/mytype
file.inputformat org.apache.hadoop.mapred.SequenceFileInputFormat
file.outputformat org.apache.hadoop.mapred.SequenceFileOutputFormat
location hdfs://master:8020/user/hive/warehouse/hive_es_cdr_10
name default.hive_es_cdr_10
serialization.ddl struct hive_es_cdr_10 { i64 id_traffic, i32 caller, i32 called, i32 call_dur, string caller_location, string call_date}
serialization.format 1
serialization.lib org.elasticsearch.hadoop.hive.EsSerDe
storage_handler org.elasticsearch.hadoop.hive.EsStorageHandler
transient_lastDdlTime 1425561441
serde: org.elasticsearch.hadoop.hive.EsSerDe
name: default.hive_es_cdr_10
TotalFiles: 1
GatherStats: false
MultiFileSpray: false
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=compile start=1425562594378 end=1425562594484 duration=106 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=acquireReadWriteLocks from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO lockmgr.DummyTxnManager: Creating lock manager of type org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
15/03/05 14:36:34 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=600000 watcher=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager$DummyWatcher#70e69669
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=acquireReadWriteLocks start=1425562594502 end=1425562594523 duration=21 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO ql.Driver: Starting command: INSERT OVERWRITE TABLE hive_es_cdr_10
SELECT NULL,h.appelant,h.called_number,h.call_dur,h.loc_number,h.h_appel FROM hive_cdr h limit 2
15/03/05 14:36:34 INFO ql.Driver: Total jobs = 1
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1425562594500 end=1425562594526 duration=26 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=task.MAPRED.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO ql.Driver: Launching Job 1 out of 1
15/03/05 14:36:34 INFO exec.Task: Number of reduce tasks determined at compile time: 1
15/03/05 14:36:34 INFO exec.Task: In order to change the average load for a reducer (in bytes):
15/03/05 14:36:34 INFO exec.Task: set hive.exec.reducers.bytes.per.reducer=<number>
15/03/05 14:36:34 INFO exec.Task: In order to limit the maximum number of reducers:
15/03/05 14:36:34 INFO exec.Task: set hive.exec.reducers.max=<number>
15/03/05 14:36:34 INFO exec.Task: In order to set a constant number of reducers:
15/03/05 14:36:34 INFO exec.Task: set mapreduce.job.reduces=<number>
15/03/05 14:36:34 INFO ql.Context: New scratch dir is hdfs://master:8020/tmp/hive-hive/hive_2015-03-05_14-36-34_378_4527939627221909415-7
15/03/05 14:36:34 INFO mr.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
15/03/05 14:36:34 INFO mr.ExecDriver: adding libjars: file:///tmp/d39b23a8-98d2-4bc3-9008-3eff080dd20c_resources/hive-serdes-1.0-SNAPSHOT.jar,file:///usr/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-hive-2.0.2.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hive/lib/hive-hbase-handler-0.13.1-cdh5.3.1.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/hbase-server.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/lib/htrace-core.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/lib/htrace-core-2.04.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/hbase-common.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/hbase-client.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/hbase-protocol.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/hbase-hadoop2-compat.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/hbase-hadoop-compat.jar
15/03/05 14:36:34 INFO exec.Utilities: Processing alias h
15/03/05 14:36:34 INFO exec.Utilities: Adding input file hdfs://master:8020/user/hive/warehouse/hive_cdr
15/03/05 14:36:34 INFO exec.Utilities: Content Summary not cached for hdfs://master:8020/user/hive/warehouse/hive_cdr
15/03/05 14:36:34 INFO ql.Context: New scratch dir is hdfs://master:8020/tmp/hive-hive/hive_2015-03-05_14-36-34_378_4527939627221909415-7
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/03/05 14:36:34 INFO exec.Utilities: Serializing MapWork via kryo
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=serializePlan start=1425562594554 end=1425562594638 duration=84 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/03/05 14:36:34 INFO exec.Utilities: Serializing ReduceWork via kryo
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=serializePlan start=1425562594653 end=1425562594708 duration=55 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/03/05 14:36:34 INFO client.RMProxy: Connecting to ResourceManager at master/10.44.162.169:8032
15/03/05 14:36:34 INFO client.RMProxy: Connecting to ResourceManager at master/10.44.162.169:8032
15/03/05 14:36:34 WARN mr.EsOutputFormat: Speculative execution enabled for reducer - consider disabling it to prevent data corruption
15/03/05 14:36:34 INFO mr.EsOutputFormat: Writing to [myindex/mytype]
15/03/05 14:36:34 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/03/05 14:36:35 INFO log.PerfLogger: <PERFLOG method=getSplits from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
15/03/05 14:36:35 INFO io.CombineHiveInputFormat: CombineHiveInputSplit creating pool for hdfs://master:8020/user/hive/warehouse/hive_cdr; using filter path hdfs://master:8020/user/hive/warehouse/hive_cdr
15/03/05 14:36:35 INFO input.FileInputFormat: Total input paths to process : 1
15/03/05 14:36:35 INFO input.CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 3, size left: 0
15/03/05 14:36:35 INFO io.CombineHiveInputFormat: number of splits 1
15/03/05 14:36:35 INFO log.PerfLogger: </PERFLOG method=getSplits start=1425562595867 end=1425562595896 duration=29 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
15/03/05 14:36:35 INFO mapreduce.JobSubmitter: number of splits:1
15/03/05 14:36:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1425457357655_0006
15/03/05 14:36:36 INFO impl.YarnClientImpl: Submitted application application_1425457357655_0006
15/03/05 14:36:36 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1425457357655_0006/
15/03/05 14:36:36 INFO exec.Task: Starting Job = job_1425457357655_0006, Tracking URL = http://master:8088/proxy/application_1425457357655_0006/
15/03/05 14:36:36 INFO exec.Task: Kill Command = /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hadoop/bin/hadoop job -kill job_1425457357655_0006
15/03/05 14:36:58 INFO exec.Task: Hadoop job information for Stage-0: number of mappers: 0; number of reducers: 0
15/03/05 14:36:58 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
15/03/05 14:36:58 INFO exec.Task: 2015-03-05 14:36:58,687 Stage-0 map = 0%, reduce = 0%
15/03/05 14:36:58 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
15/03/05 14:36:58 ERROR exec.Task: Ended Job = job_1425457357655_0006 with errors
15/03/05 14:36:58 INFO impl.YarnClientImpl: Killed application application_1425457357655_0006
15/03/05 14:36:58 ERROR ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
15/03/05 14:36:58 INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1425562594523 end=1425562618754 duration=24231 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:58 INFO ql.Driver: MapReduce Jobs Launched:
15/03/05 14:36:58 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
15/03/05 14:36:58 INFO ql.Driver: Stage-Stage-0: HDFS Read: 0 HDFS Write: 0 FAIL
15/03/05 14:36:58 INFO ql.Driver: Total MapReduce CPU Time Spent: 0 msec
15/03/05 14:36:58 INFO log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:58 INFO ZooKeeperHiveLockManager: about to release lock for default/hive_es_cdr_10
15/03/05 14:36:58 INFO ZooKeeperHiveLockManager: about to release lock for default/hive_cdr
15/03/05 14:36:58 INFO ZooKeeperHiveLockManager: about to release lock for default
15/03/05 14:36:58 INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1425562618768 end=1425562618780 duration=12 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:58 ERROR operation.Operation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:147)
at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run
It seems the failure is caused by a type issue. You can use the es.mapping properties in TBLPROPERTIES to control how the columns are mapped and typed.
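For example, a minimal sketch based on the es_cdr table from the question (the es.port value, the es.mapping.names entry and the CAST are illustrative assumptions, not taken from the original post). With EsStorageHandler only STORED BY is needed since it supplies EsSerDe itself, the es.mapping.* table properties control how Hive columns map onto fields of the target index, and casting the literal NULL gives the first column a concrete type instead of void:

ADD JAR /usr/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-hive-2.0.2.jar;

-- 'es.mapping.names' maps the Hive column "date" onto an ES field named "call_date" (illustrative).
CREATE EXTERNAL TABLE es_cdr(
  id bigint,
  calling int,
  called int,
  duration int,
  location string,
  date string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.nodes' = '10.44.162.169',
  'es.port' = '9200',
  'es.resource' = 'indexOmar/typeOmar',
  'es.mapping.names' = 'date:call_date');

-- Cast the NULL so the first column is planned as bigint rather than void.
INSERT OVERWRITE TABLE es_cdr
SELECT CAST(NULL AS BIGINT), h.appelant, h.called_number,
       h.call_duration, h.location_number, h.date_heure_appel
FROM hive_cdr h;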

java.lang.NullPointerException at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.close

I am running two MapReduce jobs. The output of the first MapReduce job is used as the input for the next one. To do that, I have set job.setOutputFormatClass(SequenceFileOutputFormat.class). While running the following driver class:
package org;
import org.apache.commons.configuration.ConfigurationFactory;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.math.VarLongWritable;
import org.apache.mahout.math.VectorWritable;
public class Driver1 extends Configured implements Tool
{
    public int run(String[] args) throws Exception
    {
        if(args.length !=3) {
            System.err.println("Usage: MaxTemperatureDriver <input path> <outputpath>");
            System.exit(-1);
        }
        //ConfFactory WorkFlow=new ConfFactory(new Path("/input.txt"),new Path("/output.txt"),TextInputFormat.class,VarLongWritable.class,Text.class,VarLongWritable.class,VectorWritable.class,SequenceFileOutputFormat.class);
        Job job = new Job();
        Job job1=new Job();
        job.setJarByClass(Driver1.class);
        job.setJobName("Max Temperature");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job,new Path(args[1]));
        job.setMapperClass(UserVectorMapper.class);
        job.setReducerClass(UserVectorReducer.class);
        job.setOutputKeyClass(VarLongWritable.class);
        job.setOutputValueClass(VectorWritable.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job1.setJarByClass(Driver1.class);
        //job.setJobName("Max Temperature");
        job1.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job1, new Path("output/part-r-00000"));
        FileOutputFormat.setOutputPath(job1,new Path(args[2]));
        job1.setMapperClass(ItemToItemPrefMapper.class);
        //job1.setReducerClass(UserVectorReducer.class);
        job1.setOutputKeyClass(VectorWritable.class);
        job1.setOutputValueClass(VectorWritable.class);
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        System.exit(job.waitForCompletion(true) && job1.waitForCompletion(true) ? 0:1);
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Driver1 driver = new Driver1();
        int exitCode = ToolRunner.run(driver, args);
        System.exit(exitCode);
    }
}
I am getting the following runtime log.
15/02/24 20:00:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/24 20:00:49 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/02/24 20:00:49 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/02/24 20:00:49 INFO input.FileInputFormat: Total input paths to process : 1
15/02/24 20:00:49 WARN snappy.LoadSnappy: Snappy native library not loaded
15/02/24 20:00:49 INFO mapred.JobClient: Running job: job_local1723586736_0001
15/02/24 20:00:49 INFO mapred.LocalJobRunner: Waiting for map tasks
15/02/24 20:00:49 INFO mapred.LocalJobRunner: Starting task: attempt_local1723586736_0001_m_000000_0
15/02/24 20:00:49 INFO util.ProcessTree: setsid exited with exit code 0
15/02/24 20:00:49 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin#1185f32
15/02/24 20:00:49 INFO mapred.MapTask: Processing split: file:/home/smaiti/workspace/recommendationsy/data.txt:0+1979173
15/02/24 20:00:50 INFO mapred.MapTask: io.sort.mb = 100
15/02/24 20:00:50 INFO mapred.MapTask: data buffer = 79691776/99614720
15/02/24 20:00:50 INFO mapred.MapTask: record buffer = 262144/327680
15/02/24 20:00:50 INFO mapred.JobClient: map 0% reduce 0%
15/02/24 20:00:50 INFO mapred.MapTask: Starting flush of map output
15/02/24 20:00:51 INFO mapred.MapTask: Finished spill 0
15/02/24 20:00:51 INFO mapred.Task: Task:attempt_local1723586736_0001_m_000000_0 is done. And is in the process of commiting
15/02/24 20:00:51 INFO mapred.LocalJobRunner:
15/02/24 20:00:51 INFO mapred.Task: Task 'attempt_local1723586736_0001_m_000000_0' done.
15/02/24 20:00:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local1723586736_0001_m_000000_0
15/02/24 20:00:51 INFO mapred.LocalJobRunner: Map task executor complete.
15/02/24 20:00:51 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin#9cce9
15/02/24 20:00:51 INFO mapred.LocalJobRunner:
15/02/24 20:00:51 INFO mapred.Merger: Merging 1 sorted segments
15/02/24 20:00:51 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 2074779 bytes
15/02/24 20:00:51 INFO mapred.LocalJobRunner:
15/02/24 20:00:51 INFO mapred.Task: Task:attempt_local1723586736_0001_r_000000_0 is done. And is in the process of commiting
15/02/24 20:00:51 INFO mapred.LocalJobRunner:
15/02/24 20:00:51 INFO mapred.Task: Task attempt_local1723586736_0001_r_000000_0 is allowed to commit now
15/02/24 20:00:51 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1723586736_0001_r_000000_0' to output
15/02/24 20:00:51 INFO mapred.LocalJobRunner: reduce > reduce
15/02/24 20:00:51 INFO mapred.Task: Task 'attempt_local1723586736_0001_r_000000_0' done.
15/02/24 20:00:51 INFO mapred.JobClient: map 100% reduce 100%
15/02/24 20:00:51 INFO mapred.JobClient: Job complete: job_local1723586736_0001
15/02/24 20:00:51 INFO mapred.JobClient: Counters: 20
15/02/24 20:00:51 INFO mapred.JobClient: File Output Format Counters
15/02/24 20:00:51 INFO mapred.JobClient: Bytes Written=1012481
15/02/24 20:00:51 INFO mapred.JobClient: File Input Format Counters
15/02/24 20:00:51 INFO mapred.JobClient: Bytes Read=1979173
15/02/24 20:00:51 INFO mapred.JobClient: FileSystemCounters
15/02/24 20:00:51 INFO mapred.JobClient: FILE_BYTES_READ=6033479
15/02/24 20:00:51 INFO mapred.JobClient: FILE_BYTES_WRITTEN=5264031
15/02/24 20:00:51 INFO mapred.JobClient: Map-Reduce Framework
15/02/24 20:00:51 INFO mapred.JobClient: Reduce input groups=943
15/02/24 20:00:51 INFO mapred.JobClient: Map output materialized bytes=2074783
15/02/24 20:00:51 INFO mapred.JobClient: Combine output records=0
15/02/24 20:00:51 INFO mapred.JobClient: Map input records=100000
15/02/24 20:00:51 INFO mapred.JobClient: Reduce shuffle bytes=0
15/02/24 20:00:51 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
15/02/24 20:00:51 INFO mapred.JobClient: Reduce output records=943
15/02/24 20:00:51 INFO mapred.JobClient: Spilled Records=200000
15/02/24 20:00:51 INFO mapred.JobClient: Map output bytes=1874777
15/02/24 20:00:51 INFO mapred.JobClient: Total committed heap usage (bytes)=415760384
15/02/24 20:00:51 INFO mapred.JobClient: CPU time spent (ms)=0
15/02/24 20:00:51 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
15/02/24 20:00:51 INFO mapred.JobClient: SPLIT_RAW_BYTES=118
15/02/24 20:00:51 INFO mapred.JobClient: Map output records=100000
15/02/24 20:00:51 INFO mapred.JobClient: Combine input records=0
15/02/24 20:00:51 INFO mapred.JobClient: Reduce input records=100000
15/02/24 20:00:51 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/02/24 20:00:51 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/02/24 20:00:51 INFO input.FileInputFormat: Total input paths to process : 1
15/02/24 20:00:51 INFO mapred.JobClient: Running job: job_local735350013_0002
15/02/24 20:00:51 INFO mapred.LocalJobRunner: Waiting for map tasks
15/02/24 20:00:51 INFO mapred.LocalJobRunner: Starting task: attempt_local735350013_0002_m_000000_0
15/02/24 20:00:51 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1a970
15/02/24 20:00:51 INFO mapred.MapTask: Processing split: file:/home/smaiti/workspace/recommendationsy/output/part-r-00000:0+1004621
15/02/24 20:00:51 INFO mapred.MapTask: io.sort.mb = 100
15/02/24 20:00:51 INFO mapred.MapTask: data buffer = 79691776/99614720
15/02/24 20:00:51 INFO mapred.MapTask: record buffer = 262144/327680
15/02/24 20:00:51 INFO mapred.MapTask: Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader@9cc591
java.lang.NullPointerException
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.close(SequenceFileRecordReader.java:101)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:496)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1776)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:778)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/02/24 20:00:51 INFO mapred.LocalJobRunner: Map task executor complete.
15/02/24 20:00:51 WARN mapred.LocalJobRunner: job_local735350013_0002
java.lang.Exception: java.lang.ClassCastException: class org.apache.mahout.math.VectorWritable
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ClassCastException: class org.apache.mahout.math.VectorWritable
at java.lang.Class.asSubclass(Class.java:3208)
at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:795)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:964)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:673)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/02/24 20:00:52 INFO mapred.JobClient: map 0% reduce 0%
15/02/24 20:00:52 INFO mapred.JobClient: Job complete: job_local735350013_0002
15/02/24 20:00:52 INFO mapred.JobClient: Counters: 0
The first exception that I am getting is this:
java.lang.NullPointerException
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.close(SequenceFileRecordReader.java:101)
Please help.
This is mainly because Hadoop gets confused while serializing the data between the two jobs.
Make sure to:
Set the input and output file format classes explicitly on both jobs, not only the first one.
Check that the InputFormat of the second job matches the OutputFormat of the first job.
It is quite possible that the intermediate file format is different from what the second job expects to read.
Maintain consistent file formats across your program, as in the sketch below.
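To make that concrete, here is a minimal driver sketch, not the poster's actual code: it assumes Hadoop's org.apache.hadoop.mapreduce API, the job names and paths are made up, and the identity Mapper/Reducer stand in for the real classes. The point it illustrates is that job 1 writes its intermediate data as a SequenceFile and job 2 reads it back with the matching SequenceFileInputFormat and the same key/value classes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ChainedJobsDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);        // plain text input
    Path intermediate = new Path(args[1]); // SequenceFile written by job 1, read by job 2
    Path output = new Path(args[2]);       // final text output

    // Job 1: identity map/reduce over text input; the important part is that the
    // output format and key/value classes are declared explicitly.
    Job job1 = new Job(conf, "stage-1");
    job1.setJarByClass(ChainedJobsDriver.class);
    job1.setMapperClass(Mapper.class);    // identity mapper (placeholder for the real one)
    job1.setReducerClass(Reducer.class);  // identity reducer (placeholder for the real one)
    job1.setOutputKeyClass(LongWritable.class);
    job1.setOutputValueClass(Text.class);
    job1.setOutputFormatClass(SequenceFileOutputFormat.class);
    FileInputFormat.addInputPath(job1, input);
    FileOutputFormat.setOutputPath(job1, intermediate);
    if (!job1.waitForCompletion(true)) System.exit(1);

    // Job 2: read the intermediate SequenceFile with the *matching* input format
    // and the same key/value classes that job 1 wrote, instead of relying on the
    // default TextInputFormat.
    Job job2 = new Job(conf, "stage-2");
    job2.setJarByClass(ChainedJobsDriver.class);
    job2.setMapperClass(Mapper.class);
    job2.setReducerClass(Reducer.class);
    job2.setInputFormatClass(SequenceFileInputFormat.class);
    job2.setMapOutputKeyClass(LongWritable.class);
    job2.setMapOutputValueClass(Text.class);
    job2.setOutputKeyClass(LongWritable.class);
    job2.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job2, intermediate);
    FileOutputFormat.setOutputPath(job2, output);
    System.exit(job2.waitForCompletion(true) ? 0 : 1);
  }
}

The exact classes differ in the recommender job above, but the pattern is the same: whatever format and types the first job writes, the second job must declare as its input.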

error in reducer function of hadoop multi node cluster

I followed the link here.
When I run the command from step 8 of the above tutorial:
hduser@ila:/usr/local/hadoop-0.22.0$ ./bin/hadoop jar hadoop-mapred-examples-0.22.0.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg6-out
it runs the map function correctly but not the reduce function, and gives the following error:
12/04/24 02:06:56 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/04/24 02:06:56 INFO input.FileInputFormat: Total input paths to process : 3
12/04/24 02:06:56 INFO mapreduce.JobSubmitter: number of splits:3
12/04/24 02:06:56 INFO mapreduce.Job: Running job: job_201204232307_0012
12/04/24 02:06:57 INFO mapreduce.Job: map 0% reduce 0%
12/04/24 02:07:06 INFO mapreduce.Job: map 33% reduce 0%
12/04/24 02:07:09 INFO mapreduce.Job: map 100% reduce 0%
12/04/24 02:07:15 INFO mapreduce.Job: map 100% reduce 11%
12/04/24 02:08:14 INFO mapreduce.Job: Task Id : attempt_201204232307_0012_r_000000_0, Status : FAILED
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
at org.apache.hadoop.mapred.Child$4.run(Child.java:223)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
at org.apache.hadoop.mapred.Child.main(Child.java:217)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:227)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)
How can I overcome this problem?
@Chris White: I faced a new problem that gives a new error when I run the above command:
hduser@vijay-P5E-VM-DO:/usr/local/hadoop-1.0.0$ ./bin/hadoop jar hadoop-examples-1.0.0.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg9-out
12/05/14 20:42:28 INFO mapred.JobClient: Cleaning up the staging area hdfs://master:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201205142041_0001
12/05/14 20:42:28 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://master:54310/user/hduser/gutenberg
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://master:54310/user/hduser/gutenberg
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:495)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Resources