Table not found exception (E0729) while executing Hive query from Oozie workflow - hadoop

Script_SusRes.q
select * from ufo_session_details limit 5
Workflow_SusRes.xml
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hive-wf">
<start to="hive-node"/>
<action name="hive-node">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<script>Script_SusRes.q</script>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
SusRes.properties
oozieClientUrl=http://zltv5636.vci.att.com:11000/oozie
nameNode=hdfs://zltv5635.vci.att.com:8020
jobTracker=zltv5636.vci.att.com:50300
queueName=default
userName=wfe
oozie.use.system.libpath=true
oozie.libpath=${nameNode}/tmp/nt283s
oozie.wf.application.path=/tmp/nt283s/workflow_SusRes.xml
Error Log
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher ends
stderr logs
Logging initialized using configuration in file:/opt/app/workload/hadoop/mapred/local/taskTracker/wfe/jobcache/job_201510130626_0451/attempt_201510130626_0451_m_000000_0/work/hive-log4j.properties
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'ufo_session_details'
Intercepting System.exit(10001)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
syslog logs
2015-11-03 00:26:20,599 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2015-11-03 00:26:20,902 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/distcache/8045442539840332845_326451332_1282624021/zltv5635.vci.att.com/tmp/nt283s/Script_SusRes.q <- /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/jobcache/job_201510130626_0451/attempt_201510130626_0451_m_000000_0/work/Script_SusRes.q
2015-11-03 00:26:20,911 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/distcache/3435440518513182209_187825668_1219418250/zltv5635.vci.att.com/tmp/nt283s/Script_SusRes.sql <- /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/jobcache/job_201510130626_0451/attempt_201510130626_0451_m_000000_0/work/Script_SusRes.sql
2015-11-03 00:26:20,913 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/distcache/-5883507949569818012_2054276612_1203833745/zltv5635.vci.att.com/tmp/nt283s/lib <- /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/jobcache/job_201510130626_0451/attempt_201510130626_0451_m_000000_0/work/lib
2015-11-03 00:26:20,916 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/distcache/6682880817470643170_1186359172_1225814386/zltv5635.vci.att.com/tmp/nt283s/workflow_SusRes.xml <- /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/jobcache/job_201510130626_0451/attempt_201510130626_0451_m_000000_0/work/workflow_SusRes.xml
2015-11-03 00:26:21,441 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2015-11-03 00:26:21,448 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@698cdde3
2015-11-03 00:26:21,602 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://zltv5635.vci.att.com:8020/user/wfe/oozie-oozi/0000088-151013062722898-oozie-oozi-W/hive-node--hive/input/dummy.txt:0+5
2015-11-03 00:26:21,630 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
2015-11-03 00:26:21,635 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3]
2015-11-03 00:26:21,652 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is available
2015-11-03 00:26:21,652 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library loaded
2015-11-03 00:26:21,663 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2015-11-03 00:26:22,654 INFO SessionState:
Logging initialized using configuration in file:/opt/app/workload/hadoop/mapred/local/taskTracker/wfe/jobcache/job_201510130626_0451/attempt_201510130626_0451_m_000000_0/work/hive-log4j.properties
2015-11-03 00:26:22,910 INFO org.apache.hadoop.hive.ql.Driver: <PERFLOG method=Driver.run>
2015-11-03 00:26:22,911 INFO org.apache.hadoop.hive.ql.Driver: <PERFLOG method=TimeToSubmit>
2015-11-03 00:26:22,912 INFO org.apache.hadoop.hive.ql.Driver: <PERFLOG method=compile>
2015-11-03 00:26:22,998 INFO hive.ql.parse.ParseDriver: Parsing command: select * from ufo_session_details limit 5
2015-11-03 00:26:23,618 INFO hive.ql.parse.ParseDriver: Parse Completed
2015-11-03 00:26:23,799 INFO org.apache.hadoop.hive.ql.parse.SemanticAnalyzer: Starting Semantic Analysis
2015-11-03 00:26:23,802 INFO org.apache.hadoop.hive.ql.parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
2015-11-03 00:26:23,802 INFO org.apache.hadoop.hive.ql.parse.SemanticAnalyzer: Get metadata for source tables
2015-11-03 00:26:23,990 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2015-11-03 00:26:24,031 INFO org.apache.hadoop.hive.metastore.ObjectStore: ObjectStore, initialize called
2015-11-03 00:26:24,328 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
2015-11-03 00:26:28,112 INFO org.apache.hadoop.hive.metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2015-11-03 00:26:28,169 INFO org.apache.hadoop.hive.metastore.ObjectStore: Initialized ObjectStore
2015-11-03 00:26:30,767 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: 0: get_table : db=default tbl=ufo_session_details
2015-11-03 00:26:30,768 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: ugi=wfe ip=unknown-ip-addr cmd=get_table : db=default tbl=ufo_session_details
2015-11-03 00:26:30,781 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-11-03 00:26:30,782 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-11-03 00:26:33,319 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: NoSuchObjectException(message:default.ufo_session_details table not found)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1380)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
at com.sun.proxy.$Proxy11.get_table(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:836)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
at com.sun.proxy.$Proxy12.getTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:945)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:887)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1083)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8680)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:712)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:261)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:238)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

1677 [main] INFO org.apache.hadoop.hive.ql.Driver -
1679 [main] INFO org.apache.hadoop.hive.ql.Driver -
1680 [main] INFO org.apache.hadoop.hive.ql.Driver -
1771 [main] INFO hive.ql.parse.ParseDriver - Parsing command: select * from ufo_session_master limit 5
2512 [main] INFO hive.ql.parse.ParseDriver - Parse Completed
2683 [main] INFO org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - Starting Semantic Analysis
2686 [main] INFO org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - Completed phase 1 of Semantic Analysis
2686 [main] INFO org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - Get metadata for source tables
2831 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://zltv5636.vci.att.com:9083
2952 [main] WARN hive.metastore - Failed to connect to the MetaStore Server...
2952 [main] INFO hive.metastore - Waiting 1 seconds before next connection attempt.
3952 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://zltv5636.vci.att.com:9083
3959 [main] WARN hive.metastore - Failed to connect to the MetaStore Server...
3960 [main] INFO hive.metastore - Waiting 1 seconds before next connection attempt.
4960 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://zltv5636.vci.att.com:9083
4967 [main] WARN hive.metastore - Failed to connect to the MetaStore Server...
4967 [main] INFO hive.metastore - Waiting 1 seconds before next connection attempt.
5978 [main] ERROR org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table ufo_session_master
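In both runs above, the Hive client inside the Oozie launcher never reaches the metastore that actually holds the table: the first syslog shows it opening a local embedded ObjectStore before failing with NoSuchObjectException, and the second shows it retrying and giving up on thrift://zltv5636.vci.att.com:9083. A common fix is to ship the cluster's hive-site.xml with the workflow and reference it from the action. A sketch, assuming a valid hive-site.xml (with hive.metastore.uris set) has been uploaded next to the workflow on HDFS:
<action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- assumption: hive-site.xml copied into the workflow directory on HDFS -->
        <job-xml>hive-site.xml</job-xml>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>default</value>
            </property>
        </configuration>
        <script>Script_SusRes.q</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>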

Related

Execute hive query cause yarn resource manager to throw file does not exist exception

I'm configuring Hive 3.1.0 to work with Hadoop 3.0.0.
This error is thrown almost immediately when I submit a simple query in Beeline that triggers a MapReduce job:
0: jdbc:hive2://> select count(*) from airlinedata;
18/10/11 10:24:45 [HiveServer2-Background-Pool: Thread-124]: WARN ql.Driver: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = UUT81HC_20181011102444_2df01ff5-ca05-403c-b0e1-15f8f7715dc7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
2018-10-11 10:24:45,510 INFO [HiveServer2-Background-Pool: Thread-124] client.RMProxy (RMProxy.java:newProxyInstance(133)) - Connecting to ResourceManager at /10.184.153.232:8032
2018-10-11 10:24:45,555 INFO [HiveServer2-Background-Pool: Thread-124] client.RMProxy (RMProxy.java:newProxyInstance(133)) - Connecting to ResourceManager at /10.184.153.232:8032
18/10/11 10:24:45 [HiveServer2-Background-Pool: Thread-124]: WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN : Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:73)
at org.apache.hadoop.mapreduce.TypeConverter.toYarn(TypeConverter.java:78)
at org.apache.hadoop.mapred.ClientServiceDelegate.<init>(ClientServiceDelegate.java:120)
at org.apache.hadoop.mapred.ClientCache.getClient(ClientCache.java:68)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:343)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:254)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:423)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:149)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:70)
... 40 more
Caused by: java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder.setAppId(Lorg/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto;)Lorg/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder; #36: invokevirtual
Reason:
Type 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' (current frame, stack[1]) is not assignable to 'com/google/protobuf/GeneratedMessage'
Current Frame:
bci: #36
flags: { }
locals: { 'org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
stack: { 'com/google/protobuf/SingleFieldBuilder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
Bytecode:
0x0000000: 2ab4 0011 c700 1b2b c700 0bbb 002f 59b7
0x0000010: 0030 bf2a 2bb5 000a 2ab6 0031 a700 0c2a
0x0000020: b400 112b b600 3257 2a59 b400 1304 80b5
0x0000030: 0013 2ab0
Stackmap Table:
same_frame(#19)
same_frame(#31)
same_frame(#40)
at org.apache.hadoop.mapreduce.v2.proto.MRProtos$JobIdProto.newBuilder(MRProtos.java:1017)
at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.JobIdPBImpl.<init>(JobIdPBImpl.java:37)
... 45 more
yarn resourcemanager stacktrace
2018-10-11 10:24:49,896 INFO rmapp.RMAppImpl: application_1539226955170_0002 State change from ACCEPTED to FINAL_SAVING on event = ATTEMPT_FAILED
2018-10-11 10:24:49,896 INFO recovery.RMStateStore: Updating info for app: application_1539226955170_0002
2018-10-11 10:24:49,897 INFO capacity.CapacityScheduler: Application Attempt appattempt_1539226955170_0002_000002 is done. finalState=FAILED
2018-10-11 10:24:49,897 INFO rmapp.RMAppImpl: Application application_1539226955170_0002 failed 2 times due to AM Container for appattempt_1539226955170_0002_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2018-10-11 10:24:49.876]File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
java.io.FileNotFoundException: File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1495)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1488)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1503)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:366)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:364)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:241)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:234)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:222)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://HC-UT40048C.apac.com:8088/cluster/app/application_1539226955170_0002 Then click on links to logs of each attempt.
. Failing the application.
2018-10-11 10:24:49,897 INFO scheduler.AppSchedulingInfo: Application application_1539226955170_0002 requests cleared
2018-10-11 10:24:49,897 INFO rmapp.RMAppImpl: application_1539226955170_0002 State change from FINAL_SAVING to FAILED on event = APP_UPDATE_SAVED
2018-10-11 10:24:49,898 INFO capacity.LeafQueue: Application removed - appId: application_1539226955170_0002 user: UUT81HC queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2018-10-11 10:24:49,898 WARN resourcemanager.RMAuditLogger: USER=UUT81HC OPERATION=Application Finished - Failed
TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1539226955170_0002 failed 2 times due to AM Container for appattempt_1539226955170_0002_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2018-10-11 10:24:49.876]File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
java.io.FileNotFoundException: File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1495)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1488)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1503)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:366)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:364)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:241)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:234)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:222)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://HC-UT40048C.apac.com:8088/cluster/app/application_1539226955170_0002 Then click on links to logs of each attempt.
. Failing the application. APPID=application_1539226955170_0002
2018-10-11 10:24:49,898 INFO capacity.ParentQueue: Application removed - appId: application_1539226955170_0002 user: UUT81HC leaf-queue of parent: root #applications: 0
2018-10-11 10:24:49,899 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1539226955170_0002,name=select count(*) from airlinedata (Stage-1),user=UUT81HC,queue=default,state=FAILED,trackingUrl=http://HC-UT40048C.apac.com:8088/cluster/app/application_1539226955170_0002,appMasterHost=N/A,submitTime=1539228287412,startTime=1539228287413,finishTime=1539228289896,finalStatus=FAILED,memorySeconds=1482,vcoreSeconds=0,preemptedMemorySeconds=0,preemptedVcoreSeconds=0,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=,applicationType=MAPREDUCE,resourceSeconds=1482 MB-seconds\, 0 vcore-seconds,preemptedResourceSeconds=0 MB-seconds\, 0 vcore-seconds
After examining how Hive executes the MapReduce job on YARN, I found that it first creates map.xml and reduce.xml in /tmp with permission drwx------ (only the owner can use it):
2018-10-11 10:24:45,133 INFO hdfs.StateChange: BLOCK* allocate blk_1073742318_1495, replicas=10.184.153.232:9866 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/map.xml
2018-10-11 10:24:45,225 INFO hdfs.StateChange: DIR* completeFile: /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/map.xml is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:45,248 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/map.xml
2018-10-11 10:24:45,294 INFO hdfs.StateChange: BLOCK* allocate blk_1073742319_1496, replicas=10.184.153.232:9866 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
2018-10-11 10:24:45,411 INFO hdfs.StateChange: DIR* completeFile: /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:45,437 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
2018-10-11 10:24:45,772 INFO hdfs.StateChange: BLOCK* allocate blk_1073742320_1497, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.jar
2018-10-11 10:24:46,438 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.jar is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:46,463 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.jar
2018-10-11 10:24:46,618 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.split
2018-10-11 10:24:46,639 INFO hdfs.StateChange: BLOCK* allocate blk_1073742321_1498, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.split
2018-10-11 10:24:46,706 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.split is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:46,791 INFO hdfs.StateChange: BLOCK* allocate blk_1073742322_1499, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.splitmetainfo
2018-10-11 10:24:46,870 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.splitmetainfo is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:46,971 INFO hdfs.StateChange: BLOCK* allocate blk_1073742323_1500, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.xml
2018-10-11 10:24:47,370 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.xml is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:32:15,741 INFO blockmanagement.BlockManager: StorageInfo TreeSet fill ratio DS-d4c2a5a0-435d-4b44-b408-3cd04587cd09 : 1.0
But somehow YARN can't read those files when executing the job and throws file-does-not-exist. I did set permission 777 on /tmp, but these files are created by Hive itself during execution, so I can't do anything about them.
I suspect this problem is related to the user or permissions when using Hive on Hadoop. What should I do?
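The java.lang.VerifyError above ("ApplicationIdProto ... is not assignable to com/google/protobuf/GeneratedMessage") usually points to mismatched Hadoop/protobuf jars on the classpath rather than a /tmp permission problem; Hive 3.1.0 is built against the Hadoop 3.1 line, so pairing it with Hadoop 3.0.0 can produce exactly this. A hedged first check, assuming standard HADOOP_HOME/HIVE_HOME layouts, is to compare the protobuf and MapReduce client jars each side ships:
# differing versions between the two trees usually explain the VerifyError
find "$HADOOP_HOME" "$HIVE_HOME" \( -name 'protobuf-java-*.jar' -o -name 'hadoop-mapreduce-client-*.jar' \) | sort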

Does sqoop spill temporary data to disk?

As I understand Sqoop, it launches a few mappers on different data nodes, each making a JDBC connection to the RDBMS. Once the connection is established, data is transferred to HDFS.
Just trying to understand: does a Sqoop mapper spill data temporarily to disk (on the data node)? I know spilling happens in MapReduce, but I'm not sure about a Sqoop job.
It seems sqoop-import runs as a map-only job and doesn't spill, while sqoop-merge runs as full map-reduce and does spill. You can check this on the job tracker during a sqoop import run.
Have a look at this part of a sqoop import log; it does not spill, it just fetches and writes to HDFS:
INFO [main] ... mapreduce.db.DataDrivenDBRecordReader: Using query: SELECT...
[main] mapreduce.db.DBRecordReader: Executing query: SELECT...
INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
INFO [Thread-16] ...mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1489705733959_2462784_m_000000_0 is done. And is in the process of committing
INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1489705733959_2462784_m_000000_0' to hdfs://
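If you want to confirm this on your own jobs, one way (a sketch, assuming YARN log aggregation is enabled; the application id here is inferred from the attempt id above) is to grep the aggregated task logs for spill activity:
# no matches for a plain sqoop-import; a sqoop-merge prints "Spilling map output"
yarn logs -applicationId application_1489705733959_2462784 | grep -i 'spilling map output'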
Have a look at this sqoop-merge log (some rows skipped); it spills to disk (note Spilling map output in the log):
INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://bla-bla/part-m-00000:0+48322717
...
INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
...
INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 1024
INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 751619264
INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 1073741824
INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 268435452; length = 67108864
INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
INFO [main] com.pepperdata.supervisor.agent.resource.r: Datanode bla-bla is LOCAL.
INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
...
INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 184775274; bufvoid = 1073741824
INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 268435452(1073741808); kvend = 267347800(1069391200); length = 1087653/67108864
INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
[main] org.apache.hadoop.mapred.MapTask: Finished spill 0
...Task:attempt_1489705733959_2479291_m_000000_0 is done. And is in the process of committing

hive 1.2.1 create table failed with MetaException

When I create a table from my CLI, I get the message below. Help!
hive> create table a (x int);
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
What should I do?
Here is my version information:
Hive version: 1.2.1
Mac OS X 10.10.3
Hadoop 2.6.0
2015-07-17 10:39:37,384 ERROR [main]: exec.DDLTask (DDLTask.java:failed(520)) - org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:720)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4135)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.reconnect(HiveMetaStoreClient.java:308)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:148)
at com.sun.proxy.$Proxy8.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:714)
... 21 more
2015-07-17 10:39:37,385 ERROR [main]: ql.Driver (SessionState.java:printError(960)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
2015-07-17 10:39:37,385 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) -
2015-07-17 10:39:37,386 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) -
2015-07-17 10:39:37,386 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) -
2015-07-17 10:39:37,388 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) -
2015-07-17 10:39:37,388 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) -
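This particular MetaException is frequently a symptom of the metastore schema being absent or out of sync with the Hive 1.2.1 client (the client refuses to retry direct, embedded DB connections and fails the reconnect instead). A hedged first step, assuming the default embedded Derby metastore (substitute mysql, postgres, etc. for your setup):
# inspect, then initialize, the metastore schema that ships with Hive 1.2.1
$HIVE_HOME/bin/schematool -dbType derby -info
$HIVE_HOME/bin/schematool -dbType derby -initSchema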

Getting error while running query on hive over tez

Getting an error while running a query on Hive over Tez. As per the logs, Hive is failing while copying the Tez jars to an HDFS location at the start of the Tez session. Below is the complete log obtained from the Hive log file:
2015-06-19 01:23:52,289 INFO [HiveServer2-Background-Pool: Thread-41]: ql.Driver (SessionState.java:printInfo(852)) - Query ID = saurabh_20150619012323_f52f1d6c-2adb-4edc-8ba4-b64d7d898325
2015-06-19 01:23:52,289 INFO [HiveServer2-Background-Pool: Thread-41]: ql.Driver (SessionState.java:printInfo(852)) - Total jobs = 1
2015-06-19 01:23:52,289 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=TimeToSubmit start=1434657232288 end=1434657232289 duration=1 from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:23:52,290 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:23:52,290 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=task.TEZ.Stage-1 from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:23:52,302 INFO [HiveServer2-Background-Pool: Thread-41]: ql.Driver (SessionState.java:printInfo(852)) - Launching Job 1 out of 1
2015-06-19 01:23:52,302 INFO [HiveServer2-Background-Pool: Thread-41]: ql.Driver (Driver.java:launchTask(1630)) - Starting task [Stage-1:MAPRED] in parallel
2015-06-19 01:23:52,312 INFO [Thread-21]: session.SessionState (SessionState.java:start(488)) - No Tez session required at this point. hive.execution.engine=mr.
2015-06-19 01:23:52,314 INFO [Thread-21]: tez.TezSessionPoolManager (TezSessionPoolManager.java:getSession(125)) - QueueName: null nonDefaultUser: true defaultQueuePool: null blockingQueueLength: -1
2015-06-19 01:23:52,315 INFO [Thread-21]: tez.TezSessionPoolManager (TezSessionPoolManager.java:getNewSessionState(154)) - Created a new session for queue: null session id: 85d83746-a48e-419e-a7ca-8c98faf173ea
2015-06-19 01:23:52,380 INFO [Thread-21]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
2015-06-19 01:23:52,412 INFO [Thread-21]: ql.Context (Context.java:getMRScratchDir(328)) - New scratch dir is hdfs://localhost:9000/tmp/hive/saurabh/e5a701ae-242d-488f-beec-cf18878becdc/hive_2015-06-19_01-23-49_794_2167174123575230985-2
2015-06-19 01:23:52,420 INFO [Thread-21]: exec.Task (TezTask.java:updateSession(233)) - Tez session hasn't been created yet. Opening session
2015-06-19 01:23:52,420 INFO [Thread-21]: tez.TezSessionState (TezSessionState.java:open(142)) - User of session id 85d83746-a48e-419e-a7ca-8c98faf173ea is saurabh
2015-06-19 01:23:52,433 INFO [Thread-21]: tez.DagUtils (DagUtils.java:localizeResource(950)) - Localizing resource because it does not exist: file:/usr/lib/tez/* to dest: hdfs://localhost:9000/tmp/hive/saurabh/_tez_session_dir/85d83746-a48e-419e-a7ca-8c98faf173ea/*
2015-06-19 01:23:52,433 INFO [Thread-21]: tez.DagUtils (DagUtils.java:localizeResource(954)) - Looks like another thread is writing the same file will wait.
2015-06-19 01:23:52,433 INFO [Thread-21]: tez.DagUtils (DagUtils.java:localizeResource(961)) - Number of wait attempts: 5. Wait interval: 5000
2015-06-19 01:24:17,449 ERROR [Thread-21]: tez.DagUtils (DagUtils.java:localizeResource(977)) - Could not find the jar that was being uploaded
2015-06-19 01:24:17,451 ERROR [Thread-21]: exec.Task (TezTask.java:execute(184)) - Failed to execute tez graph.
java.io.IOException: Previous writer likely failed to write hdfs://localhost:9000/tmp/hive/saurabh/_tez_session_dir/85d83746-a48e-419e-a7ca-8c98faf173ea/*. Failing because I am unlikely to write too.
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeResource(DagUtils.java:978)
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.addTempResources(DagUtils.java:859)
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeTempFilesFromConf(DagUtils.java:802)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.refreshLocalResourcesFromConf(TezSessionState.java:228)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:154)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:234)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
2015-06-19 01:24:18,329 ERROR [HiveServer2-Background-Pool: Thread-41]: ql.Driver (SessionState.java:printError(861)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
2015-06-19 01:24:18,329 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=Driver.execute start=1434657232288 end=1434657258329 duration=26041 from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:24:18,329 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:24:18,329 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=releaseLocks start=1434657258329 end=1434657258329 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:24:18,333 ERROR [HiveServer2-Background-Pool: Thread-41]: operation.Operation (SQLOperation.java:run(200)) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:147)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-06-19 01:24:18,342 INFO [HiveServer2-Handler-Pool: Thread-29]: exec.ListSinkOperator (Operator.java:close(595)) - 40 finished. closing...
2015-06-19 01:24:18,343 INFO [HiveServer2-Handler-Pool: Thread-29]: exec.ListSinkOperator (Operator.java:close(613)) - 40 Close done
2015-06-19 01:24:18,393 INFO [HiveServer2-Handler-Pool: Thread-29]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:24:18,394 INFO [HiveServer2-Handler-Pool: Thread-29]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=releaseLocks start=1434657258393 end=1434657258394 duration=1 from=org.apache.hadoop.hive.ql.Driver>
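The telling lines are Hive localizing the literal glob file:/usr/lib/tez/* into HDFS and then timing out waiting for a "previous writer" that never existed. Tez expects tez.lib.uris to point at its jars (or tarball) already sitting in HDFS, not at a local wildcard. A sketch of the usual remedy in tez-site.xml, assuming the Tez tarball has been uploaded to /apps/tez (an illustrative path) on the same namenode used above:
<property>
  <!-- assumption: uploaded first with: hdfs dfs -put /usr/lib/tez/share/tez.tar.gz /apps/tez/ -->
  <name>tez.lib.uris</name>
  <value>hdfs://localhost:9000/apps/tez/tez.tar.gz</value>
</property>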

Pig "Max" command for pig-0.12.1 and pig-0.13.0 with Hadoop-2.4.0

I have a Pig script I got from Hortonworks that works fine with pig-0.9.2.15 on Hadoop-1.0.3.16. But when I run it with pig-0.12.1 (recompiled with -Dhadoopversion=23) or pig-0.13.0 on Hadoop-2.4.0, it won't work.
It seems the following line is where the problem is.
max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
Here's the whole script.
batting = load 'pig_data/Batting.csv' using PigStorage(',');
runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
grp_data = GROUP runs by (year);
max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;
STORE join_data INTO './join_data';
And here's the hadoop error info:
2014-07-29 18:03:02,957 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: grp_data: Local Rearrange[tuple]{bytearray}(false) - scope-34 Operator Key: scope-34): org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error executing an algebraic function
2014-07-29 18:03:02,958 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
How can I fix this if I still want to use the "MAX" function? Thank you!
Here's the complete information:
14/07/29 17:50:11 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/07/29 17:50:11 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/07/29 17:50:11 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-07-29 17:50:12,104 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-07-29 17:50:12,104 [main] INFO org.apache.pig.Main - Logging error messages to: /root/hadooptestingsuite/scripts/tests/pig_test/hadoop2/pig_1406677812103.log
2014-07-29 17:50:13,050 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-07-29 17:50:13,415 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-07-29 17:50:13,415 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:13,415 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://namenode.cmda.hadoop.com:8020
2014-07-29 17:50:14,302 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: namenode.cmda.hadoop.com:8021
2014-07-29 17:50:14,990 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:15,570 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:15,665 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
2014-07-29 17:50:15,705 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2014-07-29 17:50:15,791 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN,GROUP_BY
2014-07-29 17:50:15,873 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-07-29 17:50:16,319 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-07-29 17:50:16,377 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2014-07-29 17:50:16,410 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POPackage(JoinPackager)
2014-07-29 17:50:16,417 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2014-07-29 17:50:16,418 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees.
2014-07-29 17:50:16,418 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 out of total 3 MR operators.
2014-07-29 17:50:16,418 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2014-07-29 17:50:16,493 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:16,575 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at namenode.cmda.hadoop.com/10.0.3.1:8050
2014-07-29 17:50:16,973 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2014-07-29 17:50:17,007 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2014-07-29 17:50:17,007 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-07-29 17:50:17,007 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2014-07-29 17:50:17,020 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-07-29 17:50:17,020 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-07-29 17:50:17,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=6398990
2014-07-29 17:50:17,067 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-07-29 17:50:17,067 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2014-07-29 17:50:17,068 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2014-07-29 17:50:17,068 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2337803902169382273.jar
2014-07-29 17:50:20,957 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2337803902169382273.jar created
2014-07-29 17:50:20,957 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2014-07-29 17:50:21,001 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2014-07-29 17:50:21,036 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-07-29 17:50:21,036 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2014-07-29 17:50:21,046 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-07-29 17:50:21,310 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-07-29 17:50:21,311 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2014-07-29 17:50:21,332 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at namenode.cmda.hadoop.com/10.0.3.1:8050
2014-07-29 17:50:21,366 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:22,606 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-07-29 17:50:22,606 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-07-29 17:50:22,629 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-07-29 17:50:22,729 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2014-07-29 17:50:22,745 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:23,026 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1406677482986_0003
2014-07-29 17:50:23,258 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1406677482986_0003
2014-07-29 17:50:23,340 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://namenode.cmda.hadoop.com:8088/proxy/application_1406677482986_0003/
2014-07-29 17:50:23,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1406677482986_0003
2014-07-29 17:50:23,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases batting,grp_data,max_runs,runs
2014-07-29 17:50:23,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: batting[3,10],runs[5,7],max_runs[7,11],grp_data[6,11] C: max_runs[7,11],grp_data[6,11] R: max_runs[7,11]
2014-07-29 17:50:23,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://namenode.cmda.hadoop.com:50030/jobdetails.jsp?jobid=job_1406677482986_0003
2014-07-29 17:50:23,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-07-29 17:50:23,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1406677482986_0003]
2014-07-29 17:51:15,564 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2014-07-29 17:51:15,564 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1406677482986_0003]
2014-07-29 17:51:18,582 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2014-07-29 17:51:18,582 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1406677482986_0003 has failed! Stop running all dependent jobs
2014-07-29 17:51:18,582 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-07-29 17:51:18,825 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: grp_data: Local Rearrange[tuple]{bytearray}(false) - scope-73 Operator Key: scope-73): org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error executing an algebraic function
2014-07-29 17:51:18,825 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2014-07-29 17:51:18,826 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.4.0 0.13.0 root 2014-07-29 17:50:16 2014-07-29 17:51:18 HASH_JOIN,GROUP_BY

Failed!

Failed Jobs:
JobId Alias Feature Message Outputs
job_1406677482986_0003 batting,grp_data,max_runs,runs MULTI_QUERY,COMBINER Message: Job failed!

Input(s):
Failed to read data from "hdfs://namenode.cmda.hadoop.com:8020/user/root/pig_data/Batting.csv"

Output(s):

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG: job_1406677482986_0003 -> null, null

2014-07-29 17:51:18,826 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-07-29 17:51:18,827 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2106: Error executing an algebraic function
Details at logfile: /root/hadooptestingsuite/scripts/tests/pig_test/hadoop2/pig_1406677812103.log
2014-07-29 17:51:18,828 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job scope-58 failed, hadoop does not return any error message
Details at logfile: /root/hadooptestingsuite/scripts/tests/pig_test/hadoop2/pig_1406677812103.log
Try casting the result of the MAX function:
max_runs = FOREACH grp_data GENERATE group as grp, (int)MAX(runs.runs) as max_runs;
Hope it will work.
You should use data types in your load statement.
runs = FOREACH batting GENERATE $0 as playerID:chararray, $1 as year:int, $8 as runs:int;
If this doesn't help for some reason, try explicit casting.
max_runs = FOREACH grp_data GENERATE group as grp, MAX((int)runs.runs) as max_runs;
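For context, the untyped load is what appears to trip MAX here: without a declared schema every field defaults to bytearray, which you can confirm in grunt (a quick check, reusing the aliases from the script above):
grunt> DESCRIBE runs;
runs: {playerID: bytearray,year: bytearray,runs: bytearray}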
Thanks to both @BigData and @Mikko Kupsu for the hint. The issue does indeed have something to do with datatype casting.
After specifying the data type of each column as follows, everything runs great.
batting =
LOAD '/user/root/pig_data/Batting.csv' USING PigStorage(',')
AS (playerID: CHARARRAY, yearID: INT, stint: INT, teamID: CHARARRAY, lgID: CHARARRAY,
G: INT, G_batting: INT, AB: INT, R: INT, H: INT, two_B: INT, three_B: INT, HR: INT, RBI: INT,
SB: INT, CS: INT, BB: INT, SO: INT, IBB: INT, HBP: INT, SH: INT, SF: INT, GIDP: INT, G_old: INT);
