any hive query except select * hangs - hadoop
I just installed Hive on my Hadoop cluster and loaded my data into a Hive table. When I issue a plain select * it works perfectly, but when I issue

select * from table where column1 in (select max(column1) from table);

it freezes. Please help me.

Here is my Hive log:
2017-02-17 07:42:28,116 INFO [main]: SessionState (SessionState.java:printInfo(951)) -
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
2017-02-17 07:42:28,438 WARN [main]: util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-02-17 07:42:28,560 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(589)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2017-02-17 07:42:28,710 INFO [main]: metastore.ObjectStore (ObjectStore.java:initialize(289)) - ObjectStore, initialize called
2017-02-17 07:42:30,831 INFO [main]: metastore.ObjectStore (ObjectStore.java:getPMF(370)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2017-02-17 07:42:33,354 INFO [main]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(139)) - Using direct SQL, underlying DB is DERBY
.....
2017-02-17 07:43:04,861 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
2017-02-17 07:43:04,927 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
2017-02-17 07:43:04,953 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: select consume_date,hour_id,fromdate,company_name,b03 from consumes where b03 in (select max(b03) from consumes)
2017-02-17 07:43:05,527 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(209)) - Parse Completed
2017-02-17 07:43:05,528 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=parse start=1487346184927 end=1487346185528 duration=601 from=org.apache.hadoop.hive.ql.Driver>
2017-02-17 07:43:05,530 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
2017-02-17 07:43:05,576 INFO [main]: parse.CalcitePlanner (SemanticAnalyzer.java:analyzeInternal(10127)) - Starting Semantic Analysis
2017-02-17 07:43:05,579 INFO [main]: parse.CalcitePlanner (SemanticAnalyzer.java:genResolvedParseTree(10074)) - Completed phase 1 of Semantic Analysis
2017-02-17 07:43:05,579 INFO [main]: parse.CalcitePlanner (SemanticAnalyzer.java:getMetaData(1552)) - Get metadata for source tables
2017-02-17 07:43:05,579 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(746)) - 0: get_table : db=default tbl=consumes
2017-02-17 07:43:05,580 INFO [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(371)) - ugi=linux ip=unknown-ip-addr cmd=get_table : db=default tbl=consumes
2017-02-17 07:43:06,076 INFO [main]: parse.CalcitePlanner (SemanticAnalyzer.java:getMetaData(1704)) - Get metadata for subqueries
2017-02-17 07:43:06,092 INFO [main]: parse.CalcitePlanner (SemanticAnalyzer.java:getMetaData(1728)) - Get metadata for destination tables
2017-02-17 07:43:06,096 ERROR [main]: hdfs.KeyProviderCache (KeyProviderCache.java:createKeyProviderURI(87)) - Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
2017-02-17 07:43:06,129 INFO [main]: ql.Context (Context.java:getMRScratchDir(330)) - New scratch dir is hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1
2017-02-17 07:43:06,131 INFO [main]: parse.CalcitePlanner (SemanticAnalyzer.java:genResolvedParseTree(10078)) - Completed getting MetaData in Semantic Analysis
2017-02-17 07:43:06,252 INFO [main]: parse.BaseSemanticAnalyzer (CalcitePlanner.java:canCBOHandleAst(388)) - Not invoking CBO because the statement has too few joins
2017-02-17 07:43:06,450 INFO [main]: parse.CalcitePlanner (SemanticAnalyzer.java:getMetaData(1552)) - Get metadata for source tables
2017-02-17 07:43:06,451 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(746)) - 0: get_table : db=default tbl=consumes
2017-02-17 07:43:06,454 INFO [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(371)) - ugi=linux ip=unknown-ip-addr cmd=get_table : db=default tbl=consumes
2017-02-17 07:43:06,488 INFO [main]: parse.CalcitePlanner (SemanticAnalyzer.java:getMetaData(1704)) - Get metadata for subqueries
2017-02-17 07:43:06,488 INFO [main]: parse.CalcitePlanner (SemanticAnalyzer.java:getMetaData(1728)) - Get metadata for destination tables
2017-02-17 07:43:06,631 INFO [main]: common.FileUtils (FileUtils.java:mkdir(501)) - Creating directory if it doesn't exist: hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1/-mr-10000/.hive-staging_hive_2017-02-17_07-43-04_926_1561320960043112851-1
2017-02-17 07:43:06,759 INFO [main]: parse.CalcitePlanner (SemanticAnalyzer.java:genFileSinkPlan(6653)) - Set stats collection dir : hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1/-mr-10000/.hive-staging_hive_2017-02-17_07-43-04_926_1561320960043112851-1/-ext-10002
2017-02-17 07:43:06,839 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(655)) - Processing for FS(16)
2017-02-17 07:43:06,840 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(655)) - Processing for SEL(15)
2017-02-17 07:43:06,841 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(457)) - Processing for JOIN(13)
2017-02-17 07:43:06,841 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(655)) - Processing for RS(10)
2017-02-17 07:43:06,841 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(402)) - Processing for FIL(9)
2017-02-17 07:43:06,846 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(707)) - Pushdown Predicates of FIL For Alias : consumes
2017-02-17 07:43:06,846 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(710)) - b03 is not null
2017-02-17 07:43:06,847 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(382)) - Processing for TS(0)
2017-02-17 07:43:06,847 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(707)) - Pushdown Predicates of TS For Alias : consumes
2017-02-17 07:43:06,847 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(710)) - b03 is not null
2017-02-17 07:43:06,849 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(655)) - Processing for RS(12)
2017-02-17 07:43:06,849 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(402)) - Processing for FIL(11)
2017-02-17 07:43:06,850 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(707)) - Pushdown Predicates of FIL For Alias :
2017-02-17 07:43:06,850 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(710)) - _col0 is not null
2017-02-17 07:43:06,850 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(655)) - Processing for GBY(8)
2017-02-17 07:43:06,851 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(707)) - Pushdown Predicates of GBY For Alias :
2017-02-17 07:43:06,851 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(710)) - _col0 is not null
2017-02-17 07:43:06,851 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(655)) - Processing for SEL(7)
2017-02-17 07:43:06,851 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(707)) - Pushdown Predicates of SEL For Alias : sq_1
2017-02-17 07:43:06,851 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(710)) - _col0 is not null
2017-02-17 07:43:06,852 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(655)) - Processing for SEL(6)
2017-02-17 07:43:06,852 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(707)) - Pushdown Predicates of SEL For Alias : sq_1
2017-02-17 07:43:06,852 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:logExpr(710)) - _col0 is not null
2017-02-17 07:43:06,852 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(655)) - Processing for GBY(5)
2017-02-17 07:43:06,853 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(655)) - Processing for RS(4)
2017-02-17 07:43:06,853 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(655)) - Processing for GBY(3)
2017-02-17 07:43:06,853 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(655)) - Processing for SEL(2)
2017-02-17 07:43:06,853 INFO [main]: ppd.OpProcFactory (OpProcFactory.java:process(382)) - Processing for TS(1)
2017-02-17 07:43:06,863 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
2017-02-17 07:43:06,863 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=partition-retrieving start=1487346186863 end=1487346186863 duration=0 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
2017-02-17 07:43:06,880 INFO [main]: optimizer.ColumnPrunerProcFactory (ColumnPrunerProcFactory.java:pruneJoinOperator(975)) - JOIN 13 oldExprs: {0=[Column[VALUE._col0], Column[VALUE._col1], Column[VALUE._col2], Column[VALUE._col3], Column[VALUE._col4], Column[VALUE._col5], Column[KEY.reducesinkkey0], Column[VALUE._col6], Column[VALUE._col7], Column[VALUE._col8], Column[VALUE._col9], Column[VALUE._col10], Column[VALUE._col11], Column[VALUE._col12]], 1=[]}
2017-02-17 07:43:06,880 INFO [main]: optimizer.ColumnPrunerProcFactory (ColumnPrunerProcFactory.java:pruneJoinOperator(1080)) - JOIN 13 newExprs: {0=[Column[VALUE._col0], Column[VALUE._col1], Column[VALUE._col2], Column[VALUE._col5], Column[KEY.reducesinkkey0]], 1=[]}
2017-02-17 07:43:06,881 INFO [main]: optimizer.ColumnPrunerProcFactory (ColumnPrunerProcFactory.java:pruneReduceSinkOperator(817)) - RS 10 oldColExprMap: {VALUE._col10=Column[BLOCK__OFFSET__INSIDE__FILE], VALUE._col11=Column[INPUT__FILE__NAME], VALUE._col12=Column[ROW__ID], KEY.reducesinkkey0=Column[b03], VALUE._col2=Column[fromdate], VALUE._col3=Column[todate], VALUE._col4=Column[company_code], VALUE._col5=Column[company_name], VALUE._col0=Column[consume_date], VALUE._col1=Column[hour_id], VALUE._col6=Column[b04], VALUE._col7=Column[b27], VALUE._col8=Column[b31], VALUE._col9=Column[b32]}
2017-02-17 07:43:06,881 INFO [main]: optimizer.ColumnPrunerProcFactory (ColumnPrunerProcFactory.java:pruneReduceSinkOperator(866)) - RS 10 newColExprMap: {KEY.reducesinkkey0=Column[b03], VALUE._col2=Column[fromdate], VALUE._col5=Column[company_name], VALUE._col0=Column[consume_date], VALUE._col1=Column[hour_id]}
2017-02-17 07:43:06,881 INFO [main]: optimizer.ColumnPrunerProcFactory (ColumnPrunerProcFactory.java:pruneReduceSinkOperator(817)) - RS 12 oldColExprMap: {KEY.reducesinkkey0=Column[_col0]}
2017-02-17 07:43:06,881 INFO [main]: optimizer.ColumnPrunerProcFactory (ColumnPrunerProcFactory.java:pruneReduceSinkOperator(866)) - RS 12 newColExprMap: {KEY.reducesinkkey0=Column[_col0]}
2017-02-17 07:43:06,883 INFO [main]: optimizer.ColumnPrunerProcFactory (ColumnPrunerProcFactory.java:pruneReduceSinkOperator(817)) - RS 4 oldColExprMap: {VALUE._col0=Column[_col0]}
2017-02-17 07:43:06,883 INFO [main]: optimizer.ColumnPrunerProcFactory (ColumnPrunerProcFactory.java:pruneReduceSinkOperator(866)) - RS 4 newColExprMap: {VALUE._col0=Column[_col0]}
2017-02-17 07:43:06,948 INFO [main]: ql.Context (Context.java:getMRScratchDir(330)) - New scratch dir is hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1
2017-02-17 07:43:06,956 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=getInputSummary from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-02-17 07:43:06,984 INFO [main]: exec.Utilities (Utilities.java:run(2615)) - Cannot get size of hdfs://hadoopmaster:9000/user/hive/warehouse/consumes. Safely ignored.
2017-02-17 07:43:06,987 INFO [main]: exec.Utilities (Utilities.java:run(2615)) - Cannot get size of hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1/-mr-10003. Safely ignored.
2017-02-17 07:43:06,988 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=getInputSummary start=1487346186956 end=1487346186988 duration=32 from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-02-17 07:43:06,990 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=clonePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-02-17 07:43:07,123 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-02-17 07:43:07,123 INFO [main]: exec.Utilities (Utilities.java:serializePlan(938)) - Serializing MapredWork via kryo
2017-02-17 07:43:07,321 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=serializePlan start=1487346187123 end=1487346187321 duration=198 from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-02-17 07:43:07,321 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-02-17 07:43:07,321 INFO [main]: exec.Utilities (Utilities.java:deserializePlan(965)) - Deserializing MapredWork via kryo
2017-02-17 07:43:07,387 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=deserializePlan start=1487346187321 end=1487346187387 duration=66 from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-02-17 07:43:07,387 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=clonePlan start=1487346186990 end=1487346187387 duration=397 from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-02-17 07:43:07,400 INFO [main]: ql.Context (Context.java:getMRScratchDir(330)) - New scratch dir is hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1
2017-02-17 07:43:07,401 INFO [main]: ql.Context (Context.java:getMRScratchDir(330)) - New scratch dir is hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1
2017-02-17 07:43:07,406 INFO [main]: physical.LocalMapJoinProcFactory (LocalMapJoinProcFactory.java:process(139)) - Setting max memory usage to 0.9 for table sink not followed by group by
2017-02-17 07:43:07,447 INFO [main]: physical.NullScanTaskDispatcher (NullScanTaskDispatcher.java:dispatch(175)) - Looking for table scans where optimization is applicable
2017-02-17 07:43:07,451 INFO [main]: physical.NullScanTaskDispatcher (NullScanTaskDispatcher.java:dispatch(199)) - Found 0 null table scans
2017-02-17 07:43:07,452 INFO [main]: physical.NullScanTaskDispatcher (NullScanTaskDispatcher.java:dispatch(175)) - Looking for table scans where optimization is applicable
2017-02-17 07:43:07,452 INFO [main]: physical.NullScanTaskDispatcher (NullScanTaskDispatcher.java:dispatch(199)) - Found 0 null table scans
2017-02-17 07:43:07,453 INFO [main]: physical.NullScanTaskDispatcher (NullScanTaskDispatcher.java:dispatch(175)) - Looking for table scans where optimization is applicable
2017-02-17 07:43:07,453 INFO [main]: physical.NullScanTaskDispatcher (NullScanTaskDispatcher.java:dispatch(199)) - Found 0 null table scans
2017-02-17 07:43:07,473 INFO [main]: parse.CalcitePlanner (SemanticAnalyzer.java:analyzeInternal(10213)) - Completed plan generation
2017-02-17 07:43:07,473 INFO [main]: ql.Driver (Driver.java:compile(436)) - Semantic Analysis Completed
2017-02-17 07:43:07,473 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=semanticAnalyze start=1487346185530 end=1487346187473 duration=1943 from=org.apache.hadoop.hive.ql.Driver>
2017-02-17 07:43:07,519 INFO [main]: exec.ListSinkOperator (Operator.java:initialize(332)) - Initializing operator OP[32]
2017-02-17 07:43:07,521 INFO [main]: exec.ListSinkOperator (Operator.java:initialize(372)) - Initialization Done 32 OP
2017-02-17 07:43:07,521 INFO [main]: exec.ListSinkOperator (Operator.java:initializeChildren(429)) - Operator 32 OP initialized
2017-02-17 07:43:07,529 INFO [main]: ql.Driver (Driver.java:getSchema(240)) - Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:consume_date, type:string, comment:null), FieldSchema(name:hour_id, type:int, comment:null), FieldSchema(name:fromdate, type:string, comment:null), FieldSchema(name:company_name, type:string, comment:null), FieldSchema(name:b03, type:decimal(18,8), comment:null)], properties:null)
2017-02-17 07:43:07,529 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=compile start=1487346184861 end=1487346187529 duration=2668 from=org.apache.hadoop.hive.ql.Driver>
2017-02-17 07:43:07,530 INFO [main]: ql.Driver (Driver.java:checkConcurrency(160)) - Concurrency mode is disabled, not creating a lock manager
2017-02-17 07:43:07,530 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
2017-02-17 07:43:07,530 INFO [main]: ql.Driver (Driver.java:execute(1328)) - Starting command(queryId=linux_20170217074304_4798207d-cb6e-4a87-8292-3baebe3907d4): select consume_date,hour_id,fromdate,company_name,b03 from consumes where b03 in (select max(b03) from consumes)
2017-02-17 07:43:07,531 INFO [main]: ql.Driver (SessionState.java:printInfo(951)) - Query ID = linux_20170217074304_4798207d-cb6e-4a87-8292-3baebe3907d4
2017-02-17 07:43:07,531 INFO [main]: ql.Driver (SessionState.java:printInfo(951)) - Total jobs = 3
2017-02-17 07:43:07,534 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=TimeToSubmit start=1487346184861 end=1487346187534 duration=2673 from=org.apache.hadoop.hive.ql.Driver>
2017-02-17 07:43:07,534 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
2017-02-17 07:43:07,534 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=task.MAPRED.Stage-2 from=org.apache.hadoop.hive.ql.Driver>
2017-02-17 07:43:07,552 INFO [main]: ql.Driver (SessionState.java:printInfo(951)) - Launching Job 1 out of 3
2017-02-17 07:43:07,554 INFO [main]: ql.Driver (Driver.java:launchTask(1651)) - Starting task [Stage-2:MAPRED] in serial mode
2017-02-17 07:43:07,555 INFO [main]: exec.Task (SessionState.java:printInfo(951)) - Number of reduce tasks determined at compile time: 1
2017-02-17 07:43:07,555 INFO [main]: exec.Task (SessionState.java:printInfo(951)) - In order to change the average load for a reducer (in bytes):
2017-02-17 07:43:07,555 INFO [main]: exec.Task (SessionState.java:printInfo(951)) - set hive.exec.reducers.bytes.per.reducer=<number>
2017-02-17 07:43:07,556 INFO [main]: exec.Task (SessionState.java:printInfo(951)) - In order to limit the maximum number of reducers:
2017-02-17 07:43:07,562 INFO [main]: exec.Task (SessionState.java:printInfo(951)) - set hive.exec.reducers.max=<number>
2017-02-17 07:43:07,565 INFO [main]: exec.Task (SessionState.java:printInfo(951)) - In order to set a constant number of reducers:
2017-02-17 07:43:07,567 INFO [main]: exec.Task (SessionState.java:printInfo(951)) - set mapreduce.job.reduces=<number>
2017-02-17 07:43:07,568 INFO [main]: ql.Context (Context.java:getMRScratchDir(330)) - New scratch dir is hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1
2017-02-17 07:43:07,575 INFO [main]: mr.ExecDriver (ExecDriver.java:execute(288)) - Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
2017-02-17 07:43:07,577 INFO [main]: exec.Utilities (Utilities.java:getInputPaths(3397)) - Processing alias sq_1:consumes
2017-02-17 07:43:07,580 INFO [main]: exec.Utilities (Utilities.java:getInputPaths(3414)) - Adding input file hdfs://hadoopmaster:9000/user/hive/warehouse/consumes
2017-02-17 07:43:07,580 INFO [main]: exec.Utilities (Utilities.java:isEmptyPath(2698)) - Content Summary not cached for hdfs://hadoopmaster:9000/user/hive/warehouse/consumes
2017-02-17 07:43:07,651 INFO [main]: exec.Utilities (Utilities.java:createDummyFileForEmptyPartition(3497)) - Changed input file hdfs://hadoopmaster:9000/user/hive/warehouse/consumes to empty file hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1/-mr-10006/0
2017-02-17 07:43:07,651 INFO [main]: ql.Context (Context.java:getMRScratchDir(330)) - New scratch dir is hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1
2017-02-17 07:43:07,665 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-02-17 07:43:07,666 INFO [main]: exec.Utilities (Utilities.java:serializePlan(938)) - Serializing MapWork via kryo
2017-02-17 07:43:08,663 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=serializePlan start=1487346187665 end=1487346188663 duration=998 from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-02-17 07:43:08,669 INFO [main]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
2017-02-17 07:43:08,702 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-02-17 07:43:08,703 INFO [main]: exec.Utilities (Utilities.java:serializePlan(938)) - Serializing ReduceWork via kryo
2017-02-17 07:43:08,745 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=serializePlan start=1487346188702 end=1487346188745 duration=43 from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-02-17 07:43:08,747 ERROR [main]: mr.ExecDriver (ExecDriver.java:execute(400)) - yarn
2017-02-17 07:43:08,836 INFO [main]: client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at hadoopmaster/192.168.23.132:8050
2017-02-17 07:43:09,138 INFO [main]: client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at hadoopmaster/192.168.23.132:8050
2017-02-17 07:43:09,146 INFO [main]: exec.Utilities (Utilities.java:getBaseWork(390)) - PLAN PATH = hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1/-mr-10007/a922e0ae-c541-4b92-8f9d-088bde0d1475/map.xml
2017-02-17 07:43:09,147 INFO [main]: exec.Utilities (Utilities.java:getBaseWork(390)) - PLAN PATH = hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1/-mr-10007/a922e0ae-c541-4b92-8f9d-088bde0d1475/reduce.xml
2017-02-17 07:43:09,454 WARN [main]: mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2017-02-17 07:43:12,706 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=getSplits from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
2017-02-17 07:43:12,707 INFO [main]: exec.Utilities (Utilities.java:getBaseWork(390)) - PLAN PATH = hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1/-mr-10007/a922e0ae-c541-4b92-8f9d-088bde0d1475/map.xml
2017-02-17 07:43:12,707 INFO [main]: io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(517)) - Total number of paths: 1, launching 1 threads to check non-combinable ones.
2017-02-17 07:43:12,729 INFO [main]: io.CombineHiveInputFormat (CombineHiveInputFormat.java:getCombineSplits(439)) - CombineHiveInputSplit creating pool for hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1/-mr-10006/0; using filter path hdfs://hadoopmaster:9000/tmp/hive/linux/d79925b9-fb4a-41c8-b45e-cc42db800405/hive_2017-02-17_07-43-04_926_1561320960043112851-1/-mr-10006/0
2017-02-17 07:43:12,768 INFO [main]: input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
2017-02-17 07:43:12,771 INFO [main]: input.CombineFileInputFormat (CombineFileInputFormat.java:createSplits(413)) - DEBUG: Terminated node allocation with : CompletedNodes: 0, size left: 0
2017-02-17 07:43:12,773 INFO [main]: io.CombineHiveInputFormat (CombineHiveInputFormat.java:getCombineSplits(494)) - number of splits 1
2017-02-17 07:43:12,775 INFO [main]: io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(587)) - Number of all splits 1
2017-02-17 07:43:12,775 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=getSplits start=1487346192706 end=1487346192775 duration=69 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
2017-02-17 07:43:12,857 INFO [main]: mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:1
2017-02-17 07:43:12,951 INFO [main]: mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting tokens for job: job_1487346076570_0001
2017-02-17 07:43:13,435 INFO [main]: impl.YarnClientImpl (YarnClientImpl.java:submitApplication(273)) - Submitted application application_1487346076570_0001
2017-02-17 07:43:13,505 INFO [main]: mapreduce.Job (Job.java:submit(1294)) - The url to track the job: http://hadoopmaster:8088/proxy/application_1487346076570_0001/
2017-02-17 07:43:13,510 INFO [main]: exec.Task (SessionState.java:printInfo(951)) - Starting Job = job_1487346076570_0001, Tracking URL = http://hadoopmaster:8088/proxy/application_1487346076570_0001/
2017-02-17 07:43:13,514 INFO [main]: exec.Task (SessionState.java:printInfo(951)) - Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1487346076570_0001
2017-02-17 07:43:41,582 INFO [SIGINT handler]: CliDriver (SessionState.java:printInfo(951)) - Interrupting... Be patient, this might take some time.
2017-02-17 07:43:41,584 INFO [SIGINT handler]: CliDriver (SessionState.java:printInfo(951)) - Press Ctrl+C again to kill JVM
2017-02-17 07:43:41,841 INFO [SIGINT handler]: impl.YarnClientImpl (YarnClientImpl.java:killApplication(395)) - Killed application application_1487346076570_0001
2017-02-17 07:43:42,058 INFO [SIGINT handler]: CliDriver (SessionState.java:printInfo(951)) - Exiting the JVM
2017-02-17 07:43:42,102 INFO [Thread-11]: impl.YarnClientImpl (YarnClientImpl.java:killApplication(395)) - Killed application application_1487346076570_0001
I have two errors:
1) ERROR [main]: hdfs.KeyProviderCache (KeyProviderCache.java:createKeyProviderURI(87)) - Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
2) ERROR [main]: mr.ExecDriver (ExecDriver.java:execute(400)) - yarn
It seems, as per the logs, that you are executing the query below, which has an IN clause; Hive has some limitations with the IN clause.
select consume_date,hour_id,fromdate,company_name,b03 from consumes where b03 in (select max(b03) from consumes)
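To confirm how Hive plans this form (the log above shows it is rewritten as a join and compiled into three jobs, "Total jobs = 3"), you can prefix the statement with EXPLAIN and inspect the stages before running it:

EXPLAIN
select consume_date, hour_id, fromdate, company_name, b03
from consumes
where b03 in (select max(b03) from consumes);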
You can use the query below instead:
select consume_date,hour_id,fromdate,company_name,b03 from consumes order by b03 desc limit 1;
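Note that if several rows tie for the maximum b03, the LIMIT 1 form returns only one of them. A join against the aggregated value also avoids the IN subquery while keeping all tied rows; a sketch against the same consumes table:

select c.consume_date, c.hour_id, c.fromdate, c.company_name, c.b03
from consumes c
join (select max(b03) as max_b03 from consumes) m
  on (c.b03 = m.max_b03);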
Related
Multiple Hive joins failing with Execution Error, return code 2
I'm trying to execute a query in which a table is left outer joined on two other tables. The query is given below:

SELECT T.Rdate, c.Specialty_Cruises, b.Specialty_Cruises
from arunf.PASSENGER_HISTORY_FACT T
LEFT OUTER JOIN arunf.RPT_WEB_COURTESY_HOLD_TEMP C
  on (unix_timestamp(T.RDATE,'yyyy-MM-dd')=unix_timestamp(c.rdate,'yyyy-MM-dd') AND T.book_num = c.Courtesy_Hold_Booking_Num)
LEFT OUTER JOIN arunf.RPT_WEB_BOOKING_NUM_TEMP b
  ON (unix_timestamp(T.RDATE,'yyyy-MM-dd')=unix_timestamp(b.rdate,'yyyy-MM-dd') AND T.book_num = B.Online_Booking_Number);

This query fails with the notification:

: exec.Task (SessionState.java:printError(922)) - /tmp/arunf/hive.log
: mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(308)) - Execution failed with exit status: 2
: ql.Driver (SessionState.java:printError(922)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

The error logs contain the following:

2015-12-01 10:25:16,077 INFO [main]: mr.ExecDriver (SessionState.java:printInfo(913)) - Execution log at: /tmp/arunf/arunf_20151201102525_914a2eab-652b-440c-9fdc-a473b4caa026.log
2015-12-01 10:25:16,278 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2015-12-01 10:25:16,278 INFO [main]: exec.Utilities (Utilities.java:deserializePlan(953)) - Deserializing MapredLocalWork via kryo
2015-12-01 10:25:16,421 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(158)) - </PERFLOG method=deserializePlan start=1448983516278 end=1448983516421 duration=143 from=org.apache.hadoop.hive.ql.exec.Utilities>
2015-12-01 10:25:16,429 INFO [main]: mr.MapredLocalTask (SessionState.java:printInfo(913)) - 2015-12-01 10:25:16 Starting to launch local task to process map join; maximum memory = 1029701632
2015-12-01 10:25:16,498 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(441)) - fetchoperator for c created
2015-12-01 10:25:16,500 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(441)) - fetchoperator for b created
2015-12-01 10:25:16,500 INFO [main]: exec.TableScanOperator (Operator.java:initialize(346)) - Initializing Self TS[2]
2015-12-01 10:25:16,500 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(419)) - Operator 2 TS initialized
2015-12-01 10:25:16,500 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(423)) - Initializing children of 2 TS
2015-12-01 10:25:16,500 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(458)) - Initializing child 1 HASHTABLESINK
2015-12-01 10:25:16,500 INFO [main]: exec.TableScanOperator (Operator.java:initialize(394)) - Initialization Done 2 TS
2015-12-01 10:25:16,500 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(461)) - fetchoperator for b initialized
2015-12-01 10:25:16,500 INFO [main]: exec.TableScanOperator (Operator.java:initialize(346)) - Initializing Self TS[0]
2015-12-01 10:25:16,501 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(419)) - Operator 0 TS initialized
2015-12-01 10:25:16,501 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(423)) - Initializing children of 0 TS
2015-12-01 10:25:16,502 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(458)) - Initializing child 1 HASHTABLESINK
2015-12-01 10:25:16,503 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(346)) - Initializing Self HASHTABLESINK[1]
2015-12-01 10:25:16,503 INFO [main]: mapjoin.MapJoinMemoryExhaustionHandler (MapJoinMemoryExhaustionHandler.java:<init>(61)) - JVM Max Heap Size: 1029701632
2015-12-01 10:25:16,533 ERROR [main]: mr.MapredLocalTask (MapredLocalTask.java:executeInProcess(357)) - Hive Runtime Error: Map local work failed
java.lang.RuntimeException: cannot find field courtesy_hold_booking_num from [0:rdate, 1:online_booking_number, 2:pages, 3:mobile_device_type, 4:specialty_cruises]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at org.apache.hadoop.hive.serde2.BaseStructObjectInspector.getStructFieldRef(BaseStructObjectInspector.java:133)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:138)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.initializeOperators(MapredLocalTask.java:460)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:366)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:346)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:743)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Please note that when the main table is left outer joined with each of these tables separately, the queries succeed. For example, the queries below succeed:

SELECT T.Rdate
from arunf.PASSENGER_HISTORY_FACT T
LEFT OUTER JOIN arunf.RPT_WEB_COURTESY_HOLD_TEMP C
  on (unix_timestamp(T.RDATE,'yyyy-MM-dd')=unix_timestamp(c.rdate,'yyyy-MM-dd') AND T.book_num = c.Courtesy_Hold_Booking_Num);

SELECT T.Rdate
from arunf.PASSENGER_HISTORY_FACT T
LEFT OUTER JOIN arunf.RPT_WEB_BOOKING_NUM_TEMP b
  ON (unix_timestamp(T.RDATE,'yyyy-MM-dd')=unix_timestamp(b.rdate,'yyyy-MM-dd') AND T.book_num = B.Online_Booking_Number);

I'm also able to do a left outer join of this main table with two other tables in the same combined manner. I'm facing this issue only when I try to left join the main table with these two secondary tables. Kindly provide your insights on this issue.
Hive bugs come and go. It may depend on the Hive version (?) and the table format (text? AVRO? Sequence? ORC? Parquet?). Now, if each query works on its own, why don't you try a workaround based on the divide-and-conquer approach (or: if Hive is not smart enough to design an execution plan, then let's design it ourselves), e.g.

SELECT TC.RDate, TC.Specialty_Cruises, B.Specialty_Cruises
FROM (SELECT T.Rdate, T.book_num, C.Specialty_Cruises
      FROM arunf.PASSENGER_HISTORY_FACT T
      LEFT JOIN arunf.RPT_WEB_COURTESY_HOLD_TEMP C
        ON unix_timestamp(T.RDate,'yyyy-MM-dd')=unix_timestamp(C.RDate,'yyyy-MM-dd')
       AND T.book_num = C.Courtesy_Hold_Booking_Num
     ) TC
LEFT JOIN arunf.RPT_WEB_BOOKING_NUM_TEMP B
  ON unix_timestamp(TC.RDate,'yyyy-MM-dd')=unix_timestamp(B.RDate,'yyyy-MM-dd')
 AND TC.book_num = B.Online_Booking_Number;

Note that the inner SELECT has to expose T.book_num, since the outer join condition refers to TC.book_num.
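If the single statement still misbehaves, the same plan can be forced one step at a time by materializing the first join in an intermediate table and joining against that. A sketch of the idea (the table name tmp_tc is just an example):

CREATE TABLE arunf.tmp_tc AS
SELECT T.Rdate, T.book_num, C.Specialty_Cruises
FROM arunf.PASSENGER_HISTORY_FACT T
LEFT JOIN arunf.RPT_WEB_COURTESY_HOLD_TEMP C
  ON unix_timestamp(T.RDate,'yyyy-MM-dd')=unix_timestamp(C.RDate,'yyyy-MM-dd')
 AND T.book_num = C.Courtesy_Hold_Booking_Num;

SELECT TC.Rdate, TC.Specialty_Cruises, B.Specialty_Cruises
FROM arunf.tmp_tc TC
LEFT JOIN arunf.RPT_WEB_BOOKING_NUM_TEMP B
  ON unix_timestamp(TC.Rdate,'yyyy-MM-dd')=unix_timestamp(B.Rdate,'yyyy-MM-dd')
 AND TC.book_num = B.Online_Booking_Number;

DROP TABLE arunf.tmp_tc;

Each statement then runs as its own job, which also makes it obvious which of the two joins is the one that fails.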
How to divert the output of a Pig command to a text file in order to print it out?
2015-09-24 01:59:28,436 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2015-09-24 01:59:28,539 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-09-24 01:59:28,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-09-24 01:59:28,560 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-09-24 01:59:28,561 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-09-24 01:59:28,620 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-09-24 01:59:28,624 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-09-24 01:59:28,638 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2015-09-24 01:59:28,640 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-09-24 01:59:28,641 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2015-09-24 01:59:29,268 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/vivek/Applications/pig/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp-1176581946/tmp-2078805221/pig-0.14.0-core-h2.jar
2015-09-24 01:59:29,452 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/vivek/Applications/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1176581946/tmp-1750967439/automaton-1.11-8.jar
2015-09-24 01:59:29,538 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/vivek/Applications/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1176581946/tmp1997290065/antlr-runtime-3.4.jar
2015-09-24 01:59:29,843 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/vivek/Applications/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp-1176581946/tmp-256046780/guava-11.0.2.jar
2015-09-24 01:59:29,990 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/vivek/Applications/pig/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp-1176581946/tmp955728106/joda-time-2.1.jar
2015-09-24 01:59:30,129 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-09-24 01:59:30,131 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-09-24 01:59:30,131 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2015-09-24 01:59:30,132 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-09-24 01:59:30,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-09-24 01:59:30,283 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-09-24 01:59:30,568 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2015-09-24 01:59:30,868 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-09-24 01:59:30,871 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-09-24 01:59:30,874 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2015-09-24 01:59:31,190 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2015-09-24 01:59:31,499 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1443082231600_0003
2015-09-24 01:59:31,516 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2015-09-24 01:59:31,704 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1443082231600_0003
2015-09-24 01:59:31,738 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://ubuntu:8088/proxy/application_1443082231600_0003/
2015-09-24 01:59:31,742 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1443082231600_0003
2015-09-24 01:59:31,745 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases highsal,salaries
2015-09-24 01:59:31,745 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: salaries[3,10],salaries[-1,-1],highsal[13,9] C: R:
2015-09-24 01:59:31,781 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-09-24 01:59:31,782 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1443082231600_0003]
2015-09-24 02:00:48,025 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2015-09-24 02:00:48,025 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1443082231600_0003]
2015-09-24 02:00:53,055 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-09-24 02:00:53,104 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-09-24 02:00:58,180 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020.
Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-09-24 02:00:59,182 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-09-24 02:01:00,185 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) (F,96,86000.0,95105) (M,24,80000.0,95050) (F,84,89000.0,94040) (M,36,85000.0,95101) (F,69,91000.0,95050) (F,96,80000.0,95051) (M,78,87000.0,95105) (M,25,96000.0,95103) (M,89,90000.0,95102) (F,82,77000.0,95051) (M,97,96000.0,95102) (F,39,82000.0,95051) (M,36,79000.0,95101) (M,75,84000.0,95103) (F,78,91000.0,95102) (M,59,77000.0,95051) (F,52,76000.0,95050) (M,52,97000.0,95102) (F,28,98000.0,95105) (M,91,96000.0,94041) (F,47,85000.0,95051) (M,79,85000.0,95101) (F,93,93000.0,95102) (F,33,82000.0,95101) (F,77,96000.0,95103) (F,93,84000.0,95051) (M,23,83000.0,95050) (M,54,97000.0,95101) (F,25,93000.0,94040) (M,52,85000.0,95102) (M,60,78000.0,94040) (F,74,89000.0,94040) (F,23,76000.0,95101) (M,46,93000.0,95051) (F,63,92000.0,95105) (F,86,93000.0,95101) (F,37,95000.0,95101) (M,41,89000.0,95050) (F,89,77000.0,94041) (F,82,84000.0,95050) (M,66,96000.0,95051) (F,75,79000.0,95051) (M,91,90000.0,95105) (M,27,98000.0,95051) (M,24,85000.0,94041) (M,82,96000.0,95050) (F,75,88000.0,95101) (F,80,77000.0,95051) (M,63,80000.0,95101) (M,29,86000.0,95103) (F,44,91000.0,95101) (M,40,78000.0,95103) (F,46,83000.0,95051) (F,42,85000.0,95105) (M,44,90000.0,95102) (F,26,90000.0,94041) (F,31,87000.0,95051) (F,88,76000.0,95050) (M,67,87000.0,95102) (F,58,86000.0,94041) (F,57,85000.0,95051) (M,97,85000.0,95101) (M,73,90000.0,95103) (M,47,95000.0,95105) (F,83,98000.0,94040) (F,56,78000.0,95101) (M,72,89000.0,94041) (M,90,99000.0,95101) (F,59,79000.0,95105) (F,32,84000.0,95051) (F,60,93000.0,95103) (M,47,87000.0,94041) (M,52,87000.0,95103) (M,82,92000.0,95051) (M,39,87000.0,95102) (F,93,89000.0,95103) (M,31,88000.0,95050) (M,21,92000.0,94040) (F,65,84000.0,95050) (M,68,89000.0,94041) (F,63,92000.0,94041) (F,95,77000.0,95050) (F,34,98000.0,95102) (F,44,94000.0,94040) (M,69,81000.0,95103) (F,30,85000.0,95051) (F,85,82000.0,95050) (M,75,78000.0,94040) (F,91,94000.0,95105) (F,71,91000.0,94041) (M,39,91000.0,95051) (M,43,90000.0,95105) (F,35,94000.0,94040) (F,41,83000.0,95051) (M,62,94000.0,94041) (F,38,77000.0,94041) (F,63,89000.0,95051) (M,78,90000.0,95050) (M,65,92000.0,95101) (F,42,94000.0,95103) (M,65,80000.0,95103) (F,38,91000.0,95102) (M,58,93000.0,94040) (F,63,83000.0,95103) (F,23,96000.0,95103) (F,43,96000.0,95102) (F,27,86000.0,94041) (M,94,76000.0,94041) (F,53,79000.0,94041) (M,78,79000.0,95102) (F,62,82000.0,95101) (M,86,83000.0,95051) (F,91,98000.0,95105) (M,61,99000.0,95103) (M,58,94000.0,95050) (F,47,99000.0,95102) (F,24,89000.0,95101) (M,80,92000.0,95051) (F,30,83000.0,95102) (F,35,86000.0,95051) (M,69,82000.0,95102) (F,49,83000.0,95105) (M,59,82000.0,95103) (F,74,84000.0,95103) (F,82,83000.0,95051) (M,32,85000.0,95102) (M,39,91000.0,95103) (M,50,95000.0,95051) (M,98,89000.0,95105) (M,84,96000.0,95050) (M,61,90000.0,95103) (F,69,83000.0,95102) (F,59,91000.0,95101) (M,79,90000.0,95050) (F,98,83000.0,95050) (F,65,78000.0,94040) (F,74,81000.0,95103) (M,83,97000.0,95101) (M,42,92000.0,95102) (M,82,92000.0,95105) 
(F,41,91000.0,94041) (F,35,97000.0,94040) (F,46,85000.0,95050) (M,34,86000.0,94041) (F,37,85000.0,94041) (M,64,91000.0,94040) (M,92,84000.0,95051) (M,56,83000.0,95103) (F,68,98000.0,95101) (M,28,81000.0,95050) (F,81,93000.0,95050) (M,71,87000.0,95051) (M,90,86000.0,95050) (F,92,78000.0,94041) (M,42,97000.0,95101) (F,97,83000.0,94041) (M,41,86000.0,95051) (F,96,99000.0,95102) (F,56,96000.0,95051) (F,63,99000.0,95105) (F,69,89000.0,95050) (M,67,85000.0,95105) (M,61,83000.0,95051) (M,86,96000.0,95103) (F,84,82000.0,94041) (M,91,90000.0,95050) (F,36,99000.0,94041) (M,75,97000.0,95105) (M,39,93000.0,95050) (M,56,90000.0,95050) (M,61,91000.0,95105) (M,29,93000.0,94041) (M,79,99000.0,95102) (M,48,91000.0,95101) (F,95,76000.0,95101) (M,47,98000.0,95050) (M,61,88000.0,95101) (M,74,77000.0,95101) (M,75,83000.0,94040) (M,34,82000.0,95103) (M,70,85000.0,95103) (F,43,94000.0,94041) (F,64,91000.0,95105) (F,21,95000.0,95051) (M,55,91000.0,95051) (M,27,85000.0,95105) (F,40,84000.0,94040) (F,41,84000.0,94041) (F,50,87000.0,95051) (M,72,82000.0,95103) (F,50,87000.0,95105) (F,31,93000.0,95102) (F,45,80000.0,95050) (F,62,77000.0,94040) (M,93,91000.0,95101) (M,77,94000.0,95051) (F,33,82000.0,95051) (M,95,87000.0,95105) (M,40,79000.0,95102) (M,82,87000.0,95050) (M,55,85000.0,95051) (M,52,96000.0,95102) (F,52,96000.0,95050) (F,78,82000.0,95102) (F,31,82000.0,94041) (F,60,97000.0,95101) (M,77,81000.0,95102) (F,78,93000.0,95101) (M,74,82000.0,94040) (M,62,77000.0,95050) (F,72,77000.0,95102) (M,96,87000.0,94041) (F,89,93000.0,95051) (M,59,87000.0,95050) (F,26,81000.0,95105) (F,84,77000.0,95051) (F,42,84000.0,94040) (F,59,96000.0,94041) (F,31,78000.0,95050) (F,91,85000.0,95105) (F,87,79000.0,95102) (M,39,88000.0,95105) (F,47,86000.0,95051) (F,24,92000.0,95101) (F,76,85000.0,95103) (F,48,83000.0,95105) (M,50,88000.0,95105) (F,61,93000.0,94041) (F,59,98000.0,95050) (F,57,95000.0,95050) (M,77,76000.0,95105) (M,34,90000.0,95105) (M,23,91000.0,95050) (M,38,88000.0,95051) (F,35,86000.0,95102) (M,27,91000.0,95103) (F,99,78000.0,95051) (F,77,94000.0,94041) (M,23,83000.0,95103) (M,93,91000.0,95051) (F,94,89000.0,95103) (M,99,99000.0,95105) (M,75,84000.0,94040) (M,32,89000.0,94041) (F,57,76000.0,94040) (F,94,95000.0,95103) (M,66,82000.0,94041) (F,56,98000.0,94041) (M,37,88000.0,95105) (M,89,82000.0,95050) (M,91,79000.0,95103) (F,72,90000.0,95102) (F,53,85000.0,95050) (F,87,91000.0,95105) (M,74,91000.0,95050) (F,62,99000.0,95102) (M,46,95000.0,95105) (F,73,78000.0,95050) (F,35,94000.0,95102) (F,60,77000.0,95105) (M,83,93000.0,95105) (F,55,76000.0,95051) (F,36,90000.0,95101) (F,75,87000.0,95103) (F,91,98000.0,95103) (F,66,87000.0,95101) (M,83,91000.0,95103) (M,52,77000.0,94040) (F,76,85000.0,95103) (F,98,78000.0,95102) (F,60,89000.0,95050) (F,30,76000.0,95101) (F,53,95000.0,95050) (M,63,85000.0,95105) (F,25,94000.0,95050) (M,29,98000.0,95103) (M,53,82000.0,95050) (F,70,89000.0,95101) (F,76,83000.0,95105) (M,85,98000.0,95050) (F,81,97000.0,95103) (M,30,77000.0,94041) (F,73,85000.0,95102) (M,94,93000.0,95103) (F,83,80000.0,95101) (F,44,88000.0,94040) (F,35,83000.0,95051) (F,25,82000.0,94040) (M,26,92000.0,95101) (F,60,81000.0,95105) (F,47,78000.0,94040) (F,53,87000.0,94040) (F,44,88000.0,95051) (M,73,96000.0,95103) (F,77,95000.0,95103) (M,24,93000.0,95050) (F,21,76000.0,95050) (F,82,90000.0,95103) (M,71,97000.0,95051) (M,53,79000.0,95105) (M,28,84000.0,94040) (M,35,97000.0,95101) (F,75,76000.0,94040) (M,87,94000.0,94041) (F,89,79000.0,95102) (F,80,92000.0,95102) (M,24,77000.0,95102) (F,40,94000.0,95105) (M,43,80000.0,94041) 
(M,23,80000.0,94041) (F,51,83000.0,94041) (F,90,78000.0,94040) (F,41,79000.0,95102) (M,48,93000.0,94041) (M,69,94000.0,94040) (F,36,81000.0,95101) (M,35,91000.0,95051) (F,26,88000.0,95050) (M,35,83000.0,94041) (F,36,77000.0,95103) (M,57,91000.0,95103) (F,57,89000.0,95101) (F,38,86000.0,94041) (F,31,83000.0,95050) (M,47,96000.0,94041) (F,91,83000.0,95101) (F,21,78000.0,95103) (M,32,84000.0,95051) (F,41,93000.0,94041) (M,81,93000.0,95102) (F,59,78000.0,95105) (M,71,90000.0,95050) (F,51,77000.0,95051) (M,29,88000.0,95102) (F,40,93000.0,95102) (F,89,99000.0,95105) (F,64,77000.0,95103) (F,53,87000.0,94041) (M,53,97000.0,94040) (M,45,78000.0,94040) (F,76,89000.0,94041) (M,59,81000.0,95050) (F,24,76000.0,94041) (M,72,95000.0,95051) (M,63,83000.0,94040) (F,39,76000.0,94041) (F,26,85000.0,95101) (M,90,99000.0,95102) (F,47,76000.0,95103) (M,72,86000.0,95105) (M,38,92000.0,95050) (M,54,78000.0,95101) (F,48,86000.0,95102) (F,37,78000.0,94040) (F,75,88000.0,95103) (F,66,78000.0,95050) (M,58,80000.0,94040) (M,84,88000.0,95050) (F,35,94000.0,95050) (M,57,88000.0,95102) (M,68,83000.0,95050) (M,37,91000.0,95103) (M,65,79000.0,95101) (M,65,85000.0,95101) (F,97,83000.0,95102) (M,43,83000.0,95051) (F,73,82000.0,95103) (M,89,87000.0,95050) (F,74,84000.0,95103) (M,73,90000.0,94041) (F,46,97000.0,95103) (M,36,82000.0,94041) (M,80,82000.0,95105) (F,78,79000.0,95102) (M,67,96000.0,94040) (F,48,98000.0,95102) (F,82,86000.0,95050) (M,79,80000.0,95050) (M,96,84000.0,95103) (M,51,87000.0,94040) (F,29,84000.0,95051) (M,47,86000.0,94040) (M,54,96000.0,94041) (F,80,94000.0,94041) (F,92,93000.0,95103) (F,59,79000.0,95050) (M,95,80000.0,95050) (M,67,92000.0,94040) (F,23,98000.0,95103) (M,91,82000.0,95051) (M,27,89000.0,95105) (M,43,77000.0,94041) (F,65,83000.0,94040) (F,65,82000.0,95051) (M,43,98000.0,95105) (F,51,86000.0,95102) (M,76,83000.0,95051) (F,25,92000.0,94040) (M,48,76000.0,95102) (F,43,86000.0,95050) (F,57,83000.0,95101) (F,48,84000.0,95051) (M,37,98000.0,95102) (F,98,81000.0,95105) (M,78,86000.0,94041) (F,34,93000.0,95102) (M,53,94000.0,95102) (M,69,98000.0,94040) (F,70,84000.0,94041) (F,89,87000.0,94040) (F,52,89000.0,95102) (F,84,79000.0,95102) (M,44,86000.0,94041) (M,51,93000.0,94041) (M,98,81000.0,95102) (F,82,77000.0,95101) (M,50,82000.0,95103) (F,59,76000.0,95051) (M,29,76000.0,94041) (F,30,81000.0,95051) (F,22,96000.0,95105) (M,64,88000.0,94040) (M,80,78000.0,95102) (F,94,85000.0,95051) (M,63,95000.0,95103) (F,51,78000.0,95050) (M,39,94000.0,95105) (M,80,85000.0,95101) (M,92,89000.0,95102) (M,44,88000.0,95103) (M,57,92000.0,95050) (F,64,94000.0,95051) (F,88,91000.0,95102) (F,43,83000.0,95101) (F,33,93000.0,95050) (M,64,92000.0,95102) (M,91,92000.0,95050) (F,32,88000.0,95105) (M,78,87000.0,94041) (F,64,85000.0,94040) (M,93,96000.0,95102) (F,72,98000.0,95103) (M,68,76000.0,95051) (M,52,95000.0,95050) (F,75,93000.0,95103) (M,45,85000.0,94041) (F,70,98000.0,95051) (F,74,96000.0,95101) (F,81,85000.0,95102) (M,83,91000.0,95105) (M,32,89000.0,95101) (F,58,90000.0,94041) (M,55,80000.0,95050) (F,23,79000.0,95051) (M,91,79000.0,95103) (F,21,98000.0,95102) (F,57,91000.0,95101) (M,58,91000.0,95051) (F,41,94000.0,95101) (M,67,95000.0,94041) (M,69,80000.0,95101) (M,23,77000.0,94041) (F,94,92000.0,95105) (F,60,92000.0,95051) (F,53,84000.0,94041) (F,48,98000.0,95103) (M,70,88000.0,95051) (M,76,94000.0,95103) (F,22,88000.0,94040) (F,80,81000.0,95102) (F,57,80000.0,95051) (F,57,99000.0,95103) (M,50,78000.0,95050) (M,40,81000.0,95050) (F,93,97000.0,95050) (M,40,80000.0,94041) (M,35,91000.0,95101) (F,50,96000.0,94041) 
(F,27,90000.0,95105) (F,23,91000.0,95105) (M,49,80000.0,94041) (M,90,98000.0,95105) (M,29,91000.0,95050) (F,99,83000.0,95103) (F,43,83000.0,94040) (F,30,90000.0,94041) (F,96,97000.0,95102) (M,83,77000.0,95103) (F,77,97000.0,94040) (F,74,98000.0,95105) (F,96,96000.0,95103) (F,37,81000.0,94041) (M,82,91000.0,94040) (F,33,90000.0,95101) (F,35,86000.0,95102) (F,67,87000.0,95105) (M,95,95000.0,95051) (M,82,95000.0,95101) (F,26,76000.0,95050) (F,65,84000.0,95103) (F,34,91000.0,95102) (F,48,81000.0,94040) (F,93,84000.0,94041) (F,37,79000.0,95105) (M,77,84000.0,95102) (M,94,78000.0,94040) (M,28,79000.0,95051) (F,30,80000.0,94041) (F,54,80000.0,95103) (F,93,96000.0,95105) (F,45,78000.0,94041)

Right now I'm just executing pig commands interactively, and I want to divert or keep a copy of the output at execution time, since it is really difficult to take a snapshot of it. Please suggest a solution. The code that produces the output above (I have already copied salaries.txt from the local FS to HDFS):

grunt> salaries = load 'salaries' using PigStorage(',') As (gender, age, salary, zip);
grunt> salaries = load 'salaries' using PigStorage(',') As (gender:chararray, age:int, salary:double, zip:long);
grunt> highsal = filter salaries by salary > 75000;
grunt> dump highsal;

When the above command is executed, the output listed above is displayed.

grunt> store highsal into 'file';
2015-09-24 02:59:15,981 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 1, column 6> Undefined alias: highsal
Details at logfile: /home/vivek/pig_1443088724224.log
grunt>

I'm still getting the error with the suggested query.
You have not defined the "highsal" alias when trying to run the STORE command. Pig does not keep any alias from a previous session; you have to execute all your commands in one session, or write a Pig script and invoke it. Try:

grunt> salaries = load 'salaries' using PigStorage(',') As (gender:chararray, age:int, salary:double, zip:long);
grunt> highsal = filter salaries by salary > 75000;
grunt> STORE highsal INTO 'file';

This will store the "highsal" content in files named 'file/part-x-xxxxx' in the user's HDFS home directory. You can also provide an absolute HDFS directory path instead of 'file' if you wish to store the data in a directory other than the user's home directory. Hope this helps.
store highsal into 'file';

Have a look at the Apache Pig documentation for all commands.
Getting error while running query on hive over tez
Getting an error while running a query on Hive over Tez. As per the logs, Hive is failing while copying the Tez jars to an HDFS location when the Tez session starts. Below is the complete log obtained from the Hive log file:

2015-06-19 01:23:52,289 INFO [HiveServer2-Background-Pool: Thread-41]: ql.Driver (SessionState.java:printInfo(852)) - Query ID = saurabh_20150619012323_f52f1d6c-2adb-4edc-8ba4-b64d7d898325
2015-06-19 01:23:52,289 INFO [HiveServer2-Background-Pool: Thread-41]: ql.Driver (SessionState.java:printInfo(852)) - Total jobs = 1
2015-06-19 01:23:52,289 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=TimeToSubmit start=1434657232288 end=1434657232289 duration=1 from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:23:52,290 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:23:52,290 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=task.TEZ.Stage-1 from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:23:52,302 INFO [HiveServer2-Background-Pool: Thread-41]: ql.Driver (SessionState.java:printInfo(852)) - Launching Job 1 out of 1
2015-06-19 01:23:52,302 INFO [HiveServer2-Background-Pool: Thread-41]: ql.Driver (Driver.java:launchTask(1630)) - Starting task [Stage-1:MAPRED] in parallel
2015-06-19 01:23:52,312 INFO [Thread-21]: session.SessionState (SessionState.java:start(488)) - No Tez session required at this point. hive.execution.engine=mr.
2015-06-19 01:23:52,314 INFO [Thread-21]: tez.TezSessionPoolManager (TezSessionPoolManager.java:getSession(125)) - QueueName: null nonDefaultUser: true defaultQueuePool: null blockingQueueLength: -1
2015-06-19 01:23:52,315 INFO [Thread-21]: tez.TezSessionPoolManager (TezSessionPoolManager.java:getNewSessionState(154)) - Created a new session for queue: null session id: 85d83746-a48e-419e-a7ca-8c98faf173ea
2015-06-19 01:23:52,380 INFO [Thread-21]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
2015-06-19 01:23:52,412 INFO [Thread-21]: ql.Context (Context.java:getMRScratchDir(328)) - New scratch dir is hdfs://localhost:9000/tmp/hive/saurabh/e5a701ae-242d-488f-beec-cf18878becdc/hive_2015-06-19_01-23-49_794_2167174123575230985-2
2015-06-19 01:23:52,420 INFO [Thread-21]: exec.Task (TezTask.java:updateSession(233)) - Tez session hasn't been created yet. Opening session
2015-06-19 01:23:52,420 INFO [Thread-21]: tez.TezSessionState (TezSessionState.java:open(142)) - User of session id 85d83746-a48e-419e-a7ca-8c98faf173ea is saurabh
2015-06-19 01:23:52,433 INFO [Thread-21]: tez.DagUtils (DagUtils.java:localizeResource(950)) - Localizing resource because it does not exist: file:/usr/lib/tez/* to dest: hdfs://localhost:9000/tmp/hive/saurabh/_tez_session_dir/85d83746-a48e-419e-a7ca-8c98faf173ea/*
2015-06-19 01:23:52,433 INFO [Thread-21]: tez.DagUtils (DagUtils.java:localizeResource(954)) - Looks like another thread is writing the same file will wait.
2015-06-19 01:23:52,433 INFO [Thread-21]: tez.DagUtils (DagUtils.java:localizeResource(961)) - Number of wait attempts: 5. Wait interval: 5000
2015-06-19 01:24:17,449 ERROR [Thread-21]: tez.DagUtils (DagUtils.java:localizeResource(977)) - Could not find the jar that was being uploaded
2015-06-19 01:24:17,451 ERROR [Thread-21]: exec.Task (TezTask.java:execute(184)) - Failed to execute tez graph.
java.io.IOException: Previous writer likely failed to write hdfs://localhost:9000/tmp/hive/saurabh/_tez_session_dir/85d83746-a48e-419e-a7ca-8c98faf173ea/*. Failing because I am unlikely to write too.
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeResource(DagUtils.java:978)
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.addTempResources(DagUtils.java:859)
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeTempFilesFromConf(DagUtils.java:802)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.refreshLocalResourcesFromConf(TezSessionState.java:228)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:154)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:234)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
2015-06-19 01:24:18,329 ERROR [HiveServer2-Background-Pool: Thread-41]: ql.Driver (SessionState.java:printError(861)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
2015-06-19 01:24:18,329 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=Driver.execute start=1434657232288 end=1434657258329 duration=26041 from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:24:18,329 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:24:18,329 INFO [HiveServer2-Background-Pool: Thread-41]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=releaseLocks start=1434657258329 end=1434657258329 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:24:18,333 ERROR [HiveServer2-Background-Pool: Thread-41]: operation.Operation (SQLOperation.java:run(200)) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:147)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-06-19 01:24:18,342 INFO [HiveServer2-Handler-Pool: Thread-29]: exec.ListSinkOperator (Operator.java:close(595)) - 40 finished. closing...
2015-06-19 01:24:18,343 INFO [HiveServer2-Handler-Pool: Thread-29]: exec.ListSinkOperator (Operator.java:close(613)) - 40 Close done
2015-06-19 01:24:18,393 INFO [HiveServer2-Handler-Pool: Thread-29]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2015-06-19 01:24:18,394 INFO [HiveServer2-Handler-Pool: Thread-29]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=releaseLocks start=1434657258393 end=1434657258394 duration=1 from=org.apache.hadoop.hive.ql.Driver>
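Since the failing step above is the per-session localization of file:/usr/lib/tez/*, one commonly used setup (a sketch, assuming a stock Apache Tez install; the HDFS paths and tarball location below are illustrative, not taken from this cluster) is to stage the Tez libraries on HDFS once and point tez.lib.uris at them in tez-site.xml, so the session does not have to copy local jars at all:

# stage the Tez tarball once; the local path is an assumption about your install
hdfs dfs -mkdir -p /apps/tez
hdfs dfs -put /usr/lib/tez/share/tez.tar.gz /apps/tez/

<!-- tez-site.xml: point Tez at the pre-staged tarball -->
<property>
  <name>tez.lib.uris</name>
  <value>hdfs://localhost:9000/apps/tez/tez.tar.gz</value>
</property>

Given the "Previous writer likely failed to write" message, it may also be worth checking for stale partial files under /tmp/hive/saurabh/_tez_session_dir left by the earlier failed attempt.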
From hive to elasticsearch
I am working with Cloudera CDH 5.3 with one NameNode (ip: ...169) and three slaves. I have Elasticsearch 1.4.4 installed on my master machine (ip: ...169), and I have downloaded the ES-Hadoop jar and added it to the path. That said, I now want to load data from Hive into ES.

1) First of all, I created a table from a CSV file in the table metastore (with HUE).

2) I defined an external table on top of ES in Hive, to write and load data into it later:

ADD JAR /usr/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-hive-2.0.2.jar;

CREATE EXTERNAL TABLE es_cdr(
  id bigint,
  calling int,
  called int,
  duration int,
  location string,
  date string)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.nodes' = '10.44.162.169',
  'es.resource' = 'indexOmar/typeOmar');

I have also manually added the SerDe snapshot jar via Parameters => Add file => jar.

Now I want to load data from my table into the new ES table:

INSERT OVERWRITE TABLE es_cdr
SELECT NULL, h.appelant, h.called_number, h.call_duration, h.location_number, h.date_heure_appel
FROM hive_cdr h;

But an error appears saying:

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

And this is what's written in the log:

15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO parse.ParseDriver: Parsing command: INSERT OVERWRITE TABLE hive_es_cdr_10 SELECT NULL,h.appelant,h.called_number,h.call_dur,h.loc_number,h.h_appel FROM hive_cdr h limit 2
15/03/05 14:36:34 INFO parse.ParseDriver: Parse Completed
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=parse start=1425562594378 end=1425562594381 duration=3 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Get metadata for source tables
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Get metadata for subqueries
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Get metadata for destination tables
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
15/03/05 14:36:34 INFO common.FileUtils: Creating directory if it doesn't exist: hdfs://master:8020/user/hive/warehouse/hive_es_cdr_10/.hive-staging_hive_2015-03-05_14-36-34_378_4527939627221909415-1
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Set stats collection dir : hdfs://master:8020/user/hive/warehouse/hive_es_cdr_10/.hive-staging_hive_2015-03-05_14-36-34_378_4527939627221909415-1/-ext-10000
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for FS(109)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for SEL(108)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for LIM(107)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for EX(106)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for RS(105)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for LIM(104)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for SEL(103)
15/03/05 14:36:34 INFO ppd.OpProcFactory: Processing for TS(102)
15/03/05 14:36:34 INFO optimizer.ColumnPrunerProcFactory: RS 105 oldColExprMap: {_col5=Column[_col5], _col4=Column[_col4], _col3=Column[_col3], _col2=Column[_col2], _col1=Column[_col1], _col0=Column[_col0]}
15/03/05 14:36:34 INFO optimizer.ColumnPrunerProcFactory: RS 105 newColExprMap: {_col5=Column[_col5], _col4=Column[_col4], _col3=Column[_col3], _col2=Column[_col2], _col1=Column[_col1], _col0=Column[_col0]}
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=partition-retrieving start=1425562594461 end=1425562594461 duration=0 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/03/05 14:36:34 INFO physical.MetadataOnlyOptimizer: Looking for table scans where optimization is applicable
15/03/05 14:36:34 INFO physical.MetadataOnlyOptimizer: Found 0 metadata only table scans
15/03/05 14:36:34 INFO parse.SemanticAnalyzer: Completed plan generation
15/03/05 14:36:34 INFO ql.Driver: Semantic Analysis Completed
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1425562594381 end=1425562594463 duration=82 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:bigint, comment:null), FieldSchema(name:_col1, type:int, comment:null), FieldSchema(name:_col2, type:int, comment:null), FieldSchema(name:_col3, type:int, comment:null), FieldSchema(name:_col4, type:string, comment:null), FieldSchema(name:_col5, type:string, comment:null)], properties:null)
15/03/05 14:36:34 INFO ql.Driver: EXPLAIN output for queryid hive_20150305143636_528f97d4-b670-40e2-ba80-7d7a7bd441ff : ABSTRACT SYNTAX TREE: TOK_QUERY TOK_FROM TOK_TABREF TOK_TABNAME hive_cdr h TOK_INSERT TOK_DESTINATION TOK_TAB TOK_TABNAME hive_es_cdr_10 TOK_SELECT TOK_SELEXPR TOK_NULL TOK_SELEXPR . TOK_TABLE_OR_COL h appelant TOK_SELEXPR . TOK_TABLE_OR_COL h called_number TOK_SELEXPR . TOK_TABLE_OR_COL h call_dur TOK_SELEXPR . TOK_TABLE_OR_COL h loc_number TOK_SELEXPR .
TOK_TABLE_OR_COL h h_appel TOK_LIMIT 2 STAGE DEPENDENCIES: Stage-0 is a root stage [MAPRED] STAGE PLANS: Stage: Stage-0 Map Reduce Map Operator Tree: TableScan alias: h GatherStats: false Select Operator expressions: null (type: string), appelant (type: int), called_number (type: int), call_dur (type: int), loc_number (type: string), h_appel (type: string) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 Limit Number of rows: 2 Reduce Output Operator sort order: tag: -1 value expressions: _col0 (type: void), _col1 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: string), _col5 (type: string) Path -> Alias: hdfs://master:8020/user/hive/warehouse/hive_cdr [h] Path -> Partition: hdfs://master:8020/user/hive/warehouse/hive_cdr Partition base file name: hive_cdr input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat properties: COLUMN_STATS_ACCURATE true bucket_count -1 columns traffic_type_id,appelant,called_number,call_dur,loc_number,h_appel columns.comments columns.types int:int:int:int:string:string field.delim ; file.inputformat org.apache.hadoop.mapred.TextInputFormat file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat location hdfs://master:8020/user/hive/warehouse/hive_cdr name default.hive_cdr numFiles 1 numRows 0 rawDataSize 0 serialization.ddl struct hive_cdr { i32 traffic_type_id, i32 appelant, i32 called_number, i32 call_dur, string loc_number, string h_appel} serialization.format ; serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe totalSize 56373362 transient_lastDdlTime 1425459002 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat properties: COLUMN_STATS_ACCURATE true bucket_count -1 columns traffic_type_id,appelant,called_number,call_dur,loc_number,h_appel columns.comments columns.types int:int:int:int:string:string field.delim ; file.inputformat org.apache.hadoop.mapred.TextInputFormat file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat location hdfs://master:8020/user/hive/warehouse/hive_cdr name default.hive_cdr numFiles 1 numRows 0 rawDataSize 0 serialization.ddl struct hive_cdr { i32 traffic_type_id, i32 appelant, i32 called_number, i32 call_dur, string loc_number, string h_appel} serialization.format ; serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe totalSize 56373362 transient_lastDdlTime 1425459002 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.hive_cdr name: default.hive_cdr Truncated Path -> Alias: /hive_cdr [h] Needs Tagging: false Reduce Operator Tree: Extract Limit Number of rows: 2 Select Operator expressions: UDFToLong(_col0) (type: bigint), _col1 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: string), _col5 (type: string) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 File Output Operator compressed: false GlobalTableId: 1 directory: hdfs://master:8020/user/hive/warehouse/hive_es_cdr_10 NumFilesPerFileSink: 1 Stats Publishing Key Prefix: hdfs://master:8020/user/hive/warehouse/hive_es_cdr_10/ table: input format: org.elasticsearch.hadoop.hive.EsHiveInputFormat jobProperties: EXTERNAL TRUE bucket_count -1 columns id_traffic,caller,called,call_dur,caller_location,call_date columns.comments columns.types bigint:int:int:int:string:string es.nodes 10.44.162.169 es.port 9200 es.resource 
myindex/mytype file.inputformat org.apache.hadoop.mapred.SequenceFileInputFormat file.outputformat org.apache.hadoop.mapred.SequenceFileOutputFormat location hdfs://master:8020/user/hive/warehouse/hive_es_cdr_10 name default.hive_es_cdr_10 serialization.ddl struct hive_es_cdr_10 { i64 id_traffic, i32 caller, i32 called, i32 call_dur, string caller_location, string call_date} serialization.format 1 serialization.lib org.elasticsearch.hadoop.hive.EsSerDe storage_handler org.elasticsearch.hadoop.hive.EsStorageHandler transient_lastDdlTime 1425561441 output format: org.elasticsearch.hadoop.hive.EsHiveOutputFormat properties: EXTERNAL TRUE bucket_count -1 columns id_traffic,caller,called,call_dur,caller_location,call_date columns.comments columns.types bigint:int:int:int:string:string es.nodes 10.44.162.169 es.port 9200 es.resource myindex/mytype file.inputformat org.apache.hadoop.mapred.SequenceFileInputFormat file.outputformat org.apache.hadoop.mapred.SequenceFileOutputFormat location hdfs://master:8020/user/hive/warehouse/hive_es_cdr_10 name default.hive_es_cdr_10 serialization.ddl struct hive_es_cdr_10 { i64 id_traffic, i32 caller, i32 called, i32 call_dur, string caller_location, string call_date} serialization.format 1 serialization.lib org.elasticsearch.hadoop.hive.EsSerDe storage_handler org.elasticsearch.hadoop.hive.EsStorageHandler transient_lastDdlTime 1425561441 serde: org.elasticsearch.hadoop.hive.EsSerDe name: default.hive_es_cdr_10 TotalFiles: 1 GatherStats: false MultiFileSpray: false
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=compile start=1425562594378 end=1425562594484 duration=106 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=acquireReadWriteLocks from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO lockmgr.DummyTxnManager: Creating lock manager of type org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
15/03/05 14:36:34 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=600000 watcher=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager$DummyWatcher#70e69669
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=acquireReadWriteLocks start=1425562594502 end=1425562594523 duration=21 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO ql.Driver: Starting command: INSERT OVERWRITE TABLE hive_es_cdr_10 SELECT NULL,h.appelant,h.called_number,h.call_dur,h.loc_number,h.h_appel FROM hive_cdr h limit 2
15/03/05 14:36:34 INFO ql.Driver: Total jobs = 1
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1425562594500 end=1425562594526 duration=26 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=task.MAPRED.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:34 INFO ql.Driver: Launching Job 1 out of 1
15/03/05 14:36:34 INFO exec.Task: Number of reduce tasks determined at compile time: 1
15/03/05 14:36:34 INFO exec.Task: In order to change the average load for a reducer (in bytes):
15/03/05 14:36:34 INFO exec.Task: set hive.exec.reducers.bytes.per.reducer=<number>
15/03/05 14:36:34 INFO exec.Task: In order to limit the maximum number of reducers:
15/03/05 14:36:34 INFO exec.Task: set hive.exec.reducers.max=<number>
15/03/05 14:36:34 INFO exec.Task: In order to set a constant number of reducers:
15/03/05 14:36:34 INFO exec.Task: set mapreduce.job.reduces=<number>
15/03/05 14:36:34 INFO ql.Context: New scratch dir is hdfs://master:8020/tmp/hive-hive/hive_2015-03-05_14-36-34_378_4527939627221909415-7
15/03/05 14:36:34 INFO mr.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
15/03/05 14:36:34 INFO mr.ExecDriver: adding libjars: file:///tmp/d39b23a8-98d2-4bc3-9008-3eff080dd20c_resources/hive-serdes-1.0-SNAPSHOT.jar,file:///usr/elasticsearch-hadoop-2.0.2/dist/elasticsearch-hadoop-hive-2.0.2.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hive/lib/hive-hbase-handler-0.13.1-cdh5.3.1.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/hbase-server.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/lib/htrace-core.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/lib/htrace-core-2.04.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/hbase-common.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/hbase-client.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/hbase-protocol.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/hbase-hadoop2-compat.jar,file:///opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hbase/hbase-hadoop-compat.jar
15/03/05 14:36:34 INFO exec.Utilities: Processing alias h
15/03/05 14:36:34 INFO exec.Utilities: Adding input file hdfs://master:8020/user/hive/warehouse/hive_cdr
15/03/05 14:36:34 INFO exec.Utilities: Content Summary not cached for hdfs://master:8020/user/hive/warehouse/hive_cdr
15/03/05 14:36:34 INFO ql.Context: New scratch dir is hdfs://master:8020/tmp/hive-hive/hive_2015-03-05_14-36-34_378_4527939627221909415-7
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/03/05 14:36:34 INFO exec.Utilities: Serializing MapWork via kryo
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=serializePlan start=1425562594554 end=1425562594638 duration=84 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/03/05 14:36:34 INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/03/05 14:36:34 INFO exec.Utilities: Serializing ReduceWork via kryo
15/03/05 14:36:34 INFO log.PerfLogger: </PERFLOG method=serializePlan start=1425562594653 end=1425562594708 duration=55 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/03/05 14:36:34 INFO client.RMProxy: Connecting to ResourceManager at master/10.44.162.169:8032
15/03/05 14:36:34 INFO client.RMProxy: Connecting to ResourceManager at master/10.44.162.169:8032
15/03/05 14:36:34 WARN mr.EsOutputFormat: Speculative execution enabled for reducer - consider disabling it to prevent data corruption
15/03/05 14:36:34 INFO mr.EsOutputFormat: Writing to [myindex/mytype]
15/03/05 14:36:34 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/03/05 14:36:35 INFO log.PerfLogger: <PERFLOG method=getSplits from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
15/03/05 14:36:35 INFO io.CombineHiveInputFormat: CombineHiveInputSplit creating pool for hdfs://master:8020/user/hive/warehouse/hive_cdr; using filter path hdfs://master:8020/user/hive/warehouse/hive_cdr
15/03/05 14:36:35 INFO input.FileInputFormat: Total input paths to process : 1
15/03/05 14:36:35 INFO input.CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 3, size left: 0
15/03/05 14:36:35 INFO io.CombineHiveInputFormat: number of splits 1
15/03/05 14:36:35 INFO log.PerfLogger: </PERFLOG method=getSplits start=1425562595867 end=1425562595896 duration=29 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
15/03/05 14:36:35 INFO mapreduce.JobSubmitter: number of splits:1
15/03/05 14:36:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1425457357655_0006
15/03/05 14:36:36 INFO impl.YarnClientImpl: Submitted application application_1425457357655_0006
15/03/05 14:36:36 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1425457357655_0006/
15/03/05 14:36:36 INFO exec.Task: Starting Job = job_1425457357655_0006, Tracking URL = http://master:8088/proxy/application_1425457357655_0006/
15/03/05 14:36:36 INFO exec.Task: Kill Command = /opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/hadoop/bin/hadoop job -kill job_1425457357655_0006
15/03/05 14:36:58 INFO exec.Task: Hadoop job information for Stage-0: number of mappers: 0; number of reducers: 0
15/03/05 14:36:58 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
15/03/05 14:36:58 INFO exec.Task: 2015-03-05 14:36:58,687 Stage-0 map = 0%, reduce = 0%
15/03/05 14:36:58 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
15/03/05 14:36:58 ERROR exec.Task: Ended Job = job_1425457357655_0006 with errors
15/03/05 14:36:58 INFO impl.YarnClientImpl: Killed application application_1425457357655_0006
15/03/05 14:36:58 ERROR ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
15/03/05 14:36:58 INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1425562594523 end=1425562618754 duration=24231 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:58 INFO ql.Driver: MapReduce Jobs Launched:
15/03/05 14:36:58 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
15/03/05 14:36:58 INFO ql.Driver: Stage-Stage-0: HDFS Read: 0 HDFS Write: 0 FAIL
15/03/05 14:36:58 INFO ql.Driver: Total MapReduce CPU Time Spent: 0 msec
15/03/05 14:36:58 INFO log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:58 INFO ZooKeeperHiveLockManager: about to release lock for default/hive_es_cdr_10
15/03/05 14:36:58 INFO ZooKeeperHiveLockManager: about to release lock for default/hive_cdr
15/03/05 14:36:58 INFO ZooKeeperHiveLockManager: about to release lock for default
15/03/05 14:36:58 INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1425562618768 end=1425562618780 duration=12 from=org.apache.hadoop.hive.ql.Driver>
15/03/05 14:36:58 ERROR operation.Operation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:147)
at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run
It seems the failure is caused by a type issue. You can use the es.mapping properties in TBLPROPERTIES to control how the Hive columns map onto Elasticsearch fields.
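As a sketch (not a tested fix): the plan above shows the inserted NULL arriving as _col0 (type: void), so giving it an explicit type, and naming the ES fields via es.mapping.names (the date:call_date pair below is purely illustrative), might look like this:

-- illustrative: map the Hive column "date" to an ES field named "call_date"
CREATE EXTERNAL TABLE es_cdr(
  id bigint, calling int, called int, duration int, location string, date string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.nodes' = '10.44.162.169',
  'es.resource' = 'indexOmar/typeOmar',
  'es.mapping.names' = 'date:call_date');

-- cast the literal NULL so Hive gives the column a real type instead of void
INSERT OVERWRITE TABLE es_cdr
SELECT CAST(NULL AS BIGINT), h.appelant, h.called_number, h.call_duration, h.location_number, h.date_heure_appel
FROM hive_cdr h;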
Pig "Max" command for pig-0.12.1 and pig-0.13.0 with Hadoop-2.4.0
I have a Pig script I got from Hortonworks that works fine with pig-0.9.2.15 with Hadoop-1.0.3.16. But when I run it with pig-0.12.1 (recompiled with -Dhadoopversion=23) or pig-0.13.0 on Hadoop-2.4.0, it won't work. It seems the following line is where the problem is:

max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;

Here's the whole script:

batting = load 'pig_data/Batting.csv' using PigStorage(',');
runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
grp_data = GROUP runs by (year);
max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;
STORE join_data INTO './join_data';

And here's the Hadoop error info:

2014-07-29 18:03:02,957 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: grp_data: Local Rearrange[tuple]{bytearray}(false) - scope-34 Operator Key: scope-34): org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error executing an algebraic function
2014-07-29 18:03:02,958 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!

How can I fix this if I still want to use the "MAX" function? Thank you!

Here's the complete information:

14/07/29 17:50:11 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/07/29 17:50:11 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/07/29 17:50:11 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-07-29 17:50:12,104 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-07-29 17:50:12,104 [main] INFO org.apache.pig.Main - Logging error messages to: /root/hadooptestingsuite/scripts/tests/pig_test/hadoop2/pig_1406677812103.log
2014-07-29 17:50:13,050 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-07-29 17:50:13,415 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-07-29 17:50:13,415 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:13,415 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://namenode.cmda.hadoop.com:8020
2014-07-29 17:50:14,302 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: namenode.cmda.hadoop.com:8021
2014-07-29 17:50:14,990 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:15,570 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:15,665 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
2014-07-29 17:50:15,705 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2014-07-29 17:50:15,791 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN,GROUP_BY
2014-07-29 17:50:15,873 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-07-29 17:50:16,319 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-07-29 17:50:16,377 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2014-07-29 17:50:16,410 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POPackage(JoinPackager)
2014-07-29 17:50:16,417 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2014-07-29 17:50:16,418 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees.
2014-07-29 17:50:16,418 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 out of total 3 MR operators.
2014-07-29 17:50:16,418 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2014-07-29 17:50:16,493 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:16,575 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at namenode.cmda.hadoop.com/10.0.3.1:8050
2014-07-29 17:50:16,973 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2014-07-29 17:50:17,007 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2014-07-29 17:50:17,007 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-07-29 17:50:17,007 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2014-07-29 17:50:17,020 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-07-29 17:50:17,020 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-07-29 17:50:17,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=6398990
2014-07-29 17:50:17,067 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-07-29 17:50:17,067 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2014-07-29 17:50:17,068 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2014-07-29 17:50:17,068 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2337803902169382273.jar
2014-07-29 17:50:20,957 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2337803902169382273.jar created
2014-07-29 17:50:20,957 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2014-07-29 17:50:21,001 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2014-07-29 17:50:21,036 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-07-29 17:50:21,036 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2014-07-29 17:50:21,046 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-07-29 17:50:21,310 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-07-29 17:50:21,311 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2014-07-29 17:50:21,332 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at namenode.cmda.hadoop.com/10.0.3.1:8050
2014-07-29 17:50:21,366 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:22,606 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-07-29 17:50:22,606 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-07-29 17:50:22,629 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-07-29 17:50:22,729 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2014-07-29 17:50:22,745 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-29 17:50:23,026 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1406677482986_0003
2014-07-29 17:50:23,258 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1406677482986_0003
2014-07-29 17:50:23,340 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://namenode.cmda.hadoop.com:8088/proxy/application_1406677482986_0003/
2014-07-29 17:50:23,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1406677482986_0003
2014-07-29 17:50:23,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases batting,grp_data,max_runs,runs
2014-07-29 17:50:23,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: batting[3,10],runs[5,7],max_runs[7,11],grp_data[6,11] C: max_runs[7,11],grp_data[6,11] R: max_runs[7,11]
2014-07-29 17:50:23,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://namenode.cmda.hadoop.com:50030/jobdetails.jsp?jobid=job_1406677482986_0003
2014-07-29 17:50:23,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-07-29 17:50:23,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1406677482986_0003]
2014-07-29 17:51:15,564 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2014-07-29 17:51:15,564 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1406677482986_0003]
2014-07-29 17:51:18,582 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2014-07-29 17:51:18,582 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1406677482986_0003 has failed! Stop running all dependent jobs
2014-07-29 17:51:18,582 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-07-29 17:51:18,825 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: grp_data: Local Rearrange[tuple]{bytearray}(false) - scope-73 Operator Key: scope-73): org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error executing an algebraic function
2014-07-29 17:51:18,825 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2014-07-29 17:51:18,826 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.4.0 0.13.0 root 2014-07-29 17:50:16 2014-07-29 17:51:18 HASH_JOIN,GROUP_BY
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1406677482986_0003 batting,grp_data,max_runs,runs MULTI_QUERY,COMBINER Message: Job failed!
Input(s):
Failed to read data from "hdfs://namenode.cmda.hadoop.com:8020/user/root/pig_data/Batting.csv"
Output(s):
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1406677482986_0003 -> null, null
2014-07-29 17:51:18,826 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-07-29 17:51:18,827 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2106: Error executing an algebraic function
Details at logfile: /root/hadooptestingsuite/scripts/tests/pig_test/hadoop2/pig_1406677812103.log
2014-07-29 17:51:18,828 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job scope-58 failed, hadoop does not return any error message
Details at logfile: /root/hadooptestingsuite/scripts/tests/pig_test/hadoop2/pig_1406677812103.log
Try casting the result of the MAX function:

max_runs = FOREACH grp_data GENERATE group as grp, (int)MAX(runs.runs) as max_runs;

Hope it will work.
You should use data types in your load statement:

runs = FOREACH batting GENERATE $0 as playerID:chararray, $1 as year:int, $8 as runs:int;

If this doesn't help for some reason, try explicit casting:

max_runs = FOREACH grp_data GENERATE group as grp, MAX((int)runs.runs) as max_runs;
Thanks to both @BigData and @Mikko Kupsu for the hint. The issue does indeed have something to do with datatype casting. After specifying the data type of each column as follows, everything runs great:

batting = LOAD '/user/root/pig_data/Batting.csv' USING PigStorage(',') AS
  (playerID: CHARARRAY, yearID: INT, stint: INT, teamID: CHARARRAY, lgID: CHARARRAY,
   G: INT, G_batting: INT, AB: INT, R: INT, H: INT, two_B: INT, three_B: INT,
   HR: INT, RBI: INT, SB: INT, CS: INT, BB: INT, SO: INT, IBB: INT, HBP: INT,
   SH: INT, SF: INT, GIDP: INT, G_old: INT);
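With the typed schema in place, the downstream statements can stay positional or switch to the field names from the AS clause. A minimal sketch of the aggregation steps (field names taken from the schema above, where the original $8 is R):

runs = FOREACH batting GENERATE playerID, yearID AS year, R AS runs;
grp_data = GROUP runs BY year;
-- MAX now operates on int values instead of bytearray, which is what
-- triggered ERROR 2106 with the untyped load
max_runs = FOREACH grp_data GENERATE group AS grp, MAX(runs.runs) AS max_runs;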