Hive multi-table join with same condition error

I am running several scripts and I keep getting the same error. All of them are multi-table joins on the same condition.
Data is stored as parquet.
Hive version 1.2.1 / MR
SELECT count(*)
FROM xxx.tmp_usr_1 m
INNER JOIN xxx.tmp_usr n
ON m.date_id = n.date_id AND m.end_user_id = n.end_user_id
LEFT JOIN xxx.usr_2 p
ON m.date_id = p.date_id AND m.end_user_id = p.end_user_id;
Here is the error message:
2017-01-22 16:47:55,208 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 560.81 sec
2017-01-22 16:47:56,248 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 577.74 sec
2017-01-22 16:47:57,290 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 446.32 sec
MapReduce Total cumulative CPU time: 7 minutes 26 seconds 320 msec
Ended Job = job_1484710871657_6350 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1484710871657_6350_m_000061 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000069 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000053 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000011 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000063 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000049 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000052 (and more) from job job_1484710871657_6350
Task with the most failures(4):
-----
Task ID:
task_1484710871657_6350_m_000071
URL:
http://xxxxxxxxxx/taskdetails.jsp?jobid=job_1484710871657_6350&tipid=task_1484710871657_6350_m_000071
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:213)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:333)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
... 11 more
Caused by: java.lang.IllegalStateException: Invalid schema data type, found: PRIMITIVE, expected: STRUCT
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedGroupFields(DataWritableReadSupport.java:118)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:156)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:222)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:99)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)
... 16 more
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
My data consists of about 20M records. When I try to join the tables on a single column (end_user_id), I get the same error.
The join columns have the same data type in every table. Running A join B as a subquery and then joining C to its result works around the issue.
We have many SQL queries with multi-table joins on the same condition, but only a few of the scripts hit this error.

Make sure the join columns have the same data type in all of the tables.
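One way to verify this is to compare the declared types of the join columns mechanically rather than by eye. The following is only a minimal sketch: the function name mismatched_join_columns and the schema dictionaries are hypothetical, and the types shown are made up for illustration (in practice they would be copied from each table's DESCRIBE output).

```python
def mismatched_join_columns(schemas, join_columns):
    """Return {column: {table: type}} for join columns whose declared types differ."""
    mismatches = {}
    for column in join_columns:
        # Declared type of this column in every table (None if the column is missing).
        types = {table: cols.get(column) for table, cols in schemas.items()}
        # Hive type names are case-insensitive, so compare them lowercased.
        normalized = {t.lower() if t is not None else None for t in types.values()}
        if len(normalized) > 1:
            mismatches[column] = types
    return mismatches


# Hypothetical schemas; the types are invented, not taken from the question.
schemas = {
    "xxx.tmp_usr_1": {"date_id": "string", "end_user_id": "bigint"},
    "xxx.tmp_usr": {"date_id": "string", "end_user_id": "bigint"},
    "xxx.usr_2": {"date_id": "string", "end_user_id": "string"},
}

print(mismatched_join_columns(schemas, ["date_id", "end_user_id"]))
```

An empty result means the declared types agree, in which case the cause is probably somewhere else.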

Related

Hive On MR java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.HiveFileFormatUtils$NullOutputCommitter not found

I am running a GROUP BY query on a Kerberos-secured Hadoop cluster using Hive. The logs show that the MapReduce job runs to completion, but then I get the error below; sometimes I get an error about some other class not being found instead. How can I debug such issues, and what are the possible causes?
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2023-02-16 19:18:34,771 Stage-1 map = 0%, reduce = 0%
2023-02-16 19:18:54,195 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 10.47 sec
2023-02-16 19:19:34,108 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 10.47 sec
MapReduce Total cumulative CPU time: 10 seconds 470 msec
Ended Job = job_1676554595644_0001 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1676554595644_0001_m_000000 (and more) from job job_1676554595644_0001
Task with the most failures(4):
-----
Task ID:
task_1676554595644_0001_r_000000
URL:
http://hadoop-namenode.hadoop.com:8088/taskdetails.jsp?jobid=job_1676554595644_0001&tipid=task_1676554595644_0001_r_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.HiveFileFormatUtils$NullOutputCommitter not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2427)
at org.apache.hadoop.mapred.JobConf.getOutputCommitter(JobConf.java:725)
at org.apache.hadoop.mapred.Task.initialize(Task.java:602)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:332)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.HiveFileFormatUtils$NullOutputCommitter not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2395)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2419)
... 8 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.HiveFileFormatUtils$NullOutputCommitter not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2299)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2393)
... 9 more
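On the debugging part of the question: a generic first step with traces like this is to look at the innermost cause, which in a Java stack trace is the last "Caused by:" line. A minimal sketch that pulls it out of the diagnostic text follows; the helper name root_cause is hypothetical and the sample is an abbreviated copy of the trace above.

```python
def root_cause(diagnostic_text):
    """Return the message of the last 'Caused by:' line, or the first line if there is none."""
    lines = [line.strip() for line in diagnostic_text.splitlines() if line.strip()]
    prefix = "Caused by:"
    causes = [line[len(prefix):].strip() for line in lines if line.startswith(prefix)]
    if causes:
        return causes[-1]
    return lines[0] if lines else ""


# Abbreviated copy of the diagnostic output from the question.
sample = """Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.HiveFileFormatUtils$NullOutputCommitter not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2427)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.HiveFileFormatUtils$NullOutputCommitter not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2395)
... 8 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.HiveFileFormatUtils$NullOutputCommitter not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2299)
... 9 more
"""

print(root_cause(sample))
```

Here the innermost cause is the ClassNotFoundException itself, raised while the reduce task was loading its output committer, so the class was not visible on that task's classpath. Whether the Hive jars are actually available to the task containers would be a reasonable thing to check next.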

ACID transactions on data added from Spark not working

I'm trying to use ACID transactions in Hive, but I have a problem when the data is added with Spark.
First, I created a table with the following statement:
CREATE TABLE testdb.test(id string, col1 string)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC TBLPROPERTIES('transactional'='true');
Then I added data with these queries:
INSERT INTO testdb.test VALUES("1", "A");
INSERT INTO testdb.test VALUES("2", "B");
INSERT INTO testdb.test VALUES("3", "C");
And I was able to delete rows with this query:
DELETE FROM testdb.test WHERE id="1";
All of that worked perfectly, but a problem occurs when I try to delete rows that were added with Spark.
This is what I do in Spark (IPython):
hc = HiveContext(sc)
data = sc.parallelize([["1", "A"], ["2", "B"], ["3", "C"]])
data_df = hc.createDataFrame(data)
data_df.registerTempTable("data_df")
hc.sql("INSERT INTO testdb.test SELECT * FROM data_df");
Then, when I come back to Hive, I'm able to run a SELECT query on the "test" table.
However, when I try to run the exact same DELETE query as before, I get the following error (it happens after the reduce phase):
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":0}},"value":null}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":0}},"value":null}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:723)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
I have no idea where this is coming from, which is why I'm looking for ideas.
I'm using the Cloudera Quickstart VM (5.4.2).
Hive version: 1.1.0
Spark version: 1.3.0
And here is the complete output of the Hive DELETE command:
hive> delete from testdb.test where id="1";
Query ID = cloudera_20160914090303_795e40b7-ab6a-45b0-8391-6d41d1cfe7bd
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 4
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1473858545651_0036, Tracking URL =http://quickstart.cloudera:8088/proxy/application_1473858545651_0036/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1473858545651_0036
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 4
2016-09-14 09:03:55,571 Stage-1 map = 0%, reduce = 0%
2016-09-14 09:04:14,898 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.66 sec
2016-09-14 09:04:15,944 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.33 sec
2016-09-14 09:04:44,101 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 4.21 sec
2016-09-14 09:04:46,523 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 4.79 sec
2016-09-14 09:04:47,673 Stage-1 map = 100%, reduce = 42%, Cumulative CPU 5.8 sec
2016-09-14 09:04:50,041 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 7.45 sec
2016-09-14 09:05:18,486 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.69 sec
MapReduce Total cumulative CPU time: 7 seconds 690 msec
Ended Job = job_1473858545651_0036 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://quickstart.cloudera:8088/proxy/application_1473858545651_0036/
Examining task ID: task_1473858545651_0036_m_000000 (and more) from job job_1473858545651_0036
Task with the most failures(4):
-----
Task ID:
task_1473858545651_0036_r_000001
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1473858545651_0036&tipid=task_1473858545651_0036_r_000001
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":0}},"value":null}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":0}},"value":null}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:723)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 4 Cumulative CPU: 7.69 sec HDFS Read: 21558 HDFS Write: 114 FAIL
Total MapReduce CPU Time Spent: 7 seconds 690 msec
Thanks!
Use the Spark HiveAcid data source: http://github.com/qubole/spark-acid
val df = spark.read.format("HiveAcid").option("table", "testdb.test").load()
df.collect()
Spark needs to run against HMS 3.1.1 so that the underlying data source can take the necessary locks.
The directory structure and file formats of a Hive ACID table differ from those of a normal table, so CRUD operations need to happen from Hive.
On the Spark side, normal table reads are not compatible with Hive ACID tables, so the native Spark APIs cannot be used to read the table.
Also, Spark currently has no support for updates, deletes, or inserts.
For reading the data, one can use the connector: http://github.com/qubole/spark-acid
I had the same issue running from Hue, but after I set these parameters from the Hive CLI, it started working:
set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DBTxnManager;
set hive.compactor.initiator.on=true;

HBase Hive Integration - Error

When I try to load data from HDFS into HBase using Hive logical tables, I run into the following problem. I am new to Hadoop and not able to trace the error. I am using the CDH4 VM.
Creating a new HBase table which is managed by Hive
CREATE TABLE hive_hbasetable(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hivehbasek1");
HBase shell output
hbase(main):002:0> list
TABLE
hivebasek1
mysql_cityclimate
2 row(s) in 0.2470 seconds
I created a logical table hive_logictable in Hive
CREATE TABLE hive_logictable (foo INT, bar STRING) row format delimited fields terminated by ',';
Inserting data in hive_logictable from HDFS.
cat TextFile.txt
100,value1
101,value2
102,value3
103,value4
104,value5
105,value6
LOAD DATA LOCAL INPATH '/home/cloudera/TextFile.txt' OVERWRITE INTO TABLE hive_logictable;
Loading data into HBase table using Hive.
INSERT OVERWRITE TABLE hive_hbasetable SELECT * FROM hive_logictable;
These are the error messages I get:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201501200937_0004, Tracking URL = http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201501200937_0004
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201501200937_0004
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2015-01-20 10:38:07,412 Stage-0 map = 0%, reduce = 0%
2015-01-20 10:38:52,822 Stage-0 map = 100%, reduce = 100%
Ended Job = job_201501200937_0004 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201501200937_0004
Examining task ID: task_201501200937_0004_m_000002 (and more) from job job_201501200937_0004
Task with the most failures(4):
-----
Task ID:
task_201501200937_0004_m_000000
URL:
http://localhost.localdomain:50030/taskdetails.jsp?jobid=job_201501200937_0004&tipid=task_201501200937_0004_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
End of Error Message.
Could you please check whether an atomic insert works on the Hive table, and share the results?

error in hive-hbase integration

I am using Hive 0.12.0, Hadoop 2.4.0 and HBase 0.98.3.
I created a table info in HBase, populated it with one row of data, and gave Hive access to it (using an external table).
When running the query
select count (*) from info;
I get
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_1405407486295_0003, Tracking URL = http://prashasti-Vostro-2520:8088/proxy/application_1405407486295_0003/
Kill Command = /home/prashasti/Installed/hadoop/bin/hadoop job -kill job_1405407486295_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-07-15 12:55:14,643 Stage-1 map = 0%, reduce = 0%
2014-07-15 12:55:39,914 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1405407486295_0003 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1405407486295_0003_m_000000 (and more) from job job_1405407486295_0003
Task with the most failures(4):
-----
Task ID:
task_1405407486295_0003_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1405407486295_0003&tipid=task_1405407486295_0003_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.NullPointerException
at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.close(TableRecordReaderImpl.java:161)
at org.apache.hadoop.hbase.mapreduce.TableRecordReader.close(TableRecordReader.java:80)
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.close(HiveHBaseTableInputFormat.java:198)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doClose(HiveRecordReader.java:50)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.close(HiveContextAwareRecordReader.java:96)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:209)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1950)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:445)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
I already tried the following:
1) I created a hive/auxlib directory and added protobuf-.jar, zookeeper-.jar, hbase-client-.jar, hbase-server-.jar, hbase-common-.jar, hbase-protobuf-.jar, hive-hbase-handler-.jar and guava-.jar from hbase/lib.
This appears to be a known issue:
https://issues.apache.org/jira/browse/HIVE-4520

Datastax Enterprise 3.2 hive timeout exception

I tried to run a simple Hive query through DataStax Enterprise, but it always fails with a timeout (on a small data set or even on empty tables). I have 4 m1.large nodes on AWS (2x Cassandra and 2x Analytics). See below:
cqlsh:intracker> select count(*) from event_tracks_by_browser_date LIMIT 100000;
count
-------
15030
Then with hive:
hive> select * from event_tracks_by_browser_date where type_id=10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201312292327_0003, Tracking URL = http://node3:50030/jobdetails.jsp?jobid=job_201312292327_0003
Kill Command = /usr/bin/dse hadoop job -Dmapred.job.tracker=10.234.9.204:8012 -kill job_201312292327_0003
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2013-12-30 10:30:27,578 Stage-1 map = 0%, reduce = 0%
2013-12-30 10:31:27,890 Stage-1 map = 0%, reduce = 0%
2013-12-30 10:32:28,137 Stage-1 map = 0%, reduce = 0%
2013-12-30 10:33:12,344 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201312292327_0003 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201312292327_0003_m_000003 (and more) from job job_201312292327_0003
Exception in thread "Thread-10" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getStackTraces(TaskLogProcessor.java:240)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:227)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:92)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://node2:50060/tasklog?taskid=attempt_201312292327_0003_m_000000_1&start=-8193
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
at java.net.URL.openStream(URL.java:1037)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getStackTraces(TaskLogProcessor.java:192)
... 3 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 2 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
I checked system.log and it looks like some kind of timeout occurs.
INFO [IPC Server handler 6 on 8012] 2013-12-30 10:32:24,880 TaskInProgress.java (line 551) Error from attempt_201312292327_0003_m_000001_2: java.io.IOException: java.io.IOException: java.lang.RuntimeException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:243)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:522)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: java.io.IOException: java.lang.RuntimeException
at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:100)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:240)
... 9 more
Caused by: java.lang.RuntimeException
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:648)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:301)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167)
at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:91)
at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:94)
... 10 more
Caused by: TimedOutException()
at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:42710)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1724)
at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1709)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:637)
... 14 more
Cassandra cqlsh works with no problems. Any ideas?
Try increasing your replication factor to 2 for Analytics.
