Cloudera Hive: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce

I keep getting this error when trying to query data using Hue:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
From the Hue job browser, under the syslog tab (the error log is too big to paste here):
http://pastebin.com/h8tgYuzR
Error from the terminal:
hive> SELECT count(*) FROM tweets;
Query ID = cloudera_20161128145151_137efb02-413b-4457-b21d-084101b77091
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1480364897609_0003, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1480364897609_0003/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1480364897609_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-11-28 14:52:09,804 Stage-1 map = 0%, reduce = 0%
2016-11-28 14:53:10,955 Stage-1 map = 0%, reduce = 0%
2016-11-28 14:53:13,213 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1480364897609_0003 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://quickstart.cloudera:8088/proxy/application_1480364897609_0003/
Examining task ID: task_1480364897609_0003_m_000000 (and more) from job job_1480364897609_0003
Task with the most failures(4):
-----
Task ID:
task_1480364897609_0003_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1480364897609_0003&tipid=task_1480364897609_0003_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable Objavro.schema�
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable Objavro.schema�
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:505)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.StringReader@7aee0989; line: 1, column: 2]
at com.cloudera.hive.serde.JSONSerDe.deserialize(JSONSerDe.java:128)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:136)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:100)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:496)
... 9 more
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.StringReader@7aee0989; line: 1, column: 2]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
at org.codehaus.jackson.impl.ReaderBasedParser._handleUnexpectedValue(ReaderBasedParser.java:630)
at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:364)
at org.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:2439)
at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2396)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1602)
at com.cloudera.hive.serde.JSONSerDe.deserialize(JSONSerDe.java:126)
... 12 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Here is the table definition:
CREATE EXTERNAL TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/cloudera/flume/tweets';
Data from the file I am trying to load: http://pastebin.com/g7eg1BaP

I have a feeling this is a format mismatch: the data files appear to be Avro, while the table's SerDe expects JSON. The writable in the error begins with "Objavro.schema" (the header of an Avro container file), and the JSONSerDe then fails on the unexpected character 'O' because that header is not valid JSON.
Remember that Hive is "schema on read", not "schema on load": the SerDe is only applied when a job runs, so a format mismatch will not surface while loading data or defining the table.
Compare the CREATE TABLE statement with a few records from the files under /user/cloudera/flume/tweets; if Flume is writing Avro, either configure it to write plain line-delimited JSON or define the table with an Avro SerDe instead (a sketch follows below).
Hope this helps.
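If the files do turn out to be Avro, a table along the following lines would let Hive read them with the standard Avro SerDe. This is only a sketch, not something from the original post: the table name and the avro.schema.url path are assumptions, so point them at your actual schema file.
-- Hypothetical Avro-backed table over the same Flume directory.
-- Replace the avro.schema.url value with the real path to your .avsc schema file.
CREATE EXTERNAL TABLE tweets_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/cloudera/flume/tweets'
TBLPROPERTIES ('avro.schema.url' = 'hdfs:///user/cloudera/tweets.avsc');
With avro.schema.url set, the column list can be omitted because Hive derives the columns from the Avro schema.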

Related

Hive error: java.lang.Throwable: Child Error

I am using CDH 5.9. While executing the following Hive query, it throws an error. Any idea about the issue?
A normal SELECT query works, but this more complex query fails.
hive> select * from table where dt='22-01-2017' and field like '%xyz%' limit 10;
Query ID = hdfs_20170123200303_44a9c423-4bb3-4f80-ade4-b1312971eb63
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201701131637_0067, Tracking URL = http://cdhum03.temp-dsc-updates.bms.bz:50030/jobdetails.jsp?jobid=job_201701131637_0067
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201701131637_0067
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2017-01-23 20:05:46,563 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201701131637_0067 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://cdhum03.temp-dsc-updates.bms.bz:50030/jobdetails.jsp?jobid=job_201701131637_0067
Examining task ID: task_201701131637_0067_m_000007 (and more) from job job_201701131637_0067
Examining task ID: task_201701131637_0067_r_000000 (and more) from job job_201701131637_0067
Task with the most failures(4):
-----
Task ID:
task_201701131637_0067_m_000006
URL:
http://cdhum03.temp-dsc-updates.bms.bz:50030/taskdetails.jsp?jobid=job_201701131637_0067&tipid=task_201701131637_0067_m_000006
-----
Diagnostic Messages for this Task:
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 6 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Thanks.
Please check your data size: the job needs more resources (log space and task JVM capacity) than the cluster currently provides, so either scale the cluster up or make the query more selective. In the query you are using,
select * from table where dt='22-01-2017' and field like '%xyz%' limit 10
the predicate '%xyz%' forces a scan over all of the data; a more specific filter would reduce the work.
Alternatively, drop your table and recreate it as a partitioned table with the date as the partition column (a sketch follows below).
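A minimal sketch of that partitioned layout, using hypothetical names (only dt and field are taken from the query in the question; add the remaining columns of your real table):
-- Hypothetical partitioned table, with the date as the partition column.
CREATE TABLE my_table_partitioned (
  field STRING
)
PARTITIONED BY (dt STRING);
-- Copy the existing data across using dynamic partitioning.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE my_table_partitioned PARTITION (dt)
SELECT field, dt FROM my_table;
With this layout, the filter dt='22-01-2017' prunes to a single partition instead of scanning every file.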

hive multi table join with same condition error

I am running several scripts and I keep getting the same error. All of them are multi-table joins with the same join condition.
Data is stored as Parquet.
Hive version 1.2.1 / MR
SELECT count(*)
FROM xxx.tmp_usr_1 m
INNER JOIN xxx.tmp_usr n
ON m.date_id = n.date_id AND m.end_user_id = n.end_user_id
LEFT JOIN xxx.usr_2 p
ON m.date_id = p.date_id AND m.end_user_id = p.end_user_id;
Here is the error message:
2017-01-22 16:47:55,208 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 560.81 sec
2017-01-22 16:47:56,248 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 577.74 sec
2017-01-22 16:47:57,290 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 446.32 sec
MapReduce Total cumulative CPU time: 7 minutes 26 seconds 320 msec
Ended Job = job_1484710871657_6350 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1484710871657_6350_m_000061 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000069 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000053 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000011 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000063 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000049 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000052 (and more) from job job_1484710871657_6350
Task with the most failures(4):
-----
Task ID:
task_1484710871657_6350_m_000071
URL:
http://xxxxxxxxxx/taskdetails.jsp?jobid=job_1484710871657_6350&tipid=task_1484710871657_6350_m_000071
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:213)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:333)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
... 11 more
Caused by: java.lang.IllegalStateException: Invalid schema data type, found: PRIMITIVE, expected: STRUCT
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedGroupFields(DataWritableReadSupport.java:118)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:156)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:222)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:99)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)
... 16 more
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
My data consists of about 20M records. When I try to join the tables on just one column (end_user_id), I get the same error.
The join columns have the same data type. Joining A to B in a subquery first, and then joining the result to C, works around the issue (see the sketch after the answer below).
We have many SQL queries with multi-table joins on the same condition, but only a few of the scripts hit this error.
Make sure the join columns have the same data type in all of the tables.
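For reference, a sketch of the workaround the asker describes, reusing the table names from the query above: join the first two tables in a subquery, then join the result to the third table.
SELECT count(*)
FROM (
  -- Inner join of the first two tables, done first as a subquery.
  SELECT m.date_id, m.end_user_id
  FROM xxx.tmp_usr_1 m
  INNER JOIN xxx.tmp_usr n
    ON m.date_id = n.date_id AND m.end_user_id = n.end_user_id
) j
-- Then the left join against the third table.
LEFT JOIN xxx.usr_2 p
  ON j.date_id = p.date_id AND j.end_user_id = p.end_user_id;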

spark-sql (hive#spark and hive#hadoop) dies with exceptions

Spark-SQL dies with the following exceptions:
Lost task 13.0 in stage 1.0 (TID 14, 10.15.0.224): java.io.InvalidClassException: org.apache.spark.sql.catalyst.expressions.AttributeMap; local class incompatible: stream classdesc serialVersionUID = -4625798594364953144, local class serialVersionUID = -4625798594364953144
and / or
com.esotericsoftware.kryo.KryoException: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.storage.BlockManagerId
What is going on?
More details follow below.
Case: Spark-sql
single server case, local configuration
downloaded spark-1.4.0-bin-hadoop2.6 distribution
centos 6.5 Server (32 Cores, 100GB RAM)
configuration:
(variables)
export SPARK_HOME=/opt/spark-1.4.0-bin-hadoop2.6
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_LOCAL_DIRS=$SPARK_HOME/work
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=4G
export SPARK_WORKER_INSTANCES=2
export SPARK_DAEMON_MEMORY=384m
(starting and running)
$SPARK_HOME/sbin/start-all.sh
$SPARK_HOME/bin/spark-sql --master spark://127.0.0.1:7077
The data reside in individual *.csv.gz files (a few thousand of them, a few tens to a few hundred megabytes each). I have tested all files for gzip/IO errors (none found). The format:
A1;A2;A3;A4;A5;A6;A7;A8;A9
'string';841054310;0;11.383907333;48.788023833;380.700000000;'2014-04-28T06:11:01.753990';0;'2015-06-26T23:54:49.461211'
'string';841054310;1;11.383867000;48.788031667;381.000000000;'2014-04-28T06:14:15.272059';4.77;'2015-06-26T23:55:03.132637'
'string';841054310;2;11.383829000;48.788019000;381.000000000;'2014-04-28T06:14:18.765123';3.19;'2015-06-26T23:55:03.414938'
'string';841054310;3;11.383804667;48.788041333;380.900000000;'2014-04-28T06:14:28.477338';5.1;'2015-06-26T23:55:04.190245'
'string';841054310;4;11.383767167;48.788053167;381.000000000;'2014-04-28T06:14:31.765796';4.29;'2015-06-26T23:55:04.459112'
'string';841054310;5;11.383726500;48.788057667;381.000000000;'2014-04-28T06:14:33.778935';6.18;'2015-06-26T23:55:04.628419'
'string';841054310;6;11.383685667;48.788059333;381.000000000;'2014-04-28T06:14:35.584490';5.71;'2015-06-26T23:55:04.779281'
'string';841054310;7;11.383643667;48.788062833;381.000000000;'2014-04-28T06:14:37.289736';9.21;'2015-06-26T23:55:04.921655'
'string';841054310;8;11.383601333;48.788069333;381.100000000;'2014-04-28T06:14:38.463847';10.78;'2015-06-26T23:55:05.022049'
'string';841054310;9;11.383558000;48.788074500;381.200000000;'2014-04-28T06:14:39.570567';10.92;'2015-06-26T23:55:05.118134'
'string';841054310;10;11.383514500;48.788076000;381.200000000;'2014-04-28T06:14:40.757880';6.88;'2015-06-26T23:55:05.217862'
'string';841054310;11;11.383472000;48.788074000;381.100000000;'2014-04-28T06:14:43.364629';7.45;'2015-06-26T23:55:05.440946'
'string';841054310;12;11.383423667;48.788068333;381.100000000;'2014-04-28T06:14:44.762990';11.91;'2015-06-26T23:55:05.565626'
'string';841054310;13;11.383375833;48.788059000;381.100000000;'2014-04-28T06:14:45.762960';15.1;'2015-06-26T23:55:05.653718'
Spark-SQL commands:
create external table raw (
A1 string,
A2 int,
A3 int,
A4 double,
A5 double,
A6 double,
A7 string,
A8 double,
A9 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
LINES TERMINATED BY '\n'
location '/directory/';
alter table raw set tblproperties('skip.header.line.count'='1');
select count(*) from raw;
Typical error:
15/07/15 20:13:22 ERROR TaskResultGetter: Exception while getting task result
com.esotericsoftware.kryo.KryoException: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.storage.BlockManagerId
Serialization trace:
org$apache$spark$scheduler$CompressedMapStatus$$loc (org.apache.spark.scheduler.CompressedMapStatus)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:265)
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:95)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:60)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.storage.BlockManagerId
at org.apache.spark.scheduler.CompressedMapStatusFieldAccess.set(Unknown Source)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:617)
... 12 more
For a directory with a few tens of data files, Spark computes the result without an error. Once more files are added, the errors above are thrown.
Case hive#spark:
$SPARK_HOME/sbin/start-thriftserver.sh --master spark://127.0.0.1:7077 --hiveconf hive.server2.thrift.port=10001 --driver-memory 2G
$SPARK_HOME/bin/beeline
Beeline version 1.4.0 by Apache Hive
beeline> !connect jdbc:hive2://127.0.0.1:10001
0: jdbc:hive2://127.0.0.1:10001> select count(*) from raw;
I get the following error:
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: Buffer underflow.
Serialization trace:
org$apache$spark$scheduler$CompressedMapStatus$$compressedSizes (org.apache.spark.scheduler.CompressedMapStatus) (state=,code=0)
Case: hive#hadoop:
Details:
single server, local mode
extracted hadoop-2.6.0 distribution under /opt
extracted apache-hive-1.2.1-bin distribution under /opt
extracted db-derby-10.11.1.1-bin distribution under /opt
configuration:
export HADOOP_HOME=/opt/hadoop-2.6.0
export DERBY_HOME=/opt/db-derby-10.11.1.1-bin
. /opt/db-derby-10.11.1.1-bin/bin/setEmbeddedCP
export HIVE_OPTS='-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:///tmp -hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true'
export HIVE_HOME=/opt/apache-hive-1.2.1-bin
export JAVA_HOME='/usr/lib/jvm/jre-1.7.0-openjdk.x86_64/'
Running the thing:
$HIVE_HOME/bin/hive
hive> select count(*) from raw;
Output (and error):
Query ID = root_20150715204331_a7a4be13-31e5-41d2-a7dd-8fdbf9d9f2b0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-07-15 20:44:02,482 Stage-1 map = 0%, reduce = 0%
Ended Job = job_local1201654881_0001 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
hive>
Where the heck is the log information stored in this case? I cannot find anything under /tmp/hive, /opt/hadoop-2.6.0/, or /opt/apache-hive-1.2.1-bin.

HBase Hive Integration - Error

When I try to load data from HDFS into HBase using Hive logical tables, I am facing the following problem. I am new to Hadoop and not able to trace the error. I am using the CDH4 VM.
Creating a new HBase table that is managed by Hive:
CREATE TABLE hive_hbasetable(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hivehbasek1");
HBase shell output:
hbase(main):002:0> list
TABLE
hivebasek1
mysql_cityclimate
2 row(s) in 0.2470 seconds
I created a logical table hive_logictable in Hive
CREATE TABLE hive_logictable (foo INT, bar STRING) row format delimited fields terminated by ',';
Inserting data into hive_logictable from a local file:
cat TextFile.txt
100,value1
101,value2
102,value3
103,value4
104,value5
105,value6
LOAD DATA LOCAL INPATH '/home/cloudera/TextFile.txt' OVERWRITE INTO TABLE hive_logictable;
Loading data into the HBase table using Hive:
INSERT OVERWRITE TABLE hive_hbasetable SELECT * FROM hive_logictable;
Below are the error messages thrown:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201501200937_0004, Tracking URL = http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201501200937_0004
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201501200937_0004
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2015-01-20 10:38:07,412 Stage-0 map = 0%, reduce = 0%
2015-01-20 10:38:52,822 Stage-0 map = 100%, reduce = 100%
Ended Job = job_201501200937_0004 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201501200937_0004
Examining task ID: task_201501200937_0004_m_000002 (and more) from job job_201501200937_0004
Task with the most failures(4):
-----
Task ID:
task_201501200937_0004_m_000000
URL:
http://localhost.localdomain:50030/taskdetails.jsp?jobid=job_201501200937_0004&tipid=task_201501200937_0004_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
End of Error Message.
Could you please check whether an atomic insert works fine on the Hive-managed HBase table, and share the results?
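For example, a single-row insert along these lines (just a sketch; it routes one row from the staging table through the HBase storage handler, which works on older Hive versions that lack INSERT ... VALUES):
-- Push a single test row into the HBase-backed table.
INSERT INTO TABLE hive_hbasetable
SELECT foo, bar FROM hive_logictable LIMIT 1;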

error in hive-hbase integration

I am using Hive version 0.12.0, Hadoop version 2.4.0, and HBase version 0.98.3.
I created a table info in HBase, populated it with one row of data, and gave Hive access to it (using an external table).
When running the query
select count (*) from info;
I get
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_1405407486295_0003, Tracking URL = http://prashasti-Vostro-2520:8088/proxy/application_1405407486295_0003/
Kill Command = /home/prashasti/Installed/hadoop/bin/hadoop job -kill job_1405407486295_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-07-15 12:55:14,643 Stage-1 map = 0%, reduce = 0%
2014-07-15 12:55:39,914 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1405407486295_0003 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1405407486295_0003_m_000000 (and more) from job job_1405407486295_0003
Task with the most failures(4):
-----
Task ID:
task_1405407486295_0003_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1405407486295_0003&tipid=task_1405407486295_0003_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.NullPointerException
at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.close(TableRecordReaderImpl.java:161)
at org.apache.hadoop.hbase.mapreduce.TableRecordReader.close(TableRecordReader.java:80)
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.close(HiveHBaseTableInputFormat.java:198)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doClose(HiveRecordReader.java:50)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.close(HiveContextAwareRecordReader.java:96)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:209)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1950)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:445)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
I have already tried the following:
1) I made a directory hive/auxlib and added protobuf-*.jar, zookeeper-*.jar, hbase-client-*.jar, hbase-server-*.jar, hbase-common-*.jar, hbase-protobuf-*.jar, hive-hbase-handler-*.jar, and guava-*.jar from hbase/lib.
This appears to be a known issue.
https://issues.apache.org/jira/browse/HIVE-4520
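Not from the original post, but as a quick cross-check of the classpath side, the same jars can also be registered per session from the Hive shell instead of the auxlib directory (the paths below are placeholders for wherever the jars live on your machine):
-- Register the HBase/Hive integration jars for the current session (placeholder paths).
ADD JAR /path/to/hive-hbase-handler.jar;
ADD JAR /path/to/hbase-client.jar;
ADD JAR /path/to/hbase-common.jar;
ADD JAR /path/to/hbase-server.jar;
ADD JAR /path/to/hbase-protocol.jar;
ADD JAR /path/to/zookeeper.jar;
ADD JAR /path/to/guava.jar;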
