spark-sql (hive#spark and hive#hadoop) dies with exceptions - hadoop

Spark-SQL dies with the following exceptions:
Lost task 13.0 in stage 1.0 (TID 14, 10.15.0.224): java.io.InvalidClassException: org.apache.spark.sql.catalyst.expressions.AttributeMap; local class incompatible: stream classdesc serialVersionUID = -4625798594364953144, local class serialVersionUID = -4625798594364953144
and / or
com.esotericsoftware.kryo.KryoException: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.storage.BlockManagerId
What is going on?
More details follow below.
Case: Spark-sql
single server case, local configuration
downloaded spark-1.4.0-bin-hadoop2.6 distribution
centos 6.5 Server (32 Cores, 100GB RAM)
configuration:
(variables)
export SPARK_HOME=/opt/spark-1.4.0-bin-hadoop2.6
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_LOCAL_DIRS=$SPARK_HOME/work
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=4G
export SPARK_WORKER_INSTANCES=2
export SPARK_DAEMON_MEMORY=384m
(starting and running)
$SPARK_HOME/sbin/start-all.sh
$SPARK_HOME/bin/spark-sql --master spark://127.0.0.1:7077
The data reside in *.csv.gz individual files (a few thousands of them, a few tens to hundred of megabytes each). I've tested all files for gzip/io errors (non found). The format:
A1;A2;A3;A4;A5;A6;A7;A8;A9
'string';841054310;0;11.383907333;48.788023833;380.700000000;'2014-04-28T06:11:01.753990';0;'2015-06-26T23:54:49.461211'
'string';841054310;1;11.383867000;48.788031667;381.000000000;'2014-04-28T06:14:15.272059';4.77;'2015-06-26T23:55:03.132637'
'string';841054310;2;11.383829000;48.788019000;381.000000000;'2014-04-28T06:14:18.765123';3.19;'2015-06-26T23:55:03.414938'
'string';841054310;3;11.383804667;48.788041333;380.900000000;'2014-04-28T06:14:28.477338';5.1;'2015-06-26T23:55:04.190245'
'string';841054310;4;11.383767167;48.788053167;381.000000000;'2014-04-28T06:14:31.765796';4.29;'2015-06-26T23:55:04.459112'
'string';841054310;5;11.383726500;48.788057667;381.000000000;'2014-04-28T06:14:33.778935';6.18;'2015-06-26T23:55:04.628419'
'string';841054310;6;11.383685667;48.788059333;381.000000000;'2014-04-28T06:14:35.584490';5.71;'2015-06-26T23:55:04.779281'
'string';841054310;7;11.383643667;48.788062833;381.000000000;'2014-04-28T06:14:37.289736';9.21;'2015-06-26T23:55:04.921655'
'string';841054310;8;11.383601333;48.788069333;381.100000000;'2014-04-28T06:14:38.463847';10.78;'2015-06-26T23:55:05.022049'
'string';841054310;9;11.383558000;48.788074500;381.200000000;'2014-04-28T06:14:39.570567';10.92;'2015-06-26T23:55:05.118134'
'string';841054310;10;11.383514500;48.788076000;381.200000000;'2014-04-28T06:14:40.757880';6.88;'2015-06-26T23:55:05.217862'
'string';841054310;11;11.383472000;48.788074000;381.100000000;'2014-04-28T06:14:43.364629';7.45;'2015-06-26T23:55:05.440946'
'string';841054310;12;11.383423667;48.788068333;381.100000000;'2014-04-28T06:14:44.762990';11.91;'2015-06-26T23:55:05.565626'
'string';841054310;13;11.383375833;48.788059000;381.100000000;'2014-04-28T06:14:45.762960';15.1;'2015-06-26T23:55:05.653718'
Spark-SQL commands:
create external table raw (
A1 string,
A2 int,
A3 int,
A4 double,
A5 double,
A6 double,
A7 string,
A8 double,
A9 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
LINES TERMINATED BY '\n'
location '/directory/'
alter table raw set tblproperties('skip.header.line.count'='1');
select count(*) from raw;
Typical error:
15/07/15 20:13:22 ERROR TaskResultGetter: Exception while getting task result
com.esotericsoftware.kryo.KryoException: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.storage.BlockManagerId
Serialization trace:
org$apache$spark$scheduler$CompressedMapStatus$$loc (org.apache.spark.scheduler.CompressedMapStatus)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:265)
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:95)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:60)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.storage.BlockManagerId
at org.apache.spark.scheduler.CompressedMapStatusFieldAccess.set(Unknown Source)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:617)
... 12 more
for a directory with a few tens of Data-files spark computes the result without an error. Once more files are added errors are thrown.
Case hive#spark:
$SPARK_HOME/sbin/start-thriftserver.sh --master spark://127.0.0.1:7077 --hiveconf hive.server2.thrift.port=10001 --driver-memory 2G
$SPARK_HOME/bin/beeline
Beeline version 1.4.0 by Apache Hive
beeline> !connect jdbc:hive2://127.0.0.1:10001
0: jdbc:hive2://127.0.0.1:10001> select count(*) from raw;
I get the following error:
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: Buffer underflow.
Serialization trace:
org$apache$spark$scheduler$CompressedMapStatus$$compressedSizes (org.apache.spark.scheduler.CompressedMapStatus) (state=,code=0)
Case: hive#hadoop:
Details:
single server, local mode
extracted hadoop-2.6.0 distribution under /opt
extracted apache-hive-1.2.1-bin distribution under /opt
extracted db-derby-10.11.1.1-bin distribution under /opt
configuration:
export HADOOP_HOME=/opt/hadoop-2.6.0
export DERBY_HOME=/opt/db-derby-10.11.1.1-bin
. /opt/db-derby-10.11.1.1-bin/bin/setEmbeddedCP
export HIVE_OPTS='-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:///tmp -hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true'
export HIVE_HOME=/opt/apache-hive-1.2.1-bin
export JAVA_HOME='/usr/lib/jvm/jre-1.7.0-openjdk.x86_64/'
Running the thing:
$HIVE_HOME/bin/hive
hive> select count(*) from raw;
Output (and error):
Query ID = root_20150715204331_a7a4be13-31e5-41d2-a7dd-8fdbf9d9f2b0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-07-15 20:44:02,482 Stage-1 map = 0%, reduce = 0%
Ended Job = job_local1201654881_0001 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
hive>
where the hack are the log information stored in this case? I cannot find anything under /tmp/hive, nor in /opt/hadoop-2.6.0/, nor in /opt/apache-hive-1.2.1-bin

Related

Hive error: java.lang.Throwable: Child Error

I am using CDH 5.9, while executing following hive query it is throwing error. Any idea about the issue?
For normal select query its working but for complex query it results failure.
hive> select * from table where dt='22-01-2017' and field like '%xyz%' limit 10;
Query ID = hdfs_20170123200303_44a9c423-4bb3-4f80-ade4-b1312971eb63
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201701131637_0067, Tracking URL = http://cdhum03.temp-dsc-updates.bms.bz:50030/jobdetails.jsp?jobid=job_201701131637_0067
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201701131637_0067
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2017-01-23 20:05:46,563 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201701131637_0067 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://cdhum03.temp-dsc-updates.bms.bz:50030/jobdetails.jsp?jobid=job_201701131637_0067
Examining task ID: task_201701131637_0067_m_000007 (and more) from job job_201701131637_0067
Examining task ID: task_201701131637_0067_r_000000 (and more) from job job_201701131637_0067
Task with the most failures(4):
-----
Task ID:
task_201701131637_0067_m_000006
URL:
http://cdhum03.temp-dsc-updates.bms.bz:50030/taskdetails.jsp?jobid=job_201701131637_0067&tipid=task_201701131637_0067_m_000006
-----
Diagnostic Messages for this Task:
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 6 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Thanks.
Please check your data size as your job needs more space for logs but the jvm are less please scale your cluster or use specific query as you are using -
select * from table where dt='22-01-2017' and field like '%xyz%' limit 10
, as '%xyz%' will check whole data better to use specific requirement.
Else drop your table and create a new partitioned table with date as a partition column.

Cloudera Hive: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce

I keep getting this error when trying to query data using hue
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce
From the hue job browser under the syslog tab
The error log is too big to paste here
http://pastebin.com/h8tgYuzR
Error from terminal
hive> SELECT count(*) FROM tweets;
Query ID = cloudera_20161128145151_137efb02-413b-4457-b21d-084101b77091
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1480364897609_0003, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1480364897609_0003/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1480364897609_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-11-28 14:52:09,804 Stage-1 map = 0%, reduce = 0%
2016-11-28 14:53:10,955 Stage-1 map = 0%, reduce = 0%
2016-11-28 14:53:13,213 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1480364897609_0003 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://quickstart.cloudera:8088/proxy/application_1480364897609_0003/
Examining task ID: task_1480364897609_0003_m_000000 (and more) from job job_1480364897609_0003
Task with the most failures(4):
-----
Task ID:
task_1480364897609_0003_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1480364897609_0003&tipid=task_1480364897609_0003_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable Objavro.schema�
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable Objavro.schema�
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:505)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.StringReader#7aee0989; line: 1, column: 2]
at com.cloudera.hive.serde.JSONSerDe.deserialize(JSONSerDe.java:128)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:136)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:100)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:496)
... 9 more
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.StringReader#7aee0989; line: 1, column: 2]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
at org.codehaus.jackson.impl.ReaderBasedParser._handleUnexpectedValue(ReaderBasedParser.java:630)
at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:364)
at org.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:2439)
at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2396)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1602)
at com.cloudera.hive.serde.JSONSerDe.deserialize(JSONSerDe.java:126)
... 12 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Here is the table
CREATE EXTERNAL TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/cloudera/flume/tweets';
data from the file I am trying to load http://pastebin.com/g7eg1BaP
I have a feeling that the table was defined using AVRO as the data type but a non-avro file was loaded to the table.
Remember that Hive is "schema on read" and not "schema on load". It will check for schema only when a job is run, not during loading or defining.
Please post the CREATE TABLE command used and a few records from the file you are trying to load.
Hope this helps.

Hive will not write to aws s3

I have an external table in hive that is stored on my hadoop cluster and I want to move its contents into an external table that is stored on Amazon s3.
So I created an s3 backed table like so:
CREATE EXTERNAL TABLE IF NOT EXISTS export.export_table
like table_to_be_exported
ROW FORMAT SERDE ...
with SERDEPROPERTIES ('fieldDelimiter'='|')
STORED AS TEXTFILE
LOCATION 's3a://bucket/folder';
Then I run: INSERT INTO export.export_table SELECT * FROM table_to_be_exported
It outputs the following
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : Starting Job = job_1435176004514_0028, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1435176004514_0028/
INFO : Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1435176004514_0028
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO : 2015-07-06 09:22:18,379 Stage-1 map = 0%, reduce = 0%
INFO : 2015-07-06 09:22:27,795 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.9 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 900 msec
INFO : Ended Job = job_1435176004514_0028
INFO : Stage-4 is selected by condition resolver.
INFO : Stage-3 is filtered out by condition resolver.
INFO : Stage-5 is filtered out by condition resolver.
INFO : Moving data to: s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10000 from s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10002
ERROR : Failed with exception Wrong FS: s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10002, expected: hdfs://quickstart.cloudera:8020
java.lang.IllegalArgumentException: Wrong FS: s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10002, expected: hdfs://quickstart.cloudera:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1916)
at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1187)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2449)
at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:105)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:222)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask (state=08S01,code=1)
I have s3a key and secret set in my hadoop core-site.xml and am able to do reads and writes from s3 using hadoop directly hdfs dfs -ls s3a://.
Any guesses as to what I could do to get this to work?
Try using s3 instead of s3a, my guess is that s3a is not supported yet in EMR's Hive distribution.

HBase Hive Integration - Error

When I try to Load Data from HDFS to HBase using Hive logical tables, I am facing the following problem. I am new for hadoop and not able to trace the error,.I am using CDH4 VM,
Creating a new HBase table which is managed by Hive
CREATE TABLE hive_hbasetable(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hivehbasek1");
Hbase shell Output
hbase(main):002:0> list
TABLE
hivebasek1
mysql_cityclimate
2 row(s) in 0.2470 seconds
I created a logical table hive_logictable in Hive
CREATE TABLE hive_logictable (foo INT, bar STRING) row format delimited fields terminated by ',';
Inserting data in hive_logictable from HDFS.
cat TextFile.txt
100,value1
101,value2
102,value3
103,value4
104,value5
105,value6
LOAD DATA LOCAL INPATH '/home/cloudera/TextFile.txt' OVERWRITE INTO TABLE hive_logictable;
Loading data into HBase table using Hive.
INSERT OVERWRITE TABLE hive_hbasetable SELECT * FROM hive_logictable;
Below are the error messages throwing....
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201501200937_0004, Tracking URL = http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201501200937_0004
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201501200937_0004
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2015-01-20 10:38:07,412 Stage-0 map = 0%, reduce = 0%
2015-01-20 10:38:52,822 Stage-0 map = 100%, reduce = 100%
Ended Job = job_201501200937_0004 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201501200937_0004
Examining task ID: task_201501200937_0004_m_000002 (and more) from job job_201501200937_0004
Task with the most failures(4):
-----
Task ID:
task_201501200937_0004_m_000000
URL:
http://localhost.localdomain:50030/taskdetails.jsp?jobid=job_201501200937_0004&tipid=task_201501200937_0004_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
End of Error Message.
Could you please check if the atomic insert works fine on the HIVE table ? And share the results ?

error in hive-hbase integration

I am using hive version 0.12.0 , hadoop version 2.4.0 and hbase version 0.98.3
I created a table info in hbase, populated it with one row of data, gave hive its access (using external table)
when running the query
select count (*) from info;
I get
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_1405407486295_0003, Tracking URL = http://prashasti-Vostro-2520:8088 /proxy/application_1405407486295_0003/
Kill Command = /home/prashasti/Installed/hadoop/bin/hadoop job -kill job_1405407486295_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-07-15 12:55:14,643 Stage-1 map = 0%, reduce = 0%
2014-07-15 12:55:39,914 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1405407486295_0003 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1405407486295_0003_m_000000 (and more) from job job_1405407486295_0003
Task with the most failures(4):
-----
Task ID:
task_1405407486295_0003_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1405407486295_0003& tipid=task_1405407486295_0003_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.NullPointerException
at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.close(TableRecordReaderImpl.java:161)
at org.apache.hadoop.hbase.mapreduce.TableRecordReader.close(TableRecordReader.java:80)
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.close(HiveHBaseTableInputFormat.java:198)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doClose(HiveRecordReader.java:50)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.close(HiveContextAwareRecordReader.java:96)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:209)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1950)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:445)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
I already tried the following
1) I made a directory hive/auxlib and added protobuf-.jar, zookeeper-.jar , hbase-client-.jar, hbase-server-.jar , hbase-common-.jar , hbase-protobuf-.jar, hive-hbase-handler-.jar ,guava-.jar from hbase/lib
This appears to be a known issue.
https://issues.apache.org/jira/browse/HIVE-4520

Resources