Hive will not write to aws s3 - hadoop

I have an external table in hive that is stored on my hadoop cluster and I want to move its contents into an external table that is stored on Amazon s3.
So I created an s3 backed table like so:
CREATE EXTERNAL TABLE IF NOT EXISTS export.export_table
like table_to_be_exported
ROW FORMAT SERDE ...
with SERDEPROPERTIES ('fieldDelimiter'='|')
STORED AS TEXTFILE
LOCATION 's3a://bucket/folder';
Then I run: INSERT INTO export.export_table SELECT * FROM table_to_be_exported
It outputs the following
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : Starting Job = job_1435176004514_0028, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1435176004514_0028/
INFO : Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1435176004514_0028
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO : 2015-07-06 09:22:18,379 Stage-1 map = 0%, reduce = 0%
INFO : 2015-07-06 09:22:27,795 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.9 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 900 msec
INFO : Ended Job = job_1435176004514_0028
INFO : Stage-4 is selected by condition resolver.
INFO : Stage-3 is filtered out by condition resolver.
INFO : Stage-5 is filtered out by condition resolver.
INFO : Moving data to: s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10000 from s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10002
ERROR : Failed with exception Wrong FS: s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10002, expected: hdfs://quickstart.cloudera:8020
java.lang.IllegalArgumentException: Wrong FS: s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10002, expected: hdfs://quickstart.cloudera:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1916)
at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1187)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2449)
at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:105)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:222)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask (state=08S01,code=1)
I have s3a key and secret set in my hadoop core-site.xml and am able to do reads and writes from s3 using hadoop directly hdfs dfs -ls s3a://.
Any guesses as to what I could do to get this to work?

Try using s3 instead of s3a, my guess is that s3a is not supported yet in EMR's Hive distribution.

Related

ACID transactions on data added from Spark not working

I'm trying to use ACID transactions in Hive but I have a problem when the data are added with Spark.
First, I created a table with the following statement :
CREATE TABLE testdb.test(id string, col1 string)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC TBLPROPERTIES('transactional'='true');
Then I added data with those queries :
INSERT INTO testdb.test VALUES("1", "A");
INSERT INTO testdb.test VALUES("2", "B");
INSERT INTO testdb.test VALUES("3", "C");
And I've been able to delete rows with this query :
DELETE FROM testdb.test WHERE id="1";
All that worked perfectly, but a problem occurs when I try to delete rows that were added with Spark.
What I do in Spark (iPython) :
hc = HiveContext(sc)
data = sc.parallelize([["1", "A"], ["2", "B"], ["3", "C"]])
data_df = hc.createDataFrame(data)
data_df.registerTempTable(data_df)
hc.sql("INSERT INTO testdb.test SELECT * FROM data_df");
Then, when I come back to Hive, I'm able to run a SELECT query on this the "test" table.
However, when I try to run the exact same DELETE query as before, I have the following error (it happens after the reduce phase) :
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":0}},"value":null}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":0}},"value":null}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:723)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
I have no idea where this is coming from, that is why I'm looking for ideas.
I'm using the Cloudera Quickstart VM (5.4.2).
Hive version : 1.1.0
Spark Version : 1.3.0
And here is the complete output of the Hive DELETE command :
hive> delete from testdb.test where id="1";
Query ID = cloudera_20160914090303_795e40b7-ab6a-45b0-8391-6d41d1cfe7bd
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 4
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1473858545651_0036, Tracking URL =http://quickstart.cloudera:8088/proxy/application_1473858545651_0036/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1473858545651_0036
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 4
2016-09-14 09:03:55,571 Stage-1 map = 0%, reduce = 0%
2016-09-14 09:04:14,898 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.66 sec
2016-09-14 09:04:15,944 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.33 sec
2016-09-14 09:04:44,101 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 4.21 sec
2016-09-14 09:04:46,523 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 4.79 sec
2016-09-14 09:04:47,673 Stage-1 map = 100%, reduce = 42%, Cumulative CPU 5.8 sec
2016-09-14 09:04:50,041 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 7.45 sec
2016-09-14 09:05:18,486 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.69 sec
MapReduce Total cumulative CPU time: 7 seconds 690 msec
Ended Job = job_1473858545651_0036 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://quickstart.cloudera:8088/proxy/application_1473858545651_0036/
Examining task ID: task_1473858545651_0036_m_000000 (and more) from job job_1473858545651_0036
Task with the most failures(4):
-----
Task ID:
task_1473858545651_0036_r_000001
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1473858545651_0036&tipid=task_1473858545651_0036_r_000001
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":0}},"value":null}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":0}},"value":null}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:723)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 4 Cumulative CPU: 7.69 sec HDFS Read: 21558 HDFS Write: 114 FAIL
Total MapReduce CPU Time Spent: 7 seconds 690 msec
Thanks !
Use the Spark HiveAcid Datasource - http://github.com/qubole/spark-acid
val df = spark.read.format("HiveAcid").option("table", "testdb.test").load()
df.collect()
Spark needs to run with HMS 3.1.1 so that the underlying datasource can take necessary locks etc.
The directory structure, file formats are different for a Hive ACID table compared with a normal table. CRUD needs to happen from Hive.
With respect to Spark, normal table reads are not compatible with Hive ACID table reads. We could not use the native Spark apis to read the table.
Also, currently there is no support for updates, deletes, inserts in Spark
As for reading the data, one could use the connector - http://github.com/qubole/spark-acid
I had the same issue running from hue, but after I set these parameters from hive cli, it started working:
set hive.support.concurrency=true
set hive.enforce.bucketing=true
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DBTxnManager;
set hive.compactor.initiator.on=true;

Hive Insert - Failed with exception Unable to alter table. java.lang.NullPointerException

We are using Cloudera 5.6. We have configured Sentry for Hive. Whenever we issue an insert statment, it fails with the below exception. But when we check the table, the row is inserted properly. We have set all the permissions to hive.
$$$ insert into beckman values('rinku');
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_1459405260708_0002
INFO : The url to track the job: http://xxx.xxx.xxx.xxx:8088/proxy/application_1459405260708_0002/
INFO : Starting Job = job_1459405260708_0002, Tracking URL = http://xxx.xxx.xxx.xxx:8088/proxy/application_1459405260708_0002/
INFO : Kill Command = /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hadoop/bin/hadoop job -kill job_1459405260708_0002
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO : 2016-03-31 23:20:31,401 Stage-1 map = 0%, reduce = 0%
INFO : 2016-03-31 23:20:37,788 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
INFO : MapReduce Total cumulative CPU time: 1 seconds 400 msec
INFO : Ended Job = job_1459405260708_0002
INFO : Stage-4 is selected by condition resolver.
INFO : Stage-3 is filtered out by condition resolver.
INFO : Stage-5 is filtered out by condition resolver.
INFO : Moving data to: hdfs://ip-172-31-0-203.us-west-2.compute.internal:8020/user/hive2/warehouse/test.db/beckman/.hive-staging_hive_2016-03-31_23-20-23_107_7263355827488393299-2/-ext-10000 from hdfs://ip-172-31-0-203.us-west-2.compute.internal:8020/user/hive2/warehouse/test.db/beckman/.hive-staging_hive_2016-03-31_23-20-23_107_7263355827488393299-2/-ext-10002
INFO : Loading data to table test.beckman from hdfs://ip-172-31-0-203.us-west-2.compute.internal:8020/user/hive2/warehouse/test.db/beckman/.hive-staging_hive_2016-03-31_23-20-23_107_7263355827488393299-2/-ext-10000
ERROR : Failed with exception Unable to alter table. java.lang.NullPointerException
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.NullPointerException
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:533)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:519)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1685)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:312)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1645)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1404)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1190)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1050)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:143)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:195)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:207)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: MetaException(message:java.lang.NullPointerException)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_cascade_result$alter_table_with_cascade_resultStandardScheme.read(ThriftHiveMetastore.java:42087)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_cascade_result$alter_table_with_cascade_resultStandardScheme.read(ThriftHiveMetastore.java:42064)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_cascade_result.read(ThriftHiveMetastore.java:42006)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_cascade(ThriftHiveMetastore.java:1402)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_cascade(ThriftHiveMetastore.java:1386)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:340)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_table(SessionHiveMetaStoreClient.java:296)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91)
at com.sun.proxy.$Proxy10.alter_table(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1998)
at com.sun.proxy.$Proxy10.alter_table(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:531)
... 22 more
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask (state=08S01,code=1)
I have noticed this issue when you try altering table with format/syntax along with database name. I.e something like : "<DATABASE>.<TABLENAME>".. Somehow this looks like an issue.
In order to overcome it, explicitly mention the database before you execute the actual query. A sample illustration :
use sampledb;
alter employeeTable .... ; // This works
vs
[Do not use the below]
alter sampledb.employeeTable.... // This doesn't work

HBase Hive Integration - Error

When I try to Load Data from HDFS to HBase using Hive logical tables, I am facing the following problem. I am new for hadoop and not able to trace the error,.I am using CDH4 VM,
Creating a new HBase table which is managed by Hive
CREATE TABLE hive_hbasetable(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hivehbasek1");
Hbase shell Output
hbase(main):002:0> list
TABLE
hivebasek1
mysql_cityclimate
2 row(s) in 0.2470 seconds
I created a logical table hive_logictable in Hive
CREATE TABLE hive_logictable (foo INT, bar STRING) row format delimited fields terminated by ',';
Inserting data in hive_logictable from HDFS.
cat TextFile.txt
100,value1
101,value2
102,value3
103,value4
104,value5
105,value6
LOAD DATA LOCAL INPATH '/home/cloudera/TextFile.txt' OVERWRITE INTO TABLE hive_logictable;
Loading data into HBase table using Hive.
INSERT OVERWRITE TABLE hive_hbasetable SELECT * FROM hive_logictable;
Below are the error messages throwing....
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201501200937_0004, Tracking URL = http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201501200937_0004
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201501200937_0004
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2015-01-20 10:38:07,412 Stage-0 map = 0%, reduce = 0%
2015-01-20 10:38:52,822 Stage-0 map = 100%, reduce = 100%
Ended Job = job_201501200937_0004 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201501200937_0004
Examining task ID: task_201501200937_0004_m_000002 (and more) from job job_201501200937_0004
Task with the most failures(4):
-----
Task ID:
task_201501200937_0004_m_000000
URL:
http://localhost.localdomain:50030/taskdetails.jsp?jobid=job_201501200937_0004&tipid=task_201501200937_0004_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
End of Error Message.
Could you please check if the atomic insert works fine on the HIVE table ? And share the results ?

error in hive-hbase integration

I am using hive version 0.12.0 , hadoop version 2.4.0 and hbase version 0.98.3
I created a table info in hbase, populated it with one row of data, gave hive its access (using external table)
when running the query
select count (*) from info;
I get
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_1405407486295_0003, Tracking URL = http://prashasti-Vostro-2520:8088 /proxy/application_1405407486295_0003/
Kill Command = /home/prashasti/Installed/hadoop/bin/hadoop job -kill job_1405407486295_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-07-15 12:55:14,643 Stage-1 map = 0%, reduce = 0%
2014-07-15 12:55:39,914 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1405407486295_0003 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1405407486295_0003_m_000000 (and more) from job job_1405407486295_0003
Task with the most failures(4):
-----
Task ID:
task_1405407486295_0003_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1405407486295_0003& tipid=task_1405407486295_0003_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.NullPointerException
at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.close(TableRecordReaderImpl.java:161)
at org.apache.hadoop.hbase.mapreduce.TableRecordReader.close(TableRecordReader.java:80)
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat$1.close(HiveHBaseTableInputFormat.java:198)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doClose(HiveRecordReader.java:50)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.close(HiveContextAwareRecordReader.java:96)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:209)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1950)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:445)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
I already tried the following
1) I made a directory hive/auxlib and added protobuf-.jar, zookeeper-.jar , hbase-client-.jar, hbase-server-.jar , hbase-common-.jar , hbase-protobuf-.jar, hive-hbase-handler-.jar ,guava-.jar from hbase/lib
This appears to be a known issue.
https://issues.apache.org/jira/browse/HIVE-4520

Datastax Enterprise 3.2 hive timeout exception

I tried to run simple hive query through Datastax Enterprise, but it always fails with timeout (on small data set or even empty tables). I've got 4 nodes of m1.large on AWS (2x Cassandra & 2x Analytics). See below:
cqlsh:intracker> select count(*) from event_tracks_by_browser_date LIMIT 100000;
count
-------
15030
Then with hive:
hive> select * from event_tracks_by_browser_date where type_id=10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201312292327_0003, Tracking URL = http://node3:50030/jobdetails.jsp?jobid=job_201312292327_0003
Kill Command = /usr/bin/dse hadoop job -Dmapred.job.tracker=10.234.9.204:8012 -kill job_201312292327_0003
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2013-12-30 10:30:27,578 Stage-1 map = 0%, reduce = 0%
2013-12-30 10:31:27,890 Stage-1 map = 0%, reduce = 0%
2013-12-30 10:32:28,137 Stage-1 map = 0%, reduce = 0%
2013-12-30 10:33:12,344 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201312292327_0003 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201312292327_0003_m_000003 (and more) from job job_201312292327_0003
Exception in thread "Thread-10" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getStackTraces(TaskLogProcessor.java:240)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:227)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:92)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://node2:50060/tasklog?taskid=attempt_201312292327_0003_m_000000_1&start=-8193
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
at java.net.URL.openStream(URL.java:1037)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getStackTraces(TaskLogProcessor.java:192)
... 3 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 2 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
I checked system.log and it looks like sime kind of timeout appears.
INFO [IPC Server handler 6 on 8012] 2013-12-30 10:32:24,880 TaskInProgress.java (line 551) Error from attempt_201312292327_0003_m_000001_2: java.io.IOException: java.io.IOEx
ception: java.lang.RuntimeException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:243)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:522)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: java.io.IOException: java.lang.RuntimeException
at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:100)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:240)
... 9 more
Caused by: java.lang.RuntimeException
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:648)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:301)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167)
at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:91)
at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:94)
... 10 more
Caused by: TimedOutException()
at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:42710)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1724)
at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1709)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:637)
... 14 more
Cassandra CQLSH works with no problems... any idea??
Try increasing your replication factor to 2 for Analytics.

Resources