Hive Insert - Failed with exception Unable to alter table. java.lang.NullPointerException

We are using Cloudera 5.6 and have configured Sentry for Hive. Whenever we issue an insert statement, it fails with the exception below, yet when we check the table, the row has actually been inserted properly. We have granted all permissions to hive.
insert into beckman values('rinku');
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_1459405260708_0002
INFO : The url to track the job: http://xxx.xxx.xxx.xxx:8088/proxy/application_1459405260708_0002/
INFO : Starting Job = job_1459405260708_0002, Tracking URL = http://xxx.xxx.xxx.xxx:8088/proxy/application_1459405260708_0002/
INFO : Kill Command = /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hadoop/bin/hadoop job -kill job_1459405260708_0002
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO : 2016-03-31 23:20:31,401 Stage-1 map = 0%, reduce = 0%
INFO : 2016-03-31 23:20:37,788 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
INFO : MapReduce Total cumulative CPU time: 1 seconds 400 msec
INFO : Ended Job = job_1459405260708_0002
INFO : Stage-4 is selected by condition resolver.
INFO : Stage-3 is filtered out by condition resolver.
INFO : Stage-5 is filtered out by condition resolver.
INFO : Moving data to: hdfs://ip-172-31-0-203.us-west-2.compute.internal:8020/user/hive2/warehouse/test.db/beckman/.hive-staging_hive_2016-03-31_23-20-23_107_7263355827488393299-2/-ext-10000 from hdfs://ip-172-31-0-203.us-west-2.compute.internal:8020/user/hive2/warehouse/test.db/beckman/.hive-staging_hive_2016-03-31_23-20-23_107_7263355827488393299-2/-ext-10002
INFO : Loading data to table test.beckman from hdfs://ip-172-31-0-203.us-west-2.compute.internal:8020/user/hive2/warehouse/test.db/beckman/.hive-staging_hive_2016-03-31_23-20-23_107_7263355827488393299-2/-ext-10000
ERROR : Failed with exception Unable to alter table. java.lang.NullPointerException
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.NullPointerException
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:533)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:519)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1685)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:312)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1645)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1404)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1190)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1050)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:143)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:195)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:207)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: MetaException(message:java.lang.NullPointerException)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_cascade_result$alter_table_with_cascade_resultStandardScheme.read(ThriftHiveMetastore.java:42087)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_cascade_result$alter_table_with_cascade_resultStandardScheme.read(ThriftHiveMetastore.java:42064)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_cascade_result.read(ThriftHiveMetastore.java:42006)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_cascade(ThriftHiveMetastore.java:1402)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_cascade(ThriftHiveMetastore.java:1386)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:340)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_table(SessionHiveMetaStoreClient.java:296)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91)
at com.sun.proxy.$Proxy10.alter_table(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1998)
at com.sun.proxy.$Proxy10.alter_table(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:531)
... 22 more
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask (state=08S01,code=1)

I have noticed this issue when the table is referenced with the database-qualified syntax, i.e. something like "<DATABASE>.<TABLENAME>" (an INSERT triggers an internal ALTER TABLE through the MoveTask, as the stack trace above shows). To work around it, explicitly switch to the database before you execute the actual query. A sample illustration:
use sampledb;
alter table employeeTable ... ;          -- this works
vs
alter table sampledb.employeeTable ... ; -- this doesn't work; do not use it
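Applied to the failing statement at the top, the workaround would look like this (a sketch; the database name test is taken from the warehouse path in the log above):
use test;
insert into beckman values('rinku');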

Related

hive: java.lang.IllegalArgumentException: Can not create a Path from an empty string

I'm using Hive 1.1.0 on Cloudera CDH 5.4.7 with Hadoop 2.6.0.
When I run a query with an aggregation function in Hive, I get the following error:
ERROR : Job Submission failed with exception 'java.lang.IllegalArgumentException(Can not create a Path from an empty string)'
The log file :
INFO : Number of reduce tasks not specified. Defaulting to jobconf value of: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : Cleaning up the staging area /user/hive/.staging/job_1453471005617_0680
ERROR : Job Submission failed with exception 'java.lang.IllegalArgumentException(Can not create a Path from an empty string)'
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
at org.apache.hadoop.fs.Path.<init>(Path.java:135)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:215)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:429)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1398)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1182)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1048)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1043)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:144)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Any ideas, please?

Cosmos Hive error entering and using map reduce

I have a couple of problems executing Hive on a Cosmos FIWARE Lab instance.
First, after logging into the machine, I enter the Hive command line and get the following error (I saw other questions related to this, but I couldn't find a solution):
$ hive
log4j:ERROR Could not instantiate class [org.apache.hadoop.hive.shims.HiveEventCounter].
java.lang.RuntimeException: Could not load shims in class org.apache.hadoop.log.metrics.EventCounter
at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:123)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:115)
at org.apache.hadoop.hive.shims.ShimLoader.getEventCounter(ShimLoader.java:98)
at org.apache.hadoop.hive.shims.HiveEventCounter.<init>(HiveEventCounter.java:34)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:357)
at java.lang.Class.newInstance(Class.java:310)
at org.apache.log4j.helpers.OptionConverter.instantiateByClassName(OptionConverter.java:330)
at org.apache.log4j.helpers.OptionConverter.instantiateByKey(OptionConverter.java:121)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:664)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:647)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:544)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:440)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:476)
at org.apache.log4j.PropertyConfigurator.configure(PropertyConfigurator.java:354)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:127)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:77)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:58)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:641)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.log.metrics.EventCounter
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:120)
... 27 more
log4j:ERROR Could not instantiate appender named "EventCounter".
Logging initialized using configuration in jar:file:/usr/local/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
However, I'm able to run a query like SELECT * FROM table;
On the other hand, if I try to run a more specific query, such as selecting only a single column, a MapReduce job starts and it results in the following error:
hive> SELECT table.column FROM table;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201507101501_40071, Tracking URL = http://cosmosmaster-gi:50030/jobdetails.jsp?jobid=job_201507101501_40071
Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job -kill job_201507101501_40071
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-01-29 12:49:45,518 Stage-1 map = 0%, reduce = 0%
2016-01-29 12:50:08,642 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201507101501_40071 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://cosmosmaster-gi:50030/jobdetails.jsp?jobid=job_201507101501_40071
Examining task ID: task_201507101501_40071_m_000002 (and more) from job job_201507101501_40071
Task with the most failures(4):
-----
Task ID:
task_201507101501_40071_m_000000
URL:
http://cosmosmaster-gi:50030/taskdetails.jsp?jobid=job_201507101501_40071&tipid=task_201507101501_40071_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Any help or suggestion is welcome.
Thanks.
The first error is not relevant and does not affect Hive querying, as you have seen.
Regarding the second error, it is most probably because the data stored in HDFS is in JSON format (most probably written by the Cygnus tool), so a JSON serializer/deserializer (SerDe) must be set. You can do this by executing the following statements before doing the select column from table:
hive> add jar /usr/local/apache-hive-0.13.0-bin/lib/json-serde-1.3.1-SNAPSHOT-jar-with-dependencies.jar;
hive> select column from table;
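For context, the table being queried would itself be declared with that SerDe. A minimal sketch, assuming the OpenX JSON SerDe class that ships in the jar above and hypothetical table/column names:
-- hypothetical declaration of a JSON-backed table using the OpenX JSON SerDe
CREATE EXTERNAL TABLE cygnus_events (
  recvtime STRING,
  entityid STRING,
  attrvalue STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/user/myuser/mydata';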

Hive will not write to AWS S3

I have an external table in Hive that is stored on my Hadoop cluster, and I want to move its contents into an external table stored on Amazon S3.
So I created an S3-backed table like so:
CREATE EXTERNAL TABLE IF NOT EXISTS export.export_table
LIKE table_to_be_exported
ROW FORMAT SERDE ...
WITH SERDEPROPERTIES ('fieldDelimiter'='|')
STORED AS TEXTFILE
LOCATION 's3a://bucket/folder';
Then I run: INSERT INTO export.export_table SELECT * FROM table_to_be_exported;
It outputs the following:
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : Starting Job = job_1435176004514_0028, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1435176004514_0028/
INFO : Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1435176004514_0028
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO : 2015-07-06 09:22:18,379 Stage-1 map = 0%, reduce = 0%
INFO : 2015-07-06 09:22:27,795 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.9 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 900 msec
INFO : Ended Job = job_1435176004514_0028
INFO : Stage-4 is selected by condition resolver.
INFO : Stage-3 is filtered out by condition resolver.
INFO : Stage-5 is filtered out by condition resolver.
INFO : Moving data to: s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10000 from s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10002
ERROR : Failed with exception Wrong FS: s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10002, expected: hdfs://quickstart.cloudera:8020
java.lang.IllegalArgumentException: Wrong FS: s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10002, expected: hdfs://quickstart.cloudera:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1916)
at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1187)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2449)
at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:105)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:222)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask (state=08S01,code=1)
I have the s3a key and secret set in my Hadoop core-site.xml, and I am able to read and write from S3 directly using Hadoop (hdfs dfs -ls s3a://...).
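For reference, the core-site.xml entries in question are the standard hadoop-aws s3a credential properties; a sketch, with the values elided:
<!-- standard hadoop-aws s3a credential properties; values elided -->
<property>
  <name>fs.s3a.access.key</name>
  <value>...</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>...</value>
</property>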
Any guesses as to what I could do to get this to work?
Try using s3 instead of s3a; my guess is that s3a is not yet supported in EMR's Hive distribution.
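If recreating the table is undesirable, the same change can be tried in place. A sketch, using the table from the question:
-- repoint the existing external table at the s3:// scheme instead of s3a://
ALTER TABLE export.export_table SET LOCATION 's3://bucket/folder';
For an external table this only updates the metastore location; any data already written stays where it is.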

java.io.IOException: Job status not available in Hive

When I run select * from table_name; in Hive, it works.
When I run select t.a from table_name t or select * from table_name where ..., the following error occurs:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1414949555870_118360, Tracking URL = N/A
Kill Command = /usr/local/hadoop-2.5.1/bin/hadoop job -kill job_1414949555870_118360
java.io.IOException: Job status not available
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
at org.apache.hadoop.mapreduce.Job.getJobState(Job.java:347)
at org.apache.hadoop.mapred.JobClient$NetworkedJob.getJobState(JobClient.java:295)
at org.apache.hadoop.hive.shims.HadoopShimsSecure.isJobPreparing(HadoopShimsSecure.java:104)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:541)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1485)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1263)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1091)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Ended Job = job_1414949555870_118360 with exception java.io.IOException(Job status not available )
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
So, what is wrong with Hive?
The DEBUG info below may be useful. Please help!
15/04/14 16:53:48 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client#60cb201b
15/04/14 16:53:48 DEBUG mapred.ClientServiceDelegate: Failed to contact AM/History for job job_1414949555870_118441 retrying..
java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "slave109":43759; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost

Datastax Enterprise 3.2 hive timeout exception

I tried to run a simple Hive query through DataStax Enterprise, but it always fails with a timeout (on a small data set, or even on empty tables). I've got 4 m1.large nodes on AWS (2x Cassandra and 2x Analytics). See below:
cqlsh:intracker> select count(*) from event_tracks_by_browser_date LIMIT 100000;
count
-------
15030
Then with hive:
hive> select * from event_tracks_by_browser_date where type_id=10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201312292327_0003, Tracking URL = http://node3:50030/jobdetails.jsp?jobid=job_201312292327_0003
Kill Command = /usr/bin/dse hadoop job -Dmapred.job.tracker=10.234.9.204:8012 -kill job_201312292327_0003
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2013-12-30 10:30:27,578 Stage-1 map = 0%, reduce = 0%
2013-12-30 10:31:27,890 Stage-1 map = 0%, reduce = 0%
2013-12-30 10:32:28,137 Stage-1 map = 0%, reduce = 0%
2013-12-30 10:33:12,344 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201312292327_0003 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201312292327_0003_m_000003 (and more) from job job_201312292327_0003
Exception in thread "Thread-10" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getStackTraces(TaskLogProcessor.java:240)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:227)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:92)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://node2:50060/tasklog?taskid=attempt_201312292327_0003_m_000000_1&start=-8193
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
at java.net.URL.openStream(URL.java:1037)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getStackTraces(TaskLogProcessor.java:192)
... 3 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 2 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
I checked system.log and it looks like some kind of timeout occurs:
INFO [IPC Server handler 6 on 8012] 2013-12-30 10:32:24,880 TaskInProgress.java (line 551) Error from attempt_201312292327_0003_m_000001_2: java.io.IOException: java.io.IOException: java.lang.RuntimeException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:243)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:522)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: java.io.IOException: java.lang.RuntimeException
at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:100)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:240)
... 9 more
Caused by: java.lang.RuntimeException
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:648)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:301)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167)
at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:91)
at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:94)
... 10 more
Caused by: TimedOutException()
at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:42710)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1724)
at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1709)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:637)
... 14 more
Cassandra cqlsh works with no problems. Any ideas?
Try increasing your replication factor to 2 for Analytics.
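A sketch of that change in CQL (the keyspace name is taken from the cqlsh prompt above; the datacenter names 'Cassandra' and 'Analytics' are assumptions and must match your cluster's actual DC names):
-- raise the replication factor for the Analytics DC to 2 (DC names are assumptions)
ALTER KEYSPACE intracker
  WITH REPLICATION = {'class': 'NetworkTopologyStrategy',
                      'Cassandra': 2, 'Analytics': 2};
After altering replication, run nodetool repair on the affected nodes so the new replicas are actually populated.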
