Unable to load an file from ADLS(azure Datalake) to Hive table - hadoop

When ever i am tring to load a file from my azure datalake storage to an Hive table using below command,
hiveContext.sql(LOAD DATA INPATH 'adl://bienodad56872stgadlstemp.azuredatalakestore.net/Enriched/Nielsen/NielsenScantrack/Incremental_withoutRepartition/NLS_SYN_SCT.csv' OVERWRITE INTO TABLE sample.test03)
i am getting an error :ApplicationMaster: User class threw exception: java.lang.reflect.InvocationTargetException
java.lang.reflect.InvocationTargetException
Whole error Log:
17/07/05 05:45:48 INFO SparkSqlParser: Parsing command: CREATE TABLE IF NOT EXISTS sample.test03 ( GEO STRING,UPC STRING,WeekEnding STRING,BaseDollars INT,BaseDollars_AnyPromo INT,BaseDollars_Display INT,BaseDollars_FeatAndDisp INT,BaseDollars_FeatAndOrDisp INT,BaseDollars_Feature INT,BaseDollars_NoPromo INT,BaseDollars_TPR INT,BaseUnits INT,BaseUnits_AnyPromo INT,BaseUnits_Display INT,BaseUnits_EQ STRING,BaseUnits_EQ_AnyPromo STRING,BaseUnits_EQ_Display STRING,BaseUnits_EQ_FeatAndDisp STRING,BaseUnits_EQ_FeatAndOrDisp STRING,BaseUnits_EQ_Feature STRING,BaseUnits_EQ_NoPromo STRING,BaseUnits_EQ_TPR STRING,BaseUnits_FeatAndDisp INT,BaseUnits_FeatAndOrDisp INT,BaseUnits_Feature INT,BaseUnits_NoPromo INT,BaseUnits_TPR INT,Dollars INT,Dollars_AnyPromo INT,Dollars_Display INT,Dollars_FeatAndDisp INT,Dollars_FeatAndOrDisp INT,Dollars_Feature INT,Dollars_NoPromo INT,Dollars_TPR INT,PACV_Discount INT,PACV_DispWOFeat INT,PACV_FeatAndDisp INT,PACV_FeatWODisp INT,Units INT,Units_AnyPromo INT,Units_Display INT,Units_EQ INT,Units_EQ_AnyPromo STRING,Units_EQ_Display STRING,Units_EQ_FeatAndDisp STRING,Units_EQ_FeatAndOrDisp STRING,Units_EQ_Feature STRING,Units_EQ_NoPromo STRING,Units_EQ_TPR STRING,Units_FeatAndDisp INT,Units_FeatAndOrDisp INT,Units_Feature INT,Units_NoPromo INT,Units_TPR INT,ACV INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE
17/07/05 05:45:49 INFO SparkSqlParser: Parsing command: LOAD DATA INPATH 'adl://bienodad56872stgadlstemp.azuredatalakestore.net/Enriched/Nielsen/NielsenScantrack/Incremental_withoutRepartition/NLS_SYN_SCT.csv' OVERWRITE INTO TABLE sample.test03
17/07/05 05:45:49 INFO SessionState: Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem.
17/07/05 05:45:49 INFO Hive: Replacing src:adl://bienodad56872stgadlstemp.azuredatalakestore.net/Enriched/Nielsen/NielsenScantrack/Incremental_withoutRepartition/NLS_SYN_SCT.csv, dest: wasb://bieno-da-d-56872-unilevercom-hdi-01#049bienobrunilevercomstg.blob.core.windows.net/hive/warehouse/sample.db/test03/NLS_SYN_SCT.csv, Status:false
17/07/05 05:45:49 ERROR ApplicationMaster: User class threw exception: java.lang.reflect.InvocationTargetException
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:633)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:646)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:646)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:646)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:280)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:269)
at org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:645)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:248)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:246)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:246)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72)
at org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:246)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadTable(SessionCatalog.scala:297)
at org.apache.spark.sql.execution.command.LoadDataCommand.run(tables.scala:335)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:682)
at com.accenture.Unilever.Nielsen.RestatementSample.Restatement(RestatementSample.scala:70)
at com.accenture.Unilever.StageToEnrich.RestatementLogic$.main(RestatementLogic.scala:36)
at com.accenture.Unilever.StageToEnrich.RestatementLogic.main(RestatementLogic.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error moving: adl://bienodad56872stgadlstemp.azuredatalakestore.net/Enriched/Nielsen/NielsenScantrack/Incremental_withoutRepartition/NLS_SYN_SCT.csv into: wasb://bieno-da-d-56872-unilevercom-hdi-01#049bienobrunilevercomstg.blob.core.windows.net/hive/warehouse/sample.db/test03/NLS_SYN_SCT.csv
at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2919)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1640)
... 44 more
Caused by: java.io.IOException: Error moving: adl://bienodad56872stgadlstemp.azuredatalakestore.net/Enriched/Nielsen/NielsenScantrack/Incremental_withoutRepartition/NLS_SYN_SCT.csv into: wasb://bieno-da-d-56872-unilevercom-hdi-01#049bienobrunilevercomstg.blob.core.windows.net/hive/warehouse/sample.db/test03/NLS_SYN_SCT.csv
at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2913)
... 45 more
I can execute the same code from HIVE shell but from spark script i am getting this error. Is there any special Jar file i need to include. Any help will be appreciated.

Related

Spark and Hive interoperability

I am using EMR-4.3.0, Spark 1.6.0, Hive 1.0.0.
I write a table like so (pseudocode) -
val df = <a dataframe>
df.registerTempTable("temptable")
sqlContext.setConf("hive.exec.dynamic.partition.mode","true")
sqlContext.sql("create external table exttable ( some columns ... )
partitioned by (partkey int) stored as parquet location 's3://some.bucket'")
sqlContext.sql("insert overwrite exttable partition(partkey) select columns from
temptable")
The write works fine and I can read the table back using -
sqlContext.sql("select * from exttable")
However, when I try to read the table using Hive as -
hive -e 'select * from exttable'
Hive throws a NullPointerException with the stack trace below. Any help appreciated! -
questTime=[0.008], ResponseProcessingTime=[0.295], HttpClientSendRequestTime=[0.313],
2016-05-19 03:08:02,537 ERROR [main()]: CliDriver (SessionState.java:printError(833)) - Failed with exception java.io.IOException:java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:663)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:561)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1619)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:221)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:153)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:364)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:631)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:570)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NullPointerException
at parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:247)
at parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:368)
at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:346)
at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:296)
at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:254)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:200)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:498)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:588)
... 15 more
UPDATE - After messing around for a bit, it seems like null values in the data mess Hive up. How do I avoid this?

pig ERROR 1200: null when using fs commands

while running pig in mapreduce mode im occuring really strange error.
The pigscript.pig contains....
x= load 'hdfs://file.avro' USING AvroStorage();
some transofrmations...
fs mv src/file dest/file;
up to this point all works fine, but script continues as
y = load 'hdfs://file2.avro' USING AvroStorage();
While executed previous command i got error bellow. I double check and the file2.avro is there ... stored in the HDFS.
When I quit pig and re-run the code from the line
y = load 'hdfs://file2.avro' USING AvroStorage();
all works fine.
Any idea?
Pig Stack Trace
---------------
ERROR 1200: null
Failed to parse: null
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:201)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1063)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:558)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NullPointerException
at org.apache.pig.builtin.AvroStorage.getAvroSchema(AvroStorage.java:298)
at org.apache.pig.builtin.AvroStorage.getAvroSchema(AvroStorage.java:282)
at org.apache.pig.builtin.AvroStorage.getSchema(AvroStorage.java:256)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:901)
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
... 16 more
================================================================================

LOAD HADOOP fails while pulling from Teradata

I am using IBM BigInsights version 4.1.0.
I used the below command to pull data from teradata.
LOAD HADOOP USING JDBC CONNECTION URL 'jdbc:teradata://<<ip_address>>/database=<<db_name>>' WITH PARAMETERS ('user' = '<<user_name>>','password'='<<password>>') FROM TABLE <<table_name>> COLUMNS (<<COL1, COL2, COL3, .... COLN>>) SPLIT COLUMN <<COLM>> INTO TABLE <<Target_bigsql_schema>>.<<target_bigsql_table>> APPEND WITH LOAD PROPERTIES ('tdch.enable'='true');
The error I am getting while executing the above command is below
2015-12-10 14:21:01,336 ERROR com.ibm.biginsights.ie.sqoop.td.wrapper.TDImportTool [Thread-3] : Teradata Connector for Hadoop tool error.
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:88)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:618)
at com.ibm.biginsights.ie.sqoop.td.wrapper.TDImportTool.callTDCH(TDImportTool.java:104)
at com.ibm.biginsights.ie.sqoop.td.wrapper.TDImportTool.run(TDImportTool.java:72)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at com.ibm.biginsights.ie.db.SqoopUtils.runSqoopTool(SqoopUtils.java:146)
at com.ibm.biginsights.ie.db.DBImportImpl.importData(DBImportImpl.java:159)
at com.ibm.biginsights.ie.impl.ImporterImpl.executeImport(ImporterImpl.java:504)
at com.ibm.biginsights.ie.impl.ImporterImpl.executePerformImport(ImporterImpl.java:417)
at com.ibm.biginsights.ie.impl.ImporterImpl.performImport(ImporterImpl.java:264)
at com.ibm.biginsights.biga.udf.LoadTool.performImport(LoadTool.java:214)
at com.ibm.biginsights.biga.udf.BIGSQL_DDL.doLoadStatement(BIGSQL_DDL.java:644)
at com.ibm.biginsights.biga.udf.BIGSQL_DDL.processDDL(BIGSQL_DDL.java:207)
Caused by: com.teradata.connector.common.exception.ConnectorException: Hive table's InputFormat class is not supported
at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:140)
... 17 more
2015-12-10 14:21:01,337 ERROR org.apache.sqoop.Sqoop [Thread-3] : Got exception running Sqoop: java.lang.RuntimeException: com.teradata.connector.common.exception.ConnectorException: Hive table's InputFormat class is not supported
2015-12-10 14:21:01,337 ERROR com.ibm.biginsights.ie.db.DBImportImpl [Thread-3] : Error during import
java.lang.RuntimeException: com.teradata.connector.common.exception.ConnectorException: Hive table's InputFormat class is not supported
at com.ibm.biginsights.ie.sqoop.td.wrapper.TDImportTool.callTDCH(TDImportTool.java:123)
at com.ibm.biginsights.ie.sqoop.td.wrapper.TDImportTool.run(TDImportTool.java:72)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at com.ibm.biginsights.ie.db.SqoopUtils.runSqoopTool(SqoopUtils.java:146)
at com.ibm.biginsights.ie.db.DBImportImpl.importData(DBImportImpl.java:159)
at com.ibm.biginsights.ie.impl.ImporterImpl.executeImport(ImporterImpl.java:504)
at com.ibm.biginsights.ie.impl.ImporterImpl.executePerformImport(ImporterImpl.java:417)
at com.ibm.biginsights.ie.impl.ImporterImpl.performImport(ImporterImpl.java:264)
at com.ibm.biginsights.biga.udf.LoadTool.performImport(LoadTool.java:214)
at com.ibm.biginsights.biga.udf.BIGSQL_DDL.doLoadStatement(BIGSQL_DDL.java:644)
at com.ibm.biginsights.biga.udf.BIGSQL_DDL.processDDL(BIGSQL_DDL.java:207)
Caused by: com.teradata.connector.common.exception.ConnectorException: Hive table's InputFormat class is not supported
at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:140)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:88)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:618)
at com.ibm.biginsights.ie.sqoop.td.wrapper.TDImportTool.callTDCH(TDImportTool.java:104)
... 12 more
2015-12-10 14:21:01,337 ERROR com.ibm.biginsights.ie.db.DBImportImpl [Thread-3] : [BSL-0-18c443e19]: Error during import (Job Id = ):com.teradata.connector.common.exception.ConnectorException: Hive table's InputFormat class is not supported
Is there any possible resolution for this?
Teradata's native CHAR and VARCHAR is not supported in TDCH.
http://www-01.ibm.com/support/knowledgecenter/SSPT3X_4.1.0/com.ibm.swg.im.infosphere.biginsights.db2biga.doc/doc/biga_load_from.html?lang=en

Issue in Hive-0.14.0 Integration in HBase-0.98.8-hadoop2

I have hive 0.14.1 hbase 0.98.8 and hadoop 2.5.0 I am trying to integrate hive with hbase and put zookeeper-3.4.6.jar,hbase-common-0.98.8-hadoop2.jar file from HBase/lib to Hive/lib. Steps Followed are as follows:
1.hive --auxpath $HIVE_HOME/lib/hive-hbase-handler-0.14.1.jar,$HIVE_HOME/lib/hbase-common-0.98.8-hadoop2.jar,$HIVE_HOME/lib/zookeeper-3.4.6.jar,$HIVE_HOME/lib/guava-11.0.2.jar -hiveconf hbase.master=masternode:60000
2.CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
This Command does not works for me.It give me error.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Scan$Builder.setCaching(I)Lorg/apache/hadoop/hbase/protobuf/generated/ClientProtos$Scan$Builder;
at org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:216)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:125)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:93)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:283)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:188)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:183)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:110)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:745)
at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:542)
at org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:310)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:307)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:321)
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:182)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:602)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:595)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
at com.sun.proxy.$Proxy8.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:670)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3959)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:295)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1604)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1364)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1177)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Scan$Builder.setCaching(I)Lorg/apache/hadoop/hbase/protobuf/generated/ClientProtos$Scan$Builder;
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:879)
at org.apache.hadoop.hbase.protobuf.RequestConverter.buildScanRequest(RequestConverter.java:488)
at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:303)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:164)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:117)
... 40 more
)
You may need to add the hbase-protocol JAR to the classpath. The symbol that could not be resolved per your stack trace appears to come from there, at least based on cursory inspection only.

"Could not get input splits" Error, with Hive-Cassandra-CqlStorageHandler

Im trying to read data from cassandra using Hive with CqlStorageHandler.
The versions:
Hive 0.11.0
Hadoop 1.2.1
Cassandra 1.2.6
Im able to create EXTERNAL table with the following HIVE Query
CREATE EXTERNAL TABLE input(number string,name string,address string) STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key, name, address", "cassandra.ks.name" ="cassandradb", "cassandra.host" = "localhost" ,"cassandra.port" = "9160") TBLPROPERTIES ("cassandra.input.split.size" = "64000","cassandra.range.size" = "1000","cassandra.slice.predicate.size" = "1000");
(The table "input" is already existing and containing some data in cassandra created with CQL3)
However, When I try to read data with the following query
select * from input where number="1";
Im facing the folowing issue:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.io.IOException: Could not get input splits
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:189)
at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:213)
at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:169)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:292)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:297)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:144)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1355)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1139)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "143514173170822869679056708180186660043"
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:185)
... 31 more
Caused by: java.lang.NumberFormatException: For input string: "143514173170822869679056708180186660043"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:444)
at java.lang.Long.valueOf(Long.java:540)
at org.apache.cassandra.dht.Murmur3Partitioner$1.fromString(Murmur3Partitioner.java:188)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:239)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:207)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Job Submission failed with exception 'java.io.IOException(Could not get input splits)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
Am I missing anything? Kindly advise.

Resources