org.apache.hadoop.hbase.io.ImmutableBytesWritable exception in HBase - hadoop

We tried to test the following example code for accessing HBase tables (Spark-1.3.1, HBase-1.1.1, Hadoop-2.7.0):
import sys
from pyspark import SparkContext
if __name__ == "__main__":
if len(sys.argv) != 3:
print >> sys.stderr, """
Usage: hbase_inputformat <host> <table>
Run with example jar:
./bin/spark-submit --driver-class-path /path/to/example/jar \
/path/to/examples/hbase_inputformat.py <host> <table>
Assumes you have some data in HBase already, running on <host>, in <table>
"""
exit(-1)
host = sys.argv[1]
table = sys.argv[2]
sc = SparkContext(appName="HBaseInputFormat")
conf = {"hbase.zookeeper.quorum": host, "hbase.mapreduce.inputtable": table}
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
hbase_rdd = sc.newAPIHadoopRDD(
"org.apache.hadoop.hbase.mapreduce.TableInputFormat",
"org.apache.hadoop.hbase.io.ImmutableBytesWritable",
"org.apache.hadoop.hbase.client.Result",
keyConverter=keyConv,
valueConverter=valueConv,
conf=conf)
output = hbase_rdd.collect()
for (k, v) in output:
print (k, v)
sc.stop()
We got the following error:
15/10/14 12:46:24 INFO BlockManagerMaster: Registered BlockManager
Traceback (most recent call last):
File "/opt/python/son.py", line 30, in
conf=conf)
File "/usr/hdp/2.3.0.0-2557/spark/python/pyspark/context.py", line 547, in newAPIHadoopRDD
jconf, batchSize)
File "/usr/hdp/2.3.0.0-2557/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in call
File "/usr/hdp/2.3.0.0-2557/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.io.ImmutableBytesWritable
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.spark.util.Utils$.classForName(Utils.scala:157)
at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDDFromClassNames(PythonRDD.scala:509)
at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:494)
at org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Any insights are highly appreciated.

The error occurs because you haven't got the HBase libs in your classpath. You will need hbase-common and hbase-client jars, which you should pass to pyspark via the --jars parameters

I resolved this by execute the MapReduce Job by adding hbase-common.jar in environment variable: HADOOP_CLASSPATH:
export
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/lib/hbase-common-1.3.1.jar

Related

Sparksession error is about hive

My OS is windows 10
from pyspark.conf import SparkConf
sc = SparkContext.getOrCreate()
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
This code gives me below error
Py4JJavaError Traceback (most recent call
last)
~\Documents\spark\spark-2.1.0-bin-hadoop2.7\python\pyspark\sql\utils.py
in deco(*a, **kw)
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
~\Documents\spark\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py
in get_return_value(answer, gateway_client, target_id, name)
318 "An error occurred while calling {0}{1}{2}.\n".
--> 319 format(target_id, ".", name), value)
320 else:
Py4JJavaError: An error occurred while calling o22.sessionState. :
java.lang.IllegalArgumentException: Error while instantiating
'org.apache.spark.sql.hive.HiveSessionState': at
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
at
org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at
org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606) at
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at
py4j.Gateway.invoke(Gateway.java:280) at
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79) at
py4j.GatewayConnection.run(GatewayConnection.java:214) at
java.lang.Thread.run(Thread.java:745) Caused by:
java.lang.reflect.InvocationTargetException at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
... 13 more Caused by: java.lang.IllegalArgumentException: Error
while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at
org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
at
org.apache.spark.sql.internal.SharedState.(SharedState.scala:86)
at
org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at
org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at scala.Option.getOrElse(Option.scala:121) at
org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
at
org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
at
org.apache.spark.sql.internal.SessionState.(SessionState.scala:157)
at
org.apache.spark.sql.hive.HiveSessionState.(HiveSessionState.scala:32)
... 18 more Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method) at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
... 26 more Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method) at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at
org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366)
at
org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270)
at
org.apache.spark.sql.hive.HiveExternalCatalog.(HiveExternalCatalog.scala:65)
... 31 more Caused by: java.lang.RuntimeException:
java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at
org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:192)
... 39 more Caused by: java.lang.RuntimeException: Unable to
instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
My full code is here
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
import findspark
findspark.init('C:/Users/asus/Documents/spark/spark-2.1.0-bin-hadoop2.7')
import pyspark from pyspark.conf
import SparkConf sc = SparkContext.getOrCreate()
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
From the code you posted it seems you are a Java developer or maybe you were in a hurry to paste the code. In python, you do not write variables with their types like we do in Java i.e
SparkContext sc =SparkContext.getOrCreate().
Also, starting from Spark version 2.0+, you need to create a SparkSession object which is the entry point to your application. you derive your SparkContext from this object itself. Trying to create another SparkContext "sc = SparkContext.getOrCreate()" results in errors. This is due to the fact that by design, only a single SparkContext can run in a given single JVM. if a new Context is required you need to stop the previously created SparkContext with sc.stop().
That being said from your stack-trace and code I also think you are testing your app locally and do not have a Hadoop and Hive installation on your local machine which is giving you the error:
Caused by: java.lang.RuntimeException: java.lang.RuntimeException:
Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at ...
You can install Hadoop and Hive on your Windows machine and try out the following code snippet.
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.appName('CalculatingGeoDistances') \
.enableHiveSupport() \
.getOrCreate()
sc = spark.sparkContext

Error with hc=H2OContext.getOrCreate(sc) in pysparkling

I am new in Pysparkling. I work with yarn cluster, Spark 1.6, Cloudera CDH 5.8.0,python 2.7.6 and i have problem with hc=H2OContext.getOrCreate(sc). Do you have some ideas ?
from pysparkling import * import h2o hc = H2OContext.getOrCreate(sc)
17/04/16 17:13:59 INFO spark.SparkContext: Added JAR /root/.cache/Python-Eggs/h2o_pysparkling_1.6-1.6.10-py2.7.eg g-tmp/sparkling_water/sparkling_water_assembly.jar at spark://147.232.202.114:47251/jars/sparkling_water_assembly .jar with timestamp 1492355639066
17/04/16 17:13:59 WARN internal.InternalH2OBackend: Increasing 'spark.locality.wait' to value 30000 17/04/16 17:13:59 WARN internal.InternalH2OBackend: Due to non-deterministic behavior of Spark broadcast-based jo ins We recommend to disable them by configuring spark.sql.autoBroadcastJoinThreshold variable to value -1: sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=-1")
17/04/16 17:13:59 WARN internal.InternalH2OBackend: The property 'spark.scheduler.minRegisteredResourcesRatio' is not specified! We recommend to pass --conf spark.scheduler.minRegisteredResourcesRatio=1
17/04/16 17:13:59 WARN internal.InternalH2OBackend: Unsupported options spark.dynamicAllocation.enabled detected!
17/04/16 17:13:59 WARN internal.InternalH2OBackend: The application is going down, since the parameter (spark.ext.h2o.fail.on.unsupported.spark.param,true) is true! If you would like to skip the fail call, please, specify the value of the parameter to false.
Traceback (most recent call last): File "", line 1, in File "build/bdist.linux-x86_64/egg/pysparkling/context.py", line 128, in getOrCreate File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway. py", line 813, in call File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/sql/utils.py", line 45, in deco return f(a, *kw) File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o54.invoke. : java.lang.IllegalArgumentException: Unsupported argument: (spark.dynamicAllocation.enabled,true) at org.apache.spark.h2o.backends.internal.InternalBackendUtils$$anonfun$checkUnsupportedSparkOptions$1.ap ply(InternalBackendUtils.scala:48) at org.apache.spark.h2o.backends.internal.InternalBackendUtils$$anonfun$checkUnsupportedSparkOptions$1.ap ply(InternalBackendUtils.scala:40) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.h2o.backends.internal.InternalBackendUtils$class.checkUnsupportedSparkOptions(Interna lBackendUtils.scala:40) at org.apache.spark.h2o.backends.internal.InternalH2OBackend.checkUnsupportedSparkOptions(InternalH2OBack end.scala:31) at org.apache.spark.h2o.backends.internal.InternalH2OBackend.checkAndUpdateConf(InternalH2OBackend.scala: 61) at org.apache.spark.h2o.H2OContext.(H2OContext.scala:96) at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:294) at org.apache.spark.h2o.H2OContext.getOrCreate(H2OContext.scala) at org.apache.spark.h2o.JavaH2OContext.getOrCreate(JavaH2OContext.java:191) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) at py4j.Gateway.invoke(Gateway.java:259) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:209) at java.lang.Thread.run(Thread.java:745)
This can be solved by running the program with the following command, this has been tested with spark 1.6 and H2O version > 3.0
bin/pysparkling h2o_spark.py --conf spark.ext.h2o.fail.on.unsupported.spark.param=false

pig ERROR 1200: null when using fs commands

while running pig in mapreduce mode im occuring really strange error.
The pigscript.pig contains....
x= load 'hdfs://file.avro' USING AvroStorage();
some transofrmations...
fs mv src/file dest/file;
up to this point all works fine, but script continues as
y = load 'hdfs://file2.avro' USING AvroStorage();
While executed previous command i got error bellow. I double check and the file2.avro is there ... stored in the HDFS.
When I quit pig and re-run the code from the line
y = load 'hdfs://file2.avro' USING AvroStorage();
all works fine.
Any idea?
Pig Stack Trace
---------------
ERROR 1200: null
Failed to parse: null
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:201)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1063)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:558)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NullPointerException
at org.apache.pig.builtin.AvroStorage.getAvroSchema(AvroStorage.java:298)
at org.apache.pig.builtin.AvroStorage.getAvroSchema(AvroStorage.java:282)
at org.apache.pig.builtin.AvroStorage.getSchema(AvroStorage.java:256)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:901)
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
... 16 more
================================================================================

NLTK StanfordPOSTagger not working - Windows

OS:Windows 10 x64
Python: 2.7.3
NLTK: 3.1
I want to use Stanford Pos tagger in python so based on documentation in here, I did these:
I've downloaded stanford-postagger-2015-04-20.zip from here and extract it to D:\Downloads\stanford-postagger-2015-04-20\stanford-postagger-2015-04-20
I've created two environment variables:
CLASSPATH >>> D:\Downloads\stanford-postagger-2015-04-20\stanford-postagger-2015-04-20\stanford-postagger.jar
STANFORD_MODELS >>> D:\Downloads\stanford-postagger-2015-04-20\stanford-postagger-2015-04-20\models\
Using this code to use the tagger:
import os
java_path = "C:/Program Files (x86)/Java/jdk1.7.0_71/bin/java.exe"
os.environ['JAVAHOME'] = java_path
from nltk.tag import StanfordPOSTagger
st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
st.tag('What is the airspeed of an unladen swallow ?'.split())
raises this error:
java.lang.UnsupportedClassVersionError: edu/stanford/nlp/tagger/maxent/MaxentTagger : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Exception in thread "main"
Traceback (most recent call last):
File "<tmp 1>", line 9, in <module>
st.tag('What is the airspeed of an unladen swallow ?'.split())
File "c:\python27\lib\site-packages\nltk\tag\stanford.py", line 66, in tag
return sum(self.tag_sents([tokens]), [])
File "c:\python27\lib\site-packages\nltk\tag\stanford.py", line 89, in tag_sents
stdout=PIPE, stderr=PIPE)
File "c:\python27\lib\site-packages\nltk\internals.py", line 134, in java
raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed : ['C:/Program Files (x86)/Java/jdk1.7.0_71/bin/java.exe', '-mx1000m', '-cp', 'D:\\Downloads\\stanford-postagger-2015-04-20\\stanford-postagger-2015-04-20\\stanford-postagger.jar', 'edu.stanford.nlp.tagger.maxent.MaxentTagger', '-model', 'D:\\Downloads\\stanford-postagger-2015-04-20\\stanford-postagger-2015-04-20\\models\\english-bidirectional-distsim.tagger', '-textFile', 'c:\\users\\wiki\\appdata\\local\\temp\\tmp0ruajz', '-tokenize', 'false', '-outputFormatOptions', 'keepEmptySentences', '-encoding', 'utf8']
What should I do?

"Could not get input splits" Error, with Hive-Cassandra-CqlStorageHandler

Im trying to read data from cassandra using Hive with CqlStorageHandler.
The versions:
Hive 0.11.0
Hadoop 1.2.1
Cassandra 1.2.6
Im able to create EXTERNAL table with the following HIVE Query
CREATE EXTERNAL TABLE input(number string,name string,address string) STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key, name, address", "cassandra.ks.name" ="cassandradb", "cassandra.host" = "localhost" ,"cassandra.port" = "9160") TBLPROPERTIES ("cassandra.input.split.size" = "64000","cassandra.range.size" = "1000","cassandra.slice.predicate.size" = "1000");
(The table "input" is already existing and containing some data in cassandra created with CQL3)
However, When I try to read data with the following query
select * from input where number="1";
Im facing the folowing issue:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.io.IOException: Could not get input splits
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:189)
at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:213)
at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:169)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:292)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:297)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:144)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1355)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1139)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "143514173170822869679056708180186660043"
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:185)
... 31 more
Caused by: java.lang.NumberFormatException: For input string: "143514173170822869679056708180186660043"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:444)
at java.lang.Long.valueOf(Long.java:540)
at org.apache.cassandra.dht.Murmur3Partitioner$1.fromString(Murmur3Partitioner.java:188)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:239)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:207)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Job Submission failed with exception 'java.io.IOException(Could not get input splits)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
Am I missing anything? Kindly advise.

Resources