Sparksession error is about hive - hadoop

My OS is windows 10
from pyspark.conf import SparkConf
sc = SparkContext.getOrCreate()
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
This code gives me below error
Py4JJavaError Traceback (most recent call
last)
~\Documents\spark\spark-2.1.0-bin-hadoop2.7\python\pyspark\sql\utils.py
in deco(*a, **kw)
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
~\Documents\spark\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py
in get_return_value(answer, gateway_client, target_id, name)
318 "An error occurred while calling {0}{1}{2}.\n".
--> 319 format(target_id, ".", name), value)
320 else:
Py4JJavaError: An error occurred while calling o22.sessionState. :
java.lang.IllegalArgumentException: Error while instantiating
'org.apache.spark.sql.hive.HiveSessionState': at
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
at
org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at
org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606) at
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at
py4j.Gateway.invoke(Gateway.java:280) at
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79) at
py4j.GatewayConnection.run(GatewayConnection.java:214) at
java.lang.Thread.run(Thread.java:745) Caused by:
java.lang.reflect.InvocationTargetException at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
... 13 more Caused by: java.lang.IllegalArgumentException: Error
while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at
org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
at
org.apache.spark.sql.internal.SharedState.(SharedState.scala:86)
at
org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at
org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at scala.Option.getOrElse(Option.scala:121) at
org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
at
org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
at
org.apache.spark.sql.internal.SessionState.(SessionState.scala:157)
at
org.apache.spark.sql.hive.HiveSessionState.(HiveSessionState.scala:32)
... 18 more Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method) at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
... 26 more Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method) at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at
org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366)
at
org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270)
at
org.apache.spark.sql.hive.HiveExternalCatalog.(HiveExternalCatalog.scala:65)
... 31 more Caused by: java.lang.RuntimeException:
java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at
org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:192)
... 39 more Caused by: java.lang.RuntimeException: Unable to
instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
My full code is here
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
import findspark
findspark.init('C:/Users/asus/Documents/spark/spark-2.1.0-bin-hadoop2.7')
import pyspark from pyspark.conf
import SparkConf sc = SparkContext.getOrCreate()
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

From the code you posted it seems you are a Java developer or maybe you were in a hurry to paste the code. In python, you do not write variables with their types like we do in Java i.e
SparkContext sc =SparkContext.getOrCreate().
Also, starting from Spark version 2.0+, you need to create a SparkSession object which is the entry point to your application. you derive your SparkContext from this object itself. Trying to create another SparkContext "sc = SparkContext.getOrCreate()" results in errors. This is due to the fact that by design, only a single SparkContext can run in a given single JVM. if a new Context is required you need to stop the previously created SparkContext with sc.stop().
That being said from your stack-trace and code I also think you are testing your app locally and do not have a Hadoop and Hive installation on your local machine which is giving you the error:
Caused by: java.lang.RuntimeException: java.lang.RuntimeException:
Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at ...
You can install Hadoop and Hive on your Windows machine and try out the following code snippet.
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.appName('CalculatingGeoDistances') \
.enableHiveSupport() \
.getOrCreate()
sc = spark.sparkContext

Related

Can Create Hive Table using XML but having an error when Querying

I need to create a hive table using a set of xml files in a hadoop directory.
I have tried using the code below:
add jar hdfs:///user/hivexmlserde-1.0.5.3.jar;
CREATE EXTERNAL TABLE test
(sport STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.sport"="/event_list/event/#Sport")
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION'/user/2019-04-21'
TBLPROPERTIES (
"xmlinput.start"="<event_list>",
"xmlinput.end"="</event_list>");
It returns the error below:
So I tried adding another jar file.
add jar hdfs:///user/hive_serde.jar;
and with this I can create the table.
Querying select * from test limit 10; will return a result but when querying with aggregates like select sport, count(*) from test limit 10; it returns an error below:
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1555929930390_0107_1_00, diagnostics=[Vertex vertex_1555929930390_0107_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:152)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4122)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3100(VertexImpl.java:207)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2932)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2879)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2861)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1957)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:206)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2317)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2303)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://sbx-hdp-mn01.ourbiworld.prod:8020/tmp/hive/hive/937f8627-235f-4adb-a06d-38a975990aaa/hive_2019-04-27_05-34-39_420_1560175914632249959-28/hive/_tez_scratch_dir/025ba737-1593-4ccc-b255-5959bc6d4b70/map.xml
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:509)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:342)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:137)
... 30 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$EnumSerializer" for class: org.apache.hadoop.hive.ql.metadata.VirtualColumn
Serialization trace:
virtualCols (org.apache.hadoop.hive.ql.plan.TableScanDesc)
conf (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:180)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:686)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:210)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectByKryo(SerializationUtilities.java:707)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:613)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:590)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:470)
... 32 more
Caused by: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$EnumSerializer" for class: org.apache.hadoop.hive.ql.metadata.VirtualColumn
at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:67)
at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:45)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.getDefaultSerializer(Kryo.java:359)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.registerImplicit(DefaultClassResolver.java:74)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:490)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.getSerializer(Kryo.java:505)
at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:120)
at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
... 51 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:60)
... 61 more
Caused by: java.lang.NoSuchFieldError: stringTypeInfo
at org.apache.hadoop.hive.ql.metadata.VirtualColumn.<clinit>(VirtualColumn.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.lang.Class.getEnumConstantsShared(Class.java:3320)
at java.lang.Class.getEnumConstants(Class.java:3297)
at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$EnumSerializer.<init>(DefaultSerializers.java:392)
... 66 more
]
ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1555929930390_0107_1_01, diagnostics=[Vertex received Kill in NEW state., Vertex vertex_1555929930390_0107_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : AM_CPU_MILLISECONDS: 500
INFO : AM_GC_TIME_MILLIS: 0
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1555929930390_0107_1_00, diagnostics=[Vertex vertex_1555929930390_0107_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:152)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4122)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3100(VertexImpl.java:207)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2932)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2879)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2861)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1957)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:206)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2317)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2303)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://sbx-hdp-mn01.ourbiworld.prod:8020/tmp/hive/hive/937f8627-235f-4adb-a06d-38a975990aaa/hive_2019-04-27_05-34-39_420_1560175914632249959-28/hive/_tez_scratch_dir/025ba737-1593-4ccc-b255-5959bc6d4b70/map.xml
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:509)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:342)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:137)
... 30 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$EnumSerializer" for class: org.apache.hadoop.hive.ql.metadata.VirtualColumn
Serialization trace:
virtualCols (org.apache.hadoop.hive.ql.plan.TableScanDesc)
conf (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:180)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:686)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:210)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectByKryo(SerializationUtilities.java:707)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:613)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:590)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:470)
... 32 more
Caused by: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$EnumSerializer" for class: org.apache.hadoop.hive.ql.metadata.VirtualColumn
at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:67)
at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:45)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.getDefaultSerializer(Kryo.java:359)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.registerImplicit(DefaultClassResolver.java:74)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:490)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.getSerializer(Kryo.java:505)
at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:120)
at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
... 51 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:60)
... 61 more
Caused by: java.lang.NoSuchFieldError: stringTypeInfo
at org.apache.hadoop.hive.ql.metadata.VirtualColumn.<clinit>(VirtualColumn.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.lang.Class.getEnumConstantsShared(Class.java:3320)
at java.lang.Class.getEnumConstants(Class.java:3297)
at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$EnumSerializer.<init>(DefaultSerializers.java:392)
... 66 more
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1555929930390_0107_1_01, diagnostics=[Vertex received Kill in NEW state., Vertex vertex_1555929930390_0107_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
INFO : Completed executing command(queryId=hive_20190427053439_1914cb17-3ac9-43ce-9d1a-0fde3b0e117e); Time taken: 3.29 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1555929930390_0107_1_00, diagnostics=[Vertex vertex_1555929930390_0107_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:152)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4122)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3100(VertexImpl.java:207)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2932)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2879)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2861)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1957)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:206)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2317)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2303)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://sbx-hdp-mn01.ourbiworld.prod:8020/tmp/hive/hive/937f8627-235f-4adb-a06d-38a975990aaa/hive_2019-04-27_05-34-39_420_1560175914632249959-28/hive/_tez_scratch_dir/025ba737-1593-4ccc-b255-5959bc6d4b70/map.xml
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:509)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:342)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:137)
... 30 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$EnumSerializer" for class: org.apache.hadoop.hive.ql.metadata.VirtualColumn
Serialization trace:
virtualCols (org.apache.hadoop.hive.ql.plan.TableScanDesc)
conf (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:180)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:686)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:210)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectByKryo(SerializationUtilities.java:707)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:613)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:590)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:470)
... 32 more
The problem is I can create a table but cannot query.
Thanks for any help.
Your aggregation query does not contain group by, this is allowed only if you are aggregating all columns in the select. Add group by sport:
select sport, count(*) cnt from test group by sport limit 10;

Spark2 data load issue in instantiating HiveSessionState

Getting following issues during data read using Spark2 with cluster mode.
"java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
I am absolutely clueless about this issue after googling a lot. Please help.
The code I have run
spark = SparkSession.builder.getOrCreate();
val lines: Dataset[String] = spark.read.textFile("/data/sample/abc.csv").
Exception is coming from above line.
Exception full stack Trace:
ERROR yarn.ApplicationMaster: User class threw exception: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:549)
at org.apache.spark.sql.SparkSession.read(SparkSession.scala:605)
at com.abcd.Learning$.main(Learning.scala:26)
at com.abcd.Learning.main(Learning.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:646)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
... 11 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
... 16 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
... 24 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:353)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:257)
at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)
... 29 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.pepperdata.supervisor.agent.resource.LocalFileSystemWrapper not found
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:548)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
... 37 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.pepperdata.supervisor.agent.resource.LocalFileSystemWrapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2705)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:97)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2748)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2730)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:356)
at org.apache.hadoop.hive.ql.session.SessionState.createPath(SessionState.java:666)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:593)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:526)
... 38 more
Caused by: java.lang.ClassNotFoundException: Class com.pepperdata.supervisor.agent.resource.LocalFileSystemWrapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
... 48 more
Similar to the solution given here worked for me.
I did the following
zipped the spark jars directory here: /usr/local/Cellar/apache-spark/2.1.0/libexec/jars, and named it spark-jars.zip
copied the spark-jars.zip to hdfs: $ hdfs dfs -copyFromLocal /usr/local/Cellar/apache-spark/2.1.0/libexec/spark-jars.zip hdfs:/user/<username>/
passed the spark-jars.zip location in the configuration while executing the spark job: $ HADOOP_CONF_DIR=/Users/<username>/hadoop_conf spark-submit --conf spark.yarn.archive=hdfs:/user/<username>/spark-jars.zip --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --class "com.<whatever>.<package>" --master yarn --deploy-mode cluster --queue online1 --driver-memory 3G --executor-memory 3G ./build/libs/<main class>.jar

some errors on our Spark, our spark yarn model can not work any away

The system report the error log tips under the lines, it saying that we have some problems in our hive system, but i had reviews so many times on my hive conf file, don't find any problems:
================================== Traceback (most recent call last): File "/data/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw) File "/data/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line
319, in get_return_value py4j.protocol.Py4JJavaError: An error
occurred while calling o22.sessionState. :
java.lang.IllegalArgumentException: Error while instantiating
'org.apache.spark.sql.hive.HiveSessionState':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
... 13 more Caused by: java.lang.IllegalArgumentException: Error while instantiating
'org.apache.spark.sql.hive.HiveExternalCatalog':
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
... 18 more Caused by: java.lang.NoSuchMethodException: org.apache.spark.sql.hive.HiveExternalCatalog.<init>(org.apache.spark.SparkConf,
org.apache.hadoop.conf.Configuration)
at java.lang.Class.getConstructor0(Class.java:2892)
at java.lang.Class.getDeclaredConstructor(Class.java:2058)
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:164)
... 26 more

org.apache.hadoop.hbase.io.ImmutableBytesWritable exception in HBase

We tried to test the following example code for accessing HBase tables (Spark-1.3.1, HBase-1.1.1, Hadoop-2.7.0):
import sys
from pyspark import SparkContext
if __name__ == "__main__":
if len(sys.argv) != 3:
print >> sys.stderr, """
Usage: hbase_inputformat <host> <table>
Run with example jar:
./bin/spark-submit --driver-class-path /path/to/example/jar \
/path/to/examples/hbase_inputformat.py <host> <table>
Assumes you have some data in HBase already, running on <host>, in <table>
"""
exit(-1)
host = sys.argv[1]
table = sys.argv[2]
sc = SparkContext(appName="HBaseInputFormat")
conf = {"hbase.zookeeper.quorum": host, "hbase.mapreduce.inputtable": table}
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
hbase_rdd = sc.newAPIHadoopRDD(
"org.apache.hadoop.hbase.mapreduce.TableInputFormat",
"org.apache.hadoop.hbase.io.ImmutableBytesWritable",
"org.apache.hadoop.hbase.client.Result",
keyConverter=keyConv,
valueConverter=valueConv,
conf=conf)
output = hbase_rdd.collect()
for (k, v) in output:
print (k, v)
sc.stop()
We got the following error:
15/10/14 12:46:24 INFO BlockManagerMaster: Registered BlockManager
Traceback (most recent call last):
File "/opt/python/son.py", line 30, in
conf=conf)
File "/usr/hdp/2.3.0.0-2557/spark/python/pyspark/context.py", line 547, in newAPIHadoopRDD
jconf, batchSize)
File "/usr/hdp/2.3.0.0-2557/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in call
File "/usr/hdp/2.3.0.0-2557/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.io.ImmutableBytesWritable
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.spark.util.Utils$.classForName(Utils.scala:157)
at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDDFromClassNames(PythonRDD.scala:509)
at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:494)
at org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Any insights are highly appreciated.
The error occurs because you haven't got the HBase libs in your classpath. You will need hbase-common and hbase-client jars, which you should pass to pyspark via the --jars parameters
I resolved this by execute the MapReduce Job by adding hbase-common.jar in environment variable: HADOOP_CLASSPATH:
export
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/lib/hbase-common-1.3.1.jar

"Could not get input splits" Error, with Hive-Cassandra-CqlStorageHandler

Im trying to read data from cassandra using Hive with CqlStorageHandler.
The versions:
Hive 0.11.0
Hadoop 1.2.1
Cassandra 1.2.6
Im able to create EXTERNAL table with the following HIVE Query
CREATE EXTERNAL TABLE input(number string,name string,address string) STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key, name, address", "cassandra.ks.name" ="cassandradb", "cassandra.host" = "localhost" ,"cassandra.port" = "9160") TBLPROPERTIES ("cassandra.input.split.size" = "64000","cassandra.range.size" = "1000","cassandra.slice.predicate.size" = "1000");
(The table "input" is already existing and containing some data in cassandra created with CQL3)
However, When I try to read data with the following query
select * from input where number="1";
Im facing the folowing issue:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.io.IOException: Could not get input splits
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:189)
at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:213)
at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:169)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:292)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:297)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:144)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1355)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1139)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "143514173170822869679056708180186660043"
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:185)
... 31 more
Caused by: java.lang.NumberFormatException: For input string: "143514173170822869679056708180186660043"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:444)
at java.lang.Long.valueOf(Long.java:540)
at org.apache.cassandra.dht.Murmur3Partitioner$1.fromString(Murmur3Partitioner.java:188)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:239)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:207)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Job Submission failed with exception 'java.io.IOException(Could not get input splits)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
Am I missing anything? Kindly advise.

Resources