Unable to create SQL context with Hive support in Spark - Hadoop

In Spark 1.6: after creating a soft link under /etc/spark/conf, when I run spark-shell it does not create a SQL context with Hive support, and this error is displayed: "native snappy library not available: This version of hadoop was built without snappy support".
Please advise what can be done here. I am trying to create a SQL context with Hive support but am unable to do so.

You can use sqlContext to access Hive tables.
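In Spark 1.6, the sqlContext that spark-shell creates is a HiveContext when the Spark build includes Hive support. If it is not created automatically, you can construct one yourself; a minimal sketch for the spark-shell, assuming hive-site.xml is in spark/conf:

import org.apache.spark.sql.hive.HiveContext

// sc is the SparkContext that spark-shell provides automatically.
val hiveContext = new HiveContext(sc)

// If this lists your Hive tables, the context is wired to the metastore.
hiveContext.sql("SHOW TABLES").show()

The snappy message itself comes from Hadoop's native-library loading and is often a separate concern; it usually means the native snappy library needs to be installed, or a Hadoop build with snappy support used.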

Related

Connect Apache Zeppelin to Hive

I am trying to connect my Apache Zeppelin to my Hive metastore. I use Zeppelin 0.7.3, so there is no Hive interpreter, only JDBC. I have copied my hive-site.xml to Zeppelin's conf folder, but I don't know how to create a new Hive interpreter.
I also tried to access Hive tables through Spark's HiveContext, but when I try this way I cannot see my Hive databases; only a default database is shown.
Can someone explain either how to create a Hive interpreter or how to access my Hive metastore through Spark correctly?
Any answer is appreciated.
I solved it by following this documentation. After adding these parameters to the JDBC interpreter, you should be able to run the Hive interpreter with
%jdbc(hive)
In my case it was a little trickier because I use Cloudera Hadoop, so the standard JDBC Hive connector was not working. I replaced the external hive-jdbc.jar with the one suitable for my CDH version (for CDH 5.9.-, for example, it is located here).
I also found out that you can change hive.url to the one for the Impala port and connect to Impala over JDBC instead, if you prefer.
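For reference, a minimal sketch of the JDBC interpreter properties involved (host, port, and credentials here are placeholders, not values from the question):

hive.driver    org.apache.hive.jdbc.HiveDriver
hive.url       jdbc:hive2://localhost:10000
hive.user      hiveUser
hive.password  hivePassword

With these set (and the Hive JDBC jar added as an interpreter dependency), a paragraph starting with %jdbc(hive) is routed to HiveServer2.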

Issue running Hive on Spark with a Hive view mapped to an HBase table: java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.lazy

I am trying to access an HBase table by mapping it from Hive through the Spark engine.
From Hive:
When I run the query on the Hive view mapped to HBase, I get all the desired results.
From Spark:
When I run a query to fetch from a plain Hive table it works, but when I do the same for the HBase-mapped Hive table I get the error below.
Error: java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initSerdeParams(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Properties;Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe$SerDeParameters; (state=,code=0)
I would like to know whether it is possible to do this through Spark, as I did not find any solutions on the internet.
From the above error I can tell that this is some jar issue; this method exists in the hive-serde and hive-exec jars, but I tried all the alternatives I could think of and did not succeed.
Can anyone help with this?
Note: my main objective here is to check performance, as the job takes a good amount of time in Hive, and in Pig as well.
I'm not sure which versions of Hive, HBase and Spark you are using, but this looks like a version-mismatch issue.
Spark 1.3.1 uses the Hive 0.13 API, while Spark 1.5.1 uses Hive 1.2; the HBase SerDe that comes with Hive 2.3 is not compatible with the Hive 0.13 API. I tried the old SerDe for Hive 0.13, but there are lots of conflicts with HBase API versions.
Depending on your needs, you can try native Spark+HBase integration instead of the Hive-HBase SerDe (see the sketch below), or you can use correctly matched, compatible versions of Hive.
Also have a look at https://issues.apache.org/jira/browse/HIVE-12406, which may be the same issue.
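If you want to try the native-integration route, here is a minimal sketch for the spark-shell that reads an HBase table directly into an RDD via TableInputFormat, bypassing the Hive-HBase SerDe entirely (the table name my_table is a placeholder, and the HBase client jars must be on the classpath):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// Point the input format at the HBase table to read.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

// Read the table as an RDD of (row key, row result) pairs.
val hbaseRdd = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

println(s"rows read: ${hbaseRdd.count()}")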

Does Hive depend on/require Hadoop?

The Hive installation guide says that Hive can be applied to an RDBMS. My question is: it sounds like Hive can exist without Hadoop, right? Is it an independent HQL engine that could work with any data source?
You can run Hive in local mode to use it without Hadoop for debugging purposes. See the URL below:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-Hive,Map-ReduceandLocal-Mode
Hive provides a JDBC driver so you can query Hive over JDBC. However, if you are planning to run Hive queries on a production system, you need Hadoop infrastructure to be available: Hive queries are eventually converted into map-reduce jobs, and HDFS is used as the data storage for Hive tables.
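For the local-mode route, a minimal sketch of the switches that wiki page describes (property names vary somewhat across Hive/Hadoop versions, and my_table is a placeholder); you can either force local execution or let Hive pick it automatically for small jobs:

hive> SET mapreduce.framework.name=local;
hive> SET hive.exec.mode.local.auto=true;
hive> SELECT count(*) FROM my_table;  -- runs as a local job, no cluster required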

Connect Tableau with Elasticsearch via Spark SQL

I found a post that discusses connecting Tableau to Elasticsearch via Hive SQL. I was wondering if there is a way to connect to Elasticsearch via Spark SQL instead, as I am not very familiar with Hive.
Thanks.
@busybug91, the right driver is here; please try this one. It could solve your issue.
@NicholasY I got it resolved after a couple of trials. Two steps that I took:
1. I wasn't using the right driver for the connection. I was using the DataStax Enterprise driver, but they also have a driver for Spark SQL; I used the Windows 64-bit version. The MapR Hadoop Hive and Hortonworks Hadoop Hive drivers didn't work, since I have Apache Hive.
2. Once I used the right driver (from DataStax), I realized that my Hive metastore and Spark Thrift Server were running on the same port. I changed the Spark Thrift Server's port to 10001 and a successful connection was established.
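For reference, the port change in step 2 is a one-flag affair with the stock Spark start script; a sketch, assuming Spark's sbin directory:

./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001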
A new problem: I've created an external table in Hive and I am able to query the data. I start the Hive metastore as a service. However, as mentioned in this link, I am not able to see my Hive tables in Spark SQL. My connection of Tableau with Spark SQL is of no use unless I can see the tables from the Hive metastore. When I run show tables; in Spark SQL (via the spark-sql shell, with the Hive metastore running as a service at the same time), it runs a job that reports a completion time but no table names. Monitoring it via localhost:4040, I see that the input and output sizes are 0.0. I believe Spark SQL is not picking up the tables from Hive, which is why I don't see any tables after the connection is established from Tableau to Spark SQL.
EDIT
I changed the metastore from Derby to MySQL for both Hive and Spark SQL.
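If Spark SQL cannot see Hive's tables, one common cause is that it is not reading the same metastore configuration as Hive. A minimal hive-site.xml entry pointing Spark at a shared metastore service, assuming the metastore runs on its default port 9083 on the same host:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>

This file has to be present in spark/conf as well, so that the spark-sql shell and the Spark Thrift Server use the MySQL-backed metastore instead of spinning up their own local Derby one.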
I'm trying to do the same thing, so maybe I can warn you about a few things.
First, compile a Spark SQL version with Hive and the Thrift Server (Hive 0.13):
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
You need a hive-site.xml properly configured to work with Hive, and you must copy it to the spark/conf folder.
Then you have to set $CLASSPATH to include the elasticsearch-hadoop jar path.
Be careful! Spark SQL 1.2.0 does not work with elasticsearch-hadoop-2.0.x. You have to use elasticsearch-hadoop-2.1.0.Beta4 or a BUILD-SNAPSHOT, available here.
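For example (the jar path here matches the one passed to --jars in the command below; adjust it to your layout):

export CLASSPATH=$CLASSPATH:/root/spark-sql/spark-1.2.0/lib/elasticsearch-hadoop-2.1.0.Beta4.jar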
Finally, run the Thrift Server with something like this:
./start-thriftserver.sh --master spark://master:7077 --driver-class-path $CLASSPATH --jars /root/spark-sql/spark-1.2.0/lib/elasticsearch-hadoop-2.1.0.Beta4.jar --hiveconf hive.server2.thrift.bind.host 0.0.0.0 --hiveconf hive.server2.thrift.port 10000
It works for me, but only on a small docType (5000 rows); data colocation does not seem to work. I am looking for a way to put elasticsearch-hadoop.jar on each Spark worker, as ryrobes did for Hadoop.
If you find a way to get data-local access to Elasticsearch, let me know ;)
HTH,

Datastax Cassandra not accessible from BYOH HiveServer2

I followed the instructions from DataStax to set up a BYOH environment using the following article: Datastax BYOH
So I have DataStax Enterprise and Hortonworks Hadoop running on a node. I created a column family in Cassandra and inserted some sample data, and I was able to access and manipulate the data in Cassandra from Hive (which is running on the Hortonworks Data Platform, not DataStax Enterprise).
Now, when I try to access the same Cassandra column family using the JDBC driver for HiveServer2, I am able to see the column family in the database, but when I try to manipulate it, view it with a SELECT query, or run a DESCRIBE query, I get the following error:
Error: Error while processing statement: FAILED: RuntimeException java.lang.ClassNotFoundException: org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat
The same error shows up when I try to run Hive without the BYOH prefix. In a nutshell, I am only able to manipulate Cassandra data from Hive when I use the byoh prefix while starting the Hive command-line interface; otherwise the above error appears.
I am not sure what the issue is. Any help would be appreciated.
I am using:
DataStax Enterprise: 4.5.1
Cassandra: 2.0.8
Hive: 0.12
The first paragraph on this page of the docs, which perhaps you didn't see, seems to say that using the byoh prefix is required to manipulate Cassandra data from Hive: http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/byoh/byohUsing.html
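In other words, keeping the prefix appears to be the supported way to reach Cassandra-backed tables; a sketch, with placeholder keyspace and table names:

byoh hive -e "SELECT * FROM my_keyspace.my_table LIMIT 10;"

The plain hive command only knows the classes on HDP's own classpath, which would explain the ClassNotFoundException for HiveCqlInputFormat when the byoh wrapper is skipped.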
