Datastax Cassandra not accessible from BYOH HiveServer2 - hadoop

I followed the instructions from Datastax to set up a BYOH environment using the following article: Datastax BYOH
So I have Datastax Enterprise and Hortonworks Hadoop running on a node. I created a column family in Cassandra and inserted some sample data, and I was able to access and manipulate the data in Cassandra from Hive (which is running on Hortonworks Data Platform, not Datastax Enterprise).
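For concreteness, a minimal sketch of that kind of setup (the keyspace, table, and values here are made up, not the ones from my cluster):
# illustrative sample data; cqlsh reads the statements from a file
cat > /tmp/sample.cql <<'EOF'
CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE demo;
CREATE TABLE users (id int PRIMARY KEY, name text);
INSERT INTO users (id, name) VALUES (1, 'alice');
EOF
cqlsh -f /tmp/sample.cql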
Now, when I try to access the same Cassandra column family using the JDBC driver for HiveServer2, I am able to see the column family in the database, but when I try to manipulate it, view it with a SELECT query, or run a DESCRIBE query, I get the following error:
Error: Error while processing statement: FAILED: RuntimeException java.lang.ClassNotFoundException: org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat
The same error shows up when I try to run Hive without the BYOH prefix. In a nutshell, I am only able to manipulate Cassandra data from Hive when I use the byoh prefix while starting the Hive command-line interface; otherwise the above error shows up.
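To illustrate the behaviour (the table name is made up, assuming the keyspace was mapped to a Hive database of the same name; byoh is the wrapper script from the DSE setup):
byoh hive -e 'SELECT * FROM demo.users;'   # works: the wrapper adds DSE's Hive-Cassandra jars to the classpath
hive -e 'SELECT * FROM demo.users;'        # fails with the ClassNotFoundException above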
I am not sure what the issue is. Any help would be appreciated.
I am using:
Datastax Enterprise: 4.5.1
Cassandra: 2.0.8
Hive: 0.12

The first paragraph on this page of the docs, which perhaps you didn't see, seems to say that using the byoh prefix is required to manipulate Cassandra data from Hive: http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/byoh/byohUsing.html
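That said, if you specifically need JDBC access through HiveServer2, note that HiveServer2 is started outside the byoh wrapper and never sees the DSE Hive-Cassandra jars. One untested idea is to hand it the same jars via HIVE_AUX_JARS_PATH before starting it; the jar directory below is an assumption, so locate the real one in your DSE install:
export HIVE_AUX_JARS_PATH=$(ls /usr/share/dse/byoh/lib/*.jar | paste -sd, -)   # assumed jar location
hiveserver2 &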

Related

HBase components don't appear in Pentaho Kettle

I am trying to work with Pentaho in order to build some big data solutions. But the Hadoop HBase components aren't appearing in the dashboard. I don't understand why HBase doesn't appear, since HBase is up and running on my machine... I've been searching for a solution, but without success...
Please check the property 'hbase.client.scanner.timeout.period'; setting it to 10 minutes (override it in hbase-site.xml rather than editing hbase-default.xml) can get rid of HBase scanner-timeout exceptions.
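A sketch of the override; the property element goes inside the <configuration> block of hbase-site.xml, and 10 minutes is 600000 ms:
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>600000</value>
</property>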
Check that you have added the ZooKeeper host to the HBase output step's host field in the Pentaho Data Integration tool.
Have you read this wiki on loading HBase data into Pentaho?

Issue running Hive on Spark with a Hive view mapped to an HBase table: java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.lazy

I am trying to access an HBase table by mapping it from Hive and querying it through the Spark engine.
From Hive:
When I run the query on the Hive view mapped to HBase, I get all the desired results.
From Spark:
When I run a query against a plain Hive table, it works; but when I do the same for the HBase-mapped Hive table, I get the error below.
Error: java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initSerdeParams(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Properties;Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe$SerDeParameters; (state=,code=0)
I would like to know whether this is possible through Spark, as I could not find any solution on the internet.
From the above error I can tell that this is a jar issue; the method exists in the hive-serde and hive-exec jars, but I tried every alternative I could think of without success.
Can anyone help with this?
Note: My main objective here is to check performance, since the query already takes a good amount of time in both Hive and Pig.
I'm not sure which versions of Hive, HBase, and Spark you are using...
This seems like a version-mismatch issue.
Spark 1.3.1 uses the Hive 0.13 API, while Spark 1.5.1 uses Hive 1.2; the HBase SerDe that comes with Hive 2.3 is not compatible with the Hive 0.13 API.
I tried the old SerDe for Hive 0.13, but there are lots of conflicts with HBase API versions.
Depending on your needs, you can try the native Spark-HBase integration instead of the Hive-HBase SerDe.
Or you can use matching, compatible versions of Hive.
Also have a look at https://issues.apache.org/jira/browse/HIVE-12406, which may be the same issue...
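As a first diagnostic, it can help to confirm which jar is actually supplying LazySimpleSerDe at runtime. A rough sketch (the lib directory and the assembly jar name are assumptions; Spark 1.x keeps its jars under lib/):
# list every jar on the Spark classpath that bundles the class
for j in "$SPARK_HOME"/lib/*.jar; do
  jar tf "$j" 2>/dev/null | grep -q 'hive/serde2/lazy/LazySimpleSerDe.class' && echo "$j"
done
# then check whether that jar still exposes the 0.13-era initSerdeParams method
# (replace the jar name with whatever the loop printed)
javap -cp "$SPARK_HOME"/lib/spark-assembly.jar org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | grep initSerdeParams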

Connect Tableau to Elasticsearch via Spark SQL

I found a post that discusses connecting Tableau to Elasticsearch via Hive SQL. I was wondering if there is a way to connect to Elasticsearch via Spark SQL instead, as I am not very familiar with Hive.
Thanks.
#busybug91,
The right driver is here; please try this one. It could solve your issue.
#NicholasY I got it resolved after a couple of attempts. Two steps that I took:
I wasn't using the right driver for the connection. I was using the DataStax Enterprise driver; however, they have a driver for Spark SQL as well. I used the Windows 64-bit version of the driver. The MapR Hadoop Hive and Hortonworks Hadoop Hive drivers didn't work, since I have Apache Hive.
When I used the right driver (from DataStax), I realized that my Hive metastore and the Spark Thrift Server were running on the same port. I changed the Spark Thrift Server's port to 10001 and a successful connection was established.
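The change itself is a single flag when restarting the Thrift Server (a sketch; the Spark path is an assumption):
$SPARK_HOME/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001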
A new problem: I've created an external table in Hive and I am able to query the data. I start the Hive metastore as a service. However, as mentioned in this link, I am not able to see my Hive tables in Spark SQL. My connection from Tableau to Spark SQL is of no use unless I can see the tables from the Hive metastore! When I run show tables; in Spark SQL (via the spark-sql shell, with the Hive metastore running as a service at the same time), it runs a job that reports a completion time but no table names. Monitoring it via localhost:4040, I see that the input and output sizes are 0.0. I believe Spark SQL is not picking up the tables from Hive, which is why I don't see any tables after the connection is established from Tableau to Spark SQL.
EDIT
I changed the metastore from Derby to MySQL for both Hive and Spark SQL.
I'm trying to do the same thing, so maybe I can give you a heads-up on a few points.
First, compile Spark SQL with Hive and Thrift Server support (Hive 0.13):
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
You need a hive-site.xml properly configured to work with Hive, and you have to copy it to the spark/conf folder.
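For example (the source path is an assumption; use wherever your Hive config actually lives):
cp /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/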
Then, you have to add the elasticsearch-hadoop jar path to $CLASSPATH.
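For example, reusing the jar location from the Thrift Server command below:
export CLASSPATH=$CLASSPATH:/root/spark-sql/spark-1.2.0/lib/elasticsearch-hadoop-2.1.0.Beta4.jar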
Be careful! Spark SQL 1.2.0 does not work with elasticsearch-hadoop-2.0.x. You have to use elasticsearch-hadoop-2.1.0.Beta4 or a BUILD-SNAPSHOT, available here.
Finally, you have to run the Thrift Server with something like this:
./start-thriftserver.sh --master spark://master:7077 --driver-class-path $CLASSPATH --jars /root/spark-sql/spark-1.2.0/lib/elasticsearch-hadoop-2.1.0.Beta4.jar --hiveconf hive.server2.thrift.bind.host=0.0.0.0 --hiveconf hive.server2.thrift.port=10000
It works for me, but only on a small docType (5000 rows); the data colocation doesn't seem to be working. I'm looking for a way to distribute elasticsearch-hadoop.jar to each Spark worker, as ryrobes did for Hadoop.
If you find a way to get data-local access to Elasticsearch, let me know ;)
HTH,

Cassandra integration with Hadoop

I am a newbie to Cassandra. I am posting this question because different documentation sources give different details about integrating Hive with Cassandra, and I was not able to find the GitHub page.
I have installed a single node Cassandra 2.0.2 (Datastax Community Edition) in one of the data nodes of my 3 node HDP 2.0 cluster.
I am unable to use Hive to access Cassandra using 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'. I am getting the error 'return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
I have copied all the jars from $CASSANDRA_HOME/lib to $HIVE_HOME/lib and also included $CASSANDRA_HOME/lib/* in $HADOOP_CLASSPATH.
Are there any other configuration changes I have to make to integrate Cassandra with Hadoop/Hive?
Please let me know. Thanks for the help!
Thanks,
Arun
These are probably good starting points for you:
Hive support for Cassandra, github
A top-level article related to your topic, with general information: Hive support for Cassandra CQL3.
Hadoop support, Cassandra Wiki.
Actually, your question is quite broad; there could be many reasons for this. But keep in mind that Hive is built on the MapReduce engine.
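One quick check, as a sketch: the CqlStorageHandler class is not shipped in the stock Apache Cassandra or Hive jars; it comes from a separate hive-cassandra handler build such as the GitHub project above. So verify that some jar on Hive's classpath actually contains it:
# print any jar under Hive's lib that bundles the storage handler class
for j in $HIVE_HOME/lib/*.jar; do
  jar tf "$j" 2>/dev/null | grep -q 'hive/cassandra/cql3/CqlStorageHandler' && echo "$j"
done
# if nothing prints, copying Cassandra's own lib jars cannot fix the error;
# the handler jar itself is missing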
Hope this helps.

How to connect to hive via CLI on cloudera

We are running CDH 4.1.1. From HUE / Beeswax, Hive is running fine and /beeswax/tables shows all tables.
I want to use the hive CLI to list all tables:
overlord#overlord-datanode1:~$ hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/overlord/hive_job_log_overlord_201211280646_1426149164.txt
hive> SHOW TABLES;
OK
Time taken: 0.071 seconds
The list appears to be empty, which leads me to believe that I may be connecting to the wrong Hive metastore.
How can I access the same Hive data as HUE/Beeswax?
One possible reason is that the Hive CLI and Beeswax run as two different users (with different privileges), so when you switch users the metastore switches automatically (a new one is created if it does not already exist).
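A quick way to check which metastore the CLI is pointed at (the config path matches the log line in your session):
grep -A 1 'javax.jdo.option.ConnectionURL' /etc/hive/conf.dist/hive-site.xml
# an embedded Derby URL (jdbc:derby:...) means each user or working directory
# can end up with its own, initially empty, metastore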
If you are using Derby as your metastore, I would suggest migrating it to MySQL or PostgreSQL, as Derby is not suitable for production.
To migrate, follow these guides:
http://www.mazsoft.com/blog/post/2010/02/01/Setting-up-HadoopHive-to-use-MySQL-as-metastore.aspx
https://ccp.cloudera.com/display/CDHDOC/Hive+Installation
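As a rough sketch, the relevant hive-site.xml properties for a MySQL-backed metastore look like this (host, database name, and credentials are placeholders; the snippet goes inside the <configuration> element):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>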
