How to connect to Hive via the CLI on Cloudera (Hadoop)

We are running CDH 4.1.1. From HUE/Beeswax, Hive is running fine and /beeswax/tables shows all tables.
I want to use the Hive CLI to list all tables:
overlord@overlord-datanode1:~$ hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/overlord/hive_job_log_overlord_201211280646_1426149164.txt
hive> SHOW TABLES;
OK
Time taken: 0.071 seconds
The result is empty, which makes me suspect that I'm connecting to the wrong Hive metastore.
How can I access the same Hive data as from HUE/Beeswax?

One possible reason is that the Hive CLI and Beeswax run as two different users (with different privileges), so when you switch users, the metastore switches too (and a new one is created automatically if it does not already exist).
If you are using Derby as your metastore, I would suggest migrating it to MySQL or PostgreSQL, as Derby is not suitable for production.
To migrate, follow these guides:
http://www.mazsoft.com/blog/post/2010/02/01/Setting-up-HadoopHive-to-use-MySQL-as-metastore.aspx
https://ccp.cloudera.com/display/CDHDOC/Hive+Installation
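The guides above come down to a handful of hive-site.xml properties. The sketch below renders them with Python's standard library so the shape is visible. The host name, database name, user and password are hypothetical placeholders; the com.mysql.jdbc.Driver class assumes the MySQL Connector/J jar is on Hive's classpath, and 9083 is assumed to be the metastore service port. The idea is that the Hive CLI, Beeswax and any other client pointed at the same hive.metastore.uris see the same tables.

```python
import xml.etree.ElementTree as ET


def hive_site_xml(properties: dict[str, str]) -> str:
    """Render a Hadoop-style <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical host, database and credentials: replace with your own.
mysql_metastore = {
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://metastore-host:3306/metastore",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    "javax.jdo.option.ConnectionPassword": "hive-password",
    # Clients sharing this URI (Hive CLI, Beeswax, Spark) share one metastore.
    "hive.metastore.uris": "thrift://metastore-host:9083",
}

if __name__ == "__main__":
    print(hive_site_xml(mysql_metastore))
```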

Related

Connecting Apache Superset with Hive

I have my Hadoop cluster running in an AWS environment, where the schema is mapped in Hive, and I can see the complete data in Hive.
Here is the problem: I am trying to connect Hive to Superset, but I can't establish a connection.
This is how I have provided my URI:
jdbc+hive://MYIP:PORT
Also tried:
hive://username:password@MYIP:PORT
Make sure HiveServer2 is up and running.
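A quick way to check that HiveServer2 is reachable from the Superset host is a plain TCP connect. This is a minimal sketch: it only proves that something accepts connections on the port, not that it is HiveServer2 or that authentication will succeed. Port 10000 is HiveServer2's usual default and is an assumption about your cluster.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    # 10000 is HiveServer2's usual default Thrift port; adjust for your cluster.
    print(is_listening("localhost", 10000))
```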
You can also try this one:
hive://hostname:10000/default?auth=NOSASL
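Superset takes a SQLAlchemy URI, and for Hive that is the hive:// scheme shown above (assumed here to be served by the PyHive dialect). Credentials are separated from the host by "@", so special characters in a password must be percent-encoded. The helper below is a hypothetical sketch that assembles such a URI. Note that auth=NOSASL is generally only accepted if HiveServer2 itself runs with hive.server2.authentication set to NOSASL, which is worth verifying on your server.

```python
from urllib.parse import quote, urlencode


def hive_sqlalchemy_uri(host, port=10000, database="default",
                        username=None, password=None, **query):
    """Build a hive:// SQLAlchemy URI (hypothetical helper, PyHive-style)."""
    auth = ""
    if username:
        auth = quote(username, safe="")
        if password:
            # Percent-encode so characters like '@' or ':' don't break parsing.
            auth += ":" + quote(password, safe="")
        auth += "@"
    uri = f"hive://{auth}{host}:{port}/{database}"
    if query:
        uri += "?" + urlencode(query)
    return uri


if __name__ == "__main__":
    print(hive_sqlalchemy_uri("hostname", auth="NOSASL"))
```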

Connect Apache Zeppelin to Hive

I am trying to connect Apache Zeppelin to my Hive metastore. I use Zeppelin 0.7.3, so there is no Hive interpreter, only JDBC. I have copied my hive-site.xml to the Zeppelin conf folder, but I don't know how to create a new Hive interpreter.
I also tried to access Hive tables through Spark's HiveContext, but when I do that I cannot see my Hive databases; only the default database is shown.
Can someone explain either how to create a Hive interpreter or how to access my Hive metastore through Spark correctly?
Any answer is appreciated.
I solved it by following this documentation. After adding these parameters to the JDBC interpreter, you should be able to run the Hive interpreter with
%jdbc(hive)
In my case it was a little trickier because I used Cloudera Hadoop, so the standard Hive JDBC connector did not work. I replaced the external hive-jdbc.jar with the one matching my CDH version (for CDH 5.9.-, for example, it is located here).
I also found out that you can point hive.url at the Impala port and connect to Impala over JDBC if you prefer.
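The "parameters" in Zeppelin's JDBC interpreter follow a prefix convention: a prefix such as hive gets its own .driver, .url, .user and .password settings, and %jdbc(hive) then selects that prefix. The sketch below just builds those key/value pairs so the naming is explicit. The host names are hypothetical, and Impala's HiveServer2-compatible port 21050 with ;auth=noSasl is an assumption about an unsecured Impala setup.

```python
HIVE2_DRIVER = "org.apache.hive.jdbc.HiveDriver"


def zeppelin_jdbc_properties(prefix, driver, url, user="", password=""):
    """Return Zeppelin JDBC interpreter settings for one named prefix."""
    return {
        f"{prefix}.driver": driver,
        f"{prefix}.url": url,
        f"{prefix}.user": user,
        f"{prefix}.password": password,
    }


# Hypothetical hosts; 10000 is HiveServer2's usual default port.
hive = zeppelin_jdbc_properties("hive", HIVE2_DRIVER,
                                "jdbc:hive2://hs2-host:10000/default")
# Same prefix pointed at Impala's HiveServer2-compatible port (assumed 21050).
impala = zeppelin_jdbc_properties("hive", HIVE2_DRIVER,
                                  "jdbc:hive2://impalad-host:21050/;auth=noSasl")

if __name__ == "__main__":
    for key, value in hive.items():
        print(f"{key} = {value}")
```

Enter the printed pairs in the JDBC interpreter settings in the Zeppelin UI; only the URL changes when switching between HiveServer2 and Impala.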

Does Hive depend on/require Hadoop?

The Hive installation guide says that Hive can be used with an RDBMS. Does that mean Hive can exist without Hadoop? Is it an independent HQL engine that could work with any data source?
You can run Hive in local mode to use it without a Hadoop cluster for debugging purposes. See the URL below:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-Hive,Map-ReduceandLocal-Mode
Hive provides a JDBC driver so you can query Hive like any other JDBC source; however, if you plan to run Hive queries on a production system, you need Hadoop infrastructure. Hive queries are eventually converted into MapReduce jobs, and HDFS is used as the data storage for Hive tables.
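The Getting Started page linked above describes an automatic local mode: when hive.exec.mode.local.auto is enabled (it is off by default), a job runs in a single local JVM instead of on the cluster if its input is smaller than hive.exec.mode.local.auto.inputbytes.max (128 MB by default), it needs fewer than hive.exec.mode.local.auto.tasks.max map tasks (4 by default), and it needs 0 or 1 reducers. The function below is a hypothetical model of that rule, not Hive's own code, and property names may differ between Hive versions. Local mode still uses Hadoop's MapReduce libraries; it just avoids needing a cluster.

```python
MB = 1024 * 1024


def runs_locally(total_input_bytes, num_map_tasks, num_reduce_tasks,
                 auto_local=False, max_input_bytes=128 * MB, max_tasks=4):
    """Model of Hive's automatic local-mode decision (illustrative only)."""
    if not auto_local:  # mirrors hive.exec.mode.local.auto, off by default
        return False
    return (total_input_bytes < max_input_bytes   # ...inputbytes.max
            and num_map_tasks < max_tasks         # ...tasks.max
            and num_reduce_tasks in (0, 1))


if __name__ == "__main__":
    print(runs_locally(10 * MB, num_map_tasks=2, num_reduce_tasks=1, auto_local=True))
```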

Connect tableau with Elastic search via Spark SQL

I found a post that discusses connecting Tableau to Elasticsearch via Hive SQL. I was wondering if there is a way to connect to Elasticsearch via Spark SQL instead, as I am not very familiar with Hive.
Thanks.
@busybug91,
The right driver is here; please try this one. It could solve your issue.
@NicholasY I got it resolved after a couple of trials. Two steps that I took:
I wasn't using the right driver for the connection: I was using the DataStax Enterprise driver, but DataStax has a driver for Spark SQL as well. I used the Windows 64-bit version of that driver. The MapR Hadoop Hive and Hortonworks Hadoop Hive drivers didn't work, as I have Apache Hive.
Once I used the right driver (from DataStax), I realized that my Hive metastore and the Spark Thrift server were running on the same port. I changed the Spark Thrift server's port to 10001, and a connection was established successfully.
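A port clash like the one above can be spotted before starting the Thrift server by trying to bind the port yourself. This is a minimal sketch: a successful bind only means the port was free at that instant, so another process could still take it before the server starts. The candidate ports 10000 and 10001 are taken from the answer above.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if nothing on this machine currently holds host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False  # typically EADDRINUSE: another service owns the port
        return True


def first_free_port(candidates, host="0.0.0.0"):
    """Return the first candidate port that can be bound, or None."""
    for port in candidates:
        if port_is_free(port, host):
            return port
    return None


if __name__ == "__main__":
    print(first_free_port([10000, 10001]))
```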
A new problem: I've created an external table in Hive, and I am able to query the data. I start hive-metastore as a service. However, as mentioned in this link, I am not able to see my Hive tables in Spark SQL. My Tableau connection to Spark SQL is of no use unless I can see the tables from the Hive metastore! When I run show tables; in Spark SQL (via the spark-sql shell, with the Hive metastore running as a service at the same time), it runs a job that reports a completion time but no table names. Monitoring it via localhost:4040, I see that the input and output sizes are 0.0. I believe Spark SQL is not getting the tables from Hive, which is why I don't see any tables after Tableau connects to Spark SQL.
EDIT
I changed the metastore from Derby to MySQL for both Hive and Spark SQL.
I'm trying to do the same thing, so maybe I can warn you about a few things.
First, compile Spark SQL with Hive and Thrift server support (Hive 0.13):
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
You need a hive-site.xml properly configured to work with Hive, copied into the spark/conf folder.
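When Spark (or Zeppelin, or the Hive CLI) only shows the default database, a common cause is that the hive-site.xml it actually reads does not point at the shared metastore, so it silently falls back to a private embedded Derby database. The sketch below reads a hive-site.xml string and reports which kind of metastore a client would use. The classification rules are a simplification, and the function names are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def read_hive_site(xml_text):
    """Return {name: value} for every <property> in a hive-site.xml string."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value", default="")
            for p in root.iter("property")}


def describe_metastore(props):
    """Classify which metastore a client using these properties would talk to."""
    uris = props.get("hive.metastore.uris", "").strip()
    if uris:
        return f"remote metastore service at {uris}"
    url = props.get("javax.jdo.option.ConnectionURL", "")
    if not url or url.startswith("jdbc:derby:"):
        # No shared metastore configured: Hive falls back to embedded Derby.
        return "embedded Derby (private metastore_db; likely only 'default' visible)"
    return f"direct JDBC connection to {url}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of the hive-site.xml copy under spark/conf.
    with open(sys.argv[1], encoding="utf-8") as f:
        print(describe_metastore(read_hive_site(f.read())))
```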
Then, add the elasticsearch-hadoop jar path to $CLASSPATH.
Be careful! Spark SQL 1.2.0 does not work with elasticsearch-hadoop-2.0.x. You have to use elasticsearch-hadoop-2.1.0.Beta4 or the BUILD-SNAPSHOT available here.
Finally, run the Thrift server with something like this:
./start-thriftserver.sh --master spark://master:7077 --driver-class-path $CLASSPATH --jars /root/spark-sql/spark-1.2.0/lib/elasticsearch-hadoop-2.1.0.Beta4.jar --hiveconf hive.server2.thrift.bind.host=0.0.0.0 --hiveconf hive.server2.thrift.port=10000
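If you script this launch, building the arguments as a list avoids quoting mistakes, and it makes the --hiveconf key=value form explicit. The helper below is a hypothetical sketch that only assembles the command (it does not run anything). The jar path and master URL are copied from the command above; the classpath is joined with os.pathsep, which is ":" on Linux.

```python
import os
import shlex


def thriftserver_command(master, jars, host="0.0.0.0", port=10000, classpath=""):
    """Assemble a start-thriftserver.sh invocation as an argv list."""
    cmd = ["./start-thriftserver.sh", "--master", master]
    if classpath:
        cmd += ["--driver-class-path", classpath]
    if jars:
        cmd += ["--jars", ",".join(jars)]  # --jars takes a comma-separated list
    # Each Hive setting is passed as its own --hiveconf key=value pair.
    cmd += ["--hiveconf", f"hive.server2.thrift.bind.host={host}",
            "--hiveconf", f"hive.server2.thrift.port={port}"]
    return cmd


es_jar = "/root/spark-sql/spark-1.2.0/lib/elasticsearch-hadoop-2.1.0.Beta4.jar"
cmd = thriftserver_command("spark://master:7077", [es_jar],
                           classpath=os.pathsep.join([es_jar]))

if __name__ == "__main__":
    print(shlex.join(cmd))
```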
It works for me, but only on a small docType (5,000 rows); data co-location does not seem to work. I am looking for a way to put elasticsearch-hadoop.jar on each Spark worker, as ryrobes did for Hadoop.
If you find a way to get co-located access to Elasticsearch, let me know ;)
HTH,

hadoop hive question

I'm trying to create tables programmatically using JDBC. However, I can't see the table I created from the Hive shell. What's worse, when I start the Hive shell from different directories, I see different databases.
Is there a setting I need to configure?
Thanks in advance.
Make sure you run Hive from the same directory every time, because when you launch the Hive CLI for the first time, it creates a Derby metastore database in the current directory. This Derby database contains the metadata of your Hive tables. If you change directories, your table metadata ends up scattered across several metastores. Also, embedded Derby cannot handle multiple sessions. To allow concurrent Hive access, you need a real database to manage the metastore rather than the wimpy little Derby DB that ships with Hive. You can install MySQL for this and change the Hive JDBC connection properties to use the MySQL Type 4 (pure Java) driver.
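The directory dependence comes from the default connection URL, jdbc:derby:;databaseName=metastore_db;create=true, which names the database with a relative path. Derby resolves relative names against derby.system.home, which by default is the directory the JVM was started in. The sketch below is a hypothetical model of that resolution (not Derby's code): it shows that two working directories yield two separate metastores, while an absolute databaseName does not depend on the working directory.

```python
import os

DEFAULT_DERBY_URL = "jdbc:derby:;databaseName=metastore_db;create=true"


def derby_database_name(connection_url):
    """Extract the database name from an embedded Derby JDBC URL."""
    head, *attrs = connection_url.split(";")
    for attr in attrs:
        key, _, value = attr.partition("=")
        if key == "databaseName" and value:
            return value
    # Form like jdbc:derby:metastore_db;create=true
    return head[len("jdbc:derby:"):]


def derby_metastore_path(connection_url, system_home):
    """Where Derby would keep the metastore, given its home/working directory."""
    # os.path.join ignores system_home when the name is already absolute.
    return os.path.normpath(os.path.join(system_home,
                                         derby_database_name(connection_url)))


if __name__ == "__main__":
    for cwd in ("/home/alice", "/tmp"):
        print(cwd, "->", derby_metastore_path(DEFAULT_DERBY_URL, cwd))
```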
Try emailing the Hive user mailing list or asking on the IRC channel.
You probably need to set up the central Hive metastore (Derby by default, but it can be MySQL/Oracle/PostgreSQL). The metastore is the "glue" between Hive and HDFS: it tells Hive where your data files live in HDFS, what type of data they contain, which tables they belong to, etc.
For more information, see http://wiki.apache.org/hadoop/HiveDerbyServerMode
Examine your Hadoop logs. For me this happened when my Hadoop system was not set up properly: the NameNode was not able to contact the DataNodes on other machines, etc.
Yes, it's due to the metastore not being set up properly. The metastore stores the metadata associated with your Hive tables (e.g. the table name, table location, column names, column types, bucketing/sorting information, partitioning information, SerDe information, etc.).
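To make that list concrete, here is a simplified, hypothetical model of the per-table record the metastore keeps. It is only an illustration: the real metastore spreads this information across several relational tables (such as TBLS and SDS) in the backing database. The table name, HDFS path and columns in the example are made up; LazySimpleSerDe is Hive's default text SerDe.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str


@dataclass
class TableMetadata:
    """Illustrative (hypothetical) record of what the metastore tracks per table."""
    name: str
    location: str                 # directory holding the data files, e.g. on HDFS
    columns: list[Column]
    serde: str                    # class that (de)serializes rows
    partition_keys: list[Column] = field(default_factory=list)
    bucket_columns: list[str] = field(default_factory=list)
    num_buckets: int = 0
    sort_columns: list[str] = field(default_factory=list)


# Made-up example table partitioned by a string date column.
logs = TableMetadata(
    name="logs",
    location="hdfs://namenode:8020/user/hive/warehouse/logs",
    columns=[Column("ts", "bigint"), Column("msg", "string")],
    serde="org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    partition_keys=[Column("dt", "string")],
)

if __name__ == "__main__":
    print(logs)
```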
The default metastore is an embedded Derby database which can only be used by one client at any given time. This is obviously not good enough for most practical purposes. You, like most users, should configure your Hive installation to use a different metastore. MySQL seems to be a popular choice. I have used this link from Cloudera's website to successfully configure my MySQL metastore.
