Hive Metastore asking for Derby DB by default - hadoop

Just a quick question. I have a Hadoop 1 cluster.
In Hive I use Oracle/MySQL as the metastore DB, but every time Hive executes a query operation and the MR job is triggered, I can see it seeking the Derby DB instead of the MySQL/Oracle DB.
To set the default DB, in hive-site.xml and hive-default.xml I have passed the following:
<property>
  <name>hive.stats.dbclass</name>
  <value>jdbc:oracle</value>
  <description>The default database that stores temporary hive statistics.</description>
</property>
along with other parameters like hive.stats.dbconnectionstring and javax.jdo.option.ConnectionURL.
Are there any other changes required?
What I have applied so far: I removed the Derby jar from the lib location of the cluster, so the attempt to access the Derby DB fails, and afterwards Hive falls back to the Oracle/MySQL connector to access the same.
Is there any workaround? Every time, Hive first searches for the Derby DB, which adds a bit of load to the processing job.
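For context, the stats publisher has its own connection-string and driver properties, separate from the metastore's javax.jdo settings, so all three usually need to be set together. A sketch in hive-site.xml, assuming a MySQL-backed stats DB as in older Hive releases (the hostname and database name are placeholders, not values from the question):

```xml
<!-- Stats DB class: tells Hive which backend the stats publisher uses -->
<property>
  <name>hive.stats.dbclass</name>
  <value>jdbc:mysql</value>
</property>
<!-- Connection string for the stats DB (placeholder host/database) -->
<property>
  <name>hive.stats.dbconnectionstring</name>
  <value>jdbc:mysql://statshost:3306/hive_stats</value>
</property>
<!-- JDBC driver class for the stats DB -->
<property>
  <name>hive.stats.jdbcdriver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
```

If hive.stats.dbclass still points at jdbc:derby (its old default) while only the connection string is changed, Hive will keep reaching for Derby for statistics even though the metastore itself is on MySQL/Oracle.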

Related

Can we install multiple hive servers in the same cluster

I would like to enable two application instances to share a single HDFS cluster, but each instance of the application requires its own Hive database.
Is there a way to configure multiple independent Hive Servers/Metastores within a cluster so that each application can use the data in the cluster?
each instance of the application requires its own Hive database
Then do CREATE DATABASE my_own_database; in Hive.
Before any queries in the other app, run USE my_own_database; or SELECT * FROM my_own_database.table.
Otherwise, sure, you would have to install and configure a separate Hive metastore Java process pointing at a different database (or even a separate server)
in hive-site.xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:<protocol>://<host>:<port>/<databasename></value>
</property>
Then your applications would have to set hive.metastore.uris to point at that instance.
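For instance, a second metastore service listening on its own Thrift port could be referenced from a client's hive-site.xml like this (the host and port below are placeholders for illustration):

```xml
<!-- Point this application's Hive clients at the second metastore service -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore2.example.com:9084</value>
</property>
```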

Connect Apache Zeppelin to Hive

I'm trying to connect my Apache Zeppelin to my Hive metastore. I use Zeppelin 0.7.3, so there is no Hive interpreter, only JDBC. I have copied my hive-site.xml to the Zeppelin conf folder, but I don't know how to create a new Hive interpreter.
I also tried to access Hive tables through Spark's HiveContext, but that way I cannot see my Hive databases; only a default database is shown.
Can someone explain either how to create a Hive interpreter or how to access my Hive metastore through Spark correctly?
Any answer is appreciated.
I solved it by following this documentation. After adding these parameters in the JDBC connector, you should be able to run the Hive interpreter with
%jdbc(hive)
In my case it was a little trickier because I used Cloudera Hadoop, so the standard JDBC Hive connector was not working. So I replaced the external hive-jdbc.jar with the one suitable for my CDH version (for CDH 5.9, for example, it is located here).
I also found out that you can change hive.url to the Impala port and connect to Impala over JDBC if you prefer.
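For reference, the prefixed properties on Zeppelin's JDBC interpreter that make %jdbc(hive) work typically look like the listing below; the host, port, and credentials are assumptions for illustration, not values from the question:

```
hive.driver    org.apache.hive.jdbc.HiveDriver
hive.url       jdbc:hive2://hiveserver-host:10000
hive.user      hive_user
hive.password  hive_password
```

These are set in the Interpreter settings page for the jdbc interpreter; the `hive.` prefix is what lets you select this connection with %jdbc(hive) in a paragraph.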

where to set config values in cloudera hive setup?

I am new to Cloudera QuickStart. As per the requirement, we need to partition the data of large Hive tables. There is a cap of 100 dynamic partitions in Hive, so we need to increase the number of dynamic partitions in the configuration. I don't want to set it on the CLI every time.
Where can I find the configuration file to update the following settings?
hive.exec.max.dynamic.partitions.pernode
hive.exec.max.dynamic.partitions
hive.exec.dynamic.partition.mode=nonstrict
Will Sqoop create any problem while importing data from SQL Server to Hive with dynamic partitions?
Hive is configured through hive-site.xml.
On your local server, execute this command to try to find it:
locate hive-site.xml
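Once hive-site.xml is located, the three settings from the question can go there so they persist across sessions. A sketch, with example limits (the numeric values are assumptions; raise them to whatever your tables actually need):

```xml
<!-- Allow dynamic partitioning without a static partition column -->
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<!-- Total dynamic partitions allowed per statement (example value) -->
<property>
  <name>hive.exec.max.dynamic.partitions</name>
  <value>10000</value>
</property>
<!-- Dynamic partitions allowed per mapper/reducer node (example value) -->
<property>
  <name>hive.exec.max.dynamic.partitions.pernode</name>
  <value>1000</value>
</property>
```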

Hive Server doesn't see old hdfs tables

I'm having a problem with the Hive server that I don't understand. I've just set up a Hadoop cluster and want to access it from a Hive service. The first thing I tried was running the Hive server on one of the cluster machines.
Everything worked nicely, but I wanted to move the Hive service to another machine outside the Hadoop cluster.
So I started a new machine outside this Hadoop cluster, installed Hive (plus the Hadoop libraries), and copied the Hadoop config from the cluster. When I run the HiveServer, almost everything goes OK. I can connect with the Hive CLI from a different machine to my HiveServer, create new tables in the Hive warehouse within the HDFS filesystem in the Hadoop cluster, query them, and so on.
The thing I don't understand is that the HiveServer does not seem to recognize the old tables which were created in my first try.
Some notes about my config: all tables are managed by Hive and stored in HDFS, and the Hive configuration is the default one. I suppose it has to do with my Hive metastore, but I couldn't say what.
Thank you!!

hadoop hive question

I'm trying to create tables programmatically using JDBC. However, I can't see the table I created from the Hive shell. What's worse, when I access the Hive shell from different directories, I see different contents of the database.
Is there any setting I need to configure?
Thanks in advance.
Make sure you run Hive from the same directory every time, because when you launch the Hive CLI for the first time, it creates a Derby metastore DB in the current directory. This Derby DB contains the metadata of your Hive tables; if you change directories, you will end up with disjoint metadata for your Hive tables. Also, the Derby DB cannot handle multiple sessions. To allow concurrent Hive access you would need a real database to manage the metastore rather than the wimpy little Derby DB that comes with it. You can download MySQL for this and change the Hive properties to a JDBC connection using the MySQL type 4 pure Java driver.
Try emailing the Hive userlist or the IRC channel.
You probably need to set up the central Hive metastore (by default Derby, but it can be MySQL/Oracle/Postgres). The metastore is the "glue" between Hive and HDFS. It tells Hive where your data files live in HDFS, what type of data they contain, what tables they belong to, etc.
For more information, see http://wiki.apache.org/hadoop/HiveDerbyServerMode
Examine your Hadoop logs. For me this happened when my Hadoop system was not set up properly: the NameNode was not able to contact the DataNodes on other machines, etc.
Yeah, it's due to the metastore not being set up properly. The metastore stores the metadata associated with your Hive tables (e.g. the table name, table location, column names, column types, bucketing/sorting information, partitioning information, SerDe information, etc.).
The default metastore is an embedded Derby database which can only be used by one client at any given time. This is obviously not good enough for most practical purposes. You, like most users, should configure your Hive installation to use a different metastore. MySQL seems to be a popular choice. I have used this link from Cloudera's website to successfully configure my MySQL metastore.
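As a sketch, the usual hive-site.xml properties for a MySQL-backed metastore look like this; the host, database name, and credentials below are placeholders for illustration:

```xml
<!-- JDBC URL of the metastore database (placeholder host/schema) -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://dbhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<!-- MySQL Connector/J driver class -->
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<!-- Credentials for the metastore database (placeholders) -->
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>
```

With this in place, every client (CLI, JDBC, HiveServer) that reads the same hive-site.xml sees the same set of tables, regardless of which directory it is launched from.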
