Can we install multiple hive servers in the same cluster - hadoop

I would like to enable two application instances to share a single HDFS cluster, but each instance of the application requires its own Hive database.
Is there a way to configure multiple independent Hive Servers/Metastores within a cluster so that each application can use the data in the cluster?

each instance of the application requires its own Hive database
Then run CREATE DATABASE my_own_database; in Hive.
Before any queries in the other app, run USE my_own_database; or qualify the table name, e.g. SELECT * FROM my_own_database.table.
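For example (database and table names here are placeholders):
CREATE DATABASE app1_db;        -- one database per application
USE app1_db;                    -- subsequent queries resolve against app1_db
SELECT * FROM events;           -- same as SELECT * FROM app1_db.events
SELECT * FROM app2_db.events;   -- or fully qualify to reach another app's database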
Otherwise, sure, you would have to install and configure a separate Hive metastore Java process pointing at a different database (or even a separate server).
In hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:<protocol>://<host>:<port>/<databasename></value>
</property>
Then your applications would have to set hive.metastore.uris to point at that instance.
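For example, in each application's hive-site.xml (the host is a placeholder; 9083 is the conventional metastore Thrift port):
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<metastore-host>:9083</value>
</property>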

Related

Creating s3 external table in amazon EMR with remote metastore

We recently started using Amazon EMR for a new project (version emr-5.11.0). We made some architecture changes in the EMR cluster:
1) We moved the metastore to another Postgres instance instead of the default MySQL/Derby.
2) We run the metastore service on a separate instance (which is not part of the Amazon EMR cluster) and made the necessary changes in hive-site.xml.
In EMR:
stop hive-hcatalog-server
In the new instance:
hive --service metastore
Everything is working as expected except S3 external tables. When I try to create an external S3 table, it gives an error like the one below:
message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
We tried s3/s3n/s3a, with credentials as well, when creating the external table. If we run the metastore service inside the EMR master node and run the same query, it works without issues.
Do we need to do any configuration / add additional libraries on the metastore instance for this to work?
Note: the metastore instance has the latest Apache Hadoop and Hive binaries. We are going with the HDFS filesystem, and we are able to perform all operations other than external S3 tables. We tried everything from both Beeline and the Hive CLI.
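One thing to check, for what it's worth: this ClassNotFoundException usually means the hadoop-aws JAR (which contains org.apache.hadoop.fs.s3a.S3AFileSystem) is not on the metastore's classpath. A sketch of one possible fix, assuming a stock Apache Hadoop layout on the metastore instance (paths and globs are placeholders):
# make the S3A filesystem classes and the bundled AWS SDK visible to the metastore JVM
cp $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-*.jar $HIVE_HOME/lib/
cp $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk*.jar $HIVE_HOME/lib/
and, if it is not already set by core-default.xml, in core-site.xml on the metastore instance:
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>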

Apache Hive Installation on pseudo distributed or multi node cluster environment

I have installed Hadoop in a multi-node environment on my PC as below:
1: 4 VirtualBox instances loaded with Ubuntu (14.04)
2: 1 master node, 2 slave nodes, and the remaining VM instance works as a client
Note: all 4 VMs are running on my PC itself.
I was able to complete the Apache Hadoop 2.6 setup successfully on the above-mentioned environment. Now I want to install Hive in order to do some data summarization, querying, and analysis.
But I am not sure how to proceed further. I have a few queries, mentioned below:
Q1: Do I need to install/set up Apache Hive (0.14) on all nodes (master/name node and slave/data nodes), or only on the master node?
Q2: Which mode should be used for the metastore: local or remote?
Q3: If I want to use MySQL for the Hive metastore, should I install it on the master/name node itself, or do I need a separate client machine for this?
Could someone please also share any steps to be followed to configure the metastore in a multi-node/pseudo-distributed environment?
BR,
San
You need to install the required Hive services (HiveServer2, Metastore, WebHCat) only once. In your lab scenario, you would probably put them on the master node. The client can then run Beeline (the HiveServer2 client).
If you configure the metastore as local, Hive will use a local Derby database. Again, for your lab setup, this is probably just what you need/want.
In a production scenario, you would
set up a dedicated server for supporting services that should not fight for resources with the namenode process(es),
and use a dedicated database server for your metastore database, which will be remote (see the sketch below).
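A minimal sketch of that remote setup in hive-site.xml, assuming MySQL as the backing database (host, database name, and credentials are placeholders):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://<db-host>:3306/hive_metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value><password></value>
</property>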

Hive Metastore asking for Derby DB by default

Just a quick query: I have a Hadoop 1 cluster.
In Hive I am using Oracle/MySQL as the metastore DB, but when an MR job is triggered I can see that, every time Hive executes a query operation, it is seeking the Derby DB instead of the MySQL/Oracle DB.
For setting the default DB in hive-site.xml and hive-default.xml, I have passed
<property>
  <name>hive.stats.dbclass</name>
  <value>jdbc:oracle</value>
  <description>The default database that stores temporary hive statistics.</description>
</property>
along with other parameters like hive.stats.dbconnectionstring or javax.jdo.option.ConnectionURL.
Is there any other change required?
What has been applied so far: I removed the Derby JAR from the lib location of the cluster, so the attempt to access the Derby DB fails, and afterwards Hive checks the Oracle/MySQL connector to access the same.
Is there any workaround? Hive searching for the Derby DB every time is causing a bit of extra load for the processing job.
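If the Derby lookups are coming from the temporary statistics store rather than from the metastore itself, note that hive.stats.dbclass historically took a store class such as jdbc:derby or jdbc:mysql, not a bare vendor name. A sketch, assuming MySQL for the stats store (host and database name are placeholders):
<property>
  <name>hive.stats.dbclass</name>
  <value>jdbc:mysql</value>
</property>
<property>
  <name>hive.stats.dbconnectionstring</name>
  <value>jdbc:mysql://<db-host>:3306/hive_stats</value>
</property>
<property>
  <name>hive.stats.jdbcdriver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>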

Hive Server doesn't see old hdfs tables

I'm having a problem with the Hive server that I don't understand. I've just set up a Hadoop cluster and want to access it from a Hive service. My first try was running the Hive server on one of the cluster machines.
Everything worked nicely, but I wanted to move the Hive service to another machine outside the Hadoop cluster.
So I just started a new machine outside this Hadoop cluster, installed Hive (plus the Hadoop libraries), and copied the Hadoop config from the cluster. When I run the HiveServer, almost everything goes OK. I can connect with the Hive CLI from a different machine to my HiveServer, create new tables in the Hive warehouse within the HDFS filesystem of the Hadoop cluster, query them, and so on.
The thing I don't understand is that the HiveServer seems not to recognize the old tables which were created in my first try.
Some notes about my config: all tables are handled by Hive and stored in HDFS, and the Hive configuration is the default one. I suppose it has to do with my Hive metastore, but I couldn't say what.
Thank you!!

hadoop hive question

I'm trying to create tables programmatically using JDBC. However, I can't really see the tables I created from the hive shell. What's worse, when I access the hive shell from different directories, I see different contents of the database.
Is there any setting I need to configure?
Thanks in advance.
Make sure you run hive from the same directory every time, because when you launch the hive CLI for the first time, it creates a Derby metastore DB in the current directory. This Derby DB contains the metadata of your Hive tables; if you change directories, you will have disorganized metadata for your Hive tables. Also, the Derby DB cannot handle multiple sessions. To allow concurrent Hive access you would need a real database to manage the metastore rather than the wimpy little Derby DB that comes with it. You can download MySQL for this and change the Hive properties to use the MySQL Type 4 pure Java JDBC driver.
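As a stopgap, you can also pin the embedded Derby database to a single absolute path in hive-site.xml, so every working directory sees the same metadata (the path below is a placeholder):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/hive/metastore_db;create=true</value>
</property>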
Try emailing the Hive userlist or the IRC channel.
You probably need to set up the central Hive metastore (by default Derby, but it can be MySQL/Oracle/Postgres). The metastore is the "glue" between Hive and HDFS. It tells Hive where your data files live in HDFS, what type of data they contain, what tables they belong to, etc.
For more information, see http://wiki.apache.org/hadoop/HiveDerbyServerMode
Examine your Hadoop logs. For me this happened when my Hadoop system was not set up properly: the namenode was not able to contact the datanodes on the other machines, etc.
Yeah, it's due to the metastore not being set up properly. The metastore stores the metadata associated with your Hive tables (e.g. table name, table location, column names, column types, bucketing/sorting information, partitioning information, SerDe information, etc.).
The default metastore is an embedded Derby database which can only be used by one client at any given time. This is obviously not good enough for most practical purposes. You, like most users, should configure your Hive installation to use a different metastore; MySQL seems to be a popular choice. I have used this link from Cloudera's website to successfully configure my MySQL metastore.
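The database side of that setup is small; a sketch, assuming MySQL (database, user, and password are placeholders):
-- run in the MySQL shell on the database host
CREATE DATABASE metastore;
CREATE USER 'hive'@'%' IDENTIFIED BY '<password>';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%';
FLUSH PRIVILEGES;
Hive is then pointed at it through javax.jdo.option.ConnectionURL in hive-site.xml, as shown in the earlier answers.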
