Hive metastore configuration with Derby - hadoop

On a Red Hat test server I installed Hadoop 2.7 and ran Hive, Pig and Spark without issues. But when I tried to access the Hive metastore from Spark I got errors, so I thought of adding a hive-site.xml. (After extracting the 'apache-hive-1.2.1-bin.tar.gz' file I just added $HIVE_HOME to .bashrc as per the tutorial, and everything was working other than this integration with Spark.) On the Apache site I found that I need to put the metastore configuration in hive-site.xml.
I created the file as below:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
</configuration>
I used localhost since it is a single-node machine. After that I am not able to connect even to Hive. It is throwing this error:
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
....
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby://localhost:1527/metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: No suitable driver found for jdbc:derby://localhost:1527/metastore_db;create=true
There are many more error logs pointing to the same thing. If I remove hive-site.xml from the conf folder, Hive works without issues. Can anyone point me to the right path for the default metastore configuration?
Thanks
Anoop R

Derby is used as an embedded database. Try using
jdbc:derby:metastore_db;create=true
as the JDBC URL. See also
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-EmbeddedMetastore
To make the metastore fully functional (and thereby accessible from different services), try setting it up with MySQL as described in the document above.
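If you do want to keep the network URL (jdbc:derby://localhost:1527/...), the Derby network server must be running before Hive starts; with the embedded URL above nothing needs to be started. A minimal sketch, assuming Derby is installed under $DERBY_HOME (the path is an assumption):
# start the Derby network server on its default port 1527
$DERBY_HOME/bin/startNetworkServer -h localhost &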

As you are setting up an embedded metastore database, use the property below as the JDBC URL:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

I was also facing a similar kind of exception while installing Hive. The thing which worked for me was to initialize the Derby DB. To solve the problem, go to $HIVE_HOME/bin and run:
schematool -initSchema -dbType derby
You can follow the link http://www.edureka.co/blog/apache-hive-installation-on-ubuntu
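For reference, a minimal sketch of the full sequence; it assumes no stale metastore_db directory already exists in the current directory (an existing one makes -initSchema fail):
cd $HIVE_HOME/bin
./schematool -dbType derby -initSchema   # creates and populates metastore_db
./schematool -dbType derby -info         # check the schema version afterwards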

It will work if you put derbyclient.jar in the lib folder of Hive. The client/server URL form (jdbc:derby://host:port/...) needs the Derby client driver, which is exactly what the "No suitable driver found" error is complaining about.
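A one-liner for that, assuming the Derby distribution lives under $DERBY_HOME (the path is an assumption; adjust it to wherever derbyclient.jar is on your machine):
cp $DERBY_HOME/lib/derbyclient.jar $HIVE_HOME/lib/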

Related

SemanticException in Hive Shell Mode

Hive exception
I have installed Hadoop 3.0.0 and Hive 2.3.1 on my PC. In parallel, I installed MySQL and have been working with SQL commands in the MySQL shell without problems. But while executing queries in Hive shell mode, I am receiving the following error:
hive> create table saurzcode(id int, name string);
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Please let me know the reason for the failure.
Also, please clarify the following queries:
1) What is the difference between Hive shell mode and MySQL shell mode?
2) Why configure a MySQL metastore for Hive?
Please find the hive-site.xml configuration below:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hivelogin</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>apache</value>
</property>
</configuration>
Your original exception is
Unable to load authentication plugin 'caching_sha2_password', as you can see in the error log below:
org.apache.hadoop.hive.metastore.HiveMetaStore - Retrying creating default database after error: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true
, username = hivelogin. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Unable to load authentication plugin 'caching_sha2_password'.
Solution:
This error happens because new MySQL versions ship with an added password plugin called "caching_sha2_password", and it has to be configured properly on the MySQL server; alternatively, you can simply use the "mysql_native_password" plugin with "CREATE USER" in MySQL, as below, to resolve it.
While creating the Hive metastore user, just follow the commands below.
CREATE USER 'username'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
GRANT ALL PRIVILEGES ON metastore_db.* TO 'hive'@'%';
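If the user already exists (as 'hivelogin' does in the question's hive-site.xml), switching its authentication plugin achieves the same. A sketch, assuming the user was created for localhost and keeping the question's password:
mysql -u root -p -e "ALTER USER 'hivelogin'@'localhost' IDENTIFIED WITH mysql_native_password BY 'apache';"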

Query tables present in external hive from Apache Spark [duplicate]

This question already has answers here:
How to connect Spark SQL to remote Hive metastore (via thrift protocol) with no hive-site.xml?
(11 answers)
Closed 2 years ago.
I am relatively new to the Hadoop ecosystem. My goal is to read Hive tables using Apache Spark and process them. Hive is running on an EC2 instance, whereas Spark is running on my local machine.
To build a prototype, I've installed Apache Hadoop by following the steps present over here. I've added the required environment variables as well.
I've started dfs using $HADOOP_HOME/sbin/start-dfs.sh
I've installed Apache Hive by following the steps present over here. I've started hiveserver2 and the Hive metastore. I've configured an Apache Derby DB (server mode) in Hive. I've created a sample table 'web_log' and added a few rows to it using beeline.
I've added the below in Hadoop's core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
And added the below in hdfs-site.xml:
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
I've added core-site.xml, hdfs-site.xml and hive-site.xml to $SPARK_HOME/conf in my local Spark instance.
core-site.xml and hdfs-site.xml are empty, i.e.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
</configuration>
hive-site.xml has below content
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://ec2-instance-external-dbs-name:9083</value>
<description>URI for client to contact metastore server</description>
</property>
</configuration>
I've started spark-shell and executed the following command:
scala> sqlContext
res0: org.apache.spark.sql.SQLContext = org.apache.spark.sql.hive.HiveContext#57d0c779
It seems Spark has created a HiveContext.
I've executed SQL using the command below:
scala> val df = sqlContext.sql("select * from web_log")
df: org.apache.spark.sql.DataFrame = [viewtime: int, userid: bigint, url: string, referrer: string, ip: string]
The columns and their types match the sample table 'web_log' that I've created.
Now when I execute scala> df.show, it takes some time and then throws the error below:
16/11/21 18:46:17 WARN BlockReaderFactory: I/O error constructing remote block reader.
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/ec2-instance-private-ip:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
It seems DFSClient is using the EC2 instance's internal IP. And AFAIK, I didn't start any application on port 50010.
Do I need to install and start any other application?
How can I make sure that DFSClient uses the EC2 instance's external IP or external DNS name?
Is it possible to access Hive from an external Spark instance?
Add the code snippet below to the program you are running:
hiveContext.getConf.getAll.mkString("\n")
This will print which Hive metastore it is connecting to, so you can review all the properties and spot any that are incorrect.
If they are not what you are looking for and you can't adjust them due to some limitation, then, as described in the linked question, you can point to the correct URIs like this:
hiveContext.setConf("hive.metastore.uris", "thrift://METASTOREl:9083");
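Alternatively, the URI can be supplied when launching the shell, so the program itself stays unchanged. A sketch using the placeholder host from the question (spark.hadoop.* properties are copied into the Hadoop configuration the HiveContext reads; on some older Spark versions the hive-site.xml approach above may still be required):
spark-shell --conf spark.hadoop.hive.metastore.uris=thrift://ec2-instance-external-dbs-name:9083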

How to change sqoop metastore?

I am using Sqoop version 1.4.2.
I am trying to change the Sqoop metastore from the default HSQLDB to MySQL.
I have configured the following properties in the sqoop-site.xml file:
<property>
<name>sqoop.metastore.client.enable.autoconnect</name>
<value>false</value>
<description>If true, Sqoop will connect to a local metastore
for job management when no other metastore arguments are
provided.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.url</name>
<value>jdbc:mysql://ip:3206/sqoop?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.username</name>
<value>userName</value>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.password</name>
<value>password</value>
</property>
When I try to create a Sqoop job with the meta-connect URL, it fails to connect to the configured MySQL DB.
sqoop job --meta-connect {mysql_jdbc_url} --create <job-name> -- <sqoop job definition>
It throws the following exception:
14/06/06 15:04:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.0.6.1-101
14/06/06 15:04:55 WARN hsqldb.HsqldbJobStorage: Could not interpret as a number: null
14/06/06 15:04:55 ERROR hsqldb.HsqldbJobStorage: Can not interpret metadata schema
14/06/06 15:04:55 ERROR hsqldb.HsqldbJobStorage: The metadata schema version is null
14/06/06 15:04:55 ERROR hsqldb.HsqldbJobStorage: The highest version supported is 0
14/06/06 15:04:55 ERROR hsqldb.HsqldbJobStorage: To use this version of Sqoop, you must downgrade your metadata schema.
14/06/06 15:04:55 ERROR tool.JobTool: I/O error performing job operation: java.io.IOException: Invalid metadata version.
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:202)
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.open(HsqldbJobStorage.java:161)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:274)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
Does Sqoop 1.4.2 support a metastore other than HSQLDB?
Please suggest.
The answer is yes; in my case I am using PostgreSQL. I ran into this recently, on version 1.4.4. I am not sure that what I did is the recommended way, but it works. Here are the steps I followed:
1) In sqoop-site.xml I configured the connect string to my database, the username, and the password.
2) Created the following object in the database, as Sqoop was failing on it:
CREATE TABLE SQOOP_ROOT (
version INT,
propname VARCHAR(128) NOT NULL,
propval VARCHAR(256),
CONSTRAINT SQOOP_ROOT_unq UNIQUE (version, propname)
);
3) Inserted the following row (the absence of this row seems to be why your script is failing):
INSERT INTO
SQOOP_ROOT
VALUES(
NULL,
'sqoop.hsqldb.job.storage.version',
'0'
);
I think the correct way might be to download the source and extend org.apache.sqoop.metastore.JobStorage with your own DB implementation.
Sqoop's metastore does not support any database other than HSQLDB; see point 2 of the notes at the Cloudera link:
cloudera
Public service announcement: Sqoop Metastore on other DBs may fail
We have been able to get PostgreSQL and MySQL working as targets for the Sqoop metastore on Sqoop 1, replacing the HyperSQL database. There's a little setup and seeding of the database needed, but from then on it seemed fine.
However, we are seeing cases, when we run many Sqoop jobs that update the metastore concurrently, where things go wrong: Sqoop 1.4.6 has no code to trap and handle cases where metastore updates for incremental imports fail due to concurrency issues. In particular, Sqoop will complete its import successfully but not update the metastore with the most recently imported values, which causes the next incremental run to import duplicate data. Sqoop will return a non-zero return code, but data in either Hadoop or the metastore needs to be synced afterward for the data to be correct.
We're not sure there is a solution, but this is an expansion of @SandeerKumar's answer. This may be an issue with HyperSQL as well, but it would be much less likely because HSQL is in-memory and therefore faster.

Hive Metastore tries to create a Derby connection instead of MySQL

I am using Hive 0.11 and the metastore in local mode. When I try to start the metastore daemon, it exits after spitting out the following error message:
2013-11-21 08:47:19.541 GMT Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
2013-11-21 08:47:19.646 GMT Thread[main,5,main] Cleanup action starting
ERROR XBM0H: Directory /metastore_db cannot be created.
This is my hive-site.xml. I am using MySQL as the metastore storage. What I don't understand is why Hive is trying to create metastore_db locally.
Thanks.
Set the hive.metastore.local property to false. (Removed as of Hive 0.10: if hive.metastore.uris is empty, local mode is assumed; remote otherwise.)
Set the hive.metastore.uris property to a valid URI (host and port of the Thrift metastore server).
For example:
<property>
<name>hive.metastore.uris</name>
<value>thrift://hap-db:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
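Note that the Thrift server the URI points to must actually be running on the metastore host. A minimal sketch (hap-db is the example host above; the service listens on port 9083 by default):
# run on the metastore host (hap-db in the example)
hive --service metastore &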
I faced a similar issue on Hive 0.14. I had installed Hive as the root user and was trying to run the Hive services as the sudo user I use for all Hadoop jobs.
Once I changed the installation owner to that user and restarted, it worked. So this error is mostly related to a file permissions issue.
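A sketch of the ownership fix, assuming Hive is installed at $HIVE_HOME and 'hadoopuser' is the account that runs the services (both names are placeholders):
sudo chown -R hadoopuser:hadoopuser $HIVE_HOME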

java.io.EOFException when trying to run examples on HBase standalone

I'm trying to run this example: https://github.com/larsgeorge/hbase-book/blob/master/ch03/src/main/java/client/PutExample.java, from this book: http://ofps.oreilly.com/titles/9781449396107/, on a standalone HBase installation. Starting HBase works fine and the shell is accessible, but when I try to run the example I get the following error:
Exception in thread "main" java.io.IOException: Call to /127.0.0.1:55958 failed on local exception: java.io.EOFException
at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:872)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:841)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:141)
at $Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:174)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:295)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:272)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:324)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:228)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1228)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1190)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1177)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:914)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:810)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:784)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1014)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:814)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:778)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:188)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
at client.CRUDExample.main(CRUDExample.java:26)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:548)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:486)
Thanks in advance.
The problem was that I had compiled the examples with a newer version of HBase than the one I was running. To fix this error for these examples, edit pom.xml, make sure the HBase dependency version matches the HBase you're running, and build again. (Also don't forget to remove chXX/target/cached_classpath.txt; otherwise it still adds the other library to your classpath.)
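A quick way to check which HBase version the build actually pulls in, run inside the chapter directory (the dependency filter is just illustrative):
mvn dependency:tree -Dincludes=org.apache.hbase
rm ch03/target/cached_classpath.txt   # force the cached classpath to be rebuilt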
I met this problem with exactly the same error logs, but for a different reason.
It was fixed after I added the items below to my HBase client config:
<property>
<name>hbase.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hbase.rpc.engine</name>
<value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
