How to change sqoop metastore? - hadoop

I am using sqoop 1.4.2 version.
I am trying to change the sqoop metastore from default hsqldb to mysql.
I have configured following properties in sqoop-site.xml file.
<property>
<name>sqoop.metastore.client.enable.autoconnect</name>
<value>false</value>
<description>If true, Sqoop will connect to a local metastore
for job management when no other metastore arguments are
provided.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.url</name>
<value>jdbc:mysql://ip:3206/sqoop?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.username</name>
<value>userName</value>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.password</name>
<value>password</value>
</property>
</configuration>
When I try to create a sqoop jobs with meta-connect url it fails to connect to configured mysql db.
sqoop job --create --meta-connect {mysql_jdbc_url} sqoop job defination
it is throwing following exception.
14/06/06 15:04:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.0.6.1-101
14/06/06 15:04:55 WARN hsqldb.HsqldbJobStorage: Could not interpret as a number: null
14/06/06 15:04:55 ERROR hsqldb.HsqldbJobStorage: Can not interpret metadata schema
14/06/06 15:04:55 ERROR hsqldb.HsqldbJobStorage: The metadata schema version is null
14/06/06 15:04:55 ERROR hsqldb.HsqldbJobStorage: The highest version supported is 0
14/06/06 15:04:55 ERROR hsqldb.HsqldbJobStorage: To use this version of Sqoop, you must downgrade your metadata schema.
14/06/06 15:04:55 ERROR tool.JobTool: I/O error performing job operation: java.io.IOException: Invalid metadata version.
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:202)
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.open(HsqldbJobStorage.java:161)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:274)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
Does sqoop 1.4.2 supports metastore other than hsql db?
Please suggest.

The answer is Yes, in my case I am using PostgreSQL. I ran into this recently and I am using Version 1.4.4. I am not sure if what I did is the recommended way, but it works. Here are the steps I followed
In sqoop-site.xml I configured it with, the connect string to my database, username and password.
Created the following object in the database, as Sqoop was failing at it.
CREATE TABLE SQOOP_ROOT (
version INT,
propname VARCHAR(128) NOT NULL,
propval VARCHAR(256),
CONSTRAINT SQOOP_ROOT_unq UNIQUE (version, propname)
);
Inserted the following row (This seems to be the reason your script is failing)
INSERT INTO
SQOOP_ROOT
VALUES(
NULL,
'sqoop.hsqldb.job.storage.version',
'0'
);
I think the correct way might be is to download the source, and extend
org.apache.sqoop.metastore.JobStorage with you DB implementation.

Sqoop metastore does not support any other database other hsqldb. Number 2 points of notes on the link.
cloudera

Public service announcement: Sqoop Metastore on other DBs may fail
We have been able to get PostgreSQL and MySQL working as targets for the Sqoop Metastore on Sqoop 1, replacing the HyperSQL database. There's a little setup and seeding of the database needed, but from then on, it seemed fine.
However, we are seeing cases when we are running many sqoop jobs, updating the metastore concurrently -- sqoop 1.4.6 has no code to trap and handle cases where metastore updates for incremental updates fail due to concurrency issues. In particular, Sqoop _will complete it's import successfully but not update the metastore with the most recently imported values. This will cause the next incremental run will import duplicate data. Sqoop will return a non-zero return code, but data in either Hadoop or the metastore need to be synced afterward in order for data to be correct.
We're not sure there is a solution, but this is an expansion of #SandeerKumar's answer. This may be an issue with HyperSQL as well, but it would be much less likely because HSQL is in memory, so faster.

Related

SemanticException in Hive Shell Mode

hive exception
I have installed Hadoop 3.0.0 and Hive 2.3.1 in my PC. Parallely i installed mysql and working with sql commands in sql shell mode and working fine. But While executing queries in Hive shell mode, i am receiving the following error,
hive> create table saurzcode(id int, name string);
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Please let me know the reason for failure.
Also please clarify the following queries,
1) Difference between hive shell mode vs mysql shell mode.
2) Why to configure MySql Metastore for Hive?
Please find the hive-site.xml configuration,
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hivelogin</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>apache</value>
</property>
</configuration>
Your Original exception is
Unable to load authentication plugin 'caching_sha2_password' as you can see in below error log.
org.apache.hadoop.hive.metastore.HiveMetaStore - Retrying creating default database after error: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true
, username = hivelogin. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Unable to load authentication plugin 'caching_sha2_password'.
Solution:
This error happens due to all new MySQL version come up with added password plugin called "caching_sha2_password", and it has to be configured properly at MySQL server or else you can simply use "mysql_native_password" parameter with "CREATE USER" in MySQL as below to get it resolved.
While creating the hive Meta Store user just follow the below command.
CREATE USER 'username'#'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
GRANT ALL PRIVILEGES ON metastore_db.* TO 'hive'#'%';

how beeline uses jceks file

I am hiding the metastore password in hive-site.xml using below property and I am using postgres as my metastore db.
<property>
<name>hadoop.security.credential.provider.path</name>
<value>jceks://file//...hive_new.jceks</value>
</property>
1)Iam getting error when I initiate db like "schematool -dbType postgres -initSchema"?
2)How can I use the these jceks file using beeline?.I have tried like below
beeline -u "jdbc:hive2://myhost:1000/default;principal=hive/principal#REALM?hadoop.security.credential.provider.path=jceks://hdfs#hostname/path/to/jceks"
but its halted at connecting.....
Do I need to set any other properties in hive-site.xml for beeline?

Hive metastore Configuration with derby

In RedHat test server I installed hadoop 2.7 and I ran Hive ,Pig & Spark with out issues .But when tried to access metastore of Hive from Spark I got errors So I thought of putting hive-site.xml(After extracting 'apache-hive-1.2.1-bin.tar.gz' file I just add $HIVE_HOME to bashrc as per tutorial and everything was working other than this integration with Spark) In apache site I found that I need to put hive-site.xml as metastore configuration
I created the file as below
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
</configuration>
I put IP as localhost since it is single node machine .After that I am not able to connect to even Hive .It is throwing error
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
....
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby://localhost:1527/metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: No suitable driver found for jdbc:derby://localhost:1527/metastore_db;create=true
There are lot many error log pointing to the same thing . If I remove hive-site.xml from the conf folder hive is working without issues .Can anyone point me to the right path for default metastore configuration
Thanks
Anoop R
Derby is used as an embedded database. try using
jdbc:derby:metastore_db;create=true
as jdbc-url. see also
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-EmbeddedMetastore
To use the metastore fully functional (and by that to be able to access it from different services), try setting up using mysql as described in the document above.
As you are setting up an embedded metastore database, use the property below as JDBC URL:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:metastore_db;create=true </value>
<description>JDBC connect string for a JDBC metastore </description>
</property>
I was also facing similar kind of exception while installing hive. The thing which worked for me was to initialize the derby db. I used the following command to solve the problem : command -> Go to $HIVE_HOME/bin and run the command schematool -initSchema -dbType derby .
You can follow the link http://www.edureka.co/blog/apache-hive-installation-on-ubuntu
It will work if you put derbyclient.jar in lib folder of hive

Configuring HCatalog, WebHCat with Hive

I'm installing Hadoop, Hive to be integrated with WebHCat which will be used to run hive queries through it using Map-Reduce jobs of Hadoop.
I installed Hadoop 2.4.1 and Hive 0.13.0 (latest stable versions).
The request I'm sending using the web interface is:
POST: http://localhost:50111/templeton/v1/hive?user.name='hadoop'&statusdir='out'&execute='show tables'
And I got response as the following:
{
"id": "job_local229830426_0001"
}
But in the logs webhcat-console-error.log I find that exit value of this job is 1, which means some error occurred. Tracking this error I found it Missing argument for option: hiveconf
This is the webhcat-site.xml which contains the configurations of webhcat (known previously as templeton):
<configuration>
<property>
<name>templeton.port</name>
<value>50111</value>
<description>The HTTP port for the main server.</description>
</property>
<property>
<name>templeton.hive.path</name>
<value>/usr/local/hive/bin/hive</value>
<description>The path to the Hive executable.</description>
</property>
<property>
<name>templeton.hive.properties</name>
<value>hive.metastore.local=false,hive.metastore.uris=thrift://localhost:9933,hive.metastore.sasl.enabled=false</value>
<description>Properties to set when running hive.</description>
</property>
</configuration>
But the cmd query executed is weird as it have some additional hiveconf parameters with no values:
tool.TrivialExecService: Starting cmd: [/usr/local/hive/bin/hive, --service, cli, --hiveconf, --hiveconf, --hiveconf, hive.metastore.local=false, --hiveconf, hive.metastore.uris=thrift://localhost:9933, --hiveconf, hive.metastore.sasl.enabled=false, -e, show tables]
Any Idea?

Hive Metastore tries to create a Derby connection instead of MySQL

I am using Hive 0.11 and Metastore in local mode. When I try to start the Metastore daemon, it exits after spitting the following error message:
2013-11-21 08:47:19.541 GMT Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
2013-11-21 08:47:19.646 GMT Thread[main,5,main] Cleanup action starting
ERROR XBM0H: Directory /metastore_db cannot be created.
This is my hive-site.xml. I am using MySQL as Metastore storage. What I don't understand is why is Hive trying to create metastore_db locally.
Thanks.
Set hive.metastore.local property as false. (Removed as of Hive 0.10: If hive.metastore.uris is empty local mode is assumed, remote otherwise)
Set hive.metastore.uris property with valid uri (Host and port for the Thrift metastore server)
For eg:
<property>
<name>hive.metastore.uris</name>
<value>thrift://hap-db:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
Hi faced similar issue on hive 0.14. I had installed hive as root user and was trying to run hive services as a sudo user i use for all hadoop jobs.
Once i changed the installation owner to sudo and restarted it worked . so this error is mostly related to file permissions issue.

Resources