Tables not found when hive cli called from different directory

I am facing a weird problem with Hive tables. I have HIVE_HOME set in my environment, and it is also on my search path, so I can invoke hive directly.
I invoke hive from a directory, let's say /a/b/c, and create some tables. I can see the tables.
Then I change to a different directory, e.g. /a/b, and invoke hive from there. Here is the problem: either I am unable to see the tables, or I get this error:
hive> show tables;
FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Failed to start
database 'metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to start database 'metastore_db', see the next exception
for details.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Why are tables tied to the directory from which the hive CLI was invoked? Any pointers?

I think you are using Derby, which Hive uses by default for storing its metadata. The embedded Derby metastore creates its metastore_db folder in whatever directory you launch hive from, which is why each working directory appears to have its own set of tables (and why you can hit the Derby startup error above if another session already holds the lock on that folder). As a quick fix you can delete everything inside the metastore_db folder and restart Hadoop, then try again. But the best advice would be to use MySQL as the metastore instead.
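A minimal sketch of the usual fix in hive-site.xml: pin the embedded Derby database to one absolute path so every working directory sees the same tables. The path shown is illustrative; javax.jdo.option.ConnectionURL is the standard property the Derby setup uses.
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/hadoop/hive/metastore_db;create=true</value>
</property>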

Related

Hive Transactions + Remote Metastore Error

I'm running Hive 2.1.1 on EMR 5.5.0 with a remote MySQL metastore DB. I need to enable transactions in Hive, but when I follow the configuration here and run any query, I get the following error:
FAILED: Error in acquiring locks: Error communicating with the metastore
Settings on the metastore:
hive.compactor.worker.threads = 0
hive.compactor.initiator.on = true
Settings in the hive client:
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
This only happens when I set hive.txn.manager, so my Hive metastore is definitely online.
I've tried some of the old suggestions of turning Hive's test features on, which didn't work, but I don't think this is a test feature anymore. I can't turn off concurrency, as a similar SO post suggests, because I need concurrency. It seems like the problem is that either DbTxnManager isn't getting the remote metastore connection info properly from the Hive config, or the MySQL DB is missing some tables required by DbTxnManager. I have datanucleus.autoCreateTables=true.
It looks like Hive wasn't properly creating the tables needed for the transaction manager. I'm not sure where it was getting its schema, but it was definitely wrong.
So we just ran the hive-txn-schema script to set up the schema manually. We'll do this at the start of any of our clusters from now on:
https://github.com/apache/hive/blob/master/metastore/scripts/upgrade/mysql/hive-txn-schema-2.1.0.mysql.sql
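For reference, applying that script to the metastore database looks roughly like this (the database name hive and the user hiveuser are illustrative; use whatever your remote metastore is configured with):
mysql -u hiveuser -p hive < hive-txn-schema-2.1.0.mysql.sql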
The error
FAILED: Error in acquiring locks: Error communicating with the metastore
sometimes occurs because there is no data yet; you need to initialize some data in your tables, for example:
create table t1(id int, name string)
clustered by (id) into 8 buckets
stored as orc TBLPROPERTIES ('transactional'='true');

Unable to update hive table via JDBC

I am unable to do an Update to my Hive table via JDBC. I am able to Select, but not Update.
Connecting to the hive database:
Connection connection =
DriverManager.getConnection("jdbc:hive2://localhost:10000/db", "", "");
My query:
ResultSet resultSet = statement.executeQuery("update db.test set name='yo yo' where id=1");
Stacktrace:
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:275)
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:355)
at com.spnotes.hive.App.main(App.java:63)
Again, I am able to Select but not Update via JDBC. I am, however, able to Update my table via the hive shell. I believe this is a user permissions issue. I have seen other problems where an HDFS directory needed to be granted permissions before it could be written to.
I had to invoke my hive shell with my HDFS user as so:
sudo -u hdfs hive
Can I somehow pass an "hdfs" user via JDBC? It does not look like this is possible. That is how I'm hoping to make the exception stop happening.
Here is the "secure way" of passing in a username and password as so:
Connection con = DriverManager.getConnection("jdbc:hive2:/hiveserver.domain.com:10000/default;user=username;password=password");
BUT this is NOT the same thing as passing the user hdfs. Perhaps it is possible to link the "username" with permissions to update the hive table?
Any help is welcome. Thanks!
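A hedged aside on the question of passing the hdfs user: on an unsecured HiveServer2 with impersonation enabled (hive.server2.enable.doAs=true), queries run as the user supplied at connect time, so the following may behave like the sudo -u hdfs hive shell. Whether it does depends entirely on the cluster's security setup:
Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/db", "hdfs", "");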
You are trying to pass an update statement to executeQuery().
Per the JDBC contract, executeQuery() is only for statements that return a ResultSet, so an update run through this method will fail. Change it to executeUpdate().
Also, instead of building queries like this, I suggest using prepared statements, since by using parameters you make the code less vulnerable to SQL injection.
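A minimal sketch of the corrected call, assuming the Hive JDBC driver is on the classpath and the db.test table from the question; note that Hive additionally requires the table to be transactional (bucketed ORC with 'transactional'='true') before any UPDATE can succeed:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class HiveUpdateExample {
    public static void main(String[] args) throws Exception {
        Connection connection = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/db", "", "");
        // executeUpdate() is the right method for DML; the parameters
        // replace hand-concatenated SQL strings.
        try (PreparedStatement ps = connection.prepareStatement(
                "UPDATE test SET name = ? WHERE id = ?")) {
            ps.setString(1, "yo yo");
            ps.setInt(2, 1);
            int rows = ps.executeUpdate();
            System.out.println("Rows updated: " + rows);
        }
        connection.close();
    }
}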

"Unexpected Error" on Join 2 simple tables

I have created a hive database. I have created an ODBC Data source to Hive using Hortonworks ODBC Driver for Hive.
I use this data source from Tableau 9 (desktop).
I can query table DimA, and I can query table FactA. But in Tableau, if I try to do a join, I get this error:
[Hortonworks][HiveODBC] (35) Error from Hive: error code: '0' error message: 'ExecuteStatement finished with operation state: ERROR_STATE'.
Unexpected Error
I can easily go to my cluster and issue the same query in the hive shell without any problems, and it returns results.
I searched the Internet, and people hit a permission problem here that gets solved by "grant". But in this case I am able to query the two individual tables (dima, facta) easily from Tableau; it is ONLY when I JOIN the tables that it throws the above error.
I tried "New Custom SQL" and copy-pasted the SQL that worked in the hive shell, but Tableau threw this error:
[Hortonworks][HiveODBC] (35) Error from Hive: error code: '40000' error message: 'Error while compiling statement: FAILED: ParseException line 1:11 cannot recognize input near 'TOP' '1' '*' in select expression'.
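As an aside, that ParseException is Hive rejecting SQL Server-style TOP, which suggests Tableau wrapped the custom SQL in a SELECT TOP 1 * validation query; HiveQL's equivalent construct is LIMIT, e.g. (table name taken from the question):
SELECT * FROM dima LIMIT 1;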
I fixed the issue. I had chosen the user "hue" to connect to Hive.
I did this because a tutorial showed me those steps to connect to Hive:
http://hortonworks.com/hadoop-tutorial/how-to-install-and-configure-the-hortonworks-odbc-driver-on-windows-7/
But the tutorial is wrong in suggesting the user hue. It should instead be hdfs, because the hue user does not have the rights to launch the MapReduce jobs that joins on Hive require. Simple single-table selects can be served without launching a job, which is why they worked.
Possible fix:
This SQL error is a known issue when using Hadoop Hive driver 1.4.8 to 1.4.13. It can be resolved by rolling the client driver back to 1.3. The most recent drivers produce issues when using CASE statements in Tableau, and Hortonworks is in the process of repairing this functionality. (http://community.tableau.com/thread/150002)

Cloudera/Hive - Can't access tables after hostname change

I created a Cloudera cluster and imported some sample test files from an Oracle DB. But after a while I had to change the hostnames of the nodes. I followed the guide on the Cloudera site and everything worked fine. But when I try to access the tables I created earlier (using both Hive and Impala), I get the following error:
Fetching results ran into the following error(s):
java.io.IOException: java.lang.IllegalArgumentException: java.net.UnknownHostException: [Old Host Name]
Then I created another table under the same DB (using Hue > Metastore Tables), and I can access these new tables, created under the new hostname, with no issue.
Can someone explain how I can access my old tables without reverting my hostnames? Can I access the metastore DB and change the table pointers to the new hostname?
Never mind, I found the answer.
You can confirm that hive/impala is looking for the wrong location by executing
describe formatted [tablename];
Output:
Location: hdfs://[oldhostname]:8020/user/hive/warehouse/sample_07
Then you can change "Location" property using :
ALTER TABLE sample_07 SET LOCATION "hdfs://[newhostname]:8020/user/hive/warehouse/sample_07";
P.S. sample_07 is the table in question.
Sometimes this doesn't work!
The above workaround works for the sample table, which is available by default, but I had another table that I had Sqooped from an external DB into a custom metastore DB, and it gave me an error similar to the one above.
Solution:
Go to the host where you've installed Hive.
Temporarily add the old hostname of the Hive server to /etc/hosts (if you don't have external DNS, both the new and old hostnames should exist in the same hosts file; see the sketch after these steps).
Execute the 'ALTER TABLE ....' at the hive shell (or web interface).
Remove the oldhostname entry from /etc/hosts.
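A sketch of the temporary hosts entry (the IP address is illustrative; use the Hive host's real address):
# /etc/hosts -- map the old hostname to the same IP while the ALTER TABLE runs
192.168.1.10   newhostname oldhostname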
Try this
hive --service metatool -updateLocation <newfsDefaultFSValue> <old_fsDefaultFSValue>
You can refer to https://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.trb.doc/doc/trb_inst_hive_hostnames.html
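With concrete values filled in, the metatool invocation would look roughly like this (hostnames are illustrative; the new filesystem root comes first and the old one second, matching the placeholders above):
hive --service metatool -updateLocation hdfs://newhostname:8020 hdfs://oldhostname:8020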

Oozie cannot access metastore database in HUE

I'm on CDH4. In Hue, I have a database in Metastore Manager named db1. I can run Hive queries that create objects in db1 with no problem. When I put those same queries in scripts and run them through Oozie, they fail with this message:
FAILED: SemanticException 0:0 Error creating temporary folder on: hdfs://lad1dithd1002.thehartford.com:8020/appl/hive/warehouse/db1.db. Error encountered near token 'TOK_TMP_FILE'
I created db1 in the Metastore Manager as Hue user db1 and as Hue user admin, and nothing works. The db1 user also has a db1 ID on the underlying Linux cluster, if that helps.
I have chmod'd /appl/hive/warehouse/db1.db to read, write, and execute for owner, group, and other, and none of that makes a difference.
I'm almost certain it's a rights issue, but what? Oddly, I have this working under another ID where I had hacked together some combination of things that seemed to work, but I'm not sure how. It was all in Hue, so if possible I'd like a solution doable in Hue, so I can easily hand it off to folks who prefer to work at the GUI level.
Thanks!
Did you also add hive-site.xml to your Files and Job XML fields? Hue has a great tutorial about how to run a Hive job. Watch it here. Adding hive-site.xml is described around 4:20.
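For reference, a minimal sketch of a Hive action in an Oozie workflow.xml with hive-site.xml wired in through job-xml (the action name and script file are illustrative):
<action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>hive-site.xml</job-xml>
        <script>create_db1_objects.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>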
Exact same error on MapR Hadoop.
Root cause: the main database and the temporary (scratch) database were created by different users.
Resolution: creating both folders under the same ID might help with this.
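A sketch of lining up the ownership, assuming the warehouse path and the db1 user from the question; the scratch path is illustrative (check hive.exec.scratchdir for the real one):
# give the warehouse dir and the scratch dir the same owner
sudo -u hdfs hadoop fs -chown -R db1:db1 /appl/hive/warehouse/db1.db
sudo -u hdfs hadoop fs -chown -R db1:db1 /tmp/hive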
