Cloudera/Hive - Can't access tables after hostname change

I created a Cloudera cluster and imported some sample test files from an Oracle DB. But after a while I had to change the hostnames of the nodes. I followed the guide on the Cloudera site and everything worked fine. But when I try to access tables I created earlier (using both Hive and Impala), I get the following error:
Fetching results ran into the following error(s):
java.io.IOException: java.lang.IllegalArgumentException: java.net.UnknownHostException: [Old Host Name]
Then I created another table under the same DB (using Hue > Metastore Tables), and I can access these new tables, created under the new hostname, with no issue.
Can someone explain how I can access my old tables without reverting my hostnames? Can I access the metastore DB and change the table pointers to the new hostname?

Never mind, I found the answer.
You can confirm that Hive/Impala is looking in the wrong location by executing
describe formatted [tablename];
Output:
Location: hdfs://[oldhostname]:8020/user/hive/warehouse/sample_07 NULL
Then you can change "Location" property using :
ALTER TABLE sample_07 SET LOCATION "hdfs://[newhostname]:8020/user/hive/warehouse/sample_07";
PS: sample_07 is the table in question.
Sometimes this doesn't work!
The above workaround works for the sample table, which is available by default, but I had another table which I had Sqooped from an external DB into a custom metastore DB, and it again gave me an error similar to the one above.
Solution:
Go to the host where you've installed Hive.
Temporarily add the old hostname of the Hive server to /etc/hosts (if you don't have external DNS, both the new and old hostnames should exist in the same hosts file).
Execute the ALTER TABLE ... statement in the Hive shell (or web interface), as sketched below.
Remove the old hostname entry from /etc/hosts.
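A minimal sketch of the whole sequence, assuming the Hive host's own IP is 10.0.0.5 and oldhostname/newhostname are placeholders for your actual names:
# Temporarily map the old hostname to the Hive host (hypothetical IP and names)
echo "10.0.0.5  oldhostname" | sudo tee -a /etc/hosts
# Repoint the table at the new NameNode address
hive -e 'ALTER TABLE sample_07 SET LOCATION "hdfs://newhostname:8020/user/hive/warehouse/sample_07";'
# Remove the temporary mapping again
sudo sed -i '/oldhostname/d' /etc/hosts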

Try this
hive --service metatool -updateLocation <newfsDefaultFSValue> <old_fsDefaultFSValue>
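For example, with hypothetical old and new NameNode URIs (run this on the Hive host; note the arguments are the new value first, then the old one):
# Rewrite every metastore location that still references the old NameNode
hive --service metatool -updateLocation hdfs://newhostname:8020 hdfs://oldhostname:8020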
You can refer to https://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.trb.doc/doc/trb_inst_hive_hostnames.html

Related

Hive permission denied for user anonymous using beeline shell

I created a 3-node Hadoop cluster with 1 namenode and 2 datanodes.
I can perform a read/write query from Hive shell, but not beeline.
I found many suggestions and answers related to this issue.
Every suggestion mentioned granting the permission to userX for each individual table.
But I don't know how to set the permission for an anonymous user once and for all.
Why am I getting the user anonymous while accessing the data from Beeline or from a Java program?
I am able to read the data from both the Beeline shell and a Java JDBC connection.
But I can't insert data into the table.
This is my JDBC connection string: jdbc:hive2://hadoop01:10000.
Below is the error I am getting on an insert request:
Permission denied: user=anonymous, access=WRITE, inode="/user/hive/warehouse/test_log/.hive-staging_hive_2017-10-07_06-54-36_347_6034469031019245441-1":hadoop:supergroup:drwxr-xr-x
The Beeline syntax is:
beeline -n username -u "url"
I assume you are missing the username. Also, judging by the drwxr-xr-x mode in the error, no one but the hadoop user has WRITE access to that table anyway.
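For example, connecting as the hadoop user that owns the warehouse directory in the error above:
beeline -n hadoop -u "jdbc:hive2://hadoop01:10000/default"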
If you don't have full control over the table permissions, you can try relocating the staging directory with the hive.exec.stagingdir setting, for example:
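A minimal sketch, assuming /tmp/hive-staging is an HDFS path your user can write to:
-- Relocate the insert staging directory away from the table's own directory (hypothetical path)
SET hive.exec.stagingdir=/tmp/hive-staging/.hive-staging;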
If no database is specified in the connection URL, it is equivalent to connecting with
jdbc:hive2://hadoop01:10000/default
and Beeline connects to the default database. While inserting data into a table, the data is first loaded into a temporary table in the default database and then moved into the actual table.
So you need to give the user access to the default database as well, or connect to a database where you do have access:
jdbc:hive2://hadoop01:10000/your_db
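Alternatively, if anonymous really does need to write into the existing table path, one option (assuming HDFS ACLs are enabled via dfs.namenode.acls.enabled=true) is to grant access at the HDFS level:
# Grant the anonymous user write access to the table directory (requires HDFS ACLs)
hdfs dfs -setfacl -R -m user:anonymous:rwx /user/hive/warehouse/test_log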

Hive Transactions + Remote Metastore Error

I'm running Hive 2.1.1 on EMR 5.5.0 with a remote MySQL metastore DB. I need to enable transactions in Hive, but when I follow the configuration here and run any query, I get the following error:
FAILED: Error in acquiring locks: Error communicating with the metastore
Settings on the metastore:
hive.compactor.worker.threads = 0
hive.compactor.initiator.on = true
Settings in the hive client:
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
This only happens when I set hive.txn.manager, so my Hive metastore is definitely online.
I've tried some of the old suggestions of turning Hive test features on, which didn't work, but I don't think this is a test feature anymore. I can't turn off concurrency, as a similar post on SO suggests, because I need concurrency. It seems like the problem is that either DbTxnManager isn't getting the remote metastore connection info properly from the Hive config, or the MySQL DB is missing some tables required by DbTxnManager. I have datanucleus.autoCreateTables=true.
It looks like Hive wasn't properly creating the tables needed for the transaction manager. I'm not sure where it was getting its schema, but it was definitely wrong.
So we just ran the hive-txn-schema script to set up the schema manually. We'll do this at the start of any of our clusters from now on.
https://github.com/apache/hive/blob/master/metastore/scripts/upgrade/mysql/hive-txn-schema-2.1.0.mysql.sql
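A sketch of applying that script by hand, assuming the metastore database is named hive and lives on metastore-host (substitute your own host, database name, and credentials):
# Fetch the raw version of the transaction schema DDL linked above
wget https://raw.githubusercontent.com/apache/hive/master/metastore/scripts/upgrade/mysql/hive-txn-schema-2.1.0.mysql.sql
# Apply it to the MySQL metastore database (prompts for the password)
mysql -h metastore-host -u hive -p hive < hive-txn-schema-2.1.0.mysql.sql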
The error
FAILED: Error in acquiring locks: Error communicating with the metastore
sometimes occurs because there is no data yet; you need to initialize some data in your tables, for example:
create table t1(id int, name string)
clustered by (id) into 8 buckets
stored as orc TBLPROPERTIES ('transactional'='true');

Create database in Hive with multiple locations having Sentry enabled

I am creating a database in Hive with multiple locations, for example:
CREATE DATABASE sample1 location 'hdfs://nameservice1:8020/db/dev/abc','hdfs://nameservice1:8020/db/dev/def','hdfs://nameservice1:8020/db/dev/ghi'
but I am getting an error while doing this. Can anyone help with this? Is creating a database with multiple locations allowed? Is there any alternative solution for this?
PS: My cluster is Sentry-enabled.
Which error? If that is
User xx does not have privileges for CREATETABLE
then look at
http://community.cloudera.com/t5/Batch-SQL-Apache-Hive/quot-User-does-not-have-privileges-for-CREATETABLE-quot-Error/td-p/21044
You may have to omit LOCATION and upload the files directly to the Hive warehouse location of that Hive schema. I can't think of a better workaround; a sketch of it is below.
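A sketch of that workaround, assuming the default warehouse path /user/hive/warehouse (check hive.metastore.warehouse.dir on your cluster):
# Create the database without a LOCATION clause...
hive -e "CREATE DATABASE sample1;"
# ...then copy the existing files under its warehouse directory (source path from the question)
hdfs dfs -cp hdfs://nameservice1:8020/db/dev/abc /user/hive/warehouse/sample1.db/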

How to change location of a Hive metastore?

I recently decommissioned an analytics node with IP X.X.X.51. After this, I can't execute Shark queries since my Shark/Hive database is bound to the analytics node I just decommissioned:
shark> DESCRIBE DATABASE mykeyspace;
OK
mykeyspace cfs://X.X.X.51/user/hive/warehouse/mykeyspace.db
According to the Hive documentation for ALTER DATABASE, I can't change the location of my metadata database. How can I resolve this? Is there any other way for me to change the location IP of my store?
You can't change the location of it. Sad, but true.
However, see https://stackoverflow.com/a/28112448/260805.
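If that linked answer fits your setup, the metatool approach from the first question above may also work here, rewriting the stored cfs:// locations (the replacement IP below is hypothetical):
# Point metastore locations at a live node instead of the decommissioned one (hypothetical new IP)
hive --service metatool -updateLocation cfs://X.X.X.52 cfs://X.X.X.51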

Tables not found when hive cli called from different directory

I am facing a weird problem with Hive tables. I have HIVE_HOME set in my environment and it is also in my search path, so I can invoke Hive directly.
Now I invoke Hive from a directory, let's say /a/b/c, and create some tables. I can see the tables.
Now I change to a directory, e.g. /a/b, and invoke Hive from there. Here is the problem: either I am unable to see the tables, or I get this error:
hive> show tables;
FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Failed to start
database 'metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to start database 'metastore_db', see the next exception
for details.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Why are tables tied to the directory from which the Hive CLI was called? Any pointers?
I think you are using the embedded Derby database, which Hive uses by default for storing the metadata. Derby creates a metastore_db folder inside whatever directory you launch the CLI from, which is why your tables seem tied to that directory. As a quick fix you can delete everything inside the metastore_db folder, restart Hadoop, and check again. But the best advice would be to use MySQL as the metastore, as sketched below.
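A minimal sketch of pointing Hive at a MySQL metastore in hive-site.xml (the hostname, database name, and credentials here are placeholders for your own):
<!-- hive-site.xml: remote MySQL metastore (hypothetical host and credentials) -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>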
