I recently decommissioned an analytics node with IP X.X.X.51. After this, I can't execute Shark queries because my Shark/Hive database is bound to the analytics node I just decommissioned:
shark> DESCRIBE DATABASE mykeyspace;
OK
mykeyspace cfs://X.X.X.51/user/hive/warehouse/mykeyspace.db
According to the Hive documentation for ALTER DATABASE, I can't change the location of my metadata database. How can I resolve this? Is there any other way for me to change the IP in the location of my store?
You can't change its location. Sad, but true.
However, see https://stackoverflow.com/a/28112448/260805.
Related
I had databases named default and test under /user/hive/warehouse/, and I was messing with --delete-target-dir in Sqoop and unfortunately deleted both databases, so the tables are also gone.
Luckily I have everything backed up, and there was not much in those databases. So I tried to create both databases again, and it says that databases with those names already exist. Then I tried to look at those databases and tables in the Hive terminal. I can see both databases and all the tables in them using show databases; and show tables; in Hive, but the tables are empty.
I also tried desc database default, but the location it shows is one I can't see in the web UI file browser.
Is there a way to get them back? Or should I drop the databases and recreate them along with the tables?
I am using Hadoop 2.6.0-cdh5.10.0
Thank you in advance.
The data is gone, not the metadata.
The database and table definitions are still stored in the metastore, pointing to non-existent locations.
If the trash feature is turned on, your data might still exist (moved to another location instead of deleted immediately).
If it is, it would be under /user/{The user who owned the data}/.Trash.
Check the values of fs.trash.interval and fs.trash.checkpoint.interval.
fs.trash.interval
Number of minutes after which the checkpoint gets deleted. If zero,
the trash feature is disabled. This option may be configured both on
the server and the client. If trash is disabled server side then the
client side configuration is checked. If trash is enabled on the
server side then the value configured on the server is used and the
client configuration value is ignored.
fs.trash.checkpoint.interval
Number of minutes between trash checkpoints. Should be smaller or
equal to fs.trash.interval. If zero, the value is set to the value of
fs.trash.interval. Every time the checkpointer runs it creates a new
checkpoint out of current and removes checkpoints created more than
fs.trash.interval minutes ago.
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
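If trash was enabled, a quick way to check whether the deleted warehouse directories were moved there rather than deleted outright (the paths below assume a default layout and are not taken from the original post):
hdfs getconf -confKey fs.trash.interval
hdfs dfs -ls /user/$(whoami)/.Trash/Current/user/hive/warehouse
hdfs dfs -mv /user/$(whoami)/.Trash/Current/user/hive/warehouse/test.db /user/hive/warehouse/test.db
If the .db directories show up under .Trash, the last command moves one of them back into place.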
After restarting the Impala server, we are not able to see the tables (i.e., the tables are not coming up). Can anyone tell me what order we have to follow to avoid this issue?
Thanks,
Srinivas
You should try running "invalidate metadata;" from impala-shell. This usually clears up tables not being visible, since Impala caches metadata.
From:
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_invalidate_metadata.html
The following example shows how you might use the INVALIDATE METADATA
statement after creating new tables (such as SequenceFile or HBase tables) through the Hive shell. Before the INVALIDATE METADATA statement was issued, Impala would give a "table not found" error if you tried to refer to those table names.
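For instance, from the shell (the database and table names below are placeholders, not from the original post):
impala-shell -q "INVALIDATE METADATA;"
impala-shell -q "INVALIDATE METADATA my_db.my_table;"
The second form limits the metadata reload to a single table instead of the whole catalog.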
I am creating a database in Hive with multiple locations, for example:
CREATE DATABASE sample1 location 'hdfs://nameservice1:8020/db/dev/abc','hdfs://nameservice1:8020/db/dev/def','hdfs://nameservice1:8020/db/dev/ghi'
but I am getting an error while doing this. Can anyone help? Is creating a database with multiple locations allowed? Is there any alternate solution for this?
PS: My cluster is Sentry-enabled.
Which error? If that is
User xx does not have privileges for CREATETABLE
then look at
http://community.cloudera.com/t5/Batch-SQL-Apache-Hive/quot-User-does-not-have-privileges-for-CREATETABLE-quot-Error/td-p/21044
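If it does turn out to be a Sentry privilege problem, one commonly suggested remedy is to grant URI privileges to the role your group uses; this is only a sketch (role and group names are placeholders, and it is not necessarily what the linked thread prescribes):
CREATE ROLE dev_role;
GRANT ROLE dev_role TO GROUP dev_group;
GRANT ALL ON URI 'hdfs://nameservice1:8020/db/dev/abc' TO ROLE dev_role;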
You may have to omit LOCATION and upload files directly to the Hive warehouse location of that Hive schema. I can't think of a better workaround.
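A rough sketch of that workaround, assuming the default warehouse path and made-up table and file names:
CREATE DATABASE sample1;
CREATE TABLE sample1.abc (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hdfs dfs -put /local/path/abc.csv /user/hive/warehouse/sample1.db/abc/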
I created a Cloudera cluster and imported some sample test files from an Oracle DB. But after a while I had to change the hostnames of the nodes. I followed the guide on the Cloudera site and everything worked fine. But when I try to access the tables I created earlier (using both Hive and Impala), I get the following error:
Fetching results ran into the following error(s):
java.io.IOException: java.lang.IllegalArgumentException: java.net.UnknownHostException: [Old Host Name]
Then I created another table under the same DB (using Hue > Metastore Tables), and I can access these new tables created under the new hostname with no issue.
Can someone explain how I can access my old tables without reverting my hostnames? Can I access the metastore DB and change the table pointers to the new hostname?
Never mind, I found the answer.
You can confirm that Hive/Impala is looking at the wrong location by executing:
describe formatted [tablename];
Output:
14 Location: hdfs://[oldhostname]:8020/user/hive/warehouse/sample_07 NULL
Then you can change the "Location" property using:
ALTER TABLE sample_07 SET LOCATION "hdfs://[newhostname]:8020/user/hive/warehouse/sample_07";
PS: sample_07 is the table in question.
Sometimes this doesn't work!!
The above workaround works for the sample table that is available by default, but I had another table that I had Sqooped from an external DB into a custom metastore DB, and that gave me a similar error again.
Solution:
Go to the host where you've installed Hive.
Temporarily add the old hostname of the Hive server to /etc/hosts (if you don't have external DNS, both the new and old hostnames should exist in the same hosts file).
Execute the 'ALTER TABLE ....' statement at the Hive shell (or web interface).
Remove the old hostname entry from /etc/hosts.
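A minimal sketch of the temporary /etc/hosts entries (the IP address and hostnames are placeholders, not values from the original post):
10.0.0.12   newhostname.example.com   newhostname
10.0.0.12   oldhostname.example.com   oldhostname
Both names point at the same node, so hdfs://[oldhostname]:8020 in the table's Location still resolves while the ALTER TABLE runs; drop the second entry afterwards.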
Try this:
hive --service metatool -updateLocation <newfsDefaultFSValue> <old_fsDefaultFSValue>
You can refer to https://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.trb.doc/doc/trb_inst_hive_hostnames.html
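For example (hostnames are placeholders; -listFSRoot just prints the filesystem roots currently recorded in the metastore, so you can verify before and after):
hive --service metatool -listFSRoot
hive --service metatool -updateLocation hdfs://newhostname:8020 hdfs://oldhostname:8020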
I need to have a high-availability database system, but I can't do it through a database cluster or master/slave replication. I need a JDBC proxy that knows how to apply a single update statement to multiple data sources. I found the HA-JDBC project (http://ha-jdbc.github.com/), which does that, but I was wondering if there is a similar or better library than HA-JDBC.
You can look at another project with HA features: Sequoia/C-JDBC (http://c-jdbc.ow2.org).
But some links on the page are broken ;(