Hive doesn't show tables when started from another directory - hadoop

I installed Hive (CDH4) on RHEL. Whenever I start Hive from a directory, it creates a metastore_db directory and a derby.log file there. Is this normal behaviour? Moreover, when I create a table after starting Hive from one directory, I cannot see that table when I start Hive from any other directory.
For example,
Let's say I start Hive from my home directory ($HOME or ~) and create a table. When I then start Hive from /path/to/my/Hive/directory and run show tables, the table I just created does not show up. However, if I start Hive from my home directory again and look for tables, I can see the table.
Also, if I make changes in hive-site.xml, Hive simply ignores them.
Please help me understand where I am going wrong.

You can change this and use a single metastore_db by setting "javax.jdo.option.ConnectionURL" in "$HIVE_HOME/conf/hive-site.xml" as below (Hive reads overrides from hive-site.xml; hive-default.xml only documents the default values):
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/path/to/my/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
Here /path/to/my/metastore_db is the absolute path where you want to keep your metastore database.
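As a minimal sketch of that change, the shell function below writes the property into a fresh hive-site.xml, the file Hive reads for site-specific overrides. The function name write_hive_site and the paths in the usage line are illustrative assumptions, not part of Hive. It refuses relative metastore paths (which would again depend on the working directory) and never overwrites an existing file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hive): write a minimal hive-site.xml that
# pins the embedded Derby metastore to one absolute path.
# Usage: write_hive_site <conf_dir> <absolute_metastore_path>
write_hive_site() {
    conf_dir=$1
    metastore_path=$2

    # A relative path would still be resolved against the current directory.
    case $metastore_path in
        /*) ;;
        *) echo "metastore path must be absolute: $metastore_path" >&2
           return 1 ;;
    esac

    # Do not clobber an existing configuration; merge by hand instead.
    if [ -e "$conf_dir/hive-site.xml" ]; then
        echo "$conf_dir/hive-site.xml already exists; edit it by hand" >&2
        return 1
    fi

    cat > "$conf_dir/hive-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=${metastore_path};create=true</value>
  </property>
</configuration>
EOF
}
```

For example: write_hive_site "$HIVE_HOME/conf" /path/to/my/metastore_db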

Related

How is Hive running without a hive-site.xml file?

I am trying to set up Hive on my local machine. I started all the Hadoop processes and set up the {hive}/bin path. From the command prompt I can run Hive commands and create and read tables. My questions are:
1) Is hive-site.xml an optional file?
2) In the absence of hive-site.xml, how does Hive get information regarding the metastore and other configuration?
If you're running Hive queries from your local machine, which has Hadoop installed, hive-site.xml is not needed, because you are talking directly to hive/bin in the Hive installation directory. You don't need to tell Hive where to find Hive. Without hive-site.xml, Hive falls back to its built-in default values, which include an embedded Derby metastore created in the directory Hive is started from.
If you wanted to run Hive commands from another machine while interacting with Hive on your local machine, you'd need hive-site.xml.

Not able to see databases after creating new hive metastore

I manually installed Hadoop and Hive on my Ubuntu 16.04 laptop. Hive was working fine and I created a few test databases (Derby).
After restarting the laptop, I found that Hive started, but any command such as show databases gave an error.
I followed the solution suggested on the web, i.e.:
1) rename metastore_db to metastore_db.tmp
2) run schematool to generate a new metastore_db
3) remove metastore_db.tmp (not removing it gives an error when you run Hive)
Now I am able to run Hive, but show databases lists only the default database.
Is there any way to add the databases I created previously (for example /user/hive/warehouse/computersalesdb.db, saved in HDFS) to the newly generated metastore?
* UPDATE *
On further analysis I found that a metastore_db folder is created wherever I run Hive, so this seems to be the cause of the problem. Any one of the following solves it:
1) As advised in a comment by @cricket_007, keep the metastore in MySQL or whichever other RDBMS you are using.
2) Always run Hive from the same folder.
3) Set the property “javax.jdo.option.ConnectionURL” in hive-site.xml so that the metastore is created in one specific folder.
Leaving this comment for the benefit of other nubes like me :D
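To check whether this has already happened on a machine, a small sketch like the one below lists every stray metastore_db directory under a starting point. The function name find_metastores is an illustrative assumption; it relies only on the standard find utility and does not touch Hive itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hive): list every embedded Derby
# metastore directory below a root directory (default: $HOME).
find_metastores() {
    root=${1:-$HOME}
    # Permission errors on unreadable subdirectories are silenced.
    find "$root" -type d -name metastore_db 2>/dev/null
}
```

Running find_metastores "$HOME" and seeing more than one path printed suggests that Hive was started from several different folders, each with its own metastore.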

Hive with emrfs

I am importing tables from Amazon RDS into Hive using Sqoop. The process works, and the data is stored in the default Hive HDFS directory: /user/hive/warehouse.
I need to change the storage location from HDFS to EMRFS (S3).
My understanding is that I need to change the value of the property hive.metastore.warehouse.dir (in hive-site.xml on the master node) to s3://bucket/warehouse-location. It appears that I don't have permission to modify hive-site.xml.
I am looking for advice on how best to do this.
Sudi
You will need sudo privileges to modify the hive-site.xml file on the master node (usually located at /etc/hive/conf/hive-site.xml).
If this is not an option, try setting this property before the cluster is started. An example with CloudFormation:
"Configurations" : [
{
"Classification" : "hive-site",
"ConfigurationProperties" : {
"hive.metastore.warehouse.dir" : "s3://your_s3_bucket/hive_warehouse/"
}
}
],
Or through the EMR dialogue in the section for "Edit Software Settings"
sudo vi /etc/hive/conf/hive-site.xml
or
sudo -su root
vi /etc/hive/conf/hive-site.xml
If you are using Hive on EMR, it is recommended to keep the Hive metastore in an external database or to use the Glue Data Catalog as the Hive metastore.
As for your concern:
Create the tables you want to import as external tables in Hive. When creating an external table, you have to provide the S3 location of the table in the LOCATION clause.
Example: Suppose I have an S3 bucket named bucket-xyz and I want my data to be stored in s3://bucket-xyz/my-table, where my table name is my_table (Hive table names cannot contain hyphens). Then I create my_table as an external table in Hive:
CREATE EXTERNAL TABLE my_table (a VARCHAR(30), b DOUBLE)
ROW FORMAT DELIMITED ...
LOCATION 's3://bucket-xyz/my-table';
After this, when you insert data into this table using Hive, Hive will store the data in the S3 location you specified.

Hive not fully honoring fs.default.name/fs.defaultFS value in core-site.xml

I have the NameNode service installed on a machine called hadoop.
The core-site.xml file has the fs.defaultFS (equivalent to fs.default.name) set to the following:
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop:8020</value>
</property>
I have a very simple table called test_table that currently exists in the Hive server on the HDFS. That is, it is stored under /user/hive/warehouse/test_table. It was created using a very simple command in Hive:
CREATE TABLE test_table (record_id INT);
If I attempt to load data into the table locally (that is, using LOAD DATA LOCAL), everything proceeds as expected. However, if the data is stored on the HDFS and I want to load from there, an issue occurs.
I run a very simple query to attempt this load:
hive> LOAD DATA INPATH '/user/haduser/test_table.csv' INTO TABLE test_table;
Doing so leads to the following error:
FAILED: SemanticException [Error 10028]: Line 1:17 Path is not legal ''/user/haduser/test_table.csv'':
Move from: hdfs://hadoop:8020/user/haduser/test_table.csv to: hdfs://localhost:8020/user/hive/warehouse/test_table is not valid.
Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict.
As the error states, it is attempting to move from hdfs://hadoop:8020/user/haduser/test_table.csv to hdfs://localhost:8020/user/hive/warehouse/test_table. The first path is correct because it references hadoop:8020; the second path is incorrect, because it references localhost:8020.
The core-site.xml file clearly states to use hdfs://hadoop:8020. The hive.metastore.warehouse.dir value in hive-site.xml correctly points to /user/hive/warehouse. Thus, I doubt this error message has any true value.
How can I get the Hive server to use the correct NameNode address when creating tables?
I found that the Hive metastore tracks the location of each table. You can see that location by running the following in the Hive console.
hive> DESCRIBE EXTENDED test_table;
Thus, this issue occurs if the NameNode in core-site.xml was changed while the metastore service was still running. Therefore, to resolve this issue the service should be restarted on that machine:
$ sudo service hive-metastore restart
Then, the metastore will use the new fs.defaultFS for newly created tables.
Already Existing Tables
The location for tables that already exist can be corrected by running the following set of commands. These were obtained from Cloudera documentation to configure the Hive metastore to use High-Availability.
$ /usr/lib/hive/bin/metatool -listFSRoot
...
Listing FS Roots..
hdfs://localhost:8020/user/hive/warehouse
hdfs://localhost:8020/user/hive/warehouse/test.db
Correcting the NameNode location:
$ /usr/lib/hive/bin/metatool -updateLocation hdfs://hadoop:8020 hdfs://localhost:8020
Now the listed NameNode is correct.
$ /usr/lib/hive/bin/metatool -listFSRoot
...
Listing FS Roots..
hdfs://hadoop:8020/user/hive/warehouse
hdfs://hadoop:8020/user/hive/warehouse/test.db
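As an illustration only of the effect seen in the listings above, the update amounts to replacing the old filesystem root prefix with the new one in each stored location. The sketch below mimics that on a single string; rewrite_fs_root is a hypothetical name, the argument order (new root first, old root second) simply mirrors the metatool call above, and this is not how metatool itself is implemented.

```shell
#!/bin/sh
# Hypothetical illustration (not metatool): replace an old filesystem root
# prefix with a new one in a single location string.
# Usage: rewrite_fs_root <new_root> <old_root> <location>
rewrite_fs_root() {
    new_root=$1
    old_root=$2
    location=$3
    case $location in
        # Only locations that start with the old root are rewritten.
        "$old_root"*) printf '%s\n' "$new_root${location#"$old_root"}" ;;
        # Anything else is passed through unchanged.
        *) printf '%s\n' "$location" ;;
    esac
}
```

For example, rewrite_fs_root hdfs://hadoop:8020 hdfs://localhost:8020 hdfs://localhost:8020/user/hive/warehouse/test.db prints the same path under hdfs://hadoop:8020.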

Hive tables went missing

I had created a couple of tables in Hive and ran a few queries on them. Then I exited Hive and shut down Hadoop MapReduce and DFS. When I came back the next day, the tables were missing!
My Hive uses a local metastore. After a lot of searching I found only one similar issue posted by someone. The answer suggested that if a local metastore is used, Hive should be started from the same location each time. I had done exactly that: I ran Hive from the master only and never even logged into a slave. The metastore folder is still there. So what could have gone wrong? I checked the Hadoop datanode logs and the Hive metastore logs but found nothing. Where can I find out what went wrong? Please help me with this. Also, what can be done to avoid such things?
If you use a local metastore, Hive creates metastore_db in the directory from which hiveserver2 is started. So if you start hiveserver2 from a different directory next time, a new metastore_db will be created at that location, and this metastore_db will not have metadata about your earlier tables.
Were you using a database the first day? Were you using it the second day?
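One way to guard against this, sketched below under stated assumptions, is to wrap the launch command so that it always runs from one fixed directory. HIVE_WORK_DIR and hive_from_fixed_dir are names invented for this sketch (Hive does not read that variable), and the wrapper assumes the hive command is on the PATH; the same idea applies to starting hiveserver2.

```shell
#!/bin/sh
# Assumption: HIVE_WORK_DIR is a variable name chosen for this sketch only.
HIVE_WORK_DIR=${HIVE_WORK_DIR:-$HOME/hive-work}

# Hypothetical wrapper: always launch the hive CLI from the same directory,
# so an embedded metastore_db is always looked up in the same place.
hive_from_fixed_dir() {
    mkdir -p "$HIVE_WORK_DIR" || return 1
    # The subshell keeps the caller's own working directory unchanged.
    ( cd "$HIVE_WORK_DIR" && hive "$@" )
}
```

Calling hive_from_fixed_dir from any directory then behaves like calling hive from $HIVE_WORK_DIR. Setting an absolute javax.jdo.option.ConnectionURL in hive-site.xml, or using an external database for the metastore, removes the dependency on the working directory altogether.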
Meaning
hive> show databases;
OK
default
test
Time taken: 1.575 seconds
hive> use test;
hive> show tables;
OK
blah
Time taken: 0.141 seconds
hive use table blah;
If you forgot to use a database, or forgot to create one, things could get messy.
Also, what does the following command return?
sudo -u hdfs hadoop fs -ls -R \
