Hive not fully honoring fs.default.name/fs.defaultFS value in core-site.xml - hadoop

I have the NameNode service installed on a machine called hadoop.
The core-site.xml file has the fs.defaultFS (equivalent to fs.default.name) set to the following:
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop:8020</value>
</property>
I have a very simple table called test_table that currently exists in the Hive server on the HDFS. That is, it is stored under /user/hive/warehouse/test_table. It was created using a very simple command in Hive:
CREATE TABLE test_table (record_id INT);
If I attempt to load data into the table locally (that is, using LOAD DATA LOCAL), everything proceeds as expected. However, if the data is stored on the HDFS and I want to load from there, an issue occurs.
I run a very simple query to attempt this load:
hive> LOAD DATA INPATH '/user/haduser/test_table.csv' INTO TABLE test_table;
Doing so leads to the following error:
FAILED: SemanticException [Error 10028]: Line 1:17 Path is not legal ''/user/haduser/test_table.csv'':
Move from: hdfs://hadoop:8020/user/haduser/test_table.csv to: hdfs://localhost:8020/user/hive/warehouse/test_table is not valid.
Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict.
As the error states, it is attempting to move from hdfs://hadoop:8020/user/haduser/test_table.csv to hdfs://localhost:8020/user/hive/warehouse/test_table. The first path is correct because it references hadoop:8020; the second path is incorrect, because it references localhost:8020.
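The mismatch the error reports can be pictured as a comparison of the scheme and authority (host:port) of the two URIs. The following is only a minimal sketch of that idea in Python, not Hive's actual implementation; the function name same_filesystem is hypothetical.

```python
from urllib.parse import urlparse


def same_filesystem(src: str, dst: str) -> bool:
    """Return True when both URIs share the same scheme and authority (host:port)."""
    s, d = urlparse(src), urlparse(dst)
    return (s.scheme, s.netloc) == (d.scheme, d.netloc)


# The two paths taken from the error message above.
src = "hdfs://hadoop:8020/user/haduser/test_table.csv"
dst = "hdfs://localhost:8020/user/hive/warehouse/test_table"
print(same_filesystem(src, dst))  # False
```

The paths differ only in the authority part (hadoop:8020 versus localhost:8020), which is enough for the move to be rejected.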
The core-site.xml file clearly specifies hdfs://hadoop:8020, and the hive.metastore.warehouse.dir value in hive-site.xml correctly points to /user/hive/warehouse. Thus, the conflict that the error message suggests does not seem to exist in my configuration.
How can I get the Hive server to use the correct NameNode address when creating tables?

I found that the Hive metastore tracks the location of each table. You can see that location by running the following in the Hive console.
hive> DESCRIBE EXTENDED test_table;
Thus, this issue occurs if the NameNode address in core-site.xml was changed while the metastore service was still running, because the running service keeps using the old address. To resolve this issue, restart the service on that machine:
$ sudo service hive-metastore restart
After the restart, the metastore will use the new fs.defaultFS for newly created tables.
Already Existing Tables
The location of tables that already exist can be corrected by running the following set of commands. They come from the Cloudera documentation on configuring the Hive metastore for High Availability.
$ /usr/lib/hive/bin/metatool -listFSRoot
...
Listing FS Roots..
hdfs://localhost:8020/user/hive/warehouse
hdfs://localhost:8020/user/hive/warehouse/test.db
Correcting the NameNode location:
$ /usr/lib/hive/bin/metatool -updateLocation hdfs://hadoop:8020 hdfs://localhost:8020
Now the listed NameNode is correct.
$ /usr/lib/hive/bin/metatool -listFSRoot
...
Listing FS Roots..
hdfs://hadoop:8020/user/hive/warehouse
hdfs://hadoop:8020/user/hive/warehouse/test.db
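
The effect of -updateLocation on the stored roots can be pictured as a prefix rewrite. The sketch below is only an illustration in Python of that rewrite, under the assumption that each stored location is a plain string; update_location is a hypothetical helper, not part of metatool. Its argument order mirrors the command above: the new location first, the old location second.

```python
def update_location(locations, new_root, old_root):
    """Rewrite every location that starts with old_root so it starts with new_root."""
    return [
        new_root + loc[len(old_root):] if loc.startswith(old_root) else loc
        for loc in locations
    ]


# The roots listed before the correction.
roots = [
    "hdfs://localhost:8020/user/hive/warehouse",
    "hdfs://localhost:8020/user/hive/warehouse/test.db",
]
for root in update_location(roots, "hdfs://hadoop:8020", "hdfs://localhost:8020"):
    print(root)
```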

Related

How does Hive run without a hive-site.xml file?

I am trying to set up Hive on my local machine. I started all the Hadoop processes and added {hive}/bin to the path. From the command prompt I can run Hive commands and create and read tables. My questions are:
1) Is hive-site.xml an optional file?
2) In the absence of hive-site.xml, how does Hive get information regarding the metastore and other configuration?
If you're running Hive queries on your local machine, which has Hadoop installed, hive-site.xml is not needed because you are running hive/bin directly from the Hive installation directory. You don't need to tell Hive where to find Hive. In that case Hive falls back to its built-in default settings, which include an embedded Derby metastore.
If you wanted to run Hive commands from another machine while interacting with Hive on your local machine, you'd need hive-site.xml.

Error while adding UDF in hive

I have to add a UDF in Hive.
The query I am trying is:
create function strip1 as 'com.hadoopbook.hive.Strip' using jar '/home/hduser/Hadoop-tutorial/hadoop-book-master/ch17-hive/src/main/java/com/hadoopbook/hive/Strip.jar'
But I am getting an exception:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask. Hive warehouse is non-local, but /home/hduser/Hadoop-tutorial/hadoop-book-master/ch17-hive/src/main/java/com/hadoopbook/hive/Strip.jar specifies file on local filesystem. Resources on non-local warehouse should specify a non-local scheme/path
Can anyone tell me how to solve this?
Three options:
Copy the jar to HDFS and use that path.
OR
As the error is telling you: in the $HIVE_HOME/conf directory there is the hive-default.xml and/or hive-site.xml file, which has the hive.metastore.warehouse.dir property. Prefix its path with hdfs:// and restart/re-run the Hive shell/script. Keep the leading slash of the path, because in hdfs://user/... the word "user" would be read as the NameNode host:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs:///user/hive/warehouse</value>
<description>location of the warehouse directory</description>
</property>
OR
If you are running Hive queries from the Hive shell:
hive> set hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/user/hive/warehouse
The command above prints the path. Prefix it with hdfs:// as below, keeping the leading slash of the path, and then re-run your Hive command(s):
hive> set hive.metastore.warehouse.dir=hdfs:///user/hive/warehouse;
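When a scheme is prefixed to a warehouse path, the number of slashes decides which part is read as the host. The sketch below uses Python's urllib.parse only as a stand-in for generic URI parsing (Hadoop itself parses URIs in Java); split_uri is a hypothetical helper, and hadoop:8020 is just an example NameNode address.

```python
from urllib.parse import urlparse


def split_uri(uri: str):
    """Return (authority, path) for a URI, following generic URI syntax."""
    parts = urlparse(uri)
    return parts.netloc, parts.path


for uri in (
    "hdfs://user/hive/warehouse",              # "user" lands in the authority slot
    "hdfs:///user/hive/warehouse",             # empty authority, full path kept
    "hdfs://hadoop:8020/user/hive/warehouse",  # explicit host:port
):
    authority, path = split_uri(uri)
    print(f"{uri} -> authority={authority!r} path={path!r}")
```

With two slashes directly followed by the path, the first path component is taken as the host name, so the warehouse would be looked up on a host called "user".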
You could set the configuration hive.aux.jars.path to /home/hduser/Hadoop-tutorial/hadoop-book-master/ch17-hive/src/main/java/com/hadoopbook/hive/
and create the Hive UDF with the command below:
create function strip1 as 'com.hadoopbook.hive.Strip'
You can first try adding the UDF jar from an HDFS location instead of the local directory:
hive> add jar hdfs:///user/cloudera/hive/udf/Strip.jar;
and then create the Hive function as below:
hive> create function test_function as "com.hadoopbook.hive.Strip";
Hope this helps :)

Issue with load data into HIVE

We have launched two EMR clusters in AWS, with Hadoop and hive-0.11.0 installed on one and hive-0.13.1 on the other.
Everything seems to be working fine, but loading data into a table gives the error below, and it happens on both Hive servers.
ERROR MESSAGE:
An error occurred when executing the SQL command: load data inpath
's3://buckername/export/employee_1/' into table employee_2 Query
returned non-zero code: 10028, cause: FAILED: SemanticException [Error
10028]: Line 1:17 Path is not legal
''s3://buckername/export/employee_1/'': Move from:
s3://buckername/export/employee_1 to:
hdfs://XXX.XX.XXX.XX:X000/mnt/hive_0110/warehouse/employee_2 is not
valid. Please check that values for params "default.fs.name" and
"hive.metastore.warehouse.dir" do not conflict. [SQL State=42000, DB
Errorcode=10028]
I searched for the cause and meaning of this message and found this link, but when I tried to execute the command suggested there, it gave the error below.
Command:
--service metatool -updateLocation hdfs://XXX.XX.XXX.XX:X000 hdfs://XXX.XX.XXX.XX:X000
Initializing HiveMetaTool.. HiveMetaTool:Parsing failed. Reason:
Unrecognized option: -hiveconf
Any help in this will be really appreciated.
LOAD does not support S3. It is best practice to leave the data in S3 and use it as a Hive external table instead of copying the data to HDFS. Some references: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html and "When you create an external table in Hive with an S3 location is the data transfered?"
If you have installed Hive on your Hadoop cluster, the default storage for Hive data is HDFS (hive.metastore.warehouse.dir=/user/hive/warehouse).
As a workaround, you can copy the file from the S3 file system to HDFS and then load it from HDFS into Hive.
Most probably you would need to modify the parameter "hive.exim.uri.scheme.whitelist=hdfs,pfile" to load data from the S3 file system.

Hive doesn't show tables when started from another directory

I installed Hive cdh4 on RHEL. Whenever I start Hive from a directory, it creates a metastore_db directory and a derby.log file in it. Is this normal behaviour? Moreover, when I create a table after starting Hive from one directory, I am unable to see that table when I start Hive from any other directory.
For example,
Let's say I start Hive from my home directory, i.e. $HOME or ~, and create a table in Hive. When I then start Hive from /path/to/my/Hive/directory and run show tables, the table I just created does not show up. However, if I start Hive from my home directory again and look for tables, I am able to see the table.
Also, if I make changes in hive-site.xml, they are simply ignored by Hive.
Please help me understand where I am going wrong.
You can change this and use a single metastore_db by updating the "javax.jdo.option.ConnectionURL" property in the "$HIVE_HOME/conf/hive-default.xml" file as below:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/path/to/my/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
Here /path/to/my/metastore_db is the location where you want to keep your metastore database.
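Why the start directory matters can be illustrated with plain path resolution: a relative databaseName such as the default metastore_db resolves against the directory Hive is started from, while an absolute one does not. This is only a sketch in Python of that resolution, assuming Derby's default behaviour of resolving relative names against the working directory; derby_db_location is a hypothetical helper.

```python
import posixpath


def derby_db_location(database_name: str, start_dir: str) -> str:
    """Resolve a databaseName against the directory the process was started from."""
    # posixpath.join keeps an absolute database_name unchanged.
    return posixpath.normpath(posixpath.join(start_dir, database_name))


# Relative name: a different database for every start directory.
print(derby_db_location("metastore_db", "/home/user"))
print(derby_db_location("metastore_db", "/path/to/my/Hive/directory"))
# Absolute name: the same database wherever Hive is started.
print(derby_db_location("/path/to/my/metastore_db", "/home/user"))
```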

Unable to instantiate HiveMetaStoreClient

I have a 3-node cluster running Hive.
When I try to run some tests from outside the cluster, I get the following error:
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Logging initialized using configuration in file:/net/slc01nwj/scratch/ashsshar/view_storage/ashsshar_bda_latest_2/work/hive_scratch/conf/hive-log4j.properties
When I log in to a cluster node and run hive, it works fine.
hive> show databases ;
OK
default
The following error is generated in the test log files:
13/04/04 03:10:49 ERROR security.UserGroupInformation: PriviledgedActionException as:ashsshar {my username }(auth:SIMPLE) cause:java.io.IOException: javax.jdo.JDOFatalDataStoreException: Failed to create database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to create database '/var/lib/hive/metastore/metastore_db', see the next exception for details.
My hive-site.xml file contains this connection property:
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
I have changed the /var/lib/hive/metastore/metastore_db directory on my cluster node, but I am still getting the same error.
I have also tried removing all *lck files from that directory.
Does {username} have the permissions to create
/var/lib/hive/metastore/metastore_db ?
If it is a test cluster you could do
sudo chmod -R 777 /var/lib/hive/metastore/metastore_db
or chown it to the user running it.
Try removing the $HADOOP_HOME/build folder. I had the same problem with hive-0.10.0 and later versions. Then I tried hive-0.9.0 and got a different set of errors. Luckily I found the thread "Hive doesn't work on install", tried the same trick, and it worked for me. I am using the default Derby db.
This is a permissions issue on the hive folder. Doing the following should work.
Switch to the Hive user (for me, hduser) and run:
sudo chmod -R 777 hive
This issue occurs when the Hive shell is terminated abruptly, which leaves a stale db.lck file behind.
To resolve this issue:
Browse to your metastore_db location.
Remove the tmp, dbex.lck and db.lck files.
Open the Hive shell again. It will work.
You will see the tmp, dbex.lck and db.lck files get created once again.
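Before deleting anything, it can help to check which lock files are actually present. The following is a minimal sketch in Python that only lists them and deletes nothing; find_derby_locks is a hypothetical helper, and the path in the usage line is the one from the question. Lock files should only be removed when no Hive process is using the database.

```python
from pathlib import Path
from typing import List


def find_derby_locks(metastore_dir: str) -> List[str]:
    """Return the names of the Derby lock files found directly inside metastore_dir."""
    lock_names = ("db.lck", "dbex.lck")
    base = Path(metastore_dir)
    return [name for name in lock_names if (base / name).is_file()]


# Prints an empty list when the directory or the lock files do not exist.
print(find_derby_locks("/var/lib/hive/metastore/metastore_db"))
```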
It worked after I moved the metastore out of /var/lib/hive/. I did that by editing /etc/hive/conf.dist/hive-site.xml
from:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
to:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/home/prashant/hive/metastore/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
Please check whether you already have a metastore_db in your Hadoop directory. If you do, remove it, format your HDFS again,
and then try to start Hive.
Yes, it's a privilege problem. Start your Hive shell with the following command:
sudo -u hdfs hive

Resources