Where does Hive store tables locally? - hadoop

I have created a Hive table and am trying to locate the HDFS file Hive created for this table locally. The Hive version is 2.3.0.
I tried this command to retrieve the location of my table:
hive> describe formatted table_name;
I got this as output (only showing the relevant part; tb2 is the table_name in this case):
Location: hdfs://localhost:54310/user/hive/warehouse/tb2
I have no clue how to navigate to hdfs://localhost:54310 locally (from the terminal). Also, the table is not present in the Hadoop default directory.

Try running the command below to view the Hive table. In the output you will find a folder named after your table:
hdfs dfs -ls /user/hive/warehouse/tb2
A table in Hive is basically a folder in HDFS.
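If you want the table's data on your local filesystem, a minimal sketch (assuming the default warehouse path from the question; the data file name 000000_0 is typical for a small table but may differ):
hdfs dfs -cat /user/hive/warehouse/tb2/000000_0
hdfs dfs -get /user/hive/warehouse/tb2 /tmp/tb2
The first command prints a data file to the terminal; the second copies the whole table folder to /tmp/tb2 locally.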

Related

SemanticException Line 1:23 Invalid path

I'm trying to load a text file from HDFS into a Hive table using the following command:
hive> load data local inpath '/user/hive/input/emp_details.txt' into table emp;
I'm getting the following exception:
FAILED: SemanticException Line 1:23 Invalid path ''/user/hive/input/emp_details.txt'': No files matching path file:/user/hive/input/emp_details.txt
I'm using Hive 1.2.2 on Hadoop 2.7.2, on CentOS 7.
I gave full permissions to the file path in HDFS using the following command:
hdfs dfs -chmod -R 777 /user/hive/input
Not sure what else is missing; could anyone please suggest what to do? Thanks in advance!
The LOCAL keyword means you are trying to load data from the local filesystem, not from HDFS. Since your file is in HDFS, you should use:
load data inpath '/user/hive/input/emp_details.txt' into table emp;
See also: Difference between `load data inpath` and `location` in Hive?
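To make the distinction concrete, here are the two variants side by side (a minimal sketch; the local path in the second statement is hypothetical):
-- file is in HDFS: no LOCAL keyword
load data inpath '/user/hive/input/emp_details.txt' into table emp;
-- file is on the local filesystem of the Hive client: use LOCAL
load data local inpath '/home/user/emp_details.txt' into table emp;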

Error while creating Hive table

Before creating the Twitter table I added this:
ADD JAR hdfs:///user/hive/warehouse/hive-serdes-1.0-SNAPSHOT.jar;
I got the following error when creating the Twitter table in Hive:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: com.cloudera.hive.serde.JSONSerDe
Move the JAR from HDFS to the local filesystem, then add the JAR from the Hive terminal, and then run the query on the Twitter table.
Ideally speaking you can add JARs from either the local filesystem or HDFS, so the problem here is likely something else.
I would recommend following this sequence of steps:
1. List the file on HDFS to make sure it exists:
hadoop fs -ls hdfs://namenode_hostname:8020/user/hive/warehouse/hive-serdes-1.0-SNAPSHOT.jar
2. Add the JAR in Hive using the full path as above, and verify the addition using the list jars command in the Hive CLI:
hive> list jars;
3. Use the SerDe in a CREATE TABLE statement with proper syntax, as shown for example here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe
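Putting the steps together, a minimal sketch might look like the following (the local JAR path and the three-column tweets schema are illustrative assumptions; the real Twitter table has many more columns):
-- assumes the JAR was copied to /tmp on the local filesystem
ADD JAR /tmp/hive-serdes-1.0-SNAPSHOT.jar;
list jars;
-- simplified Twitter table using the SerDe from the question
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  text STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/hive/warehouse/tweets';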

Hive with emrfs

I am importing tables from Amazon RDS into Hive using Sqoop. The process works, and the data is being stored in the Hive default HDFS directory: /user/hive/warehouse.
I need to change the storage location from HDFS to EMRFS (S3).
It is my understanding that I need to change (in hive-site.xml on the master node) the value of the property hive.metastore.warehouse.dir to the s3://bucket/warehouse-location. It appears that I don't have permission to modify the file hive-site.xml.
I am looking for some advice on how best to do it.
Sudi
You will need sudo privileges to modify the hive-site.xml file on the master node (usually located at /etc/hive/conf/hive-site.xml).
If this is not an option, try setting this property before the cluster is started. An example with CloudFormation:
"Configurations" : [
{
"Classification" : "hive-site",
"ConfigurationProperties" : {
"hive.metastore.warehouse.dir" : "s3://your_s3_bucket/hive_warehouse/",
}
}
],
Or through the EMR console dialog, in the "Edit Software Settings" section.
sudo vi /etc/hive/conf/hive-site.xml
or
sudo -su root
vi /etc/hive/conf/hive-site.xml
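Once the file is open, the property to add or change is the same one shown in the CloudFormation example above (the bucket name is a placeholder):
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>s3://your_s3_bucket/hive_warehouse/</value>
</property>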
If you are using Hive on EMR, it is recommended to use an external DB or the Glue Data Catalog as the Hive metastore.
For your concern:
Create the tables you want to import as external tables in Hive. While creating the external table you will have to provide the LOCATION parameter pointing to the S3 location of your table.
Example: Suppose I have an S3 bucket named bucket-xyz and I want my data to be stored at s3://bucket-xyz/my_table, where my table name is my_table. Then I will create my_table as an external table using Hive:
CREATE EXTERNAL TABLE my_table (A VARCHAR(30), B DOUBLE)
ROW FORMAT DELIMITED ...
LOCATION 's3://bucket-xyz/my_table';
After this, when you insert data into this table using Hive, Hive will store the data in the S3 location you specified.
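For example, after creating the external table you can fill it from the table Sqoop imported and then confirm that the files landed in S3 (a sketch; staging_table is a hypothetical name for the imported table):
INSERT INTO TABLE my_table SELECT * FROM staging_table;
Then, from the shell:
aws s3 ls s3://bucket-xyz/my_table/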

Not able to create a new table in Hive from spark-shell

I am using a single-node setup on Red Hat and have installed Hadoop, Hive, Pig, and Spark. I configured the Hive metastore in Derby. I created a new folder for Hive tables and gave it full privileges (chmod 777). Then I created one table from the Hive CLI, and I am able to select its data in spark-shell and print the values to the console. But from spark-shell/Spark SQL I am not able to create new tables. It throws the error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/2016/hive/test2 is not a directory or unable to create one)
I checked the permissions and the user (I use the same user for the installation of Hive, Hadoop, Spark, etc.).
Is there anything that needs to be done to get full integration of Spark and Hive?
Thanks
Check that the permissions in HDFS are correct (not just on the local filesystem):
hadoop fs -chmod -R 755 /user
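A quick way to check both sides at once (a sketch; /2016/hive is the path taken from the error message):
ls -ld /2016/hive        # how the path looks on the local filesystem
hadoop fs -ls /2016/hive # how the same path looks in HDFS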
If the error message persists afterwards, please update the question.

Hive doesn't show tables when started from another directory

I installed Hive CDH4 on RHEL. Whenever I start Hive from a directory, it creates a metastore_db directory and a derby.log file in it. Is this normal behaviour? Moreover, when I create a table after starting Hive from a particular directory, I'm unable to see that table when I start Hive from a different directory.
For example,
Let's say I started Hive from my home dir, i.e. $HOME or ~, and I created a table in Hive. But when I start Hive from /path/to/my/Hive/directory and do a show tables, the table I just created doesn't show up. However, if I start Hive from my home directory again and look for tables, I'm able to see the table.
Also, if I make some changes in hive-site.xml, they are simply ignored by Hive.
Please help me figure out where I am going wrong.
This is the normal behaviour of the embedded Derby metastore: Hive creates metastore_db in whatever directory it is started from. You can change this and use a single metastore_db by updating the javax.jdo.option.ConnectionURL property in the $HIVE_HOME/conf/hive-default.xml file as below:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/path/to/my/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
Where /path/to/my/metastore_db is the location where you want to keep your metastore database.
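With an absolute databaseName in place, Hive should show the same tables regardless of the directory it is started from. A quick check (paths are illustrative):
cd ~ && hive -e 'show tables;'
cd /path/to/my/Hive/directory && hive -e 'show tables;'
Both commands should now list the same tables.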
