How to get table name based on the HDFS location? - shell

I would like to find out the table name based on the HDFS location using a shell script.
Is it possible?

I doubt there is a direct Hive CLI command for that, but if you have access, you can query the Hive metastore database (usually MySQL/MariaDB) to find the table name for a given HDFS location.
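As a minimal sketch of such a metastore query (the metastore database name `metastore` and user `hiveuser` are assumptions; `TBLS`, `SDS`, and `DBS` are the standard metastore schema tables):

```shell
# Hypothetical HDFS location to look up (adjust to your cluster).
HDFS_LOC="hdfs://namenode:8020/user/hive/warehouse/mydb.db/mytable"

# TBLS holds table names, SDS holds storage descriptors (incl. LOCATION),
# DBS holds database names; join them to map a location to db.table.
QUERY="SELECT d.NAME, t.TBL_NAME
FROM TBLS t
JOIN SDS s ON t.SD_ID = s.SD_ID
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE s.LOCATION = '${HDFS_LOC}';"

echo "$QUERY"
# Run it against the metastore (uncomment; prompts for the password):
# mysql -u hiveuser -p -D metastore -e "$QUERY"
```

Note the stored `LOCATION` usually includes the full `hdfs://...` scheme and namenode, so match the exact string (or use `LIKE` with a path suffix).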

Related

Is it possible to use/query data using Pig/Tableau or some-other tool from HDFS which was inserted/loaded using a HIVE Managed table?

Is it possible to use or query data using Pig or Drill or Tableau or some other tool from HDFS which was inserted/loaded using a HIVE Managed table; or is it only applicable with data in HDFS which was inserted/loaded using a HIVE External table?
Edit 1: Is the data associated with Managed Hive Tables locked to Hive?
Managed and external tables differ only in how Hive manages the underlying files, not in their visibility to clients. Both can be accessed with Hive clients.
Hive provides HiveServer2, a Thrift-based service that lets clients execute queries against Hive. It exposes JDBC/ODBC interfaces.
So you can query data in Hive whether it is stored in a managed table or an external table.
DBeaver/Tableau can query Hive once connected to HiveServer2.
For Pig you can use HCatalog:
pig -useHCatalog
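As a sketch (the `default.my_table` name is a placeholder), once Pig is started with `-useHCatalog`, a Hive table — managed or external — can be loaded directly through HCatLoader:

```pig
-- start Pig with: pig -useHCatalog
-- Load a Hive table via HCatalog; 'default.my_table' is a
-- placeholder database.table name.
A = LOAD 'default.my_table' USING org.apache.hive.hcatalog.pig.HCatLoader();
DUMP A;
```

On older Hive/HCatalog versions the loader class may be `org.apache.hcatalog.pig.HCatLoader` instead.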

Importing data to hbase using sqoop

When I want to import data into Hive using Sqoop, I can specify --hive-home <dir> and Sqoop will call that specified copy of Hive installed on the machine where the script is being executed. But what about HBase? How does Sqoop know which HBase instance/database I want the data to be imported into?
Maybe the documentation helps?
By specifying --hbase-table, you instruct Sqoop to import to a table in HBase rather than a directory in HDFS
Every example I see just shows that option along with column families and so on, so I assume the target HBase instance comes from whatever variables are set in sqoop-env.sh, as the Hortonworks docs describe.
When you give the Hive home directory, you're not telling Sqoop about any database or table either, but rather where the Hive configuration files live on the machine you're running Sqoop on. By default, that's taken from the environment variable $HIVE_HOME.
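As an illustrative sketch (the connection string, table, and column names below are all made up), a Sqoop-to-HBase import looks like this; the target HBase instance is resolved from the HBase configuration found via $HBASE_HOME / hbase-site.xml on the machine running Sqoop, not from a command-line flag:

```shell
# Build the command as a string so the pieces are easy to see;
# every identifier below is a placeholder.
SQOOP_CMD="sqoop import \
  --connect jdbc:mysql://dbhost/source_db \
  --username sqoop_user \
  --table customers \
  --hbase-table customers \
  --column-family cf \
  --hbase-row-key id \
  --hbase-create-table"

echo "$SQOOP_CMD"
# eval "$SQOOP_CMD"   # uncomment on a machine with Sqoop and HBase configured
```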

Where to find the location of hive database when i am giving the location?

I am creating a database and I am giving the location like this:
create database talent location '/home/hadoop';
OR
create database talent location '/Input';
Input is the folder that I created in HDFS.
But when I check that location, I don't see the database there.
I can see the databases that I created without giving a location in the warehouse directory.
For anyone who arrived at this page looking to find the location on HDFS of a database your_database_name, you can get that with the following:
DESCRIBE DATABASE EXTENDED your_database_name
For example in PySpark:
sqlContext.sql('DESCRIBE DATABASE EXTENDED your_database_name').toPandas()
When you create a database without specifying a location, e.g. create database talent, it is created in the default location /user/hive/warehouse in HDFS. You can see it there with the command
hdfs dfs -ls /user/hive/warehouse
If you create a database with a location, the database is created at the given location:
create database talent location '/Input';
You can then see the database with the command
hdfs dfs -ls /Input

Unable to retain HIVE tables

I have set up a single-node Hadoop cluster on Ubuntu, running Hadoop version 2.6.
Problem:
Every time I create Hive tables and load data into them, I can see the data by querying it, but once I shut down Hadoop, the tables get wiped out. Is there any way I can retain them, or is there a setting I am missing?
I tried some solutions provided online, but nothing worked. Kindly help me out with this.
The Hive table data lives on HDFS; Hive just adds metadata and gives users SQL-like commands so they don't have to write basic MapReduce jobs. So if you shut down the Hadoop cluster, Hive can't find the data in the table.
But if you are saying the data is lost even after you restart the Hadoop cluster, that's a different problem.
It seems you are using the default embedded Derby database as the metastore. Configure an external Hive metastore instead.

Hive is not showing tables

hadoop hive question

I'm trying to create tables programmatically using JDBC. However, I can't see the tables I created from the hive shell. What's worse, when I access the hive shell from different directories, I see different contents in the database.
Is there any setting I need to configure?
Thanks in advance.
Make sure you run hive from the same directory every time, because when you launch the hive CLI for the first time, it creates a Derby metastore database in the current directory. This Derby DB contains the metadata of your Hive tables. If you change directories, you will have unorganized metadata for your Hive tables. Also, the Derby DB cannot handle multiple sessions. To allow concurrent Hive access you need a real database to manage the metastore rather than the wimpy little Derby DB that comes with it. You can install MySQL for this and change the Hive properties to use a JDBC connection with the MySQL type 4 pure Java driver.
Try emailing the Hive userlist or the IRC channel.
You probably need to set up the central Hive metastore (by default Derby, but it can be MySQL/Oracle/Postgres). The metastore is the "glue" between Hive and HDFS. It tells Hive where your data files live in HDFS, what type of data they contain, what tables they belong to, etc.
For more information, see http://wiki.apache.org/hadoop/HiveDerbyServerMode
Examine your Hadoop logs. For me this happened when my Hadoop system was not set up properly: the namenode was not able to contact the datanodes on other machines, etc.
Yeah, it's due to the metastore not being set up properly. Metastore stores the metadata associated with your Hive table (e.g. the table name, table location, column names, column types, bucketing/sorting information, partitioning information, SerDe information, etc.).
The default metastore is an embedded Derby database which can only be used by one client at any given time. This is obviously not good enough for most practical purposes. You, like most users, should configure your Hive installation to use a different metastore. MySQL seems to be a popular choice. I have used this link from Cloudera's website to successfully configure my MySQL metastore.
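As a sketch, the relevant hive-site.xml entries for a MySQL-backed metastore look roughly like this (the host, database name, and credentials below are placeholders):

```xml
<!-- hive-site.xml: point the metastore at MySQL instead of embedded Derby -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>      <!-- placeholder -->
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>  <!-- placeholder -->
</property>
```

You also need the MySQL JDBC driver jar on Hive's classpath (typically dropped into Hive's lib directory).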
