I have a data set in Parquet format saved on S3, and I would like to load it into an Oracle DB on AWS. Is there a way to transfer the data without specifying the schema of every table?
I'm new to Hive and have been reading about it online, but I still have some doubts that have not been cleared up.
For Hive external tables, Hive keeps the table's metadata within HDFS, but not in its warehouse, which is also in HDFS. Correct?
Whether the table is internal or external, in both cases the table's data will be available in HDFS only and NOWHERE else. That is, data can be taken from anywhere but has to be loaded into HDFS, because Hive uses Hadoop's processing engine to process data. Correct?
For an internal table, both the table's metadata and the table's data will be available in Hive's data warehouse, and this data warehouse is located nowhere else but in HDFS. Correct?
For an external table, the table's metadata and data will both NOT be available in Hive's data warehouse but elsewhere in HDFS. But Hive must keep some information about where the table's metadata and data are located in HDFS, correct?
Can anyone share feedback on the above understanding?
Thanks
Hive uses a relational database such as MySQL, MariaDB, PostgreSQL, Oracle, or Derby (for embedded deployment only) to store metadata (database and table definitions, statistics, grants, etc.). See deployment modes and database requirements. Whether the table is internal or external does not matter: the metadata is stored in the relational database.
Yes, the data is stored in HDFS, but Hive also supports integration with external databases through the JDBC storage handler. Such a table looks like a normal Hive table, but the data is stored in the external database, your queries are executed in that database, predicate push-down works, and you can combine native Hive tables with storage handler tables in a single query. HBase and Kafka storage handlers are also available, among others, and you can write your own storage handler.
Depending on your Hive version/vendor, it is possible to create many tables (both managed and external at the same time) on top of the same location in HDFS. Cloudera, however, prefers to keep managed tables in a dedicated HDFS location (see https://stackoverflow.com/a/67073849/2700344) and by default does not allow specifying a location for managed tables outside the warehouse root. Read about the difference between managed and external tables here.
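As a rough illustration of the JDBC storage handler mentioned above, here is a small Python helper that only builds the HiveQL text for such a table. The helper name jdbc_table_ddl, the table and column names, and the connection URL are made up for the example; the handler class and the hive.sql.* property names are the ones I believe the Hive JDBC storage handler uses, so check them against the documentation for your Hive version. Credentials are left out on purpose.

```python
def jdbc_table_ddl(hive_table, columns, db_type, driver, url, remote_table):
    """Build HiveQL for a Hive table whose rows live in an external database.

    Hypothetical helper for illustration; it only formats a string and
    does not connect to Hive or to the database.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    props = {
        "hive.sql.database.type": db_type,   # e.g. ORACLE, MYSQL, POSTGRES
        "hive.sql.jdbc.driver": driver,      # JDBC driver class name
        "hive.sql.jdbc.url": url,            # JDBC connection URL
        "hive.sql.table": remote_table,      # table name in the remote database
    }
    prop_lines = ",\n  ".join(f'"{k}" = "{v}"' for k, v in props.items())
    return (
        f"CREATE EXTERNAL TABLE {hive_table} (\n  {cols}\n)\n"
        "STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'\n"
        f"TBLPROPERTIES (\n  {prop_lines}\n)"
    )


# Example values below are placeholders, not a real environment.
ddl = jdbc_table_ddl(
    hive_table="employee_jdbc",
    columns=[("id", "int"), ("name", "string")],
    db_type="ORACLE",
    driver="oracle.jdbc.OracleDriver",
    url="jdbc:oracle:thin:@//db.example.com:1521/ORCL",
    remote_table="EMPLOYEE",
)
print(ddl)
```

The printed statement can then be pasted into beeline or another Hive client. Once the table exists, it can be joined with native Hive tables in one query, as described above.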
Everything seems correct except the last one. When you create an external table, the table metadata is still stored in Hive's metastore; otherwise you could not query the table through Hive. With an external table the data files remain under your control in the filesystem, whereas with an internal table Hive is responsible for them. Dropping an internal table drops both your data and the metadata, but dropping an external table only drops the metadata from Hive; your data remains in your filesystem. That's why we change table types a lot as a workaround when one of our external connections is not compatible with our Hive version.
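To make the drop behavior concrete, here is a toy Python model. It is not Hive code: the ToyHive class, its two dictionaries, and the paths are invented for illustration, and it only mimics the rule that dropping a managed table removes data plus metadata while dropping an external table removes metadata only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHive:
    # table name -> (kind, location); stands in for the metastore RDBMS
    metastore: dict = field(default_factory=dict)
    # location -> list of file names; stands in for HDFS/S3
    filesystem: dict = field(default_factory=dict)

    def create_table(self, name, kind, location):
        """Register a table; kind is 'managed' or 'external'."""
        self.metastore[name] = (kind, location)
        self.filesystem.setdefault(location, [])

    def drop_table(self, name):
        """Remove the metadata; remove the files only for managed tables."""
        kind, location = self.metastore.pop(name)
        if kind == "managed":
            self.filesystem.pop(location, None)


hive = ToyHive()
hive.create_table("sales", "managed", "/warehouse/sales")
hive.create_table("logs", "external", "/data/logs")
hive.filesystem["/warehouse/sales"].append("part-00000")
hive.filesystem["/data/logs"].append("2024-01-01.log")

hive.drop_table("sales")
hive.drop_table("logs")

# Both table definitions are gone, but only the external table's files survive.
remaining_tables = sorted(hive.metastore)
remaining_dirs = sorted(hive.filesystem)
print(remaining_tables, remaining_dirs)
```

After both drops the metastore is empty, and the only directory left is the external table's location, which could be reused to create another table over the same files.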
I have Oracle databases at two clients, independent of each other but with the same structure.
I want to transfer data from each client to a central Oracle DB, generating new PKs and FKs for each one while keeping the data consistent with the sequences in the central DB.
Is there any tool in Oracle DB, or another solution, for doing this?
I have a requirement to convert my Oracle DB data to Parquet. My database has multiple tables, for example Employee and Department.
Is it possible to insert the data of both tables into a single Parquet file, or do I need to create a separate Parquet file for each table?
I am a little confused about where Hive stores its data.
Does it store its data in HDFS or in an RDBMS?
Does the Hive metastore use an RDBMS to store the Hive tables' metadata?
Thanks in advance!
Hive data is stored in a Hadoop-compatible filesystem: S3, HDFS, or another compatible filesystem.
Hive metadata is stored in an RDBMS such as MySQL; see supported RDBMS.
The location of Hive table data in S3 or HDFS can be specified for both managed and external tables.
The difference between managed and external tables is that, for a managed table, the DROP TABLE statement drops the table and deletes the table's data, whereas for an external table DROP TABLE drops only the table definition; the data remains as is and can be used for creating other tables over it.
See details here: Create/Drop/Truncate Table
Here is the answer to your question, but I suggest you read a Hive book or the Apache Hive site for a better understanding.
Does it store its data in HDFS or in an RDBMS? - Hive table data is stored in HDFS (or another Hadoop-compatible filesystem), not in the RDBMS. For managed tables the data is stored by default in the Hive warehouse, which is a directory in HDFS. For a Hive external table the user can specify a location anywhere in HDFS.
Does the Hive metastore use an RDBMS to store the Hive tables' metadata? - Yes, Hive uses an RDBMS to store the metadata.
I am using Hive v0.13.
My data is stored in HDFS, and I used "CREATE EXTERNAL TABLE" to create a table over that data. Everything works fine and I can issue "select" statements. My question: under the warehouse directory (hive.metastore.warehouse.dir) I don't see any files or data being added. Is this normal? I know that with an "external" table the data is not copied to the warehouse directory, but shouldn't the table metadata be stored there?
When you create an internal table, Hive creates a directory with the table's name under the directory specified in hive.metastore.warehouse.dir. For me it is /apps/hive/warehouse.
Suppose you have created a table named test_tbl; then there will be a directory /apps/hive/warehouse/test_tbl, and Hive stores the metadata in MySQL or whichever RDBMS you configured for the metastore. When you load data using the LOAD DATA INPATH command, the files are moved into this directory.
For an external table, however, you specify a location in your CREATE statement, so Hive doesn't create any directory in the default warehouse directory because you have already provided the location. It just stores the metadata in the RDBMS.
You can load data directly into that location using the hdfs dfs -put command, and Hive will treat it as data for the table associated with that directory. Hence this is the expected behavior for an external table.
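The directory rule described above can be written down as a tiny Python function. This is only a sketch of the rule as stated in this answer: the function name table_data_dir is made up, the default warehouse path is the one from my setup, and it only covers tables in the default database.

```python
import posixpath


def table_data_dir(table_name, warehouse_dir="/apps/hive/warehouse", location=None):
    """Return the directory where a table's data files are expected.

    location=None models a table created without a LOCATION clause,
    which gets a directory named after the table under the warehouse
    directory. An explicit location models CREATE EXTERNAL TABLE ...
    LOCATION '...', where nothing is created under the warehouse.
    """
    if location is not None:
        return location
    # Hive stores table names in lower case.
    return posixpath.join(warehouse_dir, table_name.lower())


managed_dir = table_data_dir("test_tbl")
external_dir = table_data_dir("events", location="/user/me/events")
print(managed_dir)
print(external_dir)
```

For the external table the returned path is simply the one you passed in, which is why nothing new shows up under hive.metastore.warehouse.dir.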
When you create an external table, the metadata is stored in the RDBMS, i.e., in the metastore database, and the data you insert or load is stored in the directory you specified.
Whether the table is external or managed, the metadata is always in the RDBMS. When you query a table, Hive gets the table schema from the metastore and the data from HDFS, applies the schema to the data, and displays the result.
So there won't be any metadata created in the warehouse for external tables.
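The read path described above (schema from the metastore, raw data from files, schema applied at query time) can be sketched in plain Python. Nothing here is Hive API: read_rows, the CASTS table, and the sample lines are invented for illustration, and the only Hive behavior it imitates is that a value that does not fit the declared column type comes back as NULL.

```python
# Minimal type table; a real metastore knows many more types.
CASTS = {"int": int, "double": float, "string": str}


def read_rows(schema, raw_lines, delimiter=","):
    """Apply a (name, type) schema to delimited text lines at read time."""
    rows = []
    for line in raw_lines:
        fields = line.rstrip("\n").split(delimiter)
        row = {}
        for i, (name, type_name) in enumerate(schema):
            try:
                row[name] = CASTS[type_name](fields[i])
            except (ValueError, IndexError):
                # Bad or missing value: represented as NULL (None here).
                row[name] = None
        rows.append(row)
    return rows


# The schema plays the role of the metastore entry,
# the raw lines play the role of a file in HDFS.
schema = [("id", "int"), ("name", "string"), ("salary", "double")]
raw = ["1,alice,1000.5", "x,bob,2000", "3,carol"]
rows = read_rows(schema, raw)
for row in rows:
    print(row)
```

The files themselves carry no table definition, so the same directory can be read through a different schema just by creating another table over it.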