I have a problem writing a query using HiveQL.
Is it possible to join a Hive table with an Oracle table?
If yes, how?
If no, why not?
To access data stored in your Hive tables, including joining on them, you will need the Oracle Big Data Connectors.
From the documentation:
Using Oracle SQL Connector for HDFS, you can use Oracle Database to access and analyze data residing in HDFS files or a Hive table. You can also query and join data in HDFS or a Hive table with other database-resident data. If required, you can also load data into the database using SQL.
You first access Hive tables from Oracle Database via external tables. The external table definition is generated automatically from the Hive table definition. Hive table data can be accessed by querying this external table. The data can be queried with Oracle SQL and joined with other tables in the database.
In other words, the data stays in Hive, and you access that Hive table from Oracle Database through the generated external table.
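As a rough sketch of what the final join looks like on the Oracle side, assuming the connector has generated an external table called SALES_HIVE_EXT and you already have an ordinary Oracle table called CUSTOMERS (both names are made up here):

-- SALES_HIVE_EXT: Oracle external table generated from the Hive table (hypothetical name).
-- CUSTOMERS: ordinary database-resident Oracle table (hypothetical name).
SELECT c.customer_name,
       SUM(s.amount) AS total_amount
FROM   SALES_HIVE_EXT s
JOIN   CUSTOMERS c ON c.customer_id = s.customer_id
GROUP BY c.customer_name;

From Oracle's point of view the external table behaves like any other table in a query, which is what makes the join possible.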
I am a little confused about where Hive stores its data.
Does it store its data in HDFS or in an RDBMS?
Does the Hive metastore use an RDBMS to store the Hive tables' metadata?
Thanks in advance!
Hive data is stored in a Hadoop-compatible filesystem: HDFS, S3, or another compatible filesystem.
Hive metadata is stored in an RDBMS such as MySQL; see the list of supported RDBMSs.
The location of Hive table data in S3 or HDFS can be specified for both managed and external tables.
The difference between managed and external tables is in the behavior of DROP TABLE: for a managed table it drops the table and deletes the table's data, whereas for an external table it drops only the table definition and the data remains as is, so other tables can be created over it.
See details here: Create/Drop/Truncate Table
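As a minimal HiveQL sketch of that difference (table names and the HDFS path below are made up for illustration):

-- Managed table: data is stored under the Hive warehouse directory;
-- DROP TABLE removes both the metadata and the data files.
CREATE TABLE logs_managed (id INT, msg STRING);

-- External table: data lives at the given HDFS location;
-- DROP TABLE removes only the metadata, the files stay where they are.
CREATE EXTERNAL TABLE logs_external (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/demo/logs';

DROP TABLE logs_managed;   -- the table's warehouse directory is deleted too
DROP TABLE logs_external;  -- /user/demo/logs and its files are left untouched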
Here is the answer to your question, but I would suggest you read Hive books or the Apache Hive site for a better understanding.
Does it store its data in HDFS or in an RDBMS? - The data for Hive is always stored in HDFS. For managed tables the data is stored by default in the Hive warehouse, which is a directory in HDFS. For Hive external tables, the user can specify a location anywhere in HDFS.
Does the Hive metastore use an RDBMS to store the Hive tables' metadata? - Yes, Hive uses an RDBMS to store the metadata.
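If you want to check where a given table's data actually lives, Hive can show you the storage details (the table name here is just an example):

-- Prints, among other things, the Location (an HDFS path) and whether the
-- table is MANAGED_TABLE or EXTERNAL_TABLE.
DESCRIBE FORMATTED mytable;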
There are more than 300 tables in my Hive environment.
I want to export all the tables from Hive to Oracle/MySQL, including metadata.
My Oracle database doesn't have any tables corresponding to these Hive tables.
Sqoop import from Oracle to Hive creates the table in Hive if it doesn't exist, but Sqoop export from Hive to Oracle doesn't create the table if it doesn't exist and fails with an exception.
Is there any option in Sqoop to export metadata as well, or
is there any other Hadoop tool through which I can achieve this?
Thanks in advance
The feature you're asking for isn't in Sqoop. Unfortunately, I don't know of a current Hadoop tool which can do what you're asking either. A potential workaround is using the SHOW CREATE TABLE mytable statement in Hive, which returns the CREATE TABLE statement for the table. You can parse the output manually, or programmatically via awk, collect the CREATE TABLE statements in a file, translate them to Oracle syntax, and run that file against your Oracle DB. From there, you can use Sqoop to populate the tables.
It won't be fun.
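A rough sketch of the idea, with a made-up table name (the Oracle DDL at the end is a hand translation you write yourself, not something Hive or Sqoop produces):

-- In Hive: dump the DDL of one table.
SHOW CREATE TABLE mytable;
-- returns something along the lines of:
--   CREATE TABLE mytable (id int, name string, created_at timestamp) ...

-- Translated by hand into Oracle DDL and run against the Oracle DB:
CREATE TABLE mytable (
  id         NUMBER(10),
  name       VARCHAR2(4000),
  created_at TIMESTAMP
);

Once the tables exist on the Oracle side, sqoop export can populate them table by table.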
Sqoop can't copy metadata or create a table in the RDBMS on the basis of a Hive table.
The table must already exist in the RDBMS to perform a Sqoop export.
Why is it so?
Mapping from an RDBMS to Hive is easy because Hive has only a few datatypes (10-15), so mapping the many RDBMS datatypes onto Hive datatypes is easily achievable. But the reverse is not that easy: a typical RDBMS has hundreds of datatypes (and they differ between RDBMSs), so there is no single obvious target type for each Hive type (see the sketch below).
Also, Sqoop export is a newly added feature; this capability may come in the future.
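For example, a single Hive type can map to several reasonable Oracle types, so Sqoop has no safe way to pick one for you (table and column names below are made up):

-- Hive side: STRING carries no length information at all.
CREATE TABLE customers (id INT, note STRING);

-- Oracle side: each of these is a defensible translation of STRING,
-- and only you know which one fits your data.
CREATE TABLE customers (id NUMBER(10), note VARCHAR2(4000));
-- or
CREATE TABLE customers (id NUMBER(10), note CLOB);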
Can anyone please suggest how to take a backup of a Hive database? We are using MapR.
Regards
Sunilkumar
Currently I have taken backups of the Hive DB by using the Import/Export utilities that Hive provides. They back up both the metadata (Hive structure info) and the actual data.
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
TO 'export_target_path' [ FOR replication('eventid') ]
IMPORT [[EXTERNAL] TABLE new_or_original_tablename
[PARTITION (part_column="value"[, ...])]]
FROM 'source_path'
[LOCATION 'import_target_path']
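For example, to back up and later restore a single table (the table name and HDFS paths are placeholders):

-- Back up the table, metadata plus data, into an HDFS directory.
EXPORT TABLE sales TO '/backups/hive/sales';

-- Restore it later, here under a new table name.
IMPORT TABLE sales_restored FROM '/backups/hive/sales';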
But the problem with the above method is that you need to issue this statement for every individual table.
The other method is to get a list of all the available tables in the Hive DB by querying the MySQL database that holds the metadata of all the Hive tables. Refer to the TBLS table in MySQL for the list of tables.
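A minimal sketch of that metastore query, assuming a MySQL-backed metastore with the default schema and a metastore database named hive (the database name is an assumption):

-- Run against the metastore database in MySQL, not against Hive itself.
-- Lists every Hive table together with its database and table type.
SELECT d.NAME AS db_name, t.TBL_NAME, t.TBL_TYPE
FROM   hive.TBLS t
JOIN   hive.DBS  d ON d.DB_ID = t.DB_ID
ORDER BY d.NAME, t.TBL_NAME;

The resulting list can then be used to generate one EXPORT TABLE statement per table.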
External table creation via Hive JDBC isn't reflected in the Hive data warehouse, whereas normal table creation inside the Hive data warehouse happens without any issue.
After creating the table via Hive JDBC,
stmt.executeQuery("create external table trial (TOPIC STRING) row format delimited fields terminated by '' STORED as TEXTFILE LOCATION '/user/ranjitha/trial'");
no error returned.
But when I try retrieving from this table trial, nothing is returned.
Here in this link, https://groups.google.com/a/cloudera.org/forum/?fromgroups#!topic/cdh-user/YTekdFtbelE, it says external table creation not possible using HIVE JDBC.
It would be really helpful if someone could guide me on the above. Is this not possible with JDBC, or is there another alternative for the same?
Thanks