Is there a way in Sqoop to query data from a Hive table and write the result to an RDBMS table?
For example, I want to execute this query:
SELECT MAX(DATE) FROM hivedbname.hivetablename
and write (insert or update) the result (in this case, the maximum date) to a table in a MySQL database.
I know that we can use Python or any other programming language to achieve this, but I just want to know whether this is possible with Sqoop.
Thanks
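Sqoop itself cannot run a HiveQL query; what is usually done instead is to let Hive write the query result to an HDFS directory and then sqoop export that directory. A minimal sketch, assuming Hive 0.11+, a scratch directory /tmp/max_date_export, and an existing MySQL table max_date_table (all names here are placeholders):

# write the single-row result to HDFS, then export that directory to MySQL
hive -e "INSERT OVERWRITE DIRECTORY '/tmp/max_date_export'
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         SELECT MAX(DATE) FROM hivedbname.hivetablename"

sqoop export \
  --connect jdbc:mysql://mysqlhost/mydb \
  --username myuser --password mypass \
  --table max_date_table \
  --export-dir /tmp/max_date_export \
  --input-fields-terminated-by ','

For the update case, sqoop export also accepts --update-key and --update-mode allowinsert, which turns the export into an upsert.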
There are more than 300 tables in my hive environment.
I want to export all the tables from Hive to Oracle/MySQL, including metadata.
My Oracle database doesn't have any tables corresponding to these Hive tables.
Sqoop import from Oracle to Hive creates the table in Hive if it doesn't exist, but Sqoop export from Hive to Oracle doesn't create the table if it doesn't exist and fails with an exception.
Is there any option in Sqoop to export the metadata as well? Or
is there any other Hadoop tool through which I can achieve this?
Thanks in advance
The feature you're asking for isn't in Sqoop, and unfortunately I don't know of a current Hadoop tool that can do what you're asking either. A potential workaround is the SHOW CREATE TABLE mytable statement in Hive, which returns the CREATE TABLE statement. You can parse this manually or programmatically via awk, collect the create statements in a file, run that file against your Oracle DB, and from there use Sqoop to populate the tables (rough sketch below).
It won't be fun.
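A rough sketch of that workaround, assuming a Hive database named mydb (hypothetical) and that the generated DDL still needs manual or awk/sed edits before Oracle will accept it:

# dump every Hive CREATE TABLE statement for database mydb into one file
for t in $(hive -S -e "USE mydb; SHOW TABLES;"); do
  hive -S -e "USE mydb; SHOW CREATE TABLE $t;" >> hive_ddl.sql
  echo ";" >> hive_ddl.sql
done
# rewrite hive_ddl.sql (datatypes, ROW FORMAT / STORED AS clauses, ...) into valid Oracle DDL,
# run it against Oracle, then use sqoop export to populate each table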
Sqoop can't copy metadata or create a table in the RDBMS based on a Hive table.
The table must already exist in the RDBMS to perform a Sqoop export.
Why is it so?
Mapping from an RDBMS to Hive is easy because Hive has only a few datatypes (10-15), so mapping the many RDBMS datatypes onto Hive datatypes is easily achievable. The reverse is not that easy: a typical RDBMS has hundreds of datatypes, and they differ from one RDBMS to another.
Also, Sqoop export is a newly added feature, so this may come in the future.
Can anyone please suggest how to take a backup of a Hive database? We are using MapR.
Regards
Sunilkumar
Currently I take backups of the Hive DB using the EXPORT/IMPORT utilities that Hive provides. They back up both the metadata (the Hive table structure) and the actual data.
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
TO 'export_target_path' [ FOR replication('eventid') ]
IMPORT [[EXTERNAL] TABLE new_or_original_tablename
[PARTITION (part_column="value"[, ...])]]
FROM 'source_path'
[LOCATION 'import_target_path']
But the problem with the above method is that you have to issue this statement for every individual table.
The other method is to get a list of all the tables in the Hive DB by querying the MySQL database that holds the Hive metastore; refer to the TBLS table in MySQL for the list of tables.
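Combining the two, a small loop can export everything; a sketch, assuming a database named mydb and a writable HDFS path /backup/hive (both hypothetical):

# export every table of Hive database mydb, metadata plus data, to HDFS
for t in $(hive -S -e "USE mydb; SHOW TABLES;"); do
  hive -e "USE mydb; EXPORT TABLE $t TO '/backup/hive/mydb/$t';"
done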
I have a problem writing a query using HiveQL.
Is it possible to join a Hive table with an Oracle table?
If yes, how?
If no, why?
To access data stored in your Hive tables, including joining on them, you will need an Oracle Big Data connector.
From the documentation:
Using Oracle SQL Connector for HDFS, you can use Oracle Database to access and analyze data residing in HDFS files or a Hive table. You can also query and join data in HDFS or a Hive table with other database-resident data. If required, you can also load data into the database using SQL.
You first access Hive tables from Oracle Database via external tables. The external table definition is generated automatically from the Hive table definition. Hive table data can be accessed by querying this external table. The data can be queried with Oracle SQL and joined with other tables in the database.
In other words, once the external table has been generated, you can access the Hive table's data directly from Oracle Database.
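For example, once the connector has generated an external table over the Hive table (here called hive_sales_ext, a made-up name), joining it with an ordinary Oracle table such as customers is plain Oracle SQL; a sketch:

-- join the Hive-backed external table with a regular Oracle table
SELECT c.customer_name, SUM(h.amount) AS total_amount
FROM   hive_sales_ext h
JOIN   customers c ON c.customer_id = h.customer_id
GROUP BY c.customer_name;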
I have an Oracle table which has 80 columns and is partitioned on the state column. My requirement is to create a Hive table with a similar schema, partitioned on state.
I tried using the sqoop --create-hive-table option, but I keep getting an error:
ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: Partition key state cannot be a column to import.
I understand that in Hive the partition column should not be part of the table definition, but then how do I get around the issue?
I do not want to write the CREATE TABLE statements manually, as I have 50 such tables to import and would like to use Sqoop.
Any suggestion or ideas?
Thanks
There is a workaround for this.
Below is the procedure I follow:
On Oracle, run a query to get the schema of the table and store it in a file.
Move that file to Hadoop.
On Hadoop, create a shell script which constructs an HQL file.
That HQL file contains the Hive CREATE TABLE statement along with the columns. For this we can use the file from the first step (the Oracle schema file copied to Hadoop).
For this script to run you just need to pass the Hive database name, table name, partition column name, path, etc., depending on your level of customization. At the end of the shell script, add hive -f <HQL filename> (see the sketch below).
If everything is ready, each table creation takes just a couple of minutes.
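A minimal sketch of such a generator script, assuming the Oracle schema was dumped as one "COLUMN_NAME DATATYPE" pair per line and crudely mapping every column to STRING (all names and the type mapping are placeholders to adapt):

#!/bin/bash
# usage: ./create_hive_table.sh <hive_db> <table> <partition_col> <schema_file>
DB=$1; TABLE=$2; PART_COL=$3; SCHEMA_FILE=$4
HQL=/tmp/create_${TABLE}.hql

{
  echo "CREATE TABLE ${DB}.${TABLE} ("
  # keep every column except the partition column; map each type to STRING for simplicity
  grep -iv "^${PART_COL} " "${SCHEMA_FILE}" | awk '{printf "  %s STRING,\n", $1}' | sed '$ s/,$//'
  echo ")"
  echo "PARTITIONED BY (${PART_COL} STRING)"
  echo "STORED AS TEXTFILE;"
} > "${HQL}"

hive -f "${HQL}"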
My Sqoop query is stored in a DB. I need to fetch that query and execute it.
In turn, that query will archive the Oracle tables and store them in HDFS.
Can anyone please let me know how I can retrieve the row from the DB (containing the Sqoop query) and execute it?
I am new to Hadoop/Sqoop/Hive.
You need programming language support, or you can use a shell script.
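A minimal shell sketch, assuming the Sqoop command text sits in a hypothetical MySQL table jobs.sqoop_jobs with columns job_name and command_text:

# fetch the stored Sqoop command from the metadata table and execute it
CMD=$(mysql -h dbhost -u dbuser -p'dbpass' -N -B -e \
  "SELECT command_text FROM jobs.sqoop_jobs WHERE job_name='archive_orders'")

echo "Running: $CMD"
eval "$CMD"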