My Sqoop query is stored in a DB. I need to fetch that query and execute it. In turn, that query will archive the Oracle tables and store them in HDFS.
Can anyone please let me know how I can retrieve the row from the DB (containing the Sqoop query) and execute it?
I am new to Hadoop/Sqoop/Hive.
You need support from a programming language, or you can use a shell script.
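A minimal shell sketch of that approach, assuming the query is kept in a hypothetical metadb.sqoop_jobs table in MySQL (all host names, credentials, and table/column names here are placeholders):

# fetch the stored free-form query (-N suppresses column headers)
QUERY=$(mysql -N -h meta-host -u metauser -pmetapass \
  -e "SELECT sqoop_query FROM metadb.sqoop_jobs WHERE job_id = 1")

# run it against Oracle; a free-form --query must contain the
# literal $CONDITIONS token, so the stored query should include it
sqoop import \
  --connect jdbc:oracle:thin:@//oracle-host:1521/ORCL \
  --username scott --password-file /user/scott/.pw \
  --query "$QUERY" \
  --target-dir /archive/oracle_tables \
  -m 1

With -m 1 no --split-by column is needed; for a parallel import you would add one.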
I have some tables with a huge number of columns (more than 600) and don't have the DDL (CREATE TABLE) scripts for them. I can create the scripts by inspecting the table schema with the DESCRIBE keyword in Oracle NoSQL, but that is a huge pain because of the manual work involved.
Is there any way to generate DDL scripts for existing tables in an Oracle NoSQL database?
There is no way to do this at this time. It's being considered as an extension to the SQL shell. One option in user code would be to use the "describe as json ..." call and parse the JSON description of the table to construct the DDL string.
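For example, assuming you have captured the output of describe as json table users into users.json, a rough jq sketch (the JSON field names below — name, fields, type, primaryKey — are assumptions based on the documented format and may differ by release):

# rebuild a CREATE TABLE statement from the JSON table description
jq -r '"CREATE TABLE \(.name) ("
  + ([.fields[] | "\(.name) \(.type)"] | join(", "))
  + ", PRIMARY KEY (" + (.primaryKey | join(", ")) + "))"' users.json

Complex types (records, maps, arrays) would need extra handling; this only covers flat schemas.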
Assume I have an Oracle view created by joining a few other tables. Is it possible to import the view data into Hive?
Thanks in advance
Yes, it is possible: a view is just another queryable object to Sqoop, and if we can fire a query against it, we can fetch the data via Sqoop. You can also check this with the sqoop eval command by running a query to see whether the view is accessible or not.
sqoop eval (generic-args) (eval-args)
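For example, a hedged sketch (the connection URL, credentials, and view name are placeholders):

# confirm the view is reachable
sqoop eval \
  --connect jdbc:oracle:thin:@//oracle-host:1521/ORCL \
  --username scott --password-file /user/scott/.pw \
  --query "SELECT * FROM MY_SCHEMA.MY_VIEW WHERE ROWNUM <= 5"

# then import it like a table; a view has no primary key, so either
# pass --split-by <column> or run a single mapper with -m 1
sqoop import \
  --connect jdbc:oracle:thin:@//oracle-host:1521/ORCL \
  --username scott --password-file /user/scott/.pw \
  --table MY_SCHEMA.MY_VIEW \
  --hive-import --hive-table mydb.my_view \
  -m 1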
I need to hash certain columns (like email) while copying MySQL tables to HDFS using Sqoop.
Is there a built-in option in sqoop?
If not, how can this be achieved?
EDIT-1
Currently I could think of only a very crude way to achieve this: passing a SQL query (instead of a table name) like the following to Sqoop
SELECT
`name`,
SHA1(`email`) AS `email`,
`dob`
FROM
`my_db`.`users`
Not sure if this would work at all [will update once I've tried]
Even if it works, it (most probably) would require generating a SQL query specific to the underlying DB (MySQL, PostgreSQL, etc.)
Is there a built-in option in sqoop?
No
If not, how can this be achieved?
Approach-1: use a SQL query, as already described in the question
Approach-2: another straightforward way would be to perform a 2-step import (see the sketch after this list)
do a sqoop import into a Hive temp table
create a new Hive table from this temp table and perform the hashing in the process (a good approach would be Hive CTAS)
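A hedged sketch of both approaches (hosts, DB names, and credentials are placeholders; note that Sqoop's free-form --query requires the literal $CONDITIONS token, hence the single quotes):

# Approach-1: hash in the source DB during the import
sqoop import \
  --connect jdbc:mysql://mysql-host/my_db \
  --username user --password-file /user/me/.pw \
  --query 'SELECT name, SHA1(email) AS email, dob FROM users WHERE $CONDITIONS' \
  --split-by dob \
  --target-dir /data/users_hashed

# Approach-2: plain import into a temp table, then hash with a Hive CTAS
# (Hive ships a sha1() UDF in recent versions; sha2() is an alternative)
sqoop import \
  --connect jdbc:mysql://mysql-host/my_db \
  --username user --password-file /user/me/.pw \
  --table users \
  --hive-import --hive-table staging.users_tmp
hive -e "CREATE TABLE mydb.users AS
         SELECT name, sha1(email) AS email, dob FROM staging.users_tmp;"

Approach-1 keeps the raw emails from ever landing in HDFS but ties the query to the source DB's dialect; Approach-2 is portable but stages the unhashed data temporarily.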
Is there a way in Sqoop to query data from a Hive table and write the result to an RDBMS table?
For example - I want to execute this query
SELECT MAX(DATE) FROM hivedbname.hivetablename
and write (insert or update) the result (in this case, the maximum date) to a table in a MySQL DB.
I know that we can use Python or any other programming language to achieve this. But I just want to know: is this possible with Sqoop?
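For reference, a minimal shell sketch of that programmatic workaround (host names, credentials, and the target watermarks table are hypothetical):

# run the aggregate in Hive; -S suppresses log noise so only the value prints
MAX_DATE=$(hive -S -e "SELECT MAX(\`date\`) FROM hivedbname.hivetablename")

# push the single value into MySQL
mysql -h mysql-host -u user -ppass \
  -e "REPLACE INTO mydb.watermarks (name, max_date) VALUES ('hivetable', '$MAX_DATE')"

Sqoop itself only exports HDFS directories (sqoop export --export-dir ...), so a single aggregated value is usually easier to move with a small script like this.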
Thanks
There are more than 300 tables in my hive environment.
I want to export all the tables from Hive to Oracle/MySQL, including metadata.
My Oracle database doesn't have any tables corresponding to these Hive tables.
Sqoop import from Oracle to Hive creates tables in Hive if the table doesn't exist. But Sqoop export from Hive to Oracle doesn't create the table if it doesn't exist, and fails with an exception.
Is there any option in Sqoop to export metadata also? or
Is there any other Hadoop tool through which I can achieve this?
Thanks in advance
The feature you're asking for isn't in Sqoop. I don't know of a current Hadoop tool which can do what you're asking either, unfortunately. A potential workaround is using the "show create table mytable" statement in Hive. It will return the CREATE TABLE statements. You can parse these manually or programmatically via awk, collect the CREATE TABLE statements in a file, then run this file against your Oracle DB. From there, you can use Sqoop to populate the tables.
It won't be fun.
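A rough sketch of the extraction loop (the database name is a placeholder):

# dump CREATE TABLE statements for every table in one Hive database
for t in $(hive -S -e 'SHOW TABLES IN mydb'); do
  hive -S -e "SHOW CREATE TABLE mydb.$t;" >> create_tables.hql
done

The emitted DDL is HiveQL, so the types (STRING, SerDe clauses, etc.) still need translating into Oracle's dialect before the file can be run there; that's the un-fun part.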
Sqoop can't copy metadata or create a table in the RDBMS on the basis of a Hive table.
The table must already exist in the RDBMS to perform a sqoop export.
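For example, a hedged sketch (connection details and table names are placeholders; the target table is created up front on the Oracle side):

# pre-create the target, e.g. in SQL*Plus:
#   CREATE TABLE USERS (NAME VARCHAR2(100), EMAIL VARCHAR2(255), DOB DATE);
sqoop export \
  --connect jdbc:oracle:thin:@//oracle-host:1521/ORCL \
  --username hr --password-file /user/hr/.pw \
  --table USERS \
  --export-dir /user/hive/warehouse/mydb.db/users \
  --input-fields-terminated-by '\001'

\001 (Ctrl-A) is Hive's default field delimiter for text tables; adjust it if your table was stored differently.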
Why is it so?
Mapping from an RDBMS to Hive is easy because Hive has only a few data types (10-15), so mapping the many RDBMS data types onto Hive's is easily achievable. But the reverse is not that easy: a typical RDBMS has hundreds of data types (and they differ from one RDBMS to another).
Also, sqoop export is a newly added feature; metadata export may come in the future.