Sqoop export from Hcatalog to MySQL with different col names assign - hadoop

Now my hive table with columns - id, name
and MySQL table - number, id, name
I want to map id (from hive) with number (from mysql), name (from hive) with id (from mysql).
I use the command :
sqoop export --hcatalog-database <my_db> --hcatalog-table <my_table> --columns "number,id" \
--connect jdbc:mysql://db...:3306/test \
--username <my_user> --password <my_passwd> --table <my_mysql_table>
However, it didn't work.
The same scenario liked this case can work fine [1]. The requirement can be fulfilled by locating the hive table on hdfs and using the following command to achieve.
sqoop export --export-dir /[hdfs_path] --columns "number,id" \
--connect jdbc:mysql://db...:3306/test \
--username <my_user> --password <my_passwd> --table <my_mysql_table>
Is there any solution can fulfill my scenario via Hcatalog?
reference :
[1]. Sqoop export from hive to oracle with different col names, number of columns and order of columns

I didn't used the hcatalog part of sqoop, but as is written in the manual, the next script should do the work:
sqoop export --hcatalog-database <my_db> --hcatalog-table <my_table> --map-column-hive "number,id" \
--connect jdbc:mysql://db...:3306/test \
--username <my_user> --password <my_passwd> --table <my_mysql_table>
This option: --map-column-hive when is used along with --hcatalog, does the work for hcatalog instead of hive.
Hope that this works for you.

Related

sqoop import as parquet file to target dir, but can't find the file

I have been using sqoop to import data from mysql to hive, the command I used are below:
sqoop import --connect jdbc:mysql://localhost:3306/datasync \
--username root --password 654321 \
--query 'SELECT id,name FROM test WHERE $CONDITIONS' --split-by id \
--hive-import --hive-database default --hive-table a \
--target-dir /tmp/yfr --as-parquetfile
The Hive table is created and the data is inserted, however I can not find the parquet file.
Does anyone know?
Best regards,
Feiran
Sqoop import to hive works in 2 steps:
Fetching data from RDBMS to HDFS
Create hive table if not exists and Load data into hive table
In your case,
firstly, data is stored at --target-dir i.e. /tmp/yfr
Then, it is loaded into Hive table a using
LOAD DATA INPTH ... INTO TABLE..
command.
As mentioned in the comments, data is moved to hive warehouse directory that's why there is no data in --target-dir.

Incrimental update in HIVE table using sqoop

I have a table in oracle with only 4 columns...
Memberid --- bigint
uuid --- String
insertdate --- date
updatedate --- date
I want to import those data in HIVE table using sqoop. I create corresponding HIVE table with
create EXTERNAL TABLE memberimport(memberid BIGINT,uuid varchar(36),insertdate timestamp,updatedate timestamp)LOCATION '/user/import/memberimport';
and sqoop command
sqoop import --connect jdbc:oracle:thin:#dbURL:1521/dbName --username ** --password *** --hive-import --table MEMBER --columns 'MEMBERID,UUID,INSERTDATE,UPDATEDATE' --map-column-hive MEMBERID=BIGINT,UUID=STRING,INSERTDATE=TIMESTAMP,UPDATEDATE=TIMESTAMP --hive-table memberimport -m 1
Its working properly and able to import data in HIVE table.
Now I want to update this table with incremental update with updatedate (last value today's date) so that I can get day to day update for that OLTP table into my HIVE table using sqoop.
For Incremental import I am using following sqoop command
sqoop import --hive-import --connect jdbc:oracle:thin:#dbURL:1521/dbName --username *** --password *** --table MEMBER --check-column UPDATEDATE --incremental append --columns 'MEMBERID,UUID,INSERTDATE,UPDATEDATE' --map-column-hive MEMBERID=BIGINT,UUID=STRING,INSERTDATE=TIMESTAMP,UPDATEDATE=TIMESTAMP --hive-table memberimport -m 1
But I am getting exception
"Append mode for hive imports is not yet supported. Please remove the parameter --append-mode"
When I remove the --hive-import it run properly but I did not found those new update in HIVE table that I have in OLTP table.
Am I doing anything wrong ?
Please suggest me how can I run incremental update with Oracle - Hive using sqoop.
Any help will be appropriated..
Thanks in Advance ...
Although i don't have resources to replicate your scenario exactly.
You might want to try building a sqoop job and test your use case.
sqoop job --create sqoop_job \
-- import \
--connect "jdbc:oracle://server:port/dbname" \
--username=(XXXX) \
--password=(YYYY) \
--table (TableName)\
--target-dir (Hive Directory corresponding to the table) \
--append \
--fields-terminated-by '(character)' \
--lines-terminated-by '\n' \
--check-column "(Column To Monitor Change)" \
--incremental append \
--last-value (last value of column being monitored) \
--outdir (log directory)
when you create a sqoop job, it takes care of --last-value for subsequent runs. Also here i have used the Hive table's data file as target for incremental update.
Hope this provides a helpful direction to proceed.
There is no direct way to achieve this in Sqoop. However you can use 4 Step Strategy.

Using sqoop to create an orc table

I have a DB2 database running with a table t1, and a hadoop cluster. I want to create an orc table in hadoop, with the same table definition as t1.
For this task I want to use Sqoop.
I tried using the sqoop create-hive-table command, but this command isn't compatible with hcatalog - and from what I've found, hcatalog is the only command, that allows me to create orc tables.
Instead i do this:
sqoop import \
--driver com.ibm.db2.jcc.DB2Driver \
--connect jdbc:db2://XXXXXXX \
--username user \
--password-file file:///pass.txt \
--query "select * from D1.t1 where \$CONDITIONS and reptime < '1864-11-16 13:23:54.749' fetch first 1 rows only" \
--split-by 1 \
--hcatalog-database default \
--hcatalog-table t1 \
--create-hcatalog-table \
--hcatalog-storage-stanza "stored as orcfile"
Which queries the database about somthing that does not exist and creates an orc table. Of course, this isn't optimal - any ideas on how to do this with sqoop create-hive-table, or at least without having to do a useless database query returning nothing?

Appending Data to hive Table using Sqoop

I am trying to append data to already existing Table in hive.Using the Following command first i import the table from MS-SQL Server to hive.
Sqoop Command:
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" --table "my_table" --where "Batch_Id > 100" --username myuser --password mypassword --hive-import
Now i want to append the data to same existing table in hive where "Batch_Id < 100"
I am using the following Command:
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" --table "my_table" --where "Batch_Id < 100" --username myuser --password mypassword --append --hive-table my_table
This command however runs successfully also updates the HDFS data, but when u connect to hive shell and query the table, the records which are appended are not visible.
Sqoop updated the Data on hdfs "/user/hduser/my_table" but the data on "/user/hive/warehouse/batch_dim" is not updated.
How can reslove this issue.
Regards,
Bhagwant Bhobe
Try using
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase"
--table "my_table" --where "Batch_Id < 100"
--username myuser --password mypassword
--hive-import --hive-table my_table
when you are using --hive-import DO NOT use --append parameter.
The Sqoop command you're using (--import) is only for ingesting records into HDFS. You need to use the --hive-import flag to import records into Hive.
See http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hive for more details and for additional import configuration options (you may want to change the document reference to your version of Sqoop, of course).

How to use a specified Hive database when using Sqoop import

sqoop import --connect jdbc:mysql://remote-ip/db --username xxx --password xxx --table tb --hive-import
The above command imports table tb into the 'default' Hive database.
Can I use other database instead?
Off the top of my head i recall you can specify --hive-table foo.tb
where foo is your hive database and tb is your hive table.
so in your case it would be:
sqoop import --connect jdbc:mysql://remote-ip/db --username xxx --password xxx --table tb --hive-import --hive-table foo.tb
As a footnote, here is the original jira issue https://issues.apache.org/jira/browse/SQOOP-322
Hive database using Sqoop import:
sqoop import --connect jdbc:mysql://localhost/arun --table account --username root --password root -m 1 --hive-import **--hive-database** company **--create-hive-table --hive-table** account --target-dir /tmp/customer/ac
You can specify the database name as a part of the --hive-table parameter, e.g. "--hive-table foo.tb".
There is a new request to add a special parameter for the database that is being tracked: SQOOP-912.

Resources