Sqoop bulk export to SQL Server - sqoop

I'm trying to export 21046329 rows into SQL Server, but it is very slow. I tried to use bulk insert, but it doesn't work. Looking at SQL Server Profiler, I saw that it inserts only one row per statement. I then took a look at the Sqoop mapper and saw that it generates SQL queries like this:
'org.apache.sqoop.mapreduce.sqlserver.SqlServerExportBatchOutputFormat:
Using query INSERT INTO [test3] ([col1], [col2], [col3], [col4],
[col5], [col6], [col7]) VALUES (?, ?, ?, ?, ?, ?, ?)'
Sqoop export command:
sqoop export
--connect 'jdbc:sqlserver://server:port;database=test2;EnableBulkLoad=true;BulkLoadBatchSize=100024;BulkLoadOptions=0'
--username test --password pass --table 'test3'
--export-dir /exportDirFromHDFS --input-lines-terminated-by "\n"
--input-fields-terminated-by ','
--batch -m 10
Does someone know how to solve this?

Try to use the property sqoop.export.records.per.statement to
specify the number of records that will be used in each insert statement:
sqoop export \
-Dsqoop.export.records.per.statement=10 \
--connect
...
As a result, Sqoop will create the following query:
INSERT INTO table VALUES (...), (...), (...), ...;
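For concreteness, here is a sketch of the question's export command with that property added. The numbers are illustrative rather than recommended settings, sqoop.export.statements.per.transaction is an optional companion property, and the exact interaction with --batch and the SQL Server connector can vary by Sqoop version:
sqoop export \
-Dsqoop.export.records.per.statement=100 \
-Dsqoop.export.statements.per.transaction=10 \
--connect 'jdbc:sqlserver://server:port;database=test2' \
--username test --password pass --table 'test3' \
--export-dir /exportDirFromHDFS \
--input-fields-terminated-by ',' --input-lines-terminated-by '\n' \
--batch -m 10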

Related

Using teradata fast export within sqoop command

I'm having an issue sqooping from a Teradata database when using the Teradata "--fast-export" method; an example sqoop command is below:
sqoop import \
-Dhadoop.security.credential.provider.path=jceks:/PATH/TO/password/password.jcecks \
-Dteradata.db.job.data.dictionary.usexviews=false \
--connect jdbc:teradata://DATABASE \
--password-alias password.alias \
--username USER \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--fields-terminated-by '\t' \
--lines-terminated-by '\n' \
--null-non-string '' \
--null-string '' \
--num-mappers 8 \
--split-by column3 \
--target-dir /THE/TARGET/DIR \
--query "SELECT column1,column2,column3 WHERE column3 > '2020-01-01 00:00:00' and column3 <= '2020-01-12 10:41:20' AND \$CONDITIONS" \
-- \
--method internal.fastexport
The error I am getting is
Caused by: com.teradata.connector.common.exception.ConnectorException: java.sql.SQLException: [Teradata Database] [TeraJDBC ] [Error 3524] [SQLState 42000] The user does not have CREATE VIEW access to database DATABASE.
I suspect fast export works by temporarily creating a staging table/view, with the job under the hood ingesting from that temporary object. Is this a Sqoop mechanism, and is it possible to turn it off?
Many thanks
Dan
Fast export itself does not create any view to extract data. The view is created by Sqoop based on the --query value. Hence, the user running the job must have the CREATE VIEW (CV) right granted on DATABASE.
You can check the user's rights on the database by running the query below, replacing USER_NAME and DATABASE_NAME with their values in your environment. ACCESS_RIGHT = 'CV' means CREATE VIEW, so leave that condition as it is.
SELECT *
FROM dbc.allRoleRights WHERE roleName IN
(SELECT roleName FROM dbc.roleMembers WHERE grantee = 'USER_NAME')
AND DATABASENAME = 'DATABASE_NAME'
AND ACCESS_RIGHT = 'CV'
ORDER BY 1,2,3,5;
You may also need the CT (CREATE TABLE) right in order to create the log table for fast export; this is specified via the Sqoop parameters --error-table and --error-database.
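To check for the CT right in the same way, the query above can be reused with ACCESS_RIGHT = 'CT' (same USER_NAME and DATABASE_NAME placeholders):
SELECT *
FROM dbc.allRoleRights WHERE roleName IN
(SELECT roleName FROM dbc.roleMembers WHERE grantee = 'USER_NAME')
AND DATABASENAME = 'DATABASE_NAME'
AND ACCESS_RIGHT = 'CT'
ORDER BY 1,2,3,5;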

How to pass a string value in sqoop free form query

I need to import data from a few different SQL Servers which have the same tables, table structure, and even primary key values. So, to uniquely identify a record ingested from a given SQL Server, say "S1", I want to have an extra column, say "serverName", in my Hive tables. How should I add this in my Sqoop free-form query?
All I want to do is pass a hardcoded value along with the list of columns, such that the hardcoded column value gets stored in Hive. Once done, I can take care of dynamically changing this value depending on the source server.
sqoop import --connect "connDetails" --username "user" --password "pass" --query "select col1, col2, col3, 'S1' from table where \$CONDITIONS" --hive-import --hive-overwrite --hive-table stg.T1 --split-by col1 --as-textfile --target-dir T1 --hive-drop-import-delims
S1 is the hardcoded value here. I am thinking the SQL way, where a hardcoded value in the select list is returned as part of the query result. Any pointers on how to get this done?
Thanks in advance.
SOLVED: It just needed an alias for the hardcoded value. The sqoop command executed is:
sqoop import --connect "connDetails" --username "user" --password "pass" --query "select col1, col2, col3, 'S1' as serverName from table where \$CONDITIONS" --hive-import --hive-overwrite --hive-table stg.T1 --split-by col1 --as-textfile --target-dir T1 --hive-drop-import-delims
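Since the per-server value is meant to change, one way to drive it is a small shell loop like the sketch below. This is purely illustrative: the server list, connection strings, and target directories are placeholders, and --hive-overwrite is dropped so each server's load appends to the same Hive table instead of replacing the previous one.
# hypothetical list of source servers; substitute real connection details per server
for SRV in S1 S2; do
  sqoop import --connect "connDetails_${SRV}" --username "user" --password "pass" \
    --query "select col1, col2, col3, '${SRV}' as serverName from table where \$CONDITIONS" \
    --hive-import --hive-table stg.T1 --split-by col1 --as-textfile \
    --target-dir "T1_${SRV}" --hive-drop-import-delims
done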

Hadoop-Sqoop import without an integer value using split-by

I am importing data from MemSQL to HDFS using Sqoop. My source table in MemSQL doesn't have any integer column, so I created a new table that adds a new column 'test' to the existing columns.
Following is the query:
sqoop import --connect jdbc:mysql://XXXXXXXXX:3306/db_name --username XXXX --password XXXXX --query "select closed,extract_date,open,close,cast(floor(rand()*1000000 as int) as test from tble_name where \$CONDITIONS" --target-dir /user/XXXX --split-by test;
this query gave me following error :
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'as int) as test from table_name where (1 = 0)' at line 1
I tried it another way as well:
sqoop import --connect jdbc:mysql://XXXXX:3306/XXXX --username XXXX --password XXXX --query "select closed,extract_date,open,close,ceiling(rand()*1000000) as test from table_name where \$CONDITIONS" --target-dir /user/dfsdlf --split-by test;
With this query the job gets executed, but no data is transferred. It says the split-by column is of float type and asks to change it to an integer type.
Please help me change the split-by column from float type to integer type.
The problem mostly seems to be related to using an alias as the --split-by parameter.
If you need to split on that particular expression, you can run the query
'select closed,extract_date,open,close,ceiling(rand()*1000000) from table_name' in the console, note the column name the console reports for that expression, and use it as --split-by 'complete_column_name_from_console' (here it should be --split-by 'ceiling(rand()*1000000)').
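Applying that suggestion to the second command from the question, the import might look like the sketch below; whether an expression is accepted in --split-by can depend on the connector and Sqoop version:
sqoop import --connect jdbc:mysql://XXXXX:3306/XXXX --username XXXX --password XXXX \
--query "select closed,extract_date,open,close,ceiling(rand()*1000000) as test from table_name where \$CONDITIONS" \
--target-dir /user/dfsdlf \
--split-by 'ceiling(rand()*1000000)'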

Error unrecognized argument --hive-partition-key

I am getting the error Unrecognized argument --hive-partition-key when I run the following statement:
sqoop import
--connect 'jdbc:sqlserver://192.168.56.1;database=xyz_dms_cust_100;username-hadoop;password=hadoop'
--table e_purchase_category
--hive_import
--delete-target-dir
--hive-table purchase_category_p
--hive-partition-key "creation_date"
--hive-partition-value "2015-02-02"
The partitioned table exists.
The Hive partition key (creation_date in your example) should not be part of your database table when you are using hive-import. When you create a partitioned table in Hive, you do not include the partition column in the table schema, and the same applies to Sqoop's hive-import.
Based on your sqoop command, I am guessing that the creation_date column is present in your SQL Server table. If so, you might be getting this error:
ERROR tool.ImportTool: Imported Failed:
Partition key creation_date cannot be a column to import.
To resolve this issue, I have two solutions:
Make sure that the partition column is not present in the SQL Server table. Then, when Sqoop creates the Hive table, it adds that partition column and its value as a directory in the Hive warehouse.
Change the sqoop command to use a free-form query that selects all the columns except the partition column, and do the hive-import. Below is an example of this solution.
Example:
sqoop import
--connect jdbc:mysql://localhost:3306/hadoopexamples
--query 'select City.ID, City.Name, City.District, City.Population from City where $CONDITIONS'
--target-dir /user/XXXX/City
--delete-target-dir
--hive-import
--hive-table City
--hive-partition-key "CountryCode"
--hive-partition-value "USA"
--fields-terminated-by ','
-m 1
Another method:
You can also try to do your task in different steps:
Create a partitioned table in Hive (example: city_partition)
Load data from the RDBMS into a plain Hive table (example: city) using Sqoop's hive-import (see the sketch after the SQL below)
Using INSERT OVERWRITE, load the data into the partitioned table (city_partition) from the plain Hive table (city), like:
INSERT OVERWRITE TABLE city_partition
PARTITION (CountryCode='USA')
SELECT id, name, district, population FROM city;
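For steps 1 and 2, a rough sketch follows; the column types for the city table are assumptions based on the SELECT above, and the connection string reuses the earlier example, so adjust both to your environment.
Step 1 (Hive):
CREATE TABLE city_partition (id INT, name STRING, district STRING, population INT)
PARTITIONED BY (CountryCode STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Step 2 (Sqoop):
sqoop import
--connect jdbc:mysql://localhost:3306/hadoopexamples
--table City
--hive-import
--hive-table city
--fields-terminated-by ','
-m 1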
It can be applied this way too:
sqoop import --connect jdbc:mysql://localhost/akash
--username root
-P
--table mytest
--where "dob='2019-12-28'"
--columns "id,name,salary"
--target-dir /user/cloudera/
-m 1 --hive-table mytest
--hive-import
--hive-overwrite
--hive-partition-key dob
--hive-partition-value '2019-12-28'

sqoop free form query to import n records from a table

I'm trying to import 50 records from a single table using the following query:
sqoop import --connect jdbc:mysql://xxxxxxx/db_name --username yyyyy --query 'select * from table where (id <50) AND $CONDITIONS' --target-dir /user/tmp/ -P
I'm getting an error with this query.
Any ideas?
I removed the parentheses in the WHERE clause and it worked. When using two or more logical operators, wrap them in parentheses; otherwise it doesn't work.
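For reference, the form that worked according to the answer (same placeholder connection values as in the question) would be along these lines:
sqoop import --connect jdbc:mysql://xxxxxxx/db_name --username yyyyy --query 'select * from table where id < 50 AND $CONDITIONS' --target-dir /user/tmp/ -P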
