Error in TiDB: `java.sql.BatchUpdateException: statement count 5001 exceeds the transaction limitation` - sqoop

When I was using Sqoop to write data into TiDB in batches, I ran into the following error:
java.sql.BatchUpdateException: statement count 5001 exceeds the transaction limitation
I had already configured the --batch option, but the error still occurred. How can I resolve it?

In Sqoop, --batch means committing 100 statements in each batch, and by default each statement contains 100 SQL statements. So 100 * 100 = 10000 SQL statements, which exceeds 5000, the maximum number of statements allowed in a single TiDB transaction.
Two solutions:
Add the -Dsqoop.export.records.per.statement=10 option as follows:
sqoop export \
-Dsqoop.export.records.per.statement=10 \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username ${user} \
--password ${passwd} \
--table ${tab_name} \
--export-dir ${dir} \
--batch
You can also increase the maximum number of statements allowed in a single TiDB transaction, but this consumes more memory.
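If you go this route, the setting to raise is stmt-count-limit in the [performance] section of the TiDB server configuration file (tidb.toml); a minimal sketch, assuming a new limit of 20000 (larger values consume more memory, and the server must be restarted to pick up the change):
# tidb.toml
[performance]
# maximum number of statements allowed in a single transaction (default: 5000)
stmt-count-limit = 20000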

Related

Sqoop import from Teradata - No more room in database

I am new to Big Data. When I use Sqoop commands to import data from Teradata into my Hadoop cluster, I am encountering a "No more room in database" error.
I am doing the following:
1. The data I am trying to pull into my Hadoop cluster is a view.
2. I have used the following sqoop command:
sqoop import --connect "jdbc:teradata://xxx.xxx.xxx.xxx/DATABASE=XY" \
-- username user1 \
-- password xyc
-- query "
SELECT * FROM TABLE1 WHERE .... AND \$CONDITIONS \
" \
--split-by ITEM_1 \
--delete-target-dir \
--target-dir /user/home/folder1 \
--as-avrodatafile;
I know that the default number of mappers is 4. Since I do not have a primary key for my view, I am using --split-by.
Using --num-mappers 1 works, but it takes a long time to port over roughly 36 GB of data, so I wanted to increase the number of mappers to 4 or more. However, I then get the "no more room" error. Does anyone know what's happening?

Sqoop export for 100 million records faster

I have a query similar to the one below:
sqoop export \
-Dsqoop.export.records.per.statement=500 \
--connect jdbc:teradata://server/database=BIGDATA \
--username dbuser \
--password dbpw \
--batch \
--hive-table country \
--table COUNTRY \
--input-null-non-string '\\N' \
--input-null-string '\\N'
The above query works fine for 3 million records (it takes about 1 hour to load the data into the Teradata table). Exporting 100 million records to an empty Teradata table will presumably take much longer. How can I write the export so that it runs faster without failing?
You may want to consider increasing your --fetch-size (the number of entries that Sqoop fetches per round trip) from the default 1000 to, for example, --fetch-size 10000 or 20000, depending on your available memory as well as your environment's bandwidth.

Why use the following command in sqoop?

I have a doubt about the following sqoop import command:
sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username user_name \
--table user_table \
--m 1 \
--target-dir /sample
Why do we use --m 1 in the above command? Please clarify.
-m sets the number of mappers; specifying -m 1 means only one mapper is run to import the table. This option controls parallelism. To achieve parallelism, Sqoop uses the primary key/unique key to split the rows of the source table.
By default, Sqoop runs 4 mappers, in which case you need to tell it which column to split on with --split-by column_name; with -m 1 you don't need splitting at all.
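As an illustration, a minimal sketch of the same import run with four parallel mappers; the split column (id) is a hypothetical numeric column in user_table:
sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username user_name \
--table user_table \
--split-by id \
-m 4 \
--target-dir /sample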

Error : sqoop to add records in hdfs

My scenario: I get 100 records into HDFS through Sqoop daily at a particular time. But yesterday I got only 50 records for that time, so today I need to get 50 + 100 records into HDFS through Sqoop for that particular time. Please help me. Thanks in advance.
To handle such a scenario, you need to add a where condition on time, no matter what the record count is.
You can use something like this in the sqoop import command, using the --query parameter:
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username sqoop \
--password sqoop \
--query "SELECT * FROM records WHERE recordTime BETWEEN '<datetime>' AND NOW() AND \$CONDITIONS" \
-m 1 \
--target-dir /user/hadoop/records
You need to modify the where condition as per your table schema.
Please refer to the Sqoop documentation for more details.
sqoop import --connect jdbc:mysql://localhost:3306/your_mysql_databasename --username root -P --query "SELECT * FROM records WHERE recordTime BETWEEN '<datetime>' AND NOW() AND \$CONDITIONS" -m 1 --target-dir <where_you_want_to_store_the_data>
When sqoop asks for the password (because of -P), enter your MySQL password (e.g., mine is root).

sqoop with sql server retrieving more records

Q: I want to import 5000 rows from SQL Server using Sqoop, but it is giving me 20000 rows. I am using the query below:
sudo -E -u hdfs sqoop import --connect "jdbc:sqlserver://hostname;username=*****;password=*****;database=*****" --driver com.microsoft.sqlserver.jdbc.SQLServerDriver --query "select top 5000 * from Tb_Emp where \$CONDITIONS" --split-by EmpID -m 4 --target-dir /home/sqoop_SQLServeroutput
It retrieved 20000 records; every mapper is getting 5000 records. But if I do the same on MySQL, it gives 5000 records as expected:
sudo -E -u hdfs sqoop import --connect jdbc:mysql://hostname/<database_name> --username **** --password **** --query 'select * from Tb_Emp where $CONDITIONS limit 5000' --split-by EmpID -m 4 --target-dir /home/sqoop_MySqloutput
It retrieved 5000 records.
I don't know why this is happening.
Using the "top x" or "limit x" clauses do not make much sense with Sqoop as it can return different values on each query execution (there is no "order by"). Also in addition the clause will very likely confuse split generation, ending with not that easily deterministic outputs. Having said that I would recommend you to use only 1 mapper (-m 1 or --num-mappers 1) in case that you need to import predefined number of rows. Another solution would be to create temporary table with the required data on the MySQL/SQL Server side and import this whole temp table with Sqoop.

Resources