Sqoop Incremental Load With Epoch Timestamp - sqoop

Sqoop's incremental import in lastmodified mode needs the last modified date to be passed to --last-value in a format like 2016-09-05 06:04:27.0. The problem in my case is that in the source MySQL database, update_date is stored as an epoch timestamp (e.g. 1550218178).
I run the following sqoop command:
sqoop import --verbose --connect jdbc:mysql://192.18.2.5:3306/iprocure_ip --table depot --username usernamehere --password-file /user/admin/.password --check-column update_date --incremental lastmodified --last-value '1550218178' --target-dir /user/admin/notexist --merge-key "depot_id"
It throws an error stating that the column holding the epoch timestamp is neither a timestamp nor a date:
19/03/06 12:57:31 ERROR manager.SqlManager: Column type is neither timestamp nor date!
19/03/06 12:57:31 ERROR sqoop.Sqoop: Got exception running Sqoop:
java.lang.RuntimeException: Column type is neither timestamp nor date!
java.lang.RuntimeException: Column type is neither timestamp nor date!
at org.apache.sqoop.manager.ConnManager.datetimeToQueryString(ConnManager.java:788)
at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:350)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:526)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:656)
at org.apache.sqoop.Sqoop.run(Sqoop.java:150)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:186)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:240)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:249)
at org.apache.sqoop.Sqoop.main(Sqoop.java:258)
How can one fetch incremental data with sqoop using Epoch timestamp?

The exception says there is a type mismatch: with --incremental lastmodified Sqoop expects the check column to be a date or timestamp, but your update_date column, like the --last-value you pass, is an integer.
If you read the sqoop documentation, it says:
Incremental imports are performed by comparing the values in a check column against a reference value for the most recent import. For example, if the --incremental append argument was specified, along with --check-column id and --last-value 100, all rows with id > 100 will be imported
Sqoop is written in Java, so in lastmodified mode the check column must map to a java.sql date or timestamp type. Recheck the DDL and adapt the sqoop import command.
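One possible way to adapt the command, offered as a sketch and not a confirmed fix: the append mode described in the documentation quote compares the check column as a plain value, so an integer epoch column can be used with --incremental append instead of lastmodified. The helper name import_depot_since is made up for this example and the connection details are copied from the question. --merge-key is left out because, as far as I know, it only applies to lastmodified mode, so rows updated since the last run would arrive as additional records and not be merged.

```shell
# Sketch only: wraps the import from the question, switched to append mode.
# import_depot_since is a hypothetical helper; $1 is the last epoch value
# that was already imported.
import_depot_since() {
  last_epoch="$1"
  sqoop import \
    --connect jdbc:mysql://192.18.2.5:3306/iprocure_ip \
    --table depot \
    --username usernamehere \
    --password-file /user/admin/.password \
    --check-column update_date \
    --incremental append \
    --last-value "$last_epoch" \
    --target-dir /user/admin/notexist
}

# Example call (needs a working Sqoop installation):
# import_depot_since 1550218178
```

If the updated rows have to replace the old ones, the result would still need a separate merge step afterwards.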

Related

incremental load using sqoop from mysql to hive

I am new to sqoop and hive. Please help me understand the following.
The row counts of the mysql and hive tables are different:
mysql has 51 rows (the table has a primary key and no duplicates) and hive has 38 rows, on the first run itself.
sqoop job --create mmod -- import --connect "jdbc:mysql://cxln2.c.thelab-240901.internal:3306/retail_db" --username sqoopuser --password-file /tmp/.mysql-pass.txt --table mod --compression-codec org.apache.hadoop.io.compress.BZip2Codec --hive-import --hive-database encry --hive-table mod2 --hive-overwrite --check-column last_update_date --incremental lastmodified --merge-key id --last-value 0 --target-dir /user/user_name/append1sqopp
It is not creating the target dir in the given location; instead it is creating it in the warehouse location.
I am trying to schedule a sqoop incremental job, but I am making a mistake somewhere.
command: the command above
2.1 new rows are added with the same date
2.2 delete and update on a few rows
Output:
No new updates on the given table.
It is not updating the last value in the sqoop job.
How do I choose the merge-key column in sqoop?
Where condition in sqoop:
--query "select * from reason where id>20 AND $CONDITIONS"
What is the use of $CONDITIONS, and do we need to pass the variable in Linux?
Is it possible to track rejected rows in a sqoop job?

sqoop import error for timestamp column in parquet table

I'm getting an error while mapping a SQL Server table to a parquet table. I created the parquet table to match the SQL Server table, with corresponding column data types.
But sqoop infers the timestamp column as long, which creates a problem when loading data into the parquet table. Loading the data seems to succeed, but fetching it fails.
Error Message:
hive> select updated_at from bkfs.address_par1;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
Time taken: 0.146 seconds
Sqoop's parquet import interprets the Oracle DATE and TIMESTAMP data types as Long, that is, it stores the date as a unix epoch value. The import can be handled by converting the column to a string in the query, like below:
sqoop import \
--connect [connection string] \
--username [username] \
--password [password] \
--query "select to_char(date_col,'YYYY-MM-DD HH24:MI:SS') as date_col from test_table where \$CONDITIONS" \
--as-parquetfile \
-m 1 \
--delete-target-dir \
--target-dir /sample/dir/path/hive_table
You can also have a look at this question, which has already been posted:
Sqoop function '--map-column-hive' being ignored

incremental "lastmodified" not working in sqoop

I'm trying to use sqoop to perform an incremental import from a Teradata DB to Hive. Below is the command:
sqoop import --connect jdbc:teradata://xxx.xxx.x.xx/DATABASE=DBN --driver com.teradata.jdbc.TeraDriver --username userN --password pass --query "SELECT alias.colA, alias.call_date, alias.colB, alias.colC FROM tableName alias where \$CONDITIONS" --target-dir /apps/hive/warehouse/staging.db/tableName -m 26 --check-column call_date --incremental append --split-by alias.colA --last-value '2016-02-01'
The column call_date is of DATE type, values in the format 'YYYY-MM-DD'.
When I use 'append' for --incremental, everything works fine. But when I put 'lastmodified', the following error is thrown:
ERROR util.SqlTypeMap: It seems like you are looking up a column that does not
ERROR util.SqlTypeMap: exist in the table. Please ensure that you've specified
ERROR util.SqlTypeMap: correct column names in Sqoop options.
ERROR tool.ImportTool: Imported Failed: column not found: call_date
I'm using sqoop 1.4.4.2.1 on HDP 2.1, and the Teradata DB version is 14.10.
Any pointers will be helpful.
I think that when using --query you can perform the last-value check in the query itself, something like this:
"SELECT alias.colA, alias.call_date, alias.colB, alias.colC FROM tableName alias where call_date >'2016-02-01' and \$CONDITIONS"
Reference (see the section Incrementally Updating Data in Hive > 1. Ingest the data):
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html
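The idea of moving the check into the query can be sketched in shell as follows. This is only an illustration: LAST_VALUE and build_call_date_query are names invented here, the table and column names are the ones from the question, and the script only builds and prints the query string without calling sqoop. Keeping track of the last value between runs is then up to you, since Sqoop's own --incremental bookkeeping is not involved.

```shell
# Sketch only: LAST_VALUE and build_call_date_query are illustrative names,
# not part of Sqoop. The column and table names come from the question.
LAST_VALUE='2016-02-01'

build_call_date_query() {
  # $1 is the lower bound for call_date. The trailing token is single-quoted
  # so that the shell leaves $CONDITIONS alone and Sqoop can replace it.
  printf '%s%s\n' \
    "SELECT alias.colA, alias.call_date, alias.colB, alias.colC FROM tableName alias WHERE call_date > '$1' AND " \
    '$CONDITIONS'
}

QUERY=$(build_call_date_query "$LAST_VALUE")
printf '%s\n' "$QUERY"

# The string would then be handed over as: sqoop import ... --query "$QUERY" ...
```

The value is spliced into the SQL text as-is, so it should only come from a source you control.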

Sqoop incremental lastmodified

I have an accounts table in a mysql db.
It has around 19654 records. I used sqoop to import the table data into HDFS. It created four files in HDFS with the data evenly distributed.
Then I executed the SQL statement below on the DB:
update accounts set modified = now() where acct_num in (1,2,3,4) ;
Then I executed the sqoop command below:
sqoop import --table accounts --connect jdbc:mysql://localhost/loudacre \
--username training --password training \
--incremental lastmodified \
--check-column modified --last-value '2014-03-18 13:29:47.0' \
--merge-key acct_num --target-dir /accounts/
After the above completed, it had created only one file with around 10 entries. It does not even include the new timestamp value.
I was just trying to update the rows which have a new timestamp. Can anyone help?

Sqoop job incremental import using free form query

I am trying to do sqoop job incremental import using free form query. Here's the query being used
sqoop job --create importjobinl -- import --connect jdbc:mysql://localhost/test --username training --password training --query 'select id,name,unix_timestamp(time_updated) from intest where $CONDITIONS' --target-dir /user/new/lll/`date +%d%T|sed 's/://g'` -m 1 --check-column time_updated --incremental append --last-value '1441526438'
The job is not getting created. It shows:
Incremental imports require a table.
Try --help for usage instructions.
It works when I use --table intest instead of --query, but I want to use --query to convert the date to epoch time using unix_timestamp, since the value in the mysql table intest is in yyyy-mm-dd format.
Version used: Sqoop 1.2.0-cdh3u0
Support for incremental imports with free-form queries was added in Sqoop 1.4.2.
JIRA link: Sqoop Incremental import Support for free form queries
Since you are using Sqoop 1.2.0, this feature might not be available to you.
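If you are not sure whether an installation is recent enough, a small check like the following could be run first. It is a sketch: supports_query_incremental is a hypothetical helper, the 1.4.2 threshold is the one stated in this answer, and the version string is assumed to look like the ones quoted in these questions (for example 1.2.0-cdh3u0 or 1.4.4.2.1).

```shell
# Sketch only: returns success when the given Sqoop version string is
# 1.4.2 or newer. The helper name is made up for this example.
supports_query_incremental() {
  ver=${1%%-*}                 # drop a vendor suffix such as -cdh3u0
  old_ifs=$IFS
  IFS=.
  set -- $ver                  # split on dots into $1 $2 $3 ...
  IFS=$old_ifs
  major=${1:-0}
  minor=${2:-0}
  patch=${3:-0}
  [ "$major" -gt 1 ] && return 0
  [ "$major" -lt 1 ] && return 1
  [ "$minor" -gt 4 ] && return 0
  [ "$minor" -lt 4 ] && return 1
  [ "$patch" -ge 2 ]
}

if supports_query_incremental "1.2.0-cdh3u0"; then
  echo "free-form incremental imports should be available"
else
  echo "this version predates 1.4.2"
fi
```

Only the first three numeric components are compared, so vendor build numbers after them are ignored.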
Do an initial pull using sqoop.
Make sure the date format of your column is YYYY-MM-DD HH:MM:SS if you are using the last modified column as a date.
Run the statement below for an incremental load into your hive table, which includes a free-form query.
sqoop import --connect jdbc:mysql://localhost/test --username training --password training --query "select * from intest where \$CONDITIONS" --hive-import --hive-table db_name_x.table_name_x --incremental lastmodified --check-column date_x --target-dir /user/xyz -m 1
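A note on quoting the query: Sqoop has to receive the literal token $CONDITIONS so that it can substitute its own predicate, which means the shell must not expand it first. The sketch below only prints three query strings to show the difference and does not call sqoop. CONDITIONS is set to an empty string here to stand in for the usual case where no such shell variable exists.

```shell
# Stand-in for a shell where CONDITIONS was never defined.
CONDITIONS=''

# Double quotes, no backslash: the shell expands $CONDITIONS to nothing.
unescaped="select * from intest where $CONDITIONS"
# Double quotes with a backslash: the token survives.
escaped="select * from intest where \$CONDITIONS"
# Single quotes: nothing is expanded, so the token survives as well.
single='select * from intest where $CONDITIONS'

printf '%s\n' "$unescaped" "$escaped" "$single"
```

The first printed line stops right after "where", while the other two still end in $CONDITIONS, which is the form Sqoop needs to see.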

Resources