incremental "lastmodified" not working in sqoop - hadoop

I'm trying sqoop to perform incremental import from Teradata DB to Hive. Below is the query:
sqoop import --connect jdbc:teradata://xxx.xxx.x.xx/DATABASE=DBN --driver com.teradata.jdbc.TeraDriver --username userN --password pass --query "SELECT alias.colA, alias.call_date, alias.colB, alias.colC FROM tableName alias where \$CONDITIONS" --target-dir /apps/hive/warehouse/staging.db/tableName -m 26 --check-column call_date --incremental append --split-by alias.colA --last-value '2016-02-01'
The column call_date is of DATE type, values in the format 'YYYY-MM-DD'.
When I use 'append' for --incremental, everything works fine. But when I put 'lastmodified', the following error is thrown:
ERROR util.SqlTypeMap: It seems like you are looking up a column that does not
ERROR util.SqlTypeMap: exist in the table. Please ensure that you've specified
ERROR util.SqlTypeMap: correct column names in Sqoop options.
ERROR tool.ImportTool: Imported Failed: column not found: call_date
I'm using sqoop 1.4.4.2.1 on HDP 2.1
While Teradata DB is 14.10
Any pointers will be helpful.

I think, in case of query you can perform the last value check in the query itself some think like this
"SELECT alias.colA, alias.call_date, alias.colB, alias.colC FROM tableName alias where call_date >'2016-02-01' and \$CONDITIONS" .
Reference (refer section Incrementally Updating Data in Hive > 1.Ingest the data.)
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html

Related

incremental load using sqoop from mysql to hive

I am new to sqoop and hive . Please help me with understanding
The count of mysql and hive table are different
mysql is 51 rows (table has primary key and no duplicates ) ad hive is 38rows - first run itself
sqoop job --create mmod -- import --connect "jdbc:mysql://cxln2.c.thelab-240901.internal:3306/retail_db" --username sqoopuser --password-file
/tmp/.mysql-pass.txt --table mod --compression-codec org.apache.hadoop.io.compress.BZip2Codec --hive-import --hive-database encry --hive-table mod2 --h
ive-overwrite --check-column last_update_date --incremental lastmodified --merge-key id --last-value 0 --target-dir /user/user_name/append1sqo
pp
It is not creating target dir in given location , instead it creating in warehouse location
I am trying to schedule a sqoop incremental job , somehow I am doing mistake some where
command : above command
2.1 new rows are added with same date
2.2 delete and update on few rows
Output :
No new updates on given table .
It is not updating lastvalue in sqoop job
How to choose merge-key column in sqoop
Where condition in sqoop
--query "select * from reason where id>20 AND $CONDITIONS"
What is the use of $CONDITIONS and do we need to pass the variable in Linux
Is that possible to track rejected rows in sqoop job

Adding column in sqoop import

While importing data from SQL through sqoop, Is it possible to add a new column and insert time stamp into that column?
Is it possible by any other ways before getting data into HDFS?
You may use --query parameter of sqoop command and add SQL function to get current timestamp in query.
Example: To import stud table from MySQL having rollnum and name columns.
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username root --query 'select name, rollnum, current_timestamp from stud where $CONDITIONS' --target-dir '/tmp/stud1' --split-by id
Note current_timestamp mysql function used in query.

sqoop import error for timestamp coulmn in parquet table

I'm getting an error while mapping SQL Server table to parquet table. I have made parquet table to match SQL Server table with corresponding column data type.
But sqoop infer timestamp column as long. which creates a problem in loading data to parquet table. Loading data to parquet seems to be successful but fetching is a problem.
Error Message:
hive> select updated_at from bkfs.address_par1;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
Time taken: 0.146 seconds
Sqoop parquet import interprets the Date and timestamp Oracle data types as Long. Which is trying to get date in unix epoch format. So, importing can be handled like below,
sqoop import \
--connect [connection string] \
--username [username] \
--password [password] \
--query "select to_char(date_col,'YYYY-MM-DD HH:mi:SS.SS') as date_col from test_table where \$CONDITIONS" \
--as-parquetfile \
-m 1 \
--delete-target-dir \
--target-dir /sample/dir/path/hive_table
you can have a look at the below question posted already,
{Sqoop function '--map-column-hive' being ignored}

Incrimental update in HIVE table using sqoop

I have a table in oracle with only 4 columns...
Memberid --- bigint
uuid --- String
insertdate --- date
updatedate --- date
I want to import those data in HIVE table using sqoop. I create corresponding HIVE table with
create EXTERNAL TABLE memberimport(memberid BIGINT,uuid varchar(36),insertdate timestamp,updatedate timestamp)LOCATION '/user/import/memberimport';
and sqoop command
sqoop import --connect jdbc:oracle:thin:#dbURL:1521/dbName --username ** --password *** --hive-import --table MEMBER --columns 'MEMBERID,UUID,INSERTDATE,UPDATEDATE' --map-column-hive MEMBERID=BIGINT,UUID=STRING,INSERTDATE=TIMESTAMP,UPDATEDATE=TIMESTAMP --hive-table memberimport -m 1
Its working properly and able to import data in HIVE table.
Now I want to update this table with incremental update with updatedate (last value today's date) so that I can get day to day update for that OLTP table into my HIVE table using sqoop.
For Incremental import I am using following sqoop command
sqoop import --hive-import --connect jdbc:oracle:thin:#dbURL:1521/dbName --username *** --password *** --table MEMBER --check-column UPDATEDATE --incremental append --columns 'MEMBERID,UUID,INSERTDATE,UPDATEDATE' --map-column-hive MEMBERID=BIGINT,UUID=STRING,INSERTDATE=TIMESTAMP,UPDATEDATE=TIMESTAMP --hive-table memberimport -m 1
But I am getting exception
"Append mode for hive imports is not yet supported. Please remove the parameter --append-mode"
When I remove the --hive-import it run properly but I did not found those new update in HIVE table that I have in OLTP table.
Am I doing anything wrong ?
Please suggest me how can I run incremental update with Oracle - Hive using sqoop.
Any help will be appropriated..
Thanks in Advance ...
Although i don't have resources to replicate your scenario exactly.
You might want to try building a sqoop job and test your use case.
sqoop job --create sqoop_job \
-- import \
--connect "jdbc:oracle://server:port/dbname" \
--username=(XXXX) \
--password=(YYYY) \
--table (TableName)\
--target-dir (Hive Directory corresponding to the table) \
--append \
--fields-terminated-by '(character)' \
--lines-terminated-by '\n' \
--check-column "(Column To Monitor Change)" \
--incremental append \
--last-value (last value of column being monitored) \
--outdir (log directory)
when you create a sqoop job, it takes care of --last-value for subsequent runs. Also here i have used the Hive table's data file as target for incremental update.
Hope this provides a helpful direction to proceed.
There is no direct way to achieve this in Sqoop. However you can use 4 Step Strategy.

Sqoop job incremental import using free form query

I am trying to do sqoop job incremental import using free form query. Here's the query being used
sqoop job --create importjobinl -- import --connect jdbc:mysql://localhost/test --username training --password training --query 'select id,name,unix_timestamp(time_updated) from intest where $CONDITIONS' --target-dir /user/new/lll/`date +%d%T|sed 's/://g'` -m 1 --check-column time_updated --incremental append --last-value '1441526438'
The job is not getting created It shows.
Incremental imports require a table.
Try --help for usage instructions.
It works when I use --table intest instead of --query, but I want to use --query to convert date to epochtime using unix_timestamp since the value in mysql table intest is in yyyy-mm-dd format
Version used :Sqoop 1.2.0-cdh3u0
Sqoop incremental imports for free form queries was added from Sqoop 1.4.2
JIRA link : Sqoop Incremental import Support for free form queries
Since you are using Sqoop 1.2.0, this feature might not be available for you to use
Do an initial pull using sqoop.
Make sure the date format of your column is in YYYY-MM-DD HH:MM:SS if you are using the last modified column as date.
Run below statement for incremental load to your hive table which includes free from query.
sqoop import --connect jdbc:mysql://localhost/test --username training --password training --query "select * from intest where $CONDITIONS" --hive-import --hive-table db_name_x.table_name_x --incremental lastmodified -check-column date_x --target-dir /user/xyz -m 1

Resources