Sqoop export from HDFS to Oracle error

Command used:
sqoop export --connect jdbc:oracle:thin:@//xxx:1521/BDWDEV4 --username xxx --password xxx --table TW5T0 --export-dir '/data/raw/oltp/cogen/oraclexport/TW5T0/2015-08-18' -m 8 --input-fields-terminated-by '\001' --lines-terminated-by '\n' --input-escaped-by '\"' --input-optionally-enclosed-by '\"'
The destination table has columns with the DATE datatype in Oracle, but as shown in the error, Sqoop is parsing the plain date as a timestamp.
Error:
15/09/11 06:07:12 INFO mapreduce.Job: map 0% reduce 0%
15/09/11 06:07:17 INFO mapreduce.Job: Task Id : attempt_1438142065989_99811_m_000000_0, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: Can't parse input data: '2015-08-15'
at TZ401.__loadFromFields(TZ401.java:792)
at TZ401.parse(TZ401.java:645)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
at java.sql.Timestamp.valueOf(Timestamp.java:202)
at TZ401.__loadFromFields(TZ401.java:709)
... 12 more

Instead of changing your data files in Hadoop, you should use the --map-column-java argument in your sqoop export.
If you have for example two DATE columns named DATE_COLUMN_1 and DATE_COLUMN_2 in your Oracle table, then you can add the following argument to your sqoop command:
--map-column-java DATE_COLUMN_1=java.sql.Date,DATE_COLUMN_2=java.sql.Date
Note that the JDBC date escape format still has to be used in your Hadoop text files, but with this mapping plain yyyy-mm-dd values will work.
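Applied to the command from the question, the export would look roughly like this (DATE_COLUMN_1 and DATE_COLUMN_2 are placeholders for the actual DATE columns in TW5T0):
sqoop export --connect jdbc:oracle:thin:@//xxx:1521/BDWDEV4 --username xxx --password xxx \
  --table TW5T0 --export-dir '/data/raw/oltp/cogen/oraclexport/TW5T0/2015-08-18' -m 8 \
  --input-fields-terminated-by '\001' --lines-terminated-by '\n' \
  --input-escaped-by '\"' --input-optionally-enclosed-by '\"' \
  --map-column-java DATE_COLUMN_1=java.sql.Date,DATE_COLUMN_2=java.sql.Date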

From
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_dates_and_times,
Oracle JDBC represents DATE and TIME SQL types as TIMESTAMP values. Any DATE columns in an Oracle database will be imported as a TIMESTAMP in Sqoop, and Sqoop-generated code will store these values in java.sql.Timestamp fields.
When exporting data back to a database, Sqoop parses text fields as
TIMESTAMP types (with the form yyyy-mm-dd HH:MM:SS.ffffffff) even if
you expect these fields to be formatted with the JDBC date escape
format of yyyy-mm-dd. Dates exported to Oracle should be formatted as
full timestamps.
So you would need to format the dates in your files to conform to the format yyyy-mm-dd HH:MM:SS.ffffffff before exporting to Oracle.
EDIT:
Answering the comment:
There are around 70 files (tables) in HDFS that I need to export. So in all the files I would need to change the date from yyyy-mm-dd to yyyy-mm-dd HH:MM:SS.ffffffff. Is there any simple way to format it?
Well, you could write an awk script to do that for you (a rough sketch follows after the steps below), or else you can check whether the idea below works:
1. Create a new temporary table TEMPIMPORT with the same structure as TW5T0, except change the column that has the DATE datatype to VARCHAR2.
2. Load the data into the new temporary table TEMPIMPORT using Sqoop.
3. Run the DML below to export the data back into TW5T0 (and commit, of course):
insert into tw5t0 (select [[all_your_columns_here_except_date_column]],to_date(date_column,'yyyy-mm-dd') from tempimport);
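If you go the awk route mentioned above, here is a minimal sketch. It assumes the \001 field delimiter from the export command, that the bare yyyy-mm-dd date sits in field 3 (a placeholder position), and placeholder file names; adjust both to your layout:
# Append a midnight time component so the value matches yyyy-mm-dd HH:MM:SS.f
awk 'BEGIN { FS = OFS = "\001" } { $3 = $3 " 00:00:00.0"; print }' part-m-00000 > part-m-00000.fixed
You would repeat this (e.g. in a shell loop) for each of the ~70 files and point --export-dir at the fixed copies.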

Used --connection-param-file ora.properties in the Sqoop export.
ora.properties contains:
oracle.jdbc.mapDateToTimestamp=false

Oracle drivers map oracle.sql.DATE to java.sql.Timestamp, retaining
the time information. If you still want the incorrect but 10g
compatible oracle.sql.DATE to java.sql.Date mapping, then you can get
it by setting the value of mapDateToTimestamp flag to false (default
is true).
https://docs.oracle.com/cd/E11882_01/java.112/e16548/apxref.htm#JJDBC28920
To use this with Sqoop, you need to add the option:
--connection-param-file conn-param-file.txt
conn-param-file.txt:
oracle.jdbc.mapDateToTimestamp=false
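For example, building on the export command from the question (a sketch; the connection string, table, and paths are the ones from the question):
sqoop export --connect jdbc:oracle:thin:@//xxx:1521/BDWDEV4 --username xxx --password xxx \
  --table TW5T0 --export-dir '/data/raw/oltp/cogen/oraclexport/TW5T0/2015-08-18' -m 8 \
  --input-fields-terminated-by '\001' --lines-terminated-by '\n' \
  --connection-param-file conn-param-file.txt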

If the Hive table's column order doesn't match the RDBMS table's column order, the same error can occur.
I resolved my issue by recreating the RDBMS table with its columns rearranged to match.

Related

Sqoop can't convert long to timestamp when exporting data from HDFS to Oracle

I have a csv file in hdfs with this format:
000000131,2020-07-22,0.0,"","",1595332359218,khf987ksdfi34
000000112,2020-07-22,0.0,"","",1595442610265,khf987ksdfi34
000000150,2020-07-22,0.0,"","",1595442610438,khf987ksdfi34
I want to export this file to oracle using sqoop like this:
sqoop export --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=oracledb)(port=1521))(connect_data=(service_name=stgdb)))" --table CORE_ETL.DEPOSIT_TURNOVER --username xxxx --password xxxx --export-dir /tmp/merged_deposit_turnover/ --input-fields-terminated-by "," --input-lines-terminated-by '\n' --input-optionally-enclosed-by '\"' --map-column-java DATE=java.sql.Date,INSERT_TS=java.sql.Timestamp
but the process ended with this error:
Caused by: java.lang.RuntimeException: Can't parse input data: '1595332359218'
at CORE_ETL_DEPOSIT_TURNOVER.__loadFromFields(CORE_ETL_DEPOSIT_TURNOVER.java:546)
at CORE_ETL_DEPOSIT_TURNOVER.parse(CORE_ETL_DEPOSIT_TURNOVER.java:431)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:88)
... 10 more
Caused by: java.lang.IllegalArgumentException
at java.sql.Date.valueOf(Date.java:143)
at CORE_ETL_DEPOSIT_TURNOVER.__loadFromFields(CORE_ETL_DEPOSIT_TURNOVER.java:529)
... 12 more
I wonder whether there is a way to export this file to Oracle without changing the format of the data in HDFS.
As per sqoop official documentation:
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_export_date_and_timestamp_data_types_into_oracle
While exporting data from HDFS, the Sqoop export command will fail if the data is not in the required format; for timestamps the required format is yyyy-mm-dd hh24:mi:ss.ff. So you will have to format the timestamps in your files to conform to that format to export properly to Oracle.
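If reformatting the files is acceptable, a rough sketch of that conversion (assuming GNU awk for strftime(), that the epoch-millisecond value is the sixth comma-separated field as in the sample rows above, and placeholder file names):
gawk 'BEGIN { FS = OFS = "," } { $6 = strftime("%Y-%m-%d %H:%M:%S", int($6 / 1000)) ".0"; print }' \
  deposit_turnover.csv > deposit_turnover_formatted.csv
The reformatted INSERT_TS values then match the required timestamp form, and the rest of the export command can stay as it is.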

Sqoop Incremental Load With Epoch Timestamp

Using the Sqoop incremental tool requires the last-modified date to be provided in --last-value in a format similar to 2016-09-05 06:04:27.0. The problem in this case is that in the source MySQL database, the update_date data is stored as an epoch timestamp (1550218178).
With the following sqoop command
sqoop import --verbose --connect jdbc:mysql://192.18.2.5:3306/iprocure_ip --table depot --username usernamehere --password-file /user/admin/.password --check-column update_date --incremental lastmodified --last-value '1550218178' --target-dir /user/admin/notexist --merge-key "depot_id"
Sqoop throws an error stating that the epoch timestamp provided is not a timestamp:
19/03/06 12:57:31 ERROR manager.SqlManager: Column type is neither timestamp nor date!
19/03/06 12:57:31 ERROR sqoop.Sqoop: Got exception running Sqoop:
java.lang.RuntimeException: Column type is neither timestamp nor date!
java.lang.RuntimeException: Column type is neither timestamp nor date!
at org.apache.sqoop.manager.ConnManager.datetimeToQueryString(ConnManager.java:788)
at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:350)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:526)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:656)
at org.apache.sqoop.Sqoop.run(Sqoop.java:150)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:186)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:240)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:249)
at org.apache.sqoop.Sqoop.main(Sqoop.java:258)
How can one fetch incremental data with sqoop using Epoch timestamp?
The exception clearly says that there is a type mismatch: Sqoop is expecting a date or timestamp, but your --last-value is an int.
If you read the Sqoop documentation, it says:
Incremental imports are performed by comparing the values in a check column against a reference value for the most recent import. For example, if the --incremental append argument was specified, along with --check-column id and --last-value 100, all rows with id > 100 will be imported
Sqoop is internally Java, so the check column must map to a java.sql date/timestamp type. Recheck the DDL and adapt the sqoop import command accordingly.
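One possible workaround, not taken from the answer above: a sketch that switches from lastmodified to append mode, under the assumption that append mode is acceptable here. Append only needs a check column it can compare with >, so the integer epoch column works as-is (--merge-key applies only to lastmodified, so it is dropped):
sqoop import --verbose --connect jdbc:mysql://192.18.2.5:3306/iprocure_ip --table depot \
  --username usernamehere --password-file /user/admin/.password \
  --check-column update_date --incremental append --last-value '1550218178' \
  --target-dir /user/admin/notexist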

Loading sequence file data into a Hive table created with STORED AS SEQUENCEFILE fails

Importing the content from MySQL to HDFS as sequence files using the below sqoop import command:
sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db"
--username retail_dba --password cloudera
--table orders
--target-dir /user/cloudera/sqoop_import_seq/orders
--as-sequencefile
--lines-terminated-by '\n' --fields-terminated-by ','
Then I'm creating the Hive table using the below command:
create table orders_seq(order_id int,order_date string,order_customer_id int,order_status string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS SEQUENCEFILE
But when I try to load the sequence data obtained from the first command into the Hive table using the below command:
LOAD DATA INPATH '/user/cloudera/sqoop_import_seq/orders' INTO TABLE orders_seq;
It is giving the below error.
Loading data to table practice.orders_seq
Failed with exception java.lang.RuntimeException: java.io.IOException: WritableName can't load class: orders
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
Where am I going wrong?
First of all, is it necessary to have the data in that format?
Let's suppose you do have to have the data in that format. The LOAD DATA command is not necessary: once Sqoop finishes importing the data, you just have to create a Hive table pointing at the same directory where you sqooped the data.
One side note from your scripts:
create table orders_seq(order_id int,order_date string,order_customer_id int,order_status string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS SEQUENCEFILE
Your sqoop command says this: --fields-terminated-by ',' but when you are creating the table you are using: FIELDS TERMINATED BY '|'
In my experience, the best approach is to sqoop the data as Avro; this will automatically create an Avro schema. Then you just have to create a Hive table using the previously created schema (AvroSerDe) and the location where you stored the data from the sqoop process.
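A rough sketch of that Avro approach (the schema path, schema file name, and Hive table name are assumptions, not from the answer):
sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username retail_dba --password cloudera \
  --table orders \
  --target-dir /user/cloudera/sqoop_import_avro/orders \
  --as-avrodatafile
# Sqoop writes an orders.avsc schema file to the local working directory; put it in HDFS
hdfs dfs -put orders.avsc /user/cloudera/schemas/orders.avsc
# Point an external Hive table at the imported data using the Avro schema
hive -e "CREATE EXTERNAL TABLE orders_avro
         STORED AS AVRO
         LOCATION '/user/cloudera/sqoop_import_avro/orders'
         TBLPROPERTIES ('avro.schema.url'='hdfs:///user/cloudera/schemas/orders.avsc')"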

Sqoop export of a hive table partitioned on an int column

I have a Hive table partitioned on an 'int' column.
I want to export the Hive table to MySql using Sqoop export tool.
sqoop export --connect jdbc:mysql://XXXX:3306/temp --username root --password root --table emp --hcatalog-database temp --hcatalog-table emp
I tried the above sqoop command but it failed with the below exception:
ERROR tool.ExportTool: Encountered IOException running export job: java.io.IOException:
The table provided temp.emp uses unsupported partitioning key type for column mth_id : int.
Only string fields are allowed in partition columns in HCatalog
I understand that partitioning on an int column is not supported.
But I would like to check whether this issue is fixed in any of the latest releases via an extra config/option.
As a workaround, I can create another table without a partition before exporting, but I would like to check whether there is a better way to achieve this.
Thanks in advance.
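For reference, a minimal sketch of the workaround mentioned in the question (the staging table name emp_export is hypothetical; the database and connection details are the ones from the command above):
# Stage an unpartitioned copy of the partitioned Hive table, then export that copy
hive -e "CREATE TABLE temp.emp_export AS SELECT * FROM temp.emp"
sqoop export --connect jdbc:mysql://XXXX:3306/temp --username root --password root \
  --table emp --hcatalog-database temp --hcatalog-table emp_export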

Sqoop Hive table import: table data type doesn't match the database

Using Sqoop to import data from Oracle to Hive works fine, but it creates the table in Hive with only two data types, String and Double. I want to use Timestamp as the data type for some columns.
How can I do it?
bin/sqoop import --table TEST_TABLE --connect jdbc:oracle:thin:@HOST:PORT:orcl --username USER1 -password password -hive-import --hive-home /user/lib/Hive/
In addition to the above answers, we may also have to observe when the error occurs, e.g.:
In my case I had two types of data columns that caused errors: JSON and binary.
For the JSON columns, the error came while a generated Java class was executing, at the very beginning of the import process:
16/04/19 09:37:58 ERROR orm.ClassWriter: Cannot resolve SQL type
For the binary column, the error was thrown while importing into the Hive tables (after the data was imported and put into HDFS files):
16/04/19 09:51:22 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive does not support the SQL type for column featured_binary
To get rid of these two errors, I had to provide the following options
--map-column-java column1_json=String,column2_json=String,featured_binary=String --map-column-hive column1_json=STRING,column2_json=STRING,featured_binary=STRING
In summary, we may have to provide --map-column-java or --map-column-hive, depending upon the failure.
You can use the parameter --map-column-hive to override the default mapping. This parameter expects a comma-separated list of key-value pairs separated by = to specify which column should be mapped to which type in Hive.
sqoop import \
...
--hive-import \
--map-column-hive id=STRING,price=DECIMAL
A new feature was added with sqoop-2103/sqoop 1.4.5 that lets you call out the decimal precision with the map-column-hive parameter. Example:
--map-column-hive 'TESTDOLLAR_AMT=DECIMAL(20%2C2)'
This syntax would define the field as a DECIMAL(20,2). The %2C is used as a comma and the parameter needs to be in single quotes if submitting from the bash shell.
I tried using Decimal with no modification and I got a Decimal(10,0) as a default.
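Applied to the import from the question, mapping a column to a timestamp might look like this (CREATED_AT is a placeholder for whichever Oracle column should become a Hive timestamp):
bin/sqoop import --table TEST_TABLE --connect jdbc:oracle:thin:@HOST:PORT:orcl \
  --username USER1 -password password \
  --hive-import --hive-home /user/lib/Hive/ \
  --map-column-hive CREATED_AT=TIMESTAMP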
