I'm trying to import a huge table from Oracle 10g to HDFS (GCS, since I'm using Sqoop with Google Cloud Dataproc) as Avro. Everything works fine when the table doesn't have any date columns, but when it does, some dates are imported very wrong.
Like: Oracle data -> 30/07/76 and HDFS data -> 14976-07-30 20:02:00.0
Like: Oracle data -> 26/03/84 and HDFS data -> 10384-03-26 20:32:34.0
I'm already mapping the date fields as String to bring them over like that. I was originally importing the default Sqoop way, which brings the date fields over as epoch integers, but the conversion was incorrect too.
Like: Oracle data -> 01/01/01 and HDFS data -> -62135769600000 when it should be 978314400000
I hope someone can help me fix this issue.
Thanks
Additional information:
Sqoop command that I'm running:
import -Dmapreduce.job.user.classpath.first=true -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect=$JDBC_STR --username=$USER --password=$PASS --target-dir=gs://sqoop-dev-out-files/new/$TABLE --num-mappers=10 --fields-terminated-by="\t" --lines-terminated-by="\n" --null-string='null' --null-non-string='null' --table=$SCHEMA.$TABLE --as-avrodatafile --map-column-java="DATACADASTRO=String,DATAINICIAL=String,DATAFINAL=String"
Sqoop version: 1.4.7
JDBC version: 6
I think your date in Oracle really is 01/01/0001; try to_char(COLUMN, 'DD/MM/YYYY').
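If it helps, here is a rough sketch of folding that into the import with a free-form query instead of --table (the SELECT list and the ID split column are placeholders; everything else comes from your command):
import -Dmapreduce.job.user.classpath.first=true --connect=$JDBC_STR --username=$USER --password=$PASS --target-dir=gs://sqoop-dev-out-files/new/$TABLE --num-mappers=10 --split-by=ID --query="SELECT ID, TO_CHAR(DATACADASTRO, 'DD/MM/YYYY HH24:MI:SS') AS DATACADASTRO FROM $SCHEMA.$TABLE WHERE \$CONDITIONS" --as-avrodatafile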
My issue is that my date really is 01/01/0001, because of user mistyping, and I can't update the column in the origin Oracle database.
The other issue is that converting it to Unix time should give -62135596800000, but instead it comes out as -62135769600000 (30/12/0000).
At first I thought it was a timezone issue, but the difference is two days.
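The two-day gap does match the difference between the hybrid Julian/Gregorian calendar (what java.sql.Timestamp uses, and what Sqoop typically hands back for DATE columns) and the proleptic Gregorian calendar at year 1. A minimal Scala sketch, assuming a UTC default time zone (run with -Duser.timezone=UTC):
import java.sql.Timestamp
import java.time.{LocalDateTime, ZoneOffset}
// Hybrid Julian/Gregorian calendar, as used by java.sql.Timestamp:
val hybrid = Timestamp.valueOf("0001-01-01 00:00:00").getTime // -62135769600000
// Proleptic Gregorian calendar, as used by java.time:
val proleptic = LocalDateTime.of(1, 1, 1, 0, 0).toInstant(ZoneOffset.UTC).toEpochMilli // -62135596800000
println((proleptic - hybrid) / 86400000L) // 2 (days)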
Related
We are trying to import PostgreSQL data into a Hadoop environment using Apache Sqoop. We identified that the direct mode of the Sqoop import (keyword: --direct) uses the PostgreSQL COPY operation to fast-import the data into HDFS. If a column has a line break (\n) in its value, then a QUOTE is added around the column value (example E1 below), and the value is treated as another record in the Hive table (LOAD DATA INPATH). Is there an alternative available to make this work?
E1: Sample data in HDFS (tried importing with the defaults, with --input-escaped-by '\', and with --input-escaped-by '\n'; none of these helps):
value1,"The some data
has line break",value3
The Hive table considered it as 2 records (even with --hive-delims-replacement '', the HDFS-level data still contains \n, which Hive detects as a new record):
value1 "the same data NULL
has line break" value3 NULL
It seems Apache has retired this project, so it no longer gets bug fixes or new releases.
Has anyone faced the same problem, or could anyone help me with this?
Note: I am able to import using non-direct mode and the select-query mode.
You could try exporting your data to a non-text format (e.g. Parquet, via the --as-parquetfile Sqoop flag). That would fix the issue with new lines.
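A rough sketch of what that import might look like (connection string, table, and target directory are placeholders; you would likely have to drop --direct, since direct mode relies on text-based COPY):
import --connect=$JDBC_STR --username=$USER --password=$PASS --table=$TABLE --target-dir=/user/hive/warehouse/$TABLE --num-mappers=4 --as-parquetfile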
I am using SSIS for ETL. Source and destination databases are Oracle.
When I run the job through SQL Agent, it fails with an error.
This table contains 5 date columns which are creating this issue.
I have tried every solution I could find, but nothing worked. It does not seem to be a data issue, as I reran the job on those selective dates and it worked perfectly; only the full load failed.
The bottom error message is:
Data Flow: Task:Error: SQLSTATE 22007, Message: [Microsoft][ODBC Oracle Wire Protocol driver]Invalid datetime format. Error in parameter 17.
You have an Invalid datetime format. You need to fix it by correcting either the data or the format model you are using, but since you haven't included any code, we can't help further.
I had a similar issue; the difference is that my source is a SQL Server database and the destination is an Oracle database.
I converted the source DateTime columns to String first, and then they were loaded into the destination date columns successfully.
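For example, something along these lines on the SQL Server side (table and column names are made up); the Oracle side can then convert the string back with TO_DATE(..., 'YYYY-MM-DD HH24:MI:SS'):
SELECT CONVERT(varchar(19), ORDER_DATE, 120) AS ORDER_DATE_STR -- style 120 = 'yyyy-mm-dd hh:mi:ss'
FROM dbo.ORDERS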
I am trying to use the Postgres import functionality, but it gives me an error on the date datatype.
I have dates in the following format 17-MAY-90 12.00.00.000000000 AM in my Oracle db, but I need to import the data into my Postgres db. I tried timestamp with and without timezone and it still gives me an error message.
What datatype can I use?
PS: Error I am getting
Thanks.
I was able to fix the problem by going to SQL Developer, under
Tools > Preferences > Database > Advanced > NLS
and changing the date format to the format I wanted. This way, when I exported the file, it came out in the format I want to use and import into my Postgres db.
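The session-level equivalent, in case you would rather not change the SQL Developer preference (the masks below are just examples of a format that Postgres timestamp columns will also accept):
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';
ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF';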
This sounds really weird. I have an Oracle database, and I am trying to run a select against it through Spark SQL. The data I am looking for really does exist in the database, but I cannot find it in the request launched from Scala, so I tried to count the existing rows:
select count (*) from TMP_STRUCTURE
From the Oracle console I got 373799.
When I run:
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder.master("local").appName("spark session example").getOrCreate()
// connect to the Oracle table TMP_STRUCTURE over JDBC
val spark = sparkSession.sqlContext
val df = spark.load("jdbc",
  Map("url" -> "jdbc:oracle:thin:IPTECH/IPTECH#//localhost:1521/XE",
      "dbtable" -> "TMP_STRUCTURE"))
println(df.count())
it prints 373797.
I cannot figure out why.
Any help, please.
This should not be possible, so there are two likely causes:
Case 1: you might be reading uncommitted data in the Oracle session, while through Spark SQL you are reading the committed dataset (execute a commit and check again).
Case 2: from the Oracle session you might be connecting to a different database that has nearly the same number of rows, and you might have used a different database for Spark SQL (cross-check that both connections point to the same database).
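One way to cross-check from the same Spark connection is to push the count down to Oracle as a subquery (a sketch; the URL is reused from your snippet):
val countDf = sparkSession.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:IPTECH/IPTECH#//localhost:1521/XE")
  .option("dbtable", "(SELECT COUNT(*) CNT FROM TMP_STRUCTURE) t")
  .load()
countDf.show() // compare this with the 373799 reported by the Oracle console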
I got a "Too many ROS containers ..." error when exporting a large amount of data from HDFS to Vertica. I know there is a DIRECT option for vsql COPY that bypasses the WOS and loads data straight into ROS containers. I also noticed the --direct option in Sqoop export (see the Sqoop User Guide). I'm just wondering whether these two "direct" options have the same function.
I have tried modifying Vertica configuration parameters like MoveOutInterval, MergeOutInterval, etc., but that didn't help much.
So, does anyone know whether the direct mode of Sqoop export will help solve the ROS containers issue? Thanks!
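For reference, the vsql-level load I'm referring to looks roughly like this (table and file path are placeholders):
COPY my_schema.my_table FROM '/data/export.tsv' DELIMITER E'\t' DIRECT;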
--direct is only supported by specific database connectors. Since there isn't one for Vertica, you would be using the generic JDBC one. I really doubt --direct does anything in that case... but if you really want to test this, you can look at the statements sent in query_requests:
select *
from query_requests
where request_type = 'LOAD'
and start_timestamp > clock_timestamp() - interval '1 hour'
That will show you all load statements within the last hour. The Sqoop statements should get converted to a COPY; I would really hope so, anyhow! If it is a bunch of INSERT ... VALUES statements, then I highly suggest NOT using it. If it is not producing a COPY, then you'll need to change the query above to look for the INSERTs:
select *
from query_requests
where request_type = 'QUERY'
and request ilike 'insert%'
and start_timestamp > clock_timestamp() - interval '1 hour'
Let me know what you find here. If it is doing INSERT...VALUES then I can tell you how to fix it (but it is a bit of work).