Sqoop Support for Oracle Date Format

Using Sqoop I am importing from an Oracle table into HDFS and then loading the data into a managed Hive table by giving the HDFS path as its location. Below is the sqoop command:
sqoop import \
--connect jdbcconnection \
--username user \
--password password \
--table EMPDETAILS \
--column "EMP_ID,EMP_NAME,EMP_DOB,EMP_DOJ" \
--target-dir hdfspath \
-m 1
This command executes successfully, but when the data is loaded into a Hive table using the HDFS location, EMP_DOB (whose Hive type is date) comes back as NULL:
create table EMP_TARGET(
empid int,
empname string,
empdob date,
empdoj timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 'hdfspath';
When I execute the above query the empdob column in target hive is giving NULL but empdoj is giving Correct Value. When I checked the value in hdfs path for empdob it is 1980-01-01 00:00:00:0.
Kindly help to solve the issue.
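A possible direction, not from the original post: Sqoop writes Oracle DATE values as full timestamp strings (as seen in the HDFS file above), which Hive's date type cannot parse from delimited text, hence the NULL. One hedged workaround is to format the column during the import so it matches Hive's yyyy-MM-dd layout by switching to a free-form query (with --query, the --columns option is dropped and the \$CONDITIONS token is required):
sqoop import \
--connect jdbcconnection \
--username user \
--password password \
--query "SELECT EMP_ID, EMP_NAME, TO_CHAR(EMP_DOB, 'YYYY-MM-DD') AS EMP_DOB, EMP_DOJ FROM EMPDETAILS WHERE \$CONDITIONS" \
--target-dir hdfspath \
-m 1
Alternatively, since empdoj (declared as timestamp) already loads correctly, declaring empdob as timestamp as well would work if keeping the full timestamp value is acceptable.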

Related

Sqoop export from Hcatalog to MySQL with different col names assign

My Hive table has columns id, name,
and my MySQL table has columns number, id, name.
I want to map id (from Hive) to number (from MySQL) and name (from Hive) to id (from MySQL).
I use the command:
sqoop export --hcatalog-database <my_db> --hcatalog-table <my_table> --columns "number,id" \
--connect jdbc:mysql://db...:3306/test \
--username <my_user> --password <my_passwd> --table <my_mysql_table>
However, it didn't work.
A similar scenario can work fine [1]. The requirement can be fulfilled by locating the Hive table's files on HDFS and using the following command:
sqoop export --export-dir /[hdfs_path] --columns "number,id" \
--connect jdbc:mysql://db...:3306/test \
--username <my_user> --password <my_passwd> --table <my_mysql_table>
Is there any solution that can fulfill my scenario via HCatalog?
Reference:
[1] Sqoop export from hive to oracle with different col names, number of columns and order of columns
I haven't used the HCatalog part of Sqoop, but as written in the manual, the following script should do the work:
sqoop export --hcatalog-database <my_db> --hcatalog-table <my_table> --map-column-hive "number,id" \
--connect jdbc:mysql://db...:3306/test \
--username <my_user> --password <my_passwd> --table <my_mysql_table>
The --map-column-hive option, when used along with --hcatalog, does the work for HCatalog instead of Hive.
Hope that this works for you.

sqoop export commands for the data which has spaces before in hdfs

I have data stored in HDFS that has spaces before and after each value. When I try to export it to MySQL it throws a NumberFormatException, but when I create the data without spaces it is inserted into MySQL successfully.
My question is: can't we export data that has spaces from HDFS to MySQL using the sqoop export command?
The data I used:
1201, adi, sen manager, 30000, it
1201, pavan, jun manager, 5000, cs
1203, santhosh, junior, 60000, mech
I created the table like this:
create table emp(id BIGINT,name varchar(20),desg varchar(20),salary BIGINT,dept varchar(20));
Sqoop command:
sqoop export \
--connect jdbc:mysql://127.0.0.1/mydb \
--username root \
--table emp \
--m 1 \
--export-dir /mydir \
--input-fields-terminated-by ',' \
--input-lines-terminated-by '\n'
Result: NumberFormatException, input string: '1201'
It can't parse the data.
I discussed it in a forum and they said to trim the spaces, but I want to know whether the spaces can be trimmed automatically while performing the sqoop export.
Can somebody give suggestions on this?
You can do one simple thing:
Create a temporary table in MySQL with all VARCHAR columns:
create table emp_temp(id varchar(20),name varchar(20),desg varchar(20),salary varchar(20),dept varchar(20));
Now create another table with the numeric fields after TRIM() and CAST():
create table emp as select CAST(TRIM(id) AS UNSIGNED), name, desg, CAST(TRIM(salary) AS UNSIGNED), dept FROM emp_temp;
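Not shown in the original answer: the Sqoop export in this scheme runs between the two statements above and targets the staging table, with the rest of the asker's options unchanged (a sketch):
sqoop export \
--connect jdbc:mysql://127.0.0.1/mydb \
--username root \
--table emp_temp \
-m 1 \
--export-dir /mydir \
--input-fields-terminated-by ',' \
--input-lines-terminated-by '\n'
The TRIM()/CAST() step above is what strips the spaces when populating emp.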
Sqoop internally runs a MapReduce job. A simple solution is to run a mapper that trims the spaces in your data, write the output to a different file, and run sqoop export on the new file.
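A hedged sketch of that idea using Hadoop Streaming with sed as the mapper, so no custom mapper class is needed (the streaming jar path varies by distribution, and /mydir_trimmed is a hypothetical output directory):
# map-only job that strips spaces around commas and at the line edges
hadoop jar /path/to/hadoop-streaming.jar \
-D mapreduce.job.reduces=0 \
-input /mydir \
-output /mydir_trimmed \
-mapper "sed -e 's/ *, */,/g' -e 's/^ *//' -e 's/ *$//'"
The export is then pointed at the cleaned directory with --export-dir /mydir_trimmed.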

sqoop import error for timestamp column in parquet table

I'm getting an error while mapping a SQL Server table to a Parquet table. I have made the Parquet table match the SQL Server table, with corresponding column data types.
But Sqoop infers the timestamp column as long, which creates a problem when loading data into the Parquet table. Loading the data into Parquet seems to be successful, but fetching it is a problem.
Error Message:
hive> select updated_at from bkfs.address_par1;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
Time taken: 0.146 seconds
Sqoop's Parquet import interprets the Oracle Date and Timestamp data types as long, i.e. it tries to store the date in Unix epoch format. So the import can be handled like below:
sqoop import \
--connect [connection string] \
--username [username] \
--password [password] \
--query "select to_char(date_col,'YYYY-MM-DD HH:mi:SS.SS') as date_col from test_table where \$CONDITIONS" \
--as-parquetfile \
-m 1 \
--delete-target-dir \
--target-dir /sample/dir/path/hive_table
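A follow-up note, not from the original answer: since the date column now arrives as a formatted string, the matching Hive-on-Parquet table would declare it as STRING and cast back at query time. A minimal sketch with a hypothetical table name:
-- the date column is stored as a formatted string in the Parquet files
CREATE TABLE test_table_par (date_col STRING) STORED AS PARQUET;
-- cast the string back to a timestamp when reading
SELECT CAST(date_col AS TIMESTAMP) FROM test_table_par;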
You can have a look at this question, which was posted already:
Sqoop function '--map-column-hive' being ignored

Incremental update in HIVE table using sqoop

I have a table in oracle with only 4 columns...
Memberid --- bigint
uuid --- String
insertdate --- date
updatedate --- date
I want to import that data into a Hive table using Sqoop. I created the corresponding Hive table with
create EXTERNAL TABLE memberimport(memberid BIGINT,uuid varchar(36),insertdate timestamp,updatedate timestamp)LOCATION '/user/import/memberimport';
and sqoop command
sqoop import --connect jdbc:oracle:thin:@dbURL:1521/dbName --username ** --password *** --hive-import --table MEMBER --columns 'MEMBERID,UUID,INSERTDATE,UPDATEDATE' --map-column-hive MEMBERID=BIGINT,UUID=STRING,INSERTDATE=TIMESTAMP,UPDATEDATE=TIMESTAMP --hive-table memberimport -m 1
It works properly and is able to import data into the Hive table.
Now I want to update this table incrementally, using updatedate as the check column (last value = today's date), so that the day-to-day updates to that OLTP table flow into my Hive table using Sqoop.
For the incremental import I am using the following sqoop command:
sqoop import --hive-import --connect jdbc:oracle:thin:@dbURL:1521/dbName --username *** --password *** --table MEMBER --check-column UPDATEDATE --incremental append --columns 'MEMBERID,UUID,INSERTDATE,UPDATEDATE' --map-column-hive MEMBERID=BIGINT,UUID=STRING,INSERTDATE=TIMESTAMP,UPDATEDATE=TIMESTAMP --hive-table memberimport -m 1
But I am getting the exception:
"Append mode for hive imports is not yet supported. Please remove the parameter --append-mode"
When I remove --hive-import it runs properly, but I do not find the new updates in the Hive table that are present in the OLTP table.
Am I doing anything wrong?
Please suggest how I can run an incremental update from Oracle to Hive using Sqoop.
Any help will be appreciated.
Thanks in advance.
Although I don't have the resources to replicate your scenario exactly, you might want to try building a Sqoop job and testing your use case:
sqoop job --create sqoop_job \
-- import \
--connect "jdbc:oracle://server:port/dbname" \
--username=(XXXX) \
--password=(YYYY) \
--table (TableName) \
--target-dir (Hive Directory corresponding to the table) \
--append \
--fields-terminated-by '(character)' \
--lines-terminated-by '\n' \
--check-column "(Column To Monitor Change)" \
--incremental append \
--last-value (last value of column being monitored) \
--outdir (log directory)
When you create a Sqoop job, it takes care of --last-value for subsequent runs. Also, here I have used the Hive table's data directory as the target for the incremental update.
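For reference, the saved job is then run and inspected with the standard sqoop job subcommands; Sqoop updates the stored --last-value after each successful run:
sqoop job --exec sqoop_job
sqoop job --show sqoop_job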
Hope this provides a helpful direction to proceed.
There is no direct way to achieve this in Sqoop. However, you can use a four-step strategy.
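As background (my summary, not part of the original answer), the commonly described four steps are: an incremental Sqoop import into a separate HDFS directory, an external Hive table over that directory, a reconciliation view that keeps only the newest row per key, and a periodic compaction that rewrites the base table from the view. A hedged HiveQL sketch of the reconciliation view, assuming a hypothetical memberimport_incr table holding the newly imported rows:
-- keep only the newest version of each memberid across base and incremental data
CREATE VIEW memberimport_reconciled AS
SELECT memberid, uuid, insertdate, updatedate
FROM (
  SELECT memberid, uuid, insertdate, updatedate,
         ROW_NUMBER() OVER (PARTITION BY memberid ORDER BY updatedate DESC) AS rn
  FROM (
    SELECT * FROM memberimport
    UNION ALL
    SELECT * FROM memberimport_incr
  ) all_rows
) ranked
WHERE rn = 1;
The compaction step then materializes this view into a fresh base table, typically followed by a swap.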

Using sqoop to create an orc table

I have a DB2 database running with a table t1, and a hadoop cluster. I want to create an orc table in hadoop, with the same table definition as t1.
For this task I want to use Sqoop.
I tried using the sqoop create-hive-table command, but this command isn't compatible with HCatalog, and from what I've found, HCatalog is the only route that lets me create ORC tables.
Instead I do this:
sqoop import \
--driver com.ibm.db2.jcc.DB2Driver \
--connect jdbc:db2://XXXXXXX \
--username user \
--password-file file:///pass.txt \
--query "select * from D1.t1 where \$CONDITIONS and reptime < '1864-11-16 13:23:54.749' fetch first 1 rows only" \
--split-by 1 \
--hcatalog-database default \
--hcatalog-table t1 \
--create-hcatalog-table \
--hcatalog-storage-stanza "stored as orcfile"
This queries the database for something that does not exist and creates an ORC table. Of course, this isn't optimal. Any ideas on how to do this with sqoop create-hive-table, or at least without having to run a useless database query that returns nothing?
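Not from the original post, just a hedged variant of the same workaround: an always-false predicate such as 1=0 removes the dependency on a magic date value while still returning zero rows, and --create-hcatalog-table should still create the ORC table during job setup (worth verifying on your Sqoop version):
sqoop import \
--driver com.ibm.db2.jcc.DB2Driver \
--connect jdbc:db2://XXXXXXX \
--username user \
--password-file file:///pass.txt \
--query "select * from D1.t1 where \$CONDITIONS and 1=0" \
-m 1 \
--hcatalog-database default \
--hcatalog-table t1 \
--create-hcatalog-table \
--hcatalog-storage-stanza "stored as orcfile"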
