sqoop incremental job is failing due to org.kitesdk.data.DatasetOperationException - oracle

I am trying to import data from Oracle into a Hive table using a Sqoop incremental job with the Parquet file format, but the job is failing with the error below:
Error: org.kitesdk.data.DatasetOperationException: Failed to append
{"CLG_ID": "5",.....19/03/27 00:37:06 INFO mapreduce.Job: Task Id :
attempt_15088_130_m_000_2, Status : FAILED
Command to create the saved job:
sqoop job -Dhadoop.security.credential.provider.path=jceks://xxxxx
--create job1 -- import --connect "jdbc:oracle:thinxxxxxx" --verbose --username user1 --password-alias alisas --query "select CLG_ID,.... from CLG_TBL where \$CONDITIONS" --as-parquetfile --incremental
append --check-column CLG_TS --target-dir /hdfs/clg_data/ -m 1
Command to execute the job:
sqoop job -Dhadoop.security.credential.provider.path=jceks:/xxxxx
--exec job1 -- --connect "jdbc:oracle:xxx"
--username user1 --password-alias alisas --query "select CLG_ID,.... from CLG_TBL where \$CONDITIONS" --target-dir /hdfs/clg_data/ -m 1
--hive-import --hive-database clg_db --hive-table clg_table --as-parquetfile

This error is a known issue. We ran into the same problem a couple of weeks ago and found this.
Here is the link.
Description of the problem or behavior
In HDP 3, managed Hive tables must be transactional (hive.strict.managed.tables=true). Transactional tables with Parquet format are not supported by Hive. Hive imports with --as-parquetfile must use external tables by specifying --external-table-dir.
Associated error message
Table db.table failed strict managed table checks due to the
following reason: Table is marked as a managed table but is not
transactional.
Workaround
When using --hive-import with --as-parquetfile, users must also provide --external-table-dir with a fully qualified location of the table:
sqoop import ... --hive-import
--as-parquetfile
--external-table-dir hdfs:///path/to/table
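Applied to the job in the question, the exec command would just gain that extra flag. This is only an untested sketch: the redacted connection values and the elided column list are left exactly as in the original post, and hdfs:///hdfs/clg_data/clg_table is merely an assumed location for the external table.
# same exec command as above, plus --external-table-dir for the Parquet Hive table
sqoop job -Dhadoop.security.credential.provider.path=jceks:/xxxxx \
--exec job1 -- --connect "jdbc:oracle:xxx" \
--username user1 --password-alias alisas \
--query "select CLG_ID,.... from CLG_TBL where \$CONDITIONS" \
--target-dir /hdfs/clg_data/ -m 1 \
--hive-import --hive-database clg_db --hive-table clg_table \
--as-parquetfile --external-table-dir hdfs:///hdfs/clg_data/clg_table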

Related

Getting Protocol violation error while sqoop import from oracle to hive

I am trying to import data from Oracle to Hive through the sqoop import command, but I am getting a java.sql.SQLException: protocol violation error. I checked and found one text column with length 4000.
When I removed that column and ran the sqoop command, it worked.
So it is only because of that column that I get the protocol violation error.
Is this because of the length, or something else?
Can someone help me solve this? Below is the sqoop command I am using:
sqoop import --connect jdbc:oracle:thin:#:port/servicename --username --password --query "select * from table_name where $CONDITIONS" --hive-drop-import-delims --target-dir /user/test --map-column-java -m 1
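No accepted fix is quoted here, but the --map-column-java approach from the "Cannot convert SQL type 2005" answer further down this page is usually the way to keep such a column instead of dropping it. A sketch only, assuming the 4000-character text column is named TEXT_COL (a hypothetical name) and with placeholder connection details:
# map the long text column explicitly to a Java String instead of removing it
sqoop import --connect jdbc:oracle:thin:@//<host>:<port>/<servicename> \
--username <user> -P \
--query 'select * from table_name where $CONDITIONS' \
--map-column-java TEXT_COL=String \
--hive-drop-import-delims --target-dir /user/test -m 1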

Error with sqoop import from mysql to hbase

I started learning Sqoop recently with the Cloudera CDH5 VM.
I created a MySQL table from a CSV file with the columns baseid, date, cars, kms.
Database used: mysql
Table created: uberdata
In the HBase shell, I created a table named myuberdatatable with the column family uber_details.
I checked with the scan command and saw an empty table with 0 rows.
To transfer the data from MySQL to HBase:
sqoop import jdbc:mysql://localhost/mysql --username root --password cloudera
--table uberdata --hbase-table myuberdatatable --column-family trip_details
--hbase-row-key base -m 1
I am getting the following error:
Syntax error, unexpected tIdentifier
with a mark showing before jdbc.
It could be a small error, but I have tried to find a solution on Stack Overflow.
Can anyone help fix this? Thanks in advance...
Yes, it is a syntax error. You have missed the --connect option in the sqoop import statement.
Please use this format (tested):
sqoop import --connect jdbc:mysql://localhost/emp --username root --password cloudera --table employee --hbase-table empdump --column-family emp_id --hbase-row-key id -m 1
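Applying the same fix to the command from the question would look roughly like this (an untested sketch: it keeps the uberdata table from the question, uses the uber_details column family that was actually created in the HBase shell, and uses the baseid column as the row key):
# note the --connect option, which the original command was missing
sqoop import --connect jdbc:mysql://localhost/mysql \
--username root --password cloudera \
--table uberdata --hbase-table myuberdatatable \
--column-family uber_details --hbase-row-key baseid -m 1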

Sqoop Imported Failed: Cannot convert SQL type 2005 when trying to import Oracle table

I get the following error when trying to import a table from an Oracle database as a parquet file.
ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 2005
This question has already been raised here, but the proposed solution does not help me.
I am trying to import a table from the command line using the following command, with the parameters in <> filled in with their corresponding values:
sqoop import --connect jdbc:oracle:thin:@<host>:<port>/<service> --username <user> --password <password> --hive-import --query 'SELECT * FROM <DB>.<table> WHERE $CONDITIONS' --split-by <ID> --hive-database <HIVE_DB> --hive-table <HIVE_TABLE> --incremental append --check-column <ID> --map-column-hive <ID>=integer --compression-codec=snappy --target-dir=/user/hive/<FOLDER> --as-parquetfile --last-value 0 -m 1
Does anyone know how to solve this? I am not an expert on the Oracle database being sqooped, but it seems to be due to the presence of CLOB data types.
I am running this command on CDH 5.8 with Sqoop 1.4.6.
Running the job without --as-parquetfile results in a sqoop job that seems to get stuck at map 0% reduce 0%.
Use --map-column-java to map the CLOB data type to a Java String.
For example, if you have a CLOB column C1, use:
--map-column-java C1=String
Check docs for more details.
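Applied to the command from the question, the mapping simply slots in next to the existing --map-column-hive option. A sketch only; <CLOB_COL> stands for whichever CLOB column the table actually contains:
# add --map-column-java for the CLOB column; everything else is unchanged
sqoop import --connect jdbc:oracle:thin:@<host>:<port>/<service> \
--username <user> --password <password> --hive-import \
--query 'SELECT * FROM <DB>.<table> WHERE $CONDITIONS' \
--split-by <ID> --hive-database <HIVE_DB> --hive-table <HIVE_TABLE> \
--incremental append --check-column <ID> \
--map-column-hive <ID>=integer --map-column-java <CLOB_COL>=String \
--compression-codec=snappy --target-dir=/user/hive/<FOLDER> \
--as-parquetfile --last-value 0 -m 1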

Oraoop disabled for Sqoop import

I'm using the Hortonworks HDP Sandbox, and I’ve installed Oraoop per the instructions, but whenever I run a Sqoop import I get the message “oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.”. I’m not sure what else I need to do for it to pick it up. I have verified that the oraoop driver is in my sqoop lib directory. The imports do work, but they are just using the oracle driver, and I would like to play around with some of the features that you get with Oraoop.
This is the command I'm running:
sqoop-import --connect jdbc:oracle:thin:@<ip>:1521/sid --username myUser -P --query "select * from mytable where \$CONDITIONS" -split-by sequence_id -as-sequencefile --target-dir /user/hue/data/deactivatedsponsor
If the '--query' argument is specified in place of the '--table' argument, the Oraoop connector is not used.
The following is mentioned in the Sqoop documentation:
Data Connector for Oracle and Hadoop accepts responsibility for those Sqoop Jobs with the following attributes:
Oracle-related
Table-Based - Jobs where the table argument is used and the specified object is a table.
The following command should use the Oraoop connector. I have included the "--direct" option as well, which indicates to Sqoop that Oraoop should be used.
sqoop-import --connect jdbc:oracle:thin:@<ip>:1521/sid --direct --username myUser -P --table mytable --split-by sequence_id --as-sequencefile --target-dir /user/hue/data/deactivatedsponsor --columns <columns list> --where <where condition if needed>
The Oraoop connector cannot handle the --query option; when you use --query, Sqoop automatically falls back to the generic connector.
So instead of --query, use --table for the import.
Hope this helps!!

Moving Sqoop data from HDFS to Hive

When importing a bunch of large MySQL tables into HDFS using Sqoop, I forgot to include the --hive-import flag. So now I've got these tables sitting in HDFS, and am wondering if there's an easy way to load the data into Hive (without writing the LOAD DATA statements myself).
I tried using sqoop create-hive-table:
./bin/sqoop create-hive-table --connect jdbc:mysql://xxx:3306/dw --username xxx --password xxx --hive-import --table tweets
While this did create the correct hive table, it didn't import any data into it. I have a feeling I'm missing something simple here...
For the record, I am using Elastic MapReduce, with Sqoop 1.4.1.
Can't you create an external table in hive and point it to these files?
create external table something(a string, b string) location 'hdfs:///some/path'
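Since a plain Sqoop text import writes comma-delimited files by default, the DDL usually also needs the row format spelled out. A sketch run from the shell, with hypothetical column names and an assumed HDFS path for the tweets import:
# external table over the files Sqoop already wrote; adjust columns, types and path
hive -e "CREATE EXTERNAL TABLE tweets_ext (id BIGINT, user_name STRING, body STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hadoop/tweets';"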
You did not specify the "import" tool in your command. The syntax is sqoop tool-name [tool-arguments].
It should look like this:
$ sqoop import --create-hive-table --connect jdbc:mysql://xxx:3306/dw --username xxx --password xxx --hive-import --table tweets
