Sqoop export of a hive table partitioned on an int column - hadoop

I have a Hive table partitioned on an 'int' column.
I want to export the Hive table to MySql using Sqoop export tool.
sqoop export --connect jdbc:mysql://XXXX:3306/temp --username root --password root --table emp --hcatalog-database temp --hcatalog-table emp
I tried the above Sqoop command, but it failed with the exception below.
ERROR tool.ExportTool: Encountered IOException running export job: java.io.IOException:
The table provided temp.emp uses unsupported partitioning key type for column mth_id : int.
Only string fields are allowed in partition columns in HCatalog
I understand that partitioning on an int column is not supported in HCatalog.
But I would like to check whether this issue has been fixed in any of the latest releases with an extra config/option.
As a workaround, I can create another table without the partition before exporting (see the sketch below), but I would like to know whether there is a better way to achieve this.
Thanks in advance.
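For reference, a minimal sketch of that staging-table workaround (the staging table name emp_export is a placeholder; the connection details are taken from the command above). First, in Hive, materialize the partitioned table into an unpartitioned copy, so the int partition column becomes a regular column:
CREATE TABLE temp.emp_export AS SELECT * FROM temp.emp;
Then export the staging table instead of the partitioned one:
sqoop export --connect jdbc:mysql://XXXX:3306/temp --username root --password root --table emp --hcatalog-database temp --hcatalog-table emp_export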

Related

Can Sqoop update records in an Oracle RDBMS table that has a different column structure from the Hive table

I'm a Hadoop newcomer trying to export data from Hive to Oracle. Can Sqoop update data in an Oracle table? Let's say:
The Oracle table has columns A, B, C, D, E
I stored the data in the Hive table as B, C, E
Can Sqoop export do an update (just update, not upsert) with B, C as the update keys and update only the E column from Hive?
Please mention --update-key Prim_key_col_in_table. Note that the default --update-mode is updateonly, so you don't have to specify it explicitly.
You can also add the --input-fields-terminated-by option if you want to.
Here is a sample command:
sqoop export --connect jdbc:mysql://xxxxxx/mytable --username xxxxx --password xxxxx --table export_sqoop_mytable --update-key Prim_key_col_in_table --export-dir /user/ingenieroandresangel/datasets/mytable.txt -m 1
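Applied to the scenario in the question, a hedged sketch might look like the following (the connection string, table name, and export directory are placeholders, and it assumes the Hive table uses the default Ctrl-A field delimiter); --columns restricts the export to the columns present on the Hive side, and --update-key accepts a comma-separated list:
sqoop export --connect jdbc:oracle:thin:@//dbhost:1521/ORCL --username xxxx --password xxxx \
--table ORACLE_TABLE --columns B,C,E \
--update-key B,C --update-mode updateonly \
--export-dir /user/hive/warehouse/hive_table \
--input-fields-terminated-by '\001' -m 1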

Loading Sequence File data into hive table created using stored as sequence file failing

I am importing the content from MySQL into HDFS as sequence files using the Sqoop import command below:
sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba --password cloudera \
--table orders \
--target-dir /user/cloudera/sqoop_import_seq/orders \
--as-sequencefile \
--lines-terminated-by '\n' --fields-terminated-by ','
Then I create the Hive table using the command below:
create table orders_seq(order_id int,order_date string,order_customer_id int,order_status string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS SEQUENCEFILE
But when I try to load the sequence-file data produced by the first command into the Hive table using the command below:
LOAD DATA INPATH '/user/cloudera/sqoop_import_seq/orders' INTO TABLE orders_seq;
It gives the error below:
Loading data to table practice.orders_seq
Failed with exception java.lang.RuntimeException: java.io.IOException: WritableName can't load class: orders
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
Where am I going wrong?
First of all, is it necessary to have the data in that format?
Let's suppose you do need the data in that format. The LOAD DATA command is not necessary: once Sqoop finishes importing the data, you just have to create a Hive table pointing to the same directory where Sqoop wrote the data.
One side note from your scripts:
create table orders_seq(order_id int,order_date string,order_customer_id int,order_status string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS SEQUENCEFILE
Your Sqoop command says --fields-terminated-by ',' but when you create the table you use FIELDS TERMINATED BY '|'; the two delimiters need to match.
In my experience, the best approach is to sqoop the data as Avro; this automatically creates an Avro schema. Then you just have to create a Hive table using the previously generated schema (AvroSerDe) and the location where you stored the data from the Sqoop process.
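For example, a rough sketch of that Avro-based flow, reusing the import from the question (the schema path and table name are illustrative, not from the original post). Import as Avro data files:
sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba --password cloudera \
--table orders \
--target-dir /user/cloudera/sqoop_import_avro/orders \
--as-avrodatafile
Sqoop writes an Avro schema file (e.g. orders.avsc) into the local code-generation directory; copy it to HDFS:
hadoop fs -mkdir -p /user/cloudera/schemas
hadoop fs -put orders.avsc /user/cloudera/schemas/orders.avsc
Then create an external Avro-backed Hive table over the imported data:
CREATE EXTERNAL TABLE orders_avro
STORED AS AVRO
LOCATION '/user/cloudera/sqoop_import_avro/orders'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/cloudera/schemas/orders.avsc');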

incremental "lastmodified" not working in sqoop

I'm trying sqoop to perform incremental import from Teradata DB to Hive. Below is the query:
sqoop import --connect jdbc:teradata://xxx.xxx.x.xx/DATABASE=DBN --driver com.teradata.jdbc.TeraDriver --username userN --password pass --query "SELECT alias.colA, alias.call_date, alias.colB, alias.colC FROM tableName alias where \$CONDITIONS" --target-dir /apps/hive/warehouse/staging.db/tableName -m 26 --check-column call_date --incremental append --split-by alias.colA --last-value '2016-02-01'
The column call_date is of DATE type, values in the format 'YYYY-MM-DD'.
When I use 'append' for --incremental, everything works fine. But when I put 'lastmodified', the following error is thrown:
ERROR util.SqlTypeMap: It seems like you are looking up a column that does not
ERROR util.SqlTypeMap: exist in the table. Please ensure that you've specified
ERROR util.SqlTypeMap: correct column names in Sqoop options.
ERROR tool.ImportTool: Imported Failed: column not found: call_date
I'm using sqoop 1.4.4.2.1 on HDP 2.1
While Teradata DB is 14.10
Any pointers will be helpful.
I think, in the case of a free-form query, you can perform the last-value check in the query itself, something like this:
"SELECT alias.colA, alias.call_date, alias.colB, alias.colC FROM tableName alias where call_date > '2016-02-01' and \$CONDITIONS"
Reference (see the section "Incrementally Updating Data in Hive" > "1. Ingest the data"):
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html
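Putting that together with the command from the question, a hedged sketch (since the date filter now lives inside the query, the --incremental options are dropped and plain --append is used to add new rows to the existing target directory; the date value is just an example):
sqoop import --connect jdbc:teradata://xxx.xxx.x.xx/DATABASE=DBN --driver com.teradata.jdbc.TeraDriver \
--username userN --password pass \
--query "SELECT alias.colA, alias.call_date, alias.colB, alias.colC FROM tableName alias where alias.call_date > '2016-02-01' and \$CONDITIONS" \
--target-dir /apps/hive/warehouse/staging.db/tableName \
--split-by alias.colA -m 26 --append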

Sqoop Hive table import, Table dataType doesn't match with database

I am using Sqoop to import data from Oracle to Hive. It works fine, but it creates the Hive table with only two data types, String and Double. I want to use Timestamp as the data type for some columns.
How can I do it?
bin/sqoop import --table TEST_TABLE --connect jdbc:oracle:thin:@HOST:PORT:orcl --username USER1 -password password -hive-import --hive-home /user/lib/Hive/
In addition to the above answers, we may also have to observe when the error occurs. For example, in my case two types of data columns caused errors: json and binary.
For the json columns, the error came during class generation (orm.ClassWriter), at the very beginning of the import process:
16/04/19 09:37:58 ERROR orm.ClassWriter: Cannot resolve SQL type
For the binary column, the error was thrown while importing into the Hive table (after the data had been imported and put into HDFS files):
16/04/19 09:51:22 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive does not support the SQL type for column featured_binary
To get rid of these two errors, I had to provide the following options:
--map-column-java column1_json=String,column2_json=String,featured_binary=String --map-column-hive column1_json=STRING,column2_json=STRING,featured_binary=STRING
In summary, we may have to provide --map-column-java or --map-column-hive, depending on where the failure occurs.
You can use the parameter --map-column-hive to override default mapping. This parameter expects a comma-separated list of key-value pairs separated by = to specify which column should be matched to which type in Hive.
sqoop import \
...
--hive-import \
--map-column-hive id=STRING,price=DECIMAL
A new feature was added with SQOOP-2103 / Sqoop 1.4.5 that lets you specify the decimal precision with the --map-column-hive parameter. Example:
--map-column-hive 'TESTDOLLAR_AMT=DECIMAL(20%2C2)'
This syntax would define the field as a DECIMAL(20,2). The %2C is used as a comma and the parameter needs to be in single quotes if submitting from the bash shell.
I tried using Decimal with no modification and I got a Decimal(10,0) as a default.
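Applied to the Oracle import in the question, a hedged sketch (CREATED_DATE is a placeholder for whichever column should become a timestamp; the rest is the asker's command):
bin/sqoop import --table TEST_TABLE --connect jdbc:oracle:thin:@HOST:PORT:orcl --username USER1 --password password --hive-import --hive-home /user/lib/Hive/ --map-column-hive CREATED_DATE=TIMESTAMP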

Sqoop Import is completed successfully. How to view these tables in Hive

I am trying out Hadoop and its related tools. For this, I have configured Hadoop, HBase, Hive, and Sqoop on an Ubuntu machine.
raghu#system4:~/sqoop$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/mysql --username root --password password --table user --hive-import -m 1
Everything goes fine, but when I enter the Hive command line and execute show tables, there is nothing. I am able to see that these tables have been created in HDFS.
I have seen some options in Sqoop import: it can import to Hive/HDFS/HBase.
When importing into Hive, it is indeed importing directly into HDFS. Then why Hive?
Where can I execute HiveQL to check the data?
From Cloudera Support, I understood that I can use Hue and check it. But I think Hue is just a user interface to Hive.
Could someone help me here?
Thanks in advance,
Raghu
I was having the same issue. I was able to work around it by importing the data directly into HDFS and then creating an external Hive table pointing at that specific location in HDFS. Here is an example that works for me.
create external table test (
sequencenumber int,
recordkey int,
linenumber int,
type string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
location '/user/hdfs/testdata';
You will need to change your location to where you saved the data in HDFS.
Can you post the output from sqoop? Try using --verbose option.
Here's an example of the command I use, and it does import directly to a Hive table.
sqoop import --hive-overwrite --hive-drop-import-delims --warehouse-dir "/warehouse" --hive-table hive_users --connect jdbc:mysql://$MYSQL_HOST/$DATABASE_NAME --table users --username $MYSQL_USER --password $MYSQL_PASS --hive-import
When we do not specify any database in the Sqoop import command, the table is created in the default database with the same name as the RDBMS table.
You can specify the Hive database into which you want to import the RDBMS table with "--hive-database".
Instead of creating the Hive table every time, you can import the table structure into Hive using Sqoop's create-hive-table tool (see the sketch below). It imports the table as a managed table; you can then convert it to an external table by changing the table properties, and add partitions afterwards. This reduces the effort of finding the right data types. Please note that there may be precision changes.
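A hedged sketch of that flow (connection details and names are placeholders, reused from the question). Generate the Hive table definition from the RDBMS metadata:
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/mysql --username root --password password --table user --hive-table default.user
Then, in Hive, flip the managed table to an external table:
ALTER TABLE default.user SET TBLPROPERTIES ('EXTERNAL'='TRUE');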
Whenever you use Sqoop with the Hive import option, Sqoop connects directly to the corresponding database, gets the table's metadata (its schema), and provides that schema to Hive, so there is no need to create the table structure in Hive yourself.
Without the Hive import option, the Sqoop output on HDFS is by default stored under the default directory, i.e. /user/sqoop/tablename/part-m files.
With the Hive import option, the tables are loaded directly into the default warehouse directory, i.e.
/user/hive/warehouse/tablename
command: sudo -u hdfs hadoop fs -ls -R /user/
This recursively lists all the files under /user/.
Now go to Hive and type show databases. If there is only the default database, then type show tables.
Remember that OK is common default system output and is not part of the command output.
hive> show databases;
OK
default
Time taken: 0.172 seconds
hive> show tables;
OK
genre
log_apache
movie
moviegenre
movierating
occupation
user
Time taken: 0.111 seconds
Try a Sqoop command like this; it works for me and creates the Hive table directly, so you need not create an external table every time:
sqoop import --connect DB_HOST --username ***** --password ***** --query "select * from SCHEMA.TABLE where \$CONDITIONS" \
--num-mappers 5 --split-by PRIMARY_KEY --hive-import --hive-table HIVE_DB.HIVE_TABLE_NAME --target-dir SOME_DIR_NAME
The command you are using imports data into the $HIVE_HOME directory. If the HIVE_HOME environment variable is not set or points to a wrong directory, you will not be able to see imported tables.
The best way to find the hive home directory is to use the Hive QL SET command:
hive -S -e 'SET' | grep warehouse.dir
Once you have retrieved the Hive home directory, append the --hive-home <hive-home-dir> option to your command.
Another possible reason is that in some Hive setups the metadata is cached and you cannot see the changes immediately. In this case you need to flush the metadata cache, using the INVALIDATE METADATA; command.
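For example, the command from the question with an explicit --hive-home appended (the path is illustrative; substitute the directory you retrieved above):
bin/sqoop-import --connect jdbc:mysql://localhost:3306/mysql --username root --password password --table user --hive-import -m 1 --hive-home /usr/lib/hive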
