SQOOP-IMPORT: create-hcatalog-table vs create-hive-table

I am trying to do a sqoop-import into a Hive database.
I would like to know the difference between these two available options:
create-hcatalog-table vs create-hive-table
sqoop-import
--connect jdbc:mysql://localhost:3306/hadoopexample
--table employees
--create-hive-table
--fields-terminated-by ','
;
vs
sqoop-import
--connect jdbc:mysql://localhost:3306/hadoopexample
--table employees
--create-hcatalog-table
--fields-terminated-by ','
;
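For reference, here is a rough sketch of how the two options are typically used (untested; the HCatalog database and table names below are placeholders I added, not taken from the question). --create-hive-table belongs to the Hive import path and makes the job fail if the target Hive table already exists:
sqoop import \
  --connect jdbc:mysql://localhost:3306/hadoopexample \
  --table employees \
  --hive-import --create-hive-table \
  --fields-terminated-by ','
--create-hcatalog-table belongs to the HCatalog import path, is normally paired with --hcatalog-database and --hcatalog-table, and creates the table only if it does not already exist:
sqoop import \
  --connect jdbc:mysql://localhost:3306/hadoopexample \
  --table employees \
  --hcatalog-database default \
  --hcatalog-table employees \
  --create-hcatalog-table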

Related

Sqoop Hive Import not support alphanumeric (plus '_')

I would like to import data from Oracle to Hive as a Parquet file using Sqoop.
I have been trying to import the data with the following command:
sqoop import --as-parquetfile --connect jdbc:oracle:thin:@10.222.14.11:1521/eservice --username MOJETL --password-file file:///home/$(whoami)/MOJ_Analytic/moj_analytic/conf/.djoppassword --query 'SELECT * FROM CMST_OFFENSE_RECORD_FAMILY WHERE $CONDITIONS' --fields-terminated-by ',' --escaped-by ',' --hive-overwrite --hive-import --hive-database default --hive-table tmp3_cmst_offense_record_family --hive-partition-key load_dt --hive-partition-value '20200213' --split-by cmst_offense_record_family_ref --target-dir hdfs://nameservice1:8020/landing/tmp3_cmst_offense_record_family/load_dt=20200213
I get the following error:
ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name default.tmp3_cmst_offense_record_family is not alphanumeric (plus '_')
org.kitesdk.data.ValidationException: Dataset name default.tmp3_cmst_offense_record_family is not alphanumeric (plus '_')
I've tried removing the Hive-related options:
sqoop import --as-parquetfile --connect jdbc:oracle:thin:@10.222.14.11:1521/eservice --username MOJETL --password-file file:///home/$(whoami)/MOJ_Analytic/moj_analytic/conf/.djoppassword --query 'SELECT * FROM CMST_OFFENSE_RECORD_FAMILY WHERE $CONDITIONS' --fields-terminated-by ',' --escaped-by ',' --split-by cmst_offense_record_family_ref --target-dir hdfs://nameservice1:8020/landing/tmp3_cmst_offense_record_family/load_dt=20200213
but I still got the same kind of error:
ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name load_dt=20200213 is not alphanumeric (plus '_')
org.kitesdk.data.ValidationException: Dataset name load_dt=20200213 is not alphanumeric (plus '_')
Try rewriting this part:
--hive-table default.tmp3_cmst_offense_record_family
as this:
--hive-table tmp3_cmst_offense_record_family
You have already specified the database name with the --hive-database option.
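Putting that together with the command from the question, the import would look roughly like this (untested sketch; only the --hive-table value changes, everything else is as posted, and note that the second error above suggests Kite may also object to the '=' in the --target-dir path):
sqoop import --as-parquetfile \
  --connect jdbc:oracle:thin:@10.222.14.11:1521/eservice \
  --username MOJETL \
  --password-file file:///home/$(whoami)/MOJ_Analytic/moj_analytic/conf/.djoppassword \
  --query 'SELECT * FROM CMST_OFFENSE_RECORD_FAMILY WHERE $CONDITIONS' \
  --fields-terminated-by ',' --escaped-by ',' \
  --hive-overwrite --hive-import \
  --hive-database default \
  --hive-table tmp3_cmst_offense_record_family \
  --hive-partition-key load_dt --hive-partition-value '20200213' \
  --split-by cmst_offense_record_family_ref \
  --target-dir hdfs://nameservice1:8020/landing/tmp3_cmst_offense_record_family/load_dt=20200213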

Sqoop - FileAlreadyExists exception

I need some help with sqoop.
First of all, I'm sorry, my English isn't very good.
I am using the following command:
sqoop import -D mapreduce.output.fileoutputformat.compress=false --num-mappers 1 --connection-manager "com.quest.oraoop.OraOopConnManager" --connect "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver)(PORT=1534)))(CONNECT_DATA=(SERVICE_NAME=myservice)))" --username "rodrigo" --password pwd \
--query "SELECT column1, column2 from myTable where \$CONDITIONS" \
--null-string '' --null-non-string '' --fields-terminated-by '|' \
--lines-terminated-by '\n' --as-textfile --target-dir /data/rodrigo/myTable \
--hive-import --hive-partition-key yearmonthday --hive-partition-value '20180101' --hive-overwrite --verbose -P --m 1 --hive-table myTable
My table is already created, because I have to file a request to get a table created in my Hive database, so I can't create it dynamically from within the Sqoop command.
I have permission to create the directory in HDFS.
When I remove the directory, Sqoop logs an error saying that I have no create-table permissions, and when the directory already exists, it returns a FileAlreadyExistsException.
What can I do to solve that?
Thanks from Brazil.
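Not an answer from the original thread, but one thing that may be worth trying here: Sqoop's --delete-target-dir option removes the import staging directory before the job runs, which sidesteps the FileAlreadyExistsException without deleting the directory by hand (it does not address the create-table permission message, which looks like a separate Hive authorization issue). A sketch of the posted command with that flag added and the redundant --password/-P reduced to just -P (untested):
sqoop import -D mapreduce.output.fileoutputformat.compress=false --num-mappers 1 \
  --connection-manager "com.quest.oraoop.OraOopConnManager" \
  --connect "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver)(PORT=1534)))(CONNECT_DATA=(SERVICE_NAME=myservice)))" \
  --username "rodrigo" -P \
  --query "SELECT column1, column2 from myTable where \$CONDITIONS" \
  --null-string '' --null-non-string '' --fields-terminated-by '|' \
  --lines-terminated-by '\n' --as-textfile \
  --target-dir /data/rodrigo/myTable --delete-target-dir \
  --hive-import --hive-partition-key yearmonthday --hive-partition-value '20180101' \
  --hive-overwrite --verbose -m 1 --hive-table myTable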

Sqoop append not working for hdfs

I tried to append (CDC) new data from a Teradata table to an HDFS path containing already-moved old data.
sqoop import --connect 'jdbc:teradata://xx.xx.xx.xx/database_name' --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc --password dbc --query "select * from database_name.table_name WHERE (1=0 OR \$CONDITIONS) tymstp>'2016-07-28 04:49:53.44'" --fields-terminated-by '\t' --lines-terminated-by '\n' --target-dir /hdfs/path -m 1 --append
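There is no answer in this excerpt, but two observations of my own: as posted, the query has no AND/OR between the \$CONDITIONS placeholder and the tymstp predicate, so the SQL itself is invalid, and a timestamp-filtered pull like this is also what Sqoop's --incremental lastmodified mode (with --check-column/--last-value) is designed for. A minimal corrected sketch of the posted command, keeping the manual filter (untested):
sqoop import --connect 'jdbc:teradata://xx.xx.xx.xx/database_name' \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password dbc \
  --query "select * from database_name.table_name WHERE \$CONDITIONS AND tymstp > '2016-07-28 04:49:53.44'" \
  --fields-terminated-by '\t' --lines-terminated-by '\n' \
  --target-dir /hdfs/path -m 1 --append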

Import BLOB (Image) from oracle to hive

I am trying to import BLOB (image) data from Oracle to Hive using the Sqoop command below:
sqoop import --connect jdbc:oracle:thin:@host --username --password --m 3 --table tablename --hive-drop-import-delims --hive-table tablename --target-dir '' --split-by id;
But it was unsuccessful. Remember, the BLOB data is stored in the Oracle database as hexadecimal, and we need to store it in the Hive table as text or binary.
What are the possible ways to do that?
Sqoop does not know how to map the Oracle BLOB datatype into Hive, so you need to specify --map-column-hive COLUMN_BLOB=binary:
sqoop import --connect 'jdbc:oracle:thin:@host' --username $USER --password $Password --table $TABLE --hive-import --hive-table $HiveTable --map-column-hive COL_BLOB=binary --delete-target-dir --target-dir $TargetDir -m 1 -verbose

Using sqoop import, How to append rows into existing hive table?

From SQL Server, I imported data and created a Hive table using the query below:
sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --hive-import --hive-table hivedb.demotable --create-hive-table --fields-terminated-by ','
The command was successful; it imported the data and created a table with 10,000 records.
I then inserted 10 new records in SQL Server and tried to append these 10 records into the existing Hive table using the --where clause:
sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --where "ID > 10000" --hive-import -hive-table hivedb.demotable
But the Sqoop job fails with the error:
ERROR tool.ImportTool: Error during import: Import job failed!
Where am I going wrong? Are there any other alternatives to insert into the table using Sqoop?
EDIT:
After slightly changing the above command, I am able to append the new rows:
sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --where "ID > 10000" --hive-import -hive-table hivedb.demotable --fields-terminated-by ',' -m 1
Though it resolves the mentioned problem, I can't insert the modified rows. Is there any way to insert the modified rows without using the --incremental lastmodified parameter?
In order to append rows to the Hive table, use the same query you have been using before, just remove the --hive-overwrite.
I will share the two commands that I used to import into Hive, one for overwriting and one for appending; you can use the same for your import:
To OVERWRITE the previous records
sqoop import -Dmapreduce.job.queuename=default --connect jdbc:teradata://database_connection_string/DATABASE=database_name,TMODE=ANSI,LOGMECH=LDAP --username z****** --password ******* --query "select * from ****** where \$CONDITIONS" --split-by "HASHBUCKET(HASHROW(key to split)) MOD 4" --num-mappers 4 --hive-table hive_table_name --boundary-query "select 0, 3 from dbc.dbcinfo" --target-dir directory_name --delete-target-dir --hive-import --hive-overwrite --driver com.teradata.jdbc.TeraDriver
To APPEND to the previous records
sqoop import -Dmapreduce.job.queuename=default --connect jdbc:teradata://connection_string/DATABASE=db_name,TMODE=ANSI,LOGMECH=LDAP --username ****** --password ****** --query "select * from **** where \$CONDITIONS" --split-by "HASHBUCKET(HASHROW(key to split)) MOD 4" --num-mappers 4 --hive-import --hive-table guestblock.prodrptgstrgtn --boundary-query "select 0, 3 from dbc.dbcinfo" --target-dir directory_name --delete-target-dir --driver com.teradata.jdbc.TeraDriver
Note that I am using 4 mappers, you can use more as well.
I am not sure if you can give a direct --append option in Sqoop with the --hive-import option. It's still not available, at least in version 1.4.
The default behavior is append when --hive-overwrite and --create-hive-table are missing (at least in this context).
I go with nakulchawla09's answer, but remind yourself to keep the --split-by option. This will ensure the split file names in the Hive data store are created appropriately; otherwise you will not like the default naming. You can ignore this comment if you don't care about the backstage Hive warehouse naming and backstage data store. Here is what I saw when I tried the command below.
Before the append
beeline:hive2> select count(*) from geolocation;
+-------+--+
| _c0 |
+-------+--+
| 8000 |
+-------+--+
file in hive warehouse before the append
-rwxrwxrwx 1 root hdfs 479218 2018-10-12 11:03 /apps/hive/warehouse/geolocation/part-m-00000
Sqoop command for appending an additional 8,000 records:
sqoop import --connect jdbc:mysql://localhost/RAWDATA --table geolocation --username root --password hadoop --target-dir /rawdata --hive-import --driver com.mysql.jdbc.Driver --m 1 --delete-target-dir
It created the files below. You can see the file names are not great, because I did not give a --split-by option or a split hash (which can be a datetime or a date).
-rwxrwxrwx 1 root hdfs 479218 2018-10-12 11:03 /apps/hive/warehouse/geolocation/part-m-00000
-rwxrwxrwx 1 root hdfs 479218 2018-10-12 11:10 /apps/hive/warehouse/geolocation/part-m-00000_copy_1
Hive records after the append:
beeline:hive2> select count(*) from geolocation;
+-------+--+
| _c0 |
+-------+--+
| 16000 |
+-------+--+
We can use this command:
sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --query 'select * from demotable where ID > 10000 and $CONDITIONS' --hive-import --hive-table hivedb.demotable --target-dir demotable_data -m 1
Use the --append option and -m 1, so it will look like below:
sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --hive-import --hive-table hivedb.demotable --append -m 1
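For completeness (my addition, not from the answers above): Sqoop also has a built-in --incremental append mode that tracks new rows by a monotonically increasing column instead of a hand-written --where filter. A sketch against the same table, assuming ID is that column; note that whether --incremental can be combined with --hive-import depends on your Sqoop version, and a saved sqoop job is usually used so the last value is remembered between runs:
sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' \
  --username uname --password paswd \
  --table demotable \
  --hive-import --hive-table hivedb.demotable \
  --incremental append --check-column ID --last-value 10000 \
  --fields-terminated-by ',' -m 1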
