SQOOP to HIVE in Parquet - parquet

SQOOP is not able to create a hive table in PARQUET format.
sqoop import --connect jdbc:mysql://localhost:3306/sqoop_practice --username root --password hortonworks1 -m 1 --delete-target-dir --target-dir /user/hduser/sqoop/import_customer --driver com.mysql.jdbc.Driver --fetch-size 1000 --table customer --fields-terminated-by '~' --hive-import --hive-database hivepractice --hive-table customer_parquet --as-parquetfile;
Error: Caused by: MetaException(message:Table hivepractice.customer_parquet failed strict managed table checks due to the following reason: Table is marked as a managed table but is not transactional.)

Related

SemanticException 10072: database does not exist (Sqoop)

I created a hive internal table using sqoop command.
sqoop import -Dmapreduce.map.memory.mb=4096
--driver com.mysql.jdbc.Driver
--connect 'jdbc:mysql://{mysql_url}'
--username 'xxxx'
--password 'xxxx'
--input-fields-terminated-by '\t'
--split-by id
--target-dir {hdfs_path}
--verbose -m 1
--hive-drop-import-delims
--fields-terminated-by '\t'
--hive-import
--hive-table '{table_name}'
--query "select id from temp WHERE \$CONDITIONS LIMIT 10"
I created a table into it and it was working find.
19/01/06 19:33:44 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://hadoop/{hdfs_path}' INTO TABLE `tmp.temp`
19/01/06 19:33:44 INFO hive.HiveImport: Loading uploaded data into Hive
19/01/06 19:33:44 DEBUG hive.HiveImport: Using in-process Hive instance.
19/01/06 19:33:44 DEBUG util.SubprocessSecurityManager: Installing subprocess security manager
Logging initialized using configuration in jar:file:${HADOOP_HOME}/hive-1.1.0-cdh5.14.2/lib/hive-common-1.1.0-cdh5.14.2.jar!/hive-log4j.properties
It created in hdfs warehouse location.
$ hadoop dfs -ls {hdfs_path}
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
19/01/06 19:43:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
0 2019-01-06 19:33 {hdfs_path}/_SUCCESS
65 2019-01-06 19:33 {hdfs_path}/part-m-00000.gz
BUT it was an error:
FAILED: SemanticException [Error 10072]: Database does not exist: tmp
I already hive-site.xml into sqoop conf directory.
cp ${HIVE_HOME}/conf/hive-site.xml ${SQOOP_HOME}/conf/hive-site.xml
"hive.metastore.uris" was set local and remote thrift.
How do I do? Help me. Thanks
Please used this sqoop command to import data from sqoop to hive with existing table and change clause according to your requirement.i have just modified some clause and added some clause as per requirement in your command
ubuntu#localhost:/usr/local/hive$ sqoop import -Dmapreduce.map.memory.mb=4096 --connect 'jdbc:mysql://localhost/test' --username 'root' -P --input-fields-terminated-by ',' --split-by id --target-dir /user/hive/warehouse/test_hive --hive-drop-import-delims --fields-terminated-by ',' --hive-import --hive-database default --hive-table test_hive --query "select id from test WHERE \$CONDITIONS LIMIT 10" --driver com.mysql.jdbc.Driver --delete-target-dir
Happy Hadooppppppppp

Getting Error When trying to Import Data From Oracle to HIVE

I am running below command to import data from Oracle Db to HIVE
sqoop import --connect jdbc:oracle:thin:#//localhost:1521/newDB --username <USERNAME> --P --table <ORACLE_TABLE_NAME> --hive-table <HIVE_TABLE_NAME> --hive-import -m 1
I am getting below Error when i am running this Query
17/11/21 05:05:46 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
17/11/21 05:05:46 INFO manager.SqlManager: Using default fetchSize of 1000
17/11/21 05:05:46 INFO tool.CodeGenTool: Beginning code generation
17/11/21 05:05:47 INFO manager.OracleManager: Time zone has been set to GMT
17/11/21 05:05:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "<TABLE_NAME>" t WHERE 1=0
17/11/21 05:05:47 ERROR tool.ImportTool: Imported Failed: There is no column found in the target table <TABLE_NAME>. Please ensure that your table name is correct.
Check the owner of table and try using table_owner.table_name for oracle source.
I find you are not used --create-hive-table and few other parameter in your query.
Below is the sqoop import query i use in my project:
oracle_connection.txt will have the connection info.
sqoop --options-file oracle_connection.txt \
--table $DATABASE.$TABLENAME \
-m $NUMMAPPERS \
--where "$CONDITION" \
--hive-import \
--map-column-hive "$COLLIST" \
--create-hive-table \
--hive-drop-import-delims \
--split-by $SPLITBYCOLUMN \
--hive-table $HIVEDATABASE.$TABLENAME \
--bindir sqoop_hive_rxhome/bindir/ \
--outdir sqoop_hive_rxhome/outdir

How to import from sql server to hive using sqoop in java jdbc?

I have a table in sql server which i should import it to hive with sqoop using jdbc in java, How can i connect to hive with sqoop using JDBC and import?
sqoop import --connect jdbc:mysql://localhost:3306/sqoopdatabase --username sqoopuser --P --table employee -m 1 --hive-database hivelearning --hive-import --hive-drop-import-delims

sqoop import error from teradata to hive

I am using below given sqoop command:
sqoop import
--libjars /usr/hdp/2.4.0.0-169/sqoop/lib,/usr/hdp/2.4.0.0-169/hive/lib
--connect jdbc:teradata://x/DATABASE=x
--connection-manager org.apache.sqoop.teradata.TeradataConnManager
--username ec
--password dc
--query "select * from hb where yr_nbr=2017"
--hive-table schema.table
--num-mappers 1
--hive-import
--target-dir /user/hive/warehouse/GG
I'm getting this error:
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
17/04/06 11:15:41 INFO mapreduce.Job: map 100% reduce 0%
17/04/06 11:15:41 INFO mapreduce.Job: Task Id : attempt_1491466460468_0029_m_000000_1, Status : FAILED
Error: org.apache.hadoop.fs.FileAlreadyExistsException: /user/root/temp_111508/part-m-00000 for client 192.168.211.133 already exists
From the error, I can guess that the output file is already in your target directory, may be from your previous sqoop import. There is an option in sqoop import named --delete-target-dir which will delete your target output directory and re-create them in your next sqoop import. Hope that helps.

Sqoop job unable to work with Hadoop Credential API

I have stored my database passwords in Hadoop CredentialProvider.
Sqoop import from terminal is working fine, successfully fetching the password from CredentialProvider.
sqoop import
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks
--table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay --password-alias db2-dev-password
But when I try to setup as a Sqoop job, it is unable to recognize the -Dhadoop.security.credential.provider.path argument.
sqoop job --create my-sqoop-job -- import --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --password-alias
Following is the error message:
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: --password-alias
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: db2-dev-password
I couldn't find any special instructions in Sqoop User Guide for configuring Hadoop credential API with Sqoop Job.
How to resolve this issue?
Re positioning the Sqoop parameters solve the problem.
sqoop job -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --create my-sqoop-job -- import --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay --password-alias myPasswordAlias
Place the Hadoop credential before the Sqoop job keyword.
Your Sqoop job command is not proper, i.e. --password-alias is incomplete.
Please execute below command in your Hadoop server
hadoop credential list -provider jceks://hdfs/user/vijay/myPassword.jceks
Add the output in below Sqoop job command
sqoop job --create my-sqoop-job -- import --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --password-alias <<output of above command>>

Resources