I created a Hive internal (managed) table using the following Sqoop command:
sqoop import -Dmapreduce.map.memory.mb=4096 \
  --driver com.mysql.jdbc.Driver \
  --connect 'jdbc:mysql://{mysql_url}' \
  --username 'xxxx' \
  --password 'xxxx' \
  --input-fields-terminated-by '\t' \
  --split-by id \
  --target-dir {hdfs_path} \
  --verbose -m 1 \
  --hive-drop-import-delims \
  --fields-terminated-by '\t' \
  --hive-import \
  --hive-table '{table_name}' \
  --query "select id from temp WHERE \$CONDITIONS LIMIT 10"
The import job ran and at first appeared to work fine:
19/01/06 19:33:44 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://hadoop/{hdfs_path}' INTO TABLE `tmp.temp`
19/01/06 19:33:44 INFO hive.HiveImport: Loading uploaded data into Hive
19/01/06 19:33:44 DEBUG hive.HiveImport: Using in-process Hive instance.
19/01/06 19:33:44 DEBUG util.SubprocessSecurityManager: Installing subprocess security manager
Logging initialized using configuration in jar:file:${HADOOP_HOME}/hive-1.1.0-cdh5.14.2/lib/hive-common-1.1.0-cdh5.14.2.jar!/hive-log4j.properties
The data was written to the HDFS target location:
$ hadoop dfs -ls {hdfs_path}
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
19/01/06 19:43:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
0 2019-01-06 19:33 {hdfs_path}/_SUCCESS
65 2019-01-06 19:33 {hdfs_path}/part-m-00000.gz
But the Hive load step then failed with this error:
FAILED: SemanticException [Error 10072]: Database does not exist: tmp
I had already copied hive-site.xml into the Sqoop conf directory:
cp ${HIVE_HOME}/conf/hive-site.xml ${SQOOP_HOME}/conf/hive-site.xml
I have tried setting "hive.metastore.uris" to both a local and a remote Thrift metastore.
How can I fix this? Thanks.
Please use this Sqoop command to import data into an existing Hive table, and adjust the options to your requirements. I modified some options from your command and added a few others (for example, --hive-database and --delete-target-dir):
ubuntu@localhost:/usr/local/hive$ sqoop import -Dmapreduce.map.memory.mb=4096 \
  --connect 'jdbc:mysql://localhost/test' \
  --username 'root' -P \
  --input-fields-terminated-by ',' \
  --split-by id \
  --target-dir /user/hive/warehouse/test_hive \
  --hive-drop-import-delims \
  --fields-terminated-by ',' \
  --hive-import --hive-database default --hive-table test_hive \
  --query "select id from test WHERE \$CONDITIONS LIMIT 10" \
  --driver com.mysql.jdbc.Driver \
  --delete-target-dir
Happy Hadooping!
Hi, I am trying to import all tables from all schemas of an Oracle DB into HDFS.
This is my script:
sqoop-import-all-tables -Dmapreduce.job.user.classpath.first=true -Dhadoop.security.credential.provider.path=jceks://x.jceks --connect jdbc:oracle:thin:@x.x.x.x:1521/yyyy --username xxxx --password xxxx --warehouse-dir /data-warehouse/xxxx --as-avrodatafile --compression-codec snappy --autoreset-to-one-mapper
When I run this script, I get no error, but no job starts either.
Output:
Warning: /usr/hdp/2.6.2.0-205/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: Permission denied
18/08/11 08:32:51 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.2.0-205
18/08/11 08:32:51 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/08/11 08:32:51 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
18/08/11 08:32:51 INFO manager.SqlManager: Using default fetchSize of 1000
18/08/11 08:32:53 INFO manager.OracleManager: Time zone has been set to IST
It seems that the user configured in Sqoop does not have enough privileges to query and export the data from Oracle. Please verify, from the command line, that this user can connect to the Oracle database and run queries against it.
Regards!
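One way to run that check through Sqoop itself is sketched below. It uses the list-tables and eval tools that ship with Sqoop 1.4.x; the run and check_oracle_access helper names, the JDBC URL and the user name are made up for illustration. By default the helper only prints the commands so you can review them; set DRY_RUN=0 to execute them.

```shell
# Print each command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = JDBC URL, $2 = Oracle user name (placeholders, supply your own).
check_oracle_access() {
  # Lists the tables Sqoop can see for this user; an empty list could
  # explain why import-all-tables starts no job.
  run sqoop list-tables --connect "$1" --username "$2" -P
  # Runs a trivial query through the same JDBC connection.
  run sqoop eval --connect "$1" --username "$2" -P --query "SELECT COUNT(*) FROM user_tables"
}
```

For example, check_oracle_access 'jdbc:oracle:thin:@//dbhost:1521/svc' scott prints the two commands. If list-tables returns nothing for that user, privileges or the schema owner are the first things to look at.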
I am running the command below to import data from an Oracle DB into Hive:
sqoop import --connect jdbc:oracle:thin:@//localhost:1521/newDB --username <USERNAME> --P --table <ORACLE_TABLE_NAME> --hive-table <HIVE_TABLE_NAME> --hive-import -m 1
I get the error below when I run this command:
17/11/21 05:05:46 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
17/11/21 05:05:46 INFO manager.SqlManager: Using default fetchSize of 1000
17/11/21 05:05:46 INFO tool.CodeGenTool: Beginning code generation
17/11/21 05:05:47 INFO manager.OracleManager: Time zone has been set to GMT
17/11/21 05:05:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "<TABLE_NAME>" t WHERE 1=0
17/11/21 05:05:47 ERROR tool.ImportTool: Imported Failed: There is no column found in the target table <TABLE_NAME>. Please ensure that your table name is correct.
Check the owner of the table and try using table_owner.table_name for the Oracle source.
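A small sketch of building that owner-qualified name is below. Oracle stores unquoted identifiers in upper case, so the helper upper-cases both parts; that would be wrong for a table created with a quoted lower-case name. The oracle_table function name and the example owner/table are hypothetical.

```shell
# Build an upper-case OWNER.TABLE name for Sqoop's --table option.
# $1 = table owner (schema), $2 = table name; both are placeholders.
oracle_table() {
  printf '%s.%s\n' "$1" "$2" | tr '[:lower:]' '[:upper:]'
}

# Example usage:
#   sqoop import ... --table "$(oracle_table hr employees)" --hive-import ...
```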
I see that you did not use --create-hive-table and a few other parameters in your command.
Below is the Sqoop import command I use in my project.
oracle_connection.txt holds the connection info.
sqoop --options-file oracle_connection.txt \
--table $DATABASE.$TABLENAME \
-m $NUMMAPPERS \
--where "$CONDITION" \
--hive-import \
--map-column-hive "$COLLIST" \
--create-hive-table \
--hive-drop-import-delims \
--split-by $SPLITBYCOLUMN \
--hive-table $HIVEDATABASE.$TABLENAME \
--bindir sqoop_hive_rxhome/bindir/ \
--outdir sqoop_hive_rxhome/outdir
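For reference, here is a minimal sketch of what such an options file could look like. Sqoop options files take one option or value per line, and lines starting with # are comments. Since the command above does not name a tool, I am assuming the file starts with import. The URL and user name are placeholders, not the poster's real values.

```shell
# Write a Sqoop options file: one option or value per line, '#' starts a comment.
# The URL and user name below are placeholders, not values from the post.
OPTIONS_FILE="${OPTIONS_FILE:-oracle_connection.txt}"
cat > "$OPTIONS_FILE" <<'EOF'
# Tool name first, then connection options
import
--connect
jdbc:oracle:thin:@//dbhost:1521/service_name
--username
SCOTT
EOF
```

The password is left out of the file on purpose; you could add -P on the command line to be prompted for it.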
I want to export an HDFS file to SQL Server. I'm using Sqoop for that:
sqoop export --bindir . --connect "jdbc:sqlserver://server;database=db" --username sa --password pwd --table sqoop_test -m 1 --export-dir /user/sqooptest
But I get the following error:
Warning: /usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 03:59:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 03:59:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 03:59:07 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 03:59:07 INFO tool.CodeGenTool: Beginning code generation
16/07/30 03:59:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [sqoop_test] AS t WHERE 1=0
16/07/30 03:59:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop/hadoop-2.6.0
Note: ./sqoop_test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/07/30 03:59:10 INFO orm.CompilationManager: Writing jar file: ./sqoop_test.jar
16/07/30 03:59:10 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at java.util.Arrays$ArrayList.<init>(Arrays.java:3813)
at java.util.Arrays.asList(Arrays.java:3800)
at org.apache.sqoop.util.FileListing.getFileListingNoSort(FileListing.java:76)
at org.apache.sqoop.util.FileListing.getFileListingNoSort(FileListing.java:82)
at org.apache.sqoop.util.FileListing.getFileListingNoSort(FileListing.java:82)
at org.apache.sqoop.util.FileListing.getFileListing(FileListing.java:67)
at com.cloudera.sqoop.util.FileListing.getFileListing(FileListing.java:39)
at org.apache.sqoop.orm.CompilationManager.addClassFilesFromDir(CompilationManager.java:284)
at org.apache.sqoop.orm.CompilationManager.jar(CompilationManager.java:346)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:109)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
The file has only three rows with three columns each, and it has no null values. I also tried --input-null-string.
My SQL table:
create table sqoop_test
(id int,
name nvarchar(200),
title nvarchar(200))
And the file content in HDFS is:
5,X,analyst
6,Y,architect
7,Z,lead
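A note on the stack trace: the exception is thrown while Sqoop walks the --bindir directory (CompilationManager.addClassFilesFromDir calling FileListing), not while reading the HDFS data. With --bindir . that walk covers everything under the current directory, and an entry that cannot be listed (an unreadable subdirectory or a dangling symlink, say) is one plausible trigger. This is an inference from the trace, not something confirmed in this thread. The sketch below creates a fresh bindir and lists suspect entries under the old one; list_suspect_entries is a hypothetical helper name.

```shell
# Create a fresh, empty directory for Sqoop's generated .class/.jar files,
# to pass as --bindir "$BINDIR" instead of --bindir .
BINDIR="$(mktemp -d)"

# Diagnostic: print entries under $1 that are neither regular files nor
# readable directories (dangling symlinks, unreadable directories, ...).
list_suspect_entries() {
  find "$1" 2>/dev/null | while IFS= read -r p; do
    if [ ! -f "$p" ] && { [ ! -d "$p" ] || [ ! -r "$p" ]; }; then
      printf '%s\n' "$p"
    fi
  done
}
```

Running list_suspect_entries . from the directory where the export was started shows whether anything there looks unlistable; leaving out --bindir altogether, so Sqoop uses its own temporary compile directory, is another option to try.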
I'm trying to import data directly from MySQL into Parquet, but it doesn't seem to work correctly.
I'm using CDH 5.3, which includes Sqoop 1.4.5.
Here is my command line:
sqoop import --connect jdbc:mysql://xx.xx.xx.xx/database --username username --password mypass --query 'SELECT page_id,user_id FROM pages_users WHERE $CONDITIONS' --split-by page_id --hive-import --hive-table default.pages_users3 --target-dir hive_pages_users --as-parquetfile
Then I get this error :
Warning: /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/01/09 14:31:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.0
15/01/09 14:31:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/01/09 14:31:49 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
15/01/09 14:31:49 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
15/01/09 14:31:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/01/09 14:31:49 INFO tool.CodeGenTool: Beginning code generation
15/01/09 14:31:50 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE (1 = 0)
15/01/09 14:31:50 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE (1 = 0)
15/01/09 14:31:50 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE (1 = 0)
15/01/09 14:31:50 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/b90e7b492f5b66554f2cca3f88ef7a61/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/01/09 14:31:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/b90e7b492f5b66554f2cca3f88ef7a61/QueryResult.jar
15/01/09 14:31:51 INFO mapreduce.ImportJobBase: Beginning query import.
15/01/09 14:31:51 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/01/09 14:31:51 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE (1 = 0)
15/01/09 14:31:51 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE (1 = 0)
15/01/09 14:31:51 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
15/01/09 14:31:51 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive?dataset=default.pages_users3
org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive?dataset=default.pages_users3
at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
at org.kitesdk.data.Datasets.create(Datasets.java:189)
at org.kitesdk.data.Datasets.create(Datasets.java:240)
at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:81)
at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:70)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:112)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:262)
at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:721)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:499)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
I have no problem importing the data in Hive's default file format, but Parquet fails. Do you have any idea why this occurs?
Thank you :)
Please do not use <db>.<table> with --hive-table. This doesn't work well with Parquet import: Sqoop uses the Kite SDK to write Parquet files, and Kite doesn't accept the <db>.<table> format.
Instead, please use --hive-database <db> --hive-table <table>. For your command, it should be:
sqoop import --connect jdbc:mysql://xx.xx.xx.xx/database \
--username username --password mypass \
--query 'SELECT page_id,user_id FROM pages_users WHERE $CONDITIONS' --split-by page_id \
--hive-import --hive-database default --hive-table pages_users3 \
--target-dir hive_pages_users --as-parquetfile
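If your scripts carry the table around as a single db.table string, a small helper can do the split for you. This is just a sketch; hive_table_args is a hypothetical name, not part of Sqoop.

```shell
# Turn "db.table" into separate --hive-database / --hive-table arguments.
# A name without a dot is passed through as --hive-table only.
hive_table_args() {
  case "$1" in
    *.*) printf '%s %s %s %s\n' --hive-database "${1%%.*}" --hive-table "${1#*.}" ;;
    *)   printf '%s %s\n' --hive-table "$1" ;;
  esac
}

# Example usage (left unquoted on purpose so the shell splits the words):
#   sqoop import ... --hive-import $(hive_table_args default.pages_users3) --as-parquetfile
```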
Here's my pipeline in CDH 5.5 to import from a JDBC source into Hive Parquet files.
The JDBC data source here is Oracle, but the explanation below applies to MySQL too.
1) Sqoop:
$ sqoop import --connect "jdbc:oracle:thin:@(complete TNS descriptor)" \
--username MRT_OWNER -P \
--compress --compression-codec snappy \
--as-parquetfile \
--table TIME_DIM \
--warehouse-dir /user/hive/warehouse \
--num-mappers 1
I set --num-mappers to 1 because the TIME_DIM table has only about 20k rows, and it's not advisable to split a Parquet table into multiple files for such a small dataset. Each mapper creates a separate output (Parquet) file.
(ps. for Oracle users: I had to connect as the owner of the source table. Otherwise I had to specify "MRT_OWNER.TIME_DIM", and got the error org.kitesdk.data.ValidationException: Namespace MRT_OWNER.TIME_DIM is not alphanumeric (plus '_'), which seems to be a Sqoop bug.)
(ps2. The table name had to be all upper case. I'm not sure whether this is Oracle-specific (it shouldn't be) or another Sqoop bug.)
(ps3. The --compress --compression-codec snappy parameters were recognized but did not seem to have any effect.)
2) The above command creates a directory named
/user/hive/warehouse/TIME_DIM
It's a good idea to move it into a specific Hive database directory, e.g.:
$ hadoop fs -mv /user/hive/warehouse/TIME_DIM /user/hive/warehouse/dwh.db/time_dim
This assumes the Hive database/schema is named "dwh".
3) Create the Hive table, taking the schema directly from a Parquet file:
$ hadoop fs -ls /user/hive/warehouse/dwh.db/time_dim | grep parquet
-rwxrwx--x+ 3 hive hive 1216 2016-02-04 23:56 /user/hive/warehouse/dwh.db/time_dim/62679a1c-b848-426a-bb8e-9372328ddad7.parquet
If the above command returns more than one Parquet file (which means you had more than one mapper, see the --num-mappers parameter), you can use any of them in the command below.
This command should be run in Impala, not in Hive. Hive currently can't infer the schema from Parquet files, but Impala can:
[impala-shell] > CREATE TABLE dwh.time_dim
LIKE PARQUET '/user/hive/warehouse/dwh.db/time_dim/62679a1c-b848-426a-bb8e-9372328ddad7.parquet'
COMMENT 'sqooped from MRT_OWNER.TIME_DIM'
STORED AS PARQUET
LOCATION 'hdfs:///user/hive/warehouse/dwh.db/time_dim'
;
ps. It's also possible to infer the schema from Parquet using Spark, e.g.
spark.read.parquet('hdfs:///user/hive/warehouse/dwh.db/time_dim').schema
4) Since the table wasn't created in Hive (which collects stats automatically), it's a good idea to collect stats:
[impala-shell] > compute stats dwh.time_dim;
According to the Sqoop user guide (https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_literal_sqoop_import_literal), --as-parquetfile was added in Sqoop 1.4.6 (CDH 5.5).
(ps. for Oracle users: I had to connect as owner of the source table, otherwise had to specify "MRT_OWNER.TIME_DIM", and was getting error org.kitesdk.data.ValidationException: Namespace MRT_OWNER.TIME_DIM is not alphanumeric (plus '_'), seems a sqoop bug).
This can be fixed by writing the database name and table name as db_name/table_name instead of db_name.table_name.
It seems that database support is missing in your distribution; it looks like it was added rather recently. Try changing --hive-table to just the table name (--hive-table pages_users3) and removing --target-dir.
If the above doesn't work, try:
This blog post.
The docs.
Check with the user@sqoop.apache.org mailing list.
I found a solution: I dropped all the Hive options and used the target dir to store the data. It seems to work:
sqoop import --connect jdbc:mysql://xx.xx.xx.xx/database --username username --password mypass --query 'SELECT page_id,user_id FROM pages_users WHERE $CONDITIONS' --split-by page_id --target-dir /home/cloudera/user/hive/warehouse/soprism.db/pages_users3 --as-parquetfile -m 1
I then create an external table from Impala that points to that directory.
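For that last step, a sketch of generating the Impala DDL is below. It relies on Impala's CREATE EXTERNAL TABLE ... LIKE PARQUET syntax, which takes the column definitions from one Parquet file. The external_parquet_ddl helper name is hypothetical, and the Parquet file name has to be looked up first (for example with hadoop fs -ls), because Sqoop generates it.

```shell
# Print Impala DDL for an external Parquet table over an existing directory.
# $1 = table name, $2 = path of one Parquet file in that directory,
# $3 = the directory itself. All three are supplied by the caller.
external_parquet_ddl() {
  printf "CREATE EXTERNAL TABLE %s LIKE PARQUET '%s' STORED AS PARQUET LOCATION '%s';\n" "$1" "$2" "$3"
}

# Example usage (PARQUET_FILE and TABLE_DIR are variables you set yourself):
#   impala-shell -q "$(external_parquet_ddl soprism.pages_users3 "$PARQUET_FILE" "$TABLE_DIR")"
```

Because the table is external, dropping it in Impala leaves the Parquet files written by Sqoop in place.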