Sqoop import hangs while running MapReduce job - hadoop

I am trying to import a table from MySQL into Hive using Sqoop. However, when I run sqoop import, the job gets stuck at the MapReduce stage.
Please help.
Command used:
sqoop import --connect jdbc:mysql://localhost/test --direct --username=root --password= --table=authors --hive-import --hive-table=mydb.authors --target-dir=user/root/sample -m 1
Log:
16/10/31 14:32:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/10/31 14:32:11 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/10/31 14:32:11 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/10/31 14:32:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/10/31 14:32:11 INFO tool.CodeGenTool: Beginning code generation
16/10/31 14:32:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `authors` AS t LIMIT 1
16/10/31 14:32:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `authors` AS t LIMIT 1
16/10/31 14:32:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
16/10/31 14:32:12 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
Note: /tmp/sqoop-root/compile/509cb57707137dd45538cf81fd7e11b1/authors.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/31 14:32:14 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/509cb57707137dd45538cf81fd7e11b1/authors.jar
16/10/31 14:32:14 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
16/10/31 14:32:14 INFO mapreduce.ImportJobBase: Beginning import of authors

If you are not using --create-hive-table, create the Hive table before you run the sqoop command. Does your Hive table exist?
For example:
sqoop import \
--connect jdbc:oracle:thin:@//***/orcl \
--username *** \
--password *** \
--table TEST \
--hive-import \
--create-hive-table \
--hive-database myDB \
--hive-table table1 \
--hive-overwrite
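A minimal sketch of the idea in the answer above, assuming you already know whether the Hive table exists (for instance from a manual check in the Hive shell). The helper name hive_import_flags and its arguments are hypothetical and not part of Sqoop; only the flags it prints appear in the commands above. The Sqoop user guide describes --create-hive-table as making the job fail when the target table already exists, so this sketch passes it only for a missing table.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sqoop): print the Hive-related flags
# for a sqoop import.
# $1 = Hive database, $2 = Hive table, $3 = "yes" if the table already exists.
hive_import_flags() {
  flags="--hive-import --hive-database $1 --hive-table $2"
  if [ "$3" != "yes" ]; then
    # Only ask Sqoop to create the table when it is missing.
    flags="$flags --create-hive-table"
  fi
  printf '%s\n' "$flags"
}

# Example (not run here; JDBC_URL and DB_USER are placeholders):
#   sqoop import --connect "$JDBC_URL" --username "$DB_USER" -P \
#     --table authors $(hive_import_flags mydb authors no) -m 1
```

The -P option in the commented example is the one suggested by the warning in the log above, so the password is prompted for rather than typed on the command line.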

Related

Error while importing tables from Mysql using Sqoop

I am trying to import a table from a MySQL database using Sqoop. MySQL is installed on the same box as Sqoop, Hadoop, and Hive, and I can access the database from the terminal. The import fails with the error below. Please help me resolve this.
sqoop/bin$ ./sqoop import --connect jdbc:mysql://localhost/sqoop_test --username **** --password ***** --table emp2 --delete-target-dir -m 1;
Warning: /home/skd799/Downloads/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/skd799/Downloads/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/skd799/Downloads/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/skd799/Downloads/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/05/18 15:24:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
18/05/18 15:24:09 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/05/18 15:24:09 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/05/18 15:24:09 INFO tool.CodeGenTool: Beginning code generation
18/05/18 15:24:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp2` AS t LIMIT 1
18/05/18 15:24:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp2` AS t LIMIT 1
18/05/18 15:24:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/skd799/Downloads/hadoop
Note: /tmp/sqoop-hduser/compile/d58481969b312338b764bc550c174b3a/emp2.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/05/18 15:24:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hduser/compile/d58481969b312338b764bc550c174b3a/emp2.jar
18/05/18 15:24:12 INFO tool.ImportTool: Destination directory emp2 deleted.
18/05/18 15:24:12 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/05/18 15:24:12 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/05/18 15:24:12 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/05/18 15:24:12 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/05/18 15:24:12 INFO mapreduce.ImportJobBase: Beginning import of emp2
18/05/18 15:24:13 INFO db.DBInputFormat: Using read commited transaction isolation
18/05/18 15:24:14 INFO db.DBInputFormat: Using read commited transaction isolation
18/05/18 15:24:15 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
18/05/18 15:24:15 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 2.9424 seconds (0 bytes/sec)
18/05/18 15:24:15 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
18/05/18 15:24:15 INFO mapreduce.ImportJobBase: Retrieved 0 records.
18/05/18 15:24:15 ERROR tool.ImportTool: Error during import: Import job failed!
Can you share the Sqoop command you are using to import the tables?

Sqoop import getting halted

I am trying to import a table from MySQL to HDFS, but it pauses at the point shown below:
sqoop import --connect jdbc:mysql://localhost/movielens --username root --table tutorials_tbl --m 1
16/11/26 10:47:33 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/11/26 10:47:33 INFO tool.CodeGenTool: Beginning code generation
16/11/26 10:47:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `tutorials_tbl` AS t LIMIT 1
16/11/26 10:47:34 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
16/11/26 10:47:34 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop/hadoop-core.jar
16/11/26 10:47:36 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-training/compile/f150b283edf7b39ed9facc57a781542e/tutorials_tbl.java to /home/training/./tutorials_tbl.java
java.io.IOException: Destination '/home/training/./tutorials_tbl.java' already exists
at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:1811)
at com.cloudera.sqoop.orm.CompilationManager.compile(CompilationManager.java:229)
at com.cloudera.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:85)
at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:369)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:455)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:146)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:182)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:221)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:230)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:239)
16/11/26 10:47:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-training/compile/f150b283edf7b39ed9facc57a781542e/tutorials_tbl.jar
16/11/26 10:47:36 WARN manager.MySQLManager: It looks like you are importing from mysql.
16/11/26 10:47:36 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
16/11/26 10:47:36 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
16/11/26 10:47:36 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
16/11/26 10:47:36 INFO mapreduce.ImportJobBase: Beginning import of tutorials_tbl
This is the last line of the job.
Thanks in advance!
Sqoop runs as a map-only job, so it follows all the usual MapReduce semantics.
Please delete "/home/training/./tutorials_tbl.java" and rerun your job.
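The cleanup step can be sketched as a small shell function. This is only an illustration: remove_stale_codegen is a hypothetical name, and the directory and table name are whatever appears in your own "Destination ... already exists" message.

```shell
#!/bin/sh
# Hypothetical helper: remove a stale generated class file left behind
# by an earlier Sqoop run.
# $1 = directory named in the error message, $2 = table name.
remove_stale_codegen() {
  stale="$1/$2.java"
  if [ -f "$stale" ]; then
    rm -f -- "$stale"
    echo "removed $stale"
  else
    echo "nothing to remove at $stale"
  fi
}

# Example for the error above (not run here):
#   remove_stale_codegen /home/training tutorials_tbl
```

To keep generated sources out of the working directory in later runs, the --outdir option (used in another command on this page) may help by sending them to a separate directory.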

sqoop null pointer exception

I am trying to run a Sqoop import, but it is failing with a NullPointerException. Can somebody please help?
This is the command I ran:
sqoop import-all-tables --bindir ./ --num-mappers 1 --connect "jdbc:mysql://localhost/retail_db" --username=***** --password=****** --hive-import --hive-overwrite --create-hive-table --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --outdir java_files --bindir ./sqoop/
And this is what I get:
16/08/14 12:14:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/08/14 12:14:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/14 12:14:47 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/08/14 12:14:47 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/08/14 12:14:48 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/08/14 12:14:48 INFO tool.CodeGenTool: Beginning code generation
16/08/14 12:14:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/08/14 12:14:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/08/14 12:14:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: ./categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/08/14 12:14:49 INFO orm.CompilationManager: Writing jar file: ./categories.jar
16/08/14 12:14:49 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at java.util.Arrays$ArrayList.<init>(Arrays.java:2842)
at java.util.Arrays.asList(Arrays.java:2828)
at org.apache.sqoop.util.FileListing.getFileListingNoSort(FileListing.java:76)
at org.apache.sqoop.util.FileListing.getFileListingNoSort(FileListing.java:82)
at org.apache.sqoop.util.FileListing.getFileListingNoSort(FileListing.java:82)
at org.apache.sqoop.util.FileListing.getFileListingNoSort(FileListing.java:82)
at org.apache.sqoop.util.FileListing.getFileListingNoSort(FileListing.java:82)
at org.apache.sqoop.util.FileListing.getFileListing(FileListing.java:67)
at com.cloudera.sqoop.util.FileListing.getFileListing(FileListing.java:39)
at org.apache.sqoop.orm.CompilationManager.addClassFilesFromDir(CompilationManager.java:284)
at org.apache.sqoop.orm.CompilationManager.jar(CompilationManager.java:346)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:109)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportAllTablesTool.run(ImportAllTablesTool.java:111)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Sqoop running into local job runner mode

When I run Sqoop, I am not sure why it falls back to local job runner mode and then says that I have provided an invalid jobtracker URL for LocalJobRunner. Can anyone tell me what's going on?
$ bin/sqoop import -jt myjobtracker:50070 --connect jdbc:mysql://mydbhost.com/mydata --username foo --password bar --as-parquetfile --table campaigns --target-dir hdfs://myhdfs:8020/user/myself/campaigns
14/08/20 21:04:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-SNAPSHOT
14/08/20 21:04:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/08/20 21:04:51 INFO manager.SqlManager: Using default fetchSize of 1000
14/08/20 21:04:51 INFO tool.CodeGenTool: Beginning code generation
14/08/20 21:04:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `campaigns` AS t LIMIT 1
14/08/20 21:04:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `campaigns` AS t LIMIT 1
14/08/20 21:04:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `campaigns` AS t LIMIT 1
14/08/20 21:04:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-myself/compile/6acdb40688239f19ddf86a1290ad6c64/campaigns.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/08/20 21:04:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-myself/compile/6acdb40688239f19ddf86a1290ad6c64/campaigns.jar
14/08/20 21:04:54 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/08/20 21:04:54 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/08/20 21:04:54 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/08/20 21:04:54 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/08/20 21:04:54 INFO mapreduce.ImportJobBase: Beginning import of campaigns
14/08/20 21:04:54 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
14/08/20 21:04:54 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/hbase/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14/08/20 21:04:54 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/08/20 21:04:56 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/08/20 21:04:56 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "myjobtracker:50070"
14/08/20 21:04:56 ERROR security.UserGroupInformation: PriviledgedActionException as:myself (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
14/08/20 21:04:56 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1239)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1235)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1234)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1263)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1287)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:247)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:102)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
I figured out the problem: I was running Sqoop 1.4.5 and pointing it at the latest Hadoop 2.0.0-cdh4.4.0, which also includes the YARN components, and that is why it was complaining.
When I pointed Sqoop to hadoop-0.20/2.0.0-cdh4.4.0 (MR1, I think), it worked.
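Since the error asks you to check mapreduce.framework.name, here is a rough sketch for reading that value from a client configuration file. The function name framework_name is hypothetical, the file location depends on your installation, and parsing XML with sed is fragile, so treat this as a quick diagnostic rather than a robust tool.

```shell
#!/bin/sh
# Hypothetical helper: print the value of mapreduce.framework.name from a
# mapred-site.xml-style file. Prints nothing if the property is not set.
# $1 = path to the configuration file.
framework_name() {
  # Join all lines first so <name> and <value> on separate lines still match.
  tr -d '\n' < "$1" |
    sed -n 's:.*<name>mapreduce\.framework\.name</name>[[:space:]]*<value>\([^<]*\)</value>.*:\1:p'
}

# Example (path is an assumption; adjust for your cluster):
#   framework_name /etc/hadoop/conf/mapred-site.xml
```

If nothing is printed, the property is not set in that file, and Hadoop's documented default for it is local, which would be consistent with the LocalJobRunner message in the log.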

Sqoop incremental import (db schema incorrect)

I'm trying to do an incremental import using the Sandbox 2.1 and Microsoft SQL Server (AdventureWorks database). For the incremental import I'm using the following command:
sqoop import --connect "jdbc:sqlserver://192.168.40.133:1434;database=AdventureWorksLT2012;username=test;password=test" --table ProductModel --hive-import --check-column ProductModelID --incremental append --last-value 128 -- --schema SalesLT
As you can see in the error message below, the select statement “SELECT MAX([SalesLT].[ProductModelID]) FROM ProductModel” is not constructed correctly.
The schema name is attached to the column instead of the table, and the table name in the FROM clause is missing the schema name.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/06/24 07:05:33 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.1.1.0-385
14/06/24 07:05:33 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
14/06/24 07:05:33 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
14/06/24 07:05:33 INFO manager.SqlManager: Using default fetchSize of 1000
14/06/24 07:05:33 INFO manager.SQLServerManager: We will use schema SalesLT
14/06/24 07:05:33 INFO tool.CodeGenTool: Beginning code generation
14/06/24 07:05:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [SalesLT].[ProductModel] AS t WHERE 1=0
14/06/24 07:05:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/bcb9143989664ced51458e8a0dbd52b9/ProductModel.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/06/24 07:05:37 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/bcb9143989664ced51458e8a0dbd52b9/ProductModel.jar
14/06/24 07:05:37 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX([SalesLT].[ProductModelID]) FROM ProductModel
14/06/24 07:05:37 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'ProductModel'.
Any help is appreciated.
Thank you!
PS.
Importing a full table is working fine.
sqoop import --connect "jdbc:sqlserver://192.168.40.133:1434;database=AdventureWorksLT2012;username=test;password=test" --table ProductModel --hive-import -- --schema SalesLT
Try this:
sqoop import --connect "jdbc:sqlserver://192.168.40.133:1434;database=AdventureWorksLT2012;username=test;password=test" --table ProductModel --hive-import -- --schema SalesLT --incremental append --check-column ProductModelID --last-value "128"
I think the following command will work:
sqoop import --connect "jdbc:sqlserver://192.168.40.133:1434;database=AdventureWorksLT2012;username=test;password=test" --table ProductModel --hive-import --incremental append --check-column ProductModelID --last-value "128" -- --schema SalesLT
