Sqoop job get Stuck when import data from Oracle to Hive - oracle

So I was trying to import a table from Oracle to Hive, using Sqoop. Here is my query
sqoop-import --hive-import --connect jdbc:oracle:thin:#10.35.10.180:1521:dms
--table DEFECT
--hive-database inspex
--username INSPEX
--password inspex
Yarn seems to split the job into 4 parts.
17/02/17 15:15:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.9.1
17/02/17 15:15:17 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/02/17 15:15:17 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/02/17 15:15:17 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/02/17 15:15:18 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
17/02/17 15:15:18 INFO manager.SqlManager: Using default fetchSize of 1000
17/02/17 15:15:18 INFO tool.CodeGenTool: Beginning code generation
17/02/17 15:15:18 INFO manager.OracleManager: Time zone has been set to GMT
17/02/17 15:15:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "DEFECT" t WHERE 1=0
17/02/17 15:15:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/72422bf2a67c745893ae440ad77e3049/DEFECT.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/02/17 15:15:20 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/72422bf2a67c745893ae440ad77e3049/DEFECT.jar
17/02/17 15:15:20 INFO manager.OracleManager: Time zone has been set to GMT
17/02/17 15:15:20 INFO manager.OracleManager: Time zone has been set to GMT
17/02/17 15:15:20 INFO mapreduce.ImportJobBase: Beginning import of DEFECT
17/02/17 15:15:20 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/02/17 15:15:20 INFO manager.OracleManager: Time zone has been set to GMT
17/02/17 15:15:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/02/17 15:15:21 INFO client.RMProxy: Connecting to ResourceManager at vn1.localdomain/10.35.10.17:8032
17/02/17 15:15:23 INFO db.DBInputFormat: Using read commited transaction isolation
17/02/17 15:15:23 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN("DEFECT_INDEX"), MAX("DEFECT_INDEX") FROM "DEFECT"
17/02/17 15:15:45 INFO mapreduce.JobSubmitter: number of splits:4
17/02/17 15:15:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1487360998088_0003
17/02/17 15:15:45 INFO impl.YarnClientImpl: Submitted application application_1487360998088_0003
17/02/17 15:15:45 INFO mapreduce.Job: The url to track the job: http://vn1.localdomain:8088/proxy/application_1487360998088_0003/
17/02/17 15:15:45 INFO mapreduce.Job: Running job: job_1487360998088_0003
17/02/17 15:15:51 INFO mapreduce.Job: Job job_1487360998088_0003 running in uber mode : false
17/02/17 15:15:51 INFO mapreduce.Job: map 0% reduce 0%
17/02/17 15:15:57 INFO mapreduce.Job: map 50% reduce 0%
17/02/17 15:16:35 INFO mapreduce.Job: map 75% reduce 0%
Then it got stuck in 75% and runs forever. I notice that 3 out of 4 jobs are finished pretty quick. Except for one:
Seems like this job isn't making any progress and just stay at 0%. I checked out the syslog:
2017-02-17 15:15:53,795 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-02-17 15:15:53,851 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-02-17 15:15:53,851 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2017-02-17 15:15:53,859 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2017-02-17 15:15:53,859 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1487360998088_0003, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier#41480ec1)
2017-02-17 15:15:53,948 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2017-02-17 15:15:54,426 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /yarn/nm/usercache/root/appcache/application_1487360998088_0003
2017-02-17 15:15:55,409 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2017-02-17 15:15:55,813 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2017-02-17 15:15:55,822 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2017-02-17 15:15:56,278 INFO [main] org.apache.sqoop.mapreduce.db.DBInputFormat: Using read commited transaction isolation
2017-02-17 15:15:56,411 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: "DEFECT_INDEX" >= 1 AND "DEFECT_INDEX" < 545225318
2017-02-17 15:15:56,491 INFO [main] org.apache.sqoop.mapreduce.db.OracleDBRecordReader: Time zone has been set to GMT
2017-02-17 15:15:56,564 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: "DEFECT_INDEX" >= 1 AND "DEFECT_INDEX" < 545225318
2017-02-17 15:15:56,610 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: SELECT "DEFECT_INDEX", "LAYER_SCAN_INDEX", "DEFECT_SITE_ID", "CLUSTER_ID", "CARRY_OVER", "VERIFIED_ADDER", "REPEATING_DEFECT", "TEST_NUMBER", "X_DIE_COORDINATE", "Y_DIE_COORDINATE", "X_COORDINATE", "Y_COORDINATE", "X_DEFECT_SIZE", "Y_DEFECT_SIZE", "D_SIZE", "INSPECT_INTENSITY", "PATTERN_ID", "SURFACE", "ANGLE", "HOT_SPOT", "ASPECT_RATIO", "GRAY_SCALE", "MACROSIGID", "REGIONID", "SEMREVSAMPLE", "POLARITY", "DBGROUP", "CAREAREAGROUPCODE", "VENNID", "SEGMENTID", "MDAT_OFFSET", "DESIGNFILEFLOORPLANID", "DBSCRITICALITYINDEX", "CELLSIZE", "PCI", "LINECOMPLEXITY", "DCIRANGE", "GDSX", "GDSY" FROM "DEFECT" WHERE ( "DEFECT_INDEX" >= 1 ) AND ( "DEFECT_INDEX" < 545225318 )
The log ends on starting executing query. I wait for like 10 hours and still there's no update. Which shouldn't be right because it's not really that big.
I didn't find any error in the log. Before this I have successfully imported several tables from oracle to hive. So I think my configuration is fine.
I have tried to set mapper from 1 to 100 and still didn't work. And I notice that the task which PK starts from 1~somenumber always show 0% progress while other works just fine.
I'm looking for any suggestions or help. Thanks.

by default, sqoop aplit your job in 4 mappers based on the PK of your table, however depending on the distribution of your data, this can be very inefficient. You didnt talk about the size of your cluster or your data but I would sugeest see how is the distribution of your data and try set a greater number of maps(-m option) using a different column to split the load (split-by option). your job is using the defect_index column to split the work

Related

ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF _DIR is set correctly

I am trying to import data from sqoop to hive
MySQL
use sample;
create table forhive( id int auto_increment,
firstname varchar(36),
lastname varchar(36),
primary key(id)
);
insert into forhive(firstname, lastname) values("sample","singh");
select * from forhive;
1 abhay agrawal
2 vijay sharma
3 sample singh
This is the Sqoop command I'm using (version 1.4.7)
sqoop import --connect jdbc:mysql://********:3306/sample
--table forhive --split-by id --columns id,firstname,lastname
--target-dir /home/programmeur_v/forhive
--hive-import --create-hive-table --hive-table sqp.forhive --username vaibhav -P
This is the error I'm getting
Error Log
18/08/02 19:19:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
18/08/02 19:19:55 INFO tool.BaseSqoopTool: Using Hive-specific
delimiters for output. You can override
18/08/02 19:19:55 INFO tool.BaseSqoopTool: delimiters with
--fields-terminated-by, etc.
18/08/02 19:19:55 INFO manager.MySQLManager: Preparing to use a MySQL
streaming resultset.
18/08/02 19:19:55 INFO tool.CodeGenTool: Beginning code generation
18/08/02 19:19:56 INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM forhive AS t LIMIT 1
18/08/02 19:19:56 INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM forhive AS t LIMIT 1
18/08/02 19:19:56 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
/home/programmeur_v/softwares/hadoop-2.9.1
Note:
/tmp/sqoop-programmeur_v/compile/e8ffa12496a2e421f80e1fa16e025d28/forhive.java
uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details. 18/08/02 19:19:58
INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-programmeur_v/compile/e8ffa12496a2e421f80e1fa16e025d28/forhive.jar
18/08/02 19:19:58 WARN manager.MySQLManager: It looks like you are
importing from mysql.
18/08/02 19:19:58 WARN manager.MySQLManager: This transfer can be
faster! Use the --direct
18/08/02 19:19:58 WARN manager.MySQLManager: option to exercise a
MySQL-specific fast path.
18/08/02 19:19:58 INFO manager.MySQLManager: Setting zero DATETIME
behavior to convertToNull (mysql)
18/08/02 19:19:58 INFO mapreduce.ImportJobBase: Beginning import of
forhive
18/08/02 19:19:58 INFO Configuration.deprecation: mapred.jar is
deprecated. Instead, use mapreduce.job.jar
18/08/02 19:19:59 INFO Configuration.deprecation: mapred.map.tasks is
deprecated. Instead, use mapreduce.job.maps
18/08/02 19:19:59 INFO client.RMProxy: Connecting to ResourceManager
at /0.0.0.0:8032
18/08/02 19:20:02 INFO db.DBInputFormat: Using read commited
transaction isolation
18/08/02 19:20:02 INFO db.DataDrivenDBInputFormat: BoundingValsQuery:
SELECT MIN(id), MAX(id) FROM forhive
18/08/02 19:20:02 INFO db.IntegerSplitter: Split size: 0; Num splits:
4 from: 1 to: 3
18/08/02 19:20:02 INFO mapreduce.JobSubmitter: number of splits:3
18/08/02 19:20:02 INFO Configuration.deprecation:
yarn.resourcemanager.system-metrics-publisher.enabled is deprecated.
Instead, use yarn.system-metrics-publisher.enabl ed
18/08/02 19:20:02 INFO mapreduce.JobSubmitter: Submitting tokens for
job: job_1533231535061_0006
18/08/02 19:20:03 INFO impl.YarnClientImpl: Submitted application
application_1533231535061_0006
18/08/02 19:20:03 INFO mapreduce.Job: The url to track the job:
http://instance-1:8088/proxy/application_1533231535061_0006/
18/08/02 19:20:03 INFO mapreduce.Job: Running job:
job_1533231535061_0006
18/08/02 19:20:11 INFO mapreduce.Job: Job job_1533231535061_0006
running in uber mode : false
18/08/02 19:20:11 INFO mapreduce.Job: map 0% reduce 0%
18/08/02 19:20:21 INFO mapreduce.Job: map 33% reduce 0%
18/08/02 19:20:24 INFO mapreduce.Job: map 100% reduce 0%
18/08/02 19:20:25 INFO mapreduce.Job: Job job_1533231535061_0006
completed successfully
18/08/02 19:20:25 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=622830
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=295
HDFS: Number of bytes written=48
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Job Counters
Killed map tasks=1
Launched map tasks=3
Other local map tasks=3
Total time spent by all maps in occupied slots (ms)=27404
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=27404
Total vcore-milliseconds taken by all map tasks=27404
Total megabyte-milliseconds taken by all map tasks=28061696
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=295
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=671
CPU time spent (ms)=4210
Physical memory (bytes) snapshot=616452096
Virtual memory (bytes) snapshot=5963145216
Total committed heap usage (bytes)=350224384
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=48
18/08/02 19:20:25 INFO mapreduce.ImportJobBase: Transferred 48 bytes
in 25.828 seconds (1.8584 bytes/sec)
18/08/02 19:20:25 INFO mapreduce.ImportJobBase: Retrieved 3 records.
18/08/02 19:20:25 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat
import job data to Listeners for table forhive
18/08/02 19:20:25 INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM forhive AS t LIMIT 1
18/08/02 19:20:25 INFO hive.HiveImport: Loading uploaded data into
Hive
18/08/02 19:20:25 ERROR hive.HiveConfig: Could not load
org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set
correctly.
18/08/02 19:20:25 ERROR tool.ImportTool: Import failed:
java.io.IOException: java.lang.ClassNotFoundException:
org.apache.hadoop.hive.conf.HiveConf
at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
... 12 more
After I did google for the same error I added HIVE_CONF_DIR also to my bashrc
export HIVE_HOME=/home/programmeur_v/softwares/apache-hive-1.2.2-bin
export
HIVE_CONF_DIR=/home/programmeur_v/softwares/apache-hive-1.2.2-bin/conf
export
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin:$HIVE_CONF_DIR
All my Hadoop services are also up and running.
6976 NameNode
7286 SecondaryNameNode
7559 NodeManager
7448 ResourceManager
8522 DataNode
14587 Jps
I'm just unable to figure out what mistake I'm making here. Please guide!
Download the file "hive-common-0.10.0.jar" by googling. Place this in "sqoop/lib" folder. This solution worked for me.
Go to $HIVE_HOME/lib directory using
cd $HIVE_HOME/lib
then copy hive-common-x.x.x.jar and paste it in $SQOOP_HOME/lib using
cp hive-common-x.x.x.jar $SQOOP_HOME/lib
You need to download the file hive-common-0.10.0.jar and copy it to $SQOOP_HOME/lib folder.
edit your .bash_profile ,then add HADOOP_CLASSPATH
vim ~/.bash_profile
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
source ~/.bash_profile
I got the same issue when I tried to import data from MySQL to Hive with the following command:
sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password z*****3 --table users -m 1 --hive-home /opt/hive --hive-import --hive-overwrite
Finally, these environment variables made it work perfectly.
export HIVE_HOME=/opt/hive
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
export HIVE_CONF_DIR=$HIVE_HOME/conf

Getting error in Sqoop when Import-all-tables in Cloudera Quickstart VM

I'm getting the following error when I try to import-all-tables via sqoop:
sqoop import-all-tables -m 12 --connect enter code here"jdbc:mysql://quickstart.cloudera:3306/retail_db" --username=retail_dba --password=cloudera --warehouse-dir=/r/cloudera/sqoop_import
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/04/23 15:29:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
17/04/23 15:29:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/04/23 15:29:27 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/04/23 15:29:27 INFO tool.CodeGenTool: Beginning code generation
17/04/23 15:29:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
17/04/23 15:29:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
17/04/23 15:29:27 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/e8e72a2e112fced2b0f3251b5666473d/categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/04/23 15:29:30 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/e8e72a2e112fced2b0f3251b5666473d/categories.jar
17/04/23 15:29:30 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/04/23 15:29:30 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/04/23 15:29:30 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/04/23 15:29:30 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/04/23 15:29:30 INFO mapreduce.ImportJobBase: Beginning import of categories
17/04/23 15:29:31 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/04/23 15:29:32 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/04/23 15:29:32 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/192.168.40.134:8032
17/04/23 15:29:37 INFO db.DBInputFormat: Using read commited transaction isolation
17/04/23 15:29:37 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`category_id`), MAX(`category_id`) FROM `categories`
17/04/23 15:29:37 INFO db.IntegerSplitter: Split size: 4; Num splits: 12 from: 1 to: 58
17/04/23 15:29:38 INFO mapreduce.JobSubmitter: number of splits:12
17/04/23 15:29:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492945339848_0010
17/04/23 15:29:39 INFO impl.YarnClientImpl: Submitted application application_1492945339848_0010
17/04/23 15:29:39 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1492945339848_0010/
17/04/23 15:29:39 INFO mapreduce.Job: Running job: job_1492945339848_0010
17/04/23 15:29:52 INFO mapreduce.Job: Job job_1492945339848_0010 running in uber mode : false
17/04/23 15:29:52 INFO mapreduce.Job: map 0% reduce 0%
17/04/23 15:29:52 INFO mapreduce.Job: Job job_1492945339848_0010 failed with state FAILED due to: Application application_1492945339848_0010 failed 2 times due to AM Container for appattempt_1492945339848_0010_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://quickstart.cloudera:8088/proxy/application_1492945339848_0010/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1492945339848_0010_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:578)
at org.apache.hadoop.util.Shell.run(Shell.java:481)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:763)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
17/04/23 15:29:52 INFO mapreduce.Job: Counters: 0
17/04/23 15:29:52 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
17/04/23 15:29:52 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 19.6175 seconds (0 bytes/sec)
17/04/23 15:29:52 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/04/23 15:29:52 INFO mapreduce.ImportJobBase: Retrieved 0 records.
17/04/23 15:29:52 ERROR tool.ImportAllTablesTool: Error during import: Import job failed!`enter
Looks like application masters are getting killed repeatedly meaning, they are not getting as much as memory as they would like. If you are just trying out sqoop on cloudera virtual machine, dont use -m 12, this will try spawn 12 parallel map tasks which you (single) machine may not be able to handle. Leave off that setting altogether or try with --direct instead . Also whats going on with --warehousedir=/r/cloudera/sqoop_import ? is /r/ as typo or it should be /user/
Try this instead :
sqoop import-all-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--warehouse-dir=/user/cloudera/sqoop_import
--username=retail_dba \
--direct
--password=cloudera;
try to load first one table instead import-all-tables, Also try to restrict your mappers while using import-all-tables , 12 mappers is hampering the memory on VM.
sqoop import-all-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--warehouse-dir=/user/cloudera/sqoop_import
--username=retail_dba \
--password=cloudera
-m 2

sqoop export to hana failed

when I want to export data from sqoop to hana DB, I receive error message.
Here is sqoop call
sqoop export \
--connect "jdbc:sap://saphana:30115" \
--username username \
--password password \
--table "INTERFACE_BO.WR_CUSTOMER_DATA" \
--columns "WEEK_DAY_START,SALES_ORG,REG_ORDERS_CNT,REG_EMAILS_CNT,UNIQUE_EMAILS" \
--driver "com.sap.db.jdbc.Driver" \
--export-dir "/user/hive/warehouse/export.db/weekly_holding_report" \
--input-fields-terminated-by '\t'
This is the way how I create table
CREATE TABLE weekly_holding_report
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
stored as textfile
AS
SELECT
...
I also tried many variants of stored as like: stored as parquet, etc.. but without any change
And here is reposponse
Warning: /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/25 06:43:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
16/10/25 06:43:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/10/25 06:43:05 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
16/10/25 06:43:05 INFO manager.SqlManager: Using default fetchSize of 1000
16/10/25 06:43:05 INFO tool.CodeGenTool: Beginning code generation
16/10/25 06:43:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INTERFACE_BO.WR_CUSTOMER_DATA AS t WHERE 1=0
16/10/25 06:43:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/515ace5e3ad6ad117ba5f5fa61bf279c/INTERFACE_BO_WR_CUSTOMER_DATA.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/25 06:43:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/515ace5e3ad6ad117ba5f5fa61bf279c/INTERFACE_BO.WR_CUSTOMER_DATA.jar
16/10/25 06:43:07 INFO mapreduce.ExportJobBase: Beginning export of INTERFACE_BO.WR_CUSTOMER_DATA
16/10/25 06:43:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/10/25 06:43:07 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
16/10/25 06:43:08 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
16/10/25 06:43:08 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
16/10/25 06:43:08 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/10/25 06:43:08 INFO client.RMProxy: Connecting to ResourceManager at cz-dc-v-564.mall.local/10.200.58.21:8032
16/10/25 06:43:10 INFO input.FileInputFormat: Total input paths to process : 10
16/10/25 06:43:10 INFO input.FileInputFormat: Total input paths to process : 10
16/10/25 06:43:10 INFO mapreduce.JobSubmitter: number of splits:4
16/10/25 06:43:10 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
16/10/25 06:43:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1476740951886_1458
16/10/25 06:43:10 INFO impl.YarnClientImpl: Submitted application application_1476740951886_1458
16/10/25 06:43:10 INFO mapreduce.Job: The url to track the job: http://cz-dc-v-564.mall.local:8088/proxy/application_1476740951886_1458/
16/10/25 06:43:10 INFO mapreduce.Job: Running job: job_1476740951886_1458
16/10/25 06:43:18 INFO mapreduce.Job: Job job_1476740951886_1458 running in uber mode : false
16/10/25 06:43:18 INFO mapreduce.Job: map 0% reduce 0%
16/10/25 06:43:25 INFO mapreduce.Job: map 100% reduce 0%
16/10/25 06:43:25 INFO mapreduce.Job: Job job_1476740951886_1458 failed with state FAILED due to: Task failed task_1476740951886_1458_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/10/25 06:43:26 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=1
Killed map tasks=3
Launched map tasks=4
Data-local map tasks=1
Rack-local map tasks=3
Total time spent by all maps in occupied slots (ms)=39746
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=19873
Total vcore-seconds taken by all map tasks=19873
Total megabyte-seconds taken by all map tasks=30524928
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
16/10/25 06:43:26 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
16/10/25 06:43:26 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 17,9084 seconds (0 bytes/sec)
16/10/25 06:43:26 INFO mapreduce.ExportJobBase: Exported 0 records.
16/10/25 06:43:26 ERROR tool.ExportTool: Error during export: Export job failed!
Does anybody know what am I doing wrong?
Thanks for help!
Have you tried adding an sqoop argumment --m 1
this is basically for giving number of mappers so in this case try with mapper = 1
is your data is too huge(more than 1 gb's) then increase number of mappers
Check the logs for the corresponding job id, in your case job_1476740951886_1458. There must be some permission or connectivity issue(with the given saphana url), that is why it transferred zero bytes.

import-all-tables hanging in during hive-import

[cloudera#quickstart ~]$ sqoop import-all-tables --connect="jdbc:mysql://serverip:3306/dbname"
--username=xxx --password=yyy -m 1 --hive-import --hive-overwrite
--create-hive-table --hive-database sqoopimport --hive-home /user/hive/warehouse
It is running upto map-reduce and during hive import it is hanging in below step, am I doing any mistake?
Logging initialized using configuration in jar:file:/usr/jars/hive-common-1.1.0-cdh5.7.0.jar!/hive-log4j.properties
Full log:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/08/13 21:20:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
16/08/13 21:20:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/13 21:20:37 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/08/13 21:20:37 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/08/13 21:20:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/08/13 21:20:39 INFO tool.CodeGenTool: Beginning code generation
16/08/13 21:20:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/08/13 21:20:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/08/13 21:20:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/af1ef62229d717800a3cd16b42f53dc3/categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/08/13 21:20:43 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/af1ef62229d717800a3cd16b42f53dc3/categories.jar
16/08/13 21:20:43 WARN manager.MySQLManager: It looks like you are importing from mysql.
16/08/13 21:20:43 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
16/08/13 21:20:43 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
16/08/13 21:20:43 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
16/08/13 21:20:43 INFO mapreduce.ImportJobBase: Beginning import of categories
16/08/13 21:20:44 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/08/13 21:20:45 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/08/13 21:20:46 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/10.0.2.15:8032
16/08/13 21:20:49 INFO db.DBInputFormat: Using read commited transaction isolation
16/08/13 21:20:49 INFO mapreduce.JobSubmitter: number of splits:1
16/08/13 21:20:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470828230815_0030
16/08/13 21:20:50 INFO impl.YarnClientImpl: Submitted application application_1470828230815_0030
16/08/13 21:20:50 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1470828230815_0030/
16/08/13 21:20:50 INFO mapreduce.Job: Running job: job_1470828230815_0030
16/08/13 21:21:07 INFO mapreduce.Job: Job job_1470828230815_0030 running in uber mode : false
16/08/13 21:21:07 INFO mapreduce.Job: map 0% reduce 0%
16/08/13 21:21:18 INFO mapreduce.Job: map 100% reduce 0%
16/08/13 21:21:18 INFO mapreduce.Job: Job job_1470828230815_0030 completed successfully
16/08/13 21:21:18 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=139452
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=1029
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=1008512
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=7879
Total vcore-seconds taken by all map tasks=7879
Total megabyte-seconds taken by all map tasks=1008512
Map-Reduce Framework
Map input records=58
Map output records=58
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=78
CPU time spent (ms)=950
Physical memory (bytes) snapshot=143511552
Virtual memory (bytes) snapshot=726700032
Total committed heap usage (bytes)=48234496
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=1029
16/08/13 21:21:18 INFO mapreduce.ImportJobBase: Transferred 1.0049 KB in 32.6366 seconds (31.5291 bytes/sec)
16/08/13 21:21:18 INFO mapreduce.ImportJobBase: Retrieved 58 records.
16/08/13 21:21:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/08/13 21:21:18 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/jars/hive-common-1.1.0-cdh5.7.0.jar!/hive-log4j.properties

Can't finish MR when using Sqoop transfer data from HDFS to MYSQL

While transferring data from HDFS to MySQL, a MapReduce job gets spawned. But, it gets stuck and does not get completed.
sqoop export --connect jdbc:mysql://crxy2:3306/test --username root --password 19911130 --table info --export-dir sqoop_export
I see following in the logs:
Warning: /software/sqoop-1.4.6.alpha/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /software/sqoop-1.4.6.alpha/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /software/sqoop-1.4.6.alpha/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /software/sqoop-1.4.6.alpha/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
15/12/02 01:17:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
15/12/02 01:17:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/12/02 01:17:37 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/12/02 01:17:37 INFO tool.CodeGenTool: Beginning code generation
15/12/02 01:17:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `info` AS t LIMIT 1
15/12/02 01:17:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `info` AS t LIMIT 1
15/12/02 01:17:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.6.0
Note: /tmp/sqoop-root/compile/344126e97612def1e3976c1978c2e75e/info.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/12/02 01:17:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/344126e97612def1e3976c1978c2e75e/info.jar
15/12/02 01:17:42 INFO mapreduce.ExportJobBase: Beginning export of info
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-0.98.8-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/12/02 01:17:43 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/12/02 01:17:45 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/12/02 01:17:45 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/12/02 01:17:45 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/12/02 01:17:46 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/12/02 01:17:50 INFO input.FileInputFormat: Total input paths to process : 1
15/12/02 01:17:50 INFO input.FileInputFormat: Total input paths to process : 1
15/12/02 01:17:50 INFO mapreduce.JobSubmitter: number of splits:4
15/12/02 01:17:50 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/12/02 01:17:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449047829255_0001
15/12/02 01:17:51 INFO impl.YarnClientImpl: Submitted application application_1449047829255_0001
15/12/02 01:17:52 INFO mapreduce.Job: The url to track the job: http://crxy2:8088/proxy/application_1449047829255_0001/
15/12/02 01:17:52 INFO mapreduce.Job: Running job: job_1449047829255_0001
15/12/02 01:18:12 INFO mapreduce.Job: Job job_1449047829255_0001 running in uber mode : false
15/12/02 01:18:12 INFO mapreduce.Job: map 0% reduce 0%
15/12/02 01:19:10 INFO mapreduce.Job: map 75% reduce 0%
15/12/02 01:19:12 INFO mapreduce.Job: map 100% reduce 0%
15/12/02 01:29:41 INFO mapreduce.Job: Task Id : attempt_1449047829255_0001_m_000001_0, Status : FAILED
AttemptID:attempt_1449047829255_0001_m_000001_0 Timed out after 600 secs
15/12/02 01:29:42 INFO mapreduce.Job: map 75% reduce 0%
15/12/02 01:29:58 INFO mapreduce.Job: map 100% reduce 0%
15/12/02 01:40:11 INFO mapreduce.Job: Task Id : attempt_1449047829255_0001_m_000001_1, Status : FAILED
AttemptID:attempt_1449047829255_0001_m_000001_1 Timed out after 600 secs
15/12/02 01:40:12 INFO mapreduce.Job: map 75% reduce 0%
15/12/02 01:40:28 INFO mapreduce.Job: map 100% reduce 0%
15/12/02 01:50:41 INFO mapreduce.Job: Task Id : attempt_1449047829255_0001_m_000001_2, Status : FAILED
AttemptID:attempt_1449047829255_0001_m_000001_2 Timed out after 600 secs
15/12/02 01:50:42 INFO mapreduce.Job: map 75% reduce 0%
15/12/02 01:51:00 INFO mapreduce.Job: map 100% reduce 0%
15/12/02 02:01:13 INFO mapreduce.Job: Job job_1449047829255_0001 failed with state FAILED due to: Task failed task_1449047829255_0001_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/12/02 02:01:13 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=370395
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=556
HDFS: Number of bytes written=0
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed map tasks=4
Launched map tasks=7
Other local map tasks=3
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=2732612
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=2732612
Total vcore-seconds taken by all map tasks=2732612
Total megabyte-seconds taken by all map tasks=2798194688
Map-Reduce Framework
Map input records=0
Map output records=0
Input split bytes=504
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=759
CPU time spent (ms)=5170
Physical memory (bytes) snapshot=245080064
Virtual memory (bytes) snapshot=2529026048
Total committed heap usage (bytes)=46792704
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/12/02 02:01:13 INFO mapreduce.ExportJobBase: Transferred 556 bytes in 2,607.4894 seconds (0.2132 bytes/sec)
15/12/02 02:01:13 INFO mapreduce.ExportJobBase: Exported 0 records.
15/12/02 02:01:13 ERROR tool.ExportTool: Error during export: Export job failed!
2015-12-02 08:01:15,791 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=Application Finished - Succeeded TARGET=RMAppManager RESULT=SUCCESS APPID=application_1449047829255_0002
2015-12-02 08:01:15,793 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Attempt appattempt_1449047829255_0002_000001 is done. finalState=FINISHED
2015-12-02 08:01:15,793 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1449047829255_0002 requests cleared
2015-12-02 08:01:15,794 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application removed - appId: application_1449047829255_0002 user: root queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2015-12-02 08:01:15,794 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1449047829255_0002 user: root leaf-queue of parent: root #applications: 0
2015-12-02 08:01:15,794 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1449047829255_0002,name=info.jar,user=root,queue=default,state=FINISHED,trackingUrl=http://crxy2:8088/proxy/application_1449047829255_0002/jobhistory/job/job_1449047829255_0002,appMasterHost=crxy2,startTime=1449069503787,finishTime=1449072069229,finalStatus=FAILED
2015-12-02 08:01:15,796 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning master appattempt_1449047829255_0002_000001
2015-12-02 08:01:15,873 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Null container completed...
2015-12-02 08:01:15,873 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Null container completed...
2015-12-02 08:01:16,879 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Null container completed...
Questioner was looking at incorrect logs. He is able to troubleshoot the issue by going through failed task logs as per the suggestion in the comments section.

Resources