Sqoop fails during export to Oracle

I have a table in Oracle and an identical table in Hive (with matching data types).
I'm trying to export the Hive table to Oracle using this script:
sqoop export -D oraoop.disabled=true -Dmapred.job.queue.name=disco --connect jdbc:oracle:thin:@oracle:1521/tns \
--username someuser \
--password somepasswd \
--hcatalog-database hive_database \
--hcatalog-table TABLE_ON_HIVE \
--table TABLE_ON_ORACLE \
--num-mappers 5
And I get an error:
18/08/29 08:23:10 INFO mapreduce.Job: Job job_1535519043541_1004 running in uber mode : false
18/08/29 08:23:10 INFO mapreduce.Job: map 0% reduce 0%
18/08/29 08:23:28 INFO mapreduce.Job: map 100% reduce 0%
18/08/29 08:28:40 INFO mapreduce.Job: Task Id : attempt_1535519043541_1004_m_000000_0, Status : FAILED
AttemptID:attempt_1535519043541_1004_m_000000_0 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
18/08/29 08:28:41 INFO mapreduce.Job: map 0% reduce 0%
18/08/29 08:28:57 INFO mapreduce.Job: map 100% reduce 0%
18/08/29 08:34:09 INFO mapreduce.Job: Task Id : attempt_1535519043541_1004_m_000000_1, Status : FAILED
AttemptID:attempt_1535519043541_1004_m_000000_1 Timed out after 300 secs
18/08/29 08:34:10 INFO mapreduce.Job: map 0% reduce 0%
18/08/29 08:34:28 INFO mapreduce.Job: map 100% reduce 0%
18/08/29 08:39:39 INFO mapreduce.Job: Task Id : attempt_1535519043541_1004_m_000000_2, Status : FAILED
AttemptID:attempt_1535519043541_1004_m_000000_2 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
18/08/29 08:39:40 INFO mapreduce.Job: map 0% reduce 0%
18/08/29 08:39:56 INFO mapreduce.Job: map 100% reduce 0%
18/08/29 08:45:11 INFO mapreduce.Job: Job job_1535519043541_1004 failed with state FAILED due to: Task failed task_1535519043541_1004_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
18/08/29 08:45:11 INFO mapreduce.Job: Counters: 9
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=1312647
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=1312647
Total vcore-seconds taken by all map tasks=1312647
Total megabyte-seconds taken by all map tasks=5376602112
18/08/29 08:45:11 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
18/08/29 08:45:11 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 1,359.6067 seconds (0 bytes/sec)
18/08/29 08:45:11 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
18/08/29 08:45:11 INFO mapreduce.ExportJobBase: Exported 0 records.
18/08/29 08:45:11 ERROR tool.ExportTool: Error during export: Export job failed!
So the map task is retried several times and the export fails with no reason explained.
Has anyone faced an issue like this?
Where can I find more detailed logs?
Pawel

Please try the command below. Note that the -D option must come right after the tool name, before the tool-specific arguments, and --connect needs your JDBC URL:
sqoop export -Dmapred.job.queue.name=disco \
--connect <jdbc_connection_string> \
--username sqoop \
--password sqoop \
--table emp \
--update-mode allowinsert \
--update-key id \
--export-dir table_location \
--input-fields-terminated-by 'delimiter'
Note: --update-mode accepts two arguments. Use "updateonly" to update existing records; this updates a record only when the update key matches.
If you want an upsert (UPDATE if the row exists, else INSERT), use "allowinsert" mode.
example:
--update-mode updateonly \ --> for updates
--update-mode allowinsert \ --> for upsert
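For example, an upsert export for the tables in the original question might look like the sketch below. The ID update key is an assumption; substitute the Oracle table's primary key, the Hive table's HDFS location, and its field delimiter:
sqoop export -Dmapred.job.queue.name=disco \
--connect jdbc:oracle:thin:@oracle:1521/tns \
--username someuser \
--password somepasswd \
--table TABLE_ON_ORACLE \
--export-dir <hdfs_path_of_TABLE_ON_HIVE> \
--input-fields-terminated-by '<delimiter>' \
--update-key ID \
--update-mode allowinsert \
--num-mappers 5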

Related

Sqoop export job failed after map 100%

I'm trying to export a table from HDFS to an Oracle database, using the command:
sqoop export --connect jdbc:oracle:thin:@ip:port/db --username user -P --table OPORTUNIDADESHIVE --export-dir /user/hadoop/OPORTUNIDADES/000000_0 --input-fields-terminated-by "\t"
where OPORTUNIDADESHIVE is the Oracle table and the file "000000_0" is the Hive table's data extracted to HDFS. Both tables have the same columns.
The Hive table is ROW FORMAT DELIMITED with FIELDS TERMINATED BY '\t'.
But in the end, the export gives me this error message:
2021-11-04 14:58:04,633 INFO mapreduce.Job: map 0% reduce 0%
2021-11-04 14:58:13,711 INFO mapreduce.Job: map 100% reduce 0%
2021-11-04 14:58:13,723 INFO mapreduce.Job: Job job_1635324128846_0049 failed with state FAILED due to: Task failed task_1635324128846_0049_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces:0
2021-11-04 14:58:13,792 INFO mapreduce.Job: Counters: 9
Job Counters
Failed map tasks=2
Killed map tasks=2
Launched map tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=28818
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=14409
Total vcore-milliseconds taken by all map tasks=14409
Total megabyte-milliseconds taken by all map tasks=29509632
2021-11-04 14:58:13,799 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
2021-11-04 14:58:13,800 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 19.1507 seconds (0 bytes/sec)
2021-11-04 14:58:13,803 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2021-11-04 14:58:13,804 INFO mapreduce.ExportJobBase: Exported 0 records.
2021-11-04 14:58:13,804 ERROR mapreduce.ExportJobBase: Export job failed!
2021-11-04 14:58:13,804 ERROR tool.ExportTool: Error during export: Export job failed!
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:445)
at org.apache.sqoop.manager.OracleManager.exportTable(OracleManager.java:465)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Sqoop export is very sensitive to data type and length mismatches, so please follow the steps below to fix this issue.
Put the Hive and Oracle table definitions side by side and check that the data types and lengths are comparable. Note that some data types, such as string in Hive, can hold more than 4000 characters, so you may need to handle such columns specially.
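One quick way to do that comparison is sketched below; sqoop eval and Oracle's all_tab_columns view are standard, but the Hive table name OPORTUNIDADES is an assumption based on the HDFS path in the question:
hive -e 'DESCRIBE OPORTUNIDADES'   # Hive side: column names and types
sqoop eval --connect jdbc:oracle:thin:@ip:port/db --username user -P \
--query "SELECT column_name, data_type, data_length FROM all_tab_columns WHERE table_name = 'OPORTUNIDADESHIVE'"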
If that comparison looks fine, the problem is most likely your --export-dir argument: it should be the path to the table directory, which is /user/hadoop/OPORTUNIDADES, not a single file inside it. A corrected sketch follows.
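Here only --export-dir changes relative to the command in the question:
sqoop export --connect jdbc:oracle:thin:@ip:port/db --username user -P \
--table OPORTUNIDADESHIVE \
--export-dir /user/hadoop/OPORTUNIDADES \
--input-fields-terminated-by '\t'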
That said, I feel it's easier to use --hcatalog-table, and here is the complete command:
sqoop export --connect jdbc:oracle:thin:@mydb:5217/user --username user --password pass --hcatalog-database hive_db --hcatalog-table hive_tab --table orac_tab --input-null-string '\\\\N' --input-null-non-string '\\\\N' -m 6
Please note we are specifying just the input and output tables, plus optional null handling.

Hive Testbench data generation failed

I cloned the Hive Testbench to try to run the Hive benchmark on a Hadoop cluster built with the Apache binary distributions of Hadoop 2.9.0, Hive 2.3.0, and Tez 0.9.0.
I managed to build both data generators, TPC-H and TPC-DS, but the next step, data generation, failed for both. The failure is very consistent: each run fails at exactly the same step and produces the same error messages.
For TPC-H, the data generation screen output is here:
$ ./tpch-setup.sh 10
ls: `/tmp/tpch-generate/10/lineitem': No such file or directory
Generating data at scale factor 10.
...
18/01/02 14:43:00 INFO mapreduce.Job: Running job: job_1514226810133_0050
18/01/02 14:43:01 INFO mapreduce.Job: Job job_1514226810133_0050 running in uber mode : false
18/01/02 14:43:01 INFO mapreduce.Job: map 0% reduce 0%
18/01/02 14:44:38 INFO mapreduce.Job: map 10% reduce 0%
18/01/02 14:44:39 INFO mapreduce.Job: map 20% reduce 0%
18/01/02 14:44:46 INFO mapreduce.Job: map 30% reduce 0%
18/01/02 14:44:48 INFO mapreduce.Job: map 40% reduce 0%
18/01/02 14:44:58 INFO mapreduce.Job: map 70% reduce 0%
18/01/02 14:45:14 INFO mapreduce.Job: map 80% reduce 0%
18/01/02 14:45:15 INFO mapreduce.Job: map 90% reduce 0%
18/01/02 14:45:23 INFO mapreduce.Job: map 100% reduce 0%
18/01/02 14:45:23 INFO mapreduce.Job: Job job_1514226810133_0050 completed successfully
18/01/02 14:45:23 INFO mapreduce.Job: Counters: 0
SLF4J: Class path contains multiple SLF4J bindings.
...
ls: `/tmp/tpch-generate/10/lineitem': No such file or directory
Data generation failed, exiting.
For TPC-DS, the error messages are here:
$ ./tpcds-setup.sh 10
...
18/01/02 22:13:58 INFO Configuration.deprecation: mapred.task.timeout is deprecated. Instead, use mapreduce.task.timeout
18/01/02 22:13:58 INFO client.RMProxy: Connecting to ResourceManager at /192.168.10.15:8032
18/01/02 22:13:59 INFO input.FileInputFormat: Total input files to process : 1
18/01/02 22:13:59 INFO mapreduce.JobSubmitter: number of splits:10
18/01/02 22:13:59 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
18/01/02 22:13:59 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/01/02 22:13:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1514226810133_0082
18/01/02 22:14:00 INFO client.YARNRunner: Number of stages: 1
18/01/02 22:14:00 INFO Configuration.deprecation: mapred.job.map.memory.mb is deprecated. Instead, use mapreduce.map.memory.mb
18/01/02 22:14:00 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.9.0, revision=0873a0118a895ca84cbdd221d8ef56fedc4b43d0, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2017-07-18T05:41:23Z ]
18/01/02 22:14:00 INFO client.RMProxy: Connecting to ResourceManager at /192.168.10.15:8032
18/01/02 22:14:00 INFO client.TezClient: Submitting DAG application with id: application_1514226810133_0082
18/01/02 22:14:00 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://192.168.10.15:8020/apps/tez,hdfs://192.168.10.15:8020/apps/tez/lib/
18/01/02 22:14:00 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
18/01/02 22:14:00 INFO client.TezClient: Tez system stage directory hdfs://192.168.10.15:8020/tmp/hadoop-yarn/staging/rapids/.staging/job_1514226810133_0082/.tez/application_1514226810133_0082 doesn't exist and is created
18/01/02 22:14:01 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1514226810133_0082, dagName=GenTable+all_10
18/01/02 22:14:01 INFO impl.YarnClientImpl: Submitted application application_1514226810133_0082
18/01/02 22:14:01 INFO client.TezClient: The url to track the Tez AM: http://boray05:8088/proxy/application_1514226810133_0082/
18/01/02 22:14:05 INFO client.RMProxy: Connecting to ResourceManager at /192.168.10.15:8032
18/01/02 22:14:05 INFO mapreduce.Job: The url to track the job: http://boray05:8088/proxy/application_1514226810133_0082/
18/01/02 22:14:05 INFO mapreduce.Job: Running job: job_1514226810133_0082
18/01/02 22:14:06 INFO mapreduce.Job: Job job_1514226810133_0082 running in uber mode : false
18/01/02 22:14:06 INFO mapreduce.Job: map 0% reduce 0%
18/01/02 22:15:51 INFO mapreduce.Job: map 10% reduce 0%
18/01/02 22:15:54 INFO mapreduce.Job: map 20% reduce 0%
18/01/02 22:15:55 INFO mapreduce.Job: map 40% reduce 0%
18/01/02 22:15:56 INFO mapreduce.Job: map 50% reduce 0%
18/01/02 22:16:07 INFO mapreduce.Job: map 60% reduce 0%
18/01/02 22:16:09 INFO mapreduce.Job: map 70% reduce 0%
18/01/02 22:16:11 INFO mapreduce.Job: map 80% reduce 0%
18/01/02 22:16:19 INFO mapreduce.Job: map 90% reduce 0%
18/01/02 22:19:54 INFO mapreduce.Job: map 100% reduce 0%
18/01/02 22:19:54 INFO mapreduce.Job: Job job_1514226810133_0082 completed successfully
18/01/02 22:19:54 INFO mapreduce.Job: Counters: 0
...
TPC-DS text data generation complete.
Loading text data into external tables.
Optimizing table time_dim (2/24).
Optimizing table date_dim (1/24).
Optimizing table item (3/24).
Optimizing table customer (4/24).
Optimizing table household_demographics (6/24).
Optimizing table customer_demographics (5/24).
Optimizing table customer_address (7/24).
Optimizing table store (8/24).
Optimizing table promotion (9/24).
Optimizing table warehouse (10/24).
Optimizing table ship_mode (11/24).
Optimizing table reason (12/24).
Optimizing table income_band (13/24).
Optimizing table call_center (14/24).
Optimizing table web_page (15/24).
Optimizing table catalog_page (16/24).
Optimizing table web_site (17/24).
make: *** [store_sales] Error 2
make: *** Waiting for unfinished jobs....
make: *** [store_returns] Error 2
Data loaded into database tpcds_bin_partitioned_orc_10.
I notice that the target temporary HDFS directory is always empty, both while the job is running and after the failure, except for the generated sub-directories.
At this point I don't even know whether the failure is due to a Hadoop configuration issue, mismatched software versions, or something else. Any help?
I had a similar issue when running this job. When I passed the script an HDFS location where I had permission to write, it completed successfully:
./tpcds-setup.sh 10 <hdfs_directory_path>
I still get this error when the script kicks off:
Data loaded into database tpcds_bin_partitioned_orc_10.
ls: `<hdfs_directory_path>/10': No such file or directory
However, the script runs successfully, and the data is generated and loaded into the Hive tables at the end.
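If you want to double-check the load in spite of that ls error, one simple sanity check (assuming the hive CLI is on your path; the database name comes from the script output above) is:
hive -e 'USE tpcds_bin_partitioned_orc_10; SHOW TABLES;'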
Hope that helps.

Can't finish MR when using Sqoop transfer data from HDFS to MYSQL

While transferring data from HDFS to MySQL, a MapReduce job gets spawned, but it gets stuck and never completes.
sqoop export --connect jdbc:mysql://crxy2:3306/test --username root --password 19911130 --table info --export-dir sqoop_export
I see the following in the logs:
Warning: /software/sqoop-1.4.6.alpha/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /software/sqoop-1.4.6.alpha/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /software/sqoop-1.4.6.alpha/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /software/sqoop-1.4.6.alpha/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
15/12/02 01:17:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
15/12/02 01:17:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/12/02 01:17:37 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/12/02 01:17:37 INFO tool.CodeGenTool: Beginning code generation
15/12/02 01:17:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `info` AS t LIMIT 1
15/12/02 01:17:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `info` AS t LIMIT 1
15/12/02 01:17:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /software/hadoop-2.6.0
Note: /tmp/sqoop-root/compile/344126e97612def1e3976c1978c2e75e/info.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/12/02 01:17:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/344126e97612def1e3976c1978c2e75e/info.jar
15/12/02 01:17:42 INFO mapreduce.ExportJobBase: Beginning export of info
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase-0.98.8-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/12/02 01:17:43 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/12/02 01:17:45 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/12/02 01:17:45 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/12/02 01:17:45 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/12/02 01:17:46 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/12/02 01:17:50 INFO input.FileInputFormat: Total input paths to process : 1
15/12/02 01:17:50 INFO input.FileInputFormat: Total input paths to process : 1
15/12/02 01:17:50 INFO mapreduce.JobSubmitter: number of splits:4
15/12/02 01:17:50 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/12/02 01:17:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449047829255_0001
15/12/02 01:17:51 INFO impl.YarnClientImpl: Submitted application application_1449047829255_0001
15/12/02 01:17:52 INFO mapreduce.Job: The url to track the job: http://crxy2:8088/proxy/application_1449047829255_0001/
15/12/02 01:17:52 INFO mapreduce.Job: Running job: job_1449047829255_0001
15/12/02 01:18:12 INFO mapreduce.Job: Job job_1449047829255_0001 running in uber mode : false
15/12/02 01:18:12 INFO mapreduce.Job: map 0% reduce 0%
15/12/02 01:19:10 INFO mapreduce.Job: map 75% reduce 0%
15/12/02 01:19:12 INFO mapreduce.Job: map 100% reduce 0%
15/12/02 01:29:41 INFO mapreduce.Job: Task Id : attempt_1449047829255_0001_m_000001_0, Status : FAILED
AttemptID:attempt_1449047829255_0001_m_000001_0 Timed out after 600 secs
15/12/02 01:29:42 INFO mapreduce.Job: map 75% reduce 0%
15/12/02 01:29:58 INFO mapreduce.Job: map 100% reduce 0%
15/12/02 01:40:11 INFO mapreduce.Job: Task Id : attempt_1449047829255_0001_m_000001_1, Status : FAILED
AttemptID:attempt_1449047829255_0001_m_000001_1 Timed out after 600 secs
15/12/02 01:40:12 INFO mapreduce.Job: map 75% reduce 0%
15/12/02 01:40:28 INFO mapreduce.Job: map 100% reduce 0%
15/12/02 01:50:41 INFO mapreduce.Job: Task Id : attempt_1449047829255_0001_m_000001_2, Status : FAILED
AttemptID:attempt_1449047829255_0001_m_000001_2 Timed out after 600 secs
15/12/02 01:50:42 INFO mapreduce.Job: map 75% reduce 0%
15/12/02 01:51:00 INFO mapreduce.Job: map 100% reduce 0%
15/12/02 02:01:13 INFO mapreduce.Job: Job job_1449047829255_0001 failed with state FAILED due to: Task failed task_1449047829255_0001_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/12/02 02:01:13 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=370395
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=556
HDFS: Number of bytes written=0
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed map tasks=4
Launched map tasks=7
Other local map tasks=3
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=2732612
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=2732612
Total vcore-seconds taken by all map tasks=2732612
Total megabyte-seconds taken by all map tasks=2798194688
Map-Reduce Framework
Map input records=0
Map output records=0
Input split bytes=504
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=759
CPU time spent (ms)=5170
Physical memory (bytes) snapshot=245080064
Virtual memory (bytes) snapshot=2529026048
Total committed heap usage (bytes)=46792704
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/12/02 02:01:13 INFO mapreduce.ExportJobBase: Transferred 556 bytes in 2,607.4894 seconds (0.2132 bytes/sec)
15/12/02 02:01:13 INFO mapreduce.ExportJobBase: Exported 0 records.
15/12/02 02:01:13 ERROR tool.ExportTool: Error during export: Export job failed!
2015-12-02 08:01:15,791 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=Application Finished - Succeeded TARGET=RMAppManager RESULT=SUCCESS APPID=application_1449047829255_0002
2015-12-02 08:01:15,793 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Attempt appattempt_1449047829255_0002_000001 is done. finalState=FINISHED
2015-12-02 08:01:15,793 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1449047829255_0002 requests cleared
2015-12-02 08:01:15,794 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application removed - appId: application_1449047829255_0002 user: root queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2015-12-02 08:01:15,794 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1449047829255_0002 user: root leaf-queue of parent: root #applications: 0
2015-12-02 08:01:15,794 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1449047829255_0002,name=info.jar,user=root,queue=default,state=FINISHED,trackingUrl=http://crxy2:8088/proxy/application_1449047829255_0002/jobhistory/job/job_1449047829255_0002,appMasterHost=crxy2,startTime=1449069503787,finishTime=1449072069229,finalStatus=FAILED
2015-12-02 08:01:15,796 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning master appattempt_1449047829255_0002_000001
2015-12-02 08:01:15,873 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Null container completed...
2015-12-02 08:01:15,873 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Null container completed...
2015-12-02 08:01:16,879 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Null container completed...
The questioner was looking at the wrong logs. They were able to troubleshoot the issue by going through the failed task logs, as suggested in the comments.
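For reference, the per-task logs can be pulled with the standard YARN CLI, using the application id from the output above (this requires log aggregation to be enabled on the cluster):
yarn logs -applicationId application_1449047829255_0001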

Sqoop using '--direct' option fails with mysqldump exit code 2 and 3

I am running Sqoop in AWS EMR and trying to copy a ~10 GB table from MySQL into HDFS.
I get the following exception:
15/07/06 12:19:07 INFO mapreduce.Job: Task Id : attempt_1435664372091_0048_m_000000_2, Status : FAILED
Error: java.io.IOException: mysqldump terminated with status 3
at org.apache.sqoop.mapreduce.MySQLDumpMapper.map(MySQLDumpMapper.java:485)
at org.apache.sqoop.mapreduce.MySQLDumpMapper.map(MySQLDumpMapper.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:152)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
15/07/06 12:19:07 INFO mapreduce.Job: Task Id : attempt_1435664372091_0048_m_000005_2, Status : FAILED
Error: java.io.IOException: mysqldump terminated with status 2
at org.apache.sqoop.mapreduce.MySQLDumpMapper.map(MySQLDumpMapper.java:485)
at org.apache.sqoop.mapreduce.MySQLDumpMapper.map(MySQLDumpMapper.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:152)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
15/07/06 12:19:08 INFO mapreduce.Job: map 0% reduce 0%
15/07/06 12:19:20 INFO mapreduce.Job: map 25% reduce 0%
15/07/06 12:19:22 INFO mapreduce.Job: map 38% reduce 0%
15/07/06 12:19:23 INFO mapreduce.Job: map 50% reduce 0%
15/07/06 12:19:24 INFO mapreduce.Job: map 75% reduce 0%
15/07/06 12:19:25 INFO mapreduce.Job: map 100% reduce 0%
15/07/06 12:23:11 INFO mapreduce.Job: Job job_1435664372091_0048 failed with state FAILED due to: Task failed task_1435664372091_0048_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/07/06 12:23:11 INFO mapreduce.Job: Counters: 8
Job Counters
Failed map tasks=28
Launched map tasks=28
Other local map tasks=28
Total time spent by all maps in occupied slots (ms)=34760760
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=5793460
Total vcore-seconds taken by all map tasks=5793460
Total megabyte-seconds taken by all map tasks=8342582400
15/07/06 12:23:11 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
15/07/06 12:23:11 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 829.8697 seconds (0 bytes/sec)
15/07/06 12:23:11 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
15/07/06 12:23:11 INFO mapreduce.ImportJobBase: Retrieved 0 records.
15/07/06 12:23:11 ERROR tool.ImportTool: Error during import: Import job failed!
If I run without the '--direct' option, I get the communications exception described in https://issues.cloudera.org/browse/SQOOP-186
I have set the 'net_write_timeout' and 'net_read_timeout' values in MySQL to 6000.
My Sqoop command looks like this:
sqoop import -D mapred.task.timeout=0 --fields-terminated-by '\t' --escaped-by '\\' --optionally-enclosed-by '\"' --bindir ./ --connect jdbc:mysql://<remote ip>/<mysql db> --username tuser --password tuser --table table1 --target-dir=/base/table1 --split-by id -m 8 --direct
How can I fix this? Am I missing something?
I have also created a Sqoop JIRA: https://issues.apache.org/jira/browse/SQOOP-2411
I have seen this error occur when Sqoop cannot divide the key space evenly and one of the map tasks processes zero rows of data. Possible workarounds are changing the number of mappers (-m) or specifying a different key column (--split-by) that has evenly distributed values.
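Applied to the command from the question, that suggestion would look something like this sketch (the split column is a placeholder; pick a column whose values are evenly distributed):
sqoop import -D mapred.task.timeout=0 --connect jdbc:mysql://<remote ip>/<mysql db> \
--username tuser --password tuser --table table1 --target-dir /base/table1 \
--split-by <evenly_distributed_column> -m 4 --direct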
Can you try running the command below and see whether it works? I'm not sure, but I suspect there is some problem with your sqoop import command.
sqoop import --connect "jdbc:mysql://<remote ip>/<mysql db>" --password "core" --username "core" --table "TABLENAME" --target-dir "/sqoopfile2" -m 8 --direct

Map Reduce Job is reported as complete on history server while on console it shows as only half way thru

I am running an MRv1 job on a Hadoop YARN 2.3.0 cluster. The problem is that when I submit this job, YARN creates multiple applications for it, and the last application running in YARN is marked as complete even though the console reports it as only 58% complete. I have confirmed that it is also not printing the log statements it is supposed to print when the job actually completes.
Please see the output from the job submission console below. It just stops at 58%, while the job history server and the YARN cluster UI report that this job has already succeeded.
14/08/28 08:36:19 INFO mapreduce.Job: map 54% reduce 0%
14/08/28 08:44:13 INFO mapreduce.Job: map 55% reduce 0%
14/08/28 08:52:16 INFO mapreduce.Job: map 56% reduce 0%
14/08/28 08:59:22 INFO mapreduce.Job: map 57% reduce 0%
14/08/28 09:07:33 INFO mapreduce.Job: map 58% reduce 0%
Thanks.
