Sqoop imported data but with empty part-m-00000 files?

I am importing data from an Oracle database to HDFS using Apache Sqoop. The import completes, but the files are empty.
sqoop import --connect jdbc:oracle:thin:#192.168.0.15:1521:XE --username system --password system --table EMP -m 1 --target-dir /user/sinha
After running, it creates a part-m-00000 file without any data. Here is the output while running the query:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/03/05 09:43:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.12.0
18/03/05 09:43:57 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/03/05 09:44:00 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
18/03/05 09:44:58 INFO mapreduce.JobSubmitter: number of splits:1
18/03/05 09:45:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1520229051986_0016
18/03/05 09:45:03 INFO impl.YarnClientImpl: Submitted application application_1520229051986_0016
18/03/05 09:45:03 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1520229051986_0016/
18/03/05 09:45:03 INFO mapreduce.Job: Running job: job_1520229051986_0016
18/03/05 09:45:54 INFO mapreduce.Job: Job job_1520229051986_0016 running in uber mode : false
18/03/05 09:45:54 INFO mapreduce.Job: map 0% reduce 0%
18/03/05 09:46:35 INFO mapreduce.Job: map 100% reduce 0%
18/03/05 09:46:36 INFO mapreduce.Job: Job job_1520229051986_0016 completed successfully
18/03/05 09:46:36 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=151209
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=0
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=37383
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=37383
Total vcore-milliseconds taken by all map tasks=37383
Total megabyte-milliseconds taken by all map tasks=38280192
Map-Reduce Framework
Map input records=0
Map output records=0
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=546
CPU time spent (ms)=5110
Physical memory (bytes) snapshot=143175680
Virtual memory (bytes) snapshot=1509150720
Total committed heap usage (bytes)=74973184
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
18/03/05 09:46:36 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 108.9264 seconds (0 bytes/sec)
18/03/05 09:46:36 INFO mapreduce.ImportJobBase: Retrieved 0 records
I don't know what the problem is.
Even when I check with the "eval" command, it displays only the column names of the table.

Looking at the logs, your source table doesn't have any records at all. Do a SELECT * on your Oracle table to validate. Add some records to the table and try the Sqoop operation again; you should then be able to fetch the data.
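A quick way to confirm this is to run a row count through sqoop eval, using the same connection details as the import (note that the standard Oracle thin JDBC URL uses @ before the host):
sqoop eval --connect jdbc:oracle:thin:@192.168.0.15:1521:XE --username system --password system --query "SELECT COUNT(*) FROM EMP"
If this prints 0, the import itself is behaving correctly and the table simply needs data.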

Related

Hadoop producing no output?

I've recently started learning how to use the Hadoop system, and decided it's time to try writing some code. Before that, I wanted to try running the examples seen in the Getting Started page. However, it does not seem to produce any visible results.
I'm currently using Hadoop 3.3.1 in a single-node setup with JDK 11.0.11, running on Windows 10 (due to current development requirements).
I've used the following command on cmd:
hadoop jar %hadoop_home%/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep input /output 'dfs[a-z.]+'
The output to the command:
C:\Windows\system32>hadoop jar %hadoop_home%/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep input /output 'dfs[a-z.]+'
2021-12-15 00:33:10,486 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2021-12-15 00:33:10,800 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/E/.staging/job_1639519343908_0005
2021-12-15 00:33:11,029 INFO input.FileInputFormat: Total input files to process : 10
2021-12-15 00:33:11,108 INFO mapreduce.JobSubmitter: number of splits:10
2021-12-15 00:33:11,281 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1639519343908_0005
2021-12-15 00:33:11,281 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-12-15 00:33:11,442 INFO conf.Configuration: resource-types.xml not found
2021-12-15 00:33:11,443 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-12-15 00:33:11,497 INFO impl.YarnClientImpl: Submitted application application_1639519343908_0005
2021-12-15 00:33:11,527 INFO mapreduce.Job: The url to track the job: http://DESKTOP-S15C716:8088/proxy/application_1639519343908_0005/
2021-12-15 00:33:11,528 INFO mapreduce.Job: Running job: job_1639519343908_0005
2021-12-15 00:33:19,611 INFO mapreduce.Job: Job job_1639519343908_0005 running in uber mode : false
2021-12-15 00:33:19,615 INFO mapreduce.Job: map 0% reduce 0%
2021-12-15 00:33:31,178 INFO mapreduce.Job: map 50% reduce 0%
2021-12-15 00:33:32,263 INFO mapreduce.Job: map 60% reduce 0%
2021-12-15 00:33:39,624 INFO mapreduce.Job: map 90% reduce 0%
2021-12-15 00:33:40,632 INFO mapreduce.Job: map 100% reduce 0%
2021-12-15 00:33:41,636 INFO mapreduce.Job: map 100% reduce 100%
2021-12-15 00:33:41,648 INFO mapreduce.Job: Job job_1639519343908_0005 completed successfully
2021-12-15 00:33:41,760 INFO mapreduce.Job: Counters: 51
File System Counters
FILE: Number of bytes read=6
FILE: Number of bytes written=3021766
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=31877
HDFS: Number of bytes written=86
HDFS: Number of read operations=35
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Killed map tasks=1
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=89653
Total time spent by all reduces in occupied slots (ms)=8222
Total time spent by all map tasks (ms)=89653
Total time spent by all reduce tasks (ms)=8222
Total vcore-milliseconds taken by all map tasks=89653
Total vcore-milliseconds taken by all reduce tasks=8222
Total megabyte-milliseconds taken by all map tasks=91804672
Total megabyte-milliseconds taken by all reduce tasks=8419328
Map-Reduce Framework
Map input records=819
Map output records=0
Map output bytes=0
Map output materialized bytes=60
Input split bytes=1139
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=60
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=90
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=2952790016
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=30738
File Output Format Counters
Bytes Written=86
2021-12-15 00:33:41,790 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2021-12-15 00:33:41,814 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/E/.staging/job_1639519343908_0006
2021-12-15 00:33:41,855 INFO input.FileInputFormat: Total input files to process : 1
2021-12-15 00:33:41,913 INFO mapreduce.JobSubmitter: number of splits:1
2021-12-15 00:33:41,950 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1639519343908_0006
2021-12-15 00:33:41,950 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-12-15 00:33:42,179 INFO impl.YarnClientImpl: Submitted application application_1639519343908_0006
2021-12-15 00:33:42,190 INFO mapreduce.Job: The url to track the job: http://DESKTOP-S15C716:8088/proxy/application_1639519343908_0006/
2021-12-15 00:33:42,191 INFO mapreduce.Job: Running job: job_1639519343908_0006
2021-12-15 00:33:55,301 INFO mapreduce.Job: Job job_1639519343908_0006 running in uber mode : false
2021-12-15 00:33:55,302 INFO mapreduce.Job: map 0% reduce 0%
2021-12-15 00:34:00,336 INFO mapreduce.Job: map 100% reduce 0%
2021-12-15 00:34:06,366 INFO mapreduce.Job: map 100% reduce 100%
2021-12-15 00:34:07,375 INFO mapreduce.Job: Job job_1639519343908_0006 completed successfully
2021-12-15 00:34:07,404 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=6
FILE: Number of bytes written=548197
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=212
HDFS: Number of bytes written=0
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3232
Total time spent by all reduces in occupied slots (ms)=3610
Total time spent by all map tasks (ms)=3232
Total time spent by all reduce tasks (ms)=3610
Total vcore-milliseconds taken by all map tasks=3232
Total vcore-milliseconds taken by all reduce tasks=3610
Total megabyte-milliseconds taken by all map tasks=3309568
Total megabyte-milliseconds taken by all reduce tasks=3696640
Map-Reduce Framework
Map input records=0
Map output records=0
Map output bytes=0
Map output materialized bytes=6
Input split bytes=126
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=6
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=13
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=536870912
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=86
File Output Format Counters
Bytes Written=0
Yet when viewing the contents of the newly created output folder, I get the following result:
hdfs dfs -ls /output
Found 2 items
-rw-r--r-- 1 E supergroup 0 2021-12-15 00:34 /output/_SUCCESS
-rw-r--r-- 1 E supergroup 0 2021-12-15 00:34 /output/part-r-00000
I.e. there's no data written to those files!
Could anyone please assist me?
If there is no data in your HDFS input folder that matches the grep pattern 'dfs[a-z.]+', then the output will be empty.
From the linked docs (which are for Unix, not Windows), make sure this command completed:
bin/hdfs dfs -put %HADOOP_HOME%/etc/hadoop/*.xml input
You can also run grep dfs $HADOOP_HOME/etc/hadoop/*.xml locally (at least on Unix) to verify that there should be output data.
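For example, assuming the same paths as in the question, these checks would show whether there is anything for the pattern to match (findstr is the Windows analogue of grep):
hdfs dfs -ls input
hdfs dfs -cat input/*.xml | findstr dfs
findstr dfs %HADOOP_HOME%\etc\hadoop\*.xml
If none of these show any lines containing "dfs", an empty part-r-00000 is the expected result.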

Why is there no reducer when running 1TB teragen?

I am running a TeraSort benchmark for Hadoop using the following command:
hadoop jar /Users/karan.verma/Documents/backups/h/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen -Dmapreduce.job.maps=100 1t random-data
and got the following logs printed for 100 map tasks:
18/03/27 13:06:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/27 13:06:04 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
18/03/27 13:06:05 INFO terasort.TeraSort: Generating -727379968 using 100
18/03/27 13:06:05 INFO mapreduce.JobSubmitter: number of splits:100
18/03/27 13:06:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1522131782827_0001
18/03/27 13:06:06 INFO impl.YarnClientImpl: Submitted application application_1522131782827_0001
18/03/27 13:06:06 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1522131782827_0001/
18/03/27 13:06:06 INFO mapreduce.Job: Running job: job_1522131782827_0001
18/03/27 13:06:16 INFO mapreduce.Job: Job job_1522131782827_0001 running in uber mode : false
18/03/27 13:06:16 INFO mapreduce.Job: map 0% reduce 0%
18/03/27 13:06:29 INFO mapreduce.Job: map 2% reduce 0%
18/03/27 13:06:31 INFO mapreduce.Job: map 3% reduce 0%
18/03/27 13:06:32 INFO mapreduce.Job: map 5% reduce 0%
....
18/03/27 13:09:27 INFO mapreduce.Job: map 100% reduce 0%
and here are the final counters as printed on the console:
18/03/27 13:09:29 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=10660990
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=8594
HDFS: Number of bytes written=0
HDFS: Number of read operations=400
HDFS: Number of large read operations=0
HDFS: Number of write operations=200
Job Counters
Launched map tasks=100
Other local map tasks=100
Total time spent by all maps in occupied slots (ms)=983560
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=983560
Total vcore-milliseconds taken by all map tasks=983560
Total megabyte-milliseconds taken by all map tasks=1007165440
Map-Reduce Framework
Map input records=0
Map output records=0
Input split bytes=8594
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=9746
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=11220811776
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
And here is the output in the job scheduler (screenshot not included).
Please suggest why there is no reduce task.
Your run command says that you're running teragen and not terasort. teragen simply generates data that you can then use for terasort, and so no reducers are needed.
To run terasort over the data that you've just generated, run:
hadoop jar /Users/karan.verma/Documents/backups/h/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort random-data terasort-output
You should then see reducers.
No reduce tasks run when executing teragen. Here is the documentation:
TeraGen will run map tasks to generate the data and will not run any reduce tasks. The default number of map tasks is defined by the "mapreduce.job.maps=2" param. Its only purpose here is to generate the 1 TB of random data in the following format: "10 bytes key | 2 bytes break | 32 bytes ascii/hex | 4 bytes break | 48 bytes filler | 4 bytes break | \r\n".
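For reference, since each TeraGen record is 100 bytes (10 + 2 + 32 + 4 + 48 + 4 per the format above), a full 1 TB run is usually requested by passing the row count explicitly, i.e. 10,000,000,000 rows; a sketch using the jar path from the question:
hadoop jar /Users/karan.verma/Documents/backups/h/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen -Dmapreduce.job.maps=100 10000000000 random-data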

Sqoop Reading 0 records while doing a full load

PROBLEM DESCRIPTION:
I was trying to sqoop data, but Sqoop returns zero records without any error.
When I try to retrieve records using a certain limit it gets the data, but once I proceed with a greater limit it does not fetch any records.
The query that was passed to Sqoop is shown below:
select usr.id,usr.login,usr.auto_login,usr.password,usr.password_salt,usr.member,usr.first_name,usr.middle_name,usr.last_name,usr.user_type,usr.locale,usr.lastactivity_date,usr.lastpwdupdate,usr.generatedpwd,usr.registration_date,usr.email,usr.email_status,usr.receive_email,usr.last_emailed,usr.gender,usr.date_of_birth,usr.securitystatus,usr.description,usr.realm_id,usr.password_kdf,dcspp_order.last_modified_date, 20151223080640 FROM <TABLE_NAME> usr JOIN atgprdcore.dcspp_order ON (usr.id = dcspp_order.profile_id ) WHERE $CONDITIONS'
Generated SQOOP Command: sqoop job -Dmapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom" -libjars /<COMP>/stage/da_data/DataAqusition_ATG/dm-sqoop-1.0.0/lib/tdgssconfig.jar,/<COMP>/stage/da_data/DataAqusition_ATG/dm-sqoop-1.0.0/lib/ojdbc6.jar,/<COMP>/stage/da_data/DataAqusition_ATG/dm-sqoop-1.0.0/lib/nzjdbc3.jar,/<COMP>/stage/da_data/DataAqusition_ATG/dm-sqoop-1.0.0/lib/terajdbc4.jar -Dfile.encoding=UTF-8 -Dmapreduce.job.queuename=long_running -Dmapreduce.job.name=sample-job-name --create Sqoop_Utility1253423780 -- import --connect jdbc:oracle:thin:#10.202.201.15:9101:KOHLDBSA1 --username XXXXXX --password-file /tmp/sqoop-nzhdusr/27c6d6d50fccdc67342374a4f560d1d6-asdfg.txt --fetch-size 100 --query 'select usr.id,usr.login,usr.auto_login,usr.password,usr.password_salt,usr.member,usr.first_name,usr.middle_name,usr.last_name,usr.user_type,usr.locale,usr.lastactivity_date,usr.lastpwdupdate,usr.generatedpwd,usr.registration_date,usr.email,usr.email_status,usr.receive_email,usr.last_emailed,usr.gender,usr.date_of_birth,usr.securitystatus,usr.description,usr.realm_id,usr.password_kdf,dcspp_order.last_modified_date, 20151223080640 FROM <database>.<tablename> usr JOIN atgprdcore.dcspp_order ON (usr.id = dcspp_order.profile_id ) WHERE $CONDITIONS' --hive-drop-import-delims --null-string "" --target-dir /tmp/sqoop-nzhdusr/dps_user --num-mappers 1 --fields-terminated-by "|"
[INFO] running sqoop
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/12/23 08:07:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.1.10.0-881
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/12/23 08:07:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.1.10.0-881
15/12/23 08:07:18 INFO tool.CodeGenTool: Beginning code generation
15/12/23 08:07:19 INFO manager.OracleManager: Time zone has been set to GMT
15/12/23 08:07:19 INFO manager.SqlManager: Executing SQL statement: select usr.id,usr.login,usr.auto_login,usr.password,usr.password_salt,usr.member,usr.first_name,usr.middle_name,usr.last_name,usr.user_type,usr.locale,usr.lastactivity_date,usr.lastpwdupdate,usr.generatedpwd,usr.registration_date,usr.email,usr.email_status,usr.receive_email,usr.last_emailed,usr.gender,usr.date_of_birth,usr.securitystatus,usr.description,usr.realm_id,usr.password_kdf,dcspp_order.last_modified_date, 20151223080640 FROM <database>.<tablename> tab1 JOIN atgprdcore.dcspp_order ON (usr.id = dcspp_order.profile_id ) WHERE (1 = 0)
15/12/23 08:07:19 INFO manager.SqlManager: Executing SQL statement: select usr.id,usr.login,usr.auto_login,usr.password,usr.password_salt,usr.member,usr.first_name,usr.middle_name,usr.last_name,usr.user_type,usr.locale,usr.lastactivity_date,usr.lastpwdupdate,usr.generatedpwd,usr.registration_date,usr.email,usr.email_status,usr.receive_email,usr.last_emailed,usr.gender,usr.date_of_birth,usr.securitystatus,usr.description,usr.realm_id,usr.password_kdf,dcspp_order.last_modified_date, 20151223080640 FROM <database>.<tablename> tab2 JOIN atgprdcore.dcspp_order ON (usr.id = dcspp_order.profile_id ) WHERE (1 = 0)
15/12/23 08:07:19 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-nzhdusr/compile/ed8d5029fc473715d385a2c0b7e002c4/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/12/23 08:07:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-nzhdusr/compile/ed8d5029fc473715d385a2c0b7e002c4/QueryResult.jar
15/12/23 08:07:21 INFO mapreduce.ImportJobBase: Beginning query import.
15/12/23 08:07:21 INFO client.RMProxy: Connecting to ResourceManager at nhga0002.tst.<COMP>.com/10.200.0.3:8050
15/12/23 08:07:21 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 47174 for nzhdusr on ha-hdfs:<URL>
15/12/23 08:07:21 INFO security.TokenCache: Got dt for hdfs://<URL>; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:<URL>, Ident: (HDFS_DELEGATION_TOKEN token 47174 for nzhdusr)
15/12/23 08:07:23 INFO db.DBInputFormat: Using read commited transaction isolation
15/12/23 08:07:24 INFO mapreduce.JobSubmitter: number of splits:1
15/12/23 08:07:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1444949527622_18165
15/12/23 08:07:24 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:<URL>, Ident: (HDFS_DELEGATION_TOKEN token 47174 for nzhdusr)
15/12/23 08:07:24 INFO impl.YarnClientImpl: Submitted application application_1444949527622_18165
15/12/23 08:07:25 INFO mapreduce.Job: The url to track the job: https://nhga0002.tst.<COMP>.com:8090/proxy/application_1444949527622_18165/
15/12/23 08:07:25 INFO mapreduce.Job: Running job: job_1444949527622_18165
15/12/23 08:07:35 INFO mapreduce.Job: Job job_1444949527622_18165 running in uber mode : false
15/12/23 08:07:35 INFO mapreduce.Job: map 0% reduce 0%
15/12/23 08:24:57 INFO mapreduce.Job: map 100% reduce 0%
15/12/23 08:24:57 INFO mapreduce.Job: Job job_1444949527622_18165 completed successfully
15/12/23 08:24:57 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=117614
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=0
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=1039640
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=1039640
Total vcore-seconds taken by all map tasks=1039640
Total megabyte-seconds taken by all map tasks=6919843840
Map-Reduce Framework
Map input records=0
Map output records=0
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=119
CPU time spent (ms)=7760
Physical memory (bytes) snapshot=315817984
Virtual memory (bytes) snapshot=6523957248
Total committed heap usage (bytes)=1114112000
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/12/23 08:24:57 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 1,055.9463 seconds (0 bytes/sec)
15/12/23 08:24:57 INFO mapreduce.ImportJobBase: Retrieved 0 records.
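One possible check, sketched here with the question's placeholders, is to run the same join as a plain count through sqoop eval and compare it with the limited query that did return rows:
sqoop eval --connect jdbc:oracle:thin:@<host>:<port>:<SID> --username XXXXXX --password-file /tmp/sqoop-nzhdusr/27c6d6d50fccdc67342374a4f560d1d6-asdfg.txt --query "SELECT COUNT(*) FROM <TABLE_NAME> usr JOIN atgprdcore.dcspp_order ON (usr.id = dcspp_order.profile_id)"
A count of 0 would mean the join itself filters out every row over the full data set.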

How to prevent a Hadoop job from failing due to a failed reduce task

I am running an s3distcp job on AWS EMR (Hadoop 2.2.0). The job keeps failing with a failed reduce task after 3 attempts. I have also tried setting both:
mapred.max.reduce.failures.percent
mapreduce.reduce.failures.maxpercent
to 50 in both the Oozie Hadoop action configuration and mapred-site.xml (shown below), but the job still fails.
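For reference, a sketch of how these properties would typically be set in mapred-site.xml (the same values were passed in the Oozie action configuration):
<property>
  <name>mapreduce.reduce.failures.maxpercent</name>
  <value>50</value>
</property>
<property>
  <name>mapred.max.reduce.failures.percent</name>
  <value>50</value>
</property>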
And here are the logs:
2015-10-02 14:42:16,001 INFO [main] org.apache.hadoop.mapreduce.Job: Task Id : attempt_1443541526464_0115_r_000010_2, Status : FAILED
2015-10-02 14:42:17,005 INFO [main] org.apache.hadoop.mapreduce.Job: map 100% reduce 93%
2015-10-02 14:42:29,048 INFO [main] org.apache.hadoop.mapreduce.Job: map 100% reduce 98%
2015-10-02 15:04:20,369 INFO [main] org.apache.hadoop.mapreduce.Job: map 100% reduce 100%
2015-10-02 15:04:21,378 INFO [main] org.apache.hadoop.mapreduce.Job: Job job_1443541526464_0115 failed with state FAILED due to: Task failed task_1443541526464_0115_r_000010
Job failed as tasks failed. failedMaps:0 failedReduces:1
2015-10-02 15:04:21,451 INFO [main] org.apache.hadoop.mapreduce.Job: Counters: 45
File System Counters
FILE: Number of bytes read=280
FILE: Number of bytes written=10512783
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=32185011
HDFS: Number of bytes written=0
HDFS: Number of read operations=170
HDFS: Number of large read operations=0
HDFS: Number of write operations=28
Job Counters
Failed reduce tasks=4
Launched map tasks=32
Launched reduce tasks=18
Data-local map tasks=15
Rack-local map tasks=17
Total time spent by all maps in occupied slots (ms)=2652786
Total time spent by all reduces in occupied slots (ms)=65506584
Map-Reduce Framework
Map input records=156810
Map output records=156810
Map output bytes=30892192
Map output materialized bytes=6583455
Input split bytes=3904
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=7168
Reduce input records=0
Reduce output records=0
Spilled Records=156810
Shuffled Maps =448
Failed Shuffles=0
Merged Map outputs=448
GC time elapsed (ms)=2524
CPU time spent (ms)=108250
Physical memory (bytes) snapshot=14838984704
Virtual memory (bytes) snapshot=106769969152
Total committed heap usage (bytes)=18048614400
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=32181107
File Output Format Counters
Bytes Written=0
2015-10-02 15:04:21,451 INFO [main] com.amazon.external.elasticmapreduce.s3distcp.S3DistCp: Try to recursively delete hdfs:/tmp/218ad028-8035-4f97-b113-3cfea04502fc/tempspace
2015-10-02 15:04:21,515 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2015-10-02 15:04:21,516 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2015-10-02 15:04:21,554 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1443541526464_0114_m_000000_0 is done. And is in the process of committing
2015-10-02 15:04:21,570 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1443541526464_0114_m_000000_0 is allowed to commit now
2015-10-02 15:04:21,584 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1443541526464_0114_m_000000_0' to hdfs://rnd2-emr-head.ec2.int$
2015-10-02 15:04:21,598 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1443541526464_0114_m_000000_0' done.
2015-10-02 15:04:21,616 INFO [Thread-6] amazon.emr.metrics.MetricsSaver: Inside MetricsSaver Shutdown Hook
Any suggestions would be much appreciated.
Can you try cleaning the /tmp directory on HDFS? Take a backup of the directory first, since other applications also use the tmp directory; if you face any issues you can restore it from the backup.
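A minimal sketch of that backup-then-clean sequence, assuming the temp space sits under /tmp on HDFS as in the logs:
hdfs dfs -cp /tmp /tmp_backup
hdfs dfs -rm -r /tmp/*
If anything else breaks, the contents can be copied back from /tmp_backup.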

Oracle Sqoop Retrieves 0 Records

I have a table in Oracle XE 11g
SQL> create table bloblkup (
2 id NUMBER PRIMARY KEY,
3 name varchar(28) NOT NULL,
4 fdata BLOB
5 );
Table created.
SQL> desc bloblkup
Name Null? Type
----------------------------------------- -------- ----------------------------
ID NOT NULL NUMBER
NAME NOT NULL VARCHAR2(28)
FDATA BLOB
populated with
SQL> select * from bloblkup;
ID NAME
---------- ----------------------------
FDATA
--------------------------------------------------------------------------------
1 photo.jpg
032135435135
From the cluster I attempt to Sqoop this table into HDFS
sqoop import --connect jdbc:oracle:thin:#Rhea:1521:xe --username SYSTEM --password oracle --table bloblkup --columns 'name' -m 1
This executes to completion every time, but reports the following INFO message:
15/03/24 09:14:39 INFO mapreduce.ImportJobBase: Retrieved 0 records.
I can list databases and tables.
I am logging in as system which created and owns the table.
I have also found that I can query tables such as ALL_TABLES and get rows back, just not the tables I have created through SQL*Plus.
I added the -m 1 flag after the first attempt failed because Sqoop was unable to locate a primary key for the table. I then added a primary key to the table using ALTER TABLE, with no change.
thoughts?
console output:
[root#sandbox ~]# sqoop import --connect jdbc:oracle:thin:#Rhea:1521:xe --username SYSTEM --password oracle --table bloblkup --columns 'name' -m 1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/03/24 09:14:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.1.1.0-385
15/03/24 09:14:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/03/24 09:14:02 INFO manager.SqlManager: Using default fetchSize of 1000
15/03/24 09:14:02 INFO tool.CodeGenTool: Beginning code generation
15/03/24 09:14:04 INFO manager.OracleManager: Time zone has been set to GMT
15/03/24 09:14:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM bloblkup t WHERE 1=0
15/03/24 09:14:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/ce267f99c7e1b14da474c2c395368b67/bloblkup.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/03/24 09:14:08 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ce267f99c7e1b14da474c2c395368b67/bloblkup.jar
15/03/24 09:14:08 INFO manager.OracleManager: Time zone has been set to GMT
15/03/24 09:14:08 INFO mapreduce.ImportJobBase: Beginning import of bloblkup
15/03/24 09:14:09 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/03/24 09:14:10 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/03/24 09:14:10 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/192.168.1.91:8050
15/03/24 09:14:12 INFO db.DBInputFormat: Using read commited transaction isolation
15/03/24 09:14:13 INFO mapreduce.JobSubmitter: number of splits:1
15/03/24 09:14:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1427151026592_0037
15/03/24 09:14:14 INFO impl.YarnClientImpl: Submitted application application_1427151026592_0037
15/03/24 09:14:14 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1427151026592_0037/
15/03/24 09:14:14 INFO mapreduce.Job: Running job: job_1427151026592_0037
15/03/24 09:14:27 INFO mapreduce.Job: Job job_1427151026592_0037 running in uber mode : false
15/03/24 09:14:27 INFO mapreduce.Job: map 0% reduce 0%
15/03/24 09:14:38 INFO mapreduce.Job: map 100% reduce 0%
15/03/24 09:14:39 INFO mapreduce.Job: Job job_1427151026592_0037 completed successfully
15/03/24 09:14:39 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=107031
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=0
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=8553
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=8553
Total vcore-seconds taken by all map tasks=8553
Total megabyte-seconds taken by all map tasks=2138250
Map-Reduce Framework
Map input records=0
Map output records=0
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=76
CPU time spent (ms)=2170
Physical memory (bytes) snapshot=145907712
Virtual memory (bytes) snapshot=897458176
Total committed heap usage (bytes)=75497472
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/03/24 09:14:39 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 28.8478 seconds (0 bytes/sec)
15/03/24 09:14:39 INFO mapreduce.ImportJobBase: Retrieved 0 records.
Update the Oracle driver version: ojdbc6.jar appears to work but can be flaky with JDK 1.7, so use ojdbc7.jar.
Also, you must COMMIT database changes in SQL*Plus for them to persist.
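For example, in SQL*Plus (illustrative values; the table definition is the one from the question):
SQL> insert into bloblkup (id, name) values (2, 'photo2.jpg');
SQL> commit;
Until COMMIT is issued, rows inserted in the SQL*Plus session are not visible to Sqoop's separate JDBC session, which would explain an import that retrieves zero records.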
Check whether the JDBC driver is under $SQOOP_HOME/lib; if not, copy the ojdbc6.jar file to the /usr/lib/sqoop/lib/ directory.
Provide more details from the console.
If everything is fine, then add --target-dir to see the output in that specific directory:
/usr/bin/sqoop import --connect jdbc:oracle:thin:system/system#<IP address>:1521:xe --username <username> -P --table <database name>.<table name> --columns "<column names>" --target-dir <target directory path> -m 1
