I am writing a mapper class that should read files from an HDFS location and create a record (using a custom class) for each file. The code for the mapper class:
package com.nayan.bigdata.hadoop;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;
/**
* @file : FileToRecordMapper.java
* @author : nayan
* @version : 1.0.0
* @date : 27-Aug-2013 12:13:44 PM
* @desc : Mapper class to read files and convert it into records.
*/
public class FileToRecordMapper extends
Mapper<LongWritable, Text, Text, RecordWritable> {
private static Logger logger = Logger.getLogger(FileToRecordMapper.class);
List<Path> allPaths;
FileSystem fs;
@Override
protected void cleanup(Context context)
throws IOException, InterruptedException {
logger.info("Inside cleanup method.");
}
@Override
protected void map(LongWritable key, Text value,
Context context)
throws IOException, InterruptedException {
logger.info("Starting map method of FileToRecordMapper class.");
for(Path path : allPaths) {
FSDataInputStream in = this.fs.open(path);
Text filePath = new Text(path.getName());
Text directoryPath = new Text(path.getParent().getName());
Text filename = new Text(path.getName().substring(path.getName().lastIndexOf('/') + 1,
path.getName().length()));
byte[] b = new byte[1024];
StringBuilder contentBuilder = new StringBuilder();
while ((in.read(b)) > 0) {
contentBuilder.append(new String(b, "UTF-8"));
}
Text fileContent = new Text(contentBuilder.toString());
in.close();
RecordWritable record = new RecordWritable(filePath, filename,
fileContent, new LongWritable(System.currentTimeMillis()));
logger.info("Record Created : " + record);
context.write(directoryPath, record);
logger.info("map method of FileToRecordMapper class completed.");
}
}
@Override
public void run(Context context)
throws IOException, InterruptedException {
logger.info("Inside run method.");
}
@Override
protected void setup(Context context)
throws IOException, InterruptedException {
logger.info("Inside setup method.");
try {
logger.info("Starting configure method of FileToRecordMapper class.");
fs = FileSystem.get(context.getConfiguration());
Path path = new Path(context.getConfiguration().get("mapred.input.dir"));
allPaths = getAllPaths(path);
} catch (IOException e) {
logger.error("Error while fetching paths.", e);
}
logger.info("Paths : " + ((null != allPaths) ? allPaths : "null"));
logger.info("configure method of FileToRecordMapper class completed.");
super.setup(context);
}
private List<Path> getAllPaths(Path path) throws IOException {
ArrayList<Path> paths = new ArrayList<Path>();
getAllPaths(path, paths);
return paths;
}
private void getAllPaths(Path path, List<Path> paths) throws IOException{
try {
if (!this.fs.isFile(path)) {
for (FileStatus s : fs.listStatus(path)) {
getAllPaths(s.getPath(), paths);
}
} else {
paths.add(path);
}
} catch (IOException e) {
logger.error("File System Exception.", e);
throw e;
}
}
}
The record class:
package com.nayan.bigdata.hadoop;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
/**
* @file : RecordWritable.java
* @author : nayan
* @version : 1.0.0
* @date : 21-Aug-2013 1:53:12 PM
* @desc : Class to create a record in Accumulo
*/
public class RecordWritable implements Writable {
private Text filePath;
private Text fileName;
private Text fileContent;
private LongWritable timeStamp;
public RecordWritable() {
this.filePath = new Text();
this.fileName = new Text();
this.fileContent = new Text();
this.timeStamp = new LongWritable(System.currentTimeMillis());
}
/**
* @param filePath
* @param fileName
* @param fileContent
* @param timeStamp
*/
public RecordWritable(Text filePath, Text fileName, Text fileContent,
LongWritable timeStamp) {
this.filePath = filePath;
this.fileName = fileName;
this.fileContent = fileContent;
this.timeStamp = timeStamp;
}
public Text getFilePath() {
return filePath;
}
public void setFilePath(Text filePath) {
this.filePath = filePath;
}
public Text getFileName() {
return fileName;
}
public void setFileName(Text fileName) {
this.fileName = fileName;
}
public Text getFileContent() {
return fileContent;
}
public void setFileContent(Text fileContent) {
this.fileContent = fileContent;
}
public LongWritable getTimeStamp() {
return timeStamp;
}
public void setTimeStamp(LongWritable timeStamp) {
this.timeStamp = timeStamp;
}
@Override
public int hashCode() {
return this.filePath.getLength() + this.fileName.getLength() + this.fileContent.getLength();
}
@Override
public boolean equals(Object obj) {
if(obj instanceof RecordWritable) {
RecordWritable otherRecord = (RecordWritable) obj;
return this.filePath.equals(otherRecord.filePath) && this.fileName.equals(otherRecord.fileName);
}
return false;
}
@Override
public String toString() {
StringBuilder recordDesc = new StringBuilder("Record Details ::\t");
recordDesc.append("File Path + ").append(this.filePath).append("\t");
recordDesc.append("File Name + ").append(this.fileName).append("\t");
recordDesc.append("File Content Length + ").append(this.fileContent.getLength()).append("\t");
recordDesc.append("File TimeStamp + ").append(this.timeStamp).append("\t");
return recordDesc.toString();
}
@Override
public void readFields(DataInput din) throws IOException {
filePath.readFields(din);
fileName.readFields(din);
fileContent.readFields(din);
timeStamp.readFields(din);
}
@Override
public void write(DataOutput dout) throws IOException {
filePath.write(dout);
fileName.write(dout);
fileContent.write(dout);
timeStamp.write(dout);
}
}
The job runner class:
package com.nayan.bigdata.hadoop;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
/**
* @file : HadoopJobRunner.java
* @author : nayan
* @version : 1.0.0
* @date : 22-Aug-2013 12:45:15 PM
* @desc : Class to run Hadoop MR job.
*/
public class HadoopJobRunner extends Configured implements Tool {
private static Logger logger = Logger.getLogger(HadoopJobRunner.class);
/**
* @param args
* @throws Exception
*/
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new HadoopJobRunner(), args);
System.exit(res);
}
@Override
public int run(String[] arg0) throws Exception {
logger.info("Initiating Hadoop Job.");
Configuration conf = new Configuration(true);
conf.setStrings("mapred.output.dir", arg0[1]);
conf.setStrings("mapred.input.dir", arg0[0]);
Job mrJob = new Job(conf, "FileRecordsJob");
mrJob.setJarByClass(HadoopJobRunner.class);
mrJob.setMapOutputKeyClass(Text.class);
mrJob.setMapOutputValueClass(RecordWritable.class);
mrJob.setMapperClass(FileToRecordMapper.class);
mrJob.setReducerClass(FileRecordsReducer.class);
mrJob.setOutputKeyClass(Text.class);
mrJob.setOutputValueClass(RecordWritable.class);
logger.info("MapRed Job Configuration : " + mrJob.getConfiguration().toString());
logger.info("Input Path : " + mrJob.getConfiguration().get("mapred.input.dir"));
return mrJob.waitForCompletion(true) ? 0 : 1;
}
}
The POM file for the project:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.nayan.bigdata</groupId>
<artifactId>BigDataOperations</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>BigDataOperations</name>
<properties>
<hadoop.version>0.20.2</hadoop.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-all</artifactId>
<version>1.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<pluginManagement>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>com.nayan.bigdata.hadoop.HadoopJobRunner</mainClass>
</manifest>
</archive>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
When I run the jar, I get the following output on the console:
[root@koversevm tmp]# hadoop jar BigDataOperations-1.0-SNAPSHOT.jar /usr/hadoop/sample /usr/hadoop/jobout
13/08/28 18:33:57 INFO hadoop.HadoopJobRunner: Initiating Hadoop Job.
13/08/28 18:33:57 INFO hadoop.HadoopJobRunner: Setting the input/output path.
13/08/28 18:33:57 INFO hadoop.HadoopJobRunner: MapRed Job Configuration : Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml
13/08/28 18:33:57 INFO hadoop.HadoopJobRunner: Input Path : null
13/08/28 18:33:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/08/28 18:33:58 INFO input.FileInputFormat: Total input paths to process : 8
13/08/28 18:33:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/08/28 18:33:58 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/28 18:33:58 INFO mapred.JobClient: Running job: job_201308281800_0008
13/08/28 18:33:59 INFO mapred.JobClient: map 0% reduce 0%
13/08/28 18:34:06 INFO mapred.JobClient: map 25% reduce 0%
13/08/28 18:34:13 INFO mapred.JobClient: map 50% reduce 0%
13/08/28 18:34:17 INFO mapred.JobClient: map 75% reduce 0%
13/08/28 18:34:23 INFO mapred.JobClient: map 100% reduce 0%
13/08/28 18:34:24 INFO mapred.JobClient: map 100% reduce 33%
13/08/28 18:34:26 INFO mapred.JobClient: map 100% reduce 100%
13/08/28 18:34:27 INFO mapred.JobClient: Job complete: job_201308281800_0008
13/08/28 18:34:27 INFO mapred.JobClient: Counters: 25
13/08/28 18:34:27 INFO mapred.JobClient: Job Counters
13/08/28 18:34:27 INFO mapred.JobClient: Launched reduce tasks=1
13/08/28 18:34:27 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=44066
13/08/28 18:34:27 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/28 18:34:27 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/28 18:34:27 INFO mapred.JobClient: Launched map tasks=8
13/08/28 18:34:27 INFO mapred.JobClient: Data-local map tasks=8
13/08/28 18:34:27 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=19034
13/08/28 18:34:27 INFO mapred.JobClient: FileSystemCounters
13/08/28 18:34:27 INFO mapred.JobClient: FILE_BYTES_READ=6
13/08/28 18:34:27 INFO mapred.JobClient: HDFS_BYTES_READ=1011
13/08/28 18:34:27 INFO mapred.JobClient: FILE_BYTES_WRITTEN=549207
13/08/28 18:34:27 INFO mapred.JobClient: Map-Reduce Framework
13/08/28 18:34:27 INFO mapred.JobClient: Map input records=0
13/08/28 18:34:27 INFO mapred.JobClient: Reduce shuffle bytes=48
13/08/28 18:34:27 INFO mapred.JobClient: Spilled Records=0
13/08/28 18:34:27 INFO mapred.JobClient: Map output bytes=0
13/08/28 18:34:27 INFO mapred.JobClient: CPU time spent (ms)=3030
13/08/28 18:34:27 INFO mapred.JobClient: Total committed heap usage (bytes)=1473413120
13/08/28 18:34:27 INFO mapred.JobClient: Combine input records=0
13/08/28 18:34:27 INFO mapred.JobClient: SPLIT_RAW_BYTES=1011
13/08/28 18:34:27 INFO mapred.JobClient: Reduce input records=0
13/08/28 18:34:27 INFO mapred.JobClient: Reduce input groups=0
13/08/28 18:34:27 INFO mapred.JobClient: Combine output records=0
13/08/28 18:34:27 INFO mapred.JobClient: Physical memory (bytes) snapshot=1607675904
13/08/28 18:34:27 INFO mapred.JobClient: Reduce output records=0
13/08/28 18:34:27 INFO mapred.JobClient: Virtual memory (bytes) snapshot=23948111872
13/08/28 18:34:27 INFO mapred.JobClient: Map output records=0
But when I look into the logs, I find the following exception:
Task Logs: 'attempt_201308281800_0008_m_000000_0'
stdout logs
2013-08-28 18:34:01 DEBUG Child:82 - Child starting
2013-08-28 18:34:02 DEBUG Groups:136 - Creating new Groups object
2013-08-28 18:34:02 DEBUG Groups:59 - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
2013-08-28 18:34:02 DEBUG UserGroupInformation:193 - hadoop login
2013-08-28 18:34:02 DEBUG UserGroupInformation:142 - hadoop login commit
2013-08-28 18:34:02 DEBUG UserGroupInformation:172 - using local user:UnixPrincipal: mapred
2013-08-28 18:34:02 DEBUG UserGroupInformation:664 - UGI loginUser:mapred (auth:SIMPLE)
2013-08-28 18:34:02 DEBUG FileSystem:1598 - Creating filesystem for file:///var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/root/jobcache/job_201308281800_0008/jobToken
2013-08-28 18:34:02 DEBUG TokenCache:182 - Task: Loaded jobTokenFile from: /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/root/jobcache/job_201308281800_0008/jobToken; num of sec keys = 0 Number of tokens 1
2013-08-28 18:34:02 DEBUG Child:106 - loading token. # keys =0; from file=/var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/root/jobcache/job_201308281800_0008/jobToken
2013-08-28 18:34:02 DEBUG UserGroupInformation:1300 - PriviledgedAction as:job_201308281800_0008 (auth:SIMPLE) from:org.apache.hadoop.mapred.Child.main(Child.java:121)
2013-08-28 18:34:02 DEBUG Client:256 - The ping interval is60000ms.
2013-08-28 18:34:02 DEBUG Client:299 - Use SIMPLE authentication for protocol TaskUmbilicalProtocol
2013-08-28 18:34:02 DEBUG Client:569 - Connecting to /127.0.0.1:50925
2013-08-28 18:34:02 DEBUG Client:762 - IPC Client (47) connection to /127.0.0.1:50925 from job_201308281800_0008: starting, having connections 1
2013-08-28 18:34:02 DEBUG Client:808 - IPC Client (47) connection to /127.0.0.1:50925 from job_201308281800_0008 sending #0
2013-08-28 18:34:02 DEBUG Client:861 - IPC Client (47) connection to /127.0.0.1:50925 from job_201308281800_0008 got value #0
2013-08-28 18:34:02 DEBUG RPC:230 - Call: getProtocolVersion 98
2013-08-28 18:34:02 DEBUG Client:808 - IPC Client (47) connection to /127.0.0.1:50925 from job_201308281800_0008 sending #1
2013-08-28 18:34:02 DEBUG Client:861 - IPC Client (47) connection to /127.0.0.1:50925 from job_201308281800_0008 got value #1
2013-08-28 18:34:02 DEBUG SortedRanges:347 - currentIndex 0 0:0
2013-08-28 18:34:02 DEBUG Counters:177 - Creating group org.apache.hadoop.mapred.Task$Counter with bundle
2013-08-28 18:34:02 DEBUG Counters:314 - Adding SPILLED_RECORDS
2013-08-28 18:34:02 DEBUG Counters:177 - Creating group org.apache.hadoop.mapred.Task$Counter with bundle
2013-08-28 18:34:02 DEBUG SortedRanges:347 - currentIndex 0 0:0
2013-08-28 18:34:02 DEBUG SortedRanges:347 - currentIndex 1 0:0
2013-08-28 18:34:02 DEBUG RPC:230 - Call: getTask 208
2013-08-28 18:34:03 DEBUG TaskRunner:653 - mapred.local.dir for child : /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/root/jobcache/job_201308281800_0008/attempt_201308281800_0008_m_000000_0
2013-08-28 18:34:03 DEBUG NativeCodeLoader:40 - Trying to load the custom-built native-hadoop library...
2013-08-28 18:34:03 DEBUG NativeCodeLoader:47 - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
2013-08-28 18:34:03 DEBUG NativeCodeLoader:48 - java.library.path=/usr/java/jdk1.6.0_45/jre/lib/amd64/server:/usr/java/jdk1.6.0_45/jre/lib/amd64:/usr/java/jdk1.6.0_45/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib:/var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/root/jobcache/job_201308281800_0008/attempt_201308281800_0008_m_000000_0/work
2013-08-28 18:34:03 WARN NativeCodeLoader:52 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-08-28 18:34:03 DEBUG TaskRunner:709 - Fully deleting contents of /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/root/jobcache/job_201308281800_0008/attempt_201308281800_0008_m_000000_0/work
2013-08-28 18:34:03 INFO JvmMetrics:71 - Initializing JVM Metrics with processName=MAP, sessionId=
2013-08-28 18:34:03 DEBUG Child:251 - Creating remote user to execute task: root
2013-08-28 18:34:03 DEBUG UserGroupInformation:1300 - PriviledgedAction as:root (auth:SIMPLE) from:org.apache.hadoop.mapred.Child.main(Child.java:260)
2013-08-28 18:34:03 DEBUG FileSystem:1598 - Creating filesystem for hdfs://localhost:8020
2013-08-28 18:34:04 DEBUG Client:256 - The ping interval is60000ms.
2013-08-28 18:34:04 DEBUG Client:299 - Use SIMPLE authentication for protocol ClientProtocol
2013-08-28 18:34:04 DEBUG Client:569 - Connecting to localhost/127.0.0.1:8020
2013-08-28 18:34:04 DEBUG Client:808 - IPC Client (47) connection to localhost/127.0.0.1:8020 from root sending #2
2013-08-28 18:34:04 DEBUG Client:762 - IPC Client (47) connection to localhost/127.0.0.1:8020 from root: starting, having connections 2
2013-08-28 18:34:04 DEBUG Client:861 - IPC Client (47) connection to localhost/127.0.0.1:8020 from root got value #2
2013-08-28 18:34:04 DEBUG RPC:230 - Call: getProtocolVersion 18
2013-08-28 18:34:04 DEBUG DFSClient:274 - Short circuit read is false
2013-08-28 18:34:04 DEBUG DFSClient:280 - Connect to datanode via hostname is false
2013-08-28 18:34:04 DEBUG Task:516 - using new api for output committer
2013-08-28 18:34:04 INFO ProcessTree:65 - setsid exited with exit code 0
2013-08-28 18:34:04 INFO Task:539 - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin#79ee2c2c
2013-08-28 18:34:04 DEBUG ProcfsBasedProcessTree:238 - [ 16890 ]
2013-08-28 18:34:04 DEBUG Client:808 - IPC Client (47) connection to localhost/127.0.0.1:8020 from root sending #3
2013-08-28 18:34:04 DEBUG Client:861 - IPC Client (47) connection to localhost/127.0.0.1:8020 from root got value #3
2013-08-28 18:34:04 DEBUG RPC:230 - Call: getBlockLocations 12
2013-08-28 18:34:04 DEBUG DFSClient:2595 - Connecting to /127.0.0.1:50010
2013-08-28 18:34:04 DEBUG FSInputChecker:1653 - DFSClient readChunk got seqno 0 offsetInBlock 0 lastPacketInBlock false packetLen 520
2013-08-28 18:34:04 DEBUG Counters:314 - Adding SPLIT_RAW_BYTES
2013-08-28 18:34:04 DEBUG DFSClient:2529 - Client couldn't reuse - didnt send code
2013-08-28 18:34:04 INFO MapTask:613 - Processing split: hdfs://localhost:8020/usr/hadoop/sample/2012MTCReportFINAL.pdf:0+1419623
2013-08-28 18:34:04 DEBUG Counters:314 - Adding MAP_INPUT_RECORDS
2013-08-28 18:34:04 DEBUG FileSystem:1598 - Creating filesystem for file:///
2013-08-28 18:34:04 INFO MapTask:803 - io.sort.mb = 100
2013-08-28 18:34:05 INFO MapTask:815 - data buffer = 79691776/99614720
2013-08-28 18:34:05 INFO MapTask:816 - record buffer = 262144/327680
2013-08-28 18:34:05 DEBUG Counters:314 - Adding MAP_OUTPUT_BYTES
2013-08-28 18:34:05 DEBUG Counters:314 - Adding MAP_OUTPUT_RECORDS
2013-08-28 18:34:05 DEBUG Counters:314 - Adding COMBINE_INPUT_RECORDS
2013-08-28 18:34:05 DEBUG Counters:314 - Adding COMBINE_OUTPUT_RECORDS
2013-08-28 18:34:05 WARN LoadSnappy:46 - Snappy native library not loaded
2013-08-28 18:34:05 DEBUG Client:808 - IPC Client (47) connection to localhost/127.0.0.1:8020 from root sending #4
2013-08-28 18:34:05 DEBUG Client:861 - IPC Client (47) connection to localhost/127.0.0.1:8020 from root got value #4
2013-08-28 18:34:05 DEBUG RPC:230 - Call: getBlockLocations 4
2013-08-28 18:34:05 INFO FileToRecordMapper:65 - Inside run method.
2013-08-28 18:34:05 INFO MapTask:1142 - Starting flush of map output
2013-08-28 18:34:05 INFO Task:830 - Task:attempt_201308281800_0008_m_000000_0 is done. And is in the process of commiting
2013-08-28 18:34:05 DEBUG Counters:177 - Creating group FileSystemCounters with nothing
2013-08-28 18:34:05 DEBUG Counters:314 - Adding FILE_BYTES_WRITTEN
2013-08-28 18:34:05 DEBUG Counters:314 - Adding HDFS_BYTES_READ
2013-08-28 18:34:05 DEBUG Counters:314 - Adding COMMITTED_HEAP_BYTES
2013-08-28 18:34:05 DEBUG ProcfsBasedProcessTree:238 - [ 16890 ]
2013-08-28 18:34:05 DEBUG Counters:314 - Adding CPU_MILLISECONDS
2013-08-28 18:34:05 DEBUG Counters:314 - Adding PHYSICAL_MEMORY_BYTES
2013-08-28 18:34:05 DEBUG Counters:314 - Adding VIRTUAL_MEMORY_BYTES
2013-08-28 18:34:05 DEBUG Client:808 - IPC Client (47) connection to localhost/127.0.0.1:8020 from root sending #5
2013-08-28 18:34:05 DEBUG Client:861 - IPC Client (47) connection to localhost/127.0.0.1:8020 from root got value #5
2013-08-28 18:34:05 DEBUG RPC:230 - Call: getFileInfo 2
2013-08-28 18:34:05 DEBUG Task:658 - attempt_201308281800_0008_m_000000_0 Progress/ping thread exiting since it got interrupted
2013-08-28 18:34:05 DEBUG Client:808 - IPC Client (47) connection to /127.0.0.1:50925 from job_201308281800_0008 sending #6
2013-08-28 18:34:05 DEBUG Client:861 - IPC Client (47) connection to /127.0.0.1:50925 from job_201308281800_0008 got value #6
2013-08-28 18:34:05 DEBUG RPC:230 - Call: statusUpdate 3
2013-08-28 18:34:05 DEBUG Client:808 - IPC Client (47) connection to /127.0.0.1:50925 from job_201308281800_0008 sending #7
2013-08-28 18:34:05 DEBUG Client:861 - IPC Client (47) connection to /127.0.0.1:50925 from job_201308281800_0008 got value #7
2013-08-28 18:34:05 DEBUG RPC:230 - Call: done 1
2013-08-28 18:34:05 INFO Task:942 - Task 'attempt_201308281800_0008_m_000000_0' done.
2013-08-28 18:34:05 INFO TaskLogsTruncater:69 - Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-08-28 18:34:05 DEBUG TaskLogsTruncater:174 - Truncation is not needed for /usr/lib/hadoop-0.20/logs/userlogs/job_201308281800_0008/attempt_201308281800_0008_m_000000_0/stdout
2013-08-28 18:34:05 DEBUG TaskLogsTruncater:174 - Truncation is not needed for /usr/lib/hadoop-0.20/logs/userlogs/job_201308281800_0008/attempt_201308281800_0008_m_000000_0/stderr
2013-08-28 18:34:05 DEBUG TaskLogsTruncater:202 - Cannot open /usr/lib/hadoop-0.20/logs/userlogs/job_201308281800_0008/attempt_201308281800_0008_m_000000_0/syslog for reading. Continuing with other log files
java.io.FileNotFoundException: /usr/lib/hadoop-0.20/logs/userlogs/job_201308281800_0008/attempt_201308281800_0008_m_000000_0/syslog (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:199)
at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
2013-08-28 18:34:05 DEBUG TaskLogsTruncater:202 - Cannot open /usr/lib/hadoop-0.20/logs/userlogs/job_201308281800_0008/attempt_201308281800_0008_m_000000_0/profile.out for reading. Continuing with other log files
java.io.FileNotFoundException: /usr/lib/hadoop-0.20/logs/userlogs/job_201308281800_0008/attempt_201308281800_0008_m_000000_0/profile.out (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:199)
at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
2013-08-28 18:34:05 DEBUG TaskLogsTruncater:202 - Cannot open /usr/lib/hadoop-0.20/logs/userlogs/job_201308281800_0008/attempt_201308281800_0008_m_000000_0/debugout for reading. Continuing with other log files
java.io.FileNotFoundException: /usr/lib/hadoop-0.20/logs/userlogs/job_201308281800_0008/attempt_201308281800_0008_m_000000_0/debugout (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:199)
at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
I have checked the permissions, and the sample WordCount program works fine. I am new to Hadoop; I googled but could not find anything substantial. I am using hadoop-0.20.2-cdh3u6 on a single-node setup.
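For reference, the default Mapper.run() in the new API is roughly the following (paraphrased from the Hadoop Mapper source, not copied from any specific release). It is the method that drives setup(), map(), and cleanup(), so an override whose body never calls map() or super.run() means none of those methods are invoked, which matches the task log above where only "Inside run method." appears.
    // Roughly the default behavior of org.apache.hadoop.mapreduce.Mapper.run()
    // (paraphrased from memory; later Hadoop releases wrap the loop in try/finally).
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        cleanup(context);
    }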
Related
I'm running the code below and no output is generated (well, the output folder and the reducer output file are created, but there is nothing within the part-r-00000 file). From the logs, I suspect the mappers are not emitting anything.
The code:
package com.telefonica.iot.tidoop.mrlib;
import com.telefonica.iot.tidoop.mrlib.utils.Constants;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
public class Count extends Configured implements Tool {
private static final Logger LOGGER = Logger.getLogger(Count.class);
public static class UnitEmitter extends Mapper<Object, Text, Text, LongWritable> {
private final Text commonKey = new Text("common-key");
@Override
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
context.write(commonKey, new LongWritable(1));
} // map
} // UnitEmitter
public static class Adder extends Reducer<Text, LongWritable, Text, LongWritable> {
@Override
public void reduce(Text key, Iterable<LongWritable> values, Context context)
throws IOException, InterruptedException {
long sum = 0;
for (LongWritable value : values) {
sum += value.get();
} // for
context.write(key, new LongWritable(sum));
} // reduce
} // Adder
public static class AdderWithTag extends Reducer<Text, LongWritable, Text, LongWritable> {
private String tag;
@Override
public void setup(Context context) throws IOException, InterruptedException {
tag = context.getConfiguration().get(Constants.PARAM_TAG, "");
} // setup
@Override
public void reduce(Text key, Iterable<LongWritable> values, Context context)
throws IOException, InterruptedException {
long sum = 0;
for (LongWritable value : values) {
sum += value.get();
} // for
context.write(new Text(tag), new LongWritable(sum));
} // reduce
} // AdderWithTag
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new Filter(), args);
System.exit(res);
} // main
@Override
public int run(String[] args) throws Exception {
// check the number of arguments, show the usage if it is wrong
if (args.length != 3) {
showUsage();
return -1;
} // if
// get the arguments
String input = args[0];
String output = args[1];
String tag = args[2];
// create and configure a MapReduce job
Configuration conf = this.getConf();
conf.set(Constants.PARAM_TAG, tag);
Job job = Job.getInstance(conf, "tidoop-mr-lib-count");
job.setNumReduceTasks(1);
job.setJarByClass(Count.class);
job.setMapperClass(UnitEmitter.class);
job.setCombinerClass(Adder.class);
job.setReducerClass(AdderWithTag.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
FileInputFormat.addInputPath(job, new Path(input));
FileOutputFormat.setOutputPath(job, new Path(output));
// run the MapReduce job
return job.waitForCompletion(true) ? 0 : 1;
} // run
private void showUsage() {
System.out.println("...");
} // showUsage
} // Count
The command executed, and the output logs:
$ hadoop jar target/tidoop-mr-lib-0.0.0-SNAPSHOT-jar-with-dependencies.jar com.telefonica.iot.tidoop.mrlib.Count -libjars target/tidoop-mr-lib-0.0.0-SNAPSHOT-jar-with-dependencies.jar tidoop/numbers tidoop/numbers_count onetag
15/11/05 17:24:52 INFO input.FileInputFormat: Total input paths to process : 1
15/11/05 17:24:52 WARN snappy.LoadSnappy: Snappy native library is available
15/11/05 17:24:53 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/11/05 17:24:53 INFO snappy.LoadSnappy: Snappy native library loaded
15/11/05 17:24:53 INFO mapred.JobClient: Running job: job_201507101501_23002
15/11/05 17:24:54 INFO mapred.JobClient: map 0% reduce 0%
15/11/05 17:25:00 INFO mapred.JobClient: map 100% reduce 0%
15/11/05 17:25:07 INFO mapred.JobClient: map 100% reduce 33%
15/11/05 17:25:08 INFO mapred.JobClient: map 100% reduce 100%
15/11/05 17:25:09 INFO mapred.JobClient: Job complete: job_201507101501_23002
15/11/05 17:25:09 INFO mapred.JobClient: Counters: 25
15/11/05 17:25:09 INFO mapred.JobClient: Job Counters
15/11/05 17:25:09 INFO mapred.JobClient: Launched reduce tasks=1
15/11/05 17:25:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5350
15/11/05 17:25:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/11/05 17:25:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/11/05 17:25:09 INFO mapred.JobClient: Rack-local map tasks=1
15/11/05 17:25:09 INFO mapred.JobClient: Launched map tasks=1
15/11/05 17:25:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8702
15/11/05 17:25:09 INFO mapred.JobClient: FileSystemCounters
15/11/05 17:25:09 INFO mapred.JobClient: FILE_BYTES_READ=6
15/11/05 17:25:09 INFO mapred.JobClient: HDFS_BYTES_READ=1968928
15/11/05 17:25:09 INFO mapred.JobClient: FILE_BYTES_WRITTEN=108226
15/11/05 17:25:09 INFO mapred.JobClient: Map-Reduce Framework
15/11/05 17:25:09 INFO mapred.JobClient: Map input records=598001
15/11/05 17:25:09 INFO mapred.JobClient: Reduce shuffle bytes=6
15/11/05 17:25:09 INFO mapred.JobClient: Spilled Records=0
15/11/05 17:25:09 INFO mapred.JobClient: Map output bytes=0
15/11/05 17:25:09 INFO mapred.JobClient: CPU time spent (ms)=2920
15/11/05 17:25:09 INFO mapred.JobClient: Total committed heap usage (bytes)=355663872
15/11/05 17:25:09 INFO mapred.JobClient: Combine input records=0
15/11/05 17:25:09 INFO mapred.JobClient: SPLIT_RAW_BYTES=124
15/11/05 17:25:09 INFO mapred.JobClient: Reduce input records=0
15/11/05 17:25:09 INFO mapred.JobClient: Reduce input groups=0
15/11/05 17:25:09 INFO mapred.JobClient: Combine output records=0
15/11/05 17:25:09 INFO mapred.JobClient: Physical memory (bytes) snapshot=328683520
15/11/05 17:25:09 INFO mapred.JobClient: Reduce output records=0
15/11/05 17:25:09 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1466642432
15/11/05 17:25:09 INFO mapred.JobClient: Map output records=0
The content of the output file:
$ hadoop fs -cat /user/frb/tidoop/numbers_count/part-r-00000
[frb@cosmosmaster-gi tidoop-mr-lib]$ hadoop fs -ls /user/frb/tidoop/numbers_count/
Found 3 items
-rw-r--r-- 3 frb frb 0 2015-11-05 17:25 /user/frb/tidoop/numbers_count/_SUCCESS
drwxr----- - frb frb 0 2015-11-05 17:24 /user/frb/tidoop/numbers_count/_logs
-rw-r--r-- 3 frb frb 0 2015-11-05 17:25 /user/frb/tidoop/numbers_count/part-r-00000
Any hints about what is happening?
Weird. I'd try using the identity Mapper with your job.
If the Mapper does not output anything, there must be something weird with your Hadoop installation or job configuration.
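A minimal sketch of that suggestion (my own illustration under assumptions, not code from the original answer): a throwaway driver that plugs in the stock identity Mapper with zero reducers, so whatever the input format delivers is written straight to the output. If that output is also empty, the problem is in the input path or job configuration rather than in UnitEmitter. The class name IdentityMapperCheck is hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IdentityMapperCheck {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "identity-mapper-check");
        job.setJarByClass(IdentityMapperCheck.class);
        job.setMapperClass(Mapper.class);   // stock identity mapper: writes (key, value) unchanged
        job.setNumReduceTasks(0);           // map output goes straight to the output format
        // With the default TextInputFormat, keys are byte offsets and values are the lines.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}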
When I run my code, I get the following exception:
hadoop@hadoop:~/testPrograms$ hadoop jar cp.jar CustomPartition /test/test.txt /test/output33
15/03/03 16:33:33 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/03/03 16:33:33 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/03/03 16:33:33 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/03/03 16:33:33 INFO input.FileInputFormat: Total input paths to process : 1
15/03/03 16:33:34 INFO mapreduce.JobSubmitter: number of splits:1
15/03/03 16:33:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1055584612_0001
15/03/03 16:33:35 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/03/03 16:33:35 INFO mapreduce.Job: Running job: job_local1055584612_0001
15/03/03 16:33:35 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/03/03 16:33:35 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/03/03 16:33:35 INFO mapred.LocalJobRunner: Waiting for map tasks
15/03/03 16:33:35 INFO mapred.LocalJobRunner: Starting task: attempt_local1055584612_0001_m_000000_0
15/03/03 16:33:35 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/03/03 16:33:35 INFO mapred.MapTask: Processing split: hdfs://node1/test/test.txt:0+107
15/03/03 16:33:35 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/03/03 16:33:35 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/03/03 16:33:35 INFO mapred.MapTask: soft limit at 83886080
15/03/03 16:33:35 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/03/03 16:33:35 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/03/03 16:33:35 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/03/03 16:33:35 INFO mapred.MapTask: Starting flush of map output
15/03/03 16:33:35 INFO mapred.LocalJobRunner: map task executor complete.
15/03/03 16:33:35 WARN mapred.LocalJobRunner: job_local1055584612_0001
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at CustomPartition$MapperClass.map(CustomPartition.java:27)
at CustomPartition$MapperClass.map(CustomPartition.java:17)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/03/03 16:33:36 INFO mapreduce.Job: Job job_local1055584612_0001 running in uber mode : false
15/03/03 16:33:36 INFO mapreduce.Job: map 0% reduce 0%
15/03/03 16:33:36 INFO mapreduce.Job: Job job_local1055584612_0001 failed with state FAILED due to: NA
15/03/03 16:33:36 INFO mapreduce.Job: Counters: 0
I am trying to partition based on the game each person plays. The fields are separated by tabs, and after the three fields I move to the next line by pressing the return key.
My code:
public class CustomPartition {
public static class MapperClass extends Mapper<Object, Text, Text, Text>{
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String itr[] = value.toString().split("\t");
String game=itr[2].toString();
String nameGoals=itr[0]+"\t"+itr[1];
context.write(new Text(game), new Text(nameGoals));
}
}
public static class GoalPartition extends Partitioner<Text, Text> {
@Override
public int getPartition(Text key,Text value, int numReduceTasks){
if(key.toString()=="football")
{return 0;}
else if(key.toString()=="basketball")
{return 1;}
else// (key.toString()=="icehockey")
{return 2;}
}
}
public static class ReducerClass extends Reducer<Text,Text,Text,Text> {
@Override
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
String name="";
String game="";
int maxGoals=0;
for (Text val : values)
{
String valTokens[]= val.toString().split("\t");
int goals = Integer.parseInt(valTokens[1]);
if(goals > maxGoals)
{
name = valTokens[0];
game = key.toString();
maxGoals = goals;
context.write(new Text(name), new Text ("game"+game+"score"+maxGoals));
}
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "custom partition");
job.setJarByClass(CustomPartition.class);
job.setMapperClass(MapperClass.class);
job.setCombinerClass(ReducerClass.class);
job.setPartitionerClass(GoalPartition.class);
job.setReducerClass(ReducerClass.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
I am new to Hadoop. After installing Hadoop 2.2.0, I tried to follow the example at http://www.srccodes.com/p/article/45/run-hadoop-wordcount-mapreduce-example-windows to run a simple MapReduce job.
However, whenever I run the MapReduce job over the txt file I created, I keep getting failures with this message:
c:\hadoop>bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.ja
r wordcount /input output
14/03/26 14:20:48 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0
:8032
14/03/26 14:20:50 INFO input.FileInputFormat: Total input paths to process : 1
14/03/26 14:20:51 INFO mapreduce.JobSubmitter: number of splits:1
14/03/26 14:20:51 INFO Configuration.deprecation: user.name is deprecated. Inste
ad, use mapreduce.job.user.name
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.jar is deprecated. Inst
ead, use mapreduce.job.jar
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.output.value.class is d
eprecated. Instead, use mapreduce.job.output.value.class
14/03/26 14:20:51 INFO Configuration.deprecation: mapreduce.combine.class is dep
recated. Instead, use mapreduce.job.combine.class
14/03/26 14:20:51 INFO Configuration.deprecation: mapreduce.map.class is depreca
ted. Instead, use mapreduce.job.map.class
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.job.name is deprecated.
Instead, use mapreduce.job.name
14/03/26 14:20:51 INFO Configuration.deprecation: mapreduce.reduce.class is depr
ecated. Instead, use mapreduce.job.reduce.class
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.input.dir is deprecated
. Instead, use mapreduce.input.fileinputformat.inputdir
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.output.dir is deprecate
d. Instead, use mapreduce.output.fileoutputformat.outputdir
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.map.tasks is deprecated
. Instead, use mapreduce.job.maps
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.output.key.class is dep
recated. Instead, use mapreduce.job.output.key.class
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.working.dir is deprecat
ed. Instead, use mapreduce.job.working.dir
14/03/26 14:20:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_13
95833928952_0004
14/03/26 14:20:52 INFO impl.YarnClientImpl: Submitted application application_13
95833928952_0004 to ResourceManager at /0.0.0.0:8032
14/03/26 14:20:52 INFO mapreduce.Job: The url to track the job: http://GoncaloPe
reira:8088/proxy/application_1395833928952_0004/
14/03/26 14:20:52 INFO mapreduce.Job: Running job: job_1395833928952_0004
14/03/26 14:21:08 INFO mapreduce.Job: Job job_1395833928952_0004 running in uber
mode : false
14/03/26 14:21:08 INFO mapreduce.Job: map 0% reduce 0%
14/03/26 14:21:20 INFO mapreduce.Job: Task Id : attempt_1395833928952_0004_m_000
000_0, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileS
plit cannot be cast to org.apache.hadoop.mapred.InputSplit
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/03/26 14:21:33 INFO mapreduce.Job: Task Id : attempt_1395833928952_0004_m_000
000_1, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileS
plit cannot be cast to org.apache.hadoop.mapred.InputSplit
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/03/26 14:21:48 INFO mapreduce.Job: Task Id : attempt_1395833928952_0004_m_000
000_2, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileS
plit cannot be cast to org.apache.hadoop.mapred.InputSplit
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/03/26 14:22:04 INFO mapreduce.Job: map 100% reduce 100%
14/03/26 14:22:10 INFO mapreduce.Job: Job job_1395833928952_0004 failed with sta
te FAILED due to: Task failed task_1395833928952_0004_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/03/26 14:22:10 INFO mapreduce.Job: Counters: 6
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=48786
Total time spent by all reduces in occupied slots (ms)=0
Since I followed everything step by step with no issues, I have no idea why this might be. Does anyone know?
Edit: I tried adopting 2.3.0; the same issue happens with the example jar provided, and with the code below, which I tried to compile. I have no idea what the issue is.
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class teste {
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
job.setJarByClass(teste.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
I had the same issue (java.lang.ClassCastException) and was able to solve it by running Hadoop with admin privileges. The problem seems to be the creation of symbolic links, which by default is not possible for non-admin Windows users. Open a console as administrator and then proceed as described in the example from your link.
The link you provided has the input parameter as input, NOT /input... try with this syntax:
C:\hadoop>bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output
If this doesn't work, then see this link and modify the mapper class.
I'm using Hadoop 1.2.1 and for some reason my Word Count output looks strange:
input file:
this is sparta this was sparta hello world goodbye world
output in hdfs:
goodbye 1
hello 1
is 1
sparta 1
sparta 1
this 1
this 1
was 1
world 1
world 1
code:
public class WordCount {
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
job.setJarByClass(WordCount.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
And here's some relevant console output:
14/01/04 16:17:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/01/04 16:17:37 INFO input.FileInputFormat: Total input paths to process : 1
14/01/04 16:17:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/04 16:17:37 WARN snappy.LoadSnappy: Snappy native library not loaded
14/01/04 16:17:38 INFO mapred.JobClient: Running job: job_201401041506_0013
14/01/04 16:17:39 INFO mapred.JobClient: map 0% reduce 0%
14/01/04 16:17:45 INFO mapred.JobClient: map 100% reduce 0%
14/01/04 16:17:52 INFO mapred.JobClient: map 100% reduce 33%
14/01/04 16:17:54 INFO mapred.JobClient: map 100% reduce 100%
14/01/04 16:17:55 INFO mapred.JobClient: Job complete: job_201401041506_0013
14/01/04 16:17:55 INFO mapred.JobClient: Counters: 26
14/01/04 16:17:55 INFO mapred.JobClient: Job Counters
14/01/04 16:17:55 INFO mapred.JobClient: Launched reduce tasks=1
14/01/04 16:17:55 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6007
14/01/04 16:17:55 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/04 16:17:55 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/04 16:17:55 INFO mapred.JobClient: Launched map tasks=1
14/01/04 16:17:55 INFO mapred.JobClient: Data-local map tasks=1
14/01/04 16:17:55 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9167
14/01/04 16:17:55 INFO mapred.JobClient: File Output Format Counters
14/01/04 16:17:55 INFO mapred.JobClient: Bytes Written=77
14/01/04 16:17:55 INFO mapred.JobClient: FileSystemCounters
14/01/04 16:17:55 INFO mapred.JobClient: FILE_BYTES_READ=123
14/01/04 16:17:55 INFO mapred.JobClient: HDFS_BYTES_READ=169
14/01/04 16:17:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=122037
14/01/04 16:17:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=77
14/01/04 16:17:55 INFO mapred.JobClient: File Input Format Counters
14/01/04 16:17:55 INFO mapred.JobClient: Bytes Read=57
14/01/04 16:17:55 INFO mapred.JobClient: Map-Reduce Framework
14/01/04 16:17:55 INFO mapred.JobClient: Map output materialized bytes=123
14/01/04 16:17:55 INFO mapred.JobClient: Map input records=10
14/01/04 16:17:55 INFO mapred.JobClient: Reduce shuffle bytes=123
14/01/04 16:17:55 INFO mapred.JobClient: Spilled Records=20
14/01/04 16:17:55 INFO mapred.JobClient: Map output bytes=97
14/01/04 16:17:55 INFO mapred.JobClient: Total committed heap usage (bytes)=269619200
14/01/04 16:17:55 INFO mapred.JobClient: Combine input records=0
14/01/04 16:17:55 INFO mapred.JobClient: SPLIT_RAW_BYTES=112
14/01/04 16:17:55 INFO mapred.JobClient: Reduce input records=10
14/01/04 16:17:55 INFO mapred.JobClient: Reduce input groups=7
14/01/04 16:17:55 INFO mapred.JobClient: Combine output records=0
14/01/04 16:17:55 INFO mapred.JobClient: Reduce output records=10
14/01/04 16:17:55 INFO mapred.JobClient: Map output records=10
What can cause this? I'm very new to Hadoop, so I'm not sure where to look.
Thanks!
You're using an old API signature. In 1.x+, the reduce method changed to take an Iterable instead of an Iterator (which is what the old 0.x API used, so you will see Iterator in many examples in books and on the web).
http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapreduce/Reducer.html#reduce%28KEYIN,%20java.lang.Iterable,%20org.apache.hadoop.mapreduce.Reducer.Context%29
Try
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
context.write(key, new IntWritable(sum));
}
The @Override annotation tells your compiler to check that your reduce method is overriding the correct method signature in the parent class.
I'm using Hadoop 1.2.1 and Spring Hadoop 1.0.2
I wanted to check the Spring autowiring in a Hadoop Mapper. I wrote this configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:hdp="http://www.springframework.org/schema/hadoop"
xmlns:p="http://www.springframework.org/schema/p"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">
<context:property-placeholder location="configuration.properties"/>
<context:component-scan base-package="it.test"/>
<hdp:configuration id="hadoopConfiguration">
fs.default.name=${hd.fs}
</hdp:configuration>
<hdp:job id="my-job"
mapper="hadoop.mapper.MyMapper"
reducer="hadoop.mapper.MyReducer"
output-path="/root/Scrivania/outputSpring/out"
input-path="/root/Scrivania/outputSpring/in" jar="" />
<hdp:job-runner id="my-job-runner" job-ref="my-job" run-at-startup="true"/>
<hdp:hbase-configuration configuration-ref="hadoopConfiguration" zk-quorum="${hbase.host}" zk-port="${hbase.port}"/>
<bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
<property name="configuration" ref="hbaseConfiguration"/>
</bean>
</beans>
Then I created this Mapper
public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
private static final Log logger = ....
@Autowired
private IHistoricalDataService hbaseService;
private List<HistoricalDataModel> data;
@SuppressWarnings({ "unchecked", "rawtypes" })
@Override
protected void cleanup(org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException {
super.cleanup(context);
}
@SuppressWarnings({ "rawtypes", "unchecked" })
@Override
protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException {
super.setup(context);
try {
data = hbaseService.findAllHistoricalData();
logger.warn("Data "+data);
} catch (Exception e) {
String message = "Errore nel setup del contesto; messaggio errore: "+e.getMessage();
logger.fatal(message, e);
throw new InterruptedException(message);
}
}
@Override
protected void map(LongWritable key, Text value, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException {
// TODO Auto-generated method stub
super.map(key, value, context);
}
}
As you can see, MyMapper does nothing special; the only thing I want to print is the data variable, nothing exceptional.
When I launch it from my IDE (Eclipse Luna) via a JUnit test, all I see printed is:
16:19:11,902 INFO [XmlBeanDefinitionReader] Loading XML bean definitions from class path resource [application-context.xml]
16:19:12,540 INFO [GenericApplicationContext] Refreshing org.springframework.context.support.GenericApplicationContext#150e804: startup date [Mon Dec 02 16:19:12 CET 2013]; root of context hierarchy
16:19:12,693 INFO [PropertySourcesPlaceholderConfigurer] Loading properties file from class path resource [configuration.properties]
16:19:12,722 INFO [DefaultListableBeanFactory] Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory#109f81a: defining beans [org.springframework.context.support.PropertySourcesPlaceholderConfigurer#0,pinfClusteringHistoricalDataDao,historicalDataServiceImpl,clusterAnalysisSvcImpl,org.springframework.context.annotation.internalConfigurationAnnotationProcessor,org.springframework.context.annotation.internalAutowiredAnnotationProcessor,org.springframework.context.annotation.internalRequiredAnnotationProcessor,org.springframework.context.annotation.internalCommonAnnotationProcessor,hadoopConfiguration,clusterAnalysisJob,clusterAnalysisJobRunner,hbaseConfiguration,hbaseTemplate,org.springframework.context.annotation.ConfigurationClassPostProcessor.importAwareProcessor]; root of factory hierarchy
16:19:13,516 INFO [JobRunner] Starting job [clusterAnalysisJob]
16:19:13,568 WARN [NativeCodeLoader] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16:19:13,584 WARN [JobClient] No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
16:19:13,619 INFO [FileInputFormat] Total input paths to process : 0
16:19:13,998 INFO [JobClient] Running job: job_local265750426_0001
16:19:14,065 INFO [LocalJobRunner] Waiting for map tasks
16:19:14,065 INFO [LocalJobRunner] Map task executor complete.
16:19:14,127 INFO [ProcessTree] setsid exited with exit code 0
16:19:14,134 INFO [Task] Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin#b1258d
16:19:14,144 INFO [LocalJobRunner]
16:19:14,148 INFO [Merger] Merging 0 sorted segments
16:19:14,149 INFO [Merger] Down to the last merge-pass, with 0 segments left of total size: 0 bytes
16:19:14,149 INFO [LocalJobRunner]
16:19:14,219 INFO [Task] Task:attempt_local265750426_0001_r_000000_0 is done. And is in the process of commiting
16:19:14,226 INFO [LocalJobRunner]
16:19:14,226 INFO [Task] Task attempt_local265750426_0001_r_000000_0 is allowed to commit now
16:19:14,251 INFO [FileOutputCommitter] Saved output of task 'attempt_local265750426_0001_r_000000_0' to /root/Scrivania/outputSpring/out
16:19:14,254 INFO [LocalJobRunner] reduce > reduce
16:19:14,255 INFO [Task] Task 'attempt_local265750426_0001_r_000000_0' done.
16:19:15,001 INFO [JobClient] map 0% reduce 100%
16:19:15,005 INFO [JobClient] Job complete: job_local265750426_0001
16:19:15,007 INFO [JobClient] Counters: 13
16:19:15,007 INFO [JobClient] File Output Format Counters
16:19:15,007 INFO [JobClient] Bytes Written=0
16:19:15,007 INFO [JobClient] FileSystemCounters
16:19:15,007 INFO [JobClient] FILE_BYTES_READ=22
16:19:15,007 INFO [JobClient] FILE_BYTES_WRITTEN=67630
16:19:15,007 INFO [JobClient] Map-Reduce Framework
16:19:15,008 INFO [JobClient] Reduce input groups=0
16:19:15,008 INFO [JobClient] Combine output records=0
16:19:15,008 INFO [JobClient] Reduce shuffle bytes=0
16:19:15,008 INFO [JobClient] Physical memory (bytes) snapshot=0
16:19:15,008 INFO [JobClient] Reduce output records=0
16:19:15,008 INFO [JobClient] Spilled Records=0
16:19:15,008 INFO [JobClient] CPU time spent (ms)=0
16:19:15,009 INFO [JobClient] Total committed heap usage (bytes)=111935488
16:19:15,009 INFO [JobClient] Virtual memory (bytes) snapshot=0
16:19:15,009 INFO [JobClient] Reduce input records=0
16:19:15,009 INFO [JobRunner] Completed job [clusterAnalysisJob]
16:19:15,028 WARN [SpringHadoopTest] Scrivo............ OOOOOOO
It seems that the job starts but my Mapper is never executed; can anybody suggest where I'm going wrong?
There is no autowiring of mappers or reducers. These classes are loaded by Hadoop so there is no application context associated with them at runtime. The application context is only available as part of the workflow orchestration of the jobs.
I don't know why your setup method isn't logging any messages; are you sure you specified the right class and package for the mapper?
-Thomas
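To make that point about the missing application context concrete (a sketch under assumptions, not code from the answer): since Hadoop instantiates the mapper itself, one common workaround is to bootstrap a Spring context manually inside setup() and look the bean up by type. The sketch below assumes application-context.xml is on the task classpath and that IHistoricalDataService is defined there as a bean.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.springframework.context.support.ClassPathXmlApplicationContext;
// the IHistoricalDataService import is omitted; it comes from the question's own project

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    private IHistoricalDataService hbaseService;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // No Spring context exists in the task JVM, so build one explicitly
        // and fetch the service bean by type instead of relying on @Autowired.
        ClassPathXmlApplicationContext springContext =
                new ClassPathXmlApplicationContext("application-context.xml");
        hbaseService = springContext.getBean(IHistoricalDataService.class);
    }
}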
Is it possible that your input file exists, but is empty? With no input splits, no mapper tasks would ever get created. Just a guess...
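A small sketch of how that guess could be verified (an assumed helper, not part of the original answer): list the job's input directory and print each file's length, so a zero-byte input is obvious before the job is even submitted.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InputSizeCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Print the size of every file under the given input path.
        for (FileStatus status : fs.listStatus(new Path(args[0]))) {
            System.out.println(status.getPath() + " : " + status.getLen() + " bytes");
        }
    }
}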