How to fix this issue that I'm getting while executing map-reduce Java code in Eclipse on Windows 10? - hadoop

I am trying to run a Java program that counts the words in an input file using MapReduce in Hadoop. I am using Windows 10 and the Eclipse IDE.
I get a FileNotFoundException when the reducer starts executing; the mapper completes successfully. Please help me resolve the issue, I have been stuck here for quite a while.
public class CountMax {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                value.set(tokenizer.nextToken());
                context.write(value, new IntWritable(1));
            }
            System.out.println("In mapper");
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public int maxCount = 0;
        public String maxCountWord = "";

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable x : values)
                sum += x.get();
            if (sum > maxCount) {
                maxCountWord = key.toString();
                maxCount = sum;
                System.out.println(sum);
                System.out.println(key);
            }
            System.out.println(maxCountWord);
        }

        public void setup(Context context) throws IOException, InterruptedException {
            System.out.println("in SETUP");
        }

        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new Text(maxCountWord), new IntWritable(maxCount));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(CountMax.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path inp = new Path("C:/input.txt");
        Path out = new Path("C:/output");
        FileInputFormat.addInputPath(job, inp);
        FileOutputFormat.setOutputPath(job, out);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
This is what I'm getting in the console:
2019-10-07 00:07:39,461 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(249)) - Finishing task: attempt_local2096667908_0001_m_000000_0
2019-10-07 00:07:39,461 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2019-10-07 00:07:39,463 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for reduce tasks
2019-10-07 00:07:39,463 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(302)) - Starting task: attempt_local2096667908_0001_r_000000_0
2019-10-07 00:07:39,472 INFO output.FileOutputCommitter (FileOutputCommitter.java:<init>(108)) - File Output Committer Algorithm version is 1
2019-10-07 00:07:39,472 INFO util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(192)) - ProcfsBasedProcessTree currently is supported only on Linux.
2019-10-07 00:07:39,499 INFO mapred.Task (Task.java:initialize(614)) - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@7734a2ef
2019-10-07 00:07:39,501 INFO mapred.ReduceTask (ReduceTask.java:run(362)) - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@18daa432
2019-10-07 00:07:39,509 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:<init>(205)) - MergerManager: memoryLimit=1314232704, maxSingleShuffleLimit=328558176, mergeThreshold=867393600, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2019-10-07 00:07:39,510 INFO reduce.EventFetcher (EventFetcher.java:run(61)) - attempt_local2096667908_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2019-10-07 00:07:39,528 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - reduce task executor complete.
2019-10-07 00:07:39,532 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local2096667908_0001
java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#1
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.FileNotFoundException: C:/tmp/hadoop-SahilJ%20PC/mapred/local/localRunner/SahilJ%20PC/jobcache/job_local2096667908_0001/attempt_local2096667908_0001_m_000000_0/output/file.out.index
at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:200)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768)
at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:155)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:71)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:57)
at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.copyMapOutput(LocalFetcher.java:124)
at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.doCopy(LocalFetcher.java:102)
at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.run(LocalFetcher.java:85)

Just making a new user account without any space in the name, giving it all the admin rights, and doing the whole setup again under the new account worked and solved the issue: the space in the original user name is URL-encoded as %20 in the local job directory (C:/tmp/hadoop-SahilJ%20PC/...), which is why the reducer cannot find the map output file.
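Before (or instead of) recreating the account, you can confirm that the space in the Windows user name is what ends up in the local runner's scratch path. A minimal diagnostic sketch, assuming the standard Hadoop 2.x property names (the exact directory layout underneath them is an implementation detail of the LocalJobRunner):
Configuration conf = new Configuration();
// hadoop.tmp.dir defaults to /tmp/hadoop-${user.name};
// mapreduce.cluster.local.dir defaults to ${hadoop.tmp.dir}/mapred/local
System.out.println("user.name = " + System.getProperty("user.name"));
System.out.println("hadoop.tmp.dir = " + conf.get("hadoop.tmp.dir"));
System.out.println("mapreduce.cluster.local.dir = " + conf.get("mapreduce.cluster.local.dir"));
If either value contains a space (for example hadoop-SahilJ PC), it will show up URL-encoded as %20 in the path from the stack trace.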

Related

Hadoop mapreduce - mapping NullPointerException

I need to write a simple map-reduce program that, given as input a directed graph represented as a list of edges, produces the same graph where each edge (x,y) with x > y is replaced by (y,x), with no repeated edges in the output graph.
INPUT
1;3
2;1
0;1
3;1
2;0
1;1
2;1
OUTPUT
1;3
1;2
0;1
0;2
1;1
This is the code:
public class ExamGraph {

    // mapper class
    public static class MyMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            value = new Text(value.toString());
            String[] campi = value.toString().split(";");
            if (Integer.getInteger(campi[0]) > Integer.getInteger(campi[1]))
                context.write(new Text(campi[1] + ";" + campi[0]), NullWritable.get());
            else
                context.write(new Text(campi[0] + ";" + campi[1]), NullWritable.get());
        }
    }

    // reducer class
    public static class MyReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        // create new job
        Job job = Job.getInstance(new Configuration());
        // job is based on jar containing this class
        job.setJarByClass(ExamGraph.class);
        // for logging purposes
        job.setJobName("ExamGraph");
        // set input path in HDFS
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // set output path in HDFS (destination must not exist)
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // set mapper and reducer classes
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        // An InputFormat for plain text files.
        // Files are broken into lines. Either linefeed or carriage-return are used
        // to signal end of line. Keys are the position in the file, and values
        // are the line of text.
        job.setInputFormatClass(TextInputFormat.class);
        // set type of output keys and values for both mappers and reducers
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        // start job
        job.waitForCompletion(true);
    }
}
When I run the jar file using:
hadoop jar path/jar JOBNAME /inputlocation /outputlocation
I get this error:
18/05/22 02:13:11 INFO mapreduce.Job: Task Id : attempt_1526979627085_0001_m_000000_1, Status : FAILED
Error: java.lang.NullPointerException
at ExamGraph$MyMapper.map(ExamGraph.java:38)
at ExamGraph$MyMapper.map(ExamGraph.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
But I could not find the error in the code.
Found the problem: I had confused Integer.getInteger() with Integer.parseInt() in the mapper. getInteger() looks up a system property and returns null when it is not defined, so unboxing it in the comparison throws the NullPointerException; parseInt() is what actually parses the string.
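For reference, a minimal sketch of the corrected check inside map() (only the parsing changes; everything else stays as in the question):
// Integer.parseInt parses the two endpoints from the "x;y" line;
// Integer.getInteger would instead look up a system property of that name and return null.
String[] campi = value.toString().split(";");
int x = Integer.parseInt(campi[0]);
int y = Integer.parseInt(campi[1]);
if (x > y)
    context.write(new Text(y + ";" + x), NullWritable.get());
else
    context.write(new Text(x + ";" + y), NullWritable.get());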

not able to run wordcount example in mapreduce 2.x

I am trying to execute the MapReduce word count example in MapReduce 2.x in Java. I created the jar, but while executing it I get an error saying the WordMapper class is not found, even though it is declared in my package. Please help me solve the issue.
This is my WordCount driver code:
package com.mapreduce2.x;
public class WordCount {
    public static void main(String args[]) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        org.apache.hadoop.mapreduce.Job job = new org.apache.hadoop.mapreduce.Job(conf, "Word_Count");
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(WordReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(job, new Path(args[0]));
        org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
And this is my WordMapper class:
public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, org.apache.hadoop.mapreduce.Reducer.Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
WordReducer code -
public class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        while (values.hasNext()) {
            sum = sum + values.next().get();
        }
        context.write(key, new IntWritable(sum));
    }
}
It shows the following error while executing:
15/05/29 10:12:26 INFO mapreduce.Job: map 0% reduce 0%
15/05/29 10:12:33 INFO mapreduce.Job: Task Id : attempt_1432876892622_0005_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.mapreduce2.x.WordMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2076)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:742)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.mapreduce2.x.WordMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1982)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
... 8 more
Include the main class name while running the JAR file, or specify the main class name while creating the JAR file.
If you are running a JAR built without a main class, specify the class name on the command line.
Use the command
hadoop jar word.jar com.mapreduce2.x.WordCount /input /output
Here word.jar is the JAR file name and com.mapreduce2.x.WordCount is the driver class that contains main.
OR
you can also include the main class name while creating the jar file.
Steps:
File --> Export --> JAR --> choose the location --> click Next --> it asks for the main class --> select the class and click OK.
After that you can run the jar file with the command
hadoop jar word.jar /input /output
Hope this solves your issue.
Try adding job.setJarByClass right after creating the job (it is missing from your driver):
Job job = new Job(conf, "wordcount");
job.setJarByClass(WordCount.class);
It worked for me. setJarByClass tells Hadoop which jar to ship to the task containers, which is why the mapper class can then be found at runtime.
You can try this one (in Linux/Unix):
1. Remove the package name in the Java code.
2. Inside the directory which contains the Java program, create a new directory called classes. Example: Hadoop-Wordcount -> classes, WordCount.java
3. Compile: javac -classpath $HADOOP_HOME/hadoop-common-2.7.1.jar:$HADOOP_HOME/hadoop-mapreduce-client-core-2.7.1.jar:$HADOOP_HOME/hadoop-annotations-2.7.1.jar:$HADOOP_HOME/commons-cli-1.2.jar -d ./classes WordCount.java
4. Create a jar: jar -cvf wordcount.jar -C ./classes/ .
5. Run: bin/hadoop jar $HADOOP_HOME/Hadoop-WordCount/wordcount.jar WordCount input output

Hadoop Exception in thread "main" java.io.FileNotFoundException: hadoop-mapreduce-client-core-2.6.0.jar even though that file exists

I am working with hadoop-2.6.0 and hbase-0.98.9. While running the Hadoop job, it throws java.io.FileNotFoundException even though the file exists and is on the classpath, yet it is still looked up under an hdfs:// path. What could be the problem? I did check here, but that question is about third-party jars; in my case the jar is on the classpath. Here is the error.
15/05/23 02:08:39 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x6737ca, quorum=localhost:2181, baseZNode=/hbase
15/05/23 02:08:39 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
15/05/23 02:08:39 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
15/05/23 02:08:39 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14d7fd352eb000c, negotiated timeout = 40000
15/05/23 02:08:40 INFO mapreduce.TableOutputFormat: Created table instance for Energy
15/05/23 02:08:40 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/05/23 02:08:40 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/home/vijaykumar/hadoop/hadoop_tmpdir/mapred/staging/vijaykumar1706101359/.staging/job_local1706101359_0001
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/home/vijaykumar/hadoop/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at habseWrite.run(habseWrite.java:142)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at habseWrite.main(habseWrite.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Mapper class
public static class WriteMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
    IntWritable k = new IntWritable();
    Text res = new Text();

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String val = value.toString();
        int row = val.hashCode();
        k.set(row);
        res.set(val);
        context.write(k, res);
    }
}
Reducer code
public static class WriteReducer extends TableReducer<IntWritable, Text, Text> {
    public static final byte[] area = "Area".getBytes();
    public static final byte[] prop = "Property".getBytes();
    private Text rowkey = new Text();
    private int rowCount = 0;

    public void reduce(IntWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        String X1 = "", X2 = "", X3 = "", X4 = "", X5 = "",
               X6 = "", X7 = "", X8 = "", Y1 = "", Y2 = "";
        for (Text val : values) {
            String[] v = val.toString().split("\t");
            X1 = v[0];
            X2 = v[1];
            X3 = v[2];
            X4 = v[3];
            X5 = v[4];
            X6 = v[5];
            X7 = v[6];
            X8 = v[7];
            Y1 = v[8];
            Y2 = v[9];
        }
        String k = "row" + rowCount;
        Put put = new Put(Bytes.toBytes(k.toString()));
        put.add(area, "X1".getBytes(), Bytes.toBytes(X1));
        put.add(area, "X5".getBytes(), Bytes.toBytes(X5));
        put.add(area, "X6".getBytes(), Bytes.toBytes(X6));
        put.add(area, "Y1".getBytes(), Bytes.toBytes(Y1));
        put.add(area, "Y2".getBytes(), Bytes.toBytes(Y2));
        put.add(prop, "X2".getBytes(), Bytes.toBytes(X2));
        put.add(prop, "X3".getBytes(), Bytes.toBytes(X3));
        put.add(prop, "X4".getBytes(), Bytes.toBytes(X4));
        put.add(prop, "X7".getBytes(), Bytes.toBytes(X7));
        put.add(prop, "X8".getBytes(), Bytes.toBytes(X8));
        rowCount++;
        rowkey.set(k);
        context.write(rowkey, put);
    }
}
And the main method:
public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new habseWrite(), args);
    System.exit(res);
}

public int run(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = HBaseConfiguration.create();
    String inputPath = args[0];
    Job job = new Job(conf, "HBase_write");
    job.setInputFormatClass(TextInputFormat.class);
    job.setJarByClass(habseWrite.class);
    job.setMapperClass(habseWrite.WriteMapper.class);
    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(Text.class);
    TableMapReduceUtil.initTableReducerJob(
            "Energy",            // output table
            WriteReducer.class,  // reducer class
            job);
    job.setNumReduceTasks(1);
    FileInputFormat.setInputPaths(job, inputPath);
    return (job.waitForCompletion(true) ? 0 : 1);
}

Avro AvroMultipleOutputs part-r-00000: File is not open for writing Exception

I have written a MapReduce job with Avro 1.7.4 on Hadoop 2.3.0. In the first step I wrote all the Avro results to an AvroSequenceFile. Everything worked well without problems.
Then I tried to use the AvroMultipleOutputs class to write the results to different files. I wrote the same MapReduce job without using Avro and it had no problem writing the data to two separate files (btw, part-r-00000 was created in HDFS but left empty).
The Avro variant throws exceptions as soon as I write data in the reducer. (If I comment out the lines that write the data out, I get no exceptions.)
Here is the job configuration:
this.conf = new Configuration();
Job job = Job.getInstance(conf);
job.setJarByClass(ArchiveDataProcessorMR.class);
Path inPath = new Path(props.getProperty("inPath").trim());
Path outPath = new Path(props.getProperty("outPath").trim());
Path outPathMeta = new Path(props.getProperty("outPath.meta").trim());
Path outPathPayload = new Path(props.getProperty("outPath.payload").trim());
// cleanup resources from previous run
FileSystem fs = FileSystem.get(conf);
fs.delete(outPath, true);
FileInputFormat.setInputPaths(job, inPath);
FileOutputFormat.setOutputPath(job, outPath);
AvroSequenceFileInputFormat<DataExportKey,DataExportValue> sequenceInputFormat = new AvroSequenceFileInputFormat<DataExportKey,DataExportValue>();
job.setInputFormatClass(sequenceInputFormat.getClass());
AvroJob.setInputValueSchema(job, DataExportValue.getClassSchema());
AvroJob.setInputKeySchema(job, DataExportKey.getClassSchema());
AvroJob.setMapOutputValueSchema(job, DataExportValue.getClassSchema());
AvroJob.setMapOutputKeySchema(job, DataExportKey.getClassSchema());
job.setMapperClass(ArchiveDataMapper.class);
job.setReducerClass(ArchiveDataReducer.class);
AvroSequenceFileOutputFormat<DataExportKey,DataExportValue> sequenceOutputFormat = new AvroSequenceFileOutputFormat<DataExportKey,DataExportValue>();
AvroJob.setOutputKeySchema(job, DataExportKey.getClassSchema());
AvroJob.setOutputValueSchema(job, DataExportValue.getClassSchema());
job.setOutputFormatClass(AvroSequenceFileOutputFormat.class);
AvroMultipleOutputs.addNamedOutput(job, "meta", sequenceOutputFormat.getClass(), DataExportKey.getClassSchema(), DataExportValue.getClassSchema());
AvroMultipleOutputs.addNamedOutput(job, "payload", sequenceOutputFormat.getClass(), DataExportKey.getClassSchema(), DataExportValue.getClassSchema());
The reducer code (without the business logic) looks like this:
public static class ArchiveDataReducer extends Reducer<AvroKey<DataExportKey>, AvroValue<DataExportValue>, AvroKey<DataExportKey>, AvroValue<DataExportValue>> {

    private AvroMultipleOutputs amos;

    public void setup(Context context) throws IOException, InterruptedException {
        this.amos = new AvroMultipleOutputs(context);
    }

    public void cleanup(Context context) throws IOException, InterruptedException {
        this.amos.close();
    }

    /**
     * @param key
     */
    public void reduce(AvroKey<DataExportKey> key, Iterable<AvroValue<DataExportValue>> xmlIter, Context context) throws IOException, InterruptedException {
        try {
            DataExportValue newValue = new DataExportValue();
            if (key.datum()......) {
                ... snip...
                amos.write("meta", key, new AvroValue<DataExportValue>(newValue));
            } else {
                ... snip...
                amos.write("payload", key, new AvroValue<DataExportValue>(newValue));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
} // class ArchiveDataReducer
The exception message is
14/05/04 06:52:58 INFO mapreduce.Job: map 100% reduce 0%
14/05/04 06:53:09 INFO mapreduce.Job: map 100% reduce 91%
14/05/04 06:53:09 INFO mapreduce.Job: Task Id : attempt_1399104292130_0016_r_000000_1, Status : FAILED
Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /applications/wd/data_avro/_temporary/1/_temporary/attempt_1399104292130_0016_r_000000_1/part-r-00000: File is not open for writing. Holder DFSClient_attempt_1399104292130_0016_r_000000_1_338983539_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2856)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2667)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2573)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:563)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:407)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
at org.apache.hadoop.ipc.Client.call(Client.java:1406)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:348)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1264)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1112)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:522)
Any hints on how to solve this problem? Does anyone have another example with AvroMultipleOutputs running? I would like to see your code.
Kind regards,
Martin

(Hadoop) MapReduce - Chain jobs - JobControl doesn't stop

I need to chain two MapReduce jobs. I used JobControl to set job2 as dependent on job1.
It works and the output files are created, but it doesn't stop!
In the shell it remains in this state:
12/09/11 19:06:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/09/11 19:06:25 INFO input.FileInputFormat: Total input paths to process : 1
12/09/11 19:06:25 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/09/11 19:06:25 WARN snappy.LoadSnappy: Snappy native library not loaded
12/09/11 19:07:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/09/11 19:07:00 INFO input.FileInputFormat: Total input paths to process : 1
How can I stop it?
This is my main method:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Configuration conf2 = new Configuration();

    Job job1 = new Job(conf, "canzoni");
    job1.setJarByClass(CanzoniOrdinate.class);
    job1.setMapperClass(CanzoniMapper.class);
    job1.setReducerClass(CanzoniReducer.class);
    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(IntWritable.class);
    ControlledJob cJob1 = new ControlledJob(conf);
    cJob1.setJob(job1);
    FileInputFormat.addInputPath(job1, new Path(args[0]));
    FileOutputFormat.setOutputPath(job1, new Path("/user/hduser/tmp"));

    Job job2 = new Job(conf2, "songsort");
    job2.setJarByClass(CanzoniOrdinate.class);
    job2.setMapperClass(CanzoniSorterMapper.class);
    job2.setSortComparatorClass(ReverseOrder.class);
    job2.setInputFormatClass(KeyValueTextInputFormat.class);
    job2.setReducerClass(CanzoniSorterReducer.class);
    job2.setMapOutputKeyClass(IntWritable.class);
    job2.setMapOutputValueClass(Text.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(IntWritable.class);
    ControlledJob cJob2 = new ControlledJob(conf2);
    cJob2.setJob(job2);
    FileInputFormat.addInputPath(job2, new Path("/user/hduser/tmp/part*"));
    FileOutputFormat.setOutputPath(job2, new Path(args[1]));

    JobControl jobctrl = new JobControl("jobctrl");
    jobctrl.addJob(cJob1);
    jobctrl.addJob(cJob2);
    cJob2.addDependingJob(cJob1);
    jobctrl.run();

    ////////////////
    // NEW CODE ///
    //////////////
    // delete jobctrl.run();
    Thread t = new Thread(jobctrl);
    t.start();

    String oldStatusJ1 = null;
    String oldStatusJ2 = null;
    while (!jobctrl.allFinished()) {
        String status = cJob1.toString();
        String status2 = cJob2.toString();
        if (!status.equals(oldStatusJ1)) {
            System.out.println(status);
            oldStatusJ1 = status;
        }
        if (!status2.equals(oldStatusJ2)) {
            System.out.println(status2);
            oldStatusJ2 = status2;
        }
    }
    System.exit(0);
}
}
I essentially did what Pietro alluded to above.
public class JobRunner implements Runnable {
    private JobControl control;

    public JobRunner(JobControl _control) {
        this.control = _control;
    }

    public void run() {
        this.control.run();
    }
}
and in my map/reduce class I have:
public void handleRun(JobControl control) throws InterruptedException {
    JobRunner runner = new JobRunner(control);
    Thread t = new Thread(runner);
    t.start();
    while (!control.allFinished()) {
        System.out.println("Still running...");
        Thread.sleep(5000);
    }
}
to which I just pass the JobControl object.
The JobControl object itself is Runnable, so you can just use it like this:
new Thread(myJobControlInstance).start()
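A minimal sketch of how that thread is typically used, with the wait loop and stop() call borrowed from the other answers on this page rather than part of this answer:
Thread jcThread = new Thread(myJobControlInstance);
jcThread.start();
// Poll until every controlled job has finished, then stop JobControl's own polling loop.
// (Assumes the enclosing method declares throws InterruptedException.)
while (!myJobControlInstance.allFinished()) {
    Thread.sleep(1000);
}
myJobControlInstance.stop();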
Just a tweak to the code snippet that sinemetu1 shared:
you can drop the JobRunner, since JobControl itself implements Runnable:
Thread thread = new Thread(jobControl);
thread.start();
while (!jobControl.allFinished()) {
    System.out.println("Still running...");
    Thread.sleep(5000);
}
I also stumbled upon this link, where a user confirms that JobControl can be run only in a new thread:
https://www.mail-archive.com/common-user@hadoop.apache.org/msg00556.html
Try this:
Thread jcThread = new Thread(jobControl);
jcThread.start();
System.out.println("Polling jobControl state in a loop >>>>>>>>>>>>>>>>");
while (true) {
    if (jobControl.allFinished()) {
        System.out.println("====>> jobControl.allFinished=" + jobControl.getSuccessfulJobList());
        jobControl.stop();
        // without a break or return here, the loop would run forever
        break;
    }
    if (jobControl.getFailedJobList().size() > 0) {
        succ = 0;
        System.out.println("====>> jobControl.getFailedJobList=" + jobControl.getFailedJobList());
        jobControl.stop();
        // without a break or return here, the loop would run forever
        break;
    }
}
