How to use Spring Batch to read CSV files that contain multiple lines in one cell? - spring

Raw CSV is like this:
First line: Name, StudentID, comment
Data:
Name, StudentId, Comment
Jake, 12312, poor
Emma, 12324, good
Mary, 13214, need more work on programming
and math.
The comment cell of the last entry in the CSV data spans two lines. I want to treat it as a single record.
When I read the file using a FlatFileItemReader, it throws an error along the lines of "expected token 3 but actual 1". I guess it treats the second line as a new record.
Is there a way to treat them as one line?

Have your reader just return the raw string for each line without trying to split on the delimiter. Make a processor (has to be stateful) to handle the parsing. The only tricky part is you'll have to signal to the processor when you've reached the EOF somehow so it isn't waiting to see if it should aggregate the next line. Something like this:
public class AggregatingItemProcessor<T> implements ItemProcessor<T, T>, InitializingBean {
private BiPredicate<T, T> aggregatePredicate;
private BiFunction<T, T, T> aggregator;
public void setAggregatePredicate(BiPredicate<T, T> aggregatePredicate) {
this.aggregatePredicate = aggregatePredicate;
}
public void setAggregator(BiFunction<T, T, T> aggregator) {
this.aggregator = aggregator;
}
private T cur;
@Override
public T process(T item) throws Exception {
if(cur == null) {
cur = item;
return null;
}
if(aggregatePredicate.test(cur, item)) {
cur = aggregator.apply(cur, item);
return null;
} else {
T toRet = cur;
cur = item;
return toRet;
}
}
@Override
public void afterPropertiesSet() throws Exception {
Assert.notNull(aggregatePredicate, "Predicate to determine if records should be aggregated must not be null.");
Assert.notNull(aggregator, "Function for aggregating items must not be null.");
}
}
Then the config...
static final String EOF_MARKER = "\0";
@Bean
public FlatFileItemReader<String> reader() {
final FlatFileItemReader<String> reader = new FlatFileItemReader<String>() {
private boolean finished = false;
@Override
public String read() throws Exception, UnexpectedInputException, ParseException {
if(finished) return null;
String next = super.read();
if(next == null) {
finished = true;
return EOF_MARKER;
}
return next;
}
};
reader.setLineMapper((s, i) -> s);
return reader;
}
@Bean
public AggregatingItemProcessor<String> processor() {
final AggregatingItemProcessor<String> processor = new AggregatingItemProcessor<>();
processor.setAggregatePredicate((s1, s2) -> !EOF_MARKER.equals(s2) && StringUtils.countOccurrencesOf(s2, ",") < 2);
processor.setAggregator(String::concat);
return processor;
}
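For completeness, here is a rough sketch of how the two beans above could be wired into a chunk-oriented step. The step name, chunk size and the pass-through writer are assumptions for illustration only; replace the writer with whatever maps the aggregated line to your Student object.
@Bean
public Step aggregateStep(StepBuilderFactory stepBuilderFactory) {
    // The EOF marker never reaches the writer: the processor swallows it and
    // only emits fully aggregated lines.
    return stepBuilderFactory.get("aggregateStep")
            .<String, String>chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(items -> items.forEach(System.out::println)) // placeholder writer
            .build();
}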

Related

I want to write a Hadoop MapReduce Join in Java

I'm completely new to the Hadoop framework and I want to write a "MapReduce" program (HadoopJoin.java) that joins on the x attribute between two tables R and S. The structure of the two tables is:
R (tag : char, x : int, y : varchar(30))
and
S (tag : char, x : int, z : varchar(30))
For example, for table R we have:
r 10 r-10-0
r 11 r-11-0
r 12 r-12-0
r 21 r-21-0
And for table S:
s 11 s-11-0
s 21 s-41-0
s 21 s-41-1
s 12 s-31-0
s 11 s-31-1
The result should look like:
r 11 r-11-0 s 11 s-11-0
etc.
Can anyone help me, please?
It is very difficult to describe a join in MapReduce for someone who is new to the framework, but here I provide a working implementation for your situation, and I definitely recommend reading section 9 of Hadoop: The Definitive Guide, 4th edition, which describes how to implement a join in MapReduce very well.
First of all, you might consider using higher-level frameworks such as Pig, Hive, or Spark, because they provide join operations as a core part of their implementation.
Secondly, there are many ways to implement a join in MapReduce depending on the nature of your data, including map-side joins and reduce-side joins. In this answer I have implemented a reduce-side join:
Implementation:
First of all, we need two different mappers for the two different datasets. Note that in your case the same mapper could be used for both datasets, but in many situations you need different mappers for different datasets, so I have defined two mappers to make this solution more general.
I have used a TextPair that has two attributes: one is the key used to join the data, and the other is a tag that specifies which dataset the record belongs to. If it belongs to the first dataset the tag will be 0, otherwise it will be 1.
I have implemented TextPair.FirstComparator to ensure that, for each join key, the record from the first dataset is the first one received by the reducer, and all records from the second dataset with that key are received after it. This line of code will do the trick for us:
job.setGroupingComparatorClass(TextPair.FirstComparator.class);
So in the reducer the first record we receive is the record from dataset 1, and after that we receive the records from dataset 2. The only thing left to do is write those records out.
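For example, for join key 11 the reducer first receives the R record ("r 11 r-11-0") and then each S record with that key ("s 11 s-11-0", "s 11 s-31-1"), so pairing the first value with every following value produces joined rows like the one shown in the question.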
Mapper for dataset1:
public class JoinDataSet1Mapper
extends Mapper<LongWritable, Text, TextPair, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] data = value.toString().split(" ");
context.write(new TextPair(data[1], "0"), value);
}
}
Mapper for DataSet2:
public class JoinDataSet2Mapper
extends Mapper<LongWritable, Text, TextPair, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] data = value.toString().split(" ");
context.write(new TextPair(data[1], "1"), value);
}
}
Reducer:
public class JoinReducer extends Reducer<TextPair, Text, NullWritable, Text> {
public static class KeyPartitioner extends Partitioner<TextPair, Text> {
@Override
public int getPartition(TextPair key, Text value, int numPartitions) {
return (key.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
}
}
@Override
protected void reduce(TextPair key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
Iterator<Text> iter = values.iterator();
Text stationName = new Text(iter.next());
while (iter.hasNext()) {
Text record = iter.next();
Text outValue = new Text(stationName.toString() + "\t" + record.toString());
context.write(NullWritable.get(), outValue);
}
}
}
Custom key:
public class TextPair implements WritableComparable<TextPair> {
private Text first;
private Text second;
public TextPair() {
set(new Text(), new Text());
}
public TextPair(String first, String second) {
set(new Text(first), new Text(second));
}
public TextPair(Text first, Text second) {
set(first, second);
}
public void set(Text first, Text second) {
this.first = first;
this.second = second;
}
public Text getFirst() {
return first;
}
public Text getSecond() {
return second;
}
@Override
public void write(DataOutput out) throws IOException {
first.write(out);
second.write(out);
}
@Override
public void readFields(DataInput in) throws IOException {
first.readFields(in);
second.readFields(in);
}
@Override
public int hashCode() {
return first.hashCode() * 163 + second.hashCode();
}
@Override
public boolean equals(Object o) {
if (o instanceof TextPair) {
TextPair tp = (TextPair) o;
return first.equals(tp.first) && second.equals(tp.second);
}
return false;
}
@Override
public String toString() {
return first + "\t" + second;
}
@Override
public int compareTo(TextPair tp) {
int cmp = first.compareTo(tp.first);
if (cmp != 0) {
return cmp;
}
return second.compareTo(tp.second);
}
public static class FirstComparator extends WritableComparator {
private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
public FirstComparator() {
super(TextPair.class);
}
@Override
public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
try {
int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
} catch (IOException e) {
throw new IllegalArgumentException(e);
}
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
if (a instanceof TextPair && b instanceof TextPair) {
return ((TextPair) a).first.compareTo(((TextPair) b).first);
}
return super.compare(a, b);
}
}
}
JobDriver:
public class JoinJob extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
Job job = Job.getInstance(getConf(), "Join two DataSet");
job.setJarByClass(getClass());
Path ncdcInputPath = new Path(getConf().get("job.input1.path"));
Path stationInputPath = new Path(getConf().get("job.input2.path"));
Path outputPath = new Path(getConf().get("job.output.path"));
MultipleInputs.addInputPath(job, ncdcInputPath,
TextInputFormat.class, JoinDataSet1Mapper.class);
MultipleInputs.addInputPath(job, stationInputPath,
TextInputFormat.class, JoinDataSet2Mapper.class);
FileOutputFormat.setOutputPath(job, outputPath);
job.setPartitionerClass(JoinReducer.KeyPartitioner.class);
job.setGroupingComparatorClass(TextPair.FirstComparator.class);
job.setMapOutputKeyClass(TextPair.class);
job.setReducerClass(JoinReducer.class);
job.setOutputKeyClass(Text.class);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new JoinJob(), args);
System.exit(exitCode);
}
}
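Since the driver runs through ToolRunner and reads its three paths from the configuration, you can supply them as generic options when launching the packaged jar, e.g. -D job.input1.path=..., -D job.input2.path=... and -D job.output.path=... on the hadoop jar command line (the actual jar name and paths are up to you).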

Spring batch read file one by one. File content is not constant

MultiResourceItemReader reads all files sequentially.
I want that once one file has been read completely, the processor/writer should be called; it should not read the next file.
Since the file content is not constant, I can't go with a fixed chunk size.
Any idea on a chunk completion policy to decide the end of the file content?
I think you should write a step which reads/processes/writes only one file with a "single file item reader" (like FlatFileItemReader), and repeat the step while there are files remaining.
Spring Batch gives you a feature to do so: conditional flows, and in particular the programmatic flow decision, which gives you a smart way to decide when to stop looping between steps (when there are no files left).
And since you will not be able to give a constant input file name to your reader, you should also have a look at the late binding section.
Hope this will be enough to help you. Please, make comments if you need more details.
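A rough sketch of what such a flow decision could look like (the directory being scanned and the status name "CONTINUE" are assumptions for illustration, not something prescribed by Spring Batch):
public class MoreFilesDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // Assumed convention: the single-file step moves each processed file out of
        // this directory, so an empty directory means the loop can stop.
        File[] remaining = new File("/in/files").listFiles();
        return (remaining != null && remaining.length > 0)
                ? new FlowExecutionStatus("CONTINUE")
                : FlowExecutionStatus.COMPLETED;
    }
}
The job flow then loops back to the single-file step on "CONTINUE" and ends otherwise.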
Use a MultiResourceItemReader and assign it the multiple file resources.
Use a custom file reader as the delegate, which reads one file completely.
The logic for reading a file completely looks like this:
@Bean
public MultiResourceItemReader<SimpleFileBean> simpleReader()
{
Resource[] resourceList = getFileResources();
if(resourceList == null) {
System.out.println("No input files available");
}
MultiResourceItemReader<SimpleFileBean> resourceItemReader = new MultiResourceItemReader<SimpleFileBean>();
resourceItemReader.setResources(resourceList);
resourceItemReader.setDelegate(simpleFileReader());
return resourceItemReader;
}
@Bean
SimpleInboundReader simpleFileReader() {
return new SimpleInboundReader(customSimpleFileReader());
}
@Bean
public FlatFileItemReader customSimpleFileReader() {
return new FlatFileItemReaderBuilder()
.name("customFileItemReader")
.lineMapper(new PassThroughLineMapper())
.build();
}
public class SimpleInboundReader implements ResourceAwareItemReaderItemStream<SimpleFileBean> {
private Object currentItem = null;
private String fileName = null;
private boolean fileRead = false;
private ResourceAwareItemReaderItemStream<String> delegate;
public SimpleInboundReader(ResourceAwareItemReaderItemStream<String> delegate) {
this.delegate = delegate;
}
@Override
public void open(ExecutionContext executionContext) throws ItemStreamException {
delegate.open(executionContext);
}
@Override
public void update(ExecutionContext executionContext) throws ItemStreamException {
delegate.update(executionContext);
}
@Override
public void close() throws ItemStreamException {
delegate.close();
}
@Override
public void setResource(Resource resource) {
fileName = resource.getFilename();
this.delegate.setResource(resource);
}
String getNextLine() throws UnexpectedInputException, ParseException, NonTransientResourceException, Exception {
return delegate.read();
}
@Override
public SimpleFileBean read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
SimpleFileBean simpleFileBean = null;
String currentLine = null;
currentLine=delegate.read();
if(currentLine != null) {
simpleFileBean = new SimpleFileBean();
simpleFileBean.getLines().add(currentLine);
while ((currentLine = getNextLine()) != null) {
simpleFileBean.getLines().add(currentLine);
}
return simpleFileBean;
}
return null;
}
}
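SimpleFileBean itself is not shown in the answer; a minimal version that satisfies the calls made on it above could be as simple as:
public class SimpleFileBean {

    // Accumulates every line of one file; SimpleInboundReader.read() fills this list.
    private final List<String> lines = new ArrayList<>();

    public List<String> getLines() {
        return lines;
    }
}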

How to skip even lines of a Stream<String> obtained from the Files.lines

In this case just odd lines have meaningful data and there is no character that uniquely identifies those lines. My intention is to get something equivalent to the following example:
Stream<DomainObject> res = Files.lines(src)
.filter(line -> isOddLine())
.map(line -> toDomainObject(line))
Is there any “clean” way to do it, without sharing global state?
No, there's no way to do this conveniently with the API. (Basically the same reason as to why there is no easy way of having a zipWithIndex, see Is there a concise way to iterate over a stream with indices in Java 8?).
You can still use Stream, but go for an iterator:
Iterator<String> iter = Files.lines(src).iterator();
while (iter.hasNext()) {
iter.next(); // discard
toDomainObject(iter.next()); // use
}
(You might want to use try-with-resources on that stream, though.)
A clean way is to go one level deeper and implement a Spliterator. On this level you can control the iteration over the stream elements and simply iterate over two items whenever the downstream requests one item:
public class OddLines<T> extends Spliterators.AbstractSpliterator<T>
implements Consumer<T> {
public static <T> Stream<T> oddLines(Stream<T> source) {
return StreamSupport.stream(new OddLines(source.spliterator()), false);
}
private static long odd(long l) { return l==Long.MAX_VALUE? l: (l+1)/2; }
Spliterator<T> originalLines;
OddLines(Spliterator<T> source) {
super(odd(source.estimateSize()), source.characteristics());
originalLines=source;
}
@Override
public boolean tryAdvance(Consumer<? super T> action) {
if(originalLines==null || !originalLines.tryAdvance(action))
return false;
if(!originalLines.tryAdvance(this)) originalLines=null;
return true;
}
@Override
public void accept(T t) {}
}
Then you can use it like
Stream<DomainObject> res = OddLines.oddLines(Files.lines(src))
.map(line -> toDomainObject(line));
This solution has no side effects and retains most advantages of the Stream API, like lazy evaluation. However, it should be clear that it doesn't have useful semantics for unordered stream processing (beware of subtle aspects like using forEachOrdered rather than forEach when performing a terminal action on all elements), and while it supports parallel processing in principle, it's unlikely to be very efficient.
As aioobe said, there isn't a convenient way to do this, but there are several inconvenient ways. :-)
Here's another spliterator-based approach. Unlike Holger's, which wraps another spliterator, this one does the I/O itself. This gives greater control over things like ordering, but it also means that it has to deal with IOException and close handling. I also threw in a Predicate parameter that lets you get a crack at which lines get passed through.
static class LineSpliterator extends Spliterators.AbstractSpliterator<String>
implements AutoCloseable {
final BufferedReader br;
final LongPredicate pred;
long count = 0L;
public LineSpliterator(Path path, LongPredicate pred) throws IOException {
super(Long.MAX_VALUE, Spliterator.ORDERED);
br = Files.newBufferedReader(path);
this.pred = pred;
}
@Override
public boolean tryAdvance(Consumer<? super String> action) {
try {
String s;
while ((s = br.readLine()) != null) {
if (pred.test(++count)) {
action.accept(s);
return true;
}
}
return false;
} catch (IOException ioe) {
throw new UncheckedIOException(ioe);
}
}
@Override
public void close() {
try {
br.close();
} catch (IOException ioe) {
throw new UncheckedIOException(ioe);
}
}
public static Stream<String> lines(Path path, LongPredicate pred) throws IOException {
LineSpliterator ls = new LineSpliterator(path, pred);
return StreamSupport.stream(ls, false)
.onClose(() -> ls.close());
}
}
You'd use it within a try-with-resources to ensure that the file is closed, even if an exception occurs:
static void printOddLines() throws IOException {
try (Stream<String> lines = LineSpliterator.lines(PATH, x -> (x & 1L) == 1L)) {
lines.forEach(System.out::println);
}
}
You can do this with a custom spliterator:
public class EvenOdd {
public static final class EvenSpliterator<T> implements Spliterator<T> {
private final Spliterator<T> underlying;
boolean even;
public EvenSpliterator(Spliterator<T> underlying, boolean even) {
this.underlying = underlying;
this.even = even;
}
@Override
public boolean tryAdvance(Consumer<? super T> action) {
if (even) {
even = false;
return underlying.tryAdvance(action);
}
if (!underlying.tryAdvance(t -> {})) {
return false;
}
return underlying.tryAdvance(action);
}
@Override
public Spliterator<T> trySplit() {
if (!hasCharacteristics(SUBSIZED)) {
return null;
}
final Spliterator<T> newUnderlying = underlying.trySplit();
if (newUnderlying == null) {
return null;
}
final boolean oldEven = even;
if ((newUnderlying.estimateSize() & 1) == 1) {
even = !even;
}
return new EvenSpliterator<>(newUnderlying, oldEven);
}
@Override
public long estimateSize() {
return underlying.estimateSize()>>1;
}
@Override
public int characteristics() {
return underlying.characteristics();
}
}
public static void main(String[] args) {
final EvenSpliterator<Integer> spliterator = new EvenSpliterator<>(IntStream.range(1, 100000).parallel().mapToObj(Integer::valueOf).spliterator(), false);
final List<Integer> result = StreamSupport.stream(spliterator, true).parallel().collect(Collectors.toList());
final List<Integer> expected = IntStream.range(1, 100000 / 2).mapToObj(i -> i * 2).collect(Collectors.toList());
if (result.equals(expected)) {
System.out.println("Yay! Expected result.");
}
}
}
Following @aioobe's algorithm, here's another spliterator-based approach, as proposed by @Holger, but more concise, even if less efficient.
public static <T> Stream<T> filterOdd(Stream<T> src) {
Spliterator<T> iter = src.spliterator();
AbstractSpliterator<T> res = new AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED)
{
@Override
public boolean tryAdvance(Consumer<? super T> action) {
iter.tryAdvance(item -> {}); // discard
return iter.tryAdvance(action); // use
}
};
return StreamSupport.stream(res, false);
}
Then you can use it like
Stream<DomainObject> res = filterOdd(Files.lines(src))
.map(line -> toDomainObject(line));

Custom WritableCompare displays object reference as output

I am new to Hadoop and Java, and I feel there is something obvious I am just missing. I am using Hadoop 1.0.3 if that means anything.
My goal for using Hadoop is to take a bunch of files and parse them one file at a time (as opposed to line by line). Each file will produce multiple key-value pairs, but the context of the other lines is important. The key and value are multi-value/composite, so I have implemented WritableComparable for the key and Writable for the value. Because the processing of each file takes a bit of CPU, I want to save the output of the mapper, then run multiple reducers later on.
For the composite keys, I followed http://stackoverflow.com/questions/12427090/hadoop-composite-key
The problem is, the output is just Java object references as opposed to the composite key and value. Example:
LinkKeyWritable@bd2f9730 LinkValueWritable@8752408c
I am not sure if the problem is related to not reducing the data at all or
Here is my main class:
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(Parser.class);
conf.setJobName("raw_parser");
conf.setOutputKeyClass(LinkKeyWritable.class);
conf.setOutputValueClass(LinkValueWritable.class);
conf.setMapperClass(RawMap.class);
conf.setNumMapTasks(0);
conf.setInputFormat(PerFileInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
PerFileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
And my Mapper class:
public class RawMap extends MapReduceBase implements
Mapper<NullWritable, Text, LinkKeyWritable, LinkValueWritable> {
public void map(NullWritable key, Text value,
OutputCollector<LinkKeyWritable, LinkValueWritable> output,
Reporter reporter) throws IOException {
String json = value.toString();
SerpyReader reader = new SerpyReader(json);
GoogleParser parser = new GoogleParser(reader);
for (String page : reader.getPages()) {
String content = reader.readPageContent(page);
parser.addPage(content);
}
for (Link link : parser.getLinks()) {
LinkKeyWritable linkKey = new LinkKeyWritable(link);
LinkValueWritable linkValue = new LinkValueWritable(link);
output.collect(linkKey, linkValue);
}
}
}
Link is basically a struct of various pieces of information that gets split between LinkKeyWritable and LinkValueWritable.
LinkKeyWritable:
public class LinkKeyWritable implements WritableComparable<LinkKeyWritable>{
protected Link link;
public LinkKeyWritable() {
super();
link = new Link();
}
public LinkKeyWritable(Link link) {
super();
this.link = link;
}
@Override
public void readFields(DataInput in) throws IOException {
link.batchDay = in.readLong();
link.source = in.readUTF();
link.domain = in.readUTF();
link.path = in.readUTF();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeLong(link.batchDay);
out.writeUTF(link.source);
out.writeUTF(link.domain);
out.writeUTF(link.path);
}
@Override
public int compareTo(LinkKeyWritable o) {
return ComparisonChain.start().
compare(link.batchDay, o.link.batchDay).
compare(link.domain, o.link.domain).
compare(link.path, o.link.path).
result();
}
@Override
public int hashCode() {
return Objects.hashCode(link.batchDay, link.source, link.domain, link.path);
}
@Override
public boolean equals(final Object obj){
if(obj instanceof LinkKeyWritable) {
final LinkKeyWritable o = (LinkKeyWritable)obj;
return Objects.equal(link.batchDay, o.link.batchDay)
&& Objects.equal(link.source, o.link.source)
&& Objects.equal(link.domain, o.link.domain)
&& Objects.equal(link.path, o.link.path);
}
return false;
}
}
LinkValueWritable:
public class LinkValueWritable implements Writable{
protected Link link;
public LinkValueWritable() {
link = new Link();
}
public LinkValueWritable(Link link) {
this.link = new Link();
this.link.type = link.type;
this.link.description = link.description;
}
@Override
public void readFields(DataInput in) throws IOException {
link.type = in.readUTF();
link.description = in.readUTF();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(link.type);
out.writeUTF(link.description);
}
@Override
public int hashCode() {
return Objects.hashCode(link.type, link.description);
}
@Override
public boolean equals(final Object obj){
if(obj instanceof LinkKeyWritable) {
final LinkKeyWritable o = (LinkKeyWritable)obj;
return Objects.equal(link.type, o.link.type)
&& Objects.equal(link.description, o.link.description);
}
return false;
}
}
I think the answer is in the implementation of the TextOutputFormat. Specifically, the LineRecordWriter's writeObject method:
/**
* Write the object to the byte stream, handling Text as a special
* case.
* @param o the object to print
* @throws IOException if the write throws, we pass it on
*/
private void writeObject(Object o) throws IOException {
if (o instanceof Text) {
Text to = (Text) o;
out.write(to.getBytes(), 0, to.getLength());
} else {
out.write(o.toString().getBytes(utf8));
}
}
As you can see, if your key or value is not a Text object, it calls the toString method on it and writes that out. Since you've left toString unimplemented in your key and value, it's using the Object class's implementation, which is writing out the reference.
I'd say that you should try writing an appropriate toString function or using a different OutputFormat.
It looks like you have a list of objects just like you wanted. You need to implement toString() on your writable if you want a human-readable version printed out instead of an ugly java reference.
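For example, a toString() added to the LinkKeyWritable from the question might look like this (tab-separated; the exact layout is your choice), with a similar override on LinkValueWritable for type and description:
@Override
public String toString() {
    // Print the actual fields instead of the default Object reference.
    return link.batchDay + "\t" + link.source + "\t" + link.domain + "\t" + link.path;
}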

Using a custom Object as key emitted by mapper

I have a situation in which the mapper emits as its key an object of a custom type.
It has two fields: an IntWritable ID, and a data array IntArrayWritable.
The implementation is as follows.
import java.io.*;
import org.apache.hadoop.io.*;
public class PairDocIdPerm implements WritableComparable<PairDocIdPerm> {
public PairDocIdPerm(){
this.permId = new IntWritable(-1);
this.SignaturePerm = new IntArrayWritable();
}
public IntWritable getPermId() {
return permId;
}
public void setPermId(IntWritable permId) {
this.permId = permId;
}
public IntArrayWritable getSignaturePerm() {
return SignaturePerm;
}
public void setSignaturePerm(IntArrayWritable signaturePerm) {
SignaturePerm = signaturePerm;
}
private IntWritable permId;
private IntArrayWritable SignaturePerm;
public PairDocIdPerm(IntWritable permId,IntArrayWritable SignaturePerm) {
this.permId = permId;
this.SignaturePerm = SignaturePerm;
}
@Override
public void write(DataOutput out) throws IOException {
permId.write(out);
SignaturePerm.write(out);
}
@Override
public void readFields(DataInput in) throws IOException {
permId.readFields(in);
SignaturePerm.readFields(in);
}
@Override
public int hashCode() { // same permId must go to same reducer, therefore just permId
return permId.get();//.hashCode();
}
@Override
public boolean equals(Object o) {
if (o instanceof PairDocIdPerm) {
PairDocIdPerm tp = (PairDocIdPerm) o;
return permId.equals(tp.permId) && SignaturePerm.equals(tp.SignaturePerm);
}
return false;
}
@Override
public String toString() {
return permId + "\t" +SignaturePerm.toString();
}
@Override
public int compareTo(PairDocIdPerm tp) {
int cmp = permId.compareTo(tp.permId);
Writable[] ar, other;
ar = this.SignaturePerm.get();
other = tp.SignaturePerm.get();
if (cmp == 0) {
for(int i=0;i<ar.length;i++){
if(((IntWritable)ar[i]).get() == ((IntWritable)other[i]).get()){cmp= 0;continue;}
else if(((IntWritable)ar[i]).get() < ((IntWritable)other[i]).get()){ return -1;}
else if(((IntWritable)ar[i]).get() > ((IntWritable)other[i]).get()){return 1;}
}
}
return cmp;
//return 1;
}
}
I require keys with the same ID to go to the same reducer, with their sort order as coded in the compareTo method.
However, when I use this, my job execution status is always map 100% reduce 0%.
The reduce never runs to completion. Is there anything wrong with this implementation?
In general, what is the likely problem if the reducer status is always 0%?
I think this might be a null pointer exception in the readFields method:
@Override
public void readFields(DataInput in) throws IOException {
permId.readFields(in);
SignaturePerm.readFields(in);
}
permId is null in this case.
So what you have to do is this:
IntWritable permId = new IntWritable();
Either in the field initializer or before the read.
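In other words, something along these lines (a sketch of the suggested fix):
// Initialize both fields at declaration so readFields() never dereferences null,
// even when Hadoop creates the key reflectively via the no-arg constructor.
private IntWritable permId = new IntWritable();
private IntArrayWritable SignaturePerm = new IntArrayWritable();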
However, your code is horrible to read.
