How to skip even lines of a Stream<String> obtained from Files.lines - java-8

In this case only the odd lines have meaningful data, and there is no character that uniquely identifies those lines. My intention is to get something equivalent to the following example:
Stream<DomainObject> res = Files.lines(src)
    .filter(line -> isOddLine())
    .map(line -> toDomainObject(line));
Is there any “clean” way to do it, without sharing global state?

No, there's no way to do this conveniently with the API. (It's basically the same reason why there is no easy way of having a zipWithIndex; see Is there a concise way to iterate over a stream with indices in Java 8?)
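If the file fits in memory, one eager workaround (a sketch of mine, not part of the original answer; it assumes src is a Path and toDomainObject is your parser) is to read all lines first and filter by index:

// Eager alternative: read everything, then keep lines by index.
List<String> all = Files.readAllLines(src);
Stream<DomainObject> res = IntStream.range(0, all.size())
    .filter(i -> i % 2 == 0)            // 0-based index, so this keeps the 1st, 3rd, 5th... lines
    .mapToObj(all::get)
    .map(line -> toDomainObject(line));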
You can still use Stream, but go for an iterator:
Iterator<String> iter = Files.lines(src).iterator();
while (iter.hasNext()) {
    iter.next();                      // discard
    if (iter.hasNext())
        toDomainObject(iter.next());  // use (the guard avoids NoSuchElementException on an odd line count)
}
(Swap the discard and use calls if it's the first, third, fifth… lines you want to keep.)
(You might want to use try-with-resources on that stream, though.)
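For example, a minimal sketch of that (assuming src is a Path):

try (Stream<String> lines = Files.lines(src)) {
    Iterator<String> iter = lines.iterator();
    while (iter.hasNext()) {
        iter.next();                      // discard
        if (iter.hasNext())
            toDomainObject(iter.next());  // use
    }
} // the underlying file handle is closed here, even on exceptions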

A clean way is to go one level deeper and implement a Spliterator. On this level you can control the iteration over the stream elements and simply iterate over two items whenever the downstream requests one item:
public class OddLines<T> extends Spliterators.AbstractSpliterator<T>
        implements Consumer<T> {

    public static <T> Stream<T> oddLines(Stream<T> source) {
        return StreamSupport.stream(new OddLines<>(source.spliterator()), false);
    }

    private static long odd(long l) { return l == Long.MAX_VALUE ? l : (l + 1) / 2; }

    Spliterator<T> originalLines;

    OddLines(Spliterator<T> source) {
        super(odd(source.estimateSize()), source.characteristics());
        originalLines = source;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (originalLines == null || !originalLines.tryAdvance(action))
            return false;
        if (!originalLines.tryAdvance(this)) originalLines = null;
        return true;
    }

    @Override
    public void accept(T t) {} // intentionally discards the skipped element
}
Then you can use it like
Stream<DomainObject> res = OddLines.oddLines(Files.lines(src))
    .map(line -> toDomainObject(line));
This solution has no side effects and retains most advantages of the Stream API, like lazy evaluation. However, it should be clear that it has no useful semantics for unordered stream processing (beware of subtle aspects like using forEachOrdered rather than forEach when performing a terminal action on all elements), and while it supports parallel processing in principle, it's unlikely to be very efficient…
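For instance, a short sketch of that caveat (using the toDomainObject parser from the question):

// forEachOrdered keeps the encounter order even if the stream is parallel;
// plain forEach on a parallel stream may process the odd lines in any order.
OddLines.oddLines(Files.lines(src))
    .map(line -> toDomainObject(line))
    .parallel()
    .forEachOrdered(obj -> System.out.println(obj));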

As aioobe said, there isn't a convenient way to do this, but there are several inconvenient ways. :-)
Here's another spliterator-based approach. Unlike Holger's, which wraps another spliterator, this one does the I/O itself. This gives greater control over things like ordering, but it also means it has to deal with IOException and close handling. I also threw in a LongPredicate parameter that lets you control which lines get passed through.
static class LineSpliterator extends Spliterators.AbstractSpliterator<String>
        implements AutoCloseable {
    final BufferedReader br;
    final LongPredicate pred;
    long count = 0L;

    public LineSpliterator(Path path, LongPredicate pred) throws IOException {
        super(Long.MAX_VALUE, Spliterator.ORDERED);
        br = Files.newBufferedReader(path);
        this.pred = pred;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        try {
            String s;
            while ((s = br.readLine()) != null) {
                if (pred.test(++count)) {
                    action.accept(s);
                    return true;
                }
            }
            return false;
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    @Override
    public void close() {
        try {
            br.close();
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static Stream<String> lines(Path path, LongPredicate pred) throws IOException {
        LineSpliterator ls = new LineSpliterator(path, pred);
        return StreamSupport.stream(ls, false)
                            .onClose(ls::close);
    }
}
You'd use it within a try-with-resources to ensure that the file is closed, even if an exception occurs:
static void printOddLines() throws IOException {
    try (Stream<String> lines = LineSpliterator.lines(PATH, x -> (x & 1L) == 1L)) {
        lines.forEach(System.out::println);
    }
}

You can do this with a custom spliterator:
public class EvenOdd {
    public static final class EvenSpliterator<T> implements Spliterator<T> {
        private final Spliterator<T> underlying;
        boolean even;

        public EvenSpliterator(Spliterator<T> underlying, boolean even) {
            this.underlying = underlying;
            this.even = even;
        }

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            if (even) {
                even = false;
                return underlying.tryAdvance(action);
            }
            if (!underlying.tryAdvance(t -> {})) {
                return false;
            }
            return underlying.tryAdvance(action);
        }

        @Override
        public Spliterator<T> trySplit() {
            if (!hasCharacteristics(SUBSIZED)) {
                return null;
            }
            final Spliterator<T> newUnderlying = underlying.trySplit();
            if (newUnderlying == null) {
                return null;
            }
            final boolean oldEven = even;
            if ((newUnderlying.estimateSize() & 1) == 1) {
                even = !even;
            }
            return new EvenSpliterator<>(newUnderlying, oldEven);
        }

        @Override
        public long estimateSize() {
            return underlying.estimateSize() >> 1;
        }

        @Override
        public int characteristics() {
            return underlying.characteristics();
        }
    }

    public static void main(String[] args) {
        final EvenSpliterator<Integer> spliterator = new EvenSpliterator<>(
                IntStream.range(1, 100000).parallel().mapToObj(Integer::valueOf).spliterator(), false);
        final List<Integer> result = StreamSupport.stream(spliterator, true).parallel()
                .collect(Collectors.toList());
        final List<Integer> expected = IntStream.range(1, 100000 / 2).mapToObj(i -> i * 2)
                .collect(Collectors.toList());
        if (result.equals(expected)) {
            System.out.println("Yay! Expected result.");
        }
    }
}

Following the @aioobe algorithm, here's another spliterator-based approach, as proposed by @Holger but more concise, even if less efficient.
public static <T> Stream<T> filterOdd(Stream<T> src) {
    Spliterator<T> iter = src.spliterator();
    AbstractSpliterator<T> res = new AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED) {
        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            iter.tryAdvance(item -> {});    // discard
            return iter.tryAdvance(action); // use
        }
    };
    return StreamSupport.stream(res, false);
}
Then you can use it like
Stream<DomainObject> res = filterOdd(Files.lines(src))
    .map(line -> toDomainObject(line));

Related

How to use Spring Batch to read CSV files which contain multiple lines in one cell?

Raw CSV is like this:
First line: Name, StudentID, comment
Data:
Name, StudentId, Comment
Jake, 12312, poor
Emma, 12324, good
Mary, 13214, need more work on programming
and math.
The comment cell of the last entry of the CSV data spans two lines. I want to treat it as a single record.
When I read the file using FlatFileItemReader, it throws an error like "expected token 3 but actual 1"; I guess it treats the second line as a new record.
Is there a way to treat them as one record?
Have your reader just return the raw string for each line without trying to split on the delimiter. Make a processor (it has to be stateful) to handle the parsing. The only tricky part is that you'll have to signal to the processor somehow when you've reached the EOF, so it isn't waiting to see if it should aggregate the next line. Something like this:
public class AggregatingItemProcessor<T> implements ItemProcessor<T, T>, InitializingBean {
    private BiPredicate<T, T> aggregatePredicate;
    private BiFunction<T, T, T> aggregator;

    public void setAggregatePredicate(BiPredicate<T, T> aggregatePredicate) {
        this.aggregatePredicate = aggregatePredicate;
    }

    public void setAggregator(BiFunction<T, T, T> aggregator) {
        this.aggregator = aggregator;
    }

    private T cur;

    @Override
    public T process(T item) throws Exception {
        if (cur == null) {
            cur = item;
            return null;
        }
        if (aggregatePredicate.test(cur, item)) {
            cur = aggregator.apply(cur, item);
            return null;
        } else {
            T toRet = cur;
            cur = item;
            return toRet;
        }
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        Assert.notNull(aggregatePredicate, "Predicate to determine if records should be aggregated must not be null.");
        Assert.notNull(aggregator, "Function for aggregating items must not be null.");
    }
}
Then the config...
static final String EOF_MARKER = "\0";

@Bean
public FlatFileItemReader<String> reader() {
    final FlatFileItemReader<String> reader = new FlatFileItemReader<String>() {
        private boolean finished = false;

        @Override
        public String read() throws Exception, UnexpectedInputException, ParseException {
            if (finished) return null;
            String next = super.read();
            if (next == null) {
                finished = true;
                return EOF_MARKER;
            }
            return next;
        }
    };
    reader.setLineMapper((s, i) -> s);
    return reader;
}

@Bean
public AggregatingItemProcessor<String> processor() {
    final AggregatingItemProcessor<String> processor = new AggregatingItemProcessor<>();
    processor.setAggregatePredicate((s1, s2) -> !EOF_MARKER.equals(s2) && StringUtils.countOccurrencesOf(s2, ",") < 2);
    processor.setAggregator(String::concat);
    return processor;
}
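To complete the picture, here is a hedged sketch (mine, not from the original answer) of wiring the reader and processor into a chunk-oriented step; it assumes Spring Batch's StepBuilderFactory and some ItemWriter<String> bean named writer():

@Bean
public Step csvStep(StepBuilderFactory steps) {
    return steps.get("csvStep")
            .<String, String>chunk(10)   // chunk size is arbitrary here
            .reader(reader())
            .processor(processor())
            .writer(writer())            // writer() is an assumed ItemWriter<String> bean
            .build();
}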

Subscriber's onNext does not contain complete item

We are working with Project Reactor and have a huge problem right now. This is how we produce (publish) our data:
public Flux<String> getAllFlux() {
    return Flux.<String>create(sink -> {
        new Thread() {
            public void run() {
                Iterator<Cache.Entry<String, MyObject>> iterator = getAllIterator();
                ObjectMapper mapper = new ObjectMapper();
                while (iterator.hasNext()) {
                    try {
                        sink.next(mapper.writeValueAsString(iterator.next().getValue()));
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                sink.complete();
            }
        }.start();
    });
}
As you can see, we are taking data from an iterator and publishing each item in it as a JSON string. Our subscriber does the following:
flux.subscribe(new Subscriber<String>() {
    private Subscription s;
    int amount = 1; // the amount of received flux payload at a time
    int onNextAmount;
    ObjectMapper mapper = new ObjectMapper();

    @Override
    public void onSubscribe(Subscription s) {
        System.out.println("subscribe");
        this.s = s;
        this.s.request(amount);
    }

    @Override
    public void onNext(String item) {
        MyObject myObject = null;
        try {
            System.out.println(item);
            myObject = mapper.readValue(item, MyObject.class); // parse the received item
            System.out.println(myObject.toString());
        } catch (IOException e) {
            System.out.println(item);
            System.out.println("failed: " + e.getLocalizedMessage());
        }
        onNextAmount++;
        if (onNextAmount % amount == 0) {
            this.s.request(amount);
        }
    }

    @Override
    public void onError(Throwable t) {
        System.out.println(t.getLocalizedMessage());
    }

    @Override
    public void onComplete() {
        System.out.println("completed");
    }
});
As you can see, we are simply printing the String item which we receive and parsing it into an object using the Jackson wrapper. The problem is that for most of our items everything works fine:
{"itemId": "someId", "itemDesc": "some description"}
But for some items the String is cut off, like this for example:
{"itemId": "some"
And the next item after that would be
"Id", "itemDesc": "some description"}
There is no pattern to those cuts. It is completely random and different every time we run the code. Of course Jackson then throws an "Unexpected end of input" error.
So what is causing this behaviour and how can we solve it?
Solution:
Send the Object inside the flux instead of the String:
public Flux<ItemIgnite> getAllFlux() {
    return Flux.create(sink -> {
        new Thread() {
            public void run() {
                Iterator<Cache.Entry<String, ItemIgnite>> iterator = getAllIterator();
                while (iterator.hasNext()) {
                    sink.next(iterator.next().getValue());
                }
            }
        }.start();
    });
}
and use the following produces type:
@RequestMapping(value = "/allFlux", method = RequestMethod.GET, produces = "application/stream+json")
The key here is to use stream+json and not only json.
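For illustration, a hedged sketch (an assumed controller method, not from the original post) of where that annotation goes:

@RequestMapping(value = "/allFlux", method = RequestMethod.GET, produces = "application/stream+json")
public Flux<ItemIgnite> allFlux() {
    // each emitted element is serialized and framed individually,
    // so items can no longer be cut apart mid-string
    return getAllFlux();
}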

Reading OKIO stream twice

I am using OkHttp for networking and currently get a charStream from response.charStream(), which I then pass to GSON for parsing. Once parsed and inflated, I deflate the model again to save it to disk using a stream. It seems like extra work to have to go from networkReader to Model to DiskWriter. Is it possible with Okio to instead go from networkReader to JSONParser(reader) as well as networkReader to DiskWriter(reader)? Basically I want to be able to read from the network stream twice.
You can use a MirroredSource (taken from this gist).
public class MirroredSource {
    private final Buffer buffer = new Buffer();
    private final Source source;
    private final AtomicBoolean sourceExhausted = new AtomicBoolean();

    public MirroredSource(final Source source) {
        this.source = source;
    }

    public Source original() {
        return new okio.Source() {
            @Override
            public long read(final Buffer sink, final long byteCount) throws IOException {
                final long bytesRead = source.read(sink, byteCount);
                if (bytesRead > 0) {
                    synchronized (buffer) {
                        sink.copyTo(buffer, sink.size() - bytesRead, bytesRead);
                        // Notify the mirror to continue
                        buffer.notify();
                    }
                } else {
                    sourceExhausted.set(true);
                }
                return bytesRead;
            }

            @Override
            public Timeout timeout() {
                return source.timeout();
            }

            @Override
            public void close() throws IOException {
                source.close();
                sourceExhausted.set(true);
                synchronized (buffer) {
                    buffer.notify();
                }
            }
        };
    }

    public Source mirror() {
        return new okio.Source() {
            @Override
            public long read(final Buffer sink, final long byteCount) throws IOException {
                synchronized (buffer) {
                    while (!sourceExhausted.get()) {
                        // only need to synchronise on reads while the source is not exhausted
                        if (buffer.request(byteCount)) {
                            return buffer.read(sink, byteCount);
                        } else {
                            try {
                                buffer.wait();
                            } catch (final InterruptedException e) {
                                // No op
                            }
                        }
                    }
                }
                return buffer.read(sink, byteCount);
            }

            @Override
            public Timeout timeout() {
                return new Timeout();
            }

            @Override
            public void close() throws IOException { /* not used */ }
        };
    }
}
Usage would look like:
MirroredSource mirroredSource = new MirroredSource(response.body().source()); //Or however you're getting your original source
Source originalSource = mirroredSource.original();
Source secondSource = mirroredSource.mirror();
doParsing(originalSource);
writeToDisk(secondSource);
originalSource.close();
If you want something more robust you can repurpose Relay from OkHttp.

Minimal implementation of JavaFX TextInputArea

I'm investigating the best way to write a rich text editor in JavaFX - don't mention the HTMLEditor to me: we've spent literally months hacking at it and I could write reams about why it isn't suitable for our purposes! The choice at the moment is to extend AnchorPane and do all of the layout, navigation etc. from scratch, or to extend TextInputArea, which looks as though it would help. Does anyone have their own implementation, or would like to propose a minimal one?
FWIW here's a scrap from me:
public class TryPain3 extends TextInputControl {
    private AnchorPane rootNode = new AnchorPane();

    public TryPain3() {
        super(new Content() {
            private String text = "";

            @Override
            public String get(int i, int i1) {
                return text.substring(i, i1);
            }

            @Override
            public void insert(int i, String string, boolean bln) {
            }

            @Override
            public void delete(int i, int i1, boolean bln) {
            }

            @Override
            public int length() {
                return text.length();
            }

            @Override
            public String get() {
                return text;
            }

            @Override
            public void addListener(ChangeListener<? super String> cl) {
            }

            @Override
            public void removeListener(ChangeListener<? super String> cl) {
                throw new UnsupportedOperationException("Not supported yet.");
            }

            @Override
            public String getValue() {
                return text;
            }

            @Override
            public void addListener(InvalidationListener il) {
            }

            @Override
            public void removeListener(InvalidationListener il) {
                throw new UnsupportedOperationException("Not supported yet.");
            }
        });
        setEditable(true);

        Text text1 = new Text("fred was here");
        text1.setFont(Font.font("Tahoma", FontWeight.NORMAL, 18));
        text1.setTextAlignment(TextAlignment.LEFT);
        text1.setFontSmoothingType(FontSmoothingType.LCD);
        rootNode.getChildren().add(text1);

        setSkin(new TP3Skin(this, rootNode));
    }

    class TP3Skin implements Skin<TryPain3> {
        TryPain3 tp;
        Node root;

        public TP3Skin(TryPain3 tp, Node root) {
            this.tp = tp;
            this.root = root;
        }

        @Override
        public TryPain3 getSkinnable() {
            return tp;
        }

        @Override
        public Node getNode() {
            return root;
        }

        @Override
        public void dispose() {
            tp = null;
            rootNode = null;
        }
    }
}
It looks as though the skin is not optional.
Questions I'd like to find answers to are things like:
how is the UI supposed to be drawn? I'm quite happy to code it from scratch, but how do I get the benefit of calls to forward(), for example
should the UI creation be done in the Skin?
whether the base class deals with things like where to put the cursor if you click on a bit of text
I'm sure other questions will arise from this.
You may want to try the new JavaFX 8.0 control TextFlow, which allows aggregation of various text styles. See examples here: https://wikis.oracle.com/display/OpenJDK/Rich+Text+API+Samples
JavaFX 8 is part of JDK 8, so you can download a developer build here: http://jdk8.java.net/download.html - it will include JavaFX and the new TextFlow control.
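For a flavour of the API, a minimal hedged sketch (mine, based on the standard TextFlow constructor) that aggregates two differently styled runs:

// Two Text nodes with different styles, flowed together as rich text
Text bold = new Text("fred ");
bold.setFont(Font.font("Tahoma", FontWeight.BOLD, 18));
Text plain = new Text("was here");
plain.setFont(Font.font("Tahoma", FontWeight.NORMAL, 18));
TextFlow flow = new TextFlow(bold, plain);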

Using a custom Object as key emitted by mapper

I have a situation in which the mapper emits as key an object of a custom type.
It has two fields: an IntWritable ID and an IntArrayWritable data array.
The implementation is as follows.
import java.io.*;
import org.apache.hadoop.io.*;

public class PairDocIdPerm implements WritableComparable<PairDocIdPerm> {
    private IntWritable permId;
    private IntArrayWritable SignaturePerm;

    public PairDocIdPerm() {
        this.permId = new IntWritable(-1);
        this.SignaturePerm = new IntArrayWritable();
    }

    public PairDocIdPerm(IntWritable permId, IntArrayWritable SignaturePerm) {
        this.permId = permId;
        this.SignaturePerm = SignaturePerm;
    }

    public IntWritable getPermId() {
        return permId;
    }

    public void setPermId(IntWritable permId) {
        this.permId = permId;
    }

    public IntArrayWritable getSignaturePerm() {
        return SignaturePerm;
    }

    public void setSignaturePerm(IntArrayWritable signaturePerm) {
        SignaturePerm = signaturePerm;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        permId.write(out);
        SignaturePerm.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        permId.readFields(in);
        SignaturePerm.readFields(in);
    }

    @Override
    public int hashCode() { // same permId must go to the same reducer, therefore just permId
        return permId.get();
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof PairDocIdPerm) {
            PairDocIdPerm tp = (PairDocIdPerm) o;
            return permId.equals(tp.permId) && SignaturePerm.equals(tp.SignaturePerm);
        }
        return false;
    }

    @Override
    public String toString() {
        return permId + "\t" + SignaturePerm.toString();
    }

    @Override
    public int compareTo(PairDocIdPerm tp) {
        int cmp = permId.compareTo(tp.permId);
        Writable[] ar, other;
        ar = this.SignaturePerm.get();
        other = tp.SignaturePerm.get();
        if (cmp == 0) {
            for (int i = 0; i < ar.length; i++) {
                if (((IntWritable) ar[i]).get() == ((IntWritable) other[i]).get()) { cmp = 0; continue; }
                else if (((IntWritable) ar[i]).get() < ((IntWritable) other[i]).get()) { return -1; }
                else if (((IntWritable) ar[i]).get() > ((IntWritable) other[i]).get()) { return 1; }
            }
        }
        return cmp;
    }
}
I require keys with the same ID to go to the same reducer, with their sort order as coded in the compareTo method.
However, when I use this, my job execution status is always map 100% reduce 0%.
The reduce never runs to completion. Is there anything wrong in this implementation?
In general, what is the likely problem if the reducer status is always 0%?
I think this might be caused by a null pointer exception in the read method:
@Override
public void readFields(DataInput in) throws IOException {
    permId.readFields(in);
    SignaturePerm.readFields(in);
}
permId is null in this case.
So what you have to do is this:
IntWritable permId = new IntWritable();
Either in the field initializer or before the read.
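For instance, a hedged sketch of that fix (initializing both Writable fields at declaration, so deserialization never dereferences null):

// Initialize at declaration; readFields() can then safely populate them.
private IntWritable permId = new IntWritable();
private IntArrayWritable SignaturePerm = new IntArrayWritable();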
However, your code is horrible to read.
