FunctionalJava: sorting a list of arbitrary types - java-8

I have a very simple Java bean, WatchedFile, which has a fileName field.
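For reference, a minimal sketch of such a bean (the constructor arguments are inferred from the code below; the field names other than fileName are assumptions):
public class WatchedFile {
    public final String fileName;  // the field sorted on below
    public final boolean watched;  // assumed name for the boolean constructor argument
    public final long size;        // assumed name for the file.length() argument

    public WatchedFile(String fileName, boolean watched, long size) {
        this.fileName = fileName;
        this.watched = watched;
        this.size = size;
    }
}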
I would like to sort a fj.data.List of WatchedFile objects, but I'm struggling with defining an fj.Ord for the list's sort() method. This is what I came up with:
protected List<WatchedFile> getWatchedFileList(String path) throws IOException {
    List<File> files = List.list(new File(path).listFiles());
    return files
        .map((file) -> new WatchedFile(file.getName(), false, file.length()))
        .sort(Ord.ord(new F<WatchedFile, F<WatchedFile, Ordering>>() {
            @Override
            public F<WatchedFile, Ordering> f(final WatchedFile watchedFile1) {
                return new F<WatchedFile, Ordering>() {
                    @Override
                    public Ordering f(final WatchedFile watchedFile2) {
                        int compareResult = watchedFile1.fileName.compareTo(watchedFile2.fileName);
                        return (compareResult < 0 ? Ordering.LT :
                                (compareResult > 0 ? Ordering.GT : Ordering.EQ));
                    }
                };
            }
        }));
}
This is ugly! I'm sure there is a better way of instantiating an Ord object... Possibly utilizing some Java 8 magic?

protected List<WatchedFile> getWatchedFileList(String path) throws IOException {
    List<File> files = Arrays.asList(new File(path).listFiles());
    return files.stream()
        .map(file -> new WatchedFile(file.getName(), false, file.length()))
        .sorted((wf1, wf2) -> wf1.fileName.compareTo(wf2.fileName))
        .collect(Collectors.toList());
}
It’s recommended to have a method public String getFileName() in your class WatchedFile. In that case you can simply say:
protected List<WatchedFile> getWatchedFileList(String path) throws IOException {
    List<File> files = Arrays.asList(new File(path).listFiles());
    return files.stream()
        .map(file -> new WatchedFile(file.getName(), false, file.length()))
        .sorted(Comparator.comparing(WatchedFile::getFileName))
        .collect(Collectors.toList());
}
And, using NIO2 for getting the directory entries, it may look like:
protected List<WatchedFile> getWatchedFileList(String path) throws IOException {
    try {
        return Files.list(Paths.get(path))
            .map(p -> new WatchedFile(p.toString(), false, fileSize(p)))
            .sorted(Comparator.comparing(WatchedFile::getFileName))
            .collect(Collectors.toList());
    } catch (UncheckedIOException ex) {
        throw ex.getCause();
    }
}
private long fileSize(Path path) {
    try {
        return Files.size(path);
    } catch (IOException ex) {
        throw new UncheckedIOException(ex);
    }
}
If you want to stay within the “functional-java” API, a solution can look like:
protected List<WatchedFile> getWatchedFileList(String path) throws IOException {
    List<File> files = List.list(new File(path).listFiles());
    return files
        .map(file -> new WatchedFile(file.getName(), false, file.length()))
        .sort(Ord.stringOrd.comap(wf -> wf.fileName));
}
The key point is that you don’t need to (and shouldn’t) re-implement the way Strings are compared. Instead, specify the function that extracts the property value to compare. Compare this with the Java 8 factory method Comparator.comparing used in the second code example.
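The extraction-based style also composes. For instance, a sketch of a two-key comparator on the Java 8 side (the getSize() accessor is hypothetical, assuming WatchedFile exposes its file length):
Comparator<WatchedFile> byNameThenSize =
    Comparator.comparing(WatchedFile::getFileName)
              .thenComparingLong(WatchedFile::getSize); // getSize() is assumed to exist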

Related

Spring batch read file one by one. File content is not constant

MultiResourceItemReader reads all files sequentially.
I want that once one file is read completely, it should call the processor/writer; it should not read the next file.
Since the file content is not constant, I can't go with a fixed chunk size.
Any idea on chunk policy to decide end of file content?
I think you should write a step which reads/processes/writes only one file with a "single file item reader" (like FlatFileItemReader), and repeat the step while there are files remaining.
Spring Batch gives you a feature to do exactly that: conditional flows, and in particular the programmatic flow decision, which gives you a smart way to decide when to stop a loop between steps (when there are no files left), as sketched below.
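A minimal sketch of such a decider (the input directory and the status names are assumptions for illustration, not part of the original answer):
import java.io.File;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

public class FilesRemainingDecider implements JobExecutionDecider {
    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // Hypothetical input directory; in practice this would be injected/configured.
        File[] remaining = new File("/in/pending").listFiles();
        return (remaining != null && remaining.length > 0)
                ? new FlowExecutionStatus("CONTINUE")  // loop back to the single-file step
                : FlowExecutionStatus.COMPLETED;       // no files left, end the loop
    }
}
In the flow definition you would then route the "CONTINUE" status back to the reading step, and COMPLETED to the end of the job.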
And since you will not be able to give a constant input file name to your reader, you should also have a look at the Late binding section.
Hope this will be enough to help you. Please make comments if you need more details.
Using MultiResourceItemReader, assigning multiple file resources.
Using a custom file reader as the delegate, reading a file completely.
For reading the file completely, come up with logic like the following:
@Bean
public MultiResourceItemReader<SimpleFileBean> simpleReader() {
    Resource[] resourceList = getFileResources();
    if (resourceList == null) {
        System.out.println("No input files available");
    }
    MultiResourceItemReader<SimpleFileBean> resourceItemReader = new MultiResourceItemReader<SimpleFileBean>();
    resourceItemReader.setResources(resourceList);
    resourceItemReader.setDelegate(simpleFileReader());
    return resourceItemReader;
}

@Bean
SimpleInboundReader simpleFileReader() {
    return new SimpleInboundReader(customSimpleFileReader());
}

@Bean
public FlatFileItemReader<String> customSimpleFileReader() {
    return new FlatFileItemReaderBuilder<String>()
        .name("customFileItemReader")
        .lineMapper(new PassThroughLineMapper())
        .build();
}
public class SimpleInboundReader implements ResourceAwareItemReaderItemStream<SimpleFileBean> {
    private String fileName = null;
    private ResourceAwareItemReaderItemStream<String> delegate;

    public SimpleInboundReader(ResourceAwareItemReaderItemStream<String> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        delegate.open(executionContext);
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        delegate.update(executionContext);
    }

    @Override
    public void close() throws ItemStreamException {
        delegate.close();
    }

    @Override
    public void setResource(Resource resource) {
        fileName = resource.getFilename();
        this.delegate.setResource(resource);
    }

    String getNextLine() throws Exception {
        return delegate.read();
    }

    @Override
    public SimpleFileBean read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
        String currentLine = delegate.read();
        if (currentLine != null) {
            SimpleFileBean simpleFileBean = new SimpleFileBean();
            simpleFileBean.getLines().add(currentLine);
            // Keep reading until the delegate is exhausted, i.e. until the whole file is consumed.
            while ((currentLine = getNextLine()) != null) {
                simpleFileBean.getLines().add(currentLine);
            }
            return simpleFileBean;
        }
        return null;
    }
}
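For completeness, a minimal sketch of the SimpleFileBean used above (only the getLines() accessor is implied by the reader code; the rest is an assumption):
public class SimpleFileBean {
    private final List<String> lines = new ArrayList<>(); // all lines of one file
    public List<String> getLines() {
        return lines;
    }
}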

How do I extract a list of objects from a Java stream?

Note that there are 3 levels of nesting here:
I am trying to retrieve a list of filters from a FilterSubGroup list which is in a FilterSuperGroup list.
The filter has a property, fieldName.
I managed to extract the list of field names from the data structure using the following method:
public static List<String> extractFilterFieldsAsAList(QuerySearchRequestDTO requestDTO) {
    return requestDTO.getFilter().parallelStream()
        .map(new Function<FilterSuperGroup, String>() {
            @Override
            public String apply(FilterSuperGroup filterSuperGroup) {
                return filterSuperGroup.getFilterSubGroup().parallelStream()
                    .map(new Function<FilterSubGroup, String>() {
                        @Override
                        public String apply(FilterSubGroup filterSubGroup) {
                            return filterSubGroup.getFilter().parallelStream()
                                .map(new Function<Filter, String>() {
                                    @Override
                                    public String apply(Filter t) {
                                        return t.getFieldName();
                                    }
                                }).collect(Collectors.joining(" "));
                        }
                    }).collect(Collectors.joining(" "));
            }
        }).collect(Collectors.toList());
}
Question: However, in another scenario (below) I need a List<Filter> as the output.
I want to replace findAny().get() with something that doesn't break the stream nesting. findAny().get() is obviously incorrect, as it returns only one element from each nest. How do I fix this?
public static List<Filter> prepareFilterListFromRequest(QuerySearchRequestDTO requestDTO) {
    List<Filter> listToReturn = new ArrayList<>();
    listToReturn.addAll(requestDTO.getFilter().stream()
        .map(new Function<FilterSuperGroup, List<Filter>>() {
            @Override
            public List<Filter> apply(FilterSuperGroup filterSuperGroup) {
                return filterSuperGroup.getFilterSubGroup().parallelStream()
                    .map(new Function<FilterSubGroup, Filter>() {
                        @Override
                        public Filter apply(FilterSubGroup filterSubGroup) {
                            return filterSubGroup.getFilter().stream()
                                .map(new Function<Filter, Filter>() {
                                    @Override
                                    public Filter apply(Filter t) {
                                        return t;
                                    }
                                }).findAny().get();
                        }
                    }).collect(Collectors.toList());
            }
        }).findAny().get()
    );
    return listToReturn;
}
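One common fix (a sketch, not from the original thread): flatMap flattens each nesting level into a single stream, so no element is dropped the way findAny().get() drops them:
public static List<Filter> prepareFilterListFromRequest(QuerySearchRequestDTO requestDTO) {
    return requestDTO.getFilter().stream()
        .flatMap(superGroup -> superGroup.getFilterSubGroup().stream()) // Stream<FilterSubGroup>
        .flatMap(subGroup -> subGroup.getFilter().stream())             // Stream<Filter>
        .collect(Collectors.toList());
}
A similar flatMap/map chain can also replace the anonymous Function classes in extractFilterFieldsAsAList, if a flat list of field names is acceptable.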

Subscriber's onNext does not contain the complete item

We are working with Project Reactor and having a huge problem right now. This is how we produce (publish) our data:
public Flux<String> getAllFlux() {
    return Flux.<String>create(sink -> {
        new Thread() {
            public void run() {
                Iterator<Cache.Entry<String, MyObject>> iterator = getAllIterator();
                ObjectMapper mapper = new ObjectMapper();
                while (iterator.hasNext()) {
                    try {
                        sink.next(mapper.writeValueAsString(iterator.next().getValue()));
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                sink.complete();
            }
        }.start();
    });
}
As you can see, we are taking data from an iterator and publishing each item in that iterator as a JSON string. Our subscriber does the following:
flux.subscribe(new Subscriber<String>() {
    private Subscription s;
    int amount = 1; // the number of items to request at a time
    int onNextAmount;
    ObjectMapper mapper = new ObjectMapper();

    @Override
    public void onSubscribe(Subscription s) {
        System.out.println("subscribe");
        this.s = s;
        this.s.request(amount);
    }

    @Override
    public void onNext(String item) {
        try {
            System.out.println(item);
            MyObject myObject = mapper.readValue(item, MyObject.class);
            System.out.println(myObject.toString());
        } catch (IOException e) {
            System.out.println(item);
            System.out.println("failed: " + e.getLocalizedMessage());
        }
        onNextAmount++;
        if (onNextAmount % amount == 0) {
            this.s.request(amount);
        }
    }

    @Override
    public void onError(Throwable t) {
        System.out.println(t.getLocalizedMessage());
    }

    @Override
    public void onComplete() {
        System.out.println("completed");
    }
});
As you can see, we are simply printing the String item we receive and parsing it into an object using Jackson. The problem we have now is that most of our items come through fine:
{"itemId": "someId", "itemDesc": "some description"}
But for some items the String is cut off, like this for example:
{"itemId": "some"
And the next item after that would be
Id", "itemDesc": "some description"}
There is no pattern to those cuts. It is completely random and different every time we run the code. Of course Jackson then fails with an "Unexpected end of input" error.
So what is causing such a behaviour and how can we solve it?
Solution:
Send the Object inside the flux instead of the String:
public Flux<ItemIgnite> getAllFlux() {
    return Flux.create(sink -> {
        new Thread() {
            public void run() {
                Iterator<Cache.Entry<String, ItemIgnite>> iterator = getAllIterator();
                while (iterator.hasNext()) {
                    sink.next(iterator.next().getValue());
                }
                sink.complete(); // complete the flux once the iterator is exhausted
            }
        }.start();
    });
}
and use the following produces type:
@RequestMapping(value = "/allFlux", method = RequestMethod.GET, produces = "application/stream+json")
The key here is to use application/stream+json and not plain application/json.
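For context, a minimal controller sketch showing where that produces type goes (controller and service names are hypothetical):
@RestController
public class ItemController {
    private final ItemService service; // hypothetical service exposing getAllFlux()

    public ItemController(ItemService service) {
        this.service = service;
    }

    @RequestMapping(value = "/allFlux", method = RequestMethod.GET, produces = "application/stream+json")
    public Flux<ItemIgnite> allFlux() {
        // With application/stream+json each element is serialized and flushed
        // as its own JSON document, so item boundaries are preserved.
        return service.getAllFlux();
    }
}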

How to skip even lines of a Stream<String> obtained from the Files.lines

In this case just odd lines have meaningful data and there is no character that uniquely identifies those lines. My intention is to get something equivalent to the following example:
Stream<DomainObject> res = Files.lines(src)
    .filter(line -> isOddLine())
    .map(line -> toDomainObject(line));
Is there any “clean” way to do it, without sharing global state?
No, there's no way to do this conveniently with the API. (Basically the same reason as to why there is no easy way of having a zipWithIndex, see Is there a concise way to iterate over a stream with indices in Java 8?).
You can still use Stream, but go for an iterator:
Iterator<String> iter = Files.lines(src).iterator();
while (iter.hasNext()) {
    iter.next(); // discard
    if (iter.hasNext()) {
        toDomainObject(iter.next()); // use
    }
}
(You might want to use try-with-resources on that stream though.)
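For example, a sketch of that try-with-resources variant, which guarantees the underlying file handle is closed even if toDomainObject throws:
try (Stream<String> lines = Files.lines(src)) {
    Iterator<String> iter = lines.iterator();
    while (iter.hasNext()) {
        iter.next(); // discard
        if (iter.hasNext()) {
            toDomainObject(iter.next()); // use
        }
    }
}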
A clean way is to go one level deeper and implement a Spliterator. On this level you can control the iteration over the stream elements and simply iterate over two items whenever the downstream requests one item:
public class OddLines<T> extends Spliterators.AbstractSpliterator<T>
        implements Consumer<T> {

    public static <T> Stream<T> oddLines(Stream<T> source) {
        return StreamSupport.stream(new OddLines<>(source.spliterator()), false);
    }

    private static long odd(long l) {
        return l == Long.MAX_VALUE ? l : (l + 1) / 2;
    }

    Spliterator<T> originalLines;

    OddLines(Spliterator<T> source) {
        super(odd(source.estimateSize()), source.characteristics());
        originalLines = source;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (originalLines == null || !originalLines.tryAdvance(action))
            return false;
        // Consume (and drop) the following even line; remember when the source is exhausted.
        if (!originalLines.tryAdvance(this)) originalLines = null;
        return true;
    }

    @Override
    public void accept(T t) {} // intentionally discards the even lines
}
Then you can use it like
Stream<DomainObject> res = OddLines.oddLines(Files.lines(src))
    .map(line -> toDomainObject(line));
This solution has no side effects and retains most advantages of the Stream API, like lazy evaluation. However, it should be clear that it has no useful semantics for unordered stream processing (beware of subtle aspects like using forEachOrdered rather than forEach when performing a terminal action on all elements), and while it supports parallel processing in principle, it's unlikely to be very efficient.
As aioobe said, there isn't a convenient way to do this, but there are several inconvenient ways. :-)
Here's another spliterator-based approach. Unlike Holger's, which wraps another spliterator, this one does the I/O itself. This gives greater control over things like ordering, but it also means that it has to deal with IOException and close handling. I also threw in a Predicate parameter that lets you get a crack at which lines get passed through.
static class LineSpliterator extends Spliterators.AbstractSpliterator<String>
        implements AutoCloseable {
    final BufferedReader br;
    final LongPredicate pred;
    long count = 0L;

    public LineSpliterator(Path path, LongPredicate pred) throws IOException {
        super(Long.MAX_VALUE, Spliterator.ORDERED);
        br = Files.newBufferedReader(path);
        this.pred = pred;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        try {
            String s;
            while ((s = br.readLine()) != null) {
                if (pred.test(++count)) {
                    action.accept(s);
                    return true;
                }
            }
            return false;
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    @Override
    public void close() {
        try {
            br.close();
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static Stream<String> lines(Path path, LongPredicate pred) throws IOException {
        LineSpliterator ls = new LineSpliterator(path, pred);
        return StreamSupport.stream(ls, false)
                            .onClose(() -> ls.close());
    }
}
You'd use it within a try-with-resources to ensure that the file is closed, even if an exception occurs:
static void printOddLines() throws IOException {
    try (Stream<String> lines = LineSpliterator.lines(PATH, x -> (x & 1L) == 1L)) {
        lines.forEach(System.out::println);
    }
}
You can do this with a custom spliterator:
public class EvenOdd {
    public static final class EvenSpliterator<T> implements Spliterator<T> {
        private final Spliterator<T> underlying;
        boolean even;

        public EvenSpliterator(Spliterator<T> underlying, boolean even) {
            this.underlying = underlying;
            this.even = even;
        }

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            if (even) {
                even = false;
                return underlying.tryAdvance(action);
            }
            if (!underlying.tryAdvance(t -> {})) {
                return false;
            }
            return underlying.tryAdvance(action);
        }

        @Override
        public Spliterator<T> trySplit() {
            if (!hasCharacteristics(SUBSIZED)) {
                return null;
            }
            final Spliterator<T> newUnderlying = underlying.trySplit();
            if (newUnderlying == null) {
                return null;
            }
            final boolean oldEven = even;
            if ((newUnderlying.estimateSize() & 1) == 1) {
                even = !even;
            }
            return new EvenSpliterator<>(newUnderlying, oldEven);
        }

        @Override
        public long estimateSize() {
            return underlying.estimateSize() >> 1;
        }

        @Override
        public int characteristics() {
            return underlying.characteristics();
        }
    }

    public static void main(String[] args) {
        final EvenSpliterator<Integer> spliterator =
            new EvenSpliterator<>(IntStream.range(1, 100000).parallel().mapToObj(Integer::valueOf).spliterator(), false);
        final List<Integer> result = StreamSupport.stream(spliterator, true).parallel().collect(Collectors.toList());
        final List<Integer> expected = IntStream.range(1, 100000 / 2).mapToObj(i -> i * 2).collect(Collectors.toList());
        if (result.equals(expected)) {
            System.out.println("Yay! Expected result.");
        }
    }
}
Following the @aioobe algorithm, here's another spliterator-based approach, as proposed by @Holger, but more concise, even if less efficient.
public static <T> Stream<T> filterOdd(Stream<T> src) {
    Spliterator<T> iter = src.spliterator();
    AbstractSpliterator<T> res = new AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED) {
        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            iter.tryAdvance(item -> {}); // discard
            return iter.tryAdvance(action); // use
        }
    };
    return StreamSupport.stream(res, false);
}
Then you can use it like
Stream<DomainObject> res = filterOdd(Files.lines(src))
    .map(line -> toDomainObject(line));

Custom WritableComparable displays object reference as output

I am new to Hadoop and Java, and I feel there is something obvious I am just missing. I am using Hadoop 1.0.3 if that means anything.
My goal for using Hadoop is to take a bunch of files and parse them one file at a time (as opposed to line by line). Each file will produce multiple key-values, but the context of the other lines is important. The key and value are multi-value/composite, so I have implemented WritableComparable for the key and Writable for the value. Because the processing of each file takes a bit of CPU, I want to save the output of the mapper, then run multiple reducers later on.
For the composite keys, I followed http://stackoverflow.com/questions/12427090/hadoop-composite-key
The problem is, the output is just Java object references as opposed to the composite key and value. Example:
LinkKeyWritable#bd2f9730 LinkValueWritable#8752408c
I am not sure if the problem is related to not reducing the data at all, or something else.
Here is my main class:
public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(Parser.class);
    conf.setJobName("raw_parser");

    conf.setOutputKeyClass(LinkKeyWritable.class);
    conf.setOutputValueClass(LinkValueWritable.class);

    conf.setMapperClass(RawMap.class);
    conf.setNumMapTasks(0);

    conf.setInputFormat(PerFileInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    PerFileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
}
And my Mapper class:
public class RawMap extends MapReduceBase implements
        Mapper<NullWritable, Text, LinkKeyWritable, LinkValueWritable> {

    public void map(NullWritable key, Text value,
            OutputCollector<LinkKeyWritable, LinkValueWritable> output,
            Reporter reporter) throws IOException {
        String json = value.toString();
        SerpyReader reader = new SerpyReader(json);
        GoogleParser parser = new GoogleParser(reader);
        for (String page : reader.getPages()) {
            String content = reader.readPageContent(page);
            parser.addPage(content);
        }
        for (Link link : parser.getLinks()) {
            LinkKeyWritable linkKey = new LinkKeyWritable(link);
            LinkValueWritable linkValue = new LinkValueWritable(link);
            output.collect(linkKey, linkValue);
        }
    }
}
Link is basically a struct of various information that gets split between LinkKeyWritable and LinkValueWritable.
LinkKeyWritable:
public class LinkKeyWritable implements WritableComparable<LinkKeyWritable> {
    protected Link link;

    public LinkKeyWritable() {
        super();
        link = new Link();
    }

    public LinkKeyWritable(Link link) {
        super();
        this.link = link;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        link.batchDay = in.readLong();
        link.source = in.readUTF();
        link.domain = in.readUTF();
        link.path = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(link.batchDay);
        out.writeUTF(link.source);
        out.writeUTF(link.domain);
        out.writeUTF(link.path);
    }

    @Override
    public int compareTo(LinkKeyWritable o) {
        return ComparisonChain.start()
            .compare(link.batchDay, o.link.batchDay)
            .compare(link.domain, o.link.domain)
            .compare(link.path, o.link.path)
            .result();
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(link.batchDay, link.source, link.domain, link.path);
    }

    @Override
    public boolean equals(final Object obj) {
        if (obj instanceof LinkKeyWritable) {
            final LinkKeyWritable o = (LinkKeyWritable) obj;
            return Objects.equal(link.batchDay, o.link.batchDay)
                && Objects.equal(link.source, o.link.source)
                && Objects.equal(link.domain, o.link.domain)
                && Objects.equal(link.path, o.link.path);
        }
        return false;
    }
}
LinkValueWritable:
public class LinkValueWritable implements Writable {
    protected Link link;

    public LinkValueWritable() {
        link = new Link();
    }

    public LinkValueWritable(Link link) {
        this.link = new Link();
        this.link.type = link.type;
        this.link.description = link.description;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        link.type = in.readUTF();
        link.description = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(link.type);
        out.writeUTF(link.description);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(link.type, link.description);
    }

    @Override
    public boolean equals(final Object obj) {
        if (obj instanceof LinkValueWritable) {
            final LinkValueWritable o = (LinkValueWritable) obj;
            return Objects.equal(link.type, o.link.type)
                && Objects.equal(link.description, o.link.description);
        }
        return false;
    }
}
I think the answer is in the implementation of the TextOutputFormat. Specifically, the LineRecordWriter's writeObject method:
/**
 * Write the object to the byte stream, handling Text as a special
 * case.
 * @param o the object to print
 * @throws IOException if the write throws, we pass it on
 */
private void writeObject(Object o) throws IOException {
    if (o instanceof Text) {
        Text to = (Text) o;
        out.write(to.getBytes(), 0, to.getLength());
    } else {
        out.write(o.toString().getBytes(utf8));
    }
}
As you can see, if your key or value is not a Text object, it calls the toString method on it and writes that out. Since you've left toString unimplemented in your key and value, it's using the Object class's implementation, which is writing out the reference.
I'd say that you should try writing an appropriate toString function or using a different OutputFormat.
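For instance, since the stated goal is to save the mapper output and reduce it later, a binary output format sidesteps toString() entirely. A sketch using the same old-style mapred API as the question:
// Stores keys and values in their serialized Writable form rather than as text:
conf.setOutputFormat(SequenceFileOutputFormat.class);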
It looks like you have a list of objects just like you wanted. You need to implement toString() on your writable if you want a human-readable version printed out instead of an ugly Java reference.
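For example, a minimal sketch of such a toString() on the key class (the tab-separated layout is an arbitrary choice; the fields come from the write() method above):
@Override
public String toString() {
    return link.batchDay + "\t" + link.source + "\t" + link.domain + "\t" + link.path;
}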
