I have two Java Stream<String> instances, A and B.
How can I, at each step, given a predicate p, pick an element from either A or B? The element that was not picked has to stay at the head of its stream so it can be picked on the next attempt.
You can use a zip method. The standard library doesn't include one, but you can copy the source shown below (from this question).
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

class Main {
    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("a1", "a2", "a3");
        List<String> list2 = Arrays.asList("b1", "b2", "b3");
        BiFunction<String, String, String> picker = (a, b) -> {
            // pick whether you want a from list1, or b from list2
            return a;
        };
        List<String> result =
                StreamUtils.zip(list1.stream(), list2.stream(), picker)
                        .collect(Collectors.toList());
        System.out.println(result);
    }
}
StreamUtils.java
import java.util.Objects;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
import java.util.function.BiFunction;

public class StreamUtils {
    public static <A, B, C> Stream<C> zip(Stream<? extends A> a,
                                          Stream<? extends B> b,
                                          BiFunction<? super A, ? super B, ? extends C> zipper) {
        Objects.requireNonNull(zipper);
        Spliterator<? extends A> aSpliterator = Objects.requireNonNull(a).spliterator();
        Spliterator<? extends B> bSpliterator = Objects.requireNonNull(b).spliterator();

        // Zipping loses DISTINCT and SORTED characteristics
        int characteristics = aSpliterator.characteristics() & bSpliterator.characteristics() &
                ~(Spliterator.DISTINCT | Spliterator.SORTED);

        long zipSize = ((characteristics & Spliterator.SIZED) != 0)
                ? Math.min(aSpliterator.getExactSizeIfKnown(), bSpliterator.getExactSizeIfKnown())
                : -1;

        Iterator<A> aIterator = Spliterators.iterator(aSpliterator);
        Iterator<B> bIterator = Spliterators.iterator(bSpliterator);
        Iterator<C> cIterator = new Iterator<C>() {
            @Override
            public boolean hasNext() {
                return aIterator.hasNext() && bIterator.hasNext();
            }

            @Override
            public C next() {
                return zipper.apply(aIterator.next(), bIterator.next());
            }
        };

        Spliterator<C> split = Spliterators.spliterator(cIterator, zipSize, characteristics);
        return (a.isParallel() || b.isParallel())
                ? StreamSupport.stream(split, true)
                : StreamSupport.stream(split, false);
    }
}
If the two input streams are already sorted by the comparator you merge by, the concatenated stream only needs to be sorted with that same comparator; the result is effectively a merge sort. So you just need to concat the two streams and then sort:
final Stream<String> a = Stream.of("a", "b", "c");
final Stream<String> b = Stream.of("1", "2", "3");
// lambda parameters renamed so they don't shadow the streams a and b
Stream.concat(a, b).sorted((s1, s2) -> mergeFunction /* TODO */);
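For example, if both input streams are sorted in natural order, natural ordering can play the role of the merge comparator. A minimal sketch (note that sorted() buffers all elements, so this is not a lazy merge):

final Stream<String> a = Stream.of("a", "c", "e");
final Stream<String> b = Stream.of("b", "d", "f");
Stream.concat(a, b)
      .sorted(Comparator.naturalOrder())
      .forEach(System.out::println); // a b c d e f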
Otherwise, you will probably need the stream API in abacus-common:
final Stream<String> a = Stream.of("a", "b", "c");
final Stream<String> b = Stream.of("1", "2", "3");
Stream.merge(a, b, mergeFunction);
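For reference, the predicate-driven pick the question describes can be sketched with plain iterators on top of the standard library. This is only an illustration of the idea (eager, and it assumes neither stream contains null elements), not the abacus-common implementation:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Stream;

public class MergeByPredicate {

    // Picks whichever head the predicate selects; the element that was
    // not picked stays available as a head for the next comparison.
    static <T> Stream<T> merge(Stream<T> a, Stream<T> b, BiPredicate<T, T> takeFromA) {
        Iterator<T> itA = a.iterator(), itB = b.iterator();
        List<T> out = new ArrayList<>();
        T headA = itA.hasNext() ? itA.next() : null;
        T headB = itB.hasNext() ? itB.next() : null;
        while (headA != null && headB != null) {
            if (takeFromA.test(headA, headB)) {
                out.add(headA);
                headA = itA.hasNext() ? itA.next() : null;
            } else {
                out.add(headB);
                headB = itB.hasNext() ? itB.next() : null;
            }
        }
        // drain whichever stream still has elements left
        if (headA != null) { out.add(headA); itA.forEachRemaining(out::add); }
        if (headB != null) { out.add(headB); itB.forEachRemaining(out::add); }
        return out.stream();
    }

    public static void main(String[] args) {
        Stream<String> a = Stream.of("a1", "a3", "b2");
        Stream<String> b = Stream.of("a2", "b1", "b3");
        // predicate p: always take the lexicographically smaller head
        merge(a, b, (x, y) -> x.compareTo(y) <= 0).forEach(System.out::println);
        // prints a1 a2 a3 b1 b2 b3
    }
}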
I was going through How to remove a key from HashMap while iterating over it?, but my requirement is a bit different.
import java.util.*;
import java.util.stream.*;

class Main {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put("RED", "#FF0000");
        hashMap.put("BLACK", null);
        hashMap.put("BLUE", "#0000FF");
        hashMap.put("GREEN", "#008000");
        hashMap.put("WHITE", null);
        // I want a result like below - get all keys whose value is null
        List<String> collect = hashMap.values()
                .stream()
                .filter(e -> e == null)
                .collect(Collectors.toList());
        System.out.println(collect);
        // Desired result - BLACK, WHITE in the list
        // (this attempt collects the null values themselves, not the keys)
    }
}
Try this:
import java.util.*;
import java.util.stream.*;

class Main {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put("RED", "#FF0000");
        hashMap.put("BLACK", null);
        hashMap.put("BLUE", "#0000FF");
        hashMap.put("GREEN", "#008000");
        hashMap.put("WHITE", null);
        // get all keys whose value is null
        List<String> collect = hashMap.keySet()
                .stream()
                .filter(e -> Objects.isNull(hashMap.get(e)))
                .collect(Collectors.toList());
        System.out.println(collect);
        // Result - BLACK, WHITE in the list
    }
}
As pointed out in the comments, you can try this as well:
import java.util.*;
import java.util.stream.*;

class Main {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put("RED", "#FF0000");
        hashMap.put("BLACK", null);
        hashMap.put("BLUE", "#0000FF");
        hashMap.put("GREEN", "#008000");
        hashMap.put("WHITE", null);
        // get all keys whose value is null
        List<String> collect = hashMap.entrySet()
                .stream()
                .filter(e -> Objects.isNull(e.getValue()))
                .map(e -> e.getKey())
                .collect(Collectors.toList());
        System.out.println(collect);
        // Result - BLACK, WHITE in the list
    }
}
This is more efficient than the first solution, since it iterates the entries directly instead of performing an extra hashMap.get(...) lookup for every key.
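If the end goal, as in the linked question, is to actually remove those entries rather than just list the keys, the map's collection views make that a one-liner; a minimal sketch using only standard java.util methods:

// removing from the values view removes the backing map entries
hashMap.values().removeIf(Objects::isNull);
// BLACK and WHITE are now gone from the map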
I have the numbers from 1 to 10,000 stored in an array of longs. Adding them sequentially gives a result of 50,005,000.
I have written a Spliterator where, if the size of the array is greater than 1,000, it is split off into another array.
Here is my code. But when I run it, the result of the addition is far greater than 50,005,000. Can someone tell me what is wrong with my code?
Thank you so much.
import java.util.Arrays;
import java.util.Optional;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.LongStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class SumSpliterator implements Spliterator<Long> {

    private final long[] numbers;
    private int currentPosition = 0;

    public SumSpliterator(long[] numbers) {
        super();
        this.numbers = numbers;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Long> action) {
        action.accept(numbers[currentPosition++]);
        return currentPosition < numbers.length;
    }

    @Override
    public long estimateSize() {
        return numbers.length - currentPosition;
    }

    @Override
    public int characteristics() {
        return SUBSIZED;
    }

    @Override
    public Spliterator<Long> trySplit() {
        int currentSize = numbers.length - currentPosition;
        if (currentSize <= 1_000) {
            return null;
        } else {
            currentPosition = currentPosition + 1_000;
            return new SumSpliterator(Arrays.copyOfRange(numbers, 1_000, numbers.length));
        }
    }

    public static void main(String[] args) {
        long[] twoThousandNumbers = LongStream.rangeClosed(1, 10_000).toArray();
        Spliterator<Long> spliterator = new SumSpliterator(twoThousandNumbers);
        Stream<Long> stream = StreamSupport.stream(spliterator, false);
        System.out.println(sumValues(stream));
    }

    private static long sumValues(Stream<Long> stream) {
        Optional<Long> optional = stream.reduce((t, u) -> t + u);
        return optional.get() != null ? optional.get() : Long.valueOf(0);
    }
}
I have the strong feeling that you didn’t get the purpose of splitting right. It’s not meant to copy the underlying data but just provide access to a range of it. Keep in mind that spliterators provide read-only access. So you should pass the original array to the new spliterator and configure it with an appropriate position and length instead of copying the array.
But besides the inefficiency of copying, the logic is wrong: you pass Arrays.copyOfRange(numbers, 1_000, numbers.length) to the new spliterator, so the new spliterator covers the elements from position 1000 to the end of the array, and you advance the current spliterator’s position by 1000, so the old spliterator covers the elements from currentPosition + 1_000 to the end of the array. Both spliterators will therefore cover elements at the end of the array, while, depending on the previous value of currentPosition, elements at the beginning might not be covered at all. When you want to advance currentPosition by 1_000, the split-off range is expressed by Arrays.copyOfRange(numbers, currentPosition, currentPosition + 1_000) instead, referring to the value of currentPosition before advancing.
It should also be noted that a spliterator should attempt to split balanced, that is, in the middle if the size is known. So splitting off a fixed thousand elements is not the right strategy for an array.
Further, your tryAdvance method is wrong. It should not test after calling the consumer but before, returning false if there are no more elements, which also implies that the consumer has not been called.
Putting it all together, the implementation may look like this:
public class MyArraySpliterator implements Spliterator<Long> {

    private final long[] numbers;
    private int currentPosition, endPosition;

    public MyArraySpliterator(long[] numbers) {
        this(numbers, 0, numbers.length);
    }

    public MyArraySpliterator(long[] numbers, int start, int end) {
        this.numbers = numbers;
        currentPosition = start;
        endPosition = end;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Long> action) {
        if (currentPosition < endPosition) {
            action.accept(numbers[currentPosition++]);
            return true;
        }
        return false;
    }

    @Override
    public long estimateSize() {
        return endPosition - currentPosition;
    }

    @Override
    public int characteristics() {
        return ORDERED | NONNULL | SIZED | SUBSIZED;
    }

    @Override
    public Spliterator<Long> trySplit() {
        if (estimateSize() <= 1000) return null;
        int middle = (endPosition + currentPosition) >>> 1;
        MyArraySpliterator prefix
                = new MyArraySpliterator(numbers, currentPosition, middle);
        currentPosition = middle;
        return prefix;
    }
}
But of course, it’s recommended to provide a specialized forEachRemaining implementation, where possible:
@Override
public void forEachRemaining(Consumer<? super Long> action) {
    int pos = currentPosition, end = endPosition;
    currentPosition = end;
    for (; pos < end; pos++) action.accept(numbers[pos]);
}
As a final note, for the task of summing longs from an array, a Spliterator.OfLong and a LongStream are preferred, and that work has already been done; see Arrays.spliterator() and LongStream.sum(). They make the whole task as simple as Arrays.stream(numbers).sum().
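To make that concrete, the whole program above reduces to a few lines using only the standard library:

import java.util.Arrays;
import java.util.stream.LongStream;

public class Sum {
    public static void main(String[] args) {
        long[] numbers = LongStream.rangeClosed(1, 10_000).toArray();
        // Arrays.stream(long[]) is backed by a well-tested array spliterator
        System.out.println(Arrays.stream(numbers).sum());            // 50005000
        // the parallel variant splits balanced, as described above
        System.out.println(Arrays.stream(numbers).parallel().sum()); // 50005000
    }
}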
I have the following setup: The mapper outputs records with key type K1 and value type V1, K1 being WritableComparable. The combiner thus gets K1 and Iterable<V1> as its input. It then does an aggregation and outputs exactly one K1, V1 record. The reducer takes the input from the combiners, again being K1, Iterable<V1>. To my understanding, there must exist exactly one K1, Iterable<V1> pair for each individual K1 at the Reduce phase. The reducer then outputs exactly one K2, V2. K2 is WritableComparable again.
My problem now is: I get multiple K2, V2 pairs for the same K2 in my output files, even in the same file! The compare methods of my key classes are correct; I double-checked them. What is going wrong here? Do I also have to implement equals and hashCode? I thought equality was carried out by comparing and checking whether the compare result is 0.
Or are there other things I forgot?
Here are the key implementations:
The writable the key inherits from:
package somepackage;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class SomeWritable implements Writable {

    private String _string1;
    private String _string2;

    public SomeWritable() {
        super();
    }

    public String getString1() {
        return _string1;
    }

    public void setString1(final String string1) {
        _string1 = string1;
    }

    public String getString2() {
        return _string2;
    }

    public void setString2(final String string2) {
        _string2 = string2;
    }

    @Override
    public void write(final DataOutput out) throws IOException {
        out.writeUTF(_string1);
        out.writeUTF(_string2);
    }

    @Override
    public void readFields(final DataInput in) throws IOException {
        _string1 = in.readUTF();
        _string2 = in.readUTF();
    }
}
The key I use:
package somepackage;

import static org.apache.commons.lang.ObjectUtils.compare;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class SomeKey extends SomeWritable implements
        WritableComparable<SomeKey> {

    private String _someOtherString;

    public String getSomeOtherString() {
        return _someOtherString;
    }

    public void setSomeOtherString(final String someOtherString) {
        _someOtherString = someOtherString;
    }

    @Override
    public void write(final DataOutput out) throws IOException {
        super.write(out);
        out.writeUTF(_someOtherString);
    }

    @Override
    public void readFields(final DataInput in) throws IOException {
        super.readFields(in);
        _someOtherString = in.readUTF();
    }

    @Override
    public int compareTo(final SomeKey o) {
        if (o == null) {
            return 1;
        }
        if (o == this) {
            return 0;
        }
        final int c1 = compare(_someOtherString, o._someOtherString);
        if (c1 != 0) {
            return c1;
        }
        final int c2 = compare(getString1(), o.getString1());
        if (c2 != 0) {
            return c2;
        }
        return compare(getString2(), o.getString2());
    }
}
I solved the problem: to make sure the same key is always distributed to the same reducer, hashCode() for the key must be implemented based on the current values in the key, even though they are mutable. With this in place, everything works.
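A minimal sketch of what that could look like for the SomeKey class above; the field choice mirrors compareTo, but treat this as my illustration rather than the exact code from the original project:

@Override
public int hashCode() {
    // must be based on the same (current) field values that compareTo uses
    return java.util.Objects.hash(getSomeOtherString(), getString1(), getString2());
}

@Override
public boolean equals(final Object obj) {
    if (this == obj) {
        return true;
    }
    if (!(obj instanceof SomeKey)) {
        return false;
    }
    return compareTo((SomeKey) obj) == 0;
}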
One must then be extremely careful not to use such mutable types in hash-based sets or as keys in maps, since mutating an instance after insertion breaks lookups.
I'm just looking for some guidance on how I would get the empty methods below to respond to my hardcoded ArrayList (and HashMap if needed).
I'll understand if no one can help me out directly; just some good advice would help.
import java.io.IOException;
import java.text.ParseException;
import java.util.Comparator;
import java.util.InputMismatchException;
import java.util.List;
import java.util.Map.Entry;
import java.util.Scanner;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.ListIterator;
import java.util.Collections;
import java.util.Set;

@SuppressWarnings("unused")
public class Inventory
{
    Scanner in = new Scanner(System.in);
    ArrayList<Sellable> groceries;
    HashMap<String, Integer> stock;

    public Inventory()
    {
        groceries = new ArrayList<Sellable>();
        stock = new HashMap<String, Integer>();
        // HARDCODING...:
        Sellable n1 = new Produce("Corn", 3, 3.00);
        Sellable n2 = new Snack("Natural Popcorn Seeds", 2.50);
        Sellable n3 = new Produce("Potatoes", 3, 3.00);
        Sellable n4 = new Snack("Organic Potato Chips", 2.50);
        Sellable n5 = new Produce("Apples", 5, 1.75);
        Sellable n6 = new Snack("Apple Juice - 128 oz.", 3.50);
        Sellable n7 = new Produce("Oranges", 5, 1.75);
        Sellable n8 = new Snack("Orange Juice - 128 oz.", 3.50);
        // ADD TO LIST
        groceries.add(n1);
        groceries.add(n2);
        groceries.add(n3);
        groceries.add(n4);
        groceries.add(n5);
        groceries.add(n6);
        groceries.add(n7);
        groceries.add(n8);
        // PUT UP FOR PRINTING
        stock.put(n1.getName(), 50);
        stock.put(n2.getName(), 100);
        stock.put(n3.getName(), 50);
        stock.put(n4.getName(), 100);
        stock.put(n5.getName(), 50);
        stock.put(n6.getName(), 100);
        stock.put(n7.getName(), 50);
        stock.put(n8.getName(), 100);
    }

    public void add(Sellable SE)
    {
    }

    public boolean decrementStock(String name)
    {
    }

    public boolean decrementStock(Sellable SE)
    {
    }

    public boolean incrementStock(String SE)
    {
    }
}
You can fill the methods in like this:

public void add(Sellable SE)
{
    groceries.add(SE);
}

public void decrementStock(String name)
{
    Integer val = stock.get(name);
    if (val != null) stock.put(name, val - 1); // val-- would store the old value back
}

public void decrementStock(Sellable SE)
{
    decrementStock(SE.getName());
}

public void incrementStock(String name)
{
    Integer val = stock.get(name);
    if (val != null) stock.put(name, val + 1);
}
I switched the boolean methods to void because I didn't know the specification for how the boolean return value should be chosen.
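If the boolean return value was meant to report whether anything actually changed, a possible sketch of that contract (my assumption about the intended semantics, not part of the original assignment):

public boolean decrementStock(String name)
{
    Integer val = stock.get(name);
    if (val == null || val == 0) {
        return false; // unknown item, or nothing left in stock
    }
    stock.put(name, val - 1);
    return true;
}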
(How) Can I use Bigram Features with the OpenNLP Document Classifier?
I have a collection of very short documents (titles, phrases, and sentences), and I would like to add bigram features, of the kind used in the tool LibShortText
http://www.csie.ntu.edu.tw/~cjlin/libshorttext/
Is this possible?
The documentation only explains how to do this with the Name Finder, using the BigramNameFeatureGenerator(), and not with the Document Classifier.
I believe the trainer and classifier allow custom feature generators in their methods, but they must be implementations of FeatureGenerator, and BigramNameFeatureGenerator is not an implementation of that. So I made a quick implementation as an inner class below. Try this (untested) code when you get a chance:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;
import opennlp.tools.doccat.DocumentSample;
import opennlp.tools.doccat.DocumentSampleStream;
import opennlp.tools.doccat.FeatureGenerator;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

public class DoccatUsingBigram {

    public static void main(String[] args) throws IOException {
        InputStream dataIn = new FileInputStream(args[0]);
        try {
            ObjectStream<String> lineStream =
                    new PlainTextByLineStream(dataIn, "UTF-8");
            // here you can use it as part of building the model
            ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
            DoccatModel model = DocumentCategorizerME.train("en", sampleStream, 10, 100,
                    new MyBigramFeatureGenerator());

            // now you would use it like this
            DocumentCategorizerME classifier = new DocumentCategorizerME(model);
            String[] someData = "whatever you are trying to classify".split(" ");
            Collection<String> bigrams = new MyBigramFeatureGenerator().extractFeatures(someData);
            double[] categorize = classifier.categorize(bigrams.toArray(new String[bigrams.size()]));
        } catch (IOException e) {
            // Failed to read or parse training data, training failed
            e.printStackTrace();
        }
    }

    public static class MyBigramFeatureGenerator implements FeatureGenerator {

        @Override
        public Collection<String> extractFeatures(String[] text) {
            return generate(Arrays.asList(text), 2, "");
        }

        private List<String> generate(List<String> input, int n, String separator) {
            List<String> outGrams = new ArrayList<String>();
            for (int i = 0; i < input.size() - (n - 2); i++) {
                String gram = "";
                if ((i + n) <= input.size()) {
                    for (int x = i; x < (n + i); x++) {
                        gram += input.get(x) + separator;
                    }
                    gram = gram.substring(0, gram.lastIndexOf(separator));
                    outGrams.add(gram);
                }
            }
            return outGrams;
        }
    }
}
hope this helps...
You can use the NGramFeatureGenerator class in OpenNLP [1] for your use case.
[1] https://github.com/apache/opennlp
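The exact wiring depends on the OpenNLP version; on the 1.8.x line it looks roughly like the sketch below. Treat the DoccatFactory constructor and the train overload as assumptions to verify against your version's Javadoc:

import opennlp.tools.doccat.BagOfWordsFeatureGenerator;
import opennlp.tools.doccat.DoccatFactory;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;
import opennlp.tools.doccat.DocumentSample;
import opennlp.tools.doccat.FeatureGenerator;
import opennlp.tools.doccat.NGramFeatureGenerator;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.TrainingParameters;

public class NGramDoccatSketch {
    public static DoccatModel train(ObjectStream<DocumentSample> samples) throws Exception {
        // unigram features plus n-gram features, useful for very short documents
        FeatureGenerator[] generators = {
                new BagOfWordsFeatureGenerator(),
                new NGramFeatureGenerator() // default n-gram range; check the Javadoc
        };
        return DocumentCategorizerME.train("en", samples,
                TrainingParameters.defaultParams(), new DoccatFactory(generators));
    }
}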