String stream joining: stream has already been operated upon or closed - java-8

Using Java 8 to concatenate certain field values of an object with a "_". The last line in the code throws a "stream has already been operated upon or closed" exception.
Stream<Field> fields = ...
Stream<String> exclusions = ...
Stream<String> stringStream = fields.filter(f -> exclusions.anyMatch(e -> e.equals(f.getName())))
        .map(f -> {
            f.setAccessible(true);
            Object value = null;
            try {
                value = f.get(obj);
            } catch (IllegalAccessException e) {
                e.printStackTrace();
            }
            return value;
        })
        .filter(v -> v != null)
        .map(Object::toString);
String suffix = stringStream.collect(Collectors.joining("_"));
EDIT: I have tried this with:
List<Foo> list = new ArrayList<>();
list.stream().filter(item -> item != null).map(item -> {
    String value = null;
    return value;
}).filter(item -> item != null).map(item -> {
    String value = null;
    return value;
}).collect(Collectors.joining(""));
And there is no such exception.

How many times is the first filter called? More than once, right? The exclusions stream that you use in the first call to filter is consumed by anyMatch; thus the second time you try to use it, you get the exception.
The way to solve it would be to create a fresh stream on every single filter invocation:
filter(f -> sourceOfExclusions.stream().anyMatch...
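For example, a sketch of the fixed pipeline, assuming the exclusions live in a Collection<String> called sourceOfExclusions (the name is illustrative, not from the question):
String suffix = fields
        .filter(f -> sourceOfExclusions.stream().anyMatch(e -> e.equals(f.getName())))
        .map(f -> {
            f.setAccessible(true);
            try {
                return f.get(obj);
            } catch (IllegalAccessException e) {
                e.printStackTrace();
                return null;   // dropped by the nonNull filter below
            }
        })
        .filter(Objects::nonNull)
        .map(Object::toString)
        .collect(Collectors.joining("_"));
Collecting the exclusions into a Set<String> up front and checking excluded.contains(f.getName()) in the filter avoids re-streaming on every element and is usually the cleaner choice.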

Related

how to "group by" messages based on a header value

I'm trying to create a zip file based on the file extension, which follows this standard: filename.{NUMBER}. What I'm doing is reading a folder, grouping by .{number} and then creating a single .zip file with that .num at the end, for example:
folder /
file.01
file2.01
file.02
file2.02
folder -> /processed
file.01.zip which contains -> file.01, file2.01
file02.zip which contains -> file.02, file2.02
What I've done is use an outboundGateway, split the files, enrich the headers by reading the file extension, and then aggregate on that header, but it doesn't seem to work properly.
public IntegrationFlow integrationFlow() {
    return flow
            .handle(Ftp.outboundGateway(FTPServers.PC_LOCAL.getFactory(), AbstractRemoteFileOutboundGateway.Command.MGET, "payload")
                    .fileExistsMode(FileExistsMode.REPLACE)
                    .filterFunction(ftpFile -> {
                        int extensionIndex = ftpFile.getName().indexOf(".");
                        return extensionIndex != -1 && ftpFile.getName().substring(extensionIndex).matches("\\.([0-9]*)");
                    })
                    .localDirectory(new File("/tmp")))
            .split() // receiving an iterator, creates a message for each file
            .enrichHeaders(headerEnricherSpec -> headerEnricherSpec.headerExpression("warehouseId", "payload.getName().substring(payload.getName().indexOf('.') +1)"))
            .aggregate(aggregatorSpec -> aggregatorSpec.correlationExpression("headers['warehouseId']"))
            .transform(new ZipTransformer())
            .log(message -> {
                log.info(message.getHeaders().toString());
                return message;
            });
}
It's giving me a single message containing all the files; I would expect 2 messages.
Due to the nature of this DSL I have a dynamic number of files, so I couldn't count messages (files) ending with the same number, and I don't think a timeout would be a good release strategy. I just wrote the code on my own, without writing to disk:
.<List<File>, List<Message<ByteArrayOutputStream>>>transform(files -> {
    HashMap<String, ZipOutputStream> zipOutputStreamHashMap = new HashMap<>();
    HashMap<String, ByteArrayOutputStream> zipByteArrayMap = new HashMap<>();
    ArrayList<Message<ByteArrayOutputStream>> messageList = new ArrayList<>();
    files.forEach(file -> {
        String warehouseId = file.getName().substring(file.getName().indexOf('.') + 1);
        ZipOutputStream warehouseStream = zipOutputStreamHashMap.computeIfAbsent(warehouseId, s -> new ZipOutputStream(zipByteArrayMap.computeIfAbsent(s, s1 -> new ByteArrayOutputStream())));
        try {
            warehouseStream.putNextEntry(new ZipEntry(file.getName()));
            FileInputStream inputStream = new FileInputStream(file);
            byte[] bytes = new byte[4096];
            int length;
            while ((length = inputStream.read(bytes)) >= 0) {
                warehouseStream.write(bytes, 0, length);
            }
            inputStream.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    });
    zipOutputStreamHashMap.forEach((s, zipOutputStream) -> {
        try {
            zipOutputStream.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    });
    zipByteArrayMap.forEach((key, byteArrayOutputStream) -> {
        messageList.add(MessageBuilder.withPayload(byteArrayOutputStream).setHeader("warehouseId", key).build());
    });
    return messageList;
})
.split()
.transform(ByteArrayOutputStream::toByteArray)
.handle(Ftp.outboundAdapter(FTPServers.PC_LOCAL.getFactory(), FileExistsMode.REPLACE)
......

How can I convert Stream to IntStream

I want to implement a static method called youngWinners that, given a Stream<Winner>, returns a new Stream<Winner> containing the winners that are younger than 35, ordered alphabetically.
Inside my file I have: index, year, age, name, movie.
My problem is that I don't know how to convert a Stream into an IntStream to compare the age with 35. I also got a bit confused: do I have to use comparators to do this or not?
public static Stream<Winner> youngWinners(Stream<Winner> young) {
    // Stream<Winner> youngWin = young;
    String[] toString = young.toArray(s -> new String[s]);
    Arrays.stream(toString).flatMap((<any> f) -> {
        try {
            return Files.lines(Paths.get(f))
                    .filter(age -> int (age) <= 35 )
                    .mapToInt(a -> a.getWinnerage())
                    .map(WinneropsDB::new);
        } catch (Exception e) {
            System.out.println("error");
            return null;
        }
    });
    return null;
}
You don't need an IntStream at all.
Simply do:
return young.filter(w -> w.getAge() < 35)
.sorted(Comparator.comparing(Winner::getName));
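Putting it together, the whole method becomes (a minimal sketch, assuming Winner exposes getAge() and getName()):
public static Stream<Winner> youngWinners(Stream<Winner> young) {
    // "younger than 35" is a plain filter; sorting by name gives the alphabetical order
    return young.filter(w -> w.getAge() < 35)
                .sorted(Comparator.comparing(Winner::getName));
}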

adding parallel() to a stream causes NullPointerException

I'm trying to get my head around Java streams. It was my understanding that they provide an easy way to parallelize behaviour, and that not all operations benefit from parallelization, but that you always have the option to do it by just slapping .parallel() onto an existing stream. This might make the stream go slower in some cases, or return the elements in a different order at the end, etc., but you always have the option to parallelize a stream. That's why I got confused when I changed this method:
public static List<Integer> primeSequence() {
    List<Integer> list = new LinkedList<Integer>();
    IntStream.range(1, 10)
             .filter(x -> isPrime(x))
             .forEach(list::add);
    return list;
}
//returns {2,3,5,7}
to this:
public static List<Integer> primeSequence() {
    List<Integer> list = new LinkedList<Integer>();
    IntStream.range(1, 10).parallel()
             .filter(x -> isPrime(x))
             .forEach(list::add);
    return list;
}
//throws NullPointerException();
I thought all streams were serial unless otherwise stated and that parallel() just made them execute in parallel. What am I missing here? Why does it throw an exception?
There is one significant issue with your initial primeSequence method implementation: you mix stream iteration with outer list modification. You should avoid using streams that way, otherwise you will face a lot of problems, like the one you have described. If you take a look at how the add(E element) method is implemented, you will see something like this:
public boolean add(E e) {
    this.linkLast(e);
    return true;
}

void linkLast(E e) {
    LinkedList.Node<E> l = this.last;
    LinkedList.Node<E> newNode = new LinkedList.Node(l, e, (LinkedList.Node)null);
    this.last = newNode;
    if (l == null) {
        this.first = newNode;
    } else {
        l.next = newNode;
    }
    ++this.size;
    ++this.modCount;
}
Nothing in linkLast is synchronized, so when several threads append concurrently the first, last and size fields can be updated inconsistently; that race is what surfaces as the NullPointerException (and it can also silently lose elements). If you use CopyOnWriteArrayList instead of a LinkedList in your example, there will be no NullPointerException thrown, but only because CopyOnWriteArrayList uses locking to synchronize multi-threaded execution:
public boolean add(E e) {
    ReentrantLock lock = this.lock;
    lock.lock();
    boolean var6;
    try {
        Object[] elements = this.getArray();
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        this.setArray(newElements);
        var6 = true;
    } finally {
        lock.unlock();
    }
    return var6;
}
But it is still not the best way to utilize a parallel stream.
Correct way to use the Stream API
Consider the following modification to your code:
public static List<Integer> primeSequence() {
    return IntStream.range(1, 10)
                    .parallel()
                    .filter(x -> isPrime(x))
                    .boxed()
                    .collect(Collectors.toList());
}
Instead of modifying some outer list (of any kind), we collect the result and return a final list. You can transform any list to a stream using the .stream() method, and you don't have to worry about the initial list: none of the operations you apply to that stream will modify the input, and the result will be a copy of the input list.
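For completeness, a quick way to try it out (a minimal sketch; the isPrime helper below is just a stand-in for your own implementation):
// Stand-in isPrime helper, for illustration only.
static boolean isPrime(int n) {
    if (n < 2) return false;
    for (int i = 2; i * i <= n; i++) {
        if (n % i == 0) return false;
    }
    return true;
}

public static void main(String[] args) {
    // Prints [2, 3, 5, 7]; collect() keeps the encounter order even with .parallel().
    System.out.println(primeSequence());
}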
I hope it helps.

Java8: Filter and map on same method output

We are trying to refactor the code below to Java 8:
List<String> list = new ArrayList<>();
Iterator<Obj> i = x.iterator();
while (i.hasNext()) {
    String y = m(i.next().getKey());
    if (y != null) {
        list.add(y);
    }
}
return list;
So far we have come up with:
return x.stream()
        .filter(s -> m(s.getKey()) != null)
        .map(t -> m(t.getKey()))
        .collect(Collectors.toList());
But the method m() is being invoked twice here. Is there any way around this?
Well you can do the filtering after the mapping step:
x.stream()
 .map(s -> m(s.getKey()))
 .filter(Objects::nonNull)
 .collect(Collectors.toList());
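If you need to return the result as the original loop did, the same pipeline drops in directly (a sketch, assuming x is a collection of Obj and m() returns a String); m() is now invoked exactly once per element:
return x.stream()
        .map(s -> m(s.getKey()))      // map first, so m() runs once
        .filter(Objects::nonNull)     // then drop the nulls
        .collect(Collectors.toList());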

Load Json Data using Pig

I am trying to extract data from the JSON format mentioned below in Pig, using JsonLoader():
{"Partition":"10","Key":"618897","Properties2":[{"K":"A","T":"String","V":"M "}, {"K":"B","T":"String","V":"N"}, {"K":"D","T":"String","V":"O"}]}
{"Partition":"11","Key":"618900","Properties2":[{"K":"A","T":"String","V":"W”"},{"K":"B","T":"String","V":"X"}, {"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}]}
Right now I am able to extract the data for "Partition", "Key" and "V" of every array object with the following code:
A= LOAD '/home/hduser/abc.jon' Using JsonLoader('Partition:chararray,Key:chararray,Properties2:{(K:chararray,T:chararray,V:chararray)},Timestamp:chararray');
B= foreach A generate $0,$1,BagToString(Properties2.V,'\t') as vl:chararray;
store B into './Result/outPut2';
With the above code I am getting the "Properties2" array values by sequence, not by column, which creates a problem whenever the sequence changes or a new object appears.
Please help me extract the data by column (the K values).
Thanks In Advance
You have two options here:
1. Use elephant-bird, which will give you a map of the keys and values.
A = LOAD '/apps/pig/json_sample' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') as (json:map[]);
B = FOREACH A GENERATE json#'Partition',json#'Key',json#'Properties2';
dump B;
will give you an output of:
(10,618897,{([T#String,K#A,V#M ]),([T#String,K#B,V#N]),([T#String,K#D,V#O])})
(11,618900,{([T#String,K#A,V#W”]),([T#String,K#B,V#X]),([T#String,K#C,V#Y]),([T#String,K#D,V#Z])})
2. Or write a custom loader, which has to:
a) know the correct order of the values that will be coming for the key K;
b) go through each of these values and, if the JSON is missing any of these keys, return an empty/null value for that position.
I am posting the getNext() method of such a CustomJsonLoader, which does exactly this:
@Override
public Tuple getNext() throws IOException {
    try {
        boolean notDone = in.nextKeyValue();
        if (!notDone) {
            return null;
        }
        Text value = (Text) in.getCurrentValue();
        List<String> valueList = new ArrayList<String>();
        if (value != null) {
            String jsonString = value.toString();
            System.out.println(jsonString);
            JSONParser parser = new JSONParser();
            JSONObject obj = null;
            try {
                obj = (JSONObject) parser.parse(jsonString);
            } catch (ParseException e) {
                e.printStackTrace();
            }
            System.out.println("obj is " + obj);
            if (obj != null) {
                String partition = (String) obj.get("Partition");
                String key = (String) obj.get("Key");
                valueList.add(partition);
                valueList.add(key);
                JSONArray innArr = (JSONArray) obj.get("Properties2");
                // Expected order of the K values; a missing key becomes an empty string.
                char[] innKeys = new char[] { 'A', 'B', 'C', 'D' };
                Map<String, String> keyMap = new HashMap<String, String>();
                for (Object innObj : innArr) {
                    JSONObject jsonObj = (JSONObject) innObj;
                    keyMap.put(jsonObj.get("K") + "", jsonObj.get("V") + "");
                }
                for (int i = 0; i < innKeys.length; i++) {
                    char ch = innKeys[i];
                    if (keyMap.containsKey(ch + "")) {
                        valueList.add(keyMap.get(ch + ""));
                    } else {
                        valueList.add("");
                    }
                }
                Tuple t = tupleFactory.newTuple(valueList);
                return t;
            }
        }
        return null;
    } catch (InterruptedException e) {
        // fall through and signal end of input
    }
    return null;
}
Register it and run:
REGISTER udf/CustomJsonLoader.jar
A = LOAD '/apps/pig/json_sample' USING CustomJsonLoader();
DUMP A;
(10,618897,M,N,,O)
(11,618900,W,X,Y,Z)
Hope this helps!

Resources