How to get the size of a stream and return the original stream in Java 8?

I am trying to do something like this:
Stream<Student> allStudent = studentRepo.findAll();
long count = allStudent.count();
then
return allStudent;
But the problem: count() is a terminal operation, and after calling it I am not able to return the stream.
The reason for doing this is to stream all student records over Kafka and at the same time send the record count to the consumer.

Well, if the stream has the SIZED characteristic, you can get the size from its spliterator:
Spliterator<Student> spliterator = stream.spliterator();
long count = spliterator.getExactSizeIfKnown();
...
return StreamSupport.stream(spliterator, stream.isParallel());
But if getExactSizeIfKnown returns -1, either save the stream to an intermediate collection, get its size, and then use the stream() method to return the data, or think of something else.
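Putting the SIZED path and the collection fallback together, a minimal sketch (the streamWithCount name and the long[] out-parameter are just illustration devices, not part of any API; Student is the question's type):
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

static Stream<Student> streamWithCount(Stream<Student> stream, long[] countOut) {
    Spliterator<Student> spliterator = stream.spliterator();
    long exact = spliterator.getExactSizeIfKnown();
    if (exact >= 0) {
        // SIZED stream: the count is known without consuming any elements.
        countOut[0] = exact;
        return StreamSupport.stream(spliterator, stream.isParallel());
    }
    // Unknown size: buffer into a list, count, and re-stream.
    List<Student> buffered = StreamSupport.stream(spliterator, false)
            .collect(Collectors.toList());
    countOut[0] = buffered.size();
    return buffered.stream();
}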

Count using peek() while collecting to a list, then stream that:
AtomicInteger count = new AtomicInteger();
allStudent = allStudent
        .peek(o -> count.incrementAndGet())
        .collect(toList())
        .stream();
// do something with count
return allStudent;
Or, more mundanely:
List<Student> students = allStudent.collect(toList());
long count = students.size();
return students.stream();
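If the goal from the question is to hand both the count and the records to the Kafka-producing code, the buffered variant can be wrapped in a small holder; a hypothetical sketch (CountedStudents is not part of any API):
import java.util.List;
import java.util.stream.Stream;

// Pairs the record count with a replayable source: unlike a one-shot
// Stream field, stream() can be called fresh for each consumer.
final class CountedStudents {
    final long count;
    private final List<Student> students;

    CountedStudents(List<Student> students) {
        this.count = students.size();
        this.students = students;
    }

    Stream<Student> stream() {
        return students.stream();
    }
}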

Related

Is it possible to get a top 10 from a KTable/KStream?

I have a topic with a String key, which is a signal type, and a Signal value, which is a class like this:
public class Signal {
    public final int deviceId;
    public final int value;
    ...
}
Each device can send signal values which rise or fall over time without a pattern.
Is it possible to get the top 10 devices with the max signal value over the whole period of time, per type (the key of the topic), as a KTable<String, Signal>? Would it help if all signal values were rising?
The topic structure can be changed if needed.
It is possible to do with Kafka Streams for the case when values are always rising, for example. You need to create your own Top10 aggregate, which stores the top 10 and updates it on each add call:
final var builder = new StreamsBuilder();
final var topTable = builder
        .table(
                SignalChange.TOPIC_NAME,
                Consumed.with(Serdes.String(), new SignalChange.Serde())
        ).toStream()
        .groupByKey()
        .aggregate(
                () -> new Top10(),
                (k, v, top10) -> top10.add(v),
                Materialized.with(Serdes.String(), new Top10.Serde())
        );
topTable can then be joined with any stream requesting the top.
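For illustration, a minimal sketch of what such a Top10 aggregate could look like. This is an assumption-based sketch: it presumes values only rise (so a device's latest value is also its maximum) and omits the Top10.Serde that Materialized.with needs:
import java.util.Comparator;
import java.util.TreeSet;

public class Top10 {
    // Highest value first; deviceId breaks ties so distinct devices
    // with equal values don't collapse into one entry.
    private final TreeSet<Signal> top = new TreeSet<>(
            Comparator.comparingInt((Signal s) -> s.value).reversed()
                      .thenComparingInt(s -> s.deviceId));

    public Top10 add(Signal signal) {
        // Replace any previous entry for this device (valid when values only rise).
        top.removeIf(s -> s.deviceId == signal.deviceId);
        top.add(signal);
        // Keep only the 10 largest values.
        while (top.size() > 10) {
            top.pollLast();
        }
        return this;
    }
}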

Assign random UUID on a key's first occurrence in a stream

I'm looking for a solution on how to assign a random UUID to a key only on its first occurrence in a stream.
Example:
time   key   value   assigned uuid
 |     1     A       fff17a1e-9943-11eb-a8b3-0242ac130003
 |     2     B       f01d2c42-9943-11eb-a8b3-0242ac130003
 |     3     C       f8f1e880-9943-11eb-a8b3-0242ac130003
 |     1     X       fff17a1e-9943-11eb-a8b3-0242ac130003 (same as above)
 v     1     Y       fff17a1e-9943-11eb-a8b3-0242ac130003 (same as above)
As you can see, fff17a1e-9943-11eb-a8b3-0242ac130003 is assigned to key "1" on its first occurrence and subsequently reused on its second and third occurrences. The order doesn't matter, though. There is no seed for the generated UUID either.
My idea was to use a leftJoin() with a KStream and a KTable of key/UUID mappings. If the right side of the leftJoin is null, I have to create a new UUID and add it to the mapping table. However, I think this does not work when several new entries with the same key arrive within a short period of time; I guess this would create several UUIDs for the same key.
Is there an easy solution for this or is this simply not possible with streaming?
I don't think you need a join in your use case, because joins merge two different streams whose records arrive with equal IDs. You said that you receive just one stream of events, so your use case is an aggregation over one stream.
What I understood from your question is that you receive events A, B, C, ... and then want to assign some ID. You say that the ID is random, which is very uncertain: if it were truly random, how would you know that A -> fff17a1e-9943-11eb-a8b3-0242ac130003 and X -> fff17a1e-9943-11eb-a8b3-0242ac130003 (the same)? I suppose you might have a seed to generate this UUID, and that you then also create the key based on this seed.
I suggest you start with this word-count sample. Then take the first map:
.map((key, value) -> new KeyValue<>(value, value))
and replace it with your map function, something like this:
.map((k, v) -> {
    if (v.equalsIgnoreCase("A")) {
        return new KeyValue<String, ValueWithUUID>("1", new ValueWithUUID(v));
    } else if (v.equalsIgnoreCase("B")) {
        return new KeyValue<String, ValueWithUUID>("2", new ValueWithUUID(v));
    } else {
        return new KeyValue<String, ValueWithUUID>("0", new ValueWithUUID(v));
    }
})
...
class ValueWithUUID {
    String value;
    String uuid;

    public ValueWithUUID(String value) {
        this.value = value;
        // Generate your UUID based on the value. It is random, but as shown in the question it might have a seed.
        this.uuid = generateRandomUUIDWithSeed();
    }

    public String generateRandomUUIDWithSeed() {
        return "fff17a1e-9943-11eb-a8b3-0242ac130003";
    }
}
Then you decide if you want a windowed aggregation, every 30 seconds for instance, or a non-windowed aggregation that updates the result for every event that arrives. Here is one nice example.
You can aggregate the raw stream into a KTable, generating or reusing the UUID in the aggregator, and then use the KTable's stream:
final KStream<String, String> streamWithoutUUID = builder.stream("topic_name");
KTable<String, String> tableWithUUID = streamWithoutUUID.groupByKey().aggregate(
        () -> "",
        (k, v, t) -> {
            if (!t.startsWith("uuid:")) {
                return "uuid:" + "call your buildUUID function here" + ";value:" + v;
            } else {
                return t.split(";", 2)[0] + ";value:" + v;
            }
        },
        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("state_name")
                .withKeySerde(Serdes.String()).withValueSerde(Serdes.String()));
final KStream<String, String> streamWithUUID = tableWithUUID.toStream();
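As an aside, if a UUID that is merely stable per key (rather than truly random) is acceptable, a name-based UUID derived from the key sidesteps the duplicate-UUID race entirely. A minimal sketch of the buildUUID placeholder mentioned above:
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Deterministically maps the same key to the same UUID on every call,
// so concurrent first occurrences of a key can never disagree.
static String buildUUID(String key) {
    return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8)).toString();
}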

Is it possible for a Kafka Streams application to write multiple outputs from a single input?

I'm unsure if kafka-streams is the correct solution for a problem I'm trying to solve. I'd like to be able to use it because of the parallelism and fault tolerance it provides, but I'm struggling to come up with a way to achieve a desired processing pipeline.
The pipeline is something like this:
A record of some type arrives on an input topic
Information in this record is used to perform a database query, which returns many results
I'd like to be able to write out each result as an individual record, with its own key, rather than as a collection of results in a single record.
Ignoring the single output record per result requirement for a moment, I have code that looks like this:
Serde<String> stringSerde = Serdes.String();
JsonSerde<MyInput> inputSerde = new JsonSerde<>();
JsonSerde<List<MyOutput>> outputSerde = new JsonSerde<>();
Consumed<String, MyInput> consumer = Consumed.with(stringSerde, inputSerde);
KStream<String, MyInput> receiver = builder.stream("input-topic", consumer);
KStream<String, List<MyOutput>> outputs = receiver.mapValues(this::mapInputToManyOutputs);
outputs.to("output-topic", Produced.with(stringSerde, outputSerde));
This is simple enough: one message in, one message (albeit a collection) out.
What I'd like to be able to do is something like:
Serde<String> stringSerde = Serdes.String();
JsonSerde<MyInput> inputSerde = new JsonSerde<>();
JsonSerde<MyOutput> outputSerde = new JsonSerde<>();
Consumed<String, MyInput> consumer = Consumed.with(stringSerde, inputSerde);
KStream<String, MyInput> receiver = builder.stream("input-topic", consumer);
KStream<String, List<MyOutput>> outputs = receiver.mapValues(this::mapInputToManyOutputs);
KStream<String, MyOutput> sink = outputs.???
sink.to("output-topic", Produced.with(stringSerde, outputSerde));
I cannot come up with anything sensible for an operation or operations to perform on the outputs stream.
Any suggestions? Or is kafka-streams maybe not the right solution to a problem like this?
Yes, it's possible; for that you need to use the KStream flatMap transformation. flatMap transforms each record of the input stream into zero or more records in the output stream (both key and value types can be altered arbitrarily):
kStream = kStream.flatMap(
        (key, value) -> {
            List<KeyValue<String, MyOutput>> result = new ArrayList<>();
            // do your logic here
            return result;
        });
kStream.to("output-topic", Produced.with(stringSerde, outputSerde));
Thanks, Vasiliy, flatMap was indeed what I needed. I looked at it earlier and thought it was the right operation, but then got confused and mistakenly discarded it.
Combining what I had before with your suggestion, the following works, assuming MyOutput implements a method called getKey():
Serde<String> stringSerde = Serdes.String();
JsonSerde<MyInput> inputSerde = new JsonSerde<>();
JsonSerde<MyOutput> outputSerde = new JsonSerde<>();
Consumed<String, MyInput> consumer = Consumed.with(stringSerde, inputSerde);
KStream<String, MyInput> receiver = builder.stream("input-topic", consumer);
KStream<String, List<MyOutput>> outputs = receiver.mapValues(this::mapInputToManyOutputs);
KStream<String, MyOutput> sink = outputs.flatMap((key, value) ->
        value.stream().map(o -> new KeyValue<>(o.getKey(), o)).collect(Collectors.toList()));
sink.to("output-topic", Produced.with(stringSerde, outputSerde));

Caching Java 8 stream

Suppose I have a list which I perform multiple stream operations on.
bobs = myList.stream()
.filter(person -> person.getName().equals("Bob"))
.collect(Collectors.toList())
...
and
tonies = myList.stream()
.filter(person -> person.getName().equals("tony"))
.collect(Collectors.toList())
Can I not just do:
Stream<Person> stream = myList.stream();
which then means I can do:
bobs = stream.filter(person -> person.getName().equals("Bob"))
.collect(Collectors.toList())
tonies = stream.filter(person -> person.getName().equals("tony"))
.collect(Collectors.toList())
No, you can't. A Stream can only be used once; it will throw the error below when you try to reuse it:
java.lang.IllegalStateException: stream has already been operated upon or closed
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:229)
As per Java Docs:
A stream should be operated on (invoking an intermediate or terminal stream operation) only once.
But a neat solution to your query is to use a stream Supplier. It looks like this:
Supplier<Stream<Person>> streamSupplier = myList::stream;
bobs = streamSupplier.get().filter(person -> person.getName().equals("Bob"))
        .collect(Collectors.toList());
tonies = streamSupplier.get().filter(person -> person.getName().equals("tony"))
        .collect(Collectors.toList());
But again, every get call will return a new stream.
No you can't, the doc says:
A stream should be operated on (invoking an intermediate or terminal stream operation) only once.
But you can use a single stream by filtering all the elements you want once and then grouping them the way you need:
Set<String> names = ...; // construct a set containing bob, tony, etc.
Map<String, List<Person>> r = myList.stream()
        .filter(p -> names.contains(p.getName()))
        .collect(Collectors.groupingBy(Person::getName));
List<Person> tonies = r.get("tony");
List<Person> bobs = r.get("bob");
Well, what you can do in your case is generate dynamic stream pipelines, assuming that the only variable in your pipeline is the name of the person you filter by.
We can represent this as a Function<String, Stream<Person>>, as in the following:
final Function<String, Stream<Person>> pipelineGenerator =
        name -> persons.stream().filter(person -> Objects.equals(person.getName(), name));
final List<Person> bobs = pipelineGenerator.apply("bob").collect(Collectors.toList());
final List<Person> tonies = pipelineGenerator.apply("tony").collect(Collectors.toList());
As already mentioned, a given stream should be operated upon only once.
I can understand the "idea" of caching a reference to an object if you're going to refer to it more than once, or of simply avoiding creating more objects than necessary.
However, you should not be concerned about invoking myList.stream() every time you need to query again, as creating a stream is, in general, a cheap operation.

Java 8 stream: convert an array to a LinkedHashMap and continue working with the entrySet in a single stream

I need to count the frequency of words in an array and then handle the result (I think it must be an entrySet)... The order of the entrySet is also important, so I suppose the array must be converted to a LinkedHashMap...
Map<String, Integer> map = new LinkedHashMap<>();
for (String word : words) {
    Integer count = map.get(word);
    count = (count == null) ? 1 : ++count;
    map.put(word, count);
}
I found the following solution, but the order is not respected.
And is it possible to use that stream without a collect operation (but with map or flatMap)?
Map<String, Long> collect =
wordsList.stream().collect(groupingBy(Function.identity(), counting()));
Thank you.
You're close, but you'll need to use the overload of the groupingBy collector that takes three arguments, like this:
LinkedHashMap<String, Long> resultSet =
        wordsList.stream()
                 .collect(groupingBy(Function.identity(),
                                     LinkedHashMap::new,
                                     counting()));
The second argument to the groupingBy collector is the map supplier; since you want a LinkedHashMap, that's what you provide, as shown above.
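For instance, on a sequential stream the map's iteration order follows each word's first occurrence; a small illustration with made-up input (parallel streams give no such guarantee here):
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

List<String> wordsList = Arrays.asList("b", "a", "b", "c", "a", "b");
LinkedHashMap<String, Long> resultSet =
        wordsList.stream()
                 .collect(groupingBy(Function.identity(), LinkedHashMap::new, counting()));
// Prints b=3, a=2, c=1: insertion order matches first occurrence.
resultSet.forEach((word, count) -> System.out.println(word + "=" + count));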
