Assign random UUID on a key's first occurrence in a stream

I'm looking for a solution on how to assign a random UUID to a key only on its first occurrence in a stream.
Example:
time  key  value  assigned uuid
 |    1    A      fff17a1e-9943-11eb-a8b3-0242ac130003
 |    2    B      f01d2c42-9943-11eb-a8b3-0242ac130003
 |    3    C      f8f1e880-9943-11eb-a8b3-0242ac130003
 |    1    X      fff17a1e-9943-11eb-a8b3-0242ac130003 (same as above)
 v    1    Y      fff17a1e-9943-11eb-a8b3-0242ac130003 (same as above)
As you can see fff17a1e-9943-11eb-a8b3-0242ac130003 is assigned to key "1" on its first occurrence. This uuid is subsequently reused on its second and third occurrence. The order doesn't matter, though. There is no seed for the generated uuid either.
My idea was to use a leftJoin() with a KStream and a KTable with key/uuid mappings. If the right side of the leftJoin is null I have to create a new UUID and add it to the mapping table. However, I think this does not work when there are several new entries with the same key in a short period of time. I guess this will create several UUIDs for the same key.
Is there an easy solution for this or is this simply not possible with streaming?

I don't think you need a join in your use case, because joins merge two different streams whose records arrive with equal keys. You said that you receive just one stream of events, so your use case is an aggregation over one stream.
What I understood from your question is that you receive events A, B, C, ... and want to assign an ID to each of them. You say that the ID is random, which leaves the mapping unclear: if it is random, how would you know that A -> fff17a1e-9943-11eb-a8b3-0242ac130003 and X -> fff17a1e-9943-11eb-a8b3-0242ac130003 (the same)? I suppose that you might have a seed to generate this UUID, and that you create the key based on this seed as well.
I suggest you start with this word count sample. Then, on the first map:
.map((key, value) -> new KeyValue<>(value, value))
you replace it with your map function. Something like this:
.map((k, v) -> {
    if (v.equalsIgnoreCase("A")) {
        return new KeyValue<String, ValueWithUUID>("1", new ValueWithUUID(v));
    } else if (v.equalsIgnoreCase("B")) {
        return new KeyValue<String, ValueWithUUID>("2", new ValueWithUUID(v));
    } else {
        return new KeyValue<String, ValueWithUUID>("0", new ValueWithUUID(v));
    }
})
...
class ValueWithUUID {
    String value;
    String uuid;
    public ValueWithUUID(String value) {
        this.value = value;
        // generate your UUID based on the value. It is random, but as you show in your question it might have a seed.
        this.uuid = generateRandomUUIDWithSeed();
    }
    public String generateRandomUUIDWithSeed() {
        return "fff17a1e-9943-11eb-a8b3-0242ac130003";
    }
}
Then you decide if you want to use a windowed aggregation, every 30 seconds for instance. Or a non-windowing aggregation that updates the results for every event that arrives. Here is one nice example.

You can aggregate the raw stream into a KTable, generate or reuse the UUID inside the aggregator, and then convert the KTable back to a stream.
final KStream<String, String> streamWithoutUUID = builder.stream("topic_name");
KTable<String, String> tableWithUUID = streamWithoutUUID.groupByKey().aggregate(
    () -> "",
    (k, v, t) -> {
        if (!t.startsWith("uuid:")) {
            return "uuid:" + "call your buildUUID function here" + ";value:" + v;
        } else {
            return t.split(";", 2)[0] + ";value:" + v;
        }
    },
    Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("state_name")
        .withKeySerde(Serdes.String()).withValueSerde(Serdes.String()));
final KStream<String, String> streamWithUUID = tableWithUUID.toStream();
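The aggregator logic can be tried out without a Kafka cluster. The following is a minimal sketch in plain Java, not Kafka Streams code: a HashMap stands in for the state store, UUID.randomUUID() stands in for the buildUUID function, and the class and method names are invented for illustration. As far as I know, Kafka Streams processes records with the same key one at a time within a single partition, which is why several new entries for the same key arriving close together should still end up with one UUID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Plain-Java stand-in for the aggregate() call above; no Kafka dependency.
// The HashMap plays the role of the state store "state_name".
class FirstOccurrenceUuid {
    private final Map<String, String> store = new HashMap<>();

    // Same logic as the aggregator lambda: reuse the stored UUID if the
    // aggregate already carries one, otherwise generate a new random UUID.
    static String aggregate(String value, String current) {
        if (!current.startsWith("uuid:")) {
            return "uuid:" + UUID.randomUUID() + ";value:" + value;
        }
        return current.split(";", 2)[0] + ";value:" + value;
    }

    // Processes one event and returns the updated aggregate for its key.
    String process(String key, String value) {
        String current = store.getOrDefault(key, ""); // "" mirrors the initializer
        String updated = aggregate(value, current);
        store.put(key, updated);
        return updated;
    }

    public static void main(String[] args) {
        FirstOccurrenceUuid sim = new FirstOccurrenceUuid();
        // The events from the question: key 1 occurs three times.
        String[][] events = {{"1", "A"}, {"2", "B"}, {"3", "C"}, {"1", "X"}, {"1", "Y"}};
        for (String[] e : events) {
            System.out.println(e[0] + " -> " + sim.process(e[0], e[1]));
        }
    }
}
```

Running main prints one line per event; the three lines for key 1 share the same uuid prefix while the value part changes.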

Related

Is it possible to get top 10 from ktable\kstream?

I have a topic with a String key, which is a signal type, and a Signal value, which is a class like this:
public class Signal {
    public final int deviceId;
    public final int value;
    ...
}
Each device can send signal values that rise or fall over time without a pattern.
Is it possible to get the top 10 devices with the maximum signal value over the whole period, per type (the key of the topic), as a KTable<String,Signal>? Would it help if all signal values were rising?
The topic structure can be changed if needed.

This is possible with Kafka Streams, for example in the case where values are always rising. You need to create your own Top10 aggregate, which stores the top 10 and updates it on each add call:
final var builder = new StreamsBuilder();
final var topTable = builder
    .table(
        SignalChange.TOPIC_NAME,
        Consumed.with(Serdes.String(), new SignalChange.Serde())
    ).toStream()
    .groupByKey()
    .aggregate(
        () -> new Top10(),
        (k, v, top10) -> top10.add(v),
        Materialized.with(Serdes.String(), new Top10.Serde())
    );
topTable can then be joined with any stream that requests the top 10.
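The answer does not show the Top10 class itself. A minimal sketch of what such an aggregate might look like follows, under these assumptions: it takes the deviceId and value as plain ints rather than a Signal so the example stays self-contained, it keeps the highest value seen per device, and it leaves out the Serde entirely. All names besides Top10 and add are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical aggregate: remembers the highest value per device and keeps
// only the LIMIT devices with the largest values.
class Top10 {
    static final int LIMIT = 10;
    private final Map<Integer, Integer> bestByDevice = new HashMap<>();

    // Returns this so it can be used as (k, v, top10) -> top10.add(...).
    Top10 add(int deviceId, int value) {
        bestByDevice.merge(deviceId, value, Math::max);
        if (bestByDevice.size() > LIMIT) {
            // Evict the device with the smallest recorded value.
            int weakest = bestByDevice.entrySet().stream()
                    .min(Map.Entry.comparingByValue())
                    .get().getKey();
            bestByDevice.remove(weakest);
        }
        return this;
    }

    // Device ids ordered from highest to lowest recorded value.
    List<Integer> devices() {
        List<Integer> ids = new ArrayList<>(bestByDevice.keySet());
        ids.sort(Comparator.comparing((Integer id) -> bestByDevice.get(id)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        Top10 top = new Top10();
        for (int device = 1; device <= 12; device++) {
            top.add(device, device * 10);
        }
        System.out.println(top.devices());
    }
}
```

Because the state is capped at ten entries per key, the aggregate stays small no matter how many devices report.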

How to get new userinput in a stream while its running using Java8

I need to validate user input and, if it doesn't meet the conditions, replace it with correct input. So far I am stuck on two parts. I'm fairly new to Java 8 and not so familiar with all the libraries, so if you can give me advice on where to read up more on these I would appreciate it.
List<String> input = Arrays.asList(args);
List<String> validatedinput = input.stream()
    .filter(p -> {
        if (p.matches("[0-9, /,]+")) {
            return true;
        }
        System.out.println("The value has to be positve number and not a character");
        //Does the new input actually get saved here?
        sc.nextLine();
        return false;
    }) //And here I am not really sure how to map the String object
    .map(String::)
    .validatedinput(Collectors.toList());

This type of logic shouldn't be done with streams; a while loop is a better candidate for it.
First, let's partition the data into two lists, one list representing the valid inputs and the other representing invalid inputs:
Map<Boolean, List<String>> resultSet =
    Arrays.stream(args)
        .collect(Collectors.partitioningBy(s -> s.matches(yourRegex),
            Collectors.toCollection(ArrayList::new)));
Then create the while loop to ask the user to correct all their invalid inputs:
int i = 0;
List<String> invalidInputs = resultSet.get(false);
final int size = invalidInputs.size();
while (i < size) {
    System.out.println("The value --> " + invalidInputs.get(i) +
        " has to be positive number and not a character");
    String temp = sc.nextLine();
    if (temp.matches(yourRegex)) {
        resultSet.get(true).add(temp);
        i++;
    }
}
Now, you can collect the list of all the valid inputs and do what you like with it:
List<String> result = resultSet.get(true);
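Put together, the two steps might look like the sketch below. It assumes a digits-only pattern ("\\d+") in place of yourRegex and passes the Scanner in as a parameter so the method can be exercised with canned input; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.stream.Collectors;

class InputValidation {
    // Assumed pattern: one or more digits. Replace with your own regex.
    static final String DIGITS = "\\d+";

    // Keeps the valid entries and asks the scanner for a replacement
    // for each invalid one until the replacement matches.
    static List<String> validate(String[] args, Scanner sc) {
        Map<Boolean, List<String>> parts = Arrays.stream(args)
                .collect(Collectors.partitioningBy(s -> s.matches(DIGITS),
                        Collectors.toCollection(ArrayList::new)));
        List<String> valid = parts.get(true);
        for (String bad : parts.get(false)) {
            String replacement;
            do {
                System.out.println("The value --> " + bad + " has to be made of digits only");
                replacement = sc.nextLine();
            } while (!replacement.matches(DIGITS));
            valid.add(replacement);
        }
        return valid;
    }

    public static void main(String[] args) {
        System.out.println(validate(args, new Scanner(System.in)));
    }
}
```

Corrected values are appended after the originally valid ones, so the result does not preserve the original argument positions.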

ArrayIndexOutOfBounds, while using Java 8 streams to iterate a list

I have a List of Objects called md. Each of these objects has an activityName, a startTime and an endTime (for the activity).
I want to iterate over this list and for each activity, get the startTime and endTime.
Map<String,Long> m1 = new HashMap<String,Long>();
m1 = md
    .stream()
    .map(s->s.activityName)
    .collect(HashMap<String,Long>::new,
        (map,string)->{
            String d1 = md.get(md.indexOf(string)).startTime;
            String d2 = md.get(md.indexOf(string)).endTime;
            .
            .
            .
        },HashMap<String,Long>::putAll);
It gives me java.lang.ArrayIndexOutOfBoundsException: -1 when I try to get the index of string in the line String d1 = md.get(md.indexOf(string)).startTime;
Is there any other way to simplify the code using Lambda expressions?
What if I have two activities with the same name (Drinking, for example)? Will it only return the index of the first Drinking activity it finds?

It seems that you are missing the fact that once you do:
md.stream().map(s -> s.activityName)
your Stream has become a Stream<String>, while your md is still a List<YourObject>.
In the accumulator you then try to find a String inside md with indexOf. Since md holds no String, indexOf returns -1, and md.get(-1) throws.
So you need a Map<String, Long> that maps activityName -> the duration it takes (could be Date/Long):
md.stream()
    .collect(Collectors.toMap(x -> x.activityName, x -> {
        Date start = // parse x.startTime
        Date end = // parse x.endTime
        return end.getTime() - start.getTime(); // duration in milliseconds
    }));
Now the parsing depends on the dates you use.
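As one possible way to fill in the parsing, here is a sketch that assumes startTime and endTime are ISO-8601 strings such as "2017-02-28T08:00:00" and uses java.time instead of Date. The Activity class is a hypothetical stand-in for the element type of md. It also covers the question about two activities with the same name: the two-argument Collectors.toMap throws an IllegalStateException on a duplicate key, so the sketch passes Long::sum as a merge function to add the durations up.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ActivityDurations {
    // Hypothetical element type with the three fields from the question.
    static class Activity {
        final String activityName;
        final String startTime; // assumed ISO-8601, e.g. "2017-02-28T08:00:00"
        final String endTime;

        Activity(String activityName, String startTime, String endTime) {
            this.activityName = activityName;
            this.startTime = startTime;
            this.endTime = endTime;
        }
    }

    // Milliseconds between the start and end of one activity.
    static long millis(Activity a) {
        return Duration.between(LocalDateTime.parse(a.startTime),
                                LocalDateTime.parse(a.endTime)).toMillis();
    }

    // Total duration per activity name; Long::sum merges repeated names.
    static Map<String, Long> totals(List<Activity> md) {
        return md.stream()
                .collect(Collectors.toMap(a -> a.activityName,
                                          ActivityDurations::millis,
                                          Long::sum));
    }

    public static void main(String[] args) {
        List<Activity> md = Arrays.asList(
                new Activity("Drinking", "2017-02-28T08:00:00", "2017-02-28T08:00:30"),
                new Activity("Sleeping", "2017-02-28T22:00:00", "2017-02-28T23:00:00"),
                new Activity("Drinking", "2017-02-28T09:00:00", "2017-02-28T09:01:00"));
        System.out.println(totals(md));
    }
}
```

If you would rather keep only the first or the longest occurrence of a name, swap Long::sum for a different merge function such as (a, b) -> a or Math::max.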

how to convert forEach to lambda

Iterator<Rate> rateIt = rates.iterator();
int lastRateOBP = 0;
while (rateIt.hasNext())
{
    Rate rate = rateIt.next();
    int currentOBP = rate.getPersonCount();
    if (currentOBP == lastRateOBP)
    {
        rateIt.remove();
        continue;
    }
    lastRateOBP = currentOBP;
}
How can I convert the code above to a lambda using Java 8 streams, such as list.stream().filter()...? Note that I need to operate on the list itself.

The simplest solution is
Set<Integer> seen = new HashSet<>();
rates.removeIf(rate -> !seen.add(rate.getPersonCount()));
It utilizes the fact that Set.add will return false if the value is already in the Set, i.e. has already been encountered. Since these are the elements you want to remove, all you have to do is negate it.
If keeping an arbitrary Rate instance for each group with the same person count is sufficient, there is no sorting needed for this solution.
Like with your original Iterator-based solution, it relies on the mutability of your original Collection.
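The removeIf version and the original Iterator loop do not behave identically: the Set-based predicate drops every repeated person count, while the loop only drops a count that equals the one kept just before it. The sketch below illustrates the difference, using plain Integers in place of rate.getPersonCount() to stay self-contained; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class RemoveDuplicateCounts {
    // Removes every element whose value was seen anywhere earlier in the list.
    static List<Integer> removeSeen(List<Integer> counts) {
        List<Integer> copy = new ArrayList<>(counts);
        Set<Integer> seen = new HashSet<>();
        copy.removeIf(c -> !seen.add(c));
        return copy;
    }

    // Mirrors the Iterator loop from the question: removes an element
    // only when it equals the most recently kept one.
    static List<Integer> removeConsecutive(List<Integer> counts) {
        List<Integer> copy = new ArrayList<>(counts);
        int last = 0; // same initial value as lastRateOBP
        Iterator<Integer> it = copy.iterator();
        while (it.hasNext()) {
            int current = it.next();
            if (current == last) {
                it.remove();
                continue;
            }
            last = current;
        }
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(1, 1, 2, 1, 3, 3);
        System.out.println(removeSeen(counts));        // [1, 2, 3]
        System.out.println(removeConsecutive(counts)); // [1, 2, 1, 3]
    }
}
```

Both helpers work on a copy here only so the same input can be fed to each; on the real list you would call removeIf directly, as in the answer.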

If you really want distinct and sorted, as you say in your comments, then it is as simple as:
TreeSet<Rate> sorted = rates.stream()
    .collect(Collectors.toCollection(() ->
        new TreeSet<>(Comparator.comparing(Rate::getPersonCount))));
But notice that in your example with an iterator you are not removing all duplicates, only duplicates that are consecutive (I've exemplified that in the comment to your question).
EDIT
It seems that you want distinct by a Function; or in simpler words you want distinct elements by personCount, but in case of a clash you want to take the max pos.
Such a thing is not yet available in jdk. But it might be, see this.
Since you want them sorted and distinct by key, we can emulate that with:
Collection<Rate> sorted = rates.stream()
    .collect(Collectors.toMap(Rate::getPersonCount,
        Function.identity(),
        (left, right) -> {
            return left.getLos() > right.getLos() ? left : right;
        },
        TreeMap::new))
    .values();
System.out.println(sorted);
On the other hand, if you absolutely need to return a TreeSet to denote that these are unique and sorted elements:
TreeSet<Rate> sorted = rates.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toMap(Rate::getPersonCount,
            Function.identity(),
            (left, right) -> {
                return left.getLos() > right.getLos() ? left : right;
            },
            TreeMap::new),
        map -> {
            TreeSet<Rate> set = new TreeSet<>(Comparator.comparing(Rate::getPersonCount));
            set.addAll(map.values());
            return set;
        }));

This should work if your Rate type has natural ordering (i.e. implements Comparable):
List<Rate> l = rates.stream()
    .distinct()
    .sorted()
    .collect(Collectors.toList());
If not, use a lambda as a custom comparator:
List<Rate> l = rates.stream()
    .distinct()
    .sorted( (r1,r2) -> ...some code to compare two rates... )
    .collect(Collectors.toList());
It may be possible to remove the call to sorted if you just need to remove duplicates.

Using Java 8 streams for aggregating list objects

We are using 3 lists, ListA, ListB and ListC, to keep the marks for 10 students in 3 subjects (A, B, C).
Subjects B and C are optional, so only a few of the 10 students have marks in those subjects.
class Student {
    String studentName;
    int marks;
}
ListA has records for 10 students, ListB for 5 and ListC for 3 (which is also the size of the lists).
I want to know how we can sum up the marks of the students across their subjects using Java 8 streams.
I tried the following:
List<Integer> list = IntStream.range(0,listA.size() -1).mapToObj(i -> listA.get(i).getMarks() +
    listB.get(i).getMarks() +
    listC.get(i).getMarks()).collect(Collectors.toList());
There are 2 issues with this:
a) It will give an IndexOutOfBoundsException, as listB and listC don't have 10 elements.
b) The returned list is of type Integer and I want it to be of type Student.
Any input will be very helpful.

You can make a stream of the 3 lists and then call flatMap to put all the lists' elements into a single stream. That stream will contain one element per student per mark, so you will have to aggregate the result by student name. Something along the lines of:
Map<String, Integer> studentMap = Stream.of(listA, listB, listC)
    .flatMap(Collection::stream)
    .collect(groupingBy(student -> student.studentName, summingInt(student -> student.marks)));
Alternatively, if your Student class has getters for its fields, you can change the last line to make it more readable:
Map<String, Integer> studentMap = Stream.of(listA, listB, listC)
    .flatMap(Collection::stream)
    .collect(groupingBy(Student::getName, summingInt(Student::getMarks)));
Then check the result by printing out the studentMap:
studentMap.forEach((key, value) -> System.out.println(key + " - " + value));
If you want to create a list of Student objects instead, you can use the result of the first map and create a new stream from its entries (this particular example assumes your Student class has an all-args constructor so you can one-line it):
List<Student> studentList = Stream.of(listA, listB, listC)
    .flatMap(Collection::stream)
    .collect(groupingBy(Student::getName, summingInt(Student::getMarks)))
    .entrySet().stream()
    .map(mapEntry -> new Student(mapEntry.getKey(), mapEntry.getValue()))
    .collect(toList());
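For a version that compiles on its own, here is a small sketch of the grouping step. The Student class below is an assumed shape (a name, marks, a two-argument constructor and getters named getName and getMarks), and the sample data is made up; it uses Collectors.groupingBy and Collectors.summingInt without static imports.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StudentTotals {
    // Assumed shape of the Student class, with getters and an all-args constructor.
    static class Student {
        private final String name;
        private final int marks;

        Student(String name, int marks) {
            this.name = name;
            this.marks = marks;
        }

        String getName() { return name; }
        int getMarks() { return marks; }
    }

    // Flattens the three lists into one stream and sums the marks per student name.
    static Map<String, Integer> totalMarks(List<Student> a, List<Student> b, List<Student> c) {
        return Stream.of(a, b, c)
                .flatMap(Collection::stream)
                .collect(Collectors.groupingBy(Student::getName,
                        Collectors.summingInt(Student::getMarks)));
    }

    public static void main(String[] args) {
        List<Student> listA = Arrays.asList(
                new Student("Ann", 50), new Student("Bob", 60), new Student("Cid", 70));
        List<Student> listB = Collections.singletonList(new Student("Ann", 20));
        List<Student> listC = Arrays.asList(new Student("Ann", 5), new Student("Bob", 10));
        totalMarks(listA, listB, listC)
                .forEach((name, total) -> System.out.println(name + " - " + total));
    }
}
```

Since the lists are merged by name rather than by index, the shorter lists for the optional subjects no longer cause an IndexOutOfBoundsException.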

I would do it as follows:
Map<String, Student> result = Stream.of(listA, listB, listC)
    .flatMap(List::stream)
    .collect(Collectors.toMap(
        Student::getName, // key: student's name
        s -> new Student(s.getName(), s.getMarks()), // value: new Student
        (s1, s2) -> { // merge students with same name: sum marks
            s1.setMarks(s1.getMarks() + s2.getMarks());
            return s1;
        }));
Here I've used Collectors.toMap to create the map (I've also assumed you have a constructor for Student that receives a name and marks).
This version of Collectors.toMap expects three arguments:
A function that returns the key for each element (here it's Student::getName)
A function that returns the value for each element (I've created a new Student instance that is a copy of the original element, this is to not modify instances from the original stream)
A merge function that is to be used when there are elements that have the same key, i.e. for students with the same name (I've summed the marks here).
If you could add the following copy constructor and method to your Student class:
public Student(Student another) {
    this.name = another.name;
    this.marks = another.marks;
}
public Student merge(Student another) {
    this.marks += another.marks;
    return this;
}
Then you could rewrite the code above in this way:
Map<String, Student> result = Stream.of(listA, listB, listC)
    .flatMap(List::stream)
    .collect(Collectors.toMap(
        Student::getName,
        Student::new,
        Student::merge));
