Kafka Streams DSL Predicate stream separation


branch(new Predicate<String, String>() {
    @Override
    public boolean test(String key, String value) {
        // business logic
        return condition;
    }
});
When the condition is false, how can I push the record to a different stream? Currently I create another predicate in the chain that collects all records that don't satisfy the predicate above. Is there a way to do this in the same predicate?

For that, you also need to pass a second predicate that always returns true:
KStream<String, String>[] branches = kStream.branch(
    yourPredicate,
    (String key, String value) -> true
);
branches[0].to(firstTopic);
branches[1].to(secondTopic);
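The catch-all works because branch places each record in the first branch whose predicate matches and drops records that match none. The following is a plain-Java model of that routing rule, not Kafka code: BranchSketch and its branch method are hypothetical names, and lists stand in for streams. Newer Kafka Streams releases also offer split() with a default branch for the same purpose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of first-match routing; no Kafka dependency (illustrative only).
final class BranchSketch {

    // Routes each record to the first predicate that accepts it.
    // Records accepted by no predicate are dropped.
    @SafeVarargs
    static <T> List<List<T>> branch(List<T> records, Predicate<T>... predicates) {
        List<List<T>> branches = new ArrayList<>();
        for (int i = 0; i < predicates.length; i++) {
            branches.add(new ArrayList<>());
        }
        for (T record : records) {
            for (int i = 0; i < predicates.length; i++) {
                if (predicates[i].test(record)) {
                    branches.get(i).add(record);
                    break; // first match wins
                }
            }
        }
        return branches;
    }

    public static void main(String[] args) {
        List<List<Integer>> out =
                BranchSketch.<Integer>branch(Arrays.asList(1, 2, 3, 4), n -> n % 2 == 0, n -> true);
        System.out.println(out.get(0)); // [2, 4]
        System.out.println(out.get(1)); // [1, 3]
    }
}
```

Because the always-true predicate comes last, it only ever sees records that the earlier predicate rejected.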

Related

Spring Data JPA filtering by best match

I want to implement filtering for multiple fields of an entity which is ordered by best match. By best match I mean that the more of the filtered fields match the higher in the order the result is listed. I want this to work dynamically, so I can add more filters later on.
I have been looking for a solution for a long time now and I didn't find an elegant way to do this with JPA.
My approach is to concatenate all my predicates with or and then order them by how many of the fields match. This is done by dynamically creating a CASE statement for each possible combination of the filters (this is a powerset and leads to a lot of CASE statements). Then I give every subset a rank (= size of the subset) and then I sort by the rank in descending order. This way subsets with more elements (= filters) are ranked higher.
From a few tests I can see that it already takes up to 10 s for 4 filters, so that can't be a good solution.
Here is my code:
private fun orderByBestMatch(): Specification<User?> {
    return Specification<User?> { root: Root<User?>, query: CriteriaQuery<*>, builder: CriteriaBuilder ->
        val benefit = getExpressionForNestedClass<String>(root, "benefit")
        val umbrellaTerm = getExpressionForNestedClass<String>(root, "umbrellaTerm")
        val specialization = getExpressionForNestedClass<String>(root, "specialization")
        val salaryExpectation = root.get<Number>("salaryExpectation")
        val matcher: CriteriaBuilder.Case<Int> = builder.selectCase()
        for (set in powerSetOfsearchedFields()) {
            if (set.isNotEmpty()) {
                // Start with the predicate for the first filter of this subset
                var predicate: Predicate? = when (set.elementAt(0).key) {
                    "umbrellaTerm" -> builder.like(umbrellaTerm, set.elementAt(0).value.toString())
                    "specialization" -> builder.like(specialization, set.elementAt(0).value.toString())
                    "benefit" -> builder.like(benefit, set.elementAt(0).value.toString())
                    "salaryExpectation" -> builder.equal(salaryExpectation, set.elementAt(0).value.toString())
                    else -> null
                }
                // AND in every remaining filter of this subset
                for (i in 1 until set.size) {
                    predicate = when (set.elementAt(i).key) {
                        "umbrellaTerm" -> builder.and(predicate, builder.like(umbrellaTerm, set.elementAt(i).value.toString()))
                        "specialization" -> builder.and(predicate, builder.like(specialization, set.elementAt(i).value.toString()))
                        "benefit" -> builder.and(predicate, builder.like(benefit, set.elementAt(i).value.toString()))
                        "salaryExpectation" -> builder.and(predicate, builder.equal(salaryExpectation, set.elementAt(i).value.toString()))
                        else -> null
                    }
                }
                // Rank of a subset = number of filters it contains
                matcher.`when`(predicate, set.size)
            }
        }
        matcher.otherwise(0)
        query.orderBy(builder.desc(matcher))
        query.distinct(true)
        builder.isTrue(builder.literal(true)) // just here for the function to have a return value
        // result?.toPredicate(root, query, builder)
    }
}
This function is used in a Builder class I implemented and is appended to the Specification with an and when the Specification is built.
The Specification is then passed to UserRepository.findAll().
Is there a better way (maybe even an out-of-the-box way) to implement this behaviour?
Thanks in advance
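One observation that may help: the largest subset of filters that a row satisfies completely is simply the set of filters it satisfies individually, so its rank equals a plain count of matching filters. That needs one term per filter instead of one per subset. The sketch below shows the idea in plain Java on in-memory maps; BestMatchRank and its methods are hypothetical names, and exact equality stands in for the LIKE comparisons. In Criteria API terms this would presumably become one CASE WHEN ... THEN 1 ELSE 0 expression per filter, added together, but that translation is an assumption and is not shown here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory illustration of ranking by number of matching filters (hypothetical names).
final class BestMatchRank {

    // Rank = number of filters the candidate satisfies (equality stands in for LIKE).
    static int rank(Map<String, String> candidate, Map<String, String> filters) {
        int rank = 0;
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            if (filter.getValue().equals(candidate.get(filter.getKey()))) {
                rank++;
            }
        }
        return rank;
    }

    // Keeps candidates that match at least one filter, best match first.
    static List<Map<String, String>> orderByBestMatch(
            List<Map<String, String>> candidates, Map<String, String> filters) {
        return candidates.stream()
                .filter(c -> rank(c, filters) > 0)
                .sorted(Comparator.comparingInt((Map<String, String> c) -> rank(c, filters)).reversed())
                .collect(Collectors.toList());
    }

    // Small helper to build a map from alternating keys and values.
    static Map<String, String> of(String... keyValues) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> filters = of("benefit", "car", "specialization", "java");
        List<Map<String, String>> users = Arrays.asList(
                of("benefit", "bike", "specialization", "go"),
                of("benefit", "car", "specialization", "go"),
                of("benefit", "car", "specialization", "java"));
        for (Map<String, String> user : orderByBestMatch(users, filters)) {
            System.out.println(rank(user, filters)); // prints 2, then 1
        }
    }
}
```

With n filters this evaluates n comparisons per row, whereas the powerset approach builds 2^n - 1 CASE branches.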

Is this written in a proper java8 way?

I have a POJO where I have a Map<String, String> field.
I need to do the below checks.
1. Check that the POJO is not null.
2. Check that the map is not null.
3. Get a value from the map.
4. Parse it to a boolean (this does not happen in the snippet below).
5. Return true or false.
If any of the first four steps fails, the result should be false.
I have something as below.
Optional<Object> optional = Optional.ofNullable(event)
.map(Event::getAttributes)
.map(attrMap -> attrMap.get("restructured"));
return optional.isPresent();
How can I do this the Java 8 way? I see that an NPE is thrown if the value is null. Is there a way to follow the steps above, i.e. return false whenever something is null?

Optional#orElse is exactly what you need:
return Optional.ofNullable(event)
    .map(Event::getAttributes)
    .map(m -> m.get("restructured"))
    .map(Boolean::parseBoolean)
    .orElse(false);
If any of the steps produces null, the result resolves to false.
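To see the null handling at each step, here is a small self-contained sketch. The Event class below is a hypothetical stand-in for the POJO in the question, reduced to the one getter the chain needs.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

final class OptionalChainSketch {

    // Hypothetical stand-in for the POJO in the question.
    static final class Event {
        private final Map<String, String> attributes;

        Event(Map<String, String> attributes) {
            this.attributes = attributes;
        }

        Map<String, String> getAttributes() {
            return attributes;
        }
    }

    // Each map() step yields an empty Optional when its result is null,
    // so orElse(false) covers a null event, a null map and a missing key.
    static boolean isRestructured(Event event) {
        return Optional.ofNullable(event)
                .map(Event::getAttributes)
                .map(m -> m.get("restructured"))
                .map(Boolean::parseBoolean)
                .orElse(false);
    }

    public static void main(String[] args) {
        System.out.println(isRestructured(null));                                 // false
        System.out.println(isRestructured(new Event(null)));                      // false
        System.out.println(isRestructured(new Event(Collections.emptyMap())));    // false
        System.out.println(isRestructured(
                new Event(Collections.singletonMap("restructured", "true"))));    // true
    }
}
```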

Caching Java 8 stream

Suppose I have a list which I perform multiple stream operations on.
bobs = myList.stream()
.filter(person -> person.getName().equals("Bob"))
.collect(Collectors.toList())
...
and
tonies = myList.stream()
.filter(person -> person.getName().equals("tony"))
.collect(Collectors.toList())
Can I not just do:
Stream<Person> stream = myList.stream();
which then means I can do:
bobs = stream.filter(person -> person.getName().equals("Bob"))
.collect(Collectors.toList())
tonies = stream.filter(person -> person.getName().equals("tony"))
.collect(Collectors.toList())

No, you can't. A Stream can only be used once; trying to reuse it throws the error below:
java.lang.IllegalStateException: stream has already been operated upon or closed
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:229)
As per Java Docs:
A stream should be operated on (invoking an intermediate or terminal stream operation) only once.
But a neat solution to your problem is to use a Stream Supplier. It looks like this:
Supplier<Stream<Person>> streamSupplier = myList::stream;
bobs = streamSupplier.get().filter(person -> person.getName().equals("Bob"))
.collect(Collectors.toList())
tonies = streamSupplier.get().filter(person -> person.getName().equals("tony"))
.collect(Collectors.toList())
But again, every get call will return a new stream.
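Both behaviours can be checked in a few lines. In this sketch, plain strings stand in for Person objects, and StreamReuseSketch with its methods is a hypothetical name chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamReuseSketch {

    // Returns true if operating on the same Stream object a second time is rejected.
    static boolean reuseFails(List<String> names) {
        Stream<String> stream = names.stream();
        stream.filter("Bob"::equals).collect(Collectors.toList());
        try {
            stream.filter("Tony"::equals).collect(Collectors.toList());
            return false;
        } catch (IllegalStateException e) {
            return true; // stream has already been operated upon or closed
        }
    }

    // Every call to source.get() produces a fresh stream over the same data.
    static List<String> named(Supplier<Stream<String>> source, String name) {
        return source.get().filter(name::equals).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Bob", "Tony", "Bob");
        System.out.println(reuseFails(names));           // true
        System.out.println(named(names::stream, "Bob")); // [Bob, Bob]
        System.out.println(named(names::stream, "Tony")); // [Tony]
    }
}
```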

No you can't, doc says:
A stream should be operated on (invoking an intermediate or terminal
stream operation) only once.
But you can use a single stream by filtering all elements you want once and then group them the way you need:
Set<String> names = ...; // construct a set containing bob, tony, etc.
Map<String,List<Person>> r = myList.stream()
    .filter(p -> names.contains(p.getName()))
    .collect(Collectors.groupingBy(Person::getName));
List<Person> tonies = r.get("tony");
List<Person> bobs = r.get("bob");
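A runnable version of this single-pass grouping is sketched below, with a minimal hypothetical Person class. One detail worth noting: a requested name that matches nobody has no entry in the map, so get returns null for it; getOrDefault is one way to handle that.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class GroupByNameSketch {

    // Minimal hypothetical Person with just a name.
    static final class Person {
        private final String name;

        Person(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // One pass over the list: keep the wanted names, then bucket by name.
    static Map<String, List<Person>> groupByName(List<Person> people, Set<String> names) {
        return people.stream()
                .filter(p -> names.contains(p.getName()))
                .collect(Collectors.groupingBy(Person::getName));
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("bob"), new Person("tony"), new Person("bob"), new Person("alice"));
        Set<String> wanted = new HashSet<>(Arrays.asList("bob", "tony", "carol"));
        Map<String, List<Person>> r = groupByName(people, wanted);

        System.out.println(r.get("bob").size());   // 2
        System.out.println(r.get("tony").size());  // 1
        // "carol" matched nobody, so the map has no entry for her.
        System.out.println(r.getOrDefault("carol", Collections.emptyList()).size()); // 0
    }
}
```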

Well, what you can do in your case is generate stream pipelines dynamically, assuming that the only variable in your pipeline is the name of the person that you filter by.
We can represent this as a Function<String, Stream<Person>>, as in the following:
final Function<String, Stream<Person>> pipelineGenerator = name -> persons.stream().filter(person -> Objects.equals(person.getName(), name));
final List<Person> bobs = pipelineGenerator.apply("bob").collect(Collectors.toList());
final List<Person> tonies = pipelineGenerator.apply("tony").collect(Collectors.toList());

As already mentioned a given stream should be operated upon only once.
I can understand the "idea" of caching a reference to an object if you're going to refer to it more than once, or to simply avoid creating more objects than necessary.
However, you should not be concerned when invoking myList.stream() every time you need to query again as creating a stream, in general, is a cheap operation.

Java 8 Streams Filter a list based on a condition

I am trying to extract a filtered list from the original collection based on a condition. I am using a backport of the Java 8 streams API and am not sure how to do this. I get the Set from the ccarReport.getCcarReportWorkflowInstances() call, and I need to iterate over it and filter it on a condition: I compare the date attribute of each object with the request date that is passed in. Below is the code:
Set<CcarReportWorkflowInstance> ccarReportWorkflowInstanceSet = ccarReport.getCcarReportWorkflowInstances();
List<CcarReportWorkflowInstance> ccarReportWorkflowInstances = StreamSupport.stream(ccarReportWorkflowInstanceSet).filter(ccarReportWorkflowInstance -> DateUtils.isSameDay(cobDate, ccarReportWorkflowInstance.getCobDate()));
The routine which is doing the job
public List<CcarRepWfInstDTO> fetchReportInstances(Long reportId, Date cobDate) {
List<CcarRepWfInstDTO> ccarRepWfInstDTOs = null;
CcarReport ccarReport = validateInstanceSearchParams(reportId, cobDate);
Set<CcarReportWorkflowInstance> ccarReportWorkflowInstanceSet = ccarReport.getCcarReportWorkflowInstances();
List<CcarReportWorkflowInstance> ccarReportWorkflowInstances = StreamSupport.stream(ccarReportWorkflowInstanceSet).filter(ccarReportWorkflowInstance -> DateUtils.isSameDay(cobDate, ccarReportWorkflowInstance.getCobDate()));
ccarRepWfInstDTOs = ccarRepWfInstMapper.ccarRepWfInstsToCcarRepWfInstDTOs(ccarReportWorkflowInstances);
return ccarRepWfInstDTOs;
}
Error I get when I tried to use streams.

Assuming I understood what you are trying to do, you can replace your method body with a single line :
return
validateInstanceSearchParams(reportId, cobDate).getCcarReportWorkflowInstances()
.stream()
.filter(c -> DateUtils.isSameDay(cobDate, c.getCobDate()))
.collect(Collectors.toList());
You can obtain a Stream from the Set by using the stream() method. No need for StreamSupport.stream().
After filtering the Stream, you should collect it into the output List.
I'd use shorter variable and method names. Your code is painful to read.
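The key point is that filter returns another Stream, not a List, so the pipeline has to end in a collect. Below is a self-contained sketch of that shape on standard Java 8. Instance is a hypothetical stand-in for CcarReportWorkflowInstance, java.time.LocalDate replaces Date plus DateUtils.isSameDay, and the map step is only there to illustrate where a conversion (for example to DTOs) could go before collecting.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class FilterCollectSketch {

    // Hypothetical stand-in for the workflow instance entity.
    static final class Instance {
        final long id;
        final LocalDate cobDate;

        Instance(long id, LocalDate cobDate) {
            this.id = id;
            this.cobDate = cobDate;
        }
    }

    // filter() yields a Stream; collect() turns the result into a List.
    static List<Long> idsForDate(Set<Instance> instances, LocalDate cobDate) {
        return instances.stream()
                .filter(i -> cobDate.equals(i.cobDate)) // keep instances for the requested day
                .map(i -> i.id)                         // optional conversion step
                .sorted()                               // sets are unordered, so sort for stable output
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2020, 1, 15);
        Set<Instance> instances = new HashSet<>(Arrays.asList(
                new Instance(2L, day),
                new Instance(1L, day),
                new Instance(3L, day.plusDays(1))));
        System.out.println(idsForDate(instances, day)); // [1, 2]
    }
}
```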

How to evaluate a complex expression tree against incremental data?

I have a collection of data and a collection of search filters I want to run against that data. The filters follow the LDAP search filter format and are parsed into an expression tree. The data is read one item at a time and processed through all the filters. Intermediate match results are stored in each leaf node of the tree until all the data has been processed. Then the final results are obtained by traversing the tree and applying the logical operators to each leaf node's intermediate result. For example, if I have the filter (&(a=b)(c=d)) then my tree will look like this:
root = "&"
    left = "a=b"
    right = "c=d"
So if a=b and c=d then both the left and right child nodes are a match and thus the filter is a match.
The data is a collection of different types of objects, each with their own fields. For example, assume the collection represents a class at a school:
class { name = "math" room = "12A" }
teacher { name = "John" age = "35" }
student { name = "Billy" age = "6" grade = "A" }
student { name = "Jane" age = "7" grade = "B" }
So a filter might look like (&(teacher.name=John)(student.age>6)(student.grade=A)) and be parsed like so:
root = "&"
    left = "teacher.name=John"
    right = "&"
        left = "student.age>6"
        right = "student.grade=A"
I run the class object against it; no matches. I run the teacher object against it; root.left is a match. I run the first student node against it; root.right.right is a match. I run the second student node against it; root.right.left is a match. Then I traverse the tree and determine that all nodes matched and thus the final result is a match.
The problem is the intermediate matches need to be constrained based upon commonality: the student.age and student.grade filters need to somehow be tied together in order to store an intermediate match only if they match for the same object. I can't for the life of me figure out how to do this.
My filter node abstract base class:
class FilterNode
{
public:
virtual void Evaluate(string ObjectName, map<string, string> Attributes) = 0;
virtual bool IsMatch() = 0;
};
I have a LogicalFilterNode class that handles logical AND, OR, and NOT operations; its implementation is pretty straightforward:
void LogicalFilterNode::Evaluate(string ObjectName, map<string, string> Attributes)
{
m_Left->Evaluate(ObjectName, Attributes);
m_Right->Evaluate(ObjectName, Attributes);
}
bool LogicalFilterNode::IsMatch()
{
switch(m_Operator)
{
case AND:
return m_Left->IsMatch() && m_Right->IsMatch();
case OR:
return m_Left->IsMatch() || m_Right->IsMatch();
case NOT:
return !m_Left->IsMatch();
}
return false;
}
Then I have a ComparisonFilterNode class that handles the leaf nodes:
void ComparisonFilterNode::Evaluate(string ObjectName, map<string, string> Attributes)
{
    if(ObjectName == m_ObjectName) // e.g. "teacher", "student", etc.
    {
        // Check every attribute of this object against the node's comparison
        for(const auto& Attribute : Attributes)
        {
            Evaluate(Attribute.first, Attribute.second);
        }
    }
}
void ComparisonFilterNode::Evaluate(string AttributeName, string AttributeValue)
{
    if(AttributeName == m_AttributeName) // e.g. "age", "grade", etc.
    {
        if(Compare(AttributeValue, m_AttributeValue)) // e.g. "6", "A", etc.
        {
            m_IsMatch = true;
        }
    }
}
bool ComparisonFilterNode::IsMatch() { return m_IsMatch; }
How it's used:
FilterNode* Root = Parse(...);
for(const Object& item : Data)
{
    Root->Evaluate(item.Name, item.Attributes);
}
bool Match = Root->IsMatch();
Essentially, what I need is this: for AND statements whose children have the same object name, the AND statement should only match if the children match for the same object.

Create a new unary "operator", let's call it thereExists, which:
Does have state, and
Declares that its child subexpression must be satisfied by a single input record.
Specifically, for each instance of a thereExists operator in an expression tree you should store a single bit indicating whether or not the subexpression below this tree node has been satisfied by any of the input records seen so far. These flags will initially be set to false.
To continue processing your dataset efficiently (i.e. input record by input record, without having to load the entire dataset into memory), you should first preprocess the query expression tree to pull out a list of all instances of the thereExists operator. Then as you read in each input record, test it against the child subexpression of each of these operators that still has its satisfied flag set to false. Any subexpression that is now satisfied should toggle its parent thereExists node's satisfied flag to true -- and it would be a good idea to also attach a copy of the satisfying record to the newly-satisfied thereExists node, if you want to actually see more than a "yes" or "no" answer to the overall query.
You only need to evaluate tree nodes above a thereExists node once, after all input records have been processed as described above. Notice that anything referring to properties of an individual record must appear somewhere beneath a thereExists node in the tree. Everything above a thereExists node in the tree is only allowed to test "global" properties of the collection, or combine the results of thereExists nodes using logical operators (AND, OR, XOR, NOT, etc.). Logical operators themselves can appear anywhere in the tree.
Using this, you can now evaluate expressions like
root = "&"
    left = thereExists
        child = "teacher.name=John"
    right = "|"
        left = thereExists
            child = "&"
                left = "student.age>6"
                right = "student.grade=A"
        right = thereExists
            child = "student.name = Billy"
This will report "yes" if the collection of records contains both a teacher whose name is "John" and either a student named "Billy" or an A student aged over 6, or "no" otherwise. If you track satisfying records as I suggested, you'll also be able to dump these out in the case of a "yes" answer.
You could also add a second operator type, forAll, which checks that its subexpression is true for every input record. But this is probably not as useful, and in any case you can simulate forAll(expr) with not(thereExists(not(expr))).
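As a rough illustration of the thereExists idea, here is a minimal Java sketch (Java rather than the question's C++, and with simplifications). All names in it (Item, Node, ThereExists, And, attr) are hypothetical. It feeds each record through the tree instead of first extracting a list of thereExists nodes, and it models everything below a thereExists node as a stateless per-record predicate.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ThereExistsSketch {

    // Hypothetical input record: an object name plus its attributes.
    static final class Item {
        final String name;
        final Map<String, String> attributes = new HashMap<>();

        Item(String name, String... keyValues) {
            this.name = name;
            for (int i = 0; i + 1 < keyValues.length; i += 2) {
                attributes.put(keyValues[i], keyValues[i + 1]);
            }
        }
    }

    // Collection-level node: fed one item at a time, queried at the end.
    interface Node {
        void feed(Item item);
        boolean isMatch();
    }

    // Stateful quantifier: becomes true once ONE item satisfies the whole child.
    static final class ThereExists implements Node {
        private final Predicate<Item> child;
        private Item witness; // first satisfying item, or null if none yet

        ThereExists(Predicate<Item> child) { this.child = child; }

        @Override public void feed(Item item) {
            if (witness == null && child.test(item)) {
                witness = item;
            }
        }
        @Override public boolean isMatch() { return witness != null; }
        Item witness() { return witness; }
    }

    // Logical AND above the quantifiers; it only combines their final flags.
    static final class And implements Node {
        private final Node left;
        private final Node right;

        And(Node left, Node right) { this.left = left; this.right = right; }

        @Override public void feed(Item item) { left.feed(item); right.feed(item); }
        @Override public boolean isMatch() { return left.isMatch() && right.isMatch(); }
    }

    // Stateless per-item comparison such as student.age>6.
    static Predicate<Item> attr(String object, String key, Predicate<String> test) {
        return item -> item.name.equals(object)
                && item.attributes.containsKey(key)
                && test.test(item.attributes.get(key));
    }

    static boolean evaluate(Node root, List<Item> data) {
        for (Item item : data) {
            root.feed(item);
        }
        return root.isMatch();
    }

    // The school example from the question.
    static List<Item> sampleData() {
        return Arrays.asList(
                new Item("class", "name", "math", "room", "12A"),
                new Item("teacher", "name", "John", "age", "35"),
                new Item("student", "name", "Billy", "age", "6", "grade", "A"),
                new Item("student", "name", "Jane", "age", "7", "grade", "B"));
    }

    public static void main(String[] args) {
        Predicate<Item> john = attr("teacher", "name", "John"::equals);
        Predicate<Item> older = attr("student", "age", v -> Integer.parseInt(v) > 6);
        Predicate<Item> gradeA = attr("student", "grade", "A"::equals);

        // One student must satisfy both conditions: no such student here.
        Node sameStudent = new And(new ThereExists(john), new ThereExists(older.and(gradeA)));
        System.out.println(evaluate(sameStudent, sampleData())); // false

        // The conditions may be met by different students (Jane and Billy).
        Node anyStudents = new And(new ThereExists(older), new ThereExists(gradeA));
        System.out.println(evaluate(anyStudents, sampleData())); // true
    }
}
```

The two queries differ only in where the AND sits relative to thereExists, which is exactly the "same object" constraint the question asks for. Nodes hold state, so each evaluation uses freshly built nodes.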
