I am using the Mutiny library within the Quarkus framework in Java 11.
I am wondering what the best way is to run several events sequentially by storing them in a Multi object. I'll describe my issue with the following Java-like pseudocode:
for (P1 p1 : params1) {
    for (P2 p2 : params2) {
        multiObject.add(functionThatReturnsUni(p1, p2));
    }
}
multiObject.runAll().sequentially();
I need to run the actions sequentially, since the function described in the pseudocode persists entities in a DB, so it may be the case that two of the calls need to persist the same entity.
I don't know about the best way, but I tend to use a builder object for running several Unis.
// I'm just assuming the return type of functionThatReturnsUni is Uni<String> for this brief example
UniJoin.Builder<String> builder = Uni.join().builder();
for (P1 p1 : params1) {
    for (P2 p2 : params2) {
        builder.add(functionThatReturnsUni(p1, p2));
    }
}
return builder.joinAll().andFailFast();
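If you want to guarantee one-at-a-time subscription (for example, so that two in-flight calls never persist the same entity concurrently), a Multi-based variant should also work. This is just a sketch, again assuming functionThatReturnsUni returns Uni<String> and a reasonably recent Mutiny version:

import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.Uni;
import java.util.ArrayList;
import java.util.List;

// Collect the (lazy) Unis first; nothing runs until subscription.
List<Uni<String>> unis = new ArrayList<>();
for (P1 p1 : params1) {
    for (P2 p2 : params2) {
        unis.add(functionThatReturnsUni(p1, p2));
    }
}

// transformToUniAndConcatenate subscribes to each Uni only after the
// previous one has completed, so the DB writes happen strictly in order.
Uni<List<String>> results = Multi.createFrom().iterable(unis)
        .onItem().transformToUniAndConcatenate(uni -> uni)
        .collect().asList();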
I have the method below, where I am calling several ReactiveMongoRepositories in order to receive and process certain documents. Since I am fairly new to WebFlux, I am learning as I go.
The code below doesn't feel very efficient to me, as I am opening multiple streams at the same time. This non-blocking way of writing code makes it somewhat complicated to get a value from a stream and reuse that value in the cascaded flatMaps down the line.
In the example below I have to call the userRepository twice, since I want the user at the beginning and then later as well. Is there a possibility to do this more efficiently with WebFlux?
public Mono<Guideline> addGuideline(Guideline guideline, String keycloakUserId) {
    Mono<Guideline> guidelineMono = userRepository.findByKeycloakUserId(keycloakUserId)
            .flatMap(user -> {
                return teamRepository.findUserInTeams(user.get_id());
            }).zipWith(instructionRepository.findById(guideline.getInstructionId()))
            .zipWith(userRepository.findByKeycloakUserId(keycloakUserId))
            .flatMap(objects -> {
                User user = objects.getT2();
                Instruction instruction = objects.getT1().getT2();
                Team team = objects.getT1().getT1();
                if (instruction.getTeamId().equals(team.get_id())) {
                    guideline.setAddedByUser(user.get_id());
                    guideline.setTeamId(team.get_id());
                    guideline.setDateAdded(new Date());
                    guideline.setGuidelineStatus(GuidelineStatus.ACTIVE);
                    guideline.setGuidelineSteps(Arrays.asList());
                    return guidelineRepository.save(guideline);
                } else {
                    return Mono.error(new InstructionDoesntBelongOrExistException("Unable to add, since this Instruction does not belong to you or doesn't exist anymore!"));
                }
            });
    return guidelineMono;
}
I'll post my earlier comment as an answer. If anyone feels like writing the correct code for it, go ahead.
I don't have access to an IDE currently, so I can't write an example, but you could start by fetching the Instruction from the database.
Keep that Mono<Instruction>; then you fetch your User, flatMap the User and fetch the Team from the database. Then you flatMap the Team and build a Mono<Tuple2<User, Team>>.
After that you take your two Monos and use zipWith with a combinator function to build a Mono<Tuple3<User, Team, Instruction>> that you can flatMap over.
So basically: fetch 1 item, then fetch 2 items, then combine into 3 items. You can create tuples using the Tuples.of(...) function.
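A rough, untested sketch of that shape, reusing the repositories and types from the question (note the user is fetched only once):

import java.util.Arrays;
import java.util.Date;
import reactor.core.publisher.Mono;
import reactor.util.function.Tuple2;
import reactor.util.function.Tuples;

public Mono<Guideline> addGuideline(Guideline guideline, String keycloakUserId) {
    // Fetch the instruction once; it is zipped in below.
    Mono<Instruction> instructionMono =
            instructionRepository.findById(guideline.getInstructionId());

    // Fetch the user once and carry it along together with the team.
    Mono<Tuple2<User, Team>> userAndTeam =
            userRepository.findByKeycloakUserId(keycloakUserId)
                    .flatMap(user -> teamRepository.findUserInTeams(user.get_id())
                            .map(team -> Tuples.of(user, team)));

    return userAndTeam
            .zipWith(instructionMono, (tuple, instruction) ->
                    Tuples.of(tuple.getT1(), tuple.getT2(), instruction))
            .flatMap(t -> {
                User user = t.getT1();
                Team team = t.getT2();
                Instruction instruction = t.getT3();
                if (instruction.getTeamId().equals(team.get_id())) {
                    guideline.setAddedByUser(user.get_id());
                    guideline.setTeamId(team.get_id());
                    guideline.setDateAdded(new Date());
                    guideline.setGuidelineStatus(GuidelineStatus.ACTIVE);
                    guideline.setGuidelineSteps(Arrays.asList());
                    return guidelineRepository.save(guideline);
                }
                return Mono.error(new InstructionDoesntBelongOrExistException(
                        "Unable to add, since this Instruction does not belong to you or doesn't exist anymore!"));
            });
}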
Scenario: Read records from a DB and create 4 different output files from them.
Tech Stack:
Spring Boot 2.x
Spring Batch 4.2.x
ArangoDB 3.6.x
Current Approach: a Spring Batch job which has the below steps in sequence:
jobBuilderFactory.get("alljobs")
.start(step("readAllData")) //reads all records from db, stores it in Obj1 (R1)
.next(step("processData1")) //(P1)
.next(step("writer1")) // writes it to file1(W1)
.next(step("reader2")) // reads the same obj1(R2)
.next(step("processor2")) // processes it (P2)
.next(step("writer2")) // writes it to file1(W2)
.next(step("reader3")) // reads the same obj1 (R3)
.next(step("processor3")) // processes it (P3)
.next(step("writer3")) // writes it to file1(W3)
.next(step("reader4")) // reads the same obj1(R4)
.next(step("processor4")) // processes it (P4)
.next(step("writer4")) // writes it to file1 (W4)
.build()
Problem: Since the volume of data coming from the DB is huge (> 200,000 records), we are now fetching the records via a cursor in batches of 10,000 records.
Target state of the job: a reader which fetches the records from the DB via a cursor in batches of 1,000 records.
For each batch of 1,000 records, I have to run the processor and writer for the same.
Also, since for the remaining 3 processors and writers the data set will be the same (Obj1, fetched from the cursor), they should be triggered in parallel:
Reader1() {
while(cursor.hasNext()) {
Obj1 = cursor.next();
a) P1(Obj1); | c) R2(Obj1); | c) R3(Obj1); | c) R4(Obj1); ||
b) W1(Obj1); | d) P2(Obj1); | d) P3(Obj1); | d) P4(Obj1); || All these running in parallel.
| e) W2(Obj1); | e) W3(Obj1); | e) W4(Obj1); ||
}
}
Below are the approaches that came to mind:
1. Invoke the job inside the cursor itself and execute all steps P1...W4 inside the cursor, iteration by iteration.
2. Invoke a job whose first step is Reader1, and then, inside the cursor, invoke another sub-job which runs all of P1...W4 in parallel, since we cannot go outside of the cursor.
Kindly suggest the best way to implement.
Thanks in Advance.
Update:
I was trying to run the steps (P1...W4) inside my Reader1 step in a loop, but I am stuck on the implementation, as everything here is written as a Step and I am not sure how to call multiple steps inside the R1 step in a loop. I tried using a Decider, putting P1...W4 in a Flow (flow):
flowBuilder.start(step("R1"))
    .next(decider())
    .on(COMPLETED).end()
    .from(decider())
    .on(CONTINUE)
    .flow(flow)

job.start(flow)
    .next(flow).on("CONTINUE").to(endJob()).on("FINISHED").end()
    .end()
    .build()
But I am not able to go back to the next cursor iteration, since the cursor iteration happens in the R1 step only.
I also tried to put all the steps R1...W4 (including Reader1) in the same flow, but the flow ended up throwing a cyclic-flow error.
Kindly suggest a better way to implement this. How can I make all the other steps be called in parallel inside the cursor iterating in the R1 step?
I believe using 4 parallel steps is a good option for you. Even if you would have 4 threads reading the same data, you should benefit from parallel steps during the processing/writing phases. This should definitely perform better than 4 steps in sequence. BTW, 200k records is not that much (of course it depends on the record size and how it is mapped, but I think this should be OK; reading data is rarely the bottleneck).
It's always about trade-offs. Here I'm trading a bit of read duplication for better overall throughput thanks to parallel steps. I would not kill myself to make sure items are read only once and complicate things.
A good analogy for such a trade-off in the database world is accepting some data duplication in favor of faster queries (think of NoSQL design, where it is sometimes recommended to duplicate some data to avoid expensive joins).
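For reference, 4 parallel steps can be wired with a split flow, roughly like this (a sketch with assumed step/bean names, Spring Batch 4.x style; each of the 4 steps would re-read from the DB and then process/write its own file):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.job.flow.support.SimpleFlow;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

// Each flow wraps one chunk-oriented step (step1..step4 are assumed to exist).
Flow flow1 = new FlowBuilder<SimpleFlow>("flow1").start(step1).build();
Flow flow2 = new FlowBuilder<SimpleFlow>("flow2").start(step2).build();
Flow flow3 = new FlowBuilder<SimpleFlow>("flow3").start(step3).build();
Flow flow4 = new FlowBuilder<SimpleFlow>("flow4").start(step4).build();

// split() runs the four flows on separate threads.
Flow parallel = new FlowBuilder<SimpleFlow>("parallelFlows")
        .split(new SimpleAsyncTaskExecutor())
        .add(flow1, flow2, flow3, flow4)
        .build();

Job job = jobBuilderFactory.get("parallelStepsJob")
        .start(parallel)
        .end()
        .build();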
This is how I finally designed the solution:
I re-framed the whole flow from a Tasklet-based approach to an orchestrated chunk-based approach.
The job's main step is called fetchProcessAndWriteData:
jobBuilderFactory.get("allChunkJob")
.start(step("fetchProcessAndWriteData"))
.next(step("updatePostJobRunDetails"))
.build()
fetchProcessAndWriteData: will have a reader, masterProcessor and masterWriter with a chunk size of 10,000:
steps
    .get("fetchProcessAndWriteData")
    .chunk(BATCHSIZE)
    .reader(chunkReader)
    .processor(masterProcessor)
    .writer(masterWriter)
    .listener(listener())
    .build()
chunkReader - reads data in chunks from the database cursor and passes it on to the masterProcessor.
masterProcessor - accepts the records one by one, passes each record to all the other processors - P1, P2, P3, P4 - and stores the processed data in a compositeResultBean.
CompositeResultBean consists of data holders for all 4 types of records:
List<Record> recordType1;
List<Record> recordType2;
List<Record> recordType3;
List<Record> recordType4;
This bean is then returned from the process method of the masterProcessor:
public Object process(Object item) {
    ...
    bean.setRecordType1(P1.process(item));
    bean.setRecordType2(P2.process(item));
    bean.setRecordType3(P3.process(item));
    bean.setRecordType4(P4.process(item));
    return bean;
}
masterWriter - this step accepts a list of records, i.e. a list of CompositeResultBeans here. It iterates over the list of beans and calls the respective writers' W1, W2, W3, W4 write() methods with the data held in each of the CompositeResultBean attributes:
public void write(List list) {
    list.forEach(record -> {
        W1.write(isInitialBatch, record.getRecordType1());
        W2.write(isInitialBatch, record.getRecordType2());
        W3.write(isInitialBatch, record.getRecordType3());
        W4.write(isInitialBatch, record.getRecordType4());
    });
}
This whole sequence is carried out in batches of 10k records, writing the data into the files.
Another challenge I faced while writing the files was that I had to replace the already-existing file the very first time records are written, but append for the later batches into the same file.
I solved this problem by overriding ChunkListener in the masterWriter, where I pulled in the batch number and set a static flag isInitialBatch, defaulting to TRUE.
This variable is set inside beforeChunk(): TRUE if chunkContext.getStepContext().getStepExecution().getCommitCount() == 0, else FALSE.
The same boolean is passed into the FileWriter, which opens the file in append (TRUE or FALSE) mode:
W1.write(isInitialBatch, record.getRecordType1());
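A minimal sketch of that listener, with names assumed from the description above (not the project's exact code):

import org.springframework.batch.core.ChunkListener;
import org.springframework.batch.core.scope.context.ChunkContext;

// Hypothetical listener: flips isInitialBatch after the first committed chunk.
public class InitialBatchListener implements ChunkListener {

    // Flag consulted by the writers; TRUE only for the very first chunk.
    public static volatile boolean isInitialBatch = true;

    @Override
    public void beforeChunk(ChunkContext chunkContext) {
        // commitCount == 0 means no chunk has been committed yet in this step
        isInitialBatch = chunkContext.getStepContext()
                .getStepExecution()
                .getCommitCount() == 0;
    }

    @Override
    public void afterChunk(ChunkContext chunkContext) {
        // no-op
    }

    @Override
    public void afterChunkError(ChunkContext chunkContext) {
        // no-op
    }
}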
I am using Akka in a Play controller and performing ask() to an actor named publish; internally, the publish actor performs ask to multiple actors and passes along the reference of the sender. The controller actor needs to wait for the responses from the multiple actors and create a list of the responses.
Please find the code below, but this code only waits for one response and then terminates. Please suggest a fix.
// Performs ask to the publish actor
Source<Object, NotUsed> inAsk = Source.fromFuture(
        ask(publishActor, service.getOfferVerifyRequest(request).getPayloadData(), 1000));

final Sink<String, CompletionStage<String>> sink = Sink.head();

final Flow<Object, String, NotUsed> f3 = Flow.of(Object.class).map(elem -> {
    log.info("Data in Graph is " + elem.toString());
    return elem.toString();
});

RunnableGraph<CompletionStage<String>> result = RunnableGraph.fromGraph(
        GraphDSL.create(sink, (builder, out) -> {
            final Outlet<Object> source = builder.add(inAsk).out();
            builder.from(source)
                   .via(builder.add(f3))
                   .to(out); // to() expects a SinkShape
            return ClosedShape.getInstance();
        }));

ActorMaterializer mat = ActorMaterializer.create(aSystem);
CompletionStage<String> fin = result.run(mat);
fin.toCompletableFuture().thenApply(a -> {
    log.info("Data is " + a);
    return true;
});
log.info("COMPLETED CONTROLLER ");
If you have several responses, ask won't cut it; it is only for a single request-response, where the response ends up in a Future/CompletionStage.
There are a few different strategies to wait for all answers:
One is to create an intermediate actor whose only job is to collect all the answers and, when all the partial responses have arrived, respond to the original requestor. That way you can use ask to get a single aggregate response back, as in the sketch below.
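A rough sketch of such an intermediate actor using classic Akka (the expected reply count is assumed to be known up front; the actor would be created with Props.create(AggregatorActor.class, expected, replyTo)):

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import java.util.ArrayList;
import java.util.List;

// Hypothetical aggregator: collects `expected` replies, then answers the asker.
class AggregatorActor extends AbstractActor {
    private final int expected;
    private final ActorRef replyTo;
    private final List<Object> replies = new ArrayList<>();

    AggregatorActor(int expected, ActorRef replyTo) {
        this.expected = expected;
        this.replyTo = replyTo;
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .matchAny(msg -> {
                    replies.add(msg);
                    if (replies.size() == expected) {
                        // All partial responses arrived: send the aggregate and stop.
                        replyTo.tell(new ArrayList<>(replies), getSelf());
                        getContext().stop(getSelf());
                    }
                })
                .build();
    }
}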
Another option would be to use Source.actorRef to get an ActorRef that you can use as the sender together with tell (and skip using ask). Inside the stream you would then take elements until some criterion is met (time has passed or enough elements have been seen). You may have to add an operator to mimic the ask response timeout to make sure the stream fails if the actor never responds.
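Something along these lines, sketched against the classic streams API (the buffer size, expected count and timeout are made-up values, and the exact Duration overloads depend on your Akka version):

import akka.actor.ActorRef;
import akka.japi.Pair;
import akka.stream.OverflowStrategy;
import akka.stream.javadsl.Keep;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletionStage;

// Materialize an ActorRef whose incoming messages are emitted into the stream.
Source<Object, ActorRef> source = Source.actorRef(100, OverflowStrategy.dropNew());

Pair<ActorRef, CompletionStage<List<Object>>> pair =
        source.take(3) // assumption: we expect three partial responses
              .completionTimeout(Duration.ofSeconds(1)) // mimic the ask timeout
              .toMat(Sink.seq(), Keep.both())
              .run(mat);

// Use the materialized ActorRef as the sender, so the replies flow into the stream.
publishActor.tell(service.getOfferVerifyRequest(request).getPayloadData(), pair.first());

CompletionStage<List<Object>> allResponses = pair.second();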
There are some other issues with the code shared. One is creating a materializer on each request: materializers have a lifecycle and will fill up your heap over time, so you should rather get a materializer injected from Play.
With the given logic there is no need whatsoever to use the GraphDSL; that is only needed for complex streams with multiple inputs and outputs or cycles. You should be able to compose the operators using the Flow API alone (see for example https://doc.akka.io/docs/akka/current/stream/stream-flows-and-basics.html#defining-and-running-streams).
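For illustration, the graph in the question collapses to a few lines with the fluent API (a sketch keeping the names from the question; mat should be the injected materializer):

// Same pipeline as the GraphDSL version, composed with the fluent API.
CompletionStage<String> fin = Source
        .fromFuture(ask(publishActor, service.getOfferVerifyRequest(request).getPayloadData(), 1000))
        .map(elem -> {
            log.info("Data in stream is " + elem.toString());
            return elem.toString();
        })
        .runWith(Sink.head(), mat);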
class A {
def algorithmImplementation (...) = { ... }
}
object A {
def algorithmImplementation (...) = { ... }
}
In which circumstances should the class be used, and in which should the object be used (for implementing an algorithm, e.g. Dijkstra's algorithm, as shown above)?
Which criteria should be considered when making such a decision?
At the moment, I cannot really see what the benefits of using a class are.
If you only have one implementation, this can largely be a judgement call. You mentioned Dijkstra's algorithm, which runs on a graph. Now you can write that algorithm to take a graph object as an explicit parameter. In that case, the algorithm would presumably appear in the Graph singleton object. Then it might be called as something like Graph.shortestPath(myGraph,fromNode,toNode).
Or you can write the algorithm in the Graph class, in which case it no longer takes the graph as an explicit parameter. Now it is called as something like myGraph.shortestPath(fromNode,toNode).
The latter case probably makes more sense when there is one main argument (e.g., the graph), especially one that serves as a kind of context for the algorithm. But it may come down to which syntax you prefer.
However, if you have multiple implementations, the balance tips more toward the class approach, especially when the choice of which implementation is better depends on the choice of representation. For example, you might have two different implementations of shortestPath, one that works better on adjacency matrices and one that works better on adjacency lists. With the class approach, you can easily have two different graph classes for the two different representations, and each can have its own implementation of shortestPath. Then, when you call myGraph.shortestPath(fromNode,toNode), you automatically get the right implementation, even if you don't know whether myGraph uses adjacency matrices or adjacency lists. (This is kind of the whole point of OO.)
Classes can have subclasses which override the implementation; objects cannot be subclassed.
Classes can also be type-parametric, where objects cannot be.
There's only ever one instance of an object, or at least one instance per container. A class can have multiple instances. That means that a class can be parameterized with values:
class A(param1: Int, param2: Int) {
  def algorithmImplementation(arg: List[String]) = // use arg and the params
}
And that can be reused like
val A42_13 = new A(42, 13)
val result1 = A42_13.algorithmImplementation(List("hello", "world"))
val result2 = A42_13.algorithmImplementation(List("goodbye", "cruel", "world"))
To bring all this home relative to your example of Dijkstra's algorithm: imagine you want to write one implementation of the algorithm that is reusable across multiple node types. Then you might want to parameterize by the Node type, the type of metric used to measure distance, and the function used to calculate distance:
class Dijkstra[Node, Metric <: Comparable[Metric]](distance: (Node, Node) => Metric) {
  def compute(node: Node, nodes: Seq[Node]): Seq[Metric] = { ... }
}
You create one instance of Dijkstra per distinct combination of node type/metric/distance function that you use in your program, and reuse that instance without having to pass all that information in every time you run the algorithm.
In summary, classes are more flexible. Use them when you need the flexibility. Otherwise objects are fine.
Sorry if this is basic, but I am trying to pick up .NET 3.5.
Question: Is there anything great about Func<> and its 5 overloads? From the looks of it, I can still create a similar delegate on my own, say MyFunc<>, with the exact same 5 overloads and even more.
e.g.: public delegate TResult MyFunc<TResult>() and a combo of various overloads...
The thought came up as I was trying to understand Func<> delegates and hit upon the following scenario:
Func<int,int> myDelegate = (y) => IsComposite(10);
This implies a delegate with one parameter of type int and a return type of int. There are five variations (if you look at the overloads through IntelliSense). So I am guessing that we can have a delegate with no return type?
So am I justified in saying that Func<> is nothing great, just an example in the .NET framework that we can use and, if needed, create custom "Func<>" delegates to suit our own needs?
Thanks,
The greatness lies in establishing a shared language for better communication.
Instead of defining your own delegate types for the same thing (delegate explosion), use the ones provided by the framework. Anyone reading your code instantly grasps what you are trying to accomplish, which minimizes the time spent on "what is this piece of code actually doing?"
So as soon as I see one of these, I know roughly what it does:
Action = some method that just does something and returns no output
Comparison = some method that compares two objects of the same type and returns an int to indicate order
Converter = transforms Obj A into equivalent Obj B
EventHandler = response/handler to an event raised by some object given some input in the form of an event argument
Func = some method that takes some parameters, computes something and returns a result
Predicate = evaluate input object against some criteria and return pass/fail status as bool
I don't have to dig deeper than that unless it is my immediate area of concern. So if you feel the delegate you need fits one of these needs, use them before rolling your own.
Disclaimer: Personally I like this move by the language designers.
Counter-argument: Sometimes defining your own delegate may help communicate intent better, e.g. System.Threading.ThreadStart over System.Action. So it's a judgment call in the end.
The Func family of delegates (and their return-type-less cousins, the Action delegates) are not any greater than anything else you'd find in the .NET framework. They're just there for re-use so you don't have to redefine them. They have type parameters to keep things generic. E.g., a Func<T, bool> is the same as a System.Predicate<T> delegate. They were originally designed for LINQ.
You should be able to just use the built-in Func delegate for any value-returning method that accepts up to 4 arguments instead of defining your own delegate for such a purpose unless you want the name to reflect your intention, which is cool.
Cases where you would absolutely need to define your delegate types include methods that accept more than 4 arguments, methods with out, ref, or params parameters, or recursive method signatures (e.g., delegate Foo Foo(Foo f)).
In addition to Marxidad's correct answer:
It's worth being aware of Func's related family, the Action delegates. Again, these are types overloaded by the number of type parameters, but declared to return void.
If you want to use Func/Action in a .NET 2.0 project but with a simple route to upgrading later on, you can cut and paste the declarations from my version comparison page. If you declare them in the System namespace then you'll be able to upgrade just by removing the declarations later - but then you won't be able to (easily) build the same code in .NET 3.5 without removing the declarations.
Decoupling dependencies and unholy tie-ups is the one singular thing that makes it great. Everything else one can debate and claim to be doable in some home-grown way.
I've been refactoring a slightly more complex system with an old and heavy lib and got blocked on not being able to break a compile-time dependency - because of a named delegate lurking on "the other side". All assembly loading and reflection didn't help - the compiler would refuse to just cast a delegate() {...} to object, and whatever you do to pacify it would fail on the other side.
Delegate type comparison, which is structural at compile time, turns nominal after that (loading, invoking). That may seem OK while you are thinking in terms of "my darling lib is going to be used forever and by everyone", but it doesn't scale to even slightly more complex systems. Func<> delegates bring a degree of structural equivalence back into the world of nominal typing. That's the aspect you can't achieve by rolling your own.
Example - converting:
class Session {
    public delegate string CleanBody(); // tying you up and you don't see it :-)
    public static void Execute(string name, string q, CleanBody body) ...
}
to:
public static void Execute(string name, string q, Func<string> body)
Allows completely independent code to do reflection invocation like:
Type type = Type.GetType("Bla.Session, FooSessionDll", true);
MethodInfo methodInfo = type.GetMethod("Execute");
Func<string> d = delegate() { ..... }; // see Ma - no tie-ups :-)
object[] args = { "foo", "bar", d }; // note: "params" is a C# keyword, so use a different name
methodInfo.Invoke(null, args); // Execute is static, so the target instance is null
Existing code doesn't notice the difference, new code doesn't get the dependency - peace on Earth :-)
One thing I like about delegates is that they let me declare methods within methods, like so. This is handy when you want to reuse a piece of code but only need it within that method. Since the purpose here is to limit the scope as much as possible, Func<> comes in handy.
For example:
string FormatName(string pFirstName, string pLastName) {
    Func<string, string> MakeFirstUpper = (pText) => {
        return pText.Substring(0,1).ToUpper() + pText.Substring(1);
    };
    return MakeFirstUpper(pFirstName) + " " + MakeFirstUpper(pLastName);
}
It's even easier and handier when you can use type inference, which you can if you create a helper function like so:
Func<T, TReturn> Lambda<T, TReturn>(Func<T, TReturn> pFunc) {
    return pFunc;
}
Now I can rewrite my function without the Func<>:
string FormatName(string pFirstName, string pLastName) {
    var MakeFirstUpper = Lambda((string pText) => {
        return pText.Substring(0,1).ToUpper() + pText.Substring(1);
    });
    return MakeFirstUpper(pFirstName) + " " + MakeFirstUpper(pLastName);
}
Here's the code to test the method:
Console.WriteLine(FormatName("luis", "perez"));
Though it is an old thread, I have to add that Func<> and Action<> also help us use covariance and contravariance:
http://msdn.microsoft.com/en-us/library/dd465122.aspx