We are currently designing a web service based process, in which we will be using the web-service invoke and receive steps to communicate with a Microsoft biz-talk server.
Our main concern is that a task on the receive step can wait for some time (up to one week) until the biz-talk responds to us, which (we think) would incur a performance penalty on the workflow system as it will be polling for response.
My question is, is there any known performance considerations for the receive step, specially for leaving work items for extended periods?
No, I don't think there will be any undue "overhead". Yes, internally the process engine "polls". For just about anything. Including invoking components, or executing timers. But from a system perspective, you're just waiting for a request.
It sounds like a "receive" step is exactly the right solution here.
Related
Reading Spring in action 5th edition chapter 11, last paragraph in section 11.1.2
By accepting a Mono as input, the method is invoked immediately
without waiting for the Taco to be resolved from the request body. And
because the repository is also reactive, it’ll accept a Mono and
immediately return a Flux, from which you call next() and return
the resulting Mono … all before the request is even processed!
How the service will immediately return before the request is even processed? Isn't that counter-intuitive?
I mean should the request be processed first before returning a response?
The book has everything you need. It is a well-written book, just make sure to read carefully while actually (make sure to download the source code from Manning) running the code. It will help you understand better.
From the book (https://livebook.manning.com/book/spring-in-action-fifth-edition/chapter-11/v-7/6):
11.1 Working with Spring WebFlux
Typical Servlet-based web frameworks, such as Spring MVC, are blocking
and multithreaded in nature, using a single thread per connection. As
requests are handled, a worker thread is pulled from a thread pool to
process the request. Meanwhile, the request thread is blocked until it
is notified by the worker thread that it is finished.
Consequently, blocking web frameworks do not scale effectively under
heavy request volume. Latency in slow worker threads makes things even
worse, because it will take longer for the worker thread to be
returned to the pool to be ready to handle another request.
In some use cases, this arrangement is perfectly acceptable. In fact,
this is largely how most web applications have been developed for well
over a decade. But times are changing and the clients of these web
applications have grown from people occasionally viewing websites on
the web browser to people frequently consuming content and using
applications that consume APIs. And these days the so-called "Internet
of Things" where humans aren’t even involved while cars, jet engines,
and other non-traditional clients are constantly exchanging data with
our APIs. With an increasing number of clients consuming our web
applications, scalability is more important than ever.
Asynchronous web frameworks, in contrast, achieve higher scalability
with fewer threads—generally one per CPU core. By applying a technique
known as event looping (as illustrated in Figure 11.1), these
frameworks are able to handle many requests per thread, making the
per-connection cost much cheaper.
In an event loop, everything is handled as an event, including
requests and callbacks from intensive operations (such as database and
network operations). When a costly operation is needed, the event loop
registers a callback for that operation to be performed in parallel
while the event loop moves on to handle other events. When the
operation is complete, the completion is treated as an event by the
event loop the same as requests. As a result, asynchronous web
frameworks are able to scale better under heavy request volume with
fewer threads (and thus reduced overhead for thread management).
Read the rest of this section and it will clarify any other concern.
Also, check Reactor https://github.com/reactor/reactor-core
For a complete example if you are still having difficulties https://www.baeldung.com/spring-webflux
We are planning to migrate from monolith to micro-services based architecture. Now i own the responsibility of talking a module out of monolith.
Existing Monolith:
1) Code is very tightly coupled.
2) APIs are called recursively with different parameters.
3) Some of the calls with-in the module which i am planning to extract out contains calls to a system which takes approx 9 minutes to complete.
Unfortunately that's a synchronous.
Points to note:
1) I am starting with a single api migration which is a very important one and is not performing well.
2) This api consists of parallel calls to another system for performing
bunch of tasks. All the calls are blocking and time-consuming (consider
avg response time to be 5-6 min)
Moving to microservice based architecture : There are 2 approaches that comes to my mind while moving the aforementioned api from monolith to a separate microservice, along with solving the problem of blocking threads due to time taking blocking calls.
a) moving in phases :
- Create a separate module
- In this module provide an api to push events to kafka, another
module will in-turn process the request and push the response back
to kafka
- monolith for now will call above mentioned api to push events to
kafka
- New module will inturn call back the monolith when the task
complete (received response on a separate topic in kafka)
- Monolith once get response for all the tasks will trigger some post
processing activity.
Advantage:
1) It will solve the problem of sync- blocking call.
Disadvantage:
1) Changes are required in the monolith, which could introduce some
bugs.
2) No fallbacks are available for the case if bug gets introduced.
b) Move the API at once to the microservice :
Initially which will share common
data source with the monolith and solve the problem of blocking calls
via introduction of kafka between new microservice and the module which
takes time to process the request.
Advantage:
1) Fallback is available in monolith
Disadvantage:
1) Initially data source is shared between the systems.
What should be the best approach to do these kinds of complex tasks ?
You have to take care of several things.
First
Going to microservice will be slower (90% of the time) than a monolith because you introduce latency. So never forget it when you go with it.
Second
You ask if it is a good way to go with kafka. I may answer yes in most of the case but you mentioned that today the process is synchronous. If it is for transactional reasons, you won't be able to solve it with a message broker I guess because you'll update your strong consistency system to an eventually one. https://en.wikipedia.org/wiki/Eventual_consistency
I am not saying that it is a bad solution only that it change your workflow and may impact some business rules.
As a solution I offer this:
1 - Break the seams in your monolith by introducing functional key and api composition inside the monolith (read Sam Newman's book to help).
2 - Introduce the eventual consistency inside the monolith to test if it fits the purpose. It will be easier to rollback if not.
No you have to possibility:
The second step went well so go ahead and put the code of the service into a microservice out of the monolith.
The second step did not fit then think about doing the risky thing in a specific service or use distributed transactions (be careful with this solution it could be hard to manage).
I believe the best approach would be moving option 1: Moving in phases. However, it is possible to do it while having a fallback strategy. You can keep the a version of the untouched backend to serve as a backup if your new service encounters issues.
The approach is described in more details in the article: Low risk monolith to microservice evolution It provides more details in the implementation and the thought processes behind why a phased approach has lower risk. However, the need to change the backend would still be present, but hopefully mitigated through unit testing.
I'm trying to get to grips with service fabric and I'm struggling a little bit. Some questions:
are all service fabric service instances single-threaded? I created a stateless web api, one instance, with a method that did a Task.Delay, then returned a string. Two requests to this service were served one after the other, not concurrently. So am I right in thinking then that the number of concurrent requests that can be served is purely a function of the service instance count in the application manifest? Edit Thinking about this, it is probably to do with the set up of OWIN Wep Api. Could it be it is blocking by session? I assumed there is no session by default?
I have long-running operations that I need to perform in service fabric (that can take several hours). Is there a recommended pattern that I can use for this in service fabric? These are currently handled using a storage queue that triggers a webjob. Maybe something with Reliable Queues and a RunAsync loop?
It seems you handled the first part so I will comment on the second part: "long-running operations".
We can see long running operations / workflows being handled far before service fabric came about. For this reason, we can build on the shoulders of giants by looking on the design patterns that software experts have been using for decades. For example, the famous and all inclusive Process Manager. Mind you that this pattern is sometimes an overkill. If it is in your case, just check out the rest of the related patterns in the Enterprise Integration Patterns book (by Gregor Hohpe).
As for the use of reliable collections, those are implementation details when choosing a data structure supporting the chosen design pattern.
I hope that helps
With regards to your second point - It really depends on the nature of your long running task.
Is your long running task the kind of workload that runs on an isolated thread that depends on local OS/VM level resources and eventually comes back with a result (A)? or is it the kind of long running task that goes through stages and builds up a model of the result through a series of persisted state changes (B)?
From what I understand of Service Fabric, it isn't really designed for running long running workloads (A), but more for writing horizontally-scalable, highly-available systems.
If you were absolutely keen on using service fabric (and your kind of workload tends to be more like B than A) I would definitely find a way to break down those long running tasks that could be processed in parallel across the cluster. But even then, there is probably more appropriate technologies designed for this such as Azure Batch?
P.s. If you are going to put a long running process in the RunAsync method, you should design the workload so it is interruptable and its state can be persisted in a way that can be resumed from another node in the cluster
In a stateful service, only the primary replica has write access to
state and thus is generally when the service is performing actual
work. The RunAsync method in a stateful service is executed only when
the stateful service replica is primary. The RunAsync method is
cancelled when a primary replica's role changes away from primary, as
well as during the close and abort events.
P.s.s Long running operations are the devil when trying to write scalable systems. Try and tackle that now and save yourself the future pain if possibe.
To the first point - this is purely a client issue. Chrome saw my requests as indentical and so delayed the 2nd request until the 1st got a response. Varying the parameter of the requests allowed them to be served concurrently.
In order to improve my flows, I would like to test a few scenarios in which the flow/App or Integration Node is stopped while the message is still being processed (to test how transactional my flows actually are, depending on different settings). As IIB9 is fast with processing simple requests, I don't have the time to shut down the flow quickly enough.
I tried to use the debugger, but that doesn't seem to work; I cannot stop the flow or App while debugging, and shutting down the Integration Node doesn't seem to work well either.
Is there an (in-build) way to make the broker work really slowly so I have the time to shut it down? Or should I just think of a really complicated compute node to keep it occupied for a few seconds?
Any suggestions (also for the latter if that is the best option) are welcome.
Really complicated compute node will take a lot of CPU. I would prefer making flow wait for something.
Eg. A flow with HTTP request node or SOAP request node making call to external service. Make this external service take time like say 120 seconds.
We have a webservice, which will be called to provide the delivery date of the product, while purchasing in eComm website.
We are using IBM Sterling Order Management in the backend, and its OOB webservice and its OOB service.
This webservice (WSDL) is taking more time, more than 40 seconds, which create timeoutexception in other integrated systems (Middleware).
So we want to improve the performance of this webservice. Could you please help me to provide the way to improve the performance ? Will it be improved if the Server's spec has been upgraded ? As it the OOB service, we can't customize it.
First of all you need to figure out the performance bottleneck. To start with you could put a verbose trace on the OOB Webservice. Use the logs and see if you can zero-in on any particular component or sql taking consuming majority of the time. If it's sql, you can tune/baseline the OOB query/tables using indexes.
If you have any user exits implemented (for the OOB API), ensure that they are lean and aren't making any expensive API calls like changeOrder API.
One of the questions to be asked here would be if the webservice needs to respond with the actual processing results or if it could move the actual processing to the background eg: separate integration server and just respond with a simple acknowledgement of the webservice request. If the service only needs to respond with an acknowledgement you could possibly move the actual processing to a separate async service.
First try to find out where the actual problem is and hence here the few pointers,
1) Check in OMS how much time the service is taking with the same input which you are using ti invoke the webservice.
2) If from OMS end response time is fine then check the network latency/bandwidth.
3) CPU usage while hitting the webservice.