Testing transactionality of IBM Integration Bus

In order to improve my flows, I would like to test a few scenarios in which the flow/App or Integration Node is stopped while a message is still being processed (to test how transactional my flows actually are under different settings). As IIB 9 processes simple requests quickly, I don't have time to shut the flow down fast enough.
I tried to use the debugger, but that doesn't seem to work; I cannot stop the flow or App while debugging, and shutting down the Integration Node doesn't seem to work well either.
Is there a built-in way to make the broker run really slowly so I have time to shut it down? Or should I just write a really complicated Compute node to keep it occupied for a few seconds?
Any suggestions (also for the latter, if that is the best option) are welcome.

A really complicated Compute node will burn a lot of CPU. I would prefer to make the flow wait for something.
E.g. a flow with an HTTP Request node or SOAP Request node calling an external service, where the external service takes a long time to respond, say 120 seconds.
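For illustration, a deliberately slow endpoint is easy to stand up outside the broker using only the JDK's built-in com.sun.net.httpserver (the port, path, and 120-second delay here are arbitrary choices, nothing IIB-specific). Point the flow's HTTP Request node at it and you have two minutes to stop the flow, application, or Integration Node mid-transaction:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

public class SlowService {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(9099), 0);
        server.createContext("/slow", exchange -> {
            try {
                Thread.sleep(120_000);   // hold the request open for 120 seconds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "done".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();   // the non-daemon server thread keeps the JVM alive
    }
}
```

While the request node is blocked on this call, the message is still in flight, so stopping the flow at that moment exercises exactly the commit/rollback behaviour in question.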

Related

Filenet BPM Webservice receive step design considerations

We are currently designing a web-service-based process in which we will be using the web-service invoke and receive steps to communicate with a Microsoft BizTalk server.
Our main concern is that a task on the receive step can wait for some time (up to one week) until BizTalk responds to us, which (we think) would incur a performance penalty on the workflow system as it will be polling for a response.
My question is: are there any known performance considerations for the receive step, especially for leaving work items for extended periods?
No, I don't think there will be any undue "overhead". Yes, internally the process engine "polls" for just about anything, including invoking components and executing timers. But from a system perspective, you're just waiting for a request.
It sounds like a "receive" step is exactly the right solution here.

Are service fabric services entirely single-threaded?

I'm trying to get to grips with service fabric and I'm struggling a little bit. Some questions:
are all service fabric service instances single-threaded? I created a stateless web api, one instance, with a method that did a Task.Delay, then returned a string. Two requests to this service were served one after the other, not concurrently. So am I right in thinking that the number of concurrent requests that can be served is purely a function of the service instance count in the application manifest? Edit: Thinking about this, it is probably to do with the setup of the OWIN Web API. Could it be blocking by session? I assumed there is no session by default?
I have long-running operations that I need to perform in service fabric (they can take several hours). Is there a recommended pattern for this? These are currently handled using a storage queue that triggers a webjob. Maybe something with Reliable Queues and a RunAsync loop?
It seems you handled the first part, so I will comment on the second part: long-running operations.
Long-running operations and workflows were being handled long before Service Fabric came about. For this reason, we can build on the shoulders of giants by looking at the design patterns that software experts have been using for decades, for example the famous and all-inclusive Process Manager. Mind you, this pattern is sometimes overkill; if it is in your case, just check out the rest of the related patterns in the Enterprise Integration Patterns book (by Gregor Hohpe).
As for the use of Reliable Collections, those are implementation details when choosing a data structure to support the chosen design pattern.
I hope that helps.
With regard to your second point, it really depends on the nature of your long-running task.
Is your long-running task the kind of workload that runs on an isolated thread, depends on local OS/VM-level resources, and eventually comes back with a result (A)? Or is it the kind of long-running task that goes through stages and builds up a model of the result through a series of persisted state changes (B)?
From what I understand of Service Fabric, it isn't really designed for running long-running workloads (A), but more for writing horizontally scalable, highly available systems.
If you were absolutely keen on using Service Fabric (and your kind of workload tends to be more like B than A), I would definitely find a way to break those long-running tasks down into units that can be processed in parallel across the cluster. But even then, there are probably more appropriate technologies designed for this, such as Azure Batch.
P.s. If you are going to put a long-running process in the RunAsync method, you should design the workload so it is interruptible and its state can be persisted in a way that allows it to be resumed from another node in the cluster; see the sketch below the quote.
In a stateful service, only the primary replica has write access to state and thus is generally when the service is performing actual work. The RunAsync method in a stateful service is executed only when the stateful service replica is primary. The RunAsync method is cancelled when a primary replica's role changes away from primary, as well as during the close and abort events.
P.p.s. Long-running operations are the devil when trying to write scalable systems. Try and tackle that now and save yourself the future pain if possible.
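Service Fabric's RunAsync is C# territory, but the checkpoint-and-resume idea itself is language-agnostic. Below is a minimal Java sketch of the pattern only (the StateStore interface is hypothetical, standing in for whatever durable store you use, e.g. a Reliable Dictionary): do one small unit of work, persist a cursor, and honour cancellation so another node can resume where this one stopped.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ResumableWorker {
    /** Hypothetical durable store, standing in for a Reliable Dictionary. */
    interface StateStore {
        long loadCursor();              // last completed unit, or 0
        void saveCursor(long cursor);   // must be durable before returning
    }

    private final StateStore store;
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    ResumableWorker(StateStore store) { this.store = store; }

    /** Called when the replica loses primacy (RunAsync cancellation, in SF terms). */
    void cancel() { cancelled.set(true); }

    void run(long totalUnits) {
        long cursor = store.loadCursor();   // resume where the last node stopped
        while (cursor < totalUnits && !cancelled.get()) {
            processUnit(cursor);            // one small, repeatable unit of work
            cursor++;
            store.saveCursor(cursor);       // checkpoint after every unit
        }
    }

    private void processUnit(long unit) {
        // placeholder for the real work, e.g. one stage of the long task
    }
}
```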
To the first point: this is purely a client issue. Chrome saw my requests as identical and so delayed the second request until the first got a response. Varying the parameters of the requests allowed them to be served concurrently.
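To see the effect for yourself, fire the two requests from code rather than a browser and vary a query parameter. This sketch uses Java's built-in java.net.http client against a made-up local endpoint; the distinct nonce values stop the client from treating the requests as identical:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class ConcurrencyProbe {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        // Hypothetical service URL; both requests are dispatched before either completes.
        CompletableFuture<HttpResponse<String>> first = client.sendAsync(
                HttpRequest.newBuilder(URI.create("http://localhost:8080/api/values?nonce=1")).build(),
                HttpResponse.BodyHandlers.ofString());
        CompletableFuture<HttpResponse<String>> second = client.sendAsync(
                HttpRequest.newBuilder(URI.create("http://localhost:8080/api/values?nonce=2")).build(),
                HttpResponse.BodyHandlers.ofString());
        CompletableFuture.allOf(first, second).join();
        System.out.println(first.join().body());
        System.out.println(second.join().body());
    }
}
```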

Mesosphere Marathon API performance

My team and I are currently designing a system that orchestrates a large number of containers in Marathon. We need to get the current status of apps in Marathon, and we have two options here:
Poll the list of tasks through the API. Probably the GET /v2/apps/ and GET /v2/apps/{app_id} API resources would be used.
Receive realtime events from the Event Bus.
The second option seems better, but either way I would like to know how performant Marathon's API is.
How much load can Marathon take? Can it process, let's say, 1K requests per second?
PS: We want to deliver status updates to a UI. Since we can start and stop the apps, this status is dynamic in nature. Most of the apps run for only 1-2 minutes; however, some can run for as long as required unless stopped.
If you want state information, the Event Bus via the /events endpoint is probably not the right way to tackle this, because it delivers a stream of events. In practice this would mean that you have to track the overall state yourself...
I'd recommend using GET /v2/apps with an additional embed parameter; see the docs.
For example
GET /v2/apps?embed=apps.tasks
To me it's not really clear why you'd have to call this 1K times/sec. Your frontend UI will probably not be capable of doing that anyway, I guess...
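For illustration, a single poll of that endpoint from Java 11's built-in HTTP client might look like the sketch below (host and port are made up). One response carries every app plus its tasks, so the UI can diff snapshots rather than issuing 1K calls per second:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MarathonPoller {
    public static void main(String[] args) throws Exception {
        // Hypothetical Marathon endpoint; adjust host/port for your cluster.
        URI uri = URI.create("http://marathon.example.com:8080/v2/apps?embed=apps.tasks");
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // One JSON document containing every app plus its tasks; parse it and
        // diff against the previous snapshot to drive UI updates.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```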

CPU bound/stateful distributed system design

I'm working on a web application frontend to a legacy system which involves a lot of CPU-bound background processing. The application is also stateful on the server side, and the domain objects need to be held in memory across the entire session as the user operates on them via the web-based interface. Think of it as something like a web UI front end to Photoshop, where each filter can take 20-30 seconds to execute on the server side, so the app still has to interact with the user in real time while they wait.
The main problem is that each server instance can only support around 4-8 "workspace" instances at once, and I need to support a few hundred concurrent users. I'm going to be building this on Amazon EC2 to make use of the auto-scaling functionality. So to summarize, the system is:
A web application frontend to a legacy backend system
Tasks performed are CPU-bound
Stateful; most calls will be some sort of RPC, and the user will perform multiple actions that interact with the stateful objects held in server-side memory
Most tasks are semi-realtime, where they have to execute for 20-30 seconds and return the results to the user in the same session
Uses Amazon AWS auto scaling
I'm wondering what is the best way to make a system like this distributed.
Obviously I will need a web server to interact with the browser, and then send the CPU-bound tasks from the web server to a bunch of dedicated servers that do the background processing. The question is how best to hook up the two tiers for my specific needs.
I've been looking at message queue systems such as RabbitMQ, but these seem to be geared towards one-time tasks where any worker node can simply grab a job from a queue, execute it, and forget the state. My needs are a little different, since there could be multiple 'tasks' that need to be 'sticky'; for example, if step 1 is started on node 1, then step 2 for the same workspace has to go to the same worker process.
Another problem I see is that most worker queue systems seem to be geared towards background tasks that can be processed anytime, rather than a system like mine that has to provide user feedback.
My question is, is there an off the shelf solution for something like this that will allow me to easily build a system that can scale? Would love to hear your thoughts.
RabbitMQ has an RPC tutorial (sketched below). I haven't used this pattern in particular, but I am running RabbitMQ on a couple of nodes and it can handle hundreds of connections and millions of messages. With a little work in monitoring you can detect when there is more work to do than you have consumers for. Messages can also time out, so queues won't back up too greatly. To scale out capacity you can create multiple RabbitMQ nodes/clusters. You could have multiple rounds of RPC, so that after the first response you include the information required to get the second message to the correct destination.
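Roughly, the RPC pattern from that tutorial looks like this with the RabbitMQ Java client; the queue and routing-key names here are illustrative, not from the tutorial. Giving every message for one workspace the same routing key (the workspace ID) is one way to layer the required stickiness on top, provided each worker consumes its own per-workspace queue:

```java
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class RpcClient {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection();
             Channel ch = conn.createChannel()) {
            // Per-workspace queue (illustrative name); a worker must be consuming it.
            ch.queueDeclare("workspace.42", false, false, false, null);
            // Exclusive, auto-delete reply queue for this client.
            String replyQueue = ch.queueDeclare().getQueue();
            String corrId = UUID.randomUUID().toString();
            AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                    .correlationId(corrId)
                    .replyTo(replyQueue)
                    .build();
            // Publishing via the default exchange routes to the queue of the same
            // name, so all steps for workspace 42 land on the same worker.
            ch.basicPublish("", "workspace.42", props,
                    "step1".getBytes(StandardCharsets.UTF_8));

            BlockingQueue<String> response = new ArrayBlockingQueue<>(1);
            ch.basicConsume(replyQueue, true, (consumerTag, delivery) -> {
                if (corrId.equals(delivery.getProperties().getCorrelationId())) {
                    response.offer(new String(delivery.getBody(), StandardCharsets.UTF_8));
                }
            }, consumerTag -> { });
            System.out.println("result: " + response.take());   // blocks until the worker replies
        }
    }
}
```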
0MQ has this as a basic pattern, which will fan out work as needed. I've only played with it, but it is simpler to code and possibly simpler to maintain (it doesn't need a broker, though devices can provide one). It may not handle stickiness by default, but it should be possible to write your own routing layer to handle it.
Don't discount HTTP for this either. When you want request/reply, strict throughput per backend node, and something that scales well, HTTP is well supported. With AWS you can easily put their ELB in front of an auto-scaling group to provide the routing from frontend to backend. ELB supports sticky sessions as well.
I'm a big fan of RabbitMQ but if this is the whole scope then HTTP would work nicely and have fewer moving parts in AWS than the other solutions.

Spring Batch Parallel Job Scaling

I'm currently working on a Spring Batch POC and have a pretty good handle on most of the actual Spring Batch features. I've got a program that uses Spring Integration to receive an HttpRequest and uses message channels to eventually send the job executions to the job launcher in a queue. What we'd really like to do is implement some kind of "scheduler/load balancer" (not quite sure what to call it) before the job launcher that will look at the currently running worker nodes and the size of the input file and decide how many worker nodes the job should be allowed. We would probably also want to be able to change the number of worker nodes a job has while it is running, to allow more jobs to run.
The idea is that we'd have a server running that could accept many job requests at any time, and a large cluster of machines that jobs will be partitioned onto. We'd like to be able to scale horizontally, so whenever the server isn't busy it can make full use of the hardware, as well as being able to make sure that small jobs don't get constantly blocked by larger jobs.
From my research it seems like we'd have to implement another framework to do this (do GridGain and Hadoop allow this?), but I figured I'd ask to see what people recommended to do something like this, and if there's a way to do it without implementing another large framework.
Sorry if anything is unclear or confusing, I'm just a lowly intern who started learning Spring and Spring Batch last month and I'm far from completely understanding everything, especially this scaling stuff. Just ask and I'll try to clear things up.
Thanks for any help!
Take a look at the 'spring-batch-integration' project under the spring-batch-admin umbrella project: https://github.com/SpringSource/spring-batch-admin
It has a number of examples of using Spring Integration to distribute work to other nodes. In particular, see the chunk and partition packages. Just swap out the Spring Integration channels with JMS channel adapters. By distributing work partitions via JMS, you can scale out the number of worker nodes as needed.
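As a rough sketch of the partitioning half (the class and key names are mine, not from that project): a custom Partitioner can size the grid from the input file, and spring-batch-integration's MessageChannelPartitionHandler can then ship each partition over JMS to whichever workers are free:

```java
import java.util.HashMap;
import java.util.Map;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class FileSizePartitioner implements Partitioner {
    private final long fileLength;          // bytes in the input file
    private final long bytesPerPartition;   // target work per worker

    public FileSizePartitioner(long fileLength, long bytesPerPartition) {
        this.fileLength = fileLength;
        this.bytesPerPartition = bytesPerPartition;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        // Derive the partition count from file size rather than the fixed gridSize,
        // so big files fan out to more workers and small jobs stay small.
        int partitions = (int) Math.max(1, Math.min(gridSize, fileLength / bytesPerPartition));
        Map<String, ExecutionContext> result = new HashMap<>();
        long chunk = fileLength / partitions;
        for (int i = 0; i < partitions; i++) {
            ExecutionContext ctx = new ExecutionContext();
            ctx.putLong("minOffset", i * chunk);
            ctx.putLong("maxOffset", (i == partitions - 1) ? fileLength : (i + 1) * chunk);
            result.put("partition" + i, ctx);
        }
        return result;
    }
}
```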
There are a number of threads on this subject in the spring integration forum; search for 'PartitionHandler'.
Hope that helps.
