Mesosphere Marathon API performance - performance

I'm currently designing with my team a system, that orchestrates a large number of containers in Marathon. We need to get current status of apps in Marathon and have here two options:
Poll the list of tasks through the API. Probably GET /v2/apps/ and GET /v2/apps/{app_id} API resources are going to be used.
Receive realtime events from the Event Bus.
Well, the second option seems to be more optimal, but anyway, I would like to know, how performant is Marathon's API.
How much load can Marathon take? Can it process, let's say, 1K requests per second?
PS: We want to deliver status updates into UI. Since we can start and stop the apps this status is of dynamic in nature. Most of the apps run only for 1-2 minutes, however, some can run for as long as it is required unless stopped.

If you want state information, the Event Bus via the /events endpoint is probably not the right way to takle this, because if delivers a stream of events. Actual this would mean that you have to track the overall state yourself...
I'd recommend to use GET /v2/apps with an additional embed parameter, see the docs.
For example
GET /v2/apps?embed=apps.tasks
To me it not really clear why you'd have to call this 1k times/sec. Your frontend UI will probably not be capable of doing that I guess...

Related

Are service fabric services entirely single-threaded?

I'm trying to get to grips with service fabric and I'm struggling a little bit. Some questions:
are all service fabric service instances single-threaded? I created a stateless web api, one instance, with a method that did a Task.Delay, then returned a string. Two requests to this service were served one after the other, not concurrently. So am I right in thinking then that the number of concurrent requests that can be served is purely a function of the service instance count in the application manifest? Edit Thinking about this, it is probably to do with the set up of OWIN Wep Api. Could it be it is blocking by session? I assumed there is no session by default?
I have long-running operations that I need to perform in service fabric (that can take several hours). Is there a recommended pattern that I can use for this in service fabric? These are currently handled using a storage queue that triggers a webjob. Maybe something with Reliable Queues and a RunAsync loop?
It seems you handled the first part so I will comment on the second part: "long-running operations".
We can see long running operations / workflows being handled far before service fabric came about. For this reason, we can build on the shoulders of giants by looking on the design patterns that software experts have been using for decades. For example, the famous and all inclusive Process Manager. Mind you that this pattern is sometimes an overkill. If it is in your case, just check out the rest of the related patterns in the Enterprise Integration Patterns book (by Gregor Hohpe).
As for the use of reliable collections, those are implementation details when choosing a data structure supporting the chosen design pattern.
I hope that helps
With regards to your second point - It really depends on the nature of your long running task.
Is your long running task the kind of workload that runs on an isolated thread that depends on local OS/VM level resources and eventually comes back with a result (A)? or is it the kind of long running task that goes through stages and builds up a model of the result through a series of persisted state changes (B)?
From what I understand of Service Fabric, it isn't really designed for running long running workloads (A), but more for writing horizontally-scalable, highly-available systems.
If you were absolutely keen on using service fabric (and your kind of workload tends to be more like B than A) I would definitely find a way to break down those long running tasks that could be processed in parallel across the cluster. But even then, there is probably more appropriate technologies designed for this such as Azure Batch?
P.s. If you are going to put a long running process in the RunAsync method, you should design the workload so it is interruptable and its state can be persisted in a way that can be resumed from another node in the cluster
In a stateful service, only the primary replica has write access to
state and thus is generally when the service is performing actual
work. The RunAsync method in a stateful service is executed only when
the stateful service replica is primary. The RunAsync method is
cancelled when a primary replica's role changes away from primary, as
well as during the close and abort events.
P.s.s Long running operations are the devil when trying to write scalable systems. Try and tackle that now and save yourself the future pain if possibe.
To the first point - this is purely a client issue. Chrome saw my requests as indentical and so delayed the 2nd request until the 1st got a response. Varying the parameter of the requests allowed them to be served concurrently.

CPU bound/stateful distributed system design

I'm working on a web application frontend to a legacy system which involves a lot of CPU bound background processing. The application is also stateful on the server side and the domain objects needs to be held in memory across the entire session as the user operates on it via the web based interface. Think of it as something like a web UI front end to photoshop where each filter can take 20-30 seconds to execute on the server side, so the app still has to interact with the user in real time while they wait.
The main problem is that each instance of the server can only support around 4-8 instances of each "workspace" at once and I need to support a few hundreds of concurrent users at once. I'm going to be building this on Amazon EC2 to make use of the auto scaling functionality. So to summarize, the system is:
A web application frontend to a legacy backend system
task performed are CPU bound
Stateful, most calls will be some sort of RPC, the user will make multiple actions that interact with the stateful objects held in server side memory
Most tasks are semi-realtime, where they have to execute for 20-30 seconds and return the results to the user in the same session
Use amazon aws auto scaling
I'm wondering what is the best way to make a system like this distributed.
Obviously I will need a web server to interact with the browser and then send the cpu-bound tasks from the web server to a bunch of dedicated servers that does the background processing. The question is how to best hook up the 2 tiers together for my specific neeeds.
I've been looking at message Queue systems such as rabbitMQ but these seems to be geared towards one time task where any worker node can simply grab a job form a queue, execute it and forget the state. My needs are a little different since there could be multiple 'tasks' that needs to be 'sticky', for example if step 1 is started in node 1 then step 2 for the same workspace has to go to the same worker process.
Another problem I see is that most worker queue systems seems to be geared towards background tasks that can be processed anytime rather than a system that has to provide user feedback that I'm dealing with.
My question is, is there an off the shelf solution for something like this that will allow me to easily build a system that can scale? Would love to hear your thoughts.
RabbitMQ is has an RPC tutorial. I haven't used this pattern in particular but I am running RabbitMQ on a couple of nodes and it can handle hundreds of connections and millions of messages. With a little work in monitoring you can detect when there is more work to do then you have consumers for. Messages can also timeout so queues won't backup too greatly. To scale out capacity you can create multiple RabbitMQ nodes/clusters. You could have multiple rounds of RPC so that after the first response you include the information required to get second message to the correct destination.
0MQ has this as a basic pattern which will fanout work as needed. I've only played with this but it is simpler to code and possibly simpler to maintain (as it doesn't need a broker, devices can provide one though). This may not handle stickiness by default but it should be possible to write your own routing layer to handle it.
Don't discount HTTP for this as well. When you want request/reply, a strict throughput per backend node, and something that scales well, HTTP is well supported. With AWS you can use their ELB easily in front of an autoscaling group to provide the routing from frontend to backend. ELB supports sticky sessions as well.
I'm a big fan of RabbitMQ but if this is the whole scope then HTTP would work nicely and have fewer moving parts in AWS than the other solutions.

Policy for EC2 and ELB based on number of transcoding processes on each instance

I need to transcode massive number of audio files on a series of auto-scaling instances behind an ELB. The core of transcoding script is based on Node.Js and FFMPEG. Queuing is impossible because users are not patience! I need to control the number of transcodings on each instance to avoid CPU 100% problem.
My questions:
A- Is there any way to define a policy for ELB to control the number of connections to each instance? if not is there any parameter to control average CPU utilization on each instance and add a new one after triggering level? (I have found this slide but it is not complete) If it adds a new instance on the fly how much it takes time the new instance be 100% operative to serve the user ( I mean does auto scaling have long latency?)
B- Is there another alternative architecture to achieve same transcoding solution? (I have included my current idea to this answer as a drawing). I can not use third party solutions like Transcoding.com I need to have my native solution.
C- I use node.js for each instance and by socket to the user browser show progress. From browser side I send regularly some ajax request to the node.js side to get the progress information. Does this mechanism has problem with sticky session?
Thanks you.
If your scaling needs to take place in response to individual requests on the server (i.e. a single request would require X number of machines to execute in desired timeframe), then autoscaling is probably not going to be the answer for you, as you will have delay as the new instances become active. You will also potentially have much higher cost to run service in such manner as you could scale up and time a number of times in response to individual request, charging you for one hour minimum for each instance that is started.
If however you are concerned with autoscaling, to for example, increase your fleet 50% during peak times when you get request volume spikes (i.e. you already have many servers serving many requests, but you just need to keep latency down during peak hours by adding more instances), then autoscaling should probably work just fine for you.
There are any number of triggers you can configure to control scaling events in such a case.
ELB does support session affinity ("sticky" sessions).
You will want to use an AWS SDK. Normally you'd use one of the official ones for C#, Ruby etc. Since you're on node.js, try using this SDK on github to monitor, throttle and create instance connection pools etc.
https://github.com/awssum/awssum
there's also AWS2JS
https://github.com/SaltwaterC/aws2js

Heroku: web dyno vs. worker dyno? How many/what ratio do I need?

I was curious as to what the difference between web and worker dynos is on Heroku. They give a one sentence explanation on their pricing page, but this just left me confused. How do I know how many to pick of each? Is there a ratio I should aim for? I'm pretty new to this stuff, so can someone give an in depth explanation, or maybe some sort of way I can calculate how many and which kind of dynos I would need?
Also, I'm confused about what they mean by the amount of hours for each dyno.
http://www.heroku.com/pricing
I also happened upon this article. As one of their suggested solutions, they said to increase the amount of dynos. Which type of dyno are they referring to here?
http://devcenter.heroku.com/articles/backlog-too-deep
Your best indication if you need more dynos (aka processes on Cedar) is your heroku logs. Make sure you upgrade to expanded logging (it's free) so that you can tail your log.
You are looking for the heroku.router entries and the value you are most interested is the queue value - if this is constantly more than 0 then it's a good sign you need to add more dynos. Essentially this means than there are more requests coming in than your process can handle so they are being queued. If they are queued too long without returning any data they will be timed out.
There's no ideal ratio I'm afraid, you could have an app doing 100 requests a second needing many web processes but just doesn't make use of workers. You only need worker processes if you are doing processing in the background like sending emails etc etc.
ps Backlog too deep would be a Dyno web process that would cause it.
UPDATE: On March 26 2013 Heroku removed queue and wait fields from the log out put.
queue and wait fields have been removed from router log messages.
Also, the Heroku router no longer sets X-Heroku-Dynos-In-Use,
X-Heroku-Queue-Depth and X-Heroku-Queue-Wait-Time HTTP headers for
incoming requests.
Dynos are basically processes that run on your instance. With the new Cedar stack, they can be set up to execute any arbitrary shell command. For web applications, you generally have one process called "web" which is responsible for responding to HTTP requests from users. All other processes are what were previously called "workers." These run continuously in the background for things like cron, processing queues, and any heavy computation that you don't want to tie up your web processes. You can also scale each type of process, so that multiple processes of each type will be booted up for additional concurrency. The amount of each that you use really depends on the needs of your application and the load it receives. You can use tools like the New Relic plugin to monitor these things. Take a look at the articles on the Process Model and the Procfile in Heroku's dev center for more details.
A number of people have mentioned that there is no known ratio and that the ratio of web workers to 'background' workers you will want is dependent on how you designed your application - that is correct. However I thought it might be useful to add that as a general rule of thumb, you want your web workers - and thus the controller actions they are serving - to be lightning quick and very lightweight, to reduce latency in response times from browser actions. If there is some browser action that would require more than, say, about a half a second of real time to serve, then you will probably want to architect some sort of system that pushes the bulk of that action on to a queue.
You would then design an offline worker dyno(s) that will service this queue. They can take much longer because there are no HTTP responses pending on their output. Perhaps the page you rendered from the initial browser request that pushed the action will serve some Javascript that starts a thread that checks to see if the request has finished every 5 seconds, or something along those lines.
I still can't give you a ratio to work with for the same reason others have given, but hopefully this helps you decide how to architect your app. (I should also mention this is just one design out of many valid ones.)
https://stackoverflow.com/a/19965981/1233555 - Heroku has gone to random routing, so some dynos can have queues stacking up (while they serve a lengthy request) while other dynos are free. Avoid this by making sure that all requests are handled very quickly in your web dynos. This will reduce the number of web dynos you need, while requiring more worker dynos.
You also need to care about your web app supporting concurrency, which only some Rails configs do - try Unicorn, or carefully-written code (for I/O that doesn't block the EventMachine) with Thin.
You probably have to try, rather than calculate, to see how many dynos of each kind you need. Make sure their New Relic reports the dyno queue - see the above link.
Short answer is that you need as many as you need to keep your queues down.
As John describes, if you start seeing a queue in your logs then you need more dynos. If you start seeing your background queues getting too long (how you get this info is dependant on what you have implemented) then you need more workers.
There is no ratio as it's very much dependent on your application design and usage.

What Windows API to look into for building a scheduling application?

Why not use the Windows scheduler?
I have several applications that have to run at certain times according to business rules not the typical every weekday at 1pm.
I also need a way for the applications to provide feedback of their progress so that I can have rules that notify me when the applications are running slow or aren't even running anymore.
What Windows API should I be looking into? (like, a time version of the FileWatcher apis)
What's the best way to have the application notify the scheduler of its progress (files, sockets, windows messages, ???)?
For Vista/Win2k8, there's the nice Task Scheduler 2.0 API: http://msdn.microsoft.com/en-us/library/aa384138(VS.85).aspx. Previous version have the Task Scheduler 1.0 API, but I've never used it.
AppControls has a CronJob component that you can use to create scheduled events. This saves your program from having to wake up every minute and check the schedule itself. Instead, just schedule the job and indicate a callback method.
I have used this component for scheduling jobs myself and have been very happy with the way that it works.
I think what you really want is a common framework for your applications that report to something (you or the system messages or tracing or perfmon, event log, whatever) and also to receive via some inter process protocol a way to receive messages and respond.
based on the reporting you can change the scheduling or make changes, etc.
So, there is some monitor app, and then each of your other apps does common reporting.
events I can think of:
- started
- stopped
- error
- normal log messages
- and of course specific things your apps do.
I think there are probably existing classes/framework that do this - you'll have to check around.
If it were me, I would make a service that could talk to all the other apps and perhaps was even an http server. It would be able to route messages to particular apps and start stop those processes and query them.
There are lots of ways to do what you want though. those were just off the top of my head.
Alternatively you might just be able to get these to be services and they handle messages sent to them. Their normal processing does nothing until they are "woken up" with some task command.
You have more questions in one. Normally you should split them. But let's overlook this and try to answer.
To schedule certain events (including running an application): Use TJvScheduledEvents from JVCL. IMHO JVCL is the best Delphi open source library around with extensive number of components, developers & support. TJvScheduledEvents is quite neat, uses threads for event scheduling and also you have in JVCL a detailed editor for your events (it needs a small hack to use it though).
To provide 'feedback' from your applications to a (remote) central point: A very very very good solution (if your requirements permit) is to log the progress of your applications in a table (let's call it LOG) on a Firebird server. In LOG you can have the following fields: COMPUTER, USERNAME, APPNAME, MSG, LOGDATE (etc. etc.). In the After Insert trigger of the LOG table you can fire an event (let's call it NEW_LOG). In your console app you can register the interest for this event and so, your application will be automatically updated with everything which happens in any of your applications, so you can do log analysis, graphs etc. Of course you can do it with IB, but IB costs.
...going on Windows API route you need headers (which probably aren't translated), you'll encounter our dearest Pointers/PChars etc. etc. Of course, building from scratch everything isn't worthwhile but when this is already done in a Delphi way, why don't use it?
Use service with a timer that is fired regulary (for example each minute). It reads the schedule and looks if some are due before the next iteration. If so, you can execute them.
You can add an interface that shows all running apps. For the feedback and query that using a desktop application.

Resources