Initial call on ServiceFabric proxy is VERY slow - performance

Whenever I call one Service Fabric service from another, the first call on the proxy is VERY slow, i.e. 100x slower than all subsequent calls. I've added timings that record the time immediately before the call and the time immediately on entering the service method being called, and the difference can easily be over 60 seconds! The Service Fabric cluster is a standalone cluster running on 12 nodes/VMs.
Interestingly, the length of time the first call takes seems to be related to the number of nodes, i.e. if I deactivate half the nodes the time is reduced (though not by half). Also, when running the exact same code on a dev cluster on my local PC, the first call typically takes around 8 seconds, with subsequent calls taking < 10 ms on either system. In addition, creating another proxy to the same service in the same client process still results in fast call times. It seems as if the proxy factory (which I believe SF caches per client process) is created on first use of the proxy and takes a very long time.
Interestingly no exceptions are thrown and the services actually work!
So my question is, why does it take so long the first time a call is made from one service to another on a proxy created with ServiceProxy.Create()?

According to the SF remoting docs (quoted below), ServiceProxy.Create is a wrapper around ServiceProxyFactory, and the first call also involves setting up the factory for subsequent calls.
ServiceProxyFactory is a factory that creates proxy for different remoting interfaces. If you use API ServiceProxy.Create for creating proxy, then framework creates the singleton ServiceProxyFactory. It is useful to create one manually when you need to override IServiceRemotingClientFactory properties. Factory is an expensive operation. ServiceProxyFactory maintains cache of communication client. Best practice is to cache ServiceProxyFactory for as long as possible.
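To make the quoted caching behaviour concrete, here is a minimal Java sketch under stated assumptions. `CachedProxyFactory`, `CommunicationClient` and `Proxy` are hypothetical names, not Service Fabric types; the sketch only mirrors the documented idea that one process-wide factory pays its expensive setup once and keeps a cache of communication clients, so further proxies to an already-resolved address are cheap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a process-wide remoting factory (NOT the Service Fabric API).
final class CachedProxyFactory {
    // Counters are declared before INSTANCE so they exist when the constructor runs.
    static final AtomicInteger factorySetups = new AtomicInteger();
    static final AtomicInteger clientConnects = new AtomicInteger();

    // Created once per process, on first use of the class, and then reused.
    private static final CachedProxyFactory INSTANCE = new CachedProxyFactory();

    // Cache of communication clients keyed by service address.
    private final Map<String, CommunicationClient> clients = new ConcurrentHashMap<>();

    private CachedProxyFactory() {
        factorySetups.incrementAndGet(); // the expensive one-time setup would go here
    }

    static CachedProxyFactory get() {
        return INSTANCE;
    }

    // Cheap: reuses the cached client for this address if one already exists.
    Proxy createProxy(String serviceAddress) {
        return new Proxy(clients.computeIfAbsent(serviceAddress, CommunicationClient::new));
    }
}

final class CommunicationClient {
    final String address;

    CommunicationClient(String address) {
        this.address = address;
        CachedProxyFactory.clientConnects.incrementAndGet(); // resolve + connect would go here
    }
}

final class Proxy {
    private final CommunicationClient client;

    Proxy(CommunicationClient client) {
        this.client = client;
    }

    CommunicationClient client() {
        return client;
    }
}
```

In real code the framework's own ServiceProxyFactory plays this role; the sketch only shows why the cost lands on the first call per process and why keeping the factory alive matters.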

I have not experienced slow resolution anywhere near what you have, however I create my proxies when my API service starts up using dependency injection.
The way I have my system set up is that the stateless API service (ASP.NET Core) communicates with the backend SF services.
It's possible that I am actually experiencing a longer delay, but by the time I go to use the application the resolution process has already started and finished, rather than the resolution starting when I make the first request to the app.
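The same effect can be produced deliberately: start the expensive initialization in the background when the service starts, so that it overlaps with startup rather than with the first request. A minimal Java sketch, assuming a hypothetical `WarmedResource` wrapper around an arbitrary `Supplier` that performs the slow setup (for example, creating a proxy and making one cheap call). This is a separate technique from the container registration in this answer, not a description of it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical wrapper: starts an expensive initialization as soon as it is constructed
// (e.g. at service startup) instead of on the first request that needs it.
final class WarmedResource<T> {
    private final CompletableFuture<T> future;

    WarmedResource(Supplier<T> expensiveInit) {
        // Runs expensiveInit on a background thread from the common ForkJoinPool.
        this.future = CompletableFuture.supplyAsync(expensiveInit);
    }

    // Blocks only if warm-up has not finished yet; afterwards returns immediately.
    // If warm-up failed, join() rethrows the failure wrapped in a CompletionException.
    T get() {
        return future.join();
    }

    // True once warm-up has completed successfully.
    boolean isReady() {
        return future.isDone() && !future.isCompletedExceptionally();
    }
}
```

A service would construct this once at startup and call `get()` from its request handlers; only requests that arrive before warm-up finishes have to wait.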
private void InitializeContainer(IApplicationBuilder app)
{
    // Container is assumed to be a Simple Injector container; FabricUrl is assumed to hold the fabric:/ service URIs.
    // Add application presentation components:
    // Add application services.
    Container.Register(() => ServiceProxy.Create<IContestService>(FabricUrl.ContestService), Lifestyle.Transient);
    Container.Register(() => ServiceProxy.Create<IFriendService>(FabricUrl.FriendService), Lifestyle.Transient);
    Container.Register(() => ServiceProxy.Create<IUserService>(FabricUrl.UserService), Lifestyle.Transient);
    Container.Register(() => ServiceProxy.Create<IBillingService>(FabricUrl.BillingService), Lifestyle.Transient);
    // Cross-wire ASP.NET services (if any). For instance:
    // NOTE: Prevent cross-wired instances as much as possible.
    // See:
}


Is there a way to update a cached in-memory value on all running instances of a serverless function? (AWS, Google, Azure or OpenWhisk)

Suppose I am running a serverless function with a global state variable that is cached in memory. Assuming the value is cached on multiple running instances, how would an update to the global state be broadcast to every serverless instance?
Is this possible in any of the serverless frameworks?
It depends on the serverless framework you're using, which makes it hard to give a useful answer on Stack Overflow. You'll have to research each of them. And you'll have to review this over time because their underlying implementations can change.
In general, you will be able to achieve your goal as long as you can open up a bidirectional connection from each function instance so that your system outside the function instances can send them updates when it needs to. This is because you can't just send a request and have it reach every backing instance. The serverless frameworks are specifically designed to not work that way. They load balance your requests to the various backing instances. And it's not guaranteed to be round robin, so there's no way for you to be confident you're sending enough duplicate requests for each of the backing instances to have been hit at least once.
However, there is something else built into most serverless frameworks that may stop you, even if you can open long-lived connections from each instance that allow them to be reliably messaged at least once each. To help keep resources available for functions that need them, inactive functions are often "paused" in some way. Again, each framework will have its own way of doing this.
For example, OpenWhisk has a configurable "grace period" where it allows CPU to be allocated only for a small period of time after the last request for a container. OpenWhisk calls this pausing and unpausing containers. When a container is paused, no CPU is allocated to it, so background processing (like if it's Node.js and you've put something onto the event loop with setInterval) will not run and messages sent to it from a connection it opened will not be responded to.
This will prevent your updates from reliably going out unless you have constant activity that keeps every OpenWhisk container not only warm, but unpaused. And, it goes against the interests of the folks maintaining the OpenWhisk cluster you're deploying to. They will want to pause your container as soon as they can so that the CPU it consumed can be allocated to containers not yet paused instead. They will try to tune their cluster so that containers remain unpaused for a duration as short as possible after a request/event is handled. So, this will be hard for you to control unless you're working with an OpenWhisk deployment you control, in which case you just need to tune it according to your needs.
Network restrictions that interfere with your ability to open these connections may also prevent you from using this architecture.
You should take these factors into consideration if you plan to use a serverless framework and consider changing your architecture if you require global state that would be mutated this way in your system.
Specifically, you should consider switching to a stateless design where instead of caching occurring in each function instance, it occurs in a shared service designed for fast caching, like Redis or Memcached. Then each function can check that shared caching service for the data before retrieving it from its source. Many cloud providers who provide serverless compute options also provide managed databases like these. So you can often deploy it all to the same place.
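A minimal cache-aside sketch in Java, assuming a hypothetical `SharedCache` interface in front of something like Redis or Memcached; `InMemorySharedCache` is only a stand-in so the example runs without a server, and a real client library would replace it. Each function instance reads through the shared cache, so an update written there once is seen by every instance without any broadcast.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical abstraction over a shared cache service (Redis, Memcached, ...).
interface SharedCache {
    Optional<String> get(String key);
    void put(String key, String value);
}

// In-memory stand-in so the sketch runs without a cache server.
final class InMemorySharedCache implements SharedCache {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    public Optional<String> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    public void put(String key, String value) {
        data.put(key, value);
    }
}

// Cache-aside: check the shared cache first; on a miss, load from the source and store it.
final class CacheAsideReader {
    private final SharedCache cache;
    private final Function<String, String> source;

    CacheAsideReader(SharedCache cache, Function<String, String> source) {
        this.cache = cache;
        this.source = source;
    }

    String read(String key) {
        return cache.get(key).orElseGet(() -> {
            String value = source.apply(key);
            cache.put(key, value);
            return value;
        });
    }
}
```

Expiry and invalidation are left out; a real deployment would typically set a TTL on each key in the cache service.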
Alternatively, if you don't switch to a stateless design, you could switch to a pull model for caching instead of a push model. Instead of having updates pushed out to each function instance to refresh its cached data, each function would pull fresh data from the source when it detects that the data stored in its memory has expired.
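A minimal pull-model sketch in Java: each instance keeps its own copy and re-fetches it from the source once the copy is older than a time-to-live. `ExpiringValue` and the injectable clock are assumptions made for illustration; passing `System::currentTimeMillis` as the clock makes it use wall time. Staleness is bounded by the TTL rather than eliminated, which is the trade-off of pulling instead of pushing.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical pull-model cache for one value held in a function instance's memory.
final class ExpiringValue<T> {
    private final Supplier<T> source;     // fetches fresh data from the source of truth
    private final long ttlMillis;         // how long a cached copy counts as fresh
    private final LongSupplier nowMillis; // clock, e.g. System::currentTimeMillis
    private T value;
    private long loadedAt;
    private boolean loaded;

    ExpiringValue(Supplier<T> source, long ttlMillis, LongSupplier nowMillis) {
        this.source = source;
        this.ttlMillis = ttlMillis;
        this.nowMillis = nowMillis;
    }

    // Returns the cached value, pulling a fresh one first if it is missing or expired.
    synchronized T get() {
        long now = nowMillis.getAsLong();
        if (!loaded || now - loadedAt >= ttlMillis) {
            value = source.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```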

Azure Functions - Java CosmosClientBuilder slow on initial connection

We're using Azure Functions with the Java SDK and connect to Cosmos DB using the following Java API:
CosmosClient client = new CosmosClientBuilder()
The buildClient() call starts a connection to Cosmos DB, which takes 2 to 3 seconds.
The subsequent database queries using that client are fast.
Only this first setup of the connection is pretty slow.
We keep the CosmosClient as a static variable, so we can reuse it between multiple http requests that go to our function.
But once the function goes cold (Azure shuts it down after a few minutes of inactivity), the static variable is lost and the client has to reconnect when the function starts up again.
Is there a way to make this initial connection to cosmos DB faster?
Or do you think we need to increase the time a function stays online, if we need faster response times?
This is expected behavior, see
The first request a client does needs to go through a warm-up step. This warm-up consists of fetching the account information, container information, routing and partitioning information in order to know where to route the requests (as you experienced, further requests do not get this extra latency). Hence the importance of maintaining a singleton instance.
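In Java, one common way to keep exactly one lazily built client per JVM is the initialization-on-demand holder idiom. A minimal sketch, assuming a hypothetical `SlowClient` that stands in for the `CosmosClient` (the real one would be built with `CosmosClientBuilder` inside the holder). The static instance survives across invocations handled by the same warm instance, but a cold start creates a fresh JVM and pays the warm-up again.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive client such as CosmosClient.
final class SlowClient {
    static final AtomicInteger builds = new AtomicInteger();

    SlowClient() {
        builds.incrementAndGet(); // account/routing metadata fetch would happen here
    }
}

final class ClientHolder {
    private ClientHolder() {}

    // The nested class is initialized by the JVM, thread-safely, on first access only.
    private static final class Lazy {
        static final SlowClient INSTANCE = new SlowClient();
    }

    static SlowClient client() {
        return Lazy.INSTANCE;
    }
}
```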
In some Functions plans (such as Consumption), instances get de-provisioned if there is no activity. In that case any existing client instance is destroyed, so when a new instance is provisioned, its first request pays this warm-up cost.
There is currently no workaround I'm aware of in the Java SDK, but this should not affect your P99 latency since it's just the first request on a cold client.
Hope this and the video help with the reason.

Are service fabric services entirely single-threaded?

I'm trying to get to grips with service fabric and I'm struggling a little bit. Some questions:
Are all Service Fabric service instances single-threaded? I created a stateless Web API service with one instance and a method that did a Task.Delay, then returned a string. Two requests to this service were served one after the other, not concurrently. So am I right in thinking that the number of concurrent requests that can be served is purely a function of the service instance count in the application manifest? Edit: Thinking about this, it is probably to do with the setup of OWIN Web API. Could it be blocking per session? I assumed there is no session by default?
I have long-running operations that I need to perform in service fabric (that can take several hours). Is there a recommended pattern that I can use for this in service fabric? These are currently handled using a storage queue that triggers a webjob. Maybe something with Reliable Queues and a RunAsync loop?
It seems you handled the first part so I will comment on the second part: "long-running operations".
Long-running operations/workflows were being handled long before Service Fabric came about. For this reason, we can stand on the shoulders of giants by looking at the design patterns that software experts have been using for decades, for example the famous and all-inclusive Process Manager. Mind you, this pattern is sometimes overkill. If it is in your case, check out the related patterns in the Enterprise Integration Patterns book (by Gregor Hohpe).
As for the use of reliable collections, those are implementation details when choosing a data structure supporting the chosen design pattern.
I hope that helps
With regard to your second point: it really depends on the nature of your long-running task.
Is your long-running task the kind of workload that runs on an isolated thread, depends on local OS/VM-level resources and eventually comes back with a result (A)? Or is it the kind that goes through stages and builds up a model of the result through a series of persisted state changes (B)?
From what I understand of Service Fabric, it isn't really designed for running long running workloads (A), but more for writing horizontally-scalable, highly-available systems.
If you are absolutely keen on using Service Fabric (and your workload is more like B than A), I would definitely find a way to break those long-running tasks down into pieces that can be processed in parallel across the cluster. But even then, there are probably more appropriate technologies designed for this, such as Azure Batch.
P.S. If you are going to put a long-running process in the RunAsync method, you should design the workload so that it is interruptible and its state is persisted in a way that lets it be resumed from another node in the cluster:
In a stateful service, only the primary replica has write access to state and thus is generally when the service is performing actual work. The RunAsync method in a stateful service is executed only when the stateful service replica is primary. The RunAsync method is cancelled when a primary replica's role changes away from primary, as well as during the close and abort events.
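The interruptible, resumable design recommended above can be sketched as follows. This is a minimal Java illustration under stated assumptions: `CheckpointStore` is a hypothetical in-memory stand-in for durable state (in a stateful service that role would fall to replicated state such as reliable collections), and `cancelled` plays the part of the cancellation token passed to RunAsync. Because progress is saved after each item, a crash between processing and saving replays that item, so item processing should be idempotent.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical checkpoint store. In-memory here so the sketch runs; in a real stateful
// service this would have to be replicated, durable state.
final class CheckpointStore {
    private final Map<String, Integer> nextIndexByJob = new ConcurrentHashMap<>();

    int load(String jobId) {
        return nextIndexByJob.getOrDefault(jobId, 0);
    }

    void save(String jobId, int nextIndex) {
        nextIndexByJob.put(jobId, nextIndex);
    }
}

final class ResumableJob {
    private ResumableJob() {}

    // Processes items starting from the last checkpoint, saving progress after each item
    // and stopping as soon as cancellation is observed. Returns the next unprocessed index.
    static int run(String jobId, List<String> items, CheckpointStore store,
                   BooleanSupplier cancelled, Consumer<String> processItem) {
        int next = store.load(jobId);
        while (next < items.size() && !cancelled.getAsBoolean()) {
            processItem.accept(items.get(next)); // may be replayed after a crash
            next++;
            store.save(jobId, next);
        }
        return next;
    }
}
```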
P.P.S. Long-running operations are the devil when trying to write scalable systems. Try to tackle that now and save yourself future pain if possible.
To the first point: this is purely a client issue. Chrome saw my requests as identical and so delayed the second request until the first got a response. Varying the parameters of the requests allowed them to be served concurrently.

WCF Service - Startup takes extra time

I find that my WCF service takes 8-10 seconds to respond to the first hit. After that it takes less than a second.
Any thoughts?
Probably due to .NET's cold start. Have you looked at setting up the IIS Application Warm-Up module, which initializes dependencies before the first request?
From the Learn IIS website
Decrease the response time for first requests by pre-loading worker processes. The IIS Application Warm-Up module lets you configure the Web application to be pre-loaded before the first request arrives so that the worker process responds to the first Web request more quickly.
Increase reliability by pre-loading worker processes when overlapped recycling occurs. Because the recycled worker process in an overlapped recycling scenario only communicates its readiness and starts accepting requests after it finishes loading and initializing the resources as specified by the configuration, pre-loading the dependencies reduces the response times for the first requests.
Customize the pre-loading of applications. You can configure the IIS Application Warm-Up module to initialize Web applications by using specific Web pages and user identities. This makes it possible to create specific initialization processes that can be executed synchronously or asynchronously, depending on the initialization logic. In addition, these procedures can use specific identities to ensure a proper initialization.

performance of accessing a mono server application via remoting

This is my setting: I have written a .NET application for local client machines, which implements a feature that could also be used on a webpage. To keep this example simple, assume that the client installs a software into which he can enter some data and gets some data back.
The idea is to create a webpage that holds a form into which the user enters the same data and gets the same results back as above. Due to the company's available web servers, the first idea was to create a mono webservice, but this was dismissed for reasons unknown. The "service" is not to be run as a webservice, but should be called by a PHP script. This is currently realized by calling the mono application via shell_exec from PHP.
So now I am stuck with a Mono port of my application, which works fine but takes way too long to execute. I have already stripped out all unnecessary DLLs, methods etc., but calling the application via the command line, submitting the desired data as command-line parameters, takes approximately 700 ms. We expect about 10 hits per second, so this would only work if we set up a lot of servers for this task.
I assume the 700 ms are due to the cost of starting the application every time, because it does not make much difference in terms of time whether I handle the request once or five hundred times. (I take the original input, vary it slightly and do 500 iterations with "new" data every time. From the second iteration on, the processing time drops to approximately 1 ms per iteration.)
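A back-of-the-envelope check with Little's law (average number of requests in flight L equals arrival rate λ times time per request W), assuming a steady 10 requests/s and ignoring queueing variance, shows why per-request process start is the bottleneck:

```latex
\begin{aligned}
L &= \lambda W \\
L_{\text{cold start}} &= 10\,\mathrm{s}^{-1} \times 0.7\,\mathrm{s} = 7 \\
L_{\text{warm process}} &= 10\,\mathrm{s}^{-1} \times 0.001\,\mathrm{s} = 0.01
\end{aligned}
```

So on average about seven application starts would be in progress at any moment, whereas a single long-lived process would be busy only about 1% of the time.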
My next idea was to setup the mono application as a remoting server, so that it only has to be started once and can then handle incoming requests. I therefore wrote another mono application that serves as the client. Calling the client, letting the client pass the data to the server and retrieving the result now takes 344ms. This is better, but still way slower than I would expect and want it to be.
I then implemented a new project from scratch based on this blog post and ran into the same performance issues.
The question is: am I missing something related to the mono-projects that could improve the speed of the client/server? Although the idea of creating a webservice for this task was dismissed, would a webservice perform better under these circumstances (as I would not need the client application to call the service), although it is said that remoting is faster than webservices?
I could have made that clearer, but implementing a webservice is currently not an option (and please don't ask why, I didn't write the requirements ;))
Meanwhile I have checked that it's indeed the startup of the client, which takes most of the time in the remoting scenario.
I could imagine accessing the server via pipes from the command line, which would be perfectly suitable in my scenario. I guess this would be done using sockets?
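Sockets are one option: the long-lived process pays the startup cost once and then answers requests over a local TCP connection, which PHP can open with its own socket functions. A minimal Java sketch of that shape (the original application is .NET on Mono, so this only illustrates the architecture); `LineServer`, the one-request-per-line protocol and the loopback-only binding are assumptions, and connections are handled one at a time for simplicity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Long-lived server: started once, then answers one request line per connection.
final class LineServer implements AutoCloseable {
    private final ServerSocket server;

    LineServer(int port, UnaryOperator<String> handler) {
        try {
            // Bind to loopback only; port 0 lets the OS pick a free port.
            server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Thread acceptLoop = new Thread(() -> {
            while (!server.isClosed()) {
                try (Socket socket = server.accept();
                     BufferedReader in = reader(socket);
                     PrintWriter out = writer(socket)) {
                    String request = in.readLine();
                    if (request != null) {
                        out.println(handler.apply(request));
                    }
                } catch (IOException e) {
                    // Errors on one connection are ignored; after close(), accept() throws
                    // and the isClosed() check ends the loop.
                }
            }
        });
        acceptLoop.setDaemon(true);
        acceptLoop.start();
    }

    int port() {
        return server.getLocalPort();
    }

    // What a thin caller (e.g. the PHP script) would do: connect, send a line, read a line.
    static String request(int port, String payload) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = writer(socket);
             BufferedReader in = reader(socket)) {
            out.println(payload);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoFlush=true so println() sends the line immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    @Override
    public void close() {
        try {
            server.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each request then costs a local connection plus the handler's work, instead of a full process start.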
You can try to use AOT compilation to reduce the startup time. On .NET you would use ngen for that purpose; on Mono, just run mono --aot on all assemblies used by your application.
AOT'ed code is slower than JIT'ed code, but has the advantage of reducing startup time.
You can even try to AOT framework assemblies such as mscorlib and System.
I believe that remoting is not ideal in this scenario. However, your idea of keeping Mono running on the server instead of starting it every time is indeed solid.
Did you consider using SOAP webservices over HTTP? This would also help you with your 'web page' scenario.
Even if it is a little too slow for you, in my experience a custom RESTful service implementation would be easier to work with than remoting.
