"Google Cloud Tasks", performance of task creation - performance

I have "Google cloud function" in region europe-west1 (Belgium) which creates tasks in "Google cloud task queue" located in europe-west3 (Germany). It's similar to tutorial: https://cloud.google.com/tasks/docs/tutorial-gcf
It takes about 1-2 seconds for cloud function to create a task. It's not optimized solution because CF has to wait idle so long.
How to optimize task creation time?
Will moving of Cloud function and Task queue to one region significantly improve task creation speed?
Can I just stop cloud function when send "task queue creation" request and don't wait for response? Of course, it's bad practice and I would like to avoid this.
import { v2beta3 } from '@google-cloud/tasks';

// cloud function
(req, res) => {
  const client = new v2beta3.CloudTasksClient();
  ...
  // send the "create task" request without awaiting the response
  client.createTask(...);
  setTimeout(() => {
    // end the cloud function without processing the queue response
    res.sendStatus(200);
  }, 100);
};

Ad 1. If you set the task name yourself in createTask, there is an opportunity for optimization. Providing a task name hurts performance because it forces an additional lookup; that lookup cost disappears if you let Cloud Tasks assign the name. If you do require explicit names, make sure the task IDs you use are not sequential. For instance, a lookup in a long collection of sequentially-named tasks like taskname1, taskname2, taskname3 is slower than one over sa7239112, 92sdkdskfs, 4812kdfsdf.
Documentation:
Because there is an extra lookup cost to identify duplicate task names, these calls have significantly increased latency. Using hashed strings for the task id or for the prefix of the task id is recommended. Choosing task ids that are sequential or have sequential prefixes, for example using a timestamp, causes an increase in latency and error rates in all task commands. The infrastructure relies on an approximately uniform distribution of task ids to store and serve tasks efficiently.
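If you do need to set task names yourself, here is a minimal sketch of generating non-sequential IDs. Note that buildTaskName, the queue path and the 'order-42' suffix are made-up placeholders, not part of the Cloud Tasks API:
import { randomUUID } from 'crypto';

// Hypothetical helper: a random UUID prefix keeps task IDs approximately
// uniformly distributed instead of sequential.
function buildTaskName(queuePath: string, suffix: string): string {
  return `${queuePath}/tasks/${randomUUID()}-${suffix}`;
}

// queuePath would normally come from client.queuePath(project, location, queue).
const name = buildTaskName(
  'projects/my-project/locations/europe-west3/queues/my-queue',
  'order-42'
);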

The API will take as long as it needs to take. I don't believe there is any way to speed up that call itself.
That sounds like something you could experiment with.
No, a Cloud Function cannot terminate until all of its async work is also complete; otherwise Cloud Functions will clamp down on the resources in that function, and that work might not finish. You should use promises to determine when the async work completes, and only send the response afterward.
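A minimal sketch of that pattern, still using the beta client from the question (the project, region, queue name and target URL below are placeholders):
import { v2beta3 } from '@google-cloud/tasks';

const client = new v2beta3.CloudTasksClient();

// cloud function: await the createTask promise, then respond
export const handler = async (req, res) => {
  const parent = client.queuePath('my-project', 'europe-west3', 'my-queue');
  await client.createTask({
    parent,
    task: {
      httpRequest: {
        httpMethod: 'POST',
        url: 'https://example.com/task-handler', // placeholder target
      },
    },
  });
  res.sendStatus(200);
};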

Some sample code provided by Google still refers to the beta client:
const {v2beta3} = require('@google-cloud/tasks');
const client = new v2beta3.CloudTasksClient();
Make sure to change this for the latest version:
const {CloudTasksClient} = require('@google-cloud/tasks');
const client = new CloudTasksClient();
There are significant performance improvements, and doing this has changed my tests from 1-2 seconds per request to milliseconds per request.
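For completeness, a sketch of a full task creation with the non-beta client; the project, region, queue name, target URL and payload here are placeholders:
import { CloudTasksClient } from '@google-cloud/tasks';

const client = new CloudTasksClient();

async function enqueue(payload: object) {
  const parent = client.queuePath('my-project', 'europe-west1', 'my-queue');
  const [task] = await client.createTask({
    parent,
    task: {
      httpRequest: {
        httpMethod: 'POST',
        url: 'https://example.com/task-handler',
        headers: { 'Content-Type': 'application/json' },
        body: Buffer.from(JSON.stringify(payload)).toString('base64'),
      },
    },
  });
  console.log(`Created ${task.name}`);
}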


ktor server - when to move to another coroutine context

This may be a question about coroutines in general, but in my ktor server (Netty engine, default configuration) application I perform several asynchronous calls to a database and an API endpoint, and I want to make sure I am using coroutines efficiently. My questions are as follows:
Is there a tool or method to work out whether my code is using coroutines effectively, or do I just need to spam my endpoint with curl and measure the performance of moving work to another context, e.g. compute?
I don't want to start moving tasks/jobs to another context "just in case", but should I treat the default coroutine context in my Route.route() similarly to the Android main thread and perform the minimum amount of work on it?
Here is a rough example of the code that I'm using:
import kotlin.coroutines.resume
import kotlin.coroutines.suspendCoroutine

fun Route.route() {
    get("/") {
        call.respondText(getRemoteText() ?: "")
    }
}

suspend fun getRemoteText(): String? {
    return suspendCoroutine { cont ->
        // thirdPartyLibrary.get() stands in for a blocking third-party call
        val result = thirdPartyLibrary.get()
        if (result.isSuccess) {
            cont.resume(result.data)
        } else {
            cont.resume(null)
        }
    }
}
You could use something like Apache JMeter, but writing a script and spamming your server with curl also seems like a good option to me.
Coroutines are pretty efficient when it comes to context/thread switching, and with Dispatchers.Default and Dispatchers.IO you get a thread pool. There is some documentation around this, but I think you can definitely leverage these dispatchers for heavy operations.
There are a few tools for testing endpoints. JMeter is good, and there are also command-line tools like wrk, wrk2 and siege.
Of course, context switching has a cost. The coroutine in routing is safe to run blocking operations in unless you have the shareWorkGroup option set. However, it is usually better to use a separate thread pool, because you can control its size (maximum number of threads) so you don't bring your database down.

Rate limiting algorithm for throttling request

I need to design a rate limiter service for throttling requests.
For every incoming request, a method will check whether the requests per second have exceeded the limit. If they have, it will return the amount of time the request needs to wait before being handled.
I'm looking for a simple solution that just uses the system tick count and the rps (requests per second) limit. It should not use a queue or complex rate-limiting algorithms and data structures.
Edit: I will be implementing this in C++. Also note that I don't want to use any data structures to store the requests currently being executed.
API would be like:
if (!RateLimiter.Limit())
{
do work
RateLimiter.Done();
}
else
reject request
The most common algorithm used for this is the token bucket. There is no need to invent something new; just search for an implementation for your technology/language.
If your app is highly available / load balanced, you might want to keep the bucket information in some sort of persistent storage. Redis is a good candidate for this.
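As a rough illustration of the idea, here is a minimal single-process token bucket sketch (not production code; in a load-balanced setup the counters would live in something like Redis as mentioned above):
// Minimal in-memory token bucket sketch.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private capacity: number, private refillPerSecond: number) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  // Returns 0 if the request may run now, otherwise the suggested wait in ms.
  tryAcquire(): number {
    const now = Date.now();
    // Refill proportionally to the time elapsed since the last call.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.lastRefill) / 1000) * this.refillPerSecond
    );
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    // Time until one full token is available again.
    return Math.ceil(((1 - this.tokens) / this.refillPerSecond) * 1000);
  }
}

// Usage: allow 100 requests per second with bursts of up to 100.
const limiter = new TokenBucket(100, 100);
const waitMs = limiter.tryAcquire();
if (waitMs > 0) {
  // reject the request, or ask the client to retry after waitMs milliseconds
}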
I wrote Limitd, which is a different approach: a daemon for limits. The application asks the daemon, through a limitd client, whether the traffic is conformant. The limit is configured on the limitd server and the app is agnostic to the algorithm.
Since you give no hint of language or platform, I'll just give some pseudocode.
Things you are going to need:
a list of currently executing requests
a way to get notified when a request has finished
And the code can be as simple as:
var ListOfCurrentRequests;  // a list of the start times of current requests
var MaxAmountOfRequests;    // just a limit
var AverageExecutionTime;   // if the execution time is non-deterministic, the best we can do is keep an average

// for each request, either execute it or return the PROBABLE amount of time to wait
function OnNewRequest(Identifier)
{
    if (count(ListOfCurrentRequests) < MaxAmountOfRequests) // if we have room
    {
        Struct Tracker
        Tracker.Request = Identifier;
        Tracker.StartTime = Now;  // save the start time
        AddToList(Tracker)        // add to list
    }
    else
    {
        return CalculateWaitTime() // return the PROBABLE time it will take for a 'slot' to become available
    }
}

// when a request has ended, release a 'slot' and update the average execution time
function OnRequestEnd(Identifier)
{
    Tracker = RemoveFromList(Identifier);
    UpdateAverageExecutionTime(Now - Tracker.StartTime);
}

function CalculateWaitTime()
{
    // the one that started first is PROBABLY the first to finish
    Tracker = GetTheOneThatIsRunningTheLongest(ListOfCurrentRequests);
    // assume it will finish in the average time; what remains is the average minus the time already spent
    ProbableTimeToFinish = AverageExecutionTime - (Now - Tracker.StartTime);
    return ProbableTimeToFinish
}
But keep in mind that there are several problems with this:
It assumes that, given the wait time, the client will issue a new request after that time has passed. Since the time is only an estimate, you cannot use it to delay execution, so you can still overflow the system.
Since you are not keeping a queue and delaying the requests, a client can end up waiting longer than it actually needs to.
And lastly, since you do not want to keep a queue to prioritize and delay the requests, you can get a livelock: you tell a client to come back later, but when it returns, someone else has already taken its spot, and it has to come back yet again.
So the ideal solution would be an actual execution queue, but since you don't want one, I guess this is the next best thing.
According to your comments, you just want a simple (not very precise) requests-per-second gate. In that case the code can be something like this:
var CurrentRequestCount;
var MaxAmountOfRequests;
var CurrentTimestampWithPrecisionToSeconds;

function CanRun()
{
    if (Now.AsSeconds > CurrentTimestampWithPrecisionToSeconds) // a second has passed, reset the counter
    {
        CurrentRequestCount = 0;
        CurrentTimestampWithPrecisionToSeconds = Now.AsSeconds;
    }
    if (CurrentRequestCount >= MaxAmountOfRequests)
        return false;
    CurrentRequestCount++
    return true;
}
It doesn't seem like a very reliable way to control anything, but I believe it's what you asked for.

async and await: are they bad?

We recently developed a site based on SOA, but this site ended up having terrible load and performance issues when it went under load. I posted a question related to this issue here:
ASP.NET website becomes unresponsive under load
The site is made of an API (Web API) site which is hosted on a 4-node cluster, and a web site which is hosted on another 4-node cluster and makes calls to the API. Both are developed using ASP.NET MVC 5 and all actions/methods are based on the async-await pattern.
After running the site under some monitoring tools such as New Relic, investigating several dump files and profiling the worker process, it turned out that under a very light load (e.g. 16 concurrent users) we ended up with around 900 threads which utilized 100% of the CPU and filled up the IIS thread queue!
Even though we managed to deploy the site to the production environment by introducing heaps of caching and performance amendments, many developers in our team believe that we have to remove all async methods and convert both the API and the web site to normal Web API and action methods which simply return an ActionResult.
I personally am not happy with that approach, because my gut feeling is that we have not used the async methods properly; otherwise it would mean that Microsoft has introduced a feature that is basically destructive and unusable!
Do you know of any reference that clarifies where and how async methods should/can be used? How should we use them to avoid such dramas? E.g. based on what I read on MSDN, I believe the API layer should be async, but the web site could be a normal non-async ASP.NET MVC site.
Update:
Here is the async method that makes all the communications with the API.
public static async Task<T> GetApiResponse<T>(object parameters, string action, CancellationToken ctk)
{
using (var httpClient = new HttpClient())
{
httpClient.BaseAddress = new Uri(BaseApiAddress);
var formatter = new JsonMediaTypeFormatter();
return
await
httpClient.PostAsJsonAsync(action, parameters, ctk)
.ContinueWith(x => x.Result.Content.ReadAsAsync<T>(new[] { formatter }).Result, ctk);
}
}
Is there anything silly with this method? Note that when we converted all methods to non-async methods, we got heaps better performance.
Here is a sample usage (I've cut out the other bits of the code that were related to validation, logging etc. This code is the body of an MVC action method).
In our service wrapper:
public async static Task<IList<DownloadType>> GetSupportedContentTypes()
{
string userAgent = Request.UserAgent;
var parameters = new { Util.AppKey, Util.StoreId, QueryParameters = new { UserAgent = userAgent } };
var taskResponse = await Util.GetApiResponse<ApiResponse<SearchResponse<ProductItem>>>(
parameters,
"api/Content/ContentTypeSummary",
default(CancellationToken));
return taskResponse.Data.Groups.Select(x => x.DownloadType()).ToList();
}
And in the Action:
public async Task<ActionResult> DownloadTypes()
{
IList<DownloadType> supportedTypes = await ContentService.GetSupportedContentTypes();
Is there anything silly with this method? Note that when we converted
all methods to non-async methods, we got heaps better performance.
I can see at least two things going wrong here:
public static async Task<T> GetApiResponse<T>(object parameters, string action, CancellationToken ctk)
{
using (var httpClient = new HttpClient())
{
httpClient.BaseAddress = new Uri(BaseApiAddress);
var formatter = new JsonMediaTypeFormatter();
return
await
httpClient.PostAsJsonAsync(action, parameters, ctk)
.ContinueWith(x => x.Result.Content
.ReadAsAsync<T>(new[] { formatter }).Result, ctk);
}
}
Firstly, the lambda you're passing to ContinueWith is blocking:
x => x.Result.Content.ReadAsAsync<T>(new[] { formatter }).Result
This is equivalent to:
x => {
var task = x.Result.Content.ReadAsAsync<T>(new[] { formatter });
task.Wait();
return task.Result;
};
Thus, you're blocking a pool thread on which the lambda happens to be executed. This effectively kills the advantage of the naturally asynchronous ReadAsAsync API and reduces the scalability of your web app. Watch out for other places like this in your code.
Secondly, an ASP.NET request is handled by a server thread with a special synchronization context installed on it, AspNetSynchronizationContext. When you use await for continuations, the continuation callback is posted to that same synchronization context; the compiler-generated code takes care of this. OTOH, when you use ContinueWith, this doesn't happen automatically.
Thus, you need to explicitly provide the correct task scheduler, remove the blocking .Result (so the lambda returns a task) and Unwrap the nested task:
return
await
httpClient.PostAsJsonAsync(action, parameters, ctk).ContinueWith(
x => x.Result.Content.ReadAsAsync<T>(new[] { formatter }),
ctk,
TaskContinuationOptions.None,
TaskScheduler.FromCurrentSynchronizationContext()).Unwrap();
That said, you really don't need such added complexity of ContinueWith here:
var x = await httpClient.PostAsJsonAsync(action, parameters, ctk);
return await x.Content.ReadAsAsync<T>(new[] { formatter });
The following article by Stephen Toub is highly relevant:
"Async Performance: Understanding the Costs of Async and Await".
If I have to call an async method in a sync context, where using await
is not possible, what is the best way of doing it?
You should almost never need to mix await and ContinueWith; stick with await. Basically, if you use async, it's got to be async "all the way".
For the server-side ASP.NET MVC / Web API execution environment, it simply means the controller method should be async and return a Task or Task<>, check this. ASP.NET keeps track of pending tasks for a given HTTP request; the request is not completed until all of those tasks have completed.
If you really need to call an async method from a synchronous method in ASP.NET, you can use AsyncManager like this to register a pending task. For classic ASP.NET, you can use PageAsyncTask.
In the worst case, you'd call task.Wait() and block, because otherwise your task might continue outside the boundaries of that particular HTTP request.
For client-side UI apps, some different scenarios are possible for calling an async method from a synchronous method. For example, you can use ContinueWith(action, TaskScheduler.FromCurrentSynchronizationContext()) and fire a completion event from action (like this).
async and await should not create a large number of threads, particularly not with just 16 users. In fact, they should help you make better use of threads. The purpose of async and await in MVC is to give up the thread-pool thread while it would otherwise be waiting on IO-bound work. This suggests to me that you are doing something silly somewhere, such as spawning threads and then waiting indefinitely.
Still, 900 threads is not really a lot, and if they're using 100% CPU, then they're not waiting; they're chewing on something. It's this something that you should be looking into. You said you have used tools like New Relic; well, what did they point to as the source of this CPU usage? Which methods?
If I were you, I would first prove that merely using async and await is not the cause of your problems. Simply create a simple site that mimics the behavior and run the same tests against it.
Second, take a copy of your app, and start stripping stuff out and then running tests against it. See if you can track down where the problem is exactly.
There is a lot of stuff to discuss.
First of all, async/await helps most naturally when your application has almost no business logic. The point of async/await is to avoid having many threads sleeping while waiting for something, mostly IO, e.g. database queries (and fetching). If your application spends its time on heavy business logic that keeps the CPU at 100%, async/await does not help you.
The problem with 900 threads is that they are inefficient if they run concurrently. The point is that it's better to have roughly as many "business" threads as your server has cores/processors. The reason is thread context switching, lock contention and so on. There are a lot of systems, like the LMAX disruptor pattern or Redis, which process data in one thread (or one thread per core). It's just better, as you do not have to handle locking.
How do you reach the described approach? Look at the disruptor: queue incoming requests and process them one by one instead of in parallel.
The opposite situation, when there is almost no business logic and many threads just wait for IO, is a good place to put async/await to work.
How it mostly works: there is a thread, usually only one, which reads bytes from the network. Once a request arrives, this thread reads the data. There is also a limited thread pool of workers which process requests. The point of async is that once a processing thread is waiting for something, mostly IO or the database, the thread is returned to the pool and can be used for another request. Once the IO response is ready, a thread from the pool is used to finish the processing. This is how you can use a few threads to serve thousands of requests per second.
I would suggest you draw a picture of how your site works, what each thread does and how concurrently it all runs. Note that you need to decide whether throughput or latency is more important to you.

What is the cost of creating actors in Akka?

Consider a scenario in which I am implementing a system that processes incoming tasks using Akka. I have a primary actor that receives tasks and dispatches them to some worker actors that process the tasks.
My first instinct is to implement this by having the dispatcher create an actor for each incoming task. After the worker actor processes the task it is stopped.
This seems to be the cleanest solution for me since it adheres to the principle of "one task, one actor". The other solution would be to reuse actors - but this involves the extra-complexity of cleanup and some pool management.
I know that actors in Akka are cheap, but I am wondering if there is an inherent cost associated with repeated creation and deletion of actors. Is there any hidden cost associated with the data structures Akka uses for the bookkeeping of actors?
The load should be of the order of tens or hundreds of tasks per second - think of it as a production webserver that creates one actor per request.
Of course, the right answer lies in the profiling and fine tuning of the system based on the type of the incoming load.
But I wondered if anyone could tell me something from their own experience ?
LATER EDIT:
I should have given more details about the task at hand:
Only N active tasks can run at any point. As @drexin pointed out, this would be easily solvable using routers. However, the execution of a task isn't a simple run-and-be-done kind of thing.
Tasks may require information from other actors or services and thus may have to wait and go to sleep. By doing so they release an execution slot. The slot can be taken by another waiting actor which now has the opportunity to run. You could make an analogy with the way processes are scheduled on a single CPU.
Each worker actor needs to keep some state regarding the execution of the task.
Note: I appreciate alternative solutions to my problem, and I will certainly take them into consideration. However, I would also like an answer to the main question regarding the intensive creation and deletion of actors in Akka.
You should not create an actor for every request; rather, use a router to dispatch the messages to a dynamic number of actors. That's what routers are for. Read this part of the docs for more information: http://doc.akka.io/docs/akka/2.0.4/scala/routing.html
edit:
Creating top-level actors (system.actorOf) is expensive, because every top-level actor also initializes an error kernel, and those are expensive. Creating child actors (inside an actor, context.actorOf) is way cheaper.
But I still suggest you rethink this, because depending on the frequency of the creation and deletion of actors you will also put additional pressure on the GC.
edit2:
And most important, actors are not threads! Even if you create 1M actors, they will only run on as many threads as the pool has. So depending on the throughput setting in the config, every actor will process n messages before the thread is released back to the pool.
Note that blocking a thread (includes sleeping) will NOT return it to the pool!
An actor which receives one message right after its creation and dies right after sending the result can be replaced by a future. Futures are more lightweight than actors.
You can use pipeTo to receive the future result when it's done. For instance, in your actor launching the computations:
def receive = {
  case t: Task => future { executeTask(t) }.pipeTo(self)
  case r: Result => processTheResult(r)
}
where executeTask is your function taking a Task and returning a Result.
However, I would reuse actors from a pool through a router, as explained in @drexin's answer.
I've tested with 10000 remote actors created from a main context by a root actor, the same scheme as in the production module where a single actor was created. MBP 2.5GHz x2:
in main: main ? root // main asks root to create an actor
in main: actorOf(child) // create a child
in root: watch(child) // watch lifecycle messages
in root: root ? child // wait for response (connection check)
in child: child ! root // response (connection ok)
in root: root ! main // notify created
Code:
def start(userName: String) = {
  logger.error("HELLOOOOOOOO ")
  val n: Int = 10000
  var t0, t1: Long = 0
  t0 = System.nanoTime
  for (i <- 0 to n) {
    val msg = StartClient(userName + i)
    Await.result(rootActor ? msg, timeout.duration).asInstanceOf[ClientStarted] match {
      case succ @ ClientStarted(userName) =>
        // logger.info("[C][SUCC] Client started: " + succ)
      case _ =>
        logger.error("Terminated on waiting for response from " + i + "-th actor")
        throw new RuntimeException("[C][FAIL] Could not start client: " + msg)
    }
  }
  t1 = System.nanoTime
  logger.error("Starting of a single actor of " + n + ": " + ((t1 - t0) / 1000000.0 / n.toDouble) + " ms")
}
The result:
Starting of a single actor of 10000: 0.3642917 ms
There was a message stating that "Slf4jEventHandler started" between "HELLOOOOOOOO" and "Starting of a single", so the experiment seems even more realistic (?)
The dispatcher was the default one (a PinnedDispatcher, which starts a new thread each and every time), and it seemed like all that costs about the same as Thread.start() has, ever since Java 1 - 500K-1M cycles or so ^)
That's why I changed all the code inside the loop to a plain new java.lang.Thread().start()
The result:
Starting of a single actor of 10000: 0.1355219 ms
Actors make great finite state machines, so let that help drive your design here. If your request-handling state is greatly simplified by having one actor per request, then do that. As a rule of thumb, I find that actors are particularly good at managing more than two states.
A common approach, though, is one request-handling actor that references request state from within a collection it maintains as part of its own state. Note that this can also be achieved with an Akka reactive stream and the use of the scan stage.

Building a high-performance node.js application with cluster and node-webworker

I'm not a node.js master, so I'd like to have more points of view about this.
I'm creating an HTTP node.js web server that must handle not only lots of concurrent connections but also long-running jobs. By default node.js runs in a single process, and if there's a piece of code that takes a long time to execute, any subsequent connection must wait until the code finishes what it's doing for the previous connection.
For example:
var http = require('http');

http.createServer(function (req, res) {
  doSomething(); // This takes a long time to execute
  // Return a response
}).listen(1337, "127.0.0.1");
So I was thinking to run all the long running jobs in separate threads using the node-webworker library:
var http = require('http');
var sys = require('sys');
var Worker = require('webworker');

http.createServer(function (req, res) {
  var w = new Worker('doSomething.js'); // This takes a long time to execute
  // Return a response
}).listen(1337, "127.0.0.1");
And to make the whole thing more performant, I thought of also using cluster to create a new node process for each CPU core.
In this way I expect to balance the client connections across different processes with cluster (say 4 node processes if I run it on a quad-core), and then execute the long-running jobs on separate threads with node-webworker.
Is there something wrong with this configuration?
I see that this post is a few months old, but I wanted to add a comment in the event that someone else comes along.
"By default node.js runs on one process, and if there's a piece of code that takes a long time to execute any subsequent connection must wait until the code ends what it's doing on the previous connection."
^-- This is not entirely true. If doSomething() is required to complete before you send back the response, then yes; but if it isn't, you can make use of the asynchronous functionality available in the core of Node.js and return immediately, while the item processes in the background.
A quick example of what I'm explaining can be seen by adding the following code to your server:
setTimeout(function () {
  console.log("Done with 5 second item");
}, 5000);
If you hit the server a few times, you will get an immediate response on the client side, and eventually see the console fill with the messages seconds after the responses were sent.
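Putting that together with the server from the question, a minimal sketch might look like the following; doSomething is still the placeholder for the slow work:
var http = require('http');

http.createServer(function (req, res) {
  // Defer the slow work; the event loop stays free only as long as doSomething
  // itself yields (e.g. uses async I/O or timers) rather than busy-looping.
  setImmediate(function () {
    doSomething(); // placeholder for the long-running job
  });

  // Respond immediately, before the background work finishes.
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Accepted\n');
}).listen(1337, '127.0.0.1');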
Why don't you just copy and paste your code into a file and run it over JXcore like
$ jx mt-keep:4 mysourcefile.js
and see how it performs. If you need real multithreading without leaving the safety of single-threading, try JX. It's 100% Node.js 0.12+ compatible. You can spawn the threads and run a whole node.js app inside each of them separately.
You might want to check out Q-Oper8 instead, as it should provide a more flexible architecture for this kind of thing. Full info at:
https://github.com/robtweed/Q-Oper8
