Since hack is a single threaded language, what is the benefit of using a concurrent block?
concurrent {
await func_a;
await func_b;
}
My understanding is that one job is waiting until the other job is over.
Concurrent doesn't mean multithreading
The concurrent block will wait for all async operations (awaitables) in that block similarly to javascript Promise.all (also single threaded).
Without concurrent:
await func_a; // 1 sec
await func_b; // 2 sec
await func_c; // 3 sec
// will get here after at least 6 seconds (sum of requests time)
With concurrent:
concurrent {
await func_a; // 1 sec
await func_b; // 2 sec
await func_c; // 3 sec
}
// will get here after at least 3 seconds (longest request time)
It fits if you want to make multiple IO requests in parallel.
It doesn't fit if you want to run multiple CPU jobs.
Related
I'm currently working on using paralellism in a Flux. Right now I'm having problems with the backpressure. In our case we have a fast producing service we want to consume, but we are much slower.
With a normal flux, this works so far, but we want to have parallelism. What I see when I'm using the approach with
.parallel(2)
.runOn(Schedulers.parallel())
that there is a big request on the beginning, which takes quite a long time to process. Here occurs also a different problem, that if we take too long to process, we somehow seem to generate a cancel event in the producer service (we consume it via webflux rest-call), but no cancel event is seen in the consumer.
But back to problem 1, how is it possible to bring this thing back to sync. I know of the prefetch parameter on the .parallel() method, but it does not work as I expect.
A minimum example would be something like this
fun main() {
val atomicInteger = AtomicInteger(0)
val receivedCount = AtomicInteger(0)
val processedCount = AtomicInteger(0)
Flux.generate<Int> {
it.next(atomicInteger.getAndIncrement())
println("Emitted ${atomicInteger.get()}")
}.doOnEach { it.get()?.let { receivedCount.addAndGet(1) } }
.parallel(2, 1)
.runOn(Schedulers.parallel())
.flatMap {
Thread.sleep(200)
log("Work on $it")
processedCount.addAndGet(1)
Mono.just(it * 2)
}.subscribe {
log("Received ${receivedCount.get()} and processed ${processedCount.get()}")
}
Thread.sleep(25000)
}
where I can observe logs like this
...
Emitted 509
Emitted 510
Emitted 511
Emitted 512
Emitted 513
2022-02-02T14:12:58.164465Z - Thread[parallel-1,5,main] Work on 0
2022-02-02T14:12:58.168469Z - Thread[parallel-2,5,main] Work on 1
2022-02-02T14:12:58.241966Z - Thread[parallel-1,5,main] Received 513 and processed 2
2022-02-02T14:12:58.241980Z - Thread[parallel-2,5,main] Received 513 and processed 2
2022-02-02T14:12:58.442218Z - Thread[parallel-2,5,main] Work on 3
2022-02-02T14:12:58.442215Z - Thread[parallel-1,5,main] Work on 2
2022-02-02T14:12:58.442315Z - Thread[parallel-2,5,main] Received 513 and processed 3
2022-02-02T14:12:58.442338Z - Thread[parallel-1,5,main] Received 513 and processed 4
So how could I adjust that thing that I can use the parallelism but stay in backpressure/sync with my producer? The only way I got it to work is with a semaphore acquired before the parallelFlux and released after work, but this is not really a nice solution.
Ok for this szenario it seemed crucial that prefetch of parallel and runOn had to bet seen very low, here to 1.
With defaults from 256, we requested too much from our producer, so that there was already a cancel event because of the long time between the first block of requests for getting the prefetch and the next one when the Flux decided to fill the buffer again.
Reading the asyncio documentation, I realize that I don't understand a very basic and fundamental aspect: the difference between awaiting a coroutine directly, and awaiting the same coroutine when it's wrapped inside a task.
In the documentation examples the two calls to the say_after coroutine are running sequentially when awaited without create_task, and concurrently when wrapped in create_task. So I understand that this is basically the difference, and that it is quite an important one.
However what confuses me is that in the example code I read everywhere (for instance showing how to use aiohttp), there are many places where a (user-defined) coroutine is awaited (usually in the middle of some other user-defined coroutine) without being wrapped in a task, and I'm wondering why that is the case. What are the criteria to determine when a coroutine should be wrapped in a task or not?
What are the criteria to determine when a coroutine should be wrapped in a task or not?
You should use a task when you want your coroutine to effectively run in the background. The code you've seen just awaits the coroutines directly because it needs them running in sequence. For example, consider an HTTP client sending a request and waiting for a response:
# these two don't make too much sense in parallel
await session.send_request(req)
resp = await session.read_response()
There are situations when you want operations to run in parallel. In that case asyncio.create_task is the appropriate tool, because it turns over the responsibility to execute the coroutine to the event loop. This allows you to start several coroutines and sit idly while they execute, typically waiting for some or all of them to finish:
dl1 = asyncio.create_task(session.get(url1))
dl2 = asyncio.create_task(session.get(url2))
# run them in parallel and wait for both to finish
resp1 = await dl1
resp2 = await dl2
# or, shorter:
resp1, resp2 = asyncio.gather(session.get(url1), session.get(url2))
As shown above, a task can be awaited as well. Just like awaiting a coroutine, that will block the current coroutine until the coroutine driven by the task has completed. In analogy to threads, awaiting a task is roughly equivalent to join()-ing a thread (except you get back the return value). Another example:
queue = asyncio.Queue()
# read output from process in an infinite loop and
# put it in a queue
async def process_output(cmd, queue, identifier):
proc = await asyncio.create_subprocess_shell(cmd)
while True:
line = await proc.readline()
await queue.put((identifier, line))
# create multiple workers that run in parallel and pour
# data from multiple sources into the same queue
asyncio.create_task(process_output("top -b", queue, "top")
asyncio.create_task(process_output("vmstat 1", queue, "vmstat")
while True:
identifier, output = await queue.get()
if identifier == 'top':
# ...
In summary, if you need the result of a coroutine in order to proceed, you should just await it without creating a task, i.e.:
# this is ok
resp = await session.read_response()
# unnecessary - it has the same effect, but it's
# less efficient
resp = await asyncio.create_task(session.read_reponse())
To continue with the threading analogy, creating a task just to await it immediately is like running t = Thread(target=foo); t.start(); t.join() instead of just foo() - inefficient and redundant.
I have a .Net Core application with Hangfire implementation.
There is a recurring job per minute as below:-
RecurringJob.AddOrUpdate<IS2SScheduledJobs>(x => x.ProcessInput(), Cron.MinuteInterval(1));
var hangfireOptions = new BackgroundJobServerOptions
{
WorkerCount = 20,
};
_server = new BackgroundJobServer(hangfireOptions);
The ProcessInput() internally checks the BlockingCollection() of some Ids to process, it keeps on continuously processing.
There is a time when the first ten ProcessInput() jobs keeps on processing with 10 workers where as the other new ProcessInput() jobs gets enqueued.
For this purpose, I wanted to increase the workers count, say around 50, so that there would be 50 ProcessInput() jobs being processed parallely.
Please suggest.
Thanks.
In the .NET Core version you can set the worker count when adding the Hangfire Server:
services.AddHangfireServer(options => options.WorkerCount = 50);
you can try increasing the worker process using this method. You have to know the number of processors running on your servers. To get 100 workers, you can try
var options = new BackgroundJobServerOptions { WorkerCount = Environment.ProcessorCount * 25 };
app.UseHangfireServer(options);
Source
I have 1000 http API requests to be made. I have kept them all as promises in an array. I want to execute them in "BATCHES" of 100 at a time - not more than that to avoid hitting any API rate-limit / throttling etc.
While bluebirdJS provides the .map() function with the concurrency option what it does is it limits the number of calls made AT A TIME. Meaning it will ensure that no more than 100 concurrent requests are being worked on at a time - as soon as the 1st request resolves, it will begin processing the 101st request - it doesn't wait for all the 100 to resolve first before starting with the next 100.
The "BATCHING" behavior i am looking for is to first process the 100 requests, and ONLY AFTER all of the 100 requests have completed it should begin with the next 100 requests.
Does BlueBirdJS provide any API out of the box to handle batches this way?
You can split big urls array to an array of batches. For each batch run Promise#map which will be resolved when all async operations are finished. And run these batches in sequence using Array#reduce.
let readBatch(urls) {
return Promise.map(url => request(url));
}
let read(urlBatches) {
return urlBatches.reduce((p, urls) => {
return p.then(() => readBatch(urls));
}, Promise.resolve());
}
const BATCH_SIZE = 100;
let urlBatches = [];
for (let i = 0; i < urls.length; i+= BATCH_SIZE) {
let batch = array.slice(i, i + BATCH_SIZE);
urlBatches.push(batch);
}
read(urlBatches)
.then(() => { ... }) // will be called when all 1000 urls are processed
I am running a database download in a background thread. The threads work fine and I execute group wait before continuing.
The problem I have is that I need to start an activity indicator and it seems that due to the group_wait it gets blocked.
Is there a way to run such heavy process, ensure that all threads get completed while allowing the activity indicator to run?
I start the activity indicator with (I also tried starting the indicator w/o the dispatch_async):
dispatch_async(dispatch_get_main_queue(), {
activityIndicator.startAnimating()
})
After which, I start the thread group:
let group: dispatch_group_t = dispatch_group_create()
let queue: dispatch_queue_t = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0) //also tried QOS_CLASS_BACKGROUND
while iter > 0 {
iter--
dispatch_group_enter(group)
dispatch_group_async(group, queue, {
do {
print("in queue \(iter)")
temp += try query.findObjects()
query.skip += query.limit
} catch let error as NSError {
print("Fetch failed: \(error.localizedDescription)")
}
dispatch_group_leave(group)
})
}
// Wait for all threads to finish and proceed
As I am using Parse, I have modified the code as follows (psuedo code for simplicity):
trigger the activity indicator with startAnimating()
call the function that hits Parse
set an observer in the Parse class on an int to trigger an action when the value reaches 0
get count of new objects in Parse
calculate how many loop iterations I need to pull all the data (using max objects per query = 1000 which is Parse max)
while iterations > 0 {
create a Parse query object
set the query skip value
use query.findObjectsInBackroundWithBlock ({
pull objects and add to a temp array
observer--
)}
iterations--
}
When the observer hits 0, trigger a delegate to return to the caller
Works like a charm.