I'm trying to use async/await to run long-running tasks in parallel (at the same time), but I can't add the tasks to a task collection and pass them to Task.WhenAll(), because calling the methods immediately starts them running. I have the following code:
var task1 = _bll.longRunningTask1().ConfigureAwait(false);
var task2 = _bll.longRunningTask2().ConfigureAwait(false);
The long-running tasks above are already running by this point, so I can't first store them in a collection, then put them into Task.WhenAll(), and only then run them. What is the correct way to get them into the collection and have them run only when Task.WhenAll() is called?
List<Task> tasks = new List<Task>();
tasks.Add(task1);
tasks.Add(task2);
await Task.WhenAll(tasks);
I'm trying to use async/await to ... run them at the same time
You can do asynchronous concurrency by using Task.WhenAll.
The long-running tasks above are already running by this point, so I can't first store them in a collection, then put them into Task.WhenAll(), and only then run them.
Task.WhenAll does not run tasks. Task.WhenAll merely (asynchronously) waits for multiple tasks to complete.
What is the correct way of doing this
It's essentially the code you already have, with one correction: ConfigureAwait(false) returns a ConfiguredTaskAwaitable rather than a Task, so apply it to the await on Task.WhenAll instead of to the individual calls:
var task1 = _bll.longRunningTask1();
var task2 = _bll.longRunningTask2();
await Task.WhenAll(task1, task2).ConfigureAwait(false);
which is pretty much equivalent to explicitly creating a List<Task>:
var task1 = _bll.longRunningTask1();
var task2 = _bll.longRunningTask2();
List<Task> tasks = new List<Task>();
tasks.Add(task1);
tasks.Add(task2);
await Task.WhenAll(tasks).ConfigureAwait(false);
Either of these should work just fine.
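If the methods return values, the same pattern works with Task<TResult>; a minimal sketch, assuming hypothetical methods that are not in the question:
// GetCountAsync and GetNameAsync are placeholders returning Task<int> and Task<string>.
var countTask = _bll.GetCountAsync();
var nameTask = _bll.GetNameAsync();
await Task.WhenAll(countTask, nameTask).ConfigureAwait(false);
// Both tasks have already completed, so these awaits just retrieve the results.
int count = await countTask;
string name = await nameTask;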
Related
I am doing some heavy processing that needs async methods. One of my methods returns a list of dictionaries that needs to go through heavy processing before being added to another awaitable object, i.e.:
def cpu_bound_task_here(record):
    # some complicated preprocessing of record
    return record
After applying the answer kindly given below, my code now just gets stuck.
async def fun():
    print("Socket open")
    record_count = 0
    symbol = obj.symbol.replace("-", "").replace("/", "")
    loop = asyncio.get_running_loop()
    await obj.send()
    while True:
        try:
            records = await obj.receive()
            if not records:
                continue
            record_count += len(records)
So what the above function does is stream values asynchronously and do some heavy processing before pushing them to Redis, indefinitely. I made the necessary changes and now I'm stuck.
As that output tells you, run_in_executor returns a Future. You need to await it to get its result.
record = await loop.run_in_executor(
    None, something_cpu_bound_task_here, record
)
Note that any arguments to something_cpu_bound_task_here need to be passed to run_in_executor.
Additionally, as you've mentioned that this is a CPU-bound task, you'll want to make sure you're using a concurrent.futures.ProcessPoolExecutor. Unless you've called loop.set_default_executor somewhere, the default is an instance of ThreadPoolExecutor.
with ProcessPoolExecutor() as executor:
    for record in records:
        record = await loop.run_in_executor(
            executor, something_cpu_bound_task_here, record
        )
Finally, your while loop is effectively running synchronously. You need to wait for the future and then for obj.add before moving on to process the next item in records. You might want to restructure your code a bit and use something like gather to allow for some concurrency.
async def process_record(record, obj, loop, executor):
    record = await loop.run_in_executor(
        executor, something_cpu_bound_task_here, record
    )
    await obj.add(record)


async def fun():
    loop = asyncio.get_running_loop()
    records = await receive()
    with ProcessPoolExecutor() as executor:
        await asyncio.gather(
            *[process_record(record, obj, loop, executor) for record in records]
        )
I'm not sure how to handle obj since that isn't defined in your example, but I'm sure you can figure that out.
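One more note: run_in_executor only forwards positional arguments, so if something_cpu_bound_task_here ever needs keyword arguments, the usual workaround is functools.partial. A small sketch (the verbose flag is just an illustrative keyword argument, not from the question):
import functools

record = await loop.run_in_executor(
    executor, functools.partial(something_cpu_bound_task_here, record, verbose=True)
)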
Check out the Pypeln library; it is well suited for streaming tasks between process, thread, and asyncio pools:
import pypeln as pl
data = get_iterable()
data = pl.task.map(f1, data, workers=100) # asyncio
data = pl.thread.flat_map(f2, data, workers=10)
data = filter(f3, data)
data = pl.process.map(f4, data, workers=5, maxsize=200)
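Note that the stages are lazy; nothing runs until you consume the last stage. A minimal way to drive the pipeline, assuming you want the results in memory:
data = list(data)  # iterating the final stage runs the whole pipeline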
I want to add and delete scheduled tasks at runtime from views; is this possible? Maybe someone has example code or a good article about it?
Consider this approach: instead of adding and deleting scheduled tasks, you can check every minute (or at some other precision) the current moment against your views and run the necessary tasks immediately. This will be easier. Check out Quartz Scheduler; its CronExpression has an isSatisfiedBy(Date date) method (see the sketch after the code below).
#Scheduled(cron = "5 * * * * *) // do not set seconds to zero, cause it may fit xx:yy:59
public void runTasks() {
LocalTime now = LocalTime.now(); // or Date now = new Date();
// check and run
}
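A rough sketch of that check with Quartz's CronExpression, assuming the expressions are stored per view somewhere (the helper below is illustrative, not from the question):
import java.text.ParseException;
import java.util.Calendar;
import java.util.Date;
import org.quartz.CronExpression;

// Returns true when the given cron expression matches the current minute.
private boolean isDue(String cron, Date now) throws ParseException {
    Calendar cal = Calendar.getInstance();
    cal.setTime(now);
    cal.set(Calendar.SECOND, 0);      // compare on whole minutes
    cal.set(Calendar.MILLISECOND, 0);
    return new CronExpression(cron).isSatisfiedBy(cal.getTime());
}
Inside runTasks() you would loop over the stored expressions and run the task for every view whose expression is due.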
I have met the same problem as you. Maybe I can offer a not-so-elegant solution with the help of Redis or a database.
In the scheduled task, you can read a flag from Redis and then decide whether to continue the task. For example:
#Scheduled(cron = "....")
void myTask() {
Boolean flag = readFlagFromRedis(); // you can write the flag into redis or database to control the task
if (flag) {
// continue your task
}
}
This way, you can control the task schedule at runtime. Although I don't think this is a beautiful solution, it can meet your requirements.
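A minimal sketch of readFlagFromRedis, assuming Spring Data Redis with a StringRedisTemplate bean (the key name is made up):
@Autowired
private StringRedisTemplate redisTemplate;

private Boolean readFlagFromRedis() {
    // the task keeps doing work only while "my-task:enabled" is set to "true"
    return Boolean.parseBoolean(redisTemplate.opsForValue().get("my-task:enabled"));
}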
I need to create a workflow which runs any time an incident is created or updated (or one workflow for each).
When I create a workflow and set the "Table" to Incident, it will run every time an incident is created, but it doesn't run when an incident is updated. I've searched through the wiki and read a slideshow talk on workflow creation, but so far no dice.
Thanks.
You would need to create a business rule on the Incident table that would call your workflow each time there is an update:
var updateOwner = new GlideRecord('wf_workflow');
updateOwner.addQuery('name', '<workflow_name>');
updateOwner.query();
if (updateOwner.next()) {
    var wf = new Workflow();
    var workflowId = '' + updateOwner.sys_id;
    var vars = {};
    wf.startFlow(workflowId, current, current.operation(), vars);
    gs.addInfoMessage('Workflow initiated.');
}
I am unable to get the total number of users in my User class through JavaScript, and I want to schedule a task that calculates the number of users in my User class every midnight.
To get the total number of users, query the User class and use count() rather than fetching all the objects:
Parse.Cloud.useMasterKey(); // required to query the User class
var query = new Parse.Query(Parse.User);
query.count({
    success: function(count) {
        // count - total number of users
    },
    error: function(error) {
        // handle the error
    }
});
To schedule a task at a regular interval, you have to use Jobs in Cloud Code.
Steps:
Write a job in JavaScript as per the documentation (a sketch follows below).
Deploy it to Cloud Code.
Go to Jobs, available on the LHS of your app's Dashboard, usually at https://parse.com/apps/APP_NAME/cloud_code/jobs.
Then make a schedule. There is an option to execute the task at a particular time every day; of course, you can use it to set the interval of your choice.
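A rough sketch of such a job, assuming the classic Parse Cloud Code jobs API (the job name is arbitrary):
Parse.Cloud.job("countUsers", function(request, status) {
    Parse.Cloud.useMasterKey();
    var query = new Parse.Query(Parse.User);
    query.count({
        success: function(count) {
            // you could store the count somewhere here before finishing
            status.success("Total users: " + count);
        },
        error: function(error) {
            status.error("Count failed: " + error.message);
        }
    });
});
Once deployed, schedule this job from the Jobs page to run every day at midnight.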
I have a task: I need to select data from "TABLE_FROM", modify it, and insert it into "TABLE_TO". The main problem is that the script must run in production and shouldn't hurt the live site's performance, but "TABLE_FROM" contains hundreds of millions of rows. I'm going to run the script using Node.js. What techniques are used to solve this kind of problem, i.e. how do I make this script run "slowly", or in other words "softly", to prevent DB and CPU overload?
Execution time of the script is irrelevant. I use Cassandra DB.
Sample code:
var OFFSET = 0;
var BATCHSIZE = 100;
var TIMEOUT = 1000;

function fetchPush() {
    // fetch one batch from TABLE_FROM
    var rows = fetch(OFFSET, BATCHSIZE);
    if (!rows.length) {
        return; // no more rows, we are done
    }
    // push the (modified) batch to TABLE_TO
    push(rows);
    // advance to the next batch and schedule it after a pause to keep load low
    OFFSET += BATCHSIZE;
    setTimeout(fetchPush, TIMEOUT);
}

fetchPush();
Here I'm assuming fetch and push are blocking calls; for async processing you could (obviously) make them asynchronous (a sketch follows below).
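A rough async version of the same idea; fetch, push, and modify are placeholders for your own query, transformation, and insert logic:
const BATCHSIZE = 100;
const PAUSE_MS = 1000;

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function migrate() {
    let offset = 0;
    for (;;) {
        const rows = await fetch(offset, BATCHSIZE); // hypothetical async read from TABLE_FROM
        if (!rows.length) break;
        await push(rows.map(modify));                // hypothetical async write to TABLE_TO
        offset += BATCHSIZE;
        await sleep(PAUSE_MS);                       // throttle to spare the DB and CPU
    }
}

migrate().catch(console.error);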