Does the Grand Central Dispatch API allow an execution context (thread) to query any thread-specific state during runtime? Specifically, is there a GCD equivalent to the OpenMP call omp_get_thread_num()?
If you would like to execute an operation n times over a collection of things (like a map operation), you can use dispatch_apply:
dispatch_apply(10, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, NULL), ^(size_t index) {
    void *my_thing = my_things[index];
    // ...
});
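There is no direct GCD analogue of omp_get_thread_num(), since GCD deliberately hides its worker threads, but the block's index argument fills the same role when you need to partition work per iteration. A minimal, self-contained sketch (the squaring workload and the array size are made up for illustration):

#include <dispatch/dispatch.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const size_t n = 10;
    int *results = malloc(n * sizeof(int)); // shared output, one slot per iteration
    dispatch_apply(n, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                   ^(size_t index) {
        // Each iteration writes only to its own slot, so no locking is needed;
        // `index` plays the part omp_get_thread_num() would in an OpenMP loop.
        results[index] = (int)(index * index);
    });
    for (size_t i = 0; i < n; i++) {
        printf("results[%zu] = %d\n", i, results[i]);
    }
    free(results);
    return 0;
}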
I am doing some heavy processing that needs async methods. One of my methods returns a list of dictionaries that needs to go through heavy processing before being added to another awaitable object, e.g.:
def cpu_bound_task_here(record):
    """Some complicated preprocessing of record."""
    return record
After applying the answer given below by the kind person, my code now just gets stuck.
async def fun():
    print("Socket open")
    record_count = 0
    symbol = obj.symbol.replace("-", "").replace("/", "")
    loop = asyncio.get_running_loop()
    await obj.send()
    while True:
        try:
            records = await obj.receive()
            if not records:
                continue
            record_count += len(records)
What the above function does is stream values asynchronously and do some heavy processing before pushing them to Redis, indefinitely. I made the necessary changes and now I'm stuck.
As that output tells you, run_in_executor returns a Future. You need to await it to get its result.
record = await loop.run_in_executor(
    None, something_cpu_bound_task_here, record
)
Note that any arguments to something_cpu_bound_task_here need to be passed to run_in_executor.
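For example, positional arguments go straight through, while keyword arguments have to be bound up front with functools.partial, since run_in_executor does not forward them (the verbose flag below is made up for illustration):

import functools

# Positional arguments are passed straight through:
record = await loop.run_in_executor(None, something_cpu_bound_task_here, record)

# Keyword arguments must be wrapped, since run_in_executor does not accept them:
record = await loop.run_in_executor(
    None, functools.partial(something_cpu_bound_task_here, record, verbose=True)
)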
Additionally, as you've mentioned that this is a CPU-bound task, you'll want to make sure you're using a concurrent.futures.ProcessPoolExecutor. Unless you've called loop.set_default_executor somewhere, the default is an instance of ThreadPoolExecutor.
with ProcessPoolExecutor() as executor:
    for record in records:
        record = await loop.run_in_executor(
            executor, something_cpu_bound_task_here, record
        )
Finally, your while loop is effectively running synchronously. You need to wait for the future and then for obj.add before moving on to process the next item in records. You might want to restructure your code a bit and use something like gather to allow for some concurrency.
import asyncio
from concurrent.futures import ProcessPoolExecutor

async def process_record(record, obj, loop, executor):
    record = await loop.run_in_executor(
        executor, something_cpu_bound_task_here, record
    )
    await obj.add(record)

async def fun():
    loop = asyncio.get_running_loop()
    records = await receive()
    with ProcessPoolExecutor() as executor:
        await asyncio.gather(
            *[process_record(record, obj, loop, executor) for record in records]
        )
I'm not sure how to handle obj since that isn't defined in your example, but I'm sure you can figure that out.
Check out the library Pypeln; it is perfect for streaming tasks between process, thread, and asyncio pools:
import pypeln as pl

data = get_iterable()
data = pl.task.map(f1, data, workers=100)                 # asyncio task pool
data = pl.thread.flat_map(f2, data, workers=10)           # thread pool
data = filter(f3, data)                                   # plain Python filter
data = pl.process.map(f4, data, workers=5, maxsize=200)   # process pool
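Note that, as with generators, the stages are lazy; nothing actually runs until the last stage is consumed:

data = list(data)  # iterating the final stage drives records through all the pools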
I thought I got the hang of Dexie, but now I'm flabbergasted:
Two tables, each with a handful of records: Komps & Bretts.
Output all Bretts:
rdb.Bretts.each(brett => {
    console.log(brett);
});
Output all Komps:
rdb.Komps.each(komp => {
    console.log(komp);
});
BUT: this only outputs the Bretts; for some weird reason, Komps is empty:
rdb.Bretts.each(brett => {
    console.log(brett);
    rdb.Komps.each(komp => {
        console.log(komp);
    });
});
I've tried all kinds of combinations with async/await, then(), etc.; the inner loop cannot find any data in the inner table, whichever table I try to do something with.
Second example. This works:
await rdb.Komps.get(163);
This produces an error ("Failed to execute 'objectStore' on 'IDBTransaction…ction': The specified object store was not found.")
rdb.Bretts.each(async brett => {
    await rdb.Komps.get(163);
});
Is there some kind of locking going on? Something that can be disabled?
Thank you!
Calling rdb.Bretts.each() implicitly launches a read-only transaction limited to 'Bretts' only. This means that within the callback you can only reach that table, which is why it doesn't find the Komps table at that point. To get access to the Komps table from within the each() callback, you would need to include it in an explicit transaction block:
rdb.transaction('r', 'Komps', 'Bretts', () => {
    rdb.Bretts.each(brett => {
        console.log(brett);
        rdb.Komps.each(komp => {
            console.log(komp);
        });
    });
});
However, each() does not respect promises returned by the callback, so even this fix is not something I would recommend, even though it would solve your problem. You could easily get race conditions, as you lose control of the flow when launching a new each() from within an each() callback.
I would recommend using toArray(), get(), bulkGet() and other methods rather than each() where possible. toArray() is also faster than each(), as it can use the faster IndexedDB methods IDBObjectStore.getAll() and IDBIndex.getAll() when possible. And you don't necessarily need to encapsulate the code in a transaction block (unless you really need that atomicity).
const komps = await rdb.Komps.toArray();
await Promise.all(
    komps.map(async komp => {
        // Do some async call per komp:
        const brett = await rdb.Bretts.get(163);
        console.log("brett with id 163", brett);
    })
);
Now this example is a bit silly, as it does the exact same rdb.Bretts.get(163) for every komp it finds, but you could replace 163 with some dynamic value there.
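For example, if each komp carried a foreign key to its brett (brettId is a made-up property name, purely for illustration), the lookup could vary per komp:

const komps = await rdb.Komps.toArray();
await Promise.all(
    komps.map(async komp => {
        // brettId is a hypothetical foreign-key property on komp
        const brett = await rdb.Bretts.get(komp.brettId);
        console.log("brett for komp", komp.id, brett);
    })
);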
Conclusion: there are two issues.
1. Dexie's operations run within implicit transactions, and the callback to each() lives inside that limited transaction (tied to one single table) unless you surround the call with a bigger explicit transaction block.
2. Try to avoid starting new async operations within the callback of Dexie's db.Table.each(), as it does not expect promises to be returned from its callback. You can do it, but it is better to stick with methods that let you keep control of the async flow.
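As a minimal sketch of that recommended style (assuming a Dexie version where Table.bulkGet() is available; the ids are made up):

// Read with promise-returning methods and work on plain arrays;
// no each() callbacks, so the async flow stays in your hands.
const bretts = await rdb.Bretts.toArray();
const komps = await rdb.Komps.bulkGet([161, 162, 163]); // hypothetical ids
console.log(bretts.length, komps.filter(Boolean).length);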
I'm building a Flux app using MartyJS (which is pretty close to "vanilla" Flux and uses the same underlying dispatcher). It contains stores with an inherent dependency relationship. For example, a UserStore tracks the current user, and an InstanceStore tracks instances of data owned by the current user. Instance data is fetched from an API asynchronously.
The question is what to do to the state of the InstanceStore when the user changes.
I've come to believe (e.g. from reading answers by @fisherwebdev on SO) that it's most appropriate to make AJAX requests in the action creator function, and to have an AJAX "success" result in an action that in turn causes stores to change.
So, to fetch the user (i.e. log in), I'm making an AJAX call in the action creator function, and when it resolves, I'm dispatching a RECEIVE_USER action with the user as a payload. The UserStore listens to this and updates its state accordingly.
However, I also need to re-fetch all the data in the InstanceStore if the user is changed.
Option 1: I can listen to RECEIVE_USER in the InstanceStore, and if it is a new user, trigger an AJAX request, which in turn creates another action, which in turn causes the InstanceStore to update. The problem with this is that it feels like cascading actions, although technically it's async so the dispatcher will probably allow it.
Option 2: Another way would be for InstanceStore to listen to change events emitted by UserStore and do the request-action dance then, but this feels wrong too.
Option 3: A third way would be for the action creator to orchestrate the two AJAX calls and dispatch the two actions separately. However, now the action creator has to know a lot about how the stores relate to one another.
One of the answers in "Where should ajax request be made in Flux app?" makes me think option 1 is the right one, but the Flux docs also imply that stores triggering actions is not good.
Something like option 3 seems like the cleanest solution to me, followed by option 1. My reasoning:
Option 2 deviates from the expected way of handling dependencies between stores (waitFor), and you'd have to check after each change event to figure out which ones are relevant and which can be ignored, or start using multiple event types; it could get pretty messy.
I think option 1 is viable; as Bill Fisher remarked in the post you linked, it's OK for API calls to be made from within stores provided that the resulting data is handled by calling new Actions. But OK doesn't necessarily mean ideal, and you'd probably achieve better separation of concerns and reduce cascading if you can collect all your API calls and action initiation in one place (i.e. ActionCreators). And that would be consistent with option 3.
"However, now the action creator has to know a lot about how the stores relate to one another."
As I see it, the action creator doesn't need to know anything about what the stores are doing. It just needs to log in a user and then get the data associated with the user. Whether this is done through one API call or two, these are logically very closely coupled and make sense within the scope of one action creator. Once the user is logged in and the data is obtained, you could fire two actions (e.g. LOGGED_IN, GOT_USER_DATA) or even just one action that contains all the data needed for both. Either way, the actions are just echoing what the API calls did, and it's up to the stores to decide what to do with it.
I'd suggest using a single action to update both stores, because this seems like a perfect use case for waitFor: when one action triggers a handler in both stores, you can instruct InstanceStore to wait for UserStore's handler to finish before its own handler executes. It would look something like this:
UserStore.dispatchToken = AppDispatcher.register(function(payload) {
    switch (payload.actionType) {
        case Constants.GOT_USER_DATA:
            // ...(handle UserStore response)...
            break;
        // ...
    }
});

// ...

InstanceStore.dispatchToken = AppDispatcher.register(function(payload) {
    switch (payload.actionType) {
        case Constants.GOT_USER_DATA:
            AppDispatcher.waitFor([UserStore.dispatchToken]);
            // ...(handle InstanceStore response)...
            break;
        // ...
    }
});
Option 1 seems the best choice conceptually to me. There are two separate API calls, so you have two sets of events.
It's a lot of events in a small amount of code, but Flux relies on always using the simple, standard Action -> Store -> View approach. Once you do something clever (like option 2), you've changed that. If other devs can no longer safely assume that any action flow works exactly the same as every other one, you've lost a big benefit of Flux.
It won't be the shortest approach in code though. MartyJS looks like it will be a little neater than Facebook's own Flux library at least!
A different option: if logins must always refresh the InstanceStore, why not have the login API call include all of the InstanceStore data as well?
(And taking it further: why have two separate stores? They seem very strongly coupled either way, and there's no reason you couldn't still make calls to the InstanceStore API without re-calling login anyway.)
I usually use promises to resolve such situations.
For example:
// UserAction.js
var Marty = require( 'marty' );
var Constants = require( '../constants/UserConstants' );
var vow = require( 'vow' );

module.exports = Marty.createActionCreators({
    // ...
    handleFormEvent: function ( path, e ) {
        var dfd = vow.defer();
        var prom = dfd.promise();
        this.dispatch( Constants.CHANGE_USER, dfd, prom );
    }
});
// UserStore.js
var Marty = require( 'marty' );
var UserConstants = require( '../constants/UserConstants' );

module.exports = Marty.createStore({
    id: 'UserStore',
    handlers: {
        changeUser: UserConstants.CHANGE_USER
    },
    changeUser: function ( dfd, __ ) {
        $.ajax( /* fetch new user */ )
            .then(function ( resp ) {
                /* do what you need */
                dfd.resolve( resp );
            });
    }
});
// InstanceStore.js
var Marty = require( 'marty' );
var UserConstants = require( '../constants/UserConstants' );

module.exports = Marty.createStore({
    id: 'InstanceStore',
    handlers: {
        changeInstanceByUser: UserConstants.CHANGE_USER
    },
    changeInstanceByUser: function ( __, prom ) {
        prom.then(function ( userData ) {
            /* OK, the user is now switched */
            $.ajax( /* fetch new instance */ )
                .then(function ( resp ) { /* ... */ });
        });
    }
});
I have this task: I need to select data from "TABLE_FROM", modify it, and insert it into "TABLE_TO". The main problem is that the script must run in production and shouldn't hurt live site performance, but "TABLE_FROM" contains hundreds of millions of rows. I'm going to run the script using Node.js. What techniques are used to solve this kind of problem, i.e. how can I make this script run "slowly", or in other words "softly", to prevent DB and CPU overload?
Script execution time is irrelevant. I use Cassandra DB.
Sample code:
var OFFSET = 0;
var BATCHSIZE = 100;
var TIMEOUT = 1000;

function fetchPush() {
    // fetch one batch from TABLE_FROM
    var rows = fetch(OFFSET, BATCHSIZE);
    if (rows.length === 0) return; // nothing left to copy
    // push the batch to TABLE_TO
    push(rows);
    OFFSET += BATCHSIZE;
    // pause before the next batch to throttle DB and CPU load
    setTimeout(fetchPush, TIMEOUT);
}
Here I'm assuming fetch and push are blocking calls; for async processing you could (obviously) use async.
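A minimal async sketch of the same throttling idea, where fetchAsync and pushAsync are hypothetical promise-returning wrappers around your Cassandra driver calls:

const BATCHSIZE = 100;
const PAUSE_MS = 1000;

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function migrate() {
    let offset = 0;
    for (;;) {
        const rows = await fetchAsync(offset, BATCHSIZE); // read one batch
        if (rows.length === 0) break;                     // nothing left to copy
        await pushAsync(rows);                            // write the batch out
        offset += BATCHSIZE;
        await sleep(PAUSE_MS);                            // throttle DB and CPU load
    }
}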
I need to trigger updates of a whole bunch of records in a Salesforce database without really updating any values. This is to make a few formulas recalculate some fields.
Here's what I tried - a schedulable class (say I want it to run every night):
global class acmePortfolioDummyUpdate implements Schedulable
{
    global void execute(SchedulableContext SC)
    {
        for (Acme_Portfolio__c p : [Select Id From Acme_Portfolio__c]) {
            update(p);
        }
    }
}
update(p) is a DML statement, and Salesforce limits the number of them to 150 per transaction. In my case it's about a few thousand records.
Also, I need to do this across many different portfolios. SF limits the number of scheduled classes to 10.
Any workaround for this?
Thanks
Try Batch Apex. You can schedule your batch using a schedulable class. Correct me if I'm wrong, but aren't formulas recalculated each time you read them?
Edit: comments don't have enough space.
I can't guarantee this will compile (I don't have access to an org right now), but try something like this:
global class batchClass implements Database.Batchable<sObject> {
    global Database.QueryLocator start(Database.BatchableContext BC) {
        return Database.getQueryLocator('Select Id From Acme_Portfolio__c');
    }
    global void execute(Database.BatchableContext BC, List<sObject> scope) {
        update scope;
    }
    global void finish(Database.BatchableContext BC) {
    }
}
And run this from system log:
Database.executeBatch(new batchClass());
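To make it run every night, a small Schedulable wrapper along these lines should work (the class name, job name, and cron expression below are made up; this one fires at 2 a.m. daily):

global class batchClassScheduler implements Schedulable {
    global void execute(SchedulableContext sc) {
        // Kick off the batch each time the schedule fires
        Database.executeBatch(new batchClass());
    }
}

And schedule it once, e.g. from the system log / Execute Anonymous:

System.schedule('Nightly portfolio touch', '0 0 2 * * ?', new batchClassScheduler());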